Overview
Production Readiness
0.6
Novelty Score
0.6
Cost Impact Score
0.7
Citation Count
0
Why It Matters For Business
If your product uses embedding-based retrieval and allows external or user-supplied documents, an attacker can cheaply force a poisoned document into search results and trigger downstream harms (phishing, data exfiltration, tool misuse). Protect retrieval and write access, not just the model.
Summary TLDR
The paper shows indirect prompt injection (IPI) is a practical end-to-end threat once an attacker makes poisoned documents be retrieved. It splits a malicious document into a short trigger (few tokens) that improves retrieval and an attack fragment with the payload. Using a black-box Cross-Entropy Method (CEM) to optimize triggers via embedding APIs, the authors achieve near-100% Recall@5 across 11 BEIR datasets and 8 embedding models, low cost (~$0.21 per target on some APIs), and end-to-end exploits (RAG and multi-agent) including SSH-key exfiltration with ~80% success in a multi-agent pipeline. Simple defenses (paraphrasing, perplexity filtering, token masking) fail once the attacker adap
Problem Statement
Modern LLM systems use external retrieval and can be hijacked if poisoned documents are returned. Previous work often assumes the malicious text is already retrieved. This paper asks: can an attacker reliably make a malicious item be retrieved under natural queries and realistic corpora when the attacker only has black-box access to embedding APIs and can inject a single document?
Main Contribution
Formulate IPI as two pieces: a compact trigger fragment (ensures retrieval) and an attack fragment (payload).
Design a practical black-box prefix-optimization attack (CEM variant) that builds short triggers (5–15 tokens) via only embedding API calls.
Large-scale empirical study: near-perfect retrieval across 11 BEIR datasets and 8 embedding models, low monetary cost (~$0.21 per query on some APIs).
First end-to-end exploits on RAG and agentic systems, including multi-agent SSH key exfiltration (~80% ASR), and evaluation showing common lightweight defenses are insufficient.
Key Findings
A short, optimized trigger reliably surfaces a single poisoned document into top-K retrieval.
The attack is low-cost and fast using commercial embedding APIs.
End-to-end exploitation succeeds across RAG and agentic pipelines including multi-agent orchestration.
Success depends on corpus competition (how relevant clean docs are).
Popular lightweight defenses are easily bypassed by adaptive attackers.
Perplexity-based filtering is fragile to trivial changes.
Results
Recall@5 (retrieval)
Attack cost (commercial APIs)
End-to-end ASR (multi-agent code exfiltration)
RAG targeted-answer ASR
Transferability (prefix from OpenAI embeddings)
Defense robustness (paraphrase)
Who Should Care
What To Try In 7 Days
Audit who can add documents to your retriever and block untrusted writers.
Log and monitor top-K retrieval outputs for high-sensitivity queries and alerts.
Run a red-team: generate a 10-token trigger for a few critical queries using an embedding API to test your corpus' vulnerability locally (use public BEIR/Enron samples). Do this in
Agent Features
Memory
- retrieval memory (external corpus)
Planning
- tool call planning
- round-robin scheduling
- orchestration across agents
Tool Use
- retrieval
- send_email
- contact_list access
- python code execution
Frameworks
- AutoGen
- MagenticOne
- Model Context Protocol (MCP)
Is Agentic
true
Architectures
- single-agent
- multi-agent (orchestrator + specialist agents)
Collaboration
- agent-to-agent delegation
- multi-agent information passing
Optimization Features
Token Efficiency
- effective triggers as short as 5–10 tokens
- longer triggers (≈15) increase success on harder corpora
System Optimization
- black-box API-only attack (no gradient access)
- sampling-based CEM avoids combinatorial search
Reproducibility
Code Urls
Data Urls
- BEIR benchmark (public)
- Enron email corpus (public)
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Does not evaluate retriever pipelines that use rerankers or hybrid (embedding + lexical) search in depth.
- Transferability across different embedding architectures is not guaranteed; full black-box transfer requires attacker knowledge/guessing.
- Assumes attacker can inject at least one document into the corpus; closed write policies reduce attack surface.
- Defense evaluation excludes approaches that require retriever fine-tuning or model parameter changes.
When Not To Use
- If your retriever uses strong hybrid reranking or supervised rerankers that re-score candidates before returning them.
- If external corpora are fully write-restricted and only vetted ingest pipelines accept documents.
- If your deployment includes per-document cryptographic provenance or strict ingestion validation.
Failure Modes
- High corpus competition: many highly relevant clean documents can resist trigger insertion.
- Embedding models with strong position encoding (e.g., OpenAI in tests) can reduce transferability and dispersion attacks.
- Reranking layers that rely on features outside the raw embedding (e.g., lexical matches, supervised signals) can negate the optimized trigger.
Core Entities
Models
- gte-modernbert-base (ModernBERT)
- contriever-msmarco
- Qwen3-Embedding-0.6B
- Qwen3-Embedding-4B
- Qwen3-Embedding-8B
- OpenAI text-embedding-3-small
- VoyageAI voyage-3.5-lite
- Alibaba text-embedding-v4 (Qwen-v4)
- ViT-B-32 (OpenCLIP for image-text demo)
- GPT-4o
- GPT-4o-mini
- LLaMA-2-7B
- Vicuna (7B/13B)
- Qwen3 series
- AutoGen
- MagenticOne
Metrics
- Recall@5
- MRR@5
- nDCG@5
- Cosine similarity
- Attack Success Rate (ASR)
- Monetary cost per trigger generation
Datasets
- BEIR (11 corpora: MSMARCO, TREC-COVID, NFCorpus, NQ, HotpotQA, FiQA-2018, ArguAna, DBPedia, SCIDOCS,
- MS COCO (image-to-text demo)
- Enron email corpus (agent experiments)
Benchmarks
- BEIR
- MS COCO (cross-modal retrieval)

