A practical black-box method that forces poisoned documents into retrieval and hijacks RAG and agentic systems

January 11, 2026 · 10 min

Overview

Decision Snapshot: Ready For Pilot

The method adapts known CEM optimization to a new threat (trigger construction) and demonstrates consistent, reproducible results across many datasets and models; experiments include realistic costs and agent pipelines, giving strong practical evidence.

Citations: 0

Evidence Strength: 0.90

Confidence: 0.88

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 6/6

Findings with evidence refs: 6/6

Results with explicit delta: 3/6

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 70%

Production readiness: 60%

Novelty: 60%

Authors

Hongyan Chang, Ergute Bao, Xinjian Luo, Ting Yu

Links

Abstract / PDF / Code / Data

Why It Matters For Business

If your product uses embedding-based retrieval and allows external or user-supplied documents, an attacker can cheaply force a poisoned document into search results and trigger downstream harms (phishing, data exfiltration, tool misuse). Protect retrieval and write access, not just the model.

Who Should Care

Summary TLDR

The paper shows that indirect prompt injection (IPI) is a practical end-to-end threat once an attacker can get a poisoned document retrieved. It splits a malicious document into a short trigger (a few tokens) that improves retrieval and an attack fragment that carries the payload. Using a black-box Cross-Entropy Method (CEM) to optimize triggers through embedding APIs, the authors report near-100% Recall@5 across 11 BEIR datasets and 8 embedding models, low cost (~$0.21 per target on some APIs), and end-to-end exploits against RAG and multi-agent systems, including SSH-key exfiltration with ~80% success in a multi-agent pipeline. Simple defenses (paraphrasing, perplexity filtering, token masking) fail once the attacker adapts.

Problem Statement

Modern LLM systems use external retrieval and can be hijacked if poisoned documents are returned. Previous work often assumes the malicious text is already retrieved. This paper asks: can an attacker reliably make a malicious item be retrieved under natural queries and realistic corpora when the attacker only has black-box access to embedding APIs and can inject a single document?

Main Contribution

Formulate IPI as two pieces: a compact trigger fragment (ensures retrieval) and an attack fragment (payload).

Design a practical black-box prefix-optimization attack (CEM variant) that builds short triggers (5–15 tokens) via only embedding API calls.
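The trigger search described above can be sketched as a Cross-Entropy Method loop over per-position token distributions. This is a minimal illustration, not the authors' implementation: `make_toy_embedder` is a hypothetical bag-of-words stand-in for a black-box embedding API, and `cem_trigger`, its parameters, and the vocabulary are all assumptions made for the example.

```python
import numpy as np

def make_toy_embedder(vocab):
    """Hypothetical stand-in for an embedding API: normalized bag-of-words counts."""
    index = {tok: i for i, tok in enumerate(vocab)}
    def embed(tokens):
        v = np.zeros(len(vocab))
        for t in tokens:
            v[index[t]] += 1.0
        return v / np.linalg.norm(v)
    return embed

def cem_trigger(embed, vocab, query_tokens, payload_tokens, n_tokens=10,
                pop=64, elite_frac=0.125, iters=30, smooth=0.7, seed=0):
    """Search for a trigger prefix that raises cosine similarity to the query."""
    rng = np.random.default_rng(seed)
    q = embed(query_tokens)
    # One categorical distribution per trigger position, initially uniform.
    probs = np.full((n_tokens, len(vocab)), 1.0 / len(vocab))
    n_elite = max(1, int(pop * elite_frac))
    best_score, best_trigger = -np.inf, None
    for _ in range(iters):
        # Sample a population of candidate triggers (token indices per position).
        samples = np.stack(
            [rng.choice(len(vocab), size=pop, p=probs[i]) for i in range(n_tokens)],
            axis=1,
        )
        # Score each candidate using only embedding calls (black-box access).
        scores = np.array([
            float(embed([vocab[j] for j in row] + payload_tokens) @ q)
            for row in samples
        ])
        # Keep the highest-scoring candidates and refit the distributions to them.
        elite = samples[np.argsort(scores)[-n_elite:]]
        for i in range(n_tokens):
            counts = np.bincount(elite[:, i], minlength=len(vocab))
            probs[i] = smooth * (counts / n_elite) + (1 - smooth) * probs[i]
        top = int(np.argmax(scores))
        if scores[top] > best_score:
            best_score = float(scores[top])
            best_trigger = [vocab[j] for j in samples[top]]
    return best_trigger, best_score

vocab = [f"tok{i}" for i in range(50)]
embed = make_toy_embedder(vocab)
trigger, score = cem_trigger(embed, vocab, vocab[:3], vocab[40:45])
print(trigger, round(score, 3))
```

With a real embedding API, each call to `embed` would be a paid request, so the population size and iteration count directly set the query budget.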

Key Findings

A short, optimized trigger reliably surfaces a single poisoned document into top-K retrieval.

Numbers: Recall@5 ≈ 95% average across 11 BEIR datasets at n=10 tokens

Practical Use: If an attacker can write one document and run embedding queries, a 10-token trigger can make that document appear among top-5 results for most queries; operators must harden retrieval, not just post-retrieval generation.

Evidence Ref: Table 3; Section 4.1

The attack is low-cost and fast using commercial embedding APIs.

Numbers: Trigger generation costs $0.21 to $0.76 per target on evaluated APIs

Practical Use: Attack feasibility is realistic: small budgets buy reliable retrieval exploits; treat embedding API access and write permissions as high-risk assets.

Evidence Ref: Section 4.1 (Efficiency and cost)
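For readers who want to measure the Recall@5 figure on their own corpus, the metric can be sketched as follows. This is an assumed evaluation harness, not the paper's code: `top_k`, `recall_at_k`, and the `poisoned_idx` convention (one poisoned document index per target query) are hypothetical names chosen for the example.

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=5):
    """Indices of the k documents with the highest cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    # Stable sort so that ties are broken by document index, deterministically.
    return np.argsort(-sims, kind="stable")[:k]

def recall_at_k(query_vecs, doc_vecs, poisoned_idx, k=5):
    """Fraction of target queries whose top-k results contain their poisoned document."""
    hits = [
        poisoned_idx[i] in top_k(q, doc_vecs, k)
        for i, q in enumerate(query_vecs)
    ]
    return float(np.mean(hits))
```

Here `doc_vecs` is assumed to hold embeddings for the clean corpus plus the injected documents, so a value near 0 corresponds to the "no trigger" baseline and a value near 1 to a successful attack.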

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Recall@5 (retrieval) | ≈95% average across 11 BEIR datasets at n=10 tokens | Vanilla (no trigger) ≈0% | Large increase vs baselines (from near-zero to ~95%) | Aggregate over 11 BEIR datasets (100 queries each) | Table 3; Section 4.1 | Table 3
Attack cost (commercial APIs) | $0.21 per target (Voyage/OpenAI) to $0.76 (Qwen-v4) | - | - | Cost measured for trigger generation per target query | Section 4.1 (Efficiency and cost) | Section 4.1
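A back-of-envelope way to reason about per-target cost is to multiply the number of embedding calls by the tokens per call and the provider's token price. The sketch below is only arithmetic under assumed inputs: the function name and every number in the usage line are placeholders, not figures from the paper, and real pricing and query budgets vary by provider.

```python
def embedding_search_cost(iterations, population, tokens_per_candidate,
                          usd_per_million_tokens):
    """Estimated USD cost of a sampling-based search billed per embedded token."""
    total_tokens = iterations * population * tokens_per_candidate
    return total_tokens * usd_per_million_tokens / 1_000_000

# Placeholder inputs for illustration only.
print(embedding_search_cost(iterations=100, population=100,
                            tokens_per_candidate=100, usd_per_million_tokens=1.0))
```

The point of the exercise is that cost grows linearly in each factor, so shorter candidate documents or smaller populations reduce the attacker's bill proportionally.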

What To Try In 7 Days

Audit who can add documents to your retriever and block untrusted writers.

Log top-K retrieval outputs for high-sensitivity queries and set up alerts on them.

Run a red-team: generate a 10-token trigger for a few critical queries using an embedding API to test your corpus' vulnerability locally (use public BEIR/Enron samples). Do this in

Agent Features

Memory
retrieval memory (external corpus)
Planning
tool call planning, round-robin scheduling, orchestration across agents
Tool Use
retrieval, send_email, contact_list access, Python code execution
Frameworks
AutoGen, MagenticOne, Model Context Protocol (MCP)
Is Agentic

Yes

Architectures
single-agent, multi-agent (orchestrator + specialist agents)
Collaboration
agent-to-agent delegation, multi-agent information passing

Optimization Features

Token Efficiency
effective triggers as short as 5–10 tokens; longer triggers (≈15) increase success on harder corpora
System Optimization
black-box API-only attack (no gradient access); sampling-based CEM avoids combinatorial search

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Partial
License: Unknown

Data URLs

BEIR benchmark (public), Enron email corpus (public)

Risks & Boundaries

Limitations

Does not evaluate retriever pipelines that use rerankers or hybrid (embedding + lexical) search in depth.

Transferability across different embedding architectures is not guaranteed; full black-box transfer requires attacker knowledge/guessing.

When Not To Use

If your retriever uses strong hybrid reranking or supervised rerankers that re-score candidates before returning them.

If external corpora are fully write-restricted and only vetted ingest pipelines accept documents.
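To make the first condition concrete, here is a rough sketch of how a second ranking signal can change where a document lands. It uses reciprocal rank fusion (RRF), a common way to merge a dense ranking with a lexical one; this is a swapped-in illustration rather than anything the paper evaluates, and the document IDs and function name are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); absent documents contribute nothing.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

dense = ["poisoned", "a", "b", "c"]   # embedding ranking, trigger pushed the doc to rank 1
lexical = ["a", "b", "c", "d"]        # keyword ranking, poisoned doc does not match
print(reciprocal_rank_fusion([dense, lexical]))
```

In this toy case the poisoned document falls behind documents that both rankers agree on. That is not evidence of robustness: an attacker who adapts to the lexical signal may recover the lost rank, and the source notes that such pipelines were not evaluated in depth.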

Failure Modes

High corpus competition: many highly relevant clean documents can resist trigger insertion.

Embedding models with strong position encoding (e.g., OpenAI in tests) can reduce transferability and dispersion attacks.

Core Entities

Models

gte-modernbert-base (ModernBERT), contriever-msmarco, Qwen3-Embedding-0.6B, Qwen3-Embedding-4B, Qwen3-Embedding-8B, OpenAI text-embedding-3-small, VoyageAI voyage-3.5-lite, Alibaba text-embedding-v4 (Qwen-v4), ViT-B-32 (OpenCLIP for image-text demo), GPT-4o, GPT-4o-mini, LLaMA-2-7B, Vicuna (7B/13B), Qwen3 series, AutoGen, MagenticOne

Metrics

Recall@5, MRR@5, nDCG@5, Cosine similarity, Attack Success Rate (ASR), Monetary cost per trigger generation
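When each target query has exactly one relevant item (the poisoned document), the rank-sensitive metrics reduce to simple functions of that document's rank. The sketch below assumes this single-relevant-document setting and a 1-indexed rank, with `None` meaning the document was not retrieved; the function names are hypothetical.

```python
import math

def reciprocal_rank_at_k(rank, k=5):
    """1/rank if the poisoned document is within the top k, else 0."""
    return 1.0 / rank if rank is not None and rank <= k else 0.0

def ndcg_at_k_single(rank, k=5):
    """nDCG@k with one relevant document: the ideal DCG is 1, so nDCG = 1/log2(rank + 1)."""
    return 1.0 / math.log2(rank + 1) if rank is not None and rank <= k else 0.0

def mean_metric(ranks, metric, k=5):
    """Average a per-query metric over all target queries (e.g. MRR@5)."""
    return sum(metric(r, k) for r in ranks) / len(ranks)
```

Averaging `reciprocal_rank_at_k` over queries gives MRR@5, and averaging `ndcg_at_k_single` gives nDCG@5 under the same assumption.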

Datasets

BEIR (11 corpora, including MSMARCO, TREC-COVID, NFCorpus, NQ, HotpotQA, FiQA-2018, ArguAna, DBPedia, SCIDOCS), MS COCO (image-to-text demo), Enron email corpus (agent experiments)

Benchmarks

BEIR, MS COCO (cross-modal retrieval)