Overview
Production Readiness
0.3
Novelty Score
0.6
Cost Impact Score
0.5
Citation Count
0
Why It Matters For Business
If your product augments an LLM with an open or large text store, attackers who can add or edit that store can steer answers or cause refusals; naive defenses leave gaps and some robust fixes reduce product quality.
Summary TLDR
This paper introduces RSB, a unified benchmark that measures how text-based data-poisoning attacks affect Retrieval-Augmented Generation (RAG) systems. It evaluates 13 poisoning methods and 7 defenses across five standard QA datasets and two larger "expanded" versions per dataset (15 datasets total). Main takeaways: many simple attacks achieve high success on original datasets; expanding the knowledge base with many correct, similar passages (EX-M/EX-L) sharply reduces most attack success; a few attacks optimized per poisoned text (e.g., CRAG variants) keep working on richer databases; defenses help only in narrow cases (DoS) and hybrid filtering (TrustRAG) trades strong defense for large QI
Problem Statement
RAG systems reduce hallucinations by adding retrieved context, but their text knowledge stores can be poisoned. There was no systematic, comparable benchmark to measure how different poisoning attacks and defenses behave across many datasets and RAG variants.
Main Contribution
RSB benchmark: centralized evaluation of 13 poisoning attacks and 7 defenses across 15 dataset variants (5 QA datasets + EX-M and EX-L expansions).
Large empirical study: end-to-end tests with multiple LLMs, retrievers, similarity metrics, and advanced RAG frameworks (sequential, branching, conditional, loop), plus multi-turn, multimodal, and agent settings.
Clear, actionable findings: expanded knowledge reduces many attacks; some attacks (budget/auxiliary-LLM optimized) remain effective; existing defenses have large blind spots.
Key Findings
Most poisoning attacks work well on original QA datasets.
Attack success drops dramatically when the knowledge base is enriched with many correct, similar passages.
Per-text optimized attacks remain the strongest in dense, information-rich databases.
Process-level defenses cut DoS-style attacks but fail against targeted poisoning; detection methods often miss crafted poisoned texts.
Hybrid filtering (TrustRAG) reduces many ASRs but harms utility by removing benign evidence.
Results
ASR (attack success rate)
Defense effectiveness
Defense trade-off
Similarity metric impact
Who Should Care
What To Try In 7 Days
Add redundant, relevant passages for high-risk queries (create EX-M style augmentation) and measure ASR drop.
Switch retriever scoring to cosine similarity and re-evaluate the attack surface (a low-effort change with a measurable effect).
Run TrustRAG or a hybrid filter in a staging environment to measure false-positive removals and utility loss before deploying widely.
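The scoring switch can be checked on a toy example before touching a real retriever. The sketch below is an illustration under stated assumptions, not the paper's setup: the two-dimensional vectors, the passage names, and the `top_k` helper are hypothetical. It shows one plausible reason the metric matters: an unnormalized dot product rewards large embedding norms, while cosine similarity compares direction only.

```python
import math

def dot(a, b):
    """Unnormalized inner product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Inner product divided by both vector norms (direction only)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def top_k(query, passages, score, k=1):
    """Return the names of the k highest-scoring passages (hypothetical helper)."""
    ranked = sorted(passages, key=lambda name: score(query, passages[name]), reverse=True)
    return ranked[:k]

# Toy embeddings, invented for illustration only.
query = [1.0, 0.0]
passages = {
    "benign": [0.9, 0.1],    # well aligned with the query, ordinary norm
    "inflated": [3.0, 3.0],  # poorly aligned, but with a large norm
}

print("dot product:", top_k(query, passages, dot))
print("cosine:     ", top_k(query, passages, cosine))
```

In this toy case the large-norm passage ranks first under the dot product and the well-aligned passage ranks first under cosine. Whether a real retriever behaves the same way depends on how its embeddings are trained and normalized, so the comparison should be rerun on your own index.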
Agent Features
Memory
- retrieval memory (knowledge DB)
Frameworks
- multi-turn conversational RAG
- multimodal RAG
- RAG-based LLM agents
Architectures
- sequential RAG
- branching RAG
- conditional RAG
- loop RAG
Reproducibility
Open Source Status
- unknown
Risks & Boundaries
Limitations
- Benchmark focuses on offline, text-based knowledge databases; web-linked or live-index settings are not covered.
- Default proxy and judge model is GPT-4o-mini; some attack results vary with stronger proxy LLMs.
- No public code or datasets are linked in the paper, limiting turnkey reproducibility.
- Evaluations prioritize injection-by-text; attacker models that alter retriever weights are not fully explored.
When Not To Use
- If your RAG system retrieves live web pages and cites URLs, the attack surface differs and RSB's offline assumptions may not apply.
- If your system uses proprietary closed retrievers with no public embedding model, some attack settings (white-box) are unrealistic.
Failure Modes
- Hybrid filters (TrustRAG) may remove all retrieved context and cause severe accuracy loss (high false positives).
- Per-text optimized attacks (CRAG variants) can bypass redundancy defenses by maximizing individual poisoned-text impact.
- Perplexity or embedding-norm detectors show high false negatives for well-crafted poisoned texts.
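The detector failure mode above can be quantified with false-negative and false-positive rates. The sketch below assumes a simple threshold on the L2 norm of a passage embedding; the threshold, the sample vectors, and the function names are hypothetical and are not taken from the paper.

```python
import math

def l2_norm(vec):
    """Euclidean length of an embedding vector."""
    return math.sqrt(sum(x * x for x in vec))

def flag_by_norm(vec, threshold):
    """Flag a passage as suspicious when its embedding norm exceeds the threshold."""
    return l2_norm(vec) > threshold

def detector_rates(samples, threshold):
    """Compute false-negative and false-positive rates.

    samples: list of (embedding, is_poisoned) pairs with known labels.
    """
    missed = false_alarms = poisoned = benign = 0
    for vec, is_poisoned in samples:
        flagged = flag_by_norm(vec, threshold)
        if is_poisoned:
            poisoned += 1
            missed += not flagged        # poisoned text that slipped through
        else:
            benign += 1
            false_alarms += flagged      # benign text wrongly removed
    return {
        "fnr": missed / poisoned if poisoned else 0.0,
        "fpr": false_alarms / benign if benign else 0.0,
    }

# Invented labeled samples: one crude poisoned text, one with an ordinary norm.
samples = [
    ([3.0, 4.0], True),    # norm 5.0, easy to catch
    ([0.6, 0.8], True),    # norm about 1.0, looks like benign text
    ([1.0, 0.0], False),
    ([0.0, 1.2], False),
]

rates = detector_rates(samples, threshold=2.0)
print(rates)
```

Here the poisoned text with an ordinary norm passes unflagged, so half of the poisoned samples are missed even though no benign text is removed. The same bookkeeping applies to a perplexity-based detector if the norm is replaced with a perplexity score.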
Core Entities
Models
- GPT-4o-mini
- GPT-4
- GPT-4.1
- Claude-3.7-Sonnet
- Gemini-2.0-flash
Metrics
- ACC
- ASR
- F1-score
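A minimal sketch of how ASR and ACC could be computed over a batch of targeted queries follows. It assumes that an attack succeeds when the model's answer contains the attacker's target string and that an answer is correct when it contains the ground-truth string. The summary indicates the benchmark uses an LLM judge, so the substring match here is a simplified stand-in, and the records are invented toy data rather than results from the paper.

```python
def contains(answer, needle):
    """Case-insensitive substring match (stand-in for an LLM judge)."""
    return needle.strip().lower() in answer.strip().lower()

def attack_success_rate(records):
    """Fraction of targeted queries whose answer contains the attacker's target."""
    if not records:
        return 0.0
    return sum(contains(r["answer"], r["target"]) for r in records) / len(records)

def accuracy(records):
    """Fraction of queries whose answer contains the ground-truth string."""
    if not records:
        return 0.0
    return sum(contains(r["answer"], r["truth"]) for r in records) / len(records)

# Hypothetical answers collected against a small knowledge base.
original = [
    {"answer": "The capital is Lyon.", "target": "Lyon", "truth": "Paris"},
    {"answer": "It was released in 1999.", "target": "1999", "truth": "2001"},
    {"answer": "The author is Jane Doe.", "target": "John Roe", "truth": "Jane Doe"},
    {"answer": "I cannot answer that.", "target": "Mars", "truth": "Venus"},
]

# Hypothetical answers after adding redundant correct passages (EX-M style).
expanded = [
    {"answer": "The capital is Paris.", "target": "Lyon", "truth": "Paris"},
    {"answer": "It was released in 2001.", "target": "1999", "truth": "2001"},
    {"answer": "The author is Jane Doe.", "target": "John Roe", "truth": "Jane Doe"},
    {"answer": "The answer is Mars.", "target": "Mars", "truth": "Venus"},
]

for name, records in [("original", original), ("expanded", expanded)]:
    print(name, "ASR:", attack_success_rate(records), "ACC:", accuracy(records))
```

Running both metrics on the same query set before and after a change (a larger knowledge base, a different similarity metric, a filter) gives a direct view of the security and utility trade-off.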
Datasets
- Natural Questions (NQ)
- HotpotQA
- MS-MARCO
- SQuAD
- BoolQ
- EX-M (medium expansion)
- EX-L (large expansion)
Benchmarks
- RSB (this paper's RAG Security Bench)
Context Entities
Models
- Llama-4-Scout
- DeepSeek-V3
Metrics
- Perplexity-based detection (PPL)
- Embedding norm detection (Norm)
Datasets
- InfoSeek (multimodal evaluation set)

