Overview
The study is broad, spanning multiple datasets, LLMs, retrievers, and defense variants. The results are empirical and reproducible in principle, but exact code, data, and thresholds are not provided, so applying them requires re-implementation.
Citations: 0
Evidence Strength: 0.85
Confidence: 0.85
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 4/5
Reproducibility
Status: No open assets linked
Open source: Unknown
At A Glance
Cost impact: 50%
Production readiness: 30%
Novelty: 60%
Why It Matters For Business
If your product augments an LLM with an open or large text store, attackers who can add to or edit that store can steer answers or trigger refusals; naive defenses leave gaps, and some robust fixes reduce product quality.
Summary TLDR
This paper introduces RSB, a unified benchmark that measures how text-based data-poisoning attacks affect Retrieval-Augmented Generation (RAG) systems. It evaluates 13 poisoning methods and 7 defenses across five standard QA datasets and two larger "expanded" versions per dataset (15 datasets total). Main takeaways: many simple attacks achieve high success on original datasets; expanding the knowledge base with many correct, similar passages (EX-M/EX-L) sharply reduces most attack success; a few attacks optimized per poisoned text (e.g., CRAG variants) keep working on richer databases; defenses help only in narrow cases (DoS) and hybrid filtering (TrustRAG) trades strong defense for large QI
Problem Statement
RAG systems reduce hallucinations by adding retrieved context, but their text knowledge stores can be poisoned. There was no systematic, comparable benchmark to measure how different poisoning attacks and defenses behave across many datasets and RAG variants.
Main Contribution
RSB benchmark: centralized evaluation of 13 poisoning attacks and 7 defenses across 15 dataset variants (5 QA datasets + EX-M and EX-L expansions).
Large empirical study: end-to-end tests with multiple LLMs, retrievers, similarity metrics, and advanced RAG frameworks (sequential, branching, conditional, loop), plus multi-turn, multimodal, and agent settings.
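Retrieval scoring is one of the knobs the study varies. As a toy illustration of why the similarity metric can change which passages are retrieved (and therefore which injected texts reach the LLM), the sketch below ranks two made-up 2-D embeddings under inner-product and cosine scoring. The vectors, the `rank` helper, and the "high_norm" passage are hypothetical; real retrievers use learned, high-dimensional embeddings, and the paper's per-metric results are not reproduced here.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner-product similarity: grows with vector length, not just direction."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity: inner product of the two unit-normalized vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def rank(query: Vector, docs: Dict[str, Vector],
         sim: Callable[[Vector, Vector], float]) -> List[str]:
    """Return passage names ordered from most to least similar to the query."""
    return sorted(docs, key=lambda name: sim(query, docs[name]), reverse=True)


# Hypothetical toy embeddings.
query = [1.0, 0.0]
docs = {
    "relevant": [0.9, 0.1],   # points almost the same way as the query
    "high_norm": [3.0, 4.0],  # weaker direction match, much larger norm
}

print(rank(query, docs, dot))     # ['high_norm', 'relevant']
print(rank(query, docs, cosine))  # ['relevant', 'high_norm']
```

Inner product rewards vector length, so a passage with a large-norm embedding can outrank a better-aligned one; cosine removes that effect by normalizing both vectors.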
Key Findings
Most poisoning attacks work well on original QA datasets.
Attack success drops dramatically when the knowledge base is enriched with many correct, similar passages.
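The dilution effect behind the second finding can be sketched with a toy top-k retriever: when many correct passages score close to the query, fewer poisoned texts make it into the top k. The similarity scores, passage counts, and k = 5 below are invented for illustration and are not taken from the paper. The last case assumes poisoned texts whose similarity exceeds every correct passage, a rough stand-in for per-text optimized attacks such as the CRAG variants; it shows why expansion alone need not stop them.

```python
from typing import List, Tuple

# (passage id, similarity to the query, is_poisoned); all values are hypothetical.
Passage = Tuple[str, float, bool]


def poisoned_share_topk(passages: List[Passage], k: int) -> float:
    """Fraction of the k highest-scoring passages that are poisoned."""
    top = sorted(passages, key=lambda p: p[1], reverse=True)[:k]
    return sum(p[2] for p in top) / len(top)


k = 5
original = [("c1", 0.80, False), ("c2", 0.80, False), ("c3", 0.40, False)]
poison_plain = [(f"p{i}", 0.75, True) for i in range(3)]
poison_optimized = [(f"o{i}", 0.95, True) for i in range(3)]  # beats every correct passage
expansion = [(f"x{i}", 0.78, False) for i in range(10)]       # EX-M/EX-L-style correct, similar passages

print(poisoned_share_topk(original + poison_plain, k))                  # 0.6
print(poisoned_share_topk(original + expansion + poison_plain, k))      # 0.0
print(poisoned_share_topk(original + expansion + poison_optimized, k))  # 0.6
```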
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| ASR (attack success rate) | BPI ASR = 0.94 on NQ (original) | — | — | NQ (original) | Table 2: NQ row | Table 2 |
| ASR (attack success rate) | BPRAG ASR drops from 0.62 (NQ) to 0.03 (NQ-EX-L) | NQ (original) | -0.59 ASR | NQ -> NQ-EX-L | Table 2; Section 5.2.1 | Table 2 |
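A minimal sketch of the arithmetic behind the table, assuming ASR is the fraction of targeted queries whose final answer contains the attacker's target answer. The paper scores answers with an LLM judge (GPT-4o-mini by default); the substring check here is only a hypothetical stand-in for that judge, and the answers and targets are made up. The delta is the variant's ASR minus the baseline's, which reproduces the -0.59 reported for BPRAG.

```python
from typing import List


def attack_success_rate(answers: List[str], targets: List[str]) -> float:
    """Fraction of targeted queries whose answer contains the attacker's target.

    Case-insensitive substring matching is a hypothetical stand-in for the LLM judge.
    """
    if not answers or len(answers) != len(targets):
        raise ValueError("need exactly one target per answer, and at least one answer")
    hits = sum(t.lower() in a.lower() for a, t in zip(answers, targets))
    return hits / len(answers)


def asr_delta(asr_baseline: float, asr_variant: float) -> float:
    """Change in ASR from the baseline dataset to a variant, rounded to 2 decimals."""
    return round(asr_variant - asr_baseline, 2)


# Made-up outcomes for four targeted queries.
answers = ["It was written by Marlowe.", "I don't know.", "The tower opened in 1850.", "Paris."]
targets = ["Marlowe", "Marlowe", "1850", "Berlin"]

print(attack_success_rate(answers, targets))  # 0.5
print(asr_delta(0.62, 0.03))                  # -0.59 (BPRAG, NQ -> NQ-EX-L)
```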
What To Try In 7 Days
Add redundant, relevant passages for high-risk queries (EX-M-style augmentation) and measure the resulting ASR drop.
Switch retriever similarity scoring to cosine and re-evaluate the attack surface (a low-effort change whose effect is measurable).
Run TrustRAG or a hybrid filter in a staging environment to measure false-positive removals and utility loss before deploying widely.
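For the staging run suggested above, a filter can first be scored on labeled staging data. This sketch assumes you have ground-truth poisoned/clean labels for each retrieved passage and treats the filter as a black-box callable over one query's context; `keyword_filter` is a deliberately naive hypothetical filter, not TrustRAG. It reports the clean-passage removal rate (false positives), the share of poisoned passages removed, and how often a query is left with no context at all, the failure mode noted for hybrid filters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Passage:
    text: str
    poisoned: bool  # ground-truth label, assumed available only in a staging set


FilterFn = Callable[[List[Passage]], List[Passage]]


def evaluate_filter(contexts: List[List[Passage]], keep: FilterFn) -> Dict[str, float]:
    """Score a context filter against labeled staging contexts (one list per query)."""
    clean_total = clean_removed = poison_total = poison_removed = emptied = 0
    for ctx in contexts:
        kept = keep(ctx)
        kept_ids = {id(p) for p in kept}  # filter is assumed to return the same objects
        for p in ctx:
            removed = id(p) not in kept_ids
            if p.poisoned:
                poison_total += 1
                poison_removed += removed
            else:
                clean_total += 1
                clean_removed += removed
        emptied += not kept  # query left with no retrieved context
    return {
        "false_positive_rate": clean_removed / max(clean_total, 1),
        "poison_recall": poison_removed / max(poison_total, 1),
        "emptied_context_rate": emptied / max(len(contexts), 1),
    }


def keyword_filter(ctx: List[Passage]) -> List[Passage]:
    """Hypothetical naive filter: drop any passage containing 'definitely'."""
    return [p for p in ctx if "definitely" not in p.text.lower()]


contexts = [
    [Passage("Paris is the capital of France.", False),
     Passage("Berlin is definitely the capital of France.", True)],
    [Passage("The Eiffel Tower opened in 1889.", False),
     Passage("It definitely opened in 1889, per city records.", False)],
    [Passage("Definitely the answer is 42.", False)],
]

print(evaluate_filter(contexts, keyword_filter))
```

Utility loss still has to be measured separately, by rerunning QA with the filtered contexts and comparing answer accuracy against the unfiltered baseline.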
Risks & Boundaries
Limitations
Benchmark focuses on offline, text-based knowledge databases; web-linked or live-index settings are not covered.
The default proxy and judge model is GPT-4o-mini; some attack results change when stronger proxy LLMs are used.
When Not To Use
If your RAG system retrieves live web pages and cites URLs, attack surface differs and RSB's offline assumptions may not apply.
If your system uses proprietary closed retrievers with no public embedding model, some attack settings (white-box) are unrealistic.
Failure Modes
Hybrid filters (TrustRAG) may remove all retrieved context, causing severe accuracy loss (a high false-positive rate).
Per-text optimized attacks (CRAG variants) can bypass redundancy defenses by maximizing individual poisoned-text impact.

