A broad benchmark shows RAG systems remain vulnerable to data poisoning and current defenses only partially help

Overview

Decision Snapshot: Needs Validation

The study is broad, covering multiple datasets, LLMs, retrievers, and defense variants. Results are empirical and reproducible in principle, but the exact code, data, and thresholds are not provided, so applying the results requires re-implementation.

Citations: 0

Evidence Strength: 0.85

Confidence: 0.85

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 4/5

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 50%

Production readiness: 30%

Novelty: 60%

Authors

Baolei Zhang, Haoran Xin, Jiatong Li, Dongzhe Zhang, Minghong Fang, Zhuqing Liu, Lihai Nie, Zheli Liu

Why It Matters For Business

If your product augments an LLM with an open or large text store, attackers who can add or edit that store can steer answers or cause refusals; naive defenses leave gaps and some robust fixes reduce product quality.

Who Should Care

CTO, ML Engineer, Product Manager, Engineering Lead, Data Scientist

Summary TLDR

This paper introduces RSB, a unified benchmark that measures how text-based data-poisoning attacks affect Retrieval-Augmented Generation (RAG) systems. It evaluates 13 poisoning methods and 7 defenses across five standard QA datasets and two larger "expanded" versions per dataset (15 datasets total). Main takeaways: many simple attacks achieve high success on original datasets; expanding the knowledge base with many correct, similar passages (EX-M/EX-L) sharply reduces most attack success; a few attacks optimized per poisoned text (e.g., CRAG variants) keep working on richer databases; defenses help only in narrow cases (DoS) and hybrid filtering (TrustRAG) trades strong defense for large QI

Problem Statement

RAG systems reduce hallucinations by adding retrieved context, but their text knowledge stores can be poisoned. There was no systematic, comparable benchmark to measure how different poisoning attacks and defenses behave across many datasets and RAG variants.

Main Contribution

RSB benchmark: centralized evaluation of 13 poisoning attacks and 7 defenses across 15 dataset variants (5 QA datasets + EX-M and EX-L expansions).

Large empirical study: end-to-end tests with multiple LLMs, retrievers, similarity metrics, and advanced RAG frameworks (sequential, branching, conditional, loop), plus multi-turn, multimodal, and agent settings.

Key Findings

Most poisoning attacks work well on original QA datasets.

Numbers: for example, BPI ASR = 0.94 on NQ (Table 2)

Practical Use: Assume that an attacker who can inject texts into a standard knowledge DB can often push attacker-chosen answers; harden the system before exposing RAG to uncurated sources.

Evidence Ref: Table 2

Attack success drops dramatically when the knowledge base is enriched with many correct, similar passages.

Numbers: BPRAG ASR falls from 0.62 (NQ) to 0.03 (NQ-EX-L) (Table 2)

Practical Use: Adding redundant, high-quality supporting passages for queries is a cheap passive defense: fill DB with diverse correct texts to reduce poisoning impact.

Evidence Ref: Table 2; Appendix G
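One intuition for the redundancy effect is dilution: when more correct passages compete for the same top-k slots, fewer poisoned passages reach the LLM. The toy sketch below illustrates that intuition only. The scores, the `poisoned` flag, and the helper `poisoned_share_in_top_k` are hypothetical and are not taken from the paper.

```python
def poisoned_share_in_top_k(passages, k):
    """Return the fraction of the k highest-scoring passages that are poisoned."""
    top_k = sorted(passages, key=lambda p: p["score"], reverse=True)[:k]
    return sum(p["poisoned"] for p in top_k) / len(top_k)


# Hypothetical retrieval scores for a single query (illustrative values only).
original_db = [
    {"score": 0.91, "poisoned": True},
    {"score": 0.88, "poisoned": True},
    {"score": 0.80, "poisoned": False},
    {"score": 0.62, "poisoned": False},
    {"score": 0.55, "poisoned": False},
]

# Extra correct passages that are highly similar to the query (EX-style expansion).
extra_correct = [{"score": s, "poisoned": False} for s in (0.95, 0.93, 0.92, 0.90)]
expanded_db = original_db + extra_correct

share_before = poisoned_share_in_top_k(original_db, k=5)
share_after = poisoned_share_in_top_k(expanded_db, k=5)
print(share_before, share_after)
```

In this toy setup, a poisoned passage that scores above every added passage would still be retrieved, which is consistent with the observation that per-text optimized attacks can survive richer databases.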

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
ASR (attack success rate)	BPI ASR = 0.94 on NQ (original)	—	—	NQ (original)	Table 2: NQ row	Table 2
ASR (attack success rate)	BPRAG ASR drops from 0.62 (NQ) to 0.03 (NQ-EX-L)	NQ (original)	-0.59 ASR	NQ -> NQ-EX-L	Table 2; Section 5.2.1	Table 2
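The delta in the second row can be checked with a few lines of arithmetic. The sketch below also shows one simple way to compute ASR on your own system. The substring match in `attack_success_rate` is a stand-in assumption: the benchmark reportedly uses an LLM judge (GPT-4o-mini by default), and its exact scoring rule is not reproduced here.

```python
def attack_success_rate(answers, target_answers):
    """Fraction of queries whose answer contains the attacker's target string.

    Substring matching is a simplification; an LLM judge would replace it.
    """
    hits = sum(
        target.lower() in answer.lower()
        for answer, target in zip(answers, target_answers)
    )
    return hits / len(answers)


# Toy check of the helper with made-up answers.
toy_asr = attack_success_rate(
    ["The capital is Lyon.", "Paris", "I think it is lyon"],
    ["Lyon", "Lyon", "Lyon"],
)

# Reported values from the table above (BPRAG, Table 2).
asr_original = 0.62  # NQ
asr_expanded = 0.03  # NQ-EX-L

delta = round(asr_expanded - asr_original, 2)
relative_drop = round((asr_original - asr_expanded) / asr_original, 3)
print(delta, relative_drop)
```

The absolute delta matches the table (-0.59), and the relative drop works out to roughly 95% of the original attack success.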

What To Try In 7 Days

Add redundant, relevant passages for high-risk queries (create EX-M style augmentation) and measure ASR drop.

Switch retriever scoring to cosine similarity and re-evaluate the attack surface (a low-effort change with a measurable effect).

Run TrustRAG or a hybrid filter in a staging environment to measure false-positive removals and utility loss before deploying widely.
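For the scoring change above, one property worth keeping in mind is that a raw dot product grows with vector length, while cosine similarity depends only on direction. The sketch below shows how the ranking of two passages can flip between the two metrics. The vectors are made up, and the summary does not state that this is the reason cosine scoring changes the attack surface; treat it as an assumption to test on your own retriever.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Hypothetical 2-D embeddings (illustrative only).
query = [1.0, 0.0]
clean = [0.9, 0.1]     # well aligned with the query, ordinary norm
inflated = [3.0, 3.0]  # less aligned, but with a much larger norm

dot_prefers_inflated = dot(query, inflated) > dot(query, clean)
cosine_prefers_clean = cosine(query, clean) > cosine(query, inflated)
print(dot_prefers_inflated, cosine_prefers_clean)
```

If your vector store only supports inner-product search, normalizing embeddings to unit length before indexing gives the same ranking as cosine similarity.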

Agent Features

Memory

retrieval memory (knowledge DB)

Frameworks

multi-turn conversational RAG, multimodal RAG, RAG-based LLM agents

Architectures

sequential RAG, branching RAG, conditional RAG, loop RAG

Reproducibility

Code Available: No

Data Available: No

Open Source Status: Unknown

License: Unknown

Risks & Boundaries

Limitations

Benchmark focuses on offline, text-based knowledge databases; web-linked or live-index settings are not covered.

Default proxy and judge model is GPT-4o-mini; some attack results vary with stronger proxy LLMs.

When Not To Use

If your RAG system retrieves live web pages and cites URLs, the attack surface differs and RSB's offline assumptions may not apply.

If your system uses proprietary closed retrievers with no public embedding model, some attack settings (white-box) are unrealistic.

Failure Modes

Hybrid filters (TrustRAG) may remove all retrieved context and cause severe accuracy loss (high false positives).

Per-text optimized attacks (CRAG variants) can bypass redundancy defenses by maximizing individual poisoned-text impact.
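The first failure mode can be quantified in staging before a filter ships. The sketch below is a minimal example under stated assumptions: it expects labeled retrieval results and the set of passage ids a filter kept, then reports how many clean passages were removed and how often a query ends up with no context at all. The function `filter_report` and its input layout are hypothetical and do not reflect TrustRAG's actual interface.

```python
def filter_report(retrieved, kept_ids):
    """Summarize collateral damage from a context filter.

    retrieved: one list per query of (passage_id, is_poisoned) tuples.
    kept_ids:  one set per query with the passage ids the filter kept.
    """
    clean_total = clean_removed = empty_queries = 0
    for passages, kept in zip(retrieved, kept_ids):
        if not kept:
            empty_queries += 1  # the LLM would answer with no retrieved context
        for passage_id, is_poisoned in passages:
            if not is_poisoned:
                clean_total += 1
                if passage_id not in kept:
                    clean_removed += 1  # a clean passage was filtered out
    return {
        "false_positive_rate": clean_removed / clean_total if clean_total else 0.0,
        "empty_context_rate": empty_queries / len(retrieved) if retrieved else 0.0,
    }


# Toy data: two queries, one poisoned passage overall (made-up ids and labels).
retrieved = [
    [("a", False), ("b", True), ("c", False)],
    [("d", False), ("e", False)],
]
kept_ids = [{"a"}, set()]

report = filter_report(retrieved, kept_ids)
print(report)
```

Tracking both numbers alongside ACC makes the trade-off between defense strength and utility loss visible per dataset.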

Core Entities

Models

GPT-4o-mini, GPT-4, GPT-4.1, Claude-3.7-Sonnet, Gemini-2.0-flash

Metrics

ACC, ASR, F1-score

Datasets

Natural Questions (NQ), HotpotQA, MS-MARCO, SQuAD, BoolQ, EX-M (medium expansion), EX-L (large expansion)

Benchmarks

RSB (this paper's RAG Security Bench)

Context Entities

Models

Llama-4 Scout, DeepSeek-V3

Metrics

Perplexity-based detection (PPL), Embedding norm detection (Norm)

Datasets

InfoSeek (multimodal evaluation set)
