Overview
Production Readiness: 0.7
Novelty Score: 0.3
Cost Impact Score: 0.6
Citation Count: 612
Why It Matters For Business
RAG lets you keep LLMs current and auditable by fetching external facts at inference time; this reduces hallucinations and speeds updates without retraining the base model.
Summary TLDR
This paper surveys Retrieval-Augmented Generation (RAG) for large language models. It organizes RAG into three practical paradigms (Naive, Advanced, Modular), and breaks down the technical stack across retrieval, generation, and augmentation. The survey catalogs retrieval sources, indexing and query tricks, embedding and reranking methods, iterative/adaptive retrieval patterns, evaluation tasks/benchmarks, and engineering challenges (robustness, long-context tradeoffs, production tooling). The authors provide a compact evaluation map and a GitHub resource list.
Problem Statement
LLMs are powerful but make factual errors, go out of date, and hide their evidence trail. Research on retrieval augmentation is scattered. Practitioners need a unified view of RAG methods, components, evaluation practices, and production challenges.
Main Contribution
Systematic review of RAG research organized into Naive, Advanced, and Modular paradigms.
Detailed analysis of the three core RAG stages: Retrieval, Generation, and Augmentation.
Compilation of downstream tasks, ~50 datasets, benchmarks, and evaluation objectives, plus a discussion of open challenges and directions.
Key Findings
The surveyed RAG work spans a broad range of downstream tasks and roughly 50 datasets.
RAG often beats unsupervised fine-tuning on knowledge updates.
Including irrelevant documents can sometimes increase accuracy.
LLMs now accept very long contexts, changing RAG tradeoffs.
Who Should Care
What To Try In 7 Days
Build a simple RAG QA pipeline: chunk docs, create embeddings, run nearest-neighbor retrieval, and feed top-k snippets to an LLM.
Add a light reranker or LLM-based filter to improve context relevance before generation.
Measure hit rate/MRR and compare one-shot retrieval vs. a small iterative retrieval loop on a key task.
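The first step above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the survey's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the sentence splitter is a naive chunker, and `build_prompt` only assembles the text you would pass to whichever LLM you use. All function names here are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    """Brute-force nearest-neighbor search: return the k most similar chunks."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query, snippets):
    """Assemble numbered snippets and the question into one prompt string."""
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# Naive chunking: one chunk per sentence (an assumption for this sketch).
docs = (
    "RAG retrieves external documents at inference time. "
    "Fine-tuning changes model weights. "
    "BM25 is a sparse retriever."
)
chunks = [s.strip() for s in re.split(r"(?<=[.!?])\s+", docs) if s.strip()]

question = "What does RAG retrieve at inference time?"
top = retrieve(question, chunks, k=2)
prompt = build_prompt(question, top)
```

Swapping `embed` for a real embedding model and the sorted scan for a vector index leaves the rest of the flow unchanged, which makes this a convenient baseline before adding a reranker.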
Agent Features
Memory
- Retrieval Memory (external KB)
- LLM Self-memory modules
Planning
- Iterative Retrieve-Generate loops
- Recursive query decomposition
Tool Use
- Search engines and vector DBs
- LLM-generated queries (HyDE)
Frameworks
- LlamaIndex
- LangChain
- Haystack
Architectures
- Naive RAG
- Advanced RAG
- Modular RAG
Collaboration
- Retriever-generator alignment training
Optimization Features
Token Efficiency
- Context compression via small LM compressors
- Sliding-window and Small2Big chunking
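As a rough illustration of the chunking ideas listed above, the sketch below builds overlapping word windows and then expands a retrieved small window into a larger context span. It assumes Small2Big means retrieving on small units and handing the generator a wider surrounding window; the function names, word-level granularity, and `margin` parameter are hypothetical choices for this example.

```python
def sliding_window(words, size, stride):
    """Return (start, end) word spans of length `size`, advancing by `stride`."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    spans = []
    start = 0
    while True:
        end = min(start + size, len(words))
        spans.append((start, end))
        if end == len(words):
            break
        start += stride
    return spans

def small_to_big(words, span, margin):
    """Expand a small retrieved span by `margin` words on each side."""
    start, end = span
    return " ".join(words[max(0, start - margin):min(len(words), end + margin)])

words = "a b c d e f g h i j".split()
spans = sliding_window(words, size=4, stride=2)   # small, overlapping retrieval units
expanded = small_to_big(words, spans[2], margin=2)  # larger context for the generator
```

Here the small spans are what gets embedded and searched, while `expanded` is what would be placed in the prompt.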
Infra Optimization
- Vector DB indexing strategies
- Hierarchical indices and KG-backed indexes
Model Optimization
- Retriever fine-tuning
- Adapter layers for retriever/generator alignment
System Optimization
- Hybrid sparse+dense retrieval
- Metadata routing to narrow search scope
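One common way to combine sparse and dense result lists is reciprocal rank fusion (RRF). The source does not name a specific fusion method, so RRF is an assumption swapped in here for illustration; the document IDs and the default constant `k=60` are likewise illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists: each list adds 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

sparse_hits = ["d1", "d2", "d3"]  # e.g. from a BM25 index
dense_hits = ["d3", "d1", "d4"]   # e.g. from a vector index
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
```

Because RRF uses only ranks, it avoids having to normalize BM25 scores against cosine similarities, which live on different scales.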
Training Optimization
- LM-supervised retriever (LSR)
- Contrastive learning for compressors
- KL alignment between retriever and generator
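The KL alignment idea can be sketched numerically: turn retriever scores into a distribution over documents, turn the generator's per-document log-likelihoods of the correct answer into a second distribution, and measure the divergence between them. The direction of the KL term, the temperatures, and all numbers below are assumptions for illustration; a real training setup would backpropagate this loss into the retriever with an autodiff framework.

```python
import math

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same documents."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical values for three candidate documents.
retriever_scores = [2.0, 1.0, 0.0]           # query-document similarity
lm_log_likelihoods = [-1.0, -0.5, -3.0]      # log P_LM(answer | doc, query)

p_retriever = softmax(retriever_scores)
q_generator = softmax(lm_log_likelihoods)
loss = kl_divergence(p_retriever, q_generator)
```

A lower `loss` means the retriever ranks documents in closer agreement with how useful the generator finds them.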
Inference Optimization
- Reranking and filter-reranker patterns
- Adaptive retrieval triggers (confidence thresholds)
- Token elimination and prompt compression
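A confidence-triggered retrieval step might look like the sketch below: draft an answer without retrieval, and fetch context only if some token's probability falls below a threshold. The `draft_fn` and `retrieve_fn` callables, the stub implementations, and the 0.4 threshold are all hypothetical placeholders for a real LLM call and retriever.

```python
def should_retrieve(token_probs, threshold=0.4):
    """Trigger retrieval when any drafted token is below the confidence threshold."""
    return any(p < threshold for p in token_probs)

def answer(query, draft_fn, retrieve_fn, threshold=0.4):
    """Return (text, used_retrieval). Retrieval runs only for low-confidence drafts."""
    text, probs = draft_fn(query, context=None)
    if not should_retrieve(probs, threshold):
        return text, False
    context = retrieve_fn(query)
    text, _ = draft_fn(query, context=context)
    return text, True

# Stubs standing in for a real LLM and retriever.
def fake_draft(query, context=None):
    if context is None:
        return "Paris?", [0.9, 0.2]          # second token is low-confidence
    return f"Paris (per: {context[0]})", [0.9, 0.9]

def fake_retrieve(query):
    return ["France's capital is Paris."]

text, used_retrieval = answer("capital of France", fake_draft, fake_retrieve)
```

The threshold trades latency against accuracy: a higher value retrieves more often, a lower value trusts the base model more.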
Reproducibility
Code Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Retrieval noise and irrelevant documents can still break generation quality.
- Handling semi-structured data (tables, PDFs) is immature and error-prone.
- Evaluation metrics for RAG aspects (faithfulness, integration) are not standardized.
- Knowledge graphs give precision but incur build/maintenance cost.
When Not To Use
- Ultra-low-latency or very-high-throughput systems where the added retrieval latency is unacceptable.
- Tasks that require no external knowledge and can be solved by the base LLM.
- Environments with strict data exposure rules where retrieved sources may leak private data.
Failure Modes
- Hallucination despite retrieved context (generator ignores evidence).
- Over-reliance on retrieved text leading to verbatim echoing.
- Retriever misses critical documents (low recall) and yields wrong answers.
- Data leakage or provenance errors exposing sources or metadata.
Core Entities
Models
- RETRO++
- InstructRETRO
- REPLUG
- Self-RAG
- FLARE
- RECITE
- RAG-Robust
Metrics
- Accuracy
- EM
- F1
- Hit Rate
- MRR
- NDCG
- BLEU
- ROUGE-L
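For the retrieval-side metrics in this list, a minimal sketch with binary relevance labels is shown below. The function names and sample data are hypothetical, and graded-relevance NDCG would need per-document gain values instead of a relevant set.

```python
import math

def hit_rate(ranked_lists, relevant_sets, k):
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(
        1 for ranked, rel in zip(ranked_lists, relevant_sets)
        if any(d in rel for d in ranked[:k])
    )
    return hits / len(ranked_lists)

def mrr(ranked_lists, relevant_sets):
    """Mean reciprocal rank of the first relevant document (0 if none is found)."""
    total = 0.0
    for ranked, rel in zip(ranked_lists, relevant_sets):
        for rank, d in enumerate(ranked, start=1):
            if d in rel:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k for a single query."""
    dcg = sum(1.0 / math.log2(i + 2) for i, d in enumerate(ranked[:k]) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

ranked_lists = [["a", "b", "c"], ["x", "y", "z"]]
relevant_sets = [{"b"}, {"q"}]

hr2 = hit_rate(ranked_lists, relevant_sets, k=2)
mean_rr = mrr(ranked_lists, relevant_sets)
ndcg3 = ndcg_at_k(ranked_lists[0], relevant_sets[0], k=3)
```

Running the same functions over one-shot and iterative retrieval outputs gives a like-for-like comparison on a fixed query set.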
Datasets
- Natural Questions
- TriviaQA
- SQuAD
- HotpotQA
- ELI5
- arXiv/PubMed (PaperQA examples)
- MS MARCO
Benchmarks
- RGB
- RECALL
- CRUD
- RAGAS
- ARES
- RALLE
Context Entities
Models
- BERT-based retrievers
- Dense retrievers (DPR-style)
- Sparse retrievers (BM25)
Metrics
- R-Rate (reappearance rate)
- BERTScore
Datasets
- HotpotQA
- DPR Wikipedia
- C4 (pretraining examples)
Benchmarks
- MTEB (embedding leaderboard)
- C-MTEB (Chinese)

