Overview
This survey synthesizes many papers and benchmarks. Its evidence is descriptive rather than drawn from new experiments, so its practical recommendations are reliable as starting points but need empirical tuning for each deployment.
Citations: 8
Evidence Strength: 0.70
Confidence: 0.80
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 2/3
Findings with evidence refs: 3/3
Results with explicit delta: 0/0
Reproducibility
Status: No open assets linked
Open source: Unknown
At A Glance
Cost impact: 60%
Production readiness: 60%
Novelty: 50%
Why It Matters For Business
Keeping LLMs accurate preserves user trust and reduces legal risk: use prompt/input edits for cheap, fast fixes, model editing for durable updates, and retrieval for up-to-date answers when models show low confidence.
Summary TLDR
This paper surveys two main ways to give LLMs fresh, accurate knowledge: knowledge editing (changing model behavior by editing weights or adding plug-ins) and retrieval augmentation (fetching external documents at inference). It organizes methods, catalogs benchmarks (editing: ZsRE/CounterFact; retrieval: NQ/HotPotQA/FEVER), and highlights gaps: most edits target single facts, retrieval needs robust judgement and conflict resolution, and multi-source/multimodal integration is underexplored. Practical takeaways: prefer prompt/input edits for cheap fixes, use model editing for persistent changes, and use retrieval when models are uncertain or entity popularity is low.
Problem Statement
Large language models hold a lot of knowledge in their weights but still fail on up-to-date facts and long-tail entities, and they hallucinate. Two complementary fixes exist: knowledge editing (change model behavior or attach plug-ins) and retrieval augmentation (keep model weights fixed and fetch external text). The field is fragmented and lacks a unified taxonomy, comprehensive benchmarks, and practical guidance for conflict resolution.
Main Contribution
Systematic taxonomy of knowledge-integration methods: input editing, model editing, and post-edit assessment
Detailed review of retrieval augmentation: when to fetch, how to fetch, how to use docs, and how to handle conflicts
Key Findings
Most knowledge-editing evaluations focus on triple-fact QA benchmarks like ZsRE and CounterFact.
Retrieval-judgement methods cluster into simple calibration thresholds and model-based judgments, each with trade-offs.
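The simpler of the two families, a calibration threshold, might look like the sketch below. It is an illustration only, not a method taken from the surveyed papers: the function name `should_retrieve`, the two input signals, and both default cutoffs are assumptions that would need tuning on logged data.

```python
def should_retrieve(
    confidence: float,
    entity_popularity: int,
    min_confidence: float = 0.6,   # hypothetical cutoff, tune per deployment
    min_popularity: int = 1000,    # hypothetical cutoff, e.g. monthly page views
) -> bool:
    """Threshold-style retrieval judgement.

    Fetch external documents when the model looks uncertain or the
    entity is rare enough that its weights are unlikely to cover it.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if entity_popularity < 0:
        raise ValueError("entity_popularity must be non-negative")
    return confidence < min_confidence or entity_popularity < min_popularity
```

The trade-off is that a threshold is cheap and transparent but only as good as the confidence signal behind it; a model-based judgement would replace this function with a learned or prompted decision.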
What To Try In 7 Days
Log cases where your LLM is low-confidence or wrong; mark entity popularity
Add a retrieval step for low-popularity or low-confidence queries and measure accuracy lift
Prototype an input-editing prompt that prepends a short factual context and check impact on hallucination rates
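The prompt prototype and the accuracy-lift measurement from the steps above can be sketched as follows. The prompt wording and the helper names `build_prompt` and `accuracy_lift` are hypothetical; the survey does not prescribe a template, so treat this as one possible starting point.

```python
def build_prompt(question: str, facts: list[str]) -> str:
    """Input editing: prepend a short factual context to the question."""
    if not facts:
        # No context available: fall back to the bare question.
        return f"Question: {question}\nAnswer:"
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Use the facts below when they are relevant.\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def accuracy_lift(baseline_correct: list[bool], augmented_correct: list[bool]) -> float:
    """Accuracy with the change minus accuracy without it, on the same queries."""
    if not baseline_correct or len(baseline_correct) != len(augmented_correct):
        raise ValueError("both runs must cover the same non-empty query set")
    n = len(baseline_correct)
    return sum(augmented_correct) / n - sum(baseline_correct) / n
```

Scoring both runs on the same logged queries keeps the comparison paired, so a measured lift reflects the added context or retrieval step rather than a change in the query mix.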
Risks & Boundaries
Limitations
Survey focuses on English and Wikipedia-style sources; less coverage of private or multimodal knowledge sources
Many editing methods assume single-fact edits; real-world bulk or structured updates remain hard
When Not To Use
When you need guaranteed, provable updates across all model outputs without ripple effects
When deployment cannot support a retriever or external corpus
Failure Modes
Model ignores retrieved context and returns memorized (outdated) facts
Edited facts cause unintended changes to unrelated model behavior (ripple effects)
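Both failure modes can be monitored with simple checks. The sketch below is a rough heuristic under stated assumptions: `ignores_context` uses plain substring matching, and `locality_rate` assumes you have recorded answers to a set of unrelated probe prompts before and after an edit. Neither function name nor either check comes from the survey.

```python
def ignores_context(answer: str, retrieved_fact: str, memorized_fact: str) -> bool:
    """Flag answers that repeat the old memorized value and omit the retrieved one."""
    text = answer.lower()
    return memorized_fact.lower() in text and retrieved_fact.lower() not in text


def locality_rate(before: dict[str, str], after: dict[str, str]) -> float:
    """Fraction of unrelated probe prompts whose answer is unchanged after an edit.

    A value well below 1.0 suggests the edit rippled into unrelated behavior.
    """
    if not before:
        raise ValueError("need at least one probe prompt")
    unchanged = sum(1 for prompt, ans in before.items() if after.get(prompt) == ans)
    return unchanged / len(before)
```

Exact string comparison is strict and will count harmless rephrasings as changes; a real deployment would likely normalize answers or compare them semantically.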

