Survey: How to add, update, and use external knowledge with large language models

November 10, 2023 · 6 min

Overview

Production Readiness

0.6

Novelty Score

0.5

Cost Impact Score

0.6

Citation Count

8

Authors

Zhangyin Feng, Weitao Ma, Weijiang Yu, Lei Huang, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu

Links

Abstract / PDF

Why It Matters For Business

Keeping LLMs accurate protects user trust and reduces legal risk: use prompt/input edits for cheap, fast fixes, model editing for durable updates, and retrieval for up-to-date answers when models show low confidence.

Summary TLDR

This paper surveys two main ways to give LLMs fresh, accurate knowledge: knowledge editing (changing model behavior by editing weights or adding plug-ins) and retrieval augmentation (fetching external documents at inference). It organizes methods, catalogs benchmarks (editing: ZsRE/CounterFact; retrieval: NQ/HotPotQA/FEVER), and highlights gaps: most edits target single facts, retrieval needs robust judgement and conflict resolution, and multi-source/multimodal integration is underexplored. Practical takeaways: prefer prompt/input edits for cheap fixes, use model editing for persistent changes, and use retrieval when models are uncertain or entity popularity is low.

Problem Statement

Large language models hold a lot of knowledge in their weights but still miss up-to-date facts, struggle with long-tail entities, and hallucinate. Two complementary fixes exist: knowledge editing (change model behavior or attach plug-ins) and retrieval augmentation (keep model weights fixed and fetch external text). The field is fragmented and lacks a unified taxonomy, comprehensive benchmarks, and practical guidance for conflict resolution.

Main Contribution

Systematic taxonomy of knowledge-integration methods: input editing, model editing, and post-edit assessment

Detailed review of retrieval augmentation: when to fetch, how to fetch, how to use docs, and how to handle conflicts

Catalog of benchmarks for both editing and retrieval, plus a short roadmap of open problems and applications

Key Findings

Most knowledge-editing evaluations focus on triple-fact QA benchmarks like ZsRE and CounterFact.

Numbers: ZsRE: 182,282; CounterFact: 21,919

Retrieval-judgement methods cluster into simple calibration thresholds and model-based judgments, each with trade-offs.
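
The calibration-threshold style of retrieval judgement can be sketched as follows. This is a minimal illustration, not a method taken from the survey: it assumes the model exposes per-token log-probabilities for its draft answer, and the function names and the 0.7 threshold are hypothetical choices that would need tuning per domain.

```python
import math
from typing import Sequence


def mean_token_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability of a draft answer.

    Returns 0.0 for an empty sequence so that "no evidence" counts as
    low confidence.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def should_retrieve(token_logprobs: Sequence[float], threshold: float = 0.7) -> bool:
    """Trigger retrieval when confidence falls below the threshold.

    The threshold is an assumed value, not one reported in the paper.
    """
    return mean_token_confidence(token_logprobs) < threshold
```

The trade-off is visible in the code: the rule is cheap and transparent, but it depends entirely on how well the model's probabilities are calibrated, which a model-based judge does not require.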

Model editing methods range from single precise edits to bulk edits; MEMIT can apply thousands of edits at once.

Numbers: MEMIT: applies thousands of edits at once

Who Should Care

What To Try In 7 Days

Log cases where your LLM is low-confidence or wrong; mark entity popularity

Add a retrieval step for low-popularity or low-confidence queries and measure accuracy lift

Prototype an input-editing prompt that prepends a short factual context and check impact on hallucination rates
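
The three steps above can be prototyped in a few lines. The sketch below is illustrative only: `QueryRecord`, the popularity signal, both cut-off values, and the prompt wording are assumptions introduced here, and `retrieve` stands in for whatever retriever you already have.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QueryRecord:
    question: str
    confidence: float  # model confidence in [0, 1], however you estimate it
    popularity: int    # hypothetical popularity signal for the main entity


def needs_retrieval(rec: QueryRecord,
                    min_confidence: float = 0.6,
                    min_popularity: int = 1000) -> bool:
    """Retrieve for low-confidence or low-popularity queries (assumed cut-offs)."""
    return rec.confidence < min_confidence or rec.popularity < min_popularity


def build_prompt(question: str, passages: List[str]) -> str:
    """Input editing: prepend a short factual context when passages exist."""
    if not passages:
        return f"Question: {question}\nAnswer:"
    facts = "\n".join(f"- {p}" for p in passages)
    return (
        "Use the facts below when answering.\n"
        f"Facts:\n{facts}\n\n"
        f"Question: {question}\nAnswer:"
    )


def prepare(rec: QueryRecord, retrieve: Callable[[str], List[str]]) -> str:
    """Gate retrieval, then build the prompt that is sent to the model."""
    passages = retrieve(rec.question) if needs_retrieval(rec) else []
    return build_prompt(rec.question, passages)
```

Logging each `QueryRecord` alongside the final answer gives you the data needed to measure accuracy lift with and without the retrieval step.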

Reproducibility

Open Source Status

  • unknown

Risks & Boundaries

Limitations

  • Survey focuses on English and Wikipedia-style sources; less coverage of private or multimodal knowledge sources
  • Many editing methods assume single-fact edits; real-world bulk or structured updates remain hard
  • Conflict resolution between parametric memory and retrieved docs is mostly analyzed, not solved

When Not To Use

  • When you need guaranteed, provable updates across all model outputs without ripple effects
  • When deployment cannot support a retriever or external corpus
  • When you require real-time private data and cannot expose it to external retrievers

Failure Modes

  • Model ignores retrieved context and returns memorized (outdated) facts
  • Edited facts cause unintended changes to unrelated model behavior (ripple effects)
  • Retrieval returns noisy or adversarial passages and misleads the model
  • Threshold-based retrieval decisions fail across domains due to calibration drift
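
One way to watch for the ripple-effect failure mode is a before/after regression check on prompts unrelated to the edit. The sketch below is an assumption-laden illustration rather than a metric defined in the survey: the probe set, the exact-string comparison, and the function name are all hypothetical.

```python
from typing import Dict


def unchanged_rate(before: Dict[str, str], after: Dict[str, str]) -> float:
    """Fraction of unrelated probe prompts whose answer is identical
    before and after an edit. Values below 1.0 suggest side effects.

    Only prompts present in both mappings are compared.
    """
    shared = [p for p in before if p in after]
    if not shared:
        raise ValueError("no shared probe prompts to compare")
    same = sum(before[p].strip() == after[p].strip() for p in shared)
    return same / len(shared)
```

Exact string comparison is deliberately strict; a looser match (for example, normalized answers) may be more appropriate for free-form generations.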

Core Entities

Models

  • ROME
  • MEMIT
  • MEND
  • KE
  • NKB
  • SERAC
  • T-Patcher
  • GRACE
  • PMET
  • REPLUG
  • REPLUG LSR
  • GENRE
  • DSI

Metrics

  • EM
  • F1
  • Accuracy
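
For reference, EM and token-level F1 are commonly computed along the following lines for QA benchmarks. This is a sketch assuming SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the survey does not prescribe a specific normalization, and individual benchmarks may differ.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Accuracy, as used for classification-style tasks such as fact verification, is simply the fraction of examples whose predicted label equals the gold label.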

Datasets

  • ZsRE
  • CounterFact
  • CounterFact+
  • Bi-ZsRE
  • MQuAKE
  • RippleEdits
  • Eva-KELLM
  • Natural Questions
  • TriviaQA
  • PopQA
  • HotPotQA
  • 2WikiMultiHopQA
  • MuSiQue
  • Bamboogle
  • FEVER
  • FEVEROUS
  • FoolMeTwice
  • StrategyQA
  • CommonsenseQA
  • INFOTABS

Benchmarks

  • ZsRE
  • CounterFact
  • Bi-ZsRE
  • MQuAKE
  • Natural Questions
  • HotPotQA
  • FEVER
  • StrategyQA