Overview
KG augmentation gives practical gains, especially for small models and domain tasks, but its benefit depends on retriever quality and KG coverage, and it adds latency. Use retrieval and validation for immediate benefit, and reserve costly KG-aware training for critical domains.
Citations: 16
Evidence Strength: 0.70
Confidence: 0.80
Risk Signals: 11
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 3/3
Reproducibility
Status: No open assets linked
Open source: Unknown
At A Glance
Cost impact: 70%
Production readiness: 60%
Novelty: 50%
Why It Matters For Business
Adding knowledge graphs to LLMs can cut factual errors quickly, especially for small models and domain tasks, improving trustworthiness without full model retraining.
Summary TLDR
This survey reviews methods that add structured knowledge (knowledge graphs, KGs) to large language models to reduce hallucinations. It groups approaches into three practical stages: KG-augmented inference (retrieval, reasoning, controlled generation), KG-aware training (pre-training and fine-tuning), and KG-based validation (fact-checking). The authors report that KG retrieval often boosts small-model QA accuracy substantially (the cited papers report improvements of more than 80% on the evaluated QA tasks) and that KG-guided step-wise reasoning can raise reasoning accuracy (e.g., RoG raised ChatGPT from 66.8% to 85.7% on its tests). The survey highlights trade-offs: retrieval and validation are low-cost but rely on retriever quality and KG coverage and add latency, while KG-aware training is costlier and best reserved for critical domains.
Problem Statement
LLMs often produce plausible-sounding but incorrect statements ('hallucinations') because their internal knowledge is incomplete or outdated. The paper asks: can structured external knowledge (knowledge graphs) be added at inference, training, or validation stages to reduce hallucinations and improve reasoning?
Main Contribution
A concise taxonomy that groups KG-augmentation methods into Knowledge-Aware Inference, Knowledge-Aware Training, and Knowledge-Aware Validation.
A comparison table of representative methods, datasets, LLMs, and training costs to help pick an approach.
Key Findings
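KG-augmented retrieval can dramatically improve QA correctness for small models (reported improvements above 80% on the evaluated QA tasks).
KG-guided step-wise reasoning substantially raises reasoning accuracy on evaluated tasks (e.g., 66.8% to 85.7% for ChatGPT with RoG).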
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| QA answer correctness (small LMs) | >80% improvement reported on evaluated QA tasks with KG retrieval | no KG retrieval | >80% (as reported) | evaluated QA datasets (Baek et al.; Sen et al.; Wu et al.) | Section 4.3; Baek et al. 2023; Sen et al. 2023; Wu et al. 2023 | Section 4.3 |
| Accuracy | 85.7% with KG-guided reasoning | 66.8% without KG-guided reasoning | +18.9 percentage points | reasoning benchmark used in Luo et al. 2023 | Section 3.1.2 and 4.3; Luo et al. 2023 | Luo et al. 2023 |
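The Delta in the second row is an absolute difference in percentage points. The corresponding relative gain, computed here from the reported figures only, is shown below. The first row does not say whether its >80% figure is absolute or relative, so it cannot be converted the same way.

```latex
\Delta_{\text{abs}} = 85.7\% - 66.8\% = 18.9\ \text{pp},
\qquad
\Delta_{\text{rel}} = \frac{85.7 - 66.8}{66.8} \approx 0.283 \;(\approx 28.3\%)
```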
What To Try In 7 Days
Build a simple KG-backed retriever and prepend retrieved triples to model prompts for QA experiments.
Run KG-based post-generation fact checks on a sample of high-stakes outputs to measure hallucination rates.
Pilot chain-of-thought + KG retrieval on a handful of multi-step queries to compare accuracy vs. baseline.
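The three pilot steps above can be prototyped without any model calls. Below is a minimal sketch under explicit assumptions: the toy KG, its facts, and every function name (`retrieve`, `expand_hops`, `build_prompt`, `fact_check`, `hallucination_rate`) are hypothetical. The retriever is naive lexical matching, not the method of any surveyed paper. The sketch also assumes that claims have already been extracted from model output as triples, and that relations are single-valued when it flags a contradiction.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Triple:
    subject: str
    relation: str
    obj: str


# Hypothetical toy KG; a real deployment would query a graph store instead.
KG = {
    Triple("Marie Curie", "born_in", "Warsaw"),
    Triple("Marie Curie", "award", "Nobel Prize in Physics"),
    Triple("Pierre Curie", "spouse", "Marie Curie"),
    Triple("Warsaw", "capital_of", "Poland"),
}


def retrieve(question: str, kg: set, k: int = 5) -> list:
    """Naive lexical retriever: rank triples by how many of their entities the question mentions."""
    q = question.lower()
    scored = []
    for t in kg:
        score = int(t.subject.lower() in q) + int(t.obj.lower() in q)
        if score:
            scored.append((-score, t))  # negate so higher scores sort first
    return [t for _, t in sorted(scored)[:k]]


def expand_hops(seeds: list, kg: set, hops: int = 1) -> list:
    """Multi-step helper: follow outgoing edges from entities already reached, one hop per round."""
    found = set(seeds)
    frontier = {e for t in seeds for e in (t.subject, t.obj)}
    for _ in range(hops):
        new = {t for t in kg if t.subject in frontier} - found
        found |= new
        frontier = {t.obj for t in new}
    return sorted(found)


def build_prompt(question: str, triples: list) -> str:
    """Prepend retrieved triples to the question as plain-text context."""
    facts = "\n".join(f"({t.subject}, {t.relation}, {t.obj})" for t in triples)
    return f"Facts:\n{facts}\n\nQuestion: {question}\nAnswer:"


def fact_check(claims: list, kg: set) -> dict:
    """Label claimed triples against the KG.

    Assumes every relation is single-valued, so a KG triple with the same
    (subject, relation) but a different object counts as a contradiction.
    """
    labels = {}
    for c in claims:
        if c in kg:
            labels[c] = "supported"
        elif any(t.subject == c.subject and t.relation == c.relation for t in kg):
            labels[c] = "contradicted"
        else:
            labels[c] = "unverified"  # KG coverage gap, not evidence either way
    return labels


def hallucination_rate(labels: dict) -> float:
    """Share of checked claims the KG contradicts (a lower bound, given coverage gaps)."""
    return sum(v == "contradicted" for v in labels.values()) / len(labels) if labels else 0.0
```

For a multi-step query such as "Which country has Marie Curie's birthplace as its capital?", `retrieve` alone finds only the triples that mention Marie Curie. `expand_hops` then adds `(Warsaw, capital_of, Poland)`, which is the second hop the answer needs. Counting "unverified" claims separately keeps KG coverage gaps from being reported as hallucinations.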
Risks & Boundaries
Limitations
Survey may miss recent or niche works due to page and timeframe limits.
Comparisons mix heterogeneous benchmarks and setups, limiting direct apples-to-apples claims.
When Not To Use
For casual conversational agents where perfect factuality is not required.
When no reliable KG covers your domain or building one is infeasible.
Failure Modes
Retriever returns irrelevant or outdated triples, causing confident but wrong answers.
KGs introduce bias or propagate incorrect facts from their sources.

