Overview
Scores reflect a literature survey: the paper synthesizes many published systems and datasets, so conclusions are broad but not backed by a unified experimental protocol.
Citations: 0
Evidence Strength: 0.70
Confidence: 0.80
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 0/5
Findings with evidence refs: 5/5
Results with explicit delta: 0/0
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 60%
Novelty: 40%
Why It Matters For Business
Combining KGs with LLMs reduces hallucinations and adds verifiable evidence for high-stakes QA, but it raises compute and maintenance costs, so teams must weigh accuracy and traceability against latency and budget.
Summary TLDR
This focused survey organizes and compares methods that combine large language models (LLMs) with knowledge graphs (KGs) to improve question answering (QA). It proposes a three-role taxonomy (KG as background knowledge, as reasoning guideline, and as refiner/validator), reviews representative systems (GraphRAG, KG-RAG, KG-Adapter, KG-Agent, etc.), summarizes benchmarks and metrics, and highlights three practical bottlenecks: costly graph retrieval, knowledge misalignment, and KG incompleteness. The paper ends with concrete optimization ideas (indexing, prompt tuning, cost-aware policies) and research directions for scaling, dynamic updates, and fairness-aware retrieval.
Problem Statement
LLM-based QA is strong on language but struggles with complex, multi-step, time-sensitive, or domain-specific questions due to limited reasoning, outdated parametric knowledge, and hallucinations. How can structured, factual KGs be combined with LLMs to reduce hallucination, improve multi-hop reasoning, and provide explainable evidence while remaining efficient and up-to-date?
Main Contribution
A structured taxonomy that classifies LLM+KG QA methods by QA type and the KG's role: background knowledge, reasoning guideline, refiner/validator, and hybrid.
A systematic survey and comparison of recent representative methods, grouped by the KG role and aligned to complex QA tasks (multi-doc, multi-modal, multi-hop, conversational, explainable, temporal).
Key Findings
Using KGs in three roles (background, guideline, refiner) is the dominant design pattern for combining KGs with LLMs in QA.
Graph-based RAG (GraphRAG / KG-RAG) retrieves structured subgraphs rather than raw text and improves reasoning and evidence grounding compared to text-only RAG.
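To make the contrast with text-only RAG concrete, the sketch below retrieves a small subgraph around the entities mentioned in a question and serializes it for a prompt. It is a minimal illustration, not the method of GraphRAG or KG-RAG: the toy triples, the function names (`build_index`, `retrieve_subgraph`, `serialize`), and the breadth-first expansion with a hop limit are all assumptions chosen for brevity.

```python
from collections import defaultdict, deque

# Toy KG as (subject, relation, object) triples; contents are illustrative only.
TRIPLES = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
    ("Marie Curie", "field", "Physics"),
    ("Poland", "continent", "Europe"),
]


def build_index(triples):
    """Map each entity to every triple it appears in, as subject or object."""
    index = defaultdict(list)
    for s, r, o in triples:
        index[s].append((s, r, o))
        index[o].append((s, r, o))
    return index


def retrieve_subgraph(index, seeds, max_hops=2):
    """Breadth-first expansion from the seed entities, at most max_hops edges out."""
    seen_entities = set(seeds)
    seen_triples = set()
    subgraph = []
    frontier = deque((entity, 0) for entity in seeds)
    while frontier:
        entity, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for triple in index.get(entity, []):
            if triple not in seen_triples:
                seen_triples.add(triple)
                subgraph.append(triple)
            s, _, o = triple
            for neighbor in (s, o):
                if neighbor not in seen_entities:
                    seen_entities.add(neighbor)
                    frontier.append((neighbor, depth + 1))
    return subgraph


def serialize(subgraph):
    """Render triples as one line each, ready to paste into an LLM prompt."""
    return "\n".join(f"({s}) -[{r}]-> ({o})" for s, r, o in subgraph)


if __name__ == "__main__":
    index = build_index(TRIPLES)
    print(serialize(retrieve_subgraph(index, ["Marie Curie"], max_hops=2)))
```

The hop limit is the main cost lever here: each extra hop can multiply the number of triples returned, which is one form of the graph retrieval cost the survey flags as a bottleneck.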
What To Try In 7 Days
Prototype KG-augmented retrieval: add a subgraph retrieval step to your RAG pipeline and compare answer correctness on 50 domain questions.
Run a simple KG-based validator: re-check LLM answers against a KG and measure how many answers change or get flagged.
Measure retrieval quality: compute retrieval relevance (MRR/NDCG) and downstream answer quality (accuracy/EM) with and without KG input.
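The metrics named above can be computed with a few lines of plain Python. The sketch below assumes each question comes with a ranked list of retrieved item ids, a set (or graded dict) of relevant ids, and one or more gold answers; the function names and the answer normalization (lowercasing, stripping punctuation and English articles) are illustrative choices, not something the survey prescribes.

```python
import math
import re
import string


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, item_id in enumerate(ranked_ids, start=1):
        if item_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per question."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked_ids, gains, k=10):
    """gains: dict of item id -> graded relevance; ids not listed count as 0."""
    dcg = sum(
        gains.get(item_id, 0) / math.log2(i + 2)
        for i, item_id in enumerate(ranked_ids[:k])
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """1.0 if the normalized prediction equals any normalized gold answer."""
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))
```

Running the same question set through the pipeline twice, once with and once without the KG retrieval step, and comparing the averaged scores gives the with/without comparison described above.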
Risks & Boundaries
Limitations
May miss very recent papers due to rapid publication pace (authors note this).
Survey emphasizes taxonomy and qualitative alignment; it underemphasizes head-to-head quantitative comparisons.
When Not To Use
When no reliable KG exists for your domain or KG coverage is very sparse.
When ultra-low latency and very high throughput matter and you cannot afford KG traversal costs.
Failure Modes
Knowledge conflicts between KG facts and LLM parametric facts can cause inconsistent answers.
Outdated or incomplete KGs lead to false negatives in validation and wrongful filtering of correct model outputs.
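One way to limit the wrongful filtering described above is to have the validator return three outcomes instead of two, so that a fact missing from the KG is reported as unknown rather than treated as wrong. The sketch below is a hedged illustration under stated assumptions: the KG is a dict from `(subject, relation)` to a set of objects, and `functional_relations` is a hypothetical, hand-maintained set of relations expected to have a single value. Neither structure comes from the survey.

```python
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "supported"        # the KG contains the claimed fact
    CONTRADICTED = "contradicted"  # the KG holds a different value for a single-valued relation
    UNKNOWN = "unknown"            # the KG cannot confirm or refute the claim


def validate_claim(kg, functional_relations, subject, relation, claimed_object):
    """Check one (subject, relation, object) claim extracted from an LLM answer.

    kg: dict mapping (subject, relation) -> set of known objects (assumed layout).
    functional_relations: relations assumed to have exactly one correct object.
    """
    known = kg.get((subject, relation), set())
    if claimed_object in known:
        return Verdict.SUPPORTED
    # Only call it a contradiction when the relation is single-valued and the
    # KG already holds some other value; otherwise the KG may just be incomplete.
    if known and relation in functional_relations:
        return Verdict.CONTRADICTED
    return Verdict.UNKNOWN


if __name__ == "__main__":
    kg = {
        ("Marie Curie", "born_in"): {"Warsaw"},
        ("Marie Curie", "field"): {"Physics"},
    }
    functional = {"born_in"}
    print(validate_claim(kg, functional, "Marie Curie", "born_in", "Paris"))
    print(validate_claim(kg, functional, "Marie Curie", "field", "Chemistry"))
```

Only contradicted answers would be filtered or regenerated; unknown answers could be passed through with a lower-confidence flag. A contradicted verdict can still be wrong if the KG itself is outdated, so it is better treated as a signal for review than as ground truth.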

