RoG: Ground LLM plans on knowledge‑graph relation paths for faithful, interpretable KGQA

October 2, 2023 | 8 min

Overview

Decision Snapshot: Needs Validation

RoG shows clear gains for KG-backed QA by forcing LLMs to plan with relation paths and reason over retrieved KG instances; it is practical when a high-quality KG and entity linking exist, but requires KG preprocessing and moderate GPU resources.

Citations: 38

Evidence Strength: 0.70

Confidence: 0.85

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 5/5

Reproducibility

Status: Code + data available

Open source: Yes

At A Glance

Cost impact: 50%

Production readiness: 60%

Novelty: 60%

Authors

Linhao Luo, Yuan-Fang Li, Gholamreza Haffari, Shirui Pan

Why It Matters For Business

RoG reduces hallucinations by grounding LLM reasoning in KG facts and provides traceable, human-readable paths. This improves accuracy and trust on KG-backed QA without retraining every LLM.

Summary TLDR

RoG is a planning–retrieval–reasoning method that forces LLMs to produce relation-path plans grounded in a knowledge graph (KG), retrieves matching reasoning paths from the KG, and then uses those paths as context for final answers. The method is trained by two instruction-tuning tasks (planning and retrieval-reasoning), is plug-and-play at inference (the planning module can be used with other LLMs), and yields state-of-the-art results on KGQA benchmarks (WebQSP and CWQ) while producing human-readable, KG-grounded explanations. Code and weights are released.
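
As a rough illustration of the last step (using retrieved paths as context), the sketch below renders reasoning paths as text and places them in a prompt. The function names, the prompt wording, and the toy entities are assumptions made for this example; they are not taken from the released RoG code.

```python
from typing import List, Tuple

# A triple is (head entity, relation, tail entity); a path is a chain of triples.
Triple = Tuple[str, str, str]


def path_to_text(path: List[Triple]) -> str:
    """Render one reasoning path as 'e0 -> r1 -> e1 -> r2 -> e2'."""
    if not path:
        return ""
    parts = [path[0][0]]
    for _, relation, tail in path:
        parts.extend([relation, tail])
    return " -> ".join(parts)


def build_reasoning_prompt(question: str, paths: List[List[Triple]]) -> str:
    """Assemble a prompt that lists the retrieved paths before the question.

    The instruction wording here is hypothetical.
    """
    lines = [
        "Answer the question using the reasoning paths below.",
        "Reasoning paths:",
    ]
    lines.extend(path_to_text(p) for p in paths)
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Toy data, invented for illustration only.
example_paths = [[("Alice", "marry_to", "Bob"), ("Bob", "father_of", "Charlie")]]
prompt = build_reasoning_prompt("Who is the child of Alice?", example_paths)
print(prompt)
```

Because the paths are plain text, the same prompt can be handed to any LLM, which is what makes the approach plug-and-play at inference.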

Problem Statement

LLMs hallucinate and lack up-to-date facts during multi-hop reasoning. Prior KG+LLM approaches either generate brittle logical queries or treat KGs as loose text stores and ignore KG structure. We need a way to make LLM reasoning faithful to KG facts and interpretable by exposing KG relation paths.

Main Contribution

Introduce RoG: a planning–retrieval–reasoning pipeline that uses relation paths as KG-grounded plans.

Two-task instruction tuning: (1) planning optimization to teach LLMs to output KG relation paths, (2) retrieval-reasoning optimization to make LLMs reason over retrieved KG paths.

Key Findings

RoG sets new best scores on standard KGQA benchmarks.

Numbers: WebQSP Hits@1 85.7; F1 70.8. CWQ Hits@1 62.6; F1 56.2.

Practical Use: If you have a KG-backed QA task, RoG gives higher answer accuracy and more interpretable outputs than prior KGQA/LLM baselines on evaluated datasets.

Evidence Ref: Table 1

Grounding LLM plans with KG relation paths markedly improves off-the-shelf LLMs.

Numbers: ChatGPT Hits@1 66.77 → +RoG 81.51; Flan‑T5 30.95 → +RoG 67.87.

Practical Use: You can boost existing LLMs quickly by supplying KG-derived reasoning paths as context, without full model retraining.

Evidence Ref: Table 3

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Hits@1 | 85.7 | DECAF 82.1 (reported) | +4.4% rel | WebQSP | RoG Hits@1 85.7 vs DECAF 82.1 (Table 1) | Table 1
Hits@1 | 62.6 | UniKGQA 51.2 | +22.3% rel | CWQ | RoG 62.6 vs UniKGQA 51.2 (Table 1) | Table 1
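
The deltas above are relative gains over the baseline, not absolute point differences. A short check of the arithmetic, using only the scores listed in the table (the helper name is made up for this example):

```python
def relative_gain(value: float, baseline: float) -> float:
    """Relative improvement of `value` over `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


webqsp = relative_gain(85.7, 82.1)  # RoG vs DECAF on WebQSP
cwq = relative_gain(62.6, 51.2)     # RoG vs UniKGQA on CWQ
print(f"WebQSP: +{webqsp:.1f}% rel, CWQ: +{cwq:.1f}% rel")
# prints "WebQSP: +4.4% rel, CWQ: +22.3% rel"
```

In absolute terms the same gaps are 3.6 and 11.4 Hits@1 points.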

What To Try In 7 Days

Run the RoG planning module to generate relation-path plans over a small in-house KG, then feed the retrieved paths to your production LLM and compare answer accuracy with and without them.

Benchmark K=1..5 to find the retrieval size that balances latency and precision for your use case (paper uses K=3).

Fine-tune a LLaMA2-style model on your KG QA pairs with RoG’s planning + retrieval-reasoning tasks to create a faster transfer model for similar KGs.
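
The K sweep suggested above could be organized along these lines. This is a minimal harness sketch: `answer_fn` stands in for whatever end-to-end pipeline you are testing, and both it and the stub below are hypothetical placeholders, not RoG APIs.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple


def sweep_k(
    answer_fn: Callable[[str, int], str],
    examples: Sequence[Tuple[str, List[str]]],
    ks: Sequence[int] = (1, 2, 3, 4, 5),
) -> Dict[int, Dict[str, float]]:
    """For each K, record the hit rate and mean wall-clock latency per question."""
    if not examples:
        raise ValueError("examples must not be empty")
    results: Dict[int, Dict[str, float]] = {}
    for k in ks:
        hits = 0
        start = time.perf_counter()
        for question, gold_answers in examples:
            prediction = answer_fn(question, k)
            hits += int(prediction in gold_answers)
        elapsed = time.perf_counter() - start
        results[k] = {
            "hit_rate": hits / len(examples),
            "mean_latency_s": elapsed / len(examples),
        }
    return results


def stub_answer(question: str, k: int) -> str:
    # Placeholder pipeline: pretends the useful path only appears once k >= 2.
    return "Charlie" if k >= 2 else "unknown"


examples = [("Who is the child of Alice?", ["Charlie"])]
report = sweep_k(stub_answer, examples)
for k, stats in report.items():
    print(k, stats["hit_rate"], f"{stats['mean_latency_s']:.6f}s")
```

Replace `stub_answer` with a call into your own planning, retrieval, and answering code, and use a held-out question set large enough that the hit-rate differences between K values are meaningful.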

Agent Features

Memory
uses knowledge graph as external factual memory
Planning
relation-path planning (KG-grounded plan generation)
Tool Use
constrained BFS KG retrieval; FiD fusion for multi-path reasoning
Frameworks
planning–retrieval–reasoning pipeline
Architectures
decoder-only LLM (LLaMA2-Chat-7B)
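
To make "constrained BFS KG retrieval" concrete, here is a minimal sketch of one way to do it: a breadth-first search from the question entity that, at each depth, follows only edges whose relation matches the planned relation path. The adjacency-list KG format, the function name, and the toy graph are assumptions for illustration and may differ from the released implementation.

```python
from collections import deque
from typing import Dict, List, Tuple

# Assumed KG layout: head entity -> list of (relation, tail entity) edges.
KG = Dict[str, List[Tuple[str, str]]]
Triple = Tuple[str, str, str]


def retrieve_paths(kg: KG, start: str, relation_path: List[str]) -> List[List[Triple]]:
    """Return every path from `start` whose relations match `relation_path` in order."""
    if not relation_path:
        return []
    results: List[List[Triple]] = []
    queue = deque([(start, [])])  # (current entity, triples walked so far)
    while queue:
        entity, path = queue.popleft()
        depth = len(path)
        if depth == len(relation_path):
            results.append(path)  # the whole plan has been matched
            continue
        wanted = relation_path[depth]
        for relation, tail in kg.get(entity, []):
            if relation == wanted:  # the constraint: ignore off-plan edges
                queue.append((tail, path + [(entity, relation, tail)]))
    return results


# Toy graph, invented for illustration only.
toy_kg: KG = {
    "Alice": [("marry_to", "Bob"), ("works_at", "Acme")],
    "Bob": [("father_of", "Charlie"), ("father_of", "Dana")],
}
found = retrieve_paths(toy_kg, "Alice", ["marry_to", "father_of"])
for p in found:
    print(p)
```

The constraint keeps the frontier small compared with unconstrained BFS, since off-plan edges such as `works_at` are never expanded; the cost still scales with the number of plans and with the fan-out of each planned relation.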

Optimization Features

Token Efficiency
not explicitly optimized
Infra Optimization
trained on 2x A100-80G GPUs for 38 hours (Freebase)
Model Optimization
instruction fine-tuning on planning and reasoning tasks
System Optimization
transfer fine-tuning to a new KG takes only 2 hours once base training is done
Training Optimization
joint training on two tasks (planning and retrieval-reasoning); batch size 4, lr 2e-5, cosine scheduler, 3 epochs
Inference Optimization
beam search to generate top-K relation paths (K=3); constrained BFS retrieval then FiD-style reasoning
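
For readers unfamiliar with the "cosine scheduler" setting, the sketch below shows the standard cosine decay curve starting from the listed peak learning rate of 2e-5. The absence of warmup, the decay to zero, and the step count are assumptions; the summary does not state them.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 2e-5) -> float:
    """Cosine decay from base_lr at step 0 to 0 at total_steps (no warmup assumed)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


total = 3000  # hypothetical number of optimizer steps across the 3 epochs
for step in (0, total // 2, total):
    print(step, f"{cosine_lr(step, total):.2e}")
```

The learning rate is halved at the midpoint of training and falls smoothly toward zero by the final step.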

Risks & Boundaries

Limitations

Requires a linked KG and correct entity linking; missing or wrong KG facts reduce effectiveness.

Retrieval cost grows with number of plans; latency and noise increase for large K.

When Not To Use

You have no suitable KG or entity linking pipeline.

You need very low-latency responses and cannot afford constrained-BFS retrieval.

Failure Modes

Noisy or irrelevant retrieved paths increase false positives and lower precision.

LLM may still generate incorrect plans if not sufficiently fine-tuned on relation names.

Core Entities

Models

RoG, LLaMA2-Chat-7B, ChatGPT, Flan-T5-xl, Alpaca-7B

Metrics

Hits@1, F1, Precision, Recall
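
A minimal sketch of how these metrics are commonly computed for a single question with a set of gold answers follows. The exact matching rules used in the paper's evaluation (for example, string normalization) are not given here, so treat exact string equality as an assumption.

```python
from typing import List, Set, Tuple


def hits_at_1(ranked_predictions: List[str], gold: Set[str]) -> float:
    """1.0 if the top-ranked prediction is a gold answer, else 0.0."""
    return float(bool(ranked_predictions) and ranked_predictions[0] in gold)


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 for one question."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# One correct answer out of two predictions, against three gold answers.
p, r, f1 = precision_recall_f1({"a", "b"}, {"a", "c", "d"})
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints "precision=0.50 recall=0.33 f1=0.40"
```

Dataset-level scores are then averages of these per-question values, which is why Hits@1 can be high while F1 stays lower on questions with many gold answers.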

Datasets

WebQSP, Complex WebQuestions (CWQ), MetaQA-3hop, Freebase, Wiki-Movies (subset for MetaQA)

Benchmarks

KGQA (WebQSP, CWQ)