Overview
RoG shows clear gains for KG-backed QA by forcing LLMs to plan with relation paths and reason over retrieved KG instances; it is practical when a high-quality KG and entity linking exist, but requires KG preprocessing and moderate GPU resources.
Citations: 38
Evidence Strength: 0.70
Confidence: 0.85
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 5/5
Reproducibility
Status: Code + data available
Open source: Yes
At A Glance
Cost impact: 50%
Production readiness: 60%
Novelty: 60%
Why It Matters For Business
RoG reduces hallucinations by grounding LLM reasoning in KG facts and provides traceable, human-readable reasoning paths. This improves accuracy and trust on KG-backed QA without retraining every LLM.
Summary TLDR
RoG is a planning–retrieval–reasoning method: it has the LLM produce relation-path plans grounded in a knowledge graph (KG), retrieves matching reasoning paths from the KG, and uses those paths as context for the final answer. It is trained with two instruction-tuning tasks (planning and retrieval-reasoning), is plug-and-play at inference (the planning module can be used with other LLMs), and reports state-of-the-art results on the KGQA benchmarks WebQSP and CWQ while producing human-readable, KG-grounded explanations. Code and weights are released.
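To make the retrieval step concrete, here is a minimal sketch of following a relation-path plan through a KG with a breadth-first expansion that is constrained to the planned relations. The toy triples, the entity and relation names, and the helper names `build_index` and `retrieve_paths` are all assumptions made for illustration; this is not the released RoG code.

```python
from collections import defaultdict

# Toy KG as (head, relation, tail) triples. All names here are invented.
TRIPLES = [
    ("Alice", "born_in", "Paris"),
    ("Paris", "capital_of", "France"),
    ("Alice", "works_for", "Acme"),
    ("Acme", "based_in", "Berlin"),
]

def build_index(triples):
    """Map (head, relation) -> list of tails for fast expansion."""
    index = defaultdict(list)
    for head, rel, tail in triples:
        index[(head, rel)].append(tail)
    return index

def retrieve_paths(index, start_entity, relation_path):
    """Expand level by level, following only the planned relations, in order."""
    frontier = [[start_entity]]  # each path alternates entity, relation, entity, ...
    for rel in relation_path:
        next_frontier = []
        for path in frontier:
            for tail in index.get((path[-1], rel), []):
                next_frontier.append(path + [rel, tail])
        frontier = next_frontier  # paths that cannot follow the plan are dropped
    return frontier

index = build_index(TRIPLES)
plan = ["born_in", "capital_of"]  # assumed output of the planning LLM
paths = retrieve_paths(index, "Alice", plan)
for path in paths:
    print(" -> ".join(path))  # Alice -> born_in -> Paris -> capital_of -> France
```

The retrieved paths would then be placed in the prompt of the reasoning LLM as context. A plan with no matching instance in the KG simply returns an empty list, which is one way missing KG facts surface at this stage.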
Problem Statement
LLMs hallucinate and lack up-to-date facts during multi-hop reasoning. Prior KG+LLM approaches either generate brittle logical queries or treat KGs as loose text stores and ignore KG structure. We need a way to make LLM reasoning faithful to KG facts and interpretable by exposing KG relation paths.
Main Contribution
Introduce RoG: a planning–retrieval–reasoning pipeline that uses relation paths as KG-grounded plans.
Two-task instruction tuning: (1) planning optimization to teach LLMs to output KG relation paths, (2) retrieval-reasoning optimization to make LLMs reason over retrieved KG paths.
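The two tuning tasks can be pictured as two kinds of instruction records built from the same question. The sketch below is an assumption about how such records might be laid out: the field names, the instruction wording, and the helper names `planning_example` and `reasoning_example` are hypothetical and are not taken from the paper or its repository.

```python
import json

def planning_example(question, relation_path):
    """Planning task: question in, KG relation path out."""
    return {
        "instruction": "Generate a relation path that helps answer the question.",
        "input": question,
        "output": " -> ".join(relation_path),
    }

def reasoning_example(question, reasoning_paths, answer):
    """Retrieval-reasoning task: question plus retrieved KG paths in, answer out."""
    context = "\n".join(" -> ".join(path) for path in reasoning_paths)
    return {
        "instruction": "Answer the question using the reasoning paths.",
        "input": f"Reasoning paths:\n{context}\nQuestion: {question}",
        "output": answer,
    }

question = "Which country was Alice born in?"  # invented example
plan_record = planning_example(question, ["born_in", "capital_of"])
reason_record = reasoning_example(
    question,
    [["Alice", "born_in", "Paris", "capital_of", "France"]],
    "France",
)
print(json.dumps(plan_record, indent=2))
print(json.dumps(reason_record, indent=2))
```

Keeping the two record types separate mirrors the plug-and-play claim: the planning records only require relation names, so the planner can be trained and reused independently of whichever LLM performs the final reasoning.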
Key Findings
RoG sets new best scores on standard KGQA benchmarks: Hits@1 of 85.7 on WebQSP and 62.6 on CWQ (Table 1).
Grounding LLM plans with KG relation paths markedly improves off-the-shelf LLMs.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Hits@1 | 85.7 | DECAF 82.1 (reported) | +4.4% rel | WebQSP | RoG Hits@1 85.7 vs DECAF 82.1 (Table 1) | Table 1 |
| Hits@1 | 62.6 | UniKGQA 51.2 | +22.3% rel | CWQ | RoG 62.6 vs UniKGQA 51.2 (Table 1) | Table 1 |
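The Delta column reports relative improvement over the baseline. As a quick check, the snippet below recomputes both the absolute gain in Hits@1 points and the relative gain from the scores in the table; the helper name `deltas` is ours.

```python
def deltas(score, baseline):
    """Return (absolute gain in points, relative gain in percent), rounded to 1 decimal."""
    absolute = score - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)

webqsp = deltas(85.7, 82.1)  # RoG vs DECAF
cwq = deltas(62.6, 51.2)     # RoG vs UniKGQA

print("WebQSP:", webqsp)  # WebQSP: (3.6, 4.4)
print("CWQ:", cwq)        # CWQ: (11.4, 22.3)
```

In absolute terms the gains are 3.6 points on WebQSP and 11.4 points on CWQ, which is often the easier figure to compare across papers.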
What To Try In 7 Days
Run the RoG planning module to generate relation-path plans over a small company KG, feed the retrieved paths to your production LLM, and compare answer accuracy with and without them.
Benchmark K=1..5 to find the retrieval size that balances latency and precision for your use case (paper uses K=3).
Fine-tune a LLaMA2-style model on your KG QA pairs with RoG’s planning + retrieval-reasoning tasks to create a faster transfer model for similar KGs.
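The K sweep suggested above can be organized as a small harness that records Hits@1 and mean latency per K. This is a sketch under stated assumptions: `answer_fn(question, k)` stands for your own pipeline and must return a ranked list of answers, and `fake_answer_fn` with its canned data exists only so the example runs end to end.

```python
import time

def sweep_k(answer_fn, dataset, k_values=range(1, 6)):
    """Return {k: (hits_at_1, mean_latency_seconds)} for each retrieval size k."""
    results = {}
    for k in k_values:
        hits, elapsed = 0, 0.0
        for question, gold_answers in dataset:
            start = time.perf_counter()
            ranked = answer_fn(question, k)  # ranked list of predicted answers
            elapsed += time.perf_counter() - start
            if ranked and ranked[0] in gold_answers:
                hits += 1
        results[k] = (hits / len(dataset), elapsed / len(dataset))
    return results

# Stand-in for a real plan -> retrieve -> reason pipeline (hypothetical).
def fake_answer_fn(question, k):
    canned = {"q1": ["a", "b"], "q2": ["x", "y", "z"]}
    return canned[question][:k]

# Each item pairs a question with its set of gold answers (invented data).
dataset = [("q1", {"a"}), ("q2", {"y"})]
results = sweep_k(fake_answer_fn, dataset)
for k, (hits_at_1, latency) in results.items():
    print(f"K={k}  Hits@1={hits_at_1:.2f}  mean latency={latency * 1000:.3f} ms")
```

With a real pipeline, plotting Hits@1 against latency per K makes the trade-off visible; the paper's K=3 is a reasonable starting point to compare against.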
Risks & Boundaries
Limitations
Requires a linked KG and correct entity linking; missing or wrong KG facts reduce effectiveness.
Retrieval cost grows with the number of plans; latency and noise increase for large K.
When Not To Use
You have no suitable KG or entity linking pipeline.
You need very low-latency responses and cannot afford constrained-BFS retrieval.
Failure Modes
Noisy or irrelevant retrieved paths increase false positives and lower precision.
The LLM may still generate incorrect plans if it is not sufficiently fine-tuned on the KG's relation names.
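One inexpensive guard against badly formed plans is to check every generated relation against the KG's relation vocabulary before retrieval. The sketch below is an assumed mitigation rather than part of RoG as described here; the function name `split_valid_plans` and the relation names are invented for illustration.

```python
def split_valid_plans(plans, known_relations):
    """Separate plans whose relations all exist in the KG from those that do not."""
    valid, rejected = [], []
    for plan in plans:
        unknown = [rel for rel in plan if rel not in known_relations]
        if unknown:
            rejected.append((plan, unknown))  # keep the offending names for logging
        else:
            valid.append(plan)
    return valid, rejected

known_relations = {"born_in", "capital_of", "works_for", "based_in"}
plans = [
    ["born_in", "capital_of"],
    ["birthplace", "capital_of"],  # "birthplace" is not a relation in this toy KG
]
valid, rejected = split_valid_plans(plans, known_relations)
print("valid:", valid)
print("rejected:", rejected)
```

Logging the rejected relation names over time shows which parts of the relation vocabulary the planner has not learned, which can guide further fine-tuning data. Note that this check only catches unknown relation names; a plan made of valid relations can still be irrelevant to the question.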

