Overview
Production Readiness
0.6
Novelty Score
0.6
Cost Impact Score
0.7
Citation Count
3
Why It Matters For Business
FiDeLiS improves factual QA without model retraining by combining KG retrieval and stepwise logic checks, raising answer accuracy and cutting runtime—useful where auditability and verifiable facts matter.
Summary TLDR
FiDeLiS is a training-free framework that grounds LLM answers in verifiable knowledge-graph (KG) reasoning paths. It has two parts: Path-RAG, which preselects a small set of KG entity/relation candidates using dense embeddings + graph connectivity; and DVBS, a beam search guided by LLM planning with stepwise deductive verification to stop when the question is provably answered. Across KGQA benchmarks (WebQSP, CWQ, CR-LT) FiDeLiS raises accuracy vs prior KG+LLM methods and cuts runtime vs agentic baselines, while producing shorter, more verifiable reasoning paths. Code is released.
Problem Statement
LLMs often hallucinate or produce invalid multi-step reasoning. Knowledge graphs can anchor reasoning but prior approaches either (a) retrieve imprecise facts or miss graph structure, or (b) treat LLMs as agents and incur high latency. The field needs a method that is both faithful (verifiable steps) and efficient (limited LLM calls and search). RoG's analysis shows only 67% of generated reasoning steps were valid, leaving room for improvement.
Main Contribution
FiDeLiS: a training-free pipeline that combines Path-RAG (retrieval) and Deductive-Verification Beam Search (DVBS) to ground LLM answers in KG paths.
Path-RAG narrows KG search by combining semantic similarity with one-hop structural connectivity to preselect high-quality candidates.
DVBS uses LLM planning plus local/global deductive checks to validate each step and stop when the question is deducible, improving factuality and interpretability.
Key Findings
FiDeLiS improves top-answer accuracy on WebQSP with strong LLMs.
Path-RAG meaningfully increases retrieval quality vs vanilla retrievers.
Deductive verification shortens reasoning to be closer to ground truth depth.
Beam search is essential for quality; removing it collapses accuracy.
FiDeLiS reduces runtime vs an agentic baseline by roughly 1.7×.
Results
WebQSP Hits@1
CWQ Hits@1
Accuracy
Runtime per question
Ablation: remove beam search
Who Should Care
What To Try In 7 Days
Run FiDeLiS end-to-end on a small KGQA subset to compare Hits@1 and runtime versus your current pipeline.
Swap your retriever for Path‑RAG (embed entities+relations and add one‑hop scoring) to reduce candidate load.
Add a simple deductive-check prompt to stop reasoning when the answer can be logically deduced.
Agent Features
Memory
- pre-embedded KG vectors in nearest-neighbor index
Planning
- LLM-generated planning steps (plan prompts to guide search)
Tool Use
- LLM as a decision agent for step selection
- KG nearest-neighbor retrieval index
Frameworks
- Path-RAG
- DVBS
- FiDeLiS
Is Agentic
true
Architectures
- beam search
- retrieval-augmented generation
Collaboration
- retriever + LLM decision loop
Optimization Features
Token Efficiency
- Reduced average token usage vs ToG (see Table 7)
System Optimization
- precompute embeddings for entities and relations
- use nearest-neighbor index for fast retrieval
Training Optimization
- training-free design (no finetuning required)
Inference Optimization
- Path‑RAG candidate pruning
- deductive verification that enables early stopping
- beam pruning to keep top-K paths
Reproducibility
Code Urls
Data Urls
- Freebase subgraphs (used for WebQSP/CWQ, as in RoG)
- Wikidata (used for CR-LT-KGQA)
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Performance depends on KG quality and coverage; missing or outdated KG facts can still cause failures.
- Beam search still requires multiple LLM calls; latency remains non-trivial for deep multi-hop cases.
- Deductive verification relies on LLM judgment prompts and can inherit LLM bias or mistakes.
- Tuned hyperparameters (α, beam width/depth) need dataset-specific calibration.
When Not To Use
- When no structured KG is available or building one is infeasible.
- For ultra-low-latency real-time services where tens of seconds per query are unacceptable.
- For open-ended free-text answers that cannot be expressed as KG path derivations.
Failure Modes
- Incorrect deductive-verifier judgments lead to premature stopping or false positives.
- Path-RAG may still miss correct candidates if KG lacks relation labels or entity surface forms.
- Beam search width/depth misconfiguration either wastes resources or misses paths.
Core Entities
Models
- gpt-4-turbo
- gpt-3.5-turbo
- gpt-4o
- gpt-4o-mini
- llama-2-13b
- mixtral-7b
- mistral-7b
Metrics
- Hits@1
- F1
- Accuracy
- Runtime (s/question)
- Token usage
- Coverage Ratio (CR)
- Validity Ratio (VR)
Datasets
- WebQSP
- Complex WebQuestions (CWQ)
- CR-LT-KGQA
- MedQA-USMIE (subset for robustness)
Benchmarks
- KGQA (WebQSP, CWQ, CR-LT-KGQA)
Context Entities
Models
- OpenAI text-embedding-3-small
- SentenceBERT
- E5
- BM25
Datasets
- Freebase (subgraphs)
- Wikidata (subgraphs)

