FiDeLiS: narrow retrieval + deductive beam search to ground LLM answers in KG paths

May 22, 20248 min

Overview

Decision SnapshotReady For Pilot

Practical for KG-backed QA: it increases answer fidelity and lowers runtime vs agentic baselines, but still relies on many LLM calls and KG coverage; expect moderate engineering effort to index KGs and tune beam parameters.

Citations3

Evidence Strength0.80

Confidence0.90

Risk Signals10

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 5/5

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 70%

Production readiness: 60%

Novelty: 60%

Authors

Yuan Sui, Yufei He, Nian Liu, Xiaoxin He, Kun Wang, Bryan Hooi

Links

Abstract / PDF / Code / Data

Why It Matters For Business

FiDeLiS improves factual QA without model retraining by combining KG retrieval and stepwise logic checks, raising answer accuracy and cutting runtime—useful where auditability and verifiable facts matter.

Who Should Care

Summary TLDR

FiDeLiS is a training-free framework that grounds LLM answers in verifiable knowledge-graph (KG) reasoning paths. It has two parts: Path-RAG, which preselects a small set of KG entity/relation candidates using dense embeddings + graph connectivity; and DVBS, a beam search guided by LLM planning with stepwise deductive verification to stop when the question is provably answered. Across KGQA benchmarks (WebQSP, CWQ, CR-LT) FiDeLiS raises accuracy vs prior KG+LLM methods and cuts runtime vs agentic baselines, while producing shorter, more verifiable reasoning paths. Code is released.

Problem Statement

LLMs often hallucinate or produce invalid multi-step reasoning. Knowledge graphs can anchor reasoning but prior approaches either (a) retrieve imprecise facts or miss graph structure, or (b) treat LLMs as agents and incur high latency. The field needs a method that is both faithful (verifiable steps) and efficient (limited LLM calls and search). RoG's analysis shows only 67% of generated reasoning steps were valid, leaving room for improvement.

Main Contribution

FiDeLiS: a training-free pipeline that combines Path-RAG (retrieval) and Deductive-Verification Beam Search (DVBS) to ground LLM answers in KG paths.

Path-RAG narrows KG search by combining semantic similarity with one-hop structural connectivity to preselect high-quality candidates.

Key Findings

FiDeLiS improves top-answer accuracy on WebQSP with strong LLMs.

NumbersWebQSP Hits@1 84.39% (FiDeLiS, GPT‑4‑turbo) vs 81.84% (ToG)

Practical UseIf you run KG-backed QA with a powerful LLM, plug in FiDeLiS to gain a few points of Hits@1 without model finetuning.

Evidence RefTable 1

Path-RAG meaningfully increases retrieval quality vs vanilla retrievers.

NumbersWebQSP Hits@1 79.32% (Path‑RAG) vs 72.35% (vanilla retriever, same embeddings)

Practical UsePre-filter KG candidates by combining embeddings and graph neighbors to boost answer recall and reduce search work.

Evidence RefTable 3

Results

MetricValueBaselineDeltaSplit / DatasetEvidenceEvidence Ref
WebQSP Hits@184.39% (FiDeLiS, GPT‑4‑turbo)81.84% (ToG, GPT‑4‑turbo)+2.55ppWebQSPTable 1: FiDeLiS vs ToG on WebQSPTable 1
CWQ Hits@171.47% (FiDeLiS, GPT‑4‑turbo)68.51% (ToG, GPT‑4‑turbo)+2.96ppCWQTable 1: FiDeLiS vs ToG on CWQTable 1

What To Try In 7 Days

Run FiDeLiS end-to-end on a small KGQA subset to compare Hits@1 and runtime versus your current pipeline.

Swap your retriever for Path‑RAG (embed entities+relations and add one‑hop scoring) to reduce candidate load.

Add a simple deductive-check prompt to stop reasoning when the answer can be logically deduced.

Agent Features

Memory
pre-embedded KG vectors in nearest-neighbor index
Planning
LLM-generated planning steps (plan prompts to guide search)
Tool Use
LLM as a decision agent for step selectionKG nearest-neighbor retrieval index
Frameworks
Path-RAGDVBSFiDeLiS
Is Agentic

Yes

Architectures
beam searchretrieval-augmented generation
Collaboration
retriever + LLM decision loop

Optimization Features

Token Efficiency
Reduced average token usage vs ToG (see Table 7)
System Optimization
precompute embeddings for entities and relationsuse nearest-neighbor index for fast retrieval
Training Optimization
training-free design (no finetuning required)
Inference Optimization
Path‑RAG candidate pruningdeductive verification that enables early stoppingbeam pruning to keep top-K paths

Reproducibility

Code AvailableYes
Data AvailableYes
Open Source StatusPartial
LicenseUnknown

Data URLs

Freebase subgraphs (used for WebQSP/CWQ, as in RoG)Wikidata (used for CR-LT-KGQA)

Risks & Boundaries

Limitations

Performance depends on KG quality and coverage; missing or outdated KG facts can still cause failures.

Beam search still requires multiple LLM calls; latency remains non-trivial for deep multi-hop cases.

When Not To Use

When no structured KG is available or building one is infeasible.

For ultra-low-latency real-time services where tens of seconds per query are unacceptable.

Failure Modes

Incorrect deductive-verifier judgments lead to premature stopping or false positives.

Path-RAG may still miss correct candidates if KG lacks relation labels or entity surface forms.

Core Entities

Models

gpt-4-turbogpt-3.5-turbogpt-4ogpt-4o-minillama-2-13bmixtral-7bmistral-7b

Metrics

Hits@1F1AccuracyRuntime (s/question)Token usageCoverage Ratio (CR)Validity Ratio (VR)

Datasets

WebQSPComplex WebQuestions (CWQ)CR-LT-KGQAMedQA-USMIE (subset for robustness)

Benchmarks

KGQA (WebQSP, CWQ, CR-LT-KGQA)

Context Entities

Models

OpenAI text-embedding-3-smallSentenceBERTE5BM25

Datasets

Freebase (subgraphs)Wikidata (subgraphs)