Overview
The paper treats retrieval as an explicit sequential decision problem and shows that prompt-level policy tuning improves retrieval coordination and stopping without changing LLM weights.
Citations: 0
Evidence Strength: 0.80
Confidence: 0.78
Risk Signals: 11
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 4/4
Reproducibility
Status: Partial assets available
Open source: No
At A Glance
Cost impact: 50%
Production readiness: 60%
Novelty: 65%
Why It Matters For Business
WKGFC lets products make more accurate, explainable fact checks by combining compact KG evidence with targeted web search and lightweight policy tuning, improving verification accuracy without retraining large LLMs.
Summary TLDR
This paper presents WKGFC, an agentic fact-checking system that treats evidence gathering as a sequential decision problem. It first retrieves a compact subgraph from an open KG (Wikidata) using an LLM-guided expand-and-prune beam search. When the KG evidence is insufficient, it triggers coarse-to-fine web retrieval and converts the retrieved text into KG triplets. The agent improves its retrieval behavior by storing critiques of past episodes and optimizing its prompt with TextGrad, while the base LLM stays frozen. On mixed benchmarks (Wikipedia, web, and gold-evidence), WKGFC reports an overall balanced accuracy of 74.3%, a 5.4-point improvement over the best baseline (FIRE, 68.9%).
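The expand-and-prune retrieval described above can be sketched on a toy graph. This is a minimal illustration, not the paper's implementation: `TOY_KG`, `overlap_score`, and `expand_and_prune` are hypothetical names, the in-memory dictionary stands in for Wikidata, and a token-overlap score stands in for the LLM relevance judgment that guides pruning.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy graph standing in for Wikidata.
TOY_KG: Dict[str, List[Triple]] = {
    "Marie Curie": [
        ("Marie Curie", "award received", "Nobel Prize in Physics"),
        ("Marie Curie", "place of birth", "Warsaw"),
        ("Marie Curie", "spouse", "Pierre Curie"),
    ],
    "Pierre Curie": [
        ("Pierre Curie", "award received", "Nobel Prize in Physics"),
        ("Pierre Curie", "place of birth", "Paris"),
    ],
    "Warsaw": [("Warsaw", "country", "Poland")],
}


def overlap_score(claim: str, triple: Triple) -> float:
    """Assumed stand-in for the LLM judgment: token overlap with the claim."""
    claim_tokens = set(claim.lower().split())
    triple_tokens = set(" ".join(triple).lower().split())
    return len(claim_tokens & triple_tokens) / max(len(triple_tokens), 1)


def expand_and_prune(
    claim: str,
    seeds: Sequence[str],
    kg: Dict[str, List[Triple]],
    score: Callable[[str, Triple], float] = overlap_score,
    beam_width: int = 2,
    max_hops: int = 2,
) -> List[Triple]:
    """Grow a compact evidence subgraph outward from the seed entities."""
    frontier = list(seeds)
    visited = set(seeds)
    kept: List[Triple] = []
    for _ in range(max_hops):
        # Expand: gather every outgoing triple of the current frontier.
        candidates = [t for e in frontier for t in kg.get(e, []) if t not in kept]
        if not candidates:
            break
        # Prune: keep only the beam_width highest-scoring triples.
        beam = sorted(candidates, key=lambda t: score(claim, t), reverse=True)[:beam_width]
        kept.extend(beam)
        # The objects of surviving triples become the next frontier.
        frontier = []
        for _, _, obj in beam:
            if obj not in visited:
                visited.add(obj)
                frontier.append(obj)
    return kept


subgraph = expand_and_prune(
    "The spouse of Marie Curie was born in Paris", ["Marie Curie"], TOY_KG
)
```

With a beam width of 2 and two hops, the sketch keeps at most four triples, which is the sense in which the retrieved subgraph stays compact. In a real system the scoring function would be replaced by an LLM call and the dictionary lookup by a SPARQL query.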
Problem Statement
Text-only retrieval often misses multi-hop factual links and returns passages that are semantically similar to the claim but not factually relevant. Pure KG methods give precise relations but lack coverage for open-world claims. Existing systems also lack an adaptive procedure for deciding when to expand the KG and when to search the web.
Main Contribution
Formulates fact-checking as a POMDP in which an agent adaptively chooses among KG expansion, web search, and issuing a verdict.
Introduces a KG-first expand-and-prune retrieval pipeline: seed entities extracted from the claim, SPARQL expansion, and LLM-guided pruning via beam search.
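The three-way choice among KG expansion, web search, and verdict can be pictured as a bounded decision loop. The sketch below is an assumption-laden illustration rather than the paper's agent: `Action`, `AgentState`, `rule_policy`, and `run_episode` are hypothetical names, the hand-written rule stands in for the prompted LLM policy, and the two tools are plain callables.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Tuple


class Action(Enum):
    EXPAND_KG = "expand_kg"
    SEARCH_WEB = "search_web"
    VERDICT = "verdict"


@dataclass
class AgentState:
    claim: str
    evidence: List[str] = field(default_factory=list)
    trace: List[Action] = field(default_factory=list)
    kg_exhausted: bool = False


def rule_policy(state: AgentState) -> Action:
    """Hypothetical stand-in for the prompted LLM policy."""
    if len(state.evidence) >= 2:
        return Action.VERDICT
    return Action.SEARCH_WEB if state.kg_exhausted else Action.EXPAND_KG


def run_episode(
    claim: str,
    kg_tool: Callable[[str], List[str]],
    web_tool: Callable[[str], List[str]],
    policy: Callable[[AgentState], Action] = rule_policy,
    max_steps: int = 5,
) -> Tuple[AgentState, str]:
    """Run one episode; return the final state and how the episode ended."""
    state = AgentState(claim)
    for _ in range(max_steps):
        action = policy(state)
        state.trace.append(action)
        if action is Action.VERDICT:
            return state, "decided"
        tool = kg_tool if action is Action.EXPAND_KG else web_tool
        # Only evidence not seen before counts as progress.
        new_items = [e for e in tool(claim) if e not in state.evidence]
        if action is Action.EXPAND_KG and not new_items:
            state.kg_exhausted = True  # the KG has nothing more to offer
        state.evidence.extend(new_items)
    # Step budget spent without a verdict: the agent has to guess.
    return state, "forced_guess"


state, outcome = run_episode(
    "example claim",
    kg_tool=lambda claim: ["(A, relation, B)"],
    web_tool=lambda claim: ["web snippet about A"],
)
```

The step budget makes the stopping decision explicit: an episode ends either because the policy chose a verdict or because `max_steps` ran out, and the second case is recorded separately so it can be inspected later.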
Key Findings
WKGFC improves overall balanced accuracy to 74.3%, versus 68.9% for the strongest baseline (FIRE), a gain of 5.4 points.
Strong gains on Wikipedia single-hop verification.
Results
| Metric | Value | Baseline | Delta (points) | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Accuracy | WKGFC 74.3% | FIRE 68.9% | +5.4 | Overall (Table 1) | Table 1; Section 5.4 | Table 1 |
| Accuracy | WKGFC 91.9% | FIRE 90.6% | +1.3 | FEVER (Wikipedia-sourced) | Table 1; Section 5.4 | Table 1 |
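The deltas in the table are simple percentage-point differences, and they can be rechecked in a few lines. The sketch below also shows balanced accuracy under its standard definition (the mean of per-class recall). This summary does not say how the paper computes its metric, so that definition is an assumption here, and the label lists are made-up illustration data, not results from the paper.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall (standard definition, assumed here)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def delta_points(system_pct: float, baseline_pct: float) -> float:
    """Difference in percentage points, rounded to one decimal place."""
    return round(system_pct - baseline_pct, 1)


# Deltas recomputed from the Value and Baseline columns of the table.
overall_delta = delta_points(74.3, 68.9)
fever_delta = delta_points(91.9, 90.6)

# Made-up labels showing how balanced accuracy differs from plain accuracy.
truth = ["supported", "supported", "supported", "supported", "refuted"]
preds = ["supported", "supported", "supported", "refuted", "refuted"]
toy_balanced = balanced_accuracy(truth, preds)
```

On the made-up labels, plain accuracy is 4/5 while balanced accuracy averages a recall of 3/4 on one class and 1/1 on the other, which is why the two metrics should not be compared directly across systems.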
What To Try In 7 Days
Prototype a KG-first retrieval step using Wikidata SPARQL and spaCy entity linking on a small claim set.
Add a coarse BM25 web fetch plus an LLM filter to accept/reject passages.
Store decision traces and manually refine the prompt that decides when to expand or stop retrieval before automating prompt tuning.
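The three steps above can be prototyped without any external service. The following is a rough sketch under stated assumptions: `one_hop_query`, `bm25_rank`, and `retrieve` are hypothetical helper names, entity linking is skipped (the Wikidata item ID is assumed to be known already), the query string is only built and never sent, the `accept` callable stands in for the LLM filter, and BM25 is the usual Okapi formula written out by hand.

```python
import json
import math
import re
from collections import Counter
from typing import Callable, List, Optional


def one_hop_query(qid: str, limit: int = 50) -> str:
    """Build a SPARQL string listing outgoing statements of one Wikidata item."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    return f"SELECT ?p ?o WHERE {{ wd:{qid} ?p ?o . }} LIMIT {limit}"


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[int]:
    """Return document indices sorted by Okapi BM25 score, best first."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    avg_len = sum(len(t) for t in tokenized) / max(n_docs, 1)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    def score(tokens: List[str]) -> float:
        counts = Counter(tokens)
        total = 0.0
        for term in tokenize(query):
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            total += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        return total

    scores = [score(tokens) for tokens in tokenized]
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


def retrieve(
    query: str,
    docs: List[str],
    accept: Callable[[str, str], bool],
    top_k: int = 2,
    trace: Optional[List[str]] = None,
) -> List[str]:
    """Coarse BM25 fetch, then a filter; every decision is logged as JSON."""
    kept = []
    for idx in bm25_rank(query, docs)[:top_k]:
        ok = accept(query, docs[idx])
        if trace is not None:
            trace.append(json.dumps({"doc": idx, "accepted": ok}))
        if ok:
            kept.append(docs[idx])
    return kept


passages = [
    "Marie Curie won the Nobel Prize in Physics in 1903.",
    "Warsaw is the capital of Poland.",
    "The Nobel Prize ceremony is held in Stockholm.",
]
decisions: List[str] = []
evidence = retrieve(
    "Marie Curie Nobel Prize",
    passages,
    accept=lambda q, d: "curie" in d.lower(),  # stand-in for the LLM filter
    trace=decisions,
)
```

Keeping the trace as one JSON record per decision makes it easy to read by hand at first, which fits the suggestion to refine the prompt manually before automating prompt tuning.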
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic: Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
System Optimization
Training Optimization
Inference Optimization
Risks & Boundaries
Limitations
Relies on KG coverage: missing facts in Wikidata cause many errors on non‑wiki claims.
Web retrieval can be noisy; web evidence is treated as lower‑precision expansions.
When Not To Use
When latency or API cost forbids multiple LLM calls and SPARQL queries.
When the domain has no suitable KG coverage and web sources are also sparse.
Failure Modes
Insufficient KG coverage: the agent falls back to web search, but the needed evidence is still missing.
Exceeding maximum steps: the agent exhausts its retrieval budget without reaching a confident verdict and is forced to guess.

