An LLM agent that first pulls subgraphs from Wikidata, then triggers focused web search and prompt-based self-improvement to improve fact‑f​

February 27, 2026 · 8 min

Overview

Decision Snapshot: Needs Validation

The paper treats retrieval as an explicit sequential decision problem and shows that prompt-level policy tuning improves retrieval coordination and stopping without changing LLM weights.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.78

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 4/4

Reproducibility

Status: Partial assets available

Open source: No

At A Glance

Cost impact: 50%

Production readiness: 60%

Novelty: 65%

Authors

Shuzhi Gong, Richard O. Sinnott, Jianzhong Qi, Cecile Paris, Preslav Nakov, Zhuohan Xie

Why It Matters For Business

WKGFC lets products make more accurate, explainable fact checks by combining compact KG evidence with targeted web search and lightweight policy tuning, improving verification accuracy without retraining large LLMs.

Summary TLDR

This paper presents WKGFC, an agentic fact‑checking system that treats evidence gathering as a sequential decision problem. It first retrieves a compact subgraph from an open KG (Wikidata) using an LLM-guided expand‑and‑prune beam search. When needed, it triggers coarse‑to‑fine web retrieval and converts the retrieved passages into KG triplets. The agent improves its retrieval behavior by storing episode critiques and optimizing its prompt with TextGrad while the base LLM stays frozen. On mixed benchmarks (Wikipedia, web, and gold-evidence), WKGFC reports an overall balanced accuracy of 74.3%, about 5.4 points above the best baseline, FIRE (68.9%).
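The headline number is a balanced accuracy. This summary does not give the paper's exact computation, so the following is only a minimal sketch of the standard definition (the mean of per-class recall). The labels and toy predictions are invented for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)    # number of gold examples per class
    correct = defaultdict(int)  # correctly predicted examples per class
    for gold_label, pred_label in zip(y_true, y_pred):
        total[gold_label] += 1
        if gold_label == pred_label:
            correct[gold_label] += 1
    if not total:
        raise ValueError("no labels given")
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy labels, invented for illustration only.
gold = ["SUPPORTED", "SUPPORTED", "SUPPORTED", "REFUTED"]
pred = ["SUPPORTED", "SUPPORTED", "REFUTED", "REFUTED"]
score = balanced_accuracy(gold, pred)
```

Unlike plain accuracy, this weights each class equally, which matters when a benchmark has many more claims of one label than another.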

Problem Statement

Text-only retrieval often misses multi-hop factual links and returns passages that are semantically similar but not factually relevant. Pure KG methods give precise relations but lack coverage in open-world claims. Existing systems also lack an adaptive procedure to decide when to expand KG vs. search the web.

Main Contribution

Formulate fact-checking as a POMDP agent that adaptively chooses KG expansion, web search, or verdict.

A KG-first expand-and-prune retrieval pipeline: seed entities from claims, SPARQL expansion, LLM-guided pruning (beam search).
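The expand-and-prune idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `one_hop_query` builds SPARQL text for the public Wikidata Query Service (where the `wd:` prefix is predefined), `expand_and_prune` is a hypothetical helper, and a word-overlap function stands in for the LLM that scores candidate triples. The toy graph replaces live SPARQL calls so the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def one_hop_query(qid: str, limit: int = 50) -> str:
    """SPARQL text for the direct statements of one Wikidata item."""
    return (
        "SELECT ?p ?o WHERE { "
        f"wd:{qid} ?p ?o . "
        'FILTER(STRSTARTS(STR(?p), "http://www.wikidata.org/prop/direct/")) '
        f"}} LIMIT {limit}"
    )


def expand_and_prune(
    claim: str,
    seeds: List[str],
    neighbors: Callable[[str], List[Triple]],
    score: Callable[[str, Triple], float],
    beam_width: int = 2,
    max_hops: int = 2,
) -> List[Triple]:
    """Grow a subgraph hop by hop, keeping only the best triples per hop."""
    kept: List[Triple] = []
    frontier = list(seeds)
    visited = set(seeds)
    for _ in range(max_hops):
        # Expand: collect every outgoing triple of the current frontier.
        candidates = [t for entity in frontier for t in neighbors(entity)]
        candidates = [t for t in candidates if t not in kept]
        # Prune: keep only the beam_width highest-scoring triples this hop.
        candidates.sort(key=lambda t: score(claim, t), reverse=True)
        best = candidates[:beam_width]
        if not best:
            break
        kept.extend(best)
        frontier = [t[2] for t in best if t[2] not in visited]
        visited.update(frontier)
    return kept


# Toy in-memory graph standing in for SPARQL results (illustrative only).
TOY_KG: Dict[str, List[Triple]] = {
    "Marie Curie": [
        ("Marie Curie", "award received", "Nobel Prize in Physics"),
        ("Marie Curie", "place of birth", "Warsaw"),
        ("Marie Curie", "occupation", "physicist"),
    ],
    "Warsaw": [("Warsaw", "country", "Poland")],
}


def overlap_score(claim: str, triple: Triple) -> float:
    """Stand-in for the LLM scorer: shared lowercase words."""
    claim_words = set(claim.lower().split())
    triple_words = set(" ".join(triple).lower().split())
    return float(len(claim_words & triple_words))


subgraph = expand_and_prune(
    "Marie Curie was born in Warsaw in Poland",
    seeds=["Marie Curie"],
    neighbors=lambda entity: TOY_KG.get(entity, []),
    score=overlap_score,
)
```

The beam width bounds how many triples survive each hop, so the subgraph grows at most linearly with the number of hops rather than with the full fan-out of each entity.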

Key Findings

WKGFC improves overall balanced accuracy compared to strong baselines.

Numbers: Overall average: WKGFC 74.3% vs. FIRE 68.9% (+5.4 points)

Practical Use: Combining KG-first retrieval, targeted web search, and prompt-level policy tuning can measurably raise veracity prediction on mixed fact-checking benchmarks.

Evidence Ref: Table 1; Section 5.4

Best reported accuracy on Wikipedia single-hop verification.

Numbers: FEVER: WKGFC 91.9% vs. FIRE 90.6% (+1.3 points; best reported)

Practical Use: For Wikipedia-like claims, prioritizing KG subgraph retrieval gives precise evidence and high accuracy; implement KG-first as a low-latency step.

Evidence Ref: Table 1; Section 5.4

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Accuracy | WKGFC 74.3% | FIRE 68.9% | +5.4 | Overall (Table 1) | Table 1; Section 5.4 | Table 1
Accuracy | WKGFC 91.9% | FIRE 90.6% | +1.3 | FEVER (Wikipedia-sourced) | Table 1; Section 5.4 | Table 1
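As a quick consistency check, the reported deltas can be recomputed from the two accuracy columns. The snippet below only re-derives numbers already shown in the rows above; the variable names are illustrative.

```python
# (split, WKGFC accuracy %, FIRE accuracy %, reported delta in points)
rows = [
    ("Overall", 74.3, 68.9, 5.4),
    ("FEVER", 91.9, 90.6, 1.3),
]

# Round to one decimal to avoid floating-point noise in the subtraction.
deltas = {name: round(ours - base, 1) for name, ours, base, _ in rows}

# Any split whose recomputed delta disagrees with the reported one.
mismatches = [name for name, _, _, reported in rows if deltas[name] != reported]
```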

What To Try In 7 Days

Prototype a KG-first retrieval step using Wikidata SPARQL and spaCy entity linking on a small claim set.

Add a coarse BM25 web fetch plus an LLM filter to accept/reject passages.

Store decision traces and manually refine the prompt that decides when to expand or stop retrieval before automating prompt tuning.
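For the second step, a coarse BM25 pass followed by an accept/reject filter might look like the sketch below. It assumes a small in-memory list of passages. The BM25 scorer is a plain-Python version using the non-negative IDF variant, and `accept` is a hypothetical callable standing in for the LLM filter (here a keyword stub so the example runs without any API).

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def bm25_rank(
    query: str, docs: List[str], k1: float = 1.5, b: float = 0.75
) -> List[Tuple[float, str]]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scored = []
    for doc, toks in zip(docs, tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, doc))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def coarse_to_fine(
    query: str,
    docs: List[str],
    accept: Callable[[str, str], bool],
    top_k: int = 3,
) -> List[str]:
    """Coarse: BM25 top_k. Fine: keep passages the filter accepts."""
    ranked = bm25_rank(query, docs)[:top_k]
    return [doc for score, doc in ranked if score > 0 and accept(query, doc)]


# Toy passages, invented for illustration only.
docs = [
    "Warsaw is the capital of Poland.",
    "Marie Curie was born in Warsaw in 1867.",
    "The Nobel Prize in Physics is awarded annually.",
]


def keyword_filter(query: str, passage: str) -> bool:
    """Stub for the LLM accept/reject call."""
    return "born" in passage.lower()


passages = coarse_to_fine("Where was Marie Curie born", docs, keyword_filter)
```

Keeping the filter as a plain callable makes it easy to swap the stub for a real LLM call later and to log each accept/reject decision alongside the other traces.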

Agent Features

Memory
experience buffer of trajectories and structured self-critiques; prompt as policy parameter (trainable)
Planning
sequential retrieval actions (initKGRetrieval, expandKG, webSearch, verdict); beam search expansion with LLM pruning
Tool Use
initKGRetrieval(claim); expandKG(claim, currentKG, topicEntities); webSearch(query, currentKG); verdict(claim, G, K_web)
Frameworks
LLM-enabled retrieval control; prompt-level optimization (TextGrad)
Is Agentic

Yes

Architectures
POMDP / LLM agent; expand-and-prune beam search over KG
Collaboration
single agent decision loop (paper notes future multi-agent extension)
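The single-agent decision loop over the four listed actions can be sketched as below. Only the action names come from this summary. The `State` fields, the rule-based `rule_policy` (a stand-in for the prompted LLM), the stub tools, and the step budget are assumptions made for illustration. Each decision is appended to a trace so that stopping behavior can be inspected afterwards.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

ACTIONS = ("initKGRetrieval", "expandKG", "webSearch", "verdict")


@dataclass
class State:
    claim: str
    kg: List[tuple] = field(default_factory=list)    # retrieved triples
    web: List[str] = field(default_factory=list)     # accepted web passages
    trace: List[Dict] = field(default_factory=list)  # one entry per decision


def run_agent(
    claim: str,
    policy: Callable[[State], str],
    tools: Dict[str, Callable[[State], object]],
    max_steps: int = 6,
) -> Tuple[object, State]:
    state = State(claim)
    for step in range(max_steps):
        action = policy(state)
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        state.trace.append({"step": step, "action": action,
                            "kg_size": len(state.kg), "web_size": len(state.web)})
        if action == "verdict":
            return tools["verdict"](state), state
        tools[action](state)
    # Step budget exhausted: force a verdict from whatever evidence exists.
    state.trace.append({"step": max_steps, "action": "forced_verdict",
                        "kg_size": len(state.kg), "web_size": len(state.web)})
    return tools["verdict"](state), state


def rule_policy(state: State) -> str:
    """Hand-written stand-in for the LLM policy."""
    if not state.trace:
        return "initKGRetrieval"
    if len(state.kg) < 2:
        return "expandKG"
    if not state.web:
        return "webSearch"
    return "verdict"


# Stub tools with canned evidence (illustrative only).
TOOLS: Dict[str, Callable[[State], object]] = {
    "initKGRetrieval": lambda s: s.kg.append(("Marie Curie", "place of birth", "Warsaw")),
    "expandKG": lambda s: s.kg.append(("Warsaw", "country", "Poland")),
    "webSearch": lambda s: s.web.append("Marie Curie was born in Warsaw."),
    "verdict": lambda s: "SUPPORTED" if s.kg else "NOT ENOUGH INFO",
}

label, final_state = run_agent("Marie Curie was born in Warsaw.", rule_policy, TOOLS)
trace_json = json.dumps(final_state.trace)  # ready to store as a decision trace
```

The forced verdict at the end of the loop corresponds to the situation where the agent runs out of steps without a confident stopping decision.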

Optimization Features

Token Efficiency
pruning and stopping policy reduce extra LLM calls compared to blind multi-round retrieval
System Optimization
expand-and-prune KG traversal to control graph growth
Training Optimization
prompt-level optimization using TextGrad over an experience buffer (no LLM weight updates)
Inference Optimization
adaptive stopping decision reduces unnecessary retrieval calls; beam pruning limits KG size
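Prompt-level tuning over an experience buffer can be illustrated with a simple accept-if-better loop. This sketch deliberately does not use the TextGrad library or its API, and the paper's actual procedure may differ. `Episode`, `tune_prompt`, `revise`, and `evaluate` are hypothetical names, with `revise` standing in for an LLM that rewrites the prompt from stored critiques and `evaluate` standing in for a validation run.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Episode:
    claim: str
    correct: bool
    critique: str  # structured self-critique, stored as text


def tune_prompt(
    prompt: str,
    buffer: List[Episode],
    revise: Callable[[str, List[str]], str],
    evaluate: Callable[[str], float],
    rounds: int = 3,
) -> Tuple[str, float]:
    """Keep a revised prompt only if it scores higher than the current best."""
    best, best_score = prompt, evaluate(prompt)
    for _ in range(rounds):
        # Use critiques from failed episodes as the revision signal.
        critiques = [e.critique for e in buffer if not e.correct]
        if not critiques:
            break
        candidate = revise(best, critiques)
        candidate_score = evaluate(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-ins so the loop runs without any model (illustrative only).
def append_rule(prompt: str, critiques: List[str]) -> str:
    rule = critiques[0]
    return prompt if rule in prompt else prompt + "\n- " + rule


def count_rules(prompt: str) -> float:
    return float(prompt.count("\n- "))


buffer = [
    Episode("claim 1", False, "Stop expanding once two supporting triples agree."),
    Episode("claim 2", True, ""),
]
tuned, tuned_score = tune_prompt(
    "Decide the next retrieval action.", buffer, append_rule, count_rules
)
```

Because only the prompt text changes, the base model weights stay untouched, which is the property the summary highlights.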

Reproducibility

Code Available: No
Data Available: Yes
Open Source Status: No
License: Unknown

Data URLs

FEVER, HOVER, LIAR-New, AveriTeC, SummEval, AggreFact-CNN, PubHealth

Risks & Boundaries

Limitations

Relies on KG coverage: missing facts in Wikidata cause many errors on non‑wiki claims.

Web retrieval can be noisy; web evidence is treated as lower‑precision expansions.

When Not To Use

When latency or API cost forbids multiple LLM calls and SPARQL queries.

When the domain has no suitable KG coverage and web sources are also sparse.

Failure Modes

Insufficient KG coverage: agent must trigger web search but evidence still missing.

Exceeding maximum steps: agent exhausts retrieval without confidence and is forced to guess.

Core Entities

Models

WKGFC (Ours), GPT-4, GPT-4o, Claude 3.5-Sonnet, Gemini-2.5-flash, DeepSeek-V3 67B, Llama3 8B, Llama3.3 70B, Qwen2.5 7B, Qwen2.5 72B, HerO, FIRE, GraphRAG, GraphCheck

Metrics

Accuracy, error rate, neg rate

Datasets

FEVER, HOVER, LIAR-New, AveriTeC, SummEval, AggreFact-CNN, PubHealth

Benchmarks

Accuracy