Overview
The approach is practical and reproducible: it combines standard retrieval tools with LLM zero-shot scoring and adds a useful confidence signal, but it needs calibration and better time-aware retrieval before high-stakes use.
Citations: 0
Evidence Strength: 0.60
Confidence: 0.85
Risk Signals: 11
Trust Signals
Findings with numeric evidence: 3/3
Findings with evidence refs: 3/3
Results with explicit delta: 3/3
Reproducibility
Status: Code + data available
Open source: Yes
At A Glance
Cost impact: 40%
Production readiness: 50%
Novelty: 50%
Why It Matters For Business
Aggregating evidence from multiple sources and retrieving negated queries expands coverage and surfaces disagreements, improving zero-shot claim checks and making automated decisions more transparent.
Who Should Care
Summary TLDR
This paper builds an open-domain claim verification pipeline that (1) generates a claim's explicit negation, (2) retrieves sentence-level evidence for both forms from Wikipedia, PubMed, and Google, (3) deduplicates and merges per-source sentences into a single evidence set, and (4) asks zero-shot LLMs to verify the claim. Negated retrieval and multi-source aggregation give consistent zero-shot gains (typical +2–10% accuracy, +2–8% macro F1 on evaluated datasets). The system also reports per-source label log-probabilities so users can see when sources disagree. Code is available.
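Steps (1) and (2) above can be sketched roughly as follows. This is a minimal illustration, not the paper's code: `gather_evidence`, `toy_negate`, and the retriever callables are hypothetical names, and the template negation is only a stand-in for whatever negation method the authors actually use.

```python
from typing import Callable, Dict, List

# Hypothetical type: a retriever maps a query string to evidence sentences.
Retriever = Callable[[str], List[str]]


def toy_negate(claim: str) -> str:
    """Assumed stand-in for the negation step: a fixed template."""
    return "It is not the case that " + claim[0].lower() + claim[1:]


def gather_evidence(
    claim: str,
    negate: Callable[[str], str],
    retrievers: Dict[str, Retriever],
) -> Dict[str, List[str]]:
    """Query every source with both the claim and its negation."""
    queries = [claim, negate(claim)]
    evidence: Dict[str, List[str]] = {}
    for source, retrieve in retrievers.items():
        sentences: List[str] = []
        for query in queries:
            sentences.extend(retrieve(query))
        evidence[source] = sentences
    return evidence


# Toy retriever that just echoes the query, to show the data flow.
demo = gather_evidence(
    "Aspirin reduces fever.",
    toy_negate,
    {"wikipedia": lambda q: ["wiki hit for: " + q]},
)
print(demo)
```

Passing the negation function and the retrievers in as arguments keeps the sketch independent of any particular search API or LLM.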
Problem Statement
Most automated fact-checkers rely on a single knowledge source and only retrieve evidence that supports the claim. That narrows coverage and hides disagreements between sources. We need a practical method that finds both supporting and contradicting evidence across multiple sources and shows when sources disagree.
Main Contribution
A dual-perspective retrieval pipeline that generates a claim's explicit negation and retrieves evidence for both the original and negated claim.
A multi-source aggregation method that deduplicates and ranks sentence-level evidence from Wikipedia, PubMed, and Google into a single evidence set per claim.
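One way the aggregation step could look is sketched below. The summary does not say how sentences are matched or ranked, so everything here is an assumption: `normalize` and `merge_evidence` are hypothetical helpers, duplicates are detected by a simple normalized-text key, and each sentence is assumed to arrive with a relevance score from its retriever.

```python
import re
from typing import Dict, List, Tuple


def normalize(sentence: str) -> str:
    """Lowercase and collapse punctuation/whitespace to build a dedup key."""
    return re.sub(r"[^a-z0-9]+", " ", sentence.lower()).strip()


def merge_evidence(
    per_source: Dict[str, List[Tuple[str, float]]],
    top_k: int = 10,
) -> List[Tuple[str, str, float]]:
    """Merge (sentence, score) lists into one ranked, deduplicated set.

    Returns (source, sentence, score) tuples. When two sources yield the
    same normalized sentence, the higher-scoring copy is kept (an assumed
    tie-breaking rule, not taken from the paper).
    """
    best: Dict[str, Tuple[str, str, float]] = {}
    for source, scored in per_source.items():
        for sentence, score in scored:
            key = normalize(sentence)
            if key not in best or score > best[key][2]:
                best[key] = (source, sentence, score)
    ranked = sorted(best.values(), key=lambda item: item[2], reverse=True)
    return ranked[:top_k]


merged = merge_evidence(
    {
        "wikipedia": [("Aspirin reduces fever.", 0.8), ("Aspirin is an NSAID.", 0.5)],
        "google": [("aspirin  reduces fever", 0.9)],
    }
)
print(merged)
```

Exact-match deduplication on normalized text is the simplest choice; a real system might prefer fuzzy or embedding-based matching to catch paraphrases.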
Key Findings
Retrieving both the claim and its negation (dual-perspective) improves zero-shot verification.
Aggregating Wikipedia, PubMed, and Google often outperforms any single source.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Accuracy | 0.610 (Llama 70B, merged Wikipedia + PubMed + Google) | 0.430 (Wikipedia only) | +0.180 (≈ +41.9% relative to Wikipedia) | SciFact | Merged vs per-source accuracy | Table 3 |
| Accuracy | 0.607 (Llama 70B, Google, original + negated queries) | 0.550 (original query only) | +0.057 (+10.4% relative) | SciFact (Llama 70B + Google) | Original vs original + negated retrieval | Table 1 |
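The Delta column mixes absolute and relative gains, so the arithmetic is worth checking. The short sketch below recomputes both figures from the Value and Baseline cells; `deltas` is a hypothetical helper written for this check.

```python
from typing import Tuple


def deltas(value: float, baseline: float) -> Tuple[float, float]:
    """Return (absolute delta, relative delta in percent of baseline)."""
    absolute = value - baseline
    relative_pct = 100.0 * absolute / baseline
    return round(absolute, 3), round(relative_pct, 1)


# Merged sources vs Wikipedia only (Table 3 row).
print(deltas(0.610, 0.430))
# Original + negated vs original-only retrieval (Table 1 row).
print(deltas(0.607, 0.550))
```

Both rows reproduce the reported values: an absolute gain of 0.180 (41.9% relative) and 0.057 (10.4% relative).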
What To Try In 7 Days
Add an explicit negated-query stage: generate a simple negation for each claim and run retrieval for both forms.
Pull sentence-level evidence from at least two diverse sources (e.g., Wikipedia + web search) and deduplicate before calling an LLM.
Log and visualize per-source label log-probs to flag claims with source disagreement for human review.
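The third item above might start from something like the following sketch. It assumes the verifier already returns a log-probability per label for each source; the label names `SUPPORTS` and `REFUTES`, the function `source_report`, and the rule "flag when the top labels differ" are all illustrative assumptions rather than the paper's scheme.

```python
from typing import Dict, Union

LogProbs = Dict[str, float]  # label -> log-probability (assumed format)


def source_report(per_source: Dict[str, LogProbs]) -> Dict[str, Union[Dict[str, str], bool]]:
    """Pick each source's most likely label and flag disagreement."""
    top_labels = {
        source: max(logprobs, key=logprobs.get)
        for source, logprobs in per_source.items()
    }
    # More than one distinct top label means the sources disagree.
    needs_review = len(set(top_labels.values())) > 1
    return {"top_labels": top_labels, "needs_review": needs_review}


report = source_report(
    {
        "wikipedia": {"SUPPORTS": -0.2, "REFUTES": -1.7},
        "pubmed": {"SUPPORTS": -1.2, "REFUTES": -0.4},
    }
)
print(report)
```

Claims with `needs_review` set can be routed to a human queue, and the raw log-probabilities can be logged alongside for later calibration.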
Risks & Boundaries
Limitations
Context window limits can truncate multi-document evidence and harm verification.
No time-aware retrieval: outdated evidence can mislead time-sensitive claims.
When Not To Use
For high-stakes decisions without calibration and human review.
When evidence requires long multi-document chains exceeding context windows.
Failure Modes
LLM hallucination or label mis-mapping despite relevant evidence.
Outdated or misleading web evidence yields incorrect veracity.

