LLM explanations speed up fact-checking but cause dangerous over-reliance when they are wrong

Overview

Decision Snapshot: Needs Validation

A well-powered human study supports the main claims: LLM explanations speed up verification but induce dangerous over-reliance; grounding and contrastive prompts help but do not fully beat retrieval.

Citations: 6

Evidence Strength: 0.80

Confidence: 0.88

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 7/7

Findings with evidence refs: 7/7

Results with explicit delta: 7/8

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 40%

Production readiness: 60%

Novelty: 30%

Authors

Chenglei Si, Navita Goyal, Sherry Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daumé, Jordan Boyd-Graber

Links

Abstract / PDF / Data

Why It Matters For Business

LLM explanations let teams verify claims much faster but can mislead people when wrong; for important decisions, prioritize retrieval-grounded workflows or add checks to avoid over-reliance.

Who Should Care

Product Manager, ML Engineer, Engineering Lead, Data Scientist

Summary TLDR

A human study (1,500 annotations, 80 workers) compares ChatGPT explanations and Wikipedia retrieval for fact-checking hard claims. Both aids yield similar human accuracy (about 74% with explanations and 73% with retrieved passages, against a 59% baseline with no help), but explanations are much faster (~1.0 min vs ~2.5 min per claim). Users over-rely heavily on LLM explanations: when the explanation is wrong, human accuracy falls to 35%. Contrastive explanations reduce that over-reliance (raising accuracy to 56% on those cases) but do not beat retrieval overall. Grounding explanations on retrieved passages improves the model's own accuracy (59.5% → 78%).

Problem Statement

People use LLMs and search results to check claims. We need to know which tool helps humans verify facts more accurately and whether LLM explanations help or hurt real users.

Main Contribution

Large human study comparing ChatGPT free-text explanations vs top-10 Wikipedia passages for fact verification on adversarial claims.

Measured speed and accuracy trade-offs: explanations speed decisions but encourage over-reliance when wrong.

Key Findings

ChatGPT explanations and retrieved Wikipedia passages both improve human accuracy over no help.

Numbers: Explanation 74% ±0.09 vs Retrieval 73% ±0.12 vs Baseline 59% ±0.12

Practical Use: If you must speed verification, an LLM explanation gives similar accuracy to retrieval but saves time.

Evidence Ref: Sec 5 (Fig 2)

Reading LLM explanations is ~2.5× faster than reading retrieved passages.

Numbers: Explanation 1.01 ±0.45 min vs Retrieval 2.53 ±1.07 min per claim

Practical Use: Use LLM explanations to improve throughput for low-risk checks or triage, not high-stakes decisions.

Evidence Ref: Sec 5 (Fig 2b)
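The ~2.5× figure follows directly from the reported mean times. A minimal check, using only the two means quoted above (variable names are illustrative, and the per-hour figures assume an annotator works at exactly the mean pace):

```python
# Mean minutes per claim, as reported in the finding above.
explanation_min = 1.01
retrieval_min = 2.53

# Ratio of mean reading times: how many times faster the explanation condition is.
speedup = retrieval_min / explanation_min

# Assumption: an annotator working steadily at the mean pace for one hour.
claims_per_hour_explanation = 60 / explanation_min
claims_per_hour_retrieval = 60 / retrieval_min

print(f"speedup: {speedup:.2f}x")
print(
    f"claims per hour: {claims_per_hour_explanation:.0f} (explanation) "
    f"vs {claims_per_hour_retrieval:.0f} (retrieval)"
)
```

The large standard deviations (±0.45 and ±1.07 min) mean individual annotators will vary widely around these means.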

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Accuracy (ChatGPT explanation)	0.74 ±0.09	0.59 ±0.12	+0.15	All evaluated claims (200 sampled from FoolMeTwice)	Sec 5 (Fig 2)	Sec 5
Accuracy (Wikipedia retrieval)	0.73 ±0.12	0.59 ±0.12	+0.14	All evaluated claims	Sec 5 (Fig 2)	Sec 5
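The Delta column is the condition accuracy minus the no-help baseline. A small sketch that recomputes it, taking 0.74 as the ChatGPT explanation condition and 0.73 as the Wikipedia retrieval condition, per the Key Findings numbers (the dictionary layout is illustrative):

```python
# Mean human accuracy per condition, from the Results rows above.
baseline_accuracy = 0.59
condition_accuracy = {
    "ChatGPT explanation": 0.74,
    "Wikipedia retrieval": 0.73,
}

# Delta = condition accuracy minus the no-help baseline, rounded to two decimals.
deltas = {
    name: round(accuracy - baseline_accuracy, 2)
    for name, accuracy in condition_accuracy.items()
}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

The one-point gap between the two conditions is well inside the reported spreads (±0.09 and ±0.12), which is why the summary treats their accuracy as similar.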

What To Try In 7 Days

Ground LLM explanations on retrieved passages before showing them to users.

Use retrieval (top passages) as default for high-stakes verification workflows.

Pilot contrastive prompts (support + refute) for triage where users can inspect both sides.
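The first and third items can be prototyped as plain prompt builders before wiring in any model. The sketch below is an assumption-heavy illustration: the function names and prompt wording are hypothetical and are not the prompts used in the study, and no model API is called.

```python
from typing import Dict, Sequence


def build_grounded_prompt(claim: str, passages: Sequence[str]) -> str:
    """Build a prompt that asks for an explanation based only on retrieved passages.

    The wording is hypothetical, not the study's prompt.
    """
    # Number the passages so the explanation can point back to its evidence.
    evidence = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Using only the evidence below, explain whether the claim is true or false.\n"
        f"Evidence:\n{evidence}\n"
        f"Claim: {claim}\n"
    )


def build_contrastive_prompts(claim: str) -> Dict[str, str]:
    """Build two prompts: one arguing for the claim, one arguing against it.

    Showing both outputs side by side is the contrastive (support + refute) setup.
    """
    return {
        "support": f"Explain why the following claim could be true.\nClaim: {claim}\n",
        "refute": f"Explain why the following claim could be false.\nClaim: {claim}\n",
    }


if __name__ == "__main__":
    # Placeholder inputs, not data from the study.
    claim = "<claim text>"
    passages = ["<retrieved passage 1>", "<retrieved passage 2>"]
    print(build_grounded_prompt(claim, passages))
    for side, prompt in build_contrastive_prompts(claim).items():
        print(f"--- {side} ---\n{prompt}")
```

Keeping the builders separate from the model call makes it easy to show users the retrieved passages alongside the explanation, which matters for the high-stakes default in the second item.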

Agent Features

Tool Use

retrieval-grounded prompting

Collaboration

human-in-the-loop verification

Reproducibility

Code Available: No

Data Available: Yes

Open Source Status: Partial

License: Unknown

Data URLs

FoolMeTwice (Eisenschlos et al., 2021); Wikipedia snapshots used for retrieval

Risks & Boundaries

Limitations

Limited participant pool (Prolific) and 16 annotators per condition; may not generalize to experts.

Single model checkpoint (GPT-3.5-turbo-0613) and time-limited API snapshot.

When Not To Use

High-stakes verification where mistaken LLM explanations could cause harm

Workflows with low retrieval recall (missing evidence in top-10)

Failure Modes

Users adopt LLM answers verbatim even when explanations are factually wrong

LLM generates convincing but incorrect supporting/refuting rationales (hallucinations)

Core Entities

Models

gpt-3.5-turbo (GPT-3.5-turbo-0613)

Metrics

Accuracy, time per claim, retrieval full-recall, user confidence

Datasets

FoolMeTwice, Wikipedia (retrieved passages)

Context Entities

Models

GPT-3.5 family (chat completions)

Metrics

Accuracy, time and confidence calibration

Datasets

FoolMeTwice (adversarial claims), Wikipedia passages used for grounding
