LLM explanations speed up fact-checking but cause dangerous over-reliance when they are wrong

October 19, 2023 · 8 min

Overview

Decision Snapshot: Needs Validation

A well-powered human study supports the main claims: LLM explanations speed up verification but induce dangerous over-reliance; grounding and contrastive prompts help but do not fully beat retrieval.

Citations: 6

Evidence Strength: 0.80

Confidence: 0.88

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 7/7

Findings with evidence refs: 7/7

Results with explicit delta: 7/8

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 40%

Production readiness: 60%

Novelty: 30%

Authors

Chenglei Si, Navita Goyal, Sherry Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daumé, Jordan Boyd-Graber

Why It Matters For Business

LLM explanations let teams verify claims much faster but can mislead people when wrong; for important decisions, prioritize retrieval-grounded workflows or add checks to avoid over-reliance.

Summary TLDR

A human study (1,500 annotations, 80 workers) compares ChatGPT explanations and Wikipedia retrieval for fact-checking hard claims. Explanations and retrieved passages yield similar accuracy (~74% vs 73% vs 59% baseline), but explanations are much faster (~1.0 min vs ~2.5 min). Users heavily over-rely on LLM explanations when those explanations are wrong (human accuracy falls to 35%). Contrastive explanations reduce that over-reliance (raise accuracy to 56% on those cases) but do not beat retrieval overall. Grounding explanations on retrieved passages improves model accuracy (59.5% → 78%).

Problem Statement

People use LLMs and search results to check claims. We need to know which tool helps humans verify facts more accurately and whether LLM explanations help or hurt real users.

Main Contribution

Large human study comparing ChatGPT free-text explanations vs top-10 Wikipedia passages for fact verification on adversarial claims.

Measured speed and accuracy trade-offs: explanations speed decisions but encourage over-reliance when wrong.

Key Findings

ChatGPT explanations and retrieved Wikipedia passages both improve human accuracy over no help.

Numbers: Explanation 0.74 ±0.09 vs Retrieval 0.73 ±0.12 vs Baseline 0.59 ±0.12 (accuracy)

Practical Use: If you must speed up verification, an LLM explanation gives accuracy similar to retrieval and saves time.

Evidence Ref: Sec 5 (Fig 2)

Verifying a claim with an LLM explanation is ~2.5× faster than verifying it with retrieved passages.

Numbers: Explanation 1.01 ±0.45 min vs Retrieval 2.53 ±1.07 min per claim

Practical Use: Use LLM explanations to improve throughput for low-risk checks or triage, not for high-stakes decisions.

Evidence Ref: Sec 5 (Fig 2b)

Results

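| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
| --- | --- | --- | --- | --- | --- | --- |
| Accuracy (ChatGPT explanation) | 0.74 ±0.09 | 0.59 ±0.12 | +0.15 | All evaluated claims (200 sampled from FoolMeTwice) | Sec 5 (Fig 2) | Sec 5 |
| Accuracy (Wikipedia retrieval) | 0.73 ±0.12 | 0.59 ±0.12 | +0.14 | All evaluated claims | Sec 5 (Fig 2) | Sec 5 |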
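The speedup and delta figures reported above follow from simple arithmetic on the means. The sketch below only re-derives them from the numbers already quoted in this summary; the variable and function names are illustrative and do not come from the paper.

```python
# Means quoted in this summary (minutes per claim, accuracy as a proportion).
explanation_minutes = 1.01
retrieval_minutes = 2.53
accuracy = {"explanation": 0.74, "retrieval": 0.73, "baseline": 0.59}


def speedup(slow_minutes: float, fast_minutes: float) -> float:
    """How many times faster the fast condition is than the slow one."""
    return slow_minutes / fast_minutes


def accuracy_gain(condition: str, reference: str = "baseline") -> float:
    """Absolute accuracy difference, rounded to two decimals."""
    return round(accuracy[condition] - accuracy[reference], 2)


ratio = speedup(retrieval_minutes, explanation_minutes)
print(f"explanation vs retrieval speedup: {ratio:.2f}x")
print("explanation gain over baseline:", accuracy_gain("explanation"))
print("retrieval gain over baseline:", accuracy_gain("retrieval"))
```

These are differences of means only; the reported standard deviations (±0.09 to ±0.12) are large relative to the one-point gap between explanation and retrieval.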

What To Try In 7 Days

Ground LLM explanations on retrieved passages before showing them to users.

Use retrieval (top passages) as default for high-stakes verification workflows.

Pilot contrastive prompts (support + refute) for triage where users can inspect both sides.
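As a starting point for the first and third suggestions, the sketch below builds a retrieval-grounded prompt and a pair of contrastive (support and refute) prompts as plain strings. The prompt wording and function names are assumptions made for illustration; they are not the prompts used in the study, and sending the strings to a model is left out.

```python
from typing import List, Tuple


def format_passages(passages: List[str]) -> str:
    """Number the retrieved passages so an explanation can cite them."""
    return "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))


def build_grounded_prompt(claim: str, passages: List[str]) -> str:
    """Hypothetical prompt asking for a verdict based only on the evidence."""
    return (
        "Using only the evidence below, say whether the claim is true or "
        "false and explain why.\n\n"
        f"Evidence:\n{format_passages(passages)}\n\n"
        f"Claim: {claim}\n"
    )


def build_contrastive_prompts(claim: str, passages: List[str]) -> Tuple[str, str]:
    """Hypothetical pair of prompts: one argues for the claim, one against."""
    context = f"Evidence:\n{format_passages(passages)}\n\nClaim: {claim}\n"
    support = context + "Explain why this claim could be TRUE."
    refute = context + "Explain why this claim could be FALSE."
    return support, refute


claim = "Example claim to verify."
passages = ["First retrieved passage.", "Second retrieved passage."]
grounded = build_grounded_prompt(claim, passages)
support_prompt, refute_prompt = build_contrastive_prompts(claim, passages)
print(grounded)
```

Showing both the support and the refute output side by side is what lets a reviewer inspect both sides rather than accept a single verdict.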

Agent Features

Tool Use
Tool Use: retrieval-grounded prompting
Collaboration: human-in-the-loop verification

Reproducibility

Code Available: No
Data Available: Yes
Open Source Status: Partial
License: Unknown

Data URLs

FoolMeTwice (Eisenschlos et al., 2021); Wikipedia snapshots used for retrieval

Risks & Boundaries

Limitations

Limited participant pool (Prolific) and 16 annotators per condition; may not generalize to experts.

Single model checkpoint (GPT-3.5-turbo-0613) and time-limited API snapshot.

When Not To Use

High-stakes verification where mistaken LLM explanations could cause harm

Workflows with low retrieval recall (missing evidence in top-10)
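One way to check the retrieval-recall condition before relying on a retrieval workflow is to measure how often all of the gold evidence appears in the top-k results. The sketch below assumes "full recall" means every gold passage is present in the top 10; that reading, the passage IDs, and the function names are illustrative assumptions rather than definitions taken from the paper.

```python
from typing import List, Sequence, Set, Tuple


def has_full_recall(retrieved_ids: Sequence[str], gold_ids: Set[str], k: int = 10) -> bool:
    """True if every gold passage ID appears among the top-k retrieved IDs."""
    return gold_ids.issubset(retrieved_ids[:k])


def full_recall_rate(examples: List[Tuple[Sequence[str], Set[str]]], k: int = 10) -> float:
    """Fraction of claims whose gold evidence is fully covered by the top-k."""
    if not examples:
        return 0.0
    covered = sum(has_full_recall(retrieved, gold, k) for retrieved, gold in examples)
    return covered / len(examples)


# Toy data with made-up passage IDs, for illustration only.
examples = [
    (["p1", "p2", "p3"], {"p2"}),        # gold evidence retrieved
    (["p4", "p5", "p6"], {"p9"}),        # gold evidence missing
    (["p7", "p8", "p9"], {"p7", "p9"}),  # both gold passages retrieved
]
rate = full_recall_rate(examples, k=10)
print(f"full-recall rate: {rate:.2f}")
```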

Failure Modes

Users adopt LLM answers verbatim even when explanations are factually wrong

LLM generates convincing but incorrect supporting/refuting rationales (hallucinations)
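Over-reliance of this kind shows up when human accuracy is split by whether the explanation shown was itself correct. The sketch below computes that split on toy data; the record layout and function name are hypothetical, and the toy values are not the study's data.

```python
from typing import Dict, Iterable, Tuple


def accuracy_by_explanation_correctness(
    records: Iterable[Tuple[bool, bool]],
) -> Dict[str, float]:
    """Each record is (explanation_correct, human_correct)."""
    totals = {True: 0, False: 0}
    hits = {True: 0, False: 0}
    for explanation_correct, human_correct in records:
        totals[explanation_correct] += 1
        hits[explanation_correct] += int(human_correct)

    def rate(key: bool) -> float:
        # NaN signals that no records fell into this group.
        return hits[key] / totals[key] if totals[key] else float("nan")

    return {
        "explanation_correct": rate(True),
        "explanation_wrong": rate(False),
    }


# Toy annotations, for illustration only.
toy_records = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False), (False, False),
]
split = accuracy_by_explanation_correctness(toy_records)
print(split)
```

A large gap between the two groups is the signal to watch; in the study, human accuracy fell to 35% on the cases where the explanation was wrong.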

Core Entities

Models

gpt-3.5-turbo (GPT-3.5-turbo-0613)

Metrics

Accuracy, time per claim, retrieval full-recall, user confidence

Datasets

FoolMeTwice, Wikipedia (retrieved passages)

Context Entities

Models

GPT-3.5 family (chat completions)

Metrics

Accuracy, time and confidence calibration

Datasets

FoolMeTwice (adversarial claims), Wikipedia passages used for grounding