Find and fix contradictions in an LLM's own text without web lookups

Overview

Decision Snapshot: Ready For Pilot

The method is practical for postprocessing black-box LMs: detection works well in experiments and mitigation preserves fluency; costs are modest but scale with text length and analyzer choices.

Citations: 46

Evidence Strength: 0.80

Confidence: 0.88

Risk Signals: 8

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 3/9

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 70%

Novelty: 60%

Authors

Niels Mündler, Jingxuan He, Slobodan Jenko, Martin Vechev

Links

Abstract / PDF / Code / Data

Why It Matters For Business

Automate contradiction checks to catch internal hallucinations that retrieval misses, improving trust in long-form outputs and answers with modest extra cost.

Who Should Care

Product Manager, ML Engineer, Data Scientist

Summary TLDR

The paper studies self-contradictions: cases where an LLM produces two sentences that conflict about the same subject. The authors (1) define a practical pipeline to trigger such pairs, (2) use prompting to detect contradictions with high accuracy (around 80% F1), and (3) iteratively revise or remove contradictory sentences, cutting contradictions by up to ~89% while keeping fluency and informativeness. The method works in a black-box setting (prompting only), complements retrieval-based checks, and is released as ChatProtect with code and datasets.

Problem Statement

Large instruction-tuned LMs often output hallucinated facts. A common and useful form is self-contradiction: two sentences the model generates about the same subject that cannot both be true. This both reveals non-factual content and offers an opportunity: detect and remove contradictions using only the model's own reasoning, without external knowledge retrieval.

Main Contribution

Formalized 'self-contradiction' as a targeted hallucination signal and showed it reliably indicates non-factuality.

A prompt-based three-step pipeline: trigger contradictory sentence pairs, detect contradictions with chain-of-thought prompts, and iteratively mitigate by local edits.
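The three steps can be sketched as plain functions over two black-box models. This is a minimal sketch, not the ChatProtect implementation: the `LM` callable type, the names `glm` (generator) and `alm` (analyzer), the prompt wording, and the `max_rounds` default are all assumptions made for illustration.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to the model's reply.
LM = Callable[[str], str]


def trigger(glm: LM, prefix: str, sentence: str) -> str:
    """Sample an alternative sentence for the same position in the text."""
    prompt = (
        f"Context: {prefix}\n"
        f"Original sentence: {sentence}\n"
        "Write one new sentence for this position that covers the same subject."
    )
    return glm(prompt).strip()


def detect(alm: LM, prefix: str, first: str, second: str) -> bool:
    """Chain-of-thought check that ends in a Yes/No verdict on the last line."""
    prompt = (
        f"Context: {prefix}\n"
        f"Statement A: {first}\n"
        f"Statement B: {second}\n"
        "Explain step by step whether A and B contradict each other, "
        "then answer Yes or No on the final line."
    )
    lines = alm(prompt).strip().splitlines()
    return bool(lines) and lines[-1].strip().lower().startswith("yes")


def mitigate(alm: LM, prefix: str, sentence: str, alternative: str) -> str:
    """Local edit: rewrite the sentence without the conflicting information."""
    prompt = (
        "Rewrite the sentence so that it drops any information that conflicts "
        "with the alternative, keeping everything else.\n"
        f"Context: {prefix}\nSentence: {sentence}\nAlternative: {alternative}"
    )
    return alm(prompt).strip()


def protect(glm: LM, alm: LM, sentences: List[str], max_rounds: int = 3) -> List[str]:
    """Run trigger, detect, and mitigate sentence by sentence."""
    kept: List[str] = []
    for sentence in sentences:
        prefix = " ".join(kept)
        current = sentence
        for _ in range(max_rounds):
            alternative = trigger(glm, prefix, current)
            if not detect(alm, prefix, current, alternative):
                break  # no contradiction found: keep the current wording
            current = mitigate(alm, prefix, current, alternative)
            if not current:
                break  # the rewrite left nothing to keep
        else:
            current = ""  # rounds exhausted without a clean check: drop it
        if current:
            kept.append(current)
    return kept
```

Passing the models as plain callables keeps the sketch independent of any particular API client; wrapping a real chat endpoint in a function of this shape is left to the reader.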

Key Findings

Self-contradictions are common in open-domain generations.

Numbers: 17.7% of sentences for ChatGPT (MainTestSet)

Practical Use: Expect ~15–23% of sentences to yield contradictions depending on model; add automated screening for long-form outputs.

Evidence Ref: Table 2

A large share of contradictions cannot be resolved by web lookup.

Numbers: 35.2% of ChatGPT contradictions unverifiable online

Practical Use: Don't rely only on retrieval: run contradiction checks to catch non-verifiable hallucinations.

Evidence Ref: Table 2

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
self-contradiction rate (open-domain)	15.7%–22.9% (by model)	—	—	MainTestSet	GPT-4 15.7%; ChatGPT 17.7%; Llama2 19.0%; Vicuna 22.9%	Table 2
unverifiable fraction of contradictions	20.5%–35.2% (by model)	—	—	MainTestSet	ChatGPT 35.2% cannot be verified with Wikipedia/web	Table 2

What To Try In 7 Days

Run ChatProtect on a sample of your product's LLM outputs and measure contradiction rate.

Add analyzer-LM detection prompts (aLM.detect: chain-of-thought reasoning followed by a Yes/No verdict) to flag contradictions during postprocessing.

Apply iterative mitigation on flagged sentences and compare informativeness and fluency before rollout.
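For the measurement steps above, a small helper can turn per-sentence flags into a contradiction rate and a before/after comparison. This is a sketch under assumptions: the flags are taken to come from whatever detector you run, and the function names and sample data are hypothetical, not part of ChatProtect.

```python
from typing import List


def contradiction_rate(flags: List[List[bool]]) -> float:
    """Share of sentences flagged as self-contradictory across all sampled outputs."""
    total = sum(len(doc) for doc in flags)
    flagged = sum(sum(doc) for doc in flags)
    return flagged / total if total else 0.0


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original contradiction rate removed by mitigation."""
    return (before - after) / before if before else 0.0


# Made-up flags for two sampled outputs (one inner list per output, one flag per sentence).
before_flags = [[False, True, False, False], [True, False, False, False, False, False]]
after_flags = [[False, False, False, False], [True, False, False, False, False, False]]

before = contradiction_rate(before_flags)  # 2 of 10 sentences flagged
after = contradiction_rate(after_flags)    # 1 of 10 sentences flagged
print(f"before={before:.1%} after={after:.1%} "
      f"reduction={relative_reduction(before, after):.0%}")
# prints: before=20.0% after=10.0% reduction=50%
```

Fluency and informativeness still need their own checks (for example perplexity and a human or model rating); this helper only covers the contradiction count.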

Agent Features

Tool Use

prompting; external retrieval (as comparator)

Frameworks

ChatProtect (prompt pipeline)

Optimization Features

Token Efficiency

per-sentence linear queries; quadratic prompt growth with text length
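A rough cost model shows why the number of queries grows linearly while prompt tokens grow quadratically. It is a sketch under assumptions: every query is taken to resend all preceding sentences as context, and the default values for tokens per sentence, queries per sentence, and instruction overhead are made up for illustration rather than taken from the paper.

```python
def estimate_prompt_tokens(
    n_sentences: int,
    tokens_per_sentence: int = 25,   # assumed average sentence length
    queries_per_sentence: int = 3,   # assumed: e.g. one trigger, one detect, one revise
    overhead_tokens: int = 60,       # assumed fixed instruction text per prompt
) -> int:
    """Total prompt tokens when each query carries the full preceding text."""
    total = 0
    for i in range(n_sentences):
        prefix_tokens = i * tokens_per_sentence  # context grows with position
        per_query = overhead_tokens + prefix_tokens + tokens_per_sentence
        total += queries_per_sentence * per_query
    return total


for n in (10, 20, 40):
    queries = 3 * n  # linear in the number of sentences
    print(n, queries, estimate_prompt_tokens(n))
```

Under these assumptions, doubling the text from 10 to 20 sentences doubles the query count but more than triples the prompt tokens (5,925 to 19,350), which is the quadratic term taking over.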

Reproducibility

Code Available: Yes

Data Available: Yes

Open Source Status: Partial

License: Unknown

Code URLs

https://github.com/eth-sri/ChatProtect

https://chatprotect.ai/

Data URLs

https://github.com/eth-sri/ChatProtect

Risks & Boundaries

Limitations

Handles contradictions between two sentences sampled at the same position only.

Prompting cost grows with text length (quadratic prompt tokens).

When Not To Use

When you require provable factual verification from external primary sources.

When token budget or latency cannot absorb the extra LM calls needed for detection and mitigation.

Failure Modes

Analyzer (aLM) misses subtle contradictions (false negatives), especially with weaker open-source models.

Analyzer can produce wrong explanations and thus fail to flag contradictions (observed for Vicuna-13B).

Core Entities

Models

GPT-4, ChatGPT (gpt-3.5-turbo), Llama2-70B-Chat, Vicuna-13B

Metrics

self-contradiction rate, precision, recall, F1, perplexity increase, informativeness retained

Datasets

MainTestSet (360 descriptions, 30 entities); 2ndTestSet (100 entities); PopQA (sampled 1.5k questions)
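The detection metrics listed above (precision, recall, F1) can be computed from detector verdicts and human labels as follows. This is a generic sketch of the standard definitions; the function name and the sample labels are hypothetical and not drawn from the paper's evaluation code.

```python
from typing import List, Tuple


def precision_recall_f1(predicted: List[bool], actual: List[bool]) -> Tuple[float, float, float]:
    """Standard binary metrics; True means 'this sentence pair is contradictory'."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up example: detector verdicts versus human annotations for five pairs.
predicted = [True, True, False, False, True]
actual = [True, False, True, False, True]
print(precision_recall_f1(predicted, actual))
```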

