Find and fix contradictions in an LLM's own text without web lookups

May 25, 2023 · 7 min

Overview

Decision Snapshot: Ready For Pilot

The method is practical for postprocessing black-box LMs: detection reaches around 80% F1 in experiments, and mitigation cuts contradictions by up to ~89% while preserving fluency. Costs are modest but scale with text length and the choice of analyzer model.

Citations: 46

Evidence Strength: 0.80

Confidence: 0.88

Risk Signals: 8

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 3/9

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 70%

Novelty: 60%

Authors

Niels Mündler, Jingxuan He, Slobodan Jenko, Martin Vechev

Why It Matters For Business

Automate contradiction checks to catch internal hallucinations that retrieval misses, improving trust in long-form outputs and answers with modest extra cost.

Summary TLDR

The paper studies self-contradictions: cases where an LLM produces two sentences that conflict about the same subject. The authors (1) define a practical pipeline to trigger such pairs, (2) use prompting to detect contradictions with around 80% F1, and (3) iteratively revise or remove contradictory sentences, cutting contradictions by up to ~89% while keeping fluency and informativeness. The method treats the model as a black box (prompting only), complements retrieval-based checks, and is released as ChatProtect with code and datasets.

Problem Statement

Large instruction-tuned LMs often output hallucinated facts. A common and useful form is self-contradiction: two sentences the model generates about the same subject that cannot both be true. This both reveals non-factual content and offers an opportunity: detect and remove contradictions using only the model's own reasoning, without external knowledge retrieval.

Main Contribution

Formalized 'self-contradiction' as a targeted hallucination signal and showed it reliably indicates non-factuality.

A prompt-based three-step pipeline: trigger contradictory sentence pairs, detect contradictions with chain-of-thought prompts, and iteratively mitigate by local edits.
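The three steps above can be sketched as a single loop. This is a minimal illustration, not the ChatProtect implementation: the function name `mitigate_text`, the three callables, and the `max_rounds` cutoff are all hypothetical, and in practice each callable would wrap a prompt to the generating or analyzer LM.

```python
from typing import Callable, List

# Hypothetical callable shapes (assumptions, not the ChatProtect API):
Trigger = Callable[[List[str], str], str]  # (kept prefix, sentence) -> alternative sentence
Detect = Callable[[str, str], bool]        # (sentence, alternative) -> True if contradictory
Revise = Callable[[str, str], str]         # (sentence, alternative) -> revised sentence


def mitigate_text(
    sentences: List[str],
    trigger: Trigger,
    detect: Detect,
    revise: Revise,
    max_rounds: int = 3,
) -> List[str]:
    """Trigger, detect, and locally revise each sentence; drop it if unresolved."""
    kept: List[str] = []
    for sentence in sentences:
        current = sentence
        for _ in range(max_rounds):
            # Step 1: sample an alternative sentence for the same position.
            alternative = trigger(kept, current)
            # Step 2: ask the analyzer whether the pair is contradictory.
            if not detect(current, alternative):
                break
            # Step 3: local edit that removes the conflicting information.
            current = revise(current, alternative)
        else:
            # All rounds used: run one last check and drop the sentence
            # if it still contradicts a freshly sampled alternative.
            if detect(current, trigger(kept, current)):
                continue
        kept.append(current)
    return kept
```

Passing the three steps in as callables keeps the loop independent of any particular model or prompt wording, which matches the black-box framing of the method.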

Key Findings

Self-contradictions are common in open-domain generations.

Numbers: 17.7% of sentences for ChatGPT (MainTestSet)

Practical Use: Expect ~15–23% of sentences to yield contradictions depending on model; add automated screening for long-form outputs.

Evidence Ref: Table 2

A large share of contradictions cannot be resolved by web lookup.

Numbers: 35.2% of ChatGPT contradictions unverifiable online

Practical Use: Don’t rely only on retrieval: run contradiction checks to catch non-verifiable hallucinations.

Evidence Ref: Table 2

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
self-contradiction rate (open-domain) | 15.7%–22.9% (by model) | - | - | MainTestSet | GPT-4 15.7%; ChatGPT 17.7%; Llama2 19.0%; Vicuna 22.9% | Table 2
unverifiable fraction of contradictions | 20.5%–35.2% (by model) | - | - | MainTestSet | ChatGPT 35.2% cannot be verified with Wikipedia/web | Table 2

What To Try In 7 Days

Run ChatProtect on a sample of your product's LLM outputs and measure contradiction rate.

Add aLM.detect prompts (the analyzer LM's detection step: chain-of-thought reasoning followed by a Yes/No verdict) to flag contradictions during postprocessing.

Apply iterative mitigation on flagged sentences and compare informativeness and fluency before rollout.
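A detection prompt and the contradiction-rate measurement from the steps above might look like the following sketch. The prompt wording is illustrative and is not the prompt used in the paper; `build_detect_prompt`, `parse_verdict`, and `contradiction_rate` are hypothetical helper names, and the parser assumes the analyzer ends its reply with a Yes or No.

```python
import re
from typing import Iterable


def build_detect_prompt(topic: str, sentence_a: str, sentence_b: str) -> str:
    """Build a chain-of-thought prompt that ends in a Yes/No question (illustrative wording)."""
    return (
        f"Here are two statements about {topic}.\n"
        f"Statement 1: {sentence_a}\n"
        f"Statement 2: {sentence_b}\n"
        "Explain step by step whether the two statements contradict each other. "
        "Then conclude with a single word: Yes or No."
    )


def parse_verdict(reply: str) -> bool:
    """Return True if the last standalone yes/no in the reply is 'yes'."""
    matches = re.findall(r"\b(yes|no)\b", reply.lower())
    # An unparseable reply counts as "not flagged" in this sketch.
    return bool(matches) and matches[-1] == "yes"


def contradiction_rate(flags: Iterable[bool]) -> float:
    """Fraction of checked sentences flagged as contradictory."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0
```

Reading the last Yes/No rather than the first lets the explanation mention either word before the final verdict.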

Agent Features

Tool Use
prompting; external retrieval (as comparator)
Frameworks
ChatProtect (prompt pipeline)

Optimization Features

Token Efficiency
per-sentence linear queries; quadratic prompt growth with text length
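The cost shape can be checked with a back-of-the-envelope estimate. This sketch assumes that the prompt for sentence i carries all i sentences generated so far (prefix plus the sentence itself), that sentences have a uniform token length, and that each sentence needs a fixed number of calls; these are simplifying assumptions for illustration, and `estimate_cost` is a hypothetical helper.

```python
from typing import Tuple


def estimate_cost(
    num_sentences: int,
    tokens_per_sentence: int,
    calls_per_sentence: int = 1,
    overhead_tokens: int = 0,
) -> Tuple[int, int]:
    """Return (number of LM calls, total prompt tokens) under the stated assumptions."""
    # Calls grow linearly with the number of sentences.
    calls = calls_per_sentence * num_sentences
    # Sentence i sees i sentences of context, so the total context is
    # 1 + 2 + ... + n = n(n + 1) / 2 sentences: quadratic in text length.
    context_sentences = num_sentences * (num_sentences + 1) // 2
    prompt_tokens = calls_per_sentence * (
        tokens_per_sentence * context_sentences + overhead_tokens * num_sentences
    )
    return calls, prompt_tokens


print(estimate_cost(10, 20))  # (10, 1100)
print(estimate_cost(20, 20))  # (20, 4200)
```

Under these assumptions, doubling the text from 10 to 20 sentences doubles the number of calls but multiplies prompt tokens by roughly 3.8.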

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Partial
License: Unknown

Risks & Boundaries

Limitations

Handles contradictions between two sentences sampled at the same position only.

Prompting cost grows with text length (quadratic prompt tokens).

When Not To Use

When you require provable factual verification from external primary sources.

When token budget or latency cannot absorb the extra LM calls needed for detection and mitigation.

Failure Modes

Analyzer (aLM) misses subtle contradictions (false negatives), especially for weaker open-source models.

Analyzer can produce wrong explanations and thus fail to flag contradictions (observed for Vicuna-13B).

Core Entities

Models

GPT-4, ChatGPT (gpt-3.5-turbo), Llama2-70B-Chat, Vicuna-13B

Metrics

self-contradiction rate, precision, recall, F1, perplexity increase, informativeness retained
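For reference, the detection metrics can be computed from analyzer flags and human labels as in the sketch below. The helper name `precision_recall_f1` and the convention of returning 0.0 for an undefined ratio are choices made for this illustration.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float, float]:
    """Score analyzer flags (predicted) against human contradiction labels (actual)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(p and a for p, a in zip(predicted, actual))        # flagged, truly contradictory
    fp = sum(p and not a for p, a in zip(predicted, actual))    # flagged, but consistent
    fn = sum(a and not p for p, a in zip(predicted, actual))    # missed contradiction
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

False negatives here correspond to the missed contradictions listed under Failure Modes, so recall is the number to watch when a weaker model serves as the analyzer.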

Datasets

MainTestSet (360 descriptions, 30 entities), 2ndTestSet (100 entities), PopQA (sampled 1.5k questions)