Overview
The method is practical for postprocessing black-box LMs: detection reaches roughly 80% F1 in experiments, and mitigation preserves fluency. Costs are modest but scale with text length and with the choice of analyzer LM.
Citations: 46
Evidence Strength: 0.80
Confidence: 0.88
Risk Signals: 8
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 3/9
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 70%
Novelty: 60%
Why It Matters For Business
Automate contradiction checks to catch internal hallucinations that retrieval misses, improving trust in long-form outputs and answers with modest extra cost.
Summary TLDR
The paper studies self-contradictions: cases where an LLM produces two sentences that conflict about the same subject. The authors (1) define a practical pipeline to trigger such pairs, (2) use prompting to detect contradictions at around 80% F1, and (3) iteratively revise or remove contradictory sentences, cutting contradictions by up to ~89% while preserving fluency and informativeness. The method works in a black-box setting (prompting only), complements retrieval-based checks, and is released as ChatProtect with code and datasets.
Problem Statement
Large instruction-tuned LMs often output hallucinated facts. A common and useful form is self-contradiction: two sentences the model generates about the same subject that cannot both be true. This both reveals non-factual content and offers an opportunity: detect and remove contradictions using only the model's own reasoning, without external knowledge retrieval.
Main Contribution
Formalized 'self-contradiction' as a targeted hallucination signal and showed it reliably indicates non-factuality.
A prompt-based three-step pipeline: trigger contradictory sentence pairs, detect contradictions with chain-of-thought prompts, and iteratively mitigate by local edits.
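The three steps can be sketched as plain functions over a black-box model. This is a minimal illustration, not the ChatProtect implementation: the `LM` callable type, the function names, the prompt wording, and the `max_rounds` cutoff are all assumptions made for the example, and `g_lm` simply stands for whichever model generated the text.

```python
from typing import Callable, Optional

# Hypothetical interface: any callable that maps a prompt to a completion.
LM = Callable[[str], str]


def trigger(g_lm: LM, context: str, sentence: str) -> str:
    """Sample an alternative sentence for the same position in the text."""
    prompt = (
        f"Context:\n{context}\n\n"
        f"Continue the context with one sentence on the same subject as:\n{sentence}"
    )
    return g_lm(prompt).strip()


def detect(a_lm: LM, context: str, first: str, second: str) -> bool:
    """Ask for step-by-step reasoning, then read a Yes/No verdict from the last line."""
    prompt = (
        f"Context:\n{context}\n\nStatement 1: {first}\nStatement 2: {second}\n\n"
        "Explain step by step whether the two statements contradict each other, "
        "then end with a final line 'Answer: Yes' or 'Answer: No'."
    )
    lines = a_lm(prompt).strip().splitlines()
    return bool(lines) and "yes" in lines[-1].lower()


def mitigate(
    a_lm: LM, context: str, sentence: str, alternative: str, max_rounds: int = 3
) -> Optional[str]:
    """Revise the sentence until no conflict is detected; return None to drop it."""
    current = sentence
    for _ in range(max_rounds):
        if not detect(a_lm, context, current, alternative):
            return current
        prompt = (
            "Rewrite Statement 1 so that it no longer conflicts with Statement 2, "
            "removing only the conflicting detail.\n"
            f"Statement 1: {current}\nStatement 2: {alternative}"
        )
        current = a_lm(prompt).strip()
    # Out of rounds: keep the last revision only if it now passes detection.
    return None if detect(a_lm, context, current, alternative) else current
```

Because every step goes through the same callable, the generator and the analyzer can be the same model or different ones, which matches the prompting-only, black-box setting described above.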
Key Findings
Self-contradictions are common in open-domain generations.
A large share of contradictions cannot be resolved by web lookup.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| self-contradiction rate (open-domain) | 15.7%–22.9% (by model) | — | — | MainTestSet | GPT-4 15.7%; ChatGPT 17.7%; Llama2 19.0%; Vicuna 22.9% | Table 2 |
| unverifiable fraction of contradictions | 20.5%–35.2% (by model) | — | — | MainTestSet | ChatGPT 35.2% cannot be verified with Wikipedia/web | Table 2 |
What To Try In 7 Days
Run ChatProtect on a sample of your product's LLM outputs and measure contradiction rate.
Add aLM.detect prompts (the analyzer LM's detection step: chain-of-thought reasoning followed by a Yes/No verdict) to flag contradictions during postprocessing.
Apply iterative mitigation on flagged sentences and compare informativeness and fluency before rollout.
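For the measurement step, a small helper can turn the analyzer's free-text replies into a contradiction rate. The sketch below assumes each reply ends with a line containing "Yes" or "No"; that reply format and the names `parse_verdict` and `contradiction_rate` are illustrative choices, not part of ChatProtect.

```python
import re
from typing import Iterable, Optional

# Matches a standalone "yes" or "no", case-insensitively.
_VERDICT = re.compile(r"\b(yes|no)\b", re.IGNORECASE)


def parse_verdict(reply: str) -> Optional[bool]:
    """Return True for Yes, False for No, None if the last line has neither."""
    lines = [ln for ln in reply.strip().splitlines() if ln.strip()]
    if not lines:
        return None
    matches = _VERDICT.findall(lines[-1])
    if not matches:
        return None
    return matches[-1].lower() == "yes"


def contradiction_rate(replies: Iterable[str]) -> float:
    """Share of parseable replies flagged as contradictions (0.0 if none parse)."""
    decided = [v for v in map(parse_verdict, replies) if v is not None]
    if not decided:
        return 0.0
    return sum(decided) / len(decided)
```

Unparseable replies are excluded from the denominator rather than counted as "No", so a weak analyzer that produces malformed output does not silently lower the measured rate; tracking how many replies were skipped is worth doing alongside.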
Risks & Boundaries
Limitations
Handles only contradictions between two sentences sampled at the same position.
Prompting cost grows with text length: the total number of prompt tokens scales quadratically.
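One way such quadratic growth can arise is sketched below, under the assumption that the prompt for each sentence carries all preceding sentences as context. The function name, the per-call `overhead` parameter, and the token counts are hypothetical and only serve to show the scaling.

```python
from typing import List


def total_prompt_tokens(sentence_tokens: List[int], overhead: int = 0) -> int:
    """Sum prompt sizes when each sentence's prompt includes the full prefix."""
    total = 0
    prefix = 0  # tokens of all sentences seen so far
    for n in sentence_tokens:
        total += overhead + prefix + n  # instructions + context + current sentence
        prefix += n
    return total


short_text = total_prompt_tokens([20] * 10)  # 10 sentences of 20 tokens each
long_text = total_prompt_tokens([20] * 20)   # twice as many sentences
print(short_text, long_text, long_text / short_text)
```

Under this assumption, doubling the number of sentences roughly quadruples the prompt tokens, so budgeting for long-form outputs should use the sentence count squared rather than the raw text length.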
When Not To Use
When you require provable factual verification from external primary sources.
When token budget or latency cannot absorb the extra LM calls needed for detection and mitigation.
Failure Modes
Analyzer (aLM) misses subtle contradictions (false negatives), especially for weaker open-source models.
Analyzer can produce wrong explanations and thus fail to flag contradictions (observed for Vicuna-13B).

