Overview
Production Readiness
0.3
Novelty Score
0.5
Cost Impact Score
0.2
Citation Count
18
Why It Matters For Business
LLMs can assert conclusions drawn from their training data or corpus statistics rather than the given context. That puts QA, summarization, and policy extraction at risk of silent misinformation; apply attestation checks and bias-controlled tests before deployment.
Summary TLDR
This paper runs controlled prompting tests on LLaMA-65B, GPT-3.5 (text-davinci-003), and PaLM-540B to identify two sources of false-positive 'Entail' predictions (hallucination) in inference tasks. First, LLMs tend to assert a conclusion when the hypothesis sentence appears in their training data (attestation bias). Second, they favor entailment when the hypothesis predicate is more frequent than the premise predicate (relative frequency bias). Both biases originate in pretraining statistics, and both cause large drops in inference performance on test examples constructed to conflict with them.
Problem Statement
LLMs are trusted for inference tasks (e.g., question answering, summarization), but they sometimes hallucinate by asserting conclusions not supported by provided premises. The paper asks: which pretraining-derived biases cause these false positives, and how much do they harm real NLI performance?
Main Contribution
Show and measure an attestation bias: models more often predict 'Entail' when the hypothesis matches text the model likely saw in pretraining.
Show and measure a relative frequency bias: models favor entailment if the hypothesis predicate is more corpus-frequent than the premise predicate.
Quantify impact: construct adversarial subsets and show large drops in discriminative performance (AUC norm) when labels conflict with these biases.
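A minimal sketch of how such bias-consistent and bias-adversarial subsets could be built for the attestation bias. The `Example` class, the `hypothesis_attested` flag, and the function name are hypothetical and not taken from the paper; the flag is assumed to come from a separate probe of the model.

```python
from dataclasses import dataclass


@dataclass
class Example:
    premise: str
    hypothesis: str
    entails: bool              # gold label for premise => hypothesis
    hypothesis_attested: bool  # hypothetical flag from a separate attestation probe


def split_by_attestation(examples):
    """Partition examples by whether the gold label agrees with the attestation bias."""
    consistent, adversarial = [], []
    for ex in examples:
        # Assumption: the bias predicts 'Entail' exactly when the hypothesis is attested.
        if ex.entails == ex.hypothesis_attested:
            consistent.append(ex)
        else:
            adversarial.append(ex)
    return consistent, adversarial
```

Scoring the model separately on the two lists shows how much of its apparent accuracy depends on the bias agreeing with the gold label.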
Key Findings
Attestation (the hypothesis matching a memorized sentence) strongly raises false-positive entailment predictions.
The relative corpus frequency of the premise and hypothesis predicates biases entailment decisions.
Performance swings widely between bias-aligned and bias-adversarial examples.
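The relative frequency bias can be illustrated with a small tagging function. This is a sketch under stated assumptions: the function name is hypothetical, the frequency table is a plain dictionary of made-up counts standing in for a corpus resource such as Google N-grams, and the bias is assumed to favor 'Entail' only when the hypothesis predicate is strictly more frequent.

```python
def tag_frequency_bias(gold_entails, premise_pred, hypothesis_pred, freq):
    """Tag an example as consistent, adversarial, or neutral w.r.t. the frequency bias."""
    p = freq.get(premise_pred, 0)
    h = freq.get(hypothesis_pred, 0)
    if p == h:
        return "neutral"  # equal counts: the bias gives no preference
    bias_favors_entail = h > p
    return "consistent" if bias_favors_entail == gold_entails else "adversarial"


# Made-up counts, for illustration only.
toy_freq = {"buy": 100, "own": 500}
print(tag_frequency_bias(True, "buy", "own", toy_freq))   # consistent
print(tag_frequency_bias(False, "buy", "own", toy_freq))  # adversarial
```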
Results
Attestation bias multiplicative effect
Relative frequency bias multiplicative effect
AUC norm drop when attestation bias is adversarial
Who Should Care
What To Try In 7 Days
Add an attestation probe: ask the model whether the hypothesis is 'attested/unknown/false' before trusting outputs.
Run bias-controlled splits: evaluate models on examples adversarial to attestation and frequency biases.
Mask or canonicalize named entities in a staging test to see how much outputs rely on memorized entities.
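The attestation probe in the first item could be sketched as a prompt builder plus a reply parser. The prompt wording, label set handling, and function names are assumptions made for illustration, not the paper's prompts; the call to the model itself is left out because it depends on the provider.

```python
LABELS = ("attested", "unknown", "false")


def build_attestation_prompt(hypothesis: str) -> str:
    """Build a context-free prompt asking whether the model already believes the hypothesis."""
    return (
        "Without any other context, classify the following statement.\n"
        f"Statement: {hypothesis}\n"
        "Answer with one word: attested, unknown, or false.\n"
        "Answer:"
    )


def parse_attestation(reply: str) -> str:
    """Map a free-text reply to one of LABELS, defaulting to 'unknown'."""
    words = reply.strip().lower().split()
    first = words[0].strip(".,:;!\"'") if words else ""
    return first if first in LABELS else "unknown"
```

Hypotheses the probe labels as attested are the ones to check most carefully, since an 'Entail' prediction for them may reflect memory rather than the premise.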
Agent Features
Memory
- Propositional memory (sentence-level memorization)
Reproducibility
Data Urls
- https://github.com/mjhosseini/entgraph_eval/tree/master/LevyHoltDS
- RTE-1 (public RTE corpus)
- https://books.google.com/ngrams (Google N-grams)
- NewsCrawl (Barrault et al., 2019)
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Paper tests two biases but does not claim to cover all hallucination sources.
- Google N-grams is a proxy for pretraining frequency and may not match private pretraining corpora exactly.
- Prompts and few-shot examples are fixed; other prompts might change the magnitude of the effects, though the core trends are reported to persist.
When Not To Use
- Do not rely solely on model outputs for high-stakes inference tasks without bias controls.
- Avoid using raw LLM predictions for knowledge extraction where user-provided context must be the only source of truth.
Failure Modes
- Model asserts hypothesis because it matches memorized training sentences, not because the premise supports it.
- Named entities act as memory indices, causing over-reliance on entity identity instead of predicate logic.
- Models prefer entailment when hypothesis predicates are more corpus-frequent, producing systematic false affirmations.
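To test for the named-entity failure mode, entities can be replaced with neutral placeholders before prompting. The sketch below assumes the entity strings are already known (for example, from an NER step that is not shown); the function name and the `ENTITY_i` placeholder scheme are hypothetical.

```python
import re


def mask_entities(texts, entities):
    """Replace each known entity with the same placeholder across all given texts."""
    # Longest first, so a longer name is masked before any shorter name it contains.
    ordered = sorted(set(entities), key=lambda e: (-len(e), e))
    mapping = {ent: f"ENTITY_{i}" for i, ent in enumerate(ordered)}
    masked = []
    for text in texts:
        for ent in ordered:
            # Lookarounds keep matches on whole tokens, including names ending in punctuation.
            pattern = r"(?<!\w)" + re.escape(ent) + r"(?!\w)"
            text = re.sub(pattern, mapping[ent], text)
        masked.append(text)
    return masked, mapping


masked, mapping = mask_entities(
    ["Google bought YouTube.", "Google owns YouTube."],
    ["Google", "YouTube"],
)
print(masked)  # ['ENTITY_1 bought ENTITY_0.', 'ENTITY_1 owns ENTITY_0.']
```

Passing the premise and hypothesis together keeps the placeholders aligned, so the predicate relation is preserved while the entity identity is hidden. A large change in predictions after masking suggests reliance on memorized entities.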
Core Entities
Models
- LLaMA-65B
- GPT-3.5 (text-davinci-003)
- PaLM-540B
- GPT-4 (analysis in Appendix F)
Metrics
- AUC norm
- Precision
- Recall
- F1
- Folded Entail probability estimates
Datasets
- Levy/Holt (directional NLI)
- RTE-1
- Google N-grams (1950-2019)
- NewsCrawl
Benchmarks
- Natural Language Inference (NLI) directional subset
Context Entities
Models
- Alpaca
- Vicuna
- OPT, GPT-J (omitted)
Datasets
- MMLU (excluded)
- Natural Questions (excluded)
- OpenBookQA (excluded)

