Overview
The paper runs controlled behavioral tests on three major LLM families and identifies two consistent biases rooted in pretraining. Numeric evidence and dataset-controlled splits back the claims.
Citations: 18
Evidence Strength: 0.90
Confidence: 0.85
Risk Signals: 8
Trust Signals
Findings with numeric evidence: 3/3
Findings with evidence refs: 3/3
Results with explicit delta: 3/3
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 20%
Production readiness: 30%
Novelty: 50%
Why It Matters For Business
LLMs can assert conclusions drawn from their training data or corpus statistics rather than the given context. That puts QA, summarization, and policy extraction at risk of silent misinformation; apply attestation checks and bias-controlled tests before deployment.
Summary TLDR
This paper runs controlled prompting tests on LLaMA-65B, GPT-3.5 (text-davinci-003), and PaLM-540B to find two concrete sources of false-positive 'Entail' predictions (hallucinations) in inference tasks. First, LLMs tend to assert conclusions when the hypothesis sentence appears in their training data (attestation bias). Second, they favor entailment when the hypothesis expresses a more frequent predicate than the premise (relative frequency bias). Both biases come from pretraining statistics, and inference performance drops sharply on test examples constructed so that the bias points away from the correct label.
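The idea of test examples designed against the biases can be illustrated with a random-premise control. The sketch below is a simplified approximation, not the authors' pipeline: it assumes a hypothetical `Example` record and a hypothetical `random_premise_control` helper that pairs each hypothesis with a premise borrowed from a different item. Under that assumption the borrowed premise almost never supports the hypothesis, so every control item is labeled NoEntail and any Entail prediction on it counts as a false positive.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    premise: str
    hypothesis: str
    label: str  # "Entail" or "NoEntail"


def random_premise_control(examples: list[Example], seed: int = 0) -> list[Example]:
    """Pair each hypothesis with the premise of a different example (hypothetical helper).

    Assumption: the borrowed premise almost never supports the hypothesis, so each
    control item is labeled NoEntail; any Entail prediction on it is a false positive.
    """
    if len(examples) < 2:
        raise ValueError("need at least two examples to swap premises")
    rng = random.Random(seed)  # seeded for a reproducible control set
    controls = []
    for i, ex in enumerate(examples):
        j = rng.randrange(len(examples) - 1)  # pick a donor index...
        if j >= i:
            j += 1  # ...shifted so it never equals i
        controls.append(Example(examples[j].premise, ex.hypothesis, "NoEntail"))
    return controls


data = [
    Example("A bought B", "A owns B", "Entail"),
    Example("C visited D", "C saw D", "Entail"),
    Example("E sued F", "E met F", "NoEntail"),
]
for item in random_premise_control(data, seed=1):
    print(item)
```

Because the hypothesis is left untouched, a model that still answers Entail on these items is likely relying on what it already "knows" about the hypothesis rather than on the premise.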
Problem Statement
LLMs are trusted for inference tasks (e.g., question answering, summarization), but they sometimes hallucinate by asserting conclusions not supported by provided premises. The paper asks: which pretraining-derived biases cause these false positives, and how much do they harm real NLI performance?
Main Contribution
Show and measure an attestation bias: models more often predict 'Entail' when the hypothesis matches text the model likely saw in pretraining.
Show and measure a relative frequency bias: models favor entailment if the hypothesis predicate is more corpus-frequent than the premise predicate.
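The relative frequency comparison can be sketched as a simple split of (premise predicate, hypothesis predicate) pairs. Here `freq` is a plain dict of made-up toy counts standing in for the Google N-grams frequencies the paper uses as a proxy, and `frequency_condition` / `split_by_frequency` are hypothetical helpers for illustration.

```python
def frequency_condition(premise_pred: str, hypothesis_pred: str, freq: dict[str, int]) -> str:
    """Label which side of the relative-frequency bias a pair falls on."""
    p = freq.get(premise_pred, 0)  # unseen predicates count as 0
    h = freq.get(hypothesis_pred, 0)
    # The bias favors Entail when the hypothesis predicate is more frequent
    return "hyp_more_frequent" if h > p else "prem_at_least_as_frequent"


def split_by_frequency(
    pairs: list[tuple[str, str]], freq: dict[str, int]
) -> dict[str, list[tuple[str, str]]]:
    """Group (premise_pred, hypothesis_pred) pairs by frequency condition."""
    groups: dict[str, list[tuple[str, str]]] = {
        "hyp_more_frequent": [],
        "prem_at_least_as_frequent": [],
    }
    for prem, hyp in pairs:
        groups[frequency_condition(prem, hyp, freq)].append((prem, hyp))
    return groups


# Toy counts for illustration only (not real N-gram data)
toy_freq = {"acquire": 300, "buy": 900, "own": 1200}
print(split_by_frequency([("acquire", "buy"), ("own", "buy")], toy_freq))
```

Comparing a model's false-Entail rate across the two groups gives a rough read on how strongly it follows predicate frequency.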
Key Findings
Attestation (the hypothesis matching a memorized sentence) strongly raises the rate of false-positive Entail predictions.
The relative corpus frequency of the premise and hypothesis predicates biases entailment decisions.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Attestation bias multiplicative effect | 1.9x (LLaMA), 2.2x (GPT-3.5), 2.0x (PaLM) | When hypothesis not attested | ↑ false Entail | Levy/Holt random-premise (I_RandPrem) | Abstract; §5; Fig.2 | §5 |
| Relative frequency bias multiplicative effect | 1.6x (LLaMA), 1.8x (GPT-3.5), 2.0x (PaLM) | When premise frequency ≥ hypothesis frequency | ↑ false Entail | Levy/Holt I_GenArg_RandPrem | Abstract; §7; Fig.3 | §7 |
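One plausible reading of the "multiplicative effect" column is a ratio of false-Entail rates between the biased condition and the baseline condition, measured on items whose gold label is NoEntail (as in a random-premise set). The sketch assumes that reading; the function names are hypothetical and the prediction lists are toy data, not the paper's results.

```python
from collections.abc import Sequence


def false_entail_rate(preds: Sequence[str]) -> float:
    """Share of "Entail" predictions among items whose gold label is NoEntail."""
    if not preds:
        raise ValueError("empty prediction group")
    return sum(p == "Entail" for p in preds) / len(preds)


def multiplicative_effect(cond_preds: Sequence[str], base_preds: Sequence[str]) -> float:
    """How many times more often the biased condition yields a false Entail."""
    base = false_entail_rate(base_preds)
    if base == 0:
        return float("inf")  # ratio undefined; avoid ZeroDivisionError
    return false_entail_rate(cond_preds) / base


# Toy predictions (made up): attested vs. not-attested hypotheses
attested = ["Entail"] * 6 + ["NoEntail"] * 4
not_attested = ["Entail"] * 3 + ["NoEntail"] * 7
print(multiplicative_effect(attested, not_attested))  # 2.0
```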
What To Try In 7 Days
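Add an attestation probe: before trusting an entailment output, ask the model about the hypothesis alone (no premise) and record whether it answers true, false, or unknown; a "true" answer suggests the hypothesis is attested in its training data.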
Run bias-controlled splits: evaluate models on examples adversarial to attestation and frequency biases.
Mask or canonicalize named entities in a staging test to see how much outputs rely on memorized entities.
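The attestation probe and the bias-controlled split from the list above can be sketched together. The prompt wording, the one-word answer format, and all helper names are assumptions for illustration; the model call is left out, so the reply strings would come from whichever LLM client is in use.

```python
import re


def attestation_probe_prompt(hypothesis: str) -> str:
    """Hypothesis-only prompt: with no premise, 'true' reflects prior belief."""
    return (
        "Is the following statement true, false, or unknown? "
        "Answer with one word.\n"
        f"Statement: {hypothesis}\n"
        "Answer:"
    )


def parse_probe_answer(text: str) -> str:
    """Naive parser: first of true/false/unknown in the reply, else 'unknown'."""
    match = re.search(r"\b(true|false|unknown)\b", text.lower())
    return match.group(1) if match else "unknown"


def is_attestation_adversarial(gold_label: str, probe_answer: str) -> bool:
    """True when the attestation signal points away from the gold label."""
    attested = probe_answer == "true"
    return (attested and gold_label == "NoEntail") or (not attested and gold_label == "Entail")


# (gold label, raw probe reply) pairs; replies are made up
items = [
    ("Entail", "True."),
    ("NoEntail", "true"),
    ("Entail", "Unknown"),
    ("NoEntail", "false"),
]
adversarial = [
    i for i, (gold, reply) in enumerate(items)
    if is_attestation_adversarial(gold, parse_probe_answer(reply))
]
print(adversarial)  # [1, 2]
```

Scoring a model separately on the adversarial indices and on the rest shows how much of its accuracy depends on agreeing with memorized statements.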
Agent Features
Memory
Risks & Boundaries
Limitations
The paper tests only two biases and does not claim to cover all hallucination sources.
Google N-grams frequencies serve as a proxy for pretraining frequency and may not match the models' private pretraining corpora.
When Not To Use
Do not rely solely on model outputs for high-stakes inference tasks without bias controls.
Avoid using raw LLM predictions for knowledge extraction where user-provided context must be the only source of truth.
Failure Modes
Model asserts hypothesis because it matches memorized training sentences, not because the premise supports it.
Named entities act as memory indices, causing over-reliance on entity identity instead of predicate logic.
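The entity-indexing failure mode suggests a simple staging check: replace named entities with generic type placeholders and compare the model's answers on the original and masked inputs. This sketch assumes the caller already has an entity-to-placeholder mapping (in practice an NER tagger would supply it); `mask_entities` is a hypothetical helper, not part of the paper's code.

```python
import re


def mask_entities(sentence: str, placeholders: dict[str, str]) -> str:
    """Replace each listed entity with its placeholder in a single pass."""
    if not placeholders:
        return sentence
    # Longest names first so "New York Times" wins over "New York"
    names = sorted(placeholders, key=len, reverse=True)
    # Lookarounds stop matches inside longer words, e.g. "Googleplex"
    pattern = re.compile(r"(?<!\w)(?:" + "|".join(map(re.escape, names)) + r")(?!\w)")
    return pattern.sub(lambda m: placeholders[m.group(0)], sentence)


print(mask_entities("Google bought YouTube in 2006.",
                    {"Google": "a company", "YouTube": "a website"}))
# a company bought a website in 2006.
```

If an answer flips between the original and masked versions, the model is probably leaning on what it remembers about the entities rather than on the premise.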

