Pretraining memory and corpus-frequency biases drive much of LLM hallucination on inference tasks

May 23, 2023 · 7 min read

Overview

Decision Snapshot: Needs Validation

The paper runs controlled behavioral tests across three major LLM families and shows consistent biases rooted in pretraining; numeric evidence and dataset-controlled splits back the claims.

Citations: 18

Evidence Strength: 0.90

Confidence: 0.85

Risk Signals: 8

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 3/3

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 20%

Production readiness: 30%

Novelty: 50%

Authors

Nick McKenna, Tianyi Li, Liang Cheng, Mohammad Javad Hosseini, Mark Johnson, Mark Steedman

Why It Matters For Business

LLMs can assert conclusions drawn from their training data or corpus statistics rather than the given context. That puts QA, summarization, and policy extraction at risk of silent misinformation; apply attestation checks and bias-controlled tests before deployment.

Summary (TL;DR)

This paper runs controlled prompting tests on LLaMA-65B, GPT-3.5 (text-davinci-003), and PaLM-540B to identify two concrete sources of false-positive 'entailment' predictions (hallucinations) in inference tasks. First, LLMs tend to predict entailment when the hypothesis is attested in their training data, regardless of the premise (attestation bias). Second, they favor entailment when the hypothesis predicate is more frequent in the training corpus than the premise predicate (relative frequency bias). Both biases stem from pretraining statistics, and the models perform significantly worse on test examples that run against these biases than on examples that conform to them.

Problem Statement

LLMs are trusted for inference tasks (e.g., question answering, summarization), but they sometimes hallucinate by asserting conclusions not supported by provided premises. The paper asks: which pretraining-derived biases cause these false positives, and how much do they harm real NLI performance?

Main Contribution

Show and measure an attestation bias: models more often predict 'Entail' when the hypothesis matches text the model likely saw in pretraining.

Show and measure a relative frequency bias: models favor entailment if the hypothesis predicate is more corpus-frequent than the premise predicate.

Key Findings

Attestation (the hypothesis being a memorized sentence) strongly raises the rate of false-positive entailments.

Numbers: False Entail predictions are 1.9x (LLaMA), 2.2x (GPT-3.5), and 2.0x (PaLM) more likely when the hypothesis is attested.

Practical Use: If a hypothesis appears in pretraining data, the model may assert it regardless of the premise, so probe whether the model treats the hypothesis as attested before trusting entailment outputs.

Evidence Ref: Abstract; §5; Fig. 2

The relative corpus frequency of the premise and hypothesis predicates biases entailment decisions.

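Numbers: False Entail predictions are 1.6x (LLaMA), 1.8x (GPT-3.5), and 2.0x (PaLM) more likely when the hypothesis predicate is more frequent than the premise predicate.

Practical Use: When the hypothesis uses a more common predicate than the premise, expect the model to wrongly affirm entailment more often; frequency-controlled test splits expose how much this error source affects a given model.

Evidence Ref: Abstract; §7; Fig. 3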
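A minimal sketch of how a frequency-controlled split could be built, assuming each example records its premise and hypothesis predicates and that predicate counts come from some corpus-frequency table (the paper uses Google N-grams as its proxy). The `Example` layout, function names, counts, and predictions below are hypothetical placeholders, not data or code from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Example:
    premise_pred: str     # predicate of the premise, e.g. "praise"
    hypothesis_pred: str  # predicate of the hypothesis, e.g. "defeat"
    gold_entail: bool     # gold label from the dataset
    pred_entail: bool     # model's prediction


def split_by_relative_frequency(
    examples: List[Example], freq: Dict[str, int]
) -> Tuple[List[Example], List[Example]]:
    """Return (hypothesis_more_frequent, premise_at_least_as_frequent).

    Predicates missing from `freq` are counted as 0 (an assumption of this sketch).
    """
    hyp_more, premise_geq = [], []
    for ex in examples:
        p = freq.get(ex.premise_pred, 0)
        h = freq.get(ex.hypothesis_pred, 0)
        (hyp_more if h > p else premise_geq).append(ex)
    return hyp_more, premise_geq


def false_entail_rate(examples: List[Example]) -> float:
    """Share of gold non-entailments that the model labels Entail."""
    negatives = [ex for ex in examples if not ex.gold_entail]
    if not negatives:
        raise ValueError("no gold non-entailments in this bucket")
    return sum(ex.pred_entail for ex in negatives) / len(negatives)


# Toy data: counts and predictions are made up for illustration only.
TOY_FREQ = {"praise": 2_000, "defeat": 9_000, "elect": 12_000, "criticize": 3_000}
TOY_EXAMPLES = [  # random premises, so every gold label is non-entailment
    Example("praise", "defeat", False, True),
    Example("criticize", "elect", False, True),
    Example("defeat", "praise", False, False),
    Example("elect", "criticize", False, True),
]

if __name__ == "__main__":
    hyp_more, premise_geq = split_by_relative_frequency(TOY_EXAMPLES, TOY_FREQ)
    print(false_entail_rate(hyp_more), false_entail_rate(premise_geq))  # 1.0 0.5
```

The bucket boundary (strictly greater vs. greater-or-equal) mirrors the baseline condition "premise frequency ≥ hypothesis frequency"; a markedly higher false-Entail rate in the first bucket on real predictions would be consistent with the relative frequency bias.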

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
--- | --- | --- | --- | --- | --- | ---
Attestation bias multiplicative effect | 1.9x (LLaMA), 2.2x (GPT-3.5), 2.0x (PaLM) | Hypothesis not attested | ↑ false Entail | Levy/Holt random-premise (I_RandPrem) | Abstract; §5; Fig. 2 | §5
Relative frequency bias multiplicative effect | 1.6x (LLaMA), 1.8x (GPT-3.5), 2.0x (PaLM) | Premise frequency ≥ hypothesis frequency | ↑ false Entail | Levy/Holt I_GenArg_RandPrem | Abstract; §7; Fig. 3 | §7
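The "multiplicative effect" values can be read as a ratio of false-Entail rates between two groups of examples that are non-entailing by construction (random premises): one group where the condition holds (e.g., the hypothesis is attested) and a baseline group where it does not. A minimal sketch, assuming each record stores whether the condition holds and whether the model answered Entail; the function name, record layout, and toy values are hypothetical, and the paper's exact estimation procedure may differ.

```python
from typing import Iterable, Tuple


def multiplicative_effect(records: Iterable[Tuple[bool, bool]]) -> float:
    """Ratio P(Entail | condition) / P(Entail | not condition).

    Each record is (condition_holds, model_said_entail) for an example whose gold
    label is non-entailment, e.g. (hypothesis_attested, predicted_entail) on a
    random-premise split, so every Entail prediction is a false positive.
    """
    entail = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for condition, said_entail in records:
        total[condition] += 1
        entail[condition] += int(said_entail)
    if total[True] == 0 or total[False] == 0:
        raise ValueError("need examples both with and without the condition")
    if entail[False] == 0:
        raise ValueError("baseline false-Entail rate is zero; ratio undefined")
    rate_with = entail[True] / total[True]
    rate_without = entail[False] / total[False]
    return rate_with / rate_without


# Toy records (made up): 8 attested hypotheses with 4 Entail answers,
# 8 unattested hypotheses with 2 Entail answers.
TOY_RECORDS = (
    [(True, True)] * 4 + [(True, False)] * 4
    + [(False, True)] * 2 + [(False, False)] * 6
)

if __name__ == "__main__":
    print(multiplicative_effect(TOY_RECORDS))  # 2.0
```

Under this reading, a value of 1.9x means the model says Entail 1.9 times as often on attested hypotheses as on unattested ones, even though neither group is supported by its premise.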

What To Try In 7 Days

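Add an attestation probe: ask the model, without the premise, whether it considers the hypothesis attested (true), false, or unknown, and treat Entail outputs on attested hypotheses with suspicion.

Run bias-controlled splits: evaluate models separately on examples that conform to the attestation and frequency biases and on examples that run against them, and compare the scores.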

Mask or canonicalize named entities in a staging test to see how much outputs rely on memorized entities.
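The attestation probe and entity masking above could be wired into a staging harness roughly as follows. This is a sketch under stated assumptions: `ask_model` stands in for whatever LLM client is in use (hypothetical), the prompt wording is illustrative rather than the paper's prompt, `stub_model` is a toy stand-in, and masking uses a caller-supplied entity-to-type map instead of a real named-entity recognizer.

```python
import re
from typing import Callable, Dict

# Hypothetical client signature: takes a prompt string, returns the model's reply.
AskModel = Callable[[str], str]


def is_attested(hypothesis: str, ask_model: AskModel) -> bool:
    """Query the hypothesis alone (no premise); a 'true' reply is taken as a
    sign the model treats the statement as attested in its memory."""
    prompt = (
        "Is the following statement true, false, or unknown?\n"
        f"{hypothesis}\nAnswer:"
    )
    return ask_model(prompt).strip().lower().startswith("true")


def entailment_with_attestation_flag(premise: str, hypothesis: str, ask_model: AskModel) -> dict:
    """Ask for entailment, then flag Entail answers whose hypothesis is attested."""
    prompt = (
        f"Premise: {premise}\nHypothesis: {hypothesis}\n"
        "Does the premise entail the hypothesis? Answer yes or no.\nAnswer:"
    )
    entail = ask_model(prompt).strip().lower().startswith("yes")
    attested = is_attested(hypothesis, ask_model)
    # An Entail answer on an attested hypothesis may come from memory, not the premise.
    return {"entail": entail, "attested": attested, "needs_review": entail and attested}


def mask_entities(text: str, entity_types: Dict[str, str]) -> str:
    """Replace known entity mentions with type placeholders (e.g. 'Google' -> 'organization').
    Longer names go first so 'New York City' is replaced before 'New York'."""
    for name in sorted(entity_types, key=len, reverse=True):
        placeholder = entity_types[name]
        # A function replacement avoids backslash handling in the placeholder text.
        text = re.sub(rf"\b{re.escape(name)}\b", lambda _m, p=placeholder: p, text)
    return text


def stub_model(prompt: str) -> str:
    """Toy stand-in for a real LLM: calls every statement true and every pair entailed."""
    return "true" if "true, false, or unknown" in prompt else "yes"


if __name__ == "__main__":
    # Random premise, attested hypothesis: the Entail answer gets flagged.
    print(entailment_with_attestation_flag(
        "Apple hired a new CEO.", "Google was founded in 1998.", stub_model))
    print(mask_entities("Google acquired YouTube.",
                        {"Google": "organization", "YouTube": "organization"}))
    # prints "organization acquired organization."
```

Comparing Entail rates on original versus masked inputs gives a rough signal of how much a model leans on entity identity rather than on the predicates.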

Agent Features

Memory
Propositional memory (sentence-level memorization)

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Partial
License: Unknown

Risks & Boundaries

Limitations

The paper tests two biases and does not claim to cover all sources of hallucination.

Google N-grams is a proxy for pretraining frequency and may not match private pretraining corpora exactly.

When Not To Use

Do not rely solely on model outputs for high-stakes inference tasks without bias controls.

Avoid using raw LLM predictions for knowledge extraction where user-provided context must be the only source of truth.

Failure Modes

Model asserts hypothesis because it matches memorized training sentences, not because the premise supports it.

Named entities act as memory indices, causing over-reliance on entity identity instead of predicate logic.

Core Entities

Models

LLaMA-65B, GPT-3.5 (text-davinci-003), PaLM-540B, GPT-4 (analysis in Appendix F)

Metrics

AUC_norm, Precision, Recall, F1, Folded Entail probability estimates

Datasets

Levy/Holt (directional NLI), RTE-1, Google N-grams (1950-2019), NewsCrawl

Benchmarks

Natural Language Inference (NLI) directional subset

Context Entities

Models

Alpaca, Vicuna, OPT and GPT-J (omitted)

Datasets

MMLU (excluded), Natural Questions (excluded), OpenBookQA (excluded)