Overview
High conceptual clarity and comprehensive literature coverage. Theoretical inevitability claim is strong but depends on the computability formalism; empirical effect sizes vary by task and benchmark.
Citations: 0
Evidence Strength: 0.70
Confidence: 0.86
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 4/7
Findings with evidence refs: 7/7
Results with explicit delta: 0/5
Reproducibility
Status: No open assets linked
Open source: Unknown
At A Glance
Cost impact: 70%
Production readiness: 40%
Novelty: 60%
Why It Matters For Business
Hallucinations can damage trust, cause legal/financial harm, and break workflows. Because some hallucination is inevitable, companies must design systems that detect, ground, and escalate risky outputs rather than assuming perfect model truthfulness.
Summary TLDR
This 55-page survey formalizes what "hallucination" means for LLMs, argues (with computability proofs) that some hallucination is inevitable, maps many hallucination subtypes (factual, intrinsic/extrinsic, temporal, ethical, code, multimodal, etc.), surveys benchmarks and metrics, and reviews mitigation patterns (RAG, tool calls, fine-tuning, guardrails, human-in-the-loop review). Practical message: you cannot fully eliminate hallucination, so focus on detection, grounding, hybrid safeguards, and human oversight.
Problem Statement
LLMs often generate plausible but incorrect or fabricated content. The field lacks a unified taxonomy and reliable, task-aware evaluation methods. The paper asks: what kinds of hallucination exist, why do they happen, how should we measure them, and how should practitioners mitigate their harms?
Main Contribution
A formal definition of hallucination and theoretical proofs that hallucination is unavoidable for computable LLMs.
A detailed, practical taxonomy that distinguishes intrinsic from extrinsic and factuality from faithfulness hallucinations, then breaks these down into concrete subtypes (temporal, ethical, amalgamated, code, multimodal).
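To make the inevitability claim concrete, here is a minimal sketch of the diagonalization argument commonly used for this kind of result. We assume, without confirmation from this summary, that the survey's formalism has this shape; the symbols $\mathcal{S}$, $f$, and $h_i$ are our own illustrative notation, and each model is assumed to be a total computable function from input strings to output strings.

```latex
\begin{align*}
&\text{Inputs: } \mathcal{S} = \{s_0, s_1, s_2, \dots\}, \quad \text{ground truth } f : \mathcal{S} \to \mathcal{S}. \\
&h \text{ hallucinates w.r.t. } f \iff \exists\, s \in \mathcal{S} : h(s) \neq f(s). \\
&\text{Computable models are countable, so list them as } h_0, h_1, h_2, \dots \\
&\text{Choose } f(s_i) \text{ to be any string with } f(s_i) \neq h_i(s_i) \quad (i = 0, 1, 2, \dots). \\
&\Rightarrow\; \forall i :\; h_i(s_i) \neq f(s_i), \text{ so every } h_i \text{ hallucinates on at least the input } s_i.
\end{align*}
```

The argument only shows that, for any such model, some ground truth exists on which it errs; it says nothing about how often errors occur on realistic tasks. That gap is exactly the computability caveat noted under Limitations.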
Key Findings
Hallucination is provably unavoidable for computable LLMs.
Logical inconsistencies form a non-trivial share of hallucinations (19% of cases in the aggregated analyses the survey cites).
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Logical inconsistency share | 19% | — | — | Aggregated hallucination analyses (cited sources) | Section 4.4 (19% of cases; cites [42;47;34;95]) | [42;47;34;95] |
| Temporal disorientation share | 12% | — | — | Aggregated hallucination analyses (cited sources) | Section 4.5 (12% of cases; cites [47;51]) | [47;51] |
What To Try In 7 Days
Add a simple retrieval step (search Wikipedia or internal docs) before answering time-sensitive user queries.
Surface source links and timestamps on model claims so users can verify quickly.
Implement a rule-based fallback: when confidence is low, ask for clarification or route to human review instead of inventing answers.
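The three steps above can be combined into one small pipeline. The sketch below is a minimal illustration under stated assumptions: a toy keyword-overlap retriever stands in for real search over Wikipedia or internal docs; the `Doc` and `Answer` types, the `answer_with_grounding` function, the 0.5 threshold, and the `intranet.example` URLs are all hypothetical; and the LLM call is replaced by returning the best-matching passage verbatim. It retrieves first, cites each supporting source with its URL and last-updated date, and escalates to human review when no document covers enough of the query.

```python
# Hypothetical sketch: names, threshold, and URLs are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import date

# Words ignored when matching queries to documents (tiny illustrative list).
STOPWORDS = {"a", "an", "the", "is", "are", "what", "how", "to", "of", "in", "for", "do", "does"}
PUNCT = ".,?!:;\"'()"


@dataclass
class Doc:
    title: str
    url: str
    updated: date  # last-modified timestamp surfaced to users
    text: str


@dataclass
class Answer:
    text: str
    sources: list[str] = field(default_factory=list)
    escalated: bool = False


def tokenize(s: str) -> set[str]:
    """Lowercased content words with surrounding punctuation stripped."""
    words = (w.strip(PUNCT).lower() for w in s.split())
    return {w for w in words if w and w not in STOPWORDS}


def retrieve(query: str, docs: list[Doc]) -> list[tuple[float, Doc]]:
    """Toy lexical retriever: score = share of query words found in the doc."""
    q = tokenize(query)
    scored = [(len(q & tokenize(d.text)) / len(q) if q else 0.0, d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def answer_with_grounding(query: str, docs: list[Doc], threshold: float = 0.5) -> Answer:
    hits = [(score, d) for score, d in retrieve(query, docs) if score >= threshold]
    if not hits:
        # Low-confidence fallback: escalate instead of letting the model guess.
        return Answer(text="No reliable source found; routing to human review.", escalated=True)
    # Cite every supporting document with its URL and last-updated date.
    sources = [f"{d.title} ({d.url}, updated {d.updated.isoformat()})" for _, d in hits]
    # Placeholder for an LLM call conditioned on the retrieved passage;
    # here the best passage is returned verbatim (an extractive answer).
    return Answer(text=hits[0][1].text, sources=sources)


DOCS = [
    Doc("Refund policy", "https://intranet.example/refunds", date(2024, 3, 1),
        "Customers can request a refund within 30 days of purchase."),
    Doc("Shipping FAQ", "https://intranet.example/shipping", date(2023, 11, 15),
        "Standard shipping takes 5 business days."),
]

print(answer_with_grounding("How many days to request a refund?", DOCS).sources)
# ['Refund policy (https://intranet.example/refunds, updated 2024-03-01)']
print(answer_with_grounding("What is the CEO's salary?", DOCS).escalated)
# True
```

A lexical overlap score is a crude confidence signal; in practice you would likely swap in embedding retrieval and calibrate the threshold on your own queries, but the control flow (retrieve, cite, or escalate) stays the same.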
Agent Features: Tool Use
Optimization Features: System Optimization, Training Optimization, Inference Optimization
Risks & Boundaries
Limitations
Theoretical inevitability is proven in a formal computability setting; practical models and tasks may behave differently.
Benchmarks are fragmented and task-dependent; no single metric detects all hallucination types.
When Not To Use
Do not rely on vanilla LLM outputs alone for high-stakes medical, legal, or financial decisions.
Avoid autonomous deployment without retrieval grounding and human oversight for safety-critical tasks.
Failure Modes
Overconfidence: fluent but incorrect outputs that mislead users.
Adversarial or false-premise prompts that lead the model to elaborate on fabricated details.

