Overview
The method provides clear diagnostics (activation gaps), a causal intervention, and measurable gains on constructed and out-of-domain benchmarks. Replication requires access to controlled retrieval data and moderate compute.
Citations: 0
Evidence Strength: 0.80
Confidence: 0.86
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 4/4
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 70%
Novelty: 70%
Why It Matters For Business
ParamMute reduces hallucinations in RAG systems by suppressing FFNs that inject memorized facts, giving more reliable, evidence-aligned outputs with a plug-and-play finetuning path.
Summary TLDR
RAG systems still hallucinate when internal model memory (parametric knowledge) overrides retrieved evidence. The authors find a narrow set of mid-to-deep FFN sublayers (called UA-FFNs) that are over-activated during unfaithful outputs. ParamMute: (1) identifies those UA-FFNs, (2) suppresses their activation, and (3) finetunes the suppressed model with a preference objective to favor retrieved context. On the introduced CoFaithfulQA benchmark and ConFiQA, ParamMute raises contextual recall and lowers memory recall (e.g., LLaMA3-8B: ConR 63.37→69.54, MemR 10.89→6.18). Code is available.
Problem Statement
Even with retrieval, LLMs can ignore accurate evidence and produce answers driven by internal memorized facts. The paper shows that over-activation of a small subset of FFN sublayers causes this behavior. The practical problem: how to reduce internal memory dominance so RAG outputs follow retrieved evidence.
Main Contribution
Identified Unfaithfulness-Associated FFNs (UA-FFNs): mid-to-deep FFN sublayers (e.g., layers ~20–29) whose high activation correlates with unfaithful outputs.
ParamMute method: select top-N UA-FFNs, apply soft/full suppression (activation scaling λ), then finetune with knowledge-augmented and max-margin preference objectives to favor retrieved evidence.
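The suppression step can be pictured with a toy residual stack. This is a minimal sketch, not the authors' implementation: it assumes suppression means multiplying the output of each selected FFN sublayer by λ before it is added back to the residual stream, and `ffn`, `forward`, and `muted` are hypothetical names introduced only for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def ffn(x: Vector) -> Vector:
    """Toy stand-in for an FFN sublayer (assumption): ReLU, then scale by 2."""
    return [2.0 * max(0.0, v) for v in x]


def forward(x: Vector, num_layers: int, muted: Sequence[int], lam: float) -> Vector:
    """Run a residual stack, scaling the FFN output of `muted` layers by `lam`.

    lam = 1.0 leaves a layer untouched, lam = 0.0 mutes it fully, and values
    in between correspond to soft suppression.
    """
    h = list(x)
    for layer in range(num_layers):
        scale = lam if layer in muted else 1.0
        update = ffn(h)
        # Residual connection: only the FFN contribution is scaled.
        h = [hi + scale * ui for hi, ui in zip(h, update)]
    return h


if __name__ == "__main__":
    x = [1.0, -1.0]
    print(forward(x, num_layers=2, muted=[], lam=1.0))      # no suppression
    print(forward(x, num_layers=2, muted=[1], lam=0.5))     # soft suppression of layer 1
    print(forward(x, num_layers=2, muted=[0, 1], lam=0.0))  # full suppression
```

With full suppression of every layer the input passes through unchanged, which makes the effect of λ easy to inspect before trying the same idea on a real model.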
Key Findings
A narrow set of mid-to-deep FFN layers (around layers 20–29) shows higher activation in unfaithful responses.
Causally suppressing UA-FFNs makes unfaithful outputs harder to produce (NLL increases as suppression grows).
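The activation-gap diagnostic behind the first finding can be sketched as follows. The gap definition here (mean activation on unfaithful responses minus mean activation on faithful ones, per layer) is an assumption made for illustration, as are the function names and the toy numbers; the paper's exact statistic may differ.

```python
from statistics import mean
from typing import Dict, List


def activation_gap(
    faithful: Dict[int, List[float]],
    unfaithful: Dict[int, List[float]],
) -> Dict[int, float]:
    """Per-layer gap: mean unfaithful activation minus mean faithful activation."""
    return {
        layer: mean(unfaithful[layer]) - mean(faithful[layer])
        for layer in faithful
    }


def top_n_layers(gaps: Dict[int, float], n: int) -> List[int]:
    """Return the n layers with the largest gap, sorted by layer index."""
    ranked = sorted(gaps, key=lambda layer: gaps[layer], reverse=True)
    return sorted(ranked[:n])


if __name__ == "__main__":
    # Hypothetical per-layer activation magnitudes, one value per response.
    faithful = {0: [1.0, 1.0], 1: [1.0, 3.0], 2: [2.0, 2.0]}
    unfaithful = {0: [1.0, 1.0], 1: [4.0, 4.0], 2: [3.0, 3.0]}
    gaps = activation_gap(faithful, unfaithful)
    print(gaps)
    print(top_n_layers(gaps, n=2))
```

Layers with the largest positive gap are the candidates for suppression; a gap near zero suggests the layer behaves the same on both kinds of responses.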
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| ConR (context recall) | 69.54 | 63.37 (LLaMA3-8B vanilla-RAG) | +6.17 | CoFaithfulQA (average across subsets, LLaMA3-8B) | Table 7 reports LLaMA3-8B ConR 63.37→69.54 after ParamMute | Table 7 |
| MemR (memory recall) | 6.18 | 10.89 (LLaMA3-8B vanilla-RAG) | -4.71 | CoFaithfulQA (LLaMA3-8B) | Table 7 shows MemR drop from 10.89 to 6.18 for LLaMA3-8B | Table 7 |
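
The Delta column is simply the ParamMute value minus the vanilla-RAG baseline. A short check of the two rows above, using only the numbers in the table (the `delta` helper is a hypothetical name):

```python
def delta(baseline: float, value: float) -> float:
    """Difference between the reported value and its baseline, to two decimals."""
    return round(value - baseline, 2)


# (metric, baseline, value) taken from the table above.
rows = [
    ("ConR", 63.37, 69.54),
    ("MemR", 10.89, 6.18),
]

if __name__ == "__main__":
    for metric, baseline, value in rows:
        print(f"{metric}: {delta(baseline, value):+.2f}")
```

A positive delta on ConR and a negative delta on MemR both point in the desired direction: more answers follow the retrieved context and fewer follow memorized facts.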
What To Try In 7 Days
Measure layer-wise FFN activation gap between faithful/unfaithful outputs using self-consistency filtering.
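Apply FFN suppression (scale activations by λ) to the top-N UA-FFNs. Start with N=8 and λ=0.0, the reported setting; note that λ=0.0 mutes the selected sublayers fully, while values between 0 and 1 give soft suppression.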
Finetune the suppressed model with a knowledge-augmented likelihood plus a max-margin preference loss; use LoRA to save compute and speed testing.
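For the finetuning step, one common way to write a max-margin preference term is a hinge on the log-probability difference between the context-supported answer and the memory-driven answer. The sketch below assumes that form; the margin, the weighting, and every function and parameter name are illustrative choices rather than the paper's exact objective.

```python
def max_margin_loss(logp_context: float, logp_memory: float, margin: float = 1.0) -> float:
    """Hinge loss: zero once the context answer beats the memory answer by `margin`."""
    return max(0.0, margin - (logp_context - logp_memory))


def total_loss(
    nll_context: float,
    logp_context: float,
    logp_memory: float,
    margin: float = 1.0,
    weight: float = 1.0,
) -> float:
    """Likelihood term on the context answer plus a weighted preference term."""
    return nll_context + weight * max_margin_loss(logp_context, logp_memory, margin)


if __name__ == "__main__":
    # Context answer already preferred by more than the margin: no penalty.
    print(max_margin_loss(logp_context=-1.0, logp_memory=-3.0))
    # Memory answer preferred: penalty grows with the size of the violation.
    print(max_margin_loss(logp_context=-3.0, logp_memory=-1.0))
    print(total_loss(nll_context=0.5, logp_context=-3.0, logp_memory=-1.0))
```

In a real run the log-probabilities would come from the suppressed model, and with LoRA only the adapter weights would receive gradients from this loss.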
Risks & Boundaries
Limitations
CoFaithfulQA is built under a controlled setting where retrieved context is guaranteed sufficient; it does not evaluate retrieval failures.
Suppression acts at FFN sublayer granularity; finer neuron-level interventions may be needed for smaller side effects.
When Not To Use
If retrieval quality is poor or documents miss the answer (retrieval failure scenarios).
For closed-book tasks where internal knowledge is the desired signal.
Failure Modes
Over-suppression (too many layers or very low λ) can hurt contextual grounding and accuracy.
Incorrect UA-FFN identification could suppress useful computation and reduce overall quality.

