ParamMute mutes specific FFN layers so RAG models rely on retrieved evidence, not internal memorized facts

February 21, 2025 · 8 min

Overview

Decision Snapshot: Ready For Pilot

Method provides clear diagnostics (activation gaps), a causal intervention, and measurable gains on constructed and out-of-domain benchmarks; replication needs access to controlled retrieval data and moderate compute.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.86

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 4/4

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 70%

Novelty: 70%

Authors

Pengcheng Huang, Zhenghao Liu, Yukun Yan, Haiyan Zhao, Xiaoyuan Yi, Hao Chen, Zhiyuan Liu, Maosong Sun, Tong Xiao, Ge Yu, Chenyan Xiong

Links

Abstract / PDF / Code

Why It Matters For Business

ParamMute reduces hallucinations in RAG systems by suppressing FFNs that inject memorized facts, giving more reliable, evidence-aligned outputs with a plug-and-play finetuning path.

Who Should Care

Summary TLDR

RAG systems still hallucinate when internal model memory (parametric knowledge) overrides retrieved evidence. The authors find a narrow set of mid-to-deep FFN sublayers (called UA-FFNs) that are over-activated during unfaithful outputs. ParamMute: (1) identifies those UA-FFNs, (2) suppresses their activation, and (3) finetunes the suppressed model with a preference objective to favor retrieved context. On the introduced CoFaithfulQA benchmark and ConFiQA, ParamMute raises contextual recall and lowers memory recall (e.g., LLaMA3-8B: ConR 63.37→69.54, MemR 10.89→6.18). Code is available.

Problem Statement

Even with retrieval, LLMs can ignore accurate evidence and produce answers driven by internal memorized facts. The paper shows that over-activation of a small subset of FFN sublayers causes this behavior. The practical problem: how to reduce internal memory dominance so RAG outputs follow retrieved evidence.

Main Contribution

Identified Unfaithfulness-Associated FFNs (UA-FFNs): mid-to-deep FFN sublayers (e.g., layers ~20–29) whose high activation correlates with unfaithful outputs.

ParamMute method: select top-N UA-FFNs, apply soft/full suppression (activation scaling λ), then finetune with knowledge-augmented and max-margin preference objectives to favor retrieved evidence.
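The suppression step can be illustrated with a toy residual stack. This is a minimal sketch, not the authors' implementation: it assumes suppression means multiplying a selected FFN sublayer's output by λ before it is added to the residual stream, and the names `forward_with_muting`, `make_ffn`, and `toy_layers` are hypothetical.

```python
from typing import Callable, List, Set

Vector = List[float]


def forward_with_muting(
    hidden: Vector,
    ffn_layers: List[Callable[[Vector], Vector]],
    muted: Set[int],
    lam: float,
) -> Vector:
    """Run a toy residual stack, scaling the FFN output of muted layers by lam.

    lam=1.0 leaves a layer untouched, 0 < lam < 1 is soft suppression,
    and lam=0.0 removes the layer's FFN contribution entirely.
    """
    for idx, ffn in enumerate(ffn_layers):
        update = ffn(hidden)
        scale = lam if idx in muted else 1.0
        # Residual connection: only the FFN update is scaled, not the stream.
        hidden = [h + scale * u for h, u in zip(hidden, update)]
    return hidden


def make_ffn(k: int) -> Callable[[Vector], Vector]:
    """Toy stand-in for an FFN: adds the constant k to every dimension."""
    return lambda h: [float(k)] * len(h)


# Four toy layers contributing 1, 2, 3 and 4 respectively.
toy_layers = [make_ffn(k + 1) for k in range(4)]

full = forward_with_muting([0.0, 0.0], toy_layers, muted=set(), lam=1.0)
muted_out = forward_with_muting([0.0, 0.0], toy_layers, muted={2, 3}, lam=0.0)
soft_out = forward_with_muting([0.0, 0.0], toy_layers, muted={2, 3}, lam=0.5)
```

In a real model the same scaling would typically be attached to the chosen FFN modules (for example through a forward hook) rather than written into the forward loop.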

Key Findings

A narrow set of mid-to-deep FFN layers (around layers 20–29) show higher activation in unfaithful responses.

Numbers: activation gap concentrated in layers 20–29; PCC p<0.05 (Section 2.2, Fig. 1b)

Practical Use: Inspect layer-wise FFN activation; target mid-to-deep FFNs for mitigation rather than broad pruning.

Evidence Ref: Section 2.2, Figure 1(b)
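A layer-wise activation gap of this kind can be computed along the following lines. This is a sketch under assumptions: it takes the gap to be the mean FFN activation on unfaithful samples minus the mean on faithful samples, and the toy numbers and the function names `activation_gap` and `top_n_layers` are illustrative only, not taken from the paper.

```python
from statistics import mean
from typing import List


def activation_gap(
    faithful: List[List[float]], unfaithful: List[List[float]]
) -> List[float]:
    """Per-layer gap: mean activation on unfaithful minus faithful samples.

    Each argument holds one list of per-sample activation magnitudes per layer.
    """
    return [mean(u) - mean(f) for f, u in zip(faithful, unfaithful)]


def top_n_layers(gaps: List[float], n: int) -> List[int]:
    """Indices of the n layers with the largest gap, in ascending layer order."""
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return sorted(ranked[:n])


# Made-up activations for a 4-layer toy model (two samples per layer).
faithful = [[1.0, 1.0], [1.0, 1.5], [1.0, 1.0], [1.0, 1.0]]
unfaithful = [[1.0, 1.0], [1.25, 1.25], [1.5, 2.0], [1.5, 1.5]]

gaps = activation_gap(faithful, unfaithful)
candidates = top_n_layers(gaps, n=2)
```

The layers returned by `top_n_layers` are the candidates to inspect first; a significance check such as the reported Pearson correlation would be a separate step.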

Causally suppressing UA-FFNs makes unfaithful outputs harder to produce (NLL increases as suppression grows).

Numbers: NLL on unfaithful subset rises monotonically as λ decreases; max at λ=0.0 (A.3, Fig. 3)

Practical Use: Use activation scaling (λ) on selected FFNs to reduce the chance of hallucinated answers.

Evidence Ref: Appendix A.3, Figure 3
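The direction of this effect can be reproduced in a toy setting. The sketch below does not reproduce Figure 3: it assumes a two-token vocabulary where the final logits are a made-up context contribution plus λ times a made-up contribution from the suppressed FFNs, and it tracks the NLL of the memorized (unfaithful) token as λ is swept from 1.0 to 0.0.

```python
import math
from typing import List


def nll(logits: List[float], target: int) -> float:
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


# Hypothetical vocabulary: index 0 = context-supported answer,
# index 1 = memorized answer. All numbers are illustrative.
context_logits = [2.0, 0.0]  # assumed contribution from the rest of the network
ffn_logits = [0.0, 3.0]      # assumed contribution from the suppressed FFNs


def unfaithful_nll(lam: float) -> float:
    """NLL of the memorized token when the FFN contribution is scaled by lam."""
    logits = [c + lam * f for c, f in zip(context_logits, ffn_logits)]
    return nll(logits, target=1)


# Sweep from no suppression (1.0) to full suppression (0.0).
sweep = [(lam, unfaithful_nll(lam)) for lam in (1.0, 0.75, 0.5, 0.25, 0.0)]
for lam, value in sweep:
    print(f"lambda={lam:.2f}  NLL(memorized token)={value:.3f}")
```

In this toy the NLL rises at every step of the sweep because the suppressed term is the only thing favoring the memorized token; in a real model the curve has to be measured on a held-out unfaithful subset.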

Results

Metric: ConR (context recall)
Value: 69.54
Baseline: 63.37 (LLaMA3-8B vanilla-RAG)
Delta: +6.17
Split / Dataset: CoFaithfulQA (average across subsets, LLaMA3-8B)
Evidence: Table 7 reports LLaMA3-8B ConR 63.37→69.54 after ParamMute
Evidence Ref: Table 7

Metric: MemR (memory recall)
Value: 6.18
Baseline: 10.89 (LLaMA3-8B vanilla-RAG)
Delta: -4.71
Split / Dataset: CoFaithfulQA (LLaMA3-8B)
Evidence: Table 7 shows MemR drop from 10.89 to 6.18 for LLaMA3-8B
Evidence Ref: Table 7
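The reported deltas follow directly from the baseline and post-ParamMute values. The short check below recomputes them and also derives the relative change, which the source does not report; the helper names are hypothetical.

```python
# (baseline, after ParamMute) for LLaMA3-8B on CoFaithfulQA, as listed above.
results = {"ConR": (63.37, 69.54), "MemR": (10.89, 6.18)}


def delta(baseline: float, after: float) -> float:
    """Absolute change in percentage points, rounded to two decimals."""
    return round(after - baseline, 2)


def relative_change_pct(baseline: float, after: float) -> float:
    """Change relative to the baseline, in percent, rounded to one decimal."""
    return round(100.0 * (after - baseline) / baseline, 1)


for name, (baseline, after) in results.items():
    print(name, delta(baseline, after), relative_change_pct(baseline, after))
```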

What To Try In 7 Days

Measure layer-wise FFN activation gap between faithful/unfaithful outputs using self-consistency filtering.

Apply FFN suppression (scale activations by λ) on the top-N UA-FFNs; start with N=8 and λ=0.0 as reported, which is full suppression, and raise λ toward 1.0 for softer suppression.

Finetune the suppressed model with a knowledge-augmented likelihood plus a max-margin preference loss; use LoRA to save compute and speed testing.
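The finetuning objective in the last step can be sketched as a likelihood term plus a hinge-style preference term. This summary does not give the paper's exact formulation, so the form below is an assumption: a standard max-margin loss on the log-probability gap between the context-supported answer and the memorized answer, with hypothetical `margin` and `weight` parameters.

```python
from typing import List


def sequence_nll(token_logprobs: List[float]) -> float:
    """Negative log-likelihood of a sequence given its per-token log-probs."""
    return -sum(token_logprobs)


def max_margin_preference(
    logp_context: float, logp_memory: float, margin: float = 1.0
) -> float:
    """Hinge loss: zero once the context answer beats the memory answer by `margin`."""
    return max(0.0, margin - (logp_context - logp_memory))


def combined_loss(
    context_token_logprobs: List[float],
    memory_token_logprobs: List[float],
    margin: float = 1.0,
    weight: float = 1.0,
) -> float:
    """Knowledge-augmented NLL on the context answer plus weighted preference loss."""
    logp_context = sum(context_token_logprobs)
    logp_memory = sum(memory_token_logprobs)
    preference = max_margin_preference(logp_context, logp_memory, margin)
    return sequence_nll(context_token_logprobs) + weight * preference


# Model currently prefers the memorized answer: both terms are active.
loss_before = combined_loss([-0.5, -0.5], [-0.25, -0.25])
# Model prefers the context answer by more than the margin: only NLL remains.
loss_after = combined_loss([-0.25, -0.25], [-1.0, -1.0])
```

The per-token log-probabilities would come from the suppressed model scored on the retrieved context; with LoRA only the adapter weights receive gradients from this loss.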

Optimization Features

Model Optimization
activation-level suppression of selected FFN sublayers
selective parameter masking (SNIP-like) evaluated
Training Optimization
LoRA
knowledge-augmented likelihood and max-margin preference loss
Inference Optimization
runtime scaling of FFN activations (λ) to control memory reliance

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Partial
License: Unknown

Risks & Boundaries

Limitations

CoFaithfulQA is built under a controlled setting where retrieved context is guaranteed sufficient; it does not evaluate retrieval failures.

Suppression acts at FFN sublayer granularity; finer neuron-level interventions may be needed for smaller side effects.

When Not To Use

If retrieval quality is poor or documents miss the answer (retrieval failure scenarios).

For closed-book tasks where internal knowledge is the desired signal.

Failure Modes

Over-suppression (too many layers or very low λ) can hurt contextual grounding and accuracy.

Incorrect UA-FFN identification could suppress useful computation and reduce overall quality.

Core Entities

Models

LLaMA3-8B-Instruct, LLaMA3.1-8B, LLaMA3.2-1B, LLaMA3.2-3B, Qwen2.5-0.5B, Qwen2.5-1.5B, Qwen2.5-3B, Qwen2.5-7B, Qwen2.5-14B

Metrics

ConR, MemR, MR, Perplexity (PPL), Negative Log-Likelihood (NLL)

Datasets

CoFaithfulQA (new), ConFiQA, HotpotQA, NewsQA, Natural Questions (NQ), SearchQA, SQuAD, TriviaQA

Benchmarks

CoFaithfulQA, ConFiQA

Context Entities

Models

GPT-4o, GLM-4-plus

Metrics

Human agreement (Cohen's κ 0.766), Human-LLM agreement 90.4%

Datasets

MRQA-derived combined training set

Benchmarks

ConFiQA (out-of-domain counterfactual)