ParamMute mutes specific FFN layers so RAG models rely on retrieved evidence, not internal memorized facts

Overview

Decision Snapshot: Ready For Pilot

Method provides clear diagnostics (activation gaps), a causal intervention, and measurable gains on constructed and out-of-domain benchmarks; replication needs access to controlled retrieval data and moderate compute.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.86

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 4/4

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 70%

Novelty: 70%

Authors

Pengcheng Huang, Zhenghao Liu, Yukun Yan, Haiyan Zhao, Xiaoyuan Yi, Hao Chen, Zhiyuan Liu, Maosong Sun, Tong Xiao, Ge Yu, Chenyan Xiong

Why It Matters For Business

ParamMute reduces hallucinations in RAG systems by suppressing FFNs that inject memorized facts, giving more reliable, evidence-aligned outputs with a plug-and-play finetuning path.

Who Should Care

CTO, Product Manager, ML Engineer, Engineering Lead, Data Scientist

Summary TLDR

RAG systems still hallucinate when internal model memory (parametric knowledge) overrides retrieved evidence. The authors find a narrow set of mid-to-deep FFN sublayers (called UA-FFNs) that are over-activated during unfaithful outputs. ParamMute: (1) identifies those UA-FFNs, (2) suppresses their activation, and (3) finetunes the suppressed model with a preference objective to favor retrieved context. On the introduced CoFaithfulQA benchmark and ConFiQA, ParamMute raises contextual recall and lowers memory recall (e.g., LLaMA3-8B: ConR 63.37→69.54, MemR 10.89→6.18). Code is available.

Problem Statement

Even with retrieval, LLMs can ignore accurate evidence and produce answers driven by internal memorized facts. The paper shows that over-activation of a small subset of FFN sublayers causes this behavior. The practical problem: how to reduce internal memory dominance so RAG outputs follow retrieved evidence.

Main Contribution

Identified Unfaithfulness-Associated FFNs (UA-FFNs): mid-to-deep FFN sublayers (e.g., layers ~20–29) whose high activation correlates with unfaithful outputs.

ParamMute method: select top-N UA-FFNs, apply soft/full suppression (activation scaling λ), then finetune with knowledge-augmented and max-margin preference objectives to favor retrieved evidence.
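The suppression step can be pictured with a framework-free toy. This is a minimal sketch, not the authors' implementation: it assumes the scale λ multiplies the FFN sublayer's output before the residual addition, and the names `ffn` and `block_with_suppression` are hypothetical.

```python
def ffn(x, w_in, w_out):
    """Toy two-layer FFN with ReLU: w_out @ relu(w_in @ x)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def block_with_suppression(x, w_in, w_out, lam):
    """Residual update x + lam * FFN(x).

    Assumption: lam scales the FFN output. lam=1.0 leaves the block
    unchanged, lam=0.0 mutes the FFN entirely, values in between
    give soft suppression.
    """
    out = ffn(x, w_in, w_out)
    return [xi + lam * oi for xi, oi in zip(x, out)]


# Toy weights, chosen only so the arithmetic is easy to follow.
x = [1.0, 2.0]
w_in = [[1.0, 0.0], [0.0, -1.0]]
w_out = [[1.0, 1.0], [2.0, 0.0]]

unchanged = block_with_suppression(x, w_in, w_out, lam=1.0)
soft = block_with_suppression(x, w_in, w_out, lam=0.5)
muted = block_with_suppression(x, w_in, w_out, lam=0.0)
```

With λ=0.0 the block reduces to the identity on the residual stream, which is why full suppression removes whatever that sublayer would have injected.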

Key Findings

A narrow set of mid-to-deep FFN layers (around layers 20–29) show higher activation in unfaithful responses.

Numbers: Activation gap concentrated in layers 20–29; PCC p<0.05 (Section 2.2, Fig. 1b)

Practical Use: Inspect layer-wise FFN activation; target mid-to-deep FFNs for mitigation rather than broad pruning.

Evidence Ref: Section 2.2, Figure 1(b)
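A layer-wise activation gap of this kind could be computed roughly as follows. This is a sketch under stated assumptions: the input format (one scalar activation magnitude per FFN layer per response) and the function names `activation_gap` and `top_n_layers` are hypothetical, and the paper's exact activation statistic may differ.

```python
from statistics import mean


def activation_gap(faithful, unfaithful):
    """Per-layer mean activation of unfaithful minus faithful responses.

    Each argument is a list of samples; each sample holds one scalar
    activation magnitude per FFN layer (assumed input format).
    """
    num_layers = len(faithful[0])
    gaps = []
    for layer in range(num_layers):
        f = mean(sample[layer] for sample in faithful)
        u = mean(sample[layer] for sample in unfaithful)
        gaps.append(u - f)
    return gaps


def top_n_layers(gaps, n):
    """Indices of the n layers with the largest gap, returned in layer order."""
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return sorted(ranked[:n])


# Toy data: 4 layers, with layers 2 and 3 more active on unfaithful outputs.
faithful = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.2, 1.0, 0.8]]
unfaithful = [[1.0, 1.1, 2.0, 1.5], [1.0, 1.1, 2.4, 1.7]]

gaps = activation_gap(faithful, unfaithful)
selected = top_n_layers(gaps, n=2)
```

On a real model the per-layer values would come from recorded FFN activations; the selection logic stays the same.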

Causally suppressing UA-FFNs makes unfaithful outputs harder to produce (NLL increases as suppression grows).

Numbers: NLL on unfaithful subset rises monotonically as λ decreases; max at λ=0.0 (A.3, Fig. 3)

Practical Use: Use activation scaling (λ) on selected FFNs to reduce the chance of hallucinated answers.

Evidence Ref: Appendix A.3, Figure 3
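The direction of this effect can be illustrated with a three-token toy. The logits below are invented for illustration and are not taken from the paper; the sketch assumes a suppressed FFN contributes an additive push toward a memorized token, and `ffn_push` and `nll_under_lambda` are hypothetical names.

```python
import math


def nll(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


# Hypothetical setup: token 0 is the memorized (unfaithful) answer.
base = [0.0, 0.0, 0.0]       # logits without the FFN contribution
ffn_push = [3.0, 0.0, 0.0]   # assumed FFN contribution favoring token 0


def nll_under_lambda(lam, target=0):
    """NLL of the memorized token when the FFN push is scaled by lam."""
    logits = [b + lam * f for b, f in zip(base, ffn_push)]
    return nll(logits, target)


# Sweep from no suppression (1.0) to full suppression (0.0).
curve = [(lam, nll_under_lambda(lam)) for lam in (1.0, 0.5, 0.0)]
```

In this toy the NLL of the memorized token grows as λ shrinks and peaks at λ=0.0, where the distribution is uniform over the three tokens.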

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
ConR (context recall)	69.54	63.37 (LLaMA3-8B vanilla-RAG)	+6.17	CoFaithfulQA (average across subsets, LLaMA3-8B)	Table 7 reports LLaMA3-8B ConR 63.37→69.54 after ParamMute	Table 7
MemR (memory recall)	6.18	10.89 (LLaMA3-8B vanilla-RAG)	-4.71	CoFaithfulQA (LLaMA3-8B)	Table 7 shows MemR drop from 10.89 to 6.18 for LLaMA3-8B	Table 7
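The Delta column follows directly from the Value and Baseline columns. A quick check, using only the numbers in the table above (the dictionary layout is just an illustrative choice):

```python
# (baseline, after ParamMute) for LLaMA3-8B on CoFaithfulQA, from the table above.
results = {"ConR": (63.37, 69.54), "MemR": (10.89, 6.18)}

# Absolute change in percentage points, rounded to the table's precision.
deltas = {name: round(after - before, 2) for name, (before, after) in results.items()}

# Relative change with respect to the baseline, as a fraction.
relative = {name: (after - before) / before for name, (before, after) in results.items()}
```

Here `deltas` reproduces the table's +6.17 and -4.71, and `relative` puts each change in proportion to its baseline.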

What To Try In 7 Days

Measure layer-wise FFN activation gap between faithful/unfaithful outputs using self-consistency filtering.

Apply FFN suppression (scale activations by λ) on the top-N UA-FFNs; start with N=8 and λ=0.0 as reported, which fully mutes the selected sublayers, and raise λ for softer suppression.

Finetune the suppressed model with a knowledge-augmented likelihood plus a max-margin preference loss; use LoRA to save compute and speed testing.
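For the finetuning step, one plausible shape of the combined objective is sketched below. The exact losses in the paper are not reproduced here: the hinge form, the `margin` and `weight` parameters, and the function names are all assumptions made for illustration.

```python
def max_margin_preference(logp_context_answer, logp_memory_answer, margin=1.0):
    """Hinge loss: zero once the context-supported answer beats the
    memorized answer by at least `margin` in log-probability."""
    return max(0.0, margin - (logp_context_answer - logp_memory_answer))


def combined_loss(logp_context_answer, logp_memory_answer, margin=1.0, weight=1.0):
    """Likelihood term on the context-supported answer plus a weighted
    preference term (assumed combination, not the paper's exact formula)."""
    likelihood_term = -logp_context_answer
    preference_term = max_margin_preference(
        logp_context_answer, logp_memory_answer, margin
    )
    return likelihood_term + weight * preference_term


# Case 1: the model already prefers the context answer by a wide margin.
satisfied = combined_loss(-0.5, -3.0)

# Case 2: the model prefers the memorized answer, so the hinge is active.
violated = combined_loss(-2.0, -1.5)
```

In a real run the two log-probabilities would be sequence-level scores from the suppressed model, and the loss would be backpropagated through LoRA adapters rather than evaluated on fixed numbers.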

Optimization Features

Model Optimization

activation-level suppression of selected FFN sublayers

selective parameter masking (SNIP-like) evaluated

Training Optimization

LoRA

knowledge-augmented likelihood and max-margin preference loss

Inference Optimization

runtime scaling of FFN activations (λ) to control memory reliance
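At inference time this amounts to a per-layer scale vector. The following is a minimal sketch with scalar "hidden states" for brevity; `lambda_schedule` and `run_stack` are hypothetical helpers, and the assumption is again that λ scales each selected FFN's contribution to the residual stream.

```python
def lambda_schedule(num_layers, ua_layers, lam):
    """One scale per layer: lam for the selected UA-FFN layers, 1.0 elsewhere."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is expected to lie in [0, 1]")
    chosen = set(ua_layers)
    return [lam if i in chosen else 1.0 for i in range(num_layers)]


def run_stack(x, ffns, scales):
    """Apply residual updates x <- x + scale * ffn(x), layer by layer."""
    for f, s in zip(ffns, scales):
        x = x + s * f(x)
    return x


# Toy stack: every "FFN" adds 1.0; layers 2 and 3 are treated as UA-FFNs.
ffns = [lambda x: 1.0] * 4
scales = lambda_schedule(num_layers=4, ua_layers=[2, 3], lam=0.0)
output = run_stack(0.0, ffns, scales)
```

Keeping the schedule as plain data makes it easy to sweep λ or N on a validation set before committing to a setting.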

Reproducibility

Code Available: Yes

Data Available: Yes

Open Source Status: Partial

License: Unknown

Code URLs

https://github.com/OpenBMB/ParamMute

Risks & Boundaries

Limitations

CoFaithfulQA is built under a controlled setting where retrieved context is guaranteed sufficient; it does not evaluate retrieval failures.

Suppression acts at FFN sublayer granularity; finer neuron-level interventions may be needed for smaller side effects.

When Not To Use

If retrieval quality is poor or documents miss the answer (retrieval failure scenarios).

For closed-book tasks where internal knowledge is the desired signal.

Failure Modes

Over-suppression (too many layers or very low λ) can hurt contextual grounding and accuracy.

Incorrect UA-FFN identification could suppress useful computation and reduce overall quality.

Core Entities

Models

LLaMA3-8B-Instruct, LLaMA3.1-8B, LLaMA3.2-1B, LLaMA3.2-3B, Qwen2.5-0.5B, Qwen2.5-1.5B, Qwen2.5-3B, Qwen2.5-7B, Qwen2.5-14B

Metrics

ConR, MemR, MR, Perplexity (PPL), Negative Log-Likelihood (NLL)

Datasets

CoFaithfulQA (new), ConFiQA, HotpotQA, NewsQA, Natural Questions (NQ), SearchQA, SQuAD, TriviaQA

Benchmarks

CoFaithfulQA, ConFiQA

Context Entities

Models

GPT-4o, GLM-4-plus

Metrics

Human agreement (Cohen's κ 0.766), Human-LLM agreement 90.4%

Datasets

MRQA-derived combined training set

Benchmarks

ConFiQA (out-of-domain counterfactual)
