AttnLRP: a faithful, efficient LRP variant that attributes attention and latent neurons in transformers

February 8, 2024 (6 min)

Overview

Decision Snapshot: Ready For Pilot

The method is well validated on multiple models and datasets. It is ready for real experiments, but it needs memory management for large models and γ tuning for ViTs.

Citations: 9

Evidence Strength: 0.80

Confidence: 0.80

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 3/3

Reproducibility

Status: Code + data available

Open source: Yes

At A Glance

Cost impact: 70%

Production readiness: 70%

Novelty: 60%

Authors

Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Aakriti Jain, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek

Why It Matters For Business

AttnLRP gives faster, more faithful explanations for transformer decisions, lowering debugging cost and energy use compared to perturbation-based methods; it also exposes neurons you can target to reduce hallucinations or bias.

Who Should Care

Summary TLDR

AttnLRP extends Layer-wise Relevance Propagation (LRP) to handle transformer-specific functions (softmax, matrix multiplication, normalization). It produces more faithful input and latent (neuron-level) attributions than prior methods while keeping computation comparable to a single backward pass. Evaluations on ViTs and LLMs (LLaMa 2, Mixtral, Flan‑T5, Phi) show consistent faithfulness gains. The method also enables finding and manipulating ‘knowledge neurons’ to change model outputs.

Problem Statement

Transformers use nonlinear attention, matrix multiplications and normalization that break standard attribution rules. Existing attention-only or simple backprop methods are unfaithful, noisy, numerically unstable, or too expensive for layerwise and latent attributions. What is needed is a single-pass, numerically stable method that attributes both inputs and hidden neurons in large transformers.

Main Contribution

New LRP rules for transformer operations: derived faithful, efficient propagation rules for softmax, bilinear matrix multiplication and normalization tailored to attention.

Latent-neuron attribution and interaction: AttnLRP yields per-neuron relevances and, combined with activation maximization, enables identifying and manipulating neurons that shift model outputs.
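To illustrate what propagation rules for softmax and bilinear matrix multiplication can look like, the NumPy sketch below assumes a Taylor-decomposition form for softmax, R_i = x_i (R_i_out - s_i * sum_j R_j_out), and an equal split of relevance between the two operands of O = A V. These formulas are not stated in this summary and the function names are hypothetical, so check them against the paper and its code before relying on them.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def softmax_relevance(x, relevance_out):
    """Assumed softmax rule: R_i = x_i * (R_i_out - s_i * sum_j R_j_out).

    x is the pre-softmax input, relevance_out the relevance arriving at
    the softmax outputs s.
    """
    s = softmax(x)
    return x * (relevance_out - s * relevance_out.sum())


def matmul_relevance(A, V, relevance_out, eps=1e-9):
    """Assumed bilinear rule for O = A @ V.

    Each product term A[j, i] * V[i, p] receives its share of the output
    relevance, and that share is split equally between A and V.
    """
    O = A @ V
    # Sign-preserving stabiliser to avoid division by zero.
    denom = 2.0 * (O + eps * np.where(O >= 0, 1.0, -1.0))
    ratio = relevance_out / denom      # shape (J, P)
    R_A = A * (ratio @ V.T)            # sums over output columns p
    R_V = V * (A.T @ ratio)            # sums over output rows j
    return R_A, R_V


A = np.array([[1.0, 2.0], [3.0, 4.0]])
V = np.eye(2)
R_out = A @ V                          # toy choice: relevance equals the output
R_A, R_V = matmul_relevance(A, V, R_out)
print(R_A.sum(), R_V.sum(), R_out.sum())
```

Under these assumptions the matmul rule hands half of the incoming relevance to each operand, so the two printed operand totals add up to the output total.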

Key Findings

AttnLRP yields higher faithfulness than prior LRP variants on next‑token/classification perturbation tests.

Numbers: Wikipedia perturbation area: AttnLRP 10.93 vs CP‑LRP 7.85 (∆=+3.08)

Practical Use: Use AttnLRP when you need more faithful input attributions for transformer language tasks; it meaningfully outperforms CP‑LRP on evaluated benchmarks.

Evidence Ref: Table 1 (Wikipedia next‑word), Section 4.1

AttnLRP improves top‑1 token identification accuracy in QA on Mixtral 8x7b from 0.50 (CP‑LRP) to 0.96.

Numbers: Mixtral SQuADv2 top‑1 accuracy: AttnLRP 0.96 vs CP‑LRP 0.50

Practical Use: For question‑answering models with routing/expert layers, AttnLRP gives much clearer token-level explanations; prefer it for debugging or auditing QA outputs.

Evidence Ref: Table 1 (SQuADv2), Section 4.1.2

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Faithfulness (perturbation area) | 10.93 (AttnLRP, LLaMa2 Wikipedia) | 7.85 (CP-LRP) | +3.08 | Wikipedia next-word | | Table 1, Section 4.1
Accuracy | 0.96 (AttnLRP, Mixtral 8x7b) | 0.50 (CP-LRP) | +0.46 | Mixtral SQuADv2 | | Table 1, Section 4.1

What To Try In 7 Days

Install AttnLRP from the paper's GitHub and run it on a small model and dataset to compare heatmaps with existing methods.

Run the perturbation faithfulness test (MoRF/LeRF area) to validate explanations on your task.

Use activation‑max samples + AttnLRP to find top neurons for a concept and try small neuron ablations to observe output shifts.
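The neuron-ablation step can be tried first on a toy model. The sketch below is not AttnLRP: it uses a hypothetical two-layer ReLU network with a linear readout, where the relevance of each hidden neuron is simply its activation times its readout weight. It ranks neurons by that relevance, zeroes the top one, and reports how the output shifts. All names and weights are illustrative assumptions.

```python
import numpy as np


def hidden_relevance(W1, w2, x):
    """Return hidden activations and each neuron's contribution to the output.

    The toy model is y = w2 . relu(W1 @ x); with a linear readout the
    contribution of hidden neuron k is exactly h[k] * w2[k].
    """
    h = np.maximum(W1 @ x, 0.0)
    return h, h * w2


def ablate_top_neurons(W1, w2, x, k=1):
    """Zero the k most relevant hidden neurons and compare outputs."""
    h, relevance = hidden_relevance(W1, w2, x)
    top = np.argsort(-relevance)[:k]   # indices of the most relevant neurons
    h_ablated = h.copy()
    h_ablated[top] = 0.0
    return top, float(h @ w2), float(h_ablated @ w2)


W1 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w2 = np.array([1.0, -1.0, 2.0])
x = np.array([1.0, 2.0])

top, before, after = ablate_top_neurons(W1, w2, x, k=1)
print(top, before, after)
```

On a real transformer the ranking would come from AttnLRP's latent relevances instead of this closed-form contribution, and the ablation would be applied through a forward hook or an equivalent mechanism.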

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Yes
License: Unknown

Risks & Boundaries

Limitations

ViTs require tuning of the γ hyperparameter to reduce noisy attributions.

Large models need checkpointing and substantial GPU memory; AttnLRP can exceed single‑node memory for very large contexts.

When Not To Use

On tiny edge devices where memory and compute cannot support checkpointed backward passes.

If you only need a cheap, approximate attention‑only heatmap (attention rollout may suffice).

Failure Modes

Changing the bias handling (distributing the bias, or using the identity rule) can cause numerical instabilities and exploding relevances.

Mis-tuned γ leads to under- or over-smoothing of attributions in vision models.
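To see what γ controls, the sketch below implements the standard LRP-γ rule for a single bias-free linear layer, assuming that is the rule the γ hyperparameter refers to and that the layer inputs are non-negative. The function name and the toy numbers are illustrative, not taken from the paper's code.

```python
import numpy as np


def lrp_gamma_linear(x, W, relevance_out, gamma=0.25, eps=1e-9):
    """LRP-gamma for y = x @ W (no bias), assuming non-negative inputs.

    x: (n_in,), W: (n_in, n_out), relevance_out: (n_out,).
    Positive weights are boosted by a factor (1 + gamma), which favours
    positive contributions and damps negative ones.
    """
    W_mod = W + gamma * np.maximum(W, 0.0)
    z = x[:, None] * W_mod                       # per-connection contributions
    denom = z.sum(axis=0)
    denom = denom + eps * np.where(denom >= 0, 1.0, -1.0)  # stabiliser
    return (z / denom) @ relevance_out


x = np.array([1.0, 2.0])
W = np.array([[2.0, -1.0], [-0.5, 2.0]])
R_out = x @ W                                    # toy choice: relevance = output

for gamma in (0.0, 1.0):
    R_in = lrp_gamma_linear(x, W, R_out, gamma=gamma)
    print(gamma, R_in, R_in.sum())
```

In this toy layer the total relevance is preserved for both settings, while γ = 1 shifts the split between the two inputs by shrinking the share carried by the negative weights. Too small a γ leaves noisy negative contributions in place and too large a γ washes out contrast, which matches the under- and over-smoothing described above.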

Core Entities

Models

LLaMa 2-7b, Mixtral 8x7b, Flan-T5-XL, ViT-B-16, ViT-L-16, ViT-L-32, Phi-1.5

Metrics

Faithfulness area (A between curves), Accuracy, Intersection over Union (IoU)

Datasets

ImageNet, IMDB, Wikipedia, SQuADv2, Wikipedia summary dataset

Benchmarks

Perturbation faithfulness (area between MoRF/LeRF curves)
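A minimal sketch of this benchmark, assuming a scalar model output, a fixed baseline value for perturbed features, and unit spacing between perturbation steps: features are removed in most-relevant-first (MoRF) and least-relevant-first (LeRF) order, and the score is the area between the two output curves. The helper names and the toy linear model are hypothetical; the paper's exact protocol (token replacement, normalization) may differ.

```python
import numpy as np


def perturbation_curve(model_fn, x, order, baseline=0.0):
    """Model output after cumulatively replacing features in `order`."""
    x_pert = np.array(x, dtype=float)
    outputs = [model_fn(x_pert)]
    for idx in order:
        x_pert[idx] = baseline
        outputs.append(model_fn(x_pert))
    return np.array(outputs)


def faithfulness_area(model_fn, x, relevance, baseline=0.0):
    """Area between the LeRF and MoRF curves (higher means more faithful)."""
    relevance = np.asarray(relevance, dtype=float)
    morf = perturbation_curve(model_fn, x, np.argsort(-relevance), baseline)
    lerf = perturbation_curve(model_fn, x, np.argsort(relevance), baseline)
    gap = lerf - morf
    # Trapezoidal rule with unit spacing between perturbation steps.
    return float(np.sum((gap[1:] + gap[:-1]) / 2.0))


w = np.array([3.0, 1.0, 2.0])
x = np.array([1.0, 1.0, 1.0])


def model(v):
    """Toy linear model standing in for a logit."""
    return float(w @ v)


good = faithfulness_area(model, x, relevance=w * x)    # exact attribution
bad = faithfulness_area(model, x, relevance=-(w * x))  # reversed ranking
print(good, bad)
```

An attribution that ranks features correctly yields a positive area, and reversing the ranking flips the sign, which is why the area serves as a comparative faithfulness score across methods.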