Overview
The method is validated on multiple models and datasets and is ready for real experiments, but it needs memory management for large models and γ tuning for ViTs.
Citations: 9
Evidence Strength: 0.80
Confidence: 0.80
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 3/3
Reproducibility
Status: Code + data available
Open source: Yes
At A Glance
Cost impact: 70%
Production readiness: 70%
Novelty: 60%
Why It Matters For Business
AttnLRP gives faster, more faithful explanations of transformer decisions, lowering debugging cost and energy use compared to perturbation-based methods; it also exposes neurons you can target to reduce hallucinations or bias.
Who Should Care
Summary TLDR
AttnLRP extends Layer-wise Relevance Propagation (LRP) to handle transformer-specific functions (softmax, matrix multiplication, normalization). It produces more faithful input and latent (neuron-level) attributions than prior methods while keeping computation comparable to a single backward pass. Evaluations on ViTs and LLMs (LLaMa 2, Mixtral, Flan‑T5, Phi) show consistent faithfulness gains. The method also enables finding and manipulating ‘knowledge neurons’ to change model outputs.
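For readers new to LRP, the sketch below shows the generic ε-rule for a single linear layer, the kind of per-layer rule that AttnLRP builds on. It is a minimal pure-Python illustration, not the paper's implementation; the function name `lrp_epsilon_linear` and the toy numbers are assumptions made for this example.

```python
def lrp_epsilon_linear(x, W, b, R_out, eps=1e-6):
    """Redistribute the output relevance R_out of z = W x + b onto the inputs x."""
    # Forward pass: z_j = sum_i W[j][i] * x[i] + b[j]
    z = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
    # Stabilised ratio: s_j = R_j / (z_j + eps * sign(z_j))
    s = [r / (zj + (eps if zj >= 0 else -eps)) for r, zj in zip(R_out, z)]
    # Backward pass: R_i = x_i * sum_j W[j][i] * s_j
    return [xi * sum(W[j][i] * s[j] for j in range(len(W))) for i, xi in enumerate(x)]


# Toy example (hypothetical values): start from the layer outputs as relevance.
x = [1.0, 2.0]
W = [[1.0, 0.5], [-1.0, 2.0]]
b = [0.0, 0.0]
R_out = [2.0, 3.0]  # equals z = W x here
R_in = lrp_epsilon_linear(x, W, b, R_out)
print(R_in, sum(R_in), sum(R_out))
```

With zero bias and a small ε, the input relevances sum to approximately the output relevance (conservation). The whole redistribution costs one forward and one backward sweep over the layer, which is the sense in which LRP-style methods are comparable to a single backward pass.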
Problem Statement
Transformers rely on nonlinear attention, matrix multiplications, and normalization, which break standard attribution rules. Existing attention-only or simple backpropagation methods are unfaithful, noisy, numerically unstable, or too expensive for layer-wise and latent attributions. What is needed is a single-pass, numerically stable method that attributes both inputs and hidden neurons in large transformers.
Main Contribution
New LRP rules for transformer operations: derived faithful, efficient propagation rules for softmax, bilinear matrix multiplication and normalization tailored to attention.
Latent-neuron attribution and interaction: AttnLRP yields per-neuron relevances and, combined with activation maximization, enables identifying and manipulating neurons that shift model outputs.
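To make "a propagation rule for softmax" concrete, here is one way such a rule can be constructed: linearise softmax at the current input (first-order Taylor expansion), drop the constant term, and redistribute relevance through the Jacobian. This is a sketch under that stated assumption; whether it matches the paper's rule term by term should be checked against the paper. The function names are hypothetical.

```python
import math


def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in x]
    total = sum(e)
    return [v / total for v in e]


def softmax_relevance(x, R_out):
    """Closed form: R_i = x_i * (R_i_out - s_i * sum_j R_j_out)."""
    s = softmax(x)
    total = sum(R_out)
    return [xi * (ri - si * total) for xi, ri, si in zip(x, R_out, s)]


def softmax_relevance_via_jacobian(x, R_out):
    """Same quantity computed explicitly: R_i = sum_j x_i * J[j][i] * R_j / s_j,
    with the softmax Jacobian J[j][i] = s_j * (delta_ij - s_i)."""
    s = softmax(x)
    n = len(x)
    R_in = []
    for i in range(n):
        acc = 0.0
        for j in range(n):
            jac = s[j] * ((1.0 if i == j else 0.0) - s[i])
            acc += x[i] * jac * R_out[j] / s[j]
        R_in.append(acc)
    return R_in


# Toy values (assumed for illustration only).
x = [2.0, 1.0, 0.1]
R_out = [0.7, 0.2, 0.1]
closed = softmax_relevance(x, R_out)
explicit = softmax_relevance_via_jacobian(x, R_out)
print(closed)
print(explicit)
```

The closed form needs only element-wise operations and one sum, so it avoids building the full Jacobian. Because the constant term of the linearisation is dropped, the input relevances do not in general sum to the output relevance for this operation.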
Key Findings
AttnLRP yields higher faithfulness than prior LRP variants on next‑token/classification perturbation tests.
AttnLRP improves top‑1 token identification accuracy in QA on Mixtral 8x7b from 0.50 (CP‑LRP) to 0.96.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Faithfulness (perturbation area) | 10.93 (AttnLRP, LLaMa 2, Wikipedia) | 7.85 (CP-LRP) | +3.08 | Wikipedia next-word | Table 1, Section 4.1 | — |
| Accuracy | 0.96 (AttnLRP, Mixtral 8x7b) | 0.50 (CP-LRP) | +0.46 | Mixtral SQuADv2 | Table 1, Section 4.1 | — |
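The "perturbation area" metric can be illustrated with a toy model. The sketch below removes features in most-relevant-first (MoRF) and least-relevant-first (LeRF) order, records the model output after each removal, and reports the area between the two curves. The baseline value, the trapezoidal normalisation, and the toy linear model are assumptions for illustration; the paper's exact protocol may differ.

```python
def perturbation_curve(model, x, order, baseline=0.0):
    """Model output after successively replacing features with a baseline."""
    x = list(x)
    outputs = [model(x)]
    for i in order:
        x[i] = baseline
        outputs.append(model(x))
    return outputs


def area(curve):
    """Trapezoidal area under the curve, normalised by the number of steps."""
    return sum((a + b) / 2 for a, b in zip(curve, curve[1:])) / (len(curve) - 1)


def faithfulness_gap(model, x, relevance):
    """Area(LeRF) - Area(MoRF); larger means the ranking tracks the model better."""
    morf = sorted(range(len(x)), key=lambda i: relevance[i], reverse=True)
    lerf = morf[::-1]
    return area(perturbation_curve(model, x, lerf)) - area(perturbation_curve(model, x, morf))


# Toy linear model (hypothetical weights); its exact attribution is w_i * x_i.
weights = [3.0, -1.0, 0.5, 2.0]
model = lambda x: sum(w * xi for w, xi in zip(weights, x))
x = [1.0, 1.0, 1.0, 1.0]

good_relevance = [w * xi for w, xi in zip(weights, x)]
bad_relevance = [-r for r in good_relevance]  # deliberately inverted ranking

gap_good = faithfulness_gap(model, x, good_relevance)
gap_bad = faithfulness_gap(model, x, bad_relevance)
print(gap_good, gap_bad)
```

A faithful attribution makes the MoRF curve drop quickly and the LeRF curve stay high, so its gap is larger than that of the inverted ranking.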
What To Try In 7 Days
Install AttnLRP from the paper's GitHub repository and run it on a small model and dataset to compare heatmaps with existing methods.
Run the perturbation faithfulness test (MoRF/LeRF area) to validate explanations on your task.
Use activation‑max samples + AttnLRP to find top neurons for a concept and try small neuron ablations to observe output shifts.
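The neuron-ablation step can be rehearsed on a toy network before touching a real model. In this sketch the per-neuron relevance is simply activation times readout weight, which is exact for a linear readout and stands in for the latent relevances an attribution method would supply; the network, weights, and function names are all hypothetical.

```python
def relu(v):
    return max(0.0, v)


def hidden_activations(x, W1):
    """One hidden layer: h_k = relu(sum_i W1[k][i] * x[i])."""
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]


def readout(h, w2, ablate=()):
    """Linear readout; neurons whose index is in `ablate` are zeroed out."""
    return sum(w * a for k, (w, a) in enumerate(zip(w2, h)) if k not in ablate)


def neuron_relevance(h, w2):
    """Stand-in for latent attributions: contribution of neuron k is w2[k] * h[k]."""
    return [w * a for w, a in zip(w2, h)]


# Toy network (assumed values).
x = [1.0, 2.0]
W1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
w2 = [0.5, 2.0, -1.0]

h = hidden_activations(x, W1)
rel = neuron_relevance(h, w2)
top = max(range(len(rel)), key=lambda k: rel[k])

full = readout(h, w2)
ablated = readout(h, w2, ablate={top})
print(top, full, ablated, full - ablated)
```

In this linear-readout toy, the output shift from ablating the top neuron equals that neuron's relevance. In a deep transformer the relationship is only approximate, which is why small ablations and a before/after comparison are a sensible first experiment.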
Risks & Boundaries
Limitations
ViTs require tuning of the γ hyperparameter to reduce noisy attributions.
Large models need checkpointing and substantial GPU memory; AttnLRP can exceed single‑node memory for very large contexts.
When Not To Use
On tiny edge devices where memory and compute cannot support checkpointed backward passes.
If you only need a cheap, approximate attention‑only heatmap (attention rollout may suffice).
Failure Modes
Changing the bias handling (for example, distributing the bias or applying the identity rule) can cause numerical instability and exploding relevances.
Mis-tuned γ leads to under- or over-smoothing of attributions in vision models.
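To see why γ acts as a smoothing knob, the sketch below applies the standard LRP-γ rule to one linear layer, assuming that this is the γ the summary refers to and that inputs are non-negative (for example, post-ReLU activations). Which layers the paper applies it to, and with what values, is not stated here; the function name and numbers are illustrative.

```python
def lrp_gamma_linear(x, W, R_out, gamma, eps=1e-9):
    """LRP-gamma for z = W x with non-negative inputs x.

    Positive weights are boosted: w -> w + gamma * max(w, 0).
    """
    Wg = [[w + gamma * max(w, 0.0) for w in row] for row in W]
    z = [sum(w * xi for w, xi in zip(row, x)) for row in Wg]
    s = [r / (zj + (eps if zj >= 0 else -eps)) for r, zj in zip(R_out, z)]
    return [xi * sum(Wg[j][i] * s[j] for j in range(len(Wg))) for i, xi in enumerate(x)]


# Toy layer with one positive and one negative contribution (assumed values).
x = [1.0, 1.0]
W = [[2.0, -1.0]]
R_out = [1.0]

gammas = [0.0, 1.0, 100.0]
results = {g: lrp_gamma_linear(x, W, R_out, g) for g in gammas}
for g in gammas:
    print(g, results[g])
```

As γ grows, the negative contribution shrinks toward zero while the total relevance stays conserved. Too small a γ leaves sign-flipping noise in the heatmap; too large a γ suppresses negative evidence almost entirely, so the value is worth sweeping per model.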

