AttnLRP: a faithful, efficient LRP variant that attributes attention and latent neurons in transformers

February 8, 2024 · 6 min

Overview

Production Readiness

0.7

Novelty Score

0.6

Cost Impact Score

0.7

Citation Count

9

Authors

Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Aakriti Jain, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek

Links

Abstract / PDF

Why It Matters For Business

AttnLRP gives faster, more faithful explanations of transformer decisions, lowering debugging cost and energy use compared with perturbation-based methods. It also exposes neurons you can target to reduce hallucinations or bias.

Summary TLDR

AttnLRP extends Layer-wise Relevance Propagation (LRP) to handle transformer-specific functions (softmax, matrix multiplication, normalization). It produces more faithful input and latent (neuron-level) attributions than prior methods while keeping computation comparable to a single backward pass. Evaluations on ViTs and LLMs (LLaMa 2, Mixtral, Flan‑T5, Phi) show consistent faithfulness gains. The method also enables finding and manipulating ‘knowledge neurons’ to change model outputs.

Problem Statement

Transformers use nonlinear attention, matrix multiplications and normalization that break standard attribution rules. Existing attention-only or simple backpropagation methods are unfaithful, noisy, numerically unstable, or too expensive for layerwise and latent attributions. We need a single-pass, numerically stable method that attributes both inputs and hidden neurons in large transformers.

Main Contribution

New LRP rules for transformer operations: derived faithful, efficient propagation rules for softmax, bilinear matrix multiplication and normalization tailored to attention.

Latent-neuron attribution and interaction: AttnLRP yields per-neuron relevances and, combined with activation maximization, enables identifying and manipulating neurons that shift model outputs.

Open-source implementation: a ready-to-use library and practical guidance (including a γ-rule for denoising ViTs) to run AttnLRP on LLMs and ViTs.
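
To make the first contribution more concrete, here is a minimal NumPy sketch of what a first-order Taylor (gradient × input) relevance rule for softmax looks like. The closed form `x_i * (R_i - s_i * sum_j R_j)` follows from the softmax Jacobian; whether it matches the paper's rule in every detail (stabilizers, bias handling) is an assumption to check against the paper. The function names are illustrative and are not taken from the AttnLRP library.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def softmax_relevance(x, relevance_out):
    """Closed-form gradient-times-input propagation through softmax.

    R_in[i] = x[i] * (R_out[i] - s[i] * sum_j R_out[j]), with s = softmax(x).
    """
    s = softmax(x)
    return x * (relevance_out - s * relevance_out.sum())

def softmax_relevance_via_jacobian(x, relevance_out):
    """Same quantity computed explicitly from the softmax Jacobian (for checking)."""
    s = softmax(x)
    jac = np.diag(s) - np.outer(s, s)   # jac[j, i] = d s_j / d x_i
    return x * (jac.T @ (relevance_out / s))

x = np.array([0.5, -1.0, 2.0])
r_out = np.array([0.2, 0.1, 0.7])
r_in = softmax_relevance(x, r_out)
```

The Jacobian version exists only to confirm the closed form. In practice the closed form is the one to use, since it avoids building an N × N matrix per attention row.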

Key Findings

AttnLRP yields higher faithfulness than prior LRP variants on next‑token/classification perturbation tests.

Numbers: Wikipedia perturbation area: AttnLRP 10.93 vs CP‑LRP 7.85 (∆ = +3.08)

AttnLRP improves top‑1 token identification accuracy in QA on Mixtral 8x7b from 0.50 (CP‑LRP) to 0.96.

Numbers: Mixtral SQuADv2 top‑1 accuracy: AttnLRP 0.96 vs CP‑LRP 0.50

AttnLRP runs at roughly the cost of a single backward pass, whereas perturbation methods need a number of forward passes that grows linearly with the number of tokens.

Numbers: LRP checkpointing complexity O(1) vs perturbation O(N_T) forward passes
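
The cost gap can be illustrated with a toy example. The sketch below counts forward passes for a simple occlusion-style perturbation method versus a backward-style attribution on a linear scorer. `CountingModel` and both attribution functions are hypothetical stand-ins written for this illustration, not the paper's code, and plain gradient × input is used here in place of AttnLRP.

```python
import numpy as np

class CountingModel:
    """Toy linear scorer that counts how many forward passes it serves."""

    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)
        self.forward_calls = 0

    def forward(self, x):
        self.forward_calls += 1
        return float(self.weights @ x)

def occlusion_attribution(model, x, baseline=0.0):
    """Perturbation: one reference pass plus one forward pass per input position."""
    full = model.forward(x)
    scores = np.zeros_like(x)
    for i in range(len(x)):
        perturbed = x.copy()
        perturbed[i] = baseline
        scores[i] = full - model.forward(perturbed)
    return scores

def gradient_times_input(model, x):
    """Backward-style attribution: a single forward pass, then one 'backward' step.

    For a linear scorer the gradient with respect to x is the weight vector.
    """
    model.forward(x)
    return model.weights * x

x = np.array([1.0, 2.0, 3.0, 4.0])
perturb_model = CountingModel([0.5, -1.0, 2.0, 0.1])
backward_model = CountingModel([0.5, -1.0, 2.0, 0.1])
occ = occlusion_attribution(perturb_model, x)
gxi = gradient_times_input(backward_model, x)
```

On this linear toy both methods return the same scores, but the perturbation path needs one forward pass per input position on top of the reference pass, which is the O(N_T) behaviour noted above.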

AttnLRP makes neuron interventions practical: activating/deactivating identified neurons changes generated tokens.

Numbers: Example: manipulating neuron #3948 moved the next token from 'Arctic' to 'sweet, sugary treats of the candy store' (qualitative)

Vision transformers show noisy gradients; applying the γ‑rule improves faithfulness.

Numbers: ViT‑B‑16 perturbation area: AttnLRP (best) 6.19 vs CP‑LRP (γ) 6.06
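
For readers unfamiliar with the γ‑rule, the following is a minimal sketch of the standard LRP γ‑rule for one linear layer, assuming non-negative inputs and no bias term. How the paper applies it inside ViTs (which layers, which γ values) is not shown here. The function name and the default `gamma` are illustrative assumptions.

```python
import numpy as np

def lrp_gamma_linear(x, weights, relevance_out, gamma=0.25, eps=1e-9):
    """Standard LRP gamma-rule for y = x @ weights (bias omitted).

    x:             (n_in,)  activations, assumed non-negative
    weights:       (n_in, n_out)
    relevance_out: (n_out,) relevance arriving from the layer above
    Positive weights are boosted by a factor (1 + gamma), which damps
    contributions that flow through negative weights.
    """
    w_mod = weights + gamma * np.clip(weights, 0.0, None)
    z = x @ w_mod                                  # modified pre-activations
    z = z + eps * np.where(z >= 0, 1.0, -1.0)      # stabilizer against division by zero
    s = relevance_out / z
    return x * (w_mod @ s)

x = np.array([1.0, 2.0, 0.5])
weights = np.array([[0.5, -0.2],
                    [0.1,  0.4],
                    [-0.3, 0.2]])
relevance_out = np.array([1.0, 2.0])
relevance_in = lrp_gamma_linear(x, weights, relevance_out, gamma=0.25)
```

With `gamma=0` the rule reduces to the plain ε-stabilized rule. Larger values favour positive contributions more strongly, which is why γ needs tuning per model.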

Results

Faithfulness (perturbation area)

Value: 10.93 (AttnLRP, LLaMa2 Wikipedia)

Baseline: 7.85 (CP-LRP)

Accuracy

Value: 0.96 (AttnLRP, Mixtral 8x7b)

Baseline: 0.50 (CP-LRP)

Faithfulness (perturbation area)

Value: 6.19 (AttnLRP, ViT-B-16 best composite)

Baseline: 6.06 (γ CP-LRP)

Who Should Care

What To Try In 7 Days

Install AttnLRP from the paper's GitHub and run it on a small model and dataset to compare heatmaps with existing methods.

Run the perturbation faithfulness test (MoRF/LeRF area) to validate explanations on your task.

Use activation‑max samples + AttnLRP to find top neurons for a concept and try small neuron ablations to observe output shifts.
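
As a warm-up for the neuron-ablation step, here is a toy sketch on a two-layer NumPy MLP. It scores hidden neurons by activation × gradient (a simple stand-in for AttnLRP latent relevance, which is an assumption of this sketch), ablates the top-scoring neuron, and measures the logit shift. The weights, sizes and function names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 6))   # input -> hidden
W2 = rng.normal(size=(6, 3))   # hidden -> logits

def forward(x, ablate=None):
    """Run the toy MLP; optionally zero out one hidden neuron."""
    hidden = np.maximum(x @ W1, 0.0)   # ReLU
    if ablate is not None:
        hidden = hidden.copy()
        hidden[ablate] = 0.0
    return hidden, hidden @ W2

def hidden_relevance(x, target):
    """Activation x gradient of the target logit for each hidden neuron."""
    hidden, _ = forward(x)
    return hidden * W2[:, target]

x = np.array([1.0, -0.5, 2.0, 0.3])
hidden, logits = forward(x)
target = int(np.argmax(logits))
relevance = hidden_relevance(x, target)
top_neuron = int(np.argmax(relevance))
_, ablated_logits = forward(x, ablate=top_neuron)
shift = logits[target] - ablated_logits[target]
```

Because the readout is linear, the per-neuron scores sum to the target logit and the shift equals the ablated neuron's score. In a real transformer later layers are nonlinear, so the observed shift only approximates the attributed relevance.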

Reproducibility

Code Available

Data Available

Open Source Status

  • yes

Risks & Boundaries

Limitations

  • ViTs require tuning of the γ hyperparameter to reduce noisy attributions.
  • Large models need checkpointing and substantial GPU memory; AttnLRP can exceed single‑node memory for very large contexts.
  • Softmax saturation (very low temperature) can stop gradient-based relevance flow; classification softmax attribution was bypassed in experiments.

When Not To Use

  • On tiny edge devices where memory and compute cannot support checkpointed backward passes.
  • If you only need a cheap, approximate attention‑only heatmap (attention rollout may suffice).
  • When you cannot tune γ and the model is a noisy ViT — results may be poor without tuning.

Failure Modes

  • Numerical instability if bias handling is changed (for example, distributing the bias or using the identity rule): relevances can explode.
  • Mis-tuned γ leads to under- or over-smoothing of attributions in vision models.
  • Softmax at classification outputs can absorb relevance when gradients vanish, distorting attributions.

Core Entities

Models

  • LLaMa 2-7b
  • Mixtral 8x7b
  • Flan-T5-XL
  • ViT-B-16
  • ViT-L-16
  • ViT-L-32
  • Phi-1.5

Metrics

  • Faithfulness area (area between the MoRF and LeRF perturbation curves)
  • Accuracy
  • Intersection over Union (IoU)

Datasets

  • ImageNet
  • IMDB
  • Wikipedia
  • SQuADv2
  • Wikipedia summary dataset

Benchmarks

  • Perturbation faithfulness (area between MoRF/LeRF curves)
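
The perturbation faithfulness benchmark can be sketched as follows, assuming a scalar-output model and zero-baseline occlusion. Features are removed most-relevant-first (MoRF) and least-relevant-first (LeRF), and the score is the mean gap between the two curves. The normalization and perturbation scheme used in the paper may differ, and `perturbation_curve` and `faithfulness_area` are illustrative names.

```python
import numpy as np

def perturbation_curve(model_fn, x, order, baseline=0.0):
    """Model output after occluding features one at a time in the given order."""
    perturbed = np.asarray(x, dtype=float).copy()
    outputs = [model_fn(perturbed)]
    for idx in order:
        perturbed[idx] = baseline
        outputs.append(model_fn(perturbed))
    return np.array(outputs)

def faithfulness_area(model_fn, x, attribution, baseline=0.0):
    """Mean gap between the LeRF and MoRF curves; larger means more faithful."""
    morf_order = np.argsort(-attribution)   # most relevant first
    lerf_order = np.argsort(attribution)    # least relevant first
    morf = perturbation_curve(model_fn, x, morf_order, baseline)
    lerf = perturbation_curve(model_fn, x, lerf_order, baseline)
    return float(np.mean(lerf - morf))

# Toy linear model where the exact attribution is weights * input.
w = np.array([3.0, 1.0, -2.0, 0.5])
model_fn = lambda v: float(w @ v)
x = np.ones(4)
good_attribution = w * x
area_good = faithfulness_area(model_fn, x, good_attribution)
area_reversed = faithfulness_area(model_fn, x, -good_attribution)
```

A faithful attribution makes the MoRF curve drop quickly while the LeRF curve stays high, so the area is positive. Reversing the ranking swaps the two curves and flips the sign.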