Overview
Production Readiness
0.7
Novelty Score
0.6
Cost Impact Score
0.7
Citation Count
1
Why It Matters For Business
ARC-JSD is a cheap, plug-in way to show which retrieved sentences actually drove an LLM answer. It lowers the compute cost of attribution and can reduce hallucinations, which helps with product trust, compliance, and debugging.
Summary TLDR
The paper introduces ARC-JSD, a lightweight inference-time method that ranks retrieved sentences by how much removing each one changes the model's output distribution, measured with Jensen-Shannon divergence (JSD). ARC-JSD needs only forward passes (no fine-tuning, gradients, or surrogate models). On TyDi QA, Hotpot QA and MuSiQue it improves top-1 sentence attribution by about 10.7% on average over prior training-free baselines, and it cuts compute cost by up to 3× relative to surrogate/gradient methods. The method also locates attention heads and MLP layers important for attribution and uses them to reduce hallucination (~39% drop) without harming factual F1.
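For reference, the divergence behind the score can be written out. The JSD definition is the standard one. The per-sentence score is only a sketch based on the description above: the symbols ($q$ for the query, $C$ for the retrieved context, $c_i$ for its $i$-th sentence, $y_t$ for the $t$-th response token) and the sum over response tokens are assumptions, not the paper's exact notation.

```latex
\mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M),
\qquad M = \tfrac{1}{2}\,(P + Q)

s(c_i) = \sum_{t=1}^{T} \mathrm{JSD}\!\left( p(y_t \mid q,\, C,\, y_{<t}) \;\middle\|\; p(y_t \mid q,\, C \setminus \{c_i\},\, y_{<t}) \right)
```

With base-2 logarithms, each per-token JSD term lies between 0 and 1 bit, which matches the "bits" unit used for the metric elsewhere in this summary.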
Problem Statement
In Retrieval-Augmented Generation (RAG), it's hard and costly to verify which retrieved sentences actually caused a model's answer. Existing approaches need heavy fine-tuning, many forward passes, gradient computations, or human labels. We need a fast, training-free way to attribute responses to specific context sentences and to inspect which internal components use them.
Main Contribution
ARC-JSD: an inference-only method based on Jensen-Shannon divergence that ranks context sentences by their causal effect on the output distribution.
Empirical demonstration that ARC-JSD improves top-1 context attribution accuracy by ~10.7% on standard RAG QA benchmarks while reducing compute by up to 3× versus prior baselines.
Mechanistic analysis that pinpoints attention heads and MLP layers tied to context use, and a controlled gating intervention that lowers hallucination rate without harming factual F1.
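The first contribution can be illustrated with a minimal leave-one-out sketch. It is not the authors' implementation: `model_fn` is a hypothetical interface that returns one next-token distribution per response position for a given context, `toy_model` is a stand-in for a real LLM, and summing per-token JSD into a sentence score is an assumption.

```python
import math
from typing import Callable, List, Sequence, Tuple

def jsd_bits(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence between two discrete distributions, in bits."""
    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a zero numerator contribute nothing to the KL sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical interface: context sentences -> one distribution per response token.
ModelFn = Callable[[List[str]], List[List[float]]]

def rank_sentences(sentences: List[str], model_fn: ModelFn) -> List[Tuple[int, float]]:
    """Score each sentence by the output shift caused by removing it (forward passes only)."""
    full = model_fn(sentences)
    scores = []
    for i in range(len(sentences)):
        ablated = model_fn(sentences[:i] + sentences[i + 1:])
        # Assumption: per-token divergences are summed into one sentence score.
        scores.append((i, sum(jsd_bits(p, q) for p, q in zip(full, ablated))))
    return sorted(scores, key=lambda item: item[1], reverse=True)

def toy_model(context: List[str]) -> List[List[float]]:
    """Toy stand-in for an LLM: confident only when the key sentence is present."""
    if any("Paris" in s for s in context):
        return [[0.9, 0.1], [0.8, 0.2]]
    return [[0.5, 0.5], [0.5, 0.5]]

sentences = ["The sky is blue.", "Paris is the capital of France.", "Cats purr."]
ranking = rank_sentences(sentences, toy_model)
print(ranking[0][0])  # index of the sentence with the largest score
```

In a real setup, `model_fn` would wrap a forward pass that scores the already-generated response under each context variant, so no gradients or fine-tuning are involved.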
Key Findings
ARC-JSD improves top-1 sentence attribution accuracy versus prior training-free baselines.
ARC-JSD reduces inference compute relative to surrogate/gradient baselines.
When ARC-JSD's attribution is correct, a judge model finds the generated answers semantically equivalent to the gold answers.
Gating attention heads and MLPs located by ARC-JSD reduces hallucination rate while preserving factual score.
Top attention heads selected by ARC-JSD show larger attribution signal than random heads.
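The last finding compares the top heads against random ones. A hedged sketch of that comparison follows, assuming per-head JSD scores have already been computed. How the paper derives those per-head scores is not reproduced here, and `top_vs_random`, `toy_scores`, and the `(layer, head)` keys are hypothetical names for illustration.

```python
import random
from statistics import mean
from typing import Dict, Tuple

HeadId = Tuple[int, int]  # (layer index, head index), hypothetical keying

def top_vs_random(head_scores: Dict[HeadId, float], k: int = 10, seed: int = 0) -> Tuple[float, float]:
    """Return (mean score of the k highest-scoring heads, mean score of k random heads)."""
    values = list(head_scores.values())
    if k > len(values):
        raise ValueError("k exceeds the number of heads")
    top_mean = mean(sorted(values, reverse=True)[:k])
    # Seeded sampler so the random baseline is repeatable.
    random_mean = mean(random.Random(seed).sample(values, k))
    return top_mean, random_mean

# Toy scores for a 4-layer, 8-head model; real values would come from the attribution run.
toy_scores = {(layer, head): 0.001 * (layer * 8 + head) for layer in range(4) for head in range(8)}
top_mean, random_mean = top_vs_random(toy_scores)
print(top_mean, random_mean)
```

The top-k mean can never be lower than the mean of any other k heads, so the useful signal is the size of the gap, ideally averaged over several random seeds.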
Results
Accuracy
Compute cost
Judge-verified answer correctness when attribution correct
Hallucination rate after gating ARC-JSD components
Average JSD of top-10 attention heads vs random 10
Who Should Care
What To Try In 7 Days
Run ARC-JSD on a sample of production RAG queries and flag low-evidence answers, i.e. those whose highest sentence-level JSD is below 0.02 bits.
Compare ARC-JSD top-1 sentence vs your current citation heuristic to measure attribution gaps.
Use ARC-JSD to find top attention/MLP components and test gating them to reduce hallucinations safely.
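The first two steps above can be sketched as a small triage helper. It assumes sentence-level JSD scores (in bits) are already available for each query. The function name `triage`, the result keys, and the reading of "low evidence" as "highest sentence score below the threshold" are assumptions made for illustration.

```python
from typing import Dict, List, Optional

LOW_EVIDENCE_BITS = 0.02  # threshold quoted in the steps above

def triage(sentence_jsd: List[float], cited_index: Optional[int] = None) -> Dict[str, object]:
    """Flag low-evidence answers and compare the top-1 sentence with an existing citation."""
    if not sentence_jsd:
        # No retrieved sentences: nothing can support the answer.
        return {"top1": None, "low_evidence": True, "agrees_with_citation": None}
    top1 = max(range(len(sentence_jsd)), key=lambda i: sentence_jsd[i])
    return {
        "top1": top1,
        "low_evidence": sentence_jsd[top1] < LOW_EVIDENCE_BITS,
        # None when there is no citation heuristic to compare against.
        "agrees_with_citation": None if cited_index is None else top1 == cited_index,
    }

weak = triage([0.001, 0.004, 0.003])
strong = triage([0.01, 0.35, 0.02], cited_index=0)
print(weak["low_evidence"], strong["top1"], strong["agrees_with_citation"])
```

Counting how often `agrees_with_citation` is false over a query sample gives a rough measure of the attribution gap mentioned in the second step.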
Agent Features
Memory
- retrieval context (sentence-level)
Architectures
- autoregressive Transformer
Optimization Features
Infra Optimization
- lower GFLOPs per sample; up to 3× speedup reported
Training Optimization
- none required (inference-only method)
Inference Optimization
- reduces forward-call budget vs surrogate/gradient methods
- single ablation per sentence (no gradient/backprop)
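The forward-call budget implied by "single ablation per sentence" is simple arithmetic, sketched here. It assumes one pass on the full context plus one pass per ablated sentence, and it ignores the cost of generating the answer itself. The baseline budget passed to `speedup_vs` is a caller-supplied, hypothetical number, not a figure from the paper.

```python
def arc_jsd_forward_passes(num_sentences: int) -> int:
    """One pass on the full context plus one pass per single-sentence ablation (assumed)."""
    if num_sentences < 0:
        raise ValueError("num_sentences must be non-negative")
    return 1 + num_sentences

def speedup_vs(baseline_passes: int, num_sentences: int) -> float:
    """Ratio of a baseline's forward-pass count to the leave-one-out count."""
    return baseline_passes / arc_jsd_forward_passes(num_sentences)

passes = arc_jsd_forward_passes(20)
ratio = speedup_vs(baseline_passes=63, num_sentences=20)  # 63 is an illustrative budget
print(passes, ratio)
```

Because the pass count grows linearly with the number of context sentences, the advantage over a fixed-budget baseline shrinks as contexts get longer.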
Reproducibility
Data Urls
- TyDi QA (public)
- Hotpot QA (public)
- MuSiQue (public)
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Granularity limited to sentence-level in reported experiments; finer spans need extra engineering.
- Does not identify individual neurons inside MLPs; layer-level only.
- Evaluations use public QA datasets and instruction-tuned LLMs; behavior may differ outside tested domains or with closed models.
When Not To Use
- When you need token- or phrase-level attribution out of the box (paper reports sentence-level).
- When the LLM does not expose reliable next-token probabilities or logits.
- If you require neuron-level editing; ARC-JSD operates at sentence and layer/head granularity.
Failure Modes
- If all JSD scores are very small, the model likely ignored the context; ARC-JSD reports low evidence rather than forcing a label.
- If the model's answer comes from parametric memory (not the retrieved context), JSD may be low and the attribution will be uninformative.
- Judge bias: downstream evaluation used GPT-4.1 as a semantic judge, which can introduce its own biases.
Core Entities
Models
- Qwen2-1.5B-IT
- Qwen2-7B-IT
- Gemma2-2B-IT
- Gemma2-9B-IT
- LLaMA-3.1-8B-IT
- Qwen3-Next-80B-A3B-IT
Metrics
- Jensen-Shannon divergence (bits)
- Accuracy
- GFLOPs per sample
- Hallucination rate (%)
- Pass@1 factual F1 (%)
Datasets
- TyDi QA
- Hotpot QA
- MuSiQue
- PubMedQA
- MedQuAD
- LegalBench
Benchmarks
- Accuracy

