Overview
ARC-JSD is a practical inference-only tool with solid evidence on standard RAG QA benchmarks; strengths are compute savings and mechanistic consistency, while limits include sentence-level granularity and dependency on models exposing probabilities.
Citations: 1
Evidence Strength: 0.70
Confidence: 0.85
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 4/5
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 70%
Production readiness: 70%
Novelty: 60%
Why It Matters For Business
ARC-JSD gives a cheap, plug-in way to show which retrieved sentences actually caused an LLM answer, cutting compute costs and reducing hallucinations. That makes it useful for product trust, compliance, and debugging.
Summary TLDR
The paper introduces ARC-JSD, a lightweight inference-time method that ranks retrieved sentences by how much removing each sentence changes the model's output distribution, measured with Jensen-Shannon divergence (JSD). ARC-JSD needs only forward passes (no fine-tuning, gradients, or surrogate models), yields an average improvement of ≈10.7% in top-1 sentence attribution over prior training-free baselines on TyDi QA, Hotpot QA and MuSiQue, and cuts compute cost by up to 3× compared with surrogate/gradient methods. The method also locates the attention heads and MLP layers that matter most for attribution and uses them to reduce hallucination (~39% drop) without harming factual F1.
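For reference, the standard definition of Jensen-Shannon divergence, followed by one plausible way to write the per-sentence score the summary describes. The symbols ($Q$ for the query, $C$ for the retrieved context, $c_i$ for the $i$-th sentence, $r_t$ for the $t$-th response token) are notation chosen here, and summing over response tokens is an assumption about how the per-token divergences are aggregated, not a detail confirmed by this summary.

```latex
\mathrm{JSD}(P \,\|\, Q') = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q' \,\|\, M),
\qquad M = \tfrac{1}{2}\,(P + Q')

s(c_i) = \sum_{t=1}^{|r|} \mathrm{JSD}\!\left(
    p(\cdot \mid Q,\, C,\, r_{<t}) \,\middle\|\,
    p(\cdot \mid Q,\, C \setminus \{c_i\},\, r_{<t})
\right)
```

With base-2 logarithms each JSD term lies between 0 and 1 bit, so a sentence whose removal leaves the output distribution unchanged scores 0.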
Problem Statement
In Retrieval-Augmented Generation (RAG), it's hard and costly to verify which retrieved sentences actually caused a model's answer. Existing approaches need heavy fine-tuning, many forward passes, gradient computations, or human labels. We need a fast, training-free way to attribute responses to specific context sentences and to inspect which internal components use them.
Main Contribution
ARC-JSD: an inference-only, Jensen-Shannon-divergence method to rank context sentences by their causal effect on the output distribution.
Empirical demonstration that ARC-JSD improves top-1 context attribution accuracy by ~10.7% on standard RAG QA benchmarks while reducing compute up to 3× versus prior baselines.
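A minimal sketch of the leave-one-sentence-out ranking described above, not the authors' implementation. It assumes a hypothetical callable `response_dists(context_sentences)` that returns one next-token probability distribution per response token for a given context; with a real model this would be one forward pass per call. The toy model at the bottom is purely illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Dist = Sequence[float]  # one probability distribution over the vocabulary


def jsd_bits(p: Dist, q: Dist) -> float:
    """Jensen-Shannon divergence between two distributions, in bits."""
    def kl(a: Dist, b: Dist) -> float:
        # Terms with a == 0 contribute nothing to the KL sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_sentences(
    sentences: List[str],
    response_dists: Callable[[List[str]], List[Dist]],  # hypothetical model hook
) -> List[Tuple[int, float]]:
    """Return (sentence index, score) pairs, highest score first."""
    full = response_dists(sentences)  # distributions with the full context
    scores = []
    for i in range(len(sentences)):
        ablated = response_dists(sentences[:i] + sentences[i + 1:])
        # Assumption: per-token divergences are summed over the response.
        scores.append(sum(jsd_bits(p, q) for p, q in zip(full, ablated)))
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Toy stand-in for a model: confident only when the key sentence is present.
KEY = "Paris is the capital of France."


def toy_response_dists(context: List[str]) -> List[Dist]:
    return [[0.9, 0.1]] if KEY in context else [[0.5, 0.5]]


sentences = ["The Seine flows through the city.", KEY, "Croissants are popular."]
ranked = rank_sentences(sentences, toy_response_dists)
print(ranked)
```

Under this sketch, scoring n sentences takes n + 1 calls to `response_dists`: one with the full context and one per ablated sentence.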
Key Findings
ARC-JSD improves top-1 sentence attribution accuracy versus prior training-free baselines.
ARC-JSD reduces inference compute relative to surrogate/gradient baselines.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Top-1 attribution accuracy | ≈+10.7% vs training-free baselines | ALTI-Logit/MIRAGE/ContextCite | +10.7% | Aggregate over TyDi QA, Hotpot QA, MuSiQue | Fig. 2; §4.2 | Fig. 2; §4.2 |
| Compute cost | Up to 3× faster | ContextCite and gradient-based baselines | As low as ~1/3 of baseline GFLOPs per sample | MuSiQue and others (compute-accuracy trade-off) | Table 1; Fig. 2; Appendix H | Table 1; Fig. 2 |
What To Try In 7 Days
Run ARC-JSD on a sample of production RAG queries to flag low-evidence answers (sentence-JSD < 0.02 bits).
Compare ARC-JSD top-1 sentence vs your current citation heuristic to measure attribution gaps.
Use ARC-JSD to find top attention/MLP components and test gating them to reduce hallucinations safely.
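The first two items can be prototyped with a few lines once per-sentence JSD scores exist. The sketch below is an illustration under stated assumptions: the function names are hypothetical, an answer is treated as low-evidence when its highest sentence score falls below the 0.02-bit threshold quoted above, and `heuristic_top1` stands for whatever sentence index your current citation heuristic picks.

```python
from typing import Dict, List, Optional

LOW_EVIDENCE_BITS = 0.02  # threshold quoted in the list above; tune per model


def top1_or_flag(scores: List[float],
                 threshold: float = LOW_EVIDENCE_BITS) -> Optional[int]:
    """Index of the best-scoring sentence, or None for a low-evidence answer."""
    # Assumption: "low evidence" means even the best sentence is under threshold.
    if not scores or max(scores) < threshold:
        return None
    return max(range(len(scores)), key=scores.__getitem__)


def attribution_report(per_query_scores: List[List[float]],
                       heuristic_top1: List[int]) -> Dict[str, float]:
    """Count low-evidence answers and measure agreement with a heuristic."""
    flagged = compared = agree = 0
    for scores, heuristic_idx in zip(per_query_scores, heuristic_top1):
        top = top1_or_flag(scores)
        if top is None:
            flagged += 1  # nothing in the context moved the output much
            continue
        compared += 1
        agree += int(top == heuristic_idx)
    return {
        "low_evidence": flagged,
        "compared": compared,
        "agreement": agree / compared if compared else 0.0,
    }


# Made-up scores for three queries, each with two retrieved sentences.
report = attribution_report(
    per_query_scores=[[0.001, 0.005], [0.30, 0.01], [0.02, 0.40]],
    heuristic_top1=[0, 0, 0],
)
print(report)
```

Queries counted under `low_evidence` are candidates for manual review, since the answer may come from parametric memory rather than the retrieved context.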
Risks & Boundaries
Limitations
Granularity limited to sentence-level in reported experiments; finer spans need extra engineering.
Does not identify individual neurons inside MLPs; layer-level only.
When Not To Use
When you need token- or phrase-level attribution out of the box (paper reports sentence-level).
When the LLM does not expose reliable next-token probabilities or logits.
Failure Modes
All JSD scores are very small: the model likely ignored the context, and ARC-JSD will report low evidence rather than force a label.
If the model's answer comes from parametric memory (not the retrieved context), JSD may be low and the attribution will be uninformative.

