ARC-JSD: a fast, training-free JSD method to find which retrieved sentences make a RAG answer

May 22, 2025 · 8 min

Overview

Decision Snapshot: Needs Validation

ARC-JSD is a practical inference-only tool with solid evidence on standard RAG QA benchmarks; strengths are compute savings and mechanistic consistency, while limits include sentence-level granularity and dependency on models exposing probabilities.

Citations: 1

Evidence Strength: 0.70

Confidence: 0.85

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 4/5

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 70%

Production readiness: 70%

Novelty: 60%

Authors

Ruizhe Li, Chen Chen, Yuchen Hu, Yanjun Gao, Xi Wang, Emine Yilmaz

Links

Abstract / PDF / Code / Data

Why It Matters For Business

ARC-JSD gives a cheap, plug-in way to show which retrieved sentences actually caused an LLM answer, cutting compute costs and reducing hallucinations—useful for product trust, compliance, and debugging.

Who Should Care

Summary TLDR

The paper introduces ARC-JSD, a lightweight inference-time method that ranks retrieved sentences by how much removing each sentence changes the model's output distribution, measured with Jensen-Shannon divergence (JSD). ARC-JSD needs only forward passes (no fine-tuning, gradients, or surrogate models), yields an average improvement of ≈10.7% in top-1 sentence attribution accuracy over prior training-free baselines on TyDi QA, Hotpot QA and MuSiQue, and cuts compute cost by up to 3× versus surrogate/gradient methods. The method also locates the attention heads and MLP layers most important for attribution and uses them to reduce hallucination (~39% drop) without harming factual F1.
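The ablate-and-compare idea in the summary can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `response_dists` is a hypothetical callable that returns one next-token distribution per response token for a given context, `toy_model` is a made-up stand-in for a real LLM, and summing the per-token JSD into a single sentence score is an assumption about how the scores are aggregated.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Dist = Dict[str, float]  # token -> probability


def jsd_bits(p: Dist, q: Dist) -> float:
    """Jensen-Shannon divergence between two distributions, in bits."""
    total = 0.0
    for tok in set(p) | set(q):
        pi, qi = p.get(tok, 0.0), q.get(tok, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution M = (P + Q) / 2
        if pi > 0.0:
            total += 0.5 * pi * math.log2(pi / mi)
        if qi > 0.0:
            total += 0.5 * qi * math.log2(qi / mi)
    return total


def rank_sentences(
    sentences: Sequence[str],
    response_dists: Callable[[Sequence[str]], List[Dist]],
) -> List[Tuple[int, float]]:
    """Score each sentence by the summed per-token JSD caused by removing it."""
    full = response_dists(sentences)  # reference pass with the full context
    scores = []
    for i in range(len(sentences)):
        ablated_context = [s for j, s in enumerate(sentences) if j != i]
        ablated = response_dists(ablated_context)  # one ablation per sentence
        scores.append((i, sum(jsd_bits(p, q) for p, q in zip(full, ablated))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def toy_model(context: Sequence[str]) -> List[Dist]:
    """Hypothetical stand-in for an LLM: confident only if 'Paris' is in context."""
    if any("Paris" in s for s in context):
        return [{"Paris": 0.9, "Lyon": 0.1}]
    return [{"Paris": 0.5, "Lyon": 0.5}]


context = [
    "The Eiffel Tower is in Paris.",
    "Bread is tasty.",
    "Lyon is a city in France.",
]
ranking = rank_sentences(context, toy_model)
print(ranking)  # sentence 0 ranks first; the other two score 0.0
```

With a real model, `response_dists` would run the model over the fixed response (teacher forcing) and return the softmax over the vocabulary at each response position.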

Problem Statement

In Retrieval-Augmented Generation (RAG), it's hard and costly to verify which retrieved sentences actually caused a model's answer. Existing approaches need heavy fine-tuning, many forward passes, gradient computations, or human labels. We need a fast, training-free way to attribute responses to specific context sentences and to inspect which internal components use them.

Main Contribution

ARC-JSD: an inference-only, Jensen-Shannon-divergence method to rank context sentences by their causal effect on the output distribution.

Empirical demonstration that ARC-JSD improves top-1 context attribution accuracy by ~10.7% on standard RAG QA benchmarks while reducing compute up to 3× versus prior baselines.
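For reference, the standard definition of Jensen-Shannon divergence with base-2 logarithms (so the value is in bits) is given below. The sentence score in the last line is a sketch of how the divergence could be aggregated over response tokens; the notation (C, q, y, s) is assumed here for illustration and is not taken from the paper.

```latex
\[
\mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M),
\qquad M = \tfrac{1}{2}(P + Q),
\]
\[
\mathrm{KL}(P \,\|\, M) = \sum_{v} P(v)\,\log_2 \frac{P(v)}{M(v)},
\qquad 0 \le \mathrm{JSD}(P \,\|\, Q) \le 1 \text{ bit}.
\]
% Assumed notation: context C = {c_1, ..., c_n}, query q, response tokens y_1, ..., y_T.
\[
s(c_i) = \sum_{t=1}^{T} \mathrm{JSD}\!\left( p(\cdot \mid C, q, y_{<t}) \,\Big\|\, p(\cdot \mid C \setminus \{c_i\}, q, y_{<t}) \right)
\]
```

Unlike KL divergence, JSD is symmetric and bounded, which makes scores comparable across sentences and queries.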

Key Findings

ARC-JSD improves top-1 sentence attribution accuracy versus prior training-free baselines.

Numbers: ≈10.7% average accuracy gain (MuSiQue summary; §4.2, Fig.2)

Practical Use: Use ARC-JSD to more reliably pick the single sentence that grounded an answer, improving auditability without extra training.

Evidence Ref: Abstract; §4.2; Fig.2

ARC-JSD reduces inference compute relative to surrogate/gradient baselines.

Numbers: Up to 3× speedup vs ContextCite/surrogate baselines (§4.2; Appendix H)

Practical Use: Run ARC-JSD in production to get attribution with far lower GPU costs than methods requiring hundreds of forward passes or fine-tuning.

Evidence Ref: Table1; §4.2; Appendix H; Fig.2

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Accuracy | ≈+10.7% vs training-free baselines | ALTI-Logit/MIRAGE/ContextCite | +10.7% | Aggregate over TyDi QA, Hotpot QA, MuSiQue | Fig.2; §4.2 | Fig.2; §4.2
Compute cost | Up to 3× faster | ContextCite and gradient-based baselines | As low as 1/3 of baseline GFLOPs per sample | MuSiQue and others (compute-accuracy trade-off) | Table1; Fig.2; Appendix H | Table1; Fig.2

What To Try In 7 Days

Run ARC-JSD on a sample of production RAG queries to flag low-evidence answers (sentence-JSD < 0.02 bits).

Compare ARC-JSD top-1 sentence vs your current citation heuristic to measure attribution gaps.

Use ARC-JSD to find top attention/MLP components and test gating them to reduce hallucinations safely.
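The first two checks above can be run on precomputed scores with a small script. This is a minimal sketch under stated assumptions: the record fields `jsd_scores` and `heuristic_top1` are hypothetical names, the sample numbers are invented, and "low evidence" is interpreted as the highest sentence score falling below the 0.02-bit threshold.

```python
from typing import Dict, List, Optional

LOW_EVIDENCE_BITS = 0.02  # threshold from the list above; tune on your own data


def top1_or_none(scores: List[float], threshold: float = LOW_EVIDENCE_BITS) -> Optional[int]:
    """Index of the highest-scoring sentence, or None if no score reaches the threshold."""
    if not scores or max(scores) < threshold:
        return None
    return max(range(len(scores)), key=scores.__getitem__)


def attribution_report(records: List[Dict]) -> Dict[str, float]:
    """Low-evidence rate, plus top-1 agreement with an existing citation heuristic."""
    flagged = compared = agree = 0
    for rec in records:
        top = top1_or_none(rec["jsd_scores"])
        if top is None:
            flagged += 1  # low evidence: do not force an attribution
            continue
        compared += 1
        if top == rec["heuristic_top1"]:
            agree += 1
    n = len(records)
    return {
        "low_evidence_rate": flagged / n if n else 0.0,
        "top1_agreement": agree / compared if compared else 0.0,
    }


# Invented sample: per-sentence JSD scores and the sentence your heuristic cited.
records = [
    {"jsd_scores": [0.31, 0.01, 0.00], "heuristic_top1": 0},
    {"jsd_scores": [0.004, 0.010, 0.002], "heuristic_top1": 1},
    {"jsd_scores": [0.05, 0.40, 0.02], "heuristic_top1": 0},
    {"jsd_scores": [0.00, 0.03, 0.25], "heuristic_top1": 2},
]
report = attribution_report(records)
print(report)
```

Low-evidence answers are excluded from the agreement rate so that a weak attribution is not counted as a disagreement with the heuristic.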

Agent Features

Memory
retrieval context (sentence-level)
Architectures
autoregressive Transformer

Optimization Features

Infra Optimization
lower GFLOPs per sample; practical 3× speedup reported
Training Optimization
none required (inference-only method)
Inference Optimization
reduces forward-call budget vs surrogate/gradient methods
single ablation per sentence (no gradient/backprop)
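A rough way to reason about the forward-call budget is to count passes per query. The accounting below is an assumption for illustration (one full-context pass plus one ablated pass per sentence for ARC-JSD, and a fixed hypothetical sample budget for an ablation-sampling baseline); it is not the paper's GFLOPs measurement and does not reproduce the reported 3× figure.

```python
def arc_jsd_forward_calls(num_sentences: int) -> int:
    """Assumed accounting: one full-context pass plus one ablated pass per sentence."""
    if num_sentences < 0:
        raise ValueError("num_sentences must be non-negative")
    return 1 + num_sentences


def sampled_ablation_forward_calls(num_samples: int) -> int:
    """Hypothetical baseline: one full-context pass plus a fixed number of sampled ablations."""
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    return 1 + num_samples


# Compare budgets for contexts of different sizes against a 256-sample baseline.
for n in (10, 50, 200):
    print(n, arc_jsd_forward_calls(n), sampled_ablation_forward_calls(256))
```

Under this accounting the cost grows linearly with the number of context sentences, so the saving is largest for short contexts and shrinks as the context approaches the baseline's sample budget.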

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Partial
License: Unknown

Data URLs

TyDi QA (public), Hotpot QA (public), MuSiQue (public)

Risks & Boundaries

Limitations

Granularity limited to sentence-level in reported experiments; finer spans need extra engineering.

Does not identify individual neurons inside MLPs; layer-level only.

When Not To Use

When you need token- or phrase-level attribution out of the box (paper reports sentence-level).

When the LLM does not expose reliable next-token probabilities or logits.

Failure Modes

All JSD scores are very small: the model likely ignored the context, so ARC-JSD reports low evidence rather than forcing a label.

If the model's answer comes from parametric memory (not the retrieved context), JSD may be low and the attribution will be uninformative.

Core Entities

Models

Qwen2-1.5B-IT, Qwen2-7B-IT, Gemma2-2B-IT, Gemma2-9B-IT, LLaMA-3.1-8B-IT, Qwen3-Next-80B-A3B-IT

Metrics

Jensen-Shannon divergence (bits), Accuracy, GFLOPs per sample, Hallucination rate (%), Pass@1 factual F1 (%)

Datasets

TyDi QA, Hotpot QA, MuSiQue, PubMedQA, MedQuAD, LegalBench

Benchmarks

Accuracy