ARC-JSD: a fast, training-free JSD method to find which retrieved sentences make a RAG answer

May 22, 2025 · 8 min

Overview

Production Readiness

0.7

Novelty Score

0.6

Cost Impact Score

0.7

Citation Count

1

Authors

Ruizhe Li, Chen Chen, Yuchen Hu, Yanjun Gao, Xi Wang, Emine Yilmaz

Links

Abstract / PDF

Why It Matters For Business

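ARC-JSD gives a cheap, plug-in way to show which retrieved sentences actually caused an LLM answer. It needs less compute than surrogate or gradient-based attribution, and the model components it identifies can be gated to reduce hallucinations, which helps with product trust, compliance, and debugging.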

Summary TLDR

The paper introduces ARC-JSD, a lightweight inference-time method that ranks retrieved sentences by how much removing each sentence changes the model's output distribution, measured with Jensen-Shannon divergence (JSD). ARC-JSD needs only forward passes (no fine-tuning, gradients, or surrogate models), yields ≈10.7% average improvement in top-1 sentence attribution versus prior training-free baselines on TyDi QA, Hotpot QA and MuSiQue, and cuts compute cost by up to 3× versus surrogate/gradient methods. The method also locates attention heads and MLP layers important for attribution; gating them reduces the hallucination rate (~39% relative drop) without harming factual F1.
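For reference, the standard definition of Jensen-Shannon divergence, and one plausible way to write the per-sentence score described above. The first line is the textbook definition (with base-2 logarithms it is measured in bits and lies between 0 and 1). The second line is an assumption made here for illustration: it sums the divergence over response tokens, and the symbols C (retrieved context), c_i (one sentence), q (query), and R (generated response) are notation introduced for this sketch, not taken from the summary.

```latex
\mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M),
\qquad M = \tfrac{1}{2}(P + Q)

% Assumed aggregation: sum over the tokens y_j of the response R
\mathrm{score}(c_i) = \sum_{j=1}^{|R|}
  \mathrm{JSD}\!\big( P(y_j \mid C, q, y_{<j}) \,\|\, P(y_j \mid C \setminus \{c_i\}, q, y_{<j}) \big)
```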

Problem Statement

In Retrieval-Augmented Generation (RAG), it's hard and costly to verify which retrieved sentences actually caused a model's answer. Existing approaches need heavy fine-tuning, many forward passes, gradient computations, or human labels. We need a fast, training-free way to attribute responses to specific context sentences and to inspect which internal components use them.

Main Contribution

ARC-JSD: an inference-only, Jensen-Shannon-divergence method to rank context sentences by their causal effect on the output distribution.

Empirical demonstration that ARC-JSD improves top-1 context attribution accuracy by ~10.7% on standard RAG QA benchmarks while reducing compute up to 3× versus prior baselines.

Mechanistic analysis that pinpoints attention heads and MLP layers tied to context use, and a controlled gating intervention that lowers hallucination rate without harming factual F1.
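The ranking step in the first contribution can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the next-token distributions have already been collected from the model (once with the full context, once per removed sentence) and that per-token divergences are summed over the response. The function names and the toy distributions are hypothetical.

```python
import math

def kl_bits(p, q):
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd_bits(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)

def rank_sentences(full_dists, ablated_dists):
    """Rank sentences by how much removing each one shifts the output.

    full_dists: one next-token distribution per response token (full context).
    ablated_dists: sentence id -> distributions with that sentence removed.
    Assumption: the per-sentence score is the sum of per-token JSDs.
    """
    scores = {
        sid: sum(jsd_bits(p, q) for p, q in zip(full_dists, dists))
        for sid, dists in ablated_dists.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy 3-word vocabulary, 2 response tokens, 3 context sentences (made-up numbers).
full = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
ablated = {
    "s0": [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],  # removing s0 changes nothing
    "s1": [[0.1, 0.2, 0.7], [0.2, 0.3, 0.5]],  # removing s1 changes a lot
    "s2": [[0.6, 0.3, 0.1], [0.6, 0.3, 0.1]],  # removing s2 changes a little
}
ranking = rank_sentences(full, ablated)
for sid, score in ranking:
    print(f"{sid}: {score:.4f} bits")
```

In this toy case the sentence whose removal moves the distribution most ("s1") ranks first, and a sentence the model ignores ("s0") scores zero.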

Key Findings

ARC-JSD improves top-1 sentence attribution accuracy versus prior training-free baselines.

Numbers: ≈10.7% average accuracy gain (MuSiQue summary; §4.2, Fig. 2)

ARC-JSD reduces inference compute relative to surrogate/gradient baselines.

Numbers: Up to 3× speedup vs ContextCite/surrogate baselines (§4.2; H)

When ARC-JSD marks a model's attribution as correct, judged answers are semantically equivalent to gold answers.

Numbers: GPT-4.1 judge accuracy ≈99.3% when attribution is correct (Appendix F)

Gating attention heads and MLPs located by ARC-JSD reduces hallucination rate while preserving factual score.

Numbers: Hallucination down 13.4%→8.2% (~39% relative) with Pass@1 F1 76.1→75.9 (Table 4)
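The "~39% relative" figure follows directly from the two reported rates. A quick check of the arithmetic, using only the numbers quoted above (the helper name is made up for this example):

```python
def relative_reduction(before, after):
    """Fraction by which a rate dropped, relative to its starting value."""
    return (before - after) / before

halluc_drop = relative_reduction(13.4, 8.2)  # hallucination rate, percent
f1_change = 75.9 - 76.1                      # Pass@1 F1, percentage points

print(f"Relative hallucination reduction: {halluc_drop:.1%}")  # 38.8%
print(f"F1 change: {f1_change:+.1f} points")                   # -0.2 points
```

The absolute drop is 5.2 percentage points; the relative drop is 5.2 / 13.4 ≈ 38.8%, which the summary rounds to ~39%.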

Top attention heads selected by ARC-JSD show larger attribution signal than random heads.

Numbers: Avg JSD top-10 heads 2.23 ± 0.12 vs random 1.53 ± 0.76 (Table 5)

Results

Accuracy

Value: ≈+10.7% vs training-free baselines

Baseline: ALTI-Logit/MIRAGE/ContextCite

Compute cost

Value: Up to 3× faster

Baseline: ContextCite and gradient-based baselines

Judge-verified answer correctness when attribution correct

Value: ≈99.3% semantic match

Hallucination rate after gating ARC-JSD components

Value: 8.2% (from 13.4%)

Baseline: Base RAG

Average JSD of top-10 attention heads vs random 10

Value: 2.23 ± 0.12 vs 1.53 ± 0.76

Baseline: Random heads

Who Should Care

What To Try In 7 Days

Run ARC-JSD on a sample of production RAG queries to flag low-evidence answers (sentence-JSD < 0.02 bits).
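One way to implement this flagging step is sketched below. It assumes sentence scores have already been computed per query, and it interprets "sentence-JSD < 0.02 bits" as "even the highest-scoring sentence is below the threshold"; that interpretation, the function name, and the sample scores are assumptions made for illustration.

```python
LOW_EVIDENCE_BITS = 0.02  # threshold quoted in the item above

def flag_low_evidence(sentence_scores, threshold=LOW_EVIDENCE_BITS):
    """Return query ids whose best-supported sentence is still below threshold.

    sentence_scores: query id -> {sentence id -> JSD score in bits}.
    Queries with no retrieved sentences are flagged as well.
    """
    flagged = []
    for qid, scores in sentence_scores.items():
        if not scores or max(scores.values()) < threshold:
            flagged.append(qid)
    return flagged

# Made-up scores for three production queries.
sample = {
    "q1": {"s0": 0.31, "s1": 0.004},   # clearly grounded in s0
    "q2": {"s0": 0.003, "s1": 0.011},  # no sentence matters much
    "q3": {},                          # nothing was retrieved
}
flagged = flag_low_evidence(sample)
print(flagged)  # ['q2', 'q3']
```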

Compare ARC-JSD top-1 sentence vs your current citation heuristic to measure attribution gaps.
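A small sketch of how that comparison could be scored, assuming you already have the top-1 sentence id per query from both ARC-JSD and your existing heuristic. The function name and the sample ids are hypothetical.

```python
def top1_agreement(arc_top1, heuristic_top1):
    """Compare top-1 sentence choices on the queries both methods covered.

    Returns (agreement rate, sorted list of query ids where they disagree).
    """
    shared = arc_top1.keys() & heuristic_top1.keys()
    if not shared:
        return 0.0, []
    disagreements = sorted(q for q in shared if arc_top1[q] != heuristic_top1[q])
    rate = 1 - len(disagreements) / len(shared)
    return rate, disagreements

# Made-up top-1 picks for four queries.
arc = {"q1": "s2", "q2": "s0", "q3": "s1", "q4": "s3"}
heuristic = {"q1": "s2", "q2": "s1", "q3": "s1", "q4": "s0"}

rate, gaps = top1_agreement(arc, heuristic)
print(rate, gaps)  # 0.5 ['q2', 'q4']
```

The disagreement list is the useful output: those are the queries worth reviewing by hand to see which method cites the sentence that actually supports the answer.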

Use ARC-JSD to find top attention/MLP components and test gating them to reduce hallucinations safely.
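The summary does not say how the gating is applied or in which direction, so the following is only a generic illustration of what "gating a head" can mean: scaling a head's contribution before the per-head outputs are summed. The gate value 0.5, the function name, and the toy vectors are all hypothetical, and a real test would hook into the model's attention modules rather than plain lists.

```python
def gated_attention_output(head_contributions, gates):
    """Sum per-head contributions after scaling selected heads.

    head_contributions: one vector per head, assumed to be already projected
    into the model's hidden dimension so that the contributions add up.
    gates: head index -> scale factor; heads not listed keep a scale of 1.0.
    """
    dim = len(head_contributions[0])
    out = [0.0] * dim
    for h, vec in enumerate(head_contributions):
        g = gates.get(h, 1.0)
        for d in range(dim):
            out[d] += g * vec[d]
    return out

# Three toy heads with 2-dimensional outputs.
heads = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

baseline = gated_attention_output(heads, gates={})     # no gating
gated = gated_attention_output(heads, gates={1: 0.5})  # halve head 1
print(baseline, gated)  # [2.0, 3.0] [2.0, 2.0]
```

When trying this on a real model, compare hallucination rate and factual F1 before and after on a held-out sample, as the paper does, before shipping any gate setting.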

Agent Features

Memory

  • retrieval context (sentence-level)

Architectures

  • autoregressive Transformer

Optimization Features

Infra Optimization

  • lower GFLOPs per sample; up to 3× speedup reported

Training Optimization

  • none required (inference-only method)

Inference Optimization

  • reduces forward-call budget vs surrogate/gradient methods
  • single ablation per sentence (no gradient/backprop)
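The "single ablation per sentence" budget can be made concrete with a short sketch. It assumes the context is already split into sentences and that one forward pass over the response is run for the full context plus one for each leave-one-out context; the helper names are made up for this example.

```python
def leave_one_out_contexts(sentences):
    """Build one ablated context per sentence by dropping that sentence."""
    return [
        " ".join(sentences[:i] + sentences[i + 1:])
        for i in range(len(sentences))
    ]

def forward_pass_budget(num_sentences):
    """Assumed budget: one full-context pass plus one pass per ablation."""
    return num_sentences + 1

context = ["Paris is in France.", "The Seine flows through it.", "It hosts the Louvre."]
ablations = leave_one_out_contexts(context)
for removed, text in zip(context, ablations):
    print(f"without {removed!r}: {text}")

print(forward_pass_budget(len(context)))  # 4
```

Under this assumption the cost grows linearly with the number of retrieved sentences and involves no backward passes.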

Reproducibility

Data Urls

  • TyDi QA (public)
  • Hotpot QA (public)
  • MuSiQue (public)

Code Available

Data Available

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Granularity limited to sentence-level in reported experiments; finer spans need extra engineering.
  • Does not identify individual neurons inside MLPs; layer-level only.
  • Evaluations use public QA datasets and instruction-tuned LLMs; behavior may differ outside tested domains or with closed models.

When Not To Use

  • When you need token- or phrase-level attribution out of the box (paper reports sentence-level).
  • When the LLM does not expose reliable next-token probabilities or logits.
  • If you require neuron-level editing; ARC-JSD operates at sentence and layer/head granularity.

Failure Modes

  • All JSD scores very small: the model likely ignored the context; ARC-JSD will report low evidence rather than force a label.
  • If the model's answer comes from parametric memory (not the retrieved context), JSD may be low and the attribution will be uninformative.
  • Judge bias: downstream evaluation used GPT-4.1 as a semantic judge, which can introduce its own biases.

Core Entities

Models

  • Qwen2-1.5B-IT
  • Qwen2-7B-IT
  • Gemma2-2B-IT
  • Gemma2-9B-IT
  • LLaMA-3.1-8B-IT
  • Qwen3-Next-80B-A3B-IT

Metrics

  • Jensen-Shannon divergence (bits)
  • Accuracy
  • GFLOPs per sample
  • Hallucination rate (%)
  • Pass@1 factual F1 (%)

Datasets

  • TyDi QA
  • Hotpot QA
  • MuSiQue
  • PubMedQA
  • MedQuAD
  • LegalBench

Benchmarks

  • Accuracy