Overview
Production Readiness
0.6
Novelty Score
0.55
Cost Impact Score
0.35
Citation Count
0
Why It Matters For Business
You can audit autonomous agents to see which past memory or tool output caused a decision—useful for compliance, debugging, and fixing business rule violations without needing explicit failures.
Summary TLDR
This paper presents a two-stage attribution framework that explains why an LLM-based agent produced a specific action. First it replays the agent trajectory component-by-component and scores the marginal likelihood gain to find high-impact components (temporal likelihood dynamics). Then it ablates sentences inside those components (probability drop & hold) to surface the exact textual evidence. The method is evaluated on eight curated agent trajectories (memory- and tool-driven scenarios) using Llama-3.1-70B-Instruct; a simple leave-one-out and linear baselines are compared. Code is released.
Problem Statement
Existing attribution work focuses on locating explicit failures. But many undesirable agent actions occur without an explicit error signal (for example a reasonable-looking refund or a privacy leak caused by a retrieved email). We need methods to explain which past memory entries, tool returns, or sentences actually drove a chosen action.
Main Contribution
A hierarchical agentic attribution framework: component-level temporal replay + sentence-level perturbation.
Component-level method: score marginal likelihood gains when incrementally revealing trajectory components.
Sentence-level method: combine probability drop (necessary) and probability hold (sufficient) via ablation.
Empirical evaluation on eight curated agent trajectories (memory- and tool-driven cases) using Llama-3.1-70B-Instruct.
Code release: https://github.com/AI45Lab/AgentDoG
Key Findings
Prob. Drop&Hold hits the human-labelled top sentence 93.75% of the time (Hit@1).
Multiple sentence-level attribution methods work well inside the framework; leave-one-out and ContextCite reach 81.25% Hit@1.
The framework pinpoints diverse driver types: memory reuse, prompt injections, early spurious tool signals, and hallucination from user prompts.
Results
Hit@1 (Prob. Drop&Hold)
Hit@3 (Prob. Drop&Hold)
Who Should Care
What To Try In 7 Days
Run the component-level replay on a few real agent traces to surface high-impact steps.
Apply the probability drop&hold ablation to top components to surface the exact sentence that drove an action.
Use findings to add simple guards: ignore untrusted tool text, downweight single-case memories, or require explicit evidence before high-risk actions.
Agent Features
Memory
- retrieval memory
- long-term memory updates
Planning
- incremental replay of trajectory (temporal likelihood dynamics)
Tool Use
- treat tool outputs as observations
- expose tool returns (email, web search, file reader) for attribution
Frameworks
- smolagents
Is Agentic
true
Architectures
- LLM-based agent (single-agent)
Reproducibility
Code Urls
Code Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Evaluation uses eight curated cases and a single model (Llama-3.1-70B-Instruct), limiting generality.
- Sentence-level ablation requires model likelihood access and can be costly on long contexts.
- Human ground truth uses intersection of five annotators, which is conservative and may miss valid alternative evidence.
When Not To Use
- When you cannot compute or compare model likelihoods for ablated inputs (closed APIs without log-probs).
- As a fully automated monitor at large scale without further work to automate interpretation.
- If you need end-to-end causal proof rather than plausible evidence localization.
Failure Modes
- Multiple components share influence and attribution may split credit ambiguously.
- Agent self-contradiction or latent internal state can lead to misleading likelihood signals.
- Gradient salience runs OOM on long traces (noted for Saliency Score).
Core Entities
Models
- Llama-3.1-70B-Instruct
Metrics
- Hit@k (Hit@1/3/5)
- log-likelihood gain
Datasets
- custom 8-case agent trajectories
Benchmarks
- GAIA (one complex retrieval case used)
Context Entities
Models
- none other explicitly evaluated
Metrics
- probability drop
- probability hold
Datasets
- GAIA (cited)
- human annotation intersection (5 annotators) for ground truth

