Overview
Production Readiness
0.6
Novelty Score
0.65
Cost Impact Score
0.4
Citation Count
0
Why It Matters For Business
If your agent can call external services, hidden instructions in returned content can trigger harmful actions. IPIGUARD stops many such attacks by pre-planning allowed tool calls and blocking unapproved ones, trading modest extra cost for much stronger protection.
Summary TLDR
IPIGUARD shifts defense from model prompting to controlling execution. It first asks the agent to plan all tool calls as a Tool Dependency Graph (TDG), then strictly follows that plan while allowing limited, safe read-only expansions and simulated ('fake') tool responses to avoid adversarial changes. On the AgentDojo benchmark, this reduces attacker success to about 0.7% while keeping user utility near the undefended level, at the cost of ~2x token usage.
Problem Statement
LLM agents calling external tools can be hijacked by hidden instructions inside tool outputs (Indirect Prompt Injection, IPI). Existing defenses tweak prompts or add detectors but still let agents call any tool at execution time, so attackers can trigger harmful tool calls. The paper asks: can we stop IPI at the source by pre-planning and forbidding unapproved tool invocations?
Main Contribution
Introduce IPIGUARD: enforce a planned Tool Dependency Graph (TDG) so execution follows a pre-approved ordered plan rather than freely invoking tools at runtime.
Design Argument Estimation, Node Expansion, and Fake Tool Invocation to handle unknown arguments, permit safe read-only extra queries, and neutralize injected instructions that reuse legitimate tools.
Empirically test across six LLMs and four attack types on the AgentDojo benchmark, showing a strong security-utility trade-off compared to prior defenses.
Key Findings
IPIGUARD reduces average targeted attack success rate (ASR)
Benign utility is preserved near the undefended baseline
Security gains come with measurable runtime/token overhead
Combining Fake Tool Invocation (FTI) and Node Expansion (NE) helps both security and utility
Results
Average Attack Success Rate (ASR)
Benign Utility (BU)
Utility under Attack (UA)
Input tokens per task
Task completion time
Who Should Care
What To Try In 7 Days
Run a smoke test: build a simple TDG for a 3-step workflow and enforce executor to follow it.
Enable Node Expansion for read-only queries only, to preserve utility while blocking writes.
Simulate Fake Tool Invocation for one overlapping-tool scenario to verify it prevents argument hijacks.
Agent Features
Memory
- execution context stores prior tool responses for argument estimation
Planning
- TDG construction (plan all tool calls up front)
- separate planning LLM and execution LLM allowed
Tool Use
- enforce pre-approved tool calls
- Query vs Command tool classification
Frameworks
- Command-Query Responsibility Segregation (inspired classification)
Is Agentic
true
Architectures
- tool-augmented LLM agent (planner + executor)
Collaboration
- planner and executor can be different LLMs
Optimization Features
Token Efficiency
- planning accounts for ~20% of token usage; allow stronger planners without large token cost
Inference Optimization
- use weaker executor + stronger planner to reduce cost (evaluated)
Reproducibility
Code Urls
- code mentioned as available in paper; exact URL not provided in text
Data Urls
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Focuses on tool-invocation attacks (IPI) and not on purely textual manipulations that don't trigger tools.
- Experiment scale limited by LLM query costs; fewer models and cases than fully exhaustive studies.
- Requires access to reasonably strong planners; weaker models may reduce TDG quality and utility.
When Not To Use
- When attacker only aims to subtly change text outputs without causing tool actions (different threat model).
- In highly resource-constrained deployments where 2x token usage and extra latency are unacceptable.
- When no model in your stack can produce reliable plans (TDG quality is critical).
Failure Modes
- Fake Tool Invocation might fail in corner cases, allowing argument hijacks.
- Errors in TDG construction (missing necessary nodes) can block legitimate functionality.
- Over-conservative blocking of command tools may reduce utility for workflows that require dynamic, write operations.
Core Entities
Models
- GPT-4o
- GPT-4o-mini
- Claude 3.5 Sonnet
- Qwen2.5-7B-Instruct
- Qwen3-32B
- o4-mini
Metrics
- Benign Utility (BU)
- Utility under Attack (UA)
- Attack Success Rate (ASR)
- Input Tokens
- Output Tokens
- Task Time (s)
Datasets
- AgentDojo
Benchmarks
- AgentDojo
Context Entities
Models
- other closed-source and open-source LLMs referenced for comparison
Metrics
- BU, UA, ASR, token/time overhead
Datasets
- AgentDojo (public link in paper)
Benchmarks
- AgentDojo

