Overview
Method is a practical system-level defense with clear mechanisms and benchmarked gains. Results are robust across models but evaluated at limited scale and with token/latency overhead.
Citations0
Evidence Strength0.80
Confidence0.85
Risk Signals9
Trust Signals
Findings with numeric evidence: 4/4
Findings with evidence refs: 4/4
Results with explicit delta: 5/5
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 40%
Production readiness: 60%
Novelty: 65%
Why It Matters For Business
If your agent can call external services, hidden instructions in returned content can trigger harmful actions. IPIGUARD stops many such attacks by pre-planning allowed tool calls and blocking unapproved ones, trading modest extra cost for much stronger protection.
Who Should Care
Summary TLDR
IPIGUARD shifts defense from model prompting to controlling execution. It first asks the agent to plan all tool calls as a Tool Dependency Graph (TDG), then strictly follows that plan while allowing limited, safe read-only expansions and simulated ('fake') tool responses to avoid adversarial changes. On the AgentDojo benchmark, this reduces attacker success to about 0.7% while keeping user utility near the undefended level, at the cost of ~2x token usage.
Problem Statement
LLM agents calling external tools can be hijacked by hidden instructions inside tool outputs (Indirect Prompt Injection, IPI). Existing defenses tweak prompts or add detectors but still let agents call any tool at execution time, so attackers can trigger harmful tool calls. The paper asks: can we stop IPI at the source by pre-planning and forbidding unapproved tool invocations?
Main Contribution
Introduce IPIGUARD: enforce a planned Tool Dependency Graph (TDG) so execution follows a pre-approved ordered plan rather than freely invoking tools at runtime.
Design Argument Estimation, Node Expansion, and Fake Tool Invocation to handle unknown arguments, permit safe read-only extra queries, and neutralize injected instructions that reuse legitimate tools.
Key Findings
IPIGUARD reduces average targeted attack success rate (ASR)
Benign utility is preserved near the undefended baseline
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Average Attack Success Rate (ASR) | 0.69% | No defense 13.16% (overall in Table 1) | ≈ -12.47 pp | AgentDojo, overall | IPIGUARD average ASR 0.69% vs no defense 13.16% across AgentDojo | Table 1 |
| Benign Utility (BU) | 67.01% | No defense 68.04% | -1.03 pp | Tasks without attack (AgentDojo) | IPIGUARD maintains BU ~67% vs 68% no-defense | Figure 5 / text |
What To Try In 7 Days
Run a smoke test: build a simple TDG for a 3-step workflow and enforce executor to follow it.
Enable Node Expansion for read-only queries only, to preserve utility while blocking writes.
Simulate Fake Tool Invocation for one overlapping-tool scenario to verify it prevents argument hijacks.
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic
Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
Inference Optimization
Reproducibility
Code URLs
Data URLs
Risks & Boundaries
Limitations
Focuses on tool-invocation attacks (IPI) and not on purely textual manipulations that don't trigger tools.
Experiment scale limited by LLM query costs; fewer models and cases than fully exhaustive studies.
When Not To Use
When attacker only aims to subtly change text outputs without causing tool actions (different threat model).
In highly resource-constrained deployments where 2x token usage and extra latency are unacceptable.
Failure Modes
Fake Tool Invocation might fail in corner cases, allowing argument hijacks.
Errors in TDG construction (missing necessary nodes) can block legitimate functionality.

