Build a tool-plan (TDG) and block unexpected tool calls to stop indirect prompt injections

August 21, 20258 min

Overview

Decision SnapshotReady For Pilot

Method is a practical system-level defense with clear mechanisms and benchmarked gains. Results are robust across models but evaluated at limited scale and with token/latency overhead.

Citations0

Evidence Strength0.80

Confidence0.85

Risk Signals9

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 5/5

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 40%

Production readiness: 60%

Novelty: 65%

Authors

Hengyu An, Jinghuai Zhang, Tianyu Du, Chunyi Zhou, Qingming Li, Tao Lin, Shouling Ji

Links

Abstract / PDF / Code / Data

Why It Matters For Business

If your agent can call external services, hidden instructions in returned content can trigger harmful actions. IPIGUARD stops many such attacks by pre-planning allowed tool calls and blocking unapproved ones, trading modest extra cost for much stronger protection.

Who Should Care

Summary TLDR

IPIGUARD shifts defense from model prompting to controlling execution. It first asks the agent to plan all tool calls as a Tool Dependency Graph (TDG), then strictly follows that plan while allowing limited, safe read-only expansions and simulated ('fake') tool responses to avoid adversarial changes. On the AgentDojo benchmark, this reduces attacker success to about 0.7% while keeping user utility near the undefended level, at the cost of ~2x token usage.

Problem Statement

LLM agents calling external tools can be hijacked by hidden instructions inside tool outputs (Indirect Prompt Injection, IPI). Existing defenses tweak prompts or add detectors but still let agents call any tool at execution time, so attackers can trigger harmful tool calls. The paper asks: can we stop IPI at the source by pre-planning and forbidding unapproved tool invocations?

Main Contribution

Introduce IPIGUARD: enforce a planned Tool Dependency Graph (TDG) so execution follows a pre-approved ordered plan rather than freely invoking tools at runtime.

Design Argument Estimation, Node Expansion, and Fake Tool Invocation to handle unknown arguments, permit safe read-only extra queries, and neutralize injected instructions that reuse legitimate tools.

Key Findings

IPIGUARD reduces average targeted attack success rate (ASR)

NumbersASR ≈ 0.69% average on AgentDojo (Table 1)

Practical UseUse TDG enforcement to cut IPI attack success to <1% on evaluated benchmarks; good choice when preventing concrete harmful tool use matters.

Evidence RefTable 1

Benign utility is preserved near the undefended baseline

NumbersBenign Utility (BU) 67.01% vs no-defense 68.04% (Figure 5 / text)

Practical UseYou can add IPIGUARD without large drops in normal task success; expect only small utility loss in most domains.

Evidence RefFigure 5

Results

MetricValueBaselineDeltaSplit / DatasetEvidenceEvidence Ref
Average Attack Success Rate (ASR)0.69%No defense 13.16% (overall in Table 1)≈ -12.47 ppAgentDojo, overallIPIGUARD average ASR 0.69% vs no defense 13.16% across AgentDojoTable 1
Benign Utility (BU)67.01%No defense 68.04%-1.03 ppTasks without attack (AgentDojo)IPIGUARD maintains BU ~67% vs 68% no-defenseFigure 5 / text

What To Try In 7 Days

Run a smoke test: build a simple TDG for a 3-step workflow and enforce executor to follow it.

Enable Node Expansion for read-only queries only, to preserve utility while blocking writes.

Simulate Fake Tool Invocation for one overlapping-tool scenario to verify it prevents argument hijacks.

Agent Features

Memory
execution context stores prior tool responses for argument estimation
Planning
TDG construction (plan all tool calls up front)separate planning LLM and execution LLM allowed
Tool Use
enforce pre-approved tool callsQuery vs Command tool classification
Frameworks
Command-Query Responsibility Segregation (inspired classification)
Is Agentic

Yes

Architectures
tool-augmented LLM agent (planner + executor)
Collaboration
planner and executor can be different LLMs

Optimization Features

Token Efficiency
planning accounts for ~20% of token usage; allow stronger planners without large token cost
Inference Optimization
use weaker executor + stronger planner to reduce cost (evaluated)

Reproducibility

Code AvailableYes
Data AvailableYes
Open Source StatusPartial
LicenseUnknown

Code URLs

code mentioned as available in paper; exact URL not provided in text

Risks & Boundaries

Limitations

Focuses on tool-invocation attacks (IPI) and not on purely textual manipulations that don't trigger tools.

Experiment scale limited by LLM query costs; fewer models and cases than fully exhaustive studies.

When Not To Use

When attacker only aims to subtly change text outputs without causing tool actions (different threat model).

In highly resource-constrained deployments where 2x token usage and extra latency are unacceptable.

Failure Modes

Fake Tool Invocation might fail in corner cases, allowing argument hijacks.

Errors in TDG construction (missing necessary nodes) can block legitimate functionality.

Core Entities

Models

GPT-4oGPT-4o-miniClaude 3.5 SonnetQwen2.5-7B-InstructQwen3-32Bo4-mini

Metrics

Benign Utility (BU)Utility under Attack (UA)Attack Success Rate (ASR)Input TokensOutput TokensTask Time (s)

Datasets

AgentDojo

Benchmarks

AgentDojo

Context Entities

Models

other closed-source and open-source LLMs referenced for comparison

Metrics

BU, UA, ASR, token/time overhead

Datasets

AgentDojo (public link in paper)

Benchmarks

AgentDojo