Build a tool-plan (TDG) and block unexpected tool calls to stop indirect prompt injections

August 21, 2025 · 8 min

Overview

Production Readiness

0.6

Novelty Score

0.65

Cost Impact Score

0.4

Citation Count

0

Authors

Hengyu An, Jinghuai Zhang, Tianyu Du, Chunyi Zhou, Qingming Li, Tao Lin, Shouling Ji


Why It Matters For Business

If your agent can call external services, hidden instructions in the content those services return can trigger harmful actions. IPIGUARD stops many such attacks by pre-planning the allowed tool calls and blocking unapproved ones, trading extra cost (roughly 2x tokens and latency) for much stronger protection.

Summary TLDR

IPIGUARD shifts defense from model prompting to controlling execution. It first asks the agent to plan all tool calls as a Tool Dependency Graph (TDG), then strictly follows that plan, allowing only limited, safe read-only expansions and using simulated ('fake') tool responses so that injected instructions cannot alter execution. On the AgentDojo benchmark, this reduces the average attack success rate to about 0.7% while keeping benign utility near the undefended level, at the cost of roughly 2.4x the input tokens and nearly twice the task time.
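To make the TDG idea concrete, here is a minimal Python sketch, assuming a plan is a list of tool-call nodes with explicit dependency edges. The names `ToolNode` and `execution_order` and the example tools are hypothetical illustrations, not the paper's code; the point is only that a fixed, acyclic plan yields an execution order before any external data is read.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class ToolNode:
    """One planned tool call in a hypothetical Tool Dependency Graph (TDG)."""
    node_id: str
    tool: str
    depends_on: tuple[str, ...] = ()


def execution_order(nodes: list[ToolNode]) -> list[str]:
    """Return node ids in an order that respects every dependency edge."""
    # TopologicalSorter expects {node: predecessors}; static_order() raises
    # graphlib.CycleError if the plan contains a cycle.
    graph = {n.node_id: set(n.depends_on) for n in nodes}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical 3-node plan: find an email, read an attachment, then reply.
plan = [
    ToolNode("n1", "search_emails"),
    ToolNode("n2", "read_file", depends_on=("n1",)),
    ToolNode("n3", "send_email", depends_on=("n1", "n2")),
]
print(execution_order(plan))  # ['n1', 'n2', 'n3']
```

Because this plan is a chain, the order shown is the only valid one; plans with independent branches can admit several valid orders.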

Problem Statement

LLM agents calling external tools can be hijacked by hidden instructions inside tool outputs (Indirect Prompt Injection, IPI). Existing defenses tweak prompts or add detectors but still let agents call any tool at execution time, so attackers can trigger harmful tool calls. The paper asks: can we stop IPI at the source by pre-planning and forbidding unapproved tool invocations?

Main Contribution

Introduce IPIGUARD: enforce a planned Tool Dependency Graph (TDG) so execution follows a pre-approved ordered plan rather than freely invoking tools at runtime.

Design three supporting mechanisms: Argument Estimation fills in arguments that are unknown at planning time, Node Expansion permits safe read-only extra queries, and Fake Tool Invocation neutralizes injected instructions that reuse legitimate tools.
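The runtime decision these mechanisms imply can be sketched as a single gate. This is a simplified reading, not the paper's implementation: it assumes planned calls are compared by tool name plus fully resolved arguments, and that tools are pre-classified as read-only Query or side-effecting Command. `Decision`, `QUERY_TOOLS`, `call_key`, `gate`, and `invoke` are hypothetical names.

```python
from enum import Enum
from typing import Any, Callable


class Decision(Enum):
    EXECUTE = "execute"  # exact match with a call in the pre-approved plan
    EXPAND = "expand"    # unplanned but read-only (Query): allowed as Node Expansion
    FAKE = "fake"        # unplanned Command: answered with a simulated response


# Hypothetical classification: Query tools only read state, Command tools change it.
QUERY_TOOLS = {"search_emails", "read_file", "get_balance"}


def call_key(tool: str, args: dict[str, Any]) -> tuple:
    """Identify a call by tool name plus its (hashable) arguments."""
    return (tool, tuple(sorted(args.items())))


def gate(tool: str, args: dict[str, Any], planned: set[tuple]) -> Decision:
    if call_key(tool, args) in planned:
        return Decision.EXECUTE
    if tool in QUERY_TOOLS:
        return Decision.EXPAND
    # Covers injected calls to a new Command tool and to a planned Command
    # tool with different (hijacked) arguments.
    return Decision.FAKE


def invoke(tool: str, args: dict[str, Any], planned: set[tuple],
           tools: dict[str, Callable[..., str]]) -> str:
    if gate(tool, args, planned) is Decision.FAKE:
        # The real tool never runs; the agent only sees a placeholder result.
        return f"[simulated] {tool} completed"
    return tools[tool](**args)


planned = {call_key("send_money", {"to": "alice", "amount": 50})}
print(gate("send_money", {"to": "attacker", "amount": 50}, planned))  # Decision.FAKE
```

Matching on arguments, not just the tool name, is what lets this sketch catch the overlapping-tool case where an injection reuses a legitimately planned Command with a different recipient.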

Empirically test across six LLMs and four attack types on the AgentDojo benchmark, showing a strong security-utility trade-off compared to prior defenses.

Key Findings

IPIGUARD reduces the average targeted attack success rate (ASR) to below 1%

Numbers: ASR ≈ 0.69% average on AgentDojo (Table 1)

Benign utility is preserved near the undefended baseline

Numbers: Benign Utility (BU) 67.01% vs 68.04% with no defense (Figure 5 / text)

Security gains come with measurable runtime/token overhead

Numbers: Input tokens 14,605 vs 6,165 (no defense); task time 13.88 s vs 7.13 s (Table 2)

Combining Fake Tool Invocation (FTI) and Node Expansion (NE) helps both security and utility

Numbers: Ablation with FTI+NE: BU 69.07%, UA 57.07%, ASR 0.64% (Table 3)

Results

Average Attack Success Rate (ASR)

Value: 0.69%

Baseline: No defense 13.16% (overall, Table 1)

Benign Utility (BU)

Value: 67.01%

Baseline: No defense 68.04%

Utility under Attack (UA)

Value: 58.77% (average reported)

Baseline: No defense 61.37% (one baseline row; varies by configuration)

Input tokens per task

Value: 14,605

Baseline: No defense 6,165

Task completion time

Value: 13.88 s

Baseline: No defense 7.13 s
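A quick calculation with the figures exactly as reported above puts the overhead and the security gain in relative terms. The variable names are illustrative; the numbers are copied from the Results, and comparing the 0.69% average against the 13.16% overall baseline assumes both are reported on a comparable basis.

```python
# Figures copied from the Results above (Tables 1 and 2 of the paper).
input_tokens = {"ipiguard": 14_605, "no_defense": 6_165}
task_time_s = {"ipiguard": 13.88, "no_defense": 7.13}
asr_pct = {"ipiguard": 0.69, "no_defense": 13.16}

token_ratio = input_tokens["ipiguard"] / input_tokens["no_defense"]
time_ratio = task_time_s["ipiguard"] / task_time_s["no_defense"]
# Relative reduction in attack success rate versus no defense.
asr_reduction = 1 - asr_pct["ipiguard"] / asr_pct["no_defense"]

print(f"input tokens: {token_ratio:.2f}x, task time: {time_ratio:.2f}x")
# input tokens: 2.37x, task time: 1.95x
print(f"ASR reduction: {asr_reduction:.1%}")
# ASR reduction: 94.8%
```

So the "~2x" shorthand slightly understates the input-token overhead (about 2.4x), while task time roughly doubles.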


What To Try In 7 Days

Run a smoke test: build a simple TDG for a 3-step workflow and force the executor to follow it.

Enable Node Expansion for read-only queries only, to preserve utility while blocking writes.

Test Fake Tool Invocation on one overlapping-tool scenario to verify that it prevents argument hijacks.
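The first smoke test above might look like the following sketch. It assumes the LLM's proposed calls are mocked as a plain list and that the plan is already in dependency order; `StrictExecutor` and the tool names are hypothetical. It is deliberately stricter than IPIGUARD (no Node Expansion or Fake Tool Invocation), so any off-plan call is simply blocked.

```python
from dataclasses import dataclass, field


@dataclass
class StrictExecutor:
    plan: list[str]  # planned tool names, already in dependency order
    executed: list[str] = field(default_factory=list)
    blocked: list[str] = field(default_factory=list)

    def propose(self, tool: str) -> bool:
        """Accept a proposed call only if it is the next planned step."""
        next_step = len(self.executed)
        if next_step < len(self.plan) and tool == self.plan[next_step]:
            self.executed.append(tool)
            return True
        self.blocked.append(tool)
        return False


# Mocked run: the second tool's output carried an injected instruction,
# so the (mocked) LLM proposes an extra "send_money" call.
executor = StrictExecutor(plan=["search_emails", "read_email", "reply_email"])
for tool in ["search_emails", "read_email", "send_money", "reply_email"]:
    executor.propose(tool)

print(executor.executed)  # ['search_emails', 'read_email', 'reply_email']
print(executor.blocked)   # ['send_money']
```

Once this passes, the second and third checks can relax the rule for read-only tools and replace blocking with simulated responses for Command tools.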

Agent Features

Memory

  • execution context stores prior tool responses for argument estimation
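Argument Estimation can be pictured as filling planned arguments from the execution context once earlier steps have run. The paper relies on the LLM to estimate these values; this sketch swaps in a deterministic placeholder lookup as a simplified stand-in. The `$<node_id>.<field>` syntax and `resolve_args` are hypothetical.

```python
import re
from typing import Any

# Hypothetical placeholder syntax: "$<node_id>.<field>" refers to a field of
# an earlier tool response stored in the execution context.
PLACEHOLDER = re.compile(r"^\$(\w+)\.(\w+)$")


def resolve_args(args: dict[str, Any], context: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Fill planned arguments whose values were unknown at planning time."""
    resolved = {}
    for name, value in args.items():
        match = PLACEHOLDER.match(value) if isinstance(value, str) else None
        if match:
            node_id, field_name = match.groups()
            # Raises KeyError if the referenced step has not produced a response yet.
            resolved[name] = context[node_id][field_name]
        else:
            resolved[name] = value
    return resolved


context = {"n1": {"sender": "alice@example.com", "subject": "Invoice"}}
planned_args = {"to": "$n1.sender", "body": "Received, thanks."}
print(resolve_args(planned_args, context))
# {'to': 'alice@example.com', 'body': 'Received, thanks.'}
```

Tool outputs can only supply argument values here; they never choose which tool runs, which is the separation the TDG is meant to enforce.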

Planning

  • TDG construction (plan all tool calls up front)
  • planning and execution may use separate LLMs

Tool Use

  • enforce pre-approved tool calls
  • Query vs Command tool classification

Frameworks

  • Command-Query Responsibility Segregation (inspires the Query vs Command tool classification)
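A CQRS-style split can be recorded at tool-registration time. This is an illustrative sketch, not the paper's code: the `tool` decorator, `TOOL_KIND` registry, and calendar tools are hypothetical. It tags each tool as a read-only Query or a state-changing Command and treats unregistered tools as non-expandable by default.

```python
from typing import Callable, Literal

Kind = Literal["query", "command"]
TOOL_KIND: dict[str, Kind] = {}


def tool(kind: Kind) -> Callable[[Callable], Callable]:
    """Register a tool as a read-only Query or a state-changing Command."""
    def register(fn: Callable) -> Callable:
        TOOL_KIND[fn.__name__] = kind
        return fn
    return register


@tool("query")
def read_calendar(day: str) -> str:
    return f"events on {day}"


@tool("command")
def create_event(day: str, title: str) -> str:
    return f"created '{title}' on {day}"


def expandable(name: str) -> bool:
    """Only registered Query tools may be added at runtime via Node Expansion."""
    return TOOL_KIND.get(name) == "query"


print(expandable("read_calendar"), expandable("create_event"))  # True False
```

Defaulting unknown tools to "not expandable" keeps a misclassification from silently opening a write path.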

Is Agentic

true

Architectures

  • tool-augmented LLM agent (planner + executor)

Collaboration

  • planner and executor can be different LLMs

Optimization Features

Token Efficiency

  • planning accounts for ~20% of token usage, so a stronger planner can be used without a large increase in token cost

Inference Optimization

  • use weaker executor + stronger planner to reduce cost (evaluated)
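The planner/executor split can be checked with a blended-price calculation. The per-token prices below are hypothetical (a strong model at 5 units, a weak one at 1), and the sketch assumes the ~20% planning share stays the same when the two roles use different models.

```python
def relative_cost(planner_price: float, executor_price: float,
                  planning_share: float = 0.2) -> float:
    """Blended per-token price, assuming planning is `planning_share` of all tokens."""
    return planning_share * planner_price + (1 - planning_share) * executor_price


# Hypothetical per-token prices in arbitrary units.
STRONG, WEAK = 5.0, 1.0

weak_everywhere = relative_cost(WEAK, WEAK)      # baseline: 1.0
strong_planner = relative_cost(STRONG, WEAK)     # upgrade only the planner
strong_everywhere = relative_cost(STRONG, STRONG)
print(weak_everywhere, strong_planner, strong_everywhere)
```

Under these assumptions, upgrading only the planner raises the blended price to about 1.8x, versus 5x for upgrading both roles, which is why a strong planner paired with a weaker executor is attractive.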

Reproducibility

Code Urls

  • code mentioned as available in paper; exact URL not provided in text

Code Available

Data Available

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Focuses on tool-invocation attacks (IPI) and not on purely textual manipulations that don't trigger tools.
  • Experiment scale is limited by LLM query costs, so fewer models and test cases are covered than an exhaustive study would include.
  • Requires access to reasonably strong planners; weaker models may reduce TDG quality and utility.

When Not To Use

  • When the attacker only aims to subtly change text outputs without causing tool actions (a different threat model).
  • In highly resource-constrained deployments where 2x token usage and extra latency are unacceptable.
  • When no model in your stack can produce reliable plans (TDG quality is critical).

Failure Modes

  • Fake Tool Invocation might fail in corner cases, allowing argument hijacks.
  • Errors in TDG construction (missing necessary nodes) can block legitimate functionality.
  • Over-conservative blocking of command tools may reduce utility for workflows that require dynamic write operations.
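Some TDG construction errors can be caught before execution with a structural check. This sketch (the `validate_plan` name and the dict-of-dependencies format are assumptions) flags dependencies on nodes that were never planned and cycles. It cannot detect a plan that simply omits a step the task needs, which remains the harder failure mode.

```python
from graphlib import CycleError, TopologicalSorter


def validate_plan(deps: dict[str, set[str]]) -> list[str]:
    """Return problems found in a planned TDG (an empty list means none found).

    `deps` maps each node id to the ids of the nodes it depends on.
    """
    problems = []
    for node, parents in deps.items():
        for parent in sorted(parents - set(deps)):
            problems.append(f"{node} depends on missing node {parent}")
    try:
        TopologicalSorter(deps).prepare()  # raises CycleError on a cycle
    except CycleError as err:
        # err.args[1] lists the nodes that form the detected cycle.
        problems.append(f"cycle: {err.args[1]}")
    return problems


print(validate_plan({"n1": set(), "n3": {"n2"}}))
# ['n3 depends on missing node n2']
```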

Core Entities

Models

  • GPT-4o
  • GPT-4o-mini
  • Claude 3.5 Sonnet
  • Qwen2.5-7B-Instruct
  • Qwen3-32B
  • o4-mini

Metrics

  • Benign Utility (BU)
  • Utility under Attack (UA)
  • Attack Success Rate (ASR)
  • Input Tokens
  • Output Tokens
  • Task Time (s)
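The three headline metrics can be computed from per-run records. This sketch follows the AgentDojo-style definitions as we understand them (BU over attack-free runs, UA and ASR over attacked runs); the `Run` record and `metrics` function are hypothetical, and token and time metrics are left out.

```python
from dataclasses import dataclass


@dataclass
class Run:
    attacked: bool                   # was an injection present in tool outputs?
    user_task_done: bool             # did the agent complete the user's task?
    attacker_goal_met: bool = False  # only meaningful for attacked runs


def metrics(runs: list[Run]) -> dict[str, float]:
    """Compute BU, UA, and ASR; assumes at least one benign and one attacked run."""
    benign = [r for r in runs if not r.attacked]
    attacked = [r for r in runs if r.attacked]
    return {
        "BU": sum(r.user_task_done for r in benign) / len(benign),
        "UA": sum(r.user_task_done for r in attacked) / len(attacked),
        "ASR": sum(r.attacker_goal_met for r in attacked) / len(attacked),
    }


runs = [
    Run(False, True), Run(False, False),                     # benign runs
    Run(True, True), Run(True, False, attacker_goal_met=True),
    Run(True, True), Run(True, False),                       # attacked runs
]
print(metrics(runs))  # {'BU': 0.5, 'UA': 0.5, 'ASR': 0.25}
```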

Datasets

  • AgentDojo

Benchmarks

  • AgentDojo

Context Entities

Models

  • other closed-source and open-source LLMs referenced for comparison

Metrics

  • BU, UA, ASR, token/time overhead

Datasets

  • AgentDojo (public link in paper)

Benchmarks

  • AgentDojo