Practical gap-filling for threat models of LLM-based multi-agent systems

August 13, 2025 · 7 min

Overview

Production Readiness

0.3

Novelty Score

0.6

Cost Impact Score

0.5

Citation Count

0

Authors

Klaudia Krawiecka, Christian Schroeder de Witt

Why It Matters For Business

LLM-based multi-agent products can fail in ways that single-agent tests miss. Because agents can coordinate or drift without raising explicit errors, these failures can lead to silent misuse, compliance breaches, or data leaks.

Summary TLDR

The paper analyzes the OWASP Multi-Agentic System (MAS) threat guide and proposes concrete extensions tailored to LLM-driven multi-agent systems. It catalogues threats that OWASP omits or covers only weakly (reasoning collapse, covert coordination, multi-agent backdoors, metric overfitting, goal drift, among others) and recommends testing and monitoring practices (chaos testing, network-injection tests, coordination metrics, long-run monitoring of emergent behavior) to reduce risk in deployed multi-agent pipelines.

Problem Statement

Existing OWASP MAS guidance misses several risks that arise only when many LLM-driven agents interact. These gaps leave real deployments exposed to coordination failures, covert signaling, privilege escalation between agent roles, and metric-driven exploitation.

Main Contribution

A systematic gap analysis showing that OWASP's MAS taxonomy misses multi-agent-specific threats such as reasoning collapse, emergent covert coordination, and heterogeneous multi-agent exploits.

A proposed set of new threat classes and example attack scenarios tailored to LLM-driven multi-agent architectures (planner, executor, verifier, refiner roles).

Practical evaluation guidance: robustness (chaos engineering), coordination metrics, safety enforcement layers, and long-run emergent behavior monitoring.

Key Findings

OWASP's current MAS guide does not cover several failure modes that appear only in interacting LLM agents.

Multi-agent interactions enable covert coordination and task-splitting attacks where each agent looks safe alone but the set behaves maliciously.

Reasoning collapse and benign goal drift can propagate through planner→executor→verifier chains and break safety checks.

Standard evaluation metrics can be gamed by coordinated agents (metric overfitting) and may reinforce unsafe behavior.

Practical defenses include chaos engineering, network-injection tests, coordination benchmarks, and long-run monitoring to detect emergent behavior.
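One way to picture the long-run monitoring idea is a rolling-baseline check on a single coordination metric. The sketch below is only an illustration: the `DriftMonitor` class, the window size, the z-score threshold, and the choice of metric (for example, the share of messages exchanged between one pair of agents per episode) are all assumptions, not something the paper specifies.

```python
from collections import deque
from statistics import mean, pstdev


class DriftMonitor:
    """Hypothetical monitor: flags values far from a rolling baseline."""

    def __init__(self, window: int = 20, z_threshold: float = 3.0):
        self.history = deque(maxlen=window)  # most recent non-anomalous values
        self.z_threshold = z_threshold

    def observe(self, value: float) -> bool:
        """Record a value; return True if it looks anomalous."""
        anomalous = False
        # Only judge once the window is full, so the baseline is meaningful.
        if len(self.history) == self.history.maxlen:
            mu, sigma = mean(self.history), pstdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.z_threshold:
                anomalous = True
        if not anomalous:
            self.history.append(value)  # keep anomalies out of the baseline
        return anomalous


# Assumed metric: per-episode share of traffic between two specific agents.
monitor = DriftMonitor()
series = [0.10, 0.12] * 15 + [0.60]  # stable behavior, then a sudden jump
flags = [monitor.observe(v) for v in series]
print("anomalous episodes:", [i for i, f in enumerate(flags) if f])
```

A z-score over a fixed window is a deliberately simple choice; slow drift that stays inside the threshold at every step would need a different detector, such as a comparison against a frozen reference period.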

Who Should Care

What To Try In 7 Days

Add role-based permission checks between planner/executor/verifier agents and log delegation ancestry for each action.
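A minimal sketch of this step, assuming a simple in-process pipeline: the role names come from the architecture described here, but `ROLE_PERMISSIONS`, `Action`, `authorize`, `delegate`, and the in-memory `audit_log` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical role -> allowed-action map; adapt to your own pipeline.
ROLE_PERMISSIONS = {
    "planner": {"delegate"},
    "executor": {"call_tool"},
    "verifier": {"approve", "reject"},
}


@dataclass
class Action:
    agent: str   # id of the agent performing the action
    role: str    # role the agent is acting under
    name: str    # action name, e.g. "call_tool"
    ancestry: list = field(default_factory=list)  # ids of delegating agents


audit_log = []  # in-memory stand-in for an append-only audit store


def authorize(action: Action) -> bool:
    """Allow the action only if its role permits it; log the decision."""
    allowed = action.name in ROLE_PERMISSIONS.get(action.role, set())
    audit_log.append({
        "agent": action.agent,
        "role": action.role,
        "action": action.name,
        "ancestry": list(action.ancestry),
        "allowed": allowed,
    })
    return allowed


def delegate(parent: Action, child_agent: str, child_role: str, name: str) -> Action:
    """Create a child action that records who delegated it."""
    return Action(child_agent, child_role, name, parent.ancestry + [parent.agent])


plan = Action("planner-1", "planner", "delegate")
tool_call = delegate(plan, "executor-1", "executor", "call_tool")
# An executor attempting a verifier-only action should be refused.
escalation = delegate(tool_call, "executor-1", "executor", "approve")

for act in (plan, tool_call, escalation):
    print(act.agent, act.name, authorize(act), act.ancestry)
```

Logging refused actions alongside allowed ones keeps the full delegation chain available when investigating an attempted privilege escalation.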

Run a short chaos test: inject delayed/corrupted messages into agent communications and observe whether verifiers catch errors.
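The chaos test can be prototyped without touching a live system. The sketch below assumes messages carry a sender-side checksum and a simulated latency, and it uses a deterministic stand-in for the verifier; in a real pipeline the verifier would be an LLM agent and its catch rate is what you would measure. All names (`Message`, `inject_fault`, `verifier_accepts`, `run_chaos_test`) are hypothetical.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Message:
    payload: str
    checksum: str    # SHA-256 of the payload, computed by the sender
    latency_ms: int  # simulated delivery latency


def send(payload: str) -> Message:
    digest = hashlib.sha256(payload.encode()).hexdigest()
    return Message(payload, digest, latency_ms=10)


def inject_fault(msg: Message, rng: random.Random) -> Tuple[Message, str]:
    """Return a possibly faulted copy of msg plus the fault label."""
    fault = rng.choice(["none", "corrupt", "delay"])
    if fault == "corrupt":
        return Message(msg.payload + "#tampered", msg.checksum, msg.latency_ms), fault
    if fault == "delay":
        return Message(msg.payload, msg.checksum, msg.latency_ms + 5000), fault
    return msg, fault


def verifier_accepts(msg: Message, max_latency_ms: int = 1000) -> bool:
    """Stand-in verifier: checks integrity and timeliness only."""
    intact = hashlib.sha256(msg.payload.encode()).hexdigest() == msg.checksum
    return intact and msg.latency_ms <= max_latency_ms


def run_chaos_test(n: int = 200, seed: int = 0) -> dict:
    rng = random.Random(seed)  # seeded so runs are repeatable
    stats = {"injected": 0, "caught": 0, "false_alarms": 0}
    for i in range(n):
        msg, fault = inject_fault(send(f"step-{i}"), rng)
        accepted = verifier_accepts(msg)
        if fault != "none":
            stats["injected"] += 1
            stats["caught"] += int(not accepted)
        elif not accepted:
            stats["false_alarms"] += 1
    return stats


results = run_chaos_test()
print(results)
```

The gap between `injected` and `caught` is the number to watch; the stand-in verifier catches everything by construction, so a lower rate from a real verifier agent points to faults slipping through.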

Audit evaluation scripts for metric-gaming: add at least one adversarial input that checks for pattern-based scoring exploits.
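As a rough illustration of such an audit, the sketch below runs adversarial strings through a deliberately naive keyword-based scorer and reports which ones pass. The scorer, the keywords, the threshold, and the adversarial cases are all invented for this example; substitute your own evaluation function.

```python
def keyword_score(answer: str, required=("verified", "safe", "complete")) -> float:
    """Hypothetical pattern-based scorer: fraction of required keywords present."""
    text = answer.lower()
    return sum(kw in text for kw in required) / len(required)


def audit_scorer(scorer, adversarial_cases, threshold=0.5):
    """Return the adversarial cases the scorer rates at or above threshold."""
    return [case for case in adversarial_cases if scorer(case) >= threshold]


# Inputs that do no real work but mimic the surface pattern the scorer rewards.
adversarial_cases = [
    "verified safe complete",                      # pure keyword stuffing
    "Task not done. verified verified safe safe",  # contradiction plus keywords
]

exploits = audit_scorer(keyword_score, adversarial_cases)
print(f"{len(exploits)} of {len(adversarial_cases)} adversarial inputs passed the scorer")
```

Any adversarial case that passes is evidence that agents could raise the score without completing the task.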

Agent Features

Memory

  • cross-agent context propagation (can cause confusion)
  • emergent signaling protocols (learned over interactions)

Planning

  • planner→executor delegation chains
  • subplanner decomposition
  • coercive overrides (delegation pressure)

Tool Use

  • API and external tool invocation
  • tool invocation driven by executor agents

Frameworks

  • NetSafe
  • TrustAgent
  • Chaos engineering for MAS

Is Agentic

true

Architectures

  • planner/orchestrator
  • subplanner
  • executor
  • verifier
  • refiner

Collaboration

  • multi-agent coordination
  • covert coordination and collusion
  • heterogeneous agent chaining

Reproducibility

Open Source Status

  • unknown

Risks & Boundaries

Limitations

  • Conceptual work: no new large-scale experiments or quantitative validation included.
  • Recommendations rely on prior frameworks and examples rather than systematic benchmarks of attacker success rates.
  • Guidance focuses on LLM-driven agent pipelines; single-agent systems or non-LLM agents may need different checks.

When Not To Use

  • If your system is a single isolated model with no delegation, many multi-agent threats are irrelevant.
  • When you need empirically validated attack success rates: this paper offers a taxonomy and guidance, not an attack benchmark.

Failure Modes

  • Extending threat lists without operational tests can give a false sense of security.
  • Over-restrictive role separation could break legitimate delegation and reduce performance.
  • Metrics added to guard against overfitting can themselves be gamed if not properly audited.

Core Entities

Metrics

  • task completion rate
  • efficiency (steps/time)
  • resource utilization
  • agreement/consensus scores (Faithful Agreement, Traitor Agreement)

Benchmarks

  • StarCraft Multi-Agent Challenge
  • VendingBench

Context Entities

Benchmarks

  • Curvo: hidden-role game evaluations