Practical gap-filling for threat models of LLM-based multi-agent systems

August 13, 2025 · 7 min

Overview

Decision Snapshot: Needs Validation

Practical and actionable taxonomy updates are proposed, but the paper is conceptual and lacks empirical validation. Use recommendations as design guidance and test them in your environment.

Citations: 0

Evidence Strength: 0.40

Confidence: 0.70

Risk Signals: 8

Trust Signals

Findings with numeric evidence: 0/5

Findings with evidence refs: 5/5

Results with explicit delta: 0/0

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 50%

Production readiness: 30%

Novelty: 60%

Authors

Klaudia Krawiecka, Christian Schroeder de Witt

Why It Matters For Business

LLM-based multi-agent products can fail in new ways that single-agent tests miss. These failures can cause silent misuse, compliance breaches, or data leaks because agents coordinate or drift without explicit errors.

Summary TLDR

The paper analyzes the OWASP Multi-Agentic System (MAS) threat guide and proposes concrete extensions tailored to LLM-driven multi-agent systems. It catalogues threats that OWASP omits or treats weakly (reasoning collapse, covert coordination, multi-agent backdoors, metric overfitting, goal drift, etc.) and recommends testing and monitoring practices (chaos testing, network-injection, coordination metrics, long-run emergent monitoring) to reduce risk in deployed multi-agent pipelines.

Problem Statement

Existing OWASP MAS guidance misses several risks that arise only when many LLM-driven agents interact. These gaps leave real deployments exposed to coordination failures, covert signaling, privilege escalation between agent roles, and metric-driven exploitation.

Main Contribution

Systematic gap analysis showing OWASP's MAS taxonomy misses multi-agent-specific threats such as reasoning collapse, emergent covert coordination, and heterogeneous multi-agent exploits.

A proposed set of new threat classes and example attack scenarios tailored to LLM-driven multi-agent architectures (planner, executor, verifier, refiner roles).

Key Findings

OWASP's current MAS guide does not cover several failure modes that appear only in interacting LLM agents.

Practical Use: Extend threat models to explicitly include multi-agent phenomena (reasoning collapse, covert coordination, metric overfitting) before deployment; assume single-agent checks are insufficient.

Evidence Ref: Table 1 and Table 2 (many entries marked 'Not Covered' or 'Partially Covered')

Multi-agent interactions enable covert coordination and task-splitting attacks where each agent looks safe alone but the set behaves maliciously.

Practical Use: Monitor cross-agent signaling and design network separation or strict role permissions; test attack patterns that span multiple agents.

Evidence Ref: Table 2 entries: 'Emergent Covert Coordination' and 'Heterogeneous Multi-Agent
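
As a rough illustration of strict role permissions between planner, executor, and verifier agents, the sketch below checks each action against a per-role allow-list and records the delegation ancestry of every decision. It is a minimal sketch, not the paper's method: the role names come from the summary, while `ROLE_PERMISSIONS`, `ALLOWED_DELEGATIONS`, `delegate`, `authorize`, and the action names are hypothetical and would need to be replaced by your own policy.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical policy tables (assumptions for illustration only).
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "planner": {"plan"},
    "executor": {"call_tool"},
    "verifier": {"approve", "reject"},
}
ALLOWED_DELEGATIONS: Set[Tuple[str, str]] = {
    ("planner", "executor"),
    ("executor", "verifier"),
}


def delegate(parent_role: str, ancestry: Tuple[str, ...], child_role: str) -> Tuple[str, ...]:
    """Return the child's ancestry, refusing delegation edges the policy does not list."""
    if (parent_role, child_role) not in ALLOWED_DELEGATIONS:
        raise PermissionError(f"{parent_role} may not delegate to {child_role}")
    return ancestry + (parent_role,)


def authorize(role: str, action: str, ancestry: Tuple[str, ...], audit_log: List[dict]) -> None:
    """Log every decision with its delegation ancestry, then enforce the allow-list."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit_log.append(
        {"role": role, "action": action, "ancestry": list(ancestry), "allowed": allowed}
    )
    if not allowed:
        raise PermissionError(f"{role} is not permitted to {action}")


# Usage: the planner delegates to the executor, which then attempts two actions.
audit_log: List[dict] = []
chain = delegate("planner", (), "executor")
authorize("executor", "call_tool", chain, audit_log)  # permitted
try:
    authorize("executor", "approve", chain, audit_log)  # executor acting as verifier
except PermissionError:
    pass  # the denial is still recorded in audit_log
```

Logging before enforcing means denied attempts stay visible in the audit trail, which is where cross-role escalation attempts would show up.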

What To Try In 7 Days

Add role-based permission checks between planner/executor/verifier agents and log delegation ancestry for each action.

Run a short chaos test: inject delayed/corrupted messages into agent communications and observe whether verifiers catch errors.

Audit evaluation scripts for metric-gaming: add at least one adversarial input that checks for pattern-based scoring exploits.
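
The chaos test in the list above could be prototyped along the following lines. This is a minimal sketch under stated assumptions: messages carry a SHA-256 digest of their payload, the verifier rejects digest mismatches and messages delayed beyond a small tolerance, and every name (`Message`, `inject_fault`, `verifier_accepts`, `run_chaos_test`) and every rate or threshold is hypothetical rather than taken from the paper.

```python
import hashlib
import random
from dataclasses import dataclass


def digest(payload: str) -> str:
    return hashlib.sha256(payload.encode()).hexdigest()


@dataclass
class Message:
    sender: str
    payload: str
    digest: str
    delay_steps: int = 0


def make_message(sender: str, payload: str) -> Message:
    return Message(sender, payload, digest(payload))


def inject_fault(msg: Message, rng: random.Random, corrupt_rate: float, delay_rate: float):
    """Return (message, fault_label); corruption alters the payload but keeps the old digest."""
    roll = rng.random()
    if roll < corrupt_rate:
        return Message(msg.sender, msg.payload + "#", msg.digest, 0), "corrupt"
    if roll < corrupt_rate + delay_rate:
        return Message(msg.sender, msg.payload, msg.digest, rng.randint(1, 5)), "delay"
    return msg, "none"


def verifier_accepts(msg: Message, max_delay_ok: int = 2) -> bool:
    """Assumed verifier: accept only intact messages that arrive within the delay tolerance."""
    return digest(msg.payload) == msg.digest and msg.delay_steps <= max_delay_ok


def run_chaos_test(n: int, seed: int, corrupt_rate: float = 0.2, delay_rate: float = 0.2) -> dict:
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    stats = {"injected": 0, "caught": 0, "false_alarms": 0}
    for i in range(n):
        faulty, label = inject_fault(make_message("executor", f"result-{i}"), rng,
                                     corrupt_rate, delay_rate)
        accepted = verifier_accepts(faulty)
        if label != "none":
            stats["injected"] += 1
            stats["caught"] += int(not accepted)
        elif not accepted:
            stats["false_alarms"] += 1
    return stats


stats = run_chaos_test(n=200, seed=0)
print(stats)
```

Because this verifier tolerates short delays, some injected delay faults pass unnoticed, so `caught` can be lower than `injected`. That gap is the signal the chaos test is meant to surface.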

Agent Features

Memory
cross-agent context propagation (can cause confusion); emergent signaling protocols (learned over interactions)
Planning
planner→executor delegation chains; subplanner decomposition; coercive overrides (delegation pressure)
Tool Use
API and external tool invocation; tool invocation driven by executor agents
Frameworks
NetSafe; TrustAgent; Chaos engineering for MAS
Is Agentic

Yes

Architectures
planner/orchestrator; subplanner; executor; verifier; refiner
Collaboration
multi-agent coordination; covert coordination and collusion; heterogeneous agent chaining

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Unknown
License: Unknown

Risks & Boundaries

Limitations

Conceptual work: no new large-scale experiments or quantitative validation included.

Recommendations rely on prior frameworks and examples rather than systematic benchmarks of attacker success rates.

When Not To Use

If your system is a single isolated model with no delegation, many multi-agent threats are irrelevant.

If you need empirically validated attack success rates, look elsewhere: this paper offers a taxonomy and guidance, not an attack benchmark.

Failure Modes

Extending threat lists without operational tests can give a false sense of security.

Over-restrictive role separation could break legitimate delegation and reduce performance.

Core Entities

Metrics

task completion rate; efficiency (steps/time); resource utilization; agreement/consensus scores (Faithful Agreement, Traitor Agreement)

Benchmarks

StarCraft Multi-Agent Challenge; VendingBench

Context Entities

Benchmarks

Curvo: hidden-role game evaluations