Overview
The paper offers a comprehensive taxonomy and a clear cross-framework comparison that practitioners can act on, but many recommendations are governance-level and require engineering follow‑through to operationalize.
Citations: 0
Evidence Strength: 0.60
Confidence: 0.85
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 0/6
Reproducibility
Status: No open assets linked
Open source: No
At A Glance
Cost impact: 70%
Production readiness: 60%
Novelty: 60%
Why It Matters For Business
Multi-agent systems amplify security and cost risks (data leaks, tool abuse, resource exhaustion), and current frameworks leave blind spots. Companies must combine frameworks and add technical controls to avoid regulatory, financial, and operational losses.
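The advice to combine frameworks can be made concrete by treating each framework as the set of threats it addresses and measuring the union. The sketch below is a minimal illustration with hypothetical framework names and a ten-threat toy set, not the paper's data. `greedy_pick` is a standard greedy set-cover heuristic, swapped in here as one plausible way to choose a small combination; it is not a method from the paper.

```python
from typing import Dict, List, Set

# Hypothetical toy data, not the paper's scores: ten threats, three frameworks.
THREATS: Set[str] = {f"T{i}" for i in range(1, 11)}
COVERAGE: Dict[str, Set[str]] = {
    "framework_a": {"T1", "T2", "T3", "T4", "T5", "T6"},
    "framework_b": {"T5", "T6", "T7", "T8"},
    "framework_c": {"T9"},
}


def combined_coverage(coverage: Dict[str, Set[str]], chosen: List[str],
                      threats: Set[str]) -> float:
    """Fraction of threats addressed by at least one chosen framework."""
    covered: Set[str] = set()
    for name in chosen:
        covered |= coverage[name] & threats
    return len(covered) / len(threats)


def greedy_pick(coverage: Dict[str, Set[str]], threats: Set[str], k: int) -> List[str]:
    """Greedy set cover: repeatedly add the framework with the most new coverage."""
    picked: List[str] = []
    covered: Set[str] = set()
    for _ in range(k):
        candidates = [name for name in coverage if name not in picked]
        if not candidates:
            break
        best = max(candidates, key=lambda name: len((coverage[name] & threats) - covered))
        gain = (coverage[best] & threats) - covered
        if not gain:
            break  # no remaining framework adds coverage; stop early
        picked.append(best)
        covered |= gain
    return picked


chosen = greedy_pick(COVERAGE, THREATS, k=3)
print(chosen, combined_coverage(COVERAGE, chosen, THREATS))
```

With this toy data, `framework_a` alone covers 60% of the threats, adding `framework_b` raises the union to 80%, and even all three leave `T10` uncovered. That residual is the kind of blind spot that has to be closed with technical controls rather than another framework.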
Summary (TL;DR)
This paper builds a technical knowledge base of production multi-agent AI systems, uses generative-AI-assisted threat modeling plus expert review to derive a taxonomy of 193 distinct agentic threats across nine categories, and scores 16 security and governance frameworks against every threat. No single framework covers the full threat set: OWASP ASI leads at 65.3% coverage, CDAO GenAI is strong on development and operations threats, and non-determinism and data-leakage channels are the worst-covered domains. The work serves as a practical guide for selecting frameworks and prioritizing defenses in real multi-agent deployments.
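A quick consistency check on the headline number, assuming coverage is the unweighted fraction of the 193 threats that a framework addresses (this summary does not say whether partial coverage earns partial credit):

$$
\text{coverage}(F) = \frac{\left|\{\, t \in T : F \text{ addresses } t \,\}\right|}{|T|}, \qquad |T| = 193
$$

$$
0.653 \times 193 \approx 126.0, \qquad \frac{126}{193} \approx 0.6528 \approx 65.3\%, \qquad 193 - 126 = 67
$$

Under that assumption OWASP ASI would address about 126 threats and leave roughly 67 uncovered. The neighboring counts give 125/193 ≈ 64.8% and 127/193 ≈ 65.8%, so 126 is the only whole count that rounds to 65.3%.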
Problem Statement
Multi-agent AI systems (agents that share memory, delegate tools, and coordinate) create new behavioral attack surfaces that existing AI and infrastructure frameworks cover poorly. Practitioners lack a systematic threat taxonomy and cross-framework coverage data to guide secure architecture and tool choices.
Main Contribution
A 193-item taxonomy of security threats unique to production multi-agent AI systems across nine categories.
A reproducible four-phase method: building a deep system knowledge base, generative-AI-assisted threat modeling, per-threat survey planning, and cross-framework scoring.
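The cross-framework scoring phase can be pictured as a threat-by-framework matrix. The sketch below is a minimal illustration, assuming binary (covered or not) scores, with four hypothetical threats and two hypothetical frameworks. The category labels `data_leakage` and `non_determinism` echo domains named in this summary; `tool_misuse` and all threat IDs are placeholders, and the toy values are chosen for illustration only.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical toy data, not the paper's taxonomy: threat ID -> category.
THREAT_CATEGORY: Dict[str, str] = {
    "T1": "data_leakage",
    "T2": "data_leakage",
    "T3": "non_determinism",
    "T4": "tool_misuse",  # placeholder category name
}

# Hypothetical binary scoring matrix: framework -> threat IDs it addresses.
FRAMEWORK_COVERS: Dict[str, Set[str]] = {
    "framework_x": {"T1", "T4"},
    "framework_y": {"T2", "T4"},
}


def framework_coverage(covers: Dict[str, Set[str]],
                       threat_category: Dict[str, str]) -> Dict[str, float]:
    """Per-framework share of all threats it addresses."""
    all_threats = set(threat_category)
    return {name: len(ids & all_threats) / len(all_threats)
            for name, ids in covers.items()}


def category_coverage(covers: Dict[str, Set[str]],
                      threat_category: Dict[str, str]) -> Dict[str, float]:
    """Per-category share of threats addressed by at least one framework."""
    covered_anywhere = set().union(*covers.values())
    total: Dict[str, int] = defaultdict(int)
    hit: Dict[str, int] = defaultdict(int)
    for threat, category in threat_category.items():
        total[category] += 1
        if threat in covered_anywhere:
            hit[category] += 1
    return {cat: hit[cat] / total[cat] for cat in total}


per_framework = framework_coverage(FRAMEWORK_COVERS, THREAT_CATEGORY)
per_category = category_coverage(FRAMEWORK_COVERS, THREAT_CATEGORY)
worst = min(per_category, key=per_category.get)
print(per_framework)  # {'framework_x': 0.5, 'framework_y': 0.5}
print(worst)          # non_determinism
```

Reading the matrix by column gives a per-framework ranking; reading it by category reveals domains that no surveyed framework reaches, which is how a worst-covered domain surfaces even when every individual framework scores similarly.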
Key Findings
Multi-agent threat taxonomy contains 193 distinct, agent-specific threats.
Survey evaluated 16 security frameworks against every threat item.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Threat taxonomy size | 193 items | — | — | — | Composed from generative-AI-assisted modeling and expert validation | Sec. I-III |
| Frameworks surveyed | 16 frameworks | — | — | — | Cross-framework scoring across production and governance frameworks | Abstract, Sec. V-VI |
What To Try In 7 Days
Inventory agent surfaces: list agents, shared memories, tool registries, and vector stores.
Build a quick gap matrix: map your existing controls against the paper's nine categories and flag gaps in non-determinism and data leakage first.
Add short-term mitigations: per-agent cryptographic identity, signed tool manifests, and per-call least-privilege enforcement.
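The gap-matrix step can be sketched in a few lines. This is a minimal illustration, not tooling from the paper: only `non_determinism` and `data_leakage` correspond to domains named in this summary, the other seven category labels are placeholders for the paper's remaining categories, and the control names in the hypothetical `MY_CONTROLS` inventory are invented for the example.

```python
from typing import Dict, List, Set

# Nine categories: two named in this summary plus seven placeholders.
CATEGORIES: List[str] = ["non_determinism", "data_leakage"] + [
    f"category_{i}" for i in range(3, 10)
]
# Domains this summary reports as worst covered; flag these first.
PRIORITY: Set[str] = {"non_determinism", "data_leakage"}

# Hypothetical inventory: your control -> categories it mitigates.
MY_CONTROLS: Dict[str, Set[str]] = {
    "output_dlp_filter": {"data_leakage"},
    "tool_allowlist": {"category_3"},
}


def gap_matrix(controls: Dict[str, Set[str]],
               categories: List[str]) -> Dict[str, List[str]]:
    """Map each category to the sorted list of controls that claim to mitigate it."""
    return {cat: sorted(name for name, cats in controls.items() if cat in cats)
            for cat in categories}


def flag_gaps(matrix: Dict[str, List[str]], priority: Set[str]) -> List[str]:
    """Categories with no control, priority domains first (stable order otherwise)."""
    gaps = [cat for cat, ctrls in matrix.items() if not ctrls]
    return sorted(gaps, key=lambda cat: cat not in priority)


matrix = gap_matrix(MY_CONTROLS, CATEGORIES)
for cat in flag_gaps(matrix, PRIORITY):
    marker = "PRIORITY" if cat in PRIORITY else ""
    print(f"GAP {cat} {marker}".rstrip())
```

A category with an empty control list is a gap; keeping the matrix as plain data makes it easy to re-run after each mitigation is added.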
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic
Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
Infra Optimization
Model Optimization
System Optimization
Training Optimization
Inference Optimization
Risks & Boundaries
Limitations
Rapidly evolving field: coverage reflects the state of frameworks at publication and needs frequent updates.
Framework scoring mixes governance and technical controls; operational applicability varies by organization.
When Not To Use
Simple single-agent chatbots without tool access or persistent memory: the taxonomy is overkill.
Low-latency, single-model microservices where traditional infrastructure controls suffice.
Failure Modes
Applying a single framework and assuming full coverage leaves blind spots (non-determinism, planning).
Relying only on governance checklists without runtime controls causes detection gaps during streaming and stochastic execution.
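A runtime control of the kind the last failure mode calls for can be as simple as a gate that every tool call passes through. The sketch below combines two short-term mitigations from the "What To Try In 7 Days" list, signed tool manifests and per-call least-privilege checks, under simplifying assumptions: it uses shared-secret HMAC-SHA256 from the Python standard library as a stand-in for asymmetric signatures, it assumes the caller's `agent_id` has already been authenticated (per-agent cryptographic identity is not implemented here), and the manifest fields `name` and `scopes` and the class `ToolGate` are hypothetical.

```python
import copy
import hashlib
import hmac
import json
from typing import Dict, Set

# Demo secret only; a real deployment would use managed keys and, ideally,
# asymmetric signatures so agents cannot forge manifests.
KEY = b"demo-key-not-for-production"


def sign_manifest(manifest: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the manifest."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison against a freshly computed signature."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)


class ToolGate:
    """Deny by default: each call needs a verified tool, a grant, and a declared scope."""

    def __init__(self, key: bytes, grants: Dict[str, Set[str]]):
        self._key = key
        self._grants = grants  # agent_id -> tool names that agent may call
        self._tools: Dict[str, dict] = {}

    def register(self, manifest: dict, signature: str) -> None:
        if not verify_manifest(manifest, signature, self._key):
            raise PermissionError(f"bad signature for tool {manifest.get('name')!r}")
        # Store a copy so later mutation of the caller's dict cannot widen scopes.
        self._tools[manifest["name"]] = copy.deepcopy(manifest)

    def authorize(self, agent_id: str, tool: str, scope: str) -> None:
        """Raise PermissionError unless this agent may use this tool with this scope."""
        if tool not in self._tools:
            raise PermissionError(f"unregistered tool {tool!r}")
        if tool not in self._grants.get(agent_id, set()):
            raise PermissionError(f"agent {agent_id!r} not granted {tool!r}")
        if scope not in self._tools[tool]["scopes"]:
            raise PermissionError(f"scope {scope!r} not declared by {tool!r}")


MANIFEST = {"name": "search_docs", "scopes": ["read"]}
SIG = sign_manifest(MANIFEST, KEY)
gate = ToolGate(KEY, grants={"planner": {"search_docs"}})
gate.register(MANIFEST, SIG)
gate.authorize("planner", "search_docs", "read")  # allowed: no exception raised
```

Because `authorize` raises instead of returning a flag, a caller that forgets to check the result cannot silently proceed; denying by default is the least-privilege choice, and the check runs on every call rather than once per session.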

