Overview
This survey synthesizes many recent studies and standards into practical controls, but gaps remain in standardized benchmarks, longitudinal validation, and large-scale red-team results.
Citations: 0
Evidence Strength: 0.80
Confidence: 0.85
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 2/4
Findings with evidence refs: 4/4
Results with explicit delta: 0/3
Reproducibility
Status: No open assets linked
Open source: Partial
At A Glance
Cost impact: 70%
Production readiness: 60%
Novelty: 40%
Why It Matters For Business
Agentic systems increase autonomy and regulatory exposure; TRiSM reduces legal, reputational and operational risk while enabling auditable, compliant deployment.
Who Should Care
Summary TLDR
This survey adapts an industry-style Trust, Risk, and Security Management (TRiSM) framework to LLM-based multi-agent systems (AMAS). It catalogs AMAS-specific threats (prompt injection, memory poisoning, collusion), proposes two measurable metrics (Component Synergy Score and Tool Utilization Efficacy) to assess coordination and tool use, maps concrete controls across five TRiSM pillars, and issues a practical research and compliance roadmap for regulated deployments.
Problem Statement
LLM-based multi-agent systems introduce new, system-level risks from shared memory, tool calls, and inter-agent coordination. Existing literature focuses on agent capabilities but lacks an integrated, operational TRiSM view (governance, explainability, security, privacy, lifecycle controls) tailored to AMAS.
Main Contribution
A TRiSM framework specifically mapped to LLM-based multi-agent systems (explainability, ModelOps, security, privacy, governance).
A risk taxonomy for AMAS highlighting prompt injection, memory poisoning, agent collusion, emergent misbehavior, and tool abuse.
Key Findings
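Academic interest in agentic AI has grown sharply, especially after ChatGPT's launch: arXiv counts of multi-agent papers rose from 890 in 2019 to 18,500 in 2024, while LLM-agent papers rose from roughly 0 to 9,800.
The paper proposes two concrete, measurable metrics for AMAS evaluation: Component Synergy Score (CSS) for coordination and Tool Utilization Efficacy (TUE) for tool use.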
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Component Synergy Score (CSS) example | CSS ≈ 0.32 for Planner→Coder→Tester example | — | — | illustrative software AMAS example in Section 5 | Planner speeds Coder by 35% (Impact=0.35); Tester quality conditioned on Coder = 0.92; CSS = 0.35×0.92≈0.32 | Section 5 (CSS illustrative example) |
| Bibliometric growth | multi-agent papers: 890→18,500 (2019→2024); LLM-agent papers ~0→9,800 | — | — | arXiv publication counts | Figure 1 and Introduction | Figure 1; Introduction |
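The CSS row above reduces to a single multiplication, which the following minimal sketch reproduces. It covers only the illustrative Planner→Coder→Tester example; the function name `component_synergy_example` and its parameter names are assumptions made here for illustration, and the survey's general CSS definition may include terms that this summary does not show.

```python
def component_synergy_example(impact: float, conditioned_quality: float) -> float:
    """Multiply upstream impact by downstream conditioned quality.

    Hypothetical helper: it mirrors only the arithmetic of the
    illustrative example (0.35 x 0.92), not a general CSS formula.
    """
    for name, value in (("impact", impact),
                        ("conditioned_quality", conditioned_quality)):
        # Both inputs are treated as fractions in [0, 1] (an assumption).
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return impact * conditioned_quality


# Planner speeds Coder by 35%; Tester quality conditioned on Coder is 0.92.
css = component_synergy_example(impact=0.35, conditioned_quality=0.92)
print(f"CSS = {css:.3f}")  # prints "CSS = 0.322"
```

Rounded to two decimals this gives the 0.32 reported in the table.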
What To Try In 7 Days
Add prompt hygiene and input sanitization to agent entry points.
Log tool calls and basic reasoning traces (timestamps, agent role, tool args).
Enforce least-privilege tool access and require human sign-off for sensitive actions.
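The logging, least-privilege, and sign-off items above could be combined in a single gateway that every tool call passes through. The sketch below is one possible shape, not the survey's design: `ToolGateway`, its fields, the role and tool names, and the `approve` callback are all hypothetical, and a real deployment would route approval to a human reviewer and write the log to durable storage. Input sanitization is not covered here.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


@dataclass
class ToolGateway:
    """Hypothetical wrapper that mediates every agent tool call."""
    allowed: Dict[str, Set[str]]                # agent role -> tools it may call
    sensitive: Set[str]                         # tools that need human sign-off
    approve: Callable[[str, str, dict], bool]   # stand-in for a human reviewer
    tools: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    audit_log: List[str] = field(default_factory=list)

    def _log(self, role: str, tool: str, args: dict, outcome: str) -> None:
        # Record timestamp, agent role, tool name, and arguments as JSON.
        entry = {"ts": time.time(), "agent_role": role,
                 "tool": tool, "args": args, "outcome": outcome}
        self.audit_log.append(json.dumps(entry, default=str))

    def call(self, role: str, tool: str, **args: Any) -> Any:
        # Least privilege: a role may use only tools on its allowlist.
        if tool not in self.allowed.get(role, set()):
            self._log(role, tool, args, "denied")
            raise PermissionError(f"{role} may not call {tool}")
        # Sensitive tools additionally require explicit approval.
        if tool in self.sensitive and not self.approve(role, tool, args):
            self._log(role, tool, args, "rejected_by_human")
            raise PermissionError(f"{tool} requires human sign-off")
        result = self.tools[tool](**args)
        self._log(role, tool, args, "ok")
        return result


gateway = ToolGateway(
    allowed={"coder": {"read_file"}, "deployer": {"read_file", "deploy"}},
    sensitive={"deploy"},
    approve=lambda role, tool, args: False,  # deny until a human approves
    tools={
        "read_file": lambda path: f"<contents of {path}>",
        "deploy": lambda target: f"deployed to {target}",
    },
)
print(gateway.call("coder", "read_file", path="README.md"))
```

Denied and rejected calls are logged before the exception is raised, so the audit trail also records attempts that never reached a tool.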
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic
Yes
Architectures
Collaboration
Reproducibility
Risks & Boundaries
Limitations
Survey synthesizes literature but does not present new empirical defenses or code.
Few standardized stress-tests and limited cross-study benchmark comparability.
When Not To Use
For toy prototypes where no sensitive data or external actions are involved.
When strict low-latency requirements prohibit runtime monitoring and multi-agent checks.
Failure Modes
Memory poisoning that persists across sessions and agents.
Cascading failures caused by orchestration misrouting or a compromised orchestrator.

