TRiSM: practical trust, risk and security controls for LLM-based multi-agent systems

June 4, 2025 · 7 min

Overview

Production Readiness

0.6

Novelty Score

0.4

Cost Impact Score

0.7

Citation Count

0

Authors

Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis

Why It Matters For Business

Agentic systems increase autonomy and regulatory exposure; TRiSM reduces legal, reputational and operational risk while enabling auditable, compliant deployment.

Summary TLDR

This survey adapts an industry-style Trust, Risk, and Security Management (TRiSM) framework to LLM-based agentic multi-agent systems (AMAS). It catalogs AMAS-specific threats (prompt injection, memory poisoning, collusion), proposes two measurable metrics (Component Synergy Score and Tool Utilization Efficacy) to assess coordination and tool use, maps concrete controls across five TRiSM pillars, and lays out a practical research and compliance roadmap for regulated deployments.

Problem Statement

LLM-based multi-agent systems introduce new, system-level risks from shared memory, tool calls, and inter-agent coordination. Existing literature focuses on agent capabilities but lacks an integrated, operational TRiSM view (governance, explainability, security, privacy, lifecycle controls) tailored to AMAS.

Main Contribution

A TRiSM framework specifically mapped to LLM-based multi-agent systems (explainability, ModelOps, security, privacy, governance).

A risk taxonomy for AMAS highlighting prompt injection, memory poisoning, agent collusion, emergent misbehavior, and tool abuse.

Two operational metrics: Component Synergy Score (CSS) for inter-agent enablement and Tool Utilization Efficacy (TUE) for correct and efficient tool calls.

A technique map and gap analysis linking XAI methods, prompt hygiene, sandboxing, secure computation, and ModelOps to AMAS needs.

A practical research and compliance roadmap emphasizing adversarial robustness, benchmarks, human oversight interfaces, and regulatory alignment.

Key Findings

Academic interest in agentic AI has grown sharply, especially after ChatGPT's launch in late 2022.

Numbers: multi-agent papers: 890 (2019) → 18,500 (2024); LLM-agent papers: ~0 → 9,800 (post-2022)

The paper proposes two concrete, measurable metrics for AMAS evaluation.

AMAS-specific threats cluster into four operational classes: adversarial attacks, data leakage, agent collusion, and emergent behaviors.

Literature review included 180 primary studies after screening.

Numbers: 180 primary studies shortlisted after full-text screening

Results

Component Synergy Score (CSS) example

Value: CSS ≈ 0.32 for Planner→Coder→Tester example

Bibliometric growth

Value: multi-agent papers: 890 → 18,500 (2019 → 2024); LLM-agent papers: ~0 → 9,800

Publication type distribution

Value: 61.2% journal articles

Who Should Care

What To Try In 7 Days

Add prompt hygiene and input sanitization to agent entry points.
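
One way to start on input sanitization is a small filter at each agent entry point. The sketch below is illustrative only and is not taken from the paper: the function name sanitize_input, the two regex patterns, and the 4000-character limit are all assumptions. Pattern matching catches only crude injection attempts, so it is best treated as one layer among several.

```python
import re
import unicodedata

# Hypothetical patterns for illustration; a real deployment would need a
# maintained list and should not rely on pattern matching alone.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"reveal\s+(the\s+)?system\s+prompt", re.IGNORECASE),
]

MAX_INPUT_CHARS = 4000  # assumed limit, tune per application


def sanitize_input(text: str) -> tuple[str, list[str]]:
    """Return (cleaned_text, flags) for one piece of untrusted input."""
    # Normalize Unicode so compatibility characters compare consistently.
    cleaned = unicodedata.normalize("NFKC", text)
    # Drop control and format characters (e.g. zero-width spaces),
    # keeping newlines and tabs.
    cleaned = "".join(
        ch for ch in cleaned
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    flags = []
    if len(cleaned) > MAX_INPUT_CHARS:
        cleaned = cleaned[:MAX_INPUT_CHARS]
        flags.append("truncated")
    if any(p.search(cleaned) for p in SUSPICIOUS_PATTERNS):
        flags.append("possible_injection")
    return cleaned, flags
```

Flagged inputs can be routed to stricter handling (for example, a reviewer queue) rather than silently dropped, so that false positives remain recoverable.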

Log tool calls and basic reasoning traces (timestamps, agent role, tool args).
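
A minimal sketch of such a log, assuming one JSON line per tool call. The class and field names (ToolCallRecord, AuditLog, reasoning) are hypothetical; only the logged fields (timestamp, agent role, tool arguments) come from the recommendation above.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class ToolCallRecord:
    agent_role: str          # e.g. "planner", "executor"
    tool_name: str
    tool_args: dict          # must be JSON-serializable
    reasoning: str = ""      # short trace of why the call was made
    timestamp: float = field(default_factory=time.time)


class AuditLog:
    """Append-only, in-memory log; swap the list for a file or log service."""

    def __init__(self):
        self.lines = []

    def record(self, entry: ToolCallRecord) -> str:
        # One JSON object per line keeps the log easy to grep and replay.
        line = json.dumps(asdict(entry), sort_keys=True)
        self.lines.append(line)
        return line
```

Tool arguments may themselves contain sensitive data, so redaction before logging is worth considering in regulated settings.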

Enforce least-privilege tool access and require human sign-off for sensitive actions.
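
Least-privilege access and human sign-off can be sketched as a single authorization check that runs before any tool call. Everything named here is an assumption for illustration: the role names, the tool names, the ALLOWED_TOOLS and SENSITIVE_TOOLS tables, and the approver callback standing in for a real review step.

```python
from typing import Callable


class ToolAccessError(PermissionError):
    """Raised when a tool call is not permitted."""


# Hypothetical per-role allowlist: anything not listed is denied.
ALLOWED_TOOLS = {
    "planner": {"search"},
    "executor": {"search", "send_email"},
}

# Hypothetical set of tools that always need a human decision.
SENSITIVE_TOOLS = {"send_email"}


def deny_by_default(agent_role: str, tool_name: str) -> bool:
    # Stand-in approver: with no human in the loop, nothing is approved.
    return False


def authorize(
    agent_role: str,
    tool_name: str,
    approver: Callable[[str, str], bool] = deny_by_default,
) -> bool:
    """Return True if the call may proceed, otherwise raise ToolAccessError."""
    if tool_name not in ALLOWED_TOOLS.get(agent_role, set()):
        raise ToolAccessError(f"{agent_role} may not call {tool_name}")
    if tool_name in SENSITIVE_TOOLS and not approver(agent_role, tool_name):
        raise ToolAccessError(f"{tool_name} requires human sign-off")
    return True
```

Defaulting to denial means a missing or misconfigured approver blocks sensitive actions instead of letting them through.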

Agent Features

Memory

  • Persistent vector DB (long-term memory)
  • Working context / short-term memory
  • Memory scoping and TTL
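
Memory scoping and TTL can be illustrated with a small key-value store in which entries belong to a scope (for example, one agent or one session) and expire after a set time. This is a sketch under assumptions, not the paper's design: the ScopedMemory class and its methods are hypothetical, and a production system would apply the same idea to its vector database.

```python
import time


class ScopedMemory:
    """In-memory store where each entry has a scope and a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested
        self._items = {}      # (scope, key) -> (value, expires_at)

    def put(self, scope: str, key: str, value, ttl_seconds: float) -> None:
        self._items[(scope, key)] = (value, self._clock() + ttl_seconds)

    def get(self, scope: str, key: str):
        entry = self._items.get((scope, key))
        if entry is None:
            return None       # unknown key, or key owned by another scope
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[(scope, key)]   # lazily evict expired entries
            return None
        return value
```

Scoping limits how far a poisoned entry can spread between agents, and the TTL limits how long it can persist across sessions.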

Planning

  • Chain-of-Thought (CoT)
  • ReAct (plan-act-observe loop)
  • Layered-CoT (decomposed reasoning)
  • Plan-then-Execute

Tool Use

  • API/function calling
  • Toolformer-style learned API calls
  • MRKL-style expert routing

Frameworks

  • AutoGen
  • MetaGPT
  • LangGraph
  • LangChain
  • OpenAI Agents SDK

Is Agentic

true

Architectures

  • LLM-based multi-agent orchestrator with shared memory
  • Role-specialized agent pipelines (planner/verifier/executor)
  • Middleware + Task Manager + World Model

Collaboration

  • Protocolized communication (A2A/ANP)
  • Role-based coordination and hierarchical monitoring
  • Cross-agent validation / critics

Reproducibility

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Survey synthesizes literature but does not present new empirical defenses or code.
  • Few standardized stress-tests and limited cross-study benchmark comparability.
  • Human-centered evaluation is resource intensive and hard to scale.
  • Operational costs and latency overhead of TRiSM controls are non-trivial.

When Not To Use

  • For toy prototypes where no sensitive data or external actions are involved.
  • When strict low-latency requirements prohibit runtime monitoring and multi-agent checks.

Failure Modes

  • Memory poisoning that persists across sessions and agents.
  • Cascading failures due to orchestration misrouting or compromised orchestrator.
  • Tool misuse leading to unauthorized actions or data exfiltration.
  • Overtrust and user complacency when explanations are shallow or misleading.

Core Entities

Models

  • GPT-4
  • GPT-3.5
  • LLaMA
  • LLaVA
  • Provider-agnostic LLMs

Metrics

  • Component Synergy Score (CSS)
  • Tool Utilization Efficacy (TUE)
  • Attack Success Rate (ASR)
  • robustness degradation
  • ECE
  • Brier score
  • user satisfaction (CSAT)
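
Three of the listed metrics have standard definitions that fit in a few lines. The sketch below uses the usual textbook forms of Attack Success Rate, the Brier score, and Expected Calibration Error with equal-width bins; the function names and the choice of 10 bins are assumptions, and the paper's own CSS and TUE metrics are not reproduced here.

```python
def attack_success_rate(outcomes):
    """Fraction of attack attempts that succeeded (outcomes are booleans)."""
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy per bin.

    confidences: confidence in the predicted answer, each in [0, 1]
    correct:     booleans, whether each prediction was right
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins; confidence 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

Lower is better for all three: a low ASR indicates robustness to the tested attacks, while low Brier and ECE values indicate that stated confidence tracks actual correctness.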

Datasets

  • HarmBench
  • JailbreakBench
  • ToolBench
  • AgentBench
  • GAIA
  • WebArena

Benchmarks

  • HarmBench
  • JailbreakBench
  • ToolBench
  • AgentBench
  • GAIA
  • HELM
  • MLCommons AI Safety

Context Entities

Models

  • Toolformer-style API-calling models
  • Neuro-symbolic MRKL hybrids

Metrics

  • composite trustworthiness vector
  • coordination efficiency (messages/tokens, rounds-to-consensus)

Datasets

  • simulated multi-agent scenarios
  • red-team prompt collections

Benchmarks

  • domain-specific tool-use tests
  • multi-agent adversarial scenario suites