Seven concrete security gaps that break current defenses in cross-domain multi‑agent LLMs

May 28, 2025 · 6 min

Overview

Production Readiness

1

Novelty Score

0.7

Cost Impact Score

0.8

Citation Count

0

Authors

Ronny Ko, Jiseong Jeong, Shuyuan Zheng, Chuan Xiao, Tae-Wan Kim, Makoto Onizuka, Won-Yong Shin

Why It Matters For Business

Cross‑organization agent cooperation breaks single‑domain safety and audit assumptions, increasing legal, financial, and operational risk unless systems are instrumented with cross‑domain security metrics.

Summary TLDR

This position paper maps seven specific security problems that arise when independently owned LLM agents cooperate across organizational boundaries. The authors group the challenges into two categories: behavior-centric (unvetted dynamic grouping, collusion, incentive conflict, distributed self‑tuning drift) and data‑centric (provenance obscurity, context bypass, inter‑domain confidentiality/integrity). For each challenge they sketch attacks, propose evaluation metrics (per‑challenge ratios), and suggest practical countermeasures (trust ledgers, adversarial multi‑agent training, session firewalls, neural signatures, hybrid cryptographic proofs). The work calls for security primitives and benchmarks before wide cross‑org

Problem Statement

Cross‑domain multi‑agent LLM systems let independently owned agents cooperate without a shared trust anchor. That breaks core assumptions behind existing single‑domain safety methods: agents that are benign alone can leak data, collude, or drift to unsafe objectives when interacting. The paper identifies seven security gaps that current defenses and cryptographic tools do not fully address.

Main Contribution

Identify seven security challenge categories specific to cross‑domain multi‑agent LLM deployments (C1–C7).

For each challenge, describe plausible attack patterns and practical evaluation metrics you can measure at runtime.

Offer concrete countermeasures and high‑level designs (trust ledgers, adversarial collusion training, session firewalls, neural signatures, hybrid verifiable privacy).

Argue for cross‑discipline work and open benchmarks to measure security/utility tradeoffs before large deployments.
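One of the countermeasures named above, a trust ledger, can be pictured as an append-only log of agent interactions that any participating domain can re-verify. The following is a minimal sketch under stated assumptions, not the paper's design: the hash-chaining scheme, the `TrustLedger` class, its field names, and the `trust_score` ratio are all hypothetical choices made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


@dataclass
class TrustLedger:
    """Append-only, hash-chained log of agent interactions (illustrative only)."""
    entries: list = field(default_factory=list)

    def append(self, agent_id: str, domain: str, outcome: str) -> dict:
        # Each entry commits to the previous entry's hash, so edits break the chain.
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        record = {"agent_id": agent_id, "domain": domain,
                  "outcome": outcome, "prev_hash": prev_hash}
        record["hash"] = _digest(record)
        self.entries.append(record)
        return record

    def verify(self) -> bool:
        # Recompute every hash and check the links between consecutive entries.
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True

    def trust_score(self, agent_id: str) -> float:
        # Assumed scoring rule: share of this agent's interactions recorded as "ok".
        outcomes = [e["outcome"] for e in self.entries if e["agent_id"] == agent_id]
        if not outcomes:
            return 0.0
        return sum(o == "ok" for o in outcomes) / len(outcomes)
```

A hash chain only makes tampering detectable by whoever holds a copy of the log. It does not by itself decide who is allowed to write entries, which is the harder cross-domain question the paper raises.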

Key Findings

Seven distinct categories of security risk appear when LLM agents cross ownership boundaries.

Numbers: 7 challenge categories (C1–C7)

Tool‑using agents still take dangerous actions in many high‑stakes simulated scenarios.

Numbers: 24% of high‑stakes scenarios led to dangerous actions (ToolEmu result cited)

Prompt infection and LLM‑to‑LLM prompt injection can propagate and persist across agent networks.

Pure cryptographic privacy approaches (FHE/MPC/ZK) create large performance and complexity costs in multi‑domain agent pipelines.

Numbers: described as "orders of magnitude" higher overheads
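The prompt-infection finding above pairs naturally with the session-firewall countermeasure and the Ill-prompt Block Rate metric listed in this summary. Below is a rough sketch of how a filter on inter-agent messages might be scored. It assumes a simple regex filter, which is a stand-in and not the paper's mechanism. The patterns, the `FirewallStats` fields, and the two ratio definitions are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration; a real filter would need far more than regexes.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"forward this (message|prompt) to", re.IGNORECASE),
    re.compile(r"reveal (your )?(system prompt|credentials)", re.IGNORECASE),
]


@dataclass
class FirewallStats:
    blocked_malicious: int = 0
    total_malicious: int = 0
    blocked_benign: int = 0
    total_benign: int = 0

    @property
    def block_rate(self) -> float:
        # Assumed definition: blocked malicious messages / all malicious messages.
        return self.blocked_malicious / self.total_malicious if self.total_malicious else 0.0

    @property
    def false_positive_rate(self) -> float:
        # Assumed definition: blocked benign messages / all benign messages.
        return self.blocked_benign / self.total_benign if self.total_benign else 0.0


def is_blocked(message: str) -> bool:
    """Return True if any suspicious pattern appears in the message."""
    return any(p.search(message) for p in SUSPICIOUS_PATTERNS)


def evaluate(labelled_messages) -> FirewallStats:
    """Score the filter on (text, is_malicious) pairs."""
    stats = FirewallStats()
    for text, is_malicious in labelled_messages:
        blocked = is_blocked(text)
        if is_malicious:
            stats.total_malicious += 1
            stats.blocked_malicious += blocked
        else:
            stats.total_benign += 1
            stats.blocked_benign += blocked
    return stats
```

Tracking the block rate together with false positives matters because a filter that blocks everything scores perfectly on the first number while making cooperation impossible.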

Who Should Care

What To Try In 7 Days

Map where your agents cross organizational boundaries and list sensitive data flows.

Run sandboxed emulation (ToolEmu‑style) on critical agent workflows to find risky tool actions.

Start logging the proposed per‑challenge metrics (Group Volatility, Collusion Risk, Provenance Coverage).
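The logging step above could start as small as the sketch below. The summary describes the metrics only as per-challenge ratios, so the concrete formulas here are assumptions, as are the message fields `source_verified` and `flagged_collusion` and every function name.

```python
import json
import time


def ratio(numerator: int, denominator: int) -> float:
    """Safe ratio that returns 0.0 when there is nothing to measure."""
    return numerator / denominator if denominator else 0.0


def group_volatility(snapshots: list[set[str]]) -> float:
    """Assumed definition: mean share of members that changed between snapshots."""
    changes = [
        ratio(len(prev ^ curr), len(prev | curr))
        for prev, curr in zip(snapshots, snapshots[1:])
    ]
    return sum(changes) / len(changes) if changes else 0.0


def provenance_coverage(messages: list[dict]) -> float:
    """Assumed definition: share of messages whose source was verified."""
    verified = sum(1 for m in messages if m.get("source_verified"))
    return ratio(verified, len(messages))


def collusion_risk(messages: list[dict]) -> float:
    """Assumed definition: share of messages flagged by some collusion detector."""
    flagged = sum(1 for m in messages if m.get("flagged_collusion"))
    return ratio(flagged, len(messages))


def log_metrics(snapshots, messages, now=None) -> str:
    """Emit one JSON log line with the three ratios."""
    record = {
        "timestamp": time.time() if now is None else now,
        "group_volatility": group_volatility(snapshots),
        "collusion_risk": collusion_risk(messages),
        "provenance_coverage": provenance_coverage(messages),
    }
    return json.dumps(record, sort_keys=True)
```

Emitting one JSON line per session keeps the numbers easy to feed into existing log tooling. The ratios only become meaningful once the underlying flags come from real verification and detection steps.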

Agent Features

Memory

  • distributed contextual state across agents

Planning

  • dynamic grouping
  • hierarchical arbitration (meta-LLM)

Tool Use

  • tool-augmented agents (code/web actions increase risk)

Frameworks

  • AutoGen
  • CAMEL
  • AutoGPT

Is Agentic

true

Architectures

  • cross-domain multi-agent networks
  • dynamic ad hoc teaming

Collaboration

  • multi-agent cooperation and negotiation
  • potential collusion

Reproducibility

Open Source Status

  • no

Risks & Boundaries

Limitations

  • Position paper without new empirical evaluations or released benchmarks.
  • Countermeasures are high‑level and require engineering and performance studies.
  • Pure cryptographic fixes are noted as impractical, but no concrete hybrid implementations are provided.

When Not To Use

  • For single‑owner, fully centralized multi‑agent systems where existing single‑domain controls suffice.
  • When you already have full auditability and no external agent interactions.

Failure Modes

  • Proposed metrics can be gamed by adaptive adversaries.
  • Neural signatures or watermarks may be removed or altered by intermediaries.
  • Trusted arbitration layers can become single points of failure or privacy bottlenecks.

Core Entities

Models

  • GPT-4
  • Claude
  • AutoGen
  • Llama-Guard

Metrics

  • Group Volatility
  • On-boarding Trust
  • Policy Consistency
  • Collusion Risk
  • Covert-channel Score
  • Independence Ratio
  • Goal Completeness
  • Conflict Resolution
  • Mutual Benefit
  • Tuning Log Coverage
  • Drift-detection Latency
  • Performance Consistency
  • Provenance Coverage
  • Source Verification
  • Action Traceability
  • Ill-prompt Block Rate
  • False Positives
  • Infection Propagation
  • Secure-channel Utility
  • Data Leakage
  • Request Vetting

Context Entities

Models

  • GPT-4
  • Claude

Metrics

  • Covert-channel Score
  • Infection Propagation