Learn a sparse communication graph for multi-agent teams; matches full communication while using 40% of edges

May 14, 20247 min

Overview

Production Readiness

0.6

Novelty Score

0.6

Cost Impact Score

0.6

Citation Count

4

Authors

Shengchao Hu, Li Shen, Ya Zhang, Dacheng Tao

Links

Abstract / PDF

Why It Matters For Business

Learned sparse communication can cut bandwidth and messaging hardware needs while keeping team performance, so multi-robot warehouses or distributed fleets can save cost and latency without retraining for every topology.

Summary TLDR

CommFormer treats agent communication as a learnable directed graph and trains it end-to-end via a relaxed, differentiable representation plus a simple bi-level optimization. It learns a sparse adjacency matrix (sparsity S) and uses attention to route messages. On multiple cooperative benchmarks (SMAC, GRF, Predator-Prey variants) CommFormer often outperforms communication baselines and nearly matches fully-connected communication while using only 40% of edges, reducing bandwidth and computation.

Problem Statement

Hand-designing who talks to whom in multi-agent systems is costly and brittle. Full all-to-all communication is expensive in bandwidth and compute. The paper asks: can we learn a compact, fixed communication graph before inference that preserves cooperation performance while lowering communication cost?

Main Contribution

Model the communication topology as a learnable adjacency matrix and optimize it end-to-end via continuous relaxation.

Combine a Transformer-like encoder–decoder with an attention-based message aggregator that masks by the learned graph.

Use a practical bi-level optimization loop (single-step inner approximation) and k-hot Gumbel sampling to optimize discrete graphs efficiently.

Show across SMAC, GRF and predator-prey domains that a learned sparse graph (S=0.4) matches or outperforms strong baselines and nearly matches full communication.

Key Findings

CommFormer often matches fully-connected communication while using 40% of edges.

NumbersS=0.4 (40% edges); many SMAC maps show 100.0% win rate vs FC

CommFormer beats strong CTDE baselines on hard SMAC maps.

Numbers3s5z: CommFormer 100.0% vs MAT 74.0% (∆≈+26pp)

Performance rises with increased graph sparsity parameter S, especially for many-agent, hard tasks.

NumbersFigure 3 shows monotonic gains as S increases; complex maps need larger S

The graph search is trained with a continuous relaxation and k-hot Gumbel trick enabling gradient updates.

Results

win rate (SMAC map 3s5z)

Value100.0% (0.0)

BaselineMAT 74.0%

win rate (aggregate many SMAC maps)

Value100.0% on many maps (e.g., 3m, 1c3s5z, MMM)

Baselinevarious baselines lower or highly variable

communication cost vs performance

ValueUses S=0.4 (40% edges) with near-FC performance

BaselineFully-connected CommFormer (FC) uses 100% edges

GRF success rate and episode length

Value100.0% success; 25.4 steps (±0.4)

BaselineMAGIC 98.2% success; 34.3 steps

Who Should Care

What To Try In 7 Days

Run CommFormer with sparsity S=0.4 on a small MARL task and compare win rate to your current baseline.

Visualize the learned adjacency matrix to find which agents become communication hubs.

Sweep S from 0.1 to 1.0 to trade off comm cost vs performance and pick an operational point.

Agent Features

Memory

  • short-term (autoregressive actions inserted back into decoder)

Tool Use

  • PPO for policy training
  • Gumbel-Max k-hot sampling for discrete graph relaxation

Frameworks

  • CTDE
  • graph modeling (masked attention edges)

Is Agentic

true

Architectures

  • Transformer encoder–decoder with masked attention
  • learnable adjacency matrix (α) representing directed graph

Collaboration

  • learned sparse directed message passing graph
  • centralized training, decentralized execution

Optimization Features

Token Efficiency

  • reduces number of transmitted messages by enforcing sparsity S

Model Optimization

  • continuous relaxation of discrete adjacency matrix
  • edge embeddings added to attention scores

System Optimization

  • encoder-as-critic reduces extra value-network code paths

Training Optimization

  • bi-level optimization approximated by single inner-step updates
  • end-to-end gradient updates on graph parameters

Inference Optimization

  • apply learned adjacency mask to attention (sparse communication)
  • deterministic adjacency at execution time

Reproducibility

Code Available

Open Source Status

  • yes

Risks & Boundaries

Limitations

  • Learned graph is static at inference — poor if agents or connectivity change dynamically.
  • Method requires centralized training with global information; not plug-and-play for fully decentralized learning.
  • Sparsity S must be tuned per task; too low S degrades performance on complex maps.

When Not To Use

  • Open-world scenarios with highly variable agent distances where connectivity changes every episode.
  • Applications that cannot afford centralized training or global-state rollouts.

Failure Modes

  • Under-fitting communication when S is set too low for task complexity.
  • Learned graph may overfit to training opponents or environment configurations.
  • If agents or roles change at test time, the fixed graph may block necessary messages.

Core Entities

Models

  • CommFormer
  • CommFormer(FC)
  • MAT
  • MAPPO
  • HAPPO
  • QMIX
  • TarMAC
  • MAGIC
  • QGNN
  • CommNet
  • I3CNet
  • GA-Comm
  • UPDeT

Metrics

  • win rate / success rate
  • average cumulative reward
  • steps taken per episode

Datasets

  • SMAC
  • Google Research Football (GRF)
  • Predator-Prey (PP)
  • PredatorCapture-Prey (PCP)

Benchmarks

  • SMAC maps (e.g., 3m, 8m, 3s5z, 25m, MMM, MMM2, 1o2r_vs_4r, 1o10b_vs_1r)
  • GRF 3v2 football scenario
  • Predator-Prey capture tasks