Overview
The method shows solid empirical gains on benchmarks and ships with a public codebase, but its assumptions (perfect localization, synchronized moves) and the lack of theoretical guarantees limit immediate deployment in safety-critical systems.
Citations: 1
Evidence Strength: 0.70
Confidence: 0.85
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 3/3
Reproducibility
Status: Code + data available
Open source: Yes
At A Glance
Cost impact: 50%
Production readiness: 60%
Novelty: 60%
Why It Matters For Business
SRMT offers a lightweight way to improve decentralized multi-robot coordination without centralized control; it can cut coordination failures and extend policies trained on small maps to larger deployments.
Who Should Care
Summary TLDR
SRMT adds a shared recurrent memory to transformer-based agent policies so that agents can read from and write to a global workspace. On toy bottleneck tasks and the POGEMA benchmark, SRMT improves coordination over non-sharing and communication baselines, generalizes to much longer corridors than those seen in training, and scales to large lifelong MAPF scenarios. Code is available on GitHub for reproducible experiments.
Problem Statement
Coordinating many decentralized agents is hard because each agent sees only local observations and explicit communication protocols are costly or brittle. The paper asks: can a globally shared recurrent memory (a broadcast workspace) let decentralized transformer agents exchange information implicitly, avoid deadlocks, and generalize to larger map sizes?
Main Contribution
Shared Recurrent Memory Transformer (SRMT): a multi-agent transformer that pools each agent's recurrent memory and broadcasts it globally via cross-attention.
Empirical tests showing SRMT outperforms several MARL and memory/communication baselines on a two-agent Bottleneck task and is competitive on the POGEMA benchmark.
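The pooling-and-broadcast idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes one memory vector per agent, a single attention head, and no learned projections (keys and values are the pooled memories themselves). The names `cross_attend` and `shared_memory_step` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, keys, values):
    """Single-head scaled dot-product attention of one query over a set of vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights


def shared_memory_step(memories):
    """Pool every agent's memory vector and let each agent read the pool.

    Assumption (not from the paper): keys == values == pooled memories,
    with no learned query/key/value projections.
    """
    pool = [list(m) for m in memories]  # shared workspace: one slot per agent
    reads = []
    for mem in memories:
        read, _ = cross_attend(mem, pool, pool)  # agent queries the global pool
        reads.append(read)
    return reads


if __name__ == "__main__":
    agent_memories = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for i, read in enumerate(shared_memory_step(agent_memories)):
        print(f"agent {i} reads {read}")
```

In a trained model the read vector would feed back into each agent's policy and memory update; here it only shows that every agent's read is a weighted mixture of all agents' memories.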
Key Findings
SRMT keeps near-perfect cooperative success on long corridors after training on short ones.
Under sparse rewards, SRMT maintains performance while the baselines fail.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Cooperative Success Rate (CSR) | ≈1.0 up to corridor 400, drops to 0.8 beyond 400 (Sparse reward) | RMT and other baselines | SRMT > baselines on Sparse and Moving Negative | Bottleneck task, evaluation corridors 5–1000 | Figure 4 and Section 4.1 | Figure 4 |
| Top performance on Moving Negative reward | SRMT is top-1 for CSR/ISR/SoC up to corridor length 1000 | MAMBA, QPLEX, ATM, RATE, RRNN, RNN | SRMT outperforms all listed baselines | Bottleneck task with Moving Negative reward | Section 4.1 and Figure 4 | Figure 4 |
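The table reports CSR, ISR, and SoC without defining them. The sketch below shows one way to compute them from episode logs, under assumed definitions: CSR is the fraction of episodes in which every agent reaches its goal, ISR is the average fraction of agents that reach their goals, and SoC is the sum of per-agent steps to goal (with the episode horizon charged to agents that never arrive). The `Episode` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Episode:
    # steps_to_goal[i] is the step at which agent i reached its goal,
    # or None if it never did (hypothetical log format).
    steps_to_goal: List[Optional[int]]
    horizon: int  # maximum episode length


def episode_metrics(ep: Episode) -> Tuple[float, float, float]:
    """Return (CSR, ISR, SoC) for a single episode under the assumed definitions."""
    reached = [s is not None for s in ep.steps_to_goal]
    csr = 1.0 if all(reached) else 0.0
    isr = sum(reached) / len(reached)
    soc = float(sum(s if s is not None else ep.horizon for s in ep.steps_to_goal))
    return csr, isr, soc


def aggregate(episodes: List[Episode]) -> Tuple[float, float, float]:
    """Average each metric over a list of episodes."""
    per_episode = [episode_metrics(e) for e in episodes]
    n = len(per_episode)
    return tuple(sum(m[i] for m in per_episode) / n for i in range(3))
```

Keeping CSR and ISR separate matters for a two-agent bottleneck: one agent passing while the other is blocked gives ISR 0.5 but CSR 0.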
What To Try In 7 Days
Run the authors' SRMT code on POGEMA to reproduce baseline bottleneck results.
Swap in SRMT's shared-memory block for existing transformer agents to test coordination gains on your maps.
Combine SRMT with a simple heuristic planner (Follower-style) on high-congestion maps to see throughput gains.
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic
Yes
Architectures
Collaboration
Optimization Features
Infra Optimization
Model Optimization
System Optimization
Training Optimization
Inference Optimization
Risks & Boundaries
Limitations
Assumes perfect agent localization and mapping.
Assumes synchronized and accurate action execution.
When Not To Use
When agents have noisy localization or unreliable actuators.
When formal safety or liveness guarantees are required.
Failure Modes
Performance drops outside tested regimes (in the Sparse setting, CSR falls for corridor lengths above 400).
Potential deadlocks if memory initialization or write/read policies are poor.
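To check where performance starts to degrade on your own maps, one option is to sweep corridor lengths and record the largest length at which CSR stays above a chosen threshold. The helper below is a hedged sketch: `largest_reliable_length`, the input dictionary, and the default threshold of 0.95 are all assumptions for illustration, not part of the authors' evaluation code.

```python
from typing import Dict, Optional


def largest_reliable_length(
    csr_by_length: Dict[int, float], threshold: float = 0.95
) -> Optional[int]:
    """Return the largest evaluated corridor length L such that CSR >= threshold
    for every evaluated length up to and including L, or None if even the
    shortest corridor falls below the threshold."""
    reliable = None
    for length in sorted(csr_by_length):
        if csr_by_length[length] < threshold:
            break  # first degradation point: stop extending the reliable range
        reliable = length
    return reliable
```

Feeding it measured CSR values from a sweep gives a single number to compare against the corridor lengths expected in deployment.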

