Overview
Production Readiness: 0.6
Novelty Score: 0.6
Cost Impact Score: 0.5
Citation Count: 1
Why It Matters For Business
SRMT offers a lightweight way to improve decentralized multi-robot coordination without centralized control; it can cut coordination failures and extend policies trained on small maps to larger deployments.
Summary TLDR
SRMT adds a shared recurrent memory to transformer-based agent policies so agents can read/write a global workspace. On toy bottleneck tasks and the POGEMA benchmark, SRMT improves coordination versus non-sharing and communication baselines, generalizes to much longer corridors than seen in training, and scales to large lifelong MAPF scenarios. Code is available on GitHub for reproducible experiments.
Problem Statement
Coordinating many decentralized agents is hard because each agent sees only local observations and explicit communication protocols are costly or brittle. The paper asks: can a globally shared recurrent memory (a broadcast workspace) let decentralized transformer agents exchange information implicitly, avoid deadlocks, and generalize to larger map sizes?
Main Contribution
Shared Recurrent Memory Transformer (SRMT): a multi-agent transformer that pools each agent's recurrent memory and broadcasts it globally via cross-attention.
Empirical tests showing SRMT outperforms several MARL and memory/communication baselines on a two-agent Bottleneck task and is competitive on the POGEMA benchmark.
Ablations and analysis showing memory initialization and sharing are key to scaling to much longer corridors than seen in training.
Public release of code and training pipeline (GitHub link in paper) for reproducibility.
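The core read operation can be illustrated with a minimal NumPy sketch. It assumes a single attention head. One agent's hidden tokens act as queries, and the stacked personal memories of all agents act as keys and values. The function name `shared_memory_read`, the dimensions, and the random projections are hypothetical; the paper's model uses a GPT-2 style block with learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def shared_memory_read(agent_tokens, shared_memory, w_q, w_k, w_v):
    """Single-head cross-attention from one agent's tokens to the shared memory.

    agent_tokens:  (T, d) hidden states of one agent (observation + memory tokens)
    shared_memory: (N, d) personal memory vectors of all N agents, stacked
    w_q, w_k, w_v: (d, d) projection matrices (hypothetical, random here)
    Returns the attended values (T, d) and the attention weights (T, N).
    """
    q = agent_tokens @ w_q
    k = shared_memory @ w_k
    v = shared_memory @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1 over the N agents
    return weights @ v, weights

rng = np.random.default_rng(0)
d, n_agents, n_tokens = 16, 4, 5
w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
memories = rng.normal(size=(n_agents, d))   # one memory vector per agent
tokens = rng.normal(size=(n_tokens, d))     # one agent's token sequence
out, attn = shared_memory_read(tokens, memories, w_q, w_k, w_v)
```

Each row of `attn` shows how much one of the agent's tokens draws on each teammate's memory. This is the implicit communication channel: nothing is sent explicitly, and every agent reads the same broadcast workspace.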
Key Findings
SRMT keeps near-perfect cooperative success on long corridors after training on short ones.
Under sparse rewards, SRMT maintains performance while the other baselines fail.
With an improved memory initialization, other memory models improve but still lag SRMT.
In lifelong MAPF (LMAPF), SRMT achieves high throughput and competitive scores across POGEMA metrics.
SRMT training and evaluation are built on public tools and released code, which supports reproducibility.
Results
Cooperative Success Rate (CSR)
Top performance on Moving Negative reward
Average throughput (LMAPF)
Who Should Care
What To Try In 7 Days
Run the authors' SRMT code on POGEMA to reproduce baseline bottleneck results.
Swap in SRMT's shared-memory block for existing transformer agents to test coordination gains on your maps.
Combine SRMT with a simple heuristic planner (Follower-style) on high-congestion maps to see throughput gains.
Agent Features
Memory
- Per-agent recurrent memory vectors
- Global pooled memory broadcast each step
- Memory head updates personal memory
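The per-step memory cycle (broadcast, read, update) can be sketched as a recurrent loop. The sketch assumes a heavily simplified model with no transformer block. Each agent queries the shared memory with its own memory vector. A hypothetical linear-plus-tanh "memory head" then combines the agent's own memory, its local observation embedding, and the shared read. The names `step_memories` and `w_head` and the zero initialization are illustrative placeholders, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def step_memories(memories, obs_embeddings, w_head):
    """One decentralized step for all agents (hypothetical simplified update).

    memories:       (N, d) personal memory vectors from the previous step
    obs_embeddings: (N, d) per-agent observation embeddings
    w_head:         (3 * d, d) weights of a hypothetical memory head
    """
    shared = memories.copy()  # global workspace: every agent's memory, visible to all
    d = memories.shape[1]
    new_memories = np.empty_like(memories)
    for i in range(memories.shape[0]):
        # Agent i reads the shared memory, using its own memory as the query.
        weights = softmax(shared @ memories[i] / np.sqrt(d))
        read = weights @ shared
        # Memory head: each agent writes only its own memory slot.
        features = np.concatenate([memories[i], obs_embeddings[i], read])
        new_memories[i] = np.tanh(features @ w_head)
    return new_memories

rng = np.random.default_rng(1)
n_agents, d, n_steps = 3, 8, 10
w_head = rng.normal(scale=0.1, size=(3 * d, d))
memories = np.zeros((n_agents, d))  # placeholder init; the paper reports init matters
for t in range(n_steps):
    obs = rng.normal(size=(n_agents, d))  # stand-in for encoder outputs
    memories = step_memories(memories, obs, w_head)
```

Every agent reads all memories but writes only its own. This keeps execution decentralized while still sharing information across the team.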
Planning
- Integration with heuristic planner (Follower) optional
Tool Use
- ResNet spatial encoder
- GPT-2 style attention block
- Cross-attention to shared memory
Frameworks
- POGEMA
- Sample Factory
- Huggingface Transformers
Is Agentic: true
Architectures
- Transformer-based policy with memory tokens
- Shared recurrent memory (global workspace)
Collaboration
- Implicit communication via shared memory
- Decentralized training and execution (no central controller)
Optimization Features
Infra Optimization
- Reported training runs on a single Tesla P100 (MAPF models ~1 hour per run)
Model Optimization
- Memory tokens as compact recurrent state
System Optimization
- Batching via Sample Factory for many environments
Training Optimization
- Shared homogeneous policy across agents
- Grid search for entropy coefficient and learning rate
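The grid search over entropy coefficient and learning rate can be expressed as an exhaustive sweep. In this sketch, `train_and_eval` is a hypothetical callable that runs one training job and returns a scalar validation score (higher is better). The grid values and the toy scorer are illustrative only; the card does not list the values the authors searched.

```python
import itertools

def grid_search(train_and_eval, entropy_coeffs, learning_rates):
    """Evaluate every (entropy coefficient, learning rate) pair; return the best."""
    results = {}
    for ent, lr in itertools.product(entropy_coeffs, learning_rates):
        results[(ent, lr)] = train_and_eval(entropy_coeff=ent, learning_rate=lr)
    best = max(results, key=results.get)
    return best, results

def toy_score(entropy_coeff, learning_rate):
    # Stand-in for a real training run; peaks at entropy 0.01, lr 3e-4.
    return -abs(entropy_coeff - 0.01) - abs(learning_rate - 3e-4) * 100

best, results = grid_search(toy_score, [0.001, 0.01, 0.1], [1e-4, 3e-4, 1e-3])
```

Because the policy is shared across agents, each grid point is one training run. With roughly one hour per MAPF run on a single P100, a 3 x 3 grid stays affordable.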
Inference Optimization
- Single forward pass with cross-attention to shared memory
Reproducibility
Code Urls
Code Available
Data Available
Open Source Status
- yes
Risks & Boundaries
Limitations
- Assumes perfect agent localization and mapping.
- Assumes synchronized and accurate action execution.
- Obstacles are static in all experiments.
- No theoretical guarantees that agents always reach goals.
When Not To Use
- When agents have noisy localization or unreliable actuators.
- When formal safety or liveness guarantees are required.
- If communication or memory sharing is forbidden by system constraints.
Failure Modes
- Performance drops outside tested regimes (e.g., CSR drops for corridor lengths above 400 in the Sparse reward setting).
- Potential deadlocks if memory initialization or write/read policies are poor.
- May underperform centralized planners on the Cooperation metric in some crafted puzzles (RHCR leads on Cooperation).
Core Entities
Models
- SRMT
- RMT
- ATM
- RATE
- RATE_gen
- RRNN
- MAMBA
- QPLEX
- RNN
- Empty
- RHCR
- Follower
- MATS-LP
- SCRIMP
Metrics
- Cooperative Success Rate (CSR)
- Individual Success Rate (ISR)
- Sum-of-Costs (SoC)
- Average throughput
- Performance (relative throughput)
- Pathfinding (optimality)
- Congestion
- Cooperation
- Out-of-Distribution
- Scalability
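The episode-level metrics above can be computed from per-agent arrival times. The sketch assumes common MAPF conventions, which POGEMA's implementation may not match exactly. CSR is 1 only if every agent reaches its goal. ISR is the fraction of agents that do. SoC sums each agent's arrival step, and agents that never arrive are charged the full episode length. In the lifelong setting, average throughput is the number of goals reached per time step. `EpisodeResult` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EpisodeResult:
    # Step at which each agent first reached its goal, or None if it never did.
    arrival_steps: List[Optional[int]]
    episode_length: int

def individual_success_rate(ep: EpisodeResult) -> float:
    reached = [s is not None for s in ep.arrival_steps]
    return sum(reached) / len(reached)

def cooperative_success_rate(ep: EpisodeResult) -> float:
    # 1.0 only if every agent reached its goal, else 0.0.
    return float(all(s is not None for s in ep.arrival_steps))

def sum_of_costs(ep: EpisodeResult) -> int:
    # Unfinished agents are charged the full episode length (assumed convention).
    return sum(s if s is not None else ep.episode_length for s in ep.arrival_steps)

def average_throughput(goals_reached: int, n_steps: int) -> float:
    # Lifelong MAPF: agents get a new goal on arrival; count completions per step.
    return goals_reached / n_steps

ep = EpisodeResult(arrival_steps=[12, 30, None], episode_length=64)
```

For this example episode, ISR is 2/3, CSR is 0.0 (one agent never arrived), and SoC is 12 + 30 + 64 = 106.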
Datasets
- POGEMA
- MovingAI
- MovingAI-tiles
- Bottleneck toy maps
- Mazes
- Random
- Warehouse
Benchmarks
- POGEMA benchmark
- Bottleneck task
- Lifelong MAPF (LMAPF)

