Share a recurrent transformer memory so agents coordinate implicitly and solve long narrow‑corridor pathfinding

January 22, 2025 · 8 min

Overview

Production Readiness

0.6

Novelty Score

0.6

Cost Impact Score

0.5

Citation Count

1

Authors

Alsu Sagirova, Yuri Kuratov, Mikhail Burtsev

Links

Abstract / PDF

Why It Matters For Business

SRMT offers a lightweight way to improve decentralized multi-robot coordination without centralized control; it can cut coordination failures and extend policies trained on small maps to larger deployments.

Summary TLDR

SRMT adds a shared recurrent memory to transformer-based agent policies so agents can read/write a global workspace. On toy bottleneck tasks and the POGEMA benchmark, SRMT improves coordination versus non-sharing and communication baselines, generalizes to much longer corridors than seen in training, and scales to large lifelong MAPF scenarios. Code is available on GitHub for reproducible experiments.

Problem Statement

Coordinating many decentralized agents is hard because each agent sees only local observations and explicit communication protocols are costly or brittle. The paper asks: can a globally shared recurrent memory (a broadcast workspace) let decentralized transformer agents exchange information implicitly, avoid deadlocks, and generalize to larger map sizes?

Main Contribution

Shared Recurrent Memory Transformer (SRMT): a multi-agent transformer that pools each agent's recurrent memory and broadcasts it globally via cross-attention.

Empirical tests showing SRMT outperforms several MARL and memory/communication baselines on a two-agent Bottleneck task and is competitive on the POGEMA benchmark.

Ablations and analysis showing memory initialization and sharing are key to scaling to much longer corridors than seen in training.

Public release of code and training pipeline (GitHub link in paper) for reproducibility.

Key Findings

SRMT keeps near-perfect cooperative success on long corridors after training on short ones.

Numbers: trained on corridors of 3–30 cells; CSR ≈ 1.0 up to 400 cells, dropping to 0.8 beyond 400

Under sparse rewards SRMT maintains performance while other baselines fail.

Numbers: SRMT outperforms baselines on the Sparse and Moving Negative reward settings (CSR/ISR/SoC metrics)

With an improved memory initialization, other memory models improve but still lag SRMT.

Numbers: RATE with observation-based initialization (RATE_gen) improved over the original RATE but did not match SRMT overall

In lifelong MAPF (LMAPF) SRMT achieves high throughput and competitive scores across POGEMA metrics.

Numbers: SRMT outperforms MAMBA and QPLEX on most maps; SRMT-FlwrPlan beats several baselines, including RHCR, on Warehouse

SRMT training and evaluation are reproducible with public tools and the released code.

Numbers: code on GitHub; experiments run on POGEMA and Sample Factory (reported training time ≈ 1 hour per run on a Tesla P100 for MAPF)

Results

Cooperative Success Rate (CSR)

Value: ≈ 1.0 up to corridor length 400, dropping to 0.8 beyond 400 (Sparse reward)

Baseline: RMT and other baselines

Top performance on Moving Negative reward

Value: SRMT is top-1 for CSR/ISR/SoC up to corridor length 1000

Baseline: MAMBA, QPLEX, ATM, RATE, RRNN, RNN

Average throughput (LMAPF)

Value: higher than MAMBA and QPLEX on Mazes and Random; mixed against other baselines

Baseline: MAMBA, QPLEX, MATS-LP, RHCR

What To Try In 7 Days

Run the authors' SRMT code on POGEMA to reproduce baseline bottleneck results.

Swap in SRMT's shared-memory block for existing transformer agents to test coordination gains on your maps.

Combine SRMT with a simple heuristic planner (Follower-style) on high-congestion maps to see throughput gains.

Agent Features

Memory

  • Per-agent recurrent memory vectors
  • Global pooled memory broadcast each step
  • Memory head updates personal memory
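
The three memory features above can be sketched as a single timestep in plain Python. This is a simplified illustration, not the authors' implementation: the function names (`cross_attend`, `shared_memory_step`), the `mix` blending factor, and the use of raw memory vectors as both keys and values are assumptions made for brevity. The actual model uses learned projections inside a transformer block.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query, shared_memory):
    """Scaled dot-product attention of one query over all memory vectors.

    Learned query/key/value projections are omitted (assumption for brevity).
    """
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, mem)) / math.sqrt(dim)
              for mem in shared_memory]
    weights = softmax(scores)
    # Weighted sum of memory vectors (values == keys in this sketch).
    return [sum(w * mem[i] for w, mem in zip(weights, shared_memory))
            for i in range(dim)]

def shared_memory_step(personal_memories, observations, mix=0.5):
    """One timestep: pool memories, let each agent read, then write back."""
    # 1. Pool: the shared workspace is the list of all personal memories.
    shared = [list(m) for m in personal_memories]
    new_memories = []
    for mem, obs in zip(personal_memories, observations):
        # 2. Read: the agent queries the workspace with its own state.
        query = [m + o for m, o in zip(mem, obs)]
        context = cross_attend(query, shared)
        # 3. Write: a stand-in "memory head" blends old memory with the read.
        new_memories.append([(1 - mix) * m + mix * c
                             for m, c in zip(mem, context)])
    return new_memories

random.seed(0)
n_agents, dim = 3, 4
memories = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_agents)]
obs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_agents)]
memories = shared_memory_step(memories, obs)
print(len(memories), len(memories[0]))  # one updated memory vector per agent
```

Because every agent reads from the same pooled list, information written by one agent at step t is visible to all others at step t+1 without any explicit message protocol.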

Planning

  • Optional integration with a heuristic planner (Follower)

Tool Use

  • ResNet spatial encoder
  • GPT-2 style attention block
  • Cross-attention to shared memory

Frameworks

  • POGEMA
  • Sample Factory
  • Huggingface Transformers

Is Agentic

true

Architectures

  • Transformer-based policy with memory tokens
  • Shared recurrent memory (global workspace)

Collaboration

  • Implicit communication via shared memory
  • Decentralized training and execution (no central controller)

Optimization Features

Infra Optimization

  • Reported training runs on a single Tesla P100 (MAPF models ~1 hour per run)

Model Optimization

  • Memory tokens as compact recurrent state

System Optimization

  • Batching via Sample Factory for many environments

Training Optimization

  • Shared homogeneous policy across agents
  • Grid search for entropy coefficient and learning rate
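
A grid search over two hyperparameters can be sketched as follows. The candidate values and the `dummy_train_and_eval` objective are hypothetical placeholders, since the summary does not list the grid the authors used. A real run would train a policy for each pair and return a validation metric such as CSR.

```python
from itertools import product

def grid_search(train_and_eval, entropy_coefs, learning_rates):
    """Try every (entropy_coef, learning_rate) pair and return the best one."""
    best_score, best_cfg = float("-inf"), None
    for ent, lr in product(entropy_coefs, learning_rates):
        score = train_and_eval(entropy_coef=ent, learning_rate=lr)
        if score > best_score:
            best_score = score
            best_cfg = {"entropy_coef": ent, "learning_rate": lr}
    return best_cfg, best_score

def dummy_train_and_eval(entropy_coef, learning_rate):
    """Placeholder objective (hypothetical); stands in for a training run."""
    return -abs(entropy_coef - 0.01) - abs(learning_rate - 1e-4)

best_cfg, best_score = grid_search(
    dummy_train_and_eval,
    entropy_coefs=[0.001, 0.01, 0.1],   # hypothetical candidate values
    learning_rates=[1e-5, 1e-4, 1e-3],  # hypothetical candidate values
)
print(best_cfg)
```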

Inference Optimization

  • Single forward pass with cross-attention to shared memory

Reproducibility

Code Available

Data Available

Open Source Status

  • yes

Risks & Boundaries

Limitations

  • Assumes perfect agent localization and mapping.
  • Assumes synchronized and accurate action execution.
  • Obstacles are static in all experiments.
  • No theoretical guarantees that agents always reach goals.

When Not To Use

  • When agents have noisy localization or unreliable actuators.
  • When formal safety or liveness guarantees are required.
  • If communication or memory sharing is forbidden by system constraints.

Failure Modes

  • Performance drops outside tested regimes (in the Sparse setting, CSR falls once corridor length exceeds 400).
  • Potential deadlocks if memory initialization or write/read policies are poor.
  • May underperform centralized planners on the Cooperation metric in some crafted puzzles (RHCR leads in Cooperation).

Core Entities

Models

  • SRMT
  • RMT
  • ATM
  • RATE
  • RATE_gen
  • RRNN
  • MAMBA
  • QPLEX
  • RNN
  • Empty
  • RHCR
  • Follower
  • MATS-LP
  • SCRIMP

Metrics

  • Cooperative Success Rate (CSR)
  • Individual Success Rate (ISR)
  • Sum-of-Costs (SoC)
  • Average throughput
  • Performance (relative throughput)
  • Pathfinding (optimality)
  • Congestion
  • Cooperation
  • Out-of-Distribution
  • Scalability
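
The first four metrics above can be computed from per-agent arrival times, as in the sketch below. It assumes the usual MAPF conventions (ISR as the share of agents that reach their goal, CSR as the share of episodes where all agents do, SoC as summed arrival steps, throughput as goals completed per timestep), which may differ in detail from the paper's exact definitions. The `Episode` container and the choice to charge the full horizon to agents that never arrive are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Episode:
    arrival_steps: List[Optional[int]]  # step each agent arrived, None if never
    max_steps: int                      # horizon, charged to agents that fail

def isr(ep: Episode) -> float:
    """Individual Success Rate: share of agents that reached their goal."""
    return sum(s is not None for s in ep.arrival_steps) / len(ep.arrival_steps)

def csr(episodes: List[Episode]) -> float:
    """Cooperative Success Rate: share of episodes where every agent arrived."""
    solved = sum(all(s is not None for s in ep.arrival_steps)
                 for ep in episodes)
    return solved / len(episodes)

def soc(ep: Episode) -> int:
    """Sum-of-Costs: total steps over agents; failures cost the full horizon."""
    return sum(ep.max_steps if s is None else s for s in ep.arrival_steps)

def average_throughput(goals_per_step: List[int]) -> float:
    """Lifelong MAPF throughput: goals completed per timestep."""
    return sum(goals_per_step) / len(goals_per_step)

episodes = [Episode([5, 7], max_steps=20), Episode([4, None], max_steps=20)]
print("CSR:", csr(episodes))
print("mean ISR:", sum(isr(e) for e in episodes) / len(episodes))
print("SoC per episode:", [soc(e) for e in episodes])
print("throughput:", average_throughput([0, 1, 0, 2]))
```

CSR is the stricter of the two success rates: one agent that never arrives zeroes the episode for CSR but only lowers ISR proportionally.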

Datasets

  • POGEMA
  • MovingAI
  • MovingAI-tiles
  • Bottleneck toy maps
  • Mazes
  • Random
  • Warehouse
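
To make the bottleneck setting concrete, the sketch below builds a toy grid with two rooms joined by a one-cell-wide corridor and measures the shortest path between them. The layout, the `make_bottleneck_map` helper, and the room size are assumptions for illustration and are not taken from the POGEMA map files.

```python
from collections import deque

def make_bottleneck_map(room_size, corridor_len):
    """Rows of '.' (free) and '#' (wall): two rooms joined by a narrow corridor."""
    width = 2 * room_size + corridor_len
    mid = room_size // 2  # the corridor runs along the middle row
    grid = []
    for r in range(room_size):
        row = []
        for c in range(width):
            in_room = c < room_size or c >= room_size + corridor_len
            in_corridor = r == mid and not in_room
            row.append("." if in_room or in_corridor else "#")
        grid.append("".join(row))
    return grid

def bfs_distance(grid, start, goal):
    """Shortest 4-connected path length in steps, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == "." and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None

grid = make_bottleneck_map(room_size=3, corridor_len=5)
print("\n".join(grid))
print(bfs_distance(grid, start=(1, 0), goal=(1, 10)))
```

Two agents starting in opposite rooms cannot pass each other inside the corridor, so one must wait or back out, which is why the task stresses coordination rather than single-agent pathfinding.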

Benchmarks

  • POGEMA benchmark
  • Bottleneck task
  • Lifelong MAPF (LMAPF)