Share a recurrent transformer memory so agents coordinate implicitly and solve long narrow‑corridor pathfinding

Overview

Decision Snapshot: Needs Validation

The method shows solid empirical gains on benchmarks and ships with a public codebase, but its assumptions (perfect localization, synchronized moves) and the lack of theoretical guarantees limit immediate deployment in safety-critical systems.

Citations: 1

Evidence Strength: 0.70

Confidence: 0.85

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 3/3

Reproducibility

Status: Code + data available

Open source: Yes

At A Glance

Cost impact: 50%

Production readiness: 60%

Novelty: 60%

Authors

Alsu Sagirova, Yuri Kuratov, Mikhail Burtsev

Links

Abstract / PDF / Code / Data

Why It Matters For Business

SRMT offers a lightweight way to improve decentralized multi-robot coordination without centralized control; it can cut coordination failures and extend policies trained on small maps to larger deployments.

Who Should Care

Product Manager

ML Engineer

Engineering Lead

Data Scientist

Summary TLDR

SRMT adds a shared recurrent memory to transformer-based agent policies so agents can read/write a global workspace. On toy bottleneck tasks and the POGEMA benchmark, SRMT improves coordination versus non-sharing and communication baselines, generalizes to much longer corridors than seen in training, and scales to large lifelong MAPF scenarios. Code is available on GitHub for reproducible experiments.

Problem Statement

Coordinating many decentralized agents is hard because each agent sees only local observations and explicit communication protocols are costly or brittle. The paper asks: can a globally shared recurrent memory (a broadcast workspace) let decentralized transformer agents exchange information implicitly, avoid deadlocks, and generalize to larger map sizes?

Main Contribution

Shared Recurrent Memory Transformer (SRMT): a multi-agent transformer that pools each agent's recurrent memory and broadcasts it globally via cross-attention.

Empirical tests showing SRMT outperforms several MARL and memory/communication baselines on a two-agent Bottleneck task and is competitive on the POGEMA benchmark.
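The pool-and-broadcast idea can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names `cross_attend` and `srmt_style_step`, the `mix` blending weight, and the use of raw memory vectors as both keys and values (single head, no learned projections) are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(query, shared_memory):
    """Single-head scaled dot-product attention of one agent's query
    over the pooled memory vectors (used here as keys and values)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, mem) / scale for mem in shared_memory])
    return [
        sum(w * mem[i] for w, mem in zip(weights, shared_memory))
        for i in range(len(query))
    ]


def srmt_style_step(memories, observations, mix=0.5):
    """One synchronized step: pool every agent's memory, let each agent
    read the pool, then write back an updated personal memory.
    `mix` is a hypothetical blending weight standing in for a learned
    memory head."""
    shared = [list(m) for m in memories]  # snapshot broadcast to all agents
    updated = []
    for mem, obs in zip(memories, observations):
        query = [m + o for m, o in zip(mem, obs)]
        read = cross_attend(query, shared)
        updated.append([(1 - mix) * q + mix * r for q, r in zip(query, read)])
    return updated
```

Calling `srmt_style_step(memories, observations)` once per environment step keeps one memory vector per agent. In a real model the query, key, and value projections and the memory update would be learned layers rather than the fixed arithmetic used here.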

Key Findings

SRMT keeps near-perfect cooperative success on long corridors after training on short ones.

Numbers: trained on corridors of 3–30 cells; CSR ≈1.0 up to 400 cells, drops to 0.8 beyond 400

Practical Use: If you train on short layouts, SRMT can generalize to much longer narrow passages, so reuse of small-map training is feasible for longer deployments.

Evidence Ref: Figure 4; text in Section 4.1

Under sparse rewards SRMT maintains performance while other baselines fail.

Numbers: SRMT outperforms baselines on Sparse and Moving Negative reward settings (CSR/ISR/SoC metrics)

Practical Use: SRMT is a good option when reward signals are rare or delayed; the shared memory helps agents learn coordination with weak feedback.

Evidence Ref: Figure 3 and Appendix A.2

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Cooperative Success Rate (CSR)	≈1.0 up to corridor 400, drops to 0.8 beyond 400 (Sparse reward)	RMT and other baselines	SRMT > baselines on Sparse and Moving Negative	Bottleneck task, evaluation corridors 5–1000	Figure 4 and Section 4.1	Figure 4
Top performance on Moving Negative reward	SRMT is top-1 for CSR/ISR/SoC up to corridor length 1000	MAMBA, QPLEX, ATM, RATE, RRNN, RNN	SRMT outperforms all listed baselines	Bottleneck task with Moving Negative reward	Section 4.1 and Figure 4	Figure 4
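For readers reproducing the table, the three metrics can be computed from per-agent episode logs roughly as follows. This is a sketch under stated assumptions: it takes CSR as 1 only when every agent reaches its goal, ISR as the fraction of agents that do, and SoC as the sum of per-agent step counts. Charging `max_steps` to agents that never arrive, and the function names `episode_metrics` and `aggregate`, are choices made for this example and may differ from the paper's evaluation code.

```python
def episode_metrics(steps_to_goal, max_steps):
    """Metrics for one episode.

    steps_to_goal: one entry per agent, the step at which the agent
    reached its goal, or None if it never did.
    """
    reached = [s is not None for s in steps_to_goal]
    csr = 1.0 if all(reached) else 0.0      # all agents succeeded
    isr = sum(reached) / len(reached)       # share of agents that succeeded
    # Assumption: an agent that never arrives is charged the full horizon.
    soc = sum(s if s is not None else max_steps for s in steps_to_goal)
    return {"CSR": csr, "ISR": isr, "SoC": soc}


def aggregate(episodes, max_steps):
    """Average each metric over a list of episodes."""
    per_episode = [episode_metrics(e, max_steps) for e in episodes]
    count = len(per_episode)
    return {
        key: sum(m[key] for m in per_episode) / count
        for key in ("CSR", "ISR", "SoC")
    }
```

For two two-agent episodes logged as `[[10, 12], [8, None]]` with `max_steps=20`, `aggregate` returns a CSR of 0.5, an ISR of 0.75, and a mean SoC of 25.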

What To Try In 7 Days

Run the authors' SRMT code on POGEMA to reproduce baseline bottleneck results.

Swap in SRMT's shared-memory block for existing transformer agents to test coordination gains on your maps.

Combine SRMT with a simple heuristic planner (Follower-style) on high-congestion maps to see throughput gains.

Agent Features

Memory

Per-agent recurrent memory vectors

Global pooled memory broadcast each step

Memory head updates personal memory

Planning

Integration with heuristic planner (Follower) optional

Tool Use

ResNet spatial encoder

GPT-2 style attention block

Cross-attention to shared memory

Frameworks

POGEMA

Sample Factory

Hugging Face Transformers

Is Agentic

Yes

Architectures

Transformer-based policy with memory tokens

Shared recurrent memory (global workspace)

Collaboration

Implicit communication via shared memory

Decentralized training and execution (no central controller)

Optimization Features

Infra Optimization

Reported training runs on a single Tesla P100 (MAPF models ~1 hour per run)

Model Optimization

Memory tokens as compact recurrent state

System Optimization

Batching via Sample Factory for many environments

Training Optimization

Shared homogeneous policy across agents

Grid search for entropy coefficient and learning rate
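A grid search over the two hyperparameters named above can be sketched as follows. The candidate values, the `grid_search` helper, and the `toy_score` stand-in objective are hypothetical; the summary does not state which values the authors searched, and a real run would replace `toy_score` with a training-plus-evaluation call.

```python
import itertools


def grid_search(evaluate, entropy_coefs, learning_rates):
    """Try every (entropy coefficient, learning rate) pair and return
    (best_score, best_config). Higher scores are better."""
    best = None
    for ent, lr in itertools.product(entropy_coefs, learning_rates):
        score = evaluate(entropy_coef=ent, learning_rate=lr)
        if best is None or score > best[0]:
            best = (score, {"entropy_coef": ent, "learning_rate": lr})
    return best


def toy_score(entropy_coef, learning_rate):
    """Stand-in objective that peaks at (0.01, 1e-4).
    Not a real training run; it only makes the sketch executable."""
    return -abs(entropy_coef - 0.01) - abs(learning_rate - 1e-4)


best_score, best_config = grid_search(
    toy_score,
    entropy_coefs=[0.001, 0.01, 0.1],     # placeholder grid
    learning_rates=[1e-5, 1e-4, 1e-3],    # placeholder grid
)
```

With the reported cost of about one hour per MAPF run on a single Tesla P100, a 3 by 3 grid like this one would take roughly nine GPU-hours if run sequentially.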

Inference Optimization

Single forward pass with cross-attention to shared memory

Reproducibility

Code Available: Yes

Data Available: Yes

Open Source Status: Yes

License: Unknown

Code URLs

https://github.com/Aloriosa/srmt

Data URLs

https://github.com/Skrynnik/POGEMA (POGEMA benchmark)

https://movingai.org (MovingAI maps)

Risks & Boundaries

Limitations

Assumes perfect agent localization and mapping.

Assumes synchronized and accurate action execution.

When Not To Use

When agents have noisy localization or unreliable actuators.

When formal safety or liveness guarantees are required.

Failure Modes

Performance drops outside tested regimes (CSR drops after corridor >400 in Sparse setting).

Potential deadlocks if memory initialization or write/read policies are poor.
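Because the method carries no liveness guarantee, a deployment may want an external check that flags a possible deadlock. The sketch below is one simple heuristic, not something the paper describes: the `is_stalled` name, the `window` parameter, and the rule "no agent moved for `window` steps while some goal is unmet" are all assumptions for illustration.

```python
def is_stalled(position_history, goals, window=10):
    """Return True if no agent has moved during the last `window` steps
    while at least one agent is still away from its goal.

    position_history: list of per-step tuples, one (row, col) per agent.
    goals: tuple of (row, col) goal cells, in the same agent order.
    """
    if len(position_history) < window + 1:
        return False  # not enough history to judge
    latest = position_history[-1]
    if all(pos == goal for pos, goal in zip(latest, goals)):
        return False  # everyone has arrived; standing still is expected
    recent = position_history[-(window + 1):]
    return all(step == latest for step in recent)
```

A supervisor could call this after every step and, when it fires, fall back to a heuristic planner or reset the agents' memories. Agents that oscillate between cells without progressing would not be caught by this rule.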

Core Entities

Models

SRMT

RMT

ATM

RATE

RATE_gen

RRNN

MAMBA

QPLEX

RNN

Empty

RHCR

Follower

MATS-LP

SCRIMP

Metrics

Cooperative Success Rate (CSR)

Individual Success Rate (ISR)

Sum-of-Costs (SoC)

Average throughput

Performance (relative throughput)

Pathfinding (optimality)

Congestion

Cooperation

Out-of-Distribution

Scalability

Datasets

POGEMA

MovingAI

MovingAI-tiles

Bottleneck toy maps

Mazes

Random

Warehouse

Benchmarks

POGEMA benchmark

Bottleneck task

Lifelong MAPF (LMAPF)
