SOCIALGYM 2.0: configurable MARL simulator and benchmark for multi-robot social navigation

March 9, 2023 · 7 min read

Overview

Production Readiness

0.45

Novelty Score

0.6

Cost Impact Score

0.5

Citation Count

3

Authors

Zayne Sprague, Rohan Chandra, Jarrett Holtz, Joydeep Biswas


Why It Matters For Business

SOCIALGYM 2.0 shortens iteration time for multi-robot navigation by providing modular, configurable simulations, reducing risky trial-and-error on hardware and enabling targeted benchmarking for specific crowded scenarios.

Summary TLDR

SOCIALGYM 2.0 is an open-source, Docker-packaged simulator for multi-agent social robot navigation. It plugs into PettingZoo and Stable Baselines3, uses ROS messaging, and models configurable robot kinodynamics, navigation graphs, and constrained scenarios (doorways, hallways, intersections, roundabouts). The platform makes observations, rewards, and environments fully configurable and ships example benchmarks showing there is no single best MARL policy across different social mini-games. It is aimed at fast iteration and comparative evaluation, but currently lacks parallel environments and continuous-action support.

Problem Statement

Robots must learn socially compliant navigation in constrained human spaces, but existing simulators focus on single robots, replayed crowds, or simple kinematics. Researchers need a modular multi-agent simulator that supports realistic robot dynamics, configurable observations/rewards, and benchmarking across constrained social scenarios.

Main Contribution

A modular multi-agent simulator (SOCIALGYM 2.0) that integrates the PettingZoo MARL API and Stable Baselines3 with ROS and a C++ simulation engine (UTMRS).

Configurable robot kinodynamics (speed, acceleration, steering, different drive types) and local trajectory optimization tied to dynamics.

Support for constrained 'social mini-games' (Open, Doorway, Hallway, Intersection, Roundabout) built from 2D vector maps plus navigation graphs.

Fully customizable observation and reward classes, wrappers, and evaluation metrics; packaged for easy install via Docker.

Benchmarks comparing baselines (CADRL, CADRL-LSTM, Enforced Order, Any Order, Only Local) across multiple metrics and agent densities.
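The configurable kinodynamics listed among the contributions come down to clamping each command against per-robot limits before integrating a motion model. The Python sketch below assumes a unicycle-style differential drive and a bicycle-model Ackermann drive; the names `KinoLimits`, `State`, and `step`, their fields, and the default values are hypothetical and are not SOCIALGYM 2.0's configuration API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class KinoLimits:
    """Hypothetical per-robot limits (not SOCIALGYM 2.0's config schema)."""
    max_speed: float        # m/s
    max_accel: float        # m/s^2
    max_steer: float        # rad, used only by the Ackermann model
    wheelbase: float = 0.5  # m, used only by the Ackermann model


@dataclass
class State:
    x: float
    y: float
    theta: float  # heading, rad
    v: float      # forward speed, m/s


def clamp(value, lo, hi):
    return max(lo, min(hi, value))


def step(state, cmd_speed, cmd_turn, limits, drive="diff", dt=0.1):
    """Advance one tick. cmd_turn is a yaw rate (diff) or a steering angle (ackermann)."""
    # Acceleration limit: speed may change by at most max_accel * dt per tick.
    max_dv = limits.max_accel * dt
    v = state.v + clamp(cmd_speed - state.v, -max_dv, max_dv)
    v = clamp(v, 0.0, limits.max_speed)  # forward-only in this sketch
    if drive == "diff":
        omega = cmd_turn
    elif drive == "ackermann":
        steer = clamp(cmd_turn, -limits.max_steer, limits.max_steer)
        omega = v * math.tan(steer) / limits.wheelbase  # bicycle model
    else:
        raise ValueError(f"unknown drive type: {drive}")
    theta = state.theta + omega * dt
    # Semi-implicit Euler: the position update uses the new heading.
    return State(state.x + v * math.cos(theta) * dt,
                 state.y + v * math.sin(theta) * dt,
                 theta, v)
```

Changing `drive` or tightening `max_accel` alters how quickly an agent can react inside a doorway or hallway, which is the kind of per-robot knob the contribution describes.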

Key Findings

There is no single MARL algorithm that consistently wins across social mini-games.

Numbers: Collision rates and path lengths vary widely; for example, collision rates span 0 to 2.8 across Table II.

Higher agent density sharply reduces success rates.

Numbers: Success can drop to 0% at 10 agents (e.g., CADRL on Open: 32% at 3 agents → 0% at 10 agents, Table III).

Using only a local planner (no high-level policy) fails in constrained or dense scenarios.

Numbers: Only Local shows high collision rates (up to 2.8) and near-zero success with 5 or more agents (Tables II and III).

Results

collision rate

Value: 0.12 average collisions per episode (Open, Any Order)

Baseline: Any Order

average path length

Value: 960, units as reported (Doorway, Enforced Order)

Baseline: Enforced Order

success rate vs agent count

Value: 72% success at 3 agents → 4% at 10 agents (Doorway, Any Order)

Baseline: Any Order


What To Try In 7 Days

Install the Docker image and run the provided Open and Doorway benchmarks to see baseline behaviors.

Swap observation or reward components (e.g., enable/disable collision penalty) and re-run a few training seeds to observe sensitivity.

Train a PPO policy with and without an LSTM wrapper to test generalization to higher agent counts.
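The reward-swap experiment suggested above is easiest when the reward is a weighted sum of named terms that can be switched off individually. The sketch below assumes that structure; `StepInfo`, `CompositeReward`, the three terms, and their magnitudes are hypothetical illustrations, not SOCIALGYM 2.0's reward classes.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-agent step summary fed to reward terms."""
    prev_dist_to_goal: float
    dist_to_goal: float
    collided: bool
    reached_goal: bool


def goal_progress(info):
    # Positive when the agent moved closer to its goal this step.
    return info.prev_dist_to_goal - info.dist_to_goal


def collision_penalty(info):
    return -1.0 if info.collided else 0.0


def success_bonus(info):
    return 10.0 if info.reached_goal else 0.0


class CompositeReward:
    """Weighted sum of named reward terms that can be switched off by name."""

    def __init__(self, terms, weights=None):
        self.terms = dict(terms)
        self.weights = {name: 1.0 for name in self.terms}
        self.weights.update(weights or {})
        self.enabled = set(self.terms)

    def disable(self, name):
        self.enabled.discard(name)
        return self

    def __call__(self, info):
        return sum(self.weights[n] * self.terms[n](info) for n in self.enabled)
```

Training one set of seeds with all terms enabled and another after `.disable("collision")` isolates the effect of the collision penalty while leaving everything else fixed.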

Agent Features

Memory

  • LSTM (short-term temporal memory)

Planning

  • Global path planning
  • Local trajectory optimization
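Local trajectory optimization tied to dynamics can be illustrated with a dynamic-window-style sampler: enumerate speed and turn-rate commands reachable under the acceleration limit, roll each one forward, discard rollouts that pass too close to other agents, and keep the one ending nearest the current sub-goal. This is a swapped-in textbook technique chosen for illustration, not the UTMRS optimizer; `plan_local`, its parameters, and the GO/STOP handling (STOP simply brakes at the acceleration limit) are assumptions.

```python
import math


def plan_local(x, y, theta, v, subgoal, obstacles, go=True,
               max_speed=1.0, max_accel=0.5, max_omega=1.0,
               dt=0.1, horizon=10, n_samples=7, safe_dist=0.5):
    """Dynamic-window-style sampler (illustrative stand-in, not UTMRS's optimizer).

    Returns a (speed, yaw_rate) command. `obstacles` are other agents' positions.
    """
    if not go:
        # Hypothetical STOP handling: brake as hard as the acceleration limit allows.
        return max(0.0, v - max_accel * dt), 0.0
    # Speeds reachable within one tick under the acceleration limit.
    v_lo = max(0.0, v - max_accel * dt)
    v_hi = min(max_speed, v + max_accel * dt)
    # Fallback if every rollout is unsafe: slow down as much as allowed, no turn.
    best, best_cost = (v_lo, 0.0), math.inf
    for i in range(n_samples):
        cv = v_lo + (v_hi - v_lo) * i / (n_samples - 1)
        for j in range(n_samples):
            w = -max_omega + 2 * max_omega * j / (n_samples - 1)
            px, py, pt = x, y, theta
            min_clear = math.inf
            for _ in range(horizon):
                pt += w * dt
                px += cv * math.cos(pt) * dt
                py += cv * math.sin(pt) * dt
                for ox, oy in obstacles:
                    min_clear = min(min_clear, math.hypot(px - ox, py - oy))
            if min_clear < safe_dist:
                continue  # rollout comes too close to another agent
            cost = math.hypot(subgoal[0] - px, subgoal[1] - py)
            if cost < best_cost:
                best, best_cost = (cv, w), cost
    return best
```

A real optimizer would also weigh heading, clearance margin, and smoothness; the single distance-to-sub-goal cost keeps the sketch short.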

Tool Use

  • PettingZoo
  • Stable Baselines3
  • ROS
  • UTMRS

Frameworks

  • OpenAI Gym API (via PettingZoo)
  • SB3/SB3-Contrib

Is Agentic

true

Architectures

  • RL
  • LSTM-enhanced policy

Collaboration

  • Independent MARL policies (non-replay crowd)
  • Sub-goal ordering rewards (Any/Enforced Order)
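This summary does not spell out how the Any Order and Enforced Order rewards are computed. The sketch below is one plausible reading, offered only as an assumption: an agent earns a bonus for passing a shared bottleneck sub-goal single file (no other agent crossing on the same tick), and under Enforced Order only if its crossing rank also matches a prescribed order. `ordering_rewards` and its inputs are hypothetical.

```python
from collections import Counter


def ordering_rewards(crossings, mode, assigned_order=(), bonus=1.0):
    """crossings: (timestep, agent_id) pairs recorded when agents pass a shared sub-goal.

    Hypothetical reward: 'any' pays agents that cross single file;
    'enforced' additionally requires the crossing rank to match assigned_order.
    """
    if mode not in ("any", "enforced"):
        raise ValueError(f"unknown mode: {mode}")
    agents_per_step = Counter(t for t, _ in crossings)
    rewards = {}
    for rank, (t, agent) in enumerate(sorted(crossings)):
        ok = agents_per_step[t] == 1  # nobody else crossed on the same tick
        if mode == "enforced":
            ok = ok and rank < len(assigned_order) and assigned_order[rank] == agent
        rewards[agent] = bonus if ok else 0.0
    return rewards
```

Under this reading, Any Order rewards queue formation without dictating who goes first, while Enforced Order also fixes the sequence.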

Optimization Features

Infra Optimization

  • Docker packaging for reproducible environments

Training Optimization

  • VecNormalize wrapper for normalized observations and rewards
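VecNormalize keeps running statistics and rescales inputs with them. The scalar sketch below reproduces the observation side of that idea with Welford's online mean/variance plus clipping; it is an illustrative re-implementation, not Stable Baselines3's code. The clip value of 10 and epsilon of 1e-8 are assumed to mirror common defaults, and SB3's reward normalization works differently (it scales by the spread of discounted returns), which this sketch omits.

```python
import math


class RunningMeanStd:
    """Scalar running mean/variance via Welford's algorithm (illustrative, not SB3's code)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def var(self):
        # Population variance; 1.0 before any data so normalization is a no-op scale.
        return self._m2 / self.count if self.count else 1.0


def normalize(x, stats, clip=10.0, eps=1e-8):
    z = (x - stats.mean) / math.sqrt(stats.var + eps)
    return max(-clip, min(clip, z))
```

Clipping keeps rare outliers (for example, an unusually long distance-to-goal reading) from producing extreme network inputs early in training.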

Reproducibility

Code Available

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • No parallel environments or multi-threaded optimization; CPU-limited.
  • Continuous action spaces not yet supported (discrete GO/STOP used in experiments).
  • Human motion model is optional and uses social forces, which may not cover all real-world behaviors.
  • Not optimized for large-scale training or real-time multi-robot deployment out of the box.
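The optional human motion model is described as social-force based. A minimal sketch of the classic formulation (Helbing and Molnár) sums a driving term, which relaxes the current velocity toward a desired velocity aimed at the goal, and exponential repulsion from nearby pedestrians. The function name `social_force` and all parameter values are illustrative assumptions rather than the simulator's settings, and practical implementations add wall forces, anisotropy, and other terms.

```python
import math


def social_force(pos, vel, goal, others, v0=1.3, tau=0.5, A=2.0, B=0.3, radius=0.3):
    """Simplified Helbing-style social force on one pedestrian (illustrative parameters)."""
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(gx, gy) or 1.0  # avoid dividing by zero at the goal
    # Driving term: relax toward desired speed v0 along the goal direction over time tau.
    fx = (v0 * gx / dist - vel[0]) / tau
    fy = (v0 * gy / dist - vel[1]) / tau
    for ox, oy in others:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if d == 0.0:
            continue
        # Repulsion grows exponentially as the gap shrinks below the summed radii.
        mag = A * math.exp((2 * radius - d) / B)
        fx += mag * dx / d
        fy += mag * dy / d
    return fx, fy
```

Because every pedestrian follows the same smooth, goal-driven rule, behaviors such as hesitation, yielding, or group walking are absent unless modeled separately, which is the gap the limitation above points to.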

When Not To Use

  • When you need massively parallel simulation for large-scale RL training.
  • For continuous-action control research where discrete navigation graphs are limiting.
  • As a final test for real-world deployment without additional sim-to-real validation.

Failure Modes

  • Policies overfit to low-density training and fail at higher densities.
  • Evaluation metrics may miss socially useful behavior (e.g., queue formation is rewarded during training but not reflected in the standard metrics).
  • Local-only planners produce high collisions in constrained or crowded scenes.

Core Entities

Models

  • PPO
  • LSTM-PPO
  • CADRL
  • CADRL-LSTM

Metrics

  • success rate
  • collision rate
  • average path length
  • average stopping time
  • max delta velocity
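The metrics listed above can be computed from per-agent traces. Their exact definitions are not given in this summary, so the choices below are assumptions: success rate as the fraction of agents that reach their goal, collision rate as average collisions per episode (which is how values above 1, such as 2.8, can arise), path length as polyline length, stopping time as time spent below a speed threshold, and max delta velocity as the largest speed change between consecutive steps. `AgentTrace`, `Episode`, `summarize`, `stop_eps`, and `dt` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AgentTrace:
    positions: List[Tuple[float, float]]
    speeds: List[float]
    reached_goal: bool


@dataclass
class Episode:
    agents: List[AgentTrace]
    collisions: int


def path_length(points):
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def summarize(episodes, stop_eps=1e-3, dt=0.1):
    traces = [agent for ep in episodes for agent in ep.agents]
    n = len(traces)
    return {
        "success_rate": sum(t.reached_goal for t in traces) / n,
        # Averaged per episode, so values above 1 are possible.
        "collision_rate": sum(ep.collisions for ep in episodes) / len(episodes),
        "avg_path_length": sum(path_length(t.positions) for t in traces) / n,
        # Seconds spent nearly stationary, averaged over agents.
        "avg_stopping_time": sum(dt * sum(s < stop_eps for s in t.speeds) for t in traces) / n,
        "max_delta_velocity": max(
            (abs(b - a) for t in traces for a, b in zip(t.speeds, t.speeds[1:])),
            default=0.0,
        ),
    }
```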

Benchmarks

  • Open
  • Doorway
  • Hallway
  • Intersection
  • Roundabout
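The mini-games are built from 2D vector maps plus navigation graphs, and a global planner over such a graph yields the sub-goals an agent follows. The sketch below runs Dijkstra's algorithm on a made-up Doorway-like graph with Euclidean edge weights; the node names, coordinates, and helper functions are hypothetical and do not come from SOCIALGYM 2.0's map format.

```python
import heapq
import math

# Hypothetical Doorway-like map: node -> (x, y) in meters.
NODES = {
    "start": (0.0, 0.0),
    "door_in": (4.0, 0.0),
    "door_out": (5.0, 0.0),
    "goal": (9.0, 0.0),
    "detour": (4.5, 3.0),
}
EDGES = [("start", "door_in"), ("door_in", "door_out"), ("door_out", "goal"),
         ("start", "detour"), ("detour", "goal")]


def build_graph(nodes, edges):
    """Undirected adjacency list weighted by Euclidean edge length."""
    graph = {n: [] for n in nodes}
    for a, b in edges:
        w = math.dist(nodes[a], nodes[b])
        graph[a].append((b, w))
        graph[b].append((a, w))
    return graph


def shortest_path(graph, src, dst):
    """Dijkstra's algorithm; returns the node sequence (sub-goals) or None if unreachable."""
    dist, prev, visited = {src: 0.0}, {}, set()
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v, w in graph[u]:
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```

Removing the doorway edge (for example, to model a blocked door) reroutes the plan through the detour, which shows how the graph, rather than the raw map, determines which constrained passage agents are funneled into.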

Context Entities

Datasets

  • UT Austin campus floorplans (used as example maps)