Overview
Production Readiness: 0.45
Novelty Score: 0.6
Cost Impact Score: 0.5
Citation Count: 3
Why It Matters For Business
SOCIALGYM 2.0 shortens iteration time for multi-robot navigation by providing modular, configurable simulations, reducing risky trial-and-error on hardware and enabling targeted benchmarking for specific crowded scenarios.
Summary TLDR
SOCIALGYM 2.0 is an open-source, Docker-packaged simulator for multi-agent social robot navigation. It plugs into PettingZoo and Stable Baselines3, uses ROS messaging, and models configurable robot kinodynamics, navigation graphs, and constrained scenarios (doorways, hallways, intersections, roundabouts). The platform makes observations, rewards, and environments fully configurable and ships example benchmarks showing there is no single best MARL policy across different social mini-games. It is aimed at fast iteration and comparative evaluation, but currently lacks parallel environments and continuous-action support.
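Through the PettingZoo integration, agents are stepped through a dict-keyed interface. In recent PettingZoo releases, a parallel environment's `reset` returns observations and infos, and `step` returns observations, rewards, terminations, truncations, and infos, each keyed by agent name. The minimal sketch below mimics only that calling shape. `ToyCorridorEnv` is hypothetical and dependency-free: it moves robots along a 1-D corridor with GO/STOP actions. It is not SOCIALGYM 2.0's environment class and does not import PettingZoo.

```python
import random


class ToyCorridorEnv:
    """Hypothetical toy env mimicking the PettingZoo parallel-API calling shape.

    Each robot moves along a 1-D corridor toward cell `goal`.
    Actions: 0 = STOP, 1 = GO. This is NOT SOCIALGYM 2.0's environment.
    """

    def __init__(self, n_agents: int = 3, goal: int = 5, max_steps: int = 20):
        self.possible_agents = [f"robot_{i}" for i in range(n_agents)]
        self.goal = goal
        self.max_steps = max_steps
        self.agents = []

    def reset(self, seed=None):
        # `seed` is accepted for API compatibility; this toy env is deterministic.
        self.agents = list(self.possible_agents)
        self.pos = {a: 0 for a in self.agents}
        self.t = 0
        observations = {a: self.pos[a] for a in self.agents}
        infos = {a: {} for a in self.agents}
        return observations, infos

    def step(self, actions):
        self.t += 1
        observations, rewards, terminations, truncations, infos = {}, {}, {}, {}, {}
        for a, act in actions.items():
            self.pos[a] += act  # GO advances one cell; STOP stays put
            reached = self.pos[a] >= self.goal
            observations[a] = self.pos[a]
            rewards[a] = 1.0 if reached else -0.01  # small per-step time penalty
            terminations[a] = reached
            truncations[a] = self.t >= self.max_steps
            infos[a] = {}
        # Agents that terminated or were truncated leave `self.agents`.
        self.agents = [a for a in self.agents if not (terminations[a] or truncations[a])]
        return observations, rewards, terminations, truncations, infos


# Usage: a random GO/STOP policy over one episode.
env = ToyCorridorEnv()
observations, infos = env.reset(seed=0)
policy_rng = random.Random(0)
returns = {a: 0.0 for a in env.possible_agents}
while env.agents:
    actions = {a: policy_rng.choice([0, 1]) for a in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
    for a, r in rewards.items():
        returns[a] += r
```

The episode loop runs until `env.agents` is empty. The `max_steps` truncation guarantees that it ends even under a random policy.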
Problem Statement
Robots must learn socially compliant navigation in constrained human spaces, but existing simulators focus on single robots, replayed crowds, or simple kinematics. Researchers need a modular multi-agent simulator that supports realistic robot dynamics, configurable observations/rewards, and benchmarking across constrained social scenarios.
Main Contribution
A modular multi-agent simulator (SOCIALGYM 2.0) that integrates PettingZoo's multi-agent RL API and Stable Baselines3 with ROS messaging and a C++ simulation engine (UTMRS).
Configurable robot kinodynamics (speed, acceleration, steering, different drive types) and local trajectory optimization tied to dynamics.
Support for constrained 'social mini-games' (Open, Doorway, Hallway, Intersection, Roundabout) built from 2D vector maps plus navigation graphs.
Fully customizable observation and reward classes, wrappers, and evaluation metrics; packaged for easy install via Docker.
Benchmarks comparing baselines (CADRL, CADRL-LSTM, Enforced Order, Any Order, Only Local) across multiple metrics and agent densities.
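One way to make rewards fully configurable is to compose them from weighted, named terms that a config dict switches on or off. The sketch below assumes this design. `StepInfo`, the term names, and the weights are all hypothetical and are not SOCIALGYM 2.0's actual reward classes.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class StepInfo:
    """Hypothetical per-agent, per-step quantities a reward term may read."""
    dist_to_goal_before: float
    dist_to_goal_after: float
    collided: bool
    reached_goal: bool


def progress(s: StepInfo) -> float:
    # Positive when the agent moved closer to its goal this step.
    return s.dist_to_goal_before - s.dist_to_goal_after


def collision_penalty(s: StepInfo) -> float:
    return -1.0 if s.collided else 0.0


def goal_bonus(s: StepInfo) -> float:
    return 1.0 if s.reached_goal else 0.0


TERMS: Dict[str, Callable[[StepInfo], float]] = {
    "progress": progress,
    "collision": collision_penalty,
    "goal": goal_bonus,
}


def make_reward(weights: Dict[str, float]) -> Callable[[StepInfo], float]:
    """Build a reward from named terms; omitting a name disables that term."""
    unknown = set(weights) - set(TERMS)
    if unknown:
        raise ValueError(f"unknown reward terms: {sorted(unknown)}")

    def reward(s: StepInfo) -> float:
        return sum(w * TERMS[name](s) for name, w in weights.items())

    return reward


# Usage: the same step scored with and without the collision penalty.
step = StepInfo(dist_to_goal_before=2.0, dist_to_goal_after=1.5, collided=True, reached_goal=False)
with_penalty = make_reward({"progress": 1.0, "collision": 5.0, "goal": 10.0})
without_penalty = make_reward({"progress": 1.0, "goal": 10.0})
print(with_penalty(step), without_penalty(step))  # -4.5 0.5
```

Keeping terms as plain functions in a registry lets an ablation such as "collision penalty off" be a one-line config change rather than a code edit.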
Key Findings
There is no single MARL algorithm that consistently wins across social mini-games.
Higher agent density sharply reduces success rates.
Using only a local planner (no high-level policy) fails in constrained or dense scenarios.
Results
Reported results: collision rate, average path length, and success rate vs. agent count.
Who Should Care
What To Try In 7 Days
Install the Docker image and run the provided Open and Doorway benchmarks to see baseline behaviors.
Swap observation or reward components (e.g., enable or disable the collision penalty) and re-run a few training seeds to observe sensitivity.
Train a PPO policy with and without an LSTM wrapper to test generalization to higher agent counts.
Agent Features
Memory
- LSTM (short-term temporal memory)
Planning
- Global path planning
- Local trajectory optimization
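Global planning over a navigation graph is usually a shortest-path search. The source does not say which search SOCIALGYM 2.0 uses, so the sketch below simply assumes Dijkstra's algorithm over a hypothetical adjacency-list encoding (node to list of (neighbor, edge length) pairs, directed edges). A local planner would then follow the returned waypoints.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical navigation graph: node -> list of (neighbor, edge length in metres).
Graph = Dict[str, List[Tuple[str, float]]]


def shortest_path(graph: Graph, start: str, goal: str) -> List[str]:
    """Dijkstra's algorithm; returns the node sequence, or [] if goal is unreachable."""
    dist = {start: 0.0}
    prev: Dict[str, str] = {}
    frontier = [(0.0, start)]
    visited = set()
    while frontier:
        d, node = heapq.heappop(frontier)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == goal:
            break
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(frontier, (nd, nbr))
    if goal not in visited:
        return []
    # Walk predecessors back from the goal, then reverse.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Usage: a doorway is shorter than detouring via a ledge.
doorway = {
    "room_a": [("door", 2.0), ("window_ledge", 1.0)],
    "window_ledge": [("room_b", 5.0)],
    "door": [("room_b", 2.0)],
    "room_b": [],
}
print(shortest_path(doorway, "room_a", "room_b"))  # ['room_a', 'door', 'room_b']
```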
Tool Use
- PettingZoo
- Stable Baselines3
- ROS
- UTMRS
Frameworks
- OpenAI Gym API (via PettingZoo)
- SB3/SB3-Contrib
Is Agentic: true
Architectures
- RL
- LSTM-enhanced policy
Collaboration
- Independent MARL policies (learned agents rather than a replayed crowd)
- Sub-goal ordering rewards (Any/Enforced Order)
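The summary names the Any Order and Enforced Order rewards without defining them. The sketch below encodes one plausible reading, and both the semantics and the function are hypothetical. Agents are rewarded for passing a shared bottleneck sub-goal (e.g., a doorway) one at a time. Enforced Order additionally requires them to follow a fixed queue.

```python
from typing import Dict, Sequence, Set


def ordering_rewards(
    crossings: Sequence[Set[str]],
    queue: Sequence[str] = (),
    enforce: bool = False,
    bonus: float = 1.0,
    penalty: float = -1.0,
) -> Dict[str, float]:
    """Hypothetical ordering reward at a shared bottleneck sub-goal.

    crossings[t] holds the agents that reached the bottleneck at step t.
    Any Order (enforce=False): crossing alone earns `bonus`; crossing
    simultaneously with another agent earns `penalty`.
    Enforced Order (enforce=True): the crossing agent must also be the next
    not-yet-crossed agent in `queue`, otherwise it earns `penalty`.
    """
    if enforce and not queue:
        raise ValueError("Enforced Order needs a queue")
    totals: Dict[str, float] = {}
    crossed: Set[str] = set()
    next_idx = 0  # index of the next agent expected in `queue`
    for step in crossings:
        for agent in sorted(step):
            alone = len(step) == 1
            in_turn = next_idx < len(queue) and queue[next_idx] == agent
            ok = alone and (in_turn or not enforce)
            totals[agent] = totals.get(agent, 0.0) + (bonus if ok else penalty)
        crossed |= step
        # Skip past every queued agent that has already crossed.
        while next_idx < len(queue) and queue[next_idx] in crossed:
            next_idx += 1
    return totals


# Usage: agent "b" cuts in front of "a".
events = [{"b"}, {"a"}, {"c"}]
print(ordering_rewards(events))  # {'b': 1.0, 'a': 1.0, 'c': 1.0}
print(ordering_rewards(events, queue=["a", "b", "c"], enforce=True))  # {'b': -1.0, 'a': 1.0, 'c': 1.0}
```

Under this reading, Any Order only discourages congestion at the bottleneck, while Enforced Order also penalizes queue-jumping.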
Optimization Features
Infra Optimization
- Docker packaging for reproducible environments
Training Optimization
- VecNormalize wrapper for normalized observations and rewards
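VecNormalize-style observation normalization keeps a running mean and variance per feature and standardizes each observation with them. The sketch below shows that idea using Welford's online update. It is not SB3's implementation, which also normalizes rewards using discounted returns. The `clip` and `eps` defaults here are illustrative choices.

```python
import math
from typing import List, Sequence


class RunningNormalizer:
    """Running mean/variance (Welford's algorithm) for fixed-length observation vectors."""

    def __init__(self, dim: int, clip: float = 10.0, eps: float = 1e-8):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations per feature
        self.clip = clip
        self.eps = eps

    def update(self, x: Sequence[float]) -> None:
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def var(self) -> List[float]:
        # Population variance; defaults to 1.0 before any data is seen.
        return [m / self.n if self.n > 0 else 1.0 for m in self.m2]

    def normalize(self, x: Sequence[float]) -> List[float]:
        out = []
        for xi, mu, v in zip(x, self.mean, self.var()):
            z = (xi - mu) / math.sqrt(v + self.eps)
            out.append(max(-self.clip, min(self.clip, z)))  # clip extreme values
        return out


# Usage: feature 0 varies, feature 1 is constant.
norm = RunningNormalizer(dim=2)
for obs in ([1.0, 10.0], [3.0, 10.0], [5.0, 10.0]):
    norm.update(obs)
print(norm.mean, norm.normalize([3.0, 10.0]))  # [3.0, 10.0] [0.0, 0.0]
```

The `eps` term keeps a constant feature (zero variance) from causing a division by zero.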
Reproducibility
Code Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- No parallel environments or multi-threaded optimization; CPU-limited.
- Continuous action spaces not yet supported (discrete GO/STOP used in experiments).
- Human motion model is optional and uses a social force model, which may not capture all real-world pedestrian behaviors.
- Not optimized for large-scale training or real-time multi-robot deployment out of the box.
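Because the experiments use discrete GO/STOP actions, each action must be turned into motion that respects the robot's kinodynamic limits. The sketch below assumes one simple mapping, in which GO ramps the forward speed toward a maximum and STOP ramps it to zero, bounded by acceleration and deceleration limits. `Kinodynamics` and its field names are hypothetical, not SOCIALGYM 2.0's parameters.

```python
from dataclasses import dataclass

STOP, GO = 0, 1


@dataclass
class Kinodynamics:
    # Hypothetical limits; SOCIALGYM 2.0's actual parameter names may differ.
    max_speed: float = 1.0  # m/s
    max_accel: float = 0.5  # m/s^2
    max_decel: float = 1.0  # m/s^2


def next_speed(v: float, action: int, lim: Kinodynamics, dt: float) -> float:
    """Map a discrete GO/STOP action to a new forward speed under acceleration limits."""
    target = lim.max_speed if action == GO else 0.0
    if target > v:
        return min(target, v + lim.max_accel * dt)  # speed up, capped at target
    return max(target, v - lim.max_decel * dt)  # slow down, floored at target


# Usage: accelerate to cruise speed, then brake to a halt.
lim = Kinodynamics()
v, trace = 0.0, []
for action in [GO, GO, GO, GO, GO, STOP, STOP, STOP]:
    v = next_speed(v, action, lim, dt=0.5)
    trace.append(v)
print(trace)  # [0.25, 0.5, 0.75, 1.0, 1.0, 0.5, 0.0, 0.0]
```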
When Not To Use
- When you need massively parallel simulation for large-scale RL training.
- For continuous-action control research where discrete navigation graphs are limiting.
- As a final test for real-world deployment without additional sim-to-real validation.
Failure Modes
- Policies overfit to low-density training and fail at higher densities.
- Evaluation metrics may miss socially useful behavior (e.g., queue formation is rewarded but not captured by the standard metrics).
- Local-only planners produce high collisions in constrained or crowded scenes.
Core Entities
Models
- PPO
- LSTM-PPO
- CADRL
- CADRL-LSTM
Metrics
- success rate
- collision rate
- average path length
- average stopping time
- max delta velocity
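The metrics listed above can be computed from per-agent episode logs. The source does not give exact definitions, so the sketch below assumes three of them. Success means reaching the goal without a collision. Stopping time is the time spent below a small speed threshold. Max delta velocity is the largest step-to-step change in speed. `Episode` and `summarize` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Episode:
    # Hypothetical per-agent episode log.
    positions: List[Tuple[float, float]]  # (x, y) per step, metres
    speeds: List[float]  # speed per step, m/s
    reached_goal: bool
    collided: bool


def path_length(pos: Sequence[Tuple[float, float]]) -> float:
    return sum(math.dist(a, b) for a, b in zip(pos, pos[1:]))


def summarize(episodes: Sequence[Episode], dt: float, stop_eps: float = 1e-3) -> dict:
    n = len(episodes)
    return {
        # Assumed definition: success = reached goal AND no collision.
        "success_rate": sum(e.reached_goal and not e.collided for e in episodes) / n,
        "collision_rate": sum(e.collided for e in episodes) / n,
        "avg_path_length": sum(path_length(e.positions) for e in episodes) / n,
        # Time spent (nearly) stationary, averaged over episodes.
        "avg_stopping_time": sum(dt * sum(s < stop_eps for s in e.speeds) for e in episodes) / n,
        # Largest step-to-step speed change across all episodes.
        "max_delta_velocity": max(
            max((abs(b - a) for a, b in zip(e.speeds, e.speeds[1:])), default=0.0)
            for e in episodes
        ),
    }


e1 = Episode([(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)], [1.0, 0.0, 0.0], reached_goal=True, collided=False)
e2 = Episode([(0.0, 0.0), (0.0, 2.0)], [0.5, 1.5], reached_goal=False, collided=True)
stats = summarize([e1, e2], dt=0.1)
print(stats)
```

Computing success rate this way across runs with different agent counts yields the success-rate vs. agent-count curve mentioned in the results.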
Benchmarks
- Open
- Doorway
- Hallway
- Intersection
- Roundabout
Context Entities
Datasets
- UT Austin campus floorplans (used as example maps)

