Replace slow online search with learned hierarchical agents to cut responder reallocation time from minutes to fractions of a second

May 21, 2024, 9 min

Overview

Decision Snapshot: Ready For Pilot

The method shows large latency gains and modest response-time gains on real datasets; training complexity and city-specific tuning limit immediate turnkey use in new cities.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.80

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 4/5

Reproducibility

Status: Code + data available

Open source: Yes

At A Glance

Cost impact: 70%

Production readiness: 70%

Novelty: 60%

Authors

Amutheezan Sivagnanam, Ava Pettet, Hunter Lee, Ayan Mukhopadhyay, Abhishek Dubey, Aron Laszka


Why It Matters For Business

Switching from search-based planners to learned hierarchical agents delivers sub-second reallocation decisions and modestly shorter ambulance response times on real-city data, enabling practical real-time deployment and lower operational latency.

Who Should Care

Summary TLDR

The paper replaces an MCTS-based hierarchical planner for emergency responder repositioning with learned actor-critic agents. Low-level agents use TransformerXL to handle a variable number of responders per region; a high-level agent allocates responders across regions and uses the low-level critics to estimate rewards. Continuous actor outputs are discretized with combinatorial solvers (maximum-weight matching and minimum-cost flow). On real data from Nashville and Seattle, the learned system lowers decision latency from ~3 minutes to ~0.22 seconds and reduces average response times by 5 to 13 seconds versus MCTS, at the cost of days of training per agent.
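To make the discretization step concrete, here is a minimal sketch of maximum-weight matching over continuous actor scores. It is not the authors' implementation: the function name `assign_responders`, the score matrix, and the one-responder-per-depot constraint are all assumptions for illustration, and the exhaustive search stands in for a proper matching solver, so it only suits tiny instances.

```python
from itertools import permutations

def assign_responders(scores):
    """Pick one distinct depot per responder so the summed score is maximal.

    scores[r][d] is a hypothetical continuous actor score for placing
    responder r at depot d. Exhaustive search: fine for a toy example,
    far too slow for real region sizes.
    """
    n_responders = len(scores)
    n_depots = len(scores[0])
    if n_responders > n_depots:
        raise ValueError("need at least as many depots as responders")
    best, best_total = None, float("-inf")
    for depots in permutations(range(n_depots), n_responders):
        total = sum(scores[r][d] for r, d in enumerate(depots))
        if total > best_total:
            best, best_total = depots, total
    return best, best_total

# Made-up scores: 2 responders, 3 depots.
scores = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.3],
]
assignment, total = assign_responders(scores)
print(assignment, round(total, 2))  # depot index chosen for each responder
```

Picking the best depot greedily for responder 0 (depot 0, score 0.9) would leave responder 1 with at most 0.3, for a total of 1.2. The matching instead sends responder 0 to depot 1 and responder 1 to depot 0, for a total of 1.5, which is why a combinatorial solver is used rather than a per-responder argmax.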

Problem Statement

Proactively repositioning ambulances is a combinatorial, time-sensitive decision problem. The current hierarchical planner uses Monte-Carlo tree search (MCTS) and can take minutes per decision, which is too slow for emergency response. The challenge is to make decisions near-instantly without losing solution quality in a huge, variable, discrete state-action space.

Main Contribution

A hierarchical multi-agent RL system that replaces MCTS low- and high-level planners with actor-critic agents to make near-instant reallocation decisions.

Design of fixed-size feature projections and TransformerXL-based low-level actors to handle variable numbers of responders per region.
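One simple way to picture a fixed-size projection of a variable-sized state is a per-depot occupancy vector. The sketch below is purely illustrative: the paper's actual features are not described in this summary, and `region_features` and its inputs are hypothetical names.

```python
def region_features(responder_depots, num_depots):
    """Map a variable-length list of responder locations to a fixed-length vector.

    responder_depots: depot index of each responder currently in the region
                      (the list length varies from decision to decision).
    Returns num_depots occupancy counts followed by the total responder count,
    so the output length is always num_depots + 1.
    """
    counts = [0] * num_depots
    for depot in responder_depots:
        counts[depot] += 1
    return counts + [len(responder_depots)]

print(region_features([0, 2, 2], num_depots=4))  # three responders
print(region_features([], num_depots=4))         # empty region, same length
```

Both calls return a vector of length 5, so a downstream network with a fixed input size can consume either state.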

Key Findings

Decision latency cut from minutes to fractions of a second.

Numbers: 0.22s per decision vs 3 min (≈180s) for MCTS

Practical Use: Replace MCTS with the learned actor to achieve real-time reallocation (sub-second latency) in live ERM systems.

Evidence Ref: Sec 4.4; Fig.4

Average response time reduced versus MCTS on evaluated city data.

Numbers: Nashville: −5s (5-region) to −13s (7-region); Seattle: −10s on average

Practical Use: Deploying learned hierarchical agents can shave multiple seconds from expected ambulance response times on historical workloads.

Evidence Ref: Sec 4.4; Fig.4, Fig.10

Results

| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
| Decision computation time per decision | 0.22 s (DDPG agent) | 3 min (MCTS) | ≈818× faster than MCTS | Nashville evaluation | Sec 4.4 reports 0.22s vs 3 minutes for MCTS | Sec 4.4; Fig.4 |
| Average response time change vs MCTS | −5s (5-region) to −13s (7-region) in Nashville | MCTS | savings of 5–13 seconds | Nashville (10 chains) | Sec 4.4; Fig.4a–c | Sec 4.4; Fig.4 |
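The ≈818× delta for decision computation time follows directly from the two reported latencies. The check below uses only those two figures; the variable names are illustrative.

```python
mcts_seconds = 3 * 60      # reported MCTS decision time: 3 minutes
learned_seconds = 0.22     # reported learned-agent decision time

speedup = mcts_seconds / learned_seconds
print(f"{speedup:.0f}x faster")  # 818x faster
```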

What To Try In 7 Days

Run the provided code on one held-out historical incident chain to measure latency and response-time deltas.

Swap an MCTS low-level planner with the pre-trained TrXL low-level actor and measure end-to-end decision latency.

Validate the reward-estimation trick by comparing high-level actions using LLP critics versus raw response-time rewards on a single region.
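A small harness along the following lines could support the first two experiments. It is a sketch under assumptions: `time_decisions` and `paired_permutation_pvalue` are hypothetical helpers, not part of the released code, and the response times in the example are made-up placeholders rather than results from the paper. The significance check is an exact paired (sign-flip) permutation test, the kind of test the summary lists under context metrics.

```python
import time
from itertools import product
from statistics import mean, median

def time_decisions(policy, states, repeats=5):
    """Median wall-clock seconds per policy(state) call."""
    samples = []
    for state in states:
        for _ in range(repeats):
            start = time.perf_counter()
            policy(state)
            samples.append(time.perf_counter() - start)
    return median(samples)

def paired_permutation_pvalue(baseline, candidate):
    """Exact two-sided paired permutation test on the mean difference.

    Enumerates every sign flip of the per-chain differences, so it is only
    practical for a small number of chains (2**n sign patterns).
    """
    diffs = [c - b for b, c in zip(baseline, candidate)]
    observed = abs(mean(diffs))
    hits = total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = [s * d for s, d in zip(signs, diffs)]
        if abs(mean(flipped)) >= observed - 1e-12:
            hits += 1
    return hits / total

# Placeholder per-chain mean response times in seconds (NOT paper results).
mcts_times = [300, 310, 295, 305, 320, 298]
learned_times = [292, 301, 290, 296, 309, 291]

delta = mean(l - m for m, l in zip(mcts_times, learned_times))
p_value = paired_permutation_pvalue(mcts_times, learned_times)
latency = time_decisions(lambda state: state, states=[0, 1, 2])  # stub policy
print(f"mean delta {delta:.2f}s, p = {p_value}, latency {latency:.6f}s")
```

With six chains that all improve, only the two uniform sign patterns out of 64 match the observed mean difference, so the smallest achievable two-sided p-value is 2/64. A single held-out chain yields a latency figure and a delta but cannot support a significance claim.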

Agent Features

Memory
experience replay buffer
Planning
hierarchical coordination, high-level region allocation, low-level depot assignment
Tool Use
maximum-weight matching, minimum-cost flow, contraction hierarchies, OSRM
Frameworks
hierarchical decomposition (regions / depots), DDPG training loop
Is Agentic

Yes

Architectures
actor-critic, DDPG, TransformerXL (TrXL) actor, MLP critics
Collaboration
independent low-level agents coordinated by a high-level agent, low-level critics used to inform high-level rewards

Optimization Features

Infra Optimization
CPU-based training reported; training time varies by region size
Model Optimization
architecture search for TrXL hyperparameters
System Optimization
map variable-sized state to fixed-size features, use TrXL attention to handle variable responder counts
Training Optimization
experience replay, actor-critic (DDPG), reward estimation via LLP critics
Inference Optimization
continuous actor outputs for fast forward pass (sub-second inference), discretization via fast combinatorial solvers
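As an illustration of how a minimum-cost flow solver can turn a target allocation into concrete moves, the sketch below routes surplus responders to regions with a deficit at minimum total travel cost. The transportation-style formulation, the function `plan_moves`, and the cost matrix are assumptions for illustration; this summary does not give the paper's exact formulation. The solver is a plain successive-shortest-paths loop pushing one responder per iteration, chosen for brevity rather than speed.

```python
from collections import deque

def plan_moves(current, target, travel_cost):
    """Return ({(src, dst): units}, total_cost) for inter-region moves.

    current/target: responders per region now and after reallocation.
    travel_cost[i][j]: hypothetical cost of moving one responder from i to j.
    """
    n = len(current)
    if sum(current) != sum(target):
        raise ValueError("current and target must hold the same number of responders")
    source, sink = 2 * n, 2 * n + 1
    # Nodes 0..n-1 are supply copies, n..2n-1 demand copies of each region.
    graph = [[] for _ in range(2 * n + 2)]  # edge: [to, residual cap, cost, reverse idx]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i in range(n):
        surplus = current[i] - target[i]
        if surplus > 0:
            add_edge(source, i, surplus, 0)
            for j in range(n):
                if target[j] > current[j]:
                    add_edge(i, n + j, surplus, travel_cost[i][j])
        elif surplus < 0:
            add_edge(n + i, sink, -surplus, 0)

    total_cost = 0
    while True:
        # Queue-based Bellman-Ford: cheapest residual path from source to sink.
        dist = [float("inf")] * len(graph)
        parent = [None] * len(graph)
        queued = [False] * len(graph)
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            queued[u] = False
            for idx, (v, cap, cost, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + cost < dist[v]:
                    dist[v] = dist[u] + cost
                    parent[v] = (u, idx)
                    if not queued[v]:
                        queued[v] = True
                        queue.append(v)
        if parent[sink] is None:
            break  # every surplus responder has been routed
        v = sink
        while v != source:  # push one responder along the path
            u, idx = parent[v]
            edge = graph[u][idx]
            edge[1] -= 1
            graph[v][edge[3]][1] += 1
            v = u
        total_cost += dist[sink]

    moves = {}
    for i in range(n):
        for v, _cap, _cost, rev in graph[i]:
            if n <= v < 2 * n and graph[v][rev][1] > 0:
                moves[(i, v - n)] = graph[v][rev][1]  # flow sits on the reverse edge
    return moves, total_cost

# Made-up example: four regions, six responders.
costs = [[0, 2, 4, 6], [2, 0, 3, 2], [4, 3, 0, 5], [6, 2, 5, 0]]
moves, cost = plan_moves([3, 3, 0, 0], [1, 2, 2, 1], costs)
print(moves, cost)
```

In this toy instance region 0 has two spare responders and region 1 has one. Sending region 1's spare to region 3 (cost 2) and both of region 0's to region 2 (cost 4 each) totals 10, which beats the only other feasible plan at 13.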

Risks & Boundaries

Limitations

Training low-level agents can take many days for regions with many depots (up to 14 days).

Approach depends on incident-rate forecasts and time-varying travel models; quality of inputs affects outcomes.

When Not To Use

You lack historical incident data or travel-time models to train and evaluate the agents.

You cannot afford multiple days of offline training per region topology.

Failure Modes

Inaccurate low-level critics misestimate regional value, leading to bad high-level reallocations.

Continuous actor outputs may map poorly to discrete assignments under the discretization step in rare combinatorial edge cases.

Core Entities

Models

DDPG, TransformerXL (TrXL), GTrXL, LSTM, MCTS, DRLSN

Metrics

average response time, decision computation time, training convergence time

Datasets

Nashville ERM dataset (36 depots, 26 responders; 60 chains), Seattle public collisions dataset (City of Seattle 2022; 60 chains)

Benchmarks

MCTS (Pettet et al., 2022), p-median-based policy, greedy policy, static policy, DRLSN (Ji et al., 2019)

Context Entities

Models

Feudal / hierarchical RL, Attention-based coordination (TrXL)

Metrics

p-values from paired permutation tests

Datasets

Historical incident chains and rate forecasts

Benchmarks

Prior hierarchical planning (Pettet et al., 2022)