A multi-agent RL leaf sequencer that reconstructs fluence maps and speeds optimizer convergence

June 3, 2024 · 8 min

Overview

Production Readiness

0.6

Novelty Score

0.7

Cost Impact Score

0.6

Citation Count

0

Authors

Riqiang Gao, Florin C. Ghesu, Simon Arberet, Shahab Basiri, Esa Kuusela, Martin Kraus, Dorin Comaniciu, Ali Kamen

Links

Abstract / PDF

Why It Matters For Business

RLS can shorten planning iterations and produce executable plans faster by replacing an iterative leaf sequencer, potentially cutting planning time and compute cost in automated radiotherapy pipelines.

Summary TLDR

This paper introduces RLS, a practical multi-agent reinforcement learning model that replaces iterative optimization for leaf sequencing in radiotherapy. RLS predicts leaf positions and monitor units in one pass per control point, reduces fluence reconstruction error versus a commercial optimizer, speeds early optimizer convergence when plugged into PORIx, and fits into a full AI planning pipeline. Key numbers: MNSE down from 0.219 to 0.149 on a head-and-neck set; RIRE97% reduced from 18.8 to 12.6 iterations on that set. The code and models are not public; clinical validation and end-to-end training remain future work.

Problem Statement

Leaf sequencing turns optimal radiation intensity (fluence) into machine-executable leaf positions and monitor units. Current solvers use slow, iterative optimization and cannot learn from large planning data. There is no widely used differentiable or learning-based sequencer that (a) leverages historical plans, (b) runs in a single pass per control point, and (c) integrates smoothly into AI planning pipelines.
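To make the mapping concrete, here is a minimal sketch of how a sequence of control points (leaf-pair openings plus monitor units) adds up to a fluence map. It assumes a deliberately simplified delivery model: each leaf pair exposes a contiguous run of bixel columns, and each control point adds its MU to every exposed bixel. The function name, the integer-column leaf representation, and the toy values are illustrative assumptions, not the paper's implementation; leaf transmission and leaf-end effects are ignored.

```python
def reconstruct_fluence(control_points, n_rows, n_cols):
    """Sum MU-weighted apertures over control points.

    Each control point is (leaves, mu), where leaves[r] = (left, right)
    means leaf-pair row r is open for columns left <= c < right.
    """
    fluence = [[0.0] * n_cols for _ in range(n_rows)]
    for leaves, mu in control_points:
        for r, (left, right) in enumerate(leaves):
            # Clamp the opening to the grid, then deposit this control point's MU.
            for c in range(max(left, 0), min(right, n_cols)):
                fluence[r][c] += mu
    return fluence


# Two control points on a 2 x 4 bixel grid (toy values, not from the paper).
control_points = [
    ([(0, 2), (1, 3)], 1.0),
    ([(1, 4), (1, 2)], 0.5),
]
fluence = reconstruct_fluence(control_points, n_rows=2, n_cols=4)
print(fluence)  # [[1.0, 1.5, 0.5, 0.5], [0.0, 1.5, 1.0, 0.0]]
```

A sequencer solves the inverse problem: given a target map, choose the leaf openings and MUs whose sum comes as close to it as possible.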

Main Contribution

A first practical multi-agent RL formulation for leaf sequencing that predicts per-control-point leaf positions and monitor units.

A two-level actor setup: shared leaf actors (one per leaf pair) and an MU actor to coordinate monitor units.

A five-component reward design that balances fluence reconstruction, overdose penalties, physical constraints, motion smoothness, and aperture shape.
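One plausible way to combine five such terms is a weighted sum, sketched below. The term names, the weight values, and the additive form are assumptions for illustration. This summary only confirms that there are five components and that the leaf-speed weight is called λ4.

```python
def total_reward(terms, weights):
    """Weighted sum of per-step reward terms.

    `terms` and `weights` are dicts keyed by component name. Penalty
    terms are expected to be negative (or zero when nothing is violated).
    """
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing reward terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


# Hypothetical weights; the tuned values are not given in this summary.
weights = {
    "reconstruction": 1.0,  # fluence reconstruction
    "overdose": 1.0,        # overdose penalty
    "physical": 1.0,        # physical (machine) constraints
    "motion": 2.0,          # motion smoothness / leaf speed (λ4)
    "aperture": 1.0,        # aperture shape
}
# Hypothetical per-step values for one control point.
terms = {
    "reconstruction": -0.20,
    "overdose": -0.05,
    "physical": 0.0,
    "motion": -0.10,
    "aperture": -0.02,
}
reward = total_reward(terms, weights)  # approximately -0.47
```

Raising the "motion" weight makes large leaf jumps more costly relative to reconstruction error, which is the trade-off that reward-weight tuning controls.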

A single-pass (finite-horizon) inference procedure to avoid iterative optimization at inference time.

System evaluation on four datasets and three contexts (PORIx optimizer, full-AI VMAT pipeline, IMRT simulation) with ablation studies and cross-site tests.

Key Findings

RLS reduces fluence reconstruction error on head-and-neck data.

Numbers: HNd MNSE: PORIx 0.219 → RLS 0.149 (−0.070)

RLS speeds early optimizer convergence when used inside PORIx.

Numbers: RIRE97% (HNd): PORIx 18.8 → RLS 12.6 (↓6.2 iterations)

RLS produces executable fluences whose 3D dose errors are close to target-dose predictions.

Numbers: VMAT (8 cps): Dose MAE 0.19; DVH MAE 0.88

A simple cropping normalization improves cross-site generalization.

Numbers: Pros(e) MNSE: without crop 0.060 → with crop 0.043 (−28%)

RLS is lightweight in GPU memory.

Numbers: Estimated GPU memory: RLS ≈ 0.35 GB
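For readers who want to reproduce an MNSE-style comparison on their own fluence maps, the sketch below shows one reasonable reading of the metric. It assumes MNSE is the squared error between reconstructed and target fluence, normalized by the target's squared magnitude and averaged over samples. The exact normalization used in the paper is not stated in this summary, so treat the formula and function names as assumptions.

```python
def normalized_square_error(recon, target):
    """Squared error between two 2D maps, divided by the target's energy."""
    num = sum(
        (r - t) ** 2
        for recon_row, target_row in zip(recon, target)
        for r, t in zip(recon_row, target_row)
    )
    den = sum(t ** 2 for target_row in target for t in target_row)
    if den == 0:
        raise ValueError("target fluence is all zeros; error is undefined")
    return num / den


def mnse(pairs):
    """Mean of the normalized square error over (recon, target) pairs."""
    return sum(normalized_square_error(r, t) for r, t in pairs) / len(pairs)


# Toy maps, not from the paper: one imperfect and one perfect reconstruction.
target = [[1.0, 2.0], [0.0, 1.0]]
imperfect = [[1.0, 1.0], [0.0, 1.0]]  # one bixel is off by 1.0
score = mnse([(imperfect, target), (target, target)])  # (1/6 + 0) / 2
```

Lower is better. With this definition, a value of 0.149 would mean the residual carries about 15% of the target's energy on average.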

Results

MNSE (HNd)

Value: RLS 0.149

Baseline: PORIx 0.219

MNSE

Value: RLS 0.042

Baseline: PORIx 0.079

RIRE97% (iterations to reach 97% improvement)

Value: RLS 12.6

Baseline: PORIx 18.8

Dose score (MAE) / DVH score (MAE)

Value: Dose MAE 0.19; DVH MAE 0.88

Baseline: OpenKBP top entries (for context only: the best entry reached Dose MAE ≈ 2.4 on a different task)

Estimated GPU memory

Value: RLS ≈ 0.35 GB

Baseline: PORIx pipeline ≈ 3.0 GB
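The RIRE97% rows count optimizer iterations, so a comparable number can be read off any logged cost curve. The sketch below assumes "97% improvement" means covering 97% of the total cost drop observed in the evaluated window. The paper's exact definition may differ, and the function name and sample curve are hypothetical.

```python
def iterations_to_reach(costs, fraction=0.97):
    """Index of the first iteration whose cost has covered `fraction`
    of the total drop from costs[0] to min(costs)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    start = costs[0]
    total_drop = start - min(costs)
    if total_drop <= 0:
        return 0  # the optimizer never improved on its starting cost
    for i, cost in enumerate(costs):
        if start - cost >= fraction * total_drop:
            return i
    return len(costs) - 1  # not reached for fraction <= 1; kept as a guard


# Hypothetical cost curve: total drop is 90, so the 97% threshold is 87.3.
costs = [100.0, 60.0, 30.0, 12.0, 10.5, 10.0]
n_iter = iterations_to_reach(costs)  # 3, because 100 - 12 = 88 >= 87.3
```

Averaging this count over patients gives a single figure per sequencer, which is how fractional values such as 12.6 and 18.8 can arise.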

What To Try In 7 Days

Run RLS in PORIx on a small test set and compare MNSE to the current sequencer.

Enable the cropping normalization and measure cross-site MNSE improvement.

Tune the leaf-speed reward (λ4) to match your machine's max leaf speed and observe reconstruction change (Figure 11).
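When tuning the leaf-speed weight, it helps to count how often a predicted plan actually exceeds the machine limit. The sketch below checks the largest leaf displacement between two consecutive control points against a speed budget. The position units (mm), the time per control point, and the 25 mm/s limit are placeholder assumptions; use your machine's specification.

```python
def max_leaf_travel(prev_leaves, next_leaves):
    """Largest absolute displacement of any single leaf between two
    control points. Each entry is a (left, right) position pair in mm."""
    return max(
        max(abs(n_left - p_left), abs(n_right - p_right))
        for (p_left, p_right), (n_left, n_right) in zip(prev_leaves, next_leaves)
    )


def violates_speed_limit(prev_leaves, next_leaves, max_speed_mm_s, dt_s):
    """True if any leaf would have to move faster than max_speed_mm_s
    to get from prev_leaves to next_leaves in dt_s seconds."""
    return max_leaf_travel(prev_leaves, next_leaves) > max_speed_mm_s * dt_s


# Hypothetical positions for two leaf pairs at consecutive control points.
prev_leaves = [(-10.0, 10.0), (-5.0, 5.0)]
next_leaves = [(-4.0, 12.0), (-5.0, 20.0)]

travel = max_leaf_travel(prev_leaves, next_leaves)  # 15.0 mm (right leaf, row 2)
too_fast = violates_speed_limit(prev_leaves, next_leaves, max_speed_mm_s=25.0, dt_s=0.5)
ok_if_slower = violates_speed_limit(prev_leaves, next_leaves, max_speed_mm_s=25.0, dt_s=1.0)
```

Here `too_fast` is True (15 mm needed, 12.5 mm allowed) and `ok_if_slower` is False. Tracking the violation rate alongside MNSE shows what each change to the weight costs in reconstruction quality.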

Agent Features

Memory

  • No long-term episodic memory reported
  • Uses cumulative fluence as running state

Planning

  • Finite-horizon (single-pass per control point)
  • Two-level action ordering (leaf then MU)

Tool Use

  • PORIx integration
  • CleanRL-style implementation

Frameworks

  • PPO
  • PPG (ablation)
  • CleanRL

Is Agentic

true

Architectures

  • Multi-agent actor-critic
  • PPO backbone
  • Two-level (leaf + MU) agents

Collaboration

  • Shared leaf policy across leaf agents
  • MU actor coordinates after leaf actions
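The control flow described above (one shared leaf policy applied to every leaf pair, an MU decision made after the leaf actions, and cumulative fluence carried as running state) can be sketched as a single-pass rollout. Both policies below are hand-written stand-ins, not the learned PPO actors, and every name is hypothetical. The point is only the ordering and the state update.

```python
def shared_leaf_policy(target_row, cumulative_row):
    """Stand-in for the shared leaf actor. Opens the row over the span
    where the residual (target - delivered) is still positive."""
    residual = [t - c for t, c in zip(target_row, cumulative_row)]
    open_cols = [i for i, value in enumerate(residual) if value > 1e-9]
    if not open_cols:
        return (0, 0)  # keep this leaf pair closed
    return (open_cols[0], open_cols[-1] + 1)


def mu_policy(target, cumulative, leaves):
    """Stand-in for the MU actor. Delivers the smallest residual inside
    the chosen aperture so that no open bixel is overdosed."""
    residuals = [
        target[r][c] - cumulative[r][c]
        for r, (left, right) in enumerate(leaves)
        for c in range(left, right)
    ]
    return max(min(residuals), 0.0) if residuals else 0.0


def rollout(target, n_control_points):
    """One pass over the control points: leaf actions first, then MU."""
    n_rows, n_cols = len(target), len(target[0])
    cumulative = [[0.0] * n_cols for _ in range(n_rows)]  # running state
    plan = []
    for _ in range(n_control_points):
        # Level 1: the same policy is applied independently to each leaf pair.
        leaves = [shared_leaf_policy(target[r], cumulative[r]) for r in range(n_rows)]
        # Level 2: the MU decision sees the apertures chosen by the leaf agents.
        mu = mu_policy(target, cumulative, leaves)
        for r, (left, right) in enumerate(leaves):
            for c in range(left, right):
                cumulative[r][c] += mu
        plan.append((leaves, mu))
    return plan, cumulative


# Toy target with single-peaked rows (values are not from the paper).
target = [[1.0, 2.0, 1.0], [0.0, 1.0, 0.0]]
plan, delivered = rollout(target, n_control_points=2)
```

On this toy target the two control points reproduce the map exactly. The greedy stand-ins stall on rows with several peaks, which is one reason a learned policy with a shaped reward is attractive.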

Optimization Features

Infra Optimization

  • AdamW optimizer with a cosine-annealing learning-rate schedule
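For reference, cosine annealing follows the standard closed form lr(t) = lr_min + 0.5 · (lr_max − lr_min) · (1 + cos(π · t / T)). The sketch below implements that formula in plain Python. The learning-rate values and step count are placeholders, since the paper's hyperparameters are not given in this summary. In PyTorch the same setup is typically built from `torch.optim.AdamW` and `torch.optim.lr_scheduler.CosineAnnealingLR`.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate at `step` under a single cosine decay from lr_max
    (step 0) to lr_min (step == total_steps)."""
    step = min(max(step, 0), total_steps)  # clamp to the schedule's range
    cosine = 1.0 + math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * cosine


# Placeholder hyperparameters, chosen only to illustrate the curve.
schedule = [cosine_annealing_lr(s, total_steps=100, lr_max=3e-4) for s in (0, 50, 100)]
# schedule is approximately [3e-4, 1.5e-4, 0.0]
```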

System Optimization

  • Lightweight networks (≈0.35 GB GPU footprint for RLS)

Training Optimization

  • Large-scale training on many fluence samples per patient
  • Reward-weight tuning to balance clinical constraints

Inference Optimization

  • Single execution per control point (no iterative loop)
  • Post-processing to merge control points for variable sector lengths

Reproducibility

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Comparisons limited by lack of public, comparable learning-based sequencers and closed-source baselines.
  • RLS training focused on leaf sequencing only; full end-to-end joint training is future work.
  • Evaluation mainly covers the first 100 or fewer PORIx iterations; long-run optimizer behavior is not fully tested.
  • Model backbone limited to PPO-family methods; other RL approaches were not fully explored.

When Not To Use

  • Do not deploy clinically without independent clinical validation and safety checks.
  • Avoid replacing full multi-resolution optimization pipelines that require many manual fine-tuning steps.
  • Do not use on machines whose physical constraints differ markedly without re-tuning reward weights and retraining.

Failure Modes

  • May suggest leaf movements that violate machine speed limits if reward weights are not tuned.
  • Performance can drop on unseen, highly atypical fluence shapes absent adequate augmentation.
  • Planner global cost may not be optimal since only the leaf-sequencing module was learned, not the whole loop.

Core Entities

Models

  • RLS
  • PPO
  • PPG
  • CleanRL

Metrics

  • MNSE
  • RIRE/AIRE
  • Dose score (MAE/MSE)
  • DVH score (MAE/MSE)

Datasets

  • HNd
  • HNe1
  • HNe2
  • Pros

Context Entities

Models

  • Fluence prediction model (referenced)
  • Dose prediction GAN (referenced)

Metrics

  • MNSE
  • RIRE/AIRE
  • Dose/DVH scores

Datasets

  • HNd
  • HNe1
  • HNe2
  • Pros