A multi-agent RL leaf sequencer that reconstructs fluence maps and speeds optimizer convergence

June 3, 2024 · 8 min

Overview

Decision Snapshot: Needs Validation

RLS is a promising prototype: it improves reconstruction and early convergence in a practical optimizer and is lightweight. Clinical validation, open models, and end-to-end training are still needed before deployment.

Citations: 0

Evidence Strength: 0.70

Confidence: 0.85

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 5/5

Reproducibility

Status: No open assets linked

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 60%

Novelty: 70%

Authors

Riqiang Gao, Florin C. Ghesu, Simon Arberet, Shahab Basiri, Esa Kuusela, Martin Kraus, Dorin Comaniciu, Ali Kamen

Why It Matters For Business

RLS can shorten planning iterations and produce executable plans faster by replacing an iterative leaf sequencer, potentially cutting planning time and compute cost in automated radiotherapy pipelines.

Summary TLDR

This paper introduces RLS, a practical multi-agent reinforcement learning model that replaces iterative optimization for leaf sequencing in radiotherapy. RLS predicts leaf positions and monitor units in one pass per control point, reduces fluence reconstruction error versus a commercial optimizer, speeds early optimizer convergence when plugged into PORIx, and fits into a full AI planning pipeline. Key numbers: MNSE down from 0.219 to 0.149 on a head-and-neck set; RIRE97% reduced from 18.8 to 12.6 iterations on that set. The code and models are not public; clinical validation and end-to-end training remain future work.

Problem Statement

Leaf sequencing turns optimal radiation intensity (fluence) into machine-executable leaf positions and monitor units. Current solvers use slow, iterative optimization and cannot learn from large planning data. There is no widely used differentiable or learning-based sequencer that (a) leverages historical plans, (b) runs in a single pass per control point, and (c) integrates smoothly into AI planning pipelines.
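To make the mapping concrete, here is a minimal sketch of the forward direction that a sequencer has to invert: a list of control points (leaf openings plus a monitor-unit weight) accumulates into a fluence map. The grid discretization, the `ControlPoint` type, and the function name are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical, simplified model: each leaf pair covers one row of the
# fluence grid, and an opening is a half-open column interval [left, right).


@dataclass
class ControlPoint:
    openings: List[Tuple[int, int]]  # (left, right) per leaf pair, in grid columns
    mu: float                        # monitor units delivered at this control point


def reconstruct_fluence(cps: List[ControlPoint], n_cols: int) -> List[List[float]]:
    """Sum the MU-weighted apertures of all control points into one map."""
    n_rows = len(cps[0].openings)
    fluence = [[0.0] * n_cols for _ in range(n_rows)]
    for cp in cps:
        for row, (left, right) in enumerate(cp.openings):
            for col in range(left, right):
                fluence[row][col] += cp.mu
    return fluence


# Two control points on a 2-row, 4-column grid (made-up values).
cps = [
    ControlPoint(openings=[(0, 3), (1, 4)], mu=2.0),
    ControlPoint(openings=[(1, 2), (2, 4)], mu=1.0),
]
fluence = reconstruct_fluence(cps, n_cols=4)
```

Leaf sequencing is the inverse task: given a target map, choose the openings and MU values so that the reconstructed map matches it as closely as the machine allows.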

Main Contribution

A first practical multi-agent RL formulation for leaf sequencing that predicts per-control-point leaf positions and monitor units.

A two-level actor setup: shared leaf actors (one per leaf pair) and an MU actor to coordinate monitor units.
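The two-level structure can be sketched as a single pass over control points: one shared leaf policy is applied to every leaf pair, then an MU policy acts on the chosen apertures, and the cumulative fluence serves as the running state. The two policies below are hand-written greedy placeholders standing in for the trained actors; they, the grid model, and all names are assumptions made for illustration only.

```python
from typing import List, Tuple

Row = List[float]
Opening = Tuple[int, int]  # half-open column interval [left, right)


def leaf_policy(residual_row: Row) -> Opening:
    """Placeholder for the shared leaf actor: open over the first contiguous
    run of columns that still need fluence. One function serves all rows."""
    left = next((i for i, v in enumerate(residual_row) if v > 0), None)
    if left is None:
        return (0, 0)  # nothing left to deliver: keep this leaf pair closed
    right = left
    while right < len(residual_row) and residual_row[right] > 0:
        right += 1
    return (left, right)


def mu_policy(residual: List[Row], openings: List[Opening]) -> float:
    """Placeholder for the MU actor: deliver the smallest residual found
    inside the chosen apertures, so no open cell is overdosed."""
    inside = [residual[r][c] for r, (lo, hi) in enumerate(openings) for c in range(lo, hi)]
    return min(inside) if inside else 0.0


def rollout(target: List[Row], n_control_points: int):
    delivered = [[0.0] * len(row) for row in target]  # cumulative fluence (state)
    plan = []
    for _ in range(n_control_points):
        residual = [[t - d for t, d in zip(tr, dr)] for tr, dr in zip(target, delivered)]
        openings = [leaf_policy(row) for row in residual]  # level 1: leaf actions
        mu = mu_policy(residual, openings)                 # level 2: MU action
        for r, (lo, hi) in enumerate(openings):
            for c in range(lo, hi):
                delivered[r][c] += mu
        plan.append((openings, mu))
    return plan, delivered


target = [[1.0, 2.0, 1.0], [0.0, 1.0, 1.0]]  # made-up target fluence
plan, delivered = rollout(target, n_control_points=3)
```

In this toy case the target is matched after two control points and the third stays closed with zero MU. A learned policy would replace both placeholder functions while keeping the same leaf-then-MU ordering.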

Key Findings

RLS reduces fluence reconstruction error on head-and-neck data.

Numbers: HNd MNSE: PORIx 0.219 → RLS 0.149 (−0.070)

Practical Use: Replace or augment the PORIx sequencer with RLS to get measurably lower fluence reconstruction error on similar HN data.

Evidence Ref: Table 1

RLS speeds early optimizer convergence when used inside PORIx.

Numbers: RIRE97% (HNd): PORIx 18.8 → RLS 12.6 (↓6.2 iterations)

Practical Use: Integrate RLS into an optimizer to cut early iterations and shorten planning time in typical workflows.

Evidence Ref: Table 2

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
MNSE | RLS 0.149 | PORIx 0.219 | -0.070 | HNd (test) | Table 1 (MNSE comparison) | Table 1
MNSE | RLS 0.042 | PORIx 0.079 | -0.037 | Pros (test) | Table 1 (MNSE comparison) | Table 1
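The deltas above are absolute. As a quick worked calculation, the snippet below expresses them relative to the PORIx baseline. It uses only the reported numbers, with the RIRE97% pair taken from Key Findings; the percentages are derived here and are not figures quoted from the paper.

```python
def relative_change(baseline: float, value: float) -> float:
    """Signed change relative to the baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


# (PORIx baseline, RLS value) as reported in this summary.
results = {
    "MNSE, HNd": (0.219, 0.149),
    "MNSE, Pros": (0.079, 0.042),
    "RIRE97%, HNd": (18.8, 12.6),
}

for name, (porix, rls) in results.items():
    print(f"{name}: {porix} -> {rls} ({relative_change(porix, rls):+.1f}%)")
```

This works out to roughly a 32% lower MNSE on HNd, about 47% lower on Pros, and about 33% fewer iterations for RIRE97% on HNd.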

What To Try In 7 Days

Run RLS in PORIx on a small test set and compare MNSE to the current sequencer.

Enable the cropping normalization and measure cross-site MNSE improvement.

Tune the leaf-speed reward (λ4) to match your machine's max leaf speed and observe reconstruction change (Figure 11).
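For the leaf-speed item, the summary names the weight λ4 but does not give the form of the reward term. The sketch below assumes a simple hinge penalty on travel beyond what the machine can cover between two control points; the function, its parameters, and the example numbers are hypothetical, and the speed limit should come from your machine's specification.

```python
from typing import List


def leaf_speed_penalty(
    prev_pos_mm: List[float],
    next_pos_mm: List[float],
    dt_s: float,
    max_speed_mm_s: float,
    lambda4: float,
) -> float:
    """Hypothetical reward term: zero while every leaf stays within the speed
    limit, increasingly negative with the total excess travel (in mm)."""
    max_travel_mm = max_speed_mm_s * dt_s
    excess_mm = sum(
        max(0.0, abs(b - a) - max_travel_mm)
        for a, b in zip(prev_pos_mm, next_pos_mm)
    )
    return -lambda4 * excess_mm


# Illustrative values only: three leaves, 1 s between control points,
# an assumed 25 mm/s limit, and an assumed weight of 0.5.
penalty = leaf_speed_penalty(
    prev_pos_mm=[0.0, 10.0, 20.0],
    next_pos_mm=[5.0, 40.0, 20.0],
    dt_s=1.0,
    max_speed_mm_s=25.0,
    lambda4=0.5,
)
```

Here only the second leaf exceeds the limit (30 mm of travel against 25 mm allowed), so the penalty is 0.5 times 5 mm of excess. Raising `lambda4` discourages such moves more strongly, which is the trade-off against reconstruction error to watch when tuning.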

Agent Features

Memory
No long-term episodic memory reported; uses cumulative fluence as running state
Planning
Finite-horizon (single pass per control point); two-level action ordering (leaf, then MU)
Tool Use
PORIx integration; CleanRL-style implementation
Frameworks
PPO; PPG (ablation); CleanRL
Is Agentic

Yes

Architectures
Multi-agent actor-critic; PPO backbone; two-level (leaf + MU) agents
Collaboration
Shared leaf policy across leaf agents; MU actor coordinates after leaf actions

Optimization Features

Infra Optimization
Cosine annealing learning-rate schedule; AdamW optimizer
System Optimization
Lightweight networks (≈0.35 GB GPU footprint for RLS)
Training Optimization
Large-scale training on many fluence samples per patient; reward-weight tuning to balance clinical constraints
Inference Optimization
Single execution per control point (no iterative loop); post-processing to merge control points for variable sector lengths
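The cosine annealing schedule listed under Infra Optimization follows a standard formula, sketched below in plain Python. The summary does not report the learning rates or step counts used, so the values in the example are placeholders.

```python
import math


def cosine_annealing_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Standard cosine annealing: lr_max at step 0, decaying to lr_min at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Placeholder hyperparameters, not taken from the paper.
schedule = [cosine_annealing_lr(s, total_steps=1000, lr_max=3e-4) for s in (0, 500, 1000)]
```

The rate starts at `lr_max`, passes through the midpoint between `lr_max` and `lr_min` halfway through training, and ends at `lr_min`. In PyTorch the same schedule is available as `torch.optim.lr_scheduler.CosineAnnealingLR`, typically paired with `torch.optim.AdamW`.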

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Partial
License: Unknown

Risks & Boundaries

Limitations

Comparisons limited by lack of public, comparable learning-based sequencers and closed-source baselines.

RLS training focused on leaf sequencing only; full end-to-end joint training is future work.

When Not To Use

Do not deploy clinically without independent clinical validation and safety checks.

Avoid replacing full multi-resolution optimization pipelines that require many manual fine-tuning steps.

Failure Modes

May suggest leaf movements that violate machine speed limits if reward weights are not tuned.

Performance can drop on unseen, highly atypical fluence shapes absent adequate augmentation.

Core Entities

Models

RLS, PPO, PPG, CleanRL

Metrics

MNSE, RIRE/AIRE, Dose score (MAE/MSE), DVH score (MAE/MSE)

Datasets

HNd, HNe1, HNe2, Pros

Context Entities

Models

Fluence prediction model (referenced), Dose prediction GAN (referenced)

Metrics

MNSE, RIRE/AIRE, Dose/DVH scores

Datasets

HNd, HNe1, HNe2, Pros