Use LLMs to patch rule-based driving planners and cut dangerous scenarios on nuPlan.

December 30, 2023 · 7 min

Overview

Decision Snapshot: Needs Validation

LLM-ASSIST shows that constraining the LLM to output planner parameters improves closed-loop planning on nuPlan. Latency, grounding, and hallucination risks must still be addressed before real-world deployment.

Citations: 6

Evidence Strength: 0.80

Confidence: 0.80

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 4/4

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 40%

Production readiness: 60%

Novelty: 55%

Authors

S P Sharan, Francesco Pittaluga, Vijay Kumar B G, Manmohan Chandraker

Links

Abstract / PDF / Code / Data

Why It Matters For Business

A language model can be used to patch edge-case failures of a strong rule-based planner and reduce dangerous scenarios without retraining the core planner, but latency, cost, and hallucination risk must be managed.

Who Should Care

Summary TLDR

The paper builds a hybrid planner that keeps a strong rule-based planner (PDM-Closed) for routine driving and invokes an LLM only when the base planner's own proposals score poorly. Two LLM roles are tested: (1) an unconstrained LLM that outputs full trajectories, and (2) a parameterized LLM that returns planner parameters for PDM-Closed. The parameterized approach (GPT-3-ASSISTPAR) gives state-of-the-art closed-loop results on the nuPlan val14 split and reduces dangerous driving scenarios by ~11% versus PDM-Closed. Limitations: text-only state input, LLM latency, and hallucination risk.

Problem Statement

Rule-based planners handle most traffic but fail in some complex or rare scenarios. Pure learning planners overfit or struggle in closed-loop settings. Can large language models’ commonsense reasoning be used to fix those hard cases without losing the safe behavior of rule-based planners?

Main Contribution

A score-based gating strategy: invoke an LLM only when the base planner's simulated proposal scores fall below thresholds.

Two LLM integrations: unconstrained LLM that outputs trajectories and a parameterized LLM that returns planner parameters for PDM-Closed.
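The score-based gate can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Proposal` type, the `plan_step` function, the assumption that scores lie in [0, 1], and the 0.8 threshold are all hypothetical and chosen only to show the control flow.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Proposal:
    """A candidate trajectory plus the score the base planner's simulation gave it."""
    name: str
    score: float  # assumed to lie in [0, 1]; higher is better


def plan_step(
    proposals: Sequence[Proposal],
    llm_fallback: Callable[[], Proposal],
    threshold: float = 0.8,  # hypothetical value, not taken from the paper
) -> Proposal:
    """Return the best base proposal unless even the best one scores below the threshold."""
    best = max(proposals, key=lambda p: p.score)
    if best.score >= threshold:
        return best          # routine case: the LLM is never called
    return llm_fallback()    # hard case: defer to the LLM-assisted path


# Usage with a stub in place of a real LLM call.
routine = [Proposal("keep_lane", 0.95), Proposal("slow_down", 0.70)]
hard = [Proposal("keep_lane", 0.40), Proposal("slow_down", 0.55)]
stub = lambda: Proposal("llm_assisted", 0.0)

print(plan_step(routine, stub).name)  # keep_lane
print(plan_step(hard, stub).name)     # llm_assisted
```

Gating on the best score keeps the LLM off the routine path, so its latency and cost are paid only in the scenarios where the rule-based planner is already struggling.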

Key Findings

Parameterizing the base planner with an LLM reduces dangerous driving scenarios.

Numbers: 11% fewer dangerous events vs PDM-Closed (nuPlan val14)

Practical Use: Use an LLM to pick planner parameters rather than replacing the planner to get immediate safety gains on evaluated benchmarks.

Evidence Ref: Section 5.3, Table 2

GPT-3-ASSISTPAR yields higher overall closed-loop scores than PDM-Closed on val14.

Numbers: Non-reactive score 93.05 vs 92.51 (PDM-Closed)

Practical Use: Small but consistent score gains suggest LLM parameterization can improve closed-loop benchmark metrics without retraining the planner.

Evidence Ref: Table 2

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Closed-loop non-reactive combined score | 93.05 (GPT-3-ASSISTPAR) | 92.51 (PDM-Closed) | +0.54 | nuPlan val14 | Table 2 reports scores for PDM-Closed and GPT-3-ASSISTPAR on val14 | Table 2
Closed-loop reactive combined score | 92.20 (GPT-3-ASSISTPAR) | 91.79 (PDM-Closed) | +0.41 | nuPlan val14 | Table 2 reports scores for PDM-Closed and GPT-3-ASSISTPAR on val14 | Table 2

What To Try In 7 Days

Add a simulation-based score threshold to detect low-confidence planner outputs and gate LLM invocation.

Implement a parameterized LLM interface that returns planner hyperparameters, not raw trajectories.

Run an offline eval on a held-out set (nuPlan val14 or your scenarios) permitting up to 4 LLM queries per decision step and measure safety metrics and latency.
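For the offline evaluation step, a query budget and a latency measurement can be wrapped around the LLM call. The sketch below is a rough illustration under stated assumptions: `query_with_budget` and the `ask_llm` callable are hypothetical names, and the callable is assumed to return `None` whenever the LLM response is unusable.

```python
import time
from typing import Callable, Optional, Tuple

MAX_QUERIES = 4  # per-step budget suggested in the list above


def query_with_budget(
    ask_llm: Callable[[int], Optional[dict]],
    max_queries: int = MAX_QUERIES,
) -> Tuple[Optional[dict], int, float]:
    """Call ask_llm until it returns a usable result or the budget runs out.

    Returns (result, queries_used, elapsed_seconds).
    """
    start = time.perf_counter()
    for attempt in range(1, max_queries + 1):
        result = ask_llm(attempt)
        if result is not None:
            return result, attempt, time.perf_counter() - start
    return None, max_queries, time.perf_counter() - start


# Usage with a stub that fails twice before answering (hypothetical parameter name).
def flaky_llm(attempt: int) -> Optional[dict]:
    return {"target_speed_mps": 8.0} if attempt >= 3 else None


result, used, elapsed = query_with_budget(flaky_llm)
print(result, used)
```

Logging `queries_used` and `elapsed_seconds` per decision step gives the latency distribution alongside the safety metrics, which makes it easier to see whether the gains survive a realistic time budget.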

Agent Features

Planning
gated LLM invocation (score-based), parameterized planning (LLM outputs planner params), unconstrained planning (LLM outputs trajectories)
Tool Use
GPT-3, GPT-4, Llama2-7B, PDM-Closed
Frameworks
nuPlan API
Is Agentic

Yes

Architectures
hybrid rule-based + LLM
Collaboration
LLM supplements base planner decisions

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Partial
License: Unknown

Risks & Boundaries

Limitations

System uses a text-only parsed state, omitting raw sensor richness.

LLMs introduce latency; Llama2-7B took ~3s for a single parameter query in tests.

When Not To Use

In hard real-time control loops that require millisecond-level latency.

As a standalone replacement for the planner (LLM direct trajectories performed poorly).

Failure Modes

LLM hallucination produces incorrect planner parameters or malformed outputs.

LLM formatting errors break the planner interface and cause fallback to low-quality proposals.
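Both failure modes can be contained by validating the LLM response before it reaches the planner. The following is a minimal sketch, assuming the LLM is prompted to reply with a JSON object. The parameter names, default values, and bounds are hypothetical placeholders, not the actual PDM-Closed parameters.

```python
import json
from typing import Tuple

# Hypothetical parameter names, defaults, and bounds, for illustration only.
DEFAULTS = {"target_speed_mps": 10.0, "min_gap_m": 2.0}
BOUNDS = {"target_speed_mps": (0.0, 30.0), "min_gap_m": (0.5, 10.0)}


def parse_planner_params(raw: str) -> Tuple[dict, bool]:
    """Return (params, ok). On any format or range error, return DEFAULTS and ok=False."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return dict(DEFAULTS), False   # malformed output
    if not isinstance(data, dict):
        return dict(DEFAULTS), False   # valid JSON, but not an object
    params = {}
    for key, (low, high) in BOUNDS.items():
        value = data.get(key)
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return dict(DEFAULTS), False
        if not low <= value <= high:
            return dict(DEFAULTS), False  # out-of-range (possibly hallucinated) value
        params[key] = float(value)
    return params, True


print(parse_planner_params('{"target_speed_mps": 8, "min_gap_m": 3.5}'))
print(parse_planner_params("Sure! Here are the parameters you asked for."))
print(parse_planner_params('{"target_speed_mps": 500, "min_gap_m": 3}'))
```

Returning an explicit `ok` flag lets the caller count rejected responses during evaluation and decide whether to retry the query or keep the base planner's unmodified behavior.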

Core Entities

Models

PDM-Closed, IDM, GPT-3, GPT-4, Llama2-7B

Metrics

Score, Collisions, Time-to-Collision (TTC), Drivable, Comfort, Progress, Speed Limit, Direction

Datasets

nuPlan val14

Benchmarks

nuPlan closed-loop non-reactive, nuPlan closed-loop reactive