Use LLMs (LightGPT) to control traffic lights with human-like reasoning and lower deployment cost

December 26, 2023 · 7 min

Overview

Decision Snapshot: Ready For Pilot

The approach is tested at simulation scale over many datasets and judged by experts; promising but still limited by single-intersection inputs and lack of multimodal sensing.

Citations: 10

Evidence Strength: 0.75

Confidence: 0.80

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 2/3

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 85%

Production readiness: 70%

Novelty: 70%

Authors

Siqi Lai, Zhao Xu, Weijia Zhang, Hao Liu, Hui Xiong


Why It Matters For Business

LLMLight enables interpretable, generalizable traffic control with much lower deployment cost than closed LLM APIs, making city-scale experiments and phased rollouts affordable.

Summary TLDR

This paper turns large language models into traffic-signal agents. LLMLight converts local sensor counts into a text prompt, asks an LLM to reason step-by-step (chain-of-thought), and issues a signal phase. The authors build LightGPT by imitating GPT-4 trajectories and refining them with a critic model. On ten datasets (real + synthetic) and a 15-person expert review, LLMLight with LightGPT matches or beats state-of-the-art RL and heuristic methods while being far cheaper to run than closed models like GPT-4. Main limits: single-intersection view, no camera-image inputs, and no pedestrian/bicycle modeling.
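The prompt-then-parse loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the phase names, lane names, prompt wording, and the `<signal>` answer tag are all assumptions made for the example.

```python
import re

# Hypothetical phase-to-lane mapping; names are illustrative assumptions.
PHASES = {
    "ETWT": ["east_through", "west_through"],
    "NTST": ["north_through", "south_through"],
    "ELWL": ["east_left", "west_left"],
    "NLSL": ["north_left", "south_left"],
}

def build_prompt(queued, approaching):
    """Verbalize per-lane sensor counts into a text prompt."""
    lines = ["You control one intersection. Current state per signal phase:"]
    for phase, lanes in PHASES.items():
        q = sum(queued.get(lane, 0) for lane in lanes)
        a = sum(approaching.get(lane, 0) for lane in lanes)
        lines.append(f"- {phase}: {q} queued vehicles, {a} approaching vehicles")
    lines.append("Think step by step, then answer as <signal>PHASE</signal>.")
    return "\n".join(lines)

def parse_phase(response):
    """Return the chosen phase, or None if the reply is malformed."""
    match = re.search(r"<signal>\s*(\w+)\s*</signal>", response)
    if match and match.group(1) in PHASES:
        return match.group(1)
    return None
```

Returning `None` for a malformed reply lets the caller decide how to recover, for example by keeping the current phase.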

Problem Statement

Reinforcement-learning traffic controllers can be powerful but often fail to generalize, are hard to interpret, and require costly training. Off-the-shelf LLMs generalize and reason but lack traffic-specific data and can hallucinate. The paper asks: can an LLM be turned into an interpretable, generalizable, and cost-effective traffic-signal controller?

Main Contribution

LLMLight: a prompting workflow that verbalizes local traffic features and asks an LLM to pick a signal phase with chain-of-thought reasoning.

LightGPT: a TSC-specialized LLM trained by imitation fine-tuning on GPT-4 reasoning plus critic-guided policy refinement.
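To give a feel for what critic-guided refinement optimizes, here is a generic pairwise ranking loss in plain Python. It is a stand-in chosen for illustration, not the paper's exact objective: the function name, the hinge form, and the `margin` parameter are assumptions.

```python
def pairwise_ranking_loss(policy_scores, critic_values, margin=0.0):
    """Average hinge penalty over pairs the critic ranks strictly.

    policy_scores[i]: the policy's score for candidate i
    critic_values[i]: the critic's value estimate for candidate i
    """
    loss = 0.0
    pairs = 0
    n = len(policy_scores)
    for i in range(n):
        for j in range(n):
            if critic_values[i] > critic_values[j]:
                # Critic prefers i over j, so the policy should score i
                # at least `margin` higher than j.
                gap = policy_scores[i] - policy_scores[j]
                loss += max(0.0, margin - gap)
                pairs += 1
    return loss / pairs if pairs else 0.0
```

The loss is zero when the policy already orders every candidate pair the way the critic does, and grows as the orderings disagree.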

Key Findings

LightGPT (Llama2-13B) yields low travel times on evaluated datasets.

Numbers: ATT ≈ 274.03 s on Jinan/Hangzhou (Tables 2 and 8).

Practical Use: A mid-sized fine-tuned LLM can match SOTA RL travel-time performance and is practical to run for intersection control.

Evidence Ref: Tables 2 and 8

LLMLight maintains much lower waiting times than many RL methods when scaling to larger networks.

Numbers: RL methods' waiting times were 57.8% and 49.8% longer than LLMLight's in large-network tests (Figure 5).

Practical Use: Using LLMLight can reduce extreme per-driver waits and improve perceived fairness in big-city deployments.

Evidence Ref: Section 4.7.2 / Figure 5

Results

| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
| --- | --- | --- | --- | --- | --- | --- |
| Average Travel Time (ATT) | 274.03 s | GPT-4 ATT ≈ 275.26 s | −1.23 s | Jinan / Hangzhou (reported average rows, Tables 2 and 8) | LLMLight with LightGPT (Llama2-13B) ATT = 274.03 s vs GPT-4 ≈ 275.26 s | Tables 2 and 8 |
| Average Waiting Time (AWT) | 43.24 s | GPT-4 AWT ≈ 46.61 s | −3.37 s | Jinan 1 (Tables 2 and 8) | LightGPT AWT 43.24 s vs GPT-4 46.61 s | Tables 2 and 8 |
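The Delta column is the LightGPT value minus the GPT-4 baseline, so negative numbers mean LightGPT is faster. The short check below reproduces both deltas and also derives the relative change, which the source does not report; the helper name `delta` is hypothetical.

```python
def delta(value, baseline):
    """Return (absolute delta, relative delta in percent), rounded to 2 places."""
    absolute = round(value - baseline, 2)
    relative = round(100 * (value - baseline) / baseline, 2)
    return absolute, relative

att = delta(274.03, 275.26)  # Average Travel Time, seconds
awt = delta(43.24, 46.61)    # Average Waiting Time, seconds
print(att, awt)
```

By this arithmetic the travel-time gap to GPT-4 is under half a percent, while the waiting-time gap is roughly seven percent.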

What To Try In 7 Days

Run LLMLight in CityFlow on one intersection using local counts as text prompts to validate reasoning outputs.

Fine-tune an open LLM via LoRA on a small set of GPT-4 CoT trajectories and compare ATT/AWT vs your current controller.

Train a simple action-value critic from your simulator to filter/improve candidate LLM trajectories before deployment.
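The third step, filtering candidate trajectories with a critic, might look something like the sketch below. The trajectory layout (`state`, `action` keys), the `q_value` callable, and the `tolerance` parameter are assumptions for illustration; a real critic would be trained from simulator rollouts rather than read off the state.

```python
def filter_trajectories(trajectories, q_value, actions, tolerance=0.0):
    """Keep steps whose chosen action is within `tolerance` of the critic's best."""
    kept = []
    for step in trajectories:
        best = max(q_value(step["state"], a) for a in actions)
        if q_value(step["state"], step["action"]) >= best - tolerance:
            kept.append(step)
    return kept

# Toy critic (assumption): value of a phase = vehicles queued on that phase.
def toy_q(state, action):
    return state[action]

candidates = [
    {"state": {"ETWT": 5, "NTST": 2}, "action": "ETWT"},  # agrees with critic
    {"state": {"ETWT": 1, "NTST": 4}, "action": "ETWT"},  # critic prefers NTST
]
good = filter_trajectories(candidates, toy_q, ["ETWT", "NTST"])
```

A non-zero `tolerance` keeps near-optimal choices too, which can preserve more training data when the critic itself is noisy.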

Agent Features

Memory
No explicit long-term memory beyond the prompt (per-step, observation-based)
Planning
Chain-of-Thought reasoning (stepwise analysis); critic-guided ranking refinement of action trajectories
Tool Use
CityFlow simulator (for training and evaluation)
Frameworks
LLMLight prompting workflow; imitation fine-tuning + critic-guided policy refinement (CGPR)
Is Agentic

Yes

Architectures
Large Language Model (LLM) agent per intersection; LightGPT backbone (fine-tuned LLM variants)
Collaboration
Single-agent (local observation only); multi-agent cooperation noted as future work

Optimization Features

Token Efficiency
Top_p=1.0; temperature=0 or 0.1 for stability (Section 4.4)
Infra Optimization
Real-time settings tuned for 10–20 parallel intersections per machine
Model Optimization
LoRA
System Optimization
Batch control of multiple intersections per machine (details in Appendix A.4)
Training Optimization
Imitation fine-tuning on GPT-4 reasoning; filtering trajectories with an action-value critic; ranking-loss-based policy refinement (RBC)
Inference Optimization
Use mid-sized models (Llama3-8B, Llama2-13B) for lower latency
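Batch control of several intersections from one machine could be organized along these lines. This is only a sketch under stated assumptions: `decide_phase` is a stand-in for the per-intersection LLM call (here it simply picks the phase with the longest queue), and `max_workers=16` is an arbitrary value inside the 10–20 range quoted above.

```python
from concurrent.futures import ThreadPoolExecutor

def decide_phase(observation):
    """Stand-in for an LLM request: choose the phase with the longest queue."""
    return max(observation, key=observation.get)

def control_batch(observations, max_workers=16):
    """Decide a phase for every intersection in parallel.

    observations: {intersection_id: {phase_name: queued_vehicle_count}}
    """
    ids = list(observations)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        phases = pool.map(decide_phase, (observations[i] for i in ids))
        return dict(zip(ids, phases))
```

Threads suit this pattern when each decision is dominated by waiting on a model server; a local in-process model would more likely batch prompts into a single forward pass instead.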

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Partial
License: Unknown

Risks & Boundaries

Limitations

Single-intersection inputs only; no multi-agent coordination included.

No camera-image or multimodal inputs—relies on counts/sensor features.

When Not To Use

When global coordination across many intersections is required out-of-the-box.

When decisions must incorporate camera vision or pedestrian safety signals.

Failure Modes

LLM instruction-following failures or hallucinations (observed with ChatGPT-3.5).

Garbage sensor inputs lead to incorrect textual summaries and bad actions.
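Both failure modes suggest a guard layer around the model. The sketch below is one possible shape, not something described in the source: the function names, the `max_per_phase` plausibility bound, and the longest-queue fallback are all assumptions.

```python
def sanitize_counts(counts, max_per_phase=200):
    """Return integer counts, or None if any reading looks implausible."""
    clean = {}
    for phase, value in counts.items():
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return None
        # `value != value` is True only for NaN.
        if value != value or value < 0 or value > max_per_phase:
            return None
        clean[phase] = int(value)
    return clean

def safe_phase(llm_phase, counts, valid_phases, default_phase):
    """Guard the LLM's choice against bad inputs and malformed outputs."""
    clean = sanitize_counts(counts)
    if clean is None:
        return default_phase  # sensors untrustworthy: use a fixed default
    if llm_phase in valid_phases:
        return llm_phase      # normal path: accept the model's answer
    # Malformed answer: fall back to the phase with the longest queue.
    return max(valid_phases, key=lambda p: clean.get(p, 0))
```

Keeping the fallback deterministic makes it easy to audit how often the guard overrides the model.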

Core Entities

Models

LightGPT (Llama2-13B), LightGPT (Llama3-8B), LightGPT (Llama2-7B), LightGPT (Qwen2-7B), GPT-4, ChatGPT-3.5, Llama2-13B, Qwen2-0.5B, Qwen2-72B, Llama3-70B

Metrics

Average Travel Time (ATT), Average Queue Length (AQL), Average Waiting Time (AWT)
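For readers who want to compute these metrics from their own simulator logs, a rough sketch follows. The record layout (`enter`, `exit`, `waiting` fields and per-step queue snapshots) is an assumption, and the exact definitions used in the paper may differ, particularly for AQL.

```python
from statistics import mean

def average_travel_time(vehicles):
    """ATT: mean time between entering and leaving the network, in seconds."""
    return mean(v["exit"] - v["enter"] for v in vehicles)

def average_waiting_time(vehicles):
    """AWT: mean time a vehicle spends stopped in a queue, in seconds."""
    return mean(v["waiting"] for v in vehicles)

def average_queue_length(snapshots):
    """AQL: mean over time steps of the total number of queued vehicles.

    snapshots: one list of per-lane queue counts for each simulation step.
    """
    return mean(sum(step) for step in snapshots)

vehicles = [
    {"enter": 0, "exit": 100, "waiting": 20},
    {"enter": 10, "exit": 210, "waiting": 40},
]
snapshots = [[1, 2], [3, 0]]
```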

Datasets

Jinan (1/2/3/Extreme/24-hour), Hangzhou (1/2/Extreme), New York (1/2), CityFlow (simulator)

Benchmarks

10-dataset TSC benchmark (this work, mixed real+synthetic)