Use LLMs (LightGPT) to control traffic lights with human-like reasoning and lower deployment cost

Overview

Decision Snapshot: Ready For Pilot

The approach is tested in simulation on ten datasets and reviewed by a 15-person expert panel; it is promising, but still limited by single-intersection inputs and the lack of multimodal sensing.

Citations: 10

Evidence Strength: 0.75

Confidence: 0.80

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 2/3

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 85%

Production readiness: 70%

Novelty: 70%

Authors

Siqi Lai, Zhao Xu, Weijia Zhang, Hao Liu, Hui Xiong


Why It Matters For Business

LLMLight enables interpretable, generalizable traffic control with much lower deployment cost than closed LLM APIs, making city-scale experiments and phased rollouts affordable.

Who Should Care

Product Manager, ML Engineer, CTO, Founder, Engineering Lead

Summary TLDR

This paper turns large language models into traffic-signal agents. LLMLight converts local sensor counts into a text prompt, asks an LLM to reason step-by-step (chain-of-thought), and issues a signal phase. The authors build LightGPT by imitating GPT-4 trajectories and refining them with a critic model. On ten datasets (real + synthetic) and a 15-person expert review, LLMLight with LightGPT matches or beats state-of-the-art RL and heuristic methods while being far cheaper to run than closed models like GPT-4. Main limits: single-intersection view, no camera-image inputs, and no pedestrian/bicycle modeling.

Problem Statement

Reinforcement-learning traffic controllers can be powerful but often fail to generalize, are hard to interpret, and require costly training. Off-the-shelf LLMs generalize and reason but lack traffic-specific data and can hallucinate. The paper asks: can an LLM be turned into an interpretable, generalizable, and cost-effective traffic-signal controller?

Main Contribution

LLMLight: a prompting workflow that verbalizes local traffic features and asks an LLM to pick a signal phase with chain-of-thought reasoning.

LightGPT: a TSC-specialized LLM trained by imitation fine-tuning on GPT-4 reasoning plus critic-guided policy refinement.
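A minimal sketch of the verbalize-then-parse step, assuming a four-phase intersection. The phase codes, lane names, prompt wording, and the `<signal>PHASE</signal>` answer tag are all illustrative assumptions, not the authors' exact prompt. `build_prompt` turns per-lane counts into text; `parse_phase` extracts the chosen phase and falls back to a default when the model ignores the format, one cheap guard against instruction-following failures.

```python
import re

# Hypothetical four-phase set (assumption): east-west through, north-south
# through, east-west left, north-south left.
PHASES = ["ETWT", "NTST", "ELWL", "NLSL"]


def build_prompt(lane_counts: dict[str, tuple[int, int]]) -> str:
    """Verbalize per-lane (queued, approaching) vehicle counts as text."""
    lines = ["You control one traffic signal. Current observation:"]
    for lane, (queued, approaching) in sorted(lane_counts.items()):
        lines.append(f"- Lane {lane}: {queued} queued, {approaching} approaching.")
    lines.append(
        "Reason step by step about which movements need green most, then give "
        f"one phase from {PHASES} as <signal>PHASE</signal>."
    )
    return "\n".join(lines)


def parse_phase(response: str, fallback: str = PHASES[0]) -> str:
    """Pull the chosen phase out of the LLM's answer.

    Falls back to a default phase when the model ignores the output format
    or names an unknown phase.
    """
    match = re.search(r"<signal>\s*([A-Z]+)\s*</signal>", response)
    if match and match.group(1) in PHASES:
        return match.group(1)
    return fallback
```

A fixed fallback phase is only one possible policy; keeping the current phase is an equally reasonable choice.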

Key Findings

LightGPT (Llama2-13B) achieves low average travel times on the evaluated datasets, slightly below GPT-4.

Numbers: ATT ≈ 274.03 s on Jinan/Hangzhou (Tables 2 and 8).

Practical Use: A mid-sized fine-tuned LLM can match SOTA RL travel-time performance and is practical to run for intersection control.

Evidence Ref: Tables 2 and 8

LLMLight maintains much lower waiting times than many RL methods when scaling to larger networks.

Numbers: In large-network tests, RL methods' waiting times were 57.8% and 49.8% longer than LLMLight's (Figure 5).

Practical Use: LLMLight could shorten average driver waits in large city deployments, which may also improve perceived fairness.

Evidence Ref: Section 4.7.2 / Figure 5
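"57.8% longer" is not the same as a 57.8% reduction. If an RL method's wait is (1 + p) times LLMLight's, then LLMLight's wait is lower than that method's by 1 − 1/(1 + p). A small helper makes the conversion explicit; the two inputs are the percentages reported above.

```python
def longer_to_reduction(pct_longer: float) -> float:
    """Convert 'baseline is pct_longer% longer than ours' into
    'ours is X% lower than baseline', i.e. 1 - 1 / (1 + p)."""
    return 100.0 * (1.0 - 1.0 / (1.0 + pct_longer / 100.0))


for p in (57.8, 49.8):
    print(f"{p}% longer -> {longer_to_reduction(p):.1f}% lower")
```

This prints `57.8% longer -> 36.6% lower` and `49.8% longer -> 33.2% lower`: LLMLight's waits are roughly a third shorter than those of the two RL baselines.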

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Average Travel Time (ATT)	274.03 s	GPT-4 ATT ≈ 275.26 s	−1.23 s	Jinan / Hangzhou (average rows, Tables 2 and 8)	LLMLight with LightGPT (Llama2-13B) ATT = 274.03 s vs. GPT-4 ≈ 275.26 s (Tables 2 and 8)	Tables 2 and 8
Average Waiting Time (AWT)	43.24 s	GPT-4 AWT ≈ 46.61 s	−3.37 s	Jinan 1 (Tables 2 and 8)	LightGPT AWT = 43.24 s vs. GPT-4 46.61 s (Tables 2 and 8)	Tables 2 and 8
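The Delta column is value minus baseline. The snippet below recomputes it from the table's numbers and adds the change relative to the GPT-4 baseline (the relative figure is derived here, not reported in the source).

```python
# (metric, LightGPT value, GPT-4 baseline), in seconds, copied from the table.
ROWS = [
    ("ATT", 274.03, 275.26),
    ("AWT", 43.24, 46.61),
]


def delta(value: float, baseline: float) -> tuple[float, float]:
    """Return (absolute delta in s, relative delta in % of the baseline)."""
    diff = value - baseline
    return round(diff, 2), round(100.0 * diff / baseline, 2)


for name, value, baseline in ROWS:
    abs_d, rel_d = delta(value, baseline)
    print(f"{name}: {abs_d:+.2f} s ({rel_d:+.2f}%)")
```

It prints `ATT: -1.23 s (-0.45%)` and `AWT: -3.37 s (-7.23%)`: the travel-time gain over GPT-4 is under half a percent, while the waiting-time gain is about 7%.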

What To Try In 7 Days

Run LLMLight in CityFlow on one intersection using local counts as text prompts to validate reasoning outputs.

Fine-tune an open LLM via LoRA on a small set of GPT-4 CoT trajectories and compare ATT/AWT vs your current controller.

Train a simple action-value critic from your simulator to filter/improve candidate LLM trajectories before deployment.
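The first item above could start from a loop like the one below. It is a sketch under several assumptions: `decide` is a hypothetical wrapper that sends the prompt to your LLM and returns a phase index; the lane ids, intersection id, 30-step action interval, and 1-second simulation step are illustrative; and the mapping from phase index to light phases depends on your roadnet file. The `Engine` method names follow CityFlow's Python API; `FakeEngine` is a stand-in with made-up counts so the loop runs without the simulator.

```python
from typing import Callable, Protocol


class Engine(Protocol):
    """The subset of cityflow.Engine methods this loop relies on."""

    def get_lane_vehicle_count(self) -> dict[str, int]: ...
    def get_lane_waiting_vehicle_count(self) -> dict[str, int]: ...
    def set_tl_phase(self, intersection_id: str, phase_id: int) -> None: ...
    def next_step(self) -> None: ...


def run_episode(eng: Engine, intersection_id: str, lanes: list[str],
                decide: Callable[[str], int], steps: int = 3600,
                action_interval: int = 30) -> list[int]:
    """Every `action_interval` steps, verbalize lane counts, ask `decide`
    (your LLM wrapper) for a phase index, and apply it to the signal."""
    chosen: list[int] = []
    for t in range(steps):
        if t % action_interval == 0:
            total = eng.get_lane_vehicle_count()
            waiting = eng.get_lane_waiting_vehicle_count()
            prompt = "\n".join(
                f"Lane {lane}: {waiting.get(lane, 0)} waiting, "
                f"{total.get(lane, 0) - waiting.get(lane, 0)} moving."
                for lane in lanes
            )
            phase = decide(prompt)
            eng.set_tl_phase(intersection_id, phase)
            chosen.append(phase)
        eng.next_step()
    return chosen


class FakeEngine:
    """Stand-in with made-up counts so the loop runs without CityFlow."""

    def __init__(self) -> None:
        self.t = 0
        self.applied: list[tuple[str, int]] = []

    def get_lane_vehicle_count(self) -> dict[str, int]:
        return {"north_in": 6, "east_in": 2}

    def get_lane_waiting_vehicle_count(self) -> dict[str, int]:
        return {"north_in": 4, "east_in": 1}

    def set_tl_phase(self, intersection_id: str, phase_id: int) -> None:
        self.applied.append((intersection_id, phase_id))

    def next_step(self) -> None:
        self.t += 1


# With CityFlow installed, set "rlTrafficLight": true in config.json, then:
#   import cityflow
#   eng = cityflow.Engine("config.json", thread_num=1)
eng = FakeEngine()
phases = run_episode(eng, "intersection_1_1", ["north_in", "east_in"],
                     decide=lambda prompt: 1, steps=90)
print(phases)  # [1, 1, 1]
```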

Agent Features

Memory

No explicit long-term memory beyond the prompt; each decision uses only the current step's observation

Planning

Chain-of-Thought reasoning (stepwise analysis); critic-guided ranking refinement of action trajectories

Tool Use

CityFlow simulator (for training and evaluation)

Frameworks

LLMLight prompting workflow; imitation fine-tuning + ranking-based policy refinement (CGPR)

Is Agentic

Yes

Architectures

Large Language Model (LLM) agent per intersection; LightGPT backbone (fine-tuned LLM variants)

Collaboration

Single-agent (local observation only); multi-agent cooperation noted as future work

Optimization Features

Token Efficiency

Top_p=1.0; temperature=0 or 0.1 for stability (Section 4.4)
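These are standard decoding knobs. The sketch below shows generic temperature plus nucleus (top-p) sampling over a toy logit vector; it is illustrative background, not the paper's inference code. Temperature 0 reduces to greedy argmax, so repeated runs pick the same phase, and top_p = 1.0 keeps the whole distribution (no truncation).

```python
import math
import random
from typing import Optional


def sample_token(logits: list[float], temperature: float, top_p: float = 1.0,
                 rng: Optional[random.Random] = None) -> int:
    """Temperature + nucleus (top-p) sampling over a logit vector."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token (deterministic).
        return max(range(len(logits)), key=logits.__getitem__)
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]  # numerically stable softmax
    total = sum(weights)
    probs = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)
    # Keep the smallest high-probability set whose mass reaches top_p;
    # top_p = 1.0 keeps every token.
    kept, cum = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cum += p
        if cum >= top_p:
            break
    # Sample from the kept tokens, renormalized by their total mass.
    r = rng.random() * sum(p for p, _ in kept)
    for p, i in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][1]
```

A low but non-zero temperature such as 0.1 keeps sampling technically stochastic while concentrating almost all probability on the top token.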

Infra Optimization

Real-time settings tuned for 10–20 parallel intersections per machine

Model Optimization

LoRA
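For reference, standard LoRA freezes the pretrained weight and learns a low-rank update. The rank, scaling factor, and target modules LightGPT uses are not stated here, so r = 8 in the worked count is an assumption; d = k = 5120 is the hidden size of Llama2-13B.

```latex
\begin{aligned}
h &= W_0 x + \Delta W x = W_0 x + \tfrac{\alpha}{r}\, B A x,
  \qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\;
  A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k) \\
\#\text{trainable per matrix} &= r(d + k) \quad \text{vs.} \quad dk \text{ for full fine-tuning} \\
d = k = 5120,\ r = 8:\quad \frac{r(d + k)}{dk}
  &= \frac{8 \cdot 10240}{5120^2} = \frac{81{,}920}{26{,}214{,}400} = 0.3125\%
\end{aligned}
```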

System Optimization

Batch control of multiple intersections per machine (details in Appendix A.4)
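One way to read "batch control" is to gather the prompts of all intersections a machine serves and send them to the model in fixed-size batches. The sketch assumes a hypothetical `generate_batch` function that runs one batched LLM call and returns one phase index per prompt; the batch size of 16 is only an example within the 10 to 20 intersections mentioned above, and Appendix A.4 may describe a different scheme.

```python
from typing import Callable


def decide_in_batches(prompts: dict[str, str],
                      generate_batch: Callable[[list[str]], list[int]],
                      batch_size: int = 16) -> dict[str, int]:
    """Send intersection prompts to the LLM in fixed-size batches and map the
    returned phase indices back to intersection ids."""
    ids = list(prompts)
    phases: dict[str, int] = {}
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        results = generate_batch([prompts[i] for i in chunk])
        if len(results) != len(chunk):
            raise ValueError("generate_batch must return one phase per prompt")
        phases.update(zip(chunk, results))
    return phases
```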

Training Optimization

Imitation fine-tuning on GPT-4 reasoning; filtering trajectories with an action-value critic; ranking-loss-based policy refinement (RBC)
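The critic-guided steps can be sketched in plain Python as two pieces: dropping imitation samples whose action the critic rates poorly, and a pairwise ranking loss that pushes the policy's log-likelihoods into the critic's order. This is a generic, RRHF-style hinge formulation chosen for illustration; the paper's exact loss, margin, and normalization may differ, and in training the same arithmetic would run on framework tensors so gradients can flow.

```python
from itertools import combinations


def filter_by_critic(samples, q_values, tol=0.0):
    """Keep (observation, action, reasoning) samples whose chosen action
    scores within `tol` of the critic's best Q-value for that observation.

    q_values[k][a] is the critic's Q-value for action a on sample k.
    """
    return [
        sample
        for sample, q in zip(samples, q_values)
        if q[sample[1]] >= max(q) - tol
    ]


def pairwise_ranking_loss(log_likelihoods, critic_scores):
    """Hinge ranking loss over all response pairs.

    log_likelihoods[i] is the policy's (ideally length-normalized)
    log-probability of response i; critic_scores[i] is the critic's score.
    Whenever the critic prefers one response but the policy gives the other
    a higher log-likelihood, the gap is added to the loss.
    """
    loss = 0.0
    for i, j in combinations(range(len(critic_scores)), 2):
        if critic_scores[i] == critic_scores[j]:
            continue
        better, worse = (i, j) if critic_scores[i] > critic_scores[j] else (j, i)
        loss += max(0.0, log_likelihoods[worse] - log_likelihoods[better])
    return loss
```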

Inference Optimization

Use mid-sized models (Llama3-8B, Llama2-13B) for lower latency

Reproducibility

Code Available: Yes

Data Available: Yes

Open Source Status: Partial

License: Unknown

Code URLs

https://github.com/usail-hkust/LLMTSCS https://lightgpt2024.github.io/LLMLight_Demo/

Data URLs

http://traffic-signal-control.github.io

Risks & Boundaries

Limitations

Single-intersection inputs only; no multi-agent coordination included.

No camera-image or multimodal inputs; relies on counts and sensor features.

When Not To Use

When global coordination across many intersections is required out-of-the-box.

When decisions must incorporate camera vision or pedestrian safety signals.

Failure Modes

LLM instruction-following failures or hallucinations (observed with ChatGPT-3.5).

Faulty or noisy sensor inputs produce incorrect textual summaries and, in turn, bad actions.

Core Entities

Models

LightGPT (Llama2-13B), LightGPT (Llama3-8B), LightGPT (Llama2-7B), LightGPT (Qwen2-7B), GPT-4, ChatGPT-3.5, Llama2-13B, Qwen2-0.5B, Qwen2-72B, Llama3-70B

Metrics

Average Travel Time (ATT), Average Queue Length (AQL), Average Waiting Time (AWT)
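For reference, the three metrics are commonly computed as below in traffic-signal-control work. These are assumed conventions for illustration, not the paper's exact definitions; for example, how vehicles still in the network are counted, or what speed counts as "waiting", varies between setups.

```python
from statistics import mean


def average_travel_time(trips):
    """ATT: mean of (leave_time - enter_time) over vehicles, in seconds."""
    return mean(leave - enter for enter, leave in trips)


def average_waiting_time(wait_seconds):
    """AWT: mean per-vehicle time spent stopped or queued, in seconds."""
    return mean(wait_seconds)


def average_queue_length(queue_snapshots):
    """AQL: mean over time steps of the mean queue length across lanes.

    queue_snapshots[t] lists the queued-vehicle count of each lane at step t.
    """
    return mean(mean(lanes) for lanes in queue_snapshots)
```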

Datasets

Jinan (1/2/3/Extreme/24-hour), Hangzhou (1/2/Extreme), New York (1/2), CityFlow (simulator)

Benchmarks

10-dataset TSC benchmark (this work, mixed real+synthetic)

