Use LLM agents and a fishbowl discussion to simulate participatory urban planning and improve resident satisfaction

February 27, 2024 · 7 min

Overview

Production Readiness

0.3

Novelty Score

0.6

Cost Impact Score

0.5

Citation Count

11

Authors

Zhilun Zhou, Yuming Lin, Depeng Jin, Yong Li

Why It Matters For Business

Simulating participatory planning with multiple LLM agents can surface local needs early and cut the time and cost of rehearsing options before real residents are engaged. It also lets planners test many “what-if” land-use options quickly while keeping service coverage competitive.

Summary TLDR

This paper builds a multi-agent system of LLMs that simulates a planner plus thousands of resident agents to produce land-use plans. Residents are role-played from census distributions and discuss via a fishbowl mechanism (inner/outer circles). The planner (GPT-4 vision) proposes an initial map, residents discuss in rounds, discussion is summarized, and the planner revises the plan. On two Beijing regions the method raises need-aware metrics (Satisfaction and Inclusion) substantially vs baselines while keeping service/access metrics competitive. The setup omits costs, ownership, and many real-world constraints.
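The resident role-play step can be illustrated with a minimal sketch, assuming each attribute is drawn independently from its marginal distribution. The attribute names, shares, `ResidentProfile`, and `sample_residents` are hypothetical and not taken from the paper; real shares would come from local census tables.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical marginal shares; real values would come from local census tables.
CENSUS: Dict[str, Dict[str, float]] = {
    "age_group": {"18-29": 0.25, "30-44": 0.35, "45-64": 0.28, "65+": 0.12},
    "household": {"single": 0.30, "couple": 0.25, "family with children": 0.45},
    "income": {"low": 0.30, "middle": 0.50, "high": 0.20},
}


@dataclass(frozen=True)
class ResidentProfile:
    age_group: str
    household: str
    income: str

    def to_prompt(self) -> str:
        """Render the profile as a role-play instruction for a resident agent."""
        return (
            f"You are a resident of this community. Age group: {self.age_group}; "
            f"household: {self.household}; income level: {self.income}."
        )


def sample_residents(
    n: int, census: Dict[str, Dict[str, float]], seed: int = 0
) -> List[ResidentProfile]:
    """Draw n profiles, sampling each attribute independently (an assumption)."""
    rng = random.Random(seed)
    residents = []
    for _ in range(n):
        attrs = {
            field: rng.choices(list(shares), weights=list(shares.values()), k=1)[0]
            for field, shares in census.items()
        }
        residents.append(ResidentProfile(**attrs))
    return residents


residents = sample_residents(100, CENSUS, seed=42)
print(residents[0].to_prompt())
```

Independent sampling ignores correlations between attributes (for example, age and household type); a joint distribution would be more faithful where the data allow it.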

Problem Statement

Traditional participatory planning is slow, costly, and hard to scale to thousands of residents. How can we simulate many stakeholders cheaply and efficiently so planners can create land-use plans that actually reflect diverse residents' needs?

Main Contribution

A multi-agent LLM framework that role-plays a planner and many residents to simulate participatory urban planning.

A fishbowl discussion mechanism (inner/outer circles + summaries) to scale resident discussion and limit context length.

Deployment on two real Beijing regions with quantitative metrics showing higher resident satisfaction and inclusion than baselines and human experts.
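The fishbowl mechanism described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the inner circle is re-sampled at random each round, the outer circle only sees the running summary, and `speak` and `summarize` are hypothetical stand-ins for LLM calls.

```python
import random
from typing import Callable, List

spoken_log: List[str] = []  # records who spoke, for inspection only


def fishbowl_discussion(
    residents: List[str],
    speak: Callable[[str, str, str], str],       # (resident, plan, summary) -> opinion
    summarize: Callable[[str, List[str]], str],  # (old summary, opinions) -> new summary
    plan: str,
    rounds: int = 3,
    inner_size: int = 5,
    seed: int = 0,
) -> str:
    """Run several rounds; only the summary is carried between rounds."""
    rng = random.Random(seed)
    summary = ""
    for _ in range(rounds):
        # Assumption: the inner circle is a fresh random sample every round.
        inner = rng.sample(residents, k=min(inner_size, len(residents)))
        opinions = [speak(r, plan, summary) for r in inner]
        # Replacing the transcript with a summary keeps the context bounded.
        summary = summarize(summary, opinions)
    return summary


# Stubs standing in for LLM calls (hypothetical).
def fake_speak(resident: str, plan: str, summary: str) -> str:
    spoken_log.append(resident)
    return f"{resident} asks for a park within walking distance"


def fake_summarize(previous: str, opinions: List[str]) -> str:
    return f"Round summary over {len(opinions)} opinions; last: {opinions[-1]}"


final_summary = fishbowl_discussion(
    residents=[f"resident_{i}" for i in range(20)],
    speak=fake_speak,
    summarize=fake_summarize,
    plan="initial land-use map description",
    rounds=3,
    inner_size=4,
)
print(final_summary)
```

The planner would then receive `final_summary` together with the current plan when producing its revision.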

Key Findings

Simulated participatory planning raised resident Satisfaction to 0.787 on HLG.

Numbers: Satisfaction 0.787 (HLG) vs 0.708 (best baseline, DRL)

Inclusion for marginalized groups improved to 0.773 on HLG.

Numbers: Inclusion 0.773 (HLG) vs 0.716 (DRL)

Service accessibility remained competitive while optimizing for residents.

Numbers: Service 0.756 (HLG), close to the top baseline of 0.773 (DRL)

Fishbowl rounds materially affect outcomes: 3 rounds improved need-aware metrics.

Numbers: Satisfaction rose 6.6% on HLG when rounds increased from 1 to 3

Ablations show role-play and discussion matter.

Numbers: Removing role-play or discussion drops Satisfaction/Inclusion by ~4–8%

Results

| Metric | Value | Baseline |
| --- | --- | --- |
| Satisfaction | 0.787 | DRL 0.708 (best baseline) |
| Inclusion | 0.773 | DRL 0.716 |
| Service | 0.756 | DRL 0.773 (best) |
| Ecology | 0.713 | DRL 0.747 |
| Satisfaction | 0.778 | Decentralized 0.687 |
| Service | 0.760 | Decentralized 0.743 |
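To put the first four Results entries (the HLG figures compared against DRL) on a common scale, the short calculation below expresses each as a relative change over its baseline. The percentages are derived here from the listed values and are not reported in the source; `relative_change` is an illustrative helper.

```python
def relative_change(value: float, baseline: float) -> float:
    """Relative change of value over baseline, in percent."""
    return (value - baseline) / baseline * 100


# (method value, DRL baseline) pairs copied from the Results entries.
hlg_vs_drl = {
    "Satisfaction": (0.787, 0.708),
    "Inclusion": (0.773, 0.716),
    "Service": (0.756, 0.773),
    "Ecology": (0.713, 0.747),
}

changes = {
    metric: round(relative_change(value, baseline), 1)
    for metric, (value, baseline) in hlg_vs_drl.items()
}

for metric, pct in changes.items():
    print(f"{metric}: {pct:+.1f}% vs DRL")
```

Read this way, the need-aware metrics gain roughly 8 to 11 percent relative to DRL, while Service and Ecology give up roughly 2 to 5 percent.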

Who Should Care

What To Try In 7 Days

Run a small pilot: create 100 resident agents from local demographics and role-play one community, using GPT-4 vision as the planner and GPT-3.5 for the resident agents.

Use 3 fishbowl rounds and compare Satisfaction/Inclusion vs a planner-only baseline.

Produce summaries after each round to keep context short and reuse them in prompts.
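For the last step, one way to reuse a round summary in the next round's prompt is sketched below. The template wording and the `build_resident_prompt` helper are hypothetical; the paper's actual prompts are not reproduced here.

```python
def build_resident_prompt(
    profile: str, plan_description: str, prior_summary: str = ""
) -> str:
    """Assemble a resident prompt that carries only the summary, not the full transcript."""
    sections = [
        f"Role: {profile}",
        f"Current land-use plan: {plan_description}",
    ]
    if prior_summary:
        sections.append(f"Summary of earlier discussion: {prior_summary}")
    else:
        sections.append("This is the first discussion round; no earlier summary exists.")
    sections.append("State your needs and what you would change in the plan.")
    return "\n".join(sections)


first_round = build_resident_prompt(
    profile="retired resident living alone",
    plan_description="block A: residential; block B: commercial; block C: vacant",
)
later_round = build_resident_prompt(
    profile="retired resident living alone",
    plan_description="block A: residential; block B: commercial; block C: vacant",
    prior_summary="Several residents asked for a clinic and green space on block C.",
)
print(later_round)
```

Because each prompt carries a single summary rather than every earlier turn, prompt length stays roughly constant across rounds.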

Agent Features

Memory

  • Short-term discussion summary
  • Round-by-round history aggregation

Planning

  • Planning with LLMs
  • Task Decomposition
  • Community-level revision

Tool Use

  • Multimodal map input
  • Prompt-based role-play

Frameworks

  • Inner/outer fishbowl
  • Role-play prompts

Is Agentic

true

Architectures

  • GPT-4-vision
  • GPT-3.5

Collaboration

  • Fishbowl discussion
  • Sequential community revision

Optimization Features

Token Efficiency

  • Use summaries to limit token growth

Inference Optimization

  • Reduce context by summarizing rounds
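A back-of-the-envelope model shows why summarizing rounds limits token growth. All numbers below are hypothetical, and the two functions are illustrative helpers rather than measurements from the paper.

```python
def context_tokens_full(
    plan_tokens: int, speakers: int, tokens_per_turn: int, round_idx: int
) -> int:
    """Context size at the start of a round if every earlier turn is kept verbatim."""
    return plan_tokens + (round_idx - 1) * speakers * tokens_per_turn


def context_tokens_summarized(
    plan_tokens: int, summary_tokens: int, round_idx: int
) -> int:
    """Context size at the start of a round if earlier rounds are replaced by one summary."""
    return plan_tokens + (summary_tokens if round_idx > 1 else 0)


# Hypothetical sizes: 500-token plan, 10 speakers per round,
# 150 tokens per turn, 200-token summary.
for r in (1, 2, 3):
    full = context_tokens_full(500, 10, 150, r)
    short = context_tokens_summarized(500, 200, r)
    print(f"round {r}: full history = {full} tokens, summarized = {short} tokens")
```

Under these assumptions the full-history context grows linearly with the number of rounds, while the summarized context stays flat after the first round.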

Reproducibility

Open Source Status

  • no

Risks & Boundaries

Limitations

  • Does not model ownership, development cost, or regulatory constraints.
  • Land-use types and requirements are simplified to eight categories.
  • Relies heavily on manually designed prompts and map descriptions.

When Not To Use

  • For legally binding or final planning decisions that require ownership/cost modeling.
  • When decisions must follow a transparent, auditable chain that does not depend on prompt engineering.
  • If the site requires domain data not encoded in prompts (e.g., soil, utilities).

Failure Modes

  • LLM-generated residents may hallucinate unrealistic needs or locations.
  • Prompt bias can skew which resident concerns are surfaced.
  • Long discussions could stagnate or degrade outcomes if rounds exceed ~3.
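One possible guard against the last failure mode is to cap the number of rounds and stop as soon as a tracked score stops improving. The sketch below assumes a per-round score is available (for example, a Satisfaction estimate); the threshold, the cap, and the `pick_stopping_round` helper are hypothetical.

```python
from typing import List


def pick_stopping_round(
    scores: List[float], min_gain: float = 0.005, max_rounds: int = 3
) -> int:
    """Return the 1-indexed round to stop at.

    scores[i] is the score observed after round i + 1. The search stops at the
    first round that fails to improve on the best round so far by min_gain,
    and never goes past max_rounds.
    """
    best_round = 1
    for r in range(2, min(len(scores), max_rounds) + 1):
        if scores[r - 1] - scores[best_round - 1] < min_gain:
            break
        best_round = r
    return best_round


# Hypothetical per-round scores.
improving = [0.70, 0.74, 0.76, 0.75]
stagnating = [0.70, 0.70, 0.80]

print(pick_stopping_round(improving))   # capped at max_rounds
print(pick_stopping_round(stagnating))  # stops at the first stagnation
```

The rule is deliberately conservative: it halts at the first round without a meaningful gain, even if a later round might recover.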

Core Entities

Models

  • gpt-4-vision-preview
  • gpt-3.5-turbo-1106

Metrics

  • Service
  • Ecology
  • Satisfaction
  • Inclusion

Datasets

  • Huilongguan (HLG)
  • Dahongmen (DHM)