UrbanKGent: an LLM agent that builds city-scale knowledge graphs cheaper and more accurately using geospatial tools

February 10, 2024 · 8 min

Overview

Production Readiness

0.7

Novelty Score

0.65

Cost Impact Score

0.8

Citation Count

3

Authors

Yansong Ning, Hao Liu

Links

Abstract / PDF

Why It Matters For Business

UrbanKGent lets teams build large, practical city knowledge graphs with small open models, cutting inference costs roughly 20× and lowering data needs, so you can deploy KG-driven city apps faster and cheaper.

Summary TLDR

UrbanKGent is an LLM-agent pipeline that turns raw urban text and geo-data into large urban knowledge graphs. It builds city-scale graphs by: (1) creating heterogeneity-aware, geospatial-infused instructions; (2) distilling GPT-4 chain-of-thought trajectories and refining them with external geospatial tools; (3) fine-tuning Llama-family models with LoRA. The result: fine-tuned 7/8/13B agents that match or beat GPT-4 on urban triplet extraction and relation completion, cut inference cost ≈20×, and construct comparable UrbanKGs using ≈20% of the original data volume.

Problem Statement

Building urban knowledge graphs is labor-intensive and brittle: prior pipelines need hand-crafted rules or expensive annotated corpora. Off‑the‑shelf LLMs struggle with heterogeneous urban relations (spatial, temporal, functional) and with geospatial computation (distance, containment). The paper asks: can a domain-tailored LLM agent combine knowledge-aware prompts, geospatial tools, and distilled reasoning to automate UrbanKG construction cost-effectively?

Main Contribution

UrbanKGent: an end-to-end LLM agent framework that combines heterogeneity-aware instructions, geospatial tool calls, trajectory refinement, and hybrid fine-tuning to build UrbanKGs.

A geospatial-infused instruction set plus a tool-augmented iterative trajectory refinement method that distills GPT-4 chains-of-thought into faithful training trajectories.

Open-source UrbanKGent agents (7B/8B/13B) fine-tuned with LoRA that deliver state-of-the-art UrbanKGC performance while cutting inference cost substantially.

Key Findings

Fine-tuned UrbanKGent-13B outperforms GPT-4 in UrbanKGC accuracy on the evaluated datasets.

Numbers: on NYC, roughly +15% (RTE) and +14% (KGC) accuracy vs GPT-4 on the evaluated splits

UrbanKGent constructs UrbanKGs at similar scale using far less input data.

Numbers: Constructed an UrbanKG with a similar number of entities and triplets using ~20% of the data used by the prior benchmark

Geospatial tool invocation and iterative refinement materially improve KGC quality.

Numbers: Removing tool invocation cut KGC human accuracy from ~0.43 to ~0.23 in the ablation (≈20 pp drop)

Results

Accuracy

Value: 0.55

Baseline: Zero-shot (ZSL), human evaluation: 0.42

Accuracy

Value: 0.46

Baseline: Zero-shot (ZSL), human evaluation: 0.31

Inference cost (total)

Value: ≈20× lower cost

Baseline: GPT-4 inference

Data efficiency for KG construction

Value: ≈5× less data required

Baseline: UUKG benchmark pipeline

Who Should Care

What To Try In 7 Days

Run a quick pilot: fine-tune a Llama-7B model with LoRA on a few hundred domain instructions distilled from GPT-4.

Wrap simple geospatial utilities (distance, contains, intersects) and call them from your prompts to handle geometry accurately.

Validate outputs on 200 samples via human labels and GPT-4 evaluation to measure accuracy and calibrate filters.
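The validation step above can be sketched as two small measurements: accuracy from human labels, and how often an automatic judge agrees with those labels. This is a minimal sketch; the function names and the boolean "correct / incorrect" label format are assumptions for illustration, not something the paper specifies.

```python
from typing import Sequence


def accuracy(judgments: Sequence[bool]) -> float:
    """Fraction of sampled outputs judged correct."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(judgments) / len(judgments)


def agreement_rate(human: Sequence[bool], auto: Sequence[bool]) -> float:
    """Fraction of samples where the automatic judge matches the human label."""
    if len(human) != len(auto) or not human:
        raise ValueError("label lists must be non-empty and equally long")
    return sum(h == a for h, a in zip(human, auto)) / len(human)


# Hypothetical labels for four sampled triplets (True = judged correct).
human_labels = [True, False, True, True]
auto_labels = [True, True, True, True]

human_acc = accuracy(human_labels)
judge_agreement = agreement_rate(human_labels, auto_labels)
```

A low agreement rate suggests the automatic judge should not be trusted on its own for filtering.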

Agent Features

Memory

  • trajectory distillation (saved CoT steps used as instruction targets)

Planning

  • iterative self-refinement (verifier + updater)
  • multi-turn multi-view instruction dialogs
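The verifier + updater loop named above can be sketched as follows. This is only an illustration of the control flow, assuming a verifier that returns feedback text (or `None` when it accepts the trajectory) and an updater that revises the trajectory from that feedback. The names `refine_trajectory`, `toy_verify`, and `toy_update` are hypothetical; in the paper's setting both roles would be played by LLM calls plus geospatial tools.

```python
from typing import Callable, Optional


def refine_trajectory(
    trajectory: str,
    verify: Callable[[str], Optional[str]],
    update: Callable[[str, str], str],
    max_rounds: int = 3,
) -> str:
    """Repeat verify -> update until the verifier accepts or rounds run out."""
    for _ in range(max_rounds):
        feedback = verify(trajectory)
        if feedback is None:  # verifier found nothing to fix
            break
        trajectory = update(trajectory, feedback)
    return trajectory


# Toy stand-ins for the LLM-backed verifier and updater (assumptions).
def toy_verify(trajectory: str) -> Optional[str]:
    return "distance is missing" if "distance=?" in trajectory else None


def toy_update(trajectory: str, feedback: str) -> str:
    return trajectory.replace("distance=?", "distance=1.2km")


refined = refine_trajectory("A near B, distance=?", toy_verify, toy_update)
```

Capping the number of rounds keeps the loop bounded even when the verifier never accepts.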

Tool Use

  • external geospatial toolkit (distance, containment, geohash, intersection)
  • self-programmed tool interfaces generated via GPT-4
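As a rough idea of what such a toolkit can look like, here is a minimal standard-library sketch of distance, containment, and intersection checks. It assumes points given as (latitude, longitude) in degrees and regions simplified to bounding boxes `(min_lat, min_lon, max_lat, max_lon)`. These function names and the bounding-box simplification are illustrative assumptions, not the paper's actual tool interfaces.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def bbox_contains(bbox: tuple, lat: float, lon: float) -> bool:
    """True if the point lies inside (or on the edge of) the bounding box."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def bbox_intersects(a: tuple, b: tuple) -> bool:
    """True if two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
```

The point of delegating these checks to code is that the model only has to decide which tool to call, while the numeric result comes from deterministic arithmetic.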

Frameworks

  • FireAct-style reasoning distillation
  • LoRA

Is Agentic

true

Architectures

  • LLM agent pipeline (Llama-family fine-tuned)
  • chain-of-thought distillation

Collaboration

  • uses GPT-4 both as trajectory teacher and as an automatic evaluator

Optimization Features

Token Efficiency

  • reduces GPT API dependence by using local LLMs

Infra Optimization

  • counts GPU runtime to estimate cost vs GPT API
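A comparison of this kind reduces to simple arithmetic, sketched below. All prices and usage figures here are placeholders chosen for illustration, not numbers from the paper, and the function names are hypothetical.

```python
def gpu_cost_usd(runtime_hours: float, num_gpus: int, usd_per_gpu_hour: float) -> float:
    """Cost of local inference, estimated from measured GPU runtime."""
    return runtime_hours * num_gpus * usd_per_gpu_hour


def api_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    usd_per_1k_prompt: float,
    usd_per_1k_completion: float,
) -> float:
    """Cost of hosted-API inference, estimated from token counts."""
    return (
        prompt_tokens / 1000 * usd_per_1k_prompt
        + completion_tokens / 1000 * usd_per_1k_completion
    )


# Placeholder inputs (assumptions): substitute your own measurements and prices.
local = gpu_cost_usd(runtime_hours=2.0, num_gpus=4, usd_per_gpu_hour=1.5)
hosted = api_cost_usd(
    prompt_tokens=2_000_000,
    completion_tokens=500_000,
    usd_per_1k_prompt=0.03,
    usd_per_1k_completion=0.06,
)
ratio = hosted / local  # values above 1 mean the local model is cheaper
```

The ratio is only as good as its inputs: GPU rental price, batch utilisation, and token counts all shift the result.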

Model Optimization

  • LoRA
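For readers unfamiliar with LoRA, the standard formulation is shown here as a reminder. The frozen pretrained weight $W_0$ is left untouched and only the two small matrices $A$ and $B$ are trained. The rank $r$ and scaling factor $\alpha$ are hyperparameters, and this summary does not state the values used for UrbanKGent.

```latex
h = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Because $r$ is small, the trainable parameter count $r(d + k)$ is a small fraction of the $dk$ parameters in $W_0$, which is what makes fine-tuning 7B to 13B models affordable.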

System Optimization

  • A800 GPUs for batch inference

Training Optimization

  • hybrid instruction fine-tuning on distilled and refined GPT-4 trajectories
  • multi-view instruction mixture training

Inference Optimization

  • deploy smaller fine-tuned models (7B/8B/13B) instead of calling GPT-4
  • use task-specific prompts to reduce unnecessary tokens

Reproducibility

Data Urls

  • NYC/CHI data collected from public sources (NYC.gov, Chicago.gov, OpenStreetMap, Google Maps, Wikipedia, C4); dataset files referenced in repo

Code Available

Data Available

Open Source Status

  • yes

Risks & Boundaries

Limitations

  • Evaluation relies heavily on GPT-4 as an automatic judge, which, while correlated with human judgments, is not flawless.
  • Applications shown are limited to two cities (New York City and Chicago); generality to other urban contexts needs testing.
  • No image or multimodal data used; spatial reasoning using imagery is not addressed.

When Not To Use

  • When you need provable, traceable geospatial decisions without LLM ambiguity.
  • In safety-critical deployments before human-in-the-loop validation and legal review.
  • When you require multimodal (map imagery) signals not supported by this pipeline.

Failure Modes

  • LLM hallucinations producing incorrect triplets from noisy web text.
  • Tool invocation errors or mis-integration leading to wrong geospatial inferences.
  • Over-merged relation types during automated relation clustering causing loss of precision.

Core Entities

Models

  • UrbanKGent-7B
  • UrbanKGent-8B
  • UrbanKGent-13B
  • Llama-2-7B
  • Llama-2-13B
  • Llama-3-8B
  • Llama-2-70B
  • Llama-3-70B
  • GPT-3.5
  • GPT-4
  • Vicuna-7B
  • Alpaca-7B
  • Mistral-7B

Metrics

  • Accuracy
  • GPT-4 confidence
  • inference latency (minutes per dataset)
  • inference cost (USD per 1,000 tasks)

Datasets

  • NYC-Instruct
  • CHI-Instruct
  • NYC
  • CHI
  • NYC-Large
  • CHI-Large
  • UUKG (benchmark)

Benchmarks

  • UUKG