Use scene graphs + LLMs to split long robot goals into short sub-goals so classical planners solve them fast and reliably

April 4, 2024 · 7 min

Overview

Production Readiness

0.6

Novelty Score

0.45

Cost Impact Score

0.55

Citation Count

6

Authors

Yuchen Liu, Luigi Palmieri, Sebastian Koch, Ilche Georgievski, Marco Aiello

Why It Matters For Business

DELTA turns large, slow planning problems into fast, reliable sub-problems so robots can plan long household workflows quickly and with higher success, cutting compute and time costs when paired with a strong LLM.

Summary TLDR

DELTA feeds compact 3D scene graphs into an LLM to (1) generate PDDL domain/problem files, (2) prune irrelevant objects, and (3) decompose long goals into a sequence of sub-goals. An off-the-shelf automated planner then solves each sub-problem autoregressively. In simulation on five household-style domains and 3DSG scenes, DELTA with GPT-4o reached success rates of 98–100% on the independent-task domains and 80% and 74.67% on two dependent-task domains. On PC Assembly, decomposition cut planner time by roughly 3,860× and expanded search nodes by roughly 2,536× compared with planning without decomposition. DELTA also achieved higher success rates than several LLM baselines.
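
The autoregressive step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy three-room world, the `bfs_plan` breadth-first planner (standing in for an off-the-shelf planner) and the hand-written `sub_goals` list (standing in for LLM output) are all hypothetical.

```python
from collections import deque

# Hypothetical toy world: three rooms in a line; the robot carries one item at a time.
ROOMS = {"kitchen": ["hall"], "hall": ["kitchen", "office"], "office": ["hall"]}

def successors(state):
    """Yield (action, next_state); state = (robot_room, holding, item_locations)."""
    robot, holding, items = state
    for nxt in ROOMS[robot]:
        yield f"move {robot}->{nxt}", (nxt, holding, items)
    if holding is None:
        for name, room in items:
            if room == robot:
                rest = tuple(i for i in items if i[0] != name)
                yield f"pick {name}", (robot, name, rest)
    else:
        placed = tuple(sorted(items + ((holding, robot),)))
        yield f"place {holding}", (robot, None, placed)

def bfs_plan(start, goal):
    """Breadth-first search; goal is a set of (item, room) pairs that must all hold."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= set(state[2]):
            return plan, state
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))
    raise ValueError("no plan found")

def solve_autoregressively(start, sub_goals):
    """Solve sub-goals in order; each search starts where the previous one ended."""
    state, full_plan, achieved = start, [], set()
    for sub_goal in sub_goals:
        achieved |= sub_goal  # assumption: keep earlier sub-goals satisfied
        plan, state = bfs_plan(state, achieved)
        full_plan += plan
    return full_plan, state

start = ("hall", None, (("cup", "kitchen"), ("laptop", "kitchen")))
sub_goals = [{("cup", "office")}, {("laptop", "office")}]  # stand-in for LLM output
plan, final_state = solve_autoregressively(start, sub_goals)
```

Each call to `bfs_plan` starts from the state the previous call ended in, which is what keeps every individual search small. Accumulating earlier sub-goals into the current goal is a design choice of this sketch so that later steps do not undo earlier ones.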

Problem Statement

Robots struggle to plan long sequences in large scenes because raw LLM plans are often infeasible and classical planners explode in complexity when given many irrelevant objects; we need an automated pipeline that turns scene graphs and a user goal into correct, executable, and efficient task plans.

Main Contribution

DELTA: a 5-step pipeline that combines scene graphs and LLMs to auto-generate PDDL domain/problem files, prune irrelevant items, decompose long goals, and plan sub-tasks autoregressively.

Demonstration that goal decomposition plus SG pruning yields large speedups and higher success rates on long-term household planning tasks versus several LLM-based baselines.

Empirical ablation across LLMs (GPT-4o, GPT-4-turbo, Llama-3-70B) showing strong sensitivity to model choice and failure modes (predicate errors, missing attributes, planner timeout).

Key Findings

DELTA with GPT-4o achieves the highest success rates across the evaluated domains.

Numbers: PC 98%, Dining 100%, Cleaning 80%, Office 74.67% (Table II)

Goal decomposition cuts automated-planner runtime by more than three orders of magnitude.

Numbers: PC planning time 0.0134 s vs 51.76 s (≈3860× faster) when decomposed (Table III)

Scene-graph pruning plus decomposition drastically shrinks planner search.

Numbers: expanded nodes 624.8 vs 1,585,185 (≈2536× fewer) in PC when decomposed (Table III)

Performance depends strongly on LLM choice.

Numbers: DELTA (GPT-4o) Office 74.67% → GPT-4-turbo 9.33% → Llama-3-70B 0.67% (Table II)

Top failure modes are incorrect predicates, missing attributes, and planner timeouts.

Numbers: 21/600 trials planner timeout; 37/600 incorrect predicates; 25/600 missing attributes (Fig. 4)
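
The ratios and failure shares quoted above can be re-derived from the raw figures. The short check below is a sketch that only uses the numbers listed in this summary (the variable names are illustrative); the results agree with the quoted factors up to rounding.

```python
# Figures copied from the Key Findings above (PC domain, Table III; Fig. 4).
time_decomposed_s, time_monolithic_s = 0.0134, 51.76
nodes_decomposed, nodes_monolithic = 624.8, 1_585_185

speedup = time_monolithic_s / time_decomposed_s      # planner runtime ratio
node_reduction = nodes_monolithic / nodes_decomposed  # expanded-node ratio

failures = {"planner timeout": 21, "incorrect predicates": 37, "missing attributes": 25}
total_trials = 600
failure_share = {name: count / total_trials for name, count in failures.items()}

print(f"speedup: {speedup:.0f}x, node reduction: {node_reduction:.0f}x")
for name, share in failure_share.items():
    print(f"{name}: {share:.2%}")
```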

Results

success rate (DELTA, GPT-4o)

Value: PC 98%; Dining 100%; Cleaning 80%; Office 74.67%

Baseline: various LLM baselines (Table II)

planning time (per-domain)

Value: PC 0.0134 s (DELTA) vs 51.76 s (no decomposition)

Baseline: DELTA (w/o decomposition)

expanded nodes (search)

Value: PC 624.8 (DELTA) vs 1,585,185 (w/o decomposition)

Baseline: DELTA (w/o decomposition)

sensitivity to LLM

Value: Office: DELTA GPT-4o 74.67% → GPT-4-turbo 9.33% → Llama-3-70B 0.67%

Baseline: DELTA GPT-4o

Who Should Care

What To Try In 7 Days

Build or load a scene graph for your environment and test SG pruning to reduce irrelevant objects.

Use an LLM to generate a PDDL domain/problem and compare planning time vs your current pipeline.

Implement simple goal decomposition (sequence of sub-goals) and run an off-the-shelf planner per sub-goal to measure speedups.
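
As a starting point for the pruning experiment, the sketch below drops every item that is not relevant to the goal while keeping room topology intact. The dict layout, the `prune_scene_graph` helper, and the hand-picked `relevant_items` set are assumptions made for illustration; in DELTA the relevance decision comes from the LLM, and a real 3D scene graph carries many more attributes.

```python
def prune_scene_graph(scene_graph, relevant_items):
    """Return a copy that keeps all rooms and edges but only the relevant items."""
    return {
        room: {
            "neighbors": list(info["neighbors"]),
            "items": [i for i in info["items"] if i in relevant_items],
        }
        for room, info in scene_graph.items()
    }

def count_items(scene_graph):
    """Total number of item nodes across all rooms."""
    return sum(len(info["items"]) for info in scene_graph.values())

# Hypothetical scene graph: room -> neighbors and contained items.
scene_graph = {
    "kitchen": {"neighbors": ["hall"], "items": ["mug", "plate", "toaster"]},
    "hall": {"neighbors": ["kitchen", "dining_room"], "items": ["umbrella"]},
    "dining_room": {"neighbors": ["hall"], "items": ["table", "vase"]},
}

# Assumption: for a "set the dining table" goal, only these items matter.
relevant_items = {"mug", "plate", "table"}
pruned = prune_scene_graph(scene_graph, relevant_items)
print(count_items(scene_graph), "->", count_items(pruned))
```

Comparing item counts (and the token length of the serialized graph) before and after pruning gives a quick measure of how much the LLM prompt and the planner's object set shrink.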

Agent Features

Memory

  • uses 3D Scene Graphs as structured environment memory

Planning

  • goal decomposition
  • one-shot domain generation (PDDL)
  • autogeneration of PDDL problem files
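
To make the problem-file step concrete, here is a minimal sketch that renders a PDDL problem from a dict-based scene graph with plain string templating. It is not the paper's generator, which relies on an LLM; the predicate names (`agent_at`, `item_at`, `connected`), the type names, and the `build_problem` helper are hypothetical and would have to match whatever domain file is actually in use.

```python
def build_problem(name, domain, scene_graph, agent, agent_room, goal_facts):
    """Render a PDDL problem string from a dict-based scene graph."""
    rooms = sorted(scene_graph)
    items = sorted(i for info in scene_graph.values() for i in info["items"])
    init = [f"(agent_at {agent} {agent_room})"]
    for room in rooms:
        init += [f"(connected {room} {n})" for n in scene_graph[room]["neighbors"]]
        init += [f"(item_at {i} {room})" for i in scene_graph[room]["items"]]
    goal = " ".join(f"({' '.join(fact)})" for fact in goal_facts)
    return "\n".join([
        f"(define (problem {name})",
        f"  (:domain {domain})",
        f"  (:objects {' '.join(rooms)} - room {' '.join(items)} - item {agent} - agent)",
        "  (:init " + " ".join(init) + ")",
        f"  (:goal (and {goal}))",
        ")",
    ])

# Hypothetical two-room scene and a single goal fact.
scene_graph = {
    "kitchen": {"neighbors": ["hall"], "items": ["mug"]},
    "hall": {"neighbors": ["kitchen"], "items": []},
}
problem = build_problem(
    "fetch_mug", "household", scene_graph, "robot1", "hall",
    [("item_at", "mug", "hall")],
)
print(problem)
```

Generating the problem file deterministically from the graph, where possible, leaves the LLM with fewer chances to introduce wrong or missing facts.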

Tool Use

  • PDDL generation via LLM
  • Fast Downward planner
  • PDDLGym simulation
  • VAL plan validator

Frameworks

  • PDDL

Is Agentic

true

Architectures

  • LLM + automated planner pipeline
  • autoregressive sub-task planning

Optimization Features

Token Efficiency

  • SG pruning reduces token usage and hallucination risk

System Optimization

  • autoregressive solving of sub-problems for faster planning

Inference Optimization

  • scene graph pruning to cut LLM token input
  • goal decomposition to reduce planner search space

Reproducibility

Data Urls

  • 3D Scene Graph dataset (Armeni et al.)

Code Available

Data Available

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Assumes full observability and pre-built scene graphs; real perception noise not tested.
  • Sensitive to LLM choice: smaller models perform much worse.
  • Some failures stem from incorrect predicates or missing attributes generated by the LLM.
  • Evaluation in simulation only; real-world dynamics and uncertainties remain unaddressed.

When Not To Use

  • When you lack a reliable scene graph or cannot precompute environment topology.
  • If you must run planning on a small, resource-constrained LLM without an external planner.
  • For highly dynamic tasks requiring continuous perception and real-time reactivity.

Failure Modes

  • Planner timeout on undecomposed or unpruned problems.
  • LLM-generated incorrect predicates (wrong relations between rooms/items).
  • Missing attributes in generated PDDL (item accessibility etc.).
  • Performance collapse with smaller or cheaper LLMs.
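
Predicate errors of the kind listed above can often be caught before the planner runs. The sketch below is one possible lightweight check, assuming plain PDDL text as input: it reports predicates used in a problem's `:init` block that the domain never declares. The tiny S-expression reader and the `undeclared_predicates` helper are illustrative only; they do not replace a full validator such as VAL and ignore typing, arity, and the `:goal` block.

```python
import re

def parse_sexpr(text):
    """Parse PDDL text into nested lists of lowercase tokens."""
    tokens = re.findall(r"[()]|[^\s()]+", text)

    def read(pos):
        items = []
        while pos < len(tokens):
            tok = tokens[pos]
            if tok == "(":
                sub, pos = read(pos + 1)
                items.append(sub)
            elif tok == ")":
                return items, pos + 1
            else:
                items.append(tok.lower())
                pos += 1
        return items, pos

    return read(0)[0]

def find_section(tree, keyword):
    """Return the first nested list whose head is `keyword`, or None."""
    if isinstance(tree, list):
        if tree and tree[0] == keyword:
            return tree
        for child in tree:
            found = find_section(child, keyword)
            if found is not None:
                return found
    return None

def undeclared_predicates(domain_text, problem_text):
    """List predicates used in :init that are missing from :predicates."""
    declared = {p[0] for p in find_section(parse_sexpr(domain_text), ":predicates")[1:]}
    init = find_section(parse_sexpr(problem_text), ":init")[1:]
    used = {fact[0] for fact in init if isinstance(fact, list)}
    return sorted(used - declared)

# Hypothetical domain and problem fragments for illustration.
domain = """(define (domain household)
  (:predicates (item_at ?i - item ?r - room) (connected ?a - room ?b - room)))"""
problem = """(define (problem p1) (:domain household)
  (:init (item_at mug kitchen) (connected kitchen hall) (accessible mug)))"""
print(undeclared_predicates(domain, problem))
```

A non-empty result is a cheap signal to re-prompt the LLM or repair the file before spending planner time on it.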

Core Entities

Models

  • GPT-4o
  • GPT-4-turbo
  • Llama-3-70B

Metrics

  • success rate
  • plan length
  • planning time
  • expanded nodes

Datasets

  • 3D Scene Graph dataset (Armeni et al.)

Benchmarks

  • Laundry
  • PC Assembly
  • Dining Table Setup
  • House Cleaning
  • Home Office Setup