Use scene graphs + LLMs to split long robot goals into short sub-goals so classical planners solve them fast and reliably

April 4, 2024 · 7 min

Overview

Decision Snapshot: Needs Validation

Strong simulation evidence shows big speed and success gains, but real-robot validation and robustness to smaller LLMs are missing.

Citations: 6

Evidence Strength: 0.80

Confidence: 0.87

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 4/4

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 55%

Production readiness: 60%

Novelty: 45%

Authors

Yuchen Liu, Luigi Palmieri, Sebastian Koch, Ilche Georgievski, Marco Aiello

Why It Matters For Business

DELTA turns large, slow planning problems into fast, reliable sub-problems so robots can plan long household workflows quickly and with higher success, cutting compute and time costs when paired with a strong LLM.

Summary TLDR

DELTA feeds compact 3D scene graphs into an LLM to (1) generate PDDL domain/problem files, (2) prune irrelevant objects, and (3) decompose long goals into a sequence of sub-goals. An off-the-shelf automated planner then solves each sub-problem autoregressively. In simulation on five household-style domains and 3DSG scenes, DELTA (with GPT-4o) reached high success rates: 98–100% on independent-task domains, and 80% and 74.7% on the two dependent-task domains. It also cut planner time by roughly three orders of magnitude and expanded search nodes by roughly two, compared with planning without decomposition and with several LLM baselines.
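To make the pruning step concrete, here is a minimal sketch that treats a scene graph as a plain dictionary and keeps only goal-relevant items. In DELTA the LLM decides what is relevant; in this sketch a hand-written `relevant_items` set stands in for that decision. The dictionary layout and the names `SceneGraph`, `prune_scene_graph`, and `scene_graph` are assumptions for illustration, not the paper's data format.

```python
from typing import Dict, List, Set

# Hypothetical scene graph layout: room name -> list of item names.
SceneGraph = Dict[str, List[str]]


def prune_scene_graph(graph: SceneGraph, relevant_items: Set[str]) -> SceneGraph:
    """Keep only relevant items and drop rooms that end up empty."""
    pruned: SceneGraph = {}
    for room, items in graph.items():
        kept = [item for item in items if item in relevant_items]
        if kept:
            pruned[room] = kept
    return pruned


scene_graph = {
    "kitchen": ["plate", "fork", "toaster"],
    "bedroom": ["pillow", "lamp"],
    "dining_room": ["table", "vase"],
}
# Assumption: this set would come from the LLM, given the user goal.
pruned = prune_scene_graph(scene_graph, {"plate", "fork", "table"})
print(pruned)
```

A real pruner would probably also need to keep rooms the robot must pass through, even when they hold no relevant items. This toy version ignores connectivity.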

Problem Statement

Robots struggle to plan long sequences in large scenes because raw LLM plans are often infeasible and classical planners explode in complexity when given many irrelevant objects; we need an automated pipeline that turns scene graphs and a user goal into correct, executable, and efficient task plans.

Main Contribution

DELTA: a 5-step pipeline that combines scene graphs and LLMs to auto-generate PDDL domain/problem files, prune irrelevant items, decompose long goals, and plan sub-tasks autoregressively.

Demonstration that goal decomposition plus SG pruning yields large speedups and higher success rates on long-term household planning tasks versus several LLM-based baselines.
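The effect of goal decomposition on search effort can be illustrated with a toy example. The sketch below is not DELTA's planner and not a household domain: it uses a plain breadth-first search over an abstract state in which each of four tasks needs three actions. `STEPS_PER_TASK`, `bfs`, and `solve_decomposed` are hypothetical names. It compares the nodes expanded when solving the whole goal at once against solving one sub-goal at a time and feeding each end state into the next search.

```python
from collections import deque
from typing import Callable, List, Tuple

State = Tuple[int, ...]   # progress counter per task
STEPS_PER_TASK = 3        # hypothetical: every task needs 3 actions


def successors(state: State) -> List[Tuple[int, State]]:
    """Each action advances one unfinished task by one step."""
    result = []
    for i, progress in enumerate(state):
        if progress < STEPS_PER_TASK:
            result.append((i, state[:i] + (progress + 1,) + state[i + 1:]))
    return result


def bfs(start: State, is_goal: Callable[[State], bool]):
    """Breadth-first search; returns (plan, final_state, expanded_nodes)."""
    queue = deque([(start, [])])
    seen = {start}
    expanded = 0
    while queue:
        state, plan = queue.popleft()
        if is_goal(state):
            return plan, state, expanded
        expanded += 1
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [action]))
    raise ValueError("no plan found")


def solve_decomposed(start: State):
    """Solve sub-goals in order, starting each search from the previous end state."""
    state, full_plan, total_expanded = start, [], 0
    for task in range(len(start)):
        # Sub-goal: tasks 0..task are all finished.
        goal = lambda s, t=task: all(p == STEPS_PER_TASK for p in s[: t + 1])
        plan, state, expanded = bfs(state, goal)
        full_plan += plan
        total_expanded += expanded
    return full_plan, state, total_expanded


start = (0, 0, 0, 0)
whole_plan, _, whole_expanded = bfs(
    start, lambda s: all(p == STEPS_PER_TASK for p in s)
)
split_plan, final_state, split_expanded = solve_decomposed(start)
print("whole goal:", len(whole_plan), "actions,", whole_expanded, "expanded")
print("decomposed:", len(split_plan), "actions,", split_expanded, "expanded")
```

Both runs produce a plan of the same length here, but the decomposed run expands far fewer nodes because each search only has to reach a nearby sub-goal. The size of the gap in this toy says nothing about the gaps reported in the paper.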

Key Findings

DELTA with GPT-4o achieves the highest success rates across the evaluated domains.

Numbers: PC 98%, Dining 100%, Cleaning 80%, Office 74.67% (Table II)

Practical Use: Use DELTA (best with a strong LLM) to reliably solve long household planning tasks in simulation; expect near-perfect results for independent subtasks and strong but lower results for dependent subtasks.

Evidence Ref: Table II

Goal decomposition reduces automated planner runtime by thousands of times.

Numbers: PC planning time: 0.0134 s vs 51.76 s (≈3860× faster) when decomposed (Table III)

Practical Use: Decompose long goals into sub-goals before calling a classical planner to turn planning from minutes into milliseconds in many cases.

Evidence Ref: Table III
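The speedup figure follows directly from the two PC planning times quoted from Table III. The short check below recomputes the ratio and expresses it in orders of magnitude; the variable names are illustrative.

```python
import math

decomposed_s = 0.0134    # PC planning time with decomposition (Table III)
undecomposed_s = 51.76   # PC planning time without decomposition (Table III)

speedup = undecomposed_s / decomposed_s
orders_of_magnitude = math.log10(speedup)
print(f"speedup: about {speedup:.0f}x ({orders_of_magnitude:.1f} orders of magnitude)")
```

The ratio lands between 10^3 and 10^4, which is consistent with the "≈3860×" figure and with the summary's "roughly three orders of magnitude".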

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
success rate (DELTA, GPT-4o) | PC 98%; Dining 100%; Cleaning 80%; Office 74.67% | various LLM baselines (Table II) | Up to +~98% absolute vs naive LLM-as-planner in dependent-task domains | averaged over 3 scenes and 50 trials each (600 trials total) | Table II | Table II
planning time (per-domain) | PC 0.0134 s (DELTA) vs 51.76 s (no decomposition) | DELTA (w/o decomposition) | ≈3.9k× faster | PC domain, averaged succeeded cases | Table III | Table III

What To Try In 7 Days

Build or load a scene graph for your environment and test SG pruning to reduce irrelevant objects.

Use an LLM to generate a PDDL domain/problem and compare planning time vs your current pipeline.

Implement simple goal decomposition (sequence of sub-goals) and run an off-the-shelf planner per sub-goal to measure speedups.
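For the second experiment it can help to have a deterministic baseline to compare LLM-generated PDDL against. The sketch below builds a PDDL problem file from a dictionary-style scene graph by string templating. It is a stand-in and not how DELTA generates PDDL: the function name `scene_graph_to_pddl_problem`, the predicate `item-in`, the types `room` and `item`, and the domain and problem names are all hypothetical, and a matching domain file would have to define them.

```python
from typing import Dict, List


def scene_graph_to_pddl_problem(
    graph: Dict[str, List[str]],
    goal_facts: List[str],
    problem_name: str = "household-task",  # hypothetical name
    domain_name: str = "household",        # hypothetical name
) -> str:
    """Render a PDDL problem from a room -> items mapping (toy template)."""
    rooms = sorted(graph)
    items = sorted(item for contents in graph.values() for item in contents)
    objects = " ".join(rooms) + " - room\n    " + " ".join(items) + " - item"
    # One (item-in <item> <room>) fact per item; the predicate name is assumed.
    init = "\n    ".join(
        f"(item-in {item} {room})" for room in rooms for item in graph[room]
    )
    goal = " ".join(goal_facts)
    return (
        f"(define (problem {problem_name})\n"
        f"  (:domain {domain_name})\n"
        f"  (:objects\n    {objects})\n"
        f"  (:init\n    {init})\n"
        f"  (:goal (and {goal})))\n"
    )


problem_text = scene_graph_to_pddl_problem(
    {"kitchen": ["plate", "fork"], "dining_room": ["table"]},
    goal_facts=["(item-in plate dining_room)", "(item-in fork dining_room)"],
)
print(problem_text)
```

Writing one such problem per sub-goal, and using each sub-plan's end state as the next `:init`, gives the files needed to time an off-the-shelf planner per sub-goal against the undecomposed problem.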

Agent Features

Memory
uses 3D Scene Graphs as structured environment memory
Planning
goal decomposition, one-shot domain generation (PDDL), autogeneration of PDDL problem files
Tool Use
PDDL generation via LLM, Fast Downward planner, PDDLGym simulation, VAL plan validator
Frameworks
PDDL
Is Agentic

Yes

Architectures
LLM + automated planner pipeline, autoregressive sub-task planning

Optimization Features

Token Efficiency
SG pruning reduces token usage and hallucination risk
System Optimization
autoregressive solving of sub-problems for faster planning
Inference Optimization
scene graph pruning to cut LLM token input, goal decomposition to reduce planner search space

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Partial
License: Unknown

Data URLs

3D Scene Graph dataset (Armeni et al.)

Risks & Boundaries

Limitations

Assumes full observability and pre-built scene graphs; real perception noise not tested.

Sensitive to LLM choice: smaller models perform much worse.

When Not To Use

When you lack a reliable scene graph or cannot precompute environment topology.

If you must plan with a small, resource-constrained LLM and no external planner.

Failure Modes

Planner timeout on undecomposed or unpruned problems.

LLM-generated incorrect predicates (wrong relations between rooms/items).
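One way to catch the second failure mode early is to cross-check generated initial-state facts against the scene graph before calling the planner. The sketch below assumes a hypothetical `(item-in <item> <room>)` predicate and a dictionary-style scene graph; neither is taken from the paper, and `find_wrong_item_facts` is an illustrative helper, not part of DELTA.

```python
import re
from typing import Dict, List

# Matches facts of the assumed form (item-in <item> <room>).
FACT_PATTERN = re.compile(r"\(item-in\s+([^\s()]+)\s+([^\s()]+)\)")


def find_wrong_item_facts(init_text: str, graph: Dict[str, List[str]]) -> List[str]:
    """Return item-in facts whose room does not hold that item in the graph."""
    errors = []
    for item, room in FACT_PATTERN.findall(init_text):
        if item not in graph.get(room, []):
            errors.append(f"(item-in {item} {room})")
    return errors


graph = {"kitchen": ["plate", "fork"], "bedroom": ["pillow"]}
# Pass only the :init section; goal facts describe a future state and would
# be flagged incorrectly.
generated_init = "(:init (item-in plate kitchen) (item-in fork bedroom))"
wrong_facts = find_wrong_item_facts(generated_init, graph)
print(wrong_facts)
```

Facts flagged this way could be sent back to the LLM for correction or simply overwritten from the scene graph, depending on how much the generated file is trusted.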

Core Entities

Models

GPT-4o, GPT-4-turbo, Llama-3-70B

Metrics

success rate, plan length, planning time, expanded nodes

Datasets

3D Scene Graph dataset (Armeni et al.)

Benchmarks

Laundry, PC Assembly, Dining Table Setup, House Cleaning, Home Office Setup