Overview
Production Readiness
0.6
Novelty Score
0.45
Cost Impact Score
0.55
Citation Count
6
Why It Matters For Business
DELTA turns large, slow planning problems into fast, reliable sub-problems so robots can plan long household workflows quickly and with higher success, cutting compute and time costs when paired with a strong LLM.
Summary TLDR
DELTA feeds compact 3D scene graphs into an LLM to (1) generate PDDL domain/problem files, (2) prune irrelevant objects, and (3) decompose long goals into a sequence of sub-goals. An off-the-shelf automated planner then solves each sub-problem autoregressively. In simulation on five household-style domains and 3D Scene Graph (3DSG) scenes, DELTA (with GPT-4o) reached very high success rates (98–100% on independent-task domains, 80% and 74.7% on two dependent-task domains). It also cut planner time and expanded search nodes by roughly three and two orders of magnitude, respectively, compared with planning without decomposition and with several LLM baselines.
Problem Statement
Robots struggle to plan long action sequences in large scenes: raw LLM plans are often infeasible, and classical planners explode in complexity when given many irrelevant objects. What is needed is an automated pipeline that turns scene graphs and a user goal into correct, executable, and efficient task plans.
Main Contribution
DELTA: a 5-step pipeline that combines scene graphs and LLMs to auto-generate PDDL domain/problem files, prune irrelevant items, decompose long goals, and plan sub-tasks autoregressively.
Demonstration that goal decomposition plus SG pruning yields large speedups and higher success rates on long-term household planning tasks versus several LLM-based baselines.
Empirical ablation across LLMs (GPT-4o, GPT-4-turbo, Llama-3-70B) showing strong sensitivity to model choice and failure modes (predicate errors, missing attributes, planner timeout).
Key Findings
DELTA with GPT-4o achieves the highest success rates across the evaluated domains.
Goal decomposition reduces automated planner runtime by thousands of times.
Scene-graph pruning plus decomposition drastically shrinks planner search.
Performance depends strongly on LLM choice.
Top failure modes are planner timeouts and incorrect/missing predicates.
Results
success rate (DELTA, GPT-4o)
planning time (per-domain)
expanded nodes (search)
sensitivity to LLM
Who Should Care
What To Try In 7 Days
Build or load a scene graph for your environment and test SG pruning to reduce irrelevant objects.
Use an LLM to generate a PDDL domain/problem and compare planning time vs your current pipeline.
Implement simple goal decomposition (sequence of sub-goals) and run an off-the-shelf planner per sub-goal to measure speedups.
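The decomposition experiment above can be prototyped without an LLM or a real planner. The following is a minimal sketch, assuming a hand-written list of sub-goals and a toy breadth-first planner standing in for an off-the-shelf one; the `Action` type, the fact strings, and the three-room domain are all hypothetical and not taken from DELTA. Each sub-problem starts from the final state of the previous one, which is the autoregressive part.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset     # facts that must hold before the action
    add: frozenset     # facts made true by the action
    delete: frozenset  # facts made false by the action


def bfs_plan(init, goal, actions):
    """Toy breadth-first planner: returns (plan, final_state, expanded) or None."""
    start, goal = frozenset(init), frozenset(goal)
    queue, seen, expanded = deque([(start, [])]), {start}, 0
    while queue:
        state, plan = queue.popleft()
        if goal <= state:
            return plan, state, expanded
        expanded += 1
        for act in actions:
            if act.pre <= state:
                nxt = (state - act.delete) | act.add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [act.name]))
    return None


def solve_autoregressively(init, sub_goals, actions):
    """Solve sub-goals in order; each starts from the previous final state."""
    state, full_plan, total_expanded = frozenset(init), [], 0
    for sub_goal in sub_goals:
        result = bfs_plan(state, sub_goal, actions)
        if result is None:
            raise ValueError(f"no plan found for sub-goal {sorted(sub_goal)}")
        plan, state, expanded = result
        full_plan += plan
        total_expanded += expanded
    return full_plan, state, total_expanded


def make_actions(rooms, items):
    """Hypothetical household domain: move between rooms, pick and drop items."""
    acts = []
    for a in rooms:
        for b in rooms:
            if a != b:
                acts.append(Action(f"move {a} {b}", frozenset({f"robot-at {a}"}),
                                   frozenset({f"robot-at {b}"}),
                                   frozenset({f"robot-at {a}"})))
    for i in items:
        for r in rooms:
            acts.append(Action(f"pick {i} {r}",
                               frozenset({f"robot-at {r}", f"at {i} {r}", "handempty"}),
                               frozenset({f"holding {i}"}),
                               frozenset({f"at {i} {r}", "handempty"})))
            acts.append(Action(f"drop {i} {r}",
                               frozenset({f"robot-at {r}", f"holding {i}"}),
                               frozenset({f"at {i} {r}", "handempty"}),
                               frozenset({f"holding {i}"})))
    return acts


ACTIONS = make_actions(["hall", "kitchen", "bedroom"], ["cup", "plate"])
INIT = {"robot-at hall", "at plate hall", "at cup bedroom", "handempty"}
SUB_GOALS = [{"at plate kitchen"}, {"at cup kitchen"}]

plan, final_state, expanded_decomposed = solve_autoregressively(INIT, SUB_GOALS, ACTIONS)
mono_plan, _, expanded_monolithic = bfs_plan(INIT, set().union(*SUB_GOALS), ACTIONS)
```

Comparing `expanded_decomposed` with `expanded_monolithic` gives a rough feel for the measurement, although a state space this small is not representative of the scenes in the paper. Note that this sketch does not protect earlier sub-goals from being undone by later ones; here the ordering happens to make that unnecessary.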
Agent Features
Memory
- uses 3D Scene Graphs as structured environment memory
Planning
- goal decomposition
- one-shot domain generation (PDDL)
- autogeneration of PDDL problem files
Tool Use
- PDDL generation via LLM
- Fast Downward planner
- PDDLGym simulation
- VAL plan validator
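For readers wiring these tools together, a planner call with a time limit might look like the sketch below. It assumes a local Fast Downward checkout and uses the driver options `--alias`, `--overall-time-limit`, and `--plan-file` as described in the Fast Downward documentation; the `lama-first` alias, the file names, and the helper functions are illustrative choices, not the configuration reported for DELTA. Verify the flags against your installed version.

```python
import subprocess


def fast_downward_command(fd_script, domain, problem, plan_file, time_limit_s=60):
    """Build the argument list for one planner call (hypothetical helper)."""
    return [
        str(fd_script),
        "--alias", "lama-first",                     # assumed search configuration
        "--overall-time-limit", f"{time_limit_s}s",  # surfaces planner timeouts
        "--plan-file", str(plan_file),
        str(domain),
        str(problem),
    ]


def run_planner(cmd):
    """Run the planner; returns (success, stdout). Exit code 0 means a plan was found."""
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return completed.returncode == 0, completed.stdout


cmd = fast_downward_command("./fast-downward.py", "domain.pddl", "sub_problem_1.pddl",
                            "sub_plan_1.txt", time_limit_s=30)
```

Building the command separately from running it keeps the timeout and plan-file settings easy to log per sub-problem.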
Frameworks
- PDDL
Is Agentic
true
Architectures
- LLM + automated planner pipeline
- autoregressive sub-task planning
Optimization Features
Token Efficiency
- SG pruning reduces token usage and hallucination risk
System Optimization
- autoregressive solving of sub-problems for faster planning
Inference Optimization
- scene graph pruning to cut LLM token input
- goal decomposition to reduce planner search space
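Scene-graph pruning can be illustrated with a small sketch. In DELTA the LLM decides which objects are relevant to the goal; here a hand-supplied set of item names stands in for that decision, and the nested-dictionary layout of the scene graph is a hypothetical simplification. Room topology is kept intact so that navigation remains plannable.

```python
def prune_scene_graph(scene_graph, relevant_items):
    """Return a copy that keeps every room and its neighbors but only relevant items."""
    return {
        "rooms": {
            name: {
                "neighbors": list(room["neighbors"]),
                "items": {
                    item: dict(attrs)
                    for item, attrs in room["items"].items()
                    if item in relevant_items
                },
            }
            for name, room in scene_graph["rooms"].items()
        }
    }


def count_items(scene_graph):
    """Total number of item nodes, a rough proxy for prompt and problem size."""
    return sum(len(room["items"]) for room in scene_graph["rooms"].values())


# Hypothetical two-room scene with four items, one of which matters for the goal.
SCENE = {
    "rooms": {
        "kitchen": {"neighbors": ["bedroom"],
                    "items": {"cup": {"state": "dirty"}, "toaster": {}}},
        "bedroom": {"neighbors": ["kitchen"],
                    "items": {"lamp": {}, "pillow": {}}},
    }
}

pruned = prune_scene_graph(SCENE, relevant_items={"cup"})
```

The pruned graph is what would be serialized into the LLM prompt and into the PDDL problem, so fewer items means both fewer tokens and fewer objects for the planner to ground.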
Reproducibility
Code Urls
Data Urls
- 3D Scene Graph dataset (Armeni et al.)
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Assumes full observability and pre-built scene graphs; real perception noise not tested.
- Sensitive to LLM choice: smaller models perform much worse.
- Some failures stem from incorrect predicates or missing attributes generated by the LLM.
- Evaluation in simulation only; real-world dynamics and uncertainties remain unaddressed.
When Not To Use
- When you lack a reliable scene graph or cannot precompute environment topology.
- If you must run planning on a small, resource-constrained LLM without an external planner.
- For highly dynamic tasks requiring continuous perception and real-time reactivity.
Failure Modes
- Planner timeout on undecomposed or unpruned problems.
- LLM-generated incorrect predicates (wrong relations between rooms/items).
- Missing attributes in generated PDDL (item accessibility etc.).
- Performance collapse with smaller or cheaper LLMs.
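Some predicate errors can be caught before the planner runs. The sketch below is one possible pre-check, not part of DELTA as described here: it flags predicates used in a problem's `(:init ...)` section that the domain never declares. The helper names and the example PDDL strings are hypothetical, and the parsing is deliberately naive (it ignores comments, `not`, and numeric fluents).

```python
import re


def section_body(text, keyword):
    """Return the text inside the first '(<keyword> ...)' block, or '' if absent."""
    start = text.find("(" + keyword)
    if start == -1:
        return ""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return text[start + len(keyword) + 1:i]
    return ""


def predicate_names(body):
    """Names that directly follow an opening parenthesis."""
    return set(re.findall(r"\(\s*([^\s()]+)", body))


def undeclared_predicates(domain_text, problem_text):
    """Predicates used in (:init ...) but missing from (:predicates ...)."""
    declared = predicate_names(section_body(domain_text, ":predicates"))
    used = predicate_names(section_body(problem_text, ":init"))
    return used - declared


DOMAIN = """
(define (domain household)
  (:predicates (robot-at ?r - room) (item-at ?i - item ?r - room)))
"""

PROBLEM = """
(define (problem tidy) (:domain household)
  (:init (robot-at hall) (item-at cup kitchen) (accessible cup))
  (:goal (item-at cup hall)))
"""

missing = undeclared_predicates(DOMAIN, PROBLEM)
```

A non-empty result is a cheap signal to regenerate or repair the PDDL before spending planner time on it. A full validator such as VAL covers far more than this check.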
Core Entities
Models
- GPT-4o
- GPT-4-turbo
- Llama-3-70B
Metrics
- success rate
- plan length
- planning time
- expanded nodes
Datasets
- 3D Scene Graph dataset (Armeni et al.)
Benchmarks
- Laundry
- PC Assembly
- Dining Table Setup
- House Cleaning
- Home Office Setup

