Overview
Strong simulation evidence shows big speed and success gains, but real-robot validation and robustness to smaller LLMs are missing.
Citations: 6
Evidence Strength: 0.80
Confidence: 0.87
Risk Signals: 11
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 4/4
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 55%
Production readiness: 60%
Novelty: 45%
Why It Matters For Business
DELTA turns large, slow planning problems into fast, reliable sub-problems so robots can plan long household workflows quickly and with higher success, cutting compute and time costs when paired with a strong LLM.
Who Should Care
Summary TLDR
DELTA feeds compact 3D scene graphs into an LLM to (1) generate PDDL domain/problem files, (2) prune irrelevant objects, and (3) decompose long goals into a sequence of sub-goals. An off-the-shelf automated planner then solves the sub-problems autoregressively, one after another. In simulation on five household-style domains and 3DSG scenes, DELTA (with GPT-4o) reached high success rates: 98–100% on independent-task domains, and 80% and 74.7% on two dependent-task domains. It also cut planner time by roughly three orders of magnitude and search nodes by roughly two, relative to both planning without decomposition and several LLM baselines.
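The autoregressive step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three-room world, the `move`/`pick`/`drop` actions, and a breadth-first search standing in for the off-the-shelf planner. It is not DELTA's implementation; it only shows how the final state of one sub-problem becomes the initial state of the next.

```python
from collections import deque

# Hypothetical toy world: three rooms in a line.
ADJACENT = {"kitchen": ["hall"], "hall": ["kitchen", "office"], "office": ["hall"]}

def successors(state):
    """Yield (action, next_state). State = (robot_room, held_item, item_locations)."""
    robot, held, items = state
    for room in ADJACENT[robot]:
        yield f"move {robot} {room}", (room, held, items)
    if held is None:
        for item, room in items:
            if room == robot:
                rest = tuple(p for p in items if p[0] != item)
                yield f"pick {item} {robot}", (robot, item, rest)
    else:
        placed = tuple(sorted(items + ((held, robot),)))
        yield f"drop {held} {robot}", (robot, None, placed)

def bfs_plan(start, goal):
    """Breadth-first search stand-in for an automated planner.
    Returns (plan, final_state) for the shortest plan reaching the goal."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= set(state[2]):
            return plan, state
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))
    raise ValueError("no plan found")

def plan_autoregressively(start, sub_goals):
    """Solve sub-goals in order; each sub-problem starts where the last one ended."""
    state, full_plan = start, []
    for sub_goal in sub_goals:
        plan, state = bfs_plan(state, sub_goal)
        full_plan += plan
    return full_plan, state

start = ("hall", None, (("cup", "kitchen"), ("laptop", "kitchen")))
sub_goals = [{("cup", "office")}, {("laptop", "office")}]

full_plan, final_state = plan_autoregressively(start, sub_goals)
for step in full_plan:
    print(step)
```

Each call to `bfs_plan` only has to search as deep as one sub-goal requires, which is the intuition behind the reported reduction in planner time and search nodes.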
Problem Statement
Robots struggle to plan long sequences in large scenes because raw LLM plans are often infeasible and classical planners explode in complexity when given many irrelevant objects; we need an automated pipeline that turns scene graphs and a user goal into correct, executable, and efficient task plans.
Main Contribution
DELTA: a 5-step pipeline that combines scene graphs and LLMs to auto-generate PDDL domain/problem files, prune irrelevant items, decompose long goals, and plan sub-tasks autoregressively.
Demonstration that goal decomposition plus SG pruning yields large speedups and higher success rates on long-term household planning tasks versus several LLM-based baselines.
Key Findings
DELTA with GPT-4o achieves highest success rates across evaluated domains.
Goal decomposition reduces automated planner runtime by roughly 3,900× in the PC domain (0.0134 s with decomposition vs 51.76 s without; Table III).
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| success rate (DELTA, GPT-4o) | PC 98%; Dining 100%; Cleaning 80%; Office 74.67% | various LLM baselines (Table II) | Up to +~98% absolute vs naive LLM-as-planner in dependent-task domains | averaged over 3 scenes and 50 trials each (600 trials total) | Table II | Table II |
| planning time (per-domain) | PC 0.0134 s (DELTA) vs 51.76 s (no decomposition) | DELTA (w/o decomposition) | ≈3.9k× faster | PC domain, averaged succeeded cases | Table III | Table III |
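The speedup in the planning-time row follows directly from the two reported times. A short check, using only the PC-domain figures from the table (the variable names are illustrative):

```python
import math

# Planner times in seconds for the PC domain, as reported in the table above.
time_with_decomposition = 0.0134
time_without_decomposition = 51.76

speedup = time_without_decomposition / time_with_decomposition
orders_of_magnitude = math.log10(speedup)

print(f"speedup: {speedup:.0f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

The ratio lands between 10^3 and 10^4, which is consistent with describing the gain as roughly three orders of magnitude.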
What To Try In 7 Days
Build or load a scene graph for your environment and test SG pruning to reduce irrelevant objects.
Use an LLM to generate a PDDL domain/problem and compare planning time vs your current pipeline.
Implement simple goal decomposition (sequence of sub-goals) and run an off-the-shelf planner per sub-goal to measure speedups.
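The first two steps above could start from something like the following sketch. It assumes a scene graph stored as a plain dictionary of rooms to items, a hand-picked set of relevant items in place of the LLM's selection, and hypothetical predicate names (`robot_at`, `item_at`); none of these come from the DELTA code. Room connectivity facts are omitted for brevity.

```python
# Hypothetical scene graph: rooms with the items they contain.
scene_graph = {
    "kitchen": ["mug", "plate", "toaster", "banana"],
    "living_room": ["sofa", "tv", "remote"],
    "dining_room": ["table", "vase"],
}

def prune_scene_graph(graph, relevant_items):
    """Keep every room but drop items that are not relevant to the goal."""
    return {
        room: [item for item in items if item in relevant_items]
        for room, items in graph.items()
    }

def to_pddl_problem(graph, robot_room, goal_facts, name="household", domain="household"):
    """Render a minimal typed PDDL problem string from a (pruned) scene graph."""
    rooms = sorted(graph)
    items = sorted(item for contents in graph.values() for item in contents)
    objects = f"{' '.join(rooms)} - room"
    if items:
        objects += f" {' '.join(items)} - item"
    init = [f"(robot_at {robot_room})"]
    init += [f"(item_at {item} {room})" for room in rooms for item in sorted(graph[room])]
    goal = " ".join(f"({fact})" for fact in goal_facts)
    return (
        f"(define (problem {name})\n"
        f"  (:domain {domain})\n"
        f"  (:objects {objects})\n"
        f"  (:init {' '.join(init)})\n"
        f"  (:goal (and {goal})))"
    )

# In a real pipeline the relevant set would come from the LLM; here it is hard-coded.
pruned = prune_scene_graph(scene_graph, {"mug", "plate"})
problem = to_pddl_problem(
    pruned,
    robot_room="living_room",
    goal_facts=["item_at mug dining_room", "item_at plate dining_room"],
)
print(problem)
```

Comparing planner time on the problem rendered from `scene_graph` against the one rendered from `pruned` gives a first measurement of what pruning buys in your own environment.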
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic
Yes
Architectures
Optimization Features
Token Efficiency
System Optimization
Inference Optimization
Risks & Boundaries
Limitations
Assumes full observability and pre-built scene graphs; real perception noise not tested.
Sensitive to LLM choice—smaller models perform much worse.
When Not To Use
When you lack a reliable scene graph or cannot precompute environment topology.
If you must run planning on a small, resource-constrained LLM without an external planner.
Failure Modes
Planner timeout on undecomposed or unpruned problems.
LLM-generated incorrect predicates (wrong relations between rooms/items).
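One way to catch the incorrect-predicate failure before it reaches the planner is to cross-check generated facts against the scene graph. The sketch below is an assumption about how such a check might look, not something the source describes: it uses the hypothetical `robot_at` and `item_at` predicates and a dictionary-based scene graph.

```python
import re

def check_init_facts(init_text, scene_graph):
    """Return human-readable problems found in LLM-generated :init facts."""
    rooms = set(scene_graph)
    location = {item: room for room, items in scene_graph.items() for item in items}
    problems = []
    # Each innermost "(name arg ...)" group is treated as one ground fact.
    for fact in re.findall(r"\(([^()]+)\)", init_text):
        tokens = fact.split()
        if not tokens:
            continue
        name, args = tokens[0], tokens[1:]
        if name == "item_at" and len(args) == 2:
            item, room = args
            if item not in location:
                problems.append(f"unknown item: {item}")
            elif room not in rooms:
                problems.append(f"unknown room: {room}")
            elif location[item] != room:
                problems.append(f"{item} is in {location[item]}, not {room}")
        elif name == "robot_at" and len(args) == 1 and args[0] not in rooms:
            problems.append(f"unknown room: {args[0]}")
    return problems

scene_graph = {"kitchen": ["mug"], "office": ["laptop"]}
generated_init = (
    "(robot_at kitchen) (item_at mug office) "
    "(item_at lamp kitchen) (item_at laptop office)"
)
issues = check_init_facts(generated_init, scene_graph)
for issue in issues:
    print(issue)
```

A non-empty result could be fed back to the LLM as a correction prompt or used to reject the generated problem file outright.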

