Let LLMs translate problems and a classical planner find correct, often optimal, plans

April 22, 2023 · 8 min

Overview

Decision Snapshot: Ready For Pilot

The pipeline is practical and relies on existing robust planners; experiments across 7 domains and a real-robot demo give moderate-to-strong evidence. Main weakness: dependency on correct PDDL generation and on human-provided domain files.

Citations: 84

Evidence Strength: 0.80

Confidence: 0.87

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 5/5

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 70%

Production readiness: 70%

Novelty: 60%

Authors

Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas, Peter Stone

Links

Abstract / PDF / Code / Data

Why It Matters For Business

LLM+P turns LLMs into reliable natural-language front ends for proven symbolic planners. That reduces execution risk and often lowers real-world costs (e.g., fewer extra robot trips). It avoids expensive LLM fine-tuning by delegating correctness to existing planners.

Who Should Care

Summary TLDR

LLM+P uses a large language model (LLM) to translate a natural-language planning problem into PDDL (the Planning Domain Definition Language, the standard input format for classical planners), runs a fast classical planner to find a correct or optimal plan, then translates that plan back to natural language. This pipeline solves long-horizon robot planning tasks far more reliably than using LLMs alone, provided a domain PDDL and a short example are supplied.

Problem Statement

LLMs often produce plausible but incorrect long-horizon plans because they lack reliable symbolic reasoning about actions and preconditions. The paper asks: can we keep LLMs for language work (translation) and rely on classical planners for correct, optimal planning?

Main Contribution

Introduce LLM+P, a pipeline that: (1) asks an LLM to convert a natural-language planning problem into PDDL, (2) runs a classical planner on that PDDL, and (3) translates the planner's plan back to natural language or robot actions.

Provide a benchmark suite of seven robot planning domains (20 tasks each) derived from standard PDDL generators to evaluate planning performance.
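The three pipeline stages can be sketched as a simple composition. This is a minimal illustration, not the authors' code: the function name `llm_plus_p` and the three callables `to_pddl`, `plan`, and `to_text` are hypothetical placeholders for an LLM call, a classical planner call, and a plan-to-language step.

```python
from typing import Callable, List, Optional


def llm_plus_p(
    task_nl: str,
    domain_pddl: str,
    to_pddl: Callable[[str], str],
    plan: Callable[[str, str], Optional[List[str]]],
    to_text: Callable[[List[str]], str],
) -> Optional[str]:
    """Run the three stages in order; return None if the planner finds no plan.

    All three callables are hypothetical stand-ins supplied by the caller.
    """
    # Stage 1: the LLM only translates; it does not plan.
    problem_pddl = to_pddl(task_nl)
    # Stage 2: the classical planner is responsible for correctness.
    steps = plan(domain_pddl, problem_pddl)
    if steps is None:
        return None
    # Stage 3: turn the symbolic plan into text (or robot commands).
    return to_text(steps)
```

Keeping the stages as injected functions makes it easy to swap the LLM or the planner and to test the flow with stubs before wiring in real services.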

Key Findings

LLM+P produced correct or optimal plans in most evaluated domains while LLM-only methods usually failed.

Numbers: BLOCKSWORLD 90% (LLM 15-20%); GRIPPERS 95% (LLM 35%); STORAGE 85% (LLM 0%)

Practical Use: If you can supply a domain PDDL and a short example, use LLM+P to get reliable plans instead of asking the LLM to plan directly.

Evidence Ref: Table I; Section V-C

Context (a single example problem + PDDL) is crucial for correct PDDL generation by the LLM.

Numbers: Without context, many generated PDDL files are incorrect; experiments show a large drop in solver success without context (d

Practical Use: Always include a short (problem, PDDL) demonstration when prompting the LLM to translate natural language to PDDL.

Evidence Ref: Section III-A, Section V-C
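A one-shot translation prompt of the kind described above might be assembled as follows. This is a sketch under stated assumptions: the function name `build_translation_prompt` and the exact wording of the instructions are illustrative and are not taken from the paper.

```python
def build_translation_prompt(example_nl: str, example_pddl: str, task_nl: str) -> str:
    """Build a prompt with one (problem, PDDL) demonstration before the new task.

    The wording below is an assumption for illustration only.
    """
    parts = [
        "An example planning problem is:",
        example_nl.strip(),
        "The problem PDDL file for this example is:",
        example_pddl.strip(),
        "Now here is a new planning problem:",
        task_nl.strip(),
        "Write the problem PDDL file for the new problem, with no further explanation.",
    ]
    return "\n\n".join(parts)
```

Placing the demonstration before the new task lets the model copy the object, `:init`, and `:goal` conventions of the domain instead of inventing its own.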

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
optimal plan success rate | BLOCKSWORLD 90% | LLM only 15-20% | ≈ +70pp | BLOCKSWORLD (20 tasks) | Table I; Section V-C | Table I
optimal plan success rate | GRIPPERS 95% (100% with sub-optimal alias) | LLM only 35% (some sub-optimal plans) | ≈ +60pp | GRIPPERS (20 tasks) | Table I; Section V-C | Table I
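The delta column is a difference in percentage points, and each percentage corresponds to a count out of 20 tasks. A small worked check for the GRIPPERS row, using hypothetical helper names `success_rate` and `delta_pp`:

```python
def success_rate(solved: int, total: int) -> float:
    """Success rate in percent."""
    return 100.0 * solved / total


def delta_pp(method_pct: float, baseline_pct: float) -> float:
    """Difference in percentage points (not a relative change)."""
    return method_pct - baseline_pct


# GRIPPERS: 95% of 20 tasks means 19 tasks solved optimally by LLM+P,
# and the 35% LLM-only baseline means 7 tasks.
llm_p = success_rate(19, 20)      # 95.0
llm_only = success_rate(7, 20)    # 35.0
print(delta_pp(llm_p, llm_only))  # 60.0
```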

What To Try In 7 Days

If you have a robotics task with defined actions, write a domain PDDL and try translating a few natural-language tasks with GPT-4 into problem PDDL; run FAST-DOWNWARD to compare pl

Create a 1-shot example (problem + PDDL) and include it in prompts; measure whether the planner now finds solutions.

Deploy the pipeline for a small field demo (e.g., pick-and-place or tidy-up) and compare execution cost and failure rate to LLM-only plans.
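For the planner step in these experiments, a thin wrapper around the Fast Downward driver script is usually enough. The sketch below only builds the command line and parses a plan file. The driver path `./fast-downward.py`, the default alias `seq-opt-lmcut`, and the helper names are assumptions for illustration; check them against your installation.

```python
import re
from typing import List, Optional, Tuple


def fast_downward_command(
    domain_path: str,
    problem_path: str,
    plan_path: str = "sas_plan",
    alias: str = "seq-opt-lmcut",          # assumed alias; pick one your build supports
    driver: str = "./fast-downward.py",    # assumed location of the driver script
) -> List[str]:
    """Build the argument list; driver options go before the input files."""
    return [driver, "--alias", alias, "--plan-file", plan_path, domain_path, problem_path]


def parse_plan(plan_text: str) -> Tuple[List[str], Optional[int]]:
    """Split a plan file into action lines and the cost from the trailing comment."""
    actions: List[str] = []
    cost: Optional[int] = None
    for raw in plan_text.splitlines():
        line = raw.strip()
        if line.startswith(";"):
            match = re.search(r"cost\s*=\s*(\d+)", line)
            if match:
                cost = int(match.group(1))
        elif line:
            actions.append(line)
    return actions, cost
```

A typical use would pass the command to `subprocess.run(...)`, then call `parse_plan` on the plan file if it exists. A missing plan file is the signal that the generated problem PDDL was unsolvable or malformed, which is the quantity worth logging when comparing prompts with and without the one-shot example.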

Agent Features

Memory
In-context learning (single example demonstration)
Planning
Translate NL to PDDL (LLM); Run classical PDDL planner to produce optimal plan; Translate symbolic plan to natural language or robot actions
Tool Use
FAST-DOWNWARD; PDDL (domain + problem files)
Is Agentic

Yes

Architectures
LLM (GPT-4) + classical planner (FAST-DOWNWARD)

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Partial
License: Unknown

Risks & Boundaries

Limitations

Requires a domain PDDL file for each domain; authors assume a human provides it (Section III-C).

The LLM must be given a short (problem, PDDL) demonstration; without it, the generated PDDL is often incorrect.

When Not To Use

When you cannot produce a reliable domain PDDL (open-world tasks without a fixed action set).

When perception and low-level motion tightly couple to symbolic planning and no abstraction to PDDL exists.

Failure Modes

LLM omits or mangles initial conditions or uses made-up predicates → planner finds no plan.

LLM-only planning produces infeasible plans because it fails to track preconditions (e.g., ON, CLEAR).
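The made-up-predicate failure can be caught before the planner runs with a cheap lint pass. The following is a rough sketch, not a PDDL parser: it uses a regular expression to compare predicate names in a problem's `:init` and `:goal` against the domain's `:predicates` block. The helper names are hypothetical, and the check ignores arity, types, and numeric fluents.

```python
import re
from typing import List, Set

HEAD = re.compile(r"\(\s*([A-Za-z][\w-]*)")   # first symbol after "("
KEYWORDS = {"define", "problem", "and", "or", "not"}


def section(text: str, keyword: str) -> str:
    """Return the balanced "(keyword ...)" block, or "" if it is absent."""
    start = text.lower().find("(" + keyword)
    if start < 0:
        return ""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return text[start:]  # unbalanced input: return what is there


def declared_predicates(domain_pddl: str) -> Set[str]:
    return {name.lower() for name in HEAD.findall(section(domain_pddl, ":predicates"))}


def unknown_predicates(domain_pddl: str, problem_pddl: str) -> List[str]:
    """Names used in :init or :goal that the domain never declares."""
    used: Set[str] = set()
    for keyword in (":init", ":goal"):
        used |= {name.lower() for name in HEAD.findall(section(problem_pddl, keyword))}
    return sorted(used - declared_predicates(domain_pddl) - KEYWORDS)
```

If the returned list is non-empty, the LLM output can be rejected or re-prompted before any planner time is spent.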

Core Entities

Models

GPT-4

Metrics

Success rate (%) of producing (optimal) plans

Plan cost (execution metric used in robot demo)

Datasets

7 PDDL domains (BLOCKSWORLD, BARMAN, FLOORTILE, GRIPPERS, STORAGE, TERMES, TYREWORLD) with 20 tasks each

PDDL generators (Seipp et al. 2022)

Benchmarks

7-domain robot planning benchmark (authors' suite)