Overview
AdaPlanner structures plans as executable code, distinguishes two types of feedback (in-plan and out-of-plan), and refines plans and resumes execution without any training, improving sample efficiency and reducing LLM calls on text-based simulators.
Citations: 13
Evidence Strength: 0.70
Confidence: 0.85
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 4/4
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 70%
Production readiness: 60%
Novelty: 70%
Why It Matters For Business
AdaPlanner cuts dependence on large labeled datasets and repeated LLM calls by adaptively revising code-style plans, saving annotation and API cost while improving performance on long-horizon text tasks.
Summary TLDR
AdaPlanner is a closed-loop method in which a single LLM acts as both planner and refiner. It writes Python-like plans, checks assertions during execution, and reacts to two kinds of feedback: in-plan feedback, from which it extracts needed information, and out-of-plan feedback, which triggers a revision of the whole plan. It also stores successful plans as ‘skills’ for few-shot prompting. On text simulators, AdaPlanner reaches 91.79% success on ALFWorld and 91.11% on MiniWoB++, improves over prior prompting baselines, and cuts the need for demonstrations (2x fewer on ALFWorld, ~600x fewer than a strong supervised method on MiniWoB++), while code-style prompts reduce hallucination. Code is available on GitHub.
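The idea of keeping successful plans as skills for few-shot prompting can be illustrated with a minimal sketch. The class name `SkillMemory`, its methods, and the prompt layout are hypothetical and chosen for illustration only; they are not taken from the AdaPlanner code base.

```python
from dataclasses import dataclass, field


@dataclass
class SkillMemory:
    """Hypothetical store of plans that solved a task, keyed by task type."""
    skills: dict[str, list[str]] = field(default_factory=dict)

    def add(self, task_type: str, plan_code: str, success: bool) -> None:
        # Only plans that actually solved their task are kept as skills.
        if success:
            self.skills.setdefault(task_type, []).append(plan_code)

    def few_shot_prompt(self, task_type: str, task: str, k: int = 1) -> str:
        # Prepend up to k of the most recently stored plans as worked examples.
        examples = self.skills.get(task_type, [])[-k:]
        parts = [f"# Example solution\n{code}" for code in examples]
        parts.append(f"# New task: {task}\n# Write the solution function below.")
        return "\n\n".join(parts)


memory = SkillMemory()
memory.add("pick_and_place", "def solution(agent):\n    agent.goto('desk 1')", success=True)
memory.add("pick_and_place", "def solution(agent):\n    agent.goto('nowhere')", success=False)
prompt = memory.few_shot_prompt("pick_and_place", "put a mug on the shelf")
```

Filtering on success at insertion time keeps the prompt free of failed plans, though it does nothing about skills that encode episode-specific details.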
Problem Statement
Current LLM-agent methods either execute fixed, open-loop plans or only tweak the next action. They fail to revise entire plans in response to unexpected feedback or need heavy task-specific training. The field needs a lightweight, general way to adapt whole plans online without training a plan selector.
Main Contribution
Explicit closed-loop planning that lets an LLM both generate and revise entire plans during execution.
Two refinement modes: in-plan (extract info) and out-of-plan (revise full plan and resume at a checkpoint).
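The out-of-plan mode listed above (revise the full plan, then resume at a checkpoint) can be sketched as a small control loop. Everything here is an assumption made for illustration: `ToyEnv`, `plan_v1`, `stub_refiner`, and `run_closed_loop` are hypothetical names, and the refiner is a hard-coded stand-in for an LLM call.

```python
class ToyEnv:
    """Hypothetical text environment: the mug is in drawer 2, not drawer 1."""

    def __init__(self):
        self.holding = None
        self.log = []

    def step(self, action: str) -> str:
        self.log.append(action)
        if action == "take mug from drawer 2":
            self.holding = "mug"
            return "You pick up the mug."
        if action.startswith("take mug"):
            return "Nothing happens."
        return "OK."


def plan_v1(env, start_from=1):
    # Initial plan: wrongly assumes the mug is in drawer 1.
    if start_from <= 1:
        env.step("go to drawer 1")
    if start_from <= 2:
        obs = env.step("take mug from drawer 1")
        assert env.holding == "mug", f"step 2 failed: {obs}"
    if start_from <= 3:
        env.step("put mug on shelf 1")


def stub_refiner(error: str):
    """Stand-in for the LLM refiner: returns a revised plan and a checkpoint."""

    def revised(env, start_from=1):
        if start_from <= 1:
            env.step("go to drawer 1")
        if start_from <= 2:
            env.step("go to drawer 2")
            obs = env.step("take mug from drawer 2")
            assert env.holding == "mug", f"step 2 failed: {obs}"
        if start_from <= 3:
            env.step("put mug on shelf 1")

    return revised, 2  # resume at step 2 rather than restarting


def run_closed_loop(env, plan, refiner, max_refinements=3):
    start_from = 1
    for _ in range(max_refinements + 1):
        try:
            plan(env, start_from=start_from)
            return True
        except AssertionError as err:
            # A failed assertion is treated as out-of-plan feedback.
            plan, start_from = refiner(str(err))
    return False


env = ToyEnv()
solved = run_closed_loop(env, plan_v1, stub_refiner)
```

Because the revised plan resumes at step 2, the action `go to drawer 1` is executed only once in `env.log`. Note that Python strips `assert` statements under `-O`, so a production version would likely raise an explicit exception instead.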
Key Findings
AdaPlanner achieves 91.79% overall success on 134 ALFWorld tasks.
AdaPlanner achieves 91.11% success on MiniWoB++ tasks with feedback.
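As a quick sanity check on the ALFWorld figure, the snippet below finds which success counts out of 134 tasks round to 91.79%. It assumes the reported rate is a plain fraction of solved tasks over the 134 tasks, which this summary does not state explicitly.

```python
total_tasks = 134
reported_rate = 91.79  # percent, as reported for ALFWorld

# Success counts whose percentage rounds to the reported value (two decimals).
matches = [
    k for k in range(total_tasks + 1)
    if abs(100 * k / total_tasks - reported_rate) < 0.005
]
failures = [total_tasks - k for k in matches]
```

Under that assumption, `matches` contains a single value, 123, which corresponds to 11 failed tasks.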
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Success rate (ALFWorld) | 91.79% | Reflexion/ReAct/BUTLER | A few percentage points above the prompting baselines | 134 ALFWorld tasks | Table 2 shows AdaPlanner at 91.79% overall. | Table 2 |
| Success rate (MiniWoB++ with feedback) | 91.11% | RCI, CC-Net, WGE | Better than RCI and comparable to CC-Net with far fewer samples | MiniWoB++ (9 feedback tasks; 53-task subset) | Table 3 reports 91.11% for AdaPlanner. | Table 3 |
What To Try In 7 Days
Replace free-text action plans with small code-style templates to constrain LLM outputs.
Implement an ask_LLM() step to extract structured info from environment feedback.
Add simple checkpointing (start_from) so your system can refine a plan and resume mid-episode instead of restarting.
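The three suggestions above can be combined in one small code-style plan template. This is a sketch under stated assumptions: `ask_llm` is a regex-based stand-in for a real LLM call so the example runs offline, and `make_env`, `solution`, the action strings, and the observation text are all hypothetical.

```python
import re


def ask_llm(question: str, observation: str) -> list[str]:
    """Hypothetical stand-in for an LLM call that parses an observation.

    A regex replaces the model so the sketch runs without any API access;
    the question argument is unused by this stand-in.
    """
    return re.findall(r"\ban? ([a-z]+ \d+)", observation)


def make_env():
    """Toy environment: returns a step function and the action log."""
    log = []

    def step(action: str) -> str:
        log.append(action)
        if action == "go to desk 1":
            return "On the desk 1, you see a mug 1, a pen 2, and an alarmclock 3."
        return "OK."

    return step, log


def solution(env_step, start_from=1):
    """Code-style plan: each numbered step is gated by start_from."""
    if start_from <= 1:
        obs = env_step("go to desk 1")
        objects = ask_llm("Which objects are on the desk?", obs)
        assert "mug 1" in objects, f"step 1 failed: {obs}"
    if start_from <= 2:
        env_step("take mug 1 from desk 1")
    if start_from <= 3:
        env_step("go to shelf 1")
        env_step("put mug 1 on shelf 1")


# Full run from the first step.
step, full_log = make_env()
solution(step)

# Resume mid-episode at step 2, skipping the navigation and parsing step.
step, resumed_log = make_env()
solution(step, start_from=2)
```

Gating each step on `start_from` lets a revised plan skip work that has already succeeded. In a real system, any state computed in skipped steps would need to be recomputed or passed in explicitly.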
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic
Yes
Architectures
Optimization Features
Token Efficiency
System Optimization
Reproducibility
Risks & Boundaries
Limitations
Requires some few-shot demonstrations for complex tasks; not fully zero-shot for hardest cases.
Evaluated only on text-based simulated environments (ALFWorld, MiniWoB++), not on robotics or visual sensors.
When Not To Use
Safety-critical real-world systems without extensive validation.
Perception-heavy domains requiring raw visual or sensor grounding not supported by text prompts.
Failure Modes
LLM hallucination when prompts are ambiguous or the underlying model is lower-capacity.
Overfitting of discovered skills to episode-specific details, leading to poor generalization.

