Overview
The idea is simple and practical: plan all steps in one batch, then fetch the evidence, which avoids re-sending the long context on every step. Experiments on multiple benchmarks, plus ablations, back this up.
Citations: 15
Evidence Strength: 0.80
Confidence: 0.85
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 4/4
Findings with evidence refs: 4/4
Results with explicit delta: 4/4
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 80%
Production readiness: 70%
Novelty: 60%
Why It Matters For Business
ReWOO cuts API token usage and hosting cost by separating planning from tool calls, so multi-step tool-using pipelines can run cheaper and scale with smaller models.
Who Should Care
Summary TLDR
ReWOO is a modular prompting pattern for tool-augmented language systems that splits work into three roles: Planner (makes the plan), Worker (calls tools and collects evidence), and Solver (uses the plan and evidence to answer). This avoids the common thought→tool→observation→repeat loop and the repeated context tokens that come with it. On six benchmarks ReWOO cut input tokens by about 64% on average while matching or slightly improving accuracy. It also lets you fine-tune a small 7B Planner to emulate the reasoning of a much larger model, enabling lighter deployments and better robustness when tools fail.
Problem Statement
Current augmented language models (ALMs) interleave reasoning and tool calls. Each tool response forces the LLM to be re-invoked with the entire history, causing quadratic growth in prompt tokens, high API cost, and slow execution. The paper asks: can we separate reasoning from observations to save tokens while keeping or improving task performance?
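The token growth described above can be illustrated with a simplified cost model. This is a sketch under stated assumptions, not a measurement: it counts prompt tokens only, treats every step as adding the same number of tokens, and all names and numbers (`context`, `step_tokens`, `steps`) are hypothetical.

```python
def interleaved_prompt_tokens(context: int, step_tokens: int, steps: int) -> int:
    """Prompt tokens when the whole history is re-sent on every step."""
    total = 0
    history = context
    for _ in range(steps):
        total += history          # the full history goes into this call
        history += step_tokens    # thought + tool observation are appended
    return total


def decoupled_prompt_tokens(context: int, step_tokens: int, steps: int) -> int:
    """Prompt tokens with one planning call and one solving call."""
    planner_call = context                       # plan everything up front
    solver_call = context + steps * step_tokens  # plans + evidence, sent once
    return planner_call + solver_call


# Hypothetical sizes: 500-token context, 200 tokens added per step, 5 steps.
interleaved = interleaved_prompt_tokens(500, 200, 5)
decoupled = decoupled_prompt_tokens(500, 200, 5)
```

In this model the interleaved total grows with the square of the step count, while the decoupled total grows linearly, which is the effect the problem statement describes.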
Main Contribution
Identify 'foreseeable reasoning' — LLMs can plan plausible next steps without immediate tool observations, enabling prompt-efficient workflows.
Design ReWOO, a Plan-Work-Solve modular paradigm that decouples planning, external tool calls, and final solving to avoid repeating long prompts.
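The Plan-Work-Solve split can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Step` type, the `#E1`-style evidence variables, the toy `search` and `summarize` tools, and the stubbed `plan` and `solve` functions are all assumptions. In a real system the Planner and Solver would each be a single LLM call.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    var: str   # evidence variable this step fills, e.g. "#E1" (assumed format)
    tool: str  # name of the tool to call
    arg: str   # tool input; may reference earlier evidence variables


def plan(question: str) -> List[Step]:
    """Stand-in Planner: returns the full plan before any tool is called."""
    return [
        Step("#E1", "search", question),
        Step("#E2", "summarize", "#E1"),
    ]


def work(steps: List[Step], tools: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Worker: run each step, substituting earlier evidence into the argument."""
    evidence: Dict[str, str] = {}
    for step in steps:
        arg = re.sub(r"#E\d+", lambda m: evidence[m.group(0)], step.arg)
        evidence[step.var] = tools[step.tool](arg)
    return evidence


def solve(question: str, steps: List[Step], evidence: Dict[str, str]) -> str:
    """Stand-in Solver: here it simply returns the last piece of evidence."""
    return evidence[steps[-1].var]


# Toy tools, purely illustrative.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda q: f"results for: {q}",
    "summarize": lambda text: text.upper(),
}


def answer(question: str) -> str:
    steps = plan(question)
    evidence = work(steps, TOOLS)
    return solve(question, steps, evidence)
```

The point of the structure is that `plan` never sees tool output, so the long context is sent once for planning and once for solving rather than on every step.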
Key Findings
ReWOO reduces token use on HotpotQA by about 5× compared to an observation-dependent ALM (ReAct).
Averaged over six public benchmarks, ReWOO cut input tokens by ~64% and raised accuracy by ~4.4 percentage points (absolute) versus ReAct-like ALMs.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| HotpotQA tokens | ReAct 9795.1 → ReWOO 1986.2 | ReAct | ~5× reduction | HotpotQA (1000 examples) | Table 2 token counts | Table 2 |
| HotpotQA accuracy | ReAct 40.8 → ReWOO 42.4 | ReAct | +1.6 absolute | HotpotQA (1000 examples) | Table 2 accuracy | Table 2 |
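The deltas in the table can be checked directly from the reported values. The short calculation below uses only the numbers in the two rows above; the variable names are illustrative.

```python
# Values copied from the Results table (HotpotQA, 1000 examples).
react_tokens, rewoo_tokens = 9795.1, 1986.2
react_acc, rewoo_acc = 40.8, 42.4

token_ratio = react_tokens / rewoo_tokens                   # how many times fewer tokens
token_reduction_pct = 100 * (1 - rewoo_tokens / react_tokens)
accuracy_delta = round(rewoo_acc - react_acc, 1)            # absolute difference

print(f"{token_ratio:.2f}x fewer tokens, "
      f"{token_reduction_pct:.1f}% reduction, "
      f"{accuracy_delta:+.1f} accuracy")
```

The ratio comes out just under 5, consistent with the "~5× reduction" entry, and corresponds to roughly an 80% drop in tokens on this dataset.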
What To Try In 7 Days
Prototype a Planner/Worker/Solver split for an existing tool-augmented QA flow.
Measure token usage per query before/after decoupling planning from tool calls.
Fine-tune a small Planner on a few hundred planning examples to move planning off a large API.
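For the token-measurement step, one option is to wrap whatever function calls the model and record the prompt size on each call. The sketch below is an assumption-laden starting point: `TokenMeter` and `count_tokens` are hypothetical names, and whitespace word counting is only a crude proxy that should be replaced with the tokenizer of the model actually in use.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude proxy: whitespace-separated words. Swap in the real tokenizer."""
    return len(text.split())


@dataclass
class TokenMeter:
    """Wraps an LLM call and records the prompt token count of every call."""
    llm: Callable[[str], str]
    prompt_tokens: List[int] = field(default_factory=list)

    def __call__(self, prompt: str) -> str:
        self.prompt_tokens.append(count_tokens(prompt))
        return self.llm(prompt)

    @property
    def total(self) -> int:
        return sum(self.prompt_tokens)


# Usage with a stub model; replace the lambda with the real client call.
meter = TokenMeter(llm=lambda prompt: "ok")
meter("question plus short context")
meter("question plus short context plus one more observation")
```

Running the same queries through the existing flow and through the decoupled prototype, each with its own meter, gives a per-query before/after comparison of `meter.total`.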
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic
Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
Infra Optimization
Model Optimization
System Optimization
Training Optimization
Inference Optimization
Reproducibility
Risks & Boundaries
Limitations
When the environment state is unknown and planning would require enumerating many possibilities, foreseeable reasoning can be impractical (AlfWorld example).
Adding many irrelevant tools in context can harm performance via tool misuse.
When Not To Use
Interactive embodied tasks where the planner lacks prior environment info and must act adaptively.
Workflows that require immediate observation-dependent branching at every step.
Failure Modes
Tool misuse: a Worker that is sent to the wrong tool produces irrelevant evidence.
Solver mistakes: the final synthesis step draws the wrong conclusion despite valid evidence.

