Overview
RAFA is a clear, implementable protocol: prompt LLMs to act as Model, Critic, and Elite; plan several steps ahead; execute only the first action; record the feedback; then replan. A √T Bayesian regret bound, which holds under the paper's stated assumptions, supports the claim that the agent improves with more interaction.
Citations: 2
Evidence Strength: 0.80
Confidence: 0.85
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 5/5
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 60%
Novelty: 80%
Why It Matters For Business
RAFA reduces costly environment trials by using LLMs as in-context model estimators and planning ahead, so you can ship agents that learn faster without fine-tuning models.
Summary TLDR
RAFA is a prompting-based framework in which an LLM alternates between (1) reasoning from a memory buffer to estimate the environment and plan a multi-step trajectory and (2) executing only the first planned action, storing the feedback, and replanning. This "reason for future, act for now" loop relies on in-context learning (no weight updates) and is proven to achieve √T Bayesian regret under the paper's assumptions. Empirically, RAFA improves sample efficiency and success rates on Game of 24, ALFWorld, BlocksWorld, and Tic-Tac-Toe over ReAct, Reflexion, AdaPlanner, and open-loop planners.
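The loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `plan` function is a hand-written stand-in for the prompted LLM planner, and `line_world_step`, `rafa_loop`, and the number-line environment are hypothetical names invented for this example.

```python
from typing import Callable, List, Tuple

State = int
Action = int
Transition = Tuple[State, Action, State]


def line_world_step(state: State, action: Action) -> State:
    """Toy environment (assumption): move on a number line clamped to [0, 10]."""
    return max(0, min(10, state + action))


def plan(state: State, memory: List[Transition], goal: State, depth: int) -> List[Action]:
    """Stand-in for the LLM planner: propose up to `depth` actions toward `goal`.

    In RAFA this role is played by prompted LLM calls conditioned on `memory`.
    This toy rule ignores `memory` so the sketch runs without any API access.
    """
    actions: List[Action] = []
    s = state
    for _ in range(depth):
        if s == goal:
            break
        step = 1 if goal > s else -1
        actions.append(step)
        s += step
    return actions


def rafa_loop(
    env_step: Callable[[State, Action], State],
    start: State,
    goal: State,
    depth: int = 3,
    max_steps: int = 20,
) -> Tuple[State, List[Transition]]:
    """Plan several steps, execute only the first, store feedback, replan."""
    memory: List[Transition] = []
    state = start
    for _ in range(max_steps):
        if state == goal:
            break
        trajectory = plan(state, memory, goal, depth)  # reason for future
        action = trajectory[0]                         # act for now
        next_state = env_step(state, action)           # real environment feedback
        memory.append((state, action, next_state))     # grow the memory buffer
        state = next_state
    return state, memory


final_state, memory = rafa_loop(line_world_step, start=2, goal=5)
```

Replacing `plan` with LLM calls that read the memory buffer is where the in-context learning would enter; the outer loop stays the same.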
Problem Statement
LLM agents can reason but are stateless and ungrounded; we need a practical, sample-efficient protocol that turns LLM reasoning into actions while minimizing costly environment interactions and giving theoretical guarantees.
Main Contribution
A practical closed-loop prompting framework (RAFA) that alternates multi-step planning by an LLM with executing only the first action and storing feedback.
A formal mapping from LLM in-context learning to Bayesian adaptive MDPs, letting LLMs act as model/value estimators without parameter updates.
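For readers unfamiliar with the term, Bayesian regret over T steps is commonly written as below. The notation is assumed here for illustration and is not taken from this summary: θ* denotes the true environment parameters, π* the optimal policy, π^t the policy the agent follows at step t, and V the value function. The summary states only the √T rate; constants and any additional factors are not given.

```latex
\mathfrak{R}(T) \;=\; \mathbb{E}\!\left[\, \sum_{t=0}^{T-1}
  \Big( V_{\theta^\star}^{\pi^\star}(s_t) \;-\; V_{\theta^\star}^{\pi^{t}}(s_t) \Big) \right],
\qquad
\mathfrak{R}(T) = \mathcal{O}\big(\sqrt{T}\big)
\;\;\Longrightarrow\;\;
\frac{\mathfrak{R}(T)}{T} = \mathcal{O}\!\left(\frac{1}{\sqrt{T}}\right) \to 0 .
```

A sublinear bound of this kind means the average per-step gap to the optimal policy shrinks as the agent gathers more feedback.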
Key Findings
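RAFA reaches 99.25% overall success on ALFWorld (134 tasks), ahead of Reflexion (92.54%), AdaPlanner (91.79%), and ReAct (61.94%).
With GPT-4, RAFA solves 89% (B=1) and 93% (B=2) of Game of 24 tasks, versus 73% and 81% for Tree-of-Thoughts.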
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Game of 24 success rate | GPT-4: 89% (B=1), 93% (B=2); GPT-3.5: 29% (B=1), 46% (B=2) | Tree-of-Thoughts (ToT) GPT-4: 73% (B=1), 81% (B=2); Reflexion GPT-4: 21% | GPT-4 vs ToT: +16 pts (B=1), +12 pts (B=2) | Game of 24 (100-task subset) | Table 2: RAFA vs ToT and Reflexion | Table 2 |
| ALFWorld overall success rate | RAFA: 99.25% | AdaPlanner: 91.79%; Reflexion: 92.54%; ReAct: 61.94% | +6.71 pts (vs Reflexion), +7.46 pts (vs AdaPlanner), +37.31 pts (vs ReAct) | ALFWorld (134 tasks across 6 categories) | Table 3: per-category success rates and total | Table 3 |
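
The deltas can be recomputed directly from the Value and Baseline columns. The short check below uses only the figures in the table; the variable names are illustrative.

```python
# ALFWorld overall success rates (percent), as listed in the table.
rafa_alfworld = 99.25
alfworld_baselines = {"AdaPlanner": 91.79, "Reflexion": 92.54, "ReAct": 61.94}

# Percentage-point gap between RAFA and each baseline.
alfworld_deltas = {
    name: round(rafa_alfworld - score, 2)
    for name, score in alfworld_baselines.items()
}

# Game of 24 with GPT-4: (RAFA, ToT) success rates per B setting.
game24_gpt4 = {"B=1": (89, 73), "B=2": (93, 81)}
game24_deltas = {b: rafa - tot for b, (rafa, tot) in game24_gpt4.items()}

for name, delta in sorted(alfworld_deltas.items(), key=lambda kv: kv[1]):
    print(f"ALFWorld vs {name}: +{delta:.2f} pts")
for b, delta in game24_deltas.items():
    print(f"Game of 24 vs ToT ({b}): +{delta} pts")
```

The smallest ALFWorld gap is against Reflexion and the largest against ReAct.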
What To Try In 7 Days
Implement a small memory buffer of trajectories and prompt your LLM to both simulate next states (Model) and score rollouts (Critic).
Plan multi-step trajectories and only execute the first action; collect the true next state and add a short summary to the buffer.
Use a simple switching rule (e.g., when prediction disagrees with observation) to trigger re-prompting and re-planning.
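A first pass at the buffer and the switching rule might look like the sketch below. Everything here is a hypothetical illustration: the `Step` record, the `MemoryBuffer` class, the one-sentence summary format, and the exact-match disagreement test are assumptions of this example, not details from the paper.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Step:
    state: str
    action: str
    predicted_next: str  # what the prompted Model expected
    observed_next: str   # what the environment actually returned


class MemoryBuffer:
    """Bounded buffer of short text summaries to paste into the next prompt."""

    def __init__(self, max_items: int = 50):
        # deque with maxlen silently drops the oldest summary when full.
        self.items = deque(maxlen=max_items)

    def add(self, step: Step) -> None:
        summary = f"In {step.state}, doing {step.action} led to {step.observed_next}."
        self.items.append(summary)

    def as_prompt(self) -> str:
        return "\n".join(self.items)


def should_replan(step: Step) -> bool:
    """Switching rule (assumed): replan when prediction and observation differ."""
    return step.predicted_next != step.observed_next


buffer = MemoryBuffer(max_items=2)
step = Step(state="kitchen", action="open fridge",
            predicted_next="fridge open", observed_next="fridge locked")
buffer.add(step)
if should_replan(step):
    prompt_context = buffer.as_prompt()  # feed this into the next planning prompt
```

Exact string comparison is the crudest possible disagreement test; with free-text observations a looser comparison would likely be needed.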
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic: Yes
Architectures
Optimization Features
Token Efficiency
Training Optimization
Inference Optimization
Risks & Boundaries
Limitations
The theory assumes LLM posterior alignment and MDP regularity; real LLMs may deviate, which introduces approximation error.
RAFA relies on multiple LLM calls per planning loop, which raises latency and API cost.
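To get a feel for how the call count grows, the rough estimator below may help. The accounting is an assumption of this sketch, not the paper's: it supposes a beam-style planner that keeps `breadth` candidates per level, makes one proposal call per kept candidate, and makes one Model call and one Critic call per proposed action. The function and parameter names are hypothetical.

```python
def calls_per_plan(breadth: int, depth: int, proposals: int) -> int:
    """Estimate LLM calls for one planning pass under the assumed accounting."""
    frontier = 1  # planning starts from the single current state
    total = 0
    for _ in range(depth):
        propose_calls = frontier              # one proposal call per kept node
        model_calls = frontier * proposals    # one next-state prediction per action
        critic_calls = frontier * proposals   # one score per predicted state
        total += propose_calls + model_calls + critic_calls
        frontier = min(breadth, frontier * proposals)  # keep the best `breadth`
    return total


def calls_per_episode(breadth: int, depth: int, proposals: int, env_steps: int) -> int:
    """Replanning after every executed action multiplies the per-plan cost."""
    return calls_per_plan(breadth, depth, proposals) * env_steps


narrow = calls_per_plan(breadth=1, depth=3, proposals=3)
wide = calls_per_plan(breadth=2, depth=3, proposals=3)
episode = calls_per_episode(breadth=2, depth=3, proposals=3, env_steps=10)
```

Under these assumptions, widening the beam from 1 to 2 raises the per-plan count from 21 to 35 calls, and a 10-step episode with the wider beam costs 350 calls.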
When Not To Use
When each environment interaction is essentially free and offline RL fine-tuning is feasible.
When you cannot afford repeated LLM calls due to cost or latency constraints.
Failure Modes
Poor in-context examples or summaries can bias the LLM's estimate of the environment and lead to systematic errors.
Planner suboptimality (small breadth/depth) may trap the agent in local optima despite replanning.

