Overview
The method is easy to implement with prompt engineering, but its absolute gains are modest; the main limits are planner quality and the lack of memory.
Citations: 1
Evidence Strength: 0.60
Confidence: 0.86
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 6/6
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 30%
Production readiness: 40%
Novelty: 60%
Why It Matters For Business
A simple global/local agent split can reduce looped interactions and improve automation success on multi-step web tasks, making web automation more robust with modest engineering effort.
Who Should Care
Summary TLDR
CoAct is a simple two-agent framework: a global planner that makes phased high-level plans and a local executor that runs and checks subtasks. On the WebArena web-navigation benchmark, CoAct (with GPT-3.5) raises task success from 9.4% (ReAct baseline) to 13.8%, and to 16.0% when using a force-stop cap. Errors remain: ~40% stem from weak global plans and ~60% from repetitive actions and missing memory. Adding short web-page search snippets improves results further.
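The planner/executor split described above can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `run_hierarchy`, `SubtaskResult`, `demo_plan`, and `demo_execute` are hypothetical, the two agents are plain Python callables standing in for LLM prompts, and the replan budget of 2 is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SubtaskResult:
    ok: bool          # did the local executor judge the subtask as done?
    note: str = ""    # short failure reason passed back to the planner


# Hypothetical signatures; in practice each callable would wrap an LLM prompt.
Planner = Callable[[str, List[str]], List[str]]   # (task, feedback) -> subtasks
Executor = Callable[[str], SubtaskResult]         # subtask -> result


def run_hierarchy(task: str, plan: Planner, execute: Executor,
                  max_replans: int = 2) -> bool:
    """Run phased subtasks; request a new global plan when one fails."""
    feedback: List[str] = []
    for _ in range(max_replans + 1):
        subtasks = plan(task, feedback)
        if not subtasks:
            continue  # an empty plan counts as a failed attempt
        for sub in subtasks:
            result = execute(sub)
            if not result.ok:
                # Report the failure upward and trigger a replan.
                feedback.append(f"{sub}: {result.note}")
                break
        else:
            return True  # every subtask in this plan succeeded
    return False


def demo_plan(task: str, feedback: List[str]) -> List[str]:
    # The first plan is flawed; the second reacts to executor feedback.
    if not feedback:
        return ["open cart", "click missing button"]
    return ["open cart", "click checkout"]


def demo_execute(subtask: str) -> SubtaskResult:
    if "missing" in subtask:
        return SubtaskResult(ok=False, note="element not found")
    return SubtaskResult(ok=True)


succeeded = run_hierarchy("check out the cart", demo_plan, demo_execute)
```

In this toy run the first plan fails on its second subtask, the failure note is fed back to the planner, and the revised plan completes. The design choice worth noting is that the executor never edits the plan itself; it only reports, which keeps global replanning in one place.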
Problem Statement
Large language models still fail many multi-step web tasks because single-agent prompting hits attention and planning limits. Agents can loop on observations, repeat actions, and fail to replan globally. The paper asks: can a simple hierarchical multi-agent setup improve robustness on long-horizon web tasks?
Main Contribution
CoAct framework: a two-agent hierarchy (global planner + local executor) for phased task decomposition and replanning.
Empirical evaluation on the WebArena web-navigation benchmark showing consistent gains over ReAct using gpt-3.5-turbo-16k.
Key Findings
CoAct raises average task success vs ReAct on WebArena.
Force-stop dialog limit further improves success.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Success Rate (Shop) | ReAct 12.0%; CoAct 22.0%; CoAct w/ FS 24.0%; HUMAN - | ReAct 12.0% | CoAct +10.0pp over ReAct | WebArena - Shop | Table 1: per-task SR | Table 1 |
| Success Rate (CMS) | ReAct 11.0%; CoAct 14.0%; CoAct w/ FS 17.0% | ReAct 11.0% | CoAct +3.0pp over ReAct | WebArena - CMS | Table 1: per-task SR | Table 1 |
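
The deltas above are absolute percentage-point differences against the ReAct baseline. As a quick check, the short script below recomputes them, along with the relative gain, from the success rates quoted in this summary (the average row uses the 9.4%, 13.8%, and 16.0% figures from the TLDR). The helper names `pp_delta` and `relative_gain` are illustrative only.

```python
def pp_delta(value: float, baseline: float) -> float:
    """Absolute change in percentage points, rounded to one decimal."""
    return round(value - baseline, 1)


def relative_gain(value: float, baseline: float) -> float:
    """Change expressed as a percentage of the baseline, one decimal."""
    return round(100.0 * (value - baseline) / baseline, 1)


# Success rates (%) as quoted in this summary.
rates = {
    "Average": {"ReAct": 9.4, "CoAct": 13.8, "CoAct w/ FS": 16.0},
    "Shop": {"ReAct": 12.0, "CoAct": 22.0, "CoAct w/ FS": 24.0},
    "CMS": {"ReAct": 11.0, "CoAct": 14.0, "CoAct w/ FS": 17.0},
}

for split, r in rates.items():
    base = r["ReAct"]
    for method in ("CoAct", "CoAct w/ FS"):
        print(f"{split:8s} {method:12s} "
              f"{pp_delta(r[method], base):+.1f}pp "
              f"({relative_gain(r[method], base):+.1f}% relative)")
```

Reporting both numbers is useful here because a gain of a few percentage points on a low baseline is large in relative terms while still leaving absolute success low.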
What To Try In 7 Days
Prototype a global-plan + local-executor prompt split for a web automation workflow.
Add a dialogue/exchange cap (force-stop) to stop repetitive action loops.
Plug short, page-specific text snippets (<=100 words) into your planner to reduce planning errors.
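The last two items can be prototyped with a couple of small helpers. This is a sketch under stated assumptions: `run_with_force_stop` and `truncate_snippet` are hypothetical names, the default cap of 5 exchanges is an arbitrary placeholder (the cap used in the paper is not given here), and `step` stands in for one local-agent exchange that returns an answer or `None` when it has not finished.

```python
from typing import Callable, Optional


def truncate_snippet(text: str, max_words: int = 100) -> str:
    """Keep at most max_words whitespace-separated words of a page snippet."""
    words = text.split()
    return " ".join(words[:max_words])


def run_with_force_stop(step: Callable[[int], Optional[str]],
                        max_exchanges: int = 5) -> Optional[str]:
    """Call step(i) until it returns an answer or the exchange cap is hit."""
    for i in range(max_exchanges):
        answer = step(i)
        if answer is not None:
            return answer
    # Force-stop: give up on this subtask so the caller can replan.
    return None


# A local agent that never finishes is cut off after three exchanges.
stopped = run_with_force_stop(lambda i: None, max_exchanges=3)

# A local agent that finishes on its third exchange returns normally.
finished = run_with_force_stop(lambda i: "done" if i == 2 else None)

# A long page description is trimmed before it reaches the planner prompt.
snippet = truncate_snippet("word " * 250)
```

Returning `None` on a force-stop, rather than raising, lets the caller treat a stalled subtask like any other failure and hand it back to the planner.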
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic
Yes
Architectures
Collaboration
Optimization Features
System Optimization
Risks & Boundaries
Limitations
≈40% of failures are due to a weak global planner and insufficient page-specific knowledge
≈60% of failures are caused by iterative/repetitive actions and a lack of memory
When Not To Use
When you need near-human reliability on complex web tasks today
When you cannot run repeated LLM calls due to cost or latency
Failure Modes
Poor global plan leads to wrong decomposition and downstream errors
Local agent repeats actions and exhausts exchange limits

