Overview
Production Readiness
0.4
Novelty Score
0.6
Cost Impact Score
0.3
Citation Count
1
Why It Matters For Business
A simple global/local agent split can reduce looped interactions and improve automation success on multi-step web tasks, making web automation more robust with modest engineering effort.
Summary TLDR
CoAct is a simple two-agent framework: a global planner that makes phased high-level plans and a local executor that runs and checks subtasks. On the WebArena web-navigation benchmark, CoAct (with GPT-3.5) raises task success from 9.4% (ReAct baseline) to 13.8%, and to 16.0% when using a force-stop cap. Errors remain: ~40% stem from weak global plans and ~60% from repetitive actions and missing memory. Adding short web-page search snippets improves results further.
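To put the reported numbers in perspective, the short calculation below converts them into absolute and relative gains over the ReAct baseline. The three success rates come from the summary above; the function name `gains` is only illustrative.

```python
# Success rates (%) as reported in the summary above.
REACT_SR = 9.4
COACT_SR = 13.8
COACT_FORCE_STOP_SR = 16.0


def gains(baseline: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


print(gains(REACT_SR, COACT_SR))             # (4.4, 46.8)
print(gains(REACT_SR, COACT_FORCE_STOP_SR))  # (6.6, 70.2)
```

The relative gains look large only because the baseline is low; in absolute terms both variants remain under 20% success.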
Problem Statement
Large language models still fail many multi-step web tasks because single-agent prompting hits attention and planning limits. Agents can loop on observations, repeat actions, and fail to replan globally. The paper asks: can a simple hierarchical multi-agent setup improve robustness on long-horizon web tasks?
Main Contribution
CoAct framework: a two-agent hierarchy (global planner + local executor) for phased task decomposition and replanning.
Empirical evaluation on the WebArena web-navigation benchmark showing consistent gains over ReAct using gpt-3.5-turbo-16k.
Analysis of failure modes and preliminary improvement by injecting short, page-specific search snippets into global planning.
Key Findings
CoAct raises average task success on WebArena from 9.4% (ReAct) to 13.8%.
A force-stop cap on agent dialogue further raises success to 16.0%.
Adding short web-page search snippets to the planner boosts success on tested tasks.
Failure breakdown on medium-difficulty cases: roughly 40% trace to weak global planning and roughly 60% to repetitive actions.
Tasks split by difficulty in Shop: many are multi-step.
Results
Success Rate (Shop)
Success Rate (CMS)
Success Rate (Reddit)
Success Rate (Gitlab)
Success Rate (Map)
Average Success Rate (all tasks)
Who Should Care
What To Try In 7 Days
Prototype a global-plan + local-executor prompt split for a web automation workflow.
Add a dialogue/exchange cap (force-stop) to stop repetitive action loops.
Plug short, page-specific text snippets (<=100 words) into your planner to reduce planning errors.
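The last two suggestions can be tried with very little code. The following is a minimal sketch, not the paper's implementation: `truncate_snippet`, `build_planner_prompt`, and `run_with_force_stop` are hypothetical helper names, the prompt layout is an assumption, and the default cap of 5 exchanges is an arbitrary placeholder.

```python
from typing import Callable, Optional


def truncate_snippet(text: str, max_words: int = 100) -> str:
    """Keep at most max_words whitespace-separated words of a page snippet."""
    words = text.split()
    return " ".join(words[:max_words])


def build_planner_prompt(task: str, snippet: str) -> str:
    """Hypothetical prompt layout: the task, then the short page-specific snippet."""
    return (
        f"Task: {task}\n"
        f"Page notes: {truncate_snippet(snippet)}\n"
        "Write a phased plan."
    )


def run_with_force_stop(
    step: Callable[[int], Optional[str]], max_exchanges: int = 5
) -> Optional[str]:
    """Call step(i) until it returns a result or the exchange cap is reached."""
    for i in range(max_exchanges):
        result = step(i)
        if result is not None:
            return result
    return None  # force-stop: give up instead of looping indefinitely
```

In a real workflow, `step` would wrap one planner/executor exchange and return the final answer once the task is judged complete.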
Agent Features
Memory
- no efficient memory mechanism implemented
- recommendation: add memory/experience to avoid repetition
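One lightweight way to act on this recommendation is to count how often the same action has been tried on the same page state. This is an illustrative sketch only, since the summary does not describe a memory design; the class `ActionMemory` and its `max_repeats` threshold are assumptions.

```python
from collections import Counter


class ActionMemory:
    """Counts (page_state, action) pairs so an agent can notice it is repeating itself."""

    def __init__(self, max_repeats: int = 2) -> None:
        self.max_repeats = max_repeats
        self._seen: Counter = Counter()

    def record(self, page_state: str, action: str) -> None:
        """Register that `action` was attempted while on `page_state`."""
        self._seen[(page_state, action)] += 1

    def is_repetitive(self, page_state: str, action: str) -> bool:
        """True if this action was already tried max_repeats times on this page state."""
        return self._seen[(page_state, action)] >= self.max_repeats
```

An executor could call `is_repetitive` before acting and request a new plan instead of retrying the same step.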
Planning
- macro-level global planning
- local per-phase execution planning
- replanning on agent request
Tool Use
- web navigation actions (page operations)
- search engine snippets for planning (optional)
Frameworks
- CoAct
Is Agentic
true
Architectures
- global-local (hierarchical) two-agent
- phase-based task decomposition
Collaboration
- request/revise/overrule interaction loop between agents
- local agent validates and can request global replanning
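The validate-and-replan loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the planner and executor are plain callables standing in for LLM calls, `PhaseResult`, `run_task`, and `max_replans` are hypothetical names, and the overrule step is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PhaseResult:
    ok: bool        # did the local agent judge the phase complete?
    note: str = ""  # feedback passed back to the global planner on failure


Planner = Callable[[str, str], List[str]]  # (task, feedback) -> list of phases
Executor = Callable[[str], PhaseResult]    # phase -> validation result


def run_task(task: str, plan: Planner, execute: Executor, max_replans: int = 2) -> bool:
    """Run phases in order; on a failed phase, ask the planner to revise the plan."""
    feedback = ""
    for _ in range(max_replans + 1):
        phases = plan(task, feedback)
        if not phases:
            feedback = "The plan was empty."
            continue
        for phase in phases:
            result = execute(phase)
            if not result.ok:
                # Local agent requests global replanning with a reason attached.
                feedback = f"Phase '{phase}' failed: {result.note}"
                break
        else:
            return True  # every phase passed local validation
    return False  # replanning budget exhausted
```

Bounding the number of replans plays the same role as the force-stop cap: the loop ends with a clear failure rather than cycling between the two agents.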
Optimization Features
System Optimization
- dialogue force-stop to reduce loops
- context partitioning via phase prompts
Reproducibility
Code Urls
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- ≈40% failures due to weak global planner and insufficient page-specific knowledge
- ≈60% failures caused by iterative/repetitive actions and lack of memory
- Performance still far below humans (human avg SR 78.2% vs CoAct 13.8%)
- Relies on a proprietary LLM (gpt-3.5-turbo-16k) and on WebArena's synthetic web environments
When Not To Use
- When you need near-human reliability on complex web tasks today
- When you cannot run repeated LLM calls due to cost or latency
- When a single-step API or deterministic automation already suffices
Failure Modes
- Poor global plan leads to wrong decomposition and downstream errors
- Local agent repeats actions and exhausts exchange limits
- Accumulated context keeps the agent from recognizing failure and switching strategies
Core Entities
Models
- gpt-3.5-turbo-16k-0613
- ReAct (baseline)
Metrics
- Success Rate (SR)
Datasets
- WebArena
Benchmarks
- WebArena
Context Entities
Datasets
- WebArena (Zhou et al., 2023)

