Overview
Production Readiness
0.6
Novelty Score
0.6
Cost Impact Score
0.6
Citation Count
0
Why It Matters For Business
Active memory control and reusable experience reduce error propagation in multi-agent workflows, improving reliability and reuse across tasks so teams get better multi-step outputs with fewer retries.
Summary TLDR
StackPlanner is a centralized, hierarchical multi-agent framework that treats memory as an explicit control target. A central coordinator issues PLAN/DELEGATE/REVISE actions while sub-agents execute tasks. Key features: (1) an active task-memory stack with explicit update/condense/prune (REVISE) operations to avoid context bloat; (2) a structured experience memory (user profiles, semantic facts, procedural SOPs) to reuse coordination experience; (3) a coordinator trained with a token-level RL scheme (GRPO) that interleaves retrieval, reasoning, and memory actions. On multi-hop QA and agentic benchmarks, StackPlanner outperforms baselines (e.g., 32.92% vs 29.55% F1 on 2Wiki with Qwen2.5-3B).
Problem Statement
Centralized multi-agent coordinators suffer from two linked problems: (1) task memory grows noisy and bloated during long, multi-step workflows, causing errors to accumulate and plans to degrade; (2) coordinators lack reusable cross-task experience, so they cold-start on every new task and fail to generalize coordination strategies.
Main Contribution
A hierarchical centralized architecture that decouples high-level coordination from sub-agent execution.
An active task-memory stack with explicit REVISE actions: Update, Condense (summarize), and Prune.
A structured experience memory storing user profiles, factual (semantic) memory, and procedural SOPs for cross-task reuse.
A reinforcement-learning training pipeline for the coordinator using Group Relative Policy Optimization (GRPO) that conditions on retrieval and memory operations.
Empirical evaluation on multi-hop QA and agentic benchmarks showing improved F1 and better generalization.
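GRPO, as generally described, scores each sampled rollout against the other rollouts drawn for the same prompt rather than against a learned value model. The sketch below shows only that group-relative normalization step. The reward values are hypothetical, and how StackPlanner defines its reward or masks retrieved tokens is not stated in this summary.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four coordinator rollouts of the same question.
rewards = [0.0, 0.5, 1.0, 0.5]
advantages = group_relative_advantages(rewards)
```

Rollouts that beat their group mean get a positive advantage and the rest get a negative one, so the policy update only needs relative quality within a group.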
Key Findings
StackPlanner yields higher F1 than prior agentic RL baselines on multi-hop QA.
Both memory modules materially improve performance; removing the task-memory stack and the experience memory together causes the largest drop.
Experience memory particularly helps multi-step retrieval tasks.
Results
F1
Who Should Care
What To Try In 7 Days
Add a central coordinator that issues high-level PLAN/DELEGATE/REVISE commands.
Implement a task-memory stack with explicit condense/prune operations and log pruning reasons.
Capture procedural patterns (SOPs) from completed tasks and add a small experience store retrievable by task type.
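As a starting point for the second item above, here is a minimal sketch of a task-memory stack with update, condense, and prune operations, where every prune is logged with its reason. The class and method names are hypothetical, and the `summarize` callable stands in for whatever summarizer (typically an LLM call) a real system would use. Nothing here is taken from the StackPlanner code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MemoryEntry:
    step: int
    content: str


@dataclass
class TaskMemoryStack:
    entries: list[MemoryEntry] = field(default_factory=list)
    prune_log: list[tuple[int, str]] = field(default_factory=list)
    next_step: int = 0

    def push(self, content: str) -> int:
        """Append a new entry and return its step id."""
        step = self.next_step
        self.next_step += 1
        self.entries.append(MemoryEntry(step, content))
        return step

    def update(self, step: int, content: str) -> None:
        """Overwrite the content of an existing entry."""
        for entry in self.entries:
            if entry.step == step:
                entry.content = content
                return
        raise KeyError(step)

    def condense(self, steps: set[int], summarize: Callable[[list[str]], str]) -> int:
        """Replace the selected entries with one summary entry."""
        selected = [e for e in self.entries if e.step in steps]
        if not selected:
            raise KeyError(steps)
        position = self.entries.index(selected[0])
        summary = summarize([e.content for e in selected])
        self.entries = [e for e in self.entries if e.step not in steps]
        new_step = self.next_step
        self.next_step += 1
        self.entries.insert(position, MemoryEntry(new_step, summary))
        return new_step

    def prune(self, step: int, reason: str) -> None:
        """Drop an entry and record why it was dropped."""
        before = len(self.entries)
        self.entries = [e for e in self.entries if e.step != step]
        if len(self.entries) == before:
            raise KeyError(step)
        self.prune_log.append((step, reason))


# Hypothetical trace of a two-hop lookup.
stack = TaskMemoryStack()
a = stack.push("search: director of film X -> Jane Doe")
b = stack.push("search: Jane Doe birthplace -> page not found")
c = stack.push("search: Jane Doe birthplace -> Lyon")
stack.prune(b, reason="failed retrieval, superseded by a later search")
stack.condense({a, c}, summarize=lambda parts: " | ".join(parts))
```

Keeping the prune log separate from the live entries makes it possible to audit later whether a dropped item would have been useful.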
Agent Features
Memory
- task-memory stack (explicit revise ops)
- structured experience memory (profiles, semantic, procedural)
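One possible shape for the structured experience memory is a container with three separate sections, with procedural SOPs looked up by task type. The sketch below is an assumption about layout only: the field names, the exact-match lookup, and the sample data are hypothetical, and a real system would more likely retrieve by embedding similarity.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ExperienceMemory:
    profiles: dict = field(default_factory=dict)  # user id -> preferences
    facts: dict = field(default_factory=dict)     # key -> semantic fact
    sops: defaultdict = field(default_factory=lambda: defaultdict(list))  # task type -> step lists

    def record_sop(self, task_type, steps):
        """Store the ordered steps of a completed task under its task type."""
        self.sops[task_type].append(list(steps))

    def retrieve_sops(self, task_type, limit=1):
        """Return up to `limit` of the most recently stored SOPs for a task type."""
        if limit <= 0:
            return []
        # .get avoids creating an empty entry for unseen task types.
        return self.sops.get(task_type, [])[-limit:]


# Hypothetical usage.
memory = ExperienceMemory()
memory.profiles["user-1"] = {"answer_style": "short"}
memory.record_sop("multi_hop_qa", ["find bridge entity", "look up attribute", "verify"])
known = memory.retrieve_sops("multi_hop_qa")
unknown = memory.retrieve_sops("code_review")
```

Returning an empty list for unseen task types makes the cold-start case explicit to the caller.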
Planning
- high-level coordinator planning
- discrete action space (PLAN/DELEGATE/REVISE)
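The discrete action space can be illustrated with a small dispatch loop. This is a sketch under stated assumptions: the `policy` callable is a scripted stand-in for the trained coordinator, the sub-agent is a stub, and the memory is a plain list of strings. None of these names come from the paper.

```python
from enum import Enum


class Action(Enum):
    PLAN = "plan"
    DELEGATE = "delegate"
    REVISE = "revise"


def run_coordinator(policy, sub_agents, max_steps=8):
    """Ask the policy for one action per step until it returns None."""
    memory = []
    for _ in range(max_steps):
        decision = policy(memory)
        if decision is None:
            break
        action, payload = decision
        if action is Action.PLAN:
            memory.append(f"plan: {payload}")
        elif action is Action.DELEGATE:
            agent_name, task = payload
            memory.append(f"{agent_name}: {sub_agents[agent_name](task)}")
        elif action is Action.REVISE:
            # payload is a function that rewrites the memory list.
            memory = payload(memory)
    return memory


# Scripted stand-in for the learned coordinator (hypothetical).
script = iter([
    (Action.PLAN, "find the author, then the birth year"),
    (Action.DELEGATE, ("search", "author of book Y")),
    (Action.REVISE, lambda mem: mem[-1:]),  # keep only the latest result
])
final_memory = run_coordinator(
    lambda mem: next(script, None),
    {"search": lambda task: f"stub result for '{task}'"},
)
```

Because the coordinator only emits one of three action types, each branch can be logged and tested separately from the sub-agents it calls.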
Tool Use
- search and web tools
- sub-agent tool invocation (ReAct)
Frameworks
- ReAct
- RAG-style retrieval
- GRPO
Is Agentic
true
Architectures
- centralized hierarchical
Collaboration
- central coordinator delegating to specialized sub-agents
Optimization Features
Token Efficiency
- memory condensation to reduce context length
Training Optimization
- GRPO
Reproducibility
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Limited support for multi-turn conversational dependencies; current task memory targets single-turn workflows.
- Cold-start issues for long-term experience memory; initial stored experiences may not generalize to diverse real users.
When Not To Use
- Applications that require rich multi-turn conversational state across many user turns.
- Low-resource settings where building a useful experience memory is impractical.
Failure Modes
- If REVISE is misconfigured, useful context can be pruned and harm downstream reasoning.
- Experience retrieval mismatch: retrieving irrelevant SOPs can mislead planning.
- High inference latency: reported 40–300s per sample for complex tasks may be too slow for real-time use.
Core Entities
Models
- Qwen2.5-3B
- Qwen2.5-7B
Metrics
- F1
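This summary does not define how F1 is computed. For question answering it is commonly a token-overlap F1 between the predicted and reference answers, and the sketch below assumes that convention. It lowercases and splits on whitespace only; evaluation scripts often also strip punctuation and articles, which is omitted here.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


exact = token_f1("Paris", "paris")
partial = token_f1("the city of Paris", "Paris")
```

A verbose but correct answer is penalized through precision, which is why trimming answers before scoring can change results noticeably.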
Datasets
- 2WikiMultiHopQA
- MuSiQue
- GAIA
- FRAMES
Benchmarks
- multi-hop QA
- GAIA
- FRAMES

