A modular blueprint for running reliable multi-agent workflows with planning, tool refinement, and episodic memory

June 28, 2024 · 7 min

Overview

Decision Snapshot: Needs Validation

This is a practical engineering blueprint with clear components but no empirical evaluation: it is actionable for prototyping, but its performance is unmeasured.

Citations: 4

Evidence Strength: 0.30

Confidence: 0.85

Risk Signals: 13

Trust Signals

Findings with numeric evidence: 0/5

Findings with evidence refs: 5/5

Results with explicit delta: 0/0

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 40%

Production readiness: 60%

Novelty: 50%

Authors

Noel Crawford, Edward B. Duffy, Iman Evazzade, Torsten Foehr, Gregory Robbins, Debbrata Kumar Saha, Jiya Varma, Marcin Ziolkowski

Links

Abstract / PDF

Why It Matters For Business

Gives a reusable engineering blueprint to run reliable, auditable multi-agent automation across existing enterprise systems without retraining models.

Who Should Care

Summary TLDR

This paper presents a practical engineering blueprint for building multi-agent systems driven by large language models (LLMs). It specifies modular components (Planner, Executor, Verifier, Agent Units, Matchers), prompt strategies (ReAct variants, Programmable Prompt, ConvPlanReAct), tool handling (tool schema + Toolbox Refiner), and two memory levels (short per-task memory and episodic vector DB). The design highlights five multi-agent patterns (Independent, Sequential, Joint, Hierarchical, Broadcast), human-in-loop options, and resume/restart behavior for production-grade automation. No new model weights or benchmark experiments are provided.

Problem Statement

Current LLMs are powerful but lack direct access to proprietary systems and reliable multi-step execution. Organizations need a reusable engineering pattern to compose narrow expert agents, orchestrate tools and memory, verify results, and scale multi-agent workflows in enterprise IT environments.

Main Contribution

A modular agent engineering framework that separates Planning, Execution, and Verification and fits mixed modern/legacy IT.
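
The separation of Planning, Execution, and Verification can be illustrated with a minimal sketch. Everything below is hypothetical: the names `run_pipeline`, `StepResult`, and the toy planner, executor, and verifier are stand-ins for LLM-backed components, and the retry-on-failed-verification policy is an assumption, not something the paper specifies.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    task: str
    output: str
    verified: bool
    attempts: int


def run_pipeline(
    goal: str,
    planner: Callable[[str], List[str]],
    executor: Callable[[str], str],
    verifier: Callable[[str, str], bool],
    max_attempts: int = 2,  # assumed retry budget; must be >= 1
) -> List[StepResult]:
    """Plan once, then execute and verify each task, retrying on failure."""
    results = []
    for task in planner(goal):
        for attempt in range(1, max_attempts + 1):
            output = executor(task)
            ok = verifier(task, output)
            if ok:
                break
        results.append(StepResult(task, output, ok, attempt))
    return results


# Toy stand-ins for LLM-backed components (hypothetical).
def toy_planner(goal: str) -> List[str]:
    return [f"{goal}: step {i}" for i in (1, 2)]


def toy_executor(task: str) -> str:
    return task.upper()


def toy_verifier(task: str, output: str) -> bool:
    return output == task.upper()


results = run_pipeline("deploy", toy_planner, toy_executor, toy_verifier)
```

Keeping the three roles behind plain callables means any of them could be swapped (for example, a human reviewer as the verifier) without touching the loop.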

ConvPlanReAct: a conversational extension of ReAct/PlanReAct that adds dialog-aware steps and explicit next-agent selection (@AgentName).
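
One way to picture explicit next-agent selection is a conversation loop that hands the turn to whichever agent the current reply mentions. This is only a sketch under assumptions: the `@Name` syntax is taken from the summary above, while `next_agent`, `converse`, the "last known mention wins" rule, and the stub agents are hypothetical and not the paper's prompt format.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

MENTION = re.compile(r"@(\w+)")


def next_agent(reply: str, known: Dict[str, Callable[[str], str]]) -> Optional[str]:
    """Return the last @mention in the reply that names a known agent."""
    names = [n for n in MENTION.findall(reply) if n in known]
    return names[-1] if names else None


def converse(
    start: str,
    message: str,
    agents: Dict[str, Callable[[str], str]],
    max_turns: int = 6,  # assumed guard against endless hand-offs
) -> List[Tuple[str, str]]:
    """Route each reply to the agent it mentions until no one is mentioned."""
    transcript = []
    current = start
    for _ in range(max_turns):
        reply = agents[current](message)
        transcript.append((current, reply))
        chosen = next_agent(reply, agents)
        if chosen is None:
            break
        current, message = chosen, reply
    return transcript


# Stub agents standing in for LLM calls (hypothetical).
agents = {
    "Coder": lambda msg: "Drafted the function. @Tester please check it.",
    "Tester": lambda msg: "Tests pass. Done.",
}
transcript = converse("Coder", "Write a parser", agents)
```

Ignoring mentions of unknown names keeps a hallucinated agent name from crashing the loop; the conversation simply ends at that turn.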

Key Findings

Narrow, persona-like agents perform more reliably than broad agents.

Practical Use: Design agents with focused roles and clear persona prompts; break complex tasks into sub-tasks assigned to specialized agents.

Evidence Ref: Section 4 (first paragraph); Section 3 (definitions of Agent persona)

Multi-agent workflows can be realized as five practical patterns: Independent, Sequential, Joint, Hierarchical, Broadcast.

Practical Use: Choose the pattern that fits your process (e.g., Sequential for actor/critic edits, Broadcast for leader-driven parallel opinions).

Evidence Ref: Section 4.2 (panels 4.2.1–4.2.5)
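
Two of the five patterns can be sketched with plain functions as agents. The interpretation here follows only the examples given above (actor/critic for Sequential, leader-driven opinions for Broadcast); the function names and stub agents are hypothetical, and the paper's own definitions may differ in detail.

```python
from typing import Callable, Dict, List

Agent = Callable[[str], str]


def sequential(agents: List[Agent], task: str) -> str:
    """Sequential: each agent receives the previous agent's output."""
    result = task
    for agent in agents:
        result = agent(result)
    return result


def broadcast(
    leader: Callable[[Dict[str, str]], str],
    members: Dict[str, Agent],
    task: str,
) -> str:
    """Broadcast: every member answers the same task; the leader merges."""
    opinions = {name: agent(task) for name, agent in members.items()}
    return leader(opinions)


# Stub agents standing in for LLM calls (hypothetical).
actor = lambda text: text + " [draft]"
critic = lambda text: text + " [reviewed]"
seq_out = sequential([actor, critic], "spec")

members = {
    "Architect": lambda task: "use a queue",
    "Tester": lambda task: "add retries",
}
leader = lambda opinions: "; ".join(f"{n}: {o}" for n, o in sorted(opinions.items()))
bc_out = broadcast(leader, members, "design ingestion")
```

In this sketch the member calls run one after another; in practice they are independent of each other and could be issued concurrently.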

What To Try In 7 Days

Prototype a Planner + Task Queue to decompose one recurring business process.

Wrap three narrow agents (e.g., Coder, Architect, Tester) and test a Joint workflow on a small coding task.

Add a Toolbox Refiner to limit the tool list and measure tool-selection stability.
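
For the last item, "tool-selection stability" needs a concrete measure. The paper does not define one, so the sketch below assumes a simple one: the share of repeated runs that pick the most frequently chosen tool. The names `selection_stability` and `measure`, and the flaky stub selector, are hypothetical.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List, Sequence


def selection_stability(choices: Sequence[str]) -> float:
    """Fraction of runs that chose the modal tool (1.0 = fully stable)."""
    if not choices:
        raise ValueError("need at least one recorded choice")
    modal_count = Counter(choices).most_common(1)[0][1]
    return modal_count / len(choices)


def measure(
    select_tool: Callable[[str, List[str]], str],
    query: str,
    tools: List[str],
    runs: int = 10,
) -> float:
    """Call the selector repeatedly on the same query and score stability."""
    return selection_stability([select_tool(query, tools) for _ in range(runs)])


# Stub selector that picks the wrong tool every fourth call (hypothetical).
_pattern = cycle(["search", "search", "search", "sql"])
flaky_selector = lambda query, tools: next(_pattern)

score = measure(flaky_selector, "find open tickets", ["search", "sql"], runs=8)
```

Running `measure` once with the full tool list and once with the refined list gives a before/after comparison for the same set of queries.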

Agent Features

Memory
Short Memory: per-task prompt history
Episodic Memory: vector DB of completed task episodes
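
The two memory levels can be sketched as follows. This is a toy under stated assumptions: a bag-of-words `Counter` with cosine similarity stands in for a real embedding model and vector DB, and the class and method names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system would use an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ShortMemory:
    """Per-task prompt history, discarded when the task ends."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}

    def append(self, task_id: str, message: str) -> None:
        self._history.setdefault(task_id, []).append(message)

    def prompt(self, task_id: str) -> str:
        return "\n".join(self._history.get(task_id, []))


class EpisodicMemory:
    """Summaries of completed tasks, retrieved by similarity to a new query."""

    def __init__(self) -> None:
        self._episodes: List[Tuple[Counter, str]] = []

    def add(self, summary: str) -> None:
        self._episodes.append((embed(summary), summary))

    def recall(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self._episodes, key=lambda e: cosine(q, e[0]), reverse=True)
        return [summary for _, summary in ranked[:k]]


episodic = EpisodicMemory()
episodic.add("reset user password in LDAP")
episodic.add("rotate TLS certificate on gateway")
recalled = episodic.recall("password reset for a user")
```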
Planning
Planner agent produces DAG of tasks
PlanReAct and ConvPlanReAct for iterative planning
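
A planner-produced DAG can be scheduled with Python's standard `graphlib` module (Python 3.9+). The plan below is a made-up example and `run_in_waves` is a hypothetical helper; the sketch only shows how dependency order determines which tasks are ready at the same time.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical planner output: task -> set of tasks it depends on.
plan: Dict[str, Set[str]] = {
    "fetch_logs": set(),
    "parse_logs": {"fetch_logs"},
    "fetch_metrics": set(),
    "write_report": {"parse_logs", "fetch_metrics"},
}


def run_in_waves(plan: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; tasks in one wave have no pending dependencies."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)  # an executor would run these tasks here
        sorter.done(*ready)
    return waves


waves = run_in_waves(plan)
# waves == [["fetch_logs", "fetch_metrics"], ["parse_logs"], ["write_report"]]
```

Tasks within a wave are mutually independent, so they are natural candidates for parallel execution by separate agents.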
Tool Use
Tool abstraction via explicit input/output schema in prompts
Toolbox Refiner (Identity, Hierarchical, Semantic)
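
A tool schema and two refiner variants might look like the sketch below. The `Tool` fields, the rendered prompt line, and the example tools are hypothetical. The Identity refiner passes every tool through; the Semantic variant is approximated here by keyword overlap as a stand-in for embedding similarity, and the Hierarchical variant is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    inputs: Dict[str, str]  # argument name -> type, shown to the model
    output: str

    def render(self) -> str:
        """One prompt line describing the tool's input/output schema."""
        args = ", ".join(f"{k}: {v}" for k, v in self.inputs.items())
        return f"{self.name}({args}) -> {self.output}  # {self.description}"


def identity_refiner(query: str, tools: List[Tool]) -> List[Tool]:
    """Identity: expose the whole toolbox unchanged."""
    return list(tools)


def keyword_refiner(query: str, tools: List[Tool], top_k: int = 2) -> List[Tool]:
    """Stand-in for a semantic refiner: rank tools by shared words."""
    words = set(query.lower().split())

    def overlap(tool: Tool) -> int:
        return len(words & set(tool.description.lower().split()))

    ranked = sorted(tools, key=overlap, reverse=True)
    return [t for t in ranked[:top_k] if overlap(t) > 0]


toolbox = [
    Tool("search_tickets", "search support tickets by keyword", {"query": "str"}, "list[str]"),
    Tool("run_sql", "run a read-only sql query on the warehouse", {"sql": "str"}, "list[dict]"),
    Tool("send_email", "send an email to a user", {"to": "str", "body": "str"}, "bool"),
]
refined = keyword_refiner("search tickets about login", toolbox)
prompt_lines = [tool.render() for tool in refined]
```

Only the refined tools are rendered into the prompt, which shortens the context and narrows the choices the model has to make.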
Frameworks
ConvPlanReAct, Programmable Prompt, ReAct, PlanReAct
Is Agentic

Yes

Architectures
Modular Plan-Execute-Verify pipeline
Agent Unit (one or more agents per unit)
Collaboration
Independent, Sequential, Joint, Hierarchical, Broadcast
Agent matching (semantic, mention, sequence)
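
The three matcher types can be combined into a fallback chain, sketched below. The personas, function names, and the mention-then-semantic-then-sequence priority are assumptions for illustration; token overlap again stands in for embedding-based semantic matching.

```python
import re
from itertools import cycle
from typing import Callable, Dict, Optional

# Hypothetical personas: agent name -> short role description.
PERSONAS: Dict[str, str] = {
    "Coder": "writes and fixes python code",
    "Tester": "writes tests and reports failing cases",
    "Architect": "designs system structure and interfaces",
}


def mention_match(message: str, personas: Dict[str, str]) -> Optional[str]:
    """Mention matcher: first @Name that refers to a known agent."""
    for name in re.findall(r"@(\w+)", message):
        if name in personas:
            return name
    return None


def semantic_match(message: str, personas: Dict[str, str]) -> Optional[str]:
    """Semantic matcher (toy): agent whose persona shares the most words."""
    words = set(message.lower().split())
    scores = {n: len(words & set(d.split())) for n, d in personas.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def make_sequence_match(personas: Dict[str, str]) -> Callable[[], str]:
    """Sequence matcher: fixed round-robin order over the agents."""
    order = cycle(personas)
    return lambda: next(order)


def match(message: str, personas: Dict[str, str], sequence_next: Callable[[], str]) -> str:
    """Try mention, then semantic; fall back to the next agent in sequence."""
    return (
        mention_match(message, personas)
        or semantic_match(message, personas)
        or sequence_next()
    )


seq = make_sequence_match(PERSONAS)
by_mention = match("@Tester take a look", PERSONAS, seq)
by_semantics = match("please review the interfaces", PERSONAS, seq)
by_sequence = match("hello", PERSONAS, seq)
```

Because `or` short-circuits, the round-robin position advances only when the other two matchers find nothing.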

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Unknown
License: Unknown

Risks & Boundaries

Limitations

No quantitative experiments or benchmarks are reported to validate effectiveness.

Framework assumes access to reliable external tools/APIs for many workflows.

When Not To Use

Safety-critical systems requiring formal guarantees and audit trails beyond heuristic verification.

Environments with no stable APIs or tools to perform external actions.

Failure Modes

Agent hallucination leading to incorrect actions or tool calls.

Wrong agent selection due to imperfect matchers or ambiguous personas.

Core Entities

Models

Large Language Models (unspecified)

Context Entities

Models

AutoGen, AutoGPT, LangChain, LlamaIndex, MetaGPT, AgentVerse, AgentLite