Reference architecture, multi-agent taxonomy, and enterprise hardening for LLM agents

Overview

Decision Snapshot: Needs Validation

The paper synthesizes vendor docs and classic literature to produce a practical architecture; it provides detailed design patterns and checklists but no empirical benchmark or formal verification results.

Citations: 0

Evidence Strength: 0.60

Confidence: 0.80

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 0/5

Findings with evidence refs: 5/5

Results with explicit delta: 0/0

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 60%

Production readiness: 70%

Novelty: 50%

Authors

Mamdouh Alenezi

Links

Abstract / PDF

Why It Matters For Business

Agentic architectures turn LLMs into governed interfaces that can automate multi-step workflows, but they require engineering investments in governance, observability, and versioning to control risk and compliance.

Who Should Care

CTO, Product Manager, ML Engineer, Engineering Lead, Founder, Data Scientist

Summary TLDR

This survey maps how LLMs are moving from stateless prompt-response calls to goal-directed 'agentic' systems. It proposes a layered reference architecture that separates LLM cognition from control, memory, and tool execution; it catalogs multi-agent topologies and their failure modes; and it presents an enterprise hardening checklist covering governance, observability, and reproducibility. The paper synthesizes vendor materials (LangChain, Kore.ai, TrueFoundry, ZenML, Salesforce) and classical agent theory to argue that production-grade agents need typed tool contracts, detailed traces, and enforced runtime policies.

Problem Statement

Stateless prompt-response patterns fail for long tasks, external tool effects, and regulated environments. Engineers patch models with brittle scaffolding. The paper asks: what software primitives define practical agentic systems, how do single-agent loops scale to multi-agent topologies, and what governance and reliability controls are required for production?

Main Contribution

A governed reference architecture that separates LLM reasoning from control, memory, tooling, and governance.

A taxonomy of multi-agent topologies (orchestrator-worker, router-solver, hierarchical, swarm) with mapped failure modes and mitigations.

Key Findings

Production agent systems should separate cognition (LLM) from execution and enforce typed tool interfaces.

Practical Use: Design the agent as a control loop: keep the LLM as a planner only, and route all side-effecting calls through versioned, typed tool registries and policy gates so you can audit them and enforce safety.

Evidence Ref: Abstract; §3; Fig. 3
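
A minimal sketch of the "typed tool registry plus policy gate" idea, under stated assumptions: ToolSpec, ToolRegistry, refund_policy, and the issue_refund tool are hypothetical names invented for illustration, not APIs from the paper or from any framework it cites. The LLM only proposes a tool name, version, and arguments; the registry validates them against the contract, consults the policy, and records the decision before anything executes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ToolSpec:
    """A versioned tool contract (hypothetical): name, version, parameter types."""
    name: str
    version: str
    params: Dict[str, type]
    side_effecting: bool
    fn: Callable[..., Any]


class ToolRegistry:
    """Validates proposed calls against the contract and a policy gate."""

    def __init__(self, policy: Callable[[ToolSpec, Dict[str, Any]], bool]):
        self._tools: Dict[Tuple[str, str], ToolSpec] = {}
        self._policy = policy
        self.audit_log: List[Dict[str, Any]] = []

    def register(self, spec: ToolSpec) -> None:
        self._tools[(spec.name, spec.version)] = spec

    def call(self, name: str, version: str, args: Dict[str, Any]) -> Any:
        spec = self._tools[(name, version)]  # KeyError for an unknown tool/version
        # Reject calls whose arguments do not match the typed contract.
        if set(args) != set(spec.params):
            raise TypeError(f"{name}@{version}: expected params {sorted(spec.params)}")
        for key, expected in spec.params.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{name}@{version}: {key} must be {expected.__name__}")
        # Ask the policy gate, and record the decision either way.
        allowed = self._policy(spec, args)
        self.audit_log.append(
            {"tool": name, "version": version, "args": args, "allowed": allowed}
        )
        if not allowed:
            raise PermissionError(f"policy denied {name}@{version}")
        return spec.fn(**args)


def refund_policy(spec: ToolSpec, args: Dict[str, Any]) -> bool:
    # Assumed example rule: side-effecting calls above 100 units are denied.
    return not spec.side_effecting or args.get("amount", 0) <= 100


registry = ToolRegistry(refund_policy)
registry.register(
    ToolSpec(
        name="issue_refund",
        version="1.0.0",
        params={"order_id": str, "amount": int},
        side_effecting=True,
        fn=lambda order_id, amount: f"refunded {amount} for {order_id}",
    )
)
```

Calling registry.call("issue_refund", "1.0.0", {"order_id": "A1", "amount": 50}) runs the tool, while an amount of 500 raises PermissionError and still leaves an audit entry. In a real system the type check would likely be replaced by a schema validator, and the denied path would hand off to human approval rather than raise.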

Observability via detailed traces (prompts, tool calls, model/tool versions) is essential to debug and govern agents.

Practical Use: Capture and store step-level traces for every run; use traces for regression tests, incident response, and cost monitoring.

Evidence Ref: §3 Observability; §4; LangChain/TrueFoundry references
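
One way to picture a step-level trace is as an append-only list of records, each carrying the prompt or tool call plus the model and tool versions in effect. The sketch below is an assumption-laden illustration: TraceStep, RunTrace, the field names, and the version strings are hypothetical and do not follow any particular tracing product's schema.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class TraceStep:
    kind: str                 # "prompt" or "tool_call" (assumed vocabulary)
    payload: Dict[str, Any]   # prompt text, or tool name and arguments
    versions: Dict[str, str]  # model/tool versions active for this step
    timestamp: float = field(default_factory=time.time)


@dataclass
class RunTrace:
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    steps: List[TraceStep] = field(default_factory=list)

    def record(self, kind: str, payload: Dict[str, Any], versions: Dict[str, str]) -> None:
        self.steps.append(TraceStep(kind, payload, versions))

    def to_jsonl(self) -> str:
        # One JSON object per step, so traces can be diffed in regression tests.
        return "\n".join(
            json.dumps({"run_id": self.run_id, "step": i, **asdict(step)})
            for i, step in enumerate(self.steps)
        )


trace = RunTrace()
trace.record("prompt", {"text": "Find order A1"}, {"model": "example-model-v1"})
trace.record(
    "tool_call",
    {"tool": "lookup_order", "args": {"order_id": "A1"}},
    {"model": "example-model-v1", "lookup_order": "1.0.0"},
)
```

Storing the versions on every step, rather than once per run, is a design choice that keeps a trace interpretable if a tool is upgraded mid-session.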

What To Try In 7 Days

Add structured tracing for a single agent loop (prompts, tool calls, model/tool versions).

Define and register one typed tool with schema validation and sandboxed execution.

Enforce a simple budgeted autonomy policy (max steps, max tool calls, fallback to human approval).
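
The third item above (max steps, max tool calls, fallback to human approval) can be sketched as a small counter that the control loop charges before each action. AutonomyBudget, NeedsHumanApproval, and run_agent are hypothetical names, and the list of (name, is_tool_call) pairs stands in for whatever a real planner would emit.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


class NeedsHumanApproval(Exception):
    """Raised when the next action would exceed the autonomy budget."""


@dataclass
class AutonomyBudget:
    max_steps: int
    max_tool_calls: int
    steps: int = 0
    tool_calls: int = 0

    def charge(self, is_tool_call: bool) -> None:
        next_steps = self.steps + 1
        next_tool_calls = self.tool_calls + (1 if is_tool_call else 0)
        # Check before committing, so a refused action is never counted.
        if next_steps > self.max_steps or next_tool_calls > self.max_tool_calls:
            raise NeedsHumanApproval("budget exhausted")
        self.steps, self.tool_calls = next_steps, next_tool_calls


def run_agent(
    actions: Iterable[Tuple[str, bool]], budget: AutonomyBudget
) -> Tuple[List[str], str]:
    """Run planned actions until done or until the budget forces escalation."""
    done: List[str] = []
    try:
        for name, is_tool_call in actions:
            budget.charge(is_tool_call)
            done.append(name)  # a real loop would execute the action here
    except NeedsHumanApproval:
        return done, "escalated"
    return done, "completed"


plan = [("think", False), ("search", True), ("search", True)]
result = run_agent(plan, AutonomyBudget(max_steps=5, max_tool_calls=1))
```

With a budget of one tool call, the second search is refused and result is (["think", "search"], "escalated").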

Agent Features

Memory

working context (ephemeral prompt window); episodic memory (interaction traces); semantic memory (embeddings, docs, knowledge graphs); preference/profile memory

Planning

ReAct (interleave reasoning and acting); explicit planner modules (task graphs); reflection/self-correction (Reflexion); tree-based search (Tree of Thoughts)
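
To make the ReAct pattern concrete, here is a minimal sketch of the loop: the policy proposes an action, the controller runs the tool, and the observation is appended to the transcript for the next decision. Everything named here is an assumption for illustration: react_loop, the Step tuple format, and scripted_policy, which is a hard-coded stand-in for an LLM call.

```python
from typing import Callable, Dict, List, Tuple

# A step is ("act", tool_name, tool_input) or ("finish", answer, "").
Step = Tuple[str, str, str]


def react_loop(
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    transcript: List[str] = []
    for _ in range(max_steps):
        kind, first, second = policy(transcript)
        if kind == "finish":
            return first
        # Act, then feed the observation back into the next reasoning step.
        observation = tools[first](second)
        transcript.append(f"Action: {first}[{second}]")
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("step budget exhausted without a final answer")


def scripted_policy(transcript: List[str]) -> Step:
    # Stand-in for the LLM: look something up once, then answer with the result.
    if not transcript:
        return ("act", "lookup", "capital of France")
    return ("finish", transcript[-1].split(": ", 1)[1], "")


answer = react_loop(scripted_policy, {"lookup": lambda query: "Paris"})
```

The max_steps cap is what keeps the loop from running indefinitely when the policy never emits a finish step.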

Tool Use

typed tool interfaces (schemas); function calling; tool registry/versioning; sandboxed, least-privilege execution

Frameworks

LangChain, LangSmith, Kore.ai, TrueFoundry, ZenML, Salesforce Agentforce, Model Context Protocol (MCP)

Is Agentic

Yes

Architectures

reactive; deliberative; hybrid; Belief-Desire-Intention (BDI); ReAct loop; orchestrator-worker; router-solver; hierarchical; swarm/market

Collaboration

orchestrator-worker; router-solver; hierarchical command structures; swarm/market-like coordination; role-based coordination

Optimization Features

Token Efficiency

context-window budgeting; use of episodic/semantic memory to avoid re-prompting
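
Context-window budgeting can be as simple as keeping the most recent messages that fit under a token cap. The sketch below is illustrative only: fit_to_budget is a hypothetical helper, and counting whitespace-separated words is a crude stand-in for a real tokenizer.

```python
from typing import List


def fit_to_budget(messages: List[str], max_tokens: int) -> List[str]:
    """Keep the newest messages whose combined (approximate) size fits the budget."""
    kept: List[str] = []
    used = 0
    for message in reversed(messages):
        cost = len(message.split())  # assumption: words approximate tokens
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


window = fit_to_budget(["a b c", "d e", "f"], max_tokens=3)
```

Here window is ["d e", "f"]: the oldest message is dropped because adding it would exceed the cap. Dropped content is where episodic or semantic memory would be consulted instead of re-prompting.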

Infra Optimization

VPC/on-prem deployment for sovereignty; OpenTelemetry-compatible tracing for monitoring; capacity-aware routing to avoid solver overload

System Optimization

memory tiering (cache, vector DB, archival); gateway-first architectures to centralize routing and policy; sandboxing to reduce unsafe side effects
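
Memory tiering can be read as a read-through lookup: check the fastest tier first, fall back to slower ones, and promote hits upward. The sketch below uses plain dictionaries as stand-ins for the cache, vector DB, and archival tiers; TieredMemory is a hypothetical class, and a real vector DB would answer by similarity search rather than by exact key.

```python
from typing import Dict, List, Optional, Tuple


class TieredMemory:
    """Ordered tiers, fastest first; each tier is a (name, store) pair."""

    def __init__(self, tiers: List[Tuple[str, Dict[str, str]]]):
        self.tiers = tiers

    def get(self, key: str) -> Tuple[Optional[str], Optional[str]]:
        """Return (value, tier_name), or (None, None) when no tier has the key."""
        for index, (name, store) in enumerate(self.tiers):
            if key in store:
                value = store[key]
                # Promote the hit into every faster tier for later lookups.
                for _, faster_store in self.tiers[:index]:
                    faster_store[key] = value
                return value, name
        return None, None


memory = TieredMemory(
    [
        ("cache", {}),
        ("vector_db", {}),
        ("archival", {"policy:refunds": "Refunds over 100 need approval."}),
    ]
)
first_hit = memory.get("policy:refunds")
second_hit = memory.get("policy:refunds")
```

The first lookup is served from the archival tier and the second from the cache, which is the latency and cost effect tiering is meant to deliver.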

Inference Optimization

session awareness to reduce repeated context rebuilds; token budgeting to cap cost

Reproducibility

Code Available: No

Data Available: No

Open Source Status: Unknown

License: Unknown

Risks & Boundaries

Limitations

Analysis relies heavily on vendor ‘grey literature’ and platform descriptions rather than controlled experiments.

No empirical benchmarks or performance measurements are provided for the proposed architectures.

When Not To Use

If you need formal, provable guarantees today (paper offers architecture, not formal verification).

For hard real-time or safety-critical control where low-latency deterministic loops are mandatory.

Failure Modes

silent worker failures (missing heartbeats / no ACK)

misrouting or classifier errors sending tasks to wrong solvers
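
The two failure modes above lend themselves to simple runtime checks. The sketch below shows one common style of mitigation, not necessarily the one the paper maps to each failure: a heartbeat timeout to surface silent workers, and a confidence threshold that sends low-confidence routing decisions to a fallback. The names stale_workers, route, and "human_review" are hypothetical.

```python
from typing import Dict, List


def stale_workers(
    last_heartbeat: Dict[str, float], now: float, timeout_s: float
) -> List[str]:
    """Return workers whose last heartbeat is older than the timeout."""
    return sorted(
        worker for worker, seen in last_heartbeat.items() if now - seen > timeout_s
    )


def route(
    scores: Dict[str, float], min_confidence: float, fallback: str = "human_review"
) -> str:
    """Pick the top-scoring solver, or the fallback when the classifier is unsure."""
    solver, score = max(scores.items(), key=lambda item: item[1])
    return solver if score >= min_confidence else fallback


silent = stale_workers({"w1": 100.0, "w2": 95.0}, now=106.0, timeout_s=10.0)
unsure = route({"billing": 0.55, "tech": 0.45}, min_confidence=0.7)
confident = route({"billing": 0.9, "tech": 0.1}, min_confidence=0.7)
```

Here silent is ["w2"], unsure is "human_review", and confident is "billing". The timeout and threshold values are placeholders that would need tuning against real traffic.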

Core Entities

Models

Large Language Models (LLMs)

Metrics

tool-use correctness; parameter correctness; progress rates; cost/token usage
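
The summary names these metrics without defining them, so the sketch below rests on assumed definitions: tool-use correctness as the share of steps where the predicted tool matches the expected one, and parameter correctness as the share of those matching steps whose arguments are also identical. score_calls and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Call = Tuple[str, Dict[str, Any]]  # (tool name, arguments)


def score_calls(predicted: List[Call], expected: List[Call]) -> Dict[str, float]:
    tool_hits = 0
    param_hits = 0
    for (pred_tool, pred_args), (exp_tool, exp_args) in zip(predicted, expected):
        if pred_tool == exp_tool:
            tool_hits += 1
            if pred_args == exp_args:
                param_hits += 1
    total = len(expected)
    return {
        # Assumed definition: correct tool choices over all expected steps.
        "tool_use_correctness": tool_hits / total if total else 0.0,
        # Assumed definition: exact argument matches over correct tool choices.
        "parameter_correctness": param_hits / tool_hits if tool_hits else 0.0,
    }


predicted = [("search", {"q": "a"}), ("search", {"q": "x"}), ("email", {})]
expected = [("search", {"q": "a"}), ("search", {"q": "b"}), ("calc", {})]
scores = score_calls(predicted, expected)
```

For this sample, two of three tools match and one of those two has matching arguments. Scores like these can be computed directly from stored traces if each trace records the tool name and arguments per step.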

Context Entities

Metrics

trace completeness; latency (vector DB); execution budget caps
