Overview
Production Readiness
0.7
Novelty Score
0.5
Cost Impact Score
0.6
Citation Count
0
Why It Matters For Business
Agentic architectures turn LLMs into governed interfaces that can automate multi-step workflows, but they require engineering investments in governance, observability, and versioning to control risk and compliance.
Summary TLDR
This survey maps how LLMs are moving from stateless prompt-response calls to goal-directed 'agentic' systems. It proposes a layered reference architecture that separates LLM cognition from control, memory, and tool execution; it catalogs multi-agent topologies and their failure modes; and it presents an enterprise hardening checklist covering governance, observability, and reproducibility. The paper synthesizes vendor materials (LangChain, Kore.ai, TrueFoundry, ZenML, Salesforce) and classical agent theory to argue that production-grade agents need typed tool contracts, detailed traces, and enforced runtime policies.
Problem Statement
Stateless prompt-response patterns break down on long-running tasks, on tool calls with external side effects, and in regulated environments, so engineers patch models with brittle scaffolding. The paper asks three questions: what software primitives define practical agentic systems, how do single-agent loops scale to multi-agent topologies, and what governance and reliability controls are required for production?
Main Contribution
A governed reference architecture that separates LLM reasoning from control, memory, tooling, and governance.
A taxonomy of multi-agent topologies (orchestrator-worker, router-solver, hierarchical, swarm) with mapped failure modes and mitigations.
An enterprise hardening checklist detailing MUST/SHOULD controls for identity, policy, observability, budgeting, and reproducibility.
Key Findings
Production agent systems should separate cognition (LLM) from execution and enforce typed tool interfaces.
Observability via detailed traces (prompts, tool calls, model/tool versions) is essential to debug and govern agents.
Multi-agent topologies reduce context pollution and allow specialization but introduce coordination failure modes like deadlocks, misrouting, and cascading costs.
Enterprise deployments require baked-in governance: RBAC, immutable audit logs, VPC/on-prem options, and budgeted autonomy.
The field is converging on common platform primitives (registries, gateways, MCP-like schemas) but lacks formal verification and interoperability standards.
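The observability finding above can be made concrete with a small trace recorder. This is a minimal sketch, not the paper's design: the `TraceEvent` and `Tracer` names and all field names are assumptions chosen for illustration. It records prompts and tool calls together with model and tool versions, and serializes them as JSON Lines.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class TraceEvent:
    """One step of an agent run. Field names are illustrative assumptions."""
    run_id: str
    step: int
    kind: str                      # e.g. "prompt" or "tool_call"
    payload: dict
    model_version: str
    tool_version: Optional[str] = None
    timestamp: float = field(default_factory=time.time)


class Tracer:
    """Collects events for a single run, in order."""

    def __init__(self, model_version: str):
        self.run_id = uuid.uuid4().hex
        self.model_version = model_version
        self.events = []

    def record(self, kind, payload, tool_version=None):
        event = TraceEvent(
            run_id=self.run_id,
            step=len(self.events),
            kind=kind,
            payload=payload,
            model_version=self.model_version,
            tool_version=tool_version,
        )
        self.events.append(event)
        return event

    def to_jsonl(self) -> str:
        # One JSON object per line, suitable for shipping to a log store.
        return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in self.events)
```

A real deployment would more likely emit these as spans through an existing tracing stack; the point here is only which fields a trace needs to carry.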
Who Should Care
What To Try In 7 Days
Add structured tracing for a single agent loop (prompts, tool calls, model/tool versions).
Define and register one typed tool with schema validation and sandboxed execution.
Enforce a simple budgeted autonomy policy (max steps, max tool calls, fallback to human approval).
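The budgeted autonomy item above can be sketched as a counter that the agent loop must charge before every action. This is a minimal sketch under stated assumptions: `Budget`, `BudgetExceeded`, and `run_agent` are hypothetical names, and the `actions` list stands in for decisions that a real agent would produce from a model.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when the agent must stop and escalate to a human."""


@dataclass
class Budget:
    max_steps: int
    max_tool_calls: int
    steps: int = 0
    tool_calls: int = 0

    def charge(self, is_tool_call: bool) -> None:
        # Count the action first, then check both limits.
        self.steps += 1
        if is_tool_call:
            self.tool_calls += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded("step limit reached")
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool-call limit reached")


def run_agent(actions, budget):
    """`actions` is a hypothetical iterable of (name, is_tool_call) pairs."""
    completed = []
    for name, is_tool_call in actions:
        try:
            budget.charge(is_tool_call)
        except BudgetExceeded as exc:
            # Fallback: stop acting and hand the run to a human approver.
            return {"status": "needs_human_approval",
                    "reason": str(exc),
                    "completed": completed}
        completed.append(name)
    return {"status": "done", "completed": completed}
```

Charging before the action runs means an over-budget tool call is never executed, which matters when tools have side effects.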
Agent Features
Memory
- working context (ephemeral prompt window)
- episodic memory (interaction traces)
- semantic memory (embeddings, docs, knowledge graphs)
- preference/profile memory
Planning
- ReAct (interleave reasoning and acting)
- explicit planner modules (task graphs)
- reflection/self-correction (Reflexion)
- tree-based search (Tree of Thoughts)
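Of the planning styles listed above, the ReAct loop is the simplest to sketch: the agent alternates a thought, an action, and an observation until it decides to answer. The code below is an illustrative sketch only. `react_loop` and `scripted_policy` are hypothetical names, and the scripted policy is a stand-in for a model call, not a real LLM integration.

```python
def react_loop(policy, tools, question, max_turns=5):
    """Interleave reasoning and acting.

    `policy(transcript)` returns either
      {"thought": str, "tool": str, "input": object}  to act, or
      {"thought": str, "answer": str}                 to finish.
    """
    transcript = [("question", question)]
    for _ in range(max_turns):
        step = policy(transcript)
        transcript.append(("thought", step["thought"]))
        if "answer" in step:
            return step["answer"], transcript
        observation = tools[step["tool"]](step["input"])
        transcript.append(("action", (step["tool"], step["input"])))
        transcript.append(("observation", observation))
    # Turn limit reached without an answer.
    return None, transcript


def scripted_policy(transcript):
    # Stand-in for an LLM: look something up once, then answer from the result.
    kind, value = transcript[-1]
    if kind == "observation":
        return {"thought": "I have what I need.",
                "answer": f"The capital is {value}."}
    return {"thought": "I should look this up.",
            "tool": "lookup", "input": "France"}


capitals = {"France": "Paris"}
answer, transcript = react_loop(
    scripted_policy, {"lookup": capitals.get}, "What is the capital of France?"
)
```

The `max_turns` cap is the loop-level counterpart of a step budget: without it, a policy that never answers would run forever.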
Tool Use
- typed tool interfaces (schemas)
- function calling
- tool registry/versioning
- sandboxed, least-privilege execution
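A typed tool interface with a versioned registry might look like the sketch below. It is an assumption-laden illustration, not an API from any framework named in this summary: `ToolSpec` and `ToolRegistry` are hypothetical, the schema is a plain mapping from parameter name to Python type, and sandboxing is out of scope here.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolSpec:
    name: str
    version: str
    params: dict                    # parameter name -> expected Python type
    handler: Callable[..., Any]


class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, spec: ToolSpec) -> None:
        key = (spec.name, spec.version)
        if key in self._tools:
            # Versions are immutable once published.
            raise ValueError(f"{spec.name}@{spec.version} already registered")
        self._tools[key] = spec

    def call(self, name: str, version: str, args: dict) -> Any:
        spec = self._tools[(name, version)]
        unknown = set(args) - set(spec.params)
        missing = set(spec.params) - set(args)
        if unknown or missing:
            raise TypeError(
                f"bad arguments: unknown={sorted(unknown)}, missing={sorted(missing)}"
            )
        for key, expected in spec.params.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{key} must be {expected.__name__}")
        # Arguments are validated before the handler ever runs.
        return spec.handler(**args)
```

Pinning calls to `(name, version)` lets a trace record exactly which tool contract was in force, which supports the reproducibility goals described elsewhere in this summary.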
Frameworks
- LangChain
- LangSmith
- Kore.ai
- TrueFoundry
- ZenML
- Salesforce Agentforce
- Model Context Protocol (MCP)
Is Agentic
true
Architectures
- reactive
- deliberative
- hybrid
- Belief-Desire-Intention (BDI)
- ReAct loop
- orchestrator-worker
- router-solver
- hierarchical
- swarm/market
Collaboration
- orchestrator-worker
- router-solver
- hierarchical command structures
- swarm/market-like coordination
- role-based coordination
Optimization Features
Token Efficiency
- context-window budgeting
- use of episodic/semantic memory to avoid re-prompting
Infra Optimization
- VPC/on-prem deployment for sovereignty
- OpenTelemetry-compatible tracing for monitoring
- capacity-aware routing to avoid solver overload
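Capacity-aware routing, the last item above, can be illustrated with a router that only considers solvers with spare capacity and picks the least loaded one. This is a hedged sketch: `Solver` and `route` are hypothetical names, and the skill labels and capacities are made up for the example.

```python
class Solver:
    def __init__(self, name, skills, capacity):
        self.name = name
        self.skills = set(skills)
        self.capacity = capacity    # max concurrent tasks, assumed >= 1
        self.in_flight = 0


def route(task_kind, solvers):
    """Return a solver for the task, or None if every capable solver is full."""
    candidates = [
        s for s in solvers
        if task_kind in s.skills and s.in_flight < s.capacity
    ]
    if not candidates:
        return None                 # caller should queue or escalate
    # Prefer the solver with the lowest relative load.
    chosen = min(candidates, key=lambda s: s.in_flight / s.capacity)
    chosen.in_flight += 1
    return chosen


solvers = [Solver("sql-a", ["sql"], capacity=1), Solver("sql-b", ["sql"], capacity=2)]
assigned = [route("sql", solvers) for _ in range(4)]
names = [s.name if s else None for s in assigned]
```

Returning `None` rather than overloading a solver makes the back-pressure explicit, so the caller decides whether to queue, retry, or fail.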
System Optimization
- memory tiering (cache, vector DB, archival)
- gateway-first architectures to centralize routing and policy
- sandboxing to reduce unsafe side effects
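The memory tiering item above (cache, vector DB, archival) can be sketched as a read-through lookup that promotes hits into the fastest tier. This is a toy model under loud assumptions: `TieredMemory` is a hypothetical class, and plain dictionaries stand in for the middle and archival stores, so no similarity search is performed.

```python
from collections import OrderedDict


class TieredMemory:
    def __init__(self, cache_size=2):
        self.cache = OrderedDict()  # hot tier, least recently used first
        self.warm = {}              # stand-in for a vector DB
        self.archive = {}           # stand-in for archival storage
        self.cache_size = cache_size

    def put(self, key, value):
        # New items start in the archive and are promoted on first read.
        self.archive[key] = value

    def get(self, key):
        """Return (value, tier_name) and promote the item to the cache."""
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key], "cache"
        for tier_name, tier in (("warm", self.warm), ("archive", self.archive)):
            if key in tier:
                value = tier[key]
                self._promote(key, value)
                return value, tier_name
        raise KeyError(key)

    def _promote(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.cache_size:
            # Demote the least recently used entry to the warm tier.
            old_key, old_value = self.cache.popitem(last=False)
            self.warm[old_key] = old_value
```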
Inference Optimization
- session awareness to reduce repeated context rebuilds
- token budgeting to cap cost
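Token budgeting can be illustrated by trimming conversation history to fit a fixed budget, keeping the most recent messages. This is a sketch, not a recommendation from the paper: `fit_to_budget` is a hypothetical helper, and the default whitespace-based counter is a crude assumption standing in for a real tokenizer.

```python
def fit_to_budget(messages, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep the newest messages whose combined token count fits the budget.

    Returns (kept_messages_in_original_order, tokens_used).
    """
    kept = []
    used = 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break                   # stop at the first message that does not fit
        kept.append(message)
        used += cost
    kept.reverse()
    return kept, used


history = ["a b c", "d e", "f"]
kept, used = fit_to_budget(history, max_tokens=3)
```

Passing the counter as a parameter keeps the trimming logic independent of any one model's tokenizer.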
Reproducibility
Open Source Status
- unknown
Risks & Boundaries
Limitations
- Analysis relies heavily on vendor ‘grey literature’ and platform descriptions rather than controlled experiments.
- No empirical benchmarks or performance measurements are provided for the proposed architectures.
- Scope excludes embodied robotics and deep real-time control; focuses on software agents in information systems.
When Not To Use
- If you need formal, provable guarantees today (the paper offers an architecture, not formal verification).
- If you need hard real-time or safety-critical control, where low-latency deterministic loops are mandatory.
- If you cannot provide strict data sovereignty or RBAC controls for regulated data.
Failure Modes
- silent worker failures (missing heartbeats / no ACK)
- misrouting or classifier errors sending tasks to wrong solvers
- delegation deadlocks and circular dependencies in hierarchical flows
- cascading tool calls causing unbounded costs or operational incidents
- policy bypass via delegation to higher-privilege agents
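One of the failure modes above, circular delegation, can be detected before a run starts by checking the delegation graph for cycles. The sketch below uses a standard depth-first search; `find_delegation_cycle` and the agent names are hypothetical, and the graph format (agent name mapped to a list of delegates) is an assumption.

```python
def find_delegation_cycle(delegations):
    """Return one cycle as a list of agent names, or None if the graph is acyclic.

    `delegations` maps each agent to the agents it may delegate to.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(agent):
        state[agent] = IN_PROGRESS
        path.append(agent)
        for child in delegations.get(agent, []):
            child_state = state.get(child, UNSEEN)
            if child_state == IN_PROGRESS:
                # Back edge: the child is already on the current path.
                return path[path.index(child):] + [child]
            if child_state == UNSEEN:
                cycle = visit(child)
                if cycle:
                    return cycle
        path.pop()
        state[agent] = DONE
        return None

    for agent in delegations:
        if state.get(agent, UNSEEN) == UNSEEN:
            cycle = visit(agent)
            if cycle:
                return cycle
    return None


graph = {"planner": ["researcher"], "researcher": ["writer"], "writer": ["planner"]}
cycle = find_delegation_cycle(graph)
```

A static check like this only covers delegation rules known in advance; cycles that emerge from dynamic routing would still need a runtime guard such as a delegation-depth limit.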
Core Entities
Models
- Large Language Models (LLMs)
Metrics
- tool-use correctness
- parameter correctness
- progress rates
- cost/token usage
Context Entities
Metrics
- trace completeness
- latency (vector DB)
- execution budget caps

