Overview
The chapter is a recent, well-sourced survey. It outlines agent architectures and research priorities but provides few quantitative benchmarks; its evidence mixes cited work with conceptual analysis.
Citations: 0
Evidence Strength: 0.60
Confidence: 0.85
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 1/4
Findings with evidence refs: 4/4
Results with explicit delta: 0/1
Reproducibility
Status: No open assets linked
Open source: No
At A Glance
Cost impact: 60%
Production readiness: 40%
Novelty: 50%
Why It Matters For Business
Agentic AI can automate multi-step workflows, connect to external tools, and retain context across steps. But it also raises real risks: wrong actions, privacy leaks, and higher compute bills. Companies should pilot it with tight guardrails, audit logs, and cost controls.
Summary TLDR
This survey explains how large language models (LLMs) are being wrapped into autonomous agents that plan, use tools, and keep memory. It lays out a simple architecture (perception, LLM brain, memory, action), gives examples (single- and multi-agent flows), and highlights the main technical and governance gaps: verifiable planning, robust long-term memory, multi-agent coordination, safety guardrails, and sustainable inference.
Problem Statement
LLMs are powerful text engines but not full agents. Building safe, reliable systems that can plan, act in the world, remember across sessions, and coordinate multiple roles requires new architectures, evaluation methods, and governance.
Main Contribution
Synthesis of how LLM capabilities extend toward agent-like behavior via reason-act-reflect loops.
An integrative architecture that lists core modules: perception, LLM reasoning/planning, memory, and action execution.
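The four modules above can be wired into a single reason-act-reflect loop. The sketch below is a minimal illustration under stated assumptions, not the survey's own design: the "LLM" is a hypothetical rule-based stub, memory uses toy bag-of-words vectors with cosine similarity in place of a real embedding model and vector DB, and every function name (`perceive`, `reason`, `act`, `step`) is hypothetical.

```python
"""Minimal sketch of a perception -> reasoning -> action loop with memory.
All names are hypothetical; `reason` is a rule-based stand-in for an LLM."""
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Memory:
    # External memory: stores (text, vector) pairs and recalls the top-k matches.
    items: list = field(default_factory=list)

    def write(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def recall(self, query: str, k: int = 2) -> list:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def perceive(raw: str) -> str:
    # Perception: normalize raw input into an observation string.
    return raw.strip()


def reason(observation: str, context: list) -> tuple:
    # Stub "LLM brain": picks an action. A real model would also condition on
    # the recalled context; this stub ignores it.
    if "add" in observation:
        nums = [int(t) for t in observation.split() if t.isdigit()]
        return ("sum", nums)
    return ("reply", observation)


def act(action: tuple) -> str:
    # Action execution: run the chosen tool and return its result.
    name, arg = action
    if name == "sum":
        return str(sum(arg))
    return f"echo: {arg}"


def step(raw: str, memory: Memory) -> str:
    obs = perceive(raw)
    context = memory.recall(obs)          # retrieve relevant memories
    result = act(reason(obs, context))    # reason, then act
    memory.write(f"{obs} -> {result}")    # reflect: store the outcome for later turns
    return result
```

For example, `step("please add 2 and 3", Memory())` returns `"5"`. Keeping each module behind its own function makes it straightforward to swap the stub for a real model call or the toy memory for a vector store without touching the loop.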
Key Findings
Agentic behavior arises when LLMs are combined with perception, external memory, and tool execution into a closed-loop reason-act-reflect cycle.
Existing language-model benchmarks can miss cultural and linguistic gaps; one cited Arabic benchmark found leading models score about 30% on culturally grounded reasoning tasks.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Accuracy | ~30% | — | — | Arabic cultural reasoning benchmark (ref [33]) | Section 6.1 cites models scoring ~30% on this benchmark. | [33]; Section 6.1 |
What To Try In 7 Days
Build a simple ReAct-style agent that calls a calculator and a search API; log every tool call.
Add a vector DB for short-term memory and test consistency across 5–10 interactions.
Introduce action-level checkpoints with human approval for any irreversible operation.
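The first exercise can be sketched as follows, assuming a scripted `fake_llm` stands in for a real model and `search` is an offline placeholder for whatever search API you use. The `Thought / Action / Observation` transcript follows the common ReAct convention; the `Action: tool[arg]` syntax, tool names, and `AUDIT_LOG` structure are hypothetical choices for illustration.

```python
"""Minimal ReAct-style loop with a calculator tool, a stubbed search tool,
and an audit log of every tool call. `fake_llm` is a scripted stand-in."""
import ast
import operator
import re
import time

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv, ast.USub: operator.neg}


def calculator(expr: str) -> str:
    # Safe arithmetic evaluator: walks the AST instead of calling eval().
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return str(ev(ast.parse(expr, mode="eval")))


def search(query: str) -> str:
    # Hypothetical placeholder: swap in a real search API client here.
    return f"(no results for {query!r} in this offline stub)"


TOOLS = {"calculator": calculator, "search": search}
AUDIT_LOG: list = []
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def call_tool(name: str, arg: str) -> str:
    # Every tool call is logged with timestamp, input, and output (or error).
    try:
        out = TOOLS[name](arg)
    except Exception as exc:  # failures are logged and fed back as observations
        out = f"error: {exc}"
    AUDIT_LOG.append({"ts": time.time(), "tool": name, "input": arg, "output": out})
    return out


def fake_llm(transcript: str) -> str:
    # Scripted model: act once, then answer from the latest observation.
    if "Observation:" not in transcript:
        return "Thought: I need to compute this.\nAction: calculator[2 * (3 + 4)]"
    last = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Final Answer: {last}"


def react(question: str, llm=fake_llm, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += "\n" + reply
        if reply.startswith("Final Answer:"):
            return reply.removeprefix("Final Answer:").strip()
        match = ACTION_RE.search(reply)
        if not match:
            return "stopped: no action parsed"
        transcript += "\nObservation: " + call_tool(match.group(1), match.group(2))
    return "stopped: step budget exhausted"
```

Here `react("What is 2*(3+4)?")` returns `"14"` and leaves one calculator entry in `AUDIT_LOG`. Replacing `fake_llm` with a real model only requires a function that maps the transcript string to the next reply; `max_steps` acts as a simple budget against runaway loops.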
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic: Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
Infra Optimization
Model Optimization
System Optimization
Training Optimization
Inference Optimization
Risks & Boundaries
Limitations
Survey-style chapter: conceptual and synthetic, not an empirical method paper.
Few new quantitative experiments or benchmarks provided.
When Not To Use
Do not deploy agentic systems for irreversible, high-stakes actions without strict human approval.
Avoid relying on current persistent memory for identity-critical tasks due to drift and privacy risk.
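One way to enforce the human-approval rule above is an action-level checkpoint in front of the tool executor. In this minimal sketch the `Tool` dataclass, its `irreversible` flag, and the console approver are hypothetical; a production system would more likely route approval through a ticket or chat-based review step and log each decision.

```python
"""Minimal sketch of an action-level checkpoint: tools flagged as
irreversible run only after an approval callback returns True."""
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    fn: Callable[[str], str]
    irreversible: bool = False  # hypothetical flag set by the tool author


def console_approver(tool: Tool, arg: str) -> bool:
    # Default approver: ask a human on the terminal; anything but "y" denies.
    answer = input(f"Approve irreversible action {tool.name}({arg!r})? [y/N] ")
    return answer.strip().lower() == "y"


def execute(tool: Tool, arg: str,
            approve: Callable[[Tool, str], bool] = console_approver) -> str:
    # Reversible tools run directly; irreversible ones need explicit approval.
    if tool.irreversible and not approve(tool, arg):
        return f"blocked: {tool.name} requires human approval"
    return tool.fn(arg)
```

For example, with `refund = Tool("issue_refund", lambda oid: f"refunded {oid}", irreversible=True)`, calling `execute(refund, "A17", approve=lambda t, a: False)` returns `"blocked: issue_refund requires human approval"`, while a read-only tool with the default `irreversible=False` never triggers the prompt. Defaulting to deny keeps the failure mode safe.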
Failure Modes
Error amplification across long multi-step workflows.
Non-deterministic outputs causing inconsistent behavior.
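Error amplification can be made concrete with a simple model. Assuming, purely for illustration, that each of the $n$ steps succeeds independently with the same probability $p$ (real agent errors are often correlated, and these numbers do not come from the survey), end-to-end success decays geometrically:

```latex
P(\text{workflow succeeds}) = p^{n}, \qquad
0.95^{20} \approx 0.36, \qquad
0.99^{20} \approx 0.82
```

Under this assumption, a step that is 95% reliable still leaves a 20-step workflow failing about 64% of the time, which is why per-step verification and checkpoints matter more as workflows grow longer.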

