Overview
Production Readiness
0.4
Novelty Score
0.4
Cost Impact Score
0.7
Citation Count
0
Why It Matters For Business
Measure agent value as Agentic ROI (quality and time saved per dollar spent) to decide where agents can be deployed profitably and to avoid spending on low-ROI, high-cost integrations.
Summary TLDR
This position paper argues that the real bottleneck for widespread LLM agent adoption is low Agentic ROI: the user-facing ratio of information gain and time savings to cost. The authors define Agentic ROI, demonstrate its use with a 34-person survey across five domains, and report high ROI in coding and research but low ROI in mass-market tasks such as office work and e-commerce. They propose a zigzag roadmap: first "scale up" agents (sleep-time compute, multi-step reasoning, proactive interaction) to increase information gain and time savings, then "scale down" (memory retrieval, distillation, quantization, hardware-software co-optimization) to cut per-task cost. The paper is a strategic call to re-e
Problem Statement
LLM agents can technically automate many tasks, but many real-world uses deliver too little net benefit to users once time, prompting effort, verification, and cost are accounted for. The paper introduces Agentic ROI to measure whether deploying an agent actually improves users' utility compared to human or UI alternatives.
Main Contribution
Introduce Agentic ROI: a simple, actionable metric combining information gain, time savings, and monetary cost to evaluate agent usability.
Present a small empirical demonstration (n=34 survey) showing Agentic ROI correlates strongly with reported usability (r=0.95).
Describe a practical zigzag roadmap: scale up to raise information gain and time savings, then scale down to cut cost for mass-market adoption.
Highlight concrete engineering levers: sleep-time compute, multi-step capabilities, proactive interaction, memory retrieval, and model compression.
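A minimal sketch of how such a metric could be computed, for illustration only. The summary does not give the paper's exact formula, so the multiplicative form below (information gain times minutes saved, divided by dollar cost) is an assumption, and the function and parameter names are hypothetical.

```python
def agentic_roi(info_gain: float, human_minutes: float,
                agent_minutes: float, cost_usd: float) -> float:
    """Illustrative Agentic ROI score (assumed form, not the paper's definition).

    info_gain:      quality or usefulness of the result, e.g. a 0..1 rating
    human_minutes:  time the user needs without the agent
    agent_minutes:  time the user needs with the agent
    cost_usd:       monetary cost of the agent run
    """
    if cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    minutes_saved = human_minutes - agent_minutes
    # A negative result means the agent was slower than doing the task by hand.
    return info_gain * minutes_saved / cost_usd


# Example: a task that drops from 60 to 20 minutes at $2 per run.
score = agentic_roi(info_gain=0.5, human_minutes=60, agent_minutes=20, cost_usd=2.0)
```

Under this assumed form, a task the agent makes slower yields a negative score regardless of output quality, which matches the summary's point that overhead can erase the benefit.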
Key Findings
Reported agent usability across domains aligns tightly with computed Agentic ROI.
High Agentic ROI appears in coding and scientific research; low ROI in office work, e-commerce, and personal assistance.
Prompting overhead and verification time can erase time savings for short, well-structured tasks.
Agentic ROI is personalizable: users with lower baseline skill often gain disproportionately large ROI.
Results
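Survey sample size
34
Correlation between Agentic ROI and reported usability
r=0.95
Domain-level ROI trend
High in coding and scientific research; low in office work, e-commerce, and personal assistance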
Who Should Care
What To Try In 7 Days
Run a small ROI audit: pick one high-T0 workflow (one that takes a person a long time without the agent), log the baseline time T0 and the agent-assisted time T_agent, and collect user quality ratings.
Add simple proactive features (prefilled templates, intent inference) to cut interaction time and re-measure ROI.
Pilot sleep-time compute or cached retrieval for repetitive tasks to estimate cost savings.
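The audit in the first step could be logged with something as small as the sketch below. The record fields, the 1 to 5 rating scale, and the summary statistics are assumptions chosen for illustration; T_agent is taken here to include prompting and checking time.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class AuditRecord:
    t0_minutes: float       # baseline: time to do the task without the agent
    t_agent_minutes: float  # time with the agent, including prompting and checking
    quality_rating: int     # user rating, 1 (poor) to 5 (excellent); assumed scale


def summarize(records: list[AuditRecord]) -> dict:
    """Aggregate a small audit log into a few headline numbers."""
    saved = [r.t0_minutes - r.t_agent_minutes for r in records]
    return {
        "runs": len(records),
        "median_minutes_saved": median(saved),
        "mean_quality": mean(r.quality_rating for r in records),
        # Share of runs where the agent was actually faster than the baseline.
        "share_faster": sum(s > 0 for s in saved) / len(records),
    }


# Hypothetical log entries, for illustration only.
log = [AuditRecord(30, 10, 4), AuditRecord(20, 25, 3), AuditRecord(40, 20, 5)]
report = summarize(log)
```

Re-running the same summary after adding proactive features or caching gives a like-for-like comparison for the second and third steps.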
Agent Features
Memory
- sleep-time compute (offline refinement)
- long-term memory / retrieval
- state persistence
Planning
- long-horizon reasoning
- iterative simulation
- task decomposition
Tool Use
- API integration
- tool orchestration
- external verification calls
Frameworks
- n8n
- LangChain
- AutoGen
- MetaGPT
Is Agentic
true
Architectures
- multi-agent
- generalist-to-specialist pipeline
Collaboration
- agent swarms
- multi-agent coordination
Optimization Features
Token Efficiency
- speculative decoding
- context compression
Infra Optimization
- use of inference-optimized stacks (e.g., vLLM, FlashAttention)
- AI-specific hardware co-design
Model Optimization
- knowledge distillation
- quantization
- pruning
- speculative decoding
System Optimization
- memory retrieval instead of regeneration
- state persistence to avoid recomputation
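As a rough illustration of retrieving instead of regenerating, the sketch below caches answers by normalized prompt so that a repeated request skips the expensive call. This exact-match cache is a simplified stand-in for memory retrieval, not the paper's mechanism; `AnswerCache` and the `generate` callable are hypothetical names.

```python
import hashlib
from typing import Callable


class AnswerCache:
    """Exact-match answer cache (simplified stand-in for memory retrieval)."""

    def __init__(self, generate: Callable[[str], str]):
        self._generate = generate  # the expensive call, e.g. an LLM request
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalize case and whitespace so trivially different prompts match.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def answer(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._generate(prompt)
        self._store[key] = result
        return result
```

The hit and miss counters give a first estimate of how many generations, and therefore how much cost, a repetitive workload could avoid.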
Training Optimization
- specialization for sub-tasks
- distillation from generalist to expert models
Inference Optimization
- sleep-time compute precomputation
- retrieval-based reasoning
- hardware-software co-optimization
Reproducibility
Open Source Status
- unknown
Risks & Boundaries
Limitations
- Small empirical sample (34 survey responses) limits generalizability.
- Cost estimates per task are coarse and normalized heuristically.
- Survey is self-reported and domain selection is limited to five categories.
- The reported correlation is associative, not causal.
When Not To Use
- Short, single-step interactions where a conventional UI is faster (low-T0 tasks).
- Deterministic, repetitive processes best served by RPA or rule systems.
- Sensitive settings where sleep-time compute raises privacy concerns without safeguards.
Failure Modes
- Prompting and verification overhead can erase time savings, yielding negative ROI.
- Agent hallucination or drift during long multi-step tasks causes extra verification.
- High compute cost can make marginal accuracy gains uneconomical.
- Inter-agent coordination overhead in swarms may reduce net benefit.
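The first two failure modes can be made concrete with a back-of-the-envelope model. The sketch below assumes each attempt costs prompting, run, and verification time, succeeds independently with probability `p_success`, and is fully redone on failure, so the expected number of attempts is 1 / `p_success`. These assumptions and all names are illustrative, not taken from the paper.

```python
def expected_agent_minutes(prompt_min: float, run_min: float,
                           verify_min: float, p_success: float) -> float:
    """Expected user-facing time until one verified-good result."""
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    per_attempt = prompt_min + run_min + verify_min
    # Geometric retries: on average 1 / p_success attempts are needed.
    return per_attempt / p_success


def net_minutes_saved(t0_min: float, prompt_min: float, run_min: float,
                      verify_min: float, p_success: float) -> float:
    """Baseline time minus expected agent time; negative means a net loss."""
    return t0_min - expected_agent_minutes(prompt_min, run_min, verify_min, p_success)


# A 3-minute task with an unreliable agent loses time on average,
short_task = net_minutes_saved(3, 1, 0.5, 1, p_success=0.5)
# while a 60-minute task still comes out well ahead despite heavier checking.
long_task = net_minutes_saved(60, 2, 5, 8, p_success=0.75)
```

In this model the break-even baseline time is simply the expected agent time, so lowering the failure rate or the verification burden both widen the range of tasks worth delegating.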
Core Entities
Models
- GPT-5
- Gemini-3
- Qwen-3
- DeepSeek-V3.2
Metrics
- Agentic ROI
- Information Gain
- Time Savings
- Cost
- Usability (user ratings)
Benchmarks
- GAIA
- AndroidWorld
- τ2-Bench
- AI Index
Context Entities
Models
- Gemini 3 Pro
- ChatGPT Pulse
Metrics
- r (correlation coefficient)
Benchmarks
- AndroidWorld
- GAIA

