Agentic ROI: prioritize real user value, not raw model scores

May 23, 2025 · 7 min read

Overview

Decision Snapshot: Needs Validation

The paper introduces a practical evaluation lens (Agentic ROI) and a roadmap. The evidence combines a small survey (n=34) with conceptual arguments, so the ideas are actionable but need broader empirical validation.

Citations: 0

Evidence Strength: 0.50

Confidence: 0.80

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 0/3

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 70%

Production readiness: 40%

Novelty: 40%

Authors

Weiwen Liu, Jiarui Qin, Xu Huang, Xingshan Zeng, Yunjia Xi, Jianghao Lin, Chuhan Wu, Yasheng Wang, Lifeng Shang, Ruiming Tang, Defu Lian, Yong Yu, Weinan Zhang

Why It Matters For Business

Measure agent value as Agentic ROI (information gain and time saved, relative to cost) to decide where agents can be deployed profitably and to avoid wasting resources on low-ROI, high-cost integrations.
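A minimal sketch of how such a metric can be computed per task. It assumes a simplified form that is not the paper's exact definition: a user-rated quality score in [0, 1] stands in for information gain, time saved is the human baseline time minus agent run time and user interaction time (prompting plus verification), and cost is dollars per task. The names `TaskMeasurement` and `agentic_roi`, the field names, and both example tasks are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskMeasurement:
    """One task measured with and without the agent (hypothetical schema)."""
    quality: float            # user-rated output quality in [0, 1]; proxy for information gain
    t0_min: float             # minutes a human needs without the agent
    t_agent_min: float        # minutes the agent runs before returning a result
    t_interaction_min: float  # minutes the user spends prompting and verifying
    cost_usd: float           # dollar cost of the agent run (API calls, compute)


def agentic_roi(m: TaskMeasurement) -> float:
    """Quality-weighted minutes saved per dollar (simplified, assumed form)."""
    if m.cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    time_saved = m.t0_min - (m.t_agent_min + m.t_interaction_min)
    return m.quality * time_saved / m.cost_usd


# Invented numbers: a long coding task versus a short shopping task.
coding = TaskMeasurement(quality=0.9, t0_min=120, t_agent_min=10, t_interaction_min=20, cost_usd=2.0)
shopping = TaskMeasurement(quality=0.8, t0_min=5, t_agent_min=3, t_interaction_min=4, cost_usd=0.5)
print(round(agentic_roi(coding), 1))    # → 40.5
print(round(agentic_roi(shopping), 1))  # → -3.2
```

With these made-up numbers the long coding task pays back, while the short shopping task goes negative because prompting and verification take longer than doing the task by hand.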

Summary TLDR

This position paper argues that the real bottleneck for widespread LLM agent adoption is low Agentic ROI—the user-facing ratio of information gain and time savings to cost. The authors define Agentic ROI, demonstrate its use with a 34-person survey across five domains, and show high ROI in coding/research but low ROI in mass-market tasks like office work and e-commerce. They propose a zigzag roadmap: first "scale up" agents (sleep-time compute, multi-step reasoning, proactive interaction) to increase information gain and time savings, then "scale down" (memory retrieval, distillation, quantization, hardware-software co-optimization) to cut per-task cost. The paper is a strategic call to re-e
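The zigzag roadmap becomes clearer with toy arithmetic. The sketch below assumes ROI = gain / cost and uses invented multipliers, not figures from the paper: scaling up raises gain faster than cost, then scaling down cuts cost while keeping most of the gain.

```python
def roi(gain: float, cost: float) -> float:
    """Toy ROI: benefit delivered per unit of cost."""
    return gain / cost


# Hypothetical starting point for a mass-market task: little benefit, moderate cost.
baseline = {"gain": 1.0, "cost": 1.0}

# Phase 1, "scale up": deeper reasoning, sleep-time compute, and proactivity
# triple the benefit but double the per-task cost (assumed multipliers).
scaled_up = {"gain": baseline["gain"] * 3.0, "cost": baseline["cost"] * 2.0}

# Phase 2, "scale down": retrieval, distillation, and quantization cut cost
# by 75% while keeping 90% of the gain (assumed multipliers).
scaled_down = {"gain": scaled_up["gain"] * 0.9, "cost": scaled_up["cost"] * 0.25}

for name, stage in [("baseline", baseline), ("scale up", scaled_up), ("scale down", scaled_down)]:
    print(f"{name:>10}: ROI = {roi(**stage):.2f}")
```

This prints ROI values of 1.00, 1.50, and 5.40. Scaling up alone gives a modest improvement; most of the gain in this toy example comes from making the more capable agent cheap.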

Problem Statement

LLM agents can technically automate many tasks, but many real-world uses deliver too little net benefit to users once time, prompting effort, verification, and cost are accounted for. The paper introduces Agentic ROI to measure whether deploying an agent actually improves users' utility compared to human or UI alternatives.

Main Contribution

Introduce Agentic ROI: a simple, actionable metric combining information gain, time savings, and monetary cost to evaluate agent usability.

Present a small empirical demonstration (n=34 survey) showing Agentic ROI correlates strongly with reported usability (r=0.95).

Key Findings

Reported agent usability across domains aligns tightly with computed Agentic ROI.

Numbers: r = 0.95 correlation (survey analysis, Fig. 1b)

Practical Use: Measure Agentic ROI before deploying agents. In the survey, higher Agentic ROI went with higher reported usability, so ROI is a useful early signal of user acceptance.

Evidence Ref: Section 3.1, Figure 1b
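The Results table describes the r = 0.95 figure as a linear correlation, which suggests a Pearson coefficient. Here is a sketch of that computation. The five paired scores are placeholders invented for illustration, not the survey's data, and `pearson_r` is a hypothetical helper name.

```python
import math
from statistics import mean


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Placeholder per-domain scores: coding, research, office, e-commerce, personal.
roi_scores = [8.0, 6.5, 2.0, 1.5, 1.0]
usability = [4.5, 4.2, 2.8, 2.5, 2.6]
print(f"r = {pearson_r(roi_scores, usability):.2f}")
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient and can serve as a cross-check. With only five points, a high r is fragile, which matches the page's own caution about the small sample.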

High Agentic ROI appears in coding and scientific research; low ROI in office work, e-commerce, and personal assistance.

Numbers: Survey of 34 participants across five domains (coding, scientific research, office work, e-commerce, personal assistance)

Practical Use: Prioritize agent deployment for multi-step tasks with high T0 (the time a human needs without the agent), such as coding and research, before mass-market, UI-driven tasks.

Evidence Ref: Section 3.2 and Figure 1
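A prioritization pass over candidate workflows could look like this. It assumes the same simplified ROI (quality-weighted minutes saved per dollar), a hypothetical cut-off of 10, and invented workflow numbers. `Workflow`, `roi`, and `prioritize` are illustrative names.

```python
from typing import NamedTuple


class Workflow(NamedTuple):
    name: str
    quality: float           # expected output quality in [0, 1]
    t0_min: float            # human baseline time without the agent (T0)
    t_with_agent_min: float  # agent run time plus user prompting/verification
    cost_usd: float          # expected agent cost per task


def roi(w: Workflow) -> float:
    """Simplified ROI: quality-weighted minutes saved per dollar (assumed form)."""
    return w.quality * (w.t0_min - w.t_with_agent_min) / w.cost_usd


def prioritize(workflows: list[Workflow], min_roi: float = 10.0) -> list[str]:
    """Return names of workflows whose ROI clears min_roi, best first."""
    ranked = sorted(workflows, key=roi, reverse=True)
    return [w.name for w in ranked if roi(w) >= min_roi]


candidates = [
    Workflow("refactor legacy module", 0.85, 240, 45, 3.0),
    Workflow("literature survey draft", 0.8, 480, 90, 5.0),
    Workflow("reformat slide deck", 0.7, 15, 10, 0.8),
    Workflow("reorder groceries", 0.9, 4, 6, 0.2),
]
print(prioritize(candidates))  # → ['literature survey draft', 'refactor legacy module']
```

The high-T0 coding and research-style workflows clear the bar. The short office and shopping tasks fall below it, and the grocery task has a negative ROI.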

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Survey sample size | 34 participants | n/a | n/a | Section 3.1 | 34 survey responses (14 AI practitioners, 20 end-users) | Section 3.1
Correlation between Agentic ROI and reported usability | r = 0.95 | n/a | n/a | Figure 1b | Strong positive linear correlation reported | Section 3.1, Figure 1b

What To Try In 7 Days

Run a small ROI audit: pick one high-T0 workflow, log T0 and T_agent, and collect user quality ratings.

Add simple proactive features (prefilled templates, intent inference) to cut interaction time and re-measure ROI.

Pilot sleep-time compute or cached retrieval for repetitive tasks to estimate cost savings.
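For the seven-day audit, a short script over a session log is enough. This sketch assumes a hypothetical CSV with one row per task and columns `variant`, `t0_min`, `t_agent_min`, `quality`, and `cost_usd`. Here `variant` marks runs before and after a change such as prefilled templates. The column names, the ROI form, and the log values are all invented for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical log: one row per task; t_agent_min includes prompting and verification time.
LOG = """variant,t0_min,t_agent_min,quality,cost_usd
baseline,60,35,0.8,1.20
baseline,45,30,0.7,1.00
baseline,50,28,0.9,1.10
templates,60,20,0.8,1.25
templates,45,18,0.75,1.05
templates,50,15,0.85,1.15
"""


def task_roi(row: dict[str, str]) -> float:
    """Quality-weighted minutes saved per dollar for one logged task (assumed form)."""
    saved = float(row["t0_min"]) - float(row["t_agent_min"])
    return float(row["quality"]) * saved / float(row["cost_usd"])


def audit(log_text: str) -> dict[str, float]:
    """Mean per-task ROI for each variant in the log."""
    by_variant: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(log_text)):
        by_variant[row["variant"]].append(task_roi(row))
    return {variant: mean(rois) for variant, rois in by_variant.items()}


for variant, value in audit(LOG).items():
    print(f"{variant}: mean ROI {value:.1f}")
```

Averaging per-task ROI treats every task equally. Dividing total quality-weighted time saved by total cost would weight long tasks more heavily. Pick one and keep it fixed across re-measurements.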

Agent Features

Memory
sleep-time compute (offline refinement), long-term memory / retrieval, state persistence
Planning
long-horizon reasoning, iterative simulation, task decomposition
Tool Use
API integration, tool orchestration, external verification calls
Frameworks
n8n, LangChain, AutoGen, MetaGPT
Is Agentic: Yes

Architectures
multi-agent, generalist-to-specialist pipeline
Collaboration
agent swarms, multi-agent coordination

Optimization Features

Token Efficiency
speculative decoding, context compression
Infra Optimization
use of inference-optimized stacks (e.g., vLLM, FlashAttention), AI-specific hardware co-design
Model Optimization
knowledge distillation, quantization, pruning, speculative decoding
System Optimization
memory retrieval instead of regeneration, state persistence to avoid recomputation
Training Optimization
specialization for sub-tasks, distillation from generalist to expert models
Inference Optimization
sleep-time compute (precomputation), retrieval-based reasoning, hardware-software co-optimization
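"Memory retrieval instead of regeneration" can be piloted as a plain cache in front of the model call. This sketch assumes a hypothetical `call_model` function and a flat per-call price. Exact-match caching on a normalized prompt stands in for the semantic retrieval a production system would more likely use. `CachedAgentStep` and `fake_model` are illustrative names.

```python
import hashlib
from typing import Callable


class CachedAgentStep:
    """Serve repeated prompts from memory and track the spend avoided."""

    def __init__(self, call_model: Callable[[str], str], cost_per_call_usd: float):
        self.call_model = call_model  # hypothetical LLM call
        self.cost_per_call_usd = cost_per_call_usd
        self.cache: dict[str, str] = {}
        self.spent_usd = 0.0
        self.saved_usd = 0.0

    @staticmethod
    def _key(prompt: str) -> str:
        # Collapse whitespace and case so trivially different prompts share a key.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def run(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self.cache:
            self.saved_usd += self.cost_per_call_usd  # retrieval instead of regeneration
            return self.cache[key]
        result = self.call_model(prompt)
        self.spent_usd += self.cost_per_call_usd
        self.cache[key] = result
        return result


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"answer to: {prompt.strip()}"


step = CachedAgentStep(fake_model, cost_per_call_usd=0.02)
for p in ["Summarize Q3 sales", "summarize  q3 sales", "Draft weekly status email"]:
    step.run(p)
print(f"spent ${step.spent_usd:.2f}, saved ${step.saved_usd:.2f}")  # → spent $0.04, saved $0.02
```

Lowercasing is aggressive, so drop it if case carries meaning. Prefilling the cache during idle time with anticipated prompts is one rough way to approximate sleep-time precomputation in such a pilot.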

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Unknown
License: Unknown

Risks & Boundaries

Limitations

Small empirical sample (34 survey responses) limits generalizability.

Cost estimates per task are coarse and normalized heuristically.

When Not To Use

Short, single-step interactions where a conventional UI is faster (low-T0 tasks).

Deterministic, repetitive processes best served by RPA or rule systems.
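The two cases above can be folded into a rough routing rule. This sketch assumes hypothetical fields and a crude cut-off: deterministic processes go to rules or RPA, and single-step tasks, or tasks where expected agent time (including prompting and verification) is not below T0, stay on the existing UI. It is one way to encode the list, not a rule from the paper.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    t0_min: float              # human time via the existing UI or manual process
    expected_agent_min: float  # agent run time plus prompting and verification
    steps: int                 # number of distinct steps in the task
    deterministic: bool        # same input always calls for the same fixed actions


def recommend(task: TaskProfile) -> str:
    """Rough routing rule mirroring the 'When Not To Use' list (thresholds are assumptions)."""
    if task.deterministic:
        return "rules/RPA"        # fixed, repetitive process: scripted automation is cheaper and more predictable
    if task.steps <= 1 or task.expected_agent_min >= task.t0_min:
        return "existing UI"      # short or single-step: the agent cannot save time
    return "agent candidate"      # multi-step, high-T0: worth measuring Agentic ROI


print(recommend(TaskProfile(t0_min=2, expected_agent_min=3, steps=1, deterministic=False)))    # → existing UI
print(recommend(TaskProfile(t0_min=30, expected_agent_min=5, steps=20, deterministic=True)))   # → rules/RPA
print(recommend(TaskProfile(t0_min=90, expected_agent_min=25, steps=8, deterministic=False)))  # → agent candidate
```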

Failure Modes

Prompting and verification overhead can erase time savings, yielding negative ROI.

Agent hallucination or drift during long multi-step tasks causes extra verification.
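A simple compounding model shows why long tasks are fragile. It assumes each step succeeds independently with probability `p_step`, and that any failure costs the user a fixed amount of verification and rework time. Independence and the fixed rework cost are simplifying assumptions, not measurements from the paper. The function name and all parameter values are hypothetical.

```python
def expected_net_minutes_saved(
    t0_min: float,
    t_run_min: float,
    steps: int,
    p_step: float,
    rework_min: float,
) -> float:
    """Expected minutes saved versus doing the task by hand.

    Assumes steps fail independently; the whole task succeeds only if every
    step does, and any failure adds rework_min of user verification and repair.
    """
    p_task = p_step ** steps
    expected_time = t_run_min + (1 - p_task) * rework_min
    return t0_min - expected_time


# The same hour-long task attempted by agent pipelines of increasing length.
for steps in (5, 20, 50):
    saved = expected_net_minutes_saved(t0_min=60, t_run_min=10, steps=steps, p_step=0.98, rework_min=80)
    print(f"{steps:>2} steps: {saved:+.1f} min")
```

With these assumed numbers, the expected saving shrinks from about 42 minutes at 5 steps to slightly below zero at 50 steps. A 98% per-step success rate is not enough once the chain gets long.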

Core Entities

Models

GPT-5, Gemini-3, Qwen-3, DeepSeek-V3.2

Metrics

Agentic ROI, Information Gain, Time Savings, Cost, Usability (user ratings)

Benchmarks

GAIA, AndroidWorld, τ2-Bench, AI Index

Context Entities

Models

Gemini 3 Pro, ChatGPT Pulse

Metrics

r (correlation coefficient)

Benchmarks

AndroidWorld, GAIA