Agentic ROI: prioritize real user value, not raw model scores

Overview

Decision Snapshot: Needs Validation

Paper introduces a practical evaluation lens (Agentic ROI) and a roadmap. Evidence mixes a small survey (n=34) and conceptual arguments, so ideas are actionable but need broader empirical validation.

Citations: 0

Evidence Strength: 0.50

Confidence: 0.80

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 0/3

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 70%

Production readiness: 40%

Novelty: 40%

Authors

Weiwen Liu, Jiarui Qin, Xu Huang, Xingshan Zeng, Yunjia Xi, Jianghao Lin, Chuhan Wu, Yasheng Wang, Lifeng Shang, Ruiming Tang, Defu Lian, Yong Yu, Weinan Zhang

Links

Abstract / PDF

Why It Matters For Business

Measure agent value as Agentic ROI (quality + time saved per dollar) to decide where to deploy agents profitably and avoid wasting resources on low-ROI, high-cost integrations.

Who Should Care

Product Manager, CTO, ML Engineer, Founder, Engineering Lead

Summary TLDR

This position paper argues that the real bottleneck for widespread LLM agent adoption is low Agentic ROI—the user-facing ratio of information gain and time savings to cost. The authors define Agentic ROI, demonstrate its use with a 34-person survey across five domains, and show high ROI in coding/research but low ROI in mass-market tasks like office work and e-commerce. They propose a zigzag roadmap: first "scale up" agents (sleep-time compute, multi-step reasoning, proactive interaction) to increase information gain and time savings, then "scale down" (memory retrieval, distillation, quantization, hardware-software co-optimization) to cut per-task cost. The paper is a strategic call to re-e

Problem Statement

LLM agents can technically automate many tasks, but many real-world uses deliver too little net benefit to users once time, prompting effort, verification, and cost are accounted for. The paper introduces Agentic ROI to measure whether deploying an agent actually improves users' utility compared to human or UI alternatives.

Main Contribution

Introduce Agentic ROI: a simple, actionable metric combining information gain, time savings, and monetary cost to evaluate agent usability.

Present a small empirical demonstration (n=34 survey) showing Agentic ROI correlates strongly with reported usability (r=0.95).
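To make the idea of combining quality, time, and cost concrete, here is a minimal sketch. This summary does not reproduce the paper's exact formula, so the function name `agentic_roi`, its parameters, and the multiplicative form (quality gain times minutes saved, per dollar) are illustrative assumptions rather than the authors' definition.

```python
def agentic_roi(quality_gain: float, human_minutes: float,
                agent_minutes: float, cost_usd: float) -> float:
    """Illustrative ROI: (quality gain x minutes saved) per dollar.

    Assumed form for illustration only; not the paper's exact formula.
    """
    if cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    minutes_saved = human_minutes - agent_minutes
    return quality_gain * minutes_saved / cost_usd


# Hypothetical numbers: a 60-minute task done in 15 minutes for $0.50,
# with a quality gain of 0.8 on an arbitrary 0-1 scale.
long_task = agentic_roi(quality_gain=0.8, human_minutes=60,
                        agent_minutes=15, cost_usd=0.5)

# Hypothetical short task where the agent is slower than doing it by hand:
# minutes saved is negative, so the ROI is negative.
short_task = agentic_roi(quality_gain=0.8, human_minutes=5,
                         agent_minutes=8, cost_usd=0.5)
```

Under this assumed form, a task that takes longer with the agent than without it yields a negative value regardless of output quality, which matches the summary's point that overhead can erase the benefit.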

Key Findings

Reported agent usability across domains aligns tightly with computed Agentic ROI.

Numbers: r = 0.95 correlation (survey analysis, Fig. 1b)

Practical Use: Measure Agentic ROI before deploying agents; a strong ROI predicts real user acceptance.

Evidence Ref: Section 3.1, Figure 1b

High Agentic ROI appears in coding and scientific research; low ROI in office work, e-commerce, and personal assistance.

Numbers: Survey of 34 participants across five domains (coding, research, office, e-commerce, personal)

Practical Use: Prioritize agent deployment for high-T0, multi-step tasks (e.g., coding, research) before mass-market UIs.

Evidence Ref: Section 3.2 and Figure 1

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Survey sample size	34 participants	—	—	Section 3.1	34 survey responses (14 AI practitioners, 20 end-users)	Section 3.1
Correlation between Agentic ROI and reported usability	r = 0.95	—	—	Figure 1b	Strong positive linear correlation reported	Section 3.1, Figure 1b
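For teams that want to run the same kind of check on their own data, the sketch below computes a Pearson correlation between per-domain ROI scores and usability ratings. It assumes the reported r is a Pearson coefficient (the summary only says "linear correlation"), and the two input lists are placeholder values, not the paper's survey data.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Placeholder per-domain values (five domains), NOT the survey results.
roi_by_domain = [4.0, 3.5, 1.2, 0.8, 1.0]
usability_by_domain = [4.5, 4.2, 2.5, 2.0, 2.4]
r = pearson_r(roi_by_domain, usability_by_domain)
```

With only five domain-level points, a high r is easy to obtain, which is one reason the small sample calls for broader validation.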

What To Try In 7 Days

Run a small ROI audit: pick one high-T0 workflow, log T0 and T_agent, and collect user quality ratings.

Add simple proactive features (prefilled templates, intent inference) to cut interaction time and re-measure ROI.

Pilot sleep-time compute or cached retrieval for repetitive tasks to estimate cost savings.
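The audit in the first step can be sketched as a small log-and-summarize routine. It reads T0 as the time a task takes without the agent and T_agent as the time with it, including prompting and verification; that reading is an interpretation, since this summary does not define the symbols. The `TaskLog` fields, the 1-5 rating scale, and `summarize_audit` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class TaskLog:
    task_id: str
    t0_minutes: float        # baseline time without the agent (assumed meaning of T0)
    t_agent_minutes: float   # time with the agent, incl. prompting and checking
    quality_rating: int      # user rating on an assumed 1-5 scale
    cost_usd: float          # spend attributed to this task


def summarize_audit(logs: List[TaskLog]) -> Dict[str, Optional[float]]:
    """Aggregate logged tasks into a few audit numbers."""
    if not logs:
        raise ValueError("no tasks logged")
    saved = [log.t0_minutes - log.t_agent_minutes for log in logs]
    total_cost = sum(log.cost_usd for log in logs)
    return {
        "tasks": len(logs),
        "mean_minutes_saved": mean(saved),
        # Fraction of tasks where using the agent took longer than the baseline.
        "share_slower_with_agent": sum(s < 0 for s in saved) / len(saved),
        "mean_quality_rating": mean(log.quality_rating for log in logs),
        "minutes_saved_per_usd": sum(saved) / total_cost if total_cost > 0 else None,
    }


# Hypothetical log entries for one workflow.
audit = summarize_audit([
    TaskLog("draft-report", t0_minutes=40, t_agent_minutes=10, quality_rating=4, cost_usd=0.30),
    TaskLog("rename-file", t0_minutes=5, t_agent_minutes=8, quality_rating=3, cost_usd=0.10),
    TaskLog("lit-review", t0_minutes=60, t_agent_minutes=20, quality_rating=5, cost_usd=0.60),
])
```

Re-running the same summary after adding proactive features or caching gives a before/after comparison on identical metrics.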

Agent Features

Memory

sleep-time compute (offline refinement), long-term memory / retrieval, state persistence

Planning

long-horizon reasoning, iterative simulation, task decomposition

Tool Use

API integration, tool orchestration, external verification calls

Frameworks

n8n, LangChain, AutoGen, MetaGPT

Is Agentic

Yes

Architectures

multi-agent, generalist-to-specialist pipeline

Collaboration

agent swarms, multi-agent coordination

Optimization Features

Token Efficiency

speculative decoding, context compression

Infra Optimization

use of inference-optimized stacks (e.g., vLLM, FlashAttention); AI-specific hardware co-design

Model Optimization

knowledge distillation, quantization, pruning, speculative decoding

System Optimization

memory retrieval instead of regeneration, state persistence to avoid recomputation
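A rough sketch of "retrieval instead of regeneration": wrap a generation function in an exact-match cache and count how many calls it avoids. The `CachedGenerator` class, the stand-in `fake_model`, and the flat per-call cost are all assumptions for illustration; the paper's memory mechanisms are not described in enough detail here to reproduce, and a production system would more likely use semantic retrieval than string matching.

```python
from typing import Callable, Dict


class CachedGenerator:
    """Wraps a generation function with a naive exact-match response cache."""

    def __init__(self, generate: Callable[[str], str], cost_per_call_usd: float):
        self._generate = generate
        self._cost_per_call_usd = cost_per_call_usd  # assumed flat cost per call
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, prompt: str) -> str:
        # Naive normalization: lowercase and collapse whitespace.
        key = " ".join(prompt.lower().split())
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._generate(prompt)
        self._store[key] = result
        return result

    def estimated_savings_usd(self) -> float:
        """Cost avoided so far, under the flat per-call cost assumption."""
        return self.hits * self._cost_per_call_usd


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical)."""
    return f"answer to: {prompt}"


cached = CachedGenerator(fake_model, cost_per_call_usd=0.02)
for question in ["Reset my password", "reset  my password", "Reset my password"]:
    cached(question)
```

The hit and miss counters give a first estimate of how repetitive a workload is before investing in heavier memory infrastructure.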

Training Optimization

specialization for sub-tasks, distillation from generalist to expert models

Inference Optimization

sleep-time compute precomputation, retrieval-based reasoning, hardware-software co-optimization

Reproducibility

Code Available: No

Data Available: No

Open Source Status: Unknown

License: Unknown

Risks & Boundaries

Limitations

Small empirical sample (34 survey responses) limits generalizability.

Cost estimates per task are coarse and normalized heuristically.

When Not To Use

Short, single-step interactions where UI is faster (low T0 tasks).

Deterministic, repetitive processes best served by RPA or rule systems.

Failure Modes

Prompting and verification overhead can erase time savings, yielding negative ROI.

Agent hallucination or drift during long multi-step tasks causes extra verification.

Core Entities

Models

GPT-5, Gemini-3, Qwen-3, DeepSeek-V3.2

Metrics

Agentic ROI, Information Gain, Time Savings, Cost, Usability (user ratings)

Benchmarks

GAIA, AndroidWorld, τ2-Bench, AI Index

Context Entities

Models

Gemini 3 Pro, ChatGPT Pulse

Metrics

r (correlation coefficient)

Benchmarks

AndroidWorld, GAIA
