Overview
Scores reflect solid controlled experiments on small LLMs, with ABM support. Effects are consistent for long trajectories, but human-subject validation and large-model scaling are missing.
Citations: 0
Evidence Strength: 0.70
Confidence: 0.85
Risk Signals: 11
Trust Signals
Findings with numeric evidence: 3/3
Findings with evidence refs: 3/3
Results with explicit delta: 5/5
Reproducibility
Status: No open assets linked
Open source: Unknown
At A Glance
Cost impact: 55%
Production readiness: 60%
Novelty: 65%
Why It Matters For Business
APEMO raises perceived reliability and reuse in multi-step AI workflows without retraining, giving product teams a runtime lever to improve user trust under fixed compute budgets.
Who Should Care
Summary TLDR
The paper introduces APEMO, a runtime orchestration layer that detects negative "peaks" and weak endings in multi-step agent workflows and reallocates a fixed compute budget toward repairs. APEMO does not change models or training. On small LLM families and multi-agent flows, it raises trajectory-level quality (e.g., +0.0791 mean quality vs a peak-end baseline) and reuse probability, at the cost of modestly higher coordination overhead (about +6% in long-horizon blocks). Benefits grow with trajectory depth and shrink when a strong temporal baseline is already in place.
Problem Statement
Human judgments of multi-step interactions weight intense moments and endings more than average step accuracy. Existing alignment and orchestration methods usually optimize step-level or structural properties and ignore this temporal asymmetry. The problem: how to improve perceived reliability and reuse of long-horizon agentic systems under a fixed compute budget by controlling when compute is applied over time.
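To make the temporal asymmetry concrete, here is a minimal sketch comparing a plain average of step quality with a peak-end style summary. The function `peak_end_score`, its equal weights, and both example trajectories are illustrative assumptions, not definitions or data from the paper.

```python
from statistics import mean

def peak_end_score(step_quality, w_peak=0.5, w_end=0.5):
    """Blend the worst step (negative peak) with the final step.

    The weights are illustrative assumptions, not values from the paper.
    """
    if not step_quality:
        raise ValueError("trajectory must contain at least one step")
    negative_peak = min(step_quality)  # most intense bad moment
    ending = step_quality[-1]          # how the interaction finished
    return w_peak * negative_peak + w_end * ending

# Two hypothetical 8-step trajectories with the same average step quality.
steady = [0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]
bumpy = [0.9, 0.9, 0.2, 0.9, 0.9, 0.9, 0.5, 0.4]

print("mean:    ", round(mean(steady), 3), round(mean(bumpy), 3))
print("peak-end:", round(peak_end_score(steady), 3), round(peak_end_score(bumpy), 3))
```

Both trajectories average the same step quality, yet the one with a sharp dip and a weak finish scores much lower on the peak-end summary. That is the gap a step-level metric would miss.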
Main Contribution
APEMO: a runtime temporal-affective orchestration layer that reallocates fixed compute toward detected negative peaks and endings.
A constrained multi-objective formulation balancing peak-end weighted quality, reuse robustness, frustration proxies, and coordination cost.
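The reallocation idea can be sketched as a weighting scheme that keeps the total budget constant. This is a simplified offline illustration, not the paper's algorithm: `reallocate_budget`, `peak_threshold`, and `boost` are hypothetical names and values, and a real runtime layer would have to decide online, before later steps are observed.

```python
def reallocate_budget(step_quality, total_budget, peak_threshold=0.4, boost=2.0):
    """Split a fixed budget across steps, weighting flagged steps more heavily.

    A step is flagged if its quality estimate falls below `peak_threshold`
    (a negative peak) or if it is the final step (the ending).
    Threshold and boost are illustrative assumptions.
    """
    n = len(step_quality)
    if n == 0:
        raise ValueError("trajectory must contain at least one step")
    weights = [
        boost if (q < peak_threshold or i == n - 1) else 1.0
        for i, q in enumerate(step_quality)
    ]
    scale = total_budget / sum(weights)  # normalise so the total is unchanged
    return [w * scale for w in weights]

# Hypothetical per-step quality estimates for an 8-step trajectory.
quality = [0.9, 0.8, 0.2, 0.9, 0.85, 0.9, 0.8, 0.6]
allocation = reallocate_budget(quality, total_budget=8000)
print(allocation)
```

The flagged dip at step index 2 and the final step each receive twice the share of an ordinary step, while the allocations still sum to the original budget.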
Key Findings
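APEMO improves mean trajectory quality by +0.0791 (95% CI [0.0525, 0.1055]) over the task_peak_end baseline in long-horizon runs (T=8, n=20).
APEMO increases reuse probability by +0.0609 (95% CI [0.0383, 0.0826]) over the same baseline in long-horizon runs (T=8, n=20).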
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Mean trajectory quality (APEMO - task_peak_end) | +0.0791 | task_peak_end | +0.0791 (95% CI [0.0525,0.1055]) | Long-horizon T=8, n=20 | Table 2 long-horizon block | Section 5.1 |
| Reuse probability (APEMO - task_peak_end) | +0.0609 | task_peak_end | +0.0609 (95% CI [0.0383,0.0826]) | Long-horizon T=8, n=20 | Table 2 long-horizon block | Section 5.1 |
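The table reports each delta with a 95% confidence interval over n=20 runs. The source does not say how those intervals were computed. As one possible approach, a percentile bootstrap over paired per-run differences could look like the sketch below. The function `bootstrap_ci` and the `placeholder_diffs` values are made up for illustration and are not data from the paper.

```python
import random
from statistics import mean

def bootstrap_ci(diffs, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired differences."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(diffs)
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lo, hi

# Made-up placeholder differences (treatment minus baseline), one per run.
placeholder_diffs = [0.3, -0.1, 0.2, 0.4, 0.0, 0.1, 0.5, -0.2, 0.3, 0.2,
                     0.1, 0.4, 0.0, 0.2, 0.3, -0.1, 0.6, 0.1, 0.2, 0.3]

point, lo, hi = bootstrap_ci(placeholder_diffs)
print(f"delta = {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes zero, as both rows in the table do, indicates that the sign of the improvement is stable across resampled runs.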
What To Try In 7 Days
Instrument simple frustration proxies (repetition, drift) across a sample 8-turn workflow.
Implement a runtime monitor that flags negative peaks and reassigns precision to flagged turns.
Run A/B tests on endpoint quality and reuse probability under fixed compute caps.
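The first two steps above can be sketched as follows. This is a minimal starting point under stated assumptions: repetition is approximated by token overlap with the previous turn, drift by lack of token overlap with the task description, and the thresholds are arbitrary. None of these definitions come from the paper, and all function names are hypothetical.

```python
def jaccard(a, b):
    """Token-set overlap between two strings, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def frustration_proxies(task, turns):
    """Per-turn repetition (overlap with previous turn) and drift (1 - overlap with task)."""
    rows = []
    for i, turn in enumerate(turns):
        repetition = jaccard(turn, turns[i - 1]) if i > 0 else 0.0
        drift = 1.0 - jaccard(turn, task)
        rows.append({"turn": i, "repetition": repetition, "drift": drift})
    return rows

def flag_negative_peaks(rows, rep_threshold=0.8, drift_threshold=0.9):
    """Return indices of turns whose proxies cross either (assumed) threshold."""
    return [r["turn"] for r in rows
            if r["repetition"] >= rep_threshold or r["drift"] >= drift_threshold]

task = "summarize the quarterly sales report"
turns = [
    "loading the quarterly sales report",
    "the report covers quarterly sales by region",
    "the report covers quarterly sales by region",  # exact repeat
    "here is a recipe for banana bread",            # off-task
]
flags = flag_negative_peaks(frustration_proxies(task, turns))
print(flags)
```

Flagged turn indices can then feed whatever mechanism reassigns compute or precision. Token overlap is a crude signal, so embedding-based similarity may be a better fit once the pipeline is instrumented.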
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic: Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
Infra Optimization
System Optimization
Training Optimization
Inference Optimization
Risks & Boundaries
Limitations
No human-subject studies to confirm subjective trust or reuse intent.
Comparisons limited to plan-execute style orchestrators, not full industrial stacks.
When Not To Use
Shallow workflows (T≈2) where coordination overhead does not amortize.
Systems that already apply strong temporal scheduling.
Failure Modes
Mis-detection of peaks causes wasted compute and reduced step-level accuracy.
Coordination overhead can erase gains in short trajectories.

