APEMO: reallocate compute to negative peaks and endings to stabilize long-horizon agent workflows

Overview

Decision Snapshot: Needs Validation

Scores reflect solid controlled experiments on small LLMs, supported by agent-based modeling (ABM) stress tests. Effects are consistent for long trajectories, but human-subject validation and large-model scaling are missing.

Citations: 0

Evidence Strength: 0.70

Confidence: 0.85

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 5/5

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 55%

Production readiness: 60%

Novelty: 65%

Authors

Hanjing Shi, Dominic DiFranzo

Links

Abstract / PDF

Why It Matters For Business

APEMO raises perceived reliability and reuse in multi-step AI workflows without retraining, giving product teams a runtime lever to improve user trust under fixed compute budgets.

Who Should Care

Product Manager, Engineering Lead, ML Engineer, Founder

Summary TLDR

The paper introduces APEMO, a runtime orchestration layer that detects negative "peaks" and weak endings in multi-step agent workflows and reallocates a fixed compute budget toward repairs. APEMO does not change models or training. On small LLM families and multi-agent flows, it raises trajectory-level quality (e.g., +0.0791 mean quality vs a peak-end baseline) and reuse probability while modestly increasing coordination cost (~+6% in long-horizon blocks). Benefits grow with trajectory depth and are weaker when strong temporal baselines already exist.

Problem Statement

Human judgments of multi-step interactions weight intense moments and endings more than average step accuracy. Existing alignment and orchestration methods usually optimize step-level or structural properties and ignore this temporal asymmetry. The problem: how to improve perceived reliability and reuse of long-horizon agentic systems under a fixed compute budget by controlling when compute is applied over time.
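To make the temporal asymmetry concrete, the sketch below compares a plain step average with a simple peak-end style score on two made-up 8-turn trajectories. The function names, the equal weights, and the step scores are illustrative assumptions, not the paper's metric or data.

```python
from statistics import mean


def mean_quality(step_scores):
    """Plain average over steps: every turn counts equally."""
    return mean(step_scores)


def peak_end_quality(step_scores, w_peak=0.5, w_end=0.5):
    """Hypothetical peak-end score: blend the worst step with the final step.

    The weights are illustrative assumptions, not values from the paper.
    """
    worst = min(step_scores)  # the negative "peak"
    last = step_scores[-1]    # the ending
    return w_peak * worst + w_end * last


# Two made-up 8-turn trajectories with the same average step quality.
steady = [0.7] * 8
spiky = [0.9, 0.9, 0.9, 0.1, 0.9, 0.9, 0.9, 0.1]

for name, traj in (("steady", steady), ("spiky", spiky)):
    print(name, round(mean_quality(traj), 3), round(peak_end_quality(traj), 3))
```

Both trajectories average about 0.7 per step, yet the one with a bad mid-run turn and a bad ending scores far lower once the worst moment and the ending dominate. This is the gap that step-level optimization does not see.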

Main Contribution

APEMO: a runtime temporal-affective orchestration layer that reallocates fixed compute toward detected negative peaks and endings.

A constrained multi-objective formulation balancing peak-end weighted quality, reuse robustness, frustration proxies, and coordination cost.
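As a rough illustration of reallocating a fixed budget, the sketch below splits a token budget across turns so that turns with a high risk score and the final turn receive more, while the total stays constant. The `risk` scores, the `floor` and `end_boost` parameters, and the proportional rule are assumptions chosen for the example. The source does not describe APEMO's actual allocation rule.

```python
def reallocate_budget(risk, total_budget, floor=0.5, end_boost=1.0):
    """Split a fixed budget across turns, favoring risky turns and the ending.

    risk: per-turn risk scores in [0, 1] (hypothetical negative-peak signal).
    floor: fraction of the uniform share that every turn is guaranteed.
    end_boost: extra weight added to the final turn.
    """
    n = len(risk)
    if n == 0:
        return []
    uniform = total_budget / n
    base = floor * uniform            # guaranteed per-turn share
    spare = total_budget - base * n   # remainder to redistribute
    weights = list(risk)
    weights[-1] += end_boost          # endings get extra weight
    total_w = sum(weights)
    if total_w == 0:
        return [uniform] * n          # no signal: fall back to a uniform split
    return [base + spare * w / total_w for w in weights]


# Made-up risk profile for an 8-turn workflow with one bad turn (index 3).
risk = [0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1, 0.2]
alloc = reallocate_budget(risk, total_budget=8000)
print([round(a) for a in alloc], round(sum(alloc)))
```

The floor keeps every turn funded, so a missed detection degrades one turn rather than starving it. This is a design choice of the sketch, not a claim about the paper.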

Key Findings

APEMO improves mean trajectory quality vs a peak-end baseline.

Numbers: +0.0791 mean quality (95% CI [0.0525,0.1055])

Practical Use: Under the same compute cap, move inference precision toward detected negative peaks and endings to raise overall perceived quality.

Evidence Ref: Table 2; Section 5.1

APEMO increases reuse probability in long-horizon runs.

Numbers: +0.0609 reuse probability (95% CI [0.0383,0.0826])

Practical Use: Improved endpoints and fewer negative peaks lead users to reuse or trust outputs more; test reuse as a primary metric.

Evidence Ref: Table 2; Section 5.1

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Mean trajectory quality (APEMO - task_peak_end)	+0.0791	task_peak_end	+0.0791 (95% CI [0.0525,0.1055])	Long-horizon T=8, n=20	Table 2 long-horizon block	Section 5.1
Reuse probability (APEMO - task_peak_end)	+0.0609	task_peak_end	+0.0609 (95% CI [0.0383,0.0826])	Long-horizon T=8, n=20	Table 2 long-horizon block	Section 5.1
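For teams that want to report a delta with a 95% interval in the same shape as the rows above, one common option is a percentile bootstrap over per-run differences. The sketch below uses synthetic placeholder deltas (n=20). Whether the paper computed its intervals this way is not stated here, so treat the method and all numbers in the code as assumptions.

```python
import random


def bootstrap_ci(deltas, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-run deltas."""
    rng = random.Random(seed)
    n = len(deltas)
    # Resample runs with replacement and record each resample's mean.
    means = sorted(
        sum(rng.choices(deltas, k=n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(deltas) / n, lo, hi


# Synthetic placeholder data: per-run (treatment - baseline) quality deltas.
data_rng = random.Random(42)
deltas = [data_rng.gauss(0.08, 0.06) for _ in range(20)]

point, lo, hi = bootstrap_ci(deltas)
print(f"delta={point:+.4f} (95% CI [{lo:.4f}, {hi:.4f}])")
```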

What To Try In 7 Days

Instrument simple frustration proxies (repetition, drift) across a sample 8-turn workflow.

Implement a runtime monitor that flags negative peaks and reassigns precision to flagged turns.

Run A/B tests on endpoint quality and reuse probability under fixed compute caps.
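A first pass at the instrumentation steps above could look like the following sketch. It computes two crude proxies per turn, repetition (token overlap with the previous output) and drift (share of goal tokens missing from the output), and flags turns that cross a threshold. The proxy definitions, thresholds, and sample texts are assumptions for illustration; the source names repetition and drift but does not define how they are measured.

```python
def _tokens(text):
    """Lowercased whitespace tokens as a set (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def repetition(current, previous):
    """Jaccard overlap between consecutive outputs; 1.0 means identical token sets."""
    a, b = _tokens(current), _tokens(previous)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drift(output, goal):
    """Share of goal tokens that do not appear in the output."""
    g = _tokens(goal)
    if not g:
        return 0.0
    return 1.0 - len(g & _tokens(output)) / len(g)


def frustration_proxies(goal, outputs):
    """Return one (repetition, drift) pair per turn."""
    rows = []
    for i, out in enumerate(outputs):
        rep = repetition(out, outputs[i - 1]) if i > 0 else 0.0
        rows.append((rep, drift(out, goal)))
    return rows


def flag_negative_peaks(rows, rep_threshold=0.8, drift_threshold=0.9):
    """Indices of turns whose proxies cross either (hypothetical) threshold."""
    return [
        i for i, (rep, dr) in enumerate(rows)
        if rep >= rep_threshold or dr >= drift_threshold
    ]


goal = "summarize quarterly sales report"
outputs = [
    "draft summary of quarterly sales report",
    "draft summary of quarterly sales report",  # repeats the previous turn
    "here is a recipe for pancakes",            # drifts away from the goal
]
rows = frustration_proxies(goal, outputs)
flagged = flag_negative_peaks(rows)
print(rows, flagged)
```

The flagged indices can then feed whatever mechanism reassigns precision, for example a retry of that turn with a larger token allowance, while the A/B comparison keeps the total budget equal across arms.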

Agent Features

Memory

short-term trajectory signals (frustration proxies)

Planning

LLM planner-executor flows; temporal scheduling of reasoning precision

Tool Use

runtime orchestration overlay; precision repair operations

Frameworks

APEMO temporal-affective orchestration; Agent-Based Modeling (ABM) for stress tests

Is Agentic

Yes

Architectures

Planner-Executor-Critic topology; role-based multi-agent flows

Collaboration

applies to multi-agent coordination as an overlay

Optimization Features

Token Efficiency

fixed total compute budget; reassign tokens/precision

Infra Optimization

local Ollama-based runtime experiments

System Optimization

trade-off analysis between robustness gain and coordination cost

Training Optimization

none (no fine-tuning or reward modification)

Inference Optimization

reallocate inference precision across turns; precision repair triggered by peak detection

Reproducibility

Code Available: No

Data Available: No

Open Source Status: Unknown

License: Unknown

Risks & Boundaries

Limitations

No human-subject studies to confirm subjective trust or reuse intent.

Comparisons limited to plan-execute style orchestrators, not full industrial stacks.

When Not To Use

Shallow workflows (T≈2) where coordination overhead does not amortize.

Systems that already apply strong temporal scheduling.

Failure Modes

Mis-detection of peaks causes wasted compute and reduced step-level accuracy.

Coordination overhead can erase gains in short trajectories.

Core Entities

Models

llama3.2:1b, qwen2.5:1.5b, gemma2:2b

Metrics

peak-end weighted quality, reuse probability, average frustration, coordination cost

