ALAS: modular LLM agents with persistent memory that locally repair plans under runtime disruptions

Overview

Decision Snapshot: Needs Validation

ALAS is a strong prototype for reactive planning: it shows reproducible benchmark gains and clear mechanisms (memory + local compensation), but it needs deployment tests, robust code generation safeguards, and dynamic cost tuning before industrial use.

Citations: 2

Evidence Strength: 0.70

Confidence: 0.80

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 4/4

Reproducibility

Status: No open assets linked

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 60%

Novelty: 70%

Authors

Edward Y. Chang, Longling Geng

Links

Abstract / PDF

Why It Matters For Business

ALAS turns LLMs into practical schedulers for dynamic operations by keeping state, validating plans, and repairing disruptions locally, which reduces rework, travel, and missed deadlines in logistics and operations.

Who Should Care

CTO, Product Manager, ML Engineer, Engineering Lead

Summary TLDR

ALAS (Adaptive LLM Agent System) turns a monolithic LLM planner into a network of role-specialized agents with a shared persistent execution memory and a Local Reactive Compensation Protocol (LRCP). The system (1) validates templates, (2) instantiates agents via generated code and LLM prompts, and (3) reacts to runtime failures with local compensation instead of full replanning. On the Urban Ride Sharing (URS) task ALAS reduces travel distance, and on job-shop scheduling (JSSP) benchmarks it narrows the gap to optimal, while recovering from runtime disruptions that broke standalone LLMs. Key limits: no physical deployment yet, reliance on reliable code generation, and static cost models.

Problem Statement

Standalone LLMs struggle with real-time, transaction-style planning because they lack self-checks, persistent state, long-context fidelity, and disruption recovery. ALAS addresses these gaps by decomposing plans into role-specific agents and adding persistent execution memory, validators, and a Local Reactive Compensation Protocol (LRCP) that prefers local fixes over costly global replanning.
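To make "local fixes over global replanning" concrete, here is a minimal sketch of one classical local repair, right-shift repair, used as a stand-in. It is not the paper's LRCP, whose details this summary does not give. The dictionary keys, the function name `right_shift_repair`, and the example schedule are all illustrative assumptions.

```python
def right_shift_repair(ops, delayed, delay):
    """Return a repaired copy of `ops` and the set of indices that moved.

    ops: list of dicts with keys job, machine, start, duration (assumed layout).
    delayed: index of the operation that starts `delay` time units late.
    """
    repaired = [dict(op) for op in ops]
    repaired[delayed]["start"] += delay
    moved = {delayed}
    # Visit operations in their original start order so job and machine
    # sequences are preserved; push an operation only when a predecessor
    # now ends after its planned start.
    order = sorted(range(len(ops)), key=lambda i: ops[i]["start"])
    job_free, machine_free = {}, {}
    for i in order:
        op = repaired[i]
        earliest = max(job_free.get(op["job"], 0),
                       machine_free.get(op["machine"], 0))
        if op["start"] < earliest:
            op["start"] = earliest
            moved.add(i)
        end = op["start"] + op["duration"]
        job_free[op["job"]] = end
        machine_free[op["machine"]] = end
    return repaired, moved


# Illustrative schedule: J1 runs on M1 then M2, J2 follows on M2, J3 is unrelated.
plan = [
    {"job": "J1", "machine": "M1", "start": 0, "duration": 3},
    {"job": "J1", "machine": "M2", "start": 3, "duration": 2},
    {"job": "J2", "machine": "M2", "start": 5, "duration": 2},
    {"job": "J3", "machine": "M3", "start": 0, "duration": 4},
]
repaired, moved = right_shift_repair(plan, delayed=0, delay=2)
print([op["start"] for op in repaired], sorted(moved))  # [2, 5, 7, 0] [0, 1, 2]
```

Only the operations downstream of the delay move; J3 keeps its slot, which is the property a local repair is meant to preserve.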

Main Contribution

Three-layer architecture (workflow blueprint, agent factory, runtime monitor) that builds validated templates and instantiates agents.

Persistent execution memory that logs state transitions, enabling rollback, targeted compensation, and causal checks.
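As a rough illustration of how a transition log can support rollback, the sketch below keeps an append-only list of state changes and undoes them in reverse order. The class names `Transition` and `ExecutionMemory`, their fields, and the example keys are hypothetical; the summary does not describe the paper's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Any

_MISSING = object()  # marks "key did not exist before this transition"


@dataclass(frozen=True)
class Transition:
    """One logged state change: which agent changed which key, and how."""
    agent: str
    key: str
    old: Any
    new: Any


@dataclass
class ExecutionMemory:
    """Current state plus an append-only log of the transitions that built it."""
    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def apply(self, agent: str, key: str, value: Any) -> int:
        """Record a transition and return its log index as a checkpoint."""
        old = self.state.get(key, _MISSING)
        self.log.append(Transition(agent, key, old, value))
        self.state[key] = value
        return len(self.log) - 1

    def rollback(self, checkpoint: int) -> None:
        """Undo every transition logged after `checkpoint`, newest first."""
        while len(self.log) - 1 > checkpoint:
            t = self.log.pop()
            if t.old is _MISSING:
                self.state.pop(t.key, None)
            else:
                self.state[t.key] = t.old


mem = ExecutionMemory()
cp = mem.apply("dispatcher", "car_1", "assigned:alice")
mem.apply("dispatcher", "car_1", "assigned:bob")
mem.apply("monitor", "car_2", "delayed")
mem.rollback(cp)
print(mem.state)  # {'car_1': 'assigned:alice'}
```

Because each entry stores the previous value, a compensator can revert just the affected keys instead of rebuilding the whole plan state.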

Key Findings

ALAS produces shorter ride-sharing routes than standalone LLM baselines on the URS task.

Numbers: Average distance 95.1 km vs 118.9 km (20% reduction, p<0.01)

Practical Use: If you use ALAS for dynamic dispatch, expect meaningful travel savings vs naive LLM outputs; adopt the template+agent pattern for routing tasks.

Evidence Ref: Section 4.1, Fig. 2

ALAS reliably handles mid-run disruptions in a family-event scenario while many LLMs fail.

Numbers: ALAS succeeded in 10/10 reactive trials; DeepSeek and Claude each failed 7/10

Practical Use: For workflows where deadlines and interdependent tasks change at run time, use persistent state + LRCP to avoid infeasible plans during replanning.

Evidence Ref: Section 4.2, D.9

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
URS total travel distance	95.1 km (ALAS mean)	118.9 km (baseline LLMs mean)	-20%	URS (10 runs)	Section 4.1, Fig. 2	Fig. 2
Family Reunion reactive success	10/10 feasible replans (ALAS)	3/10 feasible replans (DeepSeek/Claude typical success)	+7 trials	Family Reunion reactive test (10 trials)	Section 4.2, D.9	D.9
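The Delta column can be reproduced from the Value and Baseline columns. The short check below uses only the numbers reported in the table; the helper name `relative_change` is an illustrative choice.

```python
def relative_change(value: float, baseline: float) -> float:
    """Percentage change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100.0


# URS: mean travel distance in km, ALAS vs baseline LLMs.
urs_delta = relative_change(95.1, 118.9)
print(round(urs_delta, 1))  # -20.0

# Family Reunion: feasible replans out of 10 trials, ALAS vs typical baseline.
reactive_delta = 10 - 3
print(reactive_delta)  # 7
```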

What To Try In 7 Days

Prototype ALAS for one scheduling pipeline: build role templates, run a validator, and log persistent state for a small set of jobs.

Implement one compensator (LRCP) that performs local swaps, and measure work-in-progress (WIP) movement against the cost of global replanning.

Compare baseline LLM scheduling vs ALAS on a small real workflow (10–50 tasks) and track feasibility and downstream rework.
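For the validator step in the first item, a small feasibility checker is often enough to start. The sketch below assumes a job-shop style schedule and checks only two constraints, job precedence and one-operation-per-machine; the `Op` fields and the `validate` function are hypothetical names, and a real pipeline would add its own domain constraints.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    job: str
    step: int       # position of this operation within its job
    machine: str
    start: int
    duration: int

    @property
    def end(self) -> int:
        return self.start + self.duration


def validate(schedule: List[Op]) -> List[str]:
    """Return human-readable violations; an empty list means none were found."""
    errors = []
    # Precedence: within a job, a later step may not start before the previous ends.
    by_job = {}
    for op in schedule:
        by_job.setdefault(op.job, []).append(op)
    for job, ops in by_job.items():
        ops.sort(key=lambda o: o.step)
        for prev, nxt in zip(ops, ops[1:]):
            if nxt.start < prev.end:
                errors.append(f"{job}: step {nxt.step} starts before step {prev.step} ends")
    # Capacity: a machine runs one operation at a time.
    by_machine = {}
    for op in schedule:
        by_machine.setdefault(op.machine, []).append(op)
    for machine, ops in by_machine.items():
        ops.sort(key=lambda o: o.start)
        for prev, nxt in zip(ops, ops[1:]):
            if nxt.start < prev.end:
                errors.append(f"{machine}: {prev.job} and {nxt.job} overlap")
    return errors


feasible = [Op("J1", 0, "M1", 0, 3), Op("J1", 1, "M2", 3, 2), Op("J2", 0, "M1", 3, 2)]
clashing = [Op("J1", 0, "M1", 0, 3), Op("J2", 0, "M1", 2, 2)]
print(validate(feasible))   # []
print(validate(clashing))   # ['M1: J1 and J2 overlap']
```

Running the same checker on baseline LLM output and on the repaired plan gives the feasibility count needed for the comparison in the third item.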

Agent Features

Memory

persistent execution memory (state transitions, logs)

dependency graphs for rollback

Planning

workflow template construction

LRCP local reactive compensation

queue reordering with WIP penalty

Tool Use

LLMs to generate code and prompts

external validators and classical heuristics

Frameworks

ALAS

LRCP

Is Agentic

Yes

Architectures

three-layer (blueprint, agent factory, runtime)

role-specialized agent graph

Collaboration

master coordinator LLM

message-based inter-agent alerts and DELAY_NOTIFY

Optimization Features

Token Efficiency

compartmentalize context to reduce long-context erosion

System Optimization

swap-limited queue optimization (practical bound O(S J O_max))

WIP penalty to avoid excessive reordering
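One way to picture a swap-limited reordering with a WIP penalty is a greedy loop that tries adjacent swaps on a single queue and accepts one only if it lowers tardiness by more than the penalty for moving jobs. This is a sketch under stated assumptions, not the paper's algorithm: the tardiness objective, the displacement-based penalty, and the names `swap_limited_repair`, `max_swaps`, and `wip_penalty` are all hypothetical.

```python
def total_tardiness(queue):
    """Sum of lateness over jobs given as (name, duration, due) tuples."""
    clock, late = 0, 0
    for _, duration, due in queue:
        clock += duration
        late += max(0, clock - due)
    return late


def displacement(queue, original):
    """How far jobs moved from their original queue positions, summed."""
    pos = {job[0]: i for i, job in enumerate(original)}
    return sum(abs(i - pos[job[0]]) for i, job in enumerate(queue))


def swap_limited_repair(original, max_swaps, wip_penalty):
    """Greedy adjacent swaps, at most `max_swaps`, each of which must lower the cost."""
    def cost(q):
        return total_tardiness(q) + wip_penalty * displacement(q, original)

    queue = list(original)
    for _ in range(max_swaps):
        best, best_cost = None, cost(queue)
        for i in range(len(queue) - 1):
            cand = queue[:i] + [queue[i + 1], queue[i]] + queue[i + 2:]
            c = cost(cand)
            if c < best_cost:
                best, best_cost = cand, c
        if best is None:  # no swap improves the cost; stop early
            break
        queue = best
    return queue


jobs = [("A", 4, 10), ("B", 2, 3), ("C", 3, 20)]  # illustrative data
print([j[0] for j in swap_limited_repair(jobs, max_swaps=3, wip_penalty=1.0)])  # ['B', 'A', 'C']
print([j[0] for j in swap_limited_repair(jobs, max_swaps=3, wip_penalty=2.0)])  # ['A', 'B', 'C']
```

With the lower penalty the late job B is pulled forward; with the higher penalty the reordering is no longer worth its disruption and the queue stays as planned.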

Reproducibility

Code Available: No

Data Available: No

Open Source Status: Partial

License: Unknown

Risks & Boundaries

Limitations

No in-factory or live deployment; results are simulation and benchmark-based.

Agent Factory assumes reliable LLM code generation; complex agents may need human review.

When Not To Use

Purely static optimization where classical combinatorial solvers already outperform LLM-based approaches.

Safety-critical domains that cannot tolerate automated code generation without human certification.

Failure Modes

Faulty agent code generation creating incorrect compensators or logging gaps.

Validator blind spots that miss constraint interactions leading to invalid templates.

Core Entities

Models

GPT-4o-Task

DeepSeek R1

Claude 3.7 Sonnet

Gemini 2.5 Pro

SeEvo-GPT3.5

Metrics

makespan

gap-to-upper-bound (%)

total travel distance (km)

reactive success rate (feasible replans / trials)
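The two scheduling metrics are simple to compute. The sketch below assumes the usual definitions: makespan as the completion time of the last operation, and gap as the percentage by which a makespan exceeds a reference upper bound. The summary does not state the paper's exact formula, and the numbers are illustrative, not benchmark results.

```python
def makespan(ops):
    """Completion time of the last operation; ops are (start, duration) pairs."""
    return max(start + duration for start, duration in ops)


def gap_to_upper_bound(value, upper_bound):
    """Percentage by which `value` exceeds `upper_bound` (assumed definition)."""
    return 100.0 * (value - upper_bound) / upper_bound


schedule = [(0, 3), (3, 2), (5, 4)]     # illustrative operations
print(makespan(schedule))               # 9
print(gap_to_upper_bound(110, 100))     # 10.0
```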

Datasets

Demirkol-DMU (DMU) JSSP

Taillard (TA) JSSP

Urban Ride Sharing (URS) synthetic

Family Reunion (custom event scenario)
