ALAS: modular LLM agents with persistent memory that locally repair plans under runtime disruptions

May 18, 2025 · 7 min

Overview

Production Readiness

0.6

Novelty Score

0.7

Cost Impact Score

0.6

Citation Count

2

Authors

Edward Y. Chang, Longling Geng


Why It Matters For Business

ALAS turns LLMs into practical schedulers for dynamic operations by keeping persistent state, validating plans, and repairing disruptions locally. In logistics and operations, this reduces rework, travel, and missed deadlines.

Summary TLDR

ALAS (Adaptive LLM Agent System) turns a monolithic LLM planner into a network of role-specialized agents that share a persistent execution memory and follow a Local Reactive Compensation Protocol (LRCP). The system (1) validates workflow templates, (2) instantiates agents (via generated code and LLM prompts), and (3) reacts to runtime failures with local compensation instead of full replanning. Across ride-sharing (URS), event-coordination, and large job-shop (JSSP) benchmarks, ALAS shortens travel distance on URS, narrows the gap-to-optimal on JSSP, and recovers from mid-run disruptions that broke standalone LLMs. Key limits: no physical deployment yet, dependence on reliable LLM code generation, and static cost models.

Problem Statement

Standalone LLMs struggle with real-time, transaction-style planning because they lack self-checks, persistent state, long-context fidelity, and disruption recovery. ALAS addresses this by decomposing plans into role-specific agents and adding persistent execution memory, validators, and a Local Reactive Compensation Protocol (LRCP) that prefers local fixes over costly global replanning.

Main Contribution

Three-layer architecture (workflow blueprint, agent factory, runtime monitor) that builds validated templates and instantiates agents.

Persistent execution memory that logs state transitions, enabling rollback, targeted compensation, and causal checks.

Local Reactive Compensation Protocol (LRCP) that performs localized recovery and queue reordering to contain disruption costs.

Cross-domain evaluation showing improvements on ride-sharing, event coordination, and large job-shop scheduling (DMU, TA).
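The summary says the blueprint layer validates workflow templates before agents are instantiated, but it does not list the checks. As a minimal sketch, assuming a template is just a map from each task to the tasks it depends on, here are two structural checks such a validator could run: undeclared dependencies and dependency cycles. `validate_template` is a hypothetical name, not the ALAS API, and a real validator would also need domain checks (capacities, time windows) that this sketch omits.

```python
from collections import deque


def validate_template(tasks: dict[str, list[str]]) -> list[str]:
    """Return structural problems in a workflow template (empty list = passed).

    `tasks` maps each task name to the names of the tasks it depends on.
    Hypothetical helper: checks structure only, not domain constraints.
    """
    problems = []
    # Check 1: every dependency must name a declared task.
    for task, deps in tasks.items():
        for dep in deps:
            if dep not in tasks:
                problems.append(f"{task} depends on undeclared task {dep}")

    # Check 2: the dependency graph must be acyclic (Kahn's topological sort).
    indegree = {t: 0 for t in tasks}
    children = {t: [] for t in tasks}
    for task, deps in tasks.items():
        for dep in deps:
            if dep in tasks:
                indegree[task] += 1
                children[dep].append(task)
    ready = deque(t for t, d in indegree.items() if d == 0)
    ordered = 0
    while ready:
        node = ready.popleft()
        ordered += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if ordered < len(tasks):  # some tasks never became ready: they sit on a cycle
        problems.append("dependency cycle detected")
    return problems


# Illustrative family-event template (task names are made up).
template = {"pickup_mom": [], "pickup_kids": ["pickup_mom"], "arrive_venue": ["pickup_kids"]}
print(validate_template(template))                  # []
print(validate_template({"a": ["b"], "b": ["a"]}))  # ['dependency cycle detected']
```

Rejecting a template at this stage is cheap compared with discovering the same cycle after agents have been generated and started.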

Key Findings

ALAS produces shorter ride-sharing routes than standalone LLM baselines on the URS task.

Numbers: average distance 95.1 km vs 118.9 km (20% reduction, p<0.01)

ALAS reliably handles mid-run disruptions in a family-event scenario where many LLMs fail.

Numbers: ALAS succeeded in 10/10 reactive trials; DeepSeek and Claude each failed 7/10.

On large job-shop benchmarks, ALAS reaches a much smaller gap-to-optimal than several RL and heuristic methods.

Numbers: TA benchmark mean gap 0.86% (ALAS) vs 25% (DRL-Chen), 18% (DRL-Zhang), and 15% (SeEvo-GPT3.5)

LRCP keeps a lower average gap after random machine failures than other reactive methods.

Numbers: LRCP average gap of 19.8% under 20 random machine failures (DMU instances)

Results

URS total travel distance

Value: 95.1 km (ALAS mean)

Baseline: 118.9 km (mean across baseline LLMs)

Family Reunion reactive success

Value: 10/10 feasible replans (ALAS)

Baseline: 3/10 feasible replans (DeepSeek and Claude, each)

TA benchmark mean gap to optimal

Value: 0.86% (ALAS + LRCP)

Baseline: 25% (DRL-Chen), 18% (DRL-Zhang), 15% (SeEvo-GPT3.5)

Dynamic JSSP average gap under disruptions

Value: 19.8% (LRCP)

Baseline: higher for SeEvo, GP, and DRL methods (exact values not given)

Who Should Care

What To Try In 7 Days

Prototype ALAS for one scheduling pipeline: build role templates, run a validator, and log persistent state for a small set of jobs.

Implement one LRCP-style compensator that performs local swaps, and measure the resulting WIP movement against the cost of global replanning.

Compare baseline LLM scheduling vs ALAS on a small real workflow (10–50 tasks) and track feasibility and downstream rework.
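For the baseline-vs-ALAS comparison, both plans need the same feasibility check. A minimal sketch, assuming a job-shop style plan in which every operation has a job, a step index within its job, a machine, a start time, and a duration: `Op` and `check_schedule` are hypothetical names, not part of ALAS. The checker reports job-precedence and machine-overlap violations plus the makespan, so it can score plans from any source.

```python
from collections import defaultdict
from typing import NamedTuple


class Op(NamedTuple):
    job: int
    step: int       # position of this operation within its job
    machine: int
    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


def check_schedule(ops: list[Op]) -> tuple[bool, float, list[str]]:
    """Return (feasible, makespan, violations) for a job-shop style plan."""
    violations = []
    by_job, by_machine = defaultdict(list), defaultdict(list)
    for op in ops:
        by_job[op.job].append(op)
        by_machine[op.machine].append(op)

    # A job's operations must run in step order without overlapping.
    for job, job_ops in by_job.items():
        job_ops.sort(key=lambda o: o.step)
        for prev, nxt in zip(job_ops, job_ops[1:]):
            if nxt.start < prev.end:
                violations.append(f"job {job}: step {nxt.step} starts before step {prev.step} ends")

    # A machine processes at most one operation at a time.
    for machine, m_ops in by_machine.items():
        m_ops.sort(key=lambda o: o.start)
        for prev, nxt in zip(m_ops, m_ops[1:]):
            if nxt.start < prev.end:
                violations.append(f"machine {machine}: job {nxt.job} overlaps job {prev.job}")

    makespan = max((op.end for op in ops), default=0.0)
    return (not violations, makespan, violations)


# Two jobs, two machines (illustrative data).
ok = [Op(0, 0, 0, 0, 3), Op(0, 1, 1, 3, 2), Op(1, 0, 1, 0, 3), Op(1, 1, 0, 3, 4)]
print(check_schedule(ok))  # (True, 7, [])
```

Logging the violation strings, not just the boolean, makes it easier to see which kinds of errors each planner tends to make.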

Agent Features

Memory

  • persistent execution memory (state transitions, logs)
  • dependency graphs for rollback
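To make the two memory features concrete, here is a minimal sketch, assuming the memory is an append-only list of (sequence, task, new_state) transitions plus a map from each task to its dependents. `ExecutionMemory`, `record`, `affected_by`, and `rollback` are hypothetical names, not the ALAS implementation. Here "rollback" means appending a transition back to `pending` for the failed task and everything downstream of it, so history is never overwritten and unrelated tasks keep their state.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionMemory:
    """Append-only state-transition log plus a dependency graph (hypothetical design)."""
    dependents: dict[str, list[str]]  # task -> tasks that depend on it
    log: list[tuple[int, str, str]] = field(default_factory=list)  # (seq, task, new_state)
    state: dict[str, str] = field(default_factory=dict)

    def record(self, task: str, new_state: str) -> None:
        # Append, never overwrite: the log stays auditable for causal checks.
        self.log.append((len(self.log), task, new_state))
        self.state[task] = new_state

    def affected_by(self, task: str) -> set[str]:
        # Transitive dependents of `task`: the scope of a targeted rollback.
        seen, stack = set(), [task]
        while stack:
            for child in self.dependents.get(stack.pop(), []):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def rollback(self, task: str) -> set[str]:
        # Reset only the failed task and its dependents to "pending".
        scope = self.affected_by(task) | {task}
        for t in sorted(scope):
            if t in self.state:  # tasks never started have nothing to reset
                self.record(t, "pending")
        return scope


mem = ExecutionMemory(dependents={"load": ["drive"], "drive": ["unload"], "refuel": []})
for t in ["load", "drive", "refuel"]:
    mem.record(t, "done")
print(sorted(mem.rollback("load")))  # ['drive', 'load', 'unload']
print(mem.state)                     # {'load': 'pending', 'drive': 'pending', 'refuel': 'done'}
```

Note that `refuel` is untouched: the dependency graph is what keeps a rollback targeted instead of global.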

Planning

  • workflow template construction
  • LRCP local reactive compensation
  • queue reordering with WIP penalty
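The core idea of local reactive compensation, repairing only what a disruption actually breaks, can be illustrated with delay propagation over a precedence graph. This is a minimal sketch under stated assumptions, not the published LRCP: tasks have fixed durations, only precedence constraints are modelled (machine conflicts are ignored), and `propagate_delay` is a hypothetical name. A delayed task pushes back only those successors whose start is now infeasible, and slack downstream absorbs the rest.

```python
def propagate_delay(
    start: dict[str, float],
    duration: dict[str, float],
    successors: dict[str, list[str]],
    task: str,
    delay: float,
) -> tuple[dict[str, float], set[str]]:
    """Delay `task`, then shift only the downstream tasks whose precedence broke.

    Returns the repaired start times and the set of tasks that moved.
    Machine/resource conflicts are ignored in this sketch.
    """
    start = dict(start)  # leave the caller's plan untouched
    start[task] += delay
    moved = {task}
    stack = [task]
    while stack:
        current = stack.pop()
        finish = start[current] + duration[current]
        for nxt in successors.get(current, []):
            if start[nxt] < finish:   # precedence broken by the delay
                start[nxt] = finish   # smallest shift that restores it
                moved.add(nxt)
                stack.append(nxt)     # its own successors may now need to move
    return start, moved


start = {"A": 0, "B": 5, "C": 12, "D": 0}
duration = {"A": 5, "B": 4, "C": 3, "D": 6}
successors = {"A": ["B"], "B": ["C"]}
new_start, moved = propagate_delay(start, duration, successors, "A", 2)
print(new_start)      # {'A': 2, 'B': 7, 'C': 12, 'D': 0}
print(sorted(moved))  # ['A', 'B']
```

Task `C` had one unit of slack left after `B` moved, so it stays put, and the unrelated task `D` is never touched; a global replan, by contrast, may reshuffle every task.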

Tool Use

  • LLMs to generate code and prompts
  • external validators and classical heuristics

Frameworks

  • ALAS
  • LRCP

Is Agentic

true

Architectures

  • three-layer (blueprint, agent factory, runtime)
  • role-specialized agent graph

Collaboration

  • master coordinator LLM
  • message-based inter-agent alerts and DELAY_NOTIFY
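The summary names `DELAY_NOTIFY` as an inter-agent alert but does not describe the message format or routing. As a minimal sketch, assuming alerts are routed only to agents registered as dependents of the sender, here is a plain-code router. `Message`, `Coordinator`, `register`, and `publish` are hypothetical names, the message fields are assumptions, and in ALAS the coordinator role is played by an LLM rather than a fixed routing table.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    kind: str      # e.g. "DELAY_NOTIFY"
    sender: str
    payload: dict


class Coordinator:
    """Routes alerts only to agents that depend on the sender (hypothetical design)."""

    def __init__(self, dependents: dict[str, list[str]]):
        self.dependents = dependents
        self.inboxes: dict[str, deque] = {}

    def register(self, agent: str) -> None:
        self.inboxes[agent] = deque()

    def publish(self, msg: Message) -> list[str]:
        # Deliver to registered dependents only; everyone else sees no traffic.
        recipients = [a for a in self.dependents.get(msg.sender, []) if a in self.inboxes]
        for agent in recipients:
            self.inboxes[agent].append(msg)
        return recipients


coord = Coordinator({"driver_1": ["pickup_agent", "venue_agent"]})
for name in ["pickup_agent", "venue_agent", "catering_agent"]:
    coord.register(name)
print(coord.publish(Message("DELAY_NOTIFY", "driver_1", {"minutes": 15})))
# ['pickup_agent', 'venue_agent']
print(len(coord.inboxes["catering_agent"]))  # 0
```

Targeted routing keeps message traffic proportional to the affected subgraph, which matters for the scalability failure mode listed below.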

Optimization Features

Token Efficiency

  • compartmentalize context to reduce long-context erosion
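Compartmentalizing context means each agent's prompt carries only the slice of shared state its role touches, instead of the whole plan. A minimal sketch, assuming the shared state is a task-to-status map and each role owns a known set of tasks. `agent_context` is a hypothetical helper, and the text format is arbitrary.

```python
def agent_context(
    state: dict[str, str],
    role_tasks: dict[str, set[str]],
    role: str,
    max_items: int = 20,
) -> str:
    """Build a compact prompt context for one role from the shared state."""
    relevant = [(t, s) for t, s in state.items() if t in role_tasks[role]]
    # Cap the slice so one agent's context cannot grow without bound
    # (keeps the last `max_items` entries in the state's dict order).
    relevant = relevant[-max_items:]
    lines = [f"role: {role}"] + [f"{t}: {s}" for t, s in relevant]
    return "\n".join(lines)


state = {"pickup_A": "done", "pickup_B": "delayed", "cook_dinner": "pending"}
roles = {"driver": {"pickup_A", "pickup_B"}, "chef": {"cook_dinner"}}
print(agent_context(state, roles, "chef"))
# role: chef
# cook_dinner: pending
```

The chef agent never sees the driver's pickup details, so its context stays short no matter how large the overall plan grows.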

System Optimization

  • swap-limited queue optimization (practical bound O(S J O_max))
  • WIP penalty to avoid excessive reordering
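The two optimization features, a swap limit and a WIP penalty, can be combined in a simple local search. This is a minimal sketch under assumptions not stated in the summary: a single machine queue, total tardiness as the objective, adjacent swaps only, and a fixed per-swap cost `wip_penalty` standing in for the fixed t_WIP. `reorder_with_wip_penalty` is a hypothetical name; this greedy search is not the paper's algorithm and, with naive re-evaluation, costs O(S·J²) rather than matching the stated O(S J O_max) bound.

```python
def total_tardiness(queue: list[str], proc: dict[str, float], due: dict[str, float]) -> float:
    """Sum of lateness past each job's due date when the queue runs in order."""
    t, tardy = 0.0, 0.0
    for job in queue:
        t += proc[job]
        tardy += max(0.0, t - due[job])
    return tardy


def reorder_with_wip_penalty(queue, proc, due, max_swaps, wip_penalty):
    """Greedy adjacent swaps, at most `max_swaps`, each charged `wip_penalty`.

    A swap is taken only if its tardiness reduction exceeds the penalty.
    Returns (new_queue, tardiness, swaps_used).
    """
    queue = list(queue)
    cost = total_tardiness(queue, proc, due)
    swaps = 0
    while swaps < max_swaps:
        best_gain, best_i = 0.0, None
        for i in range(len(queue) - 1):
            cand = queue[:i] + [queue[i + 1], queue[i]] + queue[i + 2:]
            gain = cost - total_tardiness(cand, proc, due) - wip_penalty
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:
            break  # no remaining swap pays for its WIP penalty
        queue[best_i], queue[best_i + 1] = queue[best_i + 1], queue[best_i]
        cost = total_tardiness(queue, proc, due)
        swaps += 1
    return queue, cost, swaps


proc = {"J1": 5, "J2": 1, "J3": 2}
due = {"J1": 10, "J2": 2, "J3": 4}
print(reorder_with_wip_penalty(["J1", "J2", "J3"], proc, due, max_swaps=5, wip_penalty=1))
# (['J2', 'J3', 'J1'], 0.0, 2)
```

Raising `wip_penalty` to 5 in this example blocks every swap, leaving the queue unchanged with tardiness 8.0: the penalty is what trades schedule quality against disruption on the shop floor.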

Reproducibility

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • No in-factory or live deployment; results are simulation and benchmark-based.
  • Agent Factory assumes reliable LLM code generation; complex agents may need human review.
  • Static cost models (constant operation times, fixed t_WIP) limit real-world fidelity.

When Not To Use

  • Purely static optimization where classical combinatorial solvers already outperform LLM-based approaches.
  • Safety-critical domains that cannot tolerate automated code generation without human certification.
  • Environments requiring dynamic cost models unless ALAS is extended with live estimators.

Failure Modes

  • Faulty agent code generation creating incorrect compensators or logging gaps.
  • Validator blind spots that miss constraint interactions leading to invalid templates.
  • Scalability bottlenecks in agent generation or message traffic if many agents are auto-coded at once.

Core Entities

Models

  • GPT-4o-Task
  • DeepSeek R1
  • Claude 3.7 Sonnet
  • Gemini 2.5 Pro
  • SeEvo-GPT3.5

Metrics

  • makespan
  • gap-to-upper-bound (%)
  • total travel distance (km)
  • reactive success rate (feasible replans / trials)
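The headline numbers above are simple ratios. A minimal sketch, assuming the common JSSP convention that gap-to-upper-bound is (makespan − UB) / UB × 100, where UB is the best-known makespan for the instance; the function names are illustrative. The makespan/UB pair below is made up, while the URS distances and the 3/10 success rate come from the results above.

```python
def gap_percent(makespan: float, upper_bound: float) -> float:
    """Relative gap to the best-known upper bound, in percent (common JSSP convention)."""
    return 100.0 * (makespan - upper_bound) / upper_bound


def reactive_success_rate(feasible_replans: int, trials: int) -> float:
    """Fraction of disruption trials that ended in a feasible replan."""
    return feasible_replans / trials


def relative_reduction(baseline: float, value: float) -> float:
    """Fractional improvement of `value` over `baseline` (lower is better)."""
    return (baseline - value) / baseline


print(round(gap_percent(1008.6, 1000), 2))         # 0.86 (made-up instance)
print(reactive_success_rate(3, 10))                # 0.3
print(round(relative_reduction(118.9, 95.1), 3))   # 0.2
```

The last line reproduces the 20% URS distance reduction reported in the key findings.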

Datasets

  • Demirkol-DMU (DMU) JSSP
  • Taillard (TA) JSSP
  • Urban Ride Sharing (URS) synthetic
  • Family Reunion (custom event scenario)

Benchmarks

  • DMU job-shop benchmarks
  • Taillard (TA) job-shop benchmarks

Context Entities

Models

  • Gemini
  • OpenAI APIs
  • Claude