Overview
Production Readiness
0.6
Novelty Score
0.7
Cost Impact Score
0.6
Citation Count
2
Why It Matters For Business
ALAS turns LLMs into practical schedulers for dynamic operations by keeping state, validating plans, and repairing disruptions locally, which reduces rework, travel, and missed deadlines in logistics and operations.
Summary TLDR
ALAS (Adaptive LLM Agent System) replaces a monolithic LLM planner with a network of role-specialized agents that share a persistent execution memory and a Local Reactive Compensation Protocol (LRCP). The system (1) validates workflow templates, (2) instantiates agents via generated code or LLM prompts, and (3) reacts to runtime failures with local compensation instead of full replanning. On toy tasks and large job-shop benchmarks, ALAS reduces total travel distance on Urban Ride Sharing (URS), narrows the gap to optimal on job-shop scheduling (JSSP) benchmarks, and recovers from runtime failures that broke standalone LLMs. Key limits: no physical deployment yet, reliance on reliable code generation, and static cost models.
Problem Statement
Standalone LLMs struggle with real-time, transaction-style planning because they lack self-checks, persistent state, long-context fidelity, and disruption recovery. ALAS addresses these gaps by decomposing plans into role-specific agents and adding persistent execution memory, validators, and a Local Reactive Compensation Protocol (LRCP) that prefers local fixes over costly global replanning.
Main Contribution
Three-layer architecture (workflow blueprint, agent factory, runtime monitor) that builds validated templates and instantiates agents.
Persistent execution memory that logs state transitions, enabling rollback, targeted compensation, and causal checks.
Local Reactive Compensation Protocol (LRCP) that performs localized recovery and queue reordering to contain disruption costs.
Cross-domain evaluation showing improvements on ride-sharing, event coordination, and large job-shop scheduling (DMU, TA).
Key Findings
ALAS produces shorter ride-sharing routes than standalone LLM baselines on the URS task.
ALAS reliably handles mid-run disruptions in a family-event scenario while many LLMs fail.
On large job-shop benchmarks, ALAS reaches a smaller gap to optimal than several RL and heuristic methods.
LRCP maintains a lower average gap after random machine failures than other reactive methods.
Results
URS total travel distance
Family Reunion reactive success
TA benchmark mean gap to optimal
Dynamic JSSP average gap under disruptions
Who Should Care
What To Try In 7 Days
Prototype ALAS for one scheduling pipeline: build role templates, run a validator, and log persistent state for a small set of jobs.
Implement one LRCP-style compensator that performs local swaps, and measure WIP movement against the cost of global replanning.
Compare baseline LLM scheduling vs ALAS on a small real workflow (10–50 tasks) and track feasibility and downstream rework.
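For the "run a validator" step, a feasibility check over a candidate job-shop schedule could look like the sketch below. It is a minimal illustration, not the validator described in the paper: the `Op` record, the `validate_schedule` function, and the sample jobs are all hypothetical names and data chosen for this example. It checks three things: positive durations, job precedence, and no overlapping operations on one machine.

```python
from collections import defaultdict
from typing import NamedTuple


class Op(NamedTuple):
    """One scheduled operation (hypothetical record layout)."""
    job: str
    step: int
    machine: str
    start: int
    end: int


def validate_schedule(ops):
    """Return a list of violation messages; an empty list means feasible."""
    errors = []
    by_job, by_machine = defaultdict(list), defaultdict(list)
    for op in ops:
        if op.end <= op.start:
            errors.append(f"{op.job}/{op.step}: non-positive duration")
        by_job[op.job].append(op)
        by_machine[op.machine].append(op)

    # Job precedence: each step must start after the previous step ends.
    for job, job_ops in by_job.items():
        job_ops.sort(key=lambda o: o.step)
        for prev, nxt in zip(job_ops, job_ops[1:]):
            if nxt.start < prev.end:
                errors.append(f"{job}: step {nxt.step} starts before step {prev.step} ends")

    # Machine capacity: neighbours in start order must not overlap.
    for machine, m_ops in by_machine.items():
        m_ops.sort(key=lambda o: o.start)
        for prev, nxt in zip(m_ops, m_ops[1:]):
            if nxt.start < prev.end:
                errors.append(f"{machine}: {prev.job}/{prev.step} overlaps {nxt.job}/{nxt.step}")
    return errors


# Made-up sample data for illustration.
good = [Op("J1", 0, "M1", 0, 3), Op("J1", 1, "M2", 3, 5),
        Op("J2", 0, "M2", 0, 3), Op("J2", 1, "M1", 3, 4)]
bad = [Op("J1", 0, "M1", 0, 3), Op("J2", 0, "M1", 2, 4)]
good_errors = validate_schedule(good)
bad_errors = validate_schedule(bad)
```

Running the same check on both the baseline LLM output and the ALAS output gives a like-for-like feasibility count for the comparison in the last step.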
Agent Features
Memory
- persistent execution memory (state transitions, logs)
- dependency graphs for rollback
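One way to picture the two memory features together is an append-only transition log paired with a dependency graph that scopes rollback to the affected tasks. The sketch below is an assumption about how such a store might be shaped, not the ALAS implementation; `Transition`, `ExecutionMemory`, the state names, and the task names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transition:
    """One logged state change for a task (hypothetical record layout)."""
    task: str
    old_state: str
    new_state: str


@dataclass
class ExecutionMemory:
    """Append-only log plus a task dependency graph (illustrative only)."""
    log: list = field(default_factory=list)
    state: dict = field(default_factory=dict)
    dependents: dict = field(default_factory=dict)  # task -> downstream tasks

    def add_dependency(self, upstream: str, downstream: str) -> None:
        self.dependents.setdefault(upstream, set()).add(downstream)

    def record(self, task: str, new_state: str) -> None:
        old = self.state.get(task, "pending")
        self.log.append(Transition(task, old, new_state))
        self.state[task] = new_state

    def affected_by(self, task: str) -> set:
        """Return the task plus everything downstream of it in the graph."""
        seen, stack = set(), [task]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.dependents.get(current, ()))
        return seen

    def rollback(self, task: str) -> set:
        """Reset only the failed task and its dependents to 'pending'."""
        targets = self.affected_by(task)
        for name in sorted(targets):
            if self.state.get(name, "pending") != "pending":
                self.record(name, "pending")  # the rollback itself is logged
        return targets


memory = ExecutionMemory()
memory.add_dependency("pickup", "dropoff")
memory.record("pickup", "done")
memory.record("dropoff", "running")
memory.record("billing", "done")
rolled_back = memory.rollback("pickup")
```

In this example the unrelated `billing` task keeps its state, which is the point of using the graph: recovery touches only what the failure can reach.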
Planning
- workflow template construction
- LRCP local reactive compensation
- queue reordering with WIP penalty
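The idea of local compensation can be illustrated with classical right-shift repair: when a machine goes down, only the operations that collide with the outage, and those forced to wait behind them, are delayed. This is a swapped-in textbook technique used here as a stand-in, not the LRCP procedure from the paper. The function name, the dict layout, and the sample schedule are assumptions, and the sketch further assumes that an interrupted operation restarts from scratch after the outage.

```python
def right_shift_repair(ops, machine, down_start, down_end):
    """Delay only the operations affected by an outage on one machine.

    ops: list of dicts with keys job, step, machine, start, end.
    Returns (repaired_ops, moved_keys); the input list is left untouched.
    """
    ops = [dict(o) for o in ops]
    # Original start order keeps every job and machine sequence intact.
    ops.sort(key=lambda o: (o["start"], o["job"], o["step"]))
    job_ready, machine_ready, moved = {}, {}, []
    for o in ops:
        duration = o["end"] - o["start"]
        earliest = max(o["start"],
                       job_ready.get(o["job"], 0),
                       machine_ready.get(o["machine"], 0))
        # Push past the outage if the operation would overlap it.
        if (o["machine"] == machine and earliest < down_end
                and earliest + duration > down_start):
            earliest = down_end
        if earliest != o["start"]:
            moved.append((o["job"], o["step"]))
        o["start"], o["end"] = earliest, earliest + duration
        job_ready[o["job"]] = o["end"]
        machine_ready[o["machine"]] = o["end"]
    return ops, moved


# Made-up schedule for illustration; M1 is down from t=3 to t=5.
schedule = [
    {"job": "J1", "step": 0, "machine": "M1", "start": 0, "end": 3},
    {"job": "J2", "step": 0, "machine": "M2", "start": 0, "end": 2},
    {"job": "J2", "step": 1, "machine": "M1", "start": 3, "end": 5},
    {"job": "J1", "step": 1, "machine": "M2", "start": 3, "end": 6},
]
repaired, moved = right_shift_repair(schedule, "M1", 3, 5)
```

Here only J2's second operation moves (to the interval 5 to 7); the other three operations keep their original slots, so nothing is replanned globally.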
Tool Use
- LLMs to generate code and prompts
- external validators and classical heuristics
Frameworks
- ALAS
- LRCP
Is Agentic
true
Architectures
- three-layer (blueprint, agent factory, runtime)
- role-specialized agent graph
Collaboration
- master coordinator LLM
- message-based inter-agent alerts and DELAY_NOTIFY
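Message-based alerts between agents can be sketched as typed records delivered to per-agent queues. Only the `DELAY_NOTIFY` name comes from the text above; the `Alert` fields, the `Mailbox` class, and the agent names are hypothetical and are not taken from the ALAS code.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """An inter-agent message (field names are assumptions)."""
    kind: str            # e.g. "DELAY_NOTIFY"
    sender: str
    task: str
    delay_minutes: int


class Mailbox:
    """In-memory per-agent queues; a real system would use a message broker."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def send(self, recipient: str, alert: Alert) -> None:
        self._queues[recipient].append(alert)

    def drain(self, recipient: str) -> list:
        """Return and clear all pending alerts for one agent, oldest first."""
        items = list(self._queues[recipient])
        self._queues[recipient].clear()
        return items


mailbox = Mailbox()
mailbox.send("coordinator", Alert("DELAY_NOTIFY", "driver_agent", "pickup_grandma", 15))
pending = mailbox.drain("coordinator")
```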
Optimization Features
Token Efficiency
- compartmentalize context to reduce long-context erosion
System Optimization
- swap-limited queue optimization (practical bound O(S J O_max))
- WIP penalty to avoid excessive reordering
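The interaction between a swap budget and a WIP penalty can be shown on a single machine queue. The sketch below is a greedy adjacent-swap heuristic written for this summary; it is not the paper's procedure and makes no claim about the bound quoted above. The objective (total tardiness), the function names, and the sample queue are assumptions. A swap is accepted only when the tardiness it saves exceeds the penalty charged for moving work in progress.

```python
def total_tardiness(queue, start_time=0):
    """Sum of lateness over a queue of (duration, due_time) jobs."""
    clock, total = start_time, 0
    for duration, due in queue:
        clock += duration
        total += max(0, clock - due)
    return total


def swap_limited_reorder(queue, max_swaps, wip_penalty):
    """Greedy adjacent swaps, each charged wip_penalty, capped at max_swaps."""
    queue = list(queue)
    swaps = 0
    while swaps < max_swaps:
        base = total_tardiness(queue)
        best_gain, best_i = 0, None
        for i in range(len(queue) - 1):
            candidate = queue[:]
            candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
            gain = base - total_tardiness(candidate) - wip_penalty
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:
            break  # no swap pays for its own penalty
        queue[best_i], queue[best_i + 1] = queue[best_i + 1], queue[best_i]
        swaps += 1
    return queue, swaps


# Made-up queue of (duration, due_time) pairs for illustration.
jobs = [(5, 20), (2, 3), (3, 6)]
reordered, used = swap_limited_reorder(jobs, max_swaps=3, wip_penalty=1)
frozen, unused = swap_limited_reorder(jobs, max_swaps=3, wip_penalty=5)
```

With a penalty of 1 the queue is fully repaired in two swaps; with a penalty of 5 no single swap is worth its cost, so the queue is left alone. That is the behavior the WIP penalty is meant to produce: reordering happens only when it clearly pays off.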
Reproducibility
Open Source Status
- partial
Risks & Boundaries
Limitations
- No in-factory or live deployment; results are simulation and benchmark-based.
- Agent Factory assumes reliable LLM code generation; complex agents may need human review.
- Static cost models (constant operation times, fixed t_WIP) limit real-world fidelity.
When Not To Use
- Purely static optimization where classical combinatorial solvers already outperform LLM-based approaches.
- Safety-critical domains that cannot tolerate automated code generation without human certification.
- Environments requiring dynamic cost models unless ALAS is extended with live estimators.
Failure Modes
- Faulty agent code generation creating incorrect compensators or logging gaps.
- Validator blind spots that miss constraint interactions leading to invalid templates.
- Scalability bottlenecks in agent generation or message traffic if many agents are auto-coded at once.
Core Entities
Models
- GPT-4o-Task
- DeepSeek R1
- Claude 3.7 Sonnet
- Gemini 2.5 Pro
- SeEvo-GPT3.5
Metrics
- makespan
- gap-to-upper-bound (%)
- total travel distance (km)
- reactive success rate (feasible replans / trials)
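Two of these metrics reduce to short formulas. The sketch below assumes the gap is the usual relative difference between the achieved makespan and the best-known upper bound, and that reactive success is the fraction of trials with a feasible replan; the summary does not spell out the paper's exact definitions, so treat both as assumptions. The function names and inputs are hypothetical.

```python
def gap_to_upper_bound(makespan, upper_bound):
    """Relative gap in percent: 100 * (makespan - UB) / UB."""
    if upper_bound <= 0:
        raise ValueError("upper_bound must be positive")
    return 100.0 * (makespan - upper_bound) / upper_bound


def reactive_success_rate(outcomes):
    """Fraction of trials whose replan was feasible (True)."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


# Made-up numbers, not results from the paper.
example_gap = gap_to_upper_bound(makespan=1100, upper_bound=1000)
example_rate = reactive_success_rate([True, False, True, True])
```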
Datasets
- Demirkol-DMU (DMU) JSSP
- Taillard (TA) JSSP
- Urban Ride Sharing (URS) synthetic
- Family Reunion (custom event scenario)
Benchmarks
- DMU job-shop benchmarks
- Taillard (TA) job-shop benchmarks
Context Entities
Models
- Gemini
- OpenAI APIs
- Claude

