Overview
Production Readiness
0.6
Novelty Score
0.6
Cost Impact Score
0.7
Citation Count
2
Why It Matters For Business
When you run multiple LLM-based agents in parallel, overall response time depends on the slowest chain of steps (the critical path). Training the orchestration policy to minimize that path reduces latency a lot without sacrificing accuracy, which helps interactive products and time-sensitive workflows.
Summary TLDR
LAMaS is a learned controller for multi-agent systems that explicitly optimizes latency under layer-wise parallel execution. On standard benchmarks it cuts the inferred critical-path length by ~38–46% vs a leading baseline while keeping similar or better task accuracy. The key idea: train with a latency penalty and assign that penalty only to the bottleneck operators that form the critical path.
Problem Statement
Multi-agent systems often assume sequential execution or minimize total cost, which does not guarantee low wall-clock latency when operators can run in parallel. Without explicit latency signals, learned orchestrators tend to produce deep, narrow graphs that hurt latency under parallel execution.
Main Contribution
Formulation: point out that under parallel execution latency is driven by the critical execution path, not total cost, and should be optimized directly.
Method: LAMaS — a controller that enables layer-wise parallelism, removes intra-layer dependencies, and trains with a critical-path-aware latency penalty.
Evaluation: show across GSM8K, HumanEval and MATH that latency-aware training shortens the critical path by 38–46% versus MaAS, without hurting accuracy.
Key Findings
LAMaS reduced critical-path length substantially compared to MaAS on three benchmarks.
Task performance stayed comparable or improved while reducing latency.
Enabling parallel execution alone (removing intra-layer dependencies) does not produce the same latency gains.
Critical-path-aware credit assignment yields measurable benefits.
Results
Score (%)
CP len (latency proxy)
Score (%)
CP len (latency proxy)
Score (%)
CP len (latency proxy)
Who Should Care
What To Try In 7 Days
Measure a critical-path proxy (sum of longest outputs per layer) for your multi-agent flows.
Remove unnecessary intra-layer dependencies so operators can run in parallel.
Add a small latency penalty to the controller reward and apply it only to bottleneck operators (critical-path credit assignment).
Agent Features
Planning
- query-aware topology sampling
Tool Use
- LLM API calls
- external tool execution (mapped to token proxy)
Frameworks
- LAMaS
- MaAS
Is Agentic
true
Architectures
- probabilistic agentic supernet (DAG)
- layer-wise execution graphs with parallel operators
Collaboration
- learned multi-agent orchestration
- parallel operator execution
Optimization Features
Token Efficiency
- cost penalty preserved from MaAS
System Optimization
- remove intra-layer operator dependencies to avoid artificial synchronization
Training Optimization
- policy-gradient training of controller
- critical-path-aware credit assignment
- EMA normalization for reward variance reduction
Inference Optimization
- layer-wise parallel execution
- threshold-based sampling to adjust per-layer width
Reproducibility
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Latency metric is a token-based proxy (CP len), not measured wall-clock under real network/cloud conditions.
- Real-world speedups depend on system-level factors (queues, rate limits, hardware) outside the scope of this study.
- Experiments use a closed-source LLM (gpt-4o-mini-0718) via API, which may limit exact reproducibility for some users.
When Not To Use
- If your deployment latency is dominated by network or infra (not model compute), algorithmic orchestration alone may not help.
- For tiny workflows where sequential execution is cheaper and simpler.
- When you cannot measure or approximate operator latency (no proxy available).
Failure Modes
- Latency proxy mismatch: token-based CP len may not reflect real wall-clock bottlenecks.
- Mis-tuned latency weight can reduce accuracy or produce under-explored architectures.
- Incorrect credit assignment (uniform penalty) can worsen both latency and performance.
Core Entities
Models
- gpt-4o-mini-0718
Metrics
- Accuracy
- pass@1
- API cost (USD)
- CP len (critical-path token-length proxy)
Datasets
- GSM8K
- HumanEval
- MATH
Benchmarks
- GSM8K
- HumanEval
- MATH

