Overview
Production Readiness: 0.6
Novelty Score: 0.7
Cost Impact Score: 0.6
Citation Count: 3
Why It Matters For Business
If you run many LLM-driven agents, LLaMAC reduces the number of LLM calls and raises task success by coordinating agents through a centralized critic and selective feedback from actors.
Summary TLDR
This paper introduces LLaMAC, a modular actor-critic framework that coordinates many LLM-based agents through a TripletCritic (two preference critics plus an assessor) and an external feedback loop from actors. The design balances exploration and exploitation, avoids unnecessary LLM calls, and keeps token use low. Evaluated with GPT-4 on synthetic system resource allocation and robot grid-transport tasks (up to 50 agents), LLaMAC achieves higher success rates, fewer steps, and fewer feedback iterations than baselines such as multi-agent debate and HMAS-2.
Problem Statement
Scaling LLM-based multi-agent systems runs into three practical limits: an exponentially large joint action space, more LLM hallucinations as the number of coordinated agents grows, and high token/API costs from frequent LLM access. The paper seeks a coordination architecture that is stable, interpretable, and token-efficient for large agent counts.
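A quick count shows why the joint action space grows so fast. This assumes, for illustration only, that each of the N agents picks from its own action set A_i. The joint space is then the Cartesian product of the individual sets. The figure of 5 actions per agent is a hypothetical value. It is not taken from the paper.

```latex
|\mathcal{A}_{\mathrm{joint}}| = \prod_{i=1}^{N} |\mathcal{A}_i| = |\mathcal{A}|^{N} \quad \text{when every } |\mathcal{A}_i| = |\mathcal{A}|,
\qquad N = 50,\ |\mathcal{A}| = 5 \;\Rightarrow\; 5^{50} \approx 8.9 \times 10^{34}.
```

Adding one agent multiplies the space by another factor of |A|. This is why enumerating joint plans, or asking an LLM to reason over them directly, stops scaling.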
Main Contribution
A modular LLaMAC architecture: centralized TripletCritic plus decentralized LLM actors with execution and memory modules.
The TripletCritic: two critics with different preferences (explore/exploit) plus an assessor that provides internal feedback and corrects beliefs.
An external feedback loop: actors confirm plans and call the LLM only when needed, cutting access costs and token usage.
Empirical evaluation with GPT-4 on system resource allocation and grid-transport tasks (up to 50 agents), showing higher success, fewer steps, and lower feedback/token usage than baselines.
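The sketch below illustrates the TripletCritic control flow under stated assumptions. Two preference critics each draft a joint plan, and an assessor scores the drafts and keeps one. In LLaMAC each component is an LLM. Here, the plain functions `explore`, `exploit`, and `score` are hypothetical stand-ins so the flow runs offline. The toy resource-allocation rules are illustrative, not the paper's prompts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

JointPlan = Dict[str, int]  # hypothetical: agent id -> allocated units


@dataclass
class TripletCritic:
    """Two preference critics plus an assessor (illustrative stand-ins).

    In LLaMAC each component is an LLM; plain callables replace those calls here."""
    explore_critic: Callable[[dict], JointPlan]
    exploit_critic: Callable[[dict], JointPlan]
    assessor_score: Callable[[dict, JointPlan], float]

    def suggest(self, state: dict) -> Tuple[JointPlan, str]:
        candidates = {
            "explore": self.explore_critic(state),
            "exploit": self.exploit_critic(state),
        }
        # The assessor scores both drafts and keeps the higher-scoring one;
        # the returned note plays the role of internal feedback.
        scores = {name: self.assessor_score(state, plan) for name, plan in candidates.items()}
        best = max(scores, key=scores.get)
        return candidates[best], f"assessor chose {best} with scores {scores}"


# Toy resource-allocation stand-ins (assumptions, not the paper's prompts).
def exploit(state: dict) -> JointPlan:
    return dict(state["demand"])  # grant exactly what each agent asks for


def explore(state: dict) -> JointPlan:
    return {a: d + 1 for a, d in state["demand"].items()}  # probe a larger allocation


def score(state: dict, plan: JointPlan) -> float:
    used = sum(plan.values())
    return float(used) if used <= state["capacity"] else -1.0  # infeasible plans lose


critic = TripletCritic(explore, exploit, score)
demand = {"a1": 3, "a2": 2, "a3": 4}
tight_plan, _ = critic.suggest({"demand": demand, "capacity": 10})
loose_plan, _ = critic.suggest({"demand": demand, "capacity": 12})
```

With capacity 10, the exploratory draft (12 units) is infeasible, so the assessor keeps the exploit draft. With capacity 12, the exploratory draft uses more of the pool and wins. This is a small picture of the explore/exploit trade-off the assessor arbitrates.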
Key Findings
LLaMAC was tested on multi-agent resource allocation with up to 50 agents and maintained stable learning.
In grid-transport benchmarks, LLaMAC achieved near-perfect success where a prior method failed.
LLaMAC reduces actor feedback and token use compared with a prior multi-agent method.
Results
Reported metrics: Success, Steps, Feedback (actor-level calls)
Who Should Care
What To Try In 7 Days
Implement a minimal TripletCritic prototype (two suggestion-producing critics plus an assessor) for a single group of 5–10 agents.
On a small resource-allocation toy task, compare token/API usage and task success between centralized critic routing and naive per-agent LLM calls.
Log actor feedback counts and measure how many suggestions are accepted without extra LLM calls.
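A starting harness for the comparison above might look like the following. It is a minimal sketch under clear assumptions. `Meter.llm` is a hypothetical counter that replaces real model calls, and tokens are approximated by whitespace-split words. An actor "rejects" a central suggestion according to an arbitrary deterministic rule, `(i + t) % reject_every == 0`. This stands in for a real local plan check.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Meter:
    """Counts stand-in LLM calls and a crude token proxy (whitespace-split words)."""
    calls: int = 0
    tokens: int = 0

    def llm(self, prompt: str) -> str:
        self.calls += 1
        self.tokens += len(prompt.split())
        return "plan"  # placeholder; no real model is called


def naive_per_agent(n_agents: int, steps: int) -> Meter:
    m = Meter()
    for t in range(steps):
        for i in range(n_agents):
            m.llm(f"step {t} agent {i}: choose an allocation")
    return m


def centralized_critic(n_agents: int, steps: int, reject_every: int = 5) -> Tuple[Meter, int]:
    m = Meter()
    accepted = 0
    for t in range(steps):
        # One central call drafts suggestions for every agent at this step.
        m.llm(f"step {t}: suggest allocations for agents " + " ".join(map(str, range(n_agents))))
        for i in range(n_agents):
            # Hypothetical acceptance rule standing in for an actor's local plan check.
            if (i + t) % reject_every == 0:
                m.llm(f"step {t} agent {i}: revise the rejected suggestion")
            else:
                accepted += 1
    return m, accepted


naive = naive_per_agent(n_agents=10, steps=3)
central, accepted = centralized_critic(n_agents=10, steps=3)
print(naive.calls, central.calls, accepted)  # prints: 30 9 24
```

In a real test, swap `Meter.llm` for your API client and read token counts from its usage fields. The accepted count is the "suggestions accepted without extra LLM calls" figure to log.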
Agent Features
Memory
- Short-term memory (recent state)
- Long-term trajectory memory (last L steps) with filtering
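The memory split can be sketched as a pair of buffers. The paper does not specify its filtering rule here, so `keep` below is a hypothetical predicate. In this sketch the filter runs before the length cap, so the buffer holds the last L retained steps. Treat both choices as assumptions.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Optional

Step = Dict[str, Any]


class AgentMemory:
    """Short-term memory holds the latest state; long-term memory keeps at most
    the last L retained trajectory steps. `keep` is an assumed filter."""

    def __init__(self, L: int, keep: Callable[[Step], bool] = lambda step: True) -> None:
        self.short_term: Optional[Step] = None
        self.long_term: Deque[Step] = deque(maxlen=L)  # oldest entries drop off automatically
        self.keep = keep

    def record(self, step: Step) -> None:
        self.short_term = step  # always overwrite with the most recent state
        if self.keep(step):
            self.long_term.append(step)

    def context(self) -> List[Step]:
        """Trajectory slice to place in the next prompt, oldest first."""
        return list(self.long_term)


mem = AgentMemory(L=3, keep=lambda s: s["reward"] != 0)  # e.g. drop zero-reward steps
for t, r in enumerate([1, 0, 2, 3, 4]):
    mem.record({"t": t, "reward": r})
print([s["t"] for s in mem.context()], mem.short_term["t"])  # prints: [2, 3, 4] 4
```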
Planning
- Iterative planning via internal/external feedback
- Chain-of-thought style textual planning
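Iterative planning with feedback reduces to a bounded propose/check loop. The sketch below assumes a hypothetical `propose` callable (standing in for the LLM planner, which sees the last feedback text) and a hypothetical `check` callable (standing in for the assessor or actors). When the iteration budget runs out, the loop returns no plan. This mirrors the failure mode where persistent errors exceed the allowed iterations.

```python
from typing import Callable, Dict, Optional, Tuple

Plan = Dict[str, int]


def plan_with_feedback(
    propose: Callable[[Optional[str]], Plan],
    check: Callable[[Plan], Optional[str]],
    max_iters: int,
) -> Tuple[Optional[Plan], int]:
    """Repeat propose -> check until the checker raises no complaint.

    Returns (plan, iterations_used), with plan=None if the budget runs out."""
    feedback: Optional[str] = None
    for it in range(1, max_iters + 1):
        plan = propose(feedback)
        feedback = check(plan)
        if feedback is None:
            return plan, it
    return None, max_iters


# Toy usage: two agents share a capacity of 5 units.
CAPACITY = 5
draft = {"a1": 4, "a2": 3}


def propose(feedback: Optional[str]) -> Plan:
    if feedback is not None:
        draft["a1"] -= 1  # react to any complaint by shrinking a1's request
    return dict(draft)


def check(plan: Plan) -> Optional[str]:
    over = sum(plan.values()) - CAPACITY
    return f"over capacity by {over}" if over > 0 else None


plan, used = plan_with_feedback(propose, check, max_iters=5)
print(plan, used)  # prints: {'a1': 2, 'a2': 3} 3
```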
Tool Use
- LLM as planner/evaluator (GPT-4)
Frameworks
- LLaMAC
Is Agentic: true
Architectures
- Centralized Critic with Decentralized Actors (CCDA)
- TripletCritic (dual preference critics + assessor)
Collaboration
- Internal feedback among critics
- External feedback from actors to critic
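External feedback can be handled selectively on the critic side. Agents that confirm keep their suggested action, and only agents that object get a revision. This is a sketch, not the paper's code. `revise` is a hypothetical stand-in for one targeted LLM call per objection, and the toy rule of granting one more unit is an assumption.

```python
from typing import Callable, Dict, List, Optional

Plan = Dict[str, int]


def apply_external_feedback(
    plan: Plan,
    objections: Dict[str, Optional[str]],
    revise: Callable[[str, int, str], int],
) -> Plan:
    """Agents whose entry is None confirmed and keep their action; objecting
    agents get a new action from `revise`. The input plan is not mutated."""
    revised = dict(plan)
    for agent, note in objections.items():
        if note is not None:
            revised[agent] = revise(agent, plan[agent], note)
    return revised


calls: List[str] = []


def revise(agent: str, old: int, note: str) -> int:
    calls.append(agent)  # record which agents triggered a revision
    return old + 1       # toy rule: grant one more unit


suggestion = {"a1": 3, "a2": 2, "a3": 4}
feedback = {"a1": None, "a2": "need at least 3 units", "a3": None}
updated = apply_external_feedback(suggestion, feedback, revise)
print(updated, calls)  # prints: {'a1': 3, 'a2': 3, 'a3': 4} ['a2']
```

Only one revision call is made for three agents. This is the mechanism by which selective feedback keeps access cost proportional to disagreements rather than to agent count.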
Optimization Features
Token Efficiency
- Selective external feedback cuts tokens
- Assessor aims to reduce repeated actor corrections
System Optimization
- Centralized assessor aggregates feedback to update suggestions
Inference Optimization
- Reduce actor LLM calls via Plan Confirmation
- Generate central suggestions to avoid redundant per-agent access
Reproducibility
Open Source Status
- unknown
Risks & Boundaries
Limitations
- Relies on a strong LLM (GPT-4) — quality and cost depend on the model used.
- Evaluation uses synthetic tasks; real-world noise and latency not measured.
- Centralized critic can be a coordination bottleneck or single point of failure.
- The paper provides no public code or dataset for direct replication.
When Not To Use
- If API cost or latency forbids frequent large-model calls.
- If decentralized/privacy constraints prevent a central critic from accessing agent actions.
- For tasks requiring hard real-time guarantees where LLM latency is unpredictable.
Failure Modes
- Persistent hallucinations that exceed the allowed iteration budget cause task failure (noted in grid task rules).
- Dialogue history hitting token limits can break planning and stop progress.
- Assessor errors can push the system to a poor equilibrium if internal feedback fails.
Core Entities
Models
- GPT-4
Metrics
- Success
- Steps
- Feedback
- Token usage
- System reward
Benchmarks
- system resource allocation
- grid transportation (easy/hard)

