LLaMAC: an actor-critic wrapper that coordinates many LLM-based agents with a TripletCritic and token‑efficient feedback

November 23, 2023 (7 min read)

Overview

Decision Snapshot: Needs Validation

The architecture is practical and tested with GPT-4 on multiple synthetic tasks; results show consistent gains but rely on a high-quality LLM and bespoke prompts.

Citations: 3

Evidence Strength: 0.60

Confidence: 0.78

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 5/5

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 60%

Production readiness: 60%

Novelty: 70%

Authors

Bin Zhang, Hangyu Mao, Jingqing Ruan, Ying Wen, Yang Li, Shao Zhang, Zhiwei Xu, Dapeng Li, Ziyue Li, Rui Zhao, Lijuan Li, Guoliang Fan

Links

Abstract / PDF

Why It Matters For Business

If you run many LLM-driven agents, LLaMAC reduces the number of LLM calls and raises task success by coordinating agents through a centralized critic plus selective feedback from actors.

Summary TLDR

This paper introduces LLaMAC, a modular actor-critic framework that coordinates many LLM-based agents via a TripletCritic (two preference critics plus an assessor) and an external feedback loop from actors. The design balances exploration and exploitation, avoids unnecessary LLM calls, and keeps token use low. Evaluated with GPT-4 on two synthetic tasks, system resource allocation (up to 50 agents) and robot grid transportation, LLaMAC achieves higher success rates, fewer steps, and fewer feedback iterations than baselines such as multi-agent debate and HMAS-2.

Problem Statement

Scaling LLM-based multi-agent systems hits three practical limits: an exponentially large joint action space, increased LLM hallucinations when coordinating many agents, and high token / API costs from frequent LLM accesses. The paper seeks a coordination architecture that is stable, interpretable, and token-efficient for large agent counts.
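To make "exponentially large joint action space" concrete: if every agent chooses independently from the same action set A, the number of joint actions is |A|^N for N agents. The snippet below is a hedged illustration, not from the paper; the agent counts match the paper's resource-allocation evaluation, but the 5 actions per agent is an assumed value.

```python
# Hypothetical illustration: each agent picks from the same small action set,
# so the joint action space grows as |A| ** N, i.e. exponentially in N.

def joint_action_space_size(actions_per_agent: int, num_agents: int) -> int:
    """Number of distinct joint actions when every agent acts independently."""
    return actions_per_agent ** num_agents


# Agent counts follow the paper's resource-allocation runs;
# 5 actions per agent is an assumption for illustration only.
for n in (3, 5, 10, 20, 50):
    print(f"{n:>2} agents -> {joint_action_space_size(5, n):.3e} joint actions")
```

Even this small per-agent action set gives roughly 8.9e34 joint actions at 50 agents, which is why enumerating or prompting over joint actions directly does not scale.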

Main Contribution

A modular LLaMAC architecture: centralized TripletCritic plus decentralized LLM actors with execution and memory modules.

The TripletCritic: two critics with different preferences (explore/exploit) plus an assessor that provides internal feedback and corrects beliefs.
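A minimal sketch of how two preference critics and an assessor could fit together. In LLaMAC these roles are LLM prompts; here they are replaced by deterministic stand-in functions over toy value and visit-count tables. `TripletCriticSketch`, `explore_critic`, `exploit_critic`, `assessor`, the table layout, and the 0.5 threshold are all hypothetical and are not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-ins: in LLaMAC the critics and the assessor are LLM prompts
# (GPT-4 in the paper). Here they are plain functions over toy value/count tables.
Suggestion = Dict[str, str]  # agent id -> suggested action


def explore_critic(state: dict) -> Suggestion:
    # Exploration preference: the least-tried action for each agent.
    return {a: min(state["actions"], key=lambda act: state["counts"].get((a, act), 0))
            for a in state["agents"]}


def exploit_critic(state: dict) -> Suggestion:
    # Exploitation preference: the highest-valued action for each agent.
    return {a: max(state["actions"], key=lambda act: state["value"].get((a, act), 0.0))
            for a in state["agents"]}


def assessor(state: dict, explore: Suggestion, exploit: Suggestion,
             threshold: float = 0.5) -> Suggestion:
    # Keep the exploit choice when its value clears an assumed threshold, else explore.
    return {a: exploit[a] if state["value"].get((a, exploit[a]), 0.0) >= threshold
            else explore[a]
            for a in state["agents"]}


@dataclass
class TripletCriticSketch:
    explore: Callable[[dict], Suggestion]
    exploit: Callable[[dict], Suggestion]
    assess: Callable[[dict, Suggestion, Suggestion], Suggestion]

    def suggest(self, state: dict) -> Suggestion:
        # Both preference critics propose; the assessor reconciles them (internal feedback).
        return self.assess(state, self.explore(state), self.exploit(state))


example_state = {
    "agents": ["r1", "r2"],
    "actions": ["left", "right"],
    "value": {("r1", "left"): 0.9, ("r2", "left"): 0.1, ("r2", "right"): 0.2},
    "counts": {("r1", "left"): 3, ("r2", "left"): 2, ("r2", "right"): 4},
}
critic = TripletCriticSketch(explore_critic, exploit_critic, assessor)
print(critic.suggest(example_state))  # {'r1': 'left', 'r2': 'left'}
```

Here r1 keeps its high-value action, while r2's best-known action looks weak, so the assessor falls back to the exploration suggestion. Swapping the stand-ins for LLM calls would keep the same three-role structure.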

Key Findings

LLaMAC was tested on multi-agent resource allocation with up to 50 agents and maintains stable learning.

Numbers: evaluations with 3, 5, 10, 20, and 50 agents

Practical Use: You can apply the design to problems with tens of agents instead of only small groups; expect more stable policy search under scale.

Evidence Ref: Fig. 4, Sect. 4.1

In the grid-transport benchmarks, LLaMAC achieved 90–100% success on the 4x8 grids, including the hard setting where HMAS-2 failed completely.

Numbers: 4x8 grid hard: LLaMAC 90% vs HMAS-2 0%; 4x8 easy: LLaMAC 100% vs HMAS-2 60%

Practical Use: Use LLaMAC when tasks need long-horizon coordination; on the larger, multi-step grid settings it raised task completion rates by 40 to 90 percentage points over HMAS-2.

Evidence Ref: Table 2

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Success | 100% (2x2 easy) | HMAS-2 100% | equal (0 pp) | Grid Transportation 2x2 easy | Both reach 100% success on 2x2 easy | Table 2
Success | 100% (4x8 easy) | HMAS-2 60% | +40 pp | Grid Transportation 4x8 easy | LLaMAC 100% vs HMAS-2 60% | Table 2
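The Delta column reports absolute percentage-point differences, not relative change (100% vs 60% is +40 pp, whereas the relative gain would be about 67%). A small sketch that recomputes the deltas from the success rates listed in the table and in Key Findings; `delta_pp` is a hypothetical helper name.

```python
# Success rates (%) copied from the table and Key Findings above (Table 2 of the paper).
results = {
    "2x2 easy": (100, 100),  # (LLaMAC, HMAS-2)
    "4x8 easy": (100, 60),
    "4x8 hard": (90, 0),
}


def delta_pp(llamac: float, baseline: float) -> float:
    """Absolute difference in percentage points (not a relative % change)."""
    return llamac - baseline


for split, (ours, base) in results.items():
    print(f"{split}: {delta_pp(ours, base):+.0f} pp")
```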

What To Try In 7 Days

Implement a minimal TripletCritic prototype (two preference critics plus an assessor) that returns suggestions to a single group of 5–10 agents.

Compare token/API usage and task success between centralized critic routing and naive per-agent LLM calls on a small resource-allocation toy.

Log actor feedback counts and measure how many suggestions are accepted without extra LLM calls.
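The comparison and logging steps above can be prototyped with simple call counters before wiring in a real model. This is a hedged sketch under stated assumptions: LLM calls stand in for token/API usage, the naive baseline makes one call per agent per step, the centralized variant makes one critic call per step, and each actor that rejects a suggestion is assumed to spend one extra LLM call on feedback. `CallLog`, `naive_per_agent`, `centralized_critic`, and `reject_prob` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class CallLog:
    llm_calls: int = 0
    suggestions: int = 0
    accepted_without_call: int = 0


def naive_per_agent(num_agents: int, steps: int) -> CallLog:
    # Baseline: every agent queries the LLM on every step.
    return CallLog(llm_calls=num_agents * steps)


def centralized_critic(num_agents: int, steps: int, reject_prob: float,
                       seed: int = 0) -> CallLog:
    rng = random.Random(seed)  # seeded so runs are comparable
    log = CallLog()
    for _ in range(steps):
        log.llm_calls += 1  # one critic call yields suggestions for all agents
        for _ in range(num_agents):
            log.suggestions += 1
            if rng.random() < reject_prob:
                log.llm_calls += 1  # assumed: a rejecting actor spends one LLM call on feedback
            else:
                log.accepted_without_call += 1
    return log


print(naive_per_agent(10, 20))
print(centralized_critic(10, 20, reject_prob=0.2))
```

`accepted_without_call` is the quantity the third step asks you to log. Replacing the random rejection with real actor checks, and call counts with measured token usage, turns this into the actual experiment.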

Agent Features

Memory
Short-term memory (recent state); long-term trajectory memory (last L steps) with filtering
Planning
Iterative planning via internal/external feedback; chain-of-thought style textual planning
Tool Use
LLM as planner/evaluator (GPT-4)
Frameworks
LLaMAC
Is Agentic

Yes

Architectures
Centralized Critic with Decentralized Actors (CCDA); TripletCritic (dual preference critics + assessor)
Collaboration
Internal feedback among critics; external feedback from actors to critic
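The memory entry in Agent Features (recent state plus a filtered trajectory window of the last L steps) maps naturally onto a bounded buffer. A minimal sketch, assuming the filter is applied when building the actor's context rather than when storing; `ActorMemorySketch`, `keep`, and the transition fields are hypothetical, and the paper's actual filtering criterion is not specified here.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Optional

Transition = Dict[str, Any]


class ActorMemorySketch:
    def __init__(self, horizon: int,
                 keep: Callable[[Transition], bool] = lambda t: True) -> None:
        self.short_term: Optional[Any] = None  # most recent state only
        self.long_term: Deque[Transition] = deque(maxlen=horizon)  # last L steps
        self.keep = keep  # hypothetical relevance filter

    def record(self, transition: Transition) -> None:
        self.short_term = transition["state"]
        self.long_term.append(transition)  # deque drops the oldest step beyond L

    def context(self) -> List[Transition]:
        # Only filtered steps from the window would go into the actor's prompt.
        return [t for t in self.long_term if self.keep(t)]


mem = ActorMemorySketch(horizon=3, keep=lambda t: t["reward"] != 0)
for i, r in enumerate([1, 0, 2, 0, 3]):
    mem.record({"step": i, "state": f"s{i}", "reward": r})
print(mem.short_term, [t["step"] for t in mem.context()])  # s4 [2, 4]
```

Bounding the window with `maxlen` also keeps the prompt size predictable, which matters for the token-limit failure mode listed under Risks & Boundaries.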

Optimization Features

Token Efficiency
Selective external feedback cuts tokens; the assessor aims to reduce repeated actor corrections
System Optimization
Centralized assessor aggregates feedback to update suggestions
Inference Optimization
Reduce actor LLM calls via Plan Confirmation; generate central suggestions to avoid redundant per-agent access
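A hedged sketch of the external feedback loop with an iteration budget: the critic proposes a joint plan, each actor confirms its own part, and only objections are routed back for revision. Exhausting the budget counts as failure, mirroring the failure mode noted in the grid task rules. The assumption that plan confirmation is a cheap local check with no LLM call is ours; `coordinate`, `toy_critic`, `toy_confirm`, and `max_iters` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

Plan = Dict[str, str]  # agent id -> assigned action


def coordinate(propose: Callable[[List[str]], Plan],
               confirm: Callable[[str, str], Optional[str]],
               agents: List[str],
               max_iters: int = 3) -> Tuple[Optional[Plan], int]:
    """Return (confirmed plan or None, number of critic calls used)."""
    feedback: List[str] = []
    for attempt in range(1, max_iters + 1):
        plan = propose(feedback)  # one centralized critic call per attempt
        feedback = []
        for agent in agents:
            msg = confirm(agent, plan[agent])  # assumed local, LLM-free plan confirmation
            if msg is not None:
                feedback.append(msg)  # external feedback routed back to the critic
        if not feedback:
            return plan, attempt  # every actor confirmed; no extra calls needed
    return None, max_iters  # iteration budget exhausted: treated as task failure


def toy_critic(feedback: List[str]) -> Plan:
    # First attempt sends r2 to box1; once feedback arrives it reassigns r2.
    return {"r1": "box1", "r2": "box2"} if feedback else {"r1": "box1", "r2": "box1"}


def toy_confirm(agent: str, action: str) -> Optional[str]:
    # Hypothetical rule: r2 cannot reach box1.
    return f"{agent} cannot reach {action}" if (agent, action) == ("r2", "box1") else None


print(coordinate(toy_critic, toy_confirm, ["r1", "r2"]))
# ({'r1': 'box1', 'r2': 'box2'}, 2)
```

Only the actor with an objection produces feedback; the others accept silently, which is where the call and token savings come from in this sketch.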

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Unknown
License: Unknown

Risks & Boundaries

Limitations

Relies on a strong LLM (GPT-4); quality and cost depend on the model used.

Evaluation uses synthetic tasks; real-world noise and latency not measured.

When Not To Use

If API cost or latency forbids frequent large-model calls.

If decentralized/privacy constraints prevent a central critic from accessing agent actions.

Failure Modes

Persistent hallucinations that exceed the allowed iteration budget cause task failure (noted in grid task rules).

Dialogue history that exceeds the model's token limit can break planning and halt progress.

Core Entities

Models

GPT-4

Metrics

Success, Steps, Feedback, Token usage, System reward

Benchmarks

System resource allocation, Grid transportation (easy/hard)