LLaMAC: an actor-critic wrapper that coordinates many LLM-based agents with a TripletCritic and token‑efficient feedback

November 23, 2023 · 7 min

Overview

Production Readiness

0.6

Novelty Score

0.7

Cost Impact Score

0.6

Citation Count

3

Authors

Bin Zhang, Hangyu Mao, Jingqing Ruan, Ying Wen, Yang Li, Shao Zhang, Zhiwei Xu, Dapeng Li, Ziyue Li, Rui Zhao, Lijuan Li, Guoliang Fan

Links

Abstract / PDF

Why It Matters For Business

If you run many LLM-driven agents, LLaMAC lowers LLM calls and increases task success by coordinating agents through a centralized critic plus selective actor feedback.

Summary TLDR

This paper introduces LLaMAC, a modular actor-critic framework that coordinates many LLM-based agents via a TripletCritic (two preference critics plus an assessor) and an external feedback loop from actors. The design balances exploration and exploitation, reduces unnecessary LLM calls, and keeps token use low. Evaluated with GPT-4 on synthetic system resource allocation and robot grid-transport tasks (up to 50 agents), LLaMAC achieves higher success rates, fewer steps, and fewer feedback iterations than baselines such as multi-agent debate and HMAS-2.

Problem Statement

Scaling LLM-based multi-agent systems hits three practical limits: an exponentially large joint action space, increased LLM hallucinations when coordinating many agents, and high token / API costs from frequent LLM accesses. The paper seeks a coordination architecture that is stable, interpretable, and token-efficient for large agent counts.

Main Contribution

A modular LLaMAC architecture: centralized TripletCritic plus decentralized LLM actors with execution and memory modules.

The TripletCritic: two critics with different preferences (explore/exploit) plus an assessor that provides internal feedback and corrects beliefs.

An external feedback loop: actors confirm plans and only call the LLM when needed to cut access cost and tokens.

Empirical evaluation with GPT-4 on system resource allocation and grid-transport tasks (tested with up to 50 agents) showing higher success rates, fewer steps, and lower feedback/token usage than baselines.
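The TripletCritic described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `LLM` callable type, the class layout, the `suggest` method, and the prompt wording are all assumptions made for the example. Any function that maps a prompt string to a reply string can stand in for a model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: a prompt goes in, the model's text comes out.
LLM = Callable[[str], str]


@dataclass
class TripletCritic:
    """Two critics with different preferences plus an assessor (illustrative only)."""

    explorer: LLM   # critic biased toward exploration
    exploiter: LLM  # critic biased toward exploitation
    assessor: LLM   # reconciles the two proposals (internal feedback)

    def suggest(self, state: str) -> str:
        # Each critic proposes a plan from the same state description.
        explore_plan = self.explorer(f"State: {state}\nPropose an exploratory plan.")
        exploit_plan = self.exploiter(f"State: {state}\nPropose a plan that reuses known good moves.")
        # The assessor sees both proposals and returns one suggestion for the actors.
        return self.assessor(
            f"State: {state}\n"
            f"Plan A (explore): {explore_plan}\n"
            f"Plan B (exploit): {exploit_plan}\n"
            "Return one reconciled plan."
        )


# Stub models so the sketch runs without any API access.
critic = TripletCritic(
    explorer=lambda prompt: "send robot 1 through the unexplored corridor",
    exploiter=lambda prompt: "send robot 1 along the route that worked last step",
    assessor=lambda prompt: "send robot 1 along the route that worked last step",
)
print(critic.suggest("robot 1 holds a box; target is two cells east"))
```

Keeping the three roles as separate callables makes it easy to swap in different prompts or models per role when prototyping.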

Key Findings

LLaMAC was tested on multi-agent resource allocation with up to 50 agents and maintained stable learning.

Numbers: evaluations with 3, 5, 10, 20, and 50 agents

In grid-transport benchmarks LLaMAC achieved near-perfect success where a prior method failed.

Numbers: 4x8 grid hard: LLaMAC 90% vs HMAS-2 0%; 4x8 grid easy: LLaMAC 100% vs HMAS-2 60%

LLaMAC reduces actor feedback and token use compared to a prior multi-agent method.

Numbers: 2x4 easy feedback: LLaMAC 4.3 vs HMAS-2 12.3; 4x8 easy feedback: LLaMAC 10.7 vs HMAS-2 26.1

Results

Success (2x2 easy): LLaMAC 100% vs HMAS-2 100%

Success (4x8 easy): LLaMAC 100% vs HMAS-2 60%

Success (4x8 hard): LLaMAC 90% vs HMAS-2 0%

Steps, mean (4x8 easy): LLaMAC 12.9 (±2.70) vs HMAS-2 30.6 (±9.70)

Feedback, actor-level calls (4x8 easy): LLaMAC 10.7 (±3.35) vs HMAS-2 26.1 (±13.59)
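To put the 4x8 easy figures on a common scale, the short calculation below converts the reported means into relative reductions against HMAS-2. It uses only the means listed in this section and ignores the reported spread, so treat the percentages as rough indicators. The helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(baseline: float, value: float) -> float:
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


# (HMAS-2 mean, LLaMAC mean) taken from the 4x8 easy results in this section.
results = {
    "steps": (30.6, 12.9),
    "feedback": (26.1, 10.7),
}

for metric, (hmas2, llamac) in results.items():
    # steps: about 57.8% fewer; feedback: about 59.0% fewer
    print(f"{metric}: {relative_reduction(hmas2, llamac):.1%} lower than HMAS-2")
```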

Who Should Care

What To Try In 7 Days

Implement a minimal TripletCritic prototype (two critics plus an assessor) that returns suggestions to one group of 5–10 agents.

Compare token/API usage and task success between centralized critic routing and naive per-agent LLM calls on a small resource-allocation toy.

Log actor feedback counts and measure how many suggestions are accepted without extra LLM calls.
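A small counter is enough to start the logging step above. The sketch below is one possible shape, assuming each suggestion is either accepted by the actor without an LLM call or triggers exactly one actor LLM call. `FeedbackLog` and `naive_call_count` are hypothetical names, not part of LLaMAC.

```python
from dataclasses import dataclass


@dataclass
class FeedbackLog:
    """Counts how often critic suggestions are accepted without an actor LLM call."""

    suggestions: int = 0
    accepted_without_call: int = 0
    actor_llm_calls: int = 0

    def record(self, accepted: bool) -> None:
        self.suggestions += 1
        if accepted:
            self.accepted_without_call += 1
        else:
            # Assumption: a rejected suggestion costs one actor LLM call.
            self.actor_llm_calls += 1

    @property
    def acceptance_rate(self) -> float:
        if self.suggestions == 0:
            return 0.0
        return self.accepted_without_call / self.suggestions


def naive_call_count(n_agents: int, n_steps: int) -> int:
    """LLM calls if every agent queries the model at every step (naive baseline)."""
    return n_agents * n_steps


log = FeedbackLog()
for accepted in [True, True, False, True]:
    log.record(accepted)
print(log.acceptance_rate, log.actor_llm_calls, naive_call_count(n_agents=5, n_steps=10))
```

Comparing `actor_llm_calls` (plus the critic's own calls) against `naive_call_count` gives a first estimate of the savings on the toy task.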

Agent Features

Memory

  • Short-term memory (recent state)
  • Long-term trajectory memory (last L steps) with filtering
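The two memory types listed above could be prototyped as below. This is a sketch under assumptions: records are plain strings, the long-term store is a fixed-length window of the last L kept records, and the filter is a user-supplied predicate. The source does not specify how filtering works, so `AgentMemory`, `horizon`, and `keep` are illustrative names only.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class AgentMemory:
    """Short-term state plus a filtered window of the last `horizon` records."""

    def __init__(self, horizon: int, keep: Callable[[str], bool] = lambda record: True):
        self.short_term: Optional[str] = None                # most recent record
        self.long_term: Deque[str] = deque(maxlen=horizon)   # oldest entries drop off
        self.keep = keep                                     # hypothetical filter rule

    def observe(self, record: str) -> None:
        self.short_term = record
        if self.keep(record):
            self.long_term.append(record)

    def context(self) -> List[str]:
        """Records to include in the next prompt, oldest first."""
        return list(self.long_term)


memory = AgentMemory(horizon=2, keep=lambda record: "noop" not in record)
for step in ["move east", "noop", "pick up box", "move west"]:
    memory.observe(step)
print(memory.short_term, memory.context())
```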

Planning

  • Iterative planning via internal/external feedback
  • Chain-of-thought style textual planning

Tool Use

  • LLM as planner/evaluator (GPT-4)

Frameworks

  • LLaMAC

Is Agentic

true

Architectures

  • Centralized Critic with Decentralized Actors (CCDA)
  • TripletCritic (dual preference critics + assessor)

Collaboration

  • Internal feedback among critics
  • External feedback from actors to critic

Optimization Features

Token Efficiency

  • Selective external feedback cuts tokens
  • Assessor aims to reduce repeated actor corrections

System Optimization

  • Centralized assessor aggregates feedback to update suggestions

Inference Optimization

  • Reduce actor LLM calls via Plan Confirmation
  • Generate central suggestions to avoid redundant per-agent access
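One way to read the Plan Confirmation idea is sketched below: each actor first runs a cheap local check on the critic's suggestion and only calls its LLM to produce feedback when the check fails. The function name, the callback signatures, and the `max_rounds` budget are assumptions for this example, not the paper's API.

```python
from typing import Callable, Dict, Tuple

Plan = Dict[str, str]  # agent id -> suggested action


def confirm_plan(
    plan: Plan,
    is_acceptable: Callable[[str, str], bool],     # cheap local check, no LLM call
    actor_feedback: Callable[[str, str], str],     # actor LLM call, only on rejection
    revise: Callable[[Plan, Dict[str, str]], Plan],  # critic updates the plan
    max_rounds: int = 3,
) -> Tuple[Plan, int, bool]:
    """Return (final plan, number of actor LLM calls, whether every actor accepted)."""
    llm_calls = 0
    for _ in range(max_rounds):
        objections: Dict[str, str] = {}
        for agent, action in plan.items():
            if not is_acceptable(agent, action):
                objections[agent] = actor_feedback(agent, action)
                llm_calls += 1
        if not objections:
            return plan, llm_calls, True
        plan = revise(plan, objections)
    # Budget exhausted: report whether the last revision happens to be acceptable.
    confirmed = all(is_acceptable(agent, action) for agent, action in plan.items())
    return plan, llm_calls, confirmed


final_plan, calls, ok = confirm_plan(
    plan={"a1": "move_left", "a2": "blocked_move"},
    is_acceptable=lambda agent, action: action != "blocked_move",
    actor_feedback=lambda agent, action: f"{agent} cannot perform {action}",
    revise=lambda plan, objections: {a: ("wait" if a in objections else act) for a, act in plan.items()},
)
print(final_plan, calls, ok)
```

In this toy run only the one actor with an infeasible suggestion spends an LLM call, which is the saving the bullets above describe.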

Reproducibility

Open Source Status

  • unknown

Risks & Boundaries

Limitations

  • Relies on a strong LLM (GPT-4) — quality and cost depend on the model used.
  • Evaluation uses synthetic tasks; real-world noise and latency not measured.
  • Centralized critic can be a coordination bottleneck or single point of failure.
  • No public code or dataset provided in the paper for direct replication.

When Not To Use

  • If API cost or latency forbids frequent large-model calls.
  • If decentralized/privacy constraints prevent a central critic from accessing agent actions.
  • For tasks requiring hard real-time guarantees where LLM latency is unpredictable.

Failure Modes

  • Persistent hallucinations that exceed the allowed iteration budget cause task failure (noted in grid task rules).
  • Dialogue history hitting token limits can break planning and stop progress.
  • Assessor errors can push the system to a poor equilibrium if internal feedback fails.
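For the token-limit failure mode, a common generic mitigation is to trim the dialogue history to a budget before each call. The sketch below is not taken from the paper. It keeps the most recent messages that fit, and it approximates token counts by whitespace-separated words, which is an assumption; a real tokenizer can be substituted through the `count` parameter.

```python
from typing import Callable, List


def trim_history(
    messages: List[str],
    max_tokens: int,
    count: Callable[[str], int] = lambda text: len(text.split()),  # rough token proxy
) -> List[str]:
    """Keep the newest messages whose combined size fits within `max_tokens`."""
    kept: List[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["critic: move box to cell 3", "actor: cell 3 is occupied", "critic: use cell 4"]
print(trim_history(history, max_tokens=9))
```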

Core Entities

Models

  • GPT-4

Metrics

  • Success
  • Steps
  • Feedback
  • Token usage
  • System reward

Benchmarks

  • system resource allocation
  • grid transportation (easy/hard)