MACNET: directed acyclic graphs scale LLM agents and reveal a logistic ‘collaborative scaling law’

June 11, 2024 · 8 min

Overview

Decision Snapshot: Needs Validation

MACNET presents a clear, reproducible system-level design and empirical gains, but results depend on base LLM quality and topology choices; expect engineering work to tune scale and wiring for your tasks.

Citations: 6

Evidence Strength: 0.75

Confidence: 0.82

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 5/5

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 60%

Novelty: 80%

Authors

Chen Qian, Zihao Xie, YiFei Wang, Wei Liu, Kunlun Zhu, Hanchen Xia, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun

Links

Abstract / PDF / Code / Data

Why It Matters For Business

You can improve quality on mixed tasks by running many cooperating LLM agents in a DAG, without expensive retraining; randomized wiring often gives a good speed-quality trade-off.

Summary TLDR

This paper introduces MACNET, a system that arranges LLM-driven agents into directed acyclic graphs (DAGs). Nodes run 'actors' that produce artifacts and edges run 'critics' that give refinement instructions. By propagating only refined artifacts (not full dialogues) and traversing in topological order, MACNET reduces context growth, supports collaboration at scale, and yields a logistic performance-vs.-size curve: improvements accelerate then saturate. Evaluations on MMLU, HumanEval, SRDD and CommonGen-Hard show MACNET variants beat several baselines; irregular (random) topologies balance quality and time best. Code: github.com/OpenBMB/ChatDev/tree/macnet.
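To make the "accelerate then saturate" shape concrete, here is a minimal sketch of a logistic curve. The function name and every parameter value (`ceiling`, `rate`, `midpoint`) are hypothetical and chosen only to show the shape; they are not fitted values from the paper, and the paper may plot network size on a different scale.

```python
import math


def logistic_quality(n_agents: float, ceiling: float, rate: float, midpoint: float) -> float:
    """Logistic curve: slow start, rapid gains near the midpoint, then a plateau at `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (n_agents - midpoint)))


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only (not taken from the paper).
    for n in (1, 4, 8, 16, 32, 64):
        print(n, round(logistic_quality(n, ceiling=0.7, rate=0.3, midpoint=16), 3))
```

Under this shape, adding agents well past the midpoint buys very little extra quality, which is why the scale of the network is worth tuning rather than maximizing.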

Problem Statement

Existing multi-agent LLM systems rarely test large agent counts and often rely on simple voting or chain structures. The paper asks: how does continually adding collaborating agents affect performance, and can a scalable network design avoid context explosion while harnessing many agents?

Main Contribution

MACNET: a practical framework that maps agents to a DAG with actors on nodes and critics on edges to orchestrate iterative refinement.

A memory-control rule that propagates only final artifacts (not full dialogue), cutting worst-case token growth from quadratic to linear.
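The two contributions above can be sketched together: visit nodes in topological order, let a critic on each incoming edge turn the predecessor's artifact into an instruction, and keep only each node's final artifact. This is a minimal sketch, not the MACNET code. `critic`, `actor`, and `run_dag` are hypothetical names, and the string-building bodies stand in for LLM calls.

```python
from graphlib import TopologicalSorter


def critic(edge: tuple[str, str], artifact: str) -> str:
    """Stand-in for an LLM critic on an edge: returns a refinement instruction."""
    return f"refine({artifact})"


def actor(node: str, task: str, instructions: list[str]) -> str:
    """Stand-in for an LLM actor on a node: returns a new artifact."""
    if not instructions:
        return f"{node}:draft[{task}]"
    return f"{node}:" + "+".join(instructions)


def run_dag(predecessors: dict[str, set[str]], task: str) -> dict[str, str]:
    """Traverse the DAG in topological order, storing one artifact per node."""
    artifacts: dict[str, str] = {}  # only final artifacts are kept, never dialogues
    for node in TopologicalSorter(predecessors).static_order():
        incoming = sorted(predecessors.get(node, ()))
        instructions = [critic((p, node), artifacts[p]) for p in incoming]
        artifacts[node] = actor(node, task, instructions)
    return artifacts


if __name__ == "__main__":
    # Diamond-shaped DAG: a -> b, a -> c, then b and c converge on d.
    graph = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    for name, artifact in run_dag(graph, "t").items():
        print(name, "=>", artifact)
```

Because each node reads only its predecessors' artifacts, the context an agent sees depends on its in-degree, not on how many agents ran before it.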

Key Findings

MACNET variants outperform multi-agent and single-agent baselines on average across diverse tasks.

Numbers: Quality, MACNET-RANDOM 0.6522 vs AGENTVERSE 0.5805 (Table 1).

Practical Use: For mixed tasks (knowledge, coding, software dev, generation), try MACNET-style DAG networks instead of single-agent prompts or simple majority voting to improve end-to-end quality.

Evidence Ref: Table 1

Irregular/random topologies can beat regular dense designs while running faster.

Numbers: Random topologies took ~51.92% less time than mesh while matching or exceeding quality (text & Fig. 5).

Practical Use: Use randomized or small-world edge wiring to balance performance and latency rather than defaulting to fully connected meshes.

Evidence Ref: Figure 5 and Section 3.2

Results

| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
| --- | --- | --- | --- | --- | --- | --- |
| Quality (average across tasks) | MACNET-RANDOM 0.6522 | AGENTVERSE 0.5805 | +0.0717 | | Table 1 reports MACNET-RANDOM Quality 0.6522 versus baseline AGENTVERSE 0.5805. | Table 1 |
| Accuracy | MACNET-CHAIN 0.6632 | AGENTVERSE 0.2977 | +0.3655 | MMLU | Table 1 shows MACNET-CHAIN MMLU 0.6632 vs AGENTVERSE 0.2977. | Table 1 |

What To Try In 7 Days

Prototype a small MACNET: assign actor roles at nodes and critic roles on edges using GPT-3.5 or your model.

Enable artifact-only propagation (store only final artifacts), then measure tokens and latency versus full-dialogue passing.

Compare chain, star, and a randomized graph with 10–50 agents to find the best trade-off for your task.
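As a starting point for the topology comparison, the sketch below builds chain, star, mesh, and random DAGs as edge lists and reports two structural numbers for each. All function names are hypothetical. Using edge count as a rough proxy for the number of critic interactions and longest path as a rough proxy for sequential latency is an assumption of this sketch, not a measurement from the paper.

```python
import random

Edge = tuple[int, int]  # (source, target), always with source < target so the graph is acyclic


def chain(n: int) -> list[Edge]:
    return [(i, i + 1) for i in range(n - 1)]


def star(n: int) -> list[Edge]:
    return [(0, i) for i in range(1, n)]


def mesh(n: int) -> list[Edge]:
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def random_dag(n: int, p: float, seed: int = 0) -> list[Edge]:
    """Keep each mesh edge with probability p. Some nodes may end up isolated."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def depth(n: int, edges: list[Edge]) -> int:
    """Longest path, counted in edges. Node indices already form a topological order."""
    longest = [0] * n
    for i, j in sorted(edges):  # all edges into i sort before any edge out of i
        longest[j] = max(longest[j], longest[i] + 1)
    return max(longest, default=0)


if __name__ == "__main__":
    n = 10
    topologies = {"chain": chain(n), "star": star(n), "mesh": mesh(n), "random": random_dag(n, 0.3)}
    for name, edges in topologies.items():
        print(f"{name:>6}: edges={len(edges):3d} depth={depth(n, edges)}")
```

Swap the structural proxies for real token and wall-clock measurements once actors and critics are wired to your model.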

Agent Features

Memory: Short-term memory for interaction context; long-term memory stores only final artifacts (artifact-only propagation)
Planning: Topological ordering traversal; iterative local refinement between critic and actor
Tool Use: Uses LLMs for reasoning (GPT-3.5 in experiments); supports agent profiles and external tools (profiles referenced)
Frameworks: MACNET (this paper); ChatDev/macnet (code)
Is Agentic: Yes

Architectures: Directed Acyclic Graph (DAG); functional bipartition: actors (nodes) and critics (edges)
Collaboration: Dual-agent iterative refinement per edge (critic→actor→refine); aggregation at convergent nodes (hierarchical aggregation)
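The collaboration pattern can be illustrated with a toy refinement loop and a toy aggregation step. This is a sketch under stated assumptions: `toy_critic`, `toy_actor`, `refine`, `aggregate`, and the `REQUIRED` checklist are all hypothetical, the early stop when the critic has nothing to add is an assumption, and the merge rule (ordered union of parts) is a placeholder for whatever aggregation an LLM actor would perform.

```python
from typing import Callable, Optional

REQUIRED = ("validation", "tests", "docs")  # hypothetical checklist the toy critic enforces


def toy_critic(artifact: str) -> Optional[str]:
    """Return one refinement instruction, or None when nothing is missing."""
    for item in REQUIRED:
        if item not in artifact:
            return f"add {item}"
    return None


def toy_actor(artifact: str, instruction: str) -> str:
    """Apply the instruction by appending the requested part to the artifact."""
    return f"{artifact} + {instruction.removeprefix('add ')}"


def refine(
    artifact: str,
    critic: Callable[[str], Optional[str]],
    actor: Callable[[str, str], str],
    max_rounds: int = 3,
) -> str:
    """Critic instructs, actor revises, repeated for a bounded number of rounds."""
    for _ in range(max_rounds):
        instruction = critic(artifact)
        if instruction is None:  # assumed stopping rule: critic is satisfied
            break
        artifact = actor(artifact, instruction)
    return artifact


def aggregate(artifacts: list[str]) -> str:
    """Toy merge at a convergent node: ordered union of the parts of each artifact."""
    parts: list[str] = []
    for artifact in artifacts:
        for part in artifact.split(" + "):
            if part not in parts:
                parts.append(part)
    return " + ".join(parts)


if __name__ == "__main__":
    print(refine("draft", toy_critic, toy_actor, max_rounds=2))
    print(aggregate(["draft + validation", "draft + tests"]))
```

Bounding the rounds per edge keeps the cost of each critic and actor pair predictable, whatever stopping rule is used.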

Optimization Features

Token Efficiency: Memory control changes worst-case token growth from O(n^2) to O(n)
Infra Optimization: Design supports scaling to hundreds/thousands of agent instances by limiting context per agent
System Optimization: Assign critics to edges and actors to nodes to split duties and reduce backflow; randomized wiring to reduce average path length and time
Inference Optimization: Artifact-only propagation reduces tokens sent between agents; topological traversal avoids global broadcasting
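A back-of-the-envelope model shows where the O(n^2) versus O(n) difference comes from. The sketch assumes a simple chain in which, under full-dialogue passing, each agent reads every earlier turn, while under artifact-only propagation each agent reads a single predecessor artifact. The function names and the 500-token figure are hypothetical, and the paper's own worst-case analysis may be set up differently.

```python
def total_tokens_full_dialogue(n_agents: int, tokens_per_turn: int) -> int:
    """Agent k (0-indexed) reads all k earlier turns: total grows as n(n-1)/2."""
    return sum(k * tokens_per_turn for k in range(n_agents))


def total_tokens_artifact_only(n_agents: int, tokens_per_artifact: int) -> int:
    """Every agent after the first reads exactly one artifact: total grows as n-1."""
    return max(n_agents - 1, 0) * tokens_per_artifact


if __name__ == "__main__":
    # 500 tokens per turn or artifact is an illustrative figure, not a measured one.
    for n in (10, 100, 1000):
        full = total_tokens_full_dialogue(n, 500)
        lean = total_tokens_artifact_only(n, 500)
        print(f"n={n:4d} full-dialogue={full:>11,d} artifact-only={lean:>9,d}")
```

In this model a tenfold increase in agents multiplies full-dialogue input tokens roughly a hundredfold but artifact-only input tokens only about tenfold.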

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Partial
License: Unknown

Data URLs

MMLU (public), HumanEval (public), SRDD (Qian et al.), CommonGen-Hard (public)

Risks & Boundaries

Limitations

Relies on the underlying LLM quality (experiments use GPT-3.5); gains may shrink with weaker models.

Dense meshes improve quality but dramatically increase token and time costs.

When Not To Use

When you have a single simple closed-domain task easily solved by a tuned single-model pipeline.

If API cost or latency is extremely tight and you cannot afford dozens of LLM calls.

Failure Modes

Context explosion if artifact-only propagation is not enforced.

Aggregation errors at convergent nodes leading to degraded artifacts.

Core Entities

Models

GPT-3.5

Metrics

Accuracy, pass@k, comprehensive SRDD metric, composite CommonGen metric, Quality (average across tasks)

Datasets

MMLU, HumanEval, SRDD, CommonGen-Hard

Benchmarks

MMLU, HumanEval, SRDD, CommonGen-Hard