Overview
MACNET presents a clear, reproducible system-level design and reports empirical gains, but results depend on the quality of the base LLM and on topology choices; expect engineering work to tune agent count and wiring for your tasks.
Citations: 6
Evidence Strength: 0.75
Confidence: 0.82
Risk Signals: 11
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 5/5
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 60%
Novelty: 80%
Why It Matters For Business
You can improve quality on mixed tasks without expensive retraining by running many cooperating LLM agents in a DAG; randomized wiring often gives a good speed-quality trade-off.
Who Should Care
Summary TLDR
This paper introduces MACNET, a system that arranges LLM-driven agents into directed acyclic graphs (DAGs). Nodes run 'actors' that produce artifacts, and edges run 'critics' that give refinement instructions. By propagating only refined artifacts (not full dialogues) and traversing the graph in topological order, MACNET limits context growth and supports collaboration at scale. As agents are added, performance follows a logistic curve: improvements first accelerate, then saturate. Evaluations on MMLU, HumanEval, SRDD and CommonGen-Hard show that MACNET variants beat several baselines, and irregular (random) topologies give the best balance of quality and time. Code: github.com/OpenBMB/ChatDev/tree/macnet.
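The DAG execution described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MACNET implementation from the linked repository: `run_dag`, `toy_actor`, and `toy_critic` are hypothetical names, the actor and critic are stubs standing in for LLM calls, and the graph is given as a mapping from each node to its predecessors. It uses Python's standard `graphlib.TopologicalSorter` to visit nodes in topological order and forwards only each predecessor's artifact plus the critic's note, never the dialogue history.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical signatures; a real system would wrap LLM calls here.
Feedback = list[tuple[str, str]]          # (predecessor artifact, critic note)
Actor = Callable[[str, Feedback], str]    # (node, feedback) -> new artifact
Critic = Callable[[str, str, str], str]   # (src, dst, artifact) -> refinement note


def run_dag(preds: dict[str, set[str]], actor: Actor, critic: Critic) -> dict[str, str]:
    """Run actors in topological order; preds maps each node to its predecessors."""
    artifacts: dict[str, str] = {}
    for node in TopologicalSorter(preds).static_order():
        # Only predecessor artifacts plus edge-level critiques are forwarded,
        # so each actor's input scales with its in-degree, not with history.
        feedback = [(artifacts[p], critic(p, node, artifacts[p]))
                    for p in sorted(preds.get(node, set()))]
        artifacts[node] = actor(node, feedback)
    return artifacts


def toy_critic(src: str, dst: str, artifact: str) -> str:
    return f"review {src}->{dst}"


def toy_actor(node: str, feedback: Feedback) -> str:
    # A real actor would revise its artifact following each note; this toy
    # only records which predecessor artifacts reached it.
    return node + "".join(f"[{artifact}]" for artifact, _note in feedback)


# Diamond DAG: a -> b, a -> c, b -> d, c -> d (d is a convergent node).
result = run_dag({"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}, toy_actor, toy_critic)
print(result["d"])  # d[b[a]][c[a]]
```

The convergent node `d` is where a real system must merge several artifacts into one; the toy simply concatenates them, which is exactly the step where aggregation errors can creep in.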
Problem Statement
Existing multi-agent LLM systems rarely test large agent counts and often rely on simple voting or chain structures. The paper asks: how does continually adding collaborating agents affect performance, and can a scalable network design avoid context explosion while harnessing many agents?
Main Contribution
MACNET: a practical framework that maps agents to a DAG with actors on nodes and critics on edges to orchestrate iterative refinement.
A memory-control rule that propagates only final artifacts (not full dialogue), cutting worst-case token growth from quadratic to linear.
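To see why artifact-only propagation changes the growth rate, here is back-of-envelope arithmetic under simplifying assumptions that are not taken from the paper: a chain of n agents, every dialogue turn or artifact costing a fixed t tokens, and only input-context tokens counted. The function names are hypothetical.

```python
def full_dialogue_tokens(n_agents: int, t: int) -> int:
    # Assumption: agent i (0-indexed) rereads all i earlier turns of t tokens.
    return sum(i * t for i in range(n_agents))  # = t * n(n-1)/2, quadratic in n


def artifact_only_tokens(n_agents: int, t: int) -> int:
    # Assumption: every agent after the first reads only its predecessor's
    # t-token artifact.
    return t * max(n_agents - 1, 0)  # linear in n


for n in (10, 50):
    print(n, full_dialogue_tokens(n, 500), artifact_only_tokens(n, 500))
# 10 22500 4500
# 50 612500 24500
```

Going from 10 to 50 agents multiplies the artifact-only cost by about 5x but the full-dialogue cost by about 27x under these assumptions.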
Key Findings
MACNET variants outperform multi-agent and single-agent baselines on average across diverse tasks.
Irregular/random topologies can beat regular dense designs while running faster.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Quality (average across tasks) | MACNET-RANDOM 0.6522 | AGENTVERSE 0.5805 | +0.0717 | — | Table 1 reports MACNET-RANDOM Quality 0.6522 versus baseline AGENTVERSE 0.5805. | Table 1 |
| Accuracy | MACNET-CHAIN 0.6632 | AGENTVERSE 0.2977 | +0.3655 | MMLU | Table 1 shows MACNET-CHAIN MMLU 0.6632 vs AGENTVERSE 0.2977. | Table 1 |
What To Try In 7 Days
Prototype a small MACNET: assign actor roles to nodes and critic roles to edges, using GPT-3.5 or your own model.
Enable artifact-only propagation (pass only final artifacts downstream), then compare token usage and latency against full-dialogue passing.
Compare chain, star, and a randomized graph with 10–50 agents to find the best trade-off for your task.
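For the chain, star, and randomized comparison suggested above, a cheap first step is to compare the graphs structurally before spending any API budget. The sketch below uses assumption-laden proxies, not the paper's methodology: it treats the number of edges as the number of critic interactions (a cost proxy) and the longest path as the number of sequential rounds when independent nodes run in parallel (a latency proxy). The helper names, the hub-outward star orientation, and the edge-sampling rule in `random_dag` are all hypothetical choices; MACNET's own random topologies may be generated differently.

```python
import random

Edge = tuple[int, int]


def chain(n: int) -> list[Edge]:
    return [(i, i + 1) for i in range(n - 1)]


def star(n: int) -> list[Edge]:
    # Assumed orientation: hub node 0 feeds every other node.
    return [(0, i) for i in range(1, n)]


def mesh(n: int) -> list[Edge]:
    # Densest DAG on ordered nodes: every pair i -> j with i < j.
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def random_dag(n: int, p: float, seed: int = 0) -> list[Edge]:
    # Keep each mesh edge with probability p; pointing edges from lower to
    # higher index guarantees the result is acyclic.
    rng = random.Random(seed)
    return [edge for edge in mesh(n) if rng.random() < p]


def depth(n: int, edges: list[Edge]) -> int:
    # Longest path counted in nodes. Edges sorted by source are processed in
    # topological order because every edge points from lower to higher index.
    longest = [1] * n
    for src, dst in sorted(edges):
        longest[dst] = max(longest[dst], longest[src] + 1)
    return max(longest)


n = 20  # within the 10-50 agent range suggested above
for name, edges in [("chain", chain(n)), ("star", star(n)),
                    ("mesh", mesh(n)), ("random", random_dag(n, p=0.15))]:
    print(f"{name:>6}: critic edges={len(edges):3d}, sequential rounds={depth(n, edges):2d}")
```

Under these proxies the mesh is both the most expensive and the slowest, the star is shallow but its leaves never see each other's work, and a sparse random DAG sits in between; actual quality still has to be measured on your task.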
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic
Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
Infra Optimization
System Optimization
Inference Optimization
Reproducibility
Data URLs
Risks & Boundaries
Limitations
Relies on the quality of the underlying LLM (experiments use GPT-3.5); gains may shrink with weaker models.
Dense meshes improve quality but dramatically increase token and time costs.
When Not To Use
When you have a single, simple, closed-domain task that a tuned single-model pipeline already solves easily.
When API cost or latency budgets are tight and you cannot afford dozens of LLM calls per task.
Failure Modes
Context explosion if artifact-only propagation is not enforced.
Aggregation errors at convergent nodes (nodes with several predecessors) that degrade the merged artifact.

