Overview
The paper combines a diagnostic pilot study with multiple benchmark runs to support both the CLT mapping and the CoThinker architecture; ablations show predictable hyperparameter trade‑offs, but the approach increases API and compute cost and requires per-task tuning.
Citations: 1
Evidence Strength: 0.70
Confidence: 0.80
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 5/5
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 60%
Novelty: 60%
Why It Matters For Business
Designing LLM teams with shared memory and structured communication reduces reasoning failures on complex problems, improving solution quality for data analysis and math tasks while requiring careful tuning to avoid extra coordination cost.
Summary TLDR
LLMs struggle when a task forces them to hold and integrate many interacting facts at once. The authors map human Cognitive Load Theory (CLT) to LLMs (attention as working memory), support the mapping with two diagnostic signals (attention entropy and perplexity), and introduce CoThinker: a multi-agent in‑context system that (1) assigns dynamic thinking styles, (2) keeps a shared transactive memory, and (3) moderates peer communication with a small‑world graph. On challenging benchmarks (LiveBench, CommonGen‑Hard), CoThinker improves performance on math/reasoning and concept‑integration tasks relative to single-agent and debate baselines, but it can hurt simple instruction-following because of coordination overhead.
Problem Statement
LLMs hit a performance ceiling on multi-faceted tasks because in‑context examples and constraints overload the model's selective attention (its working-memory analogue). The paper argues that this "cognitive overload" explains output degeneration, lack of diversity, and failure to satisfy multiple constraints, and that multi-agent coordination designed around CLT principles can mitigate it.
Main Contribution
Formalized a mapping from human Cognitive Load Theory to LLM attention and in‑context limits, and validated it with attention entropy and perplexity probes.
Designed CoThinker, a CLT‑grounded multi‑agent architecture with dynamic thinking styles, a transactive memory system (TMS), and a communication moderator that enforces a small‑world communication graph.
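The summary does not say how the moderator builds its small-world graph. A minimal sketch, assuming a Watts–Strogatz-style construction: each of M agents starts by reading its N nearest neighbours on a ring, and each link is rewired to a random long-range agent with probability β. Reading the M, N and β settings suggested under What To Try In 7 Days this way, the directed "who reads whom" simplification, and the name `small_world_peers` are our assumptions, not the authors' definitions.

```python
import random
from typing import Dict, Set


def small_world_peers(m: int, n: int, beta: float, seed: int = 0) -> Dict[int, Set[int]]:
    """Map each agent to the set of peers whose messages it reads (hypothetical helper).

    Starts from a directed ring lattice (agent i reads i+1 .. i+n, mod m) and
    rewires each lattice link to a random long-range agent with probability beta.
    """
    if not 0 < n < m:
        raise ValueError("need 0 < n < m")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    rng = random.Random(seed)
    peers: Dict[int, Set[int]] = {}
    for i in range(m):
        lattice = [(i + k) % m for k in range(1, n + 1)]
        chosen: Set[int] = set()
        for j in lattice:
            target = j
            if rng.random() < beta:
                # Long-range candidates: not self, not a lattice neighbour, not already picked.
                candidates = [c for c in range(m)
                              if c != i and c not in lattice and c not in chosen]
                if candidates:
                    target = rng.choice(candidates)
            chosen.add(target)
        peers[i] = chosen
    return peers


if __name__ == "__main__":
    # Assumed reading: M=6 agents, N=2 peers each, beta≈0.3.
    for agent, reads_from in small_world_peers(6, 2, 0.3, seed=42).items():
        print(agent, sorted(reads_from))
```

With β = 0 every agent hears only its ring neighbours, so ideas spread slowly; with β = 1 every link points to a long-range agent. Intermediate values keep most local structure while adding a few shortcuts.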
Key Findings
Attention entropy rises with task complexity, consistent with higher working‑memory demands.
Structured instructions reduce uncertainty for hard tasks but add cost for easy tasks.
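Attention entropy is the Shannon entropy of an attention distribution: it is low when a query attends to a few tokens and high when attention is spread thinly, which is why it can serve as a working-memory strain signal. A minimal sketch, assuming entropy is measured in nats per query row and averaged over rows; which layers and heads the authors averaged, and the log base they used, are not stated in this summary, and `attention_entropy` is a hypothetical helper.

```python
import math
from typing import Sequence


def attention_entropy(attn: Sequence[Sequence[float]]) -> float:
    """Mean Shannon entropy (in nats) across the query rows of an attention matrix."""
    if not attn:
        raise ValueError("attention matrix is empty")
    total = 0.0
    for row in attn:
        mass = sum(row)
        if mass <= 0:
            raise ValueError("each attention row needs positive mass")
        # Renormalise defensively; softmax outputs should already sum to 1.
        probs = [w / mass for w in row if w > 0]
        total -= sum(p * math.log(p) for p in probs)
    return total / len(attn)


if __name__ == "__main__":
    focused = [[0.97, 0.01, 0.01, 0.01]]  # one dominant key
    diffuse = [[0.25, 0.25, 0.25, 0.25]]  # attention spread evenly
    print(attention_entropy(focused), attention_entropy(diffuse))
```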
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Attention entropy | 4.44 → 6.10 across difficulty levels | Level 1 | +1.66 (Level 1→Level 4) | AMPS arithmetic controlled set | Attention entropy increases monotonically with task complexity | Table 6, C.4 |
| Perplexity (hard tasks) | 120.50 → 85.35 (instruction levels 1→3) | Level 1 instruction | -35.15 | FLASK (hard vs easy) | Instructions reduce perplexity on hard tasks, then increase it when they become overly complex | Table 7, C.5 |
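Perplexity is the exponential of the mean negative log-likelihood per token, so lower values mean the model finds the text less surprising. The sketch below assumes you already have natural-log token probabilities from your model or API; `perplexity` is a hypothetical helper. It also recomputes the two deltas reported in the table.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood) over natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that gives every token probability 0.5 has perplexity 2 (up to float rounding).
    print(perplexity([math.log(0.5)] * 4))
    # Table deltas: entropy Level 1 -> Level 4, perplexity instruction level 1 -> 3.
    print(round(6.10 - 4.44, 2), round(85.35 - 120.50, 2))
```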
What To Try In 7 Days
Run a small CoThinker prototype (M=6, N=2–3, β≈0.3) on one high‑complexity task to compare vs single-agent baselines.
Add a concise transactive memory summary step to your agent pipeline to avoid redundant recomputation.
Use style prompts (1–2 sentences) to diversify agent approaches instead of fixed heavy role personas.
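The three suggestions above can be combined into a single collaboration round. A minimal sketch, assuming a generic `llm(prompt) -> str` callable; the style strings, the `TransactiveMemory` class, the truncation-based `summarize` step and `run_round` are hypothetical stand-ins, not CoThinker's actual prompts or memory format. Styles are assigned round-robin here, which is simpler than the paper's dynamic assignment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

# Hypothetical 1–2 sentence style prompts; illustrative, not the paper's wording.
STYLES = [
    "Reason step by step from first principles.",
    "Hunt for counterexamples and edge cases before committing.",
    "Outline the overall structure first, then fill in details.",
]


@dataclass
class TransactiveMemory:
    """Shared store of who contributed what (agent id -> short summary)."""
    entries: Dict[int, str] = field(default_factory=dict)

    def update(self, agent_id: int, summary: str) -> None:
        self.entries[agent_id] = summary

    def render(self) -> str:
        return "\n".join(f"Agent {i}: {s}" for i, s in sorted(self.entries.items()))


def summarize(text: str, limit: int = 80) -> str:
    """Placeholder summary step: keep the first `limit` characters."""
    return text if len(text) <= limit else text[:limit].rstrip() + "..."


def run_round(task: str, answers: Dict[int, str], peers: Dict[int, Set[int]],
              memory: TransactiveMemory, llm: Callable[[str], str]) -> Dict[int, str]:
    """One synchronous round: each agent sees the same memory snapshot and its peers' last answers."""
    snapshot = memory.render()
    new_answers: Dict[int, str] = {}
    for i in sorted(answers):
        peer_text = "".join(f"Peer {j}: {answers[j]}\n" for j in sorted(peers.get(i, set())))
        prompt = (f"Style: {STYLES[i % len(STYLES)]}\n"
                  f"Task: {task}\n"
                  f"Shared memory:\n{snapshot}\n"
                  f"{peer_text}"
                  "Give your improved answer.")
        new_answers[i] = llm(prompt)
    # Write summaries only after everyone has answered, so no agent sees a half-updated memory.
    for i, ans in new_answers.items():
        memory.update(i, summarize(ans))
    return new_answers


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        return f"draft based on {prompt.count('Peer')} peer answers"

    mem = TransactiveMemory()
    start = {i: "" for i in range(3)}
    ring = {0: {1, 2}, 1: {2, 0}, 2: {0, 1}}
    run_round("Prove the sum of two odd numbers is even.", start, ring, mem, fake_llm)
    print(mem.render())
```

In a real pipeline, `summarize` would be replaced by a short LLM-written summary, which is the concise transactive-memory step the suggestion describes.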
Agent Features
Memory
Planning
Tool Use
Is Agentic
Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
System Optimization
Reproducibility
Risks & Boundaries
Limitations
Attention entropy and perplexity are diagnostic proxies, not universal test‑time signals.
CoThinker can add extraneous coordination cost and underperform on low‑intrinsic‑load tasks like simple instruction following.
When Not To Use
Simple execution or instruction‑following tasks with low intrinsic cognitive load.
When compute or API budget is tight, or when latency matters.
Failure Modes
Echo chambers if β is too low (agents become too similar and converge prematurely).
Overload from too many agents or too large reference sets (extraneous cognitive load outweighs the benefits).
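Echo chambers can be watched for directly by tracking how similar the agents' answers become from round to round. A minimal sketch, assuming word-set Jaccard similarity is an adequate proxy for agreement (embedding similarity would be a stronger choice); the function names and the 0.8 threshold are hypothetical and would need tuning per task.

```python
from itertools import combinations
from typing import Dict


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of lower-cased word sets (1.0 means identical vocabulary)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def mean_pairwise_similarity(answers: Dict[int, str]) -> float:
    """Average Jaccard similarity over all pairs of agents' answers."""
    pairs = list(combinations(sorted(answers), 2))
    if not pairs:
        return 1.0  # a single agent is trivially "converged"
    return sum(jaccard(answers[i], answers[j]) for i, j in pairs) / len(pairs)


def looks_like_echo_chamber(answers: Dict[int, str], threshold: float = 0.8) -> bool:
    """Flag a round whose answers are nearly interchangeable (hypothetical threshold)."""
    return mean_pairwise_similarity(answers) >= threshold


if __name__ == "__main__":
    rounds = [
        {0: "use induction on n", 1: "try a counterexample with n = 0", 2: "bound the sum directly"},
        {0: "use induction on n", 1: "use induction on n", 2: "use induction on n"},
    ]
    for r, answers in enumerate(rounds):
        print(r, round(mean_pairwise_similarity(answers), 2), looks_like_echo_chamber(answers))
```

If similarity saturates after the first round, the failure mode above points to raising β or diversifying the style prompts.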

