Overview
The survey compiles a diverse set of early-stage systems and benchmarks. The evidence is broad but comes mostly from simulation and prototype experiments, so production readiness remains limited until LLM outputs are paired with verification and latency is reduced.
Citations: 4
Evidence Strength: 0.60
Confidence: 0.85
Risk Signals: 13
Trust Signals
Findings with numeric evidence: 2/5
Findings with evidence refs: 5/5
Results with explicit delta: 0/0
Reproducibility
Status: No open assets linked
Open source: Partial
At A Glance
Cost impact: 50%
Production readiness: 30%
Novelty: 60%
Why It Matters For Business
LLMs can speed up multi-robot coordination and simplify human instructions, but current limitations (math errors, hallucinations, latency) mean companies should pilot hybrid systems that pair LLMs for planning with verified controllers for execution.
Who Should Care
Summary TLDR
This survey reviews how large language models (LLMs) are being applied to multi-robot systems (MRS). It organizes work into four levels: high-level task allocation, mid-level motion planning, low-level action generation, and human intervention. The paper catalogs communication architectures (centralized, decentralized, hybrid), multimodal extensions (VLMs, VLAs), common simulators and benchmarks (AI2-THOR, PyBullet, RoCoBench, BOLAA, COHERENT), and practical challenges: weak mathematical reasoning, hallucination, latency, multi-modal fusion, and sparse standardized benchmarks. It ends with concrete opportunities: fine-tuning/LoRA, RAG, lightweight task-specific models, and richer multi-modal
Problem Statement
Integrating LLMs into real multi-robot teams promises easier instruction, dynamic task allocation, and richer human‑robot interaction, but MRS impose unique constraints—coordination, real-time behavior, heterogeneous robot bodies, and field deployment—that current LLM methods struggle with due to reasoning gaps, hallucination, latency, and weak benchmarks.
Main Contribution
First focused survey of LLM use specifically for multi-robot systems (MRS).
A clear taxonomy: high-level task allocation, mid-level motion planning, low-level action generation, and human intervention.
Key Findings
LLMs are being used at four operational levels in MRS: task allocation, motion planning, action generation, and human-in-the-loop.
LLMs often fail on mathematical and logical reasoning tasks; performance can drop markedly when the clauses of a problem are changed.
What To Try In 7 Days
Run a proof-of-concept: use an LLM for high-level task allocation and a traditional planner for low-level control in simulation.
Measure latency and token costs with centralized vs hybrid communication on a small team (3–6 robots).
Test LoRA fine-tuning on a small domain corpus and compare hallucination rates with/without RAG retrieval.
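The first two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a method from the survey: `stub_llm_propose` is a hypothetical stand-in for a real LLM call, the robot and task names are invented, and the greedy fallback is just one simple choice of traditional allocator. The point is the shape of the hybrid loop: the LLM proposes, a deterministic checker verifies, and a conventional allocator takes over if the proposal fails.

```python
import time
from typing import Callable, Dict, List, Set, Tuple

Robots = Dict[str, Set[str]]   # robot name -> capabilities
Tasks = Dict[str, str]         # task name -> required capability
Allocation = Dict[str, str]    # task name -> robot name


def stub_llm_propose(robots: Robots, tasks: Tasks) -> Allocation:
    """Hypothetical stand-in for an LLM call; returns a partly wrong plan."""
    return {"inspect_shelf": "drone_1", "move_pallet": "drone_1"}


def verify_allocation(alloc: Allocation, robots: Robots, tasks: Tasks) -> List[str]:
    """Return a list of problems; an empty list means the plan passed."""
    problems = []
    for task, need in tasks.items():
        robot = alloc.get(task)
        if robot is None:
            problems.append(f"{task}: unassigned")
        elif robot not in robots:
            problems.append(f"{task}: unknown robot {robot}")
        elif need not in robots[robot]:
            problems.append(f"{task}: {robot} lacks {need}")
    for task in alloc:
        if task not in tasks:
            problems.append(f"{task}: not a requested task")
    return problems


def greedy_allocation(robots: Robots, tasks: Tasks) -> Allocation:
    """Deterministic fallback: least-loaded capable robot, ties broken by name."""
    load = {name: 0 for name in robots}
    alloc: Allocation = {}
    for task in sorted(tasks):
        capable = [r for r, caps in robots.items() if tasks[task] in caps]
        if not capable:
            raise ValueError(f"no robot can do {task}")
        choice = min(capable, key=lambda r: (load[r], r))
        alloc[task] = choice
        load[choice] += 1
    return alloc


def allocate(
    propose: Callable[[Robots, Tasks], Allocation], robots: Robots, tasks: Tasks
) -> Tuple[Allocation, str, float, List[str]]:
    """Time the proposal, verify it, and fall back if verification fails."""
    start = time.perf_counter()
    proposal = propose(robots, tasks)
    latency_s = time.perf_counter() - start
    problems = verify_allocation(proposal, robots, tasks)
    if problems:
        return greedy_allocation(robots, tasks), "fallback", latency_s, problems
    return proposal, "llm", latency_s, problems


# Invented example team and tasks, for illustration only.
ROBOTS: Robots = {"drone_1": {"camera"}, "agv_1": {"lift"}, "agv_2": {"lift", "camera"}}
TASKS: Tasks = {"inspect_shelf": "camera", "move_pallet": "lift"}

if __name__ == "__main__":
    plan, source, latency_s, problems = allocate(stub_llm_propose, ROBOTS, TASKS)
    print(source, plan, problems, f"{latency_s:.6f}s")
```

In a real pilot, the recorded `latency_s` and the fraction of runs that end in `"fallback"` give rough proxies for the latency and hallucination measurements suggested above; token cost would have to come from the LLM provider's own usage reporting.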
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic: Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
Infra Optimization
Model Optimization
System Optimization
Training Optimization
Inference Optimization
Reproducibility
Risks & Boundaries
Limitations
Weak mathematical and numerical reasoning in LLMs for planning
Prone to hallucination; needs verification and RAG
When Not To Use
Time-critical low-level control loops requiring sub-second response
Precise numerical optimization or trajectory planning without symbolic solvers
Failure Modes
Hallucinated plan leads to unsafe or infeasible robot actions
High latency causes missed control deadlines and mission failure
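One common mitigation for the latency failure mode is a deadline guard: the slow planner runs in a worker thread, and if it misses its deadline the control loop executes a safe default instead of waiting. The sketch below is only an illustration under stated assumptions: `slow_planner`, `fast_planner`, and the `hold_position` action are hypothetical, and a real system would enforce deadlines in the controller itself rather than in Python threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Tuple

SAFE_ACTION = "hold_position"  # hypothetical safe default action


def slow_planner(observation: dict) -> str:
    """Hypothetical stand-in for a high-latency LLM planning call."""
    time.sleep(0.5)
    return "move_forward"


def fast_planner(observation: dict) -> str:
    """Hypothetical stand-in for a planner that answers in time."""
    return "turn_left"


def act_with_deadline(
    planner: Callable[[dict], str],
    observation: dict,
    deadline_s: float,
    pool: ThreadPoolExecutor,
) -> Tuple[str, bool]:
    """Return (action, met_deadline); fall back to SAFE_ACTION on timeout."""
    future = pool.submit(planner, observation)
    try:
        return future.result(timeout=deadline_s), True
    except FutureTimeout:
        # cancel() cannot stop a call that is already running;
        # the late result is simply ignored.
        future.cancel()
        return SAFE_ACTION, False


if __name__ == "__main__":
    pool = ThreadPoolExecutor(max_workers=2)
    print(act_with_deadline(fast_planner, {}, 1.0, pool))
    print(act_with_deadline(slow_planner, {}, 0.05, pool))
    pool.shutdown(wait=True)
```

Counting how often `met_deadline` is `False` during a simulated mission gives a direct measure of how frequently the planner would have missed a control deadline.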

