Overview
Production Readiness: 0.6
Novelty Score: 0.7
Cost Impact Score: 0.6
Citation Count: 0
Why It Matters For Business
RAPS offers training-free, inference-time coordination that improves task accuracy, scales as more agents join, and reduces exposure to single points of failure. This makes it useful for teams building modular LLM services and open agent marketplaces.
Summary TLDR
This paper reframes multi-agent LLM coordination as dynamic ad-hoc networking and introduces RAPS, a distributed publish–subscribe substrate with two overlays: Reactive Subscription (runtime intent refinement) and Bayesian Reputation (decentralized trust). RAPS routes messages by semantic intent, lets agents refine their intents at runtime, and uses Bayesian watchdogs to detect and isolate bad actors. On five standard benchmarks covering reasoning, math, and code, RAPS improved average performance, scaled better with more agents, and stayed robust to injected adversaries.
Problem Statement
Current automatic coordination methods either (a) fix the communication topology, which then cannot adapt at inference time, or (b) rely on a central meta-controller that becomes a single point of failure and a scalability bottleneck. The challenge is to design a coordination protocol that is adaptive (at the message level), scalable (supports dynamic membership), and robust (resists misbehaving agents) without heavy training.
Main Contribution
Perspective: cast LLM agent coordination as dynamic ad-hoc networking to unify adaptivity, scalability, and robustness goals.
Communication substrate: a distributed publish–subscribe protocol that routes by semantic match between publications and subscriptions.
Overlays: Reactive Subscription for online intent refinement and Bayesian Reputation for decentralized trust and isolation of misbehaving agents.
Evaluation: extensive experiments on five benchmarks showing improved accuracy, better runtime scaling, and robustness to adversarial agents.
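A minimal sketch of the substrate's core step, routing a publication to the subscribers whose intents match it best, is shown here. It is not the paper's implementation: `embed` is a hypothetical bag-of-words stand-in for an embedding model such as text-embedding-3-small, and `SemanticBroker`, `top_k`, and `threshold` are illustrative names and defaults.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticBroker:
    """Routes each publication to the subscribers whose intents are most similar."""

    def __init__(self, top_k: int = 2, threshold: float = 0.1) -> None:
        self.top_k = top_k          # hypothetical: max recipients per message
        self.threshold = threshold  # hypothetical: minimum similarity to deliver
        self.subscriptions: dict[str, Counter] = {}

    def subscribe(self, agent_id: str, intent: str) -> None:
        # Re-subscribing overwrites the old intent, so refined intents take effect.
        self.subscriptions[agent_id] = embed(intent)

    def publish(self, sender: str, message: str) -> list[str]:
        """Return the ids of agents that should receive this message."""
        msg_vec = embed(message)
        scored = [
            (cosine(msg_vec, vec), agent_id)
            for agent_id, vec in self.subscriptions.items()
            if agent_id != sender  # never echo a message back to its sender
        ]
        # Drop weak matches, then keep only the top-k most similar subscribers.
        scored = [(s, a) for s, a in scored if s >= self.threshold]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [agent_id for _, agent_id in scored[: self.top_k]]


if __name__ == "__main__":
    broker = SemanticBroker(top_k=1)
    broker.subscribe("solver", "solve math word problems with arithmetic")
    broker.subscribe("coder", "write and debug python code")
    print(broker.publish("planner", "please debug this python function"))  # ['coder']
```

Here the request shares "debug" and "python" with the coder's intent and nothing with the solver's, so only the coder receives it; no addresses are involved, only intent text.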
Key Findings
RAPS achieves strong end-task gains across five benchmarks.
Reactive Subscription materially improves results.
Bayesian reputation substantially raises robustness to adversaries.
Results
Metrics reported: Accuracy (four benchmarks), HumanEval Pass@1, Average (five tasks)
Who Should Care
What To Try In 7 Days
Prototype a publish–subscribe wrapper around existing LLM agents and route by semantic similarity instead of static addresses.
Add a lightweight intent-refinement step: re-run a prompt rewriter on received messages before generating replies.
Implement a simple Bayesian score (Beta-distribution counts) to down-weight agents whose outputs are flagged as errors, and measure how robustness changes.
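The third suggestion can be sketched with Beta-distribution counts. This is a hypothetical illustration, not RAPS's exact update rule: the uniform Beta(1, 1) prior, the `BetaReputation` class, and the reputation-weighted `weighted_vote` aggregation are assumptions chosen to show down-weighting.

```python
from dataclasses import dataclass


@dataclass
class BetaReputation:
    """Beta(alpha, beta) belief that an agent behaves correctly.

    alpha counts unflagged interactions and beta counts flagged errors. Both
    start at 1 (a uniform prior), which is an assumption for this sketch.
    """
    alpha: float = 1.0
    beta: float = 1.0

    def record(self, flagged: bool) -> None:
        if flagged:
            self.beta += 1
        else:
            self.alpha += 1

    @property
    def score(self) -> float:
        # Posterior mean probability that the agent's next output is good.
        return self.alpha / (self.alpha + self.beta)


def weighted_vote(answers: dict[str, str], reps: dict[str, BetaReputation]) -> str:
    """Hypothetical aggregation: each agent's vote is weighted by its reputation."""
    totals: dict[str, float] = {}
    for agent_id, answer in answers.items():
        totals[answer] = totals.get(answer, 0.0) + reps[agent_id].score
    return max(totals, key=totals.get)


if __name__ == "__main__":
    reps = {name: BetaReputation() for name in ("a", "b", "c")}
    for _ in range(3):
        reps["c"].record(flagged=True)   # c -> 1 / (1 + 4) = 0.2
    for _ in range(2):
        reps["a"].record(flagged=False)  # a -> 3 / (3 + 1) = 0.75
    print(weighted_vote({"a": "42", "b": "41", "c": "41"}, reps))  # 42
```

A plain majority would pick "41", but the repeatedly flagged agent `c` now carries only 0.2 of a vote, so the trusted agent's answer wins (0.75 versus 0.5 + 0.2).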
Agent Features
Memory
- Local context buffer H_i (interaction history)
- System prompt S_i as standing subscription
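The two memory components can be held in a small per-agent structure. In this sketch `AgentState`, `max_history`, and the prompt layout in `build_prompt` are hypothetical; the summary only establishes that each agent keeps a local history H_i and a system prompt S_i that doubles as its standing subscription.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Per-agent local state: standing subscription S_i and history buffer H_i."""
    agent_id: str
    system_prompt: str    # S_i, which also serves as the standing subscription
    max_history: int = 8  # hypothetical cap on the buffer size
    history: deque = field(default_factory=deque, init=False)  # H_i

    def __post_init__(self) -> None:
        # Bounded buffer: the oldest interaction is dropped once the cap is reached.
        self.history = deque(maxlen=self.max_history)

    def remember(self, sender: str, content: str) -> None:
        self.history.append((sender, content))

    def build_prompt(self, task: str) -> str:
        # Assemble S_i, the buffered interactions in H_i, and the new task.
        lines = [self.system_prompt]
        lines += [f"[{sender}] {content}" for sender, content in self.history]
        lines.append(f"Task: {task}")
        return "\n".join(lines)
```

Bounding H_i keeps prompt length, and therefore token cost, roughly constant however long an agent stays in the network.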
Planning
- Intent-driven routing (semantic matching)
- Reactive intent refinement at inference time
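Reactive intent refinement can be pictured as a callback that rewrites an agent's subscription whenever a message arrives. The sketch assumes a pluggable `refine` function; in RAPS this role is presumably played by an LLM rewriter, whereas `keyword_refiner` here is a hypothetical rule-based stand-in so the example runs offline. `ReactiveSubscriber` and its `revisions` log are likewise illustrative names.

```python
from typing import Callable

# (current_intent, received_message) -> refined intent
Refiner = Callable[[str, str], str]


class ReactiveSubscriber:
    """Agent whose subscription intent is refined at inference time."""

    def __init__(self, agent_id: str, intent: str, refine: Refiner) -> None:
        self.agent_id = agent_id
        self.intent = intent       # current subscription, matched by the broker
        self.refine = refine       # presumably an LLM rewriter in practice
        self.revisions = [intent]  # every distinct intent this agent has held

    def on_message(self, message: str) -> str:
        """Refine the intent from an incoming message and return the result."""
        new_intent = self.refine(self.intent, message)
        if new_intent != self.intent:
            self.intent = new_intent
            self.revisions.append(new_intent)
        return self.intent


def keyword_refiner(intent: str, message: str) -> str:
    """Hypothetical rule-based stand-in for an LLM intent rewriter."""
    if "error" in message.lower() and "debugging" not in intent:
        return intent + "; debugging help for reported errors"
    return intent


if __name__ == "__main__":
    coder = ReactiveSubscriber("coder", "write python code", keyword_refiner)
    coder.on_message("Test failed with an IndexError")
    coder.on_message("Another error in the same test")
    print(coder.intent)          # write python code; debugging help for reported errors
    print(len(coder.revisions))  # 2
```

The second error report does not change the intent again because it is already covered; after each change, the new intent would be re-registered with the broker so routing follows the agent's evolving needs.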
Tool Use
- Embedding-based semantic broker (text-embedding-3-small)
- LLM-driven broker variant (GPT-4o-mini)
- LLM watchdog audits for first-hand evaluation
Frameworks
- RAPS (this paper's framework)
Is Agentic
true
Architectures
- Distributed publish–subscribe substrate
- Reactive subscription overlay
- Bayesian reputation overlay
Collaboration
- Content-centric publish–subscribe interactions
- Spontaneous marketplace-style collaboration among agents
Optimization Features
Token Efficiency
- Broker filters to avoid unnecessary agent invocations
System Optimization
- Decentralized routing to avoid central bottleneck
- Reputation filtering to prevent communication with low-trust agents
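Reputation filtering can be sketched as a trust check applied to candidate recipients before dissemination. The posterior-mean score with a uniform prior, the 0.4 `threshold`, and the `(good, bad)` count format are illustrative assumptions rather than the paper's parameters.

```python
def posterior_mean(good: int, bad: int, prior: tuple[float, float] = (1.0, 1.0)) -> float:
    """Posterior mean of a Beta prior updated with good/bad interaction counts."""
    a0, b0 = prior
    return (a0 + good) / (a0 + b0 + good + bad)


def trusted_recipients(
    candidates: list[str],
    counts: dict[str, tuple[int, int]],
    threshold: float = 0.4,  # hypothetical trust cut-off
) -> list[str]:
    """Drop candidate recipients whose posterior trust falls below the threshold."""
    kept = []
    for agent_id in candidates:
        good, bad = counts.get(agent_id, (0, 0))  # unseen agents keep the prior
        if posterior_mean(good, bad) >= threshold:
            kept.append(agent_id)
    return kept


if __name__ == "__main__":
    counts = {"a": (5, 0), "b": (0, 3)}
    print(trusted_recipients(["a", "b", "c"], counts))  # ['a', 'c']
```

Agent `b` (score 0.2) is cut off, while the never-seen agent `c` keeps the prior mean of 0.5 and passes; that is the cold-start behavior listed under Limitations.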
Training Optimization
- Training-free, inference-time intent refinement
Inference Optimization
- Embedding-based broker for fast semantic matching
- Selective dissemination (top-k subscribers) to reduce routing overhead
Reproducibility
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Performance depends on the quality of underlying LLM backbones; RAPS does not fix weak agents.
- Reputation has a cold-start: initial interactions may not provide enough evidence to isolate adversaries.
- Watchdog judgments rely on LLM audits and can be noisy or biased in edge cases.
- Broker semantic matching may miss useful messages that are phrased differently unless it uses strong embeddings or LLM reasoning.
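The cold-start limitation can be made concrete with a standard Beta-Bernoulli reputation model. Treating RAPS's reputation as exactly this model is an assumption; s_j and f_j (unflagged and flagged interactions with agent j), the prior parameters α_0 and β_0, and the isolation threshold τ are notation introduced here.

```latex
\theta_j \mid s_j, f_j \sim \mathrm{Beta}(\alpha_0 + s_j,\; \beta_0 + f_j),
\qquad
\hat{\theta}_j = \frac{\alpha_0 + s_j}{\alpha_0 + \beta_0 + s_j + f_j}

% Uniform prior (\alpha_0 = \beta_0 = 1), adversary flagged on every interaction (s_j = 0):
\hat{\theta}_j = \frac{1}{2 + f_j} < \tau \iff f_j > \frac{1}{\tau} - 2
```

For example, with τ = 0.25 an adversary is isolated only after its third flagged interaction (1/5 = 0.2 < 0.25; two flags give exactly 0.25), so its first few messages are still delivered.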
When Not To Use
- You have a small, fixed agent chain where static orchestration is sufficient.
- Your LLM backbone is weak and cannot provide reliable audits or intent rewriting.
- You require strictly deterministic routing and outputs rather than adaptive, content-driven behavior.
Failure Modes
- Cold-start reputation allows early adversary influence until posterior evidence accumulates.
- Colluding adversaries may initially pass deviation tests and poison reputations.
- Incorrect watchdog evaluations could wrongly isolate honest agents and degrade performance.
- LLM-driven broker or watchdog hallucinations can misroute or misjudge messages.
Core Entities
Models
- GPT-4o-mini
- LLM-driven brokers (GPT-4o-mini variant)
Metrics
- Accuracy
- Pass@1
Datasets
- MMLU
- GSM8K
- SVAMP
- AQuA
- HumanEval
Benchmarks
- MMLU
- GSM8K
- SVAMP
- AQuA
- HumanEval

