RAPS: intent-driven, reputation-aware publish–subscribe for adaptive multi-agent LLM coordination

Overview

Decision Snapshot: Needs Validation

RAPS is a practical, training-free coordination layer that improves multi-agent performance and robustness in experiments, but it depends on backbone LLM quality and requires reputation warm-up in new deployments.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.88

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 0/6

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 60%

Novelty: 70%

Authors

Rui Li, Zeyu Zhang, Xiaohe Bo, Quanyu Dai, Chaozhuo Li, Feng Wen, Xu Chen

Links

Abstract / PDF

Why It Matters For Business

RAPS offers training-free, inference-time coordination that improves task accuracy, scales as more agents are added, and avoids the single point of failure of a central controller. This makes it useful for teams building modular LLM services and open agent marketplaces.

Who Should Care

CTO, Engineering Lead, ML Engineer, Product Manager, Data Scientist

Summary TLDR

This paper reframes multi-agent LLM coordination as dynamic ad-hoc networking and introduces RAPS: a distributed publish–subscribe substrate plus two overlays, Reactive Subscription (intent refinement) and Bayesian Reputation (decentralized trust). RAPS routes messages by semantic intent, lets agents refine their intents at runtime, and uses Bayesian watchdogs to detect and isolate bad actors. On five standard benchmarks covering reasoning, math, and code, RAPS reached 90.0% average accuracy/Pass@1, scaled better as more agents were added, and stayed robust to injected adversaries.

Problem Statement

Current automatic coordination methods either (a) fix communication topologies that cannot adapt at inference time, or (b) rely on a central meta-controller that becomes a single point of failure and a scalability bottleneck. The challenge is to design a coordination protocol that adapts at the message level, scales under dynamic membership, and resists misbehaving agents, all without heavy training.

Main Contribution

Perspective: cast LLM agent coordination as dynamic ad-hoc networking to unify adaptivity, scalability, and robustness goals.

Communication substrate: a distributed publish–subscribe protocol that routes by semantic match between publications and subscriptions.
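The routing idea can be illustrated with a toy broker that delivers each publication to the subscribers whose intent is most similar to it. This is a minimal sketch, not the paper's implementation: the bag-of-words `embed` function stands in for a real embedding model (the paper's broker uses text-embedding-3-small), and `SemanticBroker`, `top_k`, and `threshold` are hypothetical names and values.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class SemanticBroker:
    top_k: int = 2          # hypothetical: max subscribers reached per publication
    threshold: float = 0.1  # hypothetical: minimum similarity required to deliver
    subscriptions: dict[str, Counter] = field(default_factory=dict)

    def subscribe(self, agent_id: str, intent: str) -> None:
        # Re-subscribing replaces the agent's previous intent.
        self.subscriptions[agent_id] = embed(intent)

    def publish(self, sender: str, message: str) -> list[str]:
        """Return ids of the best-matching subscribers, excluding the sender."""
        msg = embed(message)
        scored = [
            (cosine(msg, vec), agent_id)
            for agent_id, vec in self.subscriptions.items()
            if agent_id != sender
        ]
        matches = sorted((s for s in scored if s[0] >= self.threshold), reverse=True)
        return [agent_id for _, agent_id in matches[: self.top_k]]


broker = SemanticBroker()
broker.subscribe("math", "solve arithmetic and algebra word problems")
broker.subscribe("coder", "write and debug python code")
broker.subscribe("critic", "review python code for bugs")
routed = broker.publish("planner", "please debug this python function")
# routed == ["coder", "critic"]
```

Capping delivery at `top_k` and dropping matches below `threshold` mirrors the selective-dissemination idea: a publication reaches only the few agents whose intent fits it, rather than every agent in the network.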

Key Findings

RAPS achieves strong end-task gains across five benchmarks.

Numbers: 90.0% average accuracy/Pass@1 across MMLU, GSM8K, SVAMP, AQuA, and HumanEval (Table 1).

Practical Use: Use RAPS to boost multi-agent performance on mixed reasoning and code tasks without additional model training.

Evidence Ref: Table 1

Reactive Subscription materially improves results.

Numbers: Removing Reactive Subscription (RS) lowers MMLU by 2.6 points (88.2→85.6) and HumanEval by 2.2 points (91.5→89.3) (Table 3).

Practical Use: Let agents refine their intent prompts at runtime; the Table 3 ablation suggests gains on the order of 1–3 percentage points on reasoning and code tasks.

Evidence Ref: Table 3

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Accuracy	88.2%	—	—	MMLU test subset	Table 1 reports RAPS MMLU = 88.2	Table 1
Accuracy	95.4%	—	—	GSM8K test	Table 1 reports RAPS GSM8K = 95.4	Table 1

What To Try In 7 Days

Prototype a publish–subscribe wrapper around existing LLM agents and route by semantic similarity instead of static addresses.

Add a lightweight intent-refinement step: after each received message, have a prompt rewriter update the agent's intent (its subscription) before the agent generates a reply.

Implement a simple Bayesian score (Beta-distribution counts) to downweight agents whose outputs are flagged as erroneous, then measure how robustness changes.
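A minimal sketch of the suggested beta-count score, assuming each interaction with a peer yields a binary audit outcome (flagged or not). The `BetaReputation` class, the uniform Beta(1, 1) prior, the 0.5 trust threshold, and the `min_obs` warm-up are illustrative assumptions, not the paper's parameters.

```python
from dataclasses import dataclass


@dataclass
class BetaReputation:
    """Beta-Bernoulli reputation held by one agent about one peer.

    alpha counts interactions that passed an audit, beta counts flagged ones;
    both start at 1, i.e. a uniform Beta(1, 1) prior (an assumption).
    """
    alpha: float = 1.0
    beta: float = 1.0

    def update(self, flagged: bool) -> None:
        if flagged:
            self.beta += 1
        else:
            self.alpha += 1

    @property
    def mean(self) -> float:
        """Posterior mean probability that the peer behaves well."""
        return self.alpha / (self.alpha + self.beta)

    @property
    def observations(self) -> float:
        """Number of audited interactions recorded on top of the prior."""
        return self.alpha + self.beta - 2


def is_trusted(rep: BetaReputation, threshold: float = 0.5, min_obs: int = 3) -> bool:
    # Cold start: with too few observations the peer is kept by default,
    # which is the window in which an adversary can still have influence.
    if rep.observations < min_obs:
        return True
    return rep.mean >= threshold


adversary = BetaReputation()
adversary.update(flagged=True)
adversary.update(flagged=True)
early = is_trusted(adversary)  # True: only 2 observations so far
for _ in range(3):
    adversary.update(flagged=True)
late = is_trusted(adversary)   # False: posterior mean is 1/7 after 5 flags
```

The `early`/`late` contrast reproduces the cold-start limitation in miniature. To downweight rather than isolate a peer, the posterior `mean` can instead serve as a weight when aggregating peer answers.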

Agent Features

Memory

Local context buffer H_i (interaction history)
System prompt S_i as standing subscription

Planning

Intent-driven routing (semantic matching)
Reactive intent refinement at inference time
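Reactive intent refinement can be sketched as an agent that rewrites its own standing subscription after each received message and re-registers it, so later routing sees the refined intent. This assumes the intent is plain text and that a rewriter (in practice an LLM call) is supplied; `Agent`, `on_message`, `registry`, and `toy_rewriter` are hypothetical names, and the rule-based rewriter only stands in for the model.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical rewriter signature: (current_intent, received_message) -> new intent.
# In a real system this would be an LLM call.
Rewriter = Callable[[str, str], str]


class Agent:
    def __init__(self, agent_id: str, intent: str, rewriter: Rewriter) -> None:
        self.agent_id = agent_id
        self.intent = intent          # standing subscription (cf. S_i)
        self.history: list[str] = []  # local context buffer (cf. H_i)
        self.rewriter = rewriter

    def on_message(self, message: str, registry: dict[str, str]) -> None:
        """Record a received message, refine the intent, and re-subscribe."""
        self.history.append(message)
        self.intent = self.rewriter(self.intent, message)
        registry[self.agent_id] = self.intent  # routing now uses the refined intent


def toy_rewriter(intent: str, message: str) -> str:
    """Rule-based stand-in for an LLM rewriter: widen the intent to cover tests."""
    if "test" in message.lower() and "tests" not in intent:
        return intent + " and write unit tests"
    return intent


registry: dict[str, str] = {}
agent = Agent("coder", "write python code", toy_rewriter)
agent.on_message("Please add tests for parse()", registry)
# registry["coder"] is now "write python code and write unit tests"
```

Because refinement only rewrites the subscription, it needs no model training; its quality depends entirely on the rewriter, which matches the stated dependence on backbone LLM quality.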

Tool Use

Embedding-based semantic broker (text-embedding-3-small)
LLM-driven broker variant (GPT-4o-mini)
LLM watchdog audits for first-hand evaluation

Frameworks

RAPS (this paper's framework)

Is Agentic

Yes

Architectures

Distributed publish–subscribe substrate
Reactive subscription overlay
Bayesian reputation overlay

Collaboration

Content-centric publish–subscribe interactions
Spontaneous marketplace-style collaboration among agents

Optimization Features

Token Efficiency

Accuracy
Broker filters to avoid unnecessary agent invocations

System Optimization

Decentralized routing to avoid a central bottleneck
Reputation filtering to prevent communication with low-trust agents

Training Optimization

Training-free, inference-time intent refinement

Inference Optimization

Embedding-based broker for fast semantic matching
Selective dissemination (top-k subscribers) to reduce routing overhead

Reproducibility

Code Available: Yes

Data Available: Yes

Open Source Status: Partial

License: Unknown

Risks & Boundaries

Limitations

Performance depends on the quality of underlying LLM backbones; RAPS does not fix weak agents.

Reputation has a cold-start: initial interactions may not provide enough evidence to isolate adversaries.

When Not To Use

You have a small, fixed agent chain where static orchestration is sufficient.

Your LLM backbone is weak and cannot provide reliable audits or intent rewriting.

Failure Modes

Cold-start reputation allows early adversary influence until posterior evidence accumulates.

Colluding adversaries may initially pass deviation tests and poison reputations.

Core Entities

Models

GPT-4o-mini
LLM-driven brokers (GPT-4o-mini variant)

Metrics

Accuracy, Pass@1

Datasets

MMLU, GSM8K, SVAMP, AQuA, HumanEval

Benchmarks

MMLU, GSM8K, SVAMP, AQuA, HumanEval
