RAPS: intent-driven, reputation-aware publish–subscribe for adaptive multi-agent LLM coordination

February 8, 2026 · 7 min

Overview

Production Readiness

0.6

Novelty Score

0.7

Cost Impact Score

0.6

Citation Count

0

Authors

Rui Li, Zeyu Zhang, Xiaohe Bo, Quanyu Dai, Chaozhuo Li, Feng Wen, Xu Chen

Why It Matters For Business

RAPS offers inference-time, training-free coordination that improves task accuracy, scales as agents are added, and removes the single point of failure of a central controller. This is useful for teams building modular LLM services and open agent marketplaces.

Summary TLDR

This paper reframes multi-agent LLM coordination as dynamic ad-hoc networking and introduces RAPS: a distributed publish–subscribe substrate plus two overlays, Reactive Subscription (intent refinement) and Bayesian Reputation (decentralized trust). RAPS routes messages by semantic intent, lets agents refine intents at runtime, and uses Bayesian watchdogs to detect and isolate bad actors. On five standard benchmarks (reasoning, math, code), RAPS improved average performance, scaled better with more agents, and stayed robust to injected adversaries.

Problem Statement

Current automatic coordination methods either (a) fix the communication topology and cannot adapt at inference time, or (b) rely on a central meta-controller that becomes a single point of failure and a scalability bottleneck. The challenge is to design a coordination protocol that is adaptive (message-level), scalable (dynamic membership), and robust (resists misbehavior) without heavy training.

Main Contribution

Perspective: cast LLM agent coordination as dynamic ad-hoc networking to unify adaptivity, scalability, and robustness goals.

Communication substrate: a distributed publish–subscribe protocol that routes by semantic match between publications and subscriptions.

Overlays: Reactive Subscription for online intent refinement and Bayesian Reputation for decentralized trust and isolation of misbehaving agents.

Evaluation: extensive experiments on five benchmarks showing improved accuracy, better runtime scaling, and robustness to adversarial agents.
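To make the substrate idea concrete, here is a minimal sketch of content-based routing. It assumes a toy bag-of-words vector in place of a real embedding model, and the `Broker` class, `embed` helper, and threshold value are hypothetical names chosen for illustration, not the paper's implementation.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class Broker:
    """Hypothetical broker: matches publications to subscriptions by similarity."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold  # arbitrary cutoff, chosen for the demo
        self.subscriptions = {}     # agent id -> subscription vector

    def subscribe(self, agent_id: str, intent: str) -> None:
        self.subscriptions[agent_id] = embed(intent)

    def route(self, publication: str) -> list:
        """Return ids of agents whose subscription matches, best match first."""
        pub = embed(publication)
        scored = [(cosine(pub, sub), aid) for aid, sub in self.subscriptions.items()]
        return [aid for score, aid in sorted(scored, reverse=True) if score >= self.threshold]


broker = Broker()
broker.subscribe("math_agent", "solve arithmetic word problems with equations")
broker.subscribe("code_agent", "write and debug python functions")
recipients = broker.route("please debug this python function for me")
print(recipients)  # ['code_agent']
```

The publisher never names a recipient: delivery depends only on how well the message content matches each standing subscription, which is what lets agents join or leave without rewiring a topology.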

Key Findings

RAPS achieves strong end-task gains across five benchmarks.

Numbers: 90.0% average accuracy/Pass@1 across MMLU, GSM8K, SVAMP, AQuA, HumanEval (Table 1).

Reactive Subscription materially improves results.

Numbers: Removing RS lowers MMLU by 2.6 pts (88.2→85.6) and HumanEval by 2.2 pts (91.5→89.3) (Table 3).

Bayesian reputation substantially raises robustness to adversaries.

Numbers: Under mixed adversarial pools RAPS stays high (88.2→86.3 across 5T0A→5T5A) while many baselines collapse (e.g., Chain 84

Results

MMLU Accuracy

Value: 88.2%

GSM8K Accuracy

Value: 95.4%

SVAMP Accuracy

Value: 92.2%

AQuA Accuracy

Value: 82.6%

HumanEval Pass@1

Value: 91.5%

Average (five tasks)

Value: 90.0%
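As a quick sanity check, the reported average can be recomputed from the five per-task scores listed above. This is plain arithmetic on the numbers in this summary, not a reproduction of the paper's evaluation.

```python
# Per-task scores in the order they are listed above (percent).
scores = [88.2, 95.4, 92.2, 82.6, 91.5]

average = sum(scores) / len(scores)  # 449.9 / 5
print(round(average, 2))             # 89.98, which rounds to the reported 90.0
```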

What To Try In 7 Days

Prototype a publish–subscribe wrapper around existing LLM agents and route by semantic similarity instead of static addresses.

Add a lightweight intent-refinement step: re-run a prompt rewriter on received messages before generating replies.

Implement a simple Bayesian score (beta counts) to downweight agents that produce flagged errors and observe robustness changes.
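The third suggestion can be sketched with Beta pseudo-counts. This is a minimal illustration assuming a uniform Beta(1, 1) prior and a hypothetical isolation cutoff `TRUST_FLOOR`; the class name and the cutoff are invented here, and the paper's actual reputation update may differ.

```python
from dataclasses import dataclass


@dataclass
class BetaReputation:
    """Beta(alpha, beta) belief over an agent's probability of behaving well."""
    alpha: float = 1.0  # prior pseudo-count of clean interactions (assumed uniform prior)
    beta: float = 1.0   # prior pseudo-count of flagged interactions

    def update(self, flagged: bool) -> None:
        """Add one observation: a flagged error counts against the agent."""
        if flagged:
            self.beta += 1.0
        else:
            self.alpha += 1.0

    @property
    def trust(self) -> float:
        """Posterior mean, used here as the agent's weight."""
        return self.alpha / (self.alpha + self.beta)


honest, adversary = BetaReputation(), BetaReputation()
for _ in range(8):
    honest.update(flagged=False)
    adversary.update(flagged=True)

TRUST_FLOOR = 0.3  # hypothetical isolation cutoff
reputations = {"honest": honest, "adversary": adversary}
isolated = [name for name, rep in reputations.items() if rep.trust < TRUST_FLOOR]
print(honest.trust, adversary.trust, isolated)  # 0.9 0.1 ['adversary']
```

A new agent starts at a trust of 0.5 under this prior, so the score only becomes informative after several audited interactions.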

Agent Features

Memory

  • Local context buffer H_i (interaction history)
  • System prompt S_i as standing subscription

Planning

  • Intent-driven routing (semantic matching)
  • Reactive intent refinement at inference time
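The sketch below shows one way reactive intent refinement could be wired up, assuming the agent stores its subscription S_i as text and its history H_i as a list. `keyword_rewriter` is a deliberately crude stand-in for an LLM prompt rewriter, and `ReactiveAgent` is a hypothetical name; neither comes from the paper.

```python
from typing import Callable

Rewriter = Callable[[str, str], str]


def keyword_rewriter(subscription: str, message: str) -> str:
    """Stand-in for an LLM rewriter (assumption): appends message terms not yet seen."""
    known = set(subscription.lower().split())
    new_terms = list(dict.fromkeys(w for w in message.lower().split() if w not in known))
    return subscription if not new_terms else subscription + " " + " ".join(new_terms)


class ReactiveAgent:
    def __init__(self, system_prompt: str, rewriter: Rewriter):
        self.subscription = system_prompt  # S_i: the standing subscription
        self.history = []                  # H_i: local context buffer
        self.rewriter = rewriter

    def receive(self, message: str) -> None:
        """Record the message, then refine the subscription in light of it."""
        self.history.append(message)
        self.subscription = self.rewriter(self.subscription, message)


agent = ReactiveAgent("solve algebra problems", keyword_rewriter)
agent.receive("need help with geometry problems")
print(agent.subscription)  # solve algebra problems need help with geometry
```

Because the rewriter is passed in as a callable, the keyword stub can be swapped for a real LLM call without touching the agent logic, and no training step is involved.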

Tool Use

  • Embedding-based semantic broker (text-embedding-3-small)
  • LLM-driven broker variant (GPT-4o-mini)
  • LLM watchdog audits for first-hand evaluation

Frameworks

  • RAPS (this paper's framework)

Is Agentic

true

Architectures

  • Distributed publish–subscribe substrate
  • Reactive subscription overlay
  • Bayesian reputation overlay

Collaboration

  • Content-centric publish–subscribe interactions
  • Spontaneous marketplace-style collaboration among agents

Optimization Features

Token Efficiency

  • Accuracy
  • Broker filters to avoid unnecessary agent invocations

System Optimization

  • Decentralized routing to avoid central bottleneck
  • Reputation filtering to prevent communication with low-trust agents

Training Optimization

  • Training-free, inference-time intent refinement

Inference Optimization

  • Embedding-based broker for fast semantic matching
  • Selective dissemination (top-k subscribers) to reduce message routing
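Selective dissemination can be sketched as a top-k pick over match scores, optionally combined with a trust filter. The function name, the default `k`, the `min_trust` cutoff, and the sample scores below are all illustrative assumptions.

```python
import heapq


def select_subscribers(match_scores, trust, k=2, min_trust=0.3):
    """Pick at most k subscribers by match score, skipping low-trust agents.

    Agents with no trust record default to 0.5 (assumed neutral prior).
    """
    eligible = {a: s for a, s in match_scores.items() if trust.get(a, 0.5) >= min_trust}
    return heapq.nlargest(k, eligible, key=eligible.get)


match_scores = {"a1": 0.82, "a2": 0.40, "a3": 0.91, "a4": 0.77}
trust = {"a1": 0.9, "a2": 0.8, "a3": 0.1, "a4": 0.6}
chosen = select_subscribers(match_scores, trust)
print(chosen)  # ['a1', 'a4']
```

Agent `a3` has the best semantic match but is dropped for low trust, and only two agents are invoked instead of four, which is where the savings in calls would come from.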

Reproducibility

Code Available

Data Available

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Performance depends on the quality of underlying LLM backbones; RAPS does not fix weak agents.
  • Reputation has a cold-start problem: initial interactions may not provide enough evidence to isolate adversaries.
  • Watchdog judgments rely on LLM audits and can be noisy or biased in edge cases.
  • Broker semantic matching may miss useful messages that are phrased differently unless the embeddings or LLM reasoning are strong.

When Not To Use

  • You have a small, fixed agent chain where static orchestration is sufficient.
  • Your LLM backbone is weak and cannot provide reliable audits or intent rewriting.
  • You require strictly deterministic routing and outputs rather than adaptive, content-driven behavior.

Failure Modes

  • Cold-start reputation allows early adversary influence until posterior evidence accumulates.
  • Colluding adversaries may initially pass deviation tests and poison reputations.
  • Incorrect watchdog evaluations could wrongly isolate honest agents and degrade performance.
  • LLM-driven broker or watchdog hallucinations can misroute or misjudge messages.
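The cold-start failure mode can be illustrated with a worked posterior, assuming the simple Beta-count reputation suggested under "What To Try In 7 Days" (the paper's exact update rule is not given in this summary). With a prior Beta(α₀, β₀), after s clean and f flagged interactions the expected trust is:

```latex
\mathbb{E}[\theta \mid s, f] = \frac{\alpha_0 + s}{\alpha_0 + \beta_0 + s + f}
```

With a uniform prior (α₀ = β₀ = 1) and an adversary that is flagged every time (s = 0), trust goes 1/2 → 1/3 → 1/4 for f = 0, 1, 2. Under a hypothetical isolation cutoff of 0.3, the adversary is therefore still trusted after its first flagged message and is only cut off after the second.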

Core Entities

Models

  • GPT-4o-mini
  • LLM-driven brokers (GPT-4o-mini variant)

Metrics

  • Accuracy
  • Pass@1

Datasets

  • MMLU
  • GSM8K
  • SVAMP
  • AQuA
  • HumanEval

Benchmarks

  • MMLU
  • GSM8K
  • SVAMP
  • AQuA
  • HumanEval