Combine private knowledge across silos by sharing compact, masked parametric adapters instead of raw documents

Overview

Decision Snapshot: Ready For Pilot

FedMosaic shows practical gains and cost reductions on multiple QA datasets; results are empirical and rely on assumptions such as homogeneous backbone and local compute for adapter training.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.80

Risk Signals: 8

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 5/5

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 80%

Production readiness: 70%

Novelty: 70%

Authors

Zhilin Liang, Yuxiang Wang, Zimu Zhou, Hainan Zhang, Boyi Liu, Yongxin Tong

Links

Abstract / PDF / Data

Why It Matters For Business

FedMosaic lets companies aggregate private knowledge across departments or partners without moving raw documents, cutting network and storage costs dramatically while improving answer accuracy on evaluated QA tasks.

Who Should Care

CTO, Product Manager, ML Engineer, Data Scientist, Engineering Lead, Founder

Summary TLDR

FedMosaic is a federated RAG (retrieval-augmented generation) method that keeps raw documents inside each silo by converting clusters of local documents into shared LoRA adapters plus per-document binary masks. At query time, silos send only relevance scores and masks; the server selects low-conflict adapters, aggregates the masked parameters, and composes them with a frozen LLM. On four QA datasets the paper reports an average relative F1 gain of +10.9% over competitive baselines while cutting storage by ~79–86% and per-query communication by ~91%. In the reported experiments, targeted data-extraction attacks had a 0% success rate.

Problem Statement

RAG improves LLM factuality by using external documents, but many domains cannot centralize raw documents for privacy or compliance. Parametric RAG (converting docs into adapters) preserves locality but naively produces too many adapters and breaks when adapters from many silos are averaged. The problem: how to (1) avoid sharing raw text, (2) reduce storage and communication, and (3) prevent destructive aggregation across silos while still integrating multi-silo knowledge.

Main Contribution

FedMosaic: first federated parametric RAG framework that enforces the locality constraint by sharing adapters instead of plaintext.

Multi-document adapters + document-specific binary masks: cluster related docs into one adapter and learn sparse masks so each doc activates a small adapter subspace.
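To make the mask idea concrete, here is a minimal sketch of how a binary mask can gate a cluster adapter's parameters and how masked parameters from two adapters might be merged. The function names and the entry-wise averaging rule are assumptions for illustration; this summary does not specify the paper's exact aggregation formula.

```python
import numpy as np

def apply_mask(adapter_delta: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only the adapter entries that a document's binary mask switches on."""
    return adapter_delta * mask.astype(adapter_delta.dtype)

def aggregate_masked(deltas, masks):
    """Assumed rule: average each entry over the adapters whose mask activates it."""
    total = np.zeros_like(deltas[0], dtype=float)
    counts = np.zeros_like(deltas[0], dtype=float)
    for delta, mask in zip(deltas, masks):
        total += apply_mask(delta, mask)
        counts += mask
    # Entries that no mask activates stay at zero.
    return np.divide(total, counts, out=np.zeros_like(total), where=counts > 0)

# Toy adapter updates from two silos (hypothetical values).
delta_a = np.array([[1.0, 2.0], [3.0, 4.0]])
delta_b = np.array([[10.0, 20.0], [30.0, 40.0]])
mask_a = np.array([[1, 0], [1, 0]])
mask_b = np.array([[0, 1], [1, 0]])

merged = aggregate_masked([delta_a, delta_b], [mask_a, mask_b])
print(merged)
```

Only the bottom-left entry is active in both masks, so it is the only place where the two adapters are averaged; sparse, mostly disjoint masks keep such collisions rare.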

Key Findings

Average accuracy gain over state-of-the-art baselines

Numbers: Avg +10.9% F1 across four datasets

Practical Use: Expect about a 10% relative F1 improvement on evaluated QA datasets when replacing other federated/local RAG methods with FedMosaic.

Evidence Ref: Sec.4.2, Table 1

Silo storage reduced by clustering adapters

Numbers: Storage cut by 78.8%–86.3%

Practical Use: You can store far fewer adapter files per silo; plan for roughly one-fifth to one-seventh of prior adapter storage (13.7%–21.2% remains after the reported reduction).

Evidence Ref: Abstract, Sec.4.4.1, Fig.4a

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Average F1 improvement	+10.9% (relative)	state-of-the-art methods across four categories	+10.9%	aggregated over HotpotQA, 2WQA, PopQA, CWQ	Sec.4.2, Table 1	Table 1
Per-silo storage overhead	reduced to 13.7%–21.2% of no-clustering variant	w/o clustering parametric RAG	78.8%–86.3% reduction	clustering C∈{5,8,10}	Sec.4.4.1, Fig.4a	Fig.4a

What To Try In 7 Days

Prototype per-silo LoRA adapters on a small private corpus and verify adapter training works locally.

Cluster related docs (k-means with max cluster size 5–10) and train a cluster adapter plus per-document masks.

Implement simple re-ranking and greedy mask-based selection; measure per-query bytes and F1 against a baseline system.
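For the clustering step, a size-capped k-means can be prototyped in a few lines. The sketch below is a simplified stand-in, not the paper's balanced constrained k-means: it assigns each document embedding to its nearest centroid that still has room. The function name `capped_kmeans` and the random embeddings are hypothetical.

```python
import numpy as np

def capped_kmeans(X, max_size, iters=10, seed=0):
    """Cluster rows of X so that no cluster holds more than max_size rows."""
    n = len(X)
    k = -(-n // max_size)  # ceil(n / max_size) clusters give enough capacity
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(n, size=k, replace=False)].astype(float)
    labels = np.full(n, -1)
    for _ in range(iters):
        # Distance from every document to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.full(n, -1)
        sizes = np.zeros(k, dtype=int)
        # Documents closest to some centroid choose first.
        for i in np.argsort(dists.min(axis=1)):
            for c in np.argsort(dists[i]):
                if sizes[c] < max_size:
                    labels[i] = c
                    sizes[c] += 1
                    break
        # Recompute centroids from the capped assignment.
        for c in range(k):
            members = X[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return labels

# Stand-in embeddings for 23 documents; real ones would come from an encoder.
embeddings = np.random.default_rng(1).normal(size=(23, 4))
labels = capped_kmeans(embeddings, max_size=5)
print(np.bincount(labels))
```

Each resulting cluster would then get one adapter, with one mask per member document. A greedy capped assignment like this can give worse clusters than a properly balanced solver, so treat it as a first-week baseline.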

Optimization Features

Token Efficiency

avoids sending raw documents in-context

Infra Optimization

reduced network transfer per query (~91% less)

Model Optimization

LoRA

mask-gated parameter sparsity

System Optimization

balanced constrained k-means (max cluster size 5–10)

bit-packing masks (8 mask bits per byte)

Training Optimization

train masks only after cluster adapter is frozen

train cluster adapter on augmented rewrites/QA pairs

Inference Optimization

upload masks + relevance scores only

masked aggregation to avoid full parameter transfers
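Bit-packing is simple to try with NumPy. The sketch below packs a binary mask at 8 bits per byte and restores it; the mask shape and helper names are illustrative assumptions, and the paper's actual wire format is not described in this summary.

```python
import numpy as np

def pack_mask(mask):
    """Flatten a 0/1 mask and pack 8 mask bits into each byte."""
    flat = np.asarray(mask, dtype=np.uint8).ravel()
    return np.packbits(flat), flat.size

def unpack_mask(packed, n_bits, shape):
    """Undo pack_mask; n_bits trims the zero padding in the last byte."""
    return np.unpackbits(packed, count=n_bits).reshape(shape)

rng = np.random.default_rng(0)
mask = (rng.random((7, 9)) < 0.1).astype(np.uint8)  # 63 bits, roughly 10% active

packed, n_bits = pack_mask(mask)
restored = unpack_mask(packed, n_bits, mask.shape)

print(mask.nbytes, "bytes unpacked ->", packed.nbytes, "bytes packed")
```

Packed size is the bit count divided by 8, rounded up, so a mask stored as one byte per entry shrinks by about 8x before any further compression.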

Reproducibility

Code Available: No

Data Available: Yes

Open Source Status: Partial

License: Unknown

Data URLs

HotpotQA, 2WikiMultihopQA, PopQA, ComplexWebQuestions

Risks & Boundaries

Limitations

Assumes each silo has the same base LLM and can train adapters and masks locally.

Conflict-aware selection is NP-hard; paper uses a greedy heuristic which may not be optimal.
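A greedy heuristic of this kind can be sketched as follows. The conflict measure (Jaccard overlap between masks), the `max_conflict` threshold, and the `budget` parameter are assumptions chosen for illustration; the paper's actual selection objective may differ.

```python
import numpy as np

def mask_conflict(a, b):
    """Assumed conflict measure: Jaccard overlap of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def greedy_select(scores, masks, budget, max_conflict):
    """Take candidates by descending relevance, skipping any that overlap
    too much with an already selected mask."""
    chosen = []
    for i in np.argsort(scores)[::-1]:
        if len(chosen) == budget:
            break
        if all(mask_conflict(masks[i], masks[j]) <= max_conflict for j in chosen):
            chosen.append(int(i))
    return chosen

scores = np.array([0.9, 0.8, 0.7, 0.4])  # relevance scores reported by silos
masks = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],  # overlaps heavily with candidate 0
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
])

selected = greedy_select(scores, masks, budget=2, max_conflict=0.3)
print(selected)  # [0, 2]
```

Candidate 1 is skipped despite its high score because its mask collides with candidate 0. A greedy pass like this is fast but can miss better combinations, which is the optimality gap the limitation refers to.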

When Not To Use

When raw documents can be centralized securely and legal/operational costs of centralization are acceptable.

When silos lack compute to train adapters or a common LLM backbone.

Failure Modes

Poor clustering mixes unrelated docs and causes intra-adapter interference.

Masks that are too dense or poorly learned fail to prevent destructive aggregation.
