Combine private knowledge across silos by sharing compact, masked parametric adapters instead of raw documents

February 5, 2026 · 7 min

Overview

Decision Snapshot: Ready For Pilot

FedMosaic shows practical accuracy gains and cost reductions on multiple QA datasets; the results are empirical and rely on assumptions such as a homogeneous LLM backbone across silos and enough local compute to train adapters.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.80

Risk Signals: 8

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 5/5

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 80%

Production readiness: 70%

Novelty: 70%

Authors

Zhilin Liang, Yuxiang Wang, Zimu Zhou, Hainan Zhang, Boyi Liu, Yongxin Tong


Why It Matters For Business

FedMosaic lets companies aggregate private knowledge across departments or partners without moving raw documents, cutting per-silo storage by roughly 79–86% and per-query communication by about 91% while improving answer accuracy on the evaluated QA tasks.


Summary (TL;DR)

FedMosaic is a federated RAG (retrieval-augmented generation) method that keeps raw documents inside each silo by converting clusters of local documents into shared LoRA adapters plus per-document binary masks. At query time, silos send only relevance scores and masks; the server selects low-conflict adapters, aggregates the masked parameters, and composes them with a frozen LLM. On four QA datasets it reports an average relative F1 gain of +10.9% over competitive baselines while cutting storage by ~79–86% and per-query communication by ~91%, and it resists targeted data-extraction attacks (0% attack success rate in the reported experiments).

Problem Statement

RAG improves LLM factuality by using external documents, but many domains cannot centralize raw documents for privacy or compliance. Parametric RAG (converting docs into adapters) preserves locality but naively produces too many adapters and breaks when adapters from many silos are averaged. The problem: how to (1) avoid sharing raw text, (2) reduce storage and communication, and (3) prevent destructive aggregation across silos while still integrating multi-silo knowledge.

Main Contribution

FedMosaic: the first federated parametric RAG framework (as claimed by the authors), enforcing the locality constraint by sharing adapters instead of plaintext documents.

Multi-document adapters + document-specific binary masks: related documents are clustered into one shared adapter, and sparse per-document masks are learned so that each document activates only a small subspace of that adapter.
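To make the mask idea concrete, here is a minimal NumPy sketch of one cluster adapter shared by several documents, where a document's binary mask selects a sparse subset of the adapter's parameters. It assumes the masks act element-wise on the LoRA factors; the shapes, the `doc_view` helper, and the random masks are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cluster adapter: LoRA factors A (r x d_in) and B (d_out x r),
# trained once for a whole cluster of related documents.
r, d_in, d_out = 4, 16, 16
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))

def doc_view(A, B, mask_A, mask_B):
    """Return the weight update (Delta W) that a single document activates.

    Assumption: a document's binary mask zeroes adapter entries element-wise,
    so each document only uses a small subspace of the shared adapter.
    """
    return (B * mask_B) @ (A * mask_A)

# Hypothetical per-document masks that keep roughly 20% of the entries.
keep = 0.2
mask_A = (rng.random(A.shape) < keep).astype(A.dtype)
mask_B = (rng.random(B.shape) < keep).astype(B.dtype)

delta_w = doc_view(A, B, mask_A, mask_B)
active = (mask_A.sum() + mask_B.sum()) / (mask_A.size + mask_B.size)
print(delta_w.shape, f"active fraction ~ {active:.2f}")
```

With all-ones masks the view reduces to the ordinary LoRA update B @ A, so the masks only ever restrict the shared adapter; they never add parameters.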

Key Findings

Average accuracy gain over state-of-the-art baselines

Numbers: Avg. +10.9% F1 (relative) across four datasets

Practical Use: Expect roughly a 10% relative F1 improvement on the evaluated QA datasets when replacing other federated or local RAG methods with FedMosaic.

Evidence Ref: Sec. 4.2, Table 1
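Because the +10.9% is a relative gain (per the Results table), it is worth converting it to absolute F1 points for your own baseline before setting expectations. A small arithmetic sketch; the baseline F1 of 40.0 is a hypothetical number, not one from the paper.

```python
def apply_relative_gain(baseline_f1: float, relative_gain: float) -> float:
    """Projected F1 after a relative improvement (0.109 means +10.9%)."""
    return baseline_f1 * (1.0 + relative_gain)

# Hypothetical baseline; substitute the F1 your current system achieves.
baseline = 40.0
projected = apply_relative_gain(baseline, 0.109)
print(f"{baseline:.1f} -> {projected:.2f} F1 (+{projected - baseline:.2f} absolute points)")
# 40.0 -> 44.36 F1 (+4.36 absolute points)
```

A weaker baseline therefore gains fewer absolute points from the same relative improvement.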

Silo storage reduced by clustering adapters

Numbers: Storage cut by 78.8%–86.3%

Practical Use: You can store far fewer adapter files per silo; plan for roughly one-fifth to one-seventh of prior adapter storage (about 14%–21% remains).

Evidence Ref: Abstract, Sec. 4.4.1, Fig. 4a
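The reported reductions translate into a remaining footprint as follows. This is plain arithmetic on the 78.8% and 86.3% figures and adds no new results; `storage_footprint` is a hypothetical helper name.

```python
def storage_footprint(reduction: float) -> tuple[float, float]:
    """Return (remaining fraction, shrink factor) for a fractional reduction."""
    remaining = 1.0 - reduction
    return remaining, 1.0 / remaining

# Reported clustering reductions (Abstract, Sec. 4.4.1): 78.8% and 86.3%.
for reduction in (0.788, 0.863):
    remaining, factor = storage_footprint(reduction)
    print(f"{reduction:.1%} cut -> {remaining:.1%} remains (about 1/{factor:.1f})")
# 78.8% cut -> 21.2% remains (about 1/4.7)
# 86.3% cut -> 13.7% remains (about 1/7.3)
```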

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Average F1 improvement | +10.9% (relative) | state-of-the-art methods across four categories | +10.9% | aggregated over HotpotQA, 2WQA, PopQA, CWQ | Sec. 4.2, Table 1 | Table 1
Per-silo storage overhead | reduced to about 13.7%–21.2% of the no-clustering variant | parametric RAG without clustering | 78.8%–86.3% reduction | clustering C ∈ {5, 8, 10} | Sec. 4.4.1, Fig. 4a | Fig. 4a

What To Try In 7 Days

Prototype per-silo LoRA adapters on a small private corpus and verify adapter training works locally.
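Before wiring up a real training stack, it can help to sanity-check the LoRA structure itself. The NumPy sketch below is a hypothetical stand-in for one adapted linear layer (`LoRALinear`, `rank`, and `alpha` are illustrative names); it shows the standard LoRA update y = Wx + (alpha/r)·BAx with B initialized to zero, so the adapted layer starts out identical to the frozen base layer. A real pilot would more likely use an established library (for example Hugging Face PEFT) on the shared backbone.

```python
import numpy as np

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update B @ A."""

    def __init__(self, W: np.ndarray, rank: int = 4, alpha: float = 8.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                      # frozen backbone weight
        self.A = rng.normal(scale=0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))                # zero init: no change at start
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, d_in); the output has shape (batch, d_out).
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_adapter_params(self) -> int:
        return self.A.size + self.B.size

rng = np.random.default_rng(1)
W = rng.normal(size=(32, 64))
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, 64))

# With B = 0 the adapted layer matches the frozen base layer exactly.
assert np.allclose(layer(x), x @ W.T)
print("adapter params:", layer.num_adapter_params(), "vs full:", W.size)
# adapter params: 384 vs full: 2048
```

Only A and B would be trained per silo; W stays fixed, which is what makes the adapter small enough to share.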

Cluster related docs (balanced, size-constrained k-means with a maximum cluster size of 5–10) and train one cluster adapter plus per-document masks.
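The paper describes balanced constrained k-means with a maximum cluster size; the sketch below is a simpler swapped-in heuristic (Lloyd-style centroid updates with a greedy, capacity-limited assignment step) meant only to illustrate the size cap. `capped_kmeans`, the embedding dimension, and the random data are hypothetical; real document embeddings would come from whatever encoder the silo already uses.

```python
import numpy as np

def capped_kmeans(X, max_size, n_iter=20, seed=0):
    """Cluster rows of X so that no cluster holds more than max_size points.

    Heuristic: standard centroid updates, but the assignment step visits
    points in order of how close they are to their best centroid and puts
    each one in the nearest cluster that still has room.
    """
    n = len(X)
    k = int(np.ceil(n / max_size))                    # fewest clusters that fit the cap
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(n, size=k, replace=False)]
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        order = np.argsort(dists.min(axis=1))         # most confident points first
        counts = np.zeros(k, dtype=int)
        for i in order:
            for c in np.argsort(dists[i]):            # nearest cluster with capacity
                if counts[c] < max_size:
                    labels[i] = c
                    counts[c] += 1
                    break
        for c in range(k):
            members = X[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return labels

rng = np.random.default_rng(42)
docs = rng.normal(size=(37, 8))                       # 37 hypothetical doc embeddings
labels = capped_kmeans(docs, max_size=8)
print(np.bincount(labels))
```

Because k is chosen so that k * max_size is at least the number of documents, every document always finds a cluster with room; each resulting cluster would then get one adapter.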

Implement simple re-ranking and greedy mask-based selection; measure per-query bytes and F1 against a baseline system.
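To measure per-query bytes, note that a binary mask costs one bit per adapter parameter once bit-packed (8 bits per byte, as listed under System Optimization). The sketch below uses NumPy's `packbits`/`unpackbits` and compares an upload of packed masks plus float32 relevance scores against shipping dense float32 parameters for the same documents. The adapter size, the number of candidate documents, the mask density, and the 4-byte score width are hypothetical assumptions, so plug in your own numbers rather than expecting the paper's ~91% figure.

```python
import numpy as np

rng = np.random.default_rng(0)

n_params = 100_000      # hypothetical adapter size (number of LoRA parameters)
n_docs = 5              # hypothetical number of candidate documents per query

# One binary mask per candidate document (True = parameter active).
masks = rng.random((n_docs, n_params)) < 0.1

# Bit-pack each mask: 8 mask bits per byte.
packed = np.packbits(masks, axis=1)
scores = rng.random(n_docs).astype(np.float32)    # one relevance score per doc

mask_bytes = packed.nbytes + scores.nbytes
dense_bytes = n_docs * n_params * np.dtype(np.float32).itemsize

# The receiver recovers the exact masks; count= trims any padding bits.
restored = np.unpackbits(packed, axis=1, count=n_params).astype(bool)
assert np.array_equal(restored, masks)

print(f"masks + scores: {mask_bytes} B, dense float32 params: {dense_bytes} B")
# masks + scores: 62520 B, dense float32 params: 2000000 B
```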

Optimization Features

Token Efficiency
avoids sending raw documents in-context
Infra Optimization
reduced network transfer per query (~91% less)
Model Optimization
LoRA; mask-gated parameter sparsity
System Optimization
balanced constrained k-means (max cluster size 5–10); bit-packing masks (8 bits per byte)
Training Optimization
train masks only after the cluster adapter is frozen; train the cluster adapter on augmented rewrites/QA pairs
Inference Optimization
upload masks + relevance scores only; masked aggregation to avoid full parameter transfers
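A minimal sketch of why masked aggregation behaves differently from the plain averaging criticized in the Problem Statement: each selected adapter contributes only where its mask is active, and each entry is averaged over the adapters that actually use it. The normalization rule and the `masked_aggregate` helper are assumptions for illustration, not the paper's exact formula.

```python
import numpy as np

def naive_average(params):
    """Plain FedAvg-style mean over adapters (all entries, all adapters)."""
    return np.mean(params, axis=0)

def masked_aggregate(params, masks):
    """Average each parameter only over the adapters whose mask activates it.

    params, masks: arrays of shape (n_adapters, n_params); masks are 0/1.
    Entries that no mask activates stay zero (no update to the frozen LLM).
    """
    masks = masks.astype(params.dtype)
    weight = masks.sum(axis=0)
    summed = (params * masks).sum(axis=0)
    return np.divide(summed, weight, out=np.zeros_like(summed), where=weight > 0)

params = np.array([[1.0, 2.0, 0.5, -1.0],
                   [3.0, -2.0, 0.5, 4.0]])
masks = np.array([[1, 1, 0, 0],
                  [0, 1, 0, 1]])

print(naive_average(params))
print(masked_aggregate(params, masks))
```

Entries used by only one adapter (indices 0 and 3) keep that adapter's value instead of being blended with parameters the mask switched off, while the shared entry (index 1) still cancels out. That residual overlap is why the server also tries to select low-conflict adapters before aggregating.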

Reproducibility

Code Available: No
Data Available: Yes
Open Source Status: Partial
License: Unknown

Data URLs

HotpotQA, 2WikiMultihopQA, PopQA, ComplexWebQuestions

Risks & Boundaries

Limitations

Assumes each silo has the same base LLM and can train adapters and masks locally.

Conflict-aware adapter selection is NP-hard; the paper uses a greedy heuristic, which may not find the optimal selection.
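Since exact conflict-aware selection is NP-hard, a greedy pass is the natural baseline. The sketch below assumes that conflict between two candidates is measured as the overlap of their binary masks (shared active parameters divided by the size of the smaller mask) and that candidates are visited in descending relevance. The `greedy_select` name, the overlap measure, the threshold, and the budget are all hypothetical choices rather than the paper's definitions.

```python
import numpy as np

def overlap(m1, m2):
    """Fraction of the smaller mask that is also active in the other mask."""
    shared = np.logical_and(m1, m2).sum()
    smaller = min(m1.sum(), m2.sum())
    return shared / smaller if smaller else 0.0

def greedy_select(scores, masks, budget=3, max_overlap=0.3):
    """Pick up to `budget` candidates by relevance, skipping any whose mask
    overlaps an already-selected mask by more than `max_overlap`."""
    selected = []
    for i in np.argsort(scores)[::-1]:                # highest relevance first
        if all(overlap(masks[i], masks[j]) <= max_overlap for j in selected):
            selected.append(int(i))
        if len(selected) == budget:
            break
    return selected

scores = np.array([0.9, 0.85, 0.6, 0.4])
masks = np.array([[1, 1, 1, 0, 0, 0, 0, 0],
                  [1, 1, 0, 0, 0, 0, 0, 0],           # heavily overlaps candidate 0
                  [0, 0, 0, 1, 1, 0, 0, 0],
                  [0, 0, 0, 0, 0, 1, 1, 1]], dtype=bool)

print(greedy_select(scores, masks))
# [0, 2, 3]
```

Because the pass never revisits earlier picks, one high-relevance candidate can block complementary but slightly less relevant ones, which is the kind of sub-optimality this limitation warns about.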

When Not To Use

When raw documents can be centralized securely and legal/operational costs of centralization are acceptable.

When silos lack the compute to train adapters locally or do not share a common LLM backbone.

Failure Modes

Poor clustering mixes unrelated docs and causes intra-adapter interference.

Masks that are too dense or poorly learned fail to prevent destructive aggregation.

Core Entities

Models

LLaMA3.2-1B-Instruct, LLaMA3-8B-Instruct, LoRA

Metrics

F1, attack success rate, parameters transmitted per query, silo storage overhead

Datasets

HotpotQA, 2WikiMultihopQA, PopQA, ComplexWebQuestions, Enron Emails, WikiText

Context Entities

Models

re-ranking model M_r