Combine private knowledge across silos by sharing compact, masked parametric adapters instead of raw documents

February 5, 2026 · 7 min

Overview

Production Readiness

0.7

Novelty Score

0.7

Cost Impact Score

0.8

Citation Count

0

Authors

Zhilin Liang, Yuxiang Wang, Zimu Zhou, Hainan Zhang, Boyi Liu, Yongxin Tong

Why It Matters For Business

FedMosaic lets companies aggregate private knowledge across departments or partners without moving raw documents. Relative to parametric RAG without clustering, it cuts per-query communication by about 91% and silo storage by about 79–86%, while improving answer accuracy on the evaluated QA tasks.

Summary TLDR

FedMosaic is a federated RAG (retrieval-augmented generation) method that keeps raw documents inside each silo by converting clusters of local documents into shared LoRA adapters plus per-document binary masks. At query time, silos send only relevance scores and masks; the server selects low-conflict adapters, aggregates the masked parameters, and composes them with a frozen LLM. On four QA datasets it reports an average relative F1 gain of 10.9% over competitive baselines while cutting storage by ~79–86% and per-query communication by ~91%. It also resists targeted data-extraction attacks (0% success in the reported experiments).

Problem Statement

RAG improves LLM factuality by using external documents, but many domains cannot centralize raw documents for privacy or compliance. Parametric RAG (converting docs into adapters) preserves locality but naively produces too many adapters and breaks when adapters from many silos are averaged. The problem: how to (1) avoid sharing raw text, (2) reduce storage and communication, and (3) prevent destructive aggregation across silos while still integrating multi-silo knowledge.

Main Contribution

FedMosaic: first federated parametric RAG framework that enforces the locality constraint by sharing adapters instead of plaintext.

Multi-document adapters + document-specific binary masks: cluster related docs into one adapter and learn sparse masks so each doc activates a small adapter subspace.

Selective adapter aggregation: server chooses high-relevance, low-conflict masks and aggregates masked adapters to avoid destructive averaging.
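
To make the masked aggregation idea concrete, here is a minimal sketch in NumPy. The summary does not state the exact aggregation rule, so the choice below (average each parameter over the masks that cover it, leave uncovered positions at zero) is an assumption, and the function name `aggregate_masked_adapters` and the toy vectors are hypothetical.

```python
import numpy as np

def aggregate_masked_adapters(adapters, masks):
    """Average adapter parameters over the masks that cover each position.

    adapters: list of 1-D float arrays (flattened adapter parameters), same length.
    masks:    list of 1-D 0/1 arrays, same length as the adapters.
    Assumption: coverage-count averaging; the paper's rule may differ.
    """
    adapters = np.stack(adapters).astype(np.float64)  # shape (n, d)
    masks = np.stack(masks).astype(np.float64)        # shape (n, d)
    masked_sum = (adapters * masks).sum(axis=0)       # summed contributions per position
    counts = masks.sum(axis=0)                        # number of masks active per position
    # Divide only where some mask is active; other positions stay zero.
    return np.divide(masked_sum, counts,
                     out=np.zeros_like(masked_sum), where=counts > 0)

# Toy example: two adapters whose masks overlap only at position 1.
a1 = np.array([1.0, 2.0, 3.0, 4.0])
a2 = np.array([10.0, 20.0, 30.0, 40.0])
m1 = np.array([1, 1, 0, 0])
m2 = np.array([0, 1, 1, 0])
merged = aggregate_masked_adapters([a1, a2], [m1, m2])
print(merged)
```

Positions covered by a single mask keep that adapter's value unchanged. Only overlapping positions are averaged, which is why low-overlap (low-conflict) masks limit interference between adapters.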

Key Findings

Average accuracy gain over state-of-the-art baselines

Numbers: Avg +10.9% F1 across four datasets

Silo storage reduced by clustering adapters

Numbers: Storage cut by 78.8%–86.3%

Per-query communication drops sharply

Numbers: Communication reduced by 91.4%

Improved resistance to targeted data-extraction attacks

Numbers: Targeted-attack success = 0% in experiments

Results

Average F1 improvement

Value: +10.9% (relative)

Baseline: state-of-the-art methods across four categories

Per-silo storage overhead

Value: reduced to 11–21% of the no-clustering variant

Baseline: parametric RAG without clustering

Per-query communication

Value: ≈4.86% of the no-clustering variant at k=10

Baseline: parametric RAG without clustering

Privacy: targeted attack success

Value: 0% (experiment)

Baseline: in-context FedRAG

Scaling to larger backbone

Value: Bridge +4.2%, Compose +20.8%

Baseline: strongest competitive baselines with LLaMA3-8B

Who Should Care

What To Try In 7 Days

Prototype per-silo LoRA adapters on a small private corpus and verify adapter training works locally.

Cluster related docs (k-means with max cluster size 5–10) and train a cluster adapter plus per-document masks.

Implement simple re-ranking and greedy mask-based selection; measure per-query bytes and F1 against a baseline system.
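
For the clustering step, a size-capped k-means can be prototyped in a few lines. This is a rough sketch under stated assumptions, not the paper's balanced constrained k-means: the function `capped_kmeans`, the greedy closest-pair assignment, and the random embeddings are all hypothetical stand-ins for real document embeddings.

```python
import math
import numpy as np

def capped_kmeans(embeddings, max_size, n_iter=10, seed=0):
    """Cluster rows of `embeddings` so that no cluster exceeds `max_size` docs."""
    rng = np.random.default_rng(seed)
    n = len(embeddings)
    k = math.ceil(n / max_size)  # fewest clusters that can hold every doc
    centroids = embeddings[rng.choice(n, size=k, replace=False)].copy()
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Distance from every doc to every centroid, shape (n, k).
        dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.full(n, -1)
        sizes = np.zeros(k, dtype=int)
        # Visit (doc, cluster) pairs from closest to farthest, filling up to the cap.
        for flat in np.argsort(dists, axis=None):
            doc, c = divmod(int(flat), k)
            if labels[doc] == -1 and sizes[c] < max_size:
                labels[doc] = c
                sizes[c] += 1
        # Recompute centroids from the capped assignment.
        for c in range(k):
            if sizes[c] > 0:
                centroids[c] = embeddings[labels == c].mean(axis=0)
    return labels

# Hypothetical data: 23 documents with 8-dimensional embeddings.
docs = np.random.default_rng(1).normal(size=(23, 8))
labels = capped_kmeans(docs, max_size=5)
print(np.bincount(labels))  # cluster sizes, each at most 5
```

Each resulting cluster would then get one adapter, with one mask per document inside it. The cap keeps the number of documents sharing an adapter small, which is the lever the summary ties to intra-adapter interference.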

Optimization Features

Token Efficiency

  • avoids sending raw documents in-context

Infra Optimization

  • reduced network transfer per query (~91% less)

Model Optimization

  • LoRA
  • mask-gated parameter sparsity

System Optimization

  • balanced constrained k-means (max cluster size 5–10)
  • bit-packing masks (8 mask bits per byte)
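
Bit-packing a binary mask can be illustrated with NumPy's `packbits` and `unpackbits`. The 10-entry mask is a made-up example; real masks would have one entry per gated adapter parameter.

```python
import numpy as np

# Hypothetical 10-entry binary mask, stored one byte per entry.
mask = np.array([1, 0, 0, 1, 1, 0, 1, 0, 1, 1], dtype=np.uint8)

packed = np.packbits(mask)                          # 8 mask bits per byte, zero-padded
restored = np.unpackbits(packed, count=mask.size)   # drop the padding bits again

print(mask.nbytes, "->", packed.nbytes, "bytes")
```

The receiver needs to know the original mask length (here passed as `count`) because packing pads the final byte with zeros.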

Training Optimization

  • train masks only after cluster adapter is frozen
  • train cluster adapter on augmented rewrites/QA pairs

Inference Optimization

  • upload masks + relevance scores only
  • masked aggregation to avoid full parameter transfers

Reproducibility

Data Urls

  • HotpotQA
  • 2WikiMultihopQA
  • PopQA
  • ComplexWebQuestions

Data Available

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Assumes each silo has the same base LLM and can train adapters and masks locally.
  • Conflict-aware selection is NP-hard; paper uses a greedy heuristic which may not be optimal.
  • Performance measured on QA datasets with synthetic corpora; real-world heterogeneity may require tuning.
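
A greedy heuristic of the kind mentioned in the limitations could look like the sketch below. The scoring rule (relevance minus a weighted overlap fraction with the masks already chosen) and the names `greedy_select` and `conflict_weight` are assumptions for illustration; the summary does not spell out the paper's objective.

```python
import numpy as np

def greedy_select(scores, masks, k, conflict_weight=1.0):
    """Pick up to k masks, trading relevance against overlap with earlier picks."""
    masks = np.asarray(masks, dtype=bool)
    selected = []
    covered = np.zeros(masks.shape[1], dtype=bool)  # positions already claimed
    remaining = set(range(len(scores)))

    def gain(i):
        # Fraction of mask i that collides with already selected masks.
        overlap = np.logical_and(masks[i], covered).sum() / max(masks[i].sum(), 1)
        return scores[i] - conflict_weight * overlap

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=gain)  # sorted() makes ties deterministic
        selected.append(best)
        covered |= masks[best]
        remaining.remove(best)
    return selected

# Toy example: masks 0 and 1 are identical, mask 2 is disjoint from both.
scores = [0.9, 0.85, 0.6]
masks = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
chosen = greedy_select(scores, masks, k=2)
print(chosen)
```

In the toy example the second-most relevant mask is skipped because it fully overlaps the first pick, so the less relevant but non-conflicting mask is chosen instead. Being greedy, the procedure can miss a better combination that an exhaustive search would find.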

When Not To Use

  • When raw documents can be centralized securely and legal/operational costs of centralization are acceptable.
  • When silos lack compute to train adapters or a common LLM backbone.

Failure Modes

  • Poor clustering mixes unrelated docs and causes intra-adapter interference.
  • Masks that are too dense or poorly learned fail to prevent destructive aggregation.
  • Re-ranker errors select irrelevant masks, degrading aggregated adapter quality.

Core Entities

Models

  • LLaMA3.2-1B-Instruct
  • LLaMA3-8B-Instruct
  • LoRA

Metrics

  • F1
  • attack success rate
  • parameters transmitted per query
  • silo storage overhead
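
For measuring F1 against a baseline, a token-overlap F1 is a common choice in QA evaluation. The summary does not say which F1 variant the paper uses, so the sketch below is an assumption; it also skips the answer normalization (punctuation and article stripping) that standard QA scripts usually apply.

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted and a reference answer string."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)

# Precision 2/3, recall 1.
score = token_f1("the eiffel tower", "eiffel tower")
print(round(score, 3))
```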

Datasets

  • HotpotQA
  • 2WikiMultihopQA
  • PopQA
  • ComplexWebQuestions
  • Enron Emails
  • WikiText

Context Entities

Models

  • re-ranking model M_r