Overview
Production Readiness: 0.7
Novelty Score: 0.7
Cost Impact Score: 0.8
Citation Count: 0
Why It Matters For Business
FedMosaic lets companies aggregate private knowledge across departments or partners without moving raw documents. In the reported experiments it cuts per-silo storage by ~79–86% and per-query communication by ~91% while improving answer accuracy on the evaluated QA tasks.
Summary TLDR
FedMosaic is a federated RAG (retrieval-augmented generation) method that keeps raw documents inside each silo by converting clusters of local documents into shared LoRA adapters plus per-document binary masks. At query time silos send only relevance scores and masks; the server selects low-conflict adapters, aggregates masked parameters, and composes them with a frozen LLM. On four QA datasets it reports an average +10.9% F1 over competitive baselines while cutting storage by ~79–86% and per-query communication by ~91%, and it resists targeted data-extraction attacks (0% success in experiments).
Problem Statement
RAG improves LLM factuality by using external documents, but many domains cannot centralize raw documents for privacy or compliance. Parametric RAG (converting docs into adapters) preserves locality but naively produces too many adapters and breaks when adapters from many silos are averaged. The problem: how to (1) avoid sharing raw text, (2) reduce storage and communication, and (3) prevent destructive aggregation across silos while still integrating multi-silo knowledge.
Main Contribution
FedMosaic: first federated parametric RAG framework that enforces the locality constraint by sharing adapters instead of plaintext.
Multi-document adapters + document-specific binary masks: cluster related docs into one adapter and learn sparse masks so each doc activates a small adapter subspace.
Selective adapter aggregation: server chooses high-relevance, low-conflict masks and aggregates masked adapters to avoid destructive averaging.
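The selection step can be illustrated with a minimal sketch. It assumes that "conflict" between two candidates is measured as the Jaccard overlap of their binary masks and that candidates are visited in descending relevance order. The function names `mask_overlap` and `greedy_select` and the parameters `max_overlap` and `top_k` are hypothetical and do not come from the paper.

```python
import numpy as np

def mask_overlap(a, b):
    """Jaccard overlap of two binary masks (assumed conflict measure)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

def greedy_select(scores, masks, max_overlap=0.2, top_k=3):
    """Pick up to top_k candidates by relevance, skipping any candidate whose
    mask overlaps an already selected mask by more than max_overlap."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    chosen = []
    for i in order:
        if len(chosen) == top_k:
            break
        conflict = max((mask_overlap(masks[i], masks[j]) for j in chosen), default=0.0)
        if conflict <= max_overlap:
            chosen.append(int(i))
    return chosen
```

A greedy pass like this is cheap but not guaranteed to find the best subset; it only illustrates the trade-off between relevance and conflict.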
Key Findings
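Average accuracy gain over state-of-the-art baselines: +10.9% F1 on average across four QA datasets
Silo storage reduced by clustering adapters: ~79–86% lower
Per-query communication drops sharply: ~91% lower
Improved resistance to targeted data-extraction attacks: 0% attack success in experiments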
Results
Average F1 improvement
Per-silo storage overhead
Per-query communication
Privacy: targeted attack success
Scaling to larger backbone
Who Should Care
What To Try In 7 Days
Prototype per-silo LoRA adapters on a small private corpus and verify adapter training works locally.
Cluster related documents (balanced constrained k-means with a maximum cluster size of 5–10) and train a cluster adapter plus per-document masks.
Implement simple re-ranking and greedy mask-based selection; measure per-query bytes and F1 against a baseline system.
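For the clustering step, a size-capped k-means can be prototyped in a few lines. This is a rough sketch, not the paper's algorithm: it assumes documents are already embedded as vectors, and it enforces the cap with a simple greedy assignment (closest documents first, falling back to the next-nearest cluster that still has room). The name `capped_kmeans` and its parameters are hypothetical.

```python
import numpy as np

def capped_kmeans(X, max_size, n_iter=10, seed=0):
    """Cluster rows of X so that no cluster holds more than max_size rows."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    k = -(-n // max_size)  # ceil(n / max_size) clusters guarantee enough room
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(n, size=k, replace=False)]
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.full(n, -1)
        counts = np.zeros(k, dtype=int)
        # Assign points that sit closest to some center first
        for i in np.argsort(dist.min(axis=1)):
            for c in np.argsort(dist[i]):
                if counts[c] < max_size:
                    labels[i] = c
                    counts[c] += 1
                    break
        # Recompute centers; an empty cluster keeps its previous center
        for c in range(k):
            if counts[c]:
                centers[c] = X[labels == c].mean(axis=0)
    return labels
```

With `max_size` set between 5 and 10, each resulting label group would correspond to one cluster adapter in the prototype.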
Optimization Features
Token Efficiency
- avoids sending raw documents in-context
Infra Optimization
- reduced network transfer per query (~91% less)
Model Optimization
- LoRA
- mask-gated parameter sparsity
System Optimization
- balanced constrained k-means (max cluster size 5–10)
- bit-packing masks (8 mask entries per byte)
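Bit-packing a binary mask is straightforward with NumPy. The sketch below uses `np.packbits` and `np.unpackbits`; the helper names `pack_mask` and `unpack_mask` are hypothetical, and the original length is stored alongside the bytes because packing pads the last byte with zeros.

```python
import numpy as np

def pack_mask(mask):
    """Pack a 0/1 mask into bytes, 8 mask entries per byte."""
    mask = np.asarray(mask, dtype=np.uint8)
    return np.packbits(mask), mask.size

def unpack_mask(packed, length):
    """Recover the original 0/1 mask, dropping the zero padding."""
    return np.unpackbits(packed)[:length]
```

Compared with storing one byte per entry, this reduces mask size by a factor of eight before any further compression.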
Training Optimization
- train masks only after cluster adapter is frozen
- train cluster adapter on augmented rewrites/QA pairs
Inference Optimization
- upload masks + relevance scores only
- masked aggregation to avoid full parameter transfers
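One plausible reading of masked aggregation is a per-coordinate average that counts only the adapters whose mask activates that coordinate. The summary does not spell out the exact rule, so the following is a sketch under that assumption, with adapters flattened to vectors; `aggregate_masked` is a hypothetical name.

```python
import numpy as np

def aggregate_masked(adapters, masks):
    """Average adapter parameters coordinate-wise over active masks only.

    adapters: (m, d) array of flattened adapter parameters
    masks:    (m, d) array of 0/1 entries
    Returns a length-d vector; coordinates no mask activates stay at zero.
    """
    A = np.asarray(adapters, dtype=float)
    M = np.asarray(masks, dtype=float)
    counts = M.sum(axis=0)
    summed = (A * M).sum(axis=0)
    return np.divide(summed, counts, out=np.zeros_like(summed), where=counts > 0)
```

Unlike a plain mean over all adapters, coordinates touched by a single mask keep their full value instead of being diluted by adapters that never used them.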
Reproducibility
Data URLs
- HotpotQA
- 2WikiMultihopQA
- PopQA
- ComplexWebQuestions
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Assumes each silo has the same base LLM and can train adapters and masks locally.
- Conflict-aware selection is NP-hard; paper uses a greedy heuristic which may not be optimal.
- Performance measured on QA datasets with synthetic corpora; real-world heterogeneity may require tuning.
When Not To Use
- When raw documents can be centralized securely and legal/operational costs of centralization are acceptable.
- When silos lack compute to train adapters or a common LLM backbone.
Failure Modes
- Poor clustering mixes unrelated docs and causes intra-adapter interference.
- Masks that are too dense or poorly learned fail to prevent destructive aggregation.
- Re-ranker errors select irrelevant masks, degrading aggregated adapter quality.
Core Entities
Models
- LLaMA3.2-1B-Instruct
- LLaMA3-8B-Instruct
- LoRA
Metrics
- F1
- attack success rate
- parameters transmitted per query
- silo storage overhead
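For reference, QA F1 is commonly computed as token-overlap F1 between the predicted and gold answers. The sketch below assumes that convention and uses simplified normalization (lowercasing and whitespace splitting only); whether the paper applies exactly this normalization is not stated here. The function name `token_f1` is hypothetical.

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted answer and a gold answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss
        return float(pred == ref)
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting "the cat sat" against the gold answer "the cat" gives precision 2/3 and recall 1, so F1 is 0.8.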
Datasets
- HotpotQA
- 2WikiMultihopQA
- PopQA
- ComplexWebQuestions
- Enron Emails
- WikiText
Context Entities
Models
- re-ranking model M_r

