Overview
FedMosaic reports accuracy gains and cost reductions on four QA datasets. The results are empirical and rely on assumptions such as a homogeneous LLM backbone across silos and enough local compute to train adapters.
Citations: 0
Evidence Strength: 0.80
Confidence: 0.80
Risk Signals: 8
Trust Signals
Findings with numeric evidence: 4/4
Findings with evidence refs: 4/4
Results with explicit delta: 5/5
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 80%
Production readiness: 70%
Novelty: 70%
Why It Matters For Business
FedMosaic lets companies aggregate private knowledge across departments or partners without moving raw documents. In the reported experiments it cuts per-query communication by about 91% and per-silo storage by about 79–86%, while improving answer accuracy on the evaluated QA tasks.
Summary TLDR
FedMosaic is a federated RAG (retrieval-augmented generation) method that keeps raw documents inside each silo by converting clusters of local documents into shared LoRA adapters plus per-document binary masks. At query time silos send only relevance scores and masks; the server selects low-conflict adapters, aggregates masked parameters, and composes them with a frozen LLM. On four QA datasets it reports an average +10.9% F1 over competitive baselines while cutting storage by ~79–86% and per-query communication by ~91%, and it resists targeted data-extraction attacks (0% success in experiments).
Problem Statement
RAG improves LLM factuality by using external documents, but many domains cannot centralize raw documents for privacy or compliance. Parametric RAG (converting docs into adapters) preserves locality but naively produces too many adapters and breaks when adapters from many silos are averaged. The problem: how to (1) avoid sharing raw text, (2) reduce storage and communication, and (3) prevent destructive aggregation across silos while still integrating multi-silo knowledge.
Main Contribution
FedMosaic: presented as the first federated parametric RAG framework; it enforces the locality constraint by sharing adapters instead of plaintext.
Multi-document adapters + document-specific binary masks: cluster related docs into one adapter and learn sparse masks so each doc activates a small adapter subspace.
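To make the mask idea concrete, here is a minimal sketch of how masked adapter parameters from several documents might be combined. It assumes flattened adapter weights and averages each coordinate over the adapters whose mask activates it; the function names and this averaging rule are illustrative assumptions, not the paper's stated aggregation rule.

```python
from typing import List


def apply_mask(adapter: List[float], mask: List[int]) -> List[float]:
    """Keep only the adapter coordinates that the binary mask activates."""
    if len(adapter) != len(mask):
        raise ValueError("adapter and mask must have the same length")
    return [w if m else 0.0 for w, m in zip(adapter, mask)]


def aggregate_masked(adapters: List[List[float]],
                     masks: List[List[int]]) -> List[float]:
    """Average each coordinate over the adapters whose mask activates it.

    Assumption (hypothetical rule): coordinates that no mask activates
    stay at zero, so the frozen base model is unchanged there.
    """
    if not adapters or len(adapters) != len(masks):
        raise ValueError("need one mask per adapter and at least one adapter")
    dim = len(adapters[0])
    totals = [0.0] * dim
    counts = [0] * dim
    for adapter, mask in zip(adapters, masks):
        masked = apply_mask(adapter, mask)
        for i, weight in enumerate(masked):
            if mask[i]:
                totals[i] += weight
                counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


if __name__ == "__main__":
    adapters = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
    masks = [[1, 1, 0, 0], [0, 1, 1, 0]]
    print(aggregate_masked(adapters, masks))
```

Only the second coordinate is active in both masks, so it is the only place where the two adapters are averaged; sparse, mostly disjoint masks keep such collisions rare.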
Key Findings
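Average F1 gain of +10.9% (relative) over state-of-the-art baselines, aggregated over HotpotQA, 2WQA, PopQA, and CWQ.
Per-silo storage reduced by 78.8%–86.3% by clustering documents into shared adapters, relative to the no-clustering variant.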
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Average F1 improvement | +10.9% (relative) | state-of-the-art methods across four categories | +10.9% | aggregated over HotpotQA, 2WQA, PopQA, CWQ | Sec.4.2, Table 1 | Table 1 |
| Per-silo storage overhead | reduced to 11–21% of no-clustering variant | w/o clustering parametric RAG | 78.8%–86.3% reduction | clustering C∈{5,8,10} | Sec.4.4.1, Fig.4a | Fig.4a |
What To Try In 7 Days
Prototype per-silo LoRA adapters on a small private corpus and verify adapter training works locally.
Cluster related docs (k-means with max cluster size 5–10) and train a cluster adapter plus per-document masks.
Implement simple re-ranking and greedy mask-based selection; measure per-query bytes and F1 against a baseline system.
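For the clustering step, plain k-means does not bound cluster size, so a prototype needs some capping rule. The sketch below is one hypothetical option, not the paper's procedure: it assumes document embeddings and centroids already exist (for example from an ordinary k-means run) and assigns each document to the nearest centroid that still has room.

```python
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_with_cap(points: List[Sequence[float]],
                    centroids: List[Sequence[float]],
                    max_size: int) -> Dict[int, List[int]]:
    """Assign each point to the nearest centroid that still has capacity.

    Point-centroid pairs are processed from closest to farthest, so the
    most confident assignments claim cluster slots first.
    """
    if len(points) > len(centroids) * max_size:
        raise ValueError("not enough total capacity for all points")
    pairs = sorted(
        (squared_distance(p, c), pi, ci)
        for pi, p in enumerate(points)
        for ci, c in enumerate(centroids)
    )
    clusters: Dict[int, List[int]] = {ci: [] for ci in range(len(centroids))}
    assigned = set()
    for _, pi, ci in pairs:
        if pi in assigned or len(clusters[ci]) >= max_size:
            continue
        clusters[ci].append(pi)
        assigned.add(pi)
    return clusters


if __name__ == "__main__":
    docs = [[0.0], [0.1], [0.2], [5.0]]   # toy 1-D "embeddings"
    centers = [[0.0], [5.0]]
    print(assign_with_cap(docs, centers, max_size=2))
```

With a cap of 2, the third document spills into the second cluster even though it sits closer to the first. That is exactly the kind of forced grouping worth inspecting before training a cluster adapter on it.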
Risks & Boundaries
Limitations
Assumes each silo has the same base LLM and can train adapters and masks locally.
Conflict-aware selection is NP-hard; the paper uses a greedy heuristic, which may not be optimal.
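A greedy heuristic of this kind can be sketched as follows. The conflict measure (the fraction of a candidate's active mask coordinates already claimed by selected adapters), the `max_overlap` threshold, and the `budget` parameter are all illustrative assumptions rather than the paper's definitions.

```python
from typing import List, Set, Tuple

# (adapter id, relevance score, indices of active mask coordinates)
Candidate = Tuple[str, float, Set[int]]


def conflict(mask: Set[int], used: Set[int]) -> float:
    """Fraction of this mask's active coordinates that are already claimed."""
    if not mask:
        return 0.0
    return len(mask & used) / len(mask)


def greedy_select(candidates: List[Candidate],
                  max_overlap: float = 0.25,
                  budget: int = 3) -> List[str]:
    """Pick adapters in descending relevance, skipping high-conflict ones."""
    chosen: List[str] = []
    used: Set[int] = set()
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    for adapter_id, _score, mask in ranked:
        if len(chosen) >= budget:
            break
        if conflict(mask, used) <= max_overlap:
            chosen.append(adapter_id)
            used |= mask
    return chosen


if __name__ == "__main__":
    pool = [
        ("a", 0.9, {0, 1, 2, 3}),
        ("b", 0.8, {2, 3, 4, 5}),    # half of its mask collides with "a"
        ("c", 0.7, {6, 7, 8, 9}),
        ("d", 0.6, {3, 10, 11, 12}),  # a quarter of its mask collides
    ]
    print(greedy_select(pool))
```

Because the pass is a single sweep in relevance order, an early high-scoring pick can block a pair of lower-scoring adapters that would have been better together. This is the sub-optimality the limitation refers to.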
When Not To Use
When raw documents can be centralized securely and legal/operational costs of centralization are acceptable.
When silos lack compute to train adapters or a common LLM backbone.
Failure Modes
Poor clustering mixes unrelated docs and causes intra-adapter interference.
Masks that are too dense or poorly learned fail to prevent destructive aggregation.
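Both failure modes can be watched for with simple diagnostics before any aggregation happens. The sketch below flags masks whose density exceeds a threshold and pairs of masks with high Jaccard overlap; the thresholds and function names are hypothetical and would need tuning on real adapters.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def density(mask: List[int]) -> float:
    """Share of coordinates the binary mask activates."""
    return sum(mask) / len(mask) if mask else 0.0


def jaccard(a: List[int], b: List[int]) -> float:
    """Jaccard overlap between the active sets of two equal-length masks."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def flag_masks(masks: Dict[str, List[int]],
               max_density: float = 0.5,
               max_jaccard: float = 0.5
               ) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (overly dense masks, heavily overlapping mask pairs)."""
    dense = [name for name, m in masks.items() if density(m) > max_density]
    overlapping = [
        (a, b)
        for a, b in combinations(masks, 2)
        if jaccard(masks[a], masks[b]) > max_jaccard
    ]
    return dense, overlapping


if __name__ == "__main__":
    doc_masks = {
        "d1": [1, 1, 0, 0, 0, 0, 0, 0],
        "d2": [1, 1, 1, 0, 0, 0, 0, 0],
        "d3": [1, 1, 1, 1, 1, 1, 0, 0],
    }
    print(flag_masks(doc_masks))
```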

