Share tiny LoRA adapters so heterogeneous clients learn together with far less compute and bandwidth

October 20, 2023 · 6 min

Overview

Production Readiness: 0.7

Novelty Score: 0.6

Cost Impact Score: 0.8

Citation Count: 4

Authors

Liping Yi, Han Yu, Gang Wang, Xiaoguang Liu, Xiaoxiao Li

Why It Matters For Business

FedLoRA lets federated learning (FL) systems mix different client model architectures while cutting on-device compute and network usage, enabling FL on diverse hardware without requiring public data.

Summary TLDR

FedLoRA inserts a small, shared low-rank adapter (LoRA) into each client's larger, private model. Clients alternate between training their own model and the adapter, freezing one while training the other, and upload only the adapter to the server. This enables federated learning across different model architectures at much lower compute and communication cost while improving personalization. In CIFAR-10/100 experiments, FedLoRA improved accuracy by up to 1.35% over the best baselines and reduced computation by up to 11.81× and communication by up to 7.41× in the evaluated settings. The authors also prove an O(1/T) convergence rate for non-convex objectives.
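A minimal sketch of the server step described above, assuming FedAvg-style weighting by each client's local sample count (the summary does not say how adapters are weighted). The adapter layout (a dict of hypothetical "A"/"B" matrices stored as nested lists) and the function name aggregate_adapters are illustrative, not the authors' code. What it shows is that only these small, identically shaped tensors reach the server, while each client's private model never leaves the device.

```python
from typing import Dict, List

Matrix = List[List[float]]
Adapter = Dict[str, Matrix]  # hypothetical layout: {"A": rank x d_in, "B": d_out x rank}


def aggregate_adapters(adapters: List[Adapter], num_samples: List[int]) -> Adapter:
    """Weighted average of client adapters (FedAvg-style weighting is an assumption)."""
    if not adapters or len(adapters) != len(num_samples):
        raise ValueError("need one sample count per adapter")
    total = sum(num_samples)
    weights = [n / total for n in num_samples]
    global_adapter: Adapter = {}
    for name, first in adapters[0].items():
        rows, cols = len(first), len(first[0])
        # Averaging only works if every client uploads the same adapter shape.
        for adapter in adapters:
            m = adapter[name]
            if len(m) != rows or any(len(row) != cols for row in m):
                raise ValueError(f"shape mismatch in adapter tensor {name!r}")
        global_adapter[name] = [
            [sum(w * a[name][i][j] for w, a in zip(weights, adapters)) for j in range(cols)]
            for i in range(rows)
        ]
    return global_adapter


if __name__ == "__main__":
    client_1 = {"A": [[1.0, 0.0]], "B": [[2.0], [0.0]]}
    client_2 = {"A": [[3.0, 4.0]], "B": [[0.0], [2.0]]}
    print(aggregate_adapters([client_1, client_2], num_samples=[100, 300]))
    # → {'A': [[2.5, 3.0]], 'B': [[0.5], [1.5]]}
```

The server then broadcasts the averaged adapter back to every client, which is why the shared adapter must have a fixed shape even though the surrounding client models differ.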

Problem Statement

Standard federated learning requires all clients to share the same model architecture, which breaks down when clients run different models or have limited resources. Existing model-heterogeneous FL methods either need public data or impose high compute and communication costs. The goal is personalized federated training across heterogeneous client models with low computation and communication, while maintaining or improving accuracy.

Main Contribution

FedLoRA: a model-heterogeneous FL framework that inserts a small shared low-rank adapter (LoRA) into clients' fully connected layers and aggregates only adapters on the server.

Iterative local learning: alternate between freezing the adapter to train the local model and freezing the local model to train the adapter, enabling bidirectional global/local knowledge transfer.

Theoretical guarantee: non-convex convergence rate O(1/T) under standard smoothness and variance assumptions.

Empirical gains: on CIFAR-10/100, FedLoRA improves accuracy over six baselines while reducing client computation by up to 11.81× and communication by up to 7.41×.
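The adapter idea can be illustrated with the generic LoRA formulation on one fully connected layer: the private weight W stays frozen inside the client model, and a low-rank pair (A, B) adds a correction, y = Wx + b + (alpha / rank) · B(Ax). This is a sketch under assumptions, not FedLoRA's exact wiring: the class name LoRALinear, the alpha / rank scaling, and the initialization are standard LoRA conventions chosen for illustration, and for the adapter to be shared across heterogeneous clients it assumes the adapted FC layer has the same input and output sizes on every client.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """FC layer: frozen private weight W plus a shareable low-rank adapter (A, B).

    y = W x + b + (alpha / rank) * B (A x), with A: rank x d_in and B: d_out x rank.
    """

    def __init__(self, weight: Matrix, bias: Vector, rank: int,
                 alpha: float = 1.0, seed: int = 0) -> None:
        self.weight = weight  # private to the client, never uploaded
        self.bias = bias
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        rng = random.Random(seed)
        # Common LoRA initialization: small random A, zero B, so the adapter
        # contributes nothing until it has been trained.
        self.A: Matrix = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B: Matrix = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: Vector) -> Vector:
        base = [wx + b for wx, b in zip(matvec(self.weight, x), self.bias)]
        delta = matvec(self.B, matvec(self.A, x))  # low-rank path: d_in -> rank -> d_out
        return [y + self.scale * d for y, d in zip(base, delta)]

    def adapter_size(self) -> int:
        """Number of values in A and B, i.e. what a client would upload."""
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)


if __name__ == "__main__":
    layer = LoRALinear(weight=[[1.0, 2.0], [3.0, 4.0]], bias=[0.5, -0.5], rank=1)
    print(layer.forward([1.0, 1.0]), layer.adapter_size())  # → [3.5, 6.5] 4
```

Because B starts at zero, the freshly inserted adapter leaves the private model's outputs unchanged; only training (and later the aggregated global adapter) moves them.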

Key Findings

FedLoRA improves average test accuracy over state-of-the-art model-heterogeneous personalized FL (MHPFL) methods on CIFAR-10/100.

Numbers: up to +1.35% accuracy (largest gain reported on the evaluated benchmarks)

FedLoRA substantially reduces client computation; beyond the local model, the only extra parameters a client trains are those of the small adapter.

Numbers: up to 11.81× computation reduction

FedLoRA cuts communication by sending small adapters instead of full models.

Numbers: up to 7.41× communication saving

FedLoRA has provable convergence under standard assumptions.

Numbers: non-convex rate O(1/T)
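An O(1/T) non-convex rate is conventionally stated as a bound on the average expected squared gradient norm over T communication rounds. The generic form below shows what such a claim means; it is not the paper's exact theorem. Here L is the training objective, θ_t the model parameters at round t, and C a placeholder constant that would absorb the smoothness, variance, and step-size terms.

```latex
\frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}\left[\lVert \nabla \mathcal{L}(\theta_t) \rVert^2\right] \le \frac{C}{T}
```

Read as arithmetic: pushing the left-hand side below a target ε requires on the order of C/ε rounds, so halving the target roughly doubles the rounds needed.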

Results

Accuracy

Value: up to +1.35% vs the best baseline on evaluated settings

Baseline: best competing method (varies by setting)

Computation cost

Value: up to 11.81× reduction

Baseline: FedProto (reported comparison)

Communication cost

Value: up to 7.41× saving

Baseline: FedProto (reported comparison)

Theoretical convergence

Value: non-convex rate O(1/T)

What To Try In 7 Days

Prototype inserting small LoRA adapters into the FC layers of your client models and aggregating only the adapters on the server.

Implement iterative local training: freeze the adapter to train the local model, then freeze the local model to train the adapter.

Measure transmitted parameters and local FLOPs to confirm communication and compute savings.
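A back-of-the-envelope sketch for the measurement step: it counts parameters, float32 upload bytes, and approximate forward FLOPs (one multiply-add counted as 2 FLOPs) for a dense FC layer versus a rank-r adapter on that layer. The function names and the 512 × 256, rank-8 sizes are hypothetical. A real measurement should cover every uploaded tensor and the full local training loop, and these per-layer ratios are not expected to reproduce the paper's 11.81× or 7.41× figures, which compare FedLoRA against baseline methods.

```python
def dense_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Parameters in a dense FC layer (weight plus optional bias)."""
    return d_in * d_out + (d_out if bias else 0)


def adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in a rank-r adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def dense_forward_flops(d_in: int, d_out: int) -> int:
    """Approximate forward FLOPs of W x, counting one multiply-add as 2 FLOPs."""
    return 2 * d_in * d_out


def adapter_forward_flops(d_in: int, d_out: int, rank: int) -> int:
    """Approximate forward FLOPs of the low-rank path B (A x)."""
    return 2 * rank * (d_in + d_out)


def upload_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Bytes sent per round, assuming float32 values and no compression."""
    return num_params * bytes_per_param


if __name__ == "__main__":
    d_in, d_out, rank = 512, 256, 8  # hypothetical layer and adapter sizes
    full, small = dense_params(d_in, d_out), adapter_params(d_in, d_out, rank)
    print(f"upload: {upload_bytes(full)} B (full layer) vs {upload_bytes(small)} B (adapter)")
    extra = adapter_forward_flops(d_in, d_out, rank) / dense_forward_flops(d_in, d_out)
    print(f"adapter adds {extra:.1%} forward FLOPs to this layer")
```

Note that the adapter adds a small amount of compute on top of the layer it augments; any compute saving therefore has to be measured against what the competing methods run locally, not against the bare model.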

Optimization Features

Model Optimization

  • LoRA

System Optimization

  • fewer transmitted parameters, since only adapters are exchanged

Training Optimization

  • iterative alternation: freeze the adapter to train the local model, then freeze the local model to train the adapter
  • only the small adapter parameters are shared and aggregated
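A toy sketch of the alternating schedule, reduced to two scalars so it runs with no dependencies: the prediction is w·x + a, where w stands in for the private local model and a for the shared adapter. Phase 1 freezes a and takes gradient steps on w; phase 2 freezes w and updates a. The model form, data, learning rate, step counts, and function names are all hypothetical, and FedLoRA's actual objectives are not reproduced here; the sketch only shows block-wise (freeze one, train the other) gradient descent.

```python
from typing import List, Tuple


def mse(w: float, a: float, xs: List[float], ys: List[float]) -> float:
    """Mean squared error of the toy predictor w * x + a."""
    return sum((w * x + a - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad_w(w: float, a: float, xs: List[float], ys: List[float]) -> float:
    return sum(2 * (w * x + a - y) * x for x, y in zip(xs, ys)) / len(xs)


def grad_a(w: float, a: float, xs: List[float], ys: List[float]) -> float:
    return sum(2 * (w * x + a - y) for x, y in zip(xs, ys)) / len(xs)


def alternating_round(w: float, a: float, xs: List[float], ys: List[float],
                      lr: float = 0.1, steps: int = 10) -> Tuple[float, float]:
    # Phase 1: adapter a frozen, train the local-model parameter w.
    for _ in range(steps):
        w -= lr * grad_w(w, a, xs, ys)
    # Phase 2: local model w frozen, train the adapter parameter a.
    for _ in range(steps):
        a -= lr * grad_a(w, a, xs, ys)
    return w, a


if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # toy data: y = 2x + 1
    w, a = 0.0, 0.0
    for _ in range(50):
        w, a = alternating_round(w, a, xs, ys)
    print(round(w, 4), round(a, 4), mse(w, a, xs, ys))  # w and a approach 2 and 1
```

In a federated setting, a client would presumably overwrite a with the server's aggregated adapter before each round and upload only a afterward, keeping w local.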

Reproducibility

Data Available

Open Source Status

  • unknown

Risks & Boundaries

Limitations

  • Evaluated only on small CNNs with CIFAR-10/100; not tested on large models or real-world FL deployments
  • Adapters are matched to fully connected layers; the method may need rework for architectures without similar FC layers
  • Requires tuning adapter size and the loss weight μ for good performance

When Not To Use

  • When clients cannot share any model parameters or only allow secure aggregation of gradients
  • When clients' model architectures lack compatible fully connected layers for adapter insertion
  • For tasks outside supervised classification without further validation

Failure Modes

  • An immature global adapter early in training can hurt local model performance until adapters stabilize
  • Convergence can be slower than plain FedAvg because adapters require extra local training
  • May underperform if adapter rank or insertion points are poorly chosen

Core Entities

Models

  • LoRA
  • FedAvg
  • FedProto
  • FML
  • FedKD
  • LG-FedAvg
  • FD

Metrics

  • Accuracy
  • communication cost (transmitted parameters)
  • computation cost (FLOPs)
  • convergence rate

Datasets

  • CIFAR-10
  • CIFAR-100