AutoFLIP: Federated hybrid pruning guided by client loss exploration

May 16, 2024 · 7 min

Overview

Production Readiness

0.7

Novelty Score

0.6

Cost Impact Score

0.8

Citation Count

2

Authors

Christian Internò, Elena Raponi, Niki van Stein, Thomas Bäck, Markus Olhofer, Yaochu Jin, Barbara Hammer

Links

Abstract / PDF

Why It Matters For Business

In the reported experiments, AutoFLIP cuts client compute (FLOPs) by roughly 46% to 57% and communication volume by roughly 31% to 42%, while often improving accuracy on heterogeneous data. That makes federated deployments on edge devices cheaper and faster.

Summary TLDR

AutoFLIP is a federated learning (FL) method that runs a one‑time federated loss exploration step to identify which weights and structures matter across clients. It produces a global pruning mask that applies both unstructured (individual weights) and structured (neurons/filters) pruning each round. On non‑IID benchmarks (MNIST, CIFAR10/100, FEMNIST, Shakespeare) AutoFLIP cuts FLOPs and bandwidth needs while often improving final global accuracy versus FedAvg and pruning baselines (PruneFL, EFLPrune). The method is single‑server, works with standard optimizers, and is most helpful for complex models and strongly non‑IID data.

Problem Statement

Federated learning with large models faces two linked problems: high communication cost when sending full models each round, and heavy local compute on resource‑limited clients. Non‑IID client data increases update variance and hinders convergence. The paper asks: can we use a short, federated loss exploration to find a pruning mask that reduces compute and communication while aligning client updates and preserving or improving accuracy?

Main Contribution

A federated loss exploration phase: clients explore local loss landscapes for a limited number of epochs and return per‑parameter squared deviations to the server.

A hybrid pruning scheme that binarizes a global guidance matrix to prune both individual weights (unstructured) and whole units (structured) based on exploration.

Extensive experiments across non‑IID partitionings and architectures showing reduced FLOPs, reduced communication cost, and improved or maintained accuracy versus FedAvg, PruneFL and EFLPrune.
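The first two contributions can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the normalization to [0, 1], the row-mean rule for structured pruning, and the convention that a larger deviation marks a more important parameter are all assumptions made for illustration (flip the comparisons if the opposite convention applies).

```python
import numpy as np

def guidance_matrix(client_sq_devs):
    """Average per-parameter squared deviations over clients and scale to [0, 1]."""
    g = np.mean(np.stack(client_sq_devs), axis=0)
    peak = g.max()
    return g / peak if peak > 0 else g

def hybrid_mask(g, tp):
    """Binarize guidance g (rows = output units) with threshold tp.

    Assumption: scores below tp are pruned. A unit whose mean score is
    below tp loses its whole row (structured); otherwise weights are
    judged one by one (unstructured).
    """
    weight_keep = g >= tp                      # unstructured: per-weight decision
    unit_keep = g.mean(axis=1) >= tp           # structured: per-unit decision
    return weight_keep & unit_keep[:, None]    # a pruned unit drops its whole row

# Two hypothetical clients reporting squared deviations for a 2x2 layer.
devs = [np.array([[1.0, 0.1], [0.05, 0.0]]),
        np.array([[0.8, 0.1], [0.05, 0.04]])]
mask = hybrid_mask(guidance_matrix(devs), tp=0.3)
print(mask)
```

In this toy case only the first weight of the first unit survives: the second unit is removed as a whole, and the first unit loses one individual weight.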

Key Findings

Large accuracy gain on a hard non‑IID task (CIFAR‑100, ResNet).

Numbers: AutoFLIP 0.987 vs. FedAvg 0.918 (Δ +0.069 on CIFAR100 with ResNet)

Substantial reduction in client compute (FLOPs).

Numbers: ResNet FLOPs −52.8%, FEMNIST −56.5%, EfficientNet‑B3 −46.4%

Lower communication needs to reach target accuracy.

Numbers: Six‑layer CNN cost −41.61%; EfficientNet‑B3 −30.93% (GB transferred)

AutoFLIP uses a short initial exploration to form pruning guidance.

Numbers: Exploration used C_exp = C and E_exp up to 150 (ablation shows gains saturate after ~300 epochs)

Compression ratios reported per model.

Numbers: Compression ratios: six‑layer CNN 1.74x, EfficientNet‑B3 2.1x, ResNet 1.58x, FEMNIST 1.8x

Results

Accuracy

Value: 0.987

Baseline: FedAvg 0.918

FLOPs reduction (ResNet)

Value: 52.8% ↓

Baseline: original 7.8 GFLOPs → reduced 4.1 GFLOPs

Communication cost reduction (Six-layer CNN)

Value: 41.61% ↓

Baseline: 320.00 GB without AutoFLIP → 189.45 GB with AutoFLIP

Compression rate (EfficientNet-B3)

Value: 2.1x (≈52.38% parameters pruned)

Baseline: original model has 10,838,784 parameters
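The link between a compression rate and the share of parameters pruned can be checked with a few lines. This sketch assumes the compression rate is defined as original parameters divided by remaining parameters, which is consistent with the 2.1x and ≈52.38% pair above; the helper names are hypothetical.

```python
def pruned_fraction(compression_ratio):
    """Share of parameters removed, assuming ratio = original / remaining."""
    if compression_ratio < 1:
        raise ValueError("compression ratio must be at least 1")
    return 1.0 - 1.0 / compression_ratio

def remaining_parameters(total, compression_ratio):
    """Approximate parameter count left after pruning at the given ratio."""
    return round(total / compression_ratio)

frac = pruned_fraction(2.1)                        # EfficientNet-B3 ratio from above
left = remaining_parameters(10_838_784, 2.1)       # original count from above
print(f"{frac:.2%} pruned, about {left:,} parameters remain")
```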

Who Should Care

What To Try In 7 Days

Run AutoFLIP code on a small FL simulation (2–10 clients) to measure FLOPs and bandwidth versus FedAvg.

Tune the pruning threshold Tp to target a specific compression ratio and observe accuracy tradeoffs.

Profile client-side latency and energy before/after applying the produced pruning mask.
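For the threshold-tuning step, one simple way to hit a target compression ratio is to set Tp at a quantile of the guidance scores. This is an illustrative sketch, not the paper's procedure: it covers only the unstructured part of the mask, assumes that scores below Tp are pruned, and uses random scores in place of a real guidance matrix.

```python
import numpy as np

def threshold_for_compression(guidance, target_ratio):
    """Pick Tp so that about 1/target_ratio of the weights are kept."""
    keep_frac = 1.0 / target_ratio
    # Pruning the lowest (1 - keep_frac) share of scores leaves keep_frac.
    return float(np.quantile(guidance, 1.0 - keep_frac))

rng = np.random.default_rng(0)
guidance = rng.random(10_000)            # stand-in for real guidance scores
tp = threshold_for_compression(guidance, target_ratio=2.0)
mask = guidance >= tp
achieved = guidance.size / mask.sum()    # original / remaining parameters
print(f"Tp={tp:.3f}, achieved compression {achieved:.2f}x")
```

Sweeping `target_ratio` and recording accuracy at each setting gives the accuracy versus compression curve the second item asks for.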

Optimization Features

Infra Optimization

  • better fit for resource-constrained edge devices

Model Optimization

  • hybrid pruning (structured + unstructured)
  • binarized mask thresholding (Tp)

System Optimization

  • Accuracy
  • lower client compute requirements

Training Optimization

  • apply pruning mask before local training each round
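Applying the mask before local training can be sketched as one masked FedAvg round. The example below is a toy under stated assumptions: a linear model trained with plain gradient descent stands in for a neural network, the mask is fixed in advance, and `local_train` and `fedavg_round` are hypothetical names.

```python
import numpy as np

def local_train(w, mask, X, y, lr=0.1, epochs=5):
    """Masked gradient descent on squared error for a linear model."""
    w = w * mask                                  # apply mask before local training
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = (w - lr * grad) * mask                # pruned weights stay at zero
    return w

def fedavg_round(w_global, mask, clients):
    """One round: masked local training, then size-weighted averaging."""
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    updates = [local_train(w_global.copy(), mask, X, y) for X, y in clients]
    return np.average(np.stack(updates), axis=0, weights=sizes) * mask

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]
mask = np.array([1.0, 0.0, 1.0])                  # second weight is pruned
w_new = fedavg_round(np.ones(3), mask, clients)
print(w_new)
```

Because pruned entries are zero on every client, only the surviving weights would need to be transmitted, which is where the bandwidth saving comes from.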

Inference Optimization

  • structured pruning removes filters/neurons to drop FLOPs
  • reduced model size lowers transmission time
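Why structured pruning lowers FLOPs can be seen from the standard multiply-accumulate (MAC) count of a convolution layer. The layer sizes below are made up for illustration and are not taken from the paper.

```python
def conv_macs(h_out, w_out, c_in, c_out, k):
    """Multiply-accumulate count of a dense k x k convolution layer."""
    return h_out * w_out * c_in * c_out * k * k

def two_layer_macs(filters_kept):
    """MACs of two stacked 3x3 convs on a 32x32 map.

    Layer 1 keeps `filters_kept` filters, so layer 2 sees that many
    input channels.
    """
    layer1 = conv_macs(32, 32, c_in=3, c_out=filters_kept, k=3)
    layer2 = conv_macs(32, 32, c_in=filters_kept, c_out=64, k=3)
    return layer1 + layer2

dense = two_layer_macs(64)
pruned = two_layer_macs(32)      # half of layer 1's filters removed
saving = 1 - pruned / dense
print(f"MACs: {dense:,} -> {pruned:,} ({saving:.0%} fewer)")
```

Removing a filter shrinks both the layer that owns it and the input of the next layer. Zeroing scattered individual weights leaves these dense tensor shapes unchanged, so it only pays off on hardware or kernels that exploit sparsity.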

Reproducibility

Code Available

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Tested in a single‑server FL setting; multi‑server or hierarchical setups not evaluated.
  • Assumes all clients share the same initial model architecture, so that one pruning mask fits every client.
  • Does not evaluate label noise or intentional adversarial clients during exploration.
  • Unstructured sparsity benefits may not map to speedups on all hardware.

When Not To Use

  • Clients run widely different model architectures that cannot share a single pruning mask.
  • Hierarchical or multi‑server deployments where centralized exploration is infeasible.
  • Datasets with heavy label noise or suspected adversarial participants during exploration.
  • Deployments whose expected speedup depends on hardware support for unstructured sparsity that the target devices lack.

Failure Modes

  • Over‑aggressive Tp setting prunes important parameters and reduces accuracy.
  • The exploration phase may leak sensitive information through the per‑parameter statistics clients send to the server unless it is privacy‑protected (not addressed in the paper).
  • Mismatch between pruning mask and client hardware causes no real runtime gain despite lower FLOPs.
  • Using too few explorer clients yields a poor global guidance matrix and suboptimal pruning.

Core Entities

Models

  • ResNet
  • EfficientNet-B3
  • Six-layer CNN
  • LSTM (two-layer)

Metrics

  • Accuracy
  • FLOPs
  • compression rate
  • communication cost (GB)
  • number of parameters

Datasets

  • MNIST
  • CIFAR10
  • CIFAR100
  • FEMNIST
  • Shakespeare

Benchmarks

  • LEAF