Overview
Production Readiness
0.5
Novelty Score
0.4
Cost Impact Score
0.6
Citation Count
1
Why It Matters For Business
Pruning cuts model size and theoretical compute but does not guarantee runtime speedups. Measuring on your own hardware, and trying a smaller architecture before pruning, saves cost and deployment time.
Summary TLDR
This paper benchmarks unstructured (magnitude) and structured (DepGraph) pruning on five Transformer-based multivariate time-series models across standard datasets. Key takeaways: most models can be pruned to ~50% parameter density with little loss; Autoformer and FEDformer tolerate extreme pruning (down to ~1% of the original parameters) on the evaluated data; structured pruning cuts FLOPs sharply (up to ~7.6×) but yields little wall-clock speedup on the tested hardware; fine-tuning after pruning helps, but results vary by model. Smaller models also often outperform oversized ones on small datasets, so choose model size carefully.
Problem Statement
Transformer-based time-series models are growing large and costly. It is unclear how well common pruning methods (unstructured magnitude pruning and structured DepGraph node pruning) reduce model size, runtime, and predictive error for state-of-the-art time-series Transformers in practical settings.
Main Contribution
Trains and prunes five Transformer-based time-series models (Transformer, Informer, Autoformer, FEDformer, Crossformer) on multiple public datasets and horizons.
Compares unstructured magnitude pruning and structured pruning (torch-pruning / DepGraph) on predictive loss, parameter density, FLOPs and measured inference time.
Measures the effect of fine-tuning after pruning, experiments with reduced model sizes, and studies pruning on a very large dataset (ENTSO-E).
Key Findings
Most models sustain pruning to about 50% density with little test loss increase.
Autoformer and FEDformer remain competitive even at extreme sparsity (down to ~1% parameter density on the evaluated data).
Structured pruning can reduce theoretical FLOPs a lot but gives small real speed gains on tested hardware.
Fine-tuning after pruning recovers accuracy variably across models.
Smaller models often outperform large ones on small datasets; right-sizing the model beats pruning in some cases.
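A large FLOP reduction need not translate into a comparable wall-clock speedup. Amdahl's law gives a simple way to reason about this: if only part of the runtime is spent in layers that pruning shrinks, the untouched remainder caps the overall gain. The sketch below is an illustration, not the paper's analysis; the 50% runtime share is a hypothetical value, and the function optimistically assumes that the pruned layers' runtime scales linearly with their FLOPs.

```python
def end_to_end_speedup(prunable_fraction: float, flop_reduction: float) -> float:
    """Amdahl's-law estimate of overall speedup.

    prunable_fraction: share of baseline runtime spent in layers that pruning
        shrinks (hypothetical here; measure it with a profiler in practice).
    flop_reduction: factor by which those layers' work shrinks, assuming
        (optimistically) that their runtime scales linearly with FLOPs.
    """
    if not 0.0 <= prunable_fraction <= 1.0:
        raise ValueError("prunable_fraction must be in [0, 1]")
    if flop_reduction <= 0:
        raise ValueError("flop_reduction must be positive")
    # Unpruned work runs at the old speed; pruned work shrinks by flop_reduction.
    remaining = (1.0 - prunable_fraction) + prunable_fraction / flop_reduction
    return 1.0 / remaining


# A 7.6x FLOP cut in layers that take half the runtime gives under 2x overall.
print(round(end_to_end_speedup(0.5, 7.6), 2))  # prints 1.77
```

Attention softmax, normalization, decomposition blocks, and data movement all fall into the "unpruned" share, which is one reason measured speedups lag FLOP reductions.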
Results
Safe pruning baseline
Extreme pruning tolerance (Auto/FED)
Structured pruning — FLOP reduction
Structured pruning — measured speedup
Fine-tuning effect
Right-sizing vs pruning
Who Should Care
What To Try In 7 Days
Apply unstructured magnitude pruning to 50% density on your model; validate test MSE.
Fine-tune pruned model for a few epochs and compare accuracy vs baseline.
Benchmark structured pruning (DepGraph) to measure FLOP and real runtime change on your target GPU and inference stack.
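The first step above can be sketched in a few lines. This is a minimal NumPy illustration of global magnitude pruning, assuming the weights are available as named arrays; the layer names and shapes are hypothetical. In PyTorch, `torch.nn.utils.prune.global_unstructured` with `pruning_method=prune.L1Unstructured` provides the same kind of masking. As in the paper's setup, the mask only zeroes entries while the tensors stay dense, so this step alone does not make inference faster.

```python
import numpy as np


def global_magnitude_masks(weights: dict[str, np.ndarray], density: float) -> dict[str, np.ndarray]:
    """Boolean keep-masks retaining the `density` fraction of weights with the
    largest absolute values, ranked globally across all tensors."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    magnitudes = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    n_keep = max(1, int(round(density * magnitudes.size)))
    k = magnitudes.size - n_keep
    # k-th smallest magnitude; everything at or above it is kept (ties are all kept).
    threshold = np.partition(magnitudes, k)[k]
    return {name: np.abs(w) >= threshold for name, w in weights.items()}


# Hypothetical layer names and shapes, for illustration only.
rng = np.random.default_rng(0)
weights = {"attn.q_proj": rng.normal(size=(64, 64)), "ffn.linear1": rng.normal(size=(64, 256))}
masks = global_magnitude_masks(weights, density=0.5)
pruned = {name: w * masks[name] for name, w in weights.items()}  # zeros stay in dense storage

kept = sum(int(m.sum()) for m in masks.values())
total = sum(m.size for m in masks.values())
print(f"density = {kept / total:.3f}")  # density = 0.500
```

Ranking globally rather than per layer lets the method prune redundant layers harder than sensitive ones; a per-layer threshold is the simpler alternative if some layers collapse.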
Optimization Features
Infra Optimization
- Runtime measured on NVIDIA A100/H100 GPUs; gains depend on the available CUDA kernels
Model Optimization
- Unstructured magnitude pruning (masking weights)
- Structured node pruning via DepGraph (torch-pruning)
- Reduce linear embedding sizes (right-size model)
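Structured pruning removes whole units, so the pruned layers become smaller dense matrices. The difficulty DepGraph addresses is that removing a unit forces matching edits in every coupled layer. The hand-written NumPy sketch below illustrates that coupling for two consecutive linear layers only; it is not torch-pruning's API, and scoring units by the L2 norm of their incoming weights is just one possible criterion.

```python
import numpy as np


def prune_hidden_units(w1, b1, w2, keep_ratio):
    """Structurally prune hidden units between two linear layers.

    Layer 1: h = relu(x @ w1 + b1), w1 shape (d_in, d_hidden)
    Layer 2: y = h @ w2,            w2 shape (d_hidden, d_out)
    Removing hidden unit j means dropping column j of w1, entry j of b1 and
    row j of w2 together: the kind of coupled group a dependency graph tracks.
    """
    d_hidden = w1.shape[1]
    n_keep = max(1, int(round(keep_ratio * d_hidden)))
    # Score each unit by the L2 norm of its incoming weights (one simple criterion).
    scores = np.linalg.norm(w1, axis=0)
    keep = np.sort(np.argsort(scores)[-n_keep:])
    return w1[:, keep], b1[keep], w2[keep, :], keep


def matmul_flops(d_in, d_hidden, d_out, batch=1):
    # 2 FLOPs (multiply + add) per weight per input row; biases ignored.
    return 2 * batch * (d_in * d_hidden + d_hidden * d_out)


rng = np.random.default_rng(0)
w1, b1, w2 = rng.normal(size=(64, 256)), rng.normal(size=256), rng.normal(size=(256, 64))
p1, pb1, p2, kept_idx = prune_hidden_units(w1, b1, w2, keep_ratio=0.25)
print(p1.shape, p2.shape)  # (64, 64) (64, 64)
print(matmul_flops(64, 256, 64) / matmul_flops(64, 64, 64))  # 4.0
```

Because the surviving matrices are dense, the FLOP reduction is real, but as the paper observes, it may still produce little wall-clock gain when other parts of the model dominate runtime.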
System Optimization
- Real speedups from unstructured sparsity require hardware/software support for sparse kernels
Training Optimization
- Train larger model then prune and fine-tune
- Accuracy
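The prune-then-fine-tune loop can be illustrated on a toy problem. In this sketch a linear least-squares model stands in for a Transformer and a closed-form solve stands in for training (both simplifying assumptions); the point is that fine-tuning updates only the surviving weights, so the mask stays fixed. PyTorch's `torch.nn.utils.prune` achieves the same effect by recomputing each pruned weight from `weight_orig` and `weight_mask` on every forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
w_true = rng.normal(size=20)
y = X @ w_true + 0.1 * rng.normal(size=200)


def mse(w):
    return float(np.mean((X @ w - y) ** 2))


# 1) "Train" the dense model (closed-form least squares stands in for training).
w_dense, *_ = np.linalg.lstsq(X, y, rcond=None)

# 2) Prune: keep the 50% largest-magnitude weights.
mask = np.abs(w_dense) >= np.median(np.abs(w_dense))
w = w_dense * mask
loss_after_prune = mse(w)

# 3) Fine-tune with the mask fixed: pruned weights get zero update and stay at 0.
lr = 0.05
for _ in range(200):
    grad = 2.0 / len(y) * X.T @ (X @ w - y)
    w -= lr * grad * mask

print(f"dense {mse(w_dense):.4f}  pruned {loss_after_prune:.4f}  fine-tuned {mse(w):.4f}")
```

Fine-tuning can only recover accuracy within the subspace the mask leaves open, which is one reason recovery differs between models.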
Inference Optimization
- Structured pruning reduces FLOPs but does not guarantee runtime gains
- TensorRT compilation failed for the published model implementations
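Since FLOP counts do not predict latency, pruned and baseline models should be compared by measured wall-clock time. A minimal, framework-agnostic timing harness might look like this; the warm-up and iteration counts are arbitrary defaults, and the two workloads in the usage lines are placeholders for your baseline and pruned models.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], warmup: int = 10, iters: int = 50) -> float:
    """Median wall-clock latency of fn() in milliseconds.

    Warm-up calls absorb one-off costs such as lazy initialization, caching
    and autotuning. For GPU work, fn must wait for the device to finish (for
    example by calling torch.cuda.synchronize() at its end); otherwise only
    kernel-launch overhead is timed.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


# Placeholder workloads standing in for the baseline and pruned models.
baseline_ms = median_latency_ms(lambda: sum(i * i for i in range(100_000)))
pruned_ms = median_latency_ms(lambda: sum(i * i for i in range(25_000)))
print(f"measured speedup: {baseline_ms / pruned_ms:.2f}x")
```

Reporting the median rather than the mean keeps occasional outliers (garbage collection, other processes on the machine) from skewing the comparison.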
Reproducibility
Code Urls
- Not yet released; the authors state the code will be made public upon publication
Data Urls
- https://transparency.entsoe.eu (ENTSO-E)
- datasets from Autoformer/Informer source repositories (public)
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Unstructured sparsity is only simulated by masking; no specialized sparse kernels are used, so there is no native speedup.
- DepGraph structured pruning failed to reach target sparsities for some models and datasets.
- TensorRT compilation for faster inference failed for the published implementations, which blocked deployment testing.
- Experiments limited to five Transformer variants and selected public datasets; results may differ for other models or domains.
When Not To Use
- If your inference stack lacks sparse-kernel or compiler support, unstructured pruning will not reduce runtime.
- If model code cannot be recompiled or simplified for TensorRT, structured pruning may not yield runtime gains.
- For tiny datasets where right-sizing the model is cheaper and more stable than pruning.
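For a sense of scale when right-sizing, the parameter count of a standard Transformer encoder layer grows roughly with the square of the embedding width. The sketch below assumes a vanilla layer (biased Q/K/V/output projections, a two-layer feed-forward block, two LayerNorms); the models evaluated in the paper use different attention and decomposition blocks, so treat it as an approximation, and the widths 512/2048 are illustrative. Halving both widths cuts parameters about 4×, similar to pruning to ~25% density, but the smaller model stays dense and needs no sparse kernels.

```python
def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Parameter count of one vanilla Transformer encoder layer (with biases).

    Assumes Q, K, V and output projections (each d_model x d_model + bias),
    a two-layer feed-forward block, and two LayerNorms (scale + shift).
    """
    attention = 4 * (d_model * d_model + d_model)
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * (2 * d_model)
    return attention + feed_forward + layer_norms


big = encoder_layer_params(512, 2048)
small = encoder_layer_params(256, 1024)
print(big, small, round(big / small, 2))  # 3152384 789760 3.99
```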
Failure Modes
- Exploding gradients or out-of-memory errors at extreme sparsities during training.
- DepGraph pruner reaching unexpectedly lower-than-target sparsity because of dependency-graph constraints.
- FLOP reduction without wall-clock speedup due to kernel and non-linear-module bottlenecks.
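One way lower-than-target sparsity can arise is that some parameters cannot be removed without breaking shapes elsewhere, so a structured pruner leaves them intact. The arithmetic below is a hypothetical illustration of that effect with made-up layer names and sizes; it does not model DepGraph's actual grouping logic.

```python
def achieved_sparsity(layer_params: dict[str, int], target: float, protected: set[str]) -> float:
    """Global sparsity when `target` sparsity is applied only to unprotected layers.

    Protected layers (hypothetically, ones whose dimensions are pinned by the
    number of input features or the forecast horizon) keep every parameter.
    """
    total = sum(layer_params.values())
    prunable = sum(n for name, n in layer_params.items() if name not in protected)
    return target * prunable / total


# Made-up parameter counts for illustration.
sizes = {"input_embedding": 10_000, "encoder": 80_000, "output_projection": 10_000}
result = achieved_sparsity(sizes, target=0.9, protected={"input_embedding", "output_projection"})
print(round(result, 2))  # 0.72, short of the 0.9 target
```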
Core Entities
Models
- Transformer
- Informer
- Autoformer
- FEDformer
- Crossformer
Metrics
- MSE
- Parameter density
- Sparsity
- FLOP reduction
- Inference speedup
- Epoch time
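The metrics above are related by simple ratios. The definitions below are the conventional ones and are assumed here rather than quoted from the paper; the example numbers are made up to mirror the pattern of a large FLOP reduction with only a small measured speedup.

```python
def parameter_density(nonzero_params: int, total_params: int) -> float:
    """Fraction of parameters that remain nonzero after pruning."""
    return nonzero_params / total_params


def sparsity(nonzero_params: int, total_params: int) -> float:
    """Fraction of parameters removed; density + sparsity = 1."""
    return 1.0 - parameter_density(nonzero_params, total_params)


def ratio(baseline: float, pruned: float) -> float:
    """Baseline / pruned: FLOP reduction for FLOP counts, speedup for latencies."""
    return baseline / pruned


# Made-up numbers: a 10M-parameter model pruned to 1% density, with FLOPs
# falling from 7.6 GFLOPs to 1.0 GFLOPs but latency only from 12 ms to 11 ms.
print(f"density={parameter_density(100_000, 10_000_000):.2f}, "
      f"sparsity={sparsity(100_000, 10_000_000):.2f}")  # density=0.01, sparsity=0.99
print(f"FLOP reduction={ratio(7.6e9, 1.0e9):.1f}x, "
      f"speedup={ratio(12.0, 11.0):.2f}x")  # FLOP reduction=7.6x, speedup=1.09x
```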
Datasets
- ETTm1
- ETTm2
- ECL
- Exchange
- Traffic
- Weather
- ENTSO-E

