Overview
An empirical benchmark across five models and multiple datasets shows repeatable trends. Results are strong on FLOP and density metrics but hardware-dependent for runtime; the authors state that code will be released, which aids reproducibility.
Citations: 1
Evidence Strength: 0.70
Confidence: 0.86
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 6/6
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 50%
Novelty: 40%
Why It Matters For Business
Pruning cuts model size and theoretical compute but doesn't guarantee runtime speedups; measuring on your hardware and considering smaller architectures first saves cost and deployment time.
Summary TLDR
This paper benchmarks unstructured (magnitude) and structured (DepGraph) pruning on five Transformer-based multivariate time-series models across standard datasets. Key takeaways: roughly 50% of weights can usually be pruned with little loss; Autoformer and FEDformer tolerate extreme pruning (down to ~1% of the original parameters) on the evaluated data; structured pruning cuts FLOPs substantially (up to ~7.6×) but yields little wall‑clock speedup on the tested hardware; fine-tuning after pruning helps, but results vary by model. Smaller models also often outperform oversized ones on small datasets, so choose model size carefully.
Problem Statement
Transformer-based time-series models are growing large and costly. It is unclear how well common pruning methods (unstructured magnitude pruning and structured DepGraph node pruning) reduce model size, runtime, and predictive error for state-of-the-art time-series Transformers in practical settings.
Main Contribution
Trains and prunes five Transformer-based time-series models (Transformer, Informer, Autoformer, FEDformer, Crossformer) on multiple public datasets and horizons.
Compares unstructured magnitude pruning and structured pruning (torch-pruning / DepGraph) on predictive loss, parameter density, FLOPs and measured inference time.
Key Findings
Most models sustain pruning to about 50% density with little test loss increase.
Autoformer and FEDformer remain competitive even when pruned to very high sparsity.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Safe pruning baseline | Many models maintain test MSE at ≈50% density | unpruned | ≈2× parameter reduction | Multiple datasets (ETT, ECL, Traffic, Weather, ENTSO-E) | Fig.1, Sec.4.1 | Fig.1 |
| Extreme pruning tolerance (Auto/FED) | Comparable loss at ~1% density | other models at higher density | down to 1% params | Various small datasets (reported in Sec.4.1) | Sec.4.1, Fig.1 | Sec.4.1 |
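
The Delta column follows directly from the density figures: keeping a fraction d of the weights corresponds to a 1/d reduction in parameter count. A minimal sketch of that arithmetic (the function name `compression_factor` is illustrative and not taken from the paper):

```python
def compression_factor(density: float) -> float:
    """Return the parameter-reduction factor implied by a remaining-weight density."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return 1.0 / density


# The two density levels reported in the table above.
for density in (0.5, 0.01):
    factor = compression_factor(density)
    print(f"density {density:.0%} -> about {factor:.0f}x fewer parameters")
```

This counts stored parameters only; it says nothing about FLOPs or wall-clock time, which the summary reports separately.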
What To Try In 7 Days
Apply unstructured magnitude pruning to 50% density on your model; validate test MSE.
Fine-tune pruned model for a few epochs and compare accuracy vs baseline.
Benchmark structured pruning (DepGraph) to measure FLOP and real runtime change on your target GPU and inference stack.
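
The first step above can be sketched in a few lines. This is a toy, framework-free illustration of magnitude pruning on a flat list of weights, not the paper's implementation; the helper names `magnitude_prune` and `achieved_density` and the example weights are assumptions made for illustration. On a real model you would more likely use your framework's pruning utilities (PyTorch, for example, ships `torch.nn.utils.prune`).

```python
def magnitude_prune(weights, density):
    """Zero the smallest-magnitude entries so that about `density` of them remain.

    `weights` is a flat list of floats. Returns (pruned_weights, mask).
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    n_keep = max(1, round(len(weights) * density))
    # Rank indices by absolute value, largest first, and keep the top n_keep.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:n_keep])
    mask = [1 if i in keep else 0 for i in range(len(weights))]
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask


def achieved_density(mask):
    """Fraction of weights that survive pruning."""
    return sum(mask) / len(mask)


# Made-up weights, pruned to the 50% density suggested above.
weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08]
pruned, mask = magnitude_prune(weights, density=0.5)
print("mask:", mask)
print("achieved density:", achieved_density(mask))
```

Checking the achieved density against the target is worth keeping in any real pipeline, since the summary notes that structured pruning sometimes lands below the requested sparsity.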
Risks & Boundaries
Limitations
Unstructured sparsity was applied only as weight masks; no specialized sparse kernels were used, so it yields no native speedup.
DepGraph structured pruning failed to reach target sparsities for some models and datasets.
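
The first limitation can be made concrete by counting multiply-accumulates (MACs) for a single linear layer. A mask zeroes values but leaves the tensor shape unchanged, so a dense kernel performs the same work; removing whole output nodes shrinks the shape and therefore the count. The sketch below uses made-up layer sizes and a hypothetical helper `dense_linear_macs`; it is not the paper's FLOP counter.

```python
def dense_linear_macs(in_features, out_features):
    """Multiply-accumulates for one input vector through a dense linear layer."""
    return in_features * out_features


in_f, out_f = 512, 2048  # illustrative sizes, not from the paper

# Unstructured masking at 50% density: shape is still (out_f, in_f),
# so a dense kernel executes every multiply, including those by zero.
masked_macs = dense_linear_macs(in_f, out_f)

# Structured pruning that removes half of the output nodes: the layer shrinks.
structured_macs = dense_linear_macs(in_f, out_f // 2)

print("masked (dense kernel):", masked_macs)
print("structured (half the nodes):", structured_macs)
```

Even the structured count is only theoretical: the summary reports large FLOP reductions with little wall-clock speedup on the tested hardware.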
When Not To Use
If your inference stack lacks sparse kernel or compiler support, unstructured pruning won't reduce runtime.
If model code cannot be recompiled or simplified for TensorRT, structured pruning may not yield runtime gains.
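
Whether either condition applies is easiest to settle by timing the baseline and pruned models on the target stack. A minimal timing harness, assuming a callable that runs one inference: `median_latency_ms` and `fake_inference` are hypothetical names, and the workload is a CPU stand-in rather than a real model. On a GPU you would also need to synchronize the device before reading the clock (for example with `torch.cuda.synchronize()` in PyTorch).

```python
import statistics
import time


def median_latency_ms(fn, warmup=5, repeats=30):
    """Median wall-clock latency of `fn()` in milliseconds, after warm-up runs."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def fake_inference(n):
    """Stand-in workload; replace with a call to your own model."""
    return sum(i * i for i in range(n))


baseline_ms = median_latency_ms(lambda: fake_inference(50_000))
pruned_ms = median_latency_ms(lambda: fake_inference(25_000))
print(f"baseline {baseline_ms:.3f} ms, pruned {pruned_ms:.3f} ms")
print(f"measured speedup: {baseline_ms / pruned_ms:.2f}x")
```

The median is used rather than the mean so that occasional scheduler or cache hiccups do not dominate the result; compare the measured ratio against the theoretical FLOP ratio before committing to a pruning strategy.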
Failure Modes
Exploding gradients or out-of-memory errors at extreme sparsities during training.
DepGraph pruner producing unexpectedly lower-than-target sparsity because of dependency-graph constraints.

