Overview
Production Readiness
0.7
Novelty Score
0.65
Cost Impact Score
0.6
Citation Count
0
Why It Matters For Business
MoFE-Time improves short- to long-horizon forecasting where periodic patterns exist. It lowers error relative to a leading MoE baseline (Time-MoE) while keeping inference cost similar, so operational forecasting (store traffic, sales, energy) can become more accurate without extra latency.
Summary TLDR
MoFE-Time adds a Frequency-Time Cell (FTC) inside Mixture-of-Experts (MoE) blocks and uses a pretraining → fine-tuning workflow to learn both periodic (frequency) and temporal features. On six public benchmarks and a proprietary NEV-sales dataset, it reduces MSE and MAE relative to Time-MoE (the main baseline) while keeping inference speed comparable.
Problem Statement
Current large time-series models either ignore intrinsic frequency (periodic) structure or convert signals to frequency space outside the model. That leads to suboptimal forecasts and poor cross-dataset transfer when series have varying periodicity and non-stationarity.
Main Contribution
MoFE-Time: integrate a Frequency-Time Cell (FTC) inside each MoE expert to learn frequency and time features jointly.
Adopt a pretraining → fine-tuning workflow (use Time-300B for pretraining) to transfer prior pattern knowledge across datasets.
Incorporate RevIN (reversible instance normalization) and temporal aggregation to handle non-stationarity and variable-length series.
Collect NEV-sales, a proprietary daily store-traffic dataset (~330k points across 498 series), to test commercial performance.
Key Findings
MoFE-Time improves average forecast error on six public benchmarks compared to Time-MoE.
On the proprietary NEV-sales commercial dataset, MoFE-Time outperforms Time-MoE.
Pretraining provides the largest single ablation benefit; FTC and RevIN also help.
FTC experts learn to concentrate spectral energy at the true harmonics; replacing FTC with a plain feedforward layer weakens both spectral focus and prediction accuracy.
MoFE-Time has similar or faster inference time than Time-MoE at a comparable parameter count.
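The "concentrated energy at true harmonics" finding can be probed with a simple spectral diagnostic. The sketch below is an illustration only, not the paper's analysis procedure: `harmonic_energy_ratio` is a hypothetical helper that reports what fraction of a signal's non-DC spectral energy falls near the first few multiples of a known fundamental frequency. The period, tolerance, and test signals are assumptions chosen for the example.

```python
import numpy as np

def harmonic_energy_ratio(x, period, n_harmonics=3, tol=1):
    """Fraction of non-DC spectral energy within `tol` bins of the first
    `n_harmonics` multiples of the fundamental frequency 1/period."""
    x = np.asarray(x, dtype=float)
    n = x.size
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    power[0] = 0.0  # ignore any residual DC component
    total = power.sum()
    if total == 0:
        return 0.0
    mask = np.zeros_like(power, dtype=bool)
    for k in range(1, n_harmonics + 1):
        center = int(round(k * n / period))  # FFT bin of the k-th harmonic
        lo = max(center - tol, 1)
        hi = min(center + tol, power.size - 1)
        if lo <= hi:
            mask[lo:hi + 1] = True
    return float(power[mask].sum() / total)

rng = np.random.default_rng(0)
t = np.arange(480)
# Synthetic series with a 24-step period plus its second harmonic.
periodic = np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 12)
noise = rng.standard_normal(480)

ratio_periodic = harmonic_energy_ratio(periodic, period=24)
ratio_noise = harmonic_energy_ratio(noise, period=24)
print(f"periodic: {ratio_periodic:.3f}, noise: {ratio_noise:.3f}")
```

Applied to an expert's output (or to forecast residuals), a ratio near 1 suggests energy is focused at the expected harmonics, while a low ratio suggests a diffuse spectrum.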
Results
Average MSE (public benchmarks)
Average MAE (public benchmarks)
NEV-sales MSE (commercial dataset)
NEV-sales MAE (commercial dataset)
Inference speed
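For reference, the error metrics named above are the standard mean squared error and mean absolute error. The sketch below shows how they are computed and averaged, assuming a plain unweighted mean across datasets (the exact averaging protocol is not stated here). The dataset names and values are toy placeholders, not results from the paper.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over all forecast points."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))

def mae(y_true, y_pred):
    """Mean absolute error over all forecast points."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

# Toy (truth, forecast) pairs for two hypothetical datasets.
toy = {
    "dataset_a": (np.array([1.0, 2.0, 3.0, 4.0]), np.array([1.0, 2.0, 3.0, 8.0])),
    "dataset_b": (np.array([0.0, 0.0]), np.array([1.0, -1.0])),
}

per_dataset_mse = {name: mse(t, p) for name, (t, p) in toy.items()}
per_dataset_mae = {name: mae(t, p) for name, (t, p) in toy.items()}

# Unweighted average across datasets (an assumption for this example).
avg_mse = float(np.mean(list(per_dataset_mse.values())))
avg_mae = float(np.mean(list(per_dataset_mae.values())))
```

In `dataset_a` a single large miss dominates the MSE but not the MAE, which is why the two metrics are usually reported together.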
Who Should Care
What To Try In 7 Days
If you use a transformer MoE baseline, add a frequency-aware expert block (FTC) and compare against the baseline on a holdout set.
Apply RevIN at input to reduce non-stationarity before training or fine-tuning.
If you have cross-domain series, pretrain on diverse time-series or fine-tune a large pre-trained checkpoint to capture prior patterns.
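The RevIN step in the list above can be prototyped in a few lines. This is a minimal NumPy sketch, assuming the learnable affine parameters of the original RevIN formulation are omitted and that inputs have shape `(batch, time, channels)`; it is not the paper's implementation. Each series is normalized with its own mean and standard deviation, and the stored statistics are used to map model outputs back to the original scale.

```python
import numpy as np

class RevIN:
    """Reversible instance normalization, without learnable affine terms."""

    def __init__(self, eps=1e-5):
        self.eps = eps
        self.mean = None
        self.std = None

    def normalize(self, x):
        # x: (batch, time, channels); statistics are per series, over time.
        self.mean = x.mean(axis=1, keepdims=True)
        self.std = np.sqrt(x.var(axis=1, keepdims=True) + self.eps)
        return (x - self.mean) / self.std

    def denormalize(self, y):
        # y: (batch, horizon, channels) in normalized space.
        return y * self.std + self.mean

rng = np.random.default_rng(0)
# Two synthetic series with very different levels and scales.
scale = np.array([1.0, 50.0]).reshape(2, 1, 1)
offset = np.array([0.0, 1000.0]).reshape(2, 1, 1)
x = rng.standard_normal((2, 96, 1)) * scale + offset

revin = RevIN()
x_norm = revin.normalize(x)         # feed this to the model
x_back = revin.denormalize(x_norm)  # apply to model outputs in practice
```

In a real pipeline the model would sit between `normalize` and `denormalize`; the round trip here only checks that the transform is reversible.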
Agent Features
Memory
- pretrained weights (transfer of prior pattern knowledge)
Architectures
- MoE
- Transformer-style attention
- Frequency-Time Cell (FTC)
Optimization Features
Model Optimization
- sparse MoE experts to focus compute
- frequency-aware expert structure (FTC) to concentrate spectral energy
Training Optimization
- pretraining on Time-300B then one-epoch fine-tuning on target sets
- use of RevIN to stabilize training across non-stationary windows
Inference Optimization
- sparse routing keeps inference cost comparable to Time-MoE
- model of ~118M parameters, evaluated on an A100 GPU
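To illustrate why sparse routing keeps compute in check, here is a minimal top-k gating sketch in NumPy. It is a generic illustration and not the router used by MoFE-Time or Time-MoE: the experts are plain linear maps, there is no load-balancing loss, and all names (`topk_route`, `gate_w`, `expert_ws`) are hypothetical. Each token is processed by only `k` experts, so per-token cost scales with `k` rather than with the total number of experts.

```python
import numpy as np

def topk_route(x, gate_w, expert_ws, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:         (tokens, d) token representations
    gate_w:    (d, n_experts) gating weights
    expert_ws: (n_experts, d, d) one linear map per expert
    """
    logits = x @ gate_w                                # (tokens, n_experts)
    topk = np.argsort(logits, axis=1)[:, -k:]          # indices of the k largest logits
    sel = np.take_along_axis(logits, topk, axis=1)     # (tokens, k)
    sel = np.exp(sel - sel.max(axis=1, keepdims=True))
    weights = sel / sel.sum(axis=1, keepdims=True)     # softmax over selected experts only
    out = np.zeros_like(x)
    for e in range(expert_ws.shape[0]):
        tok, slot = np.nonzero(topk == e)              # tokens routed to expert e
        if tok.size:
            # Only the routed tokens are multiplied by this expert's weights.
            out[tok] += weights[tok, slot][:, None] * (x[tok] @ expert_ws[e])
    return out, topk

rng = np.random.default_rng(0)
tokens, d, n_experts = 16, 8, 4
x = rng.standard_normal((tokens, d))
gate_w = rng.standard_normal((d, n_experts))
expert_ws = rng.standard_normal((n_experts, d, d))

out, chosen = topk_route(x, gate_w, expert_ws, k=2)
```

With `k=2` and four experts, each token triggers two expert matrix multiplies instead of four; a frequency-aware expert would replace the linear map inside the loop.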
Reproducibility
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- NEV-sales is proprietary; gains on that dataset may not generalize to other commercial series.
- The paper reports no public code and no hyperparameters beyond optimizer basics, which hinders full replication.
- Pretraining is a major driver of the gains; if you cannot pretrain on large, diverse corpora, improvements may be smaller.
When Not To Use
- When series have no clear periodic components or when FFT-style spectral features are irrelevant.
- When you cannot pretrain or lack diverse time-series data for transfer learning.
Failure Modes
- Spectral leakage or poor harmonic separation on short or highly irregular sequences.
- Reduced benefit on datasets with unstable, non-periodic signals (the authors note that the Exchange-rate dataset is unstable).
- Proprietary-data overfitting: FTC may latch onto dataset-specific harmonics that don’t generalize.
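The spectral-leakage failure mode is easy to reproduce on synthetic data. The sketch below is an illustration under assumed settings (a 24-step period and hypothetical window lengths of 240 and 36 steps), not an experiment from the paper. It compares how much of the non-DC spectral energy lands in the single strongest FFT bin when the window holds a whole number of cycles versus 1.5 cycles.

```python
import numpy as np

def peak_bin_energy_share(x):
    """Share of non-DC spectral energy captured by the single strongest bin."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    power[0] = 0.0  # ignore any residual DC component
    return float(power.max() / power.sum())

period = 24
full = np.sin(2 * np.pi * np.arange(240) / period)   # exactly 10 cycles
short = np.sin(2 * np.pi * np.arange(36) / period)   # only 1.5 cycles

share_full = peak_bin_energy_share(full)
share_short = peak_bin_energy_share(short)
print(f"full window: {share_full:.3f}, short window: {share_short:.3f}")
```

In the short window the true frequency falls between FFT bins, so its energy smears across neighbors and the peak share drops well below 1. Windowing or longer context lengths are common mitigations, though whether they help a given model should be checked empirically.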
Core Entities
Models
- MoFE-Time
- Time-MoE
- TimeMixer
- TimeXer
- TimesNet
- AutoFormer
- PatchTST
Metrics
- MSE
- MAE
- inference time
Datasets
- Time-300B
- ETTh1
- ETTh2
- ETTm1
- ETTm2
- Weather
- Exchange
- NEV-sales
Benchmarks
- ETTh1
- ETTh2
- ETTm1
- ETTm2
- Weather
- Exchange
Context Entities
Models
- Moment
- Chronos
- TimesFM
- Lag-Llama
Datasets
- public time series corpora (as used in Time-300B)

