Overview
Production Readiness
0.6
Novelty Score
0.5
Cost Impact Score
0.6
Citation Count
0
Why It Matters For Business
EasyTime speeds method evaluation and selection by reusing a large benchmark and automating ensembles, reducing experiment time and guesswork for forecasting projects.
Summary TLDR
EasyTime is a demo system that makes time-series forecasting easier for researchers and practitioners. It builds on the TFB benchmark (25 multivariate and 8,068 univariate datasets) and 30+ methods to provide: one-click evaluation for new algorithms, an Automated Ensemble that recommends and combines top methods using TS2Vec + a method-ranking classifier, and a natural-language Q&A module that converts questions to SQL, retrieves benchmark results, and returns answers, charts, and SQL. The system is aimed at faster evaluation, easier model selection, and interactive exploration of historical benchmarking evidence.
Problem Statement
Researchers and practitioners face three pain points: (1) evaluating forecasting methods comprehensively is time-consuming and error-prone; (2) picking suitable methods for a new dataset is hard because no single method wins everywhere; (3) querying benchmarking results or getting practical guidance requires technical effort or expert knowledge.
Main Contribution
A one-click evaluation layer that runs a method across TFB’s diverse datasets and standardized pipelines.
An Automated Ensemble module that uses TS2Vec features and a pre-trained classifier (soft-label loss) to recommend top-k methods and learn ensemble weights on the target data.
A natural-language Q&A module that turns user questions into SQL, verifies queries, retrieves benchmark data, and returns natural language answers plus charts and SQL.
An integrated demo showing how benchmark knowledge can support method selection, automated ensembles, and interactive queries.
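The recommend-then-combine step in the Automated Ensemble can be sketched as follows. This is a minimal illustration, not EasyTime's implementation: the summary does not say how ensemble weights are learned, so inverse-validation-error weighting is swapped in as a simple stand-in. The method names, classifier scores, and series values are hypothetical placeholders.

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def top_k(scores, k):
    """Return the k method names with the highest classifier scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def inverse_error_weights(val_true, val_preds, eps=1e-8):
    """Assumed weighting scheme: weight each method by 1 / validation MAE,
    then normalize so the weights sum to 1."""
    inv = {name: 1.0 / (mae(val_true, pred) + eps) for name, pred in val_preds.items()}
    total = sum(inv.values())
    return {name: value / total for name, value in inv.items()}

def combine(weights, preds):
    """Weighted average of the candidate forecasts at each time step."""
    horizon = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][t] for m in weights) for t in range(horizon)]

# Hypothetical classifier scores for four placeholder methods.
scores = {"method_a": 0.40, "method_b": 0.35, "method_c": 0.15, "method_d": 0.10}
chosen = top_k(scores, k=2)

# Toy validation split on the target data (made-up numbers).
val_true = [10, 12, 14, 16]
val_preds = {
    "method_a": [11, 13, 15, 17],  # validation MAE of 1
    "method_b": [7, 9, 11, 13],    # validation MAE of 3
}
weights = inverse_error_weights(val_true, {m: val_preds[m] for m in chosen})
ensemble = combine(weights, val_preds)
```

Here the lower-error candidate receives roughly three times the weight of the other. A learned weighting, as the system describes, would fit the weights against the target data rather than derive them from a fixed rule.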
Key Findings
TFB covers 25 multivariate and 8,068 univariate datasets and stores precomputed results for them.
The benchmark includes 30+ methods along with their accumulated evaluation results.
Automated Ensemble uses representation learning plus a classifier to rank methods.
Q&A returns natural language, charts, and SQL for transparency.
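A soft-label loss for the method-ranking classifier might look like the sketch below. The exact loss is not given in this summary, so the form here is an assumption: historical per-method errors are converted into a target distribution with a temperature softmax, and the classifier's logits are scored with cross-entropy against it. All numbers are made up for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def soft_labels(errors, temperature=1.0):
    """Assumed construction: lower historical error gives higher target probability."""
    return softmax([-e / temperature for e in errors])

def soft_label_cross_entropy(logits, targets):
    """Cross-entropy between the soft targets and the classifier's softmax output."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(targets, probs))

# Hypothetical historical errors of three methods on one training dataset.
errors = [0.2, 0.5, 0.9]
targets = soft_labels(errors, temperature=0.2)

# Logits that agree with the historical ranking vs. logits that invert it.
loss_agree = soft_label_cross_entropy([2.0, 0.5, -1.0], targets)
loss_invert = soft_label_cross_entropy([-1.0, 0.5, 2.0], targets)
```

Unlike a one-hot label for the single best method, the soft target keeps credit for near-best methods, which suits a setting where several methods perform similarly on a dataset.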
Results
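Coverage: 30+ forecasting methods
Dataset counts: 25 multivariate and 8,068 univariate
Q&A outputs: natural-language answer, chart, and SQL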
Who Should Care
What To Try In 7 Days
Embed one of your models into the TFB pipeline and run one-click evaluation.
Upload a real dataset and click 'Recommend Method' to see top-k candidates.
Try the 'AutoML' ensemble flow to compare ensemble vs single methods on your data.
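To make the one-click evaluation idea concrete, the sketch below runs a single forecaster over several datasets with a standard metric and a custom one. It is a generic stand-in, not the TFB API: `evaluate`, `naive_last_value`, the dataset names, and the series are all hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def max_abs_error(y_true, y_pred):
    """Example of a custom metric: the worst single-step error."""
    return max(abs(a - b) for a, b in zip(y_true, y_pred))

def naive_last_value(history, horizon):
    """Baseline forecaster: repeat the last observed value."""
    return [history[-1]] * horizon

def evaluate(forecaster, datasets, horizon, metrics):
    """Hold out the last `horizon` points of each series, forecast them,
    and score the forecast with every metric."""
    report = {}
    for name, series in datasets.items():
        history, future = series[:-horizon], series[-horizon:]
        forecast = forecaster(history, horizon)
        report[name] = {m: fn(future, forecast) for m, fn in metrics.items()}
    return report

# Toy series standing in for benchmark datasets.
datasets = {"toy_trend": [1, 2, 3, 4, 5, 6], "toy_flat": [5, 5, 5, 5, 5, 5]}
metrics = {"mae": mae, "max_abs_error": max_abs_error}
report = evaluate(naive_last_value, datasets, horizon=2, metrics=metrics)
```

Replacing `naive_last_value` with a wrapper around your own model gives a per-dataset score table in one call, which is the workflow the one-click layer automates at benchmark scale.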
Agent Features
Memory
- benchmark knowledge base
Tool Use
- LLM for NL2SQL
- SQL database for retrieval
- TS2Vec for feature extraction
Frameworks
- TFB
- Darts
- TSLib
Architectures
- offline pretraining + online inference
Reproducibility
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Quality of recommendations depends on TFB’s dataset coverage; out-of-distribution datasets may not be well served.
- Automated Ensemble relies on a classifier trained on historical results and may misrank methods whose behavior is not reflected in those results.
- Natural-language answers depend on LLM correctness; verification reduces but does not eliminate hallucinations.
When Not To Use
- When your dataset has patterns not represented in TFB and bespoke modeling is required.
- When strict, validated probabilistic forecasting guarantees are required beyond empirical ensembles.
Failure Modes
- Classifier recommends poor models for truly novel datasets.
- Generated SQL is incorrect and, if verification fails to catch it, returns misleading data.
- Ensemble overfits small datasets when top-k candidates are too similar.
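One way to reduce the SQL failure mode above is to check each generated query before running it. The sketch below uses Python's built-in `sqlite3` and is only an assumption about what verification could involve; the summary does not describe EasyTime's actual check. The `results` table, its columns, and its rows are hypothetical toy data, not benchmark results.

```python
import sqlite3

def verify_sql(conn, sql):
    """Accept only a single SELECT statement that SQLite can compile
    against the current schema."""
    stmt = sql.strip().rstrip(";")
    if not stmt.lower().startswith("select") or ";" in stmt:
        return False
    try:
        # Compiling the plan catches syntax errors and unknown tables or
        # columns without running the query itself.
        conn.execute("EXPLAIN QUERY PLAN " + stmt)
    except sqlite3.Error:
        return False
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (dataset TEXT, method TEXT, mae REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("toy_1", "method_a", 0.30),
        ("toy_1", "method_b", 0.25),
        ("toy_2", "method_a", 0.20),
        ("toy_2", "method_b", 0.28),
    ],
)

good_sql = "SELECT method, AVG(mae) AS avg_mae FROM results GROUP BY method ORDER BY avg_mae LIMIT 1;"
bad_sql = "SELECT method, rmse FROM results"  # column does not exist
unsafe_sql = "DROP TABLE results"             # not a read-only query

checks = {name: verify_sql(conn, q) for name, q in
          [("good", good_sql), ("bad", bad_sql), ("unsafe", unsafe_sql)]}
best = conn.execute(good_sql).fetchone() if checks["good"] else None
```

A check like this catches malformed or schema-invalid queries, but not queries that are valid SQL and answer the wrong question, which is why returning the SQL alongside the answer remains useful.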
Core Entities
Models
- TS2Vec
- NLinear
- PatchTST
- FiLM
- TimesNet
- DLinear
- Linear
- MICN
Metrics
- MAE
- custom metrics (supported)
Datasets
- TFB
- traffic
- electricity
- energy
- environment
- nature
- economic
- stock
- banking
- health
- web
Benchmarks
- TFB

