Overview
A system demo built on an existing open benchmark (TFB) with concrete modules; no evidence of end-to-end production deployment is provided.
Citations: 0
Evidence Strength: 0.60
Confidence: 0.70
Risk Signals: 8
Trust Signals
Findings with numeric evidence: 4/4
Findings with evidence refs: 4/4
Results with explicit delta: 0/3
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 60%
Novelty: 50%
Why It Matters For Business
EasyTime speeds method evaluation and selection by reusing a large benchmark and automating ensembles, reducing experiment time and guesswork for forecasting projects.
Summary TLDR
EasyTime is a demo system that makes time-series forecasting easier for researchers and practitioners. It builds on the TFB benchmark (25 multivariate and 8,068 univariate datasets) and its 30+ methods to provide: one-click evaluation for new algorithms; an Automated Ensemble that recommends and combines top methods using TS2Vec features and a method-ranking classifier; and a natural-language Q&A module that converts questions to SQL, retrieves benchmark results, and returns answers, charts, and the generated SQL. The system targets faster evaluation, easier model selection, and interactive exploration of historical benchmarking results.
Problem Statement
Researchers and practitioners face three pain points: (1) evaluating forecasting methods comprehensively is time-consuming and error-prone; (2) picking suitable methods for a new dataset is hard because no single method wins everywhere; (3) querying benchmarking results or getting practical guidance requires technical effort or expert knowledge.
Main Contribution
A one-click evaluation layer that runs a method across TFB’s diverse datasets and standardized pipelines.
An Automated Ensemble module that uses TS2Vec features and a pre-trained method-ranking classifier (trained with a soft-label loss) to recommend top-k methods, then learns ensemble weights on the target data.
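To make the recommendation step concrete, here is a minimal sketch of training a method-ranking classifier with a soft-label loss and reading off the top-k methods. It assumes a dataset embedding is already available (a plain list of floats standing in for TS2Vec features), swaps in a single linear softmax layer for the classifier, and derives soft labels from per-method errors with a temperature-scaled softmax of negative error. These choices, every function name, and all numbers below are illustrative assumptions, not EasyTime's actual implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_labels(errors, temperature=1.0):
    """Hypothetical soft target: lower error gets more probability mass."""
    return softmax([-e / temperature for e in errors])


def linear_scores(embedding, weights):
    """One score per method; weights[m] has the same length as the embedding."""
    return [sum(w * x for w, x in zip(w_m, embedding)) for w_m in weights]


def soft_label_cross_entropy(pred_probs, target_probs, eps=1e-12):
    """Cross-entropy between a soft target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred_probs))


def train_step(embedding, weights, targets, lr=0.5):
    """One gradient step; for softmax + cross-entropy, dLoss/dScore_m = p_m - t_m."""
    probs = softmax(linear_scores(embedding, weights))
    return [
        [w - lr * (p - t) * x for w, x in zip(w_m, embedding)]
        for w_m, p, t in zip(weights, probs, targets)
    ]


def recommend_top_k(embedding, weights, method_names, k=3):
    """Rank methods by predicted probability and keep the k best."""
    probs = softmax(linear_scores(embedding, weights))
    ranked = sorted(zip(method_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Placeholder values for illustration only (not TFB results).
methods = ["method_a", "method_b", "method_c", "method_d"]
errors = [0.30, 0.25, 0.50, 0.28]  # per-method error on one training dataset
targets = soft_labels(errors, temperature=0.1)
embedding = [0.2, -0.1, 0.4]  # stand-in for a TS2Vec dataset embedding
initial = [[0.1, 0.0, 0.2], [0.3, -0.2, 0.5], [-0.4, 0.1, 0.0], [0.2, 0.2, 0.1]]
trained = train_step(embedding, initial, targets)
print([name for name, _ in recommend_top_k(embedding, trained, methods, k=2)])
# prints ['method_b', 'method_a']
```

Soft labels let the classifier learn that several methods can be nearly as good on a dataset, rather than forcing a single "winner" per training example; in a real system the classifier would be trained over many benchmark datasets, not one.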
Key Findings
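TFB provides broad data coverage (25 multivariate and 8,068 univariate datasets) together with precomputed benchmark results.
The benchmark's accumulated results come from evaluating 30+ methods on 8,000+ time series.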
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Coverage | 30+ methods evaluated | — | — | TFB (8,000+ series) | Benchmark results collected from evaluating 30+ methods on 8,000+ time series | II.A |
| Dataset counts | 25 multivariate; 8,068 univariate | — | — | TFB | Data layer includes 25 multivariate and 8,068 univariate datasets | II.A |
What To Try In 7 Days
Embed one of your models into the TFB pipeline and run one-click evaluation.
Upload a real dataset and click 'Recommend Method' to see top-k candidates.
Try the 'AutoML' ensemble flow to compare ensemble vs single methods on your data.
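The one-click evaluation and the ensemble-versus-single comparison listed above can be sketched as a loop that scores every method on every series with the same split and metric. Everything here is hypothetical: a "method" is any callable `forecast(history, horizon)`, the three baseline forecasters and the inverse-validation-error weighting are simple stand-ins chosen for illustration, and the train/validation/test split is an assumption. This is not TFB's pipeline or EasyTime's ensemble learner.

```python
from statistics import mean


def naive_last(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def historical_mean(history, horizon):
    """Forecast the mean of the history at every step."""
    return [mean(history)] * horizon


def drift(history, horizon):
    """Extend the line through the first and last observations."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]


def mae(actual, forecast):
    """Mean absolute error over aligned points."""
    return mean(abs(a - f) for a, f in zip(actual, forecast))


def inverse_error_weights(errors, eps=1e-9):
    """Weight each method in proportion to 1 / validation error (a stand-in scheme)."""
    inv = [1.0 / (e + eps) for e in errors]
    total = sum(inv)
    return [v / total for v in inv]


def evaluate(methods, series_list, horizon):
    """Average test MAE for each method and for an inverse-error ensemble.

    Each series is split as train | validation (horizon points) | test (horizon points).
    """
    scores = {name: [] for name in methods}
    scores["ensemble"] = []
    for series in series_list:
        train = series[: -2 * horizon]
        val = series[-2 * horizon : -horizon]
        test = series[-horizon:]
        # Ensemble weights are fitted on the validation window only.
        weights = inverse_error_weights(
            [mae(val, forecast(train, horizon)) for forecast in methods.values()]
        )
        # Forecast the test window from train + validation history.
        history = train + val
        forecasts = {name: f(history, horizon) for name, f in methods.items()}
        for name, fc in forecasts.items():
            scores[name].append(mae(test, fc))
        combined = [
            sum(w * fc[h] for w, fc in zip(weights, forecasts.values()))
            for h in range(horizon)
        ]
        scores["ensemble"].append(mae(test, combined))
    return {name: mean(values) for name, values in scores.items()}


methods = {"naive_last": naive_last, "historical_mean": historical_mean, "drift": drift}
print(evaluate(methods, [list(range(20))], horizon=4))
```

Plugging in your own model means adding one more callable to `methods`; because every method sees the same splits and metric, the resulting scores are directly comparable, which is the point of a standardized pipeline.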
Risks & Boundaries
Limitations
Quality of recommendations depends on TFB’s dataset coverage; out-of-distribution datasets may not be well served.
The Automated Ensemble relies on a classifier trained on historical results and may misrank methods whose behavior is not reflected in those results, such as newly added methods.
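One way to make the coverage limitation concrete is to measure how far a new dataset sits from the benchmark datasets in some embedding space. The sketch below is a hypothetical heuristic, not part of EasyTime: it assumes each dataset is already summarized as a fixed-length vector (for example, a TS2Vec-style embedding), uses Euclidean nearest-neighbor distance, and flags a dataset as poorly covered when that distance exceeds a high quantile of the benchmark's own leave-one-out nearest-neighbor distances.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_distance(query, embeddings):
    """Distance from the query to its closest embedding."""
    return min(euclidean(query, e) for e in embeddings)


def leave_one_out_threshold(embeddings, quantile=0.95):
    """Quantile of each benchmark dataset's distance to its nearest other dataset."""
    nn = sorted(
        nearest_distance(e, embeddings[:i] + embeddings[i + 1 :])
        for i, e in enumerate(embeddings)
    )
    index = min(len(nn) - 1, int(quantile * len(nn)))
    return nn[index]


def is_covered(query, embeddings, threshold):
    """True if the query is no farther from the benchmark than the threshold."""
    return nearest_distance(query, embeddings) <= threshold


# Toy 2-D "embeddings" standing in for benchmark datasets (illustrative only).
benchmark = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
threshold = leave_one_out_threshold(benchmark)
print(is_covered([0.5, 0.5], benchmark, threshold), is_covered([10.0, 10.0], benchmark, threshold))
# prints True False
```

A dataset that fails this check is one where recommendations learned from historical benchmark results deserve extra skepticism.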
When Not To Use
When your dataset has patterns not represented in TFB and bespoke modeling is required.
When strict, validated probabilistic forecasting guarantees are required beyond empirical ensembles.
Failure Modes
The classifier recommends poorly suited methods for truly novel datasets.
Generated SQL may be incorrect and return misleading data if verification fails.
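The SQL failure mode hinges on checking generated SQL before its results are shown. As a hedged sketch, not EasyTime's actual verification step, the code below uses the authorizer hook in Python's built-in `sqlite3` module to run a generated query only if it reads from allowlisted tables and performs no writes. The `results` table, its columns, and the placeholder rows are hypothetical stand-ins for stored benchmark results. An authorizer only blocks unsafe or out-of-scope statements; it cannot tell whether a valid query actually answers the user's question.

```python
import sqlite3

ALLOWED_TABLES = {"results"}  # hypothetical schema for stored benchmark results
# Action codes for running SELECT, reading columns, and calling functions such as AVG.
ALLOWED_ACTIONS = {
    sqlite3.SQLITE_SELECT,
    sqlite3.SQLITE_READ,
    getattr(sqlite3, "SQLITE_FUNCTION", 31),  # 31 is SQLITE_FUNCTION in the C API
}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Allow reads from allowlisted tables only; deny every other action."""
    if action == sqlite3.SQLITE_READ:
        return sqlite3.SQLITE_OK if arg1 in ALLOWED_TABLES else sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK if action in ALLOWED_ACTIONS else sqlite3.SQLITE_DENY


def run_guarded_query(conn, sql):
    """Execute one generated statement under the authorizer; raises on denial."""
    conn.set_authorizer(read_only_authorizer)
    try:
        return conn.execute(sql).fetchall()
    finally:
        # Restore a permissive authorizer so trusted setup code keeps working.
        conn.set_authorizer(lambda *args: sqlite3.SQLITE_OK)


def make_demo_db():
    """In-memory database with placeholder rows (not real TFB numbers)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (dataset TEXT, method TEXT, mae REAL)")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany(
        "INSERT INTO results VALUES (?, ?, ?)",
        [("ds1", "method_a", 0.3), ("ds1", "method_b", 0.2),
         ("ds2", "method_a", 0.5), ("ds2", "method_b", 0.4)],
    )
    conn.commit()
    return conn


conn = make_demo_db()
generated = (
    "SELECT method, AVG(mae) AS avg_mae FROM results "
    "GROUP BY method ORDER BY avg_mae LIMIT 1"
)
print(run_guarded_query(conn, generated))
```

A denied statement (a `DELETE`, or a read from a table outside the allowlist) raises `sqlite3.DatabaseError` at prepare time, so nothing misleading reaches the answer or chart layer.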

