From Data Hunt to Model Building in One Line of Code

You're a data scientist at a logistics company tasked with building a demand forecasting system. Your manager wants to see results fast, but you're stuck spending days hunting for quality time series data, cleaning datasets, and figuring out proper evaluation metrics. Meanwhile, your competitors are already using state-of-the-art forecasting models on clean, benchmark-quality data.

Accessing benchmark datasets for time series analysis involves endless friction.

Kaggle demands registration and phone verification. Academic sources require complex authentication. Competition datasets hide behind paywalls.

Once downloaded, every dataset needs manual format conversion, custom preprocessing scripts, and hours of structural analysis. All this effort just to get started with time series analysis.

datasetsforecast eliminates this bottleneck by providing instant access to 7+ major forecasting benchmark datasets with automatic downloading, proper formatting, and competition-ready evaluation metrics.

Introduction to datasetsforecast

datasetsforecast is Nixtla's benchmark dataset library that provides one-line access to the world's most important time series forecasting competitions and research datasets:

Instant dataset loading: Downloading and formatting happen automatically
Competition datasets: M3, M4, M5 forecasting competitions
Hierarchical data: Tourism, Labor, Traffic datasets with constraint matrices
Built-in evaluation: Compare against competition winners and benchmarks
Research-ready format: Standardized pandas DataFrames for all libraries

All datasets follow the same three-column format (unique_id, ds, y) that works seamlessly with popular forecasting libraries like StatsForecast, MLforecast, and NeuralForecast.
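As a small illustration of that layout, the sketch below reshapes a hypothetical wide table (one column per series; the names `store_a` and `store_b` are made up) into the long format with plain pandas. It does not call datasetsforecast itself:

```python
import pandas as pd

# Hypothetical wide table: one row per timestamp, one column per series
wide = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=3, freq="D"),
    "store_a": [10.0, 12.0, 11.0],
    "store_b": [5.0, 6.0, 7.0],
})

# Long format: one row per (series, timestamp) pair
long_df = wide.melt(id_vars="ds", var_name="unique_id", value_name="y")
long_df = (
    long_df[["unique_id", "ds", "y"]]
    .sort_values(["unique_id", "ds"])
    .reset_index(drop=True)
)

print(long_df)
```

Every series is stacked into the same three columns, which is why one loading convention can serve datasets with very different numbers of series.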

Setup

Install datasetsforecast with pip:

pip install datasetsforecast

Additional dependencies for the examples:

pip install pandas numpy matplotlib statsforecast

Import the necessary libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datasetsforecast.m4 import M4
from datasetsforecast.hierarchical import HierarchicalData

Competition Datasets (M4)

The M4 forecasting competition is one of the largest and best-known time series competitions, with 100,000 time series across six frequencies. Loading any frequency group takes just one line:

# Load M4 Hourly dataset (414 time series, 48-hour forecasts)
data_df, _, meta_df = M4.load(directory='data', group='Hourly')

print(f"Training data shape: {data_df.shape}")
print(f"Number of unique series: {data_df['unique_id'].nunique()}")

# Preview the data structure
print(data_df.groupby('unique_id')['y'].count().head())

Training data shape: (373372, 3)
Number of unique series: 414
unique_id
H1      748
H10     748
H100    748
H101    748
H102    748
Name: y, dtype: int64
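The counts above suggest that the loaded frame stacks each series' training and test observations (for example, 748 rows for H1 would be 700 training points plus the 48-step horizon). Under that assumption, a common way to recover the split is to hold out the last `horizon` rows of every series. The sketch below uses a small synthetic frame instead of the real download; `toy_df` and `horizon` are illustrative names:

```python
import numpy as np
import pandas as pd

horizon = 3  # M4 Hourly would use 48

# Synthetic long-format data: two series of different lengths
toy_df = pd.concat([
    pd.DataFrame({"unique_id": "A", "ds": np.arange(1, 11), "y": np.arange(10.0)}),
    pd.DataFrame({"unique_id": "B", "ds": np.arange(1, 8), "y": np.arange(7.0)}),
], ignore_index=True)

# The last `horizon` rows of every series form the test set; the rest is training data
test_df = toy_df.groupby("unique_id").tail(horizon)
train_df = toy_df.drop(test_df.index)

print(train_df.groupby("unique_id").size())
print(test_df.groupby("unique_id").size())
```

Splitting this way keeps the test window out of model fitting, which matters whenever forecasts are later scored against those same observations.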

Each M4 frequency group has different characteristics:

# Available M4 frequency groups and their properties
frequency_info = {
    'Yearly': {'series': 23000, 'horizon': 6, 'seasonality': 1},
    'Quarterly': {'series': 24000, 'horizon': 8, 'seasonality': 4},
    'Monthly': {'series': 48000, 'horizon': 18, 'seasonality': 12},
    'Weekly': {'series': 359, 'horizon': 13, 'seasonality': 1},
    'Daily': {'series': 4227, 'horizon': 14, 'seasonality': 1},
    'Hourly': {'series': 414, 'horizon': 48, 'seasonality': 24}
}

for freq, info in frequency_info.items():
    print(f"{freq}: {info['series']} series, {info['horizon']}-step ahead forecasts")

Yearly: 23000 series, 6-step ahead forecasts
Quarterly: 24000 series, 8-step ahead forecasts
Monthly: 48000 series, 18-step ahead forecasts
Weekly: 359 series, 13-step ahead forecasts
Daily: 4227 series, 14-step ahead forecasts
Hourly: 414 series, 48-step ahead forecasts

Let's visualize a few series to understand the data patterns:

# Visualize sample hourly series
sample_series = ['H1', 'H2', 'H3']
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for i, series_id in enumerate(sample_series):
    series_data = data_df[data_df['unique_id'] == series_id]
    axes[i].plot(series_data['ds'], series_data['y'], color='#98FE09')
    axes[i].set_title(f'Series {series_id}')
    axes[i].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

M4 Hourly Series Examples

The hourly data shows typical electricity consumption patterns with clear daily seasonality and varying magnitudes across different series.

Other available competition datasets include:

M3 competition with 3,003 time series across Yearly, Quarterly, Monthly, and Other frequencies
M5 competition with 30,490 Walmart sales time series

All datasets maintain the same three-column format for seamless integration.

Hierarchical Forecasting (Tourism)

Hierarchical forecasting requires data organized in hierarchical structures where forecasts at different levels must be coherent (bottom-level forecasts sum to higher levels). datasetsforecast provides several hierarchical benchmark datasets with automatic constraint matrix generation:

# Load Tourism dataset with hierarchical structure
Y_df, S_df, _ = HierarchicalData.load(directory="data", group="TourismLarge")

print(f"Time series data shape: {Y_df.shape}")
print(f"Constraint matrix shape: {S_df.shape}")

# Show hierarchical structure
print(f"\nHierarchical structure:")
print(f"  Total series: {len(Y_df['unique_id'].unique())}")
print(f"  Bottom level series: {S_df.shape[1]}")
print(f"  Aggregated series: {S_df.shape[0] - S_df.shape[1]}")
print(f"  Total hierarchical nodes: {S_df.shape[0]}")

Time series data shape: (126540, 3)
Constraint matrix shape: (555, 304)

Hierarchical structure:
  Total series: 555
  Bottom level series: 304
  Aggregated series: 251
  Total hierarchical nodes: 555
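To make the constraint (summing) matrix concrete, here is a toy hierarchy, unrelated to the Tourism data, with one total, two regions, and four bottom-level series. Each row of `S` marks which bottom-level series add up to that node, so multiplying `S` by the bottom-level values gives coherent values for every node. All names and numbers are made up:

```python
import numpy as np

# Rows: total, region_1, region_2, then the four bottom-level series
# Columns: the four bottom-level series
S = np.array([
    [1, 1, 1, 1],  # total
    [1, 1, 0, 0],  # region_1 = b1 + b2
    [0, 0, 1, 1],  # region_2 = b3 + b4
    [1, 0, 0, 0],  # b1
    [0, 1, 0, 0],  # b2
    [0, 0, 1, 0],  # b3
    [0, 0, 0, 1],  # b4
])

bottom = np.array([10.0, 20.0, 5.0, 15.0])  # bottom-level values at one time step
all_levels = S @ bottom                     # coherent values for all 7 nodes

n_nodes, n_bottom = S.shape
print(all_levels)
print(f"Aggregated series: {n_nodes - n_bottom}")
```

The same arithmetic explains the shapes printed above: a matrix with 555 rows and 304 columns describes 304 bottom-level series and 251 aggregates built from them.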

The hierarchical collection includes:

Tourism: TourismLarge and TourismSmall variants
Labour: Monthly labor statistics with 8-period horizon
Traffic: Daily road occupancy data with 14-period horizon
Wiki2: Daily Wikipedia page views data
Legacy datasets: OldTraffic and OldTourismLarge

Each dataset provides automatic constraint matrix generation for coherent hierarchical forecasting.

Long Horizon Forecasting (ETT)

Long-horizon forecasting focuses on predicting far into the future, often requiring models that can capture long-term trends and patterns. The ETT (Electricity Transformer Temperature) datasets provide normalized time series data that is commonly evaluated with chronological train/validation/test splits when benchmarking long-term forecasting models:

from datasetsforecast.long_horizon import LongHorizon

# Load ETT dataset for long-horizon forecasting
# load returns the target series, the exogenous variables, and static features
Y_df, X_df, S_df = LongHorizon.load(directory="data", group="ETTh1")

print(f"Target data shape: {Y_df.shape}")
print(f"Exogenous data shape: {X_df.shape}")

# Show sample data
print(f"\nSample target data:")
print(Y_df.head())

Target data shape: (14400, 3)
Exogenous data shape: (14400, 6)

Sample target data:
  unique_id                   ds         y
0        OT  2016-07-01 00:00:00  1.460552
1        OT  2016-07-01 01:00:00  1.161527
2        OT  2016-07-01 02:00:00  1.161527
3        OT  2016-07-01 03:00:00  0.862611
4        OT  2016-07-01 04:00:00  0.525227

The ETTh1 dataset contains hourly electricity transformer oil temperature (the OT target) and load data, pre-normalized for long-horizon forecasting evaluation.

The long-horizon collection includes:

ETT variants: ETTh1, ETTh2 (hourly), ETTm1, ETTm2 (15-minute electricity data)
Exchange: Daily currency exchange rates across eight countries
TrafficL: Hourly road occupancy measurements
ILI: Influenza-like illness tracking data
Weather: Meteorological measurements
ECL: Electricity consumption data

All follow the same train/validation/test split methodology.
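Long-horizon benchmarks are typically evaluated on chronological splits, where the last observations form the test set and the block just before them forms the validation set. A minimal sketch on synthetic data follows; the `val_size` and `test_size` values are made up and are not the sizes any particular dataset uses:

```python
import numpy as np
import pandas as pd

# Synthetic single series standing in for a long-horizon dataset
df = pd.DataFrame({
    "unique_id": "OT",
    "ds": pd.date_range("2016-07-01", periods=100, freq="D"),
    "y": np.sin(np.arange(100) / 5.0),
})

val_size, test_size = 20, 20  # illustrative sizes only

# Split by position in time, never by random shuffling
df = df.sort_values("ds")
train = df.iloc[: -(val_size + test_size)]
val = df.iloc[-(val_size + test_size): -test_size]
test = df.iloc[-test_size:]

print(len(train), len(val), len(test))
```

Keeping the three blocks in time order means the model is always validated and tested on observations that come after everything it was trained on.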

Predictive Maintenance (PHM2008)

Predictive maintenance forecasting involves predicting the remaining useful life (RUL) of equipment based on sensor measurements over time. The PHM2008 dataset provides benchmark data for industrial forecasting and failure prediction:

from datasetsforecast.phm2008 import PHM2008

# Load PHM2008 dataset for remaining useful life prediction
Y_df, *_ = PHM2008.load(directory="data", group="FD001")

print(f"Dataset shape: {Y_df.shape}")
print(f"Unique engines: {Y_df['unique_id'].nunique()}")

# Analyze RUL distribution
rul_stats = Y_df.groupby('unique_id')['y'].agg(['min', 'max', 'count'])
print(f"\nRUL Statistics per engine:")
print(f"  Average cycles per engine: {rul_stats['count'].mean():.0f}")
print(f"  Min cycles: {rul_stats['count'].min()}")
print(f"  Max cycles: {rul_stats['count'].max()}")
print(f"  Average max RUL: {rul_stats['max'].mean():.0f}")

# Show sample data structure
print(f"\nSample data:")
print(Y_df.head(10))

Dataset shape: (20631, 3)
Unique engines: 100

RUL Statistics per engine:
  Average cycles per engine: 206
  Min cycles: 128
  Max cycles: 362
  Average max RUL: 206

Sample data:
  unique_id  ds      y
0         1   1  206.0
1         1   2  205.0
2         1   3  204.0
3         1   4  203.0
4         1   5  202.0
5         1   6  201.0
6         1   7  200.0
7         1   8  199.0
8         1   9  198.0
9         1  10  197.0

The PHM2008 dataset contains simulated run-to-failure sensor data from turbofan aircraft engines, where each engine's remaining useful life decreases with every operational cycle until failure occurs.

The PHM2008 collection includes four subsets:

FD001: Single operating condition and fault mode
FD002: Multiple operating conditions, single fault mode
FD003: Single operating condition, multiple fault modes
FD004: Multiple operating conditions and fault modes

All datasets focus on remaining useful life prediction for industrial applications.
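In run-to-failure data, the remaining useful life at a given cycle is the number of cycles left before the engine's last recorded cycle. The sketch below derives it for two made-up engines. The cap `max_rul` is an illustrative choice (capping early-life RUL is a common practice in the literature), not necessarily what `PHM2008.load` does:

```python
import pandas as pd

# Two hypothetical engines that ran for 5 and 3 cycles before failing
cycles = pd.DataFrame({
    "unique_id": [1, 1, 1, 1, 1, 2, 2, 2],
    "ds": [1, 2, 3, 4, 5, 1, 2, 3],
})

# RUL = last observed cycle of the engine minus the current cycle
last_cycle = cycles.groupby("unique_id")["ds"].transform("max")
cycles["y"] = last_cycle - cycles["ds"]

# Optional: cap the target so that early cycles share the same RUL value
max_rul = 3
cycles["y_capped"] = cycles["y"].clip(upper=max_rul)

print(cycles)
```

The target counts down to zero at each engine's final cycle, mirroring the decreasing `y` column in the sample data above.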

Built-in Evaluation and Benchmarking

datasetsforecast includes evaluation methods that implement competition-specific metrics. The M4 competition uses SMAPE (Symmetric Mean Absolute Percentage Error) and MASE (Mean Absolute Scaled Error) combined into an Overall Weighted Average (OWA):

from statsforecast import StatsForecast
from statsforecast.models import Naive, SeasonalNaive
from datasetsforecast.m4 import M4Evaluation

# Create simple benchmark forecasts
models = [Naive(), SeasonalNaive(season_length=24)]
sf = StatsForecast(models=models, freq='H')

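# M4.load returns training and test observations together, so hold out
# the last 48 points of each series (the competition test set) before fitting
horizon = 48
test_df = data_df.groupby('unique_id').tail(horizon)
train_df = data_df.drop(test_df.index)

# Generate forecasts for evaluation: one row per series, one column per step
forecasts = sf.forecast(df=train_df, h=horizon)
y_hat = forecasts['Naive'].values.reshape(-1, horizon)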

# Evaluate using M4 methodology
evaluation = M4Evaluation.evaluate('data', 'Hourly', y_hat)
print("M4 Evaluation Results:")
print(evaluation)

M4 Evaluation Results:
            SMAPE      MASE       OWA
Hourly  40.987889  11.32747  3.479615

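The evaluation shows how our naive forecast performs relative to the M4 Naive2 benchmark:

OWA (Overall Weighted Average): 3.48 means errors roughly 3.5 times those of the Naive2 benchmark, which has an OWA of 1.0 by definition
SMAPE: 40.99% symmetric mean absolute percentage error
MASE: 11.33 means the forecast's mean absolute error is about 11 times the in-sample error of a seasonal naive forecast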

Compare against the Naive2 benchmark:

# Load the Naive2 benchmark forecasts used as the M4 reference
naive2_forecasts = M4Evaluation.load_benchmark('data', 'Hourly')
naive2_evaluation = M4Evaluation.evaluate('data', 'Hourly', naive2_forecasts)

benchmark_comparison = pd.DataFrame({
    'Our Model': evaluation.iloc[0],
    'Naive2 Benchmark': naive2_evaluation.iloc[0]
})

print("Benchmark Comparison:")
print(benchmark_comparison)

Benchmark Comparison:
       Our Model  Naive2 Benchmark
SMAPE  40.987889         18.382878
MASE   11.327470          2.395040
OWA     3.479615          1.000000

This comparison shows exactly where your model stands relative to established benchmarks, enabling data-driven decisions about model improvements.
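For reference, the three metrics can be written in a few lines of NumPy. This is a sketch of the standard M4 definitions, not the library's own implementation, and the function names are hypothetical. OWA averages the model's sMAPE and MASE after dividing each by the Naive2 value, so Naive2 itself scores 1.0:

```python
import numpy as np

def smape(y, y_hat):
    """Symmetric MAPE in percent, averaged over the forecast horizon."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean(200.0 * np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat)))

def mase(y, y_hat, y_train, seasonality):
    """Forecast MAE scaled by the in-sample MAE of a seasonal naive forecast."""
    y, y_hat, y_train = (np.asarray(a, dtype=float) for a in (y, y_hat, y_train))
    scale = np.mean(np.abs(y_train[seasonality:] - y_train[:-seasonality]))
    return np.mean(np.abs(y - y_hat)) / scale

def owa(smape_model, mase_model, smape_naive2, mase_naive2):
    """Average of the two metrics, each taken relative to the Naive2 benchmark."""
    return 0.5 * (smape_model / smape_naive2 + mase_model / mase_naive2)

# Plugging in the values from the comparison table above
print(round(owa(40.987889, 11.327470, 18.382878, 2.395040), 4))
```

With the table's numbers, the relative sMAPE is about 2.23 and the relative MASE about 4.73, and their average matches the reported OWA of roughly 3.48.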

Conclusion

datasetsforecast eliminates the data hunting bottleneck that can consume a large share of a forecasting project's time. Instead of spending days searching for datasets, cleaning data, and implementing evaluation metrics, you get:

Instant access to 7+ benchmark datasets used in top-tier research
Automatic preprocessing with competition-ready train/test splits
Built-in evaluation against published benchmarks and competition winners
Consistent format that works with all major forecasting libraries
Rapid prototyping across multiple domains and frequencies

Stop spending days on data preparation. With datasetsforecast, you can focus on what matters: building better forecasting models and making data-driven decisions that impact your business.

Next Steps

Ready to scale your forecasting models to production? Consider TimeGPT's performance advantages for enterprise-scale deployments, or explore hierarchical forecasting approaches for complex organizational structures.