Overview

After generating forecasts with TimeGPT, the next step is to evaluate how accurate they are. The evaluate function from the utilsforecast library provides a fast and flexible way to assess model performance across a wide range of metrics, and it works seamlessly with TimeGPT as well as other forecasting models.
With the evaluation pipeline, you select the models to evaluate and define metrics such as MAE, MSE, or MAPE to benchmark forecasting performance.
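In its simplest form, the pipeline is a single call on a DataFrame that holds actuals and predictions side by side. Here is a minimal, self-contained sketch with made-up numbers (the full workflow below uses real TimeGPT output):

import pandas as pd
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, mse

# Tiny illustrative frame: actuals in 'value', predictions in 'TimeGPT'
forecasts_df = pd.DataFrame({
    'unique_id': ['series_1'] * 3,
    'timestamp': pd.date_range('2024-01-01', periods=3, freq='MS'),
    'value': [100.0, 110.0, 120.0],
    'TimeGPT': [98.0, 112.0, 119.0],
})

evaluation = evaluate(forecasts_df, metrics=[mae, mse], models=['TimeGPT'],
                      target_col='value', time_col='timestamp')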

Step-by-Step Guide

Step 1. Import Required Packages

Start by importing the necessary libraries and initializing the NixtlaClient with your API key.

import pandas as pd
from nixtla import NixtlaClient
from functools import partial
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, mse, rmse, mape, smape, mase, scaled_crps

nixtla_client = NixtlaClient(api_key='your_api_key_here')
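If you want to confirm the client is configured correctly before moving on, the client provides a key-validation helper (an optional sanity check):

nixtla_client.validate_api_key()  # returns True if the API key is accepted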

Step 2. Load and Prepare the Dataset

For this example, we use the Air Passengers dataset, which records monthly totals of international airline passengers. We'll load the dataset, parse the timestamps, and split the data into a training set and a test set, holding out the last 12 months for testing.

# Load the dataset and parse the timestamps
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df['unique_id'] = 'passengers'
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Hold out the last 12 months as the test set
df_train = df.iloc[:-12]
df_test = df.iloc[-12:]
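A quick check with plain pandas (purely illustrative) confirms the split; the Air Passengers series has 144 monthly observations, so the training set should end 12 months before the series does:

print(len(df_train), len(df_test))  # 132 12
print(df_train['timestamp'].max())  # last training month
print(df_test['timestamp'].min())   # first held-out month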

Step 3. Generate Forecast with TimeGPT

Next, we will:

  • Use the training set to generate a 12-step forecast with TimeGPT.
  • Merge the forecast with the test set for evaluation.
fcst_timegpt = nixtla_client.forecast(
    df=df_train,
    h=12,
    time_col='timestamp',
    target_col='value',
    level=[80, 95],
)

# Attach the actual values from the test set for evaluation
fcst_timegpt = fcst_timegpt.merge(df_test, on=['timestamp', 'unique_id'])
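Before computing metrics, it can help to inspect the forecast visually. A sketch using the client's plotting helper (we drop the merged actuals column so only model columns are treated as forecasts; plotting behavior may vary by nixtla version):

nixtla_client.plot(
    df_train,
    fcst_timegpt.drop(columns='value'),  # keep only the forecast columns
    time_col='timestamp',
    target_col='value',
    level=[80, 95],
)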

Step 4. Define Models and Metrics for Evaluation

We now define the models to evaluate and the metrics to use. MASE requires the seasonal period of the data (12 for monthly observations), which we set with functools.partial. For more information about supported metrics, refer to the evaluation metrics tutorial.

models = ['TimeGPT']
metrics = [
    mae,
    mse,
    rmse,
    mape,
    smape,
    partial(mase, seasonality=12),  # MASE needs the seasonal period of the data
    scaled_crps,  # probabilistic metric; requires prediction intervals
]
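As a sanity check on what these metrics measure, the point metrics can be reproduced by hand from the merged DataFrame; for example, MAE is just the mean absolute difference between actuals and predictions (illustrative, not part of the pipeline):

manual_mae = (fcst_timegpt['value'] - fcst_timegpt['TimeGPT']).abs().mean()
print(manual_mae)  # should match the mae row reported by evaluate below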

Step 5. Run the Evaluation

Finally, call the evaluate function on the merged forecast results. Include train_df for metrics that need the training data (such as MASE) and level when using probabilistic metrics (such as scaled CRPS).

evaluation = evaluate(
    fcst_timegpt,
    metrics=metrics,
    models=models,
    target_col='value',
    time_col='timestamp',
    train_df=df_train,
    level=[80, 95],
)
unique_id    metric        TimeGPT
passengers   mae           12.67930
passengers   mse           213.9358
passengers   rmse          14.62654
passengers   mape          0.026964
passengers   smape         0.013527
passengers   mase          0.416397
passengers   scaled_crps   0.008991
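Because the result is a plain DataFrame, individual numbers are easy to pull out with standard pandas (a small illustrative snippet):

# Extract a single metric for a single model, e.g. MASE for TimeGPT
mase_value = evaluation.loc[evaluation['metric'] == 'mase', 'TimeGPT'].item()
print(f'MASE: {mase_value:.3f}')  # values below 1 beat the seasonal naive baseline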