Overview

After generating forecasts with TimeGPT, the next step is to evaluate how accurate they are. The evaluate function from the utilsforecast library provides a fast and flexible way to assess model performance across a wide range of metrics, and it works seamlessly with TimeGPT as well as other forecasting models.
With the evaluation pipeline, you select the models to evaluate and define metrics such as MAE, MSE, or MAPE to benchmark forecasting performance.
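In its simplest form, the pipeline is a single call on a DataFrame that holds actuals and predictions side by side. Here is a minimal, self-contained sketch with made-up numbers (the full workflow below uses real TimeGPT output):

import pandas as pd
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, mse

# Tiny illustrative frame: actuals in 'value', predictions in 'TimeGPT'
forecasts_df = pd.DataFrame({
    'unique_id': ['series_1'] * 3,
    'timestamp': pd.date_range('2024-01-01', periods=3, freq='MS'),
    'value': [100.0, 110.0, 120.0],
    'TimeGPT': [98.0, 112.0, 119.0],
})

evaluation = evaluate(forecasts_df, metrics=[mae, mse], models=['TimeGPT'],
                      target_col='value', time_col='timestamp')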

Step-by-Step Guide

Step 1. Import Required Packages

Start by importing the necessary libraries and initializing the NixtlaClient with your API key.

import pandas as pd
from nixtla import NixtlaClient
from functools import partial
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, mse, rmse, mape, smape, mase, scaled_crps

nixtla_client = NixtlaClient(api_key='your_api_key_here')
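If you want to confirm the client is configured correctly before moving on, the client provides a key-validation helper (an optional sanity check):

nixtla_client.validate_api_key()  # returns True if the API key is accepted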

Step 2. Load and Prepare the Dataset

For this example, we use the Air Passengers dataset, which records monthly totals of international airline passengers. We'll load the dataset, parse the timestamps, and split the data into a training set and a test set, holding out the last 12 months for testing.

# Load the dataset and parse the timestamps
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df['unique_id'] = 'passengers'
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Hold out the last 12 months as the test set
df_train = df.iloc[:-12]
df_test = df.iloc[-12:]
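A quick check with plain pandas (purely illustrative) confirms the split; the Air Passengers series has 144 monthly observations, so the training set should end 12 months before the series does:

print(len(df_train), len(df_test))  # 132 12
print(df_train['timestamp'].max())  # last training month
print(df_test['timestamp'].min())   # first held-out month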

Step 3. Generate Forecast with TimeGPT

Next, we will:

  • Use the training set to generate a 12-step forecast with TimeGPT.
  • Merge the forecast with the test set for evaluation.
fcst_timegpt = nixtla_client.forecast(
    df=df_train,
    h=12,
    time_col='timestamp',
    target_col='value',
    level=[80, 95],
)

# Attach the actual values from the test set for evaluation
fcst_timegpt = fcst_timegpt.merge(df_test, on=['timestamp', 'unique_id'])
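Before computing metrics, it can help to inspect the forecast visually. A sketch using the client's plotting helper (we drop the merged actuals column so only model columns are treated as forecasts; plotting behavior may vary by nixtla version):

nixtla_client.plot(
    df_train,
    fcst_timegpt.drop(columns='value'),  # keep only the forecast columns
    time_col='timestamp',
    target_col='value',
    level=[80, 95],
)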

Step 4. Define Models and Metrics for Evaluation

We now define the models to evaluate and the metrics to use. MASE requires the seasonal period of the data (12 for monthly observations), which we set with functools.partial. For more information about supported metrics, refer to the evaluation metrics tutorial.

models = ['TimeGPT']
metrics = [
    mae,
    mse,
    rmse,
    mape,
    smape,
    partial(mase, seasonality=12),  # MASE needs the seasonal period of the data
    scaled_crps,  # probabilistic metric; requires prediction intervals
]
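As a sanity check on what these metrics measure, the point metrics can be reproduced by hand from the merged DataFrame; for example, MAE is just the mean absolute difference between actuals and predictions (illustrative, not part of the pipeline):

manual_mae = (fcst_timegpt['value'] - fcst_timegpt['TimeGPT']).abs().mean()
print(manual_mae)  # should match the mae row reported by evaluate below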

Step 5. Run the Evaluation

Finally, call the evaluate function on the merged forecast results. Include train_df for metrics that need the training data (such as MASE) and level when using probabilistic metrics (such as scaled CRPS).

evaluation = evaluate(
    fcst_timegpt,
    metrics=metrics,
    models=models,
    target_col='value',
    time_col='timestamp',
    train_df=df_train,
    level=[80, 95],
)
unique_id    metric        TimeGPT
passengers   mae           12.67930
passengers   mse           213.9358
passengers   rmse          14.62654
passengers   mape          0.026964
passengers   smape         0.013527
passengers   mase          0.416397
passengers   scaled_crps   0.008991
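Because the result is a plain DataFrame, individual numbers are easy to pull out with standard pandas (a small illustrative snippet):

# Extract a single metric for a single model, e.g. MASE for TimeGPT
mase_value = evaluation.loc[evaluation['metric'] == 'mase', 'TimeGPT'].item()
print(f'MASE: {mase_value:.3f}')  # values below 1 beat the seasonal naive baseline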