Evaluation Pipeline
Learn how to evaluate TimeGPT model performance using tools in utilsforecast
Overview
After generating forecasts with TimeGPT, the next step is to evaluate how accurate those forecasts are. The evaluate function from the utilsforecast library provides a fast and flexible way to assess model performance using a wide range of metrics. This pipeline works seamlessly with TimeGPT and other forecasting models.
With the evaluation pipeline, you can easily select models and define metrics like MAE, MSE, or MAPE to benchmark forecasting performance.
Step-by-Step Guide
Step 1. Import Required Packages
Start by importing the necessary libraries and initializing the NixtlaClient with your API key.
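A minimal setup sketch is shown below; the API key placeholder is an assumption, so substitute your own credentials.

```python
import pandas as pd
from nixtla import NixtlaClient

# Initialize the TimeGPT client (replace the placeholder with your own API key)
nixtla_client = NixtlaClient(api_key="YOUR_API_KEY")
```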
Step 2. Load and Prepare the Dataset
For this example, we use the Air Passenger dataset, which records monthly totals of international airline passengers. We’ll load the dataset, format the timestamps, and split the data into a training set and a test set. The last 12 months are used for testing.
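The sketch below loads and splits the data. The CSV URL and its `timestamp`/`value` column names are assumptions; adjust them to match your copy of the Air Passenger dataset.

```python
# Load the Air Passenger dataset (URL and column names are illustrative assumptions)
df = pd.read_csv(
    "https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv"
)

# Format the data into the long format expected by TimeGPT: unique_id, ds, y
df = df.rename(columns={"timestamp": "ds", "value": "y"})
df["ds"] = pd.to_datetime(df["ds"])
df["unique_id"] = "passengers"

# Hold out the last 12 months for testing
train_df = df.iloc[:-12]
test_df = df.iloc[-12:]
```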
Step 3. Generate Forecast with TimeGPT
Next, we will:
- Use the training set to generate a 12-step forecast with TimeGPT.
- Merge the forecast with the test set for evaluation, as shown in the sketch below.
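This sketch generates the forecast and joins it onto the test set. The `freq="MS"` value and the 90% prediction interval level are assumptions for this monthly dataset; the intervals are requested here so that the scaled CRPS metric can be computed later.

```python
# Forecast the next 12 months; level=[90] requests prediction intervals,
# which the scaled CRPS metric needs later
fcst_df = nixtla_client.forecast(
    df=train_df,
    h=12,
    freq="MS",          # monthly-start frequency (assumption for this dataset)
    time_col="ds",
    target_col="y",
    level=[90],
)

# Merge the TimeGPT predictions with the held-out test values for evaluation
merged_df = test_df.merge(fcst_df, on=["unique_id", "ds"], how="left")
```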
Step 4. Define Models and Metrics for Evaluation
Now define the models to evaluate and the metrics to compute. For more information about supported metrics, refer to the evaluation metrics tutorial.
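A sketch of the model and metric definitions, using the metric functions from `utilsforecast.losses`. MASE requires a seasonal period, set to 12 here as an assumption for monthly data.

```python
from functools import partial

from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, mse, rmse, mape, smape, mase, scaled_crps

# Columns in merged_df that hold model predictions
models = ["TimeGPT"]

# Metrics to compute; MASE needs the seasonal period (12 for monthly data)
metrics = [
    mae,
    mse,
    rmse,
    mape,
    smape,
    partial(mase, seasonality=12),
    scaled_crps,
]
```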
Step 5. Run the Evaluation
Finally, call the evaluate function with your merged forecast results. Include train_df for metrics that need the training data (such as MASE) and level if you are using probabilistic metrics (such as scaled CRPS).
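Putting it together, a sketch of the evaluation call: train_df feeds MASE's in-sample scaling, and level=[90] matches the intervals requested in the forecast step.

```python
evaluation = evaluate(
    df=merged_df,
    metrics=metrics,
    models=models,
    train_df=train_df,   # required by MASE for in-sample scaling
    level=[90],          # required by scaled CRPS; must match the forecast call
)
print(evaluation)
```

The result is one row per series and metric, with one column per model: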
| unique_id  | metric      | TimeGPT  |
|------------|-------------|----------|
| passengers | mae         | 12.67930 |
| passengers | mse         | 213.9358 |
| passengers | rmse        | 14.62654 |
| passengers | mape        | 0.026964 |
| passengers | smape       | 0.013527 |
| passengers | mase        | 0.416397 |
| passengers | scaled_crps | 0.008991 |