Categorical Variables

What Are Categorical Variables?

Categorical variables are external factors that take on a limited range of discrete values, grouping observations by categories. For example, “Sporting” or “Cultural” events in a dataset describing product demand.

By capturing unique external conditions, categorical variables enhance the predictive power of your model and can reduce forecasting error. They are easy to incorporate by merging each time series data point with its corresponding categorical data.

This tutorial demonstrates how to incorporate categorical (discrete) variables into TimeGPT forecasts.

How to Use Categorical Variables in TimeGPT

Step 1: Import Packages and Initialize the Nixtla Client

Make sure you have the necessary libraries installed: pandas, nixtla, and datasetsforecast.

import pandas as pd
import os

from nixtla import NixtlaClient
from datasetsforecast.m5 import M5

# Initialize the Nixtla Client
nixtla_client = NixtlaClient(
    api_key='my_api_key_provided_by_nixtla'
)

Step 2: Load M5 Data

We use the M5 dataset — a collection of daily product sales demands across 10 US stores — to showcase how categorical variables can improve forecasts.

Start by loading the M5 dataset and converting the date columns to datetime objects.

Y_df, X_df, _ = M5.load(directory=os.getcwd())

Y_df['ds'] = pd.to_datetime(Y_df['ds'])
X_df['ds'] = pd.to_datetime(X_df['ds'])

Y_df.head(10)

unique_id	ds	y
FOODS_1_001_CA_1	2011-01-29	3.0
FOODS_1_001_CA_1	2011-01-30	0.0
FOODS_1_001_CA_1	2011-01-31	0.0
FOODS_1_001_CA_1	2011-02-01	1.0
FOODS_1_001_CA_1	2011-02-02	4.0
FOODS_1_001_CA_1	2011-02-03	2.0
FOODS_1_001_CA_1	2011-02-04	0.0
FOODS_1_001_CA_1	2011-02-05	2.0
FOODS_1_001_CA_1	2011-02-06	0.0
FOODS_1_001_CA_1	2011-02-07	0.0

Extract the categorical columns from the X_df dataframe.

X_df = X_df[['unique_id', 'ds', 'event_type_1']]
X_df.head(10)

unique_id	ds	event_type_1
FOODS_1_001_CA_1	2011-01-29	nan
FOODS_1_001_CA_1	2011-01-30	nan
FOODS_1_001_CA_1	2011-01-31	nan
FOODS_1_001_CA_1	2011-02-01	nan
FOODS_1_001_CA_1	2011-02-02	nan
FOODS_1_001_CA_1	2011-02-03	nan
FOODS_1_001_CA_1	2011-02-04	nan
FOODS_1_001_CA_1	2011-02-05	nan
FOODS_1_001_CA_1	2011-02-06	Sporting
FOODS_1_001_CA_1	2011-02-07	nan

Notice that there is a Sporting event on February 6, 2011, listed under event_type_1.

Step 3: Prepare Data for Forecasting

We’ll select a specific product to demonstrate how to incorporate categorical features into TimeGPT forecasts.

Select a High-Selling Product and Merge Data

Start by selecting a high-selling product and merging the data.

product = 'FOODS_3_090_CA_3'

Y_df_product = Y_df.query('unique_id == @product')
X_df_product = X_df.query('unique_id == @product')

df = Y_df_product.merge(X_df_product)
df.head(10)

unique_id	ds	y	event_type_1
FOODS_3_090_CA_3	2011-01-29	108.0	nan
FOODS_3_090_CA_3	2011-01-30	132.0	nan
FOODS_3_090_CA_3	2011-01-31	102.0	nan
FOODS_3_090_CA_3	2011-02-01	120.0	nan
FOODS_3_090_CA_3	2011-02-02	106.0	nan
FOODS_3_090_CA_3	2011-02-03	123.0	nan
FOODS_3_090_CA_3	2011-02-04	279.0	nan
FOODS_3_090_CA_3	2011-02-05	175.0	nan
FOODS_3_090_CA_3	2011-02-06	186.0	Sporting
FOODS_3_090_CA_3	2011-02-07	120.0	nan

One-Hot Encode Categorical Events

Encode categorical variables using one-hot encoding. One-hot encoding transforms each category into a separate column containing binary indicators (0 or 1).

event_type_1_ohe = pd.get_dummies(df['event_type_1'], dtype=int)

df = pd.concat([df, event_type_1_ohe], axis=1)
df = df.drop(columns=['event_type_1'])

df.tail(10)

unique_id	ds	y	Sporting	nan
FOODS_3_090_CA_3	2016-06-10	140.0	0	1
FOODS_3_090_CA_3	2016-06-11	151.0	0	1
FOODS_3_090_CA_3	2016-06-12	87.0	0	1
FOODS_3_090_CA_3	2016-06-13	67.0	0	1
FOODS_3_090_CA_3	2016-06-14	50.0	0	1
FOODS_3_090_CA_3	2016-06-15	58.0	0	1
FOODS_3_090_CA_3	2016-06-16	116.0	0	1
FOODS_3_090_CA_3	2016-06-17	124.0	0	1
FOODS_3_090_CA_3	2016-06-18	167.0	0	1
FOODS_3_090_CA_3	2016-06-19	118.0	1	0

Prepare Future External Variables

Select future external variables for Feb 1-7, 2016.

future_ex_vars_df = df.drop(columns=['y']).query("ds >= '2016-02-01' & ds <= '2016-02-07'")

Separate training data before Feb 1, 2016.

df_train = df.query("ds < '2016-02-01'")
df_train.tail(10)

unique_id	ds	y	nan
FOODS_3_090_CA_3	2016-01-22	94.0	1
FOODS_3_090_CA_3	2016-01-23	144.0	1
FOODS_3_090_CA_3	2016-01-24	146.0	1
FOODS_3_090_CA_3	2016-01-25	87.0	1
FOODS_3_090_CA_3	2016-01-26	73.0	1
FOODS_3_090_CA_3	2016-01-27	62.0	1
FOODS_3_090_CA_3	2016-01-28	64.0	1
FOODS_3_090_CA_3	2016-01-29	102.0	1
FOODS_3_090_CA_3	2016-01-30	113.0	1
FOODS_3_090_CA_3	2016-01-31	98.0	1

Step 4: Forecast Product Demand

To evaluate the impact of categorical variables, we’ll forecast product demand with and without them.

Forecast Without Categorical Variables

timegpt_fcst_without_cat_vars_df = nixtla_client.forecast(
    df=df_train,
    h=7,
    level=[80, 90]
)

timegpt_fcst_without_cat_vars_df.head()

unique_id	ds	TimeGPT	TimeGPT-lo-90	TimeGPT-lo-80	TimeGPT-hi-80	TimeGPT-hi-90
FOODS_3_090_CA_3	2016-02-01	73.304092	53.449049	54.795078	91.813107	93.159136
FOODS_3_090_CA_3	2016-02-02	66.335518	47.510669	50.274136	82.396899	85.160367
FOODS_3_090_CA_3	2016-02-03	65.881630	36.218617	41.388896	90.374364	95.544643
FOODS_3_090_CA_3	2016-02-04	72.371864	-26.683115	25.097362	119.646367	171.426844
FOODS_3_090_CA_3	2016-02-05	95.141045	-2.084882	34.027078	156.255011	192.366971

Visualize the forecast without categorical variables.

nixtla_client.plot(
    df[['unique_id', 'ds', 'y']].query("ds <= '2016-02-07'"),
    timegpt_fcst_without_cat_vars_df,
    max_insample_length=28,
)

Forecast with categorical variables

TimeGPT already provides a reasonable forecast, but it seems to somewhat underforecast the peak on the 6th of February 2016 - the day before the Super Bowl.

Forecast With Categorical Variables

timegpt_fcst_with_cat_vars_df = nixtla_client.forecast(
    df=df_train,
    X_df=future_ex_vars_df,
    h=7,
    level=[80, 90]
)

timegpt_fcst_with_cat_vars_df.head()

unique_id	ds	TimeGPT	TimeGPT-lo-90	TimeGPT-lo-80	TimeGPT-hi-80	TimeGPT-hi-90
FOODS_3_090_CA_3	2016-02-01	70.661271	-0.204378	14.593348	126.729194	141.526919
FOODS_3_090_CA_3	2016-02-02	65.566941	-20.394326	11.654239	119.479643	151.528208
FOODS_3_090_CA_3	2016-02-03	68.510010	-33.713710	6.732952	130.287069	170.733731
FOODS_3_090_CA_3	2016-02-04	75.417710	-40.974649	4.751767	146.083653	191.810069
FOODS_3_090_CA_3	2016-02-05	97.340302	-57.385361	18.253812	176.426792	252.065965

Visualize the forecast with categorical variables.

# Visualize the forecast with categorical variables
nixtla_client.plot(
    df[['unique_id', 'ds', 'y']].query("ds <= '2016-02-07'"),
    timegpt_fcst_with_cat_vars_df,
    max_insample_length=28,
)

Forecast with categorical variables

5. Evaluate Forecast Accuracy

Finally, we calculate the Mean Absolute Error (MAE) for the forecasts with and without categorical variables.

# Create target dataframe
df_target = df[['unique_id', 'ds', 'y']].query("ds >= '2016-02-01' & ds <= '2016-02-07'")

# Rename forecast columns
timegpt_fcst_without_cat_vars_df = timegpt_fcst_without_cat_vars_df.rename(columns={'TimeGPT': 'TimeGPT-without-cat-vars'})
timegpt_fcst_with_cat_vars_df = timegpt_fcst_with_cat_vars_df.rename(columns={'TimeGPT': 'TimeGPT-with-cat-vars'})

# Merge forecasts with target dataframe
df_target = df_target.merge(timegpt_fcst_without_cat_vars_df[['unique_id', 'ds', 'TimeGPT-without-cat-vars']])
df_target = df_target.merge(timegpt_fcst_with_cat_vars_df[['unique_id', 'ds', 'TimeGPT-with-cat-vars']])

# Compute errors
mean_absolute_errors = mae(df_target, ['TimeGPT-without-cat-vars', 'TimeGPT-with-cat-vars'])

unique_id	TimeGPT-without-cat-vars	TimeGPT-with-cat-vars
FOODS_3_090_CA_3	24.285649	20.028514

Including categorical variables noticeably improves forecast accuracy, reducing MAE by about 20%.

Conclusion

Categorical variables are powerful additions to TimeGPT forecasts, helping capture valuable external factors. By properly encoding these variables and merging them with your time series, you can significantly enhance predictive performance.

Continue exploring more advanced techniques or different datasets to further improve your TimeGPT forecasting models.

INTRODUCTION

SETUP

DATA REQUIREMENTS

FORECASTING

ANOMALY DETECTION

USE CASES

REFERENCE

About

What Are Categorical Variables?

How to Use Categorical Variables in TimeGPT

Step 1: Import Packages and Initialize the Nixtla Client

Step 2: Load M5 Data

Step 3: Prepare Data for Forecasting

Select a High-Selling Product and Merge Data

One-Hot Encode Categorical Events

Prepare Future External Variables

Step 4: Forecast Product Demand

Forecast Without Categorical Variables

Forecast With Categorical Variables

5. Evaluate Forecast Accuracy

Conclusion

INTRODUCTION

SETUP

DATA REQUIREMENTS

FORECASTING

ANOMALY DETECTION

USE CASES

REFERENCE

About

​What Are Categorical Variables?

​How to Use Categorical Variables in TimeGPT

​Step 1: Import Packages and Initialize the Nixtla Client

​Step 2: Load M5 Data

​Step 3: Prepare Data for Forecasting

​Select a High-Selling Product and Merge Data

​One-Hot Encode Categorical Events

​Prepare Future External Variables

​Step 4: Forecast Product Demand

​Forecast Without Categorical Variables

​Forecast With Categorical Variables

​5. Evaluate Forecast Accuracy

​Conclusion

What Are Categorical Variables?

How to Use Categorical Variables in TimeGPT

Step 1: Import Packages and Initialize the Nixtla Client

Step 2: Load M5 Data

Step 3: Prepare Data for Forecasting

Select a High-Selling Product and Merge Data

One-Hot Encode Categorical Events

Prepare Future External Variables

Step 4: Forecast Product Demand

Forecast Without Categorical Variables

Forecast With Categorical Variables

5. Evaluate Forecast Accuracy

Conclusion