What Are Categorical Variables?
Categorical variables are external factors that take on a limited set of discrete values, grouping observations into categories. For example, in a dataset describing product demand, an event type such as “Sporting” or “Cultural” is a categorical variable. By capturing external conditions that the target series alone does not reveal, categorical variables can improve the predictive power of your model and reduce forecasting error. They are easy to incorporate: merge each time series data point with its corresponding categorical data. This tutorial demonstrates how to incorporate categorical (discrete) variables into TimeGPT forecasts.
How to Use Categorical Variables in TimeGPT
Step 1: Import Packages and Initialize the Nixtla Client
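A minimal sketch of the client setup for this step. It assumes the nixtla package exposes `NixtlaClient` with an `api_key` argument and that the key is stored in an environment variable named `NIXTLA_API_KEY`; the helper name `make_client` is hypothetical and used only for illustration.

```python
import os


def make_client():
    """Create a Nixtla client from an API key stored in the environment."""
    api_key = os.environ.get("NIXTLA_API_KEY")
    if not api_key:
        raise RuntimeError("Set the NIXTLA_API_KEY environment variable first.")

    # Imported lazily so this module still loads where nixtla is not installed.
    from nixtla import NixtlaClient

    return NixtlaClient(api_key=api_key)
```

Calling `nixtla_client = make_client()` once and reusing the returned object in the later steps keeps the key out of the notebook itself.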
Make sure you have the necessary libraries installed: pandas, nixtla, and datasetsforecast.
Step 2: Load M5 Data
We use the M5 dataset, a collection of daily product sales across 10 US stores, to showcase how categorical variables can improve forecasts. Start by loading the M5 dataset and converting the date columns to datetime objects.

unique_id | ds | y |
---|---|---|
FOODS_1_001_CA_1 | 2011-01-29 | 3.0 |
FOODS_1_001_CA_1 | 2011-01-30 | 0.0 |
FOODS_1_001_CA_1 | 2011-01-31 | 0.0 |
FOODS_1_001_CA_1 | 2011-02-01 | 1.0 |
FOODS_1_001_CA_1 | 2011-02-02 | 4.0 |
FOODS_1_001_CA_1 | 2011-02-03 | 2.0 |
FOODS_1_001_CA_1 | 2011-02-04 | 0.0 |
FOODS_1_001_CA_1 | 2011-02-05 | 2.0 |
FOODS_1_001_CA_1 | 2011-02-06 | 0.0 |
FOODS_1_001_CA_1 | 2011-02-07 | 0.0 |

The event data for the same series:

unique_id | ds | event_type_1 |
---|---|---|
FOODS_1_001_CA_1 | 2011-01-29 | nan |
FOODS_1_001_CA_1 | 2011-01-30 | nan |
FOODS_1_001_CA_1 | 2011-01-31 | nan |
FOODS_1_001_CA_1 | 2011-02-01 | nan |
FOODS_1_001_CA_1 | 2011-02-02 | nan |
FOODS_1_001_CA_1 | 2011-02-03 | nan |
FOODS_1_001_CA_1 | 2011-02-04 | nan |
FOODS_1_001_CA_1 | 2011-02-05 | nan |
FOODS_1_001_CA_1 | 2011-02-06 | Sporting |
FOODS_1_001_CA_1 | 2011-02-07 | nan |

In this tutorial, the categorical variable is the event_type_1 column.
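The loading step described in this section might look like the sketch below. It assumes `datasetsforecast.m5.M5.load` returns a tuple of (target frame, exogenous frame, static frame); the helper names `load_m5` and `parse_dates` are hypothetical. The small `sample` frame is a stand-in copied from the first rows of the sales table so the date conversion can be run without downloading anything.

```python
import pandas as pd


def parse_dates(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` whose `ds` column is a datetime column."""
    out = df.copy()
    out["ds"] = pd.to_datetime(out["ds"])
    return out


def load_m5(directory: str):
    """Download M5 and return (Y_df, X_df) with datetime `ds` columns."""
    # Lazy import: datasetsforecast is only needed when data is actually loaded.
    from datasetsforecast.m5 import M5

    # Assumption: M5.load returns (target, exogenous, static) frames.
    Y_df, X_df, _ = M5.load(directory=directory)
    return parse_dates(Y_df), parse_dates(X_df)


# Stand-in built from the first rows of the sales table.
sample = pd.DataFrame(
    {
        "unique_id": ["FOODS_1_001_CA_1"] * 3,
        "ds": ["2011-01-29", "2011-01-30", "2011-01-31"],
        "y": [3.0, 0.0, 0.0],
    }
)
sample = parse_dates(sample)
print(sample.dtypes)
```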
Step 3: Prepare Data for Forecasting
We’ll select a specific product to demonstrate how to incorporate categorical features into TimeGPT forecasts.
Select a High-Selling Product and Merge Data
Start by selecting a high-selling product and merging its sales data with its event data.

unique_id | ds | y | event_type_1 |
---|---|---|---|
FOODS_3_090_CA_3 | 2011-01-29 | 108.0 | nan |
FOODS_3_090_CA_3 | 2011-01-30 | 132.0 | nan |
FOODS_3_090_CA_3 | 2011-01-31 | 102.0 | nan |
FOODS_3_090_CA_3 | 2011-02-01 | 120.0 | nan |
FOODS_3_090_CA_3 | 2011-02-02 | 106.0 | nan |
FOODS_3_090_CA_3 | 2011-02-03 | 123.0 | nan |
FOODS_3_090_CA_3 | 2011-02-04 | 279.0 | nan |
FOODS_3_090_CA_3 | 2011-02-05 | 175.0 | nan |
FOODS_3_090_CA_3 | 2011-02-06 | 186.0 | Sporting |
FOODS_3_090_CA_3 | 2011-02-07 | 120.0 | nan |
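The selection and merge can be sketched with plain pandas. The two toy frames below reuse a few rows from the tables in this tutorial; the variable names `Y_df`, `X_df`, and `product` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy frames that mirror the layout of the sales and event tables.
Y_df = pd.DataFrame(
    {
        "unique_id": ["FOODS_1_001_CA_1"] * 2 + ["FOODS_3_090_CA_3"] * 3,
        "ds": pd.to_datetime(
            ["2011-02-05", "2011-02-06", "2011-02-05", "2011-02-06", "2011-02-07"]
        ),
        "y": [2.0, 0.0, 175.0, 186.0, 120.0],
    }
)
X_df = pd.DataFrame(
    {
        "unique_id": Y_df["unique_id"],
        "ds": Y_df["ds"],
        "event_type_1": [np.nan, "Sporting", np.nan, "Sporting", np.nan],
    }
)

product = "FOODS_3_090_CA_3"

# Keep one series, then attach its event column by matching on id and date.
df = Y_df[Y_df["unique_id"] == product].merge(
    X_df.loc[X_df["unique_id"] == product, ["unique_id", "ds", "event_type_1"]],
    on=["unique_id", "ds"],
    how="left",
)
print(df)
```

A left merge keeps every sales row even if a date has no matching event row, which avoids silently shortening the series.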
One-Hot Encode Categorical Events
Encode the categorical variable using one-hot encoding, which transforms each category into a separate column of binary indicators (0 or 1). Days without an event are flagged in the nan column.

unique_id | ds | y | Cultural | National | Religious | Sporting | nan |
---|---|---|---|---|---|---|---|
FOODS_3_090_CA_3 | 2016-06-10 | 140.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-06-11 | 151.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-06-12 | 87.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-06-13 | 67.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-06-14 | 50.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-06-15 | 58.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-06-16 | 116.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-06-17 | 124.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-06-18 | 167.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-06-19 | 118.0 | 0 | 0 | 0 | 1 | 0 |
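One way to produce indicator columns like these is `pandas.get_dummies`. The sketch below is an illustration on three rows taken from the merged table, not necessarily the exact code behind the output above; labelling missing events with the string "nan" is an assumption chosen to reproduce the nan column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "unique_id": ["FOODS_3_090_CA_3"] * 3,
        "ds": pd.to_datetime(["2011-02-05", "2011-02-06", "2011-02-07"]),
        "y": [175.0, 186.0, 120.0],
        "event_type_1": [np.nan, "Sporting", np.nan],
    }
)

# Label missing events explicitly so they get their own indicator column.
events = df["event_type_1"].fillna("nan")

# One 0/1 column per distinct category.
event_ohe = pd.get_dummies(events, dtype=int)

# Replace the original text column with the indicator columns.
df_encoded = pd.concat([df.drop(columns="event_type_1"), event_ohe], axis=1)
print(df_encoded)
```

Because this toy sample contains only one event type, it yields just the Sporting and nan columns; the full series would also produce Cultural, National, and Religious.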
Prepare Future External Variables
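Select future external variables for February 1-7, 2016. The table below shows the last rows of the history that precedes this window, ending on 2016-01-31.

unique_id | ds | y | Cultural | National | Religious | Sporting | nan |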
---|---|---|---|---|---|---|---|
FOODS_3_090_CA_3 | 2016-01-22 | 94.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-01-23 | 144.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-01-24 | 146.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-01-25 | 87.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-01-26 | 73.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-01-27 | 62.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-01-28 | 64.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-01-29 | 102.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-01-30 | 113.0 | 0 | 0 | 0 | 0 | 1 |
FOODS_3_090_CA_3 | 2016-01-31 | 98.0 | 0 | 0 | 0 | 0 | 1 |
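Splitting the encoded data into history and future external variables can be sketched as a pair of date filters. The frame below uses placeholder values rather than M5 data, and the names `df_train` and `future_ex_vars_df` are assumptions for illustration.

```python
import pandas as pd

# Toy encoded frame spanning the split date; values are placeholders, not M5 data.
dates = pd.date_range("2016-01-25", "2016-02-10", freq="D")
df = pd.DataFrame(
    {
        "unique_id": "FOODS_3_090_CA_3",
        "ds": dates,
        "y": range(len(dates)),
        "Sporting": 0,
        "nan": 1,
    }
)

start, end = pd.Timestamp("2016-02-01"), pd.Timestamp("2016-02-08")

# History available to the model: everything before the forecast window.
df_train = df[df["ds"] < start]

# Future external variables: the 7 forecast days, without the target column.
future_ex_vars_df = df[(df["ds"] >= start) & (df["ds"] < end)].drop(columns="y")

print(df_train.tail(3))
print(future_ex_vars_df)
```

Dropping `y` from the future frame matters because the target is unknown over the forecast horizon; only the event indicators are known in advance.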
Step 4: Forecast Product Demand
To evaluate the impact of categorical variables, we’ll forecast product demand with and without them.
Forecast Without Categorical Variables
unique_id | ds | TimeGPT | TimeGPT-lo-90 | TimeGPT-lo-80 | TimeGPT-hi-80 | TimeGPT-hi-90 |
---|---|---|---|---|---|---|
FOODS_3_090_CA_3 | 2016-02-01 | 73.304092 | 53.449049 | 54.795078 | 91.813107 | 93.159136 |
FOODS_3_090_CA_3 | 2016-02-02 | 66.335518 | 47.510669 | 50.274136 | 82.396899 | 85.160367 |
FOODS_3_090_CA_3 | 2016-02-03 | 65.881630 | 36.218617 | 41.388896 | 90.374364 | 95.544643 |
FOODS_3_090_CA_3 | 2016-02-04 | 72.371864 | -26.683115 | 25.097362 | 119.646367 | 171.426844 |
FOODS_3_090_CA_3 | 2016-02-05 | 95.141045 | -2.084882 | 34.027078 | 156.255011 | 192.366971 |

Forecast With Categorical Variables
unique_id | ds | TimeGPT | TimeGPT-lo-90 | TimeGPT-lo-80 | TimeGPT-hi-80 | TimeGPT-hi-90 |
---|---|---|---|---|---|---|
FOODS_3_090_CA_3 | 2016-02-01 | 70.661271 | -0.204378 | 14.593348 | 126.729194 | 141.526919 |
FOODS_3_090_CA_3 | 2016-02-02 | 65.566941 | -20.394326 | 11.654239 | 119.479643 | 151.528208 |
FOODS_3_090_CA_3 | 2016-02-03 | 68.510010 | -33.713710 | 6.732952 | 130.287069 | 170.733731 |
FOODS_3_090_CA_3 | 2016-02-04 | 75.417710 | -40.974649 | 4.751767 | 146.083653 | 191.810069 |
FOODS_3_090_CA_3 | 2016-02-05 | 97.340302 | -57.385361 | 18.253812 | 176.426792 | 252.065965 |

Figure: Forecast with categorical variables.
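The two forecasts in this step can be sketched as two calls to the client's `forecast` method. This assumes `client` is a `NixtlaClient` whose `forecast` accepts `df`, `h`, `level`, and `X_df`; the wrapper name `forecast_both` is hypothetical. The horizon of 7 and the 80 and 90 levels follow the Feb 1-7 window and the interval columns shown in the tables.

```python
import pandas as pd


def forecast_both(client, df_train: pd.DataFrame, future_ex_vars_df: pd.DataFrame,
                  h: int = 7, level=(80, 90)):
    """Return (forecast without exogenous columns, forecast with them)."""
    base_cols = ["unique_id", "ds", "y"]

    # Baseline: only the target history is sent to the model.
    fcst_without = client.forecast(df=df_train[base_cols], h=h, level=list(level))

    # With categorical variables: the history keeps the one-hot columns, and
    # X_df supplies their known values over the forecast horizon.
    fcst_with = client.forecast(
        df=df_train, X_df=future_ex_vars_df, h=h, level=list(level)
    )
    return fcst_without, fcst_with
```

Passing the client in as an argument keeps the function easy to exercise with a stub object before spending API calls on the real service.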
Step 5: Evaluate Forecast Accuracy
Finally, we calculate the Mean Absolute Error (MAE) for the forecasts with and without categorical variables.

unique_id | TimeGPT-without-cat-vars | TimeGPT-with-cat-vars |
---|---|---|
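FOODS_3_090_CA_3 | 24.285649 | 20.028514 |

For this series, including the categorical variables lowers the MAE from about 24.29 to about 20.03, a reduction of roughly 17.5%.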
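The MAE comparison can be sketched in plain pandas. The numbers below are placeholders chosen for easy arithmetic, not the tutorial's data, and the helper name `mae` is an assumption; a library metric could be substituted.

```python
import pandas as pd

# Placeholder numbers for illustration only; not the tutorial's data.
results = pd.DataFrame(
    {
        "y": [10.0, 20.0, 30.0],
        "TimeGPT-without-cat-vars": [12.0, 18.0, 33.0],
        "TimeGPT-with-cat-vars": [11.0, 19.0, 31.0],
    }
)


def mae(actual: pd.Series, predicted: pd.Series) -> float:
    """Mean of the absolute differences between actuals and predictions."""
    return float((actual - predicted).abs().mean())


# One score per forecast column, each compared against the same actuals.
scores = {
    col: mae(results["y"], results[col]) for col in results.columns if col != "y"
}
print(scores)
```

Both forecasts are scored against the same held-out actuals, so the lower value indicates the forecast that was closer on average over the window.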