TimeGPT accepts pandas and polars dataframes in long format. The minimum required columns are:

Required Columns

  • ds (timestamp): String or datetime in YYYY-MM-DD or YYYY-MM-DD HH:MM:SS format.

  • y (numeric): Numerical target variable to forecast.
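
As a minimal sketch of this layout, the snippet below builds a small pandas DataFrame with the two required columns. The values are hypothetical and only illustrate the shape; the second frame shows ds starting as YYYY-MM-DD strings and being parsed to datetimes.

```python
import pandas as pd

# Hypothetical monthly values, used only to illustrate the layout.
df = pd.DataFrame(
    {
        "ds": pd.date_range("2024-01-01", periods=4, freq="MS"),
        "y": [10.0, 12.5, 11.0, 13.2],
    }
)

# ds may also start out as strings in YYYY-MM-DD format and be parsed.
df_from_strings = pd.DataFrame(
    {"ds": ["2024-01-01", "2024-02-01"], "y": [10.0, 12.5]}
)
df_from_strings["ds"] = pd.to_datetime(df_from_strings["ds"])

print(df.dtypes)
```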

Optional Index

If a DataFrame lacks the ds column but uses a DatetimeIndex, that is also supported.

TimeGPT also supports distributed dataframe libraries such as dask, spark, and ray.

You can include additional exogenous features in the same DataFrame. See the Exogenous Variables tutorial for details.


Example DataFrame

Below is a sample of a valid input DataFrame for TimeGPT (with columns named timestamp and value instead of ds and y):

Sample Data Loading
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df.head()

Sample Data Preview

id  timestamp   value
0   1949-01-01  112
1   1949-02-01  118
2   1949-03-01  132
3   1949-04-01  129
4   1949-05-01  121

In this example:
timestamp corresponds to ds.
value corresponds to y.


Matching Columns to TimeGPT

You can choose how to align your DataFrame columns with TimeGPT’s expected structure:

Rename timestamp to ds and value to y:

Rename Columns Example
df = df.rename(columns={'timestamp': 'ds', 'value': 'y'})

Now your DataFrame has the explicitly required columns:

Show Head of DataFrame
print(df.head())

Example Forecast

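Alternatively, keep the original column names (skip the rename step) and pass them to the forecast method through time_col and target_col: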

Forecast Example
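from nixtla import NixtlaClient

# Replace the placeholder with your own API key.
nixtla_client = NixtlaClient(api_key='my_api_key_provided_by_nixtla')
# df must keep its original timestamp and value columns (no rename).
fcst = nixtla_client.forecast(
    df=df,
    h=12,
    time_col='timestamp',
    target_col='value'
)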

fcst.head()

Forecast Output Preview

    timestamp   TimeGPT
0   1961-01-01  437.83792
1   1961-02-01  426.06270
2   1961-03-01  463.11655
3   1961-04-01  478.24450
4   1961-05-01  505.64648

TimeGPT attempts to automatically infer your data’s frequency (freq). You can override this by specifying the freq parameter (e.g., freq='MS').
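
Before relying on automatic inference, it can help to check locally what frequency your timestamps imply. The sketch below uses pandas' own infer_freq on the first timestamps of the sample data; this is pandas' inference and is only assumed to agree with what TimeGPT infers.

```python
import pandas as pd

# First five timestamps from the air passengers sample shown earlier.
timestamps = pd.to_datetime(
    ["1949-01-01", "1949-02-01", "1949-03-01", "1949-04-01", "1949-05-01"]
)

# pandas returns a frequency alias, or None if no regular pattern is found.
inferred = pd.infer_freq(timestamps)
print(inferred)
```

If the result is None or not what you expect, passing freq explicitly is the safer choice.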

For more information, see the TimeGPT Quickstart.


Multiple Series

When forecasting multiple time series simultaneously, each series must include a unique identifier column called unique_id:

Multiple Series Data Loading
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv')
df.head()

Multiple-Series Data Preview

    unique_id  ds                   y
0   BE         2016-10-22 00:00:00  70.00
1   BE         2016-10-22 01:00:00  37.10

With the data in this format, call the forecast method without any extra column arguments:

Multiple Series Forecast Example
fcst = nixtla_client.forecast(df=df, h=24)
fcst.head()

TimeGPT will produce forecasts for all unique IDs in your DataFrame simultaneously.
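
If your series are stored in wide format (one column per series), they need to be reshaped to long format first. The following is a minimal sketch using pandas melt; the column names series_a and series_b and their values are hypothetical.

```python
import pandas as pd

# Hypothetical wide-format data: one column per series.
wide = pd.DataFrame(
    {
        "ds": pd.date_range("2024-01-01", periods=3, freq="D"),
        "series_a": [1.0, 2.0, 3.0],
        "series_b": [10.0, 20.0, 30.0],
    }
)

# Stack the series columns into unique_id / y pairs, one row per observation.
long_df = (
    wide.melt(id_vars="ds", var_name="unique_id", value_name="y")
    .sort_values(["unique_id", "ds"])
    .reset_index(drop=True)
)
long_df = long_df[["unique_id", "ds", "y"]]

print(long_df)
```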


Exogenous Variables

TimeGPT can use exogenous variables in your forecasts. If you have future values for these variables, provide them in a separate DataFrame.
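
As a rough sketch of what such a separate DataFrame might look like, the snippet below builds a historical frame with one exogenous column and a future frame covering the next h periods. The column name price, the values, and the assumption that the future frame spans exactly h periods after the last timestamp are illustrative; the Exogenous Variables tutorial describes the layout TimeGPT actually expects.

```python
import pandas as pd

h = 3  # forecast horizon, in periods

# Hypothetical history with one exogenous column named "price".
history = pd.DataFrame(
    {
        "unique_id": ["A"] * 4,
        "ds": pd.date_range("2024-01-01", periods=4, freq="D"),
        "y": [5.0, 6.0, 5.5, 6.5],
        "price": [1.0, 1.1, 1.0, 1.2],
    }
)

# Future frame: same id and time columns plus the exogenous column, no target.
last_ds = history["ds"].max()
future = pd.DataFrame(
    {
        "unique_id": ["A"] * h,
        # Drop the first date because it is the last observed timestamp.
        "ds": pd.date_range(last_ds, periods=h + 1, freq="D")[1:],
        "price": [1.1, 1.3, 1.2],
    }
)

print(future)
```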


Important Considerations

Warning: Data passed to TimeGPT must not contain missing values or time gaps.

To handle missing data, see Dealing with Missing Values in TimeGPT.
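
A quick local check can catch both problems before you send data. The sketch below, which assumes a single daily series with hypothetical values, counts missing target values and lists timestamps absent from the expected date range.

```python
import pandas as pd

# Hypothetical daily series with one missing value and one missing date.
df = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
        "y": [1.0, None, 3.0],
    }
)

# Missing values in the target column.
n_missing = int(df["y"].isna().sum())

# Time gaps: timestamps expected at a daily frequency but not present.
expected = pd.date_range(df["ds"].min(), df["ds"].max(), freq="D")
gaps = expected.difference(pd.DatetimeIndex(df["ds"]))

print(n_missing, list(gaps))
```

For multiple series, the same check would be applied per unique_id.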


Minimum Data Requirements (Azure AI)

These are the minimum data sizes required for each frequency when using Azure AI:

Frequency                          Minimum Size
Hourly and subhourly (e.g., “H”)   1008
Daily (“D”)                        300
Weekly (e.g., “W-MON”)             64
Monthly and others                 48

When preparing your data, also consider:

  1. Forecast horizon (h): Number of future periods you want to predict.

  2. Number of validation windows (n_windows): How many times to test the model’s performance.

  3. Gaps (step_size): Periodic offset between validation windows during cross-validation.

Accounting for these values ensures you have enough data for both training and evaluation.
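
One way to reason about these quantities together is a rough budget. The sketch below assumes, as a rule of thumb that is not stated in this page, that rolling cross-validation holds out h + step_size * (n_windows - 1) observations and that the minimum size from the table should remain available for training. The helper min_rows_needed is hypothetical and not part of any library.

```python
def min_rows_needed(min_size: int, h: int, n_windows: int, step_size: int) -> int:
    """Rough per-series row budget (assumed rule of thumb, not an official formula)."""
    # Observations spanned by all validation windows in rolling cross-validation.
    held_out = h + step_size * (n_windows - 1)
    # Assumption: the table's minimum size must still be left for training.
    return min_size + held_out


# Monthly data (minimum 48), 12-step horizon, 3 windows spaced 12 steps apart.
needed = min_rows_needed(min_size=48, h=12, n_windows=3, step_size=12)
print(needed)
```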