Dask
Run TimeGPT in a distributed manner using Dask for scalable forecasting.
Dask is an open-source parallel computing library for Python. This guide explains how to use TimeGPT from Nixtla with Dask for distributed forecasting tasks.
Highlights
• Simplify distributed computing with Fugue.
• Run TimeGPT at scale on a Dask cluster.
• Seamlessly convert pandas DataFrames to Dask.
Outline
Step 1: Installation
Step 2: Load Your Data
You can start by loading data into a pandas DataFrame. In this example, we use hourly electricity prices from multiple markets:
| | unique_id | ds | y |
|---|---|---|---|
| 0 | BE | 2016-10-22 00:00:00 | 70.00 |
| 1 | BE | 2016-10-22 01:00:00 | 37.10 |
| 2 | BE | 2016-10-22 02:00:00 | 37.10 |
| 3 | BE | 2016-10-22 03:00:00 | 44.75 |
| 4 | BE | 2016-10-22 04:00:00 | 37.10 |
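To experiment without the dataset at hand, a frame with the same three columns (unique_id, ds, y) can be built as follows. This is a minimal sketch with synthetic values: the second market code DE, the base prices, and the 48-hour length are assumptions made for illustration and are not the tutorial's data.

```python
import numpy as np
import pandas as pd

# 48 hourly timestamps, starting at the first timestamp shown in the table.
timestamps = pd.date_range("2016-10-22", periods=48, freq="h")

frames = []
# Hypothetical markets and base prices, used only to produce two series.
for market, base_price in [("BE", 40.0), ("DE", 35.0)]:
    frames.append(
        pd.DataFrame(
            {
                "unique_id": market,  # series identifier
                "ds": timestamps,  # timestamp column
                "y": base_price + (np.arange(48) % 24),  # synthetic hourly price
            }
        )
    )

# One long-format frame: one row per (series, timestamp) pair.
df = pd.concat(frames, ignore_index=True)
print(df.head())
```

With real data, the same layout would typically come from pd.read_csv followed by parsing the ds column as datetimes.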
Step 3: Import Dask
Convert the pandas DataFrame into a Dask DataFrame for parallel processing.
When converting to a Dask DataFrame, you can specify the number of partitions based on your data size or system resources.
Step 4: Use TimeGPT on Dask
To use TimeGPT with Dask, provide a Dask DataFrame to Nixtla’s client methods instead of a pandas DataFrame.
Important Concept: NixtlaClient
Instantiate the NixtlaClient class to interact with Nixtla’s API.
You can use any method from the NixtlaClient, such as forecast or cross_validation.
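The calls described above might look like the following sketch. The helper name forecast_on_dask, the horizon of 12, the two cross-validation windows, and reading the key from a NIXTLA_API_KEY environment variable are assumptions for illustration; running it requires the nixtla package, Fugue with Dask support, and a valid API key.

```python
import os


def forecast_on_dask(dask_df, horizon=12):
    """Forecast and cross-validate every series in a Dask DataFrame (sketch)."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from nixtla import NixtlaClient

    # Assumption: the API key is stored in the NIXTLA_API_KEY environment variable.
    client = NixtlaClient(api_key=os.environ["NIXTLA_API_KEY"])

    # Passing a Dask DataFrame instead of a pandas one is the only change
    # compared with the single-machine workflow.
    fcst_df = client.forecast(df=dask_df, h=horizon)

    # Cross-validation accepts the same distributed input.
    cv_df = client.cross_validation(df=dask_df, h=horizon, n_windows=2)

    return fcst_df, cv_df
```

Assuming the results come back as Dask DataFrames, calling fcst_df.compute().head() would bring the first rows into pandas for inspection.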
| | unique_id | ds | TimeGPT |
|---|---|---|---|
| 0 | BE | 2016-12-31 00:00:00 | 45.190453 |
| 1 | BE | 2016-12-31 01:00:00 | 43.244446 |
| 2 | BE | 2016-12-31 02:00:00 | 41.958389 |
| 3 | BE | 2016-12-31 03:00:00 | 39.796486 |
| 4 | BE | 2016-12-31 04:00:00 | 39.204533 |
| | unique_id | ds | cutoff | TimeGPT |
|---|---|---|---|---|
| 0 | BE | 2016-12-30 04:00:00 | 2016-12-30 03:00:00 | 39.375439 |
| 1 | BE | 2016-12-30 05:00:00 | 2016-12-30 03:00:00 | 40.039215 |
| 2 | BE | 2016-12-30 06:00:00 | 2016-12-30 03:00:00 | 43.455849 |
| 3 | BE | 2016-12-30 07:00:00 | 2016-12-30 03:00:00 | 47.716408 |
| 4 | BE | 2016-12-30 08:00:00 | 2016-12-30 03:00:00 | 50.316650 |
TimeGPT with Dask also supports exogenous variables. Refer to the Exogenous Variables Tutorial for details, and use Dask DataFrames wherever that tutorial uses pandas DataFrames.