
Overview

Ray is an open-source unified compute framework that helps scale Python workloads for distributed computing. This guide demonstrates how to distribute TimeGPT forecasting jobs on top of Ray. Ray is ideal for machine learning pipelines with complex task dependencies and datasets with 10+ million observations. Its unified framework excels at orchestrating distributed ML workflows, making it perfect for integrating TimeGPT into broader AI applications.

Why Use Ray for Time Series Forecasting?

Ray offers unique advantages for ML-focused time series forecasting:
  • ML pipeline integration: Seamlessly integrate TimeGPT into complex ML workflows with Ray Tune and Ray Serve
  • Task parallelism: Handle complex task dependencies beyond data parallelism
  • Python-native: Pure Python with minimal boilerplate code
  • Flexible architecture: Scale from laptop to cluster with the same code
  • Actor model: Stateful computations for advanced forecasting scenarios
Choose Ray when you’re building ML pipelines, need complex task orchestration, or want to integrate TimeGPT with other ML frameworks such as PyTorch or TensorFlow.

What you’ll learn:
  • Install Fugue with Ray support for distributed computing
  • Initialize Ray clusters for distributed forecasting
  • Run TimeGPT forecasting and cross-validation on Ray

Prerequisites

Before proceeding, make sure you have an API key from Nixtla. When executing on a distributed Ray cluster, ensure the nixtla library is installed on all workers.
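One way to meet the worker requirement, sketched here under the assumption that your cluster nodes do not already ship the package, is Ray's runtime_env mechanism, which installs pip packages on each worker at startup:

```python
# Hedged sketch: declare the packages every Ray worker needs.
# The exact package list is an assumption about your cluster.
runtime_env = {"pip": ["nixtla", "fugue[ray]"]}

# On a real cluster you would pass this mapping to ray.init, e.g.:
#   import ray
#   ray.init(address="auto", runtime_env=runtime_env)
print(runtime_env["pip"])
```

Installing per job via runtime_env trades a little startup time for reproducibility; alternatively, bake nixtla into the worker image to avoid the per-job install.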

How to Use TimeGPT with Ray


Step 1: Install Fugue and Ray

Fugue provides an easy-to-use interface for distributed computation across frameworks like Ray. Install Fugue with Ray support:
pip install "fugue[ray]"

Step 2: Load Your Data

Load your dataset into a pandas DataFrame. This tutorial uses hourly electricity prices from various markets:
import pandas as pd

df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
    parse_dates=['ds'],
)
df.head()
Example pandas DataFrame:
  unique_id                  ds      y
0        BE 2016-10-22 00:00:00  70.00
1        BE 2016-10-22 01:00:00  37.10
2        BE 2016-10-22 02:00:00  37.10
3        BE 2016-10-22 03:00:00  44.75
4        BE 2016-10-22 04:00:00  37.10
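TimeGPT expects this long format: a unique_id series label, a ds timestamp, and a y target. A small, hedged sanity check (the sample values simply mirror the rows above) can catch schema problems before the data is distributed:

```python
import pandas as pd

# Rebuild the first rows of the tutorial's frame to illustrate the schema.
sample = pd.DataFrame({
    "unique_id": ["BE", "BE", "BE"],    # series identifier
    "ds": pd.to_datetime([
        "2016-10-22 00:00:00",
        "2016-10-22 01:00:00",
        "2016-10-22 02:00:00",
    ]),                                 # timestamp column
    "y": [70.00, 37.10, 37.10],         # target variable
})

# Verify the columns TimeGPT relies on are present and typed sensibly.
required = {"unique_id", "ds", "y"}
missing = required - set(sample.columns)
assert not missing, f"missing columns: {missing}"
assert pd.api.types.is_datetime64_any_dtype(sample["ds"])
```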

Step 3: Initialize Ray

Create a Ray cluster locally by initializing a head node. You can scale this to multiple machines in a real cluster environment.
import ray
from ray.cluster_utils import Cluster

ray_cluster = Cluster(
    initialize_head=True,
    head_node_args={"num_cpus": 2}
)

ray.init(address=ray_cluster.address, ignore_reinit_error=True)

# Convert your DataFrame to Ray format:
ray_df = ray.data.from_pandas(df)
ray_df

Step 4: Use TimeGPT on Ray

To use TimeGPT with Ray, provide a Ray Dataset to Nixtla’s client methods instead of a pandas DataFrame. The API remains the same as local usage. Instantiate the NixtlaClient class to interact with Nixtla’s API:
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    api_key='my_api_key_provided_by_nixtla'
)
You can use any method from the NixtlaClient, such as forecast or cross_validation.
Forecast example:
fcst_df = nixtla_client.forecast(ray_df, h=12)
fcst_df.to_pandas().tail()
Cross-validation example (the window settings here are illustrative):
cv_df = nixtla_client.cross_validation(ray_df, h=12, n_windows=5, step_size=2)
cv_df.to_pandas().tail()
The public API supports the models timegpt-1 (the default) and timegpt-1-long-horizon. For long-horizon forecasting, see the long-horizon model tutorial.
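To make the cross_validation behaviour concrete, here is a hedged, pandas-only toy of the rolling-window scheme it is based on. The h, n_windows, and step_size values are illustrative, and this is not Nixtla's implementation:

```python
import pandas as pd

# Toy series: 48 hourly points for one market.
ds = pd.date_range("2016-10-22", periods=48, freq="h")
series = pd.DataFrame({"ds": ds, "y": range(48)})

h, n_windows, step_size = 12, 3, 12
windows = []
for i in range(n_windows):
    # The cutoff moves forward by step_size for each successive window.
    cutoff = len(series) - h - (n_windows - 1 - i) * step_size
    train = series.iloc[:cutoff]            # everything before the cutoff
    valid = series.iloc[cutoff:cutoff + h]  # the next h points are held out
    windows.append((train, valid))

print([(len(t), len(v)) for t, v in windows])
# → [(12, 12), (24, 12), (36, 12)]
```

Each window trains on everything before a moving cutoff and holds out the next h points, so forecast errors are measured on several disjoint-in-time validation slices.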

Step 5: Shutdown Ray

Always shut down Ray after you finish your tasks to free up resources:
ray.shutdown()

Working with Exogenous Variables

TimeGPT with Ray also supports exogenous variables: simply substitute Ray Datasets for pandas DataFrames, and the API remains identical. Refer to the Exogenous Variables tutorial for details.