Distributed Computing for Large-Scale Forecasting

Handling large datasets is a common challenge in time series forecasting. For example, when working with retail data, you may need to forecast sales for 100,000+ products across hundreds of stores—generating millions of forecasts daily. Similarly, when dealing with electricity consumption data, you may need to predict consumption for millions of smart meters across multiple regions in real-time.

Why Distributed Computing for Forecasting?

Distributed computing offers significant advantages for time series forecasting:
  • Speed: Reduce computation time by 10-100x compared to single-machine processing
  • Scalability: Handle datasets that don’t fit in memory on a single machine
  • Cost-efficiency: Process more forecasts in less time, optimizing resource utilization
  • Reliability: Fault-tolerant processing ensures forecasts complete even if individual nodes fail
Nixtla’s TimeGPT lets you handle large datasets efficiently by integrating with distributed computing frameworks (Spark, Dask, and Ray, through Fugue) that parallelize forecasts across multiple time series and drastically reduce computation time.

Getting Started

Before getting started, ensure you have your TimeGPT API key. Upon registration, you’ll receive an email prompting you to confirm your signup. Once confirmed, access your dashboard and navigate to the API Keys section to retrieve your key. For detailed setup instructions, see the Setting Up Your Authentication Key tutorial.

How to Use TimeGPT with Distributed Computing Frameworks

Using TimeGPT with distributed computing frameworks is straightforward. The process only slightly differs from non-distributed usage.

Step 1: Instantiate a NixtlaClient

from nixtla import NixtlaClient

# Replace 'YOUR_API_KEY' with the key obtained from your Nixtla dashboard
client = NixtlaClient(api_key="YOUR_API_KEY")
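
Alternatively, if the NIXTLA_API_KEY environment variable is set, the client picks up the key automatically:

import os

# Set the key in the environment (or export it in your shell beforehand)
os.environ["NIXTLA_API_KEY"] = "YOUR_API_KEY"

client = NixtlaClient()  # reads NIXTLA_API_KEY from the environment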

Step 2: Load your data into a pandas DataFrame

Make sure your data is properly formatted, with each time series uniquely identified (e.g., by store or product).
import pandas as pd

data = pd.read_csv("your_time_series_data.csv")
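
TimeGPT expects long-format data with one row per series per timestamp. By default, the client looks for the columns unique_id (series identifier), ds (timestamp), and y (target value); differently named columns can be mapped with the id_col, time_col, and target_col arguments. A small illustrative example:

# Hypothetical long-format data: two series with daily observations
data = pd.DataFrame({
    "unique_id": ["store_1", "store_1", "store_2", "store_2"],
    "ds": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"]),
    "y": [120.0, 135.5, 87.0, 92.3],
})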

Step 3: Initialize a distributed computing framework

Currently, TimeGPT supports Spark, Dask, and Ray, all integrated through Fugue. See the documentation for each framework for detailed setup examples.
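
As an illustration, here is a minimal Spark setup that distributes the pandas DataFrame from Step 2 (the Dask and Ray setups are analogous):

from pyspark.sql import SparkSession

# Start (or attach to) a Spark session
spark = SparkSession.builder.appName("timegpt-forecasting").getOrCreate()

# Convert the pandas DataFrame into a distributed Spark DataFrame
spark_df = spark.createDataFrame(data)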

Step 4: Use NixtlaClient methods to forecast at scale

Once your framework is initialized and your data is loaded, you can apply the forecasting methods:
# Example call within the distributed environment: 14 periods ahead
forecast_results = client.forecast(
    df=data,  # the DataFrame loaded in Step 2
    h=14,     # forecast horizon (e.g., 14 days)
)
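
The same call works on a distributed DataFrame. A sketch using the Spark DataFrame from Step 3 (the frequency is stated explicitly here; adjust it to your data):

# Passing a Spark DataFrame triggers distributed execution via Fugue
forecast_results = client.forecast(
    df=spark_df,
    h=14,
    freq="D",  # daily data; adjust to your series' frequency
)

forecast_results.show(5)  # the result is also a Spark DataFrame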

Step 5: Stop the distributed computing framework

When you’re finished, you may need to terminate your Spark, Dask, or Ray session; this depends on your environment and setup. Note that parallelization in these frameworks operates across the multiple time series in your dataset, so ensure each series is uniquely identified to fully leverage it.
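
Typical teardown calls look like this; the variable names (spark, dask_client, cluster) assume the sessions created when you initialized each framework:

# Spark: stop the active session
spark.stop()

# Dask: close the distributed client and cluster
# dask_client.close()
# cluster.close()

# Ray: shut down the runtime
# import ray
# ray.shutdown()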

Real-World Use Cases

Distributed forecasting with TimeGPT is essential for:
  • Retail & E-commerce: Forecast demand for 100,000+ SKUs across multiple locations simultaneously
  • Energy & Utilities: Predict consumption patterns for millions of smart meters in real-time
  • Finance: Generate forecasts for thousands of stocks, currencies, or commodities
  • IoT & Manufacturing: Process sensor data from thousands of devices for predictive maintenance
  • Telecommunications: Forecast network traffic across thousands of cell towers
The distributed approach reduces forecast generation time from hours to minutes, enabling faster decision-making at scale.

Important Considerations

When to Use a Distributed Computing Framework

Consider a distributed framework if your dataset:
  • Contains millions of observations across multiple time series
  • Cannot fit into memory on a single machine
  • Requires extensive processing time that is impractical on a single machine
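
A quick way to gauge whether your data falls in this territory, assuming the long-format columns from Step 2:

n_obs = len(data)                                  # total observations
n_series = data["unique_id"].nunique()             # distinct time series
mem_gb = data.memory_usage(deep=True).sum() / 1e9  # in-memory footprint in GB

print(f"{n_obs:,} observations across {n_series:,} series (~{mem_gb:.2f} GB)")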

Choosing the Right Framework

When selecting between Spark, Dask, and Ray, weigh your existing infrastructure and your team’s expertise. TimeGPT works seamlessly with each of these frameworks and requires only minimal code changes, so pick the one that aligns with your organization’s tools and resources for the most efficient large-scale forecasting.

Framework Comparison

Framework | Best For                                                     | Ideal Dataset Size    | Learning Curve
--------- | ------------------------------------------------------------ | --------------------- | --------------
Spark     | Enterprise environments with existing Hadoop infrastructure   | 100M+ observations    | Medium
Dask      | Python-native workflows, easy scaling from pandas              | 10M-100M observations | Low
Ray       | Machine learning pipelines, complex task dependencies          | 10M+ observations     | Medium
Each framework integrates seamlessly with TimeGPT through Fugue, requiring minimal code changes to scale from single-machine to distributed forecasting.

Best Practices

To maximize the benefits of distributed forecasting:
  • Distribute workloads efficiently: Spread your forecasts across multiple compute nodes to handle huge datasets without exhausting memory or overwhelming single-machine resources.
  • Use proper identifiers: Ensure your data has distinct identifiers for each series. Correct labeling is crucial for successful multi-series parallel forecasts.
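
For example, a composite identifier can be built from existing columns (store_id and product_id are hypothetical names here):

# Combine store and product into one series identifier (hypothetical columns)
data["unique_id"] = data["store_id"].astype(str) + "_" + data["product_id"].astype(str)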

Frequently Asked Questions

Q: Which distributed framework should I choose for TimeGPT?
A: Choose Spark if you have existing Hadoop infrastructure, Dask if you’re already using Python/pandas and want the easiest transition, or Ray if you’re building complex ML pipelines.

Q: How much faster is distributed forecasting compared to single-machine?
A: Speed improvements typically range from 10-100x depending on your dataset size, number of time series, and cluster configuration. Datasets with more independent time series see greater parallelization benefits.

Q: Do I need to change my TimeGPT code to use distributed computing?
A: Minimal changes are required. After initializing your chosen framework (Spark/Dask/Ray), TimeGPT automatically detects and uses distributed processing. The API calls remain the same.

Q: Can I use distributed computing with fine-tuning and cross-validation?
A: Yes, TimeGPT supports distributed fine-tuning and cross-validation across all supported frameworks.
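
As a sketch of that last point, cross_validation accepts a distributed DataFrame the same way forecast does (spark_df as in Step 3; the window settings are illustrative):

# Distributed cross-validation: 3 evaluation windows, 14-step horizon
cv_results = client.cross_validation(
    df=spark_df,
    h=14,
    freq="D",
    n_windows=3,   # number of evaluation windows (illustrative)
    step_size=14,  # spacing between windows (illustrative)
)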