TimeGPT’s distributed capabilities help you handle expansive datasets by parallelizing your forecasts across multiple time series, drastically reducing computation times.
1. Getting Started
To use TimeGPT in any scenario—distributed or not—you must first have your API key. Make sure you’ve registered and confirmed your signup email with Nixtla.
For detailed steps on connecting your API key to Nixtla’s SDK, see the Setting Up Your Authentication Key tutorial.
2. Forecasting at Scale
Using TimeGPT with distributed computing frameworks is straightforward; the process differs only slightly from non-distributed usage.
1. Instantiate the NixtlaClient class
NixtlaClient Instantiation
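A minimal sketch of this step, assuming the nixtla Python SDK (the API key string below is a placeholder):

```python
from nixtla import NixtlaClient

# Instantiate the client with your API key.
# If api_key is omitted, the SDK reads the NIXTLA_API_KEY environment variable.
nixtla_client = NixtlaClient(api_key="my_api_key_provided_by_nixtla")
```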
2. Load your data into a pandas DataFrame
Make sure your data is properly formatted, with each time series uniquely identified (e.g., by store or product).
Loading Time Series Data
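As a sketch, a small long-format dataset with the column names TimeGPT expects (unique_id for the series identifier, ds for the timestamp, y for the target) could look like this; the store names and values are illustrative only:

```python
import pandas as pd

# Long format: one row per (series, timestamp) pair.
# 'unique_id' distinguishes the series, 'ds' holds timestamps, 'y' is the target.
df = pd.DataFrame({
    "unique_id": ["store_1"] * 3 + ["store_2"] * 3,
    "ds": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"] * 2),
    "y": [10.0, 12.5, 11.8, 20.1, 19.7, 21.3],
})
print(df.head())
```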
3. Initialize your distributed computing framework
Start a Spark, Dask, or Ray session and convert your pandas DataFrame into that framework’s distributed DataFrame, as shown in the sketch below.
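Using Spark as the example (Dask and Ray follow the same pattern), a minimal sketch with an illustrative app name and a local session:

```python
from pyspark.sql import SparkSession

# Start a Spark session; in production this would point at your cluster.
spark = SparkSession.builder.appName("timegpt-forecasting").getOrCreate()

# Distribute the pandas DataFrame from the previous step across the cluster.
spark_df = spark.createDataFrame(df)
spark_df.show(5)
```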
4. Use NixtlaClient methods to forecast at scale
Once your framework is initialized and your data is loaded, you can apply the forecasting methods:
Forecasting Example with NixtlaClient
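A minimal sketch, assuming the nixtla_client and spark_df objects created above; passing a distributed DataFrame lets the SDK parallelize the call across series, and the horizon and frequency values are illustrative:

```python
# Forecast the next 7 days for every series in the distributed DataFrame.
fcst_df = nixtla_client.forecast(
    df=spark_df,
    h=7,            # forecast horizon
    freq="D",       # daily frequency of the example data
    time_col="ds",  # timestamp column
    target_col="y", # target column
)
fcst_df.show(5)
```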
5. Stop the distributed computing framework
When you’re finished, you may need to terminate your Spark, Dask, or Ray session. This depends on your environment and setup.
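The usual shutdown calls, assuming a Spark session named spark, a Dask distributed Client named client, or an initialized Ray runtime:

```python
# Spark
spark.stop()

# Dask: close the distributed client (uncomment if you used Dask).
# client.close()

# Ray: shut down the Ray runtime (uncomment if you used Ray).
# ray.shutdown()
```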
Parallelization in these frameworks operates across multiple time series within your dataset. Ensure each series is uniquely identified so the parallelization can be fully leveraged.
3. Important Considerations
When to Use a Distributed Computing Framework
Consider a distributed framework if your dataset:
- Contains millions of observations across multiple time series.
- Cannot fit into memory on a single machine.
- Requires extensive processing time that is impractical on a single machine.
Choosing the Right Framework
When selecting Spark, Dask, or Ray, weigh your existing infrastructure and your team’s expertise. TimeGPT works with each of these frameworks and requires only minimal code changes, so pick the framework that aligns with your organization’s tools and resources for the most efficient large-scale forecasting.
Key Concept:
Distribute your forecasts across multiple compute nodes to handle huge datasets without clogging up memory or single-machine resources.
Key Concept:
Make sure your data has distinct identifiers for each series. Correct labeling is crucial for successful multi-series parallel forecasts.
With these guidelines, you can efficiently forecast large-scale time series data using TimeGPT and the distributed computing framework that best fits your environment.