Nixtla Blog
Eliminate Manual ARIMA Tuning Using StatsForecast AutoARIMA Automation
Eliminate weeks of manual ARIMA parameter tuning with StatsForecast's AutoARIMA. Automatically select optimal model parameters for 50+ time series with confidence intervals in under 30 minutes.
How much time does your team waste on parameter tuning? Manual ARIMA modeling requires weeks of statistical testing and expertise that doesn't scale beyond single series.
AutoARIMA eliminates manual parameter testing through automated optimization. It handles parameter selection, seasonality detection, and uncertainty quantification automatically. Complete 50-series forecasting in minutes, not weeks.
Introduction to StatsForecast
StatsForecast provides fast classical statistical models with automatic parameter selection. The library offers:
Automatic model selection: AutoARIMA, AutoETS, AutoTheta with intelligent parameter optimization
Scalable processing: Handle millions of time series with distributed computing and optimized algorithms
Unified interface: Consistent API across different forecasting methods
Built-in validation: Time series cross-validation for reliable model evaluation
Prediction intervals: Confidence intervals for uncertainty quantification
Install StatsForecast with a single command:
pip install statsforecast
For this tutorial, install additional dependencies:
pip install pandas numpy matplotlib
The source code for this tutorial is available on GitHub.
Setup - Retail Sales Scenario
Import the necessary libraries and create our retail forecasting scenario:
import pandas as pd
import numpy as np
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, AutoETS, SeasonalNaive
import matplotlib.pyplot as plt
We will create a controlled retail dataset with realistic business patterns:
5 product categories: electronics, clothing, home & garden, sports, books
4-year timespan: 2020-2023 monthly sales data (240 observations)
Realistic patterns: trend growth, seasonal cycles, and category-specific scaling factors
View the code to create the dataset in the source code.
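For readers who want a self-contained approximation, here is a hypothetical sketch of how such a dataset could be generated. The coefficients, noise levels, and base sales figures below are illustrative assumptions, not the tutorial's actual values:

```python
import numpy as np
import pandas as pd

# Illustrative base monthly sales per category (hypothetical values)
categories = {"electronics": 70_000, "clothing": 45_000, "home_garden": 35_000,
              "sports": 30_000, "books": 50_000}
dates = pd.date_range("2020-01-01", "2023-12-01", freq="MS")  # 48 monthly starts

rng = np.random.default_rng(0)
frames = []
for name, base in categories.items():
    t = np.arange(len(dates))
    trend = base * (1 + 0.005 * t)                  # mild linear growth
    season = 1 + 0.15 * np.sin(2 * np.pi * t / 12)  # annual cycle
    noise = rng.normal(1, 0.05, len(dates))         # multiplicative noise
    frames.append(pd.DataFrame({"unique_id": name, "ds": dates,
                                "y": trend * season * noise}))

retail_df = pd.concat(frames, ignore_index=True)
print(retail_df.shape)  # (240, 3) -- 5 categories x 48 months
```

Note the long-format layout with `unique_id`, `ds`, and `y` columns — this is the schema StatsForecast expects.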
Here is a sample of the dataset:

| unique_id | ds | y |
| --- | --- | --- |
| electronics | 2020-01-01 | 62980.284918 |
| electronics | 2020-02-01 | 65936.371640 |
| electronics | 2020-03-01 | 75810.350968 |
| electronics | 2020-04-01 | 83436.051479 |
| electronics | 2020-05-01 | 72051.214384 |
AutoARIMA for Automatic Model Selection
Traditional ARIMA modeling requires extensive manual parameter testing. For example, the nested loops below test every ARIMA parameter combination:
Six nested loops generate all possible (p,d,q)(P,D,Q) combinations
Each iteration instantiates a new ARIMA object
AIC comparison tracks the lowest information criterion score
The exhaustive parameter search fits 144 models per series (3×2×3×2×2×2 combinations). This time-intensive process doesn't scale to multiple series.
from statsmodels.tsa.arima.model import ARIMA

def manual_arima_approach(data, max_p=2, max_q=2, max_d=1, max_P=1, max_Q=1):
    """Conceptual overview of manual ARIMA selection."""
    best_aic = float('inf')
    best_params = None
    # Test all parameter combinations manually with nested loops
    for p in range(max_p + 1):
        for d in range(max_d + 1):
            for q in range(max_q + 1):
                for P in range(max_P + 1):
                    for D in range(2):
                        for Q in range(max_Q + 1):
                            model = ARIMA(
                                data,
                                order=(p, d, q),
                                seasonal_order=(P, D, Q, 12),
                            )
                            aic = model.fit().aic
                            if aic < best_aic:
                                best_aic = aic
                                best_params = (p, d, q, P, D, Q)
    return best_params, best_aic
Manual ARIMA parameter selection for electronics category:
Testing parameter combinations manually...
==================================================
ARIMA(0,0,0)(0,0,0)[12]: AIC = 1045.73
ARIMA(0,0,0)(0,0,1)[12]: AIC = 1034.88
ARIMA(0,0,0)(0,1,0)[12]: AIC = 777.79
ARIMA(0,0,0)(0,1,1)[12]: AIC = 787.59
ARIMA(0,0,0)(1,0,0)[12]: AIC = 1037.25
... (tested 144 combinations)
Best: ARIMA(0, 1, 0)x(0, 1, 1)[12], AIC = 740.28
The manual approach shows rapid AIC improvements from 1045.73 to 777.79 in early iterations, but requires testing all 144 combinations to ensure the optimal model isn't missed.
AutoARIMA transforms weeks of manual parameter tuning into seconds of automatic optimization. A single fit() call replaces the nested loops with an optimized search algorithm.
# AutoARIMA with constrained search space for fair comparison with manual approach
sf_fair = StatsForecast(
    models=[
        AutoARIMA(
            season_length=12,   # Annual seasonality
            max_p=2, max_q=2,   # Match manual search limits
            max_P=1, max_Q=1,   # Match manual seasonal limits
            seasonal=True       # Enable seasonal components
        )
    ],
    freq='MS'
)
Let's compare the performance of the manual approach against AutoARIMA using the same constrained search space.
import time

# manual_execution_time was recorded when timing manual_arima_approach above
electronics_series = retail_df[retail_df["unique_id"] == "electronics"]

start_time = time.time()
sf_fair.fit(electronics_series)
fair_execution_time = time.time() - start_time

print(f"Manual approach (statsmodels): {manual_execution_time:.2f}s for 144 combinations")
print(f"AutoARIMA (statsforecast): {fair_execution_time:.2f}s for 144 combinations")
print(f"Algorithm efficiency gain: {manual_execution_time/fair_execution_time:.1f}x faster")
Manual approach (statsmodels): 7.02s for 144 combinations
AutoARIMA (statsforecast): 0.36s for 144 combinations
Algorithm efficiency gain: 19.5x faster
AutoARIMA achieves 19.5x speed improvements over the manual approach.
Multiple Model Comparison and Ensemble
Compare AutoARIMA with other automatic methods to validate performance:
# Configure multiple automatic models for comparison
sf_comparison = StatsForecast(
    models=[
        AutoARIMA(season_length=12, stepwise=True),
        AutoETS(season_length=12),       # Exponential smoothing
        SeasonalNaive(season_length=12)  # Simple seasonal baseline
    ],
    freq='MS'
)
Split the data into training and testing sets:
# Split data for training and testing (use 80% for training)
split_date = retail_df["ds"].quantile(0.8)
train_data = retail_df[retail_df["ds"] <= split_date]
test_data = retail_df[retail_df["ds"] > split_date]
print(f"Training period: {train_data['ds'].min()} to {train_data['ds'].max()}")
print(f"Test period: {test_data['ds'].min()} to {test_data['ds'].max()}")
Fit all models on the training data:
# Fit all models simultaneously
sf_comparison.fit(train_data)
print("All models fitted for comparison")
Training period: 2020-01-01 00:00:00 to 2023-03-01 00:00:00
Test period: 2023-04-01 00:00:00 to 2023-12-01 00:00:00
All models fitted for comparison
Generate forecasts from all models:
# Generate forecasts for comparison
forecasts = sf_comparison.predict(h=7) # 7-month ahead forecasts
print(f"Forecast shape: {forecasts.shape}")
forecasts.head()
Forecast shape: (35, 5)
| unique_id | ds | AutoARIMA | AutoETS | SeasonalNaive |
| --- | --- | --- | --- | --- |
| books | 2023-04-01 | 60523.191642 | 55534.897267 | 54350.389275 |
| books | 2023-05-01 | 61275.648592 | 55534.897267 | 61674.121108 |
| books | 2023-06-01 | 59179.501052 | 55534.897267 | 44922.018634 |
| books | 2023-07-01 | 55284.722061 | 55534.897267 | 50806.165985 |
| books | 2023-08-01 | 48753.993447 | 55534.897267 | 39706.558281 |
Prediction Intervals and Uncertainty Quantification
Prediction intervals quantify forecast uncertainty by providing upper and lower bounds around point predictions. They answer critical business questions: "What's the worst-case scenario for inventory planning?" and "How confident should we be in these sales projections?"
Unlike point forecasts, which give single values, prediction intervals capture the inherent uncertainty in future outcomes. A 95% prediction interval means that if you repeated the forecasting process 100 times, roughly 95 of those intervals would contain the actual future value.
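To make this coverage interpretation concrete, here is a toy simulation assuming normally distributed future values. This is an illustration of what "95% coverage" means, not how StatsForecast computes its intervals:

```python
import numpy as np

rng = np.random.default_rng(42)

# Suppose the future value is truly distributed N(mu, sigma).
# A 95% interval under that assumption is mu +/- 1.96 * sigma.
mu, sigma, n_trials = 100.0, 10.0, 10_000
lo, hi = mu - 1.96 * sigma, mu + 1.96 * sigma

# Draw many "futures" and check how often the interval contains them
actuals = rng.normal(mu, sigma, n_trials)
coverage = np.mean((actuals >= lo) & (actuals <= hi))
print(f"Empirical coverage: {coverage:.3f}")
```

Running this, the empirical coverage comes out near 0.95, matching the nominal level.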
AutoARIMA generates prediction intervals automatically through its predict() method. To generate prediction intervals, specify confidence levels using the level parameter (e.g., level=[80, 95]).
First, create and configure the AutoARIMA model for generating prediction intervals:
# Create AutoARIMA model for prediction intervals
sf_auto = StatsForecast(
    models=[AutoARIMA(season_length=12, stepwise=True)],
    freq='MS'
)
Next, fit the model on training data and generate forecasts with multiple confidence levels:
# Fit on training data and generate forecasts with multiple confidence levels
sf_auto.fit(train_data)
forecasts_with_intervals = sf_auto.predict(
    h=12,                    # 12-month forecast horizon
    level=[50, 80, 90, 95]   # Multiple confidence levels
)
print(f"Forecast columns: {forecasts_with_intervals.columns.tolist()}")
Forecast columns: ['unique_id', 'ds', 'AutoARIMA', 'AutoARIMA-lo-95', 'AutoARIMA-lo-90', 'AutoARIMA-lo-80', 'AutoARIMA-lo-50', 'AutoARIMA-hi-50', 'AutoARIMA-hi-80', 'AutoARIMA-hi-90', 'AutoARIMA-hi-95']
Finally, examine the forecast results with confidence intervals for the electronics category:
sample_forecast = forecasts_with_intervals[
    forecasts_with_intervals["unique_id"] == "electronics"
].head()
sample_forecast[["ds", "AutoARIMA", "AutoARIMA-lo-95", "AutoARIMA-hi-95"]]
| ds | AutoARIMA | AutoARIMA-lo-95 | AutoARIMA-hi-95 |
| --- | --- | --- | --- |
| 2023-04-01 | 99350.422989 | 87049.266213 | 111651.579766 |
| 2023-05-01 | 94747.079991 | 82445.923214 | 107048.236767 |
| 2023-06-01 | 97033.851200 | 84732.694424 | 109335.007977 |
| 2023-07-01 | 86302.421665 | 74001.264889 | 98603.578442 |
| 2023-08-01 | 83997.249910 | 71696.093133 | 96298.406686 |
Visualize predictions with uncertainty bands:
# Visualize forecasts with confidence intervals
def plot_forecasts_with_intervals(category_name):
    # Get data
    historical = train_data[train_data["unique_id"] == category_name].tail(24)
    forecast = forecasts_with_intervals[forecasts_with_intervals["unique_id"] == category_name]

    # Create plot
    plt.figure(figsize=(12, 6))

    # Plot data
    plt.plot(historical["ds"], historical["y"], label="Historical Sales")
    plt.plot(forecast["ds"], forecast["AutoARIMA"], label="AutoARIMA Forecast")
    plt.fill_between(forecast["ds"], forecast["AutoARIMA-lo-95"], forecast["AutoARIMA-hi-95"],
                     alpha=0.2, label="95% Confidence")
    plt.fill_between(forecast["ds"], forecast["AutoARIMA-lo-80"], forecast["AutoARIMA-hi-80"],
                     alpha=0.3, label="80% Confidence")

    # Labels and legend
    plt.title(f"{category_name.title()} Sales Forecast with Confidence Intervals")
    plt.ylabel("Sales ($)")
    plt.xlabel("Date")
    plt.legend()
    plt.show()
# Plot forecasts for electronics category
plot_forecasts_with_intervals("electronics")
Electronics Sales Forecast with Confidence Intervals
The electronics forecast captures seasonal patterns with confidence intervals that widen over time, reflecting increasing uncertainty in longer-term predictions. AutoARIMA successfully identifies the cyclical sales patterns while providing realistic uncertainty bounds for inventory planning decisions.
Cross-Validation for Model Evaluation
Time series cross-validation tests model performance by training on historical data and validating on future periods. This approach respects temporal order by using multiple validation windows that simulate how models perform as new data arrives.
AutoARIMA integrates cross-validation through the cross_validation() method. This automatically handles the complex temporal splitting required for reliable time series validation.
# Perform time series cross-validation
cv_results = sf_auto.cross_validation(
    df=train_data,
    h=6,           # 6-month forecast horizon
    step_size=3,   # Move validation window by 3 months
    n_windows=4    # Use 4 validation windows
)
print(f"Cross-validation results shape: {cv_results.shape}")
cv_results.head()
Cross-validation results shape: (120, 5)
| unique_id | ds | cutoff | y | AutoARIMA |
| --- | --- | --- | --- | --- |
| books | 2022-01-01 | 2021-12-01 | 43018.516004 | 43226.769208 |
| books | 2022-02-01 | 2021-12-01 | 48841.347642 | 42457.344848 |
| books | 2022-03-01 | 2021-12-01 | 50980.426686 | 41966.543477 |
| books | 2022-04-01 | 2021-12-01 | 54350.389275 | 41653.470486 |
| books | 2022-05-01 | 2021-12-01 | 61674.121108 | 41453.767096 |
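The cutoff column marks the last training date of each validation window. A quick sketch shows how these cutoffs follow from h, step_size, and n_windows — assuming the windows count back from the end of the training data (here 2023-03-01):

```python
import pandas as pd

train_end = pd.Timestamp("2023-03-01")
h, step_size, n_windows = 6, 3, 4

# The last window's cutoff leaves exactly h months for validation;
# each earlier window shifts back by step_size months.
cutoffs = [train_end - pd.DateOffset(months=h + step_size * i)
           for i in reversed(range(n_windows))]
print([c.strftime("%Y-%m-%d") for c in cutoffs])
# ['2021-12-01', '2022-03-01', '2022-06-01', '2022-09-01']
```

The earliest cutoff, 2021-12-01, matches the cutoff column shown above, and 4 windows × 6 months × 5 series gives the 120 rows reported earlier.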
Calculate MAPE (Mean Absolute Percentage Error) for each product category to measure forecasting accuracy:
# Calculate mean absolute percentage error (MAPE) for each category
def calculate_mape(actual, predicted):
    return np.mean(np.abs((actual - predicted) / actual)) * 100

cv_performance = cv_results.groupby("unique_id").apply(
    lambda x: calculate_mape(x["y"], x["AutoARIMA"]),
    include_groups=False
).reset_index(name="MAPE")
print("AutoARIMA Cross-Validation Performance (MAPE):")
print(cv_performance.sort_values("MAPE"))
AutoARIMA Cross-Validation Performance (MAPE):
unique_id MAPE
4 sports 9.252036
2 electronics 11.865445
3 home_garden 12.132121
1 clothing 13.172707
0 books 14.270602
The MAPE values show consistent performance across categories, with errors under 15% demonstrating reliable forecasting capability.
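As a quick sanity check of the MAPE definition used above, a tiny worked example with hypothetical numbers:

```python
import numpy as np

def calculate_mape(actual, predicted):
    return np.mean(np.abs((actual - predicted) / actual)) * 100

# Two forecasts off by 10% and 20% respectively average to 15% MAPE
actual = np.array([100.0, 200.0])
predicted = np.array([110.0, 160.0])
print(round(calculate_mape(actual, predicted), 2))  # 15.0
```

Because each error is scaled by the actual value, MAPE is comparable across categories with very different sales volumes — which is why it works for ranking the five product categories above.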
Complete Retail Forecasting Workflow
Let's put it all together in a complete retail forecasting pipeline, including:
Split data for validation
Configure AutoARIMA with essential parameters
Generate forecasts with confidence intervals
Validate performance on holdout data
# Split data for validation
split_date = retail_df["ds"].quantile(0.85) # Use 85% for training
train_data = retail_df[retail_df["ds"] <= split_date]
test_data = retail_df[retail_df["ds"] > split_date]
print(f"Training data: {len(train_data)} observations")
print(f"Training period: {train_data['ds'].min()} to {train_data['ds'].max()}")
Training data: 205 observations
Training period: 2020-01-01 00:00:00 to 2023-05-01 00:00:00
Configure the AutoARIMA model:
# Configure AutoARIMA with essential parameters
sf_production = StatsForecast(
    models=[
        AutoARIMA(
            season_length=12,  # Annual seasonality
            stepwise=True,     # Efficient search
            seasonal=True,     # Enable seasonal detection
        )
    ],
    freq='MS'
)
# Fit model on training data
sf_production.fit(train_data)
Generate forecasts with confidence intervals:
# Generate forecasts with confidence intervals
forecasts = sf_production.predict(
    h=12,            # 12-month forecast horizon
    level=[80, 95]   # Confidence levels
)
print(f"Final forecasts shape: {forecasts.shape}")
Final forecasts shape: (60, 7)
Validate model performance on holdout data:
# Generate predictions for validation period
validation_horizon = len(test_data["ds"].unique())
validation_forecasts = sf_production.predict(h=validation_horizon)

# Calculate validation metrics
validation_mape = calculate_mape(
    test_data["y"].values,
    validation_forecasts["AutoARIMA"].values
)
print(f"Validation MAPE: {validation_mape:.2f}%")
Validation MAPE: 22.42%
The 22.42% MAPE demonstrates reasonable predictive accuracy across product categories.
Conclusion
AutoARIMA transforms weeks of manual parameter testing into a single, fast function call. This simple automation delivers reliable forecasts in minutes instead of weeks.
This time savings redirects analyst expertise toward business insights and strategic planning. Teams focus on interpreting forecasts and making data-driven decisions rather than manual model tuning.
