Scale Time Series Forecasting: Replace Manual Parameter Testing with AutoARIMA

How much time does your team waste on parameter tuning? Manual ARIMA modeling requires weeks of statistical testing and expertise, and the effort doesn't scale beyond a single series.

AutoARIMA eliminates manual parameter testing through automated optimization. It handles parameter selection, seasonality detection, and uncertainty quantification automatically, completing forecasts for dozens of series in minutes, not weeks.

Introduction to StatsForecast

StatsForecast provides fast classical statistical models with automatic parameter selection. The library offers:

  • Automatic model selection: AutoARIMA, AutoETS, AutoTheta with intelligent parameter optimization
  • Scalable processing: Handle millions of time series with distributed computing and optimized algorithms
  • Unified interface: Consistent API across different forecasting methods
  • Built-in validation: Time series cross-validation for reliable model evaluation
  • Prediction intervals: Confidence intervals for uncertainty quantification

Install StatsForecast with a single command:

pip install statsforecast

For this tutorial, install additional dependencies:

pip install pandas numpy matplotlib

The source code for this tutorial is available on GitHub.

Setup - Retail Sales Scenario

Import the necessary libraries and create our retail forecasting scenario:

import pandas as pd
import numpy as np
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, AutoETS, SeasonalNaive
import matplotlib.pyplot as plt

We will create a controlled retail dataset with realistic business patterns:

  • 5 product categories: electronics, clothing, home & garden, sports, books
  • 4-year timespan: 2020-2023 monthly sales data (240 observations)
  • Realistic patterns: trend growth, seasonal cycles, and category-specific scaling factors

View the code to create the dataset in the source code.
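The full generator lives in the linked source code; the sketch below is a minimal stand-in with assumed base levels, growth rate, and noise scale, producing a dataset with the same shape and columns:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Assumed base monthly sales per category (the real values are in the repo)
categories = {
    "electronics": 70_000, "clothing": 50_000, "home_garden": 45_000,
    "sports": 30_000, "books": 25_000,
}

dates = pd.date_range("2020-01-01", "2023-12-01", freq="MS")  # 48 months
frames = []
for name, base in categories.items():
    t = np.arange(len(dates))
    trend = base * (1 + 0.005 * t)                        # mild growth
    seasonal = base * 0.15 * np.sin(2 * np.pi * t / 12)   # annual cycle
    noise = rng.normal(0, base * 0.05, len(dates))        # category-scaled noise
    frames.append(pd.DataFrame({"unique_id": name, "ds": dates,
                                "y": trend + seasonal + noise}))

retail_df = pd.concat(frames, ignore_index=True)  # 5 series x 48 months = 240 rows
```

The long format (`unique_id`, `ds`, `y`) is the layout StatsForecast expects.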

Here is a sample of the dataset:

unique_id ds y
electronics 2020-01-01 62980.284918
electronics 2020-02-01 65936.371640
electronics 2020-03-01 75810.350968
electronics 2020-04-01 83436.051479
electronics 2020-05-01 72051.214384

AutoARIMA for Automatic Model Selection

Traditional ARIMA modeling requires extensive manual parameter testing. For example, the nested loops below test every ARIMA parameter combination:

  • Six nested loops generate all possible (p,d,q)(P,D,Q) combinations
  • Each iteration instantiates a new ARIMA object
  • AIC comparison tracks the lowest information criterion score

The exhaustive parameter search fits 144 models per series (3 × 2 × 3 × 2 × 2 × 2 combinations of p, d, q, P, D, Q). This time-intensive process doesn't scale to multiple series.

from statsmodels.tsa.arima.model import ARIMA

def manual_arima_approach(data, max_p=2, max_q=2, max_d=1, max_P=1, max_Q=1):
    """Conceptual overview of manual ARIMA selection"""
    best_aic = float('inf')
    best_params = None

    # Test all parameter combinations manually with nested loops
    for p in range(max_p + 1):
        for d in range(max_d + 1):
            for q in range(max_q + 1):
                for P in range(max_P + 1):
                    for D in range(2):
                        for Q in range(max_Q + 1):
                            model = ARIMA(
                                data,
                                order=(p, d, q),
                                seasonal_order=(P, D, Q, 12),
                            )
                            try:
                                aic = model.fit().aic
                            except Exception:
                                continue  # skip combinations that fail to converge

                            if aic < best_aic:
                                best_aic = aic
                                best_params = (p, d, q, P, D, Q)

    return best_params, best_aic

Manual ARIMA parameter selection for electronics category:
Testing parameter combinations manually...
==================================================
ARIMA(0,0,0)(0,0,0)[12]: AIC = 1045.73
ARIMA(0,0,0)(0,0,1)[12]: AIC = 1034.88
ARIMA(0,0,0)(0,1,0)[12]: AIC = 777.79
ARIMA(0,0,0)(0,1,1)[12]: AIC = 787.59
ARIMA(0,0,0)(1,0,0)[12]: AIC = 1037.25
... (tested 144 combinations)
Best: ARIMA(0, 1, 0)x(0, 1, 1)[12], AIC = 740.28

The manual approach shows rapid AIC improvements from 1045.73 to 777.79 in early iterations, but requires testing all 144 combinations to ensure the optimal model isn't missed.

AutoARIMA replaces weeks of manual parameter tuning with seconds of automatic optimization: a single fit() call takes the place of the nested loops above.

# AutoARIMA with constrained search space for fair comparison with manual approach
sf_fair = StatsForecast(
    models=[
        AutoARIMA(
            season_length=12,      # Annual seasonality
            max_p=2, max_q=2,      # Match manual search limits
            max_P=1, max_Q=1,      # Match manual seasonal limits
            seasonal=True          # Enable seasonal components
        )
    ],
    freq='MS'
)

Let's compare the runtime of the manual statsmodels search against AutoARIMA on the same constrained search space.

import time

electronics_series = retail_df[retail_df["unique_id"] == "electronics"]

start_time = time.time()
sf_fair.fit(electronics_series)
fair_execution_time = time.time() - start_time

# manual_execution_time was captured the same way around manual_arima_approach()
print(f"Manual approach (statsmodels): {manual_execution_time:.2f}s for 144 combinations")
print(f"AutoARIMA (statsforecast):   {fair_execution_time:.2f}s for 144 combinations")
print(f"Algorithm efficiency gain:   {manual_execution_time/fair_execution_time:.1f}x faster")
Manual approach (statsmodels): 7.02s for 144 combinations
AutoARIMA (statsforecast):   0.36s for 144 combinations
Algorithm efficiency gain:   19.5x faster

AutoARIMA completes the same search 19.5x faster than the manual approach.

Multiple Model Comparison and Ensemble

Compare AutoARIMA with other automatic methods to validate performance:

# Configure multiple automatic models for comparison
sf_comparison = StatsForecast(
    models=[
        AutoARIMA(season_length=12, stepwise=True),
        AutoETS(season_length=12),  # Exponential smoothing
        SeasonalNaive(season_length=12)  # Simple seasonal baseline
    ],
    freq='MS'
)

Split the data into training and testing sets:

# Split data for training and testing (use 80% for training)
split_date = retail_df["ds"].quantile(0.8)
train_data = retail_df[retail_df["ds"] <= split_date]
test_data = retail_df[retail_df["ds"] > split_date]

print(f"Training period: {train_data['ds'].min()} to {train_data['ds'].max()}")
print(f"Test period: {test_data['ds'].min()} to {test_data['ds'].max()}")

Fit all models on the training data:

# Fit all models simultaneously
sf_comparison.fit(train_data)
print("All models fitted for comparison")
Training period: 2020-01-01 00:00:00 to 2023-03-01 00:00:00
Test period: 2023-04-01 00:00:00 to 2023-12-01 00:00:00
All models fitted for comparison

Generate forecasts from all models:

# Generate forecasts for comparison
forecasts = sf_comparison.predict(h=7)  # 7-month ahead forecasts
print(f"Forecast shape: {forecasts.shape}")
forecasts.head()
Forecast shape: (35, 5)
unique_id ds AutoARIMA AutoETS SeasonalNaive
books 2023-04-01 60523.191642 55534.897267 54350.389275
books 2023-05-01 61275.648592 55534.897267 61674.121108
books 2023-06-01 59179.501052 55534.897267 44922.018634
books 2023-07-01 55284.722061 55534.897267 50806.165985
books 2023-08-01 48753.993447 55534.897267 39706.558281
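The ensemble step this section's title promises isn't shown above; one common approach, sketched here with the sample values from the forecast table, is an equal-weight average of the point forecasts:

```python
import pandas as pd

# Stand-in for the frame returned by sf_comparison.predict(h=7) above
forecasts = pd.DataFrame({
    "unique_id": ["books", "books"],
    "ds": pd.to_datetime(["2023-04-01", "2023-05-01"]),
    "AutoARIMA": [60523.19, 61275.65],
    "AutoETS": [55534.90, 55534.90],
    "SeasonalNaive": [54350.39, 61674.12],
})

# Equal-weight ensemble: average the point forecasts across models
model_cols = ["AutoARIMA", "AutoETS", "SeasonalNaive"]
forecasts["Ensemble"] = forecasts[model_cols].mean(axis=1)
```

Simple averages like this often beat any single model; weighted averages based on cross-validation error are a natural next step.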

Prediction Intervals and Uncertainty Quantification

Prediction intervals quantify forecast uncertainty by providing upper and lower bounds around point predictions. They answer critical business questions: "What's the worst-case scenario for inventory planning?" and "How confident should we be in these sales projections?"

Unlike point forecasts that give single values, prediction intervals capture the inherent uncertainty in future outcomes. A 95% prediction interval means that, if you repeated the forecasting process many times, about 95% of those intervals would contain the actual future value.
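This coverage interpretation can be sanity-checked numerically. A toy simulation, under an assumed normal error model (not tied to any StatsForecast API):

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 10_000
true_value = 0.0
sigma = 1.0
z95 = 1.959964  # two-sided 95% normal quantile

# Each trial: a point forecast with normal error, plus a 95% interval around it
point_forecasts = rng.normal(true_value, sigma, n_trials)
lo = point_forecasts - z95 * sigma
hi = point_forecasts + z95 * sigma

# Fraction of intervals that contain the true value should be close to 0.95
coverage = np.mean((lo <= true_value) & (true_value <= hi))
print(f"{coverage:.3f}")
```

The empirical coverage lands near 0.95, matching the nominal level.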

AutoARIMA generates prediction intervals automatically through its predict() method. To generate prediction intervals, specify confidence levels using the level parameter (e.g., level=[80, 95]).

First, create and configure the AutoARIMA model for generating prediction intervals:

# Create AutoARIMA model for prediction intervals
sf_auto = StatsForecast(
    models=[AutoARIMA(season_length=12, stepwise=True)],
    freq='MS'
)

Next, fit the model on training data and generate forecasts with multiple confidence levels:

# Fit on training data and generate forecasts with multiple confidence levels
sf_auto.fit(train_data)
forecasts_with_intervals = sf_auto.predict(
    h=12,                    # 12-month forecast horizon
    level=[50, 80, 90, 95]  # Multiple confidence levels
)

print(f"Forecast columns: {forecasts_with_intervals.columns.tolist()}")
Forecast columns: ['unique_id', 'ds', 'AutoARIMA', 'AutoARIMA-lo-95', 'AutoARIMA-lo-90', 'AutoARIMA-lo-80', 'AutoARIMA-lo-50', 'AutoARIMA-hi-50', 'AutoARIMA-hi-80', 'AutoARIMA-hi-90', 'AutoARIMA-hi-95']

Finally, examine the forecast results with confidence intervals for the electronics category:

sample_forecast = forecasts_with_intervals[
    forecasts_with_intervals["unique_id"] == "electronics"
].head()

sample_forecast[["ds", "AutoARIMA", "AutoARIMA-lo-95", "AutoARIMA-hi-95"]]
ds AutoARIMA AutoARIMA-lo-95 AutoARIMA-hi-95
2023-04-01 99350.422989 87049.266213 111651.579766
2023-05-01 94747.079991 82445.923214 107048.236767
2023-06-01 97033.851200 84732.694424 109335.007977
2023-07-01 86302.421665 74001.264889 98603.578442
2023-08-01 83997.249910 71696.093133 96298.406686

Visualize predictions with uncertainty bands:

# Visualize forecasts with confidence intervals
def plot_forecasts_with_intervals(category_name):
    # Get data
    historical = train_data[train_data["unique_id"] == category_name].tail(24)
    forecast = forecasts_with_intervals[forecasts_with_intervals["unique_id"] == category_name]
    
    # Create plot
    plt.figure(figsize=(12, 6))
    
    # Plot data
    plt.plot(historical["ds"], historical["y"], label="Historical Sales")
    plt.plot(forecast["ds"], forecast["AutoARIMA"], label="AutoARIMA Forecast")
    plt.fill_between(forecast["ds"], forecast["AutoARIMA-lo-95"], forecast["AutoARIMA-hi-95"], 
                     alpha=0.2, label="95% Prediction Interval")
    plt.fill_between(forecast["ds"], forecast["AutoARIMA-lo-80"], forecast["AutoARIMA-hi-80"], 
                     alpha=0.3, label="80% Prediction Interval")
    
    # Labels and legend
    plt.title(f"{category_name.title()} Sales Forecast with Confidence Intervals")
    plt.ylabel("Sales ($)")
    plt.xlabel("Date")
    plt.legend()
    plt.show()

# Plot forecasts for electronics category
plot_forecasts_with_intervals("electronics")

Figure: Electronics sales forecast with 80% and 95% prediction intervals.

The electronics forecast captures seasonal patterns, with prediction intervals that widen over the horizon to reflect increasing uncertainty in longer-term predictions. AutoARIMA identifies the cyclical sales pattern while providing realistic uncertainty bounds for inventory-planning decisions.

Cross-Validation for Model Evaluation

Time series cross-validation tests model performance by training on historical data and validating on future periods. This approach respects temporal order by using multiple validation windows that simulate how models perform as new data arrives.

AutoARIMA integrates cross-validation through the cross_validation() method. This automatically handles the complex temporal splitting required for reliable time series validation.

# Perform time series cross-validation
cv_results = sf_auto.cross_validation(
    df=train_data,
    h=6,           # 6-month forecast horizon
    step_size=3,   # Move validation window by 3 months
    n_windows=4    # Use 4 validation windows
)

print(f"Cross-validation results shape: {cv_results.shape}")
cv_results.head()
Cross-validation results shape: (120, 5)
unique_id ds cutoff y AutoARIMA
books 2022-01-01 2021-12-01 43018.516004 43226.769208
books 2022-02-01 2021-12-01 48841.347642 42457.344848
books 2022-03-01 2021-12-01 50980.426686 41966.543477
books 2022-04-01 2021-12-01 54350.389275 41653.470486
books 2022-05-01 2021-12-01 61674.121108 41453.767096

Calculate MAPE (Mean Absolute Percentage Error) for each product category to measure forecasting accuracy:

# Calculate mean absolute percentage error (MAPE) for each category
def calculate_mape(actual, predicted):
    return np.mean(np.abs((actual - predicted) / actual)) * 100

cv_performance = cv_results.groupby("unique_id").apply(
    lambda x: calculate_mape(x["y"], x["AutoARIMA"]),
    include_groups=False
).reset_index(name="MAPE")

print("AutoARIMA Cross-Validation Performance (MAPE):")
print(cv_performance.sort_values("MAPE"))
AutoARIMA Cross-Validation Performance (MAPE):
     unique_id       MAPE
4       sports   9.252036
2  electronics  11.865445
3  home_garden  12.132121
1     clothing  13.172707
0        books  14.270602

The MAPE values show consistent performance across categories, with errors under 15% demonstrating reliable forecasting capability.

Complete Retail Forecasting Workflow

Let's put it all together in a complete retail forecasting pipeline, including:

  1. Split data for validation
  2. Configure AutoARIMA with essential parameters
  3. Generate forecasts with confidence intervals
  4. Validate performance on holdout data

# Split data for validation
split_date = retail_df["ds"].quantile(0.85)  # Use 85% for training
train_data = retail_df[retail_df["ds"] <= split_date]
test_data = retail_df[retail_df["ds"] > split_date]

print(f"Training data: {len(train_data)} observations")
print(f"Training period: {train_data['ds'].min()} to {train_data['ds'].max()}")
Training data: 205 observations
Training period: 2020-01-01 00:00:00 to 2023-05-01 00:00:00

Configure the AutoARIMA model:

# Configure AutoARIMA with essential parameters
sf_production = StatsForecast(
    models=[
        AutoARIMA(
            season_length=12,  # Annual seasonality
            stepwise=True,  # Efficient search
            seasonal=True,  # Enable seasonal detection
        )
    ],
    freq='MS'
)

# Fit model on training data
sf_production.fit(train_data)

Generate forecasts with confidence intervals:

# Generate forecasts with confidence intervals
forecasts = sf_production.predict(
    h=12,                    # 12-month forecast horizon
    level=[80, 95]          # Confidence levels
)

print(f"Final forecasts shape: {forecasts.shape}")
Final forecasts shape: (60, 7)

Validate model performance on holdout data:

# Generate predictions for validation period
validation_horizon = len(test_data["ds"].unique())
validation_forecasts = sf_production.predict(h=validation_horizon)

# Calculate validation metrics
validation_mape = calculate_mape(
    test_data["y"].values,
    validation_forecasts["AutoARIMA"].values
)
print(f"Validation MAPE: {validation_mape:.2f}%")
Validation MAPE: 22.42%

The 22.42% MAPE on the untouched holdout is higher than the cross-validation errors, as expected for a strict out-of-sample test, but still indicates reasonable predictive accuracy across product categories.
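Note that the `.values` comparison above assumes the test data and the forecasts share exactly the same row order. Merging on (unique_id, ds) makes the alignment explicit; a sketch (the helper name `mape_aligned` and the demo values are hypothetical):

```python
import numpy as np
import pandas as pd

def mape_aligned(actuals: pd.DataFrame, forecasts: pd.DataFrame,
                 model_col: str = "AutoARIMA") -> float:
    """MAPE computed after aligning actuals and forecasts on (unique_id, ds)."""
    merged = actuals.merge(forecasts, on=["unique_id", "ds"], how="inner")
    return float(np.mean(np.abs((merged["y"] - merged[model_col]) / merged["y"])) * 100)

# Tiny demo: the forecast rows are deliberately shuffled relative to the actuals
actuals = pd.DataFrame({"unique_id": ["books", "books"],
                        "ds": pd.to_datetime(["2023-06-01", "2023-07-01"]),
                        "y": [100.0, 200.0]})
preds = pd.DataFrame({"unique_id": ["books", "books"],
                      "ds": pd.to_datetime(["2023-07-01", "2023-06-01"]),
                      "AutoARIMA": [220.0, 90.0]})
print(mape_aligned(actuals, preds))  # 10.0 despite the shuffled order
```

The merge guards against silent misalignment when series or dates are reordered upstream.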

Conclusion

AutoARIMA transforms weeks of manual parameter testing into a single, fast function call, delivering reliable forecasts in minutes instead of weeks.

This time savings redirects analyst expertise toward business insights and strategic planning. Teams focus on interpreting forecasts and making data-driven decisions rather than manual model tuning.