Effortless Accuracy: Unlocking the Power of Baseline Forecasts
So, you are working on a forecasting project. The data has been set up and the required analysis has been done. Now we jump straight into getting the best forecasts, right?
Not so fast. You might be missing a crucial step. Setting up Baseline Forecasts!
A baseline forecast provides a critical point of comparison, serving as a reference for all other modeling techniques applied to a specific problem. It helps answer questions like:
- How much accuracy can be achieved with little effort? (or) How predictable is this data?
- How good or bad is the sophisticated model you are working on, compared to the baseline?
- Is the improvement in accuracy using a sophisticated model, compared to the baseline forecast, worth the effort?
So, what are Baseline forecasts? They are usually characterised by:
- Simplicity – Requires minimal training or specialized intelligence.
- Speed – Quick to implement and computationally trivial for prediction.
Baseline Forecasting Methods
Commonly used trivial baseline forecasting methods include:
- Mean Forecast
- Naive Forecast
- Seasonal Naive Forecast
- Rolling Averages
This is by no means an exhaustive list, but it is a fair start.
Mean Forecast
This is simply the mean of all the past observations. This works for time series that are stationary and don't have trend or seasonality.
$$\hat{y}_{T+h|T} = \bar{y} = \frac{y_1 + y_2 + \dots + y_T}{T}$$

where:
- $\hat{y}_{T+h|T}$ represents the forecast for time period $T+h$ when we have observations until time period $T$.
- $\bar{y}$ is the mean of past observations.
- $y_1, y_2, \dots, y_T$ are the past observations.
Example:
Suppose we have monthly sales data for the last 5 months:
[120, 130, 125, 135, 140]
The mean of past observations is:

$$\bar{y} = \frac{120 + 130 + 125 + 135 + 140}{5} = 130$$

So the forecast for any future month (e.g., month 6, 7, 8, ...) will be:

$$\hat{y}_{5+h|5} = 130$$
Pros: It is very simple to calculate and understand. It provides stable forecasts that do not fluctuate over the forecast horizon.
Cons: Recent and older observations are weighted equally, so more relevant recent information can be drowned out. It doesn't work well for series with trends or seasonality.
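As a quick illustration (plain numpy on the sales example above, not the `statsforecast` implementation shown later), the mean forecast can be computed like this:

```python
import numpy as np

sales = np.array([120, 130, 125, 135, 140])

# Mean forecast: the historical mean, repeated for every future period
mean_forecast = sales.mean()
print(mean_forecast)  # 130.0
```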
Naive Forecast
The Naive Forecast simply uses the latest observed value as the forecast for all future periods. This kind of baseline is best suited for data that is close to a random walk pattern.
Example:
Continuing with the 5-month sales data example ([120, 130, 125, 135, 140]), the forecast would be the latest observation. In this case the latest observation is $y_5 = 140$. So,

$$\hat{y}_{5+h|5} = y_5 = 140$$
Pros: It is super simple to implement and requires minimal data (a single data point is enough!).
Cons: It cannot account for any trend or seasonality. Patterns beyond the latest data point are not captured.
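A minimal sketch of the same idea in plain Python, using the illustrative sales list from above:

```python
sales = [120, 130, 125, 135, 140]

# Naive forecast: repeat the latest observation for every future period
naive_forecast = sales[-1]
print(naive_forecast)  # 140
```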
Seasonal Naive Forecast
What if your data set has strong seasonality? Seasonal Naive can serve as your baseline then. This is just like the naive forecast, but instead of the latest observation, we use the value observed in the same season of the previous cycle.
$$\hat{y}_{T+h|T} = y_{T+h-m(k+1)}$$

where:
- $m$ is the seasonal period: 12 for monthly data and 4 for quarterly data.
- $k = \lfloor (h-1)/m \rfloor$. This is the integer part (floor) of $(h-1)/m$. This ensures that even if we forecast for several seasons ahead, the value from the same season in the latest observed cycle is picked up.
Example:
Consider this quarterly sales data.
Quarter | Period | Sales |
---|---|---|
2024-Q1 | 1 | 80 |
2024-Q2 | 2 | 90 |
2024-Q3 | 3 | 110 |
2024-Q4 | 4 | 130 |
If $T = 4$ and we want to find the forecast for 2025-Q1, then $h = 1$ and $T + h = 5$.
We know that $m = 4$ for quarterly data.
And $k = \lfloor (1-1)/4 \rfloor = 0$.
Now,

$$\hat{y}_{5|4} = y_{5 - 4(0+1)} = y_1 = 80$$

So the forecast for 2025-Q1 is 80, the value observed in 2024-Q1.
Pros: Seasonality is captured with minimal effort.
Cons: It does not account for recent observations, trend, or other effects. Multiple seasonalities are not captured. If seasonality is weak or other signals dominate, performance will degrade.
If you are interested in learning more, read our tutorial on multiple seasonalities.
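Here is a small, illustrative Python sketch of the seasonal naive index arithmetic on the quarterly example above (list indices are 0-based, hence the `- 1`):

```python
import math

sales = [80, 90, 110, 130]  # 2024-Q1 .. 2024-Q4
m = 4                       # seasonal period for quarterly data
T = len(sales)

def seasonal_naive(h):
    # y_{T + h - m*(k+1)} with k = floor((h-1)/m)
    k = math.floor((h - 1) / m)
    return sales[T + h - m * (k + 1) - 1]

print(seasonal_naive(1))  # 80: forecast for 2025-Q1
print(seasonal_naive(5))  # 80: forecast for 2026-Q1, still the latest observed Q1
```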
Rolling Averages Forecast
Rolling Averages (also called Moving Averages or Window Averages) is the mean of the last $w$ observations of a time series. The value of $w$ (referred to as the window or span) has to be decided by the forecaster.

$$\hat{y}_{T+h|T} = \frac{y_{T-w+1} + \dots + y_{T}}{w}$$

where:
- $w$ is the window or span over which averages are to be calculated.
Example:
Recall the 5-month sales data example (120, 130, 125, 135, 140). If we set the window length as $w = 4$ and we want to find the forecast at $T + 1$, then $T = 5$ and the window covers $y_2, \dots, y_5$. Thus,

$$\hat{y}_{6|5} = \frac{130 + 125 + 135 + 140}{4} = 132.5$$
Pros: This is a simple technique. We can adjust $w$ to weight either recency (low $w$) or stability (high $w$).
Cons: A moving average responds more slowly to new information, and seasonality cannot be captured.
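A minimal sketch on the same sales example, using the same window length ($w = 4$) as the `statsforecast` model configured later:

```python
sales = [120, 130, 125, 135, 140]
w = 4  # window length

# Rolling-average forecast: the mean of the last w observations
rolling_forecast = sum(sales[-w:]) / w
print(rolling_forecast)  # 132.5
```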
Which model should we use as our baseline?
Now that we've explored several baseline models, which one should you use? The table below summarizes when each model is appropriate:
Condition | Recommended Baseline |
---|---|
Time series is stationary, no trend or seasonality, values centered around a mean | Mean Forecast |
Data resembles a random walk (next value = previous + random noise) | Naive Forecast |
Clear and stable seasonal pattern, with seasonality as the dominant signal | Seasonal Naive Forecast |
Recent values are strong predictors of future values | Rolling Average |
Easy Baseline Forecasts using the `statsforecast` package
This section guides you through how to implement baseline forecasts using the StatsForecast package — a fast and scalable library for statistical time series forecasting.
Install the `statsforecast` package if you don't have it installed:
pip install statsforecast
Also, import the necessary packages:
import pandas as pd
import numpy as np
import os
from statsforecast import StatsForecast
from statsforecast.models import Naive, SeasonalNaive, HistoricAverage, WindowAverage
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import rmse
We will use a subset of the Tourism dataset (from the R `tsibble` package), limited to three regions.
Transform the `ds` column into a quarterly timestamp format to align it with the data's quarterly frequency:
df = pd.read_csv('EffortlessAccuracyUnlockingThePowerOfBaselineForecasts_3Region_tourism.csv')
df['ds'] = pd.PeriodIndex(df['ds'], freq='Q').to_timestamp()
df
unique_id | ds | y |
---|---|---|
Adelaide | 1998-01-01T00:00:00 | 658.55 |
Adelaide | 1998-04-01T00:00:00 | 449.85 |
Adelaide | 1998-07-01T00:00:00 | 592.90 |
Adelaide | 1998-10-01T00:00:00 | 524.24 |
Adelaide | 1999-01-01T00:00:00 | 548.39 |
Adelaide | 1999-04-01T00:00:00 | 568.69 |
Adelaide | 1999-07-01T00:00:00 | 538.05 |
Adelaide | 1999-10-01T00:00:00 | 562.42 |
Adelaide | 2000-01-01T00:00:00 | 646.35 |
Adelaide | 2000-04-01T00:00:00 | 562.75 |
The `df` dataframe contains three columns:
- `unique_id`: Identifies each individual time series. One model will be trained per `unique_id`.
- `ds`: The timestamp column. Ensure it is properly formatted with quarterly frequency.
- `y`: The target variable to forecast.
Split the data into training and testing sets:
test_df = df.groupby("unique_id", group_keys=False).tail(4)
train_df = df[~df.index.isin(test_df.index)]
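As an optional sanity check on the split (plain pandas, just for illustration), you can confirm that each region keeps its last four quarters for testing:

```python
# Each unique_id should contribute exactly 4 rows to the test set
print(test_df.groupby('unique_id').size())

# The training data should end right before the test data begins
print(train_df['ds'].max(), test_df['ds'].min())
```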
Define the baseline models to use for forecasting:
models = [
    HistoricAverage(),
    Naive(),
    SeasonalNaive(season_length=4),  # Quarterly data: seasonality = 4
    WindowAverage(window_size=4)
]
This list includes the four baseline models discussed earlier:
- `HistoricAverage()`: Mean Forecast
- `Naive()`: Naive Forecast
- `SeasonalNaive(season_length=4)`: Seasonal Naive Forecast. Here, a `season_length` of 4 is used to reflect quarterly seasonality.
- `WindowAverage(window_size=4)`: Rolling Averages. With `window_size=4`, the model averages the last 4 quarters to generate forecasts.
Initialize the `StatsForecast` object with the models and frequency, then train it:
sf = StatsForecast(
    models=models,
    freq='QS',  # Quarterly frequency
)

# Train all four models on the training data.
sf.fit(train_df)
The code is quite brief; `statsforecast` has made training multiple models a walk in the park. Also, note that we didn't have to do anything special to handle the 3 separate time series. That part is also nicely abstracted for us.
Set the number of periods to forecast, then generate predictions using the trained models:
# Define forecast horizon
h = 4 # 4 quarters = 1 year
pred_df = sf.predict(h=h)
Let's take a look at our forecasts now using the `plot` method of the `StatsForecast` class:
sf.plot(df, pred_df)
Now that we have predictions from the 4 models across the 3 different time series, let's evaluate the forecasts using the `evaluate` function with `rmse` as the error metric:
accuracy_df = pd.merge(test_df, pred_df, how='left', on=['unique_id', 'ds'])
evaluate(accuracy_df, metrics=[rmse])
unique_id | metric | Historic Average | Naive | Seasonal Naive | Window Average |
---|---|---|---|---|---|
Adelaide | rmse | 129.97 | 64.81 | 49.45 | 43.65 |
Ballarat | rmse | 61.25 | 46.27 | 51.04 | 49.31 |
Barkly | rmse | 14.7 | 14.96 | 17.76 | 13.54 |
We see that the Window Average forecast provides the best baseline for Adelaide and Barkly, while the Naive forecast edges it out for Ballarat.
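If you want a single overall baseline across all three regions, one simple (purely illustrative) option is to average each model's RMSE over the series with plain pandas, assuming `evaluate` returns one row per series as shown above:

```python
# Average each model's RMSE across the three series (lower is better)
eval_df = evaluate(accuracy_df, metrics=[rmse])
print(eval_df.drop(columns=['unique_id', 'metric']).mean().sort_values())
```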
Conclusion
We looked at what baseline forecasts are, why they are important, and how we can get some baseline forecasts up and running in no time using Nixtla's `statsforecast` package.
The right baseline depends on the data-generating process of the time series and the closeness of these processes to the assumptions behind the baseline forecasts.
It is often beneficial to compute several baseline forecasts, especially when the effort required is so low. Taking a cue from the "No free lunch" theorem, no single baseline method is universally superior. The effectiveness of a baseline depends on how well its implicit assumptions match the data-generating process of the specific time series.
Once one or more satisfactory baselines are established, and their performance is quantified, the focus shifts to developing more advanced forecasting models. The goal should then be to significantly outperform the best baseline.