So, you are working on a forecasting project. The data has been set up and the required analysis has been done. Now we jump straight into getting the best forecasts, right?
Not so fast. You might be missing a crucial step: setting up baseline forecasts!
A baseline forecast provides a critical point of comparison, serving as a reference for all other modeling techniques applied to a specific problem. It helps answer questions like:
- How much accuracy can be achieved with little effort? (or) How predictable is this data?
- How good or bad is the sophisticated model you are working on, compared to the baseline?
- Is the improvement in accuracy using a sophisticated model, compared to the baseline forecast, worth the effort?
So, what are Baseline forecasts? They are usually characterised by:
- Simplicity – Requires minimal training or specialized intelligence.
- Speed – Quick to implement and computationally trivial for prediction.
Baseline Forecasting Methods
Commonly used baseline forecasting methods include:
- Mean Forecast
- Naive Forecast
- Seasonal Naive Forecast
- Rolling Averages
This is by no means an exhaustive list, but it is a fair start.
Mean Forecast
This is simply the mean of all the past observations. This works for time series that are stationary and don't have trend or seasonality.
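$$\hat{y}_{T+h|T} = \bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t$$
$\hat{y}_{T+h|T}$ represents the forecast for time period $T+h$ when we have observations until time period $T$. $\bar{y}$ is the mean of past observations. $y_1, \dots, y_T$ are the past observations.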
Example:
Suppose we have monthly sales data for the last 5 months:
[120, 130, 125, 135, 140]
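The mean of past observations is:
$$\bar{y} = \frac{120 + 130 + 125 + 135 + 140}{5} = \frac{650}{5} = 130$$
So the forecast for any future month (e.g., month 6, 7, 8, ...) will be:
$$\hat{y}_{5+h|5} = 130$$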
Pros: It is very simple to calculate and understand. It provides stable forecasts that do not fluctuate over the forecast horizon.
Cons: Recent and older observations are weighted equally, so more relevant recent information can be diluted. It doesn't work well for series with trends or seasonality.
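The mean forecast can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the statsforecast implementation; the function name `mean_forecast` and its `horizon` parameter are hypothetical names chosen for this example.

```python
def mean_forecast(history, horizon):
    """Return a flat forecast equal to the mean of all past observations."""
    if not history:
        raise ValueError("history must contain at least one observation")
    mean_value = sum(history) / len(history)
    # The same value is repeated for every step of the forecast horizon.
    return [mean_value] * horizon


sales = [120, 130, 125, 135, 140]
print(mean_forecast(sales, horizon=3))  # [130.0, 130.0, 130.0]
```

Because the forecast is a single number, the horizon only controls how many times it is repeated.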
Naive Forecast
The Naive Forecast is simply the most recent observed value, carried forward to every future period: $\hat{y}_{T+h|T} = y_T$. This kind of baseline is best suited for data that is close to a random walk pattern.
Example:
Continuing with the 5-month sales data example ([120, 130, 125, 135, 140]), the forecast would be the latest observation. In this case the latest observation is $y_5 = 140$. So,
$$\hat{y}_{5+h|5} = y_5 = 140$$
Pros: It is super simple to implement and requires minimal data (a single data point is enough!).
Cons: It cannot account for any trend or seasonality. Patterns beyond the latest data point are not captured.
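A minimal sketch of the naive forecast in plain Python follows. The function name `naive_forecast` is a hypothetical helper for this example, not a library call.

```python
def naive_forecast(history, horizon):
    """Repeat the most recent observation for every step of the horizon."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


sales = [120, 130, 125, 135, 140]
print(naive_forecast(sales, horizon=3))  # [140, 140, 140]
```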
Seasonal Naive Forecast
What if your data set has strong seasonality? Seasonal Naive can serve as your baseline then. This is just like the naive forecast. But instead of the latest observation, we use the value observed in the same season in the previous cycle.
$$\hat{y}_{T+h|T} = y_{T+h-m(k+1)}$$
$m$ is the seasonal period: 12 for monthly data and 4 for quarterly data. $k = \lfloor (h-1)/m \rfloor$. This is the integer part (floor) of $(h-1)/m$. This ensures that even if we forecast for several seasons ahead, the season from the latest cycle is picked up.
Example: Consider this quarterly sales data.
| Quarter | Period | Sales |
|---|---|---|
| 2024-Q1 | 1 | 80 |
| 2024-Q2 | 2 | 90 |
| 2024-Q3 | 3 | 110 |
| 2024-Q4 | 4 | 130 |
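If $T = 4$, and we want to find the forecast for 2025-Q1, then $h = 1$ and $T + h = 5$. We know that $m = 4$ for quarterly data. And $k = \lfloor (1-1)/4 \rfloor = 0$. Now,
$$\hat{y}_{5|4} = y_{4+1-4(0+1)} = y_1 = 80$$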
Pros: Seasonality is captured with minimal effort.
Cons: It does not account for recent observations, trend, or other effects. Multiple seasonalities are not captured. If seasonality is weak or other signals dominate, performance will degrade.
If you are interested in learning more, read our tutorial on multiple seasonalities.
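The seasonal naive rule can be sketched in plain Python as below. The function name `seasonal_naive_forecast` and the parameter `season_length` (the seasonal period, 4 for quarterly data) are hypothetical names for this example. The floor term is computed with integer division, so horizons longer than one cycle keep reusing the latest observed cycle.

```python
def seasonal_naive_forecast(history, horizon, season_length):
    """Forecast each step with the value from the same season in the latest cycle."""
    n_obs = len(history)
    if n_obs < season_length:
        raise ValueError("need at least one full seasonal cycle of history")
    forecasts = []
    for h in range(1, horizon + 1):
        cycles_ahead = (h - 1) // season_length  # floor((h - 1) / season_length)
        # 1-based position of the matching season in the latest observed cycle.
        position = n_obs + h - season_length * (cycles_ahead + 1)
        forecasts.append(history[position - 1])
    return forecasts


quarterly_sales = [80, 90, 110, 130]  # 2024-Q1 .. 2024-Q4
print(seasonal_naive_forecast(quarterly_sales, horizon=6, season_length=4))
# [80, 90, 110, 130, 80, 90]
```

The first value, 80, is the 2025-Q1 forecast from the example; the fifth and sixth values show the cycle repeating for 2026-Q1 and 2026-Q2.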
Rolling Averages Forecast
Rolling Averages (also called Moving Averages or Window Averages) forecast the mean of the last $w$ observations of a time series. The value of $w$ (referred to as window or span) has to be decided by the forecaster.
$$\hat{y}_{T+h|T} = \frac{1}{w}\sum_{t=T-w+1}^{T} y_t$$
$w$ is the window or span over which averages are to be calculated.
Example:
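Recall the 5-month sales data example ([120, 130, 125, 135, 140]). If we set the window length to, say, $w = 3$ and we want to find the forecast at $T + h = 6$, then $T = 5$ and $h = 1$. Thus,
$$\hat{y}_{6|5} = \frac{y_3 + y_4 + y_5}{3} = \frac{125 + 135 + 140}{3} \approx 133.33$$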
Pros: This is a simple technique. We can adjust the window length to favor either recency (small window) or stability (large window).
Cons: Moving averages respond slowly to new information, especially with large windows. Seasonality cannot be captured.
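A rough sketch of the rolling average forecast in plain Python is shown below. The function name `window_average_forecast` and the window of 3 are illustrative choices, and the sketch assumes the average is held flat across the whole horizon rather than updated with its own forecasts.

```python
def window_average_forecast(history, horizon, window):
    """Repeat the mean of the last `window` observations for every step."""
    if window < 1 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    window_mean = sum(history[-window:]) / window
    return [window_mean] * horizon


sales = [120, 130, 125, 135, 140]
forecast = window_average_forecast(sales, horizon=2, window=3)
print([round(value, 2) for value in forecast])  # [133.33, 133.33]
```

Setting `window=1` reproduces the naive forecast, and setting it to the full length of the series reproduces the mean forecast, which is one way to see the recency versus stability trade-off.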
Which model should we use as our baseline?
Now that we've explored several baseline models, which one should you use? The table below summarizes when each model is appropriate:
| Condition | Recommended Baseline |
|---|---|
| Time series is stationary, no trend or seasonality, values centered around a mean | Mean Forecast |
| Data resembles a random walk (next value = previous + random noise) | Naive Forecast |
| Clear and stable seasonal pattern, with seasonality as the dominant signal | Seasonal Naive Forecast |
| Recent values are strong predictors of future values | Rolling Average |
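When it is unclear which row of the table applies, one option is to score a few baselines on held-out data. The sketch below is a toy illustration under stated assumptions: it holds out the last two months of the 5-month sales example, uses mean absolute error (MAE) as the metric, and picks a window of 2. The split, the metric, and the window are choices made for this example only, and with so little data the ranking should not be read as meaningful.

```python
def mean_absolute_error(actuals, forecasts):
    """Average absolute difference between actual and forecast values."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


sales = [120, 130, 125, 135, 140]
train, holdout = sales[:3], sales[3:]  # fit on 3 months, score on the last 2
horizon = len(holdout)

# Each baseline produces a flat forecast over the holdout horizon.
baselines = {
    "mean": [sum(train) / len(train)] * horizon,
    "naive": [train[-1]] * horizon,
    "window_average_2": [sum(train[-2:]) / 2] * horizon,
}

scores = {name: mean_absolute_error(holdout, fc) for name, fc in baselines.items()}
for name, mae in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {mae:.2f}")
```

Any more sophisticated model can be scored on the same holdout, which gives a direct answer to whether its improvement over the baseline is worth the effort.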
Easy Baseline Forecasts using the statsforecast package
This section guides you through how to implement baseline forecasts using the StatsForecast package, a fast and scalable library for statistical time series forecasting.
Install statsforecast if you don't already have it.
