Introduction
In the real world, we often face forecasting problems in environments where several teams, departments, or companies are competing and building up their performance over time. Think about how sales branches stack up revenue throughout the year, how factories compare production outputs, or how bids play out during a procurement cycle. In all these settings, we’re not just interested in the numbers themselves, but in how each competitor measures up against the rest as the results accumulate.
Championship tournaments provide an excellent case study for this pattern. Unlike simple time series where we forecast a single variable in isolation, championship-style data creates unique challenges:
- Panel structure: Multiple entities (teams, plants, branches) tracked simultaneously
- Cumulative metrics: Performance compounds over time (points, sales, production)
- Fixed horizon: A predetermined endpoint where final rankings matter
- Historical patterns: Entities establish consistent performance trajectories
These characteristics appear across industries. In manufacturing, production lines accumulate defect counts or output volumes over quarters. In finance, regional offices accumulate sales toward their targets. In logistics, distribution centers accumulate delivery performance metrics. Understanding how to forecast these cumulative, competitive time series has broad applications beyond sports.
In this blog post, we'll use a championship tournament as our example system to demonstrate how StatsForecast, Nixtla's statistical forecasting library, can predict final outcomes by analyzing cumulative performance time series. The same methodology applies whenever you need to forecast how multiple entities will perform relative to each other over a defined period.
To accomplish this, we'll follow a systematic approach:
- Prepare the Data: Generate a simulated championship with cumulative points time series for each team
- Hold Out Last N Matches: Keep final matches for evaluation
- Train Forecast Model: Fit the model on matches 1 to T−N using StatsForecast and AutoARIMA
- Predict Last N Outcomes: Generate forecasts for the remaining matches
- Evaluate and Visualize Results: Compare predictions with actual outcomes and assess forecast accuracy
The setup is summarized in the following chart:
It seems like we have a lot to cover. Let's get to it!
Set Up Championship Teams and Matches
To generate realistic championship data, we need to model teams with different strengths and simulate match outcomes. The key concepts are:
- Team strength parameters: Each team gets a strength value that influences their scoring ability
- Poisson match model: Goals are generated using a Poisson distribution based on team strengths
- Home advantage: Home teams get a slight boost in expected goals
The core logic uses a Poisson process where expected goals depend on:
- Team strength differential
- Home advantage (typically ~0.3 goals)
- Base scoring rate (~1.35 goals per team)
Match outcomes translate to points: Win = 3 points, Draw = 1 point, Loss = 0 points.
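As a rough illustration of the match model described above, here is a minimal sketch. The additive way the strength differential and home advantage enter the expected goals, the 0.05 floor on the rate, and the name simulate_match are our assumptions for illustration; the notebook's implementation may differ.

```python
import numpy as np

BASE_RATE = 1.35  # baseline expected goals per team (value from the text)
HOME_ADV = 0.30   # extra expected goals for the home side (value from the text)


def simulate_match(strength_home, strength_away, rng):
    """Simulate one match; return (home_goals, away_goals, home_pts, away_pts)."""
    diff = strength_home - strength_away
    # Assumed additive link: the stronger side gains what the weaker side loses.
    # The floor keeps the Poisson rate strictly positive.
    lam_home = max(BASE_RATE + HOME_ADV + diff, 0.05)
    lam_away = max(BASE_RATE - diff, 0.05)
    home_goals = int(rng.poisson(lam_home))
    away_goals = int(rng.poisson(lam_away))
    # Win = 3 points, Draw = 1 point, Loss = 0 points.
    if home_goals > away_goals:
        return home_goals, away_goals, 3, 0
    if home_goals < away_goals:
        return home_goals, away_goals, 0, 3
    return home_goals, away_goals, 1, 1


rng = np.random.default_rng(777)
# A strong home side (0.4) against a weaker visitor (-0.2), repeated 1000 times.
results = [simulate_match(0.4, -0.2, rng) for _ in range(1000)]
```

Over many simulated matches, the stronger home side should collect clearly more points than its opponent, which is the behavior we want the strength parameters to produce.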
Generate Championship Schedule
For a valid championship, each team must play every other team exactly twice (once home, once away). We use the circle method algorithm:
- First half of season: N-1 rounds with rotating pairings
- Second half: Mirror of first half (swap home/away)
- Validation: Each team plays N-1 home games and N-1 away games
For 20 teams, this creates 38 matchdays with 380 total matches.
Sample Output:
Rounds: 38; Matches total: 380 (should be 38 & 380)
Matchday 1
Team12 vs Team08
Team19 vs Team06
Team17 vs Team03
Team02 vs Team13
Team11 vs Team07
...
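To make the circle method concrete, here is a minimal sketch of a double round-robin generator. The function name round_robin_schedule and the rule for alternating home and away by round are hypothetical choices for this example; the notebook's generate_calendar also supports seeding and round shuffling, which this sketch leaves out.

```python
from collections import Counter


def round_robin_schedule(teams):
    """Double round-robin via the circle method.

    Returns a list of rounds, each a list of (home, away) tuples.
    """
    n = len(teams)
    if n % 2 != 0:
        raise ValueError("This sketch assumes an even number of teams.")
    rotation = list(teams)
    first_half = []
    for rnd in range(n - 1):
        pairs = []
        for i in range(n // 2):
            a, b = rotation[i], rotation[n - 1 - i]
            # Alternate orientation by round so home games are spread out.
            pairs.append((a, b) if rnd % 2 == 0 else (b, a))
        first_half.append(pairs)
        # Keep the first team fixed and rotate the others by one position.
        rotation = [rotation[0]] + [rotation[-1]] + rotation[1:-1]
    # Second half mirrors the first with home and away swapped.
    second_half = [[(away, home) for home, away in rnd] for rnd in first_half]
    return first_half + second_half


teams = [f"Team{i:02d}" for i in range(1, 21)]
schedule = round_robin_schedule(teams)
matches = [m for rnd in schedule for m in rnd]
home_counts = Counter(home for home, _ in matches)
print(f"Rounds: {len(schedule)}; Matches total: {len(matches)}")
```

For 20 teams this yields 38 rounds and 380 matches, and because every ordered (home, away) pair appears exactly once, each team ends up with 19 home games and 19 away games.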
Simulate Results and Build Time Series
Now we put everything together: simulate matches, track cumulative statistics, and transform the data into a panel time series ready for forecasting.
The key transformation is converting match-by-match results into a cumulative points time series for each team:
- Panel structure:
unique_id (team), ds (matchday), y (cumulative points)
- Cumulative metrics: Points, goals for/against, wins/draws/losses accumulate over time
- Train/test split: Hold out final matchdays for evaluation
This structure is exactly what Nixtla's forecasting libraries expect and is analogous to tracking cumulative sales across branches, production output across facilities, or any competitive metric across entities.
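The transformation itself is short. Below is a minimal sketch on a toy two-team, four-matchday dataset; the input column names (team, matchday, points) and the values are made up for illustration and are not the notebook's schema.

```python
import pandas as pd

# Toy per-team results: one row per team per matchday (hypothetical values).
results = pd.DataFrame({
    "team":     ["A", "B", "A", "B", "A", "B", "A", "B"],
    "matchday": [1, 1, 2, 2, 3, 3, 4, 4],
    "points":   [3, 0, 1, 1, 0, 3, 3, 0],
})

# Cumulative points per team, in the (unique_id, ds, y) layout.
results = results.sort_values(["team", "matchday"])
ts_df = pd.DataFrame({
    "unique_id": results["team"],
    "ds": results["matchday"],
    "y": results.groupby("team")["points"].cumsum(),
}).reset_index(drop=True)

# Hold out everything after the cutoff matchday for evaluation.
cutoff = 3
train_ts = ts_df[ts_df["ds"] <= cutoff]
test_ts = ts_df[ts_df["ds"] > cutoff]
h = int(ts_df["ds"].max() - cutoff)  # forecast horizon in matchdays
```

The horizon h falls out of the split directly: it is the number of matchdays between the cutoff and the end of the season.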
Full implementation: For the complete code covering team setup, calendar generation, match simulation, and data transformation, see the championship_forecasting.ipynb notebook.
Running the simulation:
teams = [f"Team{i:02d}" for i in range(1, 21)]
season = generate_calendar(teams, seed=2025, shuffle_rounds=True)
strengths = make_tiered_strengths(teams)
# 1) Full season → dataframes for plots + forecasting
full_season_results = prepare_forecasting_data(teams, season, strengths, seed=777)
matches_df = full_season_results["matches_df"]
full_season_ts = full_season_results["ts_df"] # (unique_id, ds, y) ready for StatsForecast/TimeGPT
standings_df = full_season_results["standings_df"]
# 2) Train on first 35 matchdays, forecast remaining 3
train_data = prepare_forecasting_data(teams, season, strengths, seed=777, cutoff_matchday=35)
train_ts = train_data["ts_df"] # ds ∈ [1..35]
forecast_horizon = train_data["h"] # 3 matchdays remaining
The setup rests on the following assumptions:
- There are 20 teams, so each team plays 38 matchdays and the season has 380 matches in total.
- We train on the first 35 matchdays and predict the last 3.
- Because ts_df holds cumulative points per team and matchday, a model trained on part of the season can forecast the final points totals and, from them, the final standings.
...
| 637 | Team20 | 32 | 18 | 0 | Team14 | A | 1 | 4 | L | 26 | 91 | -65 | 4 | 6 | 22 |
| 657 | Team20 | 33 | 18 | 0 | Team13 | A | 2 | 4 | L | 28 | 95 | -67 | 4 | 6 | 23 |
| 678 | Team20 | 34 | 19 | 1 | Team15 | H | 0 | 0 | D | 28 | 95 | -67 | 4 | 7 | 23 |
| 683 | Team20 | 35 | 19 | 0 | Team04 | A | 1 | 5 | L | 29 | 100 | -71 | 4 | 7 | 24 |
Predict and Forecast with StatsForecast
Now that we have all the data, we can let StatsForecast do the heavy lifting. In particular, we will use the AutoARIMA model to fit each team's series and forecast the last three matchdays of the championship.
Apart from the imports, the whole thing takes three lines of code:
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
sf = StatsForecast(models=[AutoARIMA()], freq=1)
sf.fit(train_ts)
forecast_raw = sf.predict(h=forecast_horizon, level=[95])
Evaluate the Results
The championship forecast outputs are stored in forecast_raw. To properly evaluate and visualize our predictions, we need two key steps:
Step 1: Round forecasts to valid integer points
Since championship points can only be integers (0, 1, or 3 per match), we need to round all forecast values:
import pandas as pd

def round_forecast_to_valid_points(forecast_df: pd.DataFrame) -> pd.DataFrame:
    """
    Round forecast values to integers since points must be whole numbers.
    """
    df = forecast_df.copy()
    # Round every forecast column (point forecast and interval bounds),
    # leaving the identifier columns untouched.
    for col in df.columns:
        if col not in ['unique_id', 'ds']:
            df[col] = df[col].round().astype(int)
    return df
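Rounding guarantees integers, but it does not guarantee that cumulative points never fall below the last observed total or that they grow by at most 3 per matchday. If you want forecasts and interval bounds to respect those limits, one option is to clip them. The sketch below is our own addition, not part of the notebook; the helper name clip_to_feasible and the example values are hypothetical.

```python
import pandas as pd


def clip_to_feasible(forecast_df, train_df, value_cols):
    """Clip cumulative-point forecasts to [last, last + 3 * steps_ahead]."""
    # Last observed cumulative points and matchday for each team.
    last = (
        train_df.sort_values("ds")
        .groupby("unique_id")
        .agg(last_y=("y", "last"), last_ds=("ds", "last"))
        .reset_index()
    )
    out = forecast_df.merge(last, on="unique_id", how="left")
    steps = out["ds"] - out["last_ds"]
    for col in value_cols:
        out[col] = out[col].clip(lower=out["last_y"], upper=out["last_y"] + 3 * steps)
    return out.drop(columns=["last_y", "last_ds"])


# Illustrative values only: a team sitting on 19 points after matchday 35.
train = pd.DataFrame({"unique_id": ["T"] * 2, "ds": [34, 35], "y": [19, 19]})
fcst = pd.DataFrame({
    "unique_id": ["T"] * 3,
    "ds": [36, 37, 38],
    "AutoARIMA": [20, 20, 21],
    "AutoARIMA-lo-95": [18, 17, 17],
    "AutoARIMA-hi-95": [21, 23, 24],
})
clipped = clip_to_feasible(
    fcst, train, ["AutoARIMA", "AutoARIMA-lo-95", "AutoARIMA-hi-95"]
)
```

In this example the lower bounds of 18 and 17 are raised to 19, since a team cannot lose points it has already earned, while the point forecasts and upper bounds are already feasible and stay unchanged.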
Step 2: Visualize forecasts with actual results
For visualization, we'll use helper functions that plot cumulative points over time with prediction intervals. The plotting logic handles:
- Extracting team-specific data from panel forecasts
- Overlaying actual vs. predicted cumulative points
- Displaying 95% prediction intervals
- Marking the train/test split point
Plotting utilities: For the complete plotting functions (plot_team_cumpoints_with_forecast and helpers), see the championship_forecasting.ipynb notebook.
And display the results using the following block of code:
# Round to valid integer points (football only allows 0, 1, or 3 points per match)
forecast = round_forecast_to_valid_points(forecast_raw)
# Add actual values to compare with predictions
full_season_results = prepare_forecasting_data(teams, season, strengths, seed=777)
full_season_ts = full_season_results["ts_df"][["unique_id", "ds", "y"]]
# Merge actual values into the forecast dataframe
forecast = forecast.merge(
    full_season_ts.rename(columns={"y": "actual"}),
    on=["unique_id", "ds"],
    how="left"
)
plot_team_cumpoints_with_forecast(
    ts_df=full_season_results["ts_df"],  # full actuals for context
    team="Team01",
    fcst_df=forecast,
    model_name="AutoARIMA",  # tell the helper how to read the wide columns
    level=95
)
This is the output for Team01:
And this is how the predictions look (forecast for the full championship):
...
| 56 | Team19 | 38 | 23 | 19 | 27 | 21 | 2 | 2 | 4 |
| 57 | Team20 | 36 | 20 | 18 | 21 | 19 | 1 | 1 | 1 |
| 58 | Team20 | 37 | 20 | 17 | 23 | 19 | 1 | 1 | 1 |
| 59 | Team20 | 38 | 21 | 17 | 24 | 19 | 2 | 2 | 4 |
With StatsForecast and AutoARIMA, we can forecast the remaining matchdays for all 20 teams in a few seconds, with a point forecast and a 95% prediction interval for each team.
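Once forecasts and actuals sit side by side, accuracy can be summarized in a few lines. The sketch below uses a small made-up dataframe for the final matchday; the team totals are hypothetical and only the column names AutoARIMA and actual follow the merged forecast dataframe above.

```python
import pandas as pd

# Hypothetical rounded forecasts merged with actuals, final matchday only.
final = pd.DataFrame({
    "unique_id": ["Team01", "Team02", "Team03", "Team04"],
    "AutoARIMA": [80, 74, 75, 40],
    "actual":    [82, 76, 73, 41],
})

# Point-forecast error per team and overall mean absolute error.
final["abs_error"] = (final["AutoARIMA"] - final["actual"]).abs()
mae = final["abs_error"].mean()

# Standings implied by the forecast vs. the actual standings (rank 1 = most points).
final["pred_rank"] = final["AutoARIMA"].rank(ascending=False, method="min").astype(int)
final["actual_rank"] = final["actual"].rank(ascending=False, method="min").astype(int)
rank_hits = int((final["pred_rank"] == final["actual_rank"]).sum())

print(f"MAE: {mae:.2f} points; correct final positions: {rank_hits}/{len(final)}")
```

Comparing ranks as well as points matters here: a forecast can be off by only a couple of points per team and still swap two neighbors in the table, as Team02 and Team03 do in this toy example.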
Conclusions
Let's recap what we covered in this post:
Forecast many entities at once with a panel data structure: Instead of writing a separate pipeline for each team, we stack all 20 teams in one dataframe with shared columns (unique_id, ds, y), and StatsForecast fits every series in a single call.
Cumulative metrics create predictable patterns: When performance accumulates over time (points, sales, production output), historical trajectories become informative for future outcomes.
AutoARIMA automates model selection: Rather than manually tuning ARIMA parameters for each entity, StatsForecast's AutoARIMA searches over candidate ARIMA orders and selects a configuration for each team using an information criterion. This automation matters when forecasting many entities at once, where hand-tuning every series quickly becomes impractical.
Prediction intervals quantify uncertainty: The 95% prediction intervals give a plausible range around each point forecast. This is essential for decision-making: knowing that a team will likely finish with 85-90 points is more actionable than a single-point estimate of 87 points.
Historical holdout validation approximates practical performance: By training on matchdays 1-35 and predicting the final 3 matchdays, we simulated a realistic forecasting scenario and could compare the forecasts directly against the outcomes that followed.
This forecasting methodology extends beyond sports to any scenario where multiple entities compete on cumulative metrics over a fixed horizon: quarterly sales targets across regions, monthly production goals across facilities, or seasonal performance metrics across departments. The combination of panel data structure, cumulative metric tracking, and automated model selection with StatsForecast provides a powerful framework for forecasting competitive, multi-entity systems in any industry.