Introduction
In the real world, we often face forecasting problems in environments where several teams, departments, or companies are competing and building up their performance over time. Think about how sales branches stack up revenue throughout the year, how factories compare production outputs, or how bids play out during a procurement cycle. In all these settings, we’re not just interested in the numbers themselves, but in how each competitor measures up against the rest as the results accumulate.
Championship tournaments provide an excellent case study for this pattern. Unlike simple time series where we forecast a single variable in isolation, championship-style data creates unique challenges:
- Panel structure: Multiple entities (teams, plants, branches) tracked simultaneously
- Cumulative metrics: Performance compounds over time (points, sales, production)
- Fixed horizon: A predetermined endpoint where final rankings matter
- Historical patterns: Entities establish consistent performance trajectories
These characteristics appear across industries. In manufacturing, production lines accumulate defect counts or output volumes over quarters. In finance, regional offices accumulate sales against their targets. In logistics, distribution centers accumulate delivery performance metrics. Understanding how to forecast these cumulative, competitive time series has broad applications beyond sports.
In this blog post, we'll use a championship tournament as our example system to demonstrate how StatsForecast, Nixtla's statistical forecasting library, can predict final outcomes by analyzing cumulative performance time series. The same methodology applies whenever you need to forecast how multiple entities will perform relative to each other over a defined period.
To accomplish this, we'll follow a systematic approach:
- Prepare the Data: Generate a simulated championship with cumulative points time series for each team
- Hold Out Last N Matches: Keep final matches for evaluation
- Train Forecast Model: Fit the model on matches 1 to T−N using StatsForecast and AutoARIMA
- Predict Last N Outcomes: Generate forecasts for the remaining matches
- Evaluate and Visualize Results: Compare predictions with actual outcomes and assess forecast accuracy
The setup is summarized in the following chart:
It seems like we have a lot to cover. Let's get to it!
1. Set Up Championship Teams and Matches
To generate realistic championship data, we need to model teams with different strengths and simulate match outcomes. The key concepts are:
- Team strength parameters: Each team gets a strength value that influences their scoring ability
- Poisson match model: Goals are generated using a Poisson distribution based on team strengths
- Home advantage: Home teams get a slight boost in expected goals
The core logic uses a Poisson process where expected goals depend on:
- Team strength differential
- Home advantage (typically ~0.3 goals)
- Base scoring rate (~1.35 goals per team)
Match outcomes translate to points: Win = 3 points, Draw = 1 point, Loss = 0 points.
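To make the match model concrete, here is a minimal sketch of one simulated match. The exact formula in the notebook may differ: the log-linear form of the expected goals, the constant names, and the helper names (`expected_goals`, `sample_poisson`, `simulate_match`) are assumptions made for illustration. Only the base rate (~1.35), the home advantage (~0.3), and the 3/1/0 points rule come from the description above.

```python
import math
import random

BASE_RATE = 1.35       # base expected goals per team (from the text)
HOME_ADVANTAGE = 0.3   # extra expected goals for the home side (from the text)


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def expected_goals(home_strength: float, away_strength: float) -> tuple[float, float]:
    """Assumed form: strength differential scales the base rate, home side gets a boost."""
    diff = home_strength - away_strength
    lam_home = BASE_RATE * math.exp(diff) + HOME_ADVANTAGE
    lam_away = BASE_RATE * math.exp(-diff)
    return lam_home, lam_away


def points_from_score(goals_for: int, goals_against: int) -> int:
    """Win = 3 points, draw = 1 point, loss = 0 points."""
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0


def simulate_match(home_strength: float, away_strength: float, rng: random.Random):
    """Return (home_goals, away_goals, home_points, away_points) for one match."""
    lam_home, lam_away = expected_goals(home_strength, away_strength)
    home_goals = sample_poisson(lam_home, rng)
    away_goals = sample_poisson(lam_away, rng)
    return (
        home_goals,
        away_goals,
        points_from_score(home_goals, away_goals),
        points_from_score(away_goals, home_goals),
    )


rng = random.Random(777)
results = [simulate_match(0.4, -0.2, rng) for _ in range(200)]
```

With a positive strength differential the home side should collect more points on average, but any single match can still go either way, which is what makes the cumulative series noisy.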
2. Generate Championship Schedule
For a valid championship, each team must play every other team exactly twice (once home, once away). We use the circle method algorithm:
- First half of season: N-1 rounds with rotating pairings (here N is the number of teams)
- Second half: Mirror of first half (swap home/away)
- Validation: Each team plays N-1 home games and N-1 away games
For 20 teams, this creates 38 matchdays with 380 total matches.
Sample Output:
Rounds: 38; Matches total: 380 (should be 38 & 380)
Matchday 1
Team12 vs Team08
Team19 vs Team06
Team17 vs Team03
Team02 vs Team13
Team11 vs Team07
...
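The circle method described above can be sketched as follows. This is an illustrative version, not the notebook's `generate_calendar`: the function name `double_round_robin` is hypothetical, and it skips the seeding and round shuffling that the notebook's function exposes.

```python
def double_round_robin(teams: list[str]) -> list[list[tuple[str, str]]]:
    """Build a double round-robin as a list of rounds of (home, away) pairs."""
    n = len(teams)
    if n % 2 != 0:
        raise ValueError("The circle method as written here needs an even number of teams.")

    rotation = list(teams)
    first_half = []
    for round_idx in range(n - 1):
        pairs = []
        for i in range(n // 2):
            a, b = rotation[i], rotation[n - 1 - i]
            # Flip the home side on odd rounds so home games are spread out
            pairs.append((a, b) if round_idx % 2 == 0 else (b, a))
        first_half.append(pairs)
        # Keep the first team fixed and rotate everyone else by one position
        rotation = [rotation[0], rotation[-1]] + rotation[1:-1]

    # Second half mirrors the first with home and away swapped
    second_half = [[(away, home) for home, away in rnd] for rnd in first_half]
    return first_half + second_half


teams = [f"Team{i:02d}" for i in range(1, 21)]
season = double_round_robin(teams)
print(f"Rounds: {len(season)}; Matches total: {sum(len(r) for r in season)}")
```

Because the second half is an exact mirror, every team hosts every other team once, which gives the N-1 home and N-1 away games required by the validation rule.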
3. Simulate Results and Build Time Series
Now we put everything together: simulate matches, track cumulative statistics, and transform the data into a panel time series ready for forecasting.
The key transformation is converting match-by-match results into a cumulative points time series for each team:
- Panel structure: unique_id (team), ds (matchday), y (cumulative points)
- Cumulative metrics: Points, goals for/against, wins/draws/losses accumulate over time
- Train/test split: Hold out final matchdays for evaluation
This structure is exactly what Nixtla's forecasting libraries expect and is analogous to tracking cumulative sales across branches, production output across facilities, or any competitive metric across entities.
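As a rough illustration of this transformation, the sketch below turns a tiny hand-made match table into the (unique_id, ds, y) panel and splits it by matchday. The input column names (`matchday`, `home`, `away`, `home_goals`, `away_goals`) and the helper `to_cumulative_panel` are assumptions for this example; the notebook's `prepare_forecasting_data` may organize things differently.

```python
import pandas as pd

# Toy results: four teams, two matchdays (made-up numbers for illustration)
matches = pd.DataFrame({
    "matchday":   [1, 1, 2, 2],
    "home":       ["A", "C", "A", "B"],
    "away":       ["B", "D", "C", "D"],
    "home_goals": [2, 0, 1, 3],
    "away_goals": [1, 0, 1, 0],
})


def match_points(goals_for: int, goals_against: int) -> int:
    """Win = 3, draw = 1, loss = 0."""
    if goals_for > goals_against:
        return 3
    return 1 if goals_for == goals_against else 0


def to_cumulative_panel(matches: pd.DataFrame) -> pd.DataFrame:
    """One row per team and matchday, with y = cumulative points so far."""
    cols = ["unique_id", "matchday", "gf", "ga"]
    home = matches.rename(columns={"home": "unique_id", "home_goals": "gf", "away_goals": "ga"})[cols]
    away = matches.rename(columns={"away": "unique_id", "away_goals": "gf", "home_goals": "ga"})[cols]
    long = pd.concat([home, away], ignore_index=True)
    long["pts"] = [match_points(gf, ga) for gf, ga in zip(long["gf"], long["ga"])]
    long = long.sort_values(["unique_id", "matchday"]).reset_index(drop=True)
    long["y"] = long.groupby("unique_id")["pts"].cumsum()
    return long.rename(columns={"matchday": "ds"})[["unique_id", "ds", "y"]]


panel = to_cumulative_panel(matches)

# Train/test split: everything up to the cutoff matchday is training data
cutoff_matchday = 1
train = panel[panel["ds"] <= cutoff_matchday]
test = panel[panel["ds"] > cutoff_matchday]
```

Keeping `ds` as an integer matchday rather than a calendar date is what allows an integer frequency to be used later when fitting the model.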
Full implementation: For the complete code covering team setup, calendar generation, match simulation, and data transformation, see the championship_forecasting.ipynb notebook.
Running the simulation:
teams = [f"Team{i:02d}" for i in range(1, 21)]
season = generate_calendar(teams, seed=2025, shuffle_rounds=True)
strengths = make_tiered_strengths(teams)
# 1) Full season → dataframes for plots + forecasting
full_season_results = prepare_forecasting_data(teams, season, strengths, seed=777)
matches_df = full_season_results["matches_df"]
full_season_ts = full_season_results["ts_df"] # (unique_id, ds, y) ready for StatsForecast/TimeGPT
standings_df = full_season_results["standings_df"]
# 2) Train on first 35 matchdays, forecast remaining 3
train_data = prepare_forecasting_data(teams, season, strengths, seed=777, cutoff_matchday=35)
train_ts = train_data["ts_df"] # ds ∈ [1..35]
forecast_horizon = train_data["h"] # 3 matchdays remaining
The simulation uses the following setup:
- There are 20 teams, so each team plays 38 matchdays and the season has 380 matches in total.
- We train on the first 35 matchdays and predict the last 3.
- Because the output is a panel of cumulative points, training on part of the championship lets us forecast the final results and standings.
A preview of the resulting panel:
| | unique_id | ds | y | pts | opponent | ha | goals_for | goals_against | result | cum_gf | cum_ga | cum_gd | cum_w | cum_d | cum_l |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Team01 | 1 | 3 | 3 | Team20 | H | 6 | 0 | W | 6 | 0 | 6 | 1 | 0 | 0 |
| 20 | Team01 | 2 | 6 | 3 | Team09 | H | 4 | 1 | W | 10 | 1 | 9 | 2 | 0 | 0 |
| 40 | Team01 | 3 | 9 | 3 | Team11 | H | 5 | 1 | W | 15 | 2 | 13 | 3 | 0 | 0 |
| 60 | Team01 | 4 | 10 | 1 | Team07 | H | 0 | 0 | D | 15 | 2 | 13 | 3 | 1 | 0 |
| 80 | Team01 | 5 | 13 | 3 | Team12 | H | 5 | 3 | W | 20 | 5 | 15 | 4 | 1 | 0 |
| 117 | Team01 | 6 | 16 | 3 | Team19 | A | 7 | 1 | W | 27 | 6 | 21 | 5 | 1 | 0 |
| 120 | Team01 | 7 | 16 | 0 | Team10 | H | 1 | 2 | L | 28 | 8 | 20 | 5 | 1 | 1 |
| 140 | Team01 | 8 | 19 | 3 | Team14 | H | 3 | 1 | W | 31 | 9 | 22 | 6 | 1 | 1 |
| 160 | Team01 | 9 | 22 | 3 | Team04 | H | 5 | 2 | W | 36 | 11 | 25 | 7 | 1 | 1 |
| 180 | Team01 | 10 | 25 | 3 | Team05 | H | 2 | 1 | W | 38 | 12 | 26 | 8 | 1 | 1 |
| 200 | Team01 | 11 | 28 | 3 | Team13 | H | 4 | 1 | W | 42 | 13 | 29 | 9 | 1 | 1 |
| 220 | Team01 | 12 | 31 | 3 | Team06 | H | 6 | 0 | W | 48 | 13 | 35 | 10 | 1 | 1 |
| 240 | Team01 | 13 | 34 | 3 | Team17 | H | 6 | 0 | W | 54 | 13 | 41 | 11 | 1 | 1 |
| 273 | Team01 | 14 | 37 | 3 | Team16 | A | 5 | 0 | W | 59 | 13 | 46 | 12 | 1 | 1 |
| 280 | Team01 | 15 | 40 | 3 | Team03 | H | 3 | 1 | W | 62 | 14 | 48 | 13 | 1 | 1 |
| 315 | Team01 | 16 | 43 | 3 | Team18 | A | 4 | 1 | W | 66 | 15 | 51 | 14 | 1 | 1 |
| 320 | Team01 | 17 | 46 | 3 | Team02 | H | 2 | 0 | W | 68 | 15 | 53 | 15 | 1 | 1 |
| 347 | Team01 | 18 | 47 | 1 | Team08 | A | 0 | 0 | D | 68 | 15 | 53 | 15 | 2 | 1 |
| 373 | Team01 | 19 | 50 | 3 | Team15 | A | 3 | 0 | W | 71 | 15 | 56 | 16 | 2 | 1 |
| 380 | Team01 | 20 | 53 | 3 | Team08 | H | 2 | 1 | W | 73 | 16 | 57 | 17 | 2 | 1 |
| 415 | Team01 | 21 | 56 | 3 | Team12 | A | 1 | 0 | W | 74 | 16 | 58 | 18 | 2 | 1 |
| 439 | Team01 | 22 | 59 | 3 | Team20 | A | 5 | 1 | W | 79 | 17 | 62 | 19 | 2 | 1 |
| 447 | Team01 | 23 | 62 | 3 | Team06 | A | 2 | 1 | W | 81 | 18 | 63 | 20 | 2 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 637 | Team20 | 32 | 18 | 0 | Team14 | A | 1 | 4 | L | 26 | 91 | -65 | 4 | 6 | 22 |
| 657 | Team20 | 33 | 18 | 0 | Team13 | A | 2 | 4 | L | 28 | 95 | -67 | 4 | 6 | 23 |
| 678 | Team20 | 34 | 19 | 1 | Team15 | H | 0 | 0 | D | 28 | 95 | -67 | 4 | 7 | 23 |
| 683 | Team20 | 35 | 19 | 0 | Team04 | A | 1 | 5 | L | 29 | 100 | -71 | 4 | 7 | 24 |
4. Predict and Forecast with StatsForecast
Now that the data is ready, StatsForecast can do the heavy lifting. We use the AutoARIMA model to fit each team's cumulative points series and forecast the last three matchdays for the entire championship.
Apart from the imports, the whole thing takes just three lines of code:
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
sf = StatsForecast(models=[AutoARIMA()], freq=1)
sf.fit(train_ts)
forecast_raw = sf.predict(h=forecast_horizon, level=[95])
5. Evaluate the Results
The championship forecast outputs are stored in forecast_raw. To properly evaluate and visualize our predictions, we need two key steps:
Step 1: Round forecasts to valid integer points
Since championship points can only be integers (0, 1, or 3 per match), we need to round all forecast values:
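import pandas as pd

def round_forecast_to_valid_points(forecast_df: pd.DataFrame) -> pd.DataFrame:
    """
    Round forecast values to integers since points must be whole numbers.
    """
    df = forecast_df.copy()
    # Round every forecast column (point forecast and interval bounds),
    # leaving the team identifier and matchday columns untouched.
    for col in df.columns:
        if col not in ['unique_id', 'ds']:
            df[col] = df[col].round().astype(int)
    return df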
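Rounding makes the values integers, but it does not guarantee they are reachable: a team cannot lose points, and it can gain at most 3 per matchday. As an optional extra that is not part of the original workflow, the sketch below clips each forecast to that feasible range. The helper name `clip_to_feasible_points` and its arguments are hypothetical, and the toy numbers are made up.

```python
import pandas as pd


def clip_to_feasible_points(
    forecast_df: pd.DataFrame,
    last_actual: dict,
    last_train_ds: int,
    value_cols: list,
) -> pd.DataFrame:
    """Clip cumulative-point forecasts to [last actual, last actual + 3 per step]."""
    df = forecast_df.copy()
    steps_ahead = df["ds"] - last_train_ds
    floor = df["unique_id"].map(last_actual)   # points cannot decrease
    ceiling = floor + 3 * steps_ahead          # at most 3 points per matchday
    for col in value_cols:
        df[col] = df[col].clip(lower=floor, upper=ceiling)
    return df


# Toy forecast for one team that had 80 points after matchday 35
toy_forecast = pd.DataFrame({
    "unique_id": ["Team01", "Team01", "Team01"],
    "ds": [36, 37, 38],
    "AutoARIMA": [79, 85, 95],
})
clipped = clip_to_feasible_points(
    toy_forecast,
    last_actual={"Team01": 80},
    last_train_ds=35,
    value_cols=["AutoARIMA"],
)
```

This only enforces the lower and upper bounds at each step. It does not force consecutive forecasts to be non-decreasing, and it does not rule out totals that no combination of 0, 1, and 3 could produce.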
Step 2: Visualize forecasts with actual results
For visualization, we'll use helper functions that plot cumulative points over time with prediction intervals. The plotting logic handles:
- Extracting team-specific data from panel forecasts
- Overlaying actual vs. predicted cumulative points
- Displaying 95% prediction intervals
- Marking the train/test split point
Plotting utilities: For the complete plotting functions (plot_team_cumpoints_with_forecast and helpers), see the championship_forecasting.ipynb notebook.
And display the results using the following block of code:
# Round to valid integer points (football only allows 0, 1, or 3 points per match)
forecast = round_forecast_to_valid_points(forecast_raw)
# Add actual values to compare with predictions
full_season_results = prepare_forecasting_data(teams, season, strengths, seed=777)
full_season_ts = full_season_results["ts_df"][["unique_id", "ds", "y"]]
# Merge actual values into the forecast dataframe
forecast = forecast.merge(
    full_season_ts.rename(columns={"y": "actual"}),
    on=["unique_id", "ds"],
    how="left"
)
plot_team_cumpoints_with_forecast(
    ts_df=full_season_results["ts_df"],  # full actuals for context
    team="Team01",
    fcst_df=forecast,
    model_name="AutoARIMA",  # tell the helper how to read the wide columns
    level=95
)
This is the output for Team01:
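The plot covers a single team. To assess accuracy across the whole championship, a numeric summary helps. The sketch below assumes a merged dataframe shaped like `forecast` above, with an `AutoARIMA` column for the point forecast and an `actual` column for the true cumulative points. The helper name `evaluate_forecast` and the toy numbers are made up for illustration, and ranking by points alone ignores tie-breakers such as goal difference.

```python
import pandas as pd


def evaluate_forecast(forecast: pd.DataFrame, model_col: str = "AutoARIMA"):
    """Return overall MAE, per-team MAE, and predicted vs. actual final ranks."""
    abs_err = (forecast[model_col] - forecast["actual"]).abs()
    overall_mae = float(abs_err.mean())
    mae_per_team = abs_err.groupby(forecast["unique_id"]).mean()

    # Final standings come from the last forecast matchday
    final = forecast[forecast["ds"] == forecast["ds"].max()].set_index("unique_id")
    standings = pd.DataFrame({
        "predicted_rank": final[model_col].rank(ascending=False, method="min").astype(int),
        "actual_rank": final["actual"].rank(ascending=False, method="min").astype(int),
    })
    return overall_mae, mae_per_team, standings


# Toy merged forecast for three teams over the last two matchdays
toy = pd.DataFrame({
    "unique_id": ["A", "A", "B", "B", "C", "C"],
    "ds":        [37, 38, 37, 38, 37, 38],
    "AutoARIMA": [84, 87, 70, 72, 40, 41],
    "actual":    [85, 88, 70, 71, 42, 45],
})
overall_mae, mae_per_team, standings = evaluate_forecast(toy)
print(f"Overall MAE: {overall_mae:.2f} points")
print(standings)
```

In a real season, the rank comparison is often the more useful number, since a one-point error matters far more between two teams fighting for the title than between two mid-table sides.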
