Under the Hood: Why I Rebuilt the Prediction Model from Scratch

The Original Model Was Good. Not Good Enough.

When I launched Footy Science at the start of 2026, the prediction engine was a logistic regression model — a well-understood statistical technique that takes a bunch of numbers about how two teams have been playing and spits out a probability of who wins.

It was trained on more than a decade of AFL data. It was calibrated carefully so that when it said “65% chance the home team wins,” that was actually true about 65% of the time. It achieved around 69% accuracy on matches it had never seen before. That’s a solid baseline.

But it had a fundamental limitation I couldn’t shake: it was a many-numbers-in, one-number-out machine. Every match got reduced to a single probability. No sense of how a team was winning. No ability to explain why it favoured one team over another in any meaningful way. Just: “we think Carlton wins, 61%.”

After five rounds of the 2026 season I rebuilt it. This post explains what changed and why.


The Problem with One Big Number

The old model worked roughly like this. Take about a dozen statistics — things like disposal rate, metres gained, recent wins, travel disadvantage — and combine them into a single score. Higher score means higher chance of winning.

The stats get squashed together into a weighted average. A team that’s been moving the ball brilliantly but losing contested possessions looks identical to a team that’s winning the ball but going nowhere with it, as long as their combined score is the same.
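
To make that limitation concrete, here is a minimal sketch of a single-score model of this kind. The feature names and weights are invented for illustration; they are not the site's actual features or coefficients.

```python
import math

def logistic_win_prob(features, weights):
    """Collapse every feature differential into one score, then squash it."""
    score = sum(weights[k] * features[k] for k in weights)
    return 1 / (1 + math.exp(-score))  # logistic function: score -> probability

# Illustrative weights only, not the real model's coefficients.
weights = {"ball_movement": 0.5, "contested_poss": 0.5}

# Two very different team profiles (home-minus-away differentials)...
slick_but_soft = {"ball_movement": 2.0, "contested_poss": -1.0}
tough_but_slow = {"ball_movement": -1.0, "contested_poss": 2.0}

# ...produce the same weighted score, so the model rates them identically.
p_a = logistic_win_prob(slick_but_soft, weights)
p_b = logistic_win_prob(tough_but_slow, weights)
```

Both profiles come out at exactly the same probability, because only the combined score survives; everything about how that score was reached is lost.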

That’s not how football works.

A game of AFL has distinct phases. There’s the contest for the ball — who wins it, how efficiently, how often under pressure. There’s field position — who’s pushing the play forward into dangerous territory. There’s chance creation — how many shots are being generated. There’s conversion — how clinical each team is when they get their opportunities. And there’s momentum — recent form, experience, structural advantages like travel.

These phases are related but they’re not the same thing. A team can dominate possession and lose. A team can get outplayed in the midfield but kick straight and win. The old model couldn’t see any of that structure. It just saw the average.


What the New Model Does Differently

The new model is a neural network — specifically, what I’ve called a Phase Model.

Rather than throwing all the statistics into one blender, it divides them into five groups based on what part of the game they measure. Disposals, efficiency, contested possessions, and clearances go into the Accumulation phase. Metres gained and inside 50s go into Territory. Score involvements and scoring shots go into Chance Creation. Shot accuracy and average winning margin go into Conversion. Recent wins, experience, and travel go into Momentum.

Each phase has its own small sub-model that processes only those statistics. It produces a score for each phase — a number that reflects which team has the advantage in that dimension of the game. Then all five phase scores feed into a final layer that produces the prediction.
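
In sketch form, the architecture looks something like this. The phase groupings follow the description above, but the statistic names, the tiny tanh sub-models, and all the parameters are placeholders for illustration, not the production network.

```python
import math

# Phase groupings as described above; the stat names are illustrative labels.
PHASES = {
    "accumulation":    ["disposals", "efficiency", "contested", "clearances"],
    "territory":       ["metres_gained", "inside_50s"],
    "chance_creation": ["score_involvements", "scoring_shots"],
    "conversion":      ["shot_accuracy", "avg_margin"],
    "momentum":        ["recent_wins", "experience", "travel"],
}

def phase_score(values, weights, bias):
    """One sub-model: each phase sees only its own statistics."""
    return math.tanh(sum(w * v for w, v in zip(weights, values)) + bias)

def predict(stats, phase_params, head_weights):
    scores = {
        name: phase_score([stats[s] for s in stat_names], *phase_params[name])
        for name, stat_names in PHASES.items()
    }
    # Final layer: combine the five phase scores into a predicted margin.
    margin = sum(head_weights[name] * scores[name] for name in PHASES)
    return scores, margin

# Toy parameters and inputs, purely to show the data flow.
phase_params = {name: ([0.5] * len(cols), 0.0) for name, cols in PHASES.items()}
head_weights = {name: 1.0 for name in PHASES}
stats = {s: 0.2 for cols in PHASES.values() for s in cols}

scores, margin = predict(stats, phase_params, head_weights)
```

The key property is the separation: each sub-model can only see its own phase's statistics, so a strong Territory score can't be faked by strong Conversion numbers.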

The result isn’t just a win probability. It’s also a predicted winning margin and an estimate of how uncertain that margin is. And because each phase produces its own score, you can see where the model thinks the game will be won or lost.


Predicted Margin — What That Actually Means

The old model told you: “we think Geelong wins, 72%.”

The new model tells you: “we think Geelong wins by around 18 points, give or take 32.”

That second form is much more useful. A 72% win probability with a margin of +18 ± 32 describes a game that could genuinely go either way on the day — the model is confident about the direction but honest about the noise. That’s different from a 72% win probability with a margin of +22 ± 14, which describes a game the model thinks Geelong probably controls.

The uncertainty figure (the ±) comes from the model learning not just who wins but how predictable the match is. Some matchups are structurally lopsided. Others look close on paper but have high variance — weather, a single contested free kick, a freakish goal. The model tries to capture that spread.
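
As a rough intuition for how a margin and an uncertainty relate to a win probability: if you assume the final margin is normally distributed around the prediction (an assumption made here for illustration; the model's probability output is its own learned head, so its numbers won't exactly match this formula), the chance of winning is simply the probability that the margin lands above zero.

```python
import math

def win_prob_from_margin(mu, sigma):
    """P(margin > 0) under a Normal(mu, sigma) assumption -- illustrative only."""
    return 0.5 * (1 + math.erf(mu / (sigma * math.sqrt(2))))

p_even   = win_prob_from_margin(0, 20)   # dead-even margin -> 50%
p_noisy  = win_prob_from_margin(18, 32)  # clear favourite, high-variance game
p_steady = win_prob_from_margin(22, 14)  # similar margin, much less noise
```

Under this toy assumption, the low-variance game yields a noticeably higher win probability than the noisy one; that gap is exactly the distinction the ± figure is there to surface.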


Player Influence

One thing I wanted to be able to show, and couldn’t with the old model, is which players are actually driving the prediction.

The Phase Model attributes each team’s phase scores back to individual players, based on how much each player’s rolling statistics contribute to that phase and how sensitive the model is to those statistics right now. The result is a relative influence ranking — you can see which players are moving the needle most in each phase of the game.

This is an approximation, not a precise measurement. It works best for the four statistical phases (Accumulation, Territory, Chance Creation, Conversion) and is excluded for Momentum, which is a team-level concept. But it gives you something the old model could never produce: a human-readable story about why the model thinks what it thinks.
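
One way such an attribution can work, sketched under stated assumptions: nudge each player's contribution to the team's rolling totals and measure how much the phase score moves. The stat names, weights, and player numbers below are all invented for illustration, and the site's actual attribution method may differ in detail.

```python
import math

def phase_score(totals, weights):
    # Small weights keep the tanh out of its flat, saturated region.
    return math.tanh(sum(weights[k] * totals[k] for k in weights))

def player_influence(players, weights, eps=1e-4):
    """Finite-difference sensitivity of the phase score to each player."""
    totals = {k: sum(p[k] for p in players.values()) for k in weights}
    base = phase_score(totals, weights)
    influence = {}
    for name, stats in players.items():
        # Nudge this player's rolling stats up slightly and re-score the phase.
        bumped = {k: totals[k] + eps * stats[k] for k in weights}
        influence[name] = (phase_score(bumped, weights) - base) / eps
    return influence

# Invented rolling averages for two players in the Accumulation phase.
weights = {"disposals": 0.04, "clearances": 0.06}
players = {
    "Player A": {"disposals": 28, "clearances": 7},
    "Player B": {"disposals": 12, "clearances": 2},
}
ranking = player_influence(players, weights)
```

Player A's bigger rolling numbers move the phase score more, so the model "leans" on them more heavily; that relative ordering is what the influence panel displays.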


Does It Actually Predict Better?

On historical test data, the new model achieves around 68% accuracy — roughly the same as the old one.

So why bother?

Two reasons.

First, test accuracy on historical data is a noisy measure. Both models were evaluated on matches from 2012 to 2025, where the feature quality is consistently good. The Phase Model’s architecture — processing phases independently — should make it more robust to the messier, partial data you get during a live season, particularly early in the year when rolling averages haven’t settled.

Second, the Phase Model is better calibrated on close games. The old model tended to treat tight matchups as coin flips. The Phase Model produces predicted margins with uncertainty estimates, which means it can express “this looks close, but the structure of the game favours the home team” in a way the old model simply couldn’t.

Early 2026 results back this up. Through five rounds, the Phase Model’s retrospective accuracy is around 72%, compared with 69% for the logistic model on the same games.


What Stayed the Same

The underlying data hasn’t changed. The same rolling player and team statistics that powered the old model power the new one — disposals, fantasy scores, metres gained, score involvements, and so on. The same FootyWire scraping pipeline. The same weekly retrain after each round.

The historical predictions on the site — every round from 2013 to 2025 — are still the original logistic model’s output. Those were genuine pre-round predictions and I’m not going to retroactively replace them with a model that didn’t exist at the time.

For the 2026 season, Opening Round through Round 5 also show the original logistic predictions, for the same reason. From Round 6 onwards, everything is the Phase Model.


The Honest Caveats

Neural networks are less interpretable than logistic regression. With the old model I could look at the coefficients and tell you exactly which features mattered most. With the Phase Model, the relationship between inputs and outputs is more complex.

I’ve mitigated this with the phase scores and player influence panels — but those are approximations of the model’s reasoning, not a direct readout of it. There’s a real trade-off between predictive power and interpretability, and I’ve moved slightly in the direction of power.

The model also still doesn’t know about late team changes, weather, or anything that happens after Thursday night selections. That limitation is unchanged. The Phase Model is smarter about what it knows; it’s still blind to what it doesn’t.


What This Means for You

If you’re using the predictions to tip, the main practical change is that you now have more to go on than a single probability. The margin estimate and uncertainty figure tell you how confident the model really is — a 65% prediction with ±15 points uncertainty is a very different bet from a 65% prediction with ±40 points uncertainty.

The phase breakdown tells you where the model expects each team to have the advantage. If you think the Territory phase is wrong — maybe you know one team’s key forward is injured — that’s a concrete reason to override the model’s prediction rather than just a vague feeling.

The player influence panel shows you who the model is leaning on. If one of those players is actually listed as a late out, that’s a signal the prediction might shift significantly once lineups are updated.

All of that is new. None of it was available with the old model.


One Last Thing

The old model did its job. It was honest, it was well-calibrated, and for what it was, it worked.

The new one is a genuine upgrade — not because it’s more complicated, but because football is more complicated than a single weighted average, and the model now reflects that.

Every round is a new test.
