About number twenty
The Name
number twenty is named in tribute to Diogo Jota, Liverpool's Portuguese forward, who wore the number 20 shirt and played a key role in delivering the club's historic 20th league title just weeks before his tragic passing in a car accident. He died together with his brother, André Silva.
This project is dedicated to their memory, celebrating the attacking spirit and relentless energy Diogo brought to the game.
The Idea
Football debates often revolve around whether a result was deserved. Supporters hold strong opinions, but the impression a match leaves is hard to quantify. How can we give an objective answer to a debate that is so subjective?
Beyond xG: Limitations
Expected goals (xG) provide valuable long-term insights into team performance, but they sometimes fail to capture the true essence of a single match.
For instance, if we rely strictly on xG, there is almost always a winner: the team with the higher xG, since two xG totals are rarely exactly equal. In reality, roughly 27-32% of matches end in a draw, depending on the competition.
Statistical Similarity
To go beyond xG, number twenty searches for statistical neighbors — past matches with similar profiles within the same league.
By comparing a match to its closest historical equivalents, we better capture local context and subtle dynamics that xG alone may miss. Football's inherent randomness cannot be removed, but this narrows the range of outcomes that a given match profile can plausibly lead to.
The model focuses on features like shot-creating actions, shots on target, offensive possessions, and passing, providing a compact yet powerful summary of match outcome and style.
Feature importance is not fixed: the model progressively adjusts the weight of each feature over time, reflecting how certain metrics may become more or less relevant depending on recent match trends.
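The neighbor search described above can be sketched as follows. This is a minimal illustration, not the site's actual implementation: the feature names, the weights, the weighted Euclidean distance, and the assumption that features are already standardized are all hypothetical choices made for the example.

```python
import math

# Hypothetical feature set; the real features and their weights are not specified here.
FEATURES = ("shot_creating_actions", "shots_on_target",
            "offensive_possessions", "passes")


def make_features(sca, sot, poss, passes):
    """Build a feature dict (values assumed to be standardized beforehand)."""
    return dict(zip(FEATURES, (sca, sot, poss, passes)))


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature dicts."""
    return math.sqrt(sum(weights[f] * (a[f] - b[f]) ** 2 for f in FEATURES))


def nearest_neighbors(target, history, weights, k=3):
    """Return the k past matches closest to the target, nearest first."""
    ranked = sorted(
        history,
        key=lambda m: weighted_distance(target, m["features"], weights),
    )
    return ranked[:k]


def outcome_shares(neighbors):
    """Share of each outcome among the selected neighbors."""
    counts = {"team1": 0, "draw": 0, "team2": 0}
    for match in neighbors:
        counts[match["outcome"]] += 1
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


# Illustrative data only: four made-up past matches from the same league.
history = [
    {"id": "A", "features": make_features(1.0, 1.0, 0.5, 0.0), "outcome": "team1"},
    {"id": "B", "features": make_features(0.9, 0.8, 0.4, 0.1), "outcome": "draw"},
    {"id": "C", "features": make_features(-1.0, -1.2, -0.5, 0.0), "outcome": "team2"},
    {"id": "D", "features": make_features(1.1, 1.1, 0.6, -0.1), "outcome": "team1"},
]
weights = {"shot_creating_actions": 0.4, "shots_on_target": 0.3,
           "offensive_possessions": 0.2, "passes": 0.1}
target = make_features(1.0, 1.0, 0.5, 0.0)

neighbors = nearest_neighbors(target, history, weights, k=3)
shares = outcome_shares(neighbors)
```

Changing the `weights` dict over time is one simple way to picture how a feature can become more or less influential in the neighbor search.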
While xG helps calibrate the features, the neighbor-based approach reflects the nuances of each match more faithfully. This lets the model represent draws, home and away wins, and match context more realistically.
Additionally, the weights of each possible match outcome — Team 1 win, draw, or Team 2 win — are calibrated so that their representation reflects the actual observed distribution in the competition. This prevents the overrepresentation of the most common outcomes, such as home wins, when using many neighbors, while still respecting the real-world proportions of wins, draws, and losses.
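One possible reading of this calibration is a per-outcome reweighting followed by renormalization. The sketch below assumes the weights come from a single ratio between the observed outcome shares in the competition and the average raw neighbor shares. The function names, the one-step ratio, and all numbers are hypothetical, and the actual calibration procedure may differ.

```python
OUTCOMES = ("team1", "draw", "team2")


def outcome_weights(mean_raw_shares, observed_shares):
    """Per-outcome weight: observed share divided by average raw neighbor share.

    Assumption: a single-step ratio; the real calibration is not specified.
    """
    return {o: observed_shares[o] / mean_raw_shares[o] for o in OUTCOMES}


def reweight(shares, weights):
    """Scale each outcome's raw share by its weight, then renormalize to sum to 1."""
    scaled = {o: shares[o] * weights[o] for o in OUTCOMES}
    total = sum(scaled.values())
    return {o: value / total for o, value in scaled.items()}


# Illustrative numbers only.
mean_raw = {"team1": 0.50, "draw": 0.20, "team2": 0.30}   # average of raw neighbor shares
observed = {"team1": 0.45, "draw": 0.25, "team2": 0.30}   # actual shares in the competition

calibration = outcome_weights(mean_raw, observed)
calibrated = reweight({"team1": 0.6, "draw": 0.2, "team2": 0.2}, calibration)
```

In this example draws are underrepresented among raw neighbors, so the draw share of a single match is nudged upward while the probabilities still sum to 1.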
Interpretation
In practice, the model aims to quantify the actual offensive dominance. It answers the question: given my offensive production and that of my opponent, was my team's performance sufficient? And based on that production, what was the result that would have been most “deserved”?
The model relies on full-match statistics and can evaluate whether, given the number of offensive opportunities, penalty area entries, shots on target, and similar metrics, the final score aligns with expectations.
However, it does not capture minute-by-minute evolution: it cannot know if a goal came early or late, or how match momentum shifted.
For instance, a team may score early and then defend while the opponent dominates possession and passing but fails to create clear shots. In such cases, the model may rate the defending team as stronger offensively — as in the Arsenal vs Liverpool match on January 8, 2026, where Liverpool's second half was dominant in play but produced no shots on target.
This could be seen as a limitation, but it also reflects actual offensive performance, reinforcing the relevance of the model. While not perfect, the approach remains fully interpretable and grounded in local match context.
Analysis shows that roughly 30% of matches end with a result that goes against what the statistics would suggest.
Transparency
number twenty is not a prediction engine. It illustrates how outcomes can vary widely even between games with nearly identical stats. Football is inherently chaotic, and randomness is part of its beauty.
Each match is presented through a match card. Cards display fair probabilities — either predicted before the match (for upcoming games) or observed after the match (for past games). Two badges are shown:
- Neighbor Score (0-10): average similarity to past matches. A high score means the match is very similar to previous ones, making comparisons more reliable, while a low score indicates a unique or rare match in the competition.
- Fairness Score (0-10): measures whether the probabilities predicted before the match align with the probabilities observed afterward. High values indicate that the prediction was consistent with how the match played out, while low values highlight deviations. It does not indicate whether the actual result was fair relative to the calculated probabilities.
Taken together, these scores give an at-a-glance sense of how reliable the comparison is and whether the match unfolded in line with pre-match expectations or stood out as an outlier.
It would be possible to increase the predictive power of the model with more complex approaches, but that is not the purpose. The goal is not to maximize accuracy, but to find truly similar matches and make comparisons that remain interpretable. The model is fully explainable — no black-box neural networks, every calculation can be traced — and it remains data-driven while embracing the unpredictability that makes football beautiful.
Fair Elo
Unlike classical rankings that award all points to the winner, the Fair Elo system distributes points according to the fair probabilities of the match evaluated after the game.
This means that even if a team wins, it may gain fewer points than its opponent if the statistical probabilities suggest it was dominated. Conversely, a team that loses can still earn points if the fair probabilities indicate it was the stronger side. The system thus reflects the merit of each team rather than just the raw result.
Because points follow the fair probabilities, no team ever receives all of the points or none of them, even in extreme results. This naturally compresses the Elo range. The Glicko-2-inspired model update also accounts for time decay and momentum, so that rating updates respond to uncertainty while remaining stable.
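To make the idea concrete, here is a minimal sketch that uses the standard Elo expected-score formula and replaces the usual 1 / 0.5 / 0 match score with a "fair score" built from the post-match probabilities. The mapping P(win) + 0.5 × P(draw), the K-factor of 20, and the function names are assumptions for illustration. The Glicko-2-inspired parts (uncertainty, time decay, momentum) are deliberately left out.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def fair_score(p_win, p_draw):
    """Assumed mapping from fair probabilities to a score in [0, 1]."""
    return p_win + 0.5 * p_draw


def fair_elo_update(rating_a, rating_b, p_a_win, p_draw, k=20.0):
    """Update both ratings from fair probabilities instead of the raw result.

    k is a hypothetical K-factor; the real system's update is more elaborate.
    """
    p_b_win = 1.0 - p_a_win - p_draw
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (fair_score(p_a_win, p_draw) - exp_a)
    new_b = rating_b + k * (fair_score(p_b_win, p_draw) - exp_b)
    return new_a, new_b


# Team A won on the scoreboard, but the fair probabilities favored team B.
new_a, new_b = fair_elo_update(1500.0, 1500.0, p_a_win=0.25, p_draw=0.25)
```

With equal starting ratings, team A loses rating points here despite winning the match, because its fair score (0.375) is below its expected score (0.5).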
Matches between teams from different competitions (e.g., UEFA matches) help calibrate inter-league transfers (inspired by Opta). Only the top-8 European leagues (Top 5 + Primeira Liga, Pro League, Eredivisie) are included in the Fair Elo ranking.
Power Ranking: A Power Ranking table combines Fair Elo and current form to provide a more dynamic view, capturing team strength in real time. This compensates for the fact that Elo ratings update more slowly and can lag behind actual performance trends.
Competitions & Rankings
On each competition page, team rankings are presented in three modes:
- Actual: based on real match results
- Expected: points calculated from fair probabilities (3 × P(win) + 1 × P(draw)) for each team
- Projected: end-of-season projection combining current standings and predicted upcoming matches
For the Expected and Projected rankings, there are two display modes: first, a continuous mode (often seen with xG) reflecting proportional points exactly, and second, a discrete majority result mode where all points are given according to the most probable outcome. The continuous mode gives a precise reflection of probabilities, while the majority mode shows a simpler ordinal ranking with more extreme values.
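The two display modes can be illustrated with a short sketch. The continuous mode applies the formula 3 × P(win) + 1 × P(draw) directly. The majority mode awards the full points of the most probable outcome. The function names and the tie-breaking order (win, then draw, then loss) are assumptions, and the probabilities are made up.

```python
def continuous_points(p_win, p_draw):
    """Expected points: 3 x P(win) + 1 x P(draw)."""
    return 3.0 * p_win + 1.0 * p_draw


def majority_points(p_win, p_draw, p_loss):
    """Points of the single most probable outcome.

    Assumption: ties are broken in the order win, draw, loss.
    """
    probabilities = {"win": p_win, "draw": p_draw, "loss": p_loss}
    points = {"win": 3, "draw": 1, "loss": 0}
    most_likely = max(probabilities, key=probabilities.get)
    return points[most_likely]


# Illustrative fair probabilities (win, draw, loss) for one team over three matches.
matches = [(0.45, 0.30, 0.25), (0.40, 0.35, 0.25), (0.30, 0.30, 0.40)]

continuous_total = sum(continuous_points(w, d) for w, d, _ in matches)
majority_total = sum(majority_points(w, d, l) for w, d, l in matches)
```

For these three matches the continuous total is about 4.4 points while the majority total is 6, which shows how the majority mode produces more extreme values.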
Forever our number 20