I looked at the source data from which the ratings are derived (6,972 international matches from October 12, 2001 to October 11, 2009) and compared the predicted results in those games to the actual results.
Now, to fully test the system, you’d have to test it against a holdout set of matches not included in the source data (for obvious reasons). But I’m not trying to do that here. What I’m really looking for is a built-in bias: something that has nothing to do with how well the system will predict what will happen, but rather how well it matches up with what has already happened.
The following table should make this clearer:
| Favorite Pred Win% Range | Total Adjusted Games | Pred Win% | Pred GF | Pred GA | Actual Win% | Actual GF | Actual GA | Poisson Pred |
|---|---|---|---|---|---|---|---|---|
| 50-59% | 667 | 55.0% | 1.26 | 1.05 | 56.2% | 1.31 | 1.06 | 55.9% |
| 60-69% | 558 | 65.1% | 1.48 | 0.86 | 65.5% | 1.52 | 0.87 | 65.5% |
| 70-79% | 491 | 74.9% | 1.84 | 0.72 | 79.4% | 1.85 | 0.68 | 76.2% |
| 80-89% | 393 | 84.7% | 2.32 | 0.56 | 87.8% | 2.32 | 0.52 | 85.7% |
| 90-94% | 148 | 92.3% | 3.00 | 0.44 | 96.0% | 3.02 | 0.37 | 93.3% |
| 95%+ | 160 | 97.9% | 4.88 | 0.31 | 98.2% | 4.74 | 0.28 | 98.7% |
| 90-100% | 308 | 95.2% | 3.98 | 0.37 | 97.2% | 3.91 | 0.32 | 97.1% |
| All Favorites | 2,417 | 71.3% | 1.95 | 0.77 | 73.4% | 1.96 | 0.76 | 76.1% |
Win% = (wins + (draws/2)) / games; Total Adjusted Games = games weighted for recency and match importance; GF and GA = goals for and against the favorite.
What you see here is a test of whether the predicted outcomes are in any way biased against favored teams, such that the sims I’ve been doing might give a team like San Marino more of a chance against Slovenia than it actually has. And I’m not entirely sure what to make of this, other than it looks like the Poisson distribution might be a little off. It’s hard to say how far off the predicted goals are, but it seems clear to me that the differences there are smaller than the differences between Poisson’s projected win% and the actual win%.
I’m not entirely convinced of the final column’s mathematical applicability (it takes the average actual goals scored and allowed and runs them through Poisson to get an expected win%). I put it there to get a rough idea of how much of the win% gap was driven by goal differences, and how much by the Poisson assumption itself.
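To make that column concrete, here’s a minimal sketch of the calculation (pure-stdlib Python; `poisson_win_pct` is my own illustrative name, not anything from the actual ratings code): model the two sides’ goals as independent Poisson variables, sum the probability mass where the favorite outscores the underdog, and count draws as half a win. Feeding in the actual goal averages from the “All Favorites” row (GF 1.96, GA 0.76) should land close to the table’s 76.1%.

```python
from math import exp, factorial

def pois(k, lam):
    """Poisson probability of scoring exactly k goals given a mean of lam."""
    return exp(-lam) * lam ** k / factorial(k)

def poisson_win_pct(gf, ga, max_goals=15):
    """Win% for the favorite, counting draws as half a win,
    assuming the two scorelines are independent Poisson variables."""
    p_win = p_draw = 0.0
    for i in range(max_goals + 1):        # favorite's goals
        for j in range(max_goals + 1):    # underdog's goals
            p = pois(i, gf) * pois(j, ga)
            if i > j:
                p_win += p
            elif i == j:
                p_draw += p
    return p_win + p_draw / 2

# Actual goal averages for the "All Favorites" row; comes out near 76.1%
print(round(poisson_win_pct(1.96, 0.76) * 100, 1))
```

Capping the score at 15 goals per side is harmless here; the probability mass beyond that is negligible for means this small.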
For now all I can do is chew on this and experiment to see if I can dial in a few improvements.
One final note: the prediction for Slovenia at San Marino yields this:

Slovenia expected goals = 2.65
San Marino expected goals = 0.17

Slovenia’s predicted chance of winning = 89.7%; predicted chance of a draw = 8.9%
So to compare with the table above, you’d put Slovenia’s “win%” at 94.2% (89.7% plus half of the 8.9% draw chance). If the numbers above reflect a real effect and not just randomness, Slovenia’s chances of winning would be maybe 1 to 3 percentage points better than the current model predicts. Obviously that lowers the Czech Republic’s chances of making the playoff by a similar (though slightly smaller) amount.