A Quick Update on the “Big Favorite” Problem (e.g. Portugal vs. Malta, Slovenia vs. San Marino, etc.)

I looked at the source data from which the ratings are derived (6,972 international matches from October 12, 2001 to October 11, 2009) and compared the predicted results in those games to the actual results.

Now, to fully test the system, you’d have to test it against data that was not used to build the ratings (for obvious reasons). But I’m not trying to do that here. What I’m really looking for is a built-in bias, which has nothing to do with how well the system will predict what will happen, but rather with how well it matches up with what has already happened.

The following table might make it clearer:

Comparison of predicted results versus actual results in the system’s source data
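| Favorite Pred Win% Range | Total Adjusted Games | Predicted Win% | Predicted GF | Predicted GA | Actual Win% | Actual GF | Actual GA | Poisson Pred |
|---|---|---|---|---|---|---|---|---|
| 50-59% | 667 | 55.0% | 1.26 | 1.05 | 56.2% | 1.31 | 1.06 | 55.9% |
| 60-69% | 558 | 65.1% | 1.48 | 0.86 | 65.5% | 1.52 | 0.87 | 65.5% |
| 70-79% | 491 | 74.9% | 1.84 | 0.72 | 79.4% | 1.85 | 0.68 | 76.2% |
| 80-89% | 393 | 84.7% | 2.32 | 0.56 | 87.8% | 2.32 | 0.52 | 85.7% |
| 90-94% | 148 | 92.3% | 3.00 | 0.44 | 96.0% | 3.02 | 0.37 | 93.3% |
| 95%+ | 160 | 97.9% | 4.88 | 0.31 | 98.2% | 4.74 | 0.28 | 98.7% |
| 90-100% | 308 | 95.2% | 3.98 | 0.37 | 97.2% | 3.91 | 0.32 | 97.1% |
| All Favorites | 2,417 | 71.3% | 1.95 | 0.77 | 73.4% | 1.96 | 0.76 | 76.1% |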

Win% = (wins + (draws/2)) / games; Total Adjusted Games = games adjusted for recency and match importance; GF and GA = goals for and against the favorite.
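As a rough illustration of that Win% definition applied to weighted games, here is a minimal Python sketch. The weighting scheme is an assumption: the actual recency and importance adjustments aren't spelled out here, so `weighted_win_pct` and the sample weights are hypothetical.

```python
def weighted_win_pct(games):
    """Win% = (wins + draws/2) / games, with each game counted by its weight.

    `games` is a list of (result, weight) pairs, where result is "W", "D" or "L"
    from the favorite's point of view. The weights stand in for the recency and
    match-importance adjustment; how they are really computed is an assumption.
    """
    credit_for = {"W": 1.0, "D": 0.5, "L": 0.0}
    total_weight = sum(weight for _, weight in games)
    if total_weight == 0:
        raise ValueError("no games to average")
    credit = sum(credit_for[result] * weight for result, weight in games)
    return credit / total_weight


# Hypothetical sample: two wins, a draw and a loss with made-up weights.
sample = [("W", 1.0), ("W", 0.5), ("D", 1.0), ("L", 0.5)]
print(f"{weighted_win_pct(sample):.3f}")
```

With every weight set to 1 this reduces to the plain (wins + draws/2) / games formula.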

What you see here is a test of whether the predicted outcomes are biased against favored teams, such that the sims I’ve been doing might give a team like San Marino more of a chance against Slovenia than they actually have. I’m not entirely sure what to make of the results, other than that the Poisson distribution looks a little off. It’s hard to say how far off the predicted goals are, but it seems clear to me that the differences there are smaller than the differences between Poisson’s projected win% and the actual win%.

I’m not entirely convinced of the final column’s mathematical applicability. That column takes the average actual goals scored and allowed and runs them through Poisson to get an expected win%. I put it there to get a rough idea of how much of the win% difference comes from the goal differences and how much comes from Poisson itself.
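For anyone who wants to play with the final column’s idea, here is a minimal Python sketch of turning a pair of average goal rates into win, draw and loss chances. It assumes each side’s goals follow an independent Poisson distribution, which may not be exactly how the ratings system does it; `outcome_probs` and `max_goals` are names made up for this example.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def outcome_probs(gf, ga, max_goals=25):
    """Return (win, draw, loss) for the favorite.

    Assumes the two teams' goal counts are independent Poisson variables with
    means gf and ga. Scorelines above max_goals are ignored, which is a
    negligible truncation for soccer-sized goal rates.
    """
    win = draw = loss = 0.0
    for i in range(max_goals + 1):
        p_favorite = poisson_pmf(i, gf)
        for j in range(max_goals + 1):
            p = p_favorite * poisson_pmf(j, ga)
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Actual average GF and GA for the 50-59% favorites in the table.
win, draw, loss = outcome_probs(1.31, 1.06)
print(f"win% (wins plus half of draws) = {win + draw / 2:.3f}")
```

One reason to be wary of the column: running the average goals through Poisson is not the same as averaging the Poisson win% of each individual game, since the relationship between goal rates and win% is not linear.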

For now all I can do is chew on this and experiment to see if I can dial in a few improvements.

One final note: the prediction for Slovenia at San Marino yields these expected goals:

Slovenia = 2.65
San Marino = 0.17

Slovenia’s predicted chance of winning = 89.7%; Slovenia’s predicted chance of a draw = 8.9%

So to compare with the table above, you’d place Slovenia’s “win%” (wins plus half of draws) at 94.2%. If the numbers above reflect a real effect and not just randomness, Slovenia’s chances of winning would be maybe 1 to 3 percentage points better than the current model predicts. Obviously that lowers the Czech Republic’s chances of making the playoff by a similar (though slightly smaller) amount.
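Spelled out with the rounded figures quoted above, and counting a draw as half a win as in the table’s Win% definition:

```latex
\text{Win\%} = P(\text{win}) + \tfrac{1}{2}\,P(\text{draw})
             = 0.897 + \tfrac{0.089}{2}
             = 0.9415 \approx 94.2\%
```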
