So how do I predict the results of a match with these ratings?
Take Russia vs. Germany. This is the huge one, with two world powers vying for automatic qualification to South Africa next year. How would I sim this game?
Well let’s look up what ratings we had at the last pull (they will have changed very little since).
Column 1 is the Offensive rating for every team. This is simply how many goals they can be expected to score, not necessarily an evaluation of their attacking players. Sometimes good offense leads to goals, sometimes good defense does. Usually it’s some combination of the two (though admittedly more goals suggests better attackers without confirming it). Germany’s 1.78 is good for second in the rankings, a good bit behind Brazil. Russia’s 1.34 is pretty good as well, but off the pace some. On defense Germany is 192.74, while Russia is quite close at 180.64.
A few notes: first, notice that Russia does significantly better in the other rating method (column 4), which is a rating based on wins, draws and losses with margin of victory adjustments. This suggests that Russia’s results (at least in more recent or more critical matches) have been better than you’d normally expect given their goals scored and allowed. I’ll let you decide how you’d like to interpret that. The sims treat it as essentially luck, which isn’t necessarily the whole story (but it’s probably at least a partial explanation).
Second, the reason the Offense and Defense ratings are scaled differently (i.e., the Defense rating can be as much as 200 times higher than the Offense rating for a team) is to make the upcoming calculations as straightforward and easy to comprehend as possible. I could easily convert them so that they are on the same scale, but then to use them I’d just have to convert them back, so what’s the point?
As mentioned in the previous post you can now predict the final score of a match between the two teams with the following formula:
Russia Goals = (1.34/(1.34+192.74))*180 = 1.24 goals
Germany Goals = (1.78/(1.78+180.64))*180 = 1.76 goals
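If you’d rather see that as code, here is a minimal Python sketch of the same arithmetic. The function name `expected_goals` and the `scale` parameter are just labels I made up for illustration; the only things taken from the ratings are the numbers and the 180.

```python
def expected_goals(offense, opp_defense, scale=180):
    """Expected goals for a team: its Offense rating against the opponent's Defense rating."""
    return offense / (offense + opp_defense) * scale


# Ratings quoted in the post: Russia offense 1.34 vs Germany defense 192.74,
# Germany offense 1.78 vs Russia defense 180.64.
russia = expected_goals(1.34, 192.74)
germany = expected_goals(1.78, 180.64)

print(f"Russia: {russia:.2f} goals, Germany: {germany:.2f} goals")
# → Russia: 1.24 goals, Germany: 1.76 goals
```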
So Germany is (as you’d expect given their ratings) the favorite. But, of course, Russia is playing at home in Moscow, so we need to adjust for that. Now I’ve mentioned before that I’m not entirely thrilled with the way I handle Home Field Advantage (from now on “HFA”). The reason is that I’ve settled on a static advantage for all teams. Now we KNOW that isn’t the case. Some teams simply have larger HFAs than others. One bad trip to Bolivia, Ecuador or the deadly Azteca in Mexico will convince you of that in a hurry. The problem is that I’ve yet to work out a satisfactory way to determine the individual HFAs for every team. I plan on including and then testing a new system to do just that very soon. But for now I’m going to stick with what I have, which is the average HFA for all home teams, which I derived from the system a couple of years ago. Expect some further discussion on solving this issue in the coming months.
Home goals = Home expected goals * 1.186
Away goals = Away expected goals * 0.762
It’s a substantial benefit, which just indicates how huge an advantage some of the aforementioned fortresses might really be. Let’s throw the standard HFAs into the Russia/Germany mix.
Russia goals = 1.24 goals * 1.186 = 1.474 goals
Germany goals = 1.76 goals * 0.762 = 1.339 goals
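The HFA step is just two multiplications, so a sketch in Python is short. The constant names and the `neutral` flag are my own assumptions for illustration (the flag simply skips the adjustment for a neutral-site game); the two multipliers are the ones given above.

```python
HOME_FACTOR = 1.186  # multiplier for the home team's expected goals
AWAY_FACTOR = 0.762  # multiplier for the away team's expected goals


def apply_hfa(home_xg, away_xg, neutral=False):
    """Adjust both teams' expected goals for home field advantage."""
    if neutral:
        # Neutral site: nobody gets a boost or a penalty.
        return home_xg, away_xg
    return home_xg * HOME_FACTOR, away_xg * AWAY_FACTOR


# Unrounded expected goals from the rating formula (about 1.24 and 1.76).
russia_xg = 1.34 / (1.34 + 192.74) * 180
germany_xg = 1.78 / (1.78 + 180.64) * 180

russia_home, germany_away = apply_hfa(russia_xg, germany_xg)
print(f"Russia: {russia_home:.2f}, Germany: {germany_away:.2f}")
```

Run as written, this should land within rounding of the 1.474 and 1.339 figures above; the last decimal depends on how many digits of the ratings you carry through.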
And now, just like that, the Russians are the favorites.
So we’ve got a predicted outcome, but how do we adjust for the fact that we know Russia won’t score 1.474 goals and Germany won’t score 1.339 goals? How do we turn those numbers into actual scorelines? Some of you may have guessed the answer if you have some familiarity with it, but for those that don’t, I’ll use the old fallback: the Poisson distribution.
Now I could have stuck with that 180 business I use in the rating system here, and then predicted the chances of scoring every half minute, but because of the nature of Poisson, the difference between the two methods will be very small. (Poisson essentially breaks up an event with an unknown and potentially unlimited number of trials into infinitely small pieces, and then uses limit calculus to derive the formula.) Poisson is usually used when you have an average number of successful events occurring within a particular time frame, but don’t have any clue as to how many specific trials have occurred (the way you would with a Binomial distribution). If that sounds like way too much math lingo, think of instances where you know how often something happened in a period of time, but don’t know how many chances it had to happen during that time.
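To see how small that difference is, here is a rough Python comparison. It assumes the “every half minute” version means a Binomial with 180 trials (one per half minute of a 90-minute match), each with a scoring chance of the expected goals divided by 180. That is my reading of the setup, and the function names are made up for the example.

```python
import math


def binomial_pmf(k, n, p):
    """Chance of exactly k successes in n independent trials with success chance p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, mean):
    """Chance of exactly k events when the average number of events is `mean`."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


mean = 1.474   # Russia's HFA-adjusted expected goals
slices = 180   # assumed: one trial per half minute

for k in range(5):
    b = binomial_pmf(k, slices, mean / slices)
    p = poisson_pmf(k, mean)
    print(f"{k} goals: binomial {b:.4f}, poisson {p:.4f}")
```

Under those assumptions the two columns differ by only a fraction of a percentage point for each goal count.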
Because that description nicely fits goals scored in a football match, many have argued for and used Poisson to help model the outcome of football matches. Most people who have studied the issue conclude that Poisson does a good enough job in most instances, and that while there are concerns (particularly with regard to teams who try to hold or defend leads), those concerns often cancel one another out, making it a comfortably respectable way to model games.
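The Poisson formula is:
P(x) = (λ^x * e^(-λ)) / x!
where λ is the average number of goals the team is expected to score and x is the specific number of goals you want the probability for.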
And if you don’t really care about the math jargon, all that means is that if you know the average goals scored per game for a team, you can predict the chances of that team scoring a specific number of goals (‘x’) using that formula. I’ll do the math for you for Russia and Germany.
| Team | 0 goals | 1 goal | 2 goals | 3 goals | 4 goals | 5 goals | 6 goals | 7 goals |
|---|---|---|---|---|---|---|---|---|
| Russia | 22.9% | 33.8% | 24.9% | 12.2% | 4.5% | 1.3% | 0.3% | 0.1% |
| Germany | 26.2% | 35.1% | 23.5% | 10.5% | 3.5% | 0.9% | 0.2% | 0.0% |
Note: there are actually microscopic chances on down the line, and the sim program stops bothering at 40 goals (the programs used to generate results aren’t even accurate at that kind of decimal place anyway).
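The percentages for Russia and Germany can be reproduced with a few lines of Python. This is only a sketch using the standard Poisson formula and the two HFA-adjusted averages from above; the function name `poisson_pmf` is my own.

```python
import math


def poisson_pmf(x, mean):
    """Chance of scoring exactly x goals when the expected goals are `mean`."""
    return mean ** x * math.exp(-mean) / math.factorial(x)


# HFA-adjusted expected goals from the post.
teams = [("Russia", 1.474), ("Germany", 1.339)]

for team, mean in teams:
    # One percentage per goal count, 0 through 7.
    row = "  ".join(f"{poisson_pmf(x, mean):6.1%}" for x in range(8))
    print(f"{team:8s}{row}")
```

Extending `range(8)` to `range(41)` would carry the table all the way out to the 40-goal cutoff, though everything past 7 or 8 goals rounds to zero.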
And then you can use a random number generator to generate scorelines over and over again. Say 10,000 times for each remaining qualifier. 🙂
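Here is one way that could look in Python. To be clear, this is a sketch and not the actual sim program: it turns a uniform random number into a goal count by walking up the cumulative Poisson probabilities, and it assumes the two teams’ goal counts are independent. The function names, the seed, and the 40-goal cap as a parameter are all choices made for the example.

```python
import math
import random


def sample_goals(mean, rng, max_goals=40):
    """Draw one goal count by walking the cumulative Poisson probabilities."""
    u = rng.random()
    cumulative = 0.0
    for goals in range(max_goals + 1):
        cumulative += mean ** goals * math.exp(-mean) / math.factorial(goals)
        if u < cumulative:
            return goals
    return max_goals  # only reachable through floating-point rounding


def simulate(home_mean, away_mean, n=10_000, seed=0):
    """Simulate n matches and return the share of home wins, draws and away wins."""
    rng = random.Random(seed)
    tally = {"home": 0, "draw": 0, "away": 0}
    for _ in range(n):
        home = sample_goals(home_mean, rng)
        away = sample_goals(away_mean, rng)
        if home > away:
            tally["home"] += 1
        elif home < away:
            tally["away"] += 1
        else:
            tally["draw"] += 1
    return {outcome: count / n for outcome, count in tally.items()}


result = simulate(1.474, 1.339)  # Russia at home against Germany
print(result)
```

With 10,000 runs the shares will wobble by around half a percentage point from one seed to the next, so don’t read too much into the last digit.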
After doing some multiplying on the big sheet, that worked out to:
Russia wins: 40.5%
Draw: 25.1%
Germany wins: 34.4%
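For anyone who wants to check those three numbers without a spreadsheet, the multiplying can be sketched in Python: multiply every Russia goal probability by every Germany goal probability and add each product to the matching outcome. This assumes the two teams’ goal counts are independent, and the function names are my own.

```python
import math


def poisson_pmf(x, mean):
    """Chance of scoring exactly x goals when the expected goals are `mean`."""
    return mean ** x * math.exp(-mean) / math.factorial(x)


def outcome_probs(home_mean, away_mean, max_goals=40):
    """Return (home win, draw, away win) chances from the full scoreline grid."""
    home = [poisson_pmf(x, home_mean) for x in range(max_goals + 1)]
    away = [poisson_pmf(x, away_mean) for x in range(max_goals + 1)]
    win = draw = loss = 0.0
    for h, p_home in enumerate(home):
        for a, p_away in enumerate(away):
            p = p_home * p_away  # chance of this exact scoreline
            if h > a:
                win += p
            elif h == a:
                draw += p
            else:
                loss += p
    return win, draw, loss


win, draw, loss = outcome_probs(1.474, 1.339)
print(f"Russia wins: {win:.1%}, Draw: {draw:.1%}, Germany wins: {loss:.1%}")
# → Russia wins: 40.5%, Draw: 25.1%, Germany wins: 34.4%
```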
Any result other than the first one is essentially a win for Germany, which explains why Germany remained the favorite in the group even though Russia is favored in a game likely to be decisive in determining the group winner.
Now of course if that was all there was to say about the matter, this would be a very boring sport indeed. This sort of approach is ultimately just a baseline to start a conversation from, and a means of being able to produce a huge number of simulated games in a short period of time. The number of additional factors that can go into predicting a game like this is virtually limitless, and that’s part of the joy of this or any sport. Using the math as a clarifier and using whatever you know (or, in my case, don’t know) about the sport to go from there is not only allowed but encouraged. My goal is to try and make this process as accurate as I’m able, but I know perfection ain’t gonna happen.
A final note: here is a link to a small Excel spreadsheet which performs all of the above calculations. If you want to try it out with different teams using the ratings, all you have to know is that the home team goes on top, and if you want to predict a neutral site game, change the ‘1’ to a ‘0’. To generate a new random number (and therefore a new score), go to an empty cell, type in a space and hit enter. A new random number will generate.