The National Team Rating System Explanation

Just what the hell am I doing back in the vorosmccracken.com command bunker when I’m running all of these ratings and simulations?

Now I’m not going to bother pretending I know anything about the effects subtle tactical and formation changes have on the outcome of a match. That’s not what I do. I’m a numbers guy and that’s how the analysis is going to be done.

If you really don’t care what precisely is going on, but have looked over the ratings and decided that they seem reasonable enough to you, you might want to stop here. This post will be long and a bit technical, so if this kind of explanation doesn’t interest you then you don’t have to continue.

You could get very in depth with a whole series of numbers and analyze all of them to rate teams. I have no problem with this approach and encourage people to have a try if they’d like. The issue is that I’m running simulations on dozens of qualifiers throughout every region of the world and I have neither the time nor the resources to do that kind of analysis on every team in the world. I also don’t have the time to verify and test such analyses to make sure the information they are giving me is reliable.

So I rely on my National Team Ratings to do the heavy lifting for all of these games. So how do those work? Unlike the FIFA Ranking or even Elo, my system uses an iterative method to derive the ratings. At its most basic, the system asks “what ratings would produce a predicted total goals scored and allowed for every team that matches their actual goals scored and allowed in their matches?” To put it simply, if Team A plays Team B 10 times, and Team A scores 15 goals and Team B scores 8 in those 10 matches, what ratings for Team A and Team B would produce those exact goal totals over 10 matches (while not necessarily matching the exact result of each individual match)?

Add in 206 more national teams, uneven schedules, weighting of games by recentness and match importance, home field advantage and a few other minor details and that question becomes rather difficult to solve directly. The iterative method allows me to take guesses at it, and then refine my guesses until the numbers eventually converge on the answer. I’m not saying my system is the best, but I will say that whatever system out there is the best, I’m 95% sure it’s going to be an iterative system.

The System Explained

But why goals? Why not wins and losses? That’s a fair question and the answer is that I think it gives a better estimation of actual team quality than one based strictly on wins, draws and losses. This has been shown pretty conclusively in baseball and other sports and my experience looking at the issue in football/soccer is that it applies here (though maybe not as absolutely) as well. I do include in my ratings an additional rating (calculated by a slightly different system) which takes into account wins, draws and losses (as well as the margin of victory) and you see that rating on the right hand side of the columns. Testing has convinced me that the goals method is slightly more accurate in future predictions when it comes to international football.

The reason why the goals method works best in international football is something of a happy accident. In league play for clubs, all teams play the same number of games. In international football, the better the team, the more games that team is likely to play. This fact essentially solves the “running up the score” problem, because the outcome of such a game will disproportionately affect the ratings of the weaker team: this one game is a much larger chunk of their total games played than it is for the stronger team. So Australia gained little by the message they sent to the OFC with their qualifier against American Samoa. All that happened was that American Samoa’s already bad defensive rating became that much worse, while Australia’s offensive rating barely nudged up. The other reason why it works is that Australia could only try to run up one of their two ratings. With the other half, all they could do was not let American Samoa score, an outcome as unimpressive in the ratings as it sounds on paper. However, this does mean that the ratings start to get less accurate as you get past the top 100 and into the bottom half of teams. We know they’re bad, but because of a lack of games, and a lack of games against good opposition, precisely how bad is up for debate.

One of the adjustments I make in the system is that one game does not necessarily equal “one game” in the system. All games are counted as fractions of one game, with “games=1.0” being the maximum any single match is worth. Such a match would be a World Cup Finals match that ended today. That sounds a little complicated, but what you really need to know is that if a team scores 35 goals and allows 31 in 30 games, the system might only see it as 7.6 goals scored and 7.1 allowed in 7.4 games depending on what kind of matches were played and when.
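To make the fractional-game idea concrete, here is a minimal sketch of how weighted totals could be accumulated. The function name `weighted_totals` and the three-match record are made up for illustration; the post does not publish its actual weights or code.

```python
def weighted_totals(matches):
    """Sum goals and games after scaling each match by its weight.

    matches: list of (goals_for, goals_against, weight) tuples,
    where weight is a fraction of one game (at most 1.0).
    """
    scored = sum(gf * w for gf, _, w in matches)
    allowed = sum(ga * w for _, ga, w in matches)
    games = sum(w for _, _, w in matches)
    return scored, allowed, games


# Hypothetical record: (goals for, goals against, weight).
record = [(2, 1, 1.0), (0, 0, 0.5), (3, 2, 0.25)]
scored, allowed, games = weighted_totals(record)
print(scored, allowed, games)
```

Three real matches with 5 goals scored and 3 allowed shrink to fewer "system" games and goals, the same effect as the 30-game example above on a smaller scale.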

So how do I assign those weights? Well, that’s really hard to do actually, and I’m always looking for better ways to do it. As it stands right now, a friendly is worth about half as much as a World Cup finals match. Having studied the issue, I’ve found that friendlies do tell us something about the quality of a National Team, and for statistical purposes they provide a much greater sample size of games, which helps the overall accuracy of the system. I’m open to the idea of lowering their value, but so far trial and error has led me to settle on that kind of weighting. All types of matches that are neither friendlies nor World Cup finals matches lie somewhere in between. Again, it’s just trial and error (and a little common sense) to come up with the right figures for those matches. I’m working on better ways to accomplish this task. My current project is to try to devise an “outlier” system where the weight of the match is affected by how unusual the result is. This may sound unfair to underdogs, but if China beats Brazil 3-0 in a non-World Cup match, that itself suggests Brazil wasn’t exactly treating the match very seriously. This type of adjustment has not been implemented (or tested) yet.

Another adjustment is for the recentness of games. This one is a little easier to get at empirically, and I’m a lot more confident in these weights. The matches used in the system go back eight years. Eight years! Yes, eight years. Rest assured, a match that took place eight years ago has almost no bearing on anything. Truthfully, the biggest reason for such a long time period is that it helps with teams that don’t play that many games. For the system not to return errors and zeroes, everybody has to be rated, and the long time period helps with that. Another thing is that if there’s an error in the public perception of team strength, I think it’s that the general public puts far more weight on the most recent results than is really warranted. Club seasons last 30 games or more, and the champion at the end of 30 games is quite often not the team on top after 5. A great example of the problem was in this year’s Confederations Cup, where an Egyptian team whose previous two games were extremely impressive came up against a higher ranked American team whose previous two games against the same teams were abysmal. The U.S. won easily (and then got very lucky with the result of the other game) and wound up making the tournament final. At the end of the day, the true talent level of a team is better identified with more games used to make that identification.
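For anyone who wants to play with the two weighting ideas together, here is a rough sketch. Only three things in it come from this post: a World Cup finals match that ended today is worth 1.0, a friendly is worth about half of that, and matches go back eight years. Everything else is an assumption for illustration: the linear decay, the 0.75 placeholder for qualifiers, multiplying the two factors together, and the names `match_weight` and `TYPE_WEIGHT`.

```python
from datetime import date

WINDOW_YEARS = 8.0  # from the post: matches go back eight years

# Only the endpoints are from the post (finals = 1.0, friendly about half).
# The qualifier value is a made-up placeholder for "somewhere in between".
TYPE_WEIGHT = {"world_cup_finals": 1.0, "qualifier": 0.75, "friendly": 0.5}


def match_weight(match_type, played, today):
    """Return the fraction of one game a match counts for (0.0 to 1.0)."""
    age_years = max(0.0, (today - played).days / 365.25)
    if age_years >= WINDOW_YEARS:
        return 0.0  # outside the eight-year window
    # Assumed shape: linear fade from 1.0 today to 0.0 at eight years.
    recency = 1.0 - age_years / WINDOW_YEARS
    return TYPE_WEIGHT[match_type] * recency


today = date(2009, 10, 1)
print(match_weight("world_cup_finals", today, today))
print(match_weight("friendly", date(2005, 10, 1), today))
```

A different decay curve (exponential, stepwise) would slot into the same function; the real weights were tuned empirically and are not reproduced here.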

System Structure

It’s been almost 7 years now since I started trying to do this (where does the time go?) and after about a year I settled on the current structure as one that could yield accurate results, fit the competition format of international soccer (actually it was college baseball, but it works for both) and was within my mental and computer capabilities. At the very base of it was KRACH, a college hockey rating system by Ken Butler. Since then it’s been modified and adjusted beyond any real recognition of those original roots (this system has two ratings instead of one, just for starters) so as to best take care of the task at hand. I suppose the most interesting thing about that history is that the Boston Red Sox indirectly helped create an International Football National Team rating system. Funny how things work out.

The ratings have this format:

In a game between Team A and Team B:

Team A Predicted Goals Scored = (Team A OFFrat/(Team A OFFrat + Team B DEFrat)) * 180
Team B Predicted Goals Scored = (Team B OFFrat/(Team B OFFrat + Team A DEFrat)) * 180

You might ask, what’s that “180” about? Well, sometimes numbers are easier to work with in a binomial distribution format. What this basically means is that all possible outcomes lie somewhere between the numbers 0 and 1. To convert goals scored in a soccer match to this format, I needed to break down a game into smaller parts so that I could keep the number “1” as the upper bound. I decided that breaking the 90 minute game into half minutes (180 of them) was the best way. This does in fact work, I have tested it, and in many ways it’s quite similar to a Poisson distribution. Because that’s the way the numbers went into making the ratings, they have to be reformatted that way once you want to use them.
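If you want to see the half-minute idea and the Poisson comparison side by side, here is a small sketch. It treats each of the 180 half-minutes as an independent trial with the same chance of a goal, which is my reading of the description above rather than a confirmed detail. The ratings 1.0 and 119.0 are invented so that the expected goals come out to 1.5, and the function names are hypothetical.

```python
from math import comb, exp, factorial

SLICES = 180  # 90 minutes split into half-minutes


def predicted_goals(off_rating, opp_def_rating):
    """The formula above: OFFrat / (OFFrat + opponent DEFrat) * 180."""
    return SLICES * off_rating / (off_rating + opp_def_rating)


def binomial_pmf(k, p, n=SLICES):
    """Chance of exactly k goals if each slice scores with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mean):
    """Chance of exactly k goals under a Poisson with the same mean."""
    return exp(-mean) * mean**k / factorial(k)


# Invented ratings, chosen only so that the expected goals equal 1.5.
mean = predicted_goals(1.0, 119.0)
p = mean / SLICES  # per-slice scoring probability
for k in range(6):
    print(k, round(binomial_pmf(k, p), 4), round(poisson_pmf(k, mean), 4))
```

The two columns should track each other closely, since a binomial with many trials and a small per-trial probability is well approximated by a Poisson.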

If that sounds like an awful lot of math gibberish, well, it is. Working through math gibberish is what I do. The point is simply that the ratings have a relationship with goals scored and allowed that can be gotten at using that formula. I then adjust those results for home field advantage. That adjustment currently is the same for all home teams, though my very next change to the system will probably be a way to allow that number to change for each team (it’s a little harder to do than you might think). So not only can I predict the outcome of a game about to be played using the ratings, but I can also predict the outcomes of games that have already been played. As stated above, the latter predictions are used to adjust the ratings so that they best fit the actual results.
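The guess-and-refine loop can be sketched in a few lines. To be clear, this is not the production system: the update rule is the standard KRACH/Bradley-Terry style rescaling applied separately to offense and defense, which is an assumption on my part here, and it ignores home field advantage entirely. The names `fit_ratings` and `predict` and the ten scorelines are made up; the scorelines are chosen so Team A totals 15 goals and Team B totals 8, as in the earlier example.

```python
from collections import defaultdict

SLICES = 180  # half-minute slices per match, as in the formula above


def predict(off, dfn, attacker, defender):
    """Predicted goals for `attacker` against `defender` in one match."""
    return SLICES * off[attacker] / (off[attacker] + dfn[defender])


def fit_ratings(matches, iterations=100):
    """matches: list of (team_a, team_b, goals_a, goals_b, weight) tuples."""
    teams = {team for match in matches for team in match[:2]}
    off = {team: 1.0 for team in teams}  # first guess: everyone is equal
    dfn = {team: 1.0 for team in teams}
    for _ in range(iterations):
        off_num, off_den = defaultdict(float), defaultdict(float)
        def_num, def_den = defaultdict(float), defaultdict(float)
        for a, b, goals_a, goals_b, w in matches:
            for att, dfd, goals in ((a, b, goals_a), (b, a, goals_b)):
                exposure = w * SLICES / (off[att] + dfn[dfd])
                off_num[att] += w * goals             # slices with a goal
                off_den[att] += exposure
                def_num[dfd] += w * (SLICES - goals)  # slices without one
                def_den[dfd] += exposure
        # Assumed update: rescale so predicted totals move toward actuals.
        off = {t: off_num[t] / off_den[t] for t in teams}
        dfn = {t: def_num[t] / def_den[t] for t in teams}
    return off, dfn


# Ten invented scorelines: Team A scores 15 in total, Team B scores 8.
scores = [(2, 1), (1, 0), (3, 1), (0, 0), (2, 2),
          (1, 1), (2, 0), (1, 1), (2, 1), (1, 1)]
matches = [("A", "B", ga, gb, 1.0) for ga, gb in scores]
off, dfn = fit_ratings(matches)
total_a = len(scores) * predict(off, dfn, "A", "B")
total_b = len(scores) * predict(off, dfn, "B", "A")
print(round(total_a, 6), round(total_b, 6))
```

With only two teams the predicted totals line up with the actual 15 and 8 almost immediately. The iteration earns its keep once there are a couple hundred teams with uneven, weighted schedules.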

These ratings are then the backbone of the simulations I run. In my next post I’ll show exactly how that is done, with the upcoming Russia/Germany match as the example.
