Bonzini
USA Player Ranking/Rating System
Rev 2a (8/18/00)
Introduction:
The skill level of
foosball players varies greatly, and for that reason a points
ranking/rating system is helpful. A points rating system is necessary in order
to have an accurate/unbiased means of classifying (ranking) players by skill
level. Having such ratings/rankings enables events to be held that are limited
to players of certain skill levels, allowing players to play in an event
against other players of similar skill levels for a more even competition. The
system also provides a tool used to distinguish the top performing players. In
addition, ratings/rankings give players a measure of their
relative strength compared to other players, and also serve as a gauge
all players can use as incentive/motivation to improve their skills
and earn a higher rating. Optional
uses of a points ranking/rating system include allowing lower skilled players
with the least chances of winning to pay cheaper entry fees (and/or to receive
point spot advantages during special “handicapped” events); and having the
higher skilled players with the best chances of winning pay the highest entry
fees (and/or to receive the advantages of being seeded on the tournament charts
of certain events). However, there are some drawbacks to having a
ranking/rating system at all (See Appendix D).
The main events at
tournaments are “open” division events which may be entered by any player
regardless of their ranking. However,
in addition to the open division events, there are also “limited” division
events held in which higher ranked players (and sometimes lower ranked players
too) can be excluded from entering. Such special events can be limited by
either ranking (e.g., “Novice Singles”) or points (e.g., “2500 Points Doubles”).
For instance, Novice Singles would be limited to players ranked Novice or below
(players ranked above Novice are not allowed to enter). Likewise, 2500 Points
Doubles would be limited to teams whose players’ combined points ratings must
be less than or equal to 2500 (e.g., Player A has 1750 points, and Player B has
725 points, so their combined rating would be 2475 points and they could enter
as a team). Normally, lower ranked players may enter events that are equal to
or higher than their ranking (e.g., a “Novice” ranked player can enter “Expert”
singles). However, sometimes an event will be restricted to only certain
rankings (e.g., “Master Only DYP”) – such events are identified by having
“only” in the event title.
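The entry rules above can be sketched as two small checks. This is only an illustration: the function names are hypothetical, and the ranking names are taken from the classification list given under Ratings Scale and Ranking Classifications.

```python
# Ranking names in ascending order (from the classification list in this document).
RANKS = ["Novice", "Novice-Elite", "Expert", "Expert-Elite", "Master", "Master-Elite"]

def can_enter_ranked_event(player_rank, event_rank, only=False):
    """Ranking-limited event: players ranked at or below the event's ranking
    may enter; an "Only" event admits exactly that ranking."""
    if only:
        return player_rank == event_rank
    return RANKS.index(player_rank) <= RANKS.index(event_rank)

def can_enter_points_doubles(rating_a, rating_b, limit):
    """Points-limited doubles: the combined ratings must not exceed the limit."""
    return rating_a + rating_b <= limit

print(can_enter_points_doubles(1750, 725, 2500))   # 2475 <= 2500, prints True
print(can_enter_ranked_event("Expert", "Novice"))  # ranked above Novice, prints False
```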
All players who enter a
rated event at a Bonzini USA sanctioned major tournament receive a points
ranking/rating, and will be listed in the Bonzini USA player ranking/rating
list (available on the “Players Page” of the Bonzini USA web site at www.bonziniusa.com).
Note: Even though you may have played in a Bonzini USA tournament, you won’t
appear on the list if it wasn’t a sanctioned major tournament, or if you only
played in non-rated event(s) such as a DYP at the tournament.
Points Rating System:
Several ratings systems and methods were evaluated by Bonzini USA for potential adaptation/use as our system, including the systems used by Tennis, Chess (Elo, Greek, and CXR versions), Bridge, Golf, Billiards, Table Tennis, The United States Table Soccer Association, The Association Francaise de BabyFoot (French Foosball Association), The British Foosball Association, and some others. Our evaluation led us to select the “Elo” rating system (which has been extensively tested and used by others, and is easily adaptable to foosball). All the other systems had intrinsic flaws in the methodology that could result in inaccurate ratings when applied to foosball, weren’t judged to be as accurate as the Elo system, or simply weren’t adaptable or were too cumbersome to apply to foosball (See Appendix C for some of the flaws in other major systems evaluated by Bonzini USA).
The Elo rating system (named after its developer Dr. Arpad Elo) is a numerical system in which differences in rating may be converted into winning probabilities and vice versa. The Elo system is a mathematical way to accurately predict how good players are in comparison to one another, and puts this prediction on a reasonable scale. It is based on the idea that one can define a probability of winning a match (winning expectancy) depending on the difference between the opponents’ ratings. In particular, it predicts how you should do against a given opponent, compares it to how you actually do, and adjusts your rating accordingly.
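As a rough illustration of converting a rating difference into a winning expectancy, the sketch below uses the logistic curve commonly quoted for Elo ratings. The 400-point scale constant and the function name are assumptions; the exact formula Bonzini USA uses is the one in Appendix A.

```python
def win_expectancy(rating, opponent_rating, scale=400.0):
    """Probability of winning under the common logistic Elo curve.
    The 400-point scale is the usual Elo convention, assumed here."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / scale))

print(win_expectancy(1500, 1500))            # equal ratings: 0.5
print(round(win_expectancy(1700, 1500), 2))  # 200 points higher: 0.76
```

The two players' expectancies always add up to 1, so only the difference between the ratings matters, not their absolute values.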
Most competition rating systems today, including the system
used by FIDE (the international chess federation) to rate chess players, are
based on Elo's system. Some other games/sports that use a variation of the Elo
system for their ratings system are the official organizations for Backgammon,
Go, Scrabble, and several others. There are several books and web sites devoted
to a fuller explanation of this system than is provided here (Note: even the
Elo system is not without some minor flaws, see Appendix B). Bonzini USA’s
system is a modification of the FIDE version of the Elo rating system.
How the Points Rating System
Works:
Your rating is based on your performance in rated tournaments, as determined by a ratings methodology and formula developed by Elo (see APPENDIX A below for information on the ratings methodology/formula). Basically, each time you play a match, your rating will go up if you win and down if you lose. It will go up more if you beat a player who is rated above you, less if you beat a player who is rated below you. The opposite, of course, is true if you lose: your rating will go down more if you lose to someone rated below you, and less if you lose to a player who is rated above you. Essentially, the better your opponent is, the more points you will gain by beating them and the fewer points you will lose by losing to them. The exact amount your rating increases or decreases depends on a complex formula, and varies from -50 points, if you lose to somebody who's rated way below you, to +50 points, if you upset someone who's rated way above you. For instance: if you beat a much lower rated player you might gain only 1 point, if you beat an equally rated player you gain 25 points, and if you upset a much higher rated player you might gain 50 points. In each of those situations your opponent loses the same number of points as you gained. See Appendix A for some examples. Note: it’s also possible for your rating not to change at all for a given match (i.e., you’re rated so much higher/lower than your opponent that you would gain/lose less than half a point, which is rounded off to zero). These differing amounts of points gained/lost are based on the concept that the higher rated player is “supposed” to win (so each player gains/loses just a few points), but if the lower rated player beats the higher rated player he gains a lot more points and the higher rated player loses a lot more points (on the basis that either the higher rated player isn’t as good as his points indicate, or the lower rated player is better than his points indicate, or a combination thereof).
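The behavior described above can be sketched with the standard Elo update, using the 50-point maximum from the text. The logistic curve, its 400-point scale and the function names are assumptions for illustration; Appendix A holds the formula actually used.

```python
import math

MAX_CHANGE = 50  # maximum points won or lost in one match, per the text

def win_expectancy(rating, opponent_rating, scale=400.0):
    # Common logistic Elo curve (assumed, not taken from Appendix A).
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / scale))

def points_for_win(winner_rating, loser_rating):
    """Points the winner gains and the loser loses, rounded to a whole
    number (anything under half a point rounds to zero)."""
    expected = win_expectancy(winner_rating, loser_rating)
    return math.floor(MAX_CHANGE * (1.0 - expected) + 0.5)

print(points_for_win(1500, 1500))  # equally rated players: 25
print(points_for_win(1200, 2400))  # big upset: 50
print(points_for_win(2400, 1200))  # heavy favorite wins: 0
```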
Whether you win or place or come in last in a tournament has no bearing on your rating, only whom you play in each match and how you did against them. This has the advantage of not causing players to get over-rated by winning or placing highly in a tournament with a weak field, or getting under-rated by losing quickly but to top players, etc.
The Elo system has a
different formula that can be used for new players (i.e., players who haven’t
played any/many matches and thus whose ratings are not yet accurately known),
and different variations of the formula used for established (i.e., not new)
players that are rated very highly (over 2100 points – i.e., top players who’ve
played many matches and thus should have a reasonably firmly established
rating), or for when an established player plays a “new” player. There are also
"feedback" points when you do really well in a tournament. These are
designed to get rapidly improving players to an accurate rating as quickly as
possible. However, these special formulas are not used by Bonzini USA, for
simplicity (Bonzini USA compensates for this by trying to make the initial
rating (ranking) given to a player close to their skill level, rather than
starting all new players at the average player base rating as the Elo system
typically does).
The rating system requires an initial set of player ratings. It's not terribly important from the individual player’s perspective how his initial ratings are set, so long as enough matches are played after this for the Elo system to correct mistakes, which will eventually die out. From a statistician's perspective, after a player has played about 30 rated matches, his rating will tend to converge on his true strength relative to his competitors, regardless of what his initial rating was. A rating of 1265 based on 8 matches is not considered nearly as accurate as a rating of 1310 based on 37 matches.
Matches won by “byes”, matches won/lost by forfeit, or matches where the players “tie” (e.g., finalists agree to split the winnings), are not counted in the ratings, since the match outcome does not reflect playing skill and thus the rating change wouldn’t represent a valid skill adjustment. Note: forfeits after the first ball of the match has been played DO count in the ratings, to prevent people from forfeiting a match they were probably going to lose in order not to lose ratings points. Also, in double elimination events, where the winner of the winners bracket plays the winner of the losers bracket in the finals (and it’s possible to play 2 “matches” if the losers bracket winner beats the winners bracket winner in the first “match”), it is considered only one match (i.e., the overall winner and loser of the finals) for ratings purposes even if it actually goes two matches.
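A minimal sketch of these counting rules follows. The record layout and function names are hypothetical, chosen only to make the rules concrete.

```python
from dataclasses import dataclass

@dataclass
class MatchResult:
    # Hypothetical record of how a match ended.
    bye: bool = False
    tie: bool = False        # e.g. finalists agreed to split the winnings
    forfeit: bool = False
    balls_played: int = 0    # balls played before any forfeit

def counts_for_rating(match):
    """Return True if the match should produce a ratings adjustment."""
    if match.bye or match.tie:
        return False
    if match.forfeit:
        # A forfeit counts only once the first ball has been played.
        return match.balls_played >= 1
    return True

def double_elim_final_winner(match_winners):
    """The finals are rated as one match: the overall winner is whoever
    won the last of the one or two finals matches played."""
    return match_winners[-1]

print(counts_for_rating(MatchResult(forfeit=True, balls_played=0)))  # False
print(counts_for_rating(MatchResult(forfeit=True, balls_played=3)))  # True
```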
Ratings do not change due to lack of participation in events (e.g., if a player doesn’t play in a tournament for 2 years, he will still have the same rating he had 2 years ago) – this keeps players who are inactive for a while from coming back with a lower rating than their true skill level, which would be inaccurate from a ratings/rankings standpoint, and unfair if it enabled them to enter lower classification events than their true skill level (generally, foosball players inactive even for years may be a little “rusty”, but typically retain their skills). However, the inactive player’s old rating may still be somewhat inaccurate, in that the skill level of the entire player base in the ratings system may have changed over the years (e.g., if everyone improves, an “average” player with a rating of 1500 points now may be a better player than an “average” player rated 1500 points back when the inactive player stopped playing).
Separate Event
Rankings/Ratings:
Separate ratings are kept for doubles and singles, since some people are better singles players than they are in doubles, and vice versa. Your performance in singles does not affect your doubles rating/ranking, and vice versa. This keeps players from getting overrated or underrated in one event based on their performance in the other event. Every player has one ranking/rating for doubles (based only on their doubles event play), and another ranking/rating for singles (based only on their singles event play). It wouldn’t be fair to have, for instance, a player’s singles ranking (where he may be a weak shooter) be at a high classification where he really wasn’t competitive with the other players in that classification, because of his play in doubles (where he may be a great goalie). Likewise, it wouldn’t be fair to let a great goalie play in a low ranking doubles classification (where he could form a great team with a good forward) because of his play in singles. It is presumed that two goalies, or two forwards, generally wouldn’t choose to form a doubles team together (and for those players that switch positions a lot it wouldn’t make any difference since there’d just be a doubles rating).
Bonzini USA wanted to try to have separate rankings/ratings for forward and goalie too, since obviously some players are excellent goalies but mediocre forwards (and vice versa), and two excellent goalies on a team (or two excellent forwards) would not be nearly as strong of a team as their ratings would suggest (or as strong as a team consisting of an excellent forward and excellent goalie). However, we couldn’t come up with a practical way to track it. Even if players were to do something like designating their positions on doubles entry forms, there are so many players that switch positions in a tournament (based on matchups with other teams or the flow of the match, etc.) that the results would become skewed (i.e., wins/losses, and resulting points rating adjustment would not necessarily be applicable to their designated position) and eventually meaningless.
Women’s only singles and doubles events are rated separately/independently from singles and doubles events open to both men and women (and the resulting women’s ratings/rankings are only used for women’s events – women playing in events open to both men and women use their regular singles and doubles rating/ranking). While the Elo rating formula theoretically should work equally well regardless of gender, so women’s ratings would remain accurate despite earning/losing points in women’s events (after all, playing a 1500 point woman should be as hard as playing a 1500 point man, as far as the ratings points won/lost are concerned, since women don’t gain “extra” points just for playing extra matches in women’s events), the rating of matches within the smaller population (ratings player base) of women players available for play in women’s events may skew or otherwise adversely affect the points adjustment results when compared to a larger population (different ratings player base) of players available to play in non-women’s events. Rating these events separately gives women an accurate rating for their events, without it impacting their rating in “regular” events.
Doubles Adaptation:
The Elo system was designed for player against player, or team against team. Since foosball players frequently change team partners from tournament to tournament, and even event to event within a tournament, tracking points won or lost as a “specific team” is not practical in foosball. Therefore, Bonzini USA came up with the following method to apply points won or lost as a “team” to each player’s individual doubles ratings. A team’s rating is determined by adding the two players’ doubles ratings together, and then dividing by 2. For instance, if one player on the team has a doubles rating of 1750, and the other player on the team has a doubles rating of 2040, the team rating would be 1895 (1750 + 2040 = 3790, 3790/2 = 1895). Likewise, a team consisting of a 1750 point player and a 2040 point player would logically be expected to be better than (and thus more “probable” to win per the Elo rating system) a team consisting of a 1750 point player and a 1600 point player. That is also the philosophy behind events like 2500 limited doubles – the theory is that teams should be somewhat evenly matched if their combined points are about the same.
Dividing the players’ combined rating by 2 to determine the team rating is important for a couple of reasons:
First, certain variables in the Elo system formulas must be chosen to fit the ratings scale used. The Elo system, as modified by Bonzini USA, doesn’t accurately handle ratings much above the high 2000’s (the ratings scale the Elo system was designed for typically ranges from 0 to about 3000 points, with very few tournament players being rated below 1000 points). Also, the Elo system doesn’t accurately handle very large differences between opponents’ points (the maximum value limit for points gained/lost in a match would frequently be reached, negating much of the system’s purpose of differing match result rating points adjustments based on the relative point differentials in teams’ ratings). Just combining a team’s players’ points would result in teams frequently having combined ratings in the 3000’s and 4000’s, as well as potentially large differences between their team rating and their opponent’s team rating. For example, consider a team consisting of a 750 point player and a 1175 point player, for a combined rating of 1925, as compared to the 3790 point combined rating of the team mentioned above, and thus a difference between the teams of almost 1900 points. Dividing by 2 brings the team’s rating, as well as the difference between the two teams’ points ratings, back down to the rating range scale of individual players which the system was designed to accurately process.
Second, the resulting ratings adjustment calculated by the software can be accurately/directly added to each of the team’s individual players’ doubles ratings, just as if they were each a single 1895 point player playing against a 963 point player (750 + 1175 = 1925, 1925/2 = 962.5, rounded off to 963). Thus, if the 1895 point team beats the 963 point team, resulting in the 1895 point team gaining 5 points and the 963 point team losing 5 points, then each player on the winning team would have 5 points added to their individual doubles rating, and each player on the losing team would lose 5 points off their individual doubles rating.
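The doubles adaptation can be sketched as below. The function names are hypothetical, and the 5-point adjustment is simply the illustrative figure used in the text, passed in as a parameter rather than computed.

```python
import math

def team_rating(rating_a, rating_b):
    """Average of the two doubles ratings, with halves rounded up
    (962.5 becomes 963, as in the text). Python's built-in round()
    rounds halves to the nearest even number, so it is not used here."""
    return math.floor((rating_a + rating_b) / 2 + 0.5)

def apply_team_result(winners, losers, points):
    """Add the match adjustment to each winner's individual doubles rating
    and subtract it from each loser's."""
    return [r + points for r in winners], [r - points for r in losers]

print(team_rating(1750, 2040))  # 1895
print(team_rating(750, 1175))   # 963
print(apply_team_result([1750, 2040], [750, 1175], 5))
# ([1755, 2045], [745, 1170])
```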
This doubles method seems to be reasonably fair, since even if you’re a low rated player (say 1200 points), if you’re playing with a high rated player (2000 points), you’d become a team with strengths and weaknesses averaging about 1600 points, and it should be reasonable to apply the same calculated change in points to both players.
Ratings Scale and Ranking
Classifications:
The Elo system uses a so-called interval scale, which means that the difference in rating is the only factor of significance in terms of probabilities. The classification interval is simply the rating difference between the top and bottom of a ranking classification. In such a ranking classification the poorest player on a good day will play at the same level as the best player on an off day. Bonzini USA uses a classification interval (Ranking level) of 500 points, to make it easy to recognize when a player changes ranking classifications (i.e., whenever they cross a multiple of 500 points threshold, like 1000, 1500, 2000, etc.). Note: the classification interval may differ between Elo systems (Chess’s FIDE system uses a classification interval of 200 points). Each classification interval can be given a specific name (or “rank”) to make it easier to distinguish between them. The ranking scale range is wide enough to cover all skill levels, so that no rating ever becomes negative. The Chess FIDE system ranking scale is from 0 to 3000 points, with a “mean” of 1500 (Bonzini USA’s scale is set similarly).
Bonzini USA has developed the following Ranking classification levels/definitions, along with their attendant points ratings scales. The more tournament matches a player has played, the more accurate their ranking/rating will be (see also the section on Initial Ratings). Note: Points are calculated separately/independently for singles and doubles events, so a player can have different rankings/ratings in singles and doubles. In addition, points in women’s events are calculated separately/independently from events open to both men and women, so a woman player can have different rankings/ratings in women’s singles and doubles and regular singles and doubles. Also, ratings cannot drop below 0 (i.e., to negative points) – for instance, if a player has 0 points and he loses, he’ll remain at 0 points (as a practical matter, no one is likely to ever reach 0 points since you can’t lose very many points at a time if you have a very low rating).
Novice (0-499 Points, Initial Rating = 250 points): From a true beginner (player exposed to foosball for the first time) to the player who has developed the basic skills involved in foosball.
Novice-Elite (500-999 Points, Initial Rating = 750): Moderately skilled, but lacks the level of skills or consistency necessary to be "competitive" in major tournament open division events (i.e., against Expert-Elite and Master rated players). Note: since it is unlikely a Beginner would enter a tournament, most new players entering a tournament for the first time are initially rated in this classification.
Expert (1000-1499 Points, Initial Rating = 1250): Has the skills to be "competitive" in major tournament open division events (i.e., against Expert-Elite and Master rated players). Is good by ordinary standards, but not likely to win many matches in major tournament open division events.
Expert-Elite (1500-1999 Points, Initial Rating = 1750): Has the skills to beat "anyone" (e.g., Master or Master-Elite rated player) in a given major tournament open division match, but not likely to be able to win enough of those matches in a row (i.e., beat “everyone”) to win a major tournament open division event.
Master (2000-2499 Points, Initial Rating = 2250): Has the skills to beat “everyone” (i.e., win) in a given major tournament open division event.
Master-Elite (2500 Points and above, Initial Rating = 2750): The most "elite" players, expected to win or place highly in any given major tournament open division event. (Note: no one is given this ranking as an initial (average) rating; it can only be obtained by earning points in major tournament events).
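The classification list above maps directly onto a lookup by 500-point interval. The sketch below is illustrative only; the function names are hypothetical, and the initial rating is computed as the midpoint of each range, matching the values listed.

```python
INTERVAL = 500  # classification interval in points

# (lowest points in the range, ranking name), in ascending order
RANKS = [
    (0, "Novice"),
    (500, "Novice-Elite"),
    (1000, "Expert"),
    (1500, "Expert-Elite"),
    (2000, "Master"),
    (2500, "Master-Elite"),  # open-ended: 2500 points and above
]

def ranking_for(points):
    """Return the ranking name for a whole-number points rating."""
    index = min(max(points, 0) // INTERVAL, len(RANKS) - 1)
    return RANKS[index][1]

def initial_rating(rank_name):
    """Midpoint of the ranking's range, used as the initial points rating."""
    for low, name in RANKS:
        if name == rank_name:
            return low + INTERVAL // 2
    raise ValueError(f"unknown ranking: {rank_name}")

print(ranking_for(1499), ranking_for(1500))  # Expert Expert-Elite
print(initial_rating("Novice-Elite"))        # 750
```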
Note: Other foosball organizations may use different names for similar type classifications, such as Rookie, Amateur, Semi-Pro, Pro, Pro-Master, etc. However, such names are not strictly indicative of “skill level”, although they generally give a good indication. For instance, Rookie technically means a player in the first year of competing in tournaments on the tour, regardless of skill level (Michael Jordan was a Rookie in the NBA at one time, but was still one of the top basketball players in the world even though he was a “rookie”). Likewise, the terms “Amateur” and “Pro” (Professional) refer to whether the player accepts prize money winnings or not (again, regardless of skill level – Jordan was an amateur while in college but still was one of the world’s best basketball players). Therefore, Bonzini USA has chosen to use the above designated ranking classification names, which we feel are more “skill based”.
Note: Prior to the 8/25/00 tournament, Bonzini USA used the following names for the 6 ranking classifications instead of the current Novice/Expert/Master (and their Elites): Beginner, Casual, Novice, Expert, Master, GrandMaster (although it can be argued that “Beginner” and “Casual” are not skill based terms, we couldn’t come up with anything better). Therefore, old rankings/ratings lists prior to 8/25/00 will use those terms.
However, Bonzini USA tournaments will sometimes refer to events (and/or entry fees) as “Amateur” or “Pro” (for instance, a Pro-Am DYP). In these cases Amateur should be considered to be the Expert level and below, while Pro should be considered to be the Expert-Elite level and above.
Initial Rankings/Ratings:
New players entering a rated tournament event are given an initial ranking based on the Bonzini USA staff’s best estimation of their skill level. This initial ranking is based on the player’s perceived ability as a singles player and as both a forward and goalie in doubles. Typically, the initial ranking (for both singles and doubles) will be based on where the Bonzini USA staff estimates at least 2 of those 3 abilities to be. For example, if the new player is judged to be a Novice level singles player, a Novice level doubles forward, but an Expert level doubles goalie, his initial ranking (for both doubles and singles) would be as a Novice. In addition, there are a “lot” of reasonably good goalies that could win or place highly in a tournament if they were playing with a top forward, and it wouldn’t be accurate to make them all Experts or Masters. Therefore, for initial classification purposes, the Bonzini USA staff leans more toward how the player would do if they had to be the “strength” of the team (i.e., as a forward in doubles, or in singles) when determining the initial ranking.
Once the initial ranking is determined, his rating/ranking for doubles and singles would then be adjusted (by playing performance in each type of event) up or down independently from each other as appropriate. Thus, should this common singles/doubles initial ranking result in a player being classified higher or lower than they should be in either singles or doubles, they should quickly move to their proper singles/doubles classification based on their results in that event, while remaining in the initial ranking classification for the event they were properly classified. However, in some cases, where a player’s ability in one of the 3 areas is “significantly” different than the other 2 areas, the player may receive differing initial rankings in doubles and singles.
Their initial point rating (for use by the ratings formula) is the “average” value for the ranking’s point scale range, since it’s not feasible to accurately “estimate” where in the ranking classification’s point range a player might be (estimating the ranking level alone is difficult enough). Using the “average” value will hopefully be reasonably close, closer than the value at the top or bottom of the range might be. Also, using a value at the top or bottom of the classification range could result in the player changing classification too quickly. Another reason to use the ranking classification’s “average” point value, rather than its lowest (entry) value, is so the lowest initial point rating that could be assigned would be set at the average of the lowest classification’s point range (i.e., 250 points rather than 0), since a player has to have points to lose for the system to remain accurate (you can’t have negative points). If a beginner came in at 0 points (or another value too close to zero) and proceeded to lose a bunch of matches, he wouldn’t drop below 0 points (the rating formula wouldn’t let that happen) – but if several people are rated at 0 points there’s no way to differentiate which of them is “better” from a ratings standpoint. A value of 250 points is high enough to prevent this from happening, since it’s difficult to lose many points at that low of a rating when you’re normally playing players rated much higher.
Although the rankings would eventually work out even if all new players started out at the same ranking/ratings points level (e.g., the lowest 250 points, or the average 1500 points), it is better for the Bonzini USA system to start players’ rankings/ratings out as close to their skill level as possible, for reasons discussed below (Note: Chess’s FIDE system starts new players at an “average tournament player” value, the ratings boundary between amateurs and experts/masters on their rating scale, 2000 points). In addition, this method helps to keep new players from being able to play in lower classifications than their skills warrant, which would be unfair to the players competing in the lower classification event.
The initial ranking is only important for the first 20 or so matches a new player plays, to ensure that the players the new player plays against don’t have their ratings adversely impacted by winning or losing against a player rated significantly higher or lower than they should be (i.e., their points ratings changes would not be adjusted accurately based on their opponent’s skills), since Bonzini USA doesn’t use the special Elo formula for new players. In addition, the initial ranking enables the new player to quickly achieve a reasonably close rating to his skill level until he’s played enough matches for the Elo system to establish his true rating. The new player’s rating/ranking should settle into its true value (in both doubles and singles) within about 20 rated matches (in each event) having been played, assuming the initial ranking given was reasonably close (or after about 30 rated matches have been played if the initial rating given was substantially off – e.g., 30 matches times a potential maximum change of 50 points per match = 1500 points, or 3 classification levels).
Having the Bonzini USA staff estimate an initial ranking, instead of just starting all new players at the lowest (or average) ranking classification, is also important to prevent point “deflation” (or “inflation”) since the special Elo system formula for new players isn’t used by Bonzini USA. The Elo system formula used by Bonzini USA (i.e., the “established” players formula) “conserves” total points in the player base (as discussed in Appendix A), and if new players come into the system’s player base at too low (or high) of a points rating and thus subsequently have to gain (or lose) a lot of points to get to their true skill level’s points rating, then those points would have to come at the expense of lowering (or raising) over time all other players’ ratings in the player base.
Events Rated:
The Bonzini USA ratings/rankings reflect all major tournaments sanctioned by Bonzini USA since its formation in 1999 (all sanctioned major tournaments are identified on the “Tournaments” page of the Bonzini USA web site at www.bonziniusa.com). Only Singles and Bring Your Partner type doubles events are included in this ranking/rating (i.e., qualify for points adjustments). Besides the Open division events, this also includes any events restricted by ranking/points level, such as Novice Singles, 1700 Point Singles, Expert Doubles, 2500 Point Doubles, Master Only Doubles, etc. Women’s events ranked/rated include Women’s Doubles, Women’s Singles, etc.
Draw Your Partner type events are not rated, because players don’t have the opportunity to pick a partner that complements their skills as they do in a bring your partner type doubles event (e.g., two goalies may draw each other), and thus any ratings adjustment made in such events may not reflect the player’s true strength as a doubles player. For instance, it’s assumed that even if someone is a good goalie and a bad forward, in doubles he’d choose to play with a forward, so his doubles rating would be fair even if it was higher than what you’d expect of him based on his overall game.
Likewise, “specialty” events like Four on Four, Forward Shootout, and Goalie Wars are not rated since they may not reflect a player’s true skill level as a singles or doubles player, and those events are generally always open to all player rankings so ratings for them are not important.
Mixed Doubles is also currently not rated (either for women’s or regular doubles ratings), because of concerns about the effect on the doubles ratings of women consistently playing goalie with top male forwards. This may change in the future.
Administration:
The Bonzini USA ratings/rankings are recalculated/updated after every major sanctioned tournament.
Normally, events are rated in chronological order by their event starting date/time. New ratings produced by each event are used for the next event’s calculation, and so forth. Updated rankings/ratings will be posted on the Bonzini USA web site (www.bonziniusa.com) before the next sanctioned tournament, and also made available at every sanctioned tournament.
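A sketch of this chronological processing is shown below. It assumes the standard logistic Elo update with a 50-point maximum, and it assumes every match in an event is calculated from the ratings players held going into that event; neither detail is confirmed by the text, and the names are hypothetical.

```python
import math
from datetime import datetime

MAX_CHANGE = 50  # maximum points won or lost in one match

def points_for_win(winner_rating, loser_rating, scale=400.0):
    # Standard logistic Elo update (assumed; see Appendix A for the real one).
    expected = 1.0 / (1.0 + 10.0 ** ((loser_rating - winner_rating) / scale))
    return math.floor(MAX_CHANGE * (1.0 - expected) + 0.5)

def rate_events(ratings, events):
    """events: list of (start_datetime, [(winner, loser), ...]).
    Events are processed by start date/time; the ratings produced by one
    event feed the next event's calculation."""
    current = dict(ratings)
    for _, matches in sorted(events, key=lambda event: event[0]):
        entering = dict(current)  # ratings going into this event (assumption)
        for winner, loser in matches:
            points = points_for_win(entering[winner], entering[loser])
            current[winner] += points
            current[loser] = max(0, current[loser] - points)  # never below 0
    return current

events = [
    (datetime(2000, 9, 15, 10, 0), [("B", "A")]),  # listed first, played second
    (datetime(2000, 8, 25, 10, 0), [("A", "B")]),
]
print(rate_events({"A": 1500, "B": 1500}, events))
```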
Bonzini USA reserves the right to manually adjust a player’s rating if conditions warrant (e.g., due to a mis-classification in their initial ranking, to prevent “sandbagging”, or when a player who hasn’t played in a sanctioned/rated tournament in a long time but has continued to practice and play locally comes back with significantly improved skills, etc.).
Ratings Comparisons:
Over time, players will settle into roughly the same accurate points rating/ranking level (assuming their skills don’t significantly improve or get worse), whether they play in 10 tournaments or 100 (i.e., players don’t gain more points than other players simply by playing in a lot more tournaments). One of the virtues of the Elo system vs. other ratings systems is that players eventually reach their correct/accurate rating/ranking level and STAY there, unless they TRULY improve (or get worse, or the player base as a whole gets better or worse relative to them). Players don’t move up to different rankings simply because they played in a lot of tournaments and thus gained points essentially for entering tournaments. Therefore, they don’t move up to a ranking classification for which they’re not really prepared, which would cause frustration and quitting because they could no longer compete successfully at tournaments. With the Elo system, players won’t move up in rank until they demonstrate they can compete effectively with the higher ranked players, and thus will stay in a ranking classification where they can have some success (although players who do move up in rank should expect to struggle somewhat, since they would then be near the bottom of that ranking’s players in skill, as opposed to near the top of the lower ranking’s players).
The Elo formula has the property of conserving the total number of ratings points in the player base (i.e., the sum of all ratings changes is zero). In any single match, the winner gains exactly as many points as the loser loses, whichever player was favored. For instance, with the maximum number of points a player may win or lose in a single match set to 50, if the lower rated player wins and gains 35 points, the higher rated player loses 35 points (35 + -35 = 0); likewise, if the higher rated player wins and gains 6 points, the lower rated player loses 6 points (6 + -6 = 0). What adds up to 50 is the pair of possible outcomes for one player in a given match: a player who would gain 35 points for a win would lose 15 points for a loss (35 + 15 = 50). With Bonzini USA’s settings (a 50 point maximum and a rating scale factor of 1000, both described later in this document), once the ratings difference exceeds roughly 2000 points, a win by the higher rated player produces no change in either rating (the ratings adjustment is less than half a point, which is rounded off to zero). This property of the sum of all ratings changes equaling zero ensures that players are rated “relative” to all the other players in the ratings system player base (i.e., the system has a “fixed” rating scale), regardless of whether the skill of all the players in the player base increases (or decreases) over time (i.e., an “average” rated player today may be significantly more skilled than an “average” rated player many years ago when the game was still relatively new, but he is still rated as “average” relative to his (current) competition (player base)).
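The conservation property can be checked numerically. The Python sketch below is only an illustration: the function names and the two sample ratings are hypothetical, rounding is left out, and K = 50 and F = 1000 are the Bonzini USA values described later in this document. The ratings are chosen 368 points apart so that the lower rated player's Win Expectancy is about 30%.

```python
K = 50     # maximum points change per match (Bonzini USA value)
F = 1000   # rating interval scale weight factor (Bonzini USA value)

def expected(own, opponent):
    """Win Expectancy for `own` against `opponent`."""
    return 1.0 / (10 ** (-(own - opponent) / F) + 1.0)

def rating_changes(rating_a, rating_b, a_wins):
    """Unrounded points changes (A's change, B's change) for one match."""
    score_a = 1 if a_wins else 0
    change_a = K * (score_a - expected(rating_a, rating_b))
    change_b = K * ((1 - score_a) - expected(rating_b, rating_a))
    return change_a, change_b

low, high = 1400, 1768  # hypothetical ratings
upset = rating_changes(low, high, a_wins=True)    # lower rated player wins
normal = rating_changes(low, high, a_wins=False)  # higher rated player wins

print(upset, sum(upset))      # the two changes cancel out
print(normal, sum(normal))    # the two changes cancel out
print(upset[0] - normal[0])   # one player's win gain plus loss penalty equals K
```

In both outcomes the two changes sum to zero (up to floating-point error), while the lower rated player's possible gain (about 35) and possible loss (about 15) add up to K.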
Please note that the Bonzini USA ratings cannot be compared directly to any other foosball ratings based on similar (Elo) ratings systems. The ratings systems may not be identical (although they might be based on the same principles), the rating pools (the player base in the systems) may differ, and the matches may have been played under different conditions. For instance, consider two different foosball organizations (A & B), each using essentially identical Elo based ranking/rating systems and tournament/match formats. Say the player base of organization A consists entirely of players who have excellent skills, and the player base of organization B consists entirely of players who have mediocre skills. In each organization, some players will do well and gain high rankings in their ranking/rating system, and others will do poorly and have low rankings, since someone always has to win and lose in either organization. However, the lowest ranked players in organization A may still be more skilled, on an absolute scale, than the highest ranked players in organization B. This difference would only be corrected if the two organizations’ players started to play in the same tournaments (i.e., their player bases interacted). However, a player’s rating in one system can often serve as an indicator for the rating in another system (by converting the ratings).
Future Modifications:
Bonzini USA continually strives to improve its ranking/rating system. We monitor Chess and other organizations’ rankings/ratings systems for changes they may make or ideas we can incorporate into our system, or even completely new systems we can use. Toward that end, we have identified several potential modifications/systems we may implement in the future (these are noted in Appendix E). If anyone has any comments or suggestions for improvement, please send them to Bonzini USA (contact information is on our web site at www.bonziniusa.com).
Note: Bonzini USA has software to do this calculation :-). This software, called “ELO CALC”, has been specially customized for Bonzini USA by its creator (Andreou Andreas) to meet the modifications to the standard Elo system made to adapt the Elo system to foosball and Bonzini USA, as discussed in this document. More information about this software (which also contains player rating performance history statistics and graphing features), including how to obtain a copy of it, is available on the “Accessories” page of the Bonzini USA web site (www.bonziniusa.com), or on Andreou Andreas’s Chess World web site (http://chess.8m.com).
The basic theory of the Elo rating system is that the difference between the ratings of players is a guide to predicting the outcome of a contest between those two players. The centerpiece of the Elo system is the Percentage Expectancy Curve, which takes the form of a probability function. The Percentage Expectancy Curve is the definition of a rating function (not the distribution of a random variable), and yields percentage scores, not probabilities. The interpretation of percentage scores as probabilities seeks to account for inconsistency of Chess (or foosball) results as variability of performance. The very concept itself of "ratings" requires an assumption of transitivity, that if team A is better than team B and B is better than C that A is better than C, where "better than" means "will win more often than will lose". This assumption is almost certainly not exactly correct, but is often close. Once ratings are established, the assumption is that a player who holds a given rating advantage over another player should win a certain percentage of his matches against that player. Elo assumed, and the system seems to work, that this advantage is essentially multiplicative; if player A beats player B 3 times as often as vice versa, and player B beats player C 3 times as often as vice versa, then player A should beat player C 9 times as often as the other way around. The ratings, then, are assigned logarithmically, so that the ratio (3) becomes a difference (proportional to the log of 3). Even if the assumption of multiplicativeness isn’t true, the next step is very reasonable so long as there is some way of calculating probabilities from ratings: if a player wins, his rating increases in proportion to the probability that he was to lose; if the player loses, his rating decreases in proportion to the probability that he was to win. 
Thus if the player wins exactly as often as he is supposed to, his rating stays fixed; if he wins more often, his rating goes up over time, while if he wins less often it goes down.
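The multiplicative assumption described above can be illustrated with a few lines of Python. This is a sketch under stated assumptions: it uses the Win Expectancy formula and the scale factor F = 1000 that this document adopts for Bonzini USA further on, and the helper names are made up for the example.

```python
import math

F = 1000  # rating interval scale weight factor (Bonzini USA value)

def expected(diff):
    """Win Expectancy for a player rated `diff` points above the opponent."""
    return 1.0 / (10 ** (-diff / F) + 1.0)

def odds(diff):
    """How many times as often the higher rated player wins as loses."""
    p = expected(diff)
    return p / (1.0 - p)

# Ratings difference at which one player wins 3 times as often as the other.
gap = F * math.log10(3)

print(round(gap))       # size of that difference in points
print(odds(gap))        # A vs. B, one gap apart
print(odds(2 * gap))    # A vs. C, two gaps apart
```

Adding two equal rating differences multiplies the odds (3 x 3 = 9), which is what assigning ratings logarithmically means.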
The specific formula has been worked out according to statistical and probability theory. No rating, however, is a precise evaluation of a player's strength. Instead, ratings are averages of performances and should be viewed as approximations within a range (that range is at least equal to plus or minus the standard deviation). Ratings based on fewer than about 20 matches are much less reliable than ratings based on more than about 20 matches.
For Bonzini USA, the Elo (FIDE established players) equation used to calculate a player’s new rating (points gain or loss) based on his previous rating for a given match is:
Rn = Ro + K(S – We)
Where:
“Rn” is the new (post-match) rating.
“Ro” is the old (pre-match) rating.
“K” is an arbitrary constant (rating coefficient) used to determine the relative magnitude of the rating change per match, including limiting the maximum allowed amount of change to the value of K (K = 50 is used by Bonzini USA).
“S” is the result (score percentage) of the match (where S = 1 means 100% (win), S = 0 means 0% (loss)).
“We” is the expected result (Win Expectancy percentage), from the following formula: We = 1 / (10^(-D/F) + 1); where “D” equals the difference in the two players’ ratings (your rating – your opponent’s rating, so that D is positive for the higher rated player, as in the examples later in this document), and “F” is the rating interval scale weight factor, which sets the spread of the probability function (F = 1000 is used by Bonzini USA).
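The equation and its terms can be written out as a short Python sketch. The function names are illustrative only. D is taken as the player's own rating minus the opponent's rating, which is the convention under which the higher rated player gets a Win Expectancy above 50% (as in the worked examples later in this document). Rounding exact halves upward is an assumption, since the document only says that fractions are rounded to the nearest point.

```python
import math

K = 50     # rating coefficient (Bonzini USA value)
F = 1000   # rating interval scale weight factor (Bonzini USA value)

def win_expectancy(own, opponent, f=F):
    """We = 1 / (10^(-D/F) + 1), with D = own rating - opponent's rating."""
    d = own - opponent
    return 1.0 / (10 ** (-d / f) + 1.0)

def new_rating(old, opponent, score, k=K, f=F):
    """Rn = Ro + K(S - We); score is 1 for a win and 0 for a loss."""
    change = k * (score - win_expectancy(old, opponent, f))
    # Assumption: round to the nearest point, with exact halves rounded up.
    return old + math.floor(change + 0.5)

print(win_expectancy(1500, 1500))   # evenly rated players
print(new_rating(1500, 1500, 1))    # a win against an equal opponent
print(new_rating(1500, 1500, 0))    # a loss against an equal opponent
```

Two evenly rated players each have a Win Expectancy of 0.5, so the winner gains half of K and the loser gives up the same amount.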
This equation averages the latest performance (match results) into the prior rating. Earlier performances are smoothly diminished, while the full contribution of the latest performance is preserved. The rating is thus a weighted moving average. Note: the FIDE chess Elo system “publishes” player ratings (“Ro”) at periodic intervals (typically every 6 months) and uses the same published Ro for the calculation for every match the player plays (regardless of how he performs in those matches), until the next publishing interval. The FIDE ratings are calculated assuming that every player enters the matches with all opponents at the same “old” (last published) rating. That means that ratings are not re-evaluated after each match, and the “new” ratings are computed/adjusted from the “old” ones adding all points he gained/lost from the matches the player completed at any tournament(s) since the previous publishing date. In other words, FIDE assumes that the ratings of players do not change in the process of one tournament (or publishing interval). For instance, if your published rating is 1710, and your current rating has grown to 1850, the rating changes would be calculated using the published 1710 rating. However, Bonzini USA adjusts (i.e., “publishes”) the player’s rating after every match (although the calculation uses his opponent’s pre-event rating for each match calculation). For instance, if a 1500 point player wins his first match in the tournament and gains 6 points, instead of using 1500 for his points in the next match the calculation uses his new value of 1506, and so on for each match. This allows a player’s rating to more accurately be adjusted to any change in his skill level, as well as more accurately impacting the ratings adjustments to his opponent’s ratings.
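The difference between the two update schedules can be sketched in Python. This is an illustration rather than either organization's actual procedure: the function names and the three-match sequence are hypothetical, rounding is omitted, and both versions use Bonzini USA's K = 50 and F = 1000 so that only the schedule differs.

```python
K = 50     # rating coefficient (Bonzini USA value)
F = 1000   # rating interval scale weight factor (Bonzini USA value)

def expected(own, opponent):
    """Win Expectancy for `own` against `opponent`."""
    return 1.0 / (10 ** (-(own - opponent) / F) + 1.0)

def per_match(start, results):
    """Bonzini USA style: the player's own rating is updated after every match."""
    rating = start
    for opponent, score in results:
        rating += K * (score - expected(rating, opponent))
    return rating

def per_period(start, results):
    """FIDE style: every match in the period is calculated from the published rating."""
    return start + sum(K * (score - expected(start, opponent))
                       for opponent, score in results)

# Hypothetical tournament: three wins against opponents rated 1500 before the event.
results = [(1500, 1), (1500, 1), (1500, 1)]
print(round(per_match(1500, results), 1))
print(round(per_period(1500, results), 1))
```

With per-match updating, each successive win is worth slightly less because the player's own rating has already risen, so the running total ends a little below the per-period figure.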
The equation has 2 parameters which affect the dynamics of the Elo system: the coefficient K, and the rating interval scale weight factor F.
The coefficient K determines the relative importance given to the player’s pre-match rating versus their event performance (match results). A high K gives more importance (weight) to the most recent performance, i.e., allows the rating to change more. The effect of changing K is to change how quickly a player’s rating will change. A high value will increase its volatility, which is undesirable, but it will also allow the system to take account of a player’s change in skill (e.g., improvement) more quickly. A small value for K is, in effect, an average over a longer period of time. Values for K in the Chess Elo ratings systems typically range from 10 to 32. In Chess’s system, K = 32 effectively averages, in practice, over about twenty or twenty-five matches, so that if a player gets markedly better (or worse) it will take about that long before this is adequately reflected in his rating. Note: chess’s FIDE system uses K = 25 for the first 30 matches; and after 30 matches have been played it uses K = 15 for 0-2399 points, and K = 10 for 2400 and above. The 3 different K’s used in the FIDE system serve to make the ratings of more highly rated/firmly established players (i.e., ratings for which there is a high confidence level that they accurately reflect the player’s true skill level) change less rapidly than the ratings of lower rated/less firmly established players (i.e., ratings for which there is a lower confidence level that they accurately reflect the player’s true skill level). The low value for K makes gaining points a lot more difficult, but also helps the player remain at a high rating.
Bonzini USA uses only one value for K, a relatively high value of 50 (one tenth of the rating classification interval of 500 points). This is both for simplicity’s sake, and due to differences in Bonzini USA’s system from the FIDE system (i.e., rating classification interval size and rating interval scale weight factor). The high value of K allows player ratings to change faster (a maximum of 50 points per match) to more quickly reflect the true skill level for new players whose initial ranking/rating classification is only a rough estimate, players who don’t play in many matches, or players whose skill level changes significantly. Also, a higher value of K (twice the FIDE high value of 25) is needed since Bonzini USA’s classification interval is set larger (more than twice the size) than the interval used in FIDE (500 points vs. 200 points), and also since Bonzini USA uses a larger value for F (1000 vs. 400, as discussed below). In addition, using a single value for K makes the doubles rating calculation easier since each player on a team could meet the criteria for a different K value if multiple K values are used, complicating the calculation.
Another potential use of the value of K is to give more weight to certain important tournaments or events than to normal tournaments or events (for instance, to the State Championships, based on the assumption that players would be trying to win harder in such tournaments than in others). By using a higher value of K (e.g., 60 or 75) for the important tournaments than for normal tournaments (e.g., 50), player ratings would change more based on their results in an important tournament than they would in a normal tournament (Bonzini USA currently does not do this, but may consider doing it in the future).
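To make the effect of K concrete, the following Python sketch computes the points change for the same result under several values of K. The function name and the sample ratings are hypothetical, rounding is omitted, and F = 1000 is the Bonzini USA scale factor; the K values 10 and 25 are FIDE values mentioned above, while 75 is the illustrative "important tournament" value.

```python
F = 1000  # rating interval scale weight factor (Bonzini USA value)

def expected(own, opponent):
    """Win Expectancy for `own` against `opponent`."""
    return 1.0 / (10 ** (-(own - opponent) / F) + 1.0)

def points_change(own, opponent, score, k):
    """Unrounded rating change K(S - We) for one match."""
    return k * (score - expected(own, opponent))

for k in (10, 25, 50, 75):
    even = points_change(1500, 1500, 1, k)    # win against an equal opponent
    upset = points_change(1500, 2000, 1, k)   # win against a player 500 points higher
    print(k, round(even, 1), round(upset, 1))
```

The change scales linearly with K and never exceeds it, so raising K for selected events would enlarge every adjustment from those events by the same proportion.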
Since the stronger player does not always outperform the weaker player, a normal distribution function is used to represent the variable performance of a player (the normal distribution function is one of the fundamental functions in statistics). The distribution of match results of a player against someone of their own strength (i.e., in the same ranking classification point interval, 500 points for Bonzini USA) is expected to fit the normal (bell) curve, and the expected result against a player different from their rating by “x” classification intervals follows the normal distribution function. This normal distribution function simply says that deviations from the average level occur, and that large deviations occur less frequently than small ones. The deviation is measured in standard deviations; a spread of one standard deviation either side of the average encompasses about two-thirds of a player’s performances.
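The normal-curve model described here can be compared numerically with the simpler “We” formula given earlier in this document. The Python sketch below rests on an assumption that the text does not state outright: that one classification interval (500 points) is the standard deviation of a single player's performance, so the difference between two players' performances has a standard deviation of 500 times the square root of 2. The function names are illustrative.

```python
import math

F = 1000         # scale factor in the "We" formula (Bonzini USA value)
INTERVAL = 500   # assumed standard deviation of one player's performance

def we_formula(diff):
    """Win Expectancy from the document's formula."""
    return 1.0 / (10 ** (-diff / F) + 1.0)

def we_normal(diff):
    """Win Expectancy from the normal probability function (assumed scaling)."""
    z = diff / (INTERVAL * math.sqrt(2))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

for diff in (0, 250, 500, 1000, 1500):
    print(diff, round(we_formula(diff), 3), round(we_normal(diff), 3))

# Largest disagreement between the two curves over a wide range of differences.
largest_gap = max(abs(we_formula(d) - we_normal(d)) for d in range(0, 3001, 25))
print(round(largest_gap, 3))
```

Under that assumption the two curves stay within about two percentage points of each other, with the closest agreement at small ratings differences.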
The normal probability function is derived from the normal distribution function, and this function determines the expected match results (Winning Expectancies) based on known rating differences (or the differences in ratings based on match results). Bonzini USA uses a much simpler approximation of this normal probability function, represented by the formula for “We” in the above calculation (the results of the approximated function are practically identical to the normal probability function). The parameter “F” (rating interval scale weight factor) in the “We” formula varies the scale for the Winning Expectancy values. Chess’s FIDE system sets this factor to where a difference of 400 points (two 200 point chess classification intervals) is a factor of 10. From that, you can calculate the probability that Player A will beat Player B by taking the difference between their ratings, dividing by 400, raising 10 to that power, then dividing this by one more than itself (so that the probability of A beating B plus the probability of B beating A is equal to 1) – this is the “We” formula. Since Bonzini USA’s classification intervals are set at 500 points (instead of Chess’s 200 points), Bonzini USA uses 1000 (two 500 point classification intervals) for this factor instead of 400. This expands the Winning Expectancy scale so that the “We” formula doesn’t essentially “max out” (i.e., hit > 90%) at too small a difference in player ratings (which would make the ratings adjustment essentially the same for most matches between players of different rankings – e.g., chess’s scale hits about 90% at a difference of only about 400 points, which would be less than a single ranking classification interval range (500 points) on Bonzini USA’s rating scale). 
This setting gives about a 75% probability for beating a 1 rank lower rated (500 point difference) player, and about a 90% probability for beating a player rated 2 ranks (1000 points) lower (or conversely, about a 25% chance of beating a 1 rank higher rated player, and about a 10% chance of beating a 2 rank higher rated player). Note that even at this setting, eventually the “We” formula will “max out” for very large differences in ratings (e.g., a player rated 610 points would get essentially the same ratings adjustment for playing a player rated 2805 as he would for playing a player rated 2420, which is basically OK from a ratings standpoint since his odds of beating EITHER one would be practically nonexistent).
The formula for “We” (using Bonzini USA’s value for F of 1000) is tabulated below for various selected differences in points (“We” values rounded off to nearest 1%):
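The effect of the scale factor can be seen by evaluating the “We” formula with both values of F. This is a small Python sketch; the function name is illustrative, and the only inputs are the two factors (400 for chess, 1000 for Bonzini USA) and the two ratings differences discussed above.

```python
def expected(diff, f):
    """Win Expectancy for a player rated `diff` points above the opponent."""
    return 1.0 / (10 ** (-diff / f) + 1.0)

for f in (400, 1000):
    for diff in (500, 1000):
        percent = round(100 * expected(diff, f))
        print(f"F = {f}: {diff} point difference -> {percent}%")
```

With F = 400 a single 500 point classification interval would already put the favorite well above 90%, whereas F = 1000 spreads the same range of expectancies over two intervals.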
Sample Winning Expectancies (We)
Difference in Points | Higher Rated Player (%) | Lower Rated Player (%)
0 | 50 | 50
25 | 51 | 49
50 | 53 | 47
75 | 54 | 46
100 | 56 | 44
150 | 59 | 41
200 | 61 | 39
250 | 64 | 36
300 | 67 | 33
350 | 69 | 31
400 | 72 | 28
450 | 74 | 26
500 | 76 | 24
600 | 80 | 20
700 | 83 | 17
800 | 86 | 14
900 | 89 | 11
1000 | 91 | 9
1100 | 93 | 7
1200 | 94 | 6
1300 | 95 | 5
1400 | 96 | 4
1500 | 97 | 3
1600 | 98 | 2
1700 | 98 | 2
1800 | 99 | 1
1900 | 99 | 1
2000 | 99 | 1
2100 | 99 | 1
2200 | 99 | 1
2300+ | ~100 | ~0
This means that, for example, if the difference in ratings between two players is 350 points, the higher rated player would be expected to win the match 69% of the time (or conversely, the lower rated player would be expected to win only 31% of the time). Intuitively, you would expect that players who are rated essentially the same (about a 0 difference in points) should beat each other about 50 – 50, which is confirmed by the above chart. Differences in ratings equal to multiples of Bonzini USA ranking classification intervals (500 points) are bolded on the table.
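A few rows of the table can be recomputed directly from the “We” formula. The Python sketch below is illustrative only: the function names are made up, and the selection of differences is arbitrary (the ones cited in the text plus the classification interval multiples).

```python
F = 1000  # rating interval scale weight factor (Bonzini USA value)

def expected(diff):
    """Win Expectancy for a player rated `diff` points above the opponent."""
    return 1.0 / (10 ** (-diff / F) + 1.0)

def table_row(diff):
    """(difference, higher rated %, lower rated %), rounded to the nearest 1%."""
    higher = round(100 * expected(diff))
    lower = round(100 * expected(-diff))
    return diff, higher, lower

rows = [table_row(diff) for diff in (0, 250, 350, 500, 600, 1000)]
for diff, higher, lower in rows:
    print(f"{diff} | {higher} | {lower}")
```

The lower rated player's expectancy is obtained by feeding the formula the negative of the difference, and each pair of percentages adds up to 100.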
Examples:
Using the We formula (or values from the above chart), and the Elo FIDE equation, the following are typical examples of the Elo ratings calculation (Note: Any fractions of a ratings point are rounded off to the nearest point):
Example 1 (Singles):
Player A (with a singles rating of 1800) plays Player B (with a singles rating of 1550 points).
Example 1a:
If Player A beats Player B (as would be expected), then Player A’s rating of 1800 points is adjusted (Rn) as follows:
Rn (new rating) = Ro(1800) + (K(50) x (S(1) – We (0.64*)))
Rn = 1800 + (50 x (1 – 0.64))
Rn = 1800 + (50 x (0.36))
Rn = 1800 + (18)
Rn = 1818
* Since the difference between the ratings for Players A and B is 250 points (1800 – 1550), the We for the match (from the chart, or the We formula) is = 0.64.
Example 1b:
If Player A loses (upset) to Player B, then Player A’s rating of 1800 points would be adjusted (Rn) as follows:
Rn (new rating) = Ro(1800) + (K(50) x (S(0) – We (0.64, see “*” in 1a above)))
Rn = 1800 + (50 x (0 – 0.64))
Rn = 1800 + (50 x (-0.64))
Rn = 1800 + (-32)
Rn = 1768
Note that Player A loses more points (32) if he loses the match (Example 1b), than he gains (18) if he wins the match (Example 1a), since he is the higher rated player (by 250 points) and thus is “supposed” to win. Conversely, Player B would only lose 18 points if he lost since he was supposed to lose to Player A, but Player B would gain 32 points if he won since he wasn’t supposed to beat Player A (meaning, for example, that Player B has gotten better, or Player A has gotten worse, or a combination of both and thus the two players should be rated closer together than they originally were).
Example 2 (Doubles):
Team A (consisting of Player 1 who has a doubles rating of 2070, and Player 2 who has a doubles rating of 1940) plays Team B (consisting of Player 3 who has a doubles rating of 1495, and Player 4 who has a doubles rating of 1315).
Example 2a:
If Team A beats Team B (as would be expected), then the Team A players’ individual doubles ratings would be adjusted (Rn) as follows:
Team A’s pre-match (old) rating (Ro) is 2070 (Team A’s Player 1 rating) + 1940 (Team A’s Player 2 rating) = 4010, then divided by 2 per the Bonzini USA doubles adaptation, for a Team A rating of 2005. Similarly, Team B’s pre-match (old) rating (Ro) is 1495 + 1315 = 2810, 2810/2 = 1405.
Rn (new rating) = Ro(2005) + (K(50) x (S(1) – We (0.80*)))
Rn = 2005 + (50 x (1 – 0.80))
Rn = 2005 + (50 x (0.20))
Rn = 2005 + (10)
Rn = 2015
Since Team A earned 10 ratings points (2015 – 2005 = 10), each player on that team would have 10 ratings points added to their doubles rating; so Player 1’s new doubles rating would be 2080, and Player 2’s new doubles rating would be 1950.
* Since the difference between the ratings for Team A and Team B is 600 points (2005 – 1405), the We for the match (from the chart, or the We formula) is = 0.80.
Example 2b:
If Team A loses to Team B (upset), then the Team A players’ individual doubles ratings would be adjusted (Rn) as follows:
Rn (new rating) = Ro(2005) + (K(50) x (S(0) – We(0.80, see ”*” in 2a above)))
Rn = 2005 + (50 x (0 – 0.80))
Rn = 2005 + (50 x (-0.80))
Rn = 2005 + (-40)
Rn = 1965
Since Team A lost 40 ratings points (2005 – 1965 = 40), each player on that team would have 40 ratings points subtracted from their doubles rating; so Player 1’s new doubles rating would be 2030, and Player 2’s new doubles rating would be 1900.
Note that Team A’s players lose many more points (40) if they lose the match (Example 2b), than they gain (10) if they win the match (Example 2a), since they are the much higher rated team (by 600 points) and thus are expected to win. The converse of this is true for Team B (its players gain 40 points for winning, or lose 10 points for losing), since they are expected to lose.
Note also that Team A gained fewer points (10 vs. 18) by beating Team B or lost more points (40 vs. 32) by losing to Team B (Example 2), than Player A gained/lost by beating/losing to Player B (Example 1), because Team A was favored to beat Team B more strongly (600 point difference in their ratings) than Player A was favored to beat Player B (250 point difference in their ratings).
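The doubles calculation in Example 2 can be reproduced with a short Python sketch. The function names are hypothetical, and rounding exact halves upward is an assumption (the document only says that fractions are rounded to the nearest point). The team rating is the average of the two partners' doubles ratings, and the team's points change is applied to each partner individually, as in the example.

```python
import math

K = 50     # rating coefficient (Bonzini USA value)
F = 1000   # rating interval scale weight factor (Bonzini USA value)

def expected(own, opponent):
    """Win Expectancy for `own` against `opponent`."""
    return 1.0 / (10 ** (-(own - opponent) / F) + 1.0)

def rate_doubles(team, opponents, team_won):
    """Return the new doubles ratings of both players on `team`."""
    team_rating = sum(team) / 2            # e.g., (2070 + 1940) / 2 = 2005
    opponent_rating = sum(opponents) / 2   # e.g., (1495 + 1315) / 2 = 1405
    score = 1 if team_won else 0
    change = K * (score - expected(team_rating, opponent_rating))
    points = math.floor(change + 0.5)      # assumption: halves round up
    return [rating + points for rating in team]

team_a, team_b = [2070, 1940], [1495, 1315]
print(rate_doubles(team_a, team_b, True))    # Example 2a: Team A wins
print(rate_doubles(team_a, team_b, False))   # Example 2b: Team A loses
print(rate_doubles(team_b, team_a, True))    # Team B's side of the upset
print(rate_doubles(team_b, team_a, False))   # Team B's side of the expected result
```

Using the unrounded Win Expectancy instead of the chart value of 0.80 gives the same whole-point adjustments here (10 and 40).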
Developed by: Bruce Nardoci
Copyright (c) 2000 Bonzini USA