[SUB]In a prolonged fit of insanity[/SUB], I decided to compile all of the judges’ scores from the Grand Prix events in order to make it easier for fans to examine the judging records of judges throughout this season and determine whether any judges exhibited evidence of bias in their scoring, and if so, which ones. I figured I would post it now in anticipation of the Grand Prix Final, especially as some of these judges will inevitably show up on the judging panels there.
First, I created event reports which detailed how much a judge over- or under-scored a skater relative to the other judges, for every single skater and every single judge on the Senior Grand Prix. In these reports, I entered an abbreviated version of the protocols from each event and had my spreadsheet calculate how much each judge differed from the average of the other judges on the panel in three measures: Total Score (abbreviated TS), Average Raw GOE per element (out of 5), and Average Raw PCS per component (out of 10). Edit: a further explanation of how to read the data is available a few posts below this, which I recommend reading, especially if you are confused.
Here are the event reports:**
Skate America
Skate Canada
Cup of Finland
NHK Trophy
Rostelecom
IdF
So, for instance, taking the first entry of the first event (Skate America Men’s) as an example: Patty Klein (CAN) scored Nathan Chen 10.03 points higher than the average of the other judges, gave him GOE scores that were 0.58/5 higher on average per element, and PCS scores that were 0.12/10 higher per component, whereas Stanislava Smidova scored him 6.9 points below the average of the other judges, with GOEs 0.37 lower and PCS 0.13 lower.
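For anyone who wants to reproduce the arithmetic, here is a minimal Python sketch of the deviation calculation. All the scores below are made-up placeholders, not real protocol data; the only thing taken from the reports is the formula itself (a judge’s score minus the average of the other judges’ scores):

[CODE]
# Minimal sketch of the deviation calculation. The panel scores are
# invented placeholders, NOT real protocol data.

def deviation_from_others(scores, judge_index):
    """One judge's score minus the mean of the remaining judges' scores."""
    others = scores[:judge_index] + scores[judge_index + 1:]
    return scores[judge_index] - sum(others) / len(others)

# Hypothetical total scores from a nine-judge panel for one skater
total_scores = [280.1, 271.5, 275.0, 278.3, 269.9, 274.2, 276.8, 272.0, 273.5]

for j in range(len(total_scores)):
    print(f"Judge {j + 1}: TS deviation = {deviation_from_others(total_scores, j):+.2f}")
[/CODE]

The same subtraction gives the GOE and PCS columns; only the inputs change (per-element GOE averages out of 5, per-component PCS averages out of 10).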
Have a question about whether a certain judge was being unfair? Compare their scores to the other judges!
Additionally, I calculated the average nationalistic bias of each judge. More specifically, I determined the average TS/GOE/PCS difference for skaters sharing the same nationality as the judge in question and the average TS/GOE/PCS difference for skaters of a different nationality, then took the difference between the two (represented by the somewhat obscurely named DELTA in the event reports). You can understand this as the average number of “bonus” points skaters receive from a certain judge when they share that judge’s nationality; a rough sketch of the calculation follows.
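For the code-inclined, here is a rough Python sketch of the DELTA calculation; the rows are made up for illustration, and “deviation” means the same judge-minus-other-judges difference as in the event reports:

[CODE]
# Rough sketch of DELTA. Rows are invented; "deviation" is the
# judge-minus-other-judges difference from the event reports.

def delta(rows, judge_nation):
    """(avg deviation, same-nation skaters) - (avg deviation, everyone else)."""
    same = [dev for nation, dev in rows if nation == judge_nation]
    other = [dev for nation, dev in rows if nation != judge_nation]
    if not same or not other:
        return None  # undefined if a judge scored no (or only) compatriots
    return sum(same) / len(same) - sum(other) / len(other)

# Hypothetical TS deviations for one USA judge across five skaters
rows = [("USA", 4.2), ("JPN", -1.1), ("RUS", 0.3), ("USA", 3.0), ("CAN", -0.5)]
print(f"DELTA = {delta(rows, 'USA'):+.2f}")  # positive => compatriots scored higher
[/CODE]

A DELTA near zero means the judge treated compatriots and non-compatriots about the same; a large positive DELTA is the “bonus” described above.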
Once I did all these calculations, I started to compile them into a database of judges. You can find it here.
In this database, you can find average nationalistic biases averaged across all the events each judge judged. So, for instance, if you look at the first entry as an example, you will see that Rebecca Andrew (AUS) scored Australian skaters virtually the same as she scored non-Australian skaters on average across all events. You can also see the Total Score average broken down by discipline, how many skaters she scored, as well as whether she was a relatively lenient or strict judge compared to the panels she’s been on (she was slightly on the stricter side). Then, on each specific judge’s page, you can see the deviations of their scores from the other judges on the same panel, for each skater at each event they’ve judged so far on the Grand Prix. So if you want to take a closer look at any particular judge, you can!
Unfortunately, this database is not complete, as I still need to input roughly 40 of the 100ish judges who judged on the Grand Prix (yeah, it’s a lot of work). However, I did get almost all of the judges from the big federations (Russia, USA, Canada, and Japan), as well as most of the judges who judged more than one or two events. So there should be plenty of points of interest.
General (if tentative) conclusions: Canadian judges are a mixed bag (though Canada doesn’t exactly have a lot of very competitive skaters this year), with some quite biased ones and some quite fair ones. American judges are mostly bad, as are Russian judges. Japanese judges largely avoid being too biased, although one or two are edging a little close. Rounding out the smaller feds, Israeli, Chinese, and Ukrainian judges seem pretty terrible, but most of the other small feds are all over the place as far as bias is concerned.
In terms of total score bias per discipline, the biggest biases so far are, in order: 1. Dance, 2. Men, 3. Ladies, 4. Pairs. Men above Ladies and Pairs makes sense because there are more points available for the judges to distribute in Men, but Dance has the fewest available points...
Some caveats:
1. The data set for some judges is rather limited, so I wouldn’t necessarily draw hard conclusions about judges who’ve only scored one or two of their own skaters.
2. Similarly, past judging behavior is not necessarily predictive of future judging behavior. Some judges who’ve shown fair behavior before (e.g. Agita Abele (LAT), whom I analyzed in a previous post that I’m too lazy to find right now) now appear to exhibit biased behavior, and a judge who in the past behaved in a manner consistent with bias may not exhibit the same bias in the future.
3. I’m not sure that averaging total score across disciplines gives anything more than a crude picture of a judge’s bias. The GOE and PCS averages should be more comparable across disciplines, however. The numbers also behave a bit strangely when a judge has scored skaters of their own nationality in one discipline but not the others. Overall, though, I think the most informative numbers are the average bias a judge exhibits within a given discipline.
4. There may be a few errors in the protocols, as I have not had the time to proofread them carefully. However, I do think I’ve caught most of the large errors, so any remaining errors should not alter final numbers too much. If you spot an error, please let me know!
5. This set of data only tells you how much a judge over- or under- scores skaters relative to other judges. It does not tell you whether a specific skater is “correctly” scored by the judges as a whole. So it can’t really speak to questions like, for instance, whether Alina Zagitova or Shoma Uno or whoever are overscored, only whether a specific judge scored them above or below other judges and whether that might be related to nationality.
* Note that this is not a comparison between the judges’ scores and the official score. The formula is not the judge’s score minus the official score; it’s the judge’s score minus the average of the other judges’ scores.
** I’ve also done some of the Senior Bs, though those numbers are not included in the judges database. For those interested, here are Autumn Classic, Ondrej Nepala, and half of Lombardia (singles disciplines only). I do intend to include them at some point but there’s just too much work. If anyone is interested in helping, please let me know.