Judging bias on the Grand Prix post CoP revision (numbers!)

Shanshani

On the Ice
Joined
Mar 21, 2018
[SUB]In a prolonged fit of insanity[/SUB], I decided to compile all of the judges' scores from the Grand Prix events in order to make it easier for fans to examine the judging records of judges throughout this season and determine whether, and which, judges exhibited evidence of bias in their scoring. I figured I would post it now in anticipation of the Grand Prix Final, especially as some of these judges will inevitably show up on the judging panels there.

First, I created event reports which detailed how much a judge over- or under-scored a skater relative to the other judges, for every single skater and every single judge on the Senior Grand Prix. In these reports, I entered an abbreviated version of the protocols from each event and had my spreadsheet calculate how much each judge differed from the average of the other judges on the panel in three measures: Total Score (abbreviated TS), Average Raw GOE per element (out of 5), and Average Raw PCS per component (out of 10). Edit: a further explanation of how to read the data is available a few posts below this, which I recommend people read, especially if they are confused.

Here are the event reports:**
Skate America
Skate Canada
Cup of Finland
NHK Trophy
Rostelecom
IdF

So, for instance, taking the first entry of the first event (Skate America Men’s) as an example, Patty Klein (CAN) scored Nathan Chen 10.03 points higher than the average of the other judges, gave him GOE scores that were 0.58/5 points higher on average per element, and PCS scores that were 0.12/10 higher per component, whereas Stanislava Smidova scored him 6.9 points below the average of the other judges, gave him GOEs 0.37 lower, and PCS 0.13 lower.
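For anyone who wants to poke at the method itself, here's a minimal sketch of the deviation calculation in Python (the skaters, judges, and scores below are made up for illustration, not taken from the actual reports):

[CODE]
# Hypothetical panel: skater -> {judge: total score}.
scores = {
    "Skater A": {"J1": 288.79, "J2": 280.10, "J3": 277.42},
    "Skater B": {"J1": 250.00, "J2": 255.30, "J3": 251.20},
}

# Each judge's deviation is their score minus the average of the OTHER
# judges' scores -- not minus the official score (see the note below).
deviations = {}
for skater, panel in scores.items():
    for judge, s in panel.items():
        others = [v for j, v in panel.items() if j != judge]
        deviations[(skater, judge)] = round(s - sum(others) / len(others), 2)

# A judge's MEAN in the reports is their average deviation across skaters.
judges = {j for panel in scores.values() for j in panel}
mean = {j: round(sum(d for (_, jj), d in deviations.items() if jj == j)
                 / len(scores), 2)
        for j in judges}
[/CODE]

The same calculation is then repeated for average GOE per element and average PCS per component.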

Have a question about whether a certain judge was being unfair? Compare their scores to the other judges!

Additionally, I calculated the average nationalistic bias of each judge. More specifically, I determined the average TS/GOE/PCS difference for skaters sharing the same nationality as the judge in question and the average TS/GOE/PCS difference for skaters of a different nationality, and took the difference between the two (which is represented by the somewhat obscurely named DELTA in the event reports). You can understand this as the average number of "bonus" points a skater receives from a certain judge when they share a nationality with that judge. Once I did all these calculations, I started to compile them into a database of judges.
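In spreadsheet-free terms, DELTA is just the difference of two averages. A minimal sketch (the function and variable names here are mine, not labels from the reports):

[CODE]
def delta(judge_nation, skater_nations, devs):
    """DELTA: mean deviation for the judge's conationals minus mean
    deviation for everyone else. devs[skater] is this judge's deviation
    from the rest of the panel, as computed above."""
    same = [d for s, d in devs.items() if skater_nations[s] == judge_nation]
    other = [d for s, d in devs.items() if skater_nations[s] != judge_nation]
    if not same or not other:
        return None  # undefined if the judge scored no (or only) conationals
    return sum(same) / len(same) - sum(other) / len(other)
[/CODE]

A positive DELTA means the judge's conationals did better under them, relative to the panel, than everyone else did.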

You can find it here.

In this database, you can find average nationalistic biases averaged across all the events that the judge judged. So for instance, if you look at the first entry as an example, you will see that Rebecca Andrew (AUS) scored Australian skaters virtually the same as she scored non-Australian skaters, on average, across all events. You can also see the Total Score average broken down by discipline, how many skaters she scored, and whether she was a relatively lenient or strict judge relative to the panels she's been on (she was slightly on the stricter side). Then, on each of the specific judges' pages, you can see the deviations of their scores from the other judges on the same panel, for each skater, for each event that they've judged so far on the Grand Prix. So if you want to take a closer look at any particular judge, you can!

Unfortunately, this database is not complete, as I still need to input roughly 40 of the 100ish judges who judged on the Grand Prix (yeah, it's a lot of work). However, I did get almost all of the judges from the big federations (Russia, USA, Canada + Japan) as well as most of the judges who judged more than one or two events. So there should be plenty of points of interest.

General (if tentative) conclusions: Canadian judges are a mixed bag (though Canada doesn’t exactly have a lot of very competitive skaters this year), with some quite biased ones and some quite fair ones. American judges are mostly bad, as are Russian judges. Japanese judges largely avoid being too biased, although one or two are edging a little close. Rounding out the smaller feds, Israeli, Chinese, and Ukrainian judges seem pretty terrible, but most of the other small feds are all over the place as far as bias is concerned.

In terms of total score bias per discipline, the biggest biases so far are, in order, 1. Dance 2. Men 3. Ladies 4. Pairs. Men above Ladies and Pairs makes sense because there are more points available for the judges to distribute in Men, but Dance has the least available points...

Some caveats:
1. The data set for some judges is rather limited, so I wouldn’t necessarily draw hard conclusions about judges who’ve only scored one or two of their own skaters.
2. Similarly, past judging behavior is not necessarily predictive of future judging behavior. Some judges who’ve shown fair behavior before (e.g. Agita Abele (LAT), whom I analyzed in a previous post that I’m too lazy to find right now) now appear to exhibit biased behavior, and a judge who in the past behaved in a manner consistent with bias may not exhibit the same bias in the future.
3. I’m not too sure that averaging total score across disciplines is very helpful as anything more than a crude picture of a judge’s bias. The GOE and PCS averages should be more comparable across disciplines, however. The numbers also behave a bit strangely when a judge has scored skaters of their own nationality in one discipline but not in others. Overall, however, I think the most informative numbers are the average bias a judge exhibits in a given discipline.
4. There may be a few errors in the protocols, as I have not had the time to proofread them carefully. However, I do think I’ve caught most of the large errors, so any remaining errors should not alter final numbers too much. If you spot an error, please let me know!
5. This set of data only tells you how much a judge over- or under- scores skaters relative to other judges. It does not tell you whether a specific skater is “correctly” scored by the judges as a whole. So it can’t really speak to questions like, for instance, whether Alina Zagitova or Shoma Uno or whoever are overscored, only whether a specific judge scored them above or below other judges and whether that might be related to nationality.

* Note that this is not a comparison between the judges’ scores and the official score. The formula is not Judge’s score minus official score, it’s Judge’s score minus the average of the other judges’ scores.

** I’ve also done some of the Senior Bs, though those numbers are not included in the judges database. For those interested, here are Autumn Classic, Ondrej Nepala, and half of Lombardia (singles disciplines only). I do intend to include them at some point but there’s just too much work. If anyone is interested in helping, please let me know.
 

gkelly

Record Breaker
Joined
Jul 26, 2003
I'm afraid the math is beyond me.

Is there a way to take into account whether a judge is scoring higher or lower than the rest of the judging panel, on average, for the whole event?

E.g., it's not very meaningful to point out that the Freedonian judge scored the Freedonian skater 0.25 higher than the rest of the judging panel, if that same judge also scored all the other skaters in the event an average of 0.25 or 0.20 higher than the rest of the panel.

Or that the Sylvanian judge scored the Freedonian skater 0.5 lower than the rest of the panel if that judge also scored all the rest of the skaters (including the Sylvanian skater, though not as much) lower than the rest of the panel as well.
 

andromache

Record Breaker
Joined
Mar 23, 2014
I'm afraid the math is beyond me.

Is there a way to take into account whether a judge is scoring higher or lower than the rest of the judging panel, on average, for the whole event?

E.g., it's not very meaningful to point out that the Freedonian judge scored the Freedonian skater 0.25 higher than the rest of the judging panel, if that same judge also scored all the other skaters in the event an average of 0.25 or 0.20 higher than the rest of the panel.

Or that the Sylvanian judge scored the Freedonian skater 0.5 lower than the rest of the panel if that judge also scored all the rest of the skaters (including the Sylvanian skater, though not as much) lower than the rest of the panel as well.

So I am bad at math and spreadsheets, but I believe you can look at their standard deviations for each skater and draw conclusions based on that?

For example, for Skate America, ISR judge Alexei Beletski has a whole lot of negative numbers in his column for everyone (except Bychenko). So I am assuming that means that the ISR judge was overall pretty low-scoring?

Honestly I have no idea what any of these numbers mean and I am just guessing. But I do think the spreadsheets answer what you're asking. I'm just not sure how. :laugh:

But this is super interesting.
 

Shanshani

On the Ice
Joined
Mar 21, 2018
I'm afraid the math is beyond me.

Is there a way to take into account whether a judge is scoring higher or lower than the rest of the judging panel, on average, for the whole event?

E.g., it's not very meaningful to point out that the Freedonian judge scored the Freedonian skater 0.25 higher than the rest of the judging panel, if that same judge also scored all the other skaters in the event an average of 0.25 or 0.20 higher than the rest of the panel.

Or that the Sylvanian judge scored the Freedonian skater 0.5 lower than the rest of the panel if that judge also scored all the rest of the skaters (including the Sylvanian skater, though not as much) lower than the rest of the panel as well.

This is taken into account by the bias scores, which compare how a judge scores skaters of their own nationality against skaters of other nationalities: the judge's average difference from the rest of the panel on home-nation skaters minus their average difference on all other skaters. So those already eliminate the effect of an individual judge's overall leniency or strictness.

If you want to look at an individual score from a judge, you can compare it to that judge's mean (average) score deviation. Underneath each competition column you'll see a number labeled 'mean', which tells you the average difference between the judge's scores and the other judges' scores. So for instance, judge Beletski's average difference at SkAm Men's is -6.81, which means he typically scored skaters 6.81 points lower than the other judges. So you'll see that his -11.51 underscore of Nathan Chen is more severe than average, but not as far out of the norm for him as it would be for a more lenient judge.
 

Shanshani

On the Ice
Joined
Mar 21, 2018
I may have written this a little hastily since I wanted to get it out before I fly to the GPF, so I can see why it's confusing. The basic idea is this. First, I created an array representing the differences between each judge's scores and the average of the other judges. This is what the numbers in the summary page of the event reports mean. So for instance, Patty Klein gave Nathan Chen a total score of 288.79, whereas the other judges gave Nathan an average of 278.76. Therefore Patty Klein "overscored" Nathan by 288.79-278.76=10.03 (overscore here isn't a judgment about the quality of her scoring, it's just a description of her score being higher than the other judges' on average). That's what the 10.03 you see under her column in the Nathan Chen row represents, and the same idea applies to the other cells.

Now of course, as gkelly pointed out, that number by itself doesn't tell you a lot. If Patty Klein is a lenient judge in general, then 10.03 may not be particularly unusual. Therefore, in order to contextualize these score deviations, I have included a number representing the judge's average score deviation from the other judges across the competition in the same discipline, labeled "MEAN". If we look at Patty Klein's average, we see that it is 0.06, which tells us that she doesn't have a particular tendency to over- or under-score in general, in contrast to, say, Alexei Beletski, who has an average of -6.81, which means he underscores skaters by about 7 points compared to other judges on average. So 10.03 is a bit high for Patty Klein, but we can also look at her other scores to see whether large over- or underscores are common for her. We can see that while 10.03 is Klein's biggest deviation, there are other deviations of similar magnitude, so maybe she just particularly likes Nathan Chen. (The calculations are then repeated for average GOE out of 5 per element and average score out of 10 per PCS component.)

So that's how to interpret the arrays of numbers next to skaters' names. Now, when it comes to calculating national bias, I do two things. First, I find the average of the score deviations of skaters of the same nation as the judge, and then I find the average of the score deviations of skaters of different nations from the judge. Let's take US judge Wendy Enzmann as an example this time (still using SkAm Men's as our example competition). Enzmann judged 3 US skaters at this competition: Nathan Chen, Vincent Zhou, and Jimmy Ma. Her score deviations were, respectively, 4.65, 3.56, and 1.58. So the average of her same-nation score deviations was (4.65+3.56+1.58)/3=3.26, which is the number listed under MEANSAME (...maybe I should have switched the stats labels to more transparent ones). This means that on average, she scored US skaters 3.26 points higher than other judges. Her average score deviation for non-US skaters, on the other hand, was 0.27, which means she scored non-US skaters 0.27 higher than other judges. In order to calculate her bias, I found the difference between these numbers (this is given as DELTA in the competition reports but was renamed "bias" in the judges database...hmm, yeah, I really should have relabeled it). Enzmann's bias was 3.26-0.27=3.00 (accounting for rounding), which is fairly modest, especially for men's. You can think of these as 3 apparent bonus points Nathan, Vincent, and Jimmy got from Enzmann for being American.
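The same arithmetic as a quick sanity check (numbers copied from the paragraph above; the 2.99 vs. 3.00 gap is just the rounding mentioned):

[CODE]
us_devs = [4.65, 3.56, 1.58]            # Chen, Zhou, Ma deviations
meansame = sum(us_devs) / len(us_devs)  # 3.263  (MEANSAME)
meanother = 0.27                        # from the event report (MEANOTHER)
bias = meansame - meanother             # 2.993 from these rounded inputs;
                                        # the unrounded spreadsheet data gives 3.00
[/CODE]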

In the judges database, this information was all recorded. The database additionally combines data from different competitions, in order to get a bigger picture of a judge's biases. I'm still thinking about exactly how I want combining competition data to work, but for now I think the most helpful thing to look at is the biases for each discipline (labeled "Men bias", "Ladies bias", etc.). These are calculated by pooling all the score deviations from all the competitions in a specific discipline that a judge has judged, re-calculating the average score deviation for both home and non-home skaters over the combined data set, and then taking the difference between those averages again (now reported as "bias" instead of "delta"), which represents the average "bonus points" skaters of the same nationality as the judge got across competitions in that discipline.

I also combined data across disciplines in the "GOE bias", "PCS bias", and "Score bias" columns, though I'm a little conflicted about how well that worked.

I also color coded it (somehow I forgot to mention this)--red means that the bias level was fairly high (dark red means even higher), green means the bias level was low. I used 0.1, 0.4, and 0.8 as cutoffs for GOE and PCS (<0.1 = green, between 0.1 and 0.4 = colorless, between 0.4 and 0.8 = light red, higher than 0.8 = dark red) and 1, 6.5, and 13 as cutoffs for total score, but those numbers are a little arbitrary (I don't have all the judges' data together yet, so it's difficult to make a non-arbitrary determination at this point) and it's mainly supposed to serve as a visual aid. This is a work in progress, so I'll probably wind up changing things once my judge data is all compiled into the database.
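For reference, here is the color-coding rule spelled out as a lookup (the cutoffs are the ones stated above; binning negative biases by magnitude is my assumption, since the post only gives positive cutoffs):

[CODE]
def color(bias, kind):
    # Cutoffs from the post: GOE/PCS use 0.1 / 0.4 / 0.8,
    # total score uses 1 / 6.5 / 13.
    lo, mid, hi = (0.1, 0.4, 0.8) if kind in ("GOE", "PCS") else (1.0, 6.5, 13.0)
    b = abs(bias)  # assumption: bin by magnitude in either direction
    if b < lo:
        return "green"       # low bias
    if b < mid:
        return "colorless"
    if b < hi:
        return "light red"   # fairly high
    return "dark red"        # even higher
[/CODE]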
 

icetug

Medalist
Joined
Apr 23, 2017
Just random thoughts.

Statistics can be misleading. Let's take NHK. The Japanese judge was heavily biased towards one of the Japanese skaters (+12.06), but gave a neutral score to the other (+1.18). Summing up, her "bias score" comes out average :laugh:

Sometimes a judge gives average (or even higher than average) scores, but places a skater 1-2 spots lower in their own tally. So it's not only the scores given to a skater that count.

The highest level of national bias: the Canadian judge who gave Samarin 13.12 pts less than the average in France. After Nicolas's WD there were no Canadians at IdF. But Samarin's 2nd place could (and, after Yuzuru's WD, would) have given him a spot at the GPF. A Canadian skater got it.

And last, but definitely not least: I can't understand how the scores from two judges can differ by more than 40 points :scard7: (Boyang's at IdF, but especially Yaroslav's at NHK - since the judge who gave the highest score wasn't even Ukrainian).
 

Ziotic

Medalist
Joined
Dec 23, 2016
Just random thoughts.

Statistics can be misleading. Let's take NHK. The Japanese judge was heavily biased towards one of the Japanese skaters (+12.06), but gave a neutral score to the other (+1.18). Summing up, her "bias score" comes out average :laugh:

Sometimes a judge gives average (or even higher than average) scores, but places a skater 1-2 spots lower in their own tally. So it's not only the scores given to a skater that count.

The highest level of national bias: the Canadian judge who gave Samarin 13.12 pts less than the average in France. After Nicolas's WD there were no Canadians at IdF. But Samarin's 2nd place could (and, after Yuzuru's WD, would) have given him a spot at the GPF. A Canadian skater got it.

And last, but definitely not least: I can't understand how the scores from two judges can differ by more than 40 points :scard7: (Boyang's at IdF, but especially Yaroslav's at NHK - since the judge who gave the highest score wasn't even Ukrainian).

Actually, if Samarin had got second he would have made the GPF over Jun. I get what you're saying, but arguably the positions were correct with Nathan, Jason, Alexandre.
 

Shanshani

On the Ice
Joined
Mar 21, 2018
Just random thoughts.

Statistics can be misleading. Let's take NHK. The Japanese judge was heavily biased towards one of the Japanese skaters (+12.06), but gave a neutral score to the other (+1.18). Summing up, her "bias score" comes out average :laugh:

Sometimes a judge gives average (or even higher than average) scores, but places a skater 1-2 spots lower in their own tally. So it's not only the scores given to a skater that count.

The highest level of national bias: the Canadian judge who gave Samarin 13.12 pts less than the average in France. After Nicolas's WD there were no Canadians at IdF. But Samarin's 2nd place could (and, after Yuzuru's WD, would) have given him a spot at the GPF. A Canadian skater got it.

And last, but definitely not least: I can't understand how the scores from two judges can differ by more than 40 points :scard7: (Boyang's at IdF, but especially Yaroslav's at NHK - since the judge who gave the highest score wasn't even Ukrainian).

A few things. 1. In the absence of greater context, it's actually a little hard to interpret the Japanese judge's scores. It's possible she was really interested in pushing one skater and not the other, or it's possible she wanted to push both but her objective judgment was that skater one's qualities were better than what the other judges saw and skater two's were worse. There's always a bit of randomness that comes into play--you can't just take one large score deviation and assume, without further context, that it's the result of one thing or another. A 12-point deviation, while suspiciously large, sometimes also happens for no obvious reason. On the other hand, if you can show that a judge consistently overscores skaters from the same country, then that 12-point deviation starts to look a lot more suspect. That's what the averages are for.

2. Unfortunately, my methods are not good at handling targeted under-scoring, because I'm trying to avoid having to make judgment calls about whether a judge has an incentive to underscore specific skaters, though you're of course welcome to draw your own conclusions from the raw data (that's what it's there for--unfortunately some things are difficult to show using statistical methods alone). That said, under-scoring does have some effect, since it lowers the non-home-country skater average, so it's not completely invisible in the summary numbers either.
 

drivingmissdaisy

Record Breaker
Joined
Feb 17, 2010
"Canadian judged are a mixed bag (though Canada doesn’t exactly have a lot of very competitive skaters this year), with some quite biased ones and some quite fair ones. American judges are mostly bad, as are Russian judges. Japanese judges largely avoid being too biased, although one or two are edging a little close. Rounding out the smaller feds, Israeli, Chinese, and Ukrainian judges seem pretty terrible, but most of the other small feds are all over the place as far as bias is concerned."

There can be numerous explanations here. The Japanese men and women were overall outstanding on the GP this season, so there was usually no need for a Japanese judge to boost the scores because they were already high from most of the judges.
 

Miller

Final Flight
Joined
Dec 29, 2016
It looks to me as if the DELTA figure on the individual spreadsheets and the score bias (column E) on the judges database are the key to it all. (N.B. There have been several posts since I started working this out, so apologies if I'm repeating what's already been said.)

For example, judge Beletski's score for Alexei Bychenko at SA is 9.08 points higher than the average of the other 8 judges, but his scores across the field average 6.81 lower than the average of the other 8 judges, giving an overall 'swing' to Alexei of 17.34 marks (the Delta figure on the database) compared to the other skaters.
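A note on the arithmetic here, since 9.08 and 6.81 don't sum to 17.34: the -6.81 MEAN includes Bychenko himself, so the swing has to be taken against the mean over the other skaters only. A sketch, assuming the standard 12-skater Grand Prix field:

[CODE]
n, mean_all, home_dev = 12, -6.81, 9.08  # field size (assumed), MEAN, Bychenko
mean_others = (n * mean_all - home_dev) / (n - 1)  # about -8.25
swing = home_dev - mean_others  # about 17.33, matching the 17.34 DELTA
                                # up to rounding of the published figures
[/CODE]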

Then on the Judges database judge Beletski has given Israeli skaters an average overall score bias of 13.19 points (the figure in column E), with the Men's bias being the same 17.34 points, i.e. it looks as if he's just judged the single Men's competition. It looks to me as if judge Beletski is on pretty shaky ground when it comes to national bias.

The question then is whether there are other examples out there, and judge Kantor from Israel, immediately next to judge Beletski, is already on shaky ground, plus there's a fair number of other examples. Plenty of food for thought, especially as the competitions rack up.

The one thing I would say the spreadsheet, or rather the Deltas side of it, can't really highlight is whether there were any specific shenanigans in a particular competition.

For example, a judge may only really favour the one of their skaters who's genuinely in contention and mark the rest fairly. Similarly with their competitors: they may mark one or two down, but for many of the others there's no need, so they mark them fairly. This may show up as lower Deltas, making the judge seem quite fair, when in fact they could be up to no good when the chips are really down - see all the judging shenanigans at the Olympics and the Chinese judges being banned. If you ever look closely at the judges' tallies on something like SkatingScores, you'll find there were all sorts of things going on below the seemingly serene surface; the best that can be said is that a lot of it ended up cancelling out, thanks to the ISU's tendency to pack its panels with judges from countries whose skaters are in contention.

N.B. I shall return tomorrow with an analysis I've done of how much the Chinese and German judges' judging of the Pairs competition might have affected the result - luckily of course Savchenko/Massot won so it's just an intellectual exercise, but I think you'll find the results very interesting, and food for thought about judges not marking their own countries' skaters.
 

Shanshani

On the Ice
Joined
Mar 21, 2018
It looks to me as if the DELTA figure on the individual spreadsheets and the score bias (column E) on the judges database are the key to it all. (N.B. There have been several posts since I started working this out, so apologies if I'm repeating what's already been said.)

For example, judge Beletski's score for Alexei Bychenko at SA is 9.08 points higher than the average of the other 8 judges, but his scores across the field average 6.81 lower than the average of the other 8 judges, giving an overall 'swing' to Alexei of 17.34 marks (the Delta figure on the database) compared to the other skaters.

Then on the Judges database judge Beletski has given Israeli skaters an average overall score bias of 13.19 points (the figure in column E), with the Men's bias being the same 17.34 points, i.e. it looks as if he's just judged the single Men's competition. It looks to me as if judge Beletski is on pretty shaky ground when it comes to national bias.

The question then is whether there are other examples out there, and judge Kantor from Israel, immediately next to judge Beletski, is already on shaky ground, plus there's a fair number of other examples. Plenty of food for thought, especially as the competitions rack up.

Yes, that's exactly what Delta is supposed to be--the 'swing' between how a judge scores other skaters and their own skaters. You explained it really well, thanks!
 

Shanshani

On the Ice
Joined
Mar 21, 2018
Oh, a note on the discipline bias numbers in the judges database. Those only show up if a judge has judged their own skaters in the relevant discipline. Sometimes judges judge events where they don't have a skater of their own nationality, so a judge only having a bias number for, say, Men's doesn't mean they haven't judged other disciplines, just that they haven't judged any conationals in those disciplines. Sometimes you'll see a discrepancy between the overall bias numbers and the discipline bias numbers even though the judge only has numbers for Men in the discipline columns. This is because the numbers for the other disciplines are still included in the calculation of the overall numbers (though I'm not sure whether I want to keep this, tbh--some weird things happen when you throw all the numbers in together, especially for judges from small feds that don't have skaters in each discipline).
 

drivingmissdaisy

Record Breaker
Joined
Feb 17, 2010
Statistics can be misleading. Let's take NHK. The Japanese judge was heavily biased towards one of the Japanese skaters (+12.06), but gave a neutral score to the other (+1.18). Summing up, her "bias score" comes out average

It's interesting to see how a country's judge will strongly favor one skater over another. In the Nagano Olympics LP, the USA judge actually gave Tara the lowest total score (11.6, tied with two other judges) of all nine scores she received: https://www.sports-reference.com/olympics/winter/1998/FSK/womens-singles-free-skating.html
 

karne

in Emergency Backup Mode
Record Breaker
Joined
Jan 1, 2013
Country
Australia
I'm dreadfully terrible with numbers and maths.

What I want to know is consistency. For example, Judge X may have scored Chen 5 points lower than the average. So you shout, "bias!". But if Judge X scored every skater in that event 4-5 points lower than the average, then is it really bias or have we just got a judge whose interpretations are stricter than others?
 

[email protected]

Medalist
Record Breaker
Joined
Mar 26, 2014
[SUB]In a prolonged fit of insanity[/SUB], I decided to compile all of the judges' scores from the Grand Prix events in order to make it easier for fans to examine the judging records of judges throughout this season and determine whether, and which, judges exhibited evidence of bias in their scoring....

.....American judges are mostly bad, as are Russian judges. Japanese judges largely avoid being too biased, although one or two are edging a little close....

I am sure many people won't bother to sift through the numbers and will instead look for "expert conclusions" and trust them. Since I did a sanity check and my conclusion differs from the one stated above, I would suggest refraining from generalizations and leaving those to readers themselves.

Namely, "mostly bad Russian judges" vs. Japanese judges who "avoid being too biased". If we use the term "bias" we imply one of the following:

1. favouring own country skates
2. lowballing skaters who directly compete with own country skaters
3. lowballing skaters from countries the judge does not like
4. favouring skaters from "friendly countries"

Although I personally could cite numerous cases of #3, where a certain judge from a small federation consistently gives low marks to Russian skaters, I would not use that as an argument, so as to avoid "conspiracy theory" accusations.

So, let's stay with #1 and #2. And let's stay with ladies, because Japan is not competitive with Russia in pairs and dance and Russia is not competitive with Japan in men, no matter what marks the judges give.

The bias in ladies is meaningful because it may affect the results (and arguably it affected those at NHK). And the most relevant comparison is NHK vs. Rostelecom - home competitions to Japan and Russia respectively.

In Japan, the Russian judge gave Russian skaters a combined +23.32 points above the mean and Japanese skaters 9.71 points below it; the total span is hence 33.03. The Japanese judge gave +12.22 to the Japanese and -9.09 to the Russians; the total span is 21.31.

In Moscow, the Russian judge gave +22.08 to the Russians and +2.02 to the Japanese; the span is 20.06. The Japanese judge gave +8.75 to the Japanese and -17.01 to the Russians; the span is 25.76.

Now add the two and we have an overall bias of 53.09 for Russia vs. 47.07 for Japan. Frankly speaking, I don't see a difference in these numbers that justifies "mostly bad" vs. "avoid being too biased" when it matters.
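For anyone checking the arithmetic, the spans are just the gaps between the two deviations (figures as quoted above):

[CODE]
nhk_rus, nhk_jpn = 23.32 + 9.71, 12.22 + 9.09  # 33.03 and 21.31
ros_rus, ros_jpn = 22.08 - 2.02, 8.75 + 17.01  # 20.06 and 25.76
totals = (nhk_rus + ros_rus, nhk_jpn + ros_jpn)  # 53.09 vs. 47.07
[/CODE]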

Once again, the tables are thorough and instructive. Thanks a lot. But if any generalizations come out of them, one should be prepared to be challenged.
 

Mawwerg

Final Flight
Joined
Nov 8, 2014
Sometimes strange things happen. If somebody could explain to me what was behind the RSA judge's marks in the ladies FS at the JGP in Ostrava, I'll be :party2:. I was shocked at the time and I still cannot forget it.
 

Shanshani

On the Ice
Joined
Mar 21, 2018
I'm dreadfully terrible with numbers and maths.

What I want to know is consistency. For example, Judge X may have scored Chen 5 points lower than the average. So you shout, "bias!". But if Judge X scored every skater in that event 4-5 points lower than the average, then is it really bias or have we just got a judge whose interpretations are stricter than others?

gkelly asked the same question you did, so please see my answers to that, as well as Miller's excellent recap. In short, this is completely accounted for by the bias statistics. (Also, I don't even bother flagging a difference of 4-5 points because it's so common as to be unremarkable, and many judges have biases more in the neighborhood of 7-9+.)
 

Shanshani

On the Ice
Joined
Mar 21, 2018
I am sure many people won't bother to sift through the numbers and will instead look for "expert conclusions" and trust them. Since I did a sanity check and my conclusion differs from the one stated above, I would suggest refraining from generalizations and leaving those to readers themselves.

Namely, "mostly bad Russian judges" vs. Japanese judges who "avoid being too biased". If we use the term "bias" we imply one of the following:

1. favouring own country skates
2. lowballing skaters who directly compete with own country skaters
3. lowballing skaters from countries the judge does not like
4. favouring skaters from "friendly countries"

Although I personally could cite numerous cases of #3, where a certain judge from a small federation consistently gives low marks to Russian skaters, I would not use that as an argument, so as to avoid "conspiracy theory" accusations.

So, let's stay with #1 and #2. And let's stay with ladies, because Japan is not competitive with Russia in pairs and dance and Russia is not competitive with Japan in men, no matter what marks the judges give.

The bias in ladies is meaningful because it may affect the results (and arguably it affected those at NHK). And the most relevant comparison is NHK vs. Rostelecom - home competitions to Japan and Russia respectively.

In Japan, the Russian judge gave Russian skaters a combined +23.32 points above the mean and Japanese skaters 9.71 points below it; the total span is hence 33.03. The Japanese judge gave +12.22 to the Japanese and -9.09 to the Russians; the total span is 21.31.

In Moscow, the Russian judge gave +22.08 to the Russians and +2.02 to the Japanese; the span is 20.06. The Japanese judge gave +8.75 to the Japanese and -17.01 to the Russians; the span is 25.76.

Now add the two and we have an overall bias of 53.09 for Russia vs. 47.07 for Japan. Frankly speaking, I don't see a difference in these numbers that justifies "mostly bad" vs. "avoid being too biased" when it matters.

Once again, the tables are thorough and instructive. Thanks a lot. But if any generalizations come out of them, one should be prepared to be challenged.

The conclusions come purely out of the judge summaries, where you can see that most Russian judges have worse records than most Japanese judges. Note that I did not say that Japanese judges were unbiased (I'd certainly be willing to say Nobuhiko Yoshioka is biased, for instance), simply that they weren't too biased as a whole--that just means that compared to other judges, their records weren't particularly egregious by my methodology. Of course, this does not preclude a particular judge from being biased at a particular competition.
 

Elucidus

Match Penalty
Joined
Nov 19, 2017
Interesting and important topic :agree:
However, it seems to me you missed the elephant in the room. While the GOE/PCS judges are pretty important, they can only influence the score so much; there is a built-in score-averaging mechanism in the system for a reason. But there are officials who are much, much more important than the ones you have in the database: tech callers. Their calls - repeated calls, phantom calls, or ignored mistakes - combined with their nationality would be much more interesting and important data, considering the huge influence they have on the score: a call practically forces all the regular judges to lower their scores significantly, and vice versa.
So without that data I can't see this analysis as being as complete as it should be. Moreover, the tech caller data should be done first, IMO - they are that important. Unfortunately, questionable calls (or the lack of them) are numerous. It often happens that the same callers in the same competition are very strict with one skater and very lax with another - literally making podium placements as they see fit. This situation is very concerning and should be addressed ASAP.
 

crazydreamer

On the Ice
Joined
Mar 3, 2007
I'm afraid the math is beyond me.

Is there a way to take into account whether a judge is scoring higher or lower than the rest of the judging panel, on average, for the whole event?

E.g., it's not very meaningful to point out that the Freedonian judge scored the Freedonian skater 0.25 higher than the rest of the judging panel, if that same judge also scored all the other skaters in the event an average of 0.25 or 0.20 higher than the rest of the panel.

Or that the Sylvanian judge scored the Freedonian skater 0.5 lower than the rest of the panel if that judge also scored all the rest of the skaters (including the Sylvanian skater, though not as much) lower than the rest of the panel as well.
I’ll one-up you. I don’t even think it’s necessarily a problem that a judge clearly has favorites. It’s a taste issue. But what I would be interested in seeing is, for example, evidence that a judge marked one medalist 10% higher than the other judges and their closest competitor 10% lower.
 