Interesting and important topic :agree:
However, it seems to me you missed the elephant in the room. While the GOE/PCS judges are certainly important, they can only influence the score so much; there is a built-in averaging mechanism in the system for a reason. But there are judges who are much, much more important than the ones in your database: the tech callers. Their multiple calls, fake calls, or ignored mistakes, combined with their nationality, would be far more interesting and important data, considering the huge influence they have on the score: a call practically forces all the regular judges to lower their marks significantly, and vice versa.
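For anyone curious what that "built-in averaging mechanism" looks like: as I understand the ISU rule, on panels of five or more judges the single highest and single lowest mark are dropped and the rest averaged (a trimmed mean), which blunts any one judge's outlier. A minimal sketch, with a made-up GOE panel for illustration:

```python
def panel_average(scores):
    """ISU-style trimmed mean: drop the single highest and single
    lowest score, then average the remainder."""
    if len(scores) < 3:
        raise ValueError("need at least 3 judges to trim both extremes")
    trimmed = sorted(scores)[1:-1]
    return sum(trimmed) / len(trimmed)

# A hypothetical 9-judge GOE panel with one outlier (+5 vs. the rest):
goes = [2, 2, 3, 2, 2, 3, 2, 2, 5]
print(round(panel_average(goes), 2))  # → 2.29; the +5 is dropped entirely
```

So a single wildly inflated or deflated mark is discarded outright, which is why a tech caller's decision (which applies to every judge's base value at once) can matter more than any one judge's GOE.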
So without that data I can't see this analysis as complete as it could be. Moreover, the tech-caller data should be analyzed first, IMO; they are that important. Unfortunately, questionable calls (or the lack of them) are numerous. It often happens that the same callers in the same competition are very strict with one skater and very lax with another, effectively deciding podium placements as they see fit. This situation is very concerning and should be addressed ASAP.
This is a lot of work. However, for now, it doesn't mean much, as the sample is too small.
I will give just one example... a Czech judge has −11... from judging ONE Czech skater ONCE... It could be tough love... it could be that she doesn't like Michal... it could be that she has seen him skate SO MANY times that she recognizes a lot of his problems... it could be anything. But the one thing it doesn't mean, for me, is that she judges her own country's skaters less favorably... not with a sample of ONE skater, in ONE event.
Maths is great. Love it... and I am sure the OP is well aware of the sample-size issue... so if the OP is willing to keep going, I'd like to see this again once most judges have judged several events and several skaters from their respective countries... though, like all of you, I opened my eyes very wide when I saw some countries showing mostly in red...
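To put the sample-size point in numbers: the uncertainty of an average deviation shrinks like 1/sqrt(n), so one event tells you almost nothing, while the same average over many events starts to mean something. The spread value below is an assumption purely for illustration, not anything measured from the actual data:

```python
import math

# Hypothetical spread of a judge's per-event deviation from the panel.
sigma = 8.0  # assumed standard deviation of a single-event deviation

# Standard error of the judge's *average* deviation over n events:
for n in [1, 4, 9, 25]:
    se = sigma / math.sqrt(n)
    print(f"n={n:>2}: standard error = {se:.2f}")
```

With n = 1 the standard error equals the full single-event spread (8.00 here), so a one-off −11 is well within "could be anything" territory; only over many events does the average pin down a real tendency.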
Also, as someone mentioned earlier: it will be interesting to see how Japanese judges mark their skaters if they mess up. So far, they have had it easy here on the GP, as many of their skaters have done so well that all the other judges agreed...
Finally, to me, what matters is the power of an individual judge over the overall result. IF a judge, and I mean ONE judge, has managed to change the end result through their OWN scoring, then I am not happy.
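The "power of one judge" worry can be roughed out with arithmetic. Under a trimmed mean on a 9-judge panel, 7 marks actually count, so a judge whose mark isn't the one trimmed moves a panel component mark by delta/7, and the factored PCS multiplies that further. All numbers below are illustrative assumptions, not taken from any real protocol:

```python
# Rough effect of ONE judge inflating ONE component mark.
judges_counted = 7        # 9-judge panel minus the trimmed high and low
component_factor = 2.0    # assumed PCS factor (e.g. a men's free skate)
delta = 0.75              # assumed inflation of a single component mark

effect_on_total = delta / judges_counted * component_factor
print(f"{effect_on_total:.3f} points")  # → 0.214 points
```

Per mark that is small, but repeated across five components and every GOE it can add up to more than the 0.43-point Olympic pairs margin mentioned below; and if the judge's mark happens to be the trimmed extreme, it contributes nothing at all.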
Again, thanks for all the work.
But tech calls (levels, downgrades, underrotations, !, e, etc.) don't lend themselves to this kind of statistical analysis. Classifying calls as "fake" or "questionable" or "ignoring mistakes" is inherently subjective, for the same reasons that those calls themselves are necessarily subjective.
Earlier on in the thread I said that I would provide an analysis of the Pairs competition at the Olympics and how the result might have been affected by the scoring of the Chinese and German judges – remember that Savchenko and Massot won by 0.43 points, 235.90 to 235.47 and that the Chinese judge was banned afterwards.
...
This is really great and powerful analysis, Mr. Miller. :agree: More of these, please.
I think it is also worth remarking that even for judges' scores, statistical analyses rest on the underlying assumption that the majority of judges are "right" and the odd-ball judge is "wrong," either because of national bias or incompetence. So we have to be careful before we frame our conclusions in accusatory language.
That said, all this information is very cool. The ISU did us a big favor by eliminating anonymous judging.
Confusing correlation with causation is a common pitfall. For example, there might be a correlation between a person's risky behavior and the likelihood that she will vacation in Las Vegas. But the causal claim (if a person vacations in Las Vegas, she is likely to take risks in life) does not hold. I, for example, have visited Las Vegas 7 or 8 times in my life, but I do not like risk; my reason for going was never gambling but concerts, shows, and fine dining.
Finally, the only cases that matter are those where a judge's bias affected a significant outcome: podium placements, number of country spots, etc. Those should be studied and discussed. The rest is just a pastime.
The math eludes me as well, but all I have to say is that bias is inherent to judged sports. Judges are human, and humans are prone to bias. Someone like Hanyu or Simone Biles is going to be given the benefit of the doubt, while someone like Keegan Messing or Mariah Bell isn't. Also, once you get a reputation for under-rotation, all of your jumps are going to be scrutinized, whereas someone with a reputation for landing clean jumps may escape that. Judges are human. It makes it kind of a relief to watch track or swimming or something that's timed sometimes (at least for me).
I’d be an oddball judge for certain!!
Even if a judge's bias didn't affect the results of a particular competition, it could very well affect the results of future competitions. Therefore, it's important that we discuss and keep account of judges' biases.