Two Olympic Judges suspended by ISU | Page 14 | Golden Skate

cohen-esque

Final Flight
Joined
Jan 27, 2014
I'm actually currently working on a method to evaluate nationalistic judge bias based on score differentials across competitions. Basically, I'm taking a judge and looking at all their scores for all senior-level top-tier competitions in the same segment of the same field (I'm testing this right now on Senior Men's Free Skate scores). Then I find the difference between their scores and the average score the skaters received from all of the judges; let's call that measure the score differential. Then I split the score differentials into two groups: scores for skaters from the same country as the judge, and scores for skaters from a different country.
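A minimal sketch of the score-differential step described above, for anyone who wants to try it. The data layout, the judge labels (`J1` etc.), the country codes and every number are invented for illustration; this is not the poster's actual program, only one plausible way to set it up.

```python
from statistics import mean

# Hypothetical data: each entry is one skate, with every judge's total for it.
# Country codes, judge labels and scores are all made up.
skates = [
    {"skater_country": "AAA", "scores": {"J1": 180.0, "J2": 176.0, "J3": 175.0}},
    {"skater_country": "BBB", "scores": {"J1": 160.0, "J2": 162.0, "J3": 161.0}},
    {"skater_country": "AAA", "scores": {"J1": 171.0, "J2": 166.0, "J3": 167.0}},
    {"skater_country": "CCC", "scores": {"J1": 150.0, "J2": 151.0, "J3": 152.0}},
]

def split_differentials(skates, judge, judge_country):
    """Return (home, other): the judge's score minus the panel average,
    grouped by whether the skater shares the judge's country."""
    home, other = [], []
    for skate in skates:
        if judge not in skate["scores"]:
            continue  # this judge was not on the panel for that skate
        panel_avg = mean(skate["scores"].values())
        diff = skate["scores"][judge] - panel_avg
        if skate["skater_country"] == judge_country:
            home.append(diff)
        else:
            other.append(diff)
    return home, other

home, other = split_differentials(skates, "J1", "AAA")
print(mean(home), mean(other))  # average differential for each group
```

Note that the panel average here includes the judge's own score, which is how the post words it; leaving the judge out of the average would be an equally defensible choice.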
I’m not sure that this approach is completely sound... what about negative national bias against other countries?

For example, say the USA judge exhibits pretty much no bias towards her own skaters and her scores are well within the average for other skaters as a group. But she had a terrible date with Didier 20 years ago and has ruthlessly lowballed French skaters ever since. Your method would say she’s golden, because the French teams scoring deviation from her would get washed out in her average deviation for all the teams as part of the second group.

For a potential real-life example, the French judge’s scores for Savchenko/Massot at the Olympics look suspiciously like retaliation at first glance.

I find the average score differential for both groups and compare them. So for instance, if the average score differential for Judge A for skaters from her country is +5, while the average score differential for her scores for skaters not from her country is -1, that means that on average, she scores skaters from her country 5 points higher than other judges, and scores skaters not from her country one point lower. I then perform a one-tailed t-test (that's a statistical test used to see if the difference between two averages is significant, i.e., not due to random chance) on the two groups of data.

Why not a two-tailed test? It seems like we’d want to account for the deviations in both directions.

This produces a statistic called p, which measures the probability, between 0 and 1, that a difference between the two averages at least this large would arise by random chance if the "real" averages are equal (in other words, if the judge was totally unbiased). So if p is, say, 0.05, that means a difference this large will occur by chance only 5% of the time if the judge is unbiased, and therefore the judge is most probably not unbiased.
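To make the idea of a one-sided p-value concrete, here is a rough sketch. It swaps the t-test for an exact permutation test, because the Python standard library has no t-distribution; the question it answers is the same one (how often would a home-versus-other gap this large show up if the labels were arbitrary?), but it is a different test from the one used in the post. The differentials are invented.

```python
from itertools import combinations
from statistics import mean

def one_sided_permutation_p(home, other):
    """Exact one-sided p-value: the share of all possible relabelings whose
    home-minus-other mean gap is at least as large as the observed gap."""
    pooled = home + other
    observed = mean(home) - mean(other)
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(home)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # small tolerance so the observed split always counts as a hit
        if mean(group_a) - mean(group_b) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total

# Hypothetical score differentials for one judge (made-up numbers).
home = [5.0, 4.0, 6.0]
other = [-1.0, 0.0, -2.0, 1.0, -1.5]
p = one_sided_permutation_p(home, other)
print(p)  # 1/56, about 0.018: only the observed split is this extreme
```

A two-sided version, as raised in the reply above, would compare the absolute value of each gap with the absolute observed gap, so that a judge who marks home skaters unusually low is caught as well.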

I've looked at 11 judges so far, and it's not pretty. All 11 judges scored their home country skaters higher than they scored skaters not from their home country, usually 4-6 points higher. In 9 out of 11 cases, that difference is significant, i.e., highly unlikely to be the result of an unbiased judge, using p=0.05 as the significance threshold, which is the standard significance threshold for scientific studies. If we lower the threshold to p=0.01, 6 judges' scoring records still show a statistically significant difference between how they score skaters from different countries versus their own.

I do think that 0.05 is a bit of a high standard for judges, when you consider all the myriad ways they could end up with significant deviation in the *total* score: especially under the new percentage-based +/-5 GOE system, what's behind an eyebrow-raising score might be something fairly innocuous, like they consistently lowballed their own skater on low-value spins but gave higher GOE to all the high-value jumps. It’s a conspicuous pattern that’s probably not just coincidental, but is it national bias? (And considering the trimmed mean I think we can risk raising the threshold for Type 1 errors slightly.) I’d go with 0.10, although one could arbitrarily set it to something like 0.075 if we don’t want to give quite so much leeway.

In 2 cases (Weiguang Chen and Peggy Graham, a USA judge), the p value is so small that the program I'm using to calculate it ceases to show a number and instead just displays p < 0.00001. Lorrie Parker also clocks in at an extremely low p=0.000024. Drugs are approved on the backs of higher p values than that!
:eeking:
 

gkelly

Record Breaker
Joined
Jul 26, 2003
I do think that 0.05 is a bit of a high standard for judges, when you consider all the myriad ways they could end up with significant deviation in the *total* score: especially under the new percentage-based +/-5 GOE system, what's behind an eyebrow-raising score might be something fairly innocuous, like they consistently lowballed their own skater on low-value spins but gave higher GOE to all the high-value jumps. It’s a conspicuous pattern that’s probably not just coincidental, but is it national bias?

First you'd need to check whether this judge has a pattern of being more generous with jump GOEs than spin GOEs across the board.

It would be suspicious if the judge were doing that only for the home country skaters (and maybe the opposite for rivals). Not so much if that pattern were just how the judge tended to score in general.
 

wonderlen3000

Final Flight
Joined
Nov 8, 2008
This is a load of crap. Nothing changes in figure skating. Chinese people (in general) tend to be less outspoken than Westerners, so it's always easy to scapegoat them. Language barrier, culture, federation power, whatever... If they are suspended, I think Canadian judges should be suspended too.

Didn't the Canadian judge give a blatantly higher score to T/V and a very low score to the defending champions from France at the Olympics? I wonder why that judge is not suspended too. Oh right... only judges from Canada, Russia and Europe can get away with it.

But that aside, does it mean the ISU is going to scrap anonymous judging?

Maybe a solution would be to drop the home judge's score (if applicable) along with the highest and lowest? Not sure how much of a difference that is going to make in terms of the average score, going from 6 judges to 5.
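A quick sketch of what that proposal would do to one element's average. It assumes a nine-judge panel with one judge per country, drops one highest and one lowest mark, and uses invented country codes and GOE marks; the function names are made up for the example.

```python
def trimmed_mean(marks):
    """Average after discarding one highest and one lowest mark."""
    if len(marks) < 3:
        raise ValueError("need at least three marks to trim high and low")
    kept = sorted(marks)[1:-1]
    return sum(kept) / len(kept)

def trimmed_mean_without_home(marks_by_country, skater_country):
    """Same trimmed mean, but the home judge's mark is removed first."""
    marks = [m for c, m in marks_by_country.items() if c != skater_country]
    return trimmed_mean(marks)

# Hypothetical GOE marks for a skater from country "AAA" (made-up data).
goe = {"AAA": 5, "BBB": 3, "CCC": 4, "DDD": 4, "EEE": 3,
       "FFF": 4, "GGG": 3, "HHH": 4, "III": 2}

current = trimmed_mean(list(goe.values()))        # averages 7 marks
proposed = trimmed_mean_without_home(goe, "AAA")  # averages 6 marks
print(current, proposed)
```

In this made-up case the home judge's 5 was already the discarded high mark, so removing the home judge first just means a second-highest mark gets trimmed instead and the average moves only slightly. It does nothing about low marks from other judges.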
 

narcissa

Record Breaker
Joined
Apr 1, 2014
I’m not sure that this approach is completely sound... what about negative national bias against other countries?

For example, say the USA judge exhibits pretty much no bias towards her own skaters and her scores are well within the average for other skaters as a group. But she had a terrible date with Didier 20 years ago and has ruthlessly lowballed French skaters ever since. Your method would say she’s golden, because the French teams scoring deviation from her would get washed out in her average deviation for all the teams as part of the second group.

For a potential real-life example, the French judge’s scores for Savchenko/Massot at the Olympics look suspiciously like retaliation at first glance.

Definitely important to look at the differential... in my graph, Canadian judges seem to have a moderate amount of bias towards their own skaters in GOE, but they have the most negative differential towards other skaters. So the aggregate amount shows that Canada is one of the worst offenders in GOE scoring.

And the Japanese judge seems to be generous in GOE across the board.
 

Sam-Skwantch

“I solemnly swear I’m up to no good”
Record Breaker
Joined
Dec 29, 2013
Country
United-States
Won't help. Negative bias remains.

Do you maybe have a negative bias though toward ISU? I’m trying to say this in a friendly way though. :) I’ve met and worked with ISU officials and members of USFSA and for the most part they seem quite reasonable. The judges I’ve spoken with specifically have a good knowledge of the sport and seem to be pretty sincere in how they judge and welcome discussion.

For me...I just try to enjoy the talented skaters and show patience in the judging and expect it to continue down a slow path of reform. Lots of young people are involved with the sport and will eventually take the reins. I don’t think it will be long until the ISU starts to rate and better monitor the judges and curb some of the Wild West vibe we have now. If not...we’ll do it on the Internet and they’ll know that we know. I actually think the on ice product is better than it’s been in the 15 years I’ve been a part of and watched the sport. At least when we’re talking from the top down and as a whole. Not only the average elite skater but the average lower level skaters seem to be doing far better and performing at a higher level than I’ve seen.

Obviously YMMV but from my point of view the disgruntled fandom I see at GS does not accurately portray the heart of the sport that I see every day on the ice all the way from the pros to the “joes”.

Anyway....just wanted to share my perspective. I’m quite optimistic TBH in spite of a few bad seeds. I just wish they’d lower the impact PCS has and focus more on the skating from the judging table. Save the rest for the shows and galas and for fans to feel on an individual basis. For me...watching skating competitions is fun and the scores have little impact on the connection I have with the skaters.
 
Joined
Dec 9, 2017
Do you maybe have a negative bias though toward ISU? I’m trying to say this in a friendly way though. :)

Well, yes, but what I meant was that discarding the scores of judges who mark up skaters from their own country won't actually stop the judges who might mark down skaters from that country.
 

Sam-Skwantch

“I solemnly swear I’m up to no good”
Record Breaker
Joined
Dec 29, 2013
Country
United-States
Well, yes, but what I meant was that discarding the scores of judges who mark up skaters from their own country won't actually stop the judges who might mark down skaters from that country.

Haha...gotcha ;)
 

Sam-Skwantch

“I solemnly swear I’m up to no good”
Record Breaker
Joined
Dec 29, 2013
Country
United-States
Doesn't matter though. If they actually do improve, I can only pass out of shock :shrug: No way to go but up. I can only quote what the data being presented here is, and it seems fair to me currently.

I’m 100% on board with finding a way to score, rate, and assess the judges’ marks. I want the judges to have freedom with their scores though. I like the idea of 7’s in some categories and 9’s in others. Especially if PCS remains as important as it is now going forward, we need to be careful not to create a circumstance where judges are just trying to score what everyone else is. I’m not really interested in corridors in comparison with other judges though, because it doesn’t really solve anything. I’m more into analyzing whether they are scoring their top skaters and those skaters’ rivals in a manner consistent with how they score the rest of the field, and then whether it matches up with similar patterns from the rest of the panel (especially the referee) after the fact.

I think publicly knowing which judges are warned, at which events, and especially for which marks would be a great step in the right direction. That would certainly reveal the face of some of the games we see played by some and hopefully deter similar actions from them and even other judges. I also think the next big step will be to stop the federations from choosing the judges at their events. They can still accommodate and house them as before, but hand-selecting them... I’m sorry. They’ve just proven time and time again to be incapable of doing it fairly.
 

Spirals for Miles

Anna Shcherbakova is my World Champion
Record Breaker
Joined
Aug 25, 2017
Does anyone have data on Alison Ryan's scores from the ladies at Skate Canada this year? She gave Maria Sotskova a 58 for a clean skate... (well except for the UR on the 2A, which she doesn't call anyway)
 

Eclair

Medalist
Joined
Dec 10, 2012
She didn't score Vincent above Yuzuru. Both the GOEs and the PCS she gave to Vincent were significantly lower than those she gave to Yuzuru, especially the PCS.

The difference in the actual scores was smaller than for most of the other judges. Therefore her scores plus the base values ended up higher than their scores plus the base values. But none of the judges had any control over the base values. That was all down to Zhou doing harder jumps, not getting +REP on any of those jumps, etc.

Yes she did. If she had been the only judge at that competition, then the placements for the men's FS would have been: a medal for Vincent, no medal for Hanyu.

There is no way to justify the way she scored except for blatant nationalistic bias. She was the ONLY judge out of all 9 judges to give Hanyu 2's in GOE on his 4S and 4T. Every other judge gave him 3. Whenever she could go down in GOE or PCS for Hanyu she did, while still staying within the corridor. Like posters before have already said, there is basically 0% possibility that this happened by coincidence.

She and judge 5 also gave Hanyu the lowest PCS out of all judges. At the same time, she gave Nathan the highest PCS by FAR out of all 9 judges while going as high with GOE as possible while staying in the corridor. So no, she isn't just generally strict or generally loose.

In the end, she scored Hanyu approx. 10 points less than the other judges. By your logic, that should get investigated.
She scored Nathan 10 points more than the other judges. This also, by your logic, should get investigated.
She scored Boyang 11 points less than the other judges. Again, by your logic, this should get investigated.

If this isn't national bias, I don't know what is.

She clearly gave Vincent the highest GOE out of all 9 judges. This, in combination with her scoring Hanyu 10 points less than the average judge, led to her placing Vincent above Hanyu.
 

Matt K

On the Ice
Joined
Oct 3, 2013
I don't think whether it's conscious or unconscious really matters. What matters is whether the judge can grade fairly or not, not whether the judge is grading unfairly consciously or unconsciously.

Yes. I would even say that judges who are unconsciously biased and are giving out the types of egregiously eye popping marks like Sharon Rogers or the 2 Chinese judges have not just a bias problem anymore, but a competence problem. If people are seriously arguing that Sharon Rogers or the 2 Chinese judges, or Lorrie Parker or whoever, is sincerely and genuinely giving out their kinds of marks absent cheating/corruption, then these types of judges seriously have a competence/knowledge/skill problem that is hurting the sport. These people shouldn't be on the panel just because they genuinely (although I highly doubt it) believe the marks they are handing out are fair.

It may not matter for the skaters' results, or for strategies to minimize the effect of such bias on the results short of removing these judges from the judging ranks.

But it does matter when people start throwing around words like "corruption" and "fraud" or even "strategizing" or "manipulating."

Judging is not a social experiment. These people who sit on the judging panel are affecting the results, and I would go as far as to say, the livelihoods of the skaters that they are judging. Take your psychological social experiment to competitions that don't hand out World and Olympic medals, where competitors don't care where they are ranked/placed. Why do you think the 2 Chinese judges were suspended? It is not that much of a stretch to infer that there are quite possibly other factors at play when judges who have a history of overmarking/undermarking skaters and teams like Rogers and the 2 judges mark the way that they do that go simply beyond irresponsible bias (conscious or unconscious).

If not, then there would not be a need to suspend judges like these, or even talk about them for 20+ pages in a forum like this. The fact is, whenever judges overmark/undermark teams and skaters there is a very valid perception to the public and everyone else that manipulation and dishonesty is at play. And this is a very valid perception that should be raised.
 

Miller

Final Flight
Joined
Dec 29, 2016
Some feds have better skaters, and you can’t assume that the mean aptitude of skaters from all countries is the same. All true.

I think that this is utterly irrelevant to Miller’s analysis, but I don’t think they expressed what I believe they’re trying to say very clearly. I think that when they say “placed their skater 1st” or “nine out of nine 1st places” they aren’t referring to the judge ranking the skater in the competition. They are referring to the judge’s marks for an individual skater compared to the whole panel: so, a “1st place out of 9” here would mean that they marked their own skater the highest of all the judges, a second place would mean they gave the second-highest marks, and so on.

Basically, find how high their scores are vs everyone else, and convert it to an ordinal value for easy math.

We should expect that the judges’ marks for individual skaters are similar. For example, the skater in 5th place who scores 175 points in the FS should receive ~175 points from every single judge, with some normal coincidental variation that *is not due to judging bias.* Since scores can be very, very close in skating competitions, the total scoring deviations from a judge for an individual skater may not actually tell us anything useful. (See 2014 Worlds, Ice Dance podium.) And their scores ranked against the other judges’ may not tell us anything, since with close scores even small deviations can produce dramatic swings there. If the US judge is scoring one USA skater one time and gives them the highest marks of the panel by 2.78 points (spread out over 12-13 elements and 5 PCS categories), is that due to bias, or just coincidence?

But if that country has more than one skater in an event, then that country’s judge will be evaluating a home skater on more than one occasion. Especially if that judge is on the short and free panels and their skaters progress to the free. In this case, since without bias it should be coincidental whether they give their own skaters the second-highest or third-lowest marks or whatever else of the panel, we expect their average ordinal to be 4.5, so, for obvious reasons, let’s say we’ll take either the 4th or 5th ordinal mark among the judges and say that’s fine.

For an example, say three Armenian Pairs make the FS at Worlds. There is an Armenian judge on the SP and the FS panels, so she evaluates her own Pairs six times (3 Pairs x 2 segments.) The break-down of her marks, ranked against the other judges’ marks, looks like so:
Armenian Pair 1: SP 1st, FS 7th highest marks
Armenian Pair 2: SP 1st, FS 3rd highest marks
Armenian Pair 3: SP 4th, FS 6th highest marks

So, on average, her ordinal score for the Armenian team is 3.67. We said 4th highest is good so she’s a little high... but is she too far out of bounds? Let’s check it against her other scores for her non-home country skaters... and, voila, 3.81. So her relative scores for her own skaters are really just fine, compared to how she usually scores.
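The Armenian example can be sketched in a few lines. The `ordinal_rank` helper and the four-judge panel are invented for illustration (ties here share the best available rank, which is only one possible convention); the six ordinals are the ones listed in the post.

```python
from statistics import mean

def ordinal_rank(own_mark, panel_marks):
    """Rank of one judge's mark within the panel, 1 = highest mark.
    panel_marks includes the judge's own mark; ties share the best rank."""
    return 1 + sum(1 for m in panel_marks if m > own_mark)

# Hypothetical panel for one skate: the judge in question gave 70.0.
example_rank = ordinal_rank(70.0, [70.0, 68.5, 66.0, 71.2])

# Ordinals from the example above: Pair 1 (SP 1st, FS 7th),
# Pair 2 (SP 1st, FS 3rd), Pair 3 (SP 4th, FS 6th).
home_ordinals = [1, 7, 1, 3, 4, 6]
home_average = mean(home_ordinals)
print(example_rank, round(home_average, 2))  # 2 and 3.67
```

The same average taken over the judge's marks for everyone else gives the comparison figure (3.81 in the example).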

Meanwhile we get the Moroccan judge, who gives his skaters an average 1.8. His average for non-Moroccan skaters is 6.78. Now, *he* is clearly biased.

In both cases, you need to establish an acceptable corridor of scores for those judges, which I will arbitrarily set within one standard deviation of the mean. Say the SD is 1.0 (cuz I’m lazy.) That means a judge would come under scrutiny for:
-Likely national (or other specific) bias if they give only certain skaters an average ordinal of 3.5 or higher up the panel, or 5.5 or lower down
-Generally incompetent judging if they score all skaters like that.
-Both bias and incompetence if it’s mixed (i.e., the Moroccan judge.)
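The three cases above can be written as a small rule, sketched here under the post's own arbitrary settings (centre 4.5, SD 1.0). The function names and the exact boundary handling are assumptions, and treating "home fine, everyone else out" as specific bias is one reading of the list, not something the post spells out. With ranks running 1 through 9 on a nine-judge panel the no-bias midpoint would be (9 + 1) / 2 = 5, so `center` is left as a parameter.

```python
def side(avg, center=4.5, sd=1.0):
    """Classify an average ordinal (1 = highest marks on the panel)."""
    if avg <= center - sd:
        return "high"  # numerically small ordinal means generous marks
    if avg >= center + sd:
        return "low"
    return "ok"

def review_flag(home_avg, other_avg, center=4.5, sd=1.0):
    """Map a judge's home and non-home average ordinals to the post's cases."""
    home = side(home_avg, center, sd)
    other = side(other_avg, center, sd)
    if home == "ok" and other == "ok":
        return "no flag"
    if home == other:
        return "generally incompetent judging"  # out of line for everyone
    if home != "ok" and other != "ok":
        return "both bias and incompetence"     # out of line in opposite directions
    return "likely specific bias"               # only one group is out of line

print(review_flag(3.67, 3.81))  # the Armenian judge in the example
print(review_flag(1.8, 6.78))   # the Moroccan judge in the example
```

With these settings the Armenian judge comes back unflagged and the Moroccan judge lands in the mixed case, matching the post.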

Notice that it doesn’t matter what the rank of their skaters in the competition— their aptitude, in other words— was for this method to work. It looks only at whether a judge gives relatively high marks compared to both herself and her peers for her own skaters. (Or, relatively low marks.)


Now, I’m not necessarily endorsing this approach. I see a *lot* of potential issues with it, particularly regarding very small relevant datasets. In real life, it would be just about useless at identifying if the Moroccan judge is biased since it’s such a small fed with few skaters. (Are there even any Moroccan judges?) It would work best for judges from successful, big feds with lots of opportunities, and over whole seasons rather than individual competitions. (It would work extremely well, though, to determine whether all the judges, together, of a certain fed tend to be more or less biased, which seems closest to Miller’s post regarding national bias, in general.)

*Or this isn’t anything at all like what Miller meant, in which case... take it as my own interesting proposal for a possible system of judging review, I guess.

**Or I’ve made some obvious glaring error and this whole thing is nonsensical. I have had a very long work week, so it is very possible. :bed:

Ah, this makes a lot of sense and is an interesting study, yes. If that is what Miller meant, I apologize (and certainly find his results very interesting...).

Thank you. This is exactly what I meant, and yes I probably didn't express it too clearly - have probably looked at it for far too long and have assumed everyone realised where I was coming from.

Re issues raised, I did look at all Federations' judges but of course they're very small sample sizes. However back in my post on page 10 (don't know how to link to it exactly) I did point out 11 countries who never placed their skater lower than 3rd out of the 9 judges, plus there were a number of others who got by by the skin of their teeth, so it's not just the big federations.

Also, just on doing it segment by segment it was so noticeable how it was across the board. Look at a skater's scores where their country's judge was on the panel. Another one to add to the collection. In fact you were taken aback when someone marked their skater low e.g. the Belgian judge giving Loena Hendrickx's free skate at Worlds the lowest of the panel - I could quite happily have given it 10 points more!

So yes, it was looking at how a judge marked their skater relative to the other 8 judges on the panel, not the final ranking vs all the other skaters - I 'looked across' on the judges tallies section of skating scores - link to Ladies LP at Worlds with Loena in 9th out of the 9 judges, but ranked 7th and final position 6th http://skatingscores.com/2018/wc/ladies/long/tss/. Try it with the Olympics, you'll be shocked, it's absolutely relentless.
 

Miller

Final Flight
Joined
Dec 29, 2016
The strength of such a system would be in establishing national bias exists generally on the grand scale (as in, all competitions in all disciplines over the whole season, or all competitions a particular judge has ever been in)— and here it shouldn’t favor the large feds that much, even with strategic overscoring. That’s how Miller was originally using it in his post: my post is a bit of a clarification of what I thought he meant to Narcissa, and then a brief general breakdown of the idea of evaluating the judging using their scores as ordinal variables.

Have just discovered this post - there's so much going on what with Shanshani's thread about judging bias and so on.

Yes, this is exactly what I had in mind. There was no worrying about where a skater (or their judge) came from, or what their margin might have been over other skaters. It was simply a numbers game, and if you've got a big enough sample size you can prove national bias one way or another.

For example if there'd only been 21 skaters at the Olympics judged by their own country's judge then it would have been meaningless, but at 221 it was good enough to my mind. At 2,221 for the entire season you would have had total and definitive proof, though I would suggest that the distribution of placements at Worlds and the Olympics was more than enough.

Also you could use it for individual judges who have judged their skaters on multiple occasions. It will be interesting to see how many occasions Shanshani's judges actually judged their own skaters. You will need a reasonable sample size, but looking at all international judges might not be too difficult a task if someone was prepared to do it - I'm a pen and paper person - but just finding how many times a judge judged their own country's skaters and what their average position was should flag up any potential candidates quite quickly e.g. I did it for Lorrie Parker in the other thread and came up with an average position for her own skaters of 1.97 based on 37 judging occasions. Anyone with an average of say 3.00 or less over multiple occasions would surely be a candidate for further investigation.
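As a rough back-of-the-envelope check on how sample size enters, here is a sketch that turns an average position into a z-score. It assumes that, absent bias, a judge's position among nine judges is equally likely to be anything from 1 to 9, that occasions are independent, and that there are no ties; none of that is established in the thread, and the function name is made up. The inputs are the figures quoted in the post (average position 1.97 over 37 occasions).

```python
from math import sqrt

def rank_mean_z(avg_rank, occasions, panel_size=9):
    """Standard errors between an observed average position and the
    no-bias midpoint, assuming uniform, independent ranks 1..panel_size."""
    expected = (panel_size + 1) / 2        # 5.0 for nine judges
    variance = (panel_size ** 2 - 1) / 12  # variance of a uniform rank 1..n
    std_error = sqrt(variance / occasions)
    return (avg_rank - expected) / std_error

z = rank_mean_z(1.97, 37)
print(round(z, 1))  # roughly -7.1 under these assumptions
```

Under the same assumptions an average of 3.00 over only a handful of occasions is far less striking, which is the point about needing a reasonable sample size.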
 

[email protected]

Medalist
Record Breaker
Joined
Mar 26, 2014
What kind of delusional conspiracy is it when people can count, identify and track steps, see a striking discordance with the level the tech panel awarded, and notice a shady call? Or when they see the very same skater skating virtually the same way, with a miraculous increase in points on everything they do, and conveniently receiving no calls for the same errors they made all season? It produces a fraudulent result, so calling a fraud a fraud is not egregious from that standpoint.

Convoluted rules, various top-down powers and the subjective factor unfortunately make it easier to cheat, and it still continues, even if nothing beats that one instance in my memory. You can always trust people to defend it and proclaim that they ''saw everything differently because they were there, they are tech specialists, skilled judges, they were closer ...''.

The fact remains, they are not willing to deal with the core of their corruption, because they'd essentially have to get rid of themselves.

Other people posted 3F UR which was not called either. It's a pointless discussion because the parties involved are biased. I am biased, sure, but I am not attaching labels to opponents.

Yes, I did not count the number of turns. But I was in the stands that day and I saw "the skate of the life" from the winner with tons of passion and desire and just one stepout on a double jump. I saw another skate of the life by Mao who like a samurai gave her all. And I saw a lifeless skate from the silver medalist who herself then said: "I am glad it's all over". And she skated like that. And she lacked one triple. And her spins were the spins from previous ages with the exception of the signature layback. The judges apparently saw it the same way, because what is egregious is to believe that "evil Putin" made ISU judges from various countries independent of Russia push the winner up, disregarding what actually happened on the ice.
 

NymphyNymphy

On the Ice
Joined
Aug 26, 2017
Other people posted 3F UR which was not called either. It's a pointless discussion because the parties involved are biased. I am biased, sure, but I am not attaching labels to opponents.

Yes, I did not count the number of turns. But I was in the stands that day and I saw "the skate of the life" from the winner with tons of passion and desire and just one stepout on a double jump. I saw another skate of the life by Mao who like a samurai gave her all. And I saw a lifeless skate from the silver medalist who herself then said: "I am glad it's all over". And she skated like that. And she lacked one triple. And her spins were the spins from previous ages with the exception of the signature layback. The judges apparently saw it the same way, because what is egregious is to believe that "evil Putin" made ISU judges from various countries independent of Russia push the winner up, disregarding what actually happened on the ice.

Yuna had the most disrespectful experience in her career. Of course she's glad it's over. Adelina skated the "skate of her life". Not going to disagree, but saying you are going to execute hard elements is DIFFERENT from actually executing them properly. Yuna lacked a triple? Adelina's free program alone was given an extra 20+ points. Her PCS rose from mid 7's to 9.5's in a matter of months. It took Yuna, Mao, Carolina YEARS of competitions to obtain 9's.

Yuna was lifeless?? You need to get your eyes checked. Apparently you have zero understanding of musicality or maybe you are just blinded by patriotism. Yuna performed a sensual tango to signal the end of her career and to thank everyone who supported her. Adelina performed like an aggressive junior skater going through the movements. Her edges lacked maturity and flow. Keep being delusional. Putin spent millions on Sochi. There was no way he was going to let anyone but a Russian win the women's event after Russia lost the hockey game.
I'm just glad we've now got some Russian girls worthy of a gold medal. Adelina's gold will forever be disregarded by the majority of the skating community. To call Yuna's program lifeless is an absolute insult to skating as a sport of art and technique. Think before you speak.
 

narcissa

Record Breaker
Joined
Apr 1, 2014
Other people posted 3F UR which was not called either. It's a pointless discussion because the parties involved are biased. I am biased, sure, but I am not attaching labels to opponents.

Yes, I did not count the number of turns. But I was in the stands that day and I saw "the skate of the life" from the winner with tons of passion and desire and just one stepout on a double jump. I saw another skate of the life by Mao who like a samurai gave her all. And I saw a lifeless skate from the silver medalist who herself then said: "I am glad it's all over". And she skated like that. And she lacked one triple. And her spins were the spins from previous ages with the exception of the signature layback. The judges apparently saw it the same way, because what is egregious is to believe that "evil Putin" made ISU judges from various countries independent of Russia push the winner up, disregarding what actually happened on the ice.

What 3F?

Funny, I saw something completely different. A performer who skated to a beautiful, languid piece of music in a way that no other skater could have pulled off, because anyone can do upbeat, especially when you're skating well, but it really takes someone who actually feels and connects to the music to be able to portray that type of nostalgia, and the story she wanted to portray on the ice: the bittersweet end to a long, tumultuous career, and the fans that she wanted to say goodbye to. And it made me cry because of how beautiful it was, a perfect embodiment of figure skating: art and athletics.

And then I saw the eventual winner who clearly had the skate of her life and couldn't contain her excitement even before the end of her program, which made me tear up but in a different way. Because I was happy for her. But we don't score a skater by how happy we are that they finally skated well for once.

Clearly we saw different things, and there's nothing wrong with that. But it just goes to show that judging shouldn't be about these personal feelings, and we do need to count the steps. And the transitions. And the URs... if we are going to judge fairly.

But then again, if Alina Zagitova could get 75 PCS in her first senior season, and that's appropriate judging, then by that standard Sochi wasn't really that controversial after all.
 