In Praise of the Judges | Page 2 | Golden Skate

In Praise of the Judges

GGFan

Record Breaker
Joined
Nov 9, 2013
Wow. Thanks Shanshani!! You guys are no joke.

Judges marking their own skaters better is definitely an issue that has been going on from the beginning of figure skating. Even with that being the case, how often are these judges able to spread their influence and really get their way? Even if they over score their own by 10 points won't there be underscores to offset that?

Note that this is not just about saying judges are great. Rather, I'm arguing that with the math and the number of outputs the outliers get pulled back toward the mean. There is no verdict that "the judges" agree to, especially in tight decisions.
 

Shanshani

On the Ice
Joined
Mar 21, 2018
Wow. Thanks Shanshani!! You guys are no joke.

Judges marking their own skaters better is definitely an issue that has been going on from the beginning of figure skating. Even with that being the case, how often are these judges able to spread their influence and really get their way? Even if they over score their own by 10 points won't there be underscores to offset that?

Note that this is not just about saying judges are great. Rather, I'm arguing that with the math and the number of outputs the outliers get pulled back toward the mean. There is no verdict that "the judges" agree to, especially in tight decisions.

It's true that scores often average out (but not always!) and there are limitations to analyzing things the way I've done--for instance, if you think that the judges collectively overscore Shoma Uno or whomever, then there's nothing I can really say about that since I only measure how judges score relative to other judges.

But I think about it this way--if a judge consistently shows a propensity to overscore their own skaters by significant margins, that suggests to me that they are not trying very hard to be objective. After all, not every judge has the same problem--for instance, US judge Wendy Enzmann is pretty consistently fair (though do note that this trait is not shared by other US judges)--so it's clearly possible to be unbiased. Therefore, home country biased judging casts a shadow on their judging as a whole. If a judge is willing to push home country skaters so hard, what stops them from trying to manipulate scores in other ways?
 

GGFan

Record Breaker
Joined
Nov 9, 2013
I guess I look at it more from a game theory or economics perspective. There will always be incentives for judges to behave badly and there will always be biased judging. If we accept that premise my question is whether the system has mechanisms to prevent any one judge from affecting the outcome too much. I think the amount of inputs and the use of means does a lot of that work. Yes, we still end up with some screwy results from time to time, but the individual biased judge is policed somewhat.
 

Shanshani

On the Ice
Joined
Mar 21, 2018
I guess I look at it more from a game theory or economics perspective. There will always be incentives for judges to behave badly and there will always be biased judging. If we accept that premise my question is whether the system has mechanisms to prevent any one judge from affecting the outcome too much. I think the amount of inputs and the use of means does a lot of that work. Yes, we still end up with some screwy results from time to time, but the individual biased judge is policed somewhat.

I think that's a little too defeatist. Some judges are able to resist the siren song of home-country scoring--why shouldn't we try to have judging panels more full of those kinds of judges? Moreover, an ability to resist home-country scoring also suggests an ability to resist other kinds of incentives to score badly that the judges might face. For instance, if certain feds really do push certain skaters, and certain skaters are over-scored or underscored because of what federation they belong to, then someone who is able to resist the pressure to score home country skaters higher might also be able to resist the pressure to score big-fed or politically-pushed skaters higher. Plus, given the tiny margins that placements are sometimes decided by, even a well designed system that permits too much national bias from individual judges will occasionally have that national bias severely affect outcomes. For instance, last week at GP Finland, the difference between Stanislava Konstantinova and Kaori Sakamoto's scores was 0.15 points. I'm pretty sure if the Belgian judge De Rappard (who underscored Kaori by 10 points, presumably because she wanted Loena Hendrickx to podium) was replaced with an average, non-biased judge, that would have made the difference not only for 2nd and 3rd place but also in terms of which skaters make the Grand Prix Final.
 

GGFan

Record Breaker
Joined
Nov 9, 2013
I think that's a little too defeatist. Some judges are able to resist the siren song of home-country scoring--why shouldn't we try to have judging panels more full of those kinds of judges? Moreover, an ability to resist home-country scoring also suggests an ability to resist other kinds of incentives to score badly that the judges might face. For instance, if certain feds really do push certain skaters, and certain skaters are over-scored or underscored because of what federation they belong to, then someone who is able to resist the pressure to score home country skaters higher might also be able to resist the pressure to score big-fed or politically-pushed skaters higher. Plus, given the tiny margins that placements are sometimes decided by, even a well designed system that permits too much national bias from individual judges will occasionally have that national bias severely affect outcomes. For instance, last week at GP Finland, the difference between Stanislava Konstantinova and Kaori Sakamoto's scores was 0.15 points. I'm pretty sure if the Belgian judge De Rappard (who underscored Kaori by 10 points, presumably because she wanted Loena Hendrickx to podium) was replaced with an average, non-biased judge, that would have made the difference not only for 2nd and 3rd place but also in terms of which skaters make the Grand Prix Final.

I don't want the thread to be only my thoughts. I agree that we should continue to try to improve judging. In other threads I've talked about that as well.

I think what people talk about much less is that these tiny margins are meaningless and us deciding winners on them makes little sense. That is the margins are so tiny that if we had a panel of 10 different judges judge the competition we would probably get different outcomes each time. That's without even getting into biased judging. The margin of human error is much larger than this system acknowledges and that's a very uncomfortable reality. The truth is that Satoko and Liza were tied, but we accept the fiction that one was better because we don't like ties in competitions.
 

Danny T

Medalist
Joined
Mar 21, 2018
Since there were quite a few complaints about Liza's jump GOE being underscored, I've been thinking about it. At first I was inclined to agree - her jumps have textbook take-offs, nice distance and height, and usually a nice flow-out. What's not to love, right? But what I think the judges also consider is the long telegraphing. The triple axel is of course the most noticeable, but her other jumps are not really entered from steps either (the double axel is the least telegraphed). And the rules do say telegraphing is deductible by 2-3 GOE. So most of Liza's jumps tick off bullets 1-4, and bullet 6 (maybe), making it +4-5. If the telegraphing penalty applies, her GOEs become +2-3, which is what she frequently gets. Now, of course the judges could just hate Liza/want to push Satoko/be paid off by JSF/favor home ice/etc.; I really don't know and nobody can prove it. But going by what the rules say, Liza's GOEs are not that inexplicable.
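The arithmetic in that paragraph can be written out as a small Python sketch. It assumes, as the post does, that each positive bullet met is worth roughly +1 (capped at +5) and that telegraphing costs 2-3 GOE; the function name and the clamping to the -5..+5 range are illustrative choices, not the official ISU procedure.

```python
def estimated_goe(bullets_met, reduction):
    """Rough GOE estimate: one point per positive bullet (max +5), minus a reduction.

    Assumption for illustration only: this mirrors the back-of-envelope
    reasoning in the post, not the full ISU GOE guidelines.
    """
    start = min(bullets_met, 5)
    # Keep the result inside the +/-5 GOE scale.
    return max(-5, min(5, start - reduction))


# Bullets met: 4 or 5 (the "+4-5" in the post); telegraphing reduction: 2 or 3.
outcomes = sorted({estimated_goe(b, r) for b in (4, 5) for r in (2, 3)})
print(outcomes)  # [1, 2, 3]
```

Under these assumptions the possible results run from +1 to +3, which covers the +2-3 range mentioned above.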

A few side notes: my hypothetical explanation is only for Liza's jump GOEs in isolation, not relative to Satoko's at all. Plus, from what I observe this telegraphing rule is not very applicable to men (hence why I forgot it exists half the time), especially men doing quads. But I don't think telegraphing is going to fly very well with judges in ladies, we're not in pre-Sochi/Vancouver era anymore. Funnily enough, Liza might want to thank her compatriots for that, ie. Adelina & Yulia.
 

Globetrotter

Medalist
Joined
Jan 17, 2014
The problem is not about a mistake, but multiple, and that's the same argument all the time.

It got really close between Satoko and Liza because judges wanted to get it that close.

I don't honestly get the argument "Satoko = bad technique; Liza = bad artistry so that's why scores are close" because i don't think the two issues are comparable at all.

Elizaveta Tuktamysheva's only real weakness in the components is lack of transitions; the performance is there, the interpretation is there, she has decent skating skills (obviously not on par with Miyahara),... Transitions is 1 element out of 5.

On the other hand Satoko has many mistakes in her jumping technique (edge issues for both flip and lutz, bad take-off, prerotation, underrotation, and the jumps are the smallest you can see in senior ladies), which quite frankly shouldn't let her be a top contender at all. I'm always shocked when I see her TES over 40 and over 70, because that means some judge gave her +3 and +4 for her jumps, which is an actual mistake according to the new rules.

It bugs me soo much because we've seen this soo many times, ISU even changed the rules and here we are again.

Personally, I don’t have any issues with the final placement. My bug here is that Tuks and Satoko’s extreme differences in strengths ought to have been much more obvious. Jumps = Tuks >> Satoko; spins and steps Satoko >> Tuks. With the new +/-5 GOE, this should be more obviously differentiated but alas, the jump GOEs were just not suitably different. PCS wise, Satoko >> Tuks for TR, CO; > for SS, IN; Tuks >> Satoko for PE. I just wish the differences were more suitably scored.
 

Miller

Final Flight
Joined
Dec 29, 2016
I wasn't gonna post this until I was finished, but I've also been tracking judge's scores, and tbh there isn't that much to praise.

Random Senior Bs

Ondrej Nepela


On each spreadsheet you can see how much a judge over/undermarked each skater relative to the average of the other judges. You can also see their average score deviation for same-nation skaters ("MEANSAME"), the average score deviation for different nation skaters ("MEANDIFF"), and the difference between the two ("DELTA"). Underneath you can also see the same stats but for GOE and PCS raw scores.

If you look through, you'll see that there are lots of cases where the difference between how a judge marks their own nation's skaters and different nations' skaters is as much as 10 points or higher, and there are very few cases where judges under-mark their own skaters (which should happen around half of the time if judging were completely fair and score deviations completely random).
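A minimal Python sketch of how those three columns could be computed. The data layout, the function names, and the toy numbers are assumptions made for illustration; they are not the actual spreadsheet formulas or real protocol scores.

```python
from statistics import mean


def deviations(scores, judge):
    """Return {skater: this judge's total minus the mean of the other judges' totals}."""
    result = {}
    for skater, by_judge in scores.items():
        others = [s for j, s in by_judge.items() if j != judge]
        result[skater] = by_judge[judge] - mean(others)
    return result


def bias_summary(scores, judge, judge_nation, skater_nation):
    """Compute MEANSAME, MEANDIFF and DELTA for one judge."""
    dev = deviations(scores, judge)
    same = [d for s, d in dev.items() if skater_nation[s] == judge_nation]
    diff = [d for s, d in dev.items() if skater_nation[s] != judge_nation]
    mean_same, mean_diff = mean(same), mean(diff)
    return {"MEANSAME": mean_same, "MEANDIFF": mean_diff, "DELTA": mean_same - mean_diff}


# Made-up totals from a hypothetical three-judge panel.
scores = {
    "Skater A": {"J1": 70.0, "J2": 64.0, "J3": 66.0},
    "Skater B": {"J1": 60.0, "J2": 62.0, "J3": 64.0},
}
skater_nation = {"Skater A": "X", "Skater B": "Y"}

# Judge J1 is from nation X, the same as Skater A.
summary = bias_summary(scores, "J1", "X", skater_nation)
print(summary)  # {'MEANSAME': 5.0, 'MEANDIFF': -3.0, 'DELTA': 8.0}
```

A positive DELTA means the judge sits higher above the rest of the panel for same-nation skaters than for everyone else.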

The marks of the Russian and Kazakhstani judges at Ondrej Nepela for Rika Kihira's FS were an utter joke. 10 and 12 points below the actual figure, which incidentally would also have reflected one of their scores. They also gave only an 11 point differential between the Free Skates of Elizabet Tursynbaeva and Rika. Elizabet had 3 URs and a fall in her non-3A 7 triple skate. Rika had 2 3As in her 8 triple skate which only had a mistake on the final 3S.

Out of interest this example also shows the effect if you have more than 1 outlier judge. If you average the middle 5 judges' tallies for Rika you will get a figure of 147.27 vs an actual of 147.37 which incidentally shows that all the fine detail of knocking off highest and lowest GOEs etc. doesn't have a great deal of effect - I've tried it for one or two other protocols and you never get a big difference e.g. Elizabet would have had 122.41 vs 122.31 for her FS.

Hence if you remove the Russian judge's score and then take the middle 4 after that i.e. remove the scores of the Kazakhstani and Japanese judge, who also had nothing to be proud of BTW, then you get a Free Skate score of 149.77 i.e. the effect of a 2nd outlier was to reduce Rika's score by 2.4 points, so it really can make a big difference if you've got something like this.

http://skatingscores.com/2019/cssvk/ladies/long/tss/
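The "middle 5" calculation can be sketched in Python as below. The seven judge totals are hypothetical numbers chosen to show the shape of the effect, not the real Ondrej Nepela protocol, and trimming whole totals is only an approximation of the official method, which trims the highest and lowest mark per element and per component.

```python
def trimmed_mean(totals):
    """Drop the single highest and single lowest total, then average the rest."""
    if len(totals) < 3:
        raise ValueError("need at least three scores to trim")
    kept = sorted(totals)[1:-1]
    return sum(kept) / len(kept)


# Hypothetical panel: J1 and J2 are both low outliers for the same skater.
panel = {"J1": 136.0, "J2": 138.0, "J3": 146.0, "J4": 148.0,
         "J5": 149.0, "J6": 150.0, "J7": 153.0}

# Trimming removes J1 (lowest) and J7 (highest), but J2 still counts.
with_both = trimmed_mean(list(panel.values()))

# Take J1 off the panel first; trimming then removes J2 and J7 instead.
without_j1 = trimmed_mean([v for k, v in panel.items() if k != "J1"])

print(round(with_both, 2), round(without_j1, 2), round(without_j1 - with_both, 2))
# 146.2 148.25 2.05
```

With one outlier the trim absorbs it completely; with two in the same direction, the second one stays in the average and drags it down.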

Also out of interest National Bias very probably wouldn't have affected the final outcome of the NHK battle between Satoko and Elizaveta.

Looking at Skating Scores the Russian judge gave Elizaveta a total of 228.01 vs 219.02 actual, the Japanese judge gave Miyahara 224.57 vs 219.47 actual, while the Russian judge gave Miyahara 215.59 vs 219.47 actual, and the Japanese judge gave Elizaveta 214.80 vs 219.02 actual i.e. the Russian judge favoured their own skater more than the Japanese judge did, while the Japanese judge marked down Elizaveta a little more than the Russian judge marked down Miyahara, though not by enough to offset the difference. Overall if you were to remove these judges from the equation entirely the result would almost certainly have been the same, notwithstanding anything that might have come out of the fine detail of removing highest and lowest GOEs etc.
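The four comparisons above are easier to follow as deviations from the actual totals. This Python sketch uses only the figures quoted in the post; the variable and function names are illustrative, and the "swing" measure (how far a judge tilted the Tuktamysheva-minus-Miyahara gap) is an assumption added here, not something from Skating Scores.

```python
# Actual combined totals and the two judges' own totals, as quoted in the post.
actual = {"Tuktamysheva": 219.02, "Miyahara": 219.47}
judge_totals = {
    "Russian judge": {"Tuktamysheva": 228.01, "Miyahara": 215.59},
    "Japanese judge": {"Tuktamysheva": 214.80, "Miyahara": 224.57},
}


def deviations(judge):
    """Judge's total minus the actual total, per skater, rounded to 2 decimals."""
    return {sk: round(judge_totals[judge][sk] - actual[sk], 2) for sk in actual}


def swing(judge):
    """How far this judge moved the Tuktamysheva-minus-Miyahara gap (positive favours Tuktamysheva)."""
    d = deviations(judge)
    return round(d["Tuktamysheva"] - d["Miyahara"], 2)


for judge in judge_totals:
    print(judge, deviations(judge), swing(judge))
# Russian judge {'Tuktamysheva': 8.99, 'Miyahara': -3.88} 12.87
# Japanese judge {'Tuktamysheva': -4.22, 'Miyahara': 5.1} -9.32
```

These are each judge's untrimmed totals, so they show how the judges marked, not how much of that marking survived into the final result.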
 

GGFan

Record Breaker
Joined
Nov 9, 2013
Thanks Miller! I really appreciate all of the posters with more math knowledge than I do adding their expertise. I'm learning new things by actually looking at the numbers instead of just assuming things. :yes2:
 

Shanshani

On the Ice
Joined
Mar 21, 2018
Out of interest this example also shows the effect if you have more than 1 outlier judge. If you average the middle 5 judges' tallies for Rika you will get a figure of 147.27 vs an actual of 147.37 which incidentally shows that all the fine detail of knocking off highest and lowest GOEs etc. doesn't have a great deal of effect - I've tried it for one or two other protocols and you never get a big difference e.g. Elizabet would have had 122.41 vs 122.31 for her FS.

Hence if you remove the Russian judge's score and then take the middle 4 after that i.e. remove the scores of the Kazakhstani and Japanese judge, who also had nothing to be proud of BTW, then you get a Free Skate score of 149.77 i.e. the effect of a 2nd outlier was to reduce Rika's score by 2.4 points, so it really can make a big difference if you've got something like this.

Yes, exactly. It's actually not uncommon at all to have two outlier judges in the same direction (usually negative). If you think about it, we should expect this if judges are behaving in a biased manner--after all, if more than two nationalities are competitive for the podium, then any individual top skater will have more than one judge looking to knock down their score. One outlier judge will get eliminated through trimming, but the other one will still be included and can have a pretty significant influence on the score. Add to this the fact that not every skater, even every top skater, will have a judge of their nationality on the judging panel in any given event (or may only have a judge for one segment), and you can start to see how bias can affect competition results fairly quickly.
 

moriel

Record Breaker
Joined
Mar 18, 2015
I don't want the thread to be only my thoughts. I agree that we should continue to try to improve judging. In other threads I've talked about that as well.

I think what people talk about much less is that these tiny margins are meaningless and us deciding winners on them makes little sense. That is the margins are so tiny that if we had a panel of 10 different judges judge the competition we would probably get different outcomes each time. That's without even getting into biased judging. The margin of human error is much larger than this system acknowledges and that's a very uncomfortable reality. The truth is that Satoko and Liza were tied, but we accept the fiction that one was better because we don't like ties in competitions.

But yep, agree with this.
If you look at my numbers, the deviation from average for each judge seems to be something around 1 point.
That makes differences of 0.5 points in the final score statistically insignificant. While this is not necessarily clear to people, they kinda feel that the small differences could go either way and it's kinda unfair.
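A rough way to put a number on "could go either way" is sketched below in Python. It rests on loud assumptions that are not in the post: the "around 1 point" is treated as the standard deviation of a single judge's total, judges' errors are independent and normally distributed, the panel has 9 judges, and the panel score is a plain mean rather than the trimmed one actually used.

```python
from math import sqrt
from statistics import NormalDist


def flip_probability(margin, judge_sd, n_judges):
    """Chance that a different panel would reverse the order of two skaters.

    Assumes independent, normally distributed judge noise and a plain mean
    across the panel (both are simplifying assumptions).
    """
    # Each panel mean has standard error judge_sd / sqrt(n); the difference
    # between two independent panel means has sqrt(2) times that.
    se_diff = judge_sd * sqrt(2.0 / n_judges)
    return 1.0 - NormalDist().cdf(margin / se_diff)


# Half-point margin, 1-point judge noise, hypothetical 9-judge panel.
p = flip_probability(margin=0.5, judge_sd=1.0, n_judges=9)
print(round(p, 3))
```

A margin of zero gives exactly 0.5 (a coin flip), and the probability falls as the margin grows or the panel gets larger, so the function can be used to see how big a margin needs to be before the order is reasonably safe.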

Random thought, maybe isu should accept this and enable ties. So we would have stuff like

Olympics:
gold: Alina & Zhenya
silver: Osmond
bronze: Satoko

NHK:
gold: Rika
silver: Satoko and Liza
bronze: Mai

would be much more sensible and would have saved us lots of controversies.
 

yume

🍉
Record Breaker
Joined
Mar 11, 2016
the same arguments about Satoko are repeated without paying attention to the main argument of the thread which is that judges do not vote together as a block and that they do not get to decide placements after everyone has skated. There's no universal skating conspiracy. I just think people want to be able to say that the results mirrored what they wanted.

Then, even in the case of really egregious overscoring like Kostner's free skate at Euros (where she received 75 PCS for a disaster), should we really think that each judge objectively thought that this program deserved a very high artistic mark?
 

GGFan

Record Breaker
Joined
Nov 9, 2013
Then, even in the case of really egregious overscoring like Kostner's free skate at Euros (where she received 75 PCS for a disaster), should we really think that each judge objectively thought that this program deserved a very high artistic mark?

I think you bring up a very good point. I've been focusing on the tiny margins and how no one judge can make a difference, but there are other problems that the system does not correct well for. I know some of you have more expertise but here are some examples I can think of (I'm really not trying to pick on the skater but on the judging):

1. Alina and Nathan's PCS rising across the board in one season without any visible improvement in their skating. This screams of great jumping prowess bringing up their PCS, as well as reputation judging and federations pushing skaters.

2. Skaters like Patrick and Carolina having such amazing skating skills that they are not properly docked when their programs are a mess. Even if the judges wanted to hold up the skating skills mark the other categories should still be affected.

3. Skaters like Satoko and Shoma who have some systemic issues in their jump technique (I love both of them but the toe jumps that become edge jumps are pretty ugly. e.g Shoma's 4F) who never see it reflected in their jump scores. They may not have noticed the issues a few years back but now everyone knows and they should be docked with a -2 or -3 on those jumps.

These are all instances where the outlier is the individual judge (there's usually one) that actually scores these skaters by their performances on the day. Their score does get completely eaten up by the others who have magically drifted upwards and refuse to notice issues.
 

gkelly

Record Breaker
Joined
Jul 26, 2003
Thanks, GGFan, for starting this thread and for making some great points.

2. We assume that the judges have the amount of time to digest the performances that we have. The judges are making quick decisions. Yes, there is some prejudging but they also have to make decisions that I'm sure they would slightly revise with the benefit of time and hindsight.

3. We assume that the judges decide the placings following everyone's performance. They have to judge the event as it goes. That gives us a huge advantage over them.

These are important.

If you want to get a sense of the experience of judging, try doing it yourself in a setup that closely replicates real judges' experience.

Ideally, try it at a live event where you have seats relatively close to the ice (OK if they're on the opposite side from the official panel). Could be an elite event or could be young kids at a local event -- if you're using IJS scoring, then choose an event that follows the same rules you're applying.

Stand by the marks you give in real time. The only changes you're allowed to make after the fact would be to subtract GOEs for jumps that were called as <, <<, or e that you didn't already take into consideration on your own.

If a live experience isn't possible, try it on video.

However, to keep yourself honest, only events for which you know nothing about the outcome should count. Obviously, watching in real time as the event unfolds is best, because nobody knows any outcome. Failing that, choose a video of an event that either has just finished and you haven't read or heard anything about, or one that you never paid attention to at the time.

Videos without commentary during the programs (or at least without commentary in a language you understand) and without a visible scoretracker are better for staying honest to your own evaluations in real time with no influence from the official scores.

If you like, do one event with just GOEs and another with just PCS. Don't worry about levels for the non-jump elements because judges don't know what levels are awarded. They are told about jump calls after the reviews.

On a different occasion, you could play tech specialist and look only at the calls. The best situation would be to use a high-definition video that you can use to replay pieces of the program at regular and slow motion speeds as allowed. Do your best to call the levels of the spins and steps in real time but since you only have one set of eyes rather than three you could go back and replay one more time to verify or to concentrate on other features. Try to keep your time for reviewing elements in a program no longer than the length of the program itself.

Don't try to judge and call at the same time. But if you're judging and you see underrotations or wrong edges, do make a note and reflect them in your GOEs.

For PCS especially I recommend JGP events. You won't have heard of most of the skaters before, especially at the first event of the season when even those you are familiar with might have made big improvements or suffered injuries or growth spurts since you saw them last. You can expect scores to range between 3s and 7s (though 4s and 5s are most common); but don't be afraid to award 2s or 8s if you think a skater deserves them for one or more components. Short programs on the JGP are a random draw, so the next unfamiliar skater might fit anywhere in that continuum.

The problem with video is that it does not convey the quality of skating skills reliably, or to a lesser extent program composition in terms of layout and ice coverage. Do your best to estimate. Maybe start by trying to evaluate only SS for one event and then only other components for the next.

You can report back on what kinds of thought processes you experience when scoring in real time without knowing the outcome and how that differs from attempts to judge the judges after the fact.
 

yume

🍉
Record Breaker
Joined
Mar 11, 2016
I'm trying to find excuses or at least understand judges. Maybe they, and some fans, know or see something in Shoma and Satoko's jumps that I don't see.
I will take Mao Asada as the worst jumper of the CoP era, since she's surely the top skater who got dinged the most. I take two of her best and cleanest free programs (with no obvious mistakes), one for each complete quad in which she competed: the 2007 GPF and the 2014 Olympics. In the first she received mainly -1, 0, +1 for jumps called clean. In the second she got killed by UR calls.
In what way was Satoko's technical performance at SA this season or at the Olympics, for example (programs with no calls and generous GOEs), superior to that?

Or just overall, what makes Satoko a better jumper than Mao?
 

Sam-Skwantch

“I solemnly swear I’m up to no good”
Record Breaker
Joined
Dec 29, 2013
Country
United-States
Don't try to judge and call at the same time. But if you're judging and you see underrotations or wrong edges, do make a note and reflect them in your GOEs.

I think this is what people would like to see more of from the judging panels ;)
 

moriel

Record Breaker
Joined
Mar 18, 2015
I think this is what people would like to see more of from the judging panels ;)

I'd be fine with them saying which bullet points apply for each GOE.
Ideally, the ISU should have some system where the results can be contested and, if deemed inaccurate, judges would have to give a public explanation of each thing: "ok, this jump checked 4 bullet points, but had a very long setup, so I gave it a +2"
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
I don't want the thread to be only my thoughts. I agree that we should continue to try to improve judging. In other threads I've talked about that as well.

I think what people talk about much less is that these tiny margins are meaningless and us deciding winners on them makes little sense. That is the margins are so tiny that if we had a panel of 10 different judges judge the competition we would probably get different outcomes each time. That's without even getting into biased judging. The margin of human error is much larger than this system acknowledges and that's a very uncomfortable reality. The truth is that Satoko and Liza were tied, but we accept the fiction that one was better because we don't like ties in competitions.

I think people should keep in mind how sports competitions work in general. By declaring a winner they are not literally saying who is better or worse, just who scored more points in that particular competition. Equating more points with the better player or team is not really what sports are trying to do in the first place. The aim is to declare the winner: the one who played the game better (according to the game's rules), or who was the better competitor on that exact day, not the one who is generally better.
 

Reddi

Rinkside
Joined
Jan 16, 2018
I think the core mistake in the whole Satoko/Lisa TES vs PCS calculation lies in the idea that Satoko's and Lisa's jumps are comparable. Like in the same way that their spins or step sequences are. But if we're going to look at their jumps from the athletic perspective then Satoko's and Lisa's jumps can barely be considered the same elements. The power, the height, the technique, the risk that comes with all the above. The GOE that both of them received at NHK completely failed to reflect this gaping pit. That's why I consider Lisa to be awfully underscored. You shouldn't be able to compensate for THAT big of a difference in athletic might with PCS. It's a sport.
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
I think the core mistake in the whole Satoko/Lisa TES vs PCS calculation lies in the idea that Satoko's and Lisa's jumps are comparable. Like in the same way that their spins or step sequences are. But if we're going to look at their jumps from the athletic perspective then Satoko's and Lisa's jumps can barely be considered the same elements. The power, the height, the technique, the risk that comes with all the above. The GOE that both of them received at NHK completely failed to reflect this gaping pit. That's why I consider Lisa to be awfully underscored. You shouldn't be able to compensate for THAT big of a difference in athletic might with PCS. It's a sport.

Well, PCS is no longer a score of artistic impression; it is a mark which reflects how skaters respond to the challenges defined by PCS. For example, by giving a higher Interpretation mark judges are not trying to say who is a better artist, but who on that exact day interpreted the music for more of the program, with more body and blade movements, and with more acknowledgment of the rhythm and nuances in the music. No matter how that word sounds, interpretation of the music is also an athletic achievement which requires good basic skating, stamina, flexibility, power, acceleration... And the TES score also has some 'artistic' requirements, for example originality/creativity of the element, pleasing body position in jumps and spins, elements matching the music, etc.
 