The Judging Controversy Thread

markmchen · Mar 24, 2014

Sam-Skwantch

That is an interesting question regarding outliers. Let me run the standard deviations on the three skaters and see what that shows.

markmchen · Mar 24, 2014

OK, finished inputting Caro.

In terms of PCS, Yuna had the highest StDev at 3.18 on a score of 74.18. Caro had 3.02 on a score of 73.56. Adelina had 2.63 on a score of 74.18. So, interestingly, Adelina had the lowest PCS deviation.

In terms of GOE, Adelina had by far the highest StDev at 5.61 on an unweighted GOE total of 22.78. Caro had 4.01 on 16.11, and Yuna had 3.69 on 20.89.
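For anyone who wants to reproduce numbers like these, here is a minimal Python sketch of the per-skater calculation. It assumes each judge's marks for a skater are first summed into one total (like the "unweighted GOE total" above) and the sample standard deviation is then taken across judges; the post does not say whether sample or population standard deviation was used. The judge totals below are made up for illustration, not the real Sochi protocol.

```python
import statistics

def judge_spread(totals):
    """Mean and sample standard deviation (n - 1) of per-judge totals."""
    return statistics.mean(totals), statistics.stdev(totals)

# Hypothetical unweighted GOE totals for one skater, one number per judge
# (each is that judge's raw -3..+3 grades summed over all elements).
goe_totals = [20, 14, 25, 12, 19, 27, 17, 22, 21]
mean, sd = judge_spread(goe_totals)
print(f"mean {mean:.2f}, stdev {sd:.2f}")
```

Swapping statistics.stdev for statistics.pstdev gives the population version (divide by n instead of n - 1), which comes out slightly smaller.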

markmchen · Mar 24, 2014

There is only one obvious outlier: one judge was (2.0) stdev against Yuna in PCS, i.e. 2.0 standard deviations below the mean. There was also one judge who was (1.6) stdev against Caro in PCS.
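The outlier figures are z-scores: how far one judge's total sits from the panel mean, measured in standard deviations, with the parenthesized numbers meaning negative values. A rough sketch, using made-up PCS totals and a hypothetical 1.5 cutoff for flagging a judge:

```python
import statistics

def judge_z_scores(totals):
    """Distance of each judge's total from the panel mean, in standard deviations.

    Negative values mean the judge marked the skater below the rest of the
    panel; the thread writes these in accounting style, e.g. (2.0) = -2.0.
    """
    mean = statistics.mean(totals)
    sd = statistics.stdev(totals)
    return [(t - mean) / sd for t in totals]

# Hypothetical raw PCS totals (five components summed) from nine judges.
pcs_totals = [46.0, 45.5, 46.5, 45.75, 41.0, 46.25, 45.0, 46.0, 45.5]
for judge, z in enumerate(judge_z_scores(pcs_totals), start=1):
    flag = "  <- possible outlier" if abs(z) >= 1.5 else ""  # hypothetical cutoff
    print(f"J{judge}: {z:+.2f}{flag}")
```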

markmchen · Mar 24, 2014

Sorry to keep posting after myself; just adding more data.

The most disputed element of the night was Adelina's first jump combo (3Lz+3T): scores ranged from -1 to +3 GOE, with a stdev of 1.22.

The least disputed element of the night was Caro's Skating Skills PCS. Excluding one outlier at 8.50, scores ranged from 9.00 to 9.50: almost unanimous.
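Per-element disagreement can be measured the same way, on one element's marks instead of a skater's totals. This sketch uses hypothetical marks; dropping the single lowest Skating Skills mark mirrors the "excluding one outlier" step above.

```python
import statistics

def element_spread(marks):
    """Lowest mark, highest mark and sample standard deviation for one element."""
    return min(marks), max(marks), statistics.stdev(marks)

# Hypothetical GOE grades for one jump combination from nine judges.
combo_goe = [3, 2, 1, -1, 2, 3, 1, 0, 2]
print("combo:", element_spread(combo_goe))

# Hypothetical Skating Skills marks; drop the single lowest mark,
# like the "excluding one outlier" step in the post.
skating_skills = [9.25, 9.5, 9.0, 8.5, 9.25, 9.5, 9.25, 9.0, 9.5]
without_outlier = sorted(skating_skills)[1:]
print("skating skills:", element_spread(without_outlier))
```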

Sam-Skwantch · Mar 24, 2014

Thank you for taking the time to try and put a new perspective into the scoring. I for one appreciate it and am trying to figure out how to put into words the results you've established. Sometimes math is tough to put into words for me. Maybe Mathman will beat me to it and I can simply :thumbsup: his analysis.

usethis2 · Mar 24, 2014

markmchen said:
Ok, finished inputting Caro

In terms of PCS, Yuna had highest StDev at 3.18 on score of 74.18. Caro had 3.02 on score of 73.56. Adelina had 2.63 on score of 74.18. So interestingly, Adelina had lowest PCS deviation.

In terms of GOE, Adelina had by far the highest StDev at 5.61 on unweighted GOE total scores of 22.78. Caro had 4.01 on 16.11 Yuna had 3.69 on 20.89

Thank you for the analysis. Your two paragraphs say a whole lot, more than the 200 sentences I could have written. Then again, you probably already knew that.

markmchen · Mar 24, 2014

Hey Sam,

Sorry for all the numbers. Looks like I drove everyone away.

Here is my interpretation of the more interesting results:
1. one judge was very anti-Yuna and one judge was very anti-Kostner (probably not a surprise)
2. judges seem to relatively agree on Adelina's PCS scores (surprise!)
3. judges don't agree so much on Adelina's GOE, which had a large range of scores (this is where you see clear voting blocs)
4. given the overall difference in scores, it's not difficult for 1-2 judges and/or the technical assessments to swing the decision
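To get a feel for point 4, here is a sketch of how much one judge can move a single panel mark. It assumes the panel mark is a trimmed mean (highest and lowest judge dropped, the rest averaged), the usual IJS approach; check the ISU rules for the exact procedure used in Sochi. The nine marks are hypothetical. Trimming caps how far one extreme judge can pull a single mark, but small shifts still get multiplied by the factor and add up across components and elements.

```python
def trimmed_mean(marks):
    """Drop the single highest and lowest mark, then average the rest."""
    if len(marks) < 3:
        raise ValueError("need at least three marks")
    middle = sorted(marks)[1:-1]
    return sum(middle) / len(middle)

# Hypothetical nine-judge marks for one program component.
panel = [9.0, 9.25, 9.0, 9.5, 9.25, 9.0, 9.25, 9.5, 8.75]
before = trimmed_mean(panel)

nudged = list(panel)
nudged[1] += 0.5  # one judge raises their mark by half a point
after = trimmed_mean(nudged)

print(f"{before:.3f} -> {after:.3f} ({after - before:+.3f})")
```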

usethis2 · Mar 24, 2014

Vanshilar said:
Well there's already some online. For example, of the triple flips by Adelina and Yuna, showing Adelina's much-vaunted delayed rotation:

http://www.youtube.com/watch?v=F3iup7I46qE

That's almost a forward take-off. I've seen a similar technique in her 3T combo.

markmchen · Mar 24, 2014

Hey usethis2. Thank goodness. I thought I drove everyone away with the flood of numbers.

usethis2 · Mar 24, 2014

It's late here in the U.S. Time for bed. Thank you again for your analysis. Looking forward to seeing more number crunching from you.

P.S. Do not be discouraged by hostile posts that might attempt to distort/discredit your number crunching.

BTW:

markmchen said:
Here is my interpretation of the more interesting results:
1. one judge was very anti-Yuna and very anti-Kostner (probably not a surprise)

I think you meant to say there was one judge who was very anti-Yuna and one who was very anti-Carolina. It's difficult to say with 100% confidence that the two were the same judge. It could have been two judges each taking one part and doing their jobs.

You may also find some seriously funny business if you compare Adelina's and Yulia's marks, for a very different reason.

markmchen · Mar 24, 2014

Thanks, usethis2. I'm actually in Tokyo, so my number crunching will come late at night for U.S. folks.

markmchen · Mar 24, 2014

usethis2, you are absolutely right. That is what I meant; I edited my original comment.

BTW, per your suggestion, I added Julia (and Mao) to the analysis. Here are their numbers:

On GOE, Mao had 3.62 stdev on 10.89. On PCS, Mao had 2.97 on 69.78
On GOE, Julia had 4.82 stdev on 12.78. On PCS, Julia had 3.31 on 69.96

Mathman · Mar 24, 2014

Thanks for the cool statistics, markmchen. :rock:

So I guess this is why the ISU will never give up anonymous judging. If we knew, for instance, that there was one judge who did not sit on the short program panel but came to the LP with a fury and gave crazy high scores to Sotnikova while lowballing both Kim and Kostner, then we would have something to go on, instead of just venting in the dark.

By the way, speaking of standard deviations, if you are not familiar with how the ISU assesses "judging anomalies," you might be interested in this document. (This is ISU Document 1631. There may be a more recent version.) To decide whether a particular judge's marks are suspiciously out of line they use a kind of cumulative "standard deviation" based on the absolute value rather than sum of squares. Note that it is different for GOEs and PCSs. Every year about a dozen judges run afoul of the "corridor" and are reviewed under this procedure.

http://static.isu.org/media/99367/1631-officials-evaluation.pdf
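To see why the choice matters, here is a small sketch comparing an absolute-value measure of spread (mean absolute deviation) with the usual sum-of-squares standard deviation on the same hypothetical marks. This is not the ISU's exact formula from Document 1631, just the two kinds of spread being contrasted. The sum-of-squares figure is never smaller than the absolute one, and the gap widens when a single mark sits far from the rest, because squaring gives that mark extra weight.

```python
import math

def mean_absolute_deviation(values):
    """Average absolute distance from the mean."""
    m = sum(values) / len(values)
    return sum(abs(v - m) for v in values) / len(values)

def standard_deviation(values):
    """Population standard deviation: square deviations, average, take the root."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

# Hypothetical component marks from nine judges, one well below the rest.
marks = [9.0, 9.25, 9.0, 9.25, 9.0, 9.25, 9.0, 9.25, 7.5]
print(f"absolute: {mean_absolute_deviation(marks):.3f}")
print(f"sum of squares: {standard_deviation(marks):.3f}")
```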

markmchen · Mar 24, 2014

Hey Mathman... fellow numbers dude. btw, my brother lives in Michigan too.

Took a look at your link. Very interesting the way they chose to enforce the "corridor" using absolute values. Let me see if any of the judges tripped the defined "corridor"

markmchen · Mar 24, 2014

Ran the "corridor" numbers. First observation: the corridor for GOE is really, really wide. For example, Adelina scored 22.78 total GOE (best of the night, I think). A judge is allowed to be off by +/- 12. In other words, a judge can be about 50% higher or lower than everyone else and still be inside the "corridor". For PCS, the "corridor" range is 7.5 points; since total PCS before the 1.6x multiplier is about 45 (assuming 9s), that only allows a judge to be about 17% off the average. Perhaps this is why the GOE stdev is so much wider than the PCS stdev.

No judge tripped the "corridor". The closest for GOE was one judge's scores for Caro, which totaled 11.2 off the average (just below the 12 cutoff). The closest for PCS was one judge's scores for Yuna, which totaled (4.4), i.e. 4.4 below the average (versus a cutoff of 7.5).
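A minimal sketch of the corridor check as described in this thread: add up the absolute differences between one judge's marks and the panel averages, and compare the total with an allowance per mark (1.0 per GOE element, so 12 for a 12-element free skate; 7.5 / 5 = 1.5 per program component). The exact rules, including whether PCS deviations keep their sign, should be checked against ISU Document 1631; the function and constant names and the marks are hypothetical.

```python
def corridor_deviation(judge_marks, panel_means):
    """Sum of absolute differences between one judge's marks and the panel averages."""
    if len(judge_marks) != len(panel_means):
        raise ValueError("need exactly one panel average per mark")
    return sum(abs(j - m) for j, m in zip(judge_marks, panel_means))

def trips_corridor(judge_marks, panel_means, allowance_per_mark):
    """True if the total deviation is larger than the allowance summed over all marks."""
    limit = allowance_per_mark * len(judge_marks)
    return corridor_deviation(judge_marks, panel_means) > limit

GOE_ALLOWANCE = 1.0  # per element, so 12 points over a 12-element free skate
PCS_ALLOWANCE = 1.5  # per component, so 7.5 points over five components

# A judge one full GOE step above a panel average of +2 on all 12 elements.
judge_goe = [3] * 12
panel_goe = [2.0] * 12
print(corridor_deviation(judge_goe, panel_goe))  # 12.0
print(trips_corridor(judge_goe, panel_goe, GOE_ALLOWANCE))  # False

# A judge one point below a panel average of 9.0 on all five components.
print(trips_corridor([8.0] * 5, [9.0] * 5, PCS_ALLOWANCE))  # False
```

This also matches the point made later in the thread: a judge can be one full GOE step off on every element and still stay inside the corridor.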

ILuvYuna · Mar 24, 2014

markmchen said:
Hey Sam,

Sorry for all the numbers. Looks like I drove everyone away.

Here is my interpretation of the more interesting results:
1. one judge was very anti-Yuna and one judge was very anti-Kostner (probably not a surprise)
2. judges seem to relatively agree on Adelina's PCS scores (surprise!)
3. judges don't agree so much on Adelina's GOE, which had a large range of scores (this is where you see clear voting blocks)
4. given overall difference in scores, not difficult for 1-2 judges and/or technical assessments to swing decision

Hi Markman!

Thank you so much for doing this! I'm glad I'm not the only one who can see the voting blocs (although for me, the blocs were even more discernible in the PCS portion of the score). I haven't had a chance yet to look at Caro's scores, but it's interesting to know there was a judge who stood out as scoring her extra low. I remember Mathman also saying a few pages back that 1-2 judges could influence the score (and 4 could definitely steal it), so it's also interesting to know that 1-2 judges alone, along with Lakernik and the tech panel, could swing the decision in favor of Adelina.

Before I called it a night, I put Y's and A's scores side by side, with my suspected cheating bloc on one side and the less suspicious judges on the other. For this, I averaged the GoE from each judge (not scaled, just the numbers as they were given), and I averaged each judge's PCS for each skater. The PCS in particular was like night and day. Even apart from the nearly 6-point lead A took in the TES portion of the score, I have a feeling that if you took the averages of each component from the cheating bloc vs. the others, for each skater, and applied the 1.6 factor used to get the PCS score, it would show that the potential cheating bloc's influence on PCS alone would have won it for Sotnikova (so even without needing Lakernik or anyone else on the tech panel).
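A rough sketch of the bloc comparison described above: average each component over a chosen group of judges, sum the five averages, and multiply by the 1.6 free-skate factor. It uses a plain average as described in the post rather than the trimmed mean used for official scores, and the judge groupings and marks are entirely hypothetical.

```python
FS_FACTOR = 1.6  # ladies' free skate PCS factor, as used in the thread

def group_pcs(marks_by_judge, judges):
    """Factored PCS computed from a subset of judges.

    marks_by_judge holds one list of five component marks per judge.
    Each component is averaged over the chosen judges (plain mean, not
    the official trimmed mean), the averages are summed, and the total is
    multiplied by FS_FACTOR.
    """
    chosen = [marks_by_judge[j] for j in judges]
    n_components = len(chosen[0])
    averages = [
        sum(row[c] for row in chosen) / len(chosen) for c in range(n_components)
    ]
    return FS_FACTOR * sum(averages)

# Hypothetical marks for two skaters from six judges (all five components
# equal here just to keep the example short).
skater_a = [[9.5] * 5, [9.25] * 5, [9.5] * 5, [8.75] * 5, [9.0] * 5, [8.75] * 5]
skater_b = [[9.0] * 5, [9.0] * 5, [8.75] * 5, [9.5] * 5, [9.5] * 5, [9.75] * 5]
suspected_bloc = [0, 1, 2]  # hypothetical grouping
others = [3, 4, 5]

for name, group in (("bloc", suspected_bloc), ("others", others)):
    gap = group_pcs(skater_a, group) - group_pcs(skater_b, group)
    print(f"{name}: A minus B = {gap:+.2f} PCS points")
```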

Sam-Skwantch · Mar 24, 2014

markmchen said:
Hey Mathman... fellow numbers dude. btw, my brother lives in Michigan too.

Took a look at your link. Very interesting the way they chose to enforce the "corridor" using absolute values. Let me see if any of the judges tripped the defined "corridor"

Thanks for the link, Mathman. While I'm not as much of a fan of math as you guys are, I'm very interested in the results. It's fascinating to see how they search out anomalies. Shall I just assume the ISU has already done this exercise?
I'll have to read it again to get a better grasp, but if a judge's scores become an outlier and therefore aren't used, is the judge still responsible for staying within the corridor? I'm not even sure I phrased that properly.

ILuvYuna · Mar 24, 2014

markmchen said:
Ran the "corridor" numbers. First observation, really really wide corridor for GOE. For example, Adelina scored 22.78 total GOEs (best of night i think). Judge is allowed to be off by +/- 12. In other words, judges can be 50% higher or lower than everyone else and still be in "corridor".

2 Questions for the Mathphiles

1) what is "sum of squares" and how does it differ from "absolute value"? Also, would it be possible to set up a "corridor" using sum of squares, and plugging in A's and Y's numbers to see if any judges would be flagged for suspicion?

2) how does the "corridor" thing translate to the GoE scale? -3 -2 -1 0 +1 +2 +3 Does it mean that if the average of the other judges is +2, the judge would only trip the corridor if they voted 0 or below? Or does it only apply after they've been scaled?

markmchen · Mar 24, 2014

Sum of squares deals with positive and negative deviations by squaring them, so a deviation above or below the mean ends up treated the same. This is the more common way standard deviation is calculated, and it is the method I used earlier. Results of that analysis suggest there were a few judges for each skater who were about 1 standard deviation off the mean on GOE and PCS. To be fair, these judges went both ways; not all of them deviated in the same direction.

The likelihood of this happening randomly (assuming no bias) is only about 30%. There was also one judge who was nearly 2 standard deviations off the mean for PCS against Yuna, and another judge who was 1.6 standard deviations off the mean in PCS against Caro. The likelihood of this happening randomly is only about 5%. These are the clearest outliers.
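Those percentages come from the normal distribution. A short sketch, assuming judges' totals are roughly normally distributed around the panel mean (a big assumption with only nine judges): the chance of landing at least 1 standard deviation from the mean in either direction is about 32%, at least 2 standard deviations in either direction is about 4.6%, and at least 1.6 standard deviations below the mean is about 5.5%.

```python
import math

def two_sided_tail(z):
    """P(|Z| >= z) for a standard normal Z: at least z SDs off in either direction."""
    return math.erfc(z / math.sqrt(2))

def one_sided_tail(z):
    """P(Z <= -z) for a standard normal Z: at least z SDs below the mean."""
    return 0.5 * math.erfc(z / math.sqrt(2))

for z in (1.0, 1.6, 2.0):
    print(f"z = {z}: either direction {two_sided_tail(z):.1%}, "
          f"one direction {one_sided_tail(z):.1%}")
```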

Regarding GOE, you can be 1 full point off the mean on all 12 elements in the FS or all 7 elements in the SP. So if the average is 2, a judge can give 3 (or 1) on every element and not trigger the corridor. It seems the corridor is calculated before scaling.

markmchen · Mar 24, 2014

ILuvYuna,

If you like, I can tell you exactly which judges were well off the mean tomorrow, when I have my computer.
