Statistical Proof | Golden Skate

# Statistical Proof

#### Aleejsj

Spectator
To show that certain judges were biased toward Adelina over Yuna, I looked at the averages, standard deviations and z scores of the marks each received in the free skate. The z score is calculated as (sample - mean) / standard deviation and measures how far from the average a given judge was. The first thing to note is that the standard deviation for Adelina is higher, which is suspicious.

The difficulty in detecting inflated scores is that the judges are not identified. A judge should remain consistent throughout his or her judging and should not vary from one program to the next; however, some judges are inherently tougher and some are easier. If we assume that Yuna was judged fairly, we can establish a baseline z score to determine how "tough" a judge is relative to the other judges. If the judges were fair and consistent, each should remain just as "tough" relative to the others. A low z score demonstrates that the judge is simply a more difficult judge, not necessarily that he was penalizing a certain skater.

However, when one calculates the z scores for Adelina, there are five judges who were significantly less tough on Adelina; one judge in particular showed a 604% jump. Four judges demonstrated an increase of 100% or more, while the other five were relatively consistent. The total increase for Adelina was 1078% in terms of "ease of judging". Interestingly, the judges who demonstrated the largest jumps were the ones who seemed to be right around the average. The fix was intelligent, as they did not grade at either extreme. The numbers never lie.

Adelina Sotnikova

| Element | Element Z Score | Component | Component Z Score | Total | Total Z Score | Judge |
|---|---|---|---|---|---|---|
| 16 | -1.235069563 | 44.25 | -1.2836263 | 60.25 | -1.298440577 | Judge 8 |
| 15 | -1.414353854 | 45.5 | -0.523584412 | 60.5 | -1.261692259 | Judge 3 |
| 19 | -0.697216689 | 45 | -0.827601167 | 64 | -0.747215804 | Judge 9 |
| 22 | -0.159363815 | 44.25 | -1.2836263 | 66.25 | -0.41648094 | Judge 4 |
| 22 | -0.159363815 | 48 | 0.996499364 | 70 | 0.134743834 | Judge 1 |
| 26 | 0.557773351 | 47.75 | 0.844490987 | 72.25 | 0.465478698 | Judge 5 |
| 26 | 0.557773351 | 46.5 | 0.084449099 | 72.5 | 0.502227016 | Judge 6 |
| 29 | 1.095626225 | 47.75 | 0.844490987 | 76.75 | 1.126948426 | Judge 2 |
| 31 | 1.454194808 | 48.25 | 1.148507742 | 79.25 | 1.494431608 | Judge 7 |
| 22.88888889 | | 46.36111111 | | 69.08333333 | | Average |
| 5.57773351 | | 1.644646196 | | 6.803032412 | | Standard Deviation |

Yuna Kim

| Element | Element Z Score | Component | Component Z Score | Total | Total Z Score | Judge |
|---|---|---|---|---|---|---|
| 17 | -1.067521025 | 42 | -2.192964157 | 59 | -1.664406808 | Judge 2 |
| 17 | -1.067521025 | 45.25 | -0.558716983 | 62.25 | -0.961390214 | Judge 4 |
| 18 | -0.747264718 | 46.75 | 0.195550944 | 64.75 | -0.420608219 | Judge 7 |
| 20 | -0.106752103 | 46 | -0.181583019 | 66 | -0.150217221 | Judge 8 |
| 20 | -0.106752103 | 47 | 0.321262265 | 67 | 0.066095577 | Judge 5 |
| 19 | -0.42700841 | 48 | 0.82410755 | 67 | 0.066095577 | Judge 9 |
| 22 | 0.533760513 | 45.75 | -0.307294341 | 67.75 | 0.228330176 | Judge 1 |
| 24 | 1.174273128 | 48.75 | 1.201241513 | 72.75 | 1.309894167 | Judge 3 |
| 26 | 1.814785743 | 47.75 | 0.698396228 | 73.75 | 1.526206965 | Judge 6 |
| 20.33333333 | | 46.36111111 | | 66.69444444 | | Average |
| 3.122498999 | | 1.988683261 | | 4.622934974 | | Standard Deviation |
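
The total z scores in both tables above can be reproduced from the Total column. A minimal Python sketch, assuming the sample standard deviation (n - 1 in the denominator), which is what the Standard Deviation rows are consistent with; the function name `z_scores` is only illustrative:

```python
from statistics import mean, stdev

# Total marks per judge column, copied from the tables above in listed order.
sotnikova_totals = [60.25, 60.5, 64, 66.25, 70, 72.25, 72.5, 76.75, 79.25]
kim_totals = [59, 62.25, 64.75, 66, 67, 67, 67.75, 72.75, 73.75]


def z_scores(marks):
    """Return (mark - mean) / sample standard deviation for each mark."""
    m = mean(marks)
    s = stdev(marks)  # assumption: sample standard deviation (n - 1)
    return [(x - m) / s for x in marks]


sotnikova_z = z_scores(sotnikova_totals)
kim_z = z_scores(kim_totals)

for label, zs in (("Sotnikova", sotnikova_z), ("Kim", kim_z)):
    print(label, [round(z, 3) for z in zs])
```

The same function applies unchanged to the Element and Component columns.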

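| Row (in the tables' listed order) | % Change in Z Score (1.0 = 100%) |
|---|---|
| 1 | -0.219877874 |
| 2 | 0.312362286 |
| 3 | 0.776512609 |
| 4 | 1.772524597 |
| 5 | 1.038621027 |
| 6 | 6.042509003 |
| 7 | 1.199564792 |
| 8 | -0.139664521 |
| 9 | -0.020819822 |
| Sum | 10.78255192 |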
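
The figures listed above match (Sotnikova z - Kim z) / Kim z when the two tables are paired row by row in their listed order, lowest total to highest. A short Python sketch of that calculation; the row-by-row pairing is an assumption about how the column was built, not something the protocols confirm, and the variable names are illustrative:

```python
# Total z scores copied from the two tables, in their listed (ascending) order.
sotnikova_z = [-1.298440577, -1.261692259, -0.747215804, -0.41648094,
               0.134743834, 0.465478698, 0.502227016, 1.126948426,
               1.494431608]
kim_z = [-1.664406808, -0.961390214, -0.420608219, -0.150217221,
         0.066095577, 0.066095577, 0.228330176, 1.309894167,
         1.526206965]

# Relative change of Sotnikova's z score against Kim's in the same row.
# Assumption: row i of one table is the same judge as row i of the other.
changes = [(a - k) / k for a, k in zip(sotnikova_z, kim_z)]

for row, (k, c) in enumerate(zip(kim_z, changes), start=1):
    print(f"row {row}: Kim z = {k:+.3f}, change = {c:+.3f}")
```

Because the baseline z score is the divisor, the rows where Kim's z score is close to zero (0.066 in rows 5 and 6) produce the largest ratios.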

Yeah the formatting isn't excel friendly, but all the numbers are there.

#### Vanshilar

On the Ice
I'm kind of curious about this. If it's not too much work, what about calculating z scores for other skaters like Asada to see the spread in them? To establish a firmer baseline for this.

Additionally, I've read elsewhere that the judges' evaluations are actually randomized not just by shuffling the columns, but also by shuffling the columns within each individual row. Thus, you wouldn't be able to say that (for example) "Judge 7 scored this", because the 7th column would actually correspond to different judges' scores. I don't know whether that's right, but would there be some statistical test to determine this? Although admittedly it seems a bit statistically unlikely for randomly shuffled numbers to end up with 1 column having mostly 3's...

#### Imagine

Medalist
Too bad you forgot a basic concept regarding these statistics, which is that they don't actually prove anything. They only indicate how likely an event is.

It is likely, by your numbers, there is a higher level of inconsistency in judging for Adelina Sotnikova as compared to Yuna Kim...in which case you haven't proven a thing except that the judges were more inconsistent with their marks for Adelina as compared to Yuna with inconsistencies present for both. How does that even show a bias for Adelina? If anything it only suggests that the judges really can't agree on what marks to give her.

You might want to calculate a different statistic, maybe one showing how unlikely it is that Adelina's scores would rise so much compared to her past performances. That would be much more informative and would definitely strengthen the argument that something shady went down with the judging (but it's not like that isn't apparent already). In any case, the raw numbers aren't going to confirm anything, and can only really raise suspicion. The only way to prove once and for all that there was a bias toward Adelina would be to get an honest response from each judge on whether or not they were biased in their judging, and statistics should tell you how likely it is for that to happen. So yes, the numbers do lie, and that's why people are mad in the first place.

#### StellaCampo

On the Ice
> Additionally, from elsewhere I've read that the judges' evaluations are actually randomized not just by shuffling the columns, but also by shuffling the columns in each individual row. Thus, you wouldn't be able to say that (for example) "Judge 7 scored this" because the 7th column would actually correspond to different judges' scores. I don't know if they're right or not on this, but would there be some statistical test to determine this? Although admittedly it seems a bit statistically unlikely for randomly shuffled numbers to end up with 1 column having mostly 3's...

I've seen this question asked and answered (by someone who seems to be a judge in regional competitions) on FSUniverse. The answer, as I recall, is that each column belongs to a single judge. That is to say, the first column on the left shows the marks given by one judge, the next column those given by another judge, and so on. However, the columns are randomised with respect to skaters, i.e. the judge in column 1 for skater X is not the same person (though could be by coincidence) as the judge in column 1 for skater Y.

#### Aleejsj

Spectator
There is not just an inconsistency in the judging, but a considerable easing in terms of points awarded. Relative to the averages handed out by the other judges, some judges remained consistent in their toughness of grading while some demonstrated an easing in the scoring. For example, the judge who made a significant jump graded Yuna Kim at about the average score handed out by the other judges, but said judge was several standard deviations easier for Adelina. The other judges in question also showed increases of several standard deviations, while four judges remained very consistent in their judging.

Yes, statistics does not actually prove anything. But assuming that the judges are trained professionals and remain consistent, the probability of a judge happening to be several standard deviations nicer for one particular contestant would be the equivalent of flipping 100 coins and having them all land on heads. I guess when you flip Russian coins, things happen.
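
For scale, the two probabilities in that comparison can be worked out directly. A rough Python sketch, assuming a judge's z score behaves like a standard normal variable (a modelling assumption, not something the protocols establish); `upper_tail` is an illustrative helper name:

```python
import math


def upper_tail(k):
    """P(Z > k) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2))


# Probability that 100 fair coin flips all land on heads.
p_all_heads = 0.5 ** 100

print(f"100 heads in a row: {p_all_heads:.3e}")
for k in (2, 3, 4):
    print(f"P(Z > {k}): {upper_tail(k):.3e}")
```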

#### drivingmissdaisy

Record Breaker
> The first thing to note is that the standard deviation for Adelina is higher which is suspicious.

I would think the standard deviation would be high for Yuna as well if the judges were lowballing her scores to benefit Adelina.

#### kslr0816

On the Ice
> I would think the standard deviation would be high for Yuna as well if the judges were lowballing her scores to benefit Adelina.

Aside from her SP, her FS score seemed about right. It's just Adelina's scores that were outrageous (and a couple of others).

#### Aleejsj

Spectator
I'm not trying to demonstrate that Yuna was lowballed, just that the judging was different for the two skaters. A higher standard deviation would not imply lowballing. For example, if we had:
Adelina - 100, 90, 75, 65, 60, where the 100 and 90 came from the two suspicious judges, and
Yuna - 75, 65, 60, 55, 50, then it's actually more difficult to determine, and the standard deviation is smaller.
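
The standard deviations of those two hypothetical panels can be checked with a few lines of Python (sample standard deviation assumed):

```python
from statistics import mean, stdev

# Hypothetical marks from the example above, not real protocol data.
adelina = [100, 90, 75, 65, 60]  # 100 and 90 are the two "suspicious" judges
yuna = [75, 65, 60, 55, 50]

for name, marks in (("Adelina", adelina), ("Yuna", yuna)):
    print(f"{name}: mean = {mean(marks):.1f}, sample SD = {stdev(marks):.2f}")
```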

#### Mathman

> For example the judge who made a significant jump graded Yuna Kim at about the average score handed by the other judges but said judge was several standard deviations easier for Adelina.

Unfortunately, this is what cannot be determined from the protocols. "Said judge" is not "said." Judge number two for Yuna might be judge number 6 for Adelina.

The ISU does this randomization thing specifically to prevent anyone from analyzing the statistics in a useful way.

#### zschultz1986

Final Flight
> Unfortunately, this is what cannot be determined from the protocols. "Said judge" is not "said." Judge number two for Yuna might be judge number 6 for Adelina.
>
> The ISU does this randomization thing specifically to prevent anyone from analyzing the statistics in a useful way.

Right? God forbid there be any consistency and transparency to the numbers...

#### Aleejsj

Spectator
I believe that it is possible to identify the judges, going on the large assumption that no judge expresses a preference for a specific skater. Proceeding under that pretty large assumption, one we would like and hope to believe, it is possible to determine which judge corresponds to each number. For example, judge 8 for Sotnikova is likely judge 2 for Yuna, as that judge graded significantly lower than the average score. This does not imply that judge 8/2 is cheating; it just means that this judge is a harder judge. Again, proceeding under the assumption that the judges maintain their level of grading relative to each other, we should not see a preference or a large jump relative to the average.

Super simplification:
Sotnikova-
Judge 1 gives 20, Judge 2 gives 16, Judge 3 gives 18, Judge 4 gives 19
Kim-
Judge 1 gives 17, Judge 2 gives 17, Judge 3 gives 21, Judge 4 gives 19

We can identify Judge 1 for Sotnikova as Judge 3 for Yuna because this judge simply scores higher. Similarly, Judge 2 for Sotnikova can be identified as Judge 1 or 2 for Yuna. It doesn't really matter which one, so let's say Judge 2 was the same for both; we can tell they are the same because that judge consistently judged lower than the average. Sotnikova Judge 3 is Yuna Judge 4, as that judge scored close to the average of the judges. Judge 4 for Sotnikova and Judge 1 for Yuna would be identified as a suspicious candidate, as this judge judged below the average for Yuna and above the average for Sotnikova. This judge may appear fair because his scores remain in the middle. But if the shift across the average is large enough, it is possible to demonstrate that the probability of the change being "probabilistically acceptable" is so low that this judge demonstrates a scoring bias toward certain skaters.

Obviously the above example is dramatically simpler, and the number of points makes it harder to show, but this is the principle that demonstrates cheating. The highest-scoring judge is in no way necessarily the cheater; that judge may simply like giving high scores. When a judge demonstrates a change in their relative judging, though, it demonstrates bias.
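
The simplified example can be sketched in Python by measuring each mark against its panel average and then comparing the marks assumed to come from the same judge. The pairing used here is the hypothetical one described above; the protocols themselves do not confirm which column belongs to which judge, and the names in the code are illustrative.

```python
from statistics import mean

# Marks from the simplified example, keyed by column number.
sotnikova = {1: 20, 2: 16, 3: 18, 4: 19}
kim = {1: 17, 2: 17, 3: 21, 4: 19}


def deviations(marks):
    """Return each mark minus the panel average."""
    avg = mean(list(marks.values()))
    return {col: mark - avg for col, mark in marks.items()}


sot_dev = deviations(sotnikova)
kim_dev = deviations(kim)

# Assumed pairing (Sotnikova column -> Kim column), as proposed in the example.
pairing = {1: 3, 2: 2, 3: 4, 4: 1}

# Shift in relative toughness: positive means easier on Sotnikova than on Kim.
shifts = {s: sot_dev[s] - kim_dev[k] for s, k in pairing.items()}

for s, k in pairing.items():
    print(f"Sotnikova judge {s} / Kim judge {k}: shift = {shifts[s]:+.2f}")
```

With this pairing, only the Sotnikova Judge 4 / Kim Judge 1 pair shifts upward relative to the panel; the other three pairs move together in the opposite direction.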

I suppose it would be possible to try to identify the judges from the marks given to non-Russian competitors and establish a baseline, then look and see whether particular judges shifted from that baseline in favor of the Russians. But that's a lot of work...

#### Mafke

Medalist
> The ISU does this randomization thing specifically to prevent anyone from analyzing the statistics in a useful way.

Which is why I'm not a fan anymore and think anyone who gives a rat's *** who wins phony ISU "competitions" is a deluded fool.

To be clear, there were some wonderful performances in Sochi, but the ISU has zero credibility and winning any competition run by the ISU is not an accomplishment.

The only thing that will bring them to their senses is a boycott of interest (and letting them know why).
