Statistics around Olympic judging
We all know that sometimes the skaters' scores "seem" right and sometimes they "seem" wrong. One measure I've used to sort it out is the standard deviation of the judges' scores. Basically, if all judges give identical scores, the standard deviation will be 0; when judges seriously disagree, the deviation increases. I wrote a little program to work out the deviation for each skater's scores. If there's interest, I can post all of the results I got (a lot of numbers there!), but here are some highlights:
- The biggest agreement among the judges was for the components of Davis and White in all four programs; in the free, especially - the deviation was a mere 0.078. There was also remarkable agreement on Volosozhar and Trankov.
- Judges disagreed most on Plushenko's components - deviation of 0.62 in the short, 0.88 in the free during the team event (though in general, deviations were greater in the team event than in the individual).
- On the technical side, some of the biggest disagreements among the judges were in the ladies' short, with Lipnitskaya's standard deviation being 0.68 and Wagner's 0.64.
- What I found interesting (and in part why I went through this exercise) was comparing Kim's and Sotnikova's deviations in the free. I expected a large deviation, especially on Sotnikova's components, so the result surprised me: it was not nearly as big as I thought. On the technical side it was 0.392 for Kim and 0.586 for Sotnikova; for components, 0.35 for Kim and 0.34 for Sotnikova (for Sotnikova, the judges agreed most on Interpretation and least on Performance).
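For anyone who wants to try the same kind of calculation, here is a minimal sketch in Python. It is not the actual program behind the numbers above. It assumes the population standard deviation (the post does not say whether the population or sample formula was used), the function name `judge_spread` is just a placeholder, and the marks are made up for illustration rather than taken from any protocol.

```python
from statistics import pstdev


def judge_spread(scores):
    """Population standard deviation of one panel's marks for a single item.

    Assumption: population (not sample) formula; swap in statistics.stdev
    if the sample formula is preferred.
    """
    return pstdev(scores)


# Made-up marks from a nine-judge panel (NOT real Olympic scores).
unanimous = [9.50] * 9
split = [8.50, 9.75, 9.00, 9.50, 8.75, 9.25, 9.75, 8.50, 9.00]

print(judge_spread(unanimous))  # 0.0, because every judge gave the same mark
print(round(judge_spread(split), 3))  # larger, because the judges disagree
```

The same function can be applied to each component or element in turn, and the results compared across skaters.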
It is not at all surprising, because they were waiting with bated breath for Adelina to stay on her feet. Once that requirement was met, they had to give high enough scores to secure Adelina's gold, because they did not know how Yuna would skate. Hence so many 9.25~9.75's in Adelina's PCS. There is no merit in discussing the actual "merits" of those scores. They were meaningless. One judge gave her 9.75/9.75/9.75/9.50/9.50 and another gave her 9.75/9.75/9.75/9.50/9.25. I don't know how many people will take those numbers seriously.
Wicked Yankee Girl
I'm always interested in the numbers.
I'm not surprised that D/W's components were quite uniform. They had the advantage of having skated 2 very fine programs in the team competition, while V/M were not quite up to their usual standard in the Team event. We argue all the time about the effect of the judges watching practices and perhaps pre-rating skaters accordingly. How much more might a judge be affected by a full competition run-through less than a week earlier?
I wonder how many journalists, columnists, bloggers and so on actually dig into the numbers, especially for non-Olympic competitions and for skaters outside the top three. Do you have any links to articles?
I looked up an altogether different set of statistics to understand the judging for the ladies' singles competition: natural gas statistics. All countries represented on the panel, except Canada and Russia itself, were wholly or partially dependent on Russian gas. Meanwhile, the four countries removed from the panel after the short are either self-sufficient or importers of non-Russian gas: US (domestic + Canadian imports), South Korea (imports from Indonesia), UK (domestic + Norwegian imports), Sweden (100% Danish imports). Of course I am not saying this proves anything. But I could suggest the existence of a 'Natural gas bias' or an 'Energy bias.'