Ladies Free Skate and Results + GPF Finalists | Page 2 | Golden Skate


Joined
Jul 11, 2003
The bloc judging is cultural - not political - not conspiratorial. There are more Slavic countries judging than there are Celtic judges :laugh:

Joe
 

moyesii

Rinkside
Joined
Nov 28, 2003
The bloc judging is cultural - not political - not conspiratorial.
Joesitz, don't you think that the sociological foundation (theoretical) of bloc judging is irrelevant to its impact on skating results (empirical)? If judges are biased and showing favoritism, these issues have to be addressed.

Of course judging is subjective, but there are rules and standards that they follow so that judging is not an arbitrary and arcane process. Rules and protocol govern both the ordinal and CoP systems. The CoP is more transparent, giving viewers a certain satisfaction of understanding the process, but the CoP is also more faulty and deceptive, with many weak links in its chain. A court judge is a perfect example of a less transparent but fair and legitimate system.
 

chuckm

Record Breaker
Joined
Aug 31, 2003
Country
United-States
It's impossible to remove favoritism or unreasonably biased judging from the sport as long as a cloak of secrecy protects judges from scrutiny.

Under CoP, we can compare one judge's marks to the marks of other judges, and can plainly see when a different 'standard' is being applied. If a judge's scores are way out of line with the other judges', then even if that judge is selected, at least the severely aberrant results will be thrown out. What is harder to detect is groups of judges acting in tandem, marking the same skaters up and/or down. Even if such patterns are detected, we don't know who the judges are because their identities are protected.

We can't rely on the ISU to police its own judges, because as we have seen in the past, nothing has ever been done until an enormous scandal erupted into public view, forcing the ISU to act. In those cases, the identities of the offending judges were known. Not any more.
 
Joined
Jul 11, 2003
moyesii - Much of what you are saying is blowing my mind. I need time to devour it. However, I do like it and this dialogue.

But my argument about cultural bias does exist, imo, and the Slavic world has the edge on this form of non-intentional bias. That was the reason for so many complaints about the old system. They were demanding that Eastern Europe be limited to 2 judges in each competition. Western Europe would have 2 judges (a much more diverse group of people), Asia would have 2, and N. America would have 1. However, this didn't happen, and at 2002 Worlds Irina received 7 out of 9 first places, all 7 of which were from Slavic countries. There was no conspiracy here. It was a matter of communal taste in art (and sport). Even Dick, when asked the question "Can MK win tonight?", replied, "Not with this line of judges."

I raise this example not for discussion of fans' personal preference between Irina and Michelle but simply to show the theory of non-intentional cultural bias. And if you go through the scores of other competitions, as another poster did some time ago, where there are 2 or more Slavic judges, you will see that those judges agree with each other. Again, I reiterate, this is not a conspiracy. It is simply the way people are brought up.

In the US and Canada, the populations are much more diverse, and many of those people (and judges, for that matter) are close to their ethnic backgrounds and have distinctly different cultural tastes. Americans are not always in agreement with each other.

Now, conspiracies are a different kettle of fish, and I do believe they exist and can cause more dislike for the sport, especially in view of the SLC scandal(s). I'm not sure about the Dance competition. I have to study your theory on this. BTW, thank you for your interest in this subject.

Joe
 

euterpe

Medalist
Joined
Sep 4, 2003
It is true that 2002 Worlds had a panel for the Ladies SP and FS that was 100% European, and 5 of the 9 judges were Eastern European.

It is also true that the Russian and Belorussian judges gave Irina Slutskaya 6.0 for presentation in the SP.

But, having said that, Irina won that competition fair and square, and gave possibly the best pair of SP and FS performances she had done since the 2000 GPF. Kwan had stumbled on her 2A in the SP and finished 3rd, behind Irina and Fumie Suguri, so she wasn't in control of her destiny in the FS anyway. Even if she had won the FS (and it was close between Irina and Michelle, as they both skated very well), she couldn't have won gold.

The most egregious example of cultural bloc judging I have ever seen was the 2001 GPF final FS, where Irina stumbled throughout, completing only 4 clean triples, while Kwan completed 6 and Sarah Hughes 7. Because Eastern European judges dominated the 7-judge panel, Irina was placed 1st in a 4-3 decision.
 

moyesii

Rinkside
Joined
Nov 28, 2003
Joesitz said:
But my argument about cultural bias does exist, imo, and the Slavic world has the edge on this form of non-intentional bias. That was the reason for so many complaints about the old system.
Joesitz,
I understand what you're saying about cultural bias. If you are right, then "bloc judging" would be a misnomer, because the judges would be unwittingly forming groups among themselves. But there has been proof that judges have been working together. Remember the toe-tapping incident, and SLC of course. No one can suggest that these are isolated incidents. Also, even if results were based on inbred cultural differences rather than forged alliances, that wouldn't make the effects of cultural bias any less harmful than bloc judging. Also, are you suggesting that this problem is peculiar to the 6.0 system, and is not a factor in the CoP?
There was no conspiracy here. It was a matter of communal taste in art (and sport).
I'm curious how you'd explain what criteria specifically are involved in cultural bias (non-skating-related judgments) that discriminate between two skaters such as Michelle Kwan and Irina Slutskaya in the eyes of the judges. For example, skating-related factors that sometimes put Slutskaya rightly ahead of Kwan were speed, spins, and jumps. But what cultural aspects of these skaters' skating would you say made one skater more favorable to Eastern European judges? I felt that Kwan's nuanced skating and choreography to Scheherazade were much more in the classical European style than Slutskaya's Tosca. Aside from the fact that Slutskaya is from Eastern Europe, how would the judges "unwittingly" identify more with Slutskaya's skating than with Kwan's skating?
 
Joined
Jun 21, 2003
I have wondered about that, too, Moyesii. We think of the European and especially the Russian style as favoring grace over klutziness (Oksana v. Nancy). Yet when it was Kwan against Slutskaya and Butyrskaya, suddenly they did a one-eighty.

I think the so-called Eastern Bloc has been more consistent in pairs. Sale and Pelletier, skating like a man and a woman, were not as well received by Eastern European judges as were the more ornate Berezhnaya and Sikharulidze, skating like two artists.

Mathman
 
Last edited:
Joined
Jun 21, 2003
Pardon the double post, but I wanted to remark on some earlier posts on this thread.

The more I read arguments on the statistical merits and demerits of the CoP versus ordinal systems, both from casual fans and from serious observers with some exposure to statistical modeling, the more I have come to appreciate Cinquanta's cleverness. Like Bill Murray in Groundhog Day, we can hardly click on a figure skating URL without reading the same rehash about means and medians, standard deviations and ordinals, analysis parametric and non. (BTW, whatever we might feel about this argument, there is no reason for us to use discourteous language to each other.)

Speedy has thrown out so many red herrings, statistical and otherwise, that we have forgotten that this was supposed to have something to do with corruption, scandal and pre-judging via deal-making in smoke-filled back rooms.

But I love the CoP. All those delicious numbers for us to look at! Speedy has co-opted me, too.

Mathman
 

giseledepkat

Rinkside
Joined
Oct 7, 2003
I really enjoyed your last post, Mathman! It was so finely nuanced, even for you! ;)

I'm reminded of the old children's bedtime story, The Emperor's New Clothes. I think that a lot of the people so upset by CoP must feel like the little boy in that story, watching the emperor parade by and shouting, "But can't you see -- he's naked!"

Speaking for myself, I'm a fan of the new system, and I find so much to like in it -- but every once in a while I wonder if I'm just enraptured with my own image of the finery which has been described for us.
 

moyesii

Rinkside
Joined
Nov 28, 2003
Speedy has thrown out so many red herrings, statistical and otherwise, that we have forgotten that this was supposed to have something to do with corruption, scandal and pre-judging via deal-making in smoke-filled back rooms.
Mathman, I haven't forgotten. There's only so much we can deal with at a time, and it feels like there's very little we can do (as spectators).

By the way, sorry if my posts sound like rehash. If you find yourself enamored with the CoP and the ISU, perhaps it'd be better just to ignore my threads about the CoP and its demerits. As for me, I find the statistical faults in the CoP to be glaring, and very dangerous, particularly because they mislead the general public into thinking that results are being computed fairly and systematically, without error.

I'm reminded of something as well, giseledepkat. In my stats course in college, one of the things I learned very early is that statistics explains many phenomena counterintuitively. Hence, I think, the emphasis that people like Sandra Loosemore and George Rossano have placed on statistics in their methodology. Unfortunately, the majority of people will not be savvy about stats, and this makes it easy for us to be hoodwinked by the ISU into thinking that we have a legitimate, working system.

As I said, one of the particularly preposterous aspects of the CoP is that many competitions' medal placements will be determined by the random count of judges' marks, particularly in close competitions. Would it be right for an Olympic medal to be decided this way? I can picture it now: another close event a la Kerrigan/Baiul. Cinquanta says, "Ah, but it's not the judges' fault. The competition was decided by fate! No one is to blame!"

If you haven't become too weary of a CoP discussion, perhaps we can have a discussion where you state one or two things that you like about the CoP and then we can analyze those, and then move on to another aspect of the CoP and analyze that, and so forth. Or, if you're way ahead of me (since I'm still learning), maybe you can either state your counterarguments to either Loosemore's or Rossano's findings on the statistical flaws in the CoP, or steer me towards your previous writings on the matter. This way we can tackle the CoP one step at a time, and you guys can help me understand and learn some things as well.
 
Joined
Jun 21, 2003
Well, first I should say that I am not a statistician. I don't think that either Sandra Loosemore or George Rossano is, either. Dr. George Rossano works in aerospace engineering and Dr. Sandra Loosemore, I believe, in the theory of computation. My field is geometry with applications to cosmology. Still, I have taught statistics courses at the undergraduate and graduate levels off and on for quite a number of years.

I am not exactly enamored of Mr. Cinquanta and the ISU, but I do appreciate a good con man when I see one. It's not so much that I am bored with debating the statistical merits of various scoring systems as that I think these debates mostly serve to deflect attention from the real issues, namely, doing everything we can to catch cheaters and banning them for life when we do. Why are people who have been caught red-handed with their fingers in the till still judges in good standing? I think that issues such as regional representation on judging panels and weakening the grip of national federations over the judging process are more important than splitting hairs about how points are tallied up.

One of the points that Dr. Rossano is particularly concerned with is the effect of the random draw. I agree. But this is a public relations issue, not a statistical one. The public says, hey, wait a minute, doesn't this introduce an unwanted element of random chance into the mix? Not really. If you have a sitting panel of, let us say, 9 judges, it does not matter statistically whether or not an additional 5 dummy judges, whose votes are predetermined not to count, are taking up space at the judges' table. It's rather silly, of course, and it does give the public something to howl about. But statistically speaking, choosing 9 judges out of a pool of 14 by computer 15 minutes before the competition starts, has the same effect as if the voting judges had been chosen months in advance by drawing names out of a hat, which is how they did it under the old system. No matter when the judges' draw takes place, or how, skaters face the same probability of obtaining a draw that is favorable to them, or not, for whatever reason.

Rossano and others go on to say that the best thing would be to increase the judges' panel to, say, 25 and count every vote. Of course it would. It would be better yet to increase it to 1000 judges, if we could find that many qualified. But it would be no better or worse first to choose 2000 judges to crowd around the table and then eliminate half of them by a random draw just before the contest started. If we have in mind the paradigm of sampling theory, the only thing that counts is the sample size, along with a guarantee that each judge has an equally likely a priori chance of being selected.
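The equivalence claimed here can be checked with a small Monte Carlo sketch. This is an illustration only, with an invented miniature setup (a pool of 6 judges narrowed to 5 and then to 3), not anything from the actual ISU protocol:

```python
import random
from collections import Counter
from itertools import combinations

random.seed(42)
POOL, MID, FINAL = list(range(6)), 5, 3   # 6 candidates -> 5 -> 3
TRIALS = 200_000

def direct_draw():
    """Draw the final panel in one step."""
    return frozenset(random.sample(POOL, FINAL))

def two_stage_draw():
    """Draw an intermediate panel first, then cut it down."""
    intermediate = random.sample(POOL, MID)
    return frozenset(random.sample(intermediate, FINAL))

direct = Counter(direct_draw() for _ in range(TRIALS))
staged = Counter(two_stage_draw() for _ in range(TRIALS))

# Every one of the C(6,3) = 20 possible final panels should appear
# with relative frequency close to 1/20 = 0.05 under BOTH schemes.
for panel in map(frozenset, combinations(POOL, FINAL)):
    assert abs(direct[panel] / TRIALS - 0.05) < 0.01
    assert abs(staged[panel] / TRIALS - 0.05) < 0.01
print("both draw schemes are uniform over all 20 possible panels")
```

Adding the intermediate cut changes nothing about which final panels are possible or how likely each one is, which is Mathman's point about the dummy judges.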

But I have a little bit of a problem with the sampling theory model anyway. This depends on a tacit assumption that there is somehow a "true score" for every performance, that this "true score" is in principle quantifiable (as being the mean score of the hypothetical population of all possible qualified judges past, present or future, for instance), and that we can then treat the judges' panel as a sample of this population (like a political poll trying to predict the outcome of a national election).

Statisticians like this model because for many statistics it has been thoroughly understood for 100 years -- just pop the numbers into the formula and you're done. (Like everyone who likes both numbers and skating I jumped right up after the Nebelhorn competition and ran the numbers through every test I could think of. This was amusing for a while.)

But I am not convinced that this mythical "true score" actually exists. As we all admit, skating is subjective. In sober reality, the only thing that counts is, do these 9 judges like your performance or not. Thus the voting panel is the entire population, and everything that we learned in our statistics classes goes out the window.

Another point at which I disagree with the critics of the CoP is about the whole secrecy issue, and whether we can tell (as easily as before) when individuals and blocs are cheating. Yes, it's true that the public does not know that judge number 4 is Joe Blow from Outer Slobovia. But if the ISU continues to give us all the details of the scoring of each element, it will be easy enough to figure out which judges (by number) were eliminated in the random draw, and then to work up statistical analyses like, "judges 2, 7 and 9 are obviously conspiring to hold down skater A." Much more to the point, IMO, is what the ISU plans to do with this information, if anything.

Anyway, I guess the bottom line is, I don't really have a horse in this race. The mean, trimmed mean, median, ordinals -- none of this can stop people from cheating if they are determined to do so. Like Pollyanna I hold out a foolish hope that somewhere down the line the ISU folks will come to realize that they are killing the sport because the public, which after all pays the bills, thinks they're a bunch of crooks. Maybe then, when it hits them in the pocketbook, they will actually do something about it rather than just tinkering with the scoring system.

If I had my druthers, I'd rather see the USFSA sign up with the WSF.

Mathman :)

PS. Despite this rant, I love this sport! Go AP and Jenny at U.S. Nationals!
 

moyesii

Rinkside
Joined
Nov 28, 2003
I think these debates mostly serve to deflect attention from the real issues, namely, doing everything we can to catch cheaters and banning them for life when we do.
I agree. And I appreciate your post. But I think you said some interesting things about the CoP that I want to address, because I find them irreconcilable:
But I have a little bit of a problem with the sampling theory model anyway. This depends on a tacit assumption that there is somehow a "true score" for every performance, that this "true score" is in principle quantifiable (as being the mean score of the hypothetical population of all possible qualified judges past, present or future, for instance), and that we can then treat the judges' panel as a sample of this population (like a political poll in trying to predict the outcome of national election).
...
But I am not convinced that this mythical "true score" actually exists. As we all admit, skating is subjective. In sober reality, the only thing that counts is, do these 9 judges like your performance of not. Thus the voting panel is the entire population, and everything that we learned in our statistics classes goes out the window.
If I understand you correctly, we both agree that the CoP is inadequate. We both agree that a "true score" in skating cannot possibly exist. The only thing we can hope to achieve with some certainty is to rank the skaters. In effect, this is the first objective of both the ordinal and CoP systems. A secondary goal of the CoP is to give a meaningful score for each skater. Ironically, the secondary goal is the one being hyped by the ISU, but it is the first goal where the CoP is severely deficient. Deficient because results based on these scores -- of dubious value and meaning -- will themselves be dubious.

I took your quote above to mean that it is impossible to sample a population for a true score that doesn't exist. I'm unclear however, if we both agree that we can indeed "treat the judges' panel as a sample of this population" under the ordinal system. In the ordinal system, if a different sample of 9 judges out of a population of 1000 was taken multiple times, the system could produce samples with consistent results, i.e. ranking of skaters (but NOT assigning individual scores as in CoP).
One of the points that Dr. Rossano is particularly concerned with is the effect of the random draw. I agree. But this is a public relations issue, not a statistical one. The public says, hey, wait a minute, doesn't this introduce an unwanted element of random chance into the mix? Not really. If you have a sitting panel of, let us say, 9 judges, it does not matter statistically whether or not an additional 5 dummy judges, whose votes are predetermined not to count, are taking up space at the judges' table. It's rather silly, of course, and it does give the public something to howl about. But statistically speaking, choosing 9 judges out of a pool of 14 by computer 15 minutes before the competition starts, has the same effect as if the voting judges had been chosen months in advance by drawing names out of a hat, which is how they did it under the old system. No matter when the judges' draw takes place, or how, skaters face the same probability of obtaining a draw that is favorable to them, or not, for whatever reason.
It is only true that a random draw is insignificant when we are talking about the ordinal system. And first, we have to bar the issue of bloc judging, which is a separate issue that confounds both systems, and in which case we do often say, "Oh but with a different set of judges, so-and-so skater would have won..."

For a discussion of the ordinal system (as for the CoP), we have to assume that all judges in the panel are marking each skater the same, regardless of nationality and using the same standards and criteria. With these standards in place, we have to assume that the results of a judging panel with a 5-4 or 6-3 split are indicative of bloc judging, which is why the majority of judges' placements determine the final results under the ordinal system, and which is why the actual scores themselves don't matter. (Under the CoP, the scores matter, to the detriment of the final placements.) For the ordinal system only, the placements of the 9 judges ARE representative of the larger population and how the population would have judged the same event.

Your argument is that a sample of a sample will be representative of the population, such that a sample of 5 marbles out of a sample of 9 marbles out of 100 would probably be equal to a sample of 5 marbles out of a population of 100. Only in the ordinal system would this be a correct assumption. Under the CoP system, however, every judge introduces error and variability through each little mark that they generate, therefore each judge can and will individually affect the final outcome regardless of the majority opinion of the panel. Under the ordinal system, 9 judges' placements are compared, but in the CoP system, each judge contributes MULTIPLE scores -- with error -- into a sum total. Because actual scores are being used to compute the results and not ranks, every time you take a random sample of the judges' marks in the CoP, the results of the competition will be different!
As we have seen, the GP competitions have been coming down to within hundredths of a point margins. Therefore, random count of judges' marks has a significant impact in the CoP system.
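A toy model can illustrate how panel composition decides a close result under a summed-score system. All the numbers below are invented: 14 judges, with skater A ahead of B by half a point on 7 judges' cards and behind by half a point on the other 7, so the full 14-judge totals tie exactly. Enumerating every possible 9-judge panel:

```python
from itertools import combinations

# Invented per-judge (A - B) score differences: the full panel ties.
diff = [0.5] * 7 + [-0.5] * 7

outcomes = {"A": 0, "B": 0, "tie": 0}
for panel in combinations(range(14), 9):   # all C(14,9) = 2002 panels
    d = sum(diff[j] for j in panel)
    outcomes["A" if d > 0 else "B" if d < 0 else "tie"] += 1

print(outcomes)   # {'A': 1001, 'B': 1001, 'tie': 0}
```

Half the possible panels hand the win to each skater. Note that in this symmetric toy the ordinal majorities would flip the same way, so by itself it shows only how sensitive a dead-even contest is to the draw, not a defect unique to point totals.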

The ordinal system uses a majority system of ranks to determine the results. The ordinals eliminate error that would be inherent in raw scores as used in the CoP. The unfortunate conclusion is that it is not possible in any fair way to use an absolute point system in our subjective sport. It'd be nice if it were possible, but it's not, and the ISU is misleading people. The scoring can't be done systematically on such a micro level without enormous amounts of significant error introduced into the results. Secondly, the ISU claims that the point system gives a more meaningful measure of a skater's performance. Unfortunately, this cannot be true, because 20 out of 20 draws of a hat, the competition results will be different. This is not true with the ordinal system, except in the case of bloc judging, which must be dealt with irrespective of the judging system being used.

Under CoP, the random count of the scores by itself is a fault considering that the sample size is small to begin with. But in addition, even if we assume that the judges aren't cheating, there is the issue of human error in judgment. The ordinal system effectively eliminates this error by refining the judges' marks into relative placements. The CoP does not eliminate error, despite the trimmed mean, since the mean is not a robust measure of central tendency in small samples, and since every judge's mark will have error and will introduce its own error into the sum. Therefore no amount of randomization or dropping scores will change the fact that the CoP's results are unreliable. In the ordinal system, the majority count should effectively eliminate human error and produce reliable results, consistent results time and again.
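The limit of the trimmed mean is easy to demonstrate with a small sketch (invented marks, not protocol data): trimming only the single highest and lowest score cannot neutralize two judges inflating their marks together.

```python
def trimmed_mean(marks):
    """CoP-style trim: drop the single highest and lowest mark."""
    s = sorted(marks)
    return sum(s[1:-1]) / (len(s) - 2)

# Invented marks for one component from a 9-judge panel:
honest = [7.0, 7.1, 6.9, 7.0, 7.2, 6.8, 7.0, 7.1, 6.9]

# Two judges acting together add a full point to their marks; the
# trim can discard only ONE of the two inflated scores.
inflated = honest[:-2] + [honest[-2] + 1.0, honest[-1] + 1.0]

print(round(trimmed_mean(honest), 3))    # 7.0
print(round(trimmed_mean(inflated), 3))  # 7.157 -- shifted despite the trim
```

A single aberrant judge is trimmed away; two coordinated judges are not, which is exactly the tandem-marking problem raised earlier in the thread.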

One of the reasons that the CoP is inherently faulty is that it is not complete in its objective. The first part, TES, consists of the technical components, which are judged element by element as the skater's program progresses. It is more objective than the 5 presentation component scores, which must be judged at the end of the skater's performance, just like the way it was in the ordinal system, except now there are 5 subjective marks instead of 1. These component scores are all as subjective and liable to human error as the single presentation mark in the 6.0 system. The difference is that these 5 subjective marks are added together to form part of a total that is touted to be an objective measure of performance. This is just not possible given the subjective nature of judging the presentation components. The total is not equal to the sum of its parts.
 
Joined
Jun 21, 2003
Give me a day or two to ponder that post, OK?:)

Hey Rgirl, are you reading this? (Rgirl is my Golden Skate nemesis on this issue.)

MM
 
Joined
Jun 21, 2003
"Beware of Mathematicians and all those who make false prophecies. The danger already exists that the Mathematicians have made a covenant with the Devil to darken the spirit and to confine mankind in the bonds of Hell." -- St. Augustine

Well, I'll do my best to try to respond to some of the points that you raise. I think my overall position is that the CoP has problems and so do the various systems based on ordinal placement, but none of it matters very much in view of the real problems that plague the sport of figure skating and the ISU.
But I think you said some interesting things about the CoP that I want to address, because I find them irreconcilable.
I think this is the price we pay for living in the real world. No matter how clever we think we are, all of our mathematical and statistical models are very quickly exposed as being just that -- models (in the sense of a model airplane, say) of something that can't really be captured in such a simple fashion. The great Scottish philosopher David Hume staked his reputation on his classic work, A Treatise of Human Nature, in three volumes. The thrust of Volume 1, proved in a couple hundred pages of close argument, was that there is no such thing as causality. Volume 3, on ethics, begins: no theory of ethics is possible without causality, so forget everything I said in Volume 1.

So it is with statistics. We pretend that something exists (a true mean or a true ordinal ranking), even though we know it doesn't, because otherwise we couldn't carry on these discussions.
I took your quote above to mean that it is impossible to sample a population for a true score that doesn't exist. I'm unclear however, if we both agree that we can indeed "treat the judges' panel as a sample of this population" under the ordinal system.
No, I don't think so. I think that the same arguments against the reliability of CoP-type scores weigh also against ordinal placements.
In the ordinal system, if a different sample of 9 judges out of a population of 1000 was taken multiple times, the system could produce samples with consistent results, i.e. ranking of skaters (but NOT assigning individual scores as in CoP).
I think that such a sampling distribution would have variations like any other sampling distribution. True, two different panels could give the same rankings, while it would be virtually impossible to achieve a perfect match of CoP scores. But I do not see any reason to think that there would be less variation in one system compared to the other in the final outcomes of the contest if we looked at all 2.7 * 10^21 possible ways of choosing a 9 judge panel out of our 1000 candidates.
It is only true that a random draw is insignificant when we are talking about the ordinal system.
This is really the only point that I am fairly confident about. Any statistic whatever, be it mean, median, ordinal ranking or whatever, will have the same sampling distribution no matter how convoluted the process is of selecting the sample, so long as each prospective judge has an equal chance of being included in the final group of nine.

Imagine the most extreme case of random draw. At 0.010 seconds before the program begins, the panel of 1000 is randomly cut down to 999. A thousandth of a second later, it is cut down to 998....Finally only nine are left. The scores of these nine count and all the other 991 sit at the table, having been made fools of by Speedy, pretending to mark. No matter what statistic you extract from the final sample, the probability distribution of that statistic will be exactly the same as if you had, say, rolled dice for it three months earlier. In either case, every possible 9-judge combination out of the original 1000 will have an equally likely possibility to be the final voting panel (namely, 1 chance in 2.7*10^21, LOL).

It is not that the effect of having the random draw in two stages is "insignificant," rather that it is impossible for it to have any effect at all.
For a discussion of the ordinal system (as for the CoP), we have to assume that all judges in the panel are marking each skater the same, regardless of nationality and using the same standards and criteria. With these standards in place, we have to assume that the results of a judging panel with a 5-4 or 6-3 split are indicative of bloc judging,
I think that each judge should be judging fairly from one skater to another, but the judges need not agree among themselves. Therefore a 5-4 or 6-3 split may indicate nothing more than a difference of opinion.
For the ordinal system only, the placements of the 9 judges ARE representative of the larger population and how the population would have judged the same event.
I do not see how this can possibly be asserted. For any statistic whatever which admits of variation within a population, there is always an associated sampling error, by which we understand the difference between the population statistic and the sample statistic. The expected value of the sampling error (the "standard error") is typically easy to quantify for most commonly arising statistics (sigma/sqrt(n) for the mean, sigma*sqrt(pi/2n) for the median, etc., etc.) Non-parametric statistics such as ordinal rankings are certainly not immune to this phenomenon.

Did you mean to say MIGHT BE instead of ARE?
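The two standard-error formulas quoted above can be sanity-checked by simulation. This is a sketch assuming normally distributed marks; note the median formula is asymptotic, so for a panel of only 9 the observed value sits somewhat below it:

```python
import math
import random
import statistics

random.seed(1)
SIGMA, N, REPS = 1.0, 9, 100_000

means, medians = [], []
for _ in range(REPS):
    panel = [random.gauss(0.0, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(panel))
    medians.append(statistics.median(panel))

se_mean_obs = statistics.stdev(means)    # observed SE of the mean
se_med_obs = statistics.stdev(medians)   # observed SE of the median

print(se_mean_obs, SIGMA / math.sqrt(N))               # both about 0.333
print(se_med_obs, SIGMA * math.sqrt(math.pi / (2 * N)))  # vs asymptotic 0.418
```

The mean's formula matches closely even at n = 9; the median's asymptotic formula overstates the small-panel standard error a little, which does not change Mathman's point that ordinal-type statistics carry sampling error too.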
Your argument is that a sample of a sample will be representative of the population, such that a sample of 5 marbles out of a sample of 9 marbles out of 100 would probably be equal to a sample of 5 marbles out of a population of 100. Only in the ordinal system would this be a correct assumption.
On the contrary, this is absolutely and incontestably true for any statistic whatever.

Choose 9 out of 100, then 5 out of 9. What is the probability that any one particular 5-judge sample ends up as the sitting panel?

Answer: Our particular 5-judge panel has 1 chance out of 597520 of being included in the 9-judge panel, then 1 chance out of 126 of surviving the final cut (the hypergeometric distribution). So the probability of that particular set of 5 marbles ending up as the one that counts is 1 chance in 597520*126 = 75287520.

Suppose instead that we chose five marbles out of 100 to begin with. Our given panel now has one chance out of 100!/5!95! = 75287520 of being chosen. Exactly the same.

Now that this particular panel has been selected, any statistic whatsoever that we can extract from it is, well, the statistic we can extract from it. (?)
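The arithmetic in this exchange can be verified exactly with integer binomial coefficients (`math.comb`, Python 3.8+):

```python
from math import comb

# Choosing 9 judges of 100 and then 5 of those 9 gives any particular
# 5-judge set the same chance as a direct draw of 5 from 100.  The
# exact combinatorial identity behind that claim:
assert comb(100, 9) * comb(9, 5) == comb(100, 5) * comb(95, 4)

# The figures quoted in the post:
print(comb(9, 5))                   # 126
print(comb(100, 5))                 # 75287520
print(comb(100, 5) // comb(9, 5))   # 597520
```

So 1/597520 * 1/126 = 1/75287520 holds exactly, confirming the hypergeometric computation above.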
Therefore each judge can and will individually affect the final outcome regardless of the majority opinion of the panel.
Yes. That's what the judges are there for. To affect the final outcome.
Because actual scores are being used to compute the results and not ranks, every time you take a random sample of the judges' marks in the CoP, the results of the competition will be different!
I hope you mean that "every time you take a random sample (i.e., seat this particular 5-judge panel instead of that one), the total scores under the CoP scoring will be slightly different." If by "results" you mean "who won, who came in second, etc." then of course your underlined and exclamation-pointed sentence cannot possibly be true.
It'd be nice if it were possible, but it's not, and the ISU is misleading people.
In my previous post I jokingly complimented Cinquanta by calling him a good con man. In fact, however, I don't think he's conning anybody at all. Everybody knows the CoP is crap -- but it's so much fun to roll around in even so!

Everyone that I have tried to explain the CoP to has responded with, "but how will that stop judges from cheating?"

So I think the most we can say is that the ISU is attempting to mislead people, but I don't think that anyone is actually being misled.

Maybe this is a good time to insert again this disclaimer: I am no big fan of the CoP. As I say, I think it's fun. I guess I just don't think that the ordinal system is any better at speaking to the problems that figure skating and the ISU have brought on themselves by not running a clean house.
...because in 20 out of 20 draws from the hat, the competition results will be different.
Again, you mean that the total scores will be different, not necessarily the final placements, right?
...the majority count should effectively eliminate human error and produce reliable results, consistent results time and again.
I guess I'm just more pessimistic by nature. Whenever I see a claim that someone has "eliminated human error" I think first of Murphy's law. I really think that you are way too impressed at the sublime glories of the ordinal system.

I wish instead people would just say, "well, it's a little better than the CoP."
...just like the way it was in the ordinal system, except now there are 5 subjective marks instead of 1. These component scores are all as subjective and liable to human error as the single presentation mark in the 6.0 system. The difference is that these 5 subjective marks are added together to form part of a total that is touted to be an objective measure of performance.
I take it that your objection here is to the "touting," not to the method of determining the scores. As you say, the method of determining the "presentation" scores is about the same in both systems.

This is one of the points about the CoP that statisticians have jumped all over with great glee, because it's easy to run statistical tests about it. So far under the CoP the judges have not made any pretense at figuring out the five program components and the multiple categories within each. If a judge likes a skater, he or she just gives that skater uniformly high marks across the board. Just like the old system, where a skater got just one score which said, OK, that was pretty, or maybe not so pretty.

BTW, you don't have to be a statistician to see this. A cursory glance at the numbers shows it as plain as day. In fact, I had intended to remark on your earlier comment that many of the conclusions of statistical analysis are counterintuitive. I don't think so. I think for the most part what statistics does is to provide ways to quantify things that are qualitatively obvious anyway.

So, let's see, do I have a conclusion? No, not really. The reason that I like the CoP is that I have learned a lot about skating by studying it. It is interesting to me to see what relative value the experts put on certain elements, to ask with Sasha Cohen, "So if my spiral isn't a level three then what the heck do I have to do to make it better," to think about whether a flutz deserves a -1.0 or a -2.0 GOE, etc.

As for the ordinal system, I learned all I wanted to know about it at Salt Lake City. Several months before the contest, the panel of judges was announced for the ladies' event. Everyone took one look at the panel and said, oh, no, Michelle is screwed. As it happened,

5 judges thought Sarah was better than Irina, and 4 thought Irina was better than Sarah.

4 judges thought Michelle was better than Irina and 5 judges thought Irina was better than Michelle.

I don't know. I'm just not seeing: "In the ordinal system, the majority count should effectively eliminate human error and produce reliable results, consistent results time and again."

Mathman

Edited to add:

PS. One more thing about the random draw. Although this does not affect the results statistically, nevertheless IMO it is a terrible public relations blunder on the part of the ISU. It does not accomplish its goal of contributing to the anonymity of the judges (itself a bad thing anyway), and all it accomplishes is to make the public think either that the ISU is trying to pull a fast one or that it has lost its mind.
 

Ptichka

Forum translator
Record Breaker
Joined
Jul 28, 2003
This is one of the points about the CoP that statisticians have jumped all over with great glee, because it's easy to run statistical tests about it. So far under the CoP the judges have not made any pretense at figuring out the five program components and the multiple categories within each. If a judge likes a skater, he or she just gives that skater uniformly high marks across the board. Just like the old system, where a skater got just one score which said, OK, that was pretty, or maybe not so pretty.
That is only partially correct. I think the judges have made an effort to judge Skating Skills (the first program component) somewhat fairly. Look, for instance, at the Men's SP at Cup of China. There, even though Gao's overall TCS is 9th, his SS is 4th. I know this is an extreme case, since the judges probably could not justify saying that a skater with the STRONGEST TES has very poor SS, but I have noticed this in other competitions.
 
Joined
Jun 21, 2003
Hi, Ptichka. Well, I think that this is one area where statistics really can contribute to understanding the numbers. Here are links to two recent articles that quantify the tendency to give blanket marks across the board in the program components. One is by Dr. Sandra Loosemore and the other by Dr. Dirk Schaeffer.

http://www.goldenskate.com/articles/2003/101703.shtml

http://www.frogsonice.com/skateweb/articles/cop-components.shtml

I also posted an analysis of variance approach to this question a while back, but I can't find it in the archives right now. Anyway, as expected, for each judge the variation across the different categories was very small compared to the total variation among all the numbers.

To me, though, this is not really a fault in the system. It just means that some skaters are better than others.

Mathman
 

moyesii

Rinkside
Joined
Nov 28, 2003
MM,
I'm surprised at myself that I was actually able to follow your post, since you threw in so many quotes and stats. :D Let's concentrate on one point, which I think is the main point, and which you addressed this way:
I hope you mean that "every time you take a random sample (i.e., seat this particular 5-judge panel instead of that one), the total scores under the CoP scoring will be slightly different." If by "results" you mean "who won, who came in second, etc." then of course your underlined and exclamation-pointed sentence cannot possibly be true.
In a close competition (see Trophee Lalique men's event), random sampling DOES affect the results. Every judge will mark a skater's component scores differently (see http://www.frogsonice.com/skateweb/articles/cop-components.shtml ) with huge amounts of variability between any two judges' scores for each skater. Therefore, the final outcome is completely dependent on the random draw since the final placements are determined by the results of the total scores, which will ALWAYS be different given a different set of judges. I think we both agree on this. But you don't agree that this will affect the final placements.

In the ordinal system, it is true that two different samples of 9 judges might produce results like 5-4 and 6-3 in favor of the same skater, but this does NOT affect the placements. The majority of 1st place ordinals is what counts, and the others are just error. Also keep in mind that statistically, this is expected since no sample will be an exact representation of the population.

The ordinal system is much more robust than the CoP system, because taking random samples from the population of judges has a very likely chance of producing consistent results (outcomes) time and time again, even in a close competition (ex. 1996 Worlds, Lu Chen and Michelle Kwan, no bloc judging). The results from the ordinal system are valid and reliable, unlike the CoP results.

I think Sandra Loosemore's point in the article above is that each judge should effectively NOT be considered a sample from the same population, because of the enormous amounts of variability between any two judges' scores for the same components. Therefore a sample of the judges' marks under that particular system can in no way produce reliable or valid results, since the scores themselves are not reliable or valid. If you analyze the CoP from the bottom-up, it completely falls apart.
 

gkelly

Record Breaker
Joined
Jul 26, 2003
moyesii said:
final outcome is completely dependent on the random draw since the final placements are determined by the results of the total scores, which will ALWAYS be different given a different set of judges. I think we both agree on this. But you don't agree that this will affect the final placements.

Sometimes a different sample will change the results, sometimes not. The closer the contest, the more likely the results are to change, regardless of which system you use.
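That point is easy to illustrate with a toy simulation. To be clear, the "gap" and "noise" numbers below are invented for illustration; this is not the actual CoP arithmetic, just a sketch of the idea that close contests are far more sensitive to which panel gets drawn:

```python
# A toy model (not the actual CoP arithmetic) of the point above: the
# closer the contest, the more often a different random 5-judge panel
# drawn from the same 9-judge pool reverses the winner. "gap" and
# "noise" are invented parameters for illustration only.
import random

random.seed(7)

def flip_rate(gap, noise=5.0, pool=9, panel=5, trials=20000):
    """How often do two independent 5-of-9 panel draws disagree on the winner?"""
    flips = 0
    for _ in range(trials):
        # d[i] = judge i's score margin for skater A over skater B
        d = [gap + random.gauss(0.0, noise) for _ in range(pool)]
        winner1 = sum(random.sample(d, panel)) > 0
        winner2 = sum(random.sample(d, panel)) > 0
        flips += winner1 != winner2
    return flips / trials

close = flip_rate(gap=0.5)  # nearly even skaters
clear = flip_rate(gap=5.0)  # one skater clearly stronger
print(close, clear)         # close contests flip far more often
```

With these made-up numbers the nearly-even contest flips an order of magnitude more often than the clear-cut one, which is all the qualitative claim amounts to.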

I'll leave it to Mathman to put it in mathematical terms and to figure out whether *results* are more or less likely to change under one system compared to the other.


In the ordinal system, it is true that two different samples of 9 judges might produce results like 5-4 and 6-3 in favor of the same skater, but this does NOT affect the placements.

Ah, but if you take a 5-4 decision in favor of skater A arrived at by 9 out of a larger pool of judges and randomly substitute one or more of the judges who counted in that decision with others who did not initially count (e.g., choose a different "substitute judge" out of the initial draw from the 10-judge panels used through 2002), the ordinal breakdown might still end up being 5-4 in favor of skater A, or it might become 6-3 in favor of skater A, OR it might swing to 5-4 in favor of skater B.

Substitute judge prefers A:
5/9 chance of keeping ordinal breakdown the same
4/9 chance of changing to 6-3

Substitute judge prefers B:
4/9 chance of keeping the breakdown exactly the same
5/9 chance of switching the 5-4 split to 4-5 and giving the win to B

What does that make, a 5/18 chance of reversing the results (assuming the substitute is equally likely to prefer either skater)?

With a larger pool of judges and more options for the random selection (e.g., a different set of 9 out of the 14 used under the interim system, or hypothetically choosing any 9 out of a set of 1000), it's much more likely that choosing a different set of judges would change the ordinal breakdown and potentially the results. There is no guarantee in a close contest that any one particular result is the "correct" one, so you can only use as a result what that particular panel happens to yield according to that particular scoring system. Because judging is a matter of (educated) opinion, it's not an exact science.
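For what it's worth, the 5/18 figure above checks out exactly, under the stated assumption that the substitute judge is equally likely to prefer either skater. A short enumeration in Python:

```python
# Exact enumeration of the substitute-judge scenario above: a 5-4 panel
# for skater A, one of the 9 judges replaced uniformly at random, and the
# substitute assumed equally likely to prefer A or B.
from fractions import Fraction

panel = ["A"] * 5 + ["B"] * 4  # 5-4 in favor of A
p_reverse = Fraction(0)
for i in range(9):              # which judge gets replaced
    for sub in ("A", "B"):      # substitute's preference
        new_panel = panel[:i] + [sub] + panel[i + 1:]
        if new_panel.count("B") > new_panel.count("A"):  # B now wins 5-4
            p_reverse += Fraction(1, 9) * Fraction(1, 2)

print(p_reverse)  # 5/18
```

The only reversing cases are the five where an A-judge is replaced and the substitute prefers B, each with probability 1/18.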

The majority of 1st place ordinals is what counts, and the others are just error.

Think of a 5-4 decision where you preferred the skater who came out on the losing end. Do you consider the opinions of the judges who agreed with you to be "just error"?

Also keep in mind that statistically, this is expected since no sample will be an exact representation of the population. The ordinal system is much more robust than the CoP system, because taking random samples from the population of judges has a very likely chance of producing consistent results (outcomes) time and time again, even in a close competition

If you look at *close* competitions, especially when the ordinals are mixed among more than two skaters, you can find many examples where different calculations applied to the exact same sets of ordinals (e.g., majority vs. OBO) would produce different results. You can also find plenty of examples where substituting the rankings of the referee or the substitute judge for any of the official judges at random would change the results.

If you look at unanimous or near-unanimous decisions, any valid system should produce the same results.
 

moyesii

Rinkside
Joined
Nov 28, 2003
Unfortunately, your entire argument is based on a wrong assumption, gkelly, which is that a "close" competition is defined as a 5-4 split of the panel. A 5-4 split is not the definition of a close competition. An event where MK and Lipinski skated like at Nagano, or like 1996 Worlds between MK and Lu Chen, is a close comp. Note that in both of those events, there was a convincing majority of 1st place ordinals for one skater, AND neither skater was from E. Europe. A 5-4 split is NOT indicative of a close comp. It is indicative of bloc judging.

Consider that a sample of the judging population for any given comp will approach the mean. The mean will be the ideal where ALL 9 judges place the same skater 1st. All other splits of the panel, e.g. 5-4 or 6-3, are due to variability. However, as long as the results reflect the majority opinion of the panel, the sample of judges can be said to represent the population.

If you look at unanimous or near-unanimous decisions, any valid system should produce the same results.
Under CoP, there is no such thing as a unanimous decision of the panel, because of the large contribution of error and chance to the component marks. The sum of these marks will produce results that no one could have predicted under a valid system.
 
Joined
Jun 21, 2003
Hi, Moyesii. We seem to be talking about different things in this discussion of the effect of the random draw. (BTW, did you notice that we both gave the link to the same paper by Sandra Loosemore in our last two posts, LOL. So I guess we're on the same page after all.)

If you draw a sample from a large population, then of course the results may be different if you choose sample X than if you choose sample Y. This is true for any statistic that can be extracted from a sample, whether it is the sum of a lot of component scores (CoP) or whether it is an average ordinal placement (OBO, for instance).

What I am saying is that it does not matter WHEN this random selection is made. If the random selection is made by drawing straws three months before the event (the old method), then this introduces the possibility of "sampling error." If the draw is done in two stages (first a draw of 14, then later a draw of 9 from the 14, the new method), the amount of sampling error is exactly the same.

Think of it this way. Would we like the system any better if they conducted the second stage of the random draw a day ahead of time instead of 15 minutes? A week before? Three months before? At the same time as we did the initial draw of 14?

Would we like the system any better or any worse if the 5 judges who were not selected then went home and did not participate in a phony charade of play-judging? Would the system be any better or worse, and would the results of the competition be any different, if the five dummy judges were struck by lightning two minutes before the skating began?

I will try to give an example, whittled down in size to make the issue more transparent.

There are 50 judges in the pool. Call them judge #1, judge #2, etc.

THE CoP WAY: Three months before the event, there is a random draw of five judges. Judges 3, 13, 25, 41 and 43 were chosen. It could have been different, but that's what happened.

Here are the scores given to the top men's competitors. I will use Plushenko (your personal favorite, LOL) and Joubert.

------#3---#13--#25--#41--#43
Jou--200--200--200--150--150
Plu--150--150--150--200--200

Now we have a random draw of three. As luck would have it, we choose judges # 25, 41 and 43. Plushenko wins! Joubert is robbed by the system!

THE OLD WAY: Three months before the event, there is a random draw of three judges. As luck would have it, we chose judges # 25, 41 and 43. Here are their marks:

-----#25--#41--#43
Jou--200--150--150
Plu--150--200--200

Plushenko wins, fair and square.

This little example is supposed to illustrate why the only thing that matters is this: who are the final three judges? In either case Plushenko wins with 550 total points to Joubert's 500 (or, by two to one in first place ordinals). This is what I mean when I say that statistically the random draw thing is a red herring.
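If anyone wants to verify that the timing of the cut really doesn't matter, here is an exhaustive check in Python on a scaled-down pool (6 judges instead of 100, a number chosen only to keep the enumeration tiny):

```python
# Exhaustive check on a small pool: drawing 4 judges out of 6 and then
# 2 out of those 4 gives every possible 2-judge final panel exactly the
# same probability as drawing 2 out of 6 directly, so WHEN the cut
# happens cannot matter statistically.
from fractions import Fraction
from itertools import combinations
from math import comb

pool = range(6)
prob = {pair: Fraction(0) for pair in combinations(pool, 2)}

for big in combinations(pool, 4):       # stage 1: the "big" draw
    for pair in combinations(big, 2):   # stage 2: the final panel
        prob[pair] += Fraction(1, comb(6, 4)) * Fraction(1, comb(4, 2))

direct = Fraction(1, comb(6, 2))        # one-stage draw: 1/15 each
print(all(p == direct for p in prob.values()))  # True
```

Every one of the 15 possible final panels comes out with probability exactly 1/15 under both schemes, which is the whole point of the marble example above.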

But still, it is terrible and the ISU should do away with it pronto. Not for statistical reasons but because it is a public relations disaster. To look again at the example, although the results are the same for both the CoP way and the old way, the real outcome of the contest is, with the CoP you've got a million Joubert fans angry at the system and 10 million casual fans saying, what kind of a farce is this?

Mathman
 