moyesii said: "Oh I thought this thread was finished..."

But it's such a good one.
moyesii said: "But in fact, despite the apparent transparency of the system, each of the links in the scoring chain are very weak, such that the system has a good chance of breaking down at any point in the chain."

As opposed to ordinals, where a judge, with a typical human brain, sees a relatively small number of "events" in a row and condenses that into a judgement, sees a few more, aggregates that into the initial judgement, sees a few more, etc., until the very end, when he or she must then compare that final judgement against what she has seen over the last 20 or so programs? It seems to me that there is a greater chance of that logical chain breaking down and many more errors seeping in.
moyesii said: "On the other hand, the system itself is crude and error-prone, so that the skaters can be majorly screwed by the single hand of one judge or random chance."

You've cited nothing to convince me that ordinals are any less error-prone, or how under ordinals a skater cannot be screwed by the single hand of one judge -- i.e., a 5/4 split. As far as random chance is concerned, because one set of judges is chosen at the beginning -- not for each scoring element -- starting with 10 and choosing 7 is no different than having 7 in the first place and using a trimmed mean, as is done in diving. What the data shows is the variance among 10 judges instead of the variance among 7, and that if a different combination of judges were chosen -- outright or randomly -- there would be a different result. I see no difference between CoP and ordinals here.
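The arithmetic claim here -- that averaging a randomly drawn panel of 7 out of 10 is comparable to other subset-based schemes, such as the trimmed mean used in diving -- can be sketched in a toy simulation. All marks and panel sizes below are illustrative, not actual ISU procedure:

```python
import random
import statistics

def random_panel_mean(scores, panel_size=7):
    """Average the marks of a randomly drawn subset of judges."""
    return statistics.mean(random.sample(scores, panel_size))

def trimmed_mean(scores, trim=1):
    """Drop the highest and lowest marks (as in diving), then average the rest."""
    s = sorted(scores)
    return statistics.mean(s[trim:len(s) - trim])

# Ten hypothetical judges' marks for a single element
marks = [5.8, 5.6, 5.9, 5.7, 5.5, 5.8, 5.6, 5.7, 6.0, 5.4]

random.seed(0)
print(random_panel_mean(marks))
print(trimmed_mean(marks))
```

Either way, the result depends on which marks survive the selection; the difference between the two schemes is only whether the discarded marks are chosen randomly or by rank.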
moyesii said: "In effect, what most fans seem to be saying is, the system sucks for the skaters, but at least we get to see what the judges are doing to them. Yay for the fans!"

If it were for the fans, the fans would vote like on American Idol.
moyesii said: "My 2nd response is, if the CoP 'provides data to determine whether judges are adhering to the code,' then what disciplinary actions against cheating judges are available to us in a secretive, pooled scoring system? We have already seen numerous times in the GP series that judges have not been taking the proper deductions."

Yes, and in contrast to ordinals, mistakes are transparent to everyone looking at them. Another way of looking at it is that the standards aren't clear, and judges interpret the rules as allowing more discretion. For example, one judge may think that -3 is mandatory for a fall, while another might believe that if the entrance and rotation were good, s/he can give the skater a -2. That can be addressed through clarification and training.
moyesii said: "Yes, and under CoP, bias will be a factor in ALL of the different element and component scores, and those biases are just as difficult to decipher or rationalize."

I disagree that the biases for elements are just as difficult to decipher or rationalize. Computers and statisticians are quite good at finding patterns, and there is the videotape to confirm the basis for the pattern.
moyesii said: "Unfortunately, it turns out that each of the 5 program components are just as obscurely marked as the single presentation mark. That's why we have concluded that ranks are less error-prone than an absolute point scale, and a single ordinal introduces less error into the results than do 5 separate marks for program components."

I agree that the program components have been, for the most part, a mere substitution for ordinals, and a bust to date. Part of the underlying cause will not go away, and that is the aggregate nature of the judgement. But some things can be addressed in several ways, such as dividing the judges into technical judges and presentation judges, who would be specialists for a maximum of two of the components, as is done in ski jumping, and quantifying transitions in a similar way that elements are quantified -- i.e., counting each one and assigning a predetermined difficulty.
moyesii said: "What does it mean when one judge gives a skater 5.50 points for Skating Skills and another judge gives the exact same skater 7.25 points? Then, what are we supposed to think when the final results are decided by less than a tenth of a point?"

What does it mean when one judge gives a skater 5.8 for presentation and another judge gives that skater 5.6, and the results are decided by one ordinal? We think the same thing that is happening under CoP: that it was extremely close. But that might not be the case, because the actual score in an ordinal is meaningless in its own context; it is a way to "leave room," even when the skaters are extremely close. Under CoP, we might think that the decision is close and statistically insignificant, and either skater could have won. We might look to see if the first judge was judging relatively higher across the board than the second judge, or if the second judge really thought that the skater was that much worse or better.
moyesii said: "Just because there is more information provided by the CoP, doesn't imply anything about the integrity of the enormous amounts of information being given. Judges in BOTH systems are going to be biased."

Humans are biased. A judge is not a critic; a critic can have as many biases as s/he wants. A judge is someone who is trained against that natural bias and is audited on whether s/he judges to the defined standard without bias. One of the main points of any judging system is to be able to detect and address that bias.
moyesii said: "The important thing is that in addition to the error from bias, the CoP also introduces random error into all the scores and calculations. The random error is far more serious, because bias cannot be completely gotten rid of in a judged sport, but random error should be minimized. A large amount of error means that the system is not producing reliable results."
A large amount of error may mean that the system is not producing reliable results, but there is no evidence that 6.0 produces equal or more reliable results. A large amount of error could point to the difficulty in judging figure skating, or that the judges need to be trained more.
moyesii said: "I'm not sure I understand. Do you mean that ordinals simply place the better skater ahead, regardless of the 'quantifiable' difference between skater A and skater B?"

I mean that ordinals place the favored skater ahead, with no explanation or auditability of the value judgement.
moyesii said: "First of all, it's not always possible to quantify instead of qualify the differences among skaters. It would only be possible if skating was a jumping contest. But every skater has their own strengths and weaknesses that cannot be added and subtracted into totals that can be quantifiably compared meaningfully. Really, skaters can only be compared relatively."

I disagree. Under 6.0 there are written standards for the judges to follow that have explicit value judgements built in. That is why the judges go to training seminars. Under 6.0 they've taken the leeway and have been allowed to decide their own balance based on preference for each skater in any competition.
moyesii said: "Well I didn't know this was your job j/k"

Not my job. But it is the ISU's, according to their bylaws.
moyesii said: "First of all, I really hope you don't think that that's how judges judge. Obviously, bloc judging totally confounds the results process."

Yes, it is how I think judges judge, however simplified. By aggregating a series of judgements, each judge can decide to factor in or ignore any element, at any level of difficulty, on any scale. It's not just a matter of bloc judging. It is a matter of giving leeway that can be justified in almost all cases: "I thought X outweighed Y, so I gave them ordinal Z."
moyesii said: "I gave these numbers based on how I recalled their skates. And basically, that is what the judges do in the CoP: At the end of the skate, they assign 5 numbers to the program components."

According to your logic, a wide range of scores should be acceptable -- it is, after all, what we have under 6.0 -- as long as the relative scoring by each judge is consistent. Under CoP it is possible to determine whether this is the case.
moyesii said: "The absolute scale is subjective and inaccurate enough that any WIDE range of scores by the judges will seem justifiable, legitimate, and uncontestable."

I think the opposite. I think that the absolute scale is far from subjective and inaccurate; I believe that in many cases the judging has been subjective and inaccurate, and if anything, some scores are not justifiable, legitimate, and incontestable, based on a comparison of the written code to the performances.
moyesii said: "I made the scores above so that one judge preferred Baiul, and one preferred Kerrigan. Basically, we see that one judge likes Baiul, because of her balletic style and choreography. The other judge preferred Kerrigan, because of her strong, clean lines and perfect execution. Now, how the competition turns out is up to anyone and no one under the CoP system, because of the variability in the judges' marks and the random count of the scores."

Again, the random factor is the same as having chosen 7 judges outright while 3 were armchair judges. The variability of the judges' marks is only relevant to the final placement if relative standards aren't consistent. I think it is a bad thing that the presentation scores are so out of line with the standards, but the data makes this very clear.
moyesii said: "Would you feel better now, knowing that one judge scored Kerrigan higher in Perf/Execution, even though another judge scored Baiul higher in the same mark? Or are we back where we started with the ordinals, except now we have 5 marks instead of one and a multitude of confounding variables, and no accountability."

The accountability takes place when judge A is asked: exactly how did skater A's performance meet the criteria of score A?
moyesii said: "I don't know what your point is. The ISU or the judging system?"

My point is that the ISU has not had the data with which to prove bias, because a single score, or single sets of scores, can be justified by the relative judgement, weighting, and standards of the judge.
moyesii said: "Let's be specific here. You mean a more transparent judging chain, a more transparent scoresheet for technical elements, but still equally untransparent marks for program components, and overall secrecy in the judges' marks."

No, I mean a more transparent scoresheet for both technical and program elements, because the data I've seen so far makes it clear to me that the standards aren't being used. Overall secrecy -- hardly. It is secret to me and thee, but it is not secret to the ISU after the event.
moyesii said: "Yes, that is most definitely a positive. Now here is one for OBO: Each judge marks the elements, including non-jump or spin elements, as THEY see them and not as some technical specialist calls them. In other words, in a 9-judge panel, there are nine judges making independent decisions, instead of a single person making calls for the entire group."

Three, actually: the specialist, the assistant specialist, and the controller, all of whom consult with each other and are mandated to use the videotape if there is an issue. All three are hired, trained, and fired by the ISU. They are hardly infallible, but they are better trained, technically, than the average judge, based on the qualifications needed to be hired for the position, and the ISU is not subject to the choices of the federations, as is the case with judges.
I disagree that the relative difficulty of elements should be judged by nine judges making independent decisions. That is like saying that a baseball umpire should have discretion on whether a runner is called safe or out, or whether a basketball player travelled, or whether a play is offside. Yes, officials will make mistakes -- and, on occasion, receive death threats because of this -- but the rules are codified, and unlike the ISU, the leagues can fire officials who aren't following the code.
moyesii said: "Do you mean Baiul at the Olympics? That was more likely bloc judging. If there's a group of collaborating judges working the CoP system, they can easily manipulate the system in more subversive ways."

And these more subversive ways would be?
moyesii said: "Also, under CoP we have seen many times already in the GP that jumping mistakes are often 'forgotten,' by exaggerating the component marks."

I think the program element scores are not being judged properly. However, I haven't seen a consistent pattern of exaggerating the component marks, partly because so few skaters' programs have been televised on US networks. (Which doesn't mean that I haven't missed them.)
moyesii said: "When did it ever do such a thing in the ordinal system? In both systems, the judges must take into account both the strengths and weaknesses of a skater's performance."

Actually, they don't. They can take into consideration any combination of things they want, and weigh any combination of things they want in a different ratio for each skater within each competition.
moyesii said: "The CoP seems to be more literal and systematic in its calculations, because of the TES mark, but people are forgetting that the TPC is highly subjective, and has been the deciding factor in numerous events in the GP."

Based on the standards, only two of the five program components should be subjective, and then not highly so -- choreography and interpretation. There are clear technical standards for skating skills and for performance and execution, and, as I mentioned earlier, I think that transitions could be made more quantifiable.
moyesii said: "In addition, the problem of the systematic adding of the total elements and program components is that the marks are riddled with random and human error. Some consider the leeway with which judges have to award marks in the ordinal system to be a bad thing, but I consider it a good thing because it allows judges to check for error that would otherwise be present in the calculated results."

It sounds like you think that this is happening already, because the judges can look at the scores that they gave in the moment and decide to use PE scores to "adjust." The only reason to "adjust" a technical score is to review the video to see if the original score was unduly harsh or too lenient.
moyesii said: "Well some people consider this a bad thing. Some people think that the values that have been assigned to elements in the CoP don't make sense."

I agree that some things could use some tweaking, and I'm basing this on what skaters and coaches have said about relative difficulty that is nowhere differentiated by CoP -- e.g., a 2T/3T is more difficult than a 3T/2T. Since computers are extremely good at making these calculations automatically, there's no reason to stint on the number of differentiations and weights.
Whether I think spins should be more highly valued is irrelevant; this is a decision that the ISU must make, and I can cheer or hiss or beat my breast and say that the sport's apocalypse is here. But at least the skaters and coaches can look at the relative weights and adjust accordingly to maximize results.
moyesii said: "I like the idea of a democratic judging panel (more reflective of the population) rather than a tyranny of what is considered valuable in skating."

And those tyrants would be the ISU technical committee? A group that has been responsible for setting the standards for technical scoring since it was formulated? Do you really think that officials of a sport should ignore the rules, regulations, and standards of the sport -- the very standards that make it eligible for the Olympics?
moyesii said: "Also, a technically and artistically rapidly evolving sport like skating NEEDS an adaptive system like ordinals, not a system rigidly coded with values for certain elements whose emphasis may or may not change with the times (even year to year or event to event!!)"

The only way that ordinals are adaptive is through the anarchic changes that are made by individual judges.
moyesii said: "We have also seen that skaters can build an insurmountable lead after the SP... But, that is not the main problem. As I discussed earlier in this thread, the ultimate goal of a judging system in skating is to assign placements, not scores."

The ultimate goal of scoring is to assign placements. It's the same goal as in diving, gymnastics, and ski jumping.
moyesii said: "And there can never be such a thing as a 'statistically insignificant difference' between any two skaters. This is a fallacy or myth generated by the CoP."

What is your justification for that assertion?
moyesii said: "The strength of the ordinal system is that it is able to AMPLIFY the marginal differences between any two closely matched competitors into meaningful differences in placements."

I don't agree that AMPLIFICATION is the most important feature of a judging system. 6.0 gives a false impression that the judgement was definitive, when it may have been anything but.
moyesii said: "The CoP is unable to do this, because any skaters that finish within about 5 points of each others' totals will not have a clear and definite justification for their placements (due to error and variability), and statistically there is no reason why those placements wouldn't change with another random draw."

Just as under 6.0 the placements could have been very different under a different set of judges.
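The point about placements changing with another random draw can be illustrated with a toy Monte Carlo simulation. The totals, noise level, and panel size below are invented for illustration; they are not CoP data:

```python
import random

def simulate_flip_rate(gap=0.5, judge_sd=2.0, n_judges=9,
                       trials=10_000, seed=42):
    """Estimate how often skater B outscores skater A when A's 'true'
    total leads by `gap` points, but each judge's total carries
    independent Gaussian noise with standard deviation `judge_sd`."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        a = sum(100.0 + rng.gauss(0, judge_sd) for _ in range(n_judges)) / n_judges
        b = sum(100.0 - gap + rng.gauss(0, judge_sd) for _ in range(n_judges)) / n_judges
        if b > a:
            flips += 1
    return flips / trials

print(simulate_flip_rate())          # small gap: placements flip in many draws
print(simulate_flip_rate(gap=10.0))  # large gap: flips essentially vanish
```

Under these invented numbers, a half-point gap reverses in roughly a third of the draws, while a ten-point gap almost never does -- which is the sense in which a small point difference carries little statistical meaning, under either system.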
moyesii said: "The only thing the scores of these closely ranked skaters tells you is that the system was unable to differentiate the skaters. The placements of these undifferentiated skaters are entirely due to chance."

That's not the only interpretation. The scores under 6.0 could tell me that under the system, the judges made arbitrary decisions in scoring skaters whose performances were very close, because the requirement was to have to choose.
moyesii said: "Not so with ordinals. The judges must make a conscious, deliberate, and thoughtful decision to put one skater ahead of another so that there is no misunderstanding that skater placed #4 was better than skater placed #5."

There is no evidence that there was anything conscious, deliberate, or thoughtful about the decisions made, just that a decision was made.
moyesii said: "And how do you prove that the judges 'aren't following the code' when it is undetectable? I will quote myself here: 'The absolute scale is subjective and inaccurate enough {due to norms in human acceptance of variability} that any WIDE range of scores by the judges will seem justifiable, legitimate, and uncontestable.'"

There are two ways to look at this: 1. Is the judge internally consistent in his/her scoring, even if the scores don't match the code? 2. Do the scores match the code?
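The first of these two checks -- internal consistency -- is straightforward to automate: compare a judge's ranking of the skaters to the panel's, ignoring the absolute level of the marks. A minimal sketch with invented marks (the function names and numbers are mine, not from any ISU tool):

```python
def rank(values):
    """Return competition-style ranks (1 = highest score), ties broken by list order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order):
        ranks[i] = position + 1
    return ranks

def internally_consistent(judge_scores, panel_means):
    """A judge whose ranking of the skaters matches the panel's is internally
    consistent, even if the absolute scores are out of line with the code."""
    return rank(judge_scores) == rank(panel_means)

# Hypothetical marks: this judge scores a full point low across the board,
# but orders the three skaters exactly as the panel does.
judge = [5.0, 4.5, 4.8]
panel = [6.0, 5.5, 5.8]
print(internally_consistent(judge, panel))  # True for these toy numbers
```

The second check -- whether the scores match the code -- cannot be automated the same way; it requires comparing the marks against the written standards and the videotape.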
moyesii said: "Yeah, who knows, maybe the ISU is just buying itself some time, and even if the CoP is passed at the ISU congress, maybe after years and years of CoP failures and continuous modifications, like maybe after a couple of years of Olympic scandals, the ISU will introduce a 'new' system of ordinals to fix the problems of the 'old' CoP system. So maybe the ISU is buying itself some time with this experiment... It's not so bad since they're only wasting tons of money and affecting the results of competitions in the meantime."

The new system is the cost of remaining in the Olympics. I'm sure that the Olympic TV revenue will offset the cost of a system that a number of skaters have said has already helped them improve their programs from competition to competition.