
Olympic judging changes (5 judge results)

PftJump

On the Ice
Joined
Nov 28, 2008
The ISU just wants a controllable system.
That's all.
Need more discussion?
They want another Sarah 'Cheated' Hughes.
All the current changes are for that.
The ISU has its ideal Olympic gold medalists and podium pictures in mind. Maybe... beneficial ones.
 

jennylovskt

Medalist
Joined
Oct 20, 2006
... it would be very likely that the ... conspirators' votes would be the extremes that would be discarded, as the culprits tried to inflate one skater's marks and low-ball another's.

:agree::agree::agree: But the ISU is so out of touch and has so much power. How can you make them listen?
 
Last edited:

feraina

Record Breaker
Joined
Mar 3, 2007
Seat nine judges, no random draw, then trim the mean by discarding the bottom two and the top two for each line.

Good idea. The random draw accomplishes nothing, but trimming the highs and lows helps. It's a compromise between the mean and the median, and when there are extreme outliers (whether from cheating or from noise), the median is more stable than the mean.

But there will be no new discussion of this, nor of the downgrade rules, until after the 2010 Olympics, right?
:frown:
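
To see the effect, here is a quick Python sketch with nine invented PCS-style marks (the 9.5 stands in for an inflated mark):

Code:
marks = [7.0, 7.0, 7.0, 7.25, 7.25, 7.25, 7.5, 7.5, 9.5]

def trimmed_mean(xs, k):
    """Drop the k lowest and k highest marks, then average the rest."""
    xs = sorted(xs)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

print(sum(marks) / len(marks))  # raw mean ~= 7.47, dragged up by the 9.5
print(trimmed_mean(marks, 2))   # trimmed mean = 7.25; the 9.5 is discarded
print(sorted(marks)[4])         # median of nine = 7.25, essentially unmoved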
 

nylynnr

On the Ice
Joined
Apr 19, 2008
Sorry to beat a dead horse, but this idea that "only five" people decide who wins isn't, I don't think, quite correct. Nine judges form the panel for the short, but of those nine, only five are seated for the long. Four "new" judges come in. Each panel has: 1. two judges randomly selected out, and 2. the high and low marks thrown out. So while only the decisions of five judges "count" in determining the standings of the short and long, in actuality some 13 judges figure into the process. That's in addition to the three-person technical panel. So actually 16 people are involved, plus of course referees, data entry operators, etc.

Five has sort of been a key number for panels for a long time. A "bloc" of five has often decided who wins, and in the days of 6.0, each judge knew full well what holding up, say, 5.6/5.9 meant, and where it would place each skater. Nowadays, it's far harder for judges to sit there and calculate precisely where the GOEs and PCS scores they enter will put skaters in the order. Next season they won't even know if the jumps they mark have already been downgraded. Of course it's still possible there will be judges scoring PCS marks too high, for biased reasons, but now they won't even know whether their marks count, and if their marks are the high or the low, they are thrown out. And anyone trying to "bribe" judges has no way of knowing if the judge they "pay off" will be seated for both panels, or if his or her marks will actually count.

Perhaps I am naive, but I really think this system, although certainly imperfect, is harder to manipulate than the old 6.0. As always with IJS, though, the trouble is it is rather complicated and most folks don't want to sit and hear (or read!) a half-hour explanation.
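
A toy Python sketch of that bookkeeping (the judge IDs and the draw are invented; this just counts heads):

Code:
import random

short_panel = list(range(1, 10))           # nine judges for the short
carryover = random.sample(short_panel, 5)  # five of them are seated for the long
long_panel = carryover + [10, 11, 12, 13]  # plus four "new" judges

def scoring_judges(panel):
    """Two of the nine are randomly drawn out; seven actually score."""
    return random.sample(panel, 7)

def counting_marks(marks):
    """Per line, the high and the low of the seven marks are also dropped."""
    return sorted(marks)[1:-1]             # five marks remain

print(len(set(short_panel) | set(long_panel)))  # 13 judges, plus the 3-person tech panel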
 
Joined
Jun 21, 2003
The random draw and anonymous judging did not come in with the CoP. They were the essential features of the "interim system" that was rushed into place after the Salt Lake City debacle and used for one season until the ISU could get the CoP in place.

It seems clear to me that the purpose of these provisions was to make it harder for the public to detect and to prove cheating on the part of the judging panels.

nylynnr said:
As always with IJS, though, the trouble is it is rather complicated and most folks don't want to sit and hear (or read!) a half-hour explanation.

Not only don't most folks want to hear a complicated explanation, but worse, when you try to explain it, after a while folks just conclude you're a con man and a crook.
 

gsrossano

Final Flight
Joined
Nov 11, 2006
The bottom line for determining the precision of the results is how many sets of marks are in the calculation -- and under the new panel size there are five sets of marks in the calculation. It doesn't matter how you get down to the five sets of marks. What drives the mathematical "quality" of the scores is the number 5.

Nowadays, it's far harder for judges to sit there and calculate precisely where the GOEs and PCS scores they enter will put skaters in the order.

Sorry, you don't give the judges enough credit. Yes, in a large group they may not know exactly where they have put the skater in 17th place; but then, who cares about 17th place? For the top skaters and the handing out of medals, they know exactly what marks they need to give, and what range of marks will not look suspicious. When I leave a panel I know exactly who I gave my top marks to and who I gave my lowest marks to. (The middle, of course, is all a blur under IJS.)

When marks are posted on the arena scoreboard, even if it is only TES and PCS, the savvy judges know how their marks compare to the rest of the panel, and what marks they need to give to help or hinder subsequent skaters. For example, if Brian Joubert skates first and gets a PCS of 75, then his average PC score was 7.5. I know from that if I marked high or low. If Evan Lysacek skates next, I know if I want to help him beat Joubert I have to go above 7.5.

Even with exactly placing the skaters under 6.0, a judge could only give a skater a nudge in a given direction, and could not guarantee a specific outcome. Smart judges know how to give the skaters a nudge under IJS also.
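
The back-of-the-envelope arithmetic in the Joubert example, for anyone who wants to check it (assuming the men's free skate, where the five component averages are summed and multiplied by a factor of 2.0):

Code:
posted_pcs = 75.0  # total PCS shown on the scoreboard
components = 5     # SS, TR, PE, CH, IN
factor = 2.0       # men's free skate PCS factor
print(posted_pcs / (components * factor))  # 7.5 -- a judge who gave 7.0s
                                           # knows they marked low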
 
Last edited:

nylynnr

On the Ice
Joined
Apr 19, 2008
Even with exactly placing the skaters under 6.0, a judge could only give a skater a nudge in a given direction, and could not guarantee a specific outcome. Smart judges know how to give the skaters a nudge under IJS also.

True, I just think it is a bit tougher. In the old 6.0, you knew absolutely who you put 1 or 2 or 3, and there was no chance your vote was going to be thrown out. Plus everyone knew your vote absolutely counted, which, it could be argued, opened up a greater possibility of outright bribery. And of course, no matter how high the GOEs are in IJS, to get the highest scores a skater has to have the ratified jumps and Level 3 or 4 elements, and those come from a different quarter than the judging panel. So while both systems are imperfect, one could argue IJS is more difficult (though not impossible) to manipulate. Better? That's another topic.
 
Joined
Jun 21, 2003
The bottom line for determining the precision of the results is how many sets of marks are in the calculation -- and under the new panel size there are five sets of marks in the calculation. It doesn't matter how you get down to the five sets of marks. What drives the mathematical "quality" of the scores is the number 5.

I think throwing out the high and low scores muddles that a little.

I think we would expect that the standard error estimated from a random sample of size five would be greater than if we took a sample of size seven and trimmed before averaging.
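
A quick Monte Carlo check of that expectation (the normal "marks" with a 0.4-point judge-to-judge spread are an assumption for illustration, not ISU data):

Code:
import random
import statistics

def trimmed_mean(xs):
    xs = sorted(xs)
    return statistics.mean(xs[1:-1])  # drop one high mark and one low mark

mean5, trim7 = [], []
for _ in range(100_000):
    panel = [random.gauss(7.25, 0.4) for _ in range(9)]
    mean5.append(statistics.mean(random.sample(panel, 5)))
    trim7.append(trimmed_mean(random.sample(panel, 7)))

print(statistics.stdev(mean5))  # ~0.18: untrimmed mean of a sample of 5
print(statistics.stdev(trim7))  # ~0.16: trimmed mean of a sample of 7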
 

gsrossano

Final Flight
Joined
Nov 11, 2006
I think throwing out the high and low scores muddles that a little.

Only a little.

(And I never said the standard deviation wouldn't change -- only that the precision was determined by the number of marks in the calculation. For any calculation method the precision for that calculation method will improve with an increase in the number of marks, for nearly all noise sources. There are exceptions to this rule, but they are few and do not apply to marks in skating as far as I can tell.)
 
Last edited:
Joined
Jun 21, 2003
Actually, I was thinking of the extreme case. Suppose you sat 9 judges, "counted" all nine, then took the median. This reduces the magic number from 5 all the way down to 1. But this would be an OK way to go, as far as I can tell.

And it would be really hard for a single judge, or a small minority of the panel, to boost their favorite.
 

gsrossano

Final Flight
Joined
Nov 11, 2006
Actually, I was thinking of the extreme case. Suppose you sat 9 judges, "counted" all nine, then took the median. This reduces the magic number from 5 all the way down to 1.

I don't agree. All nine marks are being used to determine the median in your example. If eight of the nine marks were randomly dropped, the one remaining mark would often be significantly different from the median of the nine. If all nine marks are different, the one chosen mark would differ from the true median 8 times out of 9. And in all cases, the median (for PCS) could differ from the mean of the distribution by up to 0.25 points, while the standard deviation of the mean for marks is typically 0.13 points.
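
The 8-of-9 figure is easy to verify by simulation (nine invented, all-distinct marks):

Code:
import random

marks = [6.25, 6.5, 6.75, 7.0, 7.25, 7.5, 7.75, 8.0, 8.25]
true_median = sorted(marks)[4]  # the middle of nine marks
trials = 100_000
misses = sum(random.choice(marks) != true_median for _ in range(trials))
print(misses / trials)  # ~0.889, i.e. 8 times out of 9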

The benefit from using a median also depends on whether you are trying to filter out random noise, systematic errors, or outliers; and also on the number of samples in the distribution.

To expand on an earlier comment: designing the system around an obsession with the occasional deal-making judge is counterproductive, because the impact of day-to-day national bias, incompetence, and random differences of opinion gets ignored -- and those problems are more common (than deal making) and generally more important.

The system has to be designed to deal with all potential sources of error, not just the one that is most popular to discuss.

But in general, more judges are always better than fewer. I can't think of any error source where having more judges is a liability. Worst case, at some point adding more judges stops improving the reliability of the results. But we are so far from having enough judges to reliably decide scores to 0.01 points that having too many judges will never be an issue.
 
Last edited:

fairly4

Medalist
Joined
Oct 28, 2007
It still boils down to the 5-4 split. Want to bet the 5 judges who stay on are the ones that count?


No, I don't believe they will ever judge totally fairly. They will always judge by benefit of the doubt, money, and politics.
 
Joined
Jun 21, 2003
I still think it is a little bit misleading to focus our mathematical wrath on the number five. It seems to me that the statistical culprit is the random draw that reduces the panel from 9 to 7. The trimming of the mean from 7 to 5 seems statistically benign to me.

In other words, I would expect the trimmed mean of random samples of size 7 to behave more like the untrimmed mean of samples of size 7 than like the untrimmed mean of samples of size 5.

I don't agree. All nine marks are being used to determine the median in your example.

By the same token, in the trimmed mean all seven marks play a role.

And in all cases, the median (for PCS) could differ from the mean of the distribution by up to 0.25 points, while the standard deviation of the mean for marks is typically 0.13 points.

I think this is the crux of the matter. When the median and the mean are different, which one is "right?" Are we judging quality or are we measuring quantity?

The median is our best shot at addressing the question, "what is the judgment of the most typical judge."

In contrast, the branch of statistics that looks at means, standard deviations, etc., rests on a number of assumptions that I do not think are satisfied in the case of figure skating judging. The most important is the assumption that there is an objective quantifiable thing, external to and independent of our methods of measurement, that is "out there" waiting to be measured.

In the case of judging GOEs and PCSs, if we take the marks of each judge, add them up, and divide by n, then we have... what? At best we would have an estimate of what we would get if we added up the scores of all the judges in the ISU judges' pool and divided by the number of such judges. Again, is it really the mean of all these numbers that we should be interested in estimating?
 
Last edited:
Joined
Jun 21, 2003
In other words, I would expect the trimmed mean of random samples of size 7 to behave more like the untrimmed mean of samples of size 7 than like the untrimmed mean of samples of size 5.

Pardon for quoting myself :laugh:, but after I wrote that sentence I got to wondering if it was really true or not. I had to look it up ;) (I am far from being an expert on this subject.)

Here is the formula (under a few mild conditions, but not assuming normality) for the standard error of the trimmed mean:

S.E. = s_w / [(1 - 2g) * sqrt(n)]

where s_w is the Winsorized standard deviation, g is the proportion of the data trimmed off each end, and n is the full sample size before trimming (distributed by the t distribution with n(1-2g) - 1 d.f.) :)
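
In Python, that formula is a short function (the marks below are invented; the Winsorizing step clamps the tails rather than deleting them):

Code:
import math
import statistics

def trimmed_mean_se(xs, g):
    """S.E. = s_w / [(1 - 2g) * sqrt(n)] for the g-trimmed mean."""
    xs = sorted(xs)
    n = len(xs)
    k = int(g * n)  # number of marks trimmed off each end
    winsorized = [min(max(x, xs[k]), xs[n - 1 - k]) for x in xs]
    s_w = statistics.stdev(winsorized)  # the Winsorized standard deviation
    return s_w / ((1 - 2 * g) * math.sqrt(n))

marks = [6.75, 7.0, 7.0, 7.25, 7.25, 7.5, 7.5, 7.75, 9.5]
print(trimmed_mean_se(marks, 2 / 9))  # g = 2/9: trim two marks off each end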
 

gsrossano

Final Flight
Joined
Nov 11, 2006
In contrast, the branch of statistics that looks at means, standard deviations, etc., rests on a number of assumptions that I do not think are satisfied in the case of figure skating judging. The most important is the assumption that there is an objective quantifiable thing, external to and independent of our methods of measurement, that is "out there" waiting to be measured.

Could not disagree more. Further, in staking out that position you are saying that there is no point in discussing the mathematics of the scores since a skating program cannot be measured; and there is no way to combine the evaluations of the judges since that would involve means or medians or some such things, the laws of which do not apply to skating marks in your view.

If skating performances cannot be measured (and you are not the only one I have heard take that point of view) then there is no point in holding a competition, and all we should have are shows and festivals.

So rather than trying to fix/change/improve the judging system, let's save everyone a lot of grief and just end competitions and limit skating to being a source of entertainment.
 
Joined
Jun 21, 2003
gsrossano said:
Further, in staking out that position you are saying that there is no point in discussing the mathematics of the scores since a skating program cannot be measured;...

Heavens no! :) If I didn't get off on the mathematics of figure skating judging I wouldn't have spent all day yesterday reading up on bootstrap methods for measuring the robustness and validity of the trimmed mean. :)
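
A minimal sketch of that kind of bootstrap check, in Python (a percentile bootstrap of the trimmed mean; the marks are invented):

Code:
import random
import statistics

def trimmed_mean(xs, k=2):
    xs = sorted(xs)
    return statistics.mean(xs[k:len(xs) - k])

marks = [6.75, 7.0, 7.0, 7.25, 7.25, 7.5, 7.5, 7.75, 9.5]
boots = sorted(
    trimmed_mean([random.choice(marks) for _ in marks])  # resample with replacement
    for _ in range(10_000)
)
print(boots[249], boots[9749])  # rough 95% percentile interval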

What I do think is this.

Ordinals are quite mathematical enough and express more honestly what is really going on in the "second mark" than do add-up-the-points methods. Under ordinal judging we can still do plenty of mathematical analyses. I do not agree that because something cannot be measured therefore it cannot be judged, or that the judging cannot be subjected to mathematical scrutiny.

What exactly are we adding up -- and taking the mean and standard deviation of -- when we come to a judgment that this skater displayed better musical interpretation than that skater?

That having been said, I still think the most piquant comment ever made about the IJS is your, "I support the CoP a full 51 per cent." :rock:

For instance, I think the CoP is a great improvement over ordinal judging for skating contests at the developmental level (gkelly taught me this on this board). Now (I hope) we have a thousand children rushing to the protocols after their competitions to see what they need to work on next, rather than a thousand kids weeping, "the judges hate me for no reason." :cool:
 
Last edited:

gsrossano

Final Flight
Joined
Nov 11, 2006
Ordinals are quite mathematical enough and express more honestly what is really going on in the "second mark" than do add-up-the-points methods. Under ordinal judging we can still do plenty of mathematical analyses. I do not agree that because something cannot be measured therefore it cannot be judged, or that the judging cannot be subjected to mathematical scrutiny.

Well, then I guess we completely disagree on this too. I find this comment and your previous comment to be 100% at odds with each other. Even ordinals are a measurement -- a relative measurement, but still a measurement -- and have a mean, a median, a standard deviation, and an associated uncertainty in the final results.
 

nylynnr

On the Ice
Joined
Apr 19, 2008
Obviously GRossano knows this, and certainly many others do, but it seems from other posts that some still think "five judges" make the decision. Maybe this is just semantics, but there are seven scoring judges, and the high and low mark for each element and each PC are dropped. So for each line item, the five judges whose marks figure into the result will be different, unless there is an extreme judge who is high or low on every single line. Seven judges' marks count, and IMO this emphasis on "five judges make the decision" is misleading. "Seven, not nine, judges make the decision" would IMO be more accurate.

As GRossano points out, from a mathematical perspective, none of this changes anything as it is the number of votes cast, rather than the person casting the vote, that factors into the mathematical analysis.
 
Joined
Jun 21, 2003
I think what is bugging me is this.

I think I am being a little bit lazy when I say, I know only one thing about statistics. I know that

S.E. = s/sqrt(n).

Therefore, every time I see some numbers I will compute the mean and the standard deviation and then I will apply this formula.

But here is where I accuse myself of laziness or a faulty memory. Although I remembered the conclusion of the Central Limit Theorem, I forgot the hypotheses. In so far as we want to apply this formula to data collected from the real world, the hypotheses are three.

(a) There is a correct mark, independent of our efforts to measure it.

Applied to figure skating scores, this means, for example, that the real and true value of the choreography of a skating program is 7.25, and if a judge gives a mark of 7.50 or 7.00, that judge has simply measured it wrong.

(b) Our measuring technique is such that the collection of all possible measurements comprises a normal distribution whose mean is the correct mark.

I do not know whether the normal distribution part is true or not for figure skating judging. The reason for using a trimmed mean (or median) is that it works better in the case where the distribution is not normal. If it is normal but with the wrong mean, that indicates a systematic error in our measurement technique.

If we are just doing a mathematical exercise and don’t really care about the true mark (or if we concede that it does not exist), then we can substitute the mean of the population of measurements for the true mark throughout.

(c) The particular sample that we have before us was chosen randomly. That is, every measurement in the population has an equal chance of being chosen for the sample.

In figure skating, this means a lot more than just, put the names of all the judges in a hat and draw some out. There are many ways in which this condition may be violated in the case of figure skating judging.

What I fear is that the IJS has rushed past the hypotheses and arrived at the conclusion with unjustified bravado and confidence.
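
To make hypothesis (b) concrete: if the "marks" come from a heavy-tailed rather than a normal distribution (the mixture below is invented for illustration), a simulation shows the trimmed mean holding up better than the plain mean:

Code:
import random
import statistics

def heavy_tailed_mark():
    """Mostly tight marks, with an occasional wild one (invented mixture)."""
    if random.random() < 0.1:
        return random.gauss(7.25, 2.0)
    return random.gauss(7.25, 0.3)

means, trims = [], []
for _ in range(50_000):
    panel = sorted(heavy_tailed_mark() for _ in range(7))
    means.append(statistics.mean(panel))
    trims.append(statistics.mean(panel[1:-1]))  # trim one high, one low

print(statistics.stdev(means))  # the plain mean suffers from the wild marks
print(statistics.stdev(trims))  # the trimmed mean is noticeably steadier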

gsrossano said:
Even ordinals are a measurement -- a relative measurement, but still a measurement -- and have a mean, a median, a standard deviation, and an associated uncertainty in the final results.

I think we are using the word "measurement" differently. To me, "measuring" something means assigning a real number to it along a continuum. I do not regard saying, skater A is better than skater B, as "measuring."

I think of it as the primal "herdsmen versus farmers" war. Cowboys count, farmers measure.
 
Last edited: