So will the 2018 Olympics be the Revenge of the Russians or the Return of (South) Korea?
I call it "Rise of Russian Empire" though.
How interesting. To me it is clear that skater A is the rightful winner.
Yes, I was assuming that the judges' scores lined up in the two rows. In other words, that the judges who were most enthusiastic about skater B were the same ones who didn't like skater A at all.
It is quite easy to raise the same issue with numbers that do match actual results.
Skater A: 8.00 8.00 8.00 8.00 8.00 8.25 8.25 8.50 8.00
Skater B: 7.75 7.75 7.75 7.75 7.75 7.75 7.75 8.00 7.50
Same question. Skater A was preferred by a majority of judges. Skater B got the most total points. In the best of all possible judging systems, who deserves to win?
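The general phenomenon being argued about here is easy to see with a quick calculation. The marks below are invented for illustration (they are not the scores quoted above, and they assume the judge columns line up): a bare majority mildly prefers A, a minority strongly prefers B, and the minority carries the point total.

```python
# Hypothetical, illustrative marks for a nine-judge panel (columns = judges).
# Five judges mildly prefer A; four judges strongly prefer B.
a_marks = [8.00, 8.00, 8.00, 8.00, 8.00, 7.00, 7.00, 7.00, 7.00]
b_marks = [7.75, 7.75, 7.75, 7.75, 7.75, 8.75, 8.75, 8.75, 8.75]

# How many judges scored A above B?
majority_for_a = sum(a > b for a, b in zip(a_marks, b_marks))
print(majority_for_a, "of", len(a_marks), "judges scored A higher")  # 5 of 9

# But the point totals go the other way.
print("total A:", sum(a_marks))  # 68.0
print("total B:", sum(b_marks))  # 73.75
```

So under a point-total system, an enthusiastic minority can outweigh a complacent majority, which is exactly the question posed above.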
Mathman, that (one mark from each judge for each skater) is not how any system used by the ISU ever worked, so there's not much point in arguing what a bad system it is.
However, I wouldn't assume that a minority of judges who disagree slightly or significantly with the majority are necessarily cheating, independently or in conspiracy. Other possibilities are honest difference of opinion between experts, general incompetence on the part of some less expert judges, and honest mistakes (e.g., data input error) regardless of competence.
But in most cases, the technical content the skater actually completes, together with the variation in how all the judges on the panel use numbers, will tend to swamp the variance introduced by just two judges colluding.
Do we want to look at how the numbers work with plausible examples, for honest differences of opinions first and then consider cheating as a special case afterward?
Mathman, again, the numbers you're giving do not represent judges' choices of who was better overall and deserved to win.
It is not true that Skater A was the choice to win of 6 out of 9 judges. They represent 6 judges' evaluation of Skater A's performance on one program component.
If you prefer, we can stipulate that they are actually averages of all five components from each judge for each skater.
Let's also imagine that all judges gave approximately equal average GOEs to both skaters, and that as far as they can tell without knowing what levels were called and without having memorized the scale of values, the technical content was approximately equal, so the judges on both sides believe that the program component scores will probably determine the outcome.
The judges should know that their estimates of TES are likely to be off by a point or two, maybe more, precisely because they are not human calculators. If the 6 judges honestly believe that skater A was enough better than B on PCS that A should deserve to win even if B had higher levels on all spins and steps, and better GOEs on the elements where the GOEs have higher values, then they should reflect that by giving A significantly higher PCS.
If they don't think skater A was that much better on program components, then by giving A slightly higher component marks, they are not "choosing A" as the winner. They are simply reflecting that they thought A was slightly better on the program components.
If you like, we can say that they are choosing skater A as the winner of the program components, by a fairly slim margin.
If we're still discussing only honest judges, then it just so happens in this case that the minority of judges who prefer skater B on program components also happen to use wider numerical ranges. That may be because they feel more strongly about B's superiority. Or they could just be bolder in their use of numbers.
Neither camp is wrong about who is better in my honest judge scenario -- they just have different opinions.
Is it OK for the minority opinion to prevail in who "wins" the PCS? IMO, that depends on how good the reasons are for giving wider gaps in PCS.
Which we can't tell just by looking at the numbers.
All five program components tend to correlate with the first one that the judges write down, usually Skating Skills. If you take a judge's SS score and multiply it by five, in practice that's a pretty good estimate of the total program component score given by that judge.
If they were giving ordinals, yes. But if they were IJS judges they should do this: for each skater and for each component, a conscientious judge is expected to give the score that the skater deserves for that component under the IJS rules, regardless of whom the judge thinks deserves to win. This is the crucial difference between ordinal and point-total judging.
But you are right. No matter what scores are given we do not know flat out for sure who a particular judge thought ought to win.
To me, the point of this exercise is to postulate that 6 judges in fact did think that skater A performed better, and then to investigate how this might play out in the marks.
We can eliminate this assumption if you like, but then I don't understand what the question is.
Just chiming in to say I find this whole discussion fascinating. I'm intrigued by the possibility of certain judges using a wider range of marks. This could produce results skewed by the minority, as Mathman notes, even without dishonest collusion. The question is: is it "right" for Skater B to win because the judges who preferred him/her really preferred him/her?
Generally, I'm with Mathman: the judges that favoured Skater A didn't favour Skater A "less" or something; more likely, they were simply more conservative with their marks.
Perhaps there could be guidelines on how many skaters you need to put in the 5s range, 7s range, 9s range, etc. for each competition,
but even that runs into problems: what if someone suddenly performs very well, but you've "run out" of the 9s you can give in P/E?
But, given two skaters who are close enough in ability to engender honest disagreement, it would be very rare that some judges would have skater A somewhat higher on all five components, and the other judges would have skater B higher on all components, and not one of them would have at least one component a little higher for the other skater. Which is why it makes more sense to look at total PCS rather than just one component.
And, indeed, if they are approaching IJS scoring the way they're supposed to, they wouldn't be thinking in terms of who deserves to win at all. They'd just be scoring each component against their mental standard for that component.
People complain when judges bunch their PCS too tightly for each skater, if they appear to be trying to stay within the corridor more than to reflect real differences in the skating.
Judges are encouraged to "spread their marks."
So in theory, spreading marks as appropriate to reflect differences between components in the same skater and to reflect real differences between skaters is a better use of numbers. Dare we say that judges who use a wider spread of marks appropriately are better judges, or at least better at using the numbers the way they're meant to be used?
Spreading marks also gives a judge more control over the results than judges who choose narrower ranges have. Is it OK if the judges who have the strongest effect on the result are the judges with the strongest opinions?
it would be very rare that some judges would have skater A somewhat higher on all five components, and the other judges would have skater B higher on all components
Thanks to randomization of judges' scores, we do not know whether this is rare or common. My intuition is that it is not rare at all.
But anyway, now I am sorry that in illustrating the question I presented sample scores for only one component. This sent the discussion off on a tangent.
I guess that is what the whole controversy comes down to. What is the purpose of a sports competition? Is it to see which competitor outperformed the other, or is it to decide which competitor did a better job of conforming to an objective standard?
Staying in the corridor has nothing to do with bunching the PCS tightly for each skater. It has to do with being not too far off from the other judges for each component. If the other judges spread out their marks, you had better do so, too, or you risk being outside the corridor on some of them.
For each of the five (5) Program Components, the Judge's corridor will be based on 1.50 Deviation Points (15.0% of the maximum 10.0 points per Component) between the score of a Judge and the calculated Judges' average score for the same Component, i.e. in total 7.50 Deviation Points for the 5 Program Components. Plus and minus Deviation Points are subtracted.
Is this true? Do you mean that the ISU officially encourages judges to do this?
The scoring scale has to accommodate all skaters from beginners to world champions. There cannot be too much of a spread between the best skater in the world and the second best.
I don't think so. If the contest is close, the scores should be close together. If one skater is much better than the other then the scores should be farther apart.
Here are all the protocols for US Nationals. Scores are not anonymous or randomized -- judge #1 on the officials list is always judge #1 for all skaters, etc.
Last I heard that was also true for the JGP, if you think international events are a better example.
For any two skaters (not necessarily near the top or even adjacent in the standings) in any event can we find examples in which
1) a majority of judges thought that skater A was better than (or equal to) B on all 5 components
and
2) all the remaining judges thought that skater B was better than (or equal to) A on all 5 components.
I.e., not even one judge had one component reversed from their overall opinions of the two skaters' relative PCS quality. I'll allow ties on some of the components.
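The search criterion above is mechanical enough to sketch in code. This is a hypothetical helper (the function name and the input format, per-judge tuples of five component marks, are my own assumptions, not anything from a real protocol parser): it reports a "clean split" only if every judge is consistently on one skater's side across all five components, with a majority for A.

```python
def clean_split(a_scores, b_scores):
    """Check whether a panel splits cleanly on two skaters' PCS marks.

    a_scores, b_scores: parallel lists, one 5-tuple of component marks
    per judge. Returns True only if every judge has A >= B on all five
    components or B >= A on all five components (ties allowed), with a
    majority of judges on A's side.
    """
    pro_a = pro_b = 0
    for a, b in zip(a_scores, b_scores):
        if all(x >= y for x, y in zip(a, b)):
            pro_a += 1
        elif all(y >= x for x, y in zip(a, b)):
            pro_b += 1
        else:
            # This judge had at least one component reversed relative to
            # their overall preference, so the split is not clean.
            return False
    return pro_a > pro_b


# Hypothetical 9-judge panel: five judges have A at or above B on every
# component, four have B at or above A on every component.
panel_a = [(8.0,) * 5] * 5 + [(7.0,) * 5] * 4
panel_b = [(7.5,) * 5] * 9
print(clean_split(panel_a, panel_b))  # True
```

Running something like this over real protocols (where judge numbers are not randomized, as with the US Nationals protocols mentioned below) is how one would count how rare these clean splits actually are.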
ETA: In 24 head-to-head matchups among senior medalists in short and free programs for all disciplines, I found one example:
In the ladies' SP, 8 judges marked Gold higher than or equal to Edmunds in all components. Judge #1 marked Edmunds higher in all.
However, the task of the judges under IJS is not to rank the skaters, vote for which skater they thought performed best, or choose who they think should finish higher. Unlike under 6.0, they're just supposed to score each skater independently….
With IJS it's possible to score skaters who have no one to compete against. This won't happen in international competition, but it does happen at some club competitions or even at some national championships of smaller federations: one skater (usually male) or team enters an event, no one else enters, or one or two others enter and then withdraw.
See page 6 of ISU communication 163
But ultimately, components are just numbers. They are not--in fact, cannot be--objective standards. What happens in the end is ranking skaters, because that determines the medals/placements everyone cares about. I don't think most judges are capable of keeping an objective scale in their heads. They'll have to, at points, go, "Oh, I gave Edmunds 7.50, that means I need to give Gold 8.25 because she's better." If they don't do that, they'll run into fatigue from looking at so many competitors, and likely end up giving scores they don't truly believe in (I think this might be a factor in why people who don't make the final group are low-balled. They're superior to the group they're in, but judges aren't comfortable giving out sudden 9s when the best they've given so far is a 7.50. They don't "need" 9s to place the skater ahead. But by the end of the night, judges are comfortable giving out 9s, thus potentially "screwing over" the earlier skater).
I will work on this, too. I am most interested in examples where a majority of judges favored one skater pretty much down the line, but only by a small amount, while other judges liked the other skater consistently and by quite a bit. Maybe this almost never happens in the absence of collusion and bias.
I do not accuse the IJS judges of not doing their assigned task.
The question is, should the system be changed in view of the fact that it allows an enthusiastic minority to override a complacent majority?
The IJS is good in this setting. However, I believe that under 6.0 judging also it was possible for a lone skater to skate against a "gold standard, silver standard, or bronze standard" in the case of a boy who is the only skater entered. (Or he could skate against the girls.) The judges would decide whether he met the standard or not. (Sort of IJS in 6.0 clothing.)
I had in mind this kind of example. Judges 1, 2, 3, and 4 score program components SS and TR.
SS: 8.00 8.00 8.00 8.00
TR: 2.50 2.50 2.50 8.00
Judges 1, 2, and 3 have spread out their marks, but it is judge 4 who is “outside the corridor.”
Judges 1, 2, and 3 each have -1.375 deviation points over the two components (the TR panel average is 3.875), well within the corridor. Judge 4 has +4.125 deviation points over the two components, more than the 3.00 allowed, so this judge is in trouble. The question is not whether the scores in each column are spread out or close together; it is whether the score in each row is close to the average for that row or not.
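A quick sketch of that arithmetic, assuming (as the quoted rule suggests) that each judge's signed deviations from the simple panel average are summed across components and compared to 1.50 allowed deviation points per component:

```python
# Two-component example from the discussion: four judges, SS and TR.
marks = {
    "SS": [8.00, 8.00, 8.00, 8.00],
    "TR": [2.50, 2.50, 2.50, 8.00],
}
ALLOWED_PER_COMPONENT = 1.50  # from the quoted ISU corridor rule

n_judges = 4
totals = [0.0] * n_judges
for comp, scores in marks.items():
    avg = sum(scores) / len(scores)  # TR average works out to 3.875
    for j, s in enumerate(scores):
        # Signed deviations accumulate, so plus and minus offset each other.
        totals[j] += s - avg

allowed = ALLOWED_PER_COMPONENT * len(marks)  # 3.00 over two components
for j, dev in enumerate(totals, start=1):
    status = "outside the corridor" if abs(dev) > allowed else "OK"
    print(f"judge {j}: {dev:+.3f} deviation points -> {status}")
# Judges 1-3 come out at -1.375 (OK); judge 4 at +4.125 (outside).
```

Note this assumes deviations are measured against the plain panel average for each component; the exact ISU aggregation may differ in detail, but the shape of the calculation is the same.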
By the way, the fact that one has to go to such unrealistic extremes to create an example shows that it is almost impossible for any judge, however incompetent or biased, to get caught by the ISU judges’ oversight procedure.