Should the IJS use median scores instead of the trimmed mean? | Golden Skate


Joined
Jun 21, 2003
This question came up, in disguised form, on another thread. When scores are averaged, extreme values can play a disproportionate role in determining the outcome. Two or three resolute conspirators can thwart the will of the majority of the judging panel simply by highballing their favorite and lowballing his/her rival. This cannot happen in ordinal judging, where a majority of first place ordinals is guaranteed always to carry the day.

Here is an example, one that is not far-fetched in the least. Suppose the program component scores for the nine judges came out like this:

Skater A: 9.50 9.00 9.00 9.00 9.00 9.00 8.50 8.50 8.25
Skater B: 8.50 8.75 8.75 8.75 8.75 8.75 9.25 9.25 9.25

Throw out highest and lowest and we have

9.00 9.00 9.00 9.00 9.00 8.50 8.50
8.75 8.75 8.75 8.75 8.75 9.25 9.25

Nothing out of the ordinary, and if the scores were all mixed up by randomization there would be nothing to comment on.

And yet … 5 judges out of 7, and 6 judges out of 9, thought that skater A performed the best. But skater B wins by a score of 62.25 to 62.00.

This situation, where a determined minority cabal can dominate the majority, could not happen if we used the median (middle score) instead of the mean. The median is simply the maximally trimmed mean -- we throw out the highest four and the lowest four instead of the highest one or two and the lowest one or two. In this example the median scores are

Skater A: 9.00
Skater B: 8.75

which certainly captures the opinions of the majority in this example.
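For anyone who wants to check the arithmetic, here is a minimal Python sketch of the example above. It assumes the two score lines are listed in the same judge order (so each column is one judge), and the function names are made up for illustration.

```python
from statistics import median

# Scores from the example above; each column is assumed to be one judge.
skater_a = [9.50, 9.00, 9.00, 9.00, 9.00, 9.00, 8.50, 8.50, 8.25]
skater_b = [8.50, 8.75, 8.75, 8.75, 8.75, 8.75, 9.25, 9.25, 9.25]

def trimmed_sum(scores):
    """Drop one highest and one lowest score, then add up the rest."""
    ordered = sorted(scores)
    return sum(ordered[1:-1])

def judges_preferring_first(first, second):
    """Count the judges who scored the first skater above the second."""
    return sum(1 for x, y in zip(first, second) if x > y)

print(trimmed_sum(skater_a), trimmed_sum(skater_b))  # 62.0 62.25
print(median(skater_a), median(skater_b))            # 9.0 8.75
print(judges_preferring_first(skater_a, skater_b))   # 6
```

So the trimmed totals put B ahead by a quarter point, while the medians and the judge-by-judge count both put A ahead.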

What do you think? Would this be a better system?
 

FS.Addict

Rinkside
Joined
Jan 18, 2014
This question came up, in disguised form, on another thread. When scores are averaged, extreme values can play a disproportionate role in determining the outcome. Two or three resolute conspirators can thwart the will of the majority of the judging panel simply by highballing their favorite and lowballing his/her rival. This cannot happen in ordinal judging, where a majority of first place ordinals is guaranteed always to carry the day.

I TOTALLY agree with this. I simply don't understand why the ISU never thought about this.
 

YesWay

Taking four whole years
Record Breaker
Joined
Sep 28, 2013
What do you think? Would this be a better system?
On the face of it, yes.

The only question is, does it really work better in all cases?
One example is not enough. Have you tested across a range of scenarios?
Or used real scores, from a bunch of real competitions?

I TOTALLY agree with this. I simply don't understand why the ISU never thought about this.
They may well have thought about it. For a second. But then discarded it in favour of a method... that best suited their purposes... :p
 

Meoima

Match Penalty
Joined
Feb 13, 2014
This method sounds interesting. Why don't we try it with the Sochi events? I am curious to know whether the outcome would be different.
 
Joined
Jun 21, 2003
On the face of it, yes.

The only question is, does it really work better in all cases?

It works in all cases where there are only two competitors. If there are three or more competitors none of whom receive a majority of first place ordinals, then some oddities can occur. In every case, though, it diminishes the effectiveness of small knots of would-be cheaters.
 

Vanshilar

On the Ice
Joined
Feb 24, 2014
This question came up, in disguised form, on another thread. When scores are averaged, extreme values can play a disproportionate role in determining the outcome. Two or three resolute conspirators can thwart the will of the majority of the judging panel simply by highballing their favorite and lowballing his/her rival. This cannot happen in ordinal judging, where a majority of first place ordinals is guaranteed always to carry the day.

I'm not familiar with the ordinal judging system, but if a skater were 1st according to 5 judges but say 9th according to the other 4 judges, wouldn't a skater that was 2nd from every judge win?
 
Joined
Jun 21, 2003
I'm not familiar with the ordinal judging system, but if a skater were 1st according to 5 judges but say 9th according to the other 4 judges, wouldn't a skater that was 2nd from every judge win?

No. The skater with 5 first place ordinals wins.

There were two ordinal systems, "majority of ordinals" and "OBO (one by one)." Under majority of ordinals, in the first round if a skater has a majority of first place ordinals (5 out of 9), that's it. That skater is removed and the same rule is applied for second place, etc.

If no one has a majority of first place ordinals, then the winner is the skater who has a majority of first and second place ordinals combined.

This method suffers from the defect of allowing "flip-flops" -- violations of the principle of "independence of irrelevant alternatives." That is, A could be ahead of B for first place, then C comes along, gets third, but flips the first and second places. The one-by-one system was supposed to prevent this, but it doesn't (nor can it, by the Arrow Impossibility Theorem). This system is harder to explain in a few words, but it is quite straightforward and proceeds by comparing each pair of skaters one on one to determine the winner.

Even with OBO, though, if you have 5 first place ordinals, you automatically win. This is because if you have 5 first place ordinals then you automatically beat every other competitor head-to-head by at least a score of 5 to 4. (Sarah Hughes's LP in the 2002 Olympics is a good example.)
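A rough Python sketch of that scenario, for illustration only. It covers just the first-round majority check and a single head-to-head comparison, not the full tie-break rules of either ordinal system, and the function names are hypothetical.

```python
def has_first_place_majority(ordinals):
    """True if more than half of the judges placed this skater first."""
    firsts = sum(1 for place in ordinals if place == 1)
    return firsts > len(ordinals) / 2

def head_to_head(ordinals_x, ordinals_y):
    """Return (judges placing x ahead of y, judges placing y ahead of x)."""
    x_ahead = sum(1 for x, y in zip(ordinals_x, ordinals_y) if x < y)
    y_ahead = sum(1 for x, y in zip(ordinals_x, ordinals_y) if y < x)
    return x_ahead, y_ahead

# The scenario from the question: 1st from five judges and 9th from four,
# against a skater placed 2nd by every judge.
skater_x = [1, 1, 1, 1, 1, 9, 9, 9, 9]
skater_y = [2] * 9

print(has_first_place_majority(skater_x))  # True
print(head_to_head(skater_x, skater_y))    # (5, 4)
```

The four 9th places do not matter here: only who is ahead on each judge's card counts, not by how much.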
 
Joined
Jun 21, 2003
This method sounds interesting. Why don't we try it with the Sochi events? I am curious to know whether the outcome would be different.

I think that 95% of the time the final result will be the same no matter what system is used. That is, in most contests there is a clear consensus winner. Most of the time, with honest judging, the median and the mean will be close enough not to affect the outcome.

However, if the question is, can we look back at the protocols of past events and see whether a minority was able to dominate the majority and whether this could have been remedied, the answer is, "no." We cannot do that because of anonymous randomized judging.
 

Sandpiper

Record Breaker
Joined
Apr 16, 2014
Like YesWay, I feel like one example isn't enough. I think it could have its own problems.

For instance, let's take these two skaters:

A: 6.00, 6.50, 7.25, 7.75, 8.25, 8.25, 8.50, 8.75, 8.75
B: 7.75, 8.00, 8.00, 8.25, 8.25, 9.00, 9.00, 9.25, 9.50

From all appearances, the judges generally think skater B did better. But if we just take the median, the two skaters both come out with 8.25. If we take the average instead, dropping the highest and lowest scores, skater A comes out with 7.89 and skater B with 8.54, which seems to be the better reflection of what the judges think.

Now, is screwing over skater B in this instance a lesser evil than the situation raised in Mathman's post? I don't know. But the median has its own problems, which I think are worth pointing out. And if we implement the median, how would that affect the psychology of the judges, knowing that only one person's score will count at the end of the day? (Thankfully, there's more than one component. ;) )
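A quick Python check of these two panels, as a sketch. The trimmed mean here drops exactly one high and one low score, and the helper name is made up for illustration.

```python
from statistics import mean, median

scores_a = [6.00, 6.50, 7.25, 7.75, 8.25, 8.25, 8.50, 8.75, 8.75]
scores_b = [7.75, 8.00, 8.00, 8.25, 8.25, 9.00, 9.00, 9.25, 9.50]

def trimmed_mean(scores):
    """Average after dropping one highest and one lowest score."""
    ordered = sorted(scores)
    return mean(ordered[1:-1])

for name, scores in (("A", scores_a), ("B", scores_b)):
    print(name, median(scores), round(trimmed_mean(scores), 2))
# A 8.25 7.89
# B 8.25 8.54
```

B's trimmed mean is 59.75 / 7 = 8.5357..., which is 8.54 when rounded to two places (8.53 if truncated). Either way the medians tie while the trimmed means are well apart.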
 
Joined
Jun 21, 2003
A: 6.00, 6.50, 7.25, 7.75, 8.25, 8.25, 8.50, 8.75, 8.75
B: 7.75, 8.00, 8.00, 8.25, 8.25, 9.00, 9.00, 9.25, 9.50

:eek: OK, forget the median. No method of this type is any good. Only ordinals tell the story: 8 out of 9 judges favored B; B should win. Bring back 6.0. :)
 

Vanshilar

On the Ice
Joined
Feb 24, 2014
:eek: OK, forget the median. No method of this type is any good. Only ordinals tell the story: 8 out of 9 judges favored B; B should win. Bring back 6.0. :)

Uh actually, no aggregating method of any type is good in ranked voting, which is what Arrow's Impossibility Theorem (colloquially) says (I know I'm just loosely stating it). So it doesn't really matter per se; there won't be a "perfect" way to aggregate how multiple judges score the same event. If you just rank on ordinals, then judges 1-5 might have given a slight lead to skater X, but for example judges 6-9 might have noticed an uncalled technical error (such as a flutz) by skater X, and downgraded skater X heavily, which an ordinal system wouldn't capture. On the other hand, if you rank on a cardinal system (like IJS), then it's more readily possible for a few biased judges to throw the results, and it's very difficult to detect if the scores are anonymous. (Under an ordinal system, at least the rankings of other skaters could be used as a pseudo-cardinal metric -- like if a skater was ranked 1st by most judges but 8th by another (meaning the other judge felt 7 other skaters were better), then that probably merits further investigation.)

I'd probably favor an absolute scoring system like IJS, but one with clearly-defined methods for giving out points (i.e. measuring the value of a performance), and where judges would be held accountable for the scores they give (being properly calibrated for measuring). It seems like anonymous judging and the "corridor" system aren't really cutting it. Realistically though, any scoring system is only as good as the extent to which it's actually used as advertised; as Kwan said, the problem isn't the scoring system, the problem is the people.
 

Sam-Skwantch

“I solemnly swear I’m up to no good”
Record Breaker
Joined
Dec 29, 2013
Country
United-States
I think the key issue that will arise, and is always going to be present, is simply that these numbers do not reflect an actual value but a variable state of mind. In math 9.0 is always 9.0. In judging 9.0 can mean different things to different people or, quite honestly, even to the same person. It's like quantum mechanics because a 9.0 can appear in two places at once for completely unrelated reasons.
 
Joined
Aug 16, 2009
I am intrigued by the idea of the median being applied here. The fact that Mathman gave only one example just means that the hypothesis must be tested a bit more extensively.
 
Joined
Jun 21, 2003
Uh actually, no aggregating method of any type is good in ranked voting, which is what Arrow's Impossibility Theorem (colloquially) says (I know I'm just loosely stating it). So it doesn't really matter per se; there won't be a "perfect" way to aggregate how multiple judges score the same event…

I look at this question slightly differently. Arrow’s Theorem says that it is impossible to design an ordinal system that satisfies a list of properties that are considered desirable in many settings in economics and political science. As applied to figure skating, the most important of these is that in an ideal system the entry of a new candidate who finishes a distant third should not affect the placement of the top two.

The canonical example is the 2000 U.S. Presidential election between Al Gore and George Bush. It all came down to Florida, where Gore was clinging to a small lead, let us say 1,000,000 to 999,000. Along comes liberal candidate Ralph Nader, to the left of Gore. He siphons off 2000 votes from Gore. The final tally is

Bush 999,000; Gore 998,000; Nader 2000.

Bush (after a few hanging chads and a five to four vote in the U.S. Supreme Court) wins all of Florida’s electoral votes and the presidency.

This sort of thing is regarded as anti-democratic because the will of the people (they preferred Gore to Bush) has been frustrated by an “irrelevant alternative” (Nader).

In figure skating this problem came to a head at the 1997 European men’s competition, where nobody skated well and the ordinals were all over the place. The ISU rushed a new system into place (OBO), but it did not address this particular problem. (All it did was make it harder for Michelle Kwan to win the 2002 Olympics. ;) ) Here is an excellent article about all this by Sandra Loosemore of Frogs on Ice.

http://www.frogsonice.com/skateweb/obo/obo-analysis.shtml

Anyway, in figure skating the problem arises in a situation like this. Here are the ordinals given by nine judges after two skaters, A and B, have gone.

Skater A: 1 1 1 1 1 2 2 2 2
Skater B: 2 2 2 2 2 1 1 1 1

Skater A is winning. She is preferred over skater B by a majority of the panel.

Now skater C goes. The new rankings are

Skater A: 1 1 1 2 2 2 2 3 3
Skater B: 2 2 2 3 3 1 1 1 1
Skater C: 3 3 3 1 1 3 3 2 2

Skater B wins with four first place ordinals, three seconds, and two thirds. Skater A must be satisfied with silver even though head-to-head she beat Skater B by a score of five judges to four and she beat Skater C by a score of five judges to four. Skater A beat everybody (she is the “Condorcet winner”), but Skater B won the gold medal.

So the question is, is this the wrong outcome? If we do not announce any intermediate results, but wait until the end to tally all the votes, Skater B has a good claim. Her 4 firsts and 3 seconds is better than Skater A’s 3 firsts and 4 seconds, both receiving 2 thirds. The only reason that Skater A is mad is because she thought she was winning before Skater C snuck in there and stole some first place ordinals from her. I don’t know that this is so terrible in figure skating, despite the fact that Prof. Arrow (a Nobel Prize winning economist) didn’t like it. :cool:
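Here is a small Python sketch that tallies the three-skater table above, both by places received and head-to-head. It assumes each column of the table is one judge's card; the function names are made up for illustration, and it does not implement the official tie-break rules.

```python
from itertools import combinations

# Ordinals from the three-skater table above, one column per judge.
ordinals = {
    "A": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "B": [2, 2, 2, 3, 3, 1, 1, 1, 1],
    "C": [3, 3, 3, 1, 1, 3, 3, 2, 2],
}

def place_counts(places, n_places=3):
    """Return [number of 1sts, number of 2nds, number of 3rds]."""
    return [places.count(p) for p in range(1, n_places + 1)]

def pairwise(table):
    """For each pair (x, y), count judges placing x ahead of y and y ahead of x."""
    results = {}
    for x, y in combinations(table, 2):
        x_ahead = sum(1 for px, py in zip(table[x], table[y]) if px < py)
        results[(x, y)] = (x_ahead, len(table[x]) - x_ahead)
    return results

for name, places in ordinals.items():
    print(name, place_counts(places))
print(pairwise(ordinals))
```

By this count A is ahead of B on five cards to four and ahead of C on five cards to four, while B has the most first places (four to A's three). That is the whole tension: the head-to-head tally and the place tally point to different winners.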
 

gkelly

Record Breaker
Joined
Jul 26, 2003
However, if the question is, can we look back at the protocols of past events and see whether a minority was able to dominate the majority and whether this could have been remedied, the answer is, "no." We cannot do that because of anonymous randomized judging.

Well, you could try looking at some events where the judging wasn't anonymous and randomized, e.g., JGP or US Nationals.

Of course, there would be less incentive to cheat at those events, and much less incentive or possibility to form national blocs. But there would still be minority opinions.
 
Joined
Jun 21, 2003
Well, you could try looking at some events where the judging wasn't anonymous and randomized, e.g., JGP or US Nationals.

!!!!!!!!! :yes: :party2:

Of course, there would be less incentive to cheat at those events, and much less incentive or possibility to form national blocs. But there would still be minority opinions.

That's OK. I am not necessarily trying to catch cheaters but rather to understand the mathematical peccadillos of the judging system.
 

caelum

On the Ice
Joined
Nov 8, 2013
The question of which measure of central tendency is best depends on what kind of bias we are trying to correct for. If we are worried about a minority in collusion, then obviously the median is the best method. If a majority of the judges are in collusion, the median will lead to worse results than most other methods, but it's pretty much impossible to design a system that can handle majority collusion effectively. But if the bias isn't too extreme and is more individualistic (like a judge giving a +3 to a spin that deserves +2, or 9.00 for Skating Skills that deserve 8.50, while most of the other judges are fair), then maybe something like a winsorized mean would be appropriate.

The point is, we need to get a good handle on what kind of bias is going on in figure skating and the ISU's insistence on anonymous voting makes this really difficult to study statistically.
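For readers who haven't met the winsorized mean, a minimal Python sketch follows. Instead of discarding the k highest and k lowest scores, it pulls them in to the nearest remaining score and then averages all nine. The function names and the parameter `k` are illustrative, and the sample panel is Skater A's line from the first post in this thread.

```python
from statistics import mean, median

def winsorized_mean(scores, k=1):
    """Clamp the k lowest and k highest scores to their nearest kept neighbours, then average."""
    ordered = sorted(scores)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this panel size")
    low, high = ordered[k], ordered[-k - 1]
    clamped = [min(max(s, low), high) for s in ordered]
    return mean(clamped)

def trimmed_mean(scores, k=1):
    """Discard the k lowest and k highest scores, then average the rest."""
    ordered = sorted(scores)
    return mean(ordered[k:len(ordered) - k])

# Skater A's panel from the first post in the thread.
panel = [9.50, 9.00, 9.00, 9.00, 9.00, 9.00, 8.50, 8.50, 8.25]

print(round(mean(panel), 2))
print(round(trimmed_mean(panel), 2))
print(round(winsorized_mean(panel), 2))
print(median(panel))
```

With nine judges, setting `k=4` clamps every score to the middle one, so the winsorized mean becomes the median. The different methods sit on one dial running from the plain mean (`k=0`) to the median (`k=4`).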
 

drivingmissdaisy

Record Breaker
Joined
Feb 17, 2010
I get what you're trying to do, but it seems like a waste to throw out 8 judges' scores. If the panels are randomly selected, what advantage is there in having 9 judges over 7 or even 5 if only one score is going to count?
 