# Thread: Should the IJS use median scores instead of the trimmed mean?

1. 0

## Should the IJS use median scores instead of the trimmed mean?

This question came up, in disguised form, on another thread. When scores are averaged, extreme values can play a disproportionate role in determining the outcome. Two or three resolute conspirators can thwart the will of the majority of the judging panel simply by highballing their favorite and lowballing his/her rival. This cannot happen in ordinal judging, where a majority of first place ordinals is guaranteed always to carry the day.

Here is an example, one that is not far-fetched in the least. Suppose the program component scores for the nine judges came out like this:

Skater A: 9.50 9.00 9.00 9.00 9.00 9.00 8.50 8.50 8.25
Skater B: 8.50 8.75 8.75 8.75 8.75 8.75 9.25 9.25 9.25

Throw out highest and lowest and we have

9.00 9.00 9.00 9.00 9.00 8.50 8.50
8.75 8.75 8.75 8.75 8.75 9.25 9.25

Nothing out of the ordinary, and if the scores were all mixed up by randomization there would be nothing to comment on.

And yet … 5 judges out of 7, and 6 judges out of 9, thought that skater A performed the best. But skater B wins by a score of 62.25 to 62.00.
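For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes the scores in each row are listed in judge order, so the first number for A and the first number for B come from the same judge. The function names are just illustrative.

```python
# Scores from the example, assumed to be in judge order (judge 1 first).
skater_a = [9.50, 9.00, 9.00, 9.00, 9.00, 9.00, 8.50, 8.50, 8.25]
skater_b = [8.50, 8.75, 8.75, 8.75, 8.75, 8.75, 9.25, 9.25, 9.25]


def trimmed_sum(scores):
    """Sum of the scores after dropping one highest and one lowest."""
    ordered = sorted(scores)
    return sum(ordered[1:-1])


def judges_preferring_first(first, second):
    """Count judges who scored `first` strictly higher than `second`."""
    return sum(1 for x, y in zip(first, second) if x > y)


print(trimmed_sum(skater_a))                         # 62.0
print(trimmed_sum(skater_b))                         # 62.25
print(judges_preferring_first(skater_a, skater_b))   # 6
```

So B is ahead on the trimmed total even though six of the nine judges scored A higher.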

This situation, where a determined minority cabal can dominate the majority, could not happen if we used the median (middle score) instead of the mean. The median is simply the maximally trimmed mean -- we throw out the highest four and the lowest four instead of the highest one or two and the lowest one or two. In this example the median scores are

Skater A: 9.00
Skater B: 8.75

which certainly captures the opinions of the majority in this example.
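The "maximally trimmed mean" idea can be sketched in a few lines of Python. The helper `trimmed_mean` and its parameter `k` (how many scores to drop from each end) are names made up for this illustration, not anything from the IJS rules.

```python
from statistics import median

skater_a = [9.50, 9.00, 9.00, 9.00, 9.00, 9.00, 8.50, 8.50, 8.25]
skater_b = [8.50, 8.75, 8.75, 8.75, 8.75, 8.75, 9.25, 9.25, 9.25]


def trimmed_mean(scores, k):
    """Mean of the scores after dropping the k highest and the k lowest."""
    if 2 * k >= len(scores):
        raise ValueError("k is too large: no scores would be left")
    kept = sorted(scores)[k:len(scores) - k]
    return sum(kept) / len(kept)


# k = 1 is the usual drop-one-high, drop-one-low average;
# k = 4 on a nine-judge panel leaves only the middle score, i.e. the median.
for k in (1, 2, 3, 4):
    print(k, trimmed_mean(skater_a, k), trimmed_mean(skater_b, k))

print(median(skater_a), median(skater_b))  # 9.0 8.75
```

With these particular numbers, B is ahead only at k = 1; dropping two scores from each end is already enough to put A in front.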

What do you think? Would this be a better system?

2. 0
Originally Posted by Mathman
This question came up, in disguised form, on another thread. When scores are averaged, extreme values can play a disproportionate role in determining the outcome. Two or three resolute conspirators can thwart the will of the majority of the judging panel simply by highballing their favorite and lowballing his/her rival. This cannot happen in ordinal judging, where a majority of first place ordinals is guaranteed always to carry the day.

3. 0
Originally Posted by Mathman
What do you think? Would this be a better system?
On the face of it, yes.

The only question is, does it really work better in all cases?
One example is not enough. Have you tested across a range of scenarios?
Or used real scores, from a bunch of real competitions?

They may well have thought about it. For a second. But then discarded it in favour of a method... that best suited their purposes... :-P

4. 0
This method sounds interesting. Why don't we try it with Sochi events? I am curious to know if the outcome would be different.

5. 0
Originally Posted by YesWay
On the face of it, yes.

The only question is, does it really work better in all cases?
It works in all cases where there are only two competitors. If there are three or more competitors none of whom receive a majority of first place ordinals, then some oddities can occur. In every case, though, it diminishes the effectiveness of small knots of would-be cheaters.

6. 0
Originally Posted by Mathman
This question came up, in disguised form, on another thread. When scores are averaged, extreme values can play a disproportionate role in determining the outcome. Two or three resolute conspirators can thwart the will of the majority of the judging panel simply by highballing their favorite and lowballing his/her rival. This cannot happen in ordinal judging, where a majority of first place ordinals is guaranteed always to carry the day.
I'm not familiar with the ordinal judging system, but if a skater were 1st according to 5 judges but say 9th according to the other 4 judges, wouldn't a skater that was 2nd from every judge win?

7. 0
I agree, I think it'd work better, but we probably should test it on some past events.

8. 0
Originally Posted by Vanshilar
I'm not familiar with the ordinal judging system, but if a skater were 1st according to 5 judges but say 9th according to the other 4 judges, wouldn't a skater that was 2nd from every judge win?
No. The skater with 5 first place ordinals wins.

There were two ordinal systems, "majority of ordinals" and "OBO (one by one)." Under majority of ordinals, in the first round if a skater has a majority of first place ordinals (5 out of 9), that's it. That skater is removed and the same rule is applied for second place, etc.

If no one has a majority of ordinals, then the skater wins who has the majority of first and second place ordinals combined.

This method suffers from the defect of allowing "flip-flops" -- violations of the principle of "independence of irrelevant alternatives." That is, A could be ahead of B for first place, then C comes along, gets third, but flips the first and second places. The one-by-one system was supposed to prevent this, but it doesn't (nor can it, by the Arrow Impossibility Theorem). This system is harder to explain in a few words, but it is quite straightforward and proceeds by comparing each pair of skaters one on one to determine the winner.

Even with OBO, though, if you have 5 first place ordinals, you automatically win. This is because if you have 5 first place ordinals then you automatically beat every other competitor head-to-head by at least a score of 5 to 4. (Sarah Hughes's LP at the 2002 Olympics is a good example.)
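Here is a rough Python sketch of the first-place step under majority of ordinals, applied to Vanshilar's hypothetical. It is a simplification: only the two skaters in question are listed, and the tie-break used when several skaters reach a majority at the same place (largest count wins) is an assumption for illustration, not the full rulebook procedure. The function names are made up.

```python
def majority_winner(ordinals):
    """Pick first place with a simplified majority-of-ordinals rule.

    `ordinals` maps each skater to the placements given by the judges.
    """
    n_judges = len(next(iter(ordinals.values())))
    majority = n_judges // 2 + 1
    worst_place = max(max(places) for places in ordinals.values())
    for place in range(1, worst_place + 1):
        # How many judges put each skater at this place or better?
        counts = {
            name: sum(1 for p in places if p <= place)
            for name, places in ordinals.items()
        }
        qualified = {name: c for name, c in counts.items() if c >= majority}
        if qualified:
            # Assumed tie-break: the larger majority wins.
            return max(qualified, key=qualified.get)
    return None


def head_to_head(ordinals, x, y):
    """Number of judges who placed skater x ahead of skater y."""
    return sum(1 for px, py in zip(ordinals[x], ordinals[y]) if px < py)


# X has five 1sts and four 9ths; Y is 2nd on every judge's card.
panel = {
    "X": [1, 1, 1, 1, 1, 9, 9, 9, 9],
    "Y": [2, 2, 2, 2, 2, 2, 2, 2, 2],
}
print(majority_winner(panel))          # X
print(head_to_head(panel, "X", "Y"))   # 5
```

If X had only four first place ordinals, nobody would have a majority of firsts, and the sketch would move on to firsts and seconds combined, where Y has all nine.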

9. 0
Originally Posted by Meoima
This method sounds interesting. Why don't we try it with Sochi events? I am curious to know if the outcome would be different.
I think that 95% of the time the final result will be the same no matter what system is used. That is, in most contests there is a clear consensus winner. Most of the time, with honest judging, the median and the mean will be close enough not to affect the outcome.

However, if the question is, can we look back at the protocols of past events and see whether a minority was able to dominate the majority and whether this could have been remedied, the answer is, "no." We cannot do that because of anonymous randomized judging.

10. 0
Like YesWay, I feel like one example isn't enough. I think it could have its own problems.

For instance, let's take these two skaters:

A: 6.00, 6.50, 7.25, 7.75, 8.25, 8.25, 8.50, 8.75, 8.75
B: 7.75, 8.00, 8.00, 8.25, 8.25, 9.00, 9.00, 9.25, 9.50

From all appearances, the judges generally think skater B did better. But if we just take the median, the two skaters both come out with 8.25. If we instead take the average, dropping highest and lowest, skater A comes out with 7.89 and skater B with 8.54, which seems to be the better reflection of what the judges think.
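A quick way to reproduce these numbers, as a minimal Python sketch. The helper name `drop_high_low_mean` is made up, and it models only the drop-one-high, drop-one-low average, nothing else about how IJS totals are built.

```python
from statistics import median

skater_a = [6.00, 6.50, 7.25, 7.75, 8.25, 8.25, 8.50, 8.75, 8.75]
skater_b = [7.75, 8.00, 8.00, 8.25, 8.25, 9.00, 9.00, 9.25, 9.50]


def drop_high_low_mean(scores):
    """Average of the scores after dropping one highest and one lowest."""
    kept = sorted(scores)[1:-1]
    return sum(kept) / len(kept)


print(median(skater_a), median(skater_b))  # 8.25 8.25
print(drop_high_low_mean(skater_a))        # 55.25 / 7, about 7.893
print(drop_high_low_mean(skater_b))        # 59.75 / 7, about 8.536
```

The medians tie because the fifth-highest score happens to be 8.25 on both lists, while the averages still see the gap in the other eight scores.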

Now, is screwing over skater B in this instance a lesser evil than the situation raised in Mathman's post? I don't know. But the median has its own problems, which I think are worth pointing out. And if we implement the median, how would that affect the psychology of the judges, knowing that only one person's score will count at the end of the day? (Thankfully, there's more than one component.)

11. 0
Originally Posted by Sandpiper

A: 6.00, 6.50, 7.25, 7.75, 8.25, 8.25, 8.50, 8.75, 8.75
B: 7.75, 8.00, 8.00, 8.25, 8.25, 9.00, 9.00, 9.25, 9.50
OK, forget the median. No method of this type is any good. Only ordinals tell the story: 8 out of 9 judges favored B; B should win. Bring back 6.0.
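The judge-by-judge count can be checked with a short Python sketch. It assumes the two rows line up judge by judge; since both rows are listed from low to high, that pairing is an assumption rather than something the example states. The function name `tally` is made up.

```python
skater_a = [6.00, 6.50, 7.25, 7.75, 8.25, 8.25, 8.50, 8.75, 8.75]
skater_b = [7.75, 8.00, 8.00, 8.25, 8.25, 9.00, 9.00, 9.25, 9.50]


def tally(first, second):
    """Return (wins, ties, losses) for `first` against `second`, judge by judge."""
    wins = sum(1 for x, y in zip(first, second) if x > y)
    ties = sum(1 for x, y in zip(first, second) if x == y)
    losses = len(first) - wins - ties
    return wins, ties, losses


print(tally(skater_b, skater_a))  # (8, 1, 0)
```

Under that pairing, the ninth judge is the one who gave both skaters 8.25.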

12. 0
Originally Posted by Mathman
OK, forget the median. No method of this type is any good. Only ordinals tell the story: 8 out of 9 judges favored B; B should win. Bring back 6.0.
Uh actually, no aggregating method of any type is good in ranked voting, which is what Arrow's Impossibility Theorem (colloquially) says (I know I'm just loosely stating it). So it doesn't really matter per se; there won't be a "perfect" way to aggregate how multiple judges score the same event. If you just rank on ordinals, then judges 1-5 might have given a slight lead to skater X, but judges 6-9, for example, might have noticed an uncalled technical error (such as a flutz) by skater X and downgraded skater X heavily, which an ordinal system wouldn't capture. On the other hand, if you rank on a cardinal system (like IJS), then it's more readily possible for a few biased judges to throw the results, and it's very difficult to detect if the scores are anonymous. (Under an ordinal system, at least the rankings of other skaters could be used as a pseudo-cardinal metric -- like if a skater was ranked 1st by most judges but 8th by another (meaning the other judge felt 7 other skaters were better), then that probably merits further investigation.)

I'd probably favor an absolute scoring system like IJS, but one with clearly-defined methods for giving out points (i.e. measuring the value of a performance), and where judges would be held accountable for the scores they give (being properly calibrated for measuring). It seems like anonymous judging and the "corridor" system aren't really cutting it. Realistically though, any scoring system is only worth as much as it's actually used as advertised; as Kwan said, the problem isn't the scoring system, the problem is the people.

13. 0
So the correct answer is to keep the system and intensively monitor the judges?

14. 0
I think the key issue that will arise and is always going to be present is simply that these numbers do not reflect an actual value but a variable state of mind. In math 9.0 is always 9.0. In judging 9.0 can mean different things to different people or, quite honestly, even to the same person. It's like quantum mechanics because a 9.0 can appear in two places at once for completely unrelated reasons.

15. 0
I am intrigued by the idea of the median being applied here. The fact that Mathman gave only one example just means that the hypothesis must be tested a bit more extensively.
