Thread: Should the IJS use median scores instead of the trimmed mean?

  1. #1
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,828

    Should the IJS use median scores instead of the trimmed mean?

    This question came up, in disguised form, on another thread. When scores are averaged, extreme values can play a disproportionate role in determining the outcome. Two or three resolute conspirators can thwart the will of the majority of the judging panel simply by highballing their favorite and lowballing his/her rival. This cannot happen in ordinal judging, where a majority of first place ordinals is guaranteed always to carry the day.

    Here is an example, one that is not far-fetched in the least. Suppose the program component scores for the nine judges came out like this:

    Skater A: 9.50 9.00 9.00 9.00 9.00 9.00 8.50 8.50 8.25
    Skater B: 8.50 8.75 8.75 8.75 8.75 8.75 9.25 9.25 9.25

    Throw out highest and lowest and we have

    9.00 9.00 9.00 9.00 9.00 8.50 8.50
    8.75 8.75 8.75 8.75 8.75 9.25 9.25

    Nothing out of the ordinary, and if the scores were all mixed up by randomization there would be nothing to comment on.

    And yet … 5 judges out of 7, and 6 judges out of 9, thought that skater A performed the best. But skater B wins by a score of 62.25 to 62.00.

    This situation, where a determined minority cabal can dominate the majority, could not happen if we used the median (middle score) instead of the mean. The median is simply the maximally trimmed mean -- we throw out the highest four and the lowest four instead of the highest one or two and the lowest one or two. In this example the median scores are

    Skater A: 9.00
    Skater B: 8.75

    which certainly captures the opinions of the majority in this example.
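
    For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes the scores above are listed in judge order, so that the i-th score for each skater comes from the same judge; the helper name `trimmed` is only illustrative. Dropping k = 1 from each end gives the current trimmed totals, and dropping k = 4 from each end of a nine-judge panel leaves only the median.

```python
def trimmed(scores, k):
    """Drop the k highest and k lowest scores; return the rest, sorted."""
    ordered = sorted(scores)
    return ordered[k:len(ordered) - k]

# Scores from the example above (assumed to be in judge order).
skater_a = [9.50, 9.00, 9.00, 9.00, 9.00, 9.00, 8.50, 8.50, 8.25]
skater_b = [8.50, 8.75, 8.75, 8.75, 8.75, 8.75, 9.25, 9.25, 9.25]

# k = 1: throw out the single highest and lowest score, then total.
total_a = sum(trimmed(skater_a, 1))   # 62.0
total_b = sum(trimmed(skater_b, 1))   # 62.25

# k = 4: with nine judges only the middle score is left, i.e. the median.
median_a = trimmed(skater_a, 4)[0]    # 9.0
median_b = trimmed(skater_b, 4)[0]    # 8.75

# Judge-by-judge comparison: how many judges scored A above B?
judges_for_a = sum(a > b for a, b in zip(skater_a, skater_b))  # 6

print(total_a, total_b, median_a, median_b, judges_for_a)
```

    The trimmed totals put B ahead, while the median and the judge-by-judge count both put A ahead.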

    What do you think? Would this be a better system?
    Last edited by Mathman; 06-07-2014 at 03:19 PM.

  2. #2
    Tripping on the Podium
    Join Date
    Jan 2014
    Posts
    55
    Quote Originally Posted by Mathman View Post
    This question came up, in disguised form, on another thread. When scores are averaged, extreme values can play a disproportionate role in determining the outcome. Two or three resolute conspirators can thwart the will of the majority of the judging panel simply by highballing their favorite and lowballing his/her rival. This cannot happen in ordinal judging, where a majority of first place ordinals is guaranteed always to carry the day.
    I TOTALLY agree with this. I simply don't understand why the ISU never thought about this.

  3. #3
    Custom Title
    Join Date
    Sep 2013
    Posts
    575
    Quote Originally Posted by Mathman View Post
    What do you think? Would this be a better system?
    On the face of it, yes.

    The only question is, does it really work better in all cases?
    One example is not enough. Have you tested across a range of scenarios?
    Or used real scores, from a bunch of real competitions?

    Quote Originally Posted by FS.Addict View Post
    I TOTALLY agree with this. I simply don't understand why the ISU never thought about this.
    They may well have thought about it. For a second. But then discarded it in favour of a method... that best suited their purposes... :-P

  4. #4
    Say no to horrendous costumes Meoima's Avatar
    Join Date
    Feb 2014
    Location
    North of the world
    Posts
    6,143
    This method sounds interesting, why don't we try it with Sochi events? I am curious to know if the outcome would be different?

  5. #5
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,828
    Quote Originally Posted by YesWay View Post
    On the face of it, yes.

    The only question is, does it really work better in all cases?
    It works in all cases where there are only two competitors. If there are three or more competitors none of whom receive a majority of first place ordinals, then some oddities can occur. In every case, though, it diminishes the effectiveness of small knots of would-be cheaters.

  6. #6
    Custom Title
    Join Date
    Feb 2014
    Posts
    159
    Quote Originally Posted by Mathman View Post
    This question came up, in disguised form, on another thread. When scores are averaged, extreme values can play a disproportionate role in determining the outcome. Two or three resolute conspirators can thwart the will of the majority of the judging panel simply by highballing their favorite and lowballing his/her rival. This cannot happen in ordinal judging, where a majority of first place ordinals is guaranteed always to carry the day.
    I'm not familiar with the ordinal judging system, but if a skater were 1st according to 5 judges but say 9th according to the other 4 judges, wouldn't a skater that was 2nd from every judge win?

  7. #7
    Custom Title
    Join Date
    Feb 2013
    Posts
    468
    I agree, I think it'd work better, but we probably should test it on some past events.

  8. #8
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,828
    Quote Originally Posted by Vanshilar View Post
    I'm not familiar with the ordinal judging system, but if a skater were 1st according to 5 judges but say 9th according to the other 4 judges, wouldn't a skater that was 2nd from every judge win?
    No. The skater with 5 first place ordinals wins.

    There were two ordinal systems, "majority of ordinals" and "OBO (one by one)." Under majority of ordinals, in the first round if a skater has a majority of first place ordinals (5 out of 9), that's it. That skater is removed and the same rule is applied for second place, etc.

    If no one has a majority of first place ordinals, then the skater wins who has a majority of first and second place ordinals combined.

    This method suffers from the defect of allowing "flip-flops" -- violations of the principle of "independence of irrelevant alternatives." That is, A could be ahead of B for first place, then C comes along, gets third, but flips the first and second places. The one-by-one system was supposed to prevent this, but it doesn't (nor can it, by the Arrow Impossibility Theorem). This system is harder to explain in a few words, but it is quite straightforward and proceeds by comparing each pair of skaters one on one to determine the winner.

    Even with OBO, though, if you have 5 first place ordinals, you automatically win. This is because if you have 5 first place ordinals then you automatically beat every other competitor head-to-head by at least a score of 5 to 4. (Sarah Hughes's LP in the 2002 Olympics is a good example.)
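
    To make the quoted scenario concrete, here is a rough Python sketch. The ordinals are hypothetical, made up to match the question (X is 1st for five judges and 9th for four; Y is 2nd for everyone), and the function names are illustrative, not part of any official rulebook.

```python
def has_majority_of_firsts(ordinals):
    """True if more than half of the judges placed this skater first."""
    return sum(o == 1 for o in ordinals) > len(ordinals) / 2

def head_to_head(ordinals_x, ordinals_y):
    """Return (judges placing x ahead of y, judges placing y ahead of x)."""
    x_ahead = sum(x < y for x, y in zip(ordinals_x, ordinals_y))
    y_ahead = sum(y < x for x, y in zip(ordinals_x, ordinals_y))
    return x_ahead, y_ahead

# Hypothetical ordinals, one entry per judge (lower is better).
skater_x = [1, 1, 1, 1, 1, 9, 9, 9, 9]
skater_y = [2, 2, 2, 2, 2, 2, 2, 2, 2]

x_majority = has_majority_of_firsts(skater_x)   # True: 5 of 9 firsts
y_majority = has_majority_of_firsts(skater_y)   # False: no firsts at all
pairwise = head_to_head(skater_x, skater_y)     # (5, 4): X beats Y one on one

# Averaging the placements would favor Y instead, which is the intuition
# behind the question; neither ordinal method averages placements.
mean_x = sum(skater_x) / len(skater_x)
mean_y = sum(skater_y) / len(skater_y)

print(x_majority, y_majority, pairwise, mean_x > mean_y)
```

    X wins under majority of ordinals and also wins the one-on-one comparison with Y, even though Y has the better average placement.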
    Last edited by Mathman; 06-07-2014 at 05:22 PM.

  9. #9
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,828
    Quote Originally Posted by Meoima View Post
    This method sounds interesting, why don't we try it with Sochi events? I am curious to know if the outcome would be different?
    I think that 95% of the time the final result will be the same no matter what system is used. That is, in most contests there is a clear consensus winner. Most of the time, with honest judging, the median and the mean will be close enough not to affect the outcome.

    However, if the question is, can we look back at the protocols of past events and see whether a minority was able to dominate the majority and whether this could have been remedied, the answer is, "no." We cannot do that because of anonymous randomized judging.

  10. #10
    Custom Title
    Join Date
    Apr 2014
    Posts
    2,705
    Like YesWay, I feel like one example isn't enough. I think it could have its own problems.

    For instance, let's take these two skaters:

    A: 6.00, 6.50, 7.25, 7.75, 8.25, 8.25, 8.50, 8.75, 8.75
    B: 7.75, 8.00, 8.00, 8.25, 8.25, 9.00, 9.00, 9.25, 9.50

    From all appearances, the judges generally think skater B did better. But if we just take the median, the two skaters both come out with 8.25. If we instead take the average, dropping highest and lowest, skater A comes out with 7.89 and skater B with 8.54, which seems to be the better reflection of what the judges think.

    Now, is screwing over skater B in this instance a lesser evil than the situation raised in Mathman's post? I don't know. But median has its own problems, which I think are worth pointing out. And if we implement the median, how would that affect the psychology of the judges, knowing that only one person's score will count at the end of the day? (Thankfully, there's more than one component.)
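
    A short Python sketch to reproduce these figures, using only the standard `statistics` module. The helper name `trimmed_mean` is illustrative; it drops exactly one highest and one lowest score, as described above.

```python
from statistics import mean, median

skater_a = [6.00, 6.50, 7.25, 7.75, 8.25, 8.25, 8.50, 8.75, 8.75]
skater_b = [7.75, 8.00, 8.00, 8.25, 8.25, 9.00, 9.00, 9.25, 9.50]

def trimmed_mean(scores):
    """Average after dropping the single highest and single lowest score."""
    ordered = sorted(scores)
    return mean(ordered[1:-1])

# Both medians land on the same middle score, so the median alone ties them.
median_a = median(skater_a)   # 8.25
median_b = median(skater_b)   # 8.25

# The trimmed means still separate the two skaters.
avg_a = round(trimmed_mean(skater_a), 2)   # 55.25 / 7, about 7.89
avg_b = round(trimmed_mean(skater_b), 2)   # 59.75 / 7, about 8.54

print(median_a, median_b, avg_a, avg_b)
```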

  11. #11
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,828
    Quote Originally Posted by Sandpiper View Post

    A: 6.00, 6.50, 7.25, 7.75, 8.25, 8.25, 8.50, 8.75, 8.75
    B: 7.75, 8.00, 8.00, 8.25, 8.25, 9.00, 9.00, 9.25, 9.50
    OK, forget the median. No method of this type is any good. Only ordinals tell the story: 8 out of 9 judges favored B; B should win. Bring back 6.0.

  12. #12
    Custom Title
    Join Date
    Feb 2014
    Posts
    159
    Quote Originally Posted by Mathman View Post
    OK, forget the median. No method of this type is any good. Only ordinals tell the story: 8 out of 9 judges favored B; B should win. Bring back 6.0.
    Uh actually, no aggregating method of any type is good in ranked voting, which is what Arrow's Impossibility Theorem (colloquially) says (I know I'm just loosely stating it). So it doesn't really matter per se; there won't be a "perfect" way to aggregate how multiple judges score the same event. If you just rank on ordinals, then judges 1-5 might have given a slight lead to skater X, but judges 6-9, for example, might have noticed an uncalled technical error (such as a flutz) by skater X and downgraded skater X heavily, which an ordinal system wouldn't capture. On the other hand, if you rank on a cardinal system (like IJS), then it's more readily possible for a few biased judges to throw the results, and it's very difficult to detect if the scores are anonymous. (Under an ordinal system, at least the rankings of other skaters could be used as a pseudo-cardinal metric -- like if a skater was ranked 1st by most judges but 8th by another (meaning the other judge felt 7 other skaters were better), then that probably merits further investigation.)

    I'd probably favor an absolute scoring system like IJS, but one with clearly-defined methods for giving out points (i.e. measuring the value of a performance), and where judges would be held accountable for the scores they give (being properly calibrated for measuring). It seems like anonymous judging and the "corridor" system aren't really cutting it. Realistically though, any scoring system is only worth as much as it's actually used as advertised; as Kwan said, the problem isn't the scoring system, the problem is the people.

  13. #13
    Custom Title
    Join Date
    Feb 2013
    Posts
    468
    So the correct answer is to keep the system and intensively monitor the judges?

  14. #14
    Size 7 Knife Boots Sam-Skwantch's Avatar
    Join Date
    Dec 2013
    Location
    At the Rink
    Posts
    3,964
    I think the key issue that will arise and is always going to be present is simply that these numbers do not reflect an actual value but a variable state of mind. In math 9.0 is always 9.0. In judging 9.0 can mean different things to different people or, quite honestly, even to the same person. It's like quantum mechanics because a 9.0 can appear in two places at once for completely unrelated reasons.

  15. #15
    Custom Title
    Join Date
    Aug 2009
    Posts
    9,490
    I am intrigued by the idea of the median being applied here. The fact that Mathman gave only one example just means that the hypothesis must be tested a bit more extensively.
