
Thread: How can we measure the degree of agreement between two judges?

  1. #1
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,179

    How can we measure the degree of agreement between two judges?

    Under the CoP the judges’ protocol sheets contain so much data that it is hard to know how to get started in addressing the question of whether two judges see things pretty much the same way or not.

    When judging is not anonymous, there is an easy way to do it. Convert each judge's scores to ordinals. Then compute something called the “Spearman rank correlation coefficient.”

    Here is how to do it.

                     RUS    FRA    difference    d squared

    Yagudin           1      2         1             1
    Plushenko         2      3         1             1
    Goebel            3      1         2             4
    Honda             4      4         0             0

    Total (sum of squared differences)              6

    Now compute r = 1 − 6 × (sum of the squared differences) / (n × (n² − 1))

    = 1 − (6×6)/(4×15) = 1 − 0.6 = 0.4

    This is a 40% rank correlation.
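    Here is a minimal Python sketch of that calculation, using the ordinals from the table above (the function name is just my own):

    # Spearman rank correlation: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    def spearman_from_ordinals(ranks_a, ranks_b):
        n = len(ranks_a)
        d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
        return 1 - 6 * d_squared / (n * (n ** 2 - 1))

    # Ordinals from the table: Yagudin, Plushenko, Goebel, Honda
    rus = [1, 2, 3, 4]
    fra = [2, 3, 1, 4]
    print(spearman_from_ordinals(rus, fra))  # 0.4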
    Last edited by Mathman; 09-11-2010 at 10:59 PM.

  2. #2
    Skating is art, if you let it be. Blades of Passion's Avatar
    Join Date
    Sep 2008
    Location
    Hollywood, CA
    Posts
    3,990
    I know this isn't the point of the thread, but are those ordinals actually real? I don't think it's ever happened that a judge scored Goebel over both Yagudin and Plushenko at a competition.

    We should look at a CoP competition before the protocols became randomized for each judge and see how it compares to 6.0 judging.

    EDIT - 2006 Olympics might be a great one to examine because of how all over the place the competition was.
    Last edited by Blades of Passion; 09-12-2010 at 04:52 AM.

  3. #3
    leave no stone unturned seniorita's Avatar
    Join Date
    Jun 2008
    Posts
    5,579
    Quote Originally Posted by Mathman View Post

    Yagudin           1      2         1             1
    Plushenko         2      3         1             1
    Goebel            3      1         2             4
    Honda             4      4         0             0
    :sheesh:

  4. #4
    Custom Title
    Join Date
    Jan 2009
    Location
    Lloret de Mar, Spain
    Posts
    377
    Calculating correlations of the GOEs/PCS would be enough, but the randomized order of the judges spoils it.
    Say one judge gives: 0 1 1 1 2 1 2 1
    Another: 1 1 1 0 1 0 2 0

    The Pearson correlation is 0.394, a medium level of agreement.

    On the PCS those two judges have a 0.845 correlation.
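    Here is a minimal Python sketch of that Pearson calculation, using the two GOE lists above as the example data:

    def pearson(x, y):
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        var_x = sum((a - mean_x) ** 2 for a in x)
        var_y = sum((b - mean_y) ** 2 for b in y)
        return cov / (var_x * var_y) ** 0.5

    judge_1 = [0, 1, 1, 1, 2, 1, 2, 1]
    judge_2 = [1, 1, 1, 0, 1, 0, 2, 0]
    print(round(pearson(judge_1, judge_2), 3))  # 0.394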

  5. #5
    can't come down to Earth prettykeys's Avatar
    Join Date
    Oct 2009
    Posts
    1,801
    Slightly different, but isn't there a way of calculating the divergence of a single sample (i.e. one judge's scores) from the mean score of all the judges...like, the variance? You can do that for each of the technical elements and program component scores. I'm not sure if this is appropriate, though (is it reasonable to assume that scores should follow a normal distribution about the mean?)

  6. #6
    Custom Title
    Join Date
    Jan 2009
    Location
    Lloret de Mar, Spain
    Posts
    377
    Quote Originally Posted by prettykeys View Post
    Slightly different, but isn't there a way of calculating the divergence of a single sample (i.e. one judge's scores) from the mean score of all the judges...like, the variance? You can do that for each of the technical elements and program component scores.
    Yeah, it's about searching for a judge who marks very differently from the rest. I guess flagging anyone who falls outside the standard deviation is enough (see the sketch below).

    Quote Originally Posted by prettykeys View Post
    I'm not sure if this is appropriate, though (is it reasonable to assume that scores should follow a normal distribution about the mean?)
    Yes, it's reasonable.
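    A minimal Python sketch of that idea, flagging any judge whose mark sits more than a chosen number of standard deviations from the panel mean; the threshold and the sample marks are just made up for illustration:

    import statistics

    def flag_outlier_judges(scores, threshold=2.0):
        # scores: one element's (or component's) marks from every judge on the panel
        mean = statistics.mean(scores)
        sd = statistics.pstdev(scores)
        if sd == 0:
            return []
        return [i for i, s in enumerate(scores) if abs(s - mean) > threshold * sd]

    panel = [7.25, 7.50, 7.25, 7.00, 7.50, 9.00, 7.25]  # hypothetical PCS marks
    print(flag_outlier_judges(panel))  # [5] -- the judge far from the consensus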

  7. #7
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,179
    Quote Originally Posted by prettykeys View Post
    Slightly different, but isn't there a way of calculating the divergence of a single sample (i.e. one judge's scores) from the mean score of all the judges...like, the variance? You can do that for each of the technical elements and program component scores. I'm not sure if this is appropriate, though (is it reasonable to assume that scores should follow a normal distribution about the mean?)
    This would be fine. The rule of thumb is usually something like, more than three standard deviations from the mean is out of bounds.

    However, the ISU’s own system for evaluating judges is somewhat different. It uses the sum of the absolute values of the individual deviations from the mean, rather than the square root of the sum of the squared deviations.

    I don’t know why they decided to do it that way. The sum of squares is prettier (in the sense of being compatible with distance formulas in high-dimensional Euclidean spaces), it lends itself better to mathematical manipulation (the absolute value is not differentiable), and it gives relatively greater weight to scores that are way out of line instead of just a little.

    Perhaps the concern was a possible loss of robustness if you use the more common standard deviation as your measure of variation. (This would be the case, for instance, if they -- like you -- had suspicions about whether the underlying distribution is symmetric or not.)

    Anyway, here is how the ISU identifies “possible anomalies” in judges’ scores. A new communication (#1631) about this came out in July. Scroll down to section E, page 5.

    http://www.isu.org/vsite/vnavsite/pa...v-list,00.html

    For GOEs, for instance, it goes like this.

    For each skater, for each element, calculate the mean score from all sitting judges, plus the referee counted twice (so this is not the same as the trimmed mean that we see in the protocols). Then for each judge, calculate the absolute value of the difference between that judge’s score and the average.

    Add these differences up. If the sum exceeds the number of elements being judged, then that judge is “outside the corridor,” and this “anomaly” comes to the attention of the judges’ oversight procedure.

    (The little chart about pluses and minuses that accompanies this explanation in the Communication is kind of a red herring. I assume this information is kept so they can tell whether a judge is consistently favoring/dumping on a skater, or whether the judge is giving some marks way too high and others way too low for the same skater.)
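    Here is a minimal Python sketch of that corridor check as I read the Communication; the panel size, the made-up scores, and the way I fold the referee into the mean are my own assumptions, not the ISU’s official software:

    def goe_outside_corridor(judge_goes, panel_goes, referee_goes):
        # judge_goes:   one judge's GOE for each element of a program
        # panel_goes:   all sitting judges' GOEs per element (list of lists)
        # referee_goes: the referee's GOE per element (counted twice in the mean)
        total_deviation = 0.0
        for j_goe, panel, ref in zip(judge_goes, panel_goes, referee_goes):
            mean = (sum(panel) + 2 * ref) / (len(panel) + 2)
            total_deviation += abs(j_goe - mean)
        # "Outside the corridor" if the summed deviation exceeds the number of elements
        return total_deviation > len(judge_goes)

    # Hypothetical 3-element program, 5 sitting judges plus the referee
    panel = [[1, 0, 1, 1, 0], [2, 1, 1, 2, 1], [0, 0, -1, 0, 0]]
    referee = [1, 1, 0]
    print(goe_outside_corridor([-2, -1, -2], panel, referee))  # True: far below the consensus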

  8. #8
    Tripping on the Podium
    Join Date
    Apr 2010
    Posts
    55
    Sadly I can't understand well what you're talking about, as I'm a totally non-mathematical person.

    But do you know this Japanese site?
    http://fssim.sakura.ne.jp/

    For example, this is an analysis of OG Men's competition.
    http://fssim.sakura.ne.jp/200910/200...couverMen.html

    The owner of the site calculated the deviation of each judge's marks. (According to Mathman, the ISU no longer uses the standard deviation to evaluate its judges. Right? I'm such a fool at math.)

    I'm very satisfied with the final placement of the event, so basically I don't have any complaints about it.
    But looking into those scores, especially the FS scores, my unscientific brain can't help but wonder if the judges might have tried to adjust the result and succeeded.
    As you know, there were two judges who gave Plushenko very low marks (147.83/151.23) and two who gave him very high ones (180.03/179.03). Actually Evan had a low one (157.53) and a high one (180.43) as well. Probably those were a U.S. judge and a Russian one, and that's a rather ordinary thing to happen in competition. But how about two and two? Didn't they try to make it even?

    I know it's off topic, but may I say this here? Being a huge fan of Daisuke, the J5 for his FS annoys me a lot. He/she gave him only 141.88 in TSS and 62.38 in TES. (He/she gave -3 GOE on the 3Lz+2T. Did Daisuke fall on the jump??? He/she gave only +1 GOE on the fabulous Lv4 CiSt as well.) I can't find a lower TES than that until Florent Amodio's J5 and J9. I hope the J5 already got a yellow card from the ISU.

  9. #9
    leave no stone unturned seniorita's Avatar
    Join Date
    Jun 2008
    Posts
    5,579
    Quote Originally Posted by carignan View Post

    But do you know this Japanese site?
    http://fssim.sakura.ne.jp/

    For example, this is an analysis of OG Men's competition.
    http://fssim.sakura.ne.jp/200910/200...couverMen.html
    Thank you for the site. So a judge gave Plushenko 98.10, another gave Lysacek 97.75, and someone else gave Dai 102.15? All sound.
    Last edited by seniorita; 09-13-2010 at 04:17 AM.

  10. #10
    Tripping on the Podium
    Join Date
    Apr 2010
    Posts
    55
    Quote Originally Posted by seniorita
    So a judge gave Plushenko 98.10, another gave Lysacek 97.75, and someone else gave Dai 102.15? All sound.
    Yeah, 102.15! Hilarious! I think it's the same judge who gave 96.1 to Plushenko, 97.75 to Evan, and 102.15 to Daisuke. Even as a Daisuke fan, I find it just too much. For justice's sake, the judge's marks were omitted. I guess she was a Japanese judge. What a brave/patriotic woman.
    Last edited by carignan; 09-13-2010 at 05:29 AM.

  11. #11
    leave no stone unturned seniorita's Avatar
    Join Date
    Jun 2008
    Posts
    5,579
    Can I ask something? In each column, are those not the same judge's marks? I know two judges are out for the whole competition, but it seems like the marks are published in a random order? I thought it was anonymous but that each column still represented the same judge. Was I wrong?

    I'm looking at the sheet now; what are the red-marked numbers supposed to be?
    Last edited by seniorita; 09-13-2010 at 06:12 AM.

  12. #12
    Tripping on the Podium
    Join Date
    Apr 2010
    Posts
    55
    Quote Originally Posted by seniorita
    I thought it was anonymous but that each column still represented the same judge.
    No, the ISU now shuffles the judges. (I hate this shuffling!!) J1 for Plushenko and J1 for Evan aren't the same judge. We can't see who is who now.
    The owner of the site worked out by calculation which two judges were eliminated and shows them in red.

  13. #13
    leave no stone unturned seniorita's Avatar
    Join Date
    Jun 2008
    Posts
    5,579
    Quote Originally Posted by carignan View Post
    No, the ISU now shuffles the judges. (I hate this shuffling!!) J1 for Plushenko and J1 for Evan aren't the same judge. We can't see who is who now.
    The owner of the site worked out by calculation which two judges were eliminated and shows them in red.
    Thank you. I thought so at first, but then I take the high and low marks out and the average of the rest is not what it should be... I'm sure I'm doing something idiotic.

  14. #14
    Tripping on the Podium
    Join Date
    Apr 2010
    Posts
    55
    Quote Originally Posted by seniorita
    Thank you. I thought so at first, but then I take the high and low marks out and the average of the rest is not what it should be... I'm sure I'm doing something idiotic.
    I'm afraid I might be wrong... but I think the actual score is calculated element by element, so you can't get the same number if you work from each judge's total score.
    In this table, scoreA is the score in the ISU protocol. ScoreB is the average without the judge drawing (just remove the highest and the lowest and take the average of the rest).
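    A minimal Python sketch of that per-element idea; the marks are made up, and I'm assuming the simple "drop the highest and lowest, then average" rule described above rather than the official random draw of judges:

    def trimmed_panel_mean(scores):
        # Drop one highest and one lowest mark, then average the rest
        trimmed = sorted(scores)[1:-1]
        return sum(trimmed) / len(trimmed)

    def program_score(per_element_scores):
        # Sum the per-element trimmed means; this is why working from each
        # judge's total gives a different number than the published score
        return sum(trimmed_panel_mean(s) for s in per_element_scores)

    # Hypothetical panel marks (base value + GOE) for a 3-element program
    elements = [[8.5, 8.0, 9.0, 8.5, 8.0],
                [6.0, 6.5, 5.5, 6.0, 6.0],
                [3.0, 3.5, 3.0, 2.5, 3.0]]
    print(round(program_score(elements), 2))  # 17.33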

  15. #15
    leave no stone unturned seniorita's Avatar
    Join Date
    Jun 2008
    Posts
    5,579
    OK, never mind, I got it afterwards.
    There used to be a thread here explaining the CoP calculation very well, 1-2 years back; now I can't find it and I don't remember the title...

