
Thread: Statistics tutorial (2)

  1. #1
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,330

    Statistics tutorial (2)

    This thread is for anyone who wants to understand some of the measures that statisticians use to draw conclusions about the merits and demerits of different judging systems. This is easy. Don't be intimidated if math isn't your favorite subject. It is my job to make it your favorite subject, LOL.

    I will start with this question. Under the ordinal system, how can we decide whether the judges are in substantial agreement in their ordinal placements, or whether the agreement is so bad that we begin to suspect that something is fishy?

    I will use the data from the Ladies skate at the recent International Skating Challenge to show how to do this. Again, this is easy!

    Skater   RUS  GER  CAN  USA  JPN
    MK        1    1    2    1    2
    SA        4    2    1    2    1
    SC        2    3    4    5    3
    JR        5    4    3    4    4
    JK        3    5    5    3    5
    AP        7    6    6    6    6
    ES        6    7    7    7    7

    If this table looks all bunched together, here is a prettier one:

    http://www.usfsa.org/events_results/...nge/ladies.htm

    As we see by reading down the columns, the German judge was the only one who "got it right," in the sense of matching the majority in every placement. Let's see how far the Russian judge's rankings are from the German judge's.

    For each skater, take the difference (d) between the placements given by the two judges. Square each difference, then add up all the squares.

    RUS  GER  d  d-squared
     1    1   0      0
     4    2   2      4
     2    3   1      1
     5    4   1      1
     3    5   2      4
     7    6   1      1
     6    7   1      1
               Total: 12

    Now run this 12 through the following formula, where n is the number of skaters:

    r' = 1- [6(sum of the d-squareds)]/[(n-1)n(n+1)]

    r' = 1 - (6*12)/(6*7*8)

    r' = .79

    We say informally that there is a 79% correlation between the two rankings.
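
    (If you want to check the arithmetic by computer, here is a minimal sketch in plain Python. The function name spearman and the variable names are just mine, not anything official.)

        def spearman(a, b):
            # Spearman's rank correlation from two lists of placements
            n = len(a)
            d_squared = sum((x - y) ** 2 for x, y in zip(a, b))
            return 1 - 6 * d_squared / ((n - 1) * n * (n + 1))

        rus = [1, 4, 2, 5, 3, 7, 6]  # Russian judge, reading down the column
        ger = [1, 2, 3, 4, 5, 6, 7]  # German judge

        print(spearman(rus, ger))    # 0.7857..., about 79%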

    Note: This statistic r' is called the "rank correlation coefficient." It was invented by Charles Spearman (1863-1945), and its distribution was worked out by "Student" (the nom de plume of W. S. Gosset, 1876-1937). It is a surrogate for the more common correlation coefficient r, used for continuous data. Under the null hypothesis of no association, the quantity t = r'*squareroot((n-2)/(1-r'^2)) approximately follows Student's t distribution with n-2 degrees of freedom.

    So what?

    OK, forget that. Here is the main point. There is a 79% correlation between the rankings of the Russian judge and the German judge. So what? Well, the closer this statistic is to 100%, the more the two judges agree. A correlation of 0 means they weren't even watching the same event. So 79% is "not too good, not too bad."

    Now we must quantify what "not too good or bad" means. Let's suppose that whatever we want to say about these judges, we agree to hold our tongue and not say anything at all unless we can be 95% certain that we are right. In that case, there is a "critical value" that we must beat before we can say that there is a "statistically significant correlation" between the two judges. In our case, the critical value turns out to be C = .67.

    So, bottom line, if we get a correlation bigger than .67, that means that we can be 95% sure that there is at least some agreement between the judges, but a correlation of less than .67 means that we cannot be sure of anything, really.
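
    (Where does that .67 come from? Here is a sketch using the t-approximation from the Note above; it assumes you have Python with the scipy library, and the variable names are mine. Exact small-sample tables give a slightly higher cutoff, around .71 for n = 7, but the t-approximation gives .67.)

        import math
        from scipy.stats import t

        n = 7                      # number of skaters
        df = n - 2                 # degrees of freedom
        t_crit = t.ppf(0.95, df)   # one-tailed 95% point of Student's t, about 2.015

        # Invert t = r*squareroot((n-2)/(1-r^2)) to get the critical correlation:
        r_crit = t_crit / math.sqrt(t_crit ** 2 + df)

        print(round(r_crit, 2))    # 0.67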

    Here are the correlations of all the judges, paired two by two. Remember, if we beat .67 correlation, that's good.

    RUS vs. CAN: r' = .57 (Oops. We cannot be 95% sure that these two judges are even watching the same competition.)

    RUS vs. USA: r' = .71

    RUS vs. JPN: r' = .68

    GER vs. CAN: r' = .93 (That's good.)

    GER vs. USA: r' = .86

    GER vs. JPN: r' = .96 (Very good match.)

    CAN vs. USA: r' = .86

    CAN vs. JPN: r' = .96 (It looks like Germany, Canada and Japan are pretty much on the same page.)

    USA vs. JPN: r' = .82
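
    (If you would rather let the computer grind through all ten pairs at once, here is a self-contained sketch in plain Python; the dict is just the ordinals table from the top of this post.)

        from itertools import combinations

        judges = {
            "RUS": [1, 4, 2, 5, 3, 7, 6],
            "GER": [1, 2, 3, 4, 5, 6, 7],
            "CAN": [2, 1, 4, 3, 5, 6, 7],
            "USA": [1, 2, 5, 4, 3, 6, 7],
            "JPN": [2, 1, 3, 4, 5, 6, 7],
        }

        def spearman(a, b):
            n = len(a)
            d2 = sum((x - y) ** 2 for x, y in zip(a, b))
            return 1 - 6 * d2 / ((n - 1) * n * (n + 1))

        for a, b in combinations(judges, 2):
            r = spearman(judges[a], judges[b])
            note = "" if r > 0.67 else "   <-- below the .67 critical value"
            print(f"{a} vs. {b}: r' = {r:.2f}{note}")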

    So there you are.

    Home cooking?

    Wait, one more thing. In the cases where judges appear to disagree, where do these differences come from? Well...

    CAN matched the majority rankings for each skater, except that she (Casey Kelly) put Arakawa ahead of Kwan and Jennifer Robinson ahead of Cohen.

    JPN matched the majority in every ranking, except that he (Tomiko Yamada) put Shizuka Arakawa ahead of Kwan.

    GER had no horse in the race, and she (Sissy Krick) matched the majority without exception.

    RUS (Tatiana Danilenko) was the only judge to put Sokolova ahead of McDonough, and was also the only judge to put Cohen ahead of Arakawa.

    Hmm.

    Mathman
    Last edited by Mathman; 12-13-2003 at 04:17 PM.

  2. #2
    Custom Title
    Join Date
    Jul 2003
    Posts
    747

    Re: Statistics tutorial (2)

    Originally posted by Mathman
    Now run this 12 through the following formula, where n is the number of skaters:

    r' = 1- 6*(sum of the d-squareds)/(n-1)(n)(n+1)

    r' = 1 - 6*12/6*7*8

    r' = .79

    Tutorial 2, was there a tutorial 1?

    I am slow, but I try to muddle through. It helps me at least if you state the equation as

    r' = 1- [6(sum of the d-squareds)]/(n-1)(n)(n+1)

    that is, add an extra bracket, and leave out the *, because the * may be confused with "to the power of" instead of "times." I know you are using proper math symbols, but I am slow. I know technically you can state it as

    r' = 1- 6(sum of the d-squareds)/(n-1)(n)(n+1)

    and that is still correct.

    BTW, should it always be 1-6 or 1-n, and in this case n = 6?

    Don't be intimidated if math isn't your favorite subject. It is my job to make it your favorite subject, LOL.
    I am exceedingly intimidated, but now that I have muddled through it, what is the reward? A box of popcorn? Candies? Don't tell me self-satisfaction, that does not work for me.
    Last edited by rtureck; 12-13-2003 at 03:16 PM.

  3. #3
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,330
    Hi, RTureck. I edited my equation as you suggested. Thanks for the feedback.

    The 6 is always 6. It is the 6 in the formula for the sum of the squares of the first n integers

    1^2 + 2^2 + 3^2 + ... + n^2 = n(n+1)(2n+1)/6.
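
    Here is a quick way to see why that 6 makes the statistic behave: with two completely reversed rankings, the sum of the d-squareds works out to n(n^2 - 1)/3 = (n-1)n(n+1)/3, so the formula gives r' = 1 - 2 = -1, exactly the bottom of the scale (and identical rankings give r' = +1). A sketch in plain Python, if you want to check:

        for n in range(2, 10):
            ranks = list(range(1, n + 1))
            d2 = sum((x - y) ** 2 for x, y in zip(ranks, reversed(ranks)))
            assert d2 == n * (n * n - 1) // 3                # sum of d-squareds for reversed rankings
            print(n, 1 - 6 * d2 / ((n - 1) * n * (n + 1)))   # prints -1.0 every time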

    MM

  4. #4
    Custom Woman
    Join Date
    Aug 2003
    Location
    New York, NY
    Posts
    1,770
    Ah, Mathman, this brings back such fond memories. Seriously, you know I love this stuff, and you're right, it IS easy. Besides, in the end, even if the only thing one gets is the "Home Cookin'" section (great title), it makes me miss the COP already. I can understand Dick Button preferring the 6.0 system--it's the system he learned, competed to, and has commented on for going on 50 years--but when I see the stats for a competition like this using the 6.0 system, I find the COP more statistically accurate right now, and likely to gain even greater statistical accuracy once they fix a few things, than the 6.0 system ever could be. I know I'm slightly off topic, but then looking at and understanding statistics is, IMO, the most important way of evaluating which judging method most accurately rewards the best skating performances with the top placements. Thanks again, Master Mathman.
    Rgirl

  5. #5
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,330
    Originally posted by Rgirl
    "[W]hen I see the stats for a competition like this using the 6.0 system, I find the COP is more statistically accurate right now and has the probability of gaining even greater statistical accuracy once they fix a few things than the 6.0 system ever could." -- Rgirl
    We'll see, R. IMO the CoP has its statistical peccadilloes, too. If you want the CoP to look good, analyze the ordinal system. If you want the ordinal system to look good, analyze the CoP.

    My only beef is, if anyone wants to use statistics to criticize one system or the other, he or she has to do it by the numbers. Of all mathematical disciplines, statistics is the most driven by actual empirical data. For over a year now, long before we had a single datum to judge by, experts in statistics have been telling us what was going to happen under CoP judging.

    OK.

    We’ll see.

    Mathman

  6. #6
    Custom Title Joesitz's Avatar
    Join Date
    Jul 2003
    Location
    New York City
    Posts
    20,185
    Mathman -

    You certainly showed the national bias where applicable among the judges for the 6.0 system.

    I'd like to see you run through another competition using the comparisons of the judges in the CoP system. But then in the CoP, will we know who the judges are?

    Joe

  7. #7
    Custom Woman
    Join Date
    Aug 2003
    Location
    New York, NY
    Posts
    1,770
    Originally posted by Mathman
    We'll see, R. IMO the CoP has its statistical peccadilloes, too. If you want the CoP to look good, analyze the ordinal system. If you want the ordinal system to look good, analyze the CoP.

    My only beef is, if anyone wants to use statistics to criticize one system or the other, he or she has to do it by the numbers. Of all mathematical disciplines, statistics is the most driven by actual empirical data. For over a year now, long before we had a single datum to judge by, experts in statistics have been telling us what was going to happen under CoP judging.

    OK.

    We’ll see.

    Mathman
    Very true, Mathman, and a point I do try to keep in mind--wait till we've had at least two years of actual data. The ordinal system just always seemed to make it so easy for judges to justify almost any placement of a given skater, especially if the skaters near the top skated about the same in terms of jumps and falls. I think of the events where the ordinals ranged from 8 to 1 for a skater. It too often seemed as if the judges didn't even need to cheat, that they could give placements completely out of line with an accurate reflection of what the skater did on the ice, and could justify it, if they were ever asked to, by subjective generalities, i.e., the choreography was poor, there was no variation in speed, the skater wasn't musical, the jumps didn't seem secure (even if the skater landed everything clean), or the opposite if they gave a skater a high placement for a poor skate.

    At least with the COP, most of the error is on the page. With the ordinal system, aside from the statistical error, which was not acknowledged much less addressed, the infinite opportunities for error were all inside each judge's head. Not that there still isn't error inside the judges' heads with the COP; it's just that a whole lot more of the error is there in the Detailed Results to be analyzed by skaters, fans, judges, federations, the ISU, anybody.

    Just to pick one example, after the '94 Olympics, several of the judges were interviewed and said they did not see Oksana two-foot her landings on her 3L/2t combo and on another jump, I think either her 3L alone or her 3flip. Anyway, the point is they said they didn't see the two-foots and that if they had, they would not have given her the first-place ordinal. We never would have known about this particular error, at least as it was reported by these judges, had someone not shown them the video and pointed it out. It didn't change anything in terms of the results, but we can't learn from mistakes (which are not the same as error, which perhaps is something you might discuss) unless we know about them, and we can't adjust the system to try to minimize error unless we know about it. I know that's just one example, but I wonder how often things like this happened, as well as how often judges saw or did not see according to their national, cultural, personal, or skating-style biases.

    To me, at least with the COP there is an assigned value for each element for the Technical score, and required deductions if an element is missed or performed incorrectly. Certainly I've seen problems with various aspects of the current COP: with implementation, such as the caller being incorrect; with inconsistency in assigning levels for spins, spiral sequences, and footwork; and with various other things that have been discussed in detail elsewhere. But the fact that there are numbers for each element, as well as for the component scores, that can be analyzed is exactly what I see as one of the strongest arguments for the COP.

    When the ice dance team of Delobel and Schoenfelder fell on their lift near the end of their free dance at Trophee Lalique and not only did not receive any deduction but received additions to the base mark of +2 from seven of the 11 judges, at least you could see what seemed to me to be a clear glitch in either the judging or the system, in actual numbers. I don't want to get too detailed since this is not the thread in which to do so, but given that error is inherent in any statistical process, I like being able to see it in numbers rather than having it locked inside a judge's head, where even the judge may not be aware of it.

    I'm not saying there aren't benefits to the ordinal system, and I would never say the COP as it stands now does not have serious flaws. What I have had with the COP and the events I've seen using it thus far is a comfort level most of the time with the placements; that is, the order of placements, both in each phase of the competition and in the final, seems to accurately reflect the way the skaters performed. Also, in the times when the placements have seemed wrong, I've been able to see in the detailed results where the skaters either got additional credit when they should not have, should have received deductions and didn't, received what I thought were unfairly low or high marks in one or more components, or were off in any one of various other ways. The numbers were there to be seen and compared to what I saw the skaters do, and that's what I find to be the COP's biggest advantage over the ordinal system.

    ITA that if you want the COP to look good, analyze the ordinal system, and if you want the ordinal system to look good, analyze the COP. After all, what student of statistics doesn't have a copy of the never-out-of-print "How to Lie with Statistics" (and they don't mean in bed, although I'm sure some judges have done that too). But watching the Campbell's and the IFSC use the ordinal system, especially the former live, I see that each score has to encompass so many variables, not only in what the skater is doing on the ice but also in terms of skate order, and all those variables are only ever in the judges' minds. For all we know, under the ordinal system, some judges may mark a skater based only on jumps or because they hate the music of a certain composer. Those are extreme examples, but the point is, we don't know and can never know. Even if the judges say what variables they used to score a skater, we don't know if they did. And those scores are really just used as a way for judges to keep track of how they would place the skaters after they've seen them all skate, so the 5.8s and 5.9s for one judge might be 5.5s and 5.6s for another. I know they have guidelines, but again, we just don't know and can never know.

    Okay, I've already said more than I meant to say and said I would say and I don't want to take this thread in the wrong direction. So please, hit us wit more 'o dem numbers, baby! I got a good feelin' about 216 and 12,960,000--and I bet you know why
    Rgirl
    P.S. Re "For over a year now, long before we had a single datum to judge by, experts in statistics have been telling us what was going to happen under CoP judging." What are some of the things experts have been telling us was going to happen under COP judging? If this isn't the right thread, you can answer me under the "More Questions About the COP" thread. Just curious.
    Last edited by Rgirl; 12-15-2003 at 02:21 AM.

  8. #8
    Keeper of Michelle's Nose berthes ghost's Avatar
    Join Date
    Jul 2003
    Posts
    953
    Originally posted by Rgirl
    I can understand Dick Button preferring the 6.0 system--it's the system he learned, competed to, and has commented on for going on 50 years--
    Gosh, you talk as if the system used at this year's worlds is the exact same system used back in 1948. It isn't. The scoring system has changed many times over the years, and people like Dick have always kept abreast; it's their job.

  9. #9
    Custom Woman
    Join Date
    Aug 2003
    Location
    New York, NY
    Posts
    1,770
    Originally posted by berthes ghost
    Gosh, you talk as if the system used at this year's worlds is the exact same system used back in 1948. It isn't. The scoring system has changed many times over the years, and people like Dick have always kept abreast; it's their job.
    Very true, Berthes Ghost, that the scoring system has been tweaked many times along the way. I should have been clearer with my language, but it is still the 6.0 system and that's what Dick said he preferred. However, it was a quick, general remark so I myself wouldn't read too much into it. He may have meant a specific aspect of the ordinal system, like when they used to know who the judges were
    Rgirl
    Last edited by Rgirl; 12-15-2003 at 09:54 PM.

  10. #10
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,330
    I don't think that Dick Button is against the CoP particularly. He is against secret judging.

    Mathman

  11. #11
    Custom Title Joesitz's Avatar
    Join Date
    Jul 2003
    Location
    New York City
    Posts
    20,185
    Hooray for Dick. I'm against secret judging. I want to see those bias scores.

    Joe

  12. #12
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,330
    (What I wrote while waiting for my car to get through its 120,000 mile checkup this afternoon. -- If you're so smart, Mathman, why can't you afford a better car?)

    Statistics 101, part 3. Testing the CoP.

    The only thing we really ask of a judging system is that "the right person wins." If that is too vague a goal, then we would settle for some assurance at least of consistency in judging: if lots of judges scored the performances over and over, the results would be more or less the same most of the time. In using language like this we are tacitly assuming that the judges' scores somehow represent a sample drawn from the population of all the marks that might have been given by all possible well-qualified, impartial and honest judges, conscientiously following the guidelines of the CoP.

    Suppose the total scores look like this (for simplicity I will assume that there are only five judges, and I will set aside for the moment the effects of the random draw and the trimmed mean, as well as the cumulative effects of adding up many component scores, considerations of national chauvinism, etc.):

    Buttle: 180, 190, 185, 185, 185; average: 185
    Goebel: 185, 190, 180, 180, 185; average: 184

    Buttle wins, 185 to 184.

    But since this was a close contest, can we be confident that Buttle's skate really was better, and would have been certified so by the majority of all judging panels that might have scored this event?

    First, to investigate this question as serious scientists rather than as fans of one skater or another, we must agree not to go popping off before we know what we are talking about. To ensure this, let us stipulate the following:

    IF WE CANNOT BE 95% SURE, THEN WE WILL REMAIN SILENT

    Next, we pretend for the sake of the argument that we want the CoP to work. That is, we will test the hypothesis that Buttle's performance really was better. To do this, we set up a straw man, called the Null Hypothesis: The two performances were exactly the same. We don't believe this, however, and we want to show that the null hypothesis is wrong.

    Null Hypothesis: The two performances were exactly the same.
    Our Hypothesis: No, Buttle's was better and so the CoP worked.

    To test our hypothesis, we compute a statistic called the variance. This measures how spread out the individual scores are around their average. For Buttle the average was 185 and the individual scores are

    180 (off by 5), 190 (off by 5), 185 (off by 0), 185 (off by 0), 185 (off by 0).

    Take the sum of the squares of these differences and divide by n-1, where n is the sample size. That's the variance. For Buttle,

    v = (5x5 + 5x5 + 0x0 + 0x0 + 0x0)/4 = 12.5

    For Goebel,

    v = (1x1 + 6x6 + 4x4 + 4x4 + 1x1)/4 = 17.5

    (Remark: The standard deviation is the square root of the variance.)
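
    (Here is the same calculation as a sketch in plain Python, cross-checked against the standard library's statistics module, which uses the same n - 1 divisor. The list names are mine.)

        import statistics

        buttle = [180, 190, 185, 185, 185]
        goebel = [185, 190, 180, 180, 185]

        def variance(scores):
            mean = sum(scores) / len(scores)
            return sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)

        print(variance(buttle), statistics.variance(buttle))  # 12.5 12.5
        print(variance(goebel), statistics.variance(goebel))  # 17.5 17.5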

    Now we need a standard unit-free measure of the difference between Buttle's and Goebel's average scores (call them x1 and x2, with variances v1 and v2). The formula is

    T = (x1-x2)squareroot(n)/squareroot(v1 + v2)

    T = (185-184)squareroot(5)/squareroot(12.5 + 17.5)

    T = 0.4

    (In techspeak, Buttle's average score is 0.4 standard errors bigger than Goebel's.)

    But in order to be 95% sure of anything we must beat a certain "critical value." The exact number that we have to beat varies slightly with the sample size and other variables, but as a rule of thumb most of the critical values are around 2. So if this T score is bigger than 2, then we can be 95% confident that Buttle's performance really was better. If T is not bigger than 2, then we cannot be 95% sure, so we must remain silent.

    Conclusion: Our T score of 0.4 is not bigger than 2.0. Therefore we are not 95% sure (of anything).
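
    (And the T computation as a sketch, with scipy's two-sample t-test as a cross-check. The pooled test gives the same statistic here because the two samples are the same size; note that scipy reports a two-sided p-value.)

        import math
        from scipy.stats import ttest_ind

        buttle = [180, 190, 185, 185, 185]
        goebel = [185, 190, 180, 180, 185]
        n = len(buttle)

        def mean(xs):
            return sum(xs) / len(xs)

        def variance(xs):
            m = mean(xs)
            return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

        T = (mean(buttle) - mean(goebel)) * math.sqrt(n) / math.sqrt(variance(buttle) + variance(goebel))

        print(round(T, 2))                # 0.41, nowhere near the critical value of about 2
        print(ttest_ind(buttle, goebel))  # statistic is about 0.41 again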

    So, bottom line. We tried our hardest to say something good about the CoP, but we could not be sure.

    Mathman

    PS. Some pet peeves.

    (a) If you have taken Stat 101 in the last 20 years, you have probably used a textbook that made use of the "p-value" in hypothesis testing. The authors of these texts, IMO, are often somewhat lazy in explaining why it is cheating to let the sample data determine the standard of significance (via the p-value), rather than (correctly) setting the level of significance first and then taking the sample. I will explain more about this if there is any interest.

    (b) Most statistics texts say that we "accept" or "reject" the null hypothesis. This language is quite misleading. We never accept the null hypothesis -- it's just that we cannot be 95% sure that it's wrong.

    (c) In a one-tailed hypothesis test, some texts say "x1 is less than or equal to x2" instead of "x1 = x2" for the Null Hypothesis. While satisfactory English, IMO this creates confusion by hiding the mathematical assumptions on which the calculations actually rest.

    AND ONE MORE THING, LOL.

    (d) Here's a good way to tell if your statistics text is any good or not. If it refers to the "Chi square" distribution, it's crap. It should be "Chi squared." (Want some barbeque ribs?)
    Last edited by Mathman; 12-18-2003 at 06:52 PM.

  13. #13
    Custom Title
    Join Date
    Jul 2003
    Posts
    747
    Originally posted by Mathman

    T = (x1-x2)SQRT(n)/SQRT(v1 + v2)

    T = (185-184)SQRT(5)/SQRT(12.5 + 17.5)

    T = 0.4
    I am slow and this is so intimidating. So what is SQRT? Can you use √5 instead, or spell the whole thing out? Thanks. It took me a long time to figure that out. I had problems with 4th grade math, and you are talking about null hypothesis? But thanks, I am learning.


    So, bottom line. We tried our hardest to say something good about the CoP, but we could not be sure.
    Can you figure out Shizuka's CoP scores in Skate Canada?
    Last edited by rtureck; 12-18-2003 at 06:40 PM.

  14. #14
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,330
    OK on "squareroot."

    Yes, these calculations are based on the assumption of the Null Hypothesis. The logic is, we are against the Null Hypothesis, so we are giving it enough rope to hang itself. That is, we assume the Null Hypothesis to be true and then show that this assumption leads to a conclusion that is probably (with 95% probability) wrong. It is the probabilistic version of the reductio ad absurdum.

    I'll look at the Skate Canada scores. But I won't say anything unless I am 95% sure.

    MM
    Last edited by Mathman; 12-18-2003 at 06:59 PM.

  15. #15
    Custom Title Joesitz's Avatar
    Join Date
    Jul 2003
    Location
    New York City
    Posts
    20,185
    Originally posted by Mathman
    Next, we pretend for the sake of the argument that we want the CoP to work.

    That is like the Warren Commission, which set forth that the single-bullet theory was correct and collected evidence to support that theory. Any opposing evidence was discounted.

    Am I correct?

    Joe


