Re: Statistical error in judging
Mathman--I wrote this in the "Worlds" folder before I read your post here. I agree that we are talking about sampling error, but it is error nonetheless. Also, I do think there are valid arguments against the "random selection error is meaningless" position. I've left my original post in the Worlds folder, but I've reposted it here. I agree that we will have to disagree--not about whether this is sampling error, which it is, but about what it means. We've argued this before and I understand we will never agree, but in the interest of different points of view, I'll post this anyway.
Okay, Mathman, I'll put my head on the chopping block again.

I agree that secrecy is a large part of what's wrong with the present system of FS judging, but it is not the only thing wrong. As always, I see your point about there being only nine real judges whether they are chosen six seconds or six months before the event. OTOH, I also think a certain amount of this argument is about semantics and the difference between statistics and the meaning of statistics. The way I see it, there are 14 judges judging the event and all the scores of all 14 judges are shown to skaters and viewers. Dick and Peggy and my cat Pi are not trained as judges, are not brought to the event, and are not asked to officially evaluate the skaters. Also, the difference between selecting the judges six months vs. six seconds ahead of time is that with the latter system, any one of the 14 judges who has been asked to show up is a potential real judge. What any panel of judges is doing in any situation where you cannot evaluate who will win or lose based on a quantifiable measure like time, tasks completed (as in golf or basketball), height, etc. is serving as surrogate quantifiers; they are evaluating the athletes based on an agreed-upon set of criteria and are selected based on their (supposed) expertise. It's often said that many posters at GS know enough about figure skating to be judges, but as far as I know, none of us has been through the judges' training system, nor do we have experience in judging. You can't get rid of cheating and oddball points of view, which is why you try to build in statistical safety measures so that the proverbial true score, which should be the one that the majority of expert judges agree or come close to agreeing upon, is the one that gets assigned. The only way to determine how accurately the judges are assigning "true" scores is by looking at large numbers of judges and their scores of various skating performances over time.
From what I've read, that's what the statisticians for the ISU are trying to do and have been trying to do. While I agree that the secrecy in terms of who the judges are is one of the worst aspects of this judging system, I think it's only one of several. Even if we knew who all the judges were in the present random selection system, there would still be, IMO, a significant and unacceptable error rate. In any group of scores there is an error rate, whether it's from nine judges selected six months before the competition and whose names we all know or from the anonymous judges in the current system. The error rate is the degree to which the scores deviate from what the "true" score would be. The "true" score is never attainable in the real world, which is why you always have an error rate in athletic competitions or whenever one compares scores. I realize it is different in pure mathematics.
For me, as I've said before, if 5 out of 14 trained and officially assigned (I know, the computer "officially assigns" only nine judges and does so before the competition--that's really the heart of our argument) figure skating judges put Skater A in first place and 9 of those 14 judges put Skater A in second place, statistically speaking, Skater A's true finish, to the best we can determine, is second place. But if the computer selects the 5 judges who put Skater A in first and 4 of the judges who put Skater A in second for the final nine scores, thus putting Skater A in first place, then there is an error and a significant one. True, it is sampling error, but the point IMO is what does this error mean? To me it's no different than if we say we're going to let you have 14 trials at making a basket from the free-throw line, but before you even start, we are randomly going to count only nine of the 14 trials towards your final score. If you make 9 baskets and miss 5, but the computer had preselected only 4 of the trials where you made the basket and the 5 you missed, then that is an unacceptable error. It does not reflect the person's true ability at shooting baskets. We go further when we bring in a competitor, Shooter B. Shooter B misses 9 baskets and makes 5, but the random selection went in his favor. Thus Shooter B, with only 5 true baskets, wins over Shooter A, who has 9 true baskets. The number of baskets that counted toward the final decision is a significantly inaccurate representation of the true score (the number of baskets each shooter actually made). The way I see it, the judges are substitutes for baskets made or fastest times. Granted, they are a highly flawed substitute--kind of like working with a bad stopwatch--but it's the only system that sports like gymnastics, diving, and many others have.
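To put rough numbers on the two examples above, here is a minimal Python sketch. It rests on assumptions that are mine for illustration and not a description of the ISU's actual procedure: the nine counted judges (or trials) are drawn uniformly at random from the 14, a skater needs a simple majority of the counted panel to be placed first, and the counted trials are drawn independently for each shooter. The function name prob_counted is made up for this post.

```python
from math import comb


def prob_counted(total, hits, counted, k):
    """Chance that exactly k of the `hits` favorable items end up among
    `counted` items drawn at random, without replacement, from `total`."""
    misses_needed = counted - k
    if k < 0 or k > hits or misses_needed < 0 or misses_needed > total - hits:
        return 0.0
    return comb(hits, k) * comb(total - hits, misses_needed) / comb(total, counted)


# Judges: 5 of 14 put Skater A first, and 9 of the 14 are counted.
# Assumption: A is placed first only with a majority (5 or more) of the counted 9.
p_skater_a_first = sum(prob_counted(14, 5, 9, k) for k in range(5, 10))

# Free throws: Shooter A made 9 of 14, Shooter B made 5 of 14, 9 trials count.
# Assumption: the counted trials are picked independently for each shooter.
p_b_beats_a = sum(
    prob_counted(14, 9, 9, a) * prob_counted(14, 5, 9, b)
    for a in range(10)
    for b in range(10)
    if b > a
)

print(f"Skater A placed first by the counted panel: {p_skater_a_first:.4f}")
print(f"Shooter B outscores Shooter A: {p_b_beats_a:.4f}")
```

Under those assumptions, the 5-to-9 split flips in Skater A's favor in 126 of the 2002 possible panels, or roughly 6 percent of the time, and Shooter B outscores Shooter A roughly 0.4 percent of the time. These figures apply only to these particular splits; closer splits would flip more often.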
Okay, you say, but that's like saying practice counts. It's like saying that 5 of the 14 judges are only practice judges because 5 are never meant to count, only Speedy just doesn't say so. This is true. However, please indulge me and read on. I say there will still be, and always has been, an error rate even when the judges are not anonymous, are subject to significant punishment for cheating, and the random selection method is no longer in place. In any case, even if the anonymity is dropped, I still think the random selection system has an unacceptably high error built in to the system. I do not think it is just a red herring, although I also think it's only one of a mess of problems. I think skaters who should have won have always lost competitions, or gotten third instead of second or whatever, at least some of the time; it's just that with this system, ironically, Speedy has made that error rate official and statistical. Before the random selection system and anonymity, you could know who gave which skaters what scores, but as we saw in SLC and other competitions, judges could still collude to make sure a certain skater or team got or did not get a certain place. With the old system, you had to hope somebody would squeal on the other judges or else 'fess up as being part of it. Judges could easily sway a competition with perfectly acceptable and defensible scores. Now the computer tells all viewers the raw scores, and if there is a significant skew of the average scores, viewers and skaters for the first time can see what judges have previously done under the table or by inherent bias. We see, for example, that Skater A's average scores are 5.92 and Skater B's average scores are 5.85, yet Skater B wins. Something is fishy, we say, but we cannot say for sure because we can't see the ordinals, nor can we know who the judges are and who assigned which scores.
Before the random selection system, viewers could only say, "Something is fishy" based on what they thought of the skating vs. the judges' placements. Now Speedy has given us five raw scores to at least partially quantify our feelings of fishiness, if and when it happens. However, according to what I've read, the statisticians who are reviewing all this for the ISU have access to all the ordinals and scores of all 14 judges. Anyway, Speedy's random selection system just makes certain errors in judging easier to see and more quantifiable. Even when we knew who the judges were, there were still those who were completely out of whack with the majority, such as the judge at the Olympics who put Sarah Hughes in 10th place after her SP and 4th after her LP. The difference now is that at least the statisticians have a way to quantify when the skater with the most first place votes is not awarded first place--in other words, the statisticians can see when the final placement does not align with the majority of scores of all 14 judges, which should be the best indicator of the true score. We viewers can only suspect based on the raw scores. The nine randomly selected scores should reflect the same outcome as if they were selected from all 14 judges--or 50 official judges or the whole world of official judges. Whether it's nine precompetition randomly selected judges or 14 officially paneled judges whose scores may or may not count toward the final outcome, the judges are supposed to represent all expert figure skating judges in the world. Since we can't get them all, we take 7, 9, 14, 25, or x number of judges and say, "You represent all judges" and then use statistical methods to try to override bias, human error, and genuine difference of opinion.
It's this last point in particular--using statistics to override bias, human error, and difference of opinion--that I think is the basis of our difference of opinion on the meaning of the error rate in the random selection system. The scores of the five nonselected judges are supposed to serve as a way to evaluate whether the selected judges are being fair and accurate. So from my POV, their scores do, or should, have meaning, even if in this year it is only to show that the random selection process is unacceptably flawed. The five nonselected judges in a way should serve as a comparison panel, just as if a non-ISU organization selected nine expert FS judges for each competition in a study to determine if the ISU judges' scores correlated with those of non-ISU judges.
I can see a place for a panel of x number of judges where the high and low scores are thrown out or the scores that deviate most from the mean are thrown out, but I agree with the statisticians who feel there is an error rate in the random selection system and that it is unacceptably high. I also agree that anonymous judging and lack of accountability (which can still happen if we know who the judges are; it's been happening for decades) are increasing the error rate. We just don't know by how much because we can't measure something that's kept secret.
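For anyone who wants to see what "throwing out" scores looks like in practice, here is a small Python sketch of the two approaches: dropping the single highest and lowest score, and dropping the scores farthest from the mean. The function names and the nine scores are made up for illustration; they are not from any real competition, and this is not a description of how the ISU computes anything.

```python
def drop_high_and_low(scores):
    """Average the scores after removing one highest and one lowest."""
    if len(scores) < 3:
        raise ValueError("need at least three scores")
    kept = sorted(scores)[1:-1]
    return sum(kept) / len(kept)


def drop_farthest_from_mean(scores, n_drop):
    """Average the scores after removing the n_drop scores that deviate
    most from the mean of the full panel."""
    if not 0 <= n_drop < len(scores):
        raise ValueError("n_drop must leave at least one score")
    mean = sum(scores) / len(scores)
    by_closeness = sorted(scores, key=lambda s: abs(s - mean))
    kept = by_closeness[: len(scores) - n_drop]
    return sum(kept) / len(kept)


# Hypothetical panel of nine scores with one judge far out of line.
panel = [5.8, 5.9, 5.9, 5.7, 5.8, 4.9, 5.9, 5.8, 6.0]

print(f"Plain average:            {sum(panel) / len(panel):.3f}")
print(f"High and low dropped:     {drop_high_and_low(panel):.3f}")
print(f"Farthest-from-mean dropped: {drop_farthest_from_mean(panel, 1):.3f}")
```

The difference from random selection is that both rules decide which scores to discard by looking at the scores themselves, so an out-of-line score is the one that gets removed, instead of whichever scores the computer happened to leave out.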
We may have to agree to disagree on the point about the random selection and error rates, but I do agree that secrecy and lack of accountability are major parts of what is wrong with FS judging. I just don't think they are the only things wrong. I also think the random selection process is wrong, as are several other things about the current judging system. I don't think that just getting rid of the secrecy will fix things. There was no secrecy in FS judging for decades and that led to SLC, plus all the problem competitions before SLC. I think it will take a combination of knowing who the judges are; professional, paid judges; balanced panels in terms of nationality; accountability for scores; comparative judging panels (i.e., judging panels who score the same events and whose scores are then compared to those of the actual judges); strict and significant punishment for judges caught cheating; changes to the way competitions are set up (i.e., Q rounds and how skaters are assigned to them vs. something other than a Q round as it now stands; and how much weight certain parts of the competition and certain elements are given); and better statistical methods in determining which scores are used and how placements are determined in order to clean up judging in figure skating. I think there is a lot of trial and error and analysis yet to be done before we get a system that is acceptably fair to the athletes given the human involvement.
In summary: Sampling error, we agree. The meaning of this error, we disagree.
Rgirl