1.
Well, it's kind of the backward version of that. We assume the opposite of what we want. Then we try to collect as much evidence as we can to show that our assumption is wrong.

In this case (Buttle versus Goebel), we were not able to collect enough evidence one way or the other to come to a conclusion.
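The logic above can be sketched in a few lines of Python. The per-judge marks below are made up for illustration (they are not the actual Buttle or Goebel scores), and the critical value is the usual one-sided 5% cutoff from a t-table:

```python
import math
from statistics import mean, variance

# Made-up per-judge marks for two skaters -- purely illustrative
# numbers, NOT the actual Buttle/Goebel scores.
buttle = [5.8, 5.7, 5.9, 5.8, 5.6, 5.9, 5.7]
goebel = [5.7, 5.6, 5.8, 5.7, 5.6, 5.8, 5.6]

# Null hypothesis: the two population means are equal.
# Welch's two-sample t statistic measures how far the observed
# difference is from zero, in standard-error units.
se = math.sqrt(variance(buttle) / len(buttle) + variance(goebel) / len(goebel))
t = (mean(buttle) - mean(goebel)) / se

# One-sided 5% critical value for roughly 12 degrees of freedom.
CRITICAL = 1.782
print(f"t = {t:.2f}; 95% sure Buttle was better? {t > CRITICAL}")
```

With these numbers t comes out around 1.58, below the cutoff, so the evidence leans toward Buttle but we cannot be 95% sure -- the "not enough evidence either way" situation described above.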

Mathman

2.

## Hi Mathman...

LOL, just wanted to thank you for the grim reminder that I nearly flunked Statistics which was a required course for me back in the college days. :o So just like back then, I think I'll have a beer and let you do the math.

Seriously, this is very interesting and to the degree you're willing to provide us with analysis on various events, I'm all "eyes."

DG

3.
Re your most recent stat tutorial--APPLAUSE, APPLAUSE! BRAVO! Author! Author!

Your car should get a 120,000 mile check-up more often, lol. Seriously, your stat tutorials are all good.

Thanks for bringing up how statistics have changed over the last 20 years. I had my undergrad and grad courses in stats in '84-'85. After (not during) our "discussions" lol on stats last year, I emailed my graduate stat professor. Although you and I didn't get into p values and some of the things you mentioned here, we did get into some of them. Indeed, Professor Statman told me that things had changed in statistics and detailed many of the things you brought up here. However, Prof. Statman is still a True Score believer (sing that to the music of "Daydream Believer"), though at the time I emailed him the COP had not been used, only set out in the first Communications. Obviously he said he could not comment on it since he had not looked into it nor did he have the data to make an assessment.

As for Pet Peeve (b) "Most statistics texts say that we "accept" or "reject" the null hypothesis. This language is quite misleading. We never accept the null hypothesis -- it's just that we cannot be 95% sure that it's wrong." I learned "support" or "reject" the null hypothesis. Do you feel that the only proper language is "we can or cannot be 95% sure that the null is wrong" or would "support" or "reject" be generally satisfactory to you? However, I do feel that the "95% sure" way of stating it is the most accurate.

Also, although I think it was great to use an individual example for explaining whether we can be 95% sure that the null is wrong, do you think that, in order to evaluate the COP system overall, and the ordinal system as well, results from a large sample of events judged under both the COP and the ordinal system would give a more accurate indication of the statistical accuracy of each? In other words, whether COP or ordinal, an individual case of two skaters who competed against each other can give information about those skaters in those events, but can it give information about the accuracy of the system as a whole?

Great stuff, Mathman. BTW, I think your way of giving definitions in context is far better than my idea of just doing definitions separately. But then that's why you're Professor Mathman and Author Rgirl, lol.
Rgirl

4.
Hi, Doggygirl (no, not you, Rgirl, there really is a "Doggygirl" on the board now!), I see you're up to 21 posts already! Just wait till I finish turning out the CoP, you can go back to your statistics class and get an A. LOL.

(Aside -- Here's how you can tell whether your statistics teacher regards him/herself as a mathematician as opposed to a teacher or a person interested in applications. Mathematicians never say "stat" when they mean statistics, or "math" when they mean mathematics, LOL.)

RGirl, no, I don't like the language "these data 'support' the null hypothesis," for two reasons. First, it is not true. Even in a close contest, even in a contest that is "too close to call," like Buttle 185, Goebel 184, the data still "support," however tenuously, the belief that Buttle's performance was a tiny bit better than Goebel's.

Consider, for example, a presidential election. You have a pre-election poll that says 51% for G.W. Bush and 49% for H. Clinton, so you say that we are not sure that Bush is really ahead of Clinton, so it's "too close to call." But the null hypothesis says, e.g., that if 30,958,324 people vote for Bush, then exactly 30,958,324 people will vote for Clinton. If only 30,958,323 people voted for Clinton, then the null hypothesis is wrong and the alternative hypothesis (more people support Bush) is right. It is virtually impossible for the null hypothesis to be true no matter how close the sample proportions are. So I think it is not good language at all to say that the results of the poll, or of any poll, no matter how close, "support" the null hypothesis.
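The poll version of this is the standard one-sample test for a proportion. The sample size and the 51/49 split below are hypothetical round numbers, just to show why "too close to call" and "the data support the null" are different statements:

```python
import math

# Hypothetical poll: 1,000 respondents, 51% for candidate A.
# Null hypothesis: the true split is exactly 50-50 (p = 0.5).
n, p_hat, p0 = 1000, 0.51, 0.50

# One-sample z statistic for a proportion.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# 1.645 is the one-sided 5% critical value of the standard normal.
print(f"z = {z:.2f}; 95% sure A is really ahead? {z > 1.645}")
```

Here z is about 0.63, well short of 1.645: the sample still leans toward A, but we cannot be 95% sure A is really ahead. Nothing in that calculation "supports" the knife-edge claim that the electorate is split exactly 50-50.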

But there is a worse problem with this language. What question are we asking? When we give the answer, what is the subject of the sentence and what is the predicate?

The question is, Are we 95% sure that Buttle's performance really was better? The subject is "we," the predicate is "are sure" and the answer is either yes or no.

To me, this is so clean. If in fact our test shows that we can be only 94.99% sure, the answer is still no.

BTW, you can modify the null hypothesis to something like: The amount by which Buttle's "true score" exceeds Goebel's "true score" is 0.5 points. Then you can investigate whether the sample data allows you to be 95% confident that the difference is actually bigger than that.
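That modified null hypothesis is easy to test on per-judge differences. The differences below are invented for illustration; the point is only that the hypothesized value 0.5 replaces the usual 0:

```python
import math
from statistics import mean, stdev

# Hypothetical per-judge differences (first skater's mark minus
# the second's) -- invented numbers, for illustration only.
diffs = [0.7, 0.4, 0.6, 0.5, 0.8, 0.3, 0.6]

# Shifted null hypothesis: the true mean difference is 0.5 points.
# Alternative: the true mean difference exceeds 0.5 points.
H0_DIFF = 0.5
t = (mean(diffs) - H0_DIFF) / (stdev(diffs) / math.sqrt(len(diffs)))

# 1.943 is the one-sided 5% critical value for 6 degrees of freedom.
print(f"t = {t:.2f}; 95% sure the true gap exceeds {H0_DIFF}? {t > 1.943}")
```

With these numbers t is under 1, so even though the sample mean difference is above 0.5, we are nowhere near 95% confident that the true difference is.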

Prof. Statman might also ask why I did an "independent sample" test rather than a "paired sample" test for these data, since it is the same judges marking both skaters. I went back and forth on that point. Maybe I was wrong about that. (This is a question of choosing the best mathematical model in the context of the real world problem.)
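The difference between the two models is easy to see on toy data. In the made-up marks below, each of five judges scores both skaters; the judges disagree about overall severity, but every judge puts skater A slightly ahead. The paired test works on per-judge differences, so the severity variation cancels:

```python
import math
from statistics import mean, variance, stdev

# Hypothetical marks from five judges, each judge scoring BOTH skaters.
# Judges differ in severity, but each consistently marks A higher.
a = [5.9, 5.5, 5.8, 5.4, 5.7]
b = [5.8, 5.4, 5.6, 5.3, 5.6]

# Independent-sample (Welch) t: treats the two columns as unrelated.
t_ind = (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

# Paired t: per-judge differences, so judge-to-judge severity cancels.
d = [x - y for x, y in zip(a, b)]
t_paired = mean(d) / (stdev(d) / math.sqrt(len(d)))

print(f"independent t = {t_ind:.2f}, paired t = {t_paired:.2f}")
```

On this data the independent-sample t is below 1 while the paired t is around 6: the same marks, but the paired model detects the consistent per-judge gap that the independent model washes out in the judge-to-judge noise. Which model is right depends on whether you believe the judges' severities really carry over from one skater to the other.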

About the True Score. This is what is "true" about the true score: If your purpose in taking a sample in the first place is to estimate the mean of a population, then there is nothing wrong with calling the mean of the population the "true mean." I guess ... Oh, hell, yes there is. What's "true" about it? Why not call it what it is, the "population mean"? Why invoke TRUTH, JUSTICE AND THE AMERICAN WAY?
> Do you think that in order to evaluate the COP system overall, and the ordinal system as well, that results for a large sample in which the COP and ordinal systems were used would give a more accurate indication as to the statistical accuracy of each system? In other words, whether COP or ordinal, an individual case of two skaters who competed against each other can give information about those skaters in those events, but can it give information about the accuracy of the system as a whole?
What I really think is that all these mathematical questions are overwhelmed by the real problems in figure skating judging: politics, deal-making and national federations pushing their own agendas. I am doing this just for fun, but I don't think that it amounts to a hill of beans.

Not that this will prevent the publication of tons of statistical analyses of the CoP in the coming year. Unfortunately, for most of the scholarly analysis I have seen, all you have to do is read the name of the author and you know what the conclusion will be.

Mathman

PS. RGirl, you asked earlier about what I meant when I talked about articles published last year which made "predictions" about the CoP. Predictions isn't really the right word. What I meant was, people published papers which gave theoretical mathematical reasons why ordinal-based systems or point-total systems, the median or the mean, etc., would turn out to be the more "robust" and reliable in the context of figure skating judging.

These articles, in my opinion, were quite delightfully data-free.

5.
Thanks, Mathman. Not only do I agree with you on the language "Are we 95% certain? Yes or no?" but so does Prof. Statman (I gave him that name because I'm too lazy to write Prof. Statisticsman, MATHman, lol). That was one of the things he said had changed over the last 20 years.

His belief in the "true score" has more to do with timed tests such as the 100-meter dash, and as I said, the COP Communique hadn't been released yet when I emailed him.

As for me, although I agree 100% (okay, 99%) that the biggest problems facing figure skating judging are the judges themselves, I think that is true in any judged sport. But going too far in the other direction, as in "just make sure the judges aren't cheating and know what they're doing and pretty much any judging system will work about as well as another" (not that you said that, but I'm 95% certain you would), is just as bad as publishing articles with nonexistent data. I still think statistics are important in minimizing the effects of the inevitable cheating, bias, poor judging, mistakes, and everything else that goes into having humans as judges.

I agree that the ISU is using hocus-pocus statistics to try to make it seem like the COP is "cheat proof," but even with the best crackdown on cheating judges, no system is "human proof." That's why I still would like to see a statistical comparison between the ordinal system and the COP. You're right that the anti-COP group will manipulate the statistics to make the ordinal system look best and vice versa. However, since the politics seem to dictate that the COP is going to be ratified, I'd like it to be as statistically robust as possible. At the least we can try to make the COP as accurate as possible in rewarding the skaters who skate best according to the standards established for figure skating, since getting anywhere on cheating judges, federation influence, etc. is going to take many, many years even if every single member of the ISU were full-bore behind it.

I know, it's our same old argument. I say push for designing the best statistical system for judging we can while we wait for a change in the ISU's attitude toward cheating, etc.--which might never come--and you say, well, you can speak for yourself, of course. Anyway, the issue for me is finding the right balance between designing a judging system that statistically reduces the influence of cheating or bias and designing a system that finds and gets rid of the cheating and biased judges. They have to deal with this in diving, gymnastics, synchronized swimming, and other sports, but IMO the challenge with figure skating is its complexity relative to those other sports; that is, there are a lot of things to evaluate at once and many different ways to be "excellent," especially when it comes to the component elements and presentation.

BTW, if by some magic means, we woke up tomorrow and the ISU had a very tight, very harsh system of dealing with cheating and biased judges, do you have a preference for any particular kind of judging system (it doesn't have to be COP or ordinal, it could be something else too)? Or would you lose all interest in judging if there were no cheating or bias?
Rgirl
