As most people are aware, the CoP was thrown together rather hurriedly in a fairly blatant attempt by the ISU to salvage a reputation badly damaged by the Salt Lake City Olympic scandal.

According to the ISU, the CoP system was already being developed two years before SLC, but the implementation was expedited because of the SLC scandal. That's hardly the definition of "thrown together."
This seems particularly unfortunate because the main rationale for the introduction of the CoP has been that of providing a "better" and "more objective" scoring system than that obtaining for the older ordinal systems. While the present data do not allow any judgment of the relative merit of these two systems to be made, it is difficult to see how the present problems could represent an improvement over anything at all.

If the data don't allow any judgment of the relative merit of the two systems, why does he come to this conclusion, especially when he doesn't compare the problems of the CoP to the problems of 6.0/OBO? For example, he has already shown that the percentage of control the skaters have over their score is relatively consistent within program type and discipline, while under 6.0/OBO it was up to individual judges to decide, and there was no way to tell whether the determination was consistent across all skaters. But he doesn't explain why, net/net, 6.0 is at least equal to this system.
For example, in the Ladies competition at Skate America alone, one can find Shizuka Arakawa receiving scores ranging from -1 to +2 for her triple Lutz-double Toe combination, while Jenny Kirk gets three +1's, two -1's and one -2 for her triple Flip in the Short program, while the Free program shows four more instances of scores ranging from +1 to -2, or +2 to -1, in addition to nine instances of scores ranging from +1 to -1.

Under the CoP, the ISU can:

1. Figure out which judges are off the mark, because there is an exact, not relative, score for the elements.

2. Find out if the system itself is causing the issue -- e.g., the caller downgrades and the judges apply deductions to an underrotated jump, which the ISU has already said it's going to change -- which is a pretty instant feedback loop.

3. Develop better training, including specialized training. For example, if a judge doesn't seem to have lifts or spins figured out, but the rest of his/her judging is in line, train the judge on that element and follow improvement. Under 6.0 there was no way to track this behavior through data.

4. Decide if there's a better way to score -- specialized judges, checking the criteria for a score and adjusting scores automatically based on missed errors, etc.

5. Change the relative weights to get different expected results; e.g., ramp up both the upside and downside of landing/missing difficult jumps and combinations, or increase the point value of footwork in the Men's LP. Note that changes of relative weight do not affect the judges' training, but happen automatically, so there is not a spectrum of rules interpretation and application.
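Point 1 above can be sketched in a few lines. This is only a toy illustration of the idea, not any actual ISU methodology; the judge names, GOE marks, and the deviation threshold are all invented for the example.

```python
# Toy sketch: flag judges whose GOE mark for an element deviates sharply
# from the rest of the panel. Possible only because the CoP records an
# exact score per element per judge. All numbers below are invented.

def panel_mean_excluding(scores, judge):
    """Mean of the panel's GOE marks with one judge's mark left out."""
    others = [s for j, s in scores.items() if j != judge]
    return sum(others) / len(others)

def flag_outliers(scores, threshold=1.5):
    """Return judges whose mark differs from the rest of the panel
    by more than `threshold` GOE points, with the comparison values."""
    return {
        judge: (mark, panel_mean_excluding(scores, judge))
        for judge, mark in scores.items()
        if abs(mark - panel_mean_excluding(scores, judge)) > threshold
    }

# Hypothetical GOE marks from a five-judge panel for one element:
goe = {"J1": +1, "J2": +1, "J3": 0, "J4": +1, "J5": -2}
print(flag_outliers(goe))  # {'J5': (-2, 0.75)} -- J5 is well off the panel
```

Under 6.0, by contrast, only the composite ordinal survived, so this per-element comparison was impossible to run from the data.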
(The CoP) may possess some redeeming features, but at present, seems to avoid major embarrassment only by virtue of the fact that the judges appear to be ignoring it as best they can.

What Dr. Schaeffer is referring to is the tendency on the part of the judges to give blanket marks across the board, both for GOE and (especially) for Program Components. That is, in practice they are mostly subverting the intent of the CoP and judging as they always have: "I liked this skater's performance the best, so I'll give her the highest marks throughout; I liked this skater second best," etc. Dr. Schaeffer thinks this is a good thing, and that if the judges didn't do this it would be a disaster.
Mathman said: This question came home to roost in the men's competition in the Grand Prix Final. If the judges had been polled, most of them probably would have said that Plushenko gave the best skate overall and deserved to win the gold medal. But when the points were tallied, because Plushenko did not get any credit under the CoP for a third combination, Sandhu came out on top, frustrating the wishes and the intent of the judges. Presumably, this is the kind of thing that Dr. Schaeffer regards as a "major embarrassment" for the CoP.

Why is it a major embarrassment for the CoP for the scoring system to enforce the rules? It seems more of an embarrassment for the judges to "prefer" a skater who has broken them and give that skater extra credit for an unlevel playing field.
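The Plushenko/Sandhu situation comes down to simple arithmetic: under a points total, an element that violates a rule contributes zero, so a skater every judge "preferred" can still lose. The sketch below uses invented base values and GOEs purely to show the mechanism, not the actual protocol scores from the Grand Prix Final.

```python
# Toy illustration (invented numbers, not real protocol scores): under a
# points-total system, an element that breaks a rule -- e.g., a combination
# beyond the allowed number -- is struck from the total, which can flip
# the result relative to the judges' overall impression.

def total(elements):
    """Sum of (base value + GOE) over the elements that count."""
    return sum(base + goe for base, goe, counted in elements if counted)

# Skater A attempts the harder content, but the third combination
# (base 8.5, GOE +1.0) receives no credit because it exceeds the limit:
skater_a = [(8.5, 1.0, True), (7.5, 1.0, True), (8.5, 1.0, False)]
# Skater B's three lighter elements all count:
skater_b = [(7.0, 0.5, True), (7.0, 0.5, True), (6.5, 0.5, True)]

print(total(skater_a))  # 18.0
print(total(skater_b))  # 22.0 -- B wins on points despite lighter content
```

Whether one calls this outcome "enforcing the rules" or "frustrating the intent of the judges" is exactly the disagreement in the posts above.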
Mathman said:As expounded by scholars such as Dr. George Rossano and Dr. Sandra Loosemore, the whole concept of adding up points is inherently less reliable than methods based on ordinals.
Mathman
hockeyfan228 said: The ISU can:

1. Figure out which judges are off the mark, because there is an exact, not relative, score for the elements.

2. Find out if the system itself is causing the issue -- e.g., the caller downgrades and the judges apply deductions to an underrotated jump, which the ISU has already said it's going to change -- which is a pretty instant feedback loop.
Joesitz said: What you say, if it works, will assist in preventing such inequalities in later competitions. But what about NOW? If the ISU can figure out which judges are off the mark, will they take immediate action? This action, taken immediately, will assist the skater who is not in first place if the review is going in that direction. If they wait until May, I doubt very much they would admit to an inequality, and even if they did, they will not take back any medals already issued. Am I correct?

Joe

There are a couple of issues that could be affecting this:
giseledepkat said: I apologize for picking on you, Mathman, but the above quote is unclear to me. Do you mean to say that:

Rossano and Loosemore believe strongly that "the whole concept of adding up points is inherently less reliable than methods based on ordinals", and have therefore devoted their energies to advancing said beliefs

or...

The statement "the whole concept of adding up points is inherently less reliable than methods based on ordinals" is not up for debate in the mathematical community at large.

I guess what I'm really asking for is your opinion, Mathman! 'Cuz I respect you so much! Oh, and while I'm at it -- [OT] Did you ever finish the statistical analysis of Michelle's jump percentages that you talked about doing a while ago? (Maybe I missed it!) [/OT]

Thanks, Pennie

Hey, Gisele. Just a quicky for now. I guess I meant the first. The first is certainly true.
Ptichka said: OK, now I am a little confused. Say a judge is marking a spiral. Let's take the change-of-edge requirement for levels 2 and 3. For level 2 it is: spirals curve mainly in one direction but at least one uses the opposite curve, with different edge combinations. For level 3 it is: spirals curve equally in both directions with unassisted changes of edge. Let's ignore all other spiral requirements for a second, and pretend the whole GOE was based only on this. Now, if a judge is marking a level 2 spiral, he/she sees a change of edge and gives a positive GOE. However, if that same spiral is judged as level 3, they may say, well, sure, she did a change of edge, but it's nowhere near equal in both directions; this may result in a negative GOE.

c. They have been making changes to the system all along. When the judges said that seeing the levels affected them psychologically -- i.e., they gave better GOEs to higher-level elements -- they asked to have the levels suppressed and only the elements displayed. This was implemented right away. They may have enough data now to at least suggest whether the same phenomenon happens with jumps -- i.e., whether 3As and quads get different deductions for the same problems in the jump.
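The interaction Ptichka describes can be made concrete with a tiny conditional. This is a deliberately oversimplified sketch -- the real GOE criteria involve many features, and the rule text and values here are reduced to the single change-of-edge feature for illustration.

```python
# Toy sketch (invented, simplified rules, not the actual ISU tables):
# the very same spiral can earn a positive GOE if called level 2 but a
# negative GOE if called level 3, because the judge evaluates the change
# of edge against the standard of the called level.

def spiral_goe(has_edge_change, curves_equal_both_ways, called_level):
    """Very simplified GOE based only on the change-of-edge feature."""
    if called_level == 2:
        # Level 2 standard: any opposite-curve edge change is a plus.
        return +1 if has_edge_change else 0
    if called_level == 3:
        # Level 3 standard: the change must be equal in both directions.
        return +1 if curves_equal_both_ways else -1
    return 0

# One and the same spiral: edge change present, but curves unequal.
print(spiral_goe(True, False, called_level=2))  # +1
print(spiral_goe(True, False, called_level=3))  # -1
```

This is why suppressing the level display only partly decouples the caller from the judges: the GOE standard itself can still depend on the level that was called.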
Ptichka said: OK, now I am a little confused. Say a judge is marking a spiral. Let's take the change-of-edge requirement for levels 2 and 3. For level 2 it is: spirals curve mainly in one direction but at least one uses the opposite curve, with different edge combinations. For level 3 it is: spirals curve equally in both directions with unassisted changes of edge. Let's ignore all other spiral requirements for a second, and pretend the whole GOE was based only on this. Now, if a judge is marking a level 2 spiral, he/she sees a change of edge and gives a positive GOE. However, if that same spiral is judged as level 3, they may say, well, sure, she did a change of edge, but it's nowhere near equal in both directions; this may result in a negative GOE.

Good question. This is another potential overlap between caller and judge. When the GOEs were first written, the Levels were displayed. Either
Ptichka said: Let's take the change-of-edge requirement for levels 2 and 3. For level 2 it is: spirals curve mainly in one direction but at least one uses the opposite curve, with different edge combinations. For level 3 it is: spirals curve equally in both directions with unassisted changes of edge.