
ISU Judges Assessments

Mathman

Joined
Jun 21, 2003
Doggygirl said:
For whatever it's worth....
Hoo, boy -- my work's cut out for me now! :laugh:

I'll go to work on #1275 right away. About the first announcement, a couple of things are worth noting.

1. There were 42 "assessments" (warnings) issued to judges in men's, ladies, and pairs combined, compared to 56 for dance alone. Presumably this reflects the fact that dance judging requires more subjectivity in interpreting the rules than the other disciplines do.

2. The ISU notes that by giving out "assessments" in a timely fashion during the whole season, instead of waiting until the end, they were able to make corrections (or maybe just put the judges on their guard), so that the number of questionable judging calls decreased during the course of the season.

3. Here finally is a reason in favor of secret judging. The evaluators did not know the identities of the judges that they were investigating. This cuts down on the possibility of the evaluation committee members letting their friends off the hook. (I wonder if that is really true -- that the evaluators did not know who the subjects of the evaluation were.)
 

Doggygirl

Record Breaker
Joined
Dec 18, 2003
Mathman said:
Hoo, boy -- my work's cut out for me now! :laugh:

3. Here finally is a reason in favor of secret judging. The evaluators did not know the identities of the judges that they were investigating. This cuts down on the possibility of the evaluation committee members letting their friends off the hook. (I wonder if that is really true -- that the evaluators did not know who the subjects of the evaluation were.)

LOL MM. Like you, my initial reaction to this was good! That's actually a good use of secrecy! But like you, my skeptical side relative to Speedy entered my thinking very quickly.

Thanks for tackling this stuff!!!

DG
 
Mathman

Joined
Jun 21, 2003
Thanks for posting this document, Doggygirl.

As for the mathematics, that is straightforward. For GOEs, a flag is raised if a judge's total GOEs for a skater is off from the average of the panel by more than the number of elements.

For instance, a short program has 8 elements for which GOE is given. So you can be off from the majority judgment by a total of 8 points before you are in jeopardy of getting an "assessment" against you. This is quite lenient to the judges. You could give your favorite skater a +1 GOE on every element, when everyone else says 0, and still get in under the wire.
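
If it helps to see the arithmetic in code, here is a rough sketch. The function name and the use of per-element panel averages are my own guesses; the document only gives the threshold (total deviation greater than the number of elements).

```python
# A rough sketch of the GOE corridor check as I read it -- the function
# name and the per-element panel averages are my own assumptions; the
# document only specifies the threshold.

def goe_flagged(judge_goes, panel_avg_goes):
    """True if the judge's total GOE is off from the panel average
    by more than the number of elements."""
    deviation = abs(sum(judge_goes) - sum(panel_avg_goes))
    return deviation > len(judge_goes)

# +1 on every element of an 8-element short program, against a panel
# average of 0: off by exactly 8, still in under the wire.
print(goe_flagged([1] * 8, [0] * 8))        # False -- not flagged
print(goe_flagged([2] + [1] * 7, [0] * 8))  # True -- off by 9, flagged
```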

For program components the judges get even more leeway. In the five program components taken together you can be off by a total of 7.5. So, for instance, take the judge who mistakenly entered 0.25 instead of 6.25. If 6.25 was the average of the panel, then this judge is off by "only" six points, and is still OK if all of the other marks are in line.
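
The same idea for the components, again as a guess at the mechanics. Only the 7.5 figure comes from the document; summing the per-component deviations is my assumption.

```python
# Hypothetical PCS corridor check. Only the 7.5-point allowance comes
# from the document; summing per-component deviations is my guess.

def pcs_flagged(judge_scores, panel_avg_scores, allowance=7.5):
    deviation = sum(abs(j - p) for j, p in zip(judge_scores, panel_avg_scores))
    return deviation > allowance

# The 0.25-instead-of-6.25 typo: off by 6.0 on one component, in line on
# the other four -- still inside the 7.5 corridor.
judge = [0.25, 6.25, 6.25, 6.25, 6.25]
panel = [6.25] * 5
print(pcs_flagged(judge, panel))  # False -- not flagged
```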

If you get 4 assessments against you, you can be demoted to lower-level contests or suspended from judging assignments.

Here are a couple of other points.

1. The purpose of the evaluation is to spot instances of "errors" and also possible "national bias." There do not seem to be any mathematical tests in place specifically to investigate possible collusion between two or more judges.

2. An "assessment" is considered a “performance evaluation.” It is not a "sanction" and does not carry the risk of being punished by the ISU, as for instance supporting the WSF or publicly criticizing improper actions by ISU officials does.

3. Evaluations are done pretty much on the spot (later via videotapes for junior events). If the evaluators disagree with the majority of the panel, then the judges' "corridor" is expanded appropriately. In other words, if the average of the judging panel gave a 6.25 in a particular program component score, but the evaluators thought it should have been 7.25, then a judge giving a score of 7.50 is regarded as being off by only .25 instead of 1.25 (see the sketch after this list). Evaluators are discouraged from attending practices, in order that their judgment of the actual performance not be compromised.

4. Judges have a right to defend their scores, even if they are very different from the majority. They can bring video tapes, etc., to a hearing to support their marks.

5. There are separate procedures for evaluating the performances of tech controllers.
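
Point 3 is easiest to see in numbers. Here is a sketch, assuming (my reading, not spelled out in the document) that a judge's deviation is measured as the distance to the interval between the panel average and the evaluators' score:

```python
# Point 3's expanded corridor in numbers. My assumption: the judge's
# deviation is the distance to the interval spanned by the panel average
# and the evaluators' score (zero if the mark falls inside it).

def corridor_deviation(judge_score, panel_avg, evaluator_score):
    low = min(panel_avg, evaluator_score)
    high = max(panel_avg, evaluator_score)
    if low <= judge_score <= high:
        return 0.0
    return min(abs(judge_score - low), abs(judge_score - high))

# Panel average 6.25, evaluators say 7.25, judge gives 7.50:
# off by only 0.25 instead of 1.25.
print(corridor_deviation(7.50, 6.25, 7.25))  # 0.25
```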

I like it. Some people will think they need to tighten it up a little, but you have to let the judges judge.

MM
 

Doggygirl

Record Breaker
Joined
Dec 18, 2003
Interesting assessment MM!

Thanks for putting your brain power to this topic today. I do believe the judges need to be free to judge. Otherwise, we're right back where we started.

I do find the 7.5 point leeway on PCS interesting. As an example, I looked at the scoresheets from Jr. Worlds Ladies. The highest PCS score given was 7.5. So a judge would have to give a 0 to hit the radar. While I wouldn't want to see a leash so tight that someone other than the judges ends up doing the "judging," I do think that path is a bit wide.

I almost think I'm OK with the range for the TES side. Back to my junior example, I happened to watch my download of Mao Asada's LP from Jr. Worlds today. Her 3A counted, but her GOE ranged from -2 to +2, which seems a pretty big spread. But the elements (especially jumps) happen so fast that it's easy to imagine that one judge notices something for better or worse that another judge might not -- hence a good reason for a panel with numerous judges.

If the vast majority of judges out there are truly trying to do their jobs to the best of their ability (rather than play politics) then I really think the change to COP...oops NJS can have a positive outcome.

It's interesting that they make national bias a "point" of the evaluation process, but don't indicate a mathematical method of accomplishing that.

DG
 

gkelly

Record Breaker
Joined
Jul 26, 2003
Doggygirl said:
I do find the 7.5 point leeway on PCS interesting. As an example, I looked at the scoresheets from Jr. Worlds Ladies. The highest PCS score given was 7.5. So a judge would have to give a 0 to hit the radar.

But isn't that 7.5 across all five components? So if one skater is averaging 6.5 for her components, let's say, one judge could give her an average of 5.0 and still be within the corridor; but with one more score in the 4s (or an average of 8.0 plus one more score in the 8s), they'd "hit the radar."
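
Checking that arithmetic with a quick calculation (illustrative numbers only, not from the ISU document):

```python
# Illustrative numbers only: five components, panel averaging 6.5,
# one judge averaging 5.0.
panel_total = 5 * 6.5   # 32.5
judge_total = 5 * 5.0   # 25.0
# The deviation is exactly 7.5 -- right at the edge of the corridor, so
# one more low mark on any single component would "hit the radar."
print(panel_total - judge_total)  # 7.5
```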

That is still a fairly wide leeway, but it does mean that a judge who is using a lower scale than the rest of the panel can't afford to also undermark a particular skater on purpose, or overmark one if they're using a high scale, without risking getting caught.

Do they still have a way of calibrating the range a la the "median mark" in the old system? If the scores are being announced, then the judges should be able to see where they fit in the range after the first skater. If not, someone might find that they were just on the wrong page all the way through for the general range of scores. But if they're out of the corridor in the same direction on all the skaters and not also favoring or punishing any in particular, they should be able to explain what the problem was if the assessors can't figure it out for themselves.
 

mpal2

Final Flight
Joined
Jul 27, 2003
Mathman said:
2. The ISU notes that by giving out "assessments" in a timely fashion during the whole season, instead of waiting until the end, they were able to make corrections (or maybe just put the judges on their guard), so that the number of questionable judging calls decreased during the course of the season.

Wouldn't this be an obvious result? IIRC, they were handing out assessments at the end of the season the 1st year, which was completely stupid if you ask me. :sheesh: You could argue that they were giving judges time to get used to a new system. But if the point was to reduce errors, that just increases the need for immediate assessment. It doesn't make sense to let people keep making the same mistakes and getting into bad habits when they are learning something new.
 