ISU experiment: dividing tasks among judges?? | Page 2 | Golden Skate


Alba

Record Breaker
Joined
Feb 26, 2014
If they had six "technical judges" they wouldn't need technical specialists at all. Each of the six judges could separately judge the level, under-rotations, wrong edge, etc., and come up with a total score for each element.


So at least three different opinions, if not more, whenever the level is 3 or 4. :slink:
With the tech specialist, at least we have only one person (three at most) to blame, and we know who they are. :biggrin:
 
Joined
Jun 21, 2003
So at least three different opinions, if not more, whenever the level is 3 or 4. :slink:

To me, that is the nature of a judged sport. If three expert observers think that the skater did enough to earn a level four and three other equally expert observers disagree, what's wrong with averaging those scores?

With the tech specialist, at least we have only one person (three at most) to blame, and we know who they are. :biggrin:

Do away with anonymous judging. :yes:
 

louisa05

Final Flight
Joined
Dec 3, 2011
The idea of dividing tasks is good (one person can't give GOEs and at the same time evaluate 5 different PCS categories accurately), but why do we need all these complications, with some judges doing more things than the others? :rolleye: Just make one GOE panel and one PCS panel! (But giving only GOE scores would be extremely boring, yes :slink: )

I have thought for a long time that a split was the best answer particularly for fairly judging PCS. PCS stays so static through the season regardless of a skater's actual performance and I have always assumed it is because the judges are not able to adequately evaluate it. How does one get a sense of choreography or connection to the music when so busy marking GOE on each element? They have to kind of make do on some of those marks and I think they end up just making assumptions based on knowledge of the skater and his/her previous performances--so the system builds in a reputation bias.
 

Alba

Record Breaker
Joined
Feb 26, 2014
To me, that is the nature of a judged sport. If three expert observers think that the skater did enough to earn a level four and three other equally expert observers disagree, what's wrong with averaging those scores?

Nothing wrong with it; that's the nature of a judged sport. But that's also how it works now with the GOEs and PCS, and we still complain about the judges and blame them.
 

gkelly

Record Breaker
Joined
Jul 26, 2003
To me, that is the nature of a judged sport. If three expert observers think that the skater did enough to earn a level four and three other equally expert observers disagree, what's wrong with averaging those scores?

Some questions/possible issues:

Would these technical judges both call a level (or underrotation/downgrade) and also give a separate GOE?

It's easy to average GOEs, and it's possible to average levels.

But since underrotations and downgrades -- and as of this year edge calls -- change the base value of the element and possibly the value of the GOEs, it wouldn't work well to average the base values and average the GOEs separately and add/subtract the two, as is done now. It would make more sense to take the base value +/- GOE for the element to get a total element score from each judge and then average those totals.
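A small numeric sketch may make the difference concrete. All numbers below are hypothetical (real base values and GOE point steps come from the ISU Scale of Values and depend on the element); the point is that once judges disagree on the base call, per-judge totals are well defined while separate averaging is not:

```python
# Hypothetical values for a 3Lz; real ones come from the ISU Scale of Values.
CLEAN = {"base": 6.0, "goe_step": 1.0}   # jump called clean
UNDER = {"base": 4.2, "goe_step": 0.7}   # jump called underrotated (<)

# Three judges call the same jump independently: (call, GOE grade -3..+3)
calls = [(CLEAN, +1), (CLEAN, -1), (UNDER, -2)]

# Approach A: compute each judge's complete element score, then average.
per_judge = [c["base"] + grade * c["goe_step"] for c, grade in calls]
score_a = sum(per_judge) / len(per_judge)            # (7.0 + 5.0 + 2.8) / 3

# Approach B: average bases and GOE grades separately, then convert the
# averaged grade to points -- but which GOE step applies? Once the judges
# disagree on the base call, there is no single right choice.
avg_base = sum(c["base"] for c, _ in calls) / len(calls)
avg_grade = sum(g for _, g in calls) / len(calls)
score_b = avg_base + avg_grade * CLEAN["goe_step"]   # arbitrary: clean step

print(round(score_a, 3), round(score_b, 3))          # 4.933 vs 4.733
```

The two approaches diverge precisely because the underrotation call changes both the base value and the value of each GOE grade, which is gkelly's point.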

So what would be shown on the protocol? E.g., just 3Lz or CCoSP in the element column, and then << e -3 or level 4 +1 along with or without the final element score in each judge's column? That would require wider columns.

Or just the final element score for each judge without publishing how they each called the element or rewarded/penalized in GOE?

What if a judge makes a mistake in identifying the basics of the element, e.g., calling 2A instead of 3A, or 3T instead of 3F, or CSp instead of FCSp, etc.? Would that judge's score for that element be based on the mistaken identification, or would there be some mechanism to alert the judge that they probably made a mistake?

What if the skater makes a mistake such that it's unclear exactly what they did/what they should get credit for. E.g., a hop on the landing of a jump that turns into a single loop (no longer scored for juniors and seniors, but could affect whether that jump is considered a jump combination). Or off-balance steps between two jumps: Should it be called the first jump +SEQ and no value for the second jump? Or as two separate jumping passes, which would give credit for the second jump here but could lead to too many jump elements and no credit for the last jump pass in the program? Same with a wide step/recenter between two feet of what was supposed to be a change-foot spin.

In the current system with a technical panel working as a team, for those cases the tech panel can review the element after the program and determine the most appropriate call, and then alert the judges, who will award accordingly.

If each judge makes an independent determination of what the skater actually did, different judges might have significantly different lists of elements, with different numbers of elements. How would the computer or an as yet unidentified official determine that elements 7 and 8 should be considered 2A+SEQ (with the 2T after a step out ignored) and CSp, not 2A and 2T?

If there's no separate technical panel, who would keep track of the number of falls? The element judges? The PCS judges? Both? (What if they disagree?) Or would that responsibility go to the referee? Or get rid of the fall deduction and charge the PCS judges with reflecting falls in their PCS?

Would each judge be responsible for keeping written notes on level features, edge/rotation calls, and errors/+GOE bullet points as well as inputting both element codes and GOEs into the computer in real time? That's going to take their eyes off the skaters for several seconds between elements, which could be a big problem if the next element follows immediately. At least now the tech panel gets a data entry operator to do the data entry for them.

I think each judge would end up missing some of the feature and GOE details they're supposed to be responsible for. Hope that the rest of the panel will catch what each one misses and it'll all average out in the end?

This will be even more true in pairs and dance, where each judge with a single set of eyes needs to keep track of both "what?" and "how well?" details of what two skaters are doing. How often will they be accurate about catching different errors from each partner -- which may affect the element code/base value -- on side-by-side elements?

These technical judges would have little attention to spare for the "Element matches the musical structure" bullet point, so that would rarely be awarded in GOE -- but PCS judges could pay attention and reward it under Choreography and/or Interpretation.
 
Joined
Jun 21, 2003
I think it would work fine if each of the six technical judges decided for himself what element the skater did, what level to assign, what errors should be deducted for, and what positive GOE features should be rewarded. Judges could have access to replay if they were in doubt about whether the skater did a double or a triple Axel, whether the flutz was bad enough to deserve a deduction, etc. Put all these scores together for each judge, then average the totals.

I agree that there is a problem with displaying the results on the protocols, for instance if one judge scored a combination and another scored two separate elements. Maybe the judges should be allowed to confer on the name of the element, in the same manner that the three-person tech panel does now. Fall deductions, costume deductions, music deductions I suppose could be handled by the referee -- but I am not sure we need a separate fall deduction anyway. A fall would negatively impact GOE (and possibly even the name of the element), in addition to detracting from presentation, choreography, and interpretation.

One judge could make a bad mistake, but the tech panel can, too. Mistakes by the tech panel affect the marks of all judges, whereas a mistake by a single judge would tend to get averaged out. In the case of a borderline downgrade, for instance, the either-or choice of the technical panel can wipe out the whole element, whereas if three judges thought the jump was an under-rotated triple and three thought it was an over-rotated double, both opinions would be represented in the average. There could also be a trimming of the mean to mitigate the effect of bad errors in judgement.
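The trimmed mean Mathman mentions is easy to sketch. The scores below are hypothetical, but they show how dropping the highest and lowest marks (the kind of safeguard IJS-style panels use) keeps one wild call from dragging the element score:

```python
def trimmed_mean(scores, trim=1):
    """Average the scores after dropping the `trim` highest and lowest."""
    s = sorted(scores)
    kept = s[trim:len(s) - trim]
    return sum(kept) / len(kept)

# Six hypothetical element scores; one judge badly miscalled the jump.
element_scores = [7.0, 6.8, 7.2, 6.9, 7.1, 2.8]

plain = sum(element_scores) / len(element_scores)
robust = trimmed_mean(element_scores)
print(round(plain, 2), round(robust, 2))   # 6.3 vs 6.95
```

The plain mean is pulled well below the panel's consensus by the single outlier, while the trimmed mean stays close to what five of the six judges saw.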

I guess my instinct is that if the proposal is to dedicate six of the twelve judges to the examination of each element in isolation, well, that's the job of the technical panel, too. Do we need to put nine people on this task?
 

gkelly

Record Breaker
Joined
Jul 26, 2003
Essentially there are three primary tasks in scoring a program:

*Identifying each of the elements according to the rules in place

*Evaluating how well each element was performed

*Evaluating the overall performance as a whole in terms of several broad areas

Under 6.0 judging, each judge did all of the above and was free to consider as many or as few details as they liked, with no report of what they actually did and did not consider.

IJS has added more officials and broken up the tasks in one way.
This Nebelhorn experiment proposes to keep the same tasks and the same assignments (plus several new rules) for the tech panel, but to vary how many of the judging tasks are assigned to each of the judges.

Some posters in this thread have proposed other ways of dividing the tasks among officials. Still other methods are also possible. We could discuss how various alternatives might work -- how much to keep the assumptions that have been part of IJS since the beginning, and how much to start from scratch (something the ISU has shown no evidence of doing).
 

karne

in Emergency Backup Mode
Record Breaker
Joined
Jan 1, 2013
Country
Australia
Interesting concept, though I'm more concerned about how it would trickle down to club comps.


Though I imagine people will only be happy with it until one day there's a Russian judge on the GOE panel and a Russian skater beats their favourite to win gold - then it'll be all CORRUPTION! EVILLE! THIS IDEA WAS OBVIOUSLY CREATED BY RUSSIA TO CONTINUE THEIR EVILLE!
 

Sandpiper

Record Breaker
Joined
Apr 16, 2014
My first thought: What an unnecessarily complicated mess. I don't mind dividing duties, but why do it in such a confusing, haphazard fashion?
 
Joined
Jun 21, 2003
My first thought: What an unnecessarily complicated mess. I don't mind dividing duties, but why do it in such a confusing, haphazard fashion?

I don't know about haphazard -- there is actually a method in their madness.

But madness it is none the less. :) There has to come a point at which someone says to the ISU, "hold, enough."

Here is the proposal for the tasks of the twelve judges, which will be tested at Nebelhorn.

Judge #1 SS TR PE
Judge #2 TR PE CH
Judge #3 PE CH INT
Judge #4 SS TR CH
Judge #5 SS TR INT
Judge #6 SS PE INT
Judge #7 TR CH INT

Judge #8 GOEs and also SS
Judge #9 GOEs and PE
Judge #10 GOEs and CH
Judge #11 GOEs and INT
Judge #12 Just GOEs

This way each of the five program components is evaluated by five judges, and GOEs are judged by 5 judges also. (Too bad the math worked out so that there was one GOE judge left over with no component to judge -- TR already had five judges. Oh well, nobody's perfect.)
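The arithmetic can be double-checked mechanically. A quick tally of the table above (transcribed into a dictionary) confirms that each of the five components, and the GOE duty, is covered by exactly five judges:

```python
from collections import Counter

# Transcription of the Nebelhorn assignment table quoted above;
# "GOE" marks the judges who also score Grades of Execution.
assignments = {
    1:  ["SS", "TR", "PE"],
    2:  ["TR", "PE", "CH"],
    3:  ["PE", "CH", "INT"],
    4:  ["SS", "TR", "CH"],
    5:  ["SS", "TR", "INT"],
    6:  ["SS", "PE", "INT"],
    7:  ["TR", "CH", "INT"],
    8:  ["GOE", "SS"],
    9:  ["GOE", "PE"],
    10: ["GOE", "CH"],
    11: ["GOE", "INT"],
    12: ["GOE"],
}

counts = Counter(task for tasks in assignments.values() for task in tasks)
for task in ("SS", "TR", "PE", "CH", "INT", "GOE"):
    print(task, counts[task])   # every duty is covered by exactly 5 judges
```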

The judges must arrive a day early to attend a seminar, led by the technical specialist, at which the duties of each judge will be assigned and explained.

So this isn't crazy after all -- except that it is. ;)
 
Joined
Jun 21, 2003
No, this is terrible. Who would want to be a judge under such a system?

Let's say you are judge #2. All we want from you is to count Transitions and then to give an opinion on Performance and on Choreography. If you have something to say about the skater's blade to ice skills (SS) or about the way they interpreted the music, or about the quality of the skater's triple Lutz -- shut up, we're not interested. Who do you think you are, judge #8?

The ISU is fiddling while Rome is burning. This monkey business will not produce better programs. It will not produce better skating. It will not win back disaffected fans. It will not improve their product at the elite level and it cannot be implemented in lower level contests with fewer judges. It will not eliminate cronyism and disruptive politicking within the organization.
 

Sandpiper

Record Breaker
Joined
Apr 16, 2014
Mathman, when you break the experiment down like that, it looks less confusing... but infinitely more crazy.

If they're going to split the judges, just have some judges specialize in GOE and some specialize in PCS.

Granted, I don't think the ISU is stupid enough to actually implement this judging division, but...
 

alebi

Medalist
Joined
Jan 11, 2014
I think we are on the right track. I'm looking forward to seeing whether it works well or the judges find it too complicated, even if I still believe that one tech specialist, six GOE judges, and six PCS judges is the easiest and best way.

Actually, one thing I would like to see done differently is the PCS range. While I can clearly identify a -3 from a -2 or a +1 from a +2 on a technical element, I can't understand the difference between a 6.25 and a 6.50, or a 7.75 from an 8.25 skater. I've always found this range too big, with too many options inside it that honestly don't give you a real idea of a skater. A 1 to 5 range would be more immediate to understand (for example, 1 is poor on that component, 2 is sufficient, 3 is decent, 4 is good, 5 is majestic). Or if they want to differentiate skaters more, they can still use 1 to 10 but without decimals.
 
Joined
Jun 21, 2003
I think we are on the right track. I'm looking forward to seeing whether it works well or the judges find it too complicated…

I have the opposite concern: that the judges will find it too simple. If you are an individual judge, all you have to do is worry about, say, SS, P&E, and Interpretation, and otherwise you can take a nap for four minutes.

Actually, one thing I would like to see done differently is the PCS range. While I can clearly identify a -3 from a -2 or a +1 from a +2 on a technical element, I can't understand the difference between a 6.25 and a 6.50, or a 7.75 from an 8.25 skater.

It's a dilemma. The problem that any scoring system must deal with is that a single scale must be able to accommodate skaters at all levels from beginners to world champions. 6.0 had the same challenge, hence the need for decimals. If you didn't have enough divisions then every elite skater would automatically deserve the highest mark, in comparison to all the skaters in the world.
 

FSGMT

Record Breaker
Joined
Sep 10, 2012
1 technical specialist, 6 GOE judges, and 6 PCS judges would be the best option for me. But there is another problem: training the judges. "Normal" judges (I think) are not trained in all the small details required to assign level, edge, or UR calls, so creating "technical judges" who evaluate both levels and GOEs would cost a lot...
 

alebi

Medalist
Joined
Jan 11, 2014
I have the opposite concern: that the judges will find it too simple. If you are an individual judge, all you have to do is worry about, say, SS, P&E, and Interpretation, and otherwise you can take a nap for four minutes.

eeeeh?! How can you evaluate a performance or interpretation without watching the whole program? :biggrin:
What you're describing is a lack of work ethic that, honestly, we can find under any kind of system. BTW, watching a judge fall asleep during a performance would be epic :laugh:


It's a dilemma. The problem that any scoring system must deal with is that a single scale must be able to accommodate skaters at all levels from beginners to world champions. 6.0 had the same challenge, hence the need for decimals. If you didn't have enough divisions then every elite skater would automatically deserve the highest mark, in comparison to all the skaters in the world.

I think the only "automatic" mark is SS, which is obviously higher for elite skaters than for beginners, and you don't need to watch the skater closely to evaluate it. But sometimes it happens that, for example, a junior gives a better interpretation than a senior skater. So I don't see any problem if the first gets a 7 and the second gets a 5; I don't think it has to do with their category or level. Maybe a 1 to 10 scale is appropriate and perfectly understandable when 6 is a passing mark, and I can also accept a 0.5 mark... but having so many decimals for PCS... it's not like in school, where you get a mark based on the number of errors. How can you say a performance was 0.25 better than another one? :think: And it has nothing to do with your level but with your performance that day. So the same skater could get a 5 on a bad day and an 8 on a good day. I want to see that kind of differentiation, not marks based on categories so that you inevitably need all those decimals :ohwell:
 

gkelly

Record Breaker
Joined
Jul 26, 2003
Mathman, when you break the experiment down like that, it looks less confusing... but infinitely more crazy.

If they're going to split the judges, just have some judges specialize in GOE and some specialize in PCS.

They tried that as an experiment at Nebelhorn a number of years ago but decided not to adopt that split. What I heard was that the judges who were judging GOEs-only were bored.

It looks like this breakdown is an attempt to give at least some of them something else to do as well. And to give the PCS-only judges less to do so they can be more analytical about the components that they are judging, plus they won't all be tied to the Skating Skills mark since some judges aren't judging Skating Skills at all.

There's no guarantee this experiment will be deemed successful and its approach officially adopted. And if it is, the breakdown of who does what would have to be flexible since not all competitions would be able to bring in 12 judges plus a tech panel.

I think we are on the right track. I'm looking forward to seeing whether it works well or the judges find it too complicated, even if I still believe that one tech specialist, six GOE judges, and six PCS judges is the easiest and best way.

This could be something like what Mathman suggested above.

The tech specialist could be responsible for identifying how many elements were executed and getting the codes for each one input into the computer. If they make a calling mistake, the data entry operator (if there is one) or the referee or any of the judges could alert them (did you really mean toe loop? it looked like a flip to us!) to fix it at the end of the program.

Element judges could identify jump errors that reduce the base value (but might not agree, so with this year's rules and five judges, the base value for a flip or lutz with questionable rotation and questionable takeoff edge could be different for every judge).

These judges would also subtract quality/GOE reductions for other errors and add points for good quality and for difficulty features. How those pluses and minuses would be displayed on the protocol would need to be determined.

Or keep the levels and 3-person tech panel system already in existence, but split the judges so that some ("technical judges") evaluate elements and Skating Skills and Transitions, and others ("performance judges") judge Performance Execution, Choreography, and Interpretation, with separate appointments and training for those two different roles, although any individual is welcome to pursue both.

Actually, one thing I would like to see done differently is the PCS range. While I can clearly identify a -3 from a -2 or a +1 from a +2 on a technical element, I can't understand the difference between a 6.25 and a 6.50, or a 7.75 from an 8.25 skater. I've always found this range too big, with too many options inside it that honestly don't give you a real idea of a skater. A 1 to 5 range would be more immediate to understand (for example, 1 is poor on that component, 2 is sufficient, 3 is decent, 4 is good, 5 is majestic). Or if they want to differentiate skaters more, they can still use 1 to 10 but without decimals.

It's a dilemma. The problem that any scoring system must deal with is that a single scale must be able to accommodate skaters at all levels from beginners to world champions.

Yup.

When starting out learning IJS, it's best to focus on the whole numbers. See the Program Components Overview (linked at the bottom of the page).

The numbers are defined as
Outstanding 9-10
Very Good 8
Good 7
Above Average 6
Average 5
Fair 4
Weak 3
Poor 2
Very Poor 1
Extremely Poor <1

For Skating Skills especially, these correlate with technical skill levels. I think of 10 as being a great performance by an all-time great skater, 0-1 as being a beginner with very little one-foot skating or identifiable edges. So what is Average/5? Seeing how judges have been using the numbers, I think of it as acceptable senior quality, nothing special in senior competition -- but a strong score at lower levels, more exceptional the lower you go.

If 5 is basic senior quality, then scores lower than that will be more common at lower levels, and scores higher than that will be more common at the international competitions. Judges who are experienced at judging all levels will quickly be able to peg skaters to a general range (e.g., 5s or 7s) on the full scale.

Fans who only watch seniors, or only elite seniors, will see a narrower range of skills and might think of 5 as a very low score. But watching a lot and analyzing the criteria for the various components could allow even fans who don't know much about skating technique to predict whether a skater they've never seen before (i.e., no reputation judging) will likely earn 6s or 7s or 8s.

The decimal places allow for finer distinctions among skaters who are basically in the same skill range. That's where judges might start thinking comparatively between skaters in the same event, even though strictly speaking they're not supposed to. They also allow judges to balance out the various criteria on the same component, in case a skater is notably stronger at some criteria and weaker at others.

At that level of distinction, there aren't really right or wrong answers.

The other components don't need to be directly tied to the Skating Skills skill level, although some of the criteria (e.g., Difficulty and Quality of Transitions, Carriage under Performance/Execution, Pattern and ice coverage under Choreography, Effortless movement under Interpretation) do rely at least to some degree on control of the technique.

So each judge needs to develop a mental standard of what is "Weak" or "Average" or "Very Good" performance or interpretation, across the full range of skaters from beginners to world champions. Here's where I think more detailed guidelines and training would be useful. But judges do develop a consensus of what they consider average, etc., by judging with each other, reading protocols to compare their marks to the whole panel, discussing in the judges' room afterward what they liked and didn't like.

Fans can develop a sense of above-average performance or very good interpretation too, at least at the whole number level. Since these don't rely so much on technical skating knowledge, fans' evaluations in these could be just as valid as judges', especially for fans with performing arts backgrounds. But fitting their evaluations to the 10-point scale means understanding the range of the scale, having a sense of what to expect from non-elite skaters as well as the elites.

6.0 had the same challenge, hence the need for decimals. If you didn't have enough divisions then every elite skater would automatically deserve the highest mark, in comparison to all the skaters in the world.

Yes.

The other reason 6.0 needed decimals, along with tiebreakers, was to have enough room to rank the skaters in large fields. If you only have 10 numbers available for each mark, you would have to use the full range of numbers for every competition regardless of the skill level of the skaters. There would not even be a rough correspondence between scores and skill level -- they would be nothing but placeholders. And judges still might run out of numbers/get "boxed in" pretty quickly. Deductions (as in short programs) could not be taken from the actual scores but just considered mentally by each judge when deciding which placeholder scores to put up to rank the skater appropriately.

(With compulsories, which only received one mark for each figure or for each dance until the early 1990s, it would be impossible to distinguish 30 skaters with only 10 possible numbers.)

1 technical specialist, 6 GOE judges, and 6 PCS judges would be the best option for me. But there is another problem: training the judges. "Normal" judges (I think) are not trained in all the small details required to assign level, edge, or UR calls, so creating "technical judges" who evaluate both levels and GOEs would cost a lot...

Yes, any major reassignment of duties would require more training.

If the tech panel as it now exists were to be abolished and individual judges were assigned to independently score difficulty along with quality for each element, I think it might make sense to get rid of the current definition of levels and just allow element judges to give extra points for extra difficulty as well as for extra quality. But they would still need to be retrained.

I think the only "automatic" mark is SS, which is obviously higher for elite skaters than for beginners, and you don't need to watch the skater closely to evaluate it.

I don't think you need to watch the skater closely to get a sense of what general range they belong in. But two skaters with similar power and edge depth, for example, may show different levels of mastery of multidirectional skating and one-foot skating. So paying close attention allows judges to reward skaters who actually show more skills even if the first impression is similar.

But sometimes it happens that, for example, a junior gives a better interpretation than a senior skater. So I don't see any problem if the first gets a 7 and the second gets a 5; I don't think it has to do with their category or level.

Yup.

Maybe a 1 to 10 scale is appropriate and perfectly understandable when 6 is a passing mark, and I can also accept a 0.5 mark... but having so many decimals for PCS... it's not like in school, where you get a mark based on the number of errors. How can you say a performance was 0.25 better than another one?

As I mentioned, strictly speaking judges aren't supposed to be thinking "I already gave Skater P 6.5 on this component, and Skater Q was a little better, so I'll give her 6.75." Although I wouldn't be surprised if some do think that way.

What it's really for is for a judge to say to himself something like "Skater Q was better on this component than just Above Average, but not quite Good. Halfway in between? No, I'd say Q was closer to Good in my mind, almost there, but those couple little problems/weaknesses won't let me go all the way to 7 on this score."

Imagine that you're judging numerous skaters who are all very close in overall ability on this component. How much room should be available to reflect slight differences? If they're all pretty much average and nothing special, should they all earn 5.0? Or can the ones who have average skill and are having a good day earn 5.5 or even 6.0, and the ones having a bad day earn 4.5 or even 4.0?
 

Sandpiper

Record Breaker
Joined
Apr 16, 2014
It looks like this breakdown is an attempt to give at least some of them something else to do as well. And to give the PCS-only judges less to do so they can be more analytical about the components that they are judging, plus they won't all be tied to the Skating Skills mark since some judges aren't judging Skating Skills at all.
Hmm, when you put it that way, it makes sense. I think the current PCS scoring is too tied with skating skills. Someone with good SS immediately gets their P/E and INT scores raised, which is wrong to me. On the other hand, I actually do think PCS should be tied to TES to some degree, since some skaters would never break through if other skaters can make boatloads of mistakes and still win on PCS.

But: The judges were bored? :unsure: Well, I guess, if they're bored they won't do a very good job, but... it's their job. You do what you're required to do.
 

drivingmissdaisy

Record Breaker
Joined
Feb 17, 2010
Hmm, when you put it that way, it makes sense. I think the current PCS scoring is too tied with skating skills. Someone with good SS immediately gets their P/E and INT scores raised, which is wrong to me. On the other hand, I actually do think PCS should be tied to TES to some degree, since some skaters would never break through if other skaters can make boatloads of mistakes and still win on PCS.

I kind of agree with you, but another solution could be to increase penalties for errors so that PCS is less likely to save a skater with multiple falls.

As far as SS being a more important factor in PCS, I definitely agree. On TR, Yuna at her best would score ahead of Mao at her best, even though Mao typically has more intricate programs and performs transitions admirably. However, to me SS makes a skater stand out from the sea of pretty princesses with similar programs so I don't necessarily mind it being rewarded.
 