Page 10 of 10
Results 136 to 144 of 144

Thread: Hersh: In figure skating, same old, same old

  1. #136
    Custom Title
    Join Date
    Jul 2003
    Posts
    3,833
    I said
    it would be very rare that some judges would have skater A somewhat higher on all five components, and the other judges would have skater B higher on all components
    Quote Originally Posted by Mathman View Post
    Thanks to randomization of judges' scores, we do not know whether this is rare or common. My intuition is that it is not rare at all.
    Here are all the protocols for US Nationals. Scores are not anonymous or randomized -- judge #1 on the officials list is always judge #1 for all skaters, etc.

    Last I heard that was also true for the JGP, if you think international events are a better example.

    For any two skaters (not necessarily near the top or even adjacent in the standings) in any event can we find examples in which
    1) a majority of judges thought that skater A was better than (or equal to) skater B on all 5 components
    and
    2) all the remaining judges thought that skater B was better than (or equal to) skater A on all 5 components.

    I.e., not even one judge had one component reversed from their overall opinions of the two skaters' relative PCS quality. I'll allow ties on some of the components.

    It will be tedious to look for them. I'll take a quick look at the senior medalists to see if there are any examples there and report back if I find any.

    ETA: In 24 head-to-head matchups among senior medalists in short and free programs for all disciplines, I found one example:
    In the ladies' SP, 8 judges marked Gold higher than or equal to Edmunds in all components. Judge #1 marked Edmunds higher in all.

    But anyway, now I am sorry that in illustrating the question I presented sample scores for only one component. This sent the discussion off on a tangent.
    I agree this discussion has nothing to do with Hersh and little to do with anonymity. I was wondering a post or two ago whether to take it to a new thread -- it does interest me.

    Fine with me if mods want to split off the last page or so of this thread.

    I guess that is what the whole controversy comes down to. What is the purpose of a sports competition? Is it to see which competitor outperformed the other, or is it to decide which competitor did a better job of conforming to an objective standard?
    The point of the competition is to see who outperformed.
    However, the task of the judges under IJS is not to rank the skaters, vote for which skater they thought performed best, or choose who they think should finish higher. Unlike under 6.0, they're just supposed to score each skater independently.

    With IJS it's possible to score skaters who have no one to compete against. This won't happen in international competition, but it does happen at some club competitions or even at some national championships of smaller federations: one skater (usually male) or team enters an event, no one else enters, or one or two others enter and then withdraw. The remaining skater has invested money to travel to the event, paid an entry fee for a club competition, or needs to skate and be scored to make a national title official.

    With 6.0, the judges can write down whatever scores they like before the skater performs, and all ordinals will be 1s. The accountants could even print out the result sheets in advance. Or the judges could all sleep through the program then wake up and input random marks. All the judges need to do is rank the skaters, and with a field of 1 skater the result is literally a no-brainer.

    With IJS, the tech panel calls the elements and the judges assign GOEs and PCS, based on what the skater actually does.
    Their scores are not about ranking, but about evaluating the performance.

    With IJS, even in large fields the tech panels' and judges' process is supposed to be about evaluating each performance independently. Then the numbers are added up and the skater with the highest total wins. But unlike with 6.0, none of the officials is tasked with deciding who should have the highest total.

    Staying in the corridor has nothing to do with bunching the PCS tightly for each skater. It has to do with not being too far off from the other judges for each component. If the other judges spread out their marks, you had better do so, too, or you risk being outside the corridor on some of them.
    See page 6 of ISU communication 1631:

    For each of the five (5) Program Components, the Judge's corridor will be based on 1.50 Deviation Points (15,0% of the maximum 10.0 points per Component) between the score of a Judge and the calculated Judges' average score for the same Component, i.e. in total 7.50 Deviation Points for the 5 Program Components. Plus and minus Deviation Points are subtracted.
    The example they give has Judge A giving a skater component scores of 4.00 4.00 6.25 7.25 7.00 on a panel with averages of 5.75 5.85 5.45 6.00 5.55. Judge A has a pretty extreme spread here but is "well within the allowed corridor" because the very low marks balance out the very high ones.

    The way the ISU calculates deviation from the average for components (as opposed to GOEs, for which the plus and minus deviation points are added), spreading marks for the various components of the same skater can actually help a judge stay within the corridor better than bunching them too closely but marking in a different range than the rest of the panel.

    I'm not sure how many judges actually realize that though.
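    For anyone who wants to play with the numbers, here's a rough sketch (just my reading of the communication) of how the net deviation comes out for the Judge A example above:

```python
def component_deviation(judge_scores, panel_averages):
    """Net deviation points across the five components.

    As I read Communication 1631, plus and minus deviations
    are subtracted from each other, i.e. they cancel out.
    """
    return sum(j - avg for j, avg in zip(judge_scores, panel_averages))

# Judge A's spread-out marks from the ISU's own example
judge_a = [4.00, 4.00, 6.25, 7.25, 7.00]
panel_avg = [5.75, 5.85, 5.45, 6.00, 5.55]

net = component_deviation(judge_a, panel_avg)
print(f"{net:+.2f}")  # -0.10: well inside the +/-7.50 corridor
```

    The very low marks cancel the very high ones almost exactly, which is why such an extreme spread still sits "well within the allowed corridor."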

    Is this true? Do you mean that the ISU officially encourages judges to do this?
    As far as the ISU is concerned, all I know is what's in this document about judge evaluation. And the e-mail several years ago from a member of the assessment commission reminding other judges to evaluate Transitions independently.

    I have heard US judges discuss the concept of spreading marks, as a good thing.

    The scoring scale has to accommodate all skaters from beginners to world champions. There cannot be too much of a spread between the best skater in the world and the second best.
    True.
    As I understand, the recommendation to spread marks between skaters means to use the whole range of marks as appropriate, regardless of the type of competition.

    Just because a skater is entered in an ISU championship -- let's say Euros or 4Cs -- doesn't mean that they automatically deserve championship-level scores. Or that just because the vast majority of senior level skaters deserve scores in the 5s, 6s, maybe 7s, that judges should be limited to that range. At Euros or 4Cs you might well see the first place skater earning 9s and the last place skater earning 3s or even 2s for some components.

    At Junior Worlds, 2s at the bottom of the field are more common but should only be given if warranted, if the skater is clearly below typical junior quality for that component. Very high scores are even rarer among juniors than seniors, but judges shouldn't go into the event thinking that they should cap their scores in the 6s just because this is a junior event -- if a great junior performance is just as good in some components as a senior performance that deserves 8s, then it should get 8s.

    Sometimes the second-best skater in an event (in each judge's opinion) is pretty close to the best skater and should receive similar scores. And maybe the third, fourth, and fifth best as well. Sometimes the best skater in an event is in a class by him/herself and deserves much higher scores than the next best skater(s) in the field. Depends on how they skate -- their overall skill level, and how well they actually deliver on that day.


    Spreading marks within the scores for a single skater means that just because the skater deserves a high score for Skating Skills doesn't mean they automatically deserve a high score for Transitions or Performance/Execution or Interpretation . . . or vice versa.

    I think some fans want judges always to give large gaps between a skater's highest and lowest component.

    As far as I can tell the ISU wants judges to give large gaps when the skater's skills are unbalanced from one component to another, and small when the skater is at close to the same level in all component areas.

    I don't think so. If the contest is close, the scores should be close together. If one skater is much better than the other then the scores should be farther apart.
    Absolutely.

    If a judge honestly believes that the best skater was significantly better than the next best (in their opinion), on one or all components, they should reflect that significant difference with scores more than 0.25 apart. If they think the skater was significantly better on all components, the larger gaps will add up to several points across 5 components.

    If a judge honestly believes the two skaters were about equal on a component, she can give the same mark. Or give 0.25 more to the skater she thinks was slightly better. If she thinks skater A was slightly better than skater B on all components, that will add up to a full point or two on PCS as a whole.

    But really the judges shouldn't be comparing the skaters directly, they should be comparing each to their own mental standards. Ideally they should have a good internal sense of the difference between Good (7) and Very Good (8), and halfway between Good and Very Good (7.5) or between but closer to one or the other (7.25 or 7.75), and then match each performance to that mental image.

  2. #137
    Custom Title
    Join Date
    Apr 2014
    Posts
    1,549
    But ultimately, components are just numbers. They are not--in fact, cannot be--objective standards. What happens in the end is ranking skaters, because that determines the medals/placements everyone cares about. I don't think most judges are capable of keeping an objective scale in their heads. They'll have to, at points, go, "Oh, I gave Edmunds 7.50, that means I need to give Gold 8.25 because she's better." If they don't do that, they'll run into fatigue from looking at so many competitors, and likely end up giving scores they don't truly believe in (I think this might be a factor in why people who don't make the final group are low-balled. They're superior to the group they're in, but judges aren't comfortable giving out sudden 9s when the best they've given so far is a 7.50. They don't "need" 9s to place the skater ahead. But by the end of the night, judges are comfortable giving out 9s, thus potentially "screwing over" the earlier skater).

  3. #138
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,179
    Quote Originally Posted by gkelly View Post
    Here are all the protocols for US Nationals. Scores are not anonymous or randomized -- judge #1 on the officials list is always judge #1 for all skaters, etc.

    Last I heard that was also true for the JGP, if you think international events are a better example.

    For any two skaters (not necessarily near the top or even adjacent in the standings) in any event can we find examples in which

    1) a majority of judges thought that skater A was better than (or equal to) skater B on all 5 components
    and

    2) all the remaining judges thought that skater B was better than (or equal to) skater A on all 5 components.

    I.e., not even one judge had one component reversed from their overall opinions of the two skaters' relative PCS quality. I'll allow ties on some of the components.

    ETA: In 24 head-to-head matchups among senior medalists in short and free programs for all disciplines, I found one example:
    In the ladies' SP, 8 judges marked Gold higher than or equal to Edmunds in all components. Judge #1 marked Edmunds higher in all.
    I will work on this, too. I am most interested in examples where a majority of judges favored one skater pretty much down the line, but only by a small amount, while other judges liked the other skater consistently and by quite a bit. Maybe this almost never happens in the absence of collusion and bias.

    However, the task of the judges under IJS is not to rank the skaters, vote for which skater they thought performed best, or choose who they think should finish higher. Unlike under 6.0, they're just supposed to score each skater independently….
    In arguments of this sort we are basically saying that the premises underlying the IJS are valid because they are the premises underlying the IJS. I do not accuse the IJS judges of not doing their assigned task. The question is, should the system be changed in view of the fact that it allows an enthusiastic minority to override a complacent majority?

    With IJS it's possible to score skaters who have no one to compete against. This won't happen in international competition, but it does happen at some club competitions or even at some national championships of smaller federations: one skater (usually male) or team enters an event, no one else enters, or one or two others enter and then withdraw.
    The IJS is good in this setting. However, I believe that under 6.0 judging also it was possible for a lone skater to skate against a "gold standard, silver standard, or bronze standard" in the case of a boy who is the only skater entered. (Or he could skate against the girls.) The judges would decide whether he met the standard or not. (Sort of IJS in 6.0 clothing.)

    See page 6 of ISU communication 1631
    I had in mind this kind of example. Judges 1, 2, 3, and 4 score program components SS and TR.

    SS: 8.00 8.00 8.00 8.00
    Tr: 2.50 2.50 2.50 8.00

    Judges 1, 2, and 3 have spread out their marks, but it is judge 4 who is “outside the corridor.”

    Judges 1, 2, and 3 each have a total of -1.375 deviation points over the two components. They are OK. Judge 4 has +4.125 deviation points over the two components. This judge is in trouble. The question is not whether the scores in each column are spread out or close together, it is whether the score in each row is close to the average in that row or not.

    By the way, the fact that one has to go to such unrealistic extremes to create an example shows that it is almost impossible for any judge, however incompetent or biased, to get caught by the ISU judges’ oversight procedure -- especially in view of that plus/minus thing that you pointed out.
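    Running the plus-and-minus-cancelling rule on this toy panel (a quick sketch, assuming the panel average includes all four judges' marks) makes the contrast concrete:

```python
# Mathman's toy panel: four judges, two components
ss = [8.00, 8.00, 8.00, 8.00]
tr = [2.50, 2.50, 2.50, 8.00]

ss_avg = sum(ss) / len(ss)  # 8.000
tr_avg = sum(tr) / len(tr)  # 3.875

# Each judge's net deviation over the two components
nets = [(ss[j] - ss_avg) + (tr[j] - tr_avg) for j in range(4)]
for j, net in enumerate(nets, start=1):
    print(f"Judge {j}: {net:+.3f}")
# Judges 1-3 each land at -1.375; judge 4 at +4.125 is the only
# one past the 3.0 allowance for two components (1.5 per component)
```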

    Anyway, I don't think the "corridor" is to blame for judges tending to give almost the same scores for each of the five components for a particular skater.
    Last edited by Mathman; 07-06-2014 at 11:10 AM.

  4. #139
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,179
    Quote Originally Posted by Sandpiper View Post
    But ultimately, components are just numbers. They are not--in fact, cannot be--objective standards. What happens in the end is ranking skaters, because that determines the medals/placements everyone cares about. I don't think most judges are capable of keeping an objective scale in their heads. They'll have to, at points, go, "Oh, I gave Edmunds 7.50, that means I need to give Gold 8.25 because she's better." If they don't do that, they'll run into fatigue from looking at so many competitors, and likely end up giving scores they don't truly believe in (I think this might be a factor in why people who don't make the final group are low-balled. They're superior to the group they're in, but judges aren't comfortable giving out sudden 9s when the best they've given so far is a 7.50. They don't "need" 9s to place the skater ahead. But by the end of the night, judges are comfortable giving out 9s, thus potentially "screwing over" the earlier skater).
    It's very tricky. I agree with what you said, and especially with the point that psychologically it is practically impossible to avoid saying in our minds, "this performance was better than that one." That is why I can only be dragged kicking and screaming from my home world, "Earth 6.0". 6.0, for all its faults (not as severe as generally supposed, though), is at least honest on the point that we humans are good at comparing but bad at measuring against a standard that exists only in our minds.

    Every judged sport gives it a fling, though. In piano-playing contests performers get scores which supposedly rate them against a standard that, if not strictly objective, at least is supposed to take into account the judges' experience of many, many performances and also of imaginary ones that set the standard. Sometimes no prize is given. Although pianist A was better than pianist B, they both stunk. (I.e., it was a splatfest.) At a dog show the judges are not supposed to say, this dog is cuter than that one, but rather, this dog conforms more perfectly to the standards of the breed than that one (also a great champion) does.

    Bottom line: I don't know what I think, and I am sorry if I have given the impression in these threads that I do.
    Last edited by Mathman; 07-06-2014 at 11:21 AM.

  5. #140
    Custom Title
    Join Date
    Jul 2003
    Posts
    3,833
    Quote Originally Posted by Mathman View Post
    I will work on this, too. I am most interested in examples where a majority of judges favored one skater pretty much down the line, but only by a small amount, while other judges liked the other skater consistently and by quite a bit. Maybe this almost never happens in the absence of collusion and bias.
    It's a very specialized situation, which is why I expect it to be rare. Unless there really is pervasive collusion.

    I don't think simple bias would result in none of the majority judges liking the other skater better on even one component. Or in all of the minority judges using a significantly wider range between the skaters. That assumes a minority of more than 1 judge -- a minority of 1 was all I could find in my narrow sample. And in that example, the judges with the largest differences between the two skaters were part of the majority and didn't have them in the top 2 on PCS.

    I do not accuse the IJS judges of not doing their assigned task.
    Not on purpose, but using language like choice and vote implies a 6.0 assigned task.

    The question is, should the system be changed in view of the fact that it allows an enthusiastic minority to override a complacent majority?
    That's a valid question. Which I think should first be asked about honest judges with different preferences and use of numbers. And then even if we decide it's not a problem in the ideal case, whether real-world collusion is enough of a problem that a system designed for an ideal case is not appropriate.

    The IJS is good in this setting. However, I believe that under 6.0 judging also it was possible for a lone skater to skate against a "gold standard, silver standard, or bronze standard" in the case of a boy who is the only skater entered. (Or he could skate against the girls.) The judges would decide whether he met the standard or not. (Sort of IJS in 6.0 clothing.)
    "Skating against the book" was really an ISI concept that some USFS competitions adopted (and State Games, which combine the two systems). But USFS has not allowed lone entries to be given anything except 1st place ordinals for a number of years now.

    I had in mind this kind of example. Judges 1, 2, 3, and 4 score program components SS and TR.

    SS: 8.00 8.00 8.00 8.00
    Tr: 2.50 2.50 2.50 8.00

    Judges 1, 2, and 3 have spread out their marks, but it is judge 4 who is “outside the corridor.”

    Judges 1, 2, and 3 each have a total of -1.375 deviation points over the two components. They are OK. Judge 4 has +4.125 deviation points over the two components. This judge is in trouble. The question is not whether the scores in each column are spread out or close together, it is whether the score in each row is close to the average in that row or not.

    By the way, the fact that one has to go to such unrealistic extremes to create an example shows that it is almost impossible for any judge, however incompetent or biased, to get caught by the ISU judges’ oversight procedure.
    For two components, the allowable deviation would be 3.0, right? So you could be a little less extreme even here and still catch Judge 4.

    Most likely a judge who is caught by this method would be in too high a range across the board for some skaters and too low on others, and also out of whack on the GOEs.

    So a competent judge who judges most of the skaters honestly but intentionally tries to prop up a specific skater and lowball the likely closest rivals could avoid detection.

  6. #141
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,179
    Let's see if we can catch judge #1 at U.S. Nationals, the judge who gave higher scores to Polina Edmunds than to Gracie Gold in all five components, while all of the other judges did the reverse.

    Judge #1's component scores for Polina: 8.00, 7.75, 8.50, 8.00, 8.50. Total 40.75.
    Average scores for Polina, all 9 judges: 7.36, 7.04, 7.71, 7.36, 7.54. Total 37.01

    Total deviation = +3.74. (Must be > 7.50 for an IJS "anomaly.")

    Judge #1's component scores for Gracie: 7.75, 7.00, 8.25, 7.50, 7.50. Total 38.00
    Average scores for Gracie, all 9 judges: 7.93, 7.61, 8.29, 8.04, 7.86. Total 39.73

    Total deviation = -1.73 (Must be < -7.50 for an IJS anomaly.)

    This judge is in the clear. If she (the judges are named) had really wanted to push Polina at the expense of Gracie she could have given Polina 8.75, 8.75, 9.00, 9.00, 9.00, while giving Gracie straight 6.50s, and still not get caught.
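    For anyone who wants to check the sums, a quick sketch (scores as read off the protocols above):

```python
# Judge #1's marks and the 9-judge averages quoted above
judge1_polina = [8.00, 7.75, 8.50, 8.00, 8.50]
avg_polina = [7.36, 7.04, 7.71, 7.36, 7.54]
judge1_gracie = [7.75, 7.00, 8.25, 7.50, 7.50]
avg_gracie = [7.93, 7.61, 8.29, 8.04, 7.86]

dev_polina = sum(j - a for j, a in zip(judge1_polina, avg_polina))
dev_gracie = sum(j - a for j, a in zip(judge1_gracie, avg_gracie))

print(f"Polina: {dev_polina:+.2f}")  # +3.74, short of the +7.50 trigger
print(f"Gracie: {dev_gracie:+.2f}")  # -1.73, short of the -7.50 trigger
```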

  7. #142
    Custom Title
    Join Date
    Apr 2014
    Posts
    1,549
    Quote Originally Posted by Mathman View Post
    It's very tricky. I agree with what you said, and especially with the point that psychologically it is practically impossible to avoid saying in our minds, "this performance was better than that one." That is why I can only be dragged kicking and screaming from my home world, "Earth 6.0". 6.0, for all its faults (not as severe as generally supposed, though), is at least honest on the point that we humans are good at comparing but bad at measuring against a standard that exists only in our minds.

    Every judged sport gives it a fling, though. In piano-playing contests performers get scores which supposedly rate them against a standard that, if not strictly objective, at least is supposed to take into account the judges' experience of many, many performances and also of imaginary ones that set the standard. Sometimes no prize is given. Although pianist A was better than pianist B, they both stunk. (I.e., it was a splatfest.) At a dog show the judges are not supposed to say, this dog is cuter than that one, but rather, this dog conforms more perfectly to the standards of the breed than that one (also a great champion) does.

    Bottom line: I don't know what I think, and I am sorry if I have given the impression in these threads that I do.
    I'm mainly on the 6.0 side myself, especially in terms of "Presentation mark" vs. PCS. Mainly for the reasons you stated -- that we won't have a vocal minority outweighing a conservative majority.

    Where I think COP has a possible advantage is the technical mark side. Under 6.0, judges need to keep all the technical stuff in their heads and try to compare skaters to the best of their ability. This is actually harder, imo, than just looking at elements and tossing around GOE.

    In some cases, I do think 6.0 could lead to better results nonetheless (e.g. if Skater A lands three beautiful jumps and falls on two other jumps, while Skater B lands five okay jumps, under 6.0 it's self-evident that Skater B should win, all else being equal. Whereas, under the current system, Skater A could end up winning due to getting +3s on the jumps she landed, even if she got -3 with fall deduction on the others. 6.0 would better be able to recognize that no, Skater A just made too many sloppy errors to win even though she had three good jumps).

    However, in other cases, judges could find it difficult to keep track of everyone under 6.0, without specific elements to score. This could lead to greater reputation judging under 6.0, assuming nobody makes huge glaring errors.

    ...So, bottom line, I'm not sure what I think either.

  8. #143
    Custom Title Mathman's Avatar
    Join Date
    Jun 2003
    Location
    Detroit, Michigan
    Posts
    28,179
    Quote Originally Posted by Mathman
    Bottom line: I don't know what I think.
    Quote Originally Posted by Sandpiper
    ...So, bottom line, I'm not sure what I think either.
    I do, however, know why I am so crotchety about the CoP. It was supposed to make figure skating better and it didn't.

    Quote Originally Posted by gkelly
    I said:

    Quote Originally Posted by gkelly
    it would be very rare that some judges would have skater A somewhat higher on all five components, and the other judges would have skater B higher on all components.
    Quote Originally Posted by Mathman:

    Quote Originally Posted by Mathman
    Thanks to randomization of judges' scores, we do not know whether this is rare or common. My intuition is that it is not rare at all.
    Here are all the protocols for US Nationals….It will be tedious to look for them.
    Indeed! 12420 comparisons to make for the ladies' short program alone. It would be cool if the USFSA made these data available electronically so we wouldn't have to input them by hand.
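    (That count works out if the ladies' SP field had 24 skaters, which I'm assuming from the figure: every pair of skaters, times 9 judges, times 5 components.)

```python
from math import comb

# 24 skaters (assumed), 9 judges, 5 components
pairs = comb(24, 2)    # 276 head-to-head matchups
print(pairs * 9 * 5)   # 12420
```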

    Anyway, I have spent a little time with the protocols and have pretty much struck out. What I expected was that for each judge and for each pair of skaters, once the judge put skater A ahead on, say, SS, then this pattern would be repeated four more times for the other components. In the case of close contests, I found that sometimes it worked out this way, sometimes not. Not much to remark on one way or the other.

    I will look at some international results from the Junior Grand Prix. So far it looks like my original concern is not much of a problem after all, in the absence of bias or aggressive advocacy.
    Last edited by Mathman; 07-06-2014 at 07:20 PM.

  9. #144
    Yulia and Ruslena team forever! Alba's Avatar
    Join Date
    Feb 2014
    Location
    Milan
    Posts
    3,184
    Quote Originally Posted by Mathman View Post
    Bottom line: I don't know what I think, and I am sorry if I have given the impression in these threads that I do.
    Quote Originally Posted by Sandpiper View Post
    ...So, bottom line, I'm not sure what I think either.

    I think both of you have a good point.
    So, I do think and believe that the only way to make figure skating better is to have good/great skaters and programs.
    I know, it sounds very banal and a cliché, maybe, but in the end it's all about that. IMO.

