How to make components relevant for men? | Page 3 | Golden Skate


Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
That's what I was referring to when I wrote "gaming the system".

The point I was making was that the way the components are judged makes them so irrelevant that even the fourth-place scorer can win without extraordinary difficulty. I think she deserved her win, and she did have the most difficult program (made more difficult by the backloading), but her technical score wasn't far above the field the way Nathan's was at Worlds.

Yeah, I just disagree with your choice of words. Doing multiple quads is gaming the system in the same way that doing all your jumps in the second half is. I mean, doing more quads is "special" in the same way doing more jumps in the second half is. Or do you think there is a difference? :) The point is, the system works the same for ladies and men. Both disciplines have the same problem: you can't raise your PCS after some point, while you can raise your TES. So basically you can't win with your PCS... for now...
 

Metis

Shepherdess of the Teal Deer
Record Breaker
Joined
Feb 14, 2018
Unfortunately, no matter what system you use, there is always opportunity for manipulation.
People more gifted than I am at maths would dispute that. Range voting (which is arguably what PCS is, and which, ironically, PCS often serves as the example for) theoretically eliminates all strategic voting/metagaming. Ranked choice/instant runoff (IRV) comes close but isn’t bulletproof. In my opinion, voting systems are actually a good place to look for information on building a system that doesn’t reward dishonesty (“favourite candidate betrayal,” strategic voting), that maximizes both utility and “pleasant surprise,” and that begins with certain expectations regarding human behaviour: that we’re rational actors up to a point (“rational” in the game-theoretic sense), but also predictably irrational, prone to distrust and bandwagoning, and blessed and cursed with the brains we have.

My points in bringing up the weakness of using a trimmed mean with regard to PCS are relatively straightforward:
1. It creates an inescapable metagame among the judges — i.e., it creates an incentive to score dishonestly, which sows distrust (which fuels metagaming).
2. Assuming some percent of judges are rational actors (play the metagame), there’s a point at which too many metagamers causes the system to collapse. If you’re familiar with the Prisoner’s dilemma, you already see where this ends: at a certain point, each judge has zero confidence in the others to score honestly, and we wind up with a stable, sub-optimal equilibrium. (You can actually play around with the group composition in this game, which is a solid overview of the Prisoner’s dilemma and common strategies.)
3. Does this explain all the upward variance in scoring? No. But it’s an untested hypothesis. And there are times when I’ve seen a judge’s marks and can’t tell if they were metagaming or drunkenly turning dials. The fact that that question exists at all is kind of the point.
4. This is a bug that’s also a feature of using a trimmed mean; it’s inherent to the method the ISU chose for calculating PCS and GOEs. I actually agree that there’s always room for some manipulation in scoring (and I think the ISU wants judges to retain a fair amount of control over podium order), but a trimmed mean always produces the same inescapable problems and yields a worse system than, for example, simply averaging all the scores without discarding any (the untrimmed average has its own problems, but that’s a different topic altogether).
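The collapse described in point 2 is easy to sketch numerically. A minimal Python sketch, with invented marks and an ISU-style trim (drop one high, one low):

```python
def trimmed_mean(marks):
    """Drop one high and one low mark, then average the rest."""
    kept = sorted(marks)[1:-1]
    return sum(kept) / len(kept)

honest       = [9.0] * 7                  # everyone scores honestly
one_inflates = [9.0] * 6 + [9.75]         # a lone defector is trimmed away
two_inflate  = [9.0] * 5 + [9.75] * 2     # a second defector survives the trim

print(trimmed_mean(honest))                   # 9.0
print(trimmed_mean(one_inflates))             # 9.0  (trim absorbs one outlier)
print(round(trimmed_mean(two_inflate), 2))    # 9.15 (inflation starts to pay)
```

The trim can only absorb one defector per side; once more judges metagame than the trim can discard, inflated marks leak into the mean, which is the Prisoner's-dilemma dynamic described above.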

I have kind of said everything I have to say on the deleterious effects of using a trimmed mean. Ultimately, it causes more problems than it solves, and you don’t need to have an understanding of game theory to see why. And the idea that judges can’t realistically guess how high to go is not a strong argument, given that there’s a rough “floor” to a given skater’s PCS (e.g., Skater X averages 90 for a clean programme), discretion even within mandatory GOE deductions, a laughably high bar to triggering ISU’s policing mechanisms in PCS, and the fact that a judge has plenty of time with a given panel to observe how high (or low) they’re averaging.

To put it another way: if I tell you to pick a value, in 0.25 increments, between 8.75 and 9.75 that’s equal to or higher than the one I chose, your answer is 9.75. That doesn’t require explaining. But what if I told you to pick a value from the same set that’s within 0.25 points of mine? The three middle options (9.00, 9.25, and 9.50) each cover a 0.75 spread, which gives you the largest possible safety net (you have an additional 0.25 of leeway on either side). Which one should you choose? Whichever value is correct for the performance given, if we were using the mode instead of the mean. Here’s a fun hypothetical PCS row:
9.0 9.0 9.25 9.25 9.25 9.50 9.75

Three of seven judges went with 9.25, so we could arguably call it a day here, as we clearly have a modal value. But for argument’s sake, let’s say that’s not sufficient: 3/7 isn’t a majority, so we do the usual yadda-yadda and throw out the high and low. You can do the computation in your head: of what remains, the 9.00 and 9.50 cancel out in an average (they’re -0.25 and +0.25 around 9.25, respectively), so the trimmed mean is 9.25, and the modal value of the trimmed set is still 9.25. (Which number did I pick? 9.25.)
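For anyone who wants to verify the arithmetic, a quick sketch of that hypothetical row using Python's statistics module:

```python
from statistics import mean, median, mode

marks = [9.0, 9.0, 9.25, 9.25, 9.25, 9.50, 9.75]
trimmed = sorted(marks)[1:-1]      # drop one high and one low mark

print(round(mean(marks), 4))       # 9.2857 (untrimmed mean)
print(mean(trimmed))               # 9.25   (trimmed mean)
print(median(marks))               # 9.25
print(mode(marks))                 # 9.25
```

Every measure except the untrimmed mean lands on 9.25 here, which is the point of the example.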

Is the mode superior to the mean for PCS? Maybe, maybe not. Averages are always susceptible to outliers, which is what the trimmed mean is meant to protect against, but it’s always easier to skew an average than a median or a mode. That aside, while this was a fun thought exercise and basically what I’ve been playing around with using old scores, it’s also just not happening; we’re never moving off some kind of average, because doing so would reduce the power of each individual judge, and I don’t think the ISU wants that.


That's why it's so important to have a system of checks and consequences for the judges.
It’s also why tech panels need to enforce the rules equally, or not at all, across all events. The breakdowns in the system are pervasive, not just limited to the judges. TES inherently assumes that the tech panel has proper equipment (it doesn’t) and is making UR and edge calls uniformly. What the ISU doesn’t seem to realize is the cumulative cost of its refusal to permanently ban anyone for cheating, the flagrant “the rules only matter when we say they matter” level of arbitrary judging, the rapid escalation in PCS, and so on. There will always be an audience for figure skating, but part of why I follow the sport in bursts before becoming burnt out again is that I end up caring about as much as the ISU seems to, which is to say: not much.

Humans are imperfect and will make technical calls and judging decisions I disagree with, which is fine. But the ISU has set up a system where the dominos start falling as soon as a skater hits the ice: the tech panel has worse angles than your average broadcast and makes the calls it feels like making, judges award positive GOEs to elements that should have been reduced in value or given a mandatory deduction, TES inflates, PCS inflates in part to keep up with TES... I feel like deleting my account.
 

rocketry

Rinkside
Joined
Mar 30, 2006
I think the biggest problem is corridor judging and the tendency to use PCS as proxy marks for "who we think should be placed in what order." I don't know how that gets changed with any structural changes.

I think it would be really interesting and technologically feasible these days if judges actually had quantified information on hand about some of the things you're mentioning like number of crossovers, amount of choreographic content in time, transitions into each element, etc. If the technical panel gets to review jumps, why shouldn't the PCS judges be able to review certain things? I don't think these quantifiable things should be the end all in whether or not someone deserves PCS marks, but I wonder how judges would actually react if they saw they were about to give a program that's 75% jumps and crossovers into jumps 9s.
 

BillNeal

You Know I'm a FS Fan...
Record Breaker
Joined
Jan 10, 2014
I am really starting to reminisce about the days when we got performances like Lambiel's Poeta, Buttle's Ararat, and Takahashi's Blues for Klook at their respective World Championships. Those skaters got mostly 7s and 8s back then. These programs don't have the technical difficulty of today's, but they would deserve over 10s in PCS the way the men's PCS are now given out.
 

Metis

Shepherdess of the Teal Deer
Record Breaker
Joined
Feb 14, 2018
I think the biggest problem is corridor judging and the tendency to use PCS as proxy marks for "who we think should be placed in what order." I don't know how that gets changed with any structural changes.

I think it would be really interesting and technologically feasible these days if judges actually had quantified information on hand about some of the things you're mentioning like number of crossovers, amount of choreographic content in time, transitions into each element, etc. If the technical panel gets to review jumps, why shouldn't the PCS judges be able to review certain things? I don't think these quantifiable things should be the end all in whether or not someone deserves PCS marks, but I wonder how judges would actually react if they saw they were about to give a program that's 75% jumps and crossovers into jumps 9s.

I still don’t understand why SS and TR are in PCS. Well, yes, I do, as TR is actually very much a “programme component,” but two of the five marks can be objectively and quantifiably measured to a reasonable degree, and the other three are highly subjective. (And it should be theoretically impossible to max TR by definition.)

Two of these things are not like the others, two of these things (probably) do not belong....
 

gkelly

Record Breaker
Joined
Jul 26, 2003
I still don’t understand why SS and TR are in PCS. Well, yes, I do, as TR is actually very much a “programme component,” but two of the five marks can be objectively and quantifiably measured to a reasonable degree, and the other three are highly subjective. (And it should be theoretically impossible to max TR by definition.)

Because the TES is "technical elements score."

Program components are scores for various aspects of the program as a whole.

Skating skill is an aspect of the program as a whole, not an element.

As of now skating skill is judged qualitatively, not quantitatively, which also makes it more like the other program components than like elements.

However, if there were to be a division of labor between technical judges and performance judges, for example, then I would recommend that the Skating Skills component would be judged by the same judges who judge GOEs for the elements rather than by the judges who judge the more artistic components.

Maybe someday certain aspects of the Skating Skills mark could be measured quantitatively: measuring average and top speeds throughout the program, measuring the average depth of edge, counting the number of crossovers and turns of each kind in each direction and the amount of time spent on one foot vs. two feet, etc.

But there are still aspects of the Skating Skills component that really can't be measured and have to be evaluated qualitatively: e.g., "soft knees," effortless flow, balance, etc.

Similarly, it would be possible to simply count the total number of different transitional moves and the number of different kinds of moves and maybe the amount of time or the number of simple steps/strokes between each transitional move and the element that precedes or follows it. It might also be possible to assign levels of difficulty to specific types of transitional moves or ways of connecting in and out of elements. That would take care of the "Variety" and "Difficulty" criteria. But you can't count or measure the "Quality" criterion -- by definition that needs to be evaluated qualitatively. And the same with "Continuity of movement from one element to another." Two skaters can do the exact same moves between elements but one can make the movement look continuous and another can make each move look separate and the movement as a whole more choppy.
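The countable side of this would be straightforward to prototype once the raw data existed. A hypothetical sketch of the kind of skating-skills summary described above; the per-frame speed and foot-count data are entirely invented, since no such feed exists today:

```python
# Hypothetical per-frame tracking data: (speed in m/s, feet on the ice).
frames = [
    (4.2, 1), (4.8, 1), (5.1, 2), (5.6, 1), (5.0, 1), (3.9, 2),
]

speeds = [speed for speed, _ in frames]
avg_speed = sum(speeds) / len(speeds)
top_speed = max(speeds)
one_foot_share = sum(1 for _, feet in frames if feet == 1) / len(frames)

print(round(avg_speed, 2))       # 4.77 m/s average speed
print(top_speed)                 # 5.6 m/s top speed
print(round(one_foot_share, 2))  # 0.67: two-thirds of frames on one foot
```

Even a crude summary like this would give judges the kind of self-check discussed in the thread, without pretending to replace the qualitative criteria.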
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
I still don’t understand why SS and TR are in PCS. Well, yes, I do, as TR is actually very much a “programme component,” but two of the five marks can be objectively and quantifiably measured to a reasonable degree, and the other three are highly subjective. (And it should be theoretically impossible to max TR by definition.)

Two of these things are not like the others, two of these things (probably) do not belong....

If you read the ISU's explanations of PCS carefully, you will see that you can quantify all of the components to some degree, not just those two: CO by looking at the skater's pattern on the ice and the variation in the elements, IN by how closely the skater is skating to the beats of the music, PE by the skater's expression and the public's reactions, to give just a few possible examples. Judges do indeed count many things during the program in every PCS category. They just don't base their marks solely on that.
 

Metis

Shepherdess of the Teal Deer
Record Breaker
Joined
Feb 14, 2018
Skating skill is an aspect of the program as a whole, not an element.
I agree it’s not TES, but I’m not sure SS belongs in PCS, as what’s captured there is more a measure of a skill that takes time to build, not something specific to a given programme or performance. Moreover, it should be the component with the least variance over a short time, and it shouldn’t move up or down with TES or the other component marks, yet that doesn’t seem to be the case. (I vaguely recall someone mentioning data on which components move together, but I haven’t turned up that study myself.) Skating skills absolutely can improve over the course of a season, but measuring them in PCS honestly seems to me the worst of both worlds: if you do improve but have a low PCS floor, you won’t be rewarded, and if you have weaker skating skills but your PCS is being inflated for any number of reasons, you’ll be rewarded there regardless of whether you deserve to be. Which is nonsense. Of the five components, SS and TR strike me as the easiest to measure, whereas IE, CO, and PE can lead to genuine debates in which all sides are arguing in good faith.

However, if there were to be a division of labor between technical judges and performance judges, for example, then I would recommend that the Skating Skills component would be judged by the same judges who judge GOEs for the elements rather than by the judges who judge the more artistic components.
I agree. I would also possibly have TR marked by the same panel, but regardless, I think SS should be separated from the other components and evaluated in the context of technical elements, not programme components.

Maybe someday certain aspects of the Skating Skills mark could be measured quantitatively: measuring average and top speeds throughout the program, measuring the average depth of edge, counting the number of crossovers and turns of each kind in each direction and the amount of time spent on one foot vs. two feet, etc.
For the record, I don’t endorse a purely quantitative measure of SS or TR. But I do think quite a bit would change if judges were given more self-checking mechanisms (such as total number of transitional elements in the programme, or a mandatory replay of the connecting steps into the solo jump in the short — literally anything to make them think before settling on a number that “seems fair” relative to what they’ve given or plan to give in other categories).


If you read the ISU's explanations of PCS carefully, you will see that you can quantify all of the components to some degree, not just those two: CO by looking at the skater's pattern on the ice and the variation in the elements, IN by how closely the skater is skating to the beats of the music, PE by the skater's expression and the public's reactions, to give just a few possible examples. Judges do indeed count many things during the program in every PCS category. They just don't base their marks solely on that.

I’ve read ISU’s scoring rules many times. I don’t think they translate into an acceptable research proposal from an undergraduate, but I also studied and used mixed methods research (quantitative and qualitative) and come from the social sciences. I accept that my standards are probably somewhat high, but let’s just take a look at CO, for example: “An intentionally developed and/or original arrangement of all types of movements according to the principles of musical phrase, space, pattern and structure.”

I’d submit that’s not a falsifiable statement — is there any programme that isn’t intentionally developed with regard to the music, the principles of three-dimensional space, and around some concept? The subsequent guidelines aren’t quantifying anything; they’re an expansion of the thesis (“What Is Composition”). Which leaves the following questions unanswered:
1. How important is representation of the elements? (Toe versus edge jumps, for example.)
2. How important is thoughtful distribution of said elements? Is three toe jumps in quick succession a problem in programme composition? Two spins back to back?
3. If a programme is fully front- or backloaded, but that is done as an artistic choice and is truly novel, and moreover the distribution of the elements is absolutely perfect with the music, is that a strength or a weakness in composition? (Only one of those cases would be able to offset a deduction in TES.)
4. If I can see that a skater’s layout is clearly built around min-maxing their points but it also is a masterclass in what programme composition should be, can I bang my head against a wall to get out of this paradox?

IE is even worse. There’s also the issue of what constitutes “interpretation” and whose music. Part of what made me actually care about Hanyu was Seimei — and I’m not Asian, but I am non-Anglo, and my culture has been through the cultural appropriation wringer on ice enough, thanks. I honestly do not have it in me to explain how freaking strong he had to be to take his starting position and make a gesture he knew the majority of the judges wouldn’t understand. It wasn’t for them, and that was... maybe the boldest move I’ve seen anyone make in FS, at least since the Legendary Backflip Freakout. I don’t think there’s any denying Hanyu interpreted the hell out of his music, but he also incorporated nuances and movement that aren’t in the FS lexicon, aren’t what anyone had in mind when writing the PCS guidelines, and frankly could have been either rewarded or punished by the judges because it wasn’t an interpretation built on their understanding of “this is what IE is” and he certainly didn’t conform to their notions of what an “artistic” Asian figure skater should be. That’s not quantifiable or written down by ISU, but when we get to the tenth Carmen of the night — I’m groaning as much as anyone, but there’s a reason these things occur. IE, PE, and CO are so vague that I believe reasonable people can make good faith arguments for any number of scores under ISU’s current language.

Also, I think we have very different definitions of quantitative and of what it means to quantify something. A quantitative measurement of IE would assess how many elements and movements were performed in time with prominent musical cues (or, for ID, how well a pair keeps the beat), the skater’s speed in relation to the music’s tempo (do they slow and end a spin as the music slows and fades?), etc. I don’t endorse this, but that’s what an actual quantitative approach would be. For CO, you would first look at the elements and their distribution, and award or deduct points based on whether or not they had “intention,” whether or not the skater made good use of multidirectional space, and how other movement related to the programme’s concept. It would take a panel, and you’d have intercoder reliability issues (variance between scorekeepers). But that would be an actual quantitative measurement.
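To show how mechanical such a measure would be, here is a minimal sketch of the cue-alignment idea: the share of elements executed close to a prominent musical cue. The function name, timestamps, and tolerance are all invented for the example.

```python
def cue_alignment(element_times, cue_times, tolerance=0.5):
    """Fraction of elements executed within `tolerance` seconds of
    some prominent musical cue."""
    hits = sum(
        any(abs(element - cue) <= tolerance for cue in cue_times)
        for element in element_times
    )
    return hits / len(element_times)

cues = [12.0, 45.5, 78.0, 110.2]        # accents in the music (seconds)
elements = [12.3, 50.1, 77.8, 109.9]    # when each element was hit

print(cue_alignment(elements, cues))    # 0.75: three of four elements on cue
```

A number like this says nothing about whether the choreography is any good, which is exactly the limitation of purely quantitative methods being argued here.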

You can easily learn nothing (or worse, find a false result) employing quantitative methods when qualitative ones were called for, and simply performing some of the most basic types of quantitative analysis (such as a regression analysis) is about as useful as the underlying data at best if you aren’t adhering to the scientific method. I’m a positivist — I believe in science and reality, and I also was a structural theorist before I became ill, and I joked I was fluent in jargon. That being said, if your model is built on bad assumptions, it’s not superior to a solid case study (cough, rationalists, cough). Or, you know, people could just use mixed methods and the world would be a better place.

One of my many methods courses ended with all of us concluding we couldn’t necessarily define what precisely made good research, but we all knew it when we saw it. I feel the same way about ISU and judging.
 

Izabela

On the Ice
Joined
Mar 1, 2018
Is the system really the problem? Or is it the judges, and the fact that the ISU doesn't try to maintain the competence of the judging panel? CoP has been here for more than a decade, and it's only recently (the last 4 years or so) that we've seen this ridiculous rise in PCS. I don't think the ISU has to go back to the drawing board and overhaul the system when the problem lies with the people who apply, interpret (abuse?) the system. Changing some parts of the scoring system is, IMO, just a patchwork solution. If the ISU really intends to make the scoring system truly reliable, they should focus on a much better way of training their judges and tech panels.
 

xeyra

Constant state
Record Breaker
Joined
Jan 10, 2017
Is the system really the problem? Or is it the judges, and the fact that the ISU doesn't try to maintain the competence of the judging panel? CoP has been here for more than a decade, and it's only recently (the last 4 years or so) that we've seen this ridiculous rise in PCS. I don't think the ISU has to go back to the drawing board and overhaul the system when the problem lies with the people who apply, interpret (abuse?) the system. Changing some parts of the scoring system is, IMO, just a patchwork solution. If the ISU really intends to make the scoring system truly reliable, they should focus on a much better way of training their judges and tech panels.

Sometime during 2015, Lakernik gave an interview in which he said judges were being too lenient with the ultra-C elements, from quads to throw quads; that is, they were forgiving flaws in quads through GOE that they wouldn't forgive in triples. And this is noticeable in the fact that you hardly ever see triples get full GOE in the men's field, outside of 3As, no matter how many transitions in or out you have on those elements, while a quad can get random +3s thrown at it even if it barely meets 2 bullets.

As TES climbed higher and higher with these ultra-C elements, PCS also started increasing, perhaps because the judges who got more lenient on the quads also became more generous with their PCS scoring, and possibly also because they were trying to close the gap between TES and PCS. This leniency, of course, seems to apply more if you're from a big federation... which is why small-time skaters who don't do steps before a solo quad in a SP can get dinged much more harshly than big-fed skaters. The harder the quad, the more lenient the judging, even.

The ISU has done basically nothing to address the issue with judging, though. Their solution is to change the system, not the judging itself; but their judges will still be the same, with the same fed interests and the same leniency.
 

Harriet

Record Breaker
Joined
Oct 23, 2017
Country
Australia
Is the system really the problem? Or is it because of the judges and how ISU doesn't try to maintain the competence of the judging panel?

That's generally how grade inflation (or in this case score inflation) happens: the problem is not with the system so much as with the people applying it being inadequately trained, insufficiently supported, and under external sources of pressure that have an influence on their continued employment. Though the situation isn't helped by having inadequately/informally defined assessment criteria passed on within a closed community, which from what I've read on this site certainly seems to be the case with the IJS, particularly with PCS.

They really do need to hire a good project manager and invest a bit of time, money and effort into this. It wouldn't even take that much (work, or, more importantly for the ISU, money) to get the IJS and judge training sorted out. Grade/score inflation is not an insoluble problem; we've been solving it in the education field for upwards of twenty years now (including in PE and performing arts education), though admittedly it gets harder when the problem's been let run rampant for a while. And neither is external pressure on judges: gymnastics found a way to get that sorted out, at least with regard to bloc scoring, and to be honest did rather better at it than education. Figure skating can too.

It really is starting to look to me like the ISU either can't recognise the actual problems they're faced with, or doesn't really want to get them sorted.

I’ve read ISU’s scoring rules many times. I don’t think they translate into an acceptable research proposal from an undergraduate

:laugh: I'd grade them 'Revise and Resubmit', at best.
 

Baron Vladimir

Record Breaker
Joined
Dec 18, 2014
^^^
Well, I agree that PCS could be made more objective, and the ISU could make judges follow their PCS guidelines more strictly, defining PCS as rules which must be followed rather than mere guidelines, so that in the end we'd get a more objective system and more objective judging. But I think we would then face a bigger problem: if everyone followed strict written rules about what a high-scoring program should look like, every program would basically look the same, and we might lose the general watchability of the competition. So the question is: do we watch figure skating more for our enjoyment of the competition in general, or to see who the objective winner is (declared by objective judges based on objective PCS rules)? Because I don't think the point of a competition is just to declare a winner, but also the public's enjoyment of it, I understand why those rules are not so strictly written. I think the ISU follows the same logic. That's why we don't have figures or compulsory dance any more (events where you could judge skaters more objectively, but which the general public didn't enjoy watching)...
 

xeyra

Constant state
Record Breaker
Joined
Jan 10, 2017
^^^
Well, I agree that PCS could be made more objective, and the ISU could make judges follow their PCS guidelines more strictly, defining PCS as rules which must be followed rather than mere guidelines, so that in the end we'd get a more objective system and more objective judging. But I think we would then face a bigger problem: if everyone followed strict written rules about what a high-scoring program should look like, every program would basically look the same, and we might lose the general watchability of the competition. So the question is: do we watch figure skating more for our enjoyment of the competition in general, or to see who the objective winner is (declared by objective judges based on objective PCS rules)? Because I don't think the point of a competition is just to declare a winner, but also the public's enjoyment of it, I understand why those rules are not so strictly written. I think the ISU follows the same logic. That's why we don't have figures or compulsory dance any more (events where you could judge skaters more objectively, but which the general public didn't enjoy watching)...

I guess that's why they want to separate figure skating into a technical program and an artistic program. But then we run into the problem of even more subjective judging in the latter type of program...
 

Metis

Shepherdess of the Teal Deer
Record Breaker
Joined
Feb 14, 2018
It really is starting to look to me like the ISU either can't recognise the actual problems they're faced with, or doesn't really want to get them sorted.
That’s the ultimate problem. I have no doubt the ISU is aware of the various issues; the exploitation of the trimmed mean was part of the Yuna-Adelina saga (and that was back when one score was discarded at random, good times), and, well, I don’t believe in conspiracy theories, but the ISU obviously has no problem with failing upwards, as you can see from Lakernik’s current job title. At the same time, I have no doubt that we’ll never see a system that meaningfully reduces the amount of agency judges have and their ability to rank-order the podium in some way. The whole system could be transparent, but it’s not, and that’s not an accident. You know things are bad when an economist has to write a paper proving that things are worse under CoP than under ordinals in terms of bias and corruption just to get some actual attention on the issue. (Oh, an economist has blessed us with simple statistical analysis? Everyone shut up and listen to the white guy. j/k, economists are rational actors too.)

So to answer the question someone asked: no, none of this is new. It’s arguably more obvious and relevant now that PCS has gone up and functionally “maxed out” for some skaters. Simply changing how PCS is factored would bring PCS’s value up relative to current competitive TES (i.e., both scores would carry roughly equal weight in the total score) without mandating inflation in PCS.
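The arithmetic of refactoring is simple. A sketch with invented round numbers (the real factors and score levels vary by discipline and segment, so treat these purely as illustrations):

```python
def total_score(tes, raw_pcs, factor):
    """Segment total, ignoring deductions: TES plus factored PCS."""
    return tes + raw_pcs * factor

tes = 120.0      # hypothetical top free-skate TES
raw_pcs = 46.0   # hypothetical raw component sum (out of 50)

print(total_score(tes, raw_pcs, 2.0))   # 212.0: PCS worth 92, well under TES
print(total_score(tes, raw_pcs, 2.5))   # 235.0: PCS worth 115, nearly on par with TES
```

Raising the factor rebalances the two scores without requiring judges to inflate a single raw component mark.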

The backhalf bonus, not PCS, is what’s killed originality in programme composition, in my opinion. I’m actually agnostic on fully front- or backloaded programmes as an artistic choice: done well, I have no problem with either, and I’m inherently against any rules that constrain originality. However, with the current backhalf bonus being what it is, min-maxing TES is extremely important, and it’s killing the ability to build a programme from the ground up rather than around a skater’s min-maxed layout. This may change when we see the full details of the new scoring system, but regardless, I think it’s impossible to talk about CO without acknowledging the elephant in the room that is the 10% bonus.

I do think it’s worth asking what kind of variance people want and/or find acceptable. I understand the frustration with corridor judging, but what do we actually think is a reasonable range of variance in scoring within one component as well as across components? I’m genuinely curious; I haven’t thoroughly assessed the data but my hunch is that both are decreasing (or maybe this year’s WC panel was just really in sync while last year’s was “mom and dad are fighting now” — like I said, haven’t crunched the numbers). Is, say, 2 points between SS and PE (7 in SS, 9 in PE) crazy? Is a 9 in PE dependent on SS to some extent, such that it isn’t possible? What about a two point spread between SS and CO? If we want out of the corridor, we’re going to need to open some new rooms, to make a bad joke.

How much variance is good variance within a given mark? More to the point, pretending we aren’t stuck with a trimmed mean, how do we want to use the marks from the panel? A mean is the most “representative” of everyone’s views, but it is also the most volatile and the easiest measurement to skew. Some form of modal value is typically close to the mean and would reward consensus, but it would not represent the range of values present (assuming there’s a range to represent); on the other hand, you can’t skew a mode with an outlier the way you can skew a mean. Or we could take the median, which is arguably the happy middle ground between the two, though not all of the incentive to inflate would be gone.

Still, any takers? I’m honestly curious how people would make use of the raw marks, as on their own they mean (not a pun, sorry) very little, and we could change how we use them to prioritize different things. Some form of modal score, with 4/7 identical values required for a clear final mark, is one option: it rewards consensus (if at least four judges land on the same number, that’s the score), it’s unlikely you’d get that level of agreement unless people were actually honest (dishonesty/inflation does you no good here), and it actually makes perfect 10s feasible... but it doesn’t show any actual range, so anyone outside the consensus isn’t represented at all, making it arguably “unfair,” or at the very least not a “true” representation of the scores. (It isn’t! And there are only seven data points, so a bimodal distribution isn’t going to help!) But who said that’s what we were prioritizing, when averaging scores hasn’t necessarily created a system that shows any meaningful range in values either?
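For concreteness, a sketch of that consensus-mode idea (a hypothetical rule invented for this discussion, not any ISU procedure), falling back to a trimmed mean when four judges don't agree:

```python
from collections import Counter

def panel_score(marks, consensus=4):
    """Return the modal mark if at least `consensus` judges agree on it;
    otherwise fall back to an ISU-style trimmed mean."""
    value, count = Counter(marks).most_common(1)[0]
    if count >= consensus:
        return value
    kept = sorted(marks)[1:-1]   # drop one high, one low
    return round(sum(kept) / len(kept), 2)

print(panel_score([9.0, 9.25, 9.25, 9.25, 9.25, 9.5, 9.75]))  # 9.25 (consensus)
print(panel_score([9.0, 9.0, 9.25, 9.25, 9.25, 9.5, 9.75]))   # 9.25 (fallback)
```

Under a rule like this, honest convergence is the only reliable way to set the score, but as noted, any judge outside the consensus simply isn't represented at all.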

Part of why I’m curious is because I wonder if this isn’t a bit of an Ouroboros — we’d all know a good system and good judging when we saw it, yet setting down markers declaring “no more than X amount of variance” or “you must be within Y standard deviations” feels like creating the problem and not the solution? Or maybe that’s just me. I just wish there was less discretion in TES/GOEs, and maybe a bit less in PCS (though that could be solved by moving at least SS out of PCS).

:laugh: I'd grade them 'Revise and Resubmit', at best.
You are a far kinder person than I am... this goes in my dumpster-fire hall of fame, along with “Georgia is a Middle Eastern country,” “Japan was an Allied power during WWII,” and other greatest hits. Honestly, someone really should teach this as a model of what not to do; not my field, but talking about the ISU’s methodology failures leaves me at a loss.
 