Multiplicative PCS scoring?

Skatesocs · Aug 29, 2020

Mathman said:
At the very top level a really fine performance might get 9.5, while a blah one (by a top-ranked skater) will get 8.5. This gives the first skater an 8 point advantage, which would increase to 10 with the higher factoring. So about an extra 2 points gain for the artist trying to catch the technician.

I doubt that much would change. This proposal would give the judges 20 gradations to work with instead of the current 40. The argument has been made, how can a judge possibly decide whether a performance is worth 4.75 points or only a piddling 4.5, and the same argument would question whether a judge can consistently discriminate between a performance worth 9 points out of 20 compared to one that deserves only 8 points out of 20, in musical interpretation, say. (Psychologists and learning specialists assert that 7 gradations is the most that humans are capable of handling -- that's why GOEs from -3 to +3 were so cool. A typical human mind can tell the difference between a +1 performance and a +2 on a scale from -3 to +3, but that same person cannot tell the difference between +2 and +3 on a scale from -5 to +5.)
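The 8-point and 10-point figures in the quoted post can be reproduced with a short sketch. It assumes five program components and a factor rising from 1.6 to 2.0. Neither number is stated in the post, but together they give exactly the quoted gaps. The function name `pcs_gap` is made up for illustration.

```python
def pcs_gap(mark_a, mark_b, n_components=5, factor=1.6):
    """Point gap when every component mark differs by the same amount.

    n_components and factor are assumptions, not values given in the thread.
    """
    return round((mark_a - mark_b) * n_components * factor, 2)


gap_current = pcs_gap(9.5, 8.5, factor=1.6)  # assumed current factor
gap_higher = pcs_gap(9.5, 8.5, factor=2.0)   # assumed higher factor
print(gap_current, gap_higher)
```

Under these assumptions the artist gains 2 points from the higher factoring, as the post says.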

It's why I'm trying to think of how to make sure the differentials in scores in 6.0 would matter along with ranking. I think every single thing about this system is susceptible to time and psychology - and I think ranking things would make it easier? But I get the feeling I'll just be describing a CoP with better judging :slink:

I will have the excuse of having taken my last economics course in high school though.

Skatesocs · Aug 29, 2020

gkelly said:
ETA:
Oh, or did I misunderstand, and you don't want to allow individual judges to use decimal places at all in the 20-point scale? In that case, ignore everything I just said and refer to Mathman's post above.

You misunderstood, yes. It's because of what Mathman points out - people are bad at finer differentiation - that I proposed that. He mentions a seven-point scale that works best - I've not read that bit of research, but it fits with humans being bad at that fine differentiation.

It's also partly why I feel it's just too many things and too much arithmetic in too little time. All you'd need to do in a ranking system is say "she is first. She is second. She is third"...

gkelly · Aug 29, 2020

Skatesocs said:
It's also why I feel it's just too many things and too much arithmetic in too little time. All you'd need to do in a ranking system is say "she is first. She is second. She is third"...

For the individual judges, skating had a ranking system for approximately a century, with various rule changes along the way.

I understand best the way it worked 1981 through 2002, or better 1991 through 2002 (once figures were out of the picture).

It's very simple for each judge to keep track of how they're ranking 2-6 skaters.

Keeping track of 30+ skaters in the same event is a whole different level of difficulty, which is why the placeholder scores and within-judge tiebreakers were necessary, and even so judges sometimes ran out of room and weren't able to slot later skaters in where they thought they deserved, or ended up tying skaters inadvertently. Hence the need for judges to "leave room" between and above earlier skaters, which affected how high they could score a top skater performing early in the draw.

There are also the issues that
*there is next to no information contained in the scores to indicate what each judge was thinking in coming up with those numbers,
and
*it's easy for judges to focus primarily on a few pet criteria or even on who they (consciously or unconsciously) hope will win when ranking the skaters, while ignoring other aspects of the performance

And that's not even getting into the complexities of the number crunching from the accountants' side of things, for each segment phase individually and for a competition as a whole. It is complex. Even a professional mathman has trouble keeping straight the distinctions between ordinals and factored placements and overall rankings/standings. (An experienced skating judge or accountant would be more familiar with how these terms were used in skating and less distracted by use of similar terms in other contexts.)

IJS is complicated both for judges and accountants (and tech panels, and coaches/skaters and fans) in different ways. Simplicity is certainly not its strong point.

However, it's important to note that the judges' (and tech panels') tasks in scoring performances under IJS are NOT to rank the skaters. They're just supposed to evaluate the skating on its own merits, according to the rules and their understanding of what makes good skating, in many specific areas.

In IJS the whole judging panels and technical panels in conjunction with the scale of values come up with total scores for each program that can be ordered numerically to produce rankings of which skater scored highest, second highest, etc., overall, and therefore overall results for the competition. But these scores also contain additional information that is more informative than 6.0-style scores could ever be, and that therefore can also be used for additional purposes.

If you think that a ranking system would be simpler for judges, give it a try yourself with 6 skaters you have no vested interest in or knowledge of official results. You could do a great job keeping in mind everything you know about skating. Or you could just count jumps and come up with a number for how well you liked the performance artistically. We'll never know how well you did your job based just on the scores. If some skaters are close in ability, you might need to make some tough choices between them. But once you decide, you should easily be able to assign scores to indicate how you ranked this handful of performances.

Then challenge yourself more by using this system to rank a 12-skater event, then 18, then 24, and then a short program with 30-40 skaters in randomly seeded skate order.

Then try the same process with similarly sized groups using IJS. Do NOT try to rank the skaters. As an IJS judge, you do not have knowledge or control of all the variables. Just evaluate each element and each component according to the guidelines and according to your knowledge of skating rules and technique.

Which system do you feel more confident with as a judge?

Then take off your judging hat and start thinking like an accountant.

Take a protocol from a large 6.0 competition phase (one whole SP or FS) in which there was significant disagreement among judges. How do you figure out the rankings of the panel as a whole? Do you use the majority system as it was in use for most of the 1980s and 90s, or do you use the OBO system from the turn of the century? Or do you have a better suggestion to offer?

(Don't worry about how to combine results from multiple competition phases. That's much simpler in comparison.)

Mathman · Aug 29, 2020

Skatesocs said:
I've not read that bit of research, but it fits with humans being bad at that fine differentiation.

The classical elementary experiment goes like this. Show the subject 7 rods of different lengths, ordered shortest to longest (or in seven shades of color from dark red to light red, or whatever). Now take away the models and select one of them to present to the subject by itself. Which one is it, the shortest, the second shortest, ..., the longest?

If there are 6 in total, everyone gets it right almost all the time. If there are 7, some people are better than others and most people get it right a pretty good percentage of the time. If there are 8 or more, practically no one performs well.

In contrast, if you give someone a thousand rods of ever so slightly different lengths, and the task is to place them from 1 to 1000 in order of length, allowing the subject to compare them two by two (this one is longer than that one), then everyone can do it and get the entire sequence right.

Applied to figure skating judging, this would argue in favor of rankings. When one skater performs and you have to decide whether that performance was an 8.25 rather than an 8.50 (choose one out of 40 categories), you are up the creek. But if every time a skater goes the judge says, that was better than the skaters that I so far have ranked worst, second worst, and third worst, but worse than the skaters that I have so far ranked best, 2nd best and 3rd best -- that, in principle, is no problem. She is better than the skater who I have in fourth place but worse than the skater who I have in third. Except for not being able to remember the details of each skater's performance all day long, this requires only two head-to-head comparisons.
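The ranking-by-comparison idea can be sketched as inserting each new performance into a running order using nothing but "this one is better than that one" judgments. The binary search below is my own choice for locating the slot and is not something described in the post. The two comparisons Mathman mentions are the final checks against the neighbours on either side. The rod lengths and the `is_better` callback are hypothetical stand-ins for a judge's pairwise judgment.

```python
import random


def insert_by_comparison(ranking, newcomer, is_better):
    """Insert newcomer into ranking (best first) using only pairwise comparisons.

    Returns the number of head-to-head comparisons that were needed.
    """
    lo, hi = 0, len(ranking)
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if is_better(newcomer, ranking[mid]):
            hi = mid          # newcomer belongs ahead of ranking[mid]
        else:
            lo = mid + 1      # newcomer belongs behind ranking[mid]
    ranking.insert(lo, newcomer)
    return comparisons


# Hypothetical data: 1000 rods of distinct lengths, presented in random order.
rng = random.Random(0)
rods = list(range(1, 1001))
rng.shuffle(rods)

ordered = []
total_comparisons = 0
for rod in rods:
    total_comparisons += insert_by_comparison(ordered, rod, lambda a, b: a > b)

print(ordered[:3], total_comparisons)
```

The sketch assumes every comparison is made against a perfect memory of the earlier rods, which is the assumption the parenthetical about remembering performances all day calls into question.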

I used to admire 6.0 judging because there were seven categories: 0, 1, 2, ..., 6. (!!!) But I found out that that had nothing to do with it. 6.0 was chosen as a perfect score in free skating because of how figures were judged, to allow the ISU to control the weight given to free skating versus school figures more naturally.

(By the way, the branch of mathematics that applies to deciding whether a skating performance is a 5.8 or a 5.9, or maybe sort of 5.8ish but leaning more toward 5.9 is called "fuzzy set theory." There was a flurry of interest in this subject in the 1970s and 1980s, but it kind of withered on the vine without, in my opinion, accomplishing much.)

It's also partly why I feel it's just too many things and too much arithmetic in too little time.

This, however, is the computer's problem, not the judges'. Once the judge hits that 8.25, his task is done and he can wash his hands of the outcome.

All you'd need to do in a ranking system is say "she is first. She is second. She is third"...

That's all an individual judge needs to do. But if a panel of 9 judges is not in agreement as to who is first, who is second, ..., ay, there's the rub.
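A minimal sketch of why panel disagreement is "the rub": with only three judges and three skaters, head-to-head majorities can already form a cycle, so there is no obvious overall order. This is not the majority system or OBO as skating actually used them, just a generic pairwise-majority count. The judges' rankings are invented for illustration, and an odd number of judges is assumed so that no pair ties.

```python
from itertools import combinations


def pairwise_majorities(rankings):
    """For each pair of skaters, return the one ranked higher by most judges.

    Each ranking is a list of skater names, best first. Assumes an odd
    number of judges, so no pair can be tied.
    """
    skaters = sorted(rankings[0])
    wins = {}
    for a, b in combinations(skaters, 2):
        a_votes = sum(r.index(a) < r.index(b) for r in rankings)
        b_votes = len(rankings) - a_votes
        wins[(a, b)] = a if a_votes > b_votes else b
    return wins


# Hypothetical three-judge panel.
panel = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]
wins = pairwise_majorities(panel)
print(wins)
```

Here A beats B, B beats C, and C beats A, each by two judges to one, so some extra rule is needed to turn the individual rankings into a panel result.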

Mathman · Aug 29, 2020

gkelly said:
Even a professional mathman has trouble keeping straight the distinctions between ordinals and factored placements and overall rankings/standings.

I will never live it down.

It's important to note that the judges' (and tech panels') tasks in scoring performances under IJS are NOT to rank the skaters.

Even so, you once made the following (in my opinion super cool) point about the 8.25s and 8.50s.

Let's say a judge thinks skater A and skater B both deserve somewhere in the 8.5 range for all three of the "artistic" components, Performance, Choreography and Interpretation. But if you held his feet to the fire and forced him to say that he liked A better than B (though as for that, he took the other as just as fair), he could do this:

Skater A: Perf 8.25, Ch 8.50, Int 8.75
Skater B: Perf 8.25, Ch 8.50, Int 8.50

This is a perfect reflection of his ranking preference while at the same time showing that there really isn't any way to choose between the two. And it translates into the barest fraction of a point advantage to skater A. If you asked him, was skater A's interpretation really better? he would say, oh, not really, but I had to put down something, and these marks expressed my overall impression of the two skates perfectly. :yes:
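To put a rough number on "the barest fraction of a point", here is a sketch under stated assumptions: a nine-judge panel, a trimmed mean that drops one highest and one lowest mark, and a component factor of 1.6. The panel marks and the factor are hypothetical, and the official rounding of panel averages is not modelled.

```python
def trimmed_mean(marks):
    """Drop one highest and one lowest mark, then average the rest."""
    kept = sorted(marks)[1:-1]
    return sum(kept) / len(kept)


FACTOR = 1.6  # hypothetical component factor

# Hypothetical Interpretation marks from nine judges. The two panels are
# identical except that one judge's 8.50 for skater B is an 8.75 for skater A.
int_b = [8.25, 8.50, 8.50, 8.50, 8.50, 8.50, 8.50, 8.75, 9.00]
int_a = [8.25, 8.50, 8.50, 8.50, 8.50, 8.50, 8.75, 8.75, 9.00]

advantage = (trimmed_mean(int_a) - trimmed_mean(int_b)) * FACTOR
print(f"{advantage:.3f}")
```

With these made-up marks, the one judge's quarter-point preference moves the total by roughly 0.06 points. If that judge's mark happened to be the panel's highest, it would be trimmed and the preference would not register at all.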

Harriet · Aug 29, 2020

Skatesocs said:
Both TES and PCS are intact - so exactly the same as before.

If the objective is not to improve the amount and/or quality of feedback given to the skater, then why bother?

gkelly · Aug 29, 2020

Mathman said:
In contrast, if you give someone a thousand rods of ever so slightly different lengths, and the task is to place them from 1 to 1000 in order of length, allowing the subject to compare them two by two (this one is longer than that one), then everyone can do it and get the entire sequence right.

Applied to figure skating judging, this would argue in favor of rankings. When one skater performs and you have to decide whether that performance was an 8.25 rather than an 8.50 (choose one out of 40 categories), you are up the creek. But if every time a skater goes the judge says, that was better than the skaters that I so far have ranked worst, second worst, and third worst, but worse than the skaters that I have so far ranked best, 2nd best and 3rd best -- that, in principle, is no problem.

There are a couple of issues with this analogy.

The most obvious is that the length of a rod is one-dimensional, whereas a skating performance is multidimensional (is that what you were referring to by "choose one out of 40 categories"?).

And only one decision needs to be made for each rod.

It might be possible to compare single elements and come up with rankings for, e.g., all the solo double axels in the event or all the laybacks in the event, to choose relatively simple elements.

Well, not really. Because there are different ways in which double axels or laybacks can be good, or not so good. Using a length criterion only, comparable to the length of the rods, it would be comparably simple to determine which axel traveled the furthest in the air, or which layback traveled the least on the ice (centered the best). Ranking the length of jumps would be a pretty similar task to ranking the length of rods.

But ranking the quality of the jumps involves a lot more dimensions than just length. How high off the ice did the skater jump? How fast was each skater traveling across the ice on the takeoffs and landings (which is usually correlated with distance in the air)? How long was the landing edge held? What was the quality of security and flow and lack of skidding or scratching on the takeoff and landing edges? What was the quality of the body position on takeoff, in the air, and on the landing? Was there any extra difficulty added to the element in the takeoff approach, in the air position (e.g., delay), and/or on the landing? Was the element timed or otherwise explicitly connected to the music? Aside from possible qualitative weaknesses such as "poor takeoff," "poor air position," "poor landing edge or landing position," were there any outright errors such as touching down a hand or free foot, stepping out of the landing, falling on the landing, underrotation (and by how much), landing on two feet, etc.?

Even in just ranking double axels, each judge would need to make decisions about all those aspects of the jumps. It wouldn't be just ranking each axel on distance, height, edge quality, positions, etc., but also determining how much better one axel was than another on each quality in order to figure out overall rankings. And that's just assuming that all these dimensions should be weighted equally. There are also value judgments involved whereby either each judge individually or the technical panel or the rules and Scale of Values may decide that some of those dimensions, qualities, and errors are more important than others. Should a small, slow, downgraded, two-footed double axel rank higher or lower than a big fast one landed on one foot followed by a fall? Does remaining upright always trump rotation, or vice versa, and how does length figure in?

If judges are responsible for scoring whole programs, or giving a single score for "jumps," they can make mental or on-paper notes about each double axel and then take as many of those qualities into account, along with the qualities of all the other elements or other jumps, to come up with that single score or ranking for the program or the jumps collectively across skaters.

Multiply that by up to 7 jump elements or 12 total elements, each with a similar or more complex array of dimensions to evaluate, and just comparing "elements" or "jumps" could be 100 times more complicated than comparing the length of a rod or length of one jump.

Or with IJS-style scoring judges can just give a score for each axel based on positive and negative GOE criteria (with the tech panel possibly affecting the base value by rotation calls) and then move on, without needing to compare this axel directly to specific axels in this event, only to generalized standards for what makes a good, better, or flawed double axel in general.

It's possible to compare two rods, or several rods (though not really 1000 simultaneously), side by side to make decisions.

Skating performances take place across time. Therefore, it is never possible to compare even two performances simultaneously.

(Yes, it's possible for fans or officials to make videos comparing two or more elements or whole programs side by side after the fact. That might work for some kinds of comparisons -- except for compulsory pattern dances to the same or identically timed music it wouldn't work for the Interpretation component. But real-life competitions take place in real time with one skater performing at a time.)

You're always comparing to a memory of events that happened between 5 minutes and 5 hours earlier, not to something right in front of you at the same time.

She is better than the skater who I have in fourth place but worse than the skater who I have in third. Except for not being able to remember the details of each skater's performance all day long, this requires only two head-to-head comparisons.

Remembering performances across several hours is far from trivial.

Mathman · Aug 29, 2020

Harriet said:
If the objective is not to improve the amount and/or quality of feedback given to the skater, then why bother?

I am certain that this would be the ISU's response to such a proposal.

In fact, all scoring methods are more or less the same in terms of outcome, their differences being swamped by noisy wuz-robbin'.:yes:

Mathman · Aug 29, 2020

Skatesocs · Aug 29, 2020

Harriet said:
If the objective is not to improve the amount and/or quality of feedback given to the skater, then why bother?

This is a confusing comment. What is the quality of feedback given via the components system?

Mathman · Aug 29, 2020

gkelly said:
I don't understand why one would want to multiply scores for one kind of skills times scores for a different kind of skills. It seems to make as much (little) sense as multiplying the spin scores times the jump scores, or the SP scores times the FS scores. They're measuring separate things. So what would it mean to multiply them?

This objection is brought to the fore if we included units. What do you get if you add 2 centimeters to 2 centimeters? Answer: 4 centimeters.

What do you get if you multiply 2 cm by 2 cm? Answer: 4 square centimeters -- a different kind of animal altogether. (That's why you need one of the two objects to be unit-free, like a percent.)

100 points times .875 points is 87.5 ... square points? But 100 points times .875 (no units) is 87.5 points, which can now be compared with the number of points that other skaters got.

But now a different problem arises. If you say that 100 points (TES) times 87.5 percent is 87.5 points, that is well and good. But why should a skater's score in choreography be measured as a percentage of the sum of the individual element scores, instead of as a score in its own right?

This, by the way, is exactly how GOEs go, right? +5 GOE does not mean 5 points, it means that you get 150% of the base value. +4 GOE means 140% of base value, etc.
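The GOE-as-percentage point can be sketched as follows, assuming the flat 10%-of-base-value-per-step rule the post describes. The function name, the example base value, and the rounding to two decimals are assumptions, and any exceptions in the published Scale of Values are not modelled.

```python
def element_score(base_value, goe):
    """Score for one element when each GOE step is worth 10% of base value."""
    if not -5 <= goe <= 5:
        raise ValueError("GOE must be between -5 and +5")
    return round(base_value * (1 + 0.10 * goe), 2)


base = 10.0  # hypothetical base value in points
plus_five = element_score(base, 5)
plus_four = element_score(base, 4)
minus_five = element_score(base, -5)
print(plus_five, plus_four, minus_five)
```

The GOE itself carries no units. It scales a quantity that is already in points, which is the same role the percentage plays in the multiplicative PCS proposal.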

Mathman · Aug 29, 2020

Skatesocs said:
This is a confusing comment. What is the quality of feedback given via the components system?

I think that feedback refers to this. If you consistently get 7.5 in SS and 8.5 in Choreography, then you need to keep your choreographer but fire your coach. (I mean, you need to work on skating skills and not worry too much about choreography.) It would be more useful to learn that you need to work on power-stroking and effortless acceleration, but your mastery of one-foot skating and ice coverage is fine.

However, you (or your coach) already know this. You're slow as molasses. Personally, I think the question of receiving feedback from the judges is problematic for two reasons. First, it confuses the judge's responsibility with the coach's. Coaches coach; judges judge.

Second, it leans towards an old boy network, where the judge gives tips to the skater about how she can improve her marks. Then next time the judge will favor the skater who followed his avuncular advice and be extra tough on the ungrateful dog of a skater who didn't.

gkelly · Aug 29, 2020

Mathman said:
I think that feedback refers to this. If you consistently get 7.5 in SS and 8.5 in Choreography, then you need to keep your choreographer but fire your coach. (I mean, you need to work on skating skills and not worry too much about choreography.) It would be more useful to learn that you need to work on power-stroking and effortless acceleration, but your mastery of one-foot skating and ice coverage is fine.

However, you (or your coach) already know this. You're slow as molasses. Personally, I think the question of receiving feedback from the judges is problematic for two reasons. First, it confuses the judge's responsibility with the coach's. Coaches coach; judges judge.

Second, it leans towards an old boy network, where the judge gives tips to the skater about how she can improve her marks. Then next time the judge will favor the skater who followed his avuncular advice and be extra tough on the ungrateful dog of a skater who didn't.

I don't think that separate program components scores would lean toward an old boy network in that sense more than 6.0 scores would.

With IJS, you can look at the protocols and see how the panel as a whole and how individual judges scored your Skating Skills compared to your Composition.

If the judges disagree with each other such that they cancel each other out, some preferring one of those components and others the other, then even with some judges using large gaps between your separate components you would still end up with similar averages between components and you don't get much information about what "the judges" in aggregate think of your skills in these areas.

E.g., if one judge gives you a full 1.0 higher for Composition (as it's now called) and several of the others give you 0.25 higher for skating skills, then you know that that first judge thought there was a big difference in favor of Composition but the other judges disagreed and some thought the Skating Skills were stronger. You might want to try to talk to individual judges for more detailed feedback, but talking to the one who gave a clear message with the 1.0 difference would probably be the least helpful in giving you additional information about what to work on -- you know what that judge would say.

If you're seeing a full 1.0 difference between these components in the average of the whole panel, probably with even larger gaps from one or more judges, then you have a stronger message from a consensus of several judges that your SS need work without needing to consult any of them in person.
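The two situations described above, judges cancelling each other out versus a panel-wide consensus, can be sketched by looking at each judge's gap between Composition and Skating Skills and then at the panel's mean gap. The five-judge panels and all marks are invented for illustration, and a plain mean stands in for the official panel calculation.

```python
from statistics import mean


def component_gaps(skating_skills, composition):
    """Per-judge gap (Composition minus Skating Skills) and the panel mean gap."""
    gaps = [co - ss for ss, co in zip(skating_skills, composition)]
    return gaps, mean(gaps)


# Hypothetical panel 1: one judge has Composition 1.0 higher,
# four judges have Skating Skills 0.25 higher.
gaps_split, mean_split = component_gaps(
    [7.50, 8.00, 8.00, 8.00, 8.00],
    [8.50, 7.75, 7.75, 7.75, 7.75],
)

# Hypothetical panel 2: every judge has Composition clearly higher.
gaps_consensus, mean_consensus = component_gaps(
    [7.50, 7.50, 7.50, 7.50, 7.50],
    [8.50, 8.75, 8.25, 8.50, 8.50],
)

print(gaps_split, mean_split)
print(gaps_consensus, mean_consensus)
```

In the first panel the mean gap is zero even though one judge saw a large difference, so the averages alone say little. In the second the mean gap is a full point, which is the stronger consensus message.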

The information that you get from the protocol isn't personal. The judges haven't given you personal tips. Those who tend to use wide ranges of PCS for the same performance probably won't remember exactly which scores they gave you, as opposed to the other hundreds or thousands of skaters they judge every year, unless you are a particularly memorable skater. So if they see improvements next time they judge you in just the areas where they thought you most needed improvement, they're not likely to think "See, you listened to my advice. I'm going to reward you for your wisdom in doing so." (Or "How long have I been sending you messages through my scores that you need to improve your SS? How dare you ignore what my scores are telling you? You should be grateful for those messages. Since you've chosen to ignore them, I will be extra tough on you!")

Skaters can get personal feedback beyond what's in the protocols through critiques in a few different contexts. What I'm aware of would be

*At a nonqualifying competition that offers critiques from a judge and a tech panel member to all competitors as part of the registration fee -- which officials are assigned to critique which events would be up to the referee

*At training camps, seminars, or similar events organized by the federation for skaters already receiving international assignments, or who are considered to have potential, or in some cases who may just sign up and pay for the opportunity

*By coaches inviting a high-level judge to watch the skaters at their home rink and give feedback (perhaps the only situation where an international skater might get feedback from a judge representing a different federation, although input from their own federation's judges would be much more likely)

For the second and third of those contexts especially, the feedback would be much more personal and would be where the "old boy network" effects you mention might come in, much more than in the mere assigning of scores during competition to show up in the protocols. A judge is much more likely to believe that they gave you advice and that you should be grateful and repay the debt by following the advice if they actually talked to you face to face.

These kinds of critiques took place under 6.0 as well as under IJS. And because there was no way to tell from seeing a 5.4 or 5.5 for Presentation in the 6.0 protocols that one judge or the consensus of the whole panel was that your choreography was strong and your skating skills needed work, you'd be much more likely to seek that kind of input from personal critiques.

And judges who give personal critiques are more likely to feel they have done you a service than those who simply communicate through the protocols.

Skatesocs · Aug 29, 2020

Mathman said:
But now a different problem arises. If you say that 100 points (TES) times 87.5 percent is 87.5 points, that is well and good. But why should a skater's score in choreography be measured as a percentage of the sum of the individual element scores, instead of as a score in its own right?

I don't have much to add, but TBH, (as you later say) we are attaching too much explanatory value to these numbers. Even in 6.0, they are basically just ordered pairs, and we attach a hierarchy to them depending on what we want to see. "Rank 1" for "6.0, 6.0" would mean nothing mathematically intuitive (I would think) - but it does mean they were best in both cases and deserved to win - so why does it matter if we multiply these?

Mathman said:
I think that feedback refers to this.

You give a good explanation for this. I'd also like to say that a "9.75" is meaningless without knowing what a "10" is. So when a judge gives a "9.75", what are they giving "0.25" less than? Otherwise, it's just a more unintuitive ranking system, like you point out a bit before with what gkelly said about "skater 1" and "skater 2" - just give "8.75" for the person you liked more. That's what I get out of this currently anyway.

Arbitrary · Aug 29, 2020

Skatesocs said:
Both TES and PCS are intact - so exactly the same as before.

Why would you conclude this?

Any changes in the scoring are initiated to fight someone's unwelcome superiority.

cohen-esque · Aug 30, 2020

Arbitrary said:
Any changes in the scoring are initiated to fight someone's unwelcome superiority.

While ostensibly increasing “someone’s” relative margin of victory! It’s extra insidious because it makes it seem like it would do the opposite of the real goal. A great conspiracy, worthy of the Illuminati! Izxnl has really outdone themselves with this one. :agree2:

silveruskate · Aug 30, 2020

This system is better, but I don't have the answer to how much better.

Take a total score of 160 on the original system.

For TES-PCS pairs adding to 160: Old system --> New system

100-60 --> 75
80-80 --> 80

For any given score on the old system, the new system values parity more. And there are diminishing returns on increasing TES while PCS is lower, whereas on the current system there is no such thing.
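The two rows above can be reproduced with a small sketch. It assumes the new score is TES multiplied by PCS expressed as a fraction of a maximum of 80. The post does not state the maximum, but 80 is the value that yields both 75 and 80. The function names are made up for illustration.

```python
PCS_MAX = 80  # assumption: the maximum that reproduces the figures in the post


def additive(tes, pcs):
    """Old system: total is TES plus PCS."""
    return tes + pcs


def multiplicative(tes, pcs, pcs_max=PCS_MAX):
    """Proposed system: TES scaled by PCS as a fraction of its maximum."""
    return tes * pcs / pcs_max


technician = multiplicative(100, 60)
balanced = multiplicative(80, 80)
# Value of one extra TES point for the skater with the lower PCS.
extra_tes_point = multiplicative(101, 60) - multiplicative(100, 60)

print(additive(100, 60), additive(80, 80))
print(technician, balanced, extra_tes_point)
```

Under this assumption both skaters total 160 in the old system, while the balanced skater leads 80 to 75 in the new one. Each extra TES point is worth only 0.75 points to the skater with 60 PCS, compared with a full point under the additive system.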

kolyadafan2002 · Aug 30, 2020

I don't like this, as it's less accessible to the public and is very easy to manipulate.

A lot of the public stopped watching after 6.0 as they didn't understand the new system; we don't need a more complicated system.

For me, I'd be happy if they went back to -3 to +3

Mathman · Aug 30, 2020

silveruskate said:
For any given score on the old system, the new system values parity more. And there are diminishing returns on increasing TES while PCS is lower, whereas on the current system there is no such thing.

This!

McBibus · Aug 30, 2020

Mathman said:
If a skater does a quad, should that skater automatically get a higher score for musical interpretation than a skater who is equally musical but not as strong technically? Could an artistically exquisite and emotionally moving performance get a 9.5 even though it contains only double jumps (or no jumps at all)? If a skater falls on a jump, to what extent should that reduce the choreography and presentation score? Are we double-dipping if we penalize a technical mistake by lowering the base value, AND giving negative GOE, AND lowering the components across the board?

Here you're addressing a judging problem, not a scoring system problem.
It makes no sense that PCS goes up when TES goes up, but we see it every day.
