Can we use machine learning to score elements | Page 2 | Golden Skate


gkelly

Record Breaker
Joined
Jul 26, 2003
It would take less time for judges to evaluate 10 or so points than the 20+ that they currently have to evaluate on every element.
If they currently score as if "this jump looks like +4", then most of the rules are useless and pointless. Clicking 10 checkboxes is much faster and requires less attention than going through 20 of them in your head or on paper or whatever.

It takes more time to click (accurately) or check boxes on paper than it does to think. The eye and the brain are faster than the hand.

Assume a computer is evaluating speed and distance, and either the computer or a human tech panel or a combination thereof determining jump rotation and takeoff edge, so you as a judge don't need to worry about any of those things.

From your list in an earlier post, take the following as the full number of positive and negative bullets judges would still need to consider for jumps:

>> good take-off and landing
>> effortless throughout (including rhythm in Jump combination)
>> steps before the jump, unexpected or creative entry
>> very good body position from take-off to landing
>> element matches the music
>> Unclear edge take off F/Lz (no sign)
>> Poor take-off
>> Lacking rotation (no sign) including half loop in a combo
>> Loss of flow/direction/rhythm between jumps (combo/seq.)
>> Weak landing (bad pos./wrong edge/scratching etc)
>> Long preparation

Memorize it as closely as you can for the purposes of this exercise.

Do the same for spins and step sequences.

Watch a single's program. In your head, mentally tick off which positive points you want to award and which negative ones you want to subtract for. Do the math in your head and write down or type in your final score for each element. Don't worry about scoring PCS, but try not to take your eyes away from the skater so that you miss as little as possible of what they're doing.

If possible, do this in person sitting relatively close to the ice surface, so that in order to keep your eye on the skater when s/he's at the end of the ice, you would be facing a different direction than your computer screen or paper.

Failing that, try watching the skating on TV with your computer or paper lower and slightly to the side.

How much time were you looking away from the skater to record your marks?

Now make yourself a checklist for each type of element either on your computer or on paper.

Watch another program, and physically touch the list entry for every positive and negative bullet you want to award in real time while watching the performance.

How much time were you looking away from the skater this time?
 

NymphyNymphy

On the Ice
Joined
Aug 26, 2017
Very possible. I'm doing my computer science degree right now, specializing in Artificial Intelligence. It is more than possible with current technology to score skaters purely by machine. You have two options: have the machine analyze the skater purely from visuals (more error-prone), or have the skater wear special equipment that relays information to the system (more precise but more expensive). I don't think the ISU wants to go there, though. Not only would it be expensive, the crowd would not be happy when their favorite skater gets << on many jumps they thought were clean. URs and downgrades would become a common occurrence, as many skaters often UR and << without detection. A machine would detect that right away. Skaters would also face the extra psychological challenge of knowing they are being watched every second by a machine. No hiding URs.
 

DonP

Rinkside
Joined
Nov 11, 2018
The idea seems to me like science fiction, not because it is impossible to apply machine learning to judging, but because there is no practical way for this to happen. Machine learning means that computers "learn" from good examples, in this case from good judges. There would not be enough examples for each element, and not enough people willing to allow the machines to take over.
 

Metis

Shepherdess of the Teal Deer
Record Breaker
Joined
Feb 14, 2018
It takes more time to click (accurately) or check boxes on paper than it does to think. The eye and the brain are faster than the hand.

I come from esports, and while absolute human reaction time is a subject of debate, once you’re down to ~12 frames (1/5th of a second), you won’t find many people arguing that X or Y is reachable on sight. (And there are 19-frame animations that become unreactable due to where the key frames are placed — i.e., the giveaway that differentiates it from another animation comes too late in the sequence to be helpful.) If you’re familiar with an arcade cabinet or a fightstick, then I’m talking about “reacting” to what’s on screen by nudging the stick from a right or left downback/diagonal position to a straight back one on the same side. Once you’re talking touchscreens and the like, yeah, we’re moving from executing seemingly tiny quarter-circle movements, for example, to hitting a screen for various GOE bullets without shifting line of sight.

Somewhat tangentially, TSL had an interesting bit of information that I’d like to check: that if a judge has just awarded a negative GOE, they will subconsciously lower the GOE of the next element (even from just a +5 to a +4) — and this was reported by a judge. If I recall correctly, the judge said something to the effect of “Once you’re in the red, you have to prove you deserve those positive GOEs.” The human brain is messy. Which is why I’m in favour of trying to train AI to deal with technical errors, frankly, although that wouldn’t fix the issue TSL brought up.
 

ssminnow

Rinkside
Joined
Nov 17, 2007
I'm an ML scientist/engineer, and I am very skeptical. Even in tennis, the instant replay system--a simple in-or-out decision--isn't always right. Something as complex as skating moves will be very difficult to train a model for. But even more fundamentally, as someone mentioned earlier, building a good ML model means you need access to a very large training set. Even if you take all the video footage in the history of the sport, we do not have a large enough corpus by orders of magnitude. Moreover, this corpus needs to be labelled by humans, which encodes the very biases you are trying to avoid in the first place. Someday there may be an AI system capable of learning to score skating moves, but I do not believe ML alone will ever be sufficient.
 

gkelly

Record Breaker
Joined
Jul 26, 2003
I come from esports, and while absolute human reaction time is a subject of debate, once you’re down to ~12 frames (1/5th of a second), you won’t find many people arguing that X or Y is reachable on sight. (And there are 19-frame animations that become unreactable due to where the key frames are placed — i.e., the giveaway that differentiates it from another animation comes too late in the sequence to be helpful.) If you’re familiar with an arcade cabinet or a fightstick, then I’m talking about “reacting” to what’s on screen by nudging the stick from a right or left downback/diagonal position to a straight back one on the same side. Once you’re talking touchscreens and the like, yeah, we’re moving from executing seemingly tiny quarter-circle movements, for example, to hitting a screen for various GOE bullets without shifting line of sight.

I'm not referring to reaction time, and I'm not talking about the line of sight to hit a touchscreen while looking at the screen.

Judges should not be looking at their screens while the skater is skating. At most they should look to the screen for 1-2 seconds at a time to input one score per element. Most of the time they're going to be looking at a live skater who might be 100 feet away from the screen while executing the element. If a judge wants to tick three positive and two negative bullet points for any element, they will have to look at the screen (and away from the skater) for several seconds to make sure they're ticking the correct boxes. The more boxes to choose from and the more boxes they need to choose to accurately reflect their evaluation of the element, the longer they will need to look at the screen.

Maybe the solution would be to teach the judges braille or some similar system of codes that can be distinguished by touch, and give them handheld touchscreens with raised dots in different patterns so they can find the correct boxes to tick with their fingertips without having to look away from the skater at all.

Also, would the selection of boxes relevant to element 1 appear on the judges' screens as soon as element 1 is identified (by the tech panel and/or computer), and then be completely replaced on the screen by the boxes for element 2 as soon as element 2 is called? Or would judges be able to input for element 1 and element 2 while the skater is performing element 3, etc.?

What would the input interface be like?

Somewhat tangentially, TSL had an interesting bit of information that I’d like to check: that if a judge has just awarded a negative GOE, they will subconsciously lower the GOE of the next element (even from just a +5 to a +4) — and this was reported by a judge. If I recall correctly, the judge said something to the effect of “Once you’re in the red, you have to prove you deserve those positive GOEs.” The human brain is messy. Which is why I’m in favour of trying to train AI to deal with technical errors, frankly, although that wouldn’t fix the issue TSL brought up.

That is interesting. The solution might be different for unconscious unintended effects vs. intentional decisions. If the ISU really does want judges to evaluate each element independently, then they should train the judges to think of the elements as independent objects and put the previous one(s) out of their minds when evaluating the next one.
 

bobbob

Medalist
Joined
Feb 7, 2014
I'm not suggesting we remove judges entirely. As discussed, I think for jumps and spins we can easily evaluate the number of rotations off the ice, height, speed, type of jump, and things like that (without even any machine learning), and machine learning can detect quality of landing, transition out of the jump, etc. Based on these things, which we would treat as features, we could assign a baseline GOE, which judges would be free to adjust, but solid reasoning would need to be provided.

A valid concern is the size of the training set, and whether it itself will be biased. If we are looking at an element and want to go straight to assigning it a GOE, this may be true. A big source of bias comes from the fact that the bullet points are not followed when judging. If we base the model on the bullet points themselves (i.e., a model should easily tell if a fall happens, and a score is assigned based on that, or tell if there are preceding steps or a difficult entry), this will reduce bias AND reduce the need for a large training set, because each of the bullet points, on its own, is not super difficult to classify.
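As a toy illustration of this feature-based idea (the bullet names and weights here are invented for the example, not the ISU's actual weighting), independently detected bullets could be combined into a baseline GOE that judges then adjust:

```python
# Sketch: rather than one model predicting GOE directly, each bullet point
# gets its own detector; detected bullets are summed into a baseline GOE.
# Bullet names and weights are hypothetical.

def baseline_goe(bullet_flags):
    """bullet_flags: dict mapping bullet name -> (detected: bool, weight: int).
    Positive weights for positive bullets, negative weights for errors."""
    score = sum(w for detected, w in bullet_flags.values() if detected)
    return max(-5, min(5, score))  # clamp to the -5..+5 GOE scale

flags = {
    "good_takeoff_landing": (True, 1),
    "effortless": (True, 1),
    "steps_before_jump": (False, 1),
    "weak_landing": (True, -2),
    "long_preparation": (False, -1),
}
print(baseline_goe(flags))  # 1 + 1 - 2 = 0
```

The point of structuring it this way is that each per-bullet detector is a much smaller learning problem than "predict the GOE", so each needs far less training data.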

For PCS it gets stickier... I think judges would still be needed, though we can still get metrics about speed, transitions, crossovers, etc.

Another idea I had, which is sort of on the opposite end of the spectrum: since the judges mostly look at screens anyway, why can't we have remote judges? Since we would have more manpower this way, it would be possible to scrutinize each element closely.
 

Metis

Shepherdess of the Teal Deer
Record Breaker
Joined
Feb 14, 2018
I'm not referring to reaction time, and I'm not talking about the line of sight to hit a touchscreen while looking at the screen.
I know. My point is that even in esports, where the eyes never look at the data input mechanism, turning what’s observed into a physical input still has an upper limit. And part of that is “mental stack”/cognition, which is going to be much higher when deliberating GOEs (in a world in which we assume there’s no need to look away from the skater to input scores), because in esports, you’re almost always looking for which option of a narrow set of choices your opponent is executing, and key frames and other tells are what allow you to recognize what’s going on faster and as such react faster. Hypothetically, it’s the same issue with edge calls — let’s imagine a judge needs to click a gizmo in their right hand if they see an edge issue. If said judge hyperfocuses on only the edge and all edge-related data, they’d probably be able to make a correct call more often than not quite quickly... at the cost of not noticing anything else about the jump. (Which leaves you with an invisible gorilla/selective attention problem.)

More to the point, what I’m saying is that if judges had to mark each GOE bullet, every bullet requires reacting on sight and turning what’s observed into data. Let’s imagine a touchscreen GOE screen, from left to right, looked like this:
ERROR (automatically flags for review after the programme, so that a jump that has positive qualities but ends in a fall doesn’t force mental arithmetic in the moment), REVIEW (for cases in which a judge wants to check the replay for technical issues, such as a UR or edge violation), and then 0/+1/+2/+3/+4/+5.
The first button is red, the second is yellow, the final six are green. We could even stick a keyword in the + buttons, so that +3 becomes +3 EFFORTLESS.
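A rough mock-up of a screen like that, as a data structure (every label, colour, and action name here is illustrative, not anything the ISU has proposed):

```python
# Hypothetical judging screen: two flag buttons plus six GOE buttons.
# Labels and recorded actions are invented for the sake of the sketch.

BUTTONS = (
    [("ERROR", "red"), ("REVIEW", "yellow")]
    + [((f"+{n}" if n else "0"), "green") for n in range(6)]
)

def press(label):
    """Return the record a single button press would produce."""
    if label == "ERROR":
        return {"action": "flag_for_review", "auto": True}
    if label == "REVIEW":
        return {"action": "queue_replay"}
    return {"action": "goe", "value": int(label)}  # int("+3") == 3

print([label for label, _ in BUTTONS])
print(press("+3"))
```

The design choice being sketched: a single press per element, with the two deferred cases (error, review) bracketed off so they never force in-the-moment arithmetic.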

Assuming this is the input mechanism and that a judge never needs to look at the screen because they’ve trained to rapidly input the information asked (except in review and error cases, which are done after the skate), you’re still mentally evaluating more than you are in esports — and in esports, there’s been a fair amount of work with eye trackers and measuring reaction times to see what the upper physical and cognitive limits are, and what effect the location of focus has on both speeding up and improving reactions. To give an example: this analysis of Hanyu’s 4Lz gives it a flight time of 0.781 seconds, which is a year in esports. But in esports, the decision making is much more streamlined, and so is the mechanism of input. In skating, the mental load is much greater (various GOE criteria, weighing them against each other, etc.). Even if you assumed every judge could hit key positive bullets at the upper limit of human response time, the cognitive work of assessing the 20-odd positive and negative features can’t be done in under a second. Which is how we arrive at heuristics and “that looks like a +3/4.”

I’m not disagreeing with you — imagining a touchscreen approach that works by actual touch cues such as Braille is interesting. Even if we could stipulate that, for the sake of argument, everything can be inputted perfectly in the least amount of time possible, I don’t see how you bring the mental workload down to make marking each bullet possible, as the amount of time spent on cognition is too great (especially in instances of unexpected error or cases that need to be reviewed to check for error — which is why my hypothetical judging screen simply brackets those cases as “REVIEW,” as the time lost on an edge violation or a UR could cause delay in evaluating the next element and the negative bias that may be incurred from inputting a low GOE is also worth trying to control for). It just can’t be done in real time, not as far as I can tell and based on what I know about research on the upper limits of mental processing.

There’s an argument that judges should have to manually check off positive and negative features after the skater finishes to finalize their GOE marks, which is doable with replays and would potentially allow the tech panel to take away specific bullets in the event of various calls. (Especially with an AI model that can determine height and distance on jumps, amount of traveling on a spin, etc., and be set to lock out those bullets based on how strictly it’s been programmed to call flaws.) That doesn’t leave time for PCS, most likely. But it would probably get rid of actual mistakes in protocols, such as the errant +3 GOE on an element that contained a fall and was marked by every other judge as a -3. (Which could also be handled by prompting judges to verify that marks a certain degree out of consensus are indeed what they intended to submit, as human error is a thing.)


That is interesting. The solution might be different for unconscious unintended effects vs. intentional decisions. If the ISU really does want judges to evaluate each element independently, then they should train the judges to think of the elements as independent objects and put the previous one(s) out of their minds when evaluating the next one.
Anchoring bias is a real problem — if your first element is shaky or ends in a fall, then the judges are likely to take hold of whatever negative GOE was inputted as the anchor, even if the rest of the performance is perfect. There’s also the fact that skaters aren’t unknowns: if judges know that a skater routinely falls on a specific jump, URs it, or has an edge violation, human psychology suggests they’re going to be more inclined to find a flaw even in instances in which the jump as performed was free of error. (Though this might cancel out some of the anchoring bias — “Oh, Kolyada always falls on the 4Lz, -3, moving on.”) The case specifically presented on TSL was a jump that ends in a fall causing a spin to be reduced in GOE, even if the spin is deserving of +5 GOE, due to the interruption of the programme and the sense that the skater now has to really “earn” those positive bullets.

I’m not sure you could reliably measure such an effect under the old GOE scale, but with enough data, it may be possible under the current one.
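With protocol data under the current -5..+5 scale, the effect could be estimated roughly like this (a sketch with invented numbers; a serious analysis would need many programs and controls for element quality):

```python
# Sketch of an anchoring-bias check: compare the mean GOE a judge gives to
# the element immediately after a negative-GOE element versus all other
# elements. A positive gap would suggest marks dip after going into the red.

def anchoring_gap(goes):
    """goes: one judge's GOE marks for a program, in skating order."""
    after_neg, other = [], []
    for prev, cur in zip(goes, goes[1:]):
        (after_neg if prev < 0 else other).append(cur)
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(other) - mean(after_neg)  # positive => dip after a negative

print(anchoring_gap([3, -2, 2, 4, -1, 1, 5]))
```

Aggregated over many judges and programs, a consistently positive gap would be evidence for the effect TSL described, though it could also just mean errors cluster within programs, which is why the controls matter.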
 

Neenah16

On the Ice
Joined
Dec 4, 2016
I personally maintain the opinion that it is better to use technology to provide the judges with more information to help them give accurate scores (and be held accountable) rather than have a machine do the scoring.
I think that a lot of people have high expectations when it comes to technology and what it can do because of big companies like Google and their huge machine learning projects, but these companies have resources most of the world doesn't have. For this project to work, you would first need to conduct a full study examining all the different parameters that may affect the system and scoring, have humans annotate the data for accuracy, choose what can be used, design and build the infrastructure that will give the system the required data in real time, and then develop the system itself. This is a lot of work and requires a large investment of time and money, especially for a sport like FS, because of the huge range of variables involved. The system would have to capture very small things, for example a slight change of edge on different ice surfaces, in different lighting conditions, and for different skaters who use different blades and have different skills. That is too many "differents" to consider and capture, and that is only one example.

With proper training, the human brain is actually very efficient at zeroing in on the most relevant details and discarding distractions. The only problem is that we also tend to use our emotions when thinking, which may affect our logical reasoning; hence biased judging. Thus, if we can give the judges as much factual data as possible to help them be more objective, the scoring would be better. We will never have completely objective judging for a sport like FS, but we can have fair judging.

Yes, I am ignoring corruption and judges who deliberately try to influence scores, because I don't think it is as prevalent as many believe. I do believe that most of the issues in judging are due to human psychology and the complexity of the rules. And with better information available, more accountability can be enforced as well.
 

narcissa

Record Breaker
Joined
Apr 1, 2014
One thing technology could easily be used for is reviewing underrotations and edges. Unfair tech calls are 80% of the problem, IMO! The problem is getting the ISU to adopt any changes at all.
 

moriel

Record Breaker
Joined
Mar 18, 2015
I'm an ML scientist/engineer, and I am very skeptical. Even in tennis, the instant replay system--a simple in-or-out decision--isn't always right. Something as complex as skating moves will be very difficult to train a model for. But even more fundamentally, as someone mentioned earlier, building a good ML model means you need access to a very large training set. Even if you take all the video footage in the history of the sport, we do not have a large enough corpus by orders of magnitude. Moreover, this corpus needs to be labelled by humans, which encodes the very biases you are trying to avoid in the first place. Someday there may be an AI system capable of learning to score skating moves, but I do not believe ML alone will ever be sufficient.

Yep, ML totally not good due to those things. I'm all for model based approaches instead (as in "put sensors on blades and measure stuff") that do not use ML at all.
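For what a model-based, sensor-driven approach might look like: a sketch of counting airborne rotations by integrating gyroscope output. All sample values are invented; real sensor data would need calibration and takeoff/landing detection.

```python
# Model-based (no ML) sketch: integrate angular velocity samples from a
# blade- or body-mounted gyroscope to count revolutions while airborne.

import math

def rotations_in_air(omega_samples, dt):
    """omega_samples: vertical-axis angular velocity in rad/s while airborne;
    dt: sampling interval in seconds. Returns total revolutions."""
    total_angle = sum(omega_samples) * dt  # simple rectangle-rule integration
    return total_angle / (2 * math.pi)

# ~0.7 s of flight at a constant 4 rev/s spin rate, sampled at 100 Hz
samples = [4 * 2 * math.pi] * 70
print(round(rotations_in_air(samples, 0.01), 2))  # 2.8 revolutions
```

The appeal of this approach is exactly what's argued above: it's a direct physical measurement with no training set and no labelled examples, so there is no learned bias to worry about.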

- - - Updated - - -

One thing technology could easily be used for is reviewing underrotations and edges. Unfair tech calls are 80% of the problem, IMO! The problem is getting the ISU to adopt any changes at all.

Because well, you know, they got to learn how to use flashdrives first =D
 

gkelly

Record Breaker
Joined
Jul 26, 2003
The simplest technological enhancement that could be adopted any time the ISU wants to commit to it would be to include a second and maybe third camera angle for video reviews at major championships and maybe Grand Prix events in large arenas. There would be some logistical hurdles, but easier to overcome than actual machine scoring. And the scoring system wouldn't need to change for those events -- they would just be making some more information available in the more important events toward confirming the same kinds of calls always made by the same people.

Even without adding a camera angle, showing the official review replays from the one official camera to the audience (on the jumbotron and on TV) would do a lot to improve public confidence in the integrity and competence of the tech panels without changing anything about how they do their jobs.
 

moriel

Record Breaker
Joined
Mar 18, 2015
The simplest technological enhancement that could be adopted any time the ISU wants to commit to it would be to include a second and maybe third camera angle for video reviews at major championships and maybe Grand Prix events in large arenas. There would be some logistical hurdles, but easier to overcome than actual machine scoring. And the scoring system wouldn't need to change for those events -- they would just be making some more information available in the more important events toward confirming the same kinds of calls always made by the same people.

Even without adding a camera angle, showing the official review replays from the one official camera to the audience (on the jumbotron and on TV) would do a lot to improve public confidence in the integrity and competence of the tech panels without changing anything about how they do their jobs.

also, considering the modern technology, they could stream it on youtube and then leave it there >.>
 

VegMom

On the Ice
Joined
Aug 25, 2017
I think it’s absolutely worth trying to create a machine learning scoring system, not because it would eliminate bias itself, but because the process of teaching an AI to judge would expose the current biases.

Machine learning requires input data to learn from. We’d take all the existing videos and scores to ‘teach’ the AI. What we’d get out isn’t going to be less biased than what we put in, but the process would help identify previously unseen or unrecognized bias.
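One simple way that process could surface bias: compare a trained model's predictions against individual judges' marks and look for systematic residuals per judge or federation. A sketch with invented data (a large mean residual is a flag worth investigating, not proof of bias):

```python
# Sketch of bias-exposure via residual analysis: group signed differences
# between human scores and model predictions, then average per group.
# Group names and scores are made up for illustration.

def mean_residual_by_group(records):
    """records: list of (group, judge_score, model_score) tuples."""
    sums, counts = {}, {}
    for group, judge, model in records:
        sums[group] = sums.get(group, 0.0) + (judge - model)
        counts[group] = counts.get(group, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}

records = [
    ("fed_A", 4, 3), ("fed_A", 5, 3.5),  # judge consistently above the model
    ("fed_B", 2, 2.5), ("fed_B", 3, 3),
]
print(mean_residual_by_group(records))
```

Note the caveat from the post above still applies: if the model was trained on biased scores, a zero residual doesn't mean unbiased judging, only agreement with the historical average.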
 

gkelly

Record Breaker
Joined
Jul 26, 2003
Another idea I had which is sort of on the opposite end of the spectrum...since the judges mostly look at screens anyways, why can't we have remote judges?

What makes you think judges mostly look at the screens during the programs (as opposed to replays)?

Maybe the right kind of video or other machine-based measurement would be more accurate for some determinations such as jump height, distance, and rotation.

Actual skating quality cannot be judged nearly as well on video as by direct perception.

So there could be machine-scoring and/or remote judges' scoring of elements, and live up-close scoring of the PCS especially Skating Skills.
 

VegMom

On the Ice
Joined
Aug 25, 2017
It’s great in theory. But first of all, remember that the ISU has, like, no money for anything. Implementing technology is expensive because of equipment. For skaters’ music, many events still require CDs, compact discs!!! I mean, my car doesn’t even have a CD player, and it wasn’t even an available option! Then there’s the cost of developing the tech. Can you imagine if Japan lost interest in figure skating? Everyone would be left competing in abandoned buildings, skating on uneven frozen patches of sewage.

Next, people who work at the ISU are tech illiterates still trying to calculate GOE multipliers with an abacus. Did you watch the ISU web seminars for judges to learn about the new changes in the judging system? That was the most amateurish thing I’ve watched online in a very long time. It looked like a high-school PowerPoint presentation slapped together in the morning by a student who was out very late partying the night before. I'm honestly pleasantly surprised skating protocols are in PDF format, and not calligraphy on parchment paper.

Also, don't forget that the many corrupt officials don't want the option to be corrupt taken away. They will only reconfigure the system and rules to make it seem like they're trying to fix things. Skating rules and scoring have changed a lot over the years. But the one thing that needs to change, judges' ability to unfairly manipulate placements, has not changed one bit.

So, the idea isn't crazy. Thinking it can happen within the next 100 years? Crazy. We would sooner see a quintuple salchow done in the one-arm Biellmann position in the air.

This post WINS the web. Haha haha!!!
 

VegMom

On the Ice
Joined
Aug 25, 2017
It takes more time to click (accurately) or check boxes on paper than it does to think. The eye and the brain are faster than the hand.

[...]
How much time were you looking away from the skater this time?

Anyone who is good at typing, texting, Braille, signed languages, stenography, etc. can tell you that eyes are not necessary for taking notes, scoring, or communicating with your hands. One needn’t “look down at their paper” in order to check boxes. It’s possible to keep your eyes entirely on the skater and take notes at the same time.

I used to use a phone that had bumps on the number pad, different bumps for different numbers. I could easily text anyone without looking at my phone at all.
 

gkelly

Record Breaker
Joined
Jul 26, 2003
Yes, it's possible to write notes on paper or to type without looking down.

Or to press buttons that you can feel and distinguish by position (even more so if they're labeled with different tactile markings).

What would not be so easy to do without looking would be to be presented with an array of 10 or so options and make sure to press/click exactly the 4 options you want to select and not any of the others.

This would be especially difficult if the array is only presented a few seconds after the element is called, only to be replaced onscreen by another array for the next element that the skater is already performing while you're selecting the bullets for the previous one. The options in the next array might be identical to those in the first if they're the same kind of element (two jumps or spins or lifts in a row) or different if it's a different kind of element. Either way, you'd need to verify that you're inputting the bullet points for the element you intend to apply them to and not the subsequent element.

And if 10 or more checkboxes were available onscreen for multiple elements at the same time, the total number would be that much greater and each individual box that much smaller, requiring more visual (or perhaps tactile) attention to make sure you were checking exactly the boxes you intended to check.

Coming up with an interface that allows judges to perform this kind of data input in real time without missing elements when there may be less time between the end of one element and the beginning of the next than it takes to click a handful of boxes accurately would be a challenge that any designers of such a system would have to consider.
 

Shayuki

Record Breaker
Joined
Nov 2, 2013
The idea seems to me like science fiction not because it is not possible to apply machine learning in judging, but because there is no practical way for this to happen. Machine learning means that computers "learn" from good examples, in this case from good judges. There will be no enough examples for each element, and not enough people willing to allow the machines to take over.
Computers can learn from individual rules. For instance, they're able to learn to calculate the amount of underrotation at the moment the skate hits the ice, or the amount of pre-rotation. What this would require is a data set with tons of examples for each separate bullet point. The actual judges don't matter at all for this.
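For example, once a system can extract the rotation still remaining at the moment of blade contact (from pose estimation or sensors), the call itself is just a rule. A sketch, with thresholds simplified from the ISU's quarter/half-revolution conventions:

```python
# Rule-based rotation call from a measured landing angle. The hard part
# (extracting missing_degrees from video or sensor data) is assumed done;
# the thresholds here are a simplification of the ISU conventions.

def rotation_call(missing_degrees):
    """missing_degrees: rotation still to complete when the blade touches down."""
    if missing_degrees >= 180:
        return "<<"   # downgrade: short by half a revolution or more
    if missing_degrees > 90:
        return "<"    # under-rotated: short by more than a quarter
    return "clean"

print(rotation_call(200), rotation_call(120), rotation_call(45))
```

This is the sense in which the judges' past scores don't matter: the labels for training the angle extractor come from measured geometry, not from human GOE marks.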
What would not be so easy to do without looking would be to be presented with an array of 10 or so options and make sure to press/click exactly the 4 options you want to select and not any of the others.
Is it? I'm not sure how that's any different from typing blindly. You could even have the 10 different fingers each on their own button and press the corresponding button when necessary.
 

gkelly

Record Breaker
Joined
Jul 26, 2003
Is it? I'm not sure how that's any different from typing blindly. You could even have the 10 different fingers all on their own button and press the corresponding button when necessary.

When you're typing blindly, the keys stay in exactly the same place the whole time.

How are you imagining that the touchscreen or touchpad for GOE input would look (or feel) to accommodate all possible positive and negative criteria for each different kind of element?

Would it update each time the skater performs a new move and the technical panel calls that move?

Or would all the possible criteria for all different element types be on the screen at all times and the judge would have to type in the code or element number for the element they're scoring before finding the appropriate boxes by touch?

See moriel's post 14 in this thread for a proposed list of bullet points that could be assigned to the AI to determine, perhaps in conjunction with a human tech panel, and 11 remaining jump-related bullet points reserved for judges.

(And that's assuming that AI really could immediately determine everything in the first list in that post. At least at first, I'd expect many of them would usually need to be confirmed by the tech panel with video review after the program. Or else they would be left for judges to determine, making the list of judges bullet points longer.)
 