# Thread: You Be the Judge

1. 0
Originally Posted by Mathman
As I understand it, you can only move up one boat at a time. When you bump someone those two boats are taken out of the water.

I guess there is a winner in that after each race either the boat in front is still in front or else the second place boat caught up and is now the leader. I think the goal is to stay in front for as many years in a row as possible.
Sounds like it could be fun, but it doesn't seem like a good method for determining who rowed the fastest this year. Depends what the goal is.

2. 0
Say, if this happened in Japanese Nationals:
SP standing: 1. Takahashi. 2. Kozuka. 3. Skater X. 4. Machida. 5. Hanyu
LP standing: 1. Hanyu. 2. Kozuka. 3. Takahashi. 4. Machida. 5. Skater X.
Final Standing (with LP being the tie breaker): 1. Kozuka (2/2 + 2 = 3). 2. Hanyu (5/2 + 1 = 3.5). 3. Takahashi (1/2 + 3 = 3.5). 4. Machida (4/2 + 4 = 6). 5. Skater X (3/2 + 5 = 6.5)

Shortly after the medal ceremony, Skater X failed the doping test, and as a result his scores were deemed invalid. So the standings were adjusted:
SP standing: 1. Takahashi. 2. Kozuka. 3. Machida. 4. Hanyu.....
New Final Standing: 1. Hanyu (4/2 + 1 = 3). 2. Kozuka (2/2 + 2 = 3). 3. Takahashi (1/2 + 3 = 3.5). 4. Machida (3/2 + 4 = 5.5).......
Isn't that ridiculous if Hanyu and Kozuka had to flip-flop their medals because of "irrelevant alternatives" (somebody else dropped out from the competition), not because of their performances per se?
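The arithmetic in this example can be checked with a short sketch. It assumes the factored-placement rule used above (short program placement divided by 2, plus long program placement, with the long program as tie breaker); the function names are made up for illustration and are not from any ISU software.

```python
def final_standings(sp_order, lp_order):
    """Rank skaters by SP placement / 2 + LP placement; LP placement breaks ties."""
    sp_place = {name: i + 1 for i, name in enumerate(sp_order)}
    lp_place = {name: i + 1 for i, name in enumerate(lp_order)}
    totals = {name: sp_place[name] / 2 + lp_place[name] for name in sp_order}
    # Lower total is better; on equal totals the better LP placement wins.
    ranked = sorted(totals, key=lambda n: (totals[n], lp_place[n]))
    return [(name, totals[name]) for name in ranked]


def without(order, name):
    """Drop one skater so everyone below moves up a place."""
    return [n for n in order if n != name]


sp = ["Takahashi", "Kozuka", "Skater X", "Machida", "Hanyu"]
lp = ["Hanyu", "Kozuka", "Takahashi", "Machida", "Skater X"]

before = final_standings(sp, lp)
after = final_standings(without(sp, "Skater X"), without(lp, "Skater X"))

print("With Skater X:   ", before)
print("Without Skater X:", after)
```

Nothing about Kozuka's or Hanyu's own placements relative to each other changes between the two calls; only Hanyu's short program placement moves from 5 to 4, and that is enough to swap the top two.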

3. 0
Originally Posted by gkelly
If this happens in the short program, it's confusing while the short program is in progress, but then at the end of the day the standings are fixed and it doesn't matter that Camille was ever higher in the standings than Babette early on.
Apply the same logic to the long program. In the short, "it doesn't matter" because we are content to wait until everyone has skated, understanding that skaters will be flip-flopping all over the place in the meantime.

Same with the long. Have patience. At the end of the day, run the numbers.

This is a psychological problem (Oh, I just can't wait!) rather than an objection of substance.

Originally Posted by skatinginbc
Isn't that ridiculous if Hanyu and Kozuka had to flip-flop their medals because of "irrelevant alternatives" (somebody else dropped out from the competition), not because of their performances per se?
No, I don't think that is ridiculous at all. It seems ridiculous only because we are artificially inserting time as a variable. It is jarring psychologically because things happened in a funny temporal order. But mathematically, this is no different than if Skater X had failed the doping test before the event and had never skated. Among the un-doped skaters who skated, Hanyu ended up winning. Hanyu would have won if Skater X had never been born. Skater X is indeed "irrelevant."

4. 0
Originally Posted by Mathman
Speaking of "validation," by the way, when the CoP was in its developmental stages in 2003 the ISU conducted a lot of retro-scoring exercises to make sure that the CoP wasn't completely out in left field, when contrasted with results from ordinal judging. One of the events they scored was the 2002 Olympics. Tim Goebel won, beating Alexei Yagudin (not to mention Evgeni Plushenko) by doing three quads. The ISU immediately lowered the base value of quads to prevent such an obvious anomaly.
That decision was perhaps valid. At least I know that two experts who gave their ratings for my "Be the Judge" exercise placed a heavier weight on PCS than the current system does. For instance, gkelly was willing to put Abbott, who had the lowest base value and technical score, first in the free at Cup of China. The lowering of the quad's base value was at least based on some sort of validity research, while the later raising of the quad's base value was based on complaints. And I am not sure if the way they gathered and analyzed the complaints was scientifically sound.

5. 0
Originally Posted by skatinginbc
And I am not sure if the way they gathered and analyzed the complaints was scientifically sound.
I think it was as scientifically sound as can be expected.

(a) Folks are winning Olympic and Worlds Championships without doing quads.

(b) This is bad ipso facto, whether they deserved their victories or not.

(c) Therefore we will raise the base value of quads. That'll teach 'em who's boss around here. Boss Quad.

6. 0
Originally Posted by skatinginbc
And I am not sure if the way they gathered and analyzed the complaints was scientifically sound.
I'm not sure that it is possible to be sciency about developing a scoring system. You are combining elements which you can give a set value, like a 3A, with things that are more open to interpretation, like skating skills. I think this also supports something more along the lines of 6.0. It's a matter of whether you trust the judges to make the right decision or if you have a system where you get results that surprise everyone because of how certain things are valued against another. I'd be curious to know whether judges generally agree with the order that they placed the skaters with their marks under COP.

7. 0
By the way, here is the canonical example of how a flip-flop can occur at the top in a single program, by a skater who loses to both of the top two. (This is majority of ordinals. As gkelly mentioned, OBO is a little more complicated, but you can still construct examples like this.) Here are the judges’ marks:

Alex: 2 2 2 1 1 1 1 1 1
Benj: 1 1 1 2 2 2 2 2 2

Alex has the majority of first place ordinals, 6.

Now Char skates (these are Benjamin Agosto, Alex Shibutani, and Charlie White).

Benj 1 1 1 2 2 2 2 3 3
Alex 3 3 3 1 1 1 1 2 2
Char 2 2 2 3 3 3 3 1 1

Now Benjamin wins, even though they both beat Charlie. No one has a majority of first place ordinals, but Benjamin has 7 first and second place ordinals to Alex’s six and Charlie’s five.
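For anyone who wants to replay the flip-flop, here is a minimal sketch of a majority-of-ordinals ranking. It is a simplification, not the official procedure: it assumes each skater is placed at the lowest ordinal level where a majority of judges have them at that level or better, with ties broken by the size of that majority, then by the sum of the ordinals in the majority, then by the sum of all ordinals. The names `majority_key` and `rank` are made up for this example.

```python
def majority_key(ordinals):
    """Sort key for one skater under a simplified majority-of-ordinals rule."""
    majority = len(ordinals) // 2 + 1  # 5 of 9 judges
    for place in range(1, max(ordinals) + 1):
        in_majority = [o for o in ordinals if o <= place]
        if len(in_majority) >= majority:
            # Lower place first, then bigger majority, then lower ordinal sums.
            return (place, -len(in_majority), sum(in_majority), sum(ordinals))


def rank(panel):
    """Return skater names from first place to last."""
    return sorted(panel, key=lambda name: majority_key(panel[name]))


two_skaters = {
    "Alex": [2, 2, 2, 1, 1, 1, 1, 1, 1],
    "Benj": [1, 1, 1, 2, 2, 2, 2, 2, 2],
}
three_skaters = {
    "Benj": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "Alex": [3, 3, 3, 1, 1, 1, 1, 2, 2],
    "Char": [2, 2, 2, 3, 3, 3, 3, 1, 1],
}

print(rank(two_skaters))    # ['Alex', 'Benj']
print(rank(three_skaters))  # ['Benj', 'Alex', 'Char']
```

Alex leads while only two have skated, and Benj moves ahead once Char's marks are in, even though Char finishes below both of them.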

When something like this happened at 1997 Europeans, Cinquanta hit the roof. Sandra Loosemore reports:

Following the European Championships, Ottavio Cinquanta, president of the International Skating Union, announced that he wanted a new scoring system adopted.

Cinquanta has made repeated statements to the press that the new scoring system must not permit "flip-flops" in the standings, where the relative placements of two competitors who have already skated are changed by the marks given to a competitor who skates later, such as what happened at 1997 Europeans. (Most recently, at the 1997-98 Champions Series final in Munich, Cinquanta was quoted in press reports as stating that "If one skater is in front of another, he should remain there.")

Also getting into the act was IOC chair Juan Antonio Samaranch, who made public statements to the effect that the system used to score figure skating at the Olympics must be understandable to the public.
Sound familiar?

http://www.frogsonice.com/skateweb/obo/score-obo.shtml

This is an example of social choice arriving at a different course of action than individual choice. The majority of the judges (6) individually preferred Alex to Benjamin. But "society" (all the judges acting in concert, following the majority of ordinals system) chose Benjamin instead.

8. 0
Originally Posted by Mathman
It is jarring psychologically because things happened in a funny temporal order. But mathematically...
Mathematically you are right. But I hope I can show that psychologically it is otherwise (e.g., sense of justice, fairness, integrity....).

"Sports are usually governed by a set of rules or customs. Physical events such as scoring goals or crossing a line first often define the result of a sport. However, the degree of skill and performance in some sports such as diving, dressage and figure skating is judged according to well-defined criteria. This is in contrast with other judged activities such as beauty pageants and body building, where skill does not have to be shown and the criteria are not as well defined." http://en.wikipedia.org/wiki/Sport

I can accept beauty pageants as a social choice contest and therefore the flip-flop in standing, but psychologically I have a hard time accepting it in figure skating.

9. 0
Originally Posted by Mathman
Sound familiar?
I cannot believe I'm speaking like Cinquanta: Must not permit "flip-flops" in standings.

10. 0
Originally Posted by drivingmissdaisy
I'd be curious to know whether judges generally agree with the order that they placed the skaters with their marks under COP.
Me, too. Many contests are so close that the judges can't possibly know who is winning or losing as they are entering their marks.

11. 0
In case anyone is curious about how OBO (one by one) went, here is a good example from the same paper by Sandra Loosemore quoted above.

http://www.frogsonice.com/skateweb/obo/score-obo.shtml

Here's an example involving a small event with only 6 competitors (derived from the actual marks given to the top 6 finishers in the free skating in the men's event at 1997 Europeans). The first step is computing the ordinals in the usual way. Suppose this works out to give us:

A 1 1 1 1 1 2 1 1 1
B 3 2 5 2 3 3 5 6 6
C 5 5 4 4 2 4 2 2 3
D 4 3 3 6 4 6 4 3 2
E 2 4 2 3 6 5 3 4 5
F 6 6 6 5 5 1 6 5 4

The OBO comparison table might look like this:

| | A | B | C | D | E | F | total wins | total JiF |
|---|---|---|---|---|---|---|---|---|
| A | | 1/9 | 1/9 | 1/9 | 1/9 | 1/8 | 5 | 44 |
| B | 0/0 | | 0/4 | 1/5 | 0/4 | 1/6 | 2 | 19 |
| C | 0/0 | 1/5 | | 1/5 | 1/5 | 1/8 | 4 | 23 |
| D | 0/0 | 0/4 | 0/4 | | 0/4 | 1/7 | 1 | 19 |
| E | 0/0 | 1/5 | 0/4 | 1/5 | | 1/6 | 3 | 20 |
| F | 0/1 | 0/3 | 0/1 | 0/2 | 0/3 | | 0 | 10 |

Under majority of ordinals, the order is A, B, C, D, E, F. Note that B has the most second and third place votes combined, five.

To read the OBO table, an entry like 1/6 means that this skater won head-to-head, with 6 judges (to three for the other guy). 0/3 means this skater lost (0), but got three judges' votes.

So for instance, if we read across the line for B: B lost to A and got 0 judges' votes. B lost to C but picked up 4 judges' votes; B beat D, with 5 judges' votes; B lost to E with 4; B beat F with 6 judges' votes. In total, B beat two other skaters (D and F) and picked up a total of 19 judges' votes.

Under OBO, C came in second and B dropped to fourth. The final order was A, C, E, B, D, F. (The order is determined by the most wins in the next-to-last column. The last column, total number of judges, was used as a tie-breaker.)
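The OBO table can be rebuilt from the ordinals with a few lines of code. This is only a sketch of the procedure as described here (most head-to-head wins first, total judges in favor as the tie breaker); it assumes no judge gives two skaters the same ordinal, which holds for these marks, and the function name `obo` is made up for the example.

```python
from itertools import combinations

ordinals = {
    "A": [1, 1, 1, 1, 1, 2, 1, 1, 1],
    "B": [3, 2, 5, 2, 3, 3, 5, 6, 6],
    "C": [5, 5, 4, 4, 2, 4, 2, 2, 3],
    "D": [4, 3, 3, 6, 4, 6, 4, 3, 2],
    "E": [2, 4, 2, 3, 6, 5, 3, 4, 5],
    "F": [6, 6, 6, 5, 5, 1, 6, 5, 4],
}


def obo(ordinals):
    """Compare every pair of skaters judge by judge and tally the results."""
    wins = {s: 0 for s in ordinals}  # head-to-head comparisons won
    jif = {s: 0 for s in ordinals}   # total judges in favor across all pairs
    for a, b in combinations(ordinals, 2):
        a_votes = sum(x < y for x, y in zip(ordinals[a], ordinals[b]))
        b_votes = sum(y < x for x, y in zip(ordinals[a], ordinals[b]))
        jif[a] += a_votes
        jif[b] += b_votes
        if a_votes > b_votes:
            wins[a] += 1
        elif b_votes > a_votes:
            wins[b] += 1
    # Most wins first; total judges in favor breaks ties.
    order = sorted(ordinals, key=lambda s: (-wins[s], -jif[s]))
    return order, wins, jif


order, wins, jif = obo(ordinals)
print(order)  # ['A', 'C', 'E', 'B', 'D', 'F']
print(wins)
print(jif)
```

The win and judges-in-favor totals it prints match the last two columns of the table, and the order matches the OBO result of A, C, E, B, D, F.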

And that's why people don't like the new CoP scoring system. CoP is too complicated.

Edited to add: By the way, in the above example skater A is Alexei Urmanov and skater E is Alexei Yagudin.

12. 0
I laughed when I read this sentence from Sandra Loosemore's article: "The ISU is considering adoption of a new scoring system...as a result of complaints that the current ordinal scoring system is too complicated." And they eventually adopted a more complicated system, called CoP.

Originally Posted by Mathman
This is an example of social choice arriving at a different course of action than individual choice. The majority of the judges (6) individually preferred Alex to Benjamin. But "society" (all the judges acting in concert, following the majority of ordinals system) chose Benjamin instead.
It reminds me of President Bush, who lost the popular vote to Al Gore by 543,816 votes in the 2000 election. More voters individually preferred Al Gore to George Bush, but "society" (the electoral system or whatever) chose Bush instead.

13. 0
^ That phenomenon is not always bad, however. For instance both sides might agree on a compromise candidate that is nobody's first choice but everybody's second choice. (This is basically why Benjamin won in the skating example in post 67 above.)

Anyway, I think the idea of testing the CoP for validity would be impossible to carry out. If we norm it against some sort of gut feeling or intuition of a consensus of experts, then how are the experts credentialed? Skaters past and present would not be an unbiased choice. If we asked judges, those judges' training and experience has been with particular judging systems. I think if you asked a judge, "look at these two performances; which do you like better?" -- the first question the judge would ask is, "under what scoring system?"

In other words, I think you have set us an impossible task when you say, in the first post on this thread, "Forget about 6.0. Forget about CoP. Forget about any scoring system or criteria that the authorities have prescribed."

What we can do is check to see whether the actual decisions of judges and tech specialists follow the written descriptions. The answer is "heavens no" -- but that's OK because in any system of laws there are always a bunch of shadow laws built up from common law experience and tradition. There is no written rule in the ISU rule book that says "if you do a quad your interpretation score goes up." Or "if Joe Inman shoots off an email everyone's transition scores go down" -- but that's how it is.

We can also investigate reliability (consistency). Similar programs should receive similar scores, across different events, different judging panels, different competitive fields, etc. Let me attempt a guess at the outcome of such a study. They don't.

Why not? This is figure skating we are talking about, not being the first across the finish line. (That system of "social choice" is a dictatorship: there is one judge, the stopwatch. The dictatorship model is the only ordinal system in which flip-flops do not occur.)

14. 0
Originally Posted by gkelly
If what you're measuring is primarily based on personal preference or qualitative perception of the same skills, then pure ranking makes sense.

If what you're measuring is based on objectively quantifiable skills, then pure absolute scoring of each those skills makes sense.

Figure skating has always been qualitative, and originally it put the most emphasis on everyone doing the exact same skills to compare who did them best, which is why it originally developed a qualitative comparative scoring system.

As the sport developed, the variety of skills that different skaters can choose to include in their performances has multiplied. So has the degree to which some important technical skills (primarily jumps, and number of "features" on other elements in the current system) can be quantified. The quality of execution of those skills is also still considered important. The way that programs are put together can to some degree be quantified, and qualitative perception of "artistry" (however we define it) is also considered important. Not just in terms of personal preference, but also in terms of how artistic execution of the technical skills demonstrates superior command of the techniques.

If the scoring remains based purely on rankings, the quantifiable and obvious aspects of the performances are not accounted for in a transparent way. Hence a push toward quantifiability by people within the sport who are more interested in recognizing those athletic skills, and by non-figure skaters in the ISU (speedskaters) and in the IOC who are more comfortable with objective measurements.
This.

Technology, such as slow-motion replay and computers that can handle multiple complex rules and numbers, allows more accuracy in scoring especially at the most important competitions. For smaller competitions, the technological and human resources required may not always be financially feasible.

But figure skating doesn't want to give up its qualitative aspects either. How well something was done continues to matter just as much as exactly what was done. And transcending technical content to produce an aesthetically pleasing performance, even to the point of connecting with audiences on an emotional and artistic level, is still valued. And still subject to personal perception and personal preference.

So how can both the quantitative and the qualitative aspects of evaluation each be given appropriate weight?

I think breaking down the scores into element base marks, grades of execution, and program components is a good approach. How they're each translated into numbers is more debatable.

And I expect that some years, perhaps decades, down the line more technology will allow more objective measurements of the what that will include some aspects of the how well (measuring not only jump rotation but also height and speed -- or rotational speed in spins).
Also this.

Can there be better ways to measure, score, and report scores for qualitative aspects to reward them appropriately? On the one hand, if a skater has such mastery of the use of her blades and body that she can become one with the music in her skating and bring a whole panel of judges to tears, we want a way to reward that more highly than just executing the same technical skills with the same success and generally staying on time with the music. On the other hand, we don't want judges who are moved by a skater's personal story off ice (or by bribes or threats from the skater's supporters) to overreward based on personal preference at the expense of analytical evaluation of the technical skills.

So there's lots of room to figure out better solutions for balancing the objective and subjective aspects of the scoring.

And if the fairest ways become too expensive, how can the system be simplified for use below the level of the most important competitions that attract sponsors?
And all of this.

15. 0
Originally Posted by Mathman
I think if you asked a judge, "look at these two performances; which do you like better?" -- the first question the judge would ask is, "under what scoring system?"
I agree with almost all of the post this comes from, but I'm going to quibble with just this one sentence.

I think if you asked a judge "look at these two performances; which do you like better?" the judge will tell you which one she personally likes better. The reasons could be very relevant or completely irrelevant to how the programs should be scored.

If you asked a judge "look at these two performances; which do you think should place higher?" then she'll probably tell you which one she thinks deserves to place higher, or which one she would place higher under an ordinal ranking system, based on overall impression. She might also say that under her less favorite scoring system the results might not be the results she preferred. Or she might think that the difference between these two programs is so clear that the choice of scoring system would not make any difference and so is not worth mentioning.

If you asked a judge "look at these two performances; how would you score them in relation to each other?", then she would have to ask "under what scoring system?" before she could answer the question.


