
Predicting the PCSs from the TESs

Joined
Jun 21, 2003
Here are four formulas, based on the ladies' LP at 2007 Worlds. (The r values range from .76 to .79.)

1. Total PCS = .75 x (Total TES) + 13.

2. Total PCS = 39 x Ln(Total TES) - 102. (Ln is the natural logarithm.)

3. Total PCS = 23 x 1.0153^(Total TES). ("^" means "raise to the power.")

4. Total PCS = 2.2 x (Total TES)^.8.

Let's try Elene Gedevanishvili. Her total tech score was 44.62. The predicted values of her PCSs are

1. PCS = .75 x 44.62 + 13 = 46.47

2. PCS = 39 x Ln(44.62) - 102 = 46.13

3. PCS = 23 x 1.0153^44.62 = 45.27

4. PCS = 2.2 x 44.62^.8 = 45.92

Actual PCS: 46.81 :)
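
If you want to check these yourself, here is a quick Python sketch of the four formulas, using the rounded coefficients quoted above (so the results can differ from mine by a couple of hundredths):

Code:
import math

# The four fitted models (ladies' LP, 2007 Worlds); input is total TES.
def predict_pcs(tes):
    return {
        "1. linear":      0.75 * tes + 13,
        "2. logarithmic": 39 * math.log(tes) - 102,
        "3. exponential": 23 * 1.0153 ** tes,
        "4. power":       2.2 * tes ** 0.8,
    }

# Gedevanishvili, TES = 44.62
for model, pcs in predict_pcs(44.62).items():
    print(f"{model:16} {pcs:.2f}")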
 

Kypma

Final Flight
Joined
May 12, 2007
Mathman, you chose your username really well. But do your equations work for other skaters as well? And how did you figure this out?!?!

Kypma
 

decker

On the Ice
Joined
Nov 6, 2006
Blessed are the statisticians :bow:

I keep meaning to ask you ... what's up with the ISU calculation of the PCS? The PCS scale is ordinal, yes? Similar to a Likert scale? Ordinal scores do not have a (meaningful) mean. They have a median. But the calculation is a trimmed mean, calculated after 3 judges' scores are tossed at random. Surely this is statistical molestation?!

I know they toss 3 at random to make up for the anonymous judging. As if. But tossing and trimming? Sounds like it oughta be illegal to me.

Susan
 

gsrossano

Final Flight
Joined
Nov 11, 2006
There is a linear relationship between PCS and TES for all the divisions in the USFSA competition structure. The slope and intercept vary from one division to another. How tightly PCS is correlated to TES varies with division and discipline. For CD it is very tight. For most singles divisions the spread in PCS is typically +/- 5 points for a given TES. The spread also seems to vary with TES: for the highest TES values, the PCS spread is fairly low compared to the lowest TES values.

I will have some graphs of this in an article I am working on for my site that should be finished in the next week or so (a marks analysis from Regionals and Sectionals).

... what's up with the ISU calculation of the PCS? The PCS scale is ordinal, yes?

No. PCS is a factored sum of the panel-averaged PC scores. The PC scores are not ordinalized and are supposed to be marked to an absolute standard on an absolute marking scale.
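
Roughly, the computation looks like this. (A Python sketch with invented marks; 1.6 is the ladies' free skate factor, and the real averaging uses the trimmed mean of the counted judges, not a plain mean.)

Code:
# Factored sum of the panel-averaged PC scores: average each of the five
# components across the judges, add the averages, multiply by the factor.
components = {
    "Skating Skills":        [5.75, 5.50, 6.00, 5.75, 5.50],
    "Transitions":           [5.25, 5.00, 5.50, 5.25, 5.00],
    "Performance/Execution": [5.75, 5.75, 6.00, 5.50, 5.75],
    "Choreography":          [5.50, 5.50, 5.75, 5.50, 5.25],
    "Interpretation":        [5.50, 5.75, 5.75, 5.50, 5.50],
}

FACTOR = 1.6  # ladies' free skate
pcs = FACTOR * sum(sum(marks) / len(marks) for marks in components.values())
print(round(pcs, 2))  # 44.4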
 
Joined
Jun 21, 2003
Do these formulas work for the guys' scores as well?
For the men at 2007 Worlds (LP):

1. PCS = 24.5 + .6 x TES
2. PCS = 40.8 x Ln(TES) - 106
3. PCS = 34.6 x 1.0093^TES
4. PCS = 4.6 x TES^.63

Example: Préaubert, TES = 68.32

1. PCS = 24.5 + .6 x 68.32 = 65.49
2. PCS = 40.8 x Ln(68.32) - 106 = 66.35
3. PCS = 34.6 x 1.0093^68.32 = 65.12
4. PCS = 4.6 x 68.32^.63 = 65.85

Actual PCS = 65.22 :)
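
In a sketch like the one I posted for the ladies, only the coefficients change:

Code:
import math

# The same four model shapes with the men's coefficients (2007 Worlds LP).
MEN = {
    "1. linear":      lambda t: 24.5 + 0.6 * t,
    "2. logarithmic": lambda t: 40.8 * math.log(t) - 106,
    "3. exponential": lambda t: 34.6 * 1.0093 ** t,
    "4. power":       lambda t: 4.6 * t ** 0.63,
}

for model, f in MEN.items():
    print(f"{model:16} {f(68.32):.2f}")  # Préaubert's TES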
 

Hsuhs

Record Breaker
Joined
Dec 8, 2006
Thanks, Mathman! So it's different numbers for the guys.
But... what exactly is the meaning behind these figures? Are comparisons possible between different events?
 
Joined
Jun 21, 2003
Mathman, you chose your username really well. But do your equations work for other skaters as well? And how did you figure this out?!?!
For most skaters, the equations should work pretty well in predicting the PCSs in that particular event.

There is a statistic called the coefficient of correlation that addresses the question of how accurate these predictions are likely to be. For the data that I used here (2007 Worlds LP, men and ladies separately), these coefficients all turned out to be about 75%.

That's not bad. If it were 100%, then each prediction would be absolutely accurate. If it were 0, the predictions would be totally worthless.

How to figure it out? This is a topic in the general subject of "curve fitting." If you have a bunch of data points scattered over a piece of graph paper, you try to draw a standard type of curve that matches the data points as closely as possible. The most usual model is a straight line, and that is formula #1: the equation of the straight line (called the least squares regression line) that matches the data best.

The others are different kinds of curves. For instance, formula #3 represents exponential growth. This would be the right model if the top skaters had PCSs through the roof with just a small increase in tech, while the lower-level skaters were basically stuck with uniformly low PCSs that did not rise much even when they increased their tech a lot.
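
In case anyone wants to see the mechanics, here is a minimal sketch with numpy. The (TES, PCS) arrays are invented placeholders; you would type in the real pairs from the protocol. Models 2-4 are fitted by linearizing them first, which is the standard trick:

Code:
import numpy as np

# Placeholder data -- substitute the real (TES, PCS) pairs from the event.
tes = np.array([30.0, 38.5, 44.6, 52.1, 60.3, 68.3])
pcs = np.array([36.0, 42.0, 46.8, 52.0, 58.5, 65.2])

# 1. Linear:       PCS = a*TES + b
a1, b1 = np.polyfit(tes, pcs, 1)

# 2. Logarithmic:  PCS = a*ln(TES) + b   (linear in ln(TES))
a2, b2 = np.polyfit(np.log(tes), pcs, 1)

# 3. Exponential:  PCS = A*r**TES   ->   ln(PCS) = ln(A) + TES*ln(r)
m3, c3 = np.polyfit(tes, np.log(pcs), 1)
A3, r3 = np.exp(c3), np.exp(m3)

# 4. Power:        PCS = A*TES**p   ->   ln(PCS) = ln(A) + p*ln(TES)
p4, c4 = np.polyfit(np.log(tes), np.log(pcs), 1)
A4 = np.exp(c4)

print(f"PCS = {a1:.2f}*TES + {b1:.1f}")
print(f"PCS = {a2:.1f}*ln(TES) + {b2:.0f}")
print(f"PCS = {A3:.1f} * {r3:.4f}**TES")
print(f"PCS = {A4:.2f} * TES**{p4:.2f}")
print(f"r = {np.corrcoef(tes, pcs)[0, 1]:.2f}")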
 
Joined
Jun 21, 2003
I keep meaning to ask you ... what's up with the ISU calculation of the PCS? The PCS scale is ordinal, yes? Similar to a Likert scale? Ordinal scores do not have a (meaningful) mean. They have a median. But the calculation is a trimmed mean, calculated after 3 judges' scores are tossed at random. Surely this is statistical molestation?!
This is an interesting question. As Dr. Rossano says, the short answer is no, PCS are supposed to be absolute numbers based on a standard that is built into the ISU rules. So, no, they are not ordinals, and the mean is an appropriate measure of central tendency. (BTW, if you visit Rossano's site and peruse the archived articles, you will see all kinds of cool analyses of figure skating statistics, and especially of the IJS. :thumbsup: )

But...

In practice, yeah, they are sort of like a Likert scale, IMHO. At each level of skating, the judges pretty much know what the range of PCSs is likely to be. Like for instance for the next-to-last warmup group at U.S. Nationals for junior ladies, you pretty much know that the scores are going to be, say, 4.5 to 5.75. So you give the best skater in the group 5.75, the second best 5.50, the next 5.25, etc.

The trimming procedure is sort of a compromise between the median and the mean (although the main rationale for trimming is something different -- it lets you throw out ridiculous outliers caused by bias, cheating -- or just keyboard error!) If you had seven judges and decided to trim by throwing out the top three and the bottom three, the resulting "trimmed mean" would be the median.
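
Here is the compromise in miniature, with seven invented marks, one of them a wild outlier:

Code:
from statistics import mean, median

marks = [5.25, 5.50, 5.75, 6.00, 6.25, 6.75, 8.00]  # 8.00 is the outlier

def trimmed_mean(xs, k):
    """Throw out the k highest and k lowest marks, average the rest."""
    xs = sorted(xs)
    return mean(xs[k:len(xs) - k])

print(f"{mean(marks):.2f}")             # 6.21 -- dragged up by the outlier
print(f"{trimmed_mean(marks, 1):.2f}")  # 6.05 -- outlier gone
print(f"{trimmed_mean(marks, 3):.2f}")  # 6.00 -- identical to...
print(f"{median(marks):.2f}")           # 6.00 -- ...the median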
I know they toss 3 at random to make up for the anonymous judging. As if. But tossing and trimming? Sounds like it oughta be illegal to me.
The supposed purpose in tossing out three at random is not exactly to make up for random judging. As I understand it (?), it is supposed to protect the anonymity of the judges.

The idea (Mr. Cinquanta's little brainchild, which was even part of the "interim system" that predated the CoP by one year) is something like this. You are an honest judge being hectored by the evil president of your national federation to go along with the conspiracy that he has cooked up. But you doublecross him and vote your conscience. Afterwards you lie and say, "yup, boss, I voted the way you told me to, sure did, yup."

The boss says, hey, I don't see any scores like what I told you to give out.

So the judge says, oh, I guess my scores must have been thrown out in the random draw.

This scenario is the premise of the random draw!

Is this any way to run a sport? (Not to mention, all the scores are in plain view in the protocols anyway, and you can almost always figure out which scores were used and which were tossed.)

Statistically, I don't think either the tossing or the trimming makes much difference one way or the other. In fact, one could argue that choosing ten judges at random to sit at the table, then choosing 7 of those 10 to count, is exactly the same as choosing 7 from the original pool in the first place. (I have Hockeyfan228 to thank for explaining to me why this isn't quite true, because of the likelihood that different judges' scores will be tossed in the short and long programs.)
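
A tiny simulation makes the single-event equivalence easy to believe: judge 0's chance of being counted comes out the same either way. (The pool size here is an arbitrary choice for the sketch.)

Code:
import random

POOL, TRIALS = 14, 200_000
judges = range(POOL)

# Method A: draw the 7 counted judges straight from the pool.
direct = sum(0 in random.sample(judges, 7) for _ in range(TRIALS))

# Method B: seat 10 at the table, then draw the 7 counted from those 10.
two_stage = sum(
    0 in random.sample(random.sample(judges, 10), 7) for _ in range(TRIALS)
)

print(direct / TRIALS)     # ~ 7/14 = 0.5
print(two_stage / TRIALS)  # ~ (10/14) * (7/10) = 0.5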

Anyway, from a strictly statistical point of view the best solution would be to increase the judging panel to 1000 and count everybody's score.
 

hockeyfan228

Record Breaker
Joined
Jul 26, 2003
But you doublecross him and vote your conscience. Afterwards you lie and say, "yup, boss, I voted the way you told me to, sure did, yup."

The boss says, hey, I don't see any scores like what I told you to give out.

So the judge says, oh, I guess my scores must have been thrown out in the random draw.
That won't work, though, because all of the judges' scores are shown in the protocols. If the boss doesn't see anything like what s/he agreed to, then the judge didn't give those scores, regardless of whether that judge's scores were selected or dropped in the trim.

About the subject of this thread, :bow: Mathman!
 

gsrossano

Final Flight
Joined
Nov 11, 2006
That won't work, though, because all of the judges' scores are shown in the protocols. If the boss doesn't see anything like what s/he agreed to, then the judge didn't give those scores, regardless of whether that judge's scores were selected or dropped in the trim.

Exactly.

I am the evil federation head, and you are the judge. I tell you to boost the scores of our skaters, or my buddies' skaters, and to mark down the competition. I also tell you to give certain specific PC scores for a few skaters. Say some skaters in the middle or at the bottom that no one cares about. We can guess pretty closely what marks to give those skaters that won't look suspicious.

Then the protocol comes out. I look for the agreed-upon scores. If they are not there, you didn't do what I said and you are toast. If you did do what I ordered, then I can identify which set of marks is yours, and I can identify what marks you gave all the skaters we cared about.

This also works for me and my buddy judges to cut deals on our own and verify the deal was followed through.

IJS does NOTHING to prevent judges from cutting deals and verifying the deal has been honored. It does NOTHING to stop a federation from making its judges mark a certain way and checking up on them.

It is also possible to identify the discard judges. I used to do it by hand, but after a couple of seasons I built it into my analysis software. After I enter a protocol into the computer, I click a button, and all the discard judges get identified.
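
The brute-force version of the idea is simple enough to sketch (this is not the actual software, and the panel size and trim here are just assumed parameters; the real rules vary by season): for each element or component, try every possible subset of counted judges and see whose trimmed mean reproduces the published panel score.

Code:
from itertools import combinations
from statistics import mean

def possible_counted_panels(marks, published, n_counted=9, trim=1, tol=0.005):
    """Return every subset of judge indices whose trimmed mean matches
    the published panel score for one element or component."""
    hits = []
    for subset in combinations(range(len(marks)), n_counted):
        vals = sorted(marks[i] for i in subset)
        if abs(mean(vals[trim:len(vals) - trim]) - published) < tol:
            hits.append(subset)
    return hits

Run that over every line of a protocol and intersect the candidate subsets, and the discards usually pin down uniquely.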

Anyway, from a strictly statistical point of view the best solution would be to increase the judging panel to 1000 and count everybody's score.

Pop Quiz.

The random error in the scores is typically 3/4 to 1 point for 7 sets of marks. Sometimes a little more. Call it 1 point.

If you want to get the random statistical error down from 1 point to 0.01 point (the smallest difference in score that is calculated), and if the error goes as the square root of the number of judges, how many judges do you need?

(Hint: It's a LOT more than 1000 judges.)
 
Joined
Jun 21, 2003
But... what exactly is the meaning behind these figures? Are comparisons possible between different events?
Here is what the numbers mean. :)

In the formula (for men)

PCS = .6 x TES + 24.5

the .6 means: For every extra tech point I can muster, I expect my PCS to go up, on the average, by six tenths of a point.

(The reason you don’t get more of an increase in PCSs – like a full point – is because of the phenomenon of “regression to the mean”: left to its own devices, everything eventually turns out blah. ;) )

The 24.5 means, this is the PCS that you would expect to get if you just skated around for four and a half minutes demonstrating your edging and choreography, but did not do any jumps, spins, or footwork, and got a zero in tech.

So if you added, say, a level 1 camel spin to your act (1.2 points), you would expect your PCS to go up by .6 x 1.2 = 0.72 points. Added to the 24.5, now your PCS is 25.22.

(In practice, though, these predictions are accurate only for skaters near the average in both scores. At the extreme ends, you can’t say as much.)

Yes, you can easily compare one event with another. It might turn out that in one event there is a close correlation between the two scores, and in another, not so much.

GSRossano, however, says in post 5 above that he noticed pretty much the same results across all levels over many events – which seems like common sense, if you think about it. The best skaters are, on the average, the best in both categories, technical and presentation.
gsrossano said:
For the highest TES values, the PCS spread is fairly low compared to the lowest TES values.
That's very interesting. I suppose it makes sense that among skaters who can do only a few technical tricks, some of them have pretty good presentation skills and others not.

But at the highest levels we see skaters like Plushenko and Joubert getting PCS as high as Lambiel's and better than Buttle's and Weir's.
gsrossano said:
(Hint: It's a LOT more than 1000 judges.)
Those hundredths of a point! :laugh:

Well, I'm glad I'm not a judge, to have to decide whether that performance was worth 5.25 or 5.50 in Interpretation.
 

Hsuhs

Record Breaker
Joined
Dec 8, 2006
It is also possible to identify the discard judges. I used to do it by hand, but after a couple of season I built it into my analysis software. After I enter a protocol into the computer, I click a button, and all the discard judges get identified.

It is possible. They themselves came up with a formula that allows one to detect possible deviations in the scores. Here it is (page 4):
http://www.isu.org/vsite/vfile/page/fileurl/0,11040,4844-168542-185760-63933-0-file,00.pdf

If I apply this simple calculation to, say, the TEB 2006 men's event, and more specifically to Brian Joubert's SP scores:
http://www.isufs.org/results/gpfra06/gpfra06_Men_SP_Scores.pdf

I see that the highly anonymous judge number 1's (presumably French) scores exceed the "acceptable number of deviation points".

(Hint: It's a LOT more than 1000 judges.)

So why not let the audience vote on PCS?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
the .6 means: For every extra tech point I can muster, I expect my PCS to go up, on the average, by six tenths of a point.

(The reason you don’t get more of an increase in PCSs – like a full point

Six tenths of a point for every tech point sounds like a good deal to me.
 

Hsuhs

Record Breaker
Joined
Dec 8, 2006
Here is what the numbers mean. :)

Yes, you can easily compare one event with another. It might turn out that in one event there is a close correlation between the two scores, and in another, not so much.

This is what I don't understand. We have events (like CoC) where the majority of the participants are ninth- and eighth-graders, and then there are events (like NHK) with seventh- and sixth-graders for the most part. All these 'students' get the same math test. How come the group GPAs are comparable?

Assuming there were 70,000 people in the audience.

ppl can call or send text messages
 

gkelly

Record Breaker
Joined
Jul 26, 2003
This is what I don't understand. We have events (like CoC) where the majority of the participants are ninth- and eighth-graders, and then there are events (like NHK) with seventh- and sixth-graders for the most part. All these 'students' get the same math test. How come the group GPAs are comparable?

Can you explain what you mean here?

ppl can call or send text messages

How can they judge skating skills if they're not in the arena, preferably close to the ice?

Or should we just leave judging skating out of skating competitions because we can't get the number of judges it takes to bring the random error down to 0.01 into position to judge it?


One answer could be to have experts (technical specialists, judges, etc.) judge the elements and the technical aspects of the skating and let the audience judge the artistic/entertainment value.
 
Joined
Jun 21, 2003
Or should we just leave judging skating out of skating competitions because we can't get the number of judges it takes to bring the random error down to 0.01 into position to judge it?
I personally am not up in arms about it. In any sport there will be dead heats, photo finishes, judgements that are too close to call, close contests that could go either way, etc., etc., etc.

Not to mention all those competitions where the judges obviously must have cheated because my favorite skater didn't win!

What GSRossano is bringing out is that when someone wins or loses by mere hundredths of a point, the difference between winning and losing is utterly swamped by sampling error. That is, we have no confidence whatever that the skater who skated best is the one who won.

But then again, that's sport (IMHO).
Hsuhs said:
ppl can call or send text messages.
Does anyone remember how many people voted online in the Marshalls cheesefest in December of 2005? Was it something like 200,000? (I know I voted for Matt Savoie 20 times :laugh: )
 

gsrossano

Final Flight
Joined
Nov 11, 2006
What GSRossano is bringing out is that when someone wins or loses by mere hundredths of a point, the difference between winning and losing is utterly swamped by sampling error. That is, we have no confidence whatever that the skater who skated best is the one who won.

Exactly. And the answer to the quiz is indeed 70,000 judges.
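
For anyone who wants the arithmetic behind it:

Code:
# Error falls off as 1/sqrt(N). Going from ~1 point (7 judges) down to
# 0.01 point is a factor-100 reduction, so you need 100**2 times as many.
judges_needed = 7 * (1.0 / 0.01) ** 2
print(judges_needed)  # 70000.0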

The reality is that even if you increase the size of the panels somewhat, and even if you could get the judgment of the panels more uniform, there is still going to be an uncertainty of 0.5 to 1.0 points in what a program is worth, due to random errors. (There are also problems with deciding what a program is worth due to systematic errors.)

So since I can't have 70,000 judges, and since I will never know what a program is really worth in an absolute sense, the practical and fair approach (IMO) is to round off scores to one point or half a point, and if you end up with a bunch of ties, such is life. And even if you end up with a bunch of ties for total points in one event segment, when you apply the tie-breakers for TES/PCS and for one segment vs. another, you probably would not end up with many placement ties.

If one believes in the fundamental design concept of IJS, that the skaters will earn points on an absolute scale that reflects the true value of a program, then one also has to accept that the best we can do is 0.5 to 1.0 points, and let the chips fall where they may.
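
In code, the rounding is a one-liner; everything else is just accepting the ties:

Code:
def to_half_point(score):
    """Snap a score to the nearest half point before ranking."""
    return round(score * 2) / 2

print(to_half_point(65.22))  # 65.0
print(to_half_point(65.49))  # 65.5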
 

Hsuhs

Record Breaker
Joined
Dec 8, 2006
Can you explain what you mean here?

How can they judge skating skills if they're not in the arena, preferably close to the ice?

They won't have to judge skating skills; they'll answer a question like "I like skater X's performance": strongly disagree / disagree / neither agree nor disagree / agree / strongly agree.

Can you explain what you mean here?
I mean that skaters have different skill levels to begin with, and different numbers of skaters at different levels participate in different events.
 