I'm still not sure what exactly the IJS is supposed to measure; what is the name of the construct? Without knowing that for certain, I'm afraid I'm clueless about the best fit.
We can argue forever whether the judges' scores reflect noisy samples of some underlying "truth", or whether there is no independent "truth" except what emerges as a consensus from judges' scores.
I believe there is no independent "truth" but only a consensus at each moment, one that evolves as time goes on. Judges in the past, say 30 years ago, had different standards from current judges. After another 30 years, they will give different scores for the same performance even if the current judging system is still in use. It is fundamentally impossible to set an absolute standard, because FS is still evolving while judges are human; they are inherently limited by what they have learned, what they have experienced, and so on.
I understand the scores as follows.
Suppose we have three skaters, and let a judge rank their performances according to his own standard. Each skater will get a ranking from 1 to 3. Now the judge might feel that, even though he gave the numbers 1 to 3, the number-one skater is extremely good compared with the other two, and that it would be unfair to express the difference by the ranking alone. This may simply be because the number of skaters is just three. If there were, say, seven more skaters who are reasonably good, those seven might actually be ranked 2nd through 8th. The judge would be happier to give the numbers 1, 9, and 10 to the original three skaters, even when there are only three of them.
Go one step further and imagine all the skaters in the world, and let the judge do his job. Well, all the skaters in the world at a specific time might not be enough, because sometimes there are only, say, 3 top-level skaters and 20 second-tier skaters, and the gap between the two groups might be huge. In this case, let the judge fill the gap by imagining hypothetical skaters. What I am trying to do here is let the judge imagine a huge number of "evenly distributed" (according to his own standard) hypothetical performances and identify the ranking of each performance. You may object that the number of hypothetical performances goes to infinity, but this is not a problem, because we can ask the judge to normalize the ranking to a number between, say, 0 and 10.
One more step. Now consider all the judges in the world at the present time, and let each judge do the same thing. In general, each judge has his own standard, and these standards will all differ. So take the average. Then you get a unique number for each performance (or for each element under consideration). It may be viewed as the best number representing the performance: not an "independent truth", but a consensus at present.
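The procedure above can be sketched as a small simulation. Everything here is my own illustrative assumption, not part of any real judging system: each judge's "standard" is modeled as a random weighting of two performance features, the judge scores a real performance by its percentile rank among many imagined hypothetical performances (scaled to 0-10), and the consensus is the average over judges.

```python
import random

random.seed(0)  # reproducible illustration

N_JUDGES = 9
N_HYPOTHETICAL = 10_000  # the judge's imagined "evenly distributed" field

# Three real performances, each described by two hypothetical features
# (say, technical quality and artistic quality, both on a 0-10 scale).
performances = {"A": (9.0, 8.5), "B": (6.0, 7.0), "C": (5.5, 6.5)}

def make_judge():
    """A judge's personal standard: his own weighting of the two features."""
    w = random.uniform(0.3, 0.7)
    return lambda perf: w * perf[0] + (1 - w) * perf[1]

def normalized_ranking(judge, perf):
    """Rank `perf` among hypothetical performances, normalized to 0-10."""
    hypothetical = [(random.uniform(0, 10), random.uniform(0, 10))
                    for _ in range(N_HYPOTHETICAL)]
    score = judge(perf)
    below = sum(1 for h in hypothetical if judge(h) < score)
    return 10 * below / N_HYPOTHETICAL

judges = [make_judge() for _ in range(N_JUDGES)]

# The consensus number: average of the normalized rankings over all judges.
consensus = {
    name: sum(normalized_ranking(j, perf) for j in judges) / N_JUDGES
    for name, perf in performances.items()
}
for name, score in sorted(consensus.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

Note that even though each judge weights the features differently, averaging their normalized rankings produces a single stable number per performance, which is exactly the "consensus at present" in the argument above.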
Of course you cannot do this in reality. The only thing you can do is select some finite number of judges, and there is no guarantee that their standards agree. Then the best you can do is write a guideline in advance that most judges at present would consider reasonable, and ask the judges to consult it when giving the "normalized ranking" for each performance. This is the score we get. In this way, even if you cannot really "measure" something (in the sense of measuring the length of a rod), whether it is TES or PCS, you can assign a quantitative number to each performance (or to each element), and the differences between scores have a meaning as differences between normalized rankings according to the consensus of the moment.