In contrast, if you give someone a thousand rods of ever so slightly different lengths, and the task is to place them from 1 to 1000 in order of length, allowing the subject to compare them two by two (this one is longer than that one), then everyone can do it and get the entire sequence right.
Applied to figure skating judging, this would argue in favor of rankings. When one skater performs and you have to decide whether that performance was an 8.25 rather than an 8.50 (choose one out of 40 categories), you are up the creek. But if, every time a skater performs, the judge says, "That was better than the skaters I have so far ranked worst, second worst, and third worst, but worse than the skaters I have so far ranked best, second best, and third best," that, in principle, is no problem.
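The rod-ordering procedure described above, where each new item is placed by asking only "is this one longer (better) than that one?", is essentially insertion sort driven by pairwise comparisons. A minimal Python sketch, assuming a single consistent comparison function; the function name `insert_by_comparison` and the randomly generated rod lengths are hypothetical, chosen only for illustration:

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def insert_by_comparison(ranking: List[T], newcomer: T,
                         better: Callable[[T, T], bool]) -> int:
    """Insert `newcomer` into `ranking` (best first) using only pairwise
    "is a better than b?" questions, binary-search style.

    Returns how many pairwise comparisons were needed."""
    lo, hi = 0, len(ranking)
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if better(newcomer, ranking[mid]):
            hi = mid          # newcomer belongs somewhere above ranking[mid]
        else:
            lo = mid + 1      # newcomer belongs somewhere below ranking[mid]
    ranking.insert(lo, newcomer)
    return comparisons


# Hypothetical example: 1000 rods with distinct lengths, "better" = longer.
rng = random.Random(0)
rods = rng.sample(range(10_000), 1000)

ranking: List[int] = []
total = 0
for rod in rods:
    total += insert_by_comparison(ranking, rod, lambda a, b: a > b)

print(ranking == sorted(rods, reverse=True))  # True
print(total)  # total pairwise comparisons used for all 1000 rods
```

With binary search, inserting into a list of n items takes at most about log2(n + 1) comparisons, so even the 1000th rod needs no more than 10. The sketch assumes `better` is consistent and transitive, which holds for rod lengths but is exactly what is in question for multidimensional skating performances.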
There are a couple of issues with this analogy.
The most obvious is that the length of a rod is one-dimensional, whereas a skating performance is multidimensional (is that what you were referring to by "choose one out of 40 categories"?).
And only one decision needs to be made for each rod.
It might be possible to compare single elements and come up with rankings for, e.g., all the solo double axels or all the laybacks in the event, to pick two relatively simple elements.
Well, not really, because there are different ways in which double axels or laybacks can be good or not so good. Using a length criterion only, comparable to the length of the rods, it would be relatively simple to determine which axel traveled the farthest in the air, or which layback traveled the least on the ice (i.e., was centered best). Ranking the length of jumps would be much like ranking the length of rods.
But ranking the quality of the jumps involves many more dimensions than just length. How high off the ice did the skater jump? How fast was each skater traveling across the ice on the takeoff and landing (which is usually correlated with distance in the air)? How long was the landing edge held? How secure and flowing were the takeoff and landing edges, and were they free of skidding or scratching? What was the quality of the body position on takeoff, in the air, and on the landing? Was any extra difficulty added to the element in the takeoff approach, in the air position (e.g., delay), and/or on the landing? Was the element timed to or otherwise explicitly connected to the music? Aside from possible qualitative weaknesses such as "poor takeoff," "poor air position," or "poor landing edge or landing position," were there any outright errors such as touching down a hand or free foot, stepping out of the landing, falling on the landing, underrotation (and by how much), landing on two feet, etc.?
Even in just ranking double axels, each judge would need to make decisions about all those aspects of the jumps. It wouldn't just be a matter of ranking each axel on distance, height, edge quality, positions, etc., but also of determining how much better one axel was than another on each quality in order to arrive at an overall ranking. And that assumes all these dimensions should be weighted equally. There are also value judgments involved, whereby each judge individually, the technical panel, or the rules and Scale of Values may decide that some of those dimensions, qualities, and errors are more important than others. Should a small, slow, downgraded, two-footed double axel rank higher or lower than a big, fast one landed on one foot and then followed by a fall? Does remaining upright always trump rotation, or vice versa, and how does length figure in?
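The weighting problem can be made concrete with a toy weighted sum that collapses several dimensions into one number. A minimal sketch; the dimension names, the 0-10 assessments, and both weight sets are invented purely for illustration and do not correspond to any actual judging rules:

```python
from typing import Dict

# Hypothetical 0-10 assessments per dimension; names and numbers are invented
# for illustration and are not taken from any judging rules.
small_slow_two_footed = {"height_speed": 2, "rotation": 3,
                         "landing_edge": 2, "stayed_upright": 10}
big_fast_then_fell = {"height_speed": 9, "rotation": 10,
                      "landing_edge": 7, "stayed_upright": 0}


def weighted_score(assessment: Dict[str, float],
                   weights: Dict[str, float]) -> float:
    """Collapse several dimensions into one number via a weighted sum."""
    return sum(weights[dim] * assessment[dim] for dim in weights)


# Two hypothetical value judgments about which dimensions matter most.
equal_weights = {"height_speed": 1, "rotation": 1,
                 "landing_edge": 1, "stayed_upright": 1}
upright_heavy = {**equal_weights, "stayed_upright": 5}

for name, w in [("equal", equal_weights), ("upright x5", upright_heavy)]:
    a = weighted_score(small_slow_two_footed, w)
    b = weighted_score(big_fast_then_fell, w)
    print(name, a, b, "small wins" if a > b else "big wins")
```

Under equal weights the big axel wins 26 to 17; if staying upright counts five times as heavily, the small axel wins 57 to 26. The per-dimension assessments are identical in both cases; only the value judgment about the weights changed.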
If judges are responsible for scoring whole programs, or for giving a single score for "jumps," they can make mental or written notes about each double axel and then take as many of those qualities as they can into account, along with the qualities of all the other elements or jumps, to arrive at a single score or ranking for the program, or for the jumps collectively, relative to the other skaters.
Multiply that by up to 7 jump elements or 12 total elements, each with a similar or more complex array of dimensions to evaluate, and just comparing "elements" or "jumps" could be 100 times more complicated than comparing the length of a rod or length of one jump.
Or, with IJS-style scoring, judges can simply give a score for each axel based on positive and negative GOE criteria (with the tech panel possibly affecting the base value through rotation calls) and then move on, without needing to compare this axel directly to specific other axels in the event, only to general standards for what makes a double axel good, better, or flawed.
It's possible to compare two rods, or several rods (though not really 1000 simultaneously), side by side to make decisions.
Skating performances take place across time. Therefore, it is never possible to compare even two performances simultaneously.
(Yes, it's possible for fans or officials to make videos comparing two or more elements or whole programs side by side after the fact. That might work for some kinds of comparisons, but except for compulsory pattern dances skated to the same or identically timed music, it wouldn't work for the Interpretation component. And real-life competitions take place in real time, with one skater performing at a time.)
You're always comparing to a memory of events that happened between 5 minutes and 5 hours earlier, not to something right in front of you at the same time.
She is better than the skater I have in fourth place but worse than the skater I have in third. Except for not being able to remember the details of each skater's performance all day long, this requires only two head-to-head comparisons.
Remembering performances across several hours is far from trivial.