However, I still maintain that in a judged discipline/sport, agreement between experts is the best we can go by and, for all intents and purposes, can be considered the "objective" standard. If one wishes to posit a theoretical "population mean of expert judgements" (or your example of superduper judges) and set that as the objective standard, then in theory it follows that a larger judging panel can mitigate individual judges' biases (the ultimate example of subjectivity) better than a smaller panel.
Thus, to me, a statistical problem remains (and so I believe it is important not to let the panel of judges shrink further to, say, 3). How can it not be so? You can apply statistical analyses to graded criteria, even if subjectivity plays a role in defining some of them.
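
To illustrate the panel-size point, here is a rough simulation sketch, not a model of any real scoring system. It assumes each judge's personal bias is an independent draw from a normal distribution centred on zero; the function name `panel_rms_error`, the spread `judge_sd=0.5`, and the panel sizes are all made up for illustration.

```python
import random
import statistics

def panel_rms_error(panel_size, trials=20000, judge_sd=0.5, seed=0):
    """RMS gap between a panel's mean score and the hypothetical 'true' score.

    Assumption (illustrative only): each judge's bias is an independent
    draw from a normal distribution with mean 0 and std dev judge_sd.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    squared_errors = []
    for _ in range(trials):
        # One bias per judge; the panel's error is the average of those biases.
        biases = [rng.gauss(0.0, judge_sd) for _ in range(panel_size)]
        panel_error = statistics.fmean(biases)
        squared_errors.append(panel_error ** 2)
    return statistics.fmean(squared_errors) ** 0.5

# Compare a few hypothetical panel sizes.
rms_by_size = {n: panel_rms_error(n) for n in (3, 5, 9)}
for n, rms in rms_by_size.items():
    print(f"panel of {n}: typical error ~ {rms:.3f}")
```

Under these assumptions the typical error shrinks roughly like `judge_sd / sqrt(panel_size)`, so a panel of 3 is noticeably noisier than a panel of 9. Note that averaging only helps with biases that differ from judge to judge; a bias shared by the whole panel would not be reduced by adding judges.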