Tuesday, August 24, 2004
Comeback Kid
I know I've been gone a while. I caught the aftermath of Hurricane Charley from Tampa over to Daytona. I'll fill you in on that when I get a chance. Quick observation on Orlando though: when did it become a territory of the British Empire? I heard more varieties of British accents than Henry Higgins.

On the Olympic gymnastics controversies, my question is this: how can there be disagreement or uncertainty about start values? In diving, the diver writes out the list of dives with the degree of difficulty, signs it, and submits it to the officials, who review it and put it on the board. The dive announced is the dive written on the sheet, and that's the dive you perform. If you do a different dive, it's a failed dive.

Now, given the complexity of combinations, gymnasts need some flexibility (ha-ha) in terms of move choices (e.g., you know after your round-off back handspring you won't be able to add the full twist to your double-back). That's fine, but they obviously have some process for determining a start value, and it can't be that hard to submit the list correctly and get the start value posted correctly. Do they even post, display, or announce the start values in the gym like they do in diving? Seriously, do we have to wait for Al Trautwig (a definite improvement over John Tesh) to ask Tim Daggett about the start value after the routine? Why isn't it shown right alongside the gymnast's name and country?

If it's posted incorrectly, that might be the time to raise hell, although it should really happen when the list is submitted: you hand in your routine with your proposed start value, and the official either signs off on it or, if you disagree, the two of you work out a start value you can both sign off on (all of which is done x minutes before the start of competition for that day). This is not the most complicated thing in the world, and it is not unreasonable to expect the least subjective element of the gymnastics scoring process to be error-free.

I'm not sure what you can do to fix the scoring biases. It seems inherent to any rating system, like the BCS in college football. The judges already have the deduction scales for each deficiency (just as in diving--in fact, in diving you could theoretically get a negative score with all the deductions). Reputational effects abound in most of these instances (favorite divers, favorite college football teams/conferences, favorite gymnasts, calling balls and strikes, calling pass interference) with each judge having her/his own tastes.

However, one option might be to review judges' ratings over time to identify patterns of variability. You should be able to identify judges who consistently score one or two standard deviations above or below the panel average, or above or below the final scores for particular gymnasts or teams, and penalize the judges who show these problems. You could also model order effects within competitions (the tendency for later performers to be scored higher than earlier ones, or for early performers to be judged more harshly than later ones; I think there's literature on these kinds of effects).
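To make the "consistently one or two standard deviations off" idea concrete, here's a rough sketch of how such a review might work. Everything in it is an assumption for illustration: the data layout (each routine maps a judge's name to the score awarded), the helper names `judge_bias` and `flag_judges`, and the 1.0 threshold are hypothetical, not any federation's actual procedure. For each routine it measures how far each judge sits from the panel mean in units of the panel's standard deviation, then averages that over every routine the judge scored.

```python
from statistics import mean, pstdev

# Hypothetical data layout: routine id -> {judge name -> score awarded}.
Scores = dict[str, dict[str, float]]


def judge_bias(routines: Scores) -> dict[str, float]:
    """Average z-score of each judge relative to the panel, per routine.

    Positive means the judge tends to score above the panel mean,
    negative means below it.
    """
    z_by_judge: dict[str, list[float]] = {}
    for marks in routines.values():
        panel = list(marks.values())
        mu = mean(panel)
        sigma = pstdev(panel)
        if sigma == 0:
            continue  # unanimous panel: no spread to measure against
        for judge, score in marks.items():
            z_by_judge.setdefault(judge, []).append((score - mu) / sigma)
    return {judge: mean(zs) for judge, zs in z_by_judge.items()}


def flag_judges(routines: Scores, threshold: float = 1.0) -> list[str]:
    """Judges whose average deviation is at least `threshold` std devs."""
    bias = judge_bias(routines)
    return sorted(j for j, b in bias.items() if abs(b) >= threshold)


# Made-up example: judge D runs about 0.2 high on two of three routines.
example: Scores = {
    "routine1": {"A": 9.7, "B": 9.7, "C": 9.7, "D": 9.9},
    "routine2": {"A": 9.5, "B": 9.5, "C": 9.5, "D": 9.7},
    "routine3": {"A": 9.8, "B": 9.8, "C": 9.8, "D": 9.8},  # unanimous, skipped
}

if __name__ == "__main__":
    print(flag_judges(example))
```

Note the catch raised below: a yardstick built on the panel average would also flag the judge who holds firm at 9.70 while everyone else inflates, so a real scheme would need to compare against something sturdier than the panel mean alone.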

Now, this does little to address problems like upward bias ("grade inflation") in scores (you can reduce variance at the top of the scale by scoring people closer to 10.0, or 6.0 for skating), and it might penalize judges who preserve standards (e.g., most judges inflate somebody to 9.90 but you hold firm to the 9.70 they deserved and get penalized for being outside the norm), but I think there are workable solutions to all of this. The data certainly exist to evaluate these patterns, and based on the patterns we should be able to design incentive structures that fix the problems.

