Thursday, May 22, 2003
Maybe Length Isn't the Only Thing that Matters
Annika Sorenstam seems to be doing well on day 1 of the Colonial. Apparently all the "she doesn't have the length" stuff was just a psych-job by the players that is proving to be ineffective.

Her performance brings to mind an issue about sports performance that I wanted to take a second to vent about. Sports produce many statistics, but most focus on frequencies or percentages (batting average, winning percentage) or on lengths and distances (average runs allowed, which is ERA; average driving distance in golf). Nobody reports statistics on variability in performance, and even when they kind of do (e.g., greens in regulation), it's used as a frequency measure, not an indicator of variability.

Which brings me back to Ms. Sorenstam. Would you prefer to hit your drives 280 yards in the fairway 90% of the time, or hit it 320 yards but in the fairway only 60% of the time? Lots of things affect the tradeoff (e.g., your ability to hit out of the rough or a bunker), but the opportunity to hit consistently from an advantageous lie would seem at some point to offset the value of extra distance. Perhaps you have to hit the fairway only 50% or 40% of the time for it to be worth dialing back power. Or, if you only hit 250 90% of the time, you might prefer the length. But the tradeoff exists, and the data exists in theory to identify the returns to distance. (I say in theory because they collect the data, but I don't know who stores it or whether they would make it available to study.)
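
To put a rough number on that tradeoff, here is a minimal sketch in Python. It rests on an assumption that is mine and not anything measured: that missing the fairway costs you a fixed number of "effective yards" (the function name and the whole model are hypothetical). Under that assumption, you can solve for the penalty at which the shorter, straighter drive and the longer, wilder drive come out even.

```python
def break_even_penalty(long_yards, long_hit_rate, short_yards, short_hit_rate):
    """Yards a missed fairway must cost for the two strategies to tie.

    Hypothetical model: expected value = distance - penalty * P(miss fairway).
    Setting the two expected values equal and solving for penalty gives
    penalty = (extra distance) / (extra miss probability).
    """
    extra_distance = long_yards - short_yards
    extra_miss_rate = short_hit_rate - long_hit_rate
    if extra_miss_rate <= 0:
        raise ValueError("the longer drive must miss the fairway more often")
    return extra_distance / extra_miss_rate


# 320 yards at 60% in the fairway vs. 280 yards at 90%
print(round(break_even_penalty(320, 0.60, 280, 0.90), 1))  # about 133.3

# 320 yards at 60% vs. only 250 yards at 90%
print(round(break_even_penalty(320, 0.60, 250, 0.90), 1))  # about 233.3
```

Read it this way: in this toy model, if a missed fairway costs you more than about 133 yards' worth of trouble, the 280-yard drive wins; against a 250-yard drive the rough would have to cost about 233, so you'd probably take the length. The real penalty is exactly what the tour's data could estimate.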

Think about it in a football context: would you rather have a runner who over 4 games rushes for 200 yards in one game but only 50 in each of the others (350 total), or would you rather have the guy who runs for 87.5 in each of 4 games (350 yards)? Similarly, would you rather have the guy with a 4.0 yard per carry average who 90% of the time rushes for between 3 and 5 yards, or the guy who also averages 4.0 per carry but who will get anywhere between -2 and 10 on a carry? This seems to be the crux of the Emmitt Smith v. Barry Sanders debate. Different people will value the variability differently (some people are risk averse, some are risk seekers), but it's not unreasonable to prefer the low-variability guy. The truly risk averse would prefer the guy with a lower average (3.8) to a higher average (4.0) if the variability in the higher average guy contains too many possible negative outcomes (e.g., tackles behind the line of scrimmage). (All of this is apart from issues of "does the guy hold onto the football," which is also a tradeoff.)
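
The two runners in the 4-game example can be compared directly. This is just a sketch using Python's standard `statistics` module on the game totals given above; the labels "boom-or-bust" and "steady" are mine.

```python
from statistics import mean, median, pstdev


def summarize(yards):
    """Total, mean, median, and population standard deviation of game totals."""
    return {
        "total": sum(yards),
        "mean": mean(yards),
        "median": median(yards),
        "stdev": pstdev(yards),
    }


boom_or_bust = [200, 50, 50, 50]    # one huge game, three quiet ones
steady = [87.5, 87.5, 87.5, 87.5]   # the same output every game

for name, games in (("boom-or-bust", boom_or_bust), ("steady", steady)):
    s = summarize(games)
    print(f"{name}: total={s['total']}, mean={s['mean']}, "
          f"median={s['median']}, stdev={s['stdev']:.1f}")
```

Both runners total 350 yards and average 87.5 per game, so the usual stats can't tell them apart. The standard deviation (about 65 yards versus 0) and the median (50 versus 87.5) can.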

The same holds for baseball. Over 16 at bats in 4 games, would you prefer to have a hitter who goes 4-for-16 by going 1-4 in each game, or a hitter who goes 0-4 in 3 games and 4-4 in the last (assume all are singles and the person plays for the Tigers so that there are never runners on base ahead of the hitter)? In one sense, it may not matter: if you're not going to score, who cares how you get on base. But if we relax the assumption about runners being on base, I think it would be better in the long run to have the hitter with minimum variability, all else held equal. Sure, you'd hope that the 4-4 occurred with runners on base, but even that has an expectation and is unlikely to coincide with your great hitting.

Now, does this mean that you'd prefer the minimum variance hitter (1-4 every night) if the other guy were hitting homers in each at bat? Of course not. But if you're evaluating player performance, I would say the streakiness of the hitter matters and that you'd want to minimize variance in hitting (or runs allowed or whatever). Even if you didn't report variances all the time, you might have a better sense of performance if you occasionally reported medians, a statistic that is less influenced by the extremes (e.g., a run for 98 yards, giving up 5 runs in 1/3 of an inning). Let's see if they talk about that in Moneyball. I'll try to buy it this weekend.

