The Wine Buyer’s Dilemma, The Wine Economists’ Blunder
The American Association of Wine Economists—yes, there is such an entity—released a working paper the other week that caused a stir among grape nuts. “The Buyer’s Dilemma—Whose Rating Should a Wine Drinker Pay Attention To?”, co-authored by Omer Gokcekus of Seton Hall University and Dennis Nottebaum of the University of Münster, compared “expert” scores for 120 wines from the 2005 Bordeaux vintage with the ratings given those wines by CellarTracker users. The experts were Robert Parker, Stephen Tanzer, and the Wine Spectator, and the results were eye-catching: Parker and the Spectator had significantly higher average scores than the CT community, while Tanzer’s average score very nearly matched the CT average, coming in just a smidge under it. Based on these findings, the answer to the question is clear: Tanzer is the critic to heed. Buyer’s dilemma solved!
Bizarrely, Gokcekus and Nottebaum misreport their findings in the paper’s Abstract: they claim that the average scores for all three expert sources were higher than the CT ratings. Only when you get to the section devoted to Tanzer do you discover that his average score was actually lower. Also, Tanzer’s first name is spelled “Stephan” several times in the text. I understand that this is a “working” paper, but before it was released, it should have been scrubbed of these mistakes, which naturally cause the reader to wonder if the same carelessness pervaded the entire project. As it is, the study was compromised by a more fundamental mistake, which I’ll get to in a minute.
First, let’s go inside the numbers (someone has clearly been watching too much SportsCenter). Parker rated 107 of the 120 wines that were used in the study; his average score for those wines was 93.2, versus 91.7 on CellarTracker. The Spectator published ratings for 104 of the wines, and its average score was 93.24 versus 91.73 on CT. Tanzer graded 61 of the 120 wines, and his average score was slightly lower than the community’s—92.08 versus 92.16. You will note that the sample size for Tanzer was markedly smaller than those for Parker and the Spectator. Via email, Gokcekus told me that this wasn’t a problem: “When you conduct statistical tests the data size is taken into account. Thus, statistically speaking, the results are not skewed. [Tanzer’s] smaller data set is not giving him any advantage or disadvantage.” The average price of the wines that Tanzer reviewed was $145.60, versus $109.97 for Parker and $119.81 for the Spectator.
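Gokcekus's point about sample size can be made concrete. A standard two-sample test such as Welch's t folds each group's size into the standard error, so a smaller sample like Tanzer's doesn't bias the comparison; it just widens the error bars and makes a difference harder to call significant. A minimal sketch, using made-up score lists rather than the paper's actual data:

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic. Sample sizes enter via the
    standard error, so a smaller group widens the error bars rather
    than skewing the comparison."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    # Sample variances (n - 1 in the denominator).
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se

# Hypothetical critic and community scores (illustration only):
critic = [93, 94, 92, 95, 93, 94]
crowd  = [91, 92, 90, 93, 92, 91]
print(round(welch_t(critic, crowd), 2))  # → 3.3
```

Shrink either list and the standard error grows, which is exactly the sense in which, as Gokcekus says, the test "takes the data size into account."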
Although Tanzer is the study’s big winner, the paper focuses mainly on Parker (more of this Parker obsession!), and it’s his numbers that are most closely scrutinized. Wines that Parker rated 89 or 90 points received slightly higher scores from the CT crowd, but above 90 points, CT users were notably tougher graders. Wines that got 93 points from Parker averaged 91.5 points on CT; wines that he awarded 95 points garnered just 92 points on CT; and wines that earned 98 Parker points averaged 95 CT points. Gokcekus and Nottebaum offer two possible explanations for these discrepancies. They suggest that Parker might be more adept at sniffing out nuances than regular wine drinkers. Or it could be that CT reviewers are engaging in “iconoclastic” behavior—they resent Parker’s influence (his “hegemony over the wine community,” as Gokcekus and Nottebaum put it) and are using their own ratings as a form of rebellion against him.
You are not buying that last part? Me neither, but let’s set that aside for a moment. While “The Buyer’s Dilemma” is an interesting paper, I have issues with how the study was done and how Gokcekus and Nottebaum interpreted the data. The major flaw is that they chose a less-than-ideal set of wines as the basis for their research: the 2005 vintage in Bordeaux produced hugely structured clarets that are still years from reaching maturity. Gokcekus told me that they picked 2005 because “at the time we started this research project it was the youngest vintage that had already been scored by a large number of users on CT.” But the 05s are too young, and this may have skewed the results. Critics have graded the 05s based on the potential that they see in the wines. Whether you believe that the pros have the ability to accurately predict how a wine is likely to evolve over the course of decades is immaterial; till now, the experts have rated the 05s mostly with an eye to future performance.
By contrast, I’d guess that the majority of CellarTracker 05 Bordeaux scores were derived on the basis of immediate gratification, and because many of these wines are still painfully tight, that may have held down the CT ratings. It is more than likely that the two groups, the pros and the amateurs, were using different yardsticks to judge the wines, which could account for the split between the CT community and Parker and the Spectator. This is why the choice of wines was so key. Gokcekus and Nottebaum would have been much better off using wines that offer enough upfront pleasure that this criterion gap, if you will, wouldn’t have been such an issue. A recent vintage of Napa cabernets would have been a more sensible option. Given where they are in their development, 05 Bordeaux were almost uniquely ill-suited to this kind of study.
Gokcekus and Nottebaum suggest that the difference between Parker’s average score and the CT average might be rooted in Parker’s ability to discern nuances that would likely elude everyday drinkers. I think they flatter Parker. For one thing, the 05s are still very primary and aren’t yet offering much in the way of subtleties. Beyond that, there is the issue of methodology. Parker and other critics taste dozens of wines a day and probably spend no more than 30 or 40 seconds evaluating most of them, so how much nuance can they really perceive? We know from the research into sensory perception that repeated, rapid exposure to the same basic aromas and flavors dulls the senses. Anyone who has ever tasted, oh, 40 young Barolos in a single sitting can attest to that (once they regain the ability to move their tannin-battered mouths). So the nuance argument gives Parker and other critics more credit than they probably deserve.
The “iconoclastic” explanation strikes me as completely far-fetched. While there has always been resentment of Parker’s influence, it has generally been confined to fellow critics and winemakers; I don’t think consumers have felt much resentment—quite the opposite, actually. And while a small subset of wine enthusiasts now seem to define themselves in opposition to Parker—basically, if he’s for it, they are against it—I doubt that large numbers of CellarTracker users are scoring wines differently from Parker simply for the purpose of sticking it to The Man. If that were the case, the discrepancies would surely be greater. If you are going to make a statement, Make A Statement. A 2–3-point margin is a difference of opinion; a 20-point margin is an act of defiance, a raised middle finger.
In conjuring such an outlandish scenario, Gokcekus and Nottebaum overlooked what I think is the most intriguing possibility of all: that the differences between expert ratings and CT scores may be indicative of grade inflation on the part of critics. I’m repeating myself here, so forgive me, but there are powerful incentives now for critics to bump up their scores. High scores are catnip for retailers, who use them to flog wines via shelf talkers and email offers. In turn, those citations are excellent free publicity for critics. In a tough economy and a crowded marketplace for wine information, big numbers can help a critic to stand out, and I don’t think there is any doubt that score inflation has become rampant. Tanzer is one of the few critics who has held the line on scores, and the fact that his ratings most closely track the CT ratings strikes me as a highly suggestive data point. Parker and the Spectator are not as restrained as Tanzer, and this may account for the gap between their grades and those of the CT community. CT users might be amateurs, but many of them are very experienced and knowledgeable tasters, and it could be that they are giving these wines scores that more accurately reflect their quality. But, again, because Gokcekus and Nottebaum based their investigation on a less-than-ideal group of wines, this is purely speculation.
Gokcekus told me that he and Nottebaum plan to conduct more research on this topic, using a different region/vintage combination. Let’s hope they choose more wisely next time. Picking the wrong wines can spoil a dinner; it can also needlessly complicate a study.