
BevMo tasting shows strengths, weaknesses of group rankings


The blind tasting that Wilfred Wong (I think his title is e-cellarmaster) hosted at BevMo was a first of its kind and quite an interesting affair. It brought together an impressive range of industry types, most of whom came not only for the tasting but out of respect for Wilfred. Over a long career (longer, even, than mine!) Wilfred has stored up a deep repository of goodwill.

The purpose of the tasting was to measure some of BevMo’s bottlings (wines made for the chain) against others of their type and price. All four flights were blind. As is always the case, a group blind tasting is fascinating and frustrating. On the plus side of course is tasting the wines themselves. If they have been well-chosen (and Wilfred did a good job) it’s an interesting exercise in judgment. Then too, it’s always of academic interest to see what the group does. A group is a nebulous creature with no mind of its own, except a statistical one. On the debit side is that a group ranking is simply a mathematical number crunch. The fact that wine “A” came in first means only that more people preferred it than the others. It does not mean that nobody detested it.

There were four flights. The first was three Sauvignon Blancs. The group favorite was Husch 2008 (Mendocino), while mine was Vigilance 2008, from Lake County. I always did like those Lake County Sauvs that are so rich and savory. Last place was “75” Sauvignon Blanc (Napa Valley). But in truth, all these wines were pretty much of a piece.

The second flight was red table wines. Here again my first, Kumbaya non-vintage, was the group’s second, while their first, Red Truck 2008, was my second. And once again we agreed on the last place finisher, Folie a Deux 2008 Menage a Trois. This was a boring flight, but such wines are useful in the market. (Average price: about $10)

The third flight consisted of three Mendoza Malbecs. All I had to say was that there wasn’t a dime’s worth of difference between them. Here, the group and I were in accord, in this order (all 2008s): Alta Vista Classic, Zolo Gaucho Select and Crios de Susana Balbo.

Things got interesting in the fourth flight, which was ultra-premium Pinot Noir. The quality of these wines, as opposed to the first three flights, was instantly obvious. You knew you were dealing with wines of substance. From the discussion that followed (even before the results were announced) it was clear that people’s impressions diverged widely. Why would they not? One person raves about something; another loathes it. Still others, one imagines, are not sure how they feel. In this instance, you have to wonder whether a group score possesses any credibility at all. That’s why I wonder about some of these online sites that purport to aggregate many different critical reviews, or to add them all up and calculate some kind of average.

I also noticed something during the discussion that I’d been dimly aware of for some time, but that this tasting really brought home to me. It concerns how eager some professional tasters are to discover what they fancy are technical flaws in the wines, and then announce them to the group. Somebody will find TCA, or brett, or some other kind of mold or imbalance. It’s a kind of gamesmanship. I recall very high level tastings with some very famous names in which, as soon as the wines were poured, there was a race to sniff through them all to be the first to cry out, “Corked!” It’s like playing Bingo.

Anyway, in the Pinot Noir tasting my scores and the group’s were wildly divergent. All the wines were 2007s. Here are my preferences, from first to last, with group rankings in parentheses, followed by my Wine Enthusiast rating, if applicable.

1. Dutton-Goldfield 2007 Devil’s Gulch, Marin County (Group 6th) (I scored this 90 points in Wine Enthusiast.)
2. Roar Garys’ Vineyard, Santa Lucia Highlands (Group 1st) (WE: 94 points.)
3. Patz & Hall Hyde Vineyard, Carneros (Group 4th) (unreviewed)
4. De Ponte Baldwin Reserve, Dundee Hills (Group 12th) (unreviewed)
5 (tie). Beaux Frères Ribbon Ridge (Group 8th) (unreviewed)
5 (tie). Peay Scallop Shelf, Sonoma Coast (Group 11th) (unreviewed)
6. Failla Hirsch Vineyard, Sonoma Coast (Group 9th) (WE: 94 points)
7. Kosta Browne, Sonoma Coast (Group 3rd) (unreviewed)
8. Navarro Deep End, Anderson Valley (Group 7th) (unreviewed)
9. Testarossa Sierra Madre Vineyard, Santa Maria Valley (Group 2nd) (unreviewed)
10. Dahlia Reserve, Monterey County (Group 5th) (unreviewed)

The fact is that all these wines, with the possible exception of the Dahlia (which was pretty simple), were quite good. Their order of ranking would easily shift if you repeated this tasting the next day.

  1. Love this line: “A group is a nebulous creature with no mind of its own, except a statistical one.” I did my own rant a couple of days ago re: group tastings at wine competitions, and the problems they fail to acknowledge.
    You don’t list vintages for these wines – pourquoi?

  2. I totally agree that the aggregate reviews are less than helpful. I noticed that all the scores seemed to gravitate to the 89-90 range.

    The tasting notes are worth a read.

  3. The rating & comments on the Pinot Tasting are particularly interesting, especially noting that the KB was the #4 “Wine of the Year”, in WS.

  4. My dear Gregutt, read more carefully. “All the wines were 2007s.”

  5. RE #4 for KB.

    The list of wines tasted was very special, and when one tastes wines of that quality, rankings are less important than overall quality, thus having that wine 7th probably means that it was very close to everything in the middle, and there are some quite good wines there.

    As for the wine finishing 4th with WS, that is a different topic altogether. For my taste, the KB was very good, getting 91 points in Connoisseurs’ Guide, but there were other KB Pinots (six, in fact) that I liked better.

    Still, at the level of the wines tasted, the individual tasting notes become more important than the scores because, no matter how you slice it, the Roar wine is simply a different breed than the KB, which is again very different from the Oregon wines.

  6. Diane Thompson says:

    Interesting comments. Not surprised to see that the Roar was universally liked. I’d be interested in whether there was some particular quality in the Testarossa that resulted in your divergent opinion from the group.

  7. Diane, the Testarossa was a fine wine. I just didn’t place it in the top ones on that day.

  8. Charlie makes excellent points. With wines of this quality, numerical differences can be misinterpreted to mean more than they actually do.

  9. “The fact that wine “A” came in first means only that more people preferred it than the others. It does not mean that nobody detested it.”

    Exactly. The average of 82 and 100 scores is the same as 90 and 92. An average is not terribly useful without some measure of variability.
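
    The arithmetic in that comment is easy to check. Below is a minimal sketch using only Python's standard `statistics` module; the two score pairs come straight from the comment, while the variable names and the choice of the population standard deviation as the measure of variability are assumptions made for illustration.

```python
from statistics import mean, pstdev

# The two pairs of scores mentioned in the comment.
polarizing = [82, 100]   # one taster dislikes the wine, one loves it
consensus = [90, 92]     # both tasters roughly agree

avg_a = mean(polarizing)
avg_b = mean(consensus)

# Population standard deviation as one possible measure of variability.
spread_a = pstdev(polarizing)
spread_b = pstdev(consensus)

print(f"polarizing: average {avg_a}, spread {spread_a}")
print(f"consensus:  average {avg_b}, spread {spread_b}")
```

    Both pairs average 91, but the first has a spread of 9 points and the second only 1, which is exactly the information the average alone hides.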

  10. The tasting was fascinating on so many levels. In all four flights, I gained insights into wines I really wanted to put to a test and was able to corral a group of top tasters (including Steve Heimoff, who contributed greatly to the discussion) to join me in the exercise. Oh yes, and the Pinot Noir flight was super! I would drink any one of those twelve wines anytime. One note: the Dahlia Reserve Pinot Noir (a BevMo vineyard partner exclusive) is 2008 and the Calera Ryan is 2006. I do agree with Charlie Olken that with this caliber of Pinot Noir, rankings are not as important as good solid notes: I loved the 2007 Roar Pinot Noir…so incredible with its pure red fruit flavors and brightness…long and sexy, providing endless enjoyment to anyone looking for the Holy Grail.

  11. Charlie is of course right when he says that such top drawer wines are difficult to rank. Each would have gone to the top in a flight that included less stellar wines. Nonetheless, even the most experienced tasters can have their preferences among the best of the best. What struck me was Steve’s deviation from the group for his first wine, his 4th choice, 7th and 9th place. I commend him for being so forthright, but surely this says something about personal preferences vs. those of the other experts. I wonder who tracked with the group and what this says methodologically.

  12. In fact, there are several statistical ways to deal with variability (dispersion); the mean absolute deviation (MAD) and the standard deviation (STD) being the most popular ones.
    You can divide the wine’s average rating by its measured dispersion (MAD or STD), or subtract the dispersion from the average. This way, if someone detests one specific wine, it will be strongly penalized in the overall evaluation.
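
    As a rough sketch of the subtraction variant described above: the panel scores below are hypothetical, and the helper names `mean_abs_dev` and `adjusted_score` are made up for this illustration. Only the standard `statistics` module is used. The division variant would work similarly, but it would need a guard for the case where every taster gives the same score and the dispersion is zero.

```python
from statistics import mean, pstdev

def mean_abs_dev(scores):
    """Mean absolute deviation (MAD): average distance from the mean."""
    m = mean(scores)
    return mean([abs(s - m) for s in scores])

def adjusted_score(scores, dispersion=pstdev):
    """Average rating minus a dispersion measure (STD by default)."""
    return mean(scores) - dispersion(scores)

# Hypothetical five-taster panels; both wines average exactly 90 points.
polarizing = [96, 95, 94, 93, 72]   # one taster detests the wine
consensus = [91, 90, 90, 90, 89]    # everyone roughly agrees

for name, scores in [("polarizing", polarizing), ("consensus", consensus)]:
    print(
        name,
        "mean:", mean(scores),
        "STD-adjusted:", round(adjusted_score(scores), 2),
        "MAD-adjusted:", round(adjusted_score(scores, mean_abs_dev), 2),
    )
```

    With either dispersion measure, the wine that one taster detests ends up well below the consensus wine, even though the raw averages are identical.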

  13. Note to Tom Merle–

    If, as you so kindly say, I am right, then Steve’s deviations are nothing of consequence. That is the problem and the joy of superstar tastings. It is hard to be wrong, and thus, one is always right. In a tasting like the Pinot flight, individual preference for style will separate the wines for each taster. Some will adore the richness of Roar, others will like the pure fruit of the KB and still others will prefer the more structured approach of the Oregons.

    All of this is to point out that preference order is only one measure. Quality is another, and in a tasting like this one, there are likely to be very small qualitative differences but significant organoleptic differences.

    It is at this juncture in the conversation that we need to turn to specific scores and bring in a statistician to analyze differences from the group. In a forced preference order ranking, there will be differences. When Earl Singer and I started Connoisseurs’ Guide, we spent a great deal of time doing statistical analysis of results. There are chapters about how to do this in Amerine and I think in Broadbent, and we got to the point at which we could recognize when we had statistical significance in results and when we did not.

    The problem was that we were not publishing rank order reports of our tastings of twelve wines (the most we tasted when we started although we now do sixteen divided into two flights of eight) but qualitative assessments. Our three-star system at the time was widely hailed as a good system, but eventually, like so many others, we adopted the 100-point system as well (let’s not go there now).

    The operative point, and it applies in spades here, is that preference has nothing to do with quality level. And all of this is a long way of saying that I think you are barking up the wrong tree.


  14. Charlie,

    I certainly agree that preference and quality are two very different parameters, though I disagree that they have ~nothing~ to do with one another. Frequently, for the “expert” taster and the regular person they dovetail. But to clarify: I am not interested in determining “quality” along the lines of Arthur at (“searching for truth among wines”) or ranking wines by how best they reflect/capture terroir. For most consumers, such matters are largely irrelevant. They are looking for more pleasure for the buck. Taste preference is the dominant variable.

    I wonder how Wilfred’s 25 tasters ranked the Pinots; by some idealized objective standard, as interpreted by the participants, or according to their own personal enjoyment? I suspect the latter. While palates, as I have noted before, can vary, I still think that there are some wines that have broad appeal… or the converse.

    If one offers opinions about wines to guide purchasing decisions, then how individual (idiosyncratic) can those opinions be? Should such reviews attempt to identify those wines which hit some sweet spot among one’s readership? Since this is quite difficult, I think the BevMo tasting panel approach taken by Mr. Wong, like that developed by Eric LeVine, makes the most sense.


  15. I believe Mr. Olken has raised a very important point now. Quality level has to be estimated via fault detection, integrity assessment, and objective (standardized when possible) evaluation of the economic means and processes employed throughout the wine production. Tasting preference is a whole different ball game.

  16. Tom–

    In a large group average, you lose the opportunity to recognize differences in character that might appeal to those tasters who do not have average preferences.

    Take ripe Zinfandels, for example. If a group average likes wines that are about in the middle of the ripeness and acidity range, it may disdain wines that are more tightly structured, higher in acidity, less hedonistically rewarding at first glance, etc. A qualified critic (we can debate how and who at another time) is able to say, “This is a really good example of a super-ripe Zinfandel because it has fruit and acidity and does not taste like prunes”, and thus give it a very high score. The tasting note directs the readers to the positives for those who like the style. An agglomerated rating will always knock that wine down. Same for the higher acid Zins.

    To me, a good critic is able to recognize several sweet spots and also to assess the level of grandeur that is present. I should not have to be told that a flight of Pinots is very good to know that it is very good when I taste it blind. And I should be able to tell my readers why a great wine from Roar is worth the same high rating as a great PN from Oregon despite the fact that those wines are very different.

    Let me try you with a different taste group. I am a fan of mystery thrillers, both in writing and in the movies. I am far less a fan of stories like Bridges of Madison County. Yet, if I were a book reviewer, I would instantly recognize that Bridges was a far better written book than anything that Robert Ludlum or Tom Clancy or Dan Brown ever wrote.

    The same is true about wine. I might prefer to drink a $20 Zinfandel from Ridge, but I damn well better know that a $100+ wine from Spottswoode is also great. It is not idiosyncratic or narrow reviews that emanate from most reviewers–and certainly should not be.

    Mr. O’Connor–

    A wine lover does need to be able to bring technical qualitative judgment to his or her tasting vision, but quality to me goes beyond technical evaluation and must necessarily include hedonistic judgments as well. Tasting preference is simply integral to the way we enjoy wine as a beverage, and it cannot be divorced from technical considerations in making an overall quality judgment as reflected by the judgment words in a wine review and ultimately by the rating given. I am not sure how far apart we are, and would welcome further comments from you.

  17. “quality to me goes beyond technical evaluation and must necessarily include hedonistic judgments as well.” Well said, Charlie!

  18. So, Charlie, the major (and minor) wine critics are not only bringing out these sorts of distinctions in their notes, but are awarding points based largely on style and perhaps sense of place. Meaning they give high points to wines that they may not have a preference for themselves, say where different Chards fall on the oakiness spectrum, or Zins by ripeness, but recognize that within its type some wines deserve a 93. I remain skeptical.

    I also think there is a bit of distortion in your use of the term average. As the average among a group of tasters moves above that magical threshold of 90, say, we know that while some are less impressed a large number of tasters don’t consider the wine average but excellent, etc. Moreover, when my wine club tastes, say, Old Vine Zins, sure some will like the fruit bombs, but most are less pleased and do not find that one or two represent outstanding examples of the super rich version of this wine. Where Old Vine shines it is simply a superb Zin–the oldness of the vines generally adding to the qualities that make Zin Zin.

    I’m not sure about your book analogy. Genres (mysteries, histories, etc.) are species within the genus ‘book’. Ditto beer, tea and wine within the category beverage. Within each of those genres, I believe the tastes of those who have developed a more critical appreciation that enhances their senses (not the novice) will cluster on the graph.

    When my group tasted unoaked Chards, the Kim Crawford, as it does with SBs, triumphed because the winery puts the vino through malolactic fermentation hitting that sweet spot, figuratively speaking. When we compared “naked” to “clothed” the oaked examples had higher scores with the exception of the Crawford which has managed to balance great fruit with excellent winemaking including the roundness that comes with malo.

    Well this isn’t very articulate; I can’t match your wordsmithing or Steve’s, but then you guys are writers and I’m not.

  19. Thank you for your comments.
    I agree with your view that while technical and economic factors are essential to a great wine, it also has to provide pleasure. You can always say this is a tautological argument, though. But then again, I’m afraid I will have to disagree with you about the strict definition of quality.
    Although “quality” could, in some cases, be defined as a metaphysical concept, “the quality of something depends on the criteria being applied to it. From the neutral point of view, “quality” is simply the inseparable sum of its essential attributes or properties”. [Reese, William L. (1996)].
    According to T. M. Scanlon, and most models of behavior, not only utilitarianism, but most economic, psychological and social conceptions of behavior, “something’s being good consists in its being such that there is a reason to respond positively towards it”. And this reason, which elicits a positive response, is under normal conditions called the object’s “intrinsic value”.
    French contemporary philosopher Michel Onfray defines hedonism “as an introspective attitude to life based on taking pleasure that explores how to use the brain’s and the body’s capacities to their fullest extent”. Marian W. Baldy [Baldy, M., Ph.D.; The University Wine Course; T.W.A.G.; 1996] maintains that “it may seem that winetasting theory assumes that we all perceive things in the same way. However, we know that winetasters differ from one another in their sensory responses due to different genetic and biological traits, such as in thresholds and anosmias for specific chemicals; personality and intelligence factors, which influence their interest, motivation and understanding; differences in experience and language usage; and personal and cultural factors that influence the taster’s hedonic response to a wine”.
    I will, therefore, continue to believe that quality is an independent (and objective) attribute, while “hedonistic judgment” refers to a personal and/or individual sensory evaluation and appreciation of wine.

  20. Tom–

    Yes to your first paragraph. I think that is the difference between tasting groups and critical evaluation by professionals who see the value in a wide range of styles. That does not mean that Berger is wrong and I am right, for example. I can follow Dan’s logic in his tasting notes and know exactly where he stands while also knowing which of the wines he likes I will also like.

    But it is arguable that the willingness of Connoisseurs’ Guide to like Pahlmeyer and Hobbs and Lewis Chardonnays while also liking Pfendler, Freestone and Marimar is of greater utility to the general public–so long as the tasting notes explain why. And the fact is that I see no reason why one style is right to the exclusion of others.

    Mr. O’Connor–

    I understand the academic argument, but most wine buyers reading reviews of wine do not view the exercise as metaphysics but existentialism. We agree on what is; therefore it is.

  21. This comment may appear defensive but is meant only to add another dimension to the analysis above.

    Some people make wine to try to please as many people as possible. Others try to make wine that pleases only certain people. In my mind, that is why sweet, fruit-forward wines score well: critics judge for what will please the average palate (the greater number of people), and do it in a setting that is slanted towards appreciating cocktail-style wines (simple and hedonistic wines), not wines made for enjoying with food that stimulate discussion.

    Again, really not trying to be defensive as I am not concerned with our “ranking.” Only that it is a flaw in wine criticism and one of the reasons I do not send my wine to critics who judge for the average consumer or enter them in large panel tastings. This was a favor to Wilfred and is why it showed up.

    Andy Peay

