ORGANIZATIONAL BEHAVIOR AND HUMAN DECISION PROCESSES 50, 324-340 (1991)

Outcome Trees and Baseball: A Study of Expertise and List-Length Effects

RICHARD D. JOHNSON
University of Alberta

RICHARD D. RENNIE
University of Regina

AND

GARY L. WELLS
Iowa State University

When people estimate the probability of an event using a list that includes all or most of the possible events, their estimate of that probability is lower than if the other possible events are not explicitly identified on the list (i.e., are collapsed into an all-other-possibilities category). This list-length (or pruning) effect has been demonstrated to occur even for people who have expertise or considerable knowledge in the event domain. We reasoned that the experts used in previous studies would be unlikely to have probabilistic representations of their problem domains (e.g., auto mechanics, auditors, hospitality managers). We used baseball experts (n = 35) and novices (n = 56) on the assumption that expertise in baseball almost certainly involves mental representations of probability for various baseball events. Subjects estimated the frequency of hits, walks, strikeouts, putouts, and “all other” outcomes for an average major league player in 100 times at bat. Other subjects estimated these event outcome frequencies in a short-list condition (e.g., strikeouts, walks, and “all other”). Strong list-length effects were observed with novices; the frequency estimate for strikeouts, for example, was nearly twice as high in the short-list condition as in the long-list condition. Experts, however, showed no list-length effect and their estimated probabilities were very near the actual (normatively correct) probabilities in all conditions. We argue that the omission effect can be overridden by strong mental representations of the family of possible events and/or a clear knowledge of the probabilities associated with the events.
As well, we argue that list-length effects seem to result at least in part from an anchoring-and-adjustment strategy.

Order of authorship was determined alphabetically and does not necessarily reflect the relative contributions of the authors. Requests for reprints can be sent to any of the three authors: Gary L. Wells, Department of Psychology, Iowa State University, Ames, IA 50011; Richard D. Rennie, Faculty of Management, University of Regina, Regina, Saskatchewan, Canada; or Richard D. Johnson, Faculty of Business, University of Alberta, Edmonton, Alberta, Canada.

0749-5978/91 $3.00
Copyright © 1991 by Academic Press, Inc.
All rights of reproduction in any form reserved.

LIST-LENGTH EFFECTS AND EXPERTISE

Many situations require people to make judgments about the probability that some event has happened or will happen. What is the probability that a terrorist group will follow through with its threat to kill a hostage? What is the probability that the fire was caused by arson attributable to the owner of the building? What is the probability that a manuscript will be accepted by Journal X? Previous research indicates that the probabilities that people attach to various events can be influenced substantially by the extent to which all the possible events are explicitly listed for consideration. Using one of the above examples, people would estimate a higher probability that the hostage will be killed if they were asked to estimate only two probabilities (the probability of being killed and the probability of all other outcomes) than if they were asked to estimate several possible outcomes (e.g., probability that the hostage is killed, probability that the hostage is released, probability that the hostage is rescued, probability that the hostage escapes, and all other outcomes). This is what is known as the list-length effect (sometimes called an “omission” or “pruning” effect; see Fischhoff, Slovic, & Lichtenstein, 1978).
When the possible events in the “all other” category are not mentioned explicitly, the probability assigned to the mentioned target event increases and the probability assigned to the all-other category decreases relative to the longer-list condition. Normatively, the estimated probability of all other outcomes in the short-list version of the hostage problem should equal the total probability for “released” plus “rescued” plus “escapes” plus all other outcomes in the long-list condition. The general idea of a list-length effect is illustrated in Table 1. When given the relatively complete list, the outcomes C, D, and “all other” add up to 59% of the outcomes. When given the relatively incomplete list, however, the “all other” category is only 40%. These two percentages should be equal because both include all outcomes other than A or B. Somehow, omitting an explicit request for the subject to estimate the percentages of outcomes belonging in C and D shifts the percentages.

TABLE 1
HYPOTHETICAL DATA ILLUSTRATIVE OF THE OMISSION EFFECT

List                        Outcome A   Outcome B   Outcome C   Outcome D   All other possible outcomes   Total
Relatively complete list       21%         20%         10%         19%                 30%                 100%
Incomplete list                32%         28%                                         40%                 100%

Previous studies showing list-length effects might not be very surprising if we assume that the people estimating the frequencies, probabilities, or percentages had little or no knowledge of the possible numerical values associated with the categories. In cases where people know little or nothing about the events in question, they might base reasonable guesses on the number of categories, giving each listed category the same probability.
As a result, when there are 5 categories (as in the relatively complete list in Table 1), the value of 20% might be placed in each; when there are 3 categories (as in the incomplete list in Table 1), the value of 33% might be placed in each. This would produce a list-length effect but, of course, the list-length effect would not be very interesting in such a case.

The surprising fact about list-length effects is that they appear to occur for novices and experts alike. Fischhoff, Slovic, and Lichtenstein (1978) first demonstrated this with experienced garage mechanics and college student subjects who were required to estimate the frequencies of causes for a car not starting. Subjects were asked to estimate how many times out of 1,000 a car will not start because of a defective fuel system, a defective ignition system, and so on. In some conditions, subjects evaluated relatively complete or full “trees,” whereas others evaluated “pruned” trees. For example, one group's list might have left off battery and fuel problems but included starting, ignition, and other engine problems (short list), whereas another group evaluated all of these (long list). In each case there was a miscellaneous or “all other” category. Comparisons were made between the “all other” value in the short-list conditions and the total of “all other” plus the categories in the long list that were left out of the short list. List-length effects were found across all six experiments using a variety of pruning methods (e.g., pruning, manipulating detail, splitting and fusing branches). Of greatest concern to us was that experienced mechanics, who presumably had considerable knowledge of car problems, showed strong list-length effects. Similar list-length effects have been found for experts in hospitality management on hospitality management problems (Dube-Rioux & Russo, 1988) and for professional auditors on auditing problems (Rennie & Johnson, 1988).
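The equal-split guessing strategy discussed above can be sketched in a few lines. The sketch assumes, purely for illustration, a respondent with no knowledge of the outcomes who spreads 100% evenly over whatever categories are listed; the helper names are hypothetical. It computes the share left for everything other than A and B under each list length, then applies the same comparison to the hypothetical Table 1 percentages.

```python
from fractions import Fraction


def equal_split(categories):
    """Hypothetical know-nothing respondent: 100% divided evenly over N categories."""
    share = Fraction(100, len(categories))
    return {category: share for category in categories}


def share_outside(estimates, targets):
    """Percentage left for everything except the target outcomes."""
    return 100 - sum(estimates[t] for t in targets)


complete_list = ["A", "B", "C", "D", "all other"]
incomplete_list = ["A", "B", "all other"]

# Equal-split guesses: 20% per category vs. 33 1/3% per category.
naive_complete = share_outside(equal_split(complete_list), ["A", "B"])
naive_incomplete = share_outside(equal_split(incomplete_list), ["A", "B"])
print(float(naive_complete), round(float(naive_incomplete), 1))  # 60.0 33.3

# The hypothetical percentages from Table 1 shift in the same direction.
table1_complete = {"A": 21, "B": 20, "C": 10, "D": 19, "all other": 30}
table1_incomplete = {"A": 32, "B": 28, "all other": 40}
print(share_outside(table1_complete, ["A", "B"]),
      share_outside(table1_incomplete, ["A", "B"]))  # 59 40
```

Even with no knowledge at all, equal splitting leaves 60% outside A and B under the complete list but only about 33% under the incomplete list, which is why a list-length effect in know-nothing respondents would be unremarkable.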
We suggest that previous studies showing list-length effects for experts might be explained, at least in part, by the idea that the expertise of these individuals does not involve frequency or probability representations of their problem areas. Consider the case of auto mechanics. Their approach to an automotive problem is likely to be dictated more by considerations of symptoms, time, cost, and effort than by probability. For example, if a car does not start, the probability that the cause is a fuel problem rather than the starter or the battery is quite irrelevant, because the symptoms alone can easily eliminate the starter or battery: if the motor turns over when the key is turned, then neither the battery nor the starter is faulty. Their expertise, therefore, is less a game of probability than a game of reading symptoms. As well, the expertise of an auto mechanic lies in knowing the sequence for checking problems. One does not remove the carburetor before checking whether the car has gas in the tank, even if a faulty carburetor is more probable than a foolish driver who is simply out of gas. Examining the carburetor before examining the gas gauge can be costly, time consuming, and effortful. Our point is that experienced auto mechanics are likely to be both unfamiliar with and unconcerned about probabilities or frequencies. Although their experience in auto repair is likely to have given them some appreciation for the frequencies of various mechanical problems, mechanics become experts by virtue of their diagnostic skills (e.g., seeing what happens when starter leads are connected), knowledge of appropriate sequences (e.g., checking the battery before checking the alternator), ability to read repair manuals, and knowledge of tools and gauges. The prior probability that a car problem rests in one of several system-failure locations might be a very poor guide to auto repair.
Thus, we see no functional reason for auto mechanics to maintain a probabilistic representation of the general problem of why a car will not start. Symptoms, meters, and diagnostic sequence tests are more likely than probabilities to be included in the mental representations of auto mechanics.

A similar argument can be applied to hospitality managers and auditing managers. Neither of these professions publishes frequencies of the underlying events, perhaps because such knowledge is not what makes practitioners successful or unsuccessful in these professions. In the auditing profession, for example, an auditor might have to determine the cause of a disputed amount reported on accounts receivable confirmation letters. Causes could include (a) the balance was recorded on another customer’s account, (b) the balance was paid but the payment was in transit, (c) the goods were on consignment, and so on. As with the auto mechanic, however, the issue is not one of probability but a mixture of judgments involving such factors as symptoms and sequence. Some possible causes are easier to check than others (and, hence, tend to be checked first) and some are associated with certain symptoms (e.g., a customer tends to claim prior payment if the balance was recorded on another customer’s account but not if the goods were on consignment). It is this kind of sequence and symptom knowledge, rather than knowledge of prior probabilities, that makes an auditor an expert in his or her area.

In this experiment we chose to use the domain of baseball for two reasons. First, baseball is naturally associated with probabilities. Batting average, on-base percentage, fielding percentage, and so on are probability expressions that are used routinely in the sport. Not all of the numerical expressions in baseball are represented as probabilities.
Some, such as earned run average and slugging percentage, represent ratios that are unique to the sport and are not probabilities per se. Nevertheless, the statistical nature of the game stands in contrast to auto mechanics, hospitality management, and auditing. Expertise in baseball should almost certainly include knowledge that takes the form of probabilities or percentages. Indeed, probabilistic knowledge in baseball is critical to decisions about when to replace a pitcher, when to steal a base, whether to platoon a player, when to bunt, and almost every other decision. Managers and players are commonly criticized for not making the “percentage play,” and these criticisms are often predicated on rather sophisticated conditional probabilities (e.g., batting average with runners in scoring position versus with the bases empty). Thus, although we question the extent to which auto mechanics, auditors, and hospitality managers represent strong tests of experts’ resistance to list-length effects, we accept a priori the proposition that baseball experts ought to be resistant to list-length effects in their assessments of frequency or probability for a baseball task. We do not wish to leave the impression that some problem domains, such as auto mechanics, are entirely devoid of probabilistic reasoning whereas baseball is entirely a matter of probabilistic reasoning. Our argument instead concerns the relative amount and sophistication of probabilistic reasoning. If baseball experts show list-length effects, arguments for the pervasiveness of the list-length effect would be on much stronger ground.

The second reason for using the domain of baseball is that it allows comparisons of subjects’ estimated probabilities or frequencies with the actual probabilities or frequencies. For example, we know that the average major league baseball player strikes out 15 times for every 100 times at bat (James, 1987).
This is quite unlike previous studies that used an automotive problem, an auditing problem, and a hospitality management problem, for which the actual values were unknown (or at least unreported). Thus, we cannot determine from previous studies whether subjects’ assessments were more accurate with the short list or the long list.

Whether the long list or the short list yields more accurate probability assessments depends critically on why the list-length effect occurs. The dominant explanation in the published literature centers on the notion of memory availability (Fischhoff, Slovic, & Lichtenstein, 1978; Hirt & Castellan, 1988). The idea is that a short list does not foster a complete representation of all the possible categories (e.g., all possible causes for a car not starting or all possible outcomes of a batter’s trip to the plate). According to this interpretation, a long list should produce more accurate assessments than a relatively short list, because the problem with the short list is that not all possible items are mentally available.

An alternative interpretation of the omission effect centers on the anchoring-and-adjustment heuristic (Tversky & Kahneman, 1974). According to this interpretation, each category or item listed is initially assigned an equal probability, and adjustments are then made upward or downward according to the subjects’ knowledge of the item. In the longer list in Table 1, for example, each of the 5 categories (A through D plus “all other”) might receive an initial value of 20%, whereas each of the 3 categories in the shorter list might receive an initial value of 33%. These anchors might then be adjusted when the category is more closely considered, but adjustments typically are insufficient. Thus, suppose the actual value for outcome A is 25%.
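A minimal numerical sketch of this account, continuing the outcome A example. It assumes, for illustration only, that an estimate moves a fixed fraction k of the way from the equal-split anchor 100/N toward the true value; k = 0.5 is an arbitrary choice, not a value reported in this or any cited study, and the function name is hypothetical. Any 0 < k < 1 represents insufficient adjustment.

```python
def anchored_estimate(true_pct, n_categories, k=0.5):
    """Anchoring-and-adjustment sketch: start at the equal-split anchor
    100/N and move only a fraction k of the way toward the true value.
    0 < k < 1 models insufficient adjustment; k = 0.5 is arbitrary."""
    anchor = 100 / n_categories
    return anchor + k * (true_pct - anchor)


true_a = 25.0  # the example's actual value for outcome A
long_est = anchored_estimate(true_a, n_categories=5)   # anchor 20%
short_est = anchored_estimate(true_a, n_categories=3)  # anchor 33.3%
print(f"long list: {long_est:.1f}%, short list: {short_est:.1f}%")
# long list: 22.5%, short list: 29.2%
```

For any k between 0 and 1, the long-list estimate stays below 25% and the short-list estimate stays above it, so the two list lengths disagree about the same outcome.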
Because outcome A is anchored at 20% for the long list and 33% for the short list, insufficient adjustment from the anchor yields correspondingly different estimates for outcome A. According to this interpretation, the relative accuracy of a short list versus a long list depends on how close the actual (true or normatively correct) values are to the initial anchors. If a short list includes a low-probability event or a long list includes a high-probability event, then actual and estimated values will be highly discrepant for those events. Conversely, if a short list includes only high-probability events and a long list includes only low-probability events, then the actual and estimated values will be quite close.

Importantly, both the availability interpretation and the anchoring-and-adjustment interpretation imply that expertise should mitigate the omission effect. Expertise should help ensure that a non-mentioned item or category is recalled, and it should help subjects make more sufficient adjustments away from the anchor. In baseball, for example, not explicitly listing a base on balls (walk) as a possible outcome of a time at bat should not have much of an effect on a baseball expert’s ability to recall that such an outcome is possible on any given trip to the plate. As well, even if only three possible outcomes are listed (e.g., hit, walk, and “all other”) such that an initial anchor of 33% is used, the baseball expert’s confident knowledge that walks are relatively rare in major league baseball should allow the expert to overcome most of the effect of the anchor.

METHOD

Subjects

The expert subjects (n = 35) were involved in amateur baseball as umpires, coaches, or managers. The experts had an average of 11.4 years of experience in organized baseball and a mean age of 23.9 years. The novices (n = 56) were undergraduate psychology students who participated in the experiment for course credit.
Although a larger number of students had participated in the study, we excluded the responses of all students who had never been involved in baseball as a player, coach, or umpire. This was done in order to give meaning to any finding of relative expertise on the part of the experts. (That is, comparing experts’ judgments with those of people who knew absolutely nothing about baseball would not be a valid demonstration of expertise.) The student responses were then randomly chosen from the remaining group in order to equalize the eventual cell sizes. The students had an average of 3.8 years of involvement in baseball and an average age of 19.6 years.

In order to provide an independent measure of the relative baseball expertise of the two subject groups, subjects were asked to respond to 10 four-alternative multiple-choice questions on major league baseball trivia and 10 true-false questions concerning the rules of baseball. The experts correctly answered an average of 9.0 and 9.4, respectively, of these questions, while the corresponding means for the students were 4.1 and 5.3. The two groups differed significantly on both tests (t(90) = 16.3 and 17.0, p < .001, for the trivia and rules tests, respectively). Variability within groups was rather small (pooled standard deviations of 1.39 and 1.10 for the two tests, respectively), and there was almost no overlap in scores for the two groups (on the trivia test, ranges were 2-6 for the students and 5-10 for the experts; on the rules test, ranges were 3-8 and 7-10, respectively).

Experimental Design

The design of the experiment was a 2 × 2 × 4 factorial of expertise (experts or students), list length (3 or 5 possible outcomes), and the pair of outcomes (see below) that was presented in the short-list conditions and used for comparison in the long-list conditions.
Specific pairs of outcomes were selected in order to assess whether the omission effect might be related to subjects’ abilities to categorize the outcomes. This experiment employed the natural categories “hits and walks” (closely corresponding to “on-base percentage”), “putouts and strikeouts” (the complement of “on-base percentage”), and “walks and strikeouts” (the complement of “contact percentage”). For purposes of comparison, “hits and strikeouts” were paired because they do not represent a natural category. Subjects were asked to estimate the frequency of either two or four outcomes plus the frequency associated with “all other outcomes.” Note that even the long list is not exhaustive, because a batter could also get on base by being hit by a pitch or because of a fielding error. Although both fielding errors and being hit by a pitch are rare in major league baseball, their omission meant that there was some probability associated with the all-other category even in the long-list condition.

The dependent measure, “adjusted all other outcomes,” is derived from that used by Fischhoff et al. (1978). In the partial-list conditions, it is simply the frequency that subjects assigned to “all other outcomes.” Conceptually, the dependent measure for the corresponding pair of outcomes in the long-list condition is the sum of the frequencies that subjects assigned to all outcomes other than that pair; that is, to “all other outcomes” and to the other two specified outcomes. Operationally, the dependent measure in all conditions was defined as 100 minus the estimated frequencies of the outcomes of interest in the experimental condition. Consider, for example, the short-list “hits and walks” condition. A subject who assessed hits to be 30, walks to be 10, and “other” to be 60 would produce a dependent measure of 60.
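The operational definition amounts to one subtraction. A minimal sketch, assuming each response is stored as a mapping from outcome name to estimated frequency per 100 plate appearances (the function and variable names are hypothetical): only the two estimates for the pair of interest are subtracted, so the same function serves short-list responses, where the remainder is just the “other” estimate, and long-list responses, where the remainder also absorbs the other two listed outcomes.

```python
def adjusted_all_other(estimates, pair):
    """Dependent measure: 100 minus the estimated frequencies (per 100 plate
    appearances) of the two outcomes of interest, for either list length."""
    return 100 - sum(estimates[outcome] for outcome in pair)


pair = ("hits", "walks")
short_list_response = {"hits": 30, "walks": 10, "other": 60}
print(adjusted_all_other(short_list_response, pair))  # 60
```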
If a subject in the corresponding long-list condition assessed hits to be 23, walks to be 7, putouts to be 50, strikeouts to be 15, and “other” to be 5, the dependent measure would be 70 (i.e., 100 - 23 - 7). In this case, the omission effect would be evident to the extent that 70 is greater than 60.

The student subjects were randomly assigned to each of the eight Omission × Category conditions with equal cell sizes. Note that subjects in the long-list conditions provided assessments for all four categories plus the “all other” category. Thus, the seven expert subjects in the long-list condition supplied all 28 responses for the expert long-list conditions, and cell sizes for the experts and students were the same (28 experts in the short-list conditions, plus 7 experts who produced 28 responses in the four long-list conditions, equals 56, the same number as the novices).

Procedures and Materials

The experimental task (for the long-list condition) was as follows:

Betting on the Batter

You are watching a major league baseball game with a friend. As each batter comes to the plate, the two of you guess what the outcome will be for that batter. The person who guesses right gets 10 points, and the person with the fewest total points at the end of the game buys the beer (or suitable substitute). A batter comes to the plate that you don’t know, so you decide to base your decision on the overall batting statistics for all players in the major leagues. Out of a randomly selected 100 plate appearances, indicate the number of times that each of the following would be expected to happen. Remember, the total of the five estimates should be 100.

1. How many times will the batter hit the ball but be put out?
2. How many times will the batter hit safely?
3. How many times will the batter draw a walk?
4. How many times will the batter strike out?
5. How many times will there be some other outcome?
Total ____

Following this task,¹ the subjects answered the 20 baseball trivia and baseball rules questions. The experiment was administered to the student subjects in groups in a classroom setting. Participation in the experiment was voluntary, and the students earned course credit for participation. For the expert group, instruments were distributed together with envelopes at various organizational meetings for amateur baseball. The experts were instructed to complete the task “without consulting other people and without consulting any papers or books.” The completed instruments were returned in sealed envelopes, and the experts were advised that their responses would be anonymous so that they need not be concerned about giving inaccurate answers. Most of the experts filled out the instruments individually before leaving the meetings at which they were distributed. There was no evidence that later returns differed from those that were immediately completed.

RESULTS

Expertise and Omission Effects

The means of “adjusted all other outcomes” by list length and expertise are shown in Fig. 1. The results support our prediction of a significant interaction between expertise and list length (F(1,96) = 30.9, p < .001). For the student group, the list-length effect was significant (F(1,48) = 28.4, p < .001, for the simple effects test): in the short lists, the students assessed “other outcomes” to be significantly lower than the assessments for the corresponding outcomes in the long list. For the expert group, however, there was no significant effect for list length (F(1,48) = 2.7, p > .10, for the simple effects test). There was also an overall expertise effect (F(1,96) = 53.9, p < .001), in that experts generally assessed outcomes differently than did students.

Category Effects

Figure 2 presents the expertise by list-length effect for the four different short-list categories.
Note that there is no significant list-length effect for the experts in any of the categories. For the students, there is a significant main effect for list length (F(1,48) = 28.4, p < .001). Further analysis indicates that this effect holds for the second (p < .001), third (p < .01), and fourth (p < .05) categories, but not for the first category (p > .40). This effect of different categories is discussed below with reference to the normative frequencies.

¹ The normatively correct values across these outcomes are 53, 22, 9, 15, and 1 when rounded to the nearest whole number using the 1987 major league statistics reported in James (1987).

Comparison with Norms

Figure 3 compares the individual assessments with normative data for each of the outcomes, by expertise and list length. The norms for baseball players across outcomes were calculated from data reported in James (1987). The experts’ responses were very close to the norms in all categories for both the long-list and short-list conditions. Using a binomial probability test (Conover, 1980) comparing the experts’ estimates with the normatively correct values, no significant deviations were found for strikeouts, walks, and putouts. Experts’ estimates of hits, however, were significantly higher than the normatively correct value (p = .05). For the students, the hypothesis that each of the assessed means equals the normative value was rejected (p < .05) across the outcome categories and both list lengths, although the students’ estimates appear to be closer to the normative values in the long-list condition than in the short-list conditions. Note that the students’ assessments were less extreme (i.e., closer to an average value) than the normative frequencies across all outcomes. Furthermore, these differences are more pronounced in the short-list conditions than in the long-list condition.

FIG. 1. Effect of list length on adjusted all-other-outcomes measure as a function of expertise. (Experts vs. students; long list, 4 outcomes plus “all other,” vs. short list, 2 outcomes plus “all other”; Expertise × Omission: F(1,96) = 30.9, p < .001.)

Additional Follow-Up Tests

The main results in this study were analyzed using analysis of variance. Although the homogeneity of variance assumption was violated for the expert versus student groups, ANOVA is robust to such violations with equal cell sizes. An examination of the model’s residuals indicated that the normality assumption could not be maintained (Lilliefors test); thus, the ANOVA was also run using ranks (Conover, 1980) to assess the robustness of the results across different distributional assumptions. The results were similar to the parametric ANOVA, except that the Category × List Length interaction was significant (p < .05) only for the ranks test. The three-way interaction of expertise, list length, and category also approached significance (p < .07). Simple effects follow-up tests for each level of expertise reveal that these slightly different results are due to the smaller list-length effect for the students in the strikeouts-hits category. An analysis of the studentized residuals indicated two extreme observations for the student subjects. When these observations were deleted, however, the significance of the effects was unchanged.

FIG. 2. Effects of list length and expertise on all-other-outcomes measure as a function of individual categories. (Conditions S/H, S/W, S/P, and W/H: strikeouts and hits, strikeouts and walks, strikeouts and putouts, and hits and walks; long vs. short list.)

FIG. 3. Comparisons of normatively correct values for outcomes to estimated values as functions of list length and expertise. (Panels: experts and students; outcomes H, S, W, P; bars for long list, norms, and short list.)

DISCUSSION

This is the first study to show that expertise can eliminate the list-length effect. We believe that previous studies have used domains in which expertise does not take the form of probability or relative frequency representations. Baseball, in contrast to auto mechanics, financial auditing, and hospitality management, has a natural (or at least historical) relation to statistics, especially relative frequency and probability. Indeed, we suspect that the pervasive “batting average” statistic is the first introduction to statistics that many American children encounter. Accordingly, we suspect that a large percentage of American college students would show little or no omission effect on this baseball task; their data would more closely resemble those of our experts than those of our Canadian students.

The student novices provided data that help us to address the question of whether accuracy is greater for the long or the short list. The general assumption in previous studies has been that longer lists (full trees) provide more accurate probability assessments (e.g., Fischhoff, 1977; Slovic, Fischhoff, & Lichtenstein, 1982). The reasoning behind this assumption is that failing to explicitly identify all possible events for subjects can result in some event(s) being overlooked, and the events that are listed are concomitantly inflated beyond their appropriate levels. We do not disagree with this argument in its general form, but we believe that our data help clarify some conditions under which it will and will not hold true. First, notice in Fig. 3 that three of the four outcomes show better accuracy for the students’ estimates with the long list, whereas the other outcome (putouts) shows better accuracy with the short list.
This is precisely the pattern we would expect if students were using an anchoring-and-adjustment heuristic. In the long-list conditions, subjects were provided with five event categories (hits, walks, strikeouts, putouts, and all other). Let us assume that subjects initially placed equal frequencies in each category (i.e., 100% ÷ 5, or 20%). This is their anchor, from which they will adjust. In the short-list condition, only three categories exist (e.g., hits, walks, and all other), resulting in an anchor of 33%. Given that adjustments from an anchor tend to be insufficient, we would expect the short list to provide more accurate estimates than the long list when the actual or true values are closer to 33% than to 20%, and the long list to provide more accurate estimates when the true value is closer to 20% than to 33%. The actual or true values for hits, walks, and strikeouts are closer to 20% than to 33%, whereas the actual value for putouts is closer to 33% than to 20%.

If, as we suggest, anchoring and adjustment plays a role in the list-length effect,² then there are certain conditions in which long lists will produce more accurate estimates, certain conditions in which shorter lists will produce more accurate estimates, and other conditions in which the results will be mixed. In general, when the actual probabilities of the events are near the value of 1/N (where N is the number of events listed), estimated probabilities will be close to the actual probabilities. If the actual probabilities are lower (higher) than 1/N, then estimates will fall above (below) the actual values. When there is considerable variation in the actual probabilities across events (as in the current study, where putouts are over 5 times more probable than walks), longer lists will tend to favor accuracy on the lower-probability events and shorter lists will tend to favor accuracy on the higher-probability events.
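The anchor-distance rule can be checked against the normative values in footnote 1. The sketch assumes, for illustration only, that estimates follow anchor + k × (true − anchor) with the same adjustment fraction k (0 < k < 1) for both list lengths; under that assumption the absolute error is (1 − k) × |true − anchor|, so whichever anchor lies closer to the true value yields the more accurate estimate. The function name is hypothetical, and the sketch is not a model fitted to the students' data.

```python
# Normative frequencies per 100 plate appearances (footnote 1; James, 1987).
NORMS = {"putouts": 53, "hits": 22, "walks": 9, "strikeouts": 15}

LONG_ANCHOR = 100 / 5   # five listed categories -> 20%
SHORT_ANCHOR = 100 / 3  # three listed categories -> 33.3%


def favored_list(true_pct):
    """Assumes estimate = anchor + k * (true - anchor) with one fixed 0 < k < 1,
    so the error is (1 - k) * |true - anchor| and the nearer anchor wins."""
    long_gap = abs(true_pct - LONG_ANCHOR)
    short_gap = abs(true_pct - SHORT_ANCHOR)
    return "long" if long_gap < short_gap else "short"


for outcome, pct in NORMS.items():
    print(f"{outcome:<10} {pct:>3}  more accurate with: {favored_list(pct)} list")
print(f"putouts / walks = {NORMS['putouts'] / NORMS['walks']:.1f}")
```

Under this assumption the sketch favors the short list only for putouts and the long list for hits, walks, and strikeouts, matching the pattern described for the students in Fig. 3.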
Thus, we argue that longer lists produce smaller anchors and, in situations where smaller anchors are desirable, longer lists are superior to shorter lists. But this will not always be the case. Indeed, at this time we would argue that a moderate-sized or small list should not include explicit mention of an extremely low-probability event merely for the purpose of making the list complete, lest the subject anchor at too high a level on that event.

² Additional evidence that an anchoring-and-adjustment process is involved in the list-length effect can be found in Rennie (1989). In a sequential task, subjects were led to believe that they were going to assess probabilities in either 3 or 5 categories. Controlling for the actual event that they estimated first, their estimates for this first event were lower if they were expecting 5 categories rather than 3.

Returning to our experts, the question arises as to how they managed to avoid the list-length effect. There are two kinds of knowledge, each individually sufficient to make a person impervious to the list-length effect. First, if a person has a strong mental representation of the relevant event possibilities, then it should not matter how many of these events are collapsed into the all-other category and how many are explicitly listed. A person with a clear, well-articulated mental representation of the event possibilities can and will generate an appropriate listing without the aid of an externally provided list. In this case, the person need not have a clear idea of the probabilities involved; the only assumption is that the person is dealing (mentally) with a full list regardless of the list length provided by the experimenter.
A second way to be unaffected by the list-length manipulation is to have a strong mental representation of the probabilities associated with the listed events, even if one cannot (or does not) think of the possible events contained in the all-other category. For example, if a person knows that the probability of a hit is .22 and the probability of a walk is .09, then that person need not generate the specific possibilities in the all-other category to report a value of .69 for that category. In other words, the person does not have to think about strikeouts, fly-outs, errors, being hit by a pitch, and so on, because the probabilities associated with the listed events are known and the all-other category can be derived through simple subtraction. Although either of these interpretations can explain why our baseball experts were impervious to omission effects, the knowledge-of-relevant-event-possibilities interpretation does not, in and of itself, explain how our experts managed to be so accurate in their estimates. That interpretation simply means that the short-list conditions were functionally equivalent to the long-list conditions, because the experts were able to think of all the relevant events that were collapsed into the all-other category in the short-list cases. Taken alone, this does not account for the extremely close correspondence between the estimated and the normatively correct probabilities. Thus, we must invoke the second interpretation: that our experts had strong mental representations of the event probabilities themselves rather than just knowledge of the family of possible events. Although experts’ estimates were extremely close to the normatively correct values in all categories, a statistically significant deviation was observed in the category of hits.
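The subtraction route described above needs only the probabilities of the listed events, not an enumeration of what the all-other category contains. A minimal sketch, assuming the listed probabilities are known exactly; the function name is hypothetical and the inputs are the illustrative .22 and .09 values from the example, not data from the study.

```python
# Sketch of deriving the all-other category by simple subtraction.
# Assumption: the probabilities of the listed events are known; the
# contents of the all-other category are never enumerated.


def all_other_probability(listed: dict[str, float]) -> float:
    """Return 1 minus the sum of the listed event probabilities."""
    total = sum(listed.values())
    if total > 1.0 + 1e-9:
        raise ValueError("listed probabilities exceed 1")
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - total)


# Illustrative values from the text's example (hit = .22, walk = .09).
listed_events = {"hit": 0.22, "walk": 0.09}
print(round(all_other_probability(listed_events), 2))  # 0.69
```

The same call works however many events the experimenter chooses to list, which is why a person using this route would be unaffected by list length as long as the listed probabilities themselves are known accurately.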
The magnitude of the deviation for hits, although small, leads us to speculate that subjects were using batting averages to estimate this value. If we ignore sacrifices and cases where the batter is hit by a pitch, both of which are rare, then batting average is closely approximated by the number of hits divided by the difference between trips to the plate and walks. This yields a value for batting average that is approximately five percentage points higher than the number of hits per 100 trips to the plate, which is very near the value estimated by the experts. Whether or not they were using batting average rather than hits per trip to the plate, however, the experts’ estimates were unaffected by list length.³

One of the differences between our trips-to-bat problem and previously used problems is that the sample space for the trips-to-bat problem is considerably smaller than the sample spaces of some earlier problems, such as Fischhoff et al.’s (1978) auto-repair problem. In addition, the baseball problem presented to our subjects was not hierarchical, whereas Fischhoff et al.’s auto-repair problem was. Although we acknowledge that future research should address such differences, differences in the structure and size of the sample spaces cannot account for why novices and experts differed so dramatically, given that both groups were dealing with the same trips-to-bat problem. We believe that it would be more fruitful to focus on how list-length effects interact with the type of expertise held by subjects. We suggest that baseball experts are not the only ones who will show resistance to list-length effects on probability judgments. Experts on horse races and professional weather forecasters, for example, seem especially unlikely to show list-length effects, because they appear both to use and to understand probability (see Hoerl & Fallin, 1974; Murphy & Winkler, 1977).
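The batting-average approximation discussed above can be written out directly. This is a minimal sketch that follows the text in ignoring sacrifices and hit-by-pitch, so at-bats are approximated as trips to the plate minus walks; the function names are hypothetical and no inputs from the study are used.

```python
# Sketch of the batting-average approximation described in the text.
# Assumption (as in the text): sacrifices and hit-by-pitch are ignored,
# so at-bats ~= trips to the plate - walks.


def approx_batting_average(hits: float, walks: float,
                           trips: float = 100.0) -> float:
    """Approximate batting average as hits / (trips - walks)."""
    at_bats = trips - walks
    if at_bats <= 0:
        raise ValueError("walks must be fewer than trips to the plate")
    return hits / at_bats


def hits_per_trip(hits: float, trips: float = 100.0) -> float:
    """Hits as a proportion of all trips to the plate."""
    if trips <= 0:
        raise ValueError("trips must be positive")
    return hits / trips
```

Because walks are removed from the denominator, the approximate batting average exceeds hits per trip whenever walks are greater than zero. With per-trip hit rate h and walk rate w, the difference is h/(1 − w) − h = h·w/(1 − w), so the gap widens as either rate rises; how large it is in practice depends on the actual major league rates.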
Moreover, although research on the list-length effect has been conducted almost exclusively with probability judgment tasks, these need not be the only tasks in which such effects can be examined. Consider once again the auto mechanic. We believe that an auto mechanic’s expertise is related, among other things, to knowledge of tools and of the “book times” associated with various repairs. Thus, experienced mechanics and novices could be presented with a repair problem (e.g., replace starter and ignition) and asked how many minutes they would need to use Tool A, Tool B, and “all other tools” versus Tool A, Tool B, Tool C, Tool D, and all other tools. We doubt that auto repair experts would show list-length effects in this task, whereas novices almost certainly would.

³ It is also possible that the “hit safely” category was interpreted to include getting on base by error. This possibility exists in part because getting on base by error was not explicitly listed but was instead meant to be part of the all-other category. This is related to Hirt and Castellan’s (1988) notion of category redefinition as an explanation for the list-length effect. However, this should be equally true for both the long- and short-list conditions in this study. And, although it might account for some of the overestimation of hits relative to the normatively correct values, the actual frequency of getting on base by error in major league baseball is too small to account for the approximately 6.5% difference between the experts’ estimates and the actual values.

Summary and Conclusions

List-length effects were found for baseball novices but not for baseball experts in a task involving estimated frequencies of outcomes for batters’ trips to the plate.
We believe that probabilities and relative frequencies are not highly relevant to the kinds of mental representations that made experienced auto mechanics, auditors, and hospitality managers experts in their respective professions; this, we believe, accounts for why experts in these professions fell prey to list-length effects in previous studies. As well, our data suggest that list-length effects derive at least in part from an anchoring-and-adjustment process in which the anchor value is an inverse function of the number of listed event possibilities. Expertise can overcome the list-length effect if the expert has a well-defined internal representation of the nonlisted event possibilities and confident knowledge of the relevant probabilities. Without such expertise, the accuracy of subjects’ estimates will be determined in part by how close the actual probabilities are to the anchor value.

REFERENCES

Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley.
Dube-Rioux, L., & Russo, J. E. (1988). An availability bias in professional judgment. Journal of Behavioral Decision Making, 1, 223-231.
Fischhoff, B. (1977). Cost-benefit analysis and the art of motorcycle maintenance. Policy Sciences, 8, 177-202.
Fischhoff, B., Slovic, P., & Lichtenstein, S. (1978). Fault trees: Sensitivity of estimated failure probabilities to problem representation. Journal of Experimental Psychology: Human Perception and Performance, 4, 330-344.
Hirt, E. R., & Castellan, N. J., Jr. (1988). Probability and category redefinition in the fault tree paradigm. Journal of Experimental Psychology: Human Perception and Performance, 14, 122-131.
Hoerl, A. E., & Fallin, H. K. (1974). Reliability of subjective evaluations in a high incentive situation. Journal of the Royal Statistical Society, 137, 227-230.
James, B. (1987). The Bill James baseball abstract 1987. New York: Ballantine.
Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62, 399-402.
Murphy, A. H., & Winkler, R. L. (1977). Can weather forecasters formulate reliable probability forecasts of precipitation and temperature? National Weather Digest, 2, 2-9.
Rennie, R. D. (1989). Determination of probable cause by auditors: A study of the omission effect in fault trees. Unpublished doctoral dissertation, University of Alberta.
Rennie, R. D., & Johnson, R. D. (1988, October). Auditors’ judgments of probable causes: Effects of availability, experience, focusing and omission. Presentation at ORSA/TIMS, Denver.
Slovic, P., Fischhoff, B., & Lichtenstein, S. (1982). Facts versus fears: Understanding perceived risk. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 463-492). New York: Cambridge University Press.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.

RECEIVED: November 3, 1989