ORGANIZATIONAL BEHAVIOR AND HUMAN DECISION PROCESSES 50, 324-340 (1991)

Outcome Trees and Baseball: A Study of Expertise and List-Length Effects

RICHARD D. JOHNSON
University of Alberta

RICHARD D. RENNIE
University of Regina

AND

GARY L. WELLS
Iowa State University
When people estimate the probability of an event using a list that includes all
or most of the possible events, their estimate of that probability is lower than
if the other possible events are not explicitly identified on the list (i.e., are
collapsed into an all-other-possibilities category). This list-length (or pruning)
effect has been demonstrated to occur even for people who have expertise or
considerable knowledge in the event domain. We reasoned that the experts
used in previous studies would be unlikely to have probabilistic representations of their problem domains (e.g., auto mechanics, auditors, hospitality
managers). We used baseball experts (n = 35) and novices (n = 56) on the
assumption that expertise in baseball almost certainly involves mental representations of probability for various baseball events. Subjects estimated the
frequency of hits, walks, strikeouts, putouts, and “all other” outcomes for an
average major league player in 100 times at bat. Other subjects estimated these
event outcome frequencies in a short-list condition (e.g., strikeouts, walks,
and “all other”). Strong list-length effects were observed with novices; the
frequency estimate for strikeouts, for example, was nearly twice as high in the
short-list condition as in the long-list condition. Experts, however, showed no
list-length effect and their estimated probabilities were very near the actual
(normatively correct) probabilities in all conditions. We argue that the omission effect can be overridden by strong mental representations of the family of
possible events and/or a clear knowledge of the probabilities associated with
the events. As well, we argue that list-length effects seem to result at least in
part from an anchoring-and-adjustment strategy. © 1991 Academic Press, Inc.
Order of authorship was determined alphabetically and does not necessarily reflect the
relative contributions of the authors. Requests for reprints can be sent to any of the three
authors: Gary L. Wells, Department of Psychology, Iowa State University, Ames, IA 50011;
Richard D. Rennie, Faculty of Management, University of Regina, Regina, Saskatchewan,
Canada; or Richard D. Johnson, Faculty of Business, University of Alberta, Edmonton,
Alberta, Canada.
0749-5978/91 $3.00
Copyright © 1991 by Academic Press, Inc.
All rights of reproduction in any form reserved.
Many situations require people to make judgments about the probability that some event has happened or will happen. What is the probability
that a terrorist group will follow through with its threat to kill a hostage?
What is the probability that the fire was caused by arson attributable to
the owner of the building? What is the probability that a manuscript will
be accepted by Journal X?
Previous research indicates that the probabilities that people attach to
various events can be influenced substantially by the extent to which all
the possible events are explicitly listed for consideration. Using one of the
above examples, people would estimate a higher probability that the hostage will be killed if they were asked to estimate only two probabilities,
the probability of being killed and all other outcomes, than if they were
asked to estimate several possible outcomes (e.g., probability that hostage is killed, probability that hostage is released, probability that hostage
is rescued, probability that hostage escapes, and all other outcomes). This
is what is known as the list-length effect (sometimes called an “omission”
or “pruning” effect; see Fischhoff, Slovic, & Lichtenstein, 1978). By
omitting the explicit mention of possible events in the “all other” category, the probability associated with the target event that is mentioned is
increased and the probability associated with the all-other category is
diminished relative to the longer-list condition. Normatively, estimated
probabilities for all other outcomes in the short-list condition for the
hostage problem should equal the total probability for “released” plus
“rescued” plus “escapes” plus all other outcomes in the long-list condition.
The general idea of a list-length effect is illustrated in Table 1. When
given the relatively complete list, the outcomes C, D, and “all other” add
up to total 59% of the outcomes. When given the relatively incomplete
list, however, the “all other” category is only 40%. These two percentages should be equal because they include all outcomes other than A or B
in both the complete and incomplete list. Somehow, the omission of an
explicit request of the subject to estimate the percentage of outcomes
belonging in C and D results in a shift of the percentages.
TABLE 1
HYPOTHETICAL DATA ILLUSTRATIVE OF THE OMISSION EFFECT
Relatively complete list
  Outcome A                      21%
  Outcome B                      20%
  Outcome C                      10%
  Outcome D                      19%
  All other possible outcomes    30%
  Total                         100%

Incomplete list
  Outcome A                      32%
  Outcome B                      28%
  All other possible outcomes    40%
  Total                         100%
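To make the arithmetic concrete, the following is a minimal Python sketch using only the made-up percentages from Table 1 (illustrative values, not subject data). It computes the value that the pruned list's "all other" category should take normatively and contrasts it with the hypothetical response:

    # Hypothetical percentages from Table 1 (illustrative only, not subject data).
    complete = {"A": 21, "B": 20, "C": 10, "D": 19, "all other": 30}
    incomplete = {"A": 32, "B": 28, "all other": 40}

    # Normatively, the pruned "all other" category should absorb C, D, and
    # the complete list's own "all other" category.
    implied_all_other = sum(v for k, v in complete.items() if k not in ("A", "B"))

    print(implied_all_other)                             # 59
    print(incomplete["all other"])                       # 40
    print(implied_all_other - incomplete["all other"])   # 19-point omission effect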
Previous studies showing list-length effects might not be very surprising
if we assume that the people estimating the frequencies, probabilities, or
percentages had little or no knowledge of possible numerical values associated with the categories. In cases where people know little or nothing
about the events in question, they might use the number of categories for
reasonable guesses, giving each listed category the same probability. As
a result, when there are 5 categories (as in the relatively complete list in
Table 1) the value of 20% might be placed in each; when there are 3
categories (as in the incomplete list in Table 1) the value of 33% might be
placed in each. This will produce a list-length effect but, of course, the
list-length effect would not be very interesting in such a case.
The surprising fact about list-length effects is that they appear to occur
for novices and experts alike. Fischhoff, Slovic, and Lichtenstein (1978)
first demonstrated this with experienced garage mechanics and college
student subjects who were required to estimate the frequencies of causes
for a car not starting. Subjects were asked to estimate how many times
out of 1,000 a car will not start because of a defective fuel system, a
defective ignition system, and so on. In some conditions, subjects evaluated relatively complete or full “trees” whereas others evaluated
“pruned” trees. For example, one list might have left off battery and fuel
problems (short list) but included starting, ignition, and other engine problems whereas another group evaluated all of these (long list). In each case
there was a miscellaneous or “all other” category. Comparisons were
made between the “all other” value in the short-list conditions and the
total of “all other” plus the relevant categories in the long list that were
left out of the short list. List-length effects were found across all six
experiments using a variety of pruning methods (e.g., pruning, manipulating detail, splitting and fusing branches). Of greatest concern to us was
that experienced mechanics, who presumably had considerable knowledge of car problems, showed strong list-length effects. Similar list-length
effects have been found for experts in hospitality management on hospitality management problems (Dube-Rioux and Russo, 1988) and for professional auditors on auditing problems (Rennie & Johnson, 1988).
We suggest that previous studies showing list-length effects for experts
might be explained, at least in part, by the idea that the expertise of these
individuals does not involve frequency or probability representations of
their problem areas. Consider the case of auto mechanics. Their approach
to an automotive problem is likely to be dictated more by considerations
of symptoms, time, cost, and effort than by probability. For example, if
a car does not start, the probability that it is a fuel problem versus the
starter or the battery is quite irrelevant because the symptoms alone could
easily eliminate the starter or battery; if the motor turns over when the
key is turned, then neither the battery nor the starter is faulty. Their
expertise, therefore, is less a game of probability than it is a game of
reading symptoms. As well, the expertise of an auto mechanic is in knowing the sequence for checking problems. One does not remove the carburetor prior to checking to see if the car has gas in the tank even if a
faulty carburetor is more probable than a foolish driver who simply is out
of gas. Examining the carburetor before examining the gas gauge can be
costly, time consuming, and effortful.
Our point is that experienced auto mechanics are likely to be both
unfamiliar with and unconcerned about probabilities or frequencies. Although their experience in auto repair is likely to have given them some
appreciation for the frequencies of various mechanical problems, mechanics become experts by virtue of their diagnostic skills (e.g., see what
happens when starter leads are connected), knowledge of appropriate
sequences (e.g., check battery prior to checking alternator), ability to
read repair manuals, and knowledge of tools and gauges. The prior probability that a car problem rests in one of several system-failure locations
might be a very poor guide to auto repair. Thus, we see no functional
reason for auto mechanics to maintain a probabilistic representation for
the general problem of why a car will not start. Symptoms, meters, and
diagnostic sequence tests are more likely than probabilities to be included
in the mental representation of auto mechanics.
A similar argument can be applied to hospitality managers and auditing managers. Neither of these professions publishes frequencies of the underlying events, perhaps because such statistics are not what makes a practitioner successful or unsuccessful in these professions. In the auditing profession, for example, an auditor
might have to decide the cause of a disputed amount disclosed on accounts receivable confirmation letters. Causes could include (a) the balance was recorded on another customer’s account, (b) balance was paid
but was in transit, (c) goods were on consignment, and so on. As with the
auto mechanic, however, the issue is not one of probability but instead a
mixture of judgments involving such factors as symptoms and sequence.
Some possible causes are easier to check than others (and, hence, tend to
be performed first) and some are associated with certain symptoms (e.g.,
customer tends to claim prior payment if balance was recorded on another
customer’s account but not if goods were on consignment). It is this kind
of sequence and symptoms knowledge, rather than knowledge of prior
probabilities, that makes an auditor an expert in his or her area.
In this experiment we chose to use the domain of baseball for two
reasons. First, baseball is naturally associated with probabilities. Batting
average, on-base percentages, fielding percentages, and so on are probability expressions that are used routinely in the sport. Not all of the
numerical expressions in baseball are represented as probabilities. Some,
such as earned run average and slugging percentage, represent ratios that
are unique to the sport and are not probabilities per se. Nevertheless, the
statistical nature of the game stands in contrast to auto mechanics, hospitality management, and auditing. Expertise in baseball almost certainly
should include knowledge that takes the form of probabilities or percentages.
Indeed, probabilistic knowledge in baseball is critical to decisions regarding when to replace a pitcher, when to steal a base, whether or not to
platoon a player, when to bunt, and almost every other decision. Managers and players are commonly criticized for not making the “percentage
play” and these criticisms often are predicated on rather sophisticated
conditional probabilities (e.g., batting average given that players are in
scoring position versus bases are empty). Thus, although we question the
extent to which auto mechanics, auditors, and hospitality managers represent strong tests of experts’ resistance to list-length effects, we accept
a priori the proposition that baseball experts ought to be resistant to
list-length effects in their assessments of frequency or probability for a
baseball task. We do not wish to leave the impression that some problem
domains such as auto mechanics are entirely devoid of probabilistic reasoning whereas baseball is entirely a matter of probabilistic reasoning.
Our argument instead is one of the relative amount and sophistication of
probabilistic reasoning. If baseball experts show list-length effects, arguments for the pervasiveness of the list-length effect would be on much
stronger ground.
The second reason for using the domain of baseball is that it allows
comparisons of subjects’ estimated probabilities or frequencies to the
actual probabilities or frequencies. For example, we know that the average major league baseball player strikes out 15 times for every 100 times at bat
(James, 1987). This is quite unlike previous studies that used an automotive problem, an auditing problem, and a hospitality management problem
for which the actual values were unknown (or at least unreported). Thus,
we cannot determine from previous studies whether subjects’ assessments were more accurate with the short list or the long list.
Whether the long list or the short list yields more accurate probability
assessments depends critically on why the list-length effect occurs. The
dominant explanation in the published literature centers on the notion of
memory availability (Fischhoff, Slovic, & Lichtenstein, 1978; Hirt & Castellan, 1988). The idea is that a short list does not foster a complete
representation of all the possible categories (e.g., all possible causes for a
car not starting or all possible outcomes of a batter’s trip to the plate).
According to this interpretation, a long list should produce more accurate
assessments than a relatively short list because the problem with the short
list is that not all possible items are mentally available.
An alternative interpretation of the omission effect centers on the anchoring-and-adjustment heuristic (Tversky & Kahneman, 1974). According to this interpretation, each category or item listed is assigned an equal
probability initially and then adjustments are made upward or downward
according to the subjects’ knowledge of the item. In the longer list in
Table 1, for example, each of the 5 categories (A through D plus “all
other”) might receive an initial value of 20% whereas each of the 3 categories in the shorter list might receive an initial value of 33%. These
anchors might then be adjusted when the category is more closely considered, but adjustments typically are insufficient. Thus, suppose the actual value for outcome A is 25%. Because outcome A is anchored at 20%
for the long list and 33% for the short list, insufficient adjustment from the
anchor yields correspondingly different estimates for outcome A. According to this interpretation, the accuracy of short list versus that of a long
list depends on the extent to which the actual (true or normatively correct) values are close to the initial anchors. If a short list includes a low
probability event or a long list includes a high probability event, then
actual and estimated values will be highly discrepant for those events.
Conversely, if a short list includes only high probability events and a long
list includes only low probability events, then the actual and estimated
value would be quite close.
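As an illustration of this interpretation, the following minimal Python sketch (hypothetical numbers; the adjustment fraction of 0.5 is an arbitrary assumption rather than a parameter estimated in this study) shows how an equal-split anchor combined with insufficient adjustment yields different estimates of the same event under different list lengths:

    def anchored_estimate(true_value, n_listed, adjustment=0.5):
        """Start at the equal-split anchor (100/N) and move only part of the
        way toward the true value, i.e., adjust insufficiently."""
        anchor = 100.0 / n_listed
        return anchor + adjustment * (true_value - anchor)

    # Suppose the true frequency of outcome A is 25 per 100.
    print(anchored_estimate(25, n_listed=5))  # long list:  anchor 20.0 -> 22.5
    print(anchored_estimate(25, n_listed=3))  # short list: anchor 33.3 -> 29.2

Under this sketch, the size and even the direction of the error for a given event depend on how far its true value lies from the 1/N anchor that the list length imposes.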
Importantly, both the availability interpretation and the anchoring-and-adjustment interpretation imply that expertise should mitigate the omission effect. Expertise should help assure that a non-mentioned item or
category is recalled and expertise should help subjects make more sufficient adjustments away from the anchor. In baseball, for example, not
explicitly listing a base on balls (walk) as a possible outcome of a time at
bat should not have much of an effect on a baseball expert’s ability to
recall that such an outcome is possible on any given trip to the plate. As
well, even if only three possible outcomes are listed (e.g., hit, walk, and
“all other”) such that an initial anchor of 33% is used, the baseball expert’s confident knowledge that walks are relatively rare in major league
baseball should allow the expert to overcome most of the effect of the
anchor.
METHOD
Subjects
The expert subjects (n = 35) were involved in amateur baseball as
umpires, coaches, or managers. The experts had an average of 11.4 years
experience in organized baseball and a mean age of 23.9 years. The novices (n = 56) were undergraduate psychology students who participated
in the experiment for course credit. Although a larger number of students
had participated in the study, we excluded the responses of all students
who had never been involved in baseball as a player, coach, or umpire.
This was done in order to give meaning to any finding of relative expertise
on the part of the experts. (That is, comparing experts’ judgments with
those of people who knew absolutely nothing about baseball would not be
a valid demonstration of expertise.) The student responses were then
randomly chosen from the remaining group in order to equalize the eventual cell sizes. The students had an average of 3.8 years involvement in
baseball and an average age of 19.6 years.
In order to provide an independent measure of the relative baseball
expertise of the two subject groups, subjects were asked to respond to 10
four-alternative multiple choice questions on major league baseball trivia
and 10 true-false questions concerning the rules of baseball. The experts
correctly answered an average of 9.0 and 9.4, respectively, of these questions while the corresponding means for the students were 4.1 and 5.3.
The two populations were significantly different on both tests (t(90) =
16.3 and 17.0, p < .001 for the trivia and rules tests, respectively), variability within groups was rather small (pooled standard deviations of 1.39
and 1.10 for the two tests respectively), and there was almost no overlap
in scores for the two groups (ranges were 2-6 and 5-10 for the students
and experts respectively on the trivia test; ranges were 3-8 and 7-10 for
students and experts respectively on the rules test).
Experimental Design
The design of the experiment was a 2 x 2 x 4 factorial of expertise
(experts or students), list length (3 or 5 possible outcomes), and pairs of
the outcomes (see below) which were presented in the short-list conditions and used for comparison in the long-list conditions. Specific pairs of
outcomes were selected in order to assess whether the omission effect
might be related to subjects’ abilities to categorize the outcomes. This
experiment employed the natural categories “hits and walks” (closely
corresponding to “on base percentage”), “putouts and strikeouts” (the
complement of “on base percentage”), and “walks and strikeouts” (the
complement of “contact percentage”). For purposes of comparison,
“hits and strikeouts” were paired because they do not represent a natural
category.
Subjects were asked to estimate the frequency of either two or four
outcomes plus the frequency associated with “all other outcomes”. Note
that even the long list is not exhaustive, because a batter could also get on
base by being hit by a pitch or because of a fielding error. Although both
fielding errors and being hit by a pitch are rare in major league baseball,
their omission meant that there was some probability associated with the
all-other category even in the long-list condition.
The dependent measure “adjusted all other outcomes” is derived from
that used by Fischhoff et al. (1978). In the partial-list conditions, it is
simply the frequency that subjects assigned to “all other outcomes.”
Conceptually, the dependent measure for the corresponding pair of outcomes in the long-list condition is the sum of the frequencies that subjects
assigned to all outcomes other than that pair; that is, to “all other
outcomes” and to the other two specified outcomes. Operationally, the dependent measure in all conditions was defined as 100 minus the sum of the estimated frequencies for the pair of outcomes of interest.
Consider, for example, the short-list “hits and walks” condition. A
subject who assessed hits to be 30, walks to be 10, and “other” to be 60,
would produce a dependent measure of 60. If a subject in the corresponding long-list condition assessed hits to be 23, walks to be 7, putouts to be
50, strikeouts to be 15, and “other” to be 5, the dependent measure would
be 70 (i.e., 100 - 23 - 7). In this case, the omission effect would be
evident to the extent that 70 is greater than 60.
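A minimal Python sketch of this operational definition, using the worked numbers from the hits-and-walks example above (illustrative responses, not actual subject data):

    def adjusted_all_other(estimates, target_pair):
        """Dependent measure: 100 minus the sum of the estimated
        frequencies for the pair of outcomes under comparison."""
        return 100 - sum(estimates[outcome] for outcome in target_pair)

    short_list = {"hits": 30, "walks": 10, "all other": 60}
    long_list = {"hits": 23, "walks": 7, "putouts": 50, "strikeouts": 15, "all other": 5}

    pair = ("hits", "walks")
    print(adjusted_all_other(short_list, pair))  # 60
    print(adjusted_all_other(long_list, pair))   # 70; omission effect = 10 points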
The student subjects were randomly assigned to each of the eight Omission × Category conditions with equal cell sizes. Note that subjects in
the long-list conditions provided assessments for all four categories plus
the “all other” category. Thus, the responses from the seven expert
subjects in the long-list condition accounted for all 28 responses for the
expert long-list condition and cell sizes for the experts and students were
the same (28 experts in the short-list conditions, plus 7 experts who produced 28 responses across the four long-list comparisons, equals 56, the same number as the novices).
Procedures and Materials
The experimental task (for the long-list condition) was as follows:
Betting on the Batter
You are watching a major league baseball game with a friend. As each batter comes
to the plate, the two of you guess what the outcome will be for that batter. The
person who guesses right gets 10 points, and the person with the fewest total points
at the end of the game buys the beer (or suitable substitute). A batter comes to the
plate that you don’t know, so you decide to base your decision on the overall
batting statistics for all players in the major leagues. Out of a randomly selected 100
plate appearances, indicate the number of times that each of the following would be
expected to happen. Remember, the total of the five estimates should be 100.
1. How many times will the batter hit the ball but be put out?
2. How many times will the batter hit safely?
3. How many times will the batter draw a walk?
4. How many times will the batter strike out?
5. How many times will there be some other outcome?
Total ____
Following this task,1 the subjects answered the 20 baseball trivia and
baseball rules questions.
The experiment was administered to the student subjects in groups in a
classroom setting. Participation in the experiment was voluntary and the
students earned course credit for participation. For the expert group,
instruments were distributed together with envelopes at various organizational meetings for amateur baseball. The experts were instructed to
complete the task “without consulting other people and without consulting any papers or books.” The completed instruments were returned in
sealed envelopes and the experts were advised that their responses would
be anonymous so that they need not be concerned about giving inaccurate
answers. Most of the experts filled out the instruments individually before
leaving the meetings at which they were distributed. There was no evidence that later returns differed from those that were immediately completed.
RESULTS
Expertise and Omission Effects
The means of “adjusted all other outcomes” by list length and expertise are shown in Fig. 1. The results support our prediction of a significant
interaction between expertise and list length (F(1,96) = 30.9; p < .001).
For the student group, the list-length effect was significant at p < .001
(F(1,48) = 28.4 for the simple effects test). In the short lists, the students
assessed “other outcomes” to be significantly lower than the assessments
from the corresponding outcomes in the long list. For the expert group,
however, there was no significant effect for list length (F(1,48) = 2.7; p > .10 for the simple effects test). There was also an overall expertise effect (F(1,96) = 53.9; p < .001) in that experts generally assessed outcomes
differently than did students.
Category Effects
Figure 2 presents the expertise by list-length effect for the four different
short-list categories. Note that there is no significant list-length effect for
the experts across any of the categories. For the students, there is a
significant main effect for list length (F(1,48) = 28.4, p < .001). Further analysis indicates that this effect holds for the second (p < .001), third (p < .01), and fourth (p < .05) categories, but not for the first category (p > .40).
1 The normatively correct values across these outcomes are 53, 22, 9, 15, and 1 when
rounded to the nearest whole number using the 1987 major league statistics reported in
James (1987).
This effect of different categories is discussed below with reference
to the normative frequencies.
Comparison with Norms
Figure 3 compares the individual assessments with normative data for
each of the outcomes, by expertise and list length. The norms for baseball
players across outcomes were calculated from data reported in James
(1987).
The experts’ responses were very close to the norms in all categories
for both the long-list and short-list conditions. Using a binomial probability test (Conover, 1980) comparing the experts’ estimates with the normatively correct values, no significant deviations were found for strikeouts, walks, and putouts. Experts’ estimates of hits, however, were significantly higher than the normatively correct value (p = .05).
For the students, the hypothesis that each of the assessed means equals
the normative values was rejected (p < .05) across the outcome categories
and both list lengths, although the students’ estimates appear to be closer
to the normative values in the long-list condition than in the short-list
conditions. Note that the students’ assessments were less extreme, i.e., closer to an average value, than the normative frequencies across all outcomes. Furthermore, these differences are more pronounced in the short-list conditions than in the long-list condition.

FIG. 1. Effect of list length (4 or 2 outcomes plus “all other”) on the adjusted all-other-outcomes measure as a function of expertise; Expertise × Omission interaction, F(1,96) = 30.9, p < .001.
Additional Follow-Up Tests
The main results in this study were analyzed using analysis of variance.
Although the homogeneity of variance assumption was violated for the
expert versus student groups, ANOVA is robust to such violations with
equal cell sizes. An examination of the model’s residuals indicated that
the normality assumption could not be maintained (Lilliefors’ test); thus,
the ANOVA was run using ranks (Conover, 1980) to assess the robustness of the results across different distributional assumptions. The results
were similar to the parametric ANOVA except that the Category × List-length interaction was significant (p < .05) only for the ranks test. The three-way interaction of expertise, list length, and category also approached significance (p < .07). Simple effects follow-up tests for each
level of expertise reveal that these slightly different results are due to the
smaller list-length effect for the students in the strikeouts-hits category.
An analysis of the studentized residuals indicated two extreme observations for the student subjects. When these observations were deleted,
however, the significance of the effects was unchanged.
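For readers who want to see the mechanics of the rank-based robustness check, the following is a minimal Python sketch of a rank-transform ANOVA in the spirit of Conover (1980). The data frame here is randomly generated placeholder data, and the column names and cell structure are assumptions for illustration only; it is not the study's data or analysis code.

    # Rank-transform ANOVA sketch (Conover, 1980): replace the dependent
    # measure with its ranks, then fit the same factorial model.
    import numpy as np
    import pandas as pd
    from scipy.stats import rankdata
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 112  # 56 expert responses plus 56 student responses, per the design above
    df = pd.DataFrame({
        "expertise": rng.choice(["expert", "student"], size=n),
        "list_length": rng.choice(["long", "short"], size=n),
        "adjusted_all_other": rng.uniform(0, 100, size=n),  # placeholder values
    })
    df["ranked"] = rankdata(df["adjusted_all_other"])

    model = smf.ols("ranked ~ C(expertise) * C(list_length)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))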
FIG. 2. Effects of list length and expertise on the all-other-outcomes measure as a function of individual categories (strikeouts and hits, strikeouts and walks, strikeouts and putouts, hits and walks).
FIG. 3. Comparisons of normatively correct values for the outcomes (hits, strikeouts, walks, putouts) to estimated values as functions of list length (long list, short list) and expertise (experts, students).
DISCUSSION
This is the first study to show that expertise can eliminate the list-length
effect. We believe that previous studies have used domains in which
expertise does not take the form of probability or relative frequency representations. Baseball, in contrast to auto mechanics, financial auditing,
and hospitality management, has a natural (or at least historical) relation
to statistics, especially relative frequency and probability. Indeed, we
suspect that the pervasive “batting average” statistic is the first introduction to statistics that many American children encounter. Accordingly, we suspect that a large percentage of American college students
would show little or no omission effect on this baseball task; their data
would more closely resemble our experts than our Canadian students.
The student novices provided data that help us to address the question
of whether accuracy is greater for the long or the short list. The general
assumption in previous studies has been that longer lists (full trees) provide more accurate probability assessments (e.g., Fischhoff, 1977; Slovic,
Fischhoff, & Lichtenstein, 1982). The reasoning behind this assumption is
that the failure to explicitly identify all possible events for subjects can
result in some event(s) being overlooked and the events that are listed are
concomitantly inflated beyond their appropriate levels. We do not disagree with this argument in its general form, but we believe that our data
help clarify some conditions under which this general argument will and
will not hold true.
First, notice in Fig. 3 that three of the four outcomes show better accuracy for the students’ estimates with the long list and the other outcome (putouts) shows better accuracy for the short list. This is precisely the pattern we would expect if students were using an anchoring-and-adjustment heuristic. In the long-list conditions, subjects were provided with five event-categories (hits, walks, strikeouts, putouts, and all other). Let us assume that initially subjects placed equal frequencies in each category (i.e., 100% ÷ 5, or 20%). This is their anchor from which they will
adjust. In the short-list condition, only three categories exist (e.g., hits,
walks, and all other), resulting in an anchor of 33%. Given that adjustments from an anchor tend to be insufficient, we would expect that the
short list would provide more accurate estimates than the long list when
the actual or true values are closer to 33% than 20%, and the long list
would provide more accurate estimates when the true value is closer to
20% than to 33%. The actual or true values for hits, walks, and strikeouts
are closer to 20% than 33% whereas the actual value for putouts is closer
to 33% than to 20%.
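This comparison can be verified with a minimal Python sketch using the normative frequencies per 100 plate appearances given in Footnote 1 (putouts 53, hits 22, strikeouts 15, walks 9) and the two equal-split anchors:

    # Which equal-split anchor is closer to each normative frequency:
    # 20 (five-item long list) or 33.3 (three-item short list)?
    norms = {"putouts": 53, "hits": 22, "strikeouts": 15, "walks": 9}
    long_anchor, short_anchor = 100 / 5, 100 / 3

    for outcome, true_value in norms.items():
        nearer = ("long-list (20%)"
                  if abs(true_value - long_anchor) < abs(true_value - short_anchor)
                  else "short-list (33%)")
        print(f"{outcome}: true value {true_value}, nearer the {nearer} anchor")
    # Hits, walks, and strikeouts lie nearer 20%; putouts lies nearer 33%.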
If, as we suggest, anchoring and adjustment plays a role in the list-length effect,2 then there are certain conditions in which long lists will
produce more accurate estimates, certain conditions in which shorter lists
will produce more accurate estimates, and other conditions in which the
results will be mixed. In general, when the actual probabilities of the
events are near the value of 1/N (where N is the number of events listed), estimated probabilities will be close to the actual probabilities. If the actual probabilities are lower (higher) than 1/N, then estimates will fall
above (below) the actual values. When there is considerable variation in
the actual probabilities across events (as in the current study, where
putouts are over 5 times more probable than walks), longer lists will tend
to favor accuracy on the lower-probability events and shorter lists will
tend to favor accuracy on the higher-probability events. Thus, we argue
that longer lists produce smaller anchors and, in situations where smaller
anchors are desirable, longer lists are superior to shorter lists. But this
will not always be the case. Indeed, at this time we would argue that a
moderate sized or small list should not include explicit mention of an
2 Additional evidence that an anchoring-and-adjustment process is involved in the list-length effect can be found in Rennie (1989). Using a sequential task, subjects were led to
believe that they were going to assess probabilities in either 3 or 5 categories. Controlling for
the actual event that they estimated first, their estimates for this first event were lower if
they were expecting 5 categories rather than only 3 categories.
extremely low probability event merely for the purpose of making the list
complete lest the subject anchor at too high a level on that event.
Returning to our experts, the question arises as to how they managed to
avoid the list-length effect. There are two kinds of knowledge that are
individually sufficient to make a person impervious to the list-length effect. First, if a person has a strong mental representation of the relevant
event possibilities, then it should not matter how many of these events are
collapsed into the all-other category and how many are explicitly listed.
The person who has a clear, well-articulated mental representation of the
event possibilities can and will generate an appropriate listing without the
aid of an externally provided list. In this case, it is not necessary that the
person have a clear idea of the probabilities involved; the assumption
merely is that the person is dealing (mentally) with a full list regardless of
the list length that was provided by the experimenter. An alternative way
to be unaffected by the list-length manipulation is to have a strong mental
representation of the probabilities associated with the listed events even
if one cannot (or does not) think of the possible events that are contained
in the all-other category. For example, if a person knows that the probability of a hit is .22 and the probability of a walk is .09, then that person
need not generate the other specific possibilities in the all-other category
to report the value of .69 for the all-other category. In other words, the
person does not have to think about the possibilities of walks, fly-outs,
errors, being hit by a pitch, and so on because the probabilities associated
with the listed events are known and the all-other category can be derived
through simple subtraction.
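A short Python sketch of the subtraction route just described, using the probabilities from the example (hit = .22, walk = .09):

    # The all-other probability falls out by subtraction; no need to
    # enumerate fly-outs, errors, hit-by-pitch, and so on.
    listed = {"hit": 0.22, "walk": 0.09}
    all_other = 1.0 - sum(listed.values())
    print(round(all_other, 2))  # 0.69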
Although either of these interpretations can explain why our baseball
experts were impervious to omission effects, the knowledge-of-relevant-event-possibilities interpretation does not, in and of itself, explain how
our experts managed to be so accurate in their estimates. In other words,
the knowledge-of-relevant-events interpretation simply means that the
short-list conditions were functionally equivalent to the long-list conditions because the expert was able to think of all the relevant events that
were collapsed into the all-other category in the short-list cases. Taken
alone, this does not account for the extremely close correspondence between the estimated probabilities and the normatively correct probabilities. Thus, we must invoke the second interpretation, that our experts had
strong mental representations of the event probabilities themselves rather
than just knowledge of the family of possible events.
Although experts’ estimates were extremely close to the normatively
correct values in all categories, a statistically significant deviation was
observed in the category of hits. The magnitude of this deviation, although small, leads us to speculate that subjects were using batting averages to estimate this value. If we ignore sacrifices and cases where the
batter is hit by a pitch, both of which are rare, then batting average is
closely approximated by the number of hits divided by the difference
between trips to the plate and walks. This yields a value for batting
average that is approximately five percentage points higher than the number of hits per 100 trips to the plate. This is very near the value estimated
by the experts. Whether they were using batting average rather than hits
per trip to the plate or not, however, the experts’ estimates were unaffected by list length.3
One of the differences between our trips-to-bat problem and previously
used problems is that the sample space for the trips-to-bat problem is
considerably smaller than the sample spaces for some of the previously
used problems such as Fischhoff et al.’s (1978) auto-repair problem. Indeed, the baseball problem presented to our subjects was not hierarchical
whereas Fischhoff et al.‘s auto-repair problem was hierarchical. Although
we acknowledge that future research should address such differences,
this difference in the structure and size of the sample spaces cannot
account for why novices and experts differed so dramatically given that
both groups were dealing with the same trips-to-bat problem. We believe
that it would be more fruitful to focus on the question of how list-length
effects interact with the type of expertise held by subjects.
We suggest that baseball experts are not the only ones who will show
resistance to list-length effects on probability judgments. Experts on
horse races and professional weather forecasters, for example, would
seem especially unlikely to show list-length effects (because it seems that
they both use and understand probability; see Hoerl & Fallin, 1974; Murphy & Winkler, 1977). And, although research on the list-length effect has
been conducted almost exclusively with probability judgment tasks, this
need not be the only type of task in which we can examine such effects.
Consider once again the auto mechanic. We believe that an auto mechanic’s expertise is related among other things to knowledge of tools and such
things as “book times” associated with various repairs. Thus, experienced mechanics and novices could be presented with a repair problem
3 It is also possible that the “hit safely” category was interpreted to include getting on
base by error. This possibility exists in part because getting on base by error was not
explicitly listed but instead was meant to be part of the all-other category. This is related to
Hirt and Castellan’s (1988) notion of category redefinition as an explanation for the list-length effect. However, this should be equally true for both the long and short-list conditions
in this study. And, although it might account for some of the overestimation of hits relative
to the normatively correct values, the actual frequencies of getting on base by error in major
league baseball are too insignificant to account for the approximately 6.5% difference between the experts’ estimates and the actual values.
(e.g., replace starter and ignition) and asked how many minutes they
would need to use Tool A, Tool B, and “all other tools” versus Tool A,
Tool B, Tool C, Tool D, and all other tools. We doubt that auto repair
experts would show list-length effects in this task whereas novices almost
certainly would.
Summary and Conclusions
List-length effects were found for baseball novices but not baseball
experts in a task involving estimated frequencies of outcomes for batters’
trips to the plate. We believe that probabilities and relative frequencies
are not highly relevant to the kinds of mental representations that made
experienced auto mechanics, auditors, and hospitality managers experts
in their respective professions; we believe that this accounts for why
experts in these professions fell prey to list-length effects in previous
studies. As well, our data suggest that list-length effects derive at least in
part from an anchoring-and-adjustment process in which the anchor value
is an inverse function of the number of listed event-possibilities. Expertise can overcome the list-length effect if the expert has a well-defined
internal representation of the nonlisted event possibilities and a confident
knowledge of the relevant probabilities. Without such expertise, the accuracy of subjects’ estimates will be determined in part by the extent to
which the actual probabilities are close to the anchor value.
REFERENCES
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley.
Dube-Rioux, L., & Russo, J. E. (1988). An availability bias in professional judgment. Journal of Behavioral Decision Making, 1, 223-231.
Fischhoff, B. (1977). Cost-benefit analysis and the art of motorcycle maintenance. Policy
Sciences, 8, 177-202.
Fischhoff, B., Slovic, P., & Lichtenstein, S. (1978). Fault trees: Sensitivity of estimated
failure probabilities to problem representation. Journal of Experimental Psychology:
Human Perception and Performance, 4, 330-344.
Hirt, E. R., & Castellan, N. J., Jr. (1988). Probability and category redefinition in the fault
tree paradigm. Journal of Experimental Psychology: Human Perception and Performance, 14, 122-131.
Hoerl, A. E., & Fallin, H. K. (1974). Reliability of subjective evaluations in a high incentive
situation. Journal of the Royal Statistical Society, 137, 227-230.
James, B. (1987). The Bill James baseball abstract 1987. New York: Ballantine.
Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 64, 399-402.
Murphy, A. H., & Winkler, R. L. (1977). Can weather forecasters formulate reliable probability forecasts of precipitation and temperature? National Weather Digest, 2, 2-9.
Rennie, R. D. (1989). Determination of probable cause by auditors: A study of the omission
effect in fault trees. Unpublished doctoral dissertation, University of Alberta.
Rennie, R. D., & Johnson, R. D. (1988, October). Auditors' judgments of probable causes: Effects of availability, experience, focusing and omission. Presentation at ORSA/TIMS, Denver.
Slovic, P., Fischhoff, B., & Lichtenstein, S. (1982). Facts versus fears: Understanding perceived risk. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 463-492). New York: Cambridge University Press.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.
RECEIVED:
November 3, 1989