Introducing the Leroy Problem

Introducing the Leroy Problem: Extending the Frequency vs. Probability Debate to Racial
Stereotypes via an Evolutionary Perspective
Nazia S. Mirza and Herbert H. Blumberg
Corresponding author: Herbert H. Blumberg, Department of Psychology, Goldsmiths College,
University of London, London SE14 6NW, England.
Phone number (from USA): 011-44-20-7919-7896;
Fax 011-44-20-7919-7873;
E-mail: [email protected]
Introducing Leroy
2
Running head: RACE AND THE FREQUENCY VS. PROBABILITY DEBATE
Introducing the Leroy Problem: Extending the Frequency vs. Probability Debate to Racial
Stereotypes via an Evolutionary Perspective
Nazia S. Mirza and Herbert H. Blumberg
Goldsmiths College, University of London
Abstract
Bayesian vs. frequentist paradigms are here extended to the issue of racial stereotypes. It has been
widely argued that human beings do not embody an innate calculus of probability and are not
Bayesian thinkers. Bayesian probabilists argue that probability refers to subjective degrees of
confidence, while the frequentists believe probability refers to frequencies of events in the real
world. A growing body of research has shown that frequentist versions of Bayesian problems elicit
Bayesian reasoning. This study (N = 118) replicated Fiedler's finding that a frequency version of
the Linda problem elicits Bayesian reasoning in about 75% of participants, compared to 17% for
the probability version in Tversky and Kahneman's studies. It also found, however, that the
inductive-reasoning mechanism that operates on frequency input is not activated when a racial
stereotype is invoked.
Keywords: stereotypes, conjunction, Bayesian, frequentists, probabilists
Introducing the Leroy problem: Extending the Frequency vs. Probability Debate to
Racial Stereotypes via an Evolutionary Perspective
Life is filled with decisions that are based on the likelihood of uncertain events--for
instance, guessing the outcome of a general election or deciding on the innocence or guilt of a
defendant. The questions of how we make our decisions and what influences them are intriguing
and important.
It appears that human beings use heuristic principles to simplify the task of using
probabilities to make predictions, but in the process, they often make systematic errors (Tversky
& Kahneman, 1982). There are many biases found in the intuitive judgement of probability; these
include representativeness, misconceptions of chance, and base rate neglect.
For many years researchers have been fascinated with solving the "Linda problem" first
used by Tversky and Kahneman (1982) in their original study looking at representativeness in
forming bias. The Linda problem involves the conjunction rule, one of the simplest in using
probabilities (Tversky & Kahneman, 1982). The rule states that the joint occurrence of two events
cannot be more likely than the occurrence of either single event. Despite its simplicity, the vast
majority of people fail to apply the rule to the Linda problem and commit what Tversky and
Kahneman call a conjunction fallacy.
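The rule itself can be checked with simple arithmetic. The sketch below uses illustrative, made-up probabilities (not data from any study) to show why rating a conjunction above one of its constituents must be a fallacy:

```python
# Conjunction rule: the probability of a joint event can never exceed
# the probability of either of its constituents.
# The probabilities below are illustrative only, not data from the study.
p_bank_teller = 0.05            # P(B): Linda is a bank teller
p_feminist_given_teller = 0.4   # P(F | B): feminist, given she is a teller

# The conjunction is the constituent scaled by a factor of at most 1.
p_conjunction = p_bank_teller * p_feminist_given_teller  # P(B and F)

# Ranking B + F above B alone therefore commits the conjunction fallacy.
assert p_conjunction <= p_bank_teller
```

Whatever value P(F | B) takes, it cannot exceed 1, so P(B and F) can never exceed P(B).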
In the original presentation of the problem, participants were provided with a brief
personality description of Linda (including elements that implied feminism) and were then asked
to rank order the probabilities of various statements being true: Linda is a bank teller [Constituent
B] and Linda is a bank teller and is active in the feminist movement [Conjunction of B + F]
alongside 6 other outcomes. Tversky and Kahneman (1982) found that 83% of participants
ranked the conjunction (B + F) as more likely than the constituent (B)--a mathematical
impossibility, as the bank teller category contains those who are and are not feminists. Surprisingly,
they also found that statistical sophistication had little or no effect on the rate of conjunction
fallacies.
They suggested that this surprising finding reflected the representativeness heuristic in
operation. However, this was not conclusive, and despite intensive research on the Linda problem,
which has led to the identification of many potential variables that lead to the conjunction fallacy,
no one has come close to eliminating it (Epstein, Denes-Raj, & Pacini, 1995; cf. Hertwig &
Chase, 1998; Hertwig & Gigerenzer, 1999).
In a replication study where all personality descriptions were eliminated from the
description of Linda, Tversky and Kahneman (1982) elicited a very low rate of conjunction
fallacies. However, this does not necessarily mean that the conjunction rule is being correctly
applied. Participants are probably judging the combined activities of "bank teller and feminist" as
incompatible and therefore still engaging in the representativeness heuristic (Epstein, Denes-Raj, &
Pacini, 1995).
So why is the Linda problem so difficult? Tversky and Kahneman (1982) conclude that the
high rate of conjunction fallacies is obtained due to use of a judgmental heuristic, which they
describe as a strategy "that relies on a natural assessment to produce an estimate or a prediction"
(Tversky & Kahneman, 1983, p. 294). In simple terms, people make judgements about the Linda
problem based on perceiving Linda as being more similar to bank tellers who are feminists (thus
rendering this combination very "available") than to bank tellers in general.
The Paradigm of Evolutionary Psychology
Evolutionary psychology has become increasingly relevant to modern psychology in
general. Researchers are searching for possible underlying evolutionary psychological mechanisms
to explain the existence of behaviors. After all, the "human brain did not fall out of the sky, an
inscrutable artifact of unknown origin, and there is no longer any sensible reason for studying it in
ignorance of the causal processes that constructed it" (Cosmides & Tooby, 1994, p. 85; but see
also Gigerenzer & Hoffrage, 1995).
The evolutionary history that led to our present "form" consists of a "step-by-step
succession of designs modified across millions of generations" (Cosmides & Tooby, 1994, p. 86).
Modifications are the results of either chance or natural selection (if one ignores the creationist
perspective) with the only plausible explanation for complex functional designs being natural
selection (Dawkins, 1986).
Natural selection works as follows: a long enduring adaptive problem results in the
formation of various competing designs to cope with it. The designs that better enhance their own
propagation relative to alternative designs are selected for and eventually become the norm.
It is important to understand that evolution is a historical and not a predictive or
"foresightful" process. Hence our current design is geared towards adaptive problems of the past
without regard to the problems of the present. For humans, our cognitive mechanisms are, as it
were, designed to solve the adaptive problems created by the situations our Pleistocene
hunter-gatherer ancestors faced. It is thus viewed as coincidental that a mechanism may solve present-day
problems and this event plays little or no role in explaining how the mechanism came, in the first
place, to have the design it does (Cosmides & Tooby, 1994). Also, it is important to understand
that natural selection does not always produce perfect or optimal designs (Darwin, 1859;
Dawkins, 1976, 1982).
Cognitive psychology has traditionally given emphasis to the acquisition of knowledge
rather than to the regulation of action. Understanding evolutionary theory turns this emphasis on
its head. The brain evolved mechanisms to acquire knowledge because knowledge is important in
the regulation of action. One should be asking what a mechanism was designed to do rather than
what it can do. As described by Cosmides and Tooby (1994), "Because an adaptive problem and
its cognitive solution ... need to fit together like a lock and a key, understanding adaptive
problems tells one a great deal about the associated cognitive mechanisms" (p. 96).
Frequencies or Probabilities
Following the work of Tversky and Kahneman (1982, 1983; Kahneman & Tversky, 1972),
conventional psychology now adheres to the idea that people's "untutored intuitions" do not
follow a calculus of probability (Cosmides & Tooby, 1996). However, the situation is not as
simple as it seems; even professional probability theorists are in disagreement as to what
probability means. Two of the prominent schools of thought are manifest as the frequentists and
the Bayesians. (But see also, Fiedler, Brinkmann, Betsch, & Wild, 2000; Hoffrage, Gigerenzer,
Krauss, & Martignon, 2002).
Bayesians argue that probability refers to a subjective degree of confidence and, because
one can express one's confidence that a single event will occur, it is possible to refer to the
probability of a single event. In contrast, frequentists argue that probability refers to the relative
frequencies of events in the world and are always defined over a specific reference class. Hence, a
frequentist would argue that a single event (such as Pat being a bus driver) cannot have a
probability (as it does not have a relative frequency), and, as such, accurate probabilities for single
events cannot be computed by a calculus of probability within the mind.
Gigerenzer (1991) argued that even people who are not aware of the finer points of
probability theory may implicitly make the Bayesian vs. frequentist distinction and that, for most
domains, the human mind represents probabilistic information as frequencies. If this is correct, it
raises the question: can humans make judgements under uncertainty that obey the rules of
probability theory if the probabilistic information provided (and answer required) is in terms of
frequencies? Tversky and Kahneman (1982) argue that the laws of chance are not intuitively
obvious nor applied very easily, suggesting that the "human mind is not designed to spontaneously
learn such rules" (Tversky & Kahneman, 1974, p. 1130).
When one is considering what the brain is designed to do, the paradigm of evolutionary
psychology is inescapable. Judgement under uncertainty is an adaptive problem that would have
regularly been experienced by our Pleistocene hunter-gatherer ancestors, and statistical rules or
judgmental heuristics could have been used to solve it. It appears from Tversky and Kahneman's
finding that this is true. However, it is important to ask why one design was selected over another
(Cosmides & Tooby, 1996). It seems odd that natural selection would favor a design that used
error-prone heuristics rather than an accurate calculus of probability.
There is evidence to suggest that some birds and insects, with nervous systems
considerably simpler than those of humans, utilize very sophisticated statistical reasoning when
foraging (Real, 1991). Staddon (1988) argued that many organisms, from sea snails to humans,
have learning mechanisms responsible for a variety of tasks (e.g., habituation) that can be
described as "Bayesian inference machines." When one considers this evidence, it seems unlikely
that birds and insects and other similarly less sophisticated organisms can carry out statistical
functions that the human brain is considered incapable of.
A well-engineered reasoning mechanism. Cosmides and Tooby (1996, p. 14) identified that
the Marrion question of "what should the design of a well engineered reasoning mechanism look
like?" needs to be addressed, and experiments constructed that can detect these designs (Marr,
1982). In ancestral times the only reliable database for information would be one's own
observation and those shared by the small community within which one lived. The probabilities of
single events would not have been available; instead contemporary humans' ancestors would have
thought in terms of "encountered frequencies" (Cosmides & Tooby, 1996).
So if one considers Gigerenzer's hypothesis that the mind is a good intuitive statistician of
the frequentist school and places it in an evolutionary framework, one is left with having evolved
mechanisms that "took frequency information as input, maintained such information as frequentist
representations and used these representations as a database for effective inductive reasoning"
(Cosmides & Tooby, 1996, p. 17; but cf., for instance, Evans, Handley, Perham, Over, &
Thompson, 2000). So experiments should find that performance on tasks that involve judgement
under uncertainty will differ depending on whether participants are asked to judge the frequency
or probability of a single event. This difference should favor frequency versions of problems,
which should elicit superior performance.
This appears especially likely when one considers that there is evidence to suggest that
people possess a mechanism that is designed to encode frequency information very accurately as
well as automatically (Attig & Hasher, 1980; Hasher & Zacks, 1979; Zacks, Hasher, & Sanft,
1982).
It is important to understand that performances are expected to improve with frequency
versions but not to be perfect. This is true even when an "optimum algorithm" is selected for as it
is theoretically impossible to build an "omniscient algorithm" (Cosmides & Tooby, 1996). Hence
Tversky and Kahneman's (1982) original finding does not necessarily mean that human beings are
not good intuitive statisticians but rather may indicate that the information provided to solve the
problem was not in frequentist terms, and so the mind's calculus of probability was not designed
to solve it.
Cosmides and Tooby (1996) applied the frequency hypothesis to a problem famous in the
heuristics and biases literature for eliciting base rate neglect. They found that correct Bayesian
reasoning could be elicited in 76% (92% in the most ecologically valid condition) of participants.
Fiedler (1988) tested the frequency hypothesis on the Linda problem. Indeed, one might
wonder how there could be "more than one Linda." Frequency conditions typically state that
"there are 100 people who fit the description above. How many of them are: ..." In any event,
Fiedler found that the vast majority of participants who commit the fallacy (quoted by Fiedler as
usually 70-80%) only do so on the probability version, where they are asked to rank statements
with respect to their probability. The frequency version saw this number drop to less than 20%
(here participants were asked "To how many out of 100 people do the following statements
apply?" as per Fiedler, 1988).
The findings suggest that statistical judgements may obey the conjunction rule provided
that the task is formulated appropriately; that is, with frequencies. Fiedler (1988) also found the
same drop in conjunction fallacies when the frequency judgement task was not presented as a
"probability-like" judgement task (i.e., not "how many out of 100"). The results from these studies
suggest that humans, like other animals, have inductive reasoning mechanisms that embody a
calculus of probability but that these mechanisms may have been designed to operate when
information is presented in a frequency format. (The mechanisms' design appears consistent
with--although it is not necessarily a result of--the adaptive problems that people's ancestors faced.)
Their existence remained hidden in earlier studies because (at least under the circumstances
studied) the mechanisms are unable to operate accurately when the information provided is in a
non-frequency format (Brase, Cosmides, & Tooby, 1998).
Avoiding confounds and extending the debate to racial stereotypes. Epstein, Donovan, and
Denes-Raj (1999) have criticized the relevance of Fiedler's findings due to the response formats of
the two versions of the Linda problem being confounded in important ways, one requiring ranking
and the other frequency estimates. It was reported that when confounds were eliminated, similar
rates of conjunction fallacies were obtained for frequency and probability versions of the Linda
problem (Epstein, Denes-Raj, & Pacini, 1995).
Racial Stereotype
The stereotypes invoked by the description of a fairly young black male driving
an expensive car and wearing designer-brand clothing are numerous and often entail the object of
the description being involved with crime or drugs. Although the fact that he is a university
graduate confounds this, it was decided to include the information so that "Leroy" is more
comparable to Linda.
Intergroup discrimination is a feature of most modern societies. Racial tensions may be as
prevalent today as they have ever been despite the increased opportunities for positive contact
between blacks and whites. A considerable body of research has searched for circumstances under
which contact between whites and blacks results in positive intergroup relations (see, e.g.,
Henderson-King & Nisbett, 1996). However, there is little evidence to show that positive contact
has prolonged effects or influences at group level. Disturbingly, research indicates that despite
being given increased opportunities for interracial contact, white American attitudes towards
blacks remain at best ambivalent (Dovidio, Evans, & Tyler, 1986; Gaertner & McLaughlin, 1983;
McConahay, 1986; cf. D. Katz & Braly, 1933).
Henderson-King and Nisbett (1996) showed that seeing a black person behave with
hostility or even simply overhearing a conversation where a black individual is the perpetrator of a
hostile event resulted in participants perceiving blacks as more antagonistic than whites (there
being no equivalent effect for whites). An explanation of this may be that a single black person's
behavior may have an inordinately large influence on white people's attitudes towards blacks in
general. These attitudes may be particularly disproportionate to reality if the observed behavior is
negative. It appears to be a case of another heuristic, labelled by Tversky and Kahneman (1982) as
the "law of small numbers." People rely too heavily on small, fortuitous samples to make
judgements, remaining blind to the fact that their observations can be explained by sample
variability. To put it another way, most people have substantial and varied ongoing social
interaction with both men and women. By contrast, many people may be especially ready--with
minimal cueing (such as being presented with a single vivid example)--to use the stereotype of a
racial "outgroup."
Prediction. The above evidence would suggest that it can be very difficult to remedy the
problem of intergroup discrimination; due to the effect of the law of small numbers, it seems that
many people are too ready to associate certain stereotypes with black (here referring to people of
African origin) people. It is predicted that, arguably due to the seeming strength of racial
stereotypes, the frequency hypothesis will not be supported for the Leroy problem and hence the
probability and frequency versions will elicit approximately equivalent rates of conjunction
fallacies.
The Present Study
The present study seeks to replicate Fiedler's finding that a frequency version of the Linda
problem will achieve fewer conjunction fallacies (CF) than the probability version. Note that it
avoids Epstein, Donovan, and Denes-Raj's (1999) criticism by requiring participants to give either
probability or frequency estimates, hence avoiding ranking altogether. Thus, mode (frequency or
probability) acts as the first within-subjects variable.
It also introduces a question that would cause participants to engage in a racial bias if
operating the representativeness heuristic; this is the Leroy problem. The Leroy vignette reads:
"Leroy is 28 years old, black and single. He is intelligent and studied economics at university. He
likes to wear designer labels and drives an expensive car." (Note: the vignette and all possible
selections can be found in the Method section.)
Another effect that is examined is that of practice. In everyday life, people participate in a
variety of often inter-related situations. In the present study, there is a between-subjects variable
of sequence designed to determine whether practice with one problem type--frequency--improves
performance with the other (more difficult) problem type--probability.
To ensure that participants had the intuitive knowledge to use the conjunction rule and to
solve conjunction problems, a question was included that tested for this ability, which is most
reliably demonstrated by pairing two palpably unlikely events. The question (lottery vignette, as
described below) was first used by Epstein, Denes-Raj, and Pacini (1995) where it was found that
only 6.5% of participants committed the conjunction fallacy when answering it compared to
67.5% for fallacies in the Linda problem.
A prediction for the present study is that conjunction fallacies will be more common for
the probability version of the Linda problem than for the frequency version and also more
common than for either version of the Leroy problem. If, as we would hypothesize, the
frequency-probability difference in conjunction fallacies is not wholly stable--and, for instance,
varies as a function of sample, experimental treatments (such as racial vs. gender stereotypes),
measuring operations, and/or time and place of setting (cf. "UTOs" in Cronbach, 1982)--then this
would not, of course, "disprove" people's inherent ability to avoid conjunction fallacies when
using frequencies. (Equally, it would neither strengthen nor weaken a social/cultural rationale for
the conjunction fallacy in general nor the frequency/probability difference in particular.) It would,
however, indicate that such ability is not robust and notably context-dependent--and would
moreover go some way toward elucidating this dependency.
Method
Participants
There were a total of 120 participants, 30 for each sequence. They were approached in the
refectory of Goldsmiths College, University of London, and all were undergraduates
of the College. There were equal numbers of each sex in each condition, i.e., 15 male and 15
female.
For the initial analysis the data from two participants had to be discarded as they failed to
complete the questionnaire properly (by leaving target scores blank, e.g., Linda is a bank teller).
After closer examination of data and initial analysis it was decided provisionally to remove
the data for 11 additional participants as they had appeared to fill in their questionnaires without
adequate thought and placed the same number, e.g., 50%, for all, or nearly all, outcomes. Including
respondents who show such a response set would (in inferential statistics tests) give an inflated
estimate of sample size and a deflated estimate of differences among probabilities (or among
frequencies); nevertheless, the data were analyzed both with and without these respondents and,
as indicated below, there were no major differences in the results.
Materials
Five conjunction problems were presented in counterbalanced order in four different
questionnaire booklets, each booklet corresponding to one of the four sequences. The five
problems were as follows: probability and frequency versions of the Linda problem, probability
and frequency versions of the Leroy problem, and the lottery problem. The lottery problem was
always presented last (in all four sequences) as previous research has revealed that when the
lottery problem precedes the Linda problem, participants' performance on the Linda problem
improves (Epstein, Denes-Raj, & Pacini, 1995).
The Linda vignette, reproduced from Tversky and Kahneman (1983) reads as follows:
"Linda is 31 years old, single outspoken and very bright. She majored in philosophy. As a
student she was deeply concerned with social issues of discrimination and social justice and also
participated in anti-nuclear demonstrations" (p. 297).
Participants were then asked the following:
Probability version: Approximately how likely is each of the following? Rate each one
SEPARATELY on a scale from 0-100% (i.e., they do not have to all add up to 100%).
Frequency version: To how many out of 100 people do the following statements apply?
For both modes participants had to estimate the likelihood of the following statements:
Linda is a bank teller (B) and Linda is a bank teller and is active in the feminist movement
(B + F)--alongside 6 other statements, which were ignored in the data analysis.
The Leroy Problem. The Leroy vignette reads as follows:
"Leroy is 28 years old, black and single. He is intelligent and studied economics at
university. He likes to wear designer labels and drives an expensive car."
Participants were asked to respond to the corresponding probability and frequency
questions as with the Linda problem and had to estimate the likelihood of the following
statements alongside 6 other statements: Leroy is a voluntary aid worker (V) and Leroy is a
voluntary aid worker and likes to listen to Rap music (V + R).
Lottery vignette. The lottery vignette--reproduced from Epstein, Denes-Raj, and Pacini
(1995)--reads as follows (but note that the vignette is slightly adapted to suit participants from a
British population, the word lotteries being replaced with lottery tickets):
"Tom buys two lottery tickets, one from the state lottery and one from the local fire
department. The chances of winning in the state lottery are one in a million. The chances of
winning in the fire department lottery are one in a thousand" (p. 1127).
Participants were then asked to rank order the following likelihood statements on a scale
from 1 (most likely) to 3 (least likely): "Tom wins the state lottery," "Tom wins the fire
department lottery," and "Tom wins the state lottery and the fire department lottery".
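The correct ranking in the lottery vignette follows from the stated chances alone. A minimal sketch, assuming the two draws are independent (an assumption not stated in the vignette but natural for separate lotteries):

```python
# Probabilities stated in the lottery vignette.
p_state = 1 / 1_000_000  # state lottery: one in a million
p_fire = 1 / 1_000       # fire department lottery: one in a thousand

# Assuming the two draws are independent, the chance of winning both is
# their product -- one in a billion -- so the conjunction must be ranked
# least likely (3), after the fire department (1) and state (2) lotteries.
p_both = p_state * p_fire

assert p_both < p_state < p_fire
```

Pairing two palpably unlikely events in this way makes the conjunction rule intuitively accessible, which is why so few participants commit the fallacy here.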
Each vignette (Linda, Leroy, and lottery), along with its questions, was placed on an
individual page.
Design
The study was a 2 (Mode: frequency vs. probability) x 2 (Basis: sex vs. race) x 4
(Sequence) mixed factorial design. Sequence was a between-subjects variable and had 4 levels:
Sequence 1 = F-Linda, F-Leroy, P-Linda, P-Leroy;
Sequence 2 = F-Leroy, F-Linda, P-Leroy, P-Linda;
Sequence 3 = P-Linda, P-Leroy, F-Linda, F-Leroy;
Sequence 4 = P-Leroy, P-Linda, F-Leroy, F-Linda;
(Where F = Frequency, and P = Probability).
In order to account for order effects (if any), 4 sequences were used such that for either
mode (frequency or probability) some participants answered a race then sex problem and others
did the opposite. Mode and basis were within-subjects variables with 2 levels each as already
stated (see above).
Procedure
The experimenter approached participants in the college refectory and asked if they were
willing to take part in a ten-minute study exploring people's judgements of likelihoods. Those who
agreed were given one of the four types (i.e., sequences) of questionnaire booklets.
Participants were randomly allocated into a sequence type. This was done by the
experimenter, who prior to approaching students in the refectory sorted the questionnaires into a
pile with sequences rotating in the following fashion, Sequence 1, 2, 3, 4, 1, 2, 3, . . . etc.
Participants were handed the top questionnaire on the pile, so that neither they nor the
experimenter knew what sequence they were completing.
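The rotating-pile allocation can be sketched in a few lines; this is an illustration of the scheme described above, not code used in the study:

```python
from itertools import cycle, islice

# Sketch of the rotating allocation described in the Procedure: the pile of
# questionnaires cycles through sequences 1, 2, 3, 4, 1, 2, ... and each
# participant simply takes the one on top.
sequences = [1, 2, 3, 4]
pile = list(islice(cycle(sequences), 120))  # 120 participants in total

assert pile[:6] == [1, 2, 3, 4, 1, 2]
assert all(pile.count(s) == 30 for s in sequences)  # 30 per sequence
```

The rotation guarantees equal cell sizes (30 per sequence) while keeping both participant and experimenter blind to which sequence is handed out next.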
The questionnaires had clear instructions on the first page that they should not flick
through the booklet but should complete each question in the sequence provided; once a question
had been completed they were to turn the page and then not turn back again. The experimenter
read through the instructions with each participant and then asked if they had understood the
instructions. The participant was then left to complete the questionnaire while the experimenter
was close by.
Scoring
The bias was scored quantitatively--e.g., if a participant gave the likelihood that "Linda is
a bank teller" a rating of 20% (or 20/100) and gave the likelihood "Linda is bank teller and a
feminist" a rating of 25% (or 25/100) their score would equal +5 (25 - 20). A positive score
indicates the breaking of the conjunction rule and a negative score the opposite, i.e., a lack of
bias. (The initial analyses are, however, concerned with the sheer rate of CF--proportion of
positive scores.)
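The scoring rule above amounts to a one-line computation; the sketch below simply restates it, with the worked example from the text:

```python
# CF score as used in the Scoring section: conjunction rating minus
# constituent rating; a positive score marks a conjunction fallacy.
def cf_score(constituent: float, conjunction: float) -> float:
    """Return the conjunction-fallacy score for one participant."""
    return conjunction - constituent

# Worked example from the text: B rated 20%, B + F rated 25%.
assert cf_score(constituent=20, conjunction=25) == 5    # fallacy committed
assert cf_score(constituent=30, conjunction=10) == -20  # rule respected
```

A score of exactly zero is ambiguous (see the Scores of Zero section below): the two ratings are equal, which is logically permissible but may also reflect an undifferentiated response set.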
Results
Conjunction fallacies as a function of problem type. Table 1 shows that nearly 2/5 of
participants (39.8%) committed a conjunction fallacy (CF) in response to the probability version
of the Linda vignette, compared to only 24.6% in the frequency version of the same problem. The
difference is substantial and highly significant. (See Table 2; using a McNemar test to compare
those 23 participants in the first two columns of the "+" row--who show CF only for
probabilities--with the 4 in the first two rows of the "+" column, who show CF only for
frequencies, χ2 = 13.37, df = 1, p < .001.)
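The reported chi-square can be reproduced from the two discordant counts. A minimal sketch (McNemar statistic without continuity correction, which is consistent with the value reported above):

```python
# McNemar chi-square on the discordant pairs (no continuity correction),
# sketching the comparison reported here: 23 participants committed a CF
# only on the probability version vs. 4 only on the frequency version.
prob_only = 23
freq_only = 4

chi_square = (prob_only - freq_only) ** 2 / (prob_only + freq_only)

assert round(chi_square, 2) == 13.37  # matches the reported value, df = 1
```

Only the discordant pairs enter the statistic; participants who committed a CF on both versions, or on neither, carry no information about the mode difference.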
-----------------------
Tables 1 and 2 about here
-----------------------
Nevertheless, the rate of CF for the "Probability-Linda" (P-Linda) version of 39.8% was
much lower than Tversky and Kahneman's (1983) rate of 83% in their original study.
Moreover, many participants failed to answer the Lottery problem correctly (30.5%), a
surprisingly high number when compared to 6.5% in Epstein, Denes-Raj, and Pacini's (1995)
study. It is possible that the participants have poor math ability in general or did not understand
the questions. In fact only 19 out of 118 could identify the conjunction rule explicitly.
Fiedler's (1988) finding of only 22% of participants committing a CF with a frequency
version of the Linda problem was replicated, with a difference of only 2.6%.
The Leroy problem (which invoked racial stereotypes) did not establish the same trend. It
elicited CF in 36.7% of participants in the probability version (P-Leroy) compared to 40.7% in the
frequency version. (The relevant "turnover table" is shown as Table 3.) As predicted, the
frequency version did not reduce bias and indeed shows a non-significant trend towards increasing
it.
-----------------------
Table 3 about here
-----------------------
Actual CF scores (that is, conjunction minus constituent). Figure 1 provides a graphical
representation of the marginal means across sequence for all question types. The graph indicates
that F-Linda seems unusual in the response it elicits, suggesting an interaction of Mode x Basis,
yielding a low CF rate for the frequency version of the Linda problem. None of the other means
appear very different, suggesting that there are no pronounced main effects. (See also the ANOVA
results, below.)
-----------------------
Figures 1 & 2 about here
-----------------------
Practice effects
To establish if there is a practice effect, one needs to compare mean scores across all four
conditions, i.e., all sequences. Figure 2 shows that the F-Linda version elicited lower scores (i.e.,
fewer CFs) across all sequences--except Sequence 2--than did the P-Linda, P-Leroy, and F-Leroy
versions, with Sequence 4 (P-Leroy, P-Linda, F-Leroy, F-Linda) achieving particularly low scores.
The graph also indicates that Sequence 2 (F-Leroy, F-Linda, P-Leroy, P-Linda) seems to
have relatively low scores across the ranges of problem types. It suggests that practice with
frequency-type questions enhances performance of probability type questions, provided the
racial-stereotype-inducing question is asked first.
In order to assess the significance of differences among the means a 2 (Mode: frequency
vs. probability) x 2 (Basis: Linda/sex vs. Leroy/race) x 4 (Sequence) mixed factorial analysis of
variance was carried out, with Mode and Basis as within-subjects measures and Sequence
(1/2/3/4) as a between-subjects measure.
The grand means for the two levels of each of the within-subjects measures, Mode and
Basis, were as follows: Probability, 7.81; Frequency, 4.93; Sex (Linda), 5.47; and Race (Leroy),
7.27. The analysis of variance revealed that there were no significant main effects but there was a
significant Mode x Basis interaction, F(1, 114) = 6.82, p < .01. From Figure 1 it seems clear that
the interaction is due to low CF scores on the frequency version of the Linda problem. As it was
predicted that the frequency version of the Linda problem would
elicit fewer CFs than the probability version and that Mode would have no effect on the Leroy
problem, a single paired-samples t-test was carried out. This revealed that participants,
irrespective of sequence, displayed smaller CFs on the frequency version of the Linda problem
compared to the probability version, t(117) = 3.08, p < .003.
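The paired-samples t statistic reported above is the mean of the within-participant differences divided by the standard error of those differences (df = n - 1). A self-contained sketch, using invented CF scores rather than the study's raw data:

```python
from math import sqrt

def paired_t(x: list[float], y: list[float]) -> float:
    """Paired-samples t statistic: mean difference over the
    standard error of the differences, with df = n - 1."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean_d / sqrt(var_d / n)

# Invented P-Linda and F-Linda CF scores for four participants.
p_linda = [3.0, 5.0, 4.0, 6.0]
f_linda = [1.0, 4.0, 2.0, 5.0]
print(round(paired_t(p_linda, f_linda), 3))  # 5.196
```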
Scores of Zero
Many participants achieved a score of zero for some answers. A zero score signifies that
the participant has answered the questions in one of two possible ways. Taking the Linda problem
as an example (the same thinking applies to the Leroy problem):
(1) Participant answered 0% or 0/100 for both the constituent (B) and for the conjunction
(B + F).
(2) Participant answered with the same non-zero figure, e.g., 25% or 25/100 for both the
constituent (B) and the conjunction (B + F).
Both answers are logically possible, if improbable, and hence do not necessarily
break the conjunction rule. Closer examination of raw data revealed that some participants who
had zero (or near-zero) scores via the second method had filled in the same (or virtually the same)
number for all categories, e.g., 25%. It was decided to repeat the main analyses without the
scores of these participants (n = 11) to see what impact, if any, they had on the findings. In the
event there were no major differences when the data were analyzed without the "zero scorers."
(See Tables 1, 2, and 3).
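The screening just described can be sketched as a simple "flat responder" check; the tolerance value here is an illustrative assumption, not a parameter taken from the study.

```python
def is_flat_responder(estimates: list[float], tol: float = 0.0) -> bool:
    """True if every estimate is within `tol` of every other, i.e.,
    the respondent gave the same (or virtually the same) number for
    all categories, which mechanically yields a zero CF score."""
    return max(estimates) - min(estimates) <= tol

print(is_flat_responder([25.0, 25.0, 25.0, 25.0]))      # True: identical answers
print(is_flat_responder([20.0, 35.0, 10.0, 5.0]))       # False: varied answers
print(is_flat_responder([25.0, 26.0, 25.0], tol=2.0))   # True: "virtually the same"
```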
Conjunction fallacies as a function of problem type ("zero scorers" removed). Table 2
shows that participants perform better on the frequency version than on the probability version of
the Linda problem, with 26.2% and 43.0% committing conjunction fallacies (CFs), respectively.
Again the rate of CF for the P-Linda version (43.0%) is much less than Tversky and
Kahneman's (1983) 83%. Possible reasons for this drop in the number of CFs are discussed below.
Again Fiedler's (1988) finding of a frequency version eliciting fewer CFs than the probability
version of the Linda problem was replicated, with only 26.2% of participants committing the
fallacy in the frequency version.
The Leroy problem elicited approximately the same number of CFs in the probability
version as did the Linda problem, with the frequency version producing no drop in the number of
fallacies. This is as predicted.
A graph of the marginal means for each question type (Figure 3) reveals that removing the
11 participants had little effect on the means for the frequency and probability versions of the
Linda problem and for the P-Leroy problem, but substantially reduced the scores for the F-Leroy
version, making the frequency and probability versions of the Leroy problem approximately
equivalent in terms of mean scores.
-----------------------
Figures 3 & 4 about here
-----------------------
Figure 4 shows the mean scores for each question type in each sequence. The graph makes
it easy to see that Sequence 4 is the only one to undergo a major change, with the mean F-Leroy
score dropping significantly (a drop of 4.45). This happened mainly because two "zero scorers,"
both in Sequence 4, showed 0.0 conjunction fallacy for three of the four conditions but very large
conjunction fallacies (50 and 90, respectively) for F-Leroy; in other words, removing "zero
scorers" stripped F-Leroy of two conjunction-fallacy outliers. The "stripped" result would
tentatively suggest that practice with probability-type questions enhances performance on
frequency-type questions, provided the sex-stereotype-inducing question is asked first.
Otherwise, the same trends seem apparent with or without "zero scorers," with Sequence
2 (F-Leroy, F-Linda, P-Leroy, P-Linda) appearing to elicit lower scores across all problem types
and F-Linda scores being the lowest in all sequences except for Sequence 2.
In order to determine if there were any significant differences between the means and to
assess any other impact from removing the scores of the 11 participants who appeared not to have
completed the questionnaire appropriately, a 2 (Mode) x 2 (Basis) x 4 (Sequence) mixed factorial
analysis of variance was carried out, with Mode and Basis as within-subjects measures and
Sequence as a between-subjects measure.
The analysis revealed a significant main effect for Mode, F(1, 103) = 5.689, p < .019. It also
revealed two interactions: Mode x Basis, F(1, 103) = 3.966, p < .049, and Mode x Sequence,
F(3, 103) = 2.313, p < .063. (Note that although the Mode x Sequence interaction was not
significant at the 5% level, it was arguably close enough to merit brief discussion.)
The grand means across all four sequences (cases with zero scores removed) were as
follows: Probability, 7.80; Frequency, 3.25; Sex (Linda), 5.70; and Race (Leroy), 5.36. The main
effect for Mode confirms that participants performed fewer CFs on frequency problems. From
Figure 3 it appears that the interaction Mode x Basis is rooted in the Linda problem, with
participants achieving fewer CFs in the frequency version. Several t-tests were carried out. (The
Bonferroni correction was applied to prevent capitalizing on chance, and probabilities were
adjusted accordingly). The further analysis revealed, as predicted, that the frequency version of
the problems only reduced CFs significantly for the Linda problem, t(106) = 2.973, p < .004 and
not for the others (p > .05).
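The Bonferroni adjustment applied to these follow-up tests simply multiplies each raw p value by the number of comparisons (capping at 1.0); a comparison stays significant only if the adjusted value is still below .05. A sketch with illustrative p values (not the study's):

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni-adjust p values: multiply each by the number of
    comparisons, capping each result at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Four hypothetical follow-up comparisons.
raw = [0.004, 0.021, 0.30, 0.45]
adjusted = bonferroni(raw)
print(adjusted[0] < 0.05)   # True: .004 survives correction (adjusted .016)
print(adjusted[1] < 0.05)   # False: .021 does not (adjusted .084)
```

This illustrates the pattern reported below: a raw p of about .02 that looks significant on its own fails to survive a four-test correction.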
Figure 4 shows the scores for each question type by sequence. Sequence 4 has atypically
low scores for the frequency version of both the Linda and the Leroy problem. To establish the
source of the Mode x Sequence interaction, further statistical analysis was carried out. A total of
four independent-samples t-tests were carried out, which revealed only one significant difference
between sequences. Participants in Sequence 4 (P-Leroy, P-Linda, F-Leroy, F-Linda) committed
significantly fewer CFs for both the Linda and the Leroy problems than those in Sequence 2
(F-Leroy, F-Linda, P-Leroy, P-Linda), t(106) = 2.39, p < .021. Once the Bonferroni correction
was applied, however, this difference lost its significance (p > .05). Nevertheless, as the
correction is quite conservative and the shortfall from statistical significance small, the effect is
probably best regarded as suggestive rather than dismissed outright.
Discussion
The initial analysis (with no scores removed) showed no main effects for Mode (frequency
or probability) or Basis (sex or race). There was, however, a Mode x Basis interaction. This
revealed, as predicted, that participants committed fewer CFs on the frequency version of the
Linda problem than on the probability version. This replicated Fiedler's (1988)
findings, where only 22% of participants committed the fallacy when completing the frequency
version of the Linda problem (24.6% in this study).
The present study did not, however, replicate Tversky and Kahneman's (1982) original
finding of 83% of participants committing the fallacy on the probability version of the Linda
problem. Instead only 39.8% made conjunction errors. This could be because the present
participants were not ranking probabilities but giving quantified estimates (essentially thinking in
terms of frequency even for probability questions) (see, for example, Fisk & Pidgeon, 1996;
Hertwig & Chase, 1998), or it could be because some (approximately 1/3) of the participants
were psychology students who may have been aware of Tversky and Kahneman's work (though
we have no particular reason to think this to be the case) and hence avoided the fallacy. Also,
many participants voiced confusion at the profession "bank teller," an American term for which
the English equivalent is "bank cashier" or "bank clerk"--though this did not stop the sample,
overall, from demonstrating a typical "Linda problem" effect, albeit in attenuated form. It would
be interesting to see if a replication of the study using language more appropriate to British
participants would change the study's findings. Despite the drop in number, close
to 40% of participants broke the conjunction rule, which Tversky and Kahneman
(1983) describe as one of the simplest rules in probability. This requires an explanation.
Kahneman and Tversky (1972) drew the conclusion, which has been widely accepted, that:
"In his evaluation of evidence, man is apparently not a conservative Bayesian: he is not a Bayesian
at all" (p. 450). However, this view may well be premature.
Frequentist versions of Bayesian problems do appear to elicit Bayesian reasoning, and a
growing body of research supports this conclusion (see Brase, Cosmides, & Tooby, 1998;
Cosmides & Tooby, 1996; Fiedler, 1988; Gigerenzer, Hell, & Blank, 1988). The present study has
found that a frequency version of the famous Linda problem reduced the number of CFs to only
24.6%--a drop of 58.4 percentage points from Tversky and Kahneman's (1982) original finding.
It was predicted that the frequency version would not elicit the same effect (as for the
Linda problem--i.e., a drop in number of CF) on a problem that generated racial stereotypes. This
was found to be true. As noted above, stereotypes of Blacks are very easily created and
reinforced. Moreover, these attitudes have proven very difficult to eliminate (see Gaertner &
Dovidio, 1986; I. Katz & Hass, 1988; McConahay, 1986), partly due to the operation of another
heuristic, the law of small numbers (Tversky & Kahneman, 1982), where people rely too heavily
on small samples when they make judgements.
Surprisingly (given the history of research on the Linda problem), the frequency version of
the Leroy problem actually led to a (non-significant) increase in the number of CFs (a rise of 6%)
compared with the probability version. Closer examination of the data revealed that several
participants who had committed fallacies on the F-Leroy version had filled in the same (or nearly
the same) number for all problems throughout the questionnaire. The validity of these participants'
data was brought into question, and it was decided to redo the analysis with the data of these 11
participants removed.
The re-analysis showed a main effect for Mode and interactions for Mode x Basis and
Mode x Sequence. Removing the anomalous scorers did lead to the number of CFs dropping in
the frequency version of the Leroy problem. Comparison of Figures 2 and 4 (graphs of mean
scores by sequence) showed that most scores remained similar, but with Sequence 4 (P-Leroy,
P-Linda, F-Leroy, F-Linda) displaying a large change in the score for the frequency version of the
Leroy problem. It appears that many of the participants who had their data removed fell in this
group. Whether this was due to chance assignment to sequence groups or to the experimental
effect of initially facing the P-Leroy task is not certain. What is clear from the present results,
taken together, however, is that the "CF-freeing" effect of the frequency task varies with context.
The Mode x Basis interaction indicated the same effect as the initial analysis, where t-tests
revealed that there was only a significant drop in scores (hence fallacies) for the Linda problem.
As predicted, the Leroy problem, which generated racial stereotypes, did show substantial CFs
but did not show the frequency version reducing the number of CFs significantly. Apparently,
participants are still engaging in the representativeness heuristic even when the problem is posed
in frequentist terms. This is possibly because the bias is too strong and somehow the mechanism
for inductive reasoning is bypassed and the representativeness heuristic engaged. Apparently the
black male stereotype is powerful enough to "override sound reasoning" in both frequentist and
probabilistic modes. (This is not of course to say that using sound base-rate probabilities as a
primary basis for social-cognition decisions is routinely maladaptive.) As Sears (2001) has put it,
though, Black-White differences tend to "trump" all other categorizations. An evolutionary
framework does not suggest that there are only mechanisms that deal with frequency input, and it
may be the case that in some instances the use of a heuristic would provide a more favorable
outcome and hence be the mechanism selected for. It is plausible (albeit uncertain) that,
because our hunter-gatherer ancestors lived and functioned socially within small, discrete groups
(Tooby & Cosmides, 1996), members of outgroups were treated with suspicion and a quick
judgement (possibly applying the law of small numbers) was the most adaptive response. Heuristic
reasoning, including representativeness, has its adaptive functions; in fact, as Epstein, Donovan,
and Denes-Raj (1999) noted: "In the real world, no one would doubt that the absence of such
[heuristic] thinking can be highly maladaptive" (p. 213).
It is important to note that although the Leroy problem involved generating a racial
stereotype--that Black men like to listen to rap music--the stereotype is not a particularly negative
one. It would be interesting to see whether more negative stereotypes, such as drug use or
criminality, would elicit a similar effect.
The conjunction fallacy is subject to shifts due to subtle differences in stimuli, and the
Linda and Leroy problems do necessarily differ in ways other than those linked directly to the
stereotypes concerned. It is also possible that (relatively small) effects were not found in the
Leroy problem (including the reduction of effect in finding CF in probabilities) due to the limited
number of participants used. Further research using a larger sample and different population (i.e.,
non-student) may produce different findings. It is generally considered that students may be more
aware of issues of gender and race discrimination, and their answers may reflect this. A different
sample population might well yield even stronger effects.
Additional general issues
It is worth taking stock briefly of what the present study implies as regards racial and
sexual stereotyping and related matters. The results do indicate that basic logical (conjunction)
errors can follow not only from sex-based but also from race-based stereotypical expectations.
Moreover, although the present experiment confirms that participants may be more error-prone
for probabilistic than for frequency-framed reasoning, the Leroy problem makes it clear that the
latter does not grant "immunity" to such biases.
This may well be because racial stereotypes are less yielding than gender ones, though it
might also, for instance, simply be due to our having hit upon a vignette that has somewhat
different properties from the Linda problem. For example, albeit in a different research area,
Krueger and Rothbart (1988) found that, among sex-based examples, stronger stereotypes may be
less "malleable" in terms of inferences made from them in different contexts. Ideally, one would
follow Cronbach's (1982) dictum to sample Treatments in much the same way that one would
Units (participants). This is not so easily done, however, and a substantial literature attests to the
difficulty of replicating the Linda problem with other vignettes, much less extending it with a
problem, such as the Leroy one, that yields substantial conjunction errors with frequency as well
as probability questions. Any conclusions with regard to gender vs. racial stereotyping must be
made with great caution. Nevertheless, the results are at least consistent with a view that (a)
frequency-based reasoning is well represented in our evolutionary history and (b) minority-group
stereotyping is especially resilient to change (though fortunately not impervious to change--see,
e.g., Aboud & Levy, 1999).
It has long been known that psychological factors can disrupt logical reasoning. In the
1950s, for example, Abelson and Rosenberg's (1958) demonstrations of "psycho-logic" showed
the impact of expectations on syllogistic accuracy (see also, for example, Simon & Holyoak,
2002). Until recently, however, such work has not been particularly directly concerned with
conjunction errors in categorical reasoning.
Further substantial programmatic research would be useful for understanding more
precisely the properties and ecological frequency of logical errors. Perhaps this could be
accomplished, in part, by creating abstract "stereotypes" by exposing participants to "training
trials" in which simple shapes and objects are presented with different frequencies of occurrence
for different manifestations (e.g., various shapes and sizes)--and then asking frequentist and
probabilistic questions that are framed so as to be conceptually parallel to those in the Linda and
Leroy problems. It may well be possible to elicit conjunction errors without the "baggage" of
personality vignettes (see also, Yates & Carlson, 1986).
Although it is tempting to view a conjunction fallacy as a binary distinction--present when,
and only when, a conjunction is seen as more frequent or probable than one of its constituents--the
signed difference apparently forms a fairly smooth distribution that spans the zero point. As
with many variables, its magnitude is associated with differences both within and between
individuals. That is, even differences where the constituent is (correctly) seen as larger than the
conjunction, with no "conjunction error" as such, are arguably part of a scaled "degree" of
conjunction error.
Thus phenomena that may be associated with some gender and racial stereotyping may be
basic not only in their automaticity (as in implicit association tests) but also in the generality of the
situations to which they may apply. In some contexts (cf. Borgida, Locksley, & Brekke, 1981;
Locksley & Stangor, 1984) a potential social problem with conclusions based on stereotypes
linked to sex or race is not that they are illogical--indeed they may be "overly" logical in a
Bayesian sense (e.g., based on perceived prior odds)--but that they may be used to justify wholly
unwarranted and unfair extrapolation and discrimination. By contrast, in the present context it
would seem that people may need to be more mindful of bias (such as possible conjunction errors
due to exaggerated expectations) reflecting logical violations in a broad spectrum of examples,
including basic categories (e.g., shapes) as well as stereotypes such as those related to women and
minority group members.
Conclusions
Whatever the cause, the present results do confirm that conjunction fallacies found with
probabilistic judgements may well be reduced when equivalent frequency judgements are made in
some circumstances (in the Linda problem)--but that such reduction does not necessarily take
place (the Leroy problem).
It seems as though frequentist problems are able to access a cognitive mechanism for
inductive reasoning in certain circumstances only. Further research into different types of problem
that cause participants to engage in a variety of stereotypes is needed to understand the full effect
of the frequency hypothesis.
It cannot be ignored that human beings are constantly being exposed to actual frequencies
of real events and that we, like many non-human animals, appear to have unconscious mechanisms
to keep track of these frequencies (Staddon, 1988). The evidence suggests we can use information
that is in a frequency format to apply statistical rules correctly when making judgements under
uncertainty. However, the notable conjunction fallacies of the probabilistic version of the Linda
Problem seem to be readily manifest in the racial arena of the Leroy Problem as well--so readily,
in fact, that they apparently can override any probabilist/frequentist disparity. As Tooby and
Cosmides (1996) pointed out, human beings in certain situations may be "good intuitive
statisticians after all!" (p. 1)--but, one may need to add, only under finely balanced circumstances.
References
Abelson, R. P., & Rosenberg, M. J. (1958). Symbolic psycho-logic: A model of attitudinal
cognition. Behavioral Science, 3, 1-13.
Aboud, F. E., & Levy, S. R. (1999). Reducing racial prejudice, discrimination, and
stereotyping: Translating research into programs. Journal of Social Issues, 55, 621-625.
Attig, M., & Hasher, L. (1980). The processing of frequency occurrence information by
adults. Journal of Gerontology, 35, 66-69.
Borgida, E., Locksley, A., & Brekke, N. (1981). Social stereotypes and social judgment.
In N. Cantor and J. F. Kihlstrom (Eds.). Personality, cognition, and social interaction (pp.
153-169). Hillsdale, NJ: Erlbaum.
Brase, G. L., Cosmides, L., & Tooby, J. (1998). Individuation, counting, and statistical
inference: The role of frequency and whole-object representations in judgement under uncertainty.
Journal of Experimental Psychology: General, 127, 3-21.
Cosmides, L., & Tooby, J. (1994). Origins of domain specificity: The evolution of
functional organisation. In L. A. Hirschfeld & S. A. Gelman (Eds.), Mapping the mind: Domain
specificity in cognition and culture (pp. 85-116). Cambridge, England: Cambridge University
Press.
Cosmides, L., & Tooby, J. (1996). Are humans good intuitive statisticians after all?
Rethinking some conclusions from the literature on judgement under uncertainty. Cognition, 58,
1-73.
Cronbach, L. J. (1982). Designing evaluations of educational and social programs. San
Francisco: Jossey-Bass.
Darwin, C. (1859). On the origin of species by means of natural selection, or, the
preservation of favoured races in the struggle for life. London: Murray.
Dawkins, R. (1976). The selfish gene. Oxford, England: Oxford University Press.
Dawkins, R. (1982). The extended phenotype: The gene as the unit of selection. Oxford,
England: W. H. Freeman and Company Ltd.
Dawkins, R. (1986). The blind watchmaker. Harlow, England: Longman.
Dovidio, J. F., Evans, N., & Tyler, R. B. (1986). Racial stereotypes: The contents of their
cognitive representations. Journal of Experimental Social Psychology, 22, 22-37.
Epstein, S., Denes-Raj, V., & Pacini, R. (1995). The Linda problem revisited from the
perspective of cognitive-experiential self-theory. Personality and Social Psychology Bulletin, 21,
1124-1138.
Epstein, S., Donovan, S., & Denes-Raj, V. (1999). The missing link in the paradox of the
Linda conjunction problem: Beyond knowing and thinking of the conjunction rule, the intrinsic
appeal of heuristic processing. Personality and Social Psychology Bulletin, 25, 204-214.
Evans, J. St. B. T., Handley, S. J., Perham, N., Over, D. E., & Thompson, V. A. (2000).
Frequency versus probability formats in statistical word problems. Cognition, 77, 197-213.
Fiedler, K. (1988). The dependence of the conjunction fallacy on subtle linguistic factors.
Psychological Research, 50, 123-129.
Fiedler, K., Brinkmann, B., Betsch, T., & Wild, B. (2000). A sampling approach to biases
in conditional probability judgments: Beyond base rate neglect and statistical format. Journal of
Experimental Psychology: General, 129, 399-418.
Fisk, J. E., & Pidgeon, N. (1996). Component probabilities and the conjunction fallacy:
Resolving signed summation and the low component model in a contingent approach. Acta
Psychologica, 94, 1-20.
Gaertner, S. L., & Dovidio, J. F. (1986). The aversive form of racism. In J. F. Dovidio & S.
L. Gaertner (Eds.), Prejudice, discrimination, and racism (pp. 61-89). San Diego, CA: Academic
Press.
Gaertner, S. L., & McLaughlin, J. P. (1983). Racial stereotypes: Associations and
ascriptions of positive and negative characteristics. Social Psychology Quarterly, 46, 23-30.
Gigerenzer, G. (1991). How to make cognitive illusions disappear: Beyond "heuristics and
biases." In W. Stroebe & M. Hewstone (Eds.), European Review of Social Psychology (Vol. 2,
pp. 83-115). Chichester, England: Wiley.
Gigerenzer, G., Hell, W., & Blank, H. (1988). Presentation and content: The use of base
rates as a continuous variable. Journal of Experimental Psychology: Human Perception and
Performance, 14, 513-525.
Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without
instruction: Frequency formats. Psychological Review, 102, 684-704.
Hasher, L., & Zacks, R. T. (1979). Automatic and effortful processes in memory. Journal
of Experimental Psychology: General, 108, 356-388.
Henderson-King, E. I., & Nisbett, R. E. (1996). Anti-Black prejudice as a function of
exposure to the negative behavior of a single Black person. Journal of Personality and Social
Psychology 71, 654-664.
Hertwig, R., & Chase, V. M. (1998). Many reasons or just one: How response mode
affects reasoning in the conjunction problem. Thinking and Reasoning, 4, 319-352.
Hertwig, R., & Gigerenzer, G. (1999). The "conjunction fallacy" revisited: How intelligent
inferences look like reasoning errors. Journal of Behavioral Decision Making, 12, 275-305.
Hoffrage, U., Gigerenzer, G., Krauss, S., & Martignon, L. (2002). Representation
facilitates reasoning: What natural frequencies are and what they are not. Cognition, 84, 343-352.
Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of
representativeness. Cognitive Psychology, 3, 430-454.
Katz, D., & Braly, K. (1933). Racial stereotypes of one hundred college students. Journal
of Abnormal and Social Psychology, 28, 280-290.
Katz, I., & Hass, R. G. (1988). Racial ambivalence and American value conflict:
Correlational and priming studies of dual cognitive structures. Journal of Personality and Social
Psychology, 55, 893-905.
Krueger, J., & Rothbart, M. (1988). Use of categorical and individuating information in
making inferences about personality. Journal of Personality and Social Psychology, 55, 187-195.
Locksley, A., & Stangor, C. (1984). Why versus how often: Causal reasoning and the
incidence of judgmental bias. Journal of Experimental Social Psychology, 20, 470-483.
Marr, D. (1982). Vision: A computational investigation into the human representation and
processing of visual information. San Francisco: Freeman.
McConahay, J. B. (1986). Modern racism, ambivalence, and the modern racism scale. In J.
F. Dovidio and S. L. Gaertner (Eds.), Prejudice, discrimination, and racism (pp. 91-125). Orlando,
FL: Academic Press.
Real, L. A. (1991). Animal choice behavior and the evolution of cognitive architecture.
Science, 253, 980-986.
Sears, D. (2001, April). Continuities and contrasts in American racial politics. Paper
presented at The Yin and Yang of social cognition: Perspectives on the social psychology of
thought systems; A Festschrift honoring William J. McGuire. New Haven, CT.
Simon, D., & Holyoak, K. J. (2002). Structural dynamics of cognition: From consistency
theories to constraint satisfaction. Personality and Social Psychology Review, 6, 283-294.
Staddon, J. E. R. (1988). Learning as inference. In R. C. Bolles & M. D. Beecher (Eds.),
Evolution and learning (pp. 59-77). Hillsdale, NJ: Erlbaum.
Tversky, A., & Kahneman, D. (1974). Judgement under uncertainty: Heuristics and biases.
Science, 185, 1124-1131.
Tversky, A., & Kahneman, D. (1982). Judgments of and by representativeness. In D.
Kahneman, P. Slovic, & A. Tversky (Eds.), Judgement under uncertainty: Heuristics and biases
(pp. 84-98). Cambridge, UK: Cambridge University Press.
Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The
conjunction fallacy in probability judgment. Psychological Review, 90, 293-315.
Yates, J. F., & Carlson, B. W. (1986). Conjunction errors: Evidence for multiple judgment
procedures including "signed summation." Organizational Behavior and Human Decision
Processes, 37, 230-253.
Zacks, R. T., Hasher, L., & Sanft, H. (1982). Automatic encoding of event frequency:
Further findings. Journal of Experimental Psychology: Learning, Memory, and Cognition, 8,
106-116.
Author Note
Correspondence concerning this article should be addressed to Herbert H. Blumberg,
Department of Psychology, Goldsmiths College, University of London, London SE14 6NW,
England; e-mail: [email protected].
We would like to thank Jules Davidoff and others for helpful comments on a draft of this
article.
Table 1
Percentage of Conjunction Fallacies (CF) Committed
------------------------------------------------------------------------------------
Parameter            P-Linda    F-Linda    P-Leroy    F-Leroy    Lottery
------------------------------------------------------------------------------------
All Respondents (N = 118)
  % of CF               40.7       24.6       36.4       40.7       30.5
  N of CF                 48         29         43         48         36
All Respondents Except "Zero Scorers" (N = 107)
  % of CF               39.0       26.2       38.3       38.3       32.7
  N of CF                 46         28         41         41         35
------------------------------------------------------------------------------------
Table 2
"Turnover Tables" for Conjunction Fallacies: Linda Problem.
_________________________________________________________
                            F-Linda
              _____________________________________
                  -           0           +  |   Sum
P-Linda
   -             24           5           0  |    29
   0        11 (10)     26 (18)           4  |    41 (32)
   +             11     12 (11)     25 (24)  |    48 (46)
  Sum       46 (45)     43 (34)     29 (28)  |   118 (107)
_________________________________________________________
Note. - indicates that the conjunction is smaller than the constituent (no conjunction fallacy), 0 indicates that
the conjunction equals the constituent, and + indicates that the conjunction is larger than the constituent
(conjunction fallacy). The table shows the number of respondents in each cell (where applicable, figures in
parentheses show the corresponding numbers with "zero scorers" deleted).
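A turnover table of this kind can be tallied by reducing each respondent's probability-version and frequency-version CF scores to signs (-, 0, +) and counting the pairs. A minimal sketch with invented scores (not the study's data):

```python
def sign(cf: float) -> str:
    """Reduce a CF score to '-', '0', or '+'."""
    return "-" if cf < 0 else ("+" if cf > 0 else "0")

def turnover(p_scores: list[float], f_scores: list[float]) -> dict:
    """Count respondents in each (probability-sign, frequency-sign) cell."""
    table = {(r, c): 0 for r in "-0+" for c in "-0+"}
    for p, f in zip(p_scores, f_scores):
        table[(sign(p), sign(f))] += 1
    return table

# Invented CF scores for four respondents.
p = [10.0, 0.0, -5.0, 15.0]
f = [0.0, 0.0, -5.0, -2.0]
t = turnover(p, f)
print(t[("+", "0")])   # 1: one respondent moved from fallacy to equality
print(t[("+", "-")])   # 1: one respondent's fallacy disappeared
```

Summing the "+" column of such a table gives the number of frequency-version fallacies, matching the Sum row of the tables above.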
Table 3
"Turnover Tables" for Conjunction Fallacies: Leroy Problem.
_________________________________________________________
                            F-Leroy
              _____________________________________
                  -           0           +  |   Sum
P-Leroy
   -             13           4          11  |    28
   0             11     18 (15)     18 (12)  |    47 (38)
   +              7     17 (16)     19 (18)  |    43 (41)
  Sum            31     39 (35)     48 (41)  |   118 (107)
_________________________________________________________
Note. - indicates that the conjunction is smaller than the constituent (no conjunction fallacy), 0 indicates that
the conjunction equals the constituent, and + indicates that the conjunction is larger than the constituent
(conjunction fallacy). The table shows the number of respondents in each cell (where applicable, figures in
parentheses show the corresponding numbers with "zero scorers" deleted).
Figure Captions
Figure 1. Marginal means (for scores) across sequence.
Figure 2. Mean scores by sequence.
Note. Sequence 1 is F-Linda (FLIN), F-Leroy (FLER), P-Linda (PLIN), P-Leroy (PLER).
Sequence 2 is FLER, FLIN, PLER, PLIN. Sequence 3 is PLIN, PLER, FLIN, FLER. Sequence 4
is PLER, PLIN, FLER, FLIN.
Figure 3. Marginal means (for scores) across sequence (with "zero scorers" removed).
Figure 4. Mean scores by sequence (with "zero scorers" removed).
(Note. Sequence 1 is FLIN, FLER, PLIN, PLER. Sequence 2 is FLER, FLIN, PLER, PLIN.
Sequence 3 is PLIN, PLER, FLIN, FLER. Sequence 4 is PLER, PLIN, FLER, FLIN.)