
Journal of Experimental Psychology:
Learning, Memory, and Cognition
2003, Vol. 29, No. 4, 710–727
Copyright 2003 by the American Psychological Association, Inc.
0278-7393/03/$12.00 DOI: 10.1037/0278-7393.29.4.710
Making Causal Judgments From the Proportion of Confirming Instances:
The pCI Rule
Peter A. White
Cardiff University
It is proposed that causal judgments about contingency information are derived from the proportion of confirmatory instances (pCI), that is, the proportion of instances that are evaluated as confirmatory for the causal candidate. In 6 experiments, pCI values were manipulated independently of objective contingencies assessed by the ΔP rule. Significant effects of the pCI manipulations were found in all cases, but causal judgments did not vary significantly with objective contingencies when pCI was held constant. The experiments used a variety of stimulus presentation procedures and different dependent measures. The power PC theory, a weighted version of the ΔP rule, the Rescorla–Wagner associative learning model (R. A. Rescorla & A. R. Wagner, 1972), and the ΔD rule, which is the frequency-based version of the pCI rule, were unable to account for the significant effects of the pCI manipulations. These results are consistent with a general explanatory approach to causal judgment involving the evaluation of evidence and updating of beliefs with regard to causal hypotheses.
Suppose that Joe Snooks occasionally suffers from short-lived rashes of blue spots on his skin. Joe notices that these spots are particularly likely to appear after he has drunk elderflower tea and rarely appear at other times. Eventually he concludes that elderflower tea, or something in it, is causing the spots to appear. How does he arrive at that causal judgment? The question seems innocuous, but it is an example of one of the most important research questions in causal cognition, concerning as it does how people make sense out of the confusion of multifarious events by detecting regularities and inducing causal relations from them.

In essence, Joe makes a judgment about elderflower tea based on a set of instances in which he either does or does not drink elderflower tea, and the rashes of blue spots either occur or do not. Thus, there are four possible combinations of the presence and absence of the candidate cause and the effect, and causal judgment is in some way derived from the numbers of instances of each kind. This kind of information can be represented as a 2 × 2 contingency table, and Table 1 shows the letters that are conventionally used to identify the cells of the table. The degree of contingency between the candidate cause and the effect is described by the ΔP rule, shown in Equation 1 (Jenkins & Ward, 1965; McKenzie, 1994; Ward & Jenkins, 1965):

ΔP = p(E/C) − p(E/−C).   (1)

In this equation, p(E/C) equals the proportion of instances on which the effect occurs when the cause is present, in terms of Table 1 a/(a + b), and p(E/−C) equals the proportion of instances on which the effect occurs when the cause is absent, in terms of Table 1 c/(c + d). The rule produces values in the range −1 to +1, where +1 equals perfect positive contingency, −1 equals perfect negative contingency, and 0 equals zero contingency. The ΔP rule is a normative measure of contingency, but it is not a normative measure of causality. For example, there may be a high positive contingency between two things, not because one of them causes the other but because both of them are effects of a third factor. Two other relevant variables should also be mentioned at this point: the proportion of instances on which the effect occurs, p(E), equals (a + c)/(a + b + c + d), and the proportion of instances on which the cause is present, p(C), equals (a + b)/(a + b + c + d).¹

A number of theoretical approaches to causal judgment from contingency information can be distinguished. Under one approach, causal judgments are inferences made by applying an inductive rule to contingency information. An example of an inductive rule is the equation for estimating generative causal power in the power PC theory (Cheng, 1997), given in Equation 5 below. Under another approach, causal judgments are derived from associative bonds between cues (candidate causes) and outcomes (effects) that are acquired by learning from experience with contingencies (Shanks, 1995). Under the present approach, contingency information is initially transformed into evidence; that is, instances are encoded in terms of their confirmatory or disconfirmatory value for the hypothesis that the causal candidate causes the effect, and causal judgment is derived from the weight of evidence. Specifically, causal judgment tends to increase as the proportion of confirmatory to disconfirmatory evidence increases. This general rule is called pCI, representing the proportion of confirmatory instances (White, 2000b).

I am grateful to John Pearce and David Shanks for helpful comments on earlier versions of this article and to Serena Harris for acting as experimenter in Experiments 2 and 6.

Correspondence concerning this article should be addressed to Peter A. White, School of Psychology, Cardiff University, P.O. Box 901, Cardiff CF10 3YG, Wales, United Kingdom. E-mail: [email protected]

¹ Throughout the article, uppercase C stands for cause (except when it appears in pCI, proportion of confirmatory instances) and lowercase c refers to Cell c of the contingency table.
Table 1
Conventional Identification of Cells in the 2 × 2 Contingency Table

                      Effect occurs
Candidate cause      Yes       No
Present               a         b
Absent                c         d
The four cells of the 2 × 2 contingency table have normative evidential values. Cell a is confirmatory because instances in which the candidate is present and the effect occurs tend to increase the likelihood that the candidate caused the effect. Such instances tend to increase ΔP unless it is already +1. Cell d instances also tend to increase ΔP and are also normatively confirmatory. Cells b and c both tend to decrease ΔP and are normatively disconfirmatory. With these normative evidential values, the principle of judging from the weight of evidence yields a simple rule of judgment, shown in Equation 2:

pCI = (a + d − b − c)/(a + b + c + d).   (2)
Like ΔP, this equation generates values in the range −1 to +1.² Like models based on ΔP, including the probabilistic contrast model (Cheng & Novick, 1990, 1992) and the power PC theory (Cheng, 1997), the pCI rule supports ordinal predictions: Causal judgment is assumed to be a monotonically increasing function of pCI. Equation 2 is a specific version of the general pCI rule that applies to the case of single causal candidates. Versions of the rule have been proposed and tested for judging complex causal interpretations (White, 2000b), interactions between two causal candidates (White, 2002b), and Kelleyan causal attributions (White, 2002a). For the remainder of this article, the term pCI will be used to refer specifically to the form of the rule in Equation 2. The central claim to be tested in this article is that the pCI rule predicts phenomena of causal judgment with a single causal candidate that are not predicted by any other extant rule or model. The article reports several experiments designed to test the rule and to compare the rule with the predictions of other rules and models. It is not proposed that the rule explains all the phenomena of causal judgment. However, the predictive success of the rule implies that it should form part of a comprehensive account of causal judgment, and the form that such an account might take is discussed in the light of the experimental results.
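As an illustration only (not part of the original article), the two rules defined in Equations 1 and 2 can be sketched in a few lines of Python; the cell names follow Table 1, and the example frequencies are arbitrary:

```python
def delta_p(a: int, b: int, c: int, d: int) -> float:
    """Equation 1: Delta-P = p(E/C) - p(E/-C) = a/(a + b) - c/(c + d)."""
    return a / (a + b) - c / (c + d)

def pci(a: int, b: int, c: int, d: int) -> float:
    """Equation 2: pCI = (a + d - b - c)/(a + b + c + d)."""
    return (a + d - b - c) / (a + b + c + d)

# Arbitrary example table: a = 6, b = 2, c = 2, d = 6 (16 instances).
print(delta_p(6, 2, 2, 6))  # 0.5
print(pci(6, 2, 2, 6))      # 0.5
```

Both functions return values in the range −1 to +1; in this example they coincide because the cause is present on exactly half the instances.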
The pCI rule is a proportional version of the ΔD, or sum of diagonals, rule that has been much investigated in contingency judgment research (Allan & Jenkins, 1983; Arkes & Harkness, 1983; Jenkins & Ward, 1965; Kao & Wasserman, 1993; Shaklee & Hall, 1983; Shaklee & Mims, 1982; Shaklee & Tucker, 1980; Shaklee & Wasserman, 1986; Shanks, 1987; Ward & Jenkins, 1965; Wasserman, Chatlosh, & Neunaber, 1983; Wasserman, Dorner, & Kao, 1990). This rule is shown in Equation 3:

ΔD = (a + d) − (b + c).   (3)

Ordinal predictions generated by the pCI rule and the ΔD rule are isomorphic when the number of instances per judgmental problem is held constant. However, because the pCI rule is a proportion-based rule and the ΔD rule is a frequency-based rule, the two rules make different predictions when the number of instances is manipulated for nonzero contingencies. One of the experiments reported below is designed to distinguish the two rules.
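A small sketch (with hypothetical frequencies, not taken from the experiments) makes this divergence concrete: doubling every cell frequency doubles ΔD but leaves pCI unchanged.

```python
def delta_d(a, b, c, d):
    """Equation 3: Delta-D = (a + d) - (b + c), a frequency-based rule."""
    return (a + d) - (b + c)

def pci(a, b, c, d):
    """Equation 2: pCI = (a + d - b - c)/(a + b + c + d), a proportion-based rule."""
    return (a + d - b - c) / (a + b + c + d)

small = (6, 2, 2, 6)      # 16 instances, a nonzero contingency
large = (12, 4, 4, 12)    # same proportions, 32 instances

print(delta_d(*small), delta_d(*large))  # 8 16: Delta-D doubles
print(pci(*small), pci(*large))          # 0.5 0.5: pCI is unchanged
```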
The experiments are designed to manipulate pCI values independently of ΔP. The formal relationship between pCI and ΔP is clarified (and illustrated with two figures) in Appendix A. As the figures show, when p(C) = .5, pCI and ΔP values are identical, and the two rules generate identical ordinal predictions for causal judgment. This is an important property of the pCI rule. Many studies have found that causal judgments are highly correlated with ΔP and that human judges are acutely sensitive to small objective differences in ΔP (e.g., Anderson & Sheu, 1995; Baker, Berbrier, & Vallée-Tourangeau, 1989; Shanks, 1985, 1987, 1995; Wasserman et al., 1983; Wasserman, Elek, Chatlosh, & Baker, 1993; White, 2000a; see also Allan, 1993; Cheng, 1997). However, in all of these studies p(C) = .5. This means that the ordinal predictions of the ΔP rule and the pCI rule for these studies are identical, and the results of the studies do not show which is the better predictor of causal judgments.
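The equivalence at p(C) = .5 can be checked exhaustively for small tables. This sketch (illustrative, not from the article) fixes a + b = c + d so that the cause is present on exactly half the instances:

```python
from itertools import product

def delta_p(a, b, c, d):
    return a / (a + b) - c / (c + d)

def pci(a, b, c, d):
    return (a + d - b - c) / (a + b + c + d)

# With a + b = c + d = 6, p(C) = .5 by construction.
for a, c in product(range(7), repeat=2):
    b, d = 6 - a, 6 - c
    assert abs(delta_p(a, b, c, d) - pci(a, b, c, d)) < 1e-12
print("pCI and Delta-P agree on every table with p(C) = .5")
```

Algebraically, with a + b = c + d = n, both rules reduce to (a − c)/n, which is why the published correlations with ΔP cannot discriminate between the two rules.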
In a small number of studies p(C) ≠ .5, and then it is possible to ascertain whether judgments vary with pCI when ΔP is held constant. Ideally, all other variables of relevance should be held constant in comparisons of this sort, in particular the two components of the ΔP rule, p(E/C) and p(E/−C). In addition, the proportion a/(a + d) should be held constant because many studies have found that Cell a information carries greater weight in causal judgment than Cell d information (Anderson & Sheu, 1995; Kao & Wasserman, 1993; Levin, Wasserman, & Kao, 1993; Mandel & Lehman, 1998; Wasserman et al., 1990).
Levin, Wasserman, and Kao (1993) conducted an experiment in which participants made causal judgments about each of a series of individual problems. In that experiment, it was possible to find pairs of problems that were matched for values of ΔP, p(E/C), p(E/−C), and a/(a + d). Comparisons could then be made between pairs that had different pCI values. There were four of these comparisons in the problems used by Levin et al. (1993), and in all four cases the higher mean observed judgment was found for the pair with the higher value of pCI. The relevant information is presented in Table 2.
In Wasserman, Kao, Van Hamme, Katagiri, and Young (1996, Experiment 1), some participants gave judgments after each trial and others only after the final trial. In what follows, means for these two groups have been averaged together. Among the 21 problems used by Wasserman et al., there are six pairs that have the same mean values of p(E/C) and of p(E/−C). The relevant information is presented in Table 3. This shows that the member of each pair with the higher value of pCI received the higher mean judgment in all six cases. Each comparison therefore supports the
pCI rule. In 13 of the problems ΔP = 0. In four of these (8, 9, 10, 11) pCI = −.08, in five (1, 2, 3, 4, 5) pCI = .00, and in four (6, 7, 12, 13) pCI = .08. Overall means for judgments in each of these three groups were −.45, .50, and 1.12, respectively, and this increasing trend also supports the hypothesis that judgments were made in accordance with the pCI rule.

² The original version of this rule was (a + d)/(a + b + c + d) (e.g., White, 2002c). However, for several reasons it is better to include all four cells in the numerator, so the version in Equation 2, first propounded by White (2003a), is preferred.

Table 2
Paired Comparisons Testing pCI as a Predictor of Causal Judgments

Problem pair      ΔP      pCI    Sum of means
2 + 9            .267     .36       .69
12 + 14          .267     .21       .64
3 + 5           −.267    −.36      −.83
8 + 15          −.267    −.21      −.67
17 + 24          .368     .58       .85
27 + 29          .368     .27       .79
18 + 20         −.368    −.58     −1.09
23 + 30         −.368    −.27      −.84

Note. Data adapted from Levin et al. (1993, Experiment 1). pCI = proportion of confirmatory instances.
Kao and Wasserman (1993, Experiment 1) used a large number of zero contingency problems. There were 16 possible paired comparisons between problems that had different values of pCI but the same value of p(E/C) and p(E/−C). In 13 out of 16 comparisons, the higher mean was found for the problem with the higher value of pCI. In this case, however, the proportion a/(a + d) was not constant across the comparisons, and the three exceptions all had the property that the hypothetically confirming information mostly consisted of Cell d instances. Therefore, it is possible that judgments were lower for the higher value of pCI in the three exceptions because the pCI value was inflated by the presence of a large number of Cell d instances that had relatively little influence on causal judgment.
Kao and Wasserman (1993, Experiment 2) used eight nonzero contingency problems. There were four possible comparisons between problems that had different values of pCI but the same value of p(E/C) and p(E/−C), with three groups of participants making 12 comparisons in all. In 9 of these comparisons, the higher mean was found for the problem with the higher value of pCI. Kao and Wasserman also used 13 zero contingency problems. Here, too, there were four possible comparisons among problems that had different values of pCI but the same value of p(E/C) and p(E/−C), with three groups of participants making 12 comparisons in all. In 9 of these comparisons, the higher mean was found for the problem with the higher value of pCI. As before, the 6 exceptions (out of 24 comparisons) all involved problems in which the high value of pCI was created by a large number of Cell d instances, so the same explanation could apply.
pCI and ΔP can also be distinguished in a study by Allan and Jenkins (1983). In that study, p(C) and p(E), the proportion of instances in which the effect occurred, were manipulated orthogonally for problems with identical ΔP. In one pair of problems, ΔP = +.2 and p(E) = .8 in both. For one problem, p(C) = .5, and for another, p(C) = .7. In this case, for the former problem pCI = +.20, and for the latter problem pCI = +.44; pCI therefore predicted higher causal judgments for the latter problem than for the former, and that is what Allan and Jenkins (1983) found. A second pair of problems ruled out the possibility that p(C) could be responsible for this difference. In this pair, ΔP = +.2 and p(E) = .2 in both. As before, for one problem p(C) = .5, and for another p(C) = .7. In this case, for the former problem pCI = +.20 and for the latter pCI = −.04. This time, therefore, pCI predicted lower causal judgments for the latter problem than for the former. By contrast, if p(C) accounted for the first set of results, then the prediction would have been for higher causal judgments for the latter problem. In fact, the results supported the pCI prediction.
Several published studies therefore support the prediction that judgments tend to vary with pCI values when ΔP is held constant. However, none of these experiments was designed as a test of the pCI rule. Furthermore, it is important to investigate whether the tendency of judgments to conform to pCI values depends on the kind of procedure used, and it is also important to investigate whether other models of causal judgment can account for the results. The present experiments were designed to accomplish these objectives. The basic strategy of the experiments was to manipulate pCI while holding ΔP constant and, in some cases, vice versa. This manipulation is problematic because it tends to be confounded with variations in values of other variables. However, other variables can be held constant by constructing pairs of problems and comparing across the pairs.
Table 4 gives the relevant information about the design used in Experiments 1, 2, and 3. There are in fact two distinct designs. The eight problems fall into four pairs, labeled .17, .33, .50, and .25. The first three pairs manipulate pCI values and are called the pCI design. In the problem designations, ".17", ".33", and so forth refer to values of pCI, and H and L refer to values of the other independent variable. This is a manipulation of both p(E/C) and p(E/−C) and is called the occurrence rate variable. This occurrence rate variable takes a high value in the H problems (1.0/0.5) and a low value in the L problems (0.5/0.0); p(E/C) and p(E/−C) have to be manipulated together because otherwise ΔP would vary. The manipulation has no intrinsic significance: It is simply the means by which other relevant variables can be held constant.
Table 3
Paired Comparisons Testing pCI as a Predictor of Causal Judgments

Problem     ΔP    p(E/C)   p(E/−C)    pCI    M judgment
7          .00     .33       .33      .08       0.66
9          .00     .33       .33     −.08      −1.00
10         .00     .67       .67     −.08       0.36
12         .00     .67       .67     +.08       2.57
14        −.25     .25       .50     −.20      −1.87
19        −.25     .25       .50     −.33      −4.44
17        −.25     .50       .75     −.20      −0.09
20        −.25     .50       .75     −.33      −1.53
16         .25     .50       .25      .20       1.73
21         .25     .50       .25      .33       2.74
15         .25     .75       .50      .20       2.39
18         .25     .75       .50      .33       5.62

Note. Data adapted from Wasserman et al. (1996, Experiment 1). p(E/C) = proportion of instances on which the effect occurs when the cause is present; p(E/−C) = proportion of instances on which the effect occurs when the cause is absent.
Table 4
Stimulus Materials for Experiments 1, 2, and 3

                   Cell frequencies
Problem           a    b    c    d    pCI    ΔP   p(E)  p(C)  p(E/C)  p(E/−C)  a/(a + d)    p    ΔPw
.17H              4    0   10   10    .17   .50   .58   .17    1.00     .50       .29     1.00   .70
.17L             10   10    0    4    .17   .50   .42   .83     .50     .00       .71      .50   .50
.33H              8    0    8    8    .33   .50   .67   .33    1.00     .50       .50     1.00   .70
.33L              8    8    0    8    .33   .50   .33   .67     .50     .00       .50      .50   .50
.50H             12    0    6    6    .50   .50   .75   .50    1.00     .50       .67     1.00   .70
.50L              6    6    0   12    .50   .50   .25   .50     .50     .00       .33      .50   .50
.25H             12    4    4    4    .33   .25   .67   .67     .75     .50       .75      .50   .45
.25L              4    4    4   12    .33   .25   .33   .33     .50     .25       .25      .33   .35
M of .17H & .17L                      .17   .50   .50   .50     .75     .25       .50      .75   .60
M of .33H & .33L                      .33   .50   .50   .50     .75     .25       .50      .75   .60
M of .50H & .50L                      .50   .50   .50   .50     .75     .25       .50      .75   .60
M of .25H & .25L                      .33   .25   .50   .50     .62     .37       .50      .42   .40

Note. In terms of the cell identifications in Table 1: pCI = proportion of confirmatory instances; p(E) = (a + c)/(a + b + c + d); p(C) = (a + b)/(a + b + c + d); p = causal power; ΔPw = weighted ΔP; H = high; L = low; p(E/C) = proportion of instances on which the effect occurs when the cause is present; p(E/−C) = proportion of instances on which the effect occurs when the cause is absent.
The bottom four rows of the table present mean values of each variable for each pair. These rows show that, for the three pairs in the pCI manipulation, mean values of all variables, namely ΔP, p(E), p(C), p(E/C), p(E/−C), and a/(a + d), are held constant across the pairs. The proportion a/(a + d) must be held constant because, as stated above, Cell a instances tend to influence causal judgment more than Cell d instances do, and this differential influence must be controlled. Thus, with this design, the main effect provides a test of the pCI rule that is unconfounded with all of these variables: If there is a significant main effect of pCI, it cannot be attributed to any of these other variables.³ Table 4 also presents values of two other variables, designated "p" and "ΔPw". These variables will be considered in a later section of the article in which tests of other models of causal judgment are discussed.
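For readers who want to reproduce the "p" column of Table 4, Cheng's (1997) generative causal power is ΔP/(1 − p(E/−C)). The following sketch (an illustration using cell frequencies from Table 4; the weighted ΔP column is left aside here because its weighting is discussed later) computes it from the four cells:

```python
def causal_power(a, b, c, d):
    """Cheng's (1997) generative power: Delta-P / (1 - p(E/-C))."""
    p_e_c = a / (a + b)          # p(E/C)
    p_e_not_c = c / (c + d)      # p(E/-C)
    return (p_e_c - p_e_not_c) / (1 - p_e_not_c)

# Cell frequencies (a, b, c, d) for two of the Table 4 problems:
print(causal_power(4, 0, 10, 10))   # problem .17H -> 1.0
print(causal_power(10, 10, 0, 4))   # problem .17L -> 0.5
```

Note that the H and L members of each pCI pair always yield powers of 1.0 and .50, which is why the pair means in the "p" column are constant at .75 across the pCI design.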
The .33 pair and the .25 pair form the second design in the experiment. They manipulate ΔP values and are called the ΔP manipulation. There are two values of ΔP, .50 and .25, while pCI is held constant at .33. The bottom section of Table 4 shows that this manipulation holds p(E), p(C), and a/(a + d) constant. It is impossible to hold p(E/C) and p(E/−C) constant because these are the constituents of ΔP, so this design holds constant as many variables as possible in a ΔP manipulation.
Under the hypothesis that causal judgments are made in accordance with the pCI rule, it is predicted that there will be a significant main effect of pCI and that the main effect of ΔP will not be significant. The first three experiments test these predictions using the design in Table 4 with three different presentation procedures, to test the further prediction that similar judgmental tendencies will occur regardless of the procedure used.
Experiment 1 follows the procedure used by White (2000b). This is the instance list procedure, in which instances are presented individually but on a single sheet of paper so that participants can scan them freely. Experiment 2 uses the serial presentation procedure that has been used in many studies (e.g., Allan & Jenkins, 1983, Experiment 3; Anderson & Sheu, 1995, Experiment 1; Catena, Maldonado, & Cándido, 1998; Dennis & Ahn, 2001; Lober & Shanks, 2000, Experiments 1, 2, and 3; Ward & Jenkins, 1965; Wasserman et al., 1996; White, 2000a). In this procedure, instances are presented sequentially, each one being withdrawn before the next is presented. Experiment 3 uses the summary presentation procedure that has also been used in many studies (e.g., Kao & Wasserman, 1993; Levin et al., 1993; Lober
³ Values of these other variables are kept constant by averaging over two problems. This is necessary for an unconfounded test of pCI, but it has a potential weakness in that it assumes a linear relationship between these variables and their effects on causal judgments. If this assumption does not hold, the tests of pCI are not fully unconfounded. This danger can be discussed with reference to the pCI design of Experiments 1, 2, and 3, as shown in Table 4. The danger does not apply to ΔP because all six problems have the same value of ΔP. It does not apply to p(E/C), p(E/−C), p (Cheng's, 1997, measure of causal power), or weighted ΔP because each pair of problems has the same two values of each of these variables: For example, in the case of p(E/C), the value of one member of a pair is always 1.0 and that of the other is always 0.5. The danger only applies to the other three variables, p(E), p(C), and a/(a + d).

The danger, in essence, is that the effect on causal judgment of changing p(E) (for example) from .58 to .75 (problems .17H and .50H) might be greater or lesser than the effect of changing p(E) from .42 to .25 (problems .17L and .50L), even though the size of the change is the same. There is no published evidence with which to assess this danger. However, even if the size of the effect is not the same, the direction of effect should be. If increasing p(E) raises causal judgment (as past research indicates; Allan, 1993), then decreasing p(E) should lower causal judgment. However, Table 5 shows that this did not happen. Mean judgments were higher in .50L, with the lower value of p(E), than in .17L, with the higher value of p(E). The difference between the .50 problems and the .17 problems, therefore, cannot be accounted for in terms of unequal effects of changes in p(E). If p(E) had any effect on causal judgment, it was not enough to counteract the effect of the pCI manipulation. This is shown even more clearly in Experiment 5, where extreme increases in p(E) across one set of four problems are matched by extreme decreases in p(E) across the other, and causal judgments tended to go up across both; see Tables 9 and 10. The same applies to p(C) and a/(a + d). In fact, the possibility of unequal effects of changes in p(C) is ruled out by the design of Experiment 5, where values of p(C) are the same across the pCI manipulation in that experiment.
& Shanks, 2000, Experiments 4, 5, and 6; Mandel & Lehman, 1998; Ward & Jenkins, 1965; Wasserman et al., 1990, Experiments 2 and 3). In this procedure, participants are presented with actual frequencies of instances in each cell, in the form of a 2 × 2 table or in a column or a pie chart.
The experiments were designed to test the pCI rule. In addition
to the effects predicted by the pCI rule, some significant interactions were obtained with a similar pattern in all the experiments.
These interactions are not predicted by the pCI rule. In the General
Discussion section, however, it will be shown that a simple modification to the pCI rule suffices to account for the interactions.
Therefore, the interactions are described in the respective Results
sections and addressed in the General Discussion section.
Experiments 1, 2, and 3
Method
Participants. The participants were 1st-year undergraduate students at Cardiff University, participating in return for course credit. Fifty students participated in Experiment 1, but 1 student was excluded prior to data analysis for omitting one response. Therefore, n = 49 throughout. Forty students participated in Experiment 2 and 36 in Experiment 3.
Stimulus materials. Initial written instructions to participants in Experiment 1 were as follows:
Imagine that you are a doctor investigating patients suffering from
severe allergic reactions. You are trying to find out whether their
allergic reactions are caused by substances in the food they eat. People
vary a great deal in the way they are affected by different substances,
and you have to assess each one on an individual basis.
You are investigating the effects on some of your patients of a
particular food additive called Benzoic acid (BA). You ask your
patients to eat a certain number of meals in which BA is either present
or absent. So, for each patient, you’ll see information about several
different meals, showing whether BA was present or absent in a given
meal, and whether the patient had an allergic reaction or not.
For each patient you will see a statement at the bottom of the page
saying “BA causes the allergic reaction in this patient”. Your task is
to judge the extent to which this statement is right for that patient. To
make your judgment, write a number from 0 (zero) to 100 beside the
question. 0 (zero) means that the statement is definitely not right, and
100 means that the statement is definitely right. The more likely you
think it is that a given statement is right for that patient, the higher the
number you should put.
There were eight problems, one on each page. At the top of the page a
patient was identified by a set of initials. There followed a list of 24
observations, each one specifying whether BA was present or absent and
whether an allergic reaction occurred or not. Under these was a request to
“rate the likelihood of this statement being right for patient [initials]: BA
causes the patient’s allergic reactions”. An example problem is shown in
Appendix B. This is the .33L problem.
Initial instructions in Experiment 2 were similar except that the name of
the chemical was changed to manganese trioxide and information about the
presentation procedure was omitted.
Initial instructions in Experiment 3 were a modified version of those for
Experiment 1. Instead of the passages describing the instance-list presentation format, there was a paragraph describing the summary table format.
An example problem in this format is shown in Appendix C; this too is the
.33L problem. Instructions for making the causal judgment were as follows:
Under this table you will see the following statement:
“Please judge the likelihood that E263 causes allergic reactions in this
patient.”
To do this you should write a number from 0 to 100 beside the
statement. 0 means that E263 definitely does not cause allergic reactions in this patient, and 100 means that E263 definitely does cause
allergic reactions in this patient. The more likely it is that E263 causes
allergic reactions in the patient, the higher the number you should put.
The eight problems were then presented one to a page, all in the format
shown in Appendix C.
Design. The design for all three experiments was as described above. To summarize, there were two separate experimental designs. One, the pCI design, manipulated two variables. One variable was pCI, with three values, .17, .33, and .50. The other variable was the occurrence rate manipulation. There were two pairs of values of p(E/C) and p(E/−C), 1.0/0.5 and 0.5/0.0, respectively. Cell frequencies and values of other variables for each of the six individual problems in this design are shown in Table 4.

The other design was the ΔP design, which also manipulated two variables. One variable was ΔP, with two values, .50 and .25. The other was p(E), which is also called the occurrence rate manipulation. This had two values, .67 and .33. The function of this manipulation was to hold constant all possible relevant variables other than ΔP and its components p(E/C) and p(E/−C), namely, pCI, p(E), p(C), and a/(a + d). Cell frequencies and values of other variables for the four individual problems in this design are also shown in Table 4 (the .33 pair and the .25 pair).
Procedure. Participants in Experiments 1 and 3 were tested in groups
of 2 or 3 in a large and comfortably furnished office. They were presented
with a booklet containing the instructions for the task on the first page
followed by the eight problems in random order. A different random order
was generated for each participant. For participants in Experiment 1, the 24
observations within each problem were also randomly ordered. Participants
were invited to ask questions if anything in the instructions was not clear
(none did so). When the participants had completed the questionnaire, they
were thanked and given course credit.
Experiment 2 was conducted by an experimenter who was blind to the
aims, hypotheses, and design of the study. The participant was initially
presented with the written instructions and the experimenter then explained
the procedure, described below.
At the start of a problem, the experimenter placed a card in front of the
participant with the patient’s initials on it. Trials then began. On each trial,
the experimenter stated the meal number and whether manganese trioxide
was present or absent. The participant then predicted whether an allergic
reaction would occur or not. The experimenter then informed the participant whether their prediction was correct or not: If the participant said yes,
and if an allergic reaction did occur, the experimenter said, “Correct—an
allergic reaction did occur”; if an allergic reaction did not occur the
experimenter said, “Incorrect—an allergic reaction did not occur”. If the
participant said no, and if an allergic reaction did occur the experimenter
said, “Incorrect—an allergic reaction did occur”; if an allergic reaction did
not occur the experimenter said, “Correct—an allergic reaction did not
occur.” This procedure was repeated for 24 trials in each problem. At the
end of the 24 trials, the participant was asked to judge the extent to which
they agreed with the statement that manganese trioxide causes allergic
reactions in the patient whose initials were in front of them. Instructions for
making this judgment were as in Experiment 1, except that the judgment
was reported verbally to the experimenter rather than written down. The
experimenter recorded the judgment.
The first problem was a practice problem using a moderate positive
contingency with the same number of trials (24) as the experimental
problems. At the end of the practice trials, the experimenter checked that
the participant was ready to proceed and the eight experimental problems
were then presented in random order, with a different random order being
used for each participant. Within-problem trials were also randomly ordered, and different random orders were used for different participants. At
no time did the participant see any written information other than the
instructions and patient initials.
Results
Data from the pCI design in each experiment were analyzed with a 3 (pCI; .17 vs. .33 vs. .50) × 2 (high vs. low occurrence rate) analysis of variance (ANOVA) with repeated measures on both factors. Data from the ΔP manipulation in each experiment were analyzed with a 2 (ΔP; .5 vs. .25) × 2 (high vs. low occurrence rate) ANOVA with repeated measures on both factors. Mean causal judgments for all three experiments are reported in Table 5.
Experiment 1. Analysis of the pCI design yielded a significant effect of pCI, F(2, 96) = 32.73, p < .001. Post hoc paired comparisons with the Newman-Keuls test revealed the order .50 > .33 > .17 (p < .01 in all cases).
The main effect of occurrence rate was not significant, F(1, 48) = 2.40. There was, however, a significant interaction between the two variables, F(2, 96) = 4.32, p < .05. Simple effects analysis revealed that the effect of pCI was significant at both occurrence rates: for high occurrence rate, F(2, 96) = 25.78, p < .001, and for low occurrence rate, F(2, 96) = 8.99, p < .001. There was a significant effect of occurrence rate at pCI = .50, F(1, 48) = 6.43, p < .05, with higher ratings at the higher value of occurrence rate. There was no significant effect of occurrence rate at the other levels of pCI: at pCI = .17, F(1, 48) = 1.16, and at pCI = .33, F(1, 48) = 2.57.
As explained above, this interaction is addressed in the General
Discussion section. However, the interaction serves to make an
important point about the way the pCI rule was tested here. Simple
effects are not the proper way to test the predictions of the pCI rule
because at the level of simple effects, values of pCI are confounded with values of other variables, notably a/(a + d). It is the
main effect that provides the proper test of the predictions of the
pCI rule because at that level, pCI is unconfounded with respect to
these other variables.
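The contrast between the two rules can be made concrete in code. This is an illustrative sketch only, not software used in the research, and the function names are mine; the example cell frequencies are the .17H- and .50H-type patterns used in the pCI design (cf. Table 7), which share ΔP = .5 but differ in pCI.

```python
# Cell labels follow Table 1: a = cause present, effect present;
# b = cause present, effect absent; c = cause absent, effect present;
# d = cause absent, effect absent.

def pci(a, b, c, d):
    """Proportion of confirming instances: (a + d - b - c) / N."""
    return (a + d - b - c) / (a + b + c + d)

def delta_p(a, b, c, d):
    """Contingency: p(E/C) - p(E/-C) = a/(a + b) - c/(c + d)."""
    return a / (a + b) - c / (c + d)

# Two problems with identical contingency but different pCI:
print(delta_p(4, 0, 10, 10), pci(4, 0, 10, 10))   # dP = .5, pCI ~ .17
print(delta_p(12, 0, 6, 6), pci(12, 0, 6, 6))     # dP = .5, pCI = .50
```

Because the two problems are matched on ΔP, any difference in mean judgment between them is attributable to pCI.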
Table 5
Mean Causal Judgments, Experiments 1, 2, and 3

Problem      Exp. 1   Exp. 2   Exp. 3
.17H          39.22    39.97    39.83
.17L          43.22    46.87    48.31
.33H          53.71    59.85    48.33
.33L          46.63    46.65    49.72
.50H          66.59    73.80    56.31
.50L          56.63    55.22    47.53
.25H          54.61    61.12    50.86
.25L          39.08    42.90    34.22
.17 mean      41.22    43.42    44.07
.33 mean      50.17    53.25    49.03
.50 mean      61.61    64.51    51.91
.25 mean      46.85    52.01    42.54

Note. H = high; L = low.
Analysis of the ΔP design revealed that the main effect of ΔP was not statistically significant, F(1, 48) = 2.87. There was a significant effect of occurrence rate, F(1, 48) = 13.25, p < .001, with higher ratings at the high level of occurrence rate. This is also addressed in the General Discussion section. The interaction between the two variables was not statistically significant, F(1, 48) = 1.89.
Experiment 2. Analysis of the pCI design yielded a significant effect of pCI, F(2, 78) = 42.84, p < .001. Post hoc paired comparisons with the Newman-Keuls test revealed the order .50 > .33 > .17 (p < .01 in all cases).
There was a significant main effect of occurrence rate, F(1, 39) = 7.15, p < .05, with higher ratings in the high occurrence rate condition. There was also a significant interaction between the two variables, F(2, 78) = 11.58, p < .001. Simple effects analysis revealed that the effect of pCI was significant at both occurrence rates: for high occurrence rate, F(2, 78) = 44.39, p < .001, and for low occurrence rate, F(2, 78) = 3.69, p < .05. There was a significant effect of occurrence rate at pCI = .33, F(1, 39) = 12.64, p < .01, and at pCI = .50, F(1, 39) = 14.59, p < .001, but not at pCI = .17, F(1, 39) = 2.11. The interaction has the same pattern as the corresponding interaction in Experiment 1, as Table 5 shows.
Analysis of the ΔP design revealed that the main effect of ΔP was not statistically significant, F(1, 39) = 0.26. There was a significant effect of occurrence rate, F(1, 39) = 38.70, p < .001, with higher ratings in the high occurrence rate condition. The interaction was not significant, however, F(1, 39) = 0.89.
Experiment 3. Analysis of the pCI design yielded a significant main effect of pCI, F(2, 70) = 3.71, p < .05. Post hoc comparisons with the Newman-Keuls test revealed higher ratings when pCI = .50 than when pCI = .17. The mean for pCI = .33 fell between these two as predicted but was not significantly different from either. The main effect of occurrence rate was not significant, F(1, 35) = 0.01. There was a significant interaction, F(2, 70) = 5.34, p < .01. This interaction has the same pattern as that found in Experiments 1 and 2. Means are reported in Table 5.
Analysis of the ΔP design revealed a significant main effect of ΔP, F(1, 35) = 5.65, p < .05, with higher judgments at the higher level of ΔP. The main effect of occurrence rate fell short of statistical significance, F(1, 35) = 3.83, p = .06. There was a significant interaction between the two variables, F(1, 35) = 4.68, p < .05. Simple effects analysis revealed a significant effect of ΔP at low occurrence rate, F(1, 35) = 8.26, p < .01, but not at high occurrence rate, F(1, 35) = 0.31. There was a significant effect of occurrence rate at the lower value of ΔP, F(1, 35) = 11.78, p < .01, but not at the higher value of ΔP, F(1, 35) = 0.05. Essentially, the mean for the .25L problem was different from the other three means, as Table 5 shows.
Discussion
All three experiments yielded similar results, supporting the pCI rule. When pCI was manipulated and ΔP held constant, mean ratings varied significantly with pCI. When ΔP was manipulated and pCI held constant, there was no significant effect in Experiments 1 and 2.
In Experiment 3, unlike the others, there was a significant main effect of ΔP. There are at least two possible interpretations of this result. There is some evidence of more sophisticated rule use in the summary presentation procedure than in the serial presentation procedure (Arkes & Harkness, 1983; Lober & Shanks, 2000; Ward & Jenkins, 1965), so the result may show that participants were more inclined to judge in accordance with ΔP in the summary presentation procedure. Kao and Wasserman (1993), using a large set of zero contingency problems, found that 9.5% of their participants could be classified as ΔP users on the grounds that they responded zero to every problem. This would equate to 3 or 4 participants in the present experiment, perhaps not enough to account for the significant effect of ΔP. Another possibility is that participants based judgments more on p(E/C). The mean value of p(E/C) for the .33 problems was 0.75, whereas the mean value was 0.62 in the .25 problems (Table 4). Anderson and Sheu (1995) found that 22.5% of their participants based judgments only on Cells a and b. This equates to 8 or 9 participants in the present study, which might be enough to produce a significant effect. On the other hand, the effect of ΔP was qualified by a significant interaction, and it was found that the .25L problem was different from the other three. Table 4 shows that this problem has the lowest value of a/(a + d), so it is also possible that judgments were significantly influenced by this variable.
These possibilities are not mutually exclusive, and it is possible
that all of them contributed to the observed effect. It is not clear,
however, why the effect should appear with the summary presentation procedure and not with the procedures used in Experiments 1 and 2. There have been only a few direct comparisons
between different presentation procedures (Arkes & Harkness,
1983; Kao & Wasserman, 1993; Lober & Shanks, 2000; Ward &
Jenkins, 1965), and more needs to be learned about the reasons for
differences between them. It is at least possible to conclude that
significant effects of the pCI manipulation occurred with all three
procedures.
Tests of Other Models
This section assesses the predictions of the foremost models in the current literature: the power PC theory (Cheng, 1997), a weighted version of the ΔP rule (Lober & Shanks, 2000), and the application to human causal judgment of the Rescorla–Wagner model (R-W model; Rescorla & Wagner, 1972) of associative learning (Allan, 1993; Shanks, 1995; Wasserman et al., 1996).
The R-W Model
The general principle of the associative learning approach to
human causal judgment is that causal judgment follows the accumulation of a record of experiences, forming an association between mental representations of a cue (or an action) and an
outcome. When the cue is present and the outcome occurs, the
associative bond tends to strengthen unless it has already reached
asymptote. When the outcome does not occur in the presence of
the cue, the associative bond tends to weaken. The stronger the
associative bond is, the more likely it is that the cue will be judged
a cause of the outcome. The associative learning approach has
generally taken as its model the R-W model (Rescorla & Wagner,
1972). The application of the R-W model to human causal judgment has been thoroughly described by many authors (e.g., Allan,
1993; Shanks, 1993, 1995; Shanks & Dickinson, 1987; Wasserman et al., 1996) and is briefly summarized here.
In this case, there is a stimulus (the food additive, BA, in
Experiment 1), which may or may not be present on any given
trial, and another stimulus, the meal, in which the food additive is
included and which is present on every trial (although the rest of
the content of the meal is not specified). For this situation, the
change in strength of an association between a stimulus and an
outcome is represented in Equation 4:
⌬VS ⫽ ␣ 䡠 ␤共 ␭ ⫺ VT兲 .
(4)
In this equation, ΔVS represents the change in associative strength of the stimulus on a trial when S occurs. VT is the combined associative strength of all the stimuli present on a trial, including S. α is a learning rate parameter representing the salience of the stimulus, and it determines the rate at which learning reaches asymptote. β is a learning rate parameter associated with the outcome and can be specified separately for occurrences of the outcome (reinforced trials) and nonoccurrences of the outcome (nonreinforced trials). λ denotes the maximum level of associative strength that can be supported by the outcome.
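The update rule in Equation 4 can be sketched in code. This is an illustrative reimplementation under the stated assumptions, not the program used for the simulations reported in the text; it represents two stimuli (the ever-present meal as context, plus the food additive as cue) and allows separate β values for reinforced and nonreinforced trials.

```python
import random

def simulate_rw(a, b, c, d, beta_r=0.02, beta_n=0.02,
                alpha=1.0, lam=1.0, epochs=1000, seed=0):
    """One-cue-plus-context Rescorla-Wagner simulation (illustrative).

    The context (the meal) is present on every trial; the cue (the
    food additive) is present on Cell a and Cell b trials only.
    beta_r / beta_n are learning rates for reinforced and
    nonreinforced trials. Returns the cue's associative strength.
    """
    rng = random.Random(seed)
    # One 24- or 48-trial block: (cue present?, outcome occurs?)
    block = ([(True, True)] * a + [(True, False)] * b +
             [(False, True)] * c + [(False, False)] * d)
    v_cue = v_ctx = 0.0
    for _ in range(epochs):
        rng.shuffle(block)  # random trial order, as in the experiments
        for cue_present, outcome in block:
            v_total = v_ctx + (v_cue if cue_present else 0.0)
            beta = beta_r if outcome else beta_n
            delta = alpha * beta * ((lam if outcome else 0.0) - v_total)
            v_ctx += delta          # every stimulus present is updated
            if cue_present:
                v_cue += delta
    return v_cue
```

With equal β values and enough iterations, the cue's strength converges on ΔP (e.g., about .5 for a problem with a = 12, b = 0, c = 6, d = 6), illustrating the Chapman and Robbins (1990) result cited below; with β higher for reinforced than nonreinforced trials, the same problem settles near .33, as in Table 6.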
Each of the eight problems in Experiments 1, 2, and 3 was
entered into a simulation of the R-W model, with the meal that was
present on all types of trial represented by one stimulus and the
food additive that was present on two types of trials (i.e., Cell a
and Cell b trials) being represented by another stimulus. Asymptotic outputs were recorded. In the simulation, α and λ were both set at 1. Four simulations were run: two in which β was the same
for reinforced and nonreinforced trials, one in which it was higher
for reinforced than for nonreinforced trials, and one in which the
opposite was the case. The outputs of these four simulations for each problem are reported in Table 6 (see Footnote 4).
Asymptotic values in Table 6 are means for different random orderings of trials. The asymptote is sensitive to trial order, particularly to the identity of the last trial, and tends to fluctuate by ±.02. Table 6 shows that the simulation outputs converge on ΔP when both learning rates are the same, a property of the R-W model demonstrated by Chapman and Robbins (1990) and further confirmed by other authors since then (Allan, 1993; Shanks, 1993, 1995). This is not the case when the learning rates differ. When β for reinforced trials is greater than that for nonreinforced trials, asymptote is reduced for problems .17H, .33H, .50H, and .25H, and raised for problems .17L, .33L, .50L, and .25L. The opposite is the case when β for nonreinforced trials is greater than that for reinforced trials.
In all four simulations, however, the asymptotic values do not
resemble the ordinal pattern of differences in judgment found in
Experiments 1, 2, and 3. Regardless of what assumptions are made
4 The simulation was designed by John Pearce. Prior testing established that the program conforms correctly to the Rescorla–Wagner (1972) model and, in particular, that when β for reinforced trials equals β for nonreinforced trials, the asymptotic values produced by the simulation are isomorphic with ΔP values. I am grateful to John for his assistance in running the simulations reported herein.
Table 6
Asymptotic and Preasymptotic Output of Simulations of the Rescorla–Wagner Model for Problems Used in Experiments 1, 2, and 3

                        Asymptotic                 Preasymptotic            Means
Problem   .02/.02   .1/.1   .02/.01   .01/.02      1       10      Exp. 1   Exp. 2   Exp. 3
.17H        .50      .50      .33       .66       .06     .30       39.22    39.97    39.83
.17L        .50      .50      .66       .33       .14     .32       43.22    46.87    48.31
.33H        .50      .50      .33       .66       .13     .41       53.71    59.85    48.33
.33L        .50      .50      .66       .33       .12     .34       46.63    46.65    49.72
.50H        .50      .50      .33       .66       .18     .45       66.59    73.80    56.31
.50L        .50      .50      .66       .33       .10     .35       56.63    55.22    47.53
.25H        .25      .25      .19       .27       .17     .30       54.61    61.12    50.86
.25L        .25      .25      .26       .18       .06     .20       39.08    42.90    34.22

Note. Column headings for the asymptotic output columns are learning rate for reinforced trials/learning rate for nonreinforced trials. Column headings for the preasymptotic output columns are number of iterations. The learning rates for the preasymptotic output columns were .02/.02. Exp. = experiment; H = high; L = low.
about learning rates, asymptotes for the .25 pair are always lower
than those for the other six problems. Furthermore, asymptotic
values sum to 1 (to a close approximation) for each of the three
pairs in the pCI design and, therefore, fail to reproduce the main
effect of pCI that is the principal finding of the three experiments.
The possibility must also be considered that judgments obtained
in the experiments had not reached asymptote after 24 trials, the
number used in these experiments. To test this possibility, the
simulation was conducted again with low learning rates, and
output was recorded after 1 iteration of the 24-trial sequence and
also after 10 iterations, at which point output values were still well
below asymptotic levels. Table 6 shows that the output after 1
iteration yielded a better fit with the observed means. The sums of
output values were .20 for the .17 pair, .25 for the .33 pair, and .28
for the .50 pair. Moreover the sum for the .25 pair was .23, which
is intermediate between the highest and lowest values. Therefore,
the simulation reproduced the main effect of pCI and the finding
that the sum of the means for the .25 pair was similar to that for the
.33 pair.
The problem with this result is that the preasymptotic output
values after one iteration model Cell a frequency. The correlation
between the output values and Cell a frequency was .99, but the
correlation between output values and pCI was only .33. As these
correlations imply, pCI was itself positively correlated with Cell a
frequency in these stimulus materials. Thus, the apparent success
of the preasymptotic output in modeling the main effect of pCI is
attributable merely to the positive correlation between Cell a
frequency and pCI in the stimulus materials.
After 10 iterations the output values still increased in accordance with the main effect of pCI. The sums of output values were
.62 for the .17 pair, .75 for the .33 pair, and .80 for the .50 pair.
However, at this stage the sum for the .25 pair was .50, substantially below that for the lowest value of pCI. At this stage,
therefore, the simulation did not reproduce the observed results for
this pair. To test this further, the sums of the output values and the
mean values of pCI were correlated with individual mean judgments across the four pairs of problems (from Experiment 2,
because the serial presentation procedure is the closest to a learning situation in these three experiments). The mean correlations
were .66 for pCI and .40 for the R-W model output. These two means differed significantly, F(1, 39) = 15.00, MSE = 0.09, p < .001. Thereafter, the output values increasingly tended to converge on ΔP values.
If there is any stage of acquisition at which the output successfully models the obtained results, it must be prior to 10 iterations.
By that stage, not only has the output for the .25 pair already fallen
below that for the .17 pair, remaining there until asymptote is
reached, but the R-W model output is a significantly poorer predictor of individual judgments than the pCI rule, as just shown. It
needs to be emphasized that, with these learning rates, output
values after 10 iterations are still below those of the asymptotic
values. This implies that the R-W model could only account for
the present results if judgments were even further short of their
asymptotic values. This is very unlikely. There were 24 trials, and
other experiments using similar methods have shown that learning
in human contingency judgment experiments is usually rapid with
little or no change in judgment after 20 trials (Anderson & Sheu,
1995; Baker et al., 1989; Catena et al., 1998; Lober & Shanks,
2000; Shanks, 1985). Thus, although it is impossible to be sure that
the judgments observed in these experiments were asymptotic, the
R-W model could only account for the observed trends if learning
had proceeded far more slowly than in the experiments just cited.
There is no clear reason why this should happen.
In conclusion, asymptotic output of the R-W model simulation failed to reproduce the main effect of pCI regardless of what values were set for β for reinforced and nonreinforced trials. Preasymptotic output after one iteration did reproduce the main effect of pCI, but this result can be explained as a by-product of the fact that the model's output was isomorphic with Cell a frequencies in the early stages of acquisition. It is therefore unlikely that the main effect of pCI can be explained by the R-W model.
The Power PC Theory
In the power PC theory (Cheng, 1997), it is assumed that people
have the preconceived belief that things have causal powers to
produce certain effects. These powers are unobservable, so the task
of causal induction is to arrive at an estimate of the strengths of
causal powers. To do this, reasoners seek to distinguish the contribution to an effect of the candidate being judged from the
contribution of the set of all other candidates. For a single causal
candidate, this is achieved by computing ΔP for the candidate and then adjusting ΔP in the light of information about the base rate of occurrence of the effect when the candidate is absent. In the case of generative causal power, this is modeled by Equation 5:

p = ΔP/[1 − p(E/−C)].    (5)
In this equation, p stands for the causal power of the candidate being judged. This equation is valid when the causal candidate and the set of other causes of the effect are independent of each other.
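Equation 5 can be illustrated with the high- and low-occurrence variants of a single pCI level. The sketch below is illustrative only (the cell frequencies are the .17H/.17L pattern, cf. Table 7; the function name is mine), not code from the original study.

```python
def power_pc(a, b, c, d):
    """Generative causal power (Cheng, 1997): p = dP / (1 - p(E/-C))."""
    dP = a / (a + b) - c / (c + d)
    return dP / (1 - c / (c + d))

# .17H-type problem: dP = .5 and p(E/-C) = .5, so p = .5 / .5 = 1.0
p_high = power_pc(4, 0, 10, 10)
# .17L-type problem: dP = .5 and p(E/-C) = 0, so p = dP = .5
p_low = power_pc(10, 10, 0, 4)
# The pair mean, (1.0 + .5) / 2 = .75, is the same at every pCI level,
# which is why causal power cannot reproduce the main effect of pCI.
pair_mean = (p_high + p_low) / 2
```

The same computation applied to the .33 and .50 pairs yields the same pair mean of .75, matching the pattern described for Table 4 below.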
This account was tested against the results of Experiments 1, 2, and 3 by computing values of p for each of the eight problems. The results of these computations are reported in Table 4. The table shows that p is higher than ΔP for problems .17H, .33H, and .50H, and identical to ΔP for problems .17L, .33L, and .50L. However, the mean value of p is the same for each pair, .75. This shows that p is no more able to explain the significant effect of pCI than ΔP is. Moreover, the mean value of p for the .25 pair (.42) is less than the mean value for the .33 pair (.75). This means that the power PC theory, like ΔP, predicts lower mean judgments for the .25 pair than for the .33 pair, and this prediction was not supported either. The power PC theory, therefore, fails to account for the principal results of the first three experiments: the significant main effect of pCI and the lack of a significant effect of ΔP.
The Weighted ΔP Rule
The ΔP statistic is an objective measure of contingency. However, as a model of causal judgment, it assumes equal weighting of cells and may therefore be an inaccurate predictor if there are consistent tendencies in weightings of cells. Lober and Shanks (2000) found a better fit to their data if p(E/C) was weighted w1 = 1.0 and p(E/−C) was weighted w2 = .6, so a decision was made to test a version of ΔP with these weights on the present data. Weighted ΔP values are shown in the right column of Table 4. As the table shows, weighted ΔP values were highly correlated with p values and encountered similar problems. They failed to predict the main effect of pCI, and they predicted higher ratings for the .33 pair than for the .25 pair, which were not found.
No combination of weights given to p(E/C) and p(E/−C) predicts the main effect of pCI. For example, if p(E/C) is weighted w1 = 1.0 and p(E/−C) is weighted w2 = .2, weighted ΔP values for the first six problems in Table 4 are .9, .5, .9, .5, .9, and .5, respectively. The reason for this is that the values of p(E/C) and p(E/−C) do not vary between the pairs of problems: p(E/C) = 1 in problems .17H, .33H, and .50H, and .5 in .17L, .33L, and .50L, whereas p(E/−C) = .5 in problems .17H, .33H, and .50H, and 0 in .17L, .33L, and .50L. Any weight must generate the same product for a given value of p(E/C) or p(E/−C), so weighting the probabilities can never model a main effect of pCI when ΔP is held constant.
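This argument can be checked numerically. The sketch below is illustrative only (cell frequencies follow the H/L patterns of the design, cf. Table 7; the function name is mine): with any fixed weights, every H problem yields one value and every L problem another, so pair means cannot vary with pCI.

```python
def weighted_delta_p(a, b, c, d, w1=1.0, w2=0.6):
    """Weighted contingency (Lober & Shanks, 2000):
    w1 * p(E/C) - w2 * p(E/-C)."""
    return w1 * a / (a + b) - w2 * c / (c + d)

# With w1 = 1.0 and w2 = .2, every H problem gives 1.0 - .2 * .5 = .9
# and every L problem gives .5 - .2 * 0 = .5, whatever the pCI level:
wh_17 = weighted_delta_p(4, 0, 10, 10, w1=1.0, w2=0.2)   # .17H-type
wh_50 = weighted_delta_p(12, 0, 6, 6, w1=1.0, w2=0.2)    # .50H-type
wl_17 = weighted_delta_p(10, 10, 0, 4, w1=1.0, w2=0.2)   # .17L-type
wl_50 = weighted_delta_p(6, 6, 0, 12, w1=1.0, w2=0.2)    # .50L-type
```

Because wh_17 equals wh_50 and wl_17 equals wl_50, the .17 and .50 pair means are identical under any choice of weights.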
Experiment 4
I stated in the introduction that the pCI rule is a proportional version of the frequency-based ΔD rule. In the ΔD rule (Equation 3), the sum of disconfirming instances is subtracted from the sum of confirming instances, and causal judgment is hypothetically based on the result. When the number of instances is held constant across problems, as in the design shown in Table 4, pCI and ΔD values are isomorphic, entailing that the two rules make identical ordinal predictions. However, the ΔD rule has the property, not shared by the pCI rule, that values of ΔD tend to increase as the number of instances increases when there is a positive contingency (Shanks, 1987). This implies that the rules can be tested by manipulating frequencies independently of proportions: If people judge according to ΔD and not the pCI rule, then judgments should vary with frequencies when proportions are held constant. This is the aim of Experiment 4.
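The logic of the frequency manipulation can be sketched as follows. This is an illustration using the Table 7 frequencies, not the experiment's software: doubling every cell doubles ΔD while leaving pCI untouched, so only a ΔD judge should rate the two problems differently.

```python
def delta_d(a, b, c, d):
    """Frequency-based rule (Equation 3): (a + d) - (b + c)."""
    return (a + d) - (b + c)

def pci(a, b, c, d):
    """Proportion-based version of the same rule: dD / N."""
    return delta_d(a, b, c, d) / (a + b + c + d)

low = (4, 0, 10, 10)    # problem .17HL, 24 instances (cf. Table 7)
high = (8, 0, 20, 20)   # problem .17HH, 48 instances (cells doubled)
print(delta_d(*low), pci(*low))     # dD = 4, pCI ~ .17
print(delta_d(*high), pci(*high))   # dD = 8, pCI ~ .17
```

A null effect of the frequency manipulation therefore counts against ΔD and for pCI.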
Method
Participants. The participants were 42 first-year undergraduates at
Cardiff University, who received course credit in return for participating.
Stimulus materials. The initial written instructions were as in Experiment 1, except that the name of the fictitious food additive was E263. The
instance list procedure was used, as in Experiment 1.
Design. The design was a reduced version of the pCI design shown in
Table 4, with problems .33H and .33L omitted, but with a frequency
manipulation as an extra independent variable. Cell frequencies and other
relevant features of the judgmental problems are shown in Table 7. The last
letter in each problem designation refers to number of instances presented,
low (L) or high (H). Three variables were manipulated.
One variable was the total number of instances in each problem, 24 or 48; ΔD was computed according to Equation 3. As Table 7 shows, doubling the number of instances doubles ΔD values while pCI values remain unchanged, so this manipulation tests the ΔD rule against the pCI rule. Values of pCI were manipulated with two values, .17 and .50. The third variable was the occurrence rate manipulation, as in the previous three experiments. The bottom two rows of Table 7 show that other variables, specifically p(E), p(C), p(E/C), p(E/−C), ΔP, and a/(a + d), were held constant across the pCI manipulation, as in the previous experiments. The bottom two rows of Table 7 also show that ΔD was not held constant across the pCI manipulation. ΔD and pCI are distinguished by the frequency manipulation, as described above.
Procedure. The procedure was similar to that of Experiment 1.
Results and Discussion
Data were analyzed with a 2 (frequency; 24 vs. 48 instances) × 2 (pCI; .17 vs. .50) × 2 (high vs. low occurrence rate) within-subjects ANOVA. Means are reported in Table 8.
Analysis yielded two significant results. There was a significant main effect of pCI, F(1, 41) = 43.54, p < .001, with a higher mean at the higher level of pCI (63.86) than at the lower level (48.42). This effect was qualified by a significant interaction with occurrence rate, F(1, 41) = 14.52, p < .001. This interaction showed the same pattern as that found in the previous three experiments.
The main effect of frequency was not significant, F(1, 41) = 1.09, and there were no significant interactions between frequency and the other variables.
The main effect of pCI replicates the findings of the previous
three experiments. In this experiment, however, the pCI rule was
Table 7
Cell Frequencies and Properties of Stimulus Information, Experiment 4

                 Cell frequencies
Problem          a    b    c    d    pCI   ΔD   p(C)   p(E)   p(E/C)   p(E/−C)   ΔP   a/(a + d)
.17HL            4    0   10   10    .17    4    .17    .58    1.00      .50     .50     .29
.17HH            8    0   20   20    .17    8    .17    .58    1.00      .50     .50     .29
.17LL           10   10    0    4    .17    4    .83    .42     .50      .00     .50     .71
.17LH           20   20    0    8    .17    8    .83    .42     .50      .00     .50     .71
.50HL           12    0    6    6    .50   12    .50    .75    1.00      .50     .50     .67
.50HH           24    0   12   12    .50   24    .50    .75    1.00      .50     .50     .67
.50LL            6    6    0   12    .50   12    .50    .25     .50      .00     .50     .33
.50LH           12   12    0   24    .50   24    .50    .25     .50      .00     .50     .33
M of first 4                         .17    6    .50    .50     .75      .25     .50     .50
M of second 4                        .50   18    .50    .50     .75      .25     .50     .50

Note. In terms of the cell identifications in Table 1, pCI = proportion of confirmatory instances; p(E) = (a + c)/(a + b + c + d); p(C) = (a + b)/(a + b + c + d); p(E/C) = proportion of instances on which the effect occurs when the cause is present; p(E/−C) = proportion of instances on which the effect occurs when the cause is absent.
tested against the ΔD rule by means of a manipulation of the total number of instances. This manipulation was ineffective as far as can be judged from the ANOVA results: There was no significant effect of or interaction with the frequency manipulation. Therefore, it is unlikely that people judge according to the ΔD rule.
Previous experiments have compared different total numbers of instances for otherwise identical contingencies and have consistently failed to find significant effects (Anderson & Sheu, 1995; Baker et al., 1989; Lober & Shanks, 2000; Mandel & Lehman, 1998; Mercier & Parr, 1996). The only exceptions involved a very small number of trials on one side of the comparison (eight in Mercier & Parr, 1996, and six in Anderson & Sheu, 1995). Those results are consistent with the results of Experiment 4. However, Experiment 4 is the first experiment to design a frequency manipulation as a direct test of the ΔD rule and to compare it with the pCI rule.
Table 8
Mean Causal Judgments, Experiment 4

                               pCI
Occurrence rate   Frequency    .17      .50
High                 24       44.03    67.07
High                 48       44.76    70.02
Low                  24       50.24    59.62
Low                  48       54.64    58.71

Note. pCI = proportion of confirmatory instances.

Experiment 5

The experiments reported so far all used the same design, similar to that used by White (2000b). The purpose of the design was to manipulate pCI while holding all other relevant factors constant. It is difficult to find designs that have this feature because there are so many factors that have to be held constant: not only the objective contingency measured by ΔP, but also p(E/C), p(E/−C), p(C), p(E), and a/(a + d). Although the design reliably yielded results supportive of the pCI rule, it would obviously be desirable to show that effects of pCI are not confined to this particular design. Therefore, in Experiment 5, I tested the rule with a different design.
In addition, the measure of causal judgment used in the first four experiments was worded as a measure of likelihood: The more likely the participant thought it was that the causal candidate caused the effect, the higher the rating the participant should have given. The introduction to this measure stipulated that the participant's task was to judge to what extent the statement that the candidate caused the effect was true. However, it is possible that participants treated the task as a rating of subjective confidence rather than a true causal judgment. Accordingly, in Experiment 5 I used a different measure, intended to be an unambiguous measure of causal strength.

Method

Participants. The participants were 39 first-year undergraduates at Cardiff University, who received course credit in return for participation.
Stimulus materials and procedure. All features of instructions to participants, stimulus format, and procedure were the same as in Experiment 1, except that the food additive was called E263 and a different measure of causal judgment was used. Beneath the stimulus information for each problem there was a statement saying, “Please rate the extent to which E263 causes the patient’s allergic reactions.” On the initial instruction sheet, participants were instructed that they should do this by writing a number from 0 to 100 beside the statement; 0 meant that E263 did not cause the patient’s allergic reactions at all, and 100 meant that E263 was a very strong cause of the patient’s allergic reactions. The more strongly they thought E263 caused the patient’s allergic reactions, the higher the number they were to assign.
Design. The design used in the first four experiments held constant all factors other than pCI by means of a manipulation of the occurrence rate, in other words a manipulation of p(E/C) and p(E/−C). This design held constant all factors in a different way, by manipulating the proportion of instances on which the cause was present, p(C). Cell frequencies and other relevant features of the judgmental problems are shown in Table 9.
Two variables were manipulated: pCI was manipulated with four values, −.2, .1, .4, and .7, and p(C) was manipulated with two values, .2 and .8. The sole purpose of the p(C) manipulation was to hold constant values of
Table 9
Cell Frequencies and Properties of Stimulus Information, Experiment 5

                     Cell frequencies
Problem              a    b    c    d    pCI   p(C)   p(E)   p(E/C)   p(E/−C)   ΔP   a/(a + d)
−.2L                 8    0   24    8   −.2     .2     .80    1.00      .75     .25     .50
−.2H                 8   24    0    8   −.2     .8     .20     .25      .00     .25     .50
.1L                  6    2   16   16    .1     .2     .55     .75      .50     .25     .27
.1H                 16   16    2    6    .1     .8     .45     .50      .25     .25     .73
.4L                  4    4    8   24    .4     .2     .30     .50      .25     .25     .14
.4H                 24    8    4    4    .4     .8     .70     .75      .50     .25     .86
.7L                  2    6    0   32    .7     .2     .05     .25      .00     .25     .06
.7H                 32    0    6    2    .7     .8     .95    1.00      .75     .25     .94
M of −.2L and −.2H                      −.2     .5     .50     .62      .37     .25     .50
M of .1L and .1H                         .1     .5     .50     .62      .37     .25     .50
M of .4L and .4H                         .4     .5     .50     .62      .37     .25     .50
M of .7L and .7H                         .7     .5     .50     .62      .37     .25     .50

Note. In terms of the cell identifications in Table 1, pCI = proportion of confirmatory instances; p(C) = (a + b)/(a + b + c + d); p(E) = (a + c)/(a + b + c + d); p(E/C) = proportion of instances on which the effect occurs when the cause is present; p(E/−C) = proportion of instances on which the effect occurs when the cause is absent.
other variables so that the main effect of pCI was an unconfounded test of
the pCI rule. The bottom four rows of Table 9 show that all the other
variables were held constant across the pCI manipulation. In this design, as
in that used in the previous experiments, the main effect of the pCI
manipulation therefore provided a test of the pCI rule that was unconfounded with respect to these other variables.
Results and Discussion
Data were analyzed with a 4 (pCI; −.2 vs. .1 vs. .4 vs. .7) × 2 (p(C); .2 vs. .8) within-subjects ANOVA. Means are reported in Table 10.
Analysis yielded a significant effect of pCI, F(3, 114) = 33.64, p < .001. Post hoc comparisons with the Newman-Keuls test revealed the order −.2 < .1 < .4 < .7 (p < .05 in all cases). This was the predicted effect of the pCI manipulation.
There was a significant effect of p(C), F(1, 38) = 29.35, p < .001, with a higher mean at the higher value of p(C). This can be interpreted as an effect of a/(a + d). Table 9 shows that values of a/(a + d) tended to be higher at the higher value of p(C) (range = .50–.94) than at the lower value (range = .06–.50).
The interaction was also significant, F(3, 114) ⫽ 7.76, p ⬍ .001.
Table 10 shows that means tended to increase with increasing pCI
value at both values of p(C), but the rate of increase was considerably greater at the higher value of p(C). This, like the similar
interactions found in the previous experiments, is addressed in the General Discussion section.
The results supported the predictions. Judgments tended to vary with pCI even though the objective contingency was held constant. It was noted in the Tests of Other Models section, which reported the simulations of the R-W model, that pCI was correlated with Cell a frequency in the stimulus materials. There could therefore be a danger that participants made judgments on the basis of Cell a frequency alone and that the apparent predictive success of the pCI rule could be attributed to its correlation with Cell a frequency. In this experiment, Cell a frequency decreased from Problem −.2L to .7L. However, the means for those four problems reported in Table 10 show an increasing trend, consistent with the pCI rule and contrary to the direction of change in Cell a. This disconfirms the possibility that judgments were made in accordance with Cell a frequency and not pCI.

Table 10
Mean Causal Judgments, Experiment 5

               p(C)
pCI       .2      .8     All
−.2    26.23   27.08   26.65
 .1    28.49   37.38   32.94
 .4    37.21   53.64   45.42
 .7    41.05   70.33   55.69
All    33.24   47.11

Note. pCI = proportion of confirmatory instances; p(C) = proportion of instances on which the cause is present.

Experiment 6

If p(C) ≠ .5, pCI can generate predictions for causal judgment even in cases where the objective contingency assessed by ΔP is zero. The aim of Experiment 6 was to test this contention, in this case using the serial presentation procedure. The design strategy was similar to that of the previous experiments: pCI was manipulated while ΔP, p(E/C), p(E/−C), p(E), p(C), and a/(a + d) were held constant across pairs of problems.

Method
Participants. The participants were 43 first-year undergraduate students at Cardiff University, participating in return for course credit.
Stimulus materials, design, and procedure. Most features of the method, including the initial written instructions, the format of stimulus presentations, and the procedure, were the same as in Experiment 2. The differences were in the design. In this experiment, there were four problems manipulating pCI with two values, −.17 and .17, while holding ΔP constant at zero and also holding constant the values of other variables. This was again accomplished by manipulating p(E/C) and p(E/−C), an occurrence rate manipulation. Event frequencies and values of other variables for each problem are shown in Table 11.

Results and Discussion

Ratings were analyzed with a 2 (pCI: −.17 vs. .17) × 2 (occurrence rate: low vs. high) ANOVA with repeated measures on both factors. Means are reported in Table 12.
Analysis revealed a significant main effect of pCI, F(1, 42) = 15.71, p < .001, with higher ratings at the higher value of pCI. There was a significant main effect of occurrence rate, F(1, 42) = 32.26, p < .001, with higher ratings at the higher occurrence rate. There was also a significant interaction, F(1, 42) = 17.65, p < .001. This interaction has the same pattern as the corresponding interactions in the previous experiments.
The results supported the predictions of the pCI rule. The power PC theory is unable to account for these results because Equation 5 applies only to nonzero values of ΔP. However, it is possible to test a weighted version of ΔP. For this, p(E/C) was weighted w1 = 1.0 and p(E/−C) was weighted w2 = .6. Weighted ΔP values are shown in the right column of Table 11. As in Experiments 1, 2, and 3, the mean value of ΔPw was the same at both levels of pCI, and for that reason ΔPw cannot account for the significant effect of pCI.

Table 12
Mean Causal Judgments, Experiment 6

              Occurrence rate
pCI       Low     High     All
−.17    31.28   29.26   30.27
 .17    35.49   54.44   44.97
All     33.38   41.85   37.62

Note. pCI = proportion of confirmatory instances.
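The logic of the Experiment 6 design can be checked directly. A short sketch (Python), using the cell frequencies listed in Table 11, confirms that ΔP is exactly zero for every problem while pCI takes the two intended values:

```python
# Cell frequencies (a, b, c, d) for the four Experiment 6 problems (Table 11).
problems = {
    "-.17L": (6, 12, 2, 4),
    "-.17H": (4, 2, 12, 6),
    "+.17L": (2, 4, 6, 12),
    "+.17H": (12, 6, 4, 2),
}

def pci(a, b, c, d):
    # Proportion of confirmatory instances (Equation 2).
    return (a + d - b - c) / (a + b + c + d)

def delta_p(a, b, c, d):
    # p(E/C) - p(E/-C)
    return a / (a + b) - c / (c + d)

for name, cells in problems.items():
    # Every problem has zero contingency but a pCI of -.17 or +.17.
    print(name, round(pci(*cells), 2), round(delta_p(*cells), 2))
```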
Table 11
Stimulus Materials for Experiment 6

                       Cell frequencies
Problem                 a    b    c    d    pCI   ΔP   p(E)  p(C)  p(E/C)  p(E/−C)  a/(a + d)  ΔPw
−.17L                   6   12    2    4   −.17    0   .33   .75    .33      .33       .60      .13
−.17H                   4    2   12    6   −.17    0   .67   .25    .67      .67       .40      .27
+.17L                   2    4    6   12    .17    0   .33   .25    .33      .33       .14      .13
+.17H                  12    6    4    2    .17    0   .67   .75    .67      .67       .86      .27
M of −.17L & −.17H                         −.17    0   .50   .50    .50      .50       .50      .20
M of +.17L & +.17H                          .17    0   .50   .50    .50      .50       .50      .20

Note. In terms of the cell identifications in Table 1, pCI = proportion of confirmatory instances; p(E) = (a + c)/(a + b + c + d); p(C) = (a + b)/(a + b + c + d); p(E/C) = proportion of instances on which the effect occurs when the cause is present; p(E/−C) = proportion of instances on which the effect occurs when the cause is absent; ΔP = p(E/C) − p(E/−C); ΔPw = weighted ΔP.

General Discussion

The present findings indicate that the pCI rule predicts judgmental phenomena that are not predicted by any other rule or model. Significant effects of pCI manipulations were consistently obtained when ΔP was held constant. The results supported the predictions of the pCI rule in several different stimulus presentation procedures: the instance list procedure (Experiments 1, 4, and 5), the serial presentation procedure (Experiments 2 and 6), and the summary presentation procedure (Experiment 3). Most of the experiments used a measure in which participants judged the likelihood that the causal candidate caused the effect. Because of the possibility that this might have been treated as a judgment of subjective confidence, Experiment 5 used a measure of causal strength that could not be interpreted as a confidence measure. Different ways of manipulating pCI values independently of ΔP were used, including the use of zero contingencies (Experiment 6). This adds to the evidence that the pCI rule predicts judgments of complex causal interpretations specifying causes and enablers (White, 2000b) and judgments of interactions between two causal candidates (White, 2002b), thereby supporting the contention that the pCI rule is a general rule that applies to judgments of any kind of causal interpretation.

Manipulations of ΔP with pCI held constant consistently failed to produce significant results, except in Experiment 3. Several simulations of the R-W model with different learning rates, and taking both asymptotic and preasymptotic output, failed to reproduce the pattern of findings obtained in the first three experiments. It was also shown that neither the equation for estimating causal power in the power PC theory (Cheng, 1997) nor any weighted version of the ΔP rule could account for the effects of the pCI manipulation in the first three experiments. The ΔD rule, the frequency-based version of the pCI rule, was eliminated by the results of Experiment 4, in which no significant effect of a frequency manipulation was obtained.

The present experiments were designed to test the pCI rule, shown in Equation 2. However, there were also several significant interactions that had a consistent pattern, which can be illustrated with the results of Experiments 1, 2, and 3 (Table 5). In the .17 pair, judgments tended to be higher at the low value of occurrence rate than at the high value. In the .33 pair, there was no consistent difference. In the .50 pair, judgments tended to be higher at the high value of occurrence rate than at the low value. Across values of pCI, the tendency was for strong effects of pCI to occur at a high occurrence rate and weak effects at a low occurrence rate.
These findings can be accommodated by a simple modification to the pCI rule. The pCI rule shown in Equation 2 gives equal weight to Cell a and Cell d instances. However, there is abundant evidence that these cells are not weighted equally in causal judgment and that Cell a consistently carries more weight than Cell d (Anderson & Sheu, 1995; Kao & Wasserman, 1993; Levin et al., 1993; Mandel & Lehman, 1998; Wasserman et al., 1990, 1993, 1996). This indicates that a weighted version of the pCI rule may be a better predictor of judgmental tendencies. This weighted version of the pCI rule can be expressed as in Equation 6:

pCI = {wa[s(xa)] + wd[s(xd)] + wb[s(xb)] + wc[s(xc)]} / (a + b + c + d).   (6)

In this equation s(xa), s(xb), s(xc), and s(xd) are subjective evaluations of Cells a, b, c, and d, respectively, and vary from −1 to 1. wa, wb, wc, and wd are weights assigned to s(xa), s(xb), s(xc), and s(xd), respectively, and vary from 0 to 1. In this rule, therefore, each cell has both a value and a weight (Hogarth & Einhorn, 1992; White, 2003a). The unweighted pCI rule can be regarded as a version of Equation 6 in which all weights are equal and subjective evaluations are 1 for Cells a and d and −1 for Cells b and c.
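Equation 6 can be sketched in code. One interpretive assumption is made here: s(x) is read as the per-instance evaluation of a cell's instances, so each cell contributes frequency × weight × value to the numerator. With unit weights and normative values (+1 for Cells a and d, −1 for b and c), this reduces to the unweighted rule of Equation 2.

```python
def weighted_pci(a, b, c, d,
                 w=(1.0, 1.0, 1.0, 1.0),     # weights wa, wb, wc, wd in [0, 1]
                 s=(1.0, -1.0, -1.0, 1.0)):  # values s(xa)..s(xd) in [-1, 1]
    """Equation 6: each cell frequency is scaled by its weight and its
    subjective evaluation, summed over cells, and divided by the total."""
    freqs = (a, b, c, d)
    num = sum(wi * si * fi for wi, si, fi in zip(w, s, freqs))
    return num / sum(freqs)

# With all defaults this is the unweighted pCI rule; down-weighting
# Cell d (e.g., wd = .6) reduces the confirmatory impact of
# cause-absent/effect-absent instances.
unweighted = weighted_pci(6, 12, 2, 4)
down_d = weighted_pci(6, 12, 2, 4, w=(1.0, 1.0, 1.0, 0.6))
```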
Two antecedents share with Equation 6 the property of assigning weights to individual cells: the information integration rule proposed by Busemeyer (1991) and the weighted ΔD rule proposed by Catena et al. (1998). These two rules differ from Equation 6 in various features. Both have weights but no values for the cells, assuming instead that Cells a and d have a confirmatory impact and Cells b and c a disconfirmatory impact on judgments. These assumptions may be normative but are not invariably descriptive: White (2000a) found that substantial minorities of participants ascribed confirmatory value to Cell c or disconfirmatory value to Cell d, both in retrospective verbal reports and in their actual judgments. A model that has both weights and values, like Equation 6, can capture idiosyncratic tendencies of those sorts, but models with weights and normative assumptions about values cannot. Busemeyer's rule was also found to be less predictive than the R-W model by Wasserman et al. (1996).
Weights and values are liable to vary between individuals
(Anderson & Sheu, 1995; White, 2000a). Finding the optimal
values and weights for predicting mean tendencies of a sample
of individuals is therefore an empirical matter, and different
samples may exhibit different aggregate tendencies. The unweighted pCI rule makes successful predictions because most
people conform to normative notions of cell value and weight
for most cells (White, 2000a). However, several experiments
have shown a consistent tendency in ordering of cell weights,
specifically a > b > c > d (Anderson & Sheu, 1995; Kao &
Wasserman, 1993; Levin et al., 1993; Mandel & Lehman, 1998;
Wasserman et al., 1990). Consistent with this finding, any
combination of cell weights in which Cell d carries significantly
less weight than Cell a successfully predicts both the main
effect of pCI and the interaction with occurrence rate in all the
present experiments.
To illustrate, suppose wa = wb = wc = 1 and wd = .6. Consider the six problems in the pCI design of Experiments 1, 2, and 3 (Table 4). For the two .17 problems, weighted pCI values are 0 for .17H and .1 for .17L. This amounts to a prediction of higher judgments for the latter, and Table 5 shows that this happened. For the two .50 problems, weighted pCI values are .4 for .50H and .3 for .50L. This time, weighted pCI values predict higher judgments for the former, and Table 5 shows that this happened. For the two .33 problems, weighted pCI values are the same for both, whereas in fact Table 5 shows generally higher means for .33H than for .33L. This could be explained by assuming b > c, as has generally been found in the studies just cited: The problem in which Cell c had a higher frequency than Cell b (.33H) tended to receive higher judgments than the problem in which Cell b had a higher frequency than Cell c (.33L).
Therefore, it is likely that the full pattern of differences between means can be accounted for by assigning weights to cells in accordance with the a > b > c > d ordering found in previous studies. This cell-weight inequality still lacks a convincing explanation, although some hypotheses have been proposed and are consistent with the evidence (Levin et al., 1993; Mandel & Lehman, 1998). However, White (2003a) found evidence that cell weights tend to vary in accordance with variations in the confirmatory or disconfirmatory value of instances
of a particular kind. Cells a and d both tend to be evaluated as
confirmatory and Cells b and c as disconfirmatory. White found
that when the frequency in one cell was held constant at zero,
the weight given to its partner increased significantly. Usually,
Cell d carries the least weight of the four cells, but when Cell
a frequency was zero Cell d carried substantially more weight
than either Cell b or Cell c. The cell carried more weight
because it was the only kind of confirmatory information available. Similar tendencies were obtained for all four cells. This
suggests that the explanation for the cell-weight inequality has
to do with the relative evidential value of different kinds of
information: Manipulations that affect the evidential values of
cells change the pattern of cell weights. Under this interpretation, the main effects predicted by the unweighted pCI rule and
the interactions that relate to frequencies in Cells a and d can
both be understood as effects on judgment of the evidential
value ascribed to each kind of contingency information.
Manipulations of ΔP with pCI held constant failed to yield significant differences in Experiments 1 and 2. However, ΔP was varied over a smaller range of values (.25) than pCI (.33), which makes it somewhat less likely that significant effects of ΔP would be obtained. It remains possible that a significant effect of ΔP would be found with a greater range of values of ΔP under otherwise similar conditions. Having said that, a range of .25 is not obviously inadequate as a test of the ΔP rule. Other studies have reported significant differences between judgmental problems in which ΔP values differed by .25 or less (e.g., Anderson & Sheu, 1995; Wasserman et al., 1983; White, 2000a), and accounts that have proposed that causal judgments are made in accordance with ΔP certainly imply that judges should be sensitive to differences of that magnitude (Cheng & Novick, 1992). The problem with the previous studies, as
explained in the introduction, is that in each case, ΔP values were confounded with pCI values, so the present results imply that the significant differences observed therein can be attributed to variations in pCI, not ΔP.
The most pervasive difficulty for tests of any model of causal judgment is that of confounding variables. Past tests of the ΔP rule have confounded ΔP with other variables, notably pCI, as already explained. The designs used here were intended to unconfound pCI from as many variables as possible, including ΔP, p(E), p(C), p(E/C), and p(E/−C). There is, however, no possibility of a single design that unconfounds pCI, or any other variable, from every variable that there is. For example, values of pCI in Experiments 1, 2, and 3 were positively correlated with Cell a frequency and with the related proportion a/T, where T = the total number of instances. Thus, the supportive evidence for pCI is inevitably weakened by the residual confounds with other variables. In the case of Cell a frequency, this problem was addressed in Experiment 5. In that experiment, when p(C) = .8, mean causal judgments tended to increase as pCI increased. However, Cell a frequency decreased as pCI increased, so in this experiment the results contradicted the hypothesis that judgments are made in accordance with Cell a values.
Every published test of any model of causal judgment suffers
from problems of confounding variables to some degree. The
present results support an argument that past findings of causal
judgments varying with ΔP values should in fact be interpreted as support for the pCI rule because in every case ΔP was confounded with pCI. Further attempts to find designs that
unconfound relevant variables are vital to pin down the rule or
equation that best models causal judgment from contingency
information.
Under the present account, contingency information is interpreted as evidence. In effect, the causal candidate is a hypothesis, and the extent to which the hypothesis is accepted depends on the proportion of evidence that is confirmatory for it. Judgment is liable to change as evidence is accumulated until a point
is reached at which the judge achieves closure and judgment
does not change any further unless a significant change in the
proportion of confirmatory evidence is detected. The pCI rule is
a rule for asymptotic judgment and does not describe how
causal judgments are acquired. Under the evidential evaluation
model, however, the acquisition of causal judgment proceeds in
a manner similar to that predicted by belief revision models
(Catena et al., 1998; Hogarth & Einhorn, 1992). A belief
revision model specific to the evidential evaluation model has
been proposed and tested (White, 2003b), and the intention is
that the belief revision model should combine with the pCI rule
to provide a comprehensive account of the acquisition and
finalization of causal judgment.
Therefore, the present research has shown that a simple judgmental rule, the pCI rule, predicts phenomena not predicted by the leading models of causal judgment. Causal judgments consistently varied with variations in pCI when ΔP and other relevant variables were controlled and consistently failed to vary with ΔP when pCI values were controlled. The pCI rule is a central component of a model in which instances of contingency information are transformed into evidence in accordance with subjective notions of
evidential value, and this model is currently being developed to
explain phenomena connected with the acquisition of causal judgments. It would be premature to conclude that inductive rule
models such as the power PC theory (Cheng, 1997) and associative
learning models such as the R-W model (Allan, 1993; Rescorla &
Wagner, 1972; Shanks, 1995) are no longer viable contenders, but
the evidence supports the conclusion that a model that is based on
the principle of evidential evaluation is a strong alternative.
References
Allan, L. G. (1993). Human contingency judgements: Rule based or
associative? Psychological Bulletin, 114, 435– 448.
Allan, L. G., & Jenkins, H. M. (1983). The effect of representations of
binary variables on the judgment of influence. Learning and Motivation,
14, 381– 405.
Anderson, J. R., & Sheu, C.-F. (1995). Causal inferences as perceptual
judgments. Memory & Cognition, 23, 510–524.
Arkes, H. R., & Harkness, A. R. (1983). Estimates of contingency between
two dichotomous variables. Journal of Experimental Psychology: General, 112, 117–135.
Baker, A. G., Berbrier, M., & Vallée-Tourangeau, F. (1989). Judgments of a 2 × 2 contingency table: Sequential processing and the learning curve. Quarterly Journal of Experimental Psychology, 41B, 65–97.
Busemeyer, J. R. (1991). Intuitive statistical estimation. In N. H. Anderson
(Ed.), Contributions to information integration theory (Vol. 1, pp. 187–
215). Hillsdale, NJ: Erlbaum.
Catena, A., Maldonaldo, A., & Cándido, A. (1998). The effect of the
frequency of judgment and the type of trials on covariation learning.
Journal of Experimental Psychology: Human Perception and Performance, 24, 481– 495.
Chapman, G. B., & Robbins, S. J. (1990). Cue interactions in human
contingency judgment. Memory & Cognition, 18, 537–545.
Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367–405.
Cheng, P. W., & Novick, L. R. (1990). A probabilistic contrast model of
causal induction. Journal of Personality and Social Psychology, 58,
545–567.
Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal
induction. Psychological Review, 99, 365–382.
Dennis, M. J., & Ahn, W.-K. (2001). Primacy in causal strength judgments:
The effect of initial evidence for generative versus inhibitory relationships. Memory & Cognition, 29, 152–164.
Hogarth, R. M., & Einhorn, H. J. (1992). Order effects in belief updating: The belief-adjustment model. Cognitive Psychology, 24, 1–55.
Jenkins, H. M., & Ward, W. C. (1965). Judgment of contingency between
responses and outcomes. Psychological Monographs, 79(Whole no. 10).
Kao, S.-F., & Wasserman, E. A. (1993). Assessment of an information
integration account of contingency judgment with examination of subjective cell importance and method of information presentation. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 19,
1363–1386.
Levin, I. P., Wasserman, E. A., & Kao, S. F. (1993). Multiple methods for
examining biased information use in contingency judgments. Organizational Behavior and Human Decision Processes, 55, 228 –250.
Lober, K., & Shanks, D. R. (2000). Is causal induction based on causal
power? Critique of Cheng (1997). Psychological Review, 107, 195–212.
Mandel, D. R., & Lehman, D. R. (1998). Integration of contingency
information in judgments of cause, covariation, and probability. Journal
of Experimental Psychology: General, 127, 269 –285.
McKenzie, C. R. M. (1994). The accuracy of intuitive judgment strategies:
724
WHITE
Covariation assessment and Bayesian inference. Cognitive Psychology,
26, 209 –239.
Mercier, P., & Parr, W. (1996). Inter-trial interval, stimulus duration and
number of trials in contingency judgments. British Journal of Psychology, 87, 549 –566.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Shaklee, H., & Hall, L. (1983). Methods of assessing strategies for judging
covariation between events. Journal of Educational Psychology, 75,
583–594.
Shaklee, H., & Mims, M. (1982). Sources of error in judging event
covariations: Effects of memory demands. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 8, 208 –224.
Shaklee, H., & Tucker, D. (1980). A rule analysis of judgments of covariation between events. Memory & Cognition, 8, 459 – 467.
Shaklee, H., & Wasserman, E. A. (1986). Judging interevent contingencies:
Being right for the wrong reasons. Bulletin of the Psychonomic Society,
24, 91–94.
Shanks, D. R. (1985). Continuous monitoring of human contingency judgment across trials. Memory & Cognition, 13, 158 –167.
Shanks, D. R. (1987). Acquisition functions in contingency judgement.
Learning and Motivation, 18, 147–166.
Shanks, D. R. (1993). Human instrumental learning: A critical review of
data and theory. British Journal of Psychology, 84, 319 –354.
Shanks, D. R. (1995). The psychology of associative learning. Cambridge,
England: Cambridge University Press.
Shanks, D. R., & Dickinson, A. (1987). Associative accounts of causality
judgement. In G. H. Bower (Ed.), The psychology of learning and
motivation (Vol. 21, pp. 229 –261). New York: Academic Press.
Ward, W. C., & Jenkins, H. M. (1965). The display of information and the
judgment of contingency. Canadian Journal of Psychology, 19, 231–
241.
Wasserman, E. A., Chatlosh, D. L., & Neunaber, D. J. (1983). Perception of causal relations in humans: Factors affecting judgements of response-outcome contingencies under free-operant procedures. Learning and Motivation, 14, 406–432.
Wasserman, E. A., Dorner, W. W., & Kao, S.-F. (1990). Contributions of
specific cell information to judgments of interevent contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16,
509 –521.
Wasserman, E. A., Elek, S. M., Chatlosh, D. L., & Baker, A. G. (1993).
Rating causal relations: The role of probability in judgements of
response-outcome contingency. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 19, 174 –188.
Wasserman, E. A., Kao, S.-F., Van Hamme, L. J., Katagiri, M., & Young,
M. E. (1996). Causation and association. In D. R. Shanks, K. J. Holyoak,
& D. L. Medin (Eds.), The psychology of learning and motivation:
Vol. 34. Causal learning (pp. 207– 264). San Diego, CA: Academic
Press.
White, P. A. (2000a). Causal judgement from contingency information:
Relation between subjective reports and individual tendencies in judgement. Memory & Cognition, 28, 415– 426.
White, P. A. (2000b). Causal judgement from contingency information:
The interpretation of factors common to all instances. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 26,
1083–1102.
White, P. A. (2002a). Causal attribution from covariation information: The
evidential evaluation model. European Journal of Social Psychology,
32, 667– 684.
White, P. A. (2002b). Causal judgement from contingency information:
Judging interactions between two causal candidates. Quarterly Journal
of Experimental Psychology: Human Experimental Psychology, 55(A),
819 – 838.
White, P. A. (2002c). Perceiving a strong causal relation in a weak
contingency: Further investigation of the evidential evaluation model of
causal judgement. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 55(A), 97–114.
White, P. A. (2003a). Causal judgement as evaluation of evidence: The use
of confirmatory and disconfirmatory information. Quarterly Journal of
Experimental Psychology: Human Experimental Psychology, 56(A),
491–503.
White, P. A. (2003b). Acquisition of causal judgement: Associative learning or belief revision? Manuscript submitted for publication.
Appendix A
Relation Between pCI and ⌬P
The algebra and the computer simulation that generated the data used to construct the figures are both the work
of John Pearce. I am indebted to John for his help with this.
Assume that the value of each cell is represented as a proportion of the total number of instances in the 2 × 2 table. Then

p(C) = a + b, (A1)
1 − p(C) = c + d, (A2)
a/(a + b) = p(E/C), and (A3)
c/(c + d) = p(E/−C). (A4)
The four cells of the contingency table can then be defined as follows:

a = p(C) · p(E/C), (A5)
b = p(C) · [1 − p(E/C)], (A6)
c = [1 − p(C)] · p(E/−C), and (A7)
d = 1 − p(C) − p(E/−C) + p(C) · p(E/−C). (A8)

From this it can be shown that

ΔP = [pCI − 4 · p(C) · p(E/−C) + 2p(C) + 2p(E/−C) − 1] / [2p(C)]. (A9)

Conversely,

pCI = 2p(C) · [ΔP + 2p(E/−C) − 1] − 2p(E/−C) + 1. (A10)
The key variable in the relationship between pCI and ΔP is p(C). The mediating role of p(C) is illustrated in Figure A1, which represents values of pCI for different values of p(C), p(E/C), and p(E/−C) for given values of ΔP. It can be seen in each case that pCI and ΔP always have the same value when p(C) = .5. The relationship between pCI and p(C) is linear in every case, but the line rotates around the point at which p(C) = .5 as values of p(E/C) and p(E/−C) change. Figures for negative values of ΔP are mirror images of the figures for the equivalent positive values.
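A quick numeric sketch (Python) confirms Equations A5–A10: pCI computed directly from the cell proportions agrees with Equation A10, and reduces to ΔP whenever p(C) = .5.

```python
def pci_from_probs(pC, pEC, pEnC):
    """Direct computation: build the cells from Equations A5-A8
    (proportions summing to 1) and apply the pCI rule."""
    a = pC * pEC
    b = pC * (1 - pEC)
    c = (1 - pC) * pEnC
    d = (1 - pC) * (1 - pEnC)
    return a + d - b - c

def pci_a10(pC, pEC, pEnC):
    """Equation A10."""
    dP = pEC - pEnC
    return 2 * pC * (dP + 2 * pEnC - 1) - 2 * pEnC + 1

# Spot-check agreement, and the special case pCI = delta-P at p(C) = .5.
for pC in (0.2, 0.5, 0.8):
    for pEC in (0.0, 0.4, 1.0):
        for pEnC in (0.1, 0.9):
            assert abs(pci_from_probs(pC, pEC, pEnC) - pci_a10(pC, pEC, pEnC)) < 1e-12
assert abs(pci_from_probs(0.5, 0.9, 0.3) - (0.9 - 0.3)) < 1e-12
```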
Figure A1. Relationship between p(C) and pCI for several values of ΔP. A: ΔP = .75. B: ΔP = .50. C: ΔP = .25. D: ΔP = 0. pCI = proportion of confirmatory instances; p(C) = (a + b)/(a + b + c + d).

It is also important to illustrate how pCI relates to ΔP for given values of p(C). Setting values for p(E/C) and ΔP fully constrains values for p(E/−C), so it is convenient to present figures that show how the relationship between pCI and ΔP changes for given values of p(C) when p(E/C) is held constant. Figure A2 presents these results for five different values of p(E/C). These figures again show that values of pCI match ΔP values when p(C) = .5.

Figure A2. Relationship between ΔP and pCI for several values of p(E/C). A: p(E/C) = 1. B: p(E/C) = .75. C: p(E/C) = .50. D: p(E/C) = .25. E: p(E/C) = 0. Values in the legend are p(C). pCI = proportion of confirmatory instances; p(C) = (a + b)/(a + b + c + d).
Appendix B
Example Problem, Experiment 1: The .33L Problem

Patient: W.T.

Meal no.   BA        Allergic reaction
1          present   yes
2          present   yes
3          present   no
4          absent    no
5          absent    no
6          present   no
7          present   no
8          present   yes
9          present   no
10         absent    no
11         present   yes
12         absent    no
13         present   no
14         absent    no
15         absent    no
16         present   yes
17         present   yes
18         absent    no
19         present   no
20         absent    no
21         present   yes
22         present   no
23         present   yes
24         present   no
Please rate the likelihood of this statement being right for Patient W.T.: BA causes the patient’s allergic
reactions.
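The instance list above can be tallied mechanically. A sketch (Python), encoding the 24 trials as (cause present, effect present) pairs, recovers the cell frequencies and the problem's pCI of .33 (with ΔP = .50):

```python
# The 24 (BA present, allergic reaction) trials from the list above.
trials = [
    (1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (1, 0), (1, 0), (1, 1),
    (1, 0), (0, 0), (1, 1), (0, 0), (1, 0), (0, 0), (0, 0), (1, 1),
    (1, 1), (0, 0), (1, 0), (0, 0), (1, 1), (1, 0), (1, 1), (1, 0),
]

a = sum(1 for cause, eff in trials if cause and eff)          # present, reaction
b = sum(1 for cause, eff in trials if cause and not eff)      # present, no reaction
c = sum(1 for cause, eff in trials if not cause and eff)      # absent, reaction
d = sum(1 for cause, eff in trials if not cause and not eff)  # absent, no reaction

pci = (a + d - b - c) / len(trials)   # Equation 2
delta_p = a / (a + b) - c / (c + d)
print((a, b, c, d))    # (8, 8, 0, 8)
print(round(pci, 2))   # 0.33
print(delta_p)         # 0.5
```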
Appendix C
Example Problem, Experiment 3: The .33L Problem

Patient A.M.

                             Allergic reaction   No allergic reaction
Meals containing E263                8                    8
Meals not containing E263            0                    8
Please judge the likelihood that E263 causes allergic reactions in this patient.
Received November 28, 2001
Revision received December 13, 2002
Accepted December 16, 2002