Journal of Experimental Psychology: Learning, Memory, and Cognition, 2003, Vol. 29, No. 4, 710–727
Copyright 2003 by the American Psychological Association, Inc. 0278-7393/03/$12.00 DOI: 10.1037/0278-7393.29.4.710

Making Causal Judgments From the Proportion of Confirming Instances: The pCI Rule

Peter A. White
Cardiff University

It is proposed that causal judgments about contingency information are derived from the proportion of instances that are evaluated as confirmatory for the causal candidate (pCI). In 6 experiments, pCI values were manipulated independently of objective contingencies assessed by the ΔP rule. Significant effects of the pCI manipulations were found in all cases, but causal judgments did not vary significantly with objective contingencies when pCI was held constant. The experiments used a variety of stimulus presentation procedures and different dependent measures. The power PC theory, a weighted version of the ΔP rule, the Rescorla–Wagner associative learning model (R. A. Rescorla & A. R. Wagner, 1972), and the ΔD rule, which is the frequency-based version of the pCI rule, were unable to account for the significant effects of the pCI manipulations. These results are consistent with a general explanatory approach to causal judgment involving the evaluation of evidence and updating of beliefs with regard to causal hypotheses.

Suppose that Joe Snooks occasionally suffers from short-lived rashes of blue spots on his skin. Joe notices that these spots are particularly likely to appear after he has drunk elderflower tea and rarely appear at other times. Eventually he concludes that elderflower tea, or something in it, is causing the spots to appear. How does he arrive at that causal judgment?

The question seems innocuous, but it is an example of one of the most important research questions in causal cognition, concerning as it does how people make sense out of the confusion of multifarious events by detecting regularities and inducing causal relations from them. In essence, Joe makes a judgment about elderflower tea based on a set of instances in which he either does or does not drink elderflower tea, and the rashes of blue spots either occur or do not. Thus, there are four possible combinations of the presence and absence of the candidate cause and the effect, and causal judgment is in some way derived from the numbers of instances of each kind. This kind of information can be represented as a 2 × 2 contingency table, and Table 1 shows the letters that are conventionally used to identify the cells of the table. The degree of contingency between the candidate cause and the effect is described by the ΔP rule, shown in Equation 1 (Jenkins & Ward, 1965; McKenzie, 1994; Ward & Jenkins, 1965):

ΔP = p(E/C) − p(E/−C).     (1)

In this equation, p(E/C) equals the proportion of instances on which the effect occurs when the cause is present, in terms of Table 1 a/(a + b), and p(E/−C) equals the proportion of instances on which the effect occurs when the cause is absent, in terms of Table 1 c/(c + d). The rule produces values in the range −1 to +1, where +1 equals perfect positive contingency, −1 equals perfect negative contingency, and 0 equals zero contingency. The ΔP rule is a normative measure of contingency, but it is not a normative measure of causality. For example, there may be a high positive contingency between two things, not because one of them causes the other but because both of them are effects of a third factor.

Two other relevant variables should also be mentioned at this point: the proportion of instances on which the effect occurs, p(E), equals (a + c)/(a + b + c + d), and the proportion of instances on which the cause is present, p(C), equals (a + b)/(a + b + c + d).¹

A number of theoretical approaches to causal judgment from contingency information can be distinguished. Under one approach, causal judgments are inferences made by applying an inductive rule to contingency information. An example of an inductive rule is the equation for estimating generative causal power in the power PC theory (Cheng, 1997), given in Equation 5 below. Under another approach, causal judgments are derived from associative bonds between cues (candidate causes) and outcomes (effects) that are acquired by learning from experience with contingencies (Shanks, 1995). Under the present approach, contingency information is initially transformed into evidence; that is, instances are encoded in terms of their confirmatory or disconfirmatory value for the hypothesis that the causal candidate causes the effect, and causal judgment is derived from the weight of evidence. Specifically, causal judgment tends to increase as the proportion of confirmatory to disconfirmatory evidence increases. This general rule is called pCI, representing the proportion of confirmatory instances (White, 2000b).

Author note: I am grateful to John Pearce and David Shanks for helpful comments on earlier versions of this article and to Serena Harris for acting as experimenter in Experiments 2 and 6. Correspondence concerning this article should be addressed to Peter A. White, School of Psychology, Cardiff University, P.O. Box 901, Cardiff CF10 3YG, Wales, United Kingdom. E-mail: [email protected]

¹ Throughout the article, uppercase C stands for cause (except when it appears in pCI, proportion of confirmatory instances) and lowercase c refers to Cell c of the contingency table.
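As a concrete illustration (not part of the original article), Equation 1 can be computed directly from the cell counts of Table 1; the function name is my own.

```python
def delta_p(a, b, c, d):
    """Equation 1: deltaP = p(E/C) - p(E/-C), computed from the 2 x 2
    contingency cells of Table 1 (a, b: cause present; c, d: cause absent)."""
    return a / (a + b) - c / (c + d)

# Perfect positive contingency: the effect occurs exactly when the cause is present.
print(delta_p(10, 0, 0, 10))   # -> 1.0
# Zero contingency: the effect is equally likely either way.
print(delta_p(5, 5, 5, 5))     # -> 0.0
```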
Table 1
Conventional Identification of Cells in the 2 × 2 Contingency Table

                     Effect occurs
Candidate cause     Yes     No
Present              a       b
Absent               c       d

The four cells of the 2 × 2 contingency table have normative evidential values. Cell a is confirmatory because instances in which the candidate is present and the effect occurs tend to increase the likelihood that the candidate caused the effect. Such instances tend to increase ΔP unless it is already +1. Cell d instances also tend to increase ΔP and are also normatively confirmatory. Cells b and c both tend to decrease ΔP and are normatively disconfirmatory. With these normative evidential values, the principle of judging from the weight of evidence yields a simple rule of judgment, shown in Equation 2:

pCI = (a + d − b − c)/(a + b + c + d).     (2)

Like ΔP, this equation generates values in the range −1 to +1.² Like models based on ΔP, including the probabilistic contrast model (Cheng & Novick, 1990, 1992) and the power PC theory (Cheng, 1997), the pCI rule supports ordinal predictions: Causal judgment is assumed to be a monotonically increasing function of pCI. Equation 2 is a specific version of the general pCI rule that applies to the case of single causal candidates. Versions of the rule have been proposed and tested for judging complex causal interpretations (White, 2000b), interactions between two causal candidates (White, 2002b), and Kelleyan causal attributions (White, 2002a). For the remainder of this article the term pCI will be used to refer specifically to the form of the rule in Equation 2.

The central claim to be tested in this article is that the pCI rule predicts phenomena of causal judgment with a single causal candidate that are not predicted by any other extant rule or model. The article reports several experiments designed to test the rule and to compare the rule with the predictions of other rules and models.
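A hypothetical sketch of Equation 2, with a pair of cell counts chosen to show that pCI and ΔP can disagree when p(C) ≠ .5 (the function names are illustrative, not from the article):

```python
def pci(a, b, c, d):
    """Equation 2: cells a and d count as confirmatory, b and c as
    disconfirmatory, expressed as a proportion of all instances."""
    return (a + d - b - c) / (a + b + c + d)

def delta_p(a, b, c, d):
    """Equation 1, for comparison."""
    return a / (a + b) - c / (c + d)

# With only 4 of 24 cause-present instances, p(C) is far from .5
# and the two rules give different answers for the same table.
cells = (4, 0, 10, 10)
print(round(delta_p(*cells), 2))  # -> 0.5
print(round(pci(*cells), 2))      # -> 0.17
```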
It is not proposed that the rule explains all the phenomena of causal judgment. However, the predictive success of the rule implies that it should form part of a comprehensive account of causal judgment, and the form that such an account might take is discussed in the light of the experimental results.

The pCI rule is a proportional version of the ΔD or sum of diagonals rule that has been much investigated in contingency judgment research (Allan & Jenkins, 1983; Arkes & Harkness, 1983; Jenkins & Ward, 1965; Kao & Wasserman, 1993; Shaklee & Hall, 1983; Shaklee & Mims, 1982; Shaklee & Tucker, 1980; Shaklee & Wasserman, 1986; Shanks, 1987; Ward & Jenkins, 1965; Wasserman, Chatlosh, & Neunaber, 1983; Wasserman, Dorner, & Kao, 1990). This is shown in Equation 3:

ΔD = (a + d) − (b + c).     (3)

Ordinal predictions generated by the pCI rule and the ΔD rule are isomorphic when the number of instances per judgmental problem is held constant. However, because the pCI rule is a proportion-based rule and the ΔD rule is a frequency-based rule, the two rules make different predictions when the number of instances is manipulated for nonzero contingencies. One of the experiments reported below is designed to distinguish the two rules.

The experiments are designed to manipulate pCI values independently of ΔP. The formal relationship between pCI and ΔP is clarified (and illustrated with two figures) in Appendix A. As the figures show, when p(C) = .5, pCI and ΔP values are identical, and the two rules generate identical ordinal predictions for causal judgment. This is an important property of the pCI rule. Many studies have found that causal judgments are highly correlated with ΔP and that human judges are acutely sensitive to small objective differences in ΔP (e.g., Anderson & Sheu, 1995; Baker, Berbrier, & Vallée-Tourangeau, 1989; Shanks, 1985, 1987, 1995; Wasserman et al., 1983; Wasserman, Elek, Chatlosh, & Baker, 1993; White, 2000a; see also Allan, 1993; Cheng, 1997).
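Two properties just described can be checked mechanically (an illustrative sketch, not code from the article): scaling every cell count scales the frequency-based ΔD but leaves the proportion-based pCI untouched, and when p(C) = .5 the pCI and ΔP values coincide.

```python
def pci(a, b, c, d):
    """Equation 2 (proportion based)."""
    return (a + d - b - c) / (a + b + c + d)

def delta_d(a, b, c, d):
    """Equation 3 (frequency based): sum of diagonals."""
    return (a + d) - (b + c)

def delta_p(a, b, c, d):
    """Equation 1, for comparison."""
    return a / (a + b) - c / (c + d)

# Doubling every cell doubles deltaD but leaves pCI unchanged.
base = (6, 2, 2, 6)
doubled = tuple(2 * n for n in base)
print(delta_d(*base), delta_d(*doubled))  # -> 8 16
print(pci(*base), pci(*doubled))          # -> 0.5 0.5

# When p(C) = .5 (equal numbers of cause-present and cause-absent
# instances), pCI and deltaP give the same value.
print(pci(9, 3, 3, 9), delta_p(9, 3, 3, 9))  # -> 0.5 0.5
```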
However, in all of these studies p(C) = .5. This means that the ordinal predictions of the ΔP rule and the pCI rule for these studies are identical, and the results of the studies do not show which is the better predictor of causal judgments. In a small number of studies p(C) ≠ .5, and then it is possible to ascertain whether judgments vary with pCI when ΔP is held constant. Ideally, all other variables of relevance should be held constant in comparisons of this sort, in particular the two components of the ΔP rule, p(E/C) and p(E/−C). In addition, the proportion a/(a + d) should be held constant because many studies have found that Cell a information carries greater weight in causal judgment than Cell d information (Anderson & Sheu, 1995; Kao & Wasserman, 1993; Levin, Wasserman, & Kao, 1993; Mandel & Lehman, 1998; Wasserman et al., 1990).

Levin, Wasserman, and Kao (1993) conducted an experiment in which participants made causal judgments about each of a series of individual problems. In that experiment, it was possible to find pairs of problems that were matched for values of ΔP, p(E/C), p(E/−C), and a/(a + d). Comparisons could then be made between pairs that had different pCI values. There were four of these comparisons in the problems used by Levin et al. (1993), and in all four cases the higher mean observed judgment was found for the pair with the higher value of pCI. The relevant information is presented in Table 2.

In Wasserman, Kao, Van Hamme, Katagiri, and Young (1996, Experiment 1), some participants gave judgments after each trial and others only after the final trial. In what follows, means for these two groups have been averaged together. Among the 21 problems used by Wasserman et al., there are six pairs that have the same values of p(E/C) and of p(E/−C). The relevant information is presented in Table 3, which shows that the member of each pair with the higher value of pCI received the higher mean judgment in all six cases.
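The mining strategy just described, finding problem pairs matched on ΔP, p(E/C), and p(E/−C) but differing in pCI, can be sketched as follows. The cell counts here are hypothetical stand-ins, not the Levin et al. or Wasserman et al. stimuli, and for brevity the a/(a + d) criterion is omitted.

```python
from itertools import combinations

def profile(cells):
    """Matching criteria: (deltaP, p(E/C), p(E/-C)), rounded for comparison."""
    a, b, c, d = cells
    return (round(a / (a + b) - c / (c + d), 3),
            round(a / (a + b), 3),
            round(c / (c + d), 3))

def pci(cells):
    a, b, c, d = cells
    return (a + d - b - c) / (a + b + c + d)

# Hypothetical problems: 1 and 2 share a profile but differ in pCI.
problems = {1: (4, 0, 10, 10), 2: (12, 0, 6, 6), 3: (8, 8, 0, 8)}
matched = [(i, j) for i, j in combinations(problems, 2)
           if profile(problems[i]) == profile(problems[j])
           and pci(problems[i]) != pci(problems[j])]
print(matched)  # -> [(1, 2)]
```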
Each comparison therefore supports the pCI rule. In 13 of the problems ΔP = 0. In four of these (8, 9, 10, 11) pCI = −.08, in five (1, 2, 3, 4, 5) pCI = .00, and in four (6, 7, 12, 13) pCI = .08. Overall means for judgments in each of these three groups were −.45, .50, and 1.12, respectively, and this increasing trend also supports the hypothesis that judgments were made in accordance with the pCI rule.

Kao and Wasserman (1993, Experiment 1) used a large number of zero contingency problems. There were 16 possible paired comparisons between problems that had different values of pCI but the same value of p(E/C) and p(E/−C). In 13 out of 16 comparisons, the higher mean was found for the problem with the higher value of pCI. In this case, however, the proportion a/(a + d) was not constant across the comparisons, and the three exceptions all had the property that the hypothetically confirming information mostly consisted of Cell d instances. Therefore, it is possible that judgments were lower for the higher value of pCI in the three exceptions because the pCI value was inflated by the presence of a large number of Cell d instances that had relatively little influence on causal judgment. Kao and Wasserman (1993, Experiment 2) used eight nonzero contingency problems.

Table 2
Paired Comparisons Testing pCI as a Predictor of Causal Judgments

Problem pair     ΔP       pCI     Sum of means
2 + 9             .267     .36       .69
12 + 14           .267     .21       .64
3 + 5            −.267    −.36      −.83
8 + 15           −.267    −.21      −.67
17 + 24           .368     .58       .85
27 + 29           .368     .27       .79
18 + 20          −.368    −.58     −1.09
23 + 30          −.368    −.27      −.84

Note. Data adapted from Levin et al. (1993, Experiment 1). pCI = proportion of confirmatory instances.

² The original version of this rule was (a + d)/(a + b + c + d) (e.g., White, 2002c). However, for several reasons it is better to include all four cells in the numerator, so the version in Equation 2, first propounded by White (2003a), is preferred.
There were four possible comparisons between problems that had different values of pCI but the same value of p(E/C) and p(E/−C), with three groups of participants making 12 comparisons in all. In 9 of these comparisons, the higher mean was found for the problem with the higher value of pCI. Kao and Wasserman also used 13 zero contingency problems. Here, too, there were four possible comparisons among problems that had different values of pCI but the same value of p(E/C) and p(E/−C), with three groups of participants making 12 comparisons in all. In 9 of these comparisons, the higher mean was found for the problem with the higher value of pCI. As before, the 6 exceptions (out of 24 comparisons) all involved problems in which the high value of pCI was created by a large number of Cell d instances, so the same explanation could apply.

pCI and ΔP can also be distinguished in a study by Allan and Jenkins (1983). In that study p(C) and p(E), the proportion of instances in which the effect occurred, were manipulated orthogonally for problems with identical ΔP. In one pair of problems, ΔP = +.2 and p(E) = .8 in both. For one problem, p(C) = .5, and for the other, p(C) = .7. In this case, for the former problem pCI = +.20, and for the latter problem pCI = +.44; pCI therefore predicted higher causal judgments for the latter problem than for the former, and that is what Allan and Jenkins (1983) found. A second pair of problems ruled out the possibility that p(C) could be responsible for this difference. In this pair, ΔP = +.2 and p(E) = .2 in both. As before, for one problem p(C) = .5, and for the other p(C) = .7. In this case, for the former problem pCI = +.20 and for the latter pCI = −.04. This time, therefore, pCI predicted lower causal judgments for the latter problem than for the former. By contrast, if p(C) accounted for the first set of results, then the prediction would have been for higher causal judgments for the latter problem.
In fact the results supported the pCI prediction. Several published studies therefore support the prediction that judgments tend to vary with pCI values when ΔP is held constant. However, none of these experiments was designed as a test of the pCI rule. Furthermore, it is important to investigate whether the tendency of judgments to conform to pCI values depends on the kind of procedure used, and it is also important to investigate whether other models of causal judgment can account for the results. The present experiments were designed to accomplish these objectives.

The basic strategy of the experiments was to manipulate pCI while holding ΔP constant and, in some cases, vice versa. This manipulation is problematic because it tends to be confounded with variations in values of other variables. However, other variables can be held constant by constructing pairs of problems and comparing across the pairs. Table 4 gives the relevant information about the design used in Experiments 1, 2, and 3. There are in fact two distinct designs. The eight problems fall into four pairs, labeled .17, .33, .50, and .25. The first three pairs manipulate pCI values and are called the pCI design. In the problem designations, ".17," ".33," and so forth refer to values of pCI, and H and L refer to values of the other independent variable. This is a manipulation of both p(E/C) and p(E/−C) and is called the occurrence rate variable. The occurrence rate variable takes a high value in the H problems (1.0/0.5) and a low value in the L problems (0.5/0.0); p(E/C) and p(E/−C) have to be manipulated together because otherwise ΔP would vary. The manipulation has no intrinsic significance: It is simply the means by which other relevant variables can be held constant.
Table 3
Paired Comparisons Testing pCI as a Predictor of Causal Judgments

Problem    ΔP     p(E/C)   p(E/−C)    pCI    M judgment
7          .00     .33      .33       .08       0.66
9          .00     .33      .33      −.08      −1.00
10         .00     .67      .67      −.08       0.36
12         .00     .67      .67      +.08       2.57
14        −.25     .25      .50      −.20      −1.87
19        −.25     .25      .50      −.33      −4.44
17        −.25     .50      .75      −.20      −0.09
20        −.25     .50      .75      −.33      −1.53
16         .25     .50      .25       .20       1.73
21         .25     .50      .25       .33       2.74
15         .25     .75      .50       .20       2.39
18         .25     .75      .50       .33       5.62

Note. Data adapted from Wasserman et al. (1996, Experiment 1). p(E/C) = proportion of instances on which the effect occurs when the cause is present; p(E/−C) = proportion of instances on which the effect occurs when the cause is absent.

Table 4
Stimulus Materials for Experiments 1, 2, and 3

           Cell frequencies
Problem    a    b    c    d    pCI    ΔP    p(E)   p(C)   p(E/C)  p(E/−C)  a/(a + d)    p     ΔPw
.17H       4    0   10   10    .17    .50    .58    .17    1.00     .50       .29      1.00    .70
.17L      10   10    0    4    .17    .50    .42    .83     .50     .00       .71      0.50    .50
.33H       8    0    8    8    .33    .50    .67    .33    1.00     .50       .50      1.00    .70
.33L       8    8    0    8    .33    .50    .33    .67     .50     .00       .50      0.50    .50
.50H      12    0    6    6    .50    .50    .75    .50    1.00     .50       .67      1.00    .70
.50L       6    6    0   12    .50    .50    .25    .50     .50     .00       .33      0.50    .50
.25H      12    4    4    4    .33    .25    .67    .67     .75     .50       .75      0.50    .45
.25L       4    4    4   12    .33    .25    .33    .33     .50     .25       .25      0.33    .35

M of .17H & .17L               .17    .50    .50    .50     .75     .25       .50       .75    .60
M of .33H & .33L               .33    .50    .50    .50     .75     .25       .50       .75    .60
M of .50H & .50L               .50    .50    .50    .50     .75     .25       .50       .75    .60
M of .25H & .25L               .33    .25    .50    .50     .62     .37       .50       .42    .40

Note. Cell frequencies a, b, c, and d follow the cell identifications in Table 1. pCI = proportion of confirmatory instances; p(E) = (a + c)/(a + b + c + d); p(C) = (a + b)/(a + b + c + d); p = causal power; ΔPw = weighted ΔP; H = high; L = low; p(E/C) = proportion of instances on which the effect occurs when the cause is present; p(E/−C) = proportion of instances on which the effect occurs when the cause is absent. The bottom four rows of the table present mean values of each variable for each pair.
These rows show that, for the three pairs in the pCI manipulation, mean values of all variables, namely ΔP, p(E), p(C), p(E/C), p(E/−C), and a/(a + d), are held constant across the pairs. The proportion a/(a + d) must be held constant because, as stated above, Cell a instances tend to influence causal judgment more than Cell d instances do, and this differential influence must be controlled. With this design, therefore, the main effect provides a test of the pCI rule that is unconfounded with all of these variables: If there is a significant main effect of pCI, it cannot be attributed to any of these other variables.³ Table 4 also presents values of two other variables, designated p and ΔPw. These variables are considered in a later section of the article in which tests of other models of causal judgment are discussed.

The .33 pair and the .25 pair form the second design in the experiment. They manipulate ΔP values and are called the ΔP manipulation. There are two values of ΔP, .50 and .25, while pCI is held constant at .33. The bottom section of Table 4 shows that this manipulation holds p(E), p(C), and a/(a + d) constant. It is impossible to hold p(E/C) and p(E/−C) constant because these are the constituents of ΔP, so this design holds constant as many variables as possible in a ΔP manipulation. Under the hypothesis that causal judgments are made in accordance with the pCI rule, it is predicted that there will be a significant main effect of pCI and that the main effect of ΔP will not be significant.

The first three experiments test these predictions using the design in Table 4 with three different presentation procedures, to test the further prediction that similar judgmental tendencies will occur regardless of the procedure used. Experiment 1 follows the procedure used by White (2000b). This is the instance list procedure, in which instances are presented individually but on a single sheet of paper so that participants can scan them freely.
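As a sanity check (my own sketch, with the cell frequencies transcribed from Table 4), the constancy of the control variables across the three pCI pairs can be verified mechanically:

```python
# (a, b, c, d) for the six problems of the pCI design, from Table 4.
problems = {
    ".17H": (4, 0, 10, 10), ".17L": (10, 10, 0, 4),
    ".33H": (8, 0, 8, 8),   ".33L": (8, 8, 0, 8),
    ".50H": (12, 0, 6, 6),  ".50L": (6, 6, 0, 12),
}

def derived(cells):
    """Derived quantities for one problem, as defined in the text."""
    a, b, c, d = cells
    n = a + b + c + d
    return {"pCI": (a + d - b - c) / n,
            "dP": a / (a + b) - c / (c + d),
            "p(E)": (a + c) / n,
            "p(C)": (a + b) / n,
            "a/(a+d)": a / (a + d)}

for level in (".17", ".33", ".50"):
    h, l = derived(problems[level + "H"]), derived(problems[level + "L"])
    pair_mean = {k: round((h[k] + l[k]) / 2, 2) for k in h}
    print(level, pair_mean)
# pCI pair means rise (.17, .33, .50) while dP, p(E), p(C), and a/(a+d)
# all average to .50 in every pair, as the bottom rows of Table 4 report.
```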
Experiment 2 uses the serial presentation procedure that has been used in many studies (e.g., Allan & Jenkins, 1983, Experiment 3; Anderson & Sheu, 1995, Experiment 1; Catena, Maldonado, & Cándido, 1998; Dennis & Ahn, 2001; Lober & Shanks, 2000, Experiments 1, 2, and 3; Ward & Jenkins, 1965; Wasserman et al., 1996; White, 2000a). In this procedure, instances are presented sequentially, each one being withdrawn before the next is presented. Experiment 3 uses the summary presentation procedure that has also been used in many studies (e.g., Kao & Wasserman, 1993; Levin et al., 1993; Lober & Shanks, 2000, Experiments 4, 5, and 6; Mandel & Lehman, 1998; Ward & Jenkins, 1965; Wasserman et al., 1990, Experiments 2 and 3). In this procedure, participants are presented with actual frequencies of instances in each cell, in the form of a 2 × 2 table or in a column or a pie chart.

The experiments were designed to test the pCI rule. In addition to the effects predicted by the pCI rule, some significant interactions were obtained with a similar pattern in all the experiments. These interactions are not predicted by the pCI rule. In the General Discussion section, however, it will be shown that a simple modification to the pCI rule suffices to account for the interactions. Therefore, the interactions are described in the respective Results sections and addressed in the General Discussion section.

³ Values of these other variables are kept constant by averaging over two problems. This is necessary for an unconfounded test of pCI, but it has a potential weakness in that it assumes a linear relationship between these variables and their effects on causal judgments. If this assumption does not hold, the tests of pCI are not fully unconfounded. This danger can be discussed with reference to the pCI design of Experiments 1, 2, and 3, as shown in Table 4. The danger does not apply to ΔP because all six problems have the same value of ΔP. It does not apply to p(E/C), p(E/−C), p (Cheng's, 1997, measure of causal power), or weighted ΔP because each pair of problems has the same two values of each of these variables: For example, in the case of p(E/C), the value of one member of a pair is always 1.0 and that of the other is always 0.5. The danger only applies to the other three variables, p(E), p(C), and a/(a + d). The danger, in essence, is that the effect on causal judgment of changing p(E) (for example) from .58 to .75 (problems .17H and .50H) might be greater or lesser than the effect of changing p(E) from .42 to .25 (problems .17L and .50L), even though the size of the change is the same. There is no published evidence with which to assess this danger. However, even if the size of the effect is not the same, the direction of effect should be. If increasing p(E) raises causal judgment (as past research indicates; Allan, 1993), then decreasing p(E) should lower causal judgment. However, Table 5 shows that this did not happen. Mean judgments were higher in .50L, with the lower value of p(E), than in .17L, with the higher value of p(E). The difference between the .50 problems and the .17 problems, therefore, cannot be accounted for in terms of unequal effects of changes in p(E). If p(E) had any effect on causal judgment, it was not enough to counteract the effect of the pCI manipulation. This is shown even more clearly in Experiment 5, where extreme increases in p(E) across one set of four problems are matched by extreme decreases in p(E) across the other, and causal judgments tended to go up across both; see Tables 9 and 10. The same applies to p(C) and a/(a + d). In fact, the possibility of unequal effects of changes in p(C) is ruled out by the design of Experiment 5, where values of p(C) are the same across the pCI manipulation in that experiment.

Experiments 1, 2, and 3

Method

Participants.
The participants were 1st-year undergraduate students at Cardiff University, participating in return for course credit. Fifty students participated in Experiment 1, but 1 student was excluded prior to data analysis for omitting one response. Therefore, n = 49 throughout. Forty students participated in Experiment 2 and 36 in Experiment 3.

Stimulus materials. Initial written instructions to participants in Experiment 1 were as follows:

Imagine that you are a doctor investigating patients suffering from severe allergic reactions. You are trying to find out whether their allergic reactions are caused by substances in the food they eat. People vary a great deal in the way they are affected by different substances, and you have to assess each one on an individual basis. You are investigating the effects on some of your patients of a particular food additive called Benzoic acid (BA). You ask your patients to eat a certain number of meals in which BA is either present or absent. So, for each patient, you'll see information about several different meals, showing whether BA was present or absent in a given meal, and whether the patient had an allergic reaction or not. For each patient you will see a statement at the bottom of the page saying "BA causes the allergic reaction in this patient". Your task is to judge the extent to which this statement is right for that patient. To make your judgment, write a number from 0 (zero) to 100 beside the question. 0 (zero) means that the statement is definitely not right, and 100 means that the statement is definitely right. The more likely you think it is that a given statement is right for that patient, the higher the number you should put.

There were eight problems, one on each page. At the top of the page a patient was identified by a set of initials. There followed a list of 24 observations, each one specifying whether BA was present or absent and whether an allergic reaction occurred or not.
Under these was a request to "rate the likelihood of this statement being right for patient [initials]: BA causes the patient's allergic reactions". An example problem is shown in Appendix B. This is the .33L problem.

Initial instructions in Experiment 2 were similar except that the name of the chemical was changed to manganese trioxide and information about the presentation procedure was omitted. Initial instructions in Experiment 3 were a modified version of those for Experiment 1. Instead of the passages describing the instance-list presentation format, there was a paragraph describing the summary table format. An example problem in this format is shown in Appendix C; this too is the .33L problem. Instructions for making the causal judgment were as follows:

Under this table you will see the following statement: "Please judge the likelihood that E263 causes allergic reactions in this patient." To do this you should write a number from 0 to 100 beside the statement. 0 means that E263 definitely does not cause allergic reactions in this patient, and 100 means that E263 definitely does cause allergic reactions in this patient. The more likely it is that E263 causes allergic reactions in the patient, the higher the number you should put.

The eight problems were then presented one to a page, all in the format shown in Appendix C.

Design. The design for all three experiments was as described above. To summarize, there were two separate experimental designs. One, the pCI design, manipulated two variables. One variable was pCI, with three values: .17, .33, and .50. The other variable was the occurrence rate manipulation. There were two pairs of values of p(E/C) and p(E/−C), 1.0/0.5 and 0.5/0.0, respectively. Cell frequencies and values of other variables for each of the six individual problems in this design are shown in Table 4. The other design was the ΔP design, which also manipulated two variables. One variable was ΔP, with two values: .50 and .25.
The other was p(E), which is also called the occurrence rate manipulation. This had two values, .67 and .33. The function of this manipulation was to hold constant all possible relevant variables other than ΔP and its components p(E/C) and p(E/−C), namely pCI, p(E), p(C), and a/(a + d). Cell frequencies and values of other variables for the four individual problems in this design are also shown in Table 4 (the .33 pair and the .25 pair).

Procedure. Participants in Experiments 1 and 3 were tested in groups of 2 or 3 in a large and comfortably furnished office. They were presented with a booklet containing the instructions for the task on the first page, followed by the eight problems in random order. A different random order was generated for each participant. For participants in Experiment 1, the 24 observations within each problem were also randomly ordered. Participants were invited to ask questions if anything in the instructions was not clear (none did so). When the participants had completed the questionnaire, they were thanked and given course credit.

Experiment 2 was conducted by an experimenter who was blind to the aims, hypotheses, and design of the study. The participant was initially presented with the written instructions, and the experimenter then explained the procedure, described below. At the start of a problem, the experimenter placed a card in front of the participant with the patient's initials on it. Trials then began. On each trial, the experimenter stated the meal number and whether manganese trioxide was present or absent. The participant then predicted whether an allergic reaction would occur or not. The experimenter then informed the participant whether their prediction was correct or not: If the participant said yes, and if an allergic reaction did occur, the experimenter said, "Correct—an allergic reaction did occur"; if an allergic reaction did not occur, the experimenter said, "Incorrect—an allergic reaction did not occur".
If the participant said no, and if an allergic reaction did occur, the experimenter said, "Incorrect—an allergic reaction did occur"; if an allergic reaction did not occur, the experimenter said, "Correct—an allergic reaction did not occur." This procedure was repeated for 24 trials in each problem. At the end of the 24 trials, the participant was asked to judge the extent to which they agreed with the statement that manganese trioxide causes allergic reactions in the patient whose initials were in front of them. Instructions for making this judgment were as in Experiment 1, except that the judgment was reported verbally to the experimenter rather than written down. The experimenter recorded the judgment.

The first problem was a practice problem using a moderate positive contingency with the same number of trials (24) as the experimental problems. At the end of the practice trials, the experimenter checked that the participant was ready to proceed, and the eight experimental problems were then presented in random order, with a different random order being used for each participant. Within-problem trials were also randomly ordered, and different random orders were used for different participants. At no time did the participant see any written information other than the instructions and patient initials.

Results

Data from the pCI design in each experiment were analyzed with a 3 (pCI: .17 vs. .33 vs. .50) × 2 (high vs. low occurrence rate) analysis of variance (ANOVA) with repeated measures on both factors. Data from the ΔP manipulation in each experiment were analyzed with a 2 (ΔP: .50 vs. .25) × 2 (high vs. low occurrence rate) ANOVA with repeated measures on both factors. Mean causal judgments for all three experiments are reported in Table 5.

Experiment 1. Analysis of the pCI design yielded a significant effect of pCI, F(2, 96) = 32.73, p < .001.
Post hoc paired comparisons with the Newman-Keuls test revealed the order .50 > .33 > .17 (p < .01 in all cases). The main effect of occurrence rate was not significant, F(1, 48) = 2.40. There was, however, a significant interaction between the two variables, F(2, 96) = 4.32, p < .05. Simple effects analysis revealed that the effect of pCI was significant at both occurrence rates: for high occurrence rate, F(2, 96) = 25.78, p < .001, and for low occurrence rate, F(2, 96) = 8.99, p < .001. There was a significant effect of occurrence rate at pCI = .50, F(1, 48) = 6.43, p < .05, with higher ratings at the higher value of occurrence rate. There was no significant effect of occurrence rate at the other levels of pCI: at pCI = .17, F(1, 48) = 1.16, and at pCI = .33, F(1, 48) = 2.57. As explained above, this interaction is addressed in the General Discussion section. However, the interaction serves to make an important point about the way the pCI rule was tested here. Simple effects are not the proper way to test the predictions of the pCI rule, because at the level of simple effects, values of pCI are confounded with values of other variables, notably a/(a + d). It is the main effect that provides the proper test of the predictions of the pCI rule, because at that level pCI is unconfounded with respect to these other variables.

Table 5
Mean Causal Judgments, Experiments 1, 2, and 3

Problem     Exp. 1   Exp. 2   Exp. 3
.17H         39.22    39.97    39.83
.17L         43.22    46.87    48.31
.33H         53.71    59.85    48.33
.33L         46.63    46.65    49.72
.50H         66.59    73.80    56.31
.50L         56.63    55.22    47.53
.25H         54.61    61.12    50.86
.25L         39.08    42.90    34.22
.17 mean     41.22    43.42    44.07
.33 mean     50.17    53.25    49.03
.50 mean     61.61    64.51    51.91
.25 mean     46.85    52.01    42.54

Note. H = high; L = low.

Analysis of the ΔP design revealed that the main effect of ΔP was not statistically significant, F(1, 48) = 2.87. There was a significant effect of occurrence rate, F(1, 48) = 13.25, p < .001, with higher ratings at the high level of occurrence rate.
This is also addressed in the General Discussion section. The interaction between the two variables was not statistically significant, F(1, 48) = 1.89.

Experiment 2. Analysis of the pCI design yielded a significant effect of pCI, F(2, 78) = 42.84, p < .001. Post hoc paired comparisons with the Newman-Keuls test revealed the order .50 > .33 > .17 (p < .01 in all cases). There was a significant main effect of occurrence rate, F(1, 39) = 7.15, p < .05, with higher ratings in the high occurrence rate condition. There was also a significant interaction between the two variables, F(2, 78) = 11.58, p < .001. Simple effects analysis revealed that the effect of pCI was significant at both occurrence rates: for high occurrence rate, F(2, 78) = 44.39, p < .001, and for low occurrence rate, F(2, 78) = 3.69, p < .05. There was a significant effect of occurrence rate at pCI = .33, F(1, 39) = 12.64, p < .01, and at pCI = .50, F(1, 39) = 14.59, p < .001, but not at pCI = .17, F(1, 39) = 2.11. The interaction has the same pattern as the corresponding interaction in Experiment 1, as Table 5 shows. Analysis of the ΔP design revealed that the main effect of ΔP was not statistically significant, F(1, 39) = 0.26. There was a significant effect of occurrence rate, F(1, 39) = 38.70, p < .001, with higher ratings in the high occurrence rate condition. The interaction was not significant, however, F(1, 39) = 0.89.

Experiment 3. Analysis of the pCI design yielded a significant main effect of pCI, F(2, 70) = 3.71, p < .05. Post hoc comparisons with the Newman-Keuls test revealed higher ratings when pCI = .50 than when pCI = .17. The mean for pCI = .33 fell between these two as predicted but was not significantly different from either. The main effect of occurrence rate was not significant, F(1, 35) = 0.01. There was a significant interaction, F(2, 70) = 5.34, p < .01. This interaction has the same pattern as that found in Experiments 1 and 2. Means are reported in Table 5.
Analysis of the ΔP design revealed a significant main effect of ΔP, F(1, 35) = 5.65, p < .05, with higher judgments at the higher level of ΔP. The main effect of occurrence rate fell short of statistical significance, F(1, 35) = 3.83, p = .06. There was a significant interaction between the two variables, F(1, 35) = 4.68, p < .05. Simple effects analysis revealed a significant effect of ΔP at low occurrence rate, F(1, 35) = 8.26, p < .01, but not at high occurrence rate, F(1, 35) = 0.31. There was a significant effect of occurrence rate at the lower value of ΔP, F(1, 35) = 11.78, p < .01, but not at the higher value of ΔP, F(1, 35) = 0.05. Essentially, the mean for the .25L problem was different from the other three means, as Table 5 shows.

Discussion

All three experiments yielded similar results, supporting the pCI rule. When pCI was manipulated and ΔP held constant, mean ratings varied significantly with pCI. When ΔP was manipulated and pCI held constant, there was no significant effect in Experiments 1 and 2. In Experiment 3, unlike the others, there was a significant main effect of ΔP. There are at least two possible interpretations of this result. There is some evidence of more sophisticated rule use in the summary presentation procedure than in the serial presentation procedure (Arkes & Harkness, 1983; Lober & Shanks, 2000; Ward & Jenkins, 1965), so the result may show that participants were more inclined to judge in accordance with ΔP in the summary presentation procedure. Kao and Wasserman (1993), using a large set of zero contingency problems, found that 9.5% of their participants could be classified as ΔP users on the grounds that they responded zero to every problem. This would equate to 3 or 4 participants in the present experiment, perhaps not enough to account for the significant effect of ΔP. Another possibility is that participants based judgments more on p(E/C).
The mean value of p(E/C) for the .33 problems was 0.75, whereas the mean value was 0.62 in the .25 problems (Table 4). Anderson and Sheu (1995) found that 22.5% of their participants based judgments only on Cells a and b. This equates to 8 or 9 participants in the present study, which might be enough to produce a significant effect. On the other hand, the effect of ΔP was qualified by a significant interaction, and it was found that the .25L problem was different from the other three. Table 4 shows that this problem has the lowest value of a/(a + d), so it is also possible that judgments were significantly influenced by this variable. These possibilities are not mutually exclusive, and it is possible that all of them contributed to the observed effect. It is not clear, however, why the effect should appear with the summary presentation procedure and not with the procedures used in Experiments 1 and 2. There have been only a few direct comparisons between different presentation procedures (Arkes & Harkness, 1983; Kao & Wasserman, 1993; Lober & Shanks, 2000; Ward & Jenkins, 1965), and more needs to be learned about the reasons for differences between them. It is at least possible to conclude that significant effects of the pCI manipulation occurred with all three procedures.

Tests of Other Models

This section assesses the predictions of the foremost models in the current literature: the power PC theory (Cheng, 1997), a weighted version of the ΔP rule (Lober & Shanks, 2000), and the application to human causal judgment of the Rescorla–Wagner model (R-W model; Rescorla & Wagner, 1972) of associative learning (Allan, 1993; Shanks, 1995; Wasserman et al., 1996).

The R-W Model

The general principle of the associative learning approach to human causal judgment is that causal judgment follows the accumulation of a record of experiences, forming an association between mental representations of a cue (or an action) and an outcome.
When the cue is present and the outcome occurs, the associative bond tends to strengthen unless it has already reached asymptote. When the outcome does not occur in the presence of the cue, the associative bond tends to weaken. The stronger the associative bond, the more likely it is that the cue will be judged a cause of the outcome. The associative learning approach has generally taken as its model the R-W model (Rescorla & Wagner, 1972). The application of the R-W model to human causal judgment has been thoroughly described by many authors (e.g., Allan, 1993; Shanks, 1993, 1995; Shanks & Dickinson, 1987; Wasserman et al., 1996) and is briefly summarized here. In this case, there is a stimulus (the food additive, BA, in Experiment 1), which may or may not be present on any given trial, and another stimulus, the meal, in which the food additive is included and which is present on every trial (although the rest of the content of the meal is not specified). For this situation, the change in strength of an association between a stimulus and an outcome is represented in Equation 4:

ΔVS = αβ(λ − VT).    (4)

In this equation, ΔVS represents the change in associative strength of the stimulus S on a trial on which S occurs. VT is the combined associative strength of all the stimuli present on the trial, including S. α is a learning rate parameter representing the salience of the stimulus, and it determines the rate at which learning reaches asymptote. β is a learning rate parameter associated with the outcome and can be specified separately for occurrences of the outcome (reinforced trials) and nonoccurrences of the outcome (nonreinforced trials). λ denotes the maximum level of associative strength that can be supported by the outcome.
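The trial-level updating described by Equation 4 can be sketched in code. The following is a minimal illustration only, not the simulation actually used in the work reported below; the function name, the default learning rates, and the number of training epochs are assumptions made for this sketch. It assumes one context stimulus (the meal) present on every trial and one cue (the additive) present only on Cell a and Cell b trials:

```python
import random

def rescorla_wagner(a, b, c, d, alpha=1.0, lam=1.0,
                    beta_reinf=0.05, beta_nonreinf=0.05,
                    epochs=1000, seed=1):
    """Equation 4 applied to one context stimulus (the meal, present on
    every trial) and one cue (the additive, present only on Cell a and
    Cell b trials). a, b, c, d are the 2x2 cell frequencies. Returns the
    cue's associative strength averaged over the second half of training,
    because the asymptote fluctuates with trial order."""
    rng = random.Random(seed)
    # Each trial is a (cue present?, outcome occurs?) pair.
    trials = ([(True, True)] * a + [(True, False)] * b +
              [(False, True)] * c + [(False, False)] * d)
    v_context = v_cue = 0.0
    history = []
    for epoch in range(epochs):
        rng.shuffle(trials)
        for cue_present, outcome in trials:
            v_total = v_context + (v_cue if cue_present else 0.0)
            beta = beta_reinf if outcome else beta_nonreinf
            # Equation 4: delta V = alpha * beta * (lambda - V_T), applied
            # to every stimulus present on the trial.
            delta = alpha * beta * ((lam if outcome else 0.0) - v_total)
            v_context += delta
            if cue_present:
                v_cue += delta
        if epoch >= epochs // 2:
            history.append(v_cue)
    return sum(history) / len(history)
```

With equal learning rates for reinforced and nonreinforced trials, the cue's strength converges on ΔP (Chapman & Robbins, 1990): for example, a problem with a = 12, b = 0, c = 6, d = 6 has ΔP = .5, and the returned value lies close to .5. Setting beta_reinf and beta_nonreinf unequal produces the asymmetric asymptotes discussed below.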
Each of the eight problems in Experiments 1, 2, and 3 was entered into a simulation of the R-W model, with the meal that was present on all types of trial represented by one stimulus and the food additive that was present on two types of trial (i.e., Cell a and Cell b trials) represented by another stimulus. Asymptotic outputs were recorded. In the simulation, α and λ were both set at 1. Four simulations were run: two in which β was the same for reinforced and nonreinforced trials, one in which it was higher for reinforced than for nonreinforced trials, and one in which the opposite was the case. The outputs of these four simulations for each problem are reported in Table 6.⁴ Asymptotic values in Table 6 are means for different random orderings of trials. The asymptote is sensitive to trial order, particularly to the identity of the last trial, and tends to fluctuate by ±.02. Table 6 shows that the simulation outputs converge on ΔP when both learning rates are the same, a property of the R-W model demonstrated by Chapman and Robbins (1990) and further confirmed by other authors since then (Allan, 1993; Shanks, 1993, 1995). This is not the case when the learning rates differ. When β for reinforced trials is greater than that for nonreinforced trials, asymptote is reduced for problems .17H, .33H, .50H, and .25H, and raised for problems .17L, .33L, .50L, and .25L. The opposite is the case when β for nonreinforced trials is greater than that for reinforced trials. In all four simulations, however, the asymptotic values do not resemble the ordinal pattern of differences in judgment found in Experiments 1, 2, and 3.

⁴ The simulation was designed by John Pearce. Prior testing established that the program conforms correctly to the Rescorla–Wagner (1972) model and, in particular, that when β for reinforced trials equals β for nonreinforced trials, the asymptotic values produced by the simulation are isomorphic with ΔP values. I am grateful to John for his assistance in running the simulations reported herein.

Table 6
Asymptotic and Preasymptotic Output of Simulations of the Rescorla–Wagner Model for Problems Used in Experiments 1, 2, and 3

             Asymptotic output               Preasymptotic     Mean judgment
Problem   .02/.02   .1/.1   .02/.01  .01/.02     1     10     Exp. 1  Exp. 2  Exp. 3
.17H        .50      .50      .33      .66      .06    .30     39.22   39.97   39.83
.17L        .50      .50      .66      .33      .14    .32     43.22   46.87   48.31
.33H        .50      .50      .33      .66      .13    .41     53.71   59.85   48.33
.33L        .50      .50      .66      .33      .12    .34     46.63   46.65   49.72
.50H        .50      .50      .33      .66      .18    .45     66.59   73.80   56.31
.50L        .50      .50      .66      .33      .10    .35     56.63   55.22   47.53
.25H        .25      .25      .19      .27      .17    .30     54.61   61.12   50.86
.25L        .25      .25      .26      .18      .06    .20     39.08   42.90   34.22

Note. Column headings for the asymptotic output columns are learning rate for reinforced trials/learning rate for nonreinforced trials. Column headings for the preasymptotic output columns are numbers of iterations; the learning rates for the preasymptotic output columns were .02/.02. Exp. = experiment; H = high; L = low.

Regardless of what assumptions are made about learning rates, asymptotes for the .25 pair are always lower than those for the other six problems. Furthermore, asymptotic values sum to 1 (to a close approximation) for each of the three pairs in the pCI design and therefore fail to reproduce the main effect of pCI that is the principal finding of the three experiments. The possibility must also be considered that judgments obtained in the experiments had not reached asymptote after 24 trials, the number used in these experiments. To test this possibility, the simulation was conducted again with low learning rates, and output was recorded after 1 iteration of the 24-trial sequence and also after 10 iterations, at which point output values were still well below asymptotic levels. Table 6 shows that the output after 1 iteration yielded a better fit with the observed means. The sums of output values were .20 for the .17 pair, .25 for the .33 pair, and .28 for the .50 pair.
Moreover, the sum for the .25 pair was .23, which is intermediate between the highest and lowest values. Therefore, the simulation reproduced the main effect of pCI and the finding that the sum of the means for the .25 pair was similar to that for the .33 pair. The problem with this result is that the preasymptotic output values after one iteration model Cell a frequency. The correlation between the output values and Cell a frequency was .99, but the correlation between output values and pCI was only .33. As these correlations imply, pCI was itself positively correlated with Cell a frequency in these stimulus materials. Thus, the apparent success of the preasymptotic output in modeling the main effect of pCI is attributable merely to the positive correlation between Cell a frequency and pCI in the stimulus materials. After 10 iterations, the output values still increased in accordance with the main effect of pCI. The sums of output values were .62 for the .17 pair, .75 for the .33 pair, and .80 for the .50 pair. However, at this stage the sum for the .25 pair was .50, substantially below that for the lowest value of pCI. At this stage, therefore, the simulation did not reproduce the observed results for this pair. To test this further, the sums of the output values and the mean values of pCI were correlated with individual mean judgments across the four pairs of problems (from Experiment 2, because the serial presentation procedure is the closest to a learning situation in these three experiments). The mean correlations were .66 for pCI and .40 for the R-W model output. These two means differed significantly, F(1, 39) = 15.00, MSE = 0.09, p < .001. Thereafter, the output values increasingly tended to converge on ΔP values. If there is any stage of acquisition at which the output successfully models the obtained results, it must be prior to 10 iterations.
By that stage, not only has the output for the .25 pair already fallen below that for the .17 pair, remaining there until asymptote is reached, but the R-W model output is a significantly poorer predictor of individual judgments than the pCI rule, as just shown. It needs to be emphasized that, with these learning rates, output values after 10 iterations are still below the asymptotic values. This implies that the R-W model could only account for the present results if judgments were even further short of their asymptotic values. This is very unlikely. There were 24 trials, and other experiments using similar methods have shown that learning in human contingency judgment experiments is usually rapid, with little or no change in judgment after 20 trials (Anderson & Sheu, 1995; Baker et al., 1989; Catena et al., 1998; Lober & Shanks, 2000; Shanks, 1985). Thus, although it is impossible to be sure that the judgments observed in these experiments were asymptotic, the R-W model could only account for the observed trends if learning had proceeded far more slowly than in the experiments just cited. There is no clear reason why this should happen. In conclusion, asymptotic output of the R-W model simulation failed to reproduce the main effect of pCI regardless of what values were set for β for reinforced and nonreinforced trials. Preasymptotic output after one iteration did reproduce the main effect of pCI, but this result can be explained as a by-product of the fact that the model's output was isomorphic with Cell a frequencies in the early stages of acquisition. It is therefore unlikely that the main effect of pCI can be explained by the R-W model.

The Power PC Theory

In the power PC theory (Cheng, 1997), it is assumed that people have the preconceived belief that things have causal powers to produce certain effects. These powers are unobservable, so the task of causal induction is to arrive at an estimate of the strengths of causal powers.
To do this, reasoners seek to distinguish the contribution to an effect of the candidate being judged from the contribution of the set of all other candidates. For a single causal candidate, this is achieved by computing ΔP for the candidate and then adjusting ΔP in the light of information about the base rate of occurrence of the effect when the candidate is absent. In the case of generative causal power, this is modeled by Equation 5:

p = ΔP/[1 − p(E/−C)].    (5)

In this equation, p stands for the causal power of the candidate being judged. The equation is valid when the causal candidate and the set of other causes of the effect are independent of each other. This account was tested against the results of Experiments 1, 2, and 3 by computing values of p for each of the eight problems. The results of these computations are reported in Table 4. The table shows that p is higher than ΔP for problems .17H, .33H, and .50H, and identical to ΔP for problems .17L, .33L, and .50L. However, the mean value of p is the same for each pair, .75. This shows that p is no more able to explain the significant effect of pCI than ΔP is. Moreover, the mean value of p for the .25 pair (.42) is less than the mean value for the .33 pair (.75). This means that the power PC theory, like ΔP, predicts lower mean judgments for the .25 pair than for the .33 pair, and this prediction was not supported either. The power PC theory, therefore, fails to account for the principal results of the first three experiments: the significant main effect of pCI and the lack of a significant effect of ΔP.

The Weighted ΔP Rule

The ΔP statistic is an objective measure of contingency. However, as a model of causal judgment, it assumes equal weighting of cells and may therefore be an inaccurate predictor if there are consistent tendencies in the weighting of cells.
Lober and Shanks (2000) found a better fit to their data if p(E/C) was weighted w1 = 1.0 and p(E/−C) was weighted w2 = .6, so a version of ΔP with these weights was tested on the present data. Weighted ΔP values are shown in the right column of Table 4. As the table shows, weighted ΔP values were highly correlated with p values and encountered similar problems. They failed to predict the main effect of pCI, and they predicted higher ratings for the .33 pair than for the .25 pair, which were not found. No combination of weights given to p(E/C) and p(E/−C) predicts the main effect of pCI. For example, if p(E/C) is weighted w1 = 1.0 and p(E/−C) is weighted w2 = .2, weighted ΔP values for the first six problems in Table 4 are .9, .5, .9, .5, .9, and .5, respectively. The reason for this is that the values of p(E/C) and p(E/−C) do not vary between the pairs of problems: p(E/C) = 1 in problems .17H, .33H, and .50H, and .5 in .17L, .33L, and .50L, whereas p(E/−C) = .5 in problems .17H, .33H, and .50H, and 0 in .17L, .33L, and .50L. Any weight must generate the same product for a given value of p(E/C) or p(E/−C), so weighting the probabilities can never model a main effect of pCI when ΔP is held constant.

Experiment 4

I stated in the introduction that the pCI rule is a proportional version of the frequency-based ΔD rule. In the ΔD rule (Equation 3), the sum of disconfirming instances is subtracted from the sum of confirming instances, and causal judgment is hypothetically based on the result. When the number of instances is held constant across problems, as in the design shown in Table 4, pCI and ΔD values are isomorphic, entailing that the two rules make identical ordinal predictions. However, the ΔD rule has a property not shared by the pCI rule: values of ΔD tend to increase as the number of instances increases when there is a positive contingency (Shanks, 1987).
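The competing indices discussed in this section can all be computed directly from the 2 × 2 cell frequencies of Table 1. The helper below is a sketch for illustration (the function name and rounding are assumptions; the weights are those reported by Lober & Shanks, 2000):

```python
def indices(a, b, c, d):
    """Contingency indices from 2x2 cell frequencies (Table 1 layout):
    a: cause present, effect present;  b: cause present, effect absent;
    c: cause absent, effect present;   d: cause absent, effect absent."""
    n = a + b + c + d
    p_e_c = a / (a + b)          # p(E/C)
    p_e_notc = c / (c + d)       # p(E/-C); must be < 1 for the power index
    delta_p = p_e_c - p_e_notc               # the delta-P rule
    delta_d = (a + d) - (b + c)              # Equation 3: confirming minus disconfirming
    pci = delta_d / n                        # Equation 2: proportion of confirming instances
    power = delta_p / (1 - p_e_notc)         # Equation 5: generative causal power
    weighted_dp = 1.0 * p_e_c - 0.6 * p_e_notc   # weighted delta-P (w1 = 1.0, w2 = .6)
    return {"dP": round(delta_p, 2), "dD": delta_d, "pCI": round(pci, 2),
            "power": round(power, 2), "wdP": round(weighted_dp, 2)}
```

For problem .17HL of Table 7 (a = 4, b = 0, c = 10, d = 10), this gives ΔP = .5, ΔD = 4, and pCI = .17; doubling every frequency (problem .17HH) doubles ΔD to 8 while pCI and ΔP are unchanged, which is exactly the dissociation that Experiment 4 exploits.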
This implies that the rules can be tested by manipulating frequencies independently of proportions: If people judge according to ΔD and not the pCI rule, then judgments should vary with frequencies when proportions are held constant. This is the aim of Experiment 4.

Method

Participants. The participants were 42 first-year undergraduates at Cardiff University, who received course credit in return for participating.

Stimulus materials. The initial written instructions were as in Experiment 1, except that the name of the fictitious food additive was E263. The instance list procedure was used, as in Experiment 1.

Design. The design was a reduced version of the pCI design shown in Table 4, with problems .33H and .33L omitted but with a frequency manipulation as an extra independent variable. Cell frequencies and other relevant features of the judgmental problems are shown in Table 7. The last letter in each problem designation refers to the number of instances presented, low (L) or high (H). Three variables were manipulated. One variable was the total number of instances in each problem, 24 or 48; ΔD was computed according to Equation 3. As Table 7 shows, doubling the number of instances doubles ΔD values while pCI values remain unchanged, so this manipulation tests the ΔD rule against the pCI rule. pCI was manipulated with two values, .17 and .50. The third variable was the occurrence rate manipulation, as in the previous three experiments. The bottom two rows of Table 7 show that other variables, specifically p(E), p(C), p(E/C), p(E/−C), ΔP, and a/(a + d), were held constant across the pCI manipulation, as in the previous experiments. The bottom two rows of Table 7 also show that ΔD was not held constant across the pCI manipulation. ΔD and pCI are distinguished by the frequency manipulation, as described above.

Procedure. The procedure was similar to that of Experiment 1.

Results and Discussion

Data were analyzed with a 2 (frequency; 24 vs.
48 instances) × 2 (pCI; .17 vs. .50) × 2 (high vs. low occurrence rate) within-subjects ANOVA. Means are reported in Table 8. Analysis yielded two significant results. There was a significant main effect of pCI, F(1, 41) = 43.54, p < .001, with a higher mean at the higher level of pCI (63.86) than at the lower level (48.42). This effect was qualified by a significant interaction with occurrence rate, F(1, 41) = 14.52, p < .001. This interaction showed the same pattern as that found in the previous three experiments. The main effect of frequency was not significant, F(1, 41) = 1.09, and there were no significant interactions between frequency and the other variables. The main effect of pCI replicates the findings of the previous three experiments. In this experiment, however, the pCI rule was tested against the ΔD rule by means of a manipulation of the total number of instances.

Table 7
Cell Frequencies and Properties of Stimulus Information, Experiment 4

Problem         a    b    c    d   pCI   ΔD   p(C)  p(E)  p(E/C)  p(E/−C)   ΔP   a/(a + d)
.17HL           4    0   10   10   .17    4   .17   .58   1.00     .50     .50     .29
.17HH           8    0   20   20   .17    8   .17   .58   1.00     .50     .50     .29
.17LL          10   10    0    4   .17    4   .83   .42    .50     .00     .50     .71
.17LH          20   20    0    8   .17    8   .83   .42    .50     .00     .50     .71
.50HL          12    0    6    6   .50   12   .50   .75   1.00     .50     .50     .67
.50HH          24    0   12   12   .50   24   .50   .75   1.00     .50     .50     .67
.50LL           6    6    0   12   .50   12   .50   .25    .50     .00     .50     .33
.50LH          12   12    0   24   .50   24   .50   .25    .50     .00     .50     .33
M of first 4                       .17    6   .50   .50    .75     .25     .50     .50
M of second 4                      .50   18   .50   .50    .75     .25     .50     .50

Note. In terms of the cell identifications in Table 1, pCI = proportion of confirmatory instances; ΔD = (a + d) − (b + c); p(E) = (a + c)/(a + b + c + d); p(C) = (a + b)/(a + b + c + d); p(E/C) = proportion of instances on which the effect occurs when the cause is present; p(E/−C) = proportion of instances on which the effect occurs when the cause is absent.
This manipulation was ineffective as far as can be judged from the ANOVA results: There was no significant effect of, or interaction with, the frequency manipulation. Therefore, it is unlikely that people judge according to the ΔD rule. Previous experiments have compared different total numbers of instances for otherwise identical contingencies and have consistently failed to find significant effects (Anderson & Sheu, 1995; Baker et al., 1989; Lober & Shanks, 2000; Mandel & Lehman, 1998; Mercier & Parr, 1996). The only exceptions involved a very small number of trials on one side of the comparison (eight in Mercier & Parr, 1996, and six in Anderson & Sheu, 1995). Those results are consistent with the results of Experiment 4. However, Experiment 4 is the first experiment to design a frequency manipulation as a direct test of the ΔD rule and to compare it with the pCI rule.

Table 8
Mean Causal Judgments, Experiment 4

                                pCI
Occurrence rate   Frequency    .17     .50
High                 24       44.03   67.07
High                 48       44.76   70.02
Low                  24       50.24   59.62
Low                  48       54.64   58.71

Note. pCI = proportion of confirmatory instances.

Experiment 5

The experiments reported so far all used the same design, similar to that used by White (2000b). The purpose of the design was to manipulate pCI while holding all other relevant factors constant. It is difficult to find designs that have this feature because there are so many factors that have to be held constant: not only the objective contingency measured by ΔP but also p(E/C), p(E/−C), p(C), p(E), and a/(a + d). Although the design reliably yielded results supportive of the pCI rule, it would obviously be desirable to show that effects of pCI are not confined to this particular design. Therefore, in Experiment 5, I tested the rule with a different design. In addition, the measure of causal judgment used in the first four experiments was worded as a measure of likelihood: The more likely the participant thought it was that the causal candidate caused the effect, the higher the rating the participant should have given. The introduction to this measure stipulated that the participant's task was to judge to what extent the statement that the candidate caused the effect was true. However, it is possible that participants treated the task as a rating of subjective confidence rather than a true causal judgment. Accordingly, in Experiment 5 I used a different measure, intended to be an unambiguous measure of causal strength.

Method

Participants. The participants were 39 first-year undergraduates at Cardiff University, who received course credit in return for participation.

Stimulus materials and procedure. All features of instructions to participants, stimulus format, and procedure were the same as in Experiment 1, except that the food additive was called E263 and a different measure of causal judgment was used. Beneath the stimulus information for each problem there was a statement saying, "Please rate the extent to which E263 causes the patient's allergic reactions." On the initial instruction sheet, participants were instructed that they should do this by writing a number from 0 to 100 beside the statement; 0 meant that E263 did not cause the patient's allergic reactions at all, and 100 meant that E263 was a very strong cause of the patient's allergic reactions. The more strongly they thought E263 caused the patient's allergic reactions, the higher the number they were to assign.

Design. The design used in the first four experiments held constant all factors other than pCI by means of a manipulation of the occurrence rate, in other words a manipulation of p(E/C) and p(E/−C). This design held constant all factors in a different way, by manipulating the proportion of instances on which the cause was present, p(C). Cell frequencies and other relevant features of the judgmental problems are shown in Table 9.
Two variables were manipulated: pCI was manipulated with four values, −.2, .1, .4, and .7, and p(C) was manipulated with two values, .2 and .8. The sole purpose of the p(C) manipulation was to hold constant values of other variables so that the main effect of pCI was an unconfounded test of the pCI rule. The bottom four rows of Table 9 show that all the other variables were held constant across the pCI manipulation. In this design, as in that used in the previous experiments, the main effect of the pCI manipulation therefore provided a test of the pCI rule that was unconfounded with respect to these other variables.

Table 9
Cell Frequencies and Properties of Stimulus Information, Experiment 5

Problem              a    b    c    d   pCI   p(C)  p(E)  p(E/C)  p(E/−C)   ΔP   a/(a + d)
−.2L                 8    0   24    8   −.2    .2   .80   1.00     .75     .25     .50
−.2H                 8   24    0    8   −.2    .8   .20    .25     .00     .25     .50
.1L                  6    2   16   16    .1    .2   .55    .75     .50     .25     .27
.1H                 16   16    2    6    .1    .8   .45    .50     .25     .25     .73
.4L                  4    4    8   24    .4    .2   .30    .50     .25     .25     .14
.4H                 24    8    4    4    .4    .8   .70    .75     .50     .25     .86
.7L                  2    6    0   32    .7    .2   .05    .25     .00     .25     .06
.7H                 32    0    6    2    .7    .8   .95   1.00     .75     .25     .94
M of −.2L and −.2H                      −.2    .5   .50    .62     .37     .25     .50
M of .1L and .1H                         .1    .5   .50    .62     .37     .25     .50
M of .4L and .4H                         .4    .5   .50    .62     .37     .25     .50
M of .7L and .7H                         .7    .5   .50    .62     .37     .25     .50

Note. In terms of the cell identifications in Table 1, pCI = proportion of confirmatory instances; p(C) = (a + b)/(a + b + c + d); p(E) = (a + c)/(a + b + c + d); p(E/C) = proportion of instances on which the effect occurs when the cause is present; p(E/−C) = proportion of instances on which the effect occurs when the cause is absent.

Results and Discussion

Data were analyzed with a 4 (pCI; −.2 vs. .1 vs. .4 vs. .7) × 2 (p(C); .2 vs. .8) within-subjects ANOVA. Means are reported in Table 10. Analysis yielded a significant effect of pCI, F(3, 114) = 33.64, p < .001. Post hoc comparisons with the Newman-Keuls test revealed the order −.2 < .1 < .4 < .7 (p < .05 in all cases).
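The unconfounding claimed for this design can be checked mechanically from the Table 9 cell frequencies. The sketch below is illustrative (variable names are assumptions; the frequencies are those of Table 9):

```python
from statistics import mean

# 2x2 cell frequencies (a, b, c, d) for the eight problems of Table 9,
# keyed by (pCI value, p(C) level).
problems = {
    (-0.2, 0.2): (8, 0, 24, 8),   (-0.2, 0.8): (8, 24, 0, 8),
    (0.1, 0.2): (6, 2, 16, 16),   (0.1, 0.8): (16, 16, 2, 6),
    (0.4, 0.2): (4, 4, 8, 24),    (0.4, 0.8): (24, 8, 4, 4),
    (0.7, 0.2): (2, 6, 0, 32),    (0.7, 0.8): (32, 0, 6, 2),
}

def stats(a, b, c, d):
    n = a + b + c + d
    return {"pCI": ((a + d) - (b + c)) / n,
            "dP": a / (a + b) - c / (c + d),
            "pE": (a + c) / n,
            "pEC": a / (a + b),
            "a_ad": a / (a + d)}

# pCI takes its four intended values, while delta-P is .25 in every single
# problem and the other indices, averaged over the two p(C) levels, are
# constant across the pCI manipulation.
for pci in (-0.2, 0.1, 0.4, 0.7):
    lo, hi = stats(*problems[(pci, 0.2)]), stats(*problems[(pci, 0.8)])
    assert abs(lo["pCI"] - pci) < 1e-9 and abs(hi["pCI"] - pci) < 1e-9
    assert abs(lo["dP"] - 0.25) < 1e-9 and abs(hi["dP"] - 0.25) < 1e-9
    assert abs(mean([lo["pE"], hi["pE"]]) - 0.5) < 1e-9
    assert abs(mean([lo["pEC"], hi["pEC"]]) - 0.625) < 1e-9
    assert abs(mean([lo["a_ad"], hi["a_ad"]]) - 0.5) < 1e-9
```

The asserts pass for every pCI level, confirming that only pCI varies systematically across the manipulation while ΔP, p(E), p(E/C), and a/(a + d) are matched pair by pair.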
This was the predicted effect of the pCI manipulation. There was a significant effect of p(C), F(1, 38) = 29.35, p < .001, with a higher mean at the higher value of p(C). This can be interpreted as an effect of a/(a + d). Table 9 shows that values of a/(a + d) tended to be higher at the higher value of p(C) (range = .50–.94) than at the lower value (range = .06–.50). The interaction was also significant, F(3, 114) = 7.76, p < .001. Table 10 shows that means tended to increase with increasing pCI value at both values of p(C), but the rate of increase was considerably greater at the higher value of p(C). This, like the similar interactions found in the previous experiments, is addressed in the General Discussion section.

Table 10
Mean Causal Judgments, Experiment 5

                        pCI
p(C)     −.2      .1      .4      .7     All
.2      26.23   28.49   37.21   41.05   33.24
.8      27.08   37.38   53.64   70.33   47.11
All     26.65   32.94   45.42   55.69

Note. pCI = proportion of confirmatory instances; p(C) = proportion of instances on which the cause is present.

The results supported the predictions. Judgments tended to vary with pCI even though the objective contingency was held constant. It was noted in the Tests of Other Models section reporting the simulations of the R-W model that pCI was correlated with Cell a frequency in the stimulus materials. Therefore, there could be a danger that participants made judgments on the basis of Cell a frequency alone and that the apparent predictive success of the pCI rule could be attributed to its correlation with Cell a frequency. In this experiment, Cell a frequency decreased from Problem −.2L to .7L. However, the means for those four problems reported in Table 10 show an increasing trend, consistent with the pCI rule and contrary to the direction of change in Cell a. This disconfirms the possibility that judgments were made in accordance with Cell a frequency and not pCI.

Experiment 6

If p(C) ≠ .5, pCI can generate predictions for causal judgment even in cases where the objective contingency assessed by ΔP is zero. The aim of Experiment 6 was to test this contention, in this case using the serial presentation procedure. The design strategy was similar to that of the previous experiments, manipulating pCI while holding ΔP, p(E/C), p(E/−C), p(E), p(C), and a/(a + d) constant across pairs of problems.

Method

Participants. The participants were 43 first-year undergraduate students at Cardiff University, participating in return for course credit.

Stimulus materials, design, and procedure. Most features of method, including initial written instructions, format of stimulus presentations, and procedure, were the same as in Experiment 2. The differences were in the design. In this experiment, there were four problems manipulating pCI with two values, −.17 and .17, while holding ΔP constant at zero and also holding constant the values of other variables. This was again accomplished by manipulating p(E/C) and p(E/−C), an occurrence rate manipulation. Event frequencies and values of other variables for each problem are shown in Table 11.

Results and Discussion

Ratings were analyzed with a 2 (pCI; −.17 vs. .17) × 2 (low vs. high occurrence rate) ANOVA with repeated measures on both factors. Means are reported in Table 12. Analysis revealed a significant main effect of pCI, F(1, 42) = 15.71, p < .001, with higher ratings at the higher value of pCI. There was a significant main effect of occurrence rate, F(1, 42) = 32.26, p < .001, with higher ratings at the higher occurrence rates. There was also a significant interaction, F(1, 42) = 17.65, p < .001. This interaction has the same pattern as the corresponding interactions in the previous experiments. The results supported the predictions of the pCI rule.
The power PC theory is unable to account for these results because Equation 5 applies only to nonzero values of ΔP. However, it is possible to test a weighted version of ΔP. For this, p(E/C) was weighted w1 = 1.0 and p(E/−C) was weighted w2 = .6. Weighted ΔP values are shown in the right column of Table 11. As in Experiments 1, 2, and 3, the mean value of ΔPw was the same at both levels of pCI, and for that reason ΔPw cannot account for the significant effect of pCI.

Table 11
Stimulus Materials for Experiment 6

                       Cell frequencies
Problem                 a    b    c    d    pCI    ΔP   p(E)  p(C)  p(E/C)  p(E/−C)  a/(a + d)  ΔPw
−.17L                   6   12    2    4   −.17    0    .33   .75    .33     .33       .60      .13
−.17H                   4    2   12    6   −.17    0    .67   .25    .67     .67       .40      .27
+.17L                   2    4    6   12    .17    0    .33   .25    .33     .33       .14      .13
+.17H                  12    6    4    2    .17    0    .67   .75    .67     .67       .86      .27
M of −.17L and −.17H                       −.17    0    .50   .50    .50     .50       .50      .20
M of +.17L and +.17H                        .17    0    .50   .50    .50     .50       .50      .20

Note. In terms of the cell identifications in Table 1, pCI = proportion of confirmatory instances; p(E) = (a + c)/(a + b + c + d); p(C) = (a + b)/(a + b + c + d); p(E/C) = proportion of instances on which the effect occurs when the cause is present; p(E/−C) = proportion of instances on which the effect occurs when the cause is absent; ΔP = p(E/C) − p(E/−C); ΔPw = weighted ΔP.

Table 12
Mean Causal Judgments, Experiment 6

           Occurrence rate
pCI       Low      High      All
−.17     31.28    29.26    30.27
.17      35.49    54.44    44.97
All      33.38    41.85    37.62

Note. pCI = proportion of confirmatory instances.

General Discussion

The present findings indicate that the pCI rule predicts judgmental phenomena that are not predicted by any other rule or model. Significant effects of pCI manipulations were consistently obtained when ΔP was held constant. The results supported the predictions of the pCI rule in several different stimulus presentation procedures: the instance list procedure (Experiments 1, 4, and 5), the serial presentation procedure (Experiments 2 and 6), and the summary presentation procedure (Experiment 3). Most of the experiments used a measure in which participants judged the likelihood that the causal candidate caused the effect. Because of the possibility that this might have been treated as a judgment of subjective confidence, Experiment 5 used a measure of causal strength that could not be interpreted as a confidence measure. Different ways of manipulating pCI values independently of ΔP were used, including the use of zero contingencies (Experiment 6). This adds to the evidence that the pCI rule predicts judgments of complex causal interpretations specifying causes and enablers (White, 2000b) and judgments of interactions between two causal candidates (White, 2002b), thereby supporting the contention that the pCI rule is a general rule that applies to judgments of any kind of causal interpretation.

Manipulations of ΔP with pCI held constant consistently failed to produce significant results, except in Experiment 3. Several simulations of the R-W model with different learning rates, and taking both asymptotic and preasymptotic output, failed to reproduce the pattern of findings obtained in the first three experiments. It was also shown that neither the equation for estimating causal power in the power PC theory (Cheng, 1997) nor any weighted version of the ΔP rule could account for the effects of the pCI manipulation in the first three experiments. The ΔD rule, the frequency-based version of the pCI rule, was eliminated by the results of Experiment 4, in which no significant effect of a frequency manipulation was obtained.

The present experiments were designed to test the pCI rule, shown in Equation 2. However, there were also several significant interactions that had a consistent pattern, which can be illustrated with the results of Experiments 1, 2, and 3 (Table 5). In the .17 pair, judgments tended to be higher at the low value of occurrence rate than at the high value. In the .33 pair, there was no consistent difference. In the .50 pair, judgments tended to be higher at the high value of occurrence rate than at the low value. Across values of pCI, the tendency was for strong effects of pCI to occur at a high occurrence rate and weak effects at a low occurrence rate.

These findings can be accommodated by a simple modification to the pCI rule. The pCI rule shown in Equation 2 gives equal weight to Cell a and Cell d instances. However, there is abundant evidence that these cells are not weighted equally in causal judgment and that Cell a consistently carries more weight than Cell d (Anderson & Sheu, 1995; Kao & Wasserman, 1993; Levin et al., 1993; Mandel & Lehman, 1998; Wasserman et al., 1990, 1993, 1996). This indicates that a weighted version of the pCI rule may be a better predictor of judgmental tendencies. This weighted version of the pCI rule can be expressed as in Equation 6:

pCI = [a · wa · s(xa) + d · wd · s(xd) + b · wb · s(xb) + c · wc · s(xc)] / (a + b + c + d).  (6)

In this equation s(xa), s(xb), s(xc), and s(xd) are subjective evaluations of Cells a, b, c, and d, respectively, and vary from −1 to 1. wa, wb, wc, and wd are weights assigned to s(xa), s(xb), s(xc), and s(xd), respectively, and vary from 0 to 1. In this rule, therefore, each cell has both a value and a weight (Hogarth & Einhorn, 1992; White, 2003a). The unweighted pCI rule can be regarded as a version of Equation 6 in which all weights are equal and subjective evaluations are 1 for Cells a and d and −1 for Cells b and c. Two antecedents share with Equation 6 the property of weightings for individual cells: the information integration rule proposed by Busemeyer (1991) and the weighted ΔD rule proposed by Catena et al. (1998). These two rules differ from Equation 6 in various features.
Both rules differ in that they have weights but no values for the cells, assuming instead that Cells a and d have a confirmatory impact and Cells b and c a disconfirmatory impact on judgments. These assumptions may be normative but are not invariably descriptive: White (2000a) found that substantial minorities of participants ascribed confirmatory value to Cell c or disconfirmatory value to Cell d, both in retrospective verbal reports and in their actual judgments. A model that has both weights and values, like Equation 6, can model idiosyncratic tendencies of those sorts, but models with weights and normative assumptions about values cannot. Busemeyer’s rule was also found to be less predictive than the R-W model by Wasserman et al. (1996). Weights and values are liable to vary between individuals (Anderson & Sheu, 1995; White, 2000a). Finding the optimal values and weights for predicting mean tendencies of a sample of individuals is therefore an empirical matter, and different samples may exhibit different aggregate tendencies. The unweighted pCI rule makes successful predictions because most people conform to normative notions of cell value and weight for most cells (White, 2000a). However, several experiments have shown a consistent tendency in the ordering of cell weights, specifically a > b > c > d (Anderson & Sheu, 1995; Kao & Wasserman, 1993; Levin et al., 1993; Mandel & Lehman, 1998; Wasserman et al., 1990). Consistent with this finding, any combination of cell weights in which Cell d carries significantly less weight than Cell a successfully predicts both the main effect of pCI and the interaction with occurrence rate in all the present experiments. To illustrate, suppose wa = wb = wc = 1 and wd = .6. Consider the six problems in the pCI design of Experiments 1, 2, and 3 (Table 4). For the two .17 problems, weighted pCI values are 0 for .17H and .1 for .17L. This amounts to a prediction of higher judgments for the latter, and Table 5 shows that this happened.
For the two .50 problems, weighted pCI values are .4 for .50H and .3 for .50L. This time, weighted pCI values predict higher judgments for the former, and Table 5 shows that this happened. For the two .33 problems, weighted pCI values are the same for both, whereas in fact Table 5 shows generally higher means for .33H than for .33L. This could be explained by assuming wb > wc, as has generally been found in the studies just cited: The problem in which Cell c had a higher frequency than Cell b (.33H) tended to receive higher judgments than the problem in which Cell b had a higher frequency than Cell c (.33L). Therefore, it is likely that the full pattern of differences between means can be accounted for by assigning weights to cells in accordance with the a > b > c > d ordering found in previous studies. This cell-weight inequality still lacks a convincing explanation, although some hypotheses have been proposed and are consistent with the evidence (Levin et al., 1993; Mandel & Lehman, 1998). However, White (2003a) found evidence that cell weights tend to vary in accordance with variations in the confirmatory or disconfirmatory value of instances of a particular kind. Cells a and d both tend to be evaluated as confirmatory and Cells b and c as disconfirmatory. White found that when the frequency in one cell was held constant at zero, the weight given to its partner increased significantly. Usually, Cell d carries the least weight of the four cells, but when Cell a frequency was zero, Cell d carried substantially more weight than either Cell b or Cell c. The cell carried more weight because it was the only kind of confirmatory information available. Similar tendencies were obtained for all four cells. This suggests that the explanation for the cell-weight inequality has to do with the relative evidential value of different kinds of information: Manipulations that affect the evidential values of cells change the pattern of cell weights.
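The weighted rule can be sketched as a small function in which both the cell values s(x) and the cell weights w are free parameters. This is an illustrative reading of Equation 6, not the article's own code: each instance in a cell is assumed to contribute that cell's subjective value, scaled by the cell's weight. The function name and defaults are mine; the example frequencies are those of Problem −.17L in Table 11, and the wd = .6 weighting is the one discussed above.

```python
def weighted_pci(cells, values=None, weights=None):
    """Weighted, value-laden version of the pCI rule (cf. Equation 6).

    cells   -- frequencies of cells a, b, c, d
    values  -- subjective evaluations s(x) in [-1, 1]; the defaults are the
               normative ones (a and d confirmatory, b and c disconfirmatory)
    weights -- cell weights w in [0, 1]; all 1 recovers the unweighted rule
    """
    values = values or {"a": 1, "b": -1, "c": -1, "d": 1}
    weights = weights or {"a": 1, "b": 1, "c": 1, "d": 1}
    n = sum(cells.values())
    return sum(cells[k] * weights[k] * values[k] for k in "abcd") / n

cells = {"a": 6, "b": 12, "c": 2, "d": 4}  # Problem -.17L, Table 11
print(round(weighted_pci(cells), 2))       # -0.17, the unweighted pCI
# Underweighting Cell d removes confirmatory mass, so the value falls further:
print(round(weighted_pci(cells, weights={"a": 1, "b": 1, "c": 1, "d": 0.6}), 2))  # -0.23
```

A weights-and-values scheme of this sort can also represent the idiosyncratic evaluations reported by White (2000a), for example a confirmatory Cell c, simply by setting s(xc) = 1, which a weights-only rule cannot express.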
Under this interpretation, the main effects predicted by the unweighted pCI rule and the interactions that relate to frequencies in Cells a and d can both be understood as effects on judgment of the evidential value ascribed to each kind of contingency information. Manipulations of ΔP with pCI held constant failed to yield significant differences in Experiments 1 and 2. However, ΔP was varied over a smaller range of values (.25) than pCI (.33), which makes it somewhat less likely that significant effects of ΔP would be obtained. It remains possible that a significant effect of ΔP would be found with a greater range of values of ΔP under otherwise similar conditions. Having said that, a range of .25 is not obviously inadequate as a test of the ΔP rule. Other studies have reported significant differences between judgmental problems in which ΔP values differed by .25 or less (e.g., Anderson & Sheu, 1995; Wasserman et al., 1983; White, 2000a), and accounts that have proposed that causal judgments are made in accordance with ΔP certainly imply that judges should be sensitive to differences of that magnitude (Cheng & Novick, 1992). The problem with the previous studies, as explained in the introduction, is that in each case, ΔP values were confounded with pCI values, so the present results imply that the significant differences observed therein can be attributed to variations in pCI, not ΔP. The most pervasive difficulty for tests of any model of causal judgment is that of confounding variables. Past tests of the ΔP rule have confounded ΔP with other variables, notably pCI as already explained. The designs used here were intended to unconfound pCI from as many variables as possible, including ΔP, p(E), p(C), p(E/C), and p(E/−C). There is, however, no possibility of a single design that unconfounds pCI, or any other variable, from every other variable.
For example, values of pCI in Experiments 1, 2, and 3 were positively correlated with Cell a frequency, and with the related proportion a/T, where T = the total number of instances. Thus, the supportive evidence for pCI is inevitably weakened by the residual confounds with other variables. In the case of Cell a frequency, this problem was addressed in Experiment 5. In that experiment, when p(C) = .2, mean causal judgments tended to increase as pCI increased. However, Cell a frequency decreased as pCI increased, so in this experiment the results contradicted the hypothesis that judgments are made in accordance with Cell a frequency. Every published test of any model of causal judgment suffers from problems of confounding variables to some degree. The present results support an argument that past findings of causal judgments varying with ΔP values should in fact be interpreted as support for the pCI rule, because in every case ΔP was confounded with pCI. Further attempts to find designs that unconfound relevant variables are vital to pin down the rule or equation that best models causal judgment from contingency information. Under the present account, contingency information is interpreted as evidence. In effect the causal candidate is a hypothesis, and the extent to which the hypothesis is accepted depends on the proportion of evidence that is confirmatory for it. Judgment is liable to change as evidence is accumulated until a point is reached at which the judge achieves closure, after which judgment does not change any further unless a significant change in the proportion of confirmatory evidence is detected. The pCI rule is a rule for asymptotic judgment and does not describe how causal judgments are acquired. Under the evidential evaluation model, however, the acquisition of causal judgment proceeds in a manner similar to that predicted by belief revision models (Catena et al., 1998; Hogarth & Einhorn, 1992).
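The Experiment 5 counterexample can be stated quantitatively: across the four p(C) = .2 problems, Cell a frequency is perfectly negatively correlated with pCI, so a judge tracking Cell a alone would predict a falling trend where a rising one was observed. A quick self-contained check (the `pearson_r` helper is mine; the frequencies are those of the Experiment 5 design):

```python
# The p(C) = .2 problems of Experiment 5: pCI rises while Cell a falls
pci = [-0.2, 0.1, 0.4, 0.7]
cell_a = [8, 6, 4, 2]

def pearson_r(xs, ys):
    """Plain Pearson correlation, with no external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(round(pearson_r(pci, cell_a), 6))  # -1.0: opposite in direction to pCI
```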
A belief revision model specific to the evidential evaluation model has been proposed and tested (White, 2003b), and the intention is that the belief revision model should combine with the pCI rule to provide a comprehensive account of the acquisition and finalization of causal judgment. In sum, the present research has shown that a simple judgmental rule, the pCI rule, predicts phenomena not predicted by the leading models of causal judgment. Causal judgments consistently varied with variations in pCI when ΔP and other relevant variables were controlled, and consistently failed to vary with ΔP when pCI values were controlled. The pCI rule is a central component of a model in which instances of contingency information are transformed into evidence in accordance with subjective notions of evidential value, and this model is currently being developed to explain phenomena connected with the acquisition of causal judgments. It would be premature to conclude that inductive rule models such as the power PC theory (Cheng, 1997) and associative learning models such as the R-W model (Allan, 1993; Rescorla & Wagner, 1972; Shanks, 1995) are no longer viable contenders, but the evidence supports the conclusion that a model based on the principle of evidential evaluation is a strong alternative.

References

Allan, L. G. (1993). Human contingency judgements: Rule based or associative? Psychological Bulletin, 114, 435–448.
Allan, L. G., & Jenkins, H. M. (1983). The effect of representations of binary variables on the judgment of influence. Learning and Motivation, 14, 381–405.
Anderson, J. R., & Sheu, C.-F. (1995). Causal inferences as perceptual judgments. Memory & Cognition, 23, 510–524.
Arkes, H. R., & Harkness, A. R. (1983). Estimates of contingency between two dichotomous variables. Journal of Experimental Psychology: General, 112, 117–135.
Baker, A. G., Berbrier, M., & Vallée-Tourangeau, F. (1989).
Judgments of a 2 × 2 contingency table: Sequential processing and the learning curve. Quarterly Journal of Experimental Psychology, 41B, 65–97.
Busemeyer, J. R. (1991). Intuitive statistical estimation. In N. H. Anderson (Ed.), Contributions to information integration theory (Vol. 1, pp. 187–215). Hillsdale, NJ: Erlbaum.
Catena, A., Maldonado, A., & Cándido, A. (1998). The effect of the frequency of judgment and the type of trials on covariation learning. Journal of Experimental Psychology: Human Perception and Performance, 24, 481–495.
Chapman, G. B., & Robbins, S. J. (1990). Cue interactions in human contingency judgment. Memory & Cognition, 18, 537–545.
Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367–405.
Cheng, P. W., & Novick, L. R. (1990). A probabilistic contrast model of causal induction. Journal of Personality and Social Psychology, 58, 545–567.
Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction. Psychological Review, 99, 365–382.
Dennis, M. J., & Ahn, W.-K. (2001). Primacy in causal strength judgments: The effect of initial evidence for generative versus inhibitory relationships. Memory & Cognition, 29, 152–164.
Hogarth, R. M., & Einhorn, H. J. (1992). Order effects in belief updating: The belief-adjustment model. Cognitive Psychology, 24, 1–55.
Jenkins, H. M., & Ward, W. C. (1965). Judgment of contingency between responses and outcomes. Psychological Monographs, 79 (Whole No. 10).
Kao, S.-F., & Wasserman, E. A. (1993). Assessment of an information integration account of contingency judgment with examination of subjective cell importance and method of information presentation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1363–1386.
Levin, I. P., Wasserman, E. A., & Kao, S.-F. (1993). Multiple methods for examining biased information use in contingency judgments. Organizational Behavior and Human Decision Processes, 55, 228–250.
Lober, K., & Shanks, D. R. (2000). Is causal induction based on causal power? Critique of Cheng (1997). Psychological Review, 107, 195–212.
Mandel, D. R., & Lehman, D. R. (1998). Integration of contingency information in judgments of cause, covariation, and probability. Journal of Experimental Psychology: General, 127, 269–285.
McKenzie, C. R. M. (1994). The accuracy of intuitive judgment strategies: Covariation assessment and Bayesian inference. Cognitive Psychology, 26, 209–239.
Mercier, P., & Parr, W. (1996). Inter-trial interval, stimulus duration and number of trials in contingency judgments. British Journal of Psychology, 87, 549–566.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Shaklee, H., & Hall, L. (1983). Methods of assessing strategies for judging covariation between events. Journal of Educational Psychology, 75, 583–594.
Shaklee, H., & Mims, M. (1982). Sources of error in judging event covariations: Effects of memory demands. Journal of Experimental Psychology: Learning, Memory, and Cognition, 8, 208–224.
Shaklee, H., & Tucker, D. (1980). A rule analysis of judgments of covariation between events. Memory & Cognition, 8, 459–467.
Shaklee, H., & Wasserman, E. A. (1986). Judging interevent contingencies: Being right for the wrong reasons. Bulletin of the Psychonomic Society, 24, 91–94.
Shanks, D. R. (1985). Continuous monitoring of human contingency judgment across trials. Memory & Cognition, 13, 158–167.
Shanks, D. R. (1987). Acquisition functions in contingency judgement. Learning and Motivation, 18, 147–166.
Shanks, D. R. (1993). Human instrumental learning: A critical review of data and theory. British Journal of Psychology, 84, 319–354.
Shanks, D. R. (1995).
The psychology of associative learning. Cambridge, England: Cambridge University Press.
Shanks, D. R., & Dickinson, A. (1987). Associative accounts of causality judgement. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 21, pp. 229–261). New York: Academic Press.
Ward, W. C., & Jenkins, H. M. (1965). The display of information and the judgment of contingency. Canadian Journal of Psychology, 19, 231–241.
Wasserman, E. A., Chatlosh, D. L., & Neunaber, D. J. (1983). Perception of causal relations in humans: Factors affecting judgements of response–outcome contingencies under free-operant procedures. Learning and Motivation, 14, 406–432.
Wasserman, E. A., Dorner, W. W., & Kao, S.-F. (1990). Contributions of specific cell information to judgments of interevent contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 509–521.
Wasserman, E. A., Elek, S. M., Chatlosh, D. L., & Baker, A. G. (1993). Rating causal relations: The role of probability in judgements of response–outcome contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 174–188.
Wasserman, E. A., Kao, S.-F., Van Hamme, L. J., Katagiri, M., & Young, M. E. (1996). Causation and association. In D. R. Shanks, K. J. Holyoak, & D. L. Medin (Eds.), The psychology of learning and motivation: Vol. 34. Causal learning (pp. 207–264). San Diego, CA: Academic Press.
White, P. A. (2000a). Causal judgement from contingency information: Relation between subjective reports and individual tendencies in judgement. Memory & Cognition, 28, 415–426.
White, P. A. (2000b). Causal judgement from contingency information: The interpretation of factors common to all instances. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 1083–1102.
White, P. A. (2002a). Causal attribution from covariation information: The evidential evaluation model. European Journal of Social Psychology, 32, 667–684.
White, P. A. (2002b).
Causal judgement from contingency information: Judging interactions between two causal candidates. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 55A, 819–838.
White, P. A. (2002c). Perceiving a strong causal relation in a weak contingency: Further investigation of the evidential evaluation model of causal judgement. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 55A, 97–114.
White, P. A. (2003a). Causal judgement as evaluation of evidence: The use of confirmatory and disconfirmatory information. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 56A, 491–503.
White, P. A. (2003b). Acquisition of causal judgement: Associative learning or belief revision? Manuscript submitted for publication.

Appendix A

Relation Between pCI and ΔP

The algebra and the computer simulation that generated the data used to construct the figures are both the work of John Pearce. I am indebted to John for his help with this.

Assume that the value of each cell is represented as a proportion of the total number of instances in the 2 × 2 table. Then

p(C) = a + b,  (A1)

1 − p(C) = c + d,  (A2)

a/(a + b) = p(E/C), and  (A3)

c/(c + d) = p(E/−C).  (A4)

The four cells of the contingency table can then be defined as follows:

a = p(C) · p(E/C),  (A5)

b = p(C) · [1 − p(E/C)],  (A6)

c = [1 − p(C)] · p(E/−C), and  (A7)

d = 1 − p(C) − p(E/−C) + p(C) · p(E/−C).  (A8)

From this it can be shown that

ΔP = [pCI − 4 · p(C) · p(E/−C) + 2p(C) + 2p(E/−C) − 1] / 2p(C).  (A9)

Conversely,

pCI = 2p(C) · [ΔP + 2p(E/−C) − 1] − 2p(E/−C) + 1.  (A10)

The key variable in the relationship between pCI and ΔP is p(C). The mediating role of p(C) is illustrated in Figure A1, which represents values of pCI for different values of p(C), p(E/C), and p(E/−C) for given values of ΔP. It can be seen in each case that pCI and ΔP always have the same value when p(C) = .5.
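Equations A9 and A10, and the equivalence of pCI and ΔP at p(C) = .5, can be checked numerically. A small sketch that sweeps an arbitrary grid of parameter values (the grid itself is mine):

```python
from itertools import product

def pci_from(pc, pec, penc):
    """pCI computed directly from the cell proportions of Equations A5-A8."""
    a = pc * pec
    b = pc * (1 - pec)
    c = (1 - pc) * penc
    d = 1 - pc - penc + pc * penc
    return a + d - b - c

for pc, pec, penc in product([0.25, 0.5, 0.75], repeat=3):
    dp = pec - penc
    # Equation A10: pCI from deltaP, p(C), and p(E/-C)
    assert abs(2 * pc * (dp + 2 * penc - 1) - 2 * penc + 1
               - pci_from(pc, pec, penc)) < 1e-12
    # Equation A9: deltaP recovered from pCI
    a9 = (pci_from(pc, pec, penc) - 4 * pc * penc + 2 * pc + 2 * penc - 1) / (2 * pc)
    assert abs(a9 - dp) < 1e-12
    if pc == 0.5:
        assert abs(pci_from(pc, pec, penc) - dp) < 1e-12  # pCI = deltaP at p(C) = .5
print("A9 and A10 verified on the grid")
```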
The relationship between pCI and p(C) is linear in every case, but the line rotates around the point at which p(C) = .5 as values of p(E/C) and p(E/−C) change. Figures for negative values of ΔP are mirror images of the figures for the equivalent positive values. It is also important to illustrate how pCI relates to ΔP for given values of p(C). Setting values for p(E/C) and ΔP fully constrains values for p(E/−C), so it is convenient to present figures that show how the relationship between pCI and ΔP changes for given values of p(C) when p(E/C) is held constant. Figure A2 presents these results for five different values of p(E/C). These figures again show that values of pCI match ΔP values when p(C) = .5.

Figure A1. Relationship between p(C) and pCI for several values of ΔP. A: ΔP = .75. B: ΔP = .50. C: ΔP = .25. D: ΔP = 0. pCI = proportion of confirmatory instances; p(C) = (a + b)/(a + b + c + d).

Figure A2. Relationship between ΔP and pCI for several values of p(E/C). A: p(E/C) = 1. B: p(E/C) = .75. C: p(E/C) = .50. D: p(E/C) = .25. E: p(E/C) = 0. Values in the legend are p(C). pCI = proportion of confirmatory instances; p(C) = (a + b)/(a + b + c + d).

Appendix B

Example Problem, Experiment 1: The .33L Problem

Patient: W.T.

Meal no.   BA        Allergic reaction
1          present   yes
2          present   yes
3          present   no
4          absent    no
5          absent    no
6          present   no
7          present   no
8          present   yes
9          present   no
10         absent    no
11         present   yes
12         absent    no
13         present   no
14         absent    no
15         absent    no
16         present   yes
17         present   yes
18         absent    no
19         present   no
20         absent    no
21         present   yes
22         present   no
23         present   yes
24         present   no

Please rate the likelihood of this statement being right for Patient W.T.: BA causes the patient’s allergic reactions.

Appendix C

Example Problem, Experiment 3: The .33L Problem

Patient A.M.
                            Allergic reaction   No allergic reaction
Meals containing E263               8                    8
Meals not containing E263           0                    8

Please judge the likelihood that E263 causes allergic reactions in this patient.

Received November 28, 2001
Revision received December 13, 2002
Accepted December 16, 2002
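As a supplementary check on the example problems, the Appendix B trial list can be tallied into the four cells of Table 1, recovering the .33L frequencies that the Appendix C summary table presents directly. The sketch below hard-codes the 24 BA/reaction pairs from the Appendix B table:

```python
ba = ("present present present absent absent present present present present "
      "absent present absent present absent absent present present absent "
      "present absent present present present present").split()
reaction = ("yes yes no no no no no yes no no yes no no no no yes yes no "
            "no no yes no yes no").split()

# Table 1 cells: a = cause and effect, b = cause without effect,
# c = effect without cause, d = neither
label = {("present", "yes"): "a", ("present", "no"): "b",
         ("absent", "yes"): "c", ("absent", "no"): "d"}
cells = {"a": 0, "b": 0, "c": 0, "d": 0}
for pair in zip(ba, reaction):
    cells[label[pair]] += 1

print(cells)  # {'a': 8, 'b': 8, 'c': 0, 'd': 8}
n = sum(cells.values())
print(round((cells["a"] + cells["d"] - cells["b"] - cells["c"]) / n, 2))  # 0.33
```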