Product-Variable Models of Interaction Effects and Causal Mechanisms*

Lowell Hargens
University of Washington

Working Paper no. 67R
Center for Statistics and the Social Sciences
University of Washington

November 8, 2006; Revised February 20, 2008

* This paper has benefited from valuable advice and encouragement given by Paul Allison, Herbert Costner, Christine Fountain, Jerry Herting, Beth Hirsh, Stephanie Liddle, Stanley Lieberson, J. Scott Long, Ross Matsueda, Barbara Reskin, Stewart Tolnay and Michael Ward. Direct correspondence to Lowell Hargens, Department of Sociology, University of Washington, Seattle, WA 98195-3340. Email: [email protected].

Abstract

Among those who use multiple regression or its offshoots, the dominant method of modeling an interaction effect of two independent variables on a dependent variable is to include a product variable in a linear estimation equation. In this paper I show how the coefficient for the product variable in these models depends on the causal mechanism that underlies the interaction effect. I also show that different causal mechanisms can imply the same estimation equation, which means that one cannot determine the causal mechanism underlying an interaction effect from the empirical results produced by the estimation equation. Social scientists typically lack the kind of theoretical knowledge required to specify causal mechanisms for interaction effects, and researchers who do specify a mechanism, such as those who specify that contextual-level variables moderate the effects of individual-level variables but not vice versa, rarely justify their implicit claims. Although it is difficult to specify the causal mechanism underlying a particular interaction effect, I show that there are cases where it is possible to do so.
Product-Variable Models of Interaction Effects and Causal Mechanisms[1]

Many interesting findings in the social sciences involve "interaction" effects, also known as "moderator" or "synergistic" effects (Cohen et al. 2003, p. 255; Corno et al. 2002). Two variables have an interaction effect on a dependent variable if the relationship of either independent variable with the dependent variable changes across values of the other independent variable.[2] In an early study of radio listenership, for example, Lazarsfeld found that age was positively related to listening to classical music programs among the highly educated, but negatively related to it among the less educated (Zeisel, pp. 123-125).

[1] This paper draws on insights contained in two important papers. The first, Allison (1977), is heavily cited because it clearly articulates some key features of product-variable models of interaction effects. The second, Fisher (1988), is rarely cited even though it clearly shows that preexisting knowledge or theory is required before one can give a causal interpretation to an interaction-effect model. Furthermore, my examination of papers that do cite it indicates that the authors of the citing papers refer to it for points tangential to its main message.

Although there are many ways to model interaction effects (Stolzenberg 1974, Allison 1977, Southwood 1978, Fisher 1988), quantitative studies using multiple regression analysis or its offshoots almost always employ product variables to model them. Indeed, most textbooks treat only the product-variable option. The product-variable approach to modeling an interaction effect of two explanatory variables involves including the product of the two variables in an equation that usually also contains both of them individually. Specifically, given a dependent variable Y and independent variables X and Z, we multiply the values of X and Z for
each case and then include this new variable, as well as X and Z, in the regression equation shown in equation 1.[3]

Y = β0 + βxX + βzZ + βxz(XZ) + u    (1)

Higher order interaction effects—those involving three or more independent variables—can also be modeled by forming products of the constituent independent variables (see Blalock 1965, Cohen et al. 2003, pp. 290-91). Omitting the product variable from equation 1 forces the regression surface to be a plane, and specifies that X and Z have only additive effects on Y (Cohen et al., pp. 257-259). In contrast, including the product variable in equation 1 allows the best-fitting regression surface to be "warped" rather than forcing it to be two dimensional (see Figure 1).

[ Figure 1 about here ]

Including the XZ product term in equation 1 complicates the interpretation of the resulting regression coefficients. Several authors, for example, have noted that it is wrong to interpret βx and βz in equation 1 as "main effects" that are separate from the interaction effect of X and Z on Y (e.g., Allison 1977, Marsden 1981, Braumoeller 2004). Instead, βx only gives the slope of the regression surface when it intersects the Y-X plane at Z = 0, while βz gives the slope of the regression surface when it intersects the Y-Z plane at X = 0 (see Figure 1). Discussions of the meaning of the coefficient for the product term (βxz) in equation 1 are rare compared to discussions of the meanings of βx and βz. In this paper I show that the value of βxz depends on the causal mechanism[4] that underlies an interaction effect, and also that βxz's value cannot tell us what that mechanism is.

[2] This usage of the word "interaction" is recognized by almost no English dictionaries and appears to be due to R. A. Fisher, who adapted a term from genetics to describe deviations from additivity (Box 1978, p. 111; Fisher 1924, p. 200).

[3] Saunders (1956) first proposed this procedure.
[4] Here I am using the term causal mechanism in the sense given by Marini and Singer (1988). Causal mechanisms are often distinguished from "causal effects." Causal effects describe what happens when the value of an independent variable changes, while causal mechanisms describe how or why it happens (Holland 1988, Smith 1990).

Specifically, although βxz's value is part of the description of the regression surface produced by an interaction effect, different causal mechanisms can produce the same regression surface and, therefore, the same value for βxz. As a result, knowing the characteristics of an interaction effect's regression surface, in particular βxz's value, gives us no information about why the interaction effect exists, and sometimes misinforms us as to whether it exists at all.

From Regression Surfaces to the Causal Mechanisms that Generate an Interaction Effect

Discussions of how to interpret the results of modeling an interaction effect using the product-variable approach often begin by noting that the partial derivatives of equation 1 with respect to X and Z specify how each independent variable is related to the dependent variable. For example, Stolzenberg (1979, p. 472) notes that:

∂Y/∂X = βx + βxzZ    (2)

∂Y/∂Z = βz + βxzX    (3)

These two equations illustrate the characteristics of regression surfaces generated by product-variable models of interaction effects. First, they show that, as noted above, βx is the slope of the regression surface in the X dimension when Z equals zero and that βz is the slope of the regression surface in the Z dimension when X equals zero. In addition, equation 2 shows that the slope of the regression surface in the X dimension is a linear function of Z. With each unit increase in Z, the slope of the regression surface in the X dimension changes by βxz units. X's slope in Figure 1, for example, is positive when Z = 0, which means that βx is positive.
As Z increases, however, the slope of the regression plane in the X dimension becomes increasingly less positive (implying that βxz has a negative sign), and when Z = a, X's slope, which at that point equals βx + (βxz × a), is negative. Equation 3 shows that the slope of the regression plane in the Z dimension is a linear function of X, which is why this kind of interaction effect is often called a "linear by linear" interaction.[5] Finally, equations 2 and 3 show that βxz simultaneously describes how the slope of the regression surface in the X dimension changes with variation in Z and how the slope of the regression surface in the Z dimension changes with variation in X. In both cases, an increase of one unit in one variable is associated with a change of βxz in the slope of the regression surface in the other variable's dimension.

[5] Although my discussion here focuses on the case where the independent variables involved in an interaction are both quantitative variables, the points I make below hold equally for cases where either or both of the independent variables is categorical.

Although equations 2 and 3 provide a concise description of the regression surface produced by an interaction effect, researchers often want to interpret the regression surface as specifying something about the causal system that produced the interaction effect. Thus, following the approach to interpreting regression coefficients in additive models, researchers typically interpret βx as specifying the causal impact of X on Y when Z equals zero and βz as specifying the causal impact of Z on Y when X equals zero.[6] The coefficient for the product term in equation 1 does not have a simple causal interpretation because it appears to describe how the causal effect of each variable on Y is affected by variation in the other variable.
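The description of the surface in equations 1 through 3 can be checked numerically. The following sketch, with invented coefficient values, fits equation 1 by ordinary least squares and confirms that the fitted slope in the X dimension changes by βxz with each unit increase in Z:

```python
# Numerical sketch of equations 1-3. The coefficient values (1, 2, 0.5, -0.3)
# and the variable ranges are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(0, 10, n)
Z = rng.uniform(0, 10, n)
u = rng.normal(0, 1, n)

# True surface: b0 = 1, bx = 2, bz = 0.5, bxz = -0.3
Y = 1 + 2 * X + 0.5 * Z - 0.3 * X * Z + u

# Estimate equation 1 by ordinary least squares.
design = np.column_stack([np.ones(n), X, Z, X * Z])
b0, bx, bz, bxz = np.linalg.lstsq(design, Y, rcond=None)[0]

# Equation 2: the slope of the surface in the X dimension at a given Z.
def x_slope(z):
    return bx + bxz * z

print(round(bxz, 2))                       # close to -0.3
print(round(x_slope(1) - x_slope(0), 2))   # slope changes by bxz per unit Z
```

The same check applied to equation 3 (the slope in the Z dimension as a function of X) yields the identical βxz, which is the "symmetry" of the fitted surface discussed next.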
In the example of the interaction of age and education on listening to classical music, for example, βxz seems to describe both how variation in age produces changes in education's effect on listening to classical music and how variation in education produces changes in age's effect on listening to classical music. This feature has led many authors to characterize the product-variable approach as "symmetrical" (Aiken and West 1991, p. 10; Cohen et al. 2003, pp. 266, 271; Fox 1997, p. 146), to say that Z's effect on X's relationship with Y and X's effect on Z's relationship with Y are "two sides of the same coin" (McClendon 1994, p. 274), or to say that X and Z contribute equally to their mutual interaction (Fisher 1988, p. 88). Below I show how βxz is determined by the causal mechanism that produces an interaction effect.

[6] These "conditional" causal-effect estimates are often meaningless because they refer to conditions that are not present in the population of cases. For the education by age interaction effect on listening to classical music, for example, if age were measured as years since birth, the coefficient for the additive education term would refer to the effect of education for cases who are not yet a year old, and there were no such cases in the data. Textbook authors frequently recommend centering the independent variables involved in an interaction so that the coefficients for the non-product terms in the regression equation will have concrete interpretations in terms of the observed data (e.g., Cohen et al. 2003, p. 261).

To describe the causal mechanism producing an interaction effect, one must specify whether each independent variable affects the other independent variable's causal impact on the dependent variable. This is a difficult task that researchers almost never consider seriously. For example, given the age-education interaction effect on
listening to classical music, how can one determine whether age causes changes in education's effect, or education causes changes in age's effect, or both? Readers of previous drafts of this paper have sometimes argued that it is impossible to answer this kind of question and that, as a result, it is fruitless to talk about the causal mechanisms that underlie interaction effects. I believe that this argument is wrong, both because researchers often implicitly specify such mechanisms and because one can sometimes specify them a priori.

From Causal Mechanisms to Regression Surfaces

In order to determine the relationship of the coefficient of the product variable in a product-variable interaction model to the causal mechanism that generated it, we can begin by specifying a causal mechanism and then derive the regression equation that a researcher would use to estimate the components of the interaction it produces. Let us begin by specifying an interaction effect involving two independent variables, X and Z, and a dependent variable, Y, in which each independent variable causally affects the dependent variable when the other independent variable equals zero, and in which each independent variable affects the other independent variable's effect on Y. This kind of causal mechanism is illustrated graphically by Figure 2, which portrays (a) an independent variable's effect on a dependent variable when the other independent variable equals zero as a directed line from the independent variable (X or Z) to the dependent variable (Y), and (b) an independent variable's impact on another independent variable's effect on a dependent variable as a directed line pointing to the line that connects the second independent variable to the dependent variable.
For labeling purposes, let us call the first kind of effect an "effect conditioned on zero" (because its value depends on the other variable having a value of zero) and the second kind of effect a "moderator effect." Thus, this kind of causal mechanism consists of two "effects conditioned on zero" and two "moderator effects." Note that this kind of causal mechanism does not specify that the magnitude of Z's effect on X's impact on Y is the same as X's effect on Z's impact on Y; it only specifies that both of these effects are non-zero.

[ Figure 2 about here ]

We can translate the above specifications into equations by following some of the conventions of the multilevel-modeling literature (Hox 2002; see also Fisher 1988 and Allison 1999, pp. 166-69). Equation 4 is an overall equation specifying that both X and Z affect Y when the other variable equals zero, while equation 5 specifies that X's effect on Y is a linear function of Z that equals α0 when Z equals zero, and equation 6 specifies that Z's effect on Y is a linear function of X that equals γ0 when X equals zero. Borrowing the terminology of the multilevel-modeling literature, I refer to equation 4 as a "level 1" equation and equations 5 and 6 as "level 2" equations.[7]

[7] Level-1 equations have a dependent variable, Y, on their left-hand sides and level-2 equations have a coefficient, β, on their left-hand sides. In the multilevel-analysis literature level-2 equations have additional properties. For example, in that literature variables in level-2 equations characterize nested groupings of observations, and level-2 equations typically include disturbance terms. None of these additional properties are needed to make the main points in this paper.

Y = β0 + βxX + βzZ + u    (4)

βx = α0 + α1Z    (5)

βz = γ0 + γ1X    (6)

Substituting equations 5 and 6 into equation 4 produces equation 7, which can be rewritten as equation 8. Because equation 8 is structurally identical to equation 1, one
would use equation 1 to estimate the various components of the causal mechanism specified in equations 4 through 6.

Y = β0 + (α0 + α1Z)X + (γ0 + γ1X)Z + u    (7)

Y = β0 + α0X + γ0Z + (α1 + γ1)ZX + u    (8)

Equation 8 shows that for this type of causal mechanism, the product term's coefficient (βxz in equation 1) equals the sum of Z's impact on X's effect on Y (α1) and X's impact on Z's effect on Y (γ1). Without additional information about α1 or γ1 we cannot determine either coefficient's value from the value of βxz. This means that when we use equation 1 to estimate the components of this kind of causal mechanism and reject the null hypothesis that the coefficient for the product term equals zero, we can infer only that some kind of moderator effect is present, but we cannot specify anything concrete about the nature of the moderator effect(s).

Equation 8 also shows why it is wrong to claim that βxz in equation 1 simultaneously describes both X's impact on Z's effect on Y and Z's impact on X's effect on Y. Although we obtain only one coefficient for the product term when we estimate equation 8, that does not mean that the two underlying causal coefficients (α1 and γ1) are equal. It means only that we cannot determine their individual magnitudes without further information or assumptions that will help us solve this kind of identification problem (Manski 1993, pp. 32-36). Thus, even in the unlikely case where α1 equals γ1, the coefficient for the product term will not equal their common value; instead it will equal twice that value.

An interesting possibility suggested by equation 8 occurs when α1 and γ1 have opposite signs and hence offset each other. For example, consider the relationship between the likelihood that an assistant professor will be tenured (Y), the prestige of her department (X), and her publication productivity (Z).
First, in cases where assistant professors have no publications, departmental prestige will have a negative effect on the likelihood of being tenured, meaning that βx will have a negative value. (In low-prestige departments it is possible to earn tenure even without having published because such settings deemphasize publication productivity compared to other activities such as teaching and departmental citizenship.) Second, across all levels of departmental prestige, higher levels of scholarly publication will be positively related to the likelihood of gaining tenure. Since measures of departmental prestige typically do not take on a value of zero, one could center the measure being used so that βz would have a positive value. Third, it is reasonable to expect that departmental prestige positively affects publication productivity's effect on the likelihood of gaining tenure, meaning that γ1 is positive. (The causal effect of publication productivity is higher in high-prestige departments than in low-prestige departments.) Finally, it is also reasonable to expect that publication productivity negatively affects the effect of departmental prestige on the likelihood of gaining tenure (highly productive assistant professors have a high probability of gaining tenure in any department), meaning that α1 is negative. Figure 3 shows this causal mechanism, and if the two moderator effects it contains are similar in magnitude, the usual statistical test for the coefficient of the product variable in equation 1 would suggest that departmental prestige and publication productivity have only additive effects on the likelihood of gaining tenure when in fact they have an interaction effect.[8]

[ Figure 3 about here ]

Thus, we must be confident that α1 and γ1 have the same sign before we can argue that a zero value for βxz implies that the effects of X and Z are additive.
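The offsetting-moderators possibility is easy to demonstrate by simulation. In the sketch below, with coefficient values invented to echo the tenure example (α1 = −0.4, γ1 = +0.4), data are generated from a mechanism containing two real but opposite-signed moderator effects, yet the product-term coefficient in equation 1, which estimates α1 + γ1, comes out near zero:

```python
# Data generated from the mechanism of equations 4-6 with offsetting
# moderator effects. All numerical values are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
X = rng.uniform(0, 5, n)   # e.g., departmental prestige (centered scale)
Z = rng.uniform(0, 5, n)   # e.g., publication productivity
u = rng.normal(0, 1, n)

alpha0, alpha1 = -1.0, -0.4   # X's effect at Z = 0; Z's moderation of X
gamma0, gamma1 = 0.8, 0.4     # Z's effect at X = 0; X's moderation of Z

# Level-1 equation with both level-2 equations substituted in (equation 7):
Y = 2 + (alpha0 + alpha1 * Z) * X + (gamma0 + gamma1 * X) * Z + u

design = np.column_stack([np.ones(n), X, Z, X * Z])
coefs = np.linalg.lstsq(design, Y, rcond=None)[0]
bxz = coefs[3]
print(round(bxz, 2))  # near zero: the two moderator effects cancel
```

The usual t-test on the product term would therefore fail to reject additivity even though, by construction, both moderator effects are present.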
Inferring additivity from a zero value of βxz without knowing that the moderator components of the interaction have the same sign commits the fallacy of affirming the consequent. Specifically, one cannot necessarily conclude that no interaction effect exists on the basis of (1) the truth that when no interaction is present the coefficient for a product term will equal zero and (2) the observation of a zero value for that coefficient.

[8] In fact, data on assistant professors in U.S. graduate sociology departments that I am currently analyzing show that net of relevant controls (public vs. private university, etc.), the coefficient for the product of publication productivity and departmental prestige is not statistically significant.

Returning to the general case, one must obviously have some kind of additional information about α1 or γ1 before one can determine a value for either of these coefficients. If one knew α1's value, for example, one could subtract it from βxz to obtain an estimate of γ1. Similarly, if one knew that α1 was a specific multiple of γ1 and had an estimate of βxz, it would be possible to use knowledge of the multiple (k) and the results of two simultaneous equations (α1 = kγ1 and α1 + γ1 = βxz) to estimate α1 and γ1. A third strategy is available to those who know that X's impact on Z's effect on Y differs in functional form from Z's impact on X's effect on Y. If one knew that βz = γ0 + γ1√X, for example, one would substitute this equation and the other level-2 equation 5 into level-1 equation 4 to obtain Y = β0 + α0X + γ0Z + α1ZX + γ1Z√X + u. Because this equation contains two different product variables, one can obtain separate estimates for α1 and γ1. Unfortunately, because the product variables ZX and Z√X will be highly correlated, this strategy produces imprecise estimates of α1 and γ1.
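The collinearity problem behind the third strategy is easy to see directly. The sketch below assumes, as one concrete possibility, that the second moderator enters through the square root of X; the two resulting product variables are then nearly collinear, which is what inflates the standard errors of their separate coefficients:

```python
# Why the two-product-variable strategy is imprecise: the product variables
# Z*X and Z*sqrt(X) are nearly collinear. Variable ranges are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n = 5000
X = rng.uniform(0, 10, n)
Z = rng.uniform(0, 10, n)

zx = Z * X
zrx = Z * np.sqrt(X)
corr = np.corrcoef(zx, zrx)[0, 1]
print(round(corr, 2))  # very high, so the separate moderator estimates
                       # will have large standard errors
```

The point generalizes to any pair of functional forms that are close to linearly related over the observed range of X.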
Obviously, these strategies for obtaining unique estimates of α1 and γ1 require knowledge or theoretical precision that social scientists rarely possess. Without them, however, an estimate of βxz's value tells us nothing about either X's impact on Z's effect on Y or Z's impact on X's effect on Y.

Implicit Identification of the Product Term

A common, although seldom explicitly justified, way of dealing with the identification problem noted above is to specify either that X has no impact on Z's effect on Y, or that Z has no impact on X's effect on Y. If we specify that γ1 in equation 6 equals zero, for instance, level-2 equation 6 becomes βz = γ0, and the estimation equation is Y = β0 + α0X + γ0Z + α1ZX + u. Figure 4 shows this kind of interaction effect, which consists of two effects conditioned on zero and one moderator effect.

[ Figure 4 about here ]

Once again, one would use equation 1 to estimate the coefficients specified by this new formulation. Now, however, βxz in equation 1 corresponds to α1's value. Of course, a parallel argument that α1 in equation 5, instead of γ1, equals zero produces the equation Y = β0 + α0X + γ0Z + γ1ZX + u. In this case βxz in equation 1 corresponds to γ1's value. Thus, it is clear that the meaning of the coefficient for the product variable in equation 1 depends on the nature of the causal mechanism that a researcher believes is generating the regression surface given by equation 1.

Multilevel models almost always specify that one of the possible moderator effects in an interaction is absent. In particular, in contextual-effect models researchers almost always specify that contextual variables affect the effects of individual-level variables but not vice versa. For example, let us consider a simple model involving the effects of an individual-level variable, pupil SES, and a contextual variable, schools' per pupil funding, on individual pupils' achievement-test scores.
Researchers would typically specify a contextual-effect model for these variables with level-1 equation 9 stipulating that individuals' test scores (Y) are a function of individuals' SES levels (X), and two level-2 equations (10 and 11) stipulating that both the intercept and slope of the level-1 equation are functions of schools' per pupil funding (Z).[9]

Y = β0 + βxX + u    (9)

β0 = α0 + α1Z + e0    (10)

βx = γ0 + γ1Z + e1    (11)

Substituting equations 10 and 11 into equation 9 and simplifying, we obtain equation 12 which, except for its more complicated error term, is equivalent to equation 1.

Y = α0 + α1Z + γ0X + γ1XZ + [e0 + e1X + u]    (12)

[9] Multilevel models usually specify that level-2 equations have disturbance terms. An alternative in this case might be to specify equation 4 as the level-1 equation and equation 11 as a single level-2 equation. Regardless of one's choice, ordinary least squares gives incorrect estimates of coefficients' standard errors and researchers should therefore use other estimation procedures (Mason et al. 1983). This estimation problem does not affect the interpretational issue that I address in this paper, however.

Equation 12 specifies that the coefficient for the product term estimates the contextual effect of schools' per pupil funding levels on the individual-level effect of pupils' SES on test scores. It is important to realize, however, that by themselves equations 9 through 12 do not justify this specification; they only make explicit the knowledge that a researcher should have before using those equations. Multilevel-modeling textbooks show how to use versions of equations 9 through 12 to obtain estimates of contextual effects, but to my knowledge they do not discuss the prior need to justify the assumption that contextual variables cause variation in the effects of individual-level variables, but that individual-level variables do not cause variation in the effects of contextual variables.
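The mechanics of equations 9 through 12 can be sketched with simulated data. The numbers below (30 schools, 50 pupils each, and all coefficient values) are invented; the point is only that pooled least squares applied to equation 12 recovers the cross-level coefficient γ1 as the product-term coefficient, given this data-generating process:

```python
# Simulation of the contextual-effect mechanism: school funding (Z) shifts
# both the intercept (equation 10) and the SES slope (equation 11) of the
# pupil-level equation 9. Point estimates from pooled OLS are consistent
# here, though (as noted in the text) the OLS standard errors are wrong.
import numpy as np

rng = np.random.default_rng(3)
n_schools, n_pupils = 30, 50

alpha0, alpha1 = 20.0, 0.5   # intercept equation (10)
gamma0, gamma1 = 1.0, 0.2    # slope equation (11)

rows = []
for _ in range(n_schools):
    Zj = rng.uniform(0, 10)                              # per pupil funding
    b0j = alpha0 + alpha1 * Zj + rng.normal(0, 0.5)      # equation 10
    bxj = gamma0 + gamma1 * Zj + rng.normal(0, 0.1)      # equation 11
    ses = rng.normal(0, 1, n_pupils)                     # pupil SES
    scores = b0j + bxj * ses + rng.normal(0, 1, n_pupils)  # equation 9
    for x, y in zip(ses, scores):
        rows.append((x, Zj, y))

X, Z, Y = (np.array(c) for c in zip(*rows))
design = np.column_stack([np.ones(len(Y)), X, Z, X * Z])
coefs = np.linalg.lstsq(design, Y, rcond=None)[0]
print(round(coefs[3], 2))  # approximately gamma1 = 0.2
```

Note that nothing in the estimation itself certifies the direction of moderation; the same fitted product coefficient would arise if the causal arrow ran the other way.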
In the example at hand, what is the justification for specifying that pupil SES does not affect the causal impact of per pupil funding on test scores? The textbooks' silence on the possibility that individual-level variables may cause variation in the effects of contextual variables does not justify the assumption that they do not do so. Algebraically, one could just as easily place Z in the level-1 equation and X in the level-2 equations to produce an "individual effects" model asserting that variation in the effect of schools' per pupil funding on test scores stems from variation in pupils' SES.[10] Furthermore, there are substantive questions for which this alternative specification is appropriate (Kozlowski and Klein 2000). In at least one instance (Xue et al. 2007), for example, researchers estimated a model in which an individual-level variable (prosocial activities) moderated the effects of contextual (neighborhood) variables. Following the common practice, the researchers did not attempt to justify their specification that the contextual-level variables had no effect upon the effects of the individual-level variable.

[10] In this case one would need to have sufficient cases for each observed value of X, and reasonable variation in Z for each value of X, to reliably estimate the impact of X on Z's effect on Y.

Given the apparently universal failure to justify the implicit claim that either γ1 in equation 6 or α1 in equation 5 equals zero, it is probably reasonable to admit that "contextual effects" and "individual effects" coexist, in which case we are back to the type of causal mechanism discussed in the previous section (although one with a more complicated error term than is present in equation 1), in which it is impossible to determine the meaning of the coefficient for the product term.
In general, it is difficult to avoid the conclusion that the popularity of the specification that individual-level variables do not affect the effects of contextual-level variables rests on unconsidered convention rather than on conscious deliberation.

Theoretically Based Identification of the Product Term

Having dwelt on the general difficulties associated with specifying the effects of X and Z on the other variable's effect on Y, let us consider whether this can ever be done. One class of causal mechanisms in which it is relatively easy to specify such effects consists of cases in which an independent variable (X) affects a dependent variable while a second independent variable (Z) affects cases' exposure to X but alone has no effect on Y. As an example, consider the effect of whether a parent smokes tobacco on the likelihood that a preschool child exhibits asthma symptoms. One would expect that the difference in asthma symptoms between children with a parent who smokes tobacco and those whose parents do not will be smaller for children with parents who are employed outside the home, because differences in preschoolers' exposure to tobacco smoke will be smaller among this group than among those whose parents work in the home. However, whether a parent works outside the home has no effect on the likelihood of developing asthma among those preschoolers whose parents do not smoke tobacco.

[ Figure 5 about here ]

Figure 5 presents a diagram representing this kind of interaction effect. Note that the omission of an arrow from Z to Y in this figure represents the fact that Z is causally related to Y only through its effect on X's effect on Y. This kind of interaction effect nicely illustrates the distinction between a causal effect and a causal mechanism. If a parent who smokes were to quit a job outside the home, one can expect that the preschool child would be more likely to develop asthma than if the parent were to keep that job.
The causal mechanism that produces this causal effect does not involve a direct causal link between working outside the home and asthma, however. Instead it is produced entirely by the effect of working outside the home on the effect of having a parent who smokes. We can turn the above theoretical specifications into level-1 equation 13 and level-2 equation 14, where X measures how much a parent smokes, Z is a measure of how much time that parent works outside the home, and Y is a measure of the extent to which a preschooler has developed asthma. Substituting equation 14 into equation 13, we obtain equation 15.

Y = β0 + βxX + u    (13)

βx = α0 + α1Z    (14)

Y = β0 + α0X + α1ZX + u    (15)

This equation differs structurally from equation 1 because it does not include a "first order" or "additive effect" term for Z. Note that using equation 1 rather than 15 will produce consistent, but inefficient, estimates of the coefficients for this interaction because the equation contains an irrelevant variable (Allison 1977). Thus, if we believed that equations 13 and 14 specify the causal mechanism underlying an interaction effect, we would not use equation 1 to estimate its parameters. This conclusion is at odds with the claim that researchers should always include lower-order terms when estimating interaction-effect models (McClendon 1994, pp. 286-287; Cohen et al. 2003, p. 284). Whether lower-order terms need to be included in an estimation equation depends partly on the causal mechanism that one believes to be present (see also Bobko 1986, Cronbach 1987). A theoretical specification of the causal mechanism underlying an interaction effect is not the only determinant of the estimation equation one should use, however, because the types of scales that are used to measure the variables involved in the interaction also play a role.
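A quick simulation illustrates the point about equation 15. Data are generated with no additive Z term, using invented coefficient values; fitting equation 15 and fitting equation 1 (with the irrelevant Z regressor) both recover α1, and the estimated coefficient for Z in the latter is near zero:

```python
# Sketch of estimating equation 15, where Z moderates X's effect but has
# no effect of its own (as in the parental-smoking example). All numbers
# are illustrative.
import numpy as np

rng = np.random.default_rng(4)
n = 5000
X = rng.uniform(0, 10, n)   # e.g., amount a parent smokes (ratio scale)
Z = rng.uniform(0, 10, n)   # e.g., hours worked outside the home
u = rng.normal(0, 1, n)

alpha0, alpha1 = 0.8, -0.05              # equation 14: bx = alpha0 + alpha1*Z
Y = 2 + (alpha0 + alpha1 * Z) * X + u    # equations 13 and 15

# Equation 15: no first-order Z term.
d15 = np.column_stack([np.ones(n), X, X * Z])
b15 = np.linalg.lstsq(d15, Y, rcond=None)[0]

# Equation 1: includes the irrelevant Z regressor.
d1 = np.column_stack([np.ones(n), X, Z, X * Z])
b1 = np.linalg.lstsq(d1, Y, rcond=None)[0]

print(round(b15[2], 2), round(b1[3], 2))  # both near alpha1 = -0.05
print(round(b1[2], 2))                    # Z's coefficient: near zero
```

This works here because both variables are on ratio scales with meaningful zero points; the next paragraphs show how the situation changes when a variable is measured on an interval scale.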
Specifically, when one or more of these variables are measured in terms of an interval scale,[11] it may be necessary to include a first-order term in one's estimation equation even though the theoretical equations for the interaction effect suggest that it should be omitted (Allison 1977, p. 150). This is because interval scales have an arbitrary zero point, which in turn requires that one use an estimation equation that allows for possible changes in that zero point. A general rule for the kind of causal mechanism specified by Figure 5 is that whenever the variable specified in the level-1 equation (X) is measured in terms of an interval scale, we need to use an estimation equation that includes a first-order term for the moderating variable (Z).

[11] Here I follow the typology of scales proposed by Stevens (1951). Interval scales have quantitative categories with equal intervals between successive scale values, but lack an absolute zero point.

To see this, let us consider an interaction with one moderator and one effect conditioned on zero in which both X and Z are quantitative variables. Suppose that X in equation 13 is measured in terms of an interval scale. Since the score of zero for such a scale is arbitrary, we may add any non-zero constant to the existing values of X without altering any of the essential properties of X. Specifying the constant as c, we might create a new variable, x, where x = X + c, implying that X = x − c (Allison 1977). Substituting this into equation 13, we have for our new level-1 equation Y = β0 + βx(x − c) + u, and once again equation 14 is the level-2 equation. Substituting equation 14 into the new level-1 equation yields equation 16, which can be simplified to equation 17, in which β0* = β0 − α0c and βz* = −α1c.
Y = β0 − α0c + α0x − α1cZ + α1xZ + u   (16)
Y = β0* + α0x + βz*Z + α1xZ + u   (17)

Equation 16 is structurally identical to equation 1 and includes Z as a regressor even though the theoretical equations specify that Z does not causally affect Y when X equals zero. We can justify omitting Z from the estimation equation only if there is reason to believe that either α1 or c, which compose βz*, equals zero. We have already specified that c is non-zero, however, and if we believed that α1 equals zero we would not be estimating an interaction-effect model in the first place. Both β0* and βz* in equation 17 are affected by one's choice of a value for c. This makes sense, because β0* is the regression plane's intersection with the Y axis when all of the independent variables, including x, equal zero, and βz* is the slope of the regression plane in the Z dimension when x equals zero. Returning to Figure 1, for example, if we changed X's zero point to be at point b on the X axis, β0*'s value would increase and βz*'s value would be negative rather than positive. In contrast, equation 17 shows that changes in X's zero point will not affect the coefficient for x itself or the coefficient for the product variable (Allison 1977). Let us next consider the case where it is the level-2 variable (Z), rather than the level-1 variable, that is measured in terms of an interval scale. Equation 13 remains the level-1 equation, but following the same logic as above, the level-2 equation is now βx = α0 + α1(z − c), where z = Z + c. Substituting this into equation 13 and simplifying, we obtain equation 18.

Y = β0 + (α0 − α1c)X + α1Xz + u = β0 + β1*X + α1Xz + u   (18)

In this case there is no additive coefficient for z in the estimation equation, and the only coefficient affected by the choice of zero value for Z is β1*.
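A quick numerical check may help here. In the hypothetical sketch below (invented parameter values, numpy only), data are generated from the Figure 4 mechanism of equations 13 and 14; refitting equation 1 after shifting X's zero point changes only the intercept and the coefficient for Z, leaving the coefficients for x and the product variable untouched.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Figure 4 mechanism: Y = b0 + (a0 + a1*Z)*X + u (equations 13 and 14).
b0, a0, a1, c = 2.0, 0.5, 0.8, 3.0
X = rng.normal(size=n)
Z = rng.normal(size=n)
Y = b0 + (a0 + a1 * Z) * X + rng.normal(size=n)

def fit_eq1(x, z, y):
    """OLS estimates for equation 1: Y = b0 + bx*x + bz*z + bxz*x*z + u."""
    design = np.column_stack([np.ones(len(y)), x, z, x * z])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

original = fit_eq1(X, Z, Y)      # zero point of X left alone
shifted = fit_eq1(X + c, Z, Y)   # x = X + c: a new, equally arbitrary zero

print("original:", original.round(2))
print("shifted: ", shifted.round(2))
# The intercept becomes b0 - a0*c and Z's coefficient becomes -a1*c,
# while the coefficients for x and for the product variable are unchanged.
```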
Because β1* tells us the slope of the regression surface in the X dimension when Z equals zero, it should be affected by our arbitrary choice of Z's zero value. In sum, when using a product variable to model the type of interaction effect shown in Figure 4, we must consider not only the level-1 and level-2 equations in determining an appropriate equation for estimation and testing, but also whether any of the variables in the level-1 equation are measured in terms of interval scales. If they are, the estimation equation will have to include additive effects that are not implied by the level-1 and level-2 equations. Another type of theoretically based approach to specifying a causal mechanism underlying an interaction effect comprises instances where a given value of a dependent variable can occur only when two or more independent variables simultaneously have certain values. Most substantive instances of these interactions involve binary independent variables.12 For example, for many years only white men could become CEOs of Fortune 500 companies. Similarly, Goldstone and Useem (1999) argue that prison riots are much more likely to occur if five preconditions are simultaneously present than if any is absent. Bobko (1986) gives more examples of this kind of causal mechanism. Figure 6 presents a graphical representation of this kind of causal mechanism for two independent variables, X and Z. To indicate that only the joint occurrence of certain values of both independent variables affects the dependent variable, Figure 6 represents their joint occurrence by using the multiplication operator and grouping both variables within a circle. Returning to the CEO example, one could construct binary variables for being male (X = 0 if no, X = 1 if yes) and being white (Z = 0 if no, Z = 1 if yes). The product X*Z will then equal one when both X and Z have values of one, but zero if either equals zero.
X and Z appear in the level-1 equation for this kind of causal mechanism only as parts of the product term (see equation 19), and below I will refer to this kind of causal mechanism as a "purely multiplicative mechanism" (PMM).

[ Figure 6 about here ]

PMMs involve no level-2 equations, and in the CEO example the level-1 equation specifies that (1) being white has no effect on females' chances of being a CEO, and (2) being male has no effect on blacks' chances of being a CEO.

Y = β0 + β1XZ + u   (19)

As noted earlier, many have claimed that the coefficient for the product term in product-variable models of interaction effects simultaneously describes each independent variable's effect on the other's effect on the dependent variable. This claim is true for PMMs, but not for the previous types. In the CEO example, β1 in equation 19 describes both how variation in sex affects the effect of race (the latter equals zero when one is female and β1 when one is male) and how variation in race affects the effect of being male (the latter equals zero when one is nonwhite and β1 when one is white). Alternatively, one might view this kind of causal mechanism as involving a single variable rather than two. Specifically, if one defines a binary variable in terms of being a white male (= 1) or not (= 0), then the value of β1 describes the additive effect of this new variable. Equation 19 is both the theoretical equation for a PMM and, in most cases, the equation one would use to estimate the coefficient describing it.

12. The main exception to this generalization is "gravity" or "intervening opportunity" models of migration, which hold that migration between two geographical areas is a function of their population sizes and the distance between them, and which use ratio-scale measures of these concepts. See, for example, Stewart (1948).
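The dual reading of β1 in equation 19 can be seen in a simulation. The sketch below is hypothetical (binary indicators in the spirit of the CEO example, with invented coefficient values): it fits equation 19 and then computes the effect of X separately at each value of Z.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Binary indicators in the spirit of the CEO example:
# X = 1 for male, Z = 1 for white; only their joint occurrence matters.
X = rng.integers(0, 2, size=n)
Z = rng.integers(0, 2, size=n)
b0, b1 = 0.1, 0.6
Y = b0 + b1 * X * Z + rng.normal(scale=0.5, size=n)

# Equation 19: the product term appears with no first-order terms.
design = np.column_stack([np.ones(n), X * Z])
(b0_hat, b1_hat), *_ = np.linalg.lstsq(design, Y, rcond=None)

# b1 describes both moderation directions at once: the effect of X is
# zero when Z = 0 and b1 when Z = 1 (and symmetrically for Z).
x_effect_z0 = Y[(X == 1) & (Z == 0)].mean() - Y[(X == 0) & (Z == 0)].mean()
x_effect_z1 = Y[(X == 1) & (Z == 1)].mean() - Y[(X == 0) & (Z == 1)].mean()
print(round(b1_hat, 2), round(x_effect_z0, 2), round(x_effect_z1, 2))
```

The estimated β1 matches both conditional effects, consistent with reading the mechanism as a single "white male" indicator with an additive effect.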
As is true for the previous type of interaction, however, if one of the independent variables in equation 19 is measured in terms of an interval scale, an additive effect for the other will need to be added to equation 19 to take account of the arbitrariness of the zero point of the interval-scale independent variable. In fact, when both independent variables are measured in terms of interval scales, Allison (1977, p. 150) shows not only that equation 1 is the appropriate estimation equation but also that special estimation procedures are needed, because this case implies constraints on the coefficients being estimated. In sum, although it is usually difficult to specify whether each variable in an interaction effect moderates the effects of the other variable(s) involved, there are cases where it is possible to do so. It is important to note, however, that the ability to make such a specification rests on preexisting theoretical knowledge rather than on the empirical results obtained from estimating any of the above equations. Without such preexisting knowledge, one cannot go beyond interpreting the value of the coefficient for the product term as merely one part of a description of the regression surface yielded by one's estimation procedure.

Conclusion

In this paper I have shown how the causal mechanism underlying an interaction effect generates the coefficient for the product term in a product-variable model of that interaction effect. The main lesson to be learned from this demonstration is that researchers need preexisting theoretical information about the nature of the causal mechanism at work before they can move beyond treating the coefficient for the product term merely as part of the description of a regression surface. Of course, for some purposes we do not need to go any further than describing the regression surface.
When we need to include an interaction effect only in order to have a correctly specified analytic equation, for example, our inability to go beyond describing the regression surface is not a substantive hindrance. For other purposes, however, knowledge about underlying causal mechanisms is crucial, and errors can have serious consequences. For example, believing that the type of mechanism shown in Figure 2 is present when in reality the type operating is that shown in Figure 5 could lead to an ineffectual policy recommendation. Specifically, if the mechanism shown in Figure 5 is present, increasing the value of Z will have no effect on Y's value when X is absent (has a value of zero). Believing that the mechanism shown in Figure 2 is operating in this case might lead one to recommend a policy change aimed at increasing Z's value, because the kind of interaction effect shown in Figure 2 specifies that Z has an effect on Y even in the absence of X. Thus, even though we cannot distinguish between different possible causal mechanisms solely on the basis of the empirical results of studies employing product variables, different causal mechanisms can produce substantially different outcomes. Considering the causal mechanisms that may underlie interaction effects will also reduce the prevalence of tacit specifications, such as the popular specification that contextual variables moderate the effects of individual-level variables but not vice versa. If one cannot theoretically justify this claim in a given analysis, then the coefficient for a cross-level product term in a multilevel model cannot be interpreted as measuring a contextual effect. I believe that many researchers who have reported statistically significant coefficients for cross-level product terms and interpreted them as evidence for contextual effects would have difficulty explicitly defending their implicit specifications.
Finally, considering the causal mechanisms that might underlie an interaction effect reveals a limitation of the usual statistical test for the presence of an interaction effect, which involves assessing the statistical significance of the coefficient for the product variable. In particular, if one is unable to justify specifying a causal mechanism other than that shown in Figure 2, and if α1 and γ1 in equation 8 have similar magnitudes but opposite signs, the value of βxz in equation 1 will be near zero and presumably not statistically significant, leading to the erroneous conclusion that no interaction effect is present. Researchers unable to specify that at least one of the independent variables has no effect on the other's causal impact must therefore be confident that α1 and γ1 have the same sign before they can treat the usual statistical test as a general test for the presence of an interaction effect. Obviously, when researchers want to carry out a simultaneous test for several possible interactions, such as all of the possible two-way interactions between gender and a group of other independent variables, they need information about the causal mechanisms underlying each of the interactions before they can be confident that they are carrying out an overall test of interaction effects. The general point that researchers need theoretical knowledge before they can interpret the characteristics of a regression surface in causal terms holds for additive models as well as those involving interaction effects (Duncan 1975; Berk 2004). In additive models, however, the connection between a theoretical model and the partial derivatives that characterize the regression surface is clear and direct, because the variables we include in our estimation equations are the same as those that our theory specifies as causes of the dependent variable.
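The cancellation problem just described can be made concrete by simulation. The sketch below assumes, since equation 8 appears earlier in the paper, that the Figure 2 mechanism makes the product coefficient in equation 1 equal α1 + γ1; that reading of equation 8, like all the numerical values here, is an assumption made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Assumed two-moderator mechanism (Figure 2):
#   Y = b0 + (a0 + a1*Z)*X + (g0 + g1*X)*Z + u,
# so the product coefficient in equation 1 equals a1 + g1.
# Illustrative values with a1 and g1 equal in size but opposite in sign:
b0, a0, g0, a1, g1 = 1.0, 0.4, 0.7, 0.5, -0.5
X = rng.normal(size=n)
Z = rng.normal(size=n)
Y = b0 + (a0 + a1 * Z) * X + (g0 + g1 * X) * Z + rng.normal(size=n)

design = np.column_stack([np.ones(n), X, Z, X * Z])
coefs, *_ = np.linalg.lstsq(design, Y, rcond=None)
bxz = coefs[3]

# An interaction mechanism is genuinely present, yet the estimated
# product coefficient hovers near zero and the usual test misses it.
print("estimated product coefficient:", round(bxz, 2))
```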
In contrast, when we model interaction effects we include product variables that typically do not correspond directly to the causal mechanisms at work and that are compatible with several different causal mechanisms. Because we usually lack theoretical knowledge about the causal mechanisms that underlie interaction effects, our ability to give causal interpretations to product-variable models of interaction effects is limited. Recognition of this limitation will not only prevent researchers from erroneously interpreting regression surfaces as giving information about the causal mechanisms underlying interaction effects, but should also encourage researchers to develop more adequate theoretical specifications of the interaction effects in which they are interested.

Bibliography

Aiken, Leona S. and Stephen G. West. (1991) Multiple Regression: Testing and Interpreting Interactions. Newbury Park, CA: Sage.
Allison, Paul D. (1977) "Testing for Interaction in Multiple Regression." American Journal of Sociology 83: 144-153.
Allison, Paul D. (1999) Multiple Regression: A Primer. Thousand Oaks, CA: Pine Forge Press.
Berk, Richard A. (2004) Regression Analysis: A Constructive Critique. Thousand Oaks, CA: Sage.
Blalock, Hubert M., Jr. (1962) "Four-Variable Causal Models and Partial Correlations." American Journal of Sociology 68: 182-194.
Blalock, Hubert M., Jr. (1965) "Theory Building and the Concept of Interaction." American Sociological Review 30: 374-381.
Bobko, Philip. (1986) "A Solution to Some Dilemmas When Testing Hypotheses about Ordinal Interactions." Journal of Applied Psychology 71: 323-326.
Box, Joan F. (1978) R. A. Fisher: The Life of a Scientist. New York: John Wiley and Sons.
Braumoeller, Bear F. (2004) "Hypothesis Testing and Multiplicative Interaction Terms." International Organization 58: 807-820.
Cohen, Jacob, Patricia Cohen, Steven G. West and Leona S. Aiken. (2003) Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Mahwah, NJ: Lawrence Erlbaum Associates.
Corno, Lyn, Lee J. Cronbach, Haggai Kupermintz, David F. Lohman, Ellen B. Mandinach, Ann W. Porteus and Joan E. Talbert. (2002) Remaking the Concept of Aptitude: Extending the Legacy of Richard E. Snow. Mahwah, NJ: Lawrence Erlbaum Associates.
Cronbach, Lee J. (1987) "Statistical Tests for Moderator Variables: Flaws in Analyses Recently Reported." Psychological Bulletin 102: 414-417.
Duncan, Otis D. (1975) Introduction to Structural Equation Models. New York: Academic Press.
Fisher, Gene A. (1988) "Problems in the Use and Interpretation of Product Variables." Pp. 84-107 in J. Scott Long (Ed.) Common Problems/Proper Solutions: Avoiding Error in Quantitative Research. Newbury Park, CA: Sage.
Fisher, Ronald A. (1924) "The Biometrical Study of Heredity." Eugenics Review 16: 189-210.
Fox, John. (1997) Applied Regression Analysis, Linear Models, and Related Methods. Thousand Oaks, CA: Sage.
Goldstone, Jack A. and Bert Useem. (1999) "Prison Riots as Microrevolutions: An Extension of State-Centered Theories of Revolution." American Journal of Sociology 104: 985-1029.
Hox, Joop. (2002) Multilevel Analysis: Techniques and Applications. Mahwah, NJ: Lawrence Erlbaum Associates.
Kozlowski, Steve W. J. and Katherine J. Klein. (2000) "A Multilevel Approach to Theory and Research in Organizations: Contextual, Temporal and Emergent Processes." Pp. 3-90 in Katherine J. Klein and Steve W. J. Kozlowski (Eds.) Multilevel Theory, Research, and Methods in Organizations. San Francisco: Jossey-Bass.
Manski, Charles F. (1993) "Identification Problems in the Social Sciences." Pp. 1-56 in Peter V. Marsden (Ed.) Sociological Methodology 1993. Oxford: Basil Blackwell.
Marini, Margaret M. and Burton Singer. (1988) "Causality and Degrees of Determination in Social Sciences." Pp. 347-409 in Clifford C. Clogg (Ed.) Sociological Methodology 1988. Washington, D.C.: American Sociological Association.
Marsden, Peter V. (1981) "Conditional Effects in Regression Models." Pp. 97-116 in P. V. Marsden (Ed.) Linear Models in Social Research. Newbury Park, CA: Sage.
Mason, William M., George Y. Wong and Barbara Entwistle. (1983) "Contextual Analysis through the Multilevel Linear Model." Pp. 72-103 in Samuel Leinhardt (Ed.) Sociological Methodology 1983-1984. San Francisco: Jossey-Bass.
McClendon, McKee J. (1994) Multiple Regression and Causal Analysis. Itasca, IL: F. E. Peacock.
Saunders, David R. (1956) "Moderator Variables in Prediction." Educational and Psychological Measurement 16: 209-222.
Smith, Herbert L. (1990) "Specification Problems in Experimental and Nonexperimental Social Research." Pp. 59-91 in Clifford C. Clogg (Ed.) Sociological Methodology 1990. Oxford: Basil Blackwell.
Southwood, Kenneth E. (1978) "Substantive Theory and Statistical Interaction: Five Models." American Journal of Sociology 83: 1154-1203.
Stevens, S. S. (1951) "Mathematics, Measurement and Psychophysics." Pp. 1-49 in S. S. Stevens (Ed.) Handbook of Experimental Psychology. New York: Wiley.
Stewart, John Q. (1948) "Demographic Gravitation: Evidence and Applications." Sociometry 11: 31-58.
Stolzenberg, Ross M. (1974) "Estimating an Equation with Multiplicative and Additive Terms, with an Application to Analysis of Wage Differentials between Men and Women in 1960." Sociological Methods and Research 2: 313-331.
Stolzenberg, Ross M. (1979) "The Measurement and Decomposition of Causal Effects in Nonlinear and Nonadditive Models." Pp. 459-488 in Karl F. Schuessler (Ed.) Sociological Methodology 1980. San Francisco, CA: Jossey-Bass.
Zeisel, Hans. (1968) Say It with Figures (Fifth Edition). New York: Harper and Row.

Figure 1. A Warped Regression Surface with β0 > 0, β1 > 0, β2 > 0, and β3 < 0. [Figure omitted; the surface's slope in the Z dimension is βz at X = 0 and βz + b·βxz at X = b, and its slope in the X dimension is βx at Z = 0 and βx + a·βxz at Z = a.]

Figure 2. An Interaction Effect with Two Effects Conditioned on Zero and Two Moderators. [Path diagram omitted.]

Figure 3. The Causal Mechanism Underlying the Interaction Effect of Departmental Prestige and Publication Productivity on the Likelihood that an Assistant Professor Will Gain Tenure. [Path diagram omitted; the labeled coefficients are βx < 0, βz > 0, α1 < 0 and γ1 > 0.]

Figure 4. An Interaction Effect with Two Effects Conditioned on Zero and One Moderator. [Path diagram omitted.]

Figure 5. An Interaction Effect with One Effect Conditioned on Zero and One Moderator. [Path diagram omitted.]

Figure 6. A Purely Multiplicative Causal Mechanism. [Diagram omitted: the product of the two independent variables, grouped within a circle, points to Y.]