Product-Variable Models of Interaction Effects and Causal Mechanisms *
Lowell Hargens
University of Washington
Working Paper no. 67R
Center for Statistics and the Social Sciences
University of Washington
November 8, 2006; Revised February 20, 2008
* This paper has benefited from valuable advice and encouragement given by Paul
Allison, Herbert Costner, Christine Fountain, Jerry Herting, Beth Hirsh, Stephanie
Liddle, Stanley Lieberson, J. Scott Long, Ross Matsueda, Barbara Reskin, Stewart
Tolnay and Michael Ward. Direct correspondence to Lowell Hargens, Department of
Sociology, University of Washington, Seattle, WA 98195-3340. Email:
[email protected].
Product-Variable Models of Interaction Effects and Causal Mechanisms
Abstract
Among those who use multiple regression or its offshoots, the dominant method of
modeling an interaction effect of two independent variables on a dependent variable is to
include a product variable in a linear estimation equation. In this paper I show how the
coefficient for the product variable in these models depends on the causal mechanism that
underlies the interaction effect. I also show that different causal mechanisms can imply
the same estimation equation, which means that one cannot determine the causal
mechanism underlying an interaction effect from the empirical results produced by the
estimation equation. Social scientists typically lack the kind of theoretical knowledge
required to specify causal mechanisms for interaction effects, and researchers who do
specify a mechanism, such as those who specify that contextual-level variables moderate
the effects of individual-level variables but not vice versa, rarely justify their implicit
claims. Although it is difficult to specify the causal mechanism underlying a particular
interaction effect, I show that there are cases where it is possible to do so.
Product-Variable Models of Interaction Effects and Causal Mechanisms1
Many interesting findings in the social sciences involve “interaction” effects, also
known as “moderator” or “synergistic” effects (Cohen et al. 2003, p. 255; Corno et al.
2002). Two variables have an interaction effect on a dependent variable if the
relationship of either independent variable with the dependent variable changes across
values of the other independent variable.2 In an early study of radio listenership, for
example, Lazarsfeld found that age was positively related to listening to classical music
programs among the highly educated, but negatively related to it among the less educated
(Zeisel, pp. 123-125). Although there are many ways to model interaction effects
(Stolzenberg 1974, Allison 1977, Southwood 1978, Fisher 1988), quantitative studies
using multiple regression analysis or its offshoots almost always employ product
variables to model them. Indeed, most textbooks treat only the product-variable option.
The product-variable approach to modeling an interaction effect of two
explanatory variables involves including the product of the two variables in an equation
that usually also contains both of them individually. Specifically, given a dependent
variable Y and independent variables X and Z, we multiply the values of X and Z for
1 This paper draws on insights contained in two important papers. The first, Allison (1977), is heavily cited because it clearly articulates some key features of product-variable models of interaction effects. The second, Fisher (1988), is rarely cited even though it clearly shows that preexisting knowledge or theory is required before one can give a causal interpretation to an interaction effect model. Furthermore, my examination of papers that do cite it indicates that their authors refer to it for points tangential to its main message.
2 This usage of the word “interaction” is recognized by almost no English dictionaries and appears to be due to R. A. Fisher, who adapted a term from genetics to describe deviations from additivity (Box 1978, p. 111; Fisher 1924, p. 200).
each case and then include this new variable, as well as X and Z, in the regression
equation shown in equation 1.3
Y = β0 + βxX + βzZ + βxz(XZ) + u    (1)
Higher order interaction effects—those involving three or more independent variables—
can also be modeled by forming products of the constituent independent variables (see
Blalock 1965, Cohen et al. 2003, pp. 290-91). Omitting the product variable from
equation 1 forces the regression surface to be a plane, and specifies that X and Z have
only additive effects on Y (Cohen et al., pp. 257-259). In contrast, including the product
variable in equation 1 allows the best fitting regression surface to be “warped” rather than
forcing it to be two dimensional (see Figure 1).
[ Figure 1 about here ]
Including the XZ product term in equation 1 complicates the interpretation of the
resulting regression coefficients. Several authors, for example, have noted that it is
wrong to interpret βx and βz in equation 1 as “main effects” that are separate from the
interaction effect of X and Z on Y (e.g., Allison 1977, Marsden 1981, Braumoeller 2004).
Instead, βx only gives the slope of the regression surface when it intersects the Y-X plane
at Z = 0, while βz gives the slope of the regression surface when it intersects the Y-Z
plane at X = 0 (see Figure 1).
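The mechanics of equation 1 can be illustrated with a small simulation (a sketch, not drawn from the paper; all coefficient values are assumed): data are generated with known coefficients and recovered by ordinary least squares, and the slope in the X dimension at any given Z is βx + βxz·Z, reducing to βx only at Z = 0.

```python
import numpy as np

# Simulation sketch of equation 1 (all coefficient values are assumed):
# generate data with known b0, bx, bz, bxz and recover them by OLS.
rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=n)
Z = rng.normal(size=n)
b0, bx, bz, bxz = 1.0, 2.0, -1.0, 0.5
Y = b0 + bx * X + bz * Z + bxz * X * Z + rng.normal(scale=0.5, size=n)

# Fit Y = b0 + bx*X + bz*Z + bxz*(X*Z) + u
design = np.column_stack([np.ones(n), X, Z, X * Z])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(np.round(coef, 2))  # approximately recovers [1.0, 2.0, -1.0, 0.5]

# bx is the slope in the X dimension only where Z = 0; at another Z the
# slope is bx + bxz*Z (for example, at Z = 2 it is 2.0 + 0.5*2 = 3.0).
```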
Discussions of the meaning of the coefficient for the product term (βxz) in
equation 1 are rare compared to discussions of the meanings of βx and βz. In this paper I
3 Saunders (1956) first proposed this procedure.
show that the value of βxz depends on the causal mechanism4 that underlies an interaction
effect, and also that βxz’s value cannot tell us what that mechanism is. Specifically,
although βxz’s value is part of the description of the regression surface produced by an
interaction effect, different causal mechanisms can produce the same regression surface
and, therefore, the same value for βxz. As a result, knowing the characteristics of an
interaction effect’s regression surface, in particular βxz’s value, gives us no information
about why the interaction effect exists, and sometimes misinforms us as to whether it
exists at all.
From Regression Surfaces to the Causal Mechanisms that Generate an Interaction
Effect
Discussions of how to interpret the results of modeling an interaction effect using
the product-variable approach often begin by noting that the partial derivatives of
equation 1 with respect to X and Z specify how each independent is related to the
dependent variable. For example, Stolzenberg (1979, p. 472) notes that:
∂Y/∂X = βx + βxz Z    (2)

∂Y/∂Z = βz + βxz X    (3)

4 Here I am using the term causal mechanism in the sense given by Marini and Singer (1988). Causal mechanisms are often distinguished from “causal effects.” Causal effects describe what happens when the value of an independent variable changes, while causal mechanisms describe how or why it happens (Holland 1988, Smith 1990).
These two equations illustrate the characteristics of regression surfaces generated
by product-variable models of interaction effects. First, they show that, as noted above,
βx is the slope of the regression surface in the X dimension when Z equals zero and that
βz is the slope of the regression surface in the Z dimension when X equals zero. In
addition, equation 2 shows that the slope of the regression surface in the X
dimension is a linear function of Z. With each unit increase in Z, the slope of the
regression surface in the X dimension changes by βxz units. X’s slope in Figure 1, for
example, is positive when Z = 0, which means that βx is positive. As Z increases,
however, the slope of the regression plane in the X dimension becomes increasingly less
positive (implying that βxz has a negative sign) and when Z = a, X’s slope, which at that
point equals βx + (βxz × a), is negative. Equation 3 shows that the slope of the regression
plane in the Z dimension is a linear function of X, which is why this kind of interaction
effect is often called a “linear by linear” interaction.5 Finally, equations 2 and 3 show that
βxz simultaneously describes how the slope of the regression surface in the X dimension
changes with variation in Z and how the slope of the regression surface in the Z dimension
changes with variation in X. In both cases, an increase of one unit in one variable is
associated with a change of βxz in the slope of the regression surface in the other
variable’s dimension.
Although equations 2 and 3 provide a concise description of the regression
surface produced by an interaction effect, researchers often want to interpret the
regression surface as specifying something about the causal system that produced the
5 Although my discussion here focuses on the case where the independent variables involved in an interaction are both quantitative variables, the points I make below hold equally for cases where either or both of the independent variables is categorical.
interaction effect. Thus, following the approach to interpreting regression coefficients in
additive models, researchers typically interpret βx as specifying the causal impact of X on
Y when Z equals zero and βz as specifying the causal impact of Z on Y when X equals
zero.6 The coefficient for the product term in equation 1 does not have a simple causal
interpretation because it appears to describe how the causal effect of a variable on Y is
affected by variation in the other variable. In the example of the interaction of age and
education on listening to classical music, for example, βxz seems to describe both how
variation in age produces changes in education’s effect on listening to classical music and
how variation in education produces changes in age’s effect on listening to classical
music. This feature has led many authors to characterize the product-variable approach
as “symmetrical” (Aiken and West 1991, p. 10; Cohen et al. 2003, pp. 266, 271; Fox
1997, p. 146), to state that Z’s effect on X’s relationship with Y and X’s effect
on Z’s relationship with Y are “two sides of the same coin” (McClendon 1994, p. 274), or
to claim that X and Z contribute equally to their mutual interaction (Fisher 1988, p. 88).
Below I show how βxz is determined by the causal mechanism that produces an
interaction effect. To describe the causal mechanism producing an interaction effect, one
must specify whether each independent variable affects the other independent variable’s
causal impact on the dependent variable. This is a difficult task that researchers almost
never consider seriously. For example, given the age-education interaction effect on
6 These “conditional” causal effect estimates are often meaningless because they refer to conditions that are not present in the population of cases. For the education by age interaction effect on listening to classical music, for example, if age were measured as years since birth, the coefficient for the additive education term would refer to the effect of education for cases who are not yet a year old, and there were no such cases in the data. Textbook authors frequently recommend centering the independent variables involved in an interaction so that the coefficients for the non-product terms in the regression equation will have concrete interpretations in terms of the observed data (e.g., Cohen et al. 2003, p. 261).
listening to classical music, how can one determine whether age causes changes in
education’s effect, or education causes changes in age’s effect, or both? Readers of
previous drafts of this paper have sometimes argued that it is impossible to answer this
kind of question and that, as a result, it is fruitless to talk about the causal mechanisms
that underlie interaction effects. I believe that this argument is wrong both because
researchers often implicitly specify them, and because one can sometimes specify them a
priori.
From Causal Mechanisms to Regression Surfaces
In order to determine the relationship of the coefficient of the product variable in
a product-variable interaction model to the causal mechanism that generated it, we can
begin by specifying a causal mechanism, and then derive the regression equation that a
researcher would use to estimate the components of the interaction it produces. Let us
begin by specifying an interaction effect involving two independent variables, X and Z,
and a dependent variable, Y, in which each independent variable causally affects the
dependent variable when the other independent variable equals zero, and in which each
independent variable affects the other independent variable’s effect on Y. This kind of
causal mechanism is illustrated graphically by Figure 2, which portrays (a) an
independent variable’s effect on a dependent variable when the other independent
variable equals zero as a directed line from the independent (X or Z) to the dependent
variable (Y), and (b) an independent variable’s impact on another independent variable’s
effect on a dependent variable as a directed line pointing to the line that connects the
second independent variable to the dependent variable. For labeling purposes, let us call
the first kind of effect an “effect conditioned on zero” (because its value depends on the
other variable having a value of zero) and the second kind of effect a “moderator effect.”
Thus, this kind of causal mechanism consists of two “effects conditioned on zero” and
two “moderator effects.” Note that this kind of causal mechanism does not specify that
the magnitude of Z’s effect on X’s impact on Y is the same as X’s effect on Z’s
impact on Y; it only specifies that both of these effects are non-zero.
[ Figure 2 about here ]
We can translate the above specifications into equations by following some of the
conventions of the multilevel-modeling literature (Hox 2002, see also Fisher 1988, and
Allison 1999, pp. 166-69). Equation 4 is an overall equation specifying that both X and
Z affect Y when the other variable equals zero, while equation 5 specifies that X’s effect
on Y is a linear function of Z that equals α0 when Z equals zero, and equation 6 specifies
that Z’s effect on Y is a linear function of X that equals γ0 when X equals zero.
Borrowing the terminology of the multilevel-modeling literature, I refer to equation 4 as a
“level 1” equation and equations 5 and 6 as “level 2” equations.7
Y = β0 + βxX + βzZ + u    (4)

βx = α0 + α1Z    (5)

βz = γ0 + γ1X    (6)
Substituting equations 5 and 6 into equation 4 produces equation 7, which can be
rewritten as equation 8. Because equation 8 is structurally identical to equation 1, one
7 Level-1 equations have a dependent variable, Y, on their left-hand sides and level-2 equations have a coefficient, β, on their left-hand sides. In the multilevel-analysis literature level-2 equations have additional properties. For example, in that literature variables in level-2 equations characterize nested groupings of observations, and level-2 equations typically include disturbance terms. None of these additional properties are needed to make the main points in this paper.
would use equation 1 to estimate the various components of the causal mechanism
specified in equations 4 through 6.
Y = β0 + (α0 + α1Z)X + (γ0 + γ1X)Z + u    (7)

Y = β0 + α0X + γ0Z + (α1 + γ1)ZX + u    (8)
Equation 8 shows that for this type of causal mechanism, the product term’s
coefficient (βxz in equation 1) equals the sum of Z’s impact on X’s effect on Y (α1) and
X’s impact on Z’s effect on Y (γ1). Without additional information about α1 or γ1 we
cannot determine either coefficient’s value from the value of βxz. This means that when
we use equation 1 to estimate the components of this kind of causal mechanism and reject
the null hypothesis that the coefficient for the product term equals zero, we can infer only
that some kind of moderator effect is present but we cannot specify anything concrete
about the nature of the moderator effect(s).
Equation 8 also shows why it is wrong to claim that βxz in equation 1
simultaneously describes both X’s impact on Z’s effect on Y and Z’s impact on X’s
effect on Y. Although we obtain only one coefficient for the product term when we
estimate equation 8, that does not mean that the two underlying causal coefficients (α1
and γ1) are equal. It means only that we cannot determine their individual magnitudes
without further information or assumptions that will help us solve this kind of
identification problem (Manski 1993, pp. 32-36). Thus, even in the unlikely case where
α1 equals γ1, the coefficient for the product term will not equal their common value;
instead it will equal twice that value.
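The identification problem can be made concrete with a small simulation (a sketch, not from the paper; all coefficient values are assumed for illustration): two mechanisms that split α1 + γ1 differently generate exactly the same regression surface, so the fitted product coefficient cannot distinguish them.

```python
import numpy as np

# Sketch of the identification problem (all coefficient values assumed):
# two mechanisms that split a1 + g1 differently imply the same reduced-form
# equation 8, so OLS cannot tell them apart.
rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=n)
Z = rng.normal(size=n)
u = rng.normal(scale=0.3, size=n)

def outcome(a0, g0, a1, g1):
    # Equation 7: Y = b0 + (a0 + a1*Z)*X + (g0 + g1*X)*Z + u
    return 1.0 + (a0 + a1 * Z) * X + (g0 + g1 * X) * Z + u

y1 = outcome(a0=2.0, g0=-1.0, a1=0.6, g1=0.0)  # only Z moderates X's effect
y2 = outcome(a0=2.0, g0=-1.0, a1=0.3, g1=0.3)  # both moderator effects present

design = np.column_stack([np.ones(n), X, Z, X * Z])
c1, *_ = np.linalg.lstsq(design, y1, rcond=None)
c2, *_ = np.linalg.lstsq(design, y2, rcond=None)
# The fitted coefficients, including the product term (a1 + g1), agree.
print(np.allclose(c1, c2))
```

Because the reduced forms are algebraically identical, the two fits are indistinguishable; only the sum a1 + g1 is recovered.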
An interesting possibility suggested by equation 8 occurs when α1 and γ1 have
opposite signs and hence offset each other. For example, consider the relationship
between the likelihood that an assistant professor will be tenured (Y), the prestige of her
department (X), and her publication productivity (Z). First, in cases where assistant
professors have no publications, departmental prestige will have a negative effect on the
likelihood of being tenured, meaning that βx will have a negative value. (In low prestige
departments it is possible to earn tenure even without having published because such
settings deemphasize publication productivity compared to other activities such as
teaching and departmental citizenship.) Second, across all levels of departmental
prestige, higher levels of scholarly publication will be positively related to the likelihood
of gaining tenure. Since measures of departmental prestige typically do not take on a
value of zero, one could center the measure being used so that βz would have a positive
value. Third, it is reasonable to expect that departmental prestige positively affects
publication productivity’s effect on the likelihood of gaining tenure, meaning that γ1 is
positive. (The causal effect of publication productivity is higher in high prestige
departments than in low prestige departments.) Finally, it is also reasonable to expect
that publication productivity negatively affects the effect of departmental prestige on the
likelihood of gaining tenure (highly productive assistant professors have a high
probability of gaining tenure in any department), meaning that α1 is negative. Figure 3
shows this causal mechanism, and if the two moderator effects it contains are similar in
magnitude, the usual statistical test for the coefficient of the product variable in equation
1 would suggest that departmental prestige and publication productivity have only
additive effects on the likelihood of gaining tenure when in fact they have an interaction
effect.8
[ Figure 3 about here ]
Thus, we must be confident that α1 and γ1 have the same sign before we can
argue that a zero value for βxz implies that the effects of X and Z are additive. Inferring
additivity from a zero value of βxz without knowing that the moderator components of the
interaction have the same sign commits the fallacy of affirming the consequent.
Specifically, one cannot necessarily conclude that no interaction effect exists on the basis
of (1) the truth that when no interaction is present the coefficient for a product term will
equal zero and (2) the observation of a zero-value for that coefficient.
Returning to the general case, one must obviously have some kind of additional
information about α1 or γ1 before one can determine a value for either of these
coefficients. If one knew α1’s value, for example, one could subtract it from βxz to
obtain an estimate of γ1. Similarly, if one knew that α1 was a specific multiple of γ1
and had an estimate of βxz, it would be possible to use knowledge of the multiple (k) and
the results of two simultaneous equations (α1 = kγ1 and α1 + γ1 = βxz) to estimate α1
and γ1. A third strategy is available to those who know that X’s impact on Z’s effect on
Y differs in functional form from Z’s impact on X’s effect on Y. If one knew that
βz = γ0 + γ1√X, for example, one would substitute this equation and the other level-2
equation 5 into level-1 equation 4 to obtain Y = β0 + α0X + γ0Z + α1ZX + γ1Z√X + u.
8 In fact, data on assistant professors in U.S. graduate sociology departments that I am currently analyzing show that net of relevant controls (public vs. private university, etc.), the coefficient for the product of publication productivity and departmental prestige is not statistically significant.
Because this equation contains two different product variables, one can obtain separate
estimates for α1 and γ1. Unfortunately, because the product variables ZX and Z√X will
be highly correlated, this strategy produces imprecise estimates of α1 and γ1.
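A quick numeric check illustrates the collinearity (a sketch with an assumed distribution for X, taking βz = γ0 + γ1√X as the example of a differing functional form): the two product regressors are nearly collinear.

```python
import numpy as np

# Sketch of the functional-form strategy (distribution of X is assumed):
# the regressors Z*X and Z*sqrt(X) are separately estimable in principle,
# but nearly collinear in practice.
rng = np.random.default_rng(3)
n = 5_000
X = rng.uniform(0.5, 5.0, size=n)  # positive values so sqrt(X) is defined
Z = rng.normal(size=n)
r = np.corrcoef(Z * X, Z * np.sqrt(X))[0, 1]
print(round(float(r), 3))          # a very high correlation
```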
Obviously, these strategies for obtaining unique estimates of α1 and γ1 require
knowledge or theoretical precision that social scientists rarely possess. Without them,
however, an estimate of βxz’s value tells us nothing about either X’s impact on Z’s effect
on Y or Z’s impact on X’s effect on Y.
Implicit Identification of the Product Term
A common, although seldom explicitly justified, way of dealing with the
identification problem noted above is to specify either that X has no impact on Z’s effect
on Y, or that Z has no impact on X’s effect on Y. If we specify that γ1 in equation 6
equals zero, for instance, that equation becomes βz = γ0, and the estimation
equation is Y = β0 + α0X + γ0Z + α1ZX + u. Figure 4 shows this kind of interaction
effect, which consists of two effects conditioned on zero and one moderator effect.
[ Figure 4 about here ]
Once again, one would use equation 1 to estimate the coefficients specified by
this new formulation. Now, however, βxz in equation 1 corresponds to α1’s value. Of
course, a parallel argument that α1 in equation 5, instead of γ1, equals zero produces the
equation Y = β0 + α0X + γ0Z + γ1ZX + u. In this case βxz in equation 1 corresponds to
γ1’s value. Thus, it is clear that the meaning of the coefficient for the product variable in
equation 1 depends on the nature of the causal mechanism that a researcher believes is
generating the regression surface given by equation 1.
Multilevel models almost always specify that one of the possible moderator
effects in an interaction is absent. In particular, in contextual-effect models researchers
almost always specify that contextual variables affect the effects of individual-level
variables but not vice versa. For example, let us consider a simple model involving the
effects of an individual level variable, pupil SES, and a contextual variable, schools’ per
pupil funding, on individual pupils’ achievement-test scores. Researchers would
typically specify a contextual effect model for these variables with level-1 equation 9
stipulating that individuals’ test scores (Y) are a function of individuals’ SES levels (X),
and two level-2 equations (10 and 11) stipulating that both the intercept and slope of the
level-1 equation are functions of schools’ per pupil funding (Z).9
Y = β0 + βxX + u    (9)

β0 = α0 + α1Z + e0    (10)

βx = γ0 + γ1Z + e1    (11)

Substituting equations 10 and 11 into equation 9 and simplifying, we obtain
equation 12 which, except for its more complicated error term, is equivalent to equation
1.

Y = α0 + α1Z + γ0X + γ1XZ + [e0 + e1X + u]    (12)
9 Multilevel models usually specify that level-2 equations have disturbance terms. An alternative in this case might be to specify equation 4 as the level-1 equation and equation 11 as a single level-2 equation. Regardless of one’s choice, ordinary least squares gives incorrect estimates of coefficients’ standard errors and researchers should therefore use other estimation procedures (Mason et al. 1983). This estimation problem does not affect the interpretational issue that I address in this paper, however.
Equation 12 specifies that the coefficient for the product term estimates the
contextual effect of schools’ per pupil funding levels on the individual-level effect of
pupils’ SES on test scores. It is important to realize, however, that by themselves
equations 9 through 12 do not justify this specification; they only make explicit the
knowledge that a researcher should have before using those equations. Multilevel
modeling textbooks show how to use versions of equations 9 through 12 to obtain
estimates of contextual effects, but to my knowledge do not discuss the prior need to
justify the assumption that contextual variables cause variation in the effects of
individual-level variables, but that individual-level variables do not cause variation in the
effects of contextual variables. In the example at hand, what is the justification for
specifying that pupil SES does not affect the causal impact of per pupil funding on test
scores?
The textbooks’ silence on the possibility that individual-level variables may cause
variation in the effects of contextual variables does not justify the assumption that they do
not do so. Algebraically, one could just as easily place Z in the level-1 equation and X in
the level-2 equations to produce an “individual effects” model asserting that variation in
the effect of schools’ per pupil funding on test scores stems from variation in pupils’
SES.10 Furthermore, there are substantive questions for which this alternative
specification is appropriate (Kozlowski and Klein 2000). In at least one instance (Xue et
al. 2007), for example, researchers estimated a model in which an individual-level
variable (prosocial activities) moderated the effects of contextual (neighborhood)
variables. Following the common practice, the researchers did not attempt to justify their
10 In this case one would need to have sufficient cases for each observed value of X, and reasonable variation in Z for each value of X, to reliably estimate the impact of X on Z’s effect on Y.
specification that the contextual level variables had no effect upon the effects of the
individual level variable.
Given the apparently universal failure to justify the implicit claim that either γ1 in
equation 6 or α1 in equation 5 equals zero, it is probably reasonable to admit that “contextual effects”
and “individual effects” coexist, in which case we are back to the type of causal
mechanism discussed in the previous section (although one with a more complicated
error term than is present in equation 1) in which it is impossible to determine the
meaning of the coefficient for the product term. In general, it is difficult to avoid the
conclusion that the popularity of the specification that individual-level variables do not
affect the effects of contextual-level variables rests on unconsidered convention rather
than on conscious deliberation.
Theoretically Based Identification of the Product Term
Having dwelt on the general difficulties associated with specifying the effects of
X and Z on the other variable’s effect on Y, let us consider whether this can ever be done.
One class of causal mechanisms in which it is relatively easy to specify such effects
consists of cases in which an independent variable (X) affects a dependent variable while
a second independent variable (Z) affects cases’ exposure to X but alone has no effect on
Y. As an example, consider the effect of whether a parent smokes tobacco on the
likelihood that a preschool child exhibits asthma symptoms. One would expect that the
difference in asthma symptoms between children with a parent who smokes tobacco and
those whose parents do not will be smaller for children with parents who are employed
outside the home, because differences in preschoolers’ exposure to tobacco smoke will be
smaller among this group than among those whose parents work in the home. However,
whether a parent works outside the home has no effect on the likelihood that a
preschooler will develop asthma among those preschoolers whose parents do not
smoke tobacco.
[ Figure 5 about here ]
Figure 5 presents a diagram representing this kind of interaction effect. Note that
the omission of an arrow from Z to Y in this figure represents the fact that Z is causally
related to Y only through its effect on X’s effect on Y. This kind of interaction effect
nicely illustrates the distinction between a causal effect and a causal mechanism. If a
parent who smokes were to quit a job outside the home, one can expect that the preschool
child would be more likely to develop asthma than if the parent were to keep that job.
The causal mechanism that produces this causal effect does not involve a direct causal
link between working outside the home and asthma, however. Instead it is produced
entirely by the effect of working outside the home on the effect of having a parent who
smokes.
We can turn the above theoretical specifications into level-1 equation 13 and
level-2 equation 14, where X measures how much a parent smokes, Z is a measure of
how much time that parent works outside the home, and Y is a measure of the extent to
which a preschooler has developed asthma. Substituting equation 14 into equation 13,
we obtain equation 15.
Y = β0 + βxX + u    (13)

βx = α0 + α1Z    (14)

Y = β0 + α0X + α1ZX + u    (15)
This equation differs structurally from equation 1 because it does not include a
“first order” or “additive effect” term for Z. Note that using equation 1 rather than 15
will produce consistent but inefficient estimates of the coefficients for this interaction
because the equation contains an irrelevant variable (Allison 1977). Thus, if we believed
that equations 13 and 14 specify the causal mechanism underlying an interaction effect,
we would not use equation 1 to estimate its parameters. This conclusion is at odds with
the claim that researchers should always include lower-order terms when estimating
interaction-effect models (McClendon 1994, pp. 286-287; Cohen et al. 2003, p. 284).
Whether lower-order terms need to be included in an estimation equation depends partly
on the causal mechanism that one believes to be present (see also Bobko 1986, Cronbach
1987).
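Equations 13 through 15 can be illustrated with a brief simulation (a sketch; all coefficient values are assumed): when Z affects Y only by moderating X’s effect, the correct estimation equation omits the first-order Z term, and fitting equation 1 anyway simply produces a Z coefficient near zero.

```python
import numpy as np

# Sketch of equations 13-15 (coefficient values assumed): Z affects Y only
# by moderating X's effect, so equation 15 has no first-order Z term.
# Including the irrelevant Z regressor, as in equation 1, still yields
# consistent estimates.
rng = np.random.default_rng(4)
n = 5_000
X = rng.normal(size=n)
Z = rng.normal(size=n)
Y = 0.3 + (1.2 + 0.8 * Z) * X + rng.normal(scale=0.5, size=n)

eq15 = np.column_stack([np.ones(n), X, X * Z])    # no first-order Z term
eq1 = np.column_stack([np.ones(n), X, Z, X * Z])  # includes irrelevant Z
c15, *_ = np.linalg.lstsq(eq15, Y, rcond=None)
c1, *_ = np.linalg.lstsq(eq1, Y, rcond=None)
print(np.round(c15, 2))  # intercept, X, XZ
print(np.round(c1, 2))   # the Z coefficient comes out near zero
```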
A theoretical specification of the causal mechanism underlying an interaction
effect is not the only determinant of the estimation equation one should use, however,
because the types of scales that are used to measure the variables involved in the
interaction also play a role. Specifically, when one or more of these variables are
measured in terms of an interval scale,11 it may be necessary to include a first-order term
in one’s estimation equation even though the theoretical equations for the interaction
effect suggest that it should be omitted (Allison 1977, p. 150). This is because interval
scales have an arbitrary zero point, which in turn requires that one use an estimation
equation that allows for possible changes in that zero point. A general rule for the kind of
causal mechanism specified by Figure 5 is that whenever the variable specified in the
11 Here I follow the typology of scales proposed by Stevens (1951). Interval scales have quantitative categories with equal intervals between successive integer values, but lack an absolute zero point.
level-1 equation (X) is measured in terms of an interval scale, we need to use an
estimation equation that includes a first-order term for the moderating variable (Z).
To see this, let us consider an interaction with one moderator and one effect
conditioned on zero in which both X and Z are quantitative variables. Suppose that X in
equation 13 is measured in terms of an interval scale. Since the score of zero for such a
scale is arbitrary, we may add any non-zero constant to the existing values of X without
altering any of the essential properties of X. Specifying the constant as c, we might
create a new variable, x, where x = X + c, implying that X = x – c (Allison 1977).
Substituting this into equation 13 we have for our new level-1 equation
Y = β0 + βx(x − c) + u, and once again equation 14 is the level-2 equation. Substituting
equation 14 into the new level-1 equation yields equation 16, which can be simplified to
equation 17, in which β0* = β0 − α0c and βz* = −α1c.

Y = β0 − α0c + α0x − α1cZ + α1xZ + u    (16)

Y = β0* + α0x + βz*Z + α1xZ + u    (17)
Equation 17 is structurally identical to equation 1 and includes Z as a regressor even though the theoretical equations specify that Z does not causally affect Y when X equals zero. We can justify omitting Z from the estimation equation only if there is
reason to believe that either α1 or c, which compose β z* , equals zero. We have already
specified that c is non-zero, however, and if we believed that α1 equals zero we would
not be estimating an interaction-effect model in the first place.
Both β0* and βz* in equation 17 are affected by one's choice of a value for c. This makes sense because β0* is the regression plane's intersection with the Y axis when all of the independent variables, including X, equal zero, and βz* is the slope of the regression plane in the Z dimension when X equals zero. Returning to Figure 1, for example, if we changed X's zero point to be at point b on the X axis, β0*'s value would increase and βz*'s value would be negative rather than positive. In contrast, equation 17 shows that changes in X's zero point will not affect the coefficient for X itself or the coefficient for the product variable (Allison 1977).
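This algebra is easy to verify numerically. The following sketch (illustrative only; the variable names, seed, and parameter values are mine, not drawn from the paper) simulates the mechanism Y = β0 + (α0 + α1Z)X + u and shows that shifting X's arbitrary zero point leaves the product-term coefficient unchanged while an additive Z coefficient equal to −α1c appears:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(5.0, 1.0, n)    # interval-scale level-1 variable
Z = rng.normal(2.0, 1.0, n)    # moderating level-2 variable
b0, a0, a1 = 1.0, 0.5, 0.3     # hypothetical beta_0, alpha_0, alpha_1
Y = b0 + (a0 + a1 * Z) * X + rng.normal(0.0, 0.1, n)

def ols(y, *cols):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    M = np.column_stack([np.ones_like(y)] + list(cols))
    coef, *_ = np.linalg.lstsq(M, y, rcond=None)
    return coef

# On the original scale the additive Z coefficient is essentially zero.
print(ols(Y, X, Z, X * Z))     # ≈ [1.0, 0.5, 0.0, 0.3]

# Shift the arbitrary zero point: x = X + c, so X = x - c.
c = 4.0
x = X + c
# Now an additive Z term appears with coefficient beta_z* = -a1 * c = -1.2,
# while the product-term coefficient stays at a1 = 0.3.
print(ols(Y, x, Z, x * Z))     # ≈ [-1.0, 0.5, -1.2, 0.3]
```

Omitting the additive Z term from the second fit would generally bias the remaining coefficients, which is exactly why the estimation equation must include it.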
Let us next consider the case where it is the level-2 variable (Z) that is measured
in terms of an interval scale rather than the level-1 variable. Equation 13 will be the
level-1 equation, but following the same logic as above, the level-2 equation is now
β1 = α0 + α1(z − c). Substituting this into equation 12 and simplifying, we obtain equation 18.

Y = β0 + (α0 − α1c)X + α1Xz + u
  = β0 + β1*X + α1Xz + u    (18)
In this case there is no additive coefficient for z in the estimation equation, and the only
coefficient affected by the choice of zero value for Z is β1* . Because β1* tells us the
slope of the regression surface in the X dimension when Z equals zero, it should be
affected by our arbitrary choice of Z’s zero value.
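The complementary case can be checked the same way. In this hypothetical simulation (again, names and values are mine), shifting the moderator's zero point changes only X's coefficient, and no additive z term is needed:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
X = rng.normal(0.0, 1.0, n)    # level-1 variable with a meaningful zero
Z = rng.normal(3.0, 1.0, n)    # interval-scale moderator
b0, a0, a1 = 1.0, 0.5, 0.3     # hypothetical parameters
Y = b0 + (a0 + a1 * Z) * X + rng.normal(0.0, 0.1, n)

def ols(y, *cols):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    M = np.column_stack([np.ones_like(y)] + list(cols))
    coef, *_ = np.linalg.lstsq(M, y, rcond=None)
    return coef

# Shift Z's arbitrary zero: z = Z + c, so Z = z - c.
c = 2.0
z = Z + c
# No additive z term is required; only X's coefficient changes,
# to beta_1* = a0 - a1 * c = 0.5 - 0.6 = -0.1.
print(ols(Y, X, X * z))        # ≈ [1.0, -0.1, 0.3]
```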
In sum, when using a product variable to model the type of interaction effect
shown in Figure 4 we must consider not only the level-1 and level-2 equations to
determine an appropriate equation for estimation and testing, but also whether any of the
variables in the level-1 equation are measured in terms of interval scales. If they are, the estimation equation will have to include additive effects that are not implied by the level-1 and level-2 equations.
Another type of theoretically based approach to specifying a causal mechanism
that underlies an interaction effect comprises instances where a given value of a
dependent variable can occur only when two or more independent variables
simultaneously have certain values. Most substantive instances of these interactions
involve binary independent variables.12 For example, for many years only white men
could become CEOs of Fortune 500 companies. Similarly, Goldstone and Useem (1999)
argue that prison riots are much more likely to occur if five preconditions are
simultaneously present than if any is absent. Bobko (1986) gives more examples of this
kind of causal mechanism.
Figure 6 presents a graphical representation of this kind of causal mechanism for
two independent variables, X and Z. To indicate that only the joint occurrence of certain
values of both independent variables affects the dependent variable, Figure 6 represents
their joint occurrence by using the multiplication operator and grouping both variables
within a circle. Returning to the CEO example, one could construct binary variables for
being male (X = 0 if no, X = 1 if yes) and being white (Z = 0 if no, Z = 1 if yes). The
product X*Z will then equal one when both X and Z have values of one but zero if either
equals zero. X and Z appear in the level-1 equation for this kind of causal mechanism only as parts of the product term (see equation 19), and below I will refer to this kind of causal mechanism as a "purely multiplicative mechanism" (PMM).

12. The main exception to this generalization is "gravity" or "intervening opportunity" models of migration, which hold that migration between two geographical areas is the product of their population sizes and the distance between them, and which use ratio-scale measures of these concepts. See, for example, Stewart (1948).
[ Figure 6 about here ]
PMMs involve no level-2 equations, and in the CEO example the level-1 equation specifies that (1) being white has no effect on females' chances of being a CEO, and (2) being male has no effect on nonwhites' chances of being a CEO.

Y = β0 + β1XZ + u    (19)
As noted earlier, many have claimed that the coefficient for the product term in
product-term models of interaction effects simultaneously describes each independent
variable’s effect on the other’s effect on the dependent variable. This claim is true for
PMMs, but not for the previous types. In the CEO example, β1 in equation 19 describes
both how variation in sex affects the effect of race (the latter equals zero when one is
female, and β1 when one is male), and how variation in race affects the effect of being
male (the latter equals zero when one is nonwhite and β1 when one is white).
Alternatively, one might view this kind of causal mechanism as involving a single
variable rather than two. Specifically, if one defines a binary variable in terms of being a
white male ( = 1) or not (= 0), then the value of β1 describes the additive effect of this
new variable.
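A small simulation (with made-up probabilities, purely for illustration) makes this single-variable interpretation concrete: with binary X and Z, the product X*Z is literally identical to a "white male" dummy, so fitting equation 19 is the same as fitting an additive effect for that dummy.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
male = rng.integers(0, 2, n)    # X: 1 = male, 0 = not
white = rng.integers(0, 2, n)   # Z: 1 = white, 0 = not
b0, b1 = 0.01, 0.20             # made-up baseline and joint effect
Y = b0 + b1 * male * white + rng.normal(0.0, 0.05, n)

def ols(y, *cols):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    M = np.column_stack([np.ones_like(y)] + list(cols))
    coef, *_ = np.linalg.lstsq(M, y, rcond=None)
    return coef

# The product variable is identical to a single "white male" dummy...
white_male = male & white
assert np.array_equal(male * white, white_male)

# ...so b1 estimated from equation 19 is just that dummy's additive effect.
print(ols(Y, male * white))     # ≈ [0.01, 0.20]
```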
Equation 19 is both the theoretical equation for a PMM and, in most cases, the
equation one would use to estimate the coefficient describing it. As is true for the
previous type of interaction, however, if one of the independent variables in equation 19
is measured in terms of an interval scale, an additive effect for the other will need to be
added to equation 19 to take account of the arbitrariness of the zero point of the interval-scale independent variable. In fact, when both independent variables are measured in
terms of interval scales, Allison (1977, p. 150) shows not only that equation 1 is the
appropriate estimation equation but also that special estimation procedures are needed
because this case implies constraints on the coefficients being estimated.
In sum, although it is usually difficult to specify whether each variable in an
interaction effect moderates the effects of the other variable(s) involved, there are cases
where it is possible to do so. It is important to note, however, that the ability to make
such a specification is based on preexisting theoretical knowledge rather than the
empirical results obtained from estimating any of the above equations. Without such
preexisting knowledge, one cannot go beyond interpreting the value of the coefficient for
the product term as merely one part of a description of the regression surface yielded by
one’s estimation procedure.
Conclusion
In this paper I have shown how the causal mechanism underlying an interaction
effect generates the coefficient for the product term in a product-variable model of that
interaction effect. The main lesson to be learned from this demonstration is that
researchers need preexisting theoretical information about the nature of the causal
mechanism at work before they can move beyond treating the coefficient for the product
term merely as part of the description of a regression surface. Of course, for some
purposes we do not need to go any further than describing the regression surface. When
we need to include an interaction effect only in order to have a correctly specified
analytic equation, for example, our inability to go beyond describing the regression
surface is not a substantive hindrance. For other purposes, however, knowledge about
underlying causal mechanisms is crucial and errors can have serious consequences. For
example, believing that the type of mechanism shown in Figure 2 is present when in
reality the type operating is that shown in Figure 5 could lead to an ineffectual policy
recommendation. Specifically, if the mechanism shown in Figure 5 is present, increasing
the value of Z will have no effect on Y’s value when X is absent (has a value of zero).
Believing that the mechanism shown in Figure 2 is operational in this case might lead one
to recommend a policy change aimed at increasing Z’s value, because the kind of
interaction effect shown in Figure 2 specifies that Z has an effect on Y even in the
absence of X. Thus, even though we cannot distinguish between different possible causal
mechanisms solely on the basis of the empirical results of studies employing product
variables, different causal mechanisms can produce substantially different outcomes.
Considering the causal mechanisms that may be underlying interaction effects
will also reduce the prevalence of tacit specifications, such as the popular specification
that contextual variables moderate the effects of individual-level variables but not vice
versa. If one cannot theoretically justify this claim in a given analysis, then the
coefficient for a cross-level product term in a multilevel model cannot be interpreted as
measuring a contextual effect. I believe that many researchers who have reported
statistically significant coefficients for cross-level product terms and interpreted them as
evidence for contextual effects will have difficulty explicitly defending their implicit
specifications.
Finally, considering the causal mechanisms that might underlie an interaction
effect reveals a limitation of the usual statistical test for the presence of an interaction
effect, which involves assessing the statistical significance of the coefficient for the
product variable. In particular, if one is unable to justify specifying a causal mechanism
other than that shown in Figure 2, and if α1 and γ1 in equation 8 have similar magnitudes
but opposite signs, the value of βxz in equation 1 will be near zero and presumably not
statistically significant, leading to the erroneous conclusion that no interaction effect is
present. Researchers unable to specify that at least one of the independent variables has
no effect on the other’s causal impact must therefore be confident that α1 and γ1 have the
same sign before they can treat the usual statistical test as being a general test for the
presence of an interaction effect. Obviously, when researchers want to carry out a
simultaneous test for several possible interactions, such as all of the possible two-way
interactions between gender and a group of other independent variables, they need to
have information about the causal mechanisms underlying each of the interactions before
they can be confident that they are carrying out an overall test of interaction effects.
The general point that researchers need theoretical knowledge before they can
interpret the characteristics of a regression surface in causal terms holds for additive
models as well as those involving interaction effects (Duncan 1975, Berk 2004). In
additive models, however, the connection between a theoretical model and the partial
derivatives that characterize the regression surface is clear and direct because the
variables we include in our estimation equations are the same as those that our theory
specifies to be causes of a dependent variable. In contrast, when we model interaction
effects we include product variables that typically do not directly correspond to the causal
mechanisms at work and that are compatible with several different causal mechanisms.
Because we usually lack theoretical knowledge about the causal mechanisms that underlie interaction effects, our ability to give causal interpretations to product-variable
models of interaction effects is limited. Recognition of this limitation will not only
prevent researchers from erroneously interpreting regression surfaces as giving
information about the causal mechanisms underlying interaction effects, but should also
encourage researchers to develop more adequate theoretical specifications of the
interaction effects in which they are interested.
Bibliography
Aiken, Leona S. and Stephen G. West. (1991) Multiple Regression: Testing and
Interpreting Interactions. Newbury Park, CA: Sage.
Allison, Paul D. (1977) “Testing for Interaction in Multiple Regression.” American
Journal of Sociology 83: 144-153.
Allison, Paul D. (1999) Multiple Regression: A Primer. Thousand Oaks, CA: Pine
Forge Press.
Berk, Richard A. (2004) Regression Analysis: A Constructive Critique. Thousand
Oaks, CA: Sage.
Box, Joan F. (1978) R. A. Fisher: The Life of a Scientist. New York: John Wiley and
Sons.
Braumoeller, Bear F. (2004) “Hypothesis Testing and Multiplicative Interaction Terms.”
International Organization 58: 807-820.
Blalock, Hubert M., Jr. (1962) “Four-Variable Causal Models and Partial Correlations.”
American Journal of Sociology 68: 182-194.
Blalock, Hubert M., Jr. (1965) “Theory Building and the Concept of Interaction.”
American Sociological Review 30: 374-381.
Bobko, Philip. (1986) “A Solution to Some Dilemmas When Testing Hypotheses about
Ordinal Interactions.” Journal of Applied Psychology 71: 323-326.
Cohen, Jacob, Patricia Cohen, Stephen G. West and Leona S. Aiken. (2003) Applied
Multiple Regression/Correlation Analysis for the Behavioral Sciences. Mahwah,
NJ: Lawrence Erlbaum Associates.
Corno, Lyn, Lee J. Cronbach, Haggai Kupermintz, David F. Lohman, Ellen B. Mandinach, Ann W. Porteus, and Joan E. Talbert. (2002) Remaking the Concept of Aptitude:
Extending the Legacy of Richard E. Snow. Mahwah, NJ: Lawrence Erlbaum
Associates.
Cronbach, Lee J. (1987) “Statistical Tests for Moderator Variables: Flaws in Analyses
Recently Reported.” Psychological Bulletin 102: 414-417.
Duncan, Otis D. (1975) Introduction to Structural Equation Models. New York:
Academic Press.
Fisher, Gene A. (1988) “Problems in the Use and Interpretation of Product Variables.”
Pp. 84-107 in J. Scott Long (Ed.) Common Problems/Proper Solutions: Avoiding
Error in Quantitative Research. Newbury Park, CA: Sage.
Fisher, Ronald A. (1924) “The Biometrical Study of Heredity.” Eugenics Review
16:189-210.
Fox, John. (1997) Applied Regression Analysis, Linear Models, and Related Methods.
Thousand Oaks, CA: Sage.
Goldstone, Jack A. and Bert Useem. (1999) “Prison Riots as Microrevolutions: An
Extension of State-Centered Theories of Revolution.” American Journal of
Sociology 104: 985-1029.
Hox, Joop. (2002) Multilevel Analysis: Techniques and Applications. Mahwah, NJ:
Lawrence Erlbaum Associates.
Kozlowski, Steve W. J. and Katherine J. Klein. (2000) “A Multilevel Approach to
Theory and Research in Organizations: Contextual, Temporal and Emergent
Processes.” Pp. 3 – 90 in Katherine J. Klein and Steve W. J. Kozlowski (Eds.)
Multilevel Theory, Research, and Methods in Organizations. San Francisco:
Jossey-Bass.
Manski, Charles F. (1993) “Identification Problems in the Social Sciences.” Pp. 1-56 in
Peter V. Marsden (Ed.) Sociological Methodology 1993. Oxford: Basil
Blackwell.
Marini, Margaret M. and Burton Singer. (1988) "Causality and Degrees of
Determination in Social Sciences.” Pp. 347-409 in Clifford C. Clogg (Ed.)
Sociological Methodology 1988. Washington, D.C.: American Sociological
Association.
Marsden, Peter V. (1981) “Conditional Effects in Regression Models.” Pp. 97-116 in P.
V. Marsden (Ed.) Linear Models in Social Research. Newbury Park, CA: Sage.
Mason, William M., George Y. Wong and Barbara Entwisle. (1983) "Contextual Analysis through the Multilevel Linear Model." Pp. 72-103 in Samuel Leinhardt (Ed.) Sociological Methodology 1983-1984. San Francisco: Jossey-Bass.
McClendon, McKee J. (1994) Multiple Regression and Causal Analysis. Itasca, IL: F. E. Peacock.
Saunders, David R. (1956) “Moderator Variables in Prediction.” Educational and
Psychological Measurement 16: 209-222.
Smith, Herbert L. (1990) “Specification Problems in Experimental and
Nonexperimental Social Research.” Pp. 59-91 in Clifford C. Clogg (Ed.)
Sociological Methodology 1990. Oxford: Basil Blackwell.
Southwood, Kenneth E. (1978) "Substantive Theory and Statistical Interaction: Five Models." American Journal of Sociology 83: 1154-1203.
Stevens, S. S. (1951) “Mathematics, Measurement and Psychophysics.” Pp. 1-49 in S.
S. Stevens (Ed.). Handbook of Experimental Psychology. New York: Wiley.
Stewart, John Q. (1948) "Demographic Gravitation: Evidence and Applications."
Sociometry 11: 31-58.
Stolzenberg, Ross M. (1974) “Estimating an Equation with Multiplicative and Additive
Terms, with an Application to Analysis of Wage Differentials between Men and
Women in 1960.” Sociological Methods and Research 2: 313-331.
Stolzenberg, Ross M. (1979) “The Measurement and Decomposition of Causal Effects
in Nonlinear and Nonadditive Models.” Pp. 459 – 488 in Karl F. Schuessler (Ed.)
Sociological Methodology 1980. San Francisco, CA: Jossey-Bass.
Zeisel, Hans. (1968) Say it with Figures (Fifth Edition). New York: Harper and Row.
Figure 1. A Warped Regression Surface with β0 > 0, β1 > 0, β2 > 0, and β3 < 0.

Figure 2. An Interaction Effect with Two Effects Conditioned on Zero and Two Moderators

Figure 3. The Causal Mechanism Underlying the Interaction Effect of Departmental Prestige and Publication Productivity on the Likelihood that an Assistant Professor will Gain Tenure. (Path signs in the diagram: βx < 0, βz > 0, α1 < 0, γ1 > 0.)

Figure 4. An Interaction Effect with Two Effects Conditioned on Zero and One Moderator

Figure 5. An Interaction Effect with One Effect Conditioned on Zero and One Moderator

Figure 6. A Purely Multiplicative Causal Mechanism