American Journal of Epidemiology
Copyright © 2003 by the Johns Hopkins Bloomberg School of Public Health
All rights reserved
Vol. 158, No. 3
Printed in U.S.A.
DOI: 10.1093/aje/kwg113
Interaction as Departure from Additivity in Case-Control Studies: A Cautionary Note
Anders Skrondal
From the Division of Epidemiology, Norwegian Institute of Public Health, Oslo, Norway.
Received for publication December 9, 2001; accepted for publication January 8, 2003.
It has been argued that assessment of interaction should be based on departures from additive rates or risks.
The corresponding fundamental interaction parameter cannot generally be estimated from case-control studies.
Thus, surrogate measures of interaction based on relative risks from logistic models have been proposed, such
as the relative excess risk due to interaction (RERI), the attributable proportion due to interaction (AP), and the
synergy index (S). In practice, it is usually necessary to include covariates such as age and gender to control for
confounding. The author uncovers two problems associated with surrogate interaction measures in this case:
First, RERI and AP vary across strata defined by the covariates, whereas the fundamental interaction parameter
is unvarying. S does not vary across strata, which suggests that it is the measure of choice. Second, a
misspecification problem implies that measures based on logistic regression only approximate the true
measures. This problem can be rectified by using a linear odds model, which also enables investigators to test
whether the fundamental interaction parameter is zero. A simulation study reveals that coverage is much
improved by using the linear odds model, but bias may be a concern regardless of whether logistic regression or
the linear odds model is used.
additivity; case-control studies; epidemiologic methods; interaction
Abbreviations: AP, attributable proportion due to interaction; RERI, relative excess risk due to interaction; S, synergy index.
Logistic regression analysis is the workhorse of contemporary epidemiology. Consequently, assessment of interaction
is often performed by simply introducing product terms into
logistic risk models. This practice has been vehemently criticized by some epidemiologists, who argue that assessment
of interaction should mainly be based on additive rate or risk
models (1–7). For rare outcomes, this notion of interaction
follows from probabilistic independence, as embodied in the
classical toxicologic notion of “simple independent action”
discussed by Finney (8). The purpose of this article is not to
engage in the debate on how interaction should be conceptualized in epidemiology. Rather, I confine my investigation to
the performance of suggested measures of interaction as
departure from additivity.
In cohort studies, the desired interaction assessment can
easily be accomplished by fitting linear rate or risk models.
However, the parameters of linear models cannot be validly
estimated for case-control studies unless the sampling fractions for cases and controls are known or can be estimated.
On the other hand, it is well known that odds ratios can be
estimated in case-control studies. Furthermore, relative risks
are often well approximated by odds ratios in case-control
studies.
On the basis of these observations, Rothman (1, 2)
suggested a synergy index (S) which can be used in case-control studies to measure interaction as departure from
additive risks. Moreover, Rothman considered statistical
inference for the index, deriving confidence intervals using
the delta method. Rothman presented several additional
measures of interaction (3), including the relative excess risk
due to interaction (RERI), renamed the ICR by Rothman and
Greenland (6), and the attributable proportion due to interaction (AP), which is the focus in Rothman’s latest book (7).
Rothman furthermore pointed out (3, p. 324) that estimates
of RERI, AP, and S are easily obtained from logistic regression analysis, as are Wald tests and confidence intervals (9).
Alternatively, a likelihood ratio test of additive risks could
be performed in the logistic regression model. Although this
test would be expected to have better properties than the
Wald test, it would be much harder to implement.
Reprint requests to Dr. Anders Skrondal, Department of Epidemiology, Norwegian Institute of Public Health, P.O. Box 4404 Nydalen, N-0403
Oslo, Norway (e-mail: [email protected]).
Am J Epidemiol 2003;158:251–258
Discussion of the measures advocated by Rothman is typically confined to the somewhat unrealistic situation in which
there are two exposures but no additional covariates to
control for confounding. An exception is Flanders and
Rothman (10), who suggested a likelihood approach to estimating S from stratified case-control data. As Rothman
acknowledged (3), their approach only handles one or
possibly two additional covariates, because otherwise data in
each stratum become too sparse. Hence, Rothman suggests
invoking “multivariate methods” in estimating RERI, AP,
and S when there are additional covariates. Specifically,
Rothman states, “Confounding factors can be controlled by
including terms for those factors in the multiple logistic
model” (3, p. 324). This suggestion has been adhered to by
epidemiologists (for instance, see Olsen et al. (11)).
There has been a paucity of studies investigating the
performance of RERI, AP, and S. The only paper I am
aware of is that of Assmann et al. (12), where the investigation was limited to coverage of confidence intervals for
RERI and AP in models without additional covariates. The
primary concern in this article is the extent to which RERI,
AP, and S are useful summary measures of interaction as
departure from additive risks. In addition to the conventional approach based on logistic regression, I also suggest
an alternative approach based on linear odds models.
Attention is focused on the more realistic setting in which
there are additional covariates. However, the concepts are
best introduced in a setting with two exposures and no
additional covariates.
MODELS FOR TWO EXPOSURES
Let Y be a dichotomous outcome variable with outcomes 1
and 0. Consider the case of two dichotomous exposure variables x1 and x2 with levels j = 0, 1 and k = 0, 1, respectively.
Let
$$x_1 = \begin{cases} 1 & \text{if } j = 1 \\ 0 & \text{if } j = 0 \end{cases} \quad \text{and} \quad x_2 = \begin{cases} 1 & \text{if } k = 1 \\ 0 & \text{if } k = 0. \end{cases}$$
Let Rjk ≡ P(Y = 1|x1, x2) be the conditional risk or probability that the outcome variable Y takes the value 1 given
the values of the exposures. For all j and k, define risk
differences as RDjk ≡ Rjk – R00, relative risks as RRjk ≡ Rjk/R00, odds as Ojk ≡ Rjk/(1 – Rjk), and odds ratios as ORjk ≡ Ojk/O00.
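The definitions above translate directly into a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis; the function name `effect_measures` is hypothetical, and the illustrative risks are the female-stratum values used in the example later in the article (10, 110, 50, and 190 cases per 100,000).

```python
def effect_measures(risks):
    """Compute RD, RR, odds, and OR for each exposure pattern.

    `risks` maps (j, k) -> R_jk, where j and k are the levels of x1 and x2.
    The unexposed cell (0, 0) is the reference category.
    """
    r00 = risks[(0, 0)]
    o00 = r00 / (1.0 - r00)
    out = {}
    for (j, k), r in risks.items():
        o = r / (1.0 - r)
        out[(j, k)] = {
            "RD": r - r00,   # risk difference RD_jk
            "RR": r / r00,   # relative risk RR_jk
            "O": o,          # odds O_jk
            "OR": o / o00,   # odds ratio OR_jk
        }
    return out


# Illustrative risks for a rare outcome (assumed values, see lead-in).
risks = {(0, 0): 0.0001, (1, 0): 0.0011, (0, 1): 0.0005, (1, 1): 0.0019}
measures = effect_measures(risks)

for cell in sorted(measures):
    m = measures[cell]
    print(cell, round(m["RD"], 6), round(m["RR"], 3), round(m["OR"], 3))
```

With risks this small, the odds ratios sit very close to the relative risks, which is the rare-outcome approximation the surrogate measures rely on.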
The linear risk model
A linear risk model is now specified as
$$R_{jk} = a + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2,$$
where it is assumed that a > 0, b1 > 0, and b2 > 0. It follows
that a = R00, b1 = R10 – R00 = RD10, and b2 = R01 – R00 =
RD01. Hence, a is interpreted as the risk when there is no
exposure, b1 as the excess risk under exposure x1
(compared with no exposure whatsoever), and b2 as the
excess risk under exposure x2. The parameter b3 can be
expressed as
$$b_3 = \mathrm{RD}_{11} - \mathrm{RD}_{10} - \mathrm{RD}_{01} = R_{11} - R_{10} - R_{01} + R_{00}, \tag{1}$$
representing the excess risk due to interaction of the
exposures. If b3 = 0, RD11 = RD01 + RD10, which is risk-difference additivity. According to Rothman (3, p. 320),
b3 is the most fundamental epidemiologic measure of
interaction.
Unfortunately, the linear risk model cannot in general be
validly estimated from case-control designs, unless the
sampling fraction of cases and controls is known or can be
estimated. Since this rarely appears to be the case, it follows
that direct inference regarding the fundamental interaction
parameter b3 cannot be performed in this case. This was the
impetus for the development of the surrogate interaction
measures RERI, AP, and S.
The logistic risk model
A logistic risk model is specified as
$$R_{jk} = \frac{e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2}}{1 + e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2}}.$$
Note that the parameters α, β1, β2, and β3 are different from
the corresponding parameters a, b1, b2, and b3 in the linear
risk model. The model can alternatively be expressed as
$$\mathrm{OR}_{jk} = e^{\beta_1 x_1} \times e^{\beta_2 x_2} \times e^{\beta_3 x_1 x_2}. \tag{2}$$
Often RRjk ≈ ORjk, giving $\mathrm{RR}_{10} \approx e^{\beta_1}$, $\mathrm{RR}_{01} \approx e^{\beta_2}$, and
$\mathrm{RR}_{11} \approx e^{\beta_1 + \beta_2 + \beta_3}$. If β3 = 0, RR11 = RR01 × RR10 is obtained,
which is relative-risk multiplicativity.
Importantly, the logistic model can be employed for case-control designs under reasonable assumptions (13).
Regarding the parameters, the only difference is that the
intercept now becomes
$$\alpha^* = \alpha + \ln\left(\frac{\varphi_1}{\varphi_0}\right),$$
where φ1 and φ0 are the sampling fractions of cases and
controls, respectively.
MEASURES OF INTERACTION
Several measures of interaction have been suggested that
can serve as surrogates for the fundamental interaction
parameter b3 in the linear risk model, including RERI, AP,
and S. The basic idea is that indirect statistical inference
regarding b3, including calculation of confidence intervals
and testing, can be based on $\widehat{\mathrm{RERI}}$, $\widehat{\mathrm{AP}}$, or Ŝ from logistic
modeling in case-control designs.
Relative excess risk due to interaction
Rothman defines RERI (3, p. 323) as
$$\mathrm{RERI} = \frac{R_{11} - R_{10} - R_{01} + R_{00}}{R_{00}} = \mathrm{RR}_{11} - \mathrm{RR}_{10} - \mathrm{RR}_{01} + 1 = \frac{b_3}{a}.$$
RERI can be interpreted as the excess risk due to interaction relative to the risk without exposure. Rothman suggests substituting estimated approximate risk ratios $\widehat{\mathrm{RR}}_{11}$, $\widehat{\mathrm{RR}}_{10}$, and $\widehat{\mathrm{RR}}_{01}$ from the logistic risk model. Under our parameterization of the
logistic risk model (equation 2), this leads to
$$\widehat{\mathrm{RERI}} = e^{\hat\beta_1 + \hat\beta_2 + \hat\beta_3} - e^{\hat\beta_1} - e^{\hat\beta_2} + 1. \tag{3}$$
Attributable proportion due to interaction
Rothman defines AP (3, p. 321) as
$$\mathrm{AP} = \frac{R_{11} - R_{10} - R_{01} + R_{00}}{R_{11}} = \frac{\mathrm{RR}_{11} - \mathrm{RR}_{10} - \mathrm{RR}_{01} + 1}{\mathrm{RR}_{11}} = \frac{\mathrm{RERI}}{\mathrm{RR}_{11}} = \frac{b_3}{a + b_1 + b_2 + b_3}.$$
AP is interpreted as the attributable proportion of disease which is due to interaction among persons with both exposures. However,
this interpretation does not make sense under negative interaction (b3 < 0), since the proportion would then be negative.
Substituting the estimated approximate risk ratios from the logistic risk model gives us
$$\widehat{\mathrm{AP}} = \frac{e^{\hat\beta_1 + \hat\beta_2 + \hat\beta_3} - e^{\hat\beta_1} - e^{\hat\beta_2} + 1}{e^{\hat\beta_1 + \hat\beta_2 + \hat\beta_3}}. \tag{4}$$
Synergy index
Rothman defines S (3, p. 322) as
$$S = \frac{R_{11} - R_{00}}{(R_{10} - R_{00}) + (R_{01} - R_{00})} = \frac{\mathrm{RR}_{11} - 1}{(\mathrm{RR}_{10} - 1) + (\mathrm{RR}_{01} - 1)} = \frac{b_1 + b_2 + b_3}{b_1 + b_2}.$$
S can be interpreted as the excess risk from exposure (to both exposures) when there is interaction relative to the excess risk
from exposure (to both exposures) without interaction.
Substituting the estimated approximate risk ratios from the logistic risk model (equation 2) gives us
$$\hat{S} = \frac{e^{\hat\beta_1 + \hat\beta_2 + \hat\beta_3} - 1}{(e^{\hat\beta_1} - 1) + (e^{\hat\beta_2} - 1)}. \tag{5}$$
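Equations 3, 4, and 5 are simple functions of the three logistic coefficients. The sketch below shows one way to compute them; the function name `surrogate_measures` is hypothetical, and the coefficients are chosen (as an assumption for illustration) so that the implied approximate relative risks are 11, 5, and 19.

```python
import math


def surrogate_measures(beta1, beta2, beta3):
    """Return (RERI, AP, S) from logistic coefficients, per equations 3-5.

    Odds ratios exp(beta) stand in for relative risks, which is only
    reasonable when the outcome is rare.
    """
    rr10 = math.exp(beta1)
    rr01 = math.exp(beta2)
    rr11 = math.exp(beta1 + beta2 + beta3)
    reri = rr11 - rr10 - rr01 + 1.0                      # equation 3
    ap = reri / rr11                                     # equation 4
    s = (rr11 - 1.0) / ((rr10 - 1.0) + (rr01 - 1.0))     # equation 5
    return reri, ap, s


# Coefficients reproducing RR10 = 11, RR01 = 5, RR11 = 19 (assumed values).
beta1 = math.log(11.0)
beta2 = math.log(5.0)
beta3 = math.log(19.0) - beta1 - beta2
reri, ap, s = surrogate_measures(beta1, beta2, beta3)
print(round(reri, 2), round(ap, 2), round(s, 2))
```

Note that S is undefined when the two main-effect relative risks sum to exactly 2, since the denominator is then zero.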
MODELS INCLUDING ADDITIONAL COVARIATES
Covariates are included in most epidemiologic models to control for confounding. I still consider two dichotomous exposures, but now I also include a dichotomous covariate z, which is coded 1 or 0. This definition of z is chosen for simplicity; there
may of course be a vector of additional covariates, including both categorical and continuous covariates.
Let Rjkz ≡ P(Y = 1|x1, x2, z) be the conditional risk of Y taking the value 1 given covariates. Define stratum-specific risk differences as RDjkz ≡ Rjkz – R00z, relative risks as RRjkz ≡ Rjkz/R00z, odds as Ojkz ≡ Rjkz/(1 – Rjkz), and odds ratios as ORjkz ≡ Ojkz/O00z.
The linear risk model
Consider a linear risk model with an additional covariate, where there is interaction among exposures but not between the
exposures and the additional covariate:
$$R_{jkz} = a + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 + gz, \tag{6}$$
where a > 0, b1 > 0, b2 > 0, and gz > 0. a = R000 and g = R001 – R000; a is the risk under no exposure when z = 0, whereas g
represents the excess risk when z = 1 (compared with z = 0). Hence, the risk when there is no exposure can be expressed as a +
gz; note that it depends on the value taken by the additional covariate. Irrespective of the value of z, it follows that b1 = R10z –
R00z = RD10z, b2 = R01z – R00z = RD01z, and b3 = R11z – R10z – R01z + R00z = RD11z – RD10z – RD01z. It also follows that
$$\mathrm{RR}_{00z} = 1, \quad \mathrm{RR}_{10z} = 1 + \frac{b_1}{a + gz}, \quad \mathrm{RR}_{01z} = 1 + \frac{b_2}{a + gz}, \quad \text{and} \quad \mathrm{RR}_{11z} = 1 + \frac{b_1 + b_2 + b_3}{a + gz}. \tag{7}$$
Note that RR10z, RR01z, and RR11z are functions of the covariate z, in contrast to the risk differences.
The logistic risk model
A logistic risk model with an additional covariate, where there is interaction among exposures but not between exposures and
the covariate, is specified as
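$$R_{jkz} = \frac{e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \gamma z}}{1 + e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \gamma z}}. \tag{8}$$
It follows that OR00z = 1, $\mathrm{OR}_{10z} = e^{\beta_1}$, $\mathrm{OR}_{01z} = e^{\beta_2}$, and $\mathrm{OR}_{11z} = e^{\beta_1 + \beta_2 + \beta_3}$. When RRjkz ≈ ORjkz,
$$\mathrm{RR}_{10z} \approx e^{\beta_1}, \quad \mathrm{RR}_{01z} \approx e^{\beta_2}, \quad \text{and} \quad \mathrm{RR}_{11z} \approx e^{\beta_1 + \beta_2 + \beta_3}. \tag{9}$$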
Hence, the relative risks implied by the logistic risk model do not depend on the covariate z, in contrast to the linear case. On
the other hand, risk differences depend on the covariates, unlike the case in the linear risk model.
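The contrast just described can be checked numerically. The following sketch evaluates the logistic risk model (equation 8) in both covariate strata; the coefficient values are hypothetical and chosen only for illustration.

```python
import math


def logistic_risk(alpha, beta1, beta2, beta3, gamma, x1, x2, z):
    """Risk R_jkz under the logistic risk model with one covariate (equation 8)."""
    eta = alpha + beta1 * x1 + beta2 * x2 + beta3 * x1 * x2 + gamma * z
    return 1.0 / (1.0 + math.exp(-eta))


def odds(risk):
    return risk / (1.0 - risk)


# Hypothetical coefficients, not taken from the article.
coef = dict(alpha=-9.0, beta1=1.0, beta2=0.5, beta3=0.2, gamma=2.0)

summary = {}
for z in (0, 1):
    r00 = logistic_risk(x1=0, x2=0, z=z, **coef)
    r10 = logistic_risk(x1=1, x2=0, z=z, **coef)
    summary[z] = {
        "OR10": odds(r10) / odds(r00),  # equals exp(beta1) in every stratum
        "RD10": r10 - r00,              # changes with z
    }

for z in (0, 1):
    print(z, round(summary[z]["OR10"], 4), summary[z]["RD10"])
```

The odds ratio for x1 is the same in both strata, while the corresponding risk difference is several times larger when z = 1.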
PROBLEMS WITH ADDITIONAL COVARIATES
There are two problems associated with using surrogates for the fundamental interaction parameter b3 when there are additional covariates.
The uniqueness problem
Noting that the interaction parameter of interest b3 is invariant across the strata defined by the covariates z, I investigate
whether this also applies for the surrogate measures.
Consider RERI for a given value of the covariates z. Substituting for the relative risk from the true linear risk model (equation
7) gives us
$$\mathrm{RERI}_z = \mathrm{RR}_{11z} - \mathrm{RR}_{10z} - \mathrm{RR}_{01z} + 1 = \frac{b_3}{a + gz},$$
demonstrating that the magnitude of RERI generally depends on the values of z. In contrast, Rothman’s suggestion of
including additional covariates in the logistic model would produce a single $\widehat{\mathrm{RERI}}$, given in equation 3, where β̂1, β̂2,
and β̂3 are now estimates from the logistic model (equation 8) including the covariate but no interactions between the
covariate and either of the exposures or their product. Hence, there is clearly a tension between the suggested estimator,
based on the implicit assumption that there is one measure to be estimated, and the fact that there are several unknown
measures. The exception is when there is no interaction, b 3 = 0, since RERI = 0 in this case, whatever the value of z.
Also note that RERI retains the sign of b 3, since a + gz > 0.
Regarding AP, substitution for the relative risk from the true linear risk model (equation 7) produces
$$\mathrm{AP}_z = \frac{\mathrm{RERI}_z}{\mathrm{RR}_{11z}} = \frac{b_3}{a + b_1 + b_2 + b_3 + gz},$$
and there is a different AP for each stratum defined by the covariates, unless b3 = 0. Following Rothman’s strategy, on the other hand,
a single AP would be estimated as in equation 4, with estimates substituted from the logistic model with covariate (equation 8).
For S, substituting for the relative risk from the linear risk model (equation 7) gives us a unique measure
$$S = \frac{\mathrm{RR}_{11z} - 1}{(\mathrm{RR}_{10z} - 1) + (\mathrm{RR}_{01z} - 1)} = \frac{b_1 + b_2 + b_3}{b_1 + b_2},$$
which does not depend on the covariate z. Analogous to the case without additional covariates, Rothman suggests estimating S
using equation 5, with estimates substituted from equation 8. S does not suffer from the uniqueness problem when additional
covariates are included, in contrast to RERI and AP, which suggests that S is the surrogate measure of choice.
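The uniqueness problem can be illustrated by evaluating the three surrogates in each covariate stratum under the linear risk model (equations 6 and 7). This is a minimal sketch; the function name `stratum_measures` is hypothetical, and the parameter values are those of scenario 5 of the simulation study reported later (a = b1 = b2 = b3 = g = 0.0001).

```python
def stratum_measures(a, b1, b2, b3, g, z):
    """RERI_z, AP_z, and S implied by the linear risk model for stratum z."""
    baseline = a + g * z                       # risk with no exposure, R_00z
    rr10 = 1.0 + b1 / baseline                 # equation 7
    rr01 = 1.0 + b2 / baseline
    rr11 = 1.0 + (b1 + b2 + b3) / baseline
    reri = rr11 - rr10 - rr01 + 1.0            # equals b3 / (a + g z)
    ap = reri / rr11                           # equals b3 / (a + b1 + b2 + b3 + g z)
    s = (rr11 - 1.0) / ((rr10 - 1.0) + (rr01 - 1.0))  # equals (b1+b2+b3)/(b1+b2)
    return {"RERI": reri, "AP": ap, "S": s}


params = dict(a=0.0001, b1=0.0001, b2=0.0001, b3=0.0001, g=0.0001)
by_stratum = {z: stratum_measures(z=z, **params) for z in (0, 1)}

for z, m in by_stratum.items():
    print(z, round(m["RERI"], 2), round(m["AP"], 2), round(m["S"], 2))
```

RERI and AP change between z = 0 and z = 1, whereas S is the same in both strata.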
The misspecification problem
If a logistic model is used in estimation of the surrogate interaction measures, specified with interaction among exposures (but
not between exposures and additional covariates), the model is misspecified in the sense that it does not produce a relative risk
identical to that of the corresponding true linear model when there are additional covariates. This is evident from noting that the
relative risk from the logistic model (equation 9) does not depend on the value of the covariate z, whereas the relative risk from
the linear model in equation 7 does. Hence, RERI, AP, and S based on the logistic risk model with an additional covariate (equation 8) only approximate the true measures from the corresponding linear risk model (equation 6). This stands in contrast to the
case with solely two exposures, where the logistic and linear models are both “saturated” (both have as many parameters as
conditional probabilities) and produce identical relative risks (and hence RERI, AP, and S). An important implication is that the
estimated logistic model cannot be used to check the validity of the linear model, since a linear model without interaction
between exposures and covariate implies interaction in the logistic model.
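To make the last point concrete, the sketch below computes the odds ratios implied by a linear risk model with no exposure-covariate interaction. The function name is hypothetical; the parameter values are those of the worked example in the next section (10, 100, 40, 40, and 90 cases per 100,000).

```python
def implied_odds_ratios(a, b1, b2, b3, g, z):
    """Odds ratios OR_jkz implied by the linear risk model (equation 6)."""
    def risk(x1, x2):
        return a + b1 * x1 + b2 * x2 + b3 * x1 * x2 + g * z

    def odds(r):
        return r / (1.0 - r)

    o00 = odds(risk(0, 0))
    return {(j, k): odds(risk(j, k)) / o00 for j in (0, 1) for k in (0, 1)}


# Risks expressed as proportions (cases per 100,000 divided by 100,000).
params = dict(a=10e-5, b1=100e-5, b2=40e-5, b3=40e-5, g=90e-5)
or_z0 = implied_odds_ratios(z=0, **params)
or_z1 = implied_odds_ratios(z=1, **params)

for cell in sorted(or_z0):
    print(cell, round(or_z0[cell], 2), round(or_z1[cell], 2))
```

The odds ratio for x1 alone is about 11 when z = 0 and about 2 when z = 1. A logistic model would need exposure-by-covariate product terms to reproduce this pattern, even though the linear risk model contains none.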
RECTIFYING THE MISSPECIFICATION PROBLEM
Using a linear odds model
$$O_{jkz} = a^* + b_1^* x_1 + b_2^* x_2 + b_3^* x_1 x_2 + g^* z \tag{10}$$
enables us to estimate a* = ka, b∗1 = kb1, b∗2 = kb2, b∗3 = kb3, and g* = kg based on a case-control study (6, pp. 418–419; 14). The
linear odds model is a misspecified version of the linear risk model (equation 6) in the sense that the parameters a, b1, b2, b3, and g
of the latter model are recovered up to a proportionality factor k. This proportionality misspecification has two important implications: First, it follows that hypotheses specifying that parameters of linear risk models are zero can be tested, particularly the
hypothesis of no departure from additive risks b3 = 0 by testing b∗3 = 0 in the model shown by equation 10. Second, the surrogate
measures of interaction as departure from additivity can be validly estimated from the linear odds model. Considering S,
$$\frac{b_1^* + b_2^* + b_3^*}{b_1^* + b_2^*} = \frac{kb_1 + kb_2 + kb_3}{kb_1 + kb_2} = \frac{b_1 + b_2 + b_3}{b_1 + b_2} = S. \tag{11}$$
Note that the unknown proportionality factor cancels out. Thus, although the linear odds model is a misspecified version of the linear
risk model, no misspecification problem is involved in obtaining S (or RERIz and APz), in contrast to the approach based on logistic
regression. However, the uniqueness problem involving RERI and AP persists, suggesting that linear odds modeling of S is the
method of choice in assessing interaction as departure from additivity in case-control studies with additional covariates. The linear
odds model can be fitted in software packages such as STATA, EPICURE, and SAS (a reparameterization is available in EGRET).
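The same cancellation can be written out for the stratum-specific measures. The following is a sketch that only combines the expressions for RERI and AP under the linear risk model with the proportionality a* = ka, b*1 = kb1, b*2 = kb2, b*3 = kb3, and g* = kg stated above:

$$\mathrm{RERI}_z = \frac{b_3}{a + gz} = \frac{kb_3}{ka + kgz} = \frac{b_3^*}{a^* + g^* z}, \qquad \mathrm{AP}_z = \frac{b_3}{a + b_1 + b_2 + b_3 + gz} = \frac{b_3^*}{a^* + b_1^* + b_2^* + b_3^* + g^* z}.$$

The factor k drops out of both ratios, but each expression still depends on z, so the linear odds parameterization removes the misspecification problem without removing the uniqueness problem.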
EXAMPLE
Consider two dichotomous exposures, cigarette smoking (x1) and coffee drinking (x2), and the additional dichotomous covariate gender (z). The outcome variable indicates whether or not the subject experienced myocardial infarction. Interest concerns
the interaction between cigarette smoking and coffee drinking.
To ease presentation, I now let risk and risk difference be expressed as number of cases per 100,000. That is, a risk of 0.0004
is written as 40. Remember that only (approximate) relative risks are generally available from case-control studies, and inference must hence be based on these.
I let z be coded 1 if male and 0 if female. A linear risk model with interaction between exposures but no interactions between
the covariate and the exposures is specified:
Rjkz = a + b1x1 + b2x2 + b3x1x2 + gz = 10 + 100x1 + 40x2 +40x1x2 +90z.
This setup is exhibited in table 1.
For females, RERI = 19 – 11 – 5 + 1 = 4 and AP = (19 – 11 – 5 + 1)/19 = 0.21; for males, RERI = 2.8 – 2 – 1.4 + 1 = 0.4 and
AP = (2.8 – 2 – 1.4 + 1)/2.8 = 0.14. In contrast, S attains the same value for both genders: S = (19 – 1)/[(11 – 1) + (5 –
1)] = (2.8 – 1)/[(2 – 1) + (1.4 – 1)] = 1.29.
The example illustrates the problems previously uncovered. Although the fundamental interaction parameter b3 is invariant
over gender, the surrogates RERI and AP both vary across gender. S is the only adequate measure, attaining a unique value for
both gender strata. Regarding the misspecification problem, the relative risks for males in table 1 do not equal those for females,
as would be the case for a logistic model without interaction between exposures and covariate. A hypothetical case-control
study can be obtained from the table by letting the figures reported in the “Risk” column represent cases and considering 500
controls in each group. If logistic regression were used, the estimates $\widehat{\mathrm{RERI}}$ = 0.80, $\widehat{\mathrm{AP}}$ = 0.18, and Ŝ = 1.31 would be
obtained, whereas using the linear odds approach produces Ŝ = 1.29.
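The linear odds figure can be reproduced by hand from the hypothetical case-control data. The sketch below is not a maximum likelihood fit: it reads the linear odds coefficients directly off the cell odds. That shortcut is assumed to agree with a fitted linear odds model here only because the hypothetical cell odds are exactly linear in x1, x2, x1x2, and z. The variable and function names are illustrative.

```python
# Cases per cell are the "Risk" column of table 1; each cell has 500 controls.
# Keys are (z, x1, x2) with z = 1 for male, x1 = smoking, x2 = coffee drinking.
cases = {
    (0, 0, 0): 10, (0, 0, 1): 50, (0, 1, 0): 110, (0, 1, 1): 190,
    (1, 0, 0): 100, (1, 0, 1): 140, (1, 1, 0): 200, (1, 1, 1): 280,
}
CONTROLS_PER_CELL = 500

sample_odds = {cell: n / CONTROLS_PER_CELL for cell, n in cases.items()}


def linear_odds_coefficients(odds, z):
    """Return (b1*, b2*, b3*) as contrasts of the cell odds within stratum z."""
    o00 = odds[(z, 0, 0)]
    b1 = odds[(z, 1, 0)] - o00
    b2 = odds[(z, 0, 1)] - o00
    b3 = odds[(z, 1, 1)] - odds[(z, 1, 0)] - odds[(z, 0, 1)] + o00
    return b1, b2, b3


def synergy_index(b1, b2, b3):
    return (b1 + b2 + b3) / (b1 + b2)


s_by_gender = {
    z: synergy_index(*linear_odds_coefficients(sample_odds, z)) for z in (0, 1)
}
print({z: round(s, 2) for z, s in s_by_gender.items()})
```

Both strata give the same coefficients and hence the same value of S, which rounds to the 1.29 obtained from the true linear risk model.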
TABLE 1. Example of a linear risk model for myocardial infarction with the exposures smoking and coffee drinking and the additional covariate of gender

Gender   Smoking   Coffee drinking   Risk (×1,000)   Risk difference (×1,000)   Relative risk
Female   No        No                10              0                          1
Female   No        Yes               50              40                         5
Female   Yes       No                110             100                        11
Female   Yes       Yes               190             180                        19
Male     No        No                100             0                          1
Male     No        Yes               140             40                         1.4
Male     Yes       No                200             100                        2
Male     Yes       Yes               280             180                        2.8

SIMULATION STUDY
In each replication of the study, a cohort is initially simulated from a linear risk model with an additional covariate (equation 6). All covariates are dichotomous, with 50 percent of individuals in each category. It is crucial to ensure that covariates are appropriately correlated in simulation studies in epidemiology, since such correlations are standard in observational studies. I have specified a “typical” Pearson correlation of 0.3 among exposures and between exposures and the additional covariate. This leads to a correlation of 0.32 between the additional covariate and the interaction term (the product of the exposure dummies) and a correlation of 0.69 between either exposure and the interaction term. I subsequently produce a case-control study by randomly sampling 500 cases and 500 controls from the cohort.
On the basis of the resulting case-control data, I first consider the approach advocated by Rothman, basing inference regarding RERI, AP, and S on the logistic risk model (equation 8). Ninety-five percent confidence intervals for all measures are obtained as described by Hosmer and Lemeshow (9). I then consider the performance of the alternative approach based on fitting the linear odds model (equation 10). The Wald test of H0: b∗3 = 0, which is also a test of the hypothesis that the fundamental interaction parameter b3 is zero, is investigated. The actual rejection probability at the nominal level of 5 percent represents the actual significance level when H0 is true and the power of the test otherwise. The performance of point estimates of S and corresponding 95 percent confidence intervals obtained via the delta method are also studied. Confidence intervals for S are not part of the standard output from linear odds modeling; therefore, I demonstrate in the Appendix how a calculator or spreadsheet can be used to obtain these. Since RERI and AP suffer from the uniqueness problem when there are additional covariates, I do not consider inference regarding these measures based on the linear odds model.
The nine scenarios investigated are presented in the left-hand portion of table 2. Throughout, I specify a = b1 = b2 = 0.0001 but consider several scenarios for the interaction parameter b3 and the covariate effect g. Regarding the magnitude of interaction, no interaction (b3 = 0), a moderate positive interaction (b3 = 0.0001), and a strong positive interaction (b3 = 0.001) are studied. Regarding the covariate effect, I consider no effect (g = 0), a moderate effect (g = 0.0001), and a strong effect (g = 0.001) on disease. The corresponding values of the interaction measures are given, where RERI and AP are given subscripts designating the strata defined by z.

TABLE 2. Scenarios and performance of interaction measures in a simulation study with 500 cases and 500 controls and 1,000 replications per scenario*

Scenario: parameters and true interaction measures
Scenario no.   b3       g        RERI0   RERI1   AP0    AP1    S
1              0        0        0       0       0      0      1
2              0        0.0001   0       0       0      0      1
3              0        0.001    0       0       0      0      1
4              0.0001   0        1       1       0.25   0.25   1.50
5              0.0001   0.0001   1       0.50    0.25   0.20   1.50
6              0.0001   0.001    1       0.09    0.25   0.07   1.50
7              0.001    0        10      10      0.77   0.77   6
8              0.001    0.0001   10      5       0.77   0.71   6
9              0.001    0.001    10      0.91    0.77   0.43   6

Performance of interaction measure
               Logistic risk model                                                      Linear odds model
               Estimated RERI†         Estimated AP†           Estimated S†             Estimated S               P[R]‡
Scenario no.   M†      V†     C†       M       V      C        M      V          C      M      V           C      for b3
1              –0.06   0.32   0.96     –0.02   0.03   0.95     1.05   0.14       0.96   1.06   0.14        0.96   0.04
2              –0.16   0.23   0.97     –0.07   0.04   0.97     0.98   0.17       0.95   1.11   0.65        0.97   0.04
3              –0.26   0.18   0.96     –0.17   0.08   0.95     0.96   10.23      0.86   1.16   23.04       0.96   0.03
4              0.99    0.42   0.95     0.24    0.02   0.95     1.58   0.57       0.96   1.60   0.36        0.96   0.36
5              0.49    0.32            0.14    0.03            1.38   1.19       0.90   1.65   2.10        0.96   0.28
6              –0.03   0.17            –0.02   0.06            0.36   143.09     0.76   1.86   241.18      0.95   0.11
7              10.35   7.12   0.94     0.77    0.00   0.95     6.37   235.63     0.95   8.15   920.51      0.95   1.00
8              7.33    3.21            0.72    0.00            5.63   256.69     0.89   7.47   482.68      0.93   1.00
9              1.84    0.33            0.48    0.01            3.99   1,422.56   0.60   8.93   11,344.38   0.92   1.00

* The left-hand portion of the table shows true parameters and corresponding interaction measures. The right-hand portion of the table shows mean estimates, variances, and the coverage of 95% confidence intervals for RERI, AP, and S based on logistic regression.
† RERI, relative excess risk due to interaction; AP, attributable proportion due to interaction; S, synergy index; M, mean estimate; V, variance; C, coverage of 95% confidence intervals.
‡ Probability of rejection P[R] of H0: b3 = 0 based on the linear odds model.

Each of the scenarios was replicated 1,000 times. The logistic model (equation 8) and the linear odds model (equation 10) were used for each replication. Let Ŝr be the estimated S in replication r of a scenario. The mean estimate is defined as
$$M(\hat{S}) = \frac{1}{1{,}000} \sum_{r=1}^{1{,}000} \hat{S}_r,$$
the variance as
$$V(\hat{S}) = \frac{1}{1{,}000} \sum_{r=1}^{1{,}000} \left(\hat{S}_r - M(\hat{S})\right)^2,$$
and coverage as the fraction of the 1,000 95 percent confidence intervals including the true S. Analogous definitions
apply for RERI and AP, but note that coverage cannot be
defined when these measures vary across strata. For each
scenario, the mean estimates and variances are reported in
the right-hand portion of table 2, and the coverage of the 95
percent confidence intervals is reported when applicable.
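The three performance summaries can be computed as follows. This is a minimal sketch with a hypothetical function name and made-up inputs; it follows the definitions in the text, including division by the number of replications (rather than one fewer) in the variance.

```python
def summarize_replications(estimates, intervals, true_value):
    """Return (mean estimate M, variance V, coverage C) over replications.

    `estimates` holds one point estimate per replication and `intervals`
    the matching (lower, upper) confidence limits.
    """
    n = len(estimates)
    mean = sum(estimates) / n
    variance = sum((est - mean) ** 2 for est in estimates) / n
    covered = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return mean, variance, covered / n


# Four made-up replications, for illustration only.
estimates = [1.2, 1.5, 1.8, 1.5]
intervals = [(0.9, 1.6), (1.1, 2.0), (1.55, 2.4), (1.0, 2.2)]
m, v, c = summarize_replications(estimates, intervals, true_value=1.5)
print(m, v, c)
```

In the simulation study the same calculation would run over 1,000 replications per scenario, with coverage reported only when the true measure is unique across strata.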
Considering the performance of inference based on
logistic regression, it is evident that RERI and AP are very
problematic under scenarios 5, 6, 8, and 9, where there is not
a unique measure. The evidence for bias in estimating RERI
and AP for the remaining scenarios is statistically significant, except for scenarios 4 and 7, respectively, where p >
0.05. Bias in estimating S is significant for scenarios 1, 4, 5,
and 6. However, the estimated bias is fairly tolerable in
magnitude for all unique measures, apart from scenarios 6
and 9 for S. Regarding precision, Ŝ did not perform satisfactorily for scenarios 6, 7, 8, and 9. This is due to its construction as a fraction, often producing very large absolute values
when the denominator by chance approaches zero. Coverage
was generally quite dismal, and it grew worse (more
discrepant from 95) as the interaction and the magnitude of
the covariate effect increased.
From a theoretical point of view, the linear odds model is
the model of choice for estimating S. Interestingly, the
results from the simulations are somewhat mixed. Regarding
coverage, the performance of the linear odds approach is
good, and it clearly outperforms the logistic approach. When
it comes to estimation, the evidence of bias in Ŝ from the
linear odds model is significant (p ≤ 0.05) for all scenarios
except 3, 6, and 9 (lack of significance for the latter is due to
extreme imprecision). Disappointingly, the estimated bias is
generally somewhat more pronounced than for the logistic
approach. The variances of the estimates are also generally
higher for the linear odds model than for the logistic model,
leading to larger mean squared errors. The nominal significance level for testing the fundamental interaction parameter
b3 in the linear odds model is reasonably well recovered.
Observe that the power is low when the interaction parameter is of the same magnitude as the main effects, notwithstanding that there are as many as 500 cases and 500
controls. The power also appears to decrease as the covariate
effect increases.
As expected, all measures perform fairly well in terms of
bias when there is no covariate effect (scenarios 1, 4, and 7).
Results based on the logistic and linear odds models differ
because the estimated models are misspecified by inclusion
of the covariate z. Identical results would be obtained for
both models if the estimated models were correctly specified
by omitting the covariate.
A simulation study with smaller samples, 250 cases and
250 controls, was also conducted. The results were similar
but a bit more pronounced and are not reported here.
CONCLUSION
I strongly endorse the notion that interaction assessment
should be governed by the conceptualization of interaction.
Logistic regression is appropriate if interaction is taken as
departure from relative-risk multiplicativity, regardless of
whether additional covariates are included. Given a conceptualization of interaction as departure from additive risks or
rates, making direct inferences regarding the fundamental
interaction parameter b3 would be preferred. Unfortunately,
this is usually not possible in case-control studies. Hence,
surrogate measures of interaction as departure from additivity such as RERI, AP, and S that can be estimated from
case-control studies have been proposed. Estimation of the
measures on the basis of logistic regression is appropriate for
assessment of interaction as departure from additivity of
risks in case-control studies when there are no additional
covariates. This approach is problematic in practice,
however, where additional covariates are usually included to
control for confounding. A uniqueness problem arises
because the surrogates RERI and AP vary across strata
defined by the additional covariates, in contrast to the unique
interaction parameter of interest b3. S, on the other hand,
does not suffer from this problem, which suggests that it is
the measure of choice in assessing interaction as departure
from additivity in case-control studies that include additional
covariates. A misspecification problem arises because the
logistic model is no longer equivalent to the linear risk
model when there are additional covariates. This problem
can theoretically be rectified by using a linear odds model
instead, and simulations reveal that coverage is much
improved in comparison with the logistic approach.
However, bias in estimating surrogate measures can be a
problem regardless of whether logistic regression or the
linear odds model is used. An advantage of the linear odds
approach is that it enables us to test the hypothesis of interest
b3 = 0 directly, without using surrogate measures, but the
power appears to be rather low.
I conclude that considerable caution should be exercised in
assessing interaction as departure from additivity in case-control studies with additional covariates.
ACKNOWLEDGMENTS
The author acknowledges Drs. K. J. Rothman, S. O.
Samuelsen, and L. C. Stene for their helpful comments.
REFERENCES
1. Rothman KJ. Synergy and antagonism in cause-effect relationships. Am J Epidemiol 1974;99:385–8.
2. Rothman KJ. The estimation of synergy or antagonism. Am J
Epidemiol 1976;103:506–11.
3. Rothman KJ. Modern epidemiology. 1st ed. Boston, MA: Little, Brown and Company, 1986.
4. Koopman JS. Interaction between discrete causes. Am J Epidemiol 1981;113:716–24.
5. Rothman KJ, Greenland S, Walker AM. Concepts of interaction. Am J Epidemiol 1980;112:467–70.
6. Rothman KJ, Greenland S. Modern epidemiology. 2nd ed. Philadelphia, PA: Lippincott Williams and Wilkins, 1998.
7. Rothman KJ. Epidemiology: an introduction. Oxford, United
Kingdom: Oxford University Press, 2002.
8. Finney DJ. Probit analysis. 3rd ed. Cambridge, United Kingdom: Cambridge University Press, 1971.
9. Hosmer DW, Lemeshow S. Confidence interval estimation of
interaction. Epidemiology 1992;3:452–6.
10. Flanders WD, Rothman KJ. Interaction of alcohol and tobacco
in laryngeal cancer. Am J Epidemiol 1982;115:371–9.
11. Olsen AO, Dillner J, Skrondal A, et al. Combined effect of
smoking and human papillomavirus in cervical carcinogenesis.
Epidemiology 1998;9:346–9.
12. Assmann SF, Hosmer DW, Lemeshow S, et al. Confidence
intervals for measures of interaction. Epidemiology 1996;7:
286–90.
13. Farewell VT. Some results on the estimation of logistic models
based on retrospective data. Biometrika 1979;66:27–32.
14. Greenland S. Multivariate estimation of exposure-specific incidence from case-control studies. J Chronic Dis 1981;34:445–
53.
15. Serfling RJ. Approximation theorems of mathematical statistics. London, United Kingdom: John Wiley and Sons, 1980.
APPENDIX
Confidence intervals for S are not part of the standard output from linear odds modeling. Hence, a calculator or spreadsheet
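can be used to obtain confidence intervals based on the parameter estimates $\hat{b}_1^*$, $\hat{b}_2^*$, and $\hat{b}_3^*$ and the estimated variances and
covariances of these parameter estimates $\widehat{\mathrm{Var}}(\hat{b}_1^*)$, $\widehat{\mathrm{Var}}(\hat{b}_2^*)$, $\widehat{\mathrm{Var}}(\hat{b}_3^*)$, $\widehat{\mathrm{Cov}}(\hat{b}_1^*, \hat{b}_2^*)$, $\widehat{\mathrm{Cov}}(\hat{b}_1^*, \hat{b}_3^*)$, and $\widehat{\mathrm{Cov}}(\hat{b}_2^*, \hat{b}_3^*)$.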
S based on the linear odds model was given in equation 11 as
$$S = \frac{b_1^* + b_2^* + b_3^*}{b_1^* + b_2^*},$$
from which it follows that
ln S = ln ( b∗1 + b∗2 + b∗3 ) – ln ( b∗1 + b∗2 ) .
Since S is a fraction, the coverage properties of a confidence interval for ln S are likely to be superior. Estimated standard
errors of ln Ŝ, $\widehat{\mathrm{SE}}(\ln \hat{S})$, can be obtained using the multivariate delta method (15) as
$$\widehat{\mathrm{SE}}(\ln \hat{S}) = \sqrt{c^2\, \widehat{\mathrm{Var}}(\hat{b}_1^*) + c^2\, \widehat{\mathrm{Var}}(\hat{b}_2^*) + d^2\, \widehat{\mathrm{Var}}(\hat{b}_3^*) + 2c^2\, \widehat{\mathrm{Cov}}(\hat{b}_1^*, \hat{b}_2^*) + 2cd\, \widehat{\mathrm{Cov}}(\hat{b}_1^*, \hat{b}_3^*) + 2cd\, \widehat{\mathrm{Cov}}(\hat{b}_2^*, \hat{b}_3^*)},$$
where
$$c = \frac{1}{\hat{b}_1^* + \hat{b}_2^* + \hat{b}_3^*} - \frac{1}{\hat{b}_1^* + \hat{b}_2^*}$$
and
$$d = \frac{1}{\hat{b}_1^* + \hat{b}_2^* + \hat{b}_3^*}.$$
An approximate 95 percent confidence interval for S will then have the lower confidence limit
$$\exp\left(\ln \hat{S} - 1.96 \times \widehat{\mathrm{SE}}(\ln \hat{S})\right)$$
and the upper confidence limit
$$\exp\left(\ln \hat{S} + 1.96 \times \widehat{\mathrm{SE}}(\ln \hat{S})\right).$$
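In place of a calculator or spreadsheet, the interval can be computed with a short script. The following is a sketch of the delta-method calculation described above; the function name `synergy_confidence_interval` and the input values are hypothetical, and the estimates and their variances and covariances are assumed to come from a linear odds model fitted elsewhere.

```python
import math


def synergy_confidence_interval(b1, b2, b3, var1, var2, var3,
                                cov12, cov13, cov23, z_crit=1.96):
    """Delta-method confidence interval for S from linear odds estimates.

    Requires b1 + b2 > 0 and b1 + b2 + b3 > 0 so that ln S is defined.
    Returns (S_hat, SE of ln S_hat, lower limit, upper limit).
    """
    total = b1 + b2 + b3
    main = b1 + b2
    s_hat = total / main
    c = 1.0 / total - 1.0 / main   # derivative of ln S with respect to b1 and b2
    d = 1.0 / total                # derivative of ln S with respect to b3
    var_ln_s = (c * c * var1 + c * c * var2 + d * d * var3
                + 2.0 * c * c * cov12 + 2.0 * c * d * cov13 + 2.0 * c * d * cov23)
    se_ln_s = math.sqrt(var_ln_s)
    lower = math.exp(math.log(s_hat) - z_crit * se_ln_s)
    upper = math.exp(math.log(s_hat) + z_crit * se_ln_s)
    return s_hat, se_ln_s, lower, upper


# Made-up estimates with uncorrelated errors, for illustration only.
s_hat, se_ln_s, lower, upper = synergy_confidence_interval(
    b1=1.0, b2=1.0, b3=1.0,
    var1=0.01, var2=0.01, var3=0.01,
    cov12=0.0, cov13=0.0, cov23=0.0,
)
print(round(s_hat, 3), round(se_ln_s, 4), round(lower, 3), round(upper, 3))
```

Because the interval is built on the log scale, it is symmetric around Ŝ in ratio terms rather than in absolute terms, and its limits are always positive.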