American Journal of Epidemiology
© The Author 2009. Published by the Johns Hopkins Bloomberg School of Public Health.
All rights reserved. For permissions, please e-mail: [email protected].
Vol. 169, No. 10
DOI: 10.1093/aje/kwp035
Advance Access publication April 10, 2009
Practice of Epidemiology
Methods of Covariate Selection: Directed Acyclic Graphs and the Change-in-Estimate Procedure
Hsin-Yi Weng, Ya-Hui Hsueh, Locksley L. McV. Messam, and Irva Hertz-Picciotto
Initially submitted August 10, 2007; accepted for publication January 26, 2009.
Four covariate selection approaches were compared: a directed acyclic graph (DAG) full model and 3 DAG and
change-in-estimate combined procedures. Twenty-five scenarios with case-control samples were generated from 10
simulated populations in order to address the performance of these covariate selection procedures in the presence of
confounders of various strengths and under DAG misspecification with omission of confounders or inclusion of
nonconfounders. Performance was evaluated by standard error, bias, square root of the mean-squared error, and
95% confidence interval coverage. In most scenarios, the DAG full model without further covariate selection performed as well as or better than the other procedures when the DAGs were correctly specified, as well as when
confounders were omitted. Model reduction by using change-in-estimate procedures showed potential gains in
precision when the DAGs included nonconfounders, but underestimation of regression-based standard error might
cause reduction in 95% confidence interval coverage. For modeling binary outcomes in a case-control study, the
authors recommend construction of a "conservative" DAG, determination of all potential confounders, and then
change-in-estimate procedures to simplify this full model. The authors advocate that, under the conditions investigated, selection of the final model should be based on changes in precision: adopt the reduced model if its standard
error (derived from logistic regression) is substantially smaller; otherwise, the full DAG-based model is appropriate.
bias (epidemiology); computer simulation; confounding factors (epidemiology); epidemiologic methods; logistic
models; models, statistical; models, theoretical
Abbreviations: DAG, directed acyclic graph; lnOR, natural logarithm-transformed odds ratio; OR, odds ratio.
Directed acyclic graphs (DAGs) and change-in-estimate
procedures for confounder identification and selection
during data analysis have, to date, been discussed separately in the epidemiologic literature (1–8). With few exceptions (9–11), data analysts have also tended to apply
the procedures separately, although no obvious subject
matter considerations preclude their joint use. This has
been a natural course of action because the use of DAGs
is generally based only on prior knowledge or a priori
assumptions about causal relations among variables of interest in the source population from which the study sample is taken, while the change-in-estimate procedure relies
on sample-based relations among variables. These fundamental differences serve also to highlight limitations in
both approaches: DAGs in ignoring sampling variation
and the change-in-estimate in not taking into consideration
underlying causal relations.
Although use of prior knowledge in model building has
long been advocated (2, 12–14), previous studies have not
comprehensively examined the performance of covariate
selection procedures that combine the effect of both approaches on parameter estimation. Thus, in this simulation
study, we investigated whether combined approaches could
improve parameter estimation over the DAG approach alone
in the presence of confounders of various strengths and under DAG misspecification resulting from omission of confounders or inclusion of nonconfounders. This objective
distinguishes this study from previous studies on confounder selection strategies (5, 15, 16). We do not discuss
problems resulting from adjustment for colliders or from
Correspondence to Dr. Hsin-Yi Weng, 2430 Veterinary Medicine Basic Sciences Building, 2001 South Lincoln Avenue, Urbana, IL 61802 (e-mail:
[email protected]).
Am J Epidemiol 2009;169:1182–1190
Table 1. Predetermined Regression Coefficients for Covariate-Exposure (βE) and Covariate-Outcome (βO) Relations Used for Logistic
Regression Models Fit to Each of the 10 Populations
Population
Covariatea
1
2
3
4
5
6
7
8
9
10
bE
bO
bE
bO
bE
bO
bE
bO
bE
bO
bE
bO
bE
bO
bE
bO
bE
bO
bE
bO
X1
1.5
1.5
1.0
1.0
0.8
0.8
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
2.0
0
X2
0.5
0.5
0.5
0.5
0.3
0.3
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.0
1.0
1.5
0
X3
1.5
0.5
1.0
0.5
0.8
0.3 1.5 1.5 0.5 0.5
0.5
0.5
1.5
0
0
1.5
0.5
0.5
0.5
0
X4
0.5
1.5
0.5
1.0
0.3
0.8
0.5 0.5
0
1.5
0
1.5
0
0
2.0
X5
0.5
0.5
0.5
0.5
0.3
0.3
1.5
1.5
0
1.5
1.0
0
0
1.5
X6
1.5
0
1.0
0
0.8
0
0.5
0
0
1.0
X7
0
1.5
0
1.0
0
0.8
0
1.5
0
0.5
0
1.0
0.5 0.5 1.5
0
1.5
0
1.5
0
1.5
0
X8
X9
Exposure
Intercept
0
1.5
1.0
0.8
1.5
1.5
1.5
1.5
1.5
0.5
1.5
1.5
3.6 7.2 3.6 6.3 3.3 5.4 2.1 5.2 2.1 5.3 2.1 5.3 2.5 5.2 1.1 6.0 3.5 6.5 3.1 8.4
ᵃ All covariates except X5 in populations 1–3 and X7 in population 10 were Bernoulli random variables with a success probability of 0.2. These 4
exceptions followed a normal distribution with a mean of 3 and a variance of 1.
confounder selection based on significance tests, as these
issues have been addressed elsewhere (3, 4, 5–7, 15, 17).
MATERIALS AND METHODS
Populations and samples
We created 10 different populations, each consisting of
500,000 observations with a binary exposure (E) and outcome (O), as well as covariates (Xj) (Table 1). We then
selected random samples of size 300 and 1,000 from these
populations with confounders of various strengths and generated 25 scenarios (Table 2) to examine the performance of
the covariate selection procedures under correctly and
incorrectly specified DAGs for populations with different
confounding structures. Misspecified DAGs may omit confounders or include nonconfounding covariates (i.e., those
nonconfounders that are neither colliders nor consequences
of either exposure or outcome).
We iterated the process of cumulative incidence case-control sampling from the population for each sample size
1,000 times. One set consisted of 150 cases and 150 controls, and the other consisted of 500 cases and 500 controls.
Sampling was without replacement within each iteration,
but the sampled observations were replaced for the next
iteration. SAS software for Windows (18) was used to generate the samples.
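The sampling scheme described above can be sketched in a few lines. This is a minimal illustration in Python rather than the SAS code the authors used; the record layout `(exposure, outcome)`, the function name `case_control_sample`, and the toy population are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[int, int]  # hypothetical layout: (exposure, outcome)


def case_control_sample(
    population: Sequence[Record],
    n_cases: int,
    n_controls: int,
    rng: random.Random,
) -> List[Record]:
    """Draw cases and noncases separately, without replacement within a draw."""
    cases = [r for r in population if r[1] == 1]
    controls = [r for r in population if r[1] == 0]
    # random.Random.sample draws without replacement; the population itself is
    # left unchanged, so the next iteration samples from the full population.
    return rng.sample(cases, n_cases) + rng.sample(controls, n_controls)


# Toy population (illustrative only): 100 cases among 1,000 records.
toy_population = [(i % 2, 1 if i < 100 else 0) for i in range(1000)]
rng = random.Random(2009)
iterations = [case_control_sample(toy_population, 20, 20, rng) for _ in range(5)]
```

Each element of `iterations` is one case-control sample; in the study the same idea was repeated 1,000 times per sample size.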
Covariate selection procedures
Logistic regression was used to model the parameter of
interest, that is, the odds ratio (OR) relating E to O while
simultaneously controlling for selected covariates. The covariate selection procedures investigated in this study are as
follows.
DAG full model. The covariates identified as confounders by the relevant DAG were included in the logistic regression model without further covariate selection. For
example, X1, . . ., X5, but not X6 or X7, were included
in the logistic regression models for all the samples in
scenarios 1–3 (Tables 1 and 2). Different sets of covariates
were included in the DAG full models in scenarios 4–25 to
represent different types of DAG misspecification (Table 2).
For example, X1, . . ., X4, but not X5, were included in the
DAG full model for scenario 4 (sampled from population 1)
to represent a misspecified DAG that omitted a confounder
(i.e., X5).
DAG gold-standard change-in-estimate procedure. In this
procedure, the initial full model was the DAG full model, and
covariates were selected by backward elimination. At each
stage, the 1 covariate for which removal caused the smallest
change in the exposure OR (defined as ΔOR) was removed,
provided that ΔOR was less than 0.1 (a 10% change). ΔOR, at
each stage, was given by the following equation:
$$\Delta\mathrm{OR} = \frac{\lvert \mathrm{OR}_i - \mathrm{OR}_{\mathrm{DAG}} \rvert}{\mathrm{OR}_{\mathrm{DAG}}}, \qquad (1)$$
where $\mathrm{OR}_i$ is the OR estimated at the ith step of the procedure, and $\mathrm{OR}_{\mathrm{DAG}}$ is the OR estimated by using the initial
DAG full model.
The procedure was discontinued at the step where no
covariate's removal met the criterion (i.e., ΔOR ≥ 0.1 for every remaining covariate).
DAG gold-standard change-in-estimate procedure with
consideration of precision. After selecting the model using
the previously described DAG gold-standard procedure, we
compared precision, quantified using the logistic regression-estimated standard error of the natural logarithm-transformed
OR (lnOR), of this simplified model with precision of the
DAG full model. At the final step, the model simplified by
using the DAG gold-standard change-in-estimate procedure
was selected if, and only if, it had greater precision (i.e.,
a smaller regression-estimated standard error) than did the
DAG full model. Otherwise, the DAG full model was
selected.
Table 2. Covariates Included (O) in the Initial Directed Acyclic Graph Full Model for Each of the 25 Scenarios
Scenariosb
Covariatea
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
23
24
25
X1
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
X2
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
X3
O
O
O
O
O
O
O
O
O
O
O
O
O
O
X4
O
O
O
O
X5
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
X6
21
22
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
X7
O
O
O
O
O
O
O
O
O
O
X8
O
O
X9
O
O
Exposure
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
Intercept
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
O
Source population
1
2
3
1
4
1
1
1
3
1
3
5
6
1
7
9
10
1
8
9
10
10
1
9
10
ᵃ Covariate relations with the exposure and outcome are described in Table 1 for each of the 10 source populations.
ᵇ Scenarios 1–3 contained confounders of decreasing strengths. Scenarios 4 and 5 represented misspecified directed acyclic graphs that
omitted a strong confounder. Scenarios 6–9 represented directed acyclic graphs that omitted a moderate or weak confounder. Scenarios 10–12
represented directed acyclic graphs that omitted 2 same-direction confounders, while scenario 13 omitted 2 opposite-direction confounders.
Scenarios 14–17 represented misspecified directed acyclic graphs that included nonconfounders associated with only study exposure, while
scenarios 18–22 represented the inclusion of nonconfounders associated with only study outcome. Scenarios 23–25 included nonconfounders that
were each associated with only study exposure or outcome.
DAG-stepwise change-in-estimate procedure. The DAG
full model was the initial model, and backward elimination
was applied. Instead of being compared with the OR obtained from the DAG full model, as in the case of the DAG
gold-standard change-in-estimate procedure, the OR obtained at each step was compared with that generated at
the previous step (i.e., $\Delta\mathrm{OR} = \lvert \mathrm{OR}_i - \mathrm{OR}_{i-1} \rvert / \mathrm{OR}_{i-1}$).
The 0.1 criterion previously described was also applied to
this procedure.
We used a SAS macro (19) to perform the DAG-stepwise
change-in-estimate procedure (2). This macro was further
modified to execute the DAG gold-standard change-in-estimate procedures.
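The difference between the gold-standard and stepwise reference points can be made concrete with a short sketch. It is written in Python rather than SAS and is not the macro cited above; the function `change_in_estimate`, the callable `estimate_or`, and the toy odds ratios are hypothetical stand-ins for refitting a logistic model on each covariate subset.

```python
from typing import Callable, FrozenSet, Iterable, List


def change_in_estimate(
    full_covariates: Iterable[str],
    estimate_or: Callable[[FrozenSet[str]], float],
    reference: str = "gold-standard",
    threshold: float = 0.1,
) -> List[str]:
    """Backward elimination driven by the relative change in the exposure OR.

    reference="gold-standard": compare each candidate OR with the OR from
    the initial full model. reference="stepwise": compare it with the OR
    from the previous step.
    """
    if reference not in ("gold-standard", "stepwise"):
        raise ValueError("reference must be 'gold-standard' or 'stepwise'")
    current = frozenset(full_covariates)
    or_reference = estimate_or(current)
    while current:
        trials = []
        for covariate in sorted(current):
            or_reduced = estimate_or(current - {covariate})
            delta = abs(or_reduced - or_reference) / or_reference
            trials.append((delta, covariate, or_reduced))
        # Candidate whose removal changes the OR least (ties broken by name).
        delta, covariate, or_reduced = min(trials)
        if delta >= threshold:
            break  # no removal meets the criterion
        current = current - {covariate}
        if reference == "stepwise":
            or_reference = or_reduced
    return sorted(current)


# Hypothetical ORs: omitting a covariate multiplies the OR by the factor below.
INFLATION = {"A": 1.08, "B": 1.08, "C": 1.30}


def toy_or(included: FrozenSet[str]) -> float:
    value = 2.0  # OR from the hypothetical full model
    for name in sorted(INFLATION):
        if name not in included:
            value *= INFLATION[name]
    return value


gold = change_in_estimate("ABC", toy_or, reference="gold-standard")
stepwise = change_in_estimate("ABC", toy_or, reference="stepwise")
```

In this toy setting the gold-standard rule retains B and C, whereas the stepwise rule also drops B: two successive 8% changes each pass the 0.1 criterion even though their cumulative change from the full model exceeds 10%.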
Performance measures
The performances of these 4 procedures were assessed by
comparing the lnOR calculated from each sample with the
corresponding population lnOR. The performance measures
were as follows: 1) standard error, estimated by using the
standard deviation of the sample lnORs; 2) bias, estimated
as the difference between the mean of the sample lnORs and
the population lnOR; 3) square root of the mean-squared
error, estimated as the square root of the sum of the bias
squared and the variance (squared standard error) of the
sample lnORs; and 4) 95% confidence interval coverage,
calculated as the percentage of sample 95% confidence intervals that included the population lnOR. Sample confidence intervals were Wald confidence intervals derived
from logistic regression.
To quantify precision, we calculated the standard errors
derived from both the standard deviation of the 1,000 iterations (sampling distribution-based standard error) and logistic
regression (regression-based standard error). These 2 standard
error estimates were then compared with each other.
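The four performance measures and the two standard error estimates can be written compactly. The following is a sketch in Python under stated assumptions: the function name `performance_measures`, the 1.96 Wald multiplier, and the four toy estimates are illustrative and do not come from the study.

```python
import math
import statistics
from typing import Dict, Sequence


def performance_measures(
    ln_or_hat: Sequence[float],   # sample lnOR estimates, one per iteration
    se_hat: Sequence[float],      # regression-based SE of each estimate
    ln_or_true: float,            # population lnOR
    z: float = 1.96,              # Wald multiplier for a 95% interval
) -> Dict[str, float]:
    # 1) Sampling distribution-based SE: SD of the sample lnORs.
    sampling_se = statistics.stdev(ln_or_hat)
    # 2) Bias: mean of the sample lnORs minus the population lnOR.
    bias = statistics.fmean(ln_or_hat) - ln_or_true
    # 3) Root mean-squared error: sqrt(bias^2 + variance).
    rmse = math.sqrt(bias ** 2 + sampling_se ** 2)
    # 4) Coverage: percentage of Wald intervals containing the population lnOR.
    covered = sum(
        1 for est, se in zip(ln_or_hat, se_hat)
        if est - z * se <= ln_or_true <= est + z * se
    )
    return {
        "sampling_se": sampling_se,
        "mean_regression_se": statistics.fmean(se_hat),
        "bias": bias,
        "rmse": rmse,
        "coverage_pct": 100.0 * covered / len(ln_or_hat),
    }


# Toy input (illustrative only): four estimates around a true lnOR of 1.0.
result = performance_measures([0.9, 1.0, 1.1, 1.2], [0.1, 0.1, 0.1, 0.1], 1.0)
```

The same decomposition can be checked against the reported results: for scenario 1 under the DAG gold-standard procedure, a bias of 0.057 and a standard error of 0.177 give sqrt(0.057² + 0.177²) ≈ 0.186, which agrees with the tabulated root mean-squared error.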
Simulations
For simulations, all covariates followed a Bernoulli distribution with a success probability of 0.2 except for X5 in
populations 1–3 and X7 in population 10, which were continuous and distributed normally with a mean of 3 and a standard deviation of 1 (Table 1). After predetermination of the
distribution of each covariate, its relations with the exposure
and outcome were modeled by using logistic regression (refer to equations 2 and 3):
$$\Pr(E = 1 \mid Z) = \frac{e^{Z\beta_E}}{1 + e^{Z\beta_E}} \qquad (2)$$
and
$$\Pr(O = 1 \mid Z') = \frac{e^{Z'\beta_O}}{1 + e^{Z'\beta_O}}, \qquad (3)$$
where E and O are as previously defined, and $\Pr(E = 1 \mid Z)$
is the success probability of exposure conditional on the 1 × n
covariate matrix Z = (1, X1, . . ., Xj) (n = j + 1); $\Pr(O = 1 \mid Z')$
is the success probability of the outcome conditional on the
1 × n′ covariate matrix Z′ = (1, E, X1, . . ., Xk) (n′ = k + 2);
and βE and βO are the n × 1 and n′ × 1 matrices of the
regression coefficients expressing the covariate-exposure
associations and the effects of the study exposure and other
covariates on outcome, respectively.
Logistic regression intercepts were predetermined in order to ensure that the 95th percentile of outcome incidence
proportion and the median of exposure prevalence in the
source populations were approximately 10% and 20%,
respectively. The percentiles were derived from the exposure-covariate joint probabilities, computed by using the predetermined regression coefficients (Table 1). The means of
the continuous covariates and an exposure prevalence of
0.2 were used in these computations.
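A population of this kind can be generated directly from the two logistic models in equations 2 and 3. The sketch below is a simplified Python illustration, not the authors' SAS program: it uses only binary covariates, and the intercepts and coefficients passed in the example call are hypothetical values chosen to give a rare outcome, not the entries of Table 1.

```python
import math
import random
from typing import List, Sequence, Tuple

Record = Tuple[int, int, Tuple[int, ...]]  # (exposure, outcome, covariates)


def expit(x: float) -> float:
    """Inverse logit: e^x / (1 + e^x)."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_population(
    n: int,
    beta_e: Sequence[float],   # covariate-exposure coefficients
    beta_o: Sequence[float],   # covariate-outcome coefficients
    beta_exposure: float,      # exposure-outcome coefficient (lnOR)
    intercept_e: float,
    intercept_o: float,
    seed: int = 0,
) -> List[Record]:
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        # Bernoulli(0.2) covariates, generated independently of one another.
        x = tuple(1 if rng.random() < 0.2 else 0 for _ in beta_e)
        # Equation 2: exposure depends on the covariates.
        p_e = expit(intercept_e + sum(b * v for b, v in zip(beta_e, x)))
        e = 1 if rng.random() < p_e else 0
        # Equation 3: outcome depends on exposure and the covariates.
        p_o = expit(
            intercept_o + beta_exposure * e
            + sum(b * v for b, v in zip(beta_o, x))
        )
        o = 1 if rng.random() < p_o else 0
        population.append((e, o, x))
    return population


# Hypothetical parameter values, for illustration only.
population = simulate_population(
    n=5000, beta_e=(1.5, 0.5), beta_o=(1.5, 0.5),
    beta_exposure=1.5, intercept_e=-2.0, intercept_o=-5.0, seed=1,
)
outcome_prevalence = sum(o for _, o, _ in population) / len(population)
```

A covariate with a nonzero coefficient in both `beta_e` and `beta_o` acts as a confounder; setting one of the two to zero yields a nonconfounder associated with only exposure or only outcome.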
Table 3. Scenarios Based on Correctly Specified Directed Acyclic Graphs Representing Confounders of Various Strengthsᵃ

Measure by Scenarioᵇ                      DAG      DAG-GS CE   DAG-GS P   DAG-S CE
Standard error
  1                                       0.173    0.177       0.177      0.188
  2                                       0.164    0.167       0.167      0.179
  3                                       0.176    0.177       0.177      0.184
Bias
  1                                       0.007    0.057       0.057      0.077
  2                                       0.006    0.058       0.058      0.119
  3                                       0.003    0.072       0.072      0.159
Square root of the mean-squared error
  1                                       0.173    0.186       0.186      0.204
  2                                       0.164    0.177       0.177      0.215
  3                                       0.176    0.191       0.191      0.244
95% confidence interval coverage, %
  1                                       96       94          94         90
  2                                       95       93          93         86
  3                                       96       94          94         84

Abbreviations: DAG, directed acyclic graph full model; DAG-GS CE, directed acyclic graph gold-standard change-in-estimate procedure without consideration of precision; DAG-GS P, directed acyclic graph gold-standard change-in-estimate procedure with consideration of precision; DAG-S CE, directed acyclic graph stepwise change-in-estimate procedure.
ᵃ Standard errors, bias, square root of the mean-squared error, and the 95% confidence interval coverage for the natural logarithm of sample odds ratios were obtained by using the 4 model selection methods shown. The results were from 1,000 case-control samples, each consisting of 500 cases and 500 controls.
ᵇ Scenarios 1–3 contained confounders of decreasing strengths, respectively.

RESULTS
Although the relative performances of the procedures did
not vary with sample size, the magnitudes of the differences
between them were greater at n = 1,000 than at n = 300. We
present only the results of n = 1,000.
Simulation results
The simulated 95% confidence interval coverages of
properly specified DAG full models were all close to the
nominal level (95%–96%), and the absolute values of the
bias were all small (0.003–0.025) (only results from scenarios 1–3 are shown) (Table 3). These results indicated
that the number of simulations performed in the study was
satisfactory, because in this study confounding of less
than 0.1 (a 10% change in OR) was considered inconsequential in covariate selection.
Performance measures
No misspecification: strength of confounding (scenarios
1–3). Overall, when the DAG correctly specified the underlying causal relations, the DAG full model performed
best, and the DAG-stepwise change-in-estimate procedure
performed worst (Table 3). This was true regardless of
strength of confounding. A comparison of the DAG gold-standard change-in-estimate procedure with versus without
precision considerations showed that the results were identical on all performance measures.
Bias and 95% confidence interval coverage were the
measures that produced the most substantial differences
among the 4 procedures. Although the DAG full model
consistently produced the least bias, the bias resulting
from the DAG-stepwise change-in-estimate procedure
was consistently the greatest. Ninety-five percent confidence interval coverage was closer to nominal coverage
for both the DAG full model and the DAG gold-standard
change-in-estimate procedures than for the DAG-stepwise
change-in-estimate procedure. The differences among the
4 procedures for these 2 performance measures (bias and
95% confidence interval coverage) were inversely associated with strength of confounding; that is, the differences
increased as the strength of confounding decreased.
Standard errors generated by the DAG-stepwise change-in-estimate procedure were as much as 9% greater than
DAG full model standard errors. Although the differences
in standard error were small between the DAG full model
and the DAG gold-standard change-in-estimate procedures,
they increased with strength of confounding.
Misspecification: omission of confounders (scenarios
4–13). When the DAG omitted confounders, the DAG full
model performed as well as or better than the other procedures, and the DAG-stepwise procedure performed worst
(Table 4). In scenarios 5, 12, and 13, in which only 2 covariates were included in the initial full model for selection,
performance measures were essentially the same across the
4 methods. In all other scenarios (where 3 covariates were
included in the initial full model for selection), the DAG full
model performed best with respect to bias, square root of
the mean-squared error, and 95% confidence interval coverage, and the DAG-stepwise change-in-estimate procedure
performed worst. This was also true in general for the standard errors.
The results of the DAG gold-standard change-in-estimate
procedure with versus without precision considerations
were identical for all 4 performance measures for scenarios
4–13.
Misspecification: inclusion of nonconfounding covariates
(scenarios 14–25). When the initial DAG included nonconfounders, the DAG gold-standard change-in-estimate
and DAG-stepwise change-in-estimate procedures produced
Table 4. Scenarios Based on Misspecified Directed Acyclic Graphs That Omitted Confoundersᵃ

Measure by Scenario
(Omitted Confoundersᵇ)                    DAG      DAG-GS CE   DAG-GS P   DAG-S CE
Standard error
  4 (1 strong)                            0.169    0.173       0.173      0.183
  5 (1 strong)                            0.172    0.172       0.172      0.172
  6 (1 moderate)                          0.164    0.165       0.165      0.171
  7 (1 moderate)                          0.163    0.164       0.164      0.170
  8 (1 weak)                              0.171    0.184       0.184      0.189
  9 (1 weak)                              0.174    0.178       0.178      0.185
  10 (2 moderate)ᶜ                        0.156    0.155       0.155      0.156
  11 (2 weak)ᶜ                            0.171    0.177       0.177      0.181
  12 (2 weak)ᶜ                            0.169    0.169       0.169      0.169
  13 (2 weak)ᵈ                            0.157    0.157       0.157      0.157
Bias
  4 (1 strong)                            0.214    0.265       0.265      0.292
  5 (1 strong)                            0.171    0.171       0.171      0.171
  6 (1 moderate)                          0.153    0.195       0.195      0.212
  7 (1 moderate)                          0.149    0.197       0.197      0.207
  8 (1 weak)                              0.051    0.076       0.076      0.081
  9 (1 weak)                              0.017    0.084       0.084      0.159
  10 (2 moderate)ᶜ                        0.272    0.307       0.307      0.308
  11 (2 weak)ᶜ                            0.101    0.152       0.152      0.175
  12 (2 weak)ᶜ                            0.082    0.082       0.082      0.082
  13 (2 weak)ᵈ                            0.022    0.022       0.022      0.022
Square root of the mean-squared error
  4 (1 strong)                            0.272    0.317       0.317      0.345
  5 (1 strong)                            0.243    0.243       0.243      0.243
  6 (1 moderate)                          0.224    0.256       0.256      0.272
  7 (1 moderate)                          0.221    0.256       0.256      0.267
  8 (1 weak)                              0.179    0.199       0.199      0.206
  9 (1 weak)                              0.175    0.196       0.196      0.244
  10 (2 moderate)ᶜ                        0.314    0.344       0.344      0.345
  11 (2 weak)ᶜ                            0.199    0.234       0.234      0.251
  12 (2 weak)ᶜ                            0.188    0.188       0.188      0.188
  13 (2 weak)ᵈ                            0.159    0.159       0.159      0.159
95% confidence interval coverage, %
  4 (1 strong)                            78       63          63         55
  5 (1 strong)                            85       85          85         85
  6 (1 moderate)                          85       79          79         74
  7 (1 moderate)                          88       79          79         76
  8 (1 weak)                              95       92          92         90
  9 (1 weak)                              96       92          92         84
  10 (2 moderate)ᶜ                        62       51          51         50
  11 (2 weak)ᶜ                            93       86          86         82
  12 (2 weak)ᶜ                            93       93          93         93
  13 (2 weak)ᵈ                            96       96          96         96
smaller standard error than did the DAG full model in most
scenarios regardless of whether the nonconfounder(s) were
associated with only outcome or only exposure (Table 5).
These procedures also outperformed the DAG full model
with respect to bias and square root of the mean-squared
error when the DAGs included nonconfounders that were
associated with only study outcome. The DAG full model
resulted in 95% confidence interval coverage that was closer
to nominal coverage than the other procedures.
When the DAG was misspecified to include nonconfounders, the results of the DAG gold-standard
change-in-estimate procedure with versus without precision
considerations were, again, identical.
Regression-based versus sampling distribution-based
standard errors
The DAG gold-standard change-in-estimate procedure
had a smaller mean regression-based standard error than
did the corresponding DAG full model in 22 of 25 scenarios (Table 6). In contrast, in only 11 scenarios did
the DAG gold-standard change-in-estimate procedure produce a smaller sampling distribution-based standard error
than the DAG full model. Ten of these were scenarios in which the DAG included nonconfounders.
Abbreviations: DAG, directed acyclic graph full model; DAG-GS
CE, directed acyclic graph gold-standard change-in-estimate procedure without consideration of precision; DAG-GS P, directed acyclic
graph gold-standard change-in-estimate procedure with consideration of precision; DAG-S CE, directed acyclic graph stepwise
change-in-estimate procedure.
ᵃ Standard errors, bias, square root of the mean-squared error, and
the 95% confidence interval coverage for the natural logarithm of
sample odds ratios were obtained by using the 4 model selection
methods shown. The results were from 1,000 case-control samples,
each consisting of 500 cases and 500 controls.
ᵇ Number and strength of omitted confounders.
ᶜ Omitted confounders were in the same direction.
ᵈ Omitted confounders were in the opposite direction.
DISCUSSION
This study generated 25 different scenarios to investigate
whether covariate selection strategies that combined DAGs
and change-in-estimate approaches could improve parameter estimation over the DAG procedure used by itself under
different scenarios of correctly specified DAGs with various
strengths of confounding and of DAG misspecification.
The finding that the DAG full model consistently performed best when DAGs were correctly specified with various strengths of confounding (scenarios 1–3) suggests that
Table 5. Scenarios Based on Misspecified Directed Acyclic Graphs That Included Nonconfoundersᵃ

Measure by Scenario
(Included Nonconfoundersᵇ)                DAG      DAG-GS CE   DAG-GS P   DAG-S CE
Standard error
  14 (1 E only)                           0.181    0.184       0.184      0.189
  15 (1 E only)                           0.180    0.173       0.173      0.173
  16 (3 E only)                           0.190    0.187       0.187      0.184
  17 (1 E only)                           0.181    0.173       0.173      0.172
  18 (1 O only)                           0.181    0.183       0.183      0.192
  19 (1 O only)                           0.178    0.173       0.173      0.173
  20 (3 O only)                           0.192    0.184       0.184      0.185
  21 (3 O only)                           0.197    0.184       0.184      0.181
  22 (4 O only)                           0.201    0.186       0.186      0.183
  23 (1 E, 1 O only)                      0.188    0.190       0.190      0.193
  24 (3 E, 3 O only)                      0.201    0.192       0.192      0.188
  25 (3 E, 4 O only)                      0.223    0.204       0.204      0.193
Bias
  14 (1 E only)                           0.010    0.060       0.060      0.077
  15 (1 E only)                           0        0.005       0.005      0.005
  16 (3 E only)                           0.059    0.087       0.087      0.083
  17 (1 E only)                           0.013    0.007       0.007      0.001
  18 (1 O only)                           0.028    0.065       0.065      0.080
  19 (1 O only)                           0.032    0.021       0.021      0.021
  20 (3 O only)                           0.095    0.088       0.088      0.091
  21 (3 O only)                           0.050    0.029       0.029      0.026
  22 (4 O only)                           0.065    0.037       0.037      0.028
  23 (1 E, 1 O only)                      0.030    0.069       0.069      0.079
  24 (3 E, 3 O only)                      0.102    0.095       0.095      0.092
  25 (3 E, 4 O only)                      0.069    0.041       0.041      0.027
Square root of the mean-squared error
  14 (1 E only)                           0.181    0.193       0.193      0.204
  15 (1 E only)                           0.180    0.173       0.173      0.173
  16 (3 E only)                           0.199    0.206       0.206      0.202
  17 (1 E only)                           0.182    0.173       0.173      0.172
  18 (1 O only)                           0.183    0.194       0.194      0.208
  19 (1 O only)                           0.181    0.174       0.174      0.174
  20 (3 O only)                           0.214    0.204       0.204      0.206
  21 (3 O only)                           0.203    0.186       0.186      0.183
  22 (4 O only)                           0.211    0.190       0.190      0.185
  23 (1 E, 1 O only)                      0.191    0.202       0.202      0.208
  24 (3 E, 3 O only)                      0.226    0.214       0.214      0.210
  25 (3 E, 4 O only)                      0.234    0.208       0.208      0.195
95% confidence interval coverage, %
  14 (1 E only)                           95       92          92         90
  15 (1 E only)                           96       95          95         95
  16 (3 E only)                           94       92          92         93
  17 (1 E only)                           95       94          94         94
  18 (1 O only)                           95       92          92         90
  19 (1 O only)                           94       94          94         94
  20 (3 O only)                           92       92          92         92
  21 (3 O only)                           94       93          93         94
  22 (4 O only)                           95       93          93         94
  23 (1 E, 1 O only)                      94       90          90         90
  24 (3 E, 3 O only)                      94       92          92         92
  25 (3 E, 4 O only)                      95       90          90         92
Abbreviations: DAG, directed acyclic graph full model; DAG-GS
CE, directed acyclic graph gold-standard change-in-estimate procedure without consideration of precision; DAG-GS P, directed acyclic
graph gold-standard change-in-estimate procedure with consideration of precision; DAG-S CE, directed acyclic graph stepwise
change-in-estimate procedure; E, study exposure; O, study outcome.
ᵃ Standard errors, bias, square root of the mean-squared error, and
the 95% confidence interval coverage for the natural logarithm of
sample odds ratios were obtained by using the 4 model selection
methods shown. The results were from 1,000 case-control samples,
each consisting of 500 cases and 500 controls.
ᵇ Number of included nonconfounders and their relations with study
exposure and outcome. For example, scenario 25 included 7
nonconfounders, 3 of which were associated with exposure only
and 4 with outcome only.

further model simplification by using the change-in-estimate procedure, which takes sampling variation into account, did not on average improve parameter estimation.
The observations that the DAG full model provided minimal
bias and the best 95% confidence interval coverage when
DAGs were misspecified with omission of confounders (scenarios 4–13) simply reflect the fact that the variable selection procedures used in this study will not, in general,
attenuate the bias resulting from inaccurate or incomplete
causal assumptions made at the initial stage. The findings
that the DAG gold-standard change-in-estimate procedures
performed better with respect to bias than did the DAG-stepwise change-in-estimate procedure in scenarios 1–13
highlight a potential deficiency in the latter. If a 0.1 change
in parameter estimate is used as the criterion, with regard to
bias, the DAG gold-standard change-in-estimate and the
DAG-stepwise change-in-estimate procedures estimate
ORs that conform to $(0.9)\,\mathrm{OR}_{\mathrm{DAG}} < \mathrm{OR} < (1.1)\,\mathrm{OR}_{\mathrm{DAG}}$
and $(0.9)^{j}\,\mathrm{OR}_{\mathrm{DAG}} < \mathrm{OR} < (1.1)^{j}\,\mathrm{OR}_{\mathrm{DAG}}$, respectively
(where $\mathrm{OR}_{\mathrm{DAG}}$ and j are the estimated OR and the number
of confounders included in the initial DAG full model, respectively). Thus, the DAG-stepwise change-in-estimate
procedure results in a wider range of estimated ORs than does
the DAG gold-standard change-in-estimate approach and,
on average, is expected to produce more biased estimates
Table 6. Sampling Distribution-based Standard Error and the Mean
of Regression-based Standard Error for the Natural Logarithm-transformed Odds Ratios Obtained by Using the Directed Acyclic
Graph Full Model and the Directed Acyclic Graph Gold-Standard
Change-in-Estimate Procedureᵃ
Scenario | Sampling Distribution-based SE: DAG, DAG-GS CE | Regression-based SE: DAG, DAG-GS CE | % of DAG-GS CE With Smaller Regression-based SE
1
0.173
0.177
0.176
0.173
2
0.164
0.167
0.165
0.162
98.3
3
0.176
0.177
0.183
0.179
4
0.169
0.173
0.170
0.166
99.7
5
0.172
0.172
0.176
0.176
19.1
6
0.171
0.184
0.174
0.171
39.5
7
0.164
0.165
0.169
0.166
99.5
8
0.163
0.164
0.167
0.164
99.4
9
0.174
0.178
0.182
0.178
99.8
10
0.156
0.155
0.160
0.159
99.9
11
0.171
0.177
0.177
0.176
99.8
12
0.169
0.169
0.175
0.175
25.9
13
0.157
0.157
0.167
0.167
24.7
14
0.181
0.184
0.181
0.173
99.9
15
0.180
0.173
0.178
0.171
95.8
16
0.190
0.187
0.190
0.178
100
17
0.181
0.173
0.182
0.166
100
18
0.181
0.183
0.181
0.172
100
19
0.178
0.173
0.178
0.169
20
0.192
0.184
0.192
0.179
100
21
0.197
0.184
0.191
0.170
100
22
0.201
0.186
0.197
0.171
100
23
0.188
0.190
0.186
0.172
100
24
0.201
0.192
0.205
0.180
100
25
0.223
0.204
0.220
0.177
100
99.7
100
93.6
Abbreviations: DAG, directed acyclic graph full model; DAG-GS
CE, directed acyclic graph gold-standard change-in-estimate procedure; SE, standard error.
ᵃ The percentage of models simplified by using the DAG-GS CE
procedure that resulted in a smaller regression-based standard error
than the corresponding directed acyclic graph full models is also
presented. The results were from 1,000 case-control samples, each
consisting of 500 cases and 500 controls.
when a sizable number of covariates are being considered
for elimination and when the bias produced by the DAG full
model cannot be corrected by model reduction, such as
when the DAG is incorrectly specified to omit confounders.
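The widening of the admissible range under the stepwise rule follows from compounding the 10% tolerance over as many as j removals. A small Python sketch makes the arithmetic explicit; the function name `or_bounds` and the example values (a full-model OR of 2.0 and j = 5) are hypothetical.

```python
from typing import Tuple


def or_bounds(
    or_dag: float,
    j: int,
    reference: str = "gold-standard",
    threshold: float = 0.1,
) -> Tuple[float, float]:
    """Range of final ORs permitted by the change-in-estimate criterion.

    Gold-standard: every accepted model stays within one tolerance band of
    the full-model OR. Stepwise: the band can compound at each of up to j
    removals, because each step is judged against the previous step.
    """
    steps = 1 if reference == "gold-standard" else j
    return ((1 - threshold) ** steps * or_dag, (1 + threshold) ** steps * or_dag)


gold_low, gold_high = or_bounds(2.0, j=5, reference="gold-standard")
step_low, step_high = or_bounds(2.0, j=5, reference="stepwise")
```

With these illustrative inputs the gold-standard range is 1.8 to 2.2, whereas the stepwise range is roughly 1.18 to 3.22, since 0.9⁵ ≈ 0.59 and 1.1⁵ ≈ 1.61.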
From a statistical point of view, model reduction should
resolve the tradeoff of a larger bias for a smaller variance
(13, 20–22). We found, however, that model reduction with
logistic regression resulted in a larger bias but not necessarily a smaller standard error in most of the scenarios when the
underlying causal assumptions were correctly specified or
when the DAG was misspecified by the omission of confounders. Our study demonstrated that use of logistic
regression-based standard error in covariate selection is
problematic. Although the DAG gold-standard change-in-estimate procedure always produced an equal or smaller
mean of regression-based standard error than did the corresponding DAG full model, it had a smaller sampling
distribution-based standard error than did the DAG full model
only in 1 of these 13 scenarios. Earlier work by Robinson and
Jewell (21) and Robinson et al. (22) explains the identical
results of the DAG gold-standard change-in-estimate with
versus without precision considerations observed in our
study. The inconsistency between the regression-based and
the sampling distribution-based standard errors is in agreement with findings from a previous study (16) and likely
reflects inflated precision resulting from ignoring covariate
selection-related uncertainty in regression modeling (16, 23).
Use of regression-based standard errors in covariate selection
optimizes the regression-estimated precision but not necessarily the true precision, which is estimated by the underlying
sampling distribution.
When the DAG was misspecified with inclusion of nonconfounders (scenarios 14–25), the DAG gold-standard
change-in-estimate and the DAG-stepwise change-in-estimate procedures produced smaller standard errors than
did the DAG full model in most scenarios regardless of
whether the nonconfounders that were included were associated with only study exposure or only outcome. That the
largest differences in the mean regression-based standard errors between the DAG gold-standard change-in-estimate procedure and the DAG full model were consistently observed in
these scenarios is noteworthy and might serve as an indication
for misspecification of DAGs with inclusion of nonconfounders. However, the reduction in the 95% confidence interval
coverage resulting from the DAG gold-standard change-in-estimate procedures in many of these scenarios signaled the
potential downward bias of the regression-based standard
error. The DAG full model produced 95% confidence interval
coverage that was closer to nominal coverage more consistently than did the other procedures in these scenarios.
The paradox that the DAG full models produced the largest
bias in most of the scenarios when the DAG included nonconfounders that were associated with only the study outcome, but
not when they were associated with only the study exposure, could
be explained by the noncollapsibility of the OR (22, 24–27). After
adjustment for these covariates that were associated with only
study outcome, the DAG full model produced the smallest
bias in 10 of these 12 scenarios (data not shown).
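Noncollapsibility of the OR can be shown with simple arithmetic. The sketch below uses hypothetical risks for a binary covariate Z that affects the outcome but is independent of exposure (prevalence 0.5 in both exposure groups), so Z is not a confounder. The risks are deliberately large to make the effect visible and are not taken from the simulated populations.

```python
def odds(p):
    return p / (1 - p)

def odds_ratio(p_exposed, p_unexposed):
    return odds(p_exposed) / odds(p_unexposed)

# Hypothetical outcome risks within strata of Z.
risk = {
    0: {"exposed": 0.5, "unexposed": 0.2},
    1: {"exposed": 0.8, "unexposed": 0.5},
}
p_z1 = 0.5  # prevalence of Z = 1, identical in exposed and unexposed

# Conditional (stratum-specific) ORs.
stratum_ors = {z: odds_ratio(r["exposed"], r["unexposed"]) for z, r in risk.items()}

# Marginal risks obtained by averaging over Z within each exposure group.
marginal_exposed = (1 - p_z1) * risk[0]["exposed"] + p_z1 * risk[1]["exposed"]
marginal_unexposed = (1 - p_z1) * risk[0]["unexposed"] + p_z1 * risk[1]["unexposed"]
marginal_or = odds_ratio(marginal_exposed, marginal_unexposed)

print(stratum_ors, round(marginal_or, 3))
```

Both stratum-specific ORs equal 4, yet the marginal OR is about 3.45 even though Z is unrelated to exposure. An estimate adjusted for such a covariate therefore targets a different quantity than the unadjusted one, which can appear as bias when the two are compared.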
We caution readers that the scenarios examined in our
study are restricted to only a small region of the parameter
space, and the conclusions drawn from this study are limited to
the population structures generated and the number of covariates investigated. For instance, the underlying structure that
we investigated involved confounders that were each independent of the others, that is, not on the same backdoor paths of
the DAGs. Redundant confounders such as those lying on the
same backdoor paths as other confounders were not investigated. We further acknowledge that there are different covariate selection procedures commonly used in epidemiologic
studies, such as hierarchical backwards elimination (13, 28).
The results from this study may not be generalizable to, or directly comparable with, these procedures. Nevertheless,
although we used case-control sampling, the results from this
study should be applicable to a cohort study that uses risk or
rate ratios as effect measures (29, 30) as we ensured that the
outcomes were rare (i.e., incidence proportion <10%) in most
joint strata of covariates in the source populations. On the
other hand, our study encountered problems specific to using
OR in confounder identification (12, 24–26). Additionally,
although all logistic regression models converged, this study
did not investigate constraints on model convergence. Although we expect that the results would be applicable to other
commonly used models such as Poisson and Cox proportional hazards models, the methods might not be feasible for
others, such as log binomial models, as problems of model
convergence using these models have been previously reported (31–33).
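The rare-outcome condition mentioned above can be checked numerically. In this sketch the risks are hypothetical: one pair lies below the 10% incidence proportion noted in the text, and the other pair represents a common outcome for contrast.

```python
def risk_ratio(p_exposed, p_unexposed):
    return p_exposed / p_unexposed

def odds_ratio(p_exposed, p_unexposed):
    return (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))

# Hypothetical (exposed, unexposed) risks.
rare = (0.08, 0.04)
common = (0.40, 0.20)

rare_rr, rare_or = risk_ratio(*rare), odds_ratio(*rare)
common_rr, common_or = risk_ratio(*common), odds_ratio(*common)

print(round(rare_rr, 3), round(rare_or, 3))      # OR close to the risk ratio
print(round(common_rr, 3), round(common_or, 3))  # OR further from the risk ratio
```

With the rare outcome the OR (about 2.09) stays close to the risk ratio of 2, whereas with the common outcome it rises to about 2.67.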
In conclusion, potential, but not conclusive, benefits of
performing further covariate selection using change-in-estimate procedures were observed only when the DAGs
were misspecified by the inclusion of nonconfounders. We
conclude, therefore, that the primary task for the researcher/
analyst is to ensure that proper causal assumptions are made,
in particular, that no strong confounders are excluded from
data collection or analysis. Given that the investigator is
never certain about the accuracy of prior causal assumptions,
the recommended strategy is to construct a “conservative”
DAG, including all known confounders and potential confounders even at the risk of including nonconfounders
(provided that they are neither colliders nor downstream
effects of the exposure or the outcome), and use this full
model in data analysis. An alternative is to construct a
series of DAGs, each having plausibility based on prior
knowledge, with various degrees of “conservativeness”
regarding potential but not established confounders. The
DAG full model for each can then be reported with complete transparency about the assumed underlying model.
Analysts could then perform covariate selection using
the DAG gold-standard change-in-estimate procedure. A
large reduction in regression-based standard error in the
simplified model might indicate misspecification of the underlying causal assumptions, specifically
the inclusion of nonconfounders. However, even under
this type of misspecification, bias may be increased, and
the 95% confidence interval coverage may deviate from
nominal coverage through covariate selection.
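One way to picture the model reduction step is the sketch below. It is not the SAS macro used in this study. It assumes that the "gold standard" is the exposure OR from the DAG full model, that covariates are deleted one at a time, and that deletion stops once the reduced-model OR differs from the full-model OR by 10% or more. The 10% threshold, the function names, and the lookup table of ORs are all hypothetical.

```python
def relative_change(full_or, reduced_or):
    """Absolute relative change in the OR versus the full-model OR."""
    return abs(reduced_or - full_or) / full_or

def gold_standard_reduction(fit, covariates, threshold=0.10):
    """Backward deletion that always compares with the full-model estimate.

    `fit` is a hypothetical callable that returns the exposure OR for a
    frozenset of adjustment covariates.
    """
    full_or = fit(frozenset(covariates))
    kept = set(covariates)
    while kept:
        # Change versus the full model if each remaining covariate were dropped.
        changes = {
            c: relative_change(full_or, fit(frozenset(kept - {c}))) for c in kept
        }
        candidate = min(changes, key=changes.get)
        if changes[candidate] >= threshold:
            break  # every further deletion moves the estimate too far
        kept.remove(candidate)
    return kept

# Hypothetical exposure ORs for each adjustment set:
# C1 is a strong confounder, N1 is a nonconfounder.
fitted_ors = {
    frozenset({"C1", "N1"}): 2.00,
    frozenset({"C1"}): 2.02,
    frozenset({"N1"}): 2.60,
    frozenset(): 2.65,
}

selected = gold_standard_reduction(fitted_ors.__getitem__, ["C1", "N1"])
print(selected)
```

Here N1 is dropped because its removal changes the OR by 1%, and C1 is retained because its removal would change the OR by more than 30%. A precision-based rule, such as adopting the reduced model only if its standard error is smaller, would require the standard errors as an additional input and is omitted from this sketch.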
A final caveat is that the results related to bias hold
on average over 1,000 iterations; in any
given single study, the actual performance of model reduction is unknown, and model reduction may, in some circumstances, produce
a less biased estimate of effect.
ACKNOWLEDGMENTS
Author affiliations: Department of Pathobiology, College
of Veterinary Medicine, University of Illinois at Urbana-Champaign, Urbana, Illinois (Hsin-Yi Weng); Department of
Biostatistics, School of Public Health and Tropical Medicine,
Tulane University, New Orleans, Louisiana (Ya-Hui Hsueh);
Department of Public Health and Preventive Medicine,
School of Medicine and School of Veterinary Medicine,
St. George’s University, Grenada, West Indies (Locksley L.
McV. Messam); and Department of Public Health Sciences,
School of Medicine, University of California at Davis, Davis,
California (Irva Hertz-Picciotto).
The authors thank Lora D. Delwiche of the Public Health
Sciences, University of California at Davis, for assisting
with modifying the SAS macros for covariate selection
procedures.
Conflict of interest: none declared.
REFERENCES
1. Glymour MM, Weuve J, Berkman LF, et al. When is baseline
adjustment useful in analyses of change? An example with
education and cognitive change. Am J Epidemiol. 2005;162(3):
267–278.
2. Greenland S. Modeling and variable selection in epidemiologic analysis. Am J Public Health. 1989;79(3):
340–349.
3. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10(1):37–48.
4. Hernán MA, Hernández-Díaz S, Werler MM, et al. Causal
knowledge as a prerequisite for confounding evaluation: an
application to birth defects epidemiology. Am J Epidemiol.
2002;155(2):176–184.
5. Maldonado G, Greenland S. Simulation study of confounder-selection strategies. Am J Epidemiol. 1993;138(11):923–936.
6. Pearl J. Causal diagrams for empirical research. Biometrika.
1995;82(4):669–688.
7. Robins JM. Data, design, and background knowledge in etiologic inference. Epidemiology. 2001;12(3):313–320.
8. Rothman KJ, Greenland S. Modern Epidemiology. 2nd ed.
Philadelphia, PA: Lippincott-Raven; 1998.
9. Nelson MC, Gordon-Larsen P, Adair LS. Are adolescents who
were breast-fed less likely to be overweight? Analyses of
sibling pairs to reduce confounding. Epidemiology. 2005;
16(2):247–253.
10. Weng HY, Kass PH, Hart LA, et al. Risk factors for unsuccessful dog ownership: an epidemiologic study in Taiwan.
Prev Vet Med. 2006;77(1-2):82–95.
11. Messam LL, Kass PH, Chomel BB, et al. The human-canine
environment: a risk factor for non-play bites? Vet J. 2008;
177(2):205–215.
12. Miettinen OS, Cook EF. Confounding—essence and detection.
Am J Epidemiol. 1981;114(4):593–603.
13. Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic
Research: Principles and Quantitative Methods. Belmont,
CA: Lifetime Learning Publications; 1982.
14. Pearl J. Causality: Models, Reasoning, and Inference.
Cambridge, United Kingdom: Cambridge University Press;
2000.
15. Mickey RM, Greenland S. The impact of confounder selection
criteria on effect estimation. Am J Epidemiol. 1989;129(1):
125–137.
16. Budtz-Jørgensen E, Keiding N, Grandjean P, et al. Confounder
selection in environmental epidemiology: assessment of health
effects of prenatal mercury exposure. Ann Epidemiol. 2007;
17(1):27–35.
17. Sonis J, Hertz-Picciotto I. Accessing the presence of confounding. Fam Med. 1996;28(7):462–463.
18. SAS Institute, Inc. SAS Software for Windows. Version 9.1.
Cary, NC: SAS Institute, Inc; 1999.
19. Hegewald J, Pfahlberg A, Uter W. A backwards-manual selection macro for binary logistic regression in SAS v. 8.02
PROC LOGISTIC procedure. Presented at the 16th Annual
North East SAS Users Group Conference (NESUG 2003),
Washington, DC, September 2002.
20. Robins JM, Greenland S. The role of model selection in causal
inference from nonexperimental data. Am J Epidemiol. 1986;
123(3):392–402.
21. Robinson LD, Jewell NP. Some surprising results about covariate adjustment in logistic regression models. Int Stat Rev.
1991;59(2):227–240.
22. Robinson LD, Dorroh JR, Lien D, et al. The effects of covariate adjustment in generalized linear models. Commun Stat
Theory Meth. 1998;27(7):1653–1675.
23. Chatfield C. Model uncertainty, data mining and statistical
inference. J R Stat Soc A. 1995;158(part 3):419–466.
24. Greenland S, Robins JM. Identifiability, exchangeability, and
epidemiological confounding. Int J Epidemiol. 1986;15(3):
413–419.
25. Gail MH, Wieand S, Piantadosi S. Biased estimates of treatment effect in randomized experiments with non-linear
regression and omitted covariates. Biometrika. 1984;71:
431–444.
26. Hauck WW, Neuhaus JM, Kalbfleisch JD, et al. A consequence of omitted covariates when estimating odds ratios.
J Clin Epidemiol. 1991;44(1):77–81.
27. Negassa A, Hanley JA. The effect of omitted covariates on
confidence interval and study power in binary outcome analysis: a simulation study. Contemp Clin Trials. 2007;28(3):
242–248.
28. Kleinbaum D, Klein M. Logistic Regression: A Self-Learning
Text. 2nd ed. New York, NY: Springer; 2002.
29. Greenland S, Thomas DC. On the need for the rare disease
assumption in case-control studies. Am J Epidemiol. 1982;
116(3):547–553.
30. Greenland S. Interpretation and choice of effect measures in
epidemiologic analyses. Am J Epidemiol. 1987;125(5):761–768.
31. Wacholder S. Binomial regression in GLIM: estimating risk
ratios and risk differences. Am J Epidemiol. 1986;123(1):
174–184.
32. McNutt LA, Wu C, Xue X, et al. Estimating the relative risk in
cohort studies and clinical trials of common outcomes. Am J
Epidemiol. 2003;157(10):940–943.
33. Zou G. A modified Poisson regression approach to prospective
studies with binary data. Am J Epidemiol. 2004;159(7):
702–706.
Am J Epidemiol 2009;169:1182–1190