Methods for Missing Covariate Data
Analysis of Myeloma Data

Amy H. Herring
Department of Biostatistics
The University of North Carolina at Chapel Hill

Joe Ibrahim, Harvard School of Public Health
Stu Lipsitz, Medical University of South Carolina
Overview

- Missing data: why does it matter?
- Ad hoc approaches: a little knowledge can be a dangerous thing!
- Three major approaches:
  - Maximum Likelihood
  - Weighted Estimating Equations
  - Multiple Imputation
Causes of Missing Data

Missing data may be
- planned
  - Latin square experimental design
  - randomized clinical trial
- or unplanned
  - survey nonresponse
  - illness causes a patient to leave the study

First step in analysis: think about what caused the missing data and the implications for further data analysis.
Terminology

- $Y$: outcome of interest
- $X = (X_{obs}, X_{mis})$: matrix of fixed covariates
  - $X_{obs}$: completely observed covariates
  - $X_{mis}$: covariates subject to missingness
- $R$: matrix of indicators of being observed
  - $r_{ij} = 1$ if covariate $j$ is observed for subject $i$
  - $r_{ij} = 0$ if covariate $j$ is missing for subject $i$
- The missing data mechanism concerns the distribution $[R \mid Y, X_{obs}, X_{mis}; \phi]$. (How does the probability that a covariate is observed depend on $Y$ and $X$?)
probability of observing a
covariate does not depend on its value but may depend on the
outcome and on other covariates. That is,
[R j Y; Xobs ; Xmis ; ] = [R j Y; Xobs ; ].
{ Missing Completely at Random (MCAR): special case of
MAR in which missing data is a random sample of complete
data. Here, [R j Y; Xobs ; Xmis ; ] = [R j ]
(p(observed) = 0:75, say).
Missing at Random (MAR):
Types of Missing Data
Types of Missing Data

- Nonignorable Missing Data: the probability of observing a covariate may depend on the value of that covariate (which may not be observed), as well as on the outcome and observed data.
- Ignorable Missing Data: the missing data are (1) MCAR or MAR, and (2) satisfy the condition that $\phi$ is distinct from the parameters of interest (e.g., $\beta$ in regression).
Ad Hoc Approaches to Handling Missing Data

Complete Case Analysis
- Delete all subjects with any missingness from the dataset and pretend the data were complete all along
- Most common approach (default in most software)
- OK with small amounts of missing data
- Always loses efficiency; can incur bias when data are not MCAR
- Makes strong assumptions about the missing data that are generally not recognized

Omitting Missing Covariates from the Model
- Another quick and simple fix
- May lead to model misspecification
Ad Hoc Approaches to Handling Missing Data

Mean Imputation
- Unconditional mean imputation "fills in" missing values with the mean of the observed values for that covariate
- Conditional mean imputation "fills in" missing values with predicted values from a regression of the missing variable on the observed variables
- After imputation, the data are analyzed as if they were complete all along
- Problem: both methods yield wrong standard errors (imputing values at the mean of the distribution does not accurately reflect variability in the data), regardless of the type of missing data
Summary of Ad Hoc Approaches

- Easy to implement
- Popular in practice
- Can be implemented without thinking about validity
- Not valid in a wide variety of settings ("hidden" assumptions)

The naive analyst can use these methods without thinking about the causes of missing data (and therefore without knowing whether the methods are appropriate), so the results of the analysis can be quite biased without the investigator's knowledge.
Inference with Missing Covariates

Complete data: $Y \sim p(Y \mid X; \beta)$.

When data are missing, we introduce two other distributions that may play a role in the analysis.

The first distribution, which we have already discussed, is $p(R \mid Y, X_{obs}, X_{mis}; \phi)$, which represents the mechanism that generated the missing data. (Note: this distribution may be multivariate.)

The second distribution, the distribution of the covariates themselves, is given by $p(X_{mis} \mid X_{obs}; \alpha)$.

Neither of these distributions is typically specified when data are complete.
Inference with Missing Covariates

The complete data density of $(y_i, x_{mis,i}, r_i \mid x_{obs,i}; \beta, \alpha, \phi)$ is given by

$$p(y_i \mid x_{obs,i}, x_{mis,i}; \beta)\, p(x_{mis,i} \mid x_{obs,i}; \alpha)\, p(r_i \mid y_i, x_{obs,i}, x_{mis,i}; \phi),$$

where our primary interest is in the estimation of $\beta$, with the other parameters viewed as nuisance parameters.
Setup

We will simplify matters by assuming that we have one MAR binary covariate. That is, we assume that

$$p(r \mid y, X_{obs}, x_{mis}; \phi) = p(r \mid y, X_{obs}; \phi).$$

For this missing covariate, $x_{mis}$, we use a logistic regression model for the probability that the covariate is observed. This model is given by

$$\pi_i = \pi_i(\phi) = \frac{\exp(\phi' m_i)}{1 + \exp(\phi' m_i)},$$

where $m_i$ is a function of $(y_i, x_{obs,i})$ and $\phi$ is a vector of unknown parameters.
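As a concrete illustration of this missingness model, here is a minimal Python sketch. The function name and the particular terms entering $m_i$ are illustrative choices, not part of the talk.

```python
import numpy as np

def pi_i(phi, m_i):
    """Observation probability pi_i(phi) = exp(phi'm_i) / (1 + exp(phi'm_i)).

    m_i is a vector built from (y_i, xobs_i), e.g. [1, y_i, xobs_i, ...];
    exactly which terms enter m_i is the analyst's modeling choice.
    """
    return 1.0 / (1.0 + np.exp(-np.dot(phi, m_i)))

# Example: intercept, outcome, and one observed covariate
phi = np.array([1.0, -0.5, 0.2])
print(pi_i(phi, np.array([1.0, 0.0, 2.0])))  # P(covariate observed)
```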
Maximum Likelihood: Basic Idea

- Define a model for the partially observed data
- Base inferences on the likelihood under that model
- Retain all the nice properties of ML estimates
Maximum Likelihood

With no missing data, the maximum likelihood estimating equations for $\gamma = (\beta, \alpha, \phi)$ are given by

$$u(\hat\gamma) = \begin{bmatrix} u_1(\hat\gamma) \\ u_2(\hat\gamma) \\ u_3(\hat\gamma) \end{bmatrix} = \sum_{i=1}^{n} \begin{bmatrix} u_{1i}(\hat\gamma) \\ u_{2i}(\hat\gamma) \\ u_{3i}(\hat\gamma) \end{bmatrix} = 0,$$

where

$$u_1(\beta) = \sum_{i=1}^{n} u_{1i}(\beta; y_i, x_{obs,i}, x_{mis,i}) = \sum_{i=1}^{n} \frac{\partial \log p(y_i \mid x_{obs,i}, x_{mis,i}; \beta)}{\partial \beta},$$

$$u_2(\alpha) = \sum_{i=1}^{n} u_{2i}(\alpha; x_{mis,i}, x_{obs,i}) = \sum_{i=1}^{n} \frac{\partial \log p(x_{mis,i} \mid x_{obs,i}; \alpha)}{\partial \alpha},$$

$$u_3(\phi) = \sum_{i=1}^{n} u_{3i}(\phi; r_i, y_i, x_{obs,i}) = \sum_{i=1}^{n} m_i' [r_i - \pi_i(\phi)].$$
Estimation Via EM

Obtain the MLE of $\gamma = (\beta, \alpha, \phi)$ by setting the conditional expectation of the complete data score, denoted $u^*(\gamma)$, to 0 and solving for $\hat\gamma$. (The expectation is taken with respect to the conditional distribution of the missing data given the observed data.)

If $x_i$ is completely observed,

$$u_i^*(\gamma) = E\left[ \begin{bmatrix} u_{1i}(\beta; y_i, x_i) \\ u_{2i}(\alpha; x_i) \\ m_i'(r_i - \pi_i) \end{bmatrix} \,\middle|\, x_i, y_i, r_i \right] = \begin{bmatrix} u_{1i}(\gamma) \\ u_{2i}(\gamma) \\ m_i'(r_i - \pi_i) \end{bmatrix},$$

which equals the complete data score vector.
When some elements of $x_i$, say $x_{mis,i}$, are subject to missingness, then

$$u_i^*(\gamma) = E\left[ \begin{bmatrix} u_{1i}(\beta; y_i, x_i) \\ u_{2i}(\alpha; x_i) \\ m_i'(r_i - \pi_i) \end{bmatrix} \,\middle|\, x_{obs,i}, y_i, r_i \right] = \begin{bmatrix} E_{x_{mis,i} \mid y_i, x_{obs,i}} [u_{1i}(\beta; y_i, x_i)] \\ E_{x_{mis,i} \mid y_i, x_{obs,i}} [u_{2i}(\alpha; x_i)] \\ m_i'(r_i - \pi_i) \end{bmatrix},$$

where the score for the observed data indicators remains unchanged, as it does not depend on any quantities that are subject to missingness.
ML Score Equations: Binary Missing Covariate

When $x_{mis,i}$ is binary, the two conditional expectations (for $u_1(\beta)$ and $u_2(\alpha)$) have particularly simple forms. Specifically,

$$E_{x_{mis,i} \mid y_i, x_{obs,i}} [u_{1i}(\beta; y_i, x_{obs,i}, x_{mis,i})] = \sum_{j=0}^{1} \Pr(X_{mis,i} = j \mid y_i, x_{obs,i})\, u_{1i}(\beta; y_i, x_{obs,i}, x_{mis,i} = j),$$

and the conditional expectation for $u_2(\alpha)$ takes a similar form.
ML Score Equations: Binary Missing Covariate

The conditional probabilities (or weights) are given by

$$w_{ij} = \Pr(X_{mis,i} = j \mid y_i, x_{obs,i}; \gamma) = \frac{p(y_i \mid x_{obs,i}, x_{mis,i} = j; \beta)\, p(x_{mis,i} = j \mid x_{obs,i}; \alpha)}{\sum_{k=0}^{1} p(y_i \mid x_{obs,i}, x_{mis,i} = k; \beta)\, p(x_{mis,i} = k \mid x_{obs,i}; \alpha)},$$

where $w_{i0} + w_{i1} = 1$.

Reference: Ibrahim (1990) and other papers.
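To make the weight formula concrete, here is a minimal Python sketch under illustrative modeling assumptions: a logistic outcome model $p(y \mid x; \beta)$ (the talk's myeloma example uses a survival model instead) and a logistic covariate model $p(x_{mis} = 1 \mid x_{obs}; \alpha)$. The function name and parameter packing are mine, not from the talk.

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def weights_binary_missing(y_i, xobs_i, beta, alpha):
    """w_ij = Pr(Xmis_i = j | y_i, xobs_i; gamma), j = 0, 1.

    Assumes a binary outcome with a logistic model (last element of beta
    multiplies xmis) and a logistic model for p(xmis = 1 | xobs; alpha).
    """
    numer = np.empty(2)
    p_x1 = expit(alpha[0] + xobs_i @ alpha[1:])      # p(xmis = 1 | xobs; alpha)
    for j in (0, 1):
        p_y1 = expit(beta[0] + xobs_i @ beta[1:-1] + beta[-1] * j)
        p_y = p_y1 if y_i == 1 else 1.0 - p_y1       # p(y_i | xobs, xmis = j; beta)
        numer[j] = p_y * (p_x1 if j == 1 else 1.0 - p_x1)
    return numer / numer.sum()                       # w_i0 + w_i1 = 1
```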
ML Score Equations: Binary Missing Covariate

Rewriting one last time, we have

$$u^*(\gamma) = \sum_{i=1}^{n} \begin{bmatrix} r_i\, u_{1i}(\beta; y_i, x_i) + (1 - r_i) \sum_{j=0}^{1} w_{ij}\, u_{1i}(\beta; y_i, x_{obs,i}, x_{mis,i} = j) \\ r_i\, u_{2i}(\alpha; x_i) + (1 - r_i) \sum_{j=0}^{1} w_{ij}\, u_{2i}(\alpha; x_{obs,i}, x_{mis,i} = j) \\ m_i'(r_i - \pi_i) \end{bmatrix}.$$
EM Algorithm

The EM algorithm in this case is relatively straightforward.

1. Get an initial estimate $\gamma = \gamma^{(1)}$ from the complete cases. At the $t$th EM iteration, we have $\gamma^{(t)}$.
2. Using $\gamma^{(t)}$, calculate the weights $w_{ij}^{(t)}$.
3. Fixing the weights, solve $u^*(\gamma^{(t+1)} \mid \gamma^{(t)}) = 0$ for $\gamma^{(t+1)}$.
4. Iterate until convergence.
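Here is a runnable Python sketch of this EM loop for one binary MAR covariate. As a stand-in for the talk's models, it uses a logistic outcome model, fits the M-step with scikit-learn's weighted logistic regression, and implements the "vertical augmentation" idea of expanding each incomplete case into two weighted pseudo-rows ($x_{mis} = 0$ and $x_{mis} = 1$). All names and modeling choices are illustrative, not the talk's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def em_binary_missing(y, xobs, xmis, r, n_iter=25):
    """EM for one binary MAR covariate; y (n,), xobs (n, p), r (n,) in {0,1},
    xmis (n,) with arbitrary values where r == 0 (those entries are ignored)."""
    cc = r == 1
    # Step 1: initial estimates from the complete cases
    out = LogisticRegression(C=1e6).fit(np.column_stack([xobs[cc], xmis[cc]]), y[cc])
    cov = LogisticRegression(C=1e6).fit(xobs[cc], xmis[cc])
    for _ in range(n_iter):
        X_rows, y_rows, w_rows = [], [], []
        for i in range(len(y)):
            if r[i] == 1:
                X_rows.append(np.r_[xobs[i], xmis[i]]); y_rows.append(y[i]); w_rows.append(1.0)
                continue
            # Step 2 (E-step): weights w_ij as on the previous slides
            p_x1 = cov.predict_proba(xobs[i:i + 1])[0, 1]
            numer = np.empty(2)
            for j in (0, 1):
                p_y1 = out.predict_proba(np.r_[xobs[i], j].reshape(1, -1))[0, 1]
                p_y = p_y1 if y[i] == 1 else 1.0 - p_y1
                numer[j] = p_y * (p_x1 if j == 1 else 1.0 - p_x1)
            w = numer / numer.sum()
            # vertical augmentation: two weighted pseudo-rows per incomplete case
            for j in (0, 1):
                X_rows.append(np.r_[xobs[i], j]); y_rows.append(y[i]); w_rows.append(w[j])
        X, yy, ww = np.asarray(X_rows), np.asarray(y_rows), np.asarray(w_rows)
        # Step 3 (M-step): weighted fits of the outcome and covariate models
        out = LogisticRegression(C=1e6).fit(X, yy, sample_weight=ww)
        cov = LogisticRegression(C=1e6).fit(X[:, :-1], X[:, -1].astype(int), sample_weight=ww)
    return out, cov  # step 4: in practice, stop once estimates change little
```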
Validity of ML Estimates

In order for ML estimates to be valid, we must
- Correctly specify the distribution $[y \mid x; \beta]$.
  - Usual assumption when fitting a model.
- Correctly specify the distribution $[x_{mis} \mid x_{obs}; \alpha]$.
  - Additional assumption needed in the presence of missing data.
- The distribution $[r \mid y, x_{obs}; \phi]$ does not need to be specified correctly.

The expectation of the score for $\beta$ depends on $\alpha$, and the expectation of the score for $\alpha$ depends on $\beta$. Neither expectation depends on the score for $\phi$.
Weighted Estimating Equations: Basic Idea

- Define a model for $\pi_i$, the probability that the covariate for subject $i$ is observed
- Weight observations by the inverse probability of being observed
- Observations more likely to be missing are given greater weight to make up for the larger probability of missingness
Weighted Estimating Equations

Robins et al. (1994) and Zhao et al. (1996) define general forms for weighted estimating equations. Suppose that in the previous ML score equations, we replace $r_i$ with $r_i / \pi_i$, where $1/\pi_i$ is the inverse probability of being observed. Then we obtain the weighted estimating equation $S(\hat\gamma_{WEE}) = 0$, as given in Lipsitz, Ibrahim, and Zhao (JASA, 1999), where

$$u^*(\gamma) = \sum_{i=1}^{n} \begin{bmatrix} \dfrac{r_i}{\pi_i}\, u_{1i}(\beta; y_i, x_i) + \left(1 - \dfrac{r_i}{\pi_i}\right) \sum_{j=0}^{1} w_{ij}\, u_{1i}(\beta; y_i, x_{obs,i}, x_{mis,i} = j) \\ \dfrac{r_i}{\pi_i}\, u_{2i}(\alpha; x_i) + \left(1 - \dfrac{r_i}{\pi_i}\right) \sum_{j=0}^{1} w_{ij}\, u_{2i}(\alpha; x_{obs,i}, x_{mis,i} = j) \\ m_i'(r_i - \pi_i) \end{bmatrix}.$$
Weighted Estimating Equations: Note

The previous estimating equations show a link between WEE and ML. Perhaps more people are familiar with the following form:

$$u^*(\gamma) = \sum_{i=1}^{n} \begin{bmatrix} \dfrac{r_i}{\pi_i}\, s_{1i}(\beta; y_i, x_i) \\ m_i'(r_i - \pi_i) \end{bmatrix},$$

where $\sum_{i=1}^{n} s_{1i}(\beta; y_i, x_i)$ is an estimating equation with expectation zero.
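In this familiar form, WEE amounts to inverse-probability weighting: fit a model for the probability that a case is fully observed, then fit the outcome model on the complete cases with weights $1/\hat\pi_i$. Below is a minimal Python sketch, with a logistic outcome model standing in for whatever outcome model is of interest; the construction of $m_i$ and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def wee_fit(y, xobs, xmis, r):
    """Inverse-probability-weighted fit: complete cases weighted by 1/pi_hat."""
    # model for pi_i: observed-or-not regressed on (y_i, xobs_i), i.e. m_i
    m = np.column_stack([y, xobs])
    pi_hat = LogisticRegression(C=1e6).fit(m, r).predict_proba(m)[:, 1]
    cc = r == 1
    X_cc = np.column_stack([xobs[cc], xmis[cc]])
    return LogisticRegression(C=1e6).fit(X_cc, y[cc], sample_weight=1.0 / pi_hat[cc])
```

Note that this drops the augmentation terms of the previous slide; it leans entirely on $\hat\pi_i$, which is why the next slide stresses correct specification of $[r \mid y, x_{obs}; \phi]$.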
Validity of WEE

In order to obtain unbiased parameter estimates, we need
- correct specification of the distribution $(y \mid x; \beta)$ (to be specific, we just need the score contribution to have mean zero)
  - usual assumption when modeling
- correct specification of the distribution for $\pi_i$, given by $(r \mid y, x_{obs}; \phi)$
  - additional assumption needed when data are missing
  - need a sufficient amount of missing data to estimate $\pi_i$ with some precision
- the distribution $(x_{mis} \mid x_{obs}; \alpha)$ need not be correctly specified
Multiple Imputation: Basic Idea

Rubin (1978), other papers, and book.

- Create multiple "complete" datasets by filling in values for the missing data
- Analyze each filled-in dataset as if it were the complete data
- Combine the separate inferences into one overall result
Multiple Imputation

- Do this $M$ times to construct $M$ "complete" datasets, and then use the $M$ datasets to estimate variability
- Obtain $\hat\beta^{(m)}$ for the $m$th imputed dataset, $m = 1, \ldots, M$
- Parameter estimate: $\hat\beta = \frac{1}{M} \sum_{m=1}^{M} \hat\beta^{(m)}$
- Variance estimate straightforward
Multiple Imputation Variance

$\hat{V}^{(m)}$: variance estimate from the $m$th imputed dataset.

Define $\bar{V}$ to be the average variance estimate. That is,

$$\bar{V} = \frac{1}{M} \sum_{m=1}^{M} \hat{V}^{(m)}.$$

This is the average "within" imputation variance.

Define

$$B = \frac{1}{M - 1} \sum_{m=1}^{M} (\hat\beta^{(m)} - \hat\beta)(\hat\beta^{(m)} - \hat\beta)'.$$

This is the "between" imputation variance.

The variance estimate is given by

$$\hat{V}_{MI} = \bar{V} + \left(1 + \frac{1}{M}\right) B.$$
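These combining rules are mechanical to apply; here is a minimal Python sketch for the scalar / diagonal-variance case, with illustrative names:

```python
import numpy as np

def rubin_combine(beta_hats, var_hats):
    """Pool M imputed-data fits: beta_hats and var_hats are (M, p) arrays of
    estimates and their variances (squared SEs), one row per imputed dataset."""
    beta_hats, var_hats = np.asarray(beta_hats), np.asarray(var_hats)
    M = beta_hats.shape[0]
    beta_bar = beta_hats.mean(axis=0)            # pooled estimate
    v_within = var_hats.mean(axis=0)             # average "within" variance
    b_between = beta_hats.var(axis=0, ddof=1)    # "between" variance
    return beta_bar, v_within + (1.0 + 1.0 / M) * b_between

# Example with M = 3 imputed datasets and one coefficient
est, var = rubin_combine([[0.27], [0.30], [0.26]], [[0.04], [0.05], [0.04]])
```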
Imputing Data

We will impute data from the distribution $p(x_{mis} \mid y, x_{obs}; \gamma)$:

$$p(x_{mis} \mid y, x_{obs}; \gamma) = \frac{p(y \mid x_{obs}, x_{mis}; \beta)\, p(x_{mis} \mid x_{obs}; \alpha)}{\sum_{x_{mis} = 0}^{1} p(y \mid x_{obs}, x_{mis}; \beta)\, p(x_{mis} \mid x_{obs}; \alpha)}.$$

Problem: we don't know this density exactly.
Validity of Multiple Imputation

- Depends on correct specification of $p(x_{mis} \mid y, x_{obs}; \gamma)$
- Can improve the specification through an iterative procedure (can become very sophisticated and similar to EM)
- Allows a better estimate of variability than simple imputation, with slightly more effort
Computational Details

Missing covariate(s): one discrete

ML:
- program in SAS(IML)/S-plus
- estimate likelihood iteratively (EM)
- runs quickly (vertical augmentation)

WEE:
- program in SAS(IML)/S-plus
- fit one logistic regression for $\pi_i$ and one weighted model for the outcome
- runs quickly

MI:
- program in SAS(IML)/S-plus
- runs quickly (horizontal augmentation)
Computational Details

Missing covariate(s): several discrete

ML:
- program in SAS(IML)/S-plus
- estimate likelihood iteratively (EM)
- runs quickly (vertical augmentation)

WEE:
- program in SAS(IML)/S-plus
- more difficult to model $\pi_i$; fit one weighted model for the outcome
- runs quickly

MI:
- program in SAS(IML)/S-plus
- runs quickly (horizontal augmentation)
Computational Details

Missing covariate(s): several discrete and continuous

ML:
- program in SAS(IML)/S-plus more difficult due to Gibbs sampling (I use C)
- estimate likelihood iteratively (EM)
- runs more slowly (Gibbs sampling)

WEE:
- program in SAS(IML)/S-plus
- again, may be difficult to model $\pi_i$; fit one weighted model for the outcome
- runs quickly

MI:
- program in SAS(IML)/S-plus with effort (impute via Gibbs or importance sampling); probably easier in C
- runs less quickly (sampling)
Sensitivity Analysis

With ML, WEE, or MI, it is important to conduct a sensitivity analysis. This involves checking your distributional assumptions by fitting a variety of models for the distribution of the missing covariates and the missing data mechanism. You may wish to include models that assume MCAR, MAR, and nonignorable missingness for purposes of comparison.

You hope that your final results are not affected too much by these models. (If they are, then you must be guided by your knowledge of the subject matter at hand.)
Myeloma Data

- Multiple myeloma: cancer of plasma cells
- E2479 (Kalish et al., 1992)
- New therapy (VBMCP: Vincristine, BCNU, Cyclophosphamide, Melphalan, and Prednisone) vs. standard therapy (MP: Melphalan and Prednisone alone)
- Outcome: survival time (time from study entry until death)
Myeloma Data

Baseline characteristics of interest:
- frac: indicator of prior bone fractures
- logbun: log of blood urea nitrogen
- hgb: hemoglobin
- platelet: platelet count
- logwbc: log of white blood cell count
- logpbm: log of percentage of plasma cells in bone marrow
- scalc: serum calcium
Myeloma Data

- 423 patients from a subset of the original study
- 20% (85 patients) missing frac

Model for survival time: exponential,

$$p(y_i, \delta_i \mid x_{obs,i}, x_{mis,i}; \beta) \propto \exp(-\lambda_i y_i)\, \lambda_i^{\delta_i},$$

where $\delta_i$ is the event indicator (1 if the death time is observed, 0 if censored),

$$\lambda_i = \exp(\beta_0 + \beta_1' x_{obs,i} + \beta_2\, \mathrm{frac}_i),$$

and $x_{obs,i} = (\mathrm{trt}_i, \mathrm{logbun}_i, \mathrm{hgb}_i, \mathrm{platelet}_i, \mathrm{logwbc}_i, \mathrm{logpbm}_i, \mathrm{scalc}_i)$.
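The per-subject log-likelihood for this model is simple to code; a minimal sketch (the parameter packing and names are mine, not from the talk):

```python
import numpy as np

def exp_loglik_i(y_i, delta_i, xobs_i, frac_i, beta):
    """Log-likelihood contribution log(lambda_i^delta_i * exp(-lambda_i * y_i))
    for the exponential survival model; beta packs (beta0, beta1..., beta2)."""
    lam = np.exp(beta[0] + xobs_i @ beta[1:-1] + beta[-1] * frac_i)
    return delta_i * np.log(lam) - lam * y_i
```

In the EM weights for this analysis, this survival density would play the role of $p(y \mid x; \beta)$ in the earlier logistic-outcome sketches.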
Myeloma Data Model

Because frac is missing, we must specify its distribution:

$$p(\mathrm{frac}_i = 1 \mid x_{obs,i}; \alpha) = \frac{\exp\{\alpha_0 + g(x_{obs,i})\}}{1 + \exp\{\alpha_0 + g(x_{obs,i})\}},$$

where

$$g(x_{obs,i}) = \alpha_{1,1} \mathrm{logbun}_i + \alpha_{1,2} \mathrm{hgb}_i + \alpha_{1,3} \mathrm{platelet}_i + \alpha_{1,4} \mathrm{logwbc}_i + \alpha_{1,5} \mathrm{logpbm}_i + \alpha_{1,6} \mathrm{scalc}_i + \alpha_{1,7} \mathrm{trt}_i + \alpha_{1,8} \mathrm{logbun}_i \times \mathrm{trt}_i + \alpha_{1,9} \mathrm{logbun}_i \times \mathrm{logpbm}_i + \alpha_{1,10} \mathrm{scalc}_i \times \mathrm{trt}_i.$$

To obtain this model, we fixed the model for $\lambda$ and used a step-up approach from the main effects model, retaining all effects significant at the 0.20 level.
Myeloma Data Model

We also need to specify a model for the missing data mechanism. This model is easier to specify, since its estimation can be carried out separately from the estimation of $\beta$. (For example, one could use stepwise selection in SAS.) The chosen model is

$$\mathrm{logit}(\pi_i) = \phi_0 + \phi_1 \log(\mathrm{time}_i) + \phi_2 \mathrm{logbun}_i + \phi_3 \mathrm{hgb}_i + \phi_4 \mathrm{platelet}_i + \phi_5 \mathrm{logwbc}_i + \phi_6 \mathrm{logpbm}_i + \phi_7 \mathrm{scalc}_i + \phi_8 \mathrm{trt}_i + \phi_9 \log(\mathrm{time}_i) \times \mathrm{scalc}_i + \phi_{10} \mathrm{logbun}_i \times \mathrm{hgb}_i + \phi_{11} \mathrm{hgb}_i \times \mathrm{scalc}_i.$$
Results

Effect      Method   Estimate    SE    Z-statistic   p-value
INTERCEPT   CC        -5.95     0.95     -6.28        0.00
            ML        -6.10     0.84     -7.28        0.00
            WEE       -6.10     0.86     -7.10        0.00
            MI        -6.09     0.84     -7.24        0.00
FRAC        CC        -0.02     0.12     -0.21        0.83
            ML        -0.03     0.11     -0.24        0.81
            WEE       -0.04     0.11     -0.33        0.74
            MI         0.03     0.16      0.19        0.85

Note: Estimates are presented so that negative estimates mean protective effects.
Results (continued)

Effect      Method   Estimate    SE    Z-statistic   p-value
LOGBUN      CC         0.27     0.23      1.18        0.24
            ML         0.35     0.20      1.71        0.09
            WEE        0.35     0.21      1.62        0.11
            MI         0.28     0.21      1.36        0.17
HGB         CC        -0.04     0.03     -1.40        0.16
            ML        -0.04     0.02     -1.68        0.09
            WEE       -0.04     0.02     -1.98        0.05
            MI        -0.03     0.02     -1.33        0.18
Results (continued)

Effect      Method   Estimate    SE    Z-statistic   p-value
PLATELET    CC        -1.40     0.60     -2.33        0.02
            ML        -0.15     0.14     -1.09        0.27
            WEE       -0.15     0.20     -0.78        0.44
            MI        -1.59     0.58     -2.73        0.00
LOGWBC      CC         0.21     0.16      1.33        0.18
            ML         0.08     0.14      0.56        0.58
            WEE        0.08     0.17      0.45        0.66
            MI         0.20     0.14      1.38        0.17
Results (continued)

Effect      Method   Estimate    SE    Z-statistic   p-value
LOGPBM      CC         0.28     0.09      3.30        0.00
            ML         0.27     0.08      3.60        0.00
            WEE        0.27     0.07      3.70        0.00
            MI         0.27     0.08      3.48        0.00
SCALC       CC         0.05     0.04      1.16        0.24
            ML         0.08     0.04      2.06        0.04
            WEE        0.08     0.03      2.33        0.02
            MI         0.07     0.04      1.85        0.06
Results (continued)

Effect      Method   Estimate    SE    Z-statistic   p-value
TREATMENT   CC        -0.05     0.11     -0.41        0.69
            ML        -0.06     0.10     -0.59        0.55
            WEE       -0.06     0.10     -0.61        0.54
            MI        -0.08     0.10     -0.83        0.41
Conservative Practical Guidelines

- Not much missing data (5-10%): complete case analysis and ad hoc methods are probably benign (and better methods are probably not worth the trouble; WEE may be problematic without enough missing data to estimate p(observed) with decent precision)
- MCAR missing data: complete case analysis is OK, but there is an efficiency gain with ML, WEE, or MI
- MAR or nonignorable missing data: complete case and ad hoc methods are risky, especially with larger (> 15%) proportions of missing data; conduct a sensitivity analysis along with ML, WEE, or MI (or some combination of the three if industrious)
Selected References

- Missing data: Little & Rubin (1987); Little (JASA, 1992)
- ML: Ibrahim (JASA, 1990); Ibrahim, Chen, & Lipsitz (Biometrics, 1999); Lipsitz & Ibrahim (Biometrika, 1996)
- WEE: Robins, Rotnitzky, & Zhao (JASA, 1994); Zhao, Lipsitz, & Lew (Biometrics, 1996); Lipsitz, Ibrahim, & Zhao (JASA, 1999)
- MI: Rubin (1987); Rubin & Schenker (Stat. in Med., 1991)