Analysis of Longitudinal Data with Irregular, Informative Followup
Haiqun Lin†
Yale University, New Haven, U.S.A.
Daniel O. Scharfstein
Johns Hopkins Bloomberg School of Public Health, Baltimore, U.S.A.
Robert A. Rosenheck
VA Northeast Program Evaluation Center & Yale University, West Haven, U.S.A.
Summary. A frequent problem in longitudinal studies is that subjects may miss some scheduled visits or be assessed at self-selected points in time. As a result, observed outcome data
may be highly unbalanced. Also, the availability of the data may be directly related to the
outcome measure or some auxiliary factors that are related to the outcomes. This situation
can be viewed as one of informative follow-up or, equivalently, of informative intermittent
missing data. Analyses that do not account for informative follow-up will produce biased estimates.
Building on the work of Robins, Rotnitzky and Zhao (1995), we propose a class of inverse
intensity of visit process weighted estimators in marginal regression models for longitudinal
responses that may be observed in a continuous-time fashion. This allows us to handle
arbitrary patterns of missing data as embedded in a subject’s visit process. We derive the
large sample distribution for our inverse visit intensity weighted estimators and investigate
the finite sample behavior of our estimators with simulations. Our approach is also illustrated
with a data set from a health services research study in which homeless people with mental
illness were randomized to three different treatments and measures of homelessness and
other auxiliary factors were recorded at the follow-up times that are not fixed by design.
Keywords: Informative follow-up; Intermittent missingness; Dropout; Longitudinal data;
Weighted generalized estimating equations; Visit process.
†Address for correspondence: Haiqun Lin, Division of Biostatistics, Department of Epidemiology
& Public Health, Yale University, 60 College Street, Room 208 LEPH, New Haven, CT 06520,
U.S.A.
E-mail: [email protected]
1. Introduction
In many longitudinal studies, subjects are followed over a period of time and are scheduled
to be assessed at a common set of pre-specified visit times after enrollment. However,
subjects often selectively miss their visits or return at non-scheduled points in time. As
a result, the measurement times are irregular yielding a highly imbalanced data structure.
In addition, the frequency and timing of the visits may be informative with respect to the
longitudinal outcomes. This situation can be viewed as one of informative follow-up or,
equivalently, of informative intermittent missing data. The goal of this paper is to develop a general
methodology for analyzing studies with these features.
Our interest is motivated by a randomized trial comparing three housing interventions
for homeless people with mental illness. The trial was conducted at four of 35 sites that
implemented a joint program linking services of the US Department of Housing and Urban Development (HUD) and the US Department of Veterans Affairs (VA) − the HUD-VA
Supported Housing (HUD-VASH) program. In this program, 460 veterans were randomly
assigned to receive: (i) the full HUD-VASH intervention (housing voucher plus case management); (ii) case management but no vouchers; or (iii) standard VA care (Rosenheck et al.,
2002).
Although efforts were made to conduct follow-up interviews every three months, subjects
often missed assessments or showed up between scheduled interviews. The frequency and
timing of the observations in the data are thus quite different across subjects. Over a 48
month period after randomization, the mean numbers of visits (standard errors) of group
(i) through (iii) are 8.5 (3.2), 7.1 (3.6) and 6.0 (3.5). At each visit, the primary measure of
interest is the percentage of days homeless during the past three months. Auxiliary measures
were also collected including income, quality of life, addiction severity, and whether any
social or VA benefit was received. During the trial, veterans were allowed to continue their
participation regardless of how many previous assessments they had missed or whether they
reappeared at the time of a scheduled interview. For example, out of the total of 2,837
visits made by the 460 participants over four years, 686 had a gap of more than six
months from the previous visit and 126 had a gap of more than one year. In analyzing this
dataset, we are concerned about the informative nature of the visiting process. Specifically,
researchers believe that subjects with greater levels of homelessness and more severe social,
economic, and mental health problems will be likely to visit more frequently. More detailed
information on the three service groups and the trial is deferred until Section 5.
Most of the previous statistical literature on incomplete longitudinal data has focused on
monotone missing data in which a subject’s data is observed only through a certain time
and is missing for all subsequent observations (Wu and Carroll, 1988; Diggle and Kenward,
1994; Little, 1995; Follmann and Wu, 1995; Hogan and Laird, 1997; Scharfstein et al., 1999;
Fitzmaurice et al., 2001, etc.). There are a few studies that deal with intermittently missed
visits in which subjects may miss some visits among a common set of pre-defined visit
times (Troxel et al., 1998; Deltour et al., 1999; Preisser et al., 2000; Albert et al., 2002,
etc.); however, these approaches require specifying the time points at which data can be
missing and do not readily handle irregularly spaced visit times. Since the frequency and
timing of subjects’ visits may be informative with respect to longitudinal outcomes, they
deserve careful consideration. Recently, Lipsitz et al. (2002) developed a likelihood-based
procedure for estimating the regression parameters in models for continuous longitudinal
data. Specifically, they assumed that the repeated measures followed a Gaussian process.
Under the assumption that the intensity of visiting at any time t is conditionally independent
of current and future outcomes given the observed past outcomes, the visit process becomes
ignorable and inference can proceed by maximizing the likelihood of the observed outcome
data. In addition to their assumption on the visit process, they rely on the correlation
structure of the longitudinal response to handle the informativeness of subjects’ visit process
and therefore the validity of their approach hinges on correct specification of the Gaussian
process, especially the correlation structure.
In this article, we are interested in a regression methodology which (a) accommodates
varieties of outcome measures such as binary, continuous, or percentage (e.g., the homelessness outcome in our motivating example), (b) affords comparison of the population-average
evolution of a longitudinal outcome measure, (c) makes minimal distributional assumptions
about the outcome and auxiliary processes, and (d) accommodates visits that occur in a
continuous-time fashion and may depend on both outcomes and auxiliary time-dependent
prognostic factors that are correlated with outcomes and visiting. Liang and Zeger (1986)’s
marginal modeling approach for longitudinal data can accommodate issues (a)-(c). Unfortunately, it is well known that the standard inferential approach using generalized estimating
equations (GEE) of Liang and Zeger (1986) will yield biased inferences when the visit process
is correlated with the outcomes under investigation. Robins et al. (1995) have developed a
method to correct for this bias when visits occur only at pre-defined occasions. Specifically,
they proposed a weighted GEE technique that yields consistent and asymptotically normal
estimators when the dependence of missingness at a pre-defined visit time t on the observed
past (including possibly time-dependent auxiliary factors, outcomes, and visit history) is
correctly specified. In their approach, a subject’s contribution to the estimating equation at
a pre-defined time point t is weighted by the inverse of the conditional probability of being
observed at that time (marginalized over visit history). Thus, the conditional probability of
being observed at the jth occasion is a product of j conditional probabilities (of visiting or not).
It may be argued that for visits occurring in continuous time, one can group the time scale
into discrete intervals and apply the Robins et al. (1995) method. However, the determination
of time intervals may be arbitrary and it may be difficult to specify intervals so that a subject
makes at most one visit in any given interval. Our method extends the Robins et al. (1995)
approach to handle visits occurring in continuous time and therefore avoids grouping visit
times and also the calculation of the product of the conditional visiting probabilities. With
our extension, we are able to fully accommodate issues (a)-(d) above.
Our approach views missingness as embedded in a subject’s visit process occurring in
continuous time so as to accommodate unbalanced longitudinal data. We assume the intensity of visiting at time t is conditionally independent of the outcome at that time given the
past observed data that include visit history, outcome and possibly other prognostic factors.
This assumption is equivalent to sequential randomization of Robins (1998) and is weaker
than the one posited by Lipsitz et al. (2002). We specify a marginal regression model for the
longitudinal outcome at time t, which we allow to depend on baseline and external covariates
(Kalbfleisch and Prentice, 2002) and a deterministic function of time. We also specify an
intensity model for the counting process of visits at time t via past observed outcomes and
counting process of visits as well as possibly time-dependent auxiliary prognostic factors. To
estimate the parameters of the marginal regression model, we propose a class of stabilized,
inverse intensity of visit process-weighted estimators. Under correct specification of the visit
intensity model, our estimators will be consistent and asymptotically normal. Specification
of the joint distribution of the outcomes and the auxiliary processes is not required for our
methodology. Our estimators can be computed relatively easily using standard statistical
software (e.g., SAS, Splus, or R) that can fit Cox models with time-dependent covariates
and generalized linear models with weights. Our methodology can be easily extended to
simultaneously handle dropout that is explainable by past observed factors (Robins and
Rotnitzky, 1992; Robins et al., 1995).
The remainder of this article is organized as follows. In Section 2, we define our data
structure, introduce the marginal outcome and visit process models and show how the model
assumptions are integrated through the inverse visit intensity weighted method to provide
identification of the regression parameters of primary interest. In Section 3, we provide the
estimation procedure and describe how estimation can take place with standard statistical
software. In Section 4, we derive the large sample properties of our class of inverse visit
intensity weighted estimators, and show how to compute the asymptotic standard errors.
In Section 5 we illustrate our approach with the analysis of the data from the HUD-VASH
clinical trial. In Section 6, we present findings from the simulation studies. We conclude with
a discussion in Section 7, where we describe some possible extensions of our methodology.
2. The Models
2.1. Data Structure
Let τ be a positive fixed value of time, measured from the study entry, when the data will be
analyzed. Let C denote a subject’s follow-up time, also measured from study entry. Let P (t)
denote the primary outcome, potentially available at time t. In our motivating example,
P (t) is recorded as the percentage of days homeless in the past three months. We are mainly
interested in how the evolution of the mean of P (t) is related to a vector of fixed covariates
Z (such as treatment group assignment). Let X(t) denote a vector of auxiliary variables,
potentially available at time t. Let N (t) be a counting process recording the number of
observed visits recorded by time t. It is assumed that N (0) = 1. Let Y (t) = I(C ≥ t)
denote a left-continuous at-risk process, indicating whether a subject is still under follow-
up. We view the full complete data for a subject as
F = (X̄(τ), P̄(τ), Z),

where V̄(t) = {V(s) : 0 ≤ s ≤ t} denotes the history of the variable V(s) through time t.
We assume that C is independent of F . The observed data for an individual through time
t is
F^obs(t) = (N̄(t), Ȳ(t), X̄^obs(t), P̄^obs(t), Z),
where V^obs(t) = V(max{s : 0 ≤ s ≤ t, dN(s) = 1}) is the most recent observed value of
the variable V(t) at or prior to time t, and V̄^obs(t) = {V^obs(s) : 0 ≤ s ≤ t} = {V(s) : 0 ≤ s ≤
t, dN(s) = 1} denotes the observed history of the variable V(s) through time t. We assume
there are n subjects in the study and that we observe n i.i.d. copies of O = F^obs(τ):

O = {Oᵢ = Fᵢ^obs(τ) = (N̄ᵢ(τ), Ȳᵢ(τ), X̄ᵢ^obs(τ), P̄ᵢ^obs(τ), Zᵢ) : i = 1, . . . , n}.
2.2. Marginal Regression Model
We assume that the conditional mean of the longitudinal outcome at time t, P (t), follows
the marginal regression model
E[P(t) | Z] = µ(t, Z; β0) for all 0 ≤ t ≤ τ,   (1)
where β 0 is a p × 1 vector of unknown regression parameters and the mean µ(t, Z; β0 ) is
a specified function of t, Z, and β0 . For example, in the analysis of the homelessness data
presented in Section 5, we define Z = (Z(i) , Z(ii) , Z(iii) )T , where Z(κ) is the indicator that a
subject is assigned to intervention (κ), and we let µ(t, Z; β0) = logit⁻¹{Z(i) β0,(i)ᵀ Bd(t) +
Z(ii) β0,(ii)ᵀ Bd(t) + Z(iii) β0,(iii)ᵀ Bd(t)}, where logit⁻¹ is the inverse of the logit link function, Bd(t) is a
B-spline basis with d degrees of freedom, β0,(κ) is a d × 1 vector of unknown regression
parameters for intervention (κ), β0 = (β0,(i)ᵀ, β0,(ii)ᵀ, β0,(iii)ᵀ)ᵀ, and p = 3d.
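To make the mean specification concrete, the following minimal Python sketch (our own illustration; the cubic degree, 48-month window, and interior knots at 12, 24 and 36 months are hypothetical choices, not taken from the paper) evaluates a clamped B-spline basis by the Cox-de Boor recursion and the resulting inverse-logit mean.

```python
import math

def bspline_basis(t, knots, degree):
    """All B-spline basis values at t via the Cox-de Boor recursion.
    `knots` is a non-decreasing clamped knot vector; valid for
    knots[degree] <= t < knots[-degree-1]."""
    # degree-0 (piecewise-constant) basis
    b = [1.0 if knots[j] <= t < knots[j + 1] else 0.0
         for j in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        nb = []
        for j in range(len(knots) - d - 1):
            left = right = 0.0
            if knots[j + d] > knots[j]:
                left = (t - knots[j]) / (knots[j + d] - knots[j]) * b[j]
            if knots[j + d + 1] > knots[j + 1]:
                right = (knots[j + d + 1] - t) / (knots[j + d + 1] - knots[j + 1]) * b[j + 1]
            nb.append(left + right)
        b = nb
    return b                      # length = len(knots) - degree - 1 = d basis functions

def mu(t, group, beta, knots, degree=3):
    """Marginal mean model (1): inverse logit of a group-specific B-spline in t.
    `beta[group]` holds the d spline coefficients for that intervention group."""
    eta = sum(c * b for c, b in zip(beta[group], bspline_basis(t, knots, degree)))
    return 1.0 / (1.0 + math.exp(-eta))  # mean constrained to (0, 1)
```

With these (assumed) cubic splines over 48 months, each group has d = 7 coefficients, so p = 3d = 21, matching the p = 3d count in the text.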
Our goal is to use O to obtain consistent estimators of β 0 in (1) above. It is well known
that when outcomes are observed at irregular points in time and the timing and frequency
of the observations may be informative about the longitudinal outcome of interest, the
estimates of β0 in (1) obtained via GEE will not be consistent. Our objective is to estimate
β 0 when subjects’ visit process depends on outcome and possibly on auxiliary variables that
are related to the outcome.
2.3. Visit Process Assumptions
In order to obtain consistent estimates of β0 , one must make assumptions about the visit
process. As will be shown in Section 3.2, the GEE estimator with a working independence
covariance structure of Liang and Zeger (1986) is valid if the visiting at time t is independent
of P (t) given at-risk status at t− and the fixed covariate Z. That is, for 0 < t ≤ τ ,
P[dN(t) = 1 | Y(t−), Z, P(t)] = P[dN(t) = 1 | Y(t−), Z] = Y(t)λ(t, Z) dt,   (2)
where λ(t, Z) is the intensity of visiting at time t that may be dependent on the fixed Z.
The validity of general GEE estimators requires the even stronger assumption that, for all
t, the observed visit process N̄(τ) is independent of P(t) given Z.
In many longitudinal studies, like the one in our motivating example, the timing and
frequency of the observations may be informative about the outcome of interest, and thus
the above assumption is unlikely to hold. As a result, naı̈ve application of GEE may yield
biased estimates. In order to obtain consistent estimates for β0 , the informativeness of
a subject’s visit process has to be taken into account. We relax the above assumption by
positing that visiting at time t is independent of P (t) given at-risk status at t−, the covariate
Z, all past recorded outcomes, counting process of visits and auxiliary factors. Specifically,
for 0 < t ≤ τ , we assume
P[dN(t) = 1 | F^obs(t−), P(t)] = P[dN(t) = 1 | F^obs(t−)] = Y(t)λ(t, F^obs(t−)) dt,   (3)
where λ(t, F obs (t−)) is the intensity of visiting at time t that may depend on the fixed
covariates and the recorded history of the outcome and some auxiliary factors. We refer
to (3) as the sequential randomization assumption (Robins, 1998) as it indicates that the
decision to visit at time t does not depend on current outcome given the past history.
2.4. Identification of β0
It is important to note that the regression parameters β0 are identified under (1) and (3)
because the visit intensity in (3) is identified from the distribution of the observed data
and it can be shown that

E[ ∫₀^τ {P(t) − µ(t, Z; β0)} {c(t, Z; β0)/λ(t, F^obs(t−))} dN(t) ] = 0,   (4)
where c(t, Z; β0) is a specified p × 1 function of t, Z, and β0; the choice of
c(t, Z; β0) will be discussed in Section 3.2. This follows by an application of the iterated
expectations formula:
E[ ∫₀^τ {P(t) − µ(t, Z; β0)} {c(t, Z; β0)/λ(t, F^obs(t−))} dN(t) ]
= E[ ∫₀^τ E[ {P(t) − µ(t, Z; β0)} {c(t, Z; β0)/λ(t, F^obs(t−))} dN(t) | F^obs(t−), P(t) ] ]
= E[ ∫₀^τ {P(t) − µ(t, Z; β0)} {c(t, Z; β0)/λ(t, F^obs(t−))} E[ dN(t) | F^obs(t−), P(t) ] ]
= E[ ∫₀^τ {P(t) − µ(t, Z; β0)} c(t, Z; β0) Y(t) dt ] = 0,

where the third equality uses (3) and the final equality holds because C is independent of F
and E[P(t) − µ(t, Z; β0) | Z] = 0 under (1).
It can be seen that the inverse of the visit process intensity, 1/λ(t, F^obs(t−)), serves as a
weight function in (4). By weighting, we create a "pseudo" population in which the visit
process is no longer associated with the outcome P(t), so that valid marginal inference about the
longitudinal outcome can be made as if all observed values of the outcome were a random
sample from the underlying distribution.
Consistent estimates of β0 can be obtained as the solution to the empirical version of
(4) with the expectation replaced by a sample mean and λ(t, F obs (t−)) substituted by some
suitable estimates. The steps for consistently estimating λ(t, F obs (t−)) and β 0 are described
in Section 3.
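To see how the weighting in (4) operates, consider the following discrete-time toy analogue (our own construction, not the authors' data or continuous-time estimator): each subject's outcome is a constant level U ~ Uniform(0, 1), the per-period visit probability depends on the last observed outcome, and the true visit model is plugged in as the weight. The unweighted mean over visits is biased toward frequent (high-outcome) visitors, while the inverse-intensity-weighted mean recovers the true mean E[P(t)] = 0.5.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
n_subj, n_times = 4000, 20        # subjects, follow-up periods per subject

naive_num = naive_den = 0.0       # unweighted sums over observed visits
w_num = w_den = 0.0               # inverse-intensity-weighted sums
for _ in range(n_subj):
    u = random.random()           # subject's outcome level; P(t) = u at every t
    last_obs = u                  # baseline visit: everyone is observed at t = 0
    for _ in range(1, n_times + 1):
        # per-period visit probability depends on the last observed outcome,
        # so high-outcome subjects visit more often (informative follow-up)
        lam = expit(-1.0 + 2.0 * last_obs)
        if random.random() < lam:         # a visit occurs; the outcome is recorded
            naive_num += u; naive_den += 1.0
            w_num += u / lam; w_den += 1.0 / lam
            last_obs = u                  # outcome is constant, so this stays u

naive = naive_num / naive_den     # biased upward: oversamples high-P subjects
weighted = w_num / w_den          # weighted by 1/lam, approximately unbiased
```

Solving the one-parameter version of (4) for a constant mean gives exactly this weighted-average form, which is why `weighted` lands near 0.5 while `naive` does not.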
2.5. Visit Process Models
Unfortunately, non-parametric estimation of the visit intensity function λ(t, F^obs(t−)) is
infeasible, even in moderate-sized datasets, because of the curse of dimensionality (Huber, 1985;
Robins and Ritov, 1997). Thus, we will assume that λ(t, F^obs(t−)) specified in (3) follows a
lower-dimensional model of the form:
λ(t, F^obs(t−)) = λ0(t) exp{γ0ᵀ H(t, F^obs(t−))},   (5)
where the baseline visit intensity λ0 (t) is an unknown, non-negative function of t,
H(t, F obs (t−)) is a specified function of t and F obs (t−) chosen by the investigator, and γ 0
is an unknown vector of regression parameters of the same dimension as H(t, F^obs(t−)). The
function H(t, F^obs(t−)) may include various functional forms of N(t−), P̄^obs(t−), X̄^obs(t−),
Z, and their interactions. In its full form, we refer to this model as the "predictor-adjusted"
intensity model.
We refer to the visit intensity model in which H(t, F^obs(t−)) depends only on Z as the "null"
intensity model. Specifically, the null intensity model assumes that

λ(t, F^obs(t−)) = λ^null(t, F^obs(t−)) = λ0^null(t) exp{γ0^null,T H^null(t, Z)},   (6)

where λ0^null(t) is an unknown, non-negative function of t called the baseline null intensity,
H^null(t, Z) is a specified function of t and Z, and γ0^null is a vector of unknown regression
parameters of the same dimension as H^null(t, Z). Note that if (6) holds then (2) holds.
3. Estimation
We will consider the estimation under models (1) and (5). In order to estimate the parameter
of primary interest β0 in (1), we first need to estimate γ 0 and λ0 (t) from (5).
3.1. Estimation of Visit Process Model Parameters
For the intensity model (5), the semi-parametric or non-parametric maximum likelihood
(NPML) estimator of γ0 is found by maximizing the partial likelihood for the visit intensity
model (Andersen et al., 1993). Specifically, γ0 is estimated by γ̂, the solution to the
partial likelihood score equation S*(O; γ) = 0, where

S*(O; γ) = Σᵢ₌₁ⁿ ∫₀^τ [ H(t, Fᵢ^obs(t−)) − ( Σⱼ₌₁ⁿ Yⱼ(t) H(t, Fⱼ^obs(t−)) exp{γᵀH(t, Fⱼ^obs(t−))} ) / ( Σⱼ₌₁ⁿ Yⱼ(t) exp{γᵀH(t, Fⱼ^obs(t−))} ) ] dNᵢ(t).
The NPML estimator of the cumulative baseline intensity Λ0(t) = ∫₀^t λ0(s) ds in (5) is given
by Breslow's estimator,

Λ̂0(t) = ∫₀^t ( Σᵢ₌₁ⁿ dNᵢ(s) ) / ( Σᵢ₌₁ⁿ Yᵢ(s) exp{γ̂ᵀH(s, Fᵢ^obs(s−))} ).
A kernel-smoothed estimator of λ0(t) can then be obtained as

λ̃0(t) = (1/bₙ) ∫₀^τ K( (t − s)/bₙ ) dΛ̂0(s),   (7)
where bₙ is a sample-size-dependent bandwidth and K(·) is a kernel of choice. By controlling
the bandwidth bₙ so that bₙ → 0, nbₙ → ∞ and lim sup n^{1/5} bₙ < ∞ as n → ∞,
λ̃0 is consistent and converges to λ0 in probability at a rate faster than n^{1/4} (Andersen et al.,
1993). The γ0^null and λ0^null(t) specified in (6) can be estimated by γ̂^null and λ̃0^null in a similar
fashion.
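The two estimators above can be sketched in a few lines of Python (a simplification of ours: covariates are held time-constant per subject, and the fitted linear predictor γ̂ᵀHᵢ is passed in precomputed rather than evaluated from a time-dependent history):

```python
import math

def breslow_increments(visits, censor, linpred):
    """NPML (Breslow) jumps dΛ̂0(s) of the cumulative baseline intensity.
    visits: dict subject -> list of visit times (recurrent events);
    censor: dict subject -> end of follow-up C_i;
    linpred: dict subject -> fitted linear predictor (time-constant here)."""
    times = sorted({t for ts in visits.values() for t in ts})
    jumps = []
    for t in times:
        d = sum(ts.count(t) for ts in visits.values())        # visits at time t
        risk = sum(math.exp(linpred[i]) for i in visits if censor[i] >= t)
        jumps.append((t, d / risk))
    return jumps

def smooth_lambda0(t, jumps, bw):
    """Kernel-smoothed baseline intensity, as in (7), with an
    Epanechnikov kernel K(u) = 0.75(1 - u^2) on |u| <= 1."""
    def K(u):
        return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0
    return sum(K((t - s) / bw) * dL for s, dL in jumps) / bw
```

Each Breslow jump is the number of visits at a distinct visit time divided by the risk-set sum of exp(γ̂ᵀHᵢ); the smoother then averages those jumps with kernel weights and bandwidth bw.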
3.2. Estimation of Marginal Regression Parameters
We now estimate β0 as the solution, β̂(c), to the empirical version of (4) with γ0 and λ0(t)
replaced by their estimators γ̂ and λ̃0. That is, we solve

Σᵢ₌₁ⁿ U(Oᵢ; β; γ̂, λ̃0; c) = 0,

where

U(O; β0, γ0, λ0; c) = ∫₀^τ {P(t) − µ(t, Z; β0)} c(t, Z; β0) / [ λ0(t) exp{γ0ᵀH(t, F^obs(t−))} ] dN(t).   (8)
Note that the integrand above needs to be evaluated only at times with dN(t) = 1, i.e., when a
subject's data are observed. Although the choice of c(t, Z; β0) can be completely arbitrary, we choose

c(t, Z; β0) = [ ∂µ(t, Z; β)/∂β |_{β=β0} ] Σ(t, µ(t, Z; β0))⁻¹ λ0^null(t) exp{γ0^null,T H^null(t, Z)},

where Σ(t, µ(t, Z; β0)) is a conditional variance function of P(t) given Z with an overdispersion
parameter φ0. This variance function is defined as a function of t and the mean function
µ(t, Z; β0) and is chosen by the investigator so that the conditional variance of P(t) is specified
as Σ(t, µ(t, Z; β0))/φ0. We choose c(0, Z; β0) so that the ratio in the integral above at
t = 0 is equal to [ ∂µ(0, Z; β)/∂β |_{β=β0} ] {Σ(0, µ(0, Z; β0))}⁻¹, with λ0(0) = λ0^null(0) = 1.
We force this condition because all subjects are assumed to have a visit at baseline.
Estimating equation (8) can also be written as a weighted estimating equation similar in form
to that of Liang and Zeger (1986):

Σᵢ₌₁ⁿ D(Zᵢ; β)ᵀ V(Zᵢ; β)⁻¹ W(Zᵢ; β; γ̂, λ̃0) {Pᵢ − µ(Zᵢ; β)} = 0,   (9)

where Pᵢ = (Pᵢ₁, . . . , Pᵢₙᵢ)ᵀ is the nᵢ-dimensional vector of observed outcomes, with Pᵢⱼ the
observed outcome for subject i at the jth visit time tᵢⱼ; µ(Zᵢ; β) =
(µ(tᵢ₁, Zᵢ; β), . . . , µ(tᵢₙᵢ, Zᵢ; β))ᵀ; D(Zᵢ; β) = ∂µ(Zᵢ; β)/∂βᵀ is an nᵢ × p matrix; V(Zᵢ; β)
is an nᵢ × nᵢ diagonal matrix of variance functions with jth diagonal element
Σ(tᵢⱼ, µ(tᵢⱼ, Zᵢ; β)); and W(Zᵢ; β; γ̂, λ̃0) is an nᵢ × nᵢ diagonal weight matrix with jth diagonal
element

[ λ0^null(tᵢⱼ) exp{γ0^null,T H^null(tᵢⱼ, Zᵢ)} ] / [ λ̃0(tᵢⱼ) exp{γ̂ᵀH(tᵢⱼ, Fᵢ^obs(tᵢⱼ−))} ].

We solve (9) for β with W(Zᵢ; β; γ̂, λ̃0) replaced by Ŵ(Zᵢ; β; γ̂, λ̃0), in which the unknown
λ0^null(t) and γ0^null in W(Zᵢ; β; γ̂, λ̃0) are substituted with their consistent estimates λ̃0^null(t)
and γ̂^null. Thus, we are effectively solving (8) with c replaced by a consistent estimator ĉ.
The solution, β̂(ĉ), to the weighted GEE (9) is found using the Fisher scoring
method (McCullagh and Nelder, 1989). Note that the choice of c(t, Z; β) does
not involve the overdispersion parameter φ0, so the value of φ0 does not affect
the estimation of β0. One nice feature of using the above generalized estimating equation is
that β0 will be consistently estimated regardless of the particular choice of c(t, Z; β) and of
whether the variance-covariance function V(Z; β) is correctly specified.
The reason that we use λ^null(t, F^obs(t−)) in c(t, Z; β) is two-fold. First, it serves to
numerically stabilize the influence of small values in the denominator of the integrand in
(8). Second, when F^obs(t−) has no influence on the intensity of visiting at time t (i.e., the null
model is correctly specified) and H(t, Fᵢ^obs(t−)) and H^null(t, Zᵢ) are chosen so that they are
equal under the null, there is cancellation between the numerator and denominator of the
ratio λ^null(t, F^obs(t−))/λ(t, F^obs(t−)) in the integrand. Equivalently, W(Zᵢ; β; γ̂, λ̃0) in (9)
becomes the identity matrix and the resulting estimating equation is exactly the same as
the "working independence" GEE.
Because λ0 (t) and γ 0 are estimated from the data in the first stage, the variability in their
estimates must be taken into account when estimating the variances of the estimates of β0
in the second stage. We provide the estimates of the asymptotic variance of the estimates
of β 0 in Section 4.
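The Fisher scoring solve of (9) under working independence can be sketched as follows (a schematic of ours, for a logit mean with variance function Σ = µ(1 − µ); the per-visit rows and their stabilized inverse-intensity weights are assumed to have been assembled already from the fitted intensity models):

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def solve_weighted_gee(rows, p_dim, n_iter=25):
    """Fisher scoring for a weighted GEE with a logit mean.
    rows: list of (x, p, w) with x a covariate list, p the outcome in [0, 1],
    and w the stabilized inverse-intensity weight for that visit.
    For the canonical logit link, dmu/deta = mu(1 - mu) cancels V^{-1},
    so each visit contributes w * x * (p - mu) to the score."""
    beta = [0.0] * p_dim
    for _ in range(n_iter):
        score = [0.0] * p_dim
        info = [[0.0] * p_dim for _ in range(p_dim)]
        for x, p, w in rows:
            mu = expit(sum(bk * xk for bk, xk in zip(beta, x)))
            for a in range(p_dim):
                score[a] += w * x[a] * (p - mu)
                for b in range(p_dim):
                    info[a][b] += w * x[a] * x[b] * mu * (1.0 - mu)
        beta = [bk + dk for bk, dk in zip(beta, gauss_solve(info, score))]
    return beta
```

With an intercept-only mean, the fitted value reduces to the weight-normalized average of the outcomes, which is the same pseudo-population average discussed under (4).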
3.3. Implementation of the Estimation Procedure in Standard Statistical Software
The above estimation procedure can be implemented in standard statistical software with
relative ease. The predictor-adjusted and null intensity models (5) and (6) can be fitted
either using SAS proc phreg or the S-plus/R function coxph with counting-process-style data. The
resulting intensities can be smoothed in SAS with proc kde or in S-plus/R with the function
ksmooth. Estimates of β0 can then be obtained in SAS using proc genmod and in
S-plus/R using glm. The standard errors of β̂ can be obtained via the bootstrap. For large
sample sizes, we provide the formula for the calculation of the asymptotic standard errors of
β̂ in Section 4, which can also be programmed with additional effort.
4. Large Sample Theory
The consistency of β̂(c) follows, under mild regularity conditions, from the fact that
E[U(O; β0, γ0, λ0; c)] = 0 under (1), (3) and (5), as shown in Section 2.4, and from the
consistency of our estimators of γ0 and λ0. Consistency of β̂(ĉ) also follows, since the consistency
of β̂(c) does not rely on the choice of c(t, Z; β0), although we choose to use the consistent
estimators λ̃0^null and γ̂^null in β̂(ĉ). Using arguments similar to those in Robins, Mark and
Newey (1992), the large sample distribution of β̂(ĉ) is the same as that of β̂(c) because the
covariates Z, used in estimating λ^null(t, F^obs(t−)) from (6), are also included in the marginal
model (1) for estimating β0.
We now consider the large sample distribution of √n{β̂(c) − β0}. Suppose for the moment
that λ0 were finite-dimensional and that γ0 and λ0 were estimated by parametric maximum
likelihood. Then, under regularity conditions,

√n{β̂(c) − β0} = (1/√n) Σᵢ₌₁ⁿ IF(Oᵢ; β0, γ0, λ0; c) + o_P(1),
where IF(O; β0, γ0, λ0; c) is the influence function of β̂(c) as the solution to the estimating
equation (9). It can be shown, using a mean-value-expansion argument or the
geometry of Euclidean space as in Robins et al. (1995), that the influence function is

IF(O; β0, γ0, λ0; c) = −{ ∂E[U(O; β, γ0, λ0; c)]/∂β |_{β=β0} }⁻¹
× ( U(O; β0, γ0, λ0; c) − Π[U(O; β0, γ0, λ0; c) | T] ),   (10)

where T is the linear subspace spanned by the scores for γ0 and λ0 and Π[· | T] is the
projection of · onto T. The second term on the right-hand side of (10) is the residual of
U(O; β0, γ0, λ0; c) after projecting onto T. Technically, this result only requires that the
estimators of γ0 and λ0 converge at rates faster than n^{1/4+δ}, for some δ > 0 (Newey, 1990).
When λ0(t) is infinite-dimensional, Newey (1990), Bickel et al. (1993), van der Vaart
(1998), and van der Laan and Robins (2003) show, under mild regularity conditions, that
result (10) still applies, where T is now the tangent set defined as the mean-square closure of
the linear subspace spanned by the score for γ0 and the scores for λ0(·) from all parametric
submodels. Since our kernel estimator λ̃0(·) is a function of the NPML Breslow estimator, by
controlling its bandwidth bₙ so that bₙ → 0, nbₙ → ∞ and lim sup n^{1/5} bₙ < ∞ as n → ∞,
we achieve a convergence rate of n^{3/10} for λ̃0, which is faster than n^{1/4} (Andersen
et al., 1993). Since β̂(ĉ) and β̂(c) share the same influence function, we have

√n{β̂(ĉ) − β0} →_D Normal( 0, E[ IF(O; β0, γ0, λ0; c) IF(O; β0, γ0, λ0; c)ᵀ ] ),

with the influence function IF(O; β0, γ0, λ0; c) also of the form (10).
In the Appendix, we derive the tangent set T and show that the projection is

Π[U(O; β0, γ0, λ0; c) | T]
= E[ U(O; β0, γ0, λ0; c) S(O; γ0, λ0)ᵀ ] E[ S(O; γ0, λ0) S(O; γ0, λ0)ᵀ ]⁻¹ S(O; γ0, λ0)
+ ∫₀^τ ( E[ U(O; β0, γ0, λ0; c) dM(O, t; γ0, λ0) ] / [ λ0(t) E[ Y(t) exp{γ0ᵀH(t, F^obs(t−))} ] ] ) dM(O, t; γ0, λ0),   (11)

where

S(O; γ0, λ0) = ∫₀^τ ( H(t, F^obs(t−)) − E[ Y(t) H(t, F^obs(t−)) exp{γ0ᵀH(t, F^obs(t−))} ] / E[ Y(t) exp{γ0ᵀH(t, F^obs(t−))} ] ) dM(O, t; γ0, λ0),   (12)

and M(O, t; γ0, λ0) = N(t) − ∫₀^t Y(s) exp{γ0ᵀH(s, F^obs(s−))} dΛ0(s) is the counting process
martingale.
The first term on the right-hand side of (11) is the projection of
U(O; β0, γ0, λ0; c) onto the linear subspace spanned by the score S(O; γ0, λ0) for γ0, and the
second term is the projection onto the space spanned by the scores for λ0(·) from all parametric
submodels.
We estimate the influence function IF(O; β0, γ0, λ0; c) given in (10) by

ÎF(O; β̂, γ̂, λ̃0; ĉ) = −{ (∂/∂β) Σⱼ₌₁ⁿ U(Oⱼ; β, γ̂, λ̃0; ĉ) |_{β=β̂} }⁻¹
× ( U(O; β̂, γ̂, λ̃0; ĉ) − Π̂[U(O; β̂, γ̂, λ̃0; ĉ) | T] ),

where

Π̂[U(O; β̂, γ̂, λ̃0; ĉ) | T]
= { Σⱼ₌₁ⁿ U(Oⱼ; β̂, γ̂, λ̃0; ĉ) Ŝ(Oⱼ; γ̂, λ̃0)ᵀ } { Σⱼ₌₁ⁿ Ŝ(Oⱼ; γ̂, λ̃0) Ŝ(Oⱼ; γ̂, λ̃0)ᵀ }⁻¹ Ŝ(O; γ̂, λ̃0)
+ ∫₀^τ ( Σⱼ₌₁ⁿ U(Oⱼ; β̂, γ̂, λ̃0; ĉ) dM(Oⱼ, t; γ̂, λ̃0) / [ λ̃0(t) Σⱼ₌₁ⁿ Yⱼ(t) exp{γ̂ᵀH(t, Fⱼ^obs(t−))} ] ) dM(O, t; γ̂, λ̃0)

and

Ŝ(O; γ̂, λ̃0) = ∫₀^τ ( H(t, F^obs(t−)) − Σⱼ₌₁ⁿ Yⱼ(t) H(t, Fⱼ^obs(t−)) exp{γ̂ᵀH(t, Fⱼ^obs(t−))} / Σⱼ₌₁ⁿ Yⱼ(t) exp{γ̂ᵀH(t, Fⱼ^obs(t−))} ) dM(O, t; γ̂, λ̃0).

That is, in (11) and (12) above, the expectations are replaced by the corresponding sample averages and the unknown quantities (β0, γ0, λ0(·), c) are replaced by their consistent
estimators.
5. Analysis of HUD-VASH Data
5.1. Data description
Delivery of effective services to homeless people with serious psychiatric and/or addictive
disorders has been a major challenge, in large part, because of the need for assistance from
multiple agencies in multiple service domains including housing, psychiatric and substance
abuse treatment, income support, and social and vocational rehabilitation. Decisions in
medicine, public health, and social services rely critically on appropriate evaluation of competing treatments and services.
As described in the Introduction, a clinical trial of three housing services was conducted
with the HUD-VASH program. In the service group (i) of the full HUD-VASH intervention,
subjects were offered both housing vouchers for rent subsidies and intensive case management
in an integrated program. These vouchers allow payment of a standardized local fair market
rent less 30% of the individual beneficiary’s income. The case management intervention
promoted active liaisons between clients and their local Public Housing Authority and also
eased the transition to independent living by helping clients locate an apartment, negotiate
the lease, or furnish the apartment. Subjects randomized to group (ii) only received case
management, without a voucher. The group (iii) of standard care consisted of short-term
broker case management. The randomization among the three services was weighted so that
half as many veterans were assigned to case management alone as to the other two groups,
to assure that the vouchers would be used in a timely fashion. Recruitment took place from
1992 to 1995. All subjects were followed for at least four years.
The time scale for analysis was months since enrollment. The primary time-dependent outcome measure, P(t), was the percentage of days homeless in the past three months. The main goal of the analysis is to compare the intervention-specific mean outcome trajectories over four years. With this in mind, we take Z, the covariates in the marginal outcome regression model, to be the vector of indicator variables (Z(i), Z(ii), Z(iii))^T, where Z(κ) is the indicator of assignment to intervention group (κ). Auxiliary time-dependent factors collected during the study included income (in thousands of dollars) in the past three months, income(t); the Lehman measure of quality of living situation, qliv(t); and whether social security or VA benefits were received during the past three months, benefit(t).
In the top panel of Figure 1, we present the intervention-specific average of the primary homelessness outcome for subjects reporting at baseline and within 6-month intervals thereafter. In the bottom panel of Figure 1, we present the intervention-specific average cumulative number of visits as a function of time since enrollment. A crude view of the data suggests that the full HUD-VASH program has the lowest level of homelessness and the highest level of visiting. The other two groups appear comparable in terms of homelessness, with the standard care group reporting the lowest level of visiting.
5.2. Results from HUD-VASH Homeless Study
We first fit the intensity models for the visit process. For the predictor-adjusted intensity model (5), the two indicators of intervention groups (i) and (ii), Z(i) and Z(ii), are included as fixed covariates; additional time-dependent covariates, with possible interactions with Z(i), Z(ii) and Z(iii), that are significant at the 0.1 level are also included. The predictors for (5) that we ended up with are

H(t, F^obs(t−)) = (Z(i), Z(ii), income(0), benefit(0), P^obs(t−),
                   {Z(i) + Z(ii)} × qliv^obs(t−), Z(iii) × qliv^obs(t−),
                   Z(i) × N(t−), Z(ii) × N(t−), Z(iii) × N(t−))^T.
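To make the weighting concrete: at each observed visit time, the stabilized weight is the ratio of the fitted null intensity (6) to the fitted predictor-adjusted intensity (5). The sketch below is our own illustration, not the study code; the coefficients come from Table 1, but the baseline intensity values, the covariate values in H_t, and the truncation to five covariates are hypothetical, for a subject in group (i).

```python
import numpy as np

def visit_weight(lam0_null_t, gamma_null, H_null_t, lam0_t, gamma, H_t):
    """Stabilized inverse-intensity-of-visit weight at one visit time t:
    [fitted null intensity, model (6)] / [fitted predictor-adjusted intensity, model (5)]."""
    num = lam0_null_t * np.exp(np.dot(gamma_null, H_null_t))   # model (6)
    den = lam0_t * np.exp(np.dot(gamma, H_t))                  # model (5)
    return num / den

# Coefficients from Table 1; everything else here is a hypothetical illustration.
gamma_null = np.array([0.409, 0.207])                   # null model: Z_(i), Z_(ii)
H_null_t = np.array([1.0, 0.0])                         # subject in group (i)
gamma = np.array([0.612, 0.409, -0.221, 0.113, 0.141])  # first five rows of Table 1 only
H_t = np.array([1.0, 0.0, 1.2, 1.0, 0.30])              # Z_(i), Z_(ii), income(0), benefit(0), P_obs(t-)
w = visit_weight(0.08, gamma_null, H_null_t, 0.07, gamma, H_t)  # smoothed baselines assumed
print(w > 0)  # prints True; the weights are always positive
```

A weight above 1 up-weights a visit that the predictor-adjusted model deems less likely than the null model does, and vice versa.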
The top panel of Table 1 presents the estimates of γ0 and their associated standard errors. We see
that higher baseline income is associated with decreased intensity of visiting, while receiving
any social security or VA benefits significantly increases the intensity of visiting. The most
recently reported outcome, percentage of days homeless in past three months, is positively
associated with the intensity of visiting. Members of groups (i) and (ii) have a higher intensity of visiting than those of group (iii). A higher cumulative number of visits (N(t−)) and higher
quality of living are both associated with increased intensity of visiting and their effects are
service specific.
For the null intensity model (6), we only included Z(i) and Z(ii); that is, we took H^null(t, Z) = (Z(i), Z(ii))^T. The bottom panel of Table 1 presents the estimates of γ0^null and their associated standard errors.
Under the null model, we see that members of group (i) have the highest visit intensity
followed by group (ii) and then group (iii).
The Epanechnikov kernel was used to obtain smoothed baseline visit intensities λ̃0(t) and λ̃0^null(t) from the Breslow estimates of the cumulative baseline intensity, as described in (7). Tail modifications were employed, as suggested by Gasser and Müller (1979). The bandwidth was fixed at 6.93 months to attain a relatively small mean integrated squared error (MISE) for λ̃0(t) and λ̃0^null(t), while at the same time assuring that the baseline intensities look smooth under visual inspection.
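The smoothing step can be sketched as applying the Epanechnikov kernel to the increments of a Breslow-type cumulative baseline intensity. The sketch below is our own illustration, not the authors' code: the jump times and sizes are made up, and the boundary modifications of Gasser and Müller (1979) are omitted.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on |u| <= 1, and 0 elsewhere."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def smooth_baseline_intensity(t, jump_times, jump_sizes, bandwidth):
    """Kernel-smooth the increments dLambda0 of a Breslow estimate:
    lambda_tilde0(t) = (1/b) * sum_j K((t - t_j)/b) * dLambda0(t_j)."""
    u = (t - np.asarray(jump_times)) / bandwidth
    return np.sum(epanechnikov(u) * np.asarray(jump_sizes)) / bandwidth

# Made-up Breslow increments at observed visit times (in months):
jump_times = np.array([1.0, 2.5, 4.0, 7.0, 9.5, 12.0])
jump_sizes = np.array([0.02, 0.015, 0.03, 0.025, 0.02, 0.01])
lam = smooth_baseline_intensity(6.0, jump_times, jump_sizes, bandwidth=6.93)
print(lam > 0)  # prints True: every jump lies within one bandwidth of t = 6
```

In practice the bandwidth would be tuned, as in the study, to balance MISE against visual smoothness.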
For the marginal outcome regression model, we set the mean of P(t) as µ(t, Z; β0) = logit^{-1}{Z(i) β0,(i)^T Bd(t) + Z(ii) β0,(ii)^T Bd(t) + Z(iii) β0,(iii)^T Bd(t)}, where logit^{-1} is the inverse of the logit link function, Bd(t) is a B-spline basis with d degrees of freedom, β0,(κ) is a d × 1 vector of unknown regression parameters for intervention group (κ), β0 = (β0,(i)^T, β0,(ii)^T, β0,(iii)^T)^T, and we take d = 4 in our particular example. We take the unscaled variance function of P(t) to be Σ(t, µ(t, Z; β0), α) = µ(t, Z; β0){1 − µ(t, Z; β0)}. The rationale for this choice of variance function is as follows: the number of days homeless in the past 90 days, which we define here as A(t), can be regarded as Binomial(90, µ(t, Z; β0)) with Var{A(t)|Z} = 90 µ(t, Z; β0){1 − µ(t, Z; β0)}, so the variance of P(t) is Var{P(t)|Z} = Var{(1/90)A(t)|Z} = (1/90) µ(t, Z; β0){1 − µ(t, Z; β0)}. The estimated curves of P(t) for each of the three service groups using the usual GEE analysis and using our inverse intensity of visiting weighted estimation approach are presented in the top and bottom panels of Figure 2, respectively.
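For a single group, the mean and variance specification above can be evaluated directly. The sketch below uses scipy's B-spline machinery with d = 4 degrees of freedom on a [0, 48]-month window; the knot placement and the coefficient vector beta are arbitrary placeholders of ours, not the fitted values from the study.

```python
import numpy as np
from scipy.interpolate import BSpline

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

# A B-spline basis with d = 4 degrees of freedom on [0, 48] months: a cubic
# with no interior knots, so 4 coefficients and 8 boundary knots.
degree = 3
knots = np.concatenate([[0.0] * (degree + 1), [48.0] * (degree + 1)])
beta = np.array([-0.5, -1.0, -1.5, -2.0])     # placeholder for beta_{0,(kappa)}
spline = BSpline(knots, beta, degree)         # eta(t) = beta' B_4(t)

t = np.linspace(0.0, 48.0, 5)
mu = inv_logit(spline(t))                     # mu(t, Z; beta) for one group
var_P = mu * (1.0 - mu) / 90.0                # Var{P(t)|Z} from Binomial(90, mu)/90
print(np.all((mu > 0) & (mu < 1)))            # prints True: mu is a valid proportion
```

With decreasing coefficients, the placeholder mean curve declines over follow-up, mimicking the shape of the fitted homelessness profiles.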
With the usual GEE analysis, it can be seen that the service group (i) “case manager +
voucher” has a much lower percentage of homeless days than the other two groups. The
profiles for service groups (ii) and (iii) tend to be alike until the later years of the trial where
service group (ii) has a higher percentage of homeless days than group (iii). This suggests
that the addition of a case manager may provide little help in reducing homelessness. The
results under the usual GEE analysis closely match the observed homeless profile in the upper
panel of Figure 1. When the visit process is accounted for using our proposed method, the
estimated percentage of homeless days for group (ii) (case manager only) decreases steadily
with time (bottom panel in Figure 2) and is much lower than that for group (iii) after about
one year. Service groups (i) and (ii) both appear to have a lower estimated percentage
of homeless days with our weighted analysis than with the usual GEE analysis, while the
profile of homelessness for group (iii) does not seem to be different. From the intensity
model for visiting (5), we learned that subjects who were worse off (more homelessness,
lower incomes, more benefits, poorer quality of living situation) tended to have an increased
intensity of visiting. This suggests that the level of homelessness in the observed data tends to be biased upward. Thus, the decrease in the mean homelessness profiles from the GEE analysis to the weighted analysis makes intuitive sense.
For each service group, we use the area under the estimated outcome curve to quantify the overall level of homelessness over the four-year follow-up. The area under the curve (AUC) is interpretable as the expected number of homeless months over the 48-month period. The AUCs were estimated using β̂ with numerical integration. The group-specific estimates
of AUC and associated 95% confidence intervals (C.I.) are presented in the top panel of
Table 2. As expected, we see that the GEE estimates are inflated relative to the weighted estimates. The differences between the GEE and weighted estimates range from a low of about one week for group (iii) to a high of 2.6 months for group (ii). We calculated confidence intervals
for the AUC’s using the Delta method via the calculation of the asymptotic variance of β̂
presented in Section 4, and also via the bootstrap (200 re-samples). For our data, the confidence intervals for the AUC based on the asymptotic variance calculation are much wider than those from the bootstrap. Based on the simulation studies in Section 6, which investigate the sample size required to obtain reliable estimates of the asymptotic standard errors in finite samples and evaluate the finite-sample coverage rates of the asymptotic and bootstrap confidence intervals, we view the bootstrap confidence intervals as a more reliable representation of variability than those based on large sample theory for datasets of this sample size.
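The AUC computation itself reduces to numerically integrating the fitted mean curve over the 48-month window. The sketch below uses a stand-in mean function of ours, purely to illustrate the quadrature step, not the fitted model from the study.

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def auc(mu, lo=0.0, hi=48.0, n_grid=4801):
    """Trapezoidal-rule AUC of a fitted mean curve mu(t) over [lo, hi]; when
    mu(t) is the proportion of days homeless, the AUC is in homeless months."""
    t = np.linspace(lo, hi, n_grid)
    y = mu(t)
    dt = t[1] - t[0]
    return dt * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

# Stand-in mean curve: a smoothly declining proportion homeless,
# mu(t) = logit^{-1}(-1.1 - 0.02 t).
mu_hat = lambda t: inv_logit(-1.1 - 0.02 * t)
a = auc(mu_hat)
print(0.0 < a < 48.0)  # prints True: the AUC must lie between 0 and 48 homeless months
```

For this particular stand-in curve the integral also has a closed form, which makes the quadrature easy to check.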
We parsimoniously estimate intervention effects by taking the difference in AUC between the group-specific homelessness curves. This measure represents the difference in the expected number of months homeless over the four-year period. We denote the intervention effect of group (κ) versus (κ*) by ∆(κ),(κ*) and estimate it by ∆̂(κ),(κ*) using numerical integration. For our study, a beneficial intervention effect of treatment (κ) versus (κ*) will be signaled by a negative ∆̂(κ),(κ*). In the bottom panel of Table 2, we present the pairwise
treatment effects along with 95% confidence intervals. As above, we present large sample
Delta method-based and bootstrap confidence intervals. We rely on the bootstrap intervals
for our sample size. The treatment effect estimates differ between the GEE and our weighted
approach. In comparing groups (i) vs. (ii) and (i) vs. (iii), the estimates are in the same
direction, favoring the full HUD-VASH program. In comparing groups (ii) and (iii), the estimate for GEE favors standard care, while the estimate for the weighted approach favors the
case-manager alone intervention (these also agree with Figure 2). The results of hypothesis
testing are the same between the GEE and weighted approaches. The full HUD-VASH program is superior to the standard care and case-manager alone groups. The null hypothesis
of no treatment effect cannot be rejected when comparing groups (ii) and (iii).
Our results from data analysis show that taking the visit process into account markedly
changed the marginal profile of homelessness. In Section 6, we conduct simulation studies illustrating that the bias of the usual GEE analysis, when present, can be corrected with our inverse intensity of visiting method.
6. Simulation Studies
We conducted two sets of simulation studies based on a single treatment group. The focus of
the first set of simulations was to evaluate the bias, variance, and mean-squared error (MSE)
of the mean outcome profile curves and AUC using the usual GEE and our visit intensity
weighted estimators. In the second set of simulations, we vary the sample size to evaluate
the performance of our large sample-based estimator of the standard error of our AUC
estimator and the coverage rates of large-sample based confidence intervals. We also look
at the coverage rate of a bootstrap-based confidence interval for sample sizes comparable to
those in the HUD-VASH evaluation. In all simulations, 200 datasets were generated.
6.1. Simulation Procedure
For each subject, we simulate data for 1,531 days (= 4 years and 3 months), with the first 90 days regarded as "pre-baseline" and the 91st day regarded as the 1st day of the study. Letting t = −90, −89, . . . , 0, 1, . . . , 1431 index the days, the procedure for generating data for one subject is as follows:
(a) A time-dependent auxiliary factor X(t) was simulated from a Normal distribution with mean −1.5 + 0.066t and variance 0.25.
(b) A 1531-dimensional multivariate binary response, P*(t), was simulated according to the procedure described by Lunn and Davies (1998), using a compound-symmetry correlation coefficient of 0.6 for the logistic models:

logit E[P*(t) | X(t)] = −0.5 − 0.075t + 0.001t² − 0.25 X(t),   (13a)
logit E[P*(t)] = −0.5 − 0.075t + 0.001t².   (13b)

Equations (13a) and (13b) were used for the first and second sets of simulation studies, respectively. Running averages of the binary response P*(t) over the past 90 days were calculated for each day at and after the 1st day of the study, i.e., P(t−) = Σ_{s=t−90}^{t−1} P*(s)/90 for 1 ≤ t ≤ 1431. The resulting percentages are regarded as the "true" longitudinal outcome for the subject.
(c) A subject's visit times were simulated via the method for point processes described by Daley, Vere-Jones and Smirnov (2002), using one of the following visit intensity functions:

λ(t, F^obs(t−)) = 0.005 exp{−0.001t},   (14a)
λ(t, F^obs(t−)) = 0.005 exp{−0.001t + 1.0 P^obs(t−)},   (14b)
λ(t, F^obs(t−)) = 0.005 exp{−0.0003t + 1.0 P^obs(t−) − 0.75 X^obs(t−)}   (14c)

for t = 1, . . . , 1431 days. Equations (14a)-(14c) were used for the first set of simulations, while only equation (14b) was used for the second set of simulations.
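One standard way to simulate visit times from a bounded intensity of this kind, consistent with the point-process methods cited above, is thinning: propose candidate times from a homogeneous Poisson process whose rate dominates the intensity, then accept each candidate with probability equal to the intensity ratio. The sketch below is our own implementation using intensity (14b) with a fixed placeholder outcome path, not the authors' exact procedure.

```python
import numpy as np

def simulate_visits_by_thinning(intensity, t_max, lam_bar, rng):
    """Simulate event times on (0, t_max] from intensity(t) by thinning:
    candidates arrive at homogeneous rate lam_bar >= sup_t intensity(t),
    and each is accepted with probability intensity(t) / lam_bar."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam_bar)          # gap to the next candidate
        if t > t_max:
            return np.array(times)
        if rng.uniform() < intensity(t) / lam_bar:   # accept with the ratio probability
            times.append(t)

rng = np.random.default_rng(2002)
p_obs = lambda t: 0.25                               # placeholder for P_obs(t-)
lam_14b = lambda t: 0.005 * np.exp(-0.001 * t + 1.0 * p_obs(t))
visits = simulate_visits_by_thinning(lam_14b, 1431.0, lam_bar=0.005 * np.e, rng=rng)
print(visits.size)
```

In the actual simulation, p_obs would be the subject's running 90-day average from step (b), updated as visits accrue.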
6.2. Evaluation of the Bias, Variance, and MSE
In these simulations, we fixed the sample size at n = 250 and generated the true outcomes
under (13a). This true mean outcome trajectory is represented by the black solid lines in Figure 3, and the corresponding true AUC is 9.63. The three graphs in Figure 3 correspond to the three different true visit process models (14a)-(14c) under which the observed outcome data were generated. Models (14a)-(14c) reflect different degrees to which the visit process depends on the observed outcome P^obs(t) and the prognostic factor X^obs(t). For a dataset generated under one of the three true visit intensity models (14a)-(14c), we estimated the visit intensities with each of the following four choices of H(t, F^obs(t−)) in model (5), one at a time:
(5a) H(t, F^obs(t−)) = P^obs(t−),
(5b) H(t, F^obs(t−)) = X^obs(t−),
(5c) H(t, F^obs(t−)) = (P^obs(t−), X^obs(t−))^T,
(5d) H(t, F^obs(t−)) = 0.
The estimator using (5d) is equivalent to the usual GEE estimator. We also set H^null(t) in (6) equal to zero, since we only simulated data for one treatment group, and estimated the null intensity. We then estimated the mean outcome trajectory using the marginal outcome regression model E[P(t)] = µ(t; β0) = logit^{-1}{β0^T B4(t)} with the inverse visit intensity weighted approach.
Under the true null visit intensity model (14a), where visiting is unrelated to the outcome,
our results show that all four estimated outcome curves using the fitted intensity models (5a)-(5d) are unbiased (top panel of Figure 3), with the bias estimates of the AUCs given in the first line of Table 3. We also see that all four estimators of the AUC are comparable in terms of variance and MSE (2nd and 3rd lines of Table 3).
Under the true visit model (14b), where visiting is related to the past observed outcome, the fitted intensity models (5a) and (5c) both resulted in unbiased estimates of the outcome curves and the AUC, with (5c) giving slightly better results. The fitted models (5b) and (5d) yielded biased results, with (5d) being the worst (middle panel of Figure 3 and 4th line of Table 3). In order of increasing bias, the fitted models rank (5c), (5a), (5b), (5d). All the estimators have comparable variance (5th line of Table 3). The estimator using (5c) has a slightly lower MSE than the one using (5a), while the GEE estimator and the weighted estimator using the incorrectly specified intensity model (5b) have poor MSE (6th line of Table 3).
Under the true visit model (14c), where visiting depends on both the observed past outcome and the auxiliary factor, the order of increasing bias for the four fitted intensity models is again (5c), (5a), (5b), (5d), as shown in the bottom graph of Figure 3 for the estimated curves and the 7th line of Table 3 for the AUCs. All four estimators have comparable variance (8th line of Table 3). The MSE of the estimator using (5c) is half that of the estimator using (5a), while the usual GEE estimator using (5d) and the weighted estimator using intensity model (5b), which only accounts for the auxiliary factor, have poor MSE (9th line of Table 3).
The above simulations demonstrate that when the visit process depends on outcome
and/or other auxiliary factors, the usual GEE produces biased estimates. The bias is corrected by the inverse visit intensity weighted approach proposed in this article in which the
weights are estimated from a visit process intensity model using the past outcome and possibly auxiliary prognostic factors as time-varying covariates. Including the auxiliary covariates
in the visit intensity model helps to further reduce the bias, regardless of whether or not
the true intensity depends on the auxiliary covariates. When the visit process is unrelated
to the outcome, both the GEE and our weighted approach yield unbiased estimates and
comparable MSEs.
6.3. The Effects of Varying Sample Size on the Asymptotic and Bootstrap Variance and Coverage
We generated true longitudinal outcome data according to model (13b) with the true AUC
of 9.5 for n subjects. The resulting observed outcome data were obtained using the true
visit intensity model (14b) in which the visit process depends on the observed outcome
P obs (t). For each resulting simulated dataset, we estimated the visit intensity by fitting the
intensity model (5a): H(t, F^obs(t−)) = P^obs(t−). We also set H^null(t) in (6) equal to zero and estimated the null intensity. We then estimated the AUC using the marginal outcome model E[P(t)] = µ(t; β0) = logit^{-1}{β0^T B4(t)} with the inverse visit intensity weighted approach.
We varied the sample size n for the simulated datasets from 50 to 2000. Whatever the sample size, our weighted estimator has minimal bias (2nd column of Table 4). Comparing the average of the estimated standard errors of the AUC based on large sample theory to the Monte-Carlo standard deviation of the estimated AUC, we see that our asymptotic standard error estimator starts to perform well relative to the Monte-Carlo standard deviation when the sample size reaches about 1000 (3rd, 5th, and last columns of Table 4). The coverage rates of the large-sample 95% confidence intervals for the true AUC are conservative for sample sizes below 1000, but perform well for sample sizes of 1000 or higher (2nd to last column of Table 4). This suggests that, for the sample sizes in the HUD-VASH study, large sample confidence intervals may be too wide. In our simulation, the coverage rates of 95% bootstrap confidence intervals (based on 200 re-samples for each generated dataset) for sample sizes 50, 80 and 150 were 92.5%, 94.5% and 95%, respectively (4th column of Table 4). This suggests that the bootstrap is a valuable tool for sample sizes of around 80 or above.
Given these results, we suggest computing bootstrap confidence intervals for small or medium-sized samples. For relatively large samples, results from the bootstrap and the asymptotic calculation are virtually the same.
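A percentile-bootstrap confidence interval of the kind recommended here resamples subjects with replacement and re-estimates the target on each resample. The sketch below is our own toy illustration: the "estimator" is a simple mean and the subject-level data are simulated, not from the study.

```python
import numpy as np

def percentile_bootstrap_ci(data, estimator, n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap CI: re-estimate on n_boot resamples of subjects
    (rows drawn with replacement), then take the (2.5%, 97.5%) quantiles."""
    rng = np.random.default_rng(seed)
    n = len(data)
    stats = np.array([estimator(data[rng.integers(0, n, n)])
                      for _ in range(n_boot)])
    alpha = (1.0 - level) / 2.0
    return np.quantile(stats, [alpha, 1.0 - alpha])

# Toy example: subject-level AUC contributions; the estimand is their mean.
rng = np.random.default_rng(1)
subject_aucs = rng.gamma(shape=2.0, scale=4.75, size=150)   # mean approximately 9.5
lo, hi = percentile_bootstrap_ci(subject_aucs, np.mean)
print(lo < hi)  # prints True
```

In the paper's setting, the per-resample estimator would refit the intensity models and the weighted GEE before recomputing the AUC, which is why only 200 re-samples were used.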
7. Discussion
In this paper, we proposed a class of consistent and asymptotically normal estimators for the
parameters of the marginal regression models for longitudinal outcomes, where follow-up on
subjects is highly irregular and potentially informative. Our method extends the weighted
estimating equations approach of Robins et al. (1995) to settings of irregularly spaced
longitudinal data in which the measurement (i.e., visit) process may depend on the outcomes
or auxiliary prognostic factors. Our semiparametric method does not require specification
of the complete data (outcomes, possibly auxiliary factors, and visits) likelihood and will
be particularly useful for analyzing non-Gaussian data. Our method does require correct
specification of a visit process intensity model. We suggest that all available prognostic
covariates be considered for use in such an intensity model, so that the bias due to the
informative visit process can be corrected to the greatest extent possible. Although we did not observe variance inflation of β̂ as the number of covariates in the visit intensity model increased, careful attention should be paid to the possible bias-variance trade-off.
It is important to highlight that the extension of our model to simultaneously accommodate informative dropout is straightforward. That is, we would posit predictor-adjusted and null proportional hazards models for dropout. In the numerator (denominator) of the integrand of an individual's estimating function given in (8), we would put the estimated conditional probability of not having dropped out by time t under the null (predictor-adjusted) model. However, we think that, in some settings, additional modeling of dropout may not
be necessary since the information on dropout may already be captured in the visit process.
The marginal outcome model is also easily extended to include external time-dependent
covariates (Kalbfleisch and Prentice, 2002).
In this paper, we have not formally considered issues of efficiency. For future work, we plan to investigate locally-efficient estimators of β0 (van der Laan and Robins, 2003). It would also be useful to develop a sensitivity analysis methodology, along the lines of Rotnitzky et al. (1998) and Scharfstein et al. (1999), to evaluate the sensitivity of inferences to departures from the sequential randomization assumption.
Acknowledgment
We appreciate the help of Wen Liu-Mares on data management for this project. We thank
Mark van der Laan for useful discussions related to the derivation of the influence function
for our weighted estimator.
A. Derivation of Tangent Sets and Projections
Here, we first derive the tangent sets for γ and λ0(·) in the semiparametric model (5) and then derive the projection of U(O; β0, γ0, λ0; c) onto these tangent sets. The tangent sets are closed linear subspaces of a Hilbert space consisting of all p-dimensional mean-zero random vectors with finite variance, equipped with the covariance inner product. Since λ0(·) is infinite dimensional, we derive the tangent sets through parametric submodels. A parametric submodel under model (5) corresponds to a parameterization λ(t, F^obs(t−); ω, γ) such that for some (ω0, γ0), λ(t, F^obs(t−); ω0, γ0) is equal to λ(t, F^obs(t−)); that is, a parametric submodel contains the truth. We consider the following parametric submodel for (5):

λ(t, F^obs(t−); ω, γ) = λ(t; ω) exp{γ^T H(t, F^obs(t−))}.
For example, if we take λ(t; ω) = λ0(t) exp{ω1 h1(t) + . . . + ωr hr(t)}, where ω = (ω1, . . . , ωr)^T is an r × 1 vector and h1(t), . . . , hr(t) are r different smooth functions, then the truth is obtained by setting ω = ω0 = 0 and γ = γ0.
The log-likelihood for the above parametric submodel is proportional to

∫0^τ log[λ(t; ω) exp{γ^T H(t, F^obs(t−))}] dN(t) − ∫0^τ λ(t; ω) exp{γ^T H(t, F^obs(t−))} Y(t) dt.
The resulting scores for γ and ω, evaluated at the truth (γ0, ω0), are equal to

∫0^τ H(t, F^obs(t−)) dM(t)   and   ∫0^τ {∂λ(t; ω)/∂ω |ω=ω0} / λ0(t) dM(t),

respectively, where M(t) ≡ M(O, t; γ0, λ0) = N(t) − ∫0^t Y(s) λ0(s) exp{γ0^T H(s, F^obs(s−))} ds. By ranging
over parametric submodels, the second integrand above can be made into any p-dimensional function of t by pre-multiplying by a conformable matrix. Thus, the tangent set, T = T1 + T2, is defined
as

T1 = {Q ∫0^τ H(t, F^obs(t−)) dM(t) : Q is any conformable matrix with p rows},

T2 = {∫0^τ g(t) dM(t) : g(t) is any p × 1 function of t}.
Now let T1* ⊂ T1 denote the set of residuals of the elements of T1 after projecting onto T2; thus T1* and T2 are orthogonal and T = T1* ⊕ T2. It is clear that T2 is the space spanned by the scores for λ0(·) from all parametric submodels, and it will become clear below that T1* is the tangent set of the efficient score for γ. To find the projection of elements of T1 onto T2, we need to find a p-dimensional measurable function h*(t) such that
E[{Q ∫0^τ H(t, F^obs(t−)) dM(t) − ∫0^τ h*(t) dM(t)}^T ∫0^τ g(t) dM(t)] = 0    (15)

for all g(t). We use the covariance property of martingale stochastic integrals (Fleming and Harrington, 1991; Andersen et al., 1993) to find the solution h*(t) of (15), and it is then straightforward to show that

h*(t) = Q E[H(t, F^obs(t−)) Y(t) exp{γ0^T H(t, F^obs(t−))}] / E[Y(t) exp{γ0^T H(t, F^obs(t−))}].
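The calculation rests on the covariance formula for stochastic integrals with respect to the same counting-process martingale (Fleming and Harrington, 1991); in the present notation, for predictable, dimensionally conformable integrands a(t) and b(t),

```latex
E\left[\left\{\int_0^\tau a(t)\,dM(t)\right\}^{T}\int_0^\tau b(t)\,dM(t)\right]
  = E\left[\int_0^\tau a(t)^{T} b(t)\,Y(t)\,\lambda_0(t)\,
    \exp\{\gamma_0^{T}H(t,F^{\mathrm{obs}}(t-))\}\,dt\right].
```

Taking a(t) = Q H(t, F^obs(t−)) − h*(t) and b(t) = g(t), equation (15) holds for every g(t) precisely when E[{Q H(t, F^obs(t−)) − h*(t)} Y(t) exp{γ0^T H(t, F^obs(t−))}] = 0 for almost every t, which gives the displayed h*(t).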
To see this, we plug h*(t) back into the left-hand side of (15); using the covariance property of stochastic integrals with respect to martingales, the left-hand side is easily verified to be zero. Therefore,

T1* = {Q ∫0^τ h̄*(t) dM(t) : Q is any conformable matrix with p rows}
    = {Q S(O; γ0, λ0) : Q is any conformable matrix with p rows},

where

h̄*(t) = H(t, F^obs(t−)) − E[H(t, F^obs(t−)) Y(t) exp{γ0^T H(t, F^obs(t−))}] / E[Y(t) exp{γ0^T H(t, F^obs(t−))}].

It is important to note that ∫0^τ h̄*(t) dM(t) = S(O; γ0, λ0), where S(O; γ0, λ0), defined in (12), is the well-known efficient score for γ.
We now derive the projection of U(O; β0, γ0, λ0) onto T1* and T2. To find these projections, we need to find Q* and g*(t) such that

E[{U(O; β0, γ0, λ0) − Q* ∫0^τ h̄*(t) dM(t)}^T {Q ∫0^τ h̄*(t) dM(t)}] = 0,

E[{U(O; β0, γ0, λ0) − ∫0^τ g*(t) dM(t)}^T {∫0^τ g(t) dM(t)}] = 0
for all matrices Q and for all g(t), respectively. Similar to (15), it can be shown that the
solutions of Q* and g*(t) to the above equations are, respectively,

Q* = E[U(O; β0, γ0, λ0){∫0^τ h̄*(t) dM(t)}^T] (E[{∫0^τ h̄*(t) dM(t)}{∫0^τ h̄*(t) dM(t)}^T])^{-1}
   = E[U(O; β0, γ0, λ0) S(O; γ0, λ0)^T] (E[S(O; γ0, λ0) S(O; γ0, λ0)^T])^{-1}

and

g*(t) = E[U(O; β0, γ0, λ0) dM(t)] / E[Y(t) λ0(t) exp{γ0^T H(t, F^obs(t−))}].

Therefore, the projection of U(O; β0, γ0, λ0) onto T is

Π[U(O; β0, γ0, λ0)|T] = Π[U(O; β0, γ0, λ0)|T1*] + Π[U(O; β0, γ0, λ0)|T2],

which is given as (11) in Section 4.
References
Albert, P. S., Follmann, D. A., Wang, S. A. and Suh, E. B. (2002) A latent autoregressive
model for longitudinal binary data subject to informative missingness. Biometrics, 58,
631–642.
Andersen, P. K., Borgan, Ø., Gill, R. D. and Keiding, N. (1993) Statistical Models Based on
Counting Processes. New York: Springer-Verlag.
Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1993) Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hopkins University Press, Johns Hopkins Series in the Mathematical Sciences.
Daley, D. J., Vere-Jones, D. and Smirnov, B. M. (2002) An Introduction to the Theory of
Point Processes: Elementary Theory and Methods. New York: Springer Verlag, 2nd edn.
Deltour, I., Richardson, S. and Le Hesran, J. Y. (1999) Stochastic algorithms for Markov
models estimation with intermittent missing data. Biometrics, 55, 565–573.
Diggle, P. J. and Kenward, M. G. (1994) Informative dropout in longitudinal data analysis
(with discussion). Appl. Statist., 43, 49–93.
Fitzmaurice, G. M., Laird, N. M. and Shneyer, L. (2001) An alternative parameterization
of the general linear mixture model for longitudinal data with non-ignorable drop-outs.
Statist. Med., 20, 1009–1021.
Fleming, T. R. and Harrington, D. P. (1991) Counting Processes and Survival Analysis. New
York: John Wiley & Sons.
Follmann, D. and Wu, M. (1995) An approximate generalized linear model with random
effects for informative missing data. Biometrics, 51, 151–168.
Gasser, T. and Müller, H. G. (1979) Kernel estimation of regression functions. In Smoothing Techniques for Curve Estimation, 23–68. Berlin: Springer-Verlag. Lecture Notes in
Mathematics 757.
Hogan, J. W. and Laird, N. M. (1997) Mixture models for the joint distribution of repeated
measures and event times. Statist. Med., 16, 239–257.
Huber, P. J. (1985) Projection pursuit. Ann. Statist., 13, 435–475.
Kalbfleisch, J. D. and Prentice, R. L. (2002) The Statistical Analysis of Failure Time Data.
New York: John Wiley & Sons, second edn.
Liang, K.-Y. and Zeger, S. L. (1986) Longitudinal data analysis using generalized linear
models. Biometrika, 73, 13–22.
Lipsitz, S. R., Fitzmaurice, G. M., Ibrahim, J. G., Gelber, R. and Lipshultz, S. (2002) Parameter estimation in longitudinal studies with outcome-dependent follow-up. Biometrics,
58, 621–630.
Little, R. J. A. (1995) Modelling the drop-out mechanism in repeated-measures studies. J.
Am. Statist. Ass., 90, 1112–1121.
Lunn, A. D. and Davies, S. J. (1998) A note on generating correlated binary variables.
Biometrika, 85, 487–490.
van der Laan, M. J. and Robins, J. M. (2003) Unified Methods for Censored Longitudinal Data and Causality. New York: Springer-Verlag.
McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman &
Hall, 2nd edn.
Newey, W. K. (1990) Semiparametric efficiency bounds. J. Appl. Econom., 5, 99–135.
Preisser, J. S., Galecki, A. T., Lohman, K. K. and Wagenknecht, L. E. (2000) Analysis of
smoking trends with incomplete longitudinal binary response. J. Am. Statist. Ass., 95, 1021–1031.
Robins, J. M. (1998) Marginal structural models. In 1997 Proceedings of the American
Statistical Association. Section on Bayesian Statistical Science, 1–10. American Statistical
Association.
Robins, J. M., Mark, S. D. and Newey, W. K. (1992) Estimating exposure effects by modeling
the expectation of exposure conditional on confounders. Biometrics, 48, 479–495.
Robins, J. M. and Ritov, Y. (1997) Toward a curse of dimensionality appropriate (CODA)
asymptotic theory for semi-parametric models. Statist. Med., 16, 285–319.
Robins, J. M. and Rotnitzky, A. (1992) Recovery of information and adjustment for dependent censoring using surrogate markers. In AIDS Epidemiology - Methodological Issues
(eds. N. Jewell, K. Dietz and V. Farewell), 297–331. Boston, MA: Birkhäuser.
Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1995) Analysis of semiparametric regression
models for repeated outcomes in the presence of missing data. J. Am. Statist. Ass., 90,
106–121.
Rosenheck, R. A., Kasprow, W., Frisman, L. K. and Liu-Mares, W. (2002) Cost-effectiveness
of supported housing for homeless persons with mental illness. Arch. Gen. Psychiat., In
press.
Rotnitzky, A., Robins, J. M. and Scharfstein, D. O. (1998) Semiparametric regression for repeated outcomes with nonignorable nonresponse. J. Am. Statist. Ass., 93, 1321–1339.
Scharfstein, D. O., Rotnitzky, A. and Robins, J. M. (1999) Adjusting for nonignorable dropout using semiparametric nonresponse models (with discussion). J. Am. Statist. Ass., 94,
1096–1120.
Troxel, A. B., Lipsitz, S. R. and Harrington, D. P. (1998) Marginal models for the analysis
of longitudinal measurements with nonignorable non-monotone missing data. Biometrika,
85, 661–672.
van der Vaart, A. W. (1998) Asymptotic Statistics. Cambridge: Cambridge University Press.
Wu, M. C. and Carroll, R. J. (1988) Estimation and comparison of changes in the presence of
informative right censoring by modeling the censoring process. Biometrics, 44, 175–188.
Table 1. Parameter Estimates and Their Standard Errors (SE) in the Intensity Models

Predictor-Adjusted Intensity Model (5)
Covariates                         γ̂         SE(γ̂)
Z(i)                               0.612      0.141
Z(ii)                              0.409      0.144
income(0)                         -0.221      0.050
benefit(0)                         0.113      0.041
P^obs(t−)                          0.141      0.058
{Z(i) + Z(ii)} × qliv^obs(t−)      0.057      0.019
Z(iii) × qliv^obs(t−)              0.108      0.027
Z(i) × N(t−)                       0.086      0.013
Z(ii) × N(t−)                      0.109      0.017
Z(iii) × N(t−)                     0.119      0.015

Null Intensity Model (6)
Covariates     γ̂^null     SE(γ̂^null)
Z(i)           0.409       0.043
Z(ii)          0.207       0.054
Table 2. Estimates of Areas Under Curves and Intervention Effects

Area Under Curve (AUC)
             GEE                                                  Inverse Intensity Weighted GEE
Service      AUC est.   Asymptotic 95% C.I.   Bootstrap 95% C.I.  AUC est.   Asymptotic 95% C.I.   Bootstrap 95% C.I.
(i)          5.51       [4.21, 6.80]          [4.15, 6.67]        4.57       [1.89, 7.25]          [3.96, 6.52]
(ii)         10.85      [7.48, 14.22]         [7.96, 14.38]       8.25       [2.19, 14.31]         [6.83, 13.08]
(iii)        10.14      [7.85, 12.43]         [7.90, 12.57]       9.89       [4.23, 15.55]         [7.87, 13.13]

Intervention Effects
             GEE                                                  Inverse Intensity Weighted GEE
(κ), (κ*)    ∆̂(κ),(κ*)  Asymptotic 95% C.I.   Bootstrap 95% C.I.  ∆̂(κ),(κ*)  Asymptotic 95% C.I.   Bootstrap 95% C.I.
(i), (ii)    -5.34      [-6.34, -4.34]        [-8.88, -1.89]      -3.68      [-5.56, -1.80]        [-7.05, -1.62]
(i), (iii)   -4.57      [-8.04, -1.10]        [-7.00, -2.36]      -5.32      [-9.81, -0.83]        [-8.18, -2.20]
(ii), (iii)   0.71      [-3.36, 4.78]         [-3.21, 4.71]       -1.64      [-10.26, 6.98]        [-4.00, 3.33]
Table 3. Simulation Study on the Effects of the Intensity Models

                                                          Fitted Intensity Model with
  True Intensity Model          Line                    (5a) P(t)  (5b) X(t)  (5c) P(t)+X(t)  (5d) Null
  Null Model, (14a)              1   sample bias∗          0.05       0.05        0.05           0.05
                                 2   sample variance∗      0.12       0.14        0.13           0.14
                                 3   MSE∗                  0.13       0.14        0.13           0.14
  Model with P(t), (14b)         4   sample bias∗          0.15       1.37        0.01           1.52
                                 5   sample variance∗      0.13       0.12        0.13           0.12
                                 6   MSE∗                  0.16       1.98        0.13           2.42
  Model with P(t)+X(t), (14c)    7   sample bias∗          0.37       1.39       -0.01           1.62
                                 8   sample variance∗      0.13       0.12        0.13           0.13
                                 9   MSE∗                  0.26       2.04        0.13           2.75

∗: The sample bias, variance and MSE are for the AUC (area under the curve);
the true value of the AUC is 9.63.
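As a consistency check on Table 3, each MSE entry should equal (sample bias)² + sample variance, up to rounding of the reported values:

```python
# Each (bias, variance, MSE) triple as transcribed from Table 3;
# agreement with MSE = bias^2 + variance holds up to rounding.
cells = [
    (0.05, 0.12, 0.13), (0.05, 0.14, 0.14), (0.05, 0.13, 0.13), (0.05, 0.14, 0.14),
    (0.15, 0.13, 0.16), (1.37, 0.12, 1.98), (0.01, 0.13, 0.13), (1.52, 0.12, 2.42),
    (0.37, 0.13, 0.26), (1.39, 0.12, 2.04), (-0.01, 0.13, 0.13), (1.62, 0.13, 2.75),
]
for bias, var, mse in cells:
    assert abs(bias ** 2 + var - mse) <= 0.05
```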
Table 4. Simulation Study on Sample Size for Asymptotic Variance of AUC∗

  Sample                sample   bootstrap     average asymptotic   asymptotic
    n        ÂUC        SE       coverage∗∗    ŜE(ÂUC)              coverage     ratio∗∗∗
    50       9.53       0.94     92.5%         4.83                 100%         5.12
    80       9.48       0.72     94.5%         3.61                 100%         5.03
   150       9.52       0.56     95.0%         2.42                 100%         4.35
   450       9.45       0.38     –             0.88                 100%         2.26
  1000       9.47       0.31     –             0.34                 96.0%        1.10
  1500       9.52       0.20     –             0.21                 95.0%        1.04
  2000       9.49       0.15     –             0.15                 95.0%        0.99

∗: The true value of the AUC is 9.5.
∗∗: A 95% bootstrap confidence interval was constructed for each simulated data
set with n = 50, 80 and 150 subjects; the coverage rate was calculated from 200
simulated data sets.
∗∗∗: Ratio of the sample average of the asymptotic standard error to the sample
standard error estimate.
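A similar consistency check on Table 4 (values as transcribed): the ratio column reproduces the sample average of the asymptotic standard error divided by the sample standard error, up to rounding:

```python
# Rows transcribed from Table 4 as (n, sample SE, average asymptotic SE, ratio);
# the ratio should equal avg_asym_se / sample_se up to rounding.
rows = [
    (50, 0.94, 4.83, 5.12), (80, 0.72, 3.61, 5.03), (150, 0.56, 2.42, 4.35),
    (450, 0.38, 0.88, 2.26), (1000, 0.31, 0.34, 1.10),
    (1500, 0.20, 0.21, 1.04), (2000, 0.15, 0.15, 0.99),
]
for n, sample_se, avg_asym_se, ratio in rows:
    assert abs(avg_asym_se / sample_se - ratio) <= 0.1
```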
[Figure 1: two panels plotted against months since randomization (0–50). Top panel, "Observed P(t)": observed percentage of days homeless (y-axis 0–40). Bottom panel, "Observed N(t)": cumulative number of visits (y-axis 0–9). Legend in both panels: (i) voucher + case manager; (ii) case manager only; (iii) standard care.]
Fig. 1. Observed Percentage Days Homeless and Cumulative Number of Visits by Service Groups
[Figure 2: two panels of estimated percentage of days homeless (y-axis 0–40) versus months since randomization (0–50), by service group: (i) voucher + case manager; (ii) case manager only; (iii) standard service. Top panel: "Usual GEE Analysis". Bottom panel: "Account for Informative Follow-up".]
Fig. 2. Percentage of Days Homeless
[Figure 3: three panels of % of days homeless (y-axis 10–55) versus months since randomization (0–50), one for each data-generating intensity model: the null intensity model, the intensity model with P(t), and the intensity model with P(t) & X(t). Each panel overlays the true mean profile of P(t) with the estimated mean profiles from the weighted analyses using P(t) only, X(t) only, and both P(t) and X(t), and from the unweighted analysis.]
Fig. 3. Simulation Studies