Analysis of Longitudinal Data with Irregular, Informative Follow-up

Haiqun Lin†, Yale University, New Haven, U.S.A.
Daniel O. Scharfstein, Johns Hopkins Bloomberg School of Public Health, Baltimore, U.S.A.
Robert A. Rosenheck, VA Northeast Program Evaluation Center & Yale University, West Haven, U.S.A.

Summary. A frequent problem in longitudinal studies is that subjects may miss some scheduled visits or be assessed at self-selected points in time. As a result, the observed outcome data may be highly unbalanced. Moreover, the availability of the data may be directly related to the outcome measure or to auxiliary factors that are related to the outcome. This situation can be viewed as one of informative follow-up or, equivalently, of informative intermittent missing data. Analysis that does not account for informative follow-up will produce biased estimates. Building on the work of Robins, Rotnitzky and Zhao (1995), we propose a class of inverse intensity-of-visit-process weighted estimators in marginal regression models for longitudinal responses that may be observed in continuous time. This allows us to handle arbitrary patterns of missing data as embedded in a subject's visit process. We derive the large-sample distribution of our inverse visit-intensity-weighted estimators and investigate their finite-sample behavior with simulations. Our approach is illustrated with a data set from a health services research study in which homeless people with mental illness were randomized to three different treatments, and measures of homelessness and other auxiliary factors were recorded at follow-up times that were not fixed by design.

Keywords: Informative follow-up; Intermittent missingness; Dropout; Longitudinal data; Weighted generalized estimating equations; Visit process
†Address for correspondence: Haiqun Lin, Division of Biostatistics, Department of Epidemiology & Public Health, Yale University, 60 College Street, Room 208 LEPH, New Haven, CT 06520, U.S.A. E-mail: [email protected]

1. Introduction

In many longitudinal studies, subjects are followed over a period of time and are scheduled to be assessed at a common set of pre-specified visit times after enrollment. However, subjects often selectively miss their visits or return at non-scheduled points in time. As a result, the measurement times are irregular, yielding a highly unbalanced data structure. In addition, the frequency and timing of the visits may be informative with respect to the longitudinal outcomes. This situation can be viewed as one of informative follow-up or, equivalently, of informative intermittent missing data. The goal of this paper is to develop a general methodology for analyzing studies with these features.

Our interest is motivated by a randomized trial comparing three housing interventions for homeless people with mental illness. The trial was conducted at four of 35 sites that implemented a joint program linking services of the US Department of Housing and Urban Development (HUD) and the US Department of Veterans Affairs (VA): the HUD-VA Supported Housing (HUD-VASH) program. In this program, 460 veterans were randomly assigned to receive (i) the full HUD-VASH intervention (housing voucher plus case management); (ii) case management but no vouchers; or (iii) standard VA care (Rosenheck et al., 2002). Although efforts were made to conduct follow-up interviews every three months, subjects often missed assessments or showed up between scheduled interviews. The frequency and timing of the observations are thus quite different across subjects. Over a 48-month period after randomization, the mean numbers of visits (standard deviations) in groups (i) through (iii) were 8.5 (3.2), 7.1 (3.6) and 6.0 (3.5).
At each visit, the primary measure of interest is the percentage of days homeless during the past three months. Auxiliary measures were also collected, including income, quality of life, addiction severity, and whether any social security or VA benefit was received. During the trial, veterans were allowed to continue their participation regardless of how many previous assessments they had missed or whether they reappeared at the time of a scheduled interview. For example, out of the total of 2,837 visits made by the 460 participants over four years, 686 visits had a gap of more than six months from the previous visit, and 126 had a gap of more than one year. In analyzing this dataset, we are concerned about the informative nature of the visit process. Specifically, researchers believe that subjects with greater levels of homelessness and more severe social, economic, and mental health problems are likely to visit more frequently. More detailed information on the three service groups and the trial is deferred until Section 5.

Most of the previous statistical literature on incomplete longitudinal data has focused on monotone missing data, in which a subject's data are observed only through a certain time and are missing for all subsequent occasions (Wu and Carroll, 1988; Diggle and Kenward, 1994; Little, 1995; Follmann and Wu, 1995; Hogan and Laird, 1997; Scharfstein et al., 1999; Fitzmaurice et al., 2001, etc.). A few studies deal with intermittently missed visits, in which subjects may miss some visits among a common set of pre-defined visit times (Troxel et al., 1998; Deltour et al., 1999; Preisser et al., 2000; Albert et al., 2002, etc.); however, these approaches require specifying the time points at which data can be missing and do not readily handle irregularly spaced visit times. Since the frequency and timing of subjects' visits may be informative with respect to the longitudinal outcomes, they deserve careful consideration.
Recently, Lipsitz et al. (2002) developed a likelihood-based procedure for estimating the regression parameters in models for continuous longitudinal data. Specifically, they assumed that the repeated measures follow a Gaussian process. Under the assumption that the intensity of visiting at any time t is conditionally independent of current and future outcomes given the observed past outcomes, the visit process becomes ignorable and inference can proceed by maximizing the likelihood of the observed outcome data. In addition to their assumption on the visit process, they rely on the correlation structure of the longitudinal response to handle the informativeness of subjects' visit processes, so the validity of their approach hinges on correct specification of the Gaussian process, especially its correlation structure.

In this article, we are interested in a regression methodology which (a) accommodates a variety of outcome measures, such as binary, continuous, or percentage outcomes (e.g., the homelessness outcome in our motivating example); (b) affords comparison of the population-average evolution of a longitudinal outcome measure; (c) makes minimal distributional assumptions about the outcome and auxiliary processes; and (d) accommodates visits that occur in continuous time and may depend on both outcomes and auxiliary time-dependent prognostic factors that are correlated with outcomes and visiting. The marginal modeling approach of Liang and Zeger (1986) for longitudinal data can accommodate issues (a)-(c). Unfortunately, it is well known that the standard inferential approach using the generalized estimating equations (GEE) of Liang and Zeger (1986) will yield biased inferences when the visit process is correlated with the outcomes under investigation. Robins et al. (1995) developed a method to correct for this bias when visits occur only at pre-defined occasions.
Specifically, they proposed a weighted GEE technique that yields consistent and asymptotically normal estimators when the dependence of missingness at a pre-defined visit time t on the observed past (including possibly time-dependent auxiliary factors, outcomes, and visit history) is correctly specified. In their approach, a subject's contribution to the estimating equation at a pre-defined time point t is weighted by the inverse of the conditional probability of being observed at that time (marginalized over visit history). Thus, the conditional probability of being observed at the jth occasion is a product of j conditional probabilities (of visiting or not). It may be argued that for visits occurring in continuous time, one can group the time scale into discrete intervals and apply the method of Robins et al. (1995). However, the determination of time intervals may be arbitrary, and it may be difficult to specify intervals so that a subject makes at most one visit in any given interval. Our method extends the Robins et al. (1995) approach to handle visits occurring in continuous time, and therefore avoids grouping visit times as well as the calculation of products of conditional visiting probabilities. With our extension, we are able to fully accommodate issues (a)-(d) above.

Our approach views missingness as embedded in a subject's visit process occurring in continuous time, so as to accommodate unbalanced longitudinal data. We assume that the intensity of visiting at time t is conditionally independent of the outcome at that time given the past observed data, which include the visit history, outcomes, and possibly other prognostic factors. This assumption is equivalent to the sequential randomization of Robins (1998) and is weaker than the one posited by Lipsitz et al. (2002). We specify a marginal regression model for the longitudinal outcome at time t, which we allow to depend on baseline and external covariates (Kalbfleisch and Prentice, 2002) and a deterministic function of time.
We also specify an intensity model for the counting process of visits at time t via past observed outcomes and the counting process of visits, as well as possibly time-dependent auxiliary prognostic factors. To estimate the parameters of the marginal regression model, we propose a class of stabilized, inverse intensity-of-visit-process weighted estimators. Under correct specification of the visit intensity model, our estimators are consistent and asymptotically normal. Specification of the joint distribution of the outcomes and the auxiliary processes is not required for our methodology. Our estimators can be computed relatively easily using standard statistical software (e.g., SAS, S-plus, or R) that can fit Cox models with time-dependent covariates and generalized linear models with weights. Our methodology can be easily extended to simultaneously handle dropout that is explainable by past observed factors (Robins and Rotnitzky, 1992; Robins et al., 1995).

The remainder of this article is organized as follows. In Section 2, we define our data structure, introduce the marginal outcome and visit process models, and show how the model assumptions are integrated through the inverse visit-intensity-weighted method to provide identification of the regression parameters of primary interest. In Section 3, we provide the estimation procedure and describe how estimation can take place with standard statistical software. In Section 4, we derive the large-sample properties of our class of inverse visit-intensity-weighted estimators and show how to compute the asymptotic standard errors. In Section 5, we illustrate our approach with the analysis of the data from the HUD-VASH clinical trial. In Section 6, we present findings from the simulation studies. We conclude with a discussion in Section 7, where we describe some possible extensions of our methodology.

2. The Models

2.1.
Data Structure

Let τ be a positive fixed value of time, measured from study entry, at which the data will be analyzed. Let C denote a subject's follow-up time, also measured from study entry. Let P(t) denote the primary outcome, potentially available at time t. In our motivating example, P(t) is recorded as the percentage of days homeless in the past three months. We are mainly interested in how the evolution of the mean of P(t) is related to a vector of fixed covariates Z (such as treatment group assignment). Let X(t) denote a vector of auxiliary variables, potentially available at time t. Let N(t) be a counting process recording the number of observed visits by time t; it is assumed that N(0) = 1, i.e., all subjects are seen at baseline. Let Y(t) = I(C ≥ t) denote a left-continuous at-risk process, indicating whether a subject is still under follow-up. Throughout, $\bar V(t) = \{V(s) : 0 \le s \le t\}$ denotes the history of a variable V through time t. We view the full, complete data for a subject as $F = (\bar X(\tau), \bar P(\tau), Z)$. We assume that C is independent of F. The observed data for an individual through time t are
$$F^{\rm obs}(t) = (\bar N(t), \bar Y(t), \bar X^{\rm obs}(t), \bar P^{\rm obs}(t), Z),$$
where $V^{\rm obs}(t) = V(\max\{s : 0 \le s \le t,\ dN(s) = 1\})$ is the most recent observed value of the variable V at or prior to time t, and $\bar V^{\rm obs}(t) = \{V^{\rm obs}(s) : 0 \le s \le t\} = \{V(s) : 0 \le s \le t,\ dN(s) = 1\}$ denotes the observed history of V through time t. We assume there are n subjects in the study and that we observe n i.i.d. copies of $O = F^{\rm obs}(\tau)$:
$$O = \{O_i = F_i^{\rm obs}(\tau) = (\bar N_i(\tau), \bar Y_i(\tau), \bar X_i^{\rm obs}(\tau), \bar P_i^{\rm obs}(\tau), Z_i) : i = 1, \dots, n\}.$$

2.2. Marginal Regression Model

We assume that the conditional mean of the longitudinal outcome at time t, P(t), follows the marginal regression model
$$E[P(t) \mid Z] = \mu(t, Z; \beta_0) \quad \text{for all } 0 \le t \le \tau, \qquad (1)$$
where $\beta_0$ is a p × 1 vector of unknown regression parameters and the mean $\mu(t, Z; \beta_0)$ is a specified function of t, Z, and $\beta_0$.
For example, in the analysis of the homelessness data presented in Section 5, we define $Z = (Z_{(i)}, Z_{(ii)}, Z_{(iii)})^T$, where $Z_{(\kappa)}$ is the indicator that a subject is assigned to intervention (κ), and we let $\mu(t, Z; \beta_0) = \mathrm{logit}^{-1}\{Z_{(i)} \beta_{0,(i)}^T B_d(t) + Z_{(ii)} \beta_{0,(ii)}^T B_d(t) + Z_{(iii)} \beta_{0,(iii)}^T B_d(t)\}$, where logit is the logistic link function, $B_d(t)$ is a B-spline basis with d degrees of freedom, $\beta_{0,(\kappa)}$ is a d × 1 vector of unknown regression parameters for intervention (κ), $\beta_0 = (\beta_{0,(i)}^T, \beta_{0,(ii)}^T, \beta_{0,(iii)}^T)^T$, and p = 3d.

Our goal is to use O to obtain consistent estimators of $\beta_0$ in (1) above. It is well known that when outcomes are observed at irregular points in time and the timing and frequency of the observations may be informative about the longitudinal outcome of interest, the estimates of $\beta_0$ in (1) obtained via GEE will not be consistent. Our objective is to estimate $\beta_0$ when subjects' visit process depends on the outcome and possibly on auxiliary variables that are related to the outcome.

2.3. Visit Process Assumptions

In order to obtain consistent estimates of $\beta_0$, one must make assumptions about the visit process. As will be shown in Section 3.2, the GEE estimator with a working independence covariance structure (Liang and Zeger, 1986) is valid if visiting at time t is independent of P(t) given at-risk status at t− and the fixed covariates Z. That is, for 0 < t ≤ τ,
$$P[dN(t) = 1 \mid Y(t-), Z, P(t)] = P[dN(t) = 1 \mid Y(t-), Z] = Y(t)\lambda(t, Z)\,dt, \qquad (2)$$
where $\lambda(t, Z)$ is the intensity of visiting at time t, which may depend on the fixed covariates Z. The validity of general GEE estimators requires the even stronger assumption that, for all t, the observed visit process $\bar N(\tau)$ is independent of P(t) given Z. In many longitudinal studies, like the one in our motivating example, the timing and frequency of the observations may be informative about the outcome of interest, and thus the above assumption is unlikely to hold.
As a result, naive application of GEE may yield biased estimates. In order to obtain consistent estimates of $\beta_0$, the informativeness of a subject's visit process has to be taken into account. We relax the above assumption by positing that visiting at time t is independent of P(t) given at-risk status at t−, the covariates Z, and all past recorded outcomes, visits, and auxiliary factors. Specifically, for 0 < t ≤ τ, we assume
$$P[dN(t) = 1 \mid F^{\rm obs}(t-), P(t)] = P[dN(t) = 1 \mid F^{\rm obs}(t-)] = Y(t)\lambda(t, F^{\rm obs}(t-))\,dt, \qquad (3)$$
where $\lambda(t, F^{\rm obs}(t-))$ is the intensity of visiting at time t, which may depend on the fixed covariates and the recorded history of the outcome and auxiliary factors. We refer to (3) as the sequential randomization assumption (Robins, 1998), as it states that the decision to visit at time t does not depend on the current outcome given the past history.

2.4. Identification of β0

It is important to note that the regression parameters $\beta_0$ are identified under (1) and (3), because the visit intensity in (3) is identified from the distribution of the observed data and it can be shown that
$$E\left[\int_0^\tau \{P(t) - \mu(t, Z; \beta_0)\}\,\frac{c(t, Z; \beta_0)}{\lambda(t, F^{\rm obs}(t-))}\,dN(t)\right] = 0, \qquad (4)$$
where $c(t, Z; \beta_0)$ is a specified p × 1-dimensional function of t, Z, and $\beta_0$; the choice of $c(t, Z; \beta_0)$ will be discussed in Section 3.2. Equation (4) follows by an application of the iterated expectations formula:
$$E\left[\int_0^\tau \{P(t) - \mu(t, Z; \beta_0)\}\,\frac{c(t, Z; \beta_0)}{\lambda(t, F^{\rm obs}(t-))}\,dN(t)\right]$$
$$= E\left[\int_0^\tau \{P(t) - \mu(t, Z; \beta_0)\}\,\frac{c(t, Z; \beta_0)}{\lambda(t, F^{\rm obs}(t-))}\,E\{dN(t) \mid F^{\rm obs}(t-), P(t)\}\right]$$
$$= E\left[\int_0^\tau \{P(t) - \mu(t, Z; \beta_0)\}\,c(t, Z; \beta_0)\,Y(t)\,dt\right] = 0,$$
where the second equality follows from assumption (3), which implies $E\{dN(t) \mid F^{\rm obs}(t-), P(t)\} = Y(t)\lambda(t, F^{\rm obs}(t-))\,dt$, and the final equality follows from model (1) together with the independence of C and F. It can be seen that the inverse of the visit intensity, $1/\lambda(t, F^{\rm obs}(t-))$, serves as a weight function in (4).
By weighting, we create a "pseudo" population in which the visit process is no longer associated with the outcome P(t), so that valid marginal inference about the longitudinal outcome can be made as if all the observed values of the outcome were a random sample from the underlying distribution. Consistent estimates of $\beta_0$ can be obtained as the solution to the empirical version of (4), with the expectation replaced by a sample mean and $\lambda(t, F^{\rm obs}(t-))$ substituted by a suitable estimate. The steps for consistently estimating $\lambda(t, F^{\rm obs}(t-))$ and $\beta_0$ are described in Section 3.

2.5. Visit Process Models

Unfortunately, non-parametric estimation of the visit intensity function $\lambda(t, F^{\rm obs}(t-))$ is infeasible, even in moderate-sized datasets, due to the curse of dimensionality (Huber, 1985; Robins and Ritov, 1997). Thus, we will assume that $\lambda(t, F^{\rm obs}(t-))$ specified in (3) follows a lower-dimensional model of the form
$$\lambda(t, F^{\rm obs}(t-)) = \lambda_0(t) \exp\{\gamma_0^T H(t, F^{\rm obs}(t-))\}, \qquad (5)$$
where the baseline visit intensity $\lambda_0(t)$ is an unknown, non-negative function of t, $H(t, F^{\rm obs}(t-))$ is a specified function of t and $F^{\rm obs}(t-)$ chosen by the investigator, and $\gamma_0$ is an unknown vector of regression parameters of the same dimension as $H(t, F^{\rm obs}(t-))$. The function $H(t, F^{\rm obs}(t-))$ may include various functional forms of N(t−), $P^{\rm obs}(t-)$, $X^{\rm obs}(t-)$, Z, and their interactions. In its full form, we refer to this model as the "predictor-adjusted" intensity model. We refer to the visit intensity model in which $H(t, F^{\rm obs}(t-))$ depends only on Z as the "null" intensity model. Specifically, the null intensity model assumes that
$$\lambda(t, F^{\rm obs}(t-)) = \lambda^{\rm null}(t, F^{\rm obs}(t-)) = \lambda_0^{\rm null}(t) \exp\{\gamma_0^{{\rm null},T} H^{\rm null}(t, Z)\}, \qquad (6)$$
where $\lambda_0^{\rm null}(t)$ is an unknown, non-negative function of t called the baseline null intensity, $H^{\rm null}(t, Z)$ is a specified function of t and Z, and $\gamma_0^{\rm null}$ is a vector of unknown regression parameters of the same dimension as $H^{\rm null}(t, Z)$. Note that if (6) holds, then (2) holds.
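To see concretely why inverse weighting removes visit-process bias, it may help to run a small discrete-time simulation. The sketch below is illustrative only and not the authors' method: outcomes follow a two-state Markov chain with marginal mean 0.4, the probability of a visit at each time step depends only on the most recently observed outcome (so a discrete-time analogue of the sequential randomization assumption holds), and the visit probabilities are treated as known rather than estimated. All names and parameter values are hypothetical.

```python
import random

random.seed(7)

# Two-state Markov outcome (1 = "homeless"): P(1->1) = 0.7, P(0->1) = 0.2,
# so the stationary marginal mean is 0.2 / (0.2 + 0.3) = 0.4.
TRUE_MEAN = 0.4

def visit_prob(last_obs):
    # Visiting depends only on the most recently *observed* outcome,
    # mimicking assumption (3) in discrete time.
    return 0.6 if last_obs == 1 else 0.2

def simulate(n_subjects=4000, n_steps=50):
    naive_sum = naive_n = 0.0
    w_sum = w_n = 0.0
    for _ in range(n_subjects):
        x = 1 if random.random() < TRUE_MEAN else 0   # baseline outcome
        last_obs = x                                  # everyone is seen at baseline
        naive_sum += x; naive_n += 1
        w_sum += x; w_n += 1                          # baseline weight = 1
        for _ in range(n_steps):
            x = 1 if random.random() < (0.7 if x == 1 else 0.2) else 0
            p = visit_prob(last_obs)
            if random.random() < p:                   # a visit occurs: x is recorded
                naive_sum += x; naive_n += 1
                w_sum += x / p; w_n += 1.0 / p        # inverse-probability weight
                last_obs = x
    return naive_sum / naive_n, w_sum / w_n

naive, weighted = simulate()
# naive overestimates the marginal mean, because visits cluster just after an
# observed homeless spell and the outcome is autocorrelated; the weighted
# (Hajek-type) estimate recovers TRUE_MEAN up to Monte Carlo error.
```

Because visiting is oversampled after observed homelessness, the unweighted mean is pulled upward (by roughly 0.07 in expectation under these settings); dividing each visit's contribution by its visit probability undoes the oversampling, which is the discrete analogue of the intensity weighting in (4).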
3. Estimation

We consider estimation under models (1) and (5). In order to estimate the parameter of primary interest, $\beta_0$ in (1), we first need to estimate $\gamma_0$ and $\lambda_0(t)$ from (5).

3.1. Estimation of Visit Process Model Parameters

For the intensity model (5), the semi- or non-parametric maximum likelihood (NPML) estimator of $\gamma_0$ is found by maximizing the partial likelihood for the visit intensity model (Andersen et al., 1993). Specifically, $\gamma_0$ is estimated by $\hat\gamma$, the solution to the partial likelihood score equation $S^*(O; \gamma) = 0$, where
$$S^*(O; \gamma) = \sum_{i=1}^n \int_0^\tau \left[ H(t, F_i^{\rm obs}(t-)) - \frac{\sum_{j=1}^n Y_j(t)\, H(t, F_j^{\rm obs}(t-))\, e^{\gamma^T H(t, F_j^{\rm obs}(t-))}}{\sum_{j=1}^n Y_j(t)\, e^{\gamma^T H(t, F_j^{\rm obs}(t-))}} \right] dN_i(t).$$
The NPML estimator of the cumulative baseline intensity, $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$, in (5) is given by Breslow's estimator,
$$\hat\Lambda_0(t) = \int_0^t \frac{\sum_{i=1}^n dN_i(s)}{\sum_{i=1}^n Y_i(s) \exp\{\hat\gamma^T H(s, F_i^{\rm obs}(s-))\}}.$$
A kernel-smoothed estimator of $\lambda_0(t)$ can be obtained as
$$\tilde\lambda_0(t) = \frac{1}{b_n} \int_0^\tau K\left(\frac{t-s}{b_n}\right) d\hat\Lambda_0(s), \qquad (7)$$
where $b_n$ is a sample-size-dependent bandwidth and K(·) is a kernel of choice. By controlling the bandwidth $b_n$ so that $b_n \to 0$, $n b_n \to \infty$ and $\limsup n^{1/5} b_n < \infty$ as $n \to \infty$, $\tilde\lambda_0$ is consistent and converges to $\lambda_0$ in probability at a rate faster than $n^{1/4}$ (Andersen et al., 1993). The $\gamma_0^{\rm null}$ and $\lambda_0^{\rm null}(t)$ specified in (6) can be estimated by $\hat\gamma^{\rm null}$ and $\tilde\lambda_0^{\rm null}$ in a similar fashion.

3.2. Estimation of Marginal Regression Parameters

We now estimate $\beta_0$ as the solution, $\hat\beta(c)$, to the empirical version of (4), with $\gamma_0$ and $\lambda_0(t)$ replaced by their estimators $\hat\gamma$ and $\tilde\lambda_0$. That is, we solve
$$\sum_{i=1}^n U(O_i; \beta, \hat\gamma, \tilde\lambda_0; c) = 0,$$
where
$$U(O; \beta, \gamma, \lambda_0; c) = \int_0^\tau \{P(t) - \mu(t, Z; \beta)\}\,\frac{c(t, Z; \beta)}{\lambda_0(t) \exp\{\gamma^T H(t, F^{\rm obs}(t-))\}}\,dN(t). \qquad (8)$$
Note that the integrand above only needs to be evaluated at times with dN(t) = 1, i.e., when a subject's data are observed.
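The Breslow and kernel-smoothing steps in (7) can be sketched numerically. The toy example below is not the HUD-VASH analysis: it simulates visit times from a proportional intensity model with a constant baseline intensity λ0(t) = 0.5 and a single binary covariate whose coefficient γ = 0.5 is treated as known (in practice $\hat\gamma$ would come from the partial likelihood fit), computes the Breslow increments, and smooths them with an Epanechnikov kernel. Everyone is taken to remain at risk on [0, τ]; all names and values are hypothetical.

```python
import math
import random

random.seed(3)

# Visit times from intensity lambda0(t) * exp(gamma * Z) on [0, TAU]:
# constant baseline LAMBDA0 = 0.5, binary covariate Z, gamma known for the sketch.
TAU, GAMMA, LAMBDA0 = 10.0, 0.5, 0.5

subjects = []
for _ in range(500):
    z = random.randint(0, 1)
    rate = LAMBDA0 * math.exp(GAMMA * z)
    t, times = 0.0, []
    while True:
        t += random.expovariate(rate)          # Poisson-process inter-visit gaps
        if t > TAU:
            break
        times.append(t)
    subjects.append((z, times))

# Breslow increments: dLambda0(s) = dN(s) / sum_i Y_i(s) exp(gamma * Z_i);
# here everyone stays at risk on [0, TAU], so the denominator is constant.
denom = sum(math.exp(GAMMA * z) for z, _ in subjects)
increments = [(s, 1.0 / denom) for _, times in subjects for s in times]

def lambda0_smooth(t, b=1.0):
    """Epanechnikov-kernel smoothing of the Breslow increments, as in (7)."""
    total = 0.0
    for s, d in increments:
        u = (t - s) / b
        if abs(u) < 1.0:
            total += 0.75 * (1.0 - u * u) * d / b
    return total

lam5 = lambda0_smooth(5.0)   # interior point: close to LAMBDA0 = 0.5
```

At interior points the smoothed estimate tracks the true constant baseline; near t = 0 and t = τ the kernel window is truncated, which is why tail modifications of the kind used in Section 5 are needed in practice.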
Although the choice of $c(t, Z; \beta_0)$ can be completely arbitrary, we choose
$$c(t, Z; \beta_0) = \left.\frac{\partial \mu(t, Z; \beta)}{\partial \beta}\right|_{\beta=\beta_0} \Sigma(t, \mu(t, Z; \beta_0))^{-1}\, \lambda_0^{\rm null}(t) \exp\{\gamma_0^{{\rm null},T} H^{\rm null}(t, Z)\},$$
where $\Sigma(t, \mu(t, Z; \beta_0))$ is a conditional variance function of P(t) given Z with an overdispersion parameter $\phi_0$. This variance function, defined as a function of t and the mean $\mu(t, Z; \beta_0)$ and chosen by the investigator, specifies the conditional variance of P(t) as $\Sigma(t, \mu(t, Z; \beta_0))/\phi_0$. We choose $c(0, Z; \beta_0)$ so that the ratio in the integral above at t = 0 is equal to $\partial\mu(0, Z; \beta)/\partial\beta|_{\beta=\beta_0}\, \{\Sigma(0, \mu(0, Z; \beta_0))\}^{-1}$, with $\lambda_0(0) = \lambda_0^{\rm null}(0) = 1$. We force this condition because all subjects are assumed to have a visit at baseline.

Estimating equation (8) can also be written as a weighted estimating equation of a form similar to that of Liang and Zeger (1986):
$$\sum_{i=1}^n D(Z_i; \beta)^T\, V(Z_i; \beta)^{-1}\, W(Z_i; \beta; \hat\gamma, \tilde\lambda_0)\, \{P_i - \mu(Z_i; \beta)\} = 0, \qquad (9)$$
where $P_i = (P_{i1}, \dots, P_{i n_i})^T$ is the $n_i$-dimensional observed outcome vector, in which $P_{ij}$ is the observed outcome for subject i at the jth visit time $t_{ij}$; $\mu(Z_i; \beta) = (\mu(t_{i1}, Z_i; \beta), \dots, \mu(t_{i n_i}, Z_i; \beta))^T$; $D(Z_i; \beta) = \partial\mu(Z_i; \beta)/\partial\beta^T$ is an $n_i \times p$ matrix; $V(Z_i; \beta)$ is an $n_i \times n_i$ diagonal matrix of variance functions with jth diagonal element $\Sigma(t_{ij}, \mu(t_{ij}, Z_i; \beta))$; and $W(Z_i; \beta; \hat\gamma, \tilde\lambda_0)$ is an $n_i \times n_i$ diagonal weight matrix with jth diagonal element
$$\frac{\lambda_0^{\rm null}(t_{ij}) \exp\{\gamma_0^{{\rm null},T} H^{\rm null}(t_{ij}, Z_i)\}}{\tilde\lambda_0(t_{ij}) \exp\{\hat\gamma^T H(t_{ij}, F^{\rm obs}(t_{ij}-))\}}.$$
We solve (9) for β with $W(Z_i; \beta; \hat\gamma, \tilde\lambda_0)$ replaced by $\hat W(Z_i; \beta; \hat\gamma, \tilde\lambda_0)$, in which the unknown $\lambda_0^{\rm null}(t)$ and $\gamma_0^{\rm null}$ in $W(Z_i; \beta; \hat\gamma, \tilde\lambda_0)$ are substituted with their consistent estimates $\tilde\lambda_0^{\rm null}(t)$ and $\hat\gamma^{\rm null}$. Thus, we are effectively solving (8) with c replaced by a consistent estimator $\hat c$. The solution, $\hat\beta(\hat c)$, to the weighted GEE (9) is found using the Fisher scoring method (McCullagh and Nelder, 1989).
It can be seen that the choice of $c(t, Z; \beta)$ does not involve the overdispersion parameter $\phi_0$, and therefore the value of $\phi_0$ does not affect the estimation of $\beta_0$. One nice feature of using the above generalized estimating equation is that $\beta_0$ will be consistently estimated regardless of the particular choice of $c(t, Z; \beta)$ and regardless of whether the matrix of variance-covariance functions $V(Z; \beta)$ is correctly specified. The reason that we use $\lambda^{\rm null}(t, F^{\rm obs}(t-))$ in $c(t, Z; \beta)$ is two-fold. First, it serves to numerically stabilize the influence of small values in the denominator of the integrand in (8). Second, when $F^{\rm obs}(t-)$ has no influence on the intensity of visiting at time t (i.e., the null model is correctly specified) and $H(t, F_i^{\rm obs}(t-))$ and $H^{\rm null}(t, Z_i)$ are chosen so that they are equal under the null, there is cancellation between the numerator and denominator of the ratio $\lambda^{\rm null}(t, F^{\rm obs}(t-))/\lambda(t, F^{\rm obs}(t-))$ in the integrand. Equivalently, $W(Z_i; \beta; \hat\gamma, \tilde\lambda_0)$ in (9) becomes the identity matrix and the resulting estimating equation is exactly the same as the "working independence" GEE.

Because $\lambda_0(t)$ and $\gamma_0$ are estimated from the data in the first stage, the variability in their estimates must be taken into account when estimating the variances of the estimates of $\beta_0$ in the second stage. We provide estimates of the asymptotic variance of the estimates of $\beta_0$ in Section 4.

3.3. Implementation of the Estimation Procedure in Standard Statistical Software

The above estimation procedure can be implemented in standard statistical software with relative ease. The predictor-adjusted and null intensity models (5) and (6) can be fitted using SAS proc phreg or the S-plus/R function coxph with counting-process-style data. The resulting intensities can be smoothed in SAS with proc kde or in S-plus/R with the function ksmooth. The estimates of $\beta_0$ can then be obtained in SAS using proc genmod and in S-plus/R using glm.
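To convey the mechanics of solving the weighted estimating equation (9), here is a stripped-down sketch that is not the authors' software pipeline: a continuous outcome with a linear marginal mean, identity link, and working independence, in which case solving (9) reduces to weighted least squares. For illustration the observation probability is treated as known and outcome-dependent; in the actual procedure each visit's weight would be the estimated ratio of null to predictor-adjusted intensities. All parameter values and helper names are hypothetical.

```python
import random

random.seed(11)

# Hypothetical marginal mean mu(t) = B0 + B1 * t with identity link.
B0, B1, SIGMA = 0.5, -0.005, 0.5

def obs_prob(resid):
    # Outcome-dependent observation: high residuals are seen four times as often.
    return 0.6 if resid > 0 else 0.15

ts, ys, ws = [], [], []
for _ in range(20000):
    t = random.uniform(0.0, 48.0)
    e = random.gauss(0.0, SIGMA)
    if random.random() < obs_prob(e):
        ts.append(t)
        ys.append(B0 + B1 * t + e)
        ws.append(1.0 / obs_prob(e))          # inverse "visit probability" weight

def solve_wgee(ts, ys, ws):
    # With identity link and working independence, the weighted GEE reduces
    # to weighted least squares: solve the 2x2 normal equations directly.
    sw = sum(ws)
    swt = sum(w * t for w, t in zip(ws, ts))
    swtt = sum(w * t * t for w, t in zip(ws, ts))
    swy = sum(w * y for w, y in zip(ws, ys))
    swty = sum(w * t * y for w, t, y in zip(ws, ts, ys))
    det = sw * swtt - swt * swt
    return ((swtt * swy - swt * swty) / det,  # intercept estimate
            (sw * swty - swt * swy) / det)    # slope estimate

b0_w, b1_w = solve_wgee(ts, ys, ws)                 # weighted: near (B0, B1)
b0_n, b1_n = solve_wgee(ts, ys, [1.0] * len(ts))    # unweighted "naive" fit
```

Under these settings the unweighted fit inflates the intercept by roughly E[e | observed] ≈ 0.24 while the weighted fit is consistent; with a non-identity link, the same weighted equations would be solved iteratively by Fisher scoring, as in the weighted GLM fits mentioned above.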
The standard errors of $\hat\beta$ can be obtained via the bootstrap. For large sample sizes, we provide a formula for the calculation of the asymptotic standard errors of $\hat\beta$ in Section 4, which can also be programmed with additional effort.

4. Large Sample Theory

The consistency of $\hat\beta(c)$ follows, under mild regularity conditions, from the fact that $E[U(O; \beta_0, \gamma_0, \lambda_0; c)] = 0$ under (1), (3) and (5), as shown in Section 2.4, and from the consistency of our estimators of $\gamma_0$ and $\lambda_0$. Consistency of $\hat\beta(\hat c)$ also follows, since the consistency of $\hat\beta(c)$ does not rely on the choice of $c(t, Z; \beta_0)$, although we choose to use the consistent estimators $\tilde\lambda_0^{\rm null}$ and $\hat\gamma^{\rm null}$ in $\hat\beta(\hat c)$. Using arguments similar to those in Robins, Mark and Newey (1992), the large-sample distribution of $\hat\beta(\hat c)$ is the same as that of $\hat\beta(c)$, because the covariates Z used in estimating $\lambda^{\rm null}(t, F^{\rm obs}(t-))$ from (6) are also included in the marginal model (1) for estimating $\beta_0$.

We now consider the large-sample distribution of $\sqrt{n}\{\hat\beta(c) - \beta_0\}$. Suppose for the moment that $\lambda_0$ were finite-dimensional and that $\gamma_0$ and $\lambda_0$ were estimated by parametric maximum likelihood. Then, under regularity conditions,
$$\sqrt{n}\{\hat\beta(c) - \beta_0\} = \frac{1}{\sqrt{n}} \sum_{i=1}^n IF(O_i; \beta_0, \gamma_0, \lambda_0; c) + o_P(1),$$
where $IF(O; \beta_0, \gamma_0, \lambda_0; c)$ is the influence function of $\hat\beta(c)$ as the solution to estimating equation (9). It can be shown, using a mean-value-expansion-type argument or the geometry of Euclidean space as in Robins et al. (1995), that the influence function is equal to
$$IF(O; \beta_0, \gamma_0, \lambda_0; c) = -\left\{\left.\frac{\partial}{\partial\beta} E[U(O; \beta, \gamma_0, \lambda_0; c)]\right|_{\beta=\beta_0}\right\}^{-1} \Big( U(O; \beta_0, \gamma_0, \lambda_0; c) - \Pi[U(O; \beta_0, \gamma_0, \lambda_0; c) \mid T] \Big), \qquad (10)$$
where T is the linear subspace spanned by the scores for $\gamma_0$ and $\lambda_0$, and $\Pi[\cdot \mid T]$ denotes projection onto T. The second term on the right-hand side of (10) is the residual of $U(O; \beta_0, \gamma_0, \lambda_0; c)$ after projection onto T.
Technically, this result only requires that the estimators of $\gamma_0$ and $\lambda_0$ converge at rates faster than $n^{1/4+\delta}$ for some $\delta > 0$ (Newey, 1990). When $\lambda_0(t)$ is infinite-dimensional, Newey (1990), Bickel et al. (1993), van der Vaart (1998), and van der Laan and Robins (2003) show, under mild regularity conditions, that result (10) still applies, where T is the tangent set defined as the mean-square closure of the linear subspace spanned by the score for $\gamma_0$ and the scores for $\lambda_0(\cdot)$ from all parametric submodels. Since our kernel estimator $\tilde\lambda_0(\cdot)$ is a function of the NPML Breslow estimator, by controlling its bandwidth $b_n$ so that $b_n \to 0$, $n b_n \to \infty$ and $\limsup n^{1/5} b_n < \infty$ as $n \to \infty$, we achieve a convergence rate of $n^{3/10}$ for $\tilde\lambda_0$, which is faster than $n^{1/4}$ (Andersen et al., 1993). Since $\hat\beta(\hat c)$ and $\hat\beta(c)$ share the same influence function, we have
$$\sqrt{n}\{\hat\beta(\hat c) - \beta_0\} \xrightarrow{D} \mathrm{Normal}\big(0,\ E[IF(O; \beta_0, \gamma_0, \lambda_0; c)\,IF(O; \beta_0, \gamma_0, \lambda_0; c)^T]\big),$$
with the influence function $IF(O; \beta_0, \gamma_0, \lambda_0; c)$ also of the form (10). In the Appendix, we derive the tangent set T and show that the projection is
$$\Pi[U(O; \beta_0, \gamma_0, \lambda_0; c) \mid T] = E[U(O; \beta_0, \gamma_0, \lambda_0; c)\, S(O; \gamma_0, \lambda_0)^T]\, E[S(O; \gamma_0, \lambda_0)\, S(O; \gamma_0, \lambda_0)^T]^{-1}\, S(O; \gamma_0, \lambda_0)$$
$$\quad + \int_0^\tau \frac{E[U(O; \beta_0, \gamma_0, \lambda_0; c)\, dM(O, t; \gamma_0, \lambda_0)]}{\lambda_0(t)\, E[Y(t) \exp\{\gamma_0^T H(t, F^{\rm obs}(t-))\}]}\, dM(O, t; \gamma_0, \lambda_0), \qquad (11)$$
where
$$S(O; \gamma_0, \lambda_0) = \int_0^\tau \left[ H(t, F^{\rm obs}(t-)) - \frac{E[Y(t)\, H(t, F^{\rm obs}(t-)) \exp\{\gamma_0^T H(t, F^{\rm obs}(t-))\}]}{E[Y(t) \exp\{\gamma_0^T H(t, F^{\rm obs}(t-))\}]} \right] dM(O, t; \gamma_0, \lambda_0), \qquad (12)$$
and $M(O, t; \gamma_0, \lambda_0) = N(t) - \int_0^t Y(s) \exp\{\gamma_0^T H(s, F^{\rm obs}(s-))\}\, d\Lambda_0(s)$ is the counting process martingale. The first term on the right-hand side of (11) is the projection of $U(O; \beta_0, \gamma_0, \lambda_0; c)$ onto the linear subspace spanned by the score $S(O; \gamma_0, \lambda_0)$ for $\gamma_0$, and the second term is the projection onto the space spanned by the scores for $\lambda_0(\cdot)$ from all parametric submodels.
We estimate the influence function $IF(O; \beta_0, \gamma_0, \lambda_0; c)$ given by (10) as
$$\widehat{IF}(O; \hat\beta, \hat\gamma, \tilde\lambda_0; \hat c) = -\left\{ \sum_{j=1}^n \left.\frac{\partial}{\partial\beta} U(O_j; \beta, \hat\gamma, \tilde\lambda_0; \hat c)\right|_{\beta=\hat\beta} \right\}^{-1} \Big( U(O; \hat\beta, \hat\gamma, \tilde\lambda_0; \hat c) - \hat\Pi[U(O; \hat\beta, \hat\gamma, \tilde\lambda_0; \hat c) \mid T] \Big),$$
where
$$\hat\Pi[U(O; \hat\beta, \hat\gamma, \tilde\lambda_0; \hat c) \mid T] = \Big\{ \sum_{j=1}^n U(O_j; \hat\beta, \hat\gamma, \tilde\lambda_0; \hat c)\, \hat S(O_j; \hat\gamma, \tilde\lambda_0)^T \Big\} \Big\{ \sum_{j=1}^n \hat S(O_j; \hat\gamma, \tilde\lambda_0)\, \hat S(O_j; \hat\gamma, \tilde\lambda_0)^T \Big\}^{-1} \hat S(O; \hat\gamma, \tilde\lambda_0)$$
$$\quad + \int_0^\tau \frac{\sum_{j=1}^n U(O_j; \hat\beta, \hat\gamma, \tilde\lambda_0; \hat c)\, dM(O_j, t; \hat\gamma, \tilde\lambda_0)}{\tilde\lambda_0(t) \sum_{j=1}^n Y_j(t) \exp\{\hat\gamma^T H(t, F_j^{\rm obs}(t-))\}}\, dM(O, t; \hat\gamma, \tilde\lambda_0)$$
and
$$\hat S(O; \hat\gamma, \tilde\lambda_0) = \int_0^\tau \left[ H(t, F^{\rm obs}(t-)) - \frac{\sum_{j=1}^n Y_j(t)\, H(t, F_j^{\rm obs}(t-)) \exp\{\hat\gamma^T H(t, F_j^{\rm obs}(t-))\}}{\sum_{j=1}^n Y_j(t) \exp\{\hat\gamma^T H(t, F_j^{\rm obs}(t-))\}} \right] dM(O, t; \hat\gamma, \tilde\lambda_0).$$
That is, in (11) and (12) above, the expectations are replaced by the corresponding sample averages, and the unknown quantities $(\beta_0, \gamma_0, \lambda_0(\cdot), c)$ are replaced by their consistent estimators.

5. Analysis of HUD-VASH Data

5.1. Data Description

Delivery of effective services to homeless people with serious psychiatric and/or addictive disorders has been a major challenge, in large part because of the need for assistance from multiple agencies in multiple service domains, including housing, psychiatric and substance abuse treatment, income support, and social and vocational rehabilitation. Decisions in medicine, public health, and social services rely critically on appropriate evaluation of competing treatments and services. As described in the Introduction, a clinical trial of three housing services was conducted within the HUD-VASH program. In service group (i), the full HUD-VASH intervention, subjects were offered both housing vouchers for rent subsidies and intensive case management in an integrated program. These vouchers allow payment of a standardized local fair market rent, less 30% of the individual beneficiary's income.
The case management intervention promoted active liaisons between clients and their local Public Housing Authority and also eased the transition to independent living by helping clients locate an apartment, negotiate the lease, or furnish the apartment. Subjects randomized to group (ii) received case management only, without a voucher. Group (iii), standard care, consisted of short-term broker case management. The randomization among the three services was weighted so that half as many veterans were assigned to case management alone as to each of the other two groups, to assure that the vouchers would be used in a timely fashion. Recruitment took place from 1992 to 1995. All subjects were followed for at least four years. The time scale for analysis was months since enrollment. The primary time-dependent outcome measure, P(t), was the percentage of days homeless in the past three months. The main goal of the analysis is to compare the intervention-specific mean outcome trajectories over four years. With this in mind, we take Z, the covariates in the marginal outcome regression model, to be the vector of indicator variables $(Z_{(i)}, Z_{(ii)}, Z_{(iii)})^T$, where $Z_{(\kappa)}$ is the indicator of assignment to intervention group (κ). Auxiliary time-dependent factors collected during the study included income (in thousands of dollars) in the past three months, income(t); the Lehman measure of quality of living situation, qliv(t); and whether social security or VA benefits were received during the past three months, benefit(t).

In the top panel of Figure 1, we present the intervention-specific average of the primary homelessness outcome for subjects reporting at baseline and within six-month intervals thereafter. In the bottom panel of Figure 1, we present the intervention-specific average cumulative number of visits as a function of time since enrollment. A crude view of the data suggests that the full HUD-VASH program has the lowest level of homelessness and the highest level of visiting.
The other groups appear to be comparable in terms of homelessness, with the standard care group reporting the lowest level of visiting.

5.2. Results from the HUD-VASH Homeless Study
We first fit the intensity models for the visit process. For the predictor-adjusted intensity model (5), the two indicators of intervention groups (i) and (ii), Z(i) and Z(ii), were included as fixed covariates; additional time-dependent covariates, with possible interactions with Z(i), Z(ii) and Z(iii), were also included if significant at the 0.1 level. The predictors for (5) that we ended up with are

$$
H(t, F^{obs}(t-)) = \big( Z_{(i)},\ Z_{(ii)},\ \mathrm{income}(0),\ \mathrm{benefit}(0),\ P^{obs}(t-),\ \{Z_{(i)} + Z_{(ii)}\} \times \mathrm{qliv}^{obs}(t-),\ Z_{(iii)} \times \mathrm{qliv}^{obs}(t-),\ Z_{(i)} \times N(t-),\ Z_{(ii)} \times N(t-),\ Z_{(iii)} \times N(t-) \big)^T.
$$

The top panel of Table 1 presents the estimates of γ₀ and their associated standard errors. We see that higher baseline income is associated with decreased intensity of visiting, while receiving any Social Security or VA benefits significantly increases the intensity of visiting. The most recently reported outcome, the percentage of days homeless in the past three months, is positively associated with the intensity of visiting. Members of groups (i) and (ii) have a higher intensity of visiting than those of group (iii). A higher cumulative number of visits (N(t−)) and higher quality of living are both associated with increased intensity of visiting, and their effects are service specific. For the null intensity model (6), we included only Z(i) and Z(ii); that is, we took H^null(t, Z) = (Z(i), Z(ii))^T. The bottom panel of Table 1 presents the estimates of γ₀^null and their associated standard errors. Under the null model, we see that members of group (i) have the highest visit intensity, followed by group (ii) and then group (iii).
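Once both intensity models are fitted, a weight can be formed at each observed visit time. The sketch below is a hypothetical helper, not the paper's code: we assume the weight takes the stabilized form of the ratio of the null-model intensity to the predictor-adjusted intensity, each of the proportional-intensity form λ₀(t) exp{γᵀH}, mirroring the paper's use of both a null and a predictor-adjusted model.

```python
import numpy as np

def visit_weight(gamma_hat, H_t, gamma_null_hat, H_null_t,
                 lam0_t, lam0_null_t):
    """Assumed stabilized inverse intensity-of-visit weight at one
    visit time t: null-model intensity over predictor-adjusted
    intensity, each of Cox form lambda0(t) * exp(gamma' H)."""
    lam_adjusted = lam0_t * np.exp(np.dot(gamma_hat, H_t))
    lam_null = lam0_null_t * np.exp(np.dot(gamma_null_hat, H_null_t))
    return lam_null / lam_adjusted
```

Under this form, a subject whose covariate history makes a visit unusually likely (large γᵀH) is down-weighted, so outcome values that are over-represented in the observed data count for less in the estimating equations.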
The Epanechnikov kernel was used to obtain smoothed baseline visit intensities λ̃₀(t) and λ̃₀^null(t) from the Breslow estimates of the cumulative baseline intensity, as described in (7). Tail modifications were employed, as suggested by Gasser and Müller (1979). The bandwidth was fixed at 6.93 months to attain a relatively small mean integrated squared error (MISE) for λ̃₀(t) and λ̃₀^null(t) while assuring that the baseline intensities look smooth under visual inspection. For the marginal outcome regression model, we set the mean of P(t) as

$$
\mu(t, Z; \beta_0) = \mathrm{logit}^{-1}\big\{ Z_{(i)} \beta_{0,(i)}^T B_d(t) + Z_{(ii)} \beta_{0,(ii)}^T B_d(t) + Z_{(iii)} \beta_{0,(iii)}^T B_d(t) \big\},
$$

where logit is the logistic link function, B_d(t) is a B-spline basis with d degrees of freedom, β₀,(κ) is a d × 1 vector of unknown regression parameters for intervention group (κ), β₀ = (β₀,(i)^T, β₀,(ii)^T, β₀,(iii)^T)^T, and we take d = 4 in our particular example. We take the unscaled variance function of P(t) as Σ(t, µ(t, Z; β₀), α) = µ(t, Z; β₀){1 − µ(t, Z; β₀)}. The rationale for this choice of variance function is as follows: since the number of days homeless in the past 90 days, which we denote A(t), can be regarded as Binomial(90, µ(t, Z; β₀)) with Var{A(t)|Z} = 90 µ(t, Z; β₀){1 − µ(t, Z; β₀)}, the variance of P(t) = A(t)/90 is Var{P(t)|Z} = Var{(1/90)A(t)|Z} = (1/90) µ(t, Z; β₀){1 − µ(t, Z; β₀)}. The estimated curves of P(t) for each of the three service groups, using the usual GEE analysis and using our inverse intensity of visiting weighted estimation approach, are presented in the top and bottom panels of Figure 2, respectively. With the usual GEE analysis, it can be seen that service group (i), "case manager + voucher", has a much lower percentage of homeless days than the other two groups.
The profiles for service groups (ii) and (iii) tend to be alike until the later years of the trial, where service group (ii) has a higher percentage of homeless days than group (iii). This suggests that the addition of a case manager may provide little help in reducing homelessness. The results under the usual GEE analysis closely match the observed homeless profile in the upper panel of Figure 1. When the visit process is accounted for using our proposed method, the estimated percentage of homeless days for group (ii) (case manager only) decreases steadily with time (bottom panel of Figure 2) and is much lower than that for group (iii) after about one year. Service groups (i) and (ii) both have a lower estimated percentage of homeless days with our weighted analysis than with the usual GEE analysis, while the profile of homelessness for group (iii) does not seem to differ. From the intensity model for visiting (5), we learned that subjects who were worse off (more homelessness, lower incomes, more benefits, poorer quality of living situation) tended to have an increased intensity of visiting. This suggests that the level of homelessness in the observed data tends to be biased upward. Thus, the decrease in the mean profiles for homelessness from the GEE to the weighted analysis makes intuitive sense. For each service group, we use the area under the estimated outcome curve to quantify the overall level of homelessness over the four-year follow-up. The area under the curve (AUC) is interpreted as the expected number of homeless months over the 48-month period. The AUCs were estimated using β̂ with numerical integration. The group-specific estimates of AUC and associated 95% confidence intervals (C.I.) are presented in the top panel of Table 2. As expected, we see that the GEE estimates are inflated relative to the weighted estimates.
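The AUC computation just described reduces to numerical integration of the fitted mean curve over the 48-month window. A minimal sketch, with a hypothetical flat example curve standing in for the fitted µ(t):

```python
import numpy as np

def auc_months(mu_vals, t_grid):
    """Trapezoidal area under a mean-proportion curve mu(t) over a grid
    of months: the expected number of homeless months on that window."""
    dt = np.diff(t_grid)
    mids = 0.5 * (mu_vals[:-1] + mu_vals[1:])
    return float(np.sum(dt * mids))

t = np.linspace(0.0, 48.0, 481)        # months 0, 0.1, ..., 48
mu_flat = np.full_like(t, 0.20)        # hypothetical flat 20% curve
# a flat 20% curve gives 0.20 * 48 = 9.6 expected homeless months
```

In the analysis, µ(t) would be evaluated from the fitted B-spline model at each grid point before integrating.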
The difference between the GEE and weighted estimates ranges from a low of about 1 week for group (iii) to a high of 2.6 months for group (ii). We calculated confidence intervals for the AUCs using the delta method, via the calculation of the asymptotic variance of β̂ presented in Section 4, and also via the bootstrap (200 re-samples). It can be seen for our data that the confidence intervals for the AUC based on the asymptotic variance calculation are much wider than those from the bootstrap. Based on the simulation studies of Section 6, which investigate the sample sizes required to obtain reliable estimates of the asymptotic standard errors and evaluate the finite-sample coverage rates of the asymptotic and bootstrap confidence intervals, we view the bootstrap confidence intervals as a more reliable representation of variability for datasets of this sample size. We parsimoniously estimate intervention effects by taking the difference in AUC between the group-specific homelessness curves. This measure represents the difference in the expected number of months homeless over the four-year period. We denote the intervention effect of group (κ) versus (κ*) by ∆(κ),(κ*) and estimate it by ∆̂(κ),(κ*) using numerical integration. For our study, a beneficial effect of intervention (κ) versus (κ*) is signaled by a negative ∆̂(κ),(κ*). In the bottom panel of Table 2, we present the pairwise treatment effects along with 95% confidence intervals. As above, we present both large sample delta method-based and bootstrap confidence intervals, and we rely on the bootstrap intervals for our sample size. The treatment effect estimates differ between the GEE and our weighted approach. In comparing groups (i) vs. (ii) and (i) vs. (iii), the estimates are in the same direction, favoring the full HUD-VASH program.
In comparing groups (ii) and (iii), the GEE estimate favors standard care, while the estimate from the weighted approach favors the case-manager-alone intervention (these findings also agree with Figure 2). The results of hypothesis testing are the same under the GEE and weighted approaches: the full HUD-VASH program is superior to the standard care and case-manager-alone groups, and the null hypothesis of no treatment effect cannot be rejected when comparing groups (ii) and (iii). Our data analysis shows that taking the visit process into account markedly changed the marginal profile of homelessness. In Section 6, we conduct simulation studies illustrating that the bias of the usual GEE analysis, when it exists, can be corrected with our inverse intensity of visiting method.

6. Simulation Studies
We conducted two sets of simulation studies based on a single treatment group. The focus of the first set of simulations was to evaluate the bias, variance, and mean-squared error (MSE) of the mean outcome profile curves and the AUC using the usual GEE and our visit intensity weighted estimators. In the second set of simulations, we vary the sample size to evaluate the performance of our large sample-based estimator of the standard error of our AUC estimator and the coverage rates of large sample-based confidence intervals. We also examine the coverage rate of a bootstrap-based confidence interval for sample sizes comparable to those in the HUD-VASH evaluation. In all simulations, 200 datasets were generated.

6.1. Simulation Procedure
For each subject, we simulate data for 1,531 days (= 4 years and 3 months), with the first 90 days regarded as "pre-baseline" and the 91st day regarded as the 1st day of study. Letting t = −90, −89, . . .
, 1431 index the days, the procedure for generating data for one subject is as follows:

(a) A time-dependent auxiliary factor X(t) was simulated from a Normal distribution with mean −1.5 + 0.066t and variance 0.25.

(b) A 1531-dimensional multivariate binary response, P*(t), was simulated according to the procedure described by Lunn and Davies (1998), using a compound symmetry correlation coefficient of 0.6 for the logistic models:

logit E[P*(t)|X(t)] = −0.5 − 0.075t + 0.001t² − 0.25X(t),   (13a)
logit E[P*(t)] = −0.5 − 0.075t + 0.001t².   (13b)

Equations (13a) and (13b) were used for the first and second sets of simulation studies, respectively. Cumulative averages of the binary response P*(t) over the past 90 days were calculated for each day at and after the 1st day of the study, i.e., P(t−) = Σ_{s=t−90}^{t−1} P*(s)/90 for 1 ≤ t ≤ 1431. The resulting percentages are regarded as the "true" longitudinal outcome for the subject.

(c) A subject's visit times were simulated via the method for point processes described by Daley, Vere-Jones and Smirnov (2002), using one of the following visit intensity functions:

λ(t, F^obs(t−)) = 0.005 exp{−0.001t},   (14a)
λ(t, F^obs(t−)) = 0.005 exp{−0.001t + 1.0 P^obs(t−)},   (14b)
λ(t, F^obs(t−)) = 0.005 exp{−0.0003t + 1.0 P^obs(t−) − 0.75 X^obs(t−)}   (14c)

for t = 1, . . . , 1431 days. Equations (14a)-(14c) were used for the first set of simulations, while only equation (14b) was used for the second set of simulations.

6.2. Evaluation of the Bias, Variance, and MSE
In these simulations, we fixed the sample size at n = 250 and generated the true outcomes under (13a). The true mean outcome trajectory is represented by the black solid lines in Figure 3, and the corresponding true AUC was 9.63. The three graphs in Figure 3 correspond to the three different true visit process models (14a)-(14c) under which the resulting observed outcome data are generated, respectively.
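Step (c) can be implemented by Lewis-Shedler thinning of a dominating homogeneous Poisson process. The sketch below uses intensity (14b); the helper name and the constant-outcome test path are ours, not from the paper:

```python
import numpy as np

def simulate_visits(P_daily, tau=1431, lam0=0.005, a=-0.001, b=1.0, seed=0):
    """Draw one subject's visit times on (0, tau] by thinning, for
    lambda(t) = lam0 * exp(a*t + b*P(t-)) as in model (14b);
    P_daily[d] holds the (lagged) outcome proportion for day d."""
    rng = np.random.default_rng(seed)
    # since a <= 0 and P lies in [0, 1], lam0 * exp(b) bounds the intensity
    lam_max = lam0 * np.exp(b)
    visits, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam_max)
        if t > tau:
            return np.array(visits)
        lam_t = lam0 * np.exp(a * t + b * P_daily[int(t)])
        if rng.uniform() < lam_t / lam_max:   # accept with prob lam_t / lam_max
            visits.append(t)
```

In the full simulation, P_daily would be the subject's own lagged 90-day average generated in step (b), so that visiting feeds back on the outcome history.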
Models (14a)-(14c) reflect different degrees to which the visit process depends on the observed outcome P^obs(t) and the prognostic factor X^obs(t). For a dataset generated under one of the three true visit intensity models (14a)-(14c), we estimated the visit intensities with the following four choices of H(t, F^obs(t−)) in model (5), one at a time:

(5a) H(t, F^obs(t−)) = P^obs(t−),
(5b) H(t, F^obs(t−)) = X^obs(t−),
(5c) H(t, F^obs(t−)) = (P^obs(t−), X^obs(t−))^T,
(5d) H(t, F^obs(t−)) = 0.

The estimator using (5d) is equivalent to the usual GEE estimator. We also set H^null(t) in (6) equal to zero, since we only simulated data for one treatment group, and estimated the null intensity. We then estimated the mean outcome trajectory using the marginal outcome regression model E[P(t)] = µ(t; β₀) = logit⁻¹{β₀^T B₄(t)} with the inverse visit intensity weighted approach. Under the true null visit intensity model (14a), where visiting is unrelated to the outcome, our simulation results show that all four estimated outcome curves using fitted intensity models (5a)-(5d) are unbiased (top panel of Figure 3), with the bias estimates of the AUC given in the first line of Table 3. We also see that all four estimators of the AUC are comparable in terms of variance and MSE (2nd and 3rd lines of Table 3). Under the true visit model (14b), where visiting is related to the past observed outcome, the fitted intensity models (5a) and (5c) both yielded unbiased curves and estimates of the AUC, with (5c) giving a slightly better result. The fitted models (5b) and (5d) yielded biased results, with (5d) being the worst (middle panel of Figure 3 and the 4th line of Table 3). In order of increasing bias, the fitted models rank (5c), (5a), (5b), (5d). All the estimators also have comparable variance (5th line of Table 3).
We see that the estimator using (5c) has a slightly lower MSE than the one using (5a), while the GEE estimator and the weighted estimator using an incorrectly specified intensity model have poor MSE (6th line of Table 3). Under the true visit model (14c), where visiting depends on both the observed past outcome and the auxiliary factor, the order of increasing bias for the four fitted intensity models is again (5c), (5a), (5b), (5d), as shown in the bottom graph of Figure 3 for the estimated curves and the 7th line of Table 3 for the AUCs. All four estimators have comparable variance (8th line of Table 3). In terms of MSE, the estimator using (5c) attains about half the MSE of the estimator using (5a), while the usual GEE estimator using (5d) and the weighted estimator using intensity model (5b), which accounts only for the auxiliary factor, have poor MSE (9th line of Table 3). These simulations demonstrate that when the visit process depends on the outcome and/or other auxiliary factors, the usual GEE produces biased estimates. The bias is corrected by the inverse visit intensity weighted approach proposed in this article, in which the weights are estimated from a visit process intensity model using the past outcome and possibly auxiliary prognostic factors as time-varying covariates. Including the auxiliary covariates in the visit intensity model helps to further reduce the bias, regardless of whether or not the true intensity depends on them. When the visit process is unrelated to the outcome, both the GEE and our weighted approach yield unbiased estimates and comparable MSEs.

6.3. The Effects of Varying Sample Size on the Asymptotic and Bootstrap Variance and Coverage
We generated true longitudinal outcome data according to model (13b), with a true AUC of 9.5, for n subjects. The resulting observed outcome data were obtained using the true visit intensity model (14b), in which the visit process depends on the observed outcome P^obs(t).
For each resulting simulated dataset, we estimated the visit intensity by fitting intensity model (5a): H(t, F^obs(t−)) = P^obs(t−). We also set H^null(t) in (6) equal to zero and estimated the null intensity. We then estimated the mean AUC using the marginal outcome model E[P(t)] = µ(t; β₀) = logit⁻¹{β₀^T B₄(t)} with the inverse visit intensity weighted approach. We varied the sample size n for the simulated datasets from 50 to 2000. Regardless of the sample size, our weighted estimator has minimal bias (2nd column of Table 4). When we compare the average of the estimated standard errors of the AUC based on large sample theory to the Monte-Carlo standard deviation of the estimated AUC, we see that our asymptotic estimator of the standard error begins to perform well relative to the Monte-Carlo standard deviation when the sample size reaches about 1000 (3rd, 5th, and last columns of Table 4). The coverage rates of large sample theory-based 95% confidence intervals for the true AUC are conservative for sample sizes below 1000 but perform well for sample sizes of 1000 or higher (2nd to last column of Table 4). This suggests that, for the sample sizes in the HUD-VASH study, large sample confidence intervals may be too wide. In our simulations, the coverage rates of 95% bootstrap confidence intervals (based on 200 re-samples for each generated dataset) for sample sizes 50, 80 and 150 were evaluated to be 92.5%, 94.5% and 95%, respectively (4th column of Table 4). This suggests that the bootstrap is a valuable tool for sample sizes of around 80 or above. With these results, we suggest computing bootstrap confidence intervals for small or moderately sized samples; for relatively large samples, results from the bootstrap and the asymptotic calculation are virtually the same.
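The bootstrap used throughout resamples whole subjects with replacement. A minimal sketch follows; in the analysis each resample would refit the weighted estimating equations, so the `estimator` argument here is a deliberately abstract stand-in:

```python
import numpy as np

def bootstrap_ci(subjects, estimator, n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap CI from n_boot resamples of subjects drawn
    with replacement; `estimator` maps a list of per-subject data to a
    scalar summary such as the AUC."""
    rng = np.random.default_rng(seed)
    n = len(subjects)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)            # resample subject indices
        reps.append(estimator([subjects[i] for i in idx]))
    alpha = 100.0 * (1.0 - level) / 2.0
    lo, hi = np.percentile(reps, [alpha, 100.0 - alpha])
    return float(lo), float(hi)
```

Resampling at the subject level preserves each subject's entire visit and outcome history, which is what keeps the within-subject dependence intact across resamples.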
7. Discussion
In this paper, we proposed a class of consistent and asymptotically normal estimators for the parameters of marginal regression models for longitudinal outcomes where follow-up on subjects is highly irregular and potentially informative. Our method extends the weighted estimating equations approach of Robins et al. (1995) to settings of irregularly spaced longitudinal data in which the measurement (i.e., visit) process may depend on the outcomes or auxiliary prognostic factors. Our semiparametric method does not require specification of the complete data (outcomes, possible auxiliary factors, and visits) likelihood and will be particularly useful for analyzing non-Gaussian data. Our method does require correct specification of a visit process intensity model. We suggest that all available prognostic covariates be considered for use in such an intensity model, so that the bias due to the informative visit process can be corrected to the greatest extent possible. Although we did not observe variance inflation of β̂ due to an increase in the number of covariates in the visit intensity model, careful attention should be paid to a possible bias-variance trade-off. It is important to highlight that the extension of our model to simultaneously accommodate informative dropout is straightforward: we would posit predictor-adjusted and null proportional hazards models for dropout. In the numerator (denominator) of the integrand of an individual's estimating function given as (8), we would put the estimated conditional probability of not having dropped out by time t under the null (predictor-adjusted) model. However, we think that, in some settings, additional modeling of dropout may not be necessary, since the information on dropout may already be captured in the visit process. The marginal outcome model is also easily extended to include external time-dependent covariates (Kalbfleisch and Prentice, 2002).
In this paper, we have not formally considered issues of efficiency. For future work, we plan to investigate locally efficient estimators of β₀ (van der Laan and Robins, 2003). It would also be useful to develop a sensitivity analysis methodology, along the lines of Rotnitzky et al. (1998) and Scharfstein et al. (1999), to evaluate the sensitivity of inference to departures from the sequential randomization assumption.

Acknowledgments
We appreciate the help of Wen Liu-Mares with data management for this project. We thank Mark van der Laan for useful discussions related to the derivation of the influence function for our weighted estimator.

A. Derivation of Tangent Sets and Projections
Here, we first derive the tangent sets for γ and λ₀(·) in the semiparametric model (5) and then derive the projection of U(O; β₀, γ₀, λ₀; c) onto these tangent sets. The tangent sets are closed linear subspaces of a Hilbert space consisting of all p-dimensional mean-zero random vectors with finite variance, equipped with the covariance inner product. Since λ₀(·) is infinite dimensional, we derive the tangent sets through parametric submodels. A parametric submodel under model (5) corresponds to a parameterization λ(t, F^obs(t−); ω, γ) such that for some (ω₀, γ₀), λ(t, F^obs(t−); ω₀, γ₀) equals λ(t, F^obs(t−)); that is, a parametric submodel contains the truth. We consider the following parametric submodel for (5):

$$ \lambda(t, F^{obs}(t-); \omega, \gamma) = \lambda(t; \omega) \exp\{ \gamma^T H(t, F^{obs}(t-)) \}. $$

For example, if we take λ(t; ω) = λ₀(t) exp{ω₁h₁(t) + · · · + ω_r h_r(t)}, where ω = (ω₁, . . . , ω_r)^T is an r × 1 vector and h₁(t), . . . , h_r(t) are r different smooth functions, then the truth is obtained by setting ω = ω₀ = 0 and γ = γ₀. The log-likelihood for the above parametric submodel is proportional to

$$ \int_0^\tau \log\big[ \lambda(t; \omega) \exp\{ \gamma^T H(t, F^{obs}(t-)) \} \big] \, dN(t) - \int_0^\tau \lambda(t; \omega) \exp\{ \gamma^T H(t, F^{obs}(t-)) \} Y(t) \, dt. $$

The resulting scores for γ and ω, evaluated at the truth (γ₀, ω₀), are equal to

$$ \int_0^\tau H(t, F^{obs}(t-)) \, dM(t) \quad \text{and} \quad \int_0^\tau \frac{ \frac{\partial}{\partial \omega} \lambda(t; \omega) \big|_{\omega = \omega_0} }{ \lambda_0(t) } \, dM(t), $$

respectively, where M(t) ≡ M(O, t; γ₀, λ₀) = N(t) − ∫₀^t Y(s)λ₀(s) exp{γ₀^T H(s, F^obs(s−))} ds. By ranging over parametric submodels, the second integrand above can be made any p-dimensional function of t by pre-multiplying by a conformable matrix. Thus, the tangent set, T = T₁ + T₂, is defined through

$$ T_1 = \Big\{ Q \int_0^\tau H(t, F^{obs}(t-)) \, dM(t) : Q \text{ is any conformable matrix with } p \text{ rows} \Big\}, $$

$$ T_2 = \Big\{ \int_0^\tau g(t) \, dM(t) : g(t) \text{ is any } p \times 1 \text{ function of } t \Big\}. $$

Now let T₁* denote the set of residuals of the elements of T₁ after projecting onto T₂; then T₁* and T₂ are orthogonal and T = T₁* ⊕ T₂. It is clear that T₂ is the space spanned by the scores for λ₀(·) from all parametric submodels, and it will also become clear below that T₁* is the tangent set generated by the efficient score for γ. To find the projection of elements of T₁ onto T₂, we need to find a p-dimensional measurable function h*(t) such that

$$ E\Big[ \Big\{ Q \int_0^\tau H(t, F^{obs}(t-)) \, dM(t) - \int_0^\tau h^*(t) \, dM(t) \Big\}^T \int_0^\tau g(t) \, dM(t) \Big] = 0 \qquad (15) $$

for all g(t). We use the property of the covariance of martingale stochastic integrals (Fleming and Harrington, 1991; Andersen et al., 1993) to find the solution h*(t) of (15), and it is then straightforward to show that

$$ h^*(t) = Q \, \frac{ E\big[ H(t, F^{obs}(t-)) Y(t) \exp\{ \gamma_0^T H(t, F^{obs}(t-)) \} \big] }{ E\big[ Y(t) \exp\{ \gamma_0^T H(t, F^{obs}(t-)) \} \big] }. $$

To verify this, we plug h*(t) back into the left-hand side of (15), which can easily be checked to be zero, again using the property of stochastic integrals with respect to martingale covariance processes.
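For completeness, the martingale property being invoked can be made explicit (this display is our added sketch of the standard computation). For predictable a(t) and b(t),

$$ E\Big[ \Big\{ \int_0^\tau a(t) \, dM(t) \Big\}^T \Big\{ \int_0^\tau b(t) \, dM(t) \Big\} \Big] = E\Big[ \int_0^\tau a(t)^T b(t) \, Y(t) \lambda_0(t) \exp\{ \gamma_0^T H(t, F^{obs}(t-)) \} \, dt \Big]. $$

Taking a(t) = Q H(t, F^{obs}(t−)) − h*(t) and b(t) = g(t), condition (15) holds for all g(t) exactly when

$$ E\big[ \big\{ Q H(t, F^{obs}(t-)) - h^*(t) \big\} \, Y(t) \exp\{ \gamma_0^T H(t, F^{obs}(t-)) \} \big] = 0 \quad \text{for almost every } t \in [0, \tau], $$

since λ₀(t) is deterministic and cancels; solving these p equations for h*(t) gives the stated expression for h*(t).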
Therefore,

$$ T_1^* = \Big\{ Q \int_0^\tau \bar{h}^*(t) \, dM(t) : Q \text{ is any conformable matrix with } p \text{ rows} \Big\} = \big\{ Q S(O; \gamma_0, \lambda_0) : Q \text{ is any conformable matrix with } p \text{ rows} \big\}, $$

where

$$ \bar{h}^*(t) = H(t, F^{obs}(t-)) - \frac{ E\big[ H(t, F^{obs}(t-)) Y(t) \exp\{ \gamma_0^T H(t, F^{obs}(t-)) \} \big] }{ E\big[ Y(t) \exp\{ \gamma_0^T H(t, F^{obs}(t-)) \} \big] }. $$

It is important to note that ∫₀^τ h̄*(t) dM(t) = S(O; γ₀, λ₀), and S(O; γ₀, λ₀), defined in (12), is the well-known efficient score for γ. We now derive the projection of U(O; β₀, γ₀, λ₀; c) onto T₁* and T₂. To find these projections, we need to find Q* and g*(t) such that

$$ E\Big[ \Big\{ U(O; \beta_0, \gamma_0, \lambda_0; c) - Q^* \int_0^\tau \bar{h}^*(t) \, dM(t) \Big\}^T \Big\{ Q \int_0^\tau \bar{h}^*(t) \, dM(t) \Big\} \Big] = 0, $$

$$ E\Big[ \Big\{ U(O; \beta_0, \gamma_0, \lambda_0; c) - \int_0^\tau g^*(t) \, dM(t) \Big\}^T \int_0^\tau g(t) \, dM(t) \Big] = 0 $$

for all matrices Q and for all g(t), respectively. Similarly to (15), it can be shown that the solutions for Q* and g*(t) are, respectively,

$$ Q^* = E\Big[ U(O; \beta_0, \gamma_0, \lambda_0; c) \Big\{ \int_0^\tau \bar{h}^*(t) \, dM(t) \Big\}^T \Big] \Big( E\Big[ \Big\{ \int_0^\tau \bar{h}^*(t) \, dM(t) \Big\} \Big\{ \int_0^\tau \bar{h}^*(t) \, dM(t) \Big\}^T \Big] \Big)^{-1} = E\big[ U(O; \beta_0, \gamma_0, \lambda_0; c) S(O; \gamma_0, \lambda_0)^T \big] \big\{ E\big[ S(O; \gamma_0, \lambda_0) S(O; \gamma_0, \lambda_0)^T \big] \big\}^{-1} $$

and

$$ g^*(t) = \frac{ E\big[ U(O; \beta_0, \gamma_0, \lambda_0; c) \, dM(t) \big] }{ E\big[ Y(t) \lambda_0(t) \exp\{ \gamma_0^T H(t, F^{obs}(t-)) \} \big] }. $$

Therefore, the projection of U(O; β₀, γ₀, λ₀; c) onto T is

$$ \Pi\big[ U(O; \beta_0, \gamma_0, \lambda_0; c) \mid T \big] = \Pi\big[ U(O; \beta_0, \gamma_0, \lambda_0; c) \mid T_1^* \big] + \Pi\big[ U(O; \beta_0, \gamma_0, \lambda_0; c) \mid T_2 \big], $$

which is given as (11) in Section 4.

References
Albert, P. S., Follmann, D. A., Wang, S. A. and Suh, E. B. (2002) A latent autoregressive model for longitudinal binary data subject to informative missingness. Biometrics, 58, 631–642.
Andersen, P. K., Borgan, Ø., Gill, R. D. and Keiding, N. (1993) Statistical Models Based on Counting Processes. New York: Springer-Verlag.
Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1993) Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hopkins Series in the Mathematical Sciences.
Daley, D. J., Vere-Jones, D.
and Smirnov, B. M. (2002) An Introduction to the Theory of Point Processes: Elementary Theory and Methods. New York: Springer-Verlag, 2nd edn.
Deltour, I., Richardson, S. and Le Hesran, J. Y. (1999) Stochastic algorithms for Markov models estimation with intermittent missing data. Biometrics, 55, 565–573.
Diggle, P. J. and Kenward, M. G. (1994) Informative dropout in longitudinal data analysis (with discussion). Appl. Statist., 43, 49–93.
Fitzmaurice, G. M., Laird, N. M. and Shneyer, L. (2001) An alternative parameterization of the general linear mixture model for longitudinal data with non-ignorable drop-outs. Statist. Med., 20, 1009–1021.
Fleming, T. R. and Harrington, D. P. (1991) Counting Processes and Survival Analysis. New York: John Wiley & Sons.
Follmann, D. and Wu, M. (1995) An approximate generalized linear model with random effects for informative missing data. Biometrics, 51, 151–168.
Gasser, T. and Müller, H. G. (1979) Kernel estimation of regression functions. In Smoothing Techniques for Curve Estimation, 23–68. Berlin: Springer-Verlag. Lecture Notes in Mathematics 757.
Hogan, J. W. and Laird, N. M. (1997) Mixture models for the joint distribution of repeated measures and event times. Statist. Med., 16, 239–257.
Huber, P. J. (1985) Projection pursuit. Ann. Statist., 13, 435–475.
Kalbfleisch, J. D. and Prentice, R. L. (2002) The Statistical Analysis of Failure Time Data. New York: John Wiley & Sons, 2nd edn.
Liang, K.-Y. and Zeger, S. L. (1986) Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
Lipsitz, S. R., Fitzmaurice, G. M., Ibrahim, J. G., Gelber, R. and Lipshultz, S. (2002) Parameter estimation in longitudinal studies with outcome-dependent follow-up. Biometrics, 58, 621–630.
Little, R. J. A. (1995) Modelling the drop-out mechanism in repeated-measures studies. J. Am. Statist. Ass., 90, 1112–1121.
Lunn, A. D. and Davies, S. J. (1998) A note on generating correlated binary variables.
Biometrika, 85, 487–490.
van der Laan, M. J. and Robins, J. M. (2003) Unified Methods for Censored Longitudinal Data and Causality. New York: Springer-Verlag.
McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman & Hall, 2nd edn.
Newey, W. K. (1990) Semiparametric efficiency bounds. J. Appl. Econom., 5, 99–135.
Preisser, J. S., Galecki, A. T., Lohman, K. K. and Wagenknecht, L. E. (2000) Analysis of smoking trends with incomplete longitudinal binary response. J. Am. Statist. Ass., 95, 1021–1031.
Robins, J. M. (1998) Marginal structural models. In 1997 Proceedings of the American Statistical Association, Section on Bayesian Statistical Science, 1–10. American Statistical Association.
Robins, J. M., Mark, S. D. and Newey, W. K. (1992) Estimating exposure effects by modeling the expectation of exposure conditional on confounders. Biometrics, 48, 479–495.
Robins, J. M. and Ritov, Y. (1997) Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models. Statist. Med., 16, 285–319.
Robins, J. M. and Rotnitzky, A. (1992) Recovery of information and adjustment for dependent censoring using surrogate markers. In AIDS Epidemiology - Methodological Issues (eds N. Jewell, K. Dietz and V. Farewell), 297–331. Boston, MA: Birkhäuser.
Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1995) Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J. Am. Statist. Ass., 90, 106–121.
Rosenheck, R. A., Kasprow, W., Frisman, L. K. and Liu-Mares, W. (2002) Cost-effectiveness of supported housing for homeless persons with mental illness. Arch. Gen. Psychiat., in press.
Rotnitzky, A., Robins, J. M. and Scharfstein, D. O. (1998) Semiparametric regression for repeated outcomes with nonignorable nonresponse. J. Am. Statist. Ass., 93, 1321–1339.
Scharfstein, D. O., Rotnitzky, A. and Robins, J. M.
(1999) Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion). J. Am. Statist. Ass., 94, 1096–1120.
Troxel, A. B., Lipsitz, S. R. and Harrington, D. P. (1998) Marginal models for the analysis of longitudinal measurements with nonignorable non-monotone missing data. Biometrika, 85, 661–672.
van der Vaart, A. W. (1998) Asymptotic Statistics. Cambridge: Cambridge University Press.
Wu, M. C. and Carroll, R. J. (1988) Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics, 44, 175–188.

Table 1. Parameter Estimates and Their Standard Errors (SE) in the Intensity Models

Predictor-Adjusted Intensity Model (5)
Covariate                       γ̂        SE(γ̂)
Z(i)                            0.612     0.141
Z(ii)                           0.409     0.144
income(0)                      -0.221     0.050
benefit(0)                      0.113     0.041
P^obs(t-)                       0.141     0.058
{Z(i)+Z(ii)} × qliv^obs(t-)     0.057     0.019
Z(iii) × qliv^obs(t-)           0.108     0.027
Z(i) × N(t-)                    0.086     0.013
Z(ii) × N(t-)                   0.109     0.017
Z(iii) × N(t-)                  0.119     0.015

Null Intensity Model (6)
Covariate                       γ̂^null   SE(γ̂^null)
Z(i)                            0.409     0.043
Z(ii)                           0.207     0.054

Table 2. Estimates of Areas Under Curves and Intervention Effects

Area Under Curve (AUC)
                 GEE                                            Inverse Intensity Weighted GEE
Service   AUC    Asymptotic 95% C.I.   Bootstrap 95% C.I.   AUC    Asymptotic 95% C.I.   Bootstrap 95% C.I.
(i)       5.51   [4.21, 6.80]          [4.15, 6.67]         4.57   [1.89, 7.25]          [3.96, 6.52]
(ii)      10.85  [7.48, 14.22]         [7.96, 14.38]        8.25   [2.19, 14.31]         [6.83, 13.08]
(iii)     10.14  [7.85, 12.43]         [7.90, 12.57]        9.89   [4.23, 15.55]         [7.87, 13.13]

Intervention Effects
                    GEE                                               Inverse Intensity Weighted GEE
(κ), (κ*)    ∆̂      Asymptotic 95% C.I.   Bootstrap 95% C.I.    ∆̂      Asymptotic 95% C.I.   Bootstrap 95% C.I.
(i), (ii)   -5.34   [-6.34, -4.34]        [-8.88, -1.89]       -3.68   [-5.56, -1.80]        [-7.05, -1.62]
(i), (iii)  -4.57   [-8.04, -1.10]        [-7.00, -2.36]       -5.32   [-9.81, -0.83]        [-8.18, -2.20]
(ii), (iii)  0.71   [-3.36, 4.78]         [-3.21, 4.71]        -1.64   [-10.26, 6.98]        [-4.00, 3.33]

Table 3.
Simulation Study on the Effects of the Intensity Models

                                                   Fitted Intensity Model
True Intensity Model      Line                (5a) P(t)   (5b) X(t)   (5c) P(t)+X(t)   (5d) Null
Null model, (14a)          1  sample bias*     0.05        0.05        0.05             0.05
                           2  sample variance* 0.12        0.14        0.13             0.14
                           3  MSE*             0.13        0.14        0.13             0.14
Model with P(t), (14b)     4  sample bias*     0.15        1.37        0.01             1.52
                           5  sample variance* 0.13        0.12        0.13             0.12
                           6  MSE*             0.16        1.98        0.13             2.42
Model with P(t) + X(t),    7  sample bias*     0.37        1.39       -0.01             1.62
(14c)                      8  sample variance* 0.13        0.12        0.13             0.13
                           9  MSE*             0.26        2.04        0.13             2.75
*: The sample bias, variance and MSE are for the AUC (area under the curve); the true value of the AUC is 9.63.

Table 4. Simulation Study on Sample Size for Asymptotic Variance of AUC*
Sample    Average   Monte-Carlo     Bootstrap    Average          Asymptotic
size n    AUC-hat   SD of AUC-hat   coverage**   asymptotic SE    coverage     Ratio***
50        9.48      0.94            92.5%        4.83             100%         5.12
80        9.53      0.72            94.5%        3.61             100%         5.03
150       9.52      0.56            95.0%        2.42             100%         4.35
450       9.45      0.38            -            0.88             100%         2.26
1000      9.47      0.31            -            0.34             96.0%        1.10
1500      9.52      0.20            -            0.21             95.0%        1.04
2000      9.49      0.15            -            0.15             95.0%        0.99
*: The true value of the AUC is 9.5. **: A 95% bootstrap confidence interval was constructed for each simulated dataset of n = 50, 80 and 150 subjects; coverage rates were calculated over the 200 simulated datasets. ***: Ratio of the sample average of the asymptotic standard error to the Monte-Carlo standard deviation.

[Figure 1 here: two panels versus months since randomization (0-50), with curves for (i) voucher + case manager, (ii) case manager only, and (iii) standard care. Top panel: observed P(t), percentage of days homeless. Bottom panel: observed N(t), cumulative number of visits.]
Fig. 1.
Observed Percentage Days Homeless and Cumulative Number of Visits by Service Groups

[Figure 2 here: estimated percentage of days homeless versus months since randomization for the three service groups. Top panel: usual GEE analysis. Bottom panel: analysis accounting for informative follow-up.]
Fig. 2. Percentage of Days Homeless

[Figure 3 here: three panels, one for each true visit intensity model (null model; model with P(t); model with P(t) and X(t)). Each panel shows the true mean profile of P(t) together with the mean profiles from the weighted analyses with P(t) only, with X(t) only, with both P(t) and X(t), and from the unweighted analysis.]
Fig. 3. Simulation Studies