Addressing an idiosyncrasy in estimating survival curves using double-sampling in the presence of self-selected right censoring

Constantine E. Frangakis
Department of Biostatistics, Johns Hopkins University, 615 N. Wolfe St., Baltimore, MD 21205, U.S.A.

and

Donald B. Rubin
Department of Statistics, Harvard University, 1 Oxford St., Cambridge, MA 02138, U.S.A.

In Biometrics (2001, 57: 333-353), with discussion.

SUMMARY. We investigate the use of follow-up samples of individuals to estimate survival curves from studies that are subject to right-censoring from two sources: (i) early termination of the study, namely, administrative censoring, or (ii) censoring due to lost data prior to administrative censoring, so-called dropout. We assume that, for the full cohort of individuals, administrative censoring times are independent of the subjects' inherent characteristics, including survival time. To address the loss to censoring due to dropout, which we allow to be possibly selective, we consider an intensive second phase of the study where a representative sample of the originally lost subjects is subsequently followed and their data recorded. As with double-sampling designs in survey methodology, the objective is to provide data on a representative subset of the dropouts. Despite assumed full response from the follow-up sample, we show that, in general in our setting, administrative censoring times are not independent of survival times within the two subgroups, nondropouts and sampled dropouts. As a result, the stratified Kaplan-Meier estimator is not appropriate for the cohort survival curve. Moreover, using the concept of potential outcomes, as opposed to observed outcomes, and thereby explicitly formulating the problem as a missing data one, reveals and addresses these complications.
We present an estimation method based on the likelihood of an easily observed subset of the data, and study its properties analytically for large samples. We evaluate our method in a realistic situation by simulating data that match published margins on survival and dropout from an actual hip-replacement study. Limitations and extensions of our design and analytic method are discussed.

KEY WORDS: Double-sampling; Dropouts; Loss to follow-up; Potential outcomes; Rubin Causal Model.

1. Introduction

1.1 Motivation. When studies follow subjects over time and investigate a time-to-event outcome $T$, such as survival, the outcome may not be available for all subjects at the end of the study. We focus on the situation with the simultaneous occurrence of: (i) administrative censoring, that is, censoring due to early termination of the study, and (ii) dropout, or loss to follow-up, which occurs when the subject interrupts contact with the investigator before the end of the study and before the survival time is observed. An important example is patients who have surgery in which a human joint, e.g., the hip, is replaced with an artificial one, a prosthesis. Often, the prosthesis wears out and needs replacement, and surgeons are interested in the time, $T$, between the first surgery and the next replacement, that is, the survival time of the prosthesis. Our work is motivated by the large number of dropout patients often occurring after surgery of joints (e.g., Gartland, 1988; Dorey and Amstutz, 1989; Murray, Carr, and Bulstrode, 1993).

Our goal is to estimate the survival function of the cohort of all subjects entering the study. In order to estimate this survival function in the presence of censoring, it is common to assume that entry times into the study do not relate to survival and thus, essentially, that administrative censoring is ignorable in the sense of Rubin (1976, 1978).
Moreover, it is sometimes assumed that dropout is also ignorable, possibly after conditioning on observed covariates (e.g., Dobbs, 1980). Ignorability of dropout then essentially requires that a subject whose survival is censored by dropout after time $t$, say, from entry, has comparable survival to a subject whose survival is censored after time $t$ by administrative censoring. Although ignorability of administrative censoring can be plausible, ignorability of dropout is suspect (Sims, 1973; Austin et al., 1979), and one plan to lessen the reliance on this assumption is to use a random (or representative) sample of the original dropouts. Assume that, for this subset of subjects, the investigator successfully obtains the data intended to have been recorded in the absence of dropout, namely either the actual failure time or knowledge that it would have been administratively censored. Similar two-phase designs are known in the survey literature as double-sampling designs (Cochran, 1963; Glynn, Laird, and Rubin, 1993; Rotnitzky and Robins, 1995; Zanutto, 1998) and the analyses of their data are typically simple when the outcome is either fully observed or fully missing. Problems involving double-sampling with competing risks have been discussed by Hogan and Laird (1996), and Flehinger, Reiser and Yashchin (1998), although under different frameworks, assumptions and goals from ours, whereas Baker, Wax, and Patterson (1993) and Wax, Baker, and Patterson (1993) discussed double-sampling of dropouts assuming discrete survival data and in the absence of competing varying administrative censoring.

1.2 Purpose and outline. We have two aims. First, by formulating an appropriate framework for our problem that links the concept of potential outcomes (Neyman, 1923; Rubin, 1974, 1978) to competing risks, we show how to use the data from the double-sampled subjects to draw inference for the cohort survival function.
Obtaining valid inference using double-sampling with survival data is more subtle than with survey data because of an idiosyncrasy that can render usual analyses incorrect when estimating the cohort survival function. Demonstrating this fact is our second aim, because, to our knowledge, this has not been discussed in the literature. The procedure we focus on needs only a stratification of the intended data, and, for example, does not rely on the times of study entry and of study-dropout for each dropout subject (such times might not be available to the analyst due to confidentiality constraints). When more than the intended data are available for analysis, our procedure will not be fully efficient, and one could construct locally semiparametric efficient estimators (e.g., Robins, Rotnitzky, and Zhao, 1994), or could use fully parametric methods, and it would be interesting in future work to compare these procedures to the proposed method in realistic finite samples.

In the next section, we develop our framework using potential outcomes that are characteristics inherent to subjects. We posit an assumption that treats administrative censoring as a randomization variable, and thus independent of the potential outcomes. This reveals a specific structure on the two subgroups of subjects for whom the only reason for possibly missing data is administrative censoring: the nondropout group, and the recovered subset of the dropouts. In particular, for each of these two groups, it is tempting to apply standard survival methods that assume independence between survival and administrative censoring times, and then combine these two separate estimates as with two-phase designs in survey literature. Under our assumptions, however, we show that within each of the two subgroups, the administrative censoring times generally correlate with survival times.
Consequently, Kaplan-Meier estimators (Kaplan and Meier, 1958) that stratify by these groups, or, alternatively, likelihood-based methods that ignore this correlation, generally do not appropriately estimate the survival curves within either of the subgroups, nor hence, the survival curve for the cohort. In Section 3, we discuss addressing the problem from the perspective of missing data and we derive an estimation procedure that, under our assumptions, appropriately estimates the cohort survival curve. In Section 4, we evaluate our procedure in situations that are realistic in practice. Although double-sampling when faced with both dropout and administrative censoring is natural, we do not have a completely real data example available for demonstration. Consequently, we use published margins for survival curve, dropout rate, sample size accrual rate, and available length of follow-up from an actual hip-replacement study, to project simulated double-sampling data, which we use to compare our procedure to the re-weighted stratified Kaplan-Meier estimators. The final section gives some further remarks for the design and estimation methods. The appendix provides technical details.

2. The Problem

2.1 Potential outcomes and goal. Consider a population of subjects, for example, having surgery to have the human hip joint replaced by a prosthesis. A common complication is that subjects drop out, that is, are lost to follow-up, in the sense that they interrupt contact with the investigator before their survival times become known. For clarity, we define potential outcomes (Neyman, 1923; Rubin, 1974, 1978) for subjects of the population before making probability statements, as in the Rubin Causal Model (Holland, 1986).

Suppose that subjects $i$ from the population participate in a hypothetical study depicted in Fig. 1. Let $E_i$ be the calendar time of entry into the study, e.g., the time of surgery, and $T_i$ be the survival time of the prosthesis, counting from surgery for each subject (Fig. 1(a)). Also in part (a) of the figure, let $L_i$ be the length of time, counting from entry, for which subject $i$ would remain in the study, i.e., without dropping out, if the study remained open indefinitely, where $L_i \le T_i$. The potential outcome $L_i$ is treated as a characteristic inherent to subject $i$ at this time in this study, like birth-date and gender. For subjects who would drop out in that hypothetical scenario, $L_i < T_i$, whereas for subjects who would not drop out before failure of their prosthesis no matter when the study would end, $L_i = T_i$, so we say that the indicator $R_i := I(L_i = T_i)$ is the true dropout status, where $I(\cdot)$ is the indicator function.

[Figure 1 about here.]

Now that the potential outcomes have been defined, we consider the actual study. Let $E_{\max}$ be the calendar time when the study actually ends, e.g., June 1 1999, as in Fig. 1(b). Using standard notation, we let $C_i := E_{\max} - E_i$ be the administrative censoring time, and $X_i := \min(T_i, C_i)$ and $\Delta_i := I(T_i \le C_i)$ be, respectively, the length of survival time that lies within the administrative censoring time, and the indicator for whether that length is the full survival time. The data $(X_i, \Delta_i)$ are the intended data, that is, the information the investigator would record in the absence of any dropout. But when $L_i < X_i$, neither $X_i$ nor $\Delta_i$ is observed, but rather $(L_i, R_i^{obs})$, where $R_i^{obs} := I(L_i \ge X_i)$; $R_i^{obs} = 1$ for subjects who are not dropouts in the study, whereas $R_i^{obs} = 0$ for the dropouts in the study (Fig. 1(c)). Note that the observed, study-dropout status, $R_i^{obs}$, is not always equal to the true dropout status $R_i$. For example, by comparing Fig. 1 (a) to (c), subjects 4 and 5 who are administratively censored in the study (and thus have $R_i^{obs} = 1$) are a mixture of true nondropouts $(R_i = 1)$ and true dropouts $(R_i = 0)$. We let $S(t)$ be the fraction of all subjects in the population whose survival time exceeds $t$, for $t > 0$.

Our goal is to learn about the survival function $S(t)$. Our methods can be extended to allow for observed covariates. However, to communicate our main arguments without complicating notation, we assume that no covariates are recorded or, alternatively, that we are already within cells of observed covariates.

2.2 Initial aspects of study design. Assume that subjects $i = 1, \ldots, n$ are a simple random sample from the population, and that the population is large enough so that observations on different sampled subjects can be treated as independent. We focus discussion on cohorts of subjects that are homogeneous (e.g., in terms of a common surgical method over time) in the sense that entry times $E_i$ are not related with either survival times, $T_i$, or true dropout times $L_i$. We can express this in the following assumption, which we will make throughout the rest of the paper.

ASSUMPTION 1. Randomness of entry times:

$$\Pr(E_i, T_i, L_i) = \Pr(T_i, L_i)\,\Pr(E_i),$$

where, here and in the sequel, the measure $\Pr(\cdot)$ is the one induced by the random sampling from the population.

One implication of Assumption 1 is that, in the full cohort of subjects, survival and administrative censoring times are independent: $\Pr(T_i, C_i) = \Pr(T_i)\,\Pr(C_i)$, which is a common assumption (Fleming and Harrington, 1991, and references therein; Hougaard, 1999). Another immediate implication is that true dropout status and administrative censoring times are independent: $\Pr(R_i, C_i) = \Pr(R_i)\,\Pr(C_i)$.

2.3 Double-sampling and observed data. In joint-replacement studies, dropout occurs for a variety of reasons (e.g., the subject moves, has pain and visits a different physician, feels very well and interrupts the hospital visits).
Because dropout is possibly related to survival (as in similar settings, Sims, 1973; Austin et al., 1979), we consider a second phase of data collection, where the investigator selects a subset of the study-dropouts (i.e., with $R_i^{obs} = 0$), and pursues them intensively enough to record their intended data $(X_i, \Delta_i)$, which would have been observed at calendar time $E_{\max}$. We let $S_i = 1$ if subject $i$ has $R_i^{obs} = 0$ and is pursued in this second phase, such as subject 2 in Fig. 1(d), and we let $S_i = 0$ otherwise. Pursuing dropouts is costly (e.g., Dorey and Amstutz, 1989), so the size of the sample would depend on available resources and rarely include all those with $R_i^{obs} = 0$. For situations where $(X_i, \Delta_i)$ is still not recorded among some subjects with $S_i = 1$, see Section 5.

Also relevant to the design of the second phase of data collection is the following issue. The data $(X_i, \Delta_i)$ for dropouts with early entry times generally carry more information than those with later entry times in the sense that the former subjects have been in the study longer. Moreover, early entry subjects are more likely to be study-dropouts because they have been followed for a longer period. Therefore, a simple random sample from $\{i : R_i^{obs} = 0\}$ will automatically oversample study-dropouts with earlier entry times. For this reason, and because we generally will not know exactly the optimal design without prior knowledge of the survival times of the dropouts, we assume for simplicity that, from those with $R_i^{obs} = 0$, the investigator chooses the set $\{i : S_i = 1\}$ by simple random sampling that, for convenience, we treat as independent Bernoulli trials with common and known selection probability $\Pr(S_i = 1 \mid R_i^{obs} = 0) = p \in (0, 1)$. We comment on extensions in Section 5.

2.4 Idiosyncrasy.
After the double-sampling, we have two groups of subjects with $(X_i, \Delta_i)$: the nondropouts in phase 1, $\{i : R_i^{obs} = 1\}$, and the sampled subset $\{i : S_i = 1\}$ of the dropouts; for both groups, the only reason for missing survival time is administrative censoring. The two groups can have different survival curves, and, because only some subjects of those with $R_i^{obs} = 0$ are recovered, an analysis that ignores the group classification, $\{R_i^{obs} = 1\}$ versus $\{S_i = 1\}$, after double-sampling (e.g., as in Dorey and Amstutz, 1989), is not appropriate generally. Therefore, in this section we consider the consequence of using the following procedure: (i) constructing Kaplan-Meier estimators within these two observed subgroups, and then (ii) combining the two estimators, weighted by the appropriate proportions of $\{i : R_i^{obs} = 1\}$ and $\{i : R_i^{obs} = 0\}$, to estimate the cohort survival function $S(t)$.

This stratified Kaplan-Meier (SKM) statistic is the standard estimator when censoring is solely administrative. The statistic SKM would be consistent for $S(t)$ if the administrative censoring times $C_i$ were independent of the survival times $T_i$ for study-nondropouts, i.e., with $R_i^{obs} = 1$, and for study-dropouts, i.e., with $R_i^{obs} = 0$. However, this independence does not generally hold under Assumption 1 in either group; the following result is derived as an application of Bayes' theorem and after some algebra (proof omitted).

RESULT 1. Under Assumption 1, and with positive $c$ and $t$,

$$\Pr(T_i > t \mid R_i^{obs} = 1, C_i = c) = \Pr(T_i > t \mid R_i = 1)\, \frac{\Pr(R_i = 1) + \gamma(c, t)\,\Pr(L_i \ge c, R_i = 0)}{\Pr(R_i = 1) + \Pr(L_i \ge c, R_i = 0)},$$

where

$$\gamma(c, t) := \frac{\Pr(T_i > t \mid L_i \ge c, R_i = 0)}{\Pr(T_i > t \mid R_i = 1)},$$

and

$$\Pr(T_i > t \mid R_i^{obs} = 0, C_i = c) = \Pr(T_i > t \mid R_i = 0, L_i < c).$$

In Result 1, $\Pr(T_i > t \mid R_i^{obs}, C_i = c)$ is generally a function of $c$ and $t$ and, therefore, survival and administrative censoring times are correlated within both strata defined by observed dropout status $R_i^{obs}$.
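The dependence described by Result 1 can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes a hypothetical cohort in which log survival time is normal with mean log(5) and standard deviation 0.7, true dropout status is a fair coin flip independent of survival, true dropouts leave at a uniformly random fraction of their survival time, and administrative censoring times are uniform on (0, 10) and independent of everything else, as Assumption 1 requires. The function name and all parameter values are illustrative assumptions. Linear correlation is only a crude summary of dependence, so the sketch reports it for each stratum without claiming a particular size among the study-nondropouts.

```python
import numpy as np


def simulate_strata_correlations(n=200_000, seed=0):
    """Correlation of log survival time with administrative censoring time,
    in the full cohort and within the observed dropout strata.

    All distributions below are hypothetical choices for illustration.
    """
    rng = np.random.default_rng(seed)

    # Potential outcomes: survival time T and true dropout status R.
    log_t = rng.normal(np.log(5.0), 0.7, size=n)
    t = np.exp(log_t)
    r_true = rng.integers(0, 2, size=n)  # 1 = true nondropout, 0 = true dropout

    # Dropout time L: equals T for true nondropouts, strictly below T otherwise.
    frac = rng.uniform(0.0, 1.0, size=n)
    l = np.where(r_true == 1, t, frac * t)

    # Administrative censoring time C, independent of (T, L) by construction.
    c = rng.uniform(0.0, 10.0, size=n)

    # Intended follow-up X = min(T, C) and observed dropout status I(L >= X).
    x = np.minimum(t, c)
    r_obs = (l >= x).astype(int)

    def corr(mask):
        return float(np.corrcoef(log_t[mask], c[mask])[0, 1])

    return {
        "cohort": corr(np.ones(n, dtype=bool)),
        "nondropouts": corr(r_obs == 1),
        "dropouts": corr(r_obs == 0),
    }


if __name__ == "__main__":
    for name, value in simulate_strata_correlations().items():
        print(f"{name}: corr(log T, C) = {value:+.3f}")
```

In this setup the cohort-level correlation is zero by construction, while among study-dropouts a subject with a long survival time is observed as a dropout mainly when the censoring time is also long, so survival and censoring times become positively associated in that stratum.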
This complication is not due to the simple random sampling design from the study-dropouts in the second phase because the correlation between survival and censoring times is also present in the study-nondropouts. Focusing on the latter group, Result 1 shows that $T_i$ and $C_i$ would be independent if the ratio $\gamma(c, t)$ of Result 1, that is, $\Pr(T_i > t \mid L_i \ge c, R_i = 0)$ divided by $\Pr(T_i > t \mid R_i = 1)$, were equal to 1 for all $c, t$. A necessary and sufficient set of conditions for this is that: (i) survival times $T_i$ be independent of true dropout statuses $R_i$, and (ii) among true dropouts, $\{i : R_i = 0\}$, survival times $T_i$ be independent of dropout times $L_i$. However, by definition for true dropout, $L_i < T_i$ for all $i$. If, in addition, there is a survival time in the cohort that is smaller than the largest observed dropout time, which is expected to be true in most realistic cases, then it can be easily shown that conditions (i) and (ii) above cannot both hold. It follows that the straightforward within-stratum Kaplan-Meier estimators are generally inconsistent and, hence, that generally the combined statistic SKM is not an appropriate estimator for the cohort survival curve $S(t)$. Result 1 also implies that standard "ignorable" (Rubin, 1978) likelihood or Bayesian survival methods that stratify on $R_i^{obs}$ and ignore the correlation between survival and administrative censoring times would also suffer from bias, even in large samples with the correct model for $S(t)$.

The spurious correlation within observed groups in Result 1 occurs because the observed data are non-trivial functions of the potential outcomes that reflect scientifically relevant characteristics of subjects. The distinction between potential outcomes and observed data occurs, more generally, in studies that suffer from deviations from protocol, including treatment-noncompliance and incomplete outcomes, and can be critical in defining and estimating quantities of interest (e.g., Rubin, 1978; Robins and Greenland, 1994; Angrist, Imbens and Rubin, 1996; Frangakis and Rubin, 1999; Robins, Greenland and Hu, 1999; Rubin and Frangakis, 1999).

3. Addressing the Problem: Missing Data Perspective

3.1 General. Because the selection mechanism of study-dropouts at the second phase is known conditionally on the first phase, any data $(X_i, \Delta_i)$ missing in stratum $(R_i^{obs} = 0)$ after double-sampling are missing at random (Rubin, 1976), a fact that we can use for robust estimation using likelihood principles. Under Assumption 1, the likelihood function of the data

$$D_{obs} := \{(R_i^{obs}, C_i, S_i)\} \cup \{(X_i, \Delta_i) : R_i^{obs} = 1\} \cup \{L_i : R_i^{obs} = 0, S_i = 0\} \cup \{(X_i, \Delta_i, L_i) : R_i^{obs} = 0, S_i = 1\}$$

is proportional to

$$L(\theta \mid D_{obs}) = \prod_{i : R_i^{obs} = 1} \Pr(X_i, \Delta_i, C_i, R_i^{obs} = 1; \theta) \prod_{i : R_i^{obs} = 0,\, S_i = 0} \Pr(C_i, L_i, R_i^{obs} = 0; \theta)\,(1 - p) \prod_{i : S_i = 1} \Pr(X_i, \Delta_i, C_i, L_i, R_i^{obs} = 0; \theta)\, p,$$

where $\theta$ represents parameters governing the distribution of all observables. Consider also the likelihood function of the reduced data $D_{red} := \{(R_i^{obs}, S_i)\} \cup \{(X_i, \Delta_i) : R_i^{obs} = 1 \text{ or } S_i = 1\}$, proportional to

$$L(\theta \mid D_{red}) = \prod_{i : R_i^{obs} = 1} \Pr(X_i, \Delta_i \mid R_i^{obs} = 1; \theta) \prod_{i : S_i = 1} \Pr(X_i, \Delta_i \mid R_i^{obs} = 0; \theta) \times \prod_{i} \Pr(R_i^{obs} = 1; \theta)^{R_i^{obs}} \{\Pr(R_i^{obs} = 0; \theta)\}^{1 - R_i^{obs}}.$$

Making inference on $S(t)$ by maximizing $L(\theta \mid D_{obs})$ when $\theta$ is unrestricted is generally not possible. Alternative ways to address the problem, under Assumption 1, include (i) maximizing the reduced data likelihood $L(\theta \mid D_{red})$ with unrestricted $\theta$; (ii) positing semiparametric submodels, $\theta_{sub}$, for $\theta$; (iii) positing fully parametric submodels for $\theta$.

For approach (i), the components of $\theta$ that are identifiable from $L(\theta \mid D_{red})$ identify $S(t)$, so this method will produce a consistent estimator of the survival curve for general distributions $\theta$. This method provides the basic intuition for how the problem can be addressed. Because this method uses only the $R^{obs}$-stratification of the intended data $(X_i, \Delta_i)$ of nondropouts and of dropouts that have been double-sampled, it can be applied even in constrained situations, for example, when the entry times $E_i$ $(= E_{\max} - C_i)$ and dropout times $L_i$ are hidden from the analyst because of confidentiality concerns.
For approach (ii), work of Robins, Rotnitzky, and Zhao (1994) can be used to construct estimators for $S(t)$ that take the form of inverse probability of censoring weighted (IPCW) statistics and include estimator (i) as a member. The $\theta_{sub}$-optimal estimators within that class asymptotically: are efficient when the working model $\theta_{sub}$ is correct; remain consistent when $\theta_{sub}$ is incorrect; and are more efficient than that of (i) when the additional data are available. In approach (iii) maximum likelihood or Bayesian inference can be used.

We focus on approach (i), thereby providing insight into the situation in the simple case with minimal data available. Nevertheless, it would be interesting to compare the three methods to see how much efficiency is lost using (i) relative to (ii) and (iii) in realistic settings where the extra information is available.

3.2 Maximum likelihood from $L(\theta \mid D_{red})$. Assuming that the random variables $T_i$ and $C_i$ are absolutely continuous, first we re-parameterize the problem in terms of hazard functions. Define the net hazard function of interest for the full cohort of subjects,

$$\lambda_{net}(t) := \lim_{h \to 0} \Pr(t \le T_i < t + h \mid T_i \ge t)/h.$$

By Assumption 1, survival and administrative censoring times are independent in the full cohort. Therefore, the net hazard function in the full cohort is equal to the crude hazard function in the full cohort, defined as

$$\lambda_{crd}(t) := \lim_{h \to 0} \Pr(t \le T_i < t + h \mid X_i \ge t)/h$$

(e.g., Fleming and Harrington, 1991). Furthermore, we can generally decompose the crude hazard function of the full cohort to the crude hazard functions for the two subgroups defined by the observed dropout status $R_i^{obs}$,

$$\lambda_{crd}^{r}(t) := \lim_{h \to 0} \Pr(t \le T_i < t + h \mid X_i \ge t, R_i^{obs} = r)/h,$$

using

$$\lambda_{crd}(t) = \sum_{r = 0, 1} w_r(t)\, \lambda_{crd}^{r}(t), \quad \text{where} \quad w_r(t) := \Pr(R_i^{obs} = r \mid X_i \ge t) = \frac{\pi_r(t)\, p_r}{\sum_{r' = 0, 1} \pi_{r'}(t)\, p_{r'}}, \qquad (3.2)$$

$\pi_r(t) := \Pr(X_i \ge t \mid R_i^{obs} = r)$ are the probabilities of being in the "risk set" at time $t$, $p_r := \Pr(R_i^{obs} = r)$, for $r = 0, 1$, and the expressions in (3.2) follow from Bayes' theorem.
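The two steps behind the stratified decomposition of the crude hazard can be written out explicitly. The following is a sketch in our own notation, assuming $X_i = \min(T_i, C_i)$, observed dropout status $R_i^{obs}$, stratum-specific crude hazards $\lambda_{crd}^{r}(t)$, risk-set probabilities $\pi_r(t) = \Pr(X_i \ge t \mid R_i^{obs} = r)$, and stratum proportions $p_r = \Pr(R_i^{obs} = r)$; the first step is the law of total probability within the risk set, the second is Bayes' theorem.

```latex
\begin{align*}
\lambda_{crd}(t)
  &= \lim_{h \to 0} \frac{1}{h} \Pr(t \le T_i < t + h \mid X_i \ge t) \\
  &= \lim_{h \to 0} \frac{1}{h} \sum_{r = 0, 1}
     \Pr(R_i^{obs} = r \mid X_i \ge t)\,
     \Pr(t \le T_i < t + h \mid X_i \ge t,\, R_i^{obs} = r) \\
  &= \sum_{r = 0, 1} w_r(t)\, \lambda_{crd}^{r}(t), \\[4pt]
w_r(t)
  &= \Pr(R_i^{obs} = r \mid X_i \ge t)
   = \frac{\Pr(X_i \ge t \mid R_i^{obs} = r)\, \Pr(R_i^{obs} = r)}
          {\sum_{r' = 0, 1} \Pr(X_i \ge t \mid R_i^{obs} = r')\, \Pr(R_i^{obs} = r')}
   = \frac{\pi_r(t)\, p_r}{\sum_{r' = 0, 1} \pi_{r'}(t)\, p_{r'}}.
\end{align*}
```

Neither step uses independence between survival and censoring times within a stratum, which is why the decomposition remains valid in the presence of the within-stratum correlation of Result 1.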
The time dependent weights $w_r(t)$ reflect that (3.2) is a stratification at every time point. At this stage, the crude hazard functions $\lambda_{crd}^{r}(t)$, and the probabilities $\pi_r(t)$ and $p_r$ can be estimated by maximizing the likelihood function $L(\theta \mid D_{red})$ with no further modeling assumptions.

For convenience in presentation, abbreviate the two groups $\{i : R_i^{obs} = 1\}$ and $\{i : S_i = 1\}$ by $\mathcal{G}_1$ and $\mathcal{G}_0$, respectively. Using standard notation, define the "risk set" and failure processes for each individual, by $Y_i(t) := I(X_i \ge t)$ and $N_i(t) := I(X_i \le t)\,\Delta_i$; and for the groups $\mathcal{G}_r$, $r = 0, 1$, by $Y^{(r)}(t) := \sum_{i \in \mathcal{G}_r} Y_i(t)$ and $N^{(r)}(t) := \sum_{i \in \mathcal{G}_r} N_i(t)$. Using the notation $dA(t)$ for the differential of right-continuous processes $A(t)$, we define the processes

$$\hat\Lambda_{crd}^{r}(t) := \int_0^t dN^{(r)}(u) / Y^{(r)}(u), \quad \text{and} \quad \hat\Lambda(t) := \sum_{r = 0, 1} \int_0^t \hat w_r(u)\, d\hat\Lambda_{crd}^{r}(u), \qquad (3.3)$$

where

$$\hat w_r(t) := \frac{\hat\pi_r(t)\, \hat p_r}{\sum_{r' = 0, 1} \hat\pi_{r'}(t)\, \hat p_{r'}}, \qquad \hat\pi_r(t) := Y^{(r)}(t) / n_r,$$

and where $n_r$ is the number of people in group $\mathcal{G}_r$, and $\hat p_r := \sum_i I(R_i^{obs} = r)/n$, for $r = 0, 1$.

The processes $\hat\Lambda_{crd}^{r}(t)$ maximize the likelihood $L(\theta \mid D_{red})$ with respect to, and are centered approximately at, the cumulative crude hazard functions $\Lambda_{crd}^{r}(t) := \int_0^t \lambda_{crd}^{r}(u)\, du$, regardless of dependence between survival and administrative censoring times. Similarly, the empirical distributions $\hat\pi_r(t)$ and the ratios $\hat p_r$ maximize $L(\theta \mid D_{red})$ with respect to, and are centered approximately at, $\pi_r(t)$ and $p_r$ respectively. Then, by (3.2) and (3.3), the process $\hat\Lambda(t)$ will be centered approximately at $\int_0^t \lambda_{crd}(u)\, du$, and thus, by Assumption 1, at the net cumulative hazard function $\Lambda_{net}(t) := \int_0^t \lambda_{net}(u)\, du$.

The estimator $\hat\Lambda(t)$ in (3.3) can also be viewed as having a generalized form of a "weighted logrank" statistic (Fleming and Harrington, 1991), where, here, the weights vary both across time and between the two estimated crude hazard functions. However, the approximate variability of $\hat\Lambda(t)$ around the estimand, $\Lambda_{net}(t) = \sum_{r = 0, 1} \int_0^t w_r(u)\, \lambda_{crd}^{r}(u)\, du$, (given in (A.2)), has a different form from that of usual weighted logrank statistics. For a time $t_0 < E_{\max}$ such that $\Pr(T_i > t_0 \mid R_i^{obs} = r)$ and $\Pr(C_i > t_0 \mid R_i^{obs} = r)$ are positive for $r = 0, 1$, the following result, whose proof is outlined in the appendix, summarizes the large-sample distribution of $\hat\Lambda(t)$.

RESULT 2. Under Assumption 1, and for $0 < t < t_0$,

$$\sqrt{n}\,\{\hat\Lambda(t) - \Lambda_{net}(t)\} \to Z(t)$$

weakly in distribution, as $n \to \infty$, where $Z(t)$ is a Gaussian process with $E\{Z(t)\} = 0$. Also, for $s$ with $0 < s \le t < t_0$, the covariance $\mathrm{cov}\{Z(s), Z(t)\}$ has the form (A.2) given in the appendix.

The expression for $\mathrm{cov}\{Z(s), Z(t)\}$ in (A.2) involves quantities, defined in the appendix, that are functions of the unknowns $\Lambda_{crd}^{r}(t)$, $\pi_r(t)$, and $p_r$, $r = 0, 1$. By replacing the latter with their sample analogues $\hat\Lambda_{crd}^{r}(t)$, $\hat\pi_r(t)$, and $\hat p_r$ in all the defining expressions in the appendix, we let the statistic $\hat V(s, t)$ be the resulting sample analogue of expression (A.2). Then, using Result 2 and Slutsky's theorem, it can be easily shown that, for times $t$ with $0 < t < t_0$,

$$\sqrt{n}\,\{\hat\Lambda(t) - \Lambda_{net}(t)\} \big/ \sqrt{\hat V(t, t)} \to N(0, 1), \qquad (3.5)$$

in distribution, as $n \to \infty$. We can then use (3.5) to obtain confidence intervals (pointwise) for $\Lambda_{net}(t)$. Using the identity $S(t) = \exp\{-\Lambda_{net}(t)\}$, we obtain an estimator, $\hat S(t) := \exp\{-\hat\Lambda(t)\}$, that is consistent for the survival curve. We can use the same relation to obtain confidence intervals for $S(t)$ as the reciprocals of the exponentiated confidence intervals for $\Lambda_{net}(t)$.

Note that, because this estimation method is based on likelihood $L(\theta \mid D_{red})$ instead of $L(\theta \mid D_{obs})$, it requires only the implication of Assumption 1 that survival times $T_i$ and administrative censoring times $C_i$ are independent in the cohort, and partially unobserved, group of subjects. The full structure of Assumption 1 was important in Result 1 to demonstrate that stratified Kaplan-Meier estimators are generally not appropriate in this setting.

4. Projected Simulations from Actual Study

4.1 Setting. Results for the "Tharies" type of hip-replacement prostheses were reported by Dorey and Amstutz (1989), who discussed 1985 data for the first 100 such prostheses implanted in their hospital starting in 1975. These prostheses were implanted in the first 2-year window (1975-1977) and, although they all had a potential follow-up time $C_i$ of at least 8 years, 45 (45%) of the patients were lost to follow-up within 8 years following their surgery and before prosthesis failure (i.e., $L_i < 8$, although the dropout times $L_i$ are not known to us). Among those original dropouts, the intended data $(X_i, \Delta_i)$ were recovered for 35 patients (78% = 35/45) after an intensive search reported by the investigators. Subsequently, the survival curve for the cohort was computed using a single Kaplan-Meier estimator, where the non-sampled dropouts were treated as administratively censored. Because the general inconsistency of this unstratified Kaplan-Meier estimator in our framework follows from techniques used in Sec. 2.4, we do not simulate detailed results for it, but we use their reported curve as a reference "true curve" below. In addition, the study had, essentially, no administrative censoring because patients who got Tharies in the 8-year window 1977-1985 were ignored. In this Section, by projecting information from the Tharies study to allow potential inclusion of patients from the whole 10-year window available in 1985, we will compare our method with the stratified Kaplan-Meier when both dropout and administrative censoring are present, in conditions summarized in Table 1.
To reflect looking at accrual within the 10-year window, in all 12 conditions of Table 1, we set $E_{\max} = 10$ years and we simulated entry times $E_i$ uniformly in (0,10) [so, $C_i = 10 - E_i \sim U(0, 10)$]. We matched points on the Tharies empirical survival curve of Dorey and Amstutz (1989; Fig. 1, "complete analysis") to a model for $T_i$ where $\log(T_i)$ is normally distributed, giving mean $\mu_T = \log(9.3 \text{ years})$ and standard deviation $\sigma_T = 0.728$. The resulting survival probabilities $S(t)$ at times $t = 6$, 7 and 8 years are 72.65%, 65.19%, and 58.20%, and, in the following, will be treated as true and will be our estimation goal.

The simulations follow Assumption 1. The Tharies' study-dropout rate of 45% in the first 8 years of follow-up means that the fraction of true dropouts $(R_i = 0)$ would be larger than 45%. To reflect this, we fixed $\Pr(R_i = 0) = 50\%$, and used the models described in the next paragraphs to get study-dropout rates $p_0^{(C \ge 8)} := \Pr(R_i^{obs} = 0 \mid C_i \ge 8)$ in the range 40% to 46% when, as with the Tharies' 45% study-dropout, we condition on $C_i \ge 8$. We also report our rates $p_0 := \Pr(R_i^{obs} = 0)$ over the full 10-year window.

To investigate plausible conditions for the remaining associations between true dropout status, survival, and dropout time among true dropouts, for which we do not have further information from the Tharies study, we posit the following models. We draw $R_i$ conditionally on the survival times $T_i$ following the probit model $\Pr(R_i = 1 \mid \log(T_i)) = \Phi\{\beta_0 + \beta_1 \log(T_i)\}$, where $\Phi(\cdot)$ is the standard normal cumulative distribution function. Because we have fixed the marginal probability of being a true dropout to 50%, we vary $\beta_1 = -1, 0, 1$, which determines $\beta_0 = (2.23, 0, -2.23)$. Under this setting, $\log(T_i)$ has the same standard deviation within both sets $\{i : R_i = 0\}$ and $\{i : R_i = 1\}$, and $|\beta_1| = 1$ generates a difference of 1.05 standard deviations between the means of $\log(T_i)$ in the two groups defined by $R_i$. So, the meaning of $\beta_1$ is how long the true nondropout patients' Tharies would last compared to true dropouts.

Among true dropouts, $\{i : R_i = 0\}$, we allow the dropout time $L_i$ to be related to the survival time $T_i$. To do this, we simulate $L_i$ from a log-normal regression on $T_i$ and right-truncated at $T_i$. That is,

$$\Pr(L_i \mid T_i, R_i = 0) \propto \Pr{}^{*}(L_i \mid T_i)\, I(L_i < T_i), \quad \text{where under } \Pr{}^{*}, \quad \log(L_i) \mid \log(T_i) \sim N\!\left( \mu_L + \rho_0 \frac{\sigma_L}{\sigma_T} \{\log(T_i) - \mu_T\},\; \sigma_L^2 (1 - \rho_0^2) \right),$$

where we fix the mean and standard deviation of $\log(L_i)$ under $\Pr^{*}$ to $\mu_L = \log(E_{\max})$ and $\sigma_L = \sigma_T$, respectively, for simplicity. By varying the parameter $\rho_0$ we induced different values of $\rho := \mathrm{corr}(L_i, T_i \mid R_i = 0)$, the correlation between patient dropout times and prosthesis survival among the true dropouts.

To match the observed sample size accrual rate of 100 patients in the actual Tharies 2-year window, for each individual study and in each of the 12 conditions defined by the above settings we simulated $n = 500$ patients in the 10-year window. In each individual study, we subsequently double-sampled study-dropouts with a 50% probability each. We set our goal to compare performance, in terms of coverage rates of nominal 95% confidence intervals and mean squared errors for the survival probability $S(t)$ of Tharies at $t = 6$, 7 and 8 years, and evaluated over 2500 replicated studies for each condition in Table 1, using (i) the procedure described in Section 3, labeled IML because it is the maximum likelihood of the $R^{obs}$-stratification of the intended data; and (ii) the procedure labeled SKM, based on the stratified Kaplan-Meier estimator described in Section 2.4 and a normal approximation to its distribution. The standard error for SKM is obtained by the delta method; these standard errors were very close to the sampling standard deviations of SKM as computed over the simulation replications for each condition (not shown).
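The point estimate behind the IML procedure can be sketched in a few lines. The code below is a minimal illustration under our reading of (3.3), not the authors' S-plus/FORTRAN program: it computes the weighted combination of the two within-group cumulative crude hazard increments and exponentiates, which amounts to a Nelson-Aalen estimator in which each subject in group r carries weight (fraction of the cohort with observed dropout status r) divided by (number of subjects in group r). The function name `iml_survival` and its argument names are hypothetical, and the variance estimate from (A.2) is not implemented.

```python
import numpy as np


def iml_survival(x, delta, r_obs, sampled, t):
    """Sketch of the survival estimate exp(-Lambda_hat(t)) from (3.3).

    x, delta : intended data; only used for nondropouts (r_obs == 1) and
               double-sampled dropouts (r_obs == 0 and sampled == 1)
    r_obs    : 1 for study-nondropouts, 0 for study-dropouts
    sampled  : 1 if a study-dropout was pursued in the second phase
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    r_obs = np.asarray(r_obs, dtype=int)
    sampled = np.asarray(sampled, dtype=int)

    # Group 1: nondropouts; group 0: double-sampled dropouts.
    groups = {1: r_obs == 1, 0: (r_obs == 0) & (sampled == 1)}
    # p_hat_r uses all subjects, including dropouts that were not sampled.
    p_hat = {r: float(np.mean(r_obs == r)) for r in (0, 1)}

    in_groups = groups[0] | groups[1]
    event_times = np.unique(x[in_groups & (delta == 1) & (x <= t)])

    cum_hazard = 0.0
    for u in event_times:
        pi_hat, hazard = {}, {}
        for r, members in groups.items():
            n_r = int(members.sum())
            at_risk = int(np.sum(members & (x >= u)))
            events = int(np.sum(members & (x == u) & (delta == 1)))
            pi_hat[r] = at_risk / n_r if n_r > 0 else 0.0
            hazard[r] = events / at_risk if at_risk > 0 else 0.0
        # Time-dependent weights of (3.2), with estimates plugged in.
        denom = sum(pi_hat[r] * p_hat[r] for r in (0, 1))
        cum_hazard += sum(pi_hat[r] * p_hat[r] / denom * hazard[r] for r in (0, 1))

    return float(np.exp(-cum_hazard))
```

When there are no study-dropouts the weights collapse onto the nondropout group and the sketch reduces to the exponentiated Nelson-Aalen estimator, which gives a simple sanity check.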
An S-plus5 program with FORTRAN subroutines for general implementation of procedure IML, and an S-plus5 program for generating the simulations described here are available from the authors.

4.2 Results. Based on the conditions in Table 1, when the true dropout patients’ Tharies have longer survival times than those of the true nondropouts (conditions with $\alpha_1 = -1$), SKM and IML perform comparably except for the cases where the correlation $\rho$ between patients’ dropout time and Tharies survival is 0.34 or 0.16, where IML is superior to SKM. These cases, among all those with $\alpha_1 = -1$, have the highest fraction of study-dropouts $p^{(C \ge 8)}$ and are closest to the observed 45% based on the 100 patients reported by Dorey and Amstutz (1989). This indicates that, even with relatively small association between survival time and dropout time among true dropouts, the fraction of study-dropouts influences the performance of the SKM procedure.

[ Table 1 here ]

This influence can also be seen in the conditions where true dropouts would have the same Tharies survival distribution as the true nondropouts (conditions with $\alpha_1 = 0$). Because the survival distribution for the whole cohort is the same across all sensitivity conditions, the prostheses for true dropout patients must have shorter survival times under conditions $\alpha_1 = 0$ than under conditions $\alpha_1 = -1$. Consequently, by comparing the results assuming $\alpha_1 = -1$ to those assuming $\alpha_1 = 0$ with comparable values of the correlation $\rho$, we observe that, under $\alpha_1 = 0$, (i) there are larger study-dropout rates, $p^{(C \ge 8)}$ and $p$, and (ii) in these cases, SKM notably undercovers the true probability values. Nevertheless, IML has sufficient coverage, and is also more accurate than SKM as indicated by the mean squared errors.
Finally, the conditions of Table 1 in which the prostheses for true dropout patients have shorter survival times than the prostheses for true nondropouts (conditions with $\alpha_1 = 1$) are associated with high correlation, $\rho$, between patients’ dropout time and Tharies survival. Also, these conditions, among all conditions in the Table, give the highest study-dropout rates $p^{(C \ge 8)}$, and the closest to the observed 45% from the actual study. In these conditions, SKM performs very poorly, whereas IML performs quite well.

5. Further Remarks and Extensions

Among the dropouts pursued in the second phase, there can be a small number of cases for which the survival data still do not get recorded, perhaps if a person has died before $E_{\max}$. If, in these cases, one still wished to project a hypothetical status of the prosthesis at $E_{\max}$, which would have to be assessed based on other observed information, we assume these special cases would be treated as administratively censored, and, therefore, practically we can still assume that if $S_i = 1$, then the survival data are recovered.

As noted in Section 2.3, the survival data for dropouts with early entry times generally carry more information than those with later times. However, a nonprobability sampling in the second phase that would include only study-dropouts with early entry times cannot generally give appropriate inference for the cohort without using parametric model extrapolation because, by Result 1, the study-dropouts with early entry times have different survival distributions from the study-dropouts with later entry times.

We restricted attention to simple random sampling at the second phase, for convenience in demonstrating estimation and to clarify the complication of using the stratified Kaplan-Meier estimator even in this simple design. Alternative designs in the second phase can be implemented that use different probabilities of selection for different subjects.
When these probabilities depend on continuous, as opposed to discrete, covariates, such as the entry times, the method proposed here based solely on stratification on $R^{\mathrm{obs}}$ status would need adjustment to reflect the design probabilities of selection when estimating the cohort survival curve.

As mentioned in Section 3.1, when, in addition to the data used by the proposed method, further observed data are available for analysis, the proposed method can be further improved by semiparametric or parametric methods. Moreover, the implementation of estimation can be simulation-based, as opposed to analytic, and can be motivated even within our simple setting where closed-form approximate inference exists. For example, one could use the permutation distribution of the data of the subsample $\{i : S_i = 1\}$ of study-dropouts to “impute” the missing data for the non-subsampled study-dropouts. For each such configuration, a Kaplan-Meier statistic could be calculated for the “imputed” full cohort, where censoring and survival times are independent, and these statistics could then be averaged to give an estimate for the cohort survival curve. Because the total number of such permutations can be very large, this approach practically would rely on principled simulation methods coupled with principled methods of analysis of multiply imputed datasets (Rubin, 1987; Glynn, Laird and Rubin, 1993; Tu, Meng, and Pagano, 1993; Efron, 1994; Lin, Fleming and Wei, 1994; Rubin, 1996; Wang and Robins, 1998). Simulation-based implementation of estimation can be particularly relevant in extensions of our setting when analytic derivation becomes less tractable, for example, in situations when Assumption 1 is relaxed to allow for non-homogeneous cohorts with calendar time trends in survival and true dropout behavior.
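The imputation idea in the preceding paragraph can be illustrated with a small sketch. It is not the authors' procedure: as a simplifying assumption, each non-subsampled study-dropout receives the (time, event) pair of a donor drawn with replacement from the double-sampled study-dropouts, which stands in for the permutation distribution, and only the averaged point estimate is computed, without the multiple-imputation variance rules cited above. The function names and the toy data are hypothetical.

```python
import random


def kaplan_meier(pairs, t):
    """Kaplan-Meier estimate of Pr(T > t) from (time, event) pairs,
    where event is 1 for an observed failure and 0 for censoring."""
    surv = 1.0
    event_times = sorted({x for x, d in pairs if d == 1 and x <= t})
    for u in event_times:
        at_risk = sum(1 for x, _ in pairs if x >= u)
        failures = sum(1 for x, d in pairs if d == 1 and x == u)
        surv *= 1.0 - failures / at_risk
    return surv


def imputed_km(nondropouts, sampled, n_unsampled, t, n_imputations=50, seed=0):
    """Average Kaplan-Meier estimate over imputed full cohorts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_imputations):
        # Assumption: donors are drawn with replacement (hot-deck style).
        imputed = [rng.choice(sampled) for _ in range(n_unsampled)]
        total += kaplan_meier(nondropouts + sampled + imputed, t)
    return total / n_imputations


# Toy data: (observed time in years, event indicator).
nondropouts = [(2.0, 1), (5.0, 0), (6.0, 1), (9.0, 0)]
sampled_dropouts = [(3.0, 1), (7.0, 0)]
print(imputed_km(nondropouts, sampled_dropouts, n_unsampled=2, t=4.0))
```

Each imputed cohort combines the nondropouts, the double-sampled dropouts, and the imputed records for the remaining dropouts, so the Kaplan-Meier statistic is computed on a cohort of the original size.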
A final comment is that we expect many studies of survival employ at least some type of informal double-sampling of dropouts, and we hope that the availability of our framework and methods stimulates the study and use of double-sampling to address dropout with survival data.

ACKNOWLEDGEMENT

The authors thank the orthopedic clinic of IKA Hospital in Melissia, Athens, Greece, for hosting part of this work; Stuart G. Baker and Emanuel Frangakis for useful discussions; the editor and reviewers for helpful comments; and the support from the U.S. National Institute of Child Health and Human Development (R01 HD38209), National Institute of Mental Health (R01 MH56639), National Institute for Drug Abuse (R01 DA10184), the Johns Hopkins H.-C. Yang Memorial Faculty Fund, and the National Science Foundation.

REFERENCES

Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996). Identification of causal effects using instrumental variables (with Discussion). Journal of the American Statistical Association 91, 444–472.
Austin, M. A., Berreyesa, S., Elliott, J. L., Wallace, R. B., Barret–Connor, E., and Criqui, M. H. (1979). Methods for determining long-term survival in a population based study. American Journal of Epidemiology 110, 747–752.
Baker, S. G., Wax, Y., and Patterson, B. H. (1993). Regression analysis of grouped survival data: informative censoring and double sampling. Biometrics 49, 379–389.
Billingsley, P. (1979). Probability and Measure. New York: Wiley.
Cochran, W. (1963). Sampling Techniques. New York: Wiley.
Dobbs, H. S. (1980). Survivorship of total hip replacements. The Journal of Bone and Joint Surgery [Br] 62, 168–172.
Dorey, F. and Amstutz, H. C. (1989). The validity of survivorship analysis in total joint arthroplasty. The Journal of Bone and Joint Surgery [Am] 71, 544–548.
Efron, B. (1994). Missing data, imputation, and the Bootstrap (with Discussion). Journal of the American Statistical Association 89, 463–478.
Flehinger, B.
J., Reiser, B., and Yashchin, E. (1998). Survival with competing risks and masked causes of failures. Biometrika 85, 151–164.
Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis. New York: Wiley.
Frangakis, C. E. (1999). Coexistent complications with noncompliance with study-protocols, and implications for statistical analysis. Unpublished PhD thesis (appendix of Part 3), Department of Statistics, Harvard University.
Frangakis, C. E. and Rubin, D. B. (1999). Addressing complications of intention-to-treat analysis in the combined presence of all-or-none treatment-noncompliance and subsequent missing outcomes. Biometrika 86, 365–379.
Gartland, J. J. (1988). Orthopaedic clinical research. Deficiencies in experimental design and determination of outcome. The Journal of Bone and Joint Surgery [Am] 70, 1357–1364.
Glynn, R. J., Laird, N. M., and Rubin, D. B. (1993). Multiple imputation in mixture models for nonignorable nonresponse with follow-ups. Journal of the American Statistical Association 88, 984–993.
Hogan, J. W. and Laird, N. M. (1996). Intention-to-treat analyses for incomplete repeated measures data. Biometrics 52, 1002–1017.
Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical Association 81, 945–970.
Hougaard, P. (1999). Fundamentals of survival data. Biometrics 55, 13–22.
Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53, 457–481.
Lin, D. Y., Fleming, T. R., and Wei, L. J. (1994). Confidence bands for survival curves under the proportional hazards model. Biometrika 81, 73–81.
Murray, D. W., Carr, A. J., and Bulstrode, C. (1993). Survival analysis of joint replacements. The Journal of Bone and Joint Surgery [Br] 75, 697–704.
Neyman, J. (1923). On the application of probability theory to agricultural experiments: essay on principles, Section 9. Translated in Statistical Science 5, 465–480, 1990.
Robins, J. M. and Greenland, S. (1994). Adjusting for differential rates of prophylaxis therapy for PCP in high-versus low-dose AZT treatment arms in an AIDS randomized trial. Journal of the American Statistical Association 89, 737–749.
Robins, J. M., Greenland, S., and Hu, F.-C. (1999). Estimation of the causal effect of a time-varying exposure on the marginal mean of a repeated binary outcome (with Discussion). Journal of the American Statistical Association 94, 687–712.
Robins, J. M., Rotnitzky, A., and Zhao, L.-P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89, 846–866.
Rotnitzky, A. and Robins, J. M. (1995). Semiparametric regression with follow-up of nonrespondents. Technical report, Department of Biostatistics, Harvard School of Public Health.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688–701.
Rubin, D. B. (1976). Inference and missing data. Biometrika 63, 581–592.
Rubin, D. B. (1978). Bayesian inference for causal effects. Annals of Statistics 6, 34–58.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Rubin, D. B. (1996). Multiple imputation after 18+ years (with Discussion). Journal of the American Statistical Association 91, 473–489.
Rubin, D. B. and Frangakis, C. E. (1999). Comment on “Estimation of the causal effect of a time-varying exposure on the marginal mean of a repeated binary outcome” by J. M. Robins, S. Greenland, and F.-C. Hu. Journal of the American Statistical Association 94, 702–704.
Sims, A. C. P. (1973). Importance of a high tracing rate in long-term medical follow-up studies. Lancet No. 2, 433–435.
Tu, X. M., Meng, X. L., and Pagano, M. (1993). The AIDS epidemic: estimating survival after AIDS diagnosis from surveillance data. Journal of the American Statistical Association 88, 26–36.
Wang, N.
and Robins, J. M. (1998). Large-sample theory for parametric multiple imputation procedures. Biometrika 85, 935–948. Wax, Y., Baker, S. G., and Patterson, B. H. (1993). A score test for non-informative censoring using doubly sampled grouped survival data. Applied Statistics 42, 159–172. Zanutto, E. L. (1998). Imputation for unit nonresponse: modeling sampled nonresponse follow-up, administrative records, and matched substitutes. PhD thesis, Department of Statistics, Harvard University. A PPENDIX : O UTLINE OF PROOF OF ½ R ESULT 2 ¼ >N# , we will use some additional definitions. For OxG , £ « >N# ; let >N#] £&¬j§ |¨ £&¬ $N#>f £&¬ ` ú , and, for ¤ÚFG \ , let: £ ¨ ý?þWÿ ®>N# C r12Â^# f¥<f > N # ª ª £ ¨ ý?þ ®$N# C &2aK# £ $f¥Vf K$N# $N# ; and £ ¨ ý ®$N# C F12© ^# £ >N# l$N# $N# . In the above, ª ª ª © £ ¨ ý?þuÿÄ ®$N# is the quantity obtained when the© the Äweight £ $N# in (3 2) is regarded as a function © l>N#W $N# , and f , and is differentiated partially© with respect to l>N# . The quantities of ª ª ª £ ¨ ý?þ $N# and £ ¨ ý $N# have similar interpretation. In an analogous way, we define the funcĽ « Ä £N§ ¾ ¿ £ « £N§ ¾ ¿ £ ½ £ ½ ½ £ © tions crd ${o# , crd !{o# , |¨ ¨ ý?þWÿ ${I#1 crd |¨ ¨ ý?þ ${I#1 crd ý?þWÿ >N# © C ý?þ >N# C Ä Ä « £N§ ¾ ¿ £ © ½ ½ £ and crd !{o# . Using these definitions, we© express Ì9h Kk1$N# C |¨ ¨ ý K${o#N crd ý >N# C ½ Ä ½ © Ä form, in Lemma A. ÅÄ Æ ¼ >N#2 net $N#KÊ in a linearized To state the approximate variability of 23 L EMMA A. Under Assumption , and for G gx1( , Ì hKk $N#ÁÌ ý h ®k $ N#\z£N¦ § Ì ý?hþ ®¶ k >N#\zxÌ ýh ®¶k $N#\zq$ ÅÄ # |¨ Ä ¿ ½ Ì ýh Kk $N#~ ÅÄ f ¼ 2Úf # ý !{o# Ä Ä ¿ À £ £ Å Å £ £ Ä Ä h ® k Ì ý?þ ¶ $N#~ ¦ ¨ 5$N# ¨ &>N# ]l° !{o#[2 ª ¸´®µ ¶ À ¿ £ Å Å ${I# £ ¨ &$N# £ ¨ 5$N# £ Ä Ì ýh K¶k $N#~ £ Ä ¦ £ !{o# ¸´®µ¹¶ À ©ª ¿ £ ¨ 1$N#3² $N#[2 and ° !{o#N À where £ !{o# `l ½ ?ý þW ¶ !{o#W £ ¨ 5${I# ½£ ${o# Proof. 
We have the identity Ì h®k >N# £N¦ § )ÅÄ | ¨ ¿ £ 2 £ ¿ £ ¼ £ »² £ z ¦ £ ¨ "!$ # £ ° ° © À © À © ¸ ´®µ ¶ (A.1) where we have suppressed the variable of integration. The first summands in the right hand side of (A.1) are asymptotically equivalent to expansion we have ¾ ¿ ½ £ ÅÄ ¼ £ 2 £ #N crd . © © Moreover, by a Taylor ÅÄ ¼ £ 2 £ #~ ÅÄ f ¼ 2Úf # £ ¨ ý z ÅÄ £ ¦ § ª ¼ £ æ2 ª £ æ # £ ¨ ý?þ ¶ æ z%q! ÅÄ # æ | ¨ © © © Ä © Substituting this expansion in (A.1), and noting that ° £ n £ converges uniformly in probability to £ in & G |5((' as £ Ë Í , by the Glivenko-Cantelli Theorem (Billingsley, 1979, Theorem ª 20.6), Lemma A follows after some algebra. Using Corollary B.1.1 of Fleming and Harrington (1991), in order to show Result 2 it suffices to show two conditions: (i) for any finite set of times converges in distribution to VÌc$ #W S 6ÌF>)®#N# , where Ì 24 S N ) , the vector ?Ì h®k $ #W S 6Ì hKk >)K#|# is a Gaussian process with moments as in Result 2, and (ii) the process Ì h®k is “tight” in the sense of Fleming and Harrington (1991, p. 340). The large-sample normality in condition (i) follows from applying the central limit theorem and Theorem 2.4.4 of Fleming and Harrington (1991) on Lemma A. This application also shows that, for fixed times GÐ |}X(+*1, , the covariance cov ]nÌc?Ðl# Ìc$N#6` is given by, Õ Õ Õ ý V Ð |N#\z £N¦ § ?ý þ ¶ ?ÐLNN#\z ý ¶ Ä | ¨ where Õ ý KVÐ |N# gf f¥ ¿ ½ ${o# B ½ C ý Ä Ä À À and, for D^ q]6f¥Vf h'ilk ` , J^ f ¿ $N#\z ý Ä Õ ý?þ ¶ ¨ ¶ VÐ |N#\z Õ ý?þ ¶ ¨ ¶ $ 6Ðl# (A.2) ${ W# , and ¤HG:®R B £ £ ${o# £ !{ # '^ ½.ý?þ ¶ $ { #N .½ ?ý þ ¶ !{o# & ] + *-,lØm!{"{ #6`Z2 ª ª ª À 02À 1eÇ ¨ ¿ £ h B k ] !{o# ` ú ½£ Õ ý KVÐ |N# / £ ${o#W and ¶ C £ ${I# © ª À 021eÇ ¨ ¿ h k B £ ${ # £ B ½ ½ £ Õ ý?þ ¨ ®VÐ |N# J23 £ ! o { 1 # !o{ # ¶ ¶ ª C ý?þ ¶ !{ 1# £ 4 $ I { # À À ª © Õ ý?þ ¶ VÐ |N# C £ The proof of condition (ii) is not difficult but is more laborious and is omitted here. Further details can be found in Frangakis (1999, appendix of Part 3). 25 Figure 1. 
Potential outcomes and observed data with double-sampling. For each part, solid lines represent observed information, dashed lines represent unobserved information: (a) survival times (arrows) and true dropouts (circles); (b) administrative censoring with true dropout; (c) study-dropouts; (d) subject 2 is double-sampled.

[ Figure 1 here: panels (a) to (d), each showing subjects 1 to 5 on a time axis ending at $E_{\max}$, with entry times $E_i$ and the indicators $R_i$, $\Delta_i$, $R_i^{\mathrm{obs}}$ and $S_i$ ]

Table 1
Projected Tharies prosthesis study of Sec. 4: sensitivity inference for the prostheses survival probabilities $S(t)$ at $t = 6$, 7 and 8 years from primary surgery; coverage of nominal 95% confidence intervals and mean squared error.

                                      $\alpha_1 = -1$             $\alpha_1 = 0$              $\alpha_1 = 1$
$\psi$                          -0.4  -0.2   0.0   0.2    -0.4  -0.2   0.0   0.2    -0.4  -0.2   0.0   0.2
$\rho$                          0.16  0.34  0.46  0.60    0.34  0.45  0.57  0.67    0.55  0.63  0.68  0.72
$p^{(C \ge 8)}$ (%)             40.3  38.8  36.7  34.6    43.0  42.2  41.4  40.5    45.8  44.9  44.2  44.0
$p$ (%)                         19.7  19.4  18.3  16.4    23.1  23.2  22.5  21.5    26.5  26.8  27.3  27.0

$t = 6$ years
  Coverage (%)            SKM   93.1  94.4  93.9  95.4    86.4  85.7  84.3  82.8    36.2  36.8  40.5  38.0
                          IML   94.2  95.5  94.4  94.6    94.2  95.3  94.3  93.8    94.0  94.1  94.1  94.3
  MSE ($\times 10^{-3}$)  SKM   1.10  0.97  0.96  0.87    1.35  1.32  1.37  1.46    4.35  4.35  4.17  4.13
                          IML   0.91  0.84  0.92  0.88    1.06  0.99  1.03  1.03    1.10  1.05  1.11  1.05

$t = 7$ years
  Coverage (%)            SKM   92.0  93.3  93.9  94.9    86.2  85.2  84.2  81.5    33.8  34.7  35.4  35.7
                          IML   94.5  95.6  93.5  95.0    95.3  94.8  95.6  94.1    94.4  94.0  93.9  94.5
  MSE ($\times 10^{-3}$)  SKM   1.65  1.43  1.33  1.25    1.84  1.91  1.98  2.13    6.34  6.20  6.20  6.07
                          IML   1.28  1.19  1.28  1.25    1.34  1.37  1.36  1.40    1.38  1.44  1.43  1.38

$t = 8$ years
  Coverage (%)            SKM   90.3  92.4  93.5  94.1    87.6  85.8  85.2  83.4    44.6  45.5  47.5  46.8
                          IML   94.1  94.9  94.2  94.8    94.8  94.6  94.4  94.5    94.2  94.2  94.2  94.7
  MSE ($\times 10^{-3}$)  SKM   2.78  2.36  2.08  1.94    2.43  2.48  2.58  2.73    7.46  7.35  7.17  7.07
                          IML   1.83  1.69  1.77  1.70    1.90  1.90  1.89  1.85    1.94  1.94  1.85  1.83

$\rho = \mathrm{corr}(L_i, T_i \mid R_i = 0)$; $p^{(C \ge 8)} = \Pr(R_i^{\mathrm{obs}} = 0 \mid C_i \ge 8)$; $p = \Pr(R_i^{\mathrm{obs}} = 0)$; only positive values of $\rho$ are considered because $L_i \le T_i$ for true dropouts. SKM, stratified Kaplan-Meier method; IML, method of Section 3.2.