Inverse Probability of Censoring Weighted Estimates of Kendall’s τ for Gap Time Analyses

Lajmi Lakhal-Chaieb,1 Richard J. Cook2 and Xihong Lin3

1 Département de Mathématiques et Statistique, Université Laval, 1045 av. de la Médecine, Local 1056, Québec, QC, Canada G1V 0A6
2 Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, ON, Canada N2L 3G1
3 Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Building II Room 419, Boston, MA 02115

October 18, 2008

Summary. In life history studies interest often lies in the analysis of the inter-event, or gap, times and the association between event times. Gap time analyses are challenging, however, even when the length of follow-up is determined independently of the event process, since associations between gap times induce dependent censoring for second and subsequent gap times. This paper discusses nonparametric estimation of the association between consecutive gap times based on Kendall’s τ in the presence of this type of dependent censoring. A nonparametric estimator which uses inverse probability of censoring weights is provided. Estimates of joint and conditional gap time distributions can be obtained following specification of a particular copula function. Simulation studies show that the estimator performs well and compares favourably with an alternative estimator. Generalizations to a piecewise constant Clayton copula are given. The estimator is used to analyse data on the time to an intermediate marker of menopause and the time to menopause among women in a cohort study of reproductive health.

Key words: copula, dependent censoring, gap times, Kendall’s tau

1. Introduction

1.1 Background

In many settings involving the analysis of life history data, interest lies in the occurrence of two or more consecutive events.
A common special case involves the joint analysis of a terminal event and an essential intermediate event. Bebchuk and Betensky (2002) consider such a three-state model in which the intermediate state represents HIV infection via blood transfusion in hemophiliacs and the terminal event is AIDS diagnosis. Lin et al. (1999) discuss the analysis of follow-up data from a randomized trial of patients with colon cancer (Moertel et al., 1990). Recurrence of disease and death are the intermediate and terminal events, respectively, in this setting. A third example, discussed in Hougaard (2000), involves the analysis of urinary albumin levels in diabetic patients. Normal levels of urinary albumin are less than 30 mg/24 hours, microalbuminuria is defined as 30-300 mg/24 hours, and macroalbuminuria is present if levels are greater than 300 mg/24 hours; individuals must experience microalbuminuria before they experience macroalbuminuria.

Multi-state models provide a natural representation for all such processes; see Hougaard (2000) and Andersen and Keiding (2002) for general reviews of methods for modeling multi-state data. Transitions between the states represent progression of the disease process, and interest often lies in understanding the time course of the event process. Markov models, in which the operational time scale is the time since process initiation or some other common origin, are often adopted for progressive and degenerative processes which feature aging. However, in many cases interest lies in the sojourn time distributions in particular states. In that case semi-Markov models are the canonical models, but in many settings it is not plausible to assume that successive gap times are independent. When there is an association between gap times and the data are subject to type I right-censoring, the second sojourn time is subject to dependent censoring, as pointed out by Wang (1999) and Huang (2000). Analyses are then more complicated.
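The dependent censoring mechanism described above can be checked numerically. The following is a minimal sketch and not part of the original analysis: it assumes a Clayton survival copula with exponential margins for the two gap times and a censoring time that is uniform on [0, A] (illustrative choices, as are all function names), and it measures the association between the second gap time and its induced censoring time max(0, C − X) with an empirical Kendall's tau.

```python
import numpy as np


def kendall_tau(a, b):
    """Empirical Kendall's tau: average of sign((a_i - a_j)(b_i - b_j)) over pairs i < j."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    s = np.sign(a[:, None] - a[None, :]) * np.sign(b[:, None] - b[None, :])
    return s[np.triu_indices(len(a), k=1)].mean()


def simulate_gap_times(n, tau, rng, mean_x=1.0, mean_y=0.5):
    """Draw (X, Y) with exponential margins (assumed); Clayton survival copula when tau > 0."""
    u = 1.0 - rng.uniform(size=n)  # values in (0, 1], plays the role of S_X(X)
    w = 1.0 - rng.uniform(size=n)
    if tau == 0:
        v = w  # independent gap times
    else:
        alpha = 2.0 * tau / (1.0 - tau)  # Clayton parameter, alpha = theta - 1
        # conditional-distribution sampling of v = S_Y(Y) given u
        v = ((w ** (-alpha / (1.0 + alpha)) - 1.0) * u ** (-alpha) + 1.0) ** (-1.0 / alpha)
    return -mean_x * np.log(u), -mean_y * np.log(v)


def induced_censoring_association(n, tau, A, seed):
    """Kendall's tau between Y and its induced censoring time C' = max(0, C - X)."""
    rng = np.random.default_rng(seed)
    x, y = simulate_gap_times(n, tau, rng)
    c = rng.uniform(0.0, A, size=n)  # censoring time, independent of (X, Y)
    c_prime = np.maximum(0.0, c - x)
    return kendall_tau(y, c_prime)
```

Under these assumptions one would expect `induced_censoring_association(1000, 0.0, 5.0, seed=1)` to be close to zero and the value obtained with `tau = 0.6` to be clearly negative: a long first gap time shortens the follow-up left for the second gap time while tending to lengthen the second gap time itself.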
There have been considerable advances in the analysis of consecutive sojourn, or gap, times in such settings in recent years. Analyses may be based on shared frailties across gap time distributions with the assumption of conditional independence between gap times given the frailty; Kessing et al. (2004a,b) consider this in the analysis of hospitalization data in psychiatric patients and Kvist et al. (2007) develop a goodness-of-fit test for the latter model when the frailty is Gamma distributed. Wang and Wells (1998) and Lin et al. (1999) derived nonparametric estimators for the joint and conditional survival functions using inverse probability of censoring weights; Schaubel and Cai (2004) consider alternative nonparametric methods. Andrei and Murray (2006) use inverse weighting to address dependent censoring in the analysis of quality-adjusted lifetimes where different utilities are assigned to different states of a multi-state process. None of these methods yields a simple summary measure of the association between sojourn times, and there is considerable appeal in developing methods with simple measures of association. It is therefore of interest to estimate the extent of association between the gap times robustly, using nonparametric methods which are not affected by the dependent censoring.

Kendall’s tau is among the most popular association measures between two time-to-event random variables. Kendall and Gibbons (1990) gave an empirical estimate of $\tau$ from an uncensored bivariate sample. Several authors proposed estimators for $\tau$ with bivariate right-censored parallel observations (Lakhal et al., 2008). Betensky and Finkelstein (1999) extended the estimation of $\tau$ to bivariate interval-censored observations. Wang and Wells (2000) derived an estimator for $\tau$ valid for any censoring scheme as long as a nonparametric estimator for the joint survivor function exists.
Little work, however, has been done on nonparametric estimation of $\tau$ under more complicated censoring schemes such as the dependent censoring scheme discussed above. In this paper we propose a nonparametric estimator for Kendall’s $\tau$ measuring the association between two consecutive gap times. The proposed estimator incorporates inverse probability of censoring weights which address the impact of dependent censoring on second gap times.

Because they summarize the association between paired failure times via a single parameter such as Kendall’s $\tau$, copula functions offer an attractive framework for modeling the joint distribution of multiple failure times (e.g. Fine et al., 2001; Wang and Ding, 2001; Lakhal et al., 2006, 2008a). He and Lawless (2003) consider the case of serial events, assume a Clayton copula to model the dependence between two successive event times, and derive maximum likelihood estimates for the copula parameter under weak marginal assumptions. Oakes (1986) proposed a model formulation by which the cross-ratio is constant within regions of the plane. Nan et al. (2006) adopt this model for a joint analysis of the times of an intermediate and a terminal event, and derive estimators of the copula parameters using the maximum pseudo-likelihood procedure of Shih and Louis (1995) with the marginals left completely unspecified. The second purpose of this paper is to develop methods for consecutive gap time analyses based first on a standard copula formulation and then, more generally and in the spirit of Nan et al. (2006), on a piecewise constant cross-ratio model for the gap times. The parameters of the latter model will be estimated by inverting the estimator of Kendall’s tau derived in the first part of the paper.

The remainder of the paper is organized as follows. In the next section we discuss a motivating study of the reproductive life of women in a large cohort study (Lisabeth et al., 2004; Nan et al., 2006).
In Section 2 we define Kendall’s $\tau$ and propose a nonparametric estimate of it using censored serial gap time data. Estimates of the joint distribution of serial gap times are described in Section 3 based on an assumed Clayton copula function, and these estimates are assessed through simulation studies; estimates for other copula functions are also available. This method is generalized in Section 4 to accommodate piecewise constant cross-ratios as in Manatunga and Oakes (1996) and Nan et al. (2006). An application to data from the Tremin Trust study illustrates the method and facilitates comparisons with the findings of Nan et al. (2006). Section 5 contains general remarks and topics for further research.

1.2 Association between a Marker and Menopause in a Study of Reproductive Health

Lisabeth et al. (2004) report on a large cohort study involving 1997 women recruited as students at the University of Minnesota between 1935 and 1939. We consider the data used by Nan et al. (2006), who analysed follow-up data from 562 women who were less than 25 years of age at the time of recruitment, who provided data on the age of menarche, and who were on study at 35 years of age. Figure 1 displays the multistate diagram for this process. Figure 2 gives a timeline diagram of a reproductive life cycle where $T_1 = X$ represents the time of the first cycle of at least 45 days and $T_2$ is the time of menopause; $Y = T_2 - X$ is the time from the first 45-day cycle to menopause. Thus $X$ and $Y$ represent the sojourn times in states 1 and 2 of Figure 1. We note that it is biologically possible for a woman to experience menopause without experiencing the 45-day cycle marker, but the probability of this is quite low; in the data set of interest, only 1 of the 193 women experienced menopause without first experiencing the 45-day cycle marker. Nan et al.
(2006) considered an analysis of the association between $T_1$ and $T_2$ based on a piecewise constant Clayton copula, assuming a common Kendall’s $\tau$ within each of the intervals 35-39, 40-45, 46-49 and 50+, in the spirit of Manatunga and Oakes (1996). Based on the resulting estimates, Nan et al. (2006) obtained estimates of the conditional distribution of the time to menopause given age at the time of the marker event. While copula models accommodate association between the event times, the fact that $T_1 \le T_2$ is not incorporated into the model, and hence it does not provide an ideal representation of the process. A joint analysis of $X$ and $Y$ is appealing in the sense that there are no constraints on either $X$ or $Y$. Naive marginal analysis of $Y$ is not valid, however: if $C$ is the censoring time, defined as the age at last contact, then $Y$ is censored at $C - X$. As a result, if $X$ and $Y$ are associated, $Y$ is subject to dependent censoring in marginal analyses. We deal with this problem in what follows.

Insert Figures 1 to 3 about here

2. Nonparametric Estimation of Kendall’s τ

2.1 Estimation with Parallel Survival Times

Kendall’s tau is among the most popular association measures between two time-to-event random variables. Suppose $(X, Y)$ is a pair of event time random variables with joint survivor function $\pi(x, y) = P(X > x, Y > y)$. If $(X_1, Y_1)$ and $(X_2, Y_2)$ are independent draws from the distribution with survivor function $\pi$, they are said to be concordant if $X_1 > X_2$ and $Y_1 > Y_2$, or $X_1 < X_2$ and $Y_1 < Y_2$ (i.e. the marginal rankings of the two individuals with respect to $X$ and $Y$ agree); otherwise they are discordant. Kendall’s $\tau$ is defined as the probability of concordance of two such draws minus the probability of discordance,
\[ \tau = \Pr\{(X_1 - X_2)(Y_1 - Y_2) > 0\} - \Pr\{(X_1 - X_2)(Y_1 - Y_2) < 0\}. \]
An appealing feature of $\tau$ is that it does not depend on the marginal distributions of $X$ and $Y$ and is equal to zero under independence; it can be re-expressed as $\tau = 2 \Pr\{(X_1 - X_2)(Y_1 - Y_2) > 0\} - 1$. Moreover, if $a_{12} = 2 I(X_1 > X_2) - 1$ and $b_{12} = 2 I(Y_1 > Y_2) - 1$, then $\tau = E(a_{12} b_{12})$, which can be written
\[ \tau = 4 \int_0^\infty \int_0^\infty \pi(x, y) \, \frac{\partial^2 \pi(x, y)}{\partial x \, \partial y} \, dx \, dy - 1. \tag{2.1} \]
Kendall and Gibbons (1990) estimated $\tau$ from uncensored bivariate data $\{(X_i, Y_i), i = 1, \ldots, n\}$ by its empirical version
\[ \hat\tau = \binom{n}{2}^{-1} \sum_{i<j} \psi_{ij}, \]
where $\psi_{ij} = a_{ij} b_{ij}$ is the concordance/discordance status, equal to 1 if the pair $(i, j)$ is concordant and $-1$ otherwise. In the presence of censoring it may not be possible to compute $\psi_{ij}$ for some pairs of points, making estimation of $\tau$ more difficult; such pairs are called non-orderable, and pairs that can be ordered are called orderable. With censored data one observes $(\tilde X, \tilde Y, \delta_X, \delta_Y)$, where $\tilde X = \min(X, C_X)$, $\tilde Y = \min(Y, C_Y)$, $\delta_X = I(X < C_X)$, $\delta_Y = I(Y < C_Y)$ and $(C_X, C_Y)$ are the censoring variables. Oakes (1982) showed that the pair $(i, j)$ is orderable if $\{\tilde X_{ij} < \tilde C_X^{ij}, \ \tilde Y_{ij} < \tilde C_Y^{ij}\}$, where $\tilde X_{ij} = \min(X_i, X_j)$, $\tilde Y_{ij} = \min(Y_i, Y_j)$, $\tilde C_X^{ij} = \min(C_X^i, C_X^j)$ and $\tilde C_Y^{ij} = \min(C_Y^i, C_Y^j)$. Let $L_{ij} = 1$ if the pair $(i, j)$ is orderable and $L_{ij} = 0$ otherwise (Oakes, 1982). The estimator
\[ \hat\tau_O = \binom{n}{2}^{-1} \sum_{i<j} L_{ij} \psi_{ij} \]
is biased when $\tau$ is non-zero; nevertheless, it is widely used to test independence of a pair of random variables based on censored data. Recently, Lakhal et al. (2008b) proposed a method that greatly reduces the bias of $\hat\tau_O$ by incorporating inverse probability of censoring weights. Let $\hat p_{ij}$ be an estimator of the probability of being orderable,
\[ p_{ij} = P^2\{C_X > \tilde X_{ij}; \ C_Y > \tilde Y_{ij} \mid \tilde X_{ij}, \tilde Y_{ij}\}. \]
The weighted estimate is then given by
\[ \hat\tau_{mo} = \binom{n}{2}^{-1} \sum_{i<j} \frac{L_{ij} \psi_{ij}}{\hat p_{ij}}. \tag{2.2} \]
This estimator is shown to be consistent and asymptotically normally distributed under regularity conditions.
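The contrast between the unweighted estimator of Oakes (1982) and the weighted estimator (2.2) can be sketched in a few lines. The block below is an illustration under stated assumptions rather than the implementation of Lakhal et al. (2008b): the censoring variables are taken to be independent of each other with known survival functions (so the weights are computed from the true censoring distribution instead of being estimated), ties are ignored, and the function names and simulation design are hypothetical.

```python
import numpy as np


def clayton_exponential_pairs(n, tau, rng):
    """Unit-exponential (X, Y) linked by a Clayton survival copula with Kendall's tau > 0."""
    alpha = 2.0 * tau / (1.0 - tau)
    u = 1.0 - rng.uniform(size=n)
    w = 1.0 - rng.uniform(size=n)
    v = ((w ** (-alpha / (1.0 + alpha)) - 1.0) * u ** (-alpha) + 1.0) ** (-1.0 / alpha)
    return -np.log(u), -np.log(v)


def simulate_parallel(n, tau, cens_mean, seed):
    """Observed data (x_obs, y_obs, dx, dy) under independent exponential censoring (assumed)."""
    rng = np.random.default_rng(seed)
    x, y = clayton_exponential_pairs(n, tau, rng)
    cx = rng.exponential(cens_mean, size=n)
    cy = rng.exponential(cens_mean, size=n)
    return np.minimum(x, cx), np.minimum(y, cy), x < cx, y < cy


def weighted_tau(x_obs, y_obs, dx, dy, surv_cx=None, surv_cy=None):
    """Sum of L_ij * psi_ij / p_ij over pairs, divided by the number of pairs.

    With surv_cx = surv_cy = None all weights equal 1 (the unweighted estimator);
    otherwise p_ij = {G_X(min x) * G_Y(min y)}^2, using the supplied survival
    functions of the censoring variables (assumed known and independent here).
    """
    i, j = np.triu_indices(len(x_obs), k=1)
    first_x = np.where(x_obs[i] <= x_obs[j], i, j)  # index of the smaller observed X
    first_y = np.where(y_obs[i] <= y_obs[j], i, j)  # index of the smaller observed Y
    # a pair is orderable when both observed minima are uncensored
    orderable = dx[first_x] & dy[first_y]
    psi = np.sign((x_obs[i] - x_obs[j]) * (y_obs[i] - y_obs[j]))
    if surv_cx is None:
        p = np.ones(len(i))
    else:
        p = (surv_cx(x_obs[first_x]) * surv_cy(y_obs[first_y])) ** 2
    return np.sum(orderable * psi / p) / len(i)
```

For example, with `x_obs, y_obs, dx, dy = simulate_parallel(800, 0.5, 4.0, seed=2)` and `G = lambda t: np.exp(-t / 4.0)`, calling `weighted_tau(x_obs, y_obs, dx, dy)` gives the unweighted estimate, which is attenuated towards zero, while `weighted_tau(x_obs, y_obs, dx, dy, G, G)` applies the inverse probability weights.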
The estimator (2.2) has also been shown empirically to perform well in finite samples relative to existing competitors.

2.2 Estimation With Serial Gap Times and Dependent Censoring

In this section, we adapt the estimator given by (2.2) to estimate Kendall’s tau between serial gap times. Let $T_1 = X$ denote the time of the intermediate event and $T_2 = X + Y$ the time of the terminal event. We assume that $C$ is a censoring variable, independent of $X$ and $Y$, that defines the region over which $T_1$ and $T_2$ are observable, and we denote its survival function by $G(\cdot)$. That is, one may only observe $\tilde X = \tilde T_1$, $\tilde Y = \tilde T_2 - \tilde T_1$, $\delta_X = I(T_1 < C)$ and $\delta_Y = I(T_2 < C)$, where $\tilde T_1 = \min(T_1, C)$ and $\tilde T_2 = \min(T_2, C)$. Note that if $T_1$ is censored, $T_2$ is also censored and $\tilde Y = 0$. Thus $Y$ is censored by $C' = \max(0, C - X)$. Unless $X$ and $Y$ are independent, $C'$ is associated with $Y$. Under these conditions, Lin et al. (1999), among others, derived a nonparametric estimator for $\pi(x, y)$ for each $(x, y)$ such that $x + y \le \bar C$, where $\bar C > 0$ satisfies $G(\bar C) > 0$; this estimator may be expressed as
\[ \hat\pi(x, y) = \frac{1}{n} \sum_{i=1}^{n} \frac{I(\tilde X_i > x; \ \tilde Y_i > y)}{\hat G(\tilde X_i + y)}, \tag{2.3} \]
where $\hat G(\cdot)$ is the Kaplan-Meier estimator of $G(\cdot)$ based on $\{(\tilde X_k + \tilde Y_k, 1 - \delta_{Y_k}), k = 1, \ldots, n\}$. Plugging the latter estimator into (2.1) yields an estimator for $\tau$ which we denote by $\hat\tau_W$. Here we propose an alternative estimator for $\tau$ and compare the two estimators.

The orderability condition $L_{ij}$ for serial events is expressed as $\{\tilde X_{ij} < \tilde C_{ij}; \ \tilde Y_{ij} < \min(C_i', C_j')\}$, with $\tilde C_{ij} = \min(C_i, C_j)$; a point with $\delta_X = 0$ cannot be ordered with any other point. Indeed, such points contain no information on the association of $X$ and $Y$ since the second gap time is not observed, and their contribution to the estimator of $\tau$ is restricted to the computation of the weights $\hat p_{ij}$. This is in agreement with Figure 4.

Insert Figure 4 about here

By continuity, an orderable pair satisfies $\Pr(\tilde Y_{ij} > 0 \mid L_{ij} = 1) = 1$, so $\min(C_i', C_j') > 0$, $X_i < C_i$ and $X_j < C_j$, and
\[ \delta_{X_i} = 1 \ \text{and} \ \delta_{X_j} = 1, \tag{2.4} \]
\[ \tilde X_{ij} < \tilde C_{ij}. \tag{2.5} \]
By (2.5), $L_{ij}$ reduces to $\{\tilde Y_{ij} < \min(C_i - X_i, C_j - X_j)\}$, which can be re-written as
\[ \{C_i > X_i + \tilde Y_{ij}; \ C_j > X_j + \tilde Y_{ij}\}. \tag{2.6} \]
Observe that for any orderable pair $(i, j)$, the triplet $\{X_i, X_j, \tilde Y_{ij}\}$ is observed. Thus one can express the conditional probability of being orderable as
\[ \begin{aligned} p_{ij} &= \Pr\{C_i > X_i + \tilde Y_{ij}; \ C_j > X_j + \tilde Y_{ij} \mid X_i, X_j, \tilde Y_{ij}\} \\ &= \Pr\{C_i > X_i + \tilde Y_{ij} \mid X_i, X_j, \tilde Y_{ij}\} \times \Pr\{C_j > X_j + \tilde Y_{ij} \mid X_i, X_j, \tilde Y_{ij}\} \\ &= G(X_i + \tilde Y_{ij}) \times G(X_j + \tilde Y_{ij}). \end{aligned} \]
This probability is estimated by
\[ \hat p_{ij} = \hat G(X_i + \tilde Y_{ij}) \times \hat G(X_j + \tilde Y_{ij}). \tag{2.7} \]
Kendall’s tau is then estimated by (2.2), with $\hat p_{ij}$ given by (2.7). As will be proved in the appendices, the resulting estimator is consistent and asymptotically normally distributed as long as there are constants $\bar C$ and $\epsilon$ satisfying $G(\bar C) > \epsilon > 0$ such that $X_i + Y_i < \bar C$ for $i = 1, \ldots, n$.

3. A Clayton Copula Model for Serial Gap Times (X, Y)

Once the first event occurs, say at $X = x$, the conditional survival function $S_Y(\cdot \mid X = x)$ becomes of interest for practitioners. A convenient way to estimate this probability is to assume a Clayton copula for the pair $(X, Y)$. In this section, we investigate such a model, derive related inference procedures and discuss further extensions.

3.1 Model and properties

Under a Clayton copula for $(X, Y)$, the joint survival function is expressed as
\[ \pi(x, y) = \left[ S_X(x)^{-(\theta - 1)} + S_Y(y)^{-(\theta - 1)} - 1 \right]^{-1/(\theta - 1)}, \tag{3.1} \]
where $\theta \in [0, \infty]$ is the cross-ratio of $X$ and $Y$, defined as
\[ \theta = \theta(x, y) = \frac{\lambda_Y(y \mid X = x)}{\lambda_Y(y \mid X > x)}, \tag{3.2} \]
and $\lambda_Y(y \mid \cdot) = \lim_{dy \downarrow 0} \Pr(Y \le y + dy \mid Y > y; \cdot)/dy$ is the conditional hazard function of $Y$. Under this model the cross-ratio, which measures the level of dependence between $X$ and $Y$, is constant and is related to Kendall’s tau through
\[ \tau = \frac{\theta - 1}{\theta + 1}. \tag{3.3} \]
The fact that
\[ S_Y(y \mid X = x) = \left[ S_Y(y \mid X > x) \right]^{\theta} \tag{3.4} \]
means that an estimate of the conditional distribution can be obtained from $S_Y(y \mid X > x)$. Lin et al.
(1999) note that $S_Y(y \mid X > x)$ can be estimated by $\hat\pi(x, y)/\hat\pi(x, 0)$ for $x + y < \bar C$, where $\hat\pi(\cdot, \cdot)$ is given by (2.3). We can also estimate $\theta$ from the Clayton relation (3.3), $\tau = (\theta - 1)/(\theta + 1)$, by $\hat\theta = (1 + \hat\tau)/(1 - \hat\tau)$, where $\hat\tau$ is the nonparametric estimator of $\tau$ derived in Section 2.2. A natural estimator for $S_Y(y \mid X = x)$ is then obtained by plugging in estimators for the unknown quantities in (3.4).

3.2 Numerical investigations with the Clayton Copula

Colon cancer data: Moertel et al. (1990) discuss a clinical trial where patients treated for colon cancer are randomized into two groups, therapy and placebo, including 304 and 315 patients respectively. Patients are potentially subject to ordered events, and the serial gap times in this example are the time from randomization to cancer recurrence ($X$) and the time from cancer recurrence to death ($Y$). At the end of the study, 108 patients had died among the 119 who had cancer recurrence in the therapy group, and 155 had died among the 177 who had cancer recurrence in the placebo group. We computed $\hat\tau$ and $\hat\tau_W$ for both groups. We found $\hat\tau = 0.2725$ (s.e. 0.058) and $\hat\tau_W = -0.796$ (s.e. 0.613) for the therapy group, and $\hat\tau = 0.2685$ (s.e. 0.054) and $\hat\tau_W = 0.012$ (s.e. 0.779) for the placebo group. Our estimator $\hat\tau$ detects a significant positive dependence between $X$ and $Y$ in both groups. This was conjectured by Lin et al. (1999). Furthermore, $\hat\tau$ suggests that the magnitude of this dependence is not affected by the therapy. By contrast, the variance of $\hat\tau_W$ is too large to permit inference.

A convenient way to illustrate this dependence is to investigate the conditional survival function $S_Y(\cdot \mid X = x)$. The latter is estimated under a Clayton copula for the pair $(X, Y)$. He and Lawless (2003) tested and did not reject the adequacy of the assumed model for both groups. In Figure 5, we report the median of $\hat S_Y(\cdot \mid X = x)$ versus $x$ for both groups.

Insert Figure 5 about here

Figure 5 suggests that therapy decreases survival time following cancer recurrence.
This conclusion is in agreement with those of Lin et al. (1999) and He and Lawless (2003).

Simulations: A first set of simulations was conducted to assess and compare the finite-sample performance of $\hat\tau$, $\hat\tau_W$ and the resulting estimators of $S_Y(y \mid X = x)$. Four true values of $\tau$ (0.2, 0.4, 0.6 and 0.8) were used to generate samples of 400 correlated pairs $(X, Y)$ using a Clayton copula with exponential marginals with means equal to 1 and 1/2 respectively. The censoring variable $C$ was generated from a uniform distribution over $[0, A]$, with $A$ controlling the censoring fraction $cf_1 = P(X > C)$. Two values (0.1 and 0.2) were used for this censoring fraction; they correspond to $cf_2 = P(X + Y > C)$ equal to approximately 0.20 and 0.30. Means and mean square errors of $\hat\tau$, $\hat\tau_W$ and the resulting $\hat S_Y(y_i \mid X = x_0)$ ($i = 1, 2, 3, 4$), computed over 1000 iterations, are reported in Table 1. The points $x_0$, $y_1$, $y_2$, $y_3$ and $y_4$ are chosen such that $S_X(x_0) = 1/2$ and $S_Y(y_i \mid X = x_0) = i/5$.

Insert Table 1 about here

As expected, the censoring fractions affect all estimators. Table 1 shows that $\hat\tau$ outperforms $\hat\tau_W$ under all simulation conditions, as attested by their respective mean square errors. The proposed $\hat\tau$ is virtually unbiased, except under the severe conditions $cf_1 = 0.2$, $cf_2 = 0.3$ and $\tau \ge 0.6$. On the other hand, the bias of $\hat\tau_W$ is non-negligible, even under light censoring and small values of $\tau$. The same conclusions can be reached for the estimators of $S_Y(y_i \mid X = x_0)$ based on $\hat\tau$ and $\hat\tau_W$ respectively. Table 1 suggests that the variance of $\hat S_Y(y \mid X = x_0)$ is smaller for large values of $y$. This result is not standard in survival analysis but is in accordance with the simulation results reported in Table 1 of Lin et al. (1999).

4. A Piecewise Clayton Copula

A constant cross-ratio, and hence the Clayton copula, may not be appropriate for some applications. Nan et al.
(2006) discuss a copula model where the cross-ratio depends on one of the time-to-event variables, say $X$, rather than being constant. They assumed a partition $0 = w_0 < w_1 < \cdots < w_K$ of the support of $X$ such that the cross-ratio is constant inside each strip $[w_{k-1}, w_k) \times [0, M_Y]$ and equal to $\theta_k$. Under such conditions, (3.4) becomes
\[ S_Y(y \mid X = x) = \left[ S_Y(y \mid X > x) \right]^{\theta_k} \quad \text{for } x \in [w_{k-1}, w_k). \tag{4.1} \]
The resulting model is referred to as the piecewise Clayton copula in what follows. Kendall’s tau $\tau_k$ restricted to $(X, Y) \in A_k = [w_{k-1}, w_k) \times [0, M_Y]$ is related to $\theta_k$, but this relationship is not as simple as (3.3), except for $k = K$. It depends on $S_X(w_{k-1})$ and $S_X(w_k)$ and is difficult to track analytically. Denote it as
\[ \tau_k = \xi\left[\theta_k; \ S_X(w_{k-1}); \ S_X(w_k)\right]. \tag{4.2} \]
Nevertheless, numerical values of $\xi$ can easily be obtained by simulation. In Figure 6, these are reported with $S_X(w_{k-1}) = 0.75$ and $S_X(w_k) = 0.5$.

Insert Figure 6 about here

Nan et al. (2006) assumed a piecewise Clayton copula for $(T_1, T_2)$ and estimated the model parameters $\{\theta_k, k = 1, \ldots, K\}$ by adapting the methodology of Shih and Louis (1995). This approach ignores the ordered nature of $(T_1, T_2)$. In particular, the resulting estimator of $\Pr(T_1 > T_2)$ is not identically zero as it should be. A more appealing approach may be to assume a piecewise Clayton copula for $(X, Y)$, where no order restrictions are present, rather than for $(T_1, T_2)$. Moreover, if the presence of a cycle of 45 days or more signals early changes prior to the onset of menopause, $Y$ may well be a more natural object of interest for some scientific questions.

We propose to estimate the model parameters $\{\theta_k, k = 1, \ldots, K\}$ by inversion of $\xi$. We begin by estimating $\tau_k$ via a slight modification of the estimator derived in Section 2.2. Note that for any orderable pair $(i, j)$, both $X_i$ and $X_j$ are observed, and thus censoring does not prevent one from knowing whether these points belong to a given strip $A_k$.
Denote the indicator of this event by $\nu_{i,j}(k) = I\{w_{k-1} \le \tilde X_i, \tilde X_j \le w_k\}$. A natural estimator of $\tau_k$ is given by
\[ \hat\tau_k = \sum_{i<j} \frac{L_{ij} \psi_{ij} \nu_{i,j}(k)}{\hat p_{ij}} \Bigg/ \sum_{i<j} \frac{L_{ij} \nu_{i,j}(k)}{\hat p_{ij}}. \tag{4.3} \]
For $k = 1, \ldots, K$, $\theta_k$ is then estimated by
\[ \hat\theta_k = \xi^{-1}\left[\hat\tau_k, \ \hat S_X(w_{k-1}), \ \hat S_X(w_k)\right]. \tag{4.4} \]
The resulting $\hat\theta_k$, along with (4.1), yields an estimator for $S_Y(y \mid X = x)$.

4.1 Analysis of the Tremin Trust Data

We consider the Tremin Trust data. The purpose of this follow-up study is to quantify the association between several bleeding markers, such as the 45-day cycle, and menopause. At the end of the study, 193 of the 357 women who experienced the 45-day marker had been observed to reach menopause. We consider the same intervals for $T_1 = X$ as used by Nan et al. (2006), namely 35-39, 40-45, 46-49 and 50+, within which the cross-ratio between $X$ and $Y$ is assumed constant. The boundaries satisfy $\hat S_X(w_k) = \{0.8383, 0.562, 0.281\}$ for $k = 1, 2, 3$. We estimated $\tau_k$, $k = 1, 2, 3, 4$, using equation (4.3). Inverting these estimates according to (4.4) yields estimates of the $\theta_k$. The results are presented in Table 2.

Insert Table 2 about here

Thus, there is a significant association between the 45-day cycle and menopause if the former occurs inside the interval 40-49. This is in agreement with the results of Nan et al. (2006), who also detected a significant association between $T_1$ and $T_2$ in the same region. Nan et al. (2006) presented plots of $\Pr(T_2 > t_2 \mid T_1 = t_1)$ versus $t_2$ for different values of $t_1$. This probability is equal to $S_Y(t_2 - t_1 \mid X = t_1)$ and thus can be estimated under the assumed model. The results are presented in Figure 7 for $t_1 \in \{36, 39, 42, 45, 48, 51\}$.

Insert Figure 7 about here

This figure agrees with the one produced by Nan et al. (2006) for women who experience the 45-day cycle marker after age 40. Nevertheless, we obtain different results for women who experience the 45-day cycle before age 40.
In particular, we estimate that such a woman has about a 20% chance of reaching menopause before age 45 (the $X = 36$ and $X = 39$ curves), while this probability is estimated to be approximately zero by Nan et al. (2006). This may suggest a lack of fit of one of the models and thus the need for appropriate goodness-of-fit tests for such copula models. This is beyond the scope of this paper.

4.2 Numerical Investigations

A second set of simulations was conducted to assess the performance of the estimator of $\tau_k$ given by (4.3). We simulated 1000 samples of 562 bivariate observations according to a piecewise Clayton copula with parameters corresponding to $\tau_k \in \{-0.049, -0.374, -0.235, -0.139\}$. The boundaries $w_k$, $k = 1, 2, 3$, are chosen such that the $S_X(w_k)$ are equal to $\{0.8383, 0.562, 0.281\}$. We obtained $\hat\tau_k$ equal to $-0.049$ (s.e. 0.10), $-0.373$ (s.e. 0.066), $-0.229$ (s.e. 0.054) and $-0.136$ (s.e. 0.053). These simulations show that $\hat\tau_k$ is virtually unbiased. Furthermore, the jackknife estimates of the variance used with the real data set seem to overestimate the real variability.

5. Discussion

In this paper, we consider nonparametric estimation of the association between successive gap times $(X, Y)$ when the second gap time is subject to dependent censoring induced by the association itself and an independent right censoring time of the process. Inverse probability weights are used to obtain a nonparametric estimate of Kendall’s tau, which can be used to test the null hypothesis of independence between gap times. This can be done either by providing an estimator for the variance of $\hat\tau$ under independence or by resampling techniques.

The proposed estimator of Kendall’s $\tau$ can then be used to make inferences about the conditional survivor function $S_Y(y \mid X = x)$ under a Clayton copula. While this is possible by (3.4) for the Clayton copula, this particular relation does not hold for other copula families.
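To make the Clayton-specific shortcut concrete, the following minimal sketch converts Kendall's tau to the cross-ratio using the standard Clayton relation τ = (θ − 1)/(θ + 1), and applies the power relation (3.4) to an estimate of the survival function of Y given X > x. The function names are illustrative, and the inputs are assumed to come from estimators such as those of Sections 2.2 and 3.1.

```python
import numpy as np


def theta_from_tau(tau):
    """Cross-ratio implied by Kendall's tau under a Clayton copula (tau < 1)."""
    return (1.0 + tau) / (1.0 - tau)


def tau_from_theta(theta):
    """Inverse relation: Kendall's tau implied by the cross-ratio theta."""
    return (theta - 1.0) / (theta + 1.0)


def clayton_joint_survival(s_x, s_y, theta):
    """Joint survival (3.1) evaluated at marginal survival probabilities (theta != 1)."""
    a = theta - 1.0
    return (s_x ** (-a) + s_y ** (-a) - 1.0) ** (-1.0 / a)


def cond_survival_given_x_eq(s_y_given_x_gt, theta):
    """Relation (3.4): S_Y(y | X = x) = S_Y(y | X > x) ** theta."""
    return np.asarray(s_y_given_x_gt, dtype=float) ** theta
```

As a check on (3.4), the derivative of `clayton_joint_survival(u, v, theta)` with respect to `u` (which is the conditional survival of Y given X = x when u is the marginal survival probability of X at x) agrees numerically with `cond_survival_given_x_eq(clayton_joint_survival(u, v, theta) / u, theta)`. For other copula families this identity generally fails, which is why the plug-in step is specific to the Clayton model.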
Alternative ways of estimating $S_Y(y \mid X = x)$ under arbitrary copula functions warrant investigation, as the Clayton copula may not provide a suitable fit for a given data set.

Recurrent event processes yield successive gap times, and using generalizations of the estimator of Kendall’s tau to make inferences about gap time models for recurrent event processes with copula formulations is another area warranting development. Such models would allow the estimation of $\Pr(T_k > t_k \mid T_1 = t_1, T_2 = t_2, \ldots, T_{k-1} = t_{k-1})$.

We employed a piecewise Clayton copula formulation in Section 4. Such models have received relatively little attention in the literature, and methods to assist in the specification of regions with a constant cross-ratio would be helpful; at present they are based on ad hoc graphical methods. Derivation of formal goodness-of-fit tests for these models and objective procedures for specifying these regions would increase their practical utility.

In many settings event times are not observed precisely; individuals are only assessed at periodic inspection times, creating interval-censored data on gap times. In the context of a three-state progressive model, the intermediate event may be subject to interval censoring and the terminal event to right censoring, or both events may be interval censored. In this case nonparametric estimation of the association between gap times is considerably more challenging, and parametric assumptions may be required.

REFERENCES

Andersen, P.K. and Keiding, N. (2002). Multi-state models for event history analysis. Statistical Methods in Medical Research 11, 91–115.
Andrei, A.-C. and Murray, S. (2006). Estimating the quality-of-life-adjusted gap time distribution of successive events subject to censoring. Biometrika 93(2), 343–355.
Bebchuk, J.D. and Betensky, R.A. (2002). Local likelihood analysis of the latency distribution with interval censored intermediate events. Statist. Med. 21, 3475–3491.
Betensky, R. and Finkelstein, D.
(1999). An extension of Kendall’s coefficient of concordance to bivariate interval-censored data. Statist. Med. 18, 3101–3109.
Fine, J., Jiang, H. and Chappell, R. (2001). On semi-competing risks data. Biometrika 88(4), 907–919.
Hougaard, P. (2000). Multi-state models: A review. Lifetime Data Analysis 5, 239–264. Springer, New York.
Huang, Y. (2000). Two-sample multistate accelerated sojourn times model. J. Am. Statist. Assoc. 95, 619–27.
He, W. and Lawless, J.F. (2003). Flexible maximum likelihood methods for bivariate proportional hazards models. Biometrics 12, 837–848.
Kendall, M. and Gibbons, J.D. (1990). Rank Correlation Methods (Fifth ed.). A Charles Griffin Title. London: Edward Arnold.
Kessing, L.V., Hansen, M.G. and Andersen, P.K. (2004a). Course of illness in depressive bipolar disorders: Naturalistic study, 1994–1999. Brit. J. Psych. 185, 372–377.
Kessing, L.V., Hansen, M.G., Andersen, P.K. and Angst, J. (2004b). The predictive effect of episodes on the risk of recurrence in depressive and bipolar disorder - a life-long prospective. Acta Psych. Scand. 109, 339–344.
Kvist, K., Gerster, M., Andersen, P.K. and Kessing, L.V. (2007). Non-parametric estimation and model checking procedures for marginal gap time distributions for recurrent events. Statist. Med. 26, 5394–5410.
Lakhal-Chaieb, L., Rivest, L.-P. and Abdous, B. (2006). Estimating survival under dependent truncation. Biometrika 93(3), 655–669.
Lakhal-Chaieb, L., Rivest, L.-P. and Abdous, B. (2008a). Estimating survival and association in a semicompeting risks model. Biometrics 64(1), 180–188.
Lakhal-Chaieb, L., Rivest, L.-P. and Beaudoin, D. (2008b). IPCW estimators for Kendall’s tau under bivariate censoring. Under revision.
Lin, D.Y., Sun, W. and Ying, Z. (1999). Nonparametric estimation of the gap time distributions for serial events with censored data. Biometrika 86, 59–70.
Lisabeth, L.D., Harlow, S.D., Gillespie, B., Lin, X., Sowers, M.F. (2004).
Staging reproductive aging: a comparison of proposed bleeding criteria for the menopausal transition. Menopause 11(2), 186–197.
Manatunga, A.K. and Oakes, D. (1996). A measure of association for bivariate frailty distributions. Journal of Multivariate Analysis 56, 60–74.
Moertel, C.G., Fleming, T.R., McDonald, J.S. et al. (1990). Levamisole and fluorouracil for adjuvant therapy of resected colon carcinoma. New England Journal of Medicine 322, 352–358.
Nan, B., Lin, X., Lisabeth, L.D. and Harlow, S.D. (2006). Piece-wise constant cross-ratio estimates for the association between age at marker event and age at menopause. J. Am. Statist. Assoc. 101, 65–77.
Oakes, D. (1982). A concordance test for independence in the presence of censoring. Biometrics 38(2), 451–455.
Oakes, D. (1986). A model for bivariate survival data. In Modern Statistical Methods in Chronic Disease Epidemiology, eds. S.H. Moolgavkar and R.L. Prentice. Wiley, New York.
Schaubel, D.E. and Cai, J. (2004). Regression methods for gap time hazard functions of sequentially ordered multivariate failure time data. Biometrika 91, 291–303.
Shih, J.H. and Louis, T.A. (1995). Inferences on the association parameter in copula models for bivariate survival data. Biometrics 51, 1384–1399.
Wang, M.C. (1999). Gap time bias in incident and prevalent cohorts. Statist. Sinica 9, 999–1010.
Wang, W. and Wells, M.T. (1998). Nonparametric estimation of successive duration times under dependent censoring. Biometrika 85(3), 561–572.
Wang, W. and Wells, M.T. (2000). Estimation of Kendall’s tau under censoring. Statist. Sinica 10(4), 1199–1215.

Appendix A: A U-statistic expression for $\sqrt{n}(\tilde\tau - \tau)$

Let $\tilde\tau$ denote the version of the estimator computed with the true weights $p_{ij}$ in place of $\hat p_{ij}$. It is clear that $\tilde\tau$ is a U-statistic of order 2 whose expectation is equal to
\[ \begin{aligned} E(\tilde\tau) &= \binom{n}{2}^{-1} \sum_{i<j} E\left\{ \frac{L_{ij} a_{ij} b_{ij}}{p_{ij}} \right\} \\ &= \binom{n}{2}^{-1} \sum_{i<j} E\left\{ E\left[ \frac{L_{ij} a_{ij} b_{ij}}{p_{ij}} \,\Big|\, X_i, X_j, \tilde Y_{ij} \right] \right\} \\ &= \binom{n}{2}^{-1} \sum_{i<j} E\left\{ \frac{1}{p_{ij}} E\left[ L_{ij} a_{ij} b_{ij} \mid X_i, X_j, \tilde Y_{ij} \right] \right\}. \end{aligned} \]
Once $X_i$, $X_j$ and $\tilde Y_{ij}$ are fixed, by (2.6), the orderability event $L_{ij}$ depends only on the censoring variables $C_i$ and $C_j$, while the concordance/discordance status $a_{ij} b_{ij}$ depends only on $\max(Y_i, Y_j)$. Hence, by independence, one has
\[ E\{L_{ij} a_{ij} b_{ij} \mid X_i, X_j, \tilde Y_{ij}\} = E\{L_{ij} \mid X_i, X_j, \tilde Y_{ij}\} \, E\{a_{ij} b_{ij} \mid X_i, X_j, \tilde Y_{ij}\} \]
and, if $p_{ij} > 0$ for all pairs,
\[ E(\tilde\tau) = \binom{n}{2}^{-1} \sum_{i<j} E\left\{ E\left[ a_{ij} b_{ij} \mid X_i, X_j, \tilde Y_{ij} \right] \right\} = \binom{n}{2}^{-1} \sum_{i<j} E\{a_{ij} b_{ij}\} = \tau. \]
Thus $\sqrt{n}(\tilde\tau - \tau)$ is a zero-mean U-statistic of order 2.

Appendix B: Asymptotic expression for $\sqrt{n}(\hat\tau - \tilde\tau)$

Following the lines of Andrei and Murray (2006), and writing $s_i = X_i + \tilde Y_{ij}$ and $s_j = X_j + \tilde Y_{ij}$ for brevity, we have
\[ \frac{1}{\hat p_{ij}} - \frac{1}{p_{ij}} = \frac{1}{\hat G(s_i) \hat G(s_j)} - \frac{1}{G(s_i) G(s_j)} = \frac{G(s_i) G(s_j) - \hat G(s_i) \hat G(s_j)}{G(s_i) G(s_j) \hat G(s_i) \hat G(s_j)} \tag{B.1} \]
\[ = \frac{G(s_i) \{G(s_j) - \hat G(s_j)\}}{G(s_i) G(s_j) \hat G(s_i) \hat G(s_j)} \tag{B.2} \]
\[ \quad + \frac{\hat G(s_j) \{G(s_i) - \hat G(s_i)\}}{G(s_i) G(s_j) \hat G(s_i) \hat G(s_j)}. \tag{B.3} \]
The absolute value of the right-hand side, (B.2) plus (B.3), is bounded from above by
\[ \frac{2 \times \sup_{0 \le c \le \bar C} |\hat G(c) - G(c)|}{\hat G^2(\bar C) \, G^2(\bar C)} \]
and thus converges to zero in probability by uniform convergence of the Kaplan-Meier estimator on $[0, \bar C]$. The convergence of $\sqrt{n}(\hat\tau - \tilde\tau)$ follows.
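On the other hand, one has

$$
\begin{aligned}
\sqrt{n} \left[ \frac{G(t) G(t') - \hat G(t) \hat G(t')}{G(t) G(t')} \right]
&= \frac{\sqrt{n} \{ G(t) - \hat G(t) \}}{G(t)} + \frac{\sqrt{n} \{ G(t') - \hat G(t') \}}{G(t')} + o_p(1) \\
&= \sqrt{n} \int_0^{t} \frac{\hat G(u-)}{G(u)} \frac{\sum_{k=1}^n dM_k^c(u)}{\sum_{k=1}^n I(\tilde X_k + \tilde Y_k \ge u)}
 + \sqrt{n} \int_0^{t'} \frac{\hat G(u-)}{G(u)} \frac{\sum_{k=1}^n dM_k^c(u)}{\sum_{k=1}^n I(\tilde X_k + \tilde Y_k \ge u)} + o_p(1),
\end{aligned}
$$

where $M^c(u)$ is the standard martingale associated with the censoring variable $C$, defined by

$$
M^c(u) = I(C \le u;\, \delta_Y = 0) - \int_0^u I(\tilde X + \tilde Y \ge s)\, d\Lambda^c(s),
$$

and thus

$$
\sqrt{n} \left[ \frac{G(X_i + \tilde Y_{ij})\, G(X_j + \tilde Y_{ij}) - \hat G(X_i + \tilde Y_{ij})\, \hat G(X_j + \tilde Y_{ij})}{G(X_i + \tilde Y_{ij})\, G(X_j + \tilde Y_{ij})} \right]
$$

can be written as

$$
\sqrt{n} \int_0^{\bar C} I(X_i + \tilde Y_{ij} \ge u) \frac{\hat G(u-)}{G(u)} \frac{\sum_{k=1}^n dM_k^c(u)}{\sum_{k=1}^n I(\tilde X_k + \tilde Y_k \ge u)}
+ \sqrt{n} \int_0^{\bar C} I(X_j + \tilde Y_{ij} \ge u) \frac{\hat G(u-)}{G(u)} \frac{\sum_{k=1}^n dM_k^c(u)}{\sum_{k=1}^n I(\tilde X_k + \tilde Y_k \ge u)} + o_p(1)
$$

and

$$
\begin{aligned}
\sqrt{n}(\hat\tau - \tilde\tau)
&= \sqrt{n} \int_0^{\bar C} \binom{n}{2}^{-1} \sum_{i<j} \frac{L_{ij} a_{ij} b_{ij}\, I(X_i + \tilde Y_{ij} \ge u)}{\hat G(X_i + \tilde Y_{ij})\, \hat G(X_j + \tilde Y_{ij})} \frac{\hat G(u-)}{G(u)} \frac{\sum_{k=1}^n dM_k^c(u)}{\sum_{k=1}^n I(\tilde X_k + \tilde Y_k \ge u)} \\
&\quad + \sqrt{n} \int_0^{\bar C} \binom{n}{2}^{-1} \sum_{i<j} \frac{L_{ij} a_{ij} b_{ij}\, I(X_j + \tilde Y_{ij} \ge u)}{\hat G(X_i + \tilde Y_{ij})\, \hat G(X_j + \tilde Y_{ij})} \frac{\hat G(u-)}{G(u)} \frac{\sum_{k=1}^n dM_k^c(u)}{\sum_{k=1}^n I(\tilde X_k + \tilde Y_k \ge u)} + o_p(1). && \text{(B.4)}
\end{aligned}
$$

Since

$$
\frac{\hat G(u-)}{G(u)} \frac{n}{\sum_{k=1}^n I(\tilde X_k + \tilde Y_k \ge u)} \to \frac{1}{P(\tilde X + \tilde Y \ge u)}
$$

almost surely, the right hand side of (B.4) can be expressed as

$$
\begin{aligned}
& \frac{1}{\sqrt{n}} \int_0^{\bar C} \binom{n}{2}^{-1} \sum_{i<j} \frac{L_{ij} a_{ij} b_{ij} \{ I(X_i + \tilde Y_{ij} \ge u) + I(X_j + \tilde Y_{ij} \ge u) \}}{\hat G(X_i + \tilde Y_{ij})\, \hat G(X_j + \tilde Y_{ij})} \frac{1}{P(\tilde X + \tilde Y \ge u)} \sum_{k=1}^n dM_k^c(u) + o_p(1) \\
&= \frac{1}{\sqrt{n}} \int_0^{\bar C} E\left[ \frac{L_{12} a_{12} b_{12} \{ I(X_1 + \tilde Y_{12} \ge u) + I(X_2 + \tilde Y_{12} \ge u) \}}{\hat G(X_1 + \tilde Y_{12})\, \hat G(X_2 + \tilde Y_{12})} \right] \frac{1}{P(\tilde X + \tilde Y \ge u)} \sum_{k=1}^n dM_k^c(u) + o_p(1) \\
&= \frac{1}{\sqrt{n}} \sum_{k=1}^n \int_0^{\bar C} E\left[ \frac{L_{12} a_{12} b_{12} \{ I(X_1 + \tilde Y_{12} \ge u) + I(X_2 + \tilde Y_{12} \ge u) \}}{G(X_1 + \tilde Y_{12})\, G(X_2 + \tilde Y_{12})} \right] \frac{1}{P(\tilde X + \tilde Y \ge u)}\, dM_k^c(u) + o_p(1).
\end{aligned}
$$

Finally, $\sqrt{n}(\hat\tau - \tilde\tau)$ is asymptotically equivalent to a sum of zero-mean independent and identically distributed terms.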
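As a computational illustration of the quantities handled in Appendices A and B, the following Python sketch evaluates an inverse probability of censoring weighted Kendall's tau with the weights $1/\{\hat G(X_i + \tilde Y_{ij})\, \hat G(X_j + \tilde Y_{ij})\}$. It is only a sketch under stated assumptions, not the authors' implementation: it assumes $\tilde Y_{ij} = \min(Y_i, Y_j)$, $a_{ij} = \mathrm{sign}(X_i - X_j)$, $b_{ij} = \mathrm{sign}(Y_i - Y_j)$, that a pair is orderable when both first gap times and the smaller second gap time are observed, that $\hat G$ is the left-continuous Kaplan-Meier estimate of the censoring survivor function, and that there are no ties. The function names `censoring_survival` and `ipcw_kendall_tau` are hypothetical.

```python
from itertools import combinations


def censoring_survival(total_time, censored):
    """Kaplan-Meier estimate of the censoring survivor function G.

    total_time[k] is the observed follow-up (first gap plus second gap) and
    censored[k] is truthy when follow-up ended by censoring.
    Returns a function t -> G_hat(t-), the left limit (assumed convention).
    """
    steps = []
    surv = 1.0
    for u in sorted({t for t, c in zip(total_time, censored) if c}):
        at_risk = sum(1 for t in total_time if t >= u)
        n_cens = sum(1 for t, c in zip(total_time, censored) if c and t == u)
        surv *= 1.0 - n_cens / at_risk
        steps.append((u, surv))

    def g_hat(t):
        value = 1.0
        for u, s in steps:
            if u >= t:  # only censoring times strictly before t contribute
                break
            value = s
        return value

    return g_hat


def sign(v):
    return (v > 0) - (v < 0)


def ipcw_kendall_tau(x, y, dx, dy):
    """IPCW Kendall's tau from observed gap times (x, y) and event indicators.

    dx[k] and dy[k] are 1 when the first and second gap times of subject k
    are observed rather than censored (assumed data layout).
    """
    n = len(x)
    total = [xi + yi for xi, yi in zip(x, y)]
    censored = [not (a and b) for a, b in zip(dx, dy)]
    g_hat = censoring_survival(total, censored)
    acc = 0.0
    for i, j in combinations(range(n), 2):
        if not (dx[i] and dx[j]):
            continue  # both first gap times must be observed
        lo = i if y[i] < y[j] else j
        if not dy[lo]:
            continue  # the smaller second gap time must be an observed event
        y_min = y[lo]
        p_hat = g_hat(x[i] + y_min) * g_hat(x[j] + y_min)
        acc += sign(x[i] - x[j]) * sign(y[i] - y[j]) / p_hat
    return acc / (n * (n - 1) / 2)
```

Without censoring every weight equals one and the function reduces to the ordinary Kendall's tau; for example, `ipcw_kendall_tau([1, 2, 3, 4], [1, 3, 2, 4], [1] * 4, [1] * 4)` has five concordant and one discordant pair and returns 2/3 up to rounding. In small samples with heavy censoring the weighted estimate can fall outside [-1, 1].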
Table 1
Means and mean square errors (in parentheses; ×10^4) of τ̂ (IPCW) and τ̂_W (WW) and their resulting conditional survival estimates

                        τ = 0.2                      τ = 0.4                      τ = 0.6                      τ = 0.8
cf2  Estimate           IPCW          WW             IPCW          WW             IPCW          WW             IPCW          WW
0.1  τ̂                  0.196 (15)    0.214 (34)     0.400 (12)    0.413 (43)     0.597 (7)     0.606 (37)     0.801 (3)     0.820 (37)
     Ŝ_Y(y1|x0)         0.196 (12)    0.185 (20)     0.202 (11)    0.192 (27)     0.204 (25)    0.194 (41)     0.207 (67)    0.445 (141)
     Ŝ_Y(y2|x0)         0.400 (16)    0.387 (28)     0.402 (26)    0.390 (46)     0.405 (59)    0.395 (85)     0.407 (193)   0.415 (352)
     Ŝ_Y(y3|x0)         0.600 (17)    0.592 (21)     0.602 (31)    0.592 (50)     0.607 (69)    0.608 (87)     0.610 (290)   0.593 (554)
     Ŝ_Y(y4|x0)         0.804 (15)    0.797 (16)     0.802 (22)    0.794 (31)     0.806 (62)    0.809 (69)     0.811 (305)   0.820 (760)
0.2  τ̂                  0.190 (25)    0.179 (63)     0.395 (17)    0.396 (61)     0.590 (10)    0.596 (74)     0.795 (3)     0.788 (70)
     Ŝ_Y(y1|x0)         0.198 (23)    0.193 (36)     0.195 (21)    0.192 (40)     0.202 (41)    0.194 (78)     0.208 (156)   0.723 (283)
     Ŝ_Y(y2|x0)         0.399 (28)    0.402 (40)     0.401 (40)    0.395 (67)     0.401 (80)    0.387 (149)    0.402 (377)   0.417 (787)
     Ŝ_Y(y3|x0)         0.597 (29)    0.595 (38)     0.602 (40)    0.596 (60)     0.601 (116)   0.584 (183)    0.622 (840)   0.635 (1653)
     Ŝ_Y(y4|x0)         0.798 (20)    0.798 (25)     0.802 (36)    0.799 (47)     0.797 (111)   0.775 (145)    0.807 (884)   0.856 (6224)

Table 2
Estimates of τ_k and θ_k with the Tremin Trust data

k    τ̂_k       s.e.     θ̂_k      95% C.I.
1    -0.049    0.157    0.412    [0.15;14.8]
2    -0.374    0.070    0.184    [0.128;0.270]
3    -0.235    0.093    0.327    [0.205;0.650]
4    -0.139    0.118    0.756    [0.460;1.200]

Figure 1. A three-state model for an intermediate and terminal event. [States: Initial State; At Least One 45-day Cycle; Menopause.]

Figure 2. Timeline diagram for the reproductive life cycle including the intermediate event of a cycle ≥ 45 days in duration. [Timeline marks: birth, menarche (T = 0), first 45-day cycle, menopause; gap times X and Y; times T1, C1, T2, C2.]

Figure 3. Empirical marginal distribution of time to first cycle of ≥ 45 days duration and menopause in Tremin Trust data of Nan et al. (2006). [Horizontal axis: age in years; vertical axis: proportion of women with event; curves: 45-day cycle, menopause.]

Figure 4. Plots of various censored data patterns. [Axes: X, Y.]

Figure 5. Empirical estimates of conditional median time from cancer occurrence to death given cancer occurrence date. [Horizontal axis: time from randomisation to cancer recurrence (years); vertical axis: conditional median gap time from cancer recurrence to death; curves: Observation, Therapy.]

Figure 6. Restricted Kendall’s tau. [Horizontal axis: theta; vertical axis: Kendall’s tau; curves: restricted Kendall’s tau, global Kendall’s tau.]

Figure 7. Empirical estimates of conditional survivor functions for menopause given age at first cycle of ≥ 45 days duration. [Horizontal axis: age at menopause; vertical axis: survival probability; curves: age 36, 39, 42, 45, 48, 51.]

Figure 8. Empirical estimates of conditional survivor functions for time from cancer occurrence to death given cancer occurrence. [Panels: x = 1, x = 2, x = 3, x = 4; horizontal axis: time since cancer recurrence in years (Y); vertical axis: conditional survival; curves: Observation, Therapy.]

Figure 9. Empirical estimates of conditional survivor functions for time from cancer occurrence to death given cancer occurrence date. [Panels: Observation, Therapy; horizontal axis: time since cancer recurrence in years (Y); vertical axis: conditional survival; curves: x = 1, x = 2, x = 3, x = 4.]