Inverse Probability of Censoring Weighted Estimates
of Kendall’s τ for Gap Time Analyses
Lajmi Lakhal-Chaieb¹,∗, Richard J. Cook²,† and Xihong Lin³,‡
¹ Département de Mathématiques et Statistique, Université Laval,
1045 av. de la Médecine, Local 1056, Québec, QC, Canada G1V 0A6
² Department of Statistics and Actuarial Science, University of Waterloo,
200 University Avenue West, Waterloo, ON, Canada N2L 3G1
³ Department of Biostatistics, Harvard School of Public Health,
655 Huntington Avenue, Building II Room 419, Boston, MA 02115
October 18, 2008
∗ email: [email protected]
† email: [email protected]
‡ email: [email protected]
Summary. In life history studies, interest often lies in the analysis of the inter-event, or gap, times and the association between event times. Gap time analyses are challenging, however, even when the length of follow-up is determined independently of the event process, since associations between gap times induce dependent censoring for second and subsequent gap times. This paper discusses nonparametric estimation of the association between consecutive gap times based on Kendall’s τ in the presence of this type of dependent censoring. A nonparametric estimator that uses inverse probability of censoring weights is provided. Estimates of joint and conditional gap time distributions can be obtained following specification of a particular copula function. Simulation studies show that the estimator performs well and compares favourably with an alternative estimator. Generalizations to a piecewise constant Clayton copula are given. The estimator is used to analyse data on the times to an intermediate marker of menopause and to menopause among women in a cohort study of reproductive health.
Key words: copula, dependent censoring, gap times, Kendall’s tau
1. Introduction
1.1 Background
In many settings involving the analysis of life history data, interest lies in the occurrence of two or more consecutive events. A common problem involves the joint analysis of a terminal event and an essential intermediate event. Bebchuk and Betensky (2002) consider such a three-state model, in which the intermediate state represents HIV infection via blood transfusion in hemophiliacs and the terminal event is AIDS diagnosis. Lin et al. (1999) discuss the analysis of follow-up data from a randomized trial of patients with colon cancer (Moertel et al., 1990); recurrence of disease and death are the intermediate and terminal events, respectively, in this setting. A third example, discussed in Hougaard (2000), involves the analysis of urinary albumin levels in diabetic patients. Normal levels of urinary albumin are less than 30 mg/24 hours, microalbuminuria is defined as 30–300 mg/24 hours, and macroalbuminuria is present if levels are greater than 300 mg/24 hours; individuals must experience microalbuminuria before they experience macroalbuminuria. Multi-state models provide a natural representation for all such processes; see Hougaard (2000) and Andersen and Keiding (2002) for general reviews of methods for modeling multi-state data.
Transitions between these states represent progression of the disease process, and interest often lies in understanding the time course of the event process. Markov models, in which the operational time scale is the time since process initiation or some other common origin, are often adopted for progressive and degenerative processes which feature aging. In many cases, however, interest lies in the sojourn distributions in particular states. In the latter case semi-Markov models are the canonical models, but in many settings it is not plausible to assume that successive gap times are independent. When there is an association between gap times and data are subject to type I right-censoring, the second sojourn time is subject to dependent censoring, as pointed out by Wang (1999) and Huang (2000). Analyses are then more complicated.
There have been considerable advances in the analysis of consecutive sojourn, or gap, times in such settings in recent years. Analyses may be based on shared frailties across gap time distributions, with the assumption of conditional independence between gap times given the frailty; Kessing et al. (2004a,b) consider this in the analysis of hospitalization data in psychiatric patients, and Kvist et al. (2007) develop a goodness-of-fit test for the latter model when the frailty is gamma distributed. Wang and Wells (1998) and Lin et al. (1999) derived nonparametric estimators for the joint and conditional survival functions using inverse probability of censoring weights; Schaubel and Cai (2004) consider alternative nonparametric methods. Andrei and Murray (2006) use inverse weighting to address dependent censoring in the analysis of quality-adjusted lifetimes, where different utilities are assigned to different states of a multi-state process. None of these methods yields a simple summary measure of the association between sojourn times, and there is considerable appeal in developing methods with simple measures of association.
It is therefore of interest to estimate robustly the extent of association between the gap times, using nonparametric methods that are not affected by this dependent censoring. Kendall’s tau is among the most popular measures of association between two time-to-event random variables. Kendall and Gibbons (1990) gave an empirical estimate of τ from an uncensored bivariate sample. Several authors have proposed estimators for τ with bivariate right-censored parallel observations (see Lakhal et al., 2008b). Betensky and Finkelstein (1999) extended the estimation of τ to bivariate interval-censored observations. Wang and Wells (2000) derived an estimator for τ that is valid for any censoring scheme as long as a nonparametric estimator for the joint survivor function exists. Little work, however, has been done on nonparametric estimation of τ under more complicated censoring schemes, such as the dependent censoring scheme discussed above. In this paper we propose a nonparametric estimator for Kendall’s τ measuring the association between two consecutive gap times. The proposed estimator incorporates inverse probability of censoring weights, which address the impact of dependent censoring on second gap times.
Because they summarize the association between paired failure times via a single parameter such as Kendall’s τ, copula functions offer an attractive framework for modeling the joint distribution of multiple failure times (e.g. Fine et al., 2001; Wang and Ding, 2001; Lakhal et al., 2006, 2008a). He and Lawless (2003) consider the case of serial events, assume a Clayton copula to model the dependence between two successive event times, and derive maximum likelihood estimates for the copula parameter under weak marginal assumptions. Oakes (1986) proposed a model formulation in which the cross-ratio is constant within regions of the plane. Nan et al. (2006) adopt this model for a joint analysis of the times of an intermediate and a terminal event, and derive estimators of the copula parameters using the maximum pseudo-likelihood procedure of Shih and Louis (1995), with the marginals left completely unspecified.
The second purpose of this paper is to develop methods for consecutive gap time analyses, first based on a standard copula formulation and then, in the spirit of Nan et al. (2006), based on a piecewise constant cross-ratio model between gap times. The parameters of the latter model will be estimated by inverting the estimator of Kendall’s tau derived in the first part of the paper.
The remainder of the paper is organized as follows. In the next subsection we discuss a motivating study of the reproductive life of women in a large cohort study (Lisabeth et al., 2004; Nan et al., 2006). In Section 2 we define Kendall’s τ and propose a nonparametric estimate of it using censored serial gap time data. An estimate of the joint distribution of serial gap times based on an assumed Clayton copula function is described in Section 3, and these estimates are assessed through simulation studies; estimates for other copula functions are also available. This method is generalized in Section 4 to accommodate piecewise constant cross-ratios, as in Manatunga and Oakes (1996) and Nan et al. (2006). An application to the data from the Tremin Trust study illustrates the method and facilitates comparisons with the findings of Nan et al. (2006). Section 5 contains general remarks and topics for further research.
1.2 Association between a Marker and Menopause in a Study of Reproductive Health
Lisabeth et al. (2004) report on a large cohort study involving 1997 women recruited as students at the University of Minnesota between 1935 and 1939. We consider the data used by Nan et al. (2006), who analysed follow-up data from 562 women who were less than 25 years of age at the time of recruitment, who provided data on the age of menarche, and who were on study at 35 years of age. Figure 1 displays the multistate diagram for this process.
Figure 2 gives a timeline diagram of a reproductive life cycle, where T1 = X represents the time of the first cycle of at least 45 days and T2 is the time of menopause; Y = T2 − X is the time between the first 45-day cycle and menopause. Thus X and Y represent the sojourn times in states 1 and 2 of Figure 1. We note that it is biologically possible for a woman to experience menopause without experiencing the 45-day cycle marker, but the probability of this is quite low; in the data set of interest, only 1 of the 193 women experienced menopause without prior experience of the 45-day cycle marker. Nan et al. (2006) considered an analysis of the association between T1 and T2 based on a piecewise constant Clayton copula, assuming a constant Kendall’s τ within each of the intervals 35–39, 40–45, 46–49 and 50+, in the spirit of Manatunga and Oakes (1996). Based on the resulting estimates, Nan et al. (2006) obtained estimates of the conditional distribution of the time to menopause given age at the time of the marker event.
While copula models accommodate association between the event times, the constraint T1 ≤ T2 is not incorporated into such a model, so it does not provide an ideal representation of the process. A joint analysis of X and Y is appealing in the sense that there are no constraints on either X or Y. A naive marginal analysis of Y is not valid, however: if C is the censoring time, defined as the age at last contact, then Y is censored at C − X. As a result, if X and Y are associated, Y is subject to dependent censoring in marginal analyses. We deal with this problem in what follows.
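The induced dependence can be seen numerically. The sketch below is our own illustration, not part of the original analysis: it draws (X, Y) from a Clayton copula with cross-ratio θ = 3 (so τ = 0.5) by the Marshall–Olkin gamma-frailty construction, uses exponential margins with means 1 and 1/2 (borrowed from the simulation design of Section 3.2), and an illustrative independent censoring time C ~ Uniform(0, 5). The helper name `clayton_gap_times`, the seed and all numerical settings are assumptions.

```python
import numpy as np

def clayton_gap_times(n, theta, rng):
    """Draw n gap-time pairs (X, Y) whose survival functions are linked by a
    Clayton copula with cross-ratio theta > 1 (Marshall-Olkin gamma frailty).
    Margins (illustrative assumption): X ~ Exp(mean 1), Y ~ Exp(mean 1/2)."""
    alpha = theta - 1.0                                   # Clayton parameter
    frailty = rng.gamma(shape=1.0 / alpha, scale=1.0, size=n)
    e = rng.exponential(scale=1.0, size=(2, n))
    u = (1.0 + e / frailty) ** (-1.0 / alpha)             # u[0] = S_X(X), u[1] = S_Y(Y)
    x = -np.log(u[0])                                     # inverts S_X(x) = exp(-x)
    y = -0.5 * np.log(u[1])                               # inverts S_Y(y) = exp(-2y)
    return x, y

rng = np.random.default_rng(2008)
n = 20_000
x, y = clayton_gap_times(n, theta=3.0, rng=rng)   # tau = (3 - 1) / (3 + 1) = 0.5
c = rng.uniform(0.0, 5.0, size=n)                 # independent censoring time C
c_prime = np.maximum(0.0, c - x)                  # censoring time C' acting on Y

r_c = np.corrcoef(c, y)[0, 1]              # near 0: C is independent of Y
r_c_prime = np.corrcoef(c_prime, y)[0, 1]  # negative: C' depends on Y through X
print(f"corr(C, Y)  = {r_c:+.3f}")
print(f"corr(C', Y) = {r_c_prime:+.3f}")
```

Because large X tends to accompany large Y under this copula, and C′ decreases in X, the censoring time of Y carries information about Y itself.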
Insert Figures 1 to 3 about here
2. Nonparametric Estimation of Kendall’s τ
2.1 Estimation with Parallel Survival Times
Kendall’s tau is among the most popular measures of association between two time-to-event random variables. Suppose (X, Y) is a pair of event time random variables with joint survivor function π(x, y) = P(X > x, Y > y). If (X1, Y1) and (X2, Y2) are independently sampled from this distribution, (X1, Y1) and (X2, Y2) are said to be concordant if X1 > X2 and Y1 > Y2, or X1 < X2 and Y1 < Y2 (i.e. the marginal rankings of individuals with respect to X and Y agree); the pairs (X1, Y1) and (X2, Y2) are otherwise discordant. Kendall’s τ is defined as the probability of concordance of two such pairs minus the probability of discordance:
$$\tau = \Pr\{(X_1 - X_2)(Y_1 - Y_2) > 0\} - \Pr\{(X_1 - X_2)(Y_1 - Y_2) < 0\}.$$
An appealing feature is that this measure does not depend on the marginal distributions of X and Y and is equal to zero under independence; for continuous variables it can be re-expressed as 2 × Pr{(X1 − X2)(Y1 − Y2) > 0} − 1. Moreover, if a12 = 2 × I(X1 > X2) − 1 and b12 = 2 × I(Y1 > Y2) − 1, then τ = E(a12 b12), which can be written
$$\tau = 4\int_0^\infty\!\!\int_0^\infty \pi(x,y)\,\frac{\partial^2 \pi(x,y)}{\partial x\,\partial y}\,dx\,dy - 1. \qquad (2.1)$$
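Because ∂²π(x, y)/∂x∂y is the joint density of (X, Y), (2.1) states that τ = 4E{π(X, Y)} − 1. The Monte Carlo check below illustrates this identity; it assumes, purely for illustration, the Clayton form (3.1) of Section 3 with cross-ratio θ = 3, for which τ = (θ − 1)/(θ + 1) = 0.5. Since τ does not depend on the margins, the sketch works directly with (U, V) = (SX(X), SY(Y)), for which π(X, Y) equals the copula evaluated at (U, V). Function names, sample size and seed are our own choices.

```python
import numpy as np

def clayton_copula(u, v, theta):
    """Clayton copula in the form of (3.1): pi(x, y) = C(S_X(x), S_Y(y)), theta > 1."""
    a = theta - 1.0
    return (u ** -a + v ** -a - 1.0) ** (-1.0 / a)

def sample_clayton(n, theta, rng):
    """Marshall-Olkin gamma-frailty sampler for (U, V) = (S_X(X), S_Y(Y))."""
    a = theta - 1.0
    frailty = rng.gamma(shape=1.0 / a, scale=1.0, size=n)
    e = rng.exponential(scale=1.0, size=(2, n))
    return (1.0 + e / frailty) ** (-1.0 / a)

rng = np.random.default_rng(1)
theta = 3.0
u, v = sample_clayton(200_000, theta, rng)
tau_mc = 4.0 * np.mean(clayton_copula(u, v, theta)) - 1.0   # Monte Carlo version of (2.1)
tau_exact = (theta - 1.0) / (theta + 1.0)
print(f"Monte Carlo tau = {tau_mc:.4f}, closed form = {tau_exact:.4f}")
```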
Kendall and Gibbons (1990) estimated τ from uncensored bivariate data {(Xi, Yi), i = 1, . . . , n} by its empirical version
$$\hat\tau = \binom{n}{2}^{-1}\sum_{i<j}\psi_{ij},$$
where ψij = aij bij is the concordance/discordance status, equal to 1 if the pair (i, j) is concordant and −1 otherwise.
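A minimal pure-Python sketch of τ̂ follows (function names are ours). It assumes continuous data without ties, so that the sign of Xi − Xj coincides with aij = 2I(Xi > Xj) − 1; its cost is O(n²).

```python
from itertools import combinations
from math import comb

def sign(z):
    """Return +1, -1 or 0 according to the sign of z."""
    return int(z > 0) - int(z < 0)

def kendall_tau(x, y):
    """Empirical Kendall's tau: average of psi_ij = a_ij * b_ij over all n(n-1)/2 pairs."""
    n = len(x)
    total = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
                for i, j in combinations(range(n), 2))
    return total / comb(n, 2)

print(kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]))   # 1.0: all pairs concordant
print(kendall_tau([1, 2, 3], [1, 3, 2]))             # 1/3: two concordant pairs, one discordant
```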
In the presence of censoring, it may not be possible to compute ψij for some pairs of points, which makes estimation of τ more difficult; such pairs are called non-orderable, and pairs that can be ordered are called orderable. With censored data, one observes (X̃, Ỹ, δX, δY), where X̃ = min(X, CX), Ỹ = min(Y, CY), δX = I(X < CX), δY = I(Y < CY), and (CX, CY) are the censoring variables. Oakes (1982) showed that the pair (i, j) is orderable if
$$\tilde X_{ij} < \tilde C_X^{ij} \quad\text{and}\quad \tilde Y_{ij} < \tilde C_Y^{ij},$$
where
$$\tilde X_{ij} = \min(X_i, X_j),\quad \tilde Y_{ij} = \min(Y_i, Y_j),\quad \tilde C_X^{ij} = \min(C_X^i, C_X^j),\quad \tilde C_Y^{ij} = \min(C_Y^i, C_Y^j).$$
If the pair (i, j) is orderable, let Lij = 1; otherwise Lij = 0 (Oakes, 1982). The estimator
$$\hat\tau_O = \binom{n}{2}^{-1}\sum_{i<j} L_{ij}\psi_{ij}$$
is biased when τ is non-zero; nevertheless, it is widely used to test the independence of a pair of random variables based on censored data.
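The orderability indicator can be evaluated from the observed data alone: for continuous data, X̃ij < C̃X^ij holds exactly when the smaller of the two observed X values is uncensored, and likewise for Y. A minimal sketch of τ̂O along these lines (helper names are ours; treating ties as non-orderable is an assumption):

```python
from itertools import combinations
from math import comb

def sign(z):
    return int(z > 0) - int(z < 0)

def orderable_margin(t_i, d_i, t_j, d_j):
    """True if the pair can be ordered on one margin: the smaller observed
    time must be an event (delta = 1). Ties are treated as non-orderable."""
    if t_i < t_j:
        return d_i == 1
    if t_j < t_i:
        return d_j == 1
    return False

def tau_oakes(x, y, dx, dy):
    """Oakes' estimator: sum of L_ij * psi_ij over all pairs, divided by n choose 2."""
    n = len(x)
    total = 0
    for i, j in combinations(range(n), 2):
        L = (orderable_margin(x[i], dx[i], x[j], dx[j])
             and orderable_margin(y[i], dy[i], y[j], dy[j]))
        if L:
            # for orderable pairs the observed order equals the true order
            total += sign(x[i] - x[j]) * sign(y[i] - y[j])
    return total / comb(n, 2)
```

Without censoring, `tau_oakes` reduces to the empirical τ̂. With censoring, non-orderable pairs contribute zero to the sum while still counting in the divisor n(n − 1)/2.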
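Recently, Lakhal et al. (2008b) proposed a method that greatly reduces the bias of τ̂O by incorporating inverse probability of censoring weights. Let p̂ij be an estimator of the probability that the pair (i, j) is orderable,
$$p_{ij} = \left[\Pr\{C_X > \tilde X_{ij},\ C_Y > \tilde Y_{ij} \mid \tilde X_{ij}, \tilde Y_{ij}\}\right]^2,$$
where the square reflects the fact that the censoring pairs of both subjects i and j must exceed (X̃ij, Ỹij), the censoring vectors being independent and identically distributed across subjects. The weighted estimate is then given by
$$\hat\tau_{mo} = \binom{n}{2}^{-1}\sum_{i<j}\frac{L_{ij}\psi_{ij}}{\hat p_{ij}}. \qquad (2.2)$$
This estimator is consistent and asymptotically normally distributed under regularity conditions, and it has been shown empirically to perform well in finite samples compared with existing competitors.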
2.2 Estimation With Serial Gap Times and Dependent Censoring
In this section, we adapt the estimator given by (2.2) to estimate Kendall’s tau between serial gap times. Let T1 = X denote the time of the intermediate event and T2 = X + Y the time of the terminal event. We assume that C is an independent censoring variable (independent of X and Y) that defines the region over which T1 and T2 are observable, and we denote its survivor function by G(·). One may then observe only X̃ = T̃1, Ỹ = T̃2 − T̃1, δX = I(T1 < C) and δY = I(T2 < C), where T̃1 = min(T1, C) and T̃2 = min(T2, C). Note that if T1 is censored, T2 is also censored and Ỹ = 0. Thus Y is censored by C′ = max(0, C − X). Unless X and Y are independent, C′ is associated with Y.
Under these conditions, Lin et al. (1999), among others, derived a nonparametric estimator of π(x, y) for each (x, y) such that x + y ≤ C̄, where C̄ > 0 satisfies G(C̄) > 0; this estimator may be expressed as
$$\hat\pi(x,y) = \frac{1}{n}\sum_{i=1}^{n}\frac{I(\tilde X_i > x,\ \tilde Y_i > y)}{\hat G(\tilde X_i + y)}, \qquad (2.3)$$
where Ĝ(·) is the Kaplan–Meier estimator of G(·) based on {(X̃k + Ỹk, 1 − δYk), k = 1, . . . , n}. Plugging the latter estimator into (2.1) yields an estimator of τ, which we denote by τ̂W. Here we propose an alternative estimator of τ and compare the two estimators.
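A minimal NumPy sketch of (2.3) follows. The Kaplan–Meier step uses Ĝ(t) = ∏ (1 − ds/Rs), taken over distinct censoring times s ≤ t, where ds is the number of censored terminal times equal to s and Rs the number of subjects with X̃k + Ỹk ≥ s. The function names and this right-continuous convention are our assumptions rather than details given in the text.

```python
import numpy as np

def km_censoring_survival(t2_obs, delta_y):
    """Kaplan-Meier estimate of G(t) = P(C > t) from {(X~_k + Y~_k, 1 - delta_Y_k)}.
    Censored terminal times play the role of 'events' for C."""
    t = np.asarray(t2_obs, dtype=float)
    is_cens = np.asarray(delta_y) == 0
    times = np.unique(t[is_cens])              # distinct censoring times, sorted
    surv, g = [], 1.0
    for s in times:
        at_risk = np.sum(t >= s)
        d = np.sum((t == s) & is_cens)
        g *= 1.0 - d / at_risk
        surv.append(g)
    padded = np.concatenate(([1.0], np.asarray(surv, dtype=float)))  # G = 1 before first censoring

    def G(q):
        """Right-continuous step function evaluated at scalar or array q."""
        idx = np.searchsorted(times, np.asarray(q, dtype=float), side="right")
        return padded[idx]                     # idx = number of censoring times <= q
    return G

def pi_hat(x, y, x_obs, y_obs, G):
    """IPCW estimator (2.3) of pi(x, y) = P(X > x, Y > y), intended for x + y <= C-bar."""
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    keep = (x_obs > x) & (y_obs > y)
    return np.sum(1.0 / G(x_obs[keep] + y)) / len(x_obs)
```

For a kept subject, X̃i + y < X̃i + Ỹi, which lies below the largest observed terminal time, so the Kaplan–Meier weight in the denominator stays positive.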
For serial events, the orderability condition Lij is expressed as {X̃ij < C̃ij; Ỹij < min(C′i, C′j)}, with C̃ij = min(Ci, Cj); a point with δX = 0 cannot be ordered with any other point. Indeed, such points contain no information on the association between X and Y, since the second gap time is not observed, and their contribution to the estimator of τ is restricted to the computation of the weights p̂ij. This is in agreement with Figure 4.
Insert Figure 4 about here
By continuity, an orderable pair satisfies Pr(Ỹij > 0 | Lij = 1) = 1, so that min(C′i, C′j) > 0, i.e. Xi < Ci and Xj < Cj. Hence
$$\delta_{X_i} = 1 \quad\text{and}\quad \delta_{X_j} = 1, \qquad (2.4)$$
$$\tilde X_{ij} < \tilde C_{ij}. \qquad (2.5)$$
By (2.5), Lij reduces to {Ỹij < min(Ci − Xi, Cj − Xj)}, which can be rewritten as
$$\{C_i > X_i + \tilde Y_{ij};\ C_j > X_j + \tilde Y_{ij}\}. \qquad (2.6)$$
Observe that for any orderable pair (i, j), the triplet {Xi, Xj, Ỹij} is observed. Since Ci and Cj are independent of each other and of the event times, the conditional probability of being orderable factorizes as
$$\begin{aligned}
p_{ij} &= \Pr\{C_i > X_i + \tilde Y_{ij};\ C_j > X_j + \tilde Y_{ij} \mid X_i, X_j, \tilde Y_{ij}\}\\
&= \Pr\{C_i > X_i + \tilde Y_{ij} \mid X_i, X_j, \tilde Y_{ij}\} \times \Pr\{C_j > X_j + \tilde Y_{ij} \mid X_i, X_j, \tilde Y_{ij}\}\\
&= G(X_i + \tilde Y_{ij})\, G(X_j + \tilde Y_{ij}).
\end{aligned}$$
This probability is estimated by
$$\hat p_{ij} = \hat G(X_i + \tilde Y_{ij}) \times \hat G(X_j + \tilde Y_{ij}). \qquad (2.7)$$
Kendall’s tau is then estimated by (2.2), with p̂ij given by (2.7). As shown in the appendices, the resulting estimator is consistent and asymptotically normally distributed as long as there exist constants C̄ and ε satisfying G(C̄) > ε > 0 such that Xi + Yi < C̄ for i = 1, . . . , n.
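The estimator of this section can be sketched as follows. This is our own NumPy illustration: the function names, the tie handling and the simulated example (which mimics the design of Section 3.2 with an assumed censoring time C ~ Uniform(0, 8)) are assumptions. For continuous data, a pair satisfies (2.4)–(2.6) exactly when both first gap times are observed and the smaller of the two observed second gap times is uncensored; in that case the smaller observed value equals Ỹij.

```python
import numpy as np
from math import comb

def km_G(t2_obs, delta_y):
    """Kaplan-Meier estimate of the censoring survivor function G(t) = P(C > t),
    treating censored terminal times (delta_Y = 0) as events; scalar step function."""
    t = np.asarray(t2_obs, dtype=float)
    is_cens = np.asarray(delta_y) == 0
    times = np.unique(t[is_cens])
    surv, g = [1.0], 1.0
    for s in times:
        g *= 1.0 - np.sum((t == s) & is_cens) / np.sum(t >= s)
        surv.append(g)
    return lambda q: surv[np.searchsorted(times, q, side="right")]

def sgn(z):
    return int(z > 0) - int(z < 0)

def tau_gap_ipcw(x_obs, y_obs, dx, dy):
    """IPCW Kendall's tau (2.2) for serial gap times, with weights (2.7).
    x_obs = T1~, y_obs = T2~ - T1~, dx = I(T1 < C), dy = I(T2 < C)."""
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    n = len(x_obs)
    G = km_G(x_obs + y_obs, dy)                       # X~_k + Y~_k = T2~_k
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            if dx[i] != 1 or dx[j] != 1 or x_obs[i] == x_obs[j]:
                continue                              # (2.4): both X must be observed
            if y_obs[i] == y_obs[j]:
                continue                              # ties: non-orderable (assumption)
            k = i if y_obs[i] < y_obs[j] else j       # subject with the smaller Y~
            if dy[k] != 1:
                continue                              # smaller second gap must be uncensored
            y_min = y_obs[k]                          # equals Y~_ij = min(Y_i, Y_j)
            p_hat = G(x_obs[i] + y_min) * G(x_obs[j] + y_min)   # (2.7)
            if p_hat > 0:                             # guard against zero weights
                psi = sgn(x_obs[i] - x_obs[j]) * sgn(y_obs[i] - y_obs[j])
                total += psi / p_hat
    return total / comb(n, 2)

def simulate(n, theta, A, rng):
    """Clayton gap times (X ~ Exp(mean 1), Y ~ Exp(mean 1/2)) censored by C ~ U(0, A)."""
    a = theta - 1.0
    frailty = rng.gamma(shape=1.0 / a, scale=1.0, size=n)
    u = (1.0 + rng.exponential(scale=1.0, size=(2, n)) / frailty) ** (-1.0 / a)
    x, y = -np.log(u[0]), -0.5 * np.log(u[1])
    c = rng.uniform(0.0, A, size=n)
    t1, t2 = np.minimum(x, c), np.minimum(x + y, c)
    return t1, t2 - t1, (x < c).astype(int), (x + y < c).astype(int)

rng = np.random.default_rng(42)
x_obs, y_obs, dx, dy = simulate(300, theta=3.0, A=8.0, rng=rng)   # true tau = 0.5
print(f"IPCW estimate of tau: {tau_gap_ipcw(x_obs, y_obs, dx, dy):.3f}")
```

Without any censoring, Ĝ ≡ 1 and the sketch returns the ordinary empirical τ̂ of the (X, Y) pairs.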
3. A Clayton Copula Model for Serial Gap Times (X, Y)
Once the first event occurs, say at X = x, the conditional survival function SY(·|X = x) becomes of interest to practitioners. A convenient way to estimate this function is to assume a Clayton copula for the pair (X, Y). In this section, we investigate such a model, derive related inference procedures, and discuss further extensions.
3.1 Model and properties
Under a Clayton copula for (X, Y), the joint survival function is expressed as
$$\pi(x,y) = \left[S_X(x)^{-(\theta-1)} + S_Y(y)^{-(\theta-1)} - 1\right]^{-1/(\theta-1)}, \qquad (3.1)$$
where θ ∈ [0, ∞) is the cross-ratio of X and Y. In general the cross-ratio is the function
$$\theta(x,y) = \frac{\lambda_Y(y \mid X = x)}{\lambda_Y(y \mid X > x)}, \qquad (3.2)$$
where λY(y|·) = lim_{dy↓0} Pr(Y ≤ y + dy | Y > y; ·)/dy is the conditional hazard function of Y. Under this model the cross-ratio, which measures the level of dependence between X and Y, is constant, θ(x, y) ≡ θ, and is related to Kendall’s tau through
$$\tau = \frac{\theta - 1}{\theta + 1}. \qquad (3.3)$$
The fact that
$$S_Y(y \mid X = x) = [S_Y(y \mid X > x)]^{\theta} \qquad (3.4)$$
means that the conditional distribution of Y given X = x can be estimated from SY(y|X > x) and θ. Lin et al. (1999) note that SY(y|X > x) can be estimated by π̂(x, y)/π̂(x, 0) for x + y < C̄, where π̂(·, ·) is given by (2.3). We can also estimate θ from (3.3) by θ̂ = (1 + τ̂)/(1 − τ̂), where τ̂ is the nonparametric estimator of τ derived in Section 2.2. A natural estimator of SY(y|X = x) is then obtained by plugging these estimators into (3.4).
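Relations (3.3) and (3.4) translate directly into code. In the sketch below the function names are ours, and the inputs τ̂ = 0.5 and π̂(x, y)/π̂(x, 0) = 0.6 are hypothetical numbers chosen only to show the arithmetic.

```python
def theta_from_tau(tau):
    """Invert (3.3), tau = (theta - 1) / (theta + 1), for -1 < tau < 1."""
    return (1.0 + tau) / (1.0 - tau)

def tau_from_theta(theta):
    """Relation (3.3) between the Clayton cross-ratio and Kendall's tau."""
    return (theta - 1.0) / (theta + 1.0)

def cond_survival(s_y_given_x_greater, theta):
    """Relation (3.4): S_Y(y | X = x) = [S_Y(y | X > x)] ** theta."""
    return s_y_given_x_greater ** theta

tau_hat = 0.5                                   # hypothetical estimate of tau
theta_hat = theta_from_tau(tau_hat)             # (1 + 0.5) / (1 - 0.5) = 3.0
ratio = 0.6                                     # hypothetical pi_hat(x, y) / pi_hat(x, 0)
print(theta_hat, cond_survival(ratio, theta_hat))   # 3.0 and 0.6 ** 3 (about 0.216)
```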
3.2 Numerical investigations with the Clayton Copula
Colon cancer data: Moertel et al. (1990) discuss a clinical trial in which patients treated for colon cancer were randomized to two groups, therapy and placebo, comprising 304 and 315 patients, respectively. Patients are potentially subject to two ordered events, and the serial gap times in this example are the time from randomization to cancer recurrence (X) and the time from cancer recurrence to death (Y). At the end of the study, 108 of the 119 patients who had cancer recurrence in the therapy group had died, as had 155 of the 177 patients who had cancer recurrence in the placebo group. We computed τ̂ and τ̂W for both groups. We found τ̂ = 0.2725 (s.e. 0.058) and τ̂W = −0.796 (s.e. 0.613) for the therapy group, and τ̂ = 0.2685 (s.e. 0.054) and τ̂W = 0.012 (s.e. 0.779) for the placebo group. Our estimator τ̂ detects a significant positive dependence between X and Y in both groups, as conjectured by Lin et al. (1999). Furthermore, τ̂ suggests that the magnitude of this dependence is not affected by the therapy. In contrast, the variance of τ̂W is too large to support inference. A convenient way to illustrate this dependence is to investigate the conditional survival function SY(·|X = x), which we estimate under a Clayton copula for the pair (X, Y). He and Lawless (2003) tested the adequacy of this model and did not reject it for either group. In Figure 5, we report the median of ŜY(·|X = x) versus x for both groups.
Insert Figure 5 about here
Figure 5 suggests that therapy decreases survival time following cancer recurrence. This is
in agreement with the conclusions of Lin et al. (1999) and He and Lawless (2003).
Simulations: A first set of simulations was conducted to assess and compare the finite-sample performance of τ̂, τ̂W and the resulting estimators of SY(y|X = x). Four true values of τ (0.2, 0.4, 0.6 and 0.8) were used to generate samples of 400 correlated pairs (X, Y) from a Clayton copula with exponential marginals with means 1 and 1/2, respectively. The censoring variable C was generated from a uniform distribution over [0, A], with A controlling the censoring fraction cf1 = P(X > C). Two values (0.1 and 0.2) were used for this censoring fraction; they correspond approximately to cf2 = P(X + Y > C) equal to 0.20 and 0.30. Means and mean square errors of τ̂, τ̂W and the resulting ŜY(yi|X = x0) (i = 1, 2, 3, 4), computed over 1000 replications, are reported in Table 1. The points x0, y1, y2, y3 and y4 are chosen such that SX(x0) = 1/2 and SY(yi|X = x0) = i/5.
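One replicate of this design can be sketched as below, as our own illustration. The frailty construction of the Clayton copula, the seed and the value A = 5 are assumptions (A = 5 gives cf1 ≈ 0.2 for an exponential X with mean 1); the sketch also checks that cf2 ≥ cf1, which must hold because Y ≥ 0.

```python
import numpy as np

def simulate_design(n, tau, A, rng):
    """One replicate: Clayton (X, Y) with Exp(mean 1) and Exp(mean 1/2) margins,
    plus an independent censoring time C ~ Uniform(0, A)."""
    theta = (1.0 + tau) / (1.0 - tau)        # cross-ratio from (3.3)
    a = theta - 1.0                          # frailty construction needs tau > 0
    frailty = rng.gamma(shape=1.0 / a, scale=1.0, size=n)
    u = (1.0 + rng.exponential(scale=1.0, size=(2, n)) / frailty) ** (-1.0 / a)
    x = -np.log(u[0])                        # S_X(x) = exp(-x)
    y = -0.5 * np.log(u[1])                  # S_Y(y) = exp(-2y)
    c = rng.uniform(0.0, A, size=n)
    return x, y, c

def kendall_tau(x, y):
    """Uncensored Kendall's tau, O(n^2) but vectorized one row at a time."""
    n = len(x)
    s = sum(np.sum(np.sign(x[i + 1:] - x[i]) * np.sign(y[i + 1:] - y[i]))
            for i in range(n - 1))
    return s / (n * (n - 1) / 2)

rng = np.random.default_rng(0)
x, y, c = simulate_design(400, tau=0.4, A=5.0, rng=rng)
cf1 = np.mean(x > c)          # P(X > C)
cf2 = np.mean(x + y > c)      # P(X + Y > C)
print(f"sample tau = {kendall_tau(x, y):.3f}, cf1 = {cf1:.3f}, cf2 = {cf2:.3f}")
```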
Insert Table 1 about here
As expected, the censoring fractions affect all estimators. Table 1 shows that τ̂ outperforms τ̂W under all simulation conditions, as attested by their respective mean square errors. The proposed estimator τ̂ is virtually unbiased, except under the most severe conditions (cf1 = 0.2, cf2 = 0.3 and τ ≥ 0.6). On the other hand, the bias of τ̂W is non-negligible, even under light censoring and small values of τ. The same conclusions hold for the estimators ŜY(yi|X = x0) based on τ̂ and τ̂W, respectively.
Table 1 suggests that the variance of ŜY(y|X = x0) is smaller for large values of y. This result is not standard in survival analysis, but it is in accordance with the simulation results reported in Table 1 of Lin et al. (1999).
4. A Piecewise Clayton Copula
A constant cross-ratio, and hence the Clayton copula, may not be appropriate for some applications. Nan et al. (2006) discuss a copula model in which the cross-ratio depends on one of the time-to-event variables, say X, rather than being constant. They assumed a partition 0 = w0 < w1 < · · · < wK of the support of X such that the cross-ratio is constant, and equal to θk, within each strip [wk−1, wk) × [0, MY]. Under such conditions, (3.4) becomes
$$S_Y(y \mid X = x) = [S_Y(y \mid X > x)]^{\theta_k} \quad \text{for } x \in [w_{k-1}, w_k). \qquad (4.1)$$
The resulting model is referred to as the piecewise Clayton copula in what follows. Kendall’s tau τk restricted to (X, Y) ∈ Ak = [wk−1, wk) × [0, MY] is related to θk, but this relationship is not as simple as (3.3), except for k = K. It depends on SX(wk−1) and SX(wk) and is difficult to obtain analytically. Denote it by
$$\tau_k = \xi\{\theta_k;\ S_X(w_{k-1});\ S_X(w_k)\}. \qquad (4.2)$$
Nevertheless, numerical values of ξ can easily be obtained by simulation. In Figure 6, these are reported with SX(wk−1) = 0.75 and SX(wk) = 0.5.
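One possible way to approximate ξ by simulation is sketched below. This is our own construction; the text does not specify the algorithm. We draw (U, V) = (SX(X), SY(Y)) from a Clayton copula with cross-ratio θk, keep the draws with SX(wk) < U ≤ SX(wk−1), i.e. X ∈ [wk−1, wk), and compute Kendall’s tau among them. Conditional inversion is used so that θk < 1 (negative association) is allowed. Reusing the same random numbers for every θ keeps ξ nearly monotone, so a simple bisection can invert it, as needed for the inversion of ξ proposed in this section. Sample size, seed and the search interval [0.05, 20] for θ are assumptions.

```python
import numpy as np

def clayton_uv(theta, n=3000, seed=0):
    """(U, V) from a Clayton copula with cross-ratio theta > 0 by conditional
    inversion; alpha = theta - 1 may be negative (theta < 1)."""
    rng = np.random.default_rng(seed)   # common random numbers across theta
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    a = theta - 1.0
    if abs(a) < 1e-8:
        return u, w                      # theta = 1: independence
    v = (1.0 + u ** (-a) * (w ** (-a / (1.0 + a)) - 1.0)) ** (-1.0 / a)
    return u, v

def kendall_tau(x, y):
    n = len(x)
    s = sum(np.sum(np.sign(x[i + 1:] - x[i]) * np.sign(y[i + 1:] - y[i]))
            for i in range(n - 1))
    return s / (n * (n - 1) / 2)

def xi(theta, s_start, s_end):
    """Approximate (4.2): Kendall's tau of (X, Y) restricted to the strip
    S_X(w_k) < U <= S_X(w_{k-1}), with s_start = S_X(w_{k-1}), s_end = S_X(w_k)."""
    u, v = clayton_uv(theta)
    keep = (u <= s_start) & (u > s_end)
    # S_X and S_Y are decreasing, so (U, V) and (X, Y) give the same tau
    return kendall_tau(u[keep], v[keep])

def xi_inverse(tau_k, s_start, s_end, lo=0.05, hi=20.0, iters=25):
    """Bisection for theta with xi(theta) = tau_k, assuming xi increases in theta."""
    f_lo = xi(lo, s_start, s_end) - tau_k
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = xi(mid, s_start, s_end) - tau_k
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(xi(3.0, 0.75, 0.5))   # the strip used for Figure 6, with theta_k = 3
```

For the last strip (SX(wK) = 0) the restricted copula is again Clayton, so ξ returns approximately (θ − 1)/(θ + 1), in line with the remark above that (3.3) still holds for k = K.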
Insert Figure 6 about here
Nan et al. (2006) assumed a piecewise Clayton copula for (T1, T2) and estimated the model parameters {θk, k = 1, . . . , K} by adapting the methodology of Shih and Louis (1995). This approach ignores the ordered nature of (T1, T2); in particular, the resulting estimator of Pr(T1 > T2) is not zero, although it should be. A more appealing approach may be to assume a piecewise Clayton copula for (X, Y), where no order restrictions are present, rather than for (T1, T2). Moreover, if the presence of a cycle of 45 days or more signals early changes prior to the onset of menopause, Y may well be a more natural object of interest for some scientific questions.
We propose to estimate the model parameters {θk, k = 1, · · · , K} by inversion of ξ. We begin by estimating τk via a slight modification of the estimator derived in Section 2.2. Note that for any orderable pair (i, j), both Xi and Xj are observed, and thus censoring does not prevent one from knowing whether these points belong to a given strip Ak. Denote this event by νij(k) = I{wk−1 ≤ X̃i < wk, wk−1 ≤ X̃j < wk}. A natural estimator of τk is given by
$$\hat\tau_k = \sum_{i<j}\frac{L_{ij}\psi_{ij}\nu_{ij}(k)}{\hat p_{ij}} \Big/ \sum_{i<j}\frac{L_{ij}\nu_{ij}(k)}{\hat p_{ij}}. \qquad (4.3)$$
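Given the orderable pairs and their weights p̂ij from Section 2.2, (4.3) is a weighted average restricted to one strip: the denominator plays the role of the inverse-probability-weighted number of pairs with both first gap times in Ak. A minimal sketch, with a hypothetical input format (one tuple (Xi, Xj, ψij, p̂ij) per orderable pair) and made-up toy numbers:

```python
def tau_strip(pairs, w_lo, w_hi):
    """Estimator (4.3) of tau_k for the strip [w_lo, w_hi).
    pairs: iterable of (x_i, x_j, psi_ij, p_hat_ij) for orderable pairs (L_ij = 1)."""
    num = den = 0.0
    for x_i, x_j, psi, p_hat in pairs:
        if w_lo <= x_i < w_hi and w_lo <= x_j < w_hi:   # nu_ij(k) = 1
            num += psi / p_hat
            den += 1.0 / p_hat
    return num / den if den > 0 else float("nan")     # undefined if no pair falls in A_k

toy_pairs = [(36.0, 38.0, 1, 0.5), (37.0, 39.0, -1, 1.0),
             (41.0, 43.0, 1, 0.8), (36.0, 42.0, -1, 0.9)]
print(tau_strip(toy_pairs, 35.0, 40.0))   # (1/0.5 - 1/1.0) / (1/0.5 + 1/1.0) = 1/3
```

The fourth toy pair straddles two strips and therefore enters neither strip-specific estimate.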
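For k = 1, · · · , K, θk is then estimated by
$$\hat\theta_k = \xi^{-1}\{\hat\tau_k;\ \hat S_X(w_{k-1});\ \hat S_X(w_k)\}, \qquad (4.4)$$
where ξ−1 denotes the inverse of ξ with respect to its first argument. The resulting θ̂k, along with (4.1), yields an estimator of SY(y|X = x).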
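Once the θ̂k are available, (4.1) only requires locating the strip that contains x. A short sketch follows; the helper is our own, the cut points 40, 46 and 50 mimic the age intervals 35–39, 40–45, 46–49 and 50+ used in Section 4.1 (treating age as continuous, an assumption), and the θ values are hypothetical.

```python
from bisect import bisect_right

def cond_survival_piecewise(s_y_given_x_greater, x, cut_points, thetas):
    """(4.1): S_Y(y | X = x) = [S_Y(y | X > x)] ** theta_k for x in [w_{k-1}, w_k).
    cut_points = [w_1, ..., w_{K-1}] (sorted), thetas = [theta_1, ..., theta_K]."""
    k = bisect_right(cut_points, x)          # 0-based index of the strip containing x
    return s_y_given_x_greater ** thetas[k]

cut_points = [40.0, 46.0, 50.0]              # strips 35-39, 40-45, 46-49, 50+
thetas = [1.0, 2.0, 1.5, 1.2]                # hypothetical theta_k values
print(cond_survival_piecewise(0.5, 42.0, cut_points, thetas))   # 0.5 ** 2 = 0.25
```

Evaluating SY(y|X > x) at y = t2 − t1 and x = t1, the same function gives Pr(T2 > t2 | T1 = t1), the quantity plotted in Section 4.1.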
4.1 Analysis of the Tremin Trust Data
We consider the Tremin Trust data. The purpose of this follow-up study is to quantify the association between several bleeding markers, such as the 45-day cycle, and menopause. At the end of the study, 193 of the 357 women who experienced the 45-day marker had experienced menopause. We consider the same intervals for T1 = X as Nan et al. (2006), namely 35–39, 40–45, 46–49 and 50+, within which the cross-ratio between X and Y is assumed constant. These boundaries satisfy ŜX(wk) = 0.8383, 0.562 and 0.281 for k = 1, 2, 3. We estimated τk, k = 1, 2, 3, 4, using equation (4.3). Inverting these estimates according to (4.4) yields estimates of the θk. The results are presented in Table 2.
Insert Table 2 about here
Thus, there is a significant association between the 45-day cycle and menopause when the marker occurs between ages 40 and 49. This is in agreement with the results of Nan et al. (2006), who also detected a significant association between T1 and T2 in the same region. Nan et al. (2006) presented plots of Pr(T2 > t2|T1 = t1) versus t2 for different values of t1. This probability is equal to SY(t2 − t1|X = t1) and thus can be estimated under the assumed model. The results are presented in Figure 7 for t1 ∈ {36, 39, 42, 45, 48, 51}.
Insert Figure 7 about here
This figure agrees with the one produced by Nan et al. (2006) for women who experience the 45-day cycle marker after age 40. Nevertheless, we obtain different results for women who experience the 45-day cycle before age 40. In particular, we estimate that such a woman has about a 20% chance of reaching menopause before age 45 (the X = 36 and X = 39 curves), whereas Nan et al. (2006) estimate this probability to be approximately zero. This may suggest a lack of fit of one of the models, and thus the need for appropriate goodness-of-fit tests for such copula models; this is beyond the scope of this paper.
4.2 Numerical Investigations
A second set of simulations was conducted to assess the performance of the estimator of τk given by (4.3). We simulated 1000 samples of 562 bivariate observations according to a piecewise Clayton copula with parameters corresponding to
τk ∈ {−0.049, −0.374, −0.235, −0.139}.
The boundaries wk, k = 1, 2, 3, are chosen such that SX(wk) equals 0.8383, 0.562 and 0.281, respectively. The average values of τ̂k were
{−0.049 (s.e. 0.10), −0.373 (s.e. 0.066), −0.229 (s.e. 0.054), −0.136 (s.e. 0.053)}.
These simulations show that τ̂k is virtually unbiased. Furthermore, the jackknife variance estimates used with the real data set seem to overestimate the true variability.
5. Discussion
In this paper, we consider nonparametric estimation of the association between successive gap times (X, Y) when the second gap time is subject to dependent censoring induced by the association itself and by an independent right-censoring time for the process. Inverse probability weights are used to obtain a nonparametric estimate of Kendall’s tau, which can be used to test the null hypothesis of independence between gap times. This can be done either by providing an estimator of the variance of τ̂ under independence or by resampling techniques. The proposed estimator of Kendall’s τ can then be used to make inferences about the conditional survivor function SY(y|X = x) under a Clayton copula. While this is possible via (3.4) for the Clayton copula, this particular relation does not hold for other copula families. Alternative ways of estimating SY(y|X = x) under arbitrary copula functions warrant investigation, as the Clayton copula may not provide a suitable fit for a given data set.
Recurrent event processes yield successive gap times, and using generalizations of the estimator of Kendall’s tau to make inferences about gap time models for recurrent event processes with copula formulations is another area warranting development. Such models would allow the estimation of
$$\Pr(T_k > t_k \mid T_1 = t_1, T_2 = t_2, \ldots, T_{k-1} = t_{k-1}).$$
We employed a piecewise Clayton copula formulation in Section 4. Such models have
received relatively little attention in the literature and methods to assist in the specification
of regions with a constant cross-ratio would be helpful; at present they are based on ad hoc
graphical methods. Derivation of formal goodness-of-fit tests for these models and objective
procedures for specifying these regions would increase their practical utility.
In many settings event times are not observed precisely, but individuals are only assessed
at periodic inspection times creating interval censored data on gap times. In the context of
a three-state progressive model, the intermediate event may be subject to interval-censoring
and the terminal event right-censoring, or both events may be interval censored. In this
case nonparametric estimation of the association between gap times is considerably more
challenging, and parametric assumptions may be required.
REFERENCES
Andersen, P.K. and Keiding, N. (2002). Multi-state models for event history analysis. Statistical Methods for Medical Research 11, 91–115.
Andrei, A.-C. and Murray, S. (2006). Estimating the quality-of-life-adjusted gap time distribution of successive events subject to censoring. Biometrika 93(2): 343–355.
Bebchuk, J.D. and Betensky, R.A. (2002). Local likelihood analysis of the latency distribution with interval censored intermediate events. Statist. Med. 21, 3475–3491.
Betensky, R. and Finkelstein, D. (1999). An extension of Kendall’s coefficient of concordance
to bivariate interval-censored data. Statist. Med., 18, 3101–3109.
Fine, J., Jiang, H. and Chappell, R. (2001). On semi-competing risks data. Biometrika 88(4), 907–919.
Hougaard, P. (2000). Multi-state models: A review. Lifetime Data Analysis 5, 239–264.
Springer, New York.
Huang, Y. (2000). Two-sample multistate accelerated sojourn times model. J. Am. Statist. Assoc. 95, 619–627.
He, W. and Lawless, J.F. (2003). Flexible maximum likelihood methods for bivariate proportional hazards models. Biometrics 59: 837–848.
Kendall, M. and Gibbons, J.D. (1990). Rank Correlation Methods. (Fifth ed.). A Charles
Griffin Title. London: Edward Arnold.
Kessing, L.V., Hansen, M.G., and Andersen, P.K. (2004a). Course of illness in depressive
bipolar disorders: Naturalistic study, 1994–1999. Brit. J. Psych. 185, 372–377.
Kessing, L.V., Hansen, M.G., Andersen, P.K., Angst, J. (2004b). The predictive effect of
episodes on the risk of recurrence in depressive and bipolar disorder - a life-long prospective. Acta Psych. Scand. 109, 339–344.
Kvist, K., Gerster, M., Andersen, P.K., Kessing, L.V. (2007). Non-parametric estimation
and model checking procedures for marginal gap time distributions for recurrent events.
Statist. Med. 26, 5394–5410.
Lakhal-Chaieb, L., Rivest, L.-P. and Abdous, B. (2006). Estimating survival under dependent truncation. Biometrika 93(3), 655–669.
Lakhal-Chaieb, L., Rivest, L.-P. and Abdous, B. (2008a). Estimating survival and association in a semicompeting risks model. Biometrics 64(1), 180–188.
Lakhal-Chaieb, L., Rivest, L.-P. and Beaudoin, D. (2008b). IPCW estimators for Kendall’s tau under bivariate censoring. Under revision.
Lin, D. Y., Sun, W., and Ying, Z. (1999). Nonparametric estimation of the gap time distributions for serial events with censored data. Biometrika 86: 59-70.
Lisabeth, L.D., Harlow, S.D., Gillespie, B., Lin, X., Sowers, M.F. (2004). Staging reproductive aging: a comparison of proposed bleeding criteria for the menopausal transition. Menopause 11(2): 186–197.
Manatunga, A.K. and Oakes, D. (1996). A measure of association for bivariate frailty distributions. Journal of Multivariate Analysis 56, 60–74.
Moertel, C.G., Fleming, T.R., McDonald, J.S. et al. (1990). Levamisole and fluorouracil for
adjuvant therapy of resected colon carcinoma. New England Journal of Medicine, 322,
352–358.
Nan, B., Lin, X., Lisabeth, L.D., Harlow, S.D. (2006). Piece-wise constant cross-ratio estimates for the association between age at marker event and age at menopause. J. Am. Statist. Assoc. 101, 65–77.
Oakes, D. (1982). A concordance test for independence in the presence of censoring. Biometrics 38(2): 451–455.
Oakes, D. (1986). A model for bivariate survival data. In Modern Statistical Methods in Chronic Disease Epidemiology, eds. S.H. Moolgavkar and R.L. Prentice. Wiley, New York.
Schaubel, D.E. and Cai, J. (2004). Regression methods for gap time hazard functions of
sequentially ordered multivariate failure time data. Biometrika 91, 291–303.
Shih, J.H. and Louis, T.A. (1995). Inferences on the association parameter in copula models
for bivariate survival data. Biometrics 51: 1384–1399.
Wang, M. C. (1999). Gap time bias in incident and prevalent cohorts. Statist. Sinica 9: 999–
1010.
Wang, W. and Wells, M.T. (1998). Nonparametric estimation of successive duration times
under dependent censoring. Biometrika 85(3): 561–572.
Wang, W. and Wells, M.T. (2000). Estimation of Kendall’s tau under censoring. Statist.
Sinica 10(4): 1199-1215.
Appendix A: A U-statistic expression for $\sqrt{n}(\tilde\tau-\tau)$
It is clear that $\tilde\tau$ is a U-statistic of order 2, whose expectation is equal to
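$$
\begin{aligned}
E(\tilde\tau)
&=\binom{n}{2}^{-1}\sum_{i<j}E\left\{\frac{L_{ij}}{p_{ij}}\,a_{ij}b_{ij}\right\}\\
&=\binom{n}{2}^{-1}\sum_{i<j}E\left\{E\left[\frac{L_{ij}}{p_{ij}}\,a_{ij}b_{ij}\,\middle|\,X_i,X_j,\tilde Y_{ij}\right]\right\}\\
&=\binom{n}{2}^{-1}\sum_{i<j}E\left\{\frac{1}{p_{ij}}\,E\left[L_{ij}a_{ij}b_{ij}\,\middle|\,X_i,X_j,\tilde Y_{ij}\right]\right\}.
\end{aligned}
$$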
Once $X_i$, $X_j$ and $\tilde Y_{ij}$ are fixed, by (2.6) the orderability event $L_{ij}$ depends only on the censoring variables $C_i$ and $C_j$, while the concordance/discordance status $a_{ij}b_{ij}$ depends only on $\max(Y_i,Y_j)$. Hence, by independence,
$$
E\{L_{ij}a_{ij}b_{ij}\mid X_i,X_j,\tilde Y_{ij}\}=E\{L_{ij}\mid X_i,X_j,\tilde Y_{ij}\}\,E\{a_{ij}b_{ij}\mid X_i,X_j,\tilde Y_{ij}\},
$$
where the first factor on the right is $p_{ij}$. Thus, if $p_{ij}>0$ for all pairs,
$$
E(\tilde\tau)=\binom{n}{2}^{-1}\sum_{i<j}E\bigl\{E[a_{ij}b_{ij}\mid X_i,X_j,\tilde Y_{ij}]\bigr\}=\binom{n}{2}^{-1}\sum_{i<j}E\{a_{ij}b_{ij}\}=\tau.
$$
And thus $\sqrt{n}(\tilde\tau-\tau)$ is a zero-mean U-statistic of order 2.
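To make the weighting concrete, the following Python sketch computes the IPCW estimator $\hat\tau$ studied in Appendix B. This is the pairwise average above with $p_{ij}$ replaced by $\hat p_{ij}=\hat G(X_i+\tilde Y_{ij})\hat G(X_j+\tilde Y_{ij})$. Here $\hat G$ is the Kaplan–Meier estimator of the censoring survivor function, built from the observed total times $\tilde X+\tilde Y$, with $\delta_Y=0$ marking a censoring. The sketch rests on several assumptions that this appendix does not spell out:

- $\tilde Y_{ij}$ is taken to be $\min(Y_i,Y_j)$.
- A pair is treated as orderable ($L_{ij}=1$) when both first gap times are observed and the smaller observed second gap time is uncensored.
- $a_{ij}=\mathrm{sign}(X_i-X_j)$ and $b_{ij}=\mathrm{sign}(Y_i-Y_j)$.

The function names `censoring_km` and `ipcw_kendall_tau` are hypothetical.

```python
import numpy as np


def censoring_km(total_time, delta_y):
    """Kaplan-Meier estimate of the censoring survivor function G(t) = P(C > t).

    Observation k contributes a censoring 'event' at its observed total time
    X~_k + Y~_k when delta_y[k] == 0. Returns a right-continuous step function."""
    total_time = np.asarray(total_time, dtype=float)
    censored = np.asarray(delta_y) == 0
    jump_times = np.unique(total_time[censored])
    surv, s = [], 1.0
    for u in jump_times:
        at_risk = np.sum(total_time >= u)
        n_cens = np.sum((total_time == u) & censored)
        s *= 1.0 - n_cens / at_risk
        surv.append(s)

    def G(t):
        # Index of the last censoring time <= t; before the first one G = 1.
        idx = np.searchsorted(jump_times, t, side="right") - 1
        return 1.0 if idx < 0 else surv[idx]

    return G


def ipcw_kendall_tau(x_obs, delta_x, y_obs, delta_y):
    """IPCW Kendall's tau between consecutive gap times (illustrative sketch).

    x_obs, delta_x : observed first gap time and its event indicator
    y_obs, delta_y : observed second gap time and its event indicator
                     (use y_obs = 0, delta_y = 0 when delta_x == 0)"""
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    delta_x, delta_y = np.asarray(delta_x), np.asarray(delta_y)
    n = len(x_obs)
    G = censoring_km(x_obs + y_obs, delta_y)
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            if delta_x[i] != 1 or delta_x[j] != 1:
                continue  # both first gap times must be observed
            k = i if y_obs[i] <= y_obs[j] else j
            if delta_y[k] != 1:
                continue  # min(Y_i, Y_j) is censored: pair not orderable (L_ij = 0)
            y_min = y_obs[k]  # plays the role of Y~_ij
            a = np.sign(x_obs[i] - x_obs[j])
            b = np.sign(y_obs[i] - y_obs[j])
            p_hat = G(x_obs[i] + y_min) * G(x_obs[j] + y_min)
            total += a * b / p_hat
    return float(total / (n * (n - 1) / 2))
```

With no censoring, $\hat G\equiv 1$ and the sketch reduces to the ordinary sample Kendall's tau. For example, first gap times (1, 2, 3, 4) paired with second gap times (2, 1, 4, 3) give 1/3. The double loop is O(n²) and is meant for illustration only. Tied observations and pairs at which $\hat G$ has dropped to zero are not handled.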
Appendix B: Asymptotic expression for $\sqrt{n}(\hat\tau-\tilde\tau)$
Following the lines of Andrei & Murray (2006), we write
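\begin{align}
\frac{1}{\hat p_{ij}}-\frac{1}{p_{ij}}
&=\frac{1}{\hat G(X_i+\tilde Y_{ij})\,\hat G(X_j+\tilde Y_{ij})}-\frac{1}{G(X_i+\tilde Y_{ij})\,G(X_j+\tilde Y_{ij})}\notag\\
&=\frac{G(X_i+\tilde Y_{ij})\,G(X_j+\tilde Y_{ij})-\hat G(X_i+\tilde Y_{ij})\,\hat G(X_j+\tilde Y_{ij})}{\hat G(X_i+\tilde Y_{ij})\,\hat G(X_j+\tilde Y_{ij})\,G(X_i+\tilde Y_{ij})\,G(X_j+\tilde Y_{ij})}\tag{B.1}\\
&=\frac{G(X_i+\tilde Y_{ij})\,\{G(X_j+\tilde Y_{ij})-\hat G(X_j+\tilde Y_{ij})\}}{\hat G(X_i+\tilde Y_{ij})\,\hat G(X_j+\tilde Y_{ij})\,G(X_i+\tilde Y_{ij})\,G(X_j+\tilde Y_{ij})}\tag{B.2}\\
&\quad+\frac{\hat G(X_j+\tilde Y_{ij})\,\{G(X_i+\tilde Y_{ij})-\hat G(X_i+\tilde Y_{ij})\}}{\hat G(X_i+\tilde Y_{ij})\,\hat G(X_j+\tilde Y_{ij})\,G(X_i+\tilde Y_{ij})\,G(X_j+\tilde Y_{ij})}\tag{B.3}
\end{align}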
The absolute value of the right-hand side of (B.2) is bounded from above by
$$
\frac{2\sup_{0\le c\le\bar C}|\hat G(c)-G(c)|}{\hat G^{2}(\bar C)\,G^{2}(\bar C)}
$$
and thus converges to zero in probability by the uniform convergence of the Kaplan–Meier estimator on $[0,\bar C]$. The convergence of $\sqrt{n}(\hat\tau-\tilde\tau)$ follows.
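The algebra in (B.1)–(B.3) and the bound stated just above can be checked numerically. The Python sketch below uses a hypothetical function name. It draws arbitrary values in (0, 1] as stand-ins for $G(X_i+\tilde Y_{ij})$ and $G(X_j+\tilde Y_{ij})$, together with perturbed stand-ins for their estimates. It then confirms two things:

- The terms labelled (B.2) and (B.3) add up to $1/\hat p_{ij}-1/p_{ij}$.
- Their sum is at most $2\max|\hat G-G|$ divided by the squared smallest values of $\hat G$ and $G$ in the pair. These smallest values play the roles of $\hat G(\bar C)$ and $G(\bar C)$.

The random values are illustrative, not data.

```python
import numpy as np


def decomposition_terms(g_i, g_j, gh_i, gh_j):
    """Return 1/p_hat - 1/p and the two terms labelled (B.2) and (B.3).

    g_i, g_j   : G(X_i + Y~_ij), G(X_j + Y~_ij)
    gh_i, gh_j : estimates G_hat(X_i + Y~_ij), G_hat(X_j + Y~_ij)"""
    lhs = 1.0 / (gh_i * gh_j) - 1.0 / (g_i * g_j)
    denom = gh_i * gh_j * g_i * g_j
    term_b2 = g_i * (g_j - gh_j) / denom
    term_b3 = gh_j * (g_i - gh_i) / denom
    return lhs, term_b2, term_b3


rng = np.random.default_rng(2008)
for _ in range(10_000):
    g = rng.uniform(0.2, 1.0, size=2)                           # stand-ins for G values
    gh = np.clip(g + rng.normal(0.0, 0.05, size=2), 0.1, 1.0)   # perturbed "estimates"
    lhs, t2, t3 = decomposition_terms(g[0], g[1], gh[0], gh[1])
    assert np.isclose(lhs, t2 + t3)
    # The pair minima stand in for G(C_bar) and G_hat(C_bar) in the bound.
    bound = 2.0 * np.max(np.abs(gh - g)) / (gh.min() ** 2 * g.min() ** 2)
    assert abs(t2 + t3) <= bound + 1e-12
print("identity and bound hold on all draws")
```

The bound uses only $0<\hat G\le 1$ and $0<G\le 1$. Each of the two terms is bounded separately, which is where the factor 2 comes from.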
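On the other hand, one has
$$
\begin{aligned}
\sqrt{n}\left[\frac{G(t)G(t')-\hat G(t)\hat G(t')}{G(t)G(t')}\right]
&=\frac{\sqrt{n}\{G(t)-\hat G(t)\}}{G(t)}+\frac{\sqrt{n}\{G(t')-\hat G(t')\}}{G(t')}+o_p(1)\\
&=\sqrt{n}\int_0^{t}\frac{\hat G(u-)}{G(u)}\,\frac{\sum_{k=1}^{n}dM_k^c(u)}{\sum_{k=1}^{n}I(\tilde X_k+\tilde Y_k\ge u)}
+\sqrt{n}\int_0^{t'}\frac{\hat G(u-)}{G(u)}\,\frac{\sum_{k=1}^{n}dM_k^c(u)}{\sum_{k=1}^{n}I(\tilde X_k+\tilde Y_k\ge u)}+o_p(1),
\end{aligned}
$$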
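where $M^c(u)$ is the standard martingale associated with the censoring variable $C$, defined by
$$
M^c(u)=I(C\le u,\ \delta_Y=0)-\int_0^u I(\tilde X+\tilde Y\ge s)\,d\Lambda^c(s),
$$
and thus
$$
\sqrt{n}\left[\frac{G(X_i+\tilde Y_{ij})\,G(X_j+\tilde Y_{ij})-\hat G(X_i+\tilde Y_{ij})\,\hat G(X_j+\tilde Y_{ij})}{G(X_i+\tilde Y_{ij})\,G(X_j+\tilde Y_{ij})}\right]
$$
can be written as
$$
\begin{aligned}
&\sqrt{n}\int_0^{\bar C}I(X_i+\tilde Y_{ij}\ge u)\,\frac{\hat G(u-)}{G(u)}\,\frac{\sum_{k=1}^{n}dM_k^c(u)}{\sum_{k=1}^{n}I(\tilde X_k+\tilde Y_k\ge u)}\\
&\quad+\sqrt{n}\int_0^{\bar C}I(X_j+\tilde Y_{ij}\ge u)\,\frac{\hat G(u-)}{G(u)}\,\frac{\sum_{k=1}^{n}dM_k^c(u)}{\sum_{k=1}^{n}I(\tilde X_k+\tilde Y_k\ge u)}+o_p(1)
\end{aligned}
$$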
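and
$$
\begin{aligned}
\sqrt{n}(\hat\tau-\tilde\tau)
&=\sqrt{n}\int_0^{\bar C}\left[\binom{n}{2}^{-1}\sum_{i<j}\frac{L_{ij}a_{ij}b_{ij}\,I(X_i+\tilde Y_{ij}\ge u)}{\hat G(X_i+\tilde Y_{ij})\,\hat G(X_j+\tilde Y_{ij})}\right]\frac{\hat G(u-)\sum_{k=1}^{n}dM_k^c(u)}{G(u)\sum_{k=1}^{n}I(\tilde X_k+\tilde Y_k\ge u)}\\
&\quad+\sqrt{n}\int_0^{\bar C}\left[\binom{n}{2}^{-1}\sum_{i<j}\frac{L_{ij}a_{ij}b_{ij}\,I(X_j+\tilde Y_{ij}\ge u)}{\hat G(X_i+\tilde Y_{ij})\,\hat G(X_j+\tilde Y_{ij})}\right]\frac{\hat G(u-)\sum_{k=1}^{n}dM_k^c(u)}{G(u)\sum_{k=1}^{n}I(\tilde X_k+\tilde Y_k\ge u)}+o_p(1).
\end{aligned}
\tag{B.4}
$$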
Since
$$
\frac{\hat G(u-)}{G(u)\,\frac{1}{n}\sum_{k=1}^{n}I(\tilde X_k+\tilde Y_k\ge u)}\to\frac{1}{P(\tilde X+\tilde Y\ge u)}
$$
almost surely, the right-hand side of (B.4) can be expressed as
$$
\begin{aligned}
&\frac{1}{\sqrt{n}}\int_0^{\bar C}\left[\binom{n}{2}^{-1}\sum_{i<j}\frac{L_{ij}a_{ij}b_{ij}\{I(X_i+\tilde Y_{ij}\ge u)+I(X_j+\tilde Y_{ij}\ge u)\}}{\hat G(X_i+\tilde Y_{ij})\,\hat G(X_j+\tilde Y_{ij})}\right]\frac{1}{P(\tilde X+\tilde Y\ge u)}\sum_{k=1}^{n}dM_k^c(u)+o_p(1)\\
&\qquad=\frac{1}{\sqrt{n}}\sum_{k=1}^{n}\int_0^{\bar C}E\left[\frac{L_{12}a_{12}b_{12}\{I(X_1+\tilde Y_{12}\ge u)+I(X_2+\tilde Y_{12}\ge u)\}}{G(X_1+\tilde Y_{12})\,G(X_2+\tilde Y_{12})}\right]\frac{1}{P(\tilde X+\tilde Y\ge u)}\,dM_k^c(u)+o_p(1).
\end{aligned}
$$
Finally, $\sqrt{n}(\hat\tau-\tilde\tau)$ is asymptotically equivalent to a sum of zero-mean independent and identically distributed terms.
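The key step above is the stochastic-integral representation of $\{G(t)-\hat G(t)\}/G(t)$. For a continuous $G$, it follows from the Duhamel identity
$$\frac{G(t)-\hat G(t)}{G(t)}=\int_0^t \frac{\hat G(u-)}{G(u)}\,d(\hat\Lambda^c-\Lambda^c)(u),$$
which holds exactly for the Kaplan–Meier estimator $\hat G$ and its Nelson–Aalen cumulative hazard $\hat\Lambda^c$. It is combined with
$$d\hat\Lambda^c(u)-d\Lambda^c(u)=\frac{\sum_k dM_k^c(u)}{\sum_k I(\tilde X_k+\tilde Y_k\ge u)},$$
which holds wherever the risk set is non-empty. The Python sketch below checks the identity numerically. It assumes exponential censoring with a hypothetical rate `lam`, so that $G(u)=e^{-\mathrm{lam}\,u}$ and $d\Lambda^c(u)=\mathrm{lam}\,du$, and it uses simulated exponential total times. These choices and the function names are illustrative only, not part of the original derivation.

```python
import numpy as np


def censoring_na_km(obs_time, is_cens):
    """Nelson-Aalen increments and Kaplan-Meier values for the censoring distribution."""
    obs_time = np.asarray(obs_time, dtype=float)
    is_cens = np.asarray(is_cens, dtype=bool)
    jumps = np.unique(obs_time[is_cens])                      # censoring times s_1 < s_2 < ...
    at_risk = np.array([np.sum(obs_time >= u) for u in jumps])
    n_cens = np.array([np.sum((obs_time == u) & is_cens) for u in jumps])
    d_na = n_cens / at_risk                                   # dLambda_hat^c(s_m)
    km = np.cumprod(1.0 - d_na)                               # G_hat(s_m)
    return jumps, d_na, km


def duhamel_sides(t, obs_time, is_cens, lam):
    """Both sides of {G(t) - G_hat(t)}/G(t) = int_0^t G_hat(u-)/G(u) d(Lambda_hat - Lambda)(u),
    assuming exponential censoring: G(u) = exp(-lam*u), dLambda(u) = lam du."""
    jumps, d_na, km = censoring_na_km(obs_time, is_cens)

    def G(u):
        return np.exp(-lam * u)

    keep = jumps <= t                                         # censoring times in (0, t]
    km_left = np.concatenate(([1.0], km))[:-1]                # G_hat(u-) at each jump
    g_hat_t = km[keep][-1] if keep.any() else 1.0
    lhs = (G(t) - g_hat_t) / G(t)
    # Jump part: sum over censoring times u <= t of G_hat(u-)/G(u) * dLambda_hat(u).
    jump_part = np.sum(km_left[keep] * d_na[keep] / G(jumps[keep]))
    # Continuous part: int_0^t G_hat(u-) * exp(lam*u) * lam du, where G_hat(u-) is
    # constant on each interval (0, s_1], (s_1, s_2], ..., (s_m, t].
    knots = np.concatenate(([0.0], jumps[keep], [t]))
    levels = np.concatenate(([1.0], km[keep]))
    cont_part = np.sum(levels * (np.exp(lam * knots[1:]) - np.exp(lam * knots[:-1])))
    return lhs, jump_part - cont_part


rng = np.random.default_rng(0)
lam = 0.3
total = rng.exponential(2.0, size=200)          # assumed uncensored total times X + Y
cens = rng.exponential(1.0 / lam, size=200)     # assumed censoring times C ~ Exp(lam)
obs = np.minimum(total, cens)                   # observed X~ + Y~
for t in (0.5, 1.0, 2.0):
    lhs, rhs = duhamel_sides(t, obs, cens < total, lam)
    assert np.isclose(lhs, rhs), (t, lhs, rhs)
```

The identity is exact for any sample, so the two sides agree to rounding error. The simulation only supplies a realistic censoring pattern.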
Table 1
Means and mean square errors (in parentheses; ×10^4) of τ̂ (IPCW) and τ̂_W (WW) and their resulting conditional survival estimates

                  τ = 0.2                   τ = 0.4                   τ = 0.6                   τ = 0.8
cf2  Estimate     IPCW         WW           IPCW         WW           IPCW         WW           IPCW         WW
0.1  τ̂            0.196 (15)   0.214 (34)   0.400 (12)   0.413 (43)   0.597 (7)    0.606 (37)   0.801 (3)    0.820 (37)
     Ŝ_Y(y1|x0)   0.196 (12)   0.185 (20)   0.202 (11)   0.192 (27)   0.204 (25)   0.194 (41)   0.207 (67)   0.445 (141)
     Ŝ_Y(y2|x0)   0.400 (16)   0.387 (28)   0.402 (26)   0.390 (46)   0.405 (59)   0.395 (85)   0.407 (193)  0.415 (352)
     Ŝ_Y(y3|x0)   0.600 (17)   0.592 (21)   0.602 (31)   0.592 (50)   0.607 (69)   0.608 (87)   0.610 (290)  0.593 (554)
     Ŝ_Y(y4|x0)   0.804 (15)   0.797 (16)   0.802 (22)   0.794 (31)   0.806 (62)   0.809 (69)   0.811 (305)  0.820 (760)
0.2  τ̂            0.190 (25)   0.179 (63)   0.395 (17)   0.396 (61)   0.590 (10)   0.596 (74)   0.795 (3)    0.788 (70)
     Ŝ_Y(y1|x0)   0.198 (23)   0.193 (36)   0.195 (21)   0.192 (40)   0.202 (41)   0.194 (78)   0.208 (156)  0.723 (283)
     Ŝ_Y(y2|x0)   0.399 (28)   0.402 (40)   0.401 (40)   0.395 (67)   0.401 (80)   0.387 (149)  0.402 (377)  0.417 (787)
     Ŝ_Y(y3|x0)   0.597 (29)   0.595 (38)   0.602 (40)   0.596 (60)   0.601 (116)  0.584 (183)  0.622 (840)  0.635 (1653)
     Ŝ_Y(y4|x0)   0.798 (20)   0.798 (25)   0.802 (36)   0.799 (47)   0.797 (111)  0.775 (145)  0.807 (884)  0.856 (6224)
Table 2
Estimates of τ_k and θ_k with the Tremin Trust data

k    τ̂_k      s.e.    θ̂_k     95% C.I.
1    -0.049   0.157   0.412   [0.15; 14.8]
2    -0.374   0.070   0.184   [0.128; 0.270]
3    -0.235   0.093   0.327   [0.205; 0.650]
4    -0.139   0.118   0.756   [0.460; 1.200]
[Figure: three-state diagram with states "Initial State", "At Least One 45-day Cycle" and "Menopause".]
Figure 1. A three-state model for an intermediate and terminal event.
[Figure: timeline with events BIRTH, MENARCHE, 1ST 45-DAY CYCLE and MENOPAUSE; time points T = 0, T1, C1, T2 and C2; gap times X and Y.]
Figure 2. Timeline diagram for the reproductive life cycle including the intermediate event
of a cycle ≥ 45 days in duration.
[Figure: curves for "45-Day Cycle" and "Menopause"; y-axis: proportion of women with event (0.0 to 1.0); x-axis: age in years (40 to 60).]
Figure 3. Empirical marginal distribution of time to first cycle of ≥ 45 days duration and
menopause in Tremin Trust data of Nan et al. (2006).
[Figure: axes X and Y.]
Figure 4. Plots of various censored data patterns.
[Figure: curves for Observation and Therapy; y-axis: conditional median gap time from cancer recurrence to death (0.6 to 2.0); x-axis: time from randomisation to cancer recurrence (years, 0 to 4).]
Figure 5. Empirical estimates of conditional median time from cancer occurrence to death
given cancer occurrence date.
[Figure: curves for "Restricted Kendall’s tau" and "Global Kendall’s tau"; y-axis: Kendall’s tau (−1.0 to 0.0); x-axis: theta (0.0 to 1.0).]
Figure 6. Restricted Kendall’s tau.
[Figure: curves labelled age 36, 39, 42, 45, 48 and 51; y-axis: survival probability (0.0 to 1.0); x-axis: age at menopause (35 to 60).]
Figure 7. Empirical estimates of conditional survivor functions for menopause given age at
first cycle of ≥ 45 days duration.
[Figure: four panels (x = 1, x = 2, x = 3, x = 4), each comparing Observation and Therapy; y-axis: conditional survival (0.0 to 1.0); x-axis: time since cancer recurrence in years (Y), 0 to 4.]
Figure 8. Empirical estimates of conditional survivor functions for time from cancer occurrence to death given cancer occurrence.
[Figure: two panels (Observation, Therapy), each showing curves for x = 1, 2, 3, 4; y-axis: conditional survival (0.0 to 1.0); x-axis: time since cancer recurrence in years (Y), 0 to 4.]
Figure 9. Empirical estimates of conditional survivor functions for time from cancer occurrence to death given cancer occurrence date.