Estimation with Univa.ria.te "Mixed Case

Estimation with Univa.ria.te "Mixed Case" Interval
Censored Data
by
Shuguang Song
The Boeing Company
TECHNICAL REPORT No. 414
August, 2002
Department of Statistics
Box 354322
University of \Vashington
Seattle, Washington, 98195 USA
Estimation with Univariate "Mixed Case" Interval
Censored Data
Shuguang Song
The Boeing Company
Abstract: In this paper, we study the Nonparametric Maximum Likelihood Estimator (NPMLE)
of univariate "Mixed Case" interval-censored data in which the number of observation times, and
observation time themselves are random variables. We provide a characterization of the NPMLE,
then formulate the ICM algorithm for computing the NPMLE. We also study the asymptotic
properties of the NPMLE: consistency, global rates of convergence with and without a separation
condition, an asymptotic minimax lower bound and pointwise asymptotic distribution for the "toy
estimator" proposed by GROENEBOOM AND WELLNER (1992)
Key words and phrases: convex minorant algorithm, maximum likelihood, empirical processes,
interval censored data, consistency, rate of convergence, asymptotic distribution.
1
Introduction and Formulation of the Problem
Univariate interval censoring problem arises when a random variable, such as failure time or onset
of disease, can not be directly observed, but is only known to be in an interval determined by several
observation times. Interval censored data and the corresponding application of statistical methods
can be found in animal carcinogenicity, epidemiology, HIV and AIDS studies, see FINKELSTEIN AND
WOLFE (1985), FINKELSTEIN (1986), SELF AND GROSSMAN (1986), BECKER AND MELBYE (1991) and
ARAGON AND EBERLY (1992). There are three types of univariate interval censorship models:
2" and "Case k" interval censoring,
interval censoring or current status
interval CellSC)f]Jlg.
Suppose that Y
tUllCtiorlS on
data observed are:
The nonparametric maximum likelihood estimation
1" and
2" inf./3rv"J
censored data are well summarized in GROENEBOOM AND VVELLNER (1992), GROENEBOOM (1996),
HUANG AND WELLNER (1997).
WELLNER (1995) studied the NPMLE of the
interval
censoring in which each subject has exactly k examination times.
To compute the NPMLE, TURNBULL (1976) derived self-consistency equations for a very general
censoring scheme and proposed solving the equations by the EM algorithm. GROENEBOOM (1991)
developed an iterative convex minorant algorithm (ICM). The ICM algorithm is considerably faster
than the EM algorithm especially when the sample size is large; see GROENEBOOM AND WELLNER
(1992). ARAGON AND EBERLY (1992) and Jongbloed (1995, 1998) proposed modifications of the
ICM algorithm, and Jongbloed (1995, 1998) showed that his modified algorithm always converges.
Very often in clinical trials, each patient has several follow-ups and the number of followups differs from patient to patient. This motivates the study of the following model. Let T
Tk,j, j
0,1, ... ,k, k + 1, k = 1, ... , be a triangular array of "potential observation times" with
Tk,o
0 and Tk,k+I ::::: +00, and let K, the number of observation times, be an integer-valued
random variable such that Y and (K,
are independent. \Vhat we can observe is a vector X =
(2.l.K' TK, K), with possible value x = (Ok, tk, k), where Tk is the kth row of the triangular array T,
2.l.k
(~k,ll' .. ,2.l.k,k) with 2.l.k,.f = 1(Tk.j_l,Tk,j](Y)' j
1,2, .. , , k + 1. Suppose we observe n i.i.d.
, . , ex·. X' 1,..i1.2,
v
v
rh ere X·~~ (,A
1 , 2 , ... ,n.
' Here (}.r(i) ,'T,(i)
..
copleso~
. . . ,An'"
u (i)(i)1 'T(i)
K(i)1 K(i))..
,Z
_ , K)
~ ,z
K
1,2, ... , are the underling i.i.d. copies of (Y, T, K). SCHICK AND Yu (2000) referred to this model as
the "Mixed Case" interval censoring model. They proved strong consistency in the L 1 (fl)-topology
of the NPMLE for a measure fl which is derived from the distribution of observation times.
VAN DER VAART AND WELLNER (2000) gave a different formulation of "Mixed Case" interval
censoring. They noted that conditional on K and T K , the vector
has a multinomial distribution:
where
Suppose for
moment that the distribution G k of (TKIK
Then the de11sil:y
X is
.1)
has density gk, and
P(K
(K,
with respect to the dominating measure v which is determined by the joint distribution
Thus the log-likelihood function for F of b
, ... ,Xn is given by
1
Ki+l
= -n '"
L::.K.Jlog(F(TK(i)
~ '"
~..
. ..
(1.2)
i=1 j=1
where
Ki+1
mF(X) =
L
F(T~;,j_1))
L::.Ki,jlog(F(T~;
j=1
K+1
L
j=1
6.Ki,jlog(6.F(T~;,j))'
VAN DER VAART AND WELLNER (2000) proved the consistency of the NPMLE of the "Mixed
Case" interval censoring in Hellinger distance, and also recovered the results of SCHICK AND Yu
(2000) by using preservation theorems for Glivenko-Cantelli classes. Here, we will use the above
formulation of the "Mixed Case" interval censoring problem. In Section 2, we give a characterization
of the NPMLE. We then formulate the ICM algorithm for computing the NPMLE. In Section 3, we
state the main asymptotic properties of the NPMLE: consistency, global rates of convergence with
and without a separation condition, an asymptotic minimax lower bound and pointwise asymptotic
distribution. The results in Section 3 are proved in Section 5 by using modern empirical process
theroy. There are other two estimators for the univariate "mixed case" interval censored data that
are based on the non-homogeneous Poisson process model: nonparametric maximum likelihood
estimator and nonparametric maximum pseudo-likelihood estimator (NPMPLE), see WELLNER AND
ZHANG (2000). We denote these two estimators as NPMLE wZ and NPMPLE wz . In Section 4,
we present simulation studies to compare the asymptotic relative efficiency of the above three
estimators.
2
Characterization and Computation of the NPMLE
Let tl ::; t2 ::; ... ::; t m denote the ordered distinct observation time points in the set of all
observation time points:
{TKi,j, j = 1, ... , Ki,i = 1, ... , n}.
Define the rank function R:
,j,
= s.
J
1, ... , Ki,
if
1, ... ,
such that
--t {I, 2, ... ,
s
1, ... ,
<
rewritten as:
F
and
the same reason as explained in
2" interval
m GROENEBOOM
WELLNER 1992), we can take tl to be an observation point TKi,j such
1,
and
to be an observation point such that l(TK,
(Y'i)
1, Note that n is a convex
In
is a concave function due to the linear combination
the concave function "log". Thus,
the characterization of the NPMLE Fn is given by the following slightly modified version of the
Fenchel duality theorem in Lemma 3.1 of WELLNER AND ZHAN (1997):
Theorem 2.1 Let ¢ : R n -+ R U { -oo} be a continuous concave function. Let K U R n be a convex
cone, and let K o
K n
(R). Suppose K o is nonempty and ¢ is differentiable on K o. Then
i;. E K o satisfies
if and only if
(2.1)
(i;., ¢(i;.))
0,
and
v ¢ (i;.))
(2.2)
::;
°
for all ;f E K.
Note that if;f
(0, ... ,0,1, ... ,1), E > 0, then
i;.+dj, where 1j
'-..-' '-..-'
m-j
j
v ¢(i;.))
m
v ¢(i;.))
0+
EI:
l=m-j+l
Thus, (2.2) becomes
Tn
I:
'i
= 1,2, ... , m.
this case,
any
E
n,
can be written as
::; 0,
::; o.
where
Ki
L
j=l
and
6K(k)
={
"J
1, if R(TK
.. i,J)
0, otherwIse.
= k,
Then, by applying (2.1) and (2.3), we have the following characterization of the NPMLE
Fn :
Theorem 2.2 Let T(l) correspond to an observation time TKi,j such that £lKi,j = 1, and let T(m)
be the largest ordered observation times such that the corresponding £lKi,j = 0. Then the unique
NPMLE
maximizes In(FIX) over all F E :F if and only if
m
(2.5)
n
L L li,k(FnIX) ~ 0,
k=l i=l
and
(2.6)
Denote
where
for l = 1,2, ... ,m,
the G(F,') and V(F,
==
processes
0,
p
G(F,
V(F,O)
=
=
p
I) -lkk(F1X)) =
L
k=l
k=l
L .,.",,,,,."'-,=-=-;
i=1
0,
P
V(F,
=
L(lk(FIX)
k=1
+ F(tk)(-lkk(FjX)))
n
p
L L [li,k
k=1 i=1
where p
F(tk) li.kk(FIX)] ,
1,2, ... , m. Since
p
L
k=1
G(F,pIX)
V(F,pIX)
=
by using Theorem 4.3 of WELLNER AND ZHAN (1997), we can also characterize the NPMLE
the slope of the convex minorant of a self-induced cumulative diagram:
Fn
as
Theorem 2.3 Let 1(1) correspond to an observation time 1Ki ,j such that D.K;,j
1, and let
be the largest ordered observation times such that the corresponding D.Ki,j = O. Then
is the
NPMLE of Fo if and only if
is the left derivative of the convex minorant of the cumulative sum
diagram consisting the following points
Po
p I , ... ,m.
convex minorant
the cumulative Slun
u.W'OL'UH
can be
and Proposition 1.4 of GROENEBOOM AND WELLNER 1992). Thus, the computation of the NPMLE
of "Mixed Case" interval censoring can be reduced to the "Case 2" interval censoring as noted
HUANG AND WELLNER
and VAN DER VAART AND WELLNER
The
algorithm has been applied in many problems: see GROENEBOOM AND WELLNER (1992),
Jongbloed (1995, 1998), WELLNER AND ZHAN (1997), WELLNER AND ZHANG (2000). To prove the
global convergence, JONGBLOED (1995) designed a modified iterative convex minorant algorithm
(MICM) by inserting a binary line search procedure. In this case, the NPMLE of the "Mixed
Case" interval censoring can be calculated by the MICM algorithm as for the "Case 2" interval
censoring.
3
Asymptotic Properties of the NPMLE: Results
In this section, we will study the asymptotic properties of the NPMLE: consistency, global rate of
convergence in Hellinger distance, a local asymptotic minimax lower bound, and pointwise asymptotic distribution of the "toy estimator". The "toy estimator", introduced by GROENEBOOM AND
WELLNER (1992), is obtained by taking one step in the iterative convex minorant algorithm starting
with the real underlying distribution function Fa. It is conjectured that this toy estimator has the
same asymptotic distribution as the NPMLE. The following regularity conditions are needed:
A. EK <
00.
B. Separation condition: there exists a 8
J
1,2, ... , K, K = 1,2, ....
C. For 0
tk,o < tk,l <
distribution function
> 0 such that
... < tk,k < tk,k+1 ==
00,
P(TK,j - TK,j-l
there exists a 77
> 8)
1, for every
> 0 such that the underlying
satisfies
1, ... ,k + 1.
for all j
D.
1
density of
such that
be the density of the joint distribution Q(tk,j, tk,j-lIK =
,and let
be the
with respect to Lebesgue measure on R. Suppose that there exist Co > 0, CI > 0
<
1
x.
for
t,
8
= 1 2, ... , k
1,
,2, ... ,
F. Fix to E
. Then there is a neighborhood
to such that
distribution of
Qk,j of
is differentiable, and
IS cOIltUlUC'US, positive and uniformly bounded
allj 1,2, ... ,k,k=1,2, ....
G. In a neighborhood of
H. In a neighborhood of to,
(3.7)
3.1
is differentiable, its density
is continuous and strictly positive.
is defined as
is continuous and strictly positive, where
a(t)
Consistency
(2000) have proved the consistency of the NPMLE F'n by using
preservation theorems for Glivenko-Cantelli classes. Here, we give another approach to the proof.
Then, following this approach, we continue to study the rate of convergence of the NPMLE.
In the probability space (n, A, P), where n is
{O, 1 }k+l x
x {k}; A is a a-field generated
by Akl x Bk X Ak3' Akl is the class of all subsets of {eJ
(0, ... ,~, 0, ... ,0) : j
1, ... ,k + I},
VAN DER VAART AND WELLNER
Rt
j-th
Bk is a a-field generated by the set of all k-dimensional rectangles on Rt, A k3 = a {0, {k } };
a probability measure with density defined in (1.1) and dominating measure v defined as
(3.8)
v( {ej} x B k x {k})
PF
is
counting measure on {ej}
xP(K = k) x P(T E BklK
k).
be the conditional distribution function of
. Let Qk,j denote the
Q(tk,b""
IK
1,2, ... , conditional on K = k. Then the
marginal distribution of Tk,j, for j = 1,2, ... , k, k
Hellinger distance is
that
~J
j=l
J{~
elC
~P(K
k=l
=
~P(K
k=l
00
=
~P(K
k)
=
-
F(tk'H)]} dq(hl K
J
[F(tk,k+l) - F(tk,O)] dQ(hi K
k)
k=l
=
~P(K
k=l
k) = 1.
For fixed Fo E F and Pp as defined in (1.1), let
(3.9)
{pp: F E F},
pp PPo
2pp
_ 1.
pp + PPo
PP + PPo
{mp: F E F},
P
(3.10)
mp
(3.11)
(3.12)
-
{mp - mpo : h(PP,PFb)
< 5, FE F}.
Then P is convex: i.e. for all pp, PPo E P, and for 0 E [0,1], opp + (1 0) PPo = Poe
Po E P.
As shown by VAN DER VAART AND WELLNER (2000), page 123, the Hellinger distance h(PFn ,PPo)
is less than or equal to
PIIM for a density pp with respect to a dominating measure v. To
show that h(po ,PF'n) -7nq 0, we only need to prove that the class A1 is a Glivenko-Cantelli class
by showing t-h'a:t it~ bra~k~ting numbers are fi~ite for each E > 0, see Theorem 2.4.1 in VAN DER
VAART AND WELLNER (1996). It follows from the following Lemma 3.1 that the bracketing number
u
(E, A1, Lr(P)) is bounded by
Lemma 3.1
(E, P, L r (Qa)), where dQa
for
For any integer r 2 1, let
sup {(J 2 0:
{pp+
J
dv S;
EP}.
r
E }
,
(J > 0.
dQ =
(J
--':::":-t-- av
For univariate
class P defined in
EK < 00.
the
two lemmas show that the
and the class ./'-"1 defined in (3.11) are both Glivenko-Cantelli classes under
Lemma 3.2
If EK <
Lemma 3.3
If EK
00,
P defined in
< 00, A1 defined in
9) is a Glivenko-Cantelli class.
11) is a Glivenko-Cantelli class.
Thus, we have the following theroem about the consistency of the NPMLE with univariate
"mixed case" interval censored data.
Theorem 3.1 Under EK < 00, the NPMLE of the univariate "mixed case" interval censored data
is consistent in Hell'inger distance.
Consistence of the NPMLE in L1-norm can be derived from the Theorem 3.1; see VAN DER
VAART AND WELLNER (2000). Under additional hypotheses, SCHICK AND Yu (2000) proved that Fn
converges pointwise and even uniformly.
3.2
A Local Asymptotic Minimax Lower Bound
Let Q be a set of probability densities on a measurable space (0, A) with respect to a a-finite
dominating measure p, T be a real-valued functional on Q, and let Tn, n ;:: 1, be a sequence of
estimators of T g based on samples of size n from a density g known to be contained in Q. The
nlinimax risk of the estimator Tn, vvhich measures the difficulty of the problem of estimating Tg,
is defined as
max
gE9
where 1 is an increasing convex loss function on
with l(O)
O.
n --7
the asymptotic lower bound for the minimax risk can be derived by the following inequality in Lemma 4.1,
GROENEBOOM
1996):
The local rates of convergence for
1" and "Case 2" of univariate
censored
have been well studied: see GROENEBOOM (1996). Here we follow the same approach to extend
the minimax results to univariate
interval-censored data. Consider the following
perturbations F n of the underlying distribution function Fo:
(t) =
(3.14)
where 0
function
Fo(t) + f) fo(to) (t - to +
Fo(t) f) (to) (to + cn{
Fo(t),
< f) < 1 is a constant,
f n of Fn is
if t E [to
if t E
to
otherwise,
c is a constant to be determined by Fo and Q. Then the density
fo(t) + f) fo(to),
fo(t) - f) fo(to),
{
fo(t),
fn(t)
, to),
+
ift E [t o cn- 1/ 3 ,0,
t )
if t E [to, to + cn- 1/ 3 J,
otherwise.
Note that
{ fn(t)dt
i R+
and for all t E [to, to
= (
fo(t)dt
+ f) fo(to)
i R+
( {to
dt
ito-en-1/3
(toto+cn-1/3 dt)
it,
= 1,
+ cn- 1/ 3 J,
fn(t)
= fo(t)
- f) fo(to)
= (1 -
f))fo(to)
+ O(n- 1/ 3 ) > 0,
for 0 < f) < 1, and large n. Thus, (3.14) is a sequence of univariate distribution functions.
For "Mixed Case" univariate interval-censored data, an asymptotic minimax result is given by
the following theorem.
Theorem 3.2 Let Fo(t) be the under-lying distr-ibution function with density
, let Fn be the
sequence of per-tur-bations of Fo
by (3.14),
qkO 0, qk,k+l == 0, qk,j(t) be the dens'ity of the
distr-ibution function Q(tk,jIK = k), and let
be the joint density of Q(tk,j,
IK =
for- j
1, ... , k, k
1, .... Suppose that condition B holds and
00
LP(K
+
< 00.
{ }
,is a constant ael)enaw"q on (), and
Co
00
LP(K
k=l
Note that in the cases of P(K = 1) and P(K
2), the above a(to) reduces to the known lower
bound results for univariate current status data and univariate interval censored data: case 2; see
GROENEBOOM (1996). In the case of bivariate interval censored data, the local rate of convergence
is also shown to be n l/3 ; see SONG (2001).
3.3
Global Rate of Convergence in Hellinger Distance
We will apply empirical processes theory to study the rate of convergence of the NPMLE of with
univariate "mixed case" interval censored data. Consider a deterministic function M : e i-7 R,
and stochastic processes
indexed by a semimetric space e. Let ()o be a point of maximum
be estimators that (nearly) maximize the maps ()
(8). An
of the map () i-7 M( 8). Let
upper bound for the rate of convergence of
can be obtained from the continuity modulus of the
difference
see Theorem 3.2.5 in VAN DER VAART AND WELLNER (1996). In the case of
i.i.d. data,
(8) =
(8) and M(8)
P(8), the centered process J1l(Mn = (mo) is an
empirical process at mo. Define
an
an
= {mo - moo: d(8,()o)
Then Theorem 3.2.5 in
VAN DER VAART AND WELLNER
< 6}.
(1996) leads to the following corollary,
Corollary 3.1 In the i.i.d. case, assume that, for every () in a neighborhood of 00 ,
P(mo
);S
a Jurletlion
C~
<2
some
key step to apply Corollary 3.1 is to derive sharp bounds on the continuity modulus of the
empirical process. Define the uniform entropy integral for the class
by
J(ry,
sup
P
(7
Jo
and the bracketing integral
(ry,
where iUS is an envelope function of the class
. Then the bounds can be obtained by the two
maximal inequalities given by Theorem 2.14.1 and 2.14.2 in VAN DER VAART AND WELLNER (1996):
Ep
:s J(l, MIs) (p* lVIn 1/ 2 ,
EpIIGnIII~ ~ J[](1,MIs,L 2 (P)) (P*Ml) 1/2 .
An alternative is to use Lemma 3.4.2 in VAN DER VAART AND WELLNER (1996),
where each element in
is uniformly bounded by },;I. We will apply Corollary 3.1 to study the
rate of convergence of the NPMLE with the univariate "mixed case" interval censored data. The
rates are obtained with and without a separation condition. We first obtain the entropy with
bracketing for the class Ai with density defined in (1.1).
Lemma 3.4 Under conditions .4, Band C,
Under the separation condition, the following theorem gives the global rate of convergence in
Hellinger distance
for the NPMLE of
univariate "mixed case" interval censored data.
Theorem 3.3
Under conditions .4, Band C,
h(PF'n,PFO)
Without the sej:)aration condition B, the global rate of convergence in
NPMLE with
univariate "mixed case" interval
data is
Theorem 3.4
CorWZic201"/'S
A and D,
J.J.C;ULU.i".'"J.
distance of the
3.4
Pointwise Asymptotic Distribution
GROENEBOOM AND WELLNER (1992) conjectured that the
estimator" has the same asymptotic
distribution as the NPMLE for the univariate interval censored data: "case
This cOJojectllfe
is called the "working hypothesis" and supported by numerical results. The conjecture has been
proved under a separation condition for the univariate interval censored data: "case 2", see GROENEBOOM (1996) , pages 138-159. Here we will study the pointwise asymptotic distribution of the "toy
version" NPMLE for the univariate "Mixed Case" interval censorship model.
Suppose the marginal distribution Q( tk,llK = k) has density Qk,1 (tk,1) with respect to Lebesgue
measure on R; the marginal distribution Q(tk,kIK
k) has density qk,k(tk,k) with respect to
Lebesgue measure on R; joint distribution Q(tk,j, tk,j+1IK = k) has density Qk,j,j+1(tk,j, tk,j+1)
with respect to Lebesgue measure on R 2 for j
1,2, ... ,k -1, k 2,3, .... The following theorem
gives the pointwise asymptotic distribution of the NPMLE for the univariate "mixed case" interval
censored data.
Theorem 3.5 Under conditions B, E, F, G, H, then the toy estimator Fr~1) satisfies
(3.15)
where Z (t) is a, standard two-sided Brownian motion on R starting .from 0, and a(t) is defined in
(3.7).
To get a version of Theorem 3.5 for the NPMLE, we first need to show that, for each lVI > 0,
(3.16)
sup
I'FAn (\
t)
-
D (to ) I = 0 p (n -1/3
' ' ),
L' 0
and
(3.17)
under some regularity conditions. Then (3.16)
(3.17) will be used with some empirical processes
arguments to show that the difference between the process
{
process
+
}-
{
+
}
in t of the process (3.
and the distribution of the process t -+ 11,1/3
+ tnis
determined by the local minima in t of the process (3.19). For details on the "switching relation",
see pages 91-95 in GROENEBOOM AND WELLNER (1992) and examples 3.2.14 and 3.2.15 in VAN DER
VAART AND WELLNER (1996). The pointwise asymptotic distribution of
(to) - Fa
}follows
from the Theorem 3.5. One difficulty here is that considerable work is needed to prove that (3.16)
and (3.17) hold.
4
Efficiency of the NPMLE
Let Fn denote the NPMLE in the current model. Let An and A~s denote the NPMLE wZ and
NPMPLE wZ in the non-homogeneous Poisson process model, see WELLNER AND ZHANG (2000).
Note that the one-jump counting process N(t) = 1[Y::;t] has mean function A(t) = E{N(t)} =
P(Y :s; t)
F(t); see example 4.1, page 792, WELLNER AND ZHANG (2000). Based on the study
of the pointwise asymptotic distribution of the Fn , An and A~s, see Theorem 3.5 in Section 3,
Theorems 4.3 and 4.4 in WELLNER AND ZHANG (2000), we suppose that the estimators Fn , An and
A~s have the same asymptotic distribution up to positive constants L 1, L 2 and L3:
and
11,
113
i
2 i3
(Apsn(to) - Ao(to)) -+d L 3/ X,
A
•
where X is a known random variable. Note that Var(Fn ) ~ 11,-2/3 Li/ 3 Var(x), Var(A n ) ~ 11,-2/3 L~/3
Var(x), and Var(A~S) ~ 11,-2/3 L;/3 Var(x). To study the asymptotic relative efficiency of two
estimators, we ask the two estimators to have the same variance, asymptotically. Thus, we can plot
the 3/2 power of the ratio of sample variance of the two estimators to obtain an approximation to
the asymptotic relative efficiency. WELLNER AND ZHANG (2000) showed the asymptotic efficiency
in their Figures 3 and 4. In this simulation study, we study the efficiency gain
gain of An over
of the
over An by plotting (Var(Fn )/Var(A n ))3/2.
Random samples of univariate "mixed case" interval censored data Xi
i = 1,
, n, were
as follows: K i E {I, 2, 3, 4, 5, 6, 7,
with
k = 1,
, 8.
are ordered sequence
K i random variables gelleratE:d
the unit
time
UIC'LLJ.UU.LJ~Jll with
.
~
:>,
<.>
55
~
Q)
~.~
<X>
0
.-"
:~.,...
<D
.2:
0
C3
....
0
<6
0::
<.>
'"5
15.
E
:>,
oil
N
0
<=>
0
0.0
0.5
1.5
1.0
2.0
3.0
2.5
Time
Figure 1: The asymptotic relative efficiency of
runs; n is the number of subjects.
Fn
versus the
An.
N is the number of simulation
the number of subjects n
1000. The estimated efficiency gain of F'n over An is plotted in Figure
1. As shown in Figure 1, the NPMLE Fn is more efficient than the NPMLE An: for most of those
points where the Fn and An were estimated, the estimated asymptotic relative efficiency is above
40% for both simulation cases. When time is between 0.2 and 1.6, the estimated asymptotic relative
efficiency for the simulation with n = 100 is close to that for the simulation with n = 1000; when
the time varies from 1.8 to 3.0, the estimated asymptotic relative efficiency for the simulation with
n
100 is generally higher than that for the simulation case with n = 1000. Although Theorem
3.5 and Theorem 4.4 in WELLNER AND ZHANG (2000) show that the asymptotic relative efficiency
for the toy estimators of Fn and An is equal to 1 for a one jump process, the log-likelihood function
of (2.7) for a one jump process in WELLNER AND ZHANG (2000) is different from that of (1.2). This
simulation shows that we need to study asymptotic distributions of the two estimators F'r, and An.
5
Asymptotic Properties of the NPl\1LE: Proof
sectioJn, we will give
Proof of Lemma 3.1
proofs of
::;eCILlon 3.
as)Tm]ptoitIc properties
tolJowin!! brackets for
A1:
where
,i
1, ... ,I, are
for class P. Since
is de<::reaslng in t, and
so for all mF E A1, there exists i E {I, ... ,I} such that m~,i :::; mF :::;
and
<
:::;1.
-m~1
+2
Thus,
-
m~llp,r
, and
{ / ImimWdP}
r
< {2 [ / PFo l[ppo::;£T]dV] + / Imi -
< (2 r
if II(mi
mD1pro>£TIIP,r:::;
E.
Er
mW l[ppo > £T]dP}
+ Er)l/r ,
This implies that
Also,
I
r
PF,i
PF',i
+ PFo
P~,i
+ PFo
PFo
dv
because -;---"'-- :::; 1, and
-c;;--...JL--
< 1. Define
Proof of Lemma 3.2 Since F is a class
bounded monotone functions: F : R
for every probability measure Q, by Theorem 2.7.5 in VAN DER VAART AND WELLNER
exists a constant K depending on T
1 here), such
1] then
there
logNr
" l
E > O. Let I
PK,j(B), where B
for
(E,F,LdQ)). Then for the probability measure P(TK,j
BIK =
E B, there exists a set of brackets:
[F{ , F{ ] ' ... , [Ff, F[ ] ' ... ,
such that for any F E F, Ff (t)
Since
:s; F(t):S;
1, ... ,I,
(t) for all t and somei, and EpK,j IFf (tk,j) - F[ (tk,j) I :s;
and
it follows that
Let the corresponding brackets for P be defined by
K+1
II (Ft(tK,j)
P~,i
(5.20)
F[(tK,j-d
K+l
==
(5.21)
II (F[
j=l
Then
!
!
dv
PFo
:s;
K
{
E.
::; 2 E
('2:: P(K
= 2E(K
l)E
< 00,
k=l
if EK
< 00. This implies that:
logN[ (2EE(K+1),P,L 1 (P))
Thus, by Theorem 2.4.1 in
under EK < 00.
VAN DER VAART AND WELLNER
(1996), P is a Glivenko-Cantelli class
0
Remark: From the above proof, we also have that
(5.22)
if EK
< 00. Here v is defined in (3.8).
Proof of Lemma 3.3
Let a
«uPlodV <
E/(EK). Then,
E~! dv ~ E~ {~P(K k) ~! dQ(cl K~ k)}
,..,Eu
E(K
+ 1) =
£;1\
(1
+
\
Let r = 1 in Lemma 3.1. Then dQer =
,..,lrT)J E.
£;1\
dv, so
I
<
PF,i
Then it follows that
<
00.
Proof
Theorem 3.
Proof of Theorem 3.2
Define
if t E
ift E
otherwise,
Note that
and
k+1
L Ok,j (Fn (tk,j) -
Fn(tk,j-d)
j=l
k+l
L Ok,j (Fo(tk,j) -
k+l
FO(tk,j-1))
+ (B fo(to)) L
j=l
Ok,j (Hn(tk,j) - Hn(tk,j-d)
j=l
k+1
PFo(rl,tJ
+ (B fo(to)) L
Ok,j (Hn(tk,j) - Hn(tk,j-d)·
j=l
Under the separation condition B and for large n, tk,j and tk,j-1 will not be in :In at the same
time. This indicates that
Thus
(
1
)2
1
4- PFo
+y'PFo
+ O(n- 1 /.'3 ) 1In l'.t)' ,
and
LP(K
k=l
L J(Hn(tk,j)
k+l
00
k)
Hn
j=l
I:{[
. In
H~
iK
}
if
00
I:: P(K =
I::(qk,j(tO)
k=1
j=1
The above condition is satisfied if EK
The Hellinger distance becomes
2
H (PFn ,PFb)
+2
2
VPFo)2dv
1
=2j
.
n
'
+ O(n- 1/ 3 ) ; ' (PFn
~j
+ o(n- 1 ),
(PFn
2
PfrJ dv
PFo
= 1" , , ,k, k = 1" , , ,
(J:i
=p~)2
PF
1
-8 j (PF'n -P Fo)2 dv
PFb
8
Now,
< 00,
< 00 and qk,j is bounded above for all j
~- j(VPFn -
=
+
vP~
dv
PFo)2 dv
Thus,
+
).
Now, by inequality (3.13),
I En,PFo
infmax
Tn
1
~ 4
IFn(to)
1
-t
1
}
2
F o(to)l{l
~ 4 0 c f 0 ( to) (1
Fo
H (PFn' PFo)}2n
.
+ o(n- 1
4" a(to) (Oio
1
-Ocfo(to) e-
as n
4
-t 00.
The last expression is maximized by
_ {
2/3
}
c = a(to) 02 f6(to)
and the maximum value is
~ (20/3)
e- 1/ 3 (fo(to)) 1/3
a(to)
Thus,
liminfn
n-+oo
where Co
max {En,PFn
i (20/3)
1>
tJ - Co
f0
1/3
{ a(to) }
o
is a constant depending on O.
Proof of Lemma 3.4 By Theorem 2.7.5 in VAN DER VAART AND WELLNER (1996), for all
and for any probability measure Q, there exists a constant K such that
such
,K
<
<
E
>0
the brackets of
To
collection
note that
TlLF
Also,
a
-c---,-;::-
(x
+
>0
for a > 0, and under conditions Band C, there exists a T7 > 0 such that Fo(TK,j) - Fo(TK,j-1) > T7
for all j = 1,2, ... , K, K = 1,2, .... Thus, 0 < flFo,K,j ::; 1, and t:.FK~:~'h,K,j is increasing in
flFK,j ;::: O. Choose the brackets of M as
mU X)
K+1
=
2
L
6K,j ~--"-'----"''-----''-''':'-~--'---­
1,
j=l
mi(X)
2
L
Fr()
l(
)
6K',
, i TK,j
Fi TK,j-1
- 1.
j=l
,J [F[(TK,j) - F;(TK,j-d] + flFo,K,j
,
K+l
where i = 1, ... , I. Then
~P(
4 '
r -m~12 PF~1dlJ
~ J{-;:------"-~:...,---"-'-:!"--'----
~4 J
~ P(K d )
-=-:-----"--'---~---"---'---c==--'----flF.'
2
}
O,K,J
00
< LP(K=
2
fl F.u'.k dQ (t k !K =
'
,J
- I
00
< 2LP(K
k+1
00
< 2LP(K=
(EK
00.
L(E
dQChl K
}
J
2
+
j=l
k=l
if EK <
~{J
2
+ 1) < 00,
This implies that
Proof of Theorem 3.3 In order to apply Corollary 3.1, we need to use Lemma 3.4.2 in
VAART AND WELLNER (1996) to find the upper bound for E*IIGnIIMs' where
A15
{mF : h(PF,PFo)
VAN DER
< 5}.
For mF E A15,
J.
m~PFo dv =
r
'.
<
Also,
1996),
J(
PF
PFi
PF
+ PFo
0
)2
PFo dv
!PF - PFol.
P~o dv
PF + PFo PF T PFo
J
IPF
PFol dv
dTv(pj, PFb) ::; V2h(PF,PFo)
::; 1. Thus, by Lemma 3.4.2 in
allmF E
E*
(1
< J2 5.
VAN DER VAART
+ -"-'----:::----==----
WELLNER
Corollary 3.1,
<
From
we have r n
::;
.
o
Thus, the rate is
Proof of Theorem 3.4
A1. In this case, r = 2, so
We will use Lemma 3.1 to control the bracketing entropy of the class
1
l[pF > 0-1- dv.
o
. PFo
Let [pL pi] be a pair of bracket for the class P as defined in (5.20), (5.21) . Note that
>0-1
1
·PFo
Define
1
>0-1'Pf'o
Then
/
J
II( =
dv.
::; {/(F[
dQ(hiK
Thus,
=
1- dV}
lPFO
0"1-
This implies that
Now for fixed k :::: 1 and 1 ::; j ::; k,
/l[FoC t k,j
Co, c do not depend on j and k by assumption D. Hence we have
/
k
L
. Then, if EK
Take a
<
00,
<
'"
P,
10gN[
(1
1.)
- 1og'1/2.(-)
E
E
.
By Lemma 3.1,
We have shown that
It follows that
1VI + ~\18,
1J~10g1/2(~)dE::S<51/210g1/4~.
8
J[
(<5, lvl8, L 2 (P))
10gN[](E,
L 2 (P)) dE
8
<
By Lemma 3.4.2 in
VAN DER VAART AND WELLNER
<'
E*
.
1
.. ;::, <5 1/ 2 1og 1/4 <5 (1 +
II M"
(1996),
<51/210g1/41
.
1
,)
)
= <5 1 / 2 1 .1/4
yin <52
og <5
100'1/2
+ _o=--,,<5
¢(<5).
By Corollary 3.1,
Tn ::;
Thus, the rate is
10g-1/6 n .
10g-1/6 n . This completes the proof.
Proof of Theorem 3.5
Define the processes
[~(~+~
Ki
(~
o
define the processes:
=
We will use Theorem 2.11.23, see
VAN DER VAART AND WELLNER
{
{
(5.26)
=
+
+
(to
(1996), to show that the process
(to) }
+
(to) }
lto+n-1/3t (Fo(s)
Fo(to)) dGhO) (s)
to
converges in distribution to the process U(t) defined by
(5.27)
where t E R. First, note that
l/3t]
where
K
(X)
L(
.
)=1
I
A
UK,j
!1Fo K)'
'
!1K ,)+
'. 1. ") (1
!1Fo, K,j+1
),
and
00
LP(K=
k=l
o.
Then
r
2
<
}
By assumptions B, F and G, there exist C > 0, C\ > 0 such that FO(tk,j) - FO(tk,j-1) 2:: C- 1, and
Q~,j(t) ::; C1, for j
1,2, ... ,k, k 1,2, .... Also by assumption E, we have
2
.6.K,j+l
.6.Fo K+1
,) ,
K
.
)
1
L [
-toi<Bn-
)=1
dQk,j
co
2Cn 1 / 3
LP(K
k=l
. we show
all T/
> 0,
O. For all
<
E
> O.
\2
) --+ 0,
}
as n
00.
and tl > t2. We
K
( X)
'"
~ t..Fo K,j
Then
t..K,j+l )
t..Fo K,j+l
t..Fo K,j
1/3 P (K (
<
t..K,j
~\t..FOK,j
n
~1
.L
_, t..K,j+1 )
D.FOK,j+1
2
' -lf3t
,)
[to+n- I / 3tz<Tg,j"5: to+n
I
IJ
j=l
-
13
n /
t; P(K k) J ~
00
=
[k
.L
K
(
t..K,j
t..K,j+l) 2
t..Fo K,j - t..Fo K,j+1
-to"5:n-1/3tJ]
]
dv
j=l
1
k=l
t..Fb K,j+l
iK=
Construct a set
by first partitioning - B, B] as
brackets for class
-B
<
< ... <
such that
< E, for 1 :s; i :s; q and q = O(
some i E {O, 1, ... ,q} such that the bracket
I _
fn,i -
K
Lj=l
{
_
< ... < h q
0<
= B
Then for given t E
is
1,
,there exists
(
1/6 ,,",K
n
L..tJ=l
n 1/6 ,,",K
L..tJ=l
_n 1/ 6 ,,",K
(
L..tJ=l
fJ.K,;
fJ.FoK,j
(-:cri-'-'--'--
fJ.K,j+l )
fJ.Fo K,j+l
1
[to<TK,j:StO+ n -
1
/
if i
3h d'
As just shown above
P(J~,t
f~,t)2 ~nl/3 p ( {~
~K,j+1
~Fo K,j
~Fo K,j+I
It follows that
which implies that
Thus,
dE =
5
<
5,
E
p + 1,
if i :s; p.
1[to+n- 1 / 3h i <TK,j::;toj'
K,j+l
~
)
K
(1
'f;
[
!
L3p
n '
(K
2.: [( Li K' ) 2
,J
LiFo K,j
j=l
!
T
(Li
LiFo
T/
_---'H~_
fP(K k)jt (LiFo1
j=l
t,
j=l
P(l(
k)
{t
K,j+1
+
K,j
1
LiFo K,j+1
Fb(:ktl
IK = k)
1,vlK = k)
=
+
k-1
+2.:
j=l
{
dv
Similarly, if s < t
t
E
)If s x t
< 0, dearly
\
) -7
,
-t a (to).
= O. Then
P(j(O)
j(O)) _ P(/O))P(j(O))
= O.
n,t n,s
n,t
n,s
Thus, by Theorem 2.11.23, see
VAN DER VAART AND WELLNER
(1996),
in FN[-B, B],
where Zoo [- B, BJ is set of all uniformly bounded real functions on [- B, BJ.
By assumptions of B, F, G and H, the expectation of the second term in (5.26) is
dQ(t-klIK
t2.
in
}
Drc)vi(ied that
Op(l).
To
we define the following two sto,ch,'tstic processes:
that
(h)
P)
[t (D.~Kil]
j=1
OK t
.>
-
,]
-=-::.~:......
+ rto+\Fo(s) _ Fo(to))dG~O)
Jto
and
M(h)
E [l:O+h(Fo(S) -
Fo(to))dG~O)(s)].
It follows that to prove (5.28) is equivalently to prove
n 1/ 3 hn
= n 1/ 3 argminn Mn (h)
Op(1).
Note that ho = 0 is the only location of the minima of M(h). As shown before, we have, using
conditions F, G and H,
M(h)
M(h o)
r rto+h
E lito
(Fo(s)
l
to +h
to
~
as h -+ O. The difference between the process
(h)
Fo(to) )dG~O) (.s)
]
(Fo(s) - Fo(to))a(s)d.s.
(to)a(to)h 2
o(h 2 ),
(h) and the process
is
r+
o h
d
Ito
[t (
+n
j=1
CiKi,J 2
(CiFOKi,j)
+ --=~~
(CiFoKi,j+l)
. (1[TKi,j9o+hj
Combining this with the fact that
vnl
(h) -
[t
CiFo K
)=1
Gn
[t((
j=1
(h o)
l[ TK i,j
(Fo(TKi,j) - Fo(tO))]
M(h o ) = 0 yields
(h o)1
' to)
CiKi,j 2
CiFOKi,j)
--'-"-'~
+
-,-----"=------:;:
(CiFoKi ,j+1
Define a class :F~) as
)
:
::;D~.
J
The envelope function of the class :F~) can be chosen as
-=~L:!:.!:_I
CiRo'K't,J'-'-1
1
I
Following the same calic:ul:a,tlC)ll of E
E
2
. we
r
!{t
=
I:P(K
k=l
K
2
. I:1[
} NOd"
It{
1
K
} dQ(!kI K
'I:1[
j=l
00
< 2C I : P(K
k) k
k=l
=
2cf
J
k)
k
I : 1[
j=l
dQk,j
[t rt°+ Q~,j(s)
D
P(K = k) k
k=l
dS]
j=l Jto-D
< 4C C 1 D (EK 2 ) < 00,
by conditions B, F and G. Similarly, following the same argument that gave
we obtain
It follows that
dE = 0(1).
Thus we
using Theorem 2.14.2 in
VAN DER VAART AND WELLNER
<
~
a
{
as
o
(1996),
Let
be
envelope function of the
, 2
Following the same calculation of
E[FJ})]
- Fa
(Fo
--~-+--~-'---
< 00 and arguments that yield
we obtain,
E [H V(oJ] 2 <
00,
by conditions of B, F, G, and
N[]
(EI1F1~~llp,2,F~,~,L2(P)) 0 (E12 ) .
It follows from Theoren1 2.14.2 in
VAN DER VAART AND WELLNER
(1996) again, we have
All these show that the function <Pn(S) in the Theorem 3.2.5 of VAN DER VAART AND WELLNER
n 1/ 3 in the Theorem 3.2.5 of VAN DER VAART
(1996) can be taken as <Pn(S) = 15 1/ 2 that yields Tn
AND WELLNER (1996). Hence (5.28) is proved.
By using Exercise 3.2.5 of
motion,
VAN DER VAART AND WELLNER
(1996) and symmetry of Brownian
d
(2f6(t:)a(to) )
d
( 2f6(t:)a(toJ
Thus, following the same argument as in
satisfies
GROENEBOOM AND WELLNER
(1992), the
>
>
. 2 argma:x. f
>
estimator
is
·2 argma:Kf
o
This completes the proof.
Acknowledgments
This paper is part of author's Ph.D. dissertation under supervision of Professor Jon A. Wellner.
Part of the writing and computation were carried out while the anthor was employed by the
Boeing Company. This research was partially supported by National Science Foundation grant
DMS-9971951.
References
Aragon, J. and D. Eberly (1992). On convergence of convex minorant algorithm for distribution
estimation with interval-censored data. J. of Computational and Graphical Statistics 1, 129~
140.
Becker, N. G. and M. Melbye (1991). Use of log-linear model to compute the empirical survival
curve from interval-censored failure time data, with application to data on tests for HIV
positivity. Austral. J. Statistics 33, 125~133.
Finkelstein, D. M. (1986). A proportional hazards model for interval-censored failure time data.
Biometrics 42, 845-854.
Finkelstein, D. M. and R. A. Wolfe (1985). A semiparametric model for regression analysis of
intcrval=censored failure time data. Biometrics 41, 993-945.
Groeneboom, P. (1991). Nonparametric maximum likelihood estimators for interval censoring
and deconvolution. Technical Report 378, Department of Statistics, Stanford University.
Groeneboom, P. (1996). Lectures on inverse problems. In P. Bernard (Ed.), Leet-ures on Probability Theory and Statistics, Volume 1648
Lectures Notes ,in lv[athematics, pp. 67-136. Berlin:
Springer-Verlag.
Groeneboom,
and J.
Iv[aximum
Jongbloed, G. (1998).
convex minorant algorithm for nonparametric estimation.
Journal of Computation and Graphical Statistics 1(3), 31O~321.
Schick,
and Q. Yu (2000). Consistency of the GMLE with mixed case interval censored data.
Scand. 1. Stat'ist. 21, 4555.
Self, S. G. and E. A. Grossman (1986). Linear rank tests for interval-censored data with application to PCB levels in adipose tissue of transformer repair workers. Biometr·ics 42, 521-530.
Song, S. (2001). Estimation with bivariate inter'val censored data. Ph. D. thesis, University of
Washington.
Turnbull, B. W. (1976). The empirical distribution function with arbitrary grouped, censored
and truncated data. J. Roy. soc. Ser. B 38, 290-295.
Van der Vaart, A. W. and J. A. Wellner (1996). Weak Converyence and Empirical processes with
Application to Statistics. New York: Springer.
Van derVaart, A. W. and J. A. Wellner (2000). Preservation theorems for Glivenko-Cantelli and
uniform Glivenko-Cantelli classes. In E. Gine, D. M. Mason, and J. A. Wellner (Eds.), High
Dimensional Probability II, pp. 115--133. Birkhiiuser, Basel.
Wellner, J. A. (1995). Interval censoring case 2: alternative hypotheses. In H. L. Koul and J. V.
Deshpande (Eds.), Analysis of Censored Data, Volume 21 of 1MS Lecture Notes - lvfonograph
Series, Hayward, pp. 271-291. IMS.
Wellner, J. A. and Y. Zhan (1997). A hybrid algorithm for computation of the nonparametric
maximum likelihood estimator from censored data. J. Amer. Statist. Assoc. 92, 272~29L
Wellner, J. A. and Y. Zhang (2000). Two estimators of the mean of a counting processes with
panel count data. Ann. Stat·ist. 28, 779--814.
Mathematics & Computing Technology, The Boeing Company, P.O. Box 3707, MS 7L-22, Seattle,
Washington 98124-2207, U. S. A.
E-mail: [email protected]