Truong, Young K. and Stone, Charles J. (1996). "Asymptotics for Hazard Regression."

ASYMPTOTICS FOR HAZARD REGRESSION
by
Young K. Truong and Charles J. Stone
Department of Biostatistics
University of North Carolina
Institute of Statistics
Mimeo Series No. 2165
August 1996
Asymptotics for Hazard Regression
Young K. Truong*
The University of North Carolina at Chapel Hill
Charles J. Stone†
University of California at Berkeley
August 6, 1996
Abstract
In hazard regression (HARE), the logarithm of the conditional hazard function of a survival time given a covariate is modeled by a sum of polynomial splines and their tensor products. Under appropriate conditions, it has been shown that the (nonadaptive) HARE estimate of the conditional log-hazard function possesses an optimal $L_2$ rate of convergence. The current paper considers the $L_\infty$ rates of convergence and the distributional properties of HARE estimates of the conditional hazard, cumulative hazard, survival and density functions. In particular, it will be shown that these estimates are asymptotically normal.
"This research was supported in part by National Science Foundation Grant DMS9403800 and National Institutes of Health Grant CA61937.
tThis research' was supported in part by National Science Foundation Grant DMS9504463.
AMS 1991 subject classifications. Primary 62G07j secondary 62G20.
Key words and phrases. Asymptotic normal distribution; Density function; Hazard functionj Maximum likelihood; Tensor product of polynomial splines; Survival function.
1 Introduction
Let $T$ and $C$ be nonnegative random variables having a joint distribution that depends on an $M$-dimensional vector $x = (x_1, \ldots, x_M)$ of covariates ($M = 0$ when there are no covariates). In survival analysis, $T$ and $C$ are referred to as the survival time (or failure time) and censoring time, respectively. Set $Y = \min(T, C)$ and $\delta = \mathrm{ind}(T \le C)$. The indicator random variable $\delta$ equals 1 if failure occurs on or before the censoring time ($T \le C$) and it equals 0 otherwise. The observable time $Y$ is said to be uncensored or censored according as $\delta = 1$ or $\delta = 0$. For identifiability, it is assumed that $T$ and $C$ are independent.
Let $f(t|x)$ and $F(t|x)$ denote the density function and distribution function, respectively, of $T$. The survival, hazard, cumulative hazard and log-hazard functions are defined by
$$S(t|x) = 1 - F(t|x), \qquad \lambda(t|x) = f(t|x)/S(t|x),$$
$$H(t|x) = \int_0^t \lambda(u|x)\,du \qquad\text{and}\qquad \phi(t|x) = \log\lambda(t|x), \qquad t \ge 0.$$
Let $F_C(t|x)$ denote the distribution function of $C$, which depends on $x$, and set $S_C(t|x) = 1 - F_C(t|x)$.
In the HARE methodology for survival data analysis [see Kooperberg et al. (1995a)], the logarithm of the hazard function of a survival time is approximated by a function $\phi^*$ having the form of a specified sum of functions of at most $d$ of the variables $t, x_1, \ldots, x_M$. Subject to this form, the approximation is chosen to maximize the expected log-likelihood. Maximum likelihood and sums of tensor products of polynomial splines are then used to construct an estimate $\hat\phi$ of this approximation based on a random sample. The corresponding maximum likelihood estimates $\hat\lambda$, $\hat H$, $\hat S$ and $\hat f$ of approximations of the hazard, cumulative hazard, survival and density functions can be easily derived using the relationships $\hat\lambda(t|x) = \exp\hat\phi(t|x)$, $\hat H(t|x) = \int_0^t \hat\lambda(u|x)\,du$, $\hat S(t|x) = \exp(-\hat H(t|x))$ and $\hat f(t|x) = \hat\lambda(t|x)\hat S(t|x)$.
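These relationships are straightforward to evaluate numerically. The following sketch (in Python, with a hypothetical user-supplied callable log_hazard standing in for a fitted $\hat\phi$; it illustrates the displayed identities and is not the HARE software) converts a conditional log-hazard into the corresponding hazard, cumulative hazard, survival and density functions by exponentiation and quadrature.

```python
import numpy as np

def hazard_quantities(log_hazard, x, grid=None):
    """Convert a fitted conditional log-hazard t -> log_hazard(t, x) into the
    hazard, cumulative hazard, survival and density functions on [0, 1], via
    lambda = exp(phi), H(t) = int_0^t lambda, S = exp(-H), f = lambda * S.
    `log_hazard` is a hypothetical stand-in for a fitted phi-hat."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 201)
    lam = np.exp([log_hazard(t, x) for t in grid])                  # hazard
    # cumulative hazard by the trapezoidal rule
    H = np.concatenate(([0.0],
                        np.cumsum(np.diff(grid) * (lam[1:] + lam[:-1]) / 2.0)))
    S = np.exp(-H)                                                  # survival
    f = lam * S                                                     # density
    return grid, lam, H, S, f

# toy illustration with a Weibull-type log-hazard phi(t|x) = log(2t) + x
grid, lam, H, S, f = hazard_quantities(
    lambda t, x: np.log(2.0 * max(t, 1e-6)) + x, x=0.0)
```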
Under appropriate conditions, it is shown in Kooperberg et al. (1995b) that $\hat\phi$ possesses an optimal $L_2$ rate of convergence that depends only on $d$ and a suitably defined smoothness parameter.
A useful feature of the HARE methodology for survival data analysis is that the space $G$ of approximations is chosen adaptively [see Kooperberg et al. (1995a, 1996) and Intrator and Kooperberg (1995)]. Based on our experience in analyzing survival data, we have found the approach to be quite promising. The current paper lends further theoretical support to these methodologies by establishing $L_\infty$ rates of convergence and asymptotic normality for nonadaptive estimates of the hazard, density, survival and cumulative hazard functions.
A log-hazard model is said to be saturated if $d = M + 1$ and unsaturated if $d < M + 1$. In establishing the $L_\infty$ rates of convergence and the asymptotic normality of $\hat\phi$, $\hat\lambda$, $\hat H$, $\hat S$ and $\hat f$ for log-hazard models in this paper, we restrict attention to the saturated models. Conceivably, these results also hold for unsaturated models, but extending the $L_\infty$ results in Stone (1989, 1991) to additive and other unsaturated models remains an open problem.
The rest of the paper is organized as follows. Section 2 describes the main results of the paper. Specifically, errors in approximating the log-hazard, hazard, cumulative hazard, survival and density functions are given in Theorem 1. Errors in estimating these approximations are described in Theorem 2. These results indicate that the (nonadaptive) HARE estimates achieve the usual optimal $L_2$ and $L_\infty$ rates of convergence. Asymptotic normality of these estimates is described in Theorem 3. The proofs of Theorems 1-3 are given in Sections 3-5, respectively. Additional details for the proof of Theorem 1 are given in Section 6.
2 Statements of results
This section describes the main results of the paper. The description involves log-likelihood and expected log-likelihood for censored survival data,
polynomial splines and their tensor products, errors of approximation based
on the linear space G of tensor products of polynomial splines, maximum
likelihood estimates, optimal rates of convergence, asymptotic normality,
asymptotic variances, and standard errors. We start with the log-likelihood
and expected log-likelihood functions.
2.1 Expected log-likelihood function
The log-likelihood based on $(Y, \delta, x)$ is given by
$$\log\{[f(Y|x)]^\delta [S(Y|x)]^{1-\delta}\} = \delta\log\lambda(Y|x) + \log S(Y|x) = \delta\log\lambda(Y|x) - \int \mathrm{ind}(Y \ge u)\,\lambda(u|x)\,du$$
[see Kooperberg et al. (1995a)]. Observe that
$$E\Big(\int \mathrm{ind}(Y \ge u)\,\lambda(u|x)\,du\Big) = \int \lambda(u|x)\,S_C(u|x)\,S(u|x)\,du.$$
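The displayed identity is just Fubini's theorem combined with the assumed independence of $T$ and $C$; for continuous distributions, the verification (not spelled out in the original) runs as follows.

```latex
E\Big(\int \mathrm{ind}(Y \ge u)\,\lambda(u|x)\,du\Big)
  = \int P(Y \ge u \mid x)\,\lambda(u|x)\,du
  = \int P(T \ge u \mid x)\,P(C \ge u \mid x)\,\lambda(u|x)\,du
  = \int \lambda(u|x)\,S_C(u|x)\,S(u|x)\,du.
```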
Thus the expected log-likelihood is given by
$$E\Big(\delta\log\lambda(Y|x) - \int \mathrm{ind}(Y \ge u)\,\lambda(u|x)\,du\Big) = \int S_C(t|x)\big[\log\lambda(t|x)\,f(t|x) - S(t|x)\,\lambda(t|x)\big]\,dt.$$
Let $x_i \in \mathbb{R}^M$ denote the vector of covariates for the $i$th individual, $1 \le i \le n$. Let $(T_1, C_1), \ldots, (T_n, C_n)$ be independent random vectors such that $T_i$ and $C_i$ are independent random variables having distribution functions $F(\cdot|x_i)$ and $F_C(\cdot|x_i)$, respectively. Set $Y_i = \min(T_i, C_i)$ and $\delta_i = \mathrm{ind}(T_i \le C_i)$ for $1 \le i \le n$. Let $G$ denote a linear space of functions on $\mathcal{T} \times \mathcal{X}$. The expected log-likelihood function $\Lambda(\cdot)$ is defined by
$$\Lambda(g) = \sum_i \int S_C(t|x_i)\big[g(t|x_i)f(t|x_i) - S(t|x_i)\exp g(t|x_i)\big]\,dt, \qquad g \in G.$$
Observe that $\Lambda(\cdot)$ is maximized at $\phi(t|x) = \log[f(t|x)/S(t|x)]$, which may or may not be in $G$. We define the best approximation to $\phi$ as a function in $G$ that maximizes $\Lambda(\cdot)$ over $G$.
The first goal is to prove that $\Lambda(\cdot)$ has a maximum in $G$. Suppose the vectors $x_1, \ldots, x_n$ of covariates take values in a compact interval $\mathcal{X} \subset \mathbb{R}^M$. Let $\mathcal{T}$ denote a compact interval of the form $[0, \tau]$ for some positive number $\tau$. Without loss of generality, we assume that $\mathcal{T} = [0,1]$ and $\mathcal{X} = [0,1]^M$. The following conditions will be required to prove the existence of the best approximation in $G$.
Condition 1 The density function $f(\cdot|\cdot)$ is bounded away from zero and infinity on $\mathcal{T} \times \mathcal{X}$. Moreover, the survival function $S(\cdot|\cdot)$ is bounded away from zero on $\mathcal{T} \times \mathcal{X}$.
This condition implies that $S(1|x) = P(T > 1 \mid x) > 0$ on $\mathcal{X}$ and that $|\phi(\cdot|\cdot)|$ is bounded away from infinity on $\mathcal{T} \times \mathcal{X}$.

Condition 2 $P(C \in \mathcal{T} \mid x) = 1$ for $x \in \mathcal{X}$ and $P(C = 1 \mid x)$ is bounded away from zero on $\mathcal{X}$.
This condition implies that the censoring distribution $S_C(t|x)$ is bounded away from zero on $\mathcal{T} \times \mathcal{X}$. According to this condition, censoring automatically occurs at time 1 if failure or censoring does not occur before this time.
2.2 Polynomial splines and their tensor products
The best approximation will be chosen from the linear space $G$ of tensor products of polynomial splines. Specifically, let $K = K_n$ be a positive integer and let $I_k$, $1 \le k \le K$, denote the subintervals of $[0,1]$ defined by $I_k = [(k-1)/K, k/K)$ for $1 \le k < K$ and $I_K = [1 - 1/K, 1]$. Let $m$ and $q$ be fixed integers such that $m \ge 0$ and $m > q \ge -1$. Let $S$ denote the space of functions $s$ on $[0,1]$ such that
(i) the restriction of $s$ to $I_k$ is a polynomial of degree $m$ (or less) for $1 \le k \le K$; and, if $q \ge 0$, then
(ii) $s$ is $q$-times continuously differentiable on $[0,1]$.
A function satisfying (i) is called a piecewise polynomial, and it is called a spline if it satisfies both (i) and (ii). Let $B_j$, $1 \le j \le J$, denote the usual basis of $S$ consisting of B-splines [see de Boor (1978)]. Then $J = (m+1)K - (q+1)(K-1)$, so $K + m \le J \le (m+1)K$. Also, $B_j \ge 0$ on $[0,1]$, $B_j = 0$ on the complement of an interval of length $(m+1)/K$ for $1 \le j \le J$, and $\sum_j B_j = 1$ on $[0,1]$. Moreover, for $1 \le j \le J$, there are at most $2m+1$ values of $j' \in \{1, \ldots, J\}$ such that $B_j B_{j'}$ is not identically zero on $[0,1]$. Set $\beta = (\beta_1, \ldots, \beta_J) \in \mathbb{R}^J$ and let $|\beta| = (\sum_j \beta_j^2)^{1/2}$ denote the Euclidean norm of $\beta$. According to Theorem 4.2 of DeVore and Lorentz (1993), there is a positive constant $M_0$ such that
$$\text{(2.1)} \qquad M_0^{-1}J^{-1}|\beta|^2 \le \int_0^1\Big|\sum_j \beta_j B_j\Big|^2 \le M_0 J^{-1}|\beta|^2, \qquad \beta \in \mathbb{R}^J.$$
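For readers who want to experiment with this basis, the following Python sketch constructs the $J = (m+1)K - (q+1)(K-1)$ B-splines just described via the Cox-de Boor recursion (interior knots of multiplicity $m - q$, boundary knots of multiplicity $m + 1$) and checks the partition-of-unity property $\sum_j B_j = 1$ numerically; it is a minimal illustration under these assumptions, not the basis routine used in HARE.

```python
import numpy as np

def bspline_basis(t, K, m, q):
    """Evaluate at t the J = (m+1)K - (q+1)(K-1) B-splines of degree m that
    are q-times continuously differentiable across the K equal subintervals
    of [0, 1] (Cox-de Boor recursion).  Interior knots 1/K, ..., (K-1)/K get
    multiplicity m - q; the endpoints 0 and 1 get multiplicity m + 1."""
    interior = np.repeat(np.arange(1, K) / K, m - q)
    knots = np.concatenate((np.zeros(m + 1), interior, np.ones(m + 1)))
    J = len(knots) - m - 1                     # equals (m+1)K - (q+1)(K-1)
    # degree-0 splines: indicators of [knots[j], knots[j+1]), right-closed at 1
    B = np.array([
        1.0 if (knots[j] <= t < knots[j + 1])
               or (t == 1.0 and knots[j] < knots[j + 1] == 1.0)
        else 0.0
        for j in range(len(knots) - 1)
    ])
    for d in range(1, m + 1):                  # raise the degree step by step
        Bnew = np.zeros(len(knots) - d - 1)
        for j in range(len(Bnew)):
            left = right = 0.0
            if knots[j + d] > knots[j]:
                left = (t - knots[j]) / (knots[j + d] - knots[j]) * B[j]
            if knots[j + d + 1] > knots[j + 1]:
                right = (knots[j + d + 1] - t) / (knots[j + d + 1] - knots[j + 1]) * B[j + 1]
            Bnew[j] = left + right
        B = Bnew
    assert len(B) == J
    return B

# nonnegativity and partition of unity: B_j >= 0 and sum_j B_j(t) = 1 on [0, 1]
for t in np.linspace(0.0, 1.0, 11):
    basis = bspline_basis(t, K=5, m=3, q=2)
    assert basis.min() >= 0.0 and abs(basis.sum() - 1.0) < 1e-9
```

A tensor-product basis function $B_j(t|x)$ as defined below is then just a product of such one-dimensional evaluations in $t, x_1, \ldots, x_M$.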
Set $d = M + 1$. Let $A$ denote the collection of ordered $d$-tuples $j = (j_0, j_1, \ldots, j_M)$ with $j_0, j_1, \ldots, j_M \in \{1, \ldots, J\}$. Let $G$ denote the tensor product space spanned by the basis functions on $\mathcal{T} \times \mathcal{X}$ of the form
$$B_j(t|x) = B_{j_0}(t)\prod_{1 \le v \le M} B_{j_v}(x_v),$$
as $j$ ranges over $A$. Then $G$ has dimension $I := J^d$. Set $B = B(t|x) = (B_j(t|x))_{j \in A}$ and $\theta = (\theta_j)_{j \in A} \in \Theta := \mathbb{R}^I$. Then each $g \in G$ can be written as $g(t|x;\theta) = \theta \cdot B(t|x) := \theta^T B(t|x)$. (Here $A^T$ denotes the transpose of $A$.)
Let $H(x)$ denote the vector in $\mathbb{R}^{J^M}$ with entries $H_{j_1,\ldots,j_M}(x) = B_{j_1}(x_1)\cdots B_{j_M}(x_M)$, $j_1, \ldots, j_M \in \{1, \ldots, J\}$. Also, set
$$\beta \cdot H(x) = \sum_{j_1=1}^J \cdots \sum_{j_M=1}^J \beta_{j_1,\ldots,j_M} B_{j_1}(x_1)\cdots B_{j_M}(x_M),$$
where $\beta = (\beta_{j_1,\ldots,j_M}) \in \mathbb{R}^{J^M}$. The following condition on the design points $x_1, \ldots, x_n$ is required for the existence of the best spline approximation.
Condition 3 There is a positive constant $M_1$ such that
$$\text{(2.2)} \qquad M_1^{-1}n\int(\beta \cdot H)^2 \le \sum_i\big(\beta \cdot H(x_i)\big)^2 \le M_1 n\int(\beta \cdot H)^2, \qquad \beta \in \mathbb{R}^{J^M},$$
and
$$\text{(2.3)} \qquad\qquad j_1, \ldots, j_M \in \{1, \ldots, J\}.$$

REMARK. It follows easily from Lemma 3.4 of Stone (1994) that if Condition 5 below holds and if $x_1, \ldots, x_n$ are replaced by independent random variables with a common density function that is bounded away from zero and infinity on $\mathcal{X}$ [as in Kooperberg et al. (1995b)], then (2.2) and (2.3) hold for sufficiently large $n$, except on an event whose probability tends to zero with $n$.
2.3 Spline approximation
Under Conditions 1-3, there exists an essentially uniquely determined function $\phi^* \in G$ such that $\Lambda(\phi^*) = \max_{g \in G}\Lambda(g)$. Moreover, if $\phi \in G$, then $\phi^* = \phi$ almost everywhere [see Kooperberg et al. (1995b)]. Let $\theta^*$ denote the vector of parameters that is associated with $\phi^*$, so that $\phi^*(t|x) = \theta^* \cdot B(t|x)$, and set $\lambda^*(t|x) = \exp\phi^*(t|x)$. Also, set $H^*(t|x) = \int_0^t \lambda^*(u|x)\,du$, $S^*(t|x) = \exp(-H^*(t|x))$ and $f^*(t|x) = \lambda^*(t|x)\,S^*(t|x)$. These functions are referred to as spline approximations.
The errors resulting from spline approximation will be quantified in terms of a smoothness condition that will now be described. Let $0 < \beta \le 1$. A function $g$ on $\mathcal{T} \times \mathcal{X}$ is said to satisfy a Hölder condition with exponent $\beta$ if there is a positive number $\gamma$ such that $|g(z) - g(z_0)| \le \gamma|z - z_0|^\beta$ for $z, z_0 \in \mathcal{T} \times \mathcal{X}$. Let $m$ be a nonnegative integer and set $p = m + \beta$. A function $g$ on $\mathcal{T} \times \mathcal{X}$ is said to be $p$-smooth if it is $m$ times continuously differentiable on $\mathcal{T} \times \mathcal{X}$ and $g^{(m)}$ satisfies a Hölder condition with exponent $\beta$. The following smoothness condition will be used to describe the errors resulting from spline approximation.
Condition 4 $\phi$ is a $p$-smooth function with $p > d/2$.
We do not assume that $\phi$ is exactly equal to a spline, but we still can make use of spline approximation. In order for this method to be accurate, we need the error of approximation to tend to zero as the sample size $n$ tends to infinity; for this, it is necessary that the dimension $I$ of the approximation space $G$ tend to infinity. To control the error of estimation we need this dimension to increase more slowly than $n^{1/2}$.
Condition 5 $I = I_n \to \infty$ and $I^2 = o(n^{1-\epsilon})$ for some $\epsilon > 0$.

For a real-valued function $h$ on $\mathcal{T} \times \mathcal{X}$, set $\|h\|_2 = [\int_{\mathcal{T} \times \mathcal{X}} |h(t|x)|^2]^{1/2}$ and $\|h\|_\infty = \sup_{\mathcal{T} \times \mathcal{X}} |h(\cdot|\cdot)|$. Also, set $\rho = \inf_{g \in G}\|g - \phi\|_\infty$. Under the first part of Condition 5, $\rho = o(1)$ [see (2) on page 167 of de Boor (1978)]. Under Conditions 4 and 5, $\rho = O(J^{-p})$ [see Theorem XII.1 of de Boor (1978)]. Our first result gives the error bounds for the spline approximations.
Theorem 1 Under Conditions 1-5,
$$\begin{aligned}
\text{(2.4)} &\qquad \|\phi^* - \phi\|_\infty = O(\rho),\\
\text{(2.5)} &\qquad \|\lambda^* - \lambda\|_\infty = O(\rho),\\
\text{(2.6)} &\qquad \|H^* - H\|_\infty = O(\rho),\\
\text{(2.7)} &\qquad \|S^* - S\|_\infty = O(\rho),\\
\text{(2.8)} &\qquad \|f^* - f\|_\infty = O(\rho).
\end{aligned}$$
The proof of Theorem 1 is given in Section 3.
2.4 Maximum likelihood estimation
The likelihood corresponding to the survival data $(Y_1, \delta_1, x_1), \ldots, (Y_n, \delta_n, x_n)$ equals
$$\prod_i\{[\lambda(Y_i|x_i)]^{\delta_i} S(Y_i|x_i)\},$$
and the log-likelihood is given by
$$\ell(g) = \sum_i \delta_i g(Y_i|x_i;\theta) - \sum_i \int_0^{Y_i}\exp g(u|x_i;\theta)\,du, \qquad g = g(\cdot|\cdot;\theta) \in G.$$
Under Conditions 1-5, the log-likelihood function $\ell$ is strictly concave and hence there exists a unique maximum likelihood estimate $\hat g = \hat\theta \cdot B \in G$ [see Kooperberg et al. (1995b)], so that $\ell(\hat\theta) = \max_\Theta \ell(\theta)$. The maximum likelihood estimates of the log-hazard, hazard, cumulative hazard, survival and density functions are given, respectively, by $\hat\phi(t|x) = \hat\theta \cdot B(t|x)$, $\hat\lambda(t|x) = \exp\hat\phi(t|x)$, $\hat H(t|x) = \int_0^t \hat\lambda(u|x)\,du$, $\hat S(t|x) = \exp(-\hat H(t|x))$ and $\hat f(t|x) = \hat\lambda(t|x)\hat S(t|x)$ for $0 \le t \le 1$. These estimates are referred to as HARE estimates. The next result bounds the $L_2$ and $L_\infty$ norms of the error of the various HARE estimates.
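To make the estimation step concrete, here is a minimal Newton-Raphson sketch for the covariate-free case $M = 0$, with the integrals $\int_0^{Y_i}\exp g(u;\theta)\,du$ approximated on a fixed trapezoidal grid. It reuses the hypothetical bspline_basis helper sketched in Section 2.2 and only illustrates the strictly concave likelihood maximization; it is not the HARE implementation.

```python
import numpy as np

def hare_mle(Y, delta, K=4, m=3, q=2, n_grid=200, n_iter=25):
    """Newton-Raphson maximization of the censored-data log-likelihood
    l(theta) = sum_i [delta_i theta.B(Y_i) - int_0^{Y_i} exp(theta.B(u)) du]
    for M = 0, using the bspline_basis sketch above."""
    Y = np.asarray(Y, dtype=float)
    delta = np.asarray(delta, dtype=float)
    grid = np.linspace(0.0, 1.0, n_grid + 1)
    w = np.full(n_grid + 1, 1.0 / n_grid)
    w[0] = w[-1] = 0.5 / n_grid                       # trapezoidal weights
    Bg = np.array([bspline_basis(t, K, m, q) for t in grid])   # grid basis
    BY = np.array([bspline_basis(y, K, m, q) for y in Y])      # basis at Y_i
    # mask[i, k]: quadrature weight of grid point k in the integral over [0, Y_i]
    mask = (grid[None, :] <= Y[:, None]) * w[None, :]
    theta = np.zeros(Bg.shape[1])
    for _ in range(n_iter):
        lam_g = np.exp(Bg @ theta)                    # exp g(u; theta) on grid
        wts = mask.sum(axis=0) * lam_g                # aggregated weights
        score = BY.T @ delta - Bg.T @ wts             # gradient of l
        hess = -Bg.T @ (wts[:, None] * Bg)            # Hessian of l (neg. def.)
        step = np.linalg.solve(hess, -score)          # Newton step
        theta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return theta                                      # phi-hat(t) = theta.B(t)

# toy data satisfying Condition 2: censoring no later than time 1, P(C = 1) > 0
rng = np.random.default_rng(0)
T = rng.exponential(1.0, size=500)
C = np.minimum(rng.uniform(0.5, 1.5, size=500), 1.0)
Y, delta = np.minimum(T, C), (T <= C).astype(float)
theta_hat = hare_mle(Y, delta)
```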
Theorem 2 Under Conditions 1-5,
$$\begin{aligned}
\text{(2.9)} &\qquad |\hat\theta - \theta^*| = O_P(I/\sqrt{n}),\\
\text{(2.10)} &\qquad \|\hat\phi - \phi^*\|_2 = O_P\big(\sqrt{n^{-1}I}\big),\\
\text{(2.11)} &\qquad \|\hat\lambda - \lambda^*\|_2 = O_P\big(\sqrt{n^{-1}I}\big),\\
\text{(2.12)} &\qquad \max_{j \in A}|\hat\theta_j - \theta^*_j|^2 = O_P(n^{-1}I\log I),\\
\text{(2.13)} &\qquad \max_{t \in \mathcal{T},\,x \in \mathcal{X}}|\hat\phi(t|x) - \phi^*(t|x)| = O_P\big(\sqrt{n^{-1}I\log I}\big),\\
\text{(2.14)} &\qquad \max_{t \in \mathcal{T},\,x \in \mathcal{X}}|\hat\lambda(t|x) - \lambda^*(t|x)| = O_P\big(\sqrt{n^{-1}I\log I}\big),\\
\text{(2.15)} &\qquad \max_{t \in \mathcal{T},\,x \in \mathcal{X}}|\hat H(t|x) - H^*(t|x)| = O_P\big(\sqrt{n^{-1}J^M}\big),\\
\text{(2.16)} &\qquad \max_{t \in \mathcal{T},\,x \in \mathcal{X}}|\hat S(t|x) - S^*(t|x)| = O_P\big(\sqrt{n^{-1}J^M}\big),\\
\text{(2.17)} &\qquad \|\hat f - f^*\|_2 = O_P\big(\sqrt{n^{-1}I}\big),\\
\text{(2.18)} &\qquad \max_{t \in \mathcal{T},\,x \in \mathcal{X}}|\hat f(t|x) - f^*(t|x)| = O_P\big(\sqrt{n^{-1}I\log I}\big).
\end{aligned}$$
The proof of the above theorem is given in Section 4. Let $J \asymp n^{1/(2p+d)}$ in Theorems 1 and 2. (Here $a_n \asymp b_n$ means that $a_n/b_n$ is bounded away from zero and infinity.) Then $\rho = O(J^{-p})$. We conclude from these theorems that
$$\|\hat\phi - \phi\|_2 = O_P(n^{-p/(2p+d)}), \qquad \|\hat\phi - \phi\|_\infty = O_P\big(\{(\log n)/n\}^{p/(2p+d)}\big);$$
$$\|\hat\lambda - \lambda\|_2 = O_P(n^{-p/(2p+d)}), \qquad \|\hat\lambda - \lambda\|_\infty = O_P\big(\{(\log n)/n\}^{p/(2p+d)}\big);$$
$$\|\hat f - f\|_2 = O_P(n^{-p/(2p+d)}), \qquad \|\hat f - f\|_\infty = O_P\big(\{(\log n)/n\}^{p/(2p+d)}\big).$$
Under suitable conditions, these are the well-known optimal rates of convergence for nonparametric function estimation; see Stone (1980, 1982, 1983).
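For the record, the exponent arithmetic behind these conclusions is immediate from $I = J^d$ (a verification of the stated rates, not an addition to the theory):

```latex
J \asymp n^{1/(2p+d)} \ \Longrightarrow\
I = J^{d} \asymp n^{d/(2p+d)}, \qquad
\sqrt{n^{-1} I} \asymp n^{(d/(2p+d)-1)/2} = n^{-p/(2p+d)}, \qquad
\rho = O(J^{-p}) = O\!\big(n^{-p/(2p+d)}\big),
```

so the approximation error of Theorem 1 and the $L_2$ estimation errors of Theorem 2 balance; replacing $n$ by $n/\log n$ in the choice of $J$ balances $\sqrt{n^{-1}I\log I}$ against $\rho$ in the same way for the $L_\infty$ rates.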
It follows from (2.6), (2.7), (2.15) and (2.16) that if $J \asymp n^{1/(2p+M)}$ with $p \ge (d+1)/2$, then $\max_t|\hat H(t|x) - H(t|x)| = O_P(n^{-p/(2p+M)})$ and $\max_t|\hat S(t|x) - S(t|x)| = O_P(n^{-p/(2p+M)})$. In particular, these rates are $n^{-1/2}$ in the absence of covariates ($M = 0$). These are well-known optimal rates for estimation of the cumulative hazard and survival functions in both parametric and nonparametric settings [see Breslow and Crowley (1974) and Andersen et al. (1993)].
2.5 Asymptotic distributions of HARE estimates

Let $I = I(\theta)$ denote the $I \times I$ information matrix, which has entries
$$I_{jk}(\theta) = \sum_i \int B_j(u|x_i)\,B_k(u|x_i)\,S(u|x_i)\,S_C(u|x_i)\exp g(u|x_i;\theta)\,du$$
as $j$ and $k$ range over $A$. Let $\omega$ denote a real-valued parameter depending on $\phi^*$, so that $\omega = \tau(\theta^*)$ for some function $\tau(\theta)$, $\theta \in \Theta$. The maximum likelihood estimate of $\omega$ is given by $\hat\omega = \tau(\hat\theta)$. Suppose $\tau$ is continuously differentiable on $\Theta$. Let $\nabla\tau(\theta)$ denote the gradient of $\tau$ at $\theta$, which is the $I$-dimensional vector whose $j$th entry is $\partial\tau(\theta)/\partial\theta_j$. The asymptotic standard deviation (ASD) and standard error (SE) of $\hat\omega$ are defined by
$$\mathrm{ASD}(\hat\omega) = \sqrt{[\nabla\tau(\theta^*)]^T[I(\theta^*)]^{-1}\nabla\tau(\theta^*)}$$
and
$$\mathrm{SE}(\hat\omega) = \sqrt{[\nabla\tau(\hat\theta)]^T[I(\hat\theta)]^{-1}\nabla\tau(\hat\theta)}.$$
When $\omega$ depends on $t$ and $x$, we write $\tau(\theta)$ as $\tau(\theta; t, x)$. Some important cases are given in the following table:

$\omega$: $\phi(t|x)$; $\tau(\theta; t, x)$: $\theta \cdot B(t|x)$; $\nabla\tau(\theta; t, x)$: $B(t|x)$.
$\omega$: $\lambda(t|x)$; $\tau(\theta; t, x)$: $\exp(\theta \cdot B(t|x))$; $\nabla\tau(\theta; t, x)$: $B(t|x)\exp(\theta \cdot B(t|x))$.
$\omega$: $H(t|x)$; $\tau(\theta; t, x)$: $\int_0^t\exp(\theta \cdot B)$; $\nabla\tau(\theta; t, x)$: $\int_0^t B\exp(\theta \cdot B)$.
$\omega$: $S(t|x)$; $\tau(\theta; t, x)$: $\exp[-\int_0^t\exp(\theta \cdot B)]$; $\nabla\tau(\theta; t, x)$: $-\tau(\theta; t, x)\int_0^t B\exp(\theta \cdot B)$.
Thus, for example,
$$\mathrm{ASD}(\hat\phi(t|x)) = \sqrt{[B(t|x)]^T[I(\theta^*)]^{-1}B(t|x)}$$
and
$$\mathrm{SE}(\hat\phi(t|x)) = \sqrt{[B(t|x)]^T[I(\hat\theta)]^{-1}B(t|x)}.$$
To obtain the asymptotic standard deviation and the standard error of the density estimate, set $\tau(\theta; t, x) = \exp(\theta \cdot B(t|x))\exp[-\int_0^t\exp(\theta \cdot B)]$ for $0 \le t \le 1$. Then $\nabla\tau(\theta; t, x) = \tau(\theta; t, x)[B(t|x) - \int_0^t B\exp(\theta \cdot B)]$. Thus, $\tau(\theta^*; t, x) = f^*(t|x)$ and $\nabla\tau(\theta^*; t, x) = f^*(t|x)[B(t|x) - \int_0^t B\lambda^*]$. In fact, it follows from the basic properties of B-splines that
$$\mathrm{ASD}(\hat f(t|x)) \sim f^*(t|x)\sqrt{[B(t|x)]^T[I(\theta^*)]^{-1}B(t|x)}$$
and
$$\mathrm{SE}(\hat f(t|x)) \sim \hat f(t|x)\sqrt{[B(t|x)]^T[I(\hat\theta)]^{-1}B(t|x)}, \qquad 0 < t < 1.$$
(Here $a_n \sim b_n$ means that $a_n/b_n \to 1$ as $n \to \infty$.)
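Computationally, every ASD and SE above is a quadratic form in an inverse information matrix. The sketch below (continuing the hypothetical $M = 0$ helpers from earlier sections; info_matrix estimates $I(\hat\theta)$ by the observed information $\sum_i\int_0^{Y_i} B_j B_k\exp g\,du$, whose expectation is the information matrix of the text) computes $\mathrm{SE}(\hat\phi(t|x))$.

```python
import numpy as np

def info_matrix(theta, Y, K=4, m=3, q=2, n_grid=200):
    """Observed information at theta for the censored log-likelihood:
    entries sum_i int_0^{Y_i} B_j(u) B_k(u) exp(theta.B(u)) du, approximated
    on a trapezoidal grid (M = 0 case; hypothetical helper)."""
    Y = np.asarray(Y, dtype=float)
    grid = np.linspace(0.0, 1.0, n_grid + 1)
    w = np.full(n_grid + 1, 1.0 / n_grid)
    w[0] = w[-1] = 0.5 / n_grid
    Bg = np.array([bspline_basis(t, K, m, q) for t in grid])
    mask = (grid[None, :] <= Y[:, None]) * w[None, :]
    wts = mask.sum(axis=0) * np.exp(Bg @ theta)
    return Bg.T @ (wts[:, None] * Bg)

def se_log_hazard(t, theta_hat, Y, K=4, m=3, q=2):
    """SE(phi-hat(t)) = sqrt(B(t)^T [I(theta-hat)]^{-1} B(t))."""
    B = bspline_basis(t, K, m, q)
    I_hat = info_matrix(theta_hat, Y, K, m, q)
    return float(np.sqrt(B @ np.linalg.solve(I_hat, B)))
```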
The asymptotic distributions of HARE estimates are described in the following result. Here $Z_n \xrightarrow{d} N(0,1)$ means that the distribution of $Z_n$ converges to the standard normal distribution as $n \to \infty$.
Theorem 3 Under Conditions 1-5, for $t \in \mathcal{T}$ and $x \in \mathcal{X}$,
$$\begin{aligned}
\text{(2.19)} &\qquad \frac{\hat\phi(t|x) - \phi^*(t|x)}{\mathrm{ASD}(\hat\phi(t|x))} \xrightarrow{d} N(0,1), \qquad \frac{\mathrm{SE}(\hat\phi(t|x))}{\mathrm{ASD}(\hat\phi(t|x))} = 1 + o_P(1);\\
\text{(2.20)} &\qquad \frac{\hat\lambda(t|x) - \lambda^*(t|x)}{\mathrm{ASD}(\hat\lambda(t|x))} \xrightarrow{d} N(0,1), \qquad \frac{\mathrm{SE}(\hat\lambda(t|x))}{\mathrm{ASD}(\hat\lambda(t|x))} = 1 + o_P(1);\\
\text{(2.21)} &\qquad \frac{\hat H(t|x) - H^*(t|x)}{\mathrm{ASD}(\hat H(t|x))} \xrightarrow{d} N(0,1), \qquad \frac{\mathrm{SE}(\hat H(t|x))}{\mathrm{ASD}(\hat H(t|x))} = 1 + o_P(1), \qquad t > 0;\\
\text{(2.22)} &\qquad \frac{\hat S(t|x) - S^*(t|x)}{\mathrm{ASD}(\hat S(t|x))} \xrightarrow{d} N(0,1), \qquad \frac{\mathrm{SE}(\hat S(t|x))}{\mathrm{ASD}(\hat S(t|x))} = 1 + o_P(1), \qquad t > 0;\\
\text{(2.23)} &\qquad \frac{\hat f(t|x) - f^*(t|x)}{\mathrm{ASD}(\hat f(t|x))} \xrightarrow{d} N(0,1), \qquad \frac{\mathrm{SE}(\hat f(t|x))}{\mathrm{ASD}(\hat f(t|x))} = 1 + o_P(1).
\end{aligned}$$
The proof of Theorem 3 is given in Section 5. Confidence intervals can be constructed using Theorem 3 in an obvious manner. Suppose $\omega = \tau(\theta^*)$ is a parameter of interest. Then $\hat\omega \pm z_{1-\alpha/2}\,\mathrm{SE}(\hat\omega)$ is an asymptotic $100(1-\alpha)\%$ confidence interval for $\omega$. Here $\Phi(z_{1-\alpha/2}) = 1 - \alpha/2$ with $\Phi$ being the standard normal distribution function. In particular, the $100(1-\alpha)\%$ confidence interval for $\phi^*(t|x)$ is given by
$$\hat\phi(t|x) \pm z_{1-\alpha/2}\,\mathrm{SE}(\hat\phi(t|x)).$$
Note that this will be the $100(1-\alpha)\%$ confidence interval for $\phi(t|x)$ only when the error of approximation (quantified in terms of $\rho$ as described in Theorem 1) goes to zero faster than the asymptotic standard deviation of $\hat\phi$. This can be achieved by choosing $J$ such that $Jn^{-1/(2p+d)} \gg 1$. If $\rho$ goes to zero at a rate slower than or equivalent to that of $\mathrm{ASD}(\hat\phi)$, then the above interval is the $100(1-\alpha)\%$ confidence interval for the best approximation to the log-hazard function. Similar remarks also apply to the confidence intervals for the cumulative hazard, survival and density functions.
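As a usage note, with the hypothetical helpers sketched above the interval for $\phi^*(t|x)$ takes a few lines; scipy is assumed only for the normal quantile.

```python
from scipy.stats import norm

def ci_log_hazard(t, theta_hat, Y, alpha=0.05, K=4, m=3, q=2):
    """Asymptotic 100(1 - alpha)% confidence interval for phi*(t),
    phi-hat(t) +/- z_{1 - alpha/2} SE(phi-hat(t)).  As noted in the text, it
    covers phi(t) itself only when the approximation error rho is negligible
    relative to ASD(phi-hat)."""
    center = float(bspline_basis(t, K, m, q) @ theta_hat)
    z = norm.ppf(1.0 - alpha / 2.0)
    se = se_log_hazard(t, theta_hat, Y, K, m, q)
    return center - z * se, center + z * se

# e.g. ci_log_hazard(0.5, theta_hat, Y), using the fit from Section 2.4
```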
The above asymptotic results are analogous to those given in Stone (1991) for logspline response model estimation based on noncensored data. The proofs given here are mostly similar to the proofs in Stone (1991). However, Lemmas 14 and 22 of Stone (1991) were established by using well-known properties of exponential families. The corresponding results in this paper, Lemmas 3 and 8, are the key to obtaining the $L_\infty$ rates of convergence and asymptotic normality for the various HARE estimates. We prove Lemma 3 by using a result motivated by the elementary theory of differential equations (see Lemma 2) and Lemma 8 by using the martingale structure of the log-likelihood function.
3 Proof of Theorem 1
We assume that Conditions 1-4 hold throughout this section. The proof of (2.4), which follows from an argument similar to that given in Stone (1989) and based on de Boor (1976), is given in the appendix for completeness. Equation (2.5) follows from (2.4) and the identity
$$\lambda^* - \lambda = \exp\phi^* - \exp\phi = (\phi^* - \phi)\int_0^1\exp(\phi + u(\phi^* - \phi))\,du.$$
Equation (2.6) follows easily from (2.5).
The proof of (2.7) follows from (2.6) and
$$S^* - S = \exp(-H^*) - \exp(-H) = -(H^* - H)\int_0^1\exp(-H - u(H^* - H))\,du.$$
Observe that $f^* - f = \lambda^*(S^* - S) + (\lambda^* - \lambda)S$. We conclude from (2.5) and (2.7) that (2.8) holds.
4 Proof of Theorem 2
We assume Conditions 1-5 hold throughout this section. Recall that the log-likelihood function is given by
$$\ell(g) = \sum_i \delta_i g(Y_i|x_i;\theta) - \sum_i \int_0^{Y_i}\exp g(u|x_i;\theta)\,du.$$
Let
$$S(\theta) = \frac{\partial}{\partial\theta}\,\ell(g)$$
denote the score at $\theta$, which is the $I$-dimensional vector having entries
$$\frac{\partial\ell(g)}{\partial\theta_j} = \sum_i \delta_i B_j(Y_i|x_i) - \sum_i \int_0^{Y_i} B_j(u|x_i)\exp g(u|x_i;\theta)\,du.$$
Let
$$\frac{\partial^2\ell(g)}{\partial\theta\,\partial\theta^T}$$
denote the Hessian of $\ell(g)$, which is the $I \times I$ matrix having entries
$$\frac{\partial^2\ell(g)}{\partial\theta_j\,\partial\theta_k} = -\sum_i \int_0^{Y_i} B_j(u|x_i)B_k(u|x_i)\exp g(u|x_i;\theta)\,du.$$
[Recall that $j = (j_0, j_1, \ldots, j_M)$ and $k = (k_0, k_1, \ldots, k_M)$.]
Set $S^* = S(\theta^*)$. The maximum likelihood equation $S(\hat\theta) = 0$ can be written as
$$\int_0^1 \frac{d}{du}\,S(\theta^* + u(\hat\theta - \theta^*))\,du = -S^*.$$
This can further be written as $D(\hat\theta - \theta^*) = -S^*$, where $D$ is the $I \times I$ matrix given by
$$\text{(4.1)} \qquad D = \int_0^1 \frac{\partial^2}{\partial\theta\,\partial\theta^T}\,\ell(\theta^* + u(\hat\theta - \theta^*))\,du.$$
Proofs of (2.9) and (2.10)

It follows from the maximum likelihood equation that
(4.2)
According to (3.5) and (3.6) of Kooperberg et al. (1995b), there is a positive constant $M_1$ such that
(4.3)
and
(4.4)
except on an event whose probability tends to zero with $n$. Since $|(\hat\theta - \theta^*)^T S^*| \le |\hat\theta - \theta^*||S^*|$, it follows from (4.2)-(4.4) that $|\hat\theta - \theta^*|^2 = O_P(I^2/n)$ and hence that $\|\hat\phi - \phi^*\|_2^2 = O_P(I/n)$. This completes the proofs of (2.9) and (2.10).
Proof of (2.11)

According to (2.10), Lemma 2 of Kooperberg et al. (1995b) and Condition 5, $\|\hat\phi - \phi^*\|_\infty = O_P(I/\sqrt{n}) = o_P(1)$. The desired result follows from (2.4), (2.10) and
$$\text{(4.5)} \qquad \hat\lambda - \lambda^* = \exp\hat\phi - \exp\phi^* = (\hat\phi - \phi^*)\int_0^1\exp(\phi^* + u(\hat\phi - \phi^*))\,du.$$
The proof of (2.12) requires a sequence of lemmas. Set $\lambda_{\min}(\theta) = \min[\exp g(\cdot|\cdot;\theta)]$ and $\lambda_{\max}(\theta) = \max[\exp g(\cdot|\cdot;\theta)]$.
Lemma 1 There is a positive constant $M_2$ such that
$$\text{(4.6)} \qquad M_2^{-1}\lambda_{\min}(\theta)\,nI^{-1}|\tau|^2 \le \tau^T I(\theta)\tau \le M_2\lambda_{\max}(\theta)\,nI^{-1}|\tau|^2$$
for $\theta, \tau \in \Theta$ and $n \ge 1$. Moreover,
$$\text{(4.7)} \qquad [M_2\lambda_{\max}(\theta)]^{-1}n^{-1}I|\tau|^2 \le \tau^T[I(\theta)]^{-1}\tau \le M_2[\lambda_{\min}(\theta)]^{-1}n^{-1}I|\tau|^2$$
and
$$\text{(4.8)} \qquad [M_2\lambda_{\max}(\theta)]^{-1}n^{-1}I|\tau| \le \big|[I(\theta)]^{-1}\tau\big| \le M_2[\lambda_{\min}(\theta)]^{-1}n^{-1}I|\tau|$$
for $n \ge 1$ and $\theta, \tau \in \Theta$ such that $\lambda_{\min}(\theta) > 0$.
Proof. According to (3.9) of Kooperberg et al. (1995b),
$$\tau^T I(\theta)\tau = \sum_i \int g^2(u|x_i;\tau)\,S(u|x_i)\,S_C(u|x_i)\exp g(u|x_i;\theta)\,du.$$
We conclude from (2.1), Condition 1 and (2.2) that (4.6) holds.

If $\lambda_{\min}(\theta) > 0$, then it follows from (4.6) that there is a nonsingular symmetric matrix $R(\theta)$ such that $I(\theta) = R(\theta)R(\theta)$. Also,
$$[M_2\lambda_{\max}(\theta)]^{-1}n^{-1}I\,|\tau|^2 \le \tau^T[I(\theta)]^{-1}\tau \le M_2[\lambda_{\min}(\theta)]^{-1}n^{-1}I\,|\tau|^2, \qquad \tau \in \Theta.$$
Replacing $\tau$ by $[R(\theta)]^{-1}\tau$, we conclude that (4.7) is valid. Similarly, it follows from (4.7) applied to $\tau$ and $[R(\theta)]^{-1}\tau$ that (4.8) is valid.
Lemma 2 Let $\psi(\cdot)$ and $s(\cdot)$ denote piecewise smooth functions on $[0,1]$. Set
$$w(y) = s(y) - \int_0^y s(u)\psi(u)\,du + \int_0^1 s(u)\psi(u)\,du = s(y) + \int_y^1 s(u)\psi(u)\,du.$$
Then
$$\int_0^1 s^2(y)\,dy = O\Big(\int_0^1 w^2(y)\,dy\Big).$$
Proof. We have $dw(y) = ds(y) - s(y)\psi(y)\,dy$, so that
$$d\Big(s(y)\exp\int_y^1\psi(u)\,du\Big) = \Big(\exp\int_y^1\psi(u)\,du\Big)\big(ds(y) - s(y)\psi(y)\,dy\big) = \Big(\exp\int_y^1\psi(u)\,du\Big)\,dw(y).$$
Integrating from $y$ to 1, we get
$$s(1) - s(y)\exp\int_y^1\psi(u)\,du = \int_y^1\Big(\exp\int_u^1\psi(t)\,dt\Big)\,dw(u),$$
and hence [since $w(1) = s(1)$]
$$s(y)\exp\int_y^1\psi(u)\,du = w(1) - \int_y^1\Big(\exp\int_u^1\psi(t)\,dt\Big)\,dw(u).$$
Integration by parts gives
$$\int_y^1\Big(\exp\int_u^1\psi(t)\,dt\Big)\,dw(u) = w(1) - w(y)\exp\int_y^1\psi(u)\,du + \int_y^1 w(u)\psi(u)\Big(\exp\int_u^1\psi(t)\,dt\Big)\,du,$$
so that
$$s(y)\exp\int_y^1\psi(u)\,du = w(y)\exp\int_y^1\psi(u)\,du - \int_y^1 w(u)\psi(u)\Big(\exp\int_u^1\psi(t)\,dt\Big)\,du.$$
Set $\Psi(u,y) = \psi(u)\exp\int_u^y\psi(t)\,dt$. Then
$$s(y) = w(y) - \exp\Big(-\int_y^1\psi(u)\,du\Big)\int_y^1 w(u)\psi(u)\Big(\exp\int_u^1\psi(t)\,dt\Big)\,du = w(y) - \int_y^1 w(u)\Psi(u,y)\,du.$$
Thus
$$\int_0^1 s^2(y)\,dy \le 2\int_0^1 w^2(y)\,dy + 2\int_0^1\Big(\int_y^1 w(u)\Psi(u,y)\,du\Big)^2 dy \le 2\int_0^1 w^2(y)\,dy + 2\int_0^1 w^2(u)\,du\int_0^1\!\!\int_y^1\Psi^2(u,y)\,du\,dy = O\Big(\int_0^1 w^2(y)\,dy\Big).$$
This completes the proof of Lemma 2.
Let $G^*(y, \delta, x) = G(y, \delta, x; \theta^*) = [G_j(y, \delta, x; \theta^*)]$ denote the $I$-dimensional vector with $j$th entry given by
$$G_j(y, \delta, x; \theta^*) = \Big(\delta B_{j_0}(y) - \int_0^y B_{j_0}(u)\exp g(u|x;\theta^*)\,du\Big)\prod_{1 \le v \le M} B_{j_v}(x_v),$$
where $j = (j_0, j_1, \ldots, j_M)$ and $x = (x_1, \ldots, x_M)$. It follows from (2.5) and the basic properties of B-splines that
$$\text{(4.9)} \qquad \max_{1 \le j_0 \le J}\,\sup_{t,x}\Big|\int_0^t B_{j_0}(u)\exp g(u|x;\theta^*)\,du\Big| = O(J^{-1})$$
and hence that
$$\text{(4.10)} \qquad |G^*(y, \delta, x)| = O(1)$$
uniformly in $(y, x) \in \mathcal{T} \times \mathcal{X}$.
Note that $S^* = \sum_i G^*(Y_i, \delta_i, x_i)$ and $E(S^*) = 0$. Let $\mathrm{VC}(S^*)$ denote the variance-covariance matrix of $S^*$.

Lemma 3 There is a positive constant $M_3$ such that
$$M_3^{-1}nI^{-1}|\tau|^2 \le \tau^T\mathrm{VC}(S^*)\tau \le M_3 nI^{-1}|\tau|^2, \qquad \tau \in \Theta.$$
Proof. Note that
$$\tau^T\mathrm{VC}(S^*)\tau = \mathrm{var}(\tau^T S^*) = \sum_i\mathrm{var}\Big(\delta_i g(Y_i|x_i;\tau) - \int_0^{Y_i} g(u|x_i;\tau)\exp g(u|x_i;\theta^*)\,du\Big).$$
By (2.1) and (2.2),
$$\text{(4.11)} \qquad \sum_i E[g^2(Y_i|x_i;\tau)] = O(nI^{-1}|\tau|^2)$$
and
$$\text{(4.12)} \qquad \sum_i E\Big[\Big(\int_0^{Y_i} g(u|x_i;\tau)\exp g(u|x_i;\theta^*)\,du\Big)^2\Big] = O(nI^{-1}|\tau|^2).$$
It follows from (4.11) and (4.12) that the upper bound holds.

Set $s(y|x) = g(y|x;\tau)$, $\psi(y|x) = \exp g(y|x;\theta^*)$ and $\mu = E[\delta s(Y|x) - \int_0^Y s(u|x)\psi(u|x)\,du]$. Since censoring automatically occurs at time $t = 1$ (Condition 2), we have
$$\text{(4.13)} \qquad \mathrm{var}\Big(\delta g(Y|x;\tau) - \int_0^Y g(u|x;\tau)\exp g(u|x;\theta^*)\,du\Big) = \mathrm{var}\Big(\delta s(Y|x) - \int_0^Y s(u|x)\psi(u|x)\,du\Big)$$
$$= E\Big[\Big(\delta s(Y|x) - \int_0^Y s(u|x)\psi(u|x)\,du - \mu\Big)^2;\ Y < 1\Big] + E\Big[\Big(-\int_0^1 s(u|x)\psi(u|x)\,du - \mu\Big)^2;\ Y = 1\Big]$$
$$\ge \min_{0 \le t \le 1} f(t|x)\int_0^1 S_C(t|x)\Big[s(t|x) - \int_0^t s(u|x)\psi(u|x)\,du - \mu\Big]^2 dt + P(Y = 1)\Big[-\int_0^1 s(u|x)\psi(u|x)\,du - \mu\Big]^2.$$
By Conditions 1 and 2, $P(Y = 1) > 0$ and $S_C(t|x) \ge S_C(1|x) > 0$ for $0 \le t \le 1$. Set $c_1 = \min\{\min_{0 \le t \le 1} f(t|x)S_C(1|x),\ P(Y = 1)\}$. Then $c_1 > 0$. Hence, by (4.13), Lemma 2 and Condition 1, there is a positive constant $c_2$ such that
$$\mathrm{var}\Big(\delta g(Y|x;\tau) - \int_0^Y g(u|x;\tau)\exp g(u|x;\theta^*)\,du\Big) \ge \frac{c_1}{2}\int_0^1\Big[s(t|x) - \int_0^t s(u|x)\psi(u|x)\,du + \int_0^1 s(u|x)\psi(u|x)\,du\Big]^2 dt \ge c_2\int_0^1 g^2(t|x;\tau)\,dt.$$
It follows from Condition 1, (2.1) and (2.2) that there is a positive constant $c_3$ such that
$$\sum_i\mathrm{var}\Big(\delta_i g(Y_i|x_i;\tau) - \int_0^{Y_i} g(u|x_i;\tau)\exp g(u|x_i;\theta^*)\,du\Big) \ge c_3 nI^{-1}|\tau|^2.$$
This completes the proof of Lemma 3.
Set $I^* = I(\theta^*)$. Consider the approximation $\bar\varphi = \bar\varphi_n \in \Theta$ to $\hat\theta - \theta^*$ defined by $I^*\bar\varphi = S^*$. Note that $\bar\varphi = (I^*)^{-1}S^*$ and hence $E(\bar\varphi) = 0$ and $[G^*(y,\delta,x)]^T\bar\varphi = [G^*(y,\delta,x)]^T(I^*)^{-1}S^*$. It follows from (4.10), (2.5) and (4.8) that
$$\text{(4.14)} \qquad |\tau^T(I^*)^{-1}G^*(y,\delta,x)| = O(n^{-1}I|\tau|)$$
uniformly in $n$, $\tau$, $y$ and $x$.

Lemma 4 $\max_{j \in A}\bar\varphi_j^2 = O_P(n^{-1}I\log I)$.

Proof. Since $(I^*)^{-1}\mathrm{VC}(S^*)(I^*)^{-1}$ is the variance-covariance matrix of $\bar\varphi$, it follows from (2.5), (4.8) and Lemma 3 that $\max_j\mathrm{var}(\bar\varphi_j) = O(I/n)$. Observe that
$$\bar\varphi_j = \sum_i\big[(I^*)^{-1}G^*(Y_i, \delta_i, x_i)\big]_j.$$
By (4.14),
$$\max_{j \in A}\,\sup_{y,\delta,x}\,\big|[(I^*)^{-1}G^*(y, \delta, x)]_j\big| = O(I/n).$$
The desired result follows from Condition 5 and Bernstein's inequality.
Proof of (2.12)

The proof is contained in the following lemma.

Lemma 5 (i) $\max_{j \in A}|\hat\theta_j - \theta^*_j|^2 = O_P(n^{-1}I\log I)$. (ii) $|\hat\theta - \theta^* - \bar\varphi|^2 = O_P(n^{-2}I^3\log I)$.

Proof. It follows from the maximum likelihood equation that [see (4.1)]
$$\hat\theta - \theta^* - \bar\varphi = (I^*)^{-1}(I^* + D)(\hat\theta - \theta^*).$$
According to (2.5) and (4.8),
$$|\hat\theta - \theta^* - \bar\varphi| = O\big(n^{-1}I\,|(I^* + D)(\hat\theta - \theta^*)|\big).$$
We claim that
$$\text{(4.15)} \qquad |(I^* + D)(\hat\theta - \theta^*)|^2 = O_P\Big(n\max_{j \in A}(\hat\theta_j - \theta^*_j)^2\Big).$$
[The proof of (4.15) will be given shortly.] Therefore
$$|\hat\theta - \theta^* - \bar\varphi|^2 = O_P\Big(n^{-1}I^2\max_{j \in A}(\hat\theta_j - \theta^*_j)^2\Big).$$
Consequently, by Lemma 4,
$$\max_{j \in A}(\hat\theta_j - \theta^*_j)^2 = O_P(n^{-1}I\log I) + O_P\Big(n^{-1}I^2\max_{j \in A}(\hat\theta_j - \theta^*_j)^2\Big).$$
Thus by Condition 5 ($I = o(n^{1/2 - \epsilon/2})$),
$$\max_{j \in A}(\hat\theta_j - \theta^*_j)^2 = O_P(n^{-1}I\log I),$$
which yields the desired results.
Proof of (4.15). Set $N_i(t) = \mathrm{ind}(Y_i \le t, \delta_i = 1)$ and $Z_i(t) = \mathrm{ind}(Y_i \ge t)$, $1 \le i \le n$. Thus $N_i(t) = 1$ if and only if the $i$th subject is observed to have failed by time $t$, and $Z_i(t) = 1$ if and only if the $i$th subject is still at risk just prior to time $t$. The log-likelihood function can be written as
$$\ell(\theta) = \sum_i\Big(\int g(u|x_i;\theta)\,dN_i(u) - \int Z_i(u)\exp g(u|x_i;\theta)\,du\Big).$$
The $j$th entry of the score function $S(\theta)$ is given by
$$\frac{\partial\ell(\theta)}{\partial\theta_j} = \sum_i\Big(\int B_j(u|x_i)\,dN_i(u) - \int B_j(u|x_i)Z_i(u)\exp g(u|x_i;\theta)\,du\Big), \qquad j \in A.$$
The entry in row $j$ and column $k$ of the Hessian matrix of $\ell(\theta)$ is given by
$$\ell''_{jk}(\theta) = \frac{\partial^2\ell(\theta)}{\partial\theta_j\,\partial\theta_k} = -\sum_i \int_0^1 B_j(u|x_i)B_k(u|x_i)Z_i(u)\exp g(u|x_i;\theta)\,du.$$
Set
$$\ell'''_{jkm}(\theta) = \frac{\partial^3\ell(\theta)}{\partial\theta_j\,\partial\theta_k\,\partial\theta_m} = -\sum_i \int_0^1 B_j(u|x_i)B_k(u|x_i)B_m(u|x_i)Z_i(u)\exp g(u|x_i;\theta)\,du,$$
where $m = (m_0, m_1, \ldots, m_M)$ with $m_0, m_1, \ldots, m_M \in \{1, \ldots, J\}$. Note that $E[\ell''_{jk}(\theta^*)] = -I_{jk}(\theta^*)$. The entry in row $j$ and column $k$ of $I^* + D$ can be written as
$$\int_0^1\big[\ell''_{jk}(\theta^* + t(\hat\theta - \theta^*)) - \ell''_{jk}(\theta^*)\big]\,dt + \ell''_{jk}(\theta^*) - E[\ell''_{jk}(\theta^*)] = \sum_m A_{jkm}(\hat\theta_m - \theta^*_m) + \ell''_{jk}(\theta^*) - E[\ell''_{jk}(\theta^*)],$$
•
where
Ajkm = Anjkm =
Thus, the jth entry of (1*
~1 (1- t)£'j~m(8* + t(O -
+ D)(O -
8*)) dt.
8*) is
LLAjkm(Ok - 0k)(Om - O~) + L{£'jk(8*) - E[£'jk(8*)]}(Ok - Ok)'
k
m
k
We claim that
$$\text{(4.16)} \qquad \sum_j\Big(\sum_k\sum_m A_{jkm}(\hat\theta_k - \theta^*_k)(\hat\theta_m - \theta^*_m)\Big)^2 = O_P\Big(\max_{j \in A}|\hat\theta_j - \theta^*_j|^2\,n^2I^{-2}\,|\hat\theta - \theta^*|^2\Big) = O_P\Big(n\max_{j \in A}|\hat\theta_j - \theta^*_j|^2\Big)$$
and
$$\text{(4.17)} \qquad \sum_j\Big(\sum_k\{\ell''_{jk}(\theta^*) - E[\ell''_{jk}(\theta^*)]\}(\hat\theta_k - \theta^*_k)\Big)^2 = O_P\Big(n\max_{j \in A}|\hat\theta_j - \theta^*_j|^2\Big).$$
[The proofs of (4.16) and (4.17) will be given shortly.] It follows from (4.16) and (4.17) that (4.15) holds, as desired.
Proof of (4.16). This follows from (2.9) and the following result.

Lemma 6 There is a positive constant $M_4$ such that
$$\sum_j\Big(\sum_k\sum_m A_{jkm}\tau_m\Big)^2 \le M_4 n^2I^{-2}|\tau|^2\Big[\max_{0 \le t \le 1}\|\exp g(\cdot|\cdot;\,\theta^* + t(\hat\theta - \theta^*))\|_\infty\Big]^2, \qquad \tau \in \Theta.$$
Proof. Note that
$$\sum_j\Big(\sum_k\sum_m A_{jkm}\tau_m\Big)^2 \le \Big[\max_{0 \le t \le 1}\|\exp g(\cdot|\cdot;\,\theta^* + t(\hat\theta - \theta^*))\|_\infty\Big]^2\sum_j\Big(\sum_i\sum_m|\tau_m|\int B_j(u|x_i)B_m(u|x_i)\,du\Big)^2.$$
There is a positive constant $J_0$ (not depending on $n$) such that
$$\text{(4.18)} \qquad B_j(u|x_i)B_m(u|x_i) = 0 \quad\text{unless}\quad |m_v - j_v| \le J_0,\ v = 0, 1, \ldots, M.$$
Thus, by (2.3) and the basic properties of B-splines,
$$\sum_i\sum_m|\tau_m|\int B_j(u|x_i)B_m(u|x_i)\,du \le nI^{-1}\sum_{|m_0 - j_0| \le J_0}\,\sum_{|m_1 - j_1| \le J_0}\cdots\sum_{|m_M - j_M| \le J_0}|\tau_{m_0 m_1 \cdots m_M}|.$$
Hence, by the Schwarz inequality,
$$\sum_j\Big(\sum_i\sum_m|\tau_m|\int B_j(u|x_i)B_m(u|x_i)\,du\Big)^2 \le n^2I^{-2}(2J_0 + 1)^{2d}|\tau|^2.$$
The desired result follows from (2.5), (2.9) and Condition 5.
Proof of (4.17). Set
$$V_{jk}(u) = \sum_i B_j(u|x_i)B_k(u|x_i)\{Z_i(u) - E[Z_i(u)]\}\,\lambda^*(u|x_i).$$
Then $E[V_{jk}(u)] = 0$ for $0 \le u \le 1$ and
$$\ell''_{jk}(\theta^*) - E[\ell''_{jk}(\theta^*)] = -\int V_{jk}(u)\,du.$$
Thus (4.17) follows from the next lemma.

Lemma 7 Uniformly in $\tau \in \Theta$,
$$\sum_j\Big(\sum_k|\tau_k|\int V_{jk}(u)\,du\Big)^2 = O_P\Big(n\max_{k_0,\ldots,k_M}\tau^2_{k_0\cdots k_M}\Big).$$

Proof. By (4.18) and the Schwarz inequality,
$$\sum_j\Big(\sum_k|\tau_k|\int V_{jk}(u)\,du\Big)^2 \le \Big(\max_{k_0,\ldots,k_M}\tau^2_{k_0\cdots k_M}\Big)\sum_j\Big(\sum_{|k_0 - j_0| \le J_0}\cdots\sum_{|k_M - j_M| \le J_0}\Big|\int V_{jk}(u)\,du\Big|\Big)^2$$
$$\le \Big(\max_{k_0,\ldots,k_M}\tau^2_{k_0\cdots k_M}\Big)(2J_0 + 1)^d\sum_j\sum_{|k_0 - j_0| \le J_0}\cdots\sum_{|k_M - j_M| \le J_0}\int V^2_{jk}(u)\,du.$$
Since $E[V^2_{jk}(u)] = \sum_i[B_j(u|x_i)B_k(u|x_i)\lambda^*(u|x_i)]^2\,\mathrm{var}(Z_i(u))$ for $0 \le u \le 1$, the desired result follows from (2.3) and (2.5).

This completes the proof of (2.12).
Proof of (2.13)

It follows from Lemma 5(i) and the basic properties of B-splines that
$$\|\hat\phi - \phi^*\|_\infty = \Big\|\sum_j(\hat\theta_j - \theta^*_j)B_j\Big\|_\infty = O_P\big(\sqrt{n^{-1}I\log I}\big).$$
Proof of (2.14)
This follows from (4.5), (2.4) and (2.13).
Proof of (2.15)

Set $\Psi(u|x) = \int_0^1\exp\big(\phi^*(u|x) + w(\hat\phi(u|x) - \phi^*(u|x))\big)\,dw$. Note that
$$\text{(4.19)} \qquad \hat H(t|x) - H^*(t|x) = \int_0^t\big(\hat\lambda(u|x) - \lambda^*(u|x)\big)\,du = \int_0^t[\hat\phi(u|x) - \phi^*(u|x)]\,\Psi(u|x)\,du$$
and
$$\text{(4.20)} \qquad \hat\phi(t|x) - \phi^*(t|x) = \sum_j(\hat\theta_j - \theta^*_j - \bar\varphi_j)B_j(t|x) + \sum_j\bar\varphi_j B_j(t|x).$$
It follows from Condition 5, Lemma 5(ii) and the basic properties of B-splines that
$$\text{(4.21)} \qquad \int_0^t\Big|\sum_j(\hat\theta_j - \theta^*_j - \bar\varphi_j)B_j(u|x)\Big|\,du = o_P\big(\sqrt{n^{-1}J^M}\big).$$
Also, it follows as in the proof of (37) of Stone (1991) that
$$\max_{0 \le t \le 1}\Big|\sum_j\bar\varphi_j\int_0^t B_j(u|x)\lambda^*(u|x)\,du\Big| = O_P\big(\sqrt{n^{-1}J^M}\big).$$
The desired result follows from (2.4) and (2.10).
Proof of (2.16)

This follows from (2.6), (2.15) and
$$\hat S(t|x) - S^*(t|x) = \exp(-\hat H(t|x)) - \exp(-H^*(t|x)) = -[\hat H(t|x) - H^*(t|x)]\int_0^1\exp\big(-H^*(t|x) - u[\hat H(t|x) - H^*(t|x)]\big)\,du.$$

Proofs of (2.17) and (2.18)

Since $\hat f(t|x) - f^*(t|x) = \hat\lambda(t|x)[\hat S(t|x) - S^*(t|x)] + [\hat\lambda(t|x) - \lambda^*(t|x)]S^*(t|x)$, the desired results follow from (2.5), (2.7), (2.11), (2.14) and (2.16).
5 Proof of Theorem 3

Throughout this section, we assume that Conditions 1-5 hold.
Lemma 8 Uniformly in $\tau \in \Theta$,
$$\tau^T\mathrm{VC}(S^*)\tau = \tau^T I(\theta^*)\tau + o(nI^{-1}|\tau|^2).$$

Proof. Recall that the log-likelihood function is given by
$$\ell(\theta) = \sum_i\Big(\int g(u|x_i;\theta)\,dN_i(u) - \int Z_i(u)\exp g(u|x_i;\theta)\,du\Big)$$
and the score function is given by
$$\frac{\partial\ell(\theta)}{\partial\theta_j} = \sum_i\Big(\int B_j(u|x_i)\,dN_i(u) - \int B_j(u|x_i)Z_i(u)\exp g(u|x_i;\theta)\,du\Big) = \sum_i\int B_j(u|x_i)\,dM_i(u|x_i), \qquad j \in A,$$
where $dM_i(u|x) = dN_i(u) - Z_i(u)\exp g(u|x;\theta)\,du$. Thus,
$$\text{(5.1)} \qquad \tau^T\mathrm{VC}(S^*)\tau = \mathrm{var}(\tau^T S^*) = \sum_i\mathrm{var}\Big(\int g(u|x_i;\tau)\,dM^*_i(u|x_i)\Big)$$
and
$$\text{(5.2)} \qquad \sum_i E\Big(\int g^2(u|x_i;\tau)Z_i(u)\lambda^*(u|x_i)\,du\Big) = \sum_i\int g^2(u|x_i;\tau)S(u|x_i)S_C(u|x_i)\lambda^*(u|x_i)\,du,$$
where $M^*_i(t) = N_i(t) - \int_0^t Z_i(u)\exp g(u|x_i;\theta^*)\,du$. Let $E^*(\cdot)$ and $\mathrm{var}^*(\cdot)$ denote the expectation and variance functions taken with respect to $f^*$. According to Theorem 2.5.4 of Fleming and Harrington (1991) (p. 77), or Proposition II.4.1 of Andersen et al. (1993) (p. 78), $M^*_i(u|x_i)$ is a zero-mean martingale with $\langle M^*_i, M^*_i\rangle^*(t) = \int_0^t Z_i(u)\exp g(u|x_i;\theta^*)\,du$. (Here it is necessary to use an alternative probability space with probability measure $P^*$, under which the counting process $N_i(t)$ has an intensity function $\lambda^*$; $\langle\cdot,\cdot\rangle^*$ is the corresponding variation process.) Hence,
$$\text{(5.3)} \qquad \mathrm{var}^*(\tau^T S^*) = \sum_i E^*\Big(\int g^2(u|x_i;\tau)Z_i(u)\lambda^*(u|x_i)\,du\Big) = \sum_i\int g^2(u|x_i;\tau)S^*(u|x_i)S_C(u|x_i)\lambda^*(u|x_i)\,du.$$
It follows from (2.5), (5.2), (5.3), (2.1) and (2.2) that
$$\text{(5.4)} \qquad |\tau^T I(\theta^*)\tau - \mathrm{var}^*(\tau^T S^*)| = O\big(nI^{-1}|\tau|^2\|S^* - S\|_\infty\big), \qquad \tau \in \Theta.$$
Set
$$U_i = \int g(u|x_i;\tau)\,dM^*_i(u|x_i) = \delta_i g(Y_i|x_i;\tau) - \int g(u|x_i;\tau)\lambda^*(u|x_i)Z_i(u)\,du.$$
Then
$$\text{(5.5)} \qquad \sum_i\mathrm{var}(U_i) = \sum_i E(U_i^2) - \sum_i[E(U_i)]^2.$$
Write $g = g(\cdot;\tau)$. Then
$$E(U_i) = E(\delta_i g(Y_i|x_i;\tau)) - E\Big(\int g\lambda^* Z_i\Big) = \int S_C(u|x_i)g(u|x_i)f(u|x_i) - \int g(u|x_i)\lambda^*(u|x_i)S_C(u|x_i)S(u|x_i)$$
and
$$E^*(U_i) = E^*(\delta_i g(Y_i|x_i;\tau)) - E^*\Big(\int g\lambda^* Z_i\Big) = \int S_C(u|x_i)g(u|x_i)f^*(u|x_i) - \int g(u|x_i)\lambda^*(u|x_i)S_C(u|x_i)S^*(u|x_i).$$
Thus,
$$[E(U_i)]^2 - [E^*(U_i)]^2 = [E(U_i) - E^*(U_i)][E(U_i) + E^*(U_i)] = \Big(\int S_C g(f - f^*) - \int g\lambda^* S_C(S - S^*)\Big)\times\Big(\int S_C g(f^* + f) - \int g\lambda^* S_C(S^* + S)\Big).$$
Hence, by (2.2), (2.5), (2.7), (2.8) and (2.1),
$$\text{(5.6)} \qquad \sum_i\big|[E(U_i)]^2 - [E^*(U_i)]^2\big| = o\big(nI^{-1}|\tau|^2\|f^* - f\|_\infty\big), \qquad \tau \in \Theta.$$
We claim that
$$\text{(5.7)} \qquad \sum_i\big|E(U_i^2) - E^*(U_i^2)\big| = O\big(nI^{-1}|\tau|^2\|f - f^*\|_\infty\big), \qquad \tau \in \Theta.$$
[The proof of (5.7) will be given shortly.] Hence, by (5.1) and (5.5)-(5.7),
$$\text{(5.8)} \qquad |\tau^T\mathrm{VC}(S^*)\tau - \mathrm{var}^*(\tau^T S^*)| = o\big(nI^{-1}|\tau|^2\|f^* - f\|_\infty\big), \qquad \tau \in \Theta.$$
The desired result follows from (2.7), (2.8), (5.4) and (5.8).
To verify (5.7), we first note that
$$\text{(5.9)} \qquad E(U_i^2) = E\big(\delta_i g^2(Y_i|x_i;\tau)\big) + E\Big[\Big(\int_0^{Y_i} g(u|x_i;\tau)\lambda^*(u|x_i)\,du\Big)^2\Big] - 2E\Big(\delta_i g(Y_i|x_i;\tau)\int_0^{Y_i} g(u|x_i;\tau)\lambda^*(u|x_i)\,du\Big)$$
and
$$\text{(5.10)} \qquad E^*(U_i^2) = E^*\big(\delta_i g^2(Y_i|x_i;\tau)\big) + E^*\Big[\Big(\int_0^{Y_i} g(u|x_i;\tau)\lambda^*(u|x_i)\,du\Big)^2\Big] - 2E^*\Big(\delta_i g(Y_i|x_i;\tau)\int_0^{Y_i} g(u|x_i;\tau)\lambda^*(u|x_i)\,du\Big).$$
By (2.1) and (2.2),
$$\text{(5.11)} \qquad \sum_i\big|E\big(\delta_i g^2(Y_i|x_i;\tau)\big) - E^*\big(\delta_i g^2(Y_i|x_i;\tau)\big)\big| \le \sum_i\int S_C(u|x_i)g^2(u|x_i)\big|f(u|x_i) - f^*(u|x_i)\big|\,du = O\big(nI^{-1}|\tau|^2\|f - f^*\|_\infty\big).$$
Set $S_Y(y|x) = S(y|x)S_C(y|x)$ and $S^*_Y(y|x) = S^*(y|x)S_C(y|x)$. Then
$$dS_Y(y|x) = S(y|x)\,dS_C(y|x) - f(y|x)S_C(y|x)\,dy, \qquad dS^*_Y(y|x) = S^*(y|x)\,dS_C(y|x) - f^*(y|x)S_C(y|x)\,dy$$
and
$$E\Big[\Big(\int_0^Y g(u|x;\tau)\lambda^*(u|x)\,du\Big)^2\Big] - E^*\Big[\Big(\int_0^Y g(u|x;\tau)\lambda^*(u|x)\,du\Big)^2\Big]$$
$$= -\int\Big(\int_0^y g(u|x)\lambda^*(u|x)\,du\Big)^2 dS_Y(y|x) + \int\Big(\int_0^y g(u|x)\lambda^*(u|x)\,du\Big)^2 dS^*_Y(y|x)$$
$$= \int\Big(\int_0^y g(u|x)\lambda^*(u|x)\,du\Big)^2\Big(S_C(y|x)[f(y|x) - f^*(y|x)]\,dy - [S(y|x) - S^*(y|x)]\,dS_C(y|x)\Big).$$
Thus, by (2.5), (2.7), (2.1) and (2.2),
$$\text{(5.12)} \qquad \sum_i\Big|E\Big[\Big(\int_0^{Y_i} g(u|x_i;\tau)\lambda^*(u|x_i)\,du\Big)^2\Big] - E^*\Big[\Big(\int_0^{Y_i} g(u|x_i;\tau)\lambda^*(u|x_i)\,du\Big)^2\Big]\Big| = O\big(nI^{-1}|\tau|^2\|f - f^*\|_\infty\big).$$
Also,
$$E\Big(\delta g(Y|x;\tau)\int_0^Y g(u|x;\tau)\lambda^*(u|x)\,du\Big) - E^*\Big(\delta g(Y|x;\tau)\int_0^Y g(u|x;\tau)\lambda^*(u|x)\,du\Big)$$
$$= \int S_C(t|x)g(t|x;\tau)[f(t|x) - f^*(t|x)]\Big(\int_0^t g(u|x;\tau)\lambda^*(u|x)\,du\Big)\,dt.$$
Thus, by the Schwarz inequality, (2.5), (2.1) and (2.2),
$$\text{(5.13)} \qquad \sum_i\Big|E\Big(\delta_i g(Y_i|x_i;\tau)\int_0^{Y_i} g(u|x_i;\tau)\lambda^*(u|x_i)\,du\Big) - E^*\Big(\delta_i g(Y_i|x_i;\tau)\int_0^{Y_i} g(u|x_i;\tau)\lambda^*(u|x_i)\,du\Big)\Big| = O\big(nI^{-1}|\tau|^2\|f - f^*\|_\infty\big).$$
It follows from (5.9)-(5.13) that (5.7) holds.

This completes the proof of Lemma 8.
Proof of (2.19)

Set $W_i = [B(t|x)]^T(I^*)^{-1}G^*(Y_i, \delta_i, x_i)$, $i = 1, \ldots, n$. Observe that
$$[B(t|x)]^T\bar\varphi = \sum_j\bar\varphi_j B_j(t|x) = \sum_i W_i.$$
Also, $E([B(t|x)]^T\bar\varphi) = 0$ and
$$\mathrm{var}([B(t|x)]^T\bar\varphi) = [B(t|x)]^T(I^*)^{-1}\mathrm{VC}(S^*)(I^*)^{-1}B(t|x).$$
Lemma 9 $\mathrm{var}([B(t|x)]^T\bar\varphi) \asymp n^{-1}I$.

Proof. Since $B_j \ge 0$ and $\sum_j B_j = 1$, we have that
$$\text{(5.14)} \qquad |B(t|x)| \asymp 1.$$
By (5.14), (4.8) and (2.5),
$$\text{(5.15)} \qquad |(I^*)^{-1}B(t|x)| = O(n^{-1}I).$$
By Lemma 8, (5.15) and Condition 5 [$\rho = o(1)$],
$$\text{(5.16)} \qquad \big|[B(t|x)]^T(I^*)^{-1}\mathrm{VC}(S^*)(I^*)^{-1}B(t|x) - [B(t|x)]^T(I^*)^{-1}B(t|x)\big| = o(n^{-1}I).$$
By (5.14) and (4.7),
$$\text{(5.17)} \qquad [B(t|x)]^T(I^*)^{-1}B(t|x) \asymp n^{-1}I.$$
It follows from (5.16) and (5.17) that $\mathrm{var}([B(t|x)]^T\bar\varphi) \asymp n^{-1}I$, as desired.
Lemma 10
$$\frac{[B(t|x)]^T\bar\varphi}{\mathrm{SD}([B(t|x)]^T\bar\varphi)} \xrightarrow{d} N(0,1).$$

Proof. The random variables $W_1, \ldots, W_n$ are independent with mean zero. Moreover, by (5.15) and (4.10),
$$\text{(5.18)} \qquad |W_i|^2 = \big|[B(t|x)]^T(I^*)^{-1}G^*(Y_i, \delta_i, x_i)\big|^2 = O(I^2/n^2).$$
The desired result follows from Condition 5, Lemma 9 and the central limit theorem.
Now according to Lemma 5(ii) and Condition 5,
$$\big|[B(t|x)]^T(\hat\theta - \theta^* - \bar\varphi)\big| = o_P\big(\sqrt{n^{-1}I}\big).$$
Since $\hat\phi(t|x) - \phi^*(t|x) = \sum_j(\hat\theta_j - \theta^*_j)B_j(t|x)$, we now conclude from Lemmas 9 and 10 that
$$\frac{\hat\phi(t|x) - \phi^*(t|x)}{\mathrm{SD}([B(t|x)]^T\bar\varphi)} \xrightarrow{d} N(0,1).$$
By (5.17),
$$\mathrm{AV}(\hat\phi(t|x)) = [B(t|x)]^T(I^*)^{-1}B(t|x) \asymp n^{-1}I.$$
Thus, by (5.16),
$$\mathrm{var}([B(t|x)]^T\bar\varphi) \sim \mathrm{AV}(\hat\phi(t|x)).$$
Hence,
$$\frac{\hat\phi(t|x) - \phi^*(t|x)}{\mathrm{ASD}(\hat\phi(t|x))} \xrightarrow{d} N(0,1).$$
This completes the proof of the first part of (2.19).

To prove the second part of (2.19), set $\hat I = I(\hat\theta)$. The proof depends on the next two lemmas.
Lemma 11 Uniformly in $\tau \in \Theta$,
$$|(\hat I - I^*)\tau|^2 = O_P(n|\tau|^2 I^{-1}\log I).$$

Proof. Observe that
$$\big|E[\ell''_{jk}(\hat\theta)] - E[\ell''_{jk}(\theta^*)]\big| \le \max_{m \in A}|\hat\theta_m - \theta^*_m|\Big[\max_{0 \le t \le 1}\|\exp g(\cdot|\cdot;\,\theta^* + t(\hat\theta - \theta^*))\|_\infty\Big]\sum_m\sum_i\int B_j(u|x_i)B_k(u|x_i)B_m(u|x_i)S(u|x_i)S_C(u|x_i)\,du.$$
It follows from the basic properties of B-splines, as in the proof of Lemma 6, that uniformly in $\tau \in \Theta$,
$$\sum_j\Big(\sum_k\big(E[\ell''_{jk}(\hat\theta)] - E[\ell''_{jk}(\theta^*)]\big)\tau_k\Big)^2 = O\Big(n^2\max_{m \in A}(\hat\theta_m - \theta^*_m)^2\Big[\max_{0 \le t \le 1}\|\exp g(\cdot|\cdot;\,\theta^* + t(\hat\theta - \theta^*))\|_\infty\Big]^2 I^{-2}|\tau|^2\Big).$$
The desired conclusion follows from (2.5), (2.9), Condition 5 and (2.12). This completes the proof of Lemma 11.
Lemma 12 Uniformly in $\tau \in \Theta$,
$$\big|[\hat I^{-1} - (I^*)^{-1}]\tau\big| = O_P\big(\sqrt{n^{-3}I^3\log I}\,|\tau|\big).$$

Proof. Since $\hat I^{-1} - (I^*)^{-1} = (I^*)^{-1}(I^* - \hat I)\hat I^{-1}$, the desired result follows from (2.5), (2.14), (4.8) [with $\theta = \theta^*$ and $\theta = \hat\theta$] and Lemma 11.
The proof of the second part of (2.19) will now be given. Recall that
$$\mathrm{SE}(\hat\phi(t|x)) = \sqrt{[B(t|x)]^T\hat I^{-1}B(t|x)} \qquad\text{and}\qquad \mathrm{ASD}(\hat\phi(t|x)) = \sqrt{[B(t|x)]^T(I^*)^{-1}B(t|x)}.$$
By the Schwarz inequality, (5.14), Lemma 12 and Condition 5,
$$\big|[B(t|x)]^T\hat I^{-1}B(t|x) - [B(t|x)]^T(I^*)^{-1}B(t|x)\big| = \big|[B(t|x)]^T[\hat I^{-1} - (I^*)^{-1}]B(t|x)\big| \le |B(t|x)|\,\big|[\hat I^{-1} - (I^*)^{-1}]B(t|x)\big| = O_P\big(\sqrt{n^{-3}I^3\log I}\big) = o_P(n^{-1}I).$$
It now follows from (5.17) that the second part of (2.19) holds.
Proof of (2.20)

It follows from (2.4) and (4.5) that
$$\hat\lambda(t|x) - \lambda^*(t|x) = [\hat\phi(t|x) - \phi^*(t|x)]\lambda^*(t|x) + o_P\big(\sqrt{n^{-1}I}\big).$$
By (2.5) and (5.17), $\mathrm{ASD}(\hat\lambda(t|x)) = \lambda^*(t|x)\,\mathrm{ASD}(\hat\phi(t|x)) \asymp \sqrt{n^{-1}I}$. Thus the desired result follows from $\mathrm{SE}(\hat\lambda(t|x)) = \hat\lambda(t|x)\,\mathrm{SE}(\hat\phi(t|x))$, (5.16), (2.14) and (2.19).
Proof of (2.21)

Set $\tau(\theta^*; t, x) = \int_0^t\lambda^*(u|x)\,du$, $0 \le t \le 1$. Then
$$\text{(5.19)} \qquad \nabla\tau(\theta^*; t, x) = \int_0^t\lambda^*(u|x)B(u|x)\,du.$$
It follows from (2.4) and (4.19)-(4.21) that
$$\text{(5.20)} \qquad \hat H(t|x) - H^*(t|x) = [\nabla\tau(\theta^*; t, x)]^T\bar\varphi + o_P\big(\sqrt{n^{-1}J^M}\big).$$
Now observe that
$$\text{(5.21)} \qquad \mathrm{var}([\nabla\tau(\theta^*; t, x)]^T\bar\varphi) = [\nabla\tau(\theta^*; t, x)]^T(I^*)^{-1}\mathrm{VC}(S^*)(I^*)^{-1}\nabla\tau(\theta^*; t, x).$$
By (5.19), (4.9), (2.5) and the basic properties of B-splines,
$$\text{(5.22)} \qquad |\nabla\tau(\theta^*; t, x)|^2 \asymp J^{-1}, \qquad t > 0.$$
It follows from (5.21), (5.22), (2.5), (4.8) and Lemma 8 that
$$\text{(5.23)} \qquad \mathrm{var}([\nabla\tau(\theta^*; t, x)]^T\bar\varphi) = [\nabla\tau(\theta^*; t, x)]^T(I^*)^{-1}\nabla\tau(\theta^*; t, x) + O(n^{-1}J^M\rho).$$
The following result is an easy consequence of (4.7) and (5.22).
Lemma 13 $[\nabla\tau(\theta^*; t, x)]^T(I^*)^{-1}\nabla\tau(\theta^*; t, x) \asymp n^{-1}J^M$, $t > 0$.

The next result establishes the central limit theorem for $[\nabla\tau(\theta^*; t, x)]^T\bar\varphi$. The proof is similar to the argument for Lemma 10.

Lemma 14 If $t > 0$, then
$$\frac{[\nabla\tau(\theta^*; t, x)]^T\bar\varphi}{\mathrm{SD}([\nabla\tau(\theta^*; t, x)]^T\bar\varphi)} \xrightarrow{d} N(0,1).$$
Since $\mathrm{ASD}(\hat H(t|x)) = \sqrt{[\nabla\tau(\theta^*; t, x)]^T(I^*)^{-1}\nabla\tau(\theta^*; t, x)}$, we conclude from (5.20), (5.23), and Lemmas 13 and 14 that the first part of (2.21) is valid. By (5.19), (2.14), and (4.9) with $\theta^*$ replaced by $\hat\theta$,
$$\text{(5.24)} \qquad |\nabla\tau(\hat\theta; t, x)|^2 = O_P(J^{-1}).$$
Similarly,
$$\text{(5.25)} \qquad |\nabla\tau(\hat\theta; t, x) - \nabla\tau(\theta^*; t, x)|^2 = o_P(J^{-1}).$$
Since $\mathrm{SE}(\hat H(t|x)) = \sqrt{[\nabla\tau(\hat\theta; t, x)]^T\hat I^{-1}\nabla\tau(\hat\theta; t, x)}$, we conclude from (4.8), (5.22), (5.24), (5.25), Lemmas 12 and 13, Condition 5 and the Schwarz inequality that the second part of (2.21) is valid.
Proof of (2.22)

By a similar expansion as in the proof of (2.16),
$$\hat S(t|x) - S^*(t|x) = -[\hat H(t|x) - H^*(t|x)]S^*(t|x) + o_P\big(\sqrt{n^{-1}J^M}\big).$$
According to Lemma 13 and (2.7), $\mathrm{ASD}(\hat S(t|x)) = S^*(t|x)\,\mathrm{ASD}(\hat H(t|x)) \asymp \sqrt{n^{-1}J^M}$. We conclude from $\mathrm{SE}(\hat S(t|x)) = \hat S(t|x)\,\mathrm{SE}(\hat H(t|x))$, (2.16) and (2.21) that (2.22) holds.
Proof of (2.23)

By (2.5), (2.14) and (2.16),
$$\text{(5.26)} \qquad \hat f(t|x) - f^*(t|x) = \big(\hat\lambda(t|x) - \lambda^*(t|x)\big)S^*(t|x) + \big(\hat S(t|x) - S^*(t|x)\big)\hat\lambda(t|x) = [\hat\lambda(t|x) - \lambda^*(t|x)]S^*(t|x) + O_P\big(\sqrt{n^{-1}J^M}\big).$$
By (5.22) and (5.24), respectively,
$$\text{(5.27)} \qquad \Big|\int_0^t B\lambda^*\Big|^2 = O(J^{-1})$$
and
$$\text{(5.28)} \qquad \Big|\int_0^t B\hat\lambda\Big|^2 = O_P(J^{-1}).$$
It follows from (4.7), (5.14), (5.27), (5.28), Lemma 11, Condition 5 and the Schwarz inequality that
$$\mathrm{ASD}(\hat f(t|x)) = f^*(t|x)\sqrt{\Big(B(t|x) - \int_0^t B\lambda^*\Big)^T(I^*)^{-1}\Big(B(t|x) - \int_0^t B\lambda^*\Big)} \sim f^*(t|x)\,\mathrm{ASD}(\hat\phi(t|x))$$
and
$$\mathrm{SE}(\hat f(t|x)) = \hat f(t|x)\sqrt{\Big(B(t|x) - \int_0^t B\hat\lambda\Big)^T\hat I^{-1}\Big(B(t|x) - \int_0^t B\hat\lambda\Big)} \sim \hat f(t|x)\,\mathrm{SE}(\hat\phi(t|x)).$$
The desired result follows from (5.26), (2.18), (2.19), (2.20) and $\mathrm{ASD}(\hat f(t|x)) \asymp \sqrt{I/n}$.
6 Appendix. Proof of (2.4)

Let $M_1, M_2, \ldots$ denote constants greater than 1. According to Conditions 1 and 2,
$$\text{(6.1)} \qquad\qquad (t,x) \in \mathcal{T} \times \mathcal{X},$$
and $S(t|x)S_C(t|x) = 0$ for $t > 1$.
Let $\mathcal{A}$ denote a collection of functions $\phi$ on $\mathcal{T} \times \mathcal{X}$ satisfying the Hölder condition
$$|\phi(z) - \phi(z_0)| \le \gamma|z - z_0|^\beta, \qquad z, z_0 \in \mathcal{T} \times \mathcal{X},$$
and the boundedness condition
$$\text{(6.2)} \qquad \|\phi\|_\infty \le M_2, \qquad \phi \in \mathcal{A}.$$
Note that if $\phi \in \mathcal{A}$ and $0 \le u < 1$, then $u\phi \in \mathcal{A}$. Set
$$\rho = \rho(\phi) = \inf_{g \in G}\|\phi - g\|_\infty, \qquad \phi \in \mathcal{A},$$
and note that $\rho(\phi) \le M_2$ for $\phi \in \mathcal{A}$. Writing $\phi^*$ as $Q\phi$ and closely following the argument in Stone (1989), we will obtain an inequality of the form
$$\text{(6.3)} \qquad \|\phi - Q\phi\|_\infty \le M\rho(\phi), \qquad \phi \in \mathcal{A},$$
where the positive constant $M$ depends on $\mathcal{A}$ and the degree $m$ of $G$, but not on the dimension $I$ of $G$. We conclude from (6.3) that (2.4) holds.
Let $\psi$ be a function on $\mathcal{T} \times \mathcal{X}$ such that
$$\text{(6.4)} \qquad\qquad (t,x) \in \mathcal{T} \times \mathcal{X}.$$
Consider the $I \times I$ matrix $\mathbf{M}$ whose $(j,l)$th entry is $\sum_i\int B_j(t|x_i)B_l(t|x_i)\psi(t|x_i)\,dt$ as $j$ and $l$ range over $A$. It follows from (2.1) and (2.2) that $\mathbf{M}$ is invertible. Let $\gamma_{jl}$ denote the $(j,l)$th entry of $\mathbf{M}^{-1}$. Then $\|\mathbf{M}^{-1}\|_\infty = \max_j\sum_l|\gamma_{jl}|$. By a slight extension of a result in de Boor (1976), $\|\mathbf{M}^{-1}\|_\infty \le M_4 n^{-1}I$. This has the following consequence.
We now verify (6.3). Choose $\phi \in \mathcal{A}$ and $g \in G$. Since
$$\sum_i\int\big\{[ug(t|x_i) + Q\phi(t|x_i)]\exp\phi(t|x_i) - \exp(ug(t|x_i) + Q\phi(t|x_i))\big\}S(t|x_i)S_C(t|x_i)\,dt$$
is maximized at $u = 0$,
$$\sum_i\int g(t|x_i)[\exp\phi(t|x_i) - \exp Q\phi(t|x_i)]S(t|x_i)S_C(t|x_i)\,dt = 0.$$
Consequently, for $j \in A$,
$$\text{(6.5)} \qquad \sum_i\int B_j(t|x_i)[\exp\phi(t|x_i) - \exp Q\phi(t|x_i)]S(t|x_i)S_C(t|x_i)\,dt = 0.$$
Let $\phi \in \mathcal{A}$. Then there is a $\tilde\phi \in G$ such that $\|\phi - \tilde\phi\|_\infty = \rho(\phi)$. Note that $Q\tilde\phi = \tilde\phi$. Note also that $\|\tilde\phi\|_\infty \le 2M_2$ and hence that $\exp(-2M_2) \le \exp\tilde\phi \le \exp(2M_2)$ and
$$\text{(6.6)} \qquad \|\exp\tilde\phi - \exp\phi\|_\infty \le \exp(2M_2)\rho(\phi).$$
By (6.1), (2.1), (2.3), (6.5) and (6.6),
$$\text{(6.7)} \qquad \Big|\sum_i\int B_j(t|x_i)[\exp\tilde\phi(t|x_i) - \exp Q\phi(t|x_i)]S(t|x_i)S_C(t|x_i)\,dt\Big| \le M_1 M_5 nI^{-1}\exp(2M_2)\rho(\phi), \qquad j \in A.$$
Write $Q\phi - \tilde\phi = \sum_j\theta_j B_j$ and set $\epsilon = \max_j|\theta_j|$. Now $\|Q\phi - \tilde\phi\|_\infty \le \epsilon$ and hence
$$\text{(6.8)} \qquad \|\phi - Q\phi\|_\infty \le \epsilon + \rho(\phi).$$
By repeatedly applying (viii) on page 155 of de Boor (1978), there is a positive constant $M_6$, depending only on $q$, such that
(6.9)
Since $\exp Q\phi = (\exp\tilde\phi)\exp(\sum_j\theta_j B_j)$,
$$\Big\|\exp Q\phi - \exp\tilde\phi - (\exp\tilde\phi)\sum_j\theta_j B_j\Big\|_\infty \le \exp(2M_2 + \epsilon)\,\frac{\epsilon^2}{2}.$$
We now conclude from (2.1), (2.2) and (6.7) that, for $j \in A$,
$$\Big|\sum_i\int B_j(t|x_i)\sum_l\theta_l B_l(t|x_i)\exp\tilde\phi(t|x_i)S(t|x_i)S_C(t|x_i)\,dt\Big| \le M_1 M_5 nI^{-1}\exp(2M_2)\Big(\rho(\phi) + \exp(\epsilon)\frac{\epsilon^2}{2}\Big).$$
According to (6.4), (6.7) and Lemma 16 applied to $\psi = S\,S_C\exp\tilde\phi$ [with $M_3 = M_1\exp(2M_2)$],
$$\epsilon \le M_1 M_4 M_5\exp(2M_2)\Big(\rho(\phi) + \exp(\epsilon)\frac{\epsilon^2}{2}\Big).$$
Suppose now that
(6.10)
Then
(6.11)
where $M_7 = 2[M_1 M_4 M_5\exp(2M_2) + 1]$. According to (6.9), a sufficient condition for (6.10), and hence for (6.11), is
(6.12)
Let $0 < \rho_0 < 2^{-1}M_7^{-1}M_6^{-1}$. There is a positive integer $I_0$, depending on $M_1$ and the degree of $G$, such that
$$\text{(6.13)} \qquad \rho(\phi) \le \rho_0, \qquad I \ge I_0 \text{ and } \phi \in \mathcal{A}$$
[see Theorem 12.8 of Schumaker (1981)]. Let $I \ge I_0$. Suppose that
(6.14)
Since $\|\phi - \tilde\phi\|_\infty = \rho(\phi) \le 2^{-1}M_6^{-1}$, (6.12) holds.
We will now verify that (6.14) necessarily holds for $I \ge I_0$. Suppose not. Now $\|u\phi - Q(u\phi)\|_\infty$ is continuous in $u$ for $0 \le u < 1$ (since the expected log-likelihood is a strictly concave function of $\theta_1, \ldots, \theta_I$ and it is continuous in $u, \theta_1, \ldots, \theta_I$) and it approaches 0 as $u \to 0$. Thus there is a value of $u \in (0,1)$ such that $\|u\phi - Q(u\phi)\|_\infty = 2^{-1}M_6^{-1}$. By the previous argument, (6.11) and (6.13) hold with $\phi$ replaced by $u\phi$, which yields a contradiction.

We have now shown that
$$\|\phi - Q\phi\|_\infty \le M_8\rho(\phi), \qquad I \ge I_0 \text{ and } \phi \in \mathcal{A}.$$
To complete the proof of (6.3) we need to show that
$$\|\phi - Q\phi\|_\infty \le M_8\rho(\phi), \qquad I < I_0 \text{ and } \phi \in \mathcal{A}.$$
But this result, for each $I$, follows in a straightforward manner by a compactness argument.
REFERENCES

ANDERSEN, P. K., BORGAN, Ø., GILL, R. D. AND KEIDING, N. (1993). Statistical Models Based on Counting Processes. Springer-Verlag, New York.

DE BOOR, C. (1976). A bound on the $L_\infty$-norm of the $L_2$-approximation by splines in terms of a global mesh ratio. Math. Comp. 30 765-771.

DE BOOR, C. (1978). A Practical Guide to Splines. Springer-Verlag, New York.

BRESLOW, N. E. AND CROWLEY, J. J. (1974). A large sample study of the life table and product limit estimates under random censorship. Ann. Statist. 2 437-453.

DEVORE, R. A. AND LORENTZ, G. G. (1993). Constructive Approximation. Springer-Verlag, Berlin.

FLEMING, T. R. AND HARRINGTON, D. P. (1991). Counting Processes and Survival Analysis. Wiley, New York.

INTRATOR, O. AND KOOPERBERG, C. (1995). Trees and splines in survival analysis. Statist. Meth. Med. Res. 4 237-261.

KOOPERBERG, C., STONE, C. J. AND TRUONG, Y. K. (1995a). Hazard regression. J. Amer. Statist. Assoc. 90 78-94.

KOOPERBERG, C., STONE, C. J. AND TRUONG, Y. K. (1995b). The $L_2$ rate of convergence for hazard regression. Scand. J. Statist. 22 143-157.

KOOPERBERG, C., STONE, C. J. AND TRUONG, Y. K. (1996). Entry on 'Hazard regression'. To appear in the Encyclopedia of Statistical Sciences. Wiley.

SCHUMAKER, L. L. (1981). Spline Functions: Basic Theory. Wiley, New York.

STONE, C. J. (1980). Optimal rates of convergence for nonparametric estimators. Ann. Statist. 8 1348-1360.

STONE, C. J. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Statist. 10 1040-1053.

STONE, C. J. (1983). Optimal uniform rate of convergence for nonparametric estimators of a density function or its derivatives. In Recent Advances in Statistics: Papers in Honor of Herman Chernoff on His Sixtieth Birthday (M. H. Rizvi, J. S. Rustagi and D. Siegmund, eds.) 393-406. Academic Press, New York.

STONE, C. J. (1989). Uniform error bounds involving logspline models. In Probability, Statistics and Mathematics: Papers in Honor of Samuel Karlin (T. W. Anderson, K. B. Athreya and D. L. Iglehart, eds.) 335-355. Academic Press, Boston.

STONE, C. J. (1990). Large-sample inference for log-spline models. Ann. Statist. 18 717-741.

STONE, C. J. (1991). Asymptotics for doubly flexible logspline response models. Ann. Statist. 19 1832-1854.

STONE, C. J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation (with discussion). Ann. Statist. 22 118-184.

Department of Biostatistics        Department of Statistics
University of North Carolina       University of California
Chapel Hill, NC 27599-7400         Berkeley, CA 94720-3860