O'Grady, Haesook Kim (1994). "Nonparametric Regression Estimates with Censored Data."

Haesook Kim O'Grady
Nonparametric Regression Estimates with Censored Data
Department of Biostatistics
University of North Carolina
Mimeo Series No. 2126T
May 1994
NONPARAMETRIC REGRESSION ESTIMATES WITH
CENSORED DATA
by
Haesook Kim O'Grady
Department of Biostatistics
University of North Carolina
Institute of Statistics
Mimeo Series No. 2126T
May 1994
NONPARAMETRIC REGRESSION ESTIMATES WITH
CENSORED DATA

by

Haesook Kim O'Grady

A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill
in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the
Department of Biostatistics.

Chapel Hill
1994

Approved by:

_______________________ Advisor

_______________________ Reader
@1994
Haesook Kim O'Grady
ALL RIGHTS RESERVED
ABSTRACT
HAESOOK KIM O'GRADY. Nonparametric Regression Estimates with Censored Data.
(Under the direction of Young Truong)
When an individual's survival time is measured in a study, it is often accompanied by
measurements of covariates whose values may be thought to characterize the subpopulations of individuals. An important goal is to measure the expected survival time in the
subpopulation to which an individual belongs.
Several techniques have been developed to estimate the conditional mean survival
time. One of these techniques (Cox, 1972) bases its approach on the proportional hazards
model. The other methods, due to Miller (1976), Buckley and James (1979), Koul, Susarla
and Van Ryzin (1981) and Leurgans (1987), are based on the standard linear model. A
major problem with these approaches is that both models give a systematic bias when
estimating regression functions if the model assumption is incorrect. When the nature of the
dependence on covariates is not clear, a fully nonparametric approach seems advantageous
in exploratory analysis and in the examination of large data sets (e.g., clinical trials) where
more restrictive models may miss interesting features.
This dissertation proposes a nonparametric smoothing approach for estimating the
conditional mean and median survival functions based on data subject to random right
censoring. Beran (1981), followed by Doksum and Yandell (1981) and Dabrowska (1987),
suggested a kernel smoothing technique based on local constant fits to estimate the conditional
mean and median survival functions. However, Beran's estimate is not design-adaptive.
Thus, the bias of these estimators can have an adverse effect when the derivative
of the marginal density or regression function is large. To remedy the problems encountered
in the previous parametric or nonparametric approaches we propose, in this dissertation, a
fully nonparametric technique based on a design-adaptive approach which uses local linear
fits to estimate the conditional mean and median survival functions.
The asymptotic properties of the estimates of the conditional subsurvival function,
cumulative hazard function, survival functions, and mean and median functions are established.
The estimates of the conditional mean and median survival functions are compared with
existing estimates, such as the Cox and Beran estimates, using computer simulated data.
The procedure is also applied to the Stanford heart transplant data and SOLVD (Studies
Of Left Ventricular Dysfunction) data. For the purpose of statistical inference, the 95%
pointwise confidence intervals and 95% simultaneous confidence bands are constructed for
the conditional mean regression function of the 90% and 75% uncensored simulated data
sets, Stanford heart transplant data, and SOLVD data.
ACKNOWLEDGEMENTS
First, I would like to thank my committee members, Dr. Young Truong, Dr. P. K. Sen,
Dr. Dana Quade, Dr. Ed Davis, Dr. Al Tyroler, and Dr. Larry Kupper for their
support throughout the program in Biostatistics. A special thanks to my advisor, Dr.
Young Truong, who has guided me through the dissertation. Lastly, I would like to express
my deep gratitude to my parents for their inspiration, encouragement, and love.
Table of Contents

1 Introduction and Literature Review                                        1
  1.1 Introduction                                                          1
  1.2 Literature Review                                                     4
    1.2.1 An estimator based on Cox's proportional hazards model            5
    1.2.2 The Miller estimator                                              6
    1.2.3 The Buckley-James estimator                                       8
    1.2.4 The Koul-Susarla-Van Ryzin (KSV) estimator                        10
    1.2.5 The Leurgans estimator                                            11
    1.2.6 Kernel Smoothing Technique                                        12
    1.2.7 A kernel estimator based on local linear fits                     16
    1.2.8 The Beran estimator                                               18
    1.2.9 The Fan-Gijbels estimator                                         21
  1.3 Proposed Research                                                     22
    1.3.1 Method                                                            23

2 Design-Adaptive Kernel Estimator with Censored Data                       26
  2.1 Introduction                                                          26
  2.2 Method                                                                27
    2.2.1 Application of the Local Linear Smoothing Technique               29
    2.2.2 Selection of the smoothing parameter h                            30
  2.3 Asymptotic Properties                                                 31
    2.3.1 Consistency                                                       32
    2.3.2 Asymptotic Normality                                              33
  2.4 Confidence Bands                                                      38

3 Computer Simulation and Data Analysis                                     40
  3.1 Computer Simulation                                                   41
  3.2 Stanford Heart Transplant data                                        43
  3.3 SOLVD (Studies Of Left Ventricular Dysfunction) data                  44

4 Summary and Future Research                                               66
  4.1 Concluding Remarks                                                    66
  4.2 Future Research                                                       67

5 Proofs                                                                    70
  5.1 Consistency                                                           70
  5.2 Asymptotic Normality                                                  75

Appendix                                                                    110
  A.1 Introduction                                                          110
  A.2 The simulation experiment: rationale and S-plus code                  110
    A.2.1 Generation of simulated data sets                                 110
    A.2.2 Computing the proposed mean and median estimates                  111
    A.2.3 Computing Cox's mean and median regression estimates with
          quadratic fitting                                                 114
  A.3 Real data analysis I: Stanford heart transplant data                  116
    A.3.1 Computing the proposed mean and median regression estimates       116
    A.3.2 Computing Cox's quadratic mean and median regression estimates    118
  A.4 Real data analysis II: SOLVD data                                     120
    A.4.1 Computing the proposed mean and median estimates                  120
  A.5 Computing Cox's quadratic mean and median estimates                   122

References                                                                  124
List of Figures

1.1 Potatoes versus net income. A linear parametric fit of Y = expenditure for
    potatoes versus X = net income of British householders (straight line) and a
    nonparametric kernel smoother (bandwidth = 0.4) for the same variables, year
    1973, n = 7125. Units are multiples of mean income and mean expenditure,
    respectively. Family Expenditure Survey (1968-1983).                          14

3.1 True mean regression line for the 90% uncensored data.                        48

3.2 Mean conditional censoring line for the 90% uncensored data.                  49

3.3 Simulated data set with 90% uncensoring, estimated mean regression curves:
    Solid line - true mean regression curve; Dotted line - Beran's estimation
    curve with h = 2; Long dashed line - Cox's estimation curve with linear term;
    Short dashed line - Cox's estimation curve with linear and quadratic term;
    Dashed line - proposed estimation curve with h = 3. *: uncensored
    observation. o: censored observation.                                         50

3.4 Simulated data set with 75% uncensoring, estimated mean regression curves:
    Solid line - true mean regression curve; Dotted line - Beran's estimation
    curve with h = 2; Long dashed line - Cox's estimation curve with linear term;
    Short dashed line - Cox's estimation curve with linear and quadratic term;
    Dashed line - proposed estimation curve with h = 2. *: uncensored
    observation. o: censored observation.                                         51

3.5 Simulated data set with 50% uncensoring, estimated mean regression curves:
    Solid line - true mean regression curve; Dotted line - Beran's estimation
    curve with h = 2; Long dashed line - Cox's estimation curve with linear term;
    Short dashed line - Cox's estimation curve with linear and quadratic term;
    Dashed line - proposed estimation curve with h = 2. *: uncensored
    observation. o: censored observation.                                         52

3.6 Simulated data set with 25% uncensoring, estimated mean regression curves:
    Solid line - true mean regression curve; Dotted line - Beran's estimation
    curve with h = 2; Long dashed line - Cox's estimation curve with linear term;
    Short dashed line - Cox's estimation curve with linear and quadratic term;
    Dashed line - proposed estimation curve with h = 4. *: uncensored
    observation. o: censored observation.                                         53

3.7 Stanford heart transplant data, estimated mean regression curves: Dotted
    line - Beran's estimation curve with h = 4; Long dashed line - Cox's estimation
    curve with linear term; Short dashed line - Cox's estimation curve with linear
    and quadratic term; Dashed line - proposed estimation curve with h = 4.
    *: uncensored observation. o: censored observation.                           54

3.8 SOLVD data, estimated mean regression curves: Dotted line - Beran's
    estimation curve with h = 3; Long dashed line - Cox's estimation curve with
    linear term; Short dashed line - Cox's estimation curve with linear and
    quadratic term; Dashed line - proposed estimation curve with h = 3.
    *: uncensored observation. o: censored observation.                           55

3.9 Simulated data set with 90% uncensoring, estimated median regression curves:
    Solid line - true mean regression curve; Dotted line - Beran's estimation
    curve with h = 2; Long dashed line - Cox's estimation curve with linear term;
    Short dashed line - Cox's estimation curve with linear and quadratic term;
    Dashed line - proposed estimation curve with h = 3. *: uncensored
    observation. o: censored observation.                                         56

3.10 Simulated data set with 75% uncensoring, estimated median regression curves:
    Solid line - true mean regression curve; Dotted line - Beran's estimation
    curve with h = 2; Long dashed line - Cox's estimation curve with linear term;
    Short dashed line - Cox's estimation curve with linear and quadratic term;
    Dashed line - proposed estimation curve with h = 2. *: uncensored
    observation. o: censored observation.                                         57

3.11 Simulated data set with 50% uncensoring, estimated median regression curves:
    Solid line - true mean regression curve; Dotted line - Beran's estimation
    curve with h = 2; Long dashed line - Cox's estimation curve with linear term;
    Short dashed line - Cox's estimation curve with linear and quadratic term;
    Dashed line - proposed estimation curve with h = 2. *: uncensored
    observation. o: censored observation.                                         58

3.12 Simulated data set with 25% uncensoring, estimated median regression curves:
    Solid line - true mean regression curve; Dotted line - Beran's estimation
    curve with h = 2; Long dashed line - Cox's estimation curve with linear term;
    Short dashed line - Cox's estimation curve with linear and quadratic term;
    Dashed line - proposed estimation curve with h = 2. *: uncensored
    observation. o: censored observation.                                         59

3.13 Stanford heart transplant data, estimated median regression curves: Dotted
    line - Beran's estimation curve with h = 4; Long dashed line - Cox's estimation
    curve with linear term; Short dashed line - Cox's estimation curve with linear
    and quadratic term; Dashed line - proposed estimation curve with h = 4.
    *: uncensored observation. o: censored observation.                           60

3.14 SOLVD data, estimated median regression curves: Dotted line - Beran's
    estimation curve with h = 3; Long dashed line - Cox's estimation curve with
    linear term; Short dashed line - Cox's estimation curve with linear and
    quadratic term; Dashed line - proposed estimation curve with h = 3.
    *: uncensored observation. o: censored observation.                           61

3.15 95% pointwise confidence interval and simultaneous confidence bands for the
    90% uncensored simulated data set. Solid line - true mean regression curve;
    Dotted lines - estimated mean regression line with its upper and lower
    pointwise confidence intervals; Vertical lines - simultaneous confidence bands
    on distinct x points. *: uncensored observation. o: censored observation.     62

3.16 95% pointwise confidence interval and simultaneous confidence bands of the
    75% uncensored simulated data set. Solid line - true mean regression curve;
    Dotted lines - estimated mean regression line with its upper and lower
    pointwise confidence intervals; Vertical lines - simultaneous confidence bands
    on distinct x points. *: uncensored observation. o: censored observation.     63

3.17 95% pointwise confidence interval and simultaneous confidence bands of the
    Stanford heart transplant data set. Solid line - estimated mean regression
    curve with its upper and lower pointwise confidence intervals; Vertical lines
    - simultaneous confidence bands on distinct x points. *: uncensored
    observation. o: censored observation.                                         64

3.18 95% pointwise confidence interval and simultaneous confidence bands of the
    Stanford heart transplant data set. Solid line - estimated mean regression
    curve with its upper and lower pointwise confidence intervals; Vertical lines
    - simultaneous confidence bands on distinct x points. *: uncensored
    observation. o: censored observation.                                         65
Chapter 1

Introduction and Literature Review

1.1 Introduction
Survival analysis deals with the distribution of duration (in time) of an event. This is often
referred to as failure time. For example, failure times include the lifetime of a physical
component (electrical or mechanical), the time to the death of a biological unit (animal,
cell, etc.), or the time until myocardial infarction or cardiovascular death in a clinical trial.
What makes survival analysis interesting is censoring. Some individuals may be observed for the full time to failure, having an event before the end of a clinical trial, but
others will survive to the end of the trial: they are said to be censored. Thus, a censored
observation contains partial information about the survival time of the individual.
There are several types of censoring. Right censoring occurs at time C if the exact
survival time of an individual is known to be greater than C. Similarly, left censoring
occurs at time C if the exact survival time of an individual is known to be less than C.
Right censoring is common in clinical trials. In this dissertation, only right censoring is
considered.
There are several ways in which right censoring can occur. Suppose that, in the absence
of censoring, the $i$th individual in a sample of $n$ would have failure time $T_i$, a random
variable. Suppose also that there is a period of observation $C_i$ such that observation on that
individual ceases at $C_i$ if failure has not occurred by then. Then the observation consists
of $Y_i = \min(T_i, C_i)$, together with the indicator variable $\delta_i = 1$ if $T_i \le C_i$ (uncensored),
$\delta_i = 0$ if $T_i > C_i$ (censored). Type I right censoring occurs when $C_i = c$, where $c$ is some
(preassigned) fixed number which we call the fixed censoring time. Type II right censoring
occurs when a predetermined number $r$ ($r < n$) of failures is observed, so that $c$ becomes
a random variable. Random right censoring occurs when the $C_i$ are i.i.d. random variables
associated with the $T_i$. Random censoring arises in medical applications with animal studies
or clinical trials (Cox and Oakes, 1984). In a clinical trial, even though the trial ends at a
predetermined time $c$, because patients may leave or enter the study at random times, the
censoring is considered random.
When an individual's survival time is measured in a study, it is often accompanied by
measurements of covariates whose values are thought to characterize the subpopulations of
individuals. An important goal is to measure the mean survival time in the subpopulation
to which an individual belongs.
To illustrate the association between survival time and covariates, let us consider the
LRC-CPPT study in cardiovascular disease. For a randomly selected individual, the time
of death due to cardiovascular disease $T$ and the time of censoring $C$ due to the closure of
the trial are non-negative random variables. We observe the random vector $(Y, \delta, X)$, where
\[ Y = \min(T, C) \]
is the length of time for the individual between entering the study and leaving the study
by death or closure of the study,
\[ \delta = I(T \le C) \tag{1.1} \]
indicates the mode of leaving, and $X$ is a random vector of covariates such as age, systolic
blood pressure, cholesterol level, or educational level associated with the individual. The
tie-breaking rule in (1.1) is that if the time of death coincides with the potential time of
censoring, it will be recorded as a death. Of particular interest is
\[ m(x) = E(T \mid X = x), \tag{1.2} \]
an individual's expected survival time given that $X = x$. For example, if $T$ is the time of
death due to coronary heart disease, an interesting goal is to measure the expected survival
time for patients aged 50 and compare the mean survival time of the patients in the drug
(cholestyramine) group to the mean survival time of the patients in the placebo group.
Several techniques have been developed to estimate the conditional mean survival
time (1.2). One of these techniques is based on the proportional hazards model (Cox, 1972).
The other methods, due to Miller (1976), Buckley and James (1979), Koul, Susarla and Van
Ryzin (1981) and Leurgans (1987), are based on the standard linear model. These methods,
although parametric in principle, are accompanied by some nonparametric techniques in
order to estimate the distribution function and various other functions (e.g., estimating
the intercept in the Buckley-James method). A major problem with these approaches is
that both models give a systematic bias when estimating regression functions if the model
assumption is incorrect. When the relationship between the survival time and the related
covariates is not clear, a fully nonparametric technique is advantageous in the analysis
of large data sets and in exploratory analysis (where more restrictive models may miss
interesting features). Beran (1981) and Fan and Gijbels (1992) have suggested a fully
nonparametric approach to estimate the conditional mean survival function using a kernel
smoothing technique. Since minimal assumptions on the underlying model are imposed,
though some smoothness condition on the regression function is needed (see Section 2.3.1
for more details), nonparametric methods are very flexible and can be important diagnostic
tools for verifying the parametric assumption.
In the nonparametric approach, there have been various techniques to estimate regres-
sion functions for complete data. They include kernel type estimators (the local averaging
method, the nearest neighbor method, the local polynomial fits method, etc.), orthogonal
series type estimators, and spline type estimators. See Silverman (1986) and Härdle (1990)
for more details. For both practical and theoretical reasons, the kernel type estimator is
widely used and well developed in the literature. Among kernel type estimators, the nearest
neighbor and the local constant methods are frequently used. However, as Fan (1991), Fan
et al. (1991a, 1991b) and Fan and Gijbels (1991) point out, these two methods have an
unpleasant characteristic: the asymptotic bias depends on the derivative of the marginal
density of X. To remedy this characteristic of these methods, Fan (1991), Fan et al. (1991a,
1991b) and Fan and Gijbels (1991) proposed a design-adaptive approach using local linear
fits that greatly generalizes regression function estimations. This estimator has some major
advantages over other kernel methods. First, the bias of the estimator does not contain the
derivative of the marginal density $f_X$, so it has smaller bias than other kernel methods, and
estimation of the derivative of the marginal density of X is not necessary. Second, optimal
rates of convergence can be achieved without imposing extra smoothness conditions on the
marginal density of X. Third, the estimator does not have a boundary adaptation problem.
In other words, the local linear fits behave better than other kernel methods, and the
convergence rate remains the same near the boundaries of X (Fan, 1991). See Stone (1982)
for the optimal convergence rate of the kernel type estimator. The above discussion remains
true for censored data, so the approach can also be applied in that setting. Beran (1981) applied
the local averaging method and the nearest neighbor method to censored data (See Section
1.2.8). Fan and Gijbels (1992) used local linear fits for censored data (See Section 1.2.9).
In this proposed research, I follow Beran's approach to estimate the conditional mean
and median survival functions, but adopt the local linear fits method of Fan (1991), Fan
et al. (1991a, 1991b) and Fan and Gijbels (1991) to alleviate the problems of the local
constant fits noted above. See Section 1.3 for further discussion.
1.2 Literature Review
Within the past two decades various estimators have been developed for handling regression
problems in which the dependent variable is subject to censoring. These techniques have
been based on parametric models (Elandt-Johnson and Johnson, 1980), on partly nonparametric
models (Cox (1972), Miller (1976), Buckley and James (1979), Koul, Susarla and
Van Ryzin (1981) and Leurgans (1987)), or on fully nonparametric models (Beran (1981)
and Fan and Gijbels (1992)). When the dependence of a response variable on covariates is
unclear, a fully nonparametric approach is advantageous in a preliminary and exploratory
statistical analysis of a data set. Doksum and Yandell (1982) illustrate these points clearly
by comparing two nonparametric regression estimates of survival time, Beran's running
product limit median and a running product limit mean, with Cox and Buckley-James
estimates on Stanford heart transplant data.
To estimate the effect of a covariate on a response variable, the conditional mean
function or some robust functions (median, percentiles, etc.) may be considered. Some
existing methods, reviewed in this section, are for the estimation of the conditional mean
and median regression functions; those based on the linear model are strictly for the
estimation of the conditional mean regression function.
1.2.1 An estimator based on Cox's proportional hazards model
If $F(t)$ and $f(t)$ are the underlying distribution and density functions for the survival time
$T$, the hazard rate, $\lambda(t)$, is given by
\[ \lambda(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T > t)}{\Delta t} = \frac{f(t)}{1 - F(t)}. \tag{1.3} \]
Expression (1.3) implies that the hazard at time $t$ is approximately the probability that
the event (e.g., cardiovascular death) will occur in a small interval around $t$, given that the
event has not occurred by time $t$. The proportional hazards model assumes that
\[ \lambda(t|x) = \lambda_0(t)e^{x\beta}, \tag{1.4} \]
where $\beta$ is the vector of regression coefficients of the independent variables $x$ and $\lambda_0(t)$ is
the hazard rate when $x = 0$ (i.e., the baseline hazard). This assumption implies that the
relationship between the independent variables and the log hazard ($\log \lambda(t|x)$) or the log
cumulative hazard ($\log \Lambda(t|x)$) should be linear, and that the effect of the independent
variables is the same at all values of $t$, since $\log \lambda_0(t)$ can be separated from $x\beta$.
Assuming the model (1.4) is correct, Cox proposed a partial likelihood approach to
estimate $\beta$, since the unknown baseline hazard function $\lambda_0(t)$ prevents constructing a full
likelihood function (Cox, 1972, 1975). He argued that if the model (1.4) holds, information
about $\lambda_0(t)$ is not useful for estimating the parameter of primary interest, $\beta$ (Cox, 1972).

Let $t_1 < t_2 < \cdots < t_m$ denote the distinct ordered uncensored observations of the $n$
subjects in a sample, assuming no tied uncensored observations, and let $R_i$ be the set of
subjects $j$ that have not failed or been censored at time $t_i$, i.e., the failure or censoring
times of the subjects in the risk set $R_i$ satisfy $\{Y_j \ge t_i,\ j = 1, \ldots, n\}$.
The conditional probability that the $i$th subject is the one that fails at $t_i$, given that
only one subject fails among those subjects in the risk set $R_i$ at $t_i$, is
\[ P\{\text{subject } i \text{ fails at } t_i \mid \text{one subject in } R_i \text{ fails at } t_i\}
= \frac{\lambda_0(t_i)\exp(x_i\beta)}{\sum_{j \in R_i} \lambda_0(t_i)\exp(x_j\beta)}
= \frac{\exp(x_i\beta)}{\sum_{j \in R_i} \exp(x_j\beta)}
= \frac{\exp(x_i\beta)}{\sum_{Y_j \ge t_i} \exp(x_j\beta)}. \]
Thus, the partial likelihood is
\[ L(\beta) = \prod_{i=1}^{m} \frac{\exp(x_i\beta)}{\sum_{j \in R_i} \exp(x_j\beta)}. \tag{1.5} \]
The MLE of $\beta$ from (1.5) is obtained by solving for the root of the equations
$\partial \log L / \partial \beta = 0$.

In the literature, several suggestions to account for ties have been made. If ties occur
between uncensored and censored observations, the uncensored times are considered as
preceding the censored ones. Once $\hat\beta$ has been obtained, Cox uses a product-limit approach
to derive an estimate of the distribution $F$.
Several authors have proposed methods for estimating the conditional survival function
\[ S(t|x) = 1 - F(t|x) = \exp\Big(-e^{x\beta}\int_0^t \lambda_0(u)\,du\Big) = \exp\big(-\Lambda_0(t)e^{x\beta}\big). \tag{1.6} \]
The method adopted in this dissertation is to estimate $\lambda_0(t)$ by Breslow's (1974) estimator
\[ \hat\lambda_0(u) = \frac{d_i}{(t_i - t_{i-1})\sum_{j \in R_i} e^{x_j\hat\beta}} \tag{1.7} \]
for $t_{i-1} < u \le t_i$, where $t_1 < t_2 < \cdots$ are the ordered distinct uncensored observations
and $d_i$ is the number of deaths at $t_i$, and then to substitute $\hat\Lambda_0(t)$ and $\hat\beta$ into (1.6) to get
the estimator of the conditional survival function, $S_n(t|x)$. Thus Cox's estimator of the
conditional mean regression function $m(x) = E(T \mid X = x)$ is
\[ m_n(x) = \int_0^\infty S_n(t|x)\,dt = \int_0^\infty \exp\Big\{-e^{x\hat\beta}\int_0^t \hat\lambda_0(u)\,du\Big\}\,dt, \]
where $\hat\lambda_0$ is given in (1.7), and the estimator of the conditional median regression function
is
\[ \mathrm{med}_n(x) = \frac{\mathrm{med}_n^-(x) + \mathrm{med}_n^+(x)}{2}, \]
where
\[ \mathrm{med}_n^-(x) = \inf\{t : S_n(t|x) < \tfrac12\} \quad \text{and} \quad \mathrm{med}_n^+(x) = \sup\{t : S_n(t|x) > \tfrac12\}. \]
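As a computational aside, the Cox-based mean estimate above can be obtained with standard survival software. The following R/S-plus style sketch is illustrative only (the column names time, status and x, and the function name cox.mean, are hypothetical, and this is not the code of the Appendix); it fits a proportional hazards model and integrates the estimated conditional survival step function:

    library(survival)

    # Sketch: Cox-based estimate of m(x0) = E(T | X = x0), computed by
    # integrating the estimated conditional survival curve S_n(t | x0).
    cox.mean <- function(data, x0) {
      fit <- coxph(Surv(time, status) ~ x, data = data)
      sf  <- survfit(fit, newdata = data.frame(x = x0))  # S_n(t | x = x0)
      tt  <- c(0, sf$time)                               # grid of jump points
      ss  <- c(1, sf$surv)                               # right-continuous step values
      sum(diff(tt) * ss[-length(ss)])                    # area under the step function
    }

The integral is effectively truncated at the largest observed time, the same identifiability device used for the nonparametric estimators in Section 1.2.8.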
1.2.2 The Miller estimator
The standard linear regression model for $n$ pairs of variates $(t_i, x_i)$, $i = 1, \ldots, n$, is
\[ T_i = \alpha + x_i\beta + e_i, \]
where the random variables $e_i$ are assumed to be i.i.d. with mean $\mu = 0$ and finite variance
$\sigma^2$. As before, the observable $Y_i$ is defined as $Y_i = \min(T_i, C_i)$.
Miller (1976) suggested the conditional mean survival function based on the standard
linear model $F(t; x) = F(t - \alpha - x\beta)$ with
\[ E(T|x) = \alpha + x\beta, \tag{1.8} \]
where $\alpha$ is the intercept and $\beta$ is the vector of regression coefficients for the independent
variables in $x$. If $T$ is measured on a log scale so that $T = \log U$, where $U$ is the actual
survival time, then (1.8) corresponds to an accelerated time model.

To estimate $\alpha$ and $\beta$, we minimize the sum of squares
\[ \int e^2 \, d\hat F(e; a, b) \tag{1.9} \]
with respect to $a$ and the vector $b$, where $\hat F(e; a, b)$ is the product-limit estimator based
on the $\delta_i$ and the residuals $e_i = e_i(a, b) = y_i - a - x_i b$ for $i = 1, \ldots, n$. Specifically,
\[ 1 - \hat F(e; a, b) = \prod_{e_{(i)} \le e} \big(1 - d_{(i)}/n_{(i)}\big)^{\delta_{(i)}}, \]
where $e_{(1)} < e_{(2)} < \cdots$ are the ordered distinct values of $e_i$, $n_{(i)}$ is the number at risk at
$e_{(i)}^-$, $d_{(i)}$ is the number dying at $e_{(i)}$, and $\delta_{(i)} = 1$ if $d_{(i)} > 0$, $= 0$ otherwise. Expression
(1.9) is a generalization of the usual sum of squares $\sum (y_i - a - x_i b)^2$ for uncensored data.
It is difficult to locate the infimum of (1.9) because it is a discontinuous function of
$b$. Therefore, Miller proposed using an iterative sequence to calculate the estimate of the
regression coefficient vector $\beta$:
\[ \hat\beta_{k+1} = \{(X - \bar X_w)^T W(\hat\beta_k)(X - \bar X_w)\}^{-1}(X - \bar X_w)^T W(\hat\beta_k)\,y, \tag{1.10} \]
where
\[ X = ((x_{ij})), \qquad \bar X_w = \Big(\Big(\sum_{i=1}^n w_i(\hat\beta_k)\,x_{ij}\Big)\Big), \tag{1.11} \]
and
\[ W(\hat\beta_k) = \mathrm{diag}\big(w_1(\hat\beta_k), \ldots, w_n(\hat\beta_k)\big). \tag{1.12} \]
The limit of the sequence $\hat\beta_k$ for $k = 0, 1, \ldots$ is the estimate of $\beta$. Because of discontinuities
in the weights as functions of $\hat\beta_k$, the sequence may become trapped in a loop. If the values
in the loop are not far apart, an average value over the loop can be used for the estimate.
The weight $w_i(\hat\beta_k)$ in (1.12) is the size of the jump assigned to $e_i^0 = e_i(0, \hat\beta_k) = y_i - x_i\hat\beta_k$
by the Kaplan-Meier estimator applied to $e_1^0, \ldots, e_n^0$. Inclusion of an intercept estimate $\hat\alpha_k$
in the estimate of $e_i$ is unnecessary since the weights are invariant under location shifts in
the data. If the largest $e_i^0$ is censored, the mass $1 - \hat F(+\infty; 0, \hat\beta_k)$ is unassigned to any $e_i^0$.
The convention which seems to work best for these estimators is to normalize the weights so
that they sum to one in (1.12) and (1.14), but to assign the remaining mass $1 - \hat F(+\infty; 0, \hat\beta_k)$
to the largest $e_i^0$ in (1.13).
Only the uncensored $y_i$ receive nonzero weights in (1.10). For this reason it makes
sense to use as a starting value $\hat\beta_0$ the ordinary unweighted least squares estimator applied
to just the uncensored data. For the limiting value $\hat\beta$ the associated estimate of the intercept
is
\[ \hat\alpha = \sum_{i=1}^n w_i(\hat\beta)(y_i - x_i\hat\beta). \tag{1.13} \]
If the variation in $\hat\beta$ due to an estimate of $\beta$ rather than the true $\beta$ being used in the
computation of the weights is ignored, then an estimate of the covariance matrix for $\hat\beta$ is
\[ \Big\{\sum_{i=1}^n w_i^2(\hat\beta)(y_i - \hat\alpha - x_i\hat\beta)^2\Big\}\big\{(X - \bar X_w)^T W(\hat\beta)(X - \bar X_w)\big\}^{-1}. \tag{1.14} \]
For $\hat\beta$ to be consistent it is necessary that the censoring distributions $G(t; x)$ satisfy
\[ G(t; x) = G(t - \alpha - x\beta). \]
This assumption requires that, as $x$ changes, the censoring distributions shift along the
same line as the survival distributions.
Finally, we note that Miller's estimator is not consistent; a counterexample is given in
Mauro (1983).
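To make the role of the weights $w_i(\beta)$ concrete, the following R/S-plus style sketch computes the Kaplan-Meier jumps assigned to the residuals $e_i^0 = y_i - x_i\beta$, as used in (1.10)-(1.14). It is a minimal sketch under simplifying assumptions (no tied residuals; the simple renormalization convention described above), and the names are hypothetical:

    # Sketch: Kaplan-Meier jump weights on the residuals e_i = y_i - x_i * beta.
    # Censored residuals receive zero jumps; weights are renormalized to sum to one.
    km.weights <- function(y, x, delta, beta) {
      e    <- y - x * beta
      ord  <- order(e)
      d    <- delta[ord]
      n    <- length(e)
      surv <- cumprod(1 - d / (n - seq_len(n) + 1))  # product-limit on sorted residuals
      w    <- c(1, surv[-n]) - surv                  # jump sizes at each residual
      w    <- w / sum(w)                             # renormalization convention
      w[order(ord)]                                  # back to the original order
    }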
1.2.3 The Buckley-James estimator
To overcome the inconsistency problems of Miller's estimator, Buckley and James (1979)
introduced a modification of the least squares normal equations, based on the idea of
restoring the censored observations.

The least squares normal equations for the non-censored regression case are
\[ \sum_{i=1}^n (T_i - a - bx_i) = 0 \quad \text{and} \quad \sum_{i=1}^n (x_i - \bar x)(T_i - bx_i) = 0, \]
where $a$ and $b$ are the usual least squares estimators for $\alpha$ and $\beta$. In order to account for
censoring, Buckley and James suggested a transformation of $T_i$ to $T_i^*$ defined by
\[ T_i^* = Y_i\delta_i + (1 - \delta_i)E(T_i \mid T_i > Y_i), \quad i = 1, \ldots, n, \]
where $Y_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$ as defined before. Thus,
$E(T_i^* \mid x_i) = E(T_i \mid x_i) = \alpha + x_i\beta$.

Since $E(T_i \mid T_i > Y_i)$ is unknown, Buckley and James adopt a self-consistency approach and
estimate it from the Kaplan-Meier estimator. In other words, they put an estimate of the
conditional expectation $E(T_i \mid T_i > Y_i)$ into the variable $Y_i^* = \delta_i Y_i + (1 - \delta_i)\hat E(T_i \mid T_i > Y_i)$
and then solve the usual least squares normal equations iteratively. Specifically,
\[ \hat Y_i^* = \begin{cases} Y_i & \text{if uncensored}, \\[4pt]
x_i\hat\beta_k + \dfrac{\sum_{j:\, e_j^0 > e_i^0} w_j(\hat\beta_k)\, e_j^0}{\sum_{j:\, e_j^0 > e_i^0} w_j(\hat\beta_k)} & \text{if censored}, \end{cases} \]
where $w_j(\hat\beta_k)$ is the jump assigned to $e_j^0 = e_j^0(0, \hat\beta_k) = y_j - x_j\hat\beta_k$ by the Kaplan-Meier
estimator applied to $e_1^0, \ldots, e_n^0$. If the largest $e_i^0$ is censored, conventionally, assign the
remaining mass to the largest $e_i^0$.
The following argument is borrowed from Miller and Halpern (1982). The regression
estimator $\hat\beta_{k+1}$ at the (k+1)st step in the Buckley-James estimator is the usual least
squares estimator
\[ \hat\beta_{k+1} = \{(X - \bar X)^T(X - \bar X)\}^{-1}(X - \bar X)^T \hat Y^*(\hat\beta_k), \tag{1.15} \]
where the matrix $\bar X$ has elements $n^{-1}\sum_i x_{ij}$ and $\hat Y^*(\hat\beta_k)$ is the vector of restored
responses $\hat Y_i^*$. The iteration is continued until $\hat\beta_k$ converges to a limiting value $\hat\beta$ or
becomes trapped in a loop like the Miller estimator.

Since the estimator (1.15) uses a value for the dependent variable at every $x_i$, it seems
sensible to take for the starting $\hat\beta_0$ the least squares estimator
\[ \{(X - \bar X)^T(X - \bar X)\}^{-1}(X - \bar X)^T Y, \]
which treats all the observations as uncensored whether they are uncensored or not.

For the limiting value $\hat\beta$ the associated estimate of the intercept is
\[ \hat\alpha = \frac{1}{n}\sum_{i=1}^n \big(\hat Y_i^* - x_i\hat\beta\big). \]
In a sense this technique is a nonparametric version of the EM algorithm introduced
by Dempster, Laird, and Rubin (1977).
A first attempt to investigate the consistency of the Buckley-James estimator is due
to James and Smith (1984). Lai and Ying (1991) established the asymptotic normality
for a slightly modified Buckley-James estimator. However, all these studies are based on
the assumption that the linear relationship is correct, which is typically unknown to data
analysts.
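A minimal sketch of one Buckley-James update, reusing the km.weights() sketch from the previous subsection: censored responses are restored by their estimated conditional expectations, and an ordinary least squares fit then gives the next slope. The names are hypothetical, and the convention for a censored largest residual is ignored here:

    # Sketch: one iteration (1.15) of the Buckley-James algorithm.
    bj.step <- function(y, x, delta, beta) {
      e <- y - x * beta                       # residuals at the current beta
      w <- km.weights(y, x, delta, beta)      # Kaplan-Meier jumps on the residuals
      ystar <- y
      for (i in which(delta == 0)) {          # restore each censored observation
        tail <- e > e[i]
        ystar[i] <- x[i] * beta + sum(w[tail] * e[tail]) / sum(w[tail])
      }
      coef(lm(ystar ~ x))[2]                  # least squares slope on the restored data
    }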
1.2.4 The Koul-Susarla-Van Ryzin (KSV) estimator

Based on a linear model, but different from the Buckley-James estimator, Koul et al. (1981)
suggest transforming the responses to synthetic data
\[ Y_i^* = \delta_i Y_i \{1 - G(Y_i)\}^{-1}. \]
Assuming $G(t; x_i) \equiv G(t)$, Koul et al. substitute an estimate for $G(t)$ into the variable
$Y_i^* = \delta_i Y_i (1 - G(Y_i))^{-1}$ and then solve the ordinary least squares normal equations. The
estimator of $G(t) = P(C_i < t)$, the common censoring distribution, is given by
\[ 1 - \hat G(t) = \prod_{i:\, Y_i \le t,\ \delta_i = 0} \frac{N^+(Y_i) + 1}{N^+(Y_i) + 2}, \tag{1.16} \]
where $N^+(y)$ is the number of $Y_i$ greater than $y$, i.e., $N^+(y) = \sum_{i=1}^n I(Y_i > y)$.
If ties occur between censored and uncensored observations in (1.16), the censored times are
considered as preceding the uncensored ones. Asymptotically, $\hat G$ behaves like the product-
limit estimator of $G$. Because the product-limit type estimators can be unstable for large $t$,
Koul et al. suggest truncating the large observations by $M_n$, a sequence of constants ($\to \infty$)
satisfying certain conditions. Thus, the least squares estimators of $\beta$ and $\alpha$ are
\[ \hat\beta = \frac{\sum (x_i - \bar x)\,\delta_i Y_i\{1 - \hat G(Y_i)\}^{-1} I[Y_i \le M_n]}{\sum (x_i - \bar x)^2} \]
and
\[ \hat\alpha = \bar Y^* - \hat\beta\,\bar x, \qquad \bar Y^* = \frac{1}{n}\sum_{i=1}^n \delta_i Y_i\{1 - \hat G(Y_i)\}^{-1} I[Y_i \le M_n]. \]
The great advantage of this estimator over the Miller or the Buckley-James estimator is that
the KSV estimator does not require iteration. Koul et al. also show that their estimator,
under some assumptions, is mean square consistent, $E(\hat\alpha - \alpha)^2 + E(\hat\beta - \beta)^2 = o(1)$,
and asymptotically normal with convergence rate $n^{-1/2}$.
Zhou (1989) points out that the proper choice of $M_n$ may not be simple and that the
asymptotic variance of the KSV estimator needs to be fixed. Thus, Zhou suggests replacing
$M_n$ by some (observable) order statistic of the $Y_i$'s, and then proving the asymptotic normality
using counting process and martingale techniques.
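Because the KSV estimator needs no iteration, its implementation is essentially one line once $\hat G$ is in hand. In the sketch below, Ghat is a hypothetical stand-in for the estimator (1.16); any product-limit estimator of the censoring distribution would serve:

    # Sketch: KSV synthetic-data estimator with truncation at Mn.
    ksv.fit <- function(y, x, delta, Ghat, Mn) {
      ystar <- delta * y / (1 - Ghat(y)) * (y <= Mn)  # synthetic, truncated responses
      lm(ystar ~ x)                                   # ordinary least squares, no iteration
    }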
1.2.5 The Leurgans estimator

Let $F_i(t) = P(T_i \ge t)$, $G_i(t) = P(C_i \ge t)$, $D_i(t) = P(Y_i \ge t) = F_i(t)G_i(t)$,
$\hat D_i(t) = I[Y_i \ge t]$, $R^+(t) = \sum_{i=1}^n I[Y_i \ge t]$, and $Y_{(n)} = \max_i\{Y_i\}$.

Assuming i.i.d. censoring times, Leurgans (1987) transforms the censored observations
$(Y_i, \delta_i)$ into synthetic data,
\[ T_i^* = \int_{-\infty}^{Y_{(n)}} \Big( \frac{I[Y_i > s]}{\hat G(s)} - I[s < 0] \Big)\, ds, \]
where
\[ \hat G(t) = \prod_{s \le t \wedge Y_{(n)}} \Big\{ 1 - \frac{\Delta N^C(s)}{R^+(s)} \Big\} \]
is the Kaplan-Meier estimator of the survival function $G(t)$ of the censoring times $C_i$. Here
$N^C(s) = \sum_{i=1}^n I[Y_i \le s, \delta_i = 0]$ and $\Delta N^C(s) = N^C(s) - N^C(s^-)$. Then, she applies
the ordinary least squares procedure to $(T_i^*, X_i)$. Thus, the least squares estimators $\hat\alpha$
and $\hat\beta$ based on the synthetic data are
\[ \hat\beta = \frac{\sum (X_i - \bar X) T_i^*}{\sum (X_i - \bar X)^2} \quad \text{and} \quad \hat\alpha = \bar T^* - \hat\beta\,\bar X, \]
where $\bar T^* = \frac{1}{n}\sum T_i^*$ and $\bar X = \frac{1}{n}\sum X_i$.

Leurgans also discusses the asymptotic distribution of the estimator in the two-sample
case and consistency of the estimator. However, the asymptotic variance is unknown and
asymptotic properties for the general case are lacking.
Zhou (1992) derived the asymptotic distribution of the Leurgans estimator by representing
the estimator as a martingale plus a higher-order remainder term using counting process
techniques. Zhou showed that for the linear model $E(T_i|X_i) = \alpha + \beta X_i$, the synthetic
data least squares estimators are asymptotically normally distributed with convergence rate
$n^{-1/2}$.
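For non-negative data the Leurgans transform reduces to $T_i^* = \int_0^{Y_i} ds/\hat G(s)$; the sketch below evaluates this by summing over the steps of a Kaplan-Meier estimator of the censoring survival function. The names are hypothetical, and the tail instability of $1/\hat G$ noted above is not handled:

    # Sketch: Leurgans' synthetic responses for non-negative survival data.
    leurgans.transform <- function(y, delta) {
      Gfit <- survival::survfit(survival::Surv(y, 1 - delta) ~ 1)  # KM for censoring times
      G    <- stats::stepfun(Gfit$time, c(1, Gfit$surv))           # right-continuous step
      ty   <- sort(y)
      sapply(y, function(yi) {
        grid <- c(0, ty[ty < yi], yi)             # integrate 1/G over [0, y_i]
        sum(diff(grid) / G(grid[-length(grid)]))  # piecewise-constant integrand
      })
    }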
1.2.6 Kernel Smoothing Technique
For complete data, regression techniques are commonly used to describe a general relationship between an explanatory variable X and a response variable Y. For a fixed and
observed X = x, the average value of Y is given by the regression function. An immediate
interest is to have some knowledge about the mean dependence of Y on X = x. Thus, the
aim of a regression analysis is to produce a reasonable curve approximation to the unknown
relationship between X and Y. This curve approximation procedure is commonly called
"smoothing". There are essentially two ways to approximate the mean function, namely,
parametric and nonparametric approaches. As Härdle (1990) pointed out, the parametric
regression approach is to assume that the mean curve of Y on X has some prespecified
functional form, for example, a line with unknown slope and intercept. The functional form
is fully described by a finite set of parameters. In a preliminary and exploratory study,
a preselected parametric model is too restricted to fit unexpected features, whereas the
nonparametric smoothing approach offers a flexible tool in analyzing unknown regression
relationships.
An example of these two approaches is shown in Figure 1.1, borrowed from Härdle
(1990, p.6). The straight line indicates a linear parametric fit and the other curve is a
nonparametric smoothing fit of the expenditure for potatoes (Y) versus net income (X) of
British householders in 1973. The linear parametric model is unable to detect decreasing
demand for potatoes as a function of increasing income. The nonparametric smoothing
approach suggests an approximate mound-shaped regression relation between income and
expenditure for potatoes. Because the parametric fit is restricted to fitting a line, it misses
the real features; in a sense, it underfits or oversmooths the relationship between X and Y.
The nonparametric approach to estimating a regression curve has several advantages
(Härdle, 1990). First, it provides a versatile method of exploring a general relationship
between two variables. Second, it gives predictions of observations yet to be made without reference to a fixed parametric model. Third, it provides a tool for finding spurious
observations by studying the influence of extreme values. Fourth, it constitutes a flexible
method of substituting for missing values or interpolating between adjacent X values. The
nonparametric approach also has several drawbacks. Nonparametric regression does not
work well in dimensions higher than three with typical sample sizes. The convergence to
the limiting distribution is, if quantified, usually slower ($n^{-r}$, $0 < r < 1/2$) than in the case
of parametric ($n^{-1/2}$) methods. The amount of data required to avoid an unacceptably large
variance increases rapidly with increasing dimensionality. In this respect, the parametric
methods seem to be advantageous, except when large data sets are available.
The basic idea of smoothing is to use a local average of the data near $x$ to construct
an estimator of the regression function at $x$, because it is believed that the observations at
$X_i$ near $x$ should contain information about the value of the regression function at $x$. More
formally, this local average procedure can be defined (Härdle, 1990) as
\[ \hat m(x) = \sum_{i=1}^n W_{ni}(x)\, Y_i, \tag{1.17} \]
where $\hat m(x)$ denotes an estimator of the regression function and $\{W_{ni}(x)\}_{i=1}^n$ denotes a
sequence of weights which may depend on the whole vector $\{X_i\}_{i=1}^n$.
Since the amount of averaging may alter the shape of a regression curve, a natural question is, how far from a fixed x should observations Xi be taken into account for averaging?
If the whole range of X is taken into account, the regression curve would be oversmoothed.
In the opposite extreme case, the regression curve is said to be undersmoothed. The amount
of distance between $x$ and $X_i$ is determined by a smoothing parameter, $h_n$ (the bandwidth),
and the weight sequence $\{W_{ni}(x)\}_{i=1}^n$, tuned by the smoothing parameter $h_n$, controls the
amount of averaging. Finding the optimal choice of bandwidth that balances the trade-off
between oversmoothing and undersmoothing is called the smoothing parameter selection
problem.
According to Härdle, there is another way of looking at the averaging formula. Suppose
that the weights $\{W_{ni}(x)\}$ are positive and sum to one for all $x$, that is,
\[ \sum_{i=1}^n W_{ni}(x) = 1. \tag{1.18} \]
Then $\hat m(x)$ is a least squares estimate at point $x$ since we can write $\hat m(x)$ as a solution to
the following minimization problem:
\[ \min_\theta \sum_{i=1}^n W_{ni}(x)(Y_i - \theta)^2 = \sum_{i=1}^n W_{ni}(x)(Y_i - \hat m(x))^2. \tag{1.19} \]
The basic idea of local averaging is equivalent to the procedure of finding a local weighted
least squares estimate.
[Figure 1.1 appears here.]

Figure 1.1: Potatoes versus net income. A linear parametric fit of Y = expenditure for
potatoes versus X = net income of British householders (straight line) and a nonparametric
kernel smoother (bandwidth = 0.4) for the same variables, year 1973, n = 7125. Units are
multiples of mean income and mean expenditure, respectively. Family Expenditure Survey
(1968-1983).
Under condition (1.18), an easy way to understand the weight sequence $\{W_{ni}(x)\}_{i=1}^n$
is to describe the shape of the weight function $W_{ni}(x)$ by a density function with a scale
parameter (bandwidth) that controls the size and the form of the weights near $x$. This
shape function is called a kernel $K$. The kernel is a continuous, bounded and symmetric
real function $K$ which integrates to one, $\int K(u)\,du = 1$.
Various kernel functions are possible in general, but both practical and theoretical considerations limit the choice. The following table, borrowed from Silverman (1986, p.43), presents
some standard kernels and their efficiencies. The efficiency of any kernel K is computed by
comparing its mean integrated square error with the Epanechnikov kernel's mean integrated
square error. See Silverman (1986) and Härdle (1990) for more details.

Kernel         K(t)                                                  Efficiency (up to 4 d.p.)
Epanechnikov   (3/4)(1 - (1/5)t^2)/sqrt(5)  for |t| < sqrt(5), 0 otherwise     1
Biweight       (15/16)(1 - t^2)^2           for |t| < 1, 0 otherwise           0.9939
Triangular     1 - |t|                      for |t| < 1, 0 otherwise           0.9859
Gaussian       (1/sqrt(2*pi)) e^{-(1/2)t^2}                                    0.9512
Rectangular    1/2                          for |t| < 1, 0 otherwise           0.9295
A commonly used kernel function, which has some optimality properties, is that of
Epanechnikov. The effective weight function {Wni(X)} of kernel smoothers is determined
by the kernel $K$ and the bandwidth sequence $h = h_n$. The weight sequence for kernel
smoothers (for one-dimensional x) is defined by
\[ W_{ni}(x) = \frac{K\big(\frac{X_i - x}{h}\big)}{\sum_{j=1}^n K\big(\frac{X_j - x}{h}\big)}, \]
where $K(\cdot)$ can be any of the kernel functions shown in the above table. The form (1.17) of
the kernel weights $W_{ni}(x)$ was proposed by Nadaraya (1964) and Watson (1964), and therefore
\[ \hat m(x) = \frac{\sum_{i=1}^n K\big(\frac{X_i - x}{h}\big)\, Y_i}{\sum_{i=1}^n K\big(\frac{X_i - x}{h}\big)} \tag{1.20} \]
is often called the Nadaraya-Watson estimator. This is a kernel estimator with local constant
fits. The normalization of the weights makes it possible to adapt to the local intensity of the
$X$ variables and, in addition, guarantees that the weights sum to one. The asymptotic bias
and variance of this estimator (Fan et al., 1991) are given by
\[ \text{bias} \approx \frac{h^2}{2f(x)}\big(m''(x)f(x) + 2m'(x)f'(x)\big)\int v^2 K(v)\,dv, \tag{1.21} \]
\[ \text{variance} \approx (nh f(x))^{-1}\int K^2(v)\,dv\;\sigma^2(x), \tag{1.22} \]
respectively, where $\sigma^2(x) = \mathrm{Var}(Y \mid X = x)$. Note that the estimator (1.20) uses the
same bandwidth over the whole range of $X$.
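For concreteness, a short R/S-plus style sketch of the Nadaraya-Watson estimator (1.20) with a Gaussian kernel and a fixed bandwidth h (the names are illustrative):

    # Sketch: Nadaraya-Watson (local constant) fit at a point x0.
    nw <- function(x0, X, Y, h) {
      w <- dnorm((X - x0) / h)   # kernel weights K((X_i - x)/h)
      sum(w * Y) / sum(w)        # locally weighted average of the responses
    }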
A k-nearest neighbor (k-NN) estimator is a weighted average in a varying neighborhood
(i.e., averaging over the $k$ $X$'s nearest to $x$). Thus, the bandwidth is wide where data are
sparse and narrow where data are dense. The formula for $\hat m(x)$ for the k-NN estimator is
the same as (1.20) except that the bandwidth is not fixed. The asymptotic bias and variance
of the k-NN estimator (Härdle, 1990) are given by
\[ \text{bias} \approx \Big(\frac{k}{n}\Big)^2 \frac{m''(x)f(x) + 2m'(x)f'(x)}{8f(x)^3}\; d_K, \tag{1.23} \]
\[ \text{variance}(\hat m(x) \mid X = x) \approx \frac{2\sigma^2(x)}{k}\; c_K, \tag{1.24} \]
respectively, where $c_K = \int K^2(u)\,du$ and $d_K = \int u^2 K(u)\,du$. As discussed before, the bias
terms (1.21) and (1.23) contain the derivative of the density function, $f(x)$. Note that
the k-NN estimator is much harder to interpret as a regression function. Also, it is not
necessarily more efficient than the local constant fits.
1.2.7 A kernel estimator based on local linear fits

Note that the function $m(x)$ satisfies
\[ m(x) = \arg\min_a E\big((Y - a)^2 \mid X = x\big). \]
This suggests that $m(x)$ can be estimated by
\[ \hat m(x) = \arg\min_a \sum_{i=1}^n (Y_i - a)^2 K\Big(\frac{X_i - x}{h}\Big), \]
where $h$ is a bandwidth, and $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a random sample from the distribution
of $(X, Y)$. This method leads to kernel estimators based on local averages. Fan et al.
(1991) point out that the asymptotic bias of this estimator depends on the derivative of
the marginal density. A practical implication of this is that the estimator is not adaptive to
certain designs of covariates. It turns out that this is not an intrinsic part of nonparametric
regression, but rather an artifact of kernel methods based on local constant fits.

To remedy the problems encountered in the approaches based on local constant fits,
Fan (1991) proposed a design-adaptive approach using local linear fits that greatly
generalizes the case of regression function (the conditional mean) estimation.

Assume that the first derivative of $m(x)$ exists. In a small neighborhood of a point $x$,
$m(X) \approx m(x) + m'(x)(X - x) \equiv a + b(X - x)$. Thus, the problem of estimating $m(x)$ is
equivalent to a local linear regression problem: estimating the intercept $a$. Now, consider a
weighted local linear regression: finding $a$, $b$ to minimize
\[ \sum_{i=1}^n \big(Y_i - a - b(X_i - x)\big)^2 K\Big(\frac{X_i - x}{h}\Big). \tag{1.25} \]
Let $\hat a$, $\hat b$ be the solution to the weighted least squares problem (1.25). Simple calculation
yields the regression estimator proposed by Fan et al. (1991):
\[ \hat m(x) = \hat a = \sum_{i=1}^n w_i Y_i \Big/ \sum_{i=1}^n w_i, \tag{1.26} \]
where
\[ w_i = K\Big(\frac{X_i - x}{h}\Big)\big[s_{n,2} - (X_i - x)s_{n,1}\big] \tag{1.27} \]
and
\[ s_{n,j} = \sum_{i=1}^n K\Big(\frac{X_i - x}{h}\Big)(X_i - x)^j, \quad j = 1, 2. \tag{1.28} \]
Note that this nonparametric regression estimator is a weighted average of the responses;
it is called a linear smoother in the literature. Fan (1991) showed that the estimator with
a suitable choice of $K$ and $h$ is the best linear smoother and adapts to a wide variety of
design densities. It is also known that the estimator does not have unpleasant boundary
adaptation problems. In other words, the local linear fit behaves better than other kernel
estimators, and the convergence rate remains the same near the boundaries of $X$. These are
the major advantages of the local linear fits over other kernel methods (e.g., local constant
fits). Fan et al. (1991) proved the asymptotic normality and mean squared error of the
estimator (1.26). The asymptotic bias and variance of (1.26) are given by
\[ \frac{1}{2} h^2 m''(x) \int v^2 K(v)\,dv \qquad \text{and} \qquad \frac{\sigma^2(x)\int K^2(v)\,dv}{nh\, f(x)}, \]
respectively, where $\sigma^2(x) = \mathrm{Var}(Y \mid X = x)$. The estimator has the usual bias and
variance decomposition. The variance term is the same as that obtained by the ordinary
kernel methods. However, the bias of the estimator does not contain the derivative of the
marginal density $f_X$, a property which is not shared by the ordinary kernel methods based
on local constant fits or the k-NN method (see expressions (1.21) and (1.23)).

This has the following implications:
1) optimal rates of convergence can be achieved without imposing extra smoothness
conditions on the marginal distribution, such as on $f'(x)$;
2) computing the mean squared error does not involve the derivative of the marginal density;
3) the bias of the estimate is dramatically reduced at the locations where either $f'_X(x)$ or
$m'(x)$ is large.
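The closed form (1.26)-(1.28) makes the local linear estimator no harder to code than the local constant one; a sketch with a Gaussian kernel (names are illustrative):

    # Sketch: design-adaptive local linear fit at a point x0 (Fan, 1991).
    loclin <- function(x0, X, Y, h) {
      K  <- dnorm((X - x0) / h)       # K_i = K((X_i - x)/h)
      s1 <- sum(K * (X - x0))         # s_{n,1}
      s2 <- sum(K * (X - x0)^2)       # s_{n,2}
      w  <- K * (s2 - (X - x0) * s1)  # local linear weights (1.27)
      sum(w * Y) / sum(w)             # intercept of the local line, (1.26)
    }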
1.2.8 The Beran estimator

Beran (1981) proposed a class of nonparametric estimates of the conditional survival and
cumulative hazard functions. Denote the respective conditional survival functions by
$S(t|x) = P(T > t \mid X = x)$, $H_1(t|x) = P(Y > t, \delta = 1 \mid X = x)$ and
$H_2(t|x) = P(Y > t \mid X = x)$, and let
\[ \Lambda(t|x) = -\int_0^t S(s^-|x)^{-1}\, dS(s|x) \tag{1.29} \]
be the conditional cumulative hazard function associated with $S(t|x)$. It is assumed
throughout that $T$ and $C$ are conditionally independent given $X$, which is sufficient to ensure
identifiability of $\Lambda(t|x)$ and $S(t|x)$. Decompose $S(t|x)$ into its continuous component
$S_c(t|x)$ and its discrete component $S_d(t|x)$: $S(t|x) = S_c(t|x) + S_d(t|x)$. Let $D(x, S)$
denote the set of discontinuity points of $S(t|x)$. Let
\[ p(u|x) = P(T = u \mid X = x) = S(u^-|x) - S(u|x) \tag{1.30} \]
and $G(t|x) = P(C \ge t \mid X = x)$, the conditional censoring (or withdrawal) function. Since
$Y = \min(T, C)$, the events $\{Y \ge t\}$ and $\{T \ge t, C \ge t\}$ are equivalent. Thus, under the
assumption of conditional independence of $T$ and $C$ given $X = x$,
\[ H_1(t|x) = -\int I(u > t)\, G(u|x)\, dS(u|x) \tag{1.31} \]
and
\[ H_2(t|x) = S(t|x)G(t|x), \tag{1.32} \]
the integral in (1.31) being interpreted as an expectation.

The continuous and discrete components of $H_1(t|x)$ are
\[ H_{1c}(t|x) = -\int I(u > t)\, G(u|x)\, dS_c(u|x) \tag{1.33} \]
and
\[ H_{1d}(t|x) = -\int I(u > t)\, G(u|x)\, dS_d(u|x). \tag{1.34} \]
The set of discontinuity points of $H_1(t|x)$ is
\[ D(x, H_1) = D(x, S) \cap \{t : G(t|x) > 0\}. \]
Let
\[ q(u|x) = P(Y = u, \delta = 1 \mid X = x). \tag{1.35} \]
Equations (1.30) and (1.31) imply that
\[ q(u|x) = p(u|x)G(u|x). \tag{1.36} \]
From equations (1.29) through (1.34), for any $t$ such that $H_2(t|x) > 0$ we have
\[ \Lambda(t|x) = -\int_0^t \frac{dH_1(s|x)}{H_2(s^-|x)}, \]
\[ S(t|x) = \exp[-\Lambda^c(t|x)] \prod\,[1 - \Delta\Lambda(s|x)], \]
where $\Lambda^c(s|x)$ is the continuous component of $\Lambda(s|x)$, the product is taken over the set of
discontinuities of $H_1(s|x)$ with $s \le t$, and $\Delta\Lambda(s|x) = \Lambda(s|x) - \Lambda(s^-|x)$. This is the well
known product-integral representation of distribution functions.
Let $(Y_j, \delta_j, X_j)$, $j = 1, \ldots, n$, be a sample of i.i.d. random variables each having the
same distribution as $(Y, \delta, X)$. The survival functions $H_1(t|x)$ and $H_2(t|x)$ are estimated by
\[ H_{1n}(t|x) = \sum_{j=1}^n I(Y_j > t, \delta_j = 1)\, W_{nj}(x), \]
\[ H_{2n}(t|x) = \sum_{j=1}^n I(Y_j > t)\, W_{nj}(x), \]
where $W_{nj}(x)$ is a random set of non-negative weights depending on the covariates only.
Examples of possible weights include kernel type weights, nearest neighbors, or local linear
weights. Beran's estimates of $\Lambda(t|x)$ and $S(t|x)$ are provided by
\[ \Lambda_n(t|x) = -\int_0^t \frac{dH_{1n}(s|x)}{H_{2n}(s^-|x)}, \tag{1.37} \]
\[ S_n(t|x) = \prod_{s \le t}\,[1 - \Delta\Lambda_n(s|x)], \tag{1.38} \]
where the product is taken over $s \le t$. Both $\Lambda_n(t|x)$ and $S_n(t|x)$ are right continuous
functions of $t$, with jumps occurring at the discontinuity points of $H_{1n}(t|x)$. Note that in
the homogeneous case (1.37) and (1.38) are simply Kaplan-Meier (1958) estimates.
To estimate the regression function
\[ m(x) = E(T \mid X = x) = \int_0^\infty S(t|x)\,dt, \]
we assume that the integral exists, and in order to ensure that $m(\cdot)$ is identifiable, we
assume that
\[ \sup(t : S(t|x) > 0) \le \sup(t : G(t|x) > 0). \]
Note that under this assumption, $\sup(t : S(t|x) > 0) = \sup(t : H_2(t|x) > 0)$. Natural
estimators of the conditional mean and median regression functions are defined by
\[ m_n(x) = \int_0^\infty S_n(t|x)\,dt \]
and
\[ \mathrm{med}_n(x) = \frac{\mathrm{med}_n^-(x) + \mathrm{med}_n^+(x)}{2}, \]
where
\[ \mathrm{med}_n^-(x) = \inf\{t : S_n(t|x) < \tfrac12\} \quad \text{and} \quad \mathrm{med}_n^+(x) = \sup\{t : S_n(t|x) > \tfrac12\}. \]
Dabrowska (1987) established weak convergence results for Beran's estimator with
local constant kernel and nearest neighbor kernel estimates of the conditional cumulative
hazard and survival functions. Dabrowska also showed the asymptotic normality of the
truncated mean regression
\[ m_n(x; \tau(x)) = \int_0^{\tau(x)} S_n(t|x)\,dt, \]
where $0 < \tau(x) < T(x) = \sup(t : H_2(t|x) > 0)$, and of the median regression.
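A compact sketch of Beran's estimator with local constant (Nadaraya-Watson) weights, returning the plug-in mean $m_n(x)$; the survival curve is built by stepping through the uncensored times, and the integral is truncated at the largest of them (names are hypothetical):

    # Sketch: Beran's conditional product-limit estimate and its plug-in mean.
    beran.mean <- function(x0, X, Y, delta, h) {
      w  <- dnorm((X - x0) / h); w <- w / sum(w)   # W_nj(x), summing to one
      tt <- sort(unique(Y[delta == 1]))            # jump points of H_1n(t | x)
      S  <- 1; surv <- numeric(length(tt))
      for (k in seq_along(tt)) {
        dH1 <- sum(w * (Y == tt[k] & delta == 1))  # jump of H_1n at t_k
        H2  <- sum(w * (Y >= tt[k]))               # H_2n(t_k^- | x)
        S   <- S * (1 - dH1 / H2)                  # product-limit step, as in (1.38)
        surv[k] <- S
      }
      grid <- c(0, tt)
      sum(diff(grid) * c(1, surv)[-length(grid)])  # m_n(x), truncated at the last jump
    }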
1.2.9 The Fan-Gijbels estimator

Buckley and James suggested a modified linear regression estimator using a transformation
for censored data. See Section 1.2.3 for more details. However, their transformation involves
the unknown regression function, which may not be linear. Motivated by the Buckley-James
transformation, but avoiding the drawback of their estimator, Fan and Gijbels (1992)
suggested an easily implemented nonparametric method to estimate the regression relationship
in censored data. The basic idea of the method is to transform the observed censored data
to an unbiased form, and then apply a locally weighted least squares regression.
Let us consider the regression model
\[ T = m(X) + \sigma(X)\varepsilon, \]
where $T$ is the survival time and $X$ is the covariate. The function $m(\cdot)$ is the unknown
regression curve and $\sigma(\cdot)$ is the conditional scale function representing the possible
heteroscedasticity. Assume that $X$ and $\varepsilon$ are independent, $E(\varepsilon) = 0$ and
$\mathrm{var}(\varepsilon) = 1$. Also, as before, assume $T$ and $C$ are conditionally independent given $X$,
where $C$ is the censoring time, and $Y = \min(T, C)$ and $\delta = I(T \le C)$. The observations are
ordered according to the $X_i$'s. Then, the observed random vector $(X_i, Y_i, \delta_i)$,
$i = 1, \ldots, n$, is replaced by $(X_i, T_i^*)$, where
\[ T_i^* = \delta_i Y_i + (1 - \delta_i)\,
\frac{\sum_{j:\, Y_j > Y_i} Y_j\, K\Big(\frac{X_i - X_j}{(X_{i+k} - X_{i-k})/2}\Big)\, \delta_j}
{\sum_{j:\, Y_j > Y_i} K\Big(\frac{X_i - X_j}{(X_{i+k} - X_{i-k})/2}\Big)\, \delta_j} \tag{1.39} \]
for a given $k$ and a nonnegative weight function $K$. This transformation ensures that
\[ E(T^* \mid X) = E(T \mid X) = m(X). \]
Using the transformed data $(X_i, T_i^*)$, fit a local linear regression smoother, discussed in
Section 1.2.7, with an adaptive variable bandwidth based on the same smoothing parameter
$k$. Thus, the estimator of the conditional mean function $m(x)$ is
\[ \hat m(x) = \sum_{i=1}^n w_i T_i^* \Big/ \sum_{i=1}^n w_i, \tag{1.40} \]
where
\[ w_i = K\Big(\frac{X_i - x}{h_x}\Big)\big[s_{n,2} - (X_i - x)s_{n,1}\big] \]
and
\[ s_{n,j} = \sum_{i=1}^n K\Big(\frac{X_i - x}{h_x}\Big)(X_i - x)^j \]
for $j = 0, 1, 2$. The notation
\[ h_x = (X_{l+k} - X_{l-k})/2, \tag{1.41} \]
where $l$ is the index of the design point $X_l$ closest to $x$, denotes the variable bandwidth,
and $k$ is a factor that determines the number of local data points. Expression (1.41) implies
that a small bandwidth is used in a dense region and a large bandwidth is used in a sparse
region.
To select the smoothing parameter $k$, transform the data by (1.39), then perform
cross-validation by disregarding the $i$-th transformed data point $(X_i, T_i^*)$, and denote by
$\hat m_{-i}(x)$ the estimator calculated from (1.40). Compute
\[ CV(k) = \sum_{i=1}^n \big(T_i^* - \hat m_{-i}(X_i)\big)^2. \]
Then, $\hat k$ is obtained by minimizing $\{CV(k) : k = 1, \ldots, [(n-1)/2]\}$, where $[l]$ is the greatest
integer part of $l$. Note that because the above cross-validation step requires very intensive
computation, Fan and Gijbels suggested an alternative by "trial-and-error".
They also discussed the calculation of the local linear smoother, data analysis of the
Stanford heart transplant data and the Primary Biliary Cirrhosis data, and established
some basic asymptotic properties. See Fan and Gijbels (1992) for more details.
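The heart of the Fan-Gijbels method is the transformation (1.39); a sketch for data pre-sorted by X follows. The boundary clipping of the indices for the variable bandwidth is one possible convention, not one prescribed by the paper, and the names are hypothetical:

    # Sketch: Fan-Gijbels transformation of the censored responses, equation (1.39).
    fg.transform <- function(X, Y, delta, k) {
      n <- length(Y); Tstar <- Y
      for (i in which(delta == 0)) {
        b <- (X[min(i + k, n)] - X[max(i - k, 1)]) / 2  # variable bandwidth, cf. (1.41)
        j <- which(Y > Y[i] & delta == 1)               # larger uncensored responses
        w <- dnorm((X[i] - X[j]) / b)
        Tstar[i] <- sum(w * Y[j]) / sum(w)              # smoothed conditional expectation
      }
      Tstar                                             # feed into the local linear fit (1.40)
    }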
1.3 Proposed Research

The method of nonparametric regression has received a great deal of attention lately, due
mainly to its flexibility in fitting data. This dissertation proposes a nonparametric approach
for estimating the conditional mean and median survival functions based on data subject
to random right censoring. Beran (1981), followed by Doksum and Yandell (1981) and
Dabrowska (1987), suggested a kernel smoothing technique based on local constant fits to
estimate the conditional mean and median survival regression functions. However, Beran's
estimate is not design-adaptive. Thus, the bias of these estimates can have an adverse effect
when the derivative of the marginal density or regression function is large. To remedy the
problems encountered in the previous parametric or nonparametric approaches we propose,
in this dissertation, a fully nonparametric technique based on the design-adaptive approach
which uses local linear fits to estimate the conditional mean and median survival functions.
1.3.1 Method

Let $T$ be a non-negative random variable representing the survival time of an individual
taking part in a clinical trial or other experimental study, and let $X = (X_1, \ldots, X_d)$ be a
vector of covariates such as age, blood pressure, and cholesterol level. To simplify the
presentation, the univariate case ($d = 1$) is considered here. The survival time $T$ is subject
to right censoring, so that the observable random variables are given by $Y = \min(T, C)$,
$\delta = I(T \le C)$ and $X$. Here $C$ is a non-negative random variable representing the time to
withdrawal from the study. Suppose we are interested in estimating the mean regression
function. In general, the distribution of the observable random vector $(Y, \delta, X)$ does not
identify the mean regression function uniquely (cf. Tsiatis (1975)). However, this problem
can be solved by assuming the following condition:

(A) The random variables $T$ and $C$ are conditionally independent given $X$.

Let $S(t|x) = P(T > t \mid X = x)$.
Due to the censoring, $S(t|x)$ is not directly estimable. Thus, we introduce the estimable
conditional subsurvival functions
\[ H_1(t|x) = P(Y > t, \delta = 1 \mid X = x) \quad \text{and} \quad H_2(t|x) = P(Y > t \mid X = x). \]
Beran (1981) showed that for any $t$ such that $H_2(t|x) > 0$ we have
\[ \Lambda(t|x) = -\int_0^t \frac{dH_1(s|x)}{H_2(s^-|x)}, \]
\[ S(t|x) = \exp[-\Lambda^c(t|x)] \prod\,[1 - \Delta\Lambda(s|x)], \]
where $\Lambda^c(s|x)$ is the continuous component of $\Lambda(s|x)$, the product is taken over the set of
discontinuities of $H_1(s|x)$ with $s \le t$, and $\Delta\Lambda(s|x) = \Lambda(s|x) - \Lambda(s^-|x)$.
Let $(Y_i, \delta_i, X_i)$, $i = 1, \ldots, n$, be a sample of i.i.d. random variables each having the same
distribution as $(Y, \delta, X)$. The survival functions $H_1(t|x)$ and $H_2(t|x)$ can be rewritten as
$E(I_1 \mid X = x)$ and $E(I_2 \mid X = x)$, respectively, where $I_1 = I(Y > t, \delta = 1)$,
$I_2 = I(Y > t)$, and $I$ is an indicator. Then, using the idea of the design-adaptive kernel
estimator, the estimators of $H_1(t|x)$ and $H_2(t|x)$ can be obtained by minimizing the locally
weighted sums of squares
\[ \sum_{i=1}^n \big(I_{k,i} - a - b(X_i - x)\big)^2 K\Big(\frac{X_i - x}{h}\Big), \qquad k = 1, 2, \tag{1.42} \]
where $I_{1,i} = I(Y_i > t, \delta_i = 1)$ and $I_{2,i} = I(Y_i > t)$. Solving (1.42) gives
\[ H_{kn}(t|x) = \sum_{i=1}^n w_i I_{k,i} \Big/ \sum_{i=1}^n w_i, \qquad w_i = K_i\big[s_{n,2} - (X_i - x)s_{n,1}\big], \tag{1.43} \]
where $K_i = K\big(\frac{X_i - x}{h}\big)$ and $s_{n,j} = \sum_i K_i (X_i - x)^j$. The conditional cumulative
hazard function and the conditional survival function can be estimated using (1.43):
\[ \Lambda_n(t|x) = -\int_0^t \frac{dH_{1n}(s|x)}{H_{2n}(s^-|x)}, \qquad S_n(t|x) = \prod_{s \le t}[1 - \Delta\Lambda_n(s|x)], \tag{1.44} \]
where the product is taken over $s \le t$. Thus, natural estimators of the conditional mean
and median survival functions are given by
\[ m_n(x) = \int_0^\infty S_n(t|x)\,dt, \tag{1.45} \]
where $S_n(t|x)$ is given in (1.44), and
\[ \mathrm{med}_n(x) = \frac{\mathrm{med}_n^-(x) + \mathrm{med}_n^+(x)}{2}, \tag{1.46} \]
where
\[ \mathrm{med}_n^-(x) = \inf\{t : S_n(t|x) < \tfrac12\} \quad \text{and} \quad \mathrm{med}_n^+(x) = \sup\{t : S_n(t|x) > \tfrac12\}. \]
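In code, the proposal amounts to swapping the local constant weights in a Beran-type product-limit computation (Section 1.2.8) for the local linear weights in (1.43); the conditional median (1.46) is then read off the estimated survival curve. A minimal sketch with hypothetical names:

    # Sketch: design-adaptive weights for (1.43), replacing Beran's W_nj(x).
    proposed.weights <- function(x0, X, h) {
      K  <- dnorm((X - x0) / h)                          # K_i
      s1 <- sum(K * (X - x0)); s2 <- sum(K * (X - x0)^2)
      w  <- K * (s2 - (X - x0) * s1)                     # local linear weights
      w / sum(w)
    }

    # Sketch: conditional median (1.46) from the product-limit output (tt, surv).
    cond.median <- function(tt, surv) {
      lo <- min(tt[surv < 1/2])   # med_n^-(x) = inf{t : S_n(t|x) < 1/2}
      hi <- max(tt[surv > 1/2])   # med_n^+(x) = sup{t : S_n(t|x) > 1/2}
      (lo + hi) / 2
    }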
In practice, this method has to face the problem of the choice of the smoothing param-
eter (i.e., bandwidth). The parameter determines the degree of smoothing and is a delicate
trade-off between the variance and the bias of the estimator. A literature review and
further discussion on the selection of the smoothing parameter $h$ are presented in Section 2.2.2.
The asymptotic properties of the proposed estimates for the subsurvival functions,
cumulative hazard and survival functions, and conditional mean and median regression
functions will be discussed. The proposed estimates will be examined through simulated
data sets with various amounts of censoring. The same procedure will also be applied to
the Stanford heart transplant data and the SOLVD data.

For the purpose of statistical inference, the 95% pointwise confidence intervals and 95%
simultaneous confidence bands will be constructed for the conditional mean and median
regression functions of the simulated data sets, the Stanford heart transplant data and the
SOLVD data.
Chapter 2

Design-Adaptive Kernel Estimator with Censored Data

2.1 Introduction
To study the effect of a covariate on the response when it is subject to censoring, a popular
approach is Cox's conditional likelihood method in which a proportional hazards model is
proposed to study the effect of the covariate. That is, this effect is examined through the
conditional hazard function.
On the other hand, the standard regression approach assumes
T= a+bX +£,
where the random variable
£
is independent of the covariate X and is assumed to have a
distribution F. The interesting problem is then the estimation of the parameters a and
b when the survival response T is subject to censoring. This approach, first proposed by
Miller (1976), then followed by Buckley and James (1979) and others, can be viewed as
examining the effect through the use of the linear conditional mean. See also Miller (1981)
and Miller and Halpern (1982). All the above approaches are based on the assumption
that the linear or proportional hazards model is correct, which is typically unknown to
data analysts, especially in exploratory studies. Beran (1981), followed by Doksum and
Yandell (1984) and Dabrowska (1987), suggested a nonparametric approach to estimate the
conditional mean and median regression functions using the kernel smoothing method based
on local constant fits. However, Beran's estimate is not design-adaptive. In other words,
Of
the asymptotic bias of this estimate depends on the derivative of the marginal distribution
of x, and the convergence rate near boundaries is slow. To remedy problems encountered
in the parametric or nonparametric approaches, in this dissertation, we propose a fully
•
nonparametric approach based on the design-adaptive technique which uses local linear fits
to estimate the mean and median regression functions.
•
In Section 2.2, we discuss the implementation of the local linear smoothing method to
estimate the conditional subsurvival functions, cumulative hazard function, mean and median functions. In Section 2.3, we discuss the consistency and asymptotic normality results
of these estimates. Finally, in Section 2.4, we describe how to construct the pointwise confidence intervals and the simultaneous confidence bands for the mean and median regression
functions.
2.2
Method
Let T be a non-negative random variable representing the survival time of an individual
taking part in a clinical trial or other experimental study, and let X
= (Xl!""
Xd) be
a vector of covariates such as age, blood pressure, and cholesterol level. To simplify the
presentation, the univariate case (d = 1) is considered in this dissertation. The survival
time T is subject to right censoring so that the observable random variables are given by
Y = min(T, C), 6=I(T
~
C) and X. Here C is a non-negative random variable represent-
ing time to withdrawal from the study. Suppose that we are interested in estimating the
conditional mean regression function. In general, the distribution of the observable random
vector (Y, 6, X) does not identify the mean regression function uniquely (d. Tsiatis (1975)).
However, this problem can be solved by assuming the following condition:
(A.l)
Let
..
The random variables T and C are conditionally independent given X.
S(tlx)
H 1 (tlx)
=
P(T
> tlX
= x),
= P(Y > t,6 = llX = x)
and
H 2 (tlx) = P(Y
be the conditional survival and subsurvival functions, respectively. Set
A(tlx) = _
r
Jo
27
dS(slx)
S(s-Ix)
> tlX = x),
be the conditional cumulative hazard function associated with S(tlx). Under assumption
(A.l), for any t such that H2(tlx) > 0, Beran (1981) showed that
A(tlx)
_ _ It dHt(slx)
- Jo H 2 (s -Ix)'
S(tlx)
=
exp[-AC (tlx)]II[I- AA(slx)],
where AC(six) is the continuous component of A( six), the product is taken over the set
of discontinuities of Ht(slx), for s
~
«
t, and AA(slx) = A(slx) - A(s - Ix). Then, the
conditional mean regression function is
m(x)
=
1
00
S(tlx)dt.
To make m(·) identifiable, we truncate the upper bound of the integral, i.e.,
m(xj rex)) =
r(:c)
Jo
S(tlx)dt,
where rex) < sup{t: H 2(tlx) > O}. Another regression function that takes into account the
influence of the covariate on survival is the median med( x) of the survival function S(tlx).
Let
= inf{t: S(tlx) <
~}
1
= sup{t: S(tlx) > 2}
be the upper and lower medians of S(tlx), respectively.
survival is defined by
1
med(x) =
2{med+ (x) + med
(2.1)
(2.2)
Then the conditional median
_
(x)}.
Along with the estimation of the conditional survival function, the estimation of the
conditional mean and median regression functions is very useful in practice. In complete
data, the mean regression function would be a common choice for the estimation of location
parameter. However, in the presence of censoring, estimation of the mean regression function is not only unidentifiable in general but very sensitive to the skewness of the conditional
distribution. Although the identifiability problem can be resolved by assuming mild conditions on the supports of the conditional distributions of survival and censoring variables,
in order to ensure weak convergence results one needs additional, somewhat cumbersome
conditions, described in Section 2.3, on the tail behavior of these distributions. For the
skewness of the conditional distribution, the median regression function is an alternative.
28
•
The nonparametric approach proposed in this research is very useful especially in exploratory data analysis. We have some asymptotic properties established for this method
whereas they remain to be accomplished for other nonparametric smoothing methods. However, it has several drawbacks. It does not work well for dimensions higher than three or
small sample size. Like other nonparametric approaches, the convergence to the limiting distribution is slower than in the parametric approach. Projection pursuit (Friedman
and Stiietzle (1981), Huber (1985)) and generalized additive models (Hastie and Tibshirani
,
(1986)) provide useful alternatives to the nonparametric regression considered.
2.2.1
Application of the Local Linear Smoothing Technique
Let (li, 6i , Xi), i = 1, ... , n, be a sample of Li.d. random variables each having the same
distribution as (Y, 6, X). Set h,i = I(li > t,6i = 1) and h,i = I(li > t). The subdistribution functions H1(tlx) and H 2(tlx) can be rewritten as E(hIX
= x) and E(I2IX = x),
respectively. Then, using the idea of the design-adaptive kernel estimator, the estimators
of H1(tlx) and H 2 (tlx) can be obtained by
. LJ
~( Il,i - al- bl(Xi - x) )2 K( Xi h- x ), l = 1,2.
mm
(2.3)
a,b i=l
Solving (2.3) gives
(2.4)
where Ki = K(xfX). Once we obtain (2.4), the conditional cumulative hazard function
and the conditional survival function can be estimated as
(2.5)
and
(2.6)
where the product is taken over s
~
t. Both An(tlx) and Sn(tlx) are right continuous
functions of tj jumps occur at discontinuity points of H1n(tlx). Thus, a natural estimate of
the conditional mean regression function is given by
29
IT the largest observation is censored, then the integral is infinite. In this case, we consider
the estimate of the truncated mean regression function
r(x)
mn(X;T(X)) = Jo
where T(X)
Sn(tlx)dt
(2.7)
•
< sup{t : H2 (tlx) > O}.
The estimates of (2.1) and (2.2) are given respectively by
med~(x)
= inf{t: Sn(tlx)
<~}
and
Therefore, the estimate for the conditional median regression function is
(2.8)
2.2.2
Selection of the smoothing parameter h
Selecting an optimal bandwidth that controls how much smoothing is done is an important
topic in statistical smoothing. There are several systematic ways of choosing the bandwidth
in literature. They are classical methods (e.g., MLE method: choosing h to maximize the
likelihood function), plug-in methods, and cross-validation methods. Due to a serious drawback of the first method (e.g., maximized at h = 0), attention has been focused on the latter
two methods. Recently, rapid developments have been made in automatic, data-based selection of a global smoothing parameter. Indeed, the developments have happened so fast
that the discussion of bandwidth selection methods in Silverman's (1986) book on nonparametric density estimation and even that in Hardie's (1989) book on applied nonparametric
regression are no longer up-to-date. Jones, Marron, and Sheather (1992) extensively reviewed the recent literature on automatic, data-based selection of a global bandwidth in
univariate kernel density estimation. They concluded that optimal theoretical performance
and acceptable practical performance were not accomplished by the same technique. Also,
the two best known bandwidth selection methods, namely "least squares cross-validation
(LSCV)" and "normal-based rules-of-thumb (ROT) plug-in" methods, cannot be advocated for general practical use. The LSCV method suffers from too much variability and
the ROT method is too biased towards oversmoothing. Though they were reluctant to
recommend one single method for general purposes too strongly, their current preference is
for the Sheather and Jones (1991) plug-in method.
30
..
Alternatively, the bandwidth can be selected by "trial-and-error". To use the "trialand-error" method, it is desirable to have a fast way of computing the estimated regression
curve. To reduce the computing cost and time, the linear binning idea (see e.g., Jones
(1989» was introduced. The basic idea of linear binning is following. Suppose that we are
interested in estimating the regression function m(·) on an interval [a, b]. First, partition
the interval into L intervals of length A
Xj
=
a
+ A * j.
= (b -
a)/ L with grid points {x j, j
= 0,···, L},
Then, move each observation (Xi,Id towards its two nearest grid points
and assign weights to each of the two new points according to the corresponding distance
between the original point and the grid point: the bigger the distance the smaller the
assigned weight.
In this dissertation, since the number of observations is not too big (:5 200) in each
data set and computing the estimated regression curve was not complicated, I selected the
bandwidth in each estimation by the direct "trial-and-error" metJiod.
2.3
Asymptotic Properties
To prove the asymptotic properties of the estimates given in (2.4) through (2.8), we shall
assume the following conditions.
Condition 1. The kemel K (.) is a nonnegative continuous function and its support (i. e.
{v: K(v):f: O}) is a ~ompact interval
in~.
Furthermore, K(.) satisfies
[:00 K(v)dv = 1, [:00 vK(v)dv = 0
and
[ : v 2 K(v)dv <
00.
The kernel functions K(·) given in Section 1.2.6 satisfy the above condition except the
normal density which has a noncompact support. But this can be easily modified to have
a compact support by a simple truncation.
Condition 2. The density function f(x) is continuous and 0 < f(x) <
J is a compact subset
•
00
for x E J. Here
of~ .
The condition f(x) > 0 ensures that the asymptotic variances of Hln(tlx), l = 1,2, are
finite. See Theorem 2.1. The condition f(x) <
Hln(tlx), l
00
is used in showing the consistency of
= 1,2, An(tlx), Sn(tlx), medn(x), and mn(x).
Condition 3. Random variables T and C are conditionally independent given X.
31
This condition is needed to make the distribution of the observable random vector (Y, 6, X)
identify S{tlx) uniquely (cf. Tsiatis (1975)).
Condition 4. For t E [O,T{X)), S{tlx)
= peT > tlX = x)
and G{tlx)
= P{C > tlX = x)
have continuous second derivatives respect to x.
Condition 4 is used in showing the asymptotic normality of Hln(tlx), l = 1,2, An{tlx),
Sn{tlx), and mn{x).
2.3.1
Consistency
We shall first describe the consistency properties of the proposed estimates given in (2.4)
through (2.8).
Proposition 2.1 Suppose Conditions 1-2 hold and that nh -
00
and h -
0, as n -
00.
Then
sup
tE[O,.r(x»
That is, for
£
IHln(tlx) - Hl(tlx)1 ~ 0,
> 0 and x
l = 1,2
for
x E J,
as
n -
00.
E J,
lim P (
n-+oo
sup
tE[O,.r(x))
IHln(tlx) - Hl(tlx)1 >
£) = O.
The following result is the consistency of the estimate of the conditional cumulative hazard
~
function.
Proposition 2.2 Suppose Conditions 1-9 hold and that nh -
00
and h -
0, as n -
00.
Then
IAn(tlx) - A(tlx)1 ~ 0
sup
tE[O,T(X»
That is, for
£
> 0 and x
for
x E J,
as
n-
00.
E J,
lim P (
n-+oo
sup
te[O,T(X))
IAn(tlx) - A{tlx)1
>
£) = O.
•
The following result is the consistency of the estimate of the conditional survival function.
Proposition 2.3 Suppose Conditions 1-9 hold and that nh -
00
and h - 0, as n -
Then
sup
ISn(tlx)-S{tlx)I~O for
tE[O,T(X))
32
xEJ,
as
n-oo.
00.
That is, for
£
> 0 and x
E
J,
lim P (
n-+oo
sup
tE[O,T(Z»
£) = O.
ISn(tlx) - S(tlx)1 >
Proposition 2.3 implies the following propositions.
Proposition 2.4 Suppose Conditions 1-3 hold and that nh
and h
-+ 00
-+
0, as n
-+ 00.
Then
Imedn(x) - med(x)1
That is, for
£
>
0 and x E
!. 0 for x
E J,
n
as
-+ 00.
J,
lim P(lmedn(x) - med(x)1 > £)
n-+oo
= o.
The following result is the consistency of the estimate of the truncated mean regression
function, mn(x;T(X».
Proposition 2.5 Suppose Conditions 1-3 hold and that nh
-+ 00
and h
-+
0, as n
-+ 00.
Then
Imn(x; T(X» - m(x; T(X »Il!. 0
That is, for
£
> 0 and x
E
for
x E J,
n
-+ 00.
J,
lim P(lmn(x;T(x»-m(x;T(x»1 > £)
n-+oo
2.3.2
as
= o.
Asymptotic Normality
In this section, we describe how the suitably normalized stochastic processes converge weakly
to Gaussian processes. The symbol ~ indicates convergence in distribution and N(p" E)
denotes a Gaussian random vector with mean p, and covariance E .
•
Theorem 2.1 Suppose Conditions 1, 2, and
n
-+ 00.
4 hold and that
nh
-+ 00
D
-+
N(O,
and h
-+
Then, forO < s,t < 00,
33
EH(z» ,
0, as
as n -
00,
where 0' = (0,0),
HI(slx)[I- HI(slx)]<I>(x)
[HI(tlx) - HI(slx)H2(tlx]<I>(x) ],
EH= [
[HI(tlx) - HI(slx)H2(tlx)]<I>(x)
H 2(tlx)[I- H2(tlx)]<I>(x)
and
"'( ) =
'Y
Let wln(tlx)
x
J K2(v)dv
f(x)'
= -Vnh(Hln(tlx)-Hl(tlx», l = 1,2. In Chapter 5, it is shown that cov(Wln(slx),
Wl,n(tlx'» - 0, x :F x', l,l' = 1,2 for 0 < s, t <
00,
whenever nh -
00,
h~
OJ that is, the
suitably normalized H ln and H2n , estimators of conditional survival functions are asymptotically uncorrelated at distinct x points. This leads to the following corollary.
Corollary 2.1 Suppose Conditions 1, 2, and
n-
00.
for
Then, for 0 <
S,
x:F x' E J, as n -
t<
4 hold and that nh -
00
and h -
0, as
00
and h -
0, as
00,
00.
The result below is from the results of Theorem 2.1 and Corollary 2.1.
Theorem 2.2 Suppose Conditions 1, 2, and 4 hold and that nh -
n-
00.
Let Xl, X2, .. " x p be the p different locations of x. Then, for 0 < S, t <
00
•
as n -
00,
where 0' = (0,0"",0),
BH(x;)
and LH(Xi) are given in Theorem
2.1.
The asymptotic normality of the estimate of the conditional cumulative hazard function is
given in the following theorem.
34
Theorem 2.3 Suppose Conditions 1-4 hold and that nh Then, for 0
.£
00,
and h -
0 as n -
00.
< s, t < 00,
Vnh [
as n -
00
(~~:II:; ) _
(A(SIX)) _
BA]
A(tlx)
N(O, EA)
where 0' = (0,0),
C(tlx)
C(slx)
C(s 1\ tlx) ],
C(sl\tlx)
C(tlx)
r dH
= - 10
1 (ulx)
H?(u _lx)4>(x),
for
x E J.
Let Ln(tlx) = v'7ih(A n(tlx)-A(tlx)). As a result of Corollary 2.1, cov(Ln(slx),Ln(tlx'))0, x =F x' for 0 < s, t <
00,
whenever nh -
00,
h - 0; that is, the suitably normalized An'
estimator of the conditional cumulative hazard function is asymptotically uncorrelated at
distinct x points. This leads to the following corollary.
Corollary 2.2 Suppose Conditions 1-4 hold and that nh Then, for 0
00
and h -
< s, t < 00,
The result below is from the results of Theorem 2.4 and Corollary 2.2.
35
0, as n -
00.
1-4 hold and that nh -+ 00 and h -+ 0 as n
Xl, x2, .. " xp be the p different locations of x. Then, for 0 < s, t < 00,
Theorem 2.4 Suppose Conditions
as n
-+ 00,
where 0' = (0,0"",0),
BA(Zi)
-+ 00.
Let
and LA(Zi) are given in Theorem 2.3.
The asymptotic normality of the estimate of the conditional survival function is given in
the following result.
Theorem 2.5 Suppose Conditions
Then, for 0 < s,t <
1-4 hold and that nh
-+ 00
and h
-+
0, as n
-+ 00.
00,
_ (S(SIX») _ BS]
!2.
~ N(O, LS) ,
S(tlx)
as n
-+ 00,
where 0' = (0,0),
Bs
== Bs(z) =
(
BS(SIX») ,
Bs(tlx) = -BA(tlx)S(tlx),
Bs(tlx)
and
LS=
S2(slx)~(sl~)
S(slx)S(tlx)C(sA tlx)
[ S( six )S(tlx )C( sA tlx)
],
x EJ .
for
S2(tlx )C(tlx)
Let Mn(tlx) = &(Sn(tlx) - S(tlx». As a result of Theorem 2.3 and Corollary 2.2,
cov(Mn(slx), Mn(tlx'»
-+
0, x
f:.
x' for 0 < s,t <
00,
whenever nh
-+ 00,
h
-+
OJ that is,
the suitably normalized Sn, estimator of the conditional survival function is asymptotically
uncorrelated at different
X
points. This leads to the following corollary.
Corollary 2.3 Suppose Conditions
Then, for 0 < s, t <
1-4 hold and that nh
-+ 00
and h
-+
0, as n
00,
E(Mn(slx)Mn(tlx'» - nhBs(slx)Bs(tlx')
36
-+
0,
x
f:.
x',
as
n
-+ 00.
-+ 00.
The result below is from the results of Theorem 2.6 and Corollary 2.3.
Theorem 2.6 Suppose Conditions
1-4 hold and that nh -
Xl, x2, .. " x p be the p different locations of x. Then, for
as n -
00,
00
and h - 0, as n -
°< s, t <
00.
Let
00,
where 0' = (0,0,' . ,,0), and Bs(x;) and LS(x;) are given in Theorem
2.5.
The asymptotic normality of the estimate of the truncated mean regression function will
now be described.
1-4 hold and that nh -
Theorem 2.7 Suppose Conditions
and h - 0, as n -
00
00.
Then
Vnh[mn(x; r(x)) - m(x; r(x)) - Bm(x; r(x))] !l N(O, O'~(x; r(x)),
as n -
00,
where
Bm(x;r(x)) = -
r(x)
Jo
Bs(tlx)dt,
and
O'~(x;r(x))=
Let Un(X)
= Vnh (mn(X; r(x)) -
r(x)
J
o
[l'7"(X)
a
]2
S(tlx)dt
dsdC(slx).
m(x; r(x))). As a result of Theorem 2.4 and Corollary 2.3,
cov(Un(x), Un(x')) - 0, x:F x', whenever nh -
00,
h - 0; that is, the suitably normalized
truncated mean regression function is asymptotically uncorrelated at distinct x points. This
leads to the following corollary.
Corollary 2.4 Suppose Conditions
1-4 hold and that nh -
00
and h - 0, as n -
Then
E(Un(x)Un(x')) - nhBm(x)Bm(x') - 0,
37
x:F x',
as n -
00.
00.
From the results of Theorem 2.8 and Corollary 2.4, we obtain the following result.
Theorem 2.8 Suppose Conditions
Xl,
1-4 hold and that nh -. 00 and h -. 0, as n -. 00. Let
X2, ... ,Xp be the p different locations of x. Then,
as n -.
00,
where 0'
= (0,·· ·,0) and Bm(Xij T(Xi»
and O'~(Xij T(Xi)} are given in Theorem
2.7.
Theorem 2.9 Suppose Conditions 1, 2, and 4 hold and that nh -.
n -.
00.
00
and h -. 0, as
Then, for s,t E [O,T(X», the process Wn(tlx) converges weakly to W(tlx), a two
dimensional Gaussian process with mean BH and covariance function
cov(Wln(slx), Wlln(tlx»
for X E J, as n -.
= [Hll\ll(S V t) -
Hl(slx)Hll(tlx)]¢>(x),
l,l'
= 1,2,
00.
Note that the results of Theorems 2.1 to 2.3 and 2.5 imply that the processes Ln(tlx) and
Mn(tlx) converge weakly to Gaussian processes, L(tlx) and M(tlx), with means BA(tlx) and
Bs(tlx) and covariance functions cov(L(slx),L(tlx» = C(slx) and cov(M(slx),M(tlx» =
S( six )S(tlx )C(six), respectively, where C( six) is given in Theorem 2.2.
2.4
Confidence Bands
For the purpose of statistical inference, in this section, we discuss the construction of si-
multaneous confidence bands as well as pointwise confidence intervals for the conditional
truncated mean regression function. First, we consider pointwise confidence intervals and
confidence bands for the truncated mean regression. Since
Vnh[ m n (x j T( x» -
m( x j T( X»]
has asymptotically normal distribution with mean Bm(xj T(X», variance O'~(Xj T(X» the
approximate (1 - a) 100% confidence interval at each x is given by
O'~(Xj
T(X»
nh
where ZI-Ot/2 is the (1 - a/2)th quantile of the standard normal distribution. In reality, we replace Bm(XjT(X» to Bm(XjT(X» and O'~(X;T(X» to O'~(XjT(X». To compute
38
.,
Bm(Xj r(x)), we need to estimate the second derivatives of HI(tlx) and H2(tlx). Because
the bias term is very complicated to compute and much smaller than the variance, we suppress the bias term by assuming that the bandwidth h tends to zero slightly faster than
the optimal rate n-
1
5' •
Therefore, the asymptotic confidence interval at each x point is
computed as
For large sample inferences about the mean, however, approximate confidence bands on the
mean are obtained by
where x = (Xl,···, Xp )', P ~ n,r: is a diagonal matrix of O'~(Xij r(xi)), i = 1,· H,P, x~(a)
is the upper 100a-th percentile of the x~-distribution. In other words, the asymptotic
(1 - a) 100% simultaneous confidence bands are
An alternative solution is to use a bootstrap approach which will be investigated in
future study.
Construction of confidence bands for the median regression is more complicated since
the asymptotic variance depends on the unknown conditional density f(tlx). Alternatively,
a bootstrapping approach could be considered. The investigation of confidence bands for
the quantile regression is postponed to future study.
39
Chapter 3
Computer Simulation and Data
Analysis
In this chapter, the proposed estimation method is examined for three examples. In the
first example, four simulated data sets with 90%, 75%, 50%, and 25% of uncensoring were
used to evaluate the performance of the proposed mean and median regression estimates,
given in Section 2.2.1, and to compare these to the other existing estimates - Cox and
Beran estimates. I then apply the same procedure to the Stanford heart transplant data,
which are given in Miller and Halpern (1982) and the SOLVD (Studies Of Left Ventricular
Dysfunction) data. In all the presented examples, I take the kernel function K (.) to be a
truncated standard normal density function.
As discussed in Section 2.2.2, the bandwidth in each estimation is selected individually
by the direct "trial-and-error" method.
For the purpose of statistical inference, the 95% pointwise confidence intervals and
95% simultaneous confidence bands are constructed for the mean regression function of the
90%, 75% uncensored simulated data sets, Stanford heart transplant data, and SOLVD
data.
S-plus is used for the computer simulation and data analysis. The S-plus programs
are attached in Appendix A.
•
•
3.1
Computer Simulation
200 data points with different amount of censoring were simulated from the following model:
•
=
(300 - 0.5Xi)I(Xi
"'iid
N(55,5),
< 55) + (272.5 - 0.3(Xi - 55)2)I(Xi >= 55) + Ei,
Ei "'iid N(O, 10)
where Xi .1 Ei. The censoring time Gi, conditionally independent of the survival time Ti' is
distributed as
(CilXi
= x) "'iid Exponential(c(x»
where c( x) is the mean conditional censoring time given by
c(x) =
< 60
{ a - 30 - 3(x - 60) if x >= 60.
if x
a - 0.5x
a is a constant adjusting the amount of censoring. Therefore, ¥i'is min(Ti, Ci). Figures
3.1 and 3.2 represent the true mean regression function m( x) and the conditional mean
censoring time c(x), respectively, of the 90% uncensored data set. Figure 3.3 shows the
performances of the three different mean regression estimates - Cox, Beran, and proposed
estimates of the 90% uncensored data set. In this figure, h
= 3 was used for the proposed
estimate and h = 2 was used for Beran's estimate. The proposed estimation curve shows
an improvement over Beran's estimation curve near the right tail where the data points
are sparse. By design, the linear Cox estimator must either increase, decrease, or remain
constant with covariate, though it is modulated by the time dependent baseline hazard
.\o(t). Therefore, the initial higher survival for the linear Cox estimator is an artifact.
Similarly, the initial lower survival for the quadratic Cox estimator is also an artifact. This
reflects that the parametric method is too restricted to fit an unknown regression curve.
We observe in this figure that the proposed estimation curve is the closest curve to the true
regression curve.
Figures 3.4 and 3.5 show the performances of the three different mean regression
estimation curves of the 75% and 50% uncensored data sets, respectively. In Figures 3.4
and 3.5, h
•
= 2 was used for the proposed and Beran's estimates.
We observe similar results
as in Figure 3.3.
Figure 3.6 presents the same three mean regression estimation curves of the 25%
uncensored data set with h = 4 for the proposed estimate and h = 2 for Beran's estimate.
41
When there is too much censoring, essentially no method guarantees an accurate estimation
of the true regression curve. Nevertheless, the proposed estimation curve is the closest
estimation curve to the true regression curve. Figures 3.9 through 3.12 show the median
regression estimates relating survival time to covariate x of the 90%, 75%, 50%, and 25%
uncensored, simulated data sets for the three estimates - Beran, Cox, and the proposed
estimates. In Figure 3.9, h = 3 was used for the proposed median estimate curve and h
was used for Beran's median estimate. In Figures 3.10 and 3.11, h
Beran and proposed median estimates. In Figure 3.12, h
=2
= 2 was used for the
= 4 was used for the proposed
estimate and h = 2 was used for Beran's estimate. The performance of Cox's estimation
curve with linear term is the poorest among the four estimation curves. Beran's estimation
curves are close to the proposed estimation curves which are the closest estimation curves
in Figures 3.9 through 3.11. In Figure 3.12, as we expected, no estimation curve fits the
true regression curve well.
Tables 3.1 and 3.2 present the estimated coefficients, their estimated standard errors,
standard normal z-value, and p-values of the linear Cox and quadratic Cox estimators for
the simulated data sets. The p-values for the linear fit and quadratic fit both are significant
even though the quadratic fit is better than the linear fit.
For the purpose of statistical inference, the 95% pointwise confidence interval and 95%
simultaneous confidence bands at distinct x points were constructed for the 90% and 75%
uncensored simulated data sets in Figures 3.15 and 3.16, respectively. The simultaneous
confidence bands are quite wide when they are compared to the confidence intervals. Also, it
is observed that the widths of the confidence intervals and confidence bands are proportional
to the amount of censoring and density of x points. In other words, the widths are greater
for the 75% uncensored data set. Also, the width is greater where the data points are largely
censored and sparse.
Through the results of this simulation experiment, we observed following. First, the
proposed estimation method overcomes the boundary problem of Beran's method, Le., it
works well whether data points are dense or not. Second, since the proposed estimation
method does not assume any model, it is very flexible to fit any feature of the true regression
curve. Third, it is an excellent diagnostic tool for verifying the parametric assumptions.
42
•
3.2
Stanford Heart Transplant data
A similar procedure is now applied to the Stanford heart transplant data. For purposes of
comparison with Miller and Halpern (1982) and Doksum and Yandell (1982), the data are
taken from Miller and Halpern (1982). The Stanford heart transplantation program started
.
in October 1967. There were 184 patients who received heart transplants between October
1967 and February 1980. Because the tissue typing for 27 patients was not completed, only
157 out of 184 patients are included in this analysis. Of these 157, 102 patients were dead
and 55 were alive as of February 1980. The survival time is to censored or uncensored time
by February 1980. The age at transplant is considered as a covariate because it was believed
that younger patients tend to survive longer after heart transplant. Figure 3.7 presents the
Cox, Beran, and the proposed mean regression estimates relating survival time to age on
the base 10 logarithmic scale. h = 4 was used for the proposed and Beran's estimates. The
proposed estimation curve suggests that the mean survival time remains relatively constant
with age between 15 to 47 and drops rapidly after age 47. Beran's estimation curve suggests
a similar trend as the proposed estimation curve except slower decrease of mean survival
time after age 45. Because the linear Cox estimator must either increase, decrease, or
remain constant with age, though it is modulated by the time dependent baseline hazard
Xo(t), it indicates a strict decrease of mean survival time with age. Similarly, the quadratic
Cox estimator fits a parabola shape which indicates increasing mean survival time up to
age 30, then decreasing mean survival time after age 30. If the true regression curve has
a flat area, the linear or quadratic Cox's estimator would miss the real feature. Since the
nonparametric
estimat~rs do
not have this restriction they are very flexible to fit any kind
of regression curve. Also; we observe that the quadratic Cox estimator fits better than
the linear one. This is, again, an illustration that the proposed estimation method is a
good diagnostic tool for verifying the parametric assumptions. Tables 3.1 and 3.2 present
the estimated coefficients, their standard errors, standard normal z-values, and p-values of
.
the linear and quadratic Cox models. Figure 3.13 presents the Cox, Beran, and proposed
median regression estimates applied to the Stanford heart transplant data. The proposed
median estimation curve suggests that the median survival time remains relatively constant
with age between 15 to 42 and drops rapidly after age 42. Beran's median estimation curve
is almost identical to the proposed one except near boundaries. We observe that the three
43
estimation curves - the quadratic Cox, Beran, and proposed curves, largely agree with each
other. However, the linear Cox curve is quite apart from the three curves.
Figure 3.17 presents the 95% pointwise confidence intervals and the 95% simultaneous
confidence bands of the Stanford heart transplant data. The widths of the confidence
intervals and confidence bands are great where the data points are largely censored and
..
sparse.
3.3
SOLVD (Studies Of Left Ventricular Dysfunction) data
The following paragraph is a summary of the SOLVD study protocol. See Bangdiwala
(1992). Congestive heart failure (CHF) is a major public health problem in the United
States and worldwide. Approximately two million Americans suffer from heart failure and
about 250,000 new cases of CHF are diagnosed every year in the U.S. The mortality among
these patients is reported to be between 10% and 20% per year, so that about 100,000 to
200,000 deaths per year can be attributed to CHF in the U.S. alone, and the number is much
bigger worldwide. There have been several studies done to reduce CHF mortality through
drug interventions. A pooled analysis of the few hundred patients suggests a favorable trend
toward a lower mortality but the data are so scant that no reliable assessment of these effects
is possible. Also, there was no controlled clinical study examining the effects of the drugs on
physiological parameters, symptoms, mortality and safety when administered over several
years. The motivation of this study is to assess the effect of ACE (Angiotension Converting
Enzyme) inhibitors, the most promising vasodilator therapy, on long term moderate (10
to 20%) CHF mortality reduction for asymptomatic and symptomatic patients with left
ventricular dysfunction. Left ventricular dysfunction is defined as resting ejection fraction
equal to or less than 35%. There are two randomized double blind clinical trials - prevention
and treatment trials - in SOLVD. The patients in the treatment trial are those with left
ventricular dysfunction (resting ejection fraction equal to or less than 35%) and symptoms
and signs of overt CHF. The patients in the prevention trial are those with left ventricular
dysfunction but without overt CHF. In this presentation, the participants in the treatment
trial have been considered.
From June 1986 to March 1989, 2569 patients were enrolled in the treatment trial.
Among these, 168 patients in the placebo group from clinics Brussels and Burrningham are
44
included in this analysis. Ofthese 168 patients 102 (61.7 %) had either a hospitalization or
death due to CHF. The follow-up time is the time (in days) to the hospitalization or death
due to CHF. The covariate is the baseline left ventricular ejection fraction (e.f.). Figure
3.8 presents the Cox, Beran, and proposed mean regression estimates relating survival time
to the baseline ejection fraction on the base 10 logarithmic scale. h
.
= 3 was used for the
proposed and Beran's estimates. The proposed and Beran's estimation curves suggest that
there is a bimodal feature in the mean regression curve. In other words, the increasing
trend of the mean survival time stopped near eJ. 17 to 22 and after 30. On the other hand,
the linear or quadratic Cox estimation curve indicates an increasing trend only. Figure 3.14
presents the four median regression curves. h
= 3 was used for the proposed and Beran's
median estimation curves. The proposed median regression curve indicates a constant
median survival time up to eJ. 12, a sharp increase in mean survival between eJ. 12 to
15 and a slower increse after eJ. 15. A similar pattern was observed in Beran's median
estimation curve except that the sharp increse is observed between eJ. 11 to 13. The linear
Cox estimation curve shows very high initial median survival, which is very unlikely. The
quadratic Cox estimation curve shows a monotonically increasing median survival. Tables
3.1 and 3.2 present the estimated coefficients, their standard errors, standard normal zvalues, and p-values of the linear and quadratic Cox models. The p-values of the Cox
models suggest that the linear Cox estimate fits better than the quadratic one though the
quadratic curve is closer to the Beran or the proposed estimation curve.
Figure 3.18 presents the 95% pointwise confidence interval and 95% simultaneous
confidence bands of the SOLVD data set. As before, the widths for the confidence intervals
and confidence bands are wide where the data points are largely censored and sparse.
.
45
Table 3.1: Regression estimates and standard errors, s.e., for 10910 of censored or uncensored time versus covariate (x for the simulated data, age at transplant for Stanford heart
transplant data, and baseline ejection fraction for the SOLVD data) and p-values of the
Cox estimate with linear term
{3 s.e.({3)
n
z
p-value
90% uncensored data
200
0.0985
0.0181
5.43
< 0.0001
75% uncensored data
200
0.139
0.0221
6.28
< 0.0001
50% uncensored data
200
0.152
0.0281
5.41
< 0.0001
25% uncensored data
200
0.201
0.0366
5.49
< 0.0001
Stanford heart transplant data
157
0.0299
0.0114
2.63
0.0085
SOLVD data
168
-0.0614
0.0153
-4.01
<0.0001
..
46
Table 3.2: Regression estimates and standard errors, s.e., for 10910 of censored or uncensored time versus covariate (x for the simulated data, age at transplant for Stanford heart
transplant data, and baseline ejection fraction for the SOLVD data) and p-values of the
Cox estimate with linear and quadratic terms
90% uncensored data
75% uncensored data
50% uncensored data
25% uncensored data
Stanford heart transplant data
SOLVD data
n
/3
s.e.(/3)
z
200
-0.8914
0.2754
-3.24
0.00918
0.0026
-1.5886
0.3278
-4.85
< 0.0001
0.0159
0.00302
5.26
< 0.0001
-1.478
0.3470
-4.26
0.015
0.0032
-1.3611
0.4144
0.0141
0.0038
-0.1398
0.0533
-2.62
< 0.0088
0.0022
0.00069
3.19
0.0014
-0.2182
0.09068
-2.41
0.016
0.0033
0.0019
1.74
0.082
200
200
200
157
168
47
p-value
< 0.0012
3.59 < 0.0003
< 0.0001
4.66 < 0.0001
< 0.0010
3.72 < 0.0002
-3.28
•
45
50
55
60
65
x
Figure 3.1: True mean regression line for the 90% uncensored data
48
CD
E
=
..
45
50
55
60
65
x
Figure 3.2: Mean conditional censoring line for the 90% uncensored data
49
..
•
•
°
•
:::::=------......~~
~
° .. ,.. °° °
~'*q• • 0 · " · .
••
•
....... ,**
.
0.*.00°
•• *.
.....
... '* *.
••
...
0.
0
.....!'a-e:--..
---~==:-~~~
-- ,,-----.
.'................... -.... .......
:.:.--~~t.~. . . .:....
•
.."" '*
-...;..:::
.. •*.,..
.*
..
.*... . .
*. ..
.......
...
°
•
00
•
•
............
..
.
..."" *.
::,..
.... ... .-.- \ \
\
'*
.
...
~-- .--........ --
...
.......
'
•
•
..
..
............
\
.*",, . .
....
'"
"
•
•
45
55
50
60
65
x
Figure 3.3: Simulated data set with 90% uncensoring, estimated mean regression curves:
Solid line - true mean regression curve; Dotted line - Beran's estimation curve with h=2;
Long dashed line - Cox's estimation curve with linear term; Short dashed line - Cox's
estimation curve with linear and quadratic term; Dashed line - proposed estimation curve
with h=3. *: uncensored observation. o:censored observation
50
•
. ... .
••
•
..
45
50
60
55
•
•
•
65
70
x
Figure 3.4: Simulated data set with 75% uncensoring, estimated mean regression curves:
Solid line - true mean regression curvej Dotted line - Beran's estimation curve with h=2j
Long dashed line - Cox's estimation curve with linear termj Short dashed line - Cox's
estimation curve with linear and quadratic termj Dashed line - proposed estimation curve
with h=2. *: uncensored observation. o:censored observation
51
o
o
o
o
*
*
.
*
*
*
*
*
*
45
50
55
60
65
x
Figure 3.5: Simulated data set with 50% uncensoring, estimated mean regression curves:
Solid line - true mean regression curve; Dotted line - Beran's estimation curve with h=2;
Long dashed line - Cox's estimation curve with linear term; Short dashed line - Cox's
estimation curve with linear and quadratic term; Dashed line - proposed estimation curve
with h=2. *: uncensored observation. o:censored observation
52
o
..::~,::".:.:.:- -o-......__ ~
0
*
*
*
40
45
50
55
60
65
70
x
Figure 3.6: Simulated data set with 25% uncensoring, estimated mean regression curves:
Solid line - true mean regression curve; Dotted line - Beran's estimation curve with h=2;
Long dashed line - Cox's estimation curve with linear term; Short dashed line - Cox's
estimation curve with linear and quadratic term; Dashed line - proposed estimation curve
with h=4. *: uncensored observation. o:censored observation
53
.
o
o
o
00·
•
Cii"
E
"..
1
C\,J
-
0
i
•
.....
-
o
10
•
•
I
I
I
I
I
20
30
40
50
60
age
Figure 3.7: Stanford heart transplant data, estimated mean regression curves: Dotted line
- Beran's estimation curve with h=4j Long dashed line - Cox's estimation curve with
linear termj Short dashed line - Cox's estimation curve with linear and quadratic termj
Dashed line - proposed estimation curve with h=4. *: uncensored observation. o:censored
observation
...
54
o
08
0
009
·00 •
•
o •
Q
0•
0
0
0
..
•
• •
•
N
-
..
o
i
•
•
•
•
•
•
·· .•
• •
•
•
•
--
o
•
•
•
•
•
•
•
•
•
•
I
I
I
I
I
I
10
15
20
25
30
35
ejection fraction
Figure 3.8: SOLVD data, estimated mean regression curves: Dotted line - Beran's estimation curve with h=3j Long dashed line - Cox's estimation curve with linear termj Short
dashed line - Cox's estimation curve with linear and quadratic termj Dashed line - proposed
estimation curve with h=3. *: uncensored observation. o:censored observation
55
*
o
*0 *
* 0
*
---
*
*.
0
**,., 0
0- 0
*
'*...
* . . * 00
'*
'*
==-=..... ~~_!'"'-: .* ::.
'.........
'*
--~
,,.----_.
0
~"Q- *0*. *
0
0
0*
* *
*
**
0
00
*
** ...
* *
*
*«*
**
*
**
•
* *...
** *
* •
*
*
*
*
*
*
45
50
55
60
65
x
Figure 3.9: Simulated data set with 90% uncensoring, estimated median regression curves:
Solid line - true mean regression curve; Dotted line - Beran's estimation curve with h=2;
Long dashed line - Cox's estimation curve with linear term; Short dashed line - Cox's
estimation curve with linear and quadratic term; Dashed line - proposed estimation curve
with h=3. *: uncensored observation. o:censored observation
56
•
-- --•
..
00
•
.~_
..
..
'*
..
•*
..
.. ..
-9Q.~ ••
_
...
_
... *
:.
....
..
000
................
'*.."
*
•
0
"
~-
..: ..........C..,
'*
'* ..
•
0.0
• O.
0
.
••
.
•
.. 0
.*
.....
..
'*
...
00
~ ........
~'
..-..-
.-.\
•
..
~
.
\\
"'--- --... .........
\~
\
,
,
. . . , V..
..' .......,
\
'*
'
---
~
,
.
\~
.~,
\\,
,,
,
,,
,
'•
45
" 50
60
55
65
70
x
•
Figure 3.10: Simulated data set with 75% uncensoring, estimated median regression curves:
Solid line - true mean regression curve; Dotted line - Beran's estimation curve with h=2;
Long dashed line - Cox's estimation curve with linear term; Short dashed line - Cox's
estimation curve with linear and quadratic term; Dashed line - proposed estimation curve
with h=2. *: uncensored observation. o:censored observation
57
,
o
\. .--e-
o
o
*
\..
*
'-
o
*
* *
*
*. : :-.*
*
*
*
*
'" *
".
*
*
*
*
*
*
45
50
55
60
65
x
Figure 3.11: Simulated data set with 50% uncensoring, estimated median regression curves:
Solid line - true mean regression curve; Dotted line - Beran's estimation curve with h=2j
Long dashed line - Cox's estimation curve with linear termj Short dashed line - Cox's
estimation curve with linear and quadratic termj Dashed line - proposed estimation curve
with h=2. *: uncensored observation. o:censored observation
58
•
~
0
N
0
*----0
~
Q)
E
"..
~
•
40
45
50
55
60
65
70
x
Figure 3.12: Simulated data set with 25% uncensoring, estimated median regression curves:
Solid line - true mean regression curvej Dotted line - Beran's estimation curve with h=2j
Long dashed line - Cox's estimation curve with linear termj Short dashed line - Cox's
estimation curve with linear and quadratic termj Dashed line - proposed estimation curve
with h=2. *: uncensored observation. o:censored observation
•
59
o
_.......
o
f
...
~-
--.......
00.
0
0
0
••
0
0
00
0
•
0
/!:_~~-::-=..__ ,~ __~~o
• .o~ 80 0
'a~-"tt-;"""
I,
I,
o
0
-
0
'
II
,'00
j
"
. I,
" 0
,1
,I
, I
I ·
I
0
........
0
•
0
0
"'"
0
0
0
"
••
0 '.. \ 0 •
e.;:: ....... ~.,.
n o. . u
JO.
,),.\":-.
~\ '\,
'--Q
0
•
• ~\
0··
. ' -,~_
•
\"
"'"
0 · · ·
,,\
•
0
. .., '" \'
,
• • ,\.~
I
\ ..... ,
I
I
•
..
•
\
•
....... .......
....,'t
••
•
-
\\
~.t:~~...__..
.* : . . .•.•.
.: •
.......
,
'.
...
•
•
•
•
•
10
20
40
30
•
50
60
age
Figure 3.13: Stanford heart transplant data, estimated median regression curves: Dotted
line - Beran's estimation curve with h=4j Long dashed line - Cox's estimation curve with
linear termj Short dashed line - Cox's estimation curve with linear and quadratic termj
Dashed line - proposed estimation curve with h=4. *: uncensored observation. o:censored
observation
60
•
o
*
*
*
•
*
*
Q)
,B
* *
**
*
•
*
*
• *
*
*
*
C\,I
n;
.ii!:
*
I
*
0
E
*
* *
*
*
*
*
10
15
20
25
30
35
ejection fraction
Figure 3.14: SOLVD data, estimated median regression curves: Dotted line - Beran's
estimation curve with h=3j Long dashed line - Cox's estimation curve with linear termj
Short dashed line - Cox's estimation curve with linear and quadratic termj Dashed line proposed estimation curve with h=3. *: uncensored observation. o:censored observation
.
61
.
\
\
\
\
\
\
\
\
\
\
o
~
*
..
*
"-
-
*0,
--- --;.--*
'-,
**
...
* ....
**
\
\
.
\
*.
,,
,,
--**- ...
\
\
*
\
\
""
,. *
/
*
\
*
*
,
\
,
45
50
55
60
65
x
Figure 3.15: 95% pointwise confidence interval and simultaneous confidence bands for the
90% uncensored simulated data set. Solid line - true mean regression curve; Dotted lines estimated mean regression line with its upper and lower pointwise confidence intervals; Vertical lines - simultaneous confidence bands on distinct x points. *: uncensored observation.
o:censored observation
.
..
62
.r
\
\
\
\
\
\
\
\
\
,,
,,
,
\
,,
,,
,,
...
.........
----
--...
-o.P_o
0*
-.
... ,
*
...
CD
E
,
,,
,,
,,
• * •
•
",
.._---
,,
,,
,
.
---
,,
... ...
I
I
,
..............
... ...
...
I
I
I
I
I
45
50
60
55
65
70
x
..
Figure 3.16: 95% pointwise confidence interval and simultaneous confidence bands of the
75% uncensored simulated data set. Solid line - true mean regression curve; Dotted lines estimated mean regression line with its upper and lower pointwise confidence intervals; Vertical lines - simultaneous confidence bands on distinct x points. *: uncensored observation.
o:censored observation
•
63
.
f
•
•
•
10
20
•
30
40
50
60
age
Figure 3.17: 95% pointwise confidence interval and simultaneous confidence bands of the
Stanford heart transplant data set. Solid line - estimated mean regression curve with its
upper and lower pointwise confidence intervals; Vertical lines - simultaneous confidence
bands on distinct x points. *: uncensored observation. o:censored observation
64
"
.
0
o
0
o 8
"
0
*
*
Q
0
0
0
0
0
*
*
0
0
CD
~
.~
i
*
*
C'.I
*
0
*
i
*
*
*
*
*
*
o
10
15
20
25
30
35
ejection fraction
•
Figure 3.18: 95% pointwise confidence interval and simultaneous confidence bands of the
Stanford heart transplant data set. Solid line - estimated mean regression curve with its
upper and lower pointwise confidence intervals; Vertical lines - simultaneous confidence
bands on distinct x points. *: uncensored observation. o:censored observation
•
65
Chapter 4
Summary and Future Research
4.1
Concluding Remarks
The purpose of this research was to develop an improved method for estimating the regression functions in the presence of right censoring. This was done by using a nonparametric
kernel smoothing method based on local linear fits. This is in contrast to the parametric
approach which uses a specific functional form for the relationship between the survival
time and a covariate. A detailed review of methodology and problems of these existing
nonparametric and parametric approaches was presented in Chapter 1.
The methodology and theoretical performance of the proposed estimates were discussed in Chapter 2. The estimators of subsurvival functions, cumulative hazard, survival
function, conditional mean and median functions are asymptotically normal under appropriate conditions. The asymptotic bias of the proposed estimates is more interpretable
than the one for Beran's estimate and does not depend on the derivative of the marginal
distribution of X.
The practical performance of the proposed estimates was presented through the simulation experiment and real data analysis in Chapter 3. From the results of the simulation
experiment we observed that Beran's estimation curve captured the underlying feature of
the true regression curve though it didn't work well near the tail. The proposed estimation
curve shows an alleviation of Beran's boundary problem which suggests further theoretical
investigation. Cox's quadratic estimation curve roughly agrees with Beran's and the proposed estimation curves over the inner range of data points. Cox's linear estimate gave the
poorest fit among four estimation curves. The nonparametric estimation curves (the Beran
and proposed ones) are more flexible to fit the unknown underlying regression curve. Also,
they tell us which parametric curve fits better to the underlying regression curve.
Miller and Halpern (1981) analyzed log-transformed Stanford heart transplant data
by using linear and quadratic Cox and Buckley-James regression estimates. Doksum and
Yandell (1982) applied Beran's mean and median estimates to the same data and compared
these to the Cox and Buckley-James estimates. However, due to the boundary problem,
their estimated curves were limited to the inner data points only. Through our data analysis,
we found that the proposed, Beran, and quadratic Cox estimation curves roughly agree with
each other, especially over the inner range of data. But, the linear Cox estimates were very
far from the others.
The same procedure was also applied to the SOLVD data. Cox's quadratic mean
estimation curve is very close to the proposed mean estimation curve except that Cox's
quadratic curve missed the bimodal pattern. Beran's estimation curve has the same features
as the proposed estimation curve except for the initial higher survival. Similar results were
drawn from the median estimation curves. As before, we observed that Cox's linear mean
....
and median estimation curves were far from the others.
For the purpose of statistical inference, 95% pointwise confidence intervals and confidence bands at selected x points were constructed in Chapter 3. We observed that the
widths of the pointwise confidence intervals and confidence bands are inversely proportional
to the amount of censoring and the density of the covariate.
4.2
Future Research
The main interest in this research was focused on estimating the conditional ,mean and median regression functions in the presence of right censoring. Another interesting regression
function to estimate, other than these two regression functions, is the conditional quantile
function. In other words, estimating the survival time for the p-th unhealthiest patients
given a certain covariate is also an interesting scientific question. The estimate of the p-th
•
quantile regression function can be easily obtained by replacing p by
! in formulae (2.1) and
(2.2) in Chapter 2. However, the difficulty lies in establishing the asymptotic properties of
the conditional quantile regression estimates.
67
Based on the Bahadur's representation, Dabrowska. (1993) showed that
medn(x) - med(x) =
where IRln(x)1 = 0
Sn(med(x)lx) - 0.5
j(med(x)lx)
Cn"t)3/4 and IRln(x)1 = O(h
2
+ R1n(x) + R 2n (x)
,.
),
as n -
00
and medn(x) is constructed
using local constant fits. As a future research, the above representation will be generalized
..
to local linear fits. This result will be applied to establish the asymptotic normality of
med n( x). In other words,
vnh[medn(x) - med(x) - Bmed(X)] ~ N(O'O'~ed(x»,
for
x E J,
where
and
C(med(x)lx)
=-
r
°
ed(X)
..
dH 1( ulx )
Hi(u _lx)<P(x).
In this dissertation, we discussed the pointwise consistency of those estimates for the
conditional subsurvival, cumulative hazard, survival functions and the median and mean
regression functions. Future research is needed for the uniform consistency of these estimates
in x. In other words, for f.
> 0,
lim P(sup sup IHln{tlx) - Hl{tlx)1 > f.)
n-oo . xeJ te[O,T(X»
= 0,
lim P(sup sup IAn(tlx) - A(tlx)1 > f.)
n-oo xeJ te[o,T(X»
l
= 1,2,
= 0,
lim P(sup sup ISn(tlx) - S(tlx)1 > f.) = 0,
n-oo xeJ te[O,T(X»
lim P(sup sup Imedn{tlx) - med(tlx)1 > f.) = 0,
n-oo xeJ te[o,T(X»
lim P(sup sup Imn(tlx) - m(tlx)1 > f.) =
n-oo xeJ te[o,T(X»
and
°
need to be proven. The difficulty of proving uniform consistency of Hln(tlx) in t and x
simultaneously is that the estimate is a nonlinear form of two random variables X and
Jl,l = 1,2. i.e., it needs to be proven that the denominator of the estimate is uniformly
68
•
consistent in z and the numerator of the estimate is uniformly consistent in t and z. Then,
linearize the estimate by replacing the denominator with a constant.
For the purpose of statistical inference, we constructed pointwise confidence interval
•
and simultaneous confidence bands. However, due to the complexity of estimating the
bias term of the mean regression function, we suppressed the bias term by assuming that
the bandwidth h tends to zero slightly faster than the optimal rate n- 1 / 5 • An alternative
solution is to use a bootstrap approach, which needs to be developed for kernel smoothing.
A bootstrap approach could also be investigated for the confidence bands of the conditional
median regression.
At the present time, there is no single method that achieves both optimal theoretical
performance and acceptable practical performance at the same time in an automatic, datadriven bandwidth selection method. An automatic procedure for selecting the optimal band
width needs to be developed.
The kernel smoothing technique is known to have a dimensional problem. It is known
not to work well in dimensions higher than three. In this research, we examined the performance of the proposed method through the univariate case. The performance of the
proposed method in the multi-covariate case needs to be explored to confirm or to reject
the idea of a dimensional problem. Alternatively, generalized additive models (Hastie and
Tibshirani (1986» and projection pursuit (Friedman and Stiitzle (1981» can be considered.
69
Chapter 5
Proofs
5.1
Consistency
Proof of Proposition 2.1
Set Ki
= K(xf:l:) and 8j = Tbi E(Xi -
x)jK(xf:l:), j
IHtn(tlx) - Ht(tlx)1 =
= 0,1,2.
IA - B - C + DI
•
:5 IA - CI + ID - BI,
where
A
=
B
=
C
=
D
=
82[Tbi E1tiKi]
,
8082 - 8~
81[Tbi Elti(Xi - X)Ki]
8082 - 8~
82[Tbi E Ht(tlx )Ki]
8082 - 8~
81[;h E Ht(tlx )(Xi - x )Ki]
,
8082 - 8~
l
= 1,2.
Lemma 1 Under Conditions 1 and 2,
E(8j) =
hi f(x)
J
j
v K(v)dv(1 + 0(1))
and
var(sj)
= nh1 O(h 2"J).
Proof
This result follows from a standard argument in nonparametric smoothing area. See Parzen
(1961, Theorem lA) and Devroye & Gyorfi (1985, Theorem 3, p8). Here we give a direct
proof. By Conditions 1 and 2 and the dominated convergence theorem,
E(Sj) =
•
to
E (:h
~)Xi -
i
x)iK(X
x))
;
=
1
. Xl nh nE(XI - x)1K( h
=
. u-x
h1j(u-x)JK(-h-)!(u)du
=
j(hv)jK(v)!(X+ hv)dv
=
I(hv)j K(v)[j(x + hv) - !(x) + !(x)]dv
= h j !(x)
1vi
X
)
K(v)dv(l + 0(1))
as
h -. O.
Similarly,
va.r(Sj)
= E[sJ] -
[E(Sj)]2
= n2~2 {nE[(XI- x)jK(X\- X)]2 -
..
=
1 {
. Xl n2h2 nE[(XI -x)1K( h
=
n 21h 2 { n
=
2
n 21h 2 {n l(hv?jK (V)!(x
=
2j
n 21h 2 {nh +l!(x)
=
..!..-O(h2j )
nh
1
X
n 2[E(XI- x)jK(Xlh- X)F}
2
.
)] -n[E(XI -x)1K(
.
u- x
(u-x?JK 2(-h-)!(u)du-n
as
+ hv)hdv -
Xl h
X
)]
2}
[I·
u- X
(u-x)1K(-h-)!(u)du
]2}
n [/(hv)jK(V)!(X + hV)hdVf}
1
j
2j
2j
v K(v)dv(1+0(1))- nh +2p(x) [I V K 2(V)dV(1+0(1))f}
h -. O.
0
It follows from Lemma.! that ISj - E(Sj)1
SO~2 - s~ =
= 01'(1),
j
= 0,1,2,
and
h 2!2(x) j v 2K(v)dv(l + 01'(1)).
71
(5.1)
By the Glivenko-Cantelli Theorem, SUPt
IA - CI
~
I (lli-H~:lx»Ki I~ 0. See Beran (1981). Thus,
Op(l)o(l)
= op(l).
(5.2)
•
For M > 0,
p
(s~p
I:h E(Ili - Hl(tlx))(Xi - X)Kil > M)
E
~
(tr SUPt I E(Ili -
Hl(tlx))(Xi - X)Kil)
M
<
tr E (SUPt E Ihi - Hl(tlx )IIXj -
<
fE (IX
xlKi)
M
xIKI )
I -
M
= O(h)
Therefore, for n sufficiently large,
ID-BI =
ISI[tr EHl(tlx)(Xi -
X)Ki -
tr E1li(Xi -
X)Kil
s~
SOS2 -
~
Op
(h~) s~P I(:h E Hl(tlx )(Xi -
=
Op
(h~) Op(h)
=
op(l).
x )Ki - :h
I
E Ili(Xi -
x )Ki)
I
,.
(5.3)
Combining (5.2) and (5.3) gives
sup IHln(tlx) - Hl(tlx )I.!. 0,
L = 1,2.
t
This completes the proof of Proposition 2.1.
0
Proof of Proposition 2.2
For 0 ~ t < T(X) < 8Up{t : H 2(tlx)}, H2(tlx) > 0, H 2(0Ix) > 0, H 2n (tlx) > 0, and
H2n (0Ix) > O.
IAn{tlx) - A(tlx)1
= I It
Jo
=
=
Ir
Jo
IIt
Jo
= I It
I
r
dHI(slx) _
dHln(slx)
H 2(s -Ix) Jo H2n (s -Ix)
dHI(slx) _ It d(Hln(slx) - HI(slx)) _ It dHt(slx)
H2(s -Ix) Jo
H 2n (s -Ix)
Jo H 2n (s -Ix)
dHI(slx) _
H2(s -Ix) dHI(slx) _
d(Hln(slx) - HI(slx))
H 2(s -Ix) Jo H 2n (s -Ix) H2(s -Ix) Jo
H2n (s -Ix)
H2n(s -Ix) - H2(s -Ix) dHI(slx)
H 2n (s -Ix)
H2(s -Ix)
Jo
•
I
r
r
I
72
I
+
I1r d(H1nH(slx)
- H1(slx))/
2n(s-lx)
apply integration by parts
0
=
I1r H2n(s -Ix)
- H2(s -Ix) dH1(slx) I
H (s -Ix)
H (s -Ix)
2n
0
2
+1[Hln(slx)-Hl(Slx)]t _ r(H1n(slx)-H 1(slx))d(
1
)1
H2n (s -Ix)
0
10
H2n (s -Ix)
~
0(I)supIH2n (s -Ix) - H 2(s -lx)1 + 0(1) sup IH1n(slx) - H1(slx)1
•<t
.<t
This completes the proof of Proposition 2.2.
~ 0.
0
Proof of Proposition 2.3
The following argument is borrowed from Gill (1980, p.153).
Lemma 2 (Gill) Let A and B be right continuous increasing functions on [0,00) with
A(t) = B(t)
~A(t)
Let T B
.
=
°for t < °and
~A ~ 1, ~B
= A(t) - A(t-) & AC(t) == A(t) -
= inf{t: B(t) = oo}.
< 1 on [0, (0), where
E. 9
~A(s).
Then the unique locally (i.e., on each [O,t]) bounded solution
Z of
Z(t) =
lot 1 ~~~~s)d[A(S) - B(s))
(5.4)
exp( -AC(t)) TI.$t(1- ~A(s)) .
exp( -Bc(t)) TI.$t(1- ~B(s))
(5.5)
1-
on [0, TB) is given by
Z(t)
=
In this case, (5.4) and (5.5) can be rewritten as
Z(t)
r
= 1 - 10
Z(s-Ix)
1- ~A(slx)d[An(slx)- A(slx))
and
TI.<t(1- ~An(slx))
Z(t) = exp(-AC(tlx))TI.$t(1- ~A(slx))
.
Sn(tlx)
= S(tlx)'
The following argument is analogous to Schorack and Wellner (1986, p305).
Set Kn = f~ l_ak{.lx)d[An(slx) - A(slx)).
It follows from the above expressions that for t
Sn(tlx)
S(tlx)
73
:5 (J < T
S,,(tlx)
S(tlx) - S,,(tlx)
IS,,(tlx) - S(tlx)1
It S,,(s -Ix)
=
S(tlx) - S(tlx) 1
=
S(tlx)
1
S(s -Ix) 1- ~A(slx)d[A,,(slx)- A(slx)],
0
r S,,(s
-Ix) dK"
S(s - Ix)
. 10
•
< IS(t1x)11 It S,,(s -Ix) dK,,1
10
S(s-Ix)
-Ix) dK"1
apply integration by parts
S(s -Ix)
I1r S,,(s
K _ It K d{S,,(s -IX)}I
= Is,,(tIX)
S(tlx) " 10 "
S(s-Ix)
~
0
{S~9)} IK,,(t)1
<
+ II K"
II~
rIS(s -lx)dS,,(s -Ix)
- S,,(s -lx)dS(s -Ix) I
S2(s-lx)
10
< O(1)IIK,,(tlx)lIg
in
probability,
for
xeJ.
Now,
•
and
II K,,(tlx) IIg
~
II A,,(tlx) -
~
0,
for
A(tlx)
IIg +211 A,,(tlx) -
A(tlx) IIg [1- S(9)]/S(9)
x e J.
Thus,
sup IS,,(tlx) - S(tlx)1 ~ 0 for
x e J.
t
This completes the proof of Proposition 2.3.
0
Proof of Proposition 2.4
Let £ > 0 be arbitrary, but small enough that med(x)+£
< sup{t: S(tlx) > 0, G(tlx) > O}.
By (2.1) and (2.2),
and
_
1
S(med (x) - £Ix) > 2
In view of Proposition 2.3, for all sufficiently large n, we can replace S( 'Ix) by S,,( ·Ix).
Hence,
74
•
Beran (Corollary 4.1, 1981) proved that for x E J,
with probability one. These together imply that
P(med~(x) - med+(x)
> £) = P(med~(x) > med+(x) + £)
P(med~(x) -
> £) =
=0
for
x EJ
and
med-(x)
P(med~(x)
> med-(x) + £) = 0 for x
E J.
Therefore,
Imedn(x) - med(x)1 ~ 0
for
x E J.
0
Proof of Proposition 2.5
The following proof is the consistency of the truncated mean regression m( x; r(x)).
Let
£
> 0; then
Imn(x; r(x)) - m(x; r(x ))1 =
=
~
IfoT(X) (Sn(tlx) - S(tlx ))dtl
11°
T
f
(X)-t (Sn(t ,x)
..
rex)
I(Sn(tlx) - S(tlx))1 [rex) -
sup
rex)
JT(X)-!f
1
JT(X)-2 f
(Sn(tlx) - S(tIX))dtl
!£]
2
0StST(X)-!f
+
5.2
- S(tlx))dt+
(Sn(tlx) - S(tlx))dt
~ O.
Asymptotic Normality
Proof of Theorem 2.1
The proof of Theorem 2.1 depends on the following argument and Lemmas 1 and 3. Let
75
where hi and hi are given in Section 2.2.1. Then, ai and hi are obtained by minimizing Li
respect to ai and bi. Or, by solving
(5.6)
(5.7)
[Iii - al - bl(Xi - x)] - [Iii - aot - bot(Xi - x)]
=
-Cal - aol) - (hi - bol)(Xi - x).
(5.8)
By equating (5.8) to (5.6) and (5.8) to (5.7), we obtain
(5.9)
(5.10)
where S;i
= ~ E{Ili -
aot - boi(Xi - X»(Xi - x)iK(xfX),
j
= 0,1. Solving (5.9) and
(5.10) gives
(5.11)
(5.12)
Lemma 3 Under Conditions 1 -
4,
and
.
Proof
76
= Ii1J [Hl(tlu) -
. u-x
aot - bot(u - x)](u - xY K(-h-)f(u)du
= Ii1J [Hl(tlx + hv) -
.
aot- bol (hv)](hv)3K(v)f(x
+ hv)hdv
= ~h2+if(x)H~'(tlx)JV2+iK(V)dV(1+0(1))
•
as
h-O.
var[Sli] = E[Sli~ - [E(Sli)]2
=
{J E[(IlI - aol- bot(u - X)) IXI = U](U - X)
n 21h2 n
2
U
X
2'
2
J K (-h-)f(u)du
_n 2 [J E[IlI - aol - bol(U - x)IXI = U](U - xi K( u ~ X)f(U)dUf}
=
n 21h2 {n J E[(Ill-a ol-bot(hv))2IX I = x +hv](hv)2i K 2(v)f(x+hv)hdv
-n
=
n 21h2 {n J E[(Il I - aol)2 - 2(Ill - aot)bolhv + b~l(hv)2IXI = X + hv](hv)2iK 2(v)
f(x
=
[f ElIn - a", - b",(hvllX, ; z + hvJ(hvYK( v l/(z + hvlhdvr}
+ hv)hdv -
n
[~h3+i f(x)H;' (tlx) J
O(l))f}
n 21h2 {n J E[(Il I - a o )2lXI = x](hv)2iK 2(v)f(x)hdv(1 + 0(1)) + nh2i+3f(x)b~
J v2i+2K2(v)dv_nh2i+6f2(x)
=
v2+i K(v)dv(l +
[~H;(tlx) J
v2+i K (V)dV(1+0(1))f}
2
h :';x) {Hl(t I X)[l- Hl(tlx)] J v2i K 2(v)dv(1 + 0(1))
+h2 J v 2i +2K 2(v)dv(1 + 0(1)) - h 5 f(x)
= n~ h2i f(x)Hl(tlx)[l- Hl(tlx)] J
[~H~'(tlx) J
v2+i K(v)dv(l +
v2i K 2(v)dv(1 + 0(1)).
0
Lemma 4 Under Conditions 1 - 4,
p
SrJ - ESrJ 5t } = 4.)(t) +0(1),
{ vvar(Sli)
Proof
We verify the Liapounoff condition: for some D > 0,
77
j
= 0,1,
l
= 1,2.
O(l))f}
EI(Iii - aot - bot(Xi - X»(Xi - X)iK(Xi;: X)r+S
=
J
J
=
h(2+6)i+I j(x)
=
E[I/n - aot - bot( U- X)1 2+SIX1 = U]( U- x )(2+6)iK2+6( u ~ x )j(u)du
E[llt1 - a oil2+ 6 lX1 = x + hv]lhvl(2+6)iK 2+6(v)j(x)hdv(1 + 0(1»
J
E[I/n - aot1 2+SIX1 = x]lvl(2+6)iK2+6(V)dv(1 + 0(1)),
•
= 1,2.
l
Thus,
E E l(Iii - aoi - bot(Xi - X))(Xi - x)i K( ¥)/2+
6
[var (E(Iii =
aoi - bot(Xi - X))(Xi -x)i K( XiCI:))] 1+6/2
nM2+6)j+1 j(X) J E[I/i1 - aot1 2+SIX1 = X]v(2+S)iK2+S(v)dv(1 + 0(1»
[nh 2i+I j(X)]1+t[J E[(In - aot)21X1 = x]v 2i K2(V)dv(1 + 0(1»]1+6/2
=
1
J E[I/i1 - aot1 2+SIX1 = x]v(2+6)iK2+S( v)dv(1 + 0(1»
[nhj(x)]f [J E[(Ii1 - aot)21X1 = x]v 2i K2(v)dv(1 + 0(1))]1+6/2
= 0(1). 0
By Lemmas 1 and 3,
Hin (tl X) - Hi (tl X) -- al - aot A
-
82
Sio(tlx) - 8 1Si1(tlx) - Sl'o(tlx) (1 + (1))
2
j()
01"
808 2 - 8
X
1
It follows from Lemma 4 that
where E[Sio] and var[Sio] are given in Lemma 3,
78
..
l = 1,2.
..
and
Therefore,
vnh[Hin(tlx) - Hi(tlx) - BH1 ] ~ N
(0,0'1
1
),
l = 1,2.
The symbol ~ denotes asymptotic convergence in distribution.
To show the covariance structure in Theorem 2.1, let Win(tlx) = Jnh[Hin(tlx) -
Hi(tlx)], l
= 1,2. Then Win(tlx) can be decomposed to variance part and bias part, i.e.,
Win(tlx)
where Win(tlx)
=
vnh[Hin(tlx) - Hi(tlx)]
=
vnh[Hin(tlx) - Hi(tlx) + Hi(tlx) - Hi(tlx)]
=
Win(tlx) + vnhBHl'
= Jnh[Hin(tlx) -
Hi(tlx)],
Hi(tlx) =
...
and Sii
= r!r E Iii (Xi -
x)i K( ¥), j
_l_)Win(t I X)
( .;:;Ji
=
E[S2 SlO - Sl Sil]
E[SOS2 - s~]
= 0,1.
g(x)
SOS2 - s~
(5.13)
Note that
{S2SlO-S1Sil_Hi(tlx/OS2-S~}
g( x )
g(x )
where g(x) = h 2j2(x)fv 2K(v)dv. Let
W;n(tlx) =
Since
For
,(x) 2
60 6 2- 6 1
°< s, t <
vnh{S2S~~;lSil_Hi(tIX)SO:2(~S~}.
(5.14)
~ 1 by (5.1), Win(tlx) has the same asymptotic distribution as Wi*n(tlx).
00,
cov(Win(slx), W 2n(tlx)) = E[Win(slx)W2n (tlx)]
..
=
g~~)E ([[S2SlO(slx) -
SlSl1(slx)] - Hln(slx)[SOS2 -
[[S2S20(tlx) - SlS2l(tlx)] - H 2n (tlx)[SOS2 - S~]]}
= g2(x)
~E[A* -
B* - C* - n*]
79
S~]]
where
A*
B*
C*
D*
As h - 0 and nh -
00,
=
=
=
=
[S2 S l0(slx) - SlSl1(slx)][S2S20(tlx) - SlS2l(tlx)]
2
A
[S2 SlO(slx) - SlSl1(slx)]H2n (tIX)[SOS2 - Sl]
2
A
[S2 S20(tlx) - SlS2l(tlx)]H ln (sIX)[SOS2 - Sl]
A
22
A
H ln (slx)H2n(tlx)[saS2 - Sl] .
cov(Win(slx), W:zn(t/x» converges to
where
""( )= f K2(v)dv
'fJ
x
f(x)·
Note that E[A*], E[B*], E[C*], and E[D*] are computed in Lemmas 5-7.
Lemma 5 Under Conditions 1 - .4 , for a, b = 0,1,2, and c, d = 0,1,
(n 4 h 4 ) E[SlJSbSlc( six )S2d(tlx)]
= nhlJ+b+C+d+lf(x)Hl(svtIX)! vlJ +b+c+dK4 (v)dv(1+0(1»
+n(n - 1)hlJ +b+ c+d+2 f2(x) {Hl(s V tlx)! vb+c+d K 3(v)dv! vlJ K(v)dv(l + 0(1»
+Hl(s V t1x)! vlJ +bK 2(V)dv! v c+dK 2(v)dv(1 + 0(1»
...
+Hl (s lx)H 2(t I X)! vlJ+b+CK3(v)dv! v dK(v)dv(l + 0(1»
+Hl (s IX)H 2(t IX)! vCK(v)dv! vlJ+b+dK3(v)dv(1+0(1»
+Hl(s V tlx)! v lJ+c+d K 3(V)dv! vb K(v)dv(l + 0(1»
+Hl (slx)H 2(tlx)! vlJ +CK 2(v)dv! vb+ dK 2(v)dv(1 + 0(1»
d
+Hl (s lx)H 2(t I X)! vlJ + K 2(v)dv! vb+ cK 2(v)dv(1 +
O(l»}
+n(n - l)(n - 2)hlJ +b+c+d+3 f3(x) {
H l (s l x)H2(t IX)! vlJ+b K 2(v)dv! VCK(V)dv! v dK(v)dv(l + 0(1»
+Hl(s V tlx)! vlJ K(V)dv! vb K(V)dV! vc+dK 2(v)dv(1 + 0(1»
+Hl (slx)H 2(tlx)
+Hl (slx)H 2(tlx)
!
!
vlJ K(v)dv! v
b+cK 2(v)dv! vdK(v)dv(l + 0(1»
vlJ K(v)dv! vb+d K
2(v)dv! VC K(v)dv(l + 0(1»
+Hl (s I X)H 2(t I X)! v lJ + K 2(V)dV! vbK(v)dv! vdK(v)dv(l + 0(1»
C
80
"
+ HI (slx)H2(tlx)
J
Vo+dK 2(v)dv
J
J
Vb K(v)dv
VCK(v)dv(1 + 0(1))}
{J
+n(n - 1)(n - 2)(n - 3)h o+b+ c+dH r(x)HI (slx)H 2(tlx)
J
..
J
vbK(v)dv
J
vdK(v)dv(1 + 0(1))} ,
vCK(v)dv
V
O
as
K(v)dv
h
-+
O•
Proof
Let Ki = K(Xf:&), Kj = K(Xf:&), Kk = K(Xf:&), Kl = K(X'h-:&)' K(·) = K(T)'
Ili(S) = I(l'i
~ S,6i
= 1), I2i(t) = I(l'i
~
t).
(n 4 h4 ) E[SoSbSlc(slx )S2d(tlx)]
=
E {E(Xi - X)OKiE(Xj - x)bKjEI1k(s),(Xk - X)CKkEI2l(Xl- X)d K l }
i
j
L=tl=l
= E
+
k
l
Ju( s V t)(X; - x )0+'+0+" Kt
n(n-l)
E
[(Xi - X)O Ki][Ili(S V t)(Xj - X)b+c+d Kj]
i¢j=k=l
..
+
n(n-l)
E
[(Xi - x )o+b Kl][I1k (S V t)( Xk - X)c+d K~]
i=j¢k=l
."
+
n(n-l}
E
[Ili(S)(Xi-Xt+b+cKrUI2l(t)(Xl-X)dKtl
i=j=k¢l
n(n-l)
+
E
[I2i(t)(Xi - x)o+b+ dKrUI1k (s)(Xk - XYKk]
i=j=l¢k
+
n(n-l)
E
[hies V t)(Xi - x)o+c+dKt][(Xj - x)bKj]
i=k=l¢j
n(n-l)
+
E
[Ili(S)(Xi - X)o+c Kl][I2j(t)(Xj - X)b+d KJ]
i=k¢j=l
n(n-l)
+
E
[I2i(t)(Xi - X)o+d Kl][Ili(s)(Xj - X)b+ cKJ]
i=l¢j=k
n(n-l)(n-2}
+
E
[(Xi-xt+bK1I1k(s)(Xk-XYKk][I2l(t)(Xl-X)dKl]
i=j¢k¢l
n(n-l)(n-2)
+
E
[(Xi - xt Ki][(Xj - x)bKj][Ilk(S V t)(Xk - X)c+d Kl]
i¢j#=l
+
n(n-l)(n-2)
E
[(Xi - X)O Ki][Ilj(s)(Xj - x)b+ cKJ][I2l(t)(Xl- x)dKtl
i¢j=k¢l
81
n(n-l)(n-2}
E
+
[(Xi - x)tJKi][I2j(t)(Xj - x)b+ dKJ][I1k(s)(Xk - X)CKk]
i.pj=l#
+
n(n-l)(n-2)
E
[lti(S)(Xi - X)tJ+cKl][(Xj - X)b Kj][I21(t)(Xl - X)d Kt]
j.pi=k.pl
+
n(n-l)(n-2}
E
[I2i(t)(Xi - X)tJ+dKl][(Xj - X)b Kj][I1k(s)(Xk - X)C Kk]
j.pi=k.pl
+
n(n-l)(n-2)(n-3}
E
}
[(Xi - X)tJKi][(Xj - x)bKj][I1k(s)(Xk - x)CKk][I2l(t)(Xl- X)dKl]
i.pj.pk.pl
=
n I E[I1 (s
V
t)lu](u - X)tJ+b+c+d K 4 (·)f(u)du
+n(n - 1)1 E[I1 (s V t)lu](u - x)b+C+dK 3 (·)f(u)duI(U - x)O K(·)f(u)du
+n(n - 1)1 E[I1 (s V t)lu]( u - x)c+dK 2(·)f(u)duI (u - X)tJ+bK 2(. )f(u)du
+n(n - 1)1 E[lt(s)lu]( u - X)o+b+ cK 3 ( ·)f(u)duI E[I2(tlu)]( u - X)cl.K( ·)f(u)du
+n(n - 1)1 E[I1 (s)lu](u - X)CK(')f(u)dul E[I2(tlu)](u - x)o+b+dK 3 (·)f(u)du
+n(n - 1)1 E[I1 (s V t)lu](u - x)o+c+dK3 (')f(u)duI (u - x)bK( ·)f(u)du
+n(n -1)1 E[I1 (s)lu](u- X)tJ+CK2(')f(u)dul E[I2(t)lu](u- x)b+ dK 2(·)f(u)du
+n(n -1) I E[I1 (s)lu](u- X)b+CK 2(')f(u)dul E[I2(t)lu](u- x)o+dK 2(·)f(u)du
+n(n - 1)(n - 2) {/(u - x)tJ+b K 2(·)f(u)duI E[I1 (slu)](u - XY K(·)f(u)du
I E[I2 (t)lu](u-x)d K (')f(u)dU}
+n(n - 1)(n - 2) {/(u - x)O K(.)f(u)du/(u - x)bK(·)f(u)du
I E[I1 (s
V
t)lu](u - x y+dK 2 (')f(U)dU}
+n(n - 1)(n - 2) {/(u - x)tJ K(·)f(u)duI E[I1 (s)lu](u - x)b+cK 2(·)f(u)du
I E[I2(t)lu](u - x)dK(')f(U)dU}
+n(n - 1)(n - 2) {/(u - x)O K(·)f(u)duI E[I2(t)lu](u - x)b+dK 2(·)f(u)du
I E[I1 (s)lu](u-x)CK (')f(u)dU}
+n(n - 1)(n - 2) {I E[I1 (s)lu](u - x)tJ+c K2(')f(u)dul (u - x)b K(·)f(u)du
I E[I2(t)lu](u-x)d K (')f(u)du}
82
•
+n(n - 1)(n - 2) {j E[/2(t)lu](u - x)o+dK 2(.)j(u)duj(U - x)bK(·)j(u)du
j E[/I(s)lu](u - XrK(.)j(U)dU} "
+n(n -1)(n- 2)(n - 3) {j(U - X)OK(.)j(u)duj(u - x)bK(.)j(u)du
j E[/I(s)lu]( u - x)CK(· )j(u)duj E[/2(t)lu]( u - x)dK( .)j(U)dU}
=
n j H 1 (sVt/x)(hv)o+b+ c+dK 4 (v)j(x+hv)hdv
+n(n -1) j H 1 (s V tlx)(hv)b+ c+dK 3 (v)j(x + hv)hdvj(hv)OK(V)j(x + hv)hdv
+n(n - 1) j H 1 (s V tlx)(hv)c+dK 2(v)j(x + hv)hdvj(hv)O+bK 2(v)j(x + hv)hdv
+n(n - 1) j H 1 (slx)(hv)o+b+ cK 3 (v)j(x + hv)hdvj H2(tlx)(hv)dK(v)j(x
+ hv)hdv
+n(n -1) j H 1 (slx)(hv)CK(v)j(x + hv)hdvj H2(tlx)(hv)o+b+dK 3 (v)j(x + hv)hdv
+n(n - 1) j H 1 (s V tlx)(hv)o+c+dK 3 (v)j(x + hv)hdvj (hv)b K(v)j(x + hv)hdv
+n(n - 1) j H1 (slx)(hvt+ cK 2(v)j(x + hv)hdvj H 2(tlx)(hv)b+ dK 2(v)j(x
+ hv)hdv
+n(n - 1) j H1 (slx)(hv)b+ cK 2(v)j(x
+ hv)hdv
+ hv)hdvj
+n(n - 1)(n - 2) {j(hv)O+b K 2(v)j(x
..
H2(tlx)(hv)o+dK 2(v)j(x
+ hv)hdvj H1 (slx)(hv)CK(v)j(x + hv)hdv
j H2(tlx)(hv)dK(v)j(x + hv)hdv}
+n(n - 1)(n - 2) {j(hv)OK(v)j(x + hv)hdvj(hv)bK(V)j(X + hv)hdv
j H 1 (s
V
tlx)(hv)c+dK 2(v)j(x + hv)hdv}
+n(n - 1)(n - 2) {j(hV)O K(v)j(x + hv)hdvj H1 (slx)(hv)b+ cK 2(v)j(x + hv)hdv
j H2(tlx)(hv)dK(v)j(x + hv)hdV}
+n(n -1)(n - 2) {j(hV)OK(V)j(X + hv)hdvj H2(tlx)(hv)b+ dK 2(v)j(x + hv)hdv
j H1 (slx)(hv)CK(v)j(x + hV)hdv}
+n(n - 1)(n - 2) {j H1 (slx)(hv)o+cK 2(v)j(x + hv)hdvj (hv)b K(v)j(x + hv)hdv
j H 2(tlx)(hv)dK(v)j(x + hV)hdV}
+n(n -1)(n - 2) {j H 2(tlx)(hv)o+dK 2(v)j(x + hv)hdvj(hv)bK(V)j(X + hv)hdv
j H 1 (slx)(hvrK(v)j(x + hV)hdV}
83
+n(n - 1)(n - 2)(n - 3) {j (hvt K(v)j(x + hv)hdvj(hv)bK(v)j(x + hv)hdv
j H1(slx)(hv)CK(v)j(x + hv)hdvj H2 (tlx)(hv)dK(v)j(x + hV)hdV}
=
nha+b+c+d+I j(x)H1(svtlx) j va+b+c+dK 2(v)dv(1+0(1»
+n(n-1)h a+b+ c+d+2j2(X){H1(SVt IX)j Vb+C+dK3(V)dvj v aK(v)dv(1+0(1»
+H1(s V tlx) j va+bK 2(v)dv j vc+dK 2(v)dv(1 + 0(1»
+H1(slx)H 2(tlx) j Va+b+CK3(V)dvj vdK(v)dv(1 + 0(1»
+H1(slx)H2 (tlx) j vCK(v)dvj v a+b+dK3(v)dv(1 + 0(1»
+H1(Svt lx)j va+c+dK 3(V)dvj vbK(v)dv(1+0(1»
+H1(slx)H 2(tlx) j va+cK 2(v)dv j vb+ dK 2(v)dv(1 + 0(1»
+H1(slx)H 2(t IX)j va+dK 2(v)dvj Vb+ CK2 (V)dV(1+0(1»}
+n(n - 1)(n - 2)ha+b+c+d+3 j3(x) {
H 1(slx)H2(t IX)j Va+bK2(V)dvj VCK(v)dvj v dK(v)dv(1+0(1»
+H1(s V tlx) j vaK(v)dvj vbK(v)dvj Vc+dK 2(v)dv(1 + 0(1»
+H1(slx)H2 (t IX)j vaK(v)dvj vb+CK2(V)dvj v dK(v)dv(1+0(1»
+H1(slx)H2 (t IX)j vaK(v)dvj vb+dK2(v)dvj vCK(v)dv(1+0(1»
+H1(slx)H2(t IX)j va+cK 2(v)dvj vbK(v)dvj vdK(v)dv(1+0(1»
+ H 1(slx)H 2(t IX)j v a+dK 2(v)dvj vbK(v)dvj vCK(v)dV(1+0(1»}
+n(n - 1)(n - 2)(n - 3)ha+b+ c+dH j4(x)H1(slx)H 2 (tlx) {j va K(v)dv
j vbK(v)dvj vCK(v)dvj vdK(v)dV(1+0(1»}.
0
Thus,
(n 4 h4 )E[A*]
= (n 4h4)E {S~81O(slx)82o (tlx) - SIS281O(slx )821 (tlx) - SIS2811(slx )820(tlx)
+S~811(slx)S21(tlx)}
=
n(n - 1)h6 j2(x) {H 1(S V tlx) j v4K 2 (v)dv j K 2(v)dv(1 + 0(1»
-2H1(s
V
tlx) j v3K 2(v)dv j vK 2(v)dv(1 + 0(1»
84
•
-2Hl (slx)H 2(t/x) j v3K 2(V)dvj vK 2(v)dv(1+ 0(1»
Vt/x) [j v2K 2(v)dv(1 + 0(1»f
+2Hl (slx)H2(tlx) [f v 2K 2(v)dv(1 + 0(1»f}
+Hl(s
+n(n -1)(n - 2)h7 f3(x) {Hl(S v tlx)
•
U
v2K(V)dvf j K 2(v)dv(1 + 0(1»
+2Hl (slx)H 2(tlx) j v2K 2(v)dvj v2K(v)dv(1+ 0(1»
+Hl (s/x)H2(tlx) j v 4 K 2(v)dv(1 + 0(1»}
+n(n - 1)(n - 2)(n - 3)h8 f 4 (x)Hl (s/x)H2(tlx) [j v 2K 2(v)dv(1 +
0(1»f .
Lemma 6 Under Conditions 1 - '" , for a, b, c = 0,1,2, and d = 0,1,
(n 4 h4 ) E[SoSbScSld(six)]
=
nho+b+c+dH f(x)Hl(slx)
j vo+b+c+dK 4 (v)dv(1
+ 0(1»
+n(n - 1)ho+b+c+d+2 f2(x)H l (slx) {j vb+c+dK 3(v)dv j v OK(v)dv(1 + 0(1»
+ j v O+bK 2(V)dvj vc+dK 2(v)dv(1+0(1»+ j vO+b+CK3(v)dvj v dK(v)dv(1+0(1»
+ j vCK(V)dvj v o+b+dK 3(v)dv(1+0(1»+ j VO+C+dK 3(V)dvj v bK(v)dv(1+0(1»
+ j vO+CK 2(v)dvj vb+ dK 2(v)dv(1+0(1»+ j v O+dK 2(v)dvj vb+ CK 2(v)dv(1 + 0(1»}
+n(n - 1)(n - 2)ho+b+c+d+3 f3(x)H 1(slx) {
j
VO+bK2(v~dvj vCK(v)dvj vdK(v)dv(1+0(1»
+j vOK(v)dvj vbK(V)dvj vc+dK 2(v)dv(1+0(1»
+j
V
O
K(v)dvj vb+ cK 2(v)dv j v dK(v)dv(1 + 0(1»
+j vOK(v)dvj vb+dK2(v)dvj vCK(v)dv(1 + 0(1»
+j VO+CK 2(v)dvj vbK(V)dvj vdK(v)dv(1+0(1»
+ j VO+dK 2(V)dvj vbK(v)dvj v CK(V)dV(1+0(1»}
+n(n - 1)(n - 2)(n - 3)ho+b+ c+d+4 f 4 (x)Hl (S I X)'{j V O K(v)dv
j vbK(V)dvj vCK(v)dvj vdK(V)dV(1+0(1».}
/
85
Proof
We use the same notations as defined in Lemma 5.
(n 4 h 4 ) E[SClSbScSld(slx)]
{~(X; - ').K;~(X; - ')'K;t:(X. - ')'K.~I,,(X, - ')'K'}
= E
=
L=tl=/H(S)(X; - .)"+H-<+'Kt
E
+
n(n-l)
~
[(Xi - X)ClKi][I1j(s)(Xj - x)b+c+dKj]
i¢j=lc=l
n(n-l)
+
~
[(Xi - X)ClHKl][IlIc (s)(XIc - x)c+dKl]
i=j¢lc=l
+
n(n-l)
~
[(Xi- X)CI+b+ cKf][Ill(s)(Xl-x)dKt]
i=j=lc¢l
+
n(n-l)
~
[Ili(S)(Xi- X)ClH+dKf][(XIc-X) CKk]
i=j=l¢k
+
n(n-l)
~
[Ili(s)(Xi- X)CI+c+dKf][(Xj-x)bKj]
•
i=k=l¢j
n(n-l)
+
~
[(Xi - x)CI+cKl][I1j(s)(Xj - X)b+dKJ]
i=k¢j=l
n(n-l)
+
~
[I1i(S)(Xi - X)CI+dK;][(Xj - x)b+cKJ]
i=l¢j=1c
n(n-l)(n-2)
+
~
[(Xi - xt+bKl][(Xk - X)CKk][Ill(t)(Xl- x)dKt]
i=#Ic¢l
n(n-l)(n-2)
+
~
[(Xi - X)ClKi][(Xj - x)bKj][IlIc(S)(Xk - xy+dKl]
##k=l
n(n-l)(n-2)
+
~
[(Xi - X)ClKi][(Xj - x)b+ cKJ][Ill(s)(Xl- x)dKt]
i¢j=k¢l
n(n-l)(n-2)
+
~
[I1j(s)(Xj - X)b+dKJ][(Xi - x)ClKi][(Xk - XYKk]
i¢j=l¢1c
n(n-l)(n-2)
+
~
[(Xi - x)CI+cKl][(Xj - x)bKj][Ill(s)(Xl- x)dKtl
#i=lc¢l
86
•
+
n(n-l)(n-2)
L
[/li(S)(Xi - x)o+dK?][(Xj - x)bKj][/lk(S)(Xk - XrKk]
j¢i=l#
+
.
n(n-l)(n-2)(n-3)
.
L
}
[(Xi - X)OKi][(Xj - x)bKj][(Xk - XYKk][/ll(S)(Xl- x)dKt]
i¢j¢k¢l
=
n j E[/l(S)lu](U - X)o+b+c+d K 4 (·)f(u)du
+n(n - 1) j E[/l(S)lu](u - x)b+ c+dK 3(·)f(u)duj(U - x)O K(·)f(u)du
+n(n - 1) j E[/l(S)lu](u - xr+ dK 2(·)f(u)duj(U - x)o+bK 2(·)f(u)du
+n(n - 1)j E[/l(S)lu](u - x)dK(')f(u)duj(u - x)o+b+ cK 3(·)f(u)du
+n(n -1) j E[/l(S)lu](u - X)0+b+dK3(.)f(u)duj(u - xYK(')f(u)du
+n(n -1) j E[/l(S)/U](u- X)0+C+dK 3(.)f(u)duj(u- x)bK(·)f(u)du
+n(n - 1) j E[/l(S)lu](u - x)b+dK 2(·)f(u)duj(U - x)o+cK 2(·)f(u)du
+n(n - 1) j E[/l(S)lu](u - x)o+dK 2(·)f(u)duj(U - x)b+ cK 2(·)f(u)du
+n(n - 1)(n - 2) {j E[lt(s)lu](u - x)dK(·)f(u)duj(U - x)o+bK 2(·)f(u)du
•
j(U- XYK(.)f(U)dU}
+n(n -1)(n - 2) {j E[lt(s)lu](u - xy+d K 2(·)f(u)du j(u - x)O K(·)f(u)du
j(U- x)bK(·)f(u)du
+n(n -1)(n - 2) {j E[/l(S)lu](u - X)dK(')f(u)duj(u - x)OK(·)f(u)du
j(u- X)b+CK2(')f(U)dU}
+n(n - 1)(n - 2) {j E[lt(s)lu](u - x)b+d K 2(')f(U)duj(u - xt K( ·)f(u)du
j(U- X)CK(')f(u)dU}
+n(n -1)(n - 2) {j E[lt(s)lu](u - x)dK(·)f(u)duj(U - x)0+cK 2(·)f(u)du
j (u - x)bK(')f(U)dU}
+n(n -1)(n - 2) {j E[lt(s)lu](u - X)0+dK2(')f(u)duj(u - x)bK(·)f(u)du
j(U - X)CK(.)f(U)dU}
+n(n -1)(n - 2)(n - 3) {j E[/l(S)lu](u - x)dK(')f(u)duj(u - x)OK(')f(u)du
87
/ (u - X)b K(·)j(u)du/(u - X)d K(.)j(U)dU}
nho+b+c+d+I j(x)H1(8I x ) / Vo+b+c+d K 4(v)dv(1
=
+ 0(1))
+n(n _1)h o+b+ c+d+2 j2(x)H1(8Ix) {/ vb+c+ dK 3(v)dv/ v OK(v)dv(1 + 0(1))
+/
vO+bK2(V)dv/ v c+dK 2(v)dv(1 + 0(1)) +/ VO+b+CK3(V)dv/ vdK(v)dv(l+o(l))
+/
VCK( v )dv/ vo+b+d K 3(v)dv(1 + 0(1))/ v o +c+d K 3(v)dv/ vb K( v)dv(1 + 0(1))
+/
vo +c K 2(V)dv/ vb+dK 2(V)dv(1 + 0(1))
+/ VO+dK 2(V)dv/ v b+CK 2(v)dV(I+0(1))}
+n(n -1)(n - 2)ho+b+ c+d+3 j3(x)H1(8Ix) {
/ v O+bK 2(V)dv/ vCK(v)dv/ v dK(v)dv(l+o(I))
+/ vOK(v)dV/ vbK(v)dv/ vc+dK 2(v)dv(1 + 0(1))
+/
VOK(v)dV/ Vb+CK2(V)dv/ v dK(v)dv(l+o(l))
+/ vOK(v)dv/ Vb+dK2(V)dv/ vCK(v)dv(1 + 0(1))
+/
VO+CK 2(V)dV/ vbK(v)dv/ v d K(v)dv(1 + 0(1))
+/
vO+d K 2(V)dV/ vbK(V)dv/ VCK(V)dV(I+0(1))}
..
+n(n - 1)(n - 2)(n - 3)ho+b+c+dH j4(x)H1(8Ix) {/ VO K(v)dv
/ vbK(v)dv/ VCK(v)dv/ VdK(v)dv(1 + 0(1))}.
0
Thus,
(n4h4)E[B*]
=
= H2n (tlx)E [808~81O(8Ix) -
n(n -1)hSP(x)H1(8Ix)H2(tlx)
+/
8~8281O(8Ix) - 808182811(8IX) + 8~811(8Ix)]
{-4/ vK2(v)dv/
v3K 2(v)dv(1 + 0(1))
K 2(v)dv/ v 4 K 2(v)dv(1 + 0(1)) + 3 [/ v 2K 2(v)dv(1 +
+n(n - 1)(n - 2)h7 j3(x)H1(8Ix)H2(tlx)
{2/ v2K 2(v)dv/
O(1))f}
v 2K(v)dv
O(1))f}
+n(n - 1)(n - 2)(n - 3)h j4(x)H1(8Ix)H 2(tlx) [/ v2K(v)dv(1 + O(I))f.
+/
v4K 2(v)dv(1 + 0(1)) +/ K 2(v)dv [/ v 2K(v)dv(1 +
8
By symmetry, E[C*] = E[B*].
88
Lemma 7 Under Conditions 1 -
4,
(n 4h4)E[stJs"ScSd] = nhtJ+,,+c+d+I j(x)/ v tJ +,,+c+dK 4(v)dv(1 + 0(1))
....
•
+n(n - 1)htJ +,,+c+d+2 j2(x) {/ v"+c+dK 3(v)dv/ v tJ K(v)dv(1 + 0(1))
+/
v tJ +"K 2(V)dv/ v c+dK 2(v)dv(1 + 0(1))+ / vtJ +"+CK 3(v)dv/ v dK(v)dv(1 + 0(1))
+/
VCK(v)dv/ vtJ+"+dK 3(v)dv(1 + 0(1))/ vtJ +c+dK 3 (v)dv/ v"K(v)dv(1 + 0(1))
+/ vtJ+CK 2(V)dv/ v"+dK2(v)dv(1+0(1))+ / vtJ+dK 2(V)dv/ v"+CK 2(v)dv(1+0(1))}
+n(n -1)(n - 2)htJ+,,+c+d+3 j3(x) {/ vtJ +"K 2(v)dv/ vCK(v)dv/ vdK(v)dv(1+ 0(1))
+/ vtJK(v)dv/ v"K(v)dv/ v c+dK 2(v)dv(1 + 0(1))
+/
vtJK(v)dv/ v"+CK2(v)dV/ vdK(v)dv(1+0(1))
+/
vtJK(v)dv/ v"+dK2(v)dv/ v CK(v)dv(1+0(1»
+/
vtJ +cK 2(v )dv/ v"K( v )dv/ vdK( v)dv(1 + 0(1))
+/
v tJ +dK 2(V)dv/ v"K(v)dv/ vCK(v)dv(1 + 0(1))}
+n(n -1)(n - 2)(n - 3)h tJ +,,+c+dH j4(x) {/ vtJK(;)dv/ v"K(v)dv/ vCK(v)dv
JvdK(v)dv(1 + 0(1))}.
0
Thus,
(n h )E[D*] = H 1n (slx)H2n (tlx)E[sOS2 - 2S0 S2S1 + 81]
44
=
•
•
22
2
4
{-4/ vK 2(v)dvJv K 2(v)dv(1 + 0(1))
+/ K 2(v)dv Jv4K 2(v)dv(1 + 0(1)) + 3 [J v 2K 2(v)dv(1 + O(1))f}
+n(n - 1)(n - 2)h j3(x)H1(slx)H2(tlx) {2/ v 2K 2(v)dv/ v 2K(v)dv
+/ v4K 2(v)dv(1 + 0(1)) +/ K 2(v)dv [/ v2K(v)dv(1 + 0(1»f}
+n(n -1)(n - 2)(n - 3)h j4(x)H1(slx)H 2(tlx) [j v 2K(v)dv(1 + 0(1))f.
n(n -1)h6 j2(x)H1(slx)H 2(tlx)
3
7
.
S
Therefore,
(n 4h4)E[A* - B* - C* + D*]
=
n(n -1)h6 P(x)[H1(s V tlx) - H 1(slx)H2(tlx)] {/ K 2(V)dV/ v4K 2(v)dv(1 + 0(1))
89
2
3
+ [J v 2K 2(V)dV(1+0(1»f -2J VK (V)dvJ V K 2(V)dV(1+0(1»}
+n(n -l)(n - 2)h7 f3(x)[H 1(s
Vtlx) - H (slx)H2(tlx)] [f v2K(v)dv(l + O(l»f
1
J K 2(v)dv(1 + 0(1».
Applying the above result and assuming nh
More generally, as nh
-+ 00
-+ 00,
..
h -+ 0, we obtain
and h -+ 0,
cov(Wtn(slx), Wtln(tlx»
-+
l = 1,2.
[Htl\ll(s V tlx) - Ht(slx)Ht(tlx)]</>(x),
This completes the proof of Theorem 2.1.
0
Proof of Corollary 2.1
First, we shall decompose the MPE (Mean Product Error) into a covariance term and
product of two bias terms.
E[W,n(slxi)W,ln(tlxi' )]
=
E[Vnii"(H,n(slxi) - HI(slxi»Vnii"(H,n(tlxi l ) - H,(tlxi l»]
=
E {Vnii"[H,n(slxd - E(Hln(slxd) + E(H,n(slxi» - H,(slxdl
=
+ E(H'ln(tlxi l» - H,I(tlxi / )]}
E {[Wl~(slxi) + Vnii"BHAslxi)][W'~n(tlxi/) + v'nhBHtl(tlxil )]}
cov(W,~(SIXi), W'~n(tlxil» + nhBHt(slxi)BHt,(tlxi')
0(1) + nhBHt(slxi)BHt,(tlxi') as nh -+ 00, h -+ O.
=
=
=
Vnii"[H,ln(tlxi l ) - E(H'ln(tlxi l»
To prove cov(W,~(slxi), W'~n(tlxi'»
cOV(Sj(Xi),Stj/(Xi /»
-+
0, it is sufficient to show that cov(sj(xd, Sj,(Xi'»
= 0, cov(Stj(Xi),St1j/(Xi/» = 0,
j,j'
= 0,1,2,
l,i'
= 0,
= 1,2, asymptot-
ically.
Lemma 8 Under Conditions 1 and 2, and nh
-+ 00,
h -+ 0,
...
Proof
90
=
n 21h2 {E [£;(XA: - Xi)j(Xl- xii K(XA: ~ Xi )K(Xl ~
+ L:(XA: A:~
Xi)j K(XA:
+E[L:(XA: A:
1
=
{
n 2h2 n
-n I
=
-
Xi
h
I
(U -
)(Xl - xii K(Xl- Xi')]
h
Xi)j K(XA:
.
~ xi )]E[L:(Xl- xi,i K(Xl ~ Xi' )]}
I
U - Xi
.,
(u-xi)JK(-h-)(U-Xi,)3 K(
Xi)j K( U
~ Xi )f(u)du I( u -
n 21h2 {n I(hv)j K(V)(Xi - Xi'
:h {hjf(Xi) I
1
nh 0(1),
h
I
)f(u)du
K( U -h Xi' )f(U)dU}
Xi'
+ hv)j'K(Xi ~ Xi' + V)f(Xi + hV)hdV}
vi(Xi-Xi,+hviK(Xi~Xi' +v)dv(l+o(l))
j
-hj+! f2(Xi) I v K(v)dv I(Xi =
Xi' i
U - X·,
+ hv)j'K(Xi ~ xi' + V)f(Xi + hv)hdv
-n l(hv)jK(V)f(Xi + hv)hdv I(Xi -
=
Xi')
because
K(
x· - x·,
I h
I
Lemma 9 Under Conditions 1 -
Xi'
+ hviK(Xi ~ Xi' + v)dv(l + 0(1))}
+ v) -- 0
h --
as
o.
0
4, and nh -- 00, h -- 0,
Proof
COV(Sj(Xi),
=
Slj,(tlxi'))
= E[Sj(Xi)Slj,(tlxi')] + E[Sj{Xi)]E[Slj'(tlxi')]
n 21h2 { E [t;Il(t)(XA: - xii(Xl- xii K(XA: ~
Xi
)K(Xl ~
Xi')
+ ""
L.J(XA: - Xi)3.K( XA:-X'
h I )Il(t)(Xl - Xi)3., K( Xl-X"]
hi)
A:~
+E[L:(XA: -
Xi)j K(XA:
~ Xi )]E[L:lt(t)(Xl- xi' i
A:
=
n 21h2 { n I (u -
K(Xl
~ Xi' )]}
I
Xi)j K(
u
~ xi )E[Il(t)lu](u -
Xi'
i K( u ~ Xi' )f(u)du
-n I (u - Xi)j K( u ~ Xi )f(u)du I E[h(t)lu](u -
Xi' i
K( u ~ Xi' )f(u)du}
u - Xi
let v= -h-'
=
n 21h2 {n I(hv i K(v)E[Il(t)lxi
+ hV](Xi 91
Xi'
+ hvi K(Xi ~ Xi' + V)f(Xi + hv)hdv
-n
(j(h'IJ)j K('IJ)j(Xi
+ h'IJ)hd'IJ j
+ h'IJ](Xi -
E[Il(t)lxi
+ h'IJ)i'
Xi'
K(Xi~Xi' + 'IJ)j(Xi + h'IJ)hd'IJ)}
n~
=
j
{h j(xi)Hl(tlx) j vi(xi -
Xi'
-hj+! j2(Xi)Hl(tlx) j vi K('IJ)d'IJ j(Xi 1
=
nh 0(1),
because
x· - x·,
K( , h '
4,
Lemma 10 Under Conditions 1 -
..
+ h'IJ)i'K(Xi ~ Xi' + 'IJ)d'IJ(l + 0(1»
Xi'
+ 'IJ) -+ 0
and nh
+ h'IJ)i' K(Xi ~ Xi' + 'IJ)d'IJ(l + 0(1»}
h
as
-+ 00,
-+
o.
.
0
h -+ 0, for 0 < s < t <
00,
Proof
cov(Slj(slxi), Sl'j,(tlxi'» = E[Slj(slxi)Sl'j,(tlxi')] + E[Sj{slxi)]E[Slj,(tlxi')]
1
{
= n 2h2 E
[~ I
t/
Xle l(s)(XIe- x i)3K(
h
Xi
0
., •
)Il,(t)(Xl-xi)JR(
Xl h
Xi'
)
•
OX" - X0
0'
Xl - X·, ]
+L1l(s)(X,,-xi)JK(
h ')Il,(t)(Xl-xi)3K(
h ')
Ie¢'
+E[L1l(t)(XIe - xi)jK(X" ~ Xi)]E[L1l,(t)(Xl- xi,)i'K(Xl
I
Ie
=
n 21h2
{n J E[Il(s)11£](1£ -
Xi)j K(
-n J E[Il( S )11£]( 1£ - xii K( 1£
=
0
1£ ~ Xi )E[Il ,(t)I1£]( 1£ -
~ Xi )j( 1£ )d1£ J
{(J (h'IJFK('IJ)E[Il l\l'(s
n 21h2 n
V t)IXi
Xi' )j' K( 1£
E[Il'(t)I1£]( 1£ -
+ h'IJ)(Xi -
K(Xi
-n (J E[Il(s)lxi
~ Xi')]}
Xi'
~ xi' )j( 1£ )d1£
Xi')i' K( 1£
~ Xi' )j( 1£ )d1£ }
+ h'IJ)'
.,
~ Xi' + 'IJ)j(Xi + h'IJ)hd'IJ)
+ h'IJ](h'IJ)jK('IJ)j(Xi + h'IJ)hd'IJ
J E[Il,(t)lxi + h'IJ](Xi -
= n~ {h j j(xi)Hll\l'(S V tlx) j
vi(Xi -
Xi'
xi'
+ h'IJ)i' K(Xi ~ xi' + 'IJ)j(Xi + h'IJ)hd'IJ) }
+ h'IJ)i'K(Xi ~ Xi' + 'IJ)d'IJ(l + 0(1»
_hi+ 1 j2(Xi)Hl(slx)Hl,(tlx) (J 'IJ j K('IJ)d'IJ
J(Xi -
=
nIh 0(1),
because
K(Xi
Xi'
+ h'IJ)i'K(Xi ~ xi' + 'IJ)d'IJ(l + 0(1»)}
~ Xi' + 'IJ) -+ 0
92
as
h
-+
O.
The results of Lemmas 2.8 through 2.10 imply that
completes the proof of Corollary 2.1.
cov(W,~(slxi),W'~n(tlxil» --
O. This
0
Proof of Theorem 2.2
By the results of Theorem 2.1, Corollary 2.1, and the Cramer-Wold device, it is enough to
prove the the asymptotic normality of
2
P
L L ClkWln(tlxk).
l=l k=l
According to Lemma 4, Wln(tlxk) has the same asymptotic distribution as Slo(tlxk)/ j(Xk),
k
= 1", ',p, where Slo(tlxk) is given in the proof of Theorem 2.1.
Therefore, it suffices to
prove CLT for
where
2
Zni =
p
LL
l=l k=l
Clk
(
j(x ) [Ili - aot- bot(Xi - xk)]K
X . - Xk)
h
'
I
k
aOl and bOl are given in the proof of Theorem 2.1, and Zni are LLd. r.v.
We need to verify the Liapounoff condition: for some fJ > 0,
Z 2+6
L:i-l
[var (L:i=l Zni )]1+6/2
EI niI
Using
(1)
=0
.
la + bl q ~ max{2 q- l , 1Hlal q + Ibl q ), q ~ 0,
EI ZnI1 2+6
It, t. I~:·.)
~ t. t.
I
= E
[Ill - tJot - bol(Xl - •• )JK ( X
Glk E {Ill - aOl - bOl(XI
2
p
=LL
elk
l=l k=l
2
=
p
L L Glk
l=l k=l
1
~
"
~ Xk)
f'
2
-
xk)}K (Xl
-
X)12+ 6 IXI = u]K2+6( U
J
J
E[lIll - aol - bol( U
E[lIll - aoll2+ 6 lXI = x
93
1
+6
~ x )j( U )du
+ hv]K2+6(v)j(x)hdv(1 + 0(1»
2
=
p
hL:L:Ctk
t=1 k=1
J
E[lItl- aotI 2+cSIX1 = x+hv]K 2+cS(v)j(x)dv(1+o(1»,
where Ctk is a constant that depends on Ctk, j(Xk), and 6. Thus, Ei=1 EIZniI2+cS = O(nh).
Using the results of Lemmas 3 - 8,
var
(~Zni)
=
nvar(Znl)
=
n
{t. t. j2~~k)
2 2
(n h )var(Sio(tlxk))
p
+ 2: ;~t2k) (n 2h2)cov(Sio(tlxk), Sio(tlxk))
k=1
Xk
2 2
+22:
2:
j( Ctk)~t ) (n h )cov(Sio(tlxk), S;'O(tIXk'))}
t#;t' k#;k' Xk Xk'
=
2
n h
{t. t. j~:k)Ht(tlxk)[1-
+{;
p
Ht(tlxk)]
C
2k H1(tlxk)[1 - H2(tlxk)]
j(Xk)
Ca
J
J
2
K (v)dv(1 + 0(1))
2
K (v)dv(1
+0(1)) +0(1) }
Thus, [var (Ei=1 Zni)]I+e5/2 = O«n 2h)I+e5/2).
This completes the proof of Theorem 2.2.
D.
Proof of Theorem 2.3
94
"
Therefore,
(5.16)
where Ht(tlx),l = 1,2, is defined in Lemma 4,
and
95
The remainder terms, Rl n and R2n, converge to zero in probability. The bias term ,of
Ln(tlz) depends on the last three terms of (5.16) since the expected value of the first three
terms are zero. Thus, the bias of Ln(tlz) is
...
where BH"l
= 1,2, are given in Lemma 4.
The covariance of Ln(tlz) is computed anal-
ogously in Breslow & Crowly (1974). First, write cov(Ln(lz), Ln(tlz)) = var(Ln(slz)) +
cov(Ln(slz), Ln(tlz) - Ln(slz)). Using the results of Theorem 2.1 and integration by parts,
var(Ln(tlz)) may be expressed as the sum ofthe terms (5.17) through (5.22), given below
for 0 5 r 5 u 5 s.
Let us assume that P(Y ~ 0)
= 1 and P(Y ~ 0,6 = 1) = P(6 = 1) = Po, 0 < Po ~ 1.
.,
96
"
97
..
"
98
Summation of (5.17) - (5.22) yields the covariance expression given in Theorem 2.3.
It remains to show that cov{Ln{slx), Ln{t/x) - Ln{slx))
= O.
Summation of the following
nine terms, (5.23) through (5.31), yields cov{Ln{slx), Ln{tlx) - Ln{slx)) = O.
For 0 ~ r
~
COy
=
s
~
u
~
(fa6 ~~ndHb J.t ~1ndHl)
10
it 10r
= it r
1
=
6
6
r
10
=
= E
[fa6 ~1ndHl J.t ~1ndHl]
J.t r E[W2n{rlx)W2n{ulx)]dHl{rlx)dHl{ulx)
6
=
t,
H~{ rlx )H~{ ulx)
[H2{ulx) - H 2(rlx)H2(U I X)]dH { I )dH ( I )
H~{rlx)H1{ulx)
1 r x
1 u X
-it 1r
</>{x)dH1 (rlx)dH 1 {ulx)
H~{rlx)H2{ulx)
</>{x)dH1 {rlx)
dH 1 {ulx) _
H~{rlx)
6
H2{ulx)
0
it
6
r
10
</>{x)dH1 {rlx)dH1 {ulx)
H2{rlx)H 2{ulx)
</>{x)dH1{rlx)
dH 1 {ulx)
H2{rlx)
6
H2{ulx)
0
it
fa6 </>{~~7rI~;IX) {logS{tlx)-logS{slx)} -logS{slx)logS{tlx)</>{x)
+log2 S{slx)</>{x).
(5.23)
"
99
=
r
Hl(tIZ) Hl(slz)
}{
4>(z)dH1(rlz)
}
J
H~(rlzr
-logS(slz) 4>(z).
{ H (tlz) - H (slz) -logS(tlz)+logS(slz)
o
2
2
(5.25)
..
COY ( _
=
=
W1n(slz) W1n(slz) _ W1n(tIZ))
H2(slz) 'H~(slz)
H~(tlz)
E[W1n ( SIZ )W1n ( SIZ)]
E[W1n ( slZ )W1n (tlz)]
H~(slz)
H 2 (slz)H 2 (tlz)
_ H 1 (Slz)[1- H 1 (slz)]4>(z) +- H1(tlz)[1- Hl(slz)]4>(z)
H~(slz)
H 2 (sIZ)H 2 (tIZ)
•
(5.27)
COY
(
_ Wln(slz)
H 2 (slz)'
-it
WIn dH )
• H~
2
.
100
= _
=
r ¢(x)dH (rlx) it H (ulx)dH (ulx) + r H (rlx)¢(x)dH (rlx) it dH (ulx)
2
10
1
H (slx)
2
H~(rlx)
it
II
,
H1(ulx)¢(x)
H~(ulx)
1
1
+
1
H~(ulx)
10
{H1(S/X)
H (slx)
+log S(s/x)
2
1
H~(rlx)
}
1
, H2 (ulx)
{logS(tlx)-logS(slx)}¢(x).
(5.29)
(5.30)
•
..
This completes the proof of Theorem 2.3.
0
101
Proof of Theorem 2.5
The well known relationship between the cumulative hazard function and the survival function is
A= _JdH1 = _JdS = -logS.
H2
S
i.e., S = e- A • This is approximately Sn ~ e- An • Thus, the asymptotic distribution of
Sn(tlx) can be derived as:
=
v'fih[e- An
-
Sn
+ Sn -
e
-
(
) -A
At., - A e
+
(
An - A
)2
e- A
2! -
(
An - A
)3 e -
A
3! + .
e- A
=
-v'fih,(A n - A)e- A + v'fih,(An - A)22f
=
e -A
-v'fih(A n - A)3__ + ...
3!
-v'fih(A n - A)e- A + R1n(t, x)
-v'fih(A n - A)e- A + R1n(t,x)
e- A ] =
v'fih[Sn - S]
-A
-v'fih(An - A)e- A + R1n(t,x) + R 2n (t,x)
=
Thus, the bias of Mn(tlx) is
E[Mn(tlx)] = -e-A(tl~)E[An - A] = -e-A(tl~)BA(tlx) = -BAS = Bs.
The covariance is
•
cov (Mn(slx), Mn(tlx»
= cov (-Ln(slx )S(slx), -Ln(tlx )S(tlx»
=
S(slx )S(tlx)cov(Ln(slx )Ln(tlx»
=
S(slx)S(tlx)C(s A tlx),
where
C(slx) = -
Jor
dH1
H~ 4>(x).
The remainder terms are
R1n(t,x) =
v'fih {(An - A)2
e
;!A - (An -
e
A)3 ;;
+ (An _ A)4e~;} + ...
= v'fih,(An - A)2e-A~
=
=
_1_ [v'fih(A n _ A)]2 e-A~
~
_1_ L2 (tlx)e-A~
~n
,
102
.
where A:(tlx) is a random function assuming values between A(tlx) and An(tlx). Therefore,
by weak convergence of Ln(tlx) and consistency of A(tlx), sup IRln(t, X )1-
R2n(t, x) = ..;:;;h ISn(tlx) -
e-AnCtlz)
o.
I
=
..;:;;h/II(I- aAn(slx» _
=
..;:;;h ~)og(l- aAn(slx» + aAn(slx)
e-AnCtlz)/
"9
By weak convergence of H1n(tlx), H 2n (tlx) and An(tlx), R 2n (t,x) - 0 in probability. This
..
completes the proof of Theorem 2.5.
0
Proof of Theorem 2.7
The conditional mean regression function is m(x) = fooo S(tlx)dt and the estimate is
mn(x)
= fooo Sn(tlx)dt.
Thus,
..;:;;h[mn(x) - m(x)] = ..;:;Ji faoo[Sn(tIX)
=
where Mn(tlx)
= Vnh[Sn(tlx) -
-
S(tlx)]dt
faoo Mn(tlx)dt
S(tlx)]. The bias term is
00
=
10
10
=
Bm
E[mn(x) - m(x)] =
E[Sn(tlx) - S(tlx)]dt
00
The asymptotic variance is
..
var
[faoo Mn(tlx)dt]
= E
=
(T~(x)
[faoo Mn(t1x)dtf - B~
103
Bs(tlx)]dt
=
2
= 2
= 2
= 2
1 /.00
10000 /.00
1 /.00
100 [/.00
=
for 0
10000 /.00
~
s
~
E[Mn(tlx)Mn(slx)]dtds
{cov[Mn(tlx),Mn(slx)]E[Mn(tlx)]E[Mn(slx)]}dtds cov[Mn(tlx), Mn(slx)]dtds + 2
S(slx)S(tlx)C(slx) + B~ -
100 /.00
B~
E[Mn(tlx)]E[Mn(slx)]dtds _
B~
B~
S(tIX)dtr dsdC(slx),
t, and C( six) is given in Theorem 2.3. IT the largest observation is censored,
then the integral is infinite. In this case, we may truncate the upper bound of the integral.
Therefore, the bias and the asymptotic variance of the truncated mean Gaussian proccess
is
and
O'~(X,T(X))
=
'I"(z) [
10
/.
'I"(z)
This completes the proof of Theorem 2.7.
]2 dsdC(slx).
S(tlx)dt
0
..
.
Proof of Theorem 2.9
By the result of Theorem 2.1 and the Cramer-Wold device, it suffices to prove the asymptotic normality of
2
r
LL
ClpWln(tplx)
l=1 p=1
and tightness. The proof is analogous to the proof of Theorem 2.2. As before, Win (tplx) has
the same asymptotic distribution as Sio(tplx)/ f(x), p = 1,·", T, where Sl'o(tplx) is given in
the proof of Theorem 2.1. Therefore, we prove the CLT for
~~ Sio(tplx)
L-L-Cip
()
i=1 p=1
f x
104
and Vni are LLd. r.v.
We need to verify the Liapounoff condition: for some 6
Ef=1 ElVnil 2+c5
V.na.)]1+8/2 = 0
[var (~~
L.,,1=1
> 0,
(1)
.
ElVnl/ 2+c5
2
E1; ~l=)[Ill(tp)
< t,1; C,.E
=
E
l{lll(t,,) -
2
r
J
J
J
=
LLGlp
l=1 p=1
=
L L Glp
l=1 p=1
=
h L L Glp
l=1 p=1
2
r
2
r
2+8
- aOl - bOl(X1 - x)]K (X\- X)
"Ol -
b.,(XI
-
.)}K (Xl
h-') 1'+6
E[lIl1(tp) - aoi - boi(U - x)I2+ 8 IX1 = u]K 2+c5(U ~ X)j(u)du
E[lIl1(tp) -
aoi1
E[IIl1(t p) -
+c5IX1 = X + hv]K2+ 8(v)j(x)hdv(1 + 0(1»
2
aoi1
2
+c5IX1 = X + hv]K 2+c5(v)j(x)dv(1 + 0(1»,
where Glp is a constant that depends on Clp, j(x), and 6. Thus,
Ef=1 ElVnil 2 +c5 = O(nh).
By similar computation as in the proof of Theorem 2.2, we yield
This completes the proof of the Liapounoff condition.
.
It remains to show the joint asymptotic normality of Wln(t p, x) and W3n(X), where
f!r EK(xfX), and tightness.
Let tl < ... < t r and let Clp, p = 1, ... , l = 1,2, satisfy
W 3n (x) = v'iJi(fn(x) - j(x», jn(x) =
T,
II
105
Then Et E p CtpWtn(tplx) and W 3n (x) are sums of centered independent r.v. with variance
and covariance structures are
var
(~~CtPWtn(tplx»)
= LLvar(CtpWtn(tplx» + L L cov(CtpWtn(tplx),CtlpIWtn(tplx»
t p
t#:-t' p#:-p'
= LLc~pHt(tplx)[I- Ht(tplx)]¢(x)
t
p
+L
L CtpCtlpl[Htl\ll(tp V tp' )Ix) - Ht(tplx)Ht,(tp'lx)]¢(x) + 0(1)
t#:-t' p#:-p'
= L L[Htl\l,(tp V tp' )Ix) - Ht(tplx)]¢(x) + 0(1),
t,t' p,p'
var (W3n(X»
cov
=
=
=
var
(Vnh Un(x) - /(x»)
nh [var(80)]
/(x)
JK 2(v)dv(1 + 0(1» + 0(1),
(~~CtPWtn(tplx), W3n(X») = 0
by similar computation as in Lemmas 5 - 7. The Cramer-Wold device and Liapounoff
condition, discussed in Lemma 4, yield joint asymptotic normality of Wtn(tplx) and W 3n (x),
l
= 1,2,p= 1, ... ,r.
It remains to show the tightness. The expression Wtn(tlx) can be rewritten as
Wtn(tl~)
=
Vnh(Htn(tlx) - Ht(tlx»
= ..rnh {8 2Sto - 8IStI - Hl(t~X )(80 8 2
- 8
n}
80 8 2 - 8 1
=
where Sij
= ;& E(Ili -
..rnh {8 2 Sio -
i
8I S I } ,
8082 - 8~
Hl(tlx»(Xi - x)jK(xfX), j
Lemma 11 Under Conditions 1 -
= 0,1, l = 1,2.
4,
E[Sij] = 0(1) and
var[Sij] =
(:h) h2j /(X)Hl(tlx)[I- Hl(tlx)] Jv2j K 2(v)dv(1 + 0(1»,
106
l
= 1,2,
The proof of Lemma 11 is analogous to the proof of Lemma 3.
By Lemmas 1 and 11 and Slutsky's Theorem,
1
.
(
v'iihWln tlx) =
sio (
(
I(x) 1 + op 1)).
Therefore, Wln(tlx) has the same asymptotic distribution as v'iih (:1z~). It is sufficient to
"
exami ne the tightness of v'iih (:c~)
.
For s < t* < t we have
2
E [y';ihSio(slx) _ y';ihSio(t*lx)] [y';ihSio(t*lx) _ ..rnr;,Si.o(t1x)]
I(x) .
I(x)
I(x)
I(x)
~ 3{Hl(slx) - Hl(t*lx)}{Hl(t*lx) - Hl(tlx)}¢2(x) + 0(1).
2
Theorem 15.6 in Billingsley implies that for x E J, the process Wln(tlx) is tight. The proof
of this inequality can be given as follow.
r7hSio(slx) _ rrhSio(t*lx)]2 [ r-ThSio(t*lx) _ rrhSio(tlx)]2
E [vnn
I(x)
vnn I(x)
vnn I(x)
vnn I(x)
.
=
n 2h2
14(x)E[Sio(sIX) - Sio(t*lx)]2[Sio(t*lx) - Sio(tlx)]2
= n2h2~4(x)E[L>~li]2[L,Blil2
=
n2h2~4(x) {nE(a~l,Bld + n(n - l)E(a~I)E(,Bll)
+2n(n - 1)(Eal,Bld2}
(5.31)
Hl(slx)) - (Ili(t*) - Hl(t*lx»]K(xfX) and ,Bli = [(Ili(t*)Hl(t*lx)) - (Ili(t) - Hl(tlx))]K(xf X). Throughout the proof, we repeatedly use Theorem
where
ali
= [(Ili(S) -
1A in Parzen (1961) and Theorem 3 in Devroye & Gyam (1985).
E(a~d =
=
.
E[(Ili(S) - Hl(slx» - (Ili(t*) _ Hl(t*'x))]2K 2(Xi ; x)
((Hl(slx) - Hl(slx)) - 2(Hl(t*lx) - HI(slx)HI(t*lx))
+(Hl(t*lx) - HI(t*lx»}hl(x)
=
J
K 2(v)dv(1 + 0(1))
[Hl(slx) - Hl(t*lx)][l- (Hl(slx) - Hl(t*lx))]hl(x)
J
K 2(v)dv(1
+ 0(1)).
By symmetry
E(,Bll) = [Hl(t*lx) - Hl(tlx)][l- (Hl(t*lx) - Hl(tlx))]hl(x)
107
J
K 2(v)dv(1 + 0(1)).
Thus, as nh
-+ 00
and h
-+
0,
n(n - 1)
2
2
n 2h2f 4 (x)E(atl )E(!311)
~
{[Ht(slx) - Ht(t*lx)][l- (Ht(slx) - Ht(t*lx))][Ht(t*lx) - Ht(tlx)]
Of
[1- (Ht(t*lx) - Ht(tlx))]}4>2(x)
< [Ht(slx) - Ht(t*lx)][Ht(t*lx) - Ht(tlx)]4>2(x).
(5.32)
E(atl!3tl)
=
E[(ltl(S) - Ht(slx)) - (Itl(t*) - Ht(t*lx))][(Itl(t*) - Ht(t*lx))
2 Xl - X
-(Itl(t) - Ht(tlx))]K ( h )
A
= E[(Il1(S) -
Ht(slx)){Itl(t*) - Ht(t*lx)) - (Itl(S) - Ht(slx))(ltl(t) - Ht(tlx))
-(Il1(t*) - Ht(t*lx))2 + (Itl(t*) - Ht(t*lx))(Itl(t) _ Ht(tlx»]K 2(Xi; x)
=
[Ht(t*lx)] - Ht(slx)][Ht(t*lx) - Ht(tlx)]hf(x)
Thus, as nh
-+ 00
J
K 2(v)dV(l + 0(1)).
and h -+ 0,
E(a~I!3:I)
=
E{[{Iti(S) - Ht(slx)) - (Iti(t*) - Ht(t*lx))]2[(Iti (t*) - Ht(t*lx)) - (Iti(t) - Ht(tlx))]2
K 4 (X I - x)}
h
=
A
2
A
A
A
2
E{(Iti(S) - Ht(slx)) - 2(lti(s) - Hl(slx)){Ili(t*) - Hl(t*lx)) + (Ili(t) - Hl(tlx)) }
((Iti(t*) - Ht (t*lx))2 - 2{Ili(t*) - Hl(t*lx)))(lti(t) - Hl(tlx)) + (Ili(t) - Ht(tlx))2}
K 4 (X I - x)
h
= {Hl(slx)Hl(t*lx) + Hl(t*lx)Hl(tlx) + Hl(slx)Hl(tlx) + Ht(slx)Hl(t*lx)
Ht(t*lx)Hl(tlx) + Ht(slx)Hl(tlx) - 3Ht(slx)Ht(t*lx) - 5Ht(slx)Ht(tlx)}
hf(x)
=
JK 2
(v)dV(1 + 0(1))
O(h).
Thus, as nh
-+ 00
and h -+ 0,
(5.34)
108
Summation of (5.32) through (5.34) yields that
IJ"hSlo(slx) _ I"7h Slo(t*'x)]
E [vnn
f(x)
vnn f(x)
~
2
[
IJ"hSlo(t*lx) _ IJ"hSlo(tIX)]2
vnn f(x)
vnn f(x)
3[Hl(slx) - Ht(t*lx)][Ht(t*lx) - Ht(tlx)]4>2(x) + 0(1).
"
This completes the proof of Theorem 2.9.
0
109
Appendix A
Algorithms and S-plus Code Used for Simulations
and Calculation of the conditional mean and median survival time for the proposed, Beran and
Cox estimates
A.I
Introduction
The purpose of this appendix is to briefly discuss the statistical reasoning and algorithms
used to perform the computer simulation described in Chapter 3, and to provide the S-plus
code used to implement these algorithms. In addition, the S-plus code for calculation of
the conditional mean and median regression estimates for the proposed, Beran and Cox
estimates.
A.2
The simulation experiment: rationale and S-plus code:
A.2.1
Generation of simulated data sets
200 data points were generated from the following S-plus code. i is the random seed number.
i
= 1,2,3,4 were used
to generate 90%, 75%, 50%, and 25% uncensored simulated data
sets.
function(a. i. n
= 200)
{
• •••••••••••••• Generate Random Variables ••••••••••••••••••••••••••••••
set.seed(i)
110
b <- numerie(n)
x <- morm(n. mean
tt <- (300 - 0.5
= 55.
sd
= 5)
* x) * (x < 55)
+ (272.5 - 0.3
(x >- 55) + morm(n. mean - O. sd
r <- (300 - 0.5
,
* x) * (x < 55)
* (x - 55)-2) *
= 10)
+ (272.5 - 0.3
* (x - 55)-2) *
(x >- 55)
* x) * (x < 60)
cmean <- (a - 0.5
ee <- 5
(x
>- 60)
*
rexp(n)
+
+ (a - 30 - 3
* (x - 60» *
(cmean - 5)
Y <- apply(ebind(tt. ee). 1. min)
delta <- rep(O. n)
delta[tt <= ee] <- 1
ord <- order(x)
x <- x[ord]
tt <- tt[ord]
• orderd by x
r <- r[ord]
cmean <- cmean[ord]
ee <- ee [ord]
y <- y[ord]
delta <- delta[ord]
b <- ebind(x. tt, r. cmean. ee. y. delta)
b
}
A.2.2
Computing the proposed mean and median estimates
The formular of Hlhat and H2hat are given in (2.4). The kernel function K(·) is a standard
normal distribution. Because the difference between Beran's estimate and the proposed
estimate is the weight the program for Beran's estimate is omitted. The weight for Beran's
estimate is simply w = Knjsum(Kn).
funetion(Data
= sdata.90.
h)
{
111
•
•
•
•
•
sdata90 is a simulated data set with n=200, i=1, a=312.6, 901. events
sdata76 is a simulated data set with n=200, i=2. a=306.416. 761. events
sdata60 is a simulated data set with n=200. i=3. a=299.13. 601. events
sdata26 is a simulated data set with n=200. i=4. a=292.66. 261. events
sdata90=(x. tt, r. cmean. cc. y, delta)
•
J
x <- DataL 1J
y <- DataL 6J
delta <- DataL 7J
n <- length(x)
••••••••••••••• sort y and compute 11 and 12 •••••••••••••••••••••••••••••
sorty <- t(sort(y»
12 <- matrix(O, n, n)
for(i in 1:n) {
12 [i.
J <- (y [iJ >- sorty)
}
11 <- delta • 12
• •••••••••••••••••••••••• Compute H1hat. H2hat ••••••••••••••••••••••••••
H1hat <- matrix(O, n, n)
H2hat <- matrix(O. n. n)
sO <- rep(O,n)
for(i in 1:n) {
Kn <- exp( - (x - x[iJ)-2/2/h-2)
sO[iJ <- sum(Kn)
s1 <- sum(Kn • (x - x[iJ»
s2 <- sum(Kn • (x - x[iJ)-2)
de <- s2 • sO - (s1)-2
w <- (Kn • (s2 - (x - x[iJ) • s1»/de
H1hat[i.
J <- w 1..1. 11
H2hat[i.
J <- w 1..1. 12
}
112
• •••••••••••••••••••••••• Compute lambda ••••••••••••••••••••••••••••
qn <- matrix(O, n, n)
for(j in 1:n - 1) {
qn[, j] <- H1hat[, j] - H1hat[, j + 1]
}
qn[, n] <- H1hat [, n]
qn <- replace(qn, qn < 0, 0)
lambda <- (1 - qn/H2hat)
Cx <- qn/(H2hat-2)
Cx <- replace(Cx, qn>(H2hat-2), 0)
•••••••••••••••••••• Compute S ••••••••••••••••••••••••••••••••••••••••
S <- matrix(O, n, n)
for(i in 1:n) {
S[i,
] <- cumprod(lambda[i.
])
}
r
sorty <- sort(y)
deltat <- c(diff(sorty) ,0)
•••••••••••••••••••• Compute Median •••••••••••••••••••••••••••••••••••
med <- rep(O, n)
for(i in 1:n) {
u <- min(sorty[S[i.
] < 0.5])
1 <- max(sorty[S[i,
] > 0.5])
med[i] <- 0.5 • (u + 1)
}
•••••••••••••••••••• Compute Mean and variance of mean ••••••••••••••••••
dtS <- S • deltat
mhat <- sorty[1] + apply(dtS.l,sum)
dtS2 <- matrix(rep(mhat,n),n,n)
dtS3 <- apply(dtS.l.cumsum)
dtS4 <- cbind(rep(0,n),dtS3[,l:199])
dtS5 <- (dtS2-dtS4)-2
CxS <- Cx.dtS5
113
qn1 <- 1-H1hat[,1]
qn1 <- replace(qn1, qn1 < 0, 0)
sig <- qn1.(mhat-2)+apply(CxS,1,sum)
sigma <- (sig.nh)/(sqrt(2).sO)
••••••• make a matrix which contains mean, sigma and median •••••••••••••••
m <- cbind(x, mhat, sigma, med)
m
}
A.2.3
Computing Cox's mean and median regression estimates with quadratic
fitting
survftt for the coxph produces predicted survival at the mean x's, i.e., survftt computes
= (So)ezp(PX). Therefore, the conditional survival at each z point was computed
as S(tlz) = (Sm(tIZ))U'p(.Bx-I3x>. Then the conditional mean estimate at each z point is
Sm(tlz)
E t1t * S(tlz).
Note that since the difference between the linear and quadratic model is the
quadratic term in coxph we omit the S-plus code of the linear model.
function(Data • sdata90)
{
• sdata90=(xx,tt,r,cmean,cc,yy,delta): Simulated Data
• This program computes mean and median survival functions based on
• Proportional Hazards model with linear and quadratic terms
•
xx <- Data[, 1]
delta <- Data[, 7]
yy <- Data [, 6]
n <- length(xx)
•••••••••••••••
order by xx •••••••••••••••••••••••••••••••••••••••••••
ord <- order(xx)
xx <- xx[ord]
..
yy <- yy[ord]
delta <- delta[ord]
xm <- xx - mean (xx)
114
x.sq <- xx-2 - mean(xx-2)
• *********************
cox1 <- coxph(Surv(yy. delta)
..
*********************************
Cox Regression
N
xx + xx-2. model
= T.
x
= T.
print (cox1)
fitl <- summary(survfit(coxl»
SO <- fiU [, 4]
nS
<- length(SO)
tim <- fitl[. 1]
beta1 <- cox1$coef[1]
beta2 <- coxl$coef[2]
S <- matrix(O. nSf n)
for(i in l:nS) {
SCi.
] <- (SO[i])-(exp(betal
*
}
deltat <- rep(O. nS)
for(j in l:nS - 1) {
deltat [j] <- tim[j + 1] - tim[j]
.
}
tdt <- t(de1tat)
mhat <- tim[l] + tdt 1.*1. S
med <- rep(O. n)
for(j in 1:n) {
u <- min(tim[S[, j] < 0.5])
1 <- max(tim[S[, j] > 0.5])
med[j] <- 0.5
*
(u + 1)
}
m <- cbind(xx. t(mhat). med)
m
}
115
xm + beta2
*
x.sq»
Y = T)
A.3
Real data analysis I: Stanford heart transplant data
A.S.I
Computing the proposed mean and median regression estimates
func~ion(Da~a •
bb. h)
•
{
• bb is
•
•
s~anford he~ ~ransplan~ da~a:
bb.(~~. dea~h.
~~
del~a
<-
observa~ions
and 55 alives
age)
Da~aL
<-
157
1]
Da~aL
x <-
Da~aL
n <-
leng~h(x)
2]
3]
ord <- order(x)
x <- x[ord]
~~
<-
del~a
~~ [ord]
<-
.**************
so~y
<-
del~a[ord]
y and compute Ii and 12 *********************************
sor~
~(sor~(t~»
12 <- matrix(O. n. n)
for(i in 1:n) {
I2[i.
] <- (tt[i]
)=
Borty)
}
Ii <-
del~a
* 12
H1hat <- matrix(O. n. n)
H2hat <- matrix(O. n. n)
for(i in 1:n) {
Kn <- exp( - «x[i] - x)-2)/2/h-2)
sO <- sum(Kn)
•
s1 <- sum(Kn * (x - x[i]»
s2 <- sum(Kn * (x - x [i] ) -2)
to
de <- s2 * sO - (s1)-2
w <- (Kn * (s2 - (x - x[i]) * s1) )/de
116
Hlhat[i,
] (- v 1..1. 11
H2hat[i,
] (- v 1..1. 12
}
•
•••••••••••••••••••••••• Compute lambda •••••••••••••••••••••••••••••••
qn (- matrix(O, n, n)
for(j in l:n - 1) {
qn[, j] <- Hlhat [, j] - Hlhat [, j + 1]
}
qn[, n] <- Hlhat [, n]
qn <- replace (qn, qn < 0, 0)
lambda <- (1 - qn/H2hat)
Cx <- qn/(H2hat-2)
Cx <- replace(Cx, qn>(H2hat-2), 0)
•••••••••••••••••••• Compute S ••••••••••••••••••••••••••••••••••••••••
S <- matrix(O, n, n)
for(i in 1:n) {
SCi,
] <- cumprod(lambda[i,
}
tt <- replace(tt, tt
.-
0, 1.0001)
tt
==
1, 1.0005)
(-
replace(tt, tt
])
y <- log10(tt)
sorty <- sort(y)
deltat <- c(diff(sorty),O)
•••••••••••••••••••• Compute Median •••••••••••••••••••••••••••••••••••
med <- rep(O, n)
for(i in 1:n) {
'.
u (- min(sorty[S[i,
] < 0.5])
1 <- max(sorty[S[i,
] > 0.5])
med[i] <- 0.5 • (u + 1)
}
•••••••••••••••••••• Compute Mean and variance of mean ••••••••••••••••••
dtS <- S • deltat
117
mhat <- sorty[l] + apply(dtS,l,sum)
dtS2 <- matrix(rep(mhat,n),n,n)
dtS3 <- apply(dtS,l,cumsum)
.
dtS4 <- cbind(rep(O,n),dtS3[,l:199])
dtS5 <- (dtS2-dtS4)A2
CxS <- Cx"'dtS5
qnl <- 1-Hlhat[,1]
qnl <- replace(qnl, qnl < 0, 0)
sig <- qnl"'(mhat A2)+apply(CxS,l,sum)
sigma <- (sig"'nh)/(sqrt(2)"'sO)
................... make a matrix which contains mean, sigma and median
...
m <- cbind(x, mhat, sigma, med)
m
}
•
A.3.2
Computing Cox's quadratic mean and median regression estimates
function(Data • bb)
{
• bb=(tt, death, age): stanford heart transpalnt data.
• This program computes mean and median survival functions based on
• Proportional Hazards model with linear and quadratic terms.
•
xx <- Data[, 3]
tt <- Data[, 1]
delta <- Data[, 2]
tt <- replace(tt, tt == 0, 1.0001)
tt <- replace(tt, tt == 1, 1.0005)
n <- length(xx)
...........................................
order by xx
.
yy <- log10(tt)
ord <- order(xx)
118
xx <- xx[ord]
yy <- yy[ord]
delta <- delta[ord
•
Jan
<- xx - mean (xx)
x.sq <- xx-2 - mean(xx-2)
• ********************* Cox Regression ********************************
cox1 <- coxph(Surv(yy. delta) - xx + xx-2. model
= T.
x
= T.
print (cox1)
fit1 <- summary(survfit(cox1»
so <- fit1 [, 4]
nS <- length(SO)
tim <- fit! [, 1]
beta1 <- cox1$coef[1]
beta2 <- cox1$coef[2]
S <- matrix(O. nS. n)
for(i in 1:nS) {
S[i.
•
] <- (SO[i])-(exp(beta1
* Jan
}
deltat <- rep(O. nS)
for(j in 1:nS - 1) {
deltat[j] <- tim[j + 1] - tim[j]
}
tdt <- t(deltat)
mhat <- tim[1] + tdt Yo*Yo S
med <- rep(O. n)
for(j in 1:n) {
u <- min(tim[S [, j] < 0.5])
1 <- max(tim[S [, j] ) 0.5])
•
med[j] <- 0.5
*
(u + 1)
}
m <- cbind(xx. t(mhat). med)
m
119
+
beta2
*
x.sq»
Y • T)
}
A.4
Real data analysis II: SOLVD data
A.4.1
Computing the proposed mean and median estimates
function(Data - so. h)
{
• so is a solvd data set for clinic=BB or BB. 168 observations
• so-Cobs. id. age. ef. clinic. epx. chftime ) clinic-1 for BB. 2 for BB
•
x <- DataL 4]
delta <- DataL 6]
tt <- DataL 7]
n <- length(x)
tt <- replace(tt. tt == O. 1.0001)
Y <- 10g10(tt)
ord <- order(x)
x <- x[ord]
y <- y[ord]
delta <- delta[ord]
.************** sort y and compute 11 and 12 *****************************
sorty <- t(sort(y»
12 <- matrix(O. n. n)
for(i in 1:n) {
I2[i.
] <- (y[i] )= sorty)
}
11 <- delta
* 12
B1hat <- matrix(O. n. n)
•
B2hat <- matrix(O. n. n)
for(i in 1:n) {
Kn <- exp( - ((x[i] - x)-2)/2/h-2)
sO <- sum(Kn)
120
s1 <- sum(Kn • (x -
xCi]»~
s2 <- sum(Kn • (x - x [i] ) -2)
de <- s2 • sO - (s1)-2
•
w <- (Kn • (s2 - (x - x [i] ) • s1»/de
H1hat[i.
] <- w X.X 11
H2hat[i.
] <- w X.X 12
}
• •••••••••••••••••••••••• Compute lambda •••••••••••••••••••••••••••••••
qn <- matrix(O. n. n)
for(j in 1:n - 1) {
qnL j] <- H1hatL j] - H1hat[. j + 1]
}
qnL n] <- H1hat [. n]
qn <- replace (qn. qn < 0. 0)
lambda <- (1 - qn/H2hat)
Cx <- qn/(H2hat-2)
Cx <- replace(Cx. qn>(H2hat-2). 0)
.,
•••••••••••••••••••• Compute S ••••••••••••••••••••••••••••••••••••••••
S <- matrix(O. n. n)
for(i in 1:n) {
Sri.
] <- cumprod(lambda[i.
])
}
sorty <- sort(y)
deltat <- c(diff(sorty).O)
•••••••••••••••••••• Compute Median •••••••••••••••••••••••••••••••••••
med <- rep(O. n)
for(i in 1:n) {
u <- min(sorty[S[i.
] < 0.5])
1 <- max(sorty[S[i.
] > 0.5])
med[i] <- 0.5 • (u + 1)
}
•••••••••••••••••••• Compute Mean and variance of mean ••••••••••••••••••
121
dtS (- S • deltat
mhat (- sorty[1] + apply(dtS,1,sum)
dtS2 (- matrix(rep(mhat,n),n,n)
dtS3
(-
apply(dtS,1,cumsum)
dtS4 <- cbind(rep(O,n),dtS3[,1:199])
J
dtSS <- (dtS2-dtS4) -2
CxS <- Cx.dtSS
qn1 <- 1-H1hat L 1]
qn1 <- replace(qnl, qnl < 0, 0)
sig <- qn1.(mhat-2)+apply(CxS,l,sum)
sigma <- (sig.nh)/(sqrt(2).sO)
••••••• make a matrix which contains mean, sigma and median •••••••••••••••
m <- cbind(x, mhat, sigma, med)
m
..
}
A.5
Computing Cox's quadratic mean and median estimates
function(Data - so)
{
• so-Cobs, id, age, ef, clinic, epx, chftime): SOLVD data
• This program computes mean and median survival functions based on
• Proportional Hazards model with linear and quadratic terms
•
't't
<- DataL 7]
xx <- DataL 4]
delta <- DataL 6]
n <- length(xx)
tt <- replace(tt, tt
•••••••••••••••
==
0, 1.0001)
order by xx ••••••••••••••••••••••••••••••••••••••••••
122
yy <- log10(tt)
ord <- order(xx)
xx <- xx [ordJ
yy <- yy [ordJ
delta <- delta[ordJ
xm <- xx - mean (xx)
x.sq <- xx-2 - mean(xx-2)
• *********************
Cox Regression
********************************
cox1 <- coxph(Surv(yy. delta) - xx + xx-2. model
= T.
x
= T.
print (cox1)
fit1 <- summary(survfit(cox1»
SO <- fit1[. 4J
nS <- length(SO)
tim <- fit1 [, 1J
beta1 <- cox1$coef[1J
S <- matrix(O. nSf n)
for(i in 1:nS) {
S[i.
J <- (SO[iJ)-(exp(beta1
* xm
}
deltat <- rep(O. nS)
for(j in 1:nS - 1) {
deltat[jJ <- tim[j + 1J - tim[jJ
}
tdt <- t(deltat)
mhat <- tim[1J + tdt Yo*Yo S
med <- rep(O. n)
for(j in 1:n) {
u <- min(tim[S [, jJ < 0.5])
1 <- max(tim[S[. jJ > 0.5])
med[j] <- 0.5
,
*
(u + 1)
}
m (- cbind(xx. t(mhat). med)
123
+ beta2
*
x.sq»
Y = T)
,
References
Aalen, O. O. and Johansen, S. (1978). An empirical transition matrix for nonhomogeneous Markov chains based on censored observations. Scandinavian Journal
of Statistics, 6 141-150.
Bahadur, R. R. (1966). A note on quantiles in large samples. The Annals of Mathemat-
ical Statistics, 37 577-580.
Bangdiwala, S., Weiner, D., Bourassa, M., Friesinger II, G., Ghali, J., and Yusuf, S., for
the SOLVD Investigators (1992). Studies of Left Ventricular Dysfunction (SOLVD)
Registry: Rationale, disign, methods and description of baseline characteristics.
American Journal of Cardiology, 70347-53.
Beran, R. (1981). Nonparametric regression with randomly censored survival data.
Unpublished manuscript, Univ. of California, Berkeley.
Bhattacharya, P. K. and Gangopadhyay, A. K. (1990). Kernel and nearest-neighbor
estimation of a conditional quantile. Annals of Statistics, 18 1400-15.
Breslow, N. and Crawly, J. (1974). A large sample study of the life table and product
limit estimates under random censorship. Annals of Statistics, 2 437-53.
Buckley, J. (1984). Additive and multiplicative models for relative survival rates. Bio-
metrics 4051-62.
Buckley, J. and James, I. (1979). Linear regression with censored data. Biometrika 66
429-36.
..
Cox, D. R. (1972). Regression models and life-tables, Journal Royal Statistical Society
- Series B, 33 187-202.
124
Cox, D. R. and Oakes, D. (1984). Analysis of Survival Data. Chapman and Hall, New
York.
Dabrowska, D. (1993). Nonparametric quantile regression with censored data, Sankhya,
In print.
Dabrowska, D. (1987). Nonparametric regression with censored survival time data,
Scandinavian Journal of Statistics, 14 181-197.
Dabrowska, D. (1988). Kaplan-Meier estimates on the plane, Annals of Statistics, 4
1475-89.
Dabrowska, D. (1989). Kaplan-Meier Estimate on the plane: weak convergence, LIL,
and the bootstrap, J. of Multivariate Analysis, 29 308-25.
Devroye, L. (1981). On the almost everywhere convergence ofnonparametric regression
function estimates. Annals of Statistics, 9 1310-1319.
\
Devroye, L. and Gyorfi, 1. (1985). Nonparametric density estimation: the L 1 mew.
Wiley, New York.
Doksum, K. A. and Yandell, B. S. (1982). Properties of regression estimates based on
censored survival data. Festschrift for Erich Lehmann. Wadsworth, Belmont.
Elandt-Johnson, R. C. and Johnson, N. (1980). Survival models and data analysis.
Wiley, New York.
Fan, J. and Gijbels, J. (1991). Local linear smoothers in regression function estimation.
Institute of Statistics Mimeo Series
# 2055, University of North Carolina, Chapel
Hill.
Fan, J., Hu, T. and Truong, Y. K. (1991). Design adaptive nonparametric function estimation: a unified approach. Institute of Statistics Mimeo Series
.
# 2060, University
of North Carolina, Chapel Hill.
Fan, J. (1991). Design-adaptive nonparametric regression. Institute of Statistics Mimeo
Series
# 2049, University of North Carolina, Chapel Hill.
125
Fan, J. and Gijbels, I. (1992). Censored Regression: Nonparametric Techniques and
their application. Unpublished
Friedman, J. H. and Stuetzle, W. (1981). Projection Pursuit Regression (T & M) Journal of American Statistical Association, 76 817-823.
Gill, R. D. (1980). Censoring and stochastic integrals. Mathematical Centre Tracts 124,
,
I
Amsterdam.
Gill, R. (1983). Large sample behaviour of the product - limit estimator on the whole
line. Annals of Statistics, 11 49-58.
Gill, R. and Johansen, S. (1990). A survey of product - integration with a view toward
application in survival analysis. Annals of Statistics, 18 1501-1555.
Hardie, W. (1990). Applied nonparametric regression. Cambridge.
Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall,
London.
I
Jones, M.C., Marron, J.S., and Sheather, S.J. (1992). Progress in data-based bandwidth
selection for kernel density estimation. Institute of Statistic Mimeo Series
# 2088,
University of North Carolina, Chapel Hill.
Johnson, R. A. and Wichern, D. W. (1982). Applied multivariate statistical analysis.
Prentice Hall.
Johnston, G. J. (1979). Smooth nonparametric regression analysis. Institute of Statistic
Mimeo Series # 1253, University of North Carolina, Chapel Hill, NC.
Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of American Statistical Association, 53 457-81.
Koul, H., Susarla, V. and Van Ryzin, J. (1981). Regression analysis with randomly
right-censored data. Annals of Statistics, 9 1276-88.
Lai, T. L. and Ying, Z. (1988). Stochastic integrals of empirical-type processes with
applications to censored regression. Journal of Multivariate Analysis, 27 334-358.
126
~
Lai, T. L. and Ying, Z. (1991). Estimating a distribution function with truncated and
censored data. Annals of Statistics, 19 417-42.
•
'~
Lai, T. L. and Ying, Z. (1991). Large sample theory of a modified Buckley-James
estimator for regression analysis with censored data. it Annals of Statistics, 19
1370-1402.
Leurgans, S. (1987). Linear models, random censoring and synthetic data. Biometrika,
74301-9.
Lin, J. S. and Wei, L. J. (1992). Linear regression analysis based on Buckley-James
estimating equation. Biometrics, 48 679-81.
Mauro, D. W. (1983). A note on the consistency of Kaplan-Meier least squares estimators. Biometrika, 70534-5.
Miller, R. G. (1976). Least squares regression with censored data. Biometrika, 63449-
,
64.
Miller, R. G. (1981). Survival Analysis. John Wiley & Sons.
Miller, R. G. and Halpern, J. (1981). Regression with censored data. Technical Report
No. 66, Division of Biostatistics, Stanford University.
Parzen, E. (1962). On estimation of a probability density and mode. Annals of Statistics,
35 1065-76.
Peterson, A. V. (1977). Expressing the Kaplan-Meier estimator as a function of empiracal sub-survival functions. Journal of American Statistical Association, 72 854-58.
Schuster, E. F. (1972). Joint asymptotic distribution of the estimated regression function at a finite number of distinct points. The Annals of Mathematical Statistics,
43,84-8.
•
Serfling, R. J. (1980). Approximation theorems of mathematical statistics. John Wiley
& Sons.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. Chapman
and Hill.
127
Shorack, G. R. & Wellner, J. A. (1986). Empirical processes with applications to statis-
tics. Wiley, New York.
Stone, C. J. (1977). Consistent nonparametric regression. Annals of Statistics, 5595645.
Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression.
Annals of Statistics, 10 1040-53.
Tsai, W., Leurgans, S. and Crowley, J. (1986). Nonparametric estimation of a bivariate
survival function in the presence of censoring. Annals of Statistics, 4 1351-65.
Tsiatis, A. (1975). A nonidentifiability aspect of the problem of competing risks. Pro-
cedure of National Academic Science, USA, 72 20-22.
Yang, S. (1991). Minimum Hellinger distance estimation of parameter in the random
censorship model. Annals of Statistics, 19 579-602.
Zhou, M. (1989). Asymptotic normality of Koul-Susarla-Van Ryzin estimator using
counting process. Institute of Statistics Mimeo Series
#
1770, University of North
Carolina, Chapel Hill.
Zhou, M. (1991). Some properties of the Kaplan-Meier estimator for independent nonidentically distributed random variables. Annals of Statistics, 192266-74.
Zhou, M. (1992). Asmptotic normality of the 'synthetic data' regression estimator for
censored survival data. Annals of Statistics, 20 1002-21.
128