Murphy, Susan A. (1989). "Time-Dependent Coefficients in a Cox-Type Regression Model."

No. 2001T

TIME-DEPENDENT COEFFICIENTS IN A COX-TYPE REGRESSION MODEL

by

Susan A. Murphy

A Dissertation submitted to the faculty of The University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistics

Chapel Hill
1989

Approved by:

___________________ Advisor

___________________ Reader

___________________ Reader
TIME-DEPENDENT COEFFICIENTS IN A COX-TYPE REGRESSION MODEL

SUSAN A. MURPHY

(Under the direction of P.K. Sen.)

ABSTRACT
This thesis concerns itself with inference and estimation for a time-varying regression coefficient in a Cox-type parameterization of a point process intensity. The coefficient is estimated via the method of sieves. A rate of convergence in probability for the sieve estimator is derived, and this rate is shown to be the best rate for the norm considered. A functional central limit theorem for the integrated sieve estimator and a consistent estimator of the asymptotic variance process are given.

Inference for the above time-varying coefficient centers around alternatives to a constant regression coefficient. The use of a likelihood ratio type test is explored for general infinite dimensional alternatives. Linear statistics and generalizations thereof are used to test for a change point or a monotone trend. For the linear statistic, weight functions which maximize the asymptotic power with respect to contiguous alternatives are derived.
TABLE OF CONTENTS

Chapter                                                             Page

I.    INTRODUCTION                                                     1
      1.1  The Cox Regression Model for Counting Processes             1
      1.2  The Counting Process Framework                              2
      1.3  Parametrizations of the Stochastic Intensity                5
      1.4  Motivating Applications                                     7
      1.5  Summary of Thesis and Further Research                     11

II.   THE ESTIMATOR AND ITS CONSISTENCY                               18
      2.1  The Likelihood for a Counting Process                      18
      2.2  Cox's Partial Likelihood                                   19
      2.3  The Method of Sieves                                       24
      2.4  The Statistical Model and Assumptions                      26
      2.5  Consistency of the Sieve Estimator                         30
      2.6  The i.i.d. Case                                            34
      2.7  Lemmas 2.1 and 2.2                                         35

III.  ASYMPTOTIC DISTRIBUTIONAL THEORY                                46
      3.1  Asymptotic Normality of the Sieve Estimator                46
      3.2  A Consistent Estimator of the Asymptotic Variance Process  51
      3.3  A Partial Likelihood Ratio Test of a Finite Dimensional
           Null Hypothesis Versus an Infinite Dimensional Alternate   53
      3.4  Other Versions of the Partial Likelihood Ratio Test        73
      3.5  Consistency of the Partial Likelihood Ratio Test           77
      3.6  The i.i.d. Case                                            87
      3.7  Lemmas 3.1, 3.2, 3.3, 3.4 and 3.5                          91

IV.   LINEAR TEST STATISTICS AND GENERALIZATIONS                     107
      4.1  A General Linear Statistic                                107
      4.2  Efficacy of the Linear Statistic                          114
      4.3  A Test for Regression                                     123
      4.4  A Test for a Change-Point                                 132
      4.5  The i.i.d. Case                                           145
      4.6  Lemmas 4.2 and 4.3                                        147

APPENDIX: ADDITIONAL NOTATION                                        154

REFERENCES                                                           156
I. INTRODUCTION

1.1 The Cox Regression Model for Counting Processes
This thesis is motivated by applications in which both an output counting process N and an input covariate process X are observed. Interest lies in estimating the effect of X on N and in formulating statistical tests concerning the regression relationship. In particular, allowance is made for a non-stationary effect of X on N, and questions such as the following are addressed:

1) Is there an increasing relationship between X and the frequency of jumps of N?

2) Does a stationary relationship between X and N experience a "disruption" and change to a different yet still stationary relationship?

3) Is the relationship between X and N "stationary"?

Just as the mean is parametrized in the classical regression of one random variable on another, here the conditional mean function can be parametrized in the "regression" of N on X. In particular, the stochastic intensity (the derivative of the conditional mean function) of the output counting process is usually parameterized.
Here Cox's parameterization (1972) of the stochastic intensity is adopted, i.e.,

    λ_s(X, i) = exp{β₀ X_s(i)} λ₀(s),

where λ₀ is an unknown function and β₀ is an unknown scalar. Cox proposed this model as a way to relate covariates such as treatment type, age, etc. to the chance of survival for an individual. However, this model is not only useful in survival analysis, where each counting process has at most one jump, but also in contexts where events can reoccur.

The Cox regression model as given above is limited in that it does not allow for a time-dependent effect of X on N. One way to allow for a time-dependent effect is to consider β₀ as a function. The primary goal of this thesis is to consider estimation and inference for the function β₀.
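To make the model concrete, the following sketch (an illustration of mine, not part of the thesis) simulates jump times from an intensity of this form by thinning a dominating Poisson process; the particular β₀, λ₀, and covariate value are made-up choices.

```python
import math
import random

def simulate_cox_process(beta0, lam0, x, T, lam_max, seed=0):
    """Simulate jump times of a counting process with intensity
    lambda(s) = exp(beta0(s) * x) * lam0(s) by thinning a rate-lam_max
    homogeneous Poisson process (requires lambda(s) <= lam_max on [0, T])."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)          # candidate jump time
        if t > T:
            return times
        if rng.random() <= math.exp(beta0(t) * x) * lam0(t) / lam_max:
            times.append(t)                     # accept with prob lambda(t)/lam_max

# Illustrative choices: a two-step time-dependent coefficient, constant baseline.
beta0 = lambda s: 1.0 if s < 0.5 else -1.0
lam0 = lambda s: 2.0
jumps = simulate_cox_process(beta0, lam0, x=1.0, T=1.0, lam_max=2.0 * math.e)
```

With this β₀, jumps tend to accumulate before s = 0.5 (where the covariate effect is positive) and thin out after, which is exactly the kind of non-stationary relationship questions 1-3 above concern.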
1.2 The Counting Process Framework
Consider an n-component multivariate counting process, N = (N(1),...,N(n)), and a predictable stochastic process X = (X(1),...,X(n)), over the time interval [0,T]. For example, N(i) might count certain life events for individual i and X(i) might be his age. N is defined on a stochastic base (Ω, F, {F_t : t ∈ [0,T]}, P) with respect to which N has stochastic intensity λ(X) = (λ(X,1),...,λ(X,n)). N having stochastic intensity λ(X) implies that

(1.1)    M_t(i) = N_t(i) − ∫₀ᵗ λ_s(X_s, i) ds

is a local square integrable martingale with predictable variation

    ⟨M(i), M(j)⟩_t = ∫₀ᵗ λ_s(X_s, i) ds   for i = j,
                   = 0                     for i ≠ j.
Associated with N is the marked point process ((T₁,Z₁), (T₂,Z₂), ...), where T_j is the time of the jth jump of N(·) (N(·) = Σᵢ₌₁ⁿ N(i)) and Z_j specifies which of the components of N jumped at time T_j. The localizing times in (1.1) can be taken to be either the {T_j}_{j≥1} or the {S_j}_{j≥1}, where

    S_j = inf{ t ≥ 0 : Σᵢ₌₁ⁿ ∫₀ᵗ λ_s(X_s, i) ds ≥ j }   if {...} ≠ ∅,
        = ∞   otherwise.

It is easily proved that λ_{T_j}(X_{T_j}, i) > 0 on {Z_j = i}, j ≥ 1.
In some applications there are intervals of time in which jumps of the N(i) cannot be observed. If an observable predictable process C(i) exists which is 0 on these intervals and is 1 otherwise, then N*, where N*_t(i) = ∫₀ᵗ C_s(i) dN_s(i), i ≥ 1, is a multivariate counting process with stochastic intensity

    λ*(X) = (λ(X,1)C(1), ..., λ(X,n)C(n)).

The localizing times for the martingale relationship will be as before. In fact, for C an n-vector of predictable processes, if Σᵢ₌₁ⁿ ∫₀ᵀ |C_s(i)| λ_s(X_s, i) ds < ∞ a.s. P, then M, where

    M_t = Σᵢ₌₁ⁿ ∫₀ᵗ C_s(i) dN_s(i) − Σᵢ₌₁ⁿ ∫₀ᵗ C_s(i) λ_s(X_s, i) ds,   t ∈ [0,T],

is a local square integrable martingale with localizing times {S_j}_{j≥1} defined by

    S_j = inf{ t ≥ 0 : Σᵢ₌₁ⁿ ∫₀ᵗ |C_s(i)| λ_s(X_s, i) ds ≥ j }   if {...} ≠ ∅,
        = ∞   otherwise,   j ≥ 1.
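As an informal numerical check of these compensator relationships (my illustration, using a constant intensity so the integral has a closed form), the average of M_T = N_T − ∫₀ᵀ λ ds over many simulated paths should be near zero:

```python
import random

def compensated_mean(lam, T, n_paths, seed=1):
    """Average of M_T = N_T - lam*T over simulated constant-intensity
    Poisson paths; the martingale property makes this close to zero."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        t, n_jumps = 0.0, 0
        while True:
            t += rng.expovariate(lam)    # exponential inter-jump times
            if t > T:
                break
            n_jumps += 1
        total += n_jumps - lam * T       # M_T for this path
    return total / n_paths

m = compensated_mean(lam=3.0, T=2.0, n_paths=20000)
```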
A review of the martingale theory used in this thesis can be found in Anderson and Borgan (1985), Bremaud (1981), and Kopp (1984). For completeness, two results that will be used repeatedly in the following work are given below. Lenglart's inequality as formulated by Karr (1987) is given first:
Lenglart's Inequality (Lenglart, 1977). Let X, Y be adapted, right continuous nonnegative processes such that Y is predictable and nondecreasing with Y₀ = 0. Suppose that Y dominates X in the sense that E[X_T] ≤ E[Y_T] for every finite stopping time T. Then for each ε, η > 0 and every finite stopping time T,

    P[ sup_{t≤T} X_t ≥ ε ] ≤ η/ε + P[ Y_T ≥ η ].

If X − Y forms a local martingale, then for every finite stopping time T, E[X_T] = E[Y_T] and the above result can be utilized.
The next theorem is a central limit theorem of Rebolledo (1978), as generalized by Anderson and Gill (1982):

Rebolledo's Central Limit Theorem for local square integrable martingales. For each n = 1, 2, ... let Nⁿ be a multivariate counting process with n components. Let Hⁿ be a p × n (p ≥ 1 is fixed) matrix of locally bounded predictable processes. Suppose that Nⁿ has an intensity process λⁿ, and define local square integrable martingales Wⁿ = (W₁ⁿ, ..., W_pⁿ) by

    W_jⁿ(t) = ∫₀ᵗ Σ_{k=1}ⁿ H_{jk}ⁿ(s) (dN_kⁿ(s) − λ_kⁿ(s) ds).

Let A be a p × p matrix of continuous functions on [0,T] which form the covariance functions of a continuous p-variate Gaussian martingale W^∞ with W^∞(0) = 0; i.e., Cov(W_i^∞(t), W_j^∞(s)) = A_{ij}(t ∧ s) for all i, j, t and s. Suppose that for all i, j, and t,

    ⟨W_iⁿ, W_jⁿ⟩_t = ∫₀ᵗ Σ_{k=1}ⁿ H_{ik}ⁿ(s) H_{jk}ⁿ(s) λ_kⁿ(s) ds → A_{ij}(t)   in probability as n → ∞,

and that for all i and ε > 0,

    ∫₀ᵗ Σ_{k=1}ⁿ (H_{ik}ⁿ(s))² λ_kⁿ(s) I{|H_{ik}ⁿ(s)| > ε} ds → 0   in probability as n → ∞.

Then Wⁿ converges weakly to W^∞ as n → ∞ in D([0,T])^p.
1.3 Parametrizations of the Stochastic Intensity
In statistical applications the stochastic intensity, λ, is often parametrized. Basawa and Rao (1980) and Borgan (1984), among others, consider a finite dimensional parametrization of λ. Both Basawa and Rao, and Borgan, use maximum likelihood estimation in order to construct estimators. A univariate linear regression model for λ (λ_s(X,i) = β(s)X_s(i), i ≥ 1) is used by Aalen (1978, 1980), Karr (1987) and Leskow (1988). To estimate β, Aalen proposes a martingale estimator, whereas Karr and Leskow use the method of sieves (Grenander, 1981). McKeague (1988), in a multicovariate linear regression model, derives estimators via a weighted least squares method.
Another regression model for λ is the now classical Cox regression model (Cox, 1972). Here, λ_s(X,i) = exp{β₀ X_s(i)} Y_s(i) λ₀(s), i ∈ {1,...,n}, where λ₀ is an infinite dimensional nuisance parameter and Y is a vector of predictable stochastic processes taking values in {0,1} indicating censored intervals. Various approaches are taken in order to estimate β₀. One approach, presented in Anderson and Borgan (1985), is to let λ₀ be a known function of a finite dimensional parameter, θ, and then estimate θ and β₀ simultaneously using maximum likelihood estimation. Friedman (1982) parametrizes λ₀ by a step function and investigates the properties of the maximum likelihood estimator of β₀, while allowing the number of steps defining λ₀ to increase with n. On the other hand, Cox (1972) eliminates λ₀ by the use of a partial likelihood in order to estimate β₀. This partial likelihood will be discussed further in chapter 2. Asymptotic properties of the maximum partial likelihood estimator are given by Anderson and Gill (1982).
As was mentioned in the introduction, this thesis is concerned with a time dependent regression relationship between the covariate and the counting process. This means β₀ is a function on [0,T], i.e., λ_s(X,i) = exp{β₀(s) X_s(i)} Y_s(i) λ₀(s). Several authors have considered a time-varying regression coefficient. Brown (1975), using a discrete version of Cox's model, parametrizes both β₀ and λ₀ as step functions. He then maximizes the likelihood in order to estimate β₀ and λ₀. Stablein et al. (1981) parametrize β₀ as a polynomial function in time. They use the partial likelihood in order to estimate β₀. Anderson and Senthilselvan (1982) also use the step function parametrization for β₀, as did Brown, but they allow the data to choose the jump points of the step function. Essentially they consider a step function with two steps, so that there are three parameters: the value of the step function before the step, the time of the step, and the value of the step function after the step. The three parameters are estimated by maximizing Cox's partial likelihood.
Each of the above authors makes simplifying assumptions on the form of the function β₀ so as to maintain a finite dimensional parameter space. However, Zucker and Karr (1989), using a penalized likelihood technique, allow β₀ to be infinite dimensional. They prove that for large n, the maximizer of the penalized partial likelihood is a polynomial spline, with the degree of the polynomial influenced by the choice of the penalty function. The knots of the spline occur at the jump points of the counting process.

All of the above analyses are developed within the survival analysis context; that is, where N can have at most one jump. The estimation method presented in this thesis is applicable not only in the survival analysis context, but also in the more general context where N is allowed multiple jumps. This method, which also allows β₀ to be infinite dimensional, utilizes the method of sieves (Grenander, 1981), and in particular a very simple sieve, the histogram sieve. This choice of a sieve retains the simplicity of analysis present in methods involving only a finite dimensional parameterization of the regression coefficient β₀.
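To fix ideas, a histogram-sieve element (a step function with steps at predetermined points) can be represented directly; the breakpoints and values below are illustrative choices of mine, not quantities from the thesis.

```python
import bisect

def step_function(breakpoints, values):
    """Return the step function that equals values[k] on the cell
    [breakpoints[k], breakpoints[k+1]); breakpoints are predetermined."""
    assert len(values) == len(breakpoints) - 1
    def f(s):
        k = bisect.bisect_right(breakpoints, s) - 1
        return values[min(max(k, 0), len(values) - 1)]
    return f

# A sieve element with K_n = 4 steps on [0, 1].
beta = step_function([0.0, 0.25, 0.5, 0.75, 1.0], [0.2, 0.8, -0.1, 0.5])
```

Estimating β₀ within the sieve then amounts to choosing the K_n cell values, which is why the analysis stays as simple as in a finite dimensional parameterization.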
1.4 Motivating Applications
In order to consider an application in survival analysis, consider first how one might formulate a survival analysis model in the counting process context. Suppose that n individuals (operating independently) are observed until failure. Let T₁,...,T_n be the failure times, and suppose that the hazard rate for the distribution of T_i is given by h_i. Let N_t(i) = I{T_i ≤ t} and Y_t(i) = I{T_i ≥ t}, i = 1,...,n. Then (N_t(1),...,N_t(n)) forms a multivariate counting process with respect to its internal filtration {F_t = σ(N_s(i), s ≤ t, i = 1,...,n)}_{t≥0}. The stochastic intensity of N(i) is given by λ_t(i) = h_i(t)Y_t(i), i = 1,...,n, t ∈ [0,T] (Aalen, 1978). If (C(1),...,C(n)) is predictable (say, left continuous, right hand limited, and adapted to F^N), where each C(i) takes values in {0,1}, then N*_t(i) = ∫₀ᵗ C_s(i) dN_s(i), i ≥ 1, also forms a multivariate counting process with respect to F^N with stochastic intensity λ*_t(i) = C_t(i)h_i(t)Y_t(i), i ≥ 1. On the other hand, if the C(i)'s are independent of the N(j)'s, then by enlarging F^N to F* = F^N ∨ σ(C(i), i = 1,...,n), (N*(1),...,N*(n)) forms a multivariate counting process with respect to F* (Bremaud, 1981). For a discussion of various censoring mechanisms that can be incorporated through C, see Gill (1980).
In survival analysis one might be interested in comparing an invasive treatment (say, surgery and medication) to a less invasive treatment (say, medication alone). In this situation it is often expected that the hazard rate for the invasive treatment will be high for some length of time and then drop, possibly to or below the hazard rate for the noninvasive treatment. A graph of the hazard functions might appear as in Figure 1. In the Cox regression model, the hazard at time s is given by e^{β₀X} λ₀(s), where X = 1 for the invasive treatment and 0 otherwise; that is, the hazard for the invasive treatment would be e^{β₀} λ₀(s), and λ₀(s) would be the hazard rate for the noninvasive treatment. Obviously this does not fit the hazards graphed in Figure 1. In fact, it appears that β₀ is positive up to time s = t and is negative thereafter. This is the type of problem which motivated the work illustrated in the last section.
Consider also a continuous time Markov chain on a finite state space with inhomogeneous infinitesimal generator. Suppose n individuals move independently of one another from one state to the next. Let N_t(i,j,k) be the number of direct transitions from state i to state j up to and including time t by individual k (i, j ∈ {1,...,m}). If the transition intensity from state i to state j for individual k is given by α_{ijk}, an L¹[0,T] function, and if the process is Markovian, then N = (N(i,j,k), i, j, k ≥ 1, i ≠ j) forms a multivariate counting process with stochastic intensity λ_t(i,j,k) = α_{ijk}(t)Y_t(i,k), where Y_t(i,k) = 1 if the kth individual is in state i at time t−, and Y_t(i,k) = 0 otherwise (Aalen, 1975). The filtration here is the internal filtration. Various forms of censoring can be considered as in the survival analysis case. Both Anderson and Gill (1982) and Anderson and Borgan (1985) consider the Cox regression model in this context; that is, where α_{ijk} is modeled conditionally on X by exp(β₀(i,j)X_t(k)) λ₀(t). Here, as in survival analysis, one can envision situations in which the effect of a covariate on the transition intensities varies with time.

In sociology a similar problem might occur in the study of criminal recidivism (see Holden, 1985 for some failure time models used in this area). It is easy to envision a situation in which the effect of the correctional treatment on the crime rate of an individual will slowly disappear with time.
Another application arises in queueing problems. Recall that a simple queue can be constructed from a multivariate counting process, (N(1), N(2)), and Q₀, a nonnegative integer valued random variable. Then the queue, Q, is given by Q_t = Q₀ + A_t − D_t, where A_t = N_t(1) and D_t = ∫₀ᵗ I(Q_{s−} > 0) dN_s(2) (Bremaud, 1981). In the comparison of various servers generating the output process D, an inexperienced server may process the customers at a lower rate than an experienced server, but the rate of the inexperienced server may increase with time. Basawa and Rao (1980) consider inference for queueing problems when the parameters are time independent.
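The queue recursion Q_t = Q₀ + A_t − D_t can be traced event by event (a sketch of mine, with made-up arrival and potential-departure times standing in for the jump times of N(1) and N(2)):

```python
def queue_length(q0, arrivals, service_completions):
    """Evolve Q_t = Q_0 + A_t - D_t: arrivals are jump times of N(1);
    a jump of N(2) removes a customer only if the queue is nonempty,
    reflecting the indicator I(Q_{s-} > 0) in D_t. Returns (time, length) steps."""
    events = sorted([(t, +1) for t in arrivals]
                    + [(t, -1) for t in service_completions])
    q, path = q0, [(0.0, q0)]
    for t, kind in events:
        if kind == +1:
            q += 1                 # dN(1) jump: A_t increases
        elif q > 0:
            q -= 1                 # dN(2) jump counted in D_t only when Q > 0
        path.append((t, q))
    return path

path = queue_length(0, arrivals=[0.5, 0.6, 2.0],
                    service_completions=[0.4, 1.0, 1.5])
```

Note that the potential departure at time 0.4 has no effect because the queue is empty there, exactly as the indicator in D_t dictates.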
1.5 Summary of Thesis and Further Research
In chapter II, the problem of estimating a time dependent coefficient, β₀, in the Cox regression model is addressed. Some modification of the classical approach to maximizing the likelihood is needed since, as explained in section 2.3, the likelihood is maximized at ∞. The method of sieves (Grenander, 1981) is the modification used here. In the method of sieves an increasing sequence of subparameter spaces, {Θ_{K_n}, n ≥ 1}, is considered whose union is dense in the target parameter space, Θ. Classically the metric on the target parameter space is derived in some fashion from the Kullback-Leibler information (Grenander, 1981; Geman and Hwang, 1982; Karr, 1987; Nguyen and Pham, 1982). For a sample of size n, the likelihood is maximized over Θ_{K_n}. To achieve consistency, K_n is then allowed to increase with n. In this thesis the histogram sieve is used; that is, Θ_{K_n} consists of all step functions with steps at predetermined points. This means that for a sample of size n, β₀ is estimated by a step function, β̂ⁿ, with K_n steps. Consistency of β̂ⁿ is shown via β̄ⁿ, which is consistent for β₀ but involves unknown quantities. Essentially β̄ⁿ is the projection of β₀ on the sieve using a "Kullback-Leibler information" as the metric. Grenander, Geman and Hwang, Nguyen and Pham, and Karr all make use of the projection on the sieve in proving consistency; however, in the applications they consider, the Kullback-Leibler information translates in a 1:1 fashion into a metric on the target parameter space. This does not appear to occur here; instead, the Kullback-Leibler information can be approximated to the first order by an L² norm. β̄ⁿ is then the projection on Θ_{K_n} using this L² norm.
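The L² projection onto the histogram sieve has a simple closed form: on each cell the minimizing constant is the cell average. The sketch below (my illustration, with a made-up β₀ and midpoint-rule integration) also checks that the projection error shrinks as the partition is refined.

```python
def l2_projection(beta0, breakpoints, n_grid=1000):
    """Project beta0 onto step functions with the given breakpoints in L2:
    the minimizing value on each cell is the cell average of beta0."""
    values = []
    for a, b in zip(breakpoints[:-1], breakpoints[1:]):
        h = (b - a) / n_grid
        avg = sum(beta0(a + (j + 0.5) * h) for j in range(n_grid)) * h / (b - a)
        values.append(avg)
    return values

def l2_distance_sq(beta0, breakpoints, values, n_grid=1000):
    """Squared L2 distance between beta0 and the step function (midpoint rule)."""
    total = 0.0
    for (a, b), v in zip(zip(breakpoints[:-1], breakpoints[1:]), values):
        h = (b - a) / n_grid
        total += sum((beta0(a + (j + 0.5) * h) - v) ** 2 for j in range(n_grid)) * h
    return total

beta0 = lambda s: s * s                      # illustrative smooth coefficient
coarse = [0.0, 0.5, 1.0]
fine = [k / 8 for k in range(9)]
d_coarse = l2_distance_sq(beta0, coarse, l2_projection(beta0, coarse))
d_fine = l2_distance_sq(beta0, fine, l2_projection(beta0, fine))
```

The shrinking projection error as K_n grows is the bias side of the bias/variance trade-off discussed next.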
Theorem 2.1 shows that the distance between β̂ⁿ and β̄ⁿ approaches zero (in probability) as the sample size increases, and gives a rate of convergence in L². That this rate is the best achievable for this norm can be seen from note 2 following Theorem 2.1.

In this case, as in similar cases (Karr, 1985; Ramlau-Hansen, 1983; Zucker and Karr, 1989; Leskow, 1988; Nguyen and Pham, 1982), β̂ⁿ appears to have a limiting "white noise" distribution; hence, for inference purposes it is useful to consider an integrated version of β̂ⁿ, i.e., Z_t = ∫₀ᵗ β̂ⁿ(s) − β₀(s) ds. It is natural to expect that Z, properly normalized, will converge weakly to a Gaussian martingale. That this is indeed the case is proved in Theorem 3.1. McKeague (1988) proves functional weak convergence in a similar situation, that of the method of sieves combined with least squares estimation for a multivariate version of Aalen's (1978) multiplicative intensity model; i.e., instead of the intensity modeled by λ_t(X) = e^{β₀(t)X_t} λ₀(t), it is modeled by λ_t(X) = β₀(t)X_t. As would be expected, the competing interests of bias and variance must be traded off in order to obtain a weak convergence result. Essentially, the weak convergence of n^{1/2} ∫₀ᵗ β̂ⁿ(s) − β̄ⁿ(s) ds requires that K_n not grow too quickly with n, while K_n must grow quickly enough for n^{1/2} ∫₀ᵗ β̄ⁿ(s) − β₀(s) ds to converge to zero. In addition to the weak convergence result, a consistent estimator of the asymptotic variance process is given.
In chapter III, the problem of formulating a consistent test of a finite dimensional null hypothesis versus an infinite dimensional alternate hypothesis is addressed. In particular, attention is focused on the null hypothesis that β₀ is a constant function versus the alternate hypothesis that β₀ is nonconstant. As a first step a partial likelihood ratio test is considered; that is, two times the log of the ratio of the partial likelihood maximized under the alternate hypothesis to the partial likelihood maximized under the null hypothesis. Just as in the problem of estimating an infinite dimensional parameter, the partial likelihood under the alternate is maximized at ∞. This problem can be remedied, as before, by working within the context of a sieve. That is, for a sample of size n, consider the null hypothesis that β₀ is constant versus the alternate hypothesis that β₀ ∈ Θ_{K_n}. It is natural to expect that the partial likelihood ratio test (PLRT) is approximately distributed as a chi-squared random variable on K_n − 1 degrees of freedom. This is the intuition behind Theorem 3.3, in which

    (PLRT − (K_n − 1)) / [2(K_n − 1)]^{1/2}

is shown to converge in distribution to a N(0,1) random variable under the null hypothesis. Alternately, Fisher's (1922) transform of the chi-squared random variable can be considered: 2^{1/2}{PLRT^{1/2} − (K_n − 1)^{1/2}}, which can also be shown to converge in distribution to a N(0,1) random variable. In addition, a sequential version of the PLRT is given. Moreau, O'Quigley and Mesbah (1985) propose the use of a score test for the alternate hypothesis that β₀ is a "K-step" step function. Unless β₀ is actually a "K-step" step function, this test as it stands will not be consistent for general nonconstant β₀. However, within the sieve context (i.e., as data/information accumulates, allow more steps in the alternate hypothesis), a consistent test results, as is proved in Theorem 3.4. The PLRT can be used to investigate question 3 as posed at the beginning of section 1.1.
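Both normalizations are elementary to compute once a PLRT value is in hand (a sketch of mine; the `plrt` input is assumed, not derived here):

```python
import math

def normalized_plrt(plrt, k_n):
    """(PLRT - (K_n - 1)) / sqrt(2 (K_n - 1)): approximately N(0,1)
    under the null when K_n is large."""
    df = k_n - 1
    return (plrt - df) / math.sqrt(2 * df)

def fisher_transform(plrt, k_n):
    """Fisher's transform sqrt(2) (sqrt(PLRT) - sqrt(K_n - 1)):
    also approximately N(0,1) under the null."""
    df = k_n - 1
    return math.sqrt(2) * (math.sqrt(plrt) - math.sqrt(df))

# If the PLRT equals its null degrees of freedom exactly, both are 0.
z1 = normalized_plrt(plrt=49.0, k_n=50)
z2 = fisher_transform(plrt=49.0, k_n=50)
```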
In order to investigate questions such as 1. and 2. in section 1.1, a linear statistic is proposed, i.e., S_t = ∫₀ᵗ w(s)(β̂(s) − β₀(s)) ds, where w is an appropriate weight function. Recall that, intuitively, β̂ has a limiting white noise distribution; in other words, the {β̂(s)}_{s∈[0,T]} act as independent normal random variables with means {β₀(s)}_{s∈[0,T]}. Given K independent random variables X₁,...,X_K with means μ₁,...,μ_K, the optimal statistic for linear inferences concerning the μ's is the linear statistic Σᵢ wᵢXᵢ, where the wᵢ are appropriate weights. The analogous statistic here is "Σ_s w(s)β̂(s)" = ∫ w(s)β̂(s) ds. This type of statistic has been used by Aalen (1978), Gill (1980), and O'Sullivan (1986), among others. In particular, Gill (1980) uses the statistic ∫ w(s)β̂(s) ds for inference in Aalen's (1978) multiplicative intensity model (λ_t(X) = β₀(t)X_t). In this case ∫₀ᵗ β̂(s) ds = ∫₀ᵗ X_s⁻¹ dN_s, so that ∫₀ᵀ w(s)β̂(s) ds = ∫₀ᵀ w(s)X_s⁻¹ dN_s.
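Since a dN-integral is a sum over jump times, the integrated estimator in Aalen's model reduces to a finite sum; a sketch (my illustration, with made-up jump times and covariate values):

```python
def integrated_beta_hat(jump_times, x_at_jumps, t):
    """Evaluate  int_0^t X_s^{-1} dN_s :  a dN integral is a sum over
    jump times, so this is the running sum of 1/X at jumps up to t."""
    return sum(1.0 / x for s, x in zip(jump_times, x_at_jumps) if s <= t)

# Illustrative data: jump times and covariate values just before each jump.
jumps = [0.2, 0.5, 0.9]
x_vals = [2.0, 4.0, 5.0]
z = integrated_beta_hat(jumps, x_vals, t=0.6)   # 1/2 + 1/4
```

A weighted version, Σ w(T_k)/X_{T_k} over jumps with T_k ≤ T, gives Gill's statistic in the same way.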
At the beginning of chapter IV, asymptotics for a general linear statistic

    S = ∫₀ᵀ w(s)[β̂(s) − β₀(s)] ds

are given. In fact, since w may involve unknown quantities, conditions are provided for the convergence of an estimator, w_n, to w so as to preserve the asymptotic distribution. Naturally, this section includes a consistent estimator of the asymptotic variance of ∫₀ᵀ w_n(s)[β̂(s) − β₀(s)] ds.
In order to select a weight function, an optimality criterion must be adopted. One optimality criterion would be to maximize asymptotic power against a contiguous alternative (β₀(s) + β(s)n^{−1/2}). This is the method Gill (1980) uses in order to formulate two-sample tests in survival analysis. To return to the subject of chapter III, the PLRT: it is interesting to note that the PLRT is not consistent against contiguous alternatives. Intuitively, this is because the PLRT is consistent against all directions of approach to β₀, unlike the linear tests proposed in chapter IV. This is discussed further in note 3 following Theorem 3.4. In chapter IV the choice of the weight function is based on maximizing the asymptotic power against a contiguous alternative. Since the focus of this thesis is on deviations from the null hypothesis that β₀ is a constant unknown function, only weight functions for which ∫₀ᵀ w(s) ds = 0 are considered. Using Le Cam's third lemma (Hajek and Sidak, 1967, p. 208), it is easy to derive the asymptotic power of the linear statistic and to subsequently derive the weight function w_β which maximizes the asymptotic power. Denote the resulting test by LS(w_β). Of course LS(w_β) is only optimal for the particular choice of β. It is then tempting to employ Roy's union-intersection principle (Roy, 1953); that is, to reject the null hypothesis if, for any direction of approach β, LS(w_β) is large, but this returns one to the PLRT, which is inconsistent against contiguous alternatives as mentioned above.
In the last two sections of chapter IV, attention is focused on linear test statistics for the alternatives formulated by questions 1. and 2. in section 1.1. That is, the null hypothesis is that β₀ is a constant unknown function; the first alternate hypothesis is that β₀ is an increasing function, and the second alternate hypothesis is that β₀ is constant up to some unknown time τ, changes value, and then remains constant thereafter. To formulate a test for increasing alternatives, the weight function is chosen to be optimal for the contiguous alternative β₀ + sn^{−1/2}, i.e., the linear alternative. Asymptotics under both the null hypothesis and under the contiguous alternate hypothesis for this test are given in Corollary 4.1. Also, the test statistic given in Corollary 4.1 is consistent for the fixed alternate that β₀ is an increasing function. It turns out that the asymptotic power under the contiguous alternative, β₀ + sn^{−1/2}, of the linear test can be attained by a much simpler test proposed by Cox (1972). This is the PLRT for H₀: λ_t(X) = e^{αX_t} λ₀(t), α, λ₀ unknown, versus H₁: λ_t(X) = e^{(α+γt)X_t} λ₀(t), α, λ₀ unknown, γ ≠ 0.

The test for the second alternate (that of a change point) is more interesting than the test given above. First the change point is assumed to occur (if at all) at time τ. An optimal weight function can then be derived for the contiguous alternate (β₀ + δn^{−1/2} I{s ≥ τ}, δ > 0). Call the resulting test statistic TS_n(τ). It turns out that one can derive the asymptotic distribution of TS_n (taken as a function in C[0,T]) under both the null hypothesis and the contiguous alternate. Since, in fact, the time of change is unknown, Roy's union-intersection principle (Roy, 1953) can be used to justify the use of sup_τ TS_n(τ) as the test statistic. This test statistic is consistent under the contiguous alternate (β₀ + δn^{−1/2} I{s ≥ τ}, δ > 0) and asymptotically has a tabulated null distribution.
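The union-intersection idea can be illustrated with a generic standardized change-point scan (a sketch of mine; this CUSUM-type contrast stands in for, and is not, the thesis's TS_n):

```python
import math

def sup_change_stat(increments):
    """For values z_1..z_K (think: cell averages of an estimated coefficient),
    scan all candidate change points k and take the sup over k of the
    standardized contrast between the means before and after k."""
    K = len(increments)
    best = 0.0
    for k in range(1, K):
        left = sum(increments[:k]) / k
        right = sum(increments[k:]) / (K - k)
        scale = math.sqrt(1.0 / k + 1.0 / (K - k))  # sd of mean difference if var = 1
        best = max(best, abs(left - right) / scale)
    return best

# A clear level shift produces a large statistic; a flat sequence does not.
shifted = [0.0] * 5 + [2.0] * 5
flat = [0.0] * 10
```

Taking the supremum over candidate change points is exactly the union-intersection step: reject if the contrast is large at any τ.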
In each of the following chapters a section is devoted to the observation of independent and identically distributed copies of (N, X). This is done so as to illustrate that simple moment conditions (at least in the i.i.d. case) are sufficient to imply the assumptions of the theorems in this thesis.

Some questions which spur further thought and/or research are as follows:

1) How useful are sieves in hypothesis testing? As is seen in this thesis, the sieve approach in hypothesis testing allows one to formulate consistent tests for infinite dimensional alternatives. In fact, the commonly used χ² goodness-of-fit test can be formulated within a sieve context. A related question is whether, in the sieve context, the χ² approximation to the likelihood ratio test is valid for finite n.

2) It is plausible that a perceived time dependency in the regression coefficient may be caused by an omitted covariate. Is there some way to discriminate between a true time dependency and superfluous time dependency caused by the omitted covariate?

3) Would the concept of stochastic complexity be useful in order to more clearly identify the optimal rate of sieve growth? Barron and Cover (1989) use this concept in density estimation.

4) Can something be said in general about the usefulness of fixed point theorems in consistency proofs for finite dimensional sieve estimators?

5) Will sup_{s∈[0,T]} |β̂ⁿ(s) − β̄ⁿ(s)| (properly normalized) have an asymptotic extreme value distribution?
II. THE ESTIMATOR AND ITS CONSISTENCY

2.1 The Likelihood for a Counting Process
Let P and P̃ be two probability measures on a measurable space (Ω, F), and let N be an n-dimensional vector of step functions, each of which is right continuous with jumps of size 1. On (Ω, F) consider the filtration {F_t = F^N_t ∨ F₀ : t ∈ [0,T]}, where F^N_t = σ{N_s, s ≤ t}, t ∈ [0,T]. Assume that P and P̃ are equal on F₀ and that F₀ ⊂ F. Also assume that under P̃, N is a standard multivariate Poisson process, and under P, N is a multivariate counting process with stochastic intensity λ. If Σᵢ₌₁ⁿ ∫₀ᵀ λ_s(i) ds < ∞ a.s. P, then P « P̃ on F_T with Radon-Nikodym derivative

    L_t = exp{ Σᵢ₌₁ⁿ ∫₀ᵗ ln λ_s(i) dN_s(i) + Σᵢ₌₁ⁿ ∫₀ᵗ (1 − λ_s(i)) ds },   t ≤ T

(Karr, 1986).
Using the associated marked point process ((T₁,Z₁), (T₂,Z₂), ...), the above likelihood, L_t, can be rewritten as

(2.1)
    L_t = [ Π_{k≥1, T_k≤t} λ_{T_k}(Z_k) / λ_{T_k}(·) ] · [ Π_{k≥1, T_k≤t} λ_{T_k}(·) ] exp{ ∫₀ᵗ (n − λ_s(·)) ds }

        = [ Π_{k≥1, T_k≤t} P[Z_k | F_{T_k−}] ] · exp{ ∫₀ᵗ ln λ_s(·) dN_s(·) + ∫₀ᵗ (n − λ_s(·)) ds }

(a (·) in the place of an index means summed over the index) (Bremaud, 1981). Note that the second term in the above product is the likelihood for N(·) except for the term e^{(n−1)t}. One gets that Π_{k≥1, T_k≤T} P[Z_k | F_{T_k−}] is proportional to the conditional likelihood of (N(1),...,N(n)) on [0,T] given N(·) on [0,T]. This relationship will be useful for an intuitive understanding of the partial likelihood.
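For intuition (my illustration): with a constant intensity λ per component, the Radon-Nikodym derivative above reduces to λ^{N_T} e^{(1−λ)T}, which agrees with the ratio of Poisson probabilities for the number of jumps:

```python
import math

def rn_derivative(lam, n_jumps, T):
    """Radon-Nikodym derivative of a constant-intensity counting process
    w.r.t. a standard (rate 1) Poisson process:
    exp{ int ln(lam) dN + int (1 - lam) ds } = lam**N_T * exp((1 - lam) T)."""
    return lam ** n_jumps * math.exp((1.0 - lam) * T)

def poisson_pmf_ratio(lam, k, T):
    """Ratio of Poisson(lam*T) to Poisson(T) probabilities of k jumps."""
    p1 = math.exp(-lam * T) * (lam * T) ** k / math.factorial(k)
    p0 = math.exp(-T) * T ** k / math.factorial(k)
    return p1 / p0

a = rn_derivative(lam=2.5, n_jumps=4, T=1.5)
b = poisson_pmf_ratio(lam=2.5, k=4, T=1.5)   # equals a
```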
2.2 Cox's Partial Likelihood

In order to reduce the dimensionality of a parameterization containing many nuisance parameters, Cox (1975) proposed maximizing a partial likelihood instead of the full likelihood to estimate the parameters of interest. Let y be an observation of the random variable Y = ((T₁,Z₁),...,(T_m,Z_m)), whose likelihood can be factored as

    Π_{j=1}^m f_{T_j | T^{(j−1)}, Z^{(j−1)}}(t_j | t^{(j−1)}, z^{(j−1)}; β, λ) · Π_{j=1}^m f_{Z_j | T^{(j)}, Z^{(j−1)}}(z_j | t^{(j)}, z^{(j−1)}; β, λ),

where t^{(j)} = (t₁,...,t_j), z^{(j)} = (z₁,...,z_j), β represents the parameters of interest, and λ represents the nuisance parameter. The second product is the partial likelihood. In order to construct the partial likelihood, Cox suggested that none of the T_j's should contain important information about β, and that λ should not appear in the partial likelihood. Wong (1986) derives sufficient conditions for consistency of estimators based on the partial likelihood. In the same paper, Wong also presents conditions for asymptotic normality and gives formulae for examining asymptotic efficiency. One does not expect the maximum partial likelihood estimator to be asymptotically fully efficient, as some information concerning the parameters of interest has been discarded. However, there is a gain in simplicity of analysis and in robustness, since the functional form of λ need not be specified. The efficiency of the partial likelihood in the counting process context will be addressed below. The concept of partial likelihood is very useful for the statistical analysis of Cox's regression model.
Cox (1972) considered Y₁,...,Y_n, independent, nonnegative random variables with hazard functions lim_{h↓0} P[Y_i ≤ t+h | Y_i ≥ t]/h = λ_t(i), i = 1,...,n. He parametrized λ_t(i) = λ_t(β, i) by λ₀(t) exp{β X_t(i)}, where X_t(i) is the covariate at time t for the random variable Y_i, and λ₀ is a nuisance function. The parameter of interest, β, measures the effect of the covariate X on the underlying hazard rate λ₀. The likelihood of Y₁,...,Y_n is

(2.2)
    Π_{i=1}ⁿ [ exp{β X_{Y_i}(i)} / Σ_{j: Y_j ≥ Y_i} exp{β X_{Y_i}(j)} ]
    · Π_{i=1}ⁿ [ Σ_{j: Y_j ≥ Y_i} exp{β X_{Y_i}(j)} ] λ₀(Y_i) exp{ −∫₀^{Y_i} λ₀(s) exp{β X_s(i)} ds }.
In order to estimate {j. Cox proposed maximizing the first term on the
RHS (right hand side) above which is a partial likelihood.
let
Zj = k
if
To see this
Y(j) = Yk and Tj = Y(j) where Y(j) is the jth lowest
value out of (Y •...• Yn ). j=1 •.... n.
Then it is easy to show that

    P( Z_j = k | T_j, (T_{j−1}, Z_{j−1}), ..., (T_1, Z_1) )
      = [ exp{β X_{Y_k}(k)} / Σ_{i: Y_i ≥ Y_k} exp{β X_{Y_k}(i)} ] · I{ Y_k ≥ Y_(j) },

which does not involve the nuisance parameter λ_0. If Y_k is the failure time of the kth object, then intuitively the above is the probability that the kth object fails at time T_j given that there is a failure at time T_j and the past history.
To set Cox's regression model in a counting process context, let N_t(i) = I{Y_i ≤ t}, i = 1, ..., n. Then (N_t(1), ..., N_t(n)) is a multivariate counting process with intensities,

    λ_t(β,i) = λ_0(t) exp{β X_t(i)} I{Y_i ≥ t},    i = 1, ..., n    (Aalen, 1978).

If (T_1,Z_1), ..., (T_n,Z_n) are defined as in the above paragraph, then the jump times of N are T_1, T_2, ..., T_n, and process N(Z_k) jumps at time T_k. The likelihood of N with respect to the standard multivariate Poisson point process is
    ∏_{i=1}^n exp{ ∫_0^T ln λ_s(β,i) dN_s(i) + ∫_0^T (1 − λ_s(β,i)) ds }

      = ∏_{i=1}^n [ exp{β X_{Y_i}(i)} / Σ_{j: Y_j ≥ Y_i} exp{β X_{Y_i}(j)} ]
        × ∏_{i=1}^n ( Σ_{j: Y_j ≥ Y_i} exp{β X_{Y_i}(j)} ) λ_0(Y_i) exp{ −∫_0^{Y_i} λ_0(s) exp{β X_s(i)} ds + T }.

The above is identical to the likelihood for Y_1, ..., Y_n except for a factor of e^{nT}. Thus, from (2.1) and (2.2), it is easy to see that the partial likelihood is ∏_{k≥1, T_k≤T} P[ΔN_{T_k}(Z_k) = 1 | F_{T_k−}]; i.e., the product over k of the probability that the Z_k-th process jumps at time T_k, given a jump at T_k and the past history.
Within the counting process framework, Cox's regression model and partial likelihood can be easily extended to counting processes with more than one jump. Let (Ω, F) be a measurable space, and let F_t = F_t^N ∨ F_0, t ∈ [0,T], where F_0 contains "prior events" to N, so that F_T ⊆ F. Let P be a probability measure on (Ω, F) under which N_t(i) has (P, F_t)-stochastic intensity

    λ_t(β,i) = Y_t(i) λ_0(t) exp{β X_t(i)},

where X_t, Y_t are (F_t)-predictable processes, Y_t(i) ∈ {0,1} ∀ t ∈ [0,T], i = 1, ..., n, λ_0 is an unknown function and β is an unknown scalar. Then, by analogy to the above, the partial likelihood can be defined by,

(2.3)    C_n(β) = ∏_{k≥1, t_k≤T} [ exp{β X_{t_k}(i_k)} Y_{t_k}(i_k) / Σ_{j=1}^n exp{β X_{t_k}(j)} Y_{t_k}(j) ],

where t_1 < t_2 < ... are the jump times of N and i_k is the index of the component which jumps at time t_k.
Recall that the above multiplied by e^{(n−1)T} is the conditional likelihood of (N(1), ..., N(n)) given N(·) on [0,T], where N(·) = Σ_{i=1}^n N(i). This means that the partial likelihood can be thought of as a "conditional" likelihood. In this context, Andersen and Gill (1982) derive consistency and asymptotic normality for β̂_n, the maximizer of the partial likelihood. For (N(i), Y(i), X(i)), i = 1, ..., n, i.i.d., they show that, under a Lindeberg condition and moment conditions, β̂_n →^P β_0, the true value, and √n(β̂_n − β_0) →^D N(0, Σ^{−1}), for Σ positive.
Another justification for the use of the partial likelihood is given by Johansen (1983). He demonstrates that the estimate β̂_n which maximizes C_n(β) can be viewed as a maximum likelihood estimate in a more general setting than that of a counting process. Allow N to have jumps of integer size greater than or equal to one [now, instead of calling N a counting process, N is called a jump process] and replace λ_0(t)dt by dΛ_0(t) in Cox's regression model, where Λ_0(t) is not necessarily continuous. Then the partial likelihood is a likelihood profile, C_n(β) = max_Λ C_n(β, Λ).
Dzhaparidze (1986) investigates the optimality of β̂_n derived by maximizing the partial likelihood. Say one wishes to estimate a parameter, β, in a model with two unknowns (say, β, λ). Then it is well known that the best asymptotic variance for β̂ is smaller if λ is known than if λ is unknown. Essentially, if λ is unknown, the inverse of the best asymptotic variance is the norm of the score for β minus its projection on the space spanned by the score for λ. If the directions of approach to λ are restricted (prior knowledge concerning λ), then the above projection will be the same or further away from the score for β, resulting in a smaller best asymptotic variance. This occurs when β is finite dimensional and λ is an infinite dimensional nuisance parameter (as is the case in the Cox regression model). Dzhaparidze finds that if Cox's partial likelihood is maximized in order to estimate β, then the best asymptotic variance is achieved as long as the directions of approach to λ are not too restricted. In fact, if λ is a known function of a finite dimensional parameter, then usually the directions of approach to λ will be so restrictive that maximizing the partial likelihood will not result in estimators with the lowest asymptotic variance. However, if one cannot assume a specific finite dimensional form for λ_0, then maximizing Cox's partial likelihood will result in estimators with the lowest asymptotic variance. See Begun et al. (1983) for more on this subject. Since β is a function of time in this thesis, the optimality properties derived by Dzhaparidze (1986) may not hold. It is of interest to investigate optimality properties of estimators of the function β, but this will not be done here.
In this thesis the focus is on how the covariates influence the underlying intensity, not on the underlying intensity itself. Therefore, in order to estimate β, Cox's partial likelihood is used and minimal assumptions are made concerning λ_0.
2.3  Method of Sieves

If the parameter of interest is infinite dimensional, maximum likelihood estimation may not be meaningful. This is illustrated by the well-known problem of density estimation. Here the parameter space is all positive L^1 functions with integral 1. The likelihood, ∏_{i=1}^n f(X_i), can be made as large as desired simply by making f large at points close to the X_i and zero elsewhere. Grenander (1981) proposed the method of sieves as a way to accommodate such a large parameter space. An increasing sequence of subparameter spaces, say {Θ_{K_n} : n ≥ 1}, is given so that within each there exists a maximum likelihood estimate, and ∪_n Θ_{K_n} is dense in Θ, where Θ is the parameter space of interest. For a sample of size n, the likelihood is maximized over Θ_{K_n}. To achieve consistency, K_n is allowed to slowly increase with n.
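The degeneracy that motivates the sieve can be exhibited numerically. In the sketch below (a self-contained illustration; the data points and the spike construction are invented for the example), a legitimate density that concentrates half of its mass in a spike of width ε around the first observation has a log-likelihood that grows without bound as ε shrinks.

```python
import numpy as np

# 21 fixed "observations"; the spike will sit on the first one.
data = np.linspace(-2.0, 2.0, 21)

def spike_density(y, center, eps):
    """A valid density for every eps > 0: half the mass is a uniform spike
    of width eps at `center`, half is uniform background on [-10, 10]."""
    spike = (np.abs(y - center) < eps / 2) / eps
    background = (np.abs(y) <= 10) / 20.0
    return 0.5 * spike + 0.5 * background

logliks = []
for eps in [0.1, 1e-3, 1e-5, 1e-7]:
    logliks.append(np.sum(np.log(spike_density(data, data[0], eps))))

# The log-likelihood grows without bound as the spike narrows.
assert all(b > a for a, b in zip(logliks, logliks[1:]))
```

A sieve removes exactly this behavior: for each sample size it bounds how sharply the candidate functions may concentrate, and relaxes the bound as n grows.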
The sieve method is used by Karr (1987) to estimate the unknown λ in the multiplicative intensity model (λ_s(i) = λ_s Y_s(i), see section 1.3). Let,

    Θ = { λ : λ is a left continuous, right-hand limited, nonnegative function in L^1[0,1] }    and

    Θ_{K_n} = { λ : λ ∈ Θ, λ is absolutely continuous; |λ'| ≤ K_n; λ ≥ a on [0,1] }.

As K_n ↑ ∞, Θ_{K_n} ↑ Θ in the L^1 norm (Grenander, 1981). Under some assumptions, Karr proves that there exists λ̂^n ∈ Θ_{K_n}, a global maximizer of the likelihood, and, for K_n = n^{½−η}, 0 < η < ½, ||λ̂^n − λ||_1 → 0 a.s.
The underlying idea of the consistency proof in Karr, and also in the proofs of Geman and Hwang (1982), centers on the Kullback-Leibler information, just as do the classical proofs of consistency. For dP = f(x, θ_0) dx, the Kullback-Leibler information is,

    KL(θ) = ∫ Ln( f(x, θ_0)/f(x, θ) ) f(x, θ_0) dx.

Under general conditions KL(θ) > 0 unless θ = θ_0. The first step in the above proofs is to find a "natural" metric, say ρ, so that if lim_n KL(θ_n) = 0, then lim_n ρ(θ_n, θ_0) = 0. Once this is done, the next step is to locate a sequence, θ̄^n ∈ Θ_{K_n}, for which lim_n KL(θ̄^n) = 0. One then shows that the mle, θ̂^n, and θ̄^n become progressively closer with increasing n. This should be the case, since θ̂^n minimizes

    KL_n(θ) = (1/n) Σ_{i=1}^n Ln( f(X_i, θ_0)/f(X_i, θ) ),

and one expects that KL_n(θ̂^n) ≤ KL_n(θ̄^n) ≈ KL(θ̄^n) ≈ KL(θ_0) < KL(θ) for θ away from θ_0, and that KL_n converges in a uniform fashion to KL. This then ensures that θ̂^n will be consistent. θ̄^n plays the role that θ_0 plays in classical proofs for consistency.

As was mentioned in section 1.1, the sieve used here is the histogram sieve,

    Θ_{K_n} = { β : β is constant on ((j−1)/K_n, j/K_n], j = 1, ..., K_n }.

A "natural" metric does not seem to arise here. However, β̄^n still plays an important role. See the remarks following the consistency proof in section 2.5. The histogram sieve is used by Leskow (1988) for estimation purposes in Aalen's multiplicative intensity model (1978), and McKeague (1988) uses this sieve in the p-covariate multiplicative intensity model (λ_t(i) = Σ_{j=1}^p β_t(j) X_t(i,j)) to estimate (β(1), β(2), ..., β(p)).
2.4  Statistical Model and Assumptions

The statistical model considered in this thesis is as follows: for each n, one observes an n-component multivariate counting process, N^n = (N^n(1), ..., N^n(n)), over the time interval [0,T]. For example, N^n(i) might count certain life events for individual i. N^n is defined on a stochastic base (Ω^n, F^n, {F_t^n : t ∈ [0,T]}) with respect to which N^n has stochastic intensity λ^n = (λ^n(1), ..., λ^n(n)), where

    λ_t^n(i) = Y_t^n(i) λ_0(t) exp{β_0(t) X_t^n(i)},    i = 1, ..., n,

so that N_t^n(i) − ∫_0^t λ_s^n(i) ds is a local martingale. In the above, both β_0 and λ_0 are deterministic functions on [0,T], X^n = (X^n(1), ..., X^n(n)) is a vector of locally bounded, predictable stochastic processes, and Y^n = (Y^n(1), ..., Y^n(n)) is a vector of predictable stochastic processes each taking values in {0,1}.

As explained in section 2.2, inference for β_0 is based on the logarithm of Cox's partial likelihood (Cox, 1972) (see (2.3)),

    ℒ_n(β) = Σ_{i=1}^n ∫_0^T [ β(s) X_s^n(i) − Ln( Σ_{j=1}^n e^{β(s) X_s^n(j)} Y_s^n(j) ) ] dN_s^n(i).

A direct maximization of ℒ_n(β) for β will not produce a meaningful estimator. For example, let X be time independent and each component of N^n have at most one jump; then, if Rank(X^n) = n and the jump of N^n(i) occurs at T_0,

    Ln [ e^{β(T_0) X^n(i)} / Σ_{j=1}^n e^{β(T_0) X^n(j)} ]

(which is negative) can be made as close to zero as desired simply by increasing β(T_0) (Zucker and Karr, 1989). As was illustrated in section 2.3, the method of sieves (Grenander, 1981) is often useful in this situation. The histogram sieve is used here,

    Θ_{K_n} = { β : β(s) = Σ_{i=1}^{K_n} β_i 1{s ∈ I_i^n}  for  (β_1, ..., β_{K_n}) ∈ ℝ^{K_n} }.

The (I_1, ..., I_{K_n}) are consecutive segments of [0,T].
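To make the sieved estimator concrete, the sketch below simulates from the model with a piecewise-constant true coefficient and maximizes the histogram-sieve log partial likelihood. It rests on simplifying assumptions not imposed in the thesis (one jump per subject, a scalar time-independent covariate, equal-length intervals, a crude discrete-time simulation), and none of the code is from the thesis. With time-independent covariates, the log partial likelihood separates across intervals, so each β_k is found by a one-dimensional Newton iteration.

```python
import numpy as np

def sieve_cox_fit(times, event, x, K, T):
    """Maximize the histogram-sieve log partial likelihood.  Since beta(s)
    is constant on each interval I_k, the terms split by the interval
    containing each failure time, and each beta_k solves its own
    one-dimensional score equation."""
    edges = np.linspace(0.0, T, K + 1)
    beta = np.zeros(K)
    for k in range(K):
        fails = np.where(event & (times > edges[k]) & (times <= edges[k + 1]))[0]
        if fails.size == 0:
            continue
        b = 0.0
        for _ in range(10):                    # Newton on a concave function
            score = info = 0.0
            for i in fails:
                w = np.exp(b * x) * (times >= times[i])   # risk-set weights
                e = np.sum(w * x) / np.sum(w)             # E_n(b, Y_i)
                score += x[i] - e
                info += np.sum(w * x * x) / np.sum(w) - e * e
            if info <= 0.0:
                break
            b += score / info
        beta[k] = b
    return beta

# Simulate: hazard exp(beta_0(t) * x) with beta_0 = +1 on [0, .5), -1 after.
rng = np.random.default_rng(2)
n, T = 2000, 1.0
x = rng.normal(size=n)
grid = np.linspace(0.0, T, 201)
dt = grid[1] - grid[0]
times = np.full(n, T)
for t in grid[:-1]:
    b0 = 1.0 if t < 0.5 else -1.0
    alive = times == T
    jumps = alive & (rng.random(n) < np.exp(b0 * x) * dt)
    times[jumps] = t + dt
event = times < T                              # the rest are censored at T
beta = sieve_cox_fit(times, event, x, K=4, T=T)
assert beta[0] > 0.0 > beta[3]                 # sign of beta_0(t) recovered
```

With K = 4 the first and last coefficients estimate β_0 on [0, .25] and (.75, 1], so their signs should match the +1/−1 shape of the simulated coefficient.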
Define, for each s ∈ [0,T],

    S_n^i(β,s) = n^{−1} Σ_{j=1}^n e^{β(s) X_s^n(j)} (X_s^n(j))^i Y_s^n(j),    i = 0,1,2,3,4.
The following conditions will be referred to repeatedly in the theorems and lemmas of this thesis.

A.  (Asymptotic stability)  There exist S^i(β_0, ·), i = 0,1,2, such that

    1)  sup_{s∈[0,T]} | S_n^i(β_0,s) − S^i(β_0,s) | = o_p(1),    i = 0,1,2,

    2)  ∫_0^T n ( S_n^i(β_0,s) − S^i(β_0,s) )² ds = O_p(1),    i = 0,1,2,  and

    3)  there exists γ > 0 such that

        sup_{s∈[0,T]}  sup_{b∈ℝ: |b−β_0(s)|<γ}  S_n^i(b,s) = O_p(1),    i = 1,2,3,4.

B.  (Lindeberg Condition)

    1)  For all ε > 0,  max_{1≤j≤n} ∫_0^T I{ s : |X_s(j) Y_s(j)| > ε√n } ds = o_p(1).

C.  (Asymptotic Regularity)

    1)  There exist constants U_1 > 0, U_2 > 0 such that

        max{ λ_0(s), S^i(β_0,s), i = 0,1,2 } ≤ U_1    and    S^2(β_0,s)/S^0(β_0,s) ≤ U_2,

        a.e. Lebesgue on [0,T].

    2)  There exists a constant L > 0 such that

        V(β_0,s) S^0(β_0,s) λ_0(s) > L,    a.e. Lebesgue on [0,T].
D.  (Bias)

    1)  β_0(s) is Lipschitz of order 1 on [0,T].

    2)  β_0(s) has bounded second derivative a.e. Lebesgue on [0,T].

    3)  V(β_0,s) S^0(β_0,s) λ_0(s) is continuous in s on [0,T].

    4)  V(β_0,s) S^0(β_0,s) λ_0(s) is Lipschitz of order 1 on [0,T].
In the following section, a member of Θ_{K_n} will be denoted either by its functional form, β(s) = Σ_{i=1}^{K_n} β_i 1_{I_i}(s), or by its vector form, β = (β_1, ..., β_{K_n}). It should be clear from the context which form of β is pertinent. The lengths of the K_n intervals, I_1^n, ..., I_{K_n}^n, will be denoted by ℓ^n = (ℓ_1^n, ..., ℓ_{K_n}^n), with ℓ_(1)^n, ℓ_(K_n)^n, and ||ℓ^n|| being the minimum length, maximum length, and the ℓ_2 norm, respectively. Other definitions are:

    1)  E_n(β,s) = S_n^1(β,s) / S_n^0(β,s),

    2)  V_n(β,s) = S_n^2(β,s) / S_n^0(β,s) − ( E_n(β,s) )²,

    3)  β̄^n(u) = Σ_{i=1}^{K_n} β̄_i^n 1_{I_i}(u),  for u in [0,T],  where

        β̄_i^n = σ_i^{−2} ∫_0^T 1_{I_i}(s) β_0(s) V(β_0,s) S^0(β_0,s) λ_0(s) ds,  and

    4)  σ_i² = ∫_0^T 1_{I_i}(s) V(β_0,s) S^0(β_0,s) λ_0(s) ds,    i = 1, ..., K_n.

In the following, the superscripts and subscripts n are dropped. Only λ_0 and β_0 are constant with increasing n.
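The centering point β̄^n of definition 3) is simply the weighted average of β_0 over each interval, with weight V(β_0,s)S^0(β_0,s)λ_0(s). The sketch below (with invented, purely illustrative choices for β_0 and the weight function) computes β̄^n by numerical integration and checks the property that drives the bias analysis: a weighted average over I_i stays within the range of β_0 on I_i, so sup_s |β̄^n(s) − β_0(s)| is bounded by the Lipschitz constant of β_0 times the largest interval length.

```python
import numpy as np

T, K = 1.0, 8
edges = np.linspace(0.0, T, K + 1)

def beta0(s):                       # illustrative true coefficient
    return np.sin(2.0 * np.pi * s)  # Lipschitz constant 2*pi

def weight(s):                      # stands in for V * S^0 * lambda_0 > 0
    return 1.0 + 0.5 * np.cos(np.pi * s)

s = np.linspace(0.0, T, 8001)
beta_bar = np.zeros(K)
for i in range(K):
    in_i = (s >= edges[i]) & (s < edges[i + 1])
    beta_bar[i] = np.sum(beta0(s[in_i]) * weight(s[in_i])) / np.sum(weight(s[in_i]))

# sup-norm distance between the step function beta_bar and beta0
idx = np.minimum((s * K / T).astype(int), K - 1)
sup_err = np.max(np.abs(beta_bar[idx] - beta0(s)))
assert sup_err <= 2.0 * np.pi * (T / K)     # Lipschitz bound on the bias
```

Refining the partition (larger K) shrinks this sup-norm bias at rate ℓ_(K), which is what condition D1 buys in the consistency argument.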
2.5  Consistency of the Sieve Estimator

One way to prove consistency of the maximum likelihood estimator is to expand the log-likelihood about the true parameter, say β_0, and then use a fixed point theorem as in Aitchison and Silvey (1958) or Billingsley (1968a). However, in the problem considered here, β_0 is, in general, not a member of Θ_K for any finite K; hence, in the following proof, the idea is to expand the log-likelihood about a point in Θ_K, say β̄^n, which is close to β_0, instead of expanding about β_0. This introduces a technical difficulty, as the score function is no longer a martingale but a martingale plus a bias term. To the first order, this bias term can be eliminated by the proper choice of β̄^n, as is given in the previous section. Assumptions D and A2 are then useful in showing that the bias is asymptotically negligible.

Theorem 2.1.  Assume,

    a)  lim_n n||ℓ||^{10} = 0    (Bias → 0),

    b)  lim_n n||ℓ||^4 = ∞    (Variance converges), and

    c)  A, C, D1,  lim_n ℓ_(K)/ℓ_(1) < ∞;

then, for β̂^n maximizing ℒ_n(β) in Θ_K,

    n ||ℓ||^4 ||β̂^n − β̄^n||² = O_p(1).
PROOF: Recalling that L is defined in assumption C2, let δ_n → ∞ slowly enough that (||ℓ||^4 n)^{−½} δ_n → 0. If

    Σ_{i=1}^K (n ℓ_i)^{−1} (∂/∂β_i) ℒ_n(β) (β_i − β̄_i^n) < 0

with probability going to 1 (as n → ∞) for every β ∈ Θ_K with ||β − β̄^n|| = (||ℓ||^4 n)^{−½} δ_n, then, by lemma 2 of Aitchison and Silvey (1958), there exists β̂^n ∈ Θ_K such that (∂/∂β_i) ℒ_n(β)|_{β=β̂^n} = 0 ∀ i and ||β̂^n − β̄^n|| ≤ (||ℓ||^4 n)^{−½} δ_n, on a set of probability going to 1. Since (∂²/∂β_i²) ℒ_n(β) is nonpositive for each i, this proves the conclusion. Using a Taylor series about the vector β̄^n = (β̄_1^n, ..., β̄_K^n) gives,

    Σ_{i=1}^K (n ℓ_i)^{−1} (∂/∂β_i) ℒ_n(β) (β_i − β̄_i^n)
      = Σ_{i=1}^K (n ℓ_i)^{−1} [ (∂/∂β_i) ℒ_n(β̄^n) ] (β_i − β̄_i^n)
        + Σ_{i=1}^K (n ℓ_i)^{−1} [ (∂²/∂β_i²) ℒ_n(β̄^n) ] (β_i − β̄_i^n)²
        + (1/2) Σ_{i=1}^K (n ℓ_i)^{−1} [ (∂³/∂β_i³) ℒ_n(β*) ] (β_i − β̄_i^n)³,

where β* lies between β and β̄^n. By Lemma 2.1 of section 2.7 (with the Cauchy-Schwarz inequality for the first term), for β ∈ Θ_K with ||β − β̄^n||² = (||ℓ||^4 n)^{−1} δ_n²,

    Σ_{i=1}^K (n ℓ_i)^{−1} (∂/∂β_i) ℒ_n(β) (β_i − β̄_i^n)
      ≤ ||β − β̄^n||² { O_p(δ_n^{−1}) + O_p((√n ||ℓ||²)^{−1}) + o_p(1) − L + O_p(1) ||β − β̄^n|| },

where ℓ_i^{−1} σ_i² ≥ L (by C2) and (||ℓ||^4 n)^{−½} δ_n ↓ 0 have been used. It is then obvious that, for ε > 0, there exists n_ε such that for n ≥ n_ε the RHS above is negative with probability at least 1 − ε.    □
Notes to Theorem 2.1.

1)  By the definition of β̄^n, n||ℓ||² ∫_0^T (β̄^n(s) − β_0(s))² ds = O(n||ℓ||^6). Since n||ℓ||^4 ||β̂^n − β̄^n||² = O_p(1) implies that n||ℓ||² ∫_0^T (β̂^n(s) − β̄^n(s))² ds = O_p(1), one gets

    n||ℓ||² ∫_0^T (β̂^n(s) − β_0(s))² ds = O_p(1) + O_p(n||ℓ||^6).

Therefore, in order to achieve consistency, n^{1/5} ≪ ||ℓ||^{−2} ≪ n^{1/2}. It would be of interest to allow the data to choose the "optimal" rate of growth for ||ℓ||^{−2}. One approach would be to use a minimum complexity criterion, as is done in density estimation (Barron and Cover, 1989). However, this will not be done here.

2)  It is natural to question whether the rate √(||ℓ||^4 n) from Theorem 2.1 can be improved. In general this will not be possible. To see this, let T = 1 and ℓ_i = 1/K for each i (so ||ℓ||² = 1/K). It turns out that √n σ_i (β̂_i^n − β̄_i^n), i = 1, ..., K, behave asymptotically like independent N(0,1) random variables; this indicates that the approximate distribution of Σ_{i=1}^K (√n σ_i (β̂_i − β̄_i))² is chi-squared on K degrees of freedom. So one expects that

    Σ_{i=1}^K ( √n σ_i (β̂_i − β̄_i) )² / K  →^P  1.

This can be proven rigorously using lemmas 2.1 and 3.1. Since σ_i² = O(1/K), this gives the rate √(n/K²), i.e., √(n||ℓ||^4). Other norms might allow for different rates. For example, using the above intuitive reasoning, it is expected that

    (Ln K)^{−½} max_{1≤i≤K} √n σ_i |β̂_i − β̄_i| = O_p(1).
3)  To understand why the choice of β̄^n given above eliminates the bias to a first order, consider the following. Maximizing ℒ_n(β) over Θ_K is equivalent to maximizing,

    (1/n) Σ_{i=1}^n ∫_0^T [ (β(s) − β_0(s)) X_s(i) − Ln( S_n^0(β,s) / S_n^0(β_0,s) ) ] dN_s(i)

for β ∈ Θ_K. This is "asymptotically" like maximizing (under suitable conditions) a "Kullback-Leibler" type information,

(2.4)    −∫_0^T ( β(s) − β_0(s) )² V(β_0,s) S^0(β_0,s) λ_0(s) ds.

But the maximizer in Θ_K of the RHS of (2.4) is given by β̄^n. Therefore, it is natural to expect that, for the maximum partial likelihood estimator β̂^n, the convergence of ∫_0^T (β̂^n(s) − β̄^n(s))² ds to 0 will be of a faster rate than for choices of β ∈ Θ_K other than β̄^n.

4)  Further consideration of (2.4) lends substance to the use of the L² norm in proving consistency. Usually, in the method of sieves, the Kullback-Leibler information (in this case, (2.4)) determines the norm in which the maximum likelihood estimator converges to β_0 (see Grenander (1981), Geman & Hwang (1982) and Karr (1987)). In the situation considered here, the L² norm approximates, to the first order, the Kullback-Leibler information.
2.6  The Independent and Identically Distributed Case

As was mentioned in section 1.3, Zucker and Karr (1989) consider a time-dependent coefficient in the context of survival analysis. That is, (N^n(i), X^n(i), Y^n(i)), i = 1, ..., n, are i.i.d. and the N^n(i) can each have at most one jump. In addition, they assume that the covariate X^n is bounded and the censoring mechanism is independent of the time of the jump given the covariate. Consistency of the histogram estimator of the regression coefficient is given in a slightly more general setting below. In particular, in the following corollary multiple jumps are allowed, and the censoring mechanism operating on an individual may depend arbitrarily on the individual's past.

Corollary 2.1

Consider n i.i.d. observations of (N, X, Y), where both X and Y are a.s. left continuous with right hand limits. Then conditions A and C of section 2.4 are satisfied if

    a)  β_0 is continuous on [0,T],

    b)  λ_0 is bounded away from zero and infinity on [0,T],

    c)  P[ Y_t = 1 ∀ t ∈ [0,T] ] > 0,

    d)  for each t ∈ [0,T] there exist at least two disjoint Borel sets, say A_t and B_t, such that P[X_t ∈ A_t] > 0 and P[X_t ∈ B_t] > 0,

and either,

    e1)  X is a.s. bounded,    or

    e2)  there exists an open interval, 𝒥, containing [ 2 inf_{t∈[0,T]} β_0(t), 2 sup_{t∈[0,T]} β_0(t) ], such that

        sup_{(b,t)∈𝒥×[0,T]} E[ e^{b X_t} X_t^4 Y_t ] < ∞.

Remark: Assumptions b, c and d ensure that there is sufficient activity at each point t in order to estimate β_0. Essentially, assumption d specifies that the covariate X cannot be a.s. constant at a point t.
PROOF: Note that under a, e1 implies e2. Assume a, b, c, d and e2, and define

    S^j(β_0,s) = E[ e^{β_0(s) X_s} Y_s X_s^j ]    for    j = 0,1,2,3,4.

Assumption e2 allows the application of R. Ranga Rao's (1963) strong law of large numbers in D[0,T] to get A1. As for A2, consider

    E[ n ∫_0^T ( n^{−1} Σ_{i=1}^n { e^{β_0(s) X_s(i)} Y_s(i) X_s^j(i) − E e^{β_0(s) X_s} Y_s X_s^j } )² ds ]
      ≤ ∫_0^T E[ e^{2β_0(s) X_s} Y_s X_s^{2j} ] ds < ∞,    j = 0,1,2    (by e2).

Choose γ > 0 so that 𝒥' = [ inf_{s∈[0,T]} β_0(s) − γ, sup_{s∈[0,T]} β_0(s) + γ ] ⊂ 𝒥. If inf_{(b,s)∈𝒥'×[0,T]} S^0(b,s) = 0, then there exists (b_ν, s_ν)_{ν≥1} ∈ 𝒥'×[0,T] for which lim_{ν→∞} S^0(b_ν, s_ν) = 0. Choose a convergent subsequence of (b_ν, s_ν)_{ν≥1}, say (b_μ, s_μ) → (b, s) as μ → ∞. Since 𝒥'×[0,T] is closed, (b, s) ∈ 𝒥'×[0,T]. Then, by the dominated convergence theorem,

    lim_{μ→∞} S^0(b_μ, s_μ) = E[ lim_{μ→∞} Y_{s_μ} exp{ b_μ X_{s_μ} } ] = 0.

This in turn implies that lim_{μ→∞} Y_{s_μ} exp{ b_μ X_{s_μ} } = 0 a.s.; but

    lim_{μ→∞} Y_{s_μ} exp{ b_μ X_{s_μ} } = e^{b X_s} Y_s    (or  e^{b X_{s+}} Y_{s+})    a.s.

By assumption c this leads to a contradiction; hence inf_{(b,s)∈𝒥'×[0,T]} S^0(b,s) > 0. This, in addition to b and e2, gives C1. To prove A3, choose γ as above; then, by Theorem III.1 of Appendix III in Andersen and Gill (1982) and the above arguments,

    sup_{s∈[0,T]}  sup_{b∈ℝ: |b−β_0(s)|<γ}  S_n^j(b,s) = O_p(1),    j = 1,2,3,4.

All that is left is to prove C2. If inf_{s∈[0,T]} V(β_0,s) = 0, then there exists {s_ν}_{ν≥1} ∈ [0,T] for which lim_ν V(β_0, s_ν) = 0. Choose a convergent subsequence of {s_ν}_{ν≥1}, say {s_μ}_{μ≥1} → s as μ → ∞. Since [0,T] is closed, s ∈ [0,T]. Writing Ẽ_s X = E[ e^{β_0(s)X_s} Y_s X_s ] / E[ e^{β_0(s)X_s} Y_s ], then, by dominated convergence,

    lim_{μ→∞} V(β_0, s_μ) = E[ lim_{μ→∞} e^{β_0(s_μ) X_{s_μ}} Y_{s_μ} ( X_{s_μ} − Ẽ_{s_μ} X )² ] / E[ e^{β_0(s) X_s} Y_s ] = 0.

This in turn implies that,

    e^{β_0(s) X_s} Y_s ( X_s − Ẽ_s X )² = 0  a.s.,    or    e^{β_0(s+) X_{s+}} Y_{s+} ( X_{s+} − Ẽ_{s+} X )² = 0  a.s.

By assumption c, either X_s = Ẽ_s X a.s. or X_{s+} = Ẽ_{s+} X a.s. This, however, contradicts d. C2 is proved.    □

Note to Corollary 2.1

1)  If lim_n n||ℓ||^6 < ∞, lim_n n||ℓ||^4 = ∞, lim_n ℓ_(K)/ℓ_(1) < ∞, and β_0 is Lipschitz continuous, then under the conditions of Corollary 2.1 the conclusion of Theorem 2.1 holds.
2.7  Lemmas 2.1 and 2.2

Lemma 2.1  Assume,

    a)  lim_n ℓ_(K)/ℓ_(1) < ∞,  and

    b)  A, C1, D1;

then,

    1)  (||ℓ||^4 n) Σ_{i=1}^K ( (n ℓ_i)^{−1} (∂/∂β_i) ℒ_n(β̄^n) )² = O_p(1),

    2)  max_{1≤i≤K} | (n ℓ_i)^{−1} (∂²/∂β_i²) ℒ_n(β̄^n) + ℓ_i^{−1} ∫_0^T I_i(s) V(β_0,s) S^0(β_0,s) λ_0(s) ds |
          = O_p((√n ||ℓ||²)^{−1}) + o_p(1),  and

    3)  max_{1≤i≤K}  sup_{β*∈Θ_K: ||β*−β̄^n||<.5γ}  | (n ℓ_i)^{−1} (∂³/∂β_i³) ℒ_n(β*) |
          = O_p((√n ||ℓ||²)^{−1}) + O_p(1).
PROOF: 1) Consider,

    Σ_{i=1}^K ( (n ℓ_i)^{−1} (∂/∂β_i) ℒ_n(β̄^n) )²

(2.4)    = Σ_{i=1}^K ( (n ℓ_i)^{−1} Σ_{j=1}^n ∫_0^T I_i(s) [ X_s(j) − E_n(β̄^n,s) ] dN_s(j) )²

      ≤ 2 Σ_{i=1}^K ( (n ℓ_i)^{−1} Σ_{j=1}^n ∫_0^T I_i(s) [ X_s(j) − E_n(β̄^n,s) ] dM_s(j) )²
        + 2 Σ_{i=1}^K ( ℓ_i^{−1} ∫_0^T I_i(s) [ E_n(β_0,s) − E_n(β̄^n,s) ] S_n^0(β_0,s) λ_0(s) ds )²,

where M_s(j) = N_s(j) − ∫_0^s e^{β_0(u) X_u(j)} Y_u(j) λ_0(u) du is a local martingale, j = 1, ..., n. Let,

    Z_t = Σ_{i=1}^K ( (n ℓ_i)^{−1} Σ_{j=1}^n ∫_0^t I_i(s) [ X_s(j) − E_n(β̄^n,s) ] dM_s(j) )².

The compensator of Z is

    C_t = Σ_{i=1}^K (n ℓ_i)^{−2} Σ_{j=1}^n ∫_0^t I_i(s) [ X_s(j) − E_n(β̄^n,s) ]² e^{β_0(s) X_s(j)} Y_s(j) λ_0(s) ds

        = ∫_0^t Σ_{i=1}^K I_i(s) ℓ_i^{−2} n^{−1} [ S_n^2(β_0,s) − 2 S_n^1(β_0,s) E_n(β̄^n,s) + E_n²(β̄^n,s) S_n^0(β_0,s) ] λ_0(s) ds.

To show that Z_t has the same limit in probability as its compensator C, it is sufficient, by Lenglart's inequality (Lenglart, 1977), to show that the quadratic variation of ||ℓ||^4 n (Z − C) goes to zero in probability. Denoting the endpoints of interval I_i by a_i and a_{i+1}, and defining

    M*(a_i, s) = 2 Σ_{j=1}^n n^{−1} ℓ_i^{−1} ∫_{a_i}^{s−} [ X_u(j) − E_n(β̄^n,u) ] dM_u(j),

the optional variation of ||ℓ||^4 n (Z − C) is (Kopp, 1984, pg. 148),

    [ ||ℓ||^4 n (Z − C) ]_t = Σ_{s≤t} ||ℓ||^8 n² ( Δ(Z − C)_s )²

      = ||ℓ||^8 n² Σ_{j=1}^n ∫_0^t Σ_{i=1}^K I_i(s) { M*(a_i,s)² [ X_s(j) − E_n(β̄^n,s) ]² n^{−2} ℓ_i^{−2}
          + [ X_s(j) − E_n(β̄^n,s) ]^4 n^{−4} ℓ_i^{−4}
          + 2 M*(a_i,s) [ X_s(j) − E_n(β̄^n,s) ]³ n^{−3} ℓ_i^{−3} } dN_s(j).
Then the compensator of [ ||ℓ||^4 n (Z − C) ] is given by,

    ⟨ ||ℓ||^4 n (Z − C) ⟩_t
      = ||ℓ||^8 n² ∫_0^t Σ_{i=1}^K I_i(s) { M*(a_i,s)² n^{−2} ℓ_i^{−2} (1/n) Σ_{j=1}^n ( X_s(j) − E_n(β̄^n,s) )² e^{β_0(s) X_s(j)} Y_s(j)
          + n^{−4} ℓ_i^{−4} (1/n) Σ_{j=1}^n ( X_s(j) − E_n(β̄^n,s) )^4 e^{β_0(s) X_s(j)} Y_s(j)
          + 2 M*(a_i,s) n^{−3} ℓ_i^{−3} (1/n) Σ_{j=1}^n ( X_s(j) − E_n(β̄^n,s) )³ e^{β_0(s) X_s(j)} Y_s(j) } λ_0(s) ds

      = ||ℓ||^4 n ∫_0^t Σ_{i=1}^K I_i(s) M*(a_i,s)² ds O_p(1) + n^{−1} O_p(1)
          + ||ℓ||² ∫_0^t Σ_{i=1}^K I_i(s) |M*(a_i,s)| ds O_p(1),

by A3, C1. Now,

    max_{1≤i≤K} sup_{s∈I_i} |M*(a_i,s)|
      ≤ max_{1≤i≤K} sup_{s∈I_i} |M*(0,s]| + max_{1≤i≤K} |M*(0,a_i]|
      ≤ 4 sup_{0≤s≤T} | Σ_{i=1}^K Σ_{j=1}^n n^{−1} ℓ_i^{−1} ∫_0^s I_i(u) [ X_u(j) − E_n(β̄^n,u) ] dM_u(j) |,

and, using Lenglart's inequality (1977), for B > 0, ε_0 > 0,

(2.5)    P( sup_{0≤t≤T} | Σ_{i=1}^K Σ_{j=1}^n n^{−1} ℓ_i^{−1} ∫_0^t I_i(s) [ X_s(j) − E_n(β̄^n,s) ] dM_s(j) |² ≥ ε_0 )
      ≤ B/ε_0 + P( ∫_0^T Σ_{i=1}^K I_i(s) ℓ_i^{−2} n^{−1} [ S_n^2(β_0,s) − 2 E_n(β̄^n,s) S_n^1(β_0,s) + E_n²(β̄^n,s) S_n^0(β_0,s) ] λ_0(s) ds ≥ B )
      ≤ ε_0

for B large and n large (use A3, C1). Therefore,

    ||ℓ||² ∫_0^T Σ_{i=1}^K I_i(s) |M*(a_i,s)| ds = O_p(n^{−½}).

Consider the process ∫_0^t Σ_{i=1}^K I_i(s) M*(a_i,s)² ds for t belonging to {a_0 = 0, a_1, ..., a_{K+1} = T} and the family {F_{a_i}}_{i=0,...,K+1}. On {F_{a_i}}_{i=0,...,K+1},

    ∫_0^t Σ_{i=1}^K I_i(s) M*(a_i,s)² ds
      − ∫_0^t Σ_{i=1}^K I_i(s) Σ_{j=1}^n 4 n^{−2} ℓ_i^{−2} ∫_{a_i}^s [ X_u(j) − E_n(β̄^n,u) ]² e^{β_0(u) X_u(j)} Y_u(j) λ_0(u) du ds

is a local martingale. Therefore, by Lenglart's inequality (1977), for B > 0, ε_0 > 0,

(2.6)    P( ∫_0^T Σ_{i=1}^K I_i(s) M*(a_i,s)² ds ≥ ε_0 )
      ≤ B/ε_0 + P( ∫_0^T Σ_{i=1}^K I_i(s) 4 ℓ_i^{−2} n^{−1} ∫_{a_i}^s [ S_n^2(β_0,u) − 2 E_n(β̄^n,u) S_n^1(β_0,u) + E_n²(β̄^n,u) S_n^0(β_0,u) ] λ_0(u) du ds ≥ B )
      ≤ ε_0

for B and n large (use A1, C1, and Lemma 2.2). Therefore, by (2.5) and (2.6),

    ⟨ ||ℓ||^4 n (Z − C) ⟩_T = ||ℓ||² o_p(1) + n^{−1} O_p(1) + n^{−½} O_p(1).

This, as mentioned earlier, implies that Z has the same limit in probability as C.
42
Since.
sup
O~ t~T
-K I. (5) 2.-2 n -1 V(fj. s)S0 ({j .S)X (5) ds I
11211 4 n ICt - f Ot ri=l 1
1
0
0
0
= 11211 4
_K -1
Y- 2.
i=l
1
0
P
(1)
Al and C1.
by
This concludes the proof for the first term on the RHS of (2.4).
Consider the second term on the RHS of (2.4).
_K
Y-
i=l
-1
T
0
-n
2
(2. f O 1.(s)[E (fj .5) - E (fj .s)]S (fj .s)X (s)ds) .
lIn
non 0
0
o
- V(fjo .5) S (fj0 .5»
•
sup
X0 (5) ds)
2
sup
fj(s)€IR
O~s~T
/fj(s)-l3o(s)I<'Y
Using a Taylor series for fixed s yields:
En (~.s) = En (130 .s) + (~n(s) - 130 (s»
Vn (130 .5)
where
II3(s)-13o (s)1 ~ I~(s) - 130 (s)/
Then. since
sup I~(s) - 13 (s)/ = 0(1).
O~s~T
0
-K
-1 T ; ; 1 l
0
2
r(2 i f O 1 i (s)[E (p .s) - E (13 .s)]S (13 .s)X (s)ds)
i=l
n
non 0
0
43
~ 211xll 2 + 211yll2
It turns out that. IIxll
2
=
° e!')
p n
6
and lIyll2 = 0(lIiIl )
term on the RHS of (2.4) is equal to
D1.
so that the second
° e!') + ° (lIiIl6 ).
p n
p
By A. Cl, and
one gets.
°
°
_K (IT 1. (s) IV (/3 . s)S (/3 . s) - V(/3 •s)S (/3 . s) Ids) 2 0(1)
IIxll 2 = YO 1
i=l
non 0
0
0
= ~ (i~
i=l
i
JI °
n
)2
p
(1)
1
° (-).
and
6
lIyll2 = ~ (i~)2 ° (1) = lIill ° (1).
i=l
P
P
=
p n
1
+
i~l lIb I i (s)V(/3o.s) dMs (-) I
+
i~l
lITo I.(s)V(/3
.s)(SO(/3 .s) - SO(/30 .s»A0 (s)dsl
1
Ion 0
~ sup
O~s~T
+
+
Iv (~.s)
n
- V(/3 .s)1 max i~l
0
i
Ib I.(s) dN (-)
1
I
max
i.-1 IT
I I (s)V(/3 .s) dM- (-)
sup
IsO(/3 .s) - SO(/3 .s)1 - max i-i
l~i~K
O~s~T
O i
l O
S
n
0
0
S
l~i~K
1
ITO I i (s)V(/3 .s)A (s)ds
0
0
44
So by lenuna 2.2 and C1,
3)
I(n i.)
AUP
max
13
l~i~K
€~
1
-1 0
3
* )1
--3 ~ (13
013.
n
=
1
1I(j*_~nll<.5"
max
l~i~K
<iN( e)
I
By assumption A3, C1, and I enuna 2, the above is,
o
Lemma 2.2.  Assume A1, A3, C1, and D1; then,

    1)  max_{1≤i≤K} | n^{−½} ∫_0^T I_i(s) dM_s(·) | ≤ 2 sup_{t∈[0,T]} | n^{−½} M_t(·) | = O_p(1),  and

    2)  sup_{0≤s≤T} | S_n^i(β̄^n,s) − S_n^i(β_0,s) | = O_p(ℓ_(K)),    i = 0,1,2.

PROOF: 1) Using the version of Rebolledo's central limit theorem presented in Andersen and Gill (1982), it is easily proved that, for Z_t^n = n^{−½} M_t(·), Z^n converges weakly to a Gaussian martingale with variance function ∫_0^t S^0(β_0,s) λ_0(s) ds. An application of the continuous mapping theorem (Theorem 5.1 in Billingsley, 1968b) suffices to prove 1.

2) Fix s; then, using a Taylor series about β_0(s) results in,

    S_n^i(β̄^n,s) − S_n^i(β_0,s) = ( β̄^n(s) − β_0(s) ) S_n^{i+1}(β̃,s),

where β̃(s) lies between β̄^n(s) and β_0(s). Therefore,

    sup_{0≤s≤T} | S_n^i(β̄^n,s) − S_n^i(β_0,s) | = sup_{0≤s≤T} | β_0(s) − β̄^n(s) | O_p(1)    (by A3)
      = O_p(ℓ_(K))    (by D1).    □
ASYMPTOTIC DISTRIBUTIONAL TIIEORY
3.1
Asymptotic Normality of the Sieve Estimator
In order to conduct inference about the regression coefficient
function. P . it is useful to consider some sort of weak convergence
o
result for
.....n
p.
However. in this case as in similiar situations where
the parameter of interest is a function (Karr. 1985; Leskow. 1988;
Ramlau-Hansen. 1983) normalized versions of ~(t) and ~(s) have
asymptotically independent normal distributions.
.....n
means that the limiting distribution of p
Intuitively. this
is "white noise."
This
complicates inference concerning the function P • as this excludes a
o
functional central limit theorem.
Karr (1985) circumvents this by
giving a supremum type statistic which has an asymptotic extreme value
distribution.
Another possibility is to consider an integrated version
of ~ as will be done below.
McKeague (1988) also considers an
integrated version and then proposes the use of a supremum type
statistic based on the integrated estimator for inference purposes.
might also consider various
T
.....
f O wn (x)(~(x)-P0 (x»
One
weighted integrals of ~. i.e.
dx as is done in Aalen (1978) and in Gill (1980).
This is done in chapter IV.
In the following. the existence of a sequence of estimators
(~
€
E\n )
such that
II~-jflll
=0
p
(1). as n
-+
00.
is assumed.
Recall that
47
the capital letters in the assumptions refer to the conditions stated in
section 2.4.
The following weak convergence result is in terms of the
8korohod topology on D[O,I] (see Billingsley, 1968b).
Theorem 3.1  Assume,

    a)  lim_n n||ℓ||^8 = 0    (Bias → 0),

    b)  lim_n n||ℓ||^4 = ∞    (Variance converges), and

    c)  A, B, C, D2, D4,  lim_n ℓ_(K)/ℓ_(1) < ∞;

then,

    √n ∫_0^t ( β̂^n(s) − β_0(s) ) ds  ⇒  G,

where G is a continuous Gaussian martingale with G_0 = 0 a.s., and

    ⟨G⟩_t = ∫_0^t ( V(β_0,s) S^0(β_0,s) λ_0(s) )^{−1} ds.
PROOF: Using assumptions D2 and D4, it is easily proved that

    sup_{t∈[0,T]} | √n ∫_0^t ( β̄^n(s) − β_0(s) ) ds | = O( n^{½} ||ℓ||^4 ) = o(1)    (by a).

To show that √n ∫_0^t (β̂^n(s) − β̄^n(s)) ds ⇒ G, consider the following Taylor series:

    0 = n^{−½} (∂/∂β_i) ℒ_n(β̂^n)
      = n^{−½} (∂/∂β_i) ℒ_n(β̄^n) + n^{−½} (∂²/∂β_i²) ℒ_n(β̄^n) ( β̂_i − β̄_i )
        + (1/2) n^{−½} (∂³/∂β_i³) ℒ_n(β*) ( β̂_i − β̄_i )²,

where ||β* − β̄^n|| ≤ ||β̂^n − β̄^n||. Define

    σ̂_i² = −n^{−1} (∂²/∂β_i²) ℒ_n(β̄^n) − (2n)^{−1} (∂³/∂β_i³) ℒ_n(β*) ( β̂_i − β̄_i ).

Lemma 3.1 implies that P[ min_{1≤i≤K} ℓ_i^{−1} σ̂_i² > L/2 ] → 1, so it is sufficient to consider √n ∫_0^t (β̂^n(s) − β̄^n(s)) ds on this set only. Therefore, solving for (β̂_i − β̄_i), multiplying by 1_{I_i}(s), and integrating from zero to t results in,

(3.1)    √n ∫_0^t ( β̂^n(s) − β̄^n(s) ) ds
      = Σ_{i=1}^K ( ∫_0^t I_i(s) ds ) [ (σ̂_i²)^{−1} − (σ_i²)^{−1} ] n^{−½} (∂/∂β_i) ℒ_n(β̄^n)
        + Σ_{i=1}^K ( ∫_0^t I_i(s) ds ) (σ_i²)^{−1} n^{−½} (∂/∂β_i) ℒ_n(β̄^n),

where σ_i² = ∫_0^T I_i(s) V(β_0,s) S^0(β_0,s) λ_0(s) ds, i = 1, ..., K. The first term on the RHS of (3.1) is o_p(1) in sup norm, by Lemma 2.1 (which controls the normalized score at β̄^n) together with Lemma 3.1 (which controls max_i σ_i² |(σ̂_i²)^{−1} − (σ_i²)^{−1}|). As for the second term on the RHS of (3.1), it differs only by an asymptotically negligible edge term (from the interval containing t) from,

    Z_t = (1/√n) Σ_{j=1}^n ∫_0^t [ Σ_{i=1}^K I_i(s) ℓ_i σ_i^{−2} ] ( X_s(j) − E_n(β̄^n,s) ) dN_s(j),    t ∈ [0,T].

Using McKeague's (1988) lemma 4.1, one gets that if Z ⇒ G, then the second term on the RHS of (3.1) converges weakly to G. Now,

    Z_t = (1/√n) Σ_{j=1}^n ∫_0^t [ Σ_{i=1}^K I_i(s) ℓ_i σ_i^{−2} ] ( X_s(j) − E_n(β̄^n,s) ) dM_s(j)
        + √n ∫_0^t [ Σ_{i=1}^K I_i(s) ℓ_i σ_i^{−2} ] ( E_n(β_0,s) − E_n(β̄^n,s) ) S_n^0(β_0,s) λ_0(s) ds.

By lemma 3.2, the second term of Z_t is o_p(1) in sup norm. As for the first term, the idea is to use the version of Rebolledo's central limit theorem in section 1.2. Call the first term of Z_t, Y_t. Since

    ⟨Y⟩_t = ∫_0^t [ Σ_{i=1}^K I_i(s) ℓ_i σ_i^{−2} ]² [ S_n^2(β_0,s) − 2 E_n(β̄^n,s) S_n^1(β_0,s) + E_n²(β̄^n,s) S_n^0(β_0,s) ] λ_0(s) ds

and

    max_{1≤i≤K} sup_{s∈I_i} | ℓ_i σ_i^{−2} − ( V(β_0,s) S^0(β_0,s) λ_0(s) )^{−1} | → 0

(by the continuity of V(β_0,s) S^0(β_0,s) λ_0(s) in s), one gets, using A1 and lemma 2.2, that

    ⟨Y⟩_t  →^P  ∫_0^t ( V(β_0,s) S^0(β_0,s) λ_0(s) )^{−1} ds.

A Lindeberg condition must be satisfied also; that is, show that

    ∫_0^T (1/n) Σ_{j=1}^n ( X_s(j) − E_n(β̄^n,s) )² e^{β_0(s) X_s(j)} Y_s(j) λ_0(s) [ Σ_{i=1}^K I_i(s) ℓ_i σ_i^{−2} ]²
        × I{ s : | X_s(j) − E_n(β̄^n,s) | > ε √n ( Σ_{i=1}^K I_i(s) ℓ_i σ_i^{−2} )^{−1} } ds

is o_p(1) for each ε > 0. Recall that min_i ℓ_i^{−1} σ_i² ≥ L, so the Lindeberg condition will be satisfied if,

(3.2)    ∫_0^T (1/n) Σ_{j=1}^n ( X_s(j) − E_n(β̄^n,s) )² e^{β_0(s) X_s(j)} Y_s(j) λ_0(s)
        × I{ s : | X_s(j) − E_n(β̄^n,s) | > ε L √n } ds = o_p(1)    ∀ ε > 0.

The LHS (left hand side) of (3.2) is bounded above by,

(3.3)    4 ∫_0^T (1/n) Σ_{j=1}^n X_s(j)² e^{β_0(s) X_s(j)} Y_s(j) λ_0(s) I{ s : |X_s(j)| > (εL/2) √n } ds + o_p(1)
      ≤ 4 ∫_0^T (1/n) Σ_{j=1}^n X_s(j)² e^{β_0(s) X_s(j)} Y_s(j) λ_0(s) I{ s : |X_s(j) Y_s(j)| > (εL/2) √n } ds + o_p(1),

by A1, C1, and lemma 2.2. So the LHS of (3.2) is o_p(1) (by B and A1).    □
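The contrast between the "white noise" behavior of β̂^n itself and the tractable behavior of its integral can be seen in a small Monte Carlo sketch. The noise model below is not the estimator; it merely mimics the approximation in Note 2 of Theorem 2.1 (√n σ_i(β̂_i − β̄_i) roughly independent N(0,1), with σ_i² of order 1/K). It shows why integration restores a functional limit: the integral averages the K independent errors, and its supremum error shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(3)

def beta0(s):
    return np.sin(2.0 * np.pi * s)

def sup_errors(n, K):
    """Pointwise and integrated sup-norm errors of a synthetic sieve
    estimate: interval means of beta0 plus independent per-interval noise
    of the size suggested by Note 2 of Theorem 2.1 (sigma_i^2 ~ 1/K)."""
    edges = np.linspace(0.0, 1.0, K + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    bhat = beta0(mid) + rng.normal(size=K) * np.sqrt(K / n)
    s = np.linspace(0.0, 1.0, 2001)
    idx = np.minimum((s * K).astype(int), K - 1)
    pointwise = np.max(np.abs(bhat[idx] - beta0(s)))
    integrated = np.max(np.abs(np.cumsum(bhat - beta0(mid)) / K))
    return pointwise, integrated

p_small, i_small = sup_errors(n=400, K=20)
p_big, i_big = sup_errors(n=40000, K=200)
assert i_big < i_small      # integrated sup error is of order 1/sqrt(n)
```

The integrated error at t is a partial sum of K independent errors divided by K, so its standard deviation is of order 1/√n regardless of how fast K grows, which is the heuristic content of Theorem 3.1.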
3.2  A Consistent Estimator of the Asymptotic Variance Process

Theorem 3.2  Assume,

    a)  lim_n n||ℓ||^4 = ∞,  lim_n ||ℓ||² = 0,  and

    b)  A1, A3, C, D1, D3,  lim_n ℓ_(K)/ℓ_(1) < ∞;

then, for

    σ̂_n²(s) = −Σ_{j=1}^K I_j(s) (ℓ_j n)^{−1} (∂²/∂β_j²) ℒ_n(β̂^n)    and    σ²(s) = V(β_0,s) S^0(β_0,s) λ_0(s),

    sup_{s∈[0,T]} | σ̂_n²(s) − σ²(s) | = o_p(1),    and,
    sup_{0≤t≤T} | ∫_0^t [σ̂_n²(s)]^{−1} ds − ∫_0^t [σ²(s)]^{−1} ds | = o_p(1).

PROOF:

    sup_{s∈[0,T]} | σ̂_n²(s) − σ²(s) |

(3.4)    ≤ max_{1≤j≤K} | (ℓ_j n)^{−1} (∂²/∂β_j²) ℒ_n(β̂^n) − (ℓ_j n)^{−1} (∂²/∂β_j²) ℒ_n(β̄^n) |
        + max_{1≤j≤K} | (ℓ_j n)^{−1} (∂²/∂β_j²) ℒ_n(β̄^n) + ℓ_j^{−1} σ_j² |
        + max_{1≤j≤K} sup_{s∈I_j} | ℓ_j^{−1} σ_j² − σ²(s) |,

where σ_j² = ∫_0^T I_j(s) σ²(s) ds. The third term on the RHS of (3.4) is o(1) by the continuity of σ²(s), and the second term on the RHS of (3.4) is o_p(1) by lemma 2.1. Consider the first term on the RHS of (3.4):

    max_{1≤j≤K} | (ℓ_j n)^{−1} (∂²/∂β_j²) ℒ_n(β̂^n) − (ℓ_j n)^{−1} (∂²/∂β_j²) ℒ_n(β̄^n) |
      ≤ max_{1≤j≤K} (ℓ_j n)^{−1} ∫_0^T I_j(s) | V_n(β̂_j,s) − V_n(β̄_j^n,s) | dN_s(·)
      = max_{1≤j≤K} (ℓ_j n)^{−1} | ∫_0^T I_j(s) (∂/∂β) V_n(β*_j,s) dN_s(·) |  | β̂_j − β̄_j^n |,

where |β*_j − β̄_j^n| ≤ |β̂_j − β̄_j^n| for each j. By A3 and Theorem 2.1, this is bounded by

    max_{1≤j≤K} (ℓ_j n)^{−1} ∫_0^T I_j(s) dN_s(·)  ·  max_{1≤j≤K} | β̂_j − β̄_j^n |  ·  O_p(1).

[Note that the assumptions lim_n n||ℓ||^{10} = 0 and A2 are not necessary in Theorem 2.1 if one only needs max_{1≤j≤K} |β̂_j − β̄_j^n| = o_p(1).]  Since

    max_{1≤j≤K} (ℓ_j n)^{−1} ∫_0^T I_j(s) dN_s(·)
      ≤ max_{1≤j≤K} (ℓ_j n)^{−1} | ∫_0^T I_j(s) dM_s(·) | + max_{1≤j≤K} ℓ_j^{−1} ∫_0^T I_j(s) S_n^0(β_0,s) λ_0(s) ds = O_p(1)

by A1 and lemma 2.2, the first term on the RHS of (3.4) is o_p(1) by Theorem 2.1. Therefore,

    sup_{s∈[0,T]} | σ̂_n²(s) − σ²(s) | = o_p(1).

This in turn implies,

    sup_{0≤t≤T} | ∫_0^t [σ̂_n²(s)]^{−1} ds − ∫_0^t [σ²(s)]^{−1} ds | = o_p(1)    (by C2).    □
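Theorem 3.2's plug-in estimator can be exercised in a toy model where σ²(s) is available in closed form. The sketch below is entirely illustrative (β_0 = 0, λ_0 = 1, X Bernoulli(1/2), no censoring, so that S^0(β_0,s) = e^{−s}, V(β_0,s) = 1/4, and σ²(s) = e^{−s}/4): it fits β̂_j by Newton's method on each interval and forms σ̂_n²(s) as minus the scaled second derivative of the log partial likelihood.

```python
import numpy as np

rng = np.random.default_rng(4)
n, K, T = 5000, 5, 1.0
x = rng.integers(0, 2, size=n).astype(float)
times = rng.exponential(1.0, size=n)        # beta_0 = 0: unit hazard for all
edges = np.linspace(0.0, T, K + 1)

def risk_moments(b, t):
    """E_n(b,t) and V_n(b,t) computed from the sample risk set {j: Y_j >= t}."""
    w = np.exp(b * x) * (times >= t)
    e = np.sum(w * x) / np.sum(w)
    return e, np.sum(w * x * x) / np.sum(w) - e * e

sigma2_hat = np.zeros(K)
for j in range(K):
    fails = np.where((times > edges[j]) & (times <= edges[j + 1]))[0]
    if fails.size == 0:
        continue
    b = 0.0
    for _ in range(8):                      # Newton for beta_hat_j
        score = info = 0.0
        for i in fails:
            e, v = risk_moments(b, times[i])
            score += x[i] - e
            info += v
        b += score / info
    lj = edges[j + 1] - edges[j]
    # minus the (l_j * n)-scaled second derivative of the log partial likelihood
    sigma2_hat[j] = sum(risk_moments(b, times[i])[1] for i in fails) / (lj * n)

mid = 0.5 * (edges[:-1] + edges[1:])
assert np.max(np.abs(sigma2_hat - np.exp(-mid) / 4.0)) < 0.05
```

The estimator tracks e^{−s}/4 interval by interval; by Theorem 3.2, the same plug-in construction supplies the variance process needed to use Theorem 3.1's Gaussian limit for inference.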
3.3
A Partial Likelihood Ratio Test of a Finite Dimensional Null
Hypothesis Versus an Infinite Dimensional Alternate
In the following, the null hypothesis $H_0$: $\beta_0$ is a constant function, and the alternate hypothesis $H_1$: $\beta_0$ is nonconstant, are contrasted. Just as in the problem of estimating an infinite dimensional parameter, the partial likelihood under the alternate is maximized at $\infty$. This problem can be remedied, as before, by working within the context of a sieve; that is, for a sample of size $n$, $H_0$ is contrasted with $H_1^n$: $\beta_0\in\Theta_n$. In classical theory, the distribution of the likelihood ratio test of a one dimensional null hypothesis versus a $K$ dimensional alternate will be approximately a chi-squared on $K-1$ degrees of freedom. In this case the dimensionality of the alternate, $K_n$, increases with the sample size, but intuitively one still expects that the partial likelihood ratio test (PLRT) will have an approximate chi-squared distribution on $K_n-1$ degrees of freedom. Since the standardized chi-squared random variable (i.e., subtract $K$ and divide by the square root of two times $K$) has an approximate N(0,1) distribution for large $K$, it is natural to consider this standardization of the PLRT, as is done in Theorem 3.3 below.
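The normal approximation invoked here is easy to check by simulation (a generic Monte Carlo sketch, not tied to the processes of this chapter):

```python
import numpy as np

# Check that (chi2_K - K) / sqrt(2K) is approximately N(0, 1) for large K,
# the standardization applied to the PLRT in Theorem 3.3.
rng = np.random.default_rng(0)
K, reps = 400, 20000
chi2 = rng.chisquare(df=K, size=reps)
z = (chi2 - K) / np.sqrt(2 * K)
print(z.mean(), z.std())  # close to 0 and 1 respectively
```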
For the remainder of this chapter, $\hat\beta(s) = \sum_{i=1}^K I_i(s)\hat\beta_i$ will be written as the vector $\hat\beta(H_1^T) = (\hat\beta_1,\dots,\hat\beta_K)$ in order to make clear that this is the maximum partial likelihood estimator under $H_1^T$. Similarly, the vector $\hat\beta(H_1^t)$ represents the maximum partial likelihood estimator under $H_1$ on the interval $[0,t]$, and the scalar $\hat\beta(H_0^t)$ represents the maximum partial likelihood estimator under $H_0$ on the interval $[0,t]$. Likewise, a subscript $t$ attached to $\mathcal{L}_n(\beta)$, that is $\mathcal{L}_n(\beta)_t$, denotes the partial likelihood on the interval $[0,t]$; therefore $\mathcal{L}_n(\beta)_T$ represents the partial likelihood on the entire observation interval. Define $a_i = \sum_{j=1}^i \ell_j$ and $i(t)$ to be that $i$ for which $a_{i-1} < t \le a_i$. Recall that the $\ell_i$'s, and subsequently the $a_i$'s and $i(t)$, change with $n$, and that
$$\sigma_i^2 = \int_0^T I_i(s)\,V(\beta_0,s)\,S^0(\beta_0,s)\,\lambda_0(s)\,ds.$$
Theorem 3.3. Assume,
a) $\beta_0$ is a constant function,
b) $\lim_n n\|\ell\|^4 = \infty$, $\lim_n \|\ell\|^2 = 0$, and
c) A, B, C, and $\lim_n \ell_{(K)}/\ell_{(1)} = 1$.
Defining, for $t\in[0,T]$,
$$Z_t^n = (2KT^{-1})^{-1/2}\big[2\{\mathcal{L}_n(\hat\beta(H_1^t))_t - \mathcal{L}_n(\hat\beta(H_0^t))_t\} - i(t)\big],$$
then $Z_t^n = X_t^n + Y_t^n$, where
$$X_t^n = (2KT^{-1})^{-1/2}\Big[\sum_{j=1}^{i(t)}\Big(n^{-1/2}\frac{\partial}{\partial\beta_j}\mathcal{L}_n(\beta_0)_T\Big)^2\sigma_j^{-2} - i(t)\Big],$$
$X^n$ converges weakly in D[0,T] (in the supremum topology) to a standard Wiener process, and
$$\max_{t\in[a_1,\dots,a_K]}|Y_t^n| = o_p(1).$$
Remark. The maximum of the process $Y^n$ is only considered over $[a_1,\dots,a_K]$ because, to show that $\hat\beta(H_0^t)$ is consistent for small $t$, the square root of the information matrix multiplied by the score function must be close to zero for small $t$. This does not occur here. This appears to be related to the fact that $\limsup_{t\to 0} t^{-1/2}|W_t| = \infty$ a.s., where $W$ is the standard Wiener process. For a similar situation see Csaki (1975).

PROOF:
The following proof consists of four steps:

1) $\displaystyle\max_{t\in[a_1,\dots,a_K]} \big|\mathcal{L}_n(\hat\beta(H_0^t))_t - \mathcal{L}_n(\beta_0)_t\big| = O_p(\operatorname{Ln} K)$,

2) $\displaystyle\max_{t\in[a_1,\dots,a_K]} K^{-1/2}\Big|2\{\mathcal{L}_n(\hat\beta(H_1^t))_t - \mathcal{L}_n(\beta_0)_t\} - \sum_{j=1}^{i(t)} \Big(n^{-1/2}\frac{\partial}{\partial\beta_j}\mathcal{L}_n(\beta_0)_t\Big)^2\sigma_j^{-2}\Big| = O_p\big((n\|\ell\|^4)^{-1/2}\big)$.

Steps 1) and 2) imply that, for $Y^n = X^n - Z^n$, $\max_{t\in[a_1,\dots,a_K]}|Y_t^n| = o_p(1)$. Now,
$$X_t^n = (2KT^{-1})^{-1/2}\Big[\sum_{j=1}^{K} n^{-1}\sum_{i=1}^{n}\int_0^t I_j(s)\,\sigma_j^{-2}\big(U^2(s,i) + 2M^*_{s-}(j)U(s,i)\big)\,dN_s(i) - i(t)\Big],$$
where $U(s,i) = X_s(i) - E_n(\beta_0,s)$ and $M^*_s(j) = \sum_{i=1}^n\int_0^s I_j(u)\,U(u,i)\,dM_u(i)$, for $i = 1,\dots,n$, $j = 1,\dots,K$ and $s\in[0,T]$.

In the third step, the compensator of the first term in $X^n$ is shown to converge to the second term of $X^n$:

3) $\displaystyle\sup_{t\in[0,T]} K^{-1/2}\Big|\sum_{j=1}^K\int_0^t I_j(s)\,\sigma_j^{-2}\,V_n(\beta_0,s)S_n^0(\beta_0,s)\lambda_0(s)\,ds - i(t)\Big| = o_p(1)$.

This implies that $X^n$ has the same weak limit as

(3.5)
$$X_t' = (2KT^{-1})^{-1/2}\sum_{j=1}^{K} n^{-1}\sum_{i=1}^{n}\int_0^t I_j(s)\,\sigma_j^{-2}\big(U^2(s,i) + 2M^*_{s-}(j)U(s,i)\big)\,dM_s(i).$$

The last step will then be to show weak convergence in D[0,T]. This will be done using the Skorohod metric, but since the standard Wiener process is continuous, weak convergence with respect to the Skorohod metric will imply weak convergence with respect to the supremum metric (Billingsley, 1968b, pg. 151).

4) (3.5) converges weakly to a standard Wiener process on [0,T].

For the duration of this proof denote $\sigma_n^2(s) = V_n(\beta_0,s)S_n^0(\beta_0,s)\lambda_0(s)$ and $\sigma^2(s) = V(\beta_0,s)S^0(\beta_0,s)\lambda_0(s)$ for $s\in[0,T]$.
To prove step 1), recall that under $H_0$,
$$\mathcal{L}_n(\beta)_t = \sum_{i=1}^n\int_0^t \big[\beta X_s(i) - \operatorname{Ln} S_n^0(\beta,s)\big]\,dN_s(i),$$
where $\beta$ is a scalar. For each $t\in[a_1,\dots,a_K]$, a Taylor series about $\hat\beta(H_0^t)$ results in
$$2\{\mathcal{L}_n(\hat\beta(H_0^t))_t - \mathcal{L}_n(\beta_0)_t\} = \Big(n^{-1/2}\frac{\partial}{\partial\beta}\mathcal{L}_n(\beta_0)_t\Big)^2\Big(-n^{-1}\frac{\partial^2}{\partial\beta^2}\mathcal{L}_n(\beta_t^{**})_t\Big)^{-1},$$
where $|\beta_t^{**} - \beta_0| \le |\hat\beta(H_0^t) - \beta_0|$.

In lemma 3.4 it is shown that, for any $\{\beta_t^*\}_{t\in[a_1,\dots,a_K]}$ for which $|\beta_t^*-\beta_0|\le|\hat\beta(H_0^t)-\beta_0|$ for $t\in[a_1,\dots,a_K]$, one has that
$$\max_{t\in[a_1,\dots,a_K]}\Big|(tn)^{-1}\frac{\partial^2}{\partial\beta^2}\mathcal{L}_n(\beta_t^*)_t\Big| = O_p(1)\qquad\text{(by C1)}$$
and that
$$P\Big[\min_{t\in[a_1,\dots,a_K]}\Big(-(tn)^{-1}\frac{\partial^2}{\partial\beta^2}\mathcal{L}_n(\beta_t^*)_t\Big) > 0\Big]\to 1\qquad\text{(by C2)}.$$
Therefore
$$\sup_{t\in[\ell_1,T]}\Big[-(tn)^{-1}\frac{\partial^2}{\partial\beta^2}\mathcal{L}_n(\beta_t^{**})_t\Big]^{-1} = O_p(1)$$
(recall $\ell_1 = a_1$).

Let $\epsilon > 0$ and for each $n$ define $\tau_1^n$ by
$$\tau_1^n = \inf\Big\{t\in[\ell_1,T] : \frac{\int_0^t\sigma_n^2(s)\,ds}{\int_0^t\sigma^2(s)\,ds} > 1+\epsilon\Big\},\qquad \tau_1^n = T+1\ \text{if}\ \{\dots\} = \phi.$$
Note that $\tau_1^n$ is a stopping time and that

(3.6)
$$P[\tau_1^n \le T] \le P\Big[\exists\, t\in[\ell_1,T] : \int_0^t \big(\sigma_n^2(s)-\sigma^2(s)\big)\,ds > \epsilon\int_0^t\sigma^2(s)\,ds\Big] \le P\Big[\sup_{t\in[\ell_1,T]}\frac{\int_0^t|\sigma_n^2(s)-\sigma^2(s)|\,ds}{\int_0^t\sigma^2(s)\,ds} > \epsilon\Big] = o(1)$$
as $n\to\infty$, by A2.

Note that $n^{-1/2}\frac{\partial}{\partial\beta}\mathcal{L}_n(\beta_0)_{t\wedge\tau_1^n}$ is a square integrable martingale. This implies that $\big[n^{-1/2}\frac{\partial}{\partial\beta}\mathcal{L}_n(\beta_0)_{t\wedge\tau_1^n}\big]^2$, for $t\in[0,T]$, is a submartingale.

To finish the proof of 1), consider $B > 0$ and
$$P\Big[(B\operatorname{Ln}(K))^{-1}\sup_{t\in[\ell_1,T]} t^{-1}\Big(n^{-1/2}\frac{\partial}{\partial\beta}\mathcal{L}_n(\beta_0)_t\Big)^2 > 1\Big]$$
$$\le P\Big[(B\operatorname{Ln}(K))^{-1}\sup_{t\in[\ell_1,\,\tau_1^n\wedge T]} t^{-1}\Big(n^{-1/2}\frac{\partial}{\partial\beta}\mathcal{L}_n(\beta_0)_t\Big)^2 > 1\Big] + P\Big[\sup_{t\in[\tau_1^n\wedge T,\,T]}\Big(n^{-1/2}\frac{\partial}{\partial\beta}\mathcal{L}_n(\beta_0)_t\Big)^2 > (\ell_1\wedge T)\,B\operatorname{Ln}(K)\Big].$$
Since
$$E\Big(n^{-1/2}\frac{\partial}{\partial\beta}\mathcal{L}_n(\beta_0)_{(t+\ell_1)\wedge\tau_1^n}\Big)^2 = E\int_0^{(t+\ell_1)\wedge\tau_1^n}\sigma_n^2(s)\,ds,$$
the Birnbaum-Marshall inequality (1961) implies that the first term on the RHS above is bounded by
$$\frac{E\int_0^{T\wedge\tau_1^n}\sigma_n^2(s)\,ds}{BT\operatorname{Ln}(K)} + (B\operatorname{Ln}(K))^{-1}\int_{\ell_1}^T t^{-2}\,E\int_0^{t\wedge\tau_1^n}\sigma_n^2(s)\,ds\,dt
\le \frac{\int_0^T\sigma^2(s)\,ds\,(1+\epsilon)}{BT\operatorname{Ln}(K)} + (B\operatorname{Ln}(K))^{-1}\int_{\ell_1}^T t^{-2}\int_0^t\sigma^2(s)\,ds\,dt\,(1+\epsilon) + P[T > \tau_1^n].$$
An application of Lenglart's inequality (section 1.2) and A1, C1 suffice to show that the remaining term above goes to zero as $n\to\infty$. Therefore, by (3.6) and the above,
$$P\Big[(B\operatorname{Ln}(K))^{-1}\sup_{t\in[\ell_1,T]} t^{-1}\Big(n^{-1/2}\frac{\partial}{\partial\beta}\mathcal{L}_n(\beta_0)_t\Big)^2 > 1\Big]
\le (B\operatorname{Ln}(K))^{-1}\int_0^T\sigma^2(s)\,ds\,[1+\epsilon] + (B\operatorname{Ln}(K))^{-1}(\operatorname{Ln} T - \operatorname{Ln}\ell_1)\,O(1) + o(1).$$
That is, the above can be made arbitrarily small for $B$ large and $n$ large. Step 1 is proved.
Under $H_1^n$, the parameter by which $\mathcal{L}_n$ is differentiated is a vector; i.e., $\mathcal{L}_n(\beta)_t$ can be written as
$$\mathcal{L}_n(\beta)_t = \sum_{j=1}^K\sum_{i=1}^n\int_0^t I_j(s)\big(\beta_j X_s(i) - \operatorname{Ln} S_n^0(\beta_j,s)\big)\,dN_s(i);$$
therefore $\hat\beta(H_1^t)$, for $t = a_v$, $v\in\{1,\dots,K\}$, is a vector of $v$ components $(\hat\beta_1(H_1),\dots,\hat\beta_v(H_1))$. In addition, since $\hat\beta_v(H_1^t)$ is the solution to $0 = \sum_{i=1}^n\int_0^t I_v(s)\big[X_s(i) - E_n(\beta_v,s)\big]\,dN_s(i)$, one gets that $\hat\beta_v(H_1^t)$ is the same for all $t \ge a_v$ and can be written as $\hat\beta_v(H_1^T)$.

For $t\in[a_1,\dots,a_K]$, a Taylor series about $\hat\beta(H_1^t)$, as in step 1, results in
$$2\{\mathcal{L}_n(\hat\beta(H_1^t))_t - \mathcal{L}_n(\beta_0)_t\} = \sum_{j=1}^{i(t)}\Big(n^{-1/2}\frac{\partial}{\partial\beta_j}\mathcal{L}_n(\beta_0)_T\Big)^2\sigma_j^{-2}\,A_n(j),
\qquad A_n(j) = \ell_j^{-1}\sigma_j^2\Big(-\ell_j^{-1} n^{-1}\frac{\partial^2}{\partial\beta_j^2}\mathcal{L}_n(\beta_j^*)_T\Big)^{-1}.$$
Therefore, on $\bigcap_{j=1}^K\{-\ell_j^{-1} n^{-1}\frac{\partial^2}{\partial\beta_j^2}\mathcal{L}_n(\beta_j^*)_T > 0\}$,
$$2K^{-1/2}\{\mathcal{L}_n(\hat\beta(H_1^t))_t - \mathcal{L}_n(\beta_0)_t\}
= K^{-1/2}\sum_{j=1}^{i(t)}\Big[\Big(n^{-1/2}\frac{\partial}{\partial\beta_j}\mathcal{L}_n(\beta_0)_T\Big)^2\sigma_j^{-2} - 1\Big]\big[A_n(j)-1\big]$$
$$+\ K^{-1/2}\sum_{j=1}^{i(t)}\big[A_n(j)-1\big] + K^{-1/2}\sum_{j=1}^{i(t)}\Big(n^{-1/2}\frac{\partial}{\partial\beta_j}\mathcal{L}_n(\beta_0)_T\Big)^2\sigma_j^{-2}.$$
Since lemma 3.4 implies $P\big[\bigcap_{j=1}^K\{-\ell_j^{-1}n^{-1}\frac{\partial^2}{\partial\beta_j^2}\mathcal{L}_n(\beta_j^*)_T > 0\}\big]\to 1$, it is sufficient, by the Cauchy-Schwarz inequality, to prove
$$\sum_{j=1}^{K}\big[A_n(j)-1\big]^2 = o_p(1)\qquad\text{and}\qquad K^{-1}\sum_{j=1}^{K}\Big[\Big(n^{-1/2}\frac{\partial}{\partial\beta_j}\mathcal{L}_n(\beta_0)_T\Big)^2\sigma_j^{-2} - 1\Big]^2 = o_p(1).$$
This will then conclude the proof of step 2.
First consider
$$\sum_{j=1}^K\big[A_n(j)-1\big]^2 = \sum_{j=1}^K\Big[\ell_j^{-1}\sigma_j^2 + \ell_j^{-1} n^{-1}\frac{\partial^2}{\partial\beta_j^2}\mathcal{L}_n(\beta_j^*)_T\Big]^2 O_p(1)$$
by lemma 3.4. Using a Taylor series, the term in brackets on the RHS above becomes
$$\sum_{j=1}^K\Big[\ell_j^{-1}\sigma_j^2 + \ell_j^{-1}n^{-1}\frac{\partial^2}{\partial\beta_j^2}\mathcal{L}_n(\beta_0)_T + \ell_j^{-1}n^{-1}\frac{\partial^3}{\partial\beta_j^3}\mathcal{L}_n(\tilde\beta_j)_T(\beta_j^*-\beta_0)\Big]^2$$
$$\le 2\sum_{j=1}^K\Big[\ell_j^{-1}\sigma_j^2 + \ell_j^{-1}n^{-1}\frac{\partial^2}{\partial\beta_j^2}\mathcal{L}_n(\beta_0)_T\Big]^2 + 2\sum_{j=1}^K\big(\hat\beta_j(H_1^T)-\beta_0\big)^2\Big[\ell_j^{-1}n^{-1}\frac{\partial^3}{\partial\beta_j^3}\mathcal{L}_n(\tilde\beta_j)_T\Big]^2,$$
where $|\tilde\beta_j - \beta_0| \le |\beta_j^* - \beta_0| \vee |\hat\beta_j(H_1^T) - \beta_0|$. Therefore, by arguments used in the proof of lemma 3.4,
$$\sum_{j=1}^K\big[A_n(j)-1\big]^2 \le \sum_{j=1}^K\big[\hat\beta_j(H_1^T)-\beta_0\big]^2\,O_p(1)$$

(3.7)
$$+\ \sum_{j=1}^K\Big[\ell_j^{-1}n^{-1}\sum_{i=1}^n\int_0^T I_j(s)\,V_n(\beta_0,s)\,dM_s(i)\Big]^2 O_p(1)
+ \sum_{j=1}^K\Big[\ell_j^{-1}\int_0^T I_j(s)\big[V_n(\beta_0,s)S_n^0(\beta_0,s) - V(\beta_0,s)S^0(\beta_0,s)\big]\lambda_0(s)\,ds\Big]^2 O_p(1)$$
(by lemma 3.3). Consider
$$Z_t' = \sum_{j=1}^K \ell_j^{-2}\Big[n^{-1}\sum_{i=1}^n\int_0^t I_j(s)\,V_n(\beta_0,s)\,dM_s(i)\Big]^2,\qquad t\in[0,T].$$
The compensator of $Z'$, evaluated at $t = T$, is
$$\sum_{j=1}^K \ell_j^{-2}\, n^{-1}\int_0^T I_j(s)\,V_n^2(\beta_0,s)\,S_n^0(\beta_0,s)\,\lambda_0(s)\,ds.$$
Therefore $\sup_{t\in[0,T]}|Z_t'| \to_P 0$ by Lenglart's inequality. From the above and (3.7), one gets that $\sum_{j=1}^K[A_n(j)-1]^2 = o_p(1)$.

In order to complete the proof of step 2), consider
$$K^{-1}\sum_{j=1}^K\Big[\Big(n^{-1/2}\frac{\partial}{\partial\beta_j}\mathcal{L}_n(\beta_0)_T\Big)^2\sigma_j^{-2} - 1\Big]^2$$
$$\le 2K^{-1}\sum_{j=1}^K\sigma_j^{-4}\Big[n^{-1}\sum_{i=1}^n\int_0^T I_j(s)\big[U^2(s,i) + 2M^*_{s-}(j)U(s,i)\big]\,dM_s(i)\Big]^2
+ 2K^{-1}\sum_{j=1}^K\sigma_j^{-4}\Big[\int_0^T I_j(s)\big[\sigma_n^2(s)-\sigma^2(s)\big]\,ds\Big]^2.$$
By A2 and C1, the second term above is $O_p((\|\ell\|^2 n)^{-1})$. Now let $Z'$ be defined by
$$Z_t' = K^{-1}\sum_{j=1}^K\sigma_j^{-4}\Big[n^{-1}\sum_{i=1}^n\int_0^t I_j(s)\big[U^2(s,i) + 2M^*_{s-}(j)U(s,i)\big]\,dM_s(i)\Big]^2.$$
The compensator of $Z'$, evaluated at $t = T$, is
$$K^{-1}\sum_{j=1}^K\sigma_j^{-4}\,n^{-1}\int_0^T I_j(s)\big[f_4(s) + 4M^*_{s-}(j)f_3(s) + 4M^*_{s-}(j)^2\,V_n(\beta_0,s)S_n^0(\beta_0,s)\big]\lambda_0(s)\,ds,$$
where $f_k(s) = n^{-1}\sum_{i=1}^n U^k(s,i)\,e^{\beta_0(s)X_s(i)}Y_s(i)$, $k = 3,4$. The proof that the above compensator is $o_p(1)$ is given later, starting at equation (3.8). Using Lenglart's inequality, this in turn implies that $\sup_{0\le t\le T}|Z_t'| = o_p(1)$. Therefore
$$K^{-1}\sum_{j=1}^K\Big[\Big(n^{-1/2}\frac{\partial}{\partial\beta_j}\mathcal{L}_n(\beta_0)_T\Big)^2\sigma_j^{-2} - 1\Big]^2 = o_p(1)$$
and step 2) is proved.
To prove step 3), consider
$$\sup_{t\in[0,T]} K^{-1/2}\Big|\sum_{j=1}^K\int_0^t I_j(s)\,\sigma_j^{-2}\,\sigma_n^2(s)\,ds - i(t)\Big|$$
$$\le K^{-1/2}\sum_{j=1}^K\sigma_j^{-2}\int_0^T I_j(s)\,\big|\sigma_n^2(s)-\sigma^2(s)\big|\,ds + K^{-1/2}\sup_{t\in[0,T]}\Big|\sum_{j=1}^K\sigma_j^{-2}\int_0^t I_j(s)\,\sigma^2(s)\,ds - (i(t)-1)\Big| + K^{-1/2}$$
$$\le \max_{1\le j\le K}\sigma_j^{-2}\ell_j\; K^{1/2}\,n^{-1/2}\,O_p(1) + K^{-1/2}\,O(1) = o_p(1)$$
(the first term by A2 and C1, the rest by b). Step 3) is proved.

Define $X'$ by (3.5) for $t\in[0,T]$. To show that $X'$ converges weakly in D[0,T] to a Wiener process, use the version of Rebolledo's central limit theorem in section 1.2. The verification of the conditions of this central limit theorem completes this proof. Note first that

(3.8)
$$\langle X'\rangle_t = T(2K)^{-1}\sum_{j=1}^K n^{-2}\sum_{i=1}^n\int_0^t I_j(s)\,\sigma_j^{-4}\big(U^2(s,i) + 2M^*_{s-}(j)U(s,i)\big)^2\, e^{\beta_0(s)X_s(i)}Y_s(i)\lambda_0(s)\,ds$$
$$= T(2Kn)^{-1}\sum_{j=1}^K\sigma_j^{-4}\int_0^t I_j(s)\big[f_4(s) + 4f_3(s)M^*_{s-}(j) + 4M^*_{s-}(j)^2\,V_n(\beta_0,s)S_n^0(\beta_0,s)\big]\lambda_0(s)\,ds,$$
where $f_k(s) = n^{-1}\sum_{i=1}^n U^k(s,i)\,e^{\beta_0(s)X_s(i)}Y_s(i)$, $k = 3,4$.
By A3 and C1,
$$T(2Kn)^{-1}\sum_{j=1}^K\int_0^t I_j(s)\,\sigma_j^{-4} f_4(s)\,\lambda_0(s)\,ds = O_p\big((n\|\ell\|^2)^{-1}\big),$$
so that, if
$$T(2Kn)^{-1}\sum_{j=1}^K\int_0^t I_j(s)\,\sigma_j^{-4}\, 4M^*_{s-}(j)^2\,V_n(\beta_0,s)S_n^0(\beta_0,s)\lambda_0(s)\,ds = O_p(1),$$
one gets by the Cauchy-Schwarz inequality that
$$T(2Kn)^{-1}\sum_{j=1}^K\int_0^t I_j(s)\,\sigma_j^{-4}\,4M^*_{s-}(j)f_3(s)\,\lambda_0(s)\,ds = o_p(1).$$
So to show that $\langle X'\rangle_t \to_P t$, all that is necessary is to show that
$$2T(Kn)^{-1}\sum_{j=1}^K\int_0^t I_j(s)\,\sigma_j^{-4}\,M^*_{s-}(j)^2\,V_n(\beta_0,s)S_n^0(\beta_0,s)\lambda_0(s)\,ds \to_P t.$$
But this equals
$$2T(Kn)^{-1}\sum_{j=1}^K\int_0^t I_j(s)\,\sigma_j^{-4}\,M^*_{s-}(j)^2\big[V_n(\beta_0,s)S_n^0(\beta_0,s) - V(\beta_0,s)S^0(\beta_0,s)\big]\lambda_0(s)\,ds + 2T(Kn)^{-1}\sum_{j=1}^K\int_0^t I_j(s)\,\sigma_j^{-4}\,M^*_{s-}(j)^2\,\sigma^2(s)\,ds.$$
If $(Kn)^{-1}\sum_{j=1}^K\int_0^t I_j(s)\,\sigma_j^{-4}\,M^*_{s-}(j)^2\,\sigma^2(s)\,ds = O_p(1)$, then, by A1 and C, the first term on the RHS above is $o_p(1)$. Consider the second term; call it $V_t$:

(3.9)
$$V_t = 2T(Kn)^{-1}\sum_{j=1}^K\int_0^t I_j(s)\,\sigma_j^{-4}\,\sigma^2(s)\sum_{i=1}^n\int_0^s I_j(u)\big[U^2(u,i) + 2M^*_{u-}(j)U(u,i)\big]\,dN_u(i)\,ds$$
$$= 2T(Kn)^{-1}\sum_{j=1}^K\sum_{i=1}^n\int_0^t I_j(u)\big[U^2(u,i) + 2M^*_{u-}(j)U(u,i)\big]\Big(\int_u^t I_j(s)\,\sigma^2(s)\,ds\Big)\sigma_j^{-4}\,dN_u(i)$$
$$= 2T(Kn)^{-1}\sum_{j=1}^K\sum_{i=1}^n\int_0^t\Big[I_j(u)\big[U^2(u,i) + 2M^*_{u-}(j)U(u,i)\big]\int_u^t I_j(s)\,\sigma^2(s)\,ds\;\sigma_j^{-4}\Big]\,dM_u(i) + 2TK^{-1}\sum_{j=1}^K\int_0^t I_j(u)\,\sigma_n^2(u)\int_u^t I_j(s)\,\sigma^2(s)\,ds\;\sigma_j^{-4}\,du.$$
So $V$ is equal to a local martingale plus an increasing process. Consider the increasing process:

(3.10)
$$2TK^{-1}\sum_{j=1}^K\int_0^t I_j(u)\,\sigma_n^2(u)\int_u^t I_j(s)\,\sigma^2(s)\,ds\;\sigma_j^{-4}\,du
= 2TK^{-1}\sum_{j=1}^K\int_0^t I_j(u)\big[V_nS_n^0 - VS^0\big](\beta_0,u)\,\lambda_0(u)\int_u^t I_j(s)\,\sigma^2(s)\,ds\;\sigma_j^{-4}\,du$$
$$+\ 2TK^{-1}\sum_{j=1}^K\sigma_j^{-4}\int_0^t I_j(u)\,\sigma^2(u)\int_u^t I_j(s)\,\sigma^2(s)\,ds\,du
= o_p(1) + 2TK^{-1}\sum_{j=1}^K\sigma_j^{-4}\int_0^t I_j(u)\,\sigma^2(u)\int_u^t I_j(s)\,\sigma^2(s)\,ds\,du$$
by A1, C1. But, by the symmetry of the double integral,
$$2TK^{-1}\sum_{j=1}^K\sigma_j^{-4}\int_0^t I_j(u)\,\sigma^2(u)\int_u^t I_j(s)\,\sigma^2(s)\,ds\,du = TK^{-1}\sum_{j=1}^K\sigma_j^{-4}\Big(\int_0^t I_j(s)\,\sigma^2(s)\,ds\Big)^2.$$
Therefore the limit in probability of the LHS of (3.10) is equal to the limit of
$$TK^{-1}\sum_{j=1}^K\sigma_j^{-4}\Big(\int_0^t I_j(s)\,\sigma^2(s)\,ds\Big)^2 = TK^{-1}\big(i(t)-1\big) + O(K^{-1}).$$
Recall that both $i(t)$ and $K$ are functions of $n$, and
$$\frac{\sum_{j=1}^{i(t)-1}\ell_j}{\ell_{(K)}\,K} \le \frac{i(t)-1}{K} \le \frac{\sum_{j=1}^{i(t)-1}\ell_j}{\ell_{(1)}\,K}.$$
Since $\lim_n \ell_{(K)}/\ell_{(1)} = 1$ and $\sum_{j=1}^{i(t)-1}\ell_j \to t$, it is easy to see that $TK^{-1}(i(t)-1)\to t$ as $n\to\infty$. Therefore, by the above and (3.10),
$$V_t - t = 2T(Kn)^{-1}\sum_{j=1}^K\sum_{i=1}^n\int_0^t\Big[I_j(u)\big[U^2(u,i) + 2M^*_{u-}(j)U(u,i)\big]\int_u^t I_j(s)\,\sigma^2(s)\,ds\;\sigma_j^{-4}\Big]\,dM_u(i) + o_p(1).$$
Define
$$V_v' = (Kn)^{-1}\sum_{j=1}^K\sum_{i=1}^n\int_0^{t\wedge v}\Big[I_j(u)\big[U^2(u,i) + 2M^*_{u-}(j)U(u,i)\big]\int_u^t I_j(s)\,\sigma^2(s)\,ds\;\sigma_j^{-4}\Big]\,dM_u(i)$$
for each $v\in[0,T]$. If the quadratic variation of $V'$ evaluated at $v = T$ is shown to be $o_p(1)$, then $V_t - t = o_p(1)$ by Lenglart's inequality. The quadratic variation at $v = T$ is

(3.11)
$$(Kn)^{-2}\sum_{j=1}^K\sum_{i=1}^n\int_0^t I_j(u)\big[U^2(u,i) + 2M^*_{u-}(j)U(u,i)\big]^2\Big(\int_u^t I_j(s)\,\sigma^2(s)\,ds\Big)^2\sigma_j^{-8}\, e^{\beta_0(u)X_u(i)}Y_u(i)\lambda_0(u)\,du$$
$$\le \Big[K^{-2}n^{-2}\sum_{j=1}^K\ell_j^{-2}\sum_{i=1}^n\int_0^T I_j(s)\big[U^2(s,i) + M^*_{s-}(j)U(s,i)\big]\int_s^T I_j(u)\,du\;dN_s(i)\Big]\,O_p(1).$$
The compensator of the term in square brackets, evaluated at time $T$, is given by

(3.12)
$$K^{-2}\sum_{j=1}^K\ell_j^{-2}\,n^{-1}\int_0^T I_j(s)\,V_n(\beta_0,s)S_n^0(\beta_0,s)\lambda_0(s)\int_s^T I_j(u)\,du\,ds
\le K^{-2}\sum_{j=1}^K\ell_j^{-2}\int_0^T I_j(s)\int_s^T I_j(u)\,du\,ds\;O_p(1),$$
which is $o_p(1)$. By Lenglart's inequality the RHS of (3.12) being $o_p(1)$ in turn implies that (3.11) is $o_p(1)$. Therefore $\langle X'\rangle_t \to_P t$, $\forall\, t\in[0,T]$.
All that is left is to show that the Lindeberg condition in Rebolledo's central limit theorem is verified, i.e. to show that, for each $\epsilon > 0$,
$$K^{-1}n^{-2}\sum_{j=1}^K\sum_{i=1}^n\int_0^T I_j(s)\,\sigma_j^{-4}\big[U^2(s,i)+2M^*_{s-}(j)U(s,i)\big]^2\, I\big\{K^{-1/2}n^{-1}\sigma_j^{-2}\big|U^2(s,i)+2M^*_{s-}(j)U(s,i)\big| > \epsilon\big\}\, e^{\beta_0(s)X_s(i)}Y_s(i)\lambda_0(s)\,ds = o_p(1).$$
This is bounded above by
$$4K^{-1}n^{-2}\sum_{j=1}^K\sum_{i=1}^n\int_0^T I_j(s)\,\sigma_j^{-4}\,U^4(s,i)\, I\big\{K^{-1/2}n^{-1}\sigma_j^{-2}U^2(s,i) > \epsilon/2\big\}\, e^{\beta_0(s)X_s(i)}Y_s(i)\lambda_0(s)\,ds$$

(3.13)
$$+\ 4K^{-1}n^{-2}\sum_{j=1}^K\sum_{i=1}^n\int_0^T I_j(s)\,\sigma_j^{-4}\,4M^*_{s-}(j)^2U^2(s,i)\, I\big\{K^{-1/2}n^{-1}\sigma_j^{-2}|M^*_{s-}(j)U(s,i)| > \epsilon/4\big\}\, e^{\beta_0(s)X_s(i)}Y_s(i)\lambda_0(s)\,ds.$$
The first term in (3.13) is $o_p(1)$ by A3 and C1. For the second term, write $U(s,i) = X_s(i) - E_n(\beta_0,s)$ and note that, from (3.9),
$$(Kn)^{-1}\int_0^T\sum_{j=1}^K I_j(s)\,\sigma_j^{-2}\,M^*_s(j)^2\,ds = O_p(1),$$
that $\sup_{s}|E_n(\beta_0,s)| = O_p(1)$ (by A1 and C), and that $K^{-1/2}\max_{1\le j\le K}\sup_s n^{-1/2}|M^*_s(j)| = o_p(1)$ (use Lenglart's inequality). The second term in (3.13) is then $o_p(1)$ for all $\epsilon > 0$, and the Lindeberg condition will be satisfied, if

(3.14)
$$P\Big[\max_{1\le i\le n}\int_0^T\Big[\sum_{j=1}^K I_j(s)\,\ell_j^{-1}n^{-1}M^*_s(j)^2\Big]\, I\big\{n^{-1/2}|X_s(i)|Y_s(i) > \epsilon\big\}\,ds > \epsilon\Big] \to 0\qquad\forall\,\epsilon > 0.$$
For $C > 0$, the integral in (3.14) is bounded above by

(3.15)
$$C^{-2}\int_0^T\sum_{j=1}^K I_j(s)\,\ell_j^{-2}n^{-2}M^*_s(j)^4\,ds + C\,\max_{1\le i\le n}\int_0^T I\big\{n^{-1/2}|X_s(i)|Y_s(i) > \epsilon/C\big\}\,ds.$$
Suppose that $\int_0^T\sum_{j=1}^K I_j(s)\,\ell_j^{-2}n^{-2}M^*_s(j)^4\,ds = O_p(1)$; then there exist constants $B(\epsilon)$ and $n(\epsilon)$ such that
$$P\Big[\int_0^T\sum_{j=1}^K I_j(s)\,\ell_j^{-2}n^{-2}M^*_s(j)^4\,ds < B(\epsilon)\Big] > 1 - \epsilon/2\qquad\text{for } n \ge n(\epsilon).$$
Choose $C$ so that $B(\epsilon)/C^2 < \epsilon/2$. By assumption B, there exists $n'(\epsilon)$ such that
$$P\Big[\max_{1\le i\le n}\int_0^T I\big\{n^{-1/2}|X_s(i)|Y_s(i) > \epsilon/C\big\}\,ds < \epsilon/(2C)\Big] > 1 - \epsilon/2\qquad\text{for } n \ge n'(\epsilon).$$
Therefore, by (3.14) and (3.15), the probability in (3.14) can be made small for $n \ge n(\epsilon)\vee n'(\epsilon)$, provided $\int_0^T\sum_{j=1}^K I_j(s)\,\ell_j^{-2}n^{-2}M^*_s(j)^4\,ds = O_p(1)$.

Recall that $M^*_s(j) = \sum_{i=1}^n\int_0^s I_j(u)\,U(u,i)\,dM_u(i)$, so that the jumps of $M^*(j)$ satisfy
$$\big[\Delta M^*_u(j)\big]^k = \sum_{i=1}^n I_j(u)\,U^k(u,i)\,\Delta N_u(i),\qquad k = 1,2,3,4.$$
Expanding $M^*_s(j)^4$ over its jumps,
$$M^*_s(j)^4 = \sum_{u\le s}\Delta\big(M^*_u(j)^4\big) = \sum_{i=1}^n\int_0^s I_j(u)\big[U^4(u,i) + 4U^3(u,i)M^*_{u-}(j) + 6U^2(u,i)M^*_{u-}(j)^2 + 4U(u,i)M^*_{u-}(j)^3\big]\,dN_u(i).$$
This means that, by Lenglart's inequality, $\int_0^T\sum_{j=1}^K I_j(s)\,\ell_j^{-2}n^{-2}M^*_s(j)^4\,ds$ will be $o_p(1)$ if its compensator is $o_p(1)$. The compensator, evaluated at time $T$, is
$$\sum_{j=1}^K\ell_j^{-2}\,n^{-1}\int_0^T I_j(u)\Big[f_4(u) + 4f_3(u)M^*_{u}(j) + 6V_n(\beta_0,u)S_n^0(\beta_0,u)\,M^*_u(j)^2\Big]\Big(\int_u^T I_j(s)\,ds\Big)\lambda_0(u)\,du$$
(the $U\,M^{*3}_{u-}$ term has compensator zero, since $\sum_{i=1}^n U(u,i)\,e^{\beta_0(u)X_u(i)}Y_u(i) = 0$), which is $o_p(1)$ by (3.9). $\Box$
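The expansions of $M^*_s(j)^2$ and $M^*_s(j)^4$ used throughout this proof rest on the pathwise jump expansion of powers of a counting-process martingale. A minimal numerical check for a compensated Poisson process (a standard toy case, not the counting processes of the thesis): $M_t = N_t - \lambda t$ satisfies $M_t^2 = 2\int_0^t M_{s-}\,dM_s + N_t$ exactly, path by path.

```python
import numpy as np

# Pathwise check of M_T^2 = 2 * int_0^T M_{s-} dM_s + [M]_T with [M]_T = N_T,
# for the compensated Poisson process M_t = N_t - lam * t.
rng = np.random.default_rng(4)
T, lam = 5.0, 3.0
n = rng.poisson(lam * T)                      # number of jumps on [0, T]
tau = np.sort(rng.uniform(0.0, T, n))         # jump times

m_T = n - lam * T
m_left = np.arange(n) - lam * tau             # M_{tau_k-} = (k - 1) - lam * tau_k
int_M_ds = np.sum(T - tau) - lam * T**2 / 2   # int_0^T M_s ds, exact (piecewise)
stoch_int = m_left.sum() - lam * int_M_ds     # int_0^T M_{s-} dM_s, dM = dN - lam ds
print(m_T**2 - (2 * stoch_int + n))           # 0 up to float rounding
```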
3.4
Other Versions of the Partial Likelihood Ratio Test
Theorem 3.3 of the last section can be used to formulate a sequential test of $H_0$: $\beta_0$ is constant versus $H_1$: $\beta_0$ is nonconstant. Corollary 3.1 below gives the null distribution for the sequential test. An $\alpha$ level test would be to reject the null hypothesis and stop the test at time $t\in[a_1,\dots,a_K]$ if $T^{-1/2}|Z_t^n|$ exceeds the $(1-\alpha)$th percentile of the distribution of $\sup_{s\in[0,1]}|W_s|$. Since $\sup_{s\in[0,T]}|W_s|$ is equal in distribution to $\sqrt{T}\,\sup_{s\in[0,1]}|W_s|$, the relevant percentiles can be derived from the series representation for the distribution of $\sup_{s\in[0,1]}|W_s|$ given by Billingsley (1968, p. 80). Since the set $[a_1,\dots,a_K]$ increases with $n$ up to a dense set of $[0,T]$, and the Wiener process is continuous, it is sufficient to evaluate the maximum of $Z^n$ over $t\in[a_1,\dots,a_K]$ for the limiting distribution of $\sup_{s\in[0,T]}|W_s|$ to hold.
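The percentiles referred to above can be computed directly. A sketch, using the classical alternating series for $P[\sup_{0\le s\le 1}|W_s| \le x]$ truncated at a fixed number of terms (the truncation length is an arbitrary choice):

```python
import math

def sup_abs_bm_cdf(x, terms=50):
    """P[sup_{0<=s<=1} |W_s| <= x] for standard Brownian motion W,
    via the series (4/pi) * sum (-1)^k/(2k+1) * exp(-(2k+1)^2 pi^2/(8 x^2))."""
    if x <= 0:
        return 0.0
    s = 0.0
    for k in range(terms):
        s += ((-1) ** k / (2 * k + 1)) * math.exp(
            -((2 * k + 1) ** 2) * math.pi ** 2 / (8 * x ** 2))
    return 4 / math.pi * s

def sup_abs_bm_quantile(p, lo=1e-6, hi=10.0):
    """(1 - alpha)th percentile via bisection on the monotone CDF."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if sup_abs_bm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(sup_abs_bm_quantile(0.95), 3))  # about 2.241
```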
Corollary 3.1. Under the assumptions of Theorem 3.3, $\max_{t\in[a_1,\dots,a_K]}|Z_t^n|$ converges in distribution to $\sup_{t\in[0,T]}|W_t|$, where $W$ is a standard Wiener process.
PROOF: As in Theorem 3.3, define
$$Z_t^n = (2KT^{-1})^{-1/2}\big[2\{\mathcal{L}_n(\hat\beta(H_1^t))_t - \mathcal{L}_n(\hat\beta(H_0^t))_t\} - i(t)\big]$$
for $t\in[0,T]$. Then $Z_t^n = X_t^n + Y_t^n$, where $X^n$ converges weakly in D[0,T] (in the supremum topology) to a standard Wiener process and $\max_{t\in[a_1,\dots,a_K]}|Y_t^n| = o_p(1)$. Consider

(3.16)
$$\Big|\max_{t\in[a_1,\dots,a_K]}|Z_t^n| - \max_{t\in[a_1,\dots,a_K]}|X_t^n|\Big| \le \max_{t\in[a_1,\dots,a_K]}|Y_t^n| = o_p(1).$$
Therefore $\max_{t\in[a_1,\dots,a_K]}|Z_t^n|$ has the same limiting distribution as $\max_{t\in[a_1,\dots,a_K]}|X_t^n|$. Since $W$ has support contained in C[0,T], which is a separable subset of D[0,T], the representation theorem (Pollard, 1984, pg. 71) implies the existence of $\tilde X^n$ and $\tilde W$ where $\sup_{t\in[0,T]}|\tilde X_t^n - \tilde W_t| \to 0$ a.s. and $\tilde X^n$, $\tilde W$ are equal in distribution to $X^n$, $W$, respectively, for each $n$. If $\max_{t\in[a_1,\dots,a_K]}|\tilde X_t^n| \to \sup_{t\in[0,T]}|\tilde W_t|$ a.s., then the proof will be concluded. Now
$$\Big|\max_{t\in[a_1,\dots,a_K]}|\tilde X_t^n| - \sup_{t\in[0,T]}|\tilde W_t|\Big| \le \sup_{t\in[0,T]}|\tilde X_t^n - \tilde W_t| + \Big|\max_{t\in[a_1,\dots,a_K]}|\tilde W_t| - \sup_{t\in[0,T]}|\tilde W_t|\Big|.$$
As $n\to\infty$ the first term above goes a.s. to zero, and as $K\to\infty$ (recall that $K\to\infty$ as $n\to\infty$) the second term goes a.s. to zero, since $[a_1,\dots,a_K]$ becomes dense in $[0,T]$ and $\tilde W$ is continuous. Therefore

(3.17)
$$\max_{t\in[a_1,\dots,a_K]}|\tilde X_t^n| \to \sup_{t\in[0,T]}|\tilde W_t|\qquad\text{a.s.}$$
Combining (3.16) and (3.17) concludes the proof. $\Box$
A nonsequential version of the partial likelihood ratio test is based on
$$(2K_n)^{-1/2}\big[\text{PLRT}_{K_n} - K_n\big],$$
which by Theorem 3.3 converges in distribution to a standard normal. Portnoy (1988) derives a likelihood ratio test of this type for an exponential family setting in which the number of parameters increases with $n$.

As mentioned earlier, Fisher's transform is considered in corollary 3.2 below. It is well known that Fisher's transform of a chi-squared random variable converges more swiftly to a standard normal than the above standardization (Johnson and Kotz, 1970). However, in this setting one cannot immediately assume that Fisher's transform of $\text{PLRT}_{K_n}$ will converge faster to a standard normal than the standardized version of $\text{PLRT}_{K_n}$ given above. From the work in Theorem 3.3, there is no evidence to support the intuition that the distribution of $\text{PLRT}_{K_n}$ is close to a chi-squared on $K_n - 1$ degrees of freedom for finite $K_n$. In fact, it would be of interest to give a rate for the convergence of $P[\text{PLRT}_{K_n} < z] - \Upsilon_{K_n}(z)$ to zero ($\Upsilon_{K_n}$ is a chi-squared distribution function on $K_n - 1$ degrees of freedom) as $K_n$ increases with $n$. However, this problem is not addressed here.
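The skewness reduction that motivates Fisher's transform is visible in a quick simulation (generic chi-squared draws, with the sample skewness computed by hand; a sketch only):

```python
import numpy as np

# Compare the plain standardization (chi2 - K)/sqrt(2K) with Fisher's
# transform sqrt(2*chi2) - sqrt(2K - 1) for a chi-squared(K) variable.
rng = np.random.default_rng(1)
K = 50
chi2 = rng.chisquare(df=K, size=200000)
plain = (chi2 - K) / np.sqrt(2 * K)
fisher = np.sqrt(2 * chi2) - np.sqrt(2 * K - 1)

def skew(z):
    z = (z - z.mean()) / z.std()
    return (z ** 3).mean()

# Fisher's transform removes most of the skewness of the plain version.
print(skew(plain), skew(fisher))
```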
Corollary 3.2. Under the assumptions of Theorem 3.3, the distribution of $(2\,\text{PLRT}_{K_n})^{1/2} - (2K_n)^{1/2}$ converges to a standard normal.

PROOF: It is easy to show that, for $x > -1$, $x\ne 0$,
$$\frac{\sqrt{1+x}-1}{x/2} \to 1\qquad\text{as } x\to 0;$$
in other words, for each $\epsilon > 0$ there exists $\delta > 0$ such that $|x| < \delta$ implies $\big|(\sqrt{1+x}-1)/(x/2) - 1\big| < \epsilon$. Now, by Theorem 3.3,
$$\frac{\text{PLRT}_K - K}{2K} \to_P 0,\qquad\text{and subsequently}\qquad K^{-1}\text{PLRT}_K \to_P 1\quad\text{as } n\to\infty.$$
Putting $x = K^{-1}\text{PLRT}_K - 1$, one gets that for each $\epsilon > 0$,
$$P\Bigg[\bigg|\frac{(K^{-1}\text{PLRT}_K)^{1/2} - 1}{(K^{-1}\text{PLRT}_K - 1)/2} - 1\bigg| < \epsilon\Bigg] \to 1\qquad\text{as } n\to\infty.$$
On $\{K^{-1}\text{PLRT}_K - 1 > -1\}$ consider

(3.18)
$$(2\,\text{PLRT}_K)^{1/2} - (2K)^{1/2} = \frac{\text{PLRT}_K - K}{(2K)^{1/2}}\cdot\frac{(K^{-1}\text{PLRT}_K)^{1/2} - 1}{(K^{-1}\text{PLRT}_K - 1)/2}.$$
The second factor on the RHS of (3.18) goes to 1 in probability, and the first factor converges in distribution (by Theorem 3.3) to a N(0,1) random variable. Since $P[K^{-1}\text{PLRT}_K - 1 > -1] \to 1$, the corollary is proved. $\Box$
3.5 Consistency of the Partial Likelihood Ratio Test

If one suspects that the coefficient in Cox's regression model is time dependent, then it is natural to approximate the coefficient by a step function as in Brown (1975) and Anderson and Senthilselvan (1982). Then, to test if in fact the regression coefficient is time dependent, a PLRT or a score test as in Moreau et al. (1985) could be used. However, such a test with a fixed number of steps cannot be consistent for general nonconstant $\beta_0$. If the test is formulated within the sieve context (i.e. as data/information accumulates, allow for a larger alternate space), a consistent test can result, as is proved below. For a discussion of consistency against contiguous alternatives see note 3 following Theorem 3.4.
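The step-function approach can be illustrated with a small simulation (a hypothetical change-point setup with two intervals; `minimize_scalar` maximizes each interval's partial likelihood contribution; a sketch of the idea, not the thesis's full sieve construction):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)

# Toy change-point data: hazard exp(beta(t) * x) with beta = 0.0 on [0, 1)
# and beta = 1.5 afterwards (all names and choices here are hypothetical).
n = 2000
x = rng.normal(size=n)
e = rng.exponential(size=n)
lam1 = np.exp(0.0 * x)                               # hazard before t = 1
t = np.where(e <= lam1, e / lam1, 1.0 + (e - lam1) / np.exp(1.5 * x))

order = np.argsort(t)
ts, xs = t[order], x[order]

def neg_logpl(b, event_mask):
    """Negative Cox log partial likelihood over the events in event_mask;
    the risk set at an event time is everyone with an event time >= it."""
    risk = np.cumsum(np.exp(b * xs)[::-1])[::-1]     # sum_{j >= i} e^{b x_j}
    return -(np.sum(b * xs[event_mask]) - np.sum(np.log(risk[event_mask])))

fit = lambda mask: minimize_scalar(lambda b: neg_logpl(b, mask),
                                   bounds=(-4, 4), method="bounded")
f0 = fit(np.ones(n, bool))                           # H0: one constant beta
f1, f2 = fit(ts < 1.0), fit(ts >= 1.0)               # H1: one beta per interval
plrt = 2 * (f0.fun - f1.fun - f2.fun)                # 2{L_n(H1) - L_n(H0)} >= 0
print(f1.x, f2.x, plrt)
```

The fitted interval coefficients track the true change, and the PLRT is large under this fixed alternative, which is the behavior Theorem 3.4 formalizes.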
Theorem 3.4. Assume,
a) $\beta_0$ is nonconstant and is Lipschitz continuous,
b) A1, A3, C, $\lim_n n\|\ell\|^4 = \infty$, $\lim_n \|\ell\|^2 = 0$, and $\lim_n \ell_{(K)}/\ell_{(1)} < \infty$,
c) there exists $\tilde{\mathcal{B}}$, an open set containing $[\inf_{s\in[0,T]}\beta_0(s),\ \sup_{s\in[0,T]}\beta_0(s)]$, such that
$$\sup_{(b,s)\in\tilde{\mathcal{B}}\times[0,T]}\big|S_n^j(b,s) - S^j(b,s)\big| = o_p(1),\qquad j = 0,1,$$
d) defining
$$KL(b) = \int_0^T\Big\{(\beta_0(s)-b)\,E(\beta_0,s) + \operatorname{Ln}\Big[\frac{S^0(b,s)}{S^0(\beta_0,s)}\Big]\Big\}\,S^0(\beta_0,s)\,\lambda_0(s)\,ds,$$
there exists $[b_L,b_U]\subset\tilde{\mathcal{B}}$ for which the following holds:
d1) there exists $\alpha > 0$ for which $\inf_{b\in[b_L,b_U]}KL(b) < [KL(b_L)\wedge KL(b_U)] - \alpha$, and
d2) there exists $\delta > 0$ such that $\inf_{b\in[b_L,b_U]}KL(b) > \delta$;
then for every $B > 0$,
$$P\big[K^{-1/2}\big(2\{\mathcal{L}_n(\hat\beta(H_1^T))_T - \mathcal{L}_n(\hat\beta(H_0^T))_T\} - K\big) > B\big] \to 1\qquad\text{as } n\to\infty.$$

Remark. Lemma 3.5 gives conditions under which d) holds.
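A toy evaluation of $KL(b)$ can make d1) and d2) concrete. Assume (hypothetically) $X_s\sim N(0,1)$ independent of $s$, $Y_s\equiv 1$, $\lambda_0\equiv 1$ and $\beta_0(s) = s$ on $[0,1]$; then $S^0(b,s) = e^{b^2/2}$, $E(\beta_0,s) = s$, and $KL(b)$ reduces to $\tfrac12\int_0^1(s-b)^2 e^{s^2/2}\,ds$, a strictly positive convex function with an interior minimum:

```python
import numpy as np

# KL(b) for the toy model above: 0.5 * int_0^1 (s - b)^2 e^{s^2/2} ds,
# integrated by the trapezoid rule on a fine grid.
s = np.linspace(0.0, 1.0, 2001)

def KL(b):
    f = 0.5 * (s - b) ** 2 * np.exp(s ** 2 / 2)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(s))

grid = np.linspace(-1.0, 2.0, 301)
vals = np.array([KL(b) for b in grid])
b_min = grid[vals.argmin()]
print(b_min, vals.min())  # interior minimum (d1), strictly positive value (d2)
```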
PROOF: In the following proof the subscripts and superscripts "$T$" are deleted for ease of presentation. Consider

(3.19)
$$2n^{-1}\{\mathcal{L}_n(\hat\beta(H_1)) - \mathcal{L}_n(\hat\beta(H_0))\} = 2n^{-1}\{\mathcal{L}_n(\hat\beta(H_1)) - \mathcal{L}_n(\tilde\beta^n)\} + 2n^{-1}\{\mathcal{L}_n(\tilde\beta^n) - \mathcal{L}_n(\beta_0)\} + 2n^{-1}\{\mathcal{L}_n(\beta_0) - \mathcal{L}_n(\hat\beta(H_0))\},$$
where $\tilde\beta^n$ is as in section 2.4 (a "$\sim$" has been added so as to remind one that $\tilde\beta^n$ is a vector); that is, both $\hat\beta(H_1)$ and $\tilde\beta^n$ can be written as $K$ dimensional vectors, $\hat\beta(H_1) = (\hat\beta_1,\dots,\hat\beta_K)$ and $\tilde\beta^n = (\tilde\beta_1^n,\dots,\tilde\beta_K^n)$.

A Taylor series about $\hat\beta(H_1)$ of the first term on the RHS of (3.19) results in
$$2n^{-1}\{\mathcal{L}_n(\hat\beta(H_1)) - \mathcal{L}_n(\tilde\beta^n)\} = -n^{-1}\sum_{j=1}^K\frac{\partial^2}{\partial\beta_j^2}\mathcal{L}_n(\beta^{**})\,(\hat\beta_j - \tilde\beta_j^n)^2,$$
where $\|\beta^{**} - \tilde\beta^n\| \le \|\tilde\beta^n - \hat\beta(H_1)\|$. By Theorem 2.1, lemma 3.4 and C2,
$$P\Big[\bigcap_{j=1}^K\Big\{-(\ell_j n)^{-1}\frac{\partial^2}{\partial\beta_j^2}\mathcal{L}_n(\beta^{**}) > 0\Big\}\Big] \to 1\quad\text{as } n\to\infty
\qquad\text{and}\qquad
\max_{1\le j\le K}\Big|(\ell_j n)^{-1}\frac{\partial^2}{\partial\beta_j^2}\mathcal{L}_n(\beta^{**})\Big| = O_p(1).$$
For the above application of Theorem 2.1, note that the assumptions $\lim_n n\|\ell\|^{10} = 0$ and A2 are not necessary, since one only needs $\max_{1\le j\le K}|\hat\beta_j - \tilde\beta_j^n| = o_p(1)$. Therefore, on the above event,

(3.20)
$$2n^{-1}\{\mathcal{L}_n(\hat\beta(H_1)) - \mathcal{L}_n(\tilde\beta^n)\} = n^{-1}\sum_{j=1}^K \ell_j^{-1}\Big(n^{-1/2}\frac{\partial}{\partial\beta_j}\mathcal{L}_n(\tilde\beta^n)\Big)^2 O_p(1) = O_p\big((n\|\ell\|^2)^{-1}\big) + O_p\big(\|\ell\|^4\big)$$
by lemma 2.1.
Next consider the second term on the right hand side of (3.19). Recalling that $\tilde\beta^n(s) = \sum_{j=1}^K I_j(s)\,\tilde\beta_j^n$, define
$$KL_n(\tilde\beta^n)_t = n^{-1}\sum_{i=1}^n\int_0^t\Big[(\beta_0(s) - \tilde\beta^n(s))\,X_s(i) + \operatorname{Ln}\frac{S_n^0(\tilde\beta^n,s)}{S_n^0(\beta_0,s)}\Big]\,dN_s(i),\qquad t\in[0,T].$$
The compensator of $KL_n(\tilde\beta^n)$ is
$$\overline{KL}_n(\tilde\beta^n)_t = \int_0^t\Big[(\beta_0(s) - \tilde\beta^n(s))\,E_n(\beta_0,s) + \operatorname{Ln}\frac{S_n^0(\tilde\beta^n,s)}{S_n^0(\beta_0,s)}\Big]\,S_n^0(\beta_0,s)\,\lambda_0(s)\,ds.$$
For fixed $s$ consider

(3.21)
$$(\beta_0(s) - \tilde\beta^n(s))\,E_n(\beta_0,s) - \operatorname{Ln}\Big[\frac{S_n^0(\beta_0,s)}{S_n^0(\tilde\beta^n,s)}\Big] \le \Big[\sup_{u\in[0,T]}\ \sup_{b:|b-\beta_0(u)|<\gamma} V_n(b,u)\Big]\big(\tilde\beta^n(s) - \beta_0(s)\big)^2\qquad\text{if } |\tilde\beta^n(s)-\beta_0(s)| < \gamma,$$
so that $\overline{KL}_n(\tilde\beta^n)_T = \|\ell\|^4\,O_p(1)$ (by the definition of $\tilde\beta^n$). Moreover,

(3.22)
$$\langle KL_n(\tilde\beta^n) - \overline{KL}_n(\tilde\beta^n)\rangle_T \le 2n^{-1}\int_0^T\big(\tilde\beta^n(s)-\beta_0(s)\big)^2\,S_n^2(\beta_0,s)\,\lambda_0(s)\,ds + 2n^{-1}\int_0^T\Big[\operatorname{Ln}\frac{S_n^0(\tilde\beta^n,s)}{S_n^0(\beta_0,s)}\Big]^2 S_n^0(\beta_0,s)\,\lambda_0(s)\,ds$$
is $O_p(\|\ell\|^4 n^{-1})$ (using Lenglart's inequality as stated in section 1.2). The first term on the RHS of (3.22) is $O_p(\|\ell\|^4 n^{-1})$ by the definition of $\tilde\beta^n$; the same is true for the second term on the RHS of (3.22), using the Taylor series argument in (3.21). Therefore, and by (3.20) above,
$$2n^{-1}\{\mathcal{L}_n(\hat\beta(H_1)) - \mathcal{L}_n(\hat\beta(H_0))\} = 2n^{-1}\{\mathcal{L}_n(\beta_0) - \mathcal{L}_n(\hat\beta(H_0))\} + o_p(1).$$
If there exists an $\epsilon > 0$ for which

(3.23)
$$P\big[n^{-1}\{\mathcal{L}_n(\beta_0) - \mathcal{L}_n(\hat\beta(H_0))\} > \epsilon\big] \to 1\qquad\text{as } n\to\infty,$$
then
$$K^{-1/2}\big[2\{\mathcal{L}_n(\hat\beta(H_1)) - \mathcal{L}_n(\hat\beta(H_0))\} - K\big] = nK^{-1/2}\big[2n^{-1}\{\mathcal{L}_n(\beta_0) - \mathcal{L}_n(\hat\beta(H_0))\} + o_p(1) - Kn^{-1}\big] \to_P \infty,$$
and the theorem will be proved.

To prove (3.23), consider $n^{-1}\{\mathcal{L}_n(\beta_0) - \mathcal{L}_n(\hat\beta(H_0))\} = KL_n(\hat\beta(H_0))_T$, where for $\beta\in\tilde{\mathcal{B}}$,
$$KL_n(\beta)_t = n^{-1}\sum_{i=1}^n\int_0^t\Big[(\beta_0(s)-\beta)\,X_s(i) + \operatorname{Ln}\frac{S_n^0(\beta,s)}{S_n^0(\beta_0,s)}\Big]\,dN_s(i)\qquad\text{for } t\in[0,T].$$
The compensator of $KL_n(\beta)$ is given by
$$\overline{KL}_n(\beta)_t = \int_0^t\Big[(\beta_0(s)-\beta)\,E_n(\beta_0,s) + \operatorname{Ln}\frac{S_n^0(\beta,s)}{S_n^0(\beta_0,s)}\Big]\,S_n^0(\beta_0,s)\,\lambda_0(s)\,ds\qquad\text{for } t\in[0,T].$$
Assumptions c) and C1 imply that $\overline{KL}_n(\beta)_T \to_P KL(\beta)$. To show that $KL_n(\beta)_T \to_P KL(\beta)$, consider
$$\langle KL_n(\beta) - \overline{KL}_n(\beta)\rangle_T = o_p(1)$$
by c), A1, and C1. By Lenglart's inequality $KL_n(\beta)_T \to_P KL(\beta)$ for each $\beta\in\tilde{\mathcal{B}}$. Since $KL_n(\beta)_T$ is a convex function in $\beta\in\tilde{\mathcal{B}}$, this implies that $\sup_{\beta\in A}|KL_n(\beta)_T - KL(\beta)| = o_p(1)$ for each compact set $A\subset\tilde{\mathcal{B}}$, and that $KL$ is convex on $\tilde{\mathcal{B}}$ (see Theorem II.1 in Appendix II of Andersen and Gill, 1982).

If $P[\hat\beta(H_0)\in[b_L,b_U]] \to 1$, for $b_U$, $b_L$ as in d), then for $\epsilon < \delta$ ($\delta$ as in d2),
$$P\big[KL_n(\hat\beta(H_0))_T > \epsilon\big] \ge P\Big[\inf_{\beta\in[b_L,b_U]}KL_n(\beta)_T > \epsilon\Big] - o(1) \ge P\Big[\sup_{\beta\in[b_L,b_U]}\big|KL_n(\beta)_T - KL(\beta)\big| < \delta - \epsilon\Big] - o(1) \to 1$$
as $n\to\infty$.

In order to prove that $P[\hat\beta(H_0)\in[b_L,b_U]] \to 1$, recall that by d1) there exists $\alpha > 0$ for which $\inf_{\beta\in[b_L,b_U]}KL(\beta) < [KL(b_L)\wedge KL(b_U)] - \alpha$. Choose $\beta_1\in(b_L,b_U)$ for which $KL(\beta_1) < \inf_{\beta\in[b_L,b_U]}KL(\beta) + \alpha/2$ (this can be done since $KL$ is convex on $\tilde{\mathcal{B}}$); then
$$P\big[\hat\beta(H_0)\in[b_L,b_U]\big] \ge P\big[|KL_n(b_L)_T - KL(b_L)| < \alpha/4,\ |KL_n(b_U)_T - KL(b_U)| < \alpha/4,\ |KL_n(\beta_1)_T - KL(\beta_1)| < \alpha/4\big] \to 1$$
as $n\to\infty$, since $\hat\beta(H_0)$ minimizes the convex function $KL_n(\beta)_T$: on the above event, $KL_n(\beta_1)_T$ is smaller than both $KL_n(b_L)_T$ and $KL_n(b_U)_T$, so the minimizer lies in $[b_L,b_U]$. Therefore there exists $\epsilon > 0$ for which $P[n^{-1}\{\mathcal{L}_n(\beta_0) - \mathcal{L}_n(\hat\beta(H_0))\} > \epsilon] \to 1$ as $n\to\infty$. $\Box$
Notes to Theorem 3.4

1) Theorem 3.4 also proves consistency for the sequential test of corollary 3.1. In fact, the result of Theorem 3.4 holds for any fixed value of $t \le T$, i.e. for any $B > 0$. If the above holds for even one value of $t$, the sequential test of corollary 3.1 will be consistent. This implies that the sequential test will be consistent for a wider range of alternatives than the test based only on the likelihood ratio test statistic at time $T$. For example, the conditions of lemma 3.5 would need only hold on the interval $[0,t]$ for some $t\in[0,T]$.
2) Recall that $\text{PLRT}_{K_n} = 2\{\mathcal{L}_n(\hat\beta(H_1^T))_T - \mathcal{L}_n(\hat\beta(H_0^T))_T\}$. From the proof of Theorem 3.4, it is obvious that for each $B > 0$, $P[K_n^{-1}\text{PLRT}_{K_n} > B] \to 1$ as $n\to\infty$. This in turn implies the consistency of the Fisherian transform of the PLRT in corollary 3.2, since
$$(2\,\text{PLRT}_{K_n})^{1/2} - (2K_n)^{1/2} = (2K_n)^{1/2}\big[(K_n^{-1}\text{PLRT}_{K_n})^{1/2} - 1\big].$$
3) Theorem 3.4 gives conditions for the consistency of the PLRT against a fixed alternative. Another possibility would have been to consider a contiguous alternative. Let $\mathcal{P}_n^\beta$ be the probability under which $N$ has intensity with regression coefficient $\beta_0 + \beta(s)/\sqrt{n}$ at time $s$ [i.e. $\lambda_s(i) = e^{(\beta_0 + \beta(s)/\sqrt{n})X_s(i)}\,Y_s(i)\,\lambda_0(s)$, $s\in[0,T]$, $i = 1,\dots,n$], and let $\mathcal{P}_n^0$ be $\mathcal{P}_n^\beta$ with $\beta \equiv 0$. Then lemma 4.3 implies that $\operatorname{Ln}[d\mathcal{P}_n^\beta/d\mathcal{P}_n^0]$ converges in $\mathcal{P}_n^0$-distribution to a $N(-\tfrac{1}{2}\sigma_\beta^2,\ \sigma_\beta^2)$ ($\sigma_\beta^2 = \int_0^T\beta^2(s)\,S^2(\beta_0,s)\,\lambda_0(s)\,ds$). If for some statistic $TS_n$ one has joint convergence in distribution of $(TS_n,\ \operatorname{Ln}[d\mathcal{P}_n^\beta/d\mathcal{P}_n^0])$ under $\mathcal{P}_n^0$, then Le Cam's lemma 3 (Hajek and Sidak, 1967, p. 208) implies that $\mathcal{L}(TS_n\,|\,\mathcal{P}_n^\beta) \to N(\sigma_{L,TS},\ \sigma_{TS}^2)$. If $\sigma_{L,TS} \ne 0$, then $TS_n$ is consistent against the contiguous alternative $(\beta_0 + \beta(s)/\sqrt{n})$. In lemma 4.3 it is shown that
$$\operatorname{Ln}\Big[\frac{d\mathcal{P}_n^\beta}{d\mathcal{P}_n^0}\Big] = n^{-1/2}\sum_{i=1}^n\int_0^T\beta(s)\,X_s(i)\,dM_s(i) - \frac{1}{2}\int_0^T\beta^2(s)\,S^2(\beta_0,s)\,\lambda_0(s)\,ds + o_p(1).$$
On the other hand, Theorem 3.3 gives that $2\{\mathcal{L}_n(\hat\beta(H_1^T))_T - \mathcal{L}_n(\hat\beta(H_0^T))_T\} - K$ is, up to $o_p(1)$, a stochastic integral with respect to the $dM_s(i)$. Using Rebolledo's central limit theorem for local martingales (as in section 1.2), $\sigma_{L,TS}$ will be the limit in $\mathcal{P}_n^0$-probability of the predictable covariation of $n^{-1/2}\sum_{i}\int_0^\cdot\beta(s)X_s(i)\,dM_s(i)$ with this stochastic integral. But this is equal to (under appropriate conditions)
$$n^{-1/2}(2K)^{-1/2}\sum_{i=1}^n\sum_{j=1}^K\int_0^T I_j(u)\,U(u,i)\int_u^T I_j(s)\,\beta(s)\,ds\,dM_u(i)\;O_p(1) + o_p(1),$$
which converges to zero in probability. In other words, the PLRT is, in general, not consistent against a contiguous alternative of rate $n^{-1/2}$. Intuitively this means that the rate should be slower, perhaps $K^{1/4}n^{-1/2}$. Consider such a rate and, for simplicity, assume $\mathcal{P}_n[X_s(i)\in[0,1]\ \forall\, s\in[0,T]] = 1$ for each component of $X$. Then, by lemma 4.3, the log likelihood ratio is unbounded in probability. That is, this slower rate ($K^{1/4}n^{-1/2}$) does not produce contiguous probabilities. In chapter IV, statistics which are consistent under a contiguous alternative are considered.
3.6 The Independent and Identically Distributed Case

In section 2.6, conditions under which $\hat\beta$, the sieve estimator, is consistent were given. It turns out that, except for path properties of the $S^j(\beta_0,s)$, the assumptions made are also sufficient to imply asymptotic normality. All that is needed is to verify a Lindeberg condition as in B) of section 2.4. In fact, instead of B, a weaker condition B') is sufficient.

B') There exists $\delta > 0$ such that
$$\max_{1\le i\le n}\int_0^T I\big\{s : |X_s(i)|Y_s(i) > \epsilon\sqrt{n},\ \beta_0(s)X_s(i) > -\delta|X_s(i)|\big\}\,ds = o_p(1)\qquad\text{for each }\epsilon > 0.$$
To see that B') is sufficient, consider equation (3.3) of Theorem 3.1:
$$\int_0^T n^{-1}\sum_{i=1}^n X_s(i)^2\, e^{\beta_0(s)X_s(i)}\,Y_s(i)\,\lambda_0(s)\, I\{s: |X_s(i)|Y_s(i) > \epsilon\sqrt{n}\}\,ds$$
$$= \int_0^T n^{-1}\sum_{i=1}^n X_s(i)^2\, e^{\beta_0(s)X_s(i)}\,Y_s(i)\,\lambda_0(s)\, I\{s: |X_s(i)|Y_s(i) > \epsilon\sqrt{n},\ \beta_0(s)X_s(i) \le -\delta|X_s(i)|\}\,ds$$
$$+\ \int_0^T n^{-1}\sum_{i=1}^n X_s(i)^2\, e^{\beta_0(s)X_s(i)}\,Y_s(i)\,\lambda_0(s)\, I\{s: |X_s(i)|Y_s(i) > \epsilon\sqrt{n},\ \beta_0(s)X_s(i) > -\delta|X_s(i)|\}\,ds$$
$$\le \int_0^T n^{-1}\sum_{i=1}^n X_s(i)^2\, e^{-\delta|X_s(i)|}\,\lambda_0(s)\, I\{s: |X_s(i)| > \epsilon\sqrt{n}\}\,ds + \max_{1\le i\le n}\int_0^T I\{s: |X_s(i)|Y_s(i) > \epsilon\sqrt{n},\ \beta_0(s)X_s(i) > -\delta|X_s(i)|\}\,ds\ O_p(1)$$
by A1, C1
$$= o_p(1) + \max_{1\le i\le n}\int_0^T I\{s: |X_s(i)|Y_s(i) > \epsilon\sqrt{n},\ \beta_0(s)X_s(i) > -\delta|X_s(i)|\}\,ds\ O_p(1).$$
Therefore B') can be used instead of B) as the Lindeberg condition for asymptotic normality.
Corollary 3.3. Consider $n$ i.i.d. observations of $(N,X,Y)$ where both $X$ and $Y$ are left continuous with right hand limits. Then, under the assumptions given in corollary 2.1, the Lindeberg condition B') holds.

Remark. The following proof is virtually identical to the verification of the Lindeberg condition in Theorem 4.1 of Andersen and Gill (1982).

PROOF: Consider
$$\max_{1\le i\le n}\int_0^T I\big\{s : |X_s(i)|Y_s(i) > \epsilon\sqrt{n},\ \beta_0(s)X_s(i) > -\delta|X_s(i)|\big\}\,ds$$
for some $\delta > 0$ to be chosen. This is bounded by
$$\max_{1\le i\le n}\,\epsilon^{-1}n^{-1/2}\int_0^T|X_s(i)|\,Y_s(i)\, I\{\beta_0(s)X_s(i) > -\delta|X_s(i)|\}\,ds.$$
Note that if $Z(1),\dots,Z(n)$ are i.i.d. random variables with finite second moment, then
$$P\Big[\sup_{1\le i\le n} n^{-1/2}|Z(i)| > \epsilon\Big] \le n\,P\big[n^{-1/2}|Z| > \epsilon\big] \le \epsilon^{-2}\int_{(n^{-1/2}|Z| > \epsilon)} Z^2\,dP \to 0\qquad\text{as } n\to\infty.$$
Therefore, to prove that $\max_{1\le i\le n}n^{-1/2}\int_0^T|X_s(i)|Y_s(i)\,I\{\beta_0(s)X_s(i) > -\delta|X_s(i)|\}\,ds$ is $o_p(1)$, all that is necessary is that
$$E\Big[\int_0^T X_s\,Y_s\, I\{\beta_0(s)X_s > -\delta|X_s|\}\,ds\Big]^2 < \infty.$$
Pick $\delta > 0$ so that $[\inf_{s\in[0,T]}\beta_0(s) - 2\delta,\ \sup_{s\in[0,T]}\beta_0(s) + 2\delta]\subset\tilde{\mathcal{B}}$; then, if $\beta_0(s)X_s > -\delta|X_s|$, there exists $\beta'\in\tilde{\mathcal{B}}$ for which $\beta' X_s > \delta|X_s| > 0$. One gets
$$E\Big[\int_0^T X_s\,Y_s\, I\{\beta_0(s)X_s > -\delta|X_s|\}\,ds\Big]^2 \le T\,E\Big[\sup_{(b,s)\in\tilde{\mathcal{B}}\times[0,T]} e^{bX_s}\,Y_s\,X_s^2\Big]\,O(1) < \infty$$
by assumption d) of corollary 2.1. $\Box$
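The maximal inequality used in the proof can be seen numerically (i.i.d. draws with a finite second moment; a generic sketch, not the covariate processes themselves):

```python
import numpy as np

# For i.i.d. Z(1), ..., Z(n) with E Z^2 < infinity, the bound above forces
# max_i n^{-1/2} |Z(i)| -> 0 in probability as n grows.
rng = np.random.default_rng(3)
ratios = [np.abs(rng.standard_normal(n)).max() / np.sqrt(n)
          for n in (10**3, 10**5, 10**6)]
print(ratios)  # decreasing toward 0
```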
Notes to Corollary 3.3

1) If $\lim_n n\|\ell\|^8 = 0$, $\lim_n n\|\ell\|^4 = \infty$, $\lim_n \ell_{(K)}/\ell_{(1)} < \infty$, and D1, D4 hold, then under the conditions of corollary 2.1, $\sqrt{n}\int_0^t[\hat\beta(s) - \beta_0(s)]\,ds$ converges weakly to a Gaussian martingale.

2) If $\beta_0$ is constant in $s$, $\lim_n n\|\ell\|^4 = \infty$, $\lim_n \|\ell\|^2 = 0$, and $\lim_n \ell_{(K)}/\ell_{(1)} = 1$, then under the conditions of corollary 2.1, corollary 3.1 implies that
$$\max_{t\in[a_1,\dots,a_K]} \frac{2\{\mathcal{L}_n(\hat\beta(H_1^t))_t - \mathcal{L}_n(\hat\beta(H_0^t))_t\} - i(t)}{(2KT^{-1})^{1/2}}$$
converges weakly to $\sup_{t\in[0,T]}|W_t|$, where $W$ is a standard Wiener process.
To prove consistency of the PLRT, Theorem 3.4 can be used, but condition d) in that theorem requires a continuity assumption in addition to the assumptions of corollary 2.1. It turns out that assumption c) of the following corollary is sufficient.
Corollary 3.4

Consider n i.i.d. observations of (N,X,Y) where both X and Y are a.s. left continuous with right hand limits. If

a) β_0 is nonconstant and is Lipschitz continuous,
b) lim_n n‖ℓ‖⁴ = ∞, lim_n ‖ℓ‖ = 0, lim_n ‖ℓ‖/ℓ_(1) < ∞,
c) X and Y are continuous in probability, and
d) the conditions of corollary 2.1 are satisfied,

then for every B > 0,

    P[ 2{ ℒ_n(β̂(H_1))_T − ℒ_n(β̂(H_0))_T } − K > B ] → 1  as n → ∞.

PROOF:
The conclusion above will follow if the conditions of Theorem 3.4 are satisfied. Choose 𝔅′ as in corollary 2.1 (i.e., for γ > 0, [ inf_{s∈[0,T]} β_0(s) − γ, sup_{s∈[0,T]} β_0(s) + γ ] ⊂ 𝔅′). Since

    E[ sup_{(b,t)∈𝔅′×[0,T]} e^{b X_t} Y_t X_t⁴ ] < ∞,

the dominated convergence theorem implies that S^j(b,s), j = 0,1,2, are continuous on 𝔅′, and by Theorem III.1 in Appendix III of Andersen and Gill (1982), c) of Theorem 3.4 is satisfied.
Also by the dominated convergence theorem,

    (∂/∂b) S^0(b,s) = S^1(b,s)  and  (∂/∂b) S^1(b,s) = S^2(b,s)  for (b,s) ∈ 𝔅′ × [0,T].
To show that condition d) of Theorem 3.4 holds, lemma 3.5 can be used. Recall that in corollary 2.1 it was shown that

    inf_{(b,s)∈𝔅′×[0,T]} S^0(b,s) > 0  and  inf_{s∈[0,T]} V(β_0,s) > 0.

Also, since

    V(b,s) = E[ e^{b X_s} Y_s ( X_s − E(b,s) )² ] / S^0(b,s)

(E(b,s) is defined in corollary 2.1), V(b,s) ≥ 0. Assumption c) implies that S^0(b,s) and S^1(b,s) are continuous on 𝔅′ × [0,T]. To see this, suppose there exists a sequence (b_k, s_k) → (b,s) ∈ 𝔅′ × [0,T] as k → ∞. Then

    lim_k E[ exp(b_k X_{s_k}) Y_{s_k} X_{s_k} − exp(b X_s) Y_s X_s ]
      = E[ lim_k ( exp(b_k X_{s_k}) Y_{s_k} X_{s_k} − exp(b X_s) Y_s X_s ) ]
      = E[ e^{b X_{s+}} Y_{s+} X_{s+} − e^{b X_s} Y_s X_s ]
      = 0,

since X, Y continuous in probability implies that for each s, X_{s+} = X_s a.s. and Y_{s+} = Y_s a.s. A similar proof can be used to show continuity of S^0.
3.7 Lemmas 3.1, 3.2, 3.3, 3.4 and 3.5

Lemma 3.1

Assume A, C1, D1, and lim_n ‖ℓ‖/ℓ_(1) < ∞. Define

    σ̂_i² = − n^{-1} (∂²/∂β_i²) ℒ_n(β̄^n) − (2n)^{-1} (∂³/∂β_i³) ℒ_n(β*)( β̂_i − β̄_i )  and

    σ_i² = ∫_0^T I_i(s) V(β_0,s) S^0(β_0,s) λ_0(s) ds;

then on { ‖β̂^n − β̄^n‖ < ½γ } the following holds.

PROOF:
    Σ_{i=1}^K ( σ̂_i² − σ_i² )²
    ≤ 2 Σ_{i=1}^K [ n^{-1} ∫_0^T I_i(s) V_n(β̄^n,s) dN_s(·) − σ_i² ]²
      + 2 Σ_{i=1}^K [ (2n)^{-1} (∂³/∂β_i³) ℒ_n(β*)( β̂_i − β̄_i ) ]²
    ≤ 4 Σ_{i=1}^K ( n^{-1} ∫_0^T I_i(s) V_n(β̄^n,s) dM_s(·) )²
      + 4 Σ_{i=1}^K ( ∫_0^T I_i(s)[ V_n(β̄^n,s) S_n^0(β_0,s) − V(β_0,s) S^0(β_0,s) ] λ_0(s) ds )²
      + .5 ‖β̂^n − β̄^n‖² max_{1≤i≤K} sup_{‖β*−β̄^n‖<½γ} ( n^{-1} (∂³/∂β_i³) ℒ_n(β*) )².

Using Lenglart's inequality (Lenglart, 1977) it is easy to show (using lemma 3.2) that

    Σ_{i=1}^K ( σ_i² − σ̂_i² )² = O_p( (‖ℓ‖⁴ n)^{-1} ) + ‖β̂^n − β̄^n‖² O_p(‖ℓ‖⁴)
      + O(‖ℓ‖⁴) Σ_{i=1}^K ( ℓ_i^{-1} ∫_0^T I_i(s)[ V_n(β̄^n,s) S_n^0(β_0,s) − V(β_0,s) S^0(β_0,s) ] λ_0(s) ds )².
All that is left is to prove that the remaining sum is small. Now

    Σ_{i=1}^K ( ℓ_i^{-1} ∫_0^T I_i(s)[ V_n(β̄^n,s) S_n^0(β_0,s) − V(β_0,s) S^0(β_0,s) ] λ_0(s) ds )²
    ≤ 2 Σ_{i=1}^K ( ℓ_i^{-1} ∫_0^T I_i(s)[ V_n(β̄^n,s) − V_n(β_0,s) ] S_n^0(β_0,s) λ_0(s) ds )²   (3.24)
      + 2 Σ_{i=1}^K ( ℓ_i^{-1} ∫_0^T I_i(s)[ V_n(β_0,s) S_n^0(β_0,s) − V(β_0,s) S^0(β_0,s) ] λ_0(s) ds )².

Using lemma 2.2, it is easy to show that the first term on the RHS of (3.24) is O_p(‖ℓ‖²). The second term,

    Σ_{i=1}^K ( ℓ_i^{-1} ∫_0^T I_i(s)[ V_n(β_0,s) S_n^0(β_0,s) − V(β_0,s) S^0(β_0,s) ] λ_0(s) ds )²,

can be divided up into terms such as

    Σ_{i=1}^K ( ℓ_i^{-1} ∫_0^T I_i(s) |S_n^j(β_0,s) − S^j(β_0,s)| ds )² O_p(1),   j = 0,1,2,

by lemma 2.2 and the fact that inf_{0≤s≤T} S^0(β_0,s) > 0. The proof will be concluded if, for j = 0,1,2,

    Σ_{i=1}^K ( ℓ_i^{-1} ∫_0^T I_i(s) |S_n^j(β_0,s) − S^j(β_0,s)| ds )² = O_p( (n‖ℓ‖⁴)^{-1} ).

The above LHS is less than or equal to

    Σ_{i=1}^K ℓ_i^{-2} ℓ_i ∫_0^T I_i(s) ( S_n^j(β_0,s) − S^j(β_0,s) )² ds.

Using A2 and lim_n ‖ℓ‖/ℓ_(1) < ∞ yields the desired result. □
Lemma 3.2.

Assume A, C, D and lim_n ‖ℓ‖/ℓ_(1) < ∞; then

    sup_{0≤t≤T} | √n ∫_0^t [ Σ_{i=1}^K I_i(s) ℓ_i σ_i^{-2} ] { E_n(β̄^n,s) − E_n(β_0,s) } S_n^0(β_0,s) λ_0(s) ds | = o_p(1).

PROOF: Using a Taylor series on β̄^n(s) about β_0(s) at each s results in

    E_n(β̄^n,s) − E_n(β_0,s) = ( β̄^n(s) − β_0(s) ) V_n(β̃,s),

where |β̃(s) − β_0(s)| ≤ |β_0(s) − β̄^n(s)|, and subsequently the quantity above is bounded by

    sup_{0≤t≤T} | √n ∫_0^t [ Σ_{i=1}^K I_i(s) ℓ_i σ_i^{-2} ] ( β_0(s) − β̄^n(s) ) V_n(β̃,s) S_n^0(β_0,s) λ_0(s) ds |
      + √n ∫_0^T ( β_0(s) − β̄^n(s) )² ds · O_p(1)   (by A, C1).

Using the definition of β̄^n it is easy to see that the second term above is O_p(√n ‖ℓ‖⁴). As for the first term,

    sup_{0≤t≤T} | √n ∫_0^t [ Σ_{i=1}^K I_i(s) ℓ_i σ_i^{-2} ] ( β_0(s) − β̄^n(s) ) V_n(β̃,s) S_n^0(β_0,s) λ_0(s) ds |
    ≤ sup_{0≤t≤T} | √n ∫_0^t ( β_0(s) − β̄^n(s) ) ds |
      + sup_{0≤t≤T} √n ∫_0^t | β_0(s) − β̄^n(s) | · | [ Σ_{i=1}^K I_i(s) ℓ_i σ_i^{-2} ] V_n(β̃,s) S_n^0(β_0,s) λ_0(s) − 1 | ds.

The first term above, sup_{0≤t≤T} |√n ∫_0^t (β_0(s) − β̄^n(s)) ds|, has already been shown to be O(√n ‖ℓ‖⁴). The second term is equal to

    √n ∫_0^T | β_0(s) − β̄^n(s) | · | V_n(β̃,s) S_n^0(β_0,s) λ_0(s) − [ Σ_{i=1}^K I_i(s) ℓ_i^{-1} σ_i² ] | ds · O(1)
    ≤ √n ∫_0^T | β_0(s) − β̄^n(s) | · | V_n(β̃,s) S_n^0(β_0,s) − V(β_0,s) S^0(β_0,s) | λ_0(s) ds · O(1)
      + √n ∫_0^T | β_0(s) − β̄^n(s) | · | V(β_0,s) S^0(β_0,s) λ_0(s) − [ Σ_{i=1}^K I_i(s) ℓ_i^{-1} σ_i² ] | ds · O(1)
    = O_p(√n ‖ℓ‖²) ∫_0^T | V_n(β̃,s) S_n^0(β_0,s) − V(β_0,s) S^0(β_0,s) | ds + O_p(√n ‖ℓ‖⁴)

by D4. Using A2 and C1 results in

    ∫_0^T | V_n(β̃,s) S_n^0(β_0,s) − V(β_0,s) S^0(β_0,s) | ds = O_p(n^{-1/2}). □
Lemma 3.3

Assume

a) β_0 is a constant function,
b) lim_n n‖ℓ‖⁴ = ∞, lim_n ‖ℓ‖² = 0, and
c) lim_n ‖ℓ‖/ℓ_(1) < ∞, A1, A3, C.

1) Consider, for β ∈ ℝ^K,

    ℒ_n(β) = Σ_{j=1}^K Σ_{i=1}^n ∫_0^T I_j(s)[ β_j X_s(i) − Ln S_n^0(β_j,s) ] dN_s(i).

Let β̂ be the maximizer of ℒ_n(β). Then

    Σ_{j=1}^K [ β̂_j − β_0 ]² = O_p( (n‖ℓ‖⁴)^{-1} ).

2) Consider, for β ∈ ℝ and j = 1,…,K,

    ℒ_n(β)_j = Σ_{v=1}^j Σ_{i=1}^n ∫_0^T I_v(s)[ β X_s(i) − Ln S_n^0(β,s) ] dN_s(i).

Let β̃_j be the maximizer of ℒ_n(β)_j, j = 1,…,K. Then

    max_{1≤j≤K} | β̃_j − β_0 | = o_p(1).

PROOF: Essentially the proofs of 1) and 2) use the same techniques as Theorem 2.1. That is, lemma 2 of Aitchison and Silvey (1958) combined with Taylor series arguments are used.
Since min_{1≤j≤K} −(∂²/∂β_j²) ℒ_n(β) ≥ 0, then by lemma 2 of Aitchison and Silvey (1958), 1) will be proved once (3.25) below is established. Following the same argument as in the proof of Theorem 2.1, but expanding about the K-dimensional vector with β_0 for each coordinate instead of β̄^n,

    (nℓ_j)^{-1} (∂/∂β_j) ℒ_n(β)
      = (nℓ_j)^{-1} (∂/∂β_j) ℒ_n(β_0) + (nℓ_j)^{-1} (∂²/∂β_j²) ℒ_n(β_0)( β_j − β_0 )
        + .5 (nℓ_j)^{-1} (∂³/∂β_j³) ℒ_n(β*)( β_j − β_0 )²,   (3.25)

with ‖β* − β_0‖ ≤ ‖β − β_0‖. Choose β ∈ ℝ^K so that ‖β − β_0‖² = (‖ℓ‖⁴ n)^{-1}. Then to prove (3.25) it is sufficient to show that

    max_{1≤j≤K} | (nℓ_j)^{-1} (∂/∂β_j) ℒ_n(β_0) |,
    max_{1≤j≤K} | (nℓ_j)^{-1} (∂²/∂β_j²) ℒ_n(β_0) + ℓ_j^{-1} σ_j² |,  and
    max_{1≤j≤K} | (nℓ_j)^{-1} (∂³/∂β_j³) ℒ_n(β*) | ‖β − β_0‖

are appropriately small. Following the proof of 1) in lemma 2.1 results in the bound on the first maximum, by A1, A3 and C1. Again following the proof of 2) in lemma 2.1 gives

    max_{1≤j≤K} | (nℓ_j)^{-1} (∂²/∂β_j²) ℒ_n(β_0) + ℓ_j^{-1} σ_j² | = O_p( (n‖ℓ‖⁴)^{-1/2} ) + o_p(1).

The proof of 3) in lemma 2.1 can be imitated to yield

    max_{1≤j≤K} | (nℓ_j)^{-1} (∂³/∂β_j³) ℒ_n(β*) | = O_p(1),

and (3.25) is proved.
Let ε > 0. Denote Σ_{v=1}^j ℓ_v by a_j, j = 1,…,K, and set a_0 = 0. If

    P[ ∩_{j=1}^K { sup_{β∈ℝ, |β−β_0|=ε} (a_j n)^{-1} (∂/∂β) ℒ_n(β)_j · ( β − β_0 ) < 0 } ] → 1  as n → ∞,   (3.26)

then as in the proof of 1),

    P[ ∃ β̃_j with |β̃_j − β_0| < ε and (∂/∂β) ℒ_n(β̃_j)_j = 0, j = 1,…,K ] → 1  as n → ∞,

and 2) holds. Using a Taylor series about β_0 results in

    (a_j n)^{-1} (∂/∂β) ℒ_n(β)_j ( β − β_0 )
      = (a_j n)^{-1} (∂/∂β) ℒ_n(β_0)_j ( β − β_0 ) + (a_j n)^{-1} (∂²/∂β²) ℒ_n(β_0)_j ( β − β_0 )²
        + (2 a_j n)^{-1} (∂³/∂β³) ℒ_n(β*_j)_j ( β − β_0 )³,

where |β*_j − β_0| ≤ |β − β_0|, j = 1,…,K. So that for |β − β_0| = ε, j = 1,…,K,

    (a_j n)^{-1} (∂/∂β) ℒ_n(β)_j ( β − β_0 )
      ≤ ε² [ max_{1≤j≤K} ε^{-1} | (a_j n)^{-1} (∂/∂β) ℒ_n(β_0)_j | + (a_j n)^{-1} (∂²/∂β²) ℒ_n(β_0)_j
             + max_{1≤j≤K} | (2 a_j n)^{-1} (∂³/∂β³) ℒ_n(β*_j)_j | ε ]   (by C2).

The above relation implies that if

    max_{1≤j≤K} | (a_j n)^{-1} (∂/∂β) ℒ_n(β_0)_j | = o_p(1),
    max_{1≤j≤K} | (a_j n)^{-1} (∂²/∂β²) ℒ_n(β_0)_j + a_j^{-1} Σ_{v=1}^j σ_v² | = o_p(1),  and
    max_{1≤j≤K} | (2 a_j n)^{-1} (∂³/∂β³) ℒ_n(β*_j)_j | = O_p(1),

then (3.26) will be proved. Consider first

    max_{1≤j≤K} | (a_j n)^{-1} (∂/∂β) ℒ_n(β_0)_j |
      = max_{t∈{a_1,…,a_K}} | t^{-1} n^{-1} Σ_{i=1}^n ∫_0^t ( X_s(i) − E_n(β_0,s) ) dM_s(i) |
      ≤ ℓ_(1)^{-1} n^{-1/2} sup_{t∈[0,T]} | n^{-1/2} Σ_{i=1}^n ∫_0^t ( X_s(i) − E_n(β_0,s) ) dM_s(i) |
      = ℓ_(1)^{-1} n^{-1/2} O_p(1)

by an application of Lenglart's inequality and A1, C1. Next consider

    max_{1≤j≤K} | (a_j n)^{-1} (∂²/∂β²) ℒ_n(β_0)_j + a_j^{-1} Σ_{v=1}^j σ_v² |
      ≤ max_{t∈{a_1,…,a_K}} | t^{-1} ∫_0^t [ V_n(β_0,s) S_n^0(β_0,s) − V(β_0,s) S^0(β_0,s) ] λ_0(s) ds |
        + max_{t∈{a_1,…,a_K}} | t^{-1} n^{-1} Σ_{i=1}^n ∫_0^t V_n(β_0,s) dM_s(i) |
      = o_p(1) + ℓ_(1)^{-1} n^{-1/2} O_p(1)

by arguments similar to those above. Choose γ̃ < γ (γ as in assumption A3). Then, on the event { max_{1≤j≤K} |β*_j − β_0| ≤ γ̃ },

    max_{1≤j≤K} | (2 a_j n)^{-1} (∂³/∂β³) ℒ_n(β*_j)_j |
      ≤ max_{t∈{a_1,…,a_K}} | t^{-1} n^{-1} ∫_0^t dN_s(·) | O_p(1)   (by A3)
      ≤ max_{t∈{a_1,…,a_K}} | t^{-1} ∫_0^t S_n^0(β_0,s) λ_0(s) ds | O_p(1) + o_p(1)
      = O_p(1)

by arguments similar to those above. Therefore

    P[ ∩_{j=1}^K { sup_{β∈ℝ, |β−β_0|=ε} (a_j n)^{-1} (∂/∂β) ℒ_n(β)_j ( β − β_0 ) < 0 } ] → 1  as n → ∞, for ε < γ,

which implies (by Aitchison and Silvey's lemma 2 (1958)) that

    P[ ∃ β̃_j with |β̃_j − β_0| < ε and (∂/∂β) ℒ_n(β̃_j)_j = 0, j = 1,…,K ] → 1  as n → ∞, for ε < γ,

j = 1,…,K, and 2) is proved. □
Lemma 3.4

Assume

a) lim_n ℓ_(1) √n = ∞, and
b) A1, A3, C1;

then, for the β̃_j of lemma 3.3,

1)    sup_{s∈[0,T]} | Σ_{j=1}^K I_j(s) β̃_j − β_0(s) | = o_p(1)  as n → ∞,  and

2)    max_{1≤j≤K} | ℓ_j^{-1} n^{-1} ∫_0^T I_j(s) V_n(β̃_j,s) dN_s(·) − ℓ_j^{-1} ∫_0^T I_j(s) V(β_0,s) S^0(β_0,s) λ_0(s) ds | = o_p(1).
PROOF: Define a_j = Σ_{v=1}^j ℓ_v and a_0 = 0. Then fix s ∈ I_j. A Taylor series about β_0(s) gives

    V_n(β̃_j,s) = V_n(β_0(s),s) + (∂/∂b) V_n(b*,s)( β̃_j − β_0(s) ),

where |b* − β_0(s)| ≤ |β̃_j − β_0(s)|. Recall that sup_{s∈[0,T]} sup_{|b−β_0(s)|<γ} |(∂/∂b) V_n(b,s)| is controlled; therefore

    sup_{t∈[0,T]} | ∫_0^t I_j(s) V_n(β̃_j,s) dN_s(·) − ∫_0^t I_j(s) V_n(β_0,s) dN_s(·) |
      ≤ sup_{t∈[0,T]} ∫_0^t I_j(s) | β̃_j − β_0(s) | dN_s(·) · O_p(1).   (3.27)

To prove 1) consider

    max_{t∈{a_1,…,a_K}} | t^{-1} n^{-1} ∫_0^t Σ_{j=1}^K I_j(s) V_n(β̃_j,s) dN_s(·) − t^{-1} ∫_0^t V(β_0,s) S^0(β_0,s) λ_0(s) ds |
    ≤ max_{t∈{a_1,…,a_K}} | t^{-1} n^{-1/2} · n^{-1/2} ∫_0^t V_n(β_0,s) dM_s(·) |
      + max_{t∈{a_1,…,a_K}} | t^{-1} ∫_0^t ( V_n(β_0,s) S_n^0(β_0,s) − V(β_0,s) S^0(β_0,s) ) λ_0(s) ds |
      + sup_{s∈[0,T]} | Σ_{j=1}^K I_j(s)( β̃_j − β_0(s) ) | max_{t∈{a_1,…,a_K}} [ t^{-1} ∫_0^t S_n^0(β_0,s) λ_0(s) ds + t^{-1} n^{-1} |M_t(·)| ] O_p(1)   (3.28)

(by 3.27). Applying Lenglart's inequality and assuming A1, C1 one gets that the first term on the RHS of (3.28) is ℓ_(1)^{-1} n^{-1/2} O_p(1). The second term on the RHS of (3.28) is o_p(1) by A1. Again applying Lenglart's inequality and assuming A1, C1 shows that the third term on the RHS of (3.28) is o_p(1); hence 1) is proved.

To prove 2) consider

    max_{1≤j≤K} | ℓ_j^{-1} n^{-1} ∫_0^T I_j(s) V_n(β̃_j,s) dN_s(·) − ℓ_j^{-1} ∫_0^T I_j(s) V(β_0,s) S^0(β_0,s) λ_0(s) ds |
    ≤ max_{1≤j≤K} | ℓ_j^{-1} n^{-1} ∫_0^T I_j(s) dM_s(·) | O_p(1)
      + max_{1≤j≤K} | ℓ_j^{-1} ∫_0^T I_j(s) V_n(β_0,s)[ S_n^0(β_0,s) − S^0(β_0,s) ] λ_0(s) ds |
      + sup_{s∈[0,T]} | Σ_{j=1}^K I_j(s)( β̃_j − β_0(s) ) | max_{1≤j≤K} ℓ_j^{-1} [ ∫_0^T I_j(s) S^0(β_0,s) λ_0(s) ds ] O_p(1)
      + max_{1≤j≤K} | ℓ_j^{-1} ∫_0^T I_j(s)[ V_n(β_0,s) − V(β_0,s) ] S^0(β_0,s) λ_0(s) ds |.

The proof that the above is o_p(1) is as in the proof of 1). □
Lemma 3.5

Assume

a) β_0 is nonconstant and is continuous in s ∈ [0,T];
b) there exists 𝔅, an open set containing [ inf_{s∈[0,T]} β_0(s), sup_{s∈[0,T]} β_0(s) ], such that for (b,s) ∈ 𝔅 × [0,T]: (∂/∂b) S^0(b,s) = S^1(b,s); (∂²/∂b²) S^0(b,s) = S^2(b,s); S^0(b,s) > 0; V(b,s) ≥ 0; and S^0(b,s) and S^1(b,s) are continuous on [0,T] for each b ∈ 𝔅;
c) V(β_0,s) > 0 for each s ∈ [0,T]; λ_0(s) > 0 for each s ∈ [0,T]; and ∫_0^T λ_0(s) ds < ∞.

Then for

    KL(b) = ∫_0^T { ( β_0(s) − b ) E(β_0,s) + Ln[ S^0(b,s) / S^0(β_0,s) ] } S^0(β_0,s) λ_0(s) ds

the following holds for each b_L < inf_{s∈[0,T]} β_0(s) and each b_U > sup_{s∈[0,T]} β_0(s) with [b_L, b_U] ⊂ 𝔅:

1)    inf_{b≥b_U} KL(b) ≥ KL(b_U)  and  inf_{b≤b_L} KL(b) ≥ KL(b_L),

and

2)    there exists δ > 0 such that  inf_{b∈[b_L,b_U]} KL(b) ≥ δ > 0.
PROOF: Fix s ∈ [0,T]. Define g(b,s) = S^0(b,s) / S^0(β_0(s),s); then g(β_0(s),s) = 1, and for each b ∈ 𝔅, g(b,s) > 0. In the following a ′ denotes derivatives with respect to b. Consider

    f(b,s) = ( β_0(s) − b ) E(β_0,s) + Ln[ S^0(b,s) / S^0(β_0,s) ]
           = ( β_0(s) − b ) g′(β_0(s),s) + Ln[ g(b,s) ].

So

    f′(b,s) = −g′(β_0(s),s) + g′(b,s)/g(b,s),  and
    f″(b,s) = g″(b,s)/g(b,s) − ( g′(b,s)/g(b,s) )² = V(b,s) ≥ 0,

and f″(β_0(s),s) = V(β_0,s) > 0. Since f″(b,s) is nonnegative for each b and is positive at β_0(s), f(b,s) is a convex function in b ∈ 𝔅 with a unique minimum at β_0(s): f(β_0(s),s) = 0. Also this implies that for each b > β_0(s), f′(b,s) > f′(β_0(s),s) = 0; likewise for each b < β_0(s), f′(b,s) < f′(β_0(s),s) = 0. This can be carried out for each s ∈ [0,T]; hence the nonnegativity of S^0(β_0,s) λ_0(s) implies that KL(b) is a convex function on 𝔅.
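The convexity argument can be illustrated on a toy discrete sample (a numerical aside, not part of the proof; the covariate law, the at-risk probability, and β_0 = 0.5 are all assumed values). With S^0 and S^1 replaced by empirical means at a fixed time s, f(b) vanishes at β_0 and is positive on either side:

```python
import math
import random

rng = random.Random(1)
# toy covariates X and at-risk indicators Y at a fixed time s (assumed data)
X = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
Y = [1.0 if rng.random() < 0.8 else 0.0 for _ in range(5000)]
b0 = 0.5  # plays the role of beta_0(s)

def S0(b):
    # empirical version of S^0(b,s) = E[ e^{b X_s} Y_s ]
    return sum(math.exp(b * x) * y for x, y in zip(X, Y)) / len(X)

def S1(b):
    # empirical version of S^1(b,s) = E[ e^{b X_s} Y_s X_s ]
    return sum(math.exp(b * x) * y * x for x, y in zip(X, Y)) / len(X)

def f(b):
    # integrand of KL: (beta_0 - b) E(beta_0,s) + log( S^0(b)/S^0(beta_0) )
    return (b0 - b) * S1(b0) / S0(b0) + math.log(S0(b) / S0(b0))

# unique minimum f(beta_0) = 0, with f growing on both sides (convexity in b)
print(f(b0), f(b0 - 1.0), f(b0 + 1.0))
```

Since the empirical f inherits f″ = V ≥ 0 with V > 0 for a nondegenerate covariate, the minimum at β_0 is strict, mirroring the displayed argument.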
Let b ∈ 𝔅, b > sup_{s∈[0,T]} β_0(s). Choose h > 0 small enough so that b+h ∈ 𝔅; then by the mean value theorem there exists b*(h,s) ∈ [b, b+h] such that

    ( f(b+h,s) − f(b,s) ) / h = f′(b*(h,s),s) ≥ f′(b,s) > f′(β_0(s),s) = 0.

Therefore, for b ∈ 𝔅, b > sup_{s∈[0,T]} β_0(s),

    lim_{h→0} ( KL(b+h) − KL(b) ) / h ≥ ∫_0^T f′(b,s) S^0(β_0,s) λ_0(s) ds > 0.   (3.29)

Similarly let b ∈ 𝔅, b < inf_{s∈[0,T]} β_0(s). Choose h > 0 small enough so that b−h ∈ 𝔅; then by the mean value theorem there exists b*(h,s) ∈ [b−h, b] so that

    ( f(b,s) − f(b−h,s) ) / h = f′(b*(h,s),s) ≤ f′(b,s) < f′(β_0(s),s) = 0.

As before, this implies that for b ∈ 𝔅, b < inf_{s∈[0,T]} β_0(s),

    lim_{h→0} ( KL(b) − KL(b−h) ) / h ≤ ∫_0^T f′(b,s) S^0(β_0,s) λ_0(s) ds < 0.   (3.30)

For b_U > sup_{s∈[0,T]} β_0(s) and b_L < inf_{s∈[0,T]} β_0(s), b_U, b_L ∈ 𝔅, (3.29) and (3.30) imply that

    inf_{b≥b_U} KL(b) ≥ KL(b_U)  and  inf_{b≤b_L} KL(b) ≥ KL(b_L),

and 1) is proved.
To prove 2), first choose b_U > sup_{s∈[0,T]} β_0(s) and b_L < inf_{s∈[0,T]} β_0(s), b_U, b_L ∈ 𝔅; then choose ε small enough so that there exist s*, s′ for which b_U − ε > β_0(s*) > ε + β_0(s′) > ε + b_L (this can be done if β_0 is nonconstant and continuous). Recall that for a convex function on a compact set, say g, with a unique minimum, say at g(x_0), the following holds: given ε > 0 there exists δ > 0 such that if g(x) − g(x_0) < δ then |x − x_0| < ε. Therefore, for ε > 0 there exists δ > 0 such that if |b − β_0(s*)| > ε/2 then f(b,s*) > δ, and if |b − β_0(s′)| > ε/2 then f(b,s′) > δ. However,

    f(b,s) = ( β_0(s) − b ) E(β_0,s) + Ln[ S^0(b,s) / S^0(β_0,s) ]

is uniformly continuous in (b,s) on [b_L, b_U] × [0,T]. This implies the existence of balls around s*, s′, i.e. 𝔅(s*), 𝔅(s′), for which

    inf_{s∈𝔅(s*)} f(b,s) > δ/2  when |b − β_0(s*)| > ε/2,  and  inf_{s∈𝔅(s′)} f(b,s) > δ/2  when |b − β_0(s′)| > ε/2.

Then

    ∫_0^T [ ( β_0(s) − b ) E(β_0,s) + Ln( S^0(b,s) / S^0(β_0,s) ) ] S^0(β_0,s) λ_0(s) ds
    ≥ max[ I{ b : |b − β_0(s*)| > ε/2 } ∫_0^T I_{𝔅(s*)}(u) S^0(β_0,s) λ_0(s) ds · δ/2,
           I{ b : |b − β_0(s′)| > ε/2 } ∫_0^T I_{𝔅(s′)}(u) S^0(β_0,s) λ_0(s) ds · δ/2 ].

This proves 2), since

    inf_{b∈[b_L,b_U]} KL(b) ≥ inf_{b∈[b_L,b_U]} max[ I{ b : |b − β_0(s*)| > ε/2 } ∫_0^T I_{𝔅(s*)}(u) S^0(β_0,s) λ_0(s) ds · δ/2,
                                                     I{ b : |b − β_0(s′)| > ε/2 } ∫_0^T I_{𝔅(s′)}(u) S^0(β_0,s) λ_0(s) ds · δ/2 ] > 0

(since S^0(β_0,s) λ_0(s) > 0 for s ∈ [0,T] and I{ b : |b − β_0(s*)| > ε/2 } + I{ b : |b − β_0(s′)| > ε/2 } ≥ 1). □
IV. LINEAR TEST STATISTICS AND GENERALIZATIONS

4.1 A General Linear Statistic

As was seen in Theorem 3.1, the {β̂^n(s)}_{s∈[0,T]} behave as independent random variables with means {β_0(s)}_{s∈[0,T]}. In this situation, the optimal test statistics for inferences concerning the {β_0(s)}_{s∈[0,T]} are often linear test statistics (particularly if the random variables are multivariate normal). Given K random variables Z_1, …, Z_K, a linear statistic is of the form Σ_{i=1}^K w_i Z_i where the w_i's are appropriate weights. In the situation here, the analogous statistic is ∫_0^T w(s) β̂^n(s) ds.

Gill (1980) uses a linear statistic for inference in the multiplicative model of Aalen (1978) (λ_s(i) = β(s) X_s(i)). In this model ∫_0^t β̂(s) ds is given by ∫_0^t [X_s(·)]^{-1} dN_s(·), which when suitably normalized will converge weakly to a continuous Gaussian martingale (Gill, 1980). Gill then considers test statistics of the form

    ∫_0^T w(s) β̂(s) ds = ∫_0^T w(s) [X_s(·)]^{-1} dN_s(·).

Since w may involve unknown quantities, w will have to be estimated, say by w_n. In this section conditions for convergence of w_n to w are given so that the linear statistic has the appropriate asymptotic distribution. In section 4.2, the optimal choice of w is addressed.

Also in this chapter, σ²(s) is given by V(β_0,s) S^0(β_0,s) λ_0(s) and

    σ̂_n²(s) = − Σ_{j=1}^K I_j(s) (ℓ_j n)^{-1} (∂²/∂β_j²) ℒ_n(β̂^n)

for s ∈ [0,T].
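With everything piecewise constant on the K sieve intervals, the linear statistic and its variance estimate reduce to finite sums. The following is a minimal sketch, not the thesis's computation: the per-interval values below are hypothetical, whereas in practice β̂_j and σ̂_j² would come from maximizing the partial likelihood and from the observed information.

```python
import math

def linear_statistic(beta_hat, sigma2_hat, lengths, w, n):
    """sqrt(n) * \\int w(s) beta_hat(s) ds and the variance estimate
    \\int w(s)^2 / sigma2_hat(s) ds, with all functions piecewise
    constant on the K sieve intervals."""
    LS = math.sqrt(n) * sum(wj * bj * lj
                            for wj, bj, lj in zip(w, beta_hat, lengths))
    V = sum(wj**2 * lj / s2
            for wj, s2, lj in zip(w, sigma2_hat, lengths))
    return LS, V

# hypothetical values on K = 4 equal intervals of [0, T] with T = 1
beta_hat = [0.9, 1.1, 1.0, 1.2]
sigma2_hat = [2.0, 2.0, 2.5, 2.5]
lengths = [0.25, 0.25, 0.25, 0.25]
w = [-1.0, -0.5, 0.5, 1.0]   # mean-zero weights, as required later for testing
LS, V = linear_statistic(beta_hat, sigma2_hat, lengths, w, n=400)
print(LS / math.sqrt(V))     # standardized statistic, asymptotically N(0,1) under H_0
```

The standardization by the estimated variance anticipates Theorem 4.2 below.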
Theorem 4.1

Assume

a) w has at most a finite number of discontinuities on [0,T] and ‖ℓ‖^{-2} ∫_0^T ( w_n(s) − w(s) )² ds = o_p(1),
b) lim_n n‖ℓ‖⁴ = ∞, lim_n ‖ℓ‖² = 0,
c) A, B, C, D3, lim_n ‖ℓ‖/ℓ_(1) < ∞,
d) and, if β_0 is nonconstant, lim_n n‖ℓ‖⁶ < ∞, D2, D4, and
e) w is Lipschitz continuous between discontinuities on [0,T];

then

    √n ∫_0^T w_n(s)[ β̂^n(s) − β_0(s) ] ds
      = n^{-1/2} Σ_{i=1}^n ∫_0^T Σ_{j=1}^K I_j(s) σ_j^{-2} ∫_0^T I_j(u) w(u) du ( X_s(i) − E_n(β̄^n,s) ) dM_s(i) + o_p(1),

where σ_j² = ∫_0^T I_j(s) σ²(s) ds, and

    √n ∫_0^T w_n(s)[ β̂^n(s) − β_0(s) ] ds  ⇒  N( 0, ∫_0^T w²(u)[ σ²(u) ]^{-1} du ).
PROOF: Consider

    √n ∫_0^T w_n(s)( β̂^n(s) − β_0(s) ) ds
      = √n ∫_0^T ( w_n(s) − w(s) )( β̂^n(s) − β_0(s) ) ds + √n ∫_0^T w(s)( β̂^n(s) − β_0(s) ) ds.

The first term on the RHS above can be shown to be o_p(1) as follows:

    | √n ∫_0^T ( w_n(s) − w(s) )( β̂^n(s) − β_0(s) ) ds |
    ≤ [ ‖ℓ‖^{-2} ∫_0^T ( w_n(s) − w(s) )² ds · n‖ℓ‖² ∫_0^T ( β̂^n(s) − β_0(s) )² ds ]^{1/2}
    = [ o_p(1) O_p(1) ]^{1/2}   (by Theorem 2.1 or lemma 3.3)
    = o_p(1).

Therefore

    √n ∫_0^T w_n(s)( β̂^n(s) − β_0(s) ) ds
      = √n ∫_0^T w(s)( β̂^n(s) − β_0(s) ) ds + o_p(1)
      = √n ∫_0^T w(s)( β̂^n(s) − β̄^n(s) ) ds + √n ∫_0^T w(s)( β̄^n(s) − β_0(s) ) ds + o_p(1).   (4.1)

The second term on the RHS of (4.1) is identically zero if β_0 is a constant function; otherwise, if w is piecewise Lipschitz continuous, it is O(n^{1/2} ‖ℓ‖⁴).
To see this consider

    ∫_0^T w(s)( β̄^n(s) − β_0(s) ) ds
      = Σ_{j=1}^K ∫_0^T I_j(s) w(s) [ ∫_0^T I_j(u)( β_0(u) − β_0(s) ) σ²(u) du / ∫_0^T I_j(u) σ²(u) du ] ds.

By D2, there exists L > 0 for which

    | β_0(u) − β_0(s) − β_0′(s)(u − s) | ≤ L(u − s)²  a.e. Lebesgue.

Therefore

    | ∫_0^T w(s)( β̄^n(s) − β_0(s) ) ds |
    ≤ | Σ_{j=1}^K ℓ_j^{-1} ∫_0^T I_j(s) w(s) β_0′(s) ∫_0^T I_j(u)(u − s) σ²(u) du ds | O(1)
      + | Σ_{j=1}^K ℓ_j^{-1} ∫_0^T I_j(s) w(s) ∫_0^T I_j(u)(u − s)² du ds | O(1).

By D4, for u ∈ I_j there exists L′ > 0 for which

    | σ²(u) − σ²(a_{j-1}) | ≤ L′ |u − a_{j-1}|  a.e. Lebesgue

(recall that a_j = Σ_{i=1}^j ℓ_i, a_0 = 0). Then

    | ∫_0^T w(s)( β̄^n(s) − β_0(s) ) ds | ≤ | Σ_{j=1}^K ℓ_j^{-1} ∫_0^T I_j(s) ℓ_j³ ds | O(1).

Again by D2,

    | β_0′(s) − β_0′(a_{j-1}) | ≤ L |s − a_{j-1}|  a.e. Lebesgue,

and by e) there exists L″ > 0 giving the corresponding Lipschitz bound for w on the intervals that do not contain a discontinuity point of w, so that

    | ∫_0^T w(s)( β̄^n(s) − β_0(s) ) ds | = O(‖ℓ‖⁴).

Therefore, by (4.1) and the above arguments,

    √n ∫_0^T w(s)( β̂^n(s) − β_0(s) ) ds = √n ∫_0^T w(s)( β̂^n(s) − β̄^n(s) ) ds + O(n^{1/2} ‖ℓ‖⁴).

Following the same steps as in Theorem 3.1 results in
    √n ∫_0^T w(s)( β̂^n(s) − β̄^n(s) ) ds
      = n^{-1/2} ∫_0^T Σ_{j=1}^K I_j(s) w(s) σ_j^{-2} Σ_{i=1}^n ∫_0^T I_j(u)( X_u(i) − E_n(β̄^n,u) ) dN_u(i) ds + o_p(1)

(by lemmas 2.1, 3.1). So

    √n ∫_0^T w(s)( β̂^n(s) − β̄^n(s) ) ds
      = n^{-1/2} Σ_{i=1}^n ∫_0^T Σ_{j=1}^K I_j(u) σ_j^{-2} ∫_0^T I_j(s) w(s) ds ( X_u(i) − E_n(β̄^n,u) ) dM_u(i)
        + n^{1/2} ∫_0^T Σ_{j=1}^K I_j(u) σ_j^{-2} ∫_0^T I_j(s) w(s) ds [ E_n(β_0,u) − E_n(β̄^n,u) ] S_n^0(β_0,u) λ_0(u) du + o_p(1).   (4.2)

The second term on the RHS of (4.2) can be shown, by lemma 3.2, to be o_p(1). This concludes the proof of the first assertion.
The next step is to show that √n ∫_0^T w(s)( β̂^n(s) − β̄^n(s) ) ds has the required asymptotic distribution. Define the process Z by

    Z_t = n^{-1/2} Σ_{i=1}^n ∫_0^t Σ_{j=1}^K I_j(u) σ_j^{-2} ∫_0^T I_j(s) w(s) ds ( X_u(i) − E_n(β̄^n,u) ) dM_u(i);

then

    √n ∫_0^T w(s)( β̂^n(s) − β̄^n(s) ) ds = Z_T + o_p(1).

Rebolledo's central limit theorem as stated in section 1.2 can be used to show that Z converges weakly to a Gaussian martingale.
First consider

    <Z>_t = ∫_0^t Σ_{j=1}^K I_j(u) σ_j^{-4} ( ∫_0^T I_j(s) w(s) ds )² [ S^2(β_0,u) − 2 E_n(β̄^n,u) S^1(β_0,u) + E_n²(β̄^n,u) S^0(β_0,u) ] λ_0(u) du + o_p(1)

by lemma 2.2 and A1. Then,

    <Z>_t = ∫_0^t Σ_{j=1}^K I_j(u) σ_j^{-4} ( ∫_0^T I_j(s)( w(s) − w(u) ) ds )² σ²(u) du
            + ∫_0^t Σ_{j=1}^K I_j(u) σ_j^{-4} ℓ_j² w²(u) σ²(u) du + o_p(1)   (by a),

so, by D4,

    <Z>_t = ∫_0^t σ^{-2}(u) w²(u) du + o_p(1).
To complete the proof all that is necessary is to verify the following Lindeberg condition:

    n^{-1} Σ_{i=1}^n ∫_0^T Σ_{j=1}^K I_j(u) σ_j^{-4} ( ∫_0^T I_j(s) w(s) ds )² ( X_u(i) − E_n(β̄^n,u) )²
      · I{ n^{-1/2} Σ_{j=1}^K I_j(u) σ_j^{-2} ( ∫_0^T I_j(s) w(s) ds ) | X_u(i) − E_n(β̄^n,u) | > ε }
        e^{β_0(u) X_u(i)} Y_u(i) λ_0(u) du = o_p(1)

for each ε > 0. The proof of this is virtually identical to the proof of the Lindeberg condition in Theorem 3.1 and is omitted. □
In order to formulate a statistical test, a consistent estimator for the variance of √n ∫_0^T w_n(s)[ β̂^n(s) − β_0(s) ] ds is needed. The next theorem furnishes this.

Theorem 4.2

Assume

a) ∫_0^T ( w_n(s) − w(s) )² ds = o_p(1),
b) lim_n n‖ℓ‖⁴ = ∞, lim_n ‖ℓ‖² = 0, and
c) A1, A3, C, D1, D3, lim_n ‖ℓ‖/ℓ_(1) < ∞;

then

    ∫_0^T w_n²(s) σ̂_n^{-2}(s) ds − ∫_0^T w²(s) σ^{-2}(s) ds = o_p(1).

PROOF: Recall that

    σ̂_n²(s) = − Σ_{j=1}^K I_j(s) (ℓ_j n)^{-1} (∂²/∂β_j²) ℒ_n(β̂^n)

and σ²(s) = V(β_0,s) S^0(β_0,s) λ_0(s). Then consider

    | ∫_0^T w_n²(s) σ̂_n^{-2}(s) ds − ∫_0^T w²(s) σ^{-2}(s) ds |.   (4.3)

By Theorem 3.2, sup_{s∈[0,T]} | σ̂_n²(s) − σ²(s) | = o_p(1). Note that by C2, inf_{s∈[0,T]} σ²(s) > 0. Therefore,

    | ∫_0^T w_n²(s) σ̂_n^{-2}(s) ds − ∫_0^T w²(s) σ^{-2}(s) ds |
    = ∫_0^T ( w_n(s) − w(s) )² ds O_p(1) + o_p(1) + { ∫_0^T ( w_n(s) − w(s) )² ds }^{1/2} O_p(1)
    = o_p(1)

by a). □
4.2 Efficacy of the Linear Statistic

Consider the statistical model given in section 2.4, but with the further assumption that ℱ_t^n = σ{ N_s^n(i), s ≤ t, i = 1,…,n } × ℱ_0^n, where ℱ_0^n ⊃ σ(𝒳^n) and ℱ_s^n ⊂ ℱ_t^n for s ≤ t. Let 𝒫_β^n be the probability under which N has stochastic intensity

    λ_s(i) = e^{( β_0(s) + β(s)/√n ) X_s^n(i)} Y_s^n(i) λ_0(s),   s ∈ [0,T], i = 1,…,n,

and let 𝒫_0^n denote 𝒫_β^n where β ≡ 0 (i.e., λ_s(i) = e^{β_0(s) X_s^n(i)} Y_s^n(i) λ_0(s), s ∈ [0,T], i = 1,…,n). Note that up to this point all asymptotic results given in this thesis are under 𝒫_0^n. On ℱ_0^n assume 𝒫_0^n = 𝒫_β^n a.s.

Since the X^n(i)'s are locally bounded, max_{1≤i≤n} sup_{s∈[0,T]} |X_s^n(i)| < ∞ a.s., which in turn implies that 𝒫_β^n ≪ 𝒫_0^n (see Kabanov et al., 1980), and

    Ln[ d𝒫_β^n / d𝒫_0^n ]_t = n^{-1/2} Σ_{i=1}^n ∫_0^t β(s) X_s(i) dN_s(i)
      + Σ_{i=1}^n ∫_0^t [ 1 − e^{β(s) n^{-1/2} X_s(i)} ] e^{β_0(s) X_s(i)} Y_s(i) λ_0(s) ds,   t ∈ [0,T].
If 𝒫_β^n is contiguous with respect to 𝒫_0^n then Le Cam's third lemma (Hajek & Sidak, 1967) can be used to choose a weight function which results in a linear test statistic with highest asymptotic power against the contiguous alternative β_0(s) + β(s) n^{-1/2}. This is the method Gill (1980) uses in order to formulate two-sample tests in survival analysis. In particular, this method leads to a generalization of the Wilcoxon test to censored data. In lemma 4.2, 𝒫_β^n is shown to be contiguous to 𝒫_0^n. This result is well known; see (Dzhaparidze, 1986) for a much more general proof in the point process setting. The theorem to follow gives joint asymptotic normality of √n ∫_0^T w_n(s)( β̂^n(s) − β_0(s) ) ds and Ln[ d𝒫_β^n / d𝒫_0^n ]_T. An application of Le Cam's third lemma is then possible and follows the theorem below.
Theorem 4.3

Assume

a) β(s) is bounded on [0,T],
b) w has at most a finite number of discontinuities in [0,T] and ‖ℓ‖^{-2} ∫_0^T ( w_n(s) − w(s) )² ds → 0 in 𝒫_0^n-probability,
c) lim_n n‖ℓ‖⁴ = ∞, lim_n ‖ℓ‖² = 0,
d) A, C, D3, lim_n ‖ℓ‖/ℓ_(1) < ∞, and, instead of B,

    max_{1≤i≤n} sup_{s∈[0,T]} n^{-1/2} |X_s(i)| Y_s(i) → 0  in 𝒫_0^n-probability,

and, if β_0 is nonconstant,

e) lim_n n‖ℓ‖⁶ < ∞, D2, D4, and
f) w is Lipschitz continuous between discontinuities on [0,T];

then

    ( √n ∫_0^T w_n(s)( β̂^n(s) − β_0(s) ) ds,  Ln[ d𝒫_β^n / d𝒫_0^n ]_T )

converges weakly under 𝒫_0^n to a bivariate normal vector with means (0, −σ_L²/2), variances σ_LS² and σ_L², and covariance σ_LLS, where

    σ_LS² = ∫_0^T w²(u)[ σ²(u) ]^{-1} du,
    σ_L²  = ∫_0^T β²(u) S^2(β_0,u) λ_0(u) du,  and
    σ_LLS = ∫_0^T w(u) β(u) du.
PROOF: Lemma 4.2 implies that

    Ln[ d𝒫_β^n / d𝒫_0^n ]_T = n^{-1/2} Σ_{i=1}^n ∫_0^T β(s) X_s(i) dM_s(i) − .5 ∫_0^T β²(s) S^2(β_0,s) λ_0(s) ds + o_{𝒫_0^n}(1),

and by Theorem 4.1,

    √n ∫_0^T w_n(s)( β̂^n(s) − β_0(s) ) ds
      = n^{-1/2} Σ_{i=1}^n ∫_0^T Σ_{j=1}^K I_j(s) σ_j^{-2} ∫_0^T I_j(u) w(u) du ( X_s(i) − E_n(β̄^n,s) ) dM_s(i) + o_{𝒫_0^n}(1).

Let

    Z_t^n = n^{-1/2} Σ_{i=1}^n ∫_0^t β(s) X_s(i) dM_s(i)

and

    X_t^n = n^{-1/2} Σ_{i=1}^n ∫_0^t Σ_{j=1}^K I_j(s) σ_j^{-2} ∫_0^T I_j(u) w(u) du ( X_s(i) − E_n(β̄^n,s) ) dM_s(i).

Then, using Rebolledo's central limit theorem in section 1.2, lemma 4.2 and Theorem 4.1, all that is necessary to conclude this proof is to show that

    <Z^n, X^n>_t → ∫_0^t β(s) w(s) ds  in 𝒫_0^n-probability, for each t ∈ [0,T].
But,

    <Z^n, X^n>_t = ∫_0^t β(s) [ Σ_{j=1}^K I_j(s) σ_j^{-2} ∫_0^T I_j(u) w(u) du ] V_n(β̄^n,s) S_n^0(β_0,s) λ_0(s) ds
      + ∫_0^t β(s) [ Σ_{j=1}^K I_j(s) σ_j^{-2} ∫_0^T I_j(u) w(u) du ] ( E_n(β_0,s) − E_n(β̄^n,s) ) S_n^1(β_0,s) λ_0(s) ds.   (4.4)

Consider the second term on the RHS of (4.4):

    | ∫_0^t β(s) [ Σ_{j=1}^K I_j(s) σ_j^{-2} ∫_0^T I_j(u) w(u) du ] ( E_n(β_0,s) − E_n(β̄^n,s) ) S_n^1(β_0,s) λ_0(s) ds |
    ≤ ∫_0^t | E_n(β_0,s) − E_n(β̄^n,s) | ds O_{𝒫_0^n}(1)   (by A1, C)
    = o_{𝒫_0^n}(1)   (by lemma 2.2).

Therefore

    <Z^n, X^n>_t = ∫_0^t β(s) ( Σ_{j=1}^K I_j(s) σ_j^{-2} ∫_0^T I_j(u) w(u) du ) V_n(β̄^n,s) S_n^0(β_0,s) λ_0(s) ds + o_{𝒫_0^n}(1)
      = ∫_0^t β(s) [ Σ_{j=1}^K I_j(s) σ_j^{-2} ∫_0^T I_j(u) w(u) du · V_n(β̄^n,s) S_n^0(β_0,s) λ_0(s) − w(s) ] ds
        + ∫_0^t β(s) w(s) ds + o_{𝒫_0^n}(1).

The first term on the RHS above is bounded above in absolute value by

    ∫_0^T | Σ_{j=1}^K I_j(s) ℓ_j^{-1} ∫_0^T I_j(u) w(u) du · V_n(β̄^n,s) S_n^0(β_0,s) λ_0(s) − w(s) | ds O(1)
    = o_p(1)   (by C2),

by the continuity of w(s) and σ²(s) and A1. □
As outlined in section 1.1 of this thesis, a special interest lies in testing various alternatives against the null hypothesis that β_0 is an unknown constant function. To do this, first restrict the possible weight functions to those for which ∫_0^T w_n(s) ds = 0 (which will then imply that ∫_0^T w(s) ds = 0 if w_n converges to w appropriately), so that asymptotically under the null hypothesis the statistic will have zero mean. The results of Theorem 4.3 can then be used to produce optimal linear test statistics (optimal in the sense of maximizing asymptotic power) for these problems. Define the linear test statistic to be

    LS_n(w_n) = √n ∫_0^T w_n(s)[ β̂^n(s) − β_0 ] ds = √n ∫_0^T w_n(s) β̂^n(s) ds.

Le Cam's third lemma (see Hajek & Sidak, 1967) and Theorem 4.3 imply that the 𝒫_β^n distribution of LS_n(w_n) converges to a normal distribution with the mean σ_LLS and variance σ_LS² of Theorem 4.3. Also define

    V_n(w_n) = ∫_0^T w_n²(s)[ σ̂_n²(s) ]^{-1} ds.

Theorem 4.2 and Le Cam's lemma 1 imply that

    V_n(w_n) → σ_LS² = ∫_0^T w²(s) σ^{-2}(s) ds  in 𝒫_β^n-probability.

Fix z_α to be the (1−α)th percentile of the standard normal distribution; then,
    𝒫_0^n[ LS_n(w_n) / √(V_n(w_n)) > z_α ] → α  as n → ∞,

and

    𝒫_β^n[ LS_n(w_n) / √(V_n(w_n)) > z_α ]
      = 𝒫_β^n[ ( LS_n(w_n) − σ_LLS ) / σ_LS > z_α − σ_LLS / σ_LS ] + o(1)
      → 1 − Φ( z_α − σ_LLS / σ_LS )  as n → ∞.
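These two limits can be tabulated directly. The sketch below is a numerical aside (α and the efficacy values are arbitrary choices): the asymptotic power against the contiguous alternative is 1 − Φ(z_α − σ_LLS/σ_LS), which reduces to the level α when the efficacy is zero.

```python
import math

def Phi(x):
    # standard normal CDF built from the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def asymptotic_power(alpha, efficacy):
    # invert Phi for z_alpha by bisection (avoids assuming an inverse-CDF library)
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    z_alpha = 0.5 * (lo + hi)
    return 1.0 - Phi(z_alpha - efficacy)

# zero efficacy recovers the level alpha; larger efficacy gives higher power
print(asymptotic_power(0.05, 0.0), asymptotic_power(0.05, 2.0))
```

This makes concrete why maximizing σ_LLS/σ_LS, taken up next, is the right optimality criterion.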
Therefore, to maximize the asymptotic power of the test statistic LS_n(w_n) against the contiguous alternative β_0 + β(s)/√n, w should be chosen to maximize σ_LLS/σ_LS subject to the restraint that ∫_0^T w(s) ds = 0. Of course, one must then find suitable w_n converging to w. Examples of this procedure are given in sections 4.3 and 4.4. The ratio σ_LLS/σ_LS is called the efficacy of LS_n(w_n). The following lemma gives the weight function which maximizes the efficacy.
Lemma 4.1

Let

    w_0(s) = σ²(s) [ β(s) − ∫_0^T β(u) σ²(u) du / ∫_0^T σ²(u) du ]   for s ∈ [0,T];

then w_0 maximizes

    σ_LLS / σ_LS = ∫_0^T w(s) β(s) ds / { ∫_0^T w²(s) σ^{-2}(s) ds }^{1/2}

subject to the constraints ∫_0^T w(s) ds = 0 and 0 < ∫_0^T w²(s) σ^{-2}(s) ds < ∞.

PROOF: Let w be a Lebesgue measurable function which satisfies ∫_0^T w(s) ds = 0 and 0 < ∫_0^T w²(s) σ^{-2}(s) ds < ∞. Then

    ∫_0^T w(s) β(s) ds = ∫_0^T w(s) [ β(s) − ∫_0^T β(u) σ²(u) du / ∫_0^T σ²(u) du ] ds,

which implies, by the Cauchy-Schwarz inequality, that

    [ ∫_0^T w(s) β(s) ds ]² ≤ ∫_0^T w²(s) σ^{-2}(s) ds · ∫_0^T σ²(s) [ β(s) − ∫_0^T β(u) σ²(u) du / ∫_0^T σ²(u) du ]² ds.

That is,

    [ ∫_0^T w(s) β(s) ds / { ∫_0^T w²(s) σ^{-2}(s) ds }^{1/2} ]² ≤ ∫_0^T σ²(s) [ β(s) − ∫_0^T β(u) σ²(u) du / ∫_0^T σ²(u) du ]² ds.

This concludes the proof, as the RHS above is the squared efficacy of w_0. □
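Lemma 4.1 can be checked on a grid (a numerical illustration, not part of the proof; the functions σ² and β below are arbitrary choices): the centered weight w_0 = σ²(β − ∫βσ²/∫σ²) integrates to zero, and its efficacy dominates that of a competing mean-zero weight.

```python
def efficacy(w, beta, sigma2, ds):
    # sigma_LLS / sigma_LS = \int w beta ds / sqrt( \int w^2 / sigma^2 ds )
    num = sum(wi * bi for wi, bi in zip(w, beta)) * ds
    den = sum(wi**2 / s2 for wi, s2 in zip(w, sigma2)) * ds
    return num / den**0.5

M, T = 400, 1.0
ds = T / M
s = [(i + 0.5) * ds for i in range(M)]          # midpoint grid on [0, T]
sigma2 = [1.0 + 0.5 * si for si in s]           # assumed variance function
beta = [si**2 for si in s]                      # assumed contiguous direction

# sigma^2-weighted mean of beta, then the optimal weight of lemma 4.1
c = sum(bi * s2 for bi, s2 in zip(beta, sigma2)) / sum(sigma2)
w0 = [s2 * (bi - c) for bi, s2 in zip(beta, sigma2)]

mean_s = sum(s) / M
w_lin = [si - mean_s for si in s]               # a competing mean-zero weight

print(efficacy(w0, beta, sigma2, ds) >= efficacy(w_lin, beta, sigma2, ds))
```

The discrete analogue of the Cauchy-Schwarz step goes through verbatim, which is why the comparison holds exactly on the grid and not just in the limit.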
Notes to Theorem 4.3 and Lemma 4.1

1) The result of lemma 4.1 indicates that, in order to construct an optimal linear test statistic for a given contiguous alternative β_0 + β(s) n^{-1/2}, s ∈ [0,T], one should use the weight function

    w_0(s) = σ²(s) [ β(s) − ∫_0^T β(u) σ²(u) du / ∫_0^T σ²(u) du ].

Since w_0 involves unknown quantities, it must be estimated. The following two sections will illustrate the choice of the optimal weight function and its estimation.
2) In order to construct a test which will be consistent against all contiguous directions of approach to β_0 (in the classical case, this would be a likelihood ratio test), it is tempting to use the results of lemma 4.1 and Roy's union-intersection principle (Roy, 1953). But, as is well known, this only leads back to the PLRT (which is not consistent against contiguous alternatives), as can be seen from the following argument. For a given approach β to the constant function β_0 (β_0 is the regression coefficient), reject H_0 if

    LS_n(w_n^0)² / V_n(w_n^0)
      = [ √n ∫_0^T w_n^0(s) β̂^n(s) ds ]² / V_n(w_n^0)
      = n [ ∫_0^T σ̂_n²(s) ( β(s) − ∫_0^T β(u) σ̂_n²(u) du / ∫_0^T σ̂_n²(u) du ) β̂^n(s) ds ]²
        / ∫_0^T σ̂_n²(s) ( β(s) − ∫_0^T β(u) σ̂_n²(u) du / ∫_0^T σ̂_n²(u) du )² ds

is large. Then, using Roy's union-intersection principle, reject H_0 if for any β (i.e., any direction of approach) the above statistic is large. That is, use

    sup_β  n [ ∫_0^T σ̂_n²(s) ( β(s) − ∫_0^T β(u) σ̂_n²(u) du / ∫_0^T σ̂_n²(u) du ) β̂^n(s) ds ]²
           / ∫_0^T σ̂_n²(s) ( β(s) − ∫_0^T β(u) σ̂_n²(u) du / ∫_0^T σ̂_n²(u) du )² ds

as the test statistic. But this is equal to

    n ∫_0^T σ̂_n²(s) [ β̂^n(s) − ∫_0^T β̂^n(u) σ̂_n²(u) du / ∫_0^T σ̂_n²(u) du ]² ds.
122
This is a Wald type statistic and can be shown (when properly
normalized) to be equivalent to the PLRT of chapter III.
To see this.
consider
~
T "'2
'"
2
T "'2
'"
2
[n f a (s)~(s)-~ ) ds]
= n f a (s )(~( s) - ~) ds - -_0~TFn-=-:"'2::---_-':o:....-_o n
0
f a (s) ds
o n
'"
2
f To a"'2n (s)(~(s)
- ~)
ds + 0p (1)
0
=n
(by arguments
similiar to those in Theorem 3.1)
Recall that
~
P
•
a~....j ~n(~o)
j - ~o = -a2~---'
- -
a~
j
=
_K
T
j=1
0
r- f
~ (~
n
then
)
0
n
"'2
Ij(s)a (s) ds
n
-~
-n
-1
a
~ (~
n
0
af.l.... j
i< [n~a~ ~
j=1
.... j
2
+ 0 (1)
a2
-
p
a~
~ (~
n
j
=
)
(p)]2
a- 2 +O (1)
j
p
n
0
By the above and Theorem 3.3 one gets that.
)
0
123
[n~ ag j
= -i<
j=l
=
(2 KT
-1
<£ (f3 ) ]2
n
0
0-.J 2 + 0p (1)
'" T
)
_IL
'" T
2{<£n (~(H1»T - <£n (f3(H 0 »T}
(2 KT-1)-~
[
n
1ba.t is,
T '"
n I
T "'2
o
0
n
(s)
[",
pn(s) -
"'2
I o pn(u)on (u) du ]2
T "'2
I o 0 n (u) du
ds - K
+
4.3 A Test for Regression

As was mentioned in the previous section, interest lies in testing various alternatives against the null hypothesis that β_0 is an unknown constant function. In particular, one such class of alternatives is the set of nondecreasing functions. If a nondecreasing function is smooth enough, then at least locally it can be approximated by a linear, nondecreasing function: in classical regression, this has led to the use of a simple regression test to detect this class of alternatives. In an analogous way, consider the use of the weight function which is optimal against a linear regression (i.e., set β(s) = s in lemma 4.1):

    w^R(s) = σ²(s) [ s − ∫_0^T u σ²(u) du / ∫_0^T σ²(u) du ].

In the following corollary, the behavior of the test statistic based on this weight function is investigated for the class of nondecreasing alternatives (see, also, the notes following the corollary).
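The plug-in weight w_n^R can be formed directly from the piecewise-constant σ̂_n². The following is a short sketch, not the thesis's computation; the interval values are hypothetical, and with constant σ̂² the weight is linear in s and integrates to zero.

```python
def regression_weight(sigma2_hat, lengths):
    """w_n^R(s) = sigma2_hat(s) * ( s - \\int u sigma2_hat(u) du / \\int sigma2_hat(u) du ),
    evaluated at interval midpoints for piecewise-constant sigma2_hat."""
    # interval midpoints and the sigma^2-weighted mean time
    mids, a = [], 0.0
    for l in lengths:
        mids.append(a + l / 2.0)
        a += l
    num = sum(s2 * m * l for s2, m, l in zip(sigma2_hat, mids, lengths))
    den = sum(s2 * l for s2, l in zip(sigma2_hat, lengths))
    centre = num / den
    return [s2 * (m - centre) for s2, m in zip(sigma2_hat, mids)]

# hypothetical estimates on K = 4 equal intervals of [0, 1]
sigma2_hat = [2.0, 2.0, 2.0, 2.0]
lengths = [0.25, 0.25, 0.25, 0.25]
wR = regression_weight(sigma2_hat, lengths)
print(wR)   # linear in s, centered so that the weighted integral vanishes
```

This weight, paired with the variance estimate of Theorem 4.2, yields the standardized regression test statistic of the corollary below.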
Corollary 4.1

Assume

a) β_0 is a constant function,
b) lim_n n‖ℓ‖⁴ = ∞, lim_n ‖ℓ‖² = 0, and
c) A, B, C, D4, lim_n ‖ℓ‖/ℓ_(1) < ∞;

then for

    σ̂_n²(s) = − Σ_{j=1}^K I_j(s) (ℓ_j n)^{-1} (∂²/∂β_j²) ℒ_n(β̂^n)

and

    w_n^R(s) = σ̂_n²(s) [ s − ∫_0^T u σ̂_n²(u) du / ∫_0^T σ̂_n²(u) du ],

one gets

    √n ∫_0^T w_n^R(s) β̂^n(s) ds / { ∫_0^T ( w_n^R(s) )² σ̂_n^{-2}(s) ds }^{1/2}  ⇒  N(0,1),

and, if assumption B in c) is replaced by

    B′:  max_{1≤i≤n} sup_{s∈[0,T]} n^{-1/2} |X_s(i)| Y_s(i) → 0  in 𝒫_0^n-probability,

with β bounded on [0,T], then under 𝒫_β^n

    √n ∫_0^T w_n^R(s) β̂^n(s) ds / { ∫_0^T ( w_n^R(s) )² σ̂_n^{-2}(s) ds }^{1/2}
      ⇒ N( ∫_0^T σ²(s) [ s − ∫_0^T u σ²(u) du / ∫_0^T σ²(u) du ] β(s) ds
           / { ∫_0^T [ s − ∫_0^T u σ²(u) du / ∫_0^T σ²(u) du ]² σ²(s) ds }^{1/2},  1 ).

PROOF:
To use Theorems 4.1, 4.2 and 4.3, the following needs to be
2
11211-
verified:
ITo
If!'n
[wR(s) - wR(s)]2 ds
n
R
w (s)
=a
2
(s)
[
s -
-
> O.
0
I~T u2a
2
In this case,
(u) du ]
I o a (u) du
.
Now,
T
(4.5)
"2
I~
I o u a n (u) du
T "2
+ 4
[ I a (u) du
o n
T "2
2
2
u a (u) dU]2 T 4
I o a (s) ds
T 2
I o a (u) du
+ 4Jo (an(s) - a (s»
2
ds
Consider the second term on the RHS of (4.5):
\[
\Bigl[\frac{\int_0^T u\,\hat\sigma^2_n(u)\,du}{\int_0^T \hat\sigma^2_n(u)\,du}
- \frac{\int_0^T u\,\sigma^2(u)\,du}{\int_0^T \sigma^2(u)\,du}\Bigr]^2
= \Bigl[\int_0^T u\,\hat\sigma^2_n(u)\,du \int_0^T \sigma^2(u)\,du
- \int_0^T u\,\sigma^2(u)\,du \int_0^T \hat\sigma^2_n(u)\,du\Bigr]^2 O_{P^n_0}(1)
\]
by Theorem 3.2 and C2. Furthermore,
\[
\begin{aligned}
&\Bigl[\int_0^T u\,\hat\sigma^2_n(u)\,du \int_0^T \sigma^2(u)\,du
- \int_0^T u\,\sigma^2(u)\,du \int_0^T \hat\sigma^2_n(u)\,du\Bigr]^2 \\
&\quad\le \Bigl[\int_0^T u\bigl(\hat\sigma^2_n(u) - \sigma^2(u)\bigr)\,du \int_0^T \sigma^2(u)\,du\Bigr]^2 O_{P^n_0}(1)
+ \Bigl[\int_0^T \bigl(\sigma^2(u) - \hat\sigma^2_n(u)\bigr)\,du \int_0^T u\,\sigma^2(u)\,du\Bigr]^2 O_{P^n_0}(1) \\
&\quad= \int_0^T \bigl(\hat\sigma^2_n(u) - \sigma^2(u)\bigr)^2\,du\ O_{P^n_0}(1)
\qquad\text{(by the Cauchy--Schwarz inequality).}
\end{aligned}
\]
Theorem 3.2 implies that
\[
\sup_{s\in[0,T]} \bigl|\hat\sigma^2_n(s) - \sigma^2(s)\bigr| = o_{P^n_0}(1).
\]
Therefore, by C2, (4.5), and the above,
\[
\|\ell\|^{-2}\int_0^T \bigl[\hat w^R_n(s) - w^R(s)\bigr]^2\,ds
= \|\ell\|^{-2}\int_0^T \bigl(\hat\sigma^2_n(s) - \sigma^2(s)\bigr)^2\,ds\ O_{P^n_0}(1).
\]
Now
\[
\begin{aligned}
\int_0^T \bigl[\hat\sigma^2_n(s) - \sigma^2(s)\bigr]^2\,ds
\le\;& 2\int_0^T \Bigl\{\sum_{j=1}^K I_j(s)\Bigl[(\ell_j n)^{-1}\frac{\partial^2}{\partial\beta_j^2}\mathcal{L}_n(\hat\beta^n)
- (\ell_j n)^{-1}\frac{\partial^2}{\partial\beta_j^2}\mathcal{L}_n(\beta_0)\Bigr]\Bigr\}^2 ds \\
&+ 4\int_0^T \Bigl\{\sum_{j=1}^K I_j(s)\Bigl[(\ell_j n)^{-1}\frac{\partial^2}{\partial\beta_j^2}\mathcal{L}_n(\beta_0)
+ \ell_j^{-1}\sigma_j^2\Bigr]\Bigr\}^2 ds
+ 4\int_0^T \Bigl\{\sum_{j=1}^K I_j(s)\bigl[\ell_j^{-1}\sigma_j^2 - \sigma^2(s)\bigr]\Bigr\}^2 ds \\
=\;& 2\sum_{j=1}^K \ell_j\,(\ell_j n)^{-2}\Bigl[\int_0^T I_j(u)\bigl[V_n(\hat\beta^n,u) - V_n(\beta_0,u)\bigr]\,dN_u(\bullet)\Bigr]^2 \\
&+ 4\sum_{j=1}^K \ell_j\,(\ell_j n)^{-2}\Bigl[\int_0^T I_j(u)\,V_n(\beta_0,u)\,dN_u(\bullet)
- n\int_0^T I_j(u)\,\sigma^2(u)\,du\Bigr]^2 \\
&+ 4\int_0^T \Bigl\{\sum_{j=1}^K I_j(s)\bigl[\ell_j^{-1}\sigma_j^2 - \sigma^2(s)\bigr]\Bigr\}^2 ds.
\end{aligned} \tag{4.6}
\]
Imitating the proof of lemma 3.1, one gets that the second term on the RHS of (4.6) is $O_{P^n_0}(n^{-1})$. Also, by the Lipschitz continuity of $\sigma^2(u)$, the last term on the RHS of (4.6) is $O_{P^n_0}(\|\ell\|^4)$. Using a Taylor series for each $j$,
\[
V_n(\hat\beta_j,u) = V_n(\beta_0,u) + \frac{\partial}{\partial\beta_j}V_n(\beta_j^*,u)\,(\hat\beta_j - \beta_0),
\quad\text{where } |\beta_j^* - \beta_0| \le |\hat\beta_j - \beta_0|.
\]
So, by A3 and the fact that $\max_{1\le j\le K}|\hat\beta_j - \beta_0| = o_{P^n_0}(1)$ (from lemma 3.3), the first term on the RHS of (4.6) is
\[
\begin{aligned}
&\sum_{j=1}^K \ell_j\,(\ell_j n)^{-2}\Bigl[\int_0^T I_j(u)\,dN_u(\bullet)\Bigr]^2 |\hat\beta_j - \beta_0|^2\ O_{P^n_0}(1) \\
&\quad\le \sum_{j=1}^K \ell_j\,(\ell_j n)^{-2}\,\max_{1\le j\le K}\bigl[\hat\beta_j - \beta_0\bigr]^2
\Bigl[\Bigl(\int_0^T I_j(u)\,dM_u(\bullet)\Bigr)^2
+ \Bigl(n\int_0^T I_j(u)\,S_n(\beta_0,u)\,\lambda_0(u)\,du\Bigr)^2\Bigr] O_{P^n_0}(1) \\
&\quad= O_{P^n_0}\bigl((\|\ell\|^4 n)^{-1}\bigr)\bigl[O_{P^n_0}(1) + O_{P^n_0}(\|\ell\|^2)\bigr]
\qquad\text{(by lemmas 2.2 and 3.3).}
\end{aligned}
\]
Therefore, by (4.6) and the above,
\[
\|\ell\|^{-2}\int_0^T \bigl[\hat w^R_n(s) - w^R(s)\bigr]^2\,ds \xrightarrow{\ P^n_0\ } 0.
\]
By Theorems 4.1 and 4.2 and the fact that $\int_0^T w^R(s)\,ds = 0$,
\[
LS_n(\hat w^R_n)\big/\bigl\{V_n(\hat w^R_n)\bigr\}^{1/2} \xrightarrow{\ \mathcal{D}(P^n_0)\ } N(0,1),
\]
and by Theorem 4.3 and Le Cam's third lemma (see Hajek and Sidak, 1967, pg. 208),
\[
LS_n(\hat w^R_n)\big/\bigl\{V_n(\hat w^R_n)\bigr\}^{1/2} \xrightarrow{\ \mathcal{D}(P^n_\beta)\ }
N\!\left( \frac{\int_0^T \sigma^2(s)\bigl[\,s - \int_0^T u\,\sigma^2(u)\,du \big/ \int_0^T \sigma^2(u)\,du\,\bigr]\beta(s)\,ds}
{\bigl\{\int_0^T \bigl[\,s - \int_0^T u\,\sigma^2(u)\,du \big/ \int_0^T \sigma^2(u)\,du\,\bigr]^2 \sigma^2(s)\,ds\bigr\}^{1/2}},\ 1 \right). \qquad\Box
\]
Notes to Corollary 4.1

1) The result of the above corollary implies that the test statistic $LS_n(\hat w^R_n)\big/\{V_n(\hat w^R_n)\}^{1/2}$ is consistent against a contiguous nondecreasing function $\beta$ [that is, $\beta$ is nondecreasing and $\exists\, s' \in [0,T]$ for which $\beta(T) > \beta(s') > \beta(0)$]. This means that under the contiguous alternative, the asymptotic mean is positive. To prove consistency, denote $\int_0^T u\,\sigma^2(u)\,du \big/ \int_0^T \sigma^2(u)\,du$ by $\bar s$. Note that $\bar s \in (0,T)$. Then
\[
\int_0^{\bar s} \sigma^2(u)[u - \bar s]\beta(u)\,du \ge \int_0^{\bar s} \sigma^2(u)[u - \bar s]\beta(\bar s)\,du
\quad\text{and}\quad
\int_{\bar s}^T \sigma^2(u)[u - \bar s]\beta(u)\,du \ge \int_{\bar s}^T \sigma^2(u)[u - \bar s]\beta(\bar s)\,du.
\]
One of the above inequalities is strict, depending on whether $s' \le \bar s$ or $s' > \bar s$. Therefore, since $\int_0^T \sigma^2(u)[u - \bar s]\,du = 0$,
\[
\int_0^T \sigma^2(u)[u - \bar s]\beta(u)\,du > 0.
\]
This means that under $P^n_\beta$, $LS_n(\hat w^R_n)\big/\{V_n(\hat w^R_n)\}^{1/2}$ converges to a normal random variable with a positive mean.
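The positivity established above can be illustrated numerically. Everything in this sketch (the grid, the particular $\sigma^2$, and the nondecreasing step function standing in for $\beta$) is hypothetical; the check simply confirms that the asymptotic mean is positive for one such alternative.

```python
import numpy as np

def trap(y, x):
    """Trapezoidal rule on a grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

T = 2.0
u = np.linspace(0.0, T, 4001)
sigma2 = 0.8 + 0.4 * np.cos(u) ** 2            # hypothetical sigma^2(u) > 0
beta = np.where(u < 1.2, 0.0, 1.0)             # nondecreasing, with a strict increase

sbar = trap(u * sigma2, u) / trap(sigma2, u)   # sigma^2-weighted mean of u
mean_term = trap(sigma2 * (u - sbar) * beta, u)
print(mean_term)                               # positive, as the note asserts
```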
2) In addition to the assumptions of corollary 4.1, assume d) and e) of Theorem 4.1; then, using the results of Theorems 4.1 and 4.2, one also gets that $LS_n(\hat w^R_n)\big/\{V_n(\hat w^R_n)\}^{1/2}$ is consistent against the following class of fixed alternatives:
\[
A = \bigl\{\beta_0 : \beta_0 \text{ satisfies D2 and is nondecreasing and } \exists\, s' \in [0,T] \text{ for which } \beta_0(T) > \beta_0(s') > \beta_0(0)\bigr\}.
\]
Recall that, by note 1 above,
\[
\int_0^T \sigma^2(s)\Bigl[\,s - \int_0^T u\,\sigma^2(u)\,du \Big/ \int_0^T \sigma^2(u)\,du\,\Bigr]\beta_0(s)\,ds > 0
\]
if $\beta_0$ belongs to $A$. All this implies that $LS_n(\hat w^R_n)\big/\{V_n(\hat w^R_n)\}^{1/2} \xrightarrow{\ P\ } \infty$ if $\beta_0$ belongs to $A$.
3) If interest lies solely in detecting monotonic alternatives to $\beta_0$, a constant function, and not particularly in the estimation of a function, then a simpler test could be formulated as $H_0 : \beta(s) = a$, $a$ unknown, versus $H_1 : \beta(s) = a + \theta s$, $\theta > 0$, $a$ unknown, using some sort of likelihood ratio criterion. Cox (1972) proposed the use of a multivariate regression model; the two covariate processes being $X_s$ and $sX_s$, $s \in [0,T]$, and the regression coefficient $\beta = (a,\theta) \in \mathbb{R}^2$. The null and alternative hypotheses become $H_0 : \theta = 0$ and $H_1 : \theta > 0$. In this case, a test statistic would be $\sqrt{n}\,\hat\theta$ divided by an estimate of its standard deviation ($\hat\theta$ is the maximum partial likelihood estimator of $\theta$). It turns out that the test based on $LS_n(\hat w^R_n)$ has the same efficacy as that based on $\sqrt{n}\,\hat\theta$ (i.e., they have the same asymptotic power functions with respect to contiguous alternatives).
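Cox's device of adding the artificial covariate $sX_s$ can be sketched numerically. The following is only a minimal illustration of the idea, not the thesis's procedure: it fits the two-covariate partial likelihood in $(a,\theta)$ by Newton's method on a small invented data set (the data values and the all-uncensored design are hypothetical).

```python
import numpy as np

# Hypothetical data: sorted, uncensored event times and a scalar covariate.
times = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x = np.array([0.5, -1.0, 1.5, 0.0, -0.5, 1.0])

def score_info(b):
    """Score vector and information matrix of the Cox log partial
    likelihood with covariate process z_j(t) = (x_j, t * x_j)."""
    U, Imat = np.zeros(2), np.zeros((2, 2))
    for k, t in enumerate(times):
        at_risk = times >= t                # risk set at the k-th event time
        z = np.column_stack([x, t * x])     # z_j(t) for every subject j
        w = np.exp(z @ b) * at_risk         # risk weights (0 off the risk set)
        zbar = (w @ z) / w.sum()            # weighted covariate mean at time t
        zc = z - zbar
        U += z[k] - zbar                    # event subject minus risk-set mean
        Imat += zc.T @ (w[:, None] * zc) / w.sum()
    return U, Imat

b = np.zeros(2)                             # start at (a, theta) = (0, 0)
for _ in range(25):                         # Newton-Raphson iterations
    U, Imat = score_info(b)
    b = b + np.linalg.solve(Imat, U)

U, Imat = score_info(b)
print(b, np.linalg.norm(U))                 # theta-hat is b[1]; score ~ 0 at the MLE
```

A one-sided test of $H_0 : \theta = 0$ would then compare $\sqrt{n}\,\hat\theta$ (here `b[1]`) to an estimate of its standard deviation obtained from `Imat`.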
To see this, let $\beta_0 = (a_0, 0)$ and $\beta_0(s) = a_0$. Then Theorem 3.2 of Andersen and Gill (1982) can be used to get (under appropriate conditions),
\[
\sqrt{n}\,(\hat\beta - \beta_0) = \Sigma^{-1}\, n^{-1/2}\, U(\beta_0) + o_p(1),
\]
where
\[
U(\beta_0) = \Bigl\{ \sum_{i=1}^n \int_0^T \bigl(X_s(i) - E_n(\beta_0,s)\bigr)\,dM_s(i),\quad
\sum_{i=1}^n \int_0^T s\bigl(X_s(i) - E_n(\beta_0,s)\bigr)\,dM_s(i) \Bigr\}^T
\]
and
\[
\Sigma = \begin{bmatrix} \int_0^T \sigma^2(s)\,ds & \int_0^T s\,\sigma^2(s)\,ds \\[4pt]
\int_0^T s\,\sigma^2(s)\,ds & \int_0^T s^2\,\sigma^2(s)\,ds \end{bmatrix}.
\]
Therefore,
\[
\sqrt{n}\,\hat\theta = n^{-1/2}\,|\Sigma|^{-1}\sum_{i=1}^n \int_0^T
\Bigl\{ s\int_0^T \sigma^2(u)\,du - \int_0^T u\,\sigma^2(u)\,du \Bigr\}
\bigl(X_s(i) - E_n(\beta_0,s)\bigr)\,dM_s(i) + o_p(1),
\]
where
\[
|\Sigma| = \int_0^T s^2\,\sigma^2(s)\,ds \int_0^T \sigma^2(s)\,ds
- \Bigl(\int_0^T s\,\sigma^2(s)\,ds\Bigr)^2 > 0.
\]
A test statistic for $H_0 : \theta = 0$ versus $H_1 : \theta > 0$ would be based on $\sqrt{n}\,\hat\theta$. The above converges in distribution under $H_0$ to $N\bigl(0,\ |\Sigma|^{-1}\int_0^T \sigma^2(s)\,ds\bigr)$ by Theorem 3.2 in Andersen and Gill (1982).
Using a very similar proof to the one in Theorem 4.3 results in
\[
\sqrt{n}\,\hat\theta \xrightarrow{\ \mathcal{D}(P^n_\beta)\ } N(\mu_\gamma,\ \sigma^2_\gamma),
\]
where
\[
\mu_\gamma = |\Sigma|^{-1}\int_0^T \sigma^2(u)\,du \int_0^T \beta(s)\,\sigma^2(s)
\Bigl[\,s - \int_0^T u\,\sigma^2(u)\,du \Big/ \int_0^T \sigma^2(u)\,du\,\Bigr]\,ds
\quad\text{and}\quad
\sigma^2_\gamma = |\Sigma|^{-1}\int_0^T \sigma^2(u)\,du.
\]
The efficacy of $\sqrt{n}\,\hat\theta$ is then
\[
\mu_\gamma^2\big/\sigma^2_\gamma
= \frac{\Bigl[\int_0^T \beta(s)\,\sigma^2(s)\bigl[\,s - \int_0^T u\,\sigma^2(u)\,du \big/ \int_0^T \sigma^2(u)\,du\,\bigr]\,ds\Bigr]^2}
{\int_0^T \bigl[\,s - \int_0^T u\,\sigma^2(u)\,du \big/ \int_0^T \sigma^2(u)\,du\,\bigr]^2 \sigma^2(s)\,ds}.
\]
But, as was seen in corollary 4.1, this is the efficacy of the test based on $LS_n(\hat w^R_n)$.
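The last simplification rests on the determinant identity $|\Sigma| = \int_0^T \sigma^2\,ds \int_0^T [s - \bar s]^2\sigma^2(s)\,ds$, with $\bar s$ the $\sigma^2$-weighted mean of $s$. This can be confirmed on a grid; the $\sigma^2$ below is an arbitrary, hypothetical stand-in.

```python
import numpy as np

def trap(y, x):
    """Trapezoidal rule on a grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

T = 2.0
s = np.linspace(0.0, T, 4001)
sigma2 = 1.0 + 0.3 * np.sin(2 * s)       # hypothetical sigma^2(s)

m0 = trap(sigma2, s)                      # int sigma^2
m1 = trap(s * sigma2, s)                  # int s sigma^2
m2 = trap(s ** 2 * sigma2, s)             # int s^2 sigma^2
det = m0 * m2 - m1 ** 2                   # |Sigma|
sbar = m1 / m0
centered = trap((s - sbar) ** 2 * sigma2, s)
print(abs(det - m0 * centered))           # ~ 0: the two expressions agree
```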
4.4  A Test for a Change Point
Instead of considering a smooth alternative to the null hypothesis that $\beta_0$ is constant, as was done in the last section, consider the alternative of a change point. That is, consider $H_0$: $\beta_0$ is an unknown constant function versus $H_1$: $\beta_0$ is constant up to an unknown point $\tau \in (0,T)$, changes value, and then stays constant at the new value. Praagman (1988) gives a nice discussion of the reasoning behind classical change point tests. Essentially they are generalizations of the two sample test. Say the time of change, $\tau$, is known; then to detect if a change occurs, a two sample test, $TS(\tau)$, can be used. To allow for the fact that $\tau$ is in reality unknown, one considers a weighted sum of the $TS(\tau)$'s or a maximum of weighted $TS(\tau)$'s. In this section the latter route is chosen to formulate a test for a change point.
Suppose $X_i \sim N(\mu_i, \sigma_i^{-2})$, $i = 1,\ldots,K$, are independent random variables, where for $i \le \tau$, $\mu_i = \mu + \delta$ and for $i > \tau$, $\mu_i = \mu - \delta$ ($\mu$ unknown). It is easy to derive the uniformly most powerful similar test. This test rejects $\delta = 0$ in favor of $\delta > 0$ if
\[
TS(\tau) = \frac{\sum_{i=\tau+1}^{K}\sigma_i^2}{\sum_{i=1}^{K}\sigma_i^2}\,\sum_{i=1}^{\tau}\sigma_i^2 X_i
- \frac{\sum_{i=1}^{\tau}\sigma_i^2}{\sum_{i=1}^{K}\sigma_i^2}\,\sum_{i=\tau+1}^{K}\sigma_i^2 X_i
\]
is large.
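A small numerical sketch of this two sample statistic (with hypothetical precisions $\sigma_i^2$ and means) confirms the two properties used here: $TS(\tau)$ does not depend on the common mean $\mu$, and it has positive expectation when $\delta > 0$.

```python
import numpy as np

K, tau = 10, 4
prec = np.linspace(0.5, 1.5, K)        # hypothetical precisions sigma_i^2

def TS(x, tau):
    """Precision-weighted two sample statistic for a change at tau."""
    A = prec[:tau].sum()
    B = prec[tau:].sum()
    S = A + B
    return (B / S) * (prec[:tau] * x[:tau]).sum() \
         - (A / S) * (prec[tau:] * x[tau:]).sum()

mu, delta = 3.0, 0.7
means = np.where(np.arange(K) < tau, mu + delta, mu - delta)
print(TS(means, tau))                  # positive when delta > 0
print(TS(means + 5.0, tau))            # identical: the common mean drops out
```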
Recall that one can intuitively think of $n^{1/2}\int_0^T I_i(s)\,\hat\beta^n(s)\,ds$, $i = 1,\ldots,K$, as independent $N\bigl(n^{1/2}\int_0^T I_i(s)\,\beta_0(s)\,ds,\ \int_0^T I_i(s)\,\sigma^{-2}(s)\,ds\bigr)$ random variables. Therefore, it appears that the above two sample test statistic with $X_i = n^{1/2}\int_0^T I_i(s)\,\hat\beta^n(s)\,ds$ and $\sigma_i^{-2} = \int_0^T I_i(s)\,\sigma^{-2}(s)\,ds$ would be appropriate for testing whether there is a change in $\beta_0$ at time $\tau$. The test statistic would then be
\[
TS_n(\tau) = n^{1/2}\int_0^T \Bigl[\, 1(s \le \tau) - \int_0^{\tau}\sigma^2(u)\,du \Big/ \int_0^T \sigma^2(u)\,du \,\Bigr]\,\sigma^2(s)\,\hat\beta^n(s)\,ds. \tag{4.7}
\]
This is precisely the statistic to which lemma 4.1 leads. That is, the contiguous alternative of interest is $\beta_0 + \beta(s)\,n^{-1/2}$ where $\beta(s) = \delta\,1(s \le \tau) - \delta\,1(s > \tau)$; the $w^0$ of lemma 4.1 is, therefore,
\[
w^0(s) = \sigma^2(s)\Bigl[\, 1(s \le \tau) - \int_0^{\tau}\sigma^2(u)\,du \Big/ \int_0^T \sigma^2(u)\,du \,\Bigr],
\]
and $n^{1/2}\int_0^T w^0(s)\,\hat\beta^n(s)\,ds$ is given by (4.7) above. Corollary 4.2 below gives the distribution for this two sample test.
Recall that $\hat\sigma^2_n$ is as defined in corollary 4.1.

COROLLARY 4.2  Assume

a) $\beta_0$ is a constant function,

b) $\lim_n n\|\ell\|^4 = \infty$ and $\lim_n \|\ell\|^2 = 0$, and

c) A, B, C, D4, and $\overline{\lim}_n \cdots < \infty$.

For each $t \in [0,T]$ define
\[
TS_n(t) = n^{1/2}\int_0^T \Bigl[\, 1(s \le t) - \int_0^{t}\hat\sigma^2_n(u)\,du \Big/ \int_0^T \hat\sigma^2_n(u)\,du \,\Bigr]\,\hat\sigma^2_n(s)\,\hat\beta^n(s)\,ds
\ \text{ on } \Bigl\{\int_0^T \hat\sigma^2_n(u)\,du > 0\Bigr\},
\]
$= 0$ elsewhere, and
\[
V_n(t) = \int_0^T \Bigl[\, 1(s \le t) - \int_0^{t}\hat\sigma^2_n(u)\,du \Big/ \int_0^T \hat\sigma^2_n(u)\,du \,\Bigr]^2 \hat\sigma^2_n(s)\,ds
\ \text{ on } \Bigl\{\int_0^T \hat\sigma^2_n(u)\,du > 0\Bigr\},
\]
$= 0$ elsewhere. Then

1) $TS_n \xrightarrow{\ \mathcal{D}(P^n_0)\ } TS$ (in the supremum norm topology on $C[0,T]$), where $TS_t = G_t - g(t)G_T$, $G$ is a continuous Gaussian martingale with variance process $\langle G\rangle_t = \int_0^t \sigma^2(s)\,ds$, and $g(t) = \int_0^t \sigma^2(s)\,ds \big/ \int_0^T \sigma^2(s)\,ds$, $t \in [0,T]$, and

2) $\sup_{t\in[0,T]} |V_n(t) - V(t)| \xrightarrow{\ P^n_0\ } 0$, where $V(t) = \int_0^T \sigma^2(s)\,ds\,\bigl(g(t) - g^2(t)\bigr)$;

and if assumption B in c) is replaced by
\[
\text{B}':\quad \max_{1\le i\le n}\ \sup_{s\in[0,T]} n^{-1/2}\,|X_s(i)|\,Y_s(i) \xrightarrow{\ P^n_0\ } 0 \ \text{ as } n \to \infty,
\]
and $\beta$ is bounded on $[0,T]$, then

3) $TS_n \xrightarrow{\ \mathcal{D}(P^n_\beta)\ } TS + \mu$ (in the supremum norm topology on $C[0,T]$), where $TS$ is as above and
\[
\mu(t) = \int_0^T \bigl[\, 1(s \le t) - g(t) \,\bigr]\,\sigma^2(s)\,\beta(s)\,ds, \quad t \in [0,T],
\]
and

4) $\sup_{t\in[0,T]} |V_n(t) - V(t)| \xrightarrow{\ P^n_\beta\ } 0$.
PROOF: Because $\beta_0$ is constant, for $t \in [0,T]$,
\[
\begin{aligned}
TS_n(t) &= n^{1/2}\int_0^T \Bigl[\, 1(s \le t) - \int_0^{t}\hat\sigma^2_n(u)\,du \Big/ \int_0^T \hat\sigma^2_n(u)\,du \,\Bigr]\,\hat\sigma^2_n(s)\bigl[\hat\beta^n(s) - \beta_0\bigr]\,ds \\
&= n^{1/2}\int_0^t \hat\sigma^2_n(s)\bigl[\hat\beta^n(s) - \beta_0\bigr]\,ds
- \frac{\int_0^t \hat\sigma^2_n(s)\,ds}{\int_0^T \hat\sigma^2_n(s)\,ds}\ n^{1/2}\int_0^T \hat\sigma^2_n(s)\bigl[\hat\beta^n(s) - \beta_0\bigr]\,ds.
\end{aligned}
\]
The proof of part 1 follows these two steps:

1a) $\sup_{t\in[0,T]} |\hat g_n(t) - g(t)| = o_{P^n_0}(1)$, where $\hat g_n(t) = \int_0^t \hat\sigma^2_n(u)\,du \big/ \int_0^T \hat\sigma^2_n(u)\,du$;

1b) $n^{1/2}\int_0^{\cdot} \hat\sigma^2_n(s)\bigl[\hat\beta^n(s) - \beta_0\bigr]\,ds$ converges weakly in $C[0,T]$ to $G$, where $G$ is a continuous Gaussian martingale with $\langle G\rangle_t = \int_0^t \sigma^2(s)\,ds$.

If 1a and 1b are true, then since $f : C[0,T] \to C[0,T]$ defined by $f(x)_t = x_t - g(t)x_T$, $t \in [0,T]$, is a continuous functional, $f\bigl[\,n^{1/2}\int_0^{\cdot}\hat\sigma^2_n(s)[\hat\beta^n(s) - \beta_0]\,ds\,\bigr]$ converges weakly to $f(G)$. This will then finish the proof of part 1. To prove 1a), recall that by Theorem 3.2,
\[
\sup_{s\in[0,T]} \bigl|\hat\sigma^2_n(s) - \sigma^2(s)\bigr| = o_{P^n_0}(1).
\]
This is sufficient to prove 1a), since $\int_0^T \sigma^2(s)\,ds > 0$.
To prove 1b), note that
\[
n^{1/2}\int_0^t \hat\sigma^2_n(s)\bigl[\hat\beta^n(s) - \beta_0\bigr]\,ds
= n^{1/2}\int_0^t \bigl[\hat\sigma^2_n(s) - \sigma^2(s)\bigr]\bigl[\hat\beta^n(s) - \beta_0\bigr]\,ds
+ n^{1/2}\int_0^t \sigma^2(s)\bigl[\hat\beta^n(s) - \beta_0\bigr]\,ds. \tag{4.8}
\]
Consider the supremum of the first term on the RHS of (4.8):
\[
\sup_{t\in[0,T]}\Bigl|\, n^{1/2}\int_0^t \bigl[\hat\sigma^2_n(s) - \sigma^2(s)\bigr]\bigl[\hat\beta^n(s) - \beta_0\bigr]\,ds \,\Bigr| = o_{P^n_0}(1)
\]
(by lemma 3.3 and the proof of corollary 4.1, eq. 4.6).
Following the proof in Theorem 3.1 results in
\[
\sup_{t\in[0,T]}\Bigl|\, n^{1/2}\int_0^t \sigma^2(s)\bigl[\hat\beta^n(s) - \beta_0\bigr]\,ds
- n^{-1/2}\sum_{i=1}^n \sum_{j=1}^K \int_0^t I_j(s)\,\sigma_j^{-2}\,\sigma^2(s)
\int_0^T I_j(u)\bigl[X_u(i) - E_n(\beta_0,u)\bigr]\,dN_u(i)\,ds \,\Bigr| = o_{P^n_0}(1).
\]
But
\[
\begin{aligned}
&n^{-1/2}\sum_{i=1}^n \sum_{j=1}^K \int_0^t I_j(s)\,\sigma_j^{-2}\,\sigma^2(s)
\int_0^T I_j(u)\bigl[X_u(i) - E_n(\beta_0,u)\bigr]\,dN_u(i)\,ds \\
&\quad= n^{-1/2}\sum_{i=1}^n \int_0^t \bigl[X_u(i) - E_n(\beta_0,u)\bigr]\,dN_u(i) \\
&\qquad+ \int_{a_{i(t)-1}}^{t}\sigma^2(s)\,ds\ \sigma_{i(t)}^{-2}\
n^{-1/2}\sum_{i=1}^n \int_0^T I_{i(t)}(u)\bigl[X_u(i) - E_n(\beta_0,u)\bigr]\,dN_u(i) \\
&\qquad- n^{-1/2}\sum_{i=1}^n \int_{a_{i(t)-1}}^{t}\bigl[X_u(i) - E_n(\beta_0,u)\bigr]\,dN_u(i),
\end{aligned} \tag{4.9}
\]
where $a_i = \sum_{j=1}^{i}\ell_j$ and $a_{i(t)-1} \le t \le a_{i(t)}$. If $n^{-1/2}\sum_{i=1}^n \int_0^{\cdot}\bigl[X_u(i) - E_n(\beta_0,u)\bigr]\,dN_u(i)$ converges in distribution to a random function, $G$, concentrated on $C[0,T]$, then by the representation theorem (Pollard, 1984, pg. 71) there exists $\tilde X^n$, equal in distribution to $n^{-1/2}\sum_{i=1}^n \int_0^{\cdot}\bigl[X_u(i) - E_n(\beta_0,u)\bigr]\,dN_u(i)$, and likewise $\tilde G$, equal in distribution to $G$, such that
\[
\sup_{t\in[0,T]} \bigl|\tilde X^n_t - \tilde G_t\bigr| \longrightarrow 0 \quad\text{a.s. as } n \to \infty.
\]
Therefore the last two terms on the RHS of (4.9) are equal in distribution to
\[
-\bigl[\tilde X^n_t - \tilde X^n_{a_{i(t)-1}}\bigr]
+ \int_{a_{i(t)-1}}^{t}\sigma^2(s)\,ds\ \sigma_{i(t)}^{-2}\,
\bigl[\tilde X^n_{a_{i(t)}} - \tilde X^n_{a_{i(t)-1}}\bigr].
\]
Since $\tilde G$ has continuous paths and $|a_{i(t)} - a_{i(t)-1}|$ goes to zero,
\[
\sup_{t\in[0,T]}\Bigl|\, -\tilde X^n_t + \tilde X^n_{a_{i(t)-1}}
+ \int_{a_{i(t)-1}}^{t}\sigma^2(s)\,ds\ \sigma_{i(t)}^{-2}\,
\bigl[\tilde X^n_{a_{i(t)}} - \tilde X^n_{a_{i(t)-1}}\bigr] \,\Bigr| \longrightarrow 0 \quad\text{a.s.}
\]
as $n \to \infty$. Hence the supremum of the absolute values of the last two terms on the RHS of (4.9) converges in $P^n_0$-probability to zero as $n \to \infty$, if $n^{-1/2}\sum_{i=1}^n \int_0^{\cdot}\bigl[X_u(i) - E_n(\beta_0,u)\bigr]\,dN_u(i)$ converges in $P^n_0$-distribution to a random function concentrated on $C[0,T]$. The proof that $n^{-1/2}\sum_{i=1}^n \int_0^{\cdot}\bigl[X_u(i) - E_n(\beta_0,u)\bigr]\,dN_u(i)$ converges to a continuous Gaussian martingale, $G$, with variance process $\langle G\rangle_t = \int_0^t \sigma^2(s)\,ds$, is similar to the proof of normality in Theorem 3.1 and is omitted. Therefore, by (4.8) and (4.9), 1b) is proved.
Define
\[
\hat g_n(t) = \int_0^t \hat\sigma^2_n(u)\,du \Big/ \int_0^T \hat\sigma^2_n(u)\,du
\ \text{ on } \Bigl\{\int_0^T \hat\sigma^2_n(u)\,du > 0\Bigr\}, \qquad = 0 \text{ elsewhere},\ t \in [0,T].
\]
Then, by Theorem 3.2 and C,
\[
\sup_{t\in[0,T]} |\hat g_n(t) - g(t)| = o_{P^n_0}(1).
\]
On $\{\int_0^T \hat\sigma^2_n(u)\,du > 0\}$, $V_n$ can be written as
\[
V_n(t) = \int_0^T \hat\sigma^2_n(u)\,du\,\bigl[\hat g_n(t) - \hat g_n^2(t)\bigr].
\]
Therefore
\[
\sup_{t\in[0,T]}\Bigl|\, V_n(t) - \int_0^T \sigma^2(u)\,du\,\bigl[g(t) - g^2(t)\bigr] \,\Bigr| = o_{P^n_0}(1),
\]
as maintained. Part 2 is complete.
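The algebraic identity used in part 2, namely that on $\{\int_0^T\hat\sigma^2_n > 0\}$ one has $V_n(t) = \int_0^T\hat\sigma^2_n(u)\,du\,[\hat g_n(t) - \hat g_n^2(t)]$, holds exactly even for discretized integrals, as this sketch with a hypothetical stand-in for $\hat\sigma^2_n$ shows.

```python
import numpy as np

def trap(y, x):
    """Trapezoidal rule on a grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

T = 1.5
s = np.linspace(0.0, T, 3001)
sig2 = 0.7 + 0.6 * s ** 2                        # stand-in for sigma-hat_n^2(s)
total = trap(sig2, s)

t0 = 0.9
ind = (s <= t0).astype(float)
g_t0 = trap(ind * sig2, s) / total               # g-hat_n(t0)
V_direct = trap((ind - g_t0) ** 2 * sig2, s)     # definition of V_n(t0)
V_closed = total * (g_t0 - g_t0 ** 2)            # closed form
print(abs(V_direct - V_closed))                  # ~ 0
```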
The proof of part 3 will be broken into the following two steps:

3a) The finite dimensional distributions of $TS_n$ converge under $P^n_\beta$ to the finite dimensional distributions of $TS + \mu$.

3b) $\{TS_n\}_{n\ge 1}$ is tight with respect to $P^n_\beta$.

These two steps then imply part 3. Fix $a_1,\ldots,a_m \in \mathbb{R}$ and $t_1,\ldots,t_m \in [0,T]$ ($m$ does not increase with $n$), and consider
\[
\sum_{i=1}^m a_i\,TS_n(t_i)
= n^{1/2}\int_0^T \sum_{i=1}^m a_i\Bigl[\, 1(s \le t_i) - \int_0^{t_i}\hat\sigma^2_n(u)\,du \Big/ \int_0^T \hat\sigma^2_n(u)\,du \,\Bigr]\,\hat\sigma^2_n(s)\bigl(\hat\beta^n(s) - \beta_0\bigr)\,ds.
\]
Theorem 4.3 can be used to derive the joint asymptotic distribution of $\sum_{i=1}^m a_i\,TS_n(t_i)$ and $L_n\bigl[\tfrac{dP^n_\beta}{dP^n_0}\bigr]$ under $P^n_0$. Then Le Cam's third lemma (Hajek and Sidak, 1967, p. 208) will give the asymptotic distribution of $\sum_{i=1}^m a_i\,TS_n(t_i)$ under $P^n_\beta$. To use Theorem 4.3, set
\[
w_n(s) = \sum_{i=1}^m a_i\Bigl[\, 1(s \le t_i) - \int_0^{t_i}\hat\sigma^2_n(u)\,du \Big/ \int_0^T \hat\sigma^2_n(u)\,du \,\Bigr]\,\hat\sigma^2_n(s)
\quad\text{and}\quad
w(s) = \sum_{i=1}^m a_i\bigl[\, 1(s \le t_i) - g(t_i) \,\bigr]\,\sigma^2(s).
\]
By repeatedly employing the triangle inequality one gets that
\[
\int_0^T \bigl(w_n(s) - w(s)\bigr)^2\,ds
\le \int_0^T \bigl(\hat\sigma^2_n(s) - \sigma^2(s)\bigr)^2\,ds\ O(1)
+ \sum_{v=1}^m a_v^2 \Bigl[\, \int_0^{t_v}\hat\sigma^2_n(u)\,du \Big/ \int_0^T \hat\sigma^2_n(u)\,du - g(t_v) \,\Bigr]^2 \int_0^T \hat\sigma^4_n(s)\,ds\ O(1)
\]
\[
= \int_0^T \bigl(\hat\sigma^2_n(s) - \sigma^2(s)\bigr)^2\,ds\ O_{P^n_0}(1)
= o_{P^n_0}\bigl(\|\ell\|^2\bigr)
\]
by the argument used in the proof of corollary 4.1 (eq 4.6).
By Theorem 4.3, $\bigl(\sum_{i=1}^m a_i\,TS_n(t_i),\ L_n\bigl[\tfrac{dP^n_\beta}{dP^n_0}\bigr]\bigr)$ converges under $P^n_0$ to a bivariate normal random vector as $n \to \infty$, and hence, by Le Cam's third lemma and a little bit of algebraic manipulation, it is easy to see that the finite dimensional $P^n_\beta$-distributions of $TS_n$ converge to the finite dimensional distributions of $\mu + TS$, where
\[
\mu(t) = \int_0^T \bigl[\, 1(s \le t) - g(t) \,\bigr]\,\sigma^2(s)\,\beta(s)\,ds
= \int_0^t \sigma^2(s)\,\beta(s)\,ds - g(t)\int_0^T \sigma^2(s)\,\beta(s)\,ds.
\]
To show that $TS_n$ converges in $P^n_\beta$-distribution to $\mu + TS$, Theorem 3 in Pollard (1984, p. 92) can be used. This theorem gives necessary and sufficient conditions for the weak convergence of $X^n \in D[0,T]$ to $X$ where $P[X \in C[0,T]] = 1$. The first condition is that the finite dimensional distributions of $X^n$ converge weakly to the finite dimensional distributions of $X$. The second condition is that to each $\epsilon > 0$, $\delta > 0$ there corresponds a grid $0 = t_0 < t_1 < \cdots < t_m = T$ such that
\[
\overline{\lim_n}\ P\Bigl\{ \max_{1\le i\le m}\ \sup_{t \in (t_{i-1}, t_i]} \bigl|X^n_t - X^n_{t_{i-1}}\bigr| > \delta \Bigr\} < \epsilon.
\]
So all that is needed is to show that the second condition, above, holds for $TS_n$ under $P^n_\beta$.
Let $\epsilon > 0$, $\delta > 0$, and denote $\tfrac{dP^n_\beta}{dP^n_0}$ by $L_n$. By lemma 4.2, $L_n \xrightarrow{\ \mathcal{D}(P^n_0)\ } L$, where $\log L \sim N(-.5\,\sigma_L^2,\ \sigma_L^2)$. Since $EL = 1$, $\gamma > 0$ can be chosen so that $1 - E\,1\{L \le \gamma\}L < \epsilon/2$. In part 1 it was proved that $TS_n \xrightarrow{\ \mathcal{D}(P^n_0)\ } TS$ as $n \to \infty$; therefore, by theorem 3 in Pollard, for $\epsilon' > 0$ with $\epsilon' < \epsilon/(2\gamma)$, there corresponds a grid $0 = t_0 < t_1 < \cdots < t_m = T$ such that
\[
\overline{\lim_n}\ P^n_0\Bigl\{ \max_{1\le i\le m}\ \sup_{t\in(t_{i-1},t_i]} \bigl|TS_n(t) - TS_n(t_{i-1})\bigr| > \delta \Bigr\} < \epsilon'.
\]
Then
\[
\begin{aligned}
&\overline{\lim_n}\ P^n_\beta\Bigl\{ \max_{1\le i\le m}\ \sup_{t\in(t_{i-1},t_i]} \bigl|TS_n(t) - TS_n(t_{i-1})\bigr| > \delta \Bigr\} \\
&\quad\le \overline{\lim_n} \int 1\Bigl\{ \max_{1\le i\le m}\ \sup_{t\in(t_{i-1},t_i]} \bigl|TS_n(t) - TS_n(t_{i-1})\bigr| > \delta \Bigr\}\,1\{L_n \le \gamma\}\,L_n\,dP^n_0
+ \overline{\lim_n}\int 1\{L_n > \gamma\}\,dP^n_\beta \\
&\quad\le \gamma\ \overline{\lim_n}\ P^n_0\Bigl\{ \max_{1\le i\le m}\ \sup_{t\in(t_{i-1},t_i]} \bigl|TS_n(t) - TS_n(t_{i-1})\bigr| > \delta \Bigr\}
+ 1 - \underline{\lim_n} \int_0^{\gamma} x\,dP^n_0 L_n^{-1}(x) \\
&\quad\le \gamma\,\epsilon' + 1 - E\,1\{L \le \gamma\}L \ \le\ \epsilon.
\end{aligned}
\]
Therefore, $TS_n \xrightarrow{\ \mathcal{D}(P^n_\beta)\ } \mu + TS$. Also, since $P^n_\beta$ is contiguous to $P^n_0$, both
\[
\sup_{t\in[0,T]}\Bigl|\, V_n(t) - \int_0^T \sigma^2(s)\,ds\ g(t)\bigl(1 - g(t)\bigr) \,\Bigr| \xrightarrow{\ P^n_\beta\ } 0
\quad\text{and}\quad
\sup_{t\in[0,T]} |\hat g_n(t) - g(t)| \xrightarrow{\ P^n_\beta\ } 0
\]
as $n \to \infty$. $\Box$
The test statistic $TS_n(\tau)$ can be used as a basis for a change point test if the change point is allowed only to occur at time $\tau$. To construct a change point test which is valid for any change point in $(0,T)$, $TS_n$, taken as a function of $t$, can be used. In particular, Roy's union-intersection principle is useful (Roy, 1953). That is, $H_0$ is rejected if any of the $TS_n(t)$ are large. This test is quantified below. See the notes following the corollary for a discussion of consistency. In the corollary below, $V(t) = \int_0^T \sigma^2(s)\,ds\,[g(t) - g^2(t)]$ for $t \in [0,T]$.
COROLLARY 4.3  Define
\[
\hat g_n(t) = \int_0^t \hat\sigma^2_n(s)\,ds \Big/ \int_0^T \hat\sigma^2_n(s)\,ds
\ \text{ on } \Bigl\{\int_0^T \hat\sigma^2_n(s)\,ds > 0\Bigr\}, \qquad = 0 \text{ elsewhere},\ t \in [0,T],
\]
\[
\hat g_n^{-1}(v) = \inf\{u \in [0,T] : \hat g_n(u) = v\}, \quad v \in [0,1],
\]
and
\[
STS_n = \sup_{t\in[h,1-h]} \frac{TS_n\bigl(\hat g_n^{-1}(t)\bigr)}{\bigl\{V_n\bigl(\hat g_n^{-1}(t)\bigr)\bigr\}^{1/2}},
\]
where $\tfrac{0}{0}$ is defined to be $0$ and $h \in (0,.5)$. Under the assumptions a), b) and c) of corollary 4.2,

1) $STS_n \xrightarrow{\ \mathcal{D}(P^n_0)\ } \sup_{t\in[h,1-h]} (TS \circ g^{-1})_t \big/ \{(V \circ g^{-1})_t\}^{1/2}$, which is equal in distribution to $\sup_{t\in[1,\ ((1-h)/h)^2]} W(t)/t^{1/2}$, where $W$ is a standard Wiener process.

Furthermore, under B$'$ of corollary 4.2,

2) $STS_n \xrightarrow{\ \mathcal{D}(P^n_\beta)\ } \sup_{t\in[h,1-h]} \bigl[TS(g^{-1}(t)) + \mu(g^{-1}(t))\bigr] \big/ V(g^{-1}(t))^{1/2}$ as $n \to \infty$.
PROOF: Lemma 4.3 can be used to prove this corollary as follows. By corollary 4.2, $TS_n$ converges weakly in the supremum topology on $D[0,T]$ to $TS$, where $TS_t = G_t - g(t)G_T$ and $G$ is a continuous Gaussian martingale with variance process $\langle G\rangle_t = \int_0^t \sigma^2(s)\,ds$. Assumption d) of lemma 4.3 is then verified for $X^n = TS_n$ and $X = TS$. By Theorem 3.2, $\sup_{t\in[0,T]}|\hat g_n(t) - g(t)| = o_{P^n_0}(1)$, where $g$ is strictly increasing and $g(0) = 0$. Note that $\hat g_n$ is a.s. nonnegative and nondecreasing, and that $V_n$ is a.s. nonnegative. Define $V(t) = \int_0^T \sigma^2(s)\,ds\ g(t)(1 - g(t))$; then $V$ is positive on $(0,T)$ and, by corollary 4.2, $\sup_{t\in[0,T]}|V_n(t) - V(t)| = o_{P^n_0}(1)$. This verifies assumptions a), b), and c) of lemma 4.3. Therefore $STS_n$ converges in $P^n_0$-distribution to
\[
\sup_{t\in[h,1-h]} \frac{(TS \circ g^{-1})_t}{\{(V \circ g^{-1})_t\}^{1/2}}.
\]
Note that $(TS \circ g^{-1})_t = G_{g^{-1}(t)} - t\,G_T$. This is a continuous Gaussian process with
\[
E(TS \circ g^{-1})_t^2 = \int_0^{g^{-1}(t)}\sigma^2(s)\,ds - 2t\int_0^{g^{-1}(t)}\sigma^2(s)\,ds + t^2\int_0^T \sigma^2(s)\,ds
= t(1-t)\int_0^T \sigma^2(s)\,ds,
\]
and, for $s < t$,
\[
E(TS \circ g^{-1})_s\,(TS \circ g^{-1})_t = (s - st)\int_0^T \sigma^2(u)\,du.
\]
This covariance structure characterizes the Brownian Bridge:
\[
B = \frac{TS \circ g^{-1}}{\bigl\{\int_0^T \sigma^2(s)\,ds\bigr\}^{1/2}}
\]
is a standard Brownian Bridge. Also, $(V \circ g^{-1})_t = t(1-t)\int_0^T \sigma^2(s)\,ds$ for $t \in [0,1]$. Hence $STS_n$ converges in distribution to
\[
\sup_{t\in[h,1-h]} \frac{B_t}{\sqrt{t(1-t)}},
\]
which, as is well known, is equal in distribution to $\sup_{t\in[1,\ ((1-h)/h)^2]} W(t)/t^{1/2}$, where $W$ is a standard Wiener process.

Lemma 4.3 can also be used to derive part 2. Conditions a), c), and d) of lemma 4.3 are satisfied by corollary 4.2 with $X^n = TS_n$ and $X = TS + \mu$. Also, $\sup_{t\in[0,T]}|\hat g_n(t) - g(t)| \xrightarrow{\ P^n_\beta\ } 0$ as $n \to \infty$ by Theorem 3.2, lemma 4.2 and Le Cam's third lemma (Hajek and Sidak, 1967, p. 208). Therefore,
\[
STS_n \xrightarrow{\ \mathcal{D}(P^n_\beta)\ }
\sup_{t\in[h,1-h]} \frac{\bigl((TS + \mu) \circ g^{-1}\bigr)_t}{\{(V \circ g^{-1})_t\}^{1/2}}. \qquad\Box
\]
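The reduction from the normalized bridge to $W(t)/t^{1/2}$ invoked above is a standard computation; a sketch of it, under the usual representation of the bridge, is as follows.

```latex
B_t = (1-t)\,W\!\Bigl(\frac{t}{1-t}\Bigr)
\quad\Longrightarrow\quad
\frac{B_t}{\sqrt{t(1-t)}}
= \sqrt{\frac{1-t}{t}}\;W\!\Bigl(\frac{t}{1-t}\Bigr)
= \left.\frac{W(u)}{\sqrt{u}}\right|_{u = t/(1-t)}.
```

As $t$ ranges over $[h,1-h]$, $u = t/(1-t)$ ranges over $[h/(1-h),\,(1-h)/h]$, and by the scaling property $W(cu)/\sqrt{cu} \overset{d}{=} W(u)/\sqrt{u}$ (as processes in $u$), the supremum over $[h/(1-h),\,(1-h)/h]$ equals in distribution the supremum over $[1,\,((1-h)/h)^2]$, since $\bigl((1-h)/h\bigr)\big/\bigl(h/(1-h)\bigr) = ((1-h)/h)^2$.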
Notes to Corollary 4.3

1) DeLong (1981) has given asymptotic critical values for the test based on $STS_n$.

2) Part 2 of corollary 4.3 implies that $STS_n$ is consistent against the contiguous change point alternatives. That is, the limiting distribution of $STS_n$ under $P^n_\beta$ is stochastically larger than the limiting distribution of $STS_n$ under $P^n_0$. To see this, note that $\sup_{t\in[h,1-h]} (TS \circ g^{-1})_t \big/ (V \circ g^{-1})_t^{1/2}$ is equal in distribution to $\sup_{t\in[1,\ ((1-h)/h)^2]} W(t)/t^{1/2}$, as was seen in the proof of corollary 4.3. Therefore, if $\mu \circ g^{-1}$ is a nonnegative function with a positive value at, at least, one time point, then $\sup_{t\in[h,1-h]} \bigl((TS + \mu) \circ g^{-1}\bigr)_t \big/ (V \circ g^{-1})_t^{1/2}$ will be stochastically larger than $\sup_{t\in[h,1-h]} (TS \circ g^{-1})_t \big/ (V \circ g^{-1})_t^{1/2}$. But, since change point alternatives are of interest, $\beta(s) = a + \Delta\,1(s \le \tau)$, where $\Delta > 0$ and $\tau \in (0,T)$. Using a little bit of algebra results in
\[
\mu(t) = \Delta\Bigl(\int_0^T \sigma^2(s)\,ds\Bigr)^{-1} \int_0^{\tau \wedge t}\sigma^2(s)\,ds \int_{\tau \vee t}^{T}\sigma^2(s)\,ds > 0
\quad\text{for } t \in (0,T).
\]
This implies that $\mu \circ g^{-1}$ is a positive function on $(0,1)$.
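The closed form for $\mu$ under the change point alternative can be checked against the definition $\mu(t) = \int_0^t\sigma^2\beta\,ds - g(t)\int_0^T\sigma^2\beta\,ds$; the grid, $\sigma^2$, and the values of $a$, $\Delta$, $\tau$ below are hypothetical.

```python
import numpy as np

def trap(y, x):
    """Trapezoidal rule on a grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

T, tau, a, delta = 2.0, 0.8, 0.5, 1.0      # hypothetical values
s = np.linspace(0.0, T, 4001)
sig2 = 1.0 + 0.2 * np.sin(5 * s)           # hypothetical sigma^2(s) > 0
total = trap(sig2, s)
beta = a + delta * (s <= tau)              # change point alternative

def mu(t):
    """Return (definition of mu(t), closed form) for comparison."""
    ind = (s <= t).astype(float)
    g_t = trap(ind * sig2, s) / total
    direct = trap((ind - g_t) * sig2 * beta, s)
    lo = trap(np.where(s <= min(tau, t), sig2, 0.0), s)   # int_0^{tau ^ t}
    hi = total - trap(np.where(s <= max(tau, t), sig2, 0.0), s)  # int_{tau v t}^T
    return direct, delta * lo * hi / total

d1, c1 = mu(0.5)   # t < tau
d2, c2 = mu(1.3)   # t > tau
print(d1, c1, d2, c2)   # the two expressions agree and are positive
```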
4.5 The Independent and Identically Distributed Case
In the following, the notation of corollaries 4.1, 4.2 and 4.3 is used without comment.

COROLLARY 4.4  Consider $n$ i.i.d. observations of $(N,X,Y)$ where both $X$ and $Y$ are a.s. left continuous with right hand limits. If

a) $\beta_0$ is a constant function,

b) $\lim_n n\|\ell\|^4 = \infty$ and $\lim_n \|\ell\|^2 = 0$,

c) D4 holds and $\overline{\lim}_n \cdots < \infty$, and

d) the conditions of corollary 2.1 are satisfied,

then
\[
LS_n(\hat w^R_n)\big/\bigl\{V_n(\hat w^R_n)\bigr\}^{1/2} \xrightarrow{\ \mathcal{D}(P^n_0)\ } N(0,1)
\]
and
\[
STS_n \xrightarrow{\ \mathcal{D}(P^n_0)\ } \sup_{t\in[h,1-h]} \frac{(TS \circ g^{-1})_t}{\{(V \circ g^{-1})_t\}^{1/2}},
\]
which is equal in distribution to $\sup_{t\in[1,\ ((1-h)/h)^2]} W(t)/t^{1/2}$, where $W$ is a standard Wiener process. If, in addition,

e) $E \sup_{s\in[0,T]} X_s^2\,Y_s < \infty$ and $\beta$ is bounded on $[0,T]$,

then
\[
LS_n(\hat w^R_n)\big/\bigl\{V_n(\hat w^R_n)\bigr\}^{1/2} \xrightarrow{\ \mathcal{D}(P^n_\beta)\ }
N\!\left( \frac{\int_0^T \sigma^2(s)\bigl[\,s - \int_0^T u\,\sigma^2(u)\,du \big/ \int_0^T \sigma^2(u)\,du\,\bigr]\beta(s)\,ds}
{\bigl\{\int_0^T \bigl[\,s - \int_0^T u\,\sigma^2(u)\,du \big/ \int_0^T \sigma^2(u)\,du\,\bigr]^2 \sigma^2(s)\,ds\bigr\}^{1/2}},\ 1\right)
\]
and
\[
STS_n \xrightarrow{\ \mathcal{D}(P^n_\beta)\ }
\sup_{t\in[h,1-h]} \frac{TS(g^{-1}(t)) + \mu(g^{-1}(t))}{V(g^{-1}(t))^{1/2}}
\]
as $n \to \infty$.
PROOF: Note that the conditions for corollaries 4.1, 4.2 and 4.3 are the same; that is, A, B, C, and conditions a), b), and c) of this corollary. In corollary 2.1, it was shown that A and C are consequences of the assumptions given in corollary 2.1. Also, as before, a slightly weaker version of B, B$'$, is sufficient for the asymptotic distributional results under $P^n_0$. Recall that B$'$:
\[
\max_{1\le i\le n} \int_0^T 1\bigl\{ s : |X_s(i)|\,Y_s(i) > \epsilon\,n^{1/2},\
\beta_0(s)X_s(i) > -\delta\,|X_s(i)| \bigr\}\,ds = o_{P^n_0}(1)
\]
for some $\delta > 0$ and each $\epsilon > 0$, and that the conditions of corollary 2.1 imply B$'$.

To derive the asymptotic distributional results under $P^n_\beta$ of corollaries 4.1, 4.2, and 4.3, a stronger condition than B$'$ is required; i.e. it suffices to assume
\[
\max_{1\le i\le n}\ \sup_{s\in[0,T]} n^{-1/2}\,|X_s(i)|\,Y_s(i) \xrightarrow{\ P^n_0\ } 0 \quad\text{as } n \to \infty.
\]
Assumption e) of this corollary implies this condition, as can be seen by the following argument. Let $Z(i) = \sup_{s\in[0,T]} |X_s(i)|\,Y_s(i)$, $i = 1,\ldots,n$; then the $Z(i)$'s are i.i.d. and for $c > 0$,
\[
P\Bigl( \max_{1\le i\le n} n^{-1/2}Z(i) > c \Bigr) \le n\,P\bigl(n^{-1/2}Z > c\bigr)
\le c^{-2} \int_{(n^{-1/2}Z > c)} Z^2\,dP \longrightarrow 0 \quad\text{as } n \to \infty
\]
(since $EZ^2 = E\sup_{s\in[0,T]} X_s^2\,Y_s < \infty$). $\Box$
4.6  Lemmas 4.2 and 4.3

LEMMA 4.2  Assume

a) $\beta$ is bounded on $[0,T]$,

b) $\max_{1\le i\le n}\ \sup_{s\in[0,T]} n^{-1/2}\,|X_s(i)|\,Y_s(i) \xrightarrow{\ P^n_0\ } 0$ as $n \to \infty$, and

c) A1, A3, C1;

then, with $L_n(t)$ denoting the log likelihood ratio of $P^n_\beta$ to $P^n_0$ based on observation over $[0,t]$,
\[
\sup_{t\in[0,T]}\Bigl|\, L_n(t) - n^{-1/2}\sum_{i=1}^n \int_0^t \beta(s)X_s(i)\,dM_s(i)
+ \frac{1}{2}\int_0^t \beta^2(s)\,S_2(\beta_0,s)\,\lambda_0(s)\,ds \,\Bigr| \xrightarrow{\ P^n_0\ } 0 \quad\text{as } n \to \infty,
\]
and for $Z$ defined by $Z_t = n^{-1/2}\sum_{i=1}^n \int_0^t \beta(s)X_s(i)\,dM_s(i)$, $t \in [0,T]$, $Z$ converges weakly under $P^n_0$ in the supremum topology on $D[0,T]$ to a continuous Gaussian martingale with variance process $\int_0^t \beta^2(s)\,S_2(\beta_0,s)\,\lambda_0(s)\,ds$.

Remark: Assumption b, above, is a slight strengthening of the Lindeberg condition B.
PROOF:
\[
\begin{aligned}
L_n(t) &= \sum_{i=1}^n \int_0^t n^{-1/2}\beta(s)X_s(i)\,dN_s(i)
- \sum_{i=1}^n \int_0^t \Bigl[ e^{n^{-1/2}\beta(s)X_s(i)} - 1 \Bigr] e^{\beta_0(s)X_s(i)}\,Y_s(i)\,\lambda_0(s)\,ds \\
&= n^{-1/2}\sum_{i=1}^n \int_0^t \beta(s)X_s(i)\,dM_s(i)
- \sum_{i=1}^n \int_0^t \Bigl[ e^{n^{-1/2}\beta(s)X_s(i)} - 1 - n^{-1/2}\beta(s)X_s(i) \Bigr] e^{\beta_0(s)X_s(i)}\,Y_s(i)\,\lambda_0(s)\,ds \\
&= n^{-1/2}\sum_{i=1}^n \int_0^t \beta(s)X_s(i)\,dM_s(i)
- \frac{1}{2}\int_0^t \beta^2(s)\,S_{2n}(\beta_0,s)\,\lambda_0(s)\,ds \\
&\quad- \sum_{i=1}^n \int_0^t \Bigl[ e^{n^{-1/2}\beta(s)X_s(i)} - 1 - n^{-1/2}\beta(s)X_s(i)
- \tfrac{1}{2}n^{-1}\beta^2(s)X_s^2(i) \Bigr] e^{\beta_0(s)X_s(i)}\,Y_s(i)\,\lambda_0(s)\,ds,
\end{aligned} \tag{4.10}
\]
where $S_{2n}(\beta_0,s) = n^{-1}\sum_{i=1}^n X_s^2(i)\,e^{\beta_0(s)X_s(i)}\,Y_s(i)$.

Choose $\Delta > 0$ so that $\sup_{s\in[0,T]}|\beta(s)| < \Delta^{-1}$, and note that for $|x| < 1$, $|e^x - 1 - x - \tfrac{1}{2}x^2| \le e\,|x|^3$. Therefore, on $\{\max_{1\le i\le n}\sup_{s\in[0,T]} n^{-1/2}|X_s(i)Y_s(i)| < \Delta\}$, the absolute value of the third term on the RHS of (4.10) is bounded above by
\[
e\,n^{-3/2}\sum_{i=1}^n \int_0^T |\beta^3(s)|\,|X_s^3(i)|\,e^{\beta_0(s)X_s(i)}\,Y_s(i)\,\lambda_0(s)\,ds
= O_{P^n_0}\bigl(n^{-1/2}\bigr)
\]
by A3 and C1. Therefore, by assumptions b and A1,
\[
\sup_{t\in[0,T]}\Bigl|\, L_n(t) - n^{-1/2}\sum_{i=1}^n \int_0^t \beta(s)X_s(i)\,dM_s(i)
+ \frac{1}{2}\int_0^t \beta^2(s)\,S_2(\beta_0,s)\,\lambda_0(s)\,ds \,\Bigr| = o_{P^n_0}(1).
\]
To prove that $Z$ converges weakly to the appropriate continuous Gaussian martingale, Rebolledo's theorem as given in section 1.2 is used. Note that
\[
\langle Z\rangle_t = \int_0^t \beta^2(s)\,S_{2n}(\beta_0,s)\,\lambda_0(s)\,ds,
\]
which, by A1 and C1, converges in $P^n_0$-probability to $\int_0^t \beta^2(s)\,S_2(\beta_0,s)\,\lambda_0(s)\,ds$ for each $t \in [0,T]$. Next, for arbitrary $\epsilon > 0$, consider
\[
\begin{aligned}
&n^{-1}\sum_{i=1}^n \int_0^T \beta^2(s)\,X_s^2(i)\,e^{\beta_0(s)X_s(i)}\,Y_s(i)\,\lambda_0(s)\,
1\bigl\{ n^{-1}\beta^2(s)\,X_s^2(i)\,Y_s(i) > \epsilon \bigr\}\,ds \\
&\quad\le \int_0^T 1\bigl\{ n^{-1}\beta^2(s)\max_{1\le i\le n} X_s^2(i)\,Y_s(i) > \epsilon \bigr\}\,ds\ O_{P^n_0}(1)
\qquad\text{(by a, A1, C1)} \\
&\quad= o_{P^n_0}(1) \qquad\text{(by a, b).} \qquad\Box
\end{aligned}
\]
LEMMA 4.3  Consider $X^n$, $V^n$ random elements in $D[0,T]$, $n \ge 1$; $X$ a random element and $g_n$, $n \ge 1$, random elements in $C[0,T]$; and $V$, $g$ deterministic functions in $C[0,T]$, for which the following holds:

a) $V^n$ is a.s. nonnegative, $V$ is positive on $(0,T)$, $g_n : [0,T] \to [0,1]$ is a.s. nonnegative and nondecreasing on $[0,T]$, $g : [0,T] \to [0,1]$ is strictly increasing, $g(0) = 0$;

b) $\sup_{t\in[0,T]} |g_n(t) - g(t)| = o_p(1)$;

c) $\sup_{t\in[0,T]} |V^n(t) - V(t)| = o_p(1)$;

d) $X^n$ converges weakly in the supremum norm topology on $D[0,T]$ to $X$;

then for any $h \in (0,.5)$,
\[
Z^n = \sup_{t\in[h,1-h]} \frac{(X^n \circ g_n^{-1})_t}{\{(V^n \circ g_n^{-1})_t\}^{1/2}}
\]
converges in distribution to $\sup_{t\in[h,1-h]} (X \circ g^{-1})_t \big/ \{(V \circ g^{-1})_t\}^{1/2}$.

PROOF: The following steps will be proved.

1) For $g_n^{-1}(v) = \inf\{u \in [0,T] : g_n(u) = v\}$, $v \in [0,1]$,
$\sup_{v\in[0,1]} |g_n^{-1}(v) - g^{-1}(v)| = o_p(1)$.

2) $X^n \circ g_n^{-1}$ converges weakly in the supremum topology on $D[0,1]$ to $X \circ g^{-1}$.

3) $\sup_{t\in[0,1]} \bigl|(V^n \circ g_n^{-1})_t - (V \circ g^{-1})_t\bigr| = o_p(1)$.

4) Define $f : D[0,1] \times D[0,1] \to \mathbb{R}$ by
\[
f(x,y) = \sup_{t\in[h,1-h]} \frac{x_t}{y_t^{1/2}}\,1\{y_t > 0\},
\]
where $\tfrac{0}{0}$ is taken to be zero; then $f$ is continuous with respect to the supremum norm topology on $C[0,1] \times \{z \in C[0,1] : \inf_{t\in[h,1-h]} z(t) > 0\}$.
If 1) through 4) hold, then by the continuous mapping theorem (Pollard, 1984, p. 68), $Z^n$ converges in distribution to $\sup_{t\in[h,1-h]} (X \circ g^{-1})_t \big/ \{(V \circ g^{-1})_t\}^{1/2}$ and the proof is finished.
Consider
\[
\sup_{v\in[0,1]} \bigl|g_n^{-1}(v) - g^{-1}(v)\bigr| = \sup_{u\in[0,T]} \bigl|g_n^{-1}(g(u)) - u\bigr| \tag{4.11}
\]
(since $g$ is 1:1). Define $f_n(u) = g_n^{-1}(g(u))$, $u \in [0,T]$; then $g_n(f_n(u)) = g(u)$, and
\[
\sup_{u\in[0,T]} \bigl|g(f_n(u)) - g(u)\bigr| = \sup_{u\in[0,T]} \bigl|g(f_n(u)) - g_n(f_n(u))\bigr|
\le \sup_{u\in[0,T]} \bigl|g(u) - g_n(u)\bigr| = o_p(1). \tag{4.12}
\]
Note that since $g : [0,T] \to [0,1]$ is 1:1 and continuous, $g^{-1} : [0,1] \to [0,T]$ is also continuous and indeed uniformly continuous. Choose $\epsilon > 0$; then $\exists\, \delta > 0$ so that if $|x - y| < \delta$ then $|g^{-1}(x) - g^{-1}(y)| < \epsilon$. Then
\[
\Bigl\{ \sup_{u\in[0,T]} |f_n(u) - u| < \epsilon \Bigr\} \supseteq \Bigl\{ \sup_{u\in[0,T]} \bigl|g(f_n(u)) - g(u)\bigr| < \delta \Bigr\},
\]
and (4.11) and (4.12) imply that $\sup_{v\in[0,1]} |g_n^{-1}(v) - g^{-1}(v)| = o_p(1)$. Step 1 is proved.
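Step 1, the uniform closeness of the generalized inverses, can be illustrated numerically with a hypothetical strictly increasing $g$ and a monotone perturbation $g_n$ of it; everything below is an invented example, not part of the proof.

```python
import numpy as np

T = 1.0
u = np.linspace(0.0, T, 10001)
g = u ** 2                                  # strictly increasing, g(0) = 0, g(T) = 1
g_n = g + 0.01 * np.sin(4 * np.pi * g)      # monotone perturbation, sup|g_n - g| <= 0.01

def inv(gr, v):
    """Generalized inverse inf{u : gr(u) >= v} evaluated on the grid."""
    idx = np.searchsorted(gr, v)
    return u[np.minimum(idx, len(u) - 1)]

v = np.linspace(0.0, 1.0, 501)
err = np.max(np.abs(inv(g_n, v) - inv(g, v)))
print(err)    # small: the inverses are uniformly close
```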
It is easy to see that $(X^n, g_n^{-1})$ converges weakly in the supremum norm topology on $D[0,T] \times D_0$ to $(X, g^{-1})$, where $P\bigl((X, g^{-1}) \in C[0,T] \times C_0\bigr) = 1$, $D_0$ consists of the elements $\tilde g$ of $D[0,1]$ that are nondecreasing and satisfy $0 \le \tilde g(t) \le T$ for all $t$, and $C_0 = D_0 \cap C[0,1]$. This in turn implies that $X^n \circ g_n^{-1}$ converges weakly in the supremum topology to $X \circ g^{-1}$ (see Billingsley, 1968, pg. 144). The proof for step 3 is identical to the above and is omitted.
To show that $f$ as defined in step 4 is continuous, let $(x^n, y^n)$, $(x,y) \in C[0,1] \times \{z \in C[0,1] : \inf_{t\in[h,1-h]} z(t) > 0\}$ and suppose that
\[
\sup_{t\in[0,1]} |x^n_t - x_t| + \sup_{t\in[0,1]} |y^n_t - y_t| \longrightarrow 0 \quad\text{as } n \to \infty.
\]
Since $\inf_{t\in[h,1-h]} y_t > 0$, for $n$ large $\inf_{t\in[h,1-h]} y^n_t > 0$ and
\[
\sup_{t\in[h,1-h]} \Bigl|\, \frac{x^n_t}{(y^n_t)^{1/2}} - \frac{x_t}{(y_t)^{1/2}} \,\Bigr|
\le \sup_{t\in[h,1-h]} |x^n_t - x_t|\ O(1)
+ \sup_{t\in[h,1-h]} \bigl| (y_t)^{1/2} - (y^n_t)^{1/2} \bigr|\ O(1).
\]
The square root function is uniformly continuous on $[0,\ 2\sup_{t\in[h,1-h]} y_t]$; hence, for large $n$, $|f(x^n,y^n) - f(x,y)|$ is small if $\sup_{t\in[0,1]} |x^n_t - x_t| + \sup_{t\in[0,1]} |y^n_t - y_t|$ is small; that is, $f$ is continuous on $C[0,1] \times \{z \in C[0,1] : \inf_{t\in[h,1-h]} z(t) > 0\}$. $\Box$
APPENDIX: ADDITIONAL NOTATION

$N(\bullet) = \sum_{i=1}^n N(i)$

$\bar N(\bullet) = n^{-1}\sum_{i=1}^n N(i)$

$M_s(i) = N_s(i) - \int_0^s e^{\beta_0(u)X_u(i)}\,Y_u(i)\,\lambda_0(u)\,du$, $s \in [0,T]$, $i = 1,\ldots,n$

$\mathcal{F}_t = \sigma\{N_s(i),\ s \le t,\ i = 1,\ldots,n\}$, $t \in [0,T]$

$\mathrm{Ln}\,x$: natural logarithm of $x$

$\bar{\mathcal{B}}$: the closure of the open set $\mathcal{B} \subset \mathbb{R}$

$a_i = \sum_{j=1}^{i} \ell_j$, $a_0 = 0$, $i = 1,\ldots,K$ (the dependence of $\ell_i$ on $n$ is suppressed)

$i(t) = j$ if $t \in (a_{j-1}, a_j]$

$\underline{\lim}_n$: $\liminf_{n\to\infty}$; $\overline{\lim}_n$: $\limsup_{n\to\infty}$; and $\lim_n$: $\lim_{n\to\infty}$

$I_j$ is the indicator variable for the interval $(a_{j-1}, a_j]$

$\mathcal{L}_n(\beta) = \sum_{i=1}^n \int_0^T \beta(s)X_s(i) - \mathrm{Ln}\,S_n(\beta,s)\,dN_s(i)$

$\hat\sigma^2_n(t) = -\sum_{j=1}^K I_j(t)\,(\ell_j n)^{-1}\,\dfrac{\partial^2}{\partial\beta_j^2}\,\mathcal{L}_n(\hat\beta)$, $t \in [0,T]$

$\sigma^2(t) = V(\beta_0,t)\,S(\beta_0,t)\,\lambda_0(t)$, $t \in [0,T]$

$\sigma_{nj}^2 = \int_0^T I_j(s)\,\sigma_n^2(s)\,ds$, $j = 1,\ldots,K$

$\sigma_n^2(t) = V_n(\beta_0,t)\,S_n(\beta_0,t)\,\lambda_0(t)$, $t \in [0,T]$

$\hat\beta_j = \arg\max_{b\in\bar{\mathcal{B}}}\ \mathcal{L}_n(b)_t$, where $\mathcal{L}_n(b)_t = \sum_{i=1}^n \int_0^t b\,X_s(i) - \mathrm{Ln}\,S_n(b,s)\,dN_s(i)$, $t \in (a_{j-1}, a_j]$
REFERENCES
O.O. Aalen, A Model for Nonparametric Regression Analysis of Counting Processes, Springer Lect. Notes in Statist. 2 (1980) 1-25.

O.O. Aalen, Nonparametric Inference for a Family of Counting Processes, Ann. Statist. 6 (1978) 701-726.

J. Aitchison and S.D. Silvey, Maximum-likelihood Estimation of Parameters Subject to Restraints, Ann. Math. Statist. 29 (1958) 813-828.

J.A. Anderson and A. Senthilselvan, A Two-step Regression Model for Hazard Functions, Appl. Statist. 31 (1982) 44-51.

P.K. Andersen and O. Borgan, Counting Process Models for Life History Data: A Review, Scand. J. Statist. 12 (1985) 97-158.

P.K. Andersen and R.D. Gill, Cox's Regression Model for Counting Processes: A Large Sample Study, Ann. Statist. 10 (1982) 1100-1120.

A.R. Barron and T.M. Cover, Minimum Complexity Density Estimation, unpublished manuscript, Dept. of Statistics, Univ. of Illinois, Urbana, IL, 1989.

I.V. Basawa and B.L.S. Prakasa Rao, Statistical Inference for Stochastic Processes (Academic Press, London, 1980).

J.M. Begun, W.J. Hall, W. Huang, and J.A. Wellner, Information and Asymptotic Efficiency in Parametric - Nonparametric Models, Ann. Statist. 11 (1983) 432-452.

P. Billingsley, Statistical Inference for Markov Processes (The University of Chicago Press, Chicago, 1968a).

P. Billingsley, Convergence of Probability Measures (John Wiley & Sons, New York, 1968b).

Z.W. Birnbaum and A.W. Marshall, Some Multivariate Chebyshev Inequalities with Extensions to Continuous Parameter Processes, AMS 32 (1961) 687-703.

O. Borgan, Maximum Likelihood Estimation in Parametric Counting Process Models, with Applications to Censored Failure Time Data, Scand. J. Statist. 11 (1984) 1-16.

P. Bremaud, Point Processes and Queues, Martingale Dynamics (Springer-Verlag, New York, 1981).
C.C. Brown, On the Use of Indicator Variables for Studying the
Time-Dependence of Parameters in a Response-Time Model, Biometrics
31 (1975) 863-872.
D.R. Cox, Regression Models and Life Tables (with discussion), J. Roy. Statist. Soc. B 34 (1972) 187-220.
E. Csaki, Some Notes on the Law of the Iterated Logarithm for Empirical
Distribution Function, in: P. Revesz, ed., Colloquia Mathematica
Societatis Janos Bolyai. II., Limit Theorems of Probability Theory
(North Holland, Amsterdam-London, 1975).
K.O. Dzhaparidze, On Asymptotic Inference about Intensity Parameters of
a Counting Process, Report, Centre for Mathematics and Computer
Science, Amsterdam, 1986.
R.A. Fisher, On the Interpretation of χ² from Contingency Tables and Calculation of P, J. Roy. Statist. Soc. A 85 (1922) 87-94.
M. Friedman, Piecewise Exponential Models for Survival Data with
Covariates, Ann. Statist. 10 (1982) 101-113.
S. Geman and C. Hwang, Nonparametric Maximum Likelihood by the Method of
Sieves, Ann. Statist. 10 (1982) 401-414.
R.D. Gill, Censoring and Stochastic Integrals (Mathematisch Centrum,
Amsterdam 1980).
U. Grenander, Abstract Inference (John Wiley
& Sons, New York, 1981).
J. Hajek and Z. Sidak, Theory of Rank Tests (Academic Press, New York, 1967).
R.T. Holden, Failure Time Models for Thinned Crime Commission Data, Soc. Methods & Research 14 (1985) 3-30.
S. Johansen, An Extension of Cox's Regression Model, Int. Statist. Rev. 51 (1983) 258-262.
N.L. Johnson and S. Kotz, Continuous Univariate Distributions - I,
Distributions in Statistics (John Wiley & Sons, New York, 1970).
Ju.M. Kabanov, R. S. Lipcer and A. N. Shiryayev, Absolute Continuity
and Singularity of Locally Absolute Continuous Probability
Distributions II, Math. USSR Sbornik 36 (1980) 31-58.
A.F. Karr, Inference for Thinned Point Processes, with Application to Cox Processes, J. Multivariate Anal. 16 (1985) 368-392.
A.F. Karr, Maximum Likelihood Estimation in the Multiplicative Model via
Sieves, Ann. Statist. 15 (1987) 473-490.
A.F. Karr, Point Processes and their Statistical Inference (Marcel
Dekker, Inc., New York, 1986).
P.E. Kopp, Martingales and Stochastic Integrals (Cambridge Univ. Press,
Cambridge, 1984).
E. Lenglart, Relation de Domination entre deux Processus, Ann. Inst. H.
Poincare 13 (1977) 171-179.
J. Leskow, Histogram Maximum Likelihood Estimator of a Periodic Function
in the Multiplicative Intensity Model, Statistics & Decisions 6
(1988) 79-88.
I.W. McKeague, A Counting Process Approach to the Regression Analysis of Grouped Survival Data, Stochastic Processes Appl. 28 (1988) 221-239.
T. Moreau, J. O'Quigley and M. Mesbah, A Global Goodness-of-fit Statistic for the Proportional Hazards Model, Appl. Statist. 34 (1985) 212-218.
H.T. Nguyen and T.D. Pham, Identification of Nonstationary Diffusion Model by the Method of Sieves, SIAM J. Control and Optimization 20 (1982) 603-611.
M.P. O'Sullivan, A New Class of Statistics for the Two-Sample Survival
Analysis Problem, thesis, Biomathematics Group, Univ. of
Washington, Seattle, Washington, 1986.
D. Pollard, Convergence of Stochastic Processes (Springer-Verlag, New
York, 1984).
S. Portnoy, Asymptotic Behavior of Likelihood Methods for Exponential Families When the Number of Parameters Tends to Infinity, Ann. Statist. 16 (1988) 356-366.
J. Praagman, Bahadur Efficiency of Rank Tests for the Change-point
Problem, Ann. Statist. 16 (1988) 198-217.
H. Ramlau-Hansen, Smoothing Counting Process Intensities by Means of Kernel Functions, Ann. Statist. 11 (1983) 453-466.
R.R. Rao, The Law of Large Numbers for D[0,1]-Valued Random Variables, Theor. Probab. Appl. 8 (1963) 70-74.
R. Rebolledo, Sur les Applications de la Theorie des Martingales a
L'etude Statistique d'une Famille de Processus Ponctuels, Springer
Lecture Notes in Mathematics 636 (1978) 27-70.
S.N. Roy, On a Heuristic Method of Test Construction and Its Use in
Multivariate Analysis, AMS 24 (1953) 220-238.
D.M. Stablein, W.H. Carter, Jr., and J.W. Novak, Analysis of Survival Data with Nonproportional Hazard Functions, Controlled Clinical Trials 2 (1981) 149-159.
J.D. Taulbee, A General Model for the Hazard Rate with Covariables,
Biometrics 35 (1979) 439-450.
W.H. Wong, Theory of Partial Likelihood, Ann. Statist. 14 (1986) 88-123.
D.M. Zucker and A.F. Karr, Nonparametric Survival Analysis with Time-Dependent Covariate Effects: A Penalized Partial Likelihood Approach, to appear in Ann. Statist., 1989.