No. 200lT TIME-DEPENDENT CX>EFFICIENTS IN A CX>X-TYPE REGRESSION MODEL by Susan A. Murphy A Dissertation submitted to the faculty of The University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistics Chapel Hill 1989 Approved by: _ _ _ _ _ _ _ _ _ _ _----'Advisor _ _ _ _ _ _ _ _ _ _ _----'Reader _ _ _ _ _ _ _ _ _ _ _....R eader TIME-DEPENDENT CX>EFFICIENTS IN A CX>X-TYPE REGRESSION MODEL SUSAN A. MURPHY. (Under the direction of P.K. Sen.) ABSTRACf This thesis concerns itself with inference and estimation for a time-varying regression coefficient in a Cox-type parameterization of a point process intensity. sieves. The coefficient is estimated via the method of A rate of convergence in probability for the sieve estimator is derived and this rate is shown to be the best rate for the norm considered. A functional Central Limit Theorem for the integrated sieve estimator and a consistent estimator of the asymptotic variance process are given. Inference for the above time-varying coefficient centers around alternatives to a constant regression coefficient. The use of a likelihood ratio type test is explored for general infinite dimensional alternatives. Linear statistics and generalizations thereof are used to test for a change point or a monotone trend. For the linear statistic. weight functions which maximize the asymptotic power with respect to contiguous alternatives are derived. TABLE OF CX>NTENTS Chapter I. Page INTRODUCfION 1.1 1.2 1.3 1.4 1.5 II. The Cox Regression Model for Counting Processes The Counting Process Framework Parametrizations of the Stochastic Intensity Motivating Applications. Summary of Thesis and Further Research. The Likelihood for a Counting Process Cox's Partial Likelihood The Method of Sieves . The Statistical Model and Assumptions Consistency of the Sieve Estimator The i.i.d. Case. Lemmas 2.1 and 2.2. 18 19 24 26 30 34 3S ASYMPTOfIC DISTRIBUfIONAL TIIEORY 3.1 3.2 3.3 3.4 3.5 3.6 3.7 IV. 7 11 TIIE FSTlMATOR AND ITS CX>NSISTENCY 2.1 2.2 2.3 2.4 2.5 2.6 2.7 III. 1 2 5 Asymptotic Normality of the Sieve Estimator A Consistent Estimator of the Asymptotic Variance Process A Partial Likelihood Ratio Test of a Finite Dimensional Null HyPOthesis Versus an Infinite Dimensional Alternate Other Versions of the Partial Likelihood Ratio Test Consistency of the Partial Likelihood Ratio Test The i.i.d. Case Lemmas 3.1. 3.2. 3.3. 3.4 and 3.5 46 51 53 73 77 87 91 LINEAR TFST STATISTICS AND GENERALIZATIONS 4.1 A General Linear Statistic. 4.2 Efficacy of the Linear Statistic 4.3 A Test for Regression 4.4 A Test for a Change-Point 4.5 The i.i.d. Case 4.6 Lemmas 4.2 and 4.3. APPENDIX: REFERENCES ADDITIONAL NOfATION 107 114 123 132 145 147 154 156 I. INTRODUCfION 1.1 The Cox Regression model for Counting Processes This thesis is motivated by applications in which both an output counting process N and an input covariate process X are observed. Interest lies in estimating the effect of X on N and in formulating statistical tests concerning the regression relationship. In particular. allowance is made for a non-stationary effect of X on N and questions such as the following are addressed: 1) Is there an increasing relationship between X and the frequency of jumps of N? 2) Does a stationary relationship between X and N experience a "disruption" and change to a different yet still stationary relationship? 3) Is the relationship between X and N "stationary"? Just as the mean is parametrized in the classical regression of one random variable on another. here the conditional mean function can be parametrized in the "regression" of N on X. In particular the stochastic intensity (the derivative of the conditional mean function). of the output counting process is usually parameterized. Here Cox's parameterization (1972) of the stochastic intensity is adopted. i.e .• 2 i\ unknown scalar. o is an unknown function and 13 0 is an Cox proposed this model as a way to relate covariates such as treatment type • age • etc. to the chance of survival for an individual. However this model is not only useful in survival analysis where each counting process has at most one jump. but also in contexts where events can reoccur. The Cox regression model as given above is limited in that it does not allow for a time-dependent effect of X on N. time-dependent effect is to consider 13 o One way to allow for a as a function. The primary goal of this thesis is to consider estimation and inference for the function 13. o 1.2 The Counting Process Framework Consider an n-component multivariate counting process. N = '" (N{l) •... N{n». and a predictable stochastic process over the time interval [O.T]. ~ {O'~'{~t stochastic intensity intensity ~{X} ~{~} (X{I) •... X{n» For example. N{i} might count certain life events for individual i and XCi} might be his age. a stochastic base = t € ~ is defined on [O.T]}} with respect to which = {i\{X.I), ... i\{~,l}}. ~ ~ has haVing stochastic implies that (I. I) is a local square integrable martingale with predictable variation, (M{i), M{j}> t = Jot i\ {X , i }ds s "'s = 0 for i=j for i "# j. 3 ~ Associated with is the marked point process ( (T .Z ). (T .Z ) .... ) 1 1 2 2 where T. is the time of the jth jump of N(e) (N(e)= ~ N(i» J i=l specifies which of the components of jumped at time T .. ~ J and Z. J The localizing times in (1.1) can be taken to be either the {T.}.>l or J J_ {S.}.>l where J J- = co otherwise It is easily proved that Ar (~ j .i) > a ~ j 1. on Z. = i. j J In some applications. there are intervals of time in which jumps of the N(i) can not be observed. a exists which is If an observable predictable process C(i) on these intervals and is 1 otherwise. then ~*. where N~(i) = s~ Cs(i) dNs(i). i~l. is a multivariate counting process with stochastic intensity * ~ (~) A(~.l)C(l), = ( ... , A(~,n)C(n». The localizing times for the martingale relationship will be as before. fact, for ~ ITO . 1 1= ga vector of predictable processes, if Ics (i)\ A (X ,i)ds s ~s ~ sot C (i) dN (i) - i=l s In s < co a.s. ~, then M. where Mt = ~ sot C (i) A (~ ,i)ds, t ~ [O,T], is a local i=l s s s square integrable martingale with localizing times {Sj}.>ldefined by J- Sj = inf { S~ ICs(i)I As(Xs,i)ds ~ j} t~O =co if { ... } # ~ otherwise j ~ 1. A review of the martingale theory used in this thesis can be found in Anderson and Borgan (1985), Bremaud (1981), and Kopp (1984). For completeness two results that will be used repeatedly in the following work are given below. Karr (1987) is given first: Lenglart's inequality as formulated by 4 LengLart's InequaLity {Lenglart, 1977}. Let X.Y be adapted. right continuous nonnegative processes such that Y is predictable and nondecreasing with YO = o. Suppose that Y dominates X in the sense that E[~] ~ for every finite stopping time T. o and E[YT]; then for each e.~ > every finite stopping time T. If X-Y forms a local martingale then for every finite stopping time T. E[~] = E[Y ] and the above result can be utilized. T The next theorem is a central limit theorem of Rebolledo {1978}. as generalized by Anderson and Gill {1982}: ReboLLedo's CentraL Limit Theorem for LocaL square integrabLe martingaLes. For each n=1.2 •... let NO ~ be a multivariate counting Let ~ be a p x n {p ~ 1 is fixed} matrix of process with n components. locally bounded predictable processes. Suppose that ~n has an intensity process ~n. and define local square integrable martingales ~ = {~ ....• ~}~ p ~{t} ~ ~k{s} {dN~{s} - ~{s} = Jt o k=l ds}. Let A be a p x p matrix of continuous functions on [O.T] which form the 00 covariance functions of a continuous p-variate Gaussian martingale W . ~ 00 00 00 with W CO} = 0; i.e. Cov{Wi{t}. Wi{s}} = Aij{t A s} for all i. j. t and s. Suppose that for all i. j. and t < ~. as n ~ 00 ~j >t= Jt ~ ~k{s}~k{s}~{s} ds ~ Aij{t} J o k=l and that for all i and e > 0 5 ~ H~k{s)~n{s) So k=l J n!2l Then W '" 1.3 ~ ~ 1 W00 as n '" ~ I{I ~k{s) I > c} ~ ~ ds 0 as n ~ 00. 1 p 00 in D{ [O.T] ). Parametrizations of the Stochastic Intensity In statistical applications the stochastic intensity. ~. is often Basawa and Rao (1980) and Borgan (1984) among others. parametrized. consider a finite dimensional parametrization Both Basawa and of~. Rao. and Borgan use maximum likelihood estimation in order to construct estimators. A univariate linear regression model for ~(s)Xs{i) i~l) (1988). ~. (As{X.i) = is used by Aalen (1978. 1980). Karr (1987) and Leskow To estimate ~. Aalen proposes a martingale estimator whereas Karr and Leskow use the method of sieves (Grenander. 1981). McKeague (1988) in a multicovariate linear regression model derives estimators via a weighted least squares method. Another regression model for A is the now classical Cox regression '" model (Cox. 1972). Here. As (X.i) =e ~oXs{i) Ys {i)A0 (s). i€{l •...• n}. where A is an infinite dimensional nuisance parameter and Y is a vector o '" of predictable stochastic processes taking values in {O.l} indicating censored intervals. ~. o Various approachs are taken in order to estimate One approach presented in Anderson and Borgan (1985) is to let A 0 be a known function of a finite dimensional parameter. estimate e and ~ o e. and then simultaneously using maximum likelihood estimation. Friedman (1982) parametrizes A by a step function and investigates the o properties of the maximum likelihood estimator of number of steps defining A to increase with n. o ~ o • while allowing the On the other hand. Cox 6 (1972) eliminates A by the use of a partial likelihood in order to o estimate p. This partial likelihood will be discussed further in o chapter 2. Asymptotic properties of the maximum partial likelihood estimator are given by Anderson and Gill (1982). As was mentioned in the introduction. this thesis is concerned with a time dependent regression relationship between the covariate and the This means Po is a function on [O.T]. i.e. As (X.i) = counting process. ~ P (s)X (i) e s 0 Y (i)A (s). s 0 regression coefficient. Several authors have considered a time-varying Brown (1975) using a discrete version of Cox's model. parametrizes both p and A as step functions. o 0 the likelihood in order to estimate Po and A. 0 Stablein et. al. (1981) parametrizes P as a polynomial function in time. o likelihood in order to estimate p. o He then maximizes They use the partial Anderson and Senthilselvan (1982) also use the step function parametrization for p as did Brown. but they o allow the data to choose the jump points of the step function. Essentially they consider a step function with two steps. so that there are three parameters, the value of the step function before the step, the time of the step and the value of the step function after the step. The three parameters are estimated by maximizing Cox's partial likelihood. Each of the above authors makes simplifing assumptions on the form of the function p so as to maintain a finite dimensional parameter o space. However, Zucker and Karr (1989), using a penalized likelihood technique, allow P o to be infinite dimensional. They prove that for large n, the maximizer of the penalized partial likelihood is a polynomial spline with degree of the polynomial influenced by the choice of the penalty function. The knots of the spline occur at the jump 7 points of the counting process. All of the above analyses are developed within the survival analysis context; that is where N can have at most one jump. The estimation method presented in this thesis is applicable not only in the survival analysis context. but also in the more general context where N is allowed multiple jumps. This method which also allows p o to be infinite dimensional. utilizes the method of sieves {Grenander. 1981}. and in particular a very simple sieve. the histogram sieve. This choice of a sieve retains the simplicity of analysis present in methods involving only a finite dimensional parameterization of the regression coefficient p . o 1.4 Motivating Applications In order to consider an application in survival analysis. consider first how one might formulate a survival analysis model in the counting process context. Suppose that n individuals {operating independently} are observed until failure. Let T •...• T be the failure times. and n 1 suppose that the hazard rate for the distribution of T. is given by h .. I Let Nt{i} = I{Ti~t}. and Yt{i} = I{Ti~t}. i=I •.... n. I Then {N {I} •...• N {n}} forms a multivariate counting process with respect to t t its' internal filtration stochastic intensity i=I .... n. t€[O.T] {~ = a{Ns{i}. s~t. i=I •...• n}}t~O. The of N{i} is given by. {Aalen. 1978}. At{i} = hi{t}Yt{i}. If {C{I} •...• C{n}} is predictable {say. left continuous. right hand limited and adapted to ,N} where each C{i} takes values in {O.I}. then N~{i} = f~ Cs{i} dNs{i}. i~l. also forms a multiplicative counting process with respect to ,N with 8 * = Ct(i)hi(t)Yt(i). stochastic intensity At(i) i~l. On the other hand. if the C(i)'s are independent of the N(j)'s. then by enlarging = V a(C(i). i=l •.... n». (N* (1) ....• N* (n» .-N <~. counting process with respect to * ~ ii to ~* forms a multivariate (Bremaud. 1981). For a discussion of various censoring mechanisms that can be incorporated through C. see Gi 11 (1980). In survival analysis one might be interested in comparing an invasive treatment (say. surgery and medication) to a less invasive treatment (say. medication alone). In this situation. it is often expected that the hazard rate for the invasive treatment will be high for some length of time and then drop. possibly to or below the hazard rate for the noninvasive treatment. might appear as in Figure 1. A graph of the hazard functions In the Cox regression model. the hazard at I3X time s is giyen by e 0 A (s). where X=l for the invasive treatment and o o otherwise; that is. the hazard for the invasive treatment would be 130 e A (s) and A (s) would be the hazard rate for the noninvasive o treatment. 0 Obviously this does not fit the hazards graphed in Figure 1. In fact. it apPears that 13 is positive up to time s=t and is negative o thereafter. This is the type of problem which motivated the work illustrated in the last section. Consider also a continuous time Markov chain on a finite state space with inhomogeneous infinitesimal generator. Suppose n individuals move independently of one another from one state to the next. Let Nt{i.j.k) be the number of direct transitions from state i to state j up to and including time t by individual k (i.jE<l •...• m». If the transition intensity from state i to state j for individual k is given 1 bya ijk • a L [O.T] function. and if the process is Markovian. then ~ = 10 N(i.j.k). i.j.k~1. i#j forms a multivariate counting process with stochastic intensity. At(i,j.k) = aijk(t)Yt(i.k). where Yt(i.k) = 1 if the k th individual is in state i at time t-. Yt(i.k) (Aalen. 1975). = O.otherwise 'fhe filtration here is the internal filtration. Various forms of censoring can be considered as in the survival analysis case. Both Anderson and Gill (1982) and Anderson and Borgan (1985) consider the Cox regression model in this context; modeled conditionally on X by exp( that is. where a ~o(i.j)Xt(k) ) AO(t). ijk is Here as in survival analysis. one can envision situations in which the effect of a covariate on the transition intensities varies with time. In sociology a simi liar problem might occur in the study of criminal recidivism ( see Holden . 1985 for some failure time models used in this area). It is easy to envision a situation in which the effect of the correctional treatment on the crime rate of an individual will slowly disappear with time. Another application arises in queuing problems. Recall that a simple queue can be constructed by a multivariate counting process. (N(l). N(2». and Q a nonnegative integer valued random variable. o the queue. Q. is given by. Q = Q + At - D where At t t o I~ I(Qs_>O) dN s (2) (Bremaud. 1981). = Nt (l) Then and Dt = In the comparison of various servers generating the output process D. an inexperienced server may process the customers at lower rate than an experienced server. but the rate of the inexperienced server may increase with time. Basawa and Rao (1980) consider inference for queueing problems when the parameters are time independent. 11 1.5 Summary of Thesis and Further Research In chapter II. the problem of estimating a time dependent coefficient. ~ o . in the Cox regression model is addressed. Some modification of the classical approach to maximizing the likelihood is needed. since as explained in section 2.3. the likelihood is maximized at 00. here. spaces. The method of sieves (Grenander. 1981) is the modification used In the method of sieves an increasing sequence of subparameter {GK . n~1}. is considered for which their union is dense in the n target parameter space. 9. Classically the metric on the target parameter space is derived in some fashion from the Kullback-Leibler information (Grenander. 1981; Geman and Hwang. 1982). maximized over GK . n increase with n. GK Karr. 1987; Nguyen and Pham. 1982; For a sample of size n. the likelihood is To achieve consistency. K is then allowed to n In this thesis the histogram sieve is used; that is. consists of all step functions with steps at predetermined points. n ~o is estimated by a step function. This means that for a sample of size n. consistent for ~ o pn o with K steps n is shown via a but involves unknown quantities. the projection of fj as the metric. Consistency of pn. ~. which is Essentially ~ is on the sieve using a "Kullback-Leibler information" Grenander. Geman and Hwang. Nguyen and Pham. and Karr all make use of the projection on the sieve in proving consistency; however. in the applications they consider. the Kullback-Leibler information translates in a 1:1 fashion into a metric on the target parameter space. This does not appear to occur here. instead the Kullback-Leibler information can be approximated to the first order by 2 an L norm. ~ fj is then the projection on GK n 2 using this L norm. 12 "n Theorem 2.1 shows that the distance between ~ and -n ~ approaches zero (in probability) as the sample size increases and gives a rate of 2 convergence in L . That this rate is the best achievable for this norm can be seen from note 2 following Theorem 2.1. In this case, as in similiar cases, (Karr, 1985; Ramlau-Hansen, 1983; 1989; Leskow, 1988; Zucker and Karr, Nguyen and Pham, 1982), appears to have a limi ting "whi te noise" distribution; hence, for inference purposes it is useful to consider an integrated version of " = f Ot ~(s)-~o{s)ds. i.e., Zt "n ~ ~, It is natural to expect that Z, properly normalized, will converge weakly to a Gaussian martingale. indeed the case is proved in Theorem 3.1. That this is McKeague{I988) proves functional weak convergence in a similiar situation, that of the method of sieves combined with least squares estimation for a multivariate version of Aalen's (1978) multiplicative intensity model, i. e., instead of the intensity modeled by At{X) At{X) = ~o{t)Xt. =e ~ 0 (t)X t AO{t), it is modeled by As would be expected, the competing interests of bias and variance must be traded off in order to obtain a weak convergence ~.~ Essentially the weak convergence of n f result. requires that K «n~ n ~ • and K »n~ n ;;Il n f O p (s)-~o{s) ds to zero. ~ . ~ for n f O p (s)-~o{s) O ;;Il p {s)-p (s) ds is reqUired for the convergence of In addition to the weak convergence result ds, a consistent estimator of the asymptotic variance process is given. In chapter III, the problem of formulating a consistent test of a finite dimensional null hypothesis versus an infinite dimensional alternate hypothesis is addressed. on the null hypothesis that alternate hypothesis that ~ ~ .0 o In particular attention is focused is a constant function versus the is nonconstant. As a first step a partial 13 likelihood ratio test is considered. That is. two times the ratio of the partial likelihood maximized under the alternate hypothesis to the partial likelihood maximized under the null hypothesis. Just as in the problem of estimating an infinite dimensional parameter. the partial likelihood under the alternate is maximized at 00. This problem can be remedied. as before. by working within the context of a sieve. for a sample of size n. consider the null hypothesis. versus the alternate hypothesis. ~o € ~. ~ o That is. is constant It is natural to eXPect that n the partial likelihood ratio test (PLRT) is approximately distributed as a chi-squared random variable on K -1 degrees of freedom. n intuition behind Theorem 3.3 in which This is the PLRT- (K -1) _ _ _~n;.;;.""",..,...... is shown to [2{K -1)]~ n converge in distribution to a N{O.l) random variable under the null hypothesis. Alternately Fisher's (1922) transform of the chi-squared random variable can be considered: 2~{ PLR~-{Kn -1)~ } which can also be shown to converge in distribution to a N{O.I) random variable. addition. a sequential version of the PLRT is given. In Moreau. O'Quigley and Mesbah (1985) propose the use of a score test for the alternate hypothesis that ~o is a "K-step" step function. Unless ~o is actually a "K-step" step function. this test as it stands will not be consistent for general nonconstant ~. o However within the sieve context. ( i. e .. as data/information accumulates. allow more steps in alternate hyPOthesis). a consistent test results. as is proved in Theorem 3.4. The PLRT can be used to investigate question 3 as posed at the beginning of section 1.1. In order to investigate questions such as 1. and 2. in section 1.1. a linear statistic is proposed. i.e .. St t A = f O w{s){~{s)-~o{s»ds where 14 w is an appropriate weight function. ~ has a Recall that. intuitively. limiting white noise distribution. in other words. the {~(s)}S€[O.T] act as independent normal random variables with means {Po(s)}s€[O.T]. Given K independent random variables Xl ... ·.~ for linear inferences concerning the the optimal statistic is the ~·s. with means ~1' ... linear statistic L w.X. where the w. are appropriate weights. ill w(s)~(s)" = I w(s)~(s) ds. then The 1 analogous statistic here is "L s '~ This type of statistic has been used by Aalen (1978). Gill (1980). and O'Sullivan (1986) among others. In particular Gill (1980) uses the statistic I w(s)~(s) ds for inference in Aalen's (1978) multiplicative intensity model (At (X) = fJRO(t)X t ). ITo w(s)~(s) ds = IT0 In this case w(s)X- 1 s dN. s Ie0 ~(s) ds Ie0 X-I dN s so that S At the beginning of chapter IV. asymptotics for a general linear statistic given. = I~ w(s)[~(s) - Po(S)] ds are In fact since w may involve unknown quantities. conditions are provided for the convergence of an estimator. w • to w so as to preserve n the asymptotic distribution. Naturally this section includes a consistent estimator of the asymptotic variance of ITo wn (S)[~(S) - P0 (S)] ds. In order to select a weight function. an optimality criterion must be adopted. One optimality criterion would be to maximize asymptotic power against a contiguous alternative (P o (s) + p(s)n-~). This is the method Gill(198O) uses in order to formulate two-sample tests in survival analysis. To return to the subject of chapter III. the PLRT. it is interesting to note that the PLRT is not consistent against contiguous alternatives. Intuitively this is because the PLRT is consistent against all directions of approach to P unlike the linear o tests proposed in chapter IV. This is discussed further in note 3 15 following Theorem 3.4. In chapter IV the choice of the weight function is based on maximizing the asymptotic power against a contiguous alternative. Since the focus of this thesis is on deviations from the null hypothesis that ~ o STo functions for which is a constant unknown function. only weight w(s) ds =0 are considered. Using Le Cam's third lemma (Hajek and Sidak. 1967. p 208) it is easy to derive the asymptotic power of the linear statistic and to subsequently derive the weight function test by choice w~ which maximizes the asymptotic power. LS(w~). of~. Of course LS(w~) Denote the resulting is only optimal for the particular It is then tempting to employ Roy's union-intersection principle (Roy. 1953); that is to reject the null hypothesis if for any ~. LS(w~) direction of approach. is large. but this returns one to the PLRT which is inconsistent against contiguous alternatives as mentioned above. In the last two sections of chapter IV. attention is focused on linear test statistics for the alternatives formulated by questions 1. and 2. in section 1.1. That is. the null hypothesis is that ~ o is a constant unknown function and the first alternate hyPOthesis is that ~ ~ is an increasing function and the second alternate hyPOthesis is that is constant up to some unknown time constant thereafter. T. o o changes value and then remains To formulate a test for increasing alternatives. the weight function is chosed to be optimal for the contiguous alternative. ~ ~ o + sn ; i.e .• the linear alternative. Asymptotics under both the null hypothesis and under the contiguous alternate hypothesis for this test are given in corollary 4.1. Also. the test statistic given in corollary 4.1 is consistent for the fixed alternate that an increasing function. ~o It turns out that the asymptotic power under is 16 the contiguous alternative, ~ o + sn-~, of the linear test can be attained by a much simpler test proposed by Cox(1972). for H : At(X) o =e aX t AO(t), a, A unknown, versus HI: At(X) O (a+~t)Xt e This is the PLRT A (t), a, A unknown, o 0 = ~#O. The test for the second alternate (that of a change point) is more interesting than the test given above. First the change point is assumed to occur (if at all) at time T. An optimal weight function, can then be derived for the contiguous alternate Call the resulting test statistic TS (T). n (~o + on~I{s~T}, 0>0). It turns out that one can derive the asymptotic distribution of TS (taken as a function in n C[O,T]) under both the null hypothesis and the contiguous alternate. Since, in fact the time of change is unknown, Roy's union-intersection principle (Roy, 1953) can be used to justify the use of sup TS (T) as n T the test statistic. This test statistic is consistent under the contiguous alternate (~o + On -~ I{s~T}, 0>0) and asymptotically has a tabulated null distribution. In each of the following chapters a section is devoted to the observation of independent and identically distributed copies of (~,~). This is done so as to illustrate that simple moment conditions (at least in the i.i.d. case) are sufficient to imply the assumptions of the theorems in this thesis. Some questions which spur further thought and/or research are as follows: 1) How useful are sieves in hypothesis testing? As is seen in this thesis, the sieve approach in hypothesis testing allows one to formulate consistent tests for infinite dimensional alternatives. In fact the 17 commonly used X2 goodness-of-fit test can be formulated within a sieve context. A related question is whether, in the sieve context, the X2 approximation to the likelihood ratio test is valid for finite n. 2) It is plausible that a perceived time dependency in the regression coefficient may be caused by an omitted covariate. Is there some way to discriminate between a true time dependency and superfluous time dependency caused by the omitted covariate? 3) Would the concept of stochastic complexity be useful in order to more clearly identify the optimal rate of sieve growth? Barron and Cover (1989) use this concept in density estimation. 4) Can something be said in general about the usefulness of fixed point theorems in consistency proofs for finite dimensional sieve estimators? 5) Will sup s€[O,T] Ip~ (s) - ~n (s)1 (properly normalized) have an asymptotic extreme value distribution? II. 2.1 THE ESfIMATOR AND ITS CONSISTENCY The Likelihood for a Counting Process ~ Let ~) and ~. be two probability measures on a measurable space (0. and let N be a n-dimensional vector of step functions. each of which '" is right continuous with jumps of size 1. filtration }. {~t = ~ V ~o : t ~ t ~ [O.T]. under ~. ~ Assume that ~ On (0. ~) consider the [O.T]} where and ~. - ~o C are equal on ~ and ~ = ~ 0 . Also assume that is a standard multivariate Poisson process and under a multivariate counting process with stochastic intensity A. ~ fTo A (i)ds < 00 a.s. ~. then ~ i=1 ~s' s~t u{ s « ~ '" on ~. ~ If ~T with Radon-Nikodym derivative. = exp{ ~ ft Ln A (i) dN (i) + ~ ft I-A (i) ds} i=1 0 s s i=1 0 s o~ (Karr. 1986). t ~ T Using the associated marked point process ( (T .Z ). 1 1 (T2 .Z2 ) •... ). the above likelihood. Lt' can be rewritten as: Ark (~) Ar (.) • ( k A_ -T k (.» exp{f0t n-A s (.) ds} is 19 (2.1) = P[ IT k~1 7_ ~ I~T] k - exp{J t t Ln A (-) dN (-) + J n-A (-) ds} s 0 s s 0 Tk~t (a (-) in the place of an index means summed over the index). (Bremaund. 1981). Note that the second term in the above product is the likelihood for N(-) except for the term e(n-l)t. One gets that IT k~l P[ ~ I~T-] k Tk~T is proportional to the conditional likelihood of (N(l) •... N(n» given N(-) on [O.T]. on [O.T] This relationship will be useful for an intuitive understanding of the partial likelihood. 2.2 Cox's Partial Likelihood In order to reduce the dimensionality of a parameterization containing many nuisance parameters. Cox (1975). proposed maximizing a partial likelihood instead of the full likelihood to estimate the parameters of interest. Let y be an observation of the random variable '" factored as: m IT f (.) (J_l)(z.lt(j).z(j-l);p.A) j=l Z .IT J • Z J J where t(j) = (t1 •...• t j ). z(j) = (zl ..... Zj). p represents the parameters of interest and A represents the nuisance parameter. The 20 second product is the partial likelihood. In order to construct the Cox suggested that none of the T.·s should J contain important information about partial likelihood. ~ and A should not appear in the Wong (1986) derives sufficient conditions for consistency of estimators based on the partial likelihood. In the same paper. Wong also presents conditions for asymptotic normality and gives formulae for examining asymptotic efficiency. One does not expect the maximum partial likelihood estimator to be asymptotically fully efficient as some information concerning the parameters of interest has been discarded. However there is a gain in simplicity of analysis and in robustness since the functional form of A need not be specified. The efficiency of the partial likelihood in the counting process context will be addressed below. The concept of partial likelihood is very useful for the statistical analysis of Cox's regression model. Cox (1972) considered Y •...• Y • independent. nonnegative. random 1 n variables with hazard functions. lim P[Y i ~ t+hlY i ~ t]1h = At(i) h!O i=l •...• n. where He parametrized At(i) = At(~.i) by AO(t) exp{~ Xt(i)} Xt(i) is the covariate at time t for the random variable Y and i A is a nuisance function. o The parameter of interest. ~. measures the effect of the covariate X on the underlying hazard rate A. o likelihood of Y •...• Y is. 1 n The 21 n = (2.2) 1 11 i=1 ~=1 exp{{j Xy.(j)} 1 YjLY i Y. n • exp{{j Xy. (i)} Ao (Y.) exp{{j ~_ (j)} ) exp{-J0 1 A (s) exp{{j X (i)} ds} 1 -~. o s 11 i=1 1 In order to estimate {j. Cox proposed maximizing the first term on the RHS (right hand side) above which is a partial likelihood. let Zj = k if To see this Y(j) = Yk and Tj = Y(j) where Y(j) is the jth lowest value out of (Y •...• Yn ). j=1 •.... n. 1 Then it is easy to show that P(Zj = klT j • (Tj_1.Zj_1) •.... (T1.Z1» = exp{{j Xy (k)} ~ i=1 YiLYk k exp{{j ~_ (i)} I{ Y > Y. k - ·~k } (J) which does not involve the nuisance parameter A . If Y is the failure k O th time of the k object. then intuitively the above is the probability that the k th object fails at time T given that there is a failure at j time T and the past history. j To set Cox's regression model in a counting processes context. let Nt(i) = I{Yi~t}. i=1 •...• n. Then (N t (1) ..... Nt (n» is a multivariate ~LO-counting process with intensities. At({j.i) = AO(t) exp{p Xt(i)} I{Y i L t}. i=1 •...• n (Aalen. 1978). If (T .Z ) •... (T .Z ) are defined as in the above paragraph. then the jump 1 1 n n times of ~ are T .T •.... T and process 1 2 n ~ jumps at time Tk . The likelihood on ~ with respect to the standard multivariate poisson point process is 22 n IT exp{ST Ln A (~.i)dN (i) + ST I-A (~.i) ds} 0 s s 0 s i=1 Y. n • IT exp{-S . 1 A (s) exp{~ X (i)} ds + T} . 1 0 1= e e 0 S The above is identical to the likelihood for Y ....• Y except for a 1 n nT factor of e . Thus from (2.1) and (2.2). is easy to see that the partial likelihood is ITk~1 T ~T k probability that the ~ th P[ ~ I~T-]: i.e .. the product over k of the k process jumps at time T given a jump at T k k e e and the past history. Within the counting process framework. Cox's regression model and partial likelihood can be easily extended to counting processes with more than one jump. ~ o t ~ ~~. ~ € 0 m. Let (O.J) be a measurable space. and let J = ~ V contains "prior events" to N. so that '!iT C ~. Let o be a probability measure on (O.~). under which Nt(i) has where ~ (~~'~t)-stochastiC intensity. At(~.i) ~t' ~t t = Yt(i) AO(t) exp{~ Xt(i)}. where. are (~~'~t)-predictable processes. Yt(i) € {0.1}. V t € [0. T]. i=l •.... n. A is an unknown function and o ~ is an unknown scalar. Then by analogy to the above. the partial likelihood can be defined by. (2.3) ~n(~) = ITk~1 tk~T e e 23 Recall that the above multiplied by e(n-1)T is the conditional likelihood of (N(1) •...• N(n» given N(·) on [O.T]. This means that the partial likelihood can be thought of as a "condi tional" likelihood. In this context. Anderson and Gill (1982) derive consistency and asymptotic normality for ~n. the maximizer of the partial likelihood. Y(i). XCi). i=1 •...• n i.i.d .• they show that under a Lindeberg An P A L condition and moment conditions. ~ the true value and For N(i). -1 ~ ~ and Yh(~-~) ~ N(O.~ ). for ~ positive. ~ Another justification for the use of the partial likelihood is given by Johansen (1983). maximizes ~ (~) n He demonstrates that the estimate ~n which can be viewed as a maximum likelihood estimate in a more general setting than that of a counting process. Allow ~ to have jumps of integer size greater than or equal to one [now instead of calling counting process. ~ ~ a is called a jump process] and replace Xo(t)dt by dA (t) in Cox's regression model. where A (t) is not necessarily o 0 continuous. = max A Then. the partial likelihood is a likelihood profile. ~ (~) n ~ (~.A). n Dzhaparidze (1986) investigates the optimality of maximizing the partial likelihood. parameter. ~. derived by Say. one wishes to estimate a in model with two unknowns (say. known that the best asymptotic variance for than if X is unknown. ~ ~ ~. X). Then it is well is smaller if X is known Essentially. if X is unknown. the inverse of the best asymptotic variance is the norm of the score for projection on the space spanned by the score for X. ~ minus its' If the directions of approach to X are restricted (prior knowledge concerning X). then the above projection will be the same or further away from the score for ~. resulting in a smaller best asymptotic variance. is This occurs when ~ 24 finite dimensional and X is an infinite dimensional nuisance parameter (as is the case in the Cox regression model). Dzhaparidze finds that if Cox's partial likelihood is maximized in order to estimate~, e e then the best asymptotic variance is achieved as long as the directions of approach to X are not too restricted. In fact, if X is a known function of a finite dimensional parameter then usually the directions of approach to X will be so restrictive so that maximizing the partial likelihood will not result in estimators with the lowest asymptotic variance. However, if one can not assume a specfic finite dimensional form for X then maximizing Cox's partial likelihood will result in o estimators with the lowest asymptotic variance. (1983) for more on this subject. Since ~ See Begun et. al. is a function of time in this thesis, the optimality properties derived by Dzhaparidze (1986) may not hold. It is of interest to investigate optimality properties of estimators of the function ~ but this will not be done here. e e In this thesis the focus is on how the covariates influence the underlying intensity, not on the underlying intensity itself. in order to estimate ~, Therefore Cox's partial likelihood is used and minimal assumptions are made concerning X . o 2.3 Method of Sieves If the parameter of interest is infinite dimensional, maximum likelihood estimation may not be meaningful. well-known problem of density estimation. 1 all positive L functions with integral 1. This is illustrated by the Here the parameter space is The likelihood, ~=1 f(X i ) can be made as large as desired simply by making f large at points close e e 25 to X. and zero elsewhere. 1 Grenander (1981). proposed the method of sieves as a way to accommodate such a large parameter space. increasing sequence of subparameter spaces. say {~ • An is given so n~l}. n that within each there exists a maximum likelihood estimate. say p~. ~ n and U is dense in 9. where 9 is the parameter space of interest. ~ n n For a sample of size n. the likelihood is maximized over ~. To n achieve consistency. K is allowed to slowly increase with n. n The sieve method is used by Karr (1987) to estimate the unknown ~ in the multiplicative intensity model (A s (i) = s Ys (i). see section Let. 1.3). 9 = ~ {~ is left continuous. right-hand limited. nonnegative ~ 1 function in L [O.1]} and ~ = {~ : ~ € 9. is absolutely continuous; n I~I ~ As K n f CD. E\ Kn f ~ a on [O.l]}. 1 in the L norm (Grenander. 1981). Under some n assumptions. Karr proves that 3 ~ € ~ • a global maximizer of the n likelihood. and for Kn =n ~-T} • 0 < 11 < ~. ~ II p -~ 111 -+ 0 p~ The a.s. underlying idea of the consistency proof in Karr and also in the proofs of Geman and Hwang (1982) centers on the Kullback-Leibler information <Wi For dx = just as do the classical proofs of consistency. Kullback-Leibler information is. f(x.~ o ) dx. KL(~) = Under general conditions J Ln( KL(~) >0 f(x.~ o f(x.~ )/f(x.~) o unless ~ = ~. 0 ). the ) The first step in the above proofs is to find a "natural" metric. say p. so that if lim KL(~n} = O. then lim p(~.~} = O. n n Once this is done. the 26 next step is to locate a sequence. ~ e ~ n One then shows that the mle. An ~ . and -n ~ e for which lim KL(~) = O. n become progressively closer with e increasing n. This should be the case since ~ minimizes KL (~) = n n n An ;;n Ln( f(X '~o)/f(X .~) ) and one expects that KLn(~ ) ~ KLn(p ) ~ KL(~o) < KL(~) and that KL n converges in a uniform fashion to KL. ensures that ~ will be consistent. classical proofs for consistency. This then ~ plays the role that ~o plays in As was mentioned in section 1.1. the sieve used here is the histogram sieve. ~ n = {~ ~ is constant on, 1.] k' k • j=l .... k}. (i::!. A "natural" metric does not seem to arise here. an important role. section 2.5. However. ~ still plays See the remarks following the consistency proof in The histogram sieve is used by Leskow (1988) for estimation purposes in Aalen's multiplicative intensity model (1978) and e· e McKeague (1988) uses this sieve in the p-covariate multiplicative intensity model {At(i) = ~~=1 ~t(j) Xt(i.j)} to estimate (~(1).~(2) ..... ~(p». 2.4 Statistical Model and Assumptions The statistical model considered in this thesis is as follows: each n. one observes an n-component multivariate counting process. n (N (l) .... Nn(n». over the time interval [O.T]. might count certain life events for individual i. For n ~ = For example. Nn(i) ~ ..... is defined on a Nn has stochastic base (nP.~n.{~ : t € [O.T]}) with respect to which ..... n stochastic intensity ~n = (A (l) •... An(n» where e e 27 martingale. In the above. both n n [O.T]. .... X = (X (l) •... Xn(n» ~ and A are deterministic functions on o 0 is a vector of locally bounded • predictable stochastic processes. and yn = (Yn(l) •... yn(n» is a vector of predictable stochastic processes each taking values in {O.l}. As explained in section 2.2. inference for ~ is based on the o logarithm of Cox's partial likelihood (Cox. 1972) (see (2.3». ~ (~) n = n ~ i=l A direct maximization of ~ (~) ~ for n will not produce a meaningful n estimator. For example. let .... X be time independent and each component n of ~n have at most one jump. then if Rank(X ) = n and the jump of Nn(i) ~(T )Xn(i) occurs at T. o /l;T Ln [ ne ~ e jx"(j) ] (which is negative) can be made as 0 j=l close to zero as desired simply by increasing 1989). ~(T o ) (Zucker and Karr. As was illustrated in section 2.3. the method of sieves (Grenander. 1981) is often useful in this situation. The histogram sieve is used here. ~ n n = {~ : n ~(s) = K ~ n ~. l{s € I~} i=l 1 1 for (~l' ... '~K ) The (ll •...• IK ) are consecutive segments of [O.T]. n n K € m n} 28 Define. for each s n Si(~.s) = -1 ~ n n e . 1 J= [a.T]. 6 ~(s)Xn(j) s . (Xns(j»l yns(j) e e i=O.1.2.3.4. The following conditions will be referred to repeatedly in the theorems and lemmas of this thesis. A. (Asymptotic stability) 1) i sE[a.T] i n 3) there exist >a ~ (~ o - Sl(~ .s)1 = 0 0 (Sn (~0 .s) - S 2) i . Isn (~0 .s) sup faT There exist S i 2 (~.s» 0 p .s). i=O.1.2. such that (1). ds = ap (1). and such that i S (b.s) sup sE[a.T] B. Ib-~a(s) I<~ (Lindeberg Condition) 1) c. I~ sup b€IR max I~Hn fT I{s : a Sn (b.s) For all E I = 0 (1) p i=1.2.3.4. e e > a. Ix (j)Y (j) I > 6v£"} ds = 0 (1). s s p (Asymptotic Regularity) 1) There exist constants U 1 max{AO(S). S a(~o.s) ~ S 2) i (~o.s). U2 > a. U2 > a i = a.1.2} a.e. o .s) = U1 and Lebesgue on [a.T]. There exists a constant L V(~ ~ such that > a such that for S2(~O'S) ~a::--~- S (~o.s) V(~o.s) Sa(~o's) AO(S) > L a.e. Lebesgue on [O.T]. e e 29 D. (Bias) 1} ~ 2} ~o(s} 3} V(~o's} SO(~o's} Ao(s} is continuous in s on [O.T]. 4} V(~o's} SO(~o's} Ao(s} is Lipshitz of order 1 on [O.T]. o (s) is Lipshitz of order 1 on [O.T]. has bounded second derivative a.e. Lebesgue on [O.T]. In the following section. a member of 8K will be denoted either by n its' functional form. ~(s) = ~ = (~1"" ~K K ~ n ~il~(s}. or by its' vector form. i=1 1 ). It should be clear from the context which form of ~ is n . n n pertinent. The lengths of the K intervals. I ....• I K . will be denoted n 1 n n n . n n n by I! = (I! l' ... • I!K ) WIth I! (I)' I! (K )' and III! II being the minimum length. n n maximum length and the 1!2 norm. respectively. 1 Other definitions are: 0 1} En (~.s) = Sn (~.s}/Sn (~.s). 2) Vn (~.s) = Sn (~.s}/Sn (~.s) - (En (~.s}) . 2 0 2 3) -n ~ 4} (u) = K ~ i=1 n on n P~ lieu}. for u in [O.T] where 1 on _ ITO Ini(s} Pi - 2 a.1 = IT0 n lies} ~0 (s) V(~0 .s} SO(~o's} Ao(s}ds a V(~ 0 2 . and i .s}S0 (~0 .s}A0 (s}ds. i=l, ...• k n . In the following. the superscripts and subscripts. n. are dropped. A and o ~ 0 are constant with increasing n. Only 30 2.5 Consistency of the Sieve Estimator e e One way to prove consistency of the maximum likelihood estimator is to expand the log-likelihood about the true parameter, say ~ o , and then use a fixed point theorem as in Aitchison and Silvey (1958) or Billingsley (1968a). However, in the problem considered here, general, not a member of OK for any finite K; ~ o is, in hence in the following proof, the idea is to expand the log-likelihood about a point in ~, which is close to ~ , instead of expanding about ~. o 0 OK' say This introduces a technical difficulty as the score function is no longer a martingale but a martingale plus a bias term. To the first order, this bias term can be eliminated by proper choice of ~n as is given in the previous section. Assumptions D and A2 are then useful in showing that the bias is asymptotically negligible. Theorem 2.1. a) e e Assume lim nll211 10 -+ (Bias 0 -+ 0), n b) lim nll211 4 -+ (Variance converges), and 00 n 2 c) then for PROOF: A, C, Dl, .....n ~ lim ~ < 00, n (1) maximizing ~n(~) in OK, Recalling that L is defined in assumption C2, let e e 31 If _K 1 aap. R ~ (13)(13.-13 1 ~ 00. IItll ~ 0). € ~ such that -1 -1 ~- n t. . 1 1= wi th probabili ty going to 1 (as n ~ Aitchison and Silvey (1958). 3 n 1 -n (i» <0 then by lermna. 2 of ago ~n(f3) If3=~ = 0 Vi 1 II~-pnll ~ (IItIl4n)~c5n on and a2~ ~ ap.. (13) a set of probability going to 1. is nonpositive for each i. this proves the conclusion. Since Using n 1 -n a Taylor series about the vector 13 = -1 ~ i:; (n til =i~ a (13-n1.... 13-nK), gives. -n af3 ~n(f3)(f3i-f3i) i (n t i )-1 [ag i ~n(pn)](f3i-~~) i! (n li l - + 1 ~~ ~n(jj"l)(lli-;r;l2 i + sup l~i~K I( n 0 C;i )-1 2 a cD ~ or. ap.. i n n ) + 0 - 1 21 IIR_;;Il112 _ LIIR_;;Il112 . (Rp C;. a. p p p p 1 1 32 Consider {3 € ~, -n 2 4 -1 2 where 1I{3-{3 II = (llill n) on' then, by lemma 2.1 of a an~. ~ section 2.7, _K Y- . 1 (n i.) -1 1 1= -n n 1 + 0 ({3)({3i-{3.) 1 p «vb lIiIl 2 )-1) + 0 p (1) - L + 0 (1)1I{3-j¥111} p (since (lIiIl4n)~ Since lim 0 - _K Y- i=l n (n i.) 1 2 n -1 ° ! 0). n > 0, ana ~i ~ n ~nll-jj"n2 ~ JI ({3) e e ;;Il ((3i-P~) 1 + 0p(nntn IO 2 4 )+Op(ntn ) + OpH-hlntn )-I) + It is obvious that, for > 0, 3 ~ n~ 0 p (1) - L + 0 (1)1I{3-j¥111}. p such that for n ~ n~ Notes to Theorem 2.1. 1) e e By the definition of ;;Il p e , . 2 T -n 2 6 nllill f O ({3 (s)-{3o(s» ds = O(nllill ). Since e 33 4"-2 2T"n 2 nll211 lIifl-;fl1l = 0 p (1) implies that nll211 f O(/3 (s)-;fl(s» ds = 0 p (1). one 2 gets nll211 fT (~n(s)-/3 (s»2ds = 0 (1) + 0 (nIl211 6 ). Therefore in order O o p p n~ ~ 11211-2 «nYz . It would be of interest to to achieve consistency. allow the data to choose the "optimal" rate of growth for 11211-2 . One approach would be to use a minimum complexity criterion as is done in density estimation (Barron and Cover. 1989). However this will not be done here. It is natural to question whether the rate JII211 4n from Theorem 1 can 2) be improved. In general this will not be possible. To see this. let T=l, and 2.= 11K for each i (so 11211 2 =11K). It turns out that ...;na~ 1 1 (pr; 1 -n /3i)' i=I •...• K. behave asymptotically like independent N(O.I) random variables; this indicates that the approximate distribution of _K "2 y- (...;na~ (~ - ~» is chi-squared on i=1 K degrees of freedom. So one 1 expects that i< . 1 (...;na~ (~i 1 1= - ~i»21K ~> rigorously using lemmas 2.1 and 3.1. rate 1. This can be proven Since o~ = 0p(IIK). this gives the JIi7K2. I.e .• Jn1l211 4 . Other norms might allow for different rates. For example. using the JLn(K) max o. l~i~K 1 above intuitive reasoning. it is expected that I ~ - ~ I = 0 (1). P =0(1). p 3) To understand why the choice of pn given above eliminates the bias to a first order. consider the following: Maximizing ~n(/3) 1 over ~ f~ n i=1 ~ is equivalent to maximizing. [S~(P.S) (/3(s) - /3 (s»X (i) - Ln 0 s S (J3 ,s) n 0 ] dN (i) s 34 for /3 This is "asymptotically" like maximizing a €~. "Kullback-Leibler" type information e e (2.4) •• -fTo {/3{s)-/30 (s»2 Y{/30 .s) SO{/30 .s) A0 {s)ds. (under suitable conditions) But the maximizer in ~ of the RHS of (2.4) is given by~. Therefore it is natural to expect that for the maximum partial likelihood estimator. ~. the convergence of A t fa (~{s)-pn{s» A 2 a ds to will be of a faster rate than for choices of /3 € ~ other than pn. 4) Further consideration of (2.4) lends substance to the use of the L 2 norm in proving consistency. Usually in the method of sieves. the Kullback-Leibler information {in this case. (2.4» determines the norm in which the maximum likelihood estimator converges to /3 (1981). Geman & Hwang (1982) and Karr (1987». o (see Grenander e· e In the situation considered here. the L norm approximates. to the first order. the 2 Kullback-Leibler information. 2.6 The Independent and Identically Distributed Case As was mentioned in section 1.3. Zucker and Karr (1989) consider a time-dependent coefficient in the context of survival analysis. is. (Nn{i). Xn{i). yn{i» have at most one jump. That i=I ..... n are i.i.d. and the Nn{i) can each n In addition. they assume that the covariate X is bounded and the censoring mechanism is independent of the time of the e e 35 jump given the covariate. Consistency of the histogram estimator of the regression coefficient is given in a slightly more general setting below. In particular in the following corollary multiple jumps are allowed, and the censoring mechanism operating on individual may depend arbitrarily on the individual's past. <€o'toUa/l,ff 2.1 Consider n i.i.d. observations of (N,X,Y) where both X and Yare a.s. left continuous with right hand limits. Then conditions A and C of section 2.4 are satisfied if a) sets R ""0 is continuous on is bounded away from zero and infinity on b) A c) P[Yt = d) for each t € o in~, [O,T], 1Vt [O,T], [O,T] ] > o. € [O,T] there exists at least two disjoint Borel say At and B , such that t and either, el) X is a.s. bounded, e2) there exists an open interval, or [2 inf t€[O,T] E[ Remark: fj (t), 2 0 sup t€[O,T] sup e (b,t)~ x[O,T] ~ € ~, fj (t) ] containing such that 0 bX t 4 X Y ] t t < 00 Assumptions b, c and d ensure that there is sufficient activity at each point t in order to estimate fj. . 0 Essentially assumption d specifies that the covariate X can not be a.s. constant at a point t. 36 PROOF: Note that under a. el implies e2. Assume a. b. c. d and e2. and define . SJ(~ .s) o =E e ~ (s)X Y Xj s 0 s for s 0.1.2.3.4. j = Assumption e2 allows the application R. Rango Rao's (1963). strong law of large numbers in D[O.T] to get AI. E [nfT [n- 1 ~ (e ~ (s)X (i) 0'1 1= ~ nfb n- 1E [ e Choose ~ If > O. s 0 2~ (s)X 0 so that ~' = [ As for A2. consider . ~ (s)X . ] Y (i)XJ(i) _ E e 0 s Y XJ )]2ds s s . SS s Y ~j s ] ds < 00 j = 0.1.2. (bye2). inr ~ (s)-~. sup ~ (s) + ~] C ~. s€[O.T] 0 s€[O.T] 0 o inf S (b.s) = O. then there exists (b .s ) >1 € (b.S)~IX[O.T] v v v_ for which lim SO(b .s ) v-PJ v v = O. Choose a convergent subsequence of (b .s ) '1' say (b .s ) ~ (b.s) as ~ ~ v v v~ ~ ~ (b.s) € ~lx[O,T]. - ~lx[O.T] Since ~'x[O.T] is closed, 00. Then by the dominated convergence theorem. .s ) = E lims Y exp{ b~s X } = O. -lim SO(b~~. ~-PJ ~-PJ ~ This in turn implies that lim Ys exp{ ~-PJ ~ __ li__ m Y exp{ b~Xs } = (e s~ ~-PJ ~ bX s Ae addition to b and e2. gives Cl. = 1.2.3.4. sup s€[O.T] sup hEm 0 Ib-~ (s}I<~ Sn(b.s) o b~Xs } =0 a.s.; but ~ bX sY leads to a contridiction. hence for j ~ s+y +) a.s. s inf (b,s)~'x[O.T] By assumption c this So (b.s) To prove A3. choose ~ > O. This. in as above. then 37 sup s€[O.T] ~ su~ b~' Thus by Theorem 111.1 of Appendix III in Anderson and Gill (1982). and the above arguments. sup s€[O.T] sup b€IR b-t3 (s) I o I<"Y All that is left is to prove C2. If inf V(t3o .s) = O. then there s€[O.T] = O. exists {sv}v_>1 € [O.T] for which lim V(t3 .s ) v~ subsequence of {sv}v~1' closed. s € [O.T]. ---= ~ were h EX -_ s [ {s~}~~1 ~ v s as Choose a convergent ~ ~oo. Since [O.T] is Then by dominated convergence. lim V(t3 .s ) o~ ~-- say 0 =E ~ Y [X - EX ] [ lim ---""'s s s ~-- ~ ~ 2 exp{ t3 (s)X o~s ~ } ~ ]= O. t3 (s)X sy X s s Ee o This in turn implies that. e t3 (s)X 0 s y [X _ s s Exs ]2 By assumption c. either Xs contridicts d. Ae = EXs t3 (s+)X 0 or Xs+ s+ Y [X s+ s+ = EXs+ a.s. Exs+ ]2 = 0 a.s. This. however. C2 is proved. o Note to corollary 2.1 1.) If lim nllill n 6 < 00. lim nllill n 4 = irK' lim ~ n l(I) 00. < 00. and t3 is 0 Lipshitz continuous. then under the conditions of corollary 38 2.7 Lemmas 2.1 and 2.2 Lemma 2.1 Assume. a) I! Ii m .:.1!9.. n 1!(1) b) A. Cl, Dl, < en. and then. ~ 2) 4 2{1I1!1I n) I(n max 1~i~K -1 I!.) 4 T -2 -K [1I1!1I f O r- Ii{s) I!i i=1 -1 cJ2 ~ ~ 8/1-; 1 n -1 a 2/ i ;;n (p ) + I!i 0 V{J3 .s)8 (J3 .s)X (s) ds 0 =0 0 2 -1 ) + r p 0 ({vn 1I1!1I) 0 p (1). and i 3 3) /{n l!i)-1 £....3 ~ (J3*)/ = 0 ({~ 1I1!1I 2 )-1) + 0 (1). 813. n p p iUP max 1~i~K 13 €~ 1 IIJ3*-jjn ll <.5'Y PROOF: 1) Consider. ~ {{n I! )-1 ~~ {jjn»2 i=1 8J3 i n i = (2.4) _I( X- i=1 ({n I!i) -1 .Jl T ;;n f Ii{s)[X {j)-E (p .s)] dN (j» j=1 O s n s ~ 2 ~. 2 ~ {{n l!i)-1 ~ fTO I.{s)[X {j)-E (prt.s)] dM {j»2 i=1 j=1 1 S n s 39 + 2 _K -1 T ;;n r(t. So Ii(s)[E (p .s) i=1 1 n o - En (~ .S)]S (~ .S)X (S) ds) on 0 0 2 s ~o(u)Xu(j) where M (j) = N (j) - S e Y (j)X (u) du is a local s s o u 0 martingale. j=1 ..... n. Let. _K -1 ..Jl t ;;n 2 Zt = r- «n t.) z So I.(s) [X (j)-E (p .s)] dM (j» .1 1 ·1 1 S n s 1= J= The compensator of Z. is _K Ct = r- (n t.) .1 1 1= _K = Sot i=1 r- -2 ;;n ~_ ~o(s)Xs(j) I.(s)[X (j) - E (p .s)]-r (j)e X (s) ds ·1 S n s 0 J = .1 ~ n t So -2 -1 2 1 ;;n I.(s) t. n [S (~ .s) - 2S (~ .s) E (p .s) 1 1 non 0 n 2;;n 0 n n + E (p .s}S (~ 0 .s}]X0 (s)ds To show that Zt has the same limit in probability as its' compensator C. it is sufficient. by Lenglart's inequality. (Lenglart. 1977) to show 4 that the quadratic variation of IItIl n(Z-C} goes to zero in probability. Denoting the endpoints of interval Ii by a i and a i + 1 . and defining M*(a .• s} = 2 ffl n- 1 SS- t~1[X (j)-E (p11.u)] dM (j). the optional 1 j=1 ai 1 u n u 4 variation of IItIl n(Z-C) is (Kopp.1984. pg. 148). 4 [lItIl n(Z-C}]t = ~ IItll 8 2 n (A(Z-C}s}2 s~t 82 =lItlln ..Jl z j=1 Sot -K * 2 r-I.(s} {M(ai.s) i=1 1 ;;n 2 -2 -2 ;;n 4 -4 -4 • [X s (j) - En (p .s)] n t.1 + [Xs (j)-En (p .s)] n t.1 * + 2 M (a 1.. s) • ;;n [X (j) - E (p S n .s)] 3n-3t -3 } i dN (j). s 40 Then the compensator of [11211 4n(Z~)] is given by, 4 <11211 n (Z~» { t _J( = 11211 8 n 2 fat . r1 1. (s) 1 1= * 2 -1 -2 [1M (a.,s) n 2i ~ ~ n' l J= 1 -n (X (j)-E s n (~,s» 2 e ~o(s)Xs(j) Y (j) s -2 -3 [1 ~ + 2 M* (a., s) n 2. - ~ (X (j) lin . 1 s J= - - En (~,s» 4 3 e ~o(s)Xs(j) _J( * 2 rli(s) M (a. ,s) ds t = 11211 n fa i=1 P ]} A (s)ds 0 0 (1) 1 + n-1 0 (1)+ 11211 2 Y (j) s P _J( fat rL(s) IM* (a. ,s) Ids 0 (1) i=l IIp by A3, Cl. Now, max l~i~K sup 1M*(ai,s)1 ~ sEl i max l~i~K sup IM*(O,s]I + max IM*(O,a i ]1 sEl i l~i~K 1 1 1 i< 2? n- 2-i fos 1.(u)[X (j)-E (~,u)] dM (j)1 O~s~T i=l j=l 1 U n u ~ 4 sup and using Lenglart's inequality (1977) for B P( sup O~t~T I -l< r- ~ n-1 2-1 f ~ i=l j=l i > 0, t -n 12 O li(s)[Xs (j)- En (~ ,s)] dMs (j) ] 41 + En2 (P-n .s) Sn0 (P0 .S}]A0 (s) ds ] for B large and n large (use A3. C1). ~ Eo Therefore Eo > ~2 - 11211 2 f~ i< i=l I.(s}IM*(ai.s}lds = 0 (n-~). 1 P * _K .X- t Consider the process. f o Ii(s}M (ai.s) 2 ds for t belonging to 1=1 {aO = 0.a1.···.~+1= T} and the family {~ai}i=O.K+1· On {~ai}i=O.K+1. • fO * _K 2 X- Ii(s} M (a .. s) ds . 1 1 1= • n _K - f O X- I.(s} . 1 1 1= ~ 1 J= . n -2 -2 2. 1 f s a. 1 [X (j)u 2 Po(u}Xu(j} En (pn.u)] e Yu (j}A0 {u} du ds is a local martingale. B > O. Eo Therefore by Lenglart's inequality {1977} for > O. (2.6) 2 -n 0 + En (P .u}Sn (P0 .u) - 2 En (~n.u}Sl{p .U}]A0 {u} du ds n 0 ~ Eo Therefore <11211 (2.5). (2.6}). 4 ~ for B and n large {use AI. C1. and lemma 2}. n{Z-CPT = 11211 2 0p{l} + n- 1 0p{l} + n~ 0p(l} This. as mentioned earlier. implies that Eo B -2 (by ] 42 Since. sup O~ t~T -K I. (5) 2.-2 n -1 V(fj. s)S0 ({j .S)X (5) ds I 11211 4 n ICt - f Ot ri=l 1 1 0 0 0 = 11211 4 _K -1 Y- 2. i=l 1 0 P (1) Al and C1. by This concludes the proof for the first term on the RHS of (2.4). Consider the second term on the RHS of (2.4). _K Y- i=l -1 T 0 -n 2 (2. f O 1.(s)[E (fj .5) - E (fj .s)]S (fj .s)X (s)ds) . lIn non 0 0 o - V(fjo .5) S (fj0 .5» • sup X0 (5) ds) 2 sup fj(s)€IR O~s~T /fj(s)-l3o(s)I<'Y Using a Taylor series for fixed s yields: En (~.s) = En (130 .s) + (~n(s) - 130 (s» Vn (130 .5) where II3(s)-13o (s)1 ~ I~(s) - 130 (s)/ Then. since sup I~(s) - 13 (s)/ = 0(1). O~s~T 0 -K -1 T ; ; 1 l 0 2 r(2 i f O 1 i (s)[E (p .s) - E (13 .s)]S (13 .s)X (s)ds) i=l n non 0 0 43 ~ 211xll 2 + 211yll2 It turns out that. IIxll 2 = ° e!') p n 6 and lIyll2 = 0(lIiIl ) term on the RHS of (2.4) is equal to D1. so that the second ° e!') + ° (lIiIl6 ). p n p By A. Cl, and one gets. ° ° _K (IT 1. (s) IV (/3 . s)S (/3 . s) - V(/3 •s)S (/3 . s) Ids) 2 0(1) IIxll 2 = YO 1 i=l non 0 0 0 = ~ (i~ i=l i JI ° n )2 p (1) 1 ° (-). and 6 lIyll2 = ~ (i~)2 ° (1) = lIill ° (1). i=l P P = p n 1 + i~l lIb I i (s)V(/3o.s) dMs (-) I + i~l lITo I.(s)V(/3 .s)(SO(/3 .s) - SO(/30 .s»A0 (s)dsl 1 Ion 0 ~ sup O~s~T + + Iv (~.s) n - V(/3 .s)1 max i~l 0 i Ib I.(s) dN (-) 1 I max i.-1 IT I I (s)V(/3 .s) dM- (-) sup IsO(/3 .s) - SO(/3 .s)1 - max i-i l~i~K O~s~T O i l O S n 0 0 S l~i~K 1 ITO I i (s)V(/3 .s)A (s)ds 0 0 44 So by lenuna 2.2 and C1, 3) I(n i.) AUP max 13 l~i~K €~ 1 -1 0 3 * )1 --3 ~ (13 013. n = 1 1I(j*_~nll<.5" max l~i~K <iN( e) I By assumption A3, C1, and I enuna 2, the above is, o Lemma 2.2. Assume AI, A3, C1, and D1, then, 2) sup O~s~T ISi(~,s) - Si«(j ,s)1 n n = 0 0 (i(K» i=O,1,2. sup t€[O,T] Iv£" Mt ( e) I. p PROOF: 1) Let B max l~i~K > 0 and consider, Iv£" f6 Ii (s) <1M (e) s I~ 2 Using the version of Rebolledo's central limit theorem present in Anderson and Gill (1982) it is easily proved that for Ztn = vnr- -Mt(e), Zn 45 converges weakly to a Gaussian martingale with variance function f Ot s0 (~o .S)A0 (s)ds. An application of the continuous mapping theorem (Theorem 5.1 in Billingsley. 1968b) suffices to prove 1. 2) Fix s. then using a Taylor series about ~(s) results in. Therefore. sup O~s~T i i -n I (~ .s) - S (~ .s) ISn on = sup O~s~T I~o (s) - ~(s)1 0 p (1) (by A3) o III. ASYMPTOTIC DISTRIBUTIONAL TIIEORY 3.1 Asymptotic Normality of the Sieve Estimator In order to conduct inference about the regression coefficient function. P . it is useful to consider some sort of weak convergence o result for .....n p. However. in this case as in similiar situations where the parameter of interest is a function (Karr. 1985; Leskow. 1988; Ramlau-Hansen. 1983) normalized versions of ~(t) and ~(s) have asymptotically independent normal distributions. .....n means that the limiting distribution of p Intuitively. this is "white noise." This complicates inference concerning the function P • as this excludes a o functional central limit theorem. Karr (1985) circumvents this by giving a supremum type statistic which has an asymptotic extreme value distribution. Another possibility is to consider an integrated version of ~ as will be done below. McKeague (1988) also considers an integrated version and then proposes the use of a supremum type statistic based on the integrated estimator for inference purposes. might also consider various T ..... f O wn (x)(~(x)-P0 (x» One weighted integrals of ~. i.e. dx as is done in Aalen (1978) and in Gill (1980). This is done in chapter IV. In the following. the existence of a sequence of estimators (~ € E\n ) such that II~-jflll =0 p (1). as n -+ 00. is assumed. Recall that 47 the capital letters in the assumptions refer to the conditions stated in section 2.4. The following weak convergence result is in terms of the 8korohod topology on D[O,I] (see Billingsley, 1968b). Theorem 3.1 a) Assume, - lim nllill 8 = 0 ~ (Bias 0), n b) lim nllill 4 = 00 (Variance converges), and n c) -E- < A, B, C, D2, 04, lim n 00, c;(1) n then, w .~ ~ f O p (s) - ~o(s)ds => G where G is a continuous Gaussian martingale with GO = 0 t <G>t = f O (V(~ a.s.,and 0 -1 ,s)8 (~ ,s)X (s» ds. 000 PROOF: Using assumptions D2 and 04, it is easily proved that sup t€[O,T] I~ f~ pn(s) - ~o(s)dsl = O(n~ lIiIl 4 ). To show that ~ f~ pn(s) - ~n(s) ds ~> G consider the following Taylor series: o 1 a ~ 1 a ;;Il = - - - ~ (p ) = - - - ~ (p ) ~ a~i n ~ a~i n where * ~ II~ -p II ;;Il "'n II. ~ lip -~ 48 Define 2 3 A2 -1 a nn 1 a * An - n 0i = n --2 ~n(p ) - 2n --3 ~n(~ )(~i-~i)· a~i that P[ min -1 A2 l~i~K 2. 1 0. 1 a~i L > -2] ~ 1 so it is sufficient to consider Yh f~ (~n(s) - pn(s»ds on this set only. Yh Lemma 3.1 implies Therefore. solving for An -n (~i-~i)' multiplying by lies) and integrating from zero to t results in. (3.1) 2 where 0i = f T 0 li(s)V(~ .s)8 (~ .S)A (s) ds. i=l ..... K. o 000 first term on the RHS of (3.1) is 0 p To show that the (1) in sup norm consider. (by lemma 2.1) 49 (1)] . nlo [ 6 1 4 ] + 0 (11211 ) + 0 P 11211 n p p n = 0p (1) (by I), 2), and lemma 3.1). As for the second term on the RHS of (3.1), 1..Jl JT ° i=lr- I.(s) --2 ° I.(u) 1 j=l 1 1 Jt = -~ _K 0 .-n (X (J) - E (P ,u)) dN (j)ds. ~ n U i u Let, ~ Z =!- ff1 Jot [ I.(S)2 2f ](X (j) - E t r-. 1 . l I O. S n vn J= 1= 1 (~n,s)) dN (j) for each t s € [O,T]. Using McKeague's (1988) lemma 4.1, one gets that if Z w => G, then the second term on the RHS of (3.1) converges weakly to G. Now, Zt =!- ff1 Jot r- . 1 vn J= + [~ I.(s) 22i](X (j) . lIS 1= - E n 0i ~ Jot [~ I.(s) '11 1= (~,s)) 22i](E (P ,s) - E no n 0i dM (j) s (~,s))So(P ,s)X (s) no 0 By lemma 3.2, the second term of Zt is 0p(l) in sup norm. ds. As for the first term, the idea is to use the version of Rebolledo's central limit theorem in section 1.2. <Y> Call the first term of Zt' Y . t Since ° 2 t [ r-I.(s) .-K 2i] [S2 (P ,s) + E2 (P-n ,s)S (P ,s)= JO --2 t . l IO.n 0 n n 0 1= 1 and max l~i~K sup 12~1 O~ scl 1 1 i - V(P 0 ,s)SO(P ,s)X (s)1 ~ 0 (by the continuity of 0 0 50 V(~o ,s}SO(~0 ,S}A0 (s) in s}. one gets, using Al and lemma 2.2, that P 0 t <Y t > ~ Io (~ .S}A (s)] O [V(~ ,s}S o 0 -1 ds. A Lindeberg condition must be satisfied also; that is, show T 1 ~ -n 2 I On. - ~ (X s (j) - En (~,s)) e J= 1 * is 0 p I{s : Ix s (j) - En (pn,s) (I) for each > O. ~ ~o(s)Xs(j}. ( _K 2 i )2 Y (J}A (s) Y- I.(s) --2 so. 1 1 1= O. 1 I > ~~ Recall that (.t< Ii(s) o.2~)-1}dS' 1=1 m~n 1 1 -1 2 2 i 0i ~ L so the Lindeberg condition will be satisfied if, (3.2) IT ! 'J!l (X (j) - E (pn,s}}2 e 0 n j=l s n ~ 0 (s}X (j) s Y (j}A (s) s 0 * = 0 p (I) V ~ I{s : Ixs (j) - En (pn,s) I > ~~}ds > o. The LHS (left hand side) of (3.2) is bounded above by, 4 IT ! 'J!l o n j=l ~ X (j}2 e s 0 (s}X (j) s Y (j) A (s) I{s s 0 IXs(j}1 > 2~ ~} ds (3.3) ~ 4 ITOn. ! 'J!l1 J= ~ X (j}2e s + 0p(l} 0 (s}X (j) s YS(j)AO(S} I{s Ix s (j) I·Y s (j) by AI. C1, and lemma 2.2. So the LHS of 3.2 is Ixs (j) Y (j}1 > -2~ ~}ds + 0 (I) s p > -2~ ~}ds 51 = 0 p (1) (by B and AI). 3.2 A Consistent Estimator of the Asymptotic Variance Process Theorem 3.2 Assume, 4 = lim nll211 n b) AI, A3, C, DI, 03, lim (XI, lim 11211 n 2 a) =0 and -E. < n (XI, c;(I) then for A2 o (s) n o 2 =- (s) = _K X- I.(s)(2 n) j j=I J V(~ 0 o ,s)5 (~ 0 sup s€[O,T] -1 8 2 ----2 8~. J An ~ (~ ) and n ,s)X0 (s), 10A2 (s) n - 2 0 (s)1 = 0 p (1). and, PROOF: sup 1;2(s) - 02(s) 1 = max sup s€[O,T] n I~j~K s€I (3.4) 1(2. n)-I j J a2 2 8 ~j ~ (~) n - 02(s) 1 52 + I2.-1 J max I~HK o.2 - 0 J 2 (s) I 2 J where o. = I T0 2 I.(s)o (s) ds. J The third term on the RHS of (3.4) is 0(1) by the continuity of 02(s) and the second term on the RHS of (3.4) is 0 p (1) by lemma 2.1. Consider the first term on the RHS of (3.4): I(2 max I~HK n) j -1 2 8 :::0 -1 - 2 ~ (p ) - (2 n) -:2 ~ (p;;11 ) j 8 {j n 8 Jj ~ n a2 j I j ~ max 2-j 1 IT I.(s)IV (~j's) - V (~j.s)1 dN 11'~J_ ·<K = max 1 ~HK n J 0 n T· I8~ 8 2-1 V ((j*.• s) j I 0 I.(s) J .... j n J where I dN- ~I ~ I~ - i3~1 l{j;- s s (e) (e) 1:::0;;111 Pj~ - P~ J for each j. ~ max 2-j 1 IT Ij(s) dN (e)1 ~-~jl 0 (1) l~j~n s 0 P J by A3 and Theorem 2.1 [Note that the assumptions lim nll211 10 = 0 and A2 are not necessary in n max I~j-~I Theorem 2.1 if one only needs ~ max l~j~n + 2-liT I Ij(s) dM- (e) I j 0 s max l~j~n = 0 P (1) Therefore l~j~n sup s€[O.T] =0 (1)] P l.{j.-p~ . .n ;;l11 J J 0 (1) P I o ({j .s)X (s) ds II.{j.-p~ . .n ;;l11 0 (1) 2-1 I T Ij(s)S j 0 n 0 0 J J P by AI. lemma 2.2 and Theorem 2.1. 10....2 (s) - 2 0 (s)1 = This in turn implies t~t sup (1). 0 n O~t~T P II~ [~2(S)]-1 ds - I~ 0-2(s) n dsl = 53 o (1) (by C2). p o 3.3 A Partial Likelihood Ratio Test of a Finite Dimensional Null Hypothesis Versus an Infinite Dimensional Alternate In the following. the null hypothesis H : o and the alternate hypothesis HI: ~o ~ is a constant function 0 is nonconstant. are contrasted. Just as in the problem of estimating an infinite dimensional parameter. the partial likelihood under the alternate is maximized at 00. This problem can be remedied. as before. by working within the context of a n sieve: that is. for a sample of size n. H o ~o € ~. = H0 is contrasted with ~1: In classical theory. the distribution of the likelihood ratio n test of a one dimensional null hypothesis versus a K dimensional alternate will be approximately a chi-squared on K-l degrees of freedom. In this case the dimensionality of the alternate. K • increases with the n sample size but intuitively one still expects that the partial likelihood ratio test (PLRT) will have an approximate chi-squared distribution on K -1 degrees of freedom. n Since the standardized chi-squared random variable ( i.e .• subtract K and divide by the square root of (two times K) ) has an approximate N(O.I) distribution for large K. it is natural to consider this standardization of the PLRT as is done in Theorem 3.3 below. A For the remainder of this chapter ~(s) A T A A K =~ i=1 A n Ii(s)~ will be 1 written as the vector ~(Hl) = (~ •...• ~ ) in order to make clear that n 54 this is the maximum partial likelihood estimator under~. Similiarly. the vector. ~(H~). represents the maximum partial likelihood estimator t A under ~ on the interval [O.t] and the scalar. P{H )' represents the o maximum partial likelihood estimator under ~ on the interval [O.t]. o Likewise a subscript t attached to ~n{P), that is. partial likelihood on the interval [O.t]. ~n{P)t Therefore ~n{P)T the partial likelihood on the entire observation interval. ! i j=l i j and i{t) to be that i for which a i - l < t ~ denotes the represents Define a. = 1 Recall that the ai . ii's and subsequently the ai's and i{t) change with n and that a 2 i = JT O Ij{s) V{Po .s)So{P0 .S)A0 (s) ds. Theorem 3.3. Assume. a) P b) lim nllill 4 = n 0 is a constant function. 00. e lim Jlill 2 = 0 and. n i c) A. B. C. lim~ n i{l) = 1. Defining for t € [O.T]. then. _K Xl where Xn t j=l n ~.!8A ~j ~ n {P )t] 2 0 -2 a. -i{t} J = ----:::=:::=----- n and X converges weakly in D[O.T] (in the supremum topology) to a standard Wiener process. and max t€[al···~] IY~I = (l). 0 p 55 Remark The maximum of the process yn is only considered over [al ..... ~] because to show that ~(H~) is consistent for small t. the square root of the information matrix multiplied by the score function must be close to zero for small t. This does not occur here. appears to be related to the fact that lim t-~IW 1= t-() the standard Wiener process. PROOF: 00 This a.s .. where W is t For a similiar situation see Csaki (1975). The following proof consists of four steps, 1) max t€[a 1 ·· .~] 2) max t€[al···~] I~n (~(Ht»t - ~ (P 0 )t l on = 0 p (Ln K) K-%12{~n(e(H~»t - ~n(~o)t}- .tK n-1(;8 ~n(~o)t)2uj21 Pj J=1 = Op«lIiIl4n)~). Steps 1) and 2) imply that for yn = Xn - Zn Now, X~ = (2KT-l)~[ tKn- 1 ~Iot I.(s)a~2(u2(s.i) + 2M* (j)U(s,i» dN (i) - i(t)] j=1 i=1 J J s- s where U(s,i) = Xs (i) - En (P 0 ,s), and M*(j) = ~ I~ I.(u) U(u,i) s i=1 J dM (i). u i=I, ... ,n; j=I, ... ,K and s € [O,T]. n In the third step, the compensator of the first term in X is shown to 56 n converge to the second term of X . 3.) _K t sup K-~I Yf O I.(s)a.-2 V (~ ,s) t€[O,T] j=l J J n 0 0 Sn(~o's) A (s) ds - i(t) I 0 n This implies that X has the same weak limit as, (2KT-1)~ (3.5) iK! zn j=1 fOe I.(s)a-2 (u2(s.i) + 2M* (j)U(s,i» n i=l J j s- dM (i). s The last step will then be to show weak convergence in D[O.T]. This will be done using the Skorohod metric. but since the standard Wiener process is continuous. weak convergence with respect to the Skorohod metric will imply weak convergence with respect to the supremum metric (Billingsley. 1968b. pg. 151). 4) (3.5) converges weakly to a standard Wiener process on [O.T]. For the duration of this proof denote a 2 (s) = V (~ .s)So(~ .s)A (s) n non 0 0 and a (s) = V(~o,s)So(~o.s)Ao(s) 2 for s € [O.T]. To prove step 1) recall that under H . o ~n(~)t = i:: f~ ~s(i) each t € - Ln SO(~.s) dN (i) n s [a1 ....• ~.]. a Taylor series about Also for each t € [a1 •...• ~] where ~ is a scalar. A ~(H t o ) results in For 57 ** where I~t t - ~0 I < I~(H ) - ~ I 00 A Therefore, t n o »t A 2{~ (~(H - ~ (~ )t} no = (n -~ d 2 d ,..,no R ~ (~ )t) In lemma 3.4, it is shown that for any I~t-~ol < A {~t}t€[ ] for which al'''''~ t 1~(Ho) - ~ol for t € [al'···'~]' one has that 2 This implies that max t €[ al' ... d * I(tn) -1 ---2 ~ (~t)tl R n d ,.., '~ ] 2 and that P[ min t€[a l ,··· ,~] (-(tn)-l dd...2 = 0 p (1) ~n(~~)t) > 0] p n: (byel), (by C2). 1 Therefore, sup [(tn) t€[2 ,T] ~d l (recall 2 1 rlR up 2 ~ (~)t] = all. n 0 0 (1) p 58 Let c >0 and for each n define T7 by, Tn inf {t tE[i 1 ,T] = 1 = p[T-T7 T + 1 ,.n Note that T7 is a ~ (3.6) I-P fO ds t an(s) 2 sup tE[i 1 ,T] f~ a2 (s) ds P[3 t E [i 1 ,T] 3 ~ 1 - P[3 t E [i 1 ,T] 3 dR ~ n IJ 1 + c} {... } = <p. ~ 1 - ~d ~ 2 f O a (s) ds if as n = 1 - Po(I) Note that n t stopping time and that [ < 0] t 2 f O an(s) ds (/3 ) 0 >1 + c] t 2 fo an(s)-a2 (s)ds ~ c tl an(s)-a 2 2 (s) Ids t -1 f o ~ w t 2 fo a (s) ds] ~ c by A2. is a square integrable martingale. tA'f.l 1 impl ies that, X t = [n~ :R IJ ~ (/3 ) n 0 tA'f.l ]2 for tE [O,T] is a submartingale. 1 To finish the proof of 1) consider B P[(BL (K» n -1 sup t tE[ i 1 ' T] ~ P[(B Ln(K» -1 -1 (n ~ d dR IJ + P[(B Ln(K» (/3 » not liA -1 >0 ~ sUR t T] tE[i 1 , sup -1 L] (n tE[~AT,T] and 2 > 1] JA d 2 d/3 ~n(/3o)t) t -1 (n -Y.I d > 1] d/3 ~ (/3 )t) n 0 2 > 1] This 59 + P[ Since E(n sup (n-~ ~ ~ (~ ) )2 t€[~AT,T] d~ n o t -~ d d~ ~ (~ ~ n ) 0 (t+ii)A~ ) 2 > (~lAT) n (t+i 1 )AT l 2 = E fa 0 n B Ln(K)] (s) ds, the Birnbaum-Marshall inequality (1961) implies, ~ E TA~ 2 fa 0 n (s) ds BT Ln(K) + T fa ~ t- 2 Ef + (B Ln(K»-l f~ tA~ o 1 o~(s) ds dt 1 PET > ~] 2 (s) ds(l+~) + (B Ln(K»-l B T Ln(K) fT 0 + PCT i t- 2 fat 02(s) 1 ds dt (1+~) > ~] An application of Lenglart's inequality (section 1.2) and AI, C1 suffice to show that the third term above goes to zero as n ~ 00. Therefore, by (3.6) and the above, P[(B Ln(K» -1 sup t -1 (n -~ d 2 d~ ~n(~o)t) > 1] t€[i1 ' T] ~ (B Ln(K» -1 faT 0 2 (s) ds[l + ~] + (B Ln(K» -1 (Ln T - Ln i 1 )O(1) + 0(1). That is, the above can be made arbitrarily small for B large and n large. Step 1 is proved. Under HI' ~ the parameter by which ~n is differentiated is a 60 vector. i.e .. for t = .1 ~ f ot I.(s) J J= = avo v € [1 ..... K]. ~n(~)t can be written as ~n(~)t (~j Xs (i) - Ln SO(~ .. s» dN (i); therefore. ~(Htl)' for t n J s A tAt = avo is a vector of v components (~I(Hl)'· .. ~v(Hl». In addition. t t since ~v (H l ) is the solution to 0 = f o I v (s)[X s (i) - En (~j's)] dNs (i). one gets that ~v(H~) is the same for all t ~ avand can be written as A T ~v(Hl). For t € [a l ..... ~] consider the following Taylor series about A t ~(Hl) . where Now for j € {I .... . K}. This results in. Therefore for t € [al •...• ~]. 61 -1 = 2 I!. J i (t) -1 0j( -I!. J 2 n-1 8 ..2 '£ (13*.)T) 8p~ n J J L [ j=1 Denote the above term in square brackets by An(j). then. 2K~ {'£n(P(H~»t - '£n(f3o )t} _lL = K-n i(t) i(t) [(n~ ~'£ (13 ) )20~2 - 1] An(j) + K~ L L j=1 8 f3 j noT j=1 J ~ i(t) _~ 8 2 -2 n L [en 8 R '£ (13 )T) o. - 1][A (j)-I] j=1 Pj n 0 J = K -1h i(t) + K L [An (j)-I] j=1 ~ i(t) + K L j=1 (n ~ on ni (t){_I!-1 n- l a2 '£ (f3~) j=1 j 8~ n J T Since lemma 3.4 implies to show that 8 8 2 -2 f3 j '£n(f3o )T) OJ • > O}. An(j) 62 max t€[a1 ..... ~] K-~12{~n(~(H~»t i(t) ~n(~n)t} - ~ - j=l = 0 p (1). it is sufficient to prove _K r- n [A (j)-l] 2 = -1 and _K -Jh r- [en . 1 K p 8 2 ~R ~ (~)T) n VI'J. J= (1). 0 j=l J 0 -2 J o. 2 1] = 0 (1) . p This will then conclude the proof of step 2. First consider _K r- [An(J.)-1]2 __ j=l 2[ 2 [1!-1 8 " (R*) ] . o. - 1!-1 j n -1 2 I'J. T j=l J J 8~. n J J _K r- (J) by lemma 3.4. Using a Taylor series. the term in brackets on the RHS above becomes. I 2[ 3 -1 -1 -1 -1 -1 8 ~ I!. o. -I!j n ...2 ~ (~ )T - I!j n - 3 ~ (~. )TW . -~ ) J J 8P n 0 8~j n J J 0 j a2 * ] ~ where ~ I[I!.-1 J 2 -1 o.J + I!.J n -1 I~j-~ol V I~j-~ol ~ l~j(Hi) a2~ 8 j -1 -1 ~ (~ )T] I!j n no a2~ ~ v j - ~ol ~ (~ )T no I 63 T A + (lL (HI) - 13 ) J 2 0 Ii.-1 J n -1 a3 ~ - 3 '£ (/3·)T I al3. n J J Therefore by arguments used in the proof of lemma 3.4, + tK [~j(HT1) - 13 ]2 0 (1) . lOP J= ~ (3.7) tK .1 J= [i~l fT 1.(s) V (13 ,s) dM (_)]2 0 (1) O J J no s p + _K Y- [i j=l -1 j T 0 f O 1j(s)[V (13 ,s) 8 (13 ,s) non 0 o 2 - V(130 ,s)8 0 (13 ,s)]~ (s)ds] - 0p (l) 0 (by lemma 3.3). Consider, _K -2 t 2 Zt' = Y- i. [fO 1j(s) V (13 ,s) dM (-)] , t nos . 1 J € [O,T] . J= The compensator of Z', evaluated at t = T is _K -2 -1 Y- i n . 1 J= Therefore sup t€[O,T] j T 2 0 non f O 1j(s) V (13 ,s) 8 13 0 ,s)~ 0 (s) ds IZ~I ~ 0 by Lenglart's inequality. From the above and (3.7), one gets that 64 In order to complete the proof of step 2), consider ~ -1 K " 1 [en ~na ~n (130 )T) 2 v~" -~ J= -2 o. - 1 ] J J ~ 2K 2 -4[ -1 _K T _2 rOJ n -1...n ~ So I.(s)[U-(s,i) "I J= "I 1= -1 + 2K J * + 2M _(j)U(s,i)] dM (i) s s ]2 _K -4 T 2 2 2 ro. [So Ij(s)[o (s)-o (s)] ds] j=l J n By A2 and Cl, the second term above is 0 (IItIl 2 n)-1). p Now, let Z', be defined by Z . -1 = K _K r- j=l t -4 -1 o. [n ...n i=l ~ J Sot * _..2. I j (s)[U-(S,l) + 2Ms _(j)U(s,i)] dMs(i)] 2 e. The compensator of Z', evaluated at t = T, is + j -1...n n i=l where f (s) = n 4M*s- (j)2 Vn (13 ,s)So(13 ,S»A (s) ds on 0 0 j 130 (s) Xs(i) U (s,i) e Y (i) ~ j = 3,4. s The proof that the above compensator is 0 (1) is given later, starting p at equation (3.8). that -1 K sup O~t~T _K r- j=l Z~ [en =0 . Using Lenglart's inequality this in turn implies (1). Therefore, p ~ a ~n V~j ~ n (13 )T) 0 2 -2 OJ To prove step 3), consider - 1] 2 = 0 (1) p and step 2) is proved. 65 sup K~I~ ~t€[O.T] j=l ~ fat 1.(s)a.~ a2 (s) J K-~ _K ~- . 1 J= Ij(s)a.-21 a2 (s)-a 2 (s) faT J sup l~j~K t€l I n -2 -~ max + K aj j ds J j P J ds fat 1.(s)a2 (s) ~ a-2 I!~ n-~ 0 (1) + K~ . 1 J= I ds - (i(t)-l) n J (by A2 and C1) (by b). Step 3) is proved. Define X' by (3.5). i.e .. [O.T]. t € To show that X' converges weakly in D[O.l] to a Wiener process. use the version of Rebolledo's central limit in section 1.2. The verification of the conditions of this central limit theorem completes this proof. Note first that <X') = T(2K) t -1 ~ ~- j=l n -2 ~ ~ i=l * M (j) s- fat 2 _~ -4 4 __":l UI(s.i» e P X (i) 0 s Y (i)A (s) ds s 0 (3.8) + 4M* (j) 2 V (P .s)S0 (P .s»A (s) ds s- * 1.(s) a. (U (s.i) + 2U1(s.i) M (j) + J J s- non 0 k 1 k P X (i) where f (s) = nffl.u (s.i)e 0 s Y (i) n i=l s 0 k = 3.4. 66 By A3 and Cl, T(2Kn) -1 _K Y.1 J= -4 4 t x f O I.(s)o. f (s) °(s) In J 2 -1 ds = 0 (nlll!lI) p ), so that i f T(2Kn)-1 i< f ot . 1 I (S)0-j4 4M* (j)2y (P ,s)So(P ,s)X (s) ds = 0 (I), j snon ° ° p J= one gets by the Cauchy-Schwarz inequality that T(2Kn) -1 _K Y- . 1 J= t -4 * 3 fO I.(s) o. 4M (j)f (s) X (s) ds = J So to show that <X')t t s- J ° n °p (1) . t, all that is necessary is to show that, f t Ij(s) 0~4 M* ( jJ2 V (P ,s) SO(P ,s)X (s) ds ~ t. j=1 o J snon ° ° i< 2T(Kn)-1 But 2T(Kn) -1 _K Y- '1 J= t fO Ij(s) o.-4 M* (j) 2 V (P ,s) S°(P ,s)X (s) ds J s no no 0 e. - V(Po ,s)S°(P0 ,s)]X0 (s) ds If (Kn)-1 f t Ij{s) 0-j4 M*(j)2 02(s) ds = 0 (1) j=1 o s p i< the first term on the RHS above is °p (1). then by AI, and C, Consider the second term, call i t V, (3.9) = 2T(Kn) -1 -K t rf I.(s) j=1 O J -4 o. J 0 2 (s) ....n ~ i=1 s _..2 f O I.(u)[U-(u,i) J 67 + 2M* u- (j) U(u.i}] dNu (i) ds = 2T(Kn} -1 j:; i~1 f O Ij(u}[U (U.l) _K ..Jl 2. t (f = 2T(Kn}-1 t< 2? ft . 1 1= . 1 J= 0 t u * + 2Mu _(j} U(u.i}] 2 I.(s}o (s) ds}oj-4 dNu (i) J [I .(U)[u2(u.i} + 2M*u_(j} U(u.i}] J 2 -4] f t I.(s}o (s) ds o. u + 2TK- 1 t< fot j=1 J I.(u} o2(u} J n J dM (i) ft I.(s}o2(s} u J u ds o~4 duo J So V is equal to a local martingale plus an increasing process. Consider the increasing process. -1 2TK _K t 2 t 2 -4 xf O I.(u}o (u) f I.(s}o (s) ds OJ du . 1 J n u J J= = 2TK- 1 t< fot j=1 Ij(u}[V (~ .u}So(~ .u)- V(~ .u}So(~ .u}] A (u) non 0 0 0 0 2 t -4 • f u Ij(s}o (s) ds o.J du (3.10) -1 _K t + 2TK j:; f O Ij(u} -1 _K t = opel} +2TK j:; f o Ij(u} But 2TK- 1 t< j=1 O-j4 f~ I.(u) o2(u} J -1 = 2TK 2 0 0 t 2 -4 (u) f u Ij(s}o (s) ds OJ du 2 ft u t 2 -4 (u}fu Ij(s}o (s}ds OJ du by AI. Cl. I (s}o2(s} ds du j _K -4 t 2 s 2 rOJ f o I.(s} 0 (s) f o I.(u}o (u) du ds . 1 J J J= Therefore the limit in probability of the LHS of (3.10) is equal to the limit of 68 TK -1 _K -4 Y- a. j=l J (10t I.(s) a 2 (s) ds) 2 = TK-1 (i(t)-l) J + 0 P (1). Recall that both i(t) and K are functions of n. and Li (t)-l i. Li (t)-l i. i(l) i(K) J J i(t)-l j=l j=l ~ ~ K i(K) .~ i j i(l) ij J=l j~ i Since lim ~ = 1. n~ n (1) Therefore, by the above and (3.10 ) . Li (t)-l i j j=l It is easy to see that i(t)-l k ft ~ ~ t. n~ Vt = 2T(Kn)-1 ~ ~ I~ [l j (u)[u2(u,i) - t j=l i=l + 2M* (j)U(u,i)] u- Define V' = (Kn)-l v 2 (s) ds a~4] dM (i) Itu 1.(s)a J J u ~ ~IotAV [l j (u)[u2(u,i) € P (Kn) (3.11) -2 _K .Y- ~ ~ J=l i=l P (1). u [O,T]. variation of V' evaluated at v = T is shown to be o (1) by Lenglart's inequality. 0 + 2M*_(j)U(u,i)] j=l i=l dMu(i) for each v I utVu Ij(s) ~2 (s) ds a -4] j + If the quadratic (1) then V -t = P t The quadratic variation at v = Tis. 0 [_2 10t Ij(u)[U-(u,i) + 2Mu*_(j) U(u,i)] 2 t 2 2 -8 • (Iu Ij(s)a (5) ds) a.J e ~oXu(i) Yu (i) A0 (u) ] du 69 + M* (i)U(s,i)] fT I.(u) du dN (i)] 0 (1) ss J s P The compensator of -2 -1 _K -2 n· _..2 * K n Y- i. ~ fa I.(s)[U-(s,i) + M (i)U(s,i)] sJ J j=l i=l fTVs s I.(u) du dN (i), J s evaluated at time T, is given by, K- 2 i< . 1 i- 2 j J= -2 = K fT Ij(s)V (p ,s)So(P ,s)A (s) fTVs I.(u) du ds O non 0 0 s J -K r- j=l -2 faT i. J I.(s) . J fT s Ij(u) du 0 (1) P By Lenglart's inequality the second term on the RHS of (3.12) is 0 (1) which in turn, implies (3.11) is 0p(l}. [O,T]. Therefore <X'>t ~ P t V t € All that is left is to show that the Lindeberg condition in Rebolledo's central limit theorem is verified, i.e. show, -1 K n -2 _K .J1 j:; i~l T [ fa -4 * + 2M • I{K~ n -1 u -2 j for each e > O. 4K -1 -2 _K n .J1 j:; i~l _..2 Ij(s}u j [U-(s,i} s- (j}U(s,i)] 1_..2 V-(s,i} 2 P e 0 X (i) s + 2M*_(j)U(s,i) s Y (i) A (s) s I> ] e} 0 ds = 0p(l) This is bounded above by, T [ fa -4 4 Ij(s)u j [U (s,i) ~ -1 I{K n U -2 _..2 P X (i) ] Y (i) A (s) ds j U-(s,i} > e/2} e 0 s s 0 70 -~ -1 OJ-21 Ms(j)U(S,i) * 1 > c/2} • eff0Xs (i) Ys(i) Ao(S) ] ds I{K n i< 2~1 -1 -1 K ~ n j=l J ° (1) + P _K T rfa I j (s)2.-2 M* (j) 2 -1 -1 ( max K n l~i~n j=l I{K-~ 1 faT l~i~n I{ (3.13) i< j=l J * 2 r- I j (s)2 -2 Ms(j) j j=l i< j=l [i< I j (S)o-j2 j=l K-~ max In~ M*(j) I = sup j s °p (1) satisfied if ] ds • ° (1) p * 2 r- I j (s)2 -2 j M (j) j=l s • nJAIXs(i)IYs(i) > c/4}] ds 0p(l) I (S)2-2 M*(j)2 j j s 1 n- IM*(j)I • IE (ff ,s)1 s n 0 i< O . 1 J= > c/4}] Ij(s) 2~2 M*(j)2 ds = J s ds °p (1) ° (I), and P (use Lenglart'sinequality), one gets that the third term above is 0p (1). will be 0p (1) for all c > c/2} [ _K Noting that from (3.9), (Kn)-l fT l~j~K s€I IEn (~ 0 ,s)l] s a I{ faT 1,Ji K~ n-~IM*(j)1 + (Kn)-l fT dS) .Op(l) Ij(s)oj-2 K~ n -~ IM* (j) I[n~ IX (i) IY (s) s s s _K XJ= . 1 (Kn) -1 I.(s)o-j2 > c/2} [ _K + = 0p(l) + max s oj2 n- IM:(j)u(s,i)Ys(i)I = 0 (1) + max (Kn) -1 p l~i~n • I{ J The second term in (3.13) above > a and the Lindeberg condition will be 71 (3.14) = 0 p (1) > 0, For C > o. Ve the LHS of (3.14) is bounded above by (3.15) + ~ C-2 f T T C I{n-Y.l X (i) IY (i) > e/C} ds fO 1 max l~i~n s s _K -2 M* (j) 4 ds rI.(s)e -2 j n J s O . 1 J= + C max l~i~n fTO I{n-Y.lIX (i)IY (i) s s > e/C} ds T _K -2 -2 * 4 Suppose that f O r- I.(s)e. n M (j) ds = 0 (1), then there exists j=1 J J s p constants, B(e) and nee) such that T P[fo * 4 ds < B(e)] > j:; Ij(s)e-2j n-2 Ms(j) _K B(e' ~ Choose C so that 1 - e/2 for n ~ nee). < e/2. By assumption B, there exists n'(e) such that, > e/C} fTO I{!- Ix (i)IY (i) 1~i~n ~ s s P[ max ds < e2C] > 1 for n ~ n'(e). - e/2 Therefore by (3.14) and (3.15), P[ max 1~i~n _K -1 -1 * 2 fT O r- I.(s)e j n M (j) j=l J s for n ~ nee) V n'(e), 72 if _K faT .r1 -2 -2 * 4 Ij(s)i. n M (j) ds = ° (1) . J s P J= n = Recall that M*(j) = 2 f~ I .(u)U(u,i) dN (i); therefore, M*(j)4 s s i=1 J u . 4 Now, 2 A(M* (J». u u~s + 4 AM*(j) M* (j)3 u u- [AM*(j)]k = ~ Ij(u) Uk(U,i) AN (i) u i=1 u where k = 1,2,3,4. So, M* (j) 4 = 2 AM* (j) 4 = Z~ s u~s u i=1 fas --~ * (j) Ij(u)[U4 (u,i) + 4U-{u,i)M u- Thi s means that, fT I.(S)i-2 n- 2 M*{j)4 ds j O j=1 J s i< ° By Lenglart's inequality, the above will be (I) if its compensator is p • _K -2 -2 * 4 0p{I). The compensator of fa j:; Ij{s)i j n Ms{j) ds, evaluated at time T, is _K -2 -1 rij n faT Ij{u)[f 4 (u) j=1 n 3 * + 4f CulM (j) n u o . .* 2 fT + 6V {~ ,s)S (~ ,S)M (j)] I.{s) ds no no u UJ A0 (u) du 73 which is 0 (1) by (3.9). p o 3.4 Other Versions of the Partial Likelihood Ratio Test Theorem 3.3 of the last section can be used to formulate a sequential test of ~ o is constant versus ~ 0 is nonconstant. Corollary 3.1 below gives the null distribution for the sequential test. An a level test would be to reject the null hyPOthesis and stop the test at time t € [al •...• ~] if T-~I Ztnl exceeds the (I-a) th percentile of the distribution of distribution to sup s€[O.T] vT Iw I. sup s€[O.I] Since s Iw I. sup s€[O.T] Iw I is equal in s the relevant percentiles can be derived s from the series representation for the distribution of given by Billingsley (1968. p. 80). sup s€[O.I] Iws I Since the set [al •...• ~] increases with n up to a dense set of [O.T]. and the Wiener process is continuous. it is sufficient to evaluate the maximum over t € [al •.... ~] of Zn for the limiting distribution of sup s€[O.T] Iws I to hold. '€o'toUalty. 3.1 Under the assumptions of Theorem 3.3. max t€[a 1 •··· .~] converges in distribution to process. sup IWtl where W is a standard Wiener t€[O.T] 74 PROOF: As in Theorem 3.3. define Z~ = (2KT-1)~ 2{~n(~(H~»t for t € [O.T]. n Then. Zt = Xnt n t - ~n(~(H~»t} - i(t)} n + Y . where X converges weakly in D[O.l] (in the supremum topology) to a standard Wiener process and max t€[a1·····~] I max t€[a1.···.~] IY~I =0 (1). COnsider p IZ~I - max t€[a1.····~] (3.16) max IZ~I has the same limiting distribution as t€[a 1 ···· .~] Therefore max IX~I. t€[a 1 ···· .~] Since Whas support contained in C[O.T] which is a separable subset of D[O.T]. the representation theorem (Pollard. 1984. pg. 71) implies the existence of in and Wwhere sup li~-Wtl-+O t€[O.T] a.s. and in. Ware equal in distribution to Xn . W. respectively. for each n. ....n If concluded. I .... max IXtl -+ sup IWtl. then the proof will be t€[a1 ..... ~] a.s. t€[O.T] max t€[a 1 ···· .~] 75 ~ 00 the first term above goes a.s. to zero, and as K that K ~ 00 As n as n ~ 00) ~ 00 (recall the second term goes a.s. to zero; therefore, This in turn implies that (3.17) Combining (3.16) and (3.17) concludes the proof. o ratio test is based on: (21< )~ n which by Theorem 3.3 converges in distribution to a standard normal. Portnoy (1988) derives a likelihood ratio test of this type for an exponential family setting in which the number of parameters increases with n. As mentioned earlier Fisher's transform is considered in corollary 3.2 below. It is well known that Fisher's transform of a chi-squared random variable converges more swiftly to a standard normal than the above standardization. (Johnson and Kotz, 1970). However in this setting 76 one can not immediately assume that Fisher's transform of PLRT Kn will converge faster to a standard normal that the standardized version of PLRT given above. K From the work in Theorem 3.3, there is no evidence n to support the intuition that the distribution of PLRT Kn is close to a chi-squared on K -1 degrees of freedom for finite K. n n In fact it would Kn < z ] - be of interest to give a rate for the convergence of P[ PLRT T K n (z) to zero (T K n is a chi-squared distribution on K - 1 degrees of n freedom) as K increases with n. n However this problem is not addressed here. <€O'l.ou,atty. 3.2 Under the assumptions of Theorem 3.3. the distribution of converges to a standard. normal. PROOF: It is easy to show that for x J1+x - 1 In other words for ~ that and subsequently ~ >0 ~ co. 0 such that Ixl Now. by Theorem 3.3. PLRTK ~ -"""K:-:-' ~ 1 as n >0 ~ x there exists 0 <~. x that for each >0 > -1. x PLRTK -K 2K -1 <0 ~ ~ implies 0 as n ~ co, Putting x = K PLRT - 1 one gets K 77 (K- 1 LRTK)~-l P K- 1 LRT -1 K - 1 <c -+ 1 as n -+ 00. 2JK- 1 LRT -1 On {K PLRT K -1 > -I} consider (K-1pLRTK)~-1 K- 1pLRT -1 K 2(K-1pLRTK)~ (3.18) (K-1pLRTK)~-1 K- 1pLRT -1 K 2(K-1pLRTK)~ The last two terms on the RHS of (3.18) go to 1 in probability and the first term on the RHS of (3.18) converges in distribution (by Theorem 3.3) to a N(O,l) random variable. Since P[K -1 PLRT - 1 K > -1] 1, the -+ ~ corollary is proved. D 3.5 Consistency of the Partial Likelihood Ratio Test If one suspects that the coefficient in Cox's regression model is time dependent, then it is natural to approximate the coefficent by a step function as in Brown (1975) and Anderson and Senthilselvan (1982). Then to test if in fact the regression coefficient is time dependent, a PLRT or a score test as in Moreau et. al. (1985) could be used. However 78 this test can not be consistent for general nonconstant ~ o . If the test is formulated within the sieve context (i.e. as data/information accumulates allow for a larger alternate space), a consistent test can result, as is proved below. For a discussion of consistency against contiguous alternatives see note 3 following Theorem 3.4. Theorem 3.4 Assume, a) is nonconstant and is Lipshitz continuous, R "'0 lim 1I1!1I 4n n b) there exists c) for b € ~ 00, ~(K) < lim 1I1!1I 2= 0, lim n n (;(1) an open set containing [ such that sup (b,s)~x[O,T] inf s€[O,T] AI, A3, C, 00, ~o(s), ISj(b,s) - sj(b,s)1 n sup s€[O,T] =0 (1) ~o'(s)] j p = 0,1, define KL(b) d) ~ = = ITo{(~ o (s)-b)E(~ ,s) + 0 there exists [bL,bu] C d1) there exists a inf KL(b) b€[bL,bu] d2) there exists 6 ~ Ln[ SS~(b,S),s) (~ ]} SO(~ o ,s)A (s) ds 0 for which the following holds: > 0 for which < [KL(bL ) >0 0 A KL(bu)] - a, and such that inf KL(b) b€[bL,bu] > 6. 79 Remark Lemma 3.5 gives conditions under which d) holds. PROOF: In the following proof the subscripts and superscripts "T" are deleted for ease of presentation. Consider, 2n-1{~n(~(H1» ~n(~(Ho)} - = 2n-1{~n (~(H1» (3.19) _ ~ (~n)} + 2n-1{~ (~) - ~ (~ ) n n n 0 + 2n-1{~ (~ ) - ~ (~(H )}, non 0 where ~ is as in section 2.4 ( a ""," has been added so as to remind one that ~ is a vector); dimensional vectors, that is, both ~(H1) and ~ can be written as K ~(H1) = (~ ..... ~) and ~n = (~ .... ~). '" A Taylor series about ~(H1) of the first term on the RHS of (3.19) results in, -n- 1 _K cJ2 * .r..2 ~n(~ )(~ aPj J=1 Also for each j '" -Il;) = 1, ... ,K where Therefore. on _K rrj=1 2n-1{~n(~(H1» {-(2 j n) - -1 ~n(~)} a..22 aPj 1I~**_~nll ~ 1I~-~(H1)1I ** ~n(~ ) > O}. 80 = n -1 ~ By Theorem 2.1. lemma 3.4 and C2. P[ {-i.n)-l j=l as n ~ Cll) J a2 ~ (~**) 8~ > 0] n ~ 1 and 2 -(i.n)-l ~~ (~*) J 8~ n j =O{l). max l~j~K ({i n)-l 0' _2 ____ 8~ j ] ~ (~**) 2 P n j For the above application of Theorem 2.1. note that the assumptions lim n nllill 10 o (I). =0 and A2 are not necessary since one only needs max I~-pnl = A l~j~n J Therefore p 2n-1{~ (~{H » - ~ (~n)} n 1 n = n- 1 = n- (3.20) 1 iK i-1rn~ ~~ (~»)2 j ~ 8{jj n 0 (I) j=l i{K) n = i{K) [Op{n- p iK {i.n)-2r8~ ~ (~»)2 J ~ j n j=l 1 tJ 0 (I) P 2 4 lIill- ) + Op{lIiIl )] by lemma 2.1 2 4 = 0 ({nlliIl )-I) + 0 (lIiIl ) P Next consider the second term on the right hand side of (3.19). and p recalling that pn{s) = .iK J=l Ij{s)P~ e. 81 = n-1 KL (p)t n t [ S~ (13. s) ] ~ f O (13 (s)-p(s»X (i) + Ln 0 dNs(i) i=l 0 s S (13 .s) n t € [O.T] 0 The compensator of KLn (13-n ). is ""'" KL (~)t n [SO(~S) t = fO[(~(s) ] 0 - p(s»E (13 .s) + Ln ~ S (13 .s)A (s) ds. n o S (13 • s) n 0 0 n 0 For fixed s consider (~(s) - """":S~o~(~_,s_) 13 (s)) E (13 .s) - Ln[ ] Sn(pO's) o n 0 ~ ~ l sup €[O.T] sup be V Ib-Po (u) I<'Y n (b.u) ] ~ (p (s)-p (s» 2 0 if I~(s)-po (s)1 < 'Y = "e,,4 <KLn (~) (3.22) 0p (1) (by the definition of ~). KLn (~) >T ~ 2n -1 T -n f O (13 (s)-po (s» 2 Sn2 (13 0 .s)A0 (s) ds 82 1 T S0 (p s) ]] 2 0 + 2n- f O Ln ~ on S (~ ,s) A (s) ds [ [ S (p ,s) n 0 0 n 0 is 0 p (11211 4 n- 1 ) (using Lenglart's inequality as stated in section 1.2). The first term on the RHS of (3.22) is 0 (11211 4 n- 1) by the definition of p ~. The same is true for the second term on the RHS of (3.22) using the Taylor series argument in (3.21). Therefore and by (3.20) above, 2n-l{~n(P(Hl» - ~n(P(HO»} If there exist an e >0 for which (3.23) " then K~[2{~n(P(H1» - ~n(P(Ho»} - K] = nK~[2n-1{~n(P(H1» - ~n(P(HO»} - Kn- 1] = nK~[ n-1{~n (~ o ) - ~ (P(H »} n O + 0 p and the theorem will be proved. To prove (3.23), consider n-1{~n (~ o ) - ~ (P(H »} n o where for ~ € !I (1) ] ~ 00, 83 KL ({j) n t = n- 1 t [SO{{j.S) ] ~ JO{{j (s)-{j)X (i) + Ln ~ dNs{i) for t € [O.T] i=l 0 s S ({j .s) n 0 The compensator of KLn ({j) is given by for t € [O.T]. Assumptions c) and C1 imply that To show that KLn{{j)T ~ ~ KL{{j). consider <KLn ({j) - KLn ({j»T = 0 p (I) by c). A1.and C1. By Lenglart's inequality KLn{{j)T ~ KL{{j) for each {j €~. Since KLn{{j)T is a convex function in {j €~. this implies that sup IKL {{j)T - KL{{j) I {j€A n op (I) for each cOmPact set A C ~ and that KL is convex on ~ (see Theorem 11.1 in Appendix II of Anderson and Gill. 1982). A If P[{j{HO) € [~'bu]] ~ n-lllO 1. for bu. bL as in d). then for ~ < 0 (o as in d2.) A P[KLn{{j{HO»T ~ P[ > ~] inf KLn{{j)T {j€[bL·bu] > ~. = 84 ~ P[ ~ P[ > E., inf /3€[bL ,bU] sup IKLn (/3)T - KL(/3) /3€[bL ,hu] I < -E.+o, " In order to prove that P[/3(HO) € [bL,hu]] there exists a >0 for which be done since KL is convex on ~); < a/4, " T " T P[/3{HO) € [bL,hu]] -1 PEn ~ "T {~{/3{Ho» n ' I, recall that by dl), < [KL(bL ) A KL(hu)] - a. inf KL(/3) + a/2 (this can /3€[bL ,hu] IKLn {/31)T - KL(/31) I < a/4. ~ n ~ < a/4] 1 as CD. the convex function KL {/3)T' this implies that n 1 as n - ~ 1. then IKL{hu)T - KL(hu) I But since /3{HO) minimzes < ~ n~ n~ inf KL(/3) /3€[bL ,hu] Choose /3 1 € (bL,hu) for which KL(/31) ~ P[IKLn{bL)T - KL(bL) I " /3(HO) € [bL,hu]] ~ n ~ (/30 )} CD. Therefore there exist > E.] ~ 1 as n ~ E. >0 for which CD. o Notes to Theorem 3.4 1) Theorem 3.4 also proves consistency for the sequential test of corollary 3.1. value of t ~ In fact the result of Theorem 3.4 holds for any fixed T. i.e. for any B > 0, If the above holds for even one value of t, the sequential test of corollary 3.1 will be consistent. This implies that the sequential test will be consistent for a wider range of alternatives that the test based 85 only on the likelihood ratio test statistic at time T. For example. the conditions of lemma 3.5 would need only hold on the interval [O.t] for some t € [O.T]. Recall that PLRTK = 2) ~ T 2{~n{P{Hl»T ~ T ~n{P{HO»T}' - From the proof of n Theorem 3.4. it is obvious that for each B > O. P[ K~lpLRTK >B ] ~ 1 n as n ~ 00. This in turn implies the consistency of the Fisherian transform of the PLRT in corollory 3.2. since (PLRTK ) ~ - -~ K~ = n 3) Theorem 3.4 gives conditions for the consistency of the PLRT against a fixed alternative. Another possibility would have been to consider a contiguous alternative. Let ~~ be the probability under which ~ has intensity with regression coefficient. Po + P{s)/~ at time s [i.e. As{i) = e (P +P{s)/~)X (i) 0 s Ys{i) AO{S). s € [O.T]. i=I •...• n] and let~: be ~~ where P Ln[ :; 1converges in ": - =O. Then lemma 4.3 implies that distribution to a N(- ~ a~. a~) (a~ = o J~ ~(s)S2{PO.S)AO{S) ds). n If for some statistic TS • one has then Le Cam's lemma 3 (Hajek and Sidak. 1967. p.208) implies that n TS ~(~~) > N{oLTS' 2 0TS)' If 0LTS # O. against the contiguous alternative (P shown that. o then TS + P{s)/~). n is consistent In lemma 4.3 it is 86 Ln[ ::"nP ] = n-~ ~ ~n IT ~(s) i=l 0 1 X (i) dM (i) - -2 s s IT0 ~2(s)S2(~ .s)A (s) ds 0 0 o + On 0 p (1). the other hand. Theorem 3.3 gives that. 2{~n(~(Hi»T - ~n(~(H~»T} - K -hi( + 0p(l). Using Rebolledo's central limit theorem for local martingales (as in section 1.2) 0LTS will be the limit in ~~-probability of <n~ ~ Ie ~(s)X (i) dM (i). i=l s 0 s But this is equal to (under appropriate conditions). n~(2K)~ ~ ITO ~ I.(u) U(u.i) IT I.(s)~(s) ds dM (i) + i=l j=l J u which converges to zero in probability. J u (1) 0 p In other words the PLRT is. in general. not consistent against a contiguous alternative of rate n~. Intuitively this means that the rate should be slower. perhaps ~ n~. n Consider ~~vK. ~~[Xs(i) € and for simplicity. for each component of [0.1] V s € [O.T]] = 1. Therefore by lemma 4.3. for each B then. > O. ~. assume 87 -~ -~ That is. this slower rate (K-n probabilities. ) does not produce contiguous In chapter IV. statistics which are consistent under a contiguous alternative are considered. 3.6 The Independent and Identically Distributed Case In section 2.6. conditions under which~. the sieve estimator is consistent. were given. It turns out that except for path properties of the sj(~ .s). the assumptions made are also sufficient to imply o asymptotic normality. All that is needed is to verify a Lindeberg condition as in B) of section 2.4. In fact. instead of B. a weaker condition B') is sufficient. B') max l~i~n there exists 0 >0 such that. fT I{s : Ix (i) I Y (i) O s s = 0p(l) for each > ~ ..;n. ~ ~ 0 (s) X (i) s > -olx s (i) I} ds > o. To see that B') is sufficient. consider equation (3.3) of Theorem 3.1. fT n- 1 o (s)X (i) xn X (i) 2 e 0 s Y (i)A (s) i=l s s ~ I{s 0 Ixs (i)IY s (i) > ~..;n} Ix s (i)IY s (i) + -1 fT n O (s)X (i) xn X (i) 2 e 0 s Y (i)A i=l s s .~ (s) I{s 0 Ix s (i)IY s (i) ds > ~..;n. 88 > t~. + o ff1 X (i) 2 e ~ fT n-1 O ~ (s)X (i) . 1 1= s T max f O I{s s -0 Ix (i) s Ixs (i)IY s (i) I~i~n >- I ]I.. 0 olx s (i)l} ds (s) I {s : Ix (i) S > t~. I >t ~} ds ~0 (s)Xs (i) > -olx (i)l} ds 0 (1) s p by AI. CI = 0 p (1) + T max f O I{s : Ixs (i)IY s (i) > t~. I~i~n ~ (s)X (i) > -olx (i)l} ds 0 (1). s s p 0 Therefore B') can be used instead of B) as the Lindeberg condition for asymptotic normal! ty. Consider n i.i.d. observations of (N.X.Y) where both X and Yare left continuous with right hand limits. Then under the assumptions given in corollory 2.1. the Lindeberg condition B') holds. Remark. The following proof is virtually identical to the verification of the Lindeberg condition in Theorem 4.1 of Anderson and Gill (1982). PROOF: Consider max I~i~n for some 0 fT I{s : Ix (1) IY (i) O s s > O. > t ~. ~ (s)X (i) > -olx (i) I} ds 0 s s to be chosen. ~ max €-In-~ fTO Ix (i)IY (i) I{~ (s)X (i) > -olx (i)l} ds I~i~n s s 0 s s Note that if Z(I) ..... Z(n) are i.i.d. random variables with finite second moment then 89 P[ sup n~hIZ(i)1 > ~] ~ n P[n-~IZI > ~] l~i~n S ~ ~h (n 2 Z dP Izl>~) -+ 0 as n ~ -+ 00. Therefore to prove that max l~i~n is 0 n-~ STO Ix s (i)IY (i)I{P (s)X (i) s 0 s > -olxs (i)l} ds (1). all that is necessary is that p 2 E STO Xs Ys I{P0 (s)Xs > 0 so that [ Pick 0 > -olxs I. P (s)X o s E T ~ inf P (s)-20. sup P (s) + 20] C s€[O.T] 0 s€[O.T] 0 there exists P' € ~ for which P' X s 2 So > -olxs I} ds < 00 X Y I{P X s s 0 s > -olx s I} ds <T E[ sup > O. (b.s)~x[O.T] e then if One gets. b X _-2 s Y X-] s s o by assumption d.) of corollary 2.1. Notes to Corollary 3.3 1) If lim nll211 8 = O. n lim nll211 n 4 = -E < lim 2 n (1) 00. and 00. D4 hold. then 00. under the conditions of corollary 2.1. ~ S~ ~(s) - Po(s) ds converges weakly to a Gaussian martingale. 2) If P is constant in s. lim nll211 o n . 2 rK , 11m ~ n (1) = 1. 4 = 00. lim 11211 n 2 = O. and then under the conditions of corollary 2.1. corollary 3.1 implies max t€[a 1 ···· .~] 2{~n(~(H~»t - ~n(P(H~»t} converges weakly to - i(t) (2KT-1)~ sup IWtl where W is a standard Wiener t€[O.T] 90 process. To prove consistency of the PLRT, Theorem 3.4 can be used, but condition d in that theorem requires an continuity assumption in addition to the assumptions of corollary 2.1. It turns out that assumption c of the following corollary is sufficient. 3. 4 <€o'tollo'tg Consider n i.i.d. observations of (N,X,Y) where both X and Yare a.s. left continuous with right hand limits. If ~ a) o is nonconstant and is Lipshitz continuous, 2-E < b) lim nll211 4 = n c) X and Yare continuous in probability and d) the conditions of corollory 2.1 are satisfied, then for every B lim 11211 = 0, lim 2 n n (1) 00, > 0, P[2{~n(p(RIl IT~~n (P(Rbl lTl PROOF: - K as n -+ 00. The conclusion above will follow if the conditions of Theorem 3.4 are satisfied. [inf s€[O,T] E[ 00, ~o(s) -~, sup e (b,t)~x[O,T] bXt Choose~' sup s€[O,T] 4)t YtX < as in corollary 2.1 (i.e. for ~o(s) 00, + ~] C ~). ~ > 0, ~' Since the dominated convergence theorem implies that Sj(b,s), j = 0,1,2 are continuous on i' and by Theorem 111.1 in Appendix III of Anderson and Gill (1982), c) of Theorem 3.4 is satisfied. = 8 0 Also by the dominated convergence theorem, 8b S (b,s) = 91 1 8 1 2 S (b.s) and 8b S (b.s) = S (b.s). for (b.s) E ~ . x [O.T]. To show that condition d} of Theorem 3.4 holds. lemma 3.5 can be used. i~f in corollary 2.1 it was shown that SO(b.s} (b.s}~lx[O.T] inf V(~ .s) s€[O.T] 0 > 0. = E [e Also. since V(b.s) Recall that > ° and that bX s Y (X s s ° (Exs is defined in corollary 2.1). V(b.s} ~ Assumption c}. implies that SO(b.s} and Sl(b.s} are continuous on ~. x [O.T]. (b.s) E To see this. suppose there exist a sequence ~. x [O.T] as k ~ 00. (~.sk) ~ Then. lim E [ exp(h. X }Y X' - exp(bX }Y X ] -k sk sk sk s s s k = Elim k = E = [exp(b. X }Y X - exp(bX }Y X ] -k sk sk sk s s s ° V [ ebXs+ys+Xs+ - e bX] sY X S S ° since X. Y continuous in probability implies that for each s, X = X s+ s a.s. and Ys+ = Ys a.s. A similiar proof can be used to show continuity ° of S . o 3.7 Lemmas 3.1, 3.2, 3.3, 3.4 and 3.5 Lemma 3.1 Assume A, C1, Dl, and lim E n c;(1} n < 00, define, a2 "2 -1 a = - i n 8~ -n ~ (~ n i a 2i = JTo lies} V(~ 3 1 8 * ~;;Il ) - - ~ (~ )(P .. -P .. ) and, 2n 8~3 n i i ° .s}S (~ i ,s}A (s) ds. 000 92 then on lip~ - {jn II < !kY, PROOF: _J( y- i=1 '""2 2 (a.2 - a.) 1 _J( ~ 2 Y- [T ;;nf O Ii(s) V (p ,5) dN (.) i=1 1 n _J( + 2 Y- 5 3 [12n -3 8 * <£ ({j i=1 8{ji n '"" - ~) - ]2 Hrf; .5"Y, ~ 4 _J( T ;;n Y- (f Ii(s) V (p ,5) dM O i=1 + 4 -K r- i=1 2 + .51!(K) n (f T O '"" 5 ;;n 0 0 Ii(s)[V (p ,5)8 ({j ,5) - V({j ,5)8 ({j ,s)]X (s)ds) n - 2 IIpn-~1I + 4 2 (.» 0 0 max s~p 1(l!i n ) l~i~K {j 1I{j*-~1I<.5"Y _J( -1 Y- (I!i 1=1 n ESK T ;;n 0 0 3 -1 8 * 2 -3 <£ ({j ) I 8{ji n 0 f O Ii(s)[V (p ,5)8 ({j ,5) n n 0 0 2 4 - V({j ,5)8 ({j ,5)] X (s)ds) 0(1I1!1I) 000 2 93 Using Lenglart's inequality (Lenglart. 1977) it is easy to show (using lenuna 3.2) that _K r- i=l ( ai-a 2 "'2)2 i = °P(IItll4n 1 4) 4) R ) 0(11"11 ~ + II;;n_ ~ ~n I1 2 0 p (11"11 ~ 4 + O(lItll) T r- (ti-1 f O Ii(s)[V .-K i=l n ;:;Il 0 (p .s)S (f3 .s) n 0 o - V(f30 .s)S 0 (f3 .s)] A (s)ds) 0 2 All that is left is to prove that .-K -1 T ;:;Il 0 0 2 r(i. f O Ii(s)[Vn (p .s)Sno (f3 .s) - V(f3 .s)S (f3 .s)] A (s)ds) 0 0 0 1= .11 _K . 1 -1 = r- (ii 1= T;:;Il 0 f O Ii(s)[Vn (p .s) - Vn (f3 .s)]S (f3 .S)A (s)ds) on 0 0 2 (3.24) + _K -1 T O O 2 r(ii f O Ii(s)[V (f3 .s)S (f3 .s) - V(f3 .s)S (f3 .S)]A (s)ds) i=l non 0 0 0 0 Using lenuna 2.2. it is easy to show that the first term on the LHS 2 of (3.24) is Op(lIiIl ). The second term. _K -1 T O O 2 r(ii f O Ii(s)[V (f3 .s)S (f3 .s) - V(f3 .s)S (f3 .S)]A (s)ds) • non 0 0 0 0 i=l can be divided up into terms such as _K IT j . r- (i-i f O Ii(s)IS (f3 .s) - SJ(f3 .s)lds) i=l n 0 0 j 2 °p (1) = 0.1.2. by lenuna 2.2. and the fact that inf SO(f3 .s) O~s~T 0 > O. 94 The proof will be concluded if for j = 0.1.2. _K fT r(e -1 0 lies) IS j (~ .5)- S j i i=l n 0 1 .5) Ids) 2 = 0 (----4)· (~ p nllell 0 The above LHS is less than or equal to. _K -2 T j j 2 r{e e fOeS (~ .s) - S (~ .s» ds) i i . 1 n 0 0 1= e Using A2. and lim ~ n < 00 yields the desired result. (1) o Lemma 3.2. -E e < Assume A. C. D and lim n 00. (1) then. sup O~t~T 1..Jii f ot (~ i=l (~ lies) 22i] {E .5) - E 0i non (~.s»So{~ .s)}. (s) n 0 0 PROOF: Using a Taylor series on ~(s) about ~ (s) at each s results in o En (~.s) - En (~0 .s) = (~{s) - ~ 0 (s» Vn (~0 .s) where I~{s) - ~o {s)1 ~ I~0 (s) - ~{s)1 and subsequently. dsl 95 ~ sup O~t~T I~ fot [tK 1.(5) i 2i] i=l a. 1 (~0 (5) - ~n(S» (~ ,s)80(~0 ,S)A0 (s)dsl V non 1 T fO r + vn (~ o -n 2 ~ (5» ds (5) - 0p (1) (by A, C1). Using the definition of ~n it is easy to see that the second term above is 0 (~lIiIl4). p sup O~t~T ~ I~ fot sup O~t~T As for the first term, (~ [tK 1.(5) i 2i] i=l a 1 i (5) 0 ~n(s» (~ ,s)80(~,s)A V non (5) dsl 0 I~ f~ (~ (5) - ~(s»dsl 0 t _K ii V (~ ,5)80 (~ ,S)A (5) - 1] ds. + sup I~ f O I~ (5) - ~(s)1 [ Y1.(5) --2 O~t~T 0 i=l 1 a. n o n 0 0 1 The first term above, sup O~t~T shown to be 0(~lIiIl4). ~ fT I~ (5) - ~(s) O o < ~ foTI~0 (s) - I I~ f~ (~ (5) - ~(s»dsl, has already been 0 The second term is equal to, 2 ai ] _K Iv (~ ,5)80 (~ ,s)A (s) - [ Y1.(5) i I Ids-O(l} non 0 0 i=l 1 'i - ~(s)1 Ivn (~ o ,s}80(~ ,s}80(~ ,s}IA (s}ds-O(l) n 0 ,s) - V(~0 00 2 T 0 + ~ fol~ (s) - ~(s}1 IV(~ ,s)8 (~ ,s}A (s) - o 0 0 r 2 ) f TI V (~ ,s}80 (~ ,s) = 0p (vnllill o no no + 0 p (~lIiIl4) V(~ 0 0 [ _K ai] Y- lies) i I Ids-O(l} i=l 'i ,s} 80 (~0 ,s) Ids by D4. Using A2 and C1, results in, f TO I V (~ ,s}80 (~ ,s) non 0 V(~ 0 1 ,s}80 (~ ,s) Ids = 0 (~). 0 PJil o 96 3.3 ~emma a) ~ 0 is a constant function. 4 lim nllill = 00. lim lIill 2 = n n i lim .:..1!l < 00 • AI. A3. C. n i(I) b) c) 1) Assume. Consider for ~ (~) = ~ ITO n _I( ~ € ~-. Let i=I An ~ IIi II 2) Consider ~ (~)j n for ~ € j m and = I ....• K. = 4 iK j=I and 00. Ij(s)[~.X (i) - Ln SO(~ .. s)] dN (i) J s n be the maximizer of n _I( r- [~j A j=I - ~] 2 ~n(~)' J s Then = 0 (1). p 0 j ~ ITO I (s)[~ X (i) - Ln SO(~.s)] dN (i) v=I i=I v s n s 1 j Let~. be the maximizer of ~n(~) . • A J = I ..... K. Then max I~j~K I~ - ~ I j 0 =0 P (I) PROOF: Essentially the proofs of 1) and 2) use the same techniques as Theorem 2.1. That is. lemma 2 of Aitchison and Silvey (I958) combined with Taylor series arguments are used. 97 then by lemma 2 of Aitchison and Silvey (1958) Since _a2 min 1~HK ~ ~ (~) ~ ap j 0, 1) will be proved. n Following the same argument as in the proof of Theorem 2.1, but expanding about the K dimensional vector with instead of _K r- (n 2 j ) j=1 ~ o for each coordinate -n ~ -1 aRa ~j ~ (~) n (~i-~ ) 0 + max 1~j~K I (n 2 ) -1 j a2~ ap~ J ~ (~ ) n 0 -1 + 2j o.21 - L J 3 + .5 1~HK Choose ~ € mK. sup ~*~ max J II~*-~o II~II~-~0 II so that II~-~ 11 2 o (3.25) it is sufficient to show that and I(n 2.)-1 = (11211 4n)-1 [)2. n a 3 ~ (~*) III~-~ II a~ n j Then to prove 0 98 and max l~j~K Following the proof of 1) in lemma 2.1. results in by AI. A3 and Cl. max l~j~K ej ) I(n -1 Again following the proof of 2) in lemma 2.1. gives a2 a..2. Pj ~n(fjo) + e j 2 ~ 4 = 0p«n ell) ojl ) + 0p(l) The proof of 3) in lemma 2.1 can be imitiated to yield I(n ej ) -1 3 a ---3 ~ afj n * )1 (fj = ° (1). p j (3.25) is proved. Let > 0. and let ~ ~ (fj)j = 1 j n Denote v 1 . 1 J= ~ ITO I (s)[fj X (i) - Ln SO(fj.s)] dN (i) v=1 i=1 ej v n s s by a • v=I •...• K and set a = 0. O v If. K pI j=1n {sup fjEIR (3.26) (a. n)-1 J ~A ~n ({j)j jJ Ifj-fjo I=~ -+ 1 n-lllO then as in the proof of 1). ~ P[3 3 I~-fjol <~. ~fj ~n(~~)j = 0. and 2) holds. Using a Taylor series about fj (a.n) J -1 d dA jJ ~ n j (fj) (fj-fj ) 0 o results in j = 1 •... . K] -+ 1 n-lllO 99 -1 = (a.n) J d d R ~ ({3 tJ n 0 ) 2 ({3-{3 ) + {a.n)-I d o J d{32 j + (2a.n) J where 1{3~ - {3 J 0 So that for 1{3-{3o 1= e. j (e2 I( 1{3-{3 0 I I~HK {{3-{3 )2 0 3 -1 d * j 3 ---3 ~ ({3j) ({3-(30) d{3 n j = I ..... K. = I •.... K and max [e -1 {ajn)-I ~n {{30 )j by C2 Idd{3 ~n {R o )j I tJ 3 1 d * j e I] I(2a .n)- ---3 ~ «(3) d{3 J n The above relation implies that if max l~j~K I(ajn) -1 and then (3.26) will be proved. Consider first max l('(K _J_ I(ajn) -1 d dR tJ ~ n ({3 ) 0 jl dd R tJ ~ n «(3) jl 0 = 0p (I). . 100 = max t €[ a1' ... '~ ] ~ 2- 1 n-~ sup 1 = 2 In-~ ~ f t X (i) - E (~ ,s) dM (i)1 t~[O,T] -1 n 1 -~ t ~ fo X (i) - E (~ ,s) dM (i)1 s nos i --1 l(tn)-l 0p(l) o i=l s nos by an application of Lenglart's inequality and Al, Cl. Next consider max I (ajn) l~j~K + = 2 -1 d2 --2~(~) dp n j -1 -a. ~ 2 j a 1 J v=l v 0 O max [V (~ ,s) 8 (~ ,s)O - V(~ ,s)8 (~ ,S)]A (s) ds I It -1 f Ot no no 0 0 0 t€[a 1 ,··· ,~] -1 1 n -~ 0p (1) + 0 p (1) by arguments similiar to those above. If Choose then the proof is finished. max l~j~K ;W EIR ~ <~ (~ in assumption A3). Then, 3 l(a j n)-l ~3 ~n(~*)j I I~ -~ol~~ max Itt€[a1' ... '~] 1 f~ dN (e)1 s 0 (1) (by A3) p + 101 max t€[a1·····~] = 2 -~ -1 n 1 Therefore. It- 1 f ot So(~ .S)A0 (S) n 0 0 p (1) + 0 p (1) p[ j=lrf {sup J p by arguments similiar to those above. (a.n)-l dd n ~€IR dsl 0 (1) fJ ~n W)j(~-~0 ) < o} -+ 1 n~ for c < 'Y. I~-~ol~c which implies (by Aitchison and Silvey's lemma 2 (1958» j = 1....• K] that -+ 1 for c < 'Y. n~ j=l ..... K. 2) is proved. o 3.4 Assume. ~ellUl&a a) lim 2(1) n ~ = 00. and n b) AI. A3. C1 then. for ~ where sup t€[O.T] I iK I.(s)~nj j=l J - ~ (s)1 = 0p{l) as n -+ 00. 0 and I2-1 j 2) T onf O Ij{s)Vn(Pj.s) dNs{e) T I.(s)V(~ .s)S0 (~ .S)A (s) ds I = 0 (I) - 2.-1 f O J J 0 0 0 p PROOF: = };v 2 and a = O v j=l j (s) gives Define a series about ~ o o. Then. fix s € I j . A Taylor 102 vn (~j's) where = Vn (~ 0 (s).s) 1~:-~o(s)1 ~ 1~-~o(s)1 Recall that sup sE[O.T] sup If I.(s)V (~.s) tE[O.T] o J n J t therefore Ib-~o (s) 1<'Y dN s (e) - fot I.(s)V (~ .s) dN (e)1 J nos t ~ (3.27) sup hEIR sup If Ij(s)l~nj-~ (s)1 tE[O.T] o 0 To prove 1) consider. max It- I fot i< Ij(s)V (~j's) n j =I t E[ aI ..... ~ ] P (e) s o max tE[a I •··· .~] max 0 O O (Vno (~ .s)S (~ .s) - V(~ .s)S (~ .s»X (s) It -1 f Ot no 0 0 0 sup I i< I j ( s)l3~ sE[O.T] j=I + 0 In~ fot V (~ .s) dM (e)1 e i-II n-~ nos tE[a I •··· .~] + (e)1 0 (1). s t - t-Ifo V(~ .s)SO(~ .s)X (s) dsl (3.28) + dN dN max tE[aI •.... a,] ~ 0 ( s) I t-If~ SO(~ n e [i ~1 max tE[aI ..... ~] .s)X (s) dS] 0 0 ds I 1M t ( e) I °p (1). K (by 3.27). Applying Lenglart's inequality and assuming AI. Clone gets that the -1 ~ first term on the RHS of (3.28) is i l n 0p(I). The second term on the RHS of (3.28) is 0 (1) by AI. Again applying Lenglart's inequality p and assuming AI. CI. shows that the third term on the RHS of (3.28) is op (1). hence 1) is proved. To prove 2) consider. max HHK T ",n Ii -1 j f O Ij(s)Vn (Pj~'s) - -1 T 0 dN (e) - i. f O Ij(s)V (~ .s)S (~ .s)X (s) ds s J n 0 0 0 I 103 + + max 12-j1 l~j~K fb Ij(s)V (~ .s)[sO(~ .s) - sO(~ .s)]X (s) dsl non 0 0 0 I i< 1.(s)~-~ sup s€[O.T] j=l + J J max l<.<K _J_ The proof that the above is 2~1 J 0 p (s) I. [ fTO I.(s)SO(~ 0 J 12-j1 fTO 1.(s) elM (.) J s max l~j~K n 0 I .s)X (s) dS] ·0(1). 0 p (1) is as in the proof of 1). o ~ellUlla 3.5 Assume a) ~ o there exists is nonconstant and is continuous in s € [O.T]. ~ such that for (b.s) € b) ~ an open set containing [inf s€[O.T] ~ (s). 0 sup s€[O.T] ~ o (s)] x [O.T] CJ 0 1 CJb S (b.s) = S (b.s); cJ2 S0 (b.s) = S2 (b.s); -:2 0 S (b.s) > O. and Ob V(b.s) ~ 0 for each (b.s) € ~ x [O.T]; are continuous on [O.T] for each b € c) >0 V(~o's) for each s € [O.T]; T and f O Xo(s) ds < SO(b.s). and Sl(b.s) ~. Xo(s) >0 for each s € [O.T] 00. Then for KL(b) =Jb {(~o(S)-b) E(~o's) + Ln[ :~~:.~~) l}s°(~o'S)Ao(S) ds o the following holds. for each bL < inf s€[O.T] where [bL.bu] C ~ there exists a 1) ~o(s) and for each bu > 0 for which > sup s€[O.T] ~o(s). 104 and 2) there exists 0 > 0 such that inf KL(b) b€[bL,hu] PROOF: g(~ o Fix s € [O,T]. Define g(b,s) ~, (s),s) = I, and for each b € >0 . = SO(b,s)/SO(~o ,s), g(b,s) > o. then In the following a ' denotes derivatives with respect to b. Consider f(b,s) = (~o (s)-b)E(~ ;s) + Ln[ 0 S~(b,S) S (~ ,s) ] o = (~o(s)-b)g'(~o(s),s) + Ln[g(b,s)]. So -g'(~o(s),s) f'(b,s) = = g"(b.s)/g(b,s) f"(b,s) and f"(~ o + g'(b,s)/g(b,s) and, (s),s) - (g'(b,s)/g(b,s»2 = V(~0 ,s) > O. = V(b,s) ~ 0, Since f"(b,s) is nonnegative for each b and is positive at ~ (5), f(b,s) is a convex function in b € ~ with a o unique minimum at f(~ o = O. (s),s) Also this implies that for each b Likewise for each b = O. < f'(~o (s),s) < ~o (s), > f'(b,s) This can be carried out for each s € [O,T]; hence the nonnegativity of SO(~ ,s)X (s) implies that KL(b) is a convex o 0 function on ~. Let b € b+h €~, ~, b > sup s€[O,T] ~ (s). Choose h >0 small enough so that 0 . then by the mean value theorem there eXIsts b* (h,s) € [b,b+h] such that, f(b+h,s~-f(b,s) = f'(b*(h,s),s) ~ f'(b,s) Therefore, for b € ~, b > sup s€[O,T] ~ (s) 0 > f'(~o(s),s) = o. 105 . (3.29) 1 ~ KL(b+h)-KL(b) h h-<> Similiarly let b € 2li. b < inf ~ s€[O.T] (s). > 0 small enough so Choose h 0 that b-h € 2li. then by the mean value theorem there exists b*(h.s) € [b-h.b] so that. f(b.S)-~(b-h.s) = f'(b*(h.s).s) ~ f'{b.s) = o. As before. this implies that for b € 2li. b inf s€[O.T] ~ (s). 0 lim KL{b)~KL(b-h) ~ I~ f'{b.s)SO{~ .S)A (s) ds (3.30) h-<> For bu < < f'{~o{s).s) > sup s€[O.T] 0 ~o{s) < and bL (3.30) imply that there exists ~o{s}. inf s€[O.T] a < O. 0 bu. bL €~. (3.29) and > 0 so that inf KL(b) < KL(bu) - a b~bu and inf KL(b) b~bL < KL(bL) - a. 1) is proved. To prove 2). first choose bu bu. b L > sup ~ (s) and bL s€[O.T] 0 < inf ~ (s). s€[O.T] 0 €~. then choose c small enough so that there exists s*. s' for which bu-c > ~o(s*) > c + ~o{s') > c+bL (This nonconstant and continuous). can be done if ~o is Recall that for a convex function on a cOmPact set. say g. with a unique minimum. say at g(xO)' the following holds: Given c then Ix-xOI >0 < c. there exists 6 Therefore. for c >0 such that if g(x) - g(x ) O <6 > 0 there exists 6 > 0 such that if Ib-~o (s*) I > c/2 then f(b.s*) > 6 and if Ib-~0 (s')1 > c/2 then f(b.s') > 6. However f(b.s) = (Po(s)-b)E(Po's) + Ln[ S~(b.S) ) is uniformly S (PO.s) continuous in (b.s) on [bL.bu] x [O.T]. This implies the existence of balls around 5*. 5'. i.e. ~(s*). ~(s'). for which when Ib-~ (5*)1 o Then > c/2 and inf s€2li(s') f(b.s) inf * s~(s f(b.s) > 6/2 ) > 6/2 when Ib-P (s')1 > c/2. 0 106 JTO[(P (s)-b) E(P .s) o + Ln( 0 S~(b.s) J] S (P .s) SO(P .S)A (s) ds o l max[ I{b : Ib-PO(s*)I > c/2} J~ I~(s*) I{b: Ib-Po(s')/ > c/2} 0 0 (u) SO(PO.S)AO(S) ds 0/2. J~ I~(s')(u) SO(PO.S)AO(S) ds 0/2] this proves 2). since inf KL(b) l inf max[ I{b : Ib-Po(S*) I > c/2} b€[bL·hu] b€[bL·hu] ° JTO I~(s*) (u) S (PO.S)AO(S) ds 0/2. I{b : Ib-po(s')/ > c/2} > ° (since SO(PO.S)AO(S) > ° for S I{b : Ib-Po(s') I > c/2} l 1). € J~ I~(s,)(u) SO(PO.S)AO(S) [O.T] and I{b ds 0/2] Ib-po (S*) 1 > c/2} + D IV. LINEAR TFSf STATISTICS AND GENERALIZATIONS 4.1 A General Linear Statistic An As was seen in Theorem 3.1, the {P (s)}s€[O,T] behave as independent random variables with means {Po(s)}s€[O,T]. In this situation, the optimal test statistics for inferences concerning the {Po(s)}s€[O,T] are often linear test statistics (particularly if the random variables are multivariate normal). Zl' ... '~' Given K random variables, a linear statistic is of the form iK Wi Zi where the Wi'S i=1 are appropriate weights. In the situation here, the analogous statistic Gill (1980) uses a linear statistic for inference in the multiplicative model of Aalen s - - So [Xs (-)] In this model S- ~(s) ds is given by = P(s)Xs (i». . (1978) (X (X,i) 0 -1- dN (-) which when suitably normalized will converge weakly s to a continuous Gaussian martingale (Gill, 1980). test statistics of the form IT w(s)P(s) ds o = IT0 Gill then considers w(s)[x (_)]-1 dN (-). s s Since w may involve unknown quantities, w will have to be estimated, say by w. n In this section conditions for convergence of w to ware given so that the linear statistic has the appropriate n 108 asymptotic distribution. Also in this chapter. a (s) is given by V(~ .s)So(~ .S)A (s) 2 addressed. and A2 a o = (s) n In section 4.2. the optimal choice of w is _K Y- I.(s)(i.n) j=l J J -1 a2 An --:2 ~ (~ ) a p~j n 0 0 for s € [O.T]. Theorem 4.1 Assume a) w has at most a finite number of discontinuities on [O.T] and lIill-2 b) lim nlltll 4 = ITo (wn (s)-w(s»2 ds = 0 p (1). lim IItll lXI. n A. B. C. DB. lim t is nonconstant. o d) lXI. (1) n ~ = O. n -E < c) and if 2 lim nlltll 6 < D2. D4. and lXI. n e) w is Lipshitz continuous between discontinuities on [O.T] then. T A Yh I o wn (s)[~(s) - ~0 (s)] ds = n ~ T ~ -2 _K i~1 I o j:; Ij(s)a j T ;;Il I o Ij(u)w(u) du (Xs(i) - En(p .s» dMs(i) + 0p(1) 2 = I To where a j r 2 Ij(s)a (s) ds and ~ T vn I o wn (s)[p (s) - ~ 0 (s)]ds => N(O.IT0 !!D 2 2 w (u)[a (u)] -1 du) Consider. PROOF: T An Yh I o wn (s)(~ (s)-~ (s»ds 0 = Yh T An I 0 (wn (s)-w(s»(~ (s)-~ 0 (s» ds + T A Yh I o w(s)(~(s)-~0 (s» ds 109 The first term on the RHS above can be shown to be T An I~ I (w (s) - w(s»(~ (s) - ~ (s» dsl o n 0 -2 T < [llill I0 = [llill (wn (s)-w(s» 2 ds nllill -2 T I o (wn (s)-w(s» 2 ds] ~ 0 p (1) as follows; 2 T An I0 0 (1) p (~(s)-~ 0 (s» 2 ds] ~ (by Theorem 2.1 or lemma 3.3) = 0 p (1) Therefore T A ~ I o wn (s)(~(s) - ~0 (s» (4.1) T ds An = vh I o w(s)(~ (s)-~0 (s» ds + 0 = vh IT w(s)(~(s)_pn(s» ds + vh 0 p (1). IT w(s)(pn(s)-~ (s» 0 ds + op(l). 0 The second term on the RHS of (4.1) is identically zero if ~ o is a constant function, otherwise, if w is piecewise Lipshitz continuous it 4 is O(n~ lIill). To see this consider, IT w(s)(pn(s)_~ (s» o ds 0 = t< T T 2 _IO~I.K./_s_)w_(_s_)_I.....;;O----lI JI-.(_u_)(_~....;..o_(u_)_-~.-;o~(_s_»_u_(_u_)_d_u_ds j=l By D2, there exists L 2 IT o Ij(s) U (s) ds > 0 for which I~0 (u) - 0 ~ (s)0 - ~'(s)(u-s)1 ~ L(u-s)2 a.e. Lebesgue. Therefore lITo w(s)(pn(s)-~0 (s» ~ dsl I .t<1 i~l IT0 I.(s)w(s)~·(s) IT0 J J 0 J= I.(u)(u-s)u2 (u) du dsIO(l) J 110 _K I .r1 + -1 i. J= J I T Ij(s}w(s} I T Ij(u}(u-s} 2 du ds I0(1} 0 0 >0 By 04, for u € I. there exists L' J la2 (u} (recall that a j = - a2 (a j -1}I ~ ~j ii' a i=1 lITo w(s}(~(s}-P 0 (s}) + for which L' lu-a j _ 1 1 a.e. Lebesgue = O). Then, 0 dsl _K rIj=1 i -1 I T I.(s}i.3 ds I0(1}. j 0 J J Again by D2, Ip~(s) - P~(aj_1)1 ~ L1s-aj_11 and bye), there exists L" a.e. Lebesque e > 0 for which on the intervals that do not contain a discontinuity point of w, so that, lITo w(s)(~(s)-P 0 (s» dsl Therefore. by (4.1) and the above arguments. .,hi I T A T w(s)(jj1(s) - p (s»ds = .,hi I 0 0 0 A -n ~ 4 w(s)(jj1(s)-P (s» ds + O(n lIill ) Following the same steps as in Theorem 3.1, results in ~. 111 vh I To An -n w(s)(~ (s) - ~ (s» = n-~ IT ds iK I.(s)w(s) o~2 ~ J J i=l o j=l + 0 p IT 0 I.(u)(X (i) - E (~.u) dN (i) ds J u n u (by lemmas 2.1. 3.1) (1) So An vh I To -n w(s)(~ (s) - ~ (s)ds (X (i) - E (~.u» u n + = n~h . ~1 IT0 1= 0 p iKI.(u)o-j2I T I.(s)w(s) ds (X (i) - E (~.u» 0 J u n J= 1 J ~ + n I T u (1) . (4.2) dN (i) dM (i) u -2 T _K r-Ij(u)oj I I.(s)w(s) ds [E (~ .u) o . 1 0 J n 0 J= -n 0 - En (~ .u)] Sn (~0 .s)A0 (s) ds + 0 p (1). The second term on the RHS of (4.2) can be shown. by lemma 3.2. to be o (1). p This concludes the proof of the first assertion. The next step is to show that ~ I T w(s)(p~ (s) - ~-n (s)ds has the required asymptotic o distribution. Zt = n~ ~ i=1 Define the process Z by It iK 0 j=1 I (u)O-j2 j IT 0 I.(s)w(s) ds (X (i) - E (~.u» J u n dM (i) U then ~ T Io An -n w(s)(~ (s) - ~ (s) ds = -r 7_ + 0 p (1). Rebolledo's central limit theorem as stated in section 1.2 can be used to show that Z converges weakly to a Gaussian martingale. consider. First 112 <Z> = t I 0t _K Y- -4 t 2 2 (I Ij(s)w(s) ds) [S (~ .s) J o n 0 I.(u)o. j=l J + 0 p 2 ;:;Il 0 + E (p .s)S (~ .s) n n 0 by lemma 2.2 and AI. (1) Then. __ It + I ot + 0 p _K Y- . 1 J= -4 (IT0 I.(u)o. J J I.(s)(w(s}-w(u}} ds} J 2 0 2 (u) du (I) _K I ( ) -4 2( }D2 2( } d Yj u OJ w u ~j 0 u u o j=l + 0 P (by a). (l) So. t -2 2 by 00. <Z>t = I o 0 (u) w (u) du + 0 p (I) To complete the proof all that is necessary is to verify the following Lindeberg condition: n -1 ~ T i~l I o j:; Ij(u}oj [Io Ij(s}w(s} ds] -4 -K T 2 - E (fl.u}) n -1 • I{n -4 T e ~ (u}X OJ (I o Ij(s)w(s} ds} (i) u 0 2 2. (XU(l) Y (i}A (u) u 0 ;:;Il (Xu (i}-En (p .u)} 2 >~} du 113 = 0 p (1) for each ~ > o. The proof of this is virtually identically to the proof of the Lindeberg condition in Theorem 3.1 and is omitted. o In order to formulate a statistical test, as consistent estimator for the variance of "'n vh I T o wn (s)[~ (s)-~0 (s)]ds is needed. The next theorem furnishes this. Theorem 4.2 Assume, a) ITo b) lim nllill 4 = (w (s) - w(s»2 ds n = 0 p (1), lim lIill 2 = 0, and 00, n n i c) AI, A3, C, DI, D3, lim.;i!9n c;(1) < 00, then PROOF: Recall that "'2 = o (9) n _K r- j=l I (s)(2 .n) j i2 -1 (-) --- ~ a~~ n J J and o 2 (s) = V(~o ,s)8 0 (~ Then, (s) II To wn2 (s)o"'_2 n (4.3) ds - I T 2 0 W (9)0-2 (s) ds I 0 ,s)A0 (s). ~ (p ) 114 "'2 By Theorem 3.2. 2 > o. inf a (s) s€[O.T] Note that by C2. sup la (s) s€[O. T] n Therefore. lIt w2 (s);-2(s) ds - It w2 (s)a-2 (s) dsl o n n 0 = ITo(w (s) - w(s»2 ds 0 (1) + 0 (1) + {IT (w (s)-w(s»2 ds}~ 0 (1) n p p 0 n p = 4.2 0 p (1) o by a. Efficacy of the Linear Statistic COnsider the statistical model given in section 2.4. but with the further assumption that,n = a{Nn(i). s ~ t. i=l ..... n} x,n where,n ~ t s 0 0 a(~n). 'So c,n. Let ~p be the probability under which!! has stochastic (13 (s)+f3(s)/Yh)Xn (i) intensity. X (i) = e s . l=l ..... n and let n denote o ~ s 0 n ~R ~ where 13 yn(i)X (s) s € [O.T]. s _ =0 0 (X (i) = e s f3o (s)X:(i) yn(i)X (s) s € [O.T] i=l •...• n). Note that up to this point all s 0 asymptotic results given in this thesis are under ~n. On,n assume ~n o n = ~13" a.s. and. Since the Xn(i)'s are locally bounded. which in turn implies that ~p «~: 0 ~ sup IXn(i)1 i=l s€[O.T] s 0 < 00. (see Kabanov et. al, 1980) 115 + ~ i=1 It [ l-e ~(s)n-~ s (i») e ~ 0 (s)X (i) s Y (i)A (s) s 0 0 € [O.T]. t If ~~ is contiguous with respect to ~~ then Le Cam's third lemma (Hajek & Sidak. 1967) can be used to choose a weight function which results in a linear test statistic with highest asymptotic power against the contiguous alternative ~ o (s) + ~(s)n -~ . This is the method Gill (1980) uses in order to formulate two-sample tests in survival analysis. In particular. this method leads to a generalization of the Wilcoxon test to censored data. ~n. o In lemma 4.2. n ~~ is shown to be contiguous to This result is well mown; see (Dzhaparidze. 1986) for a much more general proof in the point process setting. joint asymptotic normality of The theorem to follow gives ~ ITo wn (s)(pn(s)-~ (s) ds. 0 and Ln[~P] ~n . o T An application of Le Cam's third lemma is then possible and follows the theorem below. Theorem 4.3 Assume. a) ~(s) b) w has at most a finite number of discontinuities in [O.T] and is bounded on [O.T] lIill-2 c) d) lim nllill n 4 = co. ~n (w (s) - w(s»2 ds ~> 0 n 2 lim lIill = O. n ITo i A. C• D3 • 11n·m:ill. i(l) < co . d and 1nstea sup n-~IX (i) l~i~n s€[O.T] s max 0 f B• ~n IY (i) s _0_> 0 116 and if ~ o e) is nonconstant. 6 lim nllill < 00. D2. D4 and n w is Lipshitz continuous between discontinuities on [O.T] f) then. (Yh T A I o wn (s){~{s)-~0 (s» ds. where 2 uLS U ITo ~(u) S2{~ 0 .u)A0 (u) duo and 2 = L u LLS = 1T0 w{u)~{u) duo Lernnia 4.2 implies that. PROOF: Ln[ 2 2 -1 1T w (u)[u (u)] duo 0 = ~; ) = n~i=1'ffl IT ~{s)X s (i) ~n 0 IT elM (i) - .5 s 0 ~(s)S2{~ .s)A (s) ds 0 0 o + 0 ~n (1) o and by Theorem 4.1. '" Yh I T o wn (s){~{s) - ~0 (s»ds = n~ 'ffl i=1 IT iKI j {s)u-j2I T Ij{u)w{u) du {X (i) 0 j=1 0 s + 0 ~n (I). o where. Let zn = t n~ 'ffl It ~{s)X (i) i=1 0 s elM (i) s E (pn.s» n elM (i) s 117 and Xn = n-~ It ~ I.(s)a~2 IT I.(u)w(u) du (X (i) - E (pn,s» t 0'1 J J 0 J s n J= dM (i). s Then using Rebolledo's central limit theorem in section 1.2, lemma 4.2 and Theorem 4.1, all that is necessary to conclude this proof is to show that ~n <Zn, Xn > ~> t It0 ~(s)w(s) ds for each t € [O,T]. But, <Zn,Xn>t = Ito ~(S)[j=1 ~ I.(s)a-j2 IT Ij(u)w(u) dU]V (~ ,s)So(~ ,S)A (s) ds J o n 0 n 0 0 (4.4) + It o ~(s) T I.(u)w(u) du ] (E (~ ,s) Y-Ij(s)a-2 j I OJ '1 no J= [ _K - E (pn,s»SI(~ ,S)A (s) ds n n 0 0 Consider the second term on the RHS of (4.4): It o ~(S)( '~I.(s)a-j2IT Ij(u)w(u) l J 0 J= dU](E (~ ,s) - E (pn,s»Sl(~ ,S)A (s) ds no n no 0 IE (~ ,s) - E (pn,s) I ds 0 (1) non ~n = Ito (by AI, C) o = 0 ~n(1) (by lemma 2.2). o Therefore <Zn ,Xn >t = It o ~(s) ( Y-K Ij(s)a-2 j j=l IT 0 Ij(u)w(u) du )V (~ ,s)S0 (~ ,S)A (s) ds non 0 0 + 0 ~n (1) o = It o 2I T I.(u)w(u) ~(S)[v=li<I.(s)aj J 0 J (~ ,s)So(~0 ,S)A0 (S)-W(S)] du V non ds 118 + It o ~(s)w{s) ds + 0 '!jn (I) o The first term on the RHS above is bounded above in absolute value by -1 T I.{s)t. I J J 0 0 I.{u)w{u) du V {~ ,s)S (~ ,s)A (s) J non 0 0 -1 2 - w{s) t j OJ = 0 p by the continuity of w{s) and (I) (by C2) ds O{l) °2 (s) and AI. D As outlined in section 1.1 of this thesis, a special interest lies in testing various alternatives against the null hypothesis that an unknown constant function. ITo =0 w{s) ds o is To do this, first restrict the possible ITo weight functions to those for which imply that ~ w (s) ds = 0 (which will then n if w converges to w appropriately) so that n asymptotically under the null hypothesis the statistic will have zero mean. The results of Theorem 4.3, can then be used to produce optimal linear test statistics (optimal in the sense of maximizing asymptotic power) for these problems. LS (w ) nn = V£ Define the linear test statistic to be An T I ow n (s)[~ (s) - ~ ] ds 0 = v£ T An I on w (s)~ (s) ds. Le Cam's third lemma (see Hajek & Sidak, 1967) and Theorem 4.3 , imply that the '!1~ distribution of LS (w ) converges to a normal distribution with the n fJ n 2 mean 0u.s and variance 0LS of Theorem 4.3. Also define v£ Vn (wn ) = T 2 A2 -1 I o wn {s)[on (s)] ds. Theorem 4.2 and Le Cam's lemma I, imply that v£ Fix z a '!1n V (w ) n n :JL) to be the (I-a) distribution; then, o2 th LS = IT0 w2 {s)o-2(s) ds. percentile by the standard normal 119 1 n [LSn(Wn ) o jV (w ) n n ~n [ LSn (wn ) [3 jvn(wn ) > Za] a as ) -> n ~ 00. and n [ LSn (wn ) - °LLS _ GUS ] --) z a °LS °LS jvn(wn ) za] = ~[3 ~ ~[z 1 - a - 0LS OLLS) as n~ 00. Therefore to maximize the asymptotic power of the test statistic LS (w ) n n against the contiguous alternative. [3 maximize °LLS 0LS o + [3(s)/vn. w should be chosen to subject to the restraint that IT w(s) ds = O. Of course. 0 one must then find suitable w converging to w. n procedure are given in sections 4.3 and 4.4. the efficacy of LS (w ) n n Examples of this °LLS The ratio - - is called °LS The following lemma gives the weight function .Jin (wn ) which maximizes the efficacy. Lemma 4.1 for s € [O.T]. then o 0LLS w maximizes _ _ = °LS subject to the constraints I and 0 PROOF: < ITo w2 (s) 0-2(s) ds T o < IT w(s)[3(s) ds ~=-o"-::-_ _--::~_ _-:-:' {ITo w2 (s) 0-2(s) ds}~ w(s) ds = O. 00 Let w be a Lebesgue measurable function which satisfies ds = 0 and 0 < IT0 w2 (s)0-2(s) ds < Then IT w(s)[3(s) ds = IT 0 0 00 ITo w(s) 120 IT P(u) 02(u) du ] w(s) P(s) - _o~T;:--:::2:------ ds, which implies by the Cauchy Schwarz [ I o 0 (u) du inequal i ty, that [J~ w(s)P(s) dS]2 ~ 2 [ I To w2 (s)o -2 (s) ds I T o 0 (s) P(s) - I To P(u) T 2 Io 0 0 2 (u) du ] 2 ds. (u) du That is, IT w(s)P(s) ds [ ---"T~o2"""'--~2--~IL ~ ITo 02(s) P(s) {I o w (s) 0- (s) ds}n IT P(u) 02(u) du -o.;;.."T=--=2---I o 0 (u) du ]2 ds. This concludes the proof as the RHS above is the efficacy of woo [] Notes to Theorem 4.3 and Lemma 4.1 1) The result of lemma 4.1 indicates that in order to construct an optimal linear test statistic for a given contiguous alternative, p(s)n -~ ,s€[O,T], _ I To P(u)o2 (u) du I To 0 2 one should use the weight function w0 (s) ]. Since O W Po + = 0 2 (s) [P(s) involves unknown quantities, it must be (u) du estimated. The following two sections will illustrate the choice of the optimal weight function and its estimation. 2) In order to construct a test which will be consistent against all contiguous directions of approach to P , (in the classical case, o this would be a likelihood ratio test), it is tempting to use the results of lemma 4.1 and Roy's union-intersection principle (Roy, 1953). But as is well known this only leads back to the PLRT (which is not 121 consistent against contiguous alternatives}. as can be seen from the following argument. For a given approach to the constant function ~o' reject H o ~o is the regression coefficient. if 18 (wo }2 n [n~ f~ w~(s) ~n(s} = n dS]2 _ Vn (wno ) = ~ - ""2 0 f ~(u}o (u) T ""2 n dU] ds ]2 (u) du 0 T ""2 f o ~(u}on(u) dU]2 [ f o ones} ~(s) - for any f -="_--.,;.o~n------- . T ""2 is large. T ~ T"" ""2 [ n f o ~n(s)on(s} ~(s) [ ds T ""2 f o 0 n (u) du Then using Roy's union-intersection principle reject H if o (i.e. any direction of approach) the above statistic is large. That is. use sup T ~ T"" ""2 [ n f o ~(s}on(s) ~(s) [ f - 0 f ""2 ~(u}o (u) T ""2 n 0 dU] ds ]2 (u) du -=-_.......;;o~;;;;n------ 13 T ""2 f o ~(u}on(u) dU]2 f o ones) ~(s) T ""2 ds f o 0 n (u) du T ""2 as the test statistic. [ n ~ T '" [ But this is equal to. '" [", f o ~(s}on(s) ~(s) - f T ""n 0 T "'2 n (u) dUJ f o 0 n (u) du T '" - "'2 ~ (u}o "'2 f o ~(u}on (u) dUJ2 T ""2 f o 0 n (u) du ds ds ]2 122 This is a Wald type statistic and can be shown (when properly normalized) to be equivalent to the PLRT of chapter III. To see this. consider ~ T "'2 '" 2 T "'2 '" 2 [n f a (s)~(s)-~ ) ds] = n f a (s )(~( s) - ~) ds - -_0~TFn-=-:"'2::---_-':o:....-_o n 0 f a (s) ds o n '" 2 f To a"'2n (s)(~(s) - ~) ds + 0p (1) 0 =n (by arguments similiar to those in Theorem 3.1) Recall that ~ P • a~....j ~n(~o) j - ~o = -a2~---' - - a~ j = _K T j=1 0 r- f ~ (~ n then ) 0 n "'2 Ij(s)a (s) ds n -~ -n -1 a ~ (~ n 0 af.l.... j i< [n~a~ ~ j=1 .... j 2 + 0 (1) a2 - p a~ ~ (~ n j = ) (p)]2 a- 2 +O (1) j p n 0 By the above and Theorem 3.3 one gets that. ) 0 123 [n~ ag j = -i< j=l = (2 KT -1 <£ (f3 ) ]2 n 0 0-.J 2 + 0p (1) '" T ) _IL '" T 2{<£n (~(H1»T - <£n (f3(H 0 »T} (2 KT-1)-~ [ n 1ba.t is, T '" n I T "'2 o 0 n (s) [", pn(s) - "'2 I o pn(u)on (u) du ]2 T "'2 I o 0 n (u) du ds - K + 4.3 0 p (1). A Test for Regression ~s was mentioned in the previous section, interest lies in testing various alternatives against the null hypothesis that f3 constant function. o is an unknown In particular, one such class of alternatives is the set of nondecreasing functions. If a nondecreasing function is smooth enough then at least locally it can be approximated by a linear, nondecreasing, function: in classical regression, this has lead to the use of a simple regression test to detect this class of alternatives. In an analogous way, consider the use of the weight function which is optimal against a linear regression (i.e. set f3(s) = s in lemma 4.1). R w (s) = 2 0 (s) [ s - 1T0 u 0 Io 0 2 (u) du ] T 2 . (u) du In the following corollary, the behavior of the test statistic based on this weight function in investigated for the class of nondecreasing 124 alternatives (see. also. the notes following the Corollary). Assume. a) ~ b) lim nllell o is a constant function. 4 = 00. lim lIell n c) 2 =0 and n A. B. C. D4 lim n e ..;.rn < 00. ~(1) then for "'2 o (s) = - n _K r- Ij(s)(ejn) -1 j=1 T cJ2. "'n ----2 ~ (~ ) a ~. n J "'2 f 0 u on (u) du ] "'2 [ R W (s) = ones) s n and T "'2 f o 0 n (u) du • one gets. N(O.I) and if assumption B in c) is replaced by B' : max l~i~n ~ sup n~IXs (i)1 s€[O.T] is bounded on [O.T] then. T 2 ~(~~) fo 0 (s) [ s - __>N T T [ {fo s - PROOF: f T u 0 2 (u) du ] ~ 2 ~(s) ds f o 0 (u) du 2 dU] f 0 u 0 (u) T 2 f o 0 (u) du ~ 2 2 0 (s) ds } • 1 125 To use Theorems 4.1, 4.2 and 4.3, the following needs to be 2 11211- verified: ITo If!'n [wR(s) - wR(s)]2 ds n R w (s) =a 2 (s) [ s - - > O. 0 I~T u2a 2 In this case, (u) du ] I o a (u) du . Now, T (4.5) "2 I~ I o u a n (u) du T "2 + 4 [ I a (u) du o n T "2 2 2 u a (u) dU]2 T 4 I o a (s) ds T 2 I o a (u) du + 4Jo (an(s) - a (s» 2 ds Consider the second term on the RHS of (4.5), 2 ITo u ;2(u) du IT0 u a (u) du ]2 n [ I To a"2n (u) du - ---=T::---:::2--I 0 a (u) du "2 T = [I u a (u) du on IT0 a 2 (u) du - IT0 u 2 a (u) du "2 ITon a (u) du ] 2 0 #1 (1) o by Theorem 3.2 and C2). T "'2 = [I u (a (u) - a 2 (u» o n du I T a 2 (u) du ]2 0 o ~n (1) o T 2 + [I a (u) - a"2 (u) du I T u a 2 (u) du ]2 0 (1) o n 0 ~n o = I T (a"2 (u) - a 2 (u» 2 du 0 o n ~n (1) (by the Cauchy-Schwarz o inequali ty) 126 Theorem 3.2 implies that 10A2n (s) sup s€[O,T] - 2 0 (s)1 = 0 ~n (1). Therefore by o C2, (4.5), and the above, IIill- 2 IT [wR(S) - WR(S)]2 ds = IIill-2 IT (;2(s) - 02(s»2 ds o n o n 0 ~n (1). o Now I~[;~(s) ~ - 02(S)]2 ds 2 I T _K r- Ij(s)[(i j n) o j=l + 4 + 4 (4.6) = 2 I T a2 An ---- ~ (~ ) - (i. n) J a ~2 n _K r- Ij(s)[(i. n) o j=l -1 J T _K r- Ij(s)[i.-1 o.2 o j=l J J A [T Ij(u) V _K rI j=l n 0 a2 --:2 a P~j a2 --:2 ~ (~ ) aP n 0 ~ (~)] n 2 ds 0 -1 2 2 + i j oJ.] ds j 0 + 4 -1 j [T Ij(u)[Vn (pn,u) - vn (~ -K I r- j=l I -1 (~ 0 0 2 (s)] 2 ,u)] dN (e) ]2 U - 0 ds ,u) dN (e) - I U TIj(u) 2(u) du ]2 0 0 Imitating the proof of lemma 3.1, one gets that the second term on the RHS of (4.6) is 0 ~n (n- 1 ). Also by the Lipshitz continuity of 02(u) , the o last term on the RHS of (4.6) is 0 ~n 4 (IIill ). o Using a Taylor series for each j Vn (~,u) = Vn (~0 ,u) + J A a aI-'j R So by A3 and the fact that * Vn (~.,U)(R.-~) J I-'J 0 An max I~-~ l~j~K j 0 1= where 0 ~n o * IRj_R I I-' 1-'0 I ~ I~_R P j 1-'0 A (1) (from lemma 3.3). 127 I~-~ I J 0 0 ~n (1). Using the o above, the first term on the RHS of (4.6) is t< j=l [ST L{u) J 0 = dN (e )]2 U I~j - ~ 12 0 0 ~n (1) o t< . 1 J= [p~ J _~ ]2 max e l~j~K 0 + [[ST I.{u) dM J 0 (e)]2 U [STOI.{u)So{~ ,u)X (u) J no 0 du]2] 0 ~n (1) o = O~n[{lItIl4n)-1) [O~n{~) o + 0 O~n{lItIl2») (by lemmas 2.2 and 0 3.3). Therefore by (4.6) and the above, By Theorems 4.1 and 4.2 and the fact that T R STo w (s) ds = 0, n ,. ~ So wn (s) ~(s) ds and by Theorem 4.3 and Le Cam's third lemma (see Hajek and Sidak. 1967, pg. 208) ~ STo wR{s) pn{s) ds n ~(~~) fJ 128 2 T 2 [ JT u a (u) dUJ J o a (s) s - ~ 2 ~(s) ds J a (u) du N T [ J { o s - J~ 0 u a (u) du J2 2 }~ . a (s) ds T 2 J o a (u) du 2 1 o Notes to Corollary 4.1 1) The result of the above corollory implies that the test T R '" ~ J w (s) ~(s) ds ~ statistic given by T oR n 2 "'_2 {J w (s) a (s) ds} o n n [~ contiguous nondecreasing function for which > ~(s') > ~(O)]. ~(T) is consistent against a is nondecreasing and 3 s' € [O.T] This means that under the contiguous alternative. the asymptotic mean is positive. JT u a2 (u) du denote ~ 2 J o a (u) du by s. Note that s€ To prove consistency (O.T). Then and T2 J - a (u)[u-s]~(u) du s ~ T2 -J - a (u)[u-s]~(s) duo s One of the above inequalities is strict depending on whether s' s' ~ s or > s. Therefore This means that under n ~P' {V (wR)}~ n converges to a normal random n variable with a positive mean. 2) In addition to the assumptions of corollary 4.1. assume d) and 129 e) of Theorem 4.1, then using the results of Theorems 4.1 and 4.2, one R R also gets that LSn (wn )/{Vn (w)} n ~ is consistent against the following class of fixed alternatives: A= {~ o satisfies D2 and is nondecreasing and 3 s' € [O,T] for which > > ~ (T) ~ (s') ~ (O)}. 0 0 0 To see this consider, Recall that - by note 1 above. ~o' JTo a2 (u) du ] ~ (s) ds >0 0 All this implies that LS (wR)/{V (wR)}~ n n n n 3) JTo u a 2 (u) du ~> ~0 belongs to A. if 00 If interest lies solely in detecting monotonic alternatives to ~o a constant function, and not particularly in the estimation of a function, then a simpler test could be formulated as H : o unknown versus HI : ~(s) =a likelihood ratio criterion. + ~s, ~ > 0, ~(s) =a, as a a unknown, using some sort of Cox (1972) proposed the use of a multivariate regression model; the two covariate processes being X and s sX , s s € [O,T], and the regression coefficient and alternate hypotheses become H : o ~ = 0, ~ = (a,~) and HI : ~ € > o. 2 m. The null In this 130 case, a test statistic would be v£ ~ divided by an estimate of its '" (~ standard deviation is the maximum partial likelihood estimator of ~). R It turns out that the test based on LS (w ) has the same efficacy as n n that based on v£~ (i.e., they have the same asymptotic power functions with respect to contiguous alternatives). and ~o(s) ~o = (l,s) • = ao · To see this, let ~ = (ao'O) Then Theorem 3.2 of Anderson and Gill (1982) can be used to get (under appropriate conditions), '" ~ (~ n = ~-1 ~o) - -~ n U(~o) + 0p(l) where U(~o ) = { ~IT Xs (i) i=l (~ ~nIT seXs (i) - E ,s) dM (i), nos i=l 0 0 (~ - En .s» dM (i)}T os and I~ ~ T 2 I s o (s) o 02(s) ds = [ 1T s 0 2 0 (s) ds Therefore. T 2 2 I s o (s) ds o -ITo s A test statistic for H : o - n ~ ~ i=l IT 0 0 ~ 2 (s) ds >0 s 0 p (1) 0 2 (s) ds 2 IT 0 (s) ds o versus HI : (X (i) - E (P s n 0 + -ITo .s» > O. ~ dM (i) s would be based on. 2 IT uo (u) du} 0 T 131 = n -!.-S - (X s (i) - En (~ 0 .s» fT U0 0 2 (u) du} dM s (i) + 0 p (1). The above converges in distribution under H to N(O. 1~1-1 fT 02(s) ds) o 0 by Theorem 3.2 in Anderson and Gill (1982). Using a very similar proof to the one in Theorem 4.3. results in where 2 o 'Y = -1 T I~ 1 f o ~(s) 2 0L 2 0 f To u 0 2 (u) du ] T 2 (s) [ s - -;;"'T~2=---- f o 0 (u) du ds f o 0 (u) du T 2 2 = f0 ~ (s) S (~ .s) A (s) 0 0 !.-SA The efficacy of ri -1 'Y T = I~ 1 fo T [ 2 2 is then. (0'YL) /0 'Y 0 2 (u) du f~ 2 = T [ s - f T u 0 2 (u) du ~ 2 f o 0 (u) du f T0 u 0 2 (u) du ] 2 2 T fo dU] [T 2 [ u 02(u) f o ~(s)o (s) s - ---::T::----:2=---f o 0 (u) du f o ~(s)o (s) [ s fo ds. 2 0 0 ds ]2 ] ]2 ds (s) ds (u) du But as was seen in corollary 4.1. this is the efficacy of the test based R on LS (w ). n n 132 4.4 A Test for a Change Point Instead of considering a smooth alternative to the null hypothesis that ~ o is constant as was done in the last section. consider the alternative of a change point. constant function versus HI : That is. consider H : o ~o ~ 0 is an unknown is constant up to an unknown point T € (O.T). changes value and then stays constant at the new value. Praagman (1988) gives a nice discussion of the reasoning behind classical change point tests. test. Essentially they are generalizations of the two sample Say the time of change. T. is known: then to detect if a change occurs. a two sample test. TS(T). can be used. To allow for that fact that T is in reality unknown. one considers a weighted sum of the TS(T)'S or a maximum of weighted TS(T)·S. In this section the latter route is chosen to formulate a test for change point. Suppose Xi ~ N(~i' variables. where for i unknown). test. a ~ -2 ) i = 1 •.... K. are independent random i T. ~i = ~ + 6 and for i > T. ~i = ~-6 (~ It is easy to derive the uniformly most powerful similiar This test rejects 6 = 0 in favor of 6 ~ TS(T) = i=T+1 a 2 i ~ a~ i=1 T 2 'J. a i Xi i=1 'J.T a 2 i=l i ~ a~ > O. ~ i=T+1 if. 2 a i Xi is large. i=1 Recall that one can intuitively think of n ~""'n ~i ~ .~ AiI p (s) ds o ~. • -2 i = 1•...• K. as independent N(n AiI ~ (s) ds. AiI a (s) ds) random 000 variables. = n Therefore. it apPears that the above two sample test 2 · -2 statistic with Xi = n~ AiIo• p~ (s) ds and a i = AiIo a (s) ds would be appropriate for testing whether there is a change in ~ o at time T. The J 133 test statistic would then be, (4.7) This is precisely the statistic to which lemma 4.1 leads. contiguous alternative of interest is ~o + ~(s)n -1h where That is, the ~(s) = orcs ~ O of lemma 4.1 is, therefore, W T) - 0 r{s > T): I: a {u) 2 2 o w (s) = a (s) [ r{s ~ T) - T Io du ] a (u) du 2 and nih IT wOes) ~(s) ds is given by (4.7) above. o gives the distribution for this two sample test. Corollary 4.2 below Recall that Assume a) ~ b) lim nil I! 11 o is a constant function, 4 = co, lim II I! 11 n c) 2 = 0, and n A, B, C, 04, lim n E n c;{1) < co For each t € [O,T] define, TSn (t) = nih IT0 [ r{s ~ t) A2 An an (s)~ (s) ds T A2 on {Io a n (u) du = 0 elsewhere, and > O} 134 I~ ~~(u) du ]2 A2 """":T=-""=A-:::'2--a n ( s) ds I o a n (u) du r A2 on {I o =0 a n (u) du > O} elsewhere. Then ~(~n) 1) TS TS (in the supremum norm topology on C[O.T]). where 0) n TS t = G - g(t)Gr' G is a continuous Gaussian martingale with variance t It a2 (s) ds process <G>t 2 a (s) ds and get) = I 0t 0 = """":r:---:::'2--- I o a (s) ds • t € [O.T]. and 2) Ivn (t) sup t€[O.T] - V( t) I ~n _...;;0;.....-) Vet) = O. where ITo a2 (s) ds (g(t) - g2(t». and if assumption B in c) is replaced by ~n B' : max Hi~n ~ sup n~IXs (i)1 Ys(i) _o ) 0 as n --+ s€[O. T] co and. is bounded on [O.T] then. 3) TSn ~(~r;) > TS + ~ (in the supremum norm topology on C[O.T]) where TS is as above. ~(t) = I~ [ 1(s ~ t) - get) ~n 4) sup t€[O.T] Ivn (t) - V(t)1 ---p~> O. PROOF: Because ~ o ]a2(s)~(s) ds. is constant. t € [O.T] and. 135 18 (t) n = ~ IT0 = ~ 1ot [ I{s ["n a"2 (s) ~ n ~ "2 t 1 a (u) du ] "2 ~ ,,~ 1o a n (u) du t) - A2 t JO 1T aA2n (s) [A~(s) - 1 a (s) ds (s) - ~ ] ds - ~ A~ o 1 a (s) ds o " an{u){~{s) - ~o) ds ~ 0 n t E ] ds. 0 [O.T]. The proof of part 1 follows these two steps. =0 sup tE[O.T] 1a) ~ 1b) 1o• ~n (1) o [An a"2 (s) ~ (s) - ~0 n ] ds converges weakly in C[O.T] to G where G is a continuous Gaussian martingale with <G>t = 1~ a {s) ds. 2 If 1a and 1b are true. then since f : C[O.T] ~C[O.T] defined by f{x)t = x t - g{t)Xr' t E [O.T] is a continuous functional. f [JO 1~ ~~(s) [pn{S) - ~o] dS] converges weakly to f{G). This will then finish the proof of part 1. To prove 1a). recall that by Theorem 3.2. o ~n (1). sup sE[O.T] laA2 (s) 2 - a (s)1 n This is sufficient to prove 1a). since IT a 2 {s) ds > o o. o To prove 1b). note that ~ 1t a"'2 (s) ['" ~(s) - ~ ] ds o n 0 " = ~ 10t [A2 an{s)-a2] (s) ['~(s) - ~o ] ds (4.8) + ~ 10t a 2 (s) [A~(s) - ~o ] ds Consider the supremum of the first term on the RHS of (4.8). sup I~ 1t tE[O.T] 0 ["2an (s) - a2] (s) [A~(s) - ~ ] dsl 0 = 136 (by lenuna 3.3). =0 (from the proof of corollary 4.1. eq. 4.6) (1) !f1n o Following the proof in Theorem 3.1. results in sup t€[O. T] l,;n It 02(s) [~(S) - .-K I t Ij(s)oi -2 - n ~ ~~ ri=1 j=1 0 = 0 If1n o ds (j ] 0 0 0 2 (s) I T Ij(u) [X (i) - E ({j .u) ) dN (i) ds I uno 0 u (1) But n~"J? ~ It I j (S)0-j2 02(s)IT Ij(u)[X (i) - E ({j i=l j=l 0 0 uno = n~"J? .~ i=1 j=l (4.9) n~ - + It Ij(u)[X (i) - E ({j 0 uno It a i (t)-l 0 2 "J? i=l dN (i) U -2 ~ ~~ I T I.(t)(u) [ ] (s) ds 0i(t)n X. (1) - E ({j .u) dN (i) i=l 0 1 uno U :Ii n~ dN (i) ds u "J? It [X (i) - E ({j .u)] dN (i) i=l a i (t)-l uno U where a i = j=l 2 j If .U)] .U») and a i (t)-l I- [X (i) - E ({j <t uno 0 ~ ai(t)· .U») dN (i) converges in distribution U to a random function. G. concentrated on C[O.T]. then by the representation theorem (Pollard. 1984. pg. 71). there exists sup t€[O.T] to n~h I~ "J? i=l - Gt ' ~0 a.s. I- [X (i) - E ({j 0 uno as n ~m. and .U)] xn is equal dN (i) and likewise U xn. Gwhere in distribution G is equal in 137 distribution to G. Therefore the last two terms on the RHS of (4.9) are equal in distribution to - rxnai(t) - xna l i (t)-l ]. n rxn l t X a i (t)-l Ghas Since ] + st 02(s) ds a i (t)-l 0~2(t) 1 continuous paths and la.( )-t l goes to 1 t zero. sup t€[O.T] I-xnt + xna rxnai(t) - xna 2 02(s) ds 0a i (t)-l i(t) + st i (t)-l --+ 0 a. s. as n l -+ 00 i (t)-l ]I • Hence the supremum of the absolute values of last two terms on the RHS of (4.9) converge in ~n_probability to zero as n n-~ ~ S· i=l 0 o [xu (i) - E (~ .U)] no dN (i) converges in u random function concentrated on C[O.T]. n~ . ~I O S· U [x (i) - E n 1= (~0 .U)] -+ 00. if ~n-distribution to a 0 The proof that dN (i) converges to a continuous Gaussian u martingale. G. with variance process <G>t = S~ 02(s) ds. is similar to the proof of normality in Theorem 3.1 and is omitted. Therefore by (4.8) and (4.9). 1b) is proved. Define ~(t) st ;2(u) du T ....2 - ....,o:=---"..;,;n~__ on {S 0 (u) du > O} - ST ;2(u) du 0 n o n = 0 Then by Theorem 3.2 and C. {STo 0"'2n (u) du > O}. Vn can elsewhere. t € [O.T]. sup Ig (t) - g(t)1 = 0 (1). t€[O.T] n ~n o On be written by Therefore sup IVn(t) t€[O. T] S~ 02(u) du [get) - g2(t)]1 = o~n(l) as maintained. "J'0 138 Part 2 is complete. The proof of part 3 will be broken into the following two steps. 3a) The finite dimensional distributions of TS converge under ~ n the finite dimensional distributions of TS + 3b) to ~. {TSn}n~1 is tight with respect to ~~. These two steps then imply part 3. mand Fix a , ... ,a € 1 m t , ... ,t 1 € [O,T] (m does not increase with n), and consider, I t i '"'2 a (u) du ] '"'2 - ~ '"'~ I o a n (u) du '"' an{s){~{s) - ~o) ds Theorem 4.3 can be used to derive the joint asymptotic distribution of [~;;] fD a iTS (t.) and Ln under n 1 ...J,an .-1 UJ"o 1- T ~n. Then Le Cam' s thi rd I ellllla (Rajek 0 and SidAk, 1967, p. 208) will give the asymptotic distribution of i:: aiTSn{t i ) under ~~. To use Theorem 4.3, set I t i '"'2 a (u) du ] '"'2 and [ I{s ~ til - I ~ a'"'~ (u) du an{s) o n 2 w{s) = fDa [ I{s ~ til - g{t i ) ] a {s). i w (s) = fDa n i i=l i=l By repeatedly employing the triangle inequality one gets that ITo {w (s) - w{s»2 ds n ~ '"'2 2 2 IT ds O{l) o (an (s) - a (s» m 139 tv A2 m2 ~ a + v=l v I T [Io 0 T A2 2 2 = I (a (s) - a (s» ds 0 o an(u) du a2 (u) du A ITo '!In n 2 - g(t ) v ] ds 0(1) n (1) o = 2 (llill- ) 0 '!In o by the argument used in the proof of corollary 4.1 (eq 4.6). By Theorem 4.3, fD [ i=1 ~('!Jn) a.TS (t ), 1 n i Ln[;i] ] __0_> o T N[(O, ~ a~), n-lOO and hence by Le Cam's third lemma, Using a little bit of algebraic manipulation, it is easy to see that the above implies that the finite dimensional '!J~-distributions of TS n converge to the finite dimensional distributions of ~(t) = I~ [ l(s ~ t) - get) = It o ~ + TS. where ]a2(s)~(s) ds a2(s)~(s) ds - get) IT0 a2(s)~(s) ds. To show that TSn converges in '!J~-distribution to ~ + TS. Theorem 3 in Pollard (1984. p. 92) can be used. This theorem gives necessary and sufficient conditions for the weak convergence of X € D[O.T] to X where n P[X € C[O.T]] = 1. The first condition is that the finite dimensional distributions of X converge weakly to the finite dimensional n 140 distributions of X. The second condition is that for corresponds a grid 0 = to < t 1 < ... < t m = 1 p{ lim ~ > 0, 0 >0 there such that max l~i~m n So all that is needed is to show that the second condition, above, holds for 18n under ~. Let ~ > 0, 0 >0 p] d1n [d1 and denote by L n By lenuna o T ~(~n) 4.2, Ln 2 2 > L where log L ~ N(-.5 aL' aL). Since EL 0 = 1, Y >0 n-P) can be chosen so that 1 - E I{L ~(~~) that 18 n ~' > 0, ~' <~ > ~ TS y as -1 n ~ > y}L < ~~. 00; therefore, by theorem 3 in Pollard, for there corresponds a grid 0 such that lim n ~n{ l~i~m max 0 Then - n{f3 Hi~m lim ~ max n - + lim f I{Ln ~ y} d1f3n n + 1 - lim fY x d1n L- 1 (x) - n In part 1 it was proved 0 -0 n = to < t 1 < ... < t m = 1 141 ~ y ~. + 1 - E I{L Therefore. TSn ~(~~) < y}L > ~ ~ ~. Also since ~~ is contiguous to ~~. both + TS. sup IVn(t) tE[O.T] I~ ~n 02(s) ds g(t)(I-g(t»1 ~> 0 ~n and sup Ign(t) - get) tE[O.T] I ~> 0 as n -+ 00. o The test statistic TS (T) can be used as a basis for a change point n if the change point is allowed only to occur at time T. To construct a change point test which is valid for any change point in (O.T). TS . n taken as a function of t can be used. In particular Roy's union-intersection principle is useful (Roy. 1953). rejected if any of the TSn (t) are large. That is. H o is This test is quantified below. See the notes following the corollary for a discussion of consistency. In the corollary below. get) 2 [get) - g (t)] for t E [O.T]. ceo'WLLClItff 4.3 Define g (t) n = ITo =0 g-I(V} n on ;2(s) ds {ITo "'2 (s) ds n 0 > O} n elsewhere. t E [O.T]. = inf{u E [O.T] g (u) n = v}. v E [0.1]. and 142 STSn = sup tE[h,l-h] where ~ is defined to be 0, hE (0,.5). Under the assumptions a), b) and c) of corollary 4.2, 1) STS n which is equal in distribution to where W is a standard Wiener process. Furthermore, under B' of corollary 4.2, TS(g-l(t» 2) sup tE[h,l-h] [ STS n V(g-l(t)~ as n --+ co. PROOF: emma 4.3 can be used to prove this corollary as follows. 4.2, TS By corollary converges weakly in the supremum topology on D[O,T] to TS where n TS t = G - g(t)G.r. and G is a continuous Gaussian martingale with t variance process <G>t = I~ 02(s) ds. verified for X = TS and X = TS. n n g(t)1 = 0 ~n Assumption d) of lemma 4.3 is then By Theorem 3.2. sup I~ (t) tE[O. T] --n (1) where g is strictly increasing. g(O) = O. Note that 2 '11 o is a.s. nonnegative and nondecreasing. and that V is a.s. nonnegative. n Define Vet) = ITo 02(s) by corollary 4.2. ds g(t)(l-g(t»; then V is positive on (O.T) and sup Ivn(t) - V(t)1 = 0 (1). tE[O.T] ~n o This verifies 143 assumptions a). b). and c) of lemma 4.3. o 0 g -1 Gr. )t - t converges in n (TS 0 g-l)t sup t€[h.l-h] (V 0 g-l)t~ ~n-distribution to (G Therefore STS Note that (TS g 0 -1 )t = This is a continuous Gaussian process t € [0.1]. with E(TS 0 1 2 g- )t = <TS g 0 -1 ~-l(t) 2 = J~ a (s) >t ~-1(t) 2 and with E(TS g-l)t(TS 0 0 g-l) a (s) ds = I J~ - 2t o = (s-st) IT0 s 2 ds + t T 0 a 2 (s) ds T 2 I o a (s) ds 2 a (s) ds (s < t). This covariance structure characterizes the Brownian Bridge -1 TS o g B= T 2 ~ {I o a (s) ds} t(l-t) Hence STS n sup sup t€[h.l-h] Jt(l-t) sup I~(t) - g(t)1 ~> 0 t€[O.T] Note that as n ~ ~ by Theorem 3.2. lemma 4.2 and Le Cam's third lemma (Hajek and Sidak, 1967, p 208). Conditions a), c). and d) of lemma 4.3 are satisfied by corollary 4.2 with Xn TS +~. is equal in where W is a standard Wiener process. Lemma 4.3 can also be used to derive part 2. ~n Also converges in distribution to But as is well known. distribution to 0 g-l)t = T 2 I o a (s) ds (V is a standard Brownian Bridge. for t € [O.T]. sup t€[h.l-h] Jt(l-t) Therefore = TSn and X = Therefore, STS n «TS+~)o sup t€[h,1-h] (V 0 -1 g 1 )t ~ g- )t o 144 Notes to Corollary 4.3 1) DeLong (1981) has given asymptotic critical values for the test based on STS . n Part 2 of corollary 4.3 implies that, STS 2) n against the contiguous change point alternatives. distribution of srs under n distribution of STS n sup t€[h,l-h] (TS 0 g-l)t (V 0 g-l)t n ~fj under ~ 0 is consistent That is, the limiting is stochastically larger than the limiting To see this note that, is equal in distribution to sup t€ was seen in the proof of corollary 4.3. Therefore if as [1, e~h]2] ~ 0 g -1 is a nonnegative function with a positive value at, at least one time point, then (~ 0 g -1 )t sup t€[h,l-h] (V will be stochastically larger than I ot 0 2 But -1 ~(t) = =a + ~ l(s ~ T) where ~ >0 and T € Using a little bit of algebra results in ~(t) = ~(IT 02(s) ds)-l o I~Vt 02(s) ds g sup and since change point alternatives are of interest, fj(s) ~ 0 g-l)~ 0 (s)fj(s) ds (O,T). 1 I:At 02(s) ds >0 for t € (O,T). is a positive function on (0,1). This implies that 145 4.5 The Independent and Identically Distributed Case In the following. the notation of corollaries 4.1. 4.2 and 4.3 is used without comment. 4.4 '€O'l,oUO'tg- Consider n i.i.d. observations of (N.X.Y) where both X and Yare a.s. left continuous with right hand limits. is a constant function. a) ~ b) lim nllell o 4 = 00. lim lIell n c) If 2 = O. n -:00. e < D4. lim 00. and (1) n d) the conditions of corollary 2.1 are satisfied. then. !l1)(~n) __o~> N(O.l) and. STS n which is equal in distribution to standard Wiener process. If in addi tion. e) E sup s€[O.T] ~ x2y < 00 and s s is bounded on [O.T] where W is a 146 then. T 2 R is (w ) n n = {V (wR)}~ n n ~(~~) I T u a 2 (u) du ] ~ 2 ~(s) ds I 0 a (u) du [ I o a (s) s - >N f {f~ [- - T 2 ~ . } o U a (u) dU]2 2 a (s) ds T 2 I a (u) du 1 0 and STSn ~(~j3» [ sup t€[h.1-h] TS(g-l(t» J.L(g + V(g-l(t)~ -1 (t» V(g-l(t»~ ] as n --+ co. PROOF: Note that the conditions for corollaries 4.1. 4.2 and 4.3 are the same; that is A. B. C. and conditions a). b). and c) of this corollary. In corollary 2.1. it was shown that A and C are consequences of the assumptions given in corollary 2.1. Also as before a slightly weaker version of B. B'. is sufficient for the asymptotic distributional results under ~n. o ~ (s)X (i) o s Recall that B': > -olxs (i)l} ds =0 ~n max l~i~n IT I{s: Ix 0 (1) for some 0 >0 s (i) IY (i) s and each E. > E.n~ > O. and o that the conditions of corollary 2.1 imply B'. To derive the asymptotic distributional results under ~~ of corollaries 4.1. 4.2. and 4.3. a stronger condition that B' is required; i.e. it suffices to assume max sup l~i~n s€[O.T] n ~Ix (i) s ~n I Ys(i) ~> 0 as n --+ co. Assumption e) of this corollary implies this condition. as can be seen 147 Let Z(i) = s€[~~T] IXs(i)1 Ys(i). i=I ..... n; by the following argument. > O. then the Z(i)'s are i.i.d. and for c P( max n-~(i) > c ~ nP(n-~ > c) ) l~i~n I ~ 2 Z dP .Jh_ -> 0 as n -+ (1) (n --Z>c) 2 (since EZ = E sup s€[O.T] x2y < s s (1). o 4.6. Lenunas 4.2 and 4.3 ~el4l1&a 4.2 Assume a) P is bounded on [O.T]. ~n b) c) max sup l~i~n s€[O.T] A n.J /X (i)IY (i) _0_> O. as n s -+ and (1) s AI. A3. CI, then. sup t€[O.T] - n~ i=1 ~ It0 1 + -2 P(s)X (i) dM (i) s s It ~(s)S2(p .s)X (s) dsl 0 0 0 ~n _0_> 0 and for Z. defined by Zt = n~ as n -+ (1) ~ It P(s)X (i)dM (i). t € [O.T]. i=1 0 s s Z converges weakly under ~n in the supremum topology on D[O.T] to a o continuous Gaussian martingale with variance process. 148 Remark: Assumption b. above. is a slight strengthening of the Lindeberg condi tion B. PROOF: ~n Ln( ~ )t o = n~ xn i=1 It ~(s)X s 0 xn I . 1 + 1= = n~ xn i=1 + = n ~ xn I i=1 ~ ~ . 1 1= It ~(s) (.J n h 0 t I0 s t [ n 1-e -~ ~(s)X (i») ~ (s)X (i) s e s 0 Y (i)A (s) ds s 0 0 X (i) dM (i) s 0 t (i) dN (i) s n ~(s)X (i)+1-e ~ ~(s)X (i)J ~ (s)X (i) s s 1 ~(s)X (i) dM (i) - 2 e 0 s Y (i)A (s) ds s 0 2 t...2 I P (s)8 (~ .s)A0 (s) ds son 0 s ( 4.10) ~ e Choose ~ > O. so that sup 1~(s)1 s€[O.T] lex Therefore on { max sup 1~i~n s€[O.T] ~x2 < ~-1. - x - 11 ~ o (s)Xs (i) Ys (i)A0 (s) ds and note that for Ixl < 1 elx3 1 . n~IX (i)Y (i)1 < ~}. the absolute value of s s the third term on the RHS of (4.10) is bounded above by 149 P (s)X (i) P3 (s)1x3(i)Ie 0 s Y (i)A (s) ds 00 s so ~ n-3/ 2 IT '1 1= = 0 ~n (n -~ by A3, Cl. ) o Therefore by assumptions b and AI, ~n sup t€[O,T] Ln( ~] ~n - n -~ t o J!1 i=l JT P{s)X (i) dM (i) 0 s s - ~T ~(s)S2(p ,s)A (s) ds 000 =0 ~n (1). o To prove that Z converges weakly to the appropriate continuous Gaussian martingale, Rebolledo's theorem as given in section 1.2 is used. Note that <Z)t = It ~(s)S2(p ,s)A (s) ds which by Al and CI converges in ~n o n 0 0 0 probability to st ~(s)S2(p ,s)A (s) ds for each t € [O,T]. 000 arbitrary n- 1 ~ I i=1 = ~ ) 0 consider, T ~ 2 s p (s)X (i) e 0 Po(s)Xs(i) Y (i)A (s) I{n s -1 ~ 2 p (s)X (1) s 0 JT I{n- l ~(s}x2{i) Y (i) ) ~} ds 0 (l) l/l<n 0 s s ~n max ~ Next for - Y (i»~} ds s (by a, AI, CI) 0 = 0 ~n (l) {bya,b}. o o ~emma. 4.3 150 n n Consider X , V random elements in D[O,T], n ~ 1, X, ~, n ~ 1 random elements in C[O,T], and V,g deterministic functions in C[O,T], for which the following holds, a} vn is a.s. nonnegative, V is positive on (O,T), g n : [O,T] ~ [0,1] is a.s. nonnegative and nondecreasing on [O,T], g : [O,T] ~ [0,1] is strictly increasing, g(O} = 0, b} sup Ign (t) - g(t}1 = t€[O,T] c} sup Ivn(t} - V(t}1 = 0 p (I) t€[O,T] n X converges weakly in the supremum norm topology on D[O,T] to d} 0 p (I), X. then for any h € (0,.5), sup t€[h,l-h] zn converges in distribution to (X sup t€[h,l-h] (V 0 g-l}t 0 g-l)~ PROOF: he following steps will be proved. -1 I} For ~ (v) = inf{u sup v€[O,l] 2} n X X 3} 4} Ign-1 (v) = v}, - g -1 (v) I = v € [0,1], (I). 0 p ~l converges weakly in the supremum topology on D[O,l] to 0 0 € [O,T] : gn(u} g -1 sup I(vn t€[O,l] 0 ~1} - (V 0 g-l) t Define f : D[O,l] x D[O,l] t ~ ~ by I = (l) 0 P 151 f(x.y) where = X t sup ~ tE[h.1-h] Y t gis taken to be zero; I{y t > o} then f is continuous with respect to the supremum norm topology on C[O.l] x {z E C[O.l] : inf z(t) tE[h.1-h] > O}. If 1) through 4) hold. then by the continuous mapping theorem (Pollard. n 1984. p. 68). Z converges in distribution to (X sup tE[h.1-h] (V 0 g-1)t 0 g-1)~ and the proof is finished. Consider sup vE[O.1] (4.11) (since g is 1:1). Define fn(u) = g~l(g(u» sup Ig(fn(u» uE[O.T] (4.12) u E [O.T]. then gn(fn(u» - g(u) I = sup Ig(fn(u» uE[O.T] < = g(u). and - gn(fn(u»I sup Ig(u) - gn (u)1 = uE[O.1] 0 p (1) Note that since g : [O.T] ~ [0.1] is 1:1 and continuous. g-l : [0.1] ~ [O.T] is also continuous and indeed uniformly continuous. then 3 {, > O. SUp { uE[O.T] < (, so that if Ix-yl If (u)-ul n <~ } ~{ and (4.11) and (4.12) imply that Choose then Ig- 1 (x) - g-l(y)I <~. sup Ig(f (u» sE[O.T] n sup vE[O.1] Ig n-1 (v) - g(u)1 - g -1 (v) ~ > O. Then < {, }. I = 0 (1). p Step 1 is proved. n -1 ) converges weakly in supremum norm n It is easy to see that (X.g 1 topology on D[O.T] x Do to (X. g-l) where P«X.g- ) E C[O.T] x Co) = 1 152 where D consists of the elements o satisfy 0 X 0 g -1 g 0 of D[O.l] that are nondecreasing and T for all t. and Co = Do n C[O.T]. ~ ~(t) ~ n implies that X ~ -1 n This in turn converges weakly in the supremum topology to (see Billingley. 1968 pg. 144). The proof for step 3 is identical to the above and is omitted. To show that f as defined in step 4 is continuous. let (xn.yn). (x,y) € C[O.I] x {Z € C[O,I] inf z(t) t€[h.1-h] > O} and suppose that n sup sup Ix t - x t I + t€[O.l] t€[O.l] Since inf y t€[h.1-h] t > O. = ~ for n large I as inf Ytn t€[h.1-h] n Xt sup t€[h.1-h] (Y~)~ > 0 and x t sup ~ t€[h.1-h] Yt sup t€[h.1-h] (Y~)~ xn t n -+ co. x t -~ Yt ~ Ixt(Yt) n ~ - xt(Yn ) ~I 0(1) sup t t€[h.1-h] ~ I (Y ) ~-(Yn ) ~I sup Ix~-xtl 0(1) + sup t t t€[h.1-h] t€[h.1-h] 0(1) The square root function is uniformly continuous on [ 2 sup Y]; t€[h.1-h] t n hence. for large n If(xn.y ) - f(x.Y)1 is small if n sup Ix t - x t I + t€[O.l] continuous on ~ inf Yt' t€[h.1-h] is small; that is f is 154 APPENDIX: ADDITIONAL NOTATION N(e) = ffl N(i) i=1 - N(e) = n -1 ~ ~ • N(i) i=1 f3 (u)X (i) M (i) = N (i) - IS e 0 u Y (i)A (u) du. s s s o u 0 ~ = o{ Ns(i). s~t. i=I •...• n}. t Ln x: ~: € [O.T]. i=I •.... n [O.T] € natural logarithm of x the closure of the open set ~ C m i a i = ! lj' a = O. i=I ..... K (the dependence of li on n is j=1 0 n suppressed) i(t) = j if t lim: n lim inf. € (a. l.a.] J- lim: n n~CO J lim sup and lim: n n~co lim n~ I j is the indicator variable for the interval (a _ 1 .a j ] j ~ (f3) n ....2 o = i=1 ffl IT 0 -K X (i) - s Ln E (f3.s) dN (i) (t) = - r- I.(t) (ljn) n j=1 J n -1 s cJ2 .. ~ ~ (~). t ap n € [O.T] j o2 (t) = V(f3o .t)Sa(f3 0 .t)A0 (t). t [O.T] € o~J = IT Ij(s) 02(s) ds. j=1 ..... K o n a on2 (t) = vn (f3 o .t)S n (f3 0 .t)A0 (t). t € ., [O.T] 155 ,.. t ~(H o ) = arg{ max '£ (b)t } where bEm n (0 OL n (b) _- ~n It X (10) .L. t i=l 0 S - Ln En (b • S ) dN S (0) 1 • t E [a l . 0 0 0 • ~] 156 REFERENCES 0.0. Aalen. A Model for Nonparametric Regression Analysis of Counting Processes. Springer Lect. Notes in Statist. 2 (1980) 1-25. 0.0. Aalen. Nonparametric Inference for a Family of Counting Processes. Ann. Statist. 6 (1978). 701-726. J. Aitchison and S.D. Silvey. Maximum-likelihood Estimation of Parameters Subject to Restraints. Ann. Math. Statist. 29 (1958) 813-828. J.A. Anderson and A. Senthilselvan. A Two-step Regression Model for Hazard Functions. Appl. Statist. 31 (1982) 44-51. P.K. Anderson and o. Borgan. Counting Process Models for Life History Data: A Review. Scand. J. Statist. 12 (1985) 97-158. P.K. Anderson and R.D. Gill. Cox's Regression Model for Counting Processes: A Large Sample Study. Ann. Statist. 10 (1982) 1100-1120. A.R. Barron and T.M. Cover. Minimum Complexity Density Estimation. unpublished manuscript. Dept. of Statistics. Univ. of Illinois. Urbana. IL. 1989. I.V. Basawa.and B.L.S. Prakasa Rao. Statistical Inference for Stochastic Processes (Academic Press. London. 1980). J.M. Begun. W.J. Hall. W. Huang. and J.A. Wellner. Information and Asymptotic Efficiency in Parametric - Nonparametric Models. Ann. Statist. 11 (1983) 432-452 P. Billingsley. Statistical Inference for Markov Processes (The University of Chicago Press. Chicago. 1968a). P. Billingsley. Convergence of Probability Measures (John Wiley & Sons. New York. 1968b). Z.W. Birnbaum and A.W. Marshall. Some Multivariate Chebyshev Inequalities with Extensions to Continuous Parameter Processes. AMS 32 (1%1) 687-703 O. Borgan. Maximum Likelihood Estimation in Parametric Counting Process Models. with Applications to Censored Failure Time Data. Scand. J. Statist. 11 (1984) 1-16. P. Bremaud. Point Processes and Queues. Martingale Dynamics (Springer-Verlag. New York. 1981). • 157 C.C. Brown, On the Use of Indicator Variables for Studying the Time-Dependence of Parameters in a Response-Time Model, Biometrics 31 (1975) 863-872. D.R. Cox, Regression Models and Life Tables (with discussion), ]. Roy Statist. Soc. B 34 (1972) 187-220. E. Csaki, Some Notes on the Law of the Iterated Logarithm for Empirical Distribution Function, in: P. Revesz, ed., Colloquia Mathematica Societatis Janos Bolyai. II., Limit Theorems of Probability Theory (North Holland, Amsterdam-London, 1975). K.O. Dzhaparidze, On Asymptotic Inference about Intensity Parameters of a Counting Process, Report, Centre for Mathematics and Computer Science, Amsterdam, 1986. 2 R.A. Fisher, On the Interpretation of X from Contigency Tables and Calculation of P, ]. Roy. Statist. Soc. A 85 (1922) 87-94. M. Friedman, Piecewise Exponential Models for Survival Data with Covariates, Ann. Statist. 10 (1982) 101-113. S. Geman and C. Hwang, Nonparametric Maximum Likelihood by the Method of Sieves, Ann. Statist. 10 (1982) 401-414. R.D. Gill, Censoring and Stochastic Integrals (Mathematisch Centrum, Amsterdam 1980). U. Grenander, Abstract Inference (John Wiley & Sons, New York, 1981). J. Hajek and Z. Sidak, Theory of Rank Tests (Academic Press, New York, 1967) . R.T. Holden, Failure Time Models for Thinned Crime Commission Data, Soc, Methods 8 Research, 14 (1985) 3-30. S. Johansen, An Extension of Cox's Regression Model, Int. Statist. Reu. 51 (1983) 258-262. N.L. Johnson and S. Kotz, Continuous Univariate Distributions - I, Distributions in Statistics (John Wiley & Sons, New York, 1970). Ju.M. Kabanov, R. S. Lipcer and A. N. Shiryayev, Absolute Continuity and Singularity of Locally Absolute Continuous Probability Distributions II, Math. USSR Sbornik 36 (1980) 31-58. A.F. Karr, Inference for Thinned Point Processes, with Application to Cox Processes, ]. MuLtivariate AnaL. 16 (1985) 368-392. A.F. Karr, Maximum Likelihood Estimation in the Multiplicative Model via Sieves, Ann. Statist. 15 (1987) 473-490. A.F. Karr, Point Processes and their Statistical Inference (Marcel Dekker, Inc., New York, 1986). 158 P.E. Kopp, Martingales and Stochastic Integrals (Cambridge Univ. Press, Cambridge, 1984). E. Lenglart, Relation de Domination entre deux Processus, Ann. Inst. H. Poincare 13 (1977) 171-179. • J. Leskow, Histogram Maximum Likelihood Estimator of a Periodic Function in the Multiplicative Intensity Model, Statistics & Decisions 6 (1988) 79-88. I.W. McKeague, A Counting Process Approach to the Regression Analysis of Grouped Survival Data, Stochastic Processes. AppL. 28 (1988) 221-239. T. Moreau, J. O'Quigley and M. Mesbah, A Global Goodness-of-fit Statistic for the Proportional Hazards Model, AppL. Statist. 34 (1985) 212-218. H.T. Nguyen and T.D. Pham, Identification of Nonstationary Diffusion Model by the Method of Sieves, SIAM ]. COntroL and Optimization 20 (1982) 603-611. M.P. O'Sullivan, A New Class of Statistics for the Two-Sample Survival Analysis Problem, thesis, Biomathematics Group, Univ. of Washington, Seattle, Washington, 1986. D. Pollard, Convergence of Stochastic Processes (Springer-Verlag, New York, 1984). S. Portnoy, Asymptotic Behavior of Likelihood Methods for Exponential Families When the Number of Parameters Tends to Infinity, Ann. Statist.16 (1988) 356-366. J. Praagman, Bahadur Efficiency of Rank Tests for the Change-point Problem, Ann. Statist. 16 (1988) 198-217. H. Ramlau-Hansen, Smoothing Counting Process Intensities by Means of Kernal Functions, Ann. Statist. 11 (1983) 453-466. R.R. Rao, The Law of Large Numbers for D[O,I]-Valued Random Variables, Theor. Probab. AppL. 8 (1963) 70-74. R. Rebolledo, Sur les Applications de la Theorie des Martingales a L'etude Statistique d'une Famille de Processus Ponctuels, Springer Lecture Notes in Mathematics 636 (1978) 27-70. S.N. Roy, On a Heuristic Method of Test Construction and Its Use in Multivariate Analysis, AMS 24 (1953) 220-238. D.M. Stablein, W.H. Carter, Jr., and J.W. Novak, Analysis of Survival Data with Nonproportional Hazard Functions, COntroLLed CLinicaL TriaLs 2 (1981) 149-159. e • 159 J.D. Taulbee, A General Model for the Hazard Rate with Covariables, Biometrics 35 (1979) 439-450. W.H. Wong, Theory of Partial Likelihood, Ann. Statist. 14 (1986) 88-123. D.M. Zucker and A.F. Karr, Nonparametric Survival Analysis with Time-Dependent Covariate Effects: A Penalized Partial Likelihood Approach, to appear ill Ann. Statist., 1989.
© Copyright 2025 Paperzz