University of North Carolina Institute of Statistics Mimeo Series No. 2184T

MARGINAL MODELS FOR MULTIVARIATE FAILURE TIME DATA WITH GENERALIZED DEPENDENCE STRUCTURE

by Limin Xu Clegg

Department of Biostatistics, University of North Carolina
May 1997

A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Biostatistics, School of Public Health.

Chapel Hill, 1997

© 1997 Limin Xu Clegg. ALL RIGHTS RESERVED

LIMIN XU CLEGG. Marginal models for multivariate failure time data with generalized dependence structure. (Under the joint direction of Drs. Jianwen Cai and Pranab Kumar Sen.)

ABSTRACT

In epidemiologic studies, there is often more than one outcome measured on the same subject. Multiple failures from the same individual induce multivariate failure time data involving within-subject dependence. Furthermore, participants may not be independent within a cluster (e.g., as with members of the same family). Responses from subjects in the same cluster generate multivariate failure time data with between-subject dependence. In this research we consider a generalized dependence structure which consists of both between-subject and within-subject dependence instead of dealing only with one of the two types. We propose using a marginal approach to analyze multivariate failure time data with generalized dependence structure when the scientific interests are in the effects of covariates on the risk of failures and knowledge of the dependence structure is not available.
Three types of marginal hazard models are proposed: distinct baseline hazard models, common baseline hazard models, and mixed baseline hazard models. All these hazard models are in the form of Cox regression models. The mixed baseline hazard model provides significantly greater modeling flexibility and applicability, and enables us to deal with some applied problems that existing methods cannot handle. Inference on regression parameters for each type of model is based on a system of pseudo score equations obtained under the working assumption of independence, which is in the framework of generalized estimating equations. Relying on the theory of multivariate counting processes, stochastic integrals and local martingales, we have proven that the estimators for the proposed models are consistent and asymptotically normal with a robust covariance matrix which can be consistently estimated. The simulation results show that the proposed large sample approximation is adequate even when the sample size is relatively small (n = 50). The methodology is illustrated on data from the Framingham Heart Study.

The consequences of misspecified marginal models are also investigated. We derive the asymptotic properties of the pseudo maximum partial likelihood estimator under possibly misspecified marginal Cox regression hazard models. Simulation studies are conducted to obtain information on the effects of misspecified marginal hazard models on the sample sizes for practical applications.

ACKNOWLEDGEMENTS

I would like to express my sincere appreciation to my advisors, Dr. Jianwen Cai and Dr. P. K. Sen, for their guidance and enthusiasm about this work. Their encouragement and confidence in me have been invaluable in this research. I would also like to thank the members of my committee, Dr. Gerardo Heiss, Dr. Lawrence Kupper, and Dr. Bahjat Qaqish, for their constructive comments and suggestions. I would especially like to thank Dr.
Kupper for serving as my academic advisor and also personal advisor. I owe so much to him for his inspiration, guidance, and friendship.

I am very grateful for the opportunity to work at the Collaborative Studies Coordinating Center in the Department of Biostatistics, which provided me with financial support as well as a learning opportunity. This research topic originated from my job there as a statistician. I would especially like to thank Dr. Woody Chambless for all I have learned from him and Dr. Ed Davis for his encouragement and support.

I would like to express my gratitude to the statistical faculty members in the Department of Mathematical Sciences at the University of North Carolina at Greensboro while I was a student there: Dr. Herr, Dr. Kissling, Dr. Ludwig, and Dr. Warrack, for guiding me into statistics. Very special thanks go to Dr. Melvin Hurwitz, my advisor while I was in the Department of Clothing and Textiles at the University of North Carolina at Greensboro, for his understanding and moral support when I decided to give up being a Ph.D. candidate to pursue my degree in Biostatistics. His confidence in me and his help are always appreciated.

I am grateful, though I declined, for being awarded an On-Campus Dissertation Fellowship by the Graduate School at the University of North Carolina at Chapel Hill for the Fall 1996 semester and a Predoctoral Fellowship by the Veteran Affairs Central Office for the fiscal year of 1996-1997. My efforts were partially supported by a National Institute of Environmental Health Sciences Training Grant.

I would like to thank my friends, my parents, brother, sisters, and in-laws for their encouragement, love, and support. Finally, I want to thank my wonderful husband, Carney, for having been with me every step of the way. This dissertation would not have been possible without his support.

Contents

1 Introduction and Literature Review
  1.1 Introduction
  1.2 Literature Review
    1.2.1 Univariate Survival Data Analysis
    1.2.2 Modeling Multivariate Failure Time Data
      Introduction
      Joint Distribution
      Conditioning on Past Events or History
      Frailty Models
      Marginal Models
      Concluding Remarks
    1.2.3 Model Misspecification and Robust Inferences
  1.3 Synopsis of Research
2 Marginal Modeling and Estimation
  2.1 Introduction
  2.2 Notation and Definitions
  2.3 Marginal Hazard Models and Estimation
    2.3.1 Distinct Baseline Hazard Models
    2.3.2 Common Baseline Hazard Models
    2.3.3 Mixed Baseline Hazard Models
  2.4 Concluding Remarks
3 Asymptotic Distributions of Parameter Estimators
  3.1 Introduction
  3.2 Two Useful Lemmas
  3.3 Mixed Baseline Hazard Model
  3.4 Distinct Baseline Hazard Model
  3.5 Common Baseline Hazard Model
  3.6 Concluding Remarks
4 Simulation Studies of Parameter Estimators
  4.1 Introduction
  4.2 Simulation Parameters
  4.3 Summary Statistics
  4.4 Results and Discussion
5 Example
  5.1 Introduction
  5.2 Data and Model
  5.3 Results and Discussion
6 Misspecification of Marginal Hazard Models
  6.1 Introduction
  6.2 Asymptotic Properties of Regression Estimators
    6.2.1 Mixed Baseline Hazard Models
    6.2.2 Distinct Baseline Hazard Models
    6.2.3 Common Baseline Hazard Models
  6.3 Special Cases
  6.4 Concluding Remarks
7 Simulation Studies of Marginal Hazard Model Misspecification
  7.1 Introduction
  7.2 Simulation Parameters and Summary Statistics
  7.3 Results and Discussion
8 Remarks
Bibliography

List of Tables

4.1 Observed mean pairwise correlations for different $\theta$ values based on 1,000 simulation runs with sample size of 1,000 and no censoring
4.2 Simulation results for Normal(0,1) covariate, uniform(0, 5) censoring distribution, $\theta = 0.25$, exponential marginals, and sample size of 100
4.3 Simulation results for Normal(0,1) covariate, no censoring, and exponential marginals
4.4 Simulation results for Normal(0,1) covariate, uniform(0, 5) censoring distribution, and exponential marginals
4.5 Simulation results for Normal(0,1) covariate, uniform(0, 1) censoring distribution, and exponential marginals
4.6 Simulation results for Normal(0,1) covariate, uniform(0, 5) censoring distribution, and Weibull marginals
4.7 Simulation results for $\beta_0 = 0.7$, Bernoulli(0.5) covariate, uniform(0, 5) censoring distribution, and exponential marginals
4.8 Simulation results for sample size of 100, Bernoulli(0.5) covariate, and uniform(0, 5) censoring distribution
5.1 Estimates of regression parameters for the Framingham Heart Study data
7.1 Simulation results for the true mixed baseline hazard model with Normal(0,1) covariate, no censoring, and exponential marginals
7.2 Simulation results for the true mixed baseline hazard model with Normal(0,1) covariate, uniform(0, 5) censoring distribution, and exponential marginals
7.3 Simulation results for the true mixed baseline hazard model with Normal(0,1) covariate, uniform(0, 1) censoring distribution, and exponential marginals
7.4 Simulation results for the true mixed baseline hazard model with Normal(0,1) covariate, uniform(0, 5) censoring distribution, and Weibull marginals
7.5 Simulation results for the true mixed baseline hazard model with $\beta_0 = 0.7$, Bernoulli(0.5) covariate, uniform(0, 5) censoring distribution, and exponential marginals
7.6 Simulation results for the true mixed baseline hazard model with sample size of 100, Bernoulli(0.5) covariate, and uniform(0, 5) censoring distribution

Chapter 1
Introduction and Literature Review

1.1 Introduction

This dissertation considers multivariate failure time data in which more than one failure can occur on the same experimental unit and all of the event times can potentially be observed. This problem is different from the one that is commonly referred to as the competing risks problem, wherein one and only one among several distinct events can be observed.

One well-known study which exemplifies the motivation for this research is the Framingham Heart Study. One of the purposes of this study was to assess the effect of the risk factors on age at death for all causes and times until angina pectoris, myocardial infarction, silent myocardial infarction, cerebrovascular accident, coronary insufficiency, congestive heart failure, and cancer. Some individuals in the Framingham Heart Study are related because the sampling unit is the family. There are siblings and married couples in the study. In the cohort of 4,211 subjects used in the analysis by Klein (1992), there were 1,146 married couples. Only 1,919 individuals of 4,211 were singletons, who were either unmarried or whose spouses were not included in the dataset. Among these 4,211 cohort members, 1,050 individuals were from 452 sibships, with sibship size ranging from two to six.

The multivariate failure time data from the Framingham Heart Study contain both between-subject and within-subject dependence structures. Husbands and wives who share common environmental conditions, such as diet, hazardous material exposures in the household, smoking, and other lifestyle characteristics, may have their event times dependent on each other.
Siblings who share a common genetic code and early environmental exposure will have their event times more closely related than those of non-siblings. Such dependence of event times among family members constitutes between-subject dependence. Within-subject dependence refers to the dependence among different event times for the same individual, such as time to angina pectoris and time to myocardial infarction.

The presence of both between-subject and within-subject dependence in these possibly censored data poses challenges to the statistical analysis. To our knowledge, the literature does not contain a reference to the problem of censored data with both between-subject and within-subject dependence. By contrast, much recent research effort has been devoted to multivariate failure time data to account for either between-subject dependence or within-subject dependence, but not both. In the next section we review the relevant literature in these areas.

1.2 Literature Review

1.2.1 Univariate Survival Data Analysis

Failure time data analysis, or survival analysis, deals with times to failure or to other events, which may be censored. What distinguishes failure time data analysis from other analyses is the issue of censoring, which is either intentional, as in the termination of a trial before the event of interest occurs, or unintentional, as in loss to follow-up. There are three main types of censoring: right censoring, left censoring, and interval censoring. The failure time is interval censored if it is only known that the failure occurs within the time interval $(t_L, t_R)$. When $t_R \to \infty$, the observation is right censored. If $t_L = 0$, it is called left censored. The usual methods of handling right censored data cannot be used for interval censored data. In this dissertation, we consider only right censoring.

In general, there are two types of regression models for failure time data.
One type of regression is based on the linear regression model, $Y = Z\beta' + \epsilon$, where $Y = g(X)$ represents either the observed failure time $X$ or any monotone transformation of the observed failure time. When $Y$ is the logarithm of the observed failure time, this model is the accelerated failure time model, in which the covariates $Z$ alter the time to failure through a multiplicative effect on the observed failure time (Kalbfleisch and Prentice (1980)).

The other type of regression model is the well-known Cox regression model (Cox (1972)), also called the Cox proportional hazards model when the covariates $Z$ are time-independent. Let

$$\lambda(t) = \lim_{h \downarrow 0} \frac{1}{h} \Pr(t \le T < t + h \mid T \ge t)$$

be the hazard function. The Cox regression model postulates that the failure time $T_i$ is associated with the covariate $Z_i$ through the hazard function

$$\lambda(t; Z_i) = \lambda_0(t) \exp\{\beta' Z_i(t)\}, \tag{1.1}$$

where $\lambda_0(t)$ is the unknown and unspecified nonnegative baseline hazard. Assuming the observed failure times are distinct, the partial likelihood function for the Cox regression model (Cox (1975)) is

$$PL(\beta) = \prod_{i} \frac{\exp\{\beta' Z_i(t_i)\}}{\sum_{l \in R_i} \exp\{\beta' Z_l(t_i)\}}, \tag{1.2}$$

where the product is over the observed failure times $t_i$ and $R_i = \{l : T_l \ge T_i\}$ is the risk set, i.e., the set of indices corresponding to individuals at risk and uncensored at time $t_i$. The corresponding score function is

$$U(\beta) = \sum_{i} \left[ Z_i(t_i) - \frac{S^{(1)}(\beta, t_i)}{S^{(0)}(\beta, t_i)} \right],$$

where

$$S^{(0)}(\beta, t_i) = \sum_{l \in R_i} \exp[\beta' Z_l(t_i)] \quad \text{and} \quad S^{(1)}(\beta, t_i) = \sum_{l \in R_i} Z_l(t_i) \exp[\beta' Z_l(t_i)].$$

In most analyses, independent censorship and noninformative censoring schemes are assumed. Independent censoring would hold, for example, if the failure time $T_i$ and the censoring time $C_i$ are statistically independent, conditioning on covariates $Z_i$, for $i = 1, 2, \ldots, n$. The censoring is noninformative if the parameters of the distribution of the censoring time do not depend on the parameters of interest, here, the regression coefficients. Notice that independent censorship does not guarantee noninformative censoring. In this dissertation, we will also assume independent censorship and a noninformative censoring scheme.
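As an illustration of the partial likelihood (1.2) and its score function, the following Python sketch evaluates both for a small data set. It is a minimal sketch under simplifying assumptions that are not part of the original text: the covariates are time-independent, there are no tied failure times, and the function name `cox_partial_loglik_and_score` and its argument names are hypothetical.

```python
import math


def cox_partial_loglik_and_score(beta, times, events, covariates):
    """Log partial likelihood and score vector for a Cox model.

    Assumes time-independent covariates and distinct failure times.
    times[i]      observed time for subject i (failure or censoring)
    events[i]     1 if subject i failed, 0 if censored
    covariates[i] list of p covariate values for subject i
    """
    p = len(beta)
    # Linear predictors beta'Z_l for every subject.
    eta = [sum(b * z for b, z in zip(beta, row)) for row in covariates]
    loglik = 0.0
    score = [0.0] * p
    for i, (t_i, failed) in enumerate(zip(times, events)):
        if not failed:
            continue  # censored subjects contribute only through risk sets
        # Risk set R_i: subjects still under observation at time t_i.
        risk = [l for l, t_l in enumerate(times) if t_l >= t_i]
        s0 = sum(math.exp(eta[l]) for l in risk)
        s1 = [sum(covariates[l][k] * math.exp(eta[l]) for l in risk)
              for k in range(p)]
        loglik += eta[i] - math.log(s0)
        for k in range(p):
            score[k] += covariates[i][k] - s1[k] / s0
    return loglik, score


# Three subjects, all observed to fail, one binary covariate.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
Z = [[1.0], [0.0], [1.0]]
loglik, score = cox_partial_loglik_and_score([0.0], times, events, Z)
# At beta = 0 the risk sets have sizes 3, 2, 1, so loglik = -log(6)
# and score = (1 - 2/3) + (0 - 1/2) + (1 - 1) = -1/6.
```

Solving `score = 0` (for instance by Newton-Raphson) would give the maximum partial likelihood estimator; the sketch stops at evaluating the two quantities so that each term can be matched against the formulas.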
The maximum partial likelihood estimator $\hat{\beta}$ is the solution to $U(\beta) = 0$. Tsiatis (1981) proves that $\sqrt{n}(\hat{\beta} - \beta_0)$ converges to a multivariate normal distribution with mean zero and variance $\mathcal{I}^{-1}(\beta)$ using the traditional approach, where

$$\mathcal{I} = -E\left[\frac{\partial^2}{\partial\beta\,\partial\beta'} \log PL(\beta)\right].$$

Based on induced order statistics and permutation arguments, Sen (1981) developed the asymptotic distribution theory of the score function test statistic by using discrete time martingales. Andersen and Gill (1982) elegantly proved the asymptotic distribution of $\hat{\beta}$ by applying martingale theory in the counting process framework. Johansen (1983) demonstrated that the partial likelihood may be viewed as a profile likelihood in which the unknown baseline function $\lambda_0(t)$ is replaced in the total likelihood by a nonparametric maximum likelihood estimate.

A simple extension of the model (1.1) is the inclusion of strata (Kalbfleisch and Prentice (1980)). The hazard function for individual $i$ in the $j$th stratum is defined as

$$\lambda_j(t; Z_i) = \lambda_{0j}(t) \exp\{\beta' Z_i(t)\}, \tag{1.3}$$

and the partial likelihood is

$$PL(\beta) = \prod_j PL_j(\beta),$$

where $PL_j(\beta)$ is the partial likelihood function for stratum $j$.

There are many other important articles on univariate survival analysis, so many that it is impossible to give a complete account of this topic in this short literature review. For a more comprehensive discussion of the topic, we refer to the books by Cox and Oakes (1984), Kalbfleisch and Prentice (1980), Fleming and Harrington (1991), and Andersen, Borgan, Gill and Keiding (1993).

1.2.2 Modeling Multivariate Failure Time Data

Introduction

Multivariate failure time data arise in two ways. Either an individual records multiple failure events, the so-called subject effect, which induces within-subject dependence; or individuals recording a single failure event are grouped into clusters because of clustered sampling or matching, the so-called design effect, which induces between-subject dependence.
Here, we exclude the competing risks situation where failure of one type precludes the occurrence of other types of failure, as for studies involving different causes of death.

In the setting of multivariate failure time data induced by within-subject dependence, multiple events may fall into one of two categories, ordered and unordered. In the unordered category, different types of failure processes are acting simultaneously and, at any time, each subject can experience a failure of a particular type, unless that type of failure has already occurred in that subject or has been censored. For example, in a bone marrow transplant study of leukemia patients, the unordered distinct types of failures include chronic graft versus host disease (GVHD), relapse of leukemia, or death. Other examples of studies involving unordered multiple failures are studies on diseases of the eye, kidney, lung, etc., if we consider the failures of different eyes, different kidneys, and different lungs as different failures. Repeated, successive or recurrent events of the same type of failure pertain to the ordered failure outcomes. Examples are repeat myocardial infarction attacks and bladder tumor recurrence. However, in the case of multivariate failure time data induced by between-subject dependence, the failures among individuals in the same group are generally unordered. The ordered or unordered nature of multiple failure events will affect the way we analyze multivariate failure data. For example, we may choose a conditional modeling approach for the ordered multiple failure events but a marginal approach for the unordered multiple failures.

The models for multivariate failure time data can be classified into two broad categories: models that specify the structure of dependence, and marginal models, which treat the dependence among failure times as a nuisance and avoid specifying the structure of the dependence.
The structure of the dependence among multivariate failure time data can be specified in the following ways: (1) assume joint distributions of parametric models; (2) condition on past events or history by specifying multiplicative intensity models, overall intensity models, or other conditional models; (3) use frailty model theory to include random effects in the model.

Joint Distribution

Multivariate distributions can be generated or defined in a number of different ways, each of which is more or less relevant for the specific situation and purpose. Almost all examples we consider here are bivariate for simplicity of presentation. However, all can be generalized to higher dimensions, unless stated otherwise.

Assume $T_1$ and $T_2$ are two random variables representing times to failures of the first type and the second type, respectively. The hazard function of a failure of type $i$ at time $t$ given no failure before time $t$ is

$$\lambda_i(t) = \lim_{h \downarrow 0} \frac{1}{h} \Pr(t \le T_i < t + h \mid T_1 \ge t, T_2 \ge t), \quad i = 1, 2,$$

and the hazard function of a failure of type $i$ at $s$ given a previous failure of the other type $j$ at time $t$ is

$$\lambda_{i|j}(s \mid t) = \lim_{h \downarrow 0} \frac{1}{h} \Pr(s \le T_i < s + h \mid T_i \ge s, T_j = t), \quad i = 1, 2, \; i + j = 3.$$

These four hazard functions determine the joint distribution of $(T_1, T_2)$ if this bivariate density is continuous. The joint density for $t_1 < t_2$ is then

$$f(t_1, t_2) = \exp\left\{-\int_0^{t_1} [\lambda_1(u) + \lambda_2(u)]\,du - \int_{t_1}^{t_2} \lambda_{2|1}(u \mid t_1)\,du\right\} \lambda_1(t_1)\,\lambda_{2|1}(t_2 \mid t_1),$$

with a similar expression if $t_1 > t_2$.

One approach to the construction of a bivariate survival distribution is to transform independent variables, as in the following example.

Example 1.1 (The bivariate 'shock' model of Marshall and Olkin (1967)) Let $V_1$, $V_2$, and $V_3$ denote the time to failure of type 1 but not type 2, failure of type 2 but not type 1, and failures of type 1 and type 2 simultaneously, respectively. Assume that $V_1$, $V_2$, and $V_3$ are independent exponentially distributed variables with hazards $\lambda_1$, $\lambda_2$, and $\lambda_3$, respectively. Let $T_1 = \min(V_1, V_3)$ and $T_2 = \min(V_2, V_3)$.
Then the bivariate survival distribution is given by

$$S(t_1, t_2) = \Pr(T_1 > t_1, T_2 > t_2) = \exp\{-\lambda_1 t_1 - \lambda_2 t_2 - \lambda_3 \max(t_1, t_2)\}.$$

The distribution is a curved exponential family, as there is a positive probability of $\lambda_3/(\lambda_1 + \lambda_2 + \lambda_3)$ that the two components fail simultaneously. Thus, the distribution should be used in situations where simultaneous failure of two components is possible. The observations are independent if $\lambda_3 = 0$. It can be shown that the marginal distribution of $T_i$ is exponential with hazard $\lambda_i + \lambda_3$ and that the distribution of $T_{\min} = \min(T_1, T_2)$ is also exponential, with hazard $\lambda_1 + \lambda_2 + \lambda_3$. The distribution has a bivariate lack of memory property:

$$\Pr(T_1 > t_1 + s, T_2 > t_2 + s \mid T_1 > s, T_2 > s) = \Pr(T_1 > t_1, T_2 > t_2).$$

This distribution allows semiparametric generalizations along the lines of the Cox regression model for failure time data with event intensities $\lambda_1$, $\lambda_2$, and $\lambda_3$ depending on a vector of covariates $Z$. Partial likelihoods for the regression parameters may be derived, and in most cases the standard Cox regression model software may be applied for the analysis with minor modification of the input data file. Klein, Keiding and Kamby (1989) applied the semiparametric Marshall-Olkin models to the occurrence of metastases at multiple sites after breast cancer.

To avoid the singularity along the line $t_1 = t_2$, Block and Basu (1974) derived an absolutely continuous bivariate exponential distribution which retains the bivariate lack of memory property but does not have exponential marginals. They also proved that the only absolutely continuous bivariate distribution having both exponential marginals and the bivariate lack of memory property is the trivial case where $T_1$ and $T_2$ are independent. Later, Sarkar (1987) derived an absolutely continuous bivariate exponential distribution which has exponential marginals. Clearly, he had to abandon the bivariate lack of memory property.

Hougaard (1986a) also derived a multivariate Weibull distribution through the transformation of independent variables.
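The shock construction of Example 1.1 is straightforward to simulate, which gives a quick numerical check of the properties quoted for it. The following Python sketch is illustrative only: the function names `simulate_marshall_olkin` and `mo_survival`, the hazard values, and the seed are assumptions made for this example, and the survival function coded is the standard one implied by the independence of the three exponential shocks, $\exp\{-\lambda_1 t_1 - \lambda_2 t_2 - \lambda_3 \max(t_1, t_2)\}$.

```python
import math
import random


def simulate_marshall_olkin(lam1, lam2, lam3, n, seed=0):
    """Draw n pairs (T1, T2) with T1 = min(V1, V3), T2 = min(V2, V3).

    V1, V2, V3 are independent exponentials with hazards lam1, lam2, lam3
    (all hazards must be strictly positive for random.expovariate).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        v1 = rng.expovariate(lam1)
        v2 = rng.expovariate(lam2)
        v3 = rng.expovariate(lam3)  # common shock hitting both components
        pairs.append((min(v1, v3), min(v2, v3)))
    return pairs


def mo_survival(t1, t2, lam1, lam2, lam3):
    """Joint survival function Pr(T1 > t1, T2 > t2) of the shock model."""
    return math.exp(-lam1 * t1 - lam2 * t2 - lam3 * max(t1, t2))


lam1, lam2, lam3 = 1.0, 2.0, 1.0  # illustrative hazards
pairs = simulate_marshall_olkin(lam1, lam2, lam3, n=20000)

# Simultaneous failures occur exactly when the common shock comes first.
tie_fraction = sum(1 for t1, t2 in pairs if t1 == t2) / len(pairs)
tie_theory = lam3 / (lam1 + lam2 + lam3)

# Marginal mean of T1 should be close to 1 / (lam1 + lam3).
mean_t1 = sum(t1 for t1, _ in pairs) / len(pairs)

# Empirical versus theoretical joint survival at one point.
emp_surv = sum(1 for t1, t2 in pairs if t1 > 0.3 and t2 > 0.5) / len(pairs)
theo_surv = mo_survival(0.3, 0.5, lam1, lam2, lam3)
```

With these hazards the theoretical tie probability is 0.25 and the marginal mean of $T_1$ is 0.5; the simulated quantities should agree with them up to Monte Carlo error.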
A nice result about the dependence in Hougaard's formulation is that the correlation between $\log T_1$ and $\log T_2$ is $(1 - \alpha^2)$, where $\alpha \in (0, 1]$ is the index of a positive stable distribution. The observations are independent if $\alpha = 1$. The bivariate exponential model mentioned by Gumbel (1960) is a special case of Hougaard's bivariate Weibull model. As in the Marshall-Olkin model, the marginal distributions of $T_i$ ($i = 1, 2$) and the distribution of $T_{\min}$ under Gumbel's bivariate exponential distribution are exponential. However, Gumbel's model does not admit a singularity along the line $t_1 = t_2$ in the $(t_1, t_2)$ plane as does the Marshall-Olkin model.

Joint distributions can also be defined through conditional distributions. One of the classical models of this type is the bivariate extension of the exponential distribution shown in the following example.

Example 1.2 (Freund (1961)) The following is a bivariate exponential density of $(T_1, T_2)$:

$$f(t_1, t_2) = \begin{cases} \alpha_1 \beta_2 \exp\{-\beta_2 t_2 - (\alpha_1 + \alpha_2 - \beta_2) t_1\}, & 0 < t_1 < t_2, \\ \alpha_2 \beta_1 \exp\{-\beta_1 t_1 - (\alpha_1 + \alpha_2 - \beta_1) t_2\}, & 0 < t_2 < t_1. \end{cases}$$

Unlike the Marshall-Olkin model and the Gumbel model, the marginals of the Freund model are not necessarily exponential. The marginal distribution of $T_i$ is exponential if and only if $\alpha_i = \beta_i$, $i = 1, 2$. $T_1$ and $T_2$ are independent if $\alpha_1 = \beta_1$ and $\alpha_2 = \beta_2$. The distribution of $T_{\min}$ is exponential with hazard $\alpha_1 + \alpha_2$. The probability that component $i$ fails first is $\alpha_i/(\alpha_1 + \alpha_2)$. Here, the dependence between $T_1$ and $T_2$ is characterized by the assumption that the failure of component 1 (or component 2) changes the hazard of failure of component 2 (or component 1) from $\alpha_2$ to $\beta_2$ (or from $\alpha_1$ to $\beta_1$). This is particularly relevant in situations where the failure of one component implies an increased load on the other, for instance, the case of two-organ systems such as the kidneys of one individual.

Sometimes, for example in simulation studies, it is convenient to parameterize the joint distribution by means of the marginal distributions $F_i(\cdot)$
and a single parameter $\theta$ characterizing the dependence, that is,

$$F(t_1, t_2) = C_\theta\{F_1(t_1), F_2(t_2)\}. \tag{1.4}$$

Example 1.3 (Gumbel (1960) and Farlie (1960)) If $F_1(t_1)$ and $F_2(t_2)$ are distribution functions, then

$$F(t_1, t_2) = F_1(t_1) F_2(t_2) \left[1 + \theta \{1 - F_1(t_1)\}\{1 - F_2(t_2)\}\right]$$

is a bivariate distribution function for $\theta \in [-1, 1]$. When $\theta = 0$, $T_1$ and $T_2$ are independent. If the marginal distributions are exponential, the correlation between $T_1$ and $T_2$ is $\theta/4$. Consequently, the correlation can be either positive or negative, but the absolute value cannot exceed 0.25.

Genest and MacKay (1986) noticed that many bivariate distributions are in the family of Archimedean copulas, which is a special case of (1.4) obtained by restricting to uniform marginals, in the form

$$C(u, v) = K^{-1}\{K(u) + K(v)\},$$

where $K(\cdot)$ is a strictly monotone decreasing convex function defined on $(0, 1]$. One widely cited bivariate frailty model due to Clayton (1978) fits into this framework.

Example 1.4 (Clayton (1978)) Assume that conditional on $Y$, the frailty, the failure times of subjects are independent and with hazard $Y\lambda_i(t)$ for subject $i$. Let $Y$ have a gamma distribution. Then the bivariate survival function of $(T_1, T_2)$ is

$$S(t_1, t_2) = \left\{S_1(t_1)^{-1/\theta} + S_2(t_2)^{-1/\theta} - 1\right\}^{-\theta},$$

where $S_i(\cdot)$ denotes the marginal survival function of $T_i$. Here, independence is obtained as a limiting case, for $\theta \to \infty$. This model was further studied by Oakes (1982). Notice that with this derivation, there is only positive dependence between members in a pair. However, the bivariate distribution can be extended to allow negative dependence (Genest and MacKay (1986)). Covariates can be included in the model assuming conditional proportional hazards (Clayton and Cuzick (1985)) and also in the case of Weibull hazards (Crowder (1985)).

Marshall and Olkin (1988) generalized (1.4) to induce families of multivariate distributions through the use of frailty models. Hougaard (1986b) and Oakes (1989) discussed a wide range of densities for the frailty $Y$.
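The gamma frailty construction of Example 1.4 can be checked by simulation. The sketch below is a hedged illustration rather than the author's procedure: it assumes that $Y$ is gamma distributed with shape $\theta$ and scale $1/\theta$ (mean one), a parameterization chosen here so that independence is the limit $\theta \to \infty$ as stated in the example, and it assumes constant conditional hazards $Y\lambda_i$. Under these assumptions the joint survival function is the Laplace transform of $Y$ evaluated at $\lambda_1 t_1 + \lambda_2 t_2$, namely $\{1 + (\lambda_1 t_1 + \lambda_2 t_2)/\theta\}^{-\theta}$. The function names and numerical values are hypothetical.

```python
import random


def simulate_gamma_frailty_pairs(theta, lam1, lam2, n, seed=0):
    """Draw n pairs (T1, T2) sharing a gamma frailty Y within each pair.

    Assumed parameterization: Y ~ Gamma(shape=theta, scale=1/theta), so
    E[Y] = 1 and Var[Y] = 1/theta. Given Y, T1 and T2 are independent
    exponentials with hazards Y*lam1 and Y*lam2.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        y = rng.gammavariate(theta, 1.0 / theta)  # shared frailty
        t1 = rng.expovariate(y * lam1)
        t2 = rng.expovariate(y * lam2)
        pairs.append((t1, t2))
    return pairs


def frailty_joint_survival(t1, t2, theta, lam1, lam2):
    """Pr(T1 > t1, T2 > t2) after integrating out the gamma frailty."""
    return (1.0 + (lam1 * t1 + lam2 * t2) / theta) ** (-theta)


theta, lam1, lam2 = 2.0, 1.0, 0.5  # illustrative values
pairs = simulate_gamma_frailty_pairs(theta, lam1, lam2, n=20000)

emp_joint = sum(1 for t1, t2 in pairs if t1 > 0.5 and t2 > 1.0) / len(pairs)
theo_joint = frailty_joint_survival(0.5, 1.0, theta, lam1, lam2)

# Product of the two marginal survival probabilities: what the joint
# survival would be if T1 and T2 were independent.
indep_joint = (frailty_joint_survival(0.5, 0.0, theta, lam1, lam2)
               * frailty_joint_survival(0.0, 1.0, theta, lam1, lam2))
```

The joint survival probability exceeds the product of the marginals, which is the positive dependence noted in the example; increasing `theta` shrinks the gap.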
Bandeen-Roche and Liang (1996) constructed multivariate survival distributions by a recursive nesting of univariate frailty-type distributions through which Archimedean copula forms are determined for all bivariate margins. Their family of distributions allows for different levels of association between bivariate margins, instead of exchangeable dependence structure only. Hougaard (1986a) derived many joint distributions when the frailty $Y$ follows a family of positive stable distributions using the Laplace transformation.

For a more detailed review of bivariate survival models and some of their known properties, see Hougaard (1987).

Conditioning on Past Events or History

Multiplicative intensity models condition on all past event history. In the counting process approach of multiplicative intensity models to censored failure time data, the intensity process for the counting process $N$ is modeled, rather than hazard functions for the failure time $T$.

Let $N = (N_1, \ldots, N_n)$ be an $n$-component multivariate counting process, where $N_i$ counts observed events in the life of the $i$th individual, $i = 1, \ldots, n$, over the time interval $[0, 1]$. From the definition of the multivariate counting process, the sample paths of $N_1, \ldots, N_n$ are step functions, zero at time zero, with jumps of size $+1$ only, and no two component processes can jump at the same time. Properties of stochastic processes, such as being a local martingale or a predictable process, are relative to a right-continuous nondecreasing family $(\mathcal{F}_t : t \in [0, 1])$ of sub $\sigma$-algebras on the sample space $(\Omega, \mathcal{F}, P)$. In other words, $\mathcal{F}_t$ is the filtration, which represents everything that happens up to time $t$.
Example 1.5 In the multiplicative intensity model approach, the Cox hazard model is based on the assumption that $N$ has random intensity process $\lambda = (\lambda_1, \ldots, \lambda_n)$ such that

$$\lambda_i(t) = Y_i(t)\,\lambda_0(t) \exp\{\beta' Z_i(t)\}, \quad i = 1, \ldots, n,$$

where $\beta$ is a fixed column vector of $p$ coefficients, $\lambda_0(t)$ a fixed unknown hazard function, and $Y_i(t)$ is a predictable at-risk indicator process. Therefore, the Cox model is a special case of the multiplicative intensity model with $N_i(t) = I\{T_i \le t, \delta_i = 1\}$, a zero-one variable, and $Y_i(t) = I\{X_i \ge t\}$.

Andersen and Gill (1982) suggested a simple extension of the Cox proportional hazards model to allow for multiple (recurrent) events per subject by applying multivariate counting processes, and extended the Cox partial likelihood theory to this situation.

Example 1.6 Andersen and Gill (1982) considered the hazard rate

$$\lambda_i(t) = \lim_{h \downarrow 0} \frac{1}{h} \Pr\{N_i(t + h) - N_i(t) = 1 \mid \mathcal{F}_t\}, \quad i = 1, \ldots, n, \; t \in [0, 1].$$

They obtain repeated observed failures of recurrent events by taking $Y_i(t) = 1$ as long as the subject is under observation and $N_i(t) \ge 0$, with the assumption $N_i(1) < \infty$ a.s. Assume that the counting process $N$ has random intensity process $\lambda = (\lambda_1, \ldots, \lambda_n)$ such that

$$\lambda_i(t; Z_i(t)) = Y_i(t)\,\lambda_0(t) \exp\{\beta' Z_i(t)\}, \quad i = 1, \ldots, n; \; t \in [0, 1],$$

and $Z_i(t)$ is predictable and locally bounded. Then, the processes $M_i$ defined by

$$M_i(t) = N_i(t) - \int_0^t \lambda_i(u; Z_i(u))\,du$$

are local square integrable martingales on the time interval $[0, 1]$. The conditional variance of a martingale $M$ is given by the predictable variation process $\langle M \rangle$, defined by having the increments

$$d\langle M \rangle(t) = \mathrm{Var}\{dM(t) \mid \mathcal{F}_{t-}\}.$$

The predictable covariation process $\langle M_1, M_2 \rangle$ is defined by having the increments

$$d\langle M_1, M_2 \rangle(t) = \mathrm{Cov}\{dM_1(t), dM_2(t) \mid \mathcal{F}_{t-}\}.$$

Also,

$$\langle M_i, M_i \rangle(t) = \int_0^t \lambda_i(u; Z_i(u))\,du$$

and

$$\langle M_i, M_j \rangle(t) = 0, \quad i \ne j,$$

i.e., $M_i$ and $M_j$ are orthogonal when $i \ne j$.

Under the AG multiplicative intensity model, the risk sets for the $(r + 1)$th recurrences are not restricted to the subjects who have experienced the first $r$ recurrences.
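To make the AG risk-set convention concrete, the following Python sketch evaluates the counting process $N_i(t)$ and the at-risk indicator $Y_i(t)$ for a few subjects with recurrent events on $[0, 1]$. The data values and the function names (`counting_process`, `at_risk`, `ag_risk_set`) are hypothetical and serve only as an illustration of the definitions in Example 1.6.

```python
def counting_process(event_times, t):
    """N_i(t): number of observed events for one subject in [0, t]."""
    return sum(1 for s in event_times if s <= t)


def at_risk(followup_end, t):
    """Y_i(t) under the AG convention: 1 while the subject is observed."""
    return 1 if followup_end >= t else 0


def ag_risk_set(subjects, t):
    """Indices of subjects with Y_i(t) = 1.

    Membership depends only on being under observation at time t,
    not on how many events the subject has already experienced.
    """
    return [i for i, (_, end) in enumerate(subjects) if at_risk(end, t)]


# Each subject: (list of observed event times, end of follow-up).
subjects = [
    ([0.2, 0.5], 0.9),  # two recurrences, followed until 0.9
    ([], 0.6),          # no event, followed until 0.6
    ([0.1], 0.3),       # one event, censored at 0.3
]

risk_at_04 = ag_risk_set(subjects, 0.4)
counts_at_04 = [counting_process(ev, 0.4) for ev, _ in subjects]
# risk_at_04 is [0, 1]: subject 0 has already had one event and subject 1
# has had none, yet both are at risk for a further event at t = 0.4.
```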
In other words, all the subjects who are not censored and do not experience more than $(r + 1)$ recurrences are considered at risk of the $(r + 1)$th event, regardless of whether they have not experienced any event at all or have experienced $r$ recurrences already. Also, under the assumption of constant baseline intensity over all events, the risk of a recurrent event for a given subject is unaffected by any earlier events that occurred to that subject, unless covariates that capture such dependence are included explicitly in the model. Consequently, such a model is somewhat restrictive in terms of the nature of the within-subject dependence among recurrent failure times on the same subject. The AG models were later extended to an $nK$-dimensional multivariate counting process by Andersen and Borgan (1985) and Andersen et al. (1993), with elements $N_{ik}(t)$ counting the number of type $k$ (recurrent) events on $[0, t]$ for individual $i$.

In the Cox model, the baseline hazard function is deterministic. The Cox regression models with stochastic baseline hazards were generalized by Prentice, Williams and Peterson (1981) (PWP) to multiple failures with within-subject dependence, called the overall intensity model.

Example 1.7 (Prentice et al. (1981)) Let $z(u) = [z_1(u), \ldots, z_p(u)]'$ denote a vector of covariates, for a study subject, available at time $u \ge 0$. Denote by $Z(t) = \{z(u) : u \le t\}$ the corresponding covariate process up to time $t$. Let $N(t) = \{n(u) : u \le t\}$, where $n(u)$ is the number of failures on a study subject prior to time $u$. Assume that (i) the failure time random variables are continuous and (ii) given that at least one failure occurs in $[t, t + h)$, the limiting probability of two or more failures in $[t, t + h)$ is zero as $h \to 0$. The counting process $N(t)$ is equivalent to the random failure times $T_1 < \cdots < T_{n(t)}$ in $[0, t]$ on a given individual.
The hazard or intensity function at time $t$ is defined as the instantaneous rate of failure at time $t$ given the covariate and counting processes at time $t$. Specifically, one can write

$$\lambda\{t \mid N(t), Z(t)\} = \lim_{h \downarrow 0} \frac{1}{h}\Pr\{t \le T_{n(t)+1} < t + h \mid N(t), Z(t)\}.$$

PWP proposed to permit (a) an arbitrary baseline intensity dependence on either the time from the beginning of the study (total time) or the time from the immediately preceding failure (gap time) and (b) the shape of the baseline hazard function to depend arbitrarily on the number of preceding failures and possibly on other characteristics of $\{N(t), Z(t)\}$. The two semiparametric hazard function models they suggested are:

$$\lambda\{t \mid N(t), Z(t)\} = \lambda_{0s}(t)\exp\{z(t)'\beta_s\} \tag{1.5}$$

and

$$\lambda\{t \mid N(t), Z(t)\} = \lambda_{0s}(t - t_{n(t)})\exp\{z(t)'\beta_s\}, \tag{1.6}$$

where in each case $\lambda_{0s}(\cdot) \ge 0$ ($s = 1, 2, \ldots$) are completely arbitrary baseline intensity functions, where the stratification variable $s = s\{N(t), Z(t), t\}$ may change as a function of time for a given subject, and where $\beta_s$ is a column vector of stratum-specific regression coefficients. For instance, an important special case for the stratification variable is $s = n(t) + 1$, for which a subject moves to stratum $(j+1)$ immediately following his $j$th failure and remains there until the $(j+1)$th failure or until censorship takes place. The method of allowing the arbitrary baseline hazard to depend on gap times, used by Gail, Santner and Brown (1980) in the analysis of comparative carcinogenesis experiments, is a two-sample special case of (1.6).

Partial likelihood was used for inferences about $\beta_s$. Let $t_{s1} < \cdots < t_{sd_s}$ denote the ordered (assumed distinct) failure times in stratum $s$. Suppose subject $i$ fails in stratum $s$ at time $t_{si}$ and let $z_{si}(t_{si})$ denote this subject's covariate vector at $t_{si}$. Also let $R(t, s)$ denote the set of subjects at risk in stratum $s$ just prior to time $t$ and $d_s$ be the total number of failures in stratum $s$.
The partial likelihood function for the first model (1.5) is:

$$L(\beta) = \prod_{s \ge 1}\prod_{i=1}^{d_s}\left[\frac{\exp\{\beta_s' z_{si}(t_{si})\}}{\sum_{l \in R(t_{si}, s)}\exp\{\beta_s' z_l(t_{si})\}}\right].$$

Assume the stratification is restricted to be $s = n(t) + 1$ or finer so that a subject can contribute at most one failure time in a specific stratum. Denote by $u_{s1} < \cdots < u_{sk_s}$ the distinct gap times from the immediately preceding failure on the same subject, for the $k_s$ failures occurring in stratum $s$. Suppose subject $i$ fails in stratum $s$ at gap time $u_{si}$ and that $z_{si}(t_{si})$ is the corresponding covariate value. Let $R(u, s)$ denote the set of subjects at risk in stratum $s$ at gap time $u - 0$, $u \in (u_{s,i-1}, u_{si})$. Then the partial likelihood for the second model (1.6) is:

$$L(\beta) = \prod_{s \ge 1}\prod_{i=1}^{k_s}\left[\frac{\exp\{\beta_s' z_{si}(t_{si})\}}{\sum_{l \in R(u_{si}, s)}\exp\{\beta_s' z_l(t_l + u_{si})\}}\right],$$

where $t_l$ is the last failure time on subject $l$ prior to entry into stratum $s$; $t_l = 0$ if there is no preceding failure on the subject. Both of the partial likelihoods are of the same form as for univariate failure time data, with the exception of the time argument and risk set definitions.

Notice that the PWP overall intensity models stratify the data according to the number of previous occurrences. This is done to allow for different baseline intensities for different events and to restrict the risk sets for the $(r+1)$th recurrences to the subjects who have experienced the first $r$ recurrences, in contrast to the AG multiplicative intensity models. Consequently, PWP models are most likely to be useful when the sample size is large and the multiple failures are naturally ordered, as with recurrent events. PWP did not provide the asymptotic estimation theory for the overall intensity models they proposed. Chang and Hsiung (1994) showed later that their proposed estimators are asymptotically normal.

Frailty Models

Frailty models use random effects to represent heterogeneity of 'frailty' or proneness to failure.
In general, the frailty model is used to accommodate between-subject dependence induced by a group of subjects, such as married couples or siblings, who share some common characteristics. The concept of frailty was originally introduced by Vaupel, Manton and Stallard (1979) in a univariate analysis of life table data. However, as Aalen (1988) noted, when only one event can be observed for each subject, the introduction of heterogeneity via frailties leads to severe problems of identifiability in failure time data analysis. This is not surprising. It is analogous to the one-way random effects model in linear models, where the between-subject and the within-subject variance components can only be estimated when there are multiple observations for at least some of the subjects.

If the frailty term is common to several individuals, it generates dependence among their failure times. Clayton (1978) considered a model with no covariates and with frailties distributed according to a gamma distribution for the multivariate failure time data induced by clusters of subjects. In his model, the univariate frailty is assumed to affect the failure rates in a multiplicative way. So if the hazard function of a subject with a frailty value of 1 is given by $\lambda(t)$, then the hazard function of a subject with a frailty value of $W$ is given by $W\lambda(t)$. Clayton and Cuzick (1985) extended the Clayton model to include fixed time covariates $Z$, in which a single random effect $W_i$ enters the intensity process specification in a multiplicative way.

Example 1.8 (Clayton and Cuzick (1985)) Let $T_{ij}$ ($i = 1, \ldots, n$; $j = 1, \ldots, n_i$) denote the survival time of individual $j$ in family $i$. Assume that conditional on the family specific frailty $W_i$ for $i = 1, \ldots, n$, all individuals are independent; then, the hazard of $T_{ij}$ is

$$\lambda_{ij}(t \mid W_i) = W_i\,\lambda_{ij}(t). \tag{1.7}$$

Assume that $\lambda_{ij}$ is in the form of the Cox proportional hazard; then, from (1.1),

$$\lambda_{ij}(t \mid W_i, Z_{ij}) = W_i\,\lambda_0(t)\exp(\beta' Z_{ij}), \tag{1.8}$$

where $W_i \sim \mathrm{Gamma}(\alpha, \nu)$, $\alpha = \nu = \gamma^{-1}$, $E(W_i) = 1$, and $\mathrm{var}(W_i) = \gamma$.
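As a numerical illustration of the gamma frailty model in Example 1.8, the sketch below evaluates the marginal (population-averaged) hazard obtained by integrating the frailty out. It assumes a constant baseline hazard `lam0` and a single time-fixed covariate, neither of which is specified in the text, and all function names are hypothetical. With $E(W) = 1$ and $\mathrm{var}(W) = \gamma$, the marginal survival function is $\{1 + \gamma\Lambda_0(t)e^{\beta z}\}^{-1/\gamma}$, so the marginal hazard is $\lambda_0(t)e^{\beta z}/\{1 + \gamma\Lambda_0(t)e^{\beta z}\}$.

```python
import math

def marginal_survival(t, z, beta, gamma, lam0=1.0):
    """Marginal survival S(t | z) = E_W[exp(-W * Lambda0(t) * exp(beta * z))]
    for W ~ Gamma(shape=1/gamma, rate=1/gamma), i.e. E(W) = 1 and var(W) = gamma.
    Assumes a constant baseline hazard lam0 (an illustrative choice)."""
    cum = lam0 * t * math.exp(beta * z)   # conditional cumulative hazard at W = 1
    return (1.0 + gamma * cum) ** (-1.0 / gamma)

def marginal_hazard(t, z, beta, gamma, lam0=1.0):
    """Marginal hazard -d/dt log S(t | z), written in closed form."""
    rate = lam0 * math.exp(beta * z)
    return rate / (1.0 + gamma * rate * t)

def marginal_hazard_ratio(t, beta, gamma, lam0=1.0):
    """Ratio of the marginal hazards for z = 1 versus z = 0 at time t."""
    return (marginal_hazard(t, 1.0, beta, gamma, lam0)
            / marginal_hazard(t, 0.0, beta, gamma, lam0))

if __name__ == "__main__":
    # The ratio starts at exp(beta) and shrinks toward 1 as t grows.
    for t in (0.0, 0.5, 2.0, 10.0):
        print(t, marginal_hazard_ratio(t, beta=math.log(2.0), gamma=1.0))
```

In this sketch the conditional hazard ratio is $e^{\beta}$ at every $t$, while the marginal ratio equals $e^{\beta}$ only at $t = 0$ and decreases afterwards, at a speed governed by $\gamma$.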
Note that the frailty is an unobserved random factor applied to the baseline hazard function. Conditional on the value of the unobservable frailty, the failure times follow the Cox regression model. The marginal hazard $\lambda_{ij}(t \mid Z_{ij}) = E_W[W\lambda_0(t)\exp(\beta' Z_{ij})]$ no longer follows the proportional hazards model; instead there is convergence of hazards at a rate determined by the dependence parameter $\gamma$ (Clayton (1991)). Consequently, the dependence parameter and the regression parameters are confounded. Information for estimation of $\gamma$ comes partly from the coincidence of failure within families, and partly from marginal convergence of hazards in relation to covariates. This implies that the dependence parameter also measures something other than dependence.

Clayton and Cuzick (1985) proposed likelihood-based estimation procedures for $\gamma$ and $\beta$ using the distribution of the generalized rank vector (Prentice (1978)), but the proof of the asymptotic properties of the method is not available. A series of approximations were made in order to make this approach computationally feasible; however, their procedures remain computationally complex.

Nielsen, Gill, Andersen and Sørensen (1992) used the profile likelihood via an EM algorithm to estimate the cumulative baseline hazard and the variance of the random effect in the frailty model. The asymptotic distributions of these estimators, along with consistent estimators of their asymptotic variances, are given by Murphy (1994) and Murphy (1995). Klein (1992) also used an EM algorithm based on the profile likelihood to carry out the estimation of parameters in the frailty models for the Framingham Heart Study. The difference between Nielsen et al. (1992) and Klein (1992) is that Nielsen et al. used a one-dimensional search of the profile likelihood while Klein carried out the complete implementation of the EM algorithm.
Because a common frailty cannot capture the between-subject dependence among married couples and siblings concurrently, Klein used one frailty model to incorporate the dependence induced by married couples and another frailty model to accommodate the dependence induced by siblings. However, only one endpoint of interest (death) was considered in his frailty model.

Hougaard (1986a) presented positive stable distributions for frailty $W$, both with arbitrary and with Weibull individual hazards. The Weibull model is mathematically interesting because it has Weibull marginal distributions and the time to the first failure in a cluster also follows a Weibull distribution. The Fisher information of the bivariate Weibull was found by Oakes and Manatunga (1992).

The frailty model with accelerated hazards for bivariate failure time data was proposed by Anderson and Louis (1995). They presented both parametric and semiparametric techniques for parameter estimation. However, the proof of the asymptotic properties of the methods remains elusive.

Chang and Hsiung (1995) proposed proportional hazards models with time dependent frailties. A regular efficient estimator for the relative risk parameter is obtained. They showed that this estimator is asymptotically normal and asymptotically efficient.

All the methods mentioned so far impose specific structures of dependence among the multivariate failure data, either explicitly or implicitly. If the dependence structure is misspecified, the estimators may not be valid. When the dependence structure is not the parameter of interest, we can instead consider modeling only the marginal hazard functions to avoid the specification of the dependence structure.

Marginal Models

The dependence of related failure times complicates the analysis of multivariate failure time data. Because of censoring, this dependence poses a greater challenge than it does for uncensored multivariate data.
By using a marginal model approach, we only specify the marginal model for each failure time variable and leave the dependence among failure times unspecified. Therefore, if we are only interested in the marginal regression parameters and treat the dependence among failure times as a nuisance, a marginal model approach can be used. Specifically, in a marginal model approach for regression analysis of multivariate failure time data, we use the following two steps: first, fit each failure time variable using a univariate model, ignoring the possible dependence among the multivariate failure time variables; then, replace the naive covariance matrix with a robust covariance matrix estimator to account for possible dependence among the multivariate failure time variables.

Let $(T_{ij}, C_{ij}, \delta_{ij}, Z_{ij})$, $i = 1, \ldots, n$, $j = 1, \ldots, J$, denote the failure time, censoring time, censoring indicator, and $p_j$-vector of explanatory variables, respectively, for the $j$th failure type of event on the $i$th subject under study. We observe $(X_{ij}, \delta_{ij}, Z_{ij})$, where $X_{ij} = \min(T_{ij}, C_{ij})$ and $\delta_{ij} = I\{T_{ij} < C_{ij}\}$, for $i = 1, \ldots, n$ and $j = 1, \ldots, J$. Denote $Y_{ij}(t) = I\{X_{ij} \ge t\}$, $X_i = (X_{i1}, \ldots, X_{iJ})'$, $Z_i = (Z_{i1}', \ldots, Z_{iJ}')'$, and $\delta_i = (\delta_{i1}, \ldots, \delta_{iJ})'$.

Example 1.9 (Wei, Lin and Weissfeld (1989)) In the setting of the regression analysis of multivariate failure time observations on the same subject, Wei, Lin, and Weissfeld (WLW) modeled the marginal hazard of each failure time variable $X_{ij}$ with a Cox regression model. The dependence structure among distinct failure times on each subject was left unspecified. The hazard for the $j$th failure type on the $i$th subject has the form

$$\lambda_{ij}(t) = \lambda_{0j}(t)\exp\{\boldsymbol{\beta}_j' Z_{ij}(t)\}, \quad t \ge 0,$$

where $\lambda_{0j}(t)$ is an unspecified baseline hazard function for the $j$th failure type and $\boldsymbol{\beta}_j = (\beta_{1j}, \ldots, \beta_{p_j j})'$ is the failure specific regression parameter. Therefore, the $j$th failure specific Cox partial likelihood is

$$L_j(\boldsymbol{\beta}_j) = \prod_{i=1}^{n}\left[\frac{\exp\{\boldsymbol{\beta}_j' Z_{ij}(X_{ij})\}}{\sum_{l \in R_j(X_{ij})}\exp\{\boldsymbol{\beta}_j' Z_{lj}(X_{ij})\}}\right]^{\delta_{ij}},$$

where $R_j(t) = \{l : X_{lj} \ge t\}$ is the set of subjects at risk just prior to time $t$ with respect to the $j$th type of failure. The regression parameters $\boldsymbol{\beta}_j$ are estimated by maximizing the failure specific partial likelihoods and solving the score equation

$$U_j(\boldsymbol{\beta}_j) = \sum_{i=1}^{n}\delta_{ij}\left\{Z_{ij}(X_{ij}) - \frac{\sum_{l=1}^{n} Y_{lj}(X_{ij})\exp[\boldsymbol{\beta}_j' Z_{lj}(X_{ij})]\,Z_{lj}(X_{ij})}{\sum_{l=1}^{n} Y_{lj}(X_{ij})\exp[\boldsymbol{\beta}_j' Z_{lj}(X_{ij})]}\right\} = 0.$$

Under the assumption that $(X_i, \delta_i, Z_i(t))$, $i = 1, \ldots, n$, are independent and identically distributed with bounded $Z_{ij}$, Wei et al. (1989) showed that the resulting estimators across all types of failure, $\sqrt{n}(\hat{\boldsymbol{\beta}}_1' - \boldsymbol{\beta}_1', \ldots, \hat{\boldsymbol{\beta}}_J' - \boldsymbol{\beta}_J')'$, are asymptotically jointly normal with mean zero and a covariance matrix which can be consistently estimated by the sandwich type robust covariance estimator.

It is convenient to let the regression parameters $\beta = (\beta_1, \ldots, \beta_p)'$ be the same for all types of failure. This can always be achieved by introducing failure type specific covariates. We are going to show how this can be done for the WLW model. Let $\beta = (\beta_1, \ldots, \beta_p)' = (\boldsymbol{\beta}_1', \ldots, \boldsymbol{\beta}_J')'$, where $p = \sum_{j=1}^{J} p_j$. If we introduce failure type specific covariates for the $i$th subject on the $j$th failure,

$$Z_{ij}^* = (0_{i1}', \ldots, 0_{i,j-1}', Z_{ij}', 0_{i,j+1}', \ldots, 0_{iJ}')',$$

that is, $Z_{ij}^*$ consists of stacking together $(J-1)$ zero vectors corresponding to the other $(J-1)$ types of failure and the $Z_{ij}$ for the $j$th type of failure, then $\beta' Z_{ij}^*(t) = \boldsymbol{\beta}_j' Z_{ij}(t)$ is the risk score for the $j$th type of failure of the $i$th subject. Denote $Y_{ij}(t) = I\{X_{ij} \ge t\}$, where $I\{\cdot\}$ is an indicator function. Then, $Y_{ij}(t) = 1$ if subject $i$ is at risk and under observation just prior to time $t$ for experiencing the $j$th failure. Therefore, the hazards for the $j$th failure on the $i$th subject in the WLW approach can be re-written as

$$\lambda_{ij}(t) = \lambda_{0j}(t)\exp\{\beta' Z_{ij}^*(t)\}, \quad t \ge 0, \tag{1.9}$$

and the $j$th failure specific Cox partial likelihoods as

$$L_j(\beta) = \prod_{i=1}^{n}\left[\frac{\exp\{\beta' Z_{ij}^*(X_{ij})\}}{\sum_{l \in R_j(X_{ij})}\exp\{\beta' Z_{lj}^*(X_{ij})\}}\right]^{\delta_{ij}}.$$

The corresponding score equations for all $n$ subjects and all $J$ failures are

$$U(\beta) = \sum_{i=1}^{n}\sum_{j=1}^{J}\delta_{ij}\left\{Z_{ij}^*(X_{ij}) - \frac{\sum_{l=1}^{n} Y_{lj}(X_{ij})\exp[\beta' Z_{lj}^*(X_{ij})]\,Z_{lj}^*(X_{ij})}{\sum_{l=1}^{n} Y_{lj}(X_{ij})\exp[\beta' Z_{lj}^*(X_{ij})]}\right\}.$$

Hence, the use of $\beta$ as regression parameters in the failure specific model does not preclude the use of failure specific parameters, since we can take into account the failure specific regression parameters through the use of failure type specific covariates. Notice that if we consider different types of failures as strata, the WLW model stratifies the analysis based on the failure type by using different baselines (cf. (1.3) and (1.9)).

Example 1.10 Lee, Wei and Amato (1992) considered highly stratified data sets which arose from paired eye data on vision loss due to diabetic retinopathy, where photocoagulation was randomly assigned to one eye of each patient. Each pair of eyes on the same subject was treated as a cluster. The marginal hazard for the failure on the $j$th eye of the $i$th subject has the form

$$\lambda_{ij}(t) = \lambda_0(t)\exp\{\beta' Z_{ij}(t)\},$$

where $\lambda_0(t)$ is an unspecified common baseline hazard function and $\beta = (\beta_1, \ldots, \beta_p)'$ is the common regression parameter among the $J$ marginal models. Under the assumption that $(X_i, \delta_i, Z_i(t))$, $i = 1, \ldots, n$, are independent and identically distributed with bounded $Z_{ij}$, Lee, Wei, and Amato (LWA) estimated the regression parameters $\beta$ by maximizing the "partial likelihoods":

$$L(\beta) = \prod_{i=1}^{n}\prod_{j=1}^{J}\left[\frac{\exp\{\beta' Z_{ij}(X_{ij})\}}{\sum_{l=1}^{n}\sum_{m=1}^{J} Y_{lm}(X_{ij})\exp\{\beta' Z_{lm}(X_{ij})\}}\right]^{\delta_{ij}}.$$

The corresponding score equations are

$$U(\beta) = \sum_{i=1}^{n}\sum_{j=1}^{J}\delta_{ij}\left\{Z_{ij}(X_{ij}) - \frac{\sum_{l=1}^{n}\sum_{m=1}^{J} Y_{lm}(X_{ij})\,e^{\beta' Z_{lm}(X_{ij})}\,Z_{lm}(X_{ij})}{\sum_{l=1}^{n}\sum_{m=1}^{J} Y_{lm}(X_{ij})\,e^{\beta' Z_{lm}(X_{ij})}}\right\}.$$

They also showed that the resulting estimators $\sqrt{n}(\hat\beta_1 - \beta_1, \ldots, \hat\beta_p - \beta_p)'$ are asymptotically jointly normal, with zero mean and a covariance matrix which can be consistently estimated by the "sandwich" type covariance estimator.

Consider the failures on different eyes as different failures. As we pointed out previously, the use of common regression parameters does not preclude the use of failure specific parameters.
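The failure type specific covariate construction described for the WLW model can be sketched in a few lines. The helper names `stack_covariates` and `risk_score` and the example dimensions are hypothetical; the sketch only assumes that failure type $j$ has its own covariate vector of length $p_j$ and that the stacked vector places it in block $j$ with zeros elsewhere.

```python
def stack_covariates(z_ij, j, block_sizes):
    """Return the stacked vector Z*_ij: the covariates z_ij for failure type j
    (0-based index) placed in block j of a zero vector of length sum(block_sizes)."""
    if len(z_ij) != block_sizes[j]:
        raise ValueError("z_ij must have length block_sizes[j]")
    stacked = []
    for m, size in enumerate(block_sizes):
        stacked.extend(z_ij if m == j else [0.0] * size)
    return stacked

def risk_score(beta, z):
    """Inner product beta'z."""
    return sum(b * x for b, x in zip(beta, z))

# Hypothetical example: J = 3 failure types with p_1 = 2, p_2 = 1, p_3 = 2.
block_sizes = [2, 1, 2]
beta_blocks = [[0.5, -1.0], [0.3], [2.0, 0.1]]       # failure specific beta_j
beta = [b for block in beta_blocks for b in block]   # stacked beta, length p = 5

z_i2 = [4.0]                                         # covariates for the 2nd failure type
z_star = stack_covariates(z_i2, 1, block_sizes)      # zeros except in block 2
```

With this construction `risk_score(beta, z_star)` equals `risk_score(beta_blocks[1], z_i2)`, which is the identity $\beta' Z_{ij}^*(t) = \boldsymbol{\beta}_j' Z_{ij}(t)$ used in the text.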
Therefore, the only difference between the LWA and WLW models is that LWA postulates a common baseline hazard function among the marginal models whereas WLW allows different baseline hazard functions. In other words, the WLW model stratifies the analysis based on the failure type while the LWA model does not.

Example 1.11 (Liang, Self and Chang (1993)) A different procedure was proposed to estimate the marginal regression parameters for the LWA model in the context of between-subject dependence induced by grouping individuals into clusters. Liang, Self, and Chang (LSC) based their estimators on pairwise comparison of individuals who have failed and independent individuals who are at risk at the time of failure. Their estimating equation is similar to LWA's with

$$\frac{\sum_{l=1}^{n}\sum_{m=1}^{J} Y_{lm}(X_{ij})\exp[\beta' Z_{lm}(X_{ij})]\,Z_{lm}(X_{ij})}{\sum_{l=1}^{n}\sum_{m=1}^{J} Y_{lm}(X_{ij})\exp[\beta' Z_{lm}(X_{ij})]}$$

replaced by pairwise comparisons of independent observations. The resulting estimating equation is

$$U(\beta) = \sum_{i=1}^{n}\sum_{j=1}^{J} I_{\{n_i(X_{ij}) > 0\}}\,\delta_{ij}\left\{Z_{ij}(X_{ij}) - n_i^{-1}(X_{ij})\sum_{l \ne i}\sum_{k} e_{ij,lk}(\beta, X_{ij})\right\} = 0,$$

where $n_i(t) = \sum_{l \ne i}\sum_{k} Y_{lk}(t)$, $I_A$ is the indicator function of the set $A$, and $e_{ij,lk}(\beta, t)$ is given by

$$e_{ij,lk}(\beta, t) = \frac{Y_{ij}(t)Z_{ij}(t)\exp\{Z_{ij}(t)'\beta\} + Y_{lk}(t)Z_{lk}(t)\exp\{Z_{lk}(t)'\beta\}}{Y_{ij}(t)\exp\{Z_{ij}(t)'\beta\} + Y_{lk}(t)\exp\{Z_{lk}(t)'\beta\}}.$$

The asymptotic distributions of estimators of marginal regression parameters are developed by the representation of the estimating equations as V-statistics. Their estimators are consistent and asymptotically normal.

The estimates and their variances from the LWA and LSC procedures are very similar when the data from the Diabetic Retinopathy Study (Lin (1994)) were used. It would be of interest to compare the efficiencies of the LWA robust estimators and the LSC estimators. The LWA procedure is computationally easy and can be performed by using standard statistical software (Therneau (1996)). However, no statistical software is available for the calculation of the LSC procedure.
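To make the LWA procedure concrete, the sketch below solves the common-baseline score equation by Newton-Raphson and then forms a cluster-robust sandwich variance. It is a simplified illustration rather than the software cited above: it assumes a single time-fixed covariate and no tied event times, the per-observation score residuals follow the standard form used for robust Cox variances (assumed here, not derived in this section), and the function name `lwa_fit` and the data are hypothetical.

```python
import math

def lwa_fit(times, events, z, cluster, n_iter=50, tol=1e-10):
    """Working-independence Cox fit (one covariate, common baseline hazard) with a
    cluster-robust variance. Returns (beta, naive_var, robust_var, residuals)."""
    n = len(times)

    def sums(beta, t):
        # Pooled risk-set sums over all observations l with X_l >= t.
        s0 = s1 = s2 = 0.0
        for l in range(n):
            if times[l] >= t:
                w = math.exp(beta * z[l])
                s0 += w
                s1 += w * z[l]
                s2 += w * z[l] * z[l]
        return s0, s1, s2

    def score_info(beta):
        u = info = 0.0
        for i in range(n):
            if events[i]:
                s0, s1, s2 = sums(beta, times[i])
                u += z[i] - s1 / s0
                info += s2 / s0 - (s1 / s0) ** 2
        return u, info

    beta = 0.0
    for _ in range(n_iter):                  # Newton-Raphson on the score equation
        u, info = score_info(beta)
        step = u / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_info(beta)

    # Score residual for each observation, then summed within clusters.
    at_event = {i: sums(beta, times[i]) for i in range(n) if events[i]}
    resid = []
    for i in range(n):
        w_i = 0.0
        if events[i]:
            s0, s1, _ = at_event[i]
            w_i += z[i] - s1 / s0
        for j, (s0, s1, _) in at_event.items():
            if times[i] >= times[j]:         # observation i is at risk at X_j
                w_i -= math.exp(beta * z[i]) / s0 * (z[i] - s1 / s0)
        resid.append(w_i)
    by_cluster = {}
    for i in range(n):
        by_cluster[cluster[i]] = by_cluster.get(cluster[i], 0.0) + resid[i]
    meat = sum(v * v for v in by_cluster.values())
    return beta, 1.0 / info, meat / info ** 2, resid

# Hypothetical paired data: five subjects, one treated (z = 1) and one untreated unit each.
times = [5.0, 3.0, 8.0, 6.0, 2.0, 7.0, 9.0, 4.0, 10.0, 1.0]
events = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1]
z = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
cluster = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
beta_hat, naive_var, robust_var, resid = lwa_fit(times, events, z, cluster)
```

The point estimate is the ordinary pooled Cox estimate; only the variance changes, because the residuals are summed within each cluster before being squared. The residuals sum to the score at `beta_hat`, which is zero up to numerical tolerance.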
In the above marginal approaches, the "partial likelihood" score equations were derived under the working independence assumption. It may be more efficient to use weighted estimating equations that take into account the nature of dependence explicitly, as in the case of longitudinal data (Liang and Zeger (1986)). In an attempt to improve the efficiency of marginal parameter estimators, Cai and Prentice (1995) proposed weighted estimating equations for the estimation of marginal parameters. They proved the resulting estimators remain consistent and asymptotically normal with an estimable covariance matrix under mild regularity conditions on the weight matrices. The weight matrices can be estimated either parametrically or nonparametrically without affecting the asymptotic distribution of the regression parameter estimate. Cai and Prentice (1995) and Prentice and Cai (1992) suggested the use of the inverse matrix of estimated correlations among counting process marginal martingales. Their simulation studies (Cai and Prentice (1995)) indicate that the inclusion of weights in the estimating equations results in efficiency gains only when the dependence among the failure times is strong and the censoring is not heavy. This may occur because it is difficult to construct optimal weight matrices as a result of the censoring and the non-linear nature of the Cox model (Lin (1994)).

Prentice and Hsu (1996) proposed using joint estimating equations for hazard ratio and correlation parameters, which generalized the idea of Zhao and Prentice (1990) and Prentice and Zhao (1991) to multivariate failure time situations. The elements of a multivariate failure time variate are assumed to have marginal hazard functions of the Cox regression form. The pairwise correlations among cumulative hazard variates provide summary measures of the dependence among failure times that do not depend on the corresponding marginal distribution shapes.
In the absence of censorship, the mean and the covariance structure of cumulative baseline hazard variates, in conjunction with standard baseline hazard function estimators, are used to develop joint estimating equations for hazard ratio and cumulative hazard variate correlation parameters. Additional assumptions are required to generalize these estimating equations to allow independent right censorship and time-varying covariates. Semiparametric models are introduced for the joint distribution of pairs of cumulative hazard variates, with emphasis on the special case of the Clayton model. These authors argued that the cumulative hazard correlation estimates enjoy some robustness to departures from the assumed semiparametric model forms under conditions of light censorship, and that the corresponding Clayton model parameter estimates have a useful interpretation under heavy censorship. They showed that estimators of hazard ratio and cumulative hazard variate correlation parameters are consistent and asymptotically normal, and that marginal hazard ratio parameters are generally consistently estimated even if other distributional assumptions do not hold. An attractive feature of these procedures is that it is unnecessary to make assumptions concerning the joint distribution of failure times, beyond marginal hazard models for failure times, the mean and the covariance structure of cumulative baseline hazard variates, and semiparametric models for the joint distribution of pairs of cumulative hazard variates. However, the computation is cumbersome and the cumulative hazard correlation estimates depend on the assumed semiparametric model forms.

Instead of using the semiparametric Cox regression model as a marginal model, Huster, Brookmeyer and Self (1989) used a fully parametric specification of the marginal hazard function under the working independence assumption in their marginal approach for bivariate failure time data.
They showed that the maximum likelihood estimates of regression parameters under the working independence assumption are consistent with respect to a specified marginal distribution, and that their covariance matrix can be consistently estimated by the robust covariance estimate derived. Lin and Wei (1992) and Lee, Wei and Ying (1993) applied the methods of WLW and LWA, respectively, to the case of accelerated failure time models.

Concluding Remarks

A fundamental consideration in choosing a strategy for the analysis of multivariate failure data is whether the dependence among multivariate failure time data is a nuisance or a parameter of intrinsic scientific interest. If dependence is a nuisance, a marginal approach should be used to avoid imposing a special structure on dependence and to account for dependence at the same time. When dependence is of interest in its own right, however, other approaches which explicitly model dependence have to be used. Usually, frailty models are used when between-subject dependence is considered.

1.2.3 Model Misspecification and Robust Inferences

Assume that $X_1, \ldots, X_n$ are independently identically distributed (iid) random variables with specified probability density function $f(x; \theta)$. It is well known that under some regularity conditions $I^{1/2}(\theta)(\hat\theta - \theta)$ converges in distribution to the standard normal $N(0, 1)$ probability law, where $\hat\theta$ is the maximum likelihood estimator (MLE) of $\theta$ and $I(\theta) = -\sum_{i=1}^{n} E[\partial^2 \log f(\theta; X_i)/\partial\theta\,\partial\theta']$ is the (expected) Fisher's information (Cramér (1946)). Efron and Hinkley (1978) have suggested that $I(\theta)$ should be replaced by the "observed information", $\hat{I}(\theta) = -\sum_{i=1}^{n}\partial^2 \log f(\theta; x_i)/\partial\theta\,\partial\theta'$, to provide inferences which are properly conditioned on ancillary statistics.
In the parametric setting, a number of techniques have been suggested for handling misspecified models for independently identically distributed random variables, e.g., Huber (1967), White (1982), Kent (1982), Royall (1986), and Gail, Tan and Piantadosi (1988).

Example 1.12 Huber (1967) discussed the behavior of any estimator $\hat\theta$ from the solution to an estimating equation

$$\sum_{i=1}^{n}\psi(\hat\theta; x_i) = 0.$$

So, the score equation

$$U(\theta) = \sum_{i=1}^{n}\frac{\partial}{\partial\theta}\log f(\theta; x_i) = 0$$

is a special case of the estimating equation, with $\psi(\hat\theta; x_i) = \partial \log f(\theta; x_i)/\partial\theta\,|_{\theta = \hat\theta}$. In particular, Huber was interested in the asymptotic distribution of the MLE $\hat\theta$ for the true parameter $\theta_0$ under model misspecification: the true distribution of the underlying observations is $h(\cdot)$ but misspecified as $f(\cdot)$. Huber showed that under some mild regularity conditions, the MLE $\hat\theta$ converges to a well-defined constant vector $\theta^*$ and $\sqrt{n}(\hat\theta - \theta^*)$ is asymptotically normal with mean vector zero and covariance matrix $I(\theta^*)^{-1}C(\theta^*)I(\theta^*)^{-1}$, where $\theta^*$ satisfies

$$E[\psi(\theta^*; x)] = E\left[\frac{\partial}{\partial\theta}\log f(\theta; x)\right]\bigg|_{\theta = \theta^*} = 0.$$

Huber did not discuss the information theoretic interpretation of the parameter vector $\theta^*$. This interpretation was emphasized by Akaike (1973). He noted that when the true distribution is unknown, the MLE $\hat\theta$ is a natural estimator for the parameters $\theta^*$, the parameter vector which minimizes the Kullback and Leibler (1951) information criterion (KLIC),

$$IC(h : f; \theta) = E\left[\log\frac{h(x)}{f(x; \theta)}\right].$$

Here the expectations are taken with respect to the true distribution. The opposite of $IC(h : f; \theta) = -IC(f : h; \theta)$ is called the entropy of the true distribution $H(x)$ with respect to the working distribution $F(x; \theta)$, where $H(\cdot)$ and $F(\cdot)$ are distribution functions with measurable Radon-Nikodym densities

$$h(x) = \frac{dH(x)}{d\nu} \quad \text{and} \quad f(x; \theta) = \frac{dF(x; \theta)}{d\nu}.$$

Intuitively, $IC(f : h; \theta)$ measures our ignorance about the true distribution. Hence, we might call $\hat\theta$ the 'minimum ignorance' estimator.
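Huber's result in Example 1.12 can be illustrated with a one-parameter sketch. The choice of an exponential working model fitted to uniform data is an assumption made purely for illustration, and the function name is hypothetical. For the working density $f(x; \theta) = \theta e^{-\theta x}$ the KLIC-minimizing value is $\theta^* = 1/E(X)$, so with $X \sim \mathrm{Uniform}(0, 2)$ we have $\theta^* = 1$; the model-based asymptotic variance is $\theta^{*2} = 1$, whereas the sandwich form $I^{-1}CI^{-1}$ gives $\theta^{*4}\,\mathrm{var}(X) = 1/3$.

```python
import random

def exponential_mle_with_sandwich(xs):
    """Fit the working model f(x; theta) = theta * exp(-theta * x) by maximum
    likelihood. Returns (theta_hat, model_based_var, sandwich_var), where both
    variances refer to sqrt(n) * (theta_hat - theta_star)."""
    n = len(xs)
    theta = n / sum(xs)                                  # root of sum(1/theta - x_i) = 0
    info = 1.0 / theta ** 2                              # minus the mean second derivative
    meat = sum((1.0 / theta - x) ** 2 for x in xs) / n   # mean squared score
    return theta, 1.0 / info, meat / info ** 2

random.seed(0)
# True distribution h: Uniform(0, 2), so the exponential working model is misspecified.
sample = [random.uniform(0.0, 2.0) for _ in range(20000)]
theta_hat, naive_var, robust_var = exponential_mle_with_sandwich(sample)
```

In this sketch the model-based variance overstates the true sampling variability by roughly a factor of three, which is the kind of discrepancy the sandwich form is designed to correct.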
Example 1.13 (White (1982)) In the setting of the consequences and detection of model misspecification when using maximum likelihood techniques of estimation and inference, White exploited the properties of the information matrix to yield useful tests for model misspecification and rediscovered Huber's results. Let

$$I_n(\theta) = -\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(\theta; x_i),$$

$$C_n(\theta) = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{\partial}{\partial\theta}\log f(\theta; x_i)\right]\left[\frac{\partial}{\partial\theta}\log f(\theta; x_i)\right]',$$

$$\mathcal{I}(\theta) = -E\left[\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(\theta; x)\right],$$

and

$$C(\theta) = E\left\{\left[\frac{\partial}{\partial\theta}\log f(\theta; x)\right]\left[\frac{\partial}{\partial\theta}\log f(\theta; x)\right]'\right\}.$$

Under some regularity conditions, White showed that the sandwich estimator $I_n^{-1}(\hat\theta)C_n(\hat\theta)I_n^{-1}(\hat\theta)$ converges to $\mathcal{I}^{-1}(\theta^*)C(\theta^*)\mathcal{I}^{-1}(\theta^*)$ almost surely, element by element.

Note that the Fisher's information $\mathcal{I}(\theta)$ should be consistently estimated by either the score derivative $I_n(\hat\theta)$ or the squared score matrix $C_n(\hat\theta)$ under the assumed model $f(\theta; x_i)$, provided that some regularity conditions are satisfied. A significant difference between $I_n(\hat\theta)$ and $C_n(\hat\theta)$ indicates that the assumed model $f(\theta; x_i)$ is incorrect. White suggested using the test statistic $n^{1/2}\{I_n(\hat\theta) - C_n(\hat\theta)\}$ to detect the model misspecification. The asymptotic normality of the test statistic follows easily from a Taylor series expansion and the Lindeberg-Levy central limit theorem.

Now, let us see how the sandwich type of robust covariance matrix is estimated when the true distribution is $h(\cdot) = h(x; \theta_0)$ but incorrectly specified as $f(x; \theta)$. First, we consider maximum likelihood estimation for independent and identically distributed random variables. Then

$$U(\theta) = \sum_{i=1}^{n}\psi(\theta; x_i) = \sum_{i=1}^{n}\frac{\partial}{\partial\theta}\log f(\theta; x_i) = \sum_{i=1}^{n}u_i(\theta).$$

By interchanging the order of the expectation and the derivative,

$$I(\theta) = -\frac{\partial}{\partial\theta}E[U(\theta)] = -E\left[\frac{\partial}{\partial\theta}U(\theta)\right] = -\sum_{i=1}^{n}E\left[\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(\theta; x_i)\right],$$

where the expectation is taken with respect to the true distribution.
Hence, $I(\theta_0)$ is the expected value of the information matrix, which is naturally estimated by the observed information

$$\hat{I}_n(\hat\theta) = -\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(\theta; x_i)\bigg|_{\theta = \hat\theta}.$$

Since $E[U(\theta_0)] = 0$,

$$C(\theta_0) = E[U(\theta_0)U'(\theta_0)] = \sum_{i=1}^{n}E[u_i(\theta_0)u_i'(\theta_0)] + \sum_{i \ne j}E[u_i(\theta_0)u_j'(\theta_0)] \tag{1.10}$$

$$= \sum_{i=1}^{n}E[u_i(\theta_0)u_i'(\theta_0)].$$

Note that the observations are independent; thus, the $u_i(\theta_0)$ are also independent and the cross terms in (1.10) are zero. In view of (1.10), a natural estimator of $C(\theta_0)$ is

$$C_n(\hat\theta) = \frac{1}{n}\sum_{i=1}^{n}u_i(\hat\theta)u_i'(\hat\theta).$$

When the observations are not independent, the estimator of $C(\theta_0)$ must be adjusted accordingly. A reasonable estimate is available when the correlation is confined to clusters. We assume that the data come from clustered sampling with $j = 1, \ldots, k$ clusters, where there may be correlation within clusters but observations from different clusters are independent. By (1.10), the cross-product terms between clusters can be eliminated if we define

$$C(\theta_0) = \sum_{j=1}^{k}E[U_j(\theta_0)U_j'(\theta_0)],$$

where $U_j = \sum_{i=1}^{n_j}u_{ij}$ is the sum over all subjects in the $j$th cluster. An estimator of $C(\theta_0)$ is

$$C_n(\hat\theta) = \frac{1}{n}\sum_{j=1}^{k}[U_j(\hat\theta)U_j'(\hat\theta)].$$

This leads to the modified sandwich estimator $\hat{I}^{-1}(\hat\theta)C_n(\hat\theta)\hat{I}^{-1}(\hat\theta)$.

The marginal approach can be considered a type of "model misspecification" in terms of the dependence structure. This viewpoint is illustrated by the next example.

Example 1.14 (Liang and Zeger (1986)) Let $Y_i = [Y_{i1}, \ldots, Y_{in}]'$ be the $n \times 1$ vector of outcome values and $X_i = (x_{i1}, \ldots, x_{in})'$ be the $n \times p$ matrix of covariate values observed at times $t = 1, \ldots, n$ for the $i$th subject, $i = 1, \ldots, K$. Assume that the marginal density of $Y_{it}$ is

$$f(y_{it}) = \exp[\{y_{it}\theta_{it} - a(\theta_{it}) + b(y_{it})\}\phi],$$

where $\theta_{it} = h(\eta_{it})$, $\eta_{it} = x_{it}\beta$. By ignoring the dependence structure of a subject's observations, they "misspecify" the joint distribution as the product of the marginal distributions.
Under such a working independence assumption, the score equations from a likelihood analysis have the form

$$U_I(\beta) = \sum_{i=1}^{K}X_i'\Delta_i S_i = 0, \tag{1.11}$$

where $\Delta_i = \mathrm{diag}(d\theta_{it}/d\eta_{it})$ is an $n \times n$ matrix and $S_i = Y_i - a_i'(\theta)$ is of order $n \times 1$ for the $i$th subject. The estimator $\hat\beta_I$ is defined as the solution of equation (1.11). Under mild regularity conditions, the resulting estimator $\hat\beta_I$ of $\beta$ is consistent and $K^{1/2}(\hat\beta_I - \beta)$ is asymptotically multivariate Gaussian as $K \to \infty$ with zero mean and covariance matrix $V_I$ given by

$$V_I = \lim_{K \to \infty}K\left(\sum_{i=1}^{K}X_i'\Delta_i A_i\Delta_i X_i\right)^{-1}\left(\sum_{i=1}^{K}X_i'\Delta_i\,\mathrm{cov}(Y_i)\,\Delta_i X_i\right)\left(\sum_{i=1}^{K}X_i'\Delta_i A_i\Delta_i X_i\right)^{-1},$$

where the moment calculations for the $Y_i$'s are taken with respect to the true underlying model and $A_i = \mathrm{diag}\{a''(\theta_{it})\}$. The variance of $\hat\beta_I$, $V_I$, can be consistently estimated by

$$K\,I^{-1}(\hat\beta_I)\left(\left[\sum_{i=1}^{K}X_i'\Delta_i S_i S_i'\Delta_i X_i\right]_{\beta = \hat\beta_I}\right)I^{-1}(\hat\beta_I),$$

where $I(\beta) = \sum_{i=1}^{K}X_i'\Delta_i A_i\Delta_i X_i$. To increase efficiency, Liang and Zeger also took the correlation into account through a class of weighted estimating equations. The resulting estimators of $\beta$ remain consistent. Consistent sandwich variance estimates are also available under the weak assumption that a weighted average of the estimated correlation matrices converges to a fixed matrix.

The consequences of model misspecification for independently identically distributed (iid) failure time data on the statistical inference of the regression parameter $\beta$, based on the assumed Cox regression model, have been investigated by several authors, including Gail, Wieand and Piantadosi (1984), Lagakos and Schoenfeld (1984), Solomon (1984), Struthers and Kalbfleisch (1986), Lagakos (1988), Lin and Wei (1989), and Anderson and Fleming (1995).

Example 1.15 (Struthers and Kalbfleisch (1986)) Assume that $(X_i, \delta_i, Z_i)$, $i = 1, \ldots, n$, are $n$ independent realizations of $(X, \delta, Z)$, that $Z$ is bounded, and that the support of the failure time $T$ properly contains that of the censoring variable. Let $\lambda(t; Z_i(t)) = \lambda_0(t)\exp(\beta' Z_i(t))$ and $\lambda_i(t)$ be the working hazard model and the true hazard for $(X_i, \delta_i, Z_i)$, respectively. Let

$$S^{(d)}(\beta, t) = \frac{1}{n}\sum_{i=1}^{n}Y_i(t)\exp(\beta' Z_i(t))\,Z_i(t)^{\otimes d}, \qquad S^{(d)}(t) = \frac{1}{n}\sum_{i=1}^{n}Y_i(t)\lambda_i(t)\,Z_i(t)^{\otimes d},$$

$s^{(d)}(t) = E(S^{(d)}(t))$, and $s^{(d)}(\beta, t) = E(S^{(d)}(\beta, t))$, for $d = 0, 1, 2$, where for a column vector $a$, $a^{\otimes 2}$ refers to the matrix $aa'$, $a^{\otimes 1}$ the vector $a$, $a^{\otimes 0}$ the scalar 1, and the expectations are taken with respect to the true model of $(X_i, \delta_i, Z_i)$, $i = 1, \ldots, n$. The logarithm of the partial likelihood based on the working hazard model can be expressed as

$$l(\beta) = \sum_{i=1}^{n}\delta_i[\beta' Z_i(X_i) - \log(S^{(0)}(\beta, X_i))].$$

The corresponding score function is

$$U(\beta) = \sum_{i=1}^{n}\delta_i\left\{Z_i(X_i) - \frac{S^{(1)}(\beta, X_i)}{S^{(0)}(\beta, X_i)}\right\}.$$

Then, the maximum partial likelihood estimator $\hat\beta$, the solution to $U(\beta) = 0$, converges in probability to a constant vector $\beta^*$, which is the unique solution to the system of equations

$$\int_0^\infty s^{(1)}(t)\,dt - \int_0^\infty\frac{s^{(1)}(\beta, t)}{s^{(0)}(\beta, t)}\,s^{(0)}(t)\,dt = 0.$$

The asymptotic distribution of $\hat\beta$ under a possibly misspecified Cox model for iid realizations of $(X, \delta, Z)$ is given in the following example.

Example 1.16 (Lin and Wei (1989)) In addition to the assumptions in Struthers and Kalbfleisch (1986), assume that $I(\beta^*)$ is positive definite and other (unspecified) regularity conditions, where

$$I(\beta^*) = \int_0^\infty\left\{\frac{s^{(2)}(\beta^*, t)}{s^{(0)}(\beta^*, t)} - \left[\frac{s^{(1)}(\beta^*, t)}{s^{(0)}(\beta^*, t)}\right]^{\otimes 2}\right\}s^{(0)}(t)\,dt.$$

If the maximum partial likelihood estimator $\hat\beta$ is obtained under a possibly misspecified Cox model, the random vector $\sqrt{n}(\hat\beta - \beta^*)$ is asymptotically normal with mean vector zero and with a covariance matrix that can be consistently estimated by $V(\hat\beta)$, where

$$V(\beta) = I^{-1}(\beta)C_n(\beta)I^{-1}(\beta), \qquad I(\beta) = -\frac{1}{n}\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta'}, \qquad C_n(\beta) = \frac{1}{n}\sum_{i=1}^{n}W_i(\beta)^{\otimes 2},$$

and

$$W_i(\beta) = \delta_i\left\{Z_i(X_i) - \frac{S^{(1)}(\beta, X_i)}{S^{(0)}(\beta, X_i)}\right\} - \sum_{j=1}^{n}\frac{\delta_j Y_i(X_j)\exp(\beta' Z_i(X_j))}{nS^{(0)}(\beta, X_j)}\left\{Z_i(X_j) - \frac{S^{(1)}(\beta, X_j)}{S^{(0)}(\beta, X_j)}\right\}.$$

Notice that the "sandwich" type robust covariance estimator $V(\hat\beta)$ for the Cox model takes a more complicated form than its parametric counterpart because the score function is no longer a sum of $n$ iid random vectors in a partial likelihood setting. In Chapter 6, we will generalize the results of Struthers and Kalbfleisch (1986) and Lin and Wei (1989) from univariate failure time data to multivariate failure time data in a marginal model approach.

Anderson and Fleming (1995) demonstrated the danger of generalizing the intuition gained from analysis of covariance in linear models to the Cox regression model. Their results show that covariate adjustment in Cox regression models has little effect on the variance but may significantly improve the accuracy of the treatment effect estimator.

We point out in passing that in a marginal model approach, we intentionally "misspecify" the joint distribution of dependent responses as the product of the marginal distributions, as if these responses were independent, to avoid specifying the dependence structure. Then, we use a robust covariance estimator to account for the misspecified dependence structure. This idea can be attributed to Huber (1967). The connection between the robust covariance for model misspecification and the marginal approach has not been made explicitly in the literature.

1.3 Synopsis of Research

This research develops suitable methods for modeling multivariate failure time data with generalized dependence structure. Here, we consider a generalized dependence structure which consists of both between-subject and within-subject dependence, instead of dealing with either between-subject dependence or within-subject dependence only. We propose using a marginal model approach to analyze multivariate failure time data with generalized dependence structure when scientific interests are in the effects of covariates on the risk of failures and knowledge of the dependence structure is not available.
In Chapter 2, three types of marginal models in the form of Cox regression models are proposed based on whether or not the baseline hazards are distinguishable: distinct baseline models, common baseline models, and mixed baseline models. The proposed marginal models generalize the WLW model and the LWA model. Our distinct baseline model is equivalent to the WLW model if we stratify our analysis based on both failure types and subjects in a cluster. Similarly, our common baseline model becomes a more general setup of the LWA model by allowing multiple failures per subject. However, in order to apply WLW models or LWA models, we have to assume either different baselines for each combination of failure types and subjects in a family, or an identical baseline for all combinations of failures and subjects in a stratum. Our mixed baseline hazard model offers significantly greater flexibility and applicability for modeling, and enables us to deal with some applied problems that existing methods cannot handle. Inference on regression parameters for each type of model is based on a pseudo score equation under the independence working assumption, which is in the framework of generalized estimating equations.

Relying on the theory of multivariate counting processes, stochastic integrals, and local martingales, we prove in Chapter 3 that the estimators for the proposed models are consistent and asymptotically normal with a robust covariance matrix which can be consistently estimated. The mathematical background for the proofs can be found in Gill (1980), Andersen and Gill (1982), Fleming and Harrington (1991), and Andersen et al. (1993). Simulation studies are conducted in Chapter 4 to assess the adequacy of the proposed large-sample approximation for practical sample sizes. The methodology is illustrated in Chapter 5 on the data from the Framingham Heart Study.
The consequences of marginal model misspecification are discussed in Chapter 6 and the numerical investigations are given in Chapter 7. In Chapter 8, we summarize this doctoral research and propose possible extensions and areas for further research.

Chapter 2

Marginal Modeling and Estimation

2.1 Introduction

In this chapter we describe three types of marginal models proposed for censored data with generalized dependence structure. In Section 2.2 some basic notation and definitions are presented. The proposed models are described and the estimation methods are introduced in Section 2.3. We conclude the chapter with a brief discussion of the relationship between the proposed marginal models and the WLW and LWA models, and a quick overview of the asymptotic properties of the parameter estimators, which are developed in Chapter 3.

2.2 Notation and Definitions

Suppose that there are $n$ independent families. In family $i$, there are $J_i$ members. For member $j$ in family $i$, $K_{ij}$ types of failures may occur. We use $(i, j, k)$ to denote the $k$th type of failure on member $j$ in the $i$th family, for $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, J_i$; $k = 1, 2, \ldots, K_{ij}$. Here, "family" (the independent unit) is used as a generic term for a cluster of related observations (for example, failure times of CHD and CVA observed from family members in the Framingham Heart Study) and "member" for an individual in a cluster.

The data available in regression problems for multivariate failure time data with generalized dependence are observations on the triplet $(X_{ijk}, \delta_{ijk}, Z_{ijk})$ for $(i, j, k)$, where $X_{ijk}$ is the minimum of the potential failure time and the potential censoring time $(T_{ijk}, C_{ijk})$ for the $k$th type of failure on member $j$ in family $i$; the indicator of observing failure for $(i, j, k)$ is $\delta_{ijk} = I\{T_{ijk} \le C_{ijk}\}$; and the covariates for $(i, j, k)$ are denoted by a $p$-dimensional column vector $Z_{ijk} = (Z_{ijk1}, \ldots, Z_{ijkp})'$, which may be time-varying.
If the number of members in families and/or the number of failure types on members are not equal, set $J = \max_i(J_i)$ and $K = \max_{ij}(K_{ij})$. A missing failure time $T_{ijk}$ or missing covariates $Z_{ijk}$ can be accommodated by setting the corresponding $C_{ijk}$ to zero. This implies that $X_{ijk} = 0$ and $\delta_{ijk} = 0$, since $T_{ijk}$ is positive. Hence, such cases make no contribution to the estimation procedure. Consequently, we assume implicitly that data are missing completely at random in the sense of Rubin (1976). Therefore, without loss of generality, we assume hereafter that there are $K$ types of failure which may be experienced by each member and $J$ members within each family.

In terms of counting process notation, we let $Y_{ijk}(t) = I(X_{ijk} \ge t)$ denote the at-risk indicator process for $(i, j, k)$; $N_{ijk}(t) = I(X_{ijk} \le t, \delta_{ijk} = 1)$ be the counting process which registers failure for $(i, j, k)$; and $\lambda_{ijk}(t)$ and
\[ M_{ijk}(t) = N_{ijk}(t) - \int_0^t Y_{ijk}(u)\lambda_{ijk}(u)\,du \]
denote the corresponding marginal hazard and marginal martingale, respectively. The marginal hazard rate of $(i, j, k)$ is defined as
\[ \lambda_{ijk}(t; Z) = \lim_{h \downarrow 0} \frac{1}{h} \Pr(t \le T_{ijk} < t + h \mid T_{ijk} \ge t; Z), \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, J;\ k = 1, 2, \ldots, K. \]

As in most failure time data analyses, we assume both independent right censorship and a noninformative censoring scheme. Independent censoring would hold, for example, if for a given $(i, j, k)$ the failure time $T_{ijk}$ and the censoring time $C_{ijk}$ are statistically independent, for $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, J$; $k = 1, 2, \ldots, K$, conditional on the covariates $Z_{ijk}$. It is presumed that $J$ and $K$ are small relative to $n$, the number of independent families. We also assume that the number of distinct events, $K$, is fixed for each subject and the number of subjects, $J$, is fixed in each family; that is, $J$ and $K$ are not functions of $n$.
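The notation above can be made concrete with a small sketch. The following Python fragment is only an illustration, not part of the original development: the names `Observation`, `observe`, `at_risk`, and `counting` are hypothetical, and a single $(i, j, k)$ combination with scalar times is assumed. It shows how $X_{ijk}$, $\delta_{ijk}$, $Y_{ijk}(t)$, and $N_{ijk}(t)$ follow from a potential failure time and a potential censoring time, and why setting the censoring time to zero removes a missing observation from the analysis.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Observed data for one (i, j, k) combination (hypothetical container)."""
    x: float    # X_ijk = min(T_ijk, C_ijk)
    delta: int  # delta_ijk = I{T_ijk <= C_ijk}


def observe(t_fail: float, t_cens: float) -> Observation:
    """Build the observed pair from potential failure and censoring times."""
    return Observation(x=min(t_fail, t_cens), delta=int(t_fail <= t_cens))


def at_risk(obs: Observation, t: float) -> int:
    """At-risk indicator Y_ijk(t) = I(X_ijk >= t)."""
    return int(obs.x >= t)


def counting(obs: Observation, t: float) -> int:
    """Counting process N_ijk(t) = I(X_ijk <= t, delta_ijk = 1)."""
    return int(obs.x <= t and obs.delta == 1)


# A failure observed at time 2.0 (censoring would have occurred at 5.0).
failed = observe(2.0, 5.0)
# A missing observation, encoded by setting the censoring time to zero.
missing = observe(2.0, 0.0)
```

With this encoding, the `missing` observation has `x == 0` and `delta == 0`, so it is never at risk at any positive time and never registers a failure, which is the sense in which such cases make no contribution to estimation.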
Because we are studying the occurrence of random events in time, all results are presented on a continuous time interval $\mathcal{T} = [0, \tau)$, for a given terminal time $\tau$, $0 < \tau \le \infty$.

2.3 Marginal Hazard Models and Estimation

We propose the following three types of marginal hazard models: the distinct baseline hazard model, the mixed baseline hazard model, and the common baseline hazard model. All these models are of the Cox regression type.

2.3.1 Distinct Baseline Hazard Models

The marginal distinct baseline hazard model for the $k$th type of failure on member $j$ in family $i$ is
\[ \lambda_{ijk}(t; Z(t)) = \lambda_{0jk}(t)\exp\{\beta' Z_{ijk}(t)\}, \quad t \in [0, \tau), \tag{2.1} \]
for $i = 1, \ldots, n$; $j = 1, \ldots, J$; and $k = 1, \ldots, K$; where $\lambda_{0jk}(t)$ is an unspecified, yet fixed, nonnegative baseline hazard function and $\beta = (\beta_1, \ldots, \beta_p)'$ is a fixed column vector of $p$ regression parameters. We use $\beta$ to denote the true value of the regression parameter as well as a generic argument. Where it is necessary to make the distinction, $\beta_0$ is used to denote the true parameter value. Note that the assumption of a common $\beta_0$ for all $J$ members and $K$ types of failure does not forfeit the generality of model (2.1). We can obtain member-type-specific or failure-type-specific distinct $\beta_0$ by using member-type-specific or failure-type-specific covariates, respectively.

In this model, a different baseline hazard $\lambda_{0jk}(t)$ is assumed for each member and for each type of failure in a family. The distinct baseline model is useful when the failure events are of different types and the members in a family have heterogeneous susceptibilities to the same type of failure. Take the Framingham Heart Study as an example. Suppose that we are interested in the effect of some risk factors on the time to myocardial infarction and the time to cancer involving husbands and wives.
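The incidence and the prevalence of myocardial infarction are different from the incidence and the prevalence of cancer, and husbands and wives are likely to have dissimilar physiological resistance to myocardial infarction or to cancer. It is reasonable for us to assume different baseline hazard functions for myocardial infarction and for cancer, and also different baseline hazard functions for the husband and the wife from the same family. Hence, the distinct baseline marginal models of (2.1) should be considered.

The marginal partial likelihood for the $k$th type of failure on the $j$th member among the $n$ independent families is
\[ PL_{jk}(\beta) = \prod_{i=1}^{n} \left[ \frac{\exp\{\beta' Z_{ijk}(X_{ijk})\}}{\sum_{l \in R_{jk}(X_{ijk})} \exp\{\beta' Z_{ljk}(X_{ijk})\}} \right]^{\delta_{ijk}}, \quad j = 1, \ldots, J,\ k = 1, \ldots, K, \]
where $R_{jk}(t) = \{l : X_{ljk} \ge t\}$, that is, the set of families at risk just prior to time $t$ with respect to the $k$th type of failure on member $j$. If the $T_{ijk}$ were statistically independent (that is, under a working independence assumption), the (pseudo) partial likelihood would be
\[ PL(\beta) = \prod_{k=1}^{K} \prod_{j=1}^{J} \prod_{i=1}^{n} \left[ \frac{\exp\{\beta' Z_{ijk}(X_{ijk})\}}{\sum_{l \in R_{jk}(X_{ijk})} \exp\{\beta' Z_{ljk}(X_{ijk})\}} \right]^{\delta_{ijk}}. \tag{2.2} \]
When the pseudo partial likelihood in (2.2) is formulated by means of counting processes, it has the form
\[ PL(\beta) = \prod_{k=1}^{K} \prod_{j=1}^{J} \prod_{i=1}^{n} \prod_{u \ge 0} \left\{ \frac{Y_{ijk}(u)\exp[\beta' Z_{ijk}(u)]}{\sum_{l=1}^{n} Y_{ljk}(u)\exp[\beta' Z_{ljk}(u)]} \right\}^{dN_{ijk}(u)}, \]
where $dN_{ijk}(t) = N_{ijk}(t) - N_{ijk}(t-)$. Let $l(\beta, t)$ be the logarithm of $PL(\beta)$ at time $t$, that is, $l(\beta, t) = \log PL(\beta, t)$, for which the pseudo partial likelihood score processes are
\[ U(\beta, t) = \frac{\partial l(\beta, t)}{\partial \beta} = \sum_{k=1}^{K} \sum_{j=1}^{J} \sum_{i=1}^{n} \int_0^t \left\{ Z_{ijk}(u) - \frac{\sum_{l=1}^{n} Y_{ljk}(u) Z_{ljk}(u)\exp[\beta' Z_{ljk}(u)]}{\sum_{l=1}^{n} Y_{ljk}(u)\exp[\beta' Z_{ljk}(u)]} \right\} dN_{ijk}(u). \tag{2.3} \]
Now, we are going to show that (2.3) is equal to
\[ \sum_{k=1}^{K} \sum_{j=1}^{J} \sum_{i=1}^{n} \int_0^t \left\{ Z_{ijk}(u) - \frac{\sum_{l=1}^{n} Y_{ljk}(u) Z_{ljk}(u)\exp[\beta' Z_{ljk}(u)]}{\sum_{l=1}^{n} Y_{ljk}(u)\exp[\beta' Z_{ljk}(u)]} \right\} dM_{ijk}(u). \tag{2.4} \]
Notice that, since $dN_{ijk}(u) = dM_{ijk}(u) + Y_{ijk}(u)\lambda_{ijk}(u)\,du$,
\[ (2.3) = (2.4) + \sum_{k=1}^{K} \sum_{j=1}^{J} \int_0^t \left\{ \sum_{i=1}^{n} Z_{ijk}(u) Y_{ijk}(u)\lambda_{ijk}(u)\,du - \sum_{i=1}^{n} Y_{ijk}(u)\lambda_{ijk}(u) \frac{\sum_{l=1}^{n} Y_{ljk}(u) Z_{ljk}(u)\exp[\beta' Z_{ljk}(u)]}{\sum_{l=1}^{n} Y_{ljk}(u)\exp[\beta' Z_{ljk}(u)]}\,du \right\}. \tag{2.5} \]
Under model (2.1), $Y_{ijk}(u)\lambda_{ijk}(u) = \lambda_{0jk}(u) Y_{ijk}(u)\exp[\beta' Z_{ijk}(u)]$, so the last term in (2.5) is
\[ \sum_{k=1}^{K} \sum_{j=1}^{J} \int_0^t \left\{ \sum_{i=1}^{n} Z_{ijk}(u) Y_{ijk}(u)\lambda_{ijk}(u)\,du - \sum_{l=1}^{n} Z_{ljk}(u) Y_{ljk}(u)\lambda_{ljk}(u)\,du \right\} = 0. \]
Thus, we have (2.3) equal to (2.4).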
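The pseudo partial likelihood (2.2) and its score (2.3) can be sketched numerically. The fragment below is a minimal illustration under stated assumptions, not the computation used in this work: covariates are taken to be time-fixed, each failure uses the full risk set at its failure time (so tied failure times are treated in the Breslow manner), and the function name `pseudo_loglik_and_score` and the record layout `(stratum, x, delta, z)` are hypothetical. For the distinct baseline model, the stratum label of an observation is its (member, failure type) pair $(j, k)$, so that risk sets are formed across families within each $(j, k)$.

```python
import math


def pseudo_loglik_and_score(beta, records):
    """Log pseudo partial likelihood and score under working independence.

    records: list of (stratum, x, delta, z), where stratum identifies the
    baseline hazard (here the pair (j, k)), x is the observed time, delta
    the failure indicator, and z a list of p time-fixed covariates.
    """
    p = len(beta)
    loglik = 0.0
    score = [0.0] * p
    for s_i, x_i, d_i, z_i in records:
        if d_i != 1:
            continue  # censored observations contribute only through risk sets
        s0 = 0.0          # sum of exp(beta'Z_l) over the risk set
        s1 = [0.0] * p    # sum of Z_l * exp(beta'Z_l) over the risk set
        for s_l, x_l, _, z_l in records:
            if s_l == s_i and x_l >= x_i:  # same baseline, still at risk
                w = math.exp(sum(b * z for b, z in zip(beta, z_l)))
                s0 += w
                for a in range(p):
                    s1[a] += w * z_l[a]
        loglik += sum(b * z for b, z in zip(beta, z_i)) - math.log(s0)
        for a in range(p):
            score[a] += z_i[a] - s1[a] / s0
    return loglik, score


# Three families observed for member 1, failure type 1, plus one observation
# for member 2, failure type 1, which forms its own risk set.
data = [
    ((1, 1), 1.0, 1, [1.0]),
    ((1, 1), 2.0, 1, [0.0]),
    ((1, 1), 3.0, 0, [1.0]),
    ((2, 1), 0.5, 1, [1.0]),
]
loglik0, score0 = pseudo_loglik_and_score([0.0], data)
```

A root of the score in $\beta$ (found, for instance, by Newton-Raphson iteration) would then play the role of the estimator defined by the score equations. The quadratic-time double loop is chosen for readability rather than efficiency.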
The maximum pseudo partial likelihood estimator $\hat{\beta}_d$ for $\beta$ is defined as the solution to the pseudo partial likelihood score equations
\[ U(\beta) = \frac{\partial \log PL(\beta, \tau)}{\partial \beta} = 0. \tag{2.6} \]

2.3.2 Common Baseline Hazard Models

The marginal common baseline hazard model for the $k$th type of failure on member $j$ in family $i$ is
\[ \lambda_{ijk}(t; Z(t)) = \lambda_{0}(t)\exp\{\beta' Z_{ijk}(t)\}, \quad t \in [0, \tau), \tag{2.7} \]
for $i = 1, \ldots, n$; $j = 1, \ldots, J$; and $k = 1, \ldots, K$. Here, an identical baseline hazard function $\lambda_0(t)$ is assumed for all types of failure and all members in a family. A common baseline hazard model is applied when failures are of the same type and members in a given family have similar susceptibilities to the same type of failure. For example, consider a study on vision loss caused by diabetic retinopathy involving siblings, where we treat the vision loss of each eye as one type of failure. There is no good evidence to assume different susceptibilities to vision loss among siblings, and there are no biological differences to suggest that one eye is superior or inferior to the other. Therefore, the common baseline hazard model is the natural choice.

Under a working independence assumption, the pseudo partial likelihood for the common baseline model (2.7) is
\[ PL(\beta) = \prod_{k=1}^{K} \prod_{j=1}^{J} \prod_{i=1}^{n} \left[ \frac{\exp\{\beta' Z_{ijk}(X_{ijk})\}}{\sum_{(l,f,g) \in R(X_{ijk})} \exp\{\beta' Z_{lfg}(X_{ijk})\}} \right]^{\delta_{ijk}}, \]
where $R(t) = \{(l, f, g) : X_{lfg} \ge t\}$, i.e., the set of family, member, and failure type combinations at risk just prior to time $t$. The maximum pseudo partial likelihood estimator $\hat{\beta}_c$ for $\beta_0$ is defined as the solution to the pseudo partial likelihood score equations
\[ \sum_{k=1}^{K} \sum_{j=1}^{J} \sum_{i=1}^{n} \int_0^{\tau} \left\{ Z_{ijk}(u) - \frac{\sum_{g=1}^{K} \sum_{f=1}^{J} \sum_{l=1}^{n} Y_{lfg}(u) Z_{lfg}(u)\exp[\beta' Z_{lfg}(u)]}{\sum_{g=1}^{K} \sum_{f=1}^{J} \sum_{l=1}^{n} Y_{lfg}(u)\exp[\beta' Z_{lfg}(u)]} \right\} dM_{ijk}(u) = 0. \tag{2.8} \]

2.3.3 Mixed Baseline Hazard Models

The mixed baseline hazard model lies between the distinct baseline hazard model and the common baseline hazard model. It is useful when, for example, the baseline hazards are the same for all members, but are heterogeneous for different failure types.
In more general cases, the mixed baseline hazard model may have an identical baseline for some members of a family but different baselines for the rest of the members in that family, and/or the same baseline for some types of failures but different baselines for other types of failures. We refer to the model as a mixed baseline hazard model if the baseline hazard function is identical for some of the combinations of members and failure types but is different for other combinations. Again, take the Framingham Heart Study as an example. Suppose that we are interested in studying the hazard rates for myocardial infarction and cancer among siblings. For each type of failure, myocardial infarction or cancer, we need to use different baselines for husbands, wives, and siblings to account for the different physiological resistance among them, but an identical baseline for all siblings because of their similar susceptibilities to the same type of failure. Hence, a mixed baseline hazard model should be considered.

We give here two mixed baseline hazard models: the mixed baseline hazard model with a different baseline for each member and an identical baseline for all failure types,
\[ \lambda_{ijk}(t; Z(t)) = \lambda_{0j}(t)\exp\{\beta' Z_{ijk}(t)\}, \quad t \in [0, \tau), \tag{2.9} \]
and the mixed baseline hazard model with an identical baseline for all members and different baselines for each type of failure,
\[ \lambda_{ijk}(t; Z(t)) = \lambda_{0k}(t)\exp\{\beta' Z_{ijk}(t)\}, \quad t \in [0, \tau). \]

Under a working independence assumption, the pseudo partial likelihood for the mixed baseline model (2.9) is
\[ PL(\beta) = \prod_{k=1}^{K} \prod_{j=1}^{J} \prod_{i=1}^{n} \left[ \frac{\exp\{\beta' Z_{ijk}(X_{ijk})\}}{\sum_{(l,g) \in R_j(X_{ijk})} \exp\{\beta' Z_{ljg}(X_{ijk})\}} \right]^{\delta_{ijk}}, \]
where $R_j(t) = \{(l, g) : X_{ljg} \ge t\}$, that is, the set of family and failure type combinations at risk just prior to time $t$ with respect to member $j$. The maximum pseudo partial likelihood estimator $\hat{\beta}_m$ for $\beta_0$ in the mixed model (2.9) is defined as the solution to the pseudo partial likelihood score equations
\[ \sum_{k=1}^{K} \sum_{j=1}^{J} \sum_{i=1}^{n} \int_0^{\tau} \left[ Z_{ijk}(u) - \frac{\sum_{g=1}^{K} \sum_{l=1}^{n} Y_{ljg}(u) Z_{ljg}(u)\exp\{\beta' Z_{ljg}(u)\}}{\sum_{g=1}^{K} \sum_{l=1}^{n} Y_{ljg}(u)\exp\{\beta' Z_{ljg}(u)\}} \right] dM_{ijk}(u) = 0. \tag{2.10} \]
Similarly, the maximum pseudo partial likelihood estimator $\hat{\beta}_m$ for $\beta_0$ in the mixed baseline hazard model with an identical baseline for all members and a different baseline for each type of failure is defined as the solution to the system of pseudo partial likelihood score equations
\[ \sum_{k=1}^{K} \sum_{j=1}^{J} \sum_{i=1}^{n} \int_0^{\tau} \left[ Z_{ijk}(u) - \frac{\sum_{f=1}^{J} \sum_{l=1}^{n} Y_{lfk}(u) Z_{lfk}(u)\exp\{\beta' Z_{lfk}(u)\}}{\sum_{f=1}^{J} \sum_{l=1}^{n} Y_{lfk}(u)\exp\{\beta' Z_{lfk}(u)\}} \right] dM_{ijk}(u) = 0. \]

2.4 Concluding Remarks

We propose using a marginal model approach to analyze multivariate failure time data with generalized dependence structure when the specific interest centers on the effect of risk factors. Based on whether or not the baseline hazards are distinguishable, three types of marginal models in the form of Cox regression models are proposed: distinct baseline models, common baseline models, and mixed baseline models. When $J = 1$ and $K = 1$, it is a trivial consequence that the univariate Cox regression model can be obtained from any one of the three models. When $J = 1$ and $K > 1$, or $J > 1$ and $K = 1$, we obtain the WLW models from the distinct baseline models, the LWA models from the common baseline models, and either the WLW models or the LWA models from the mixed baseline models, depending on whether $J = 1$ or $K = 1$ and on which baseline hazards are assumed to differ in the mixed baseline models.

The proposed marginal models generalize the WLW model and the LWA model. Our distinct baseline model is equivalent to the WLW model if we stratify our analysis based on both failure types and subjects in a cluster. Similarly, our common baseline model becomes a more general setup of the LWA model by allowing multiple failures per subject. However, in order to apply WLW models or LWA models, we have to assume either different baselines for each combination of failure types and members in a family, or an identical baseline for all combinations of failures and subjects in a stratum, which may not be appropriate in applications.
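One way to read the relationship among the three model types is that they differ only in which (member, failure type) combinations share a baseline hazard, and therefore a risk set. The short Python sketch below illustrates this reading; it is an assumption-laden illustration rather than part of the original development, and the function names and the string labels for the model types are hypothetical.

```python
def baseline_stratum(j, k, kind):
    """Label of the baseline hazard shared by member j and failure type k."""
    if kind == "distinct":
        return (j, k)        # lambda_0jk: one baseline per (member, type)
    if kind == "common":
        return ()            # lambda_0: a single baseline for all
    if kind == "mixed_by_member":
        return (j,)          # lambda_0j, as in model (2.9)
    if kind == "mixed_by_failure_type":
        return (k,)          # lambda_0k: one baseline per failure type
    raise ValueError(f"unknown model type: {kind}")


def count_baselines(J, K, kind):
    """Number of distinct baseline hazards implied for J members, K types."""
    labels = {
        baseline_stratum(j, k, kind)
        for j in range(1, J + 1)
        for k in range(1, K + 1)
    }
    return len(labels)
```

For example, with $J = 3$ members and $K = 2$ failure types, the distinct model implies six baselines, the common model one, and the two mixed models above three and two, respectively. More general mixed models correspond to any other grouping of the $(j, k)$ pairs.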
Our mixed baseline model offers significantly greater modeling flexibility and applicability, and enables us to deal with some applied problems that existing methods cannot handle. In the next chapter, we shall prove that the estimators for all three types of models are consistent and asymptotically normal with a covariance matrix which can be consistently estimated, under some sufficient regularity conditions and provided that the marginal hazard models are correctly specified.

Chapter 3

Asymptotic Distributions of Parameter Estimators

3.1 Introduction

In this chapter we develop the asymptotic distribution theory for the parameter estimators for the three types of models described in Chapter 2. First, in Section 3.2, we state two lemmas which are useful in proving consistency. Because the technical development for the mixed model combines the features of both the distinct baseline hazard model and the common baseline hazard model, we confine our attention to the mixed baseline hazard model. In Section 3.3 we derive the consistency and asymptotic normality of the estimators for the mixed baseline model. The asymptotic distributions of the estimators for the distinct baseline hazard model and for the common baseline hazard model are given in Section 3.4 and Section 3.5, respectively, without proof for the sake of conciseness.

3.2 Two Useful Lemmas

When proving consistency, we often use two special cases of Lenglart's Inequality, stated in the following lemma. The lemma is drawn from Lemma 8.2.1 in Fleming and Harrington (1991), and its proof can be found there. Essentially, the lemma says that we can bound the probability of a large value of a local submartingale anywhere in the whole time interval $[0, \tau]$ in terms of the probability of a large value of its compensator at the endpoint $\tau$ alone. In other words, the compensator dominates the local submartingale.

Lemma 3.1 Let $N$ be a univariate counting process with continuous compensator $A$, and let $M = N - A$.
Let $H$ be a locally bounded and predictable process. Then for all $\delta, \eta > 0$ and any $t \in [0, \tau]$,

1. $\Pr\{N(t) \ge \eta\} \le \dfrac{\delta}{\eta} + \Pr\{A(t) \ge \delta\}$;

2. $\Pr\left\{ \sup_{0 \le y \le t} \left| \int_0^y H(x)\,dM(x) \right| \ge \eta \right\} \le \dfrac{\delta}{\eta^2} + \Pr\left\{ \int_0^t H^2(x)\,dA(x) \ge \delta \right\}$.

The next lemma is taken from Appendix II of Andersen and Gill (1982) and its proof is given there. This lemma is an extension of a result from standard convexity theory on finding extrema in sequences of concave functions. Basically, the lemma states that if a sequence of random concave functions converges pointwise in probability to a real-valued function on an open convex set $E$, then the convergence is uniform in probability on compact subsets of $E$.

Lemma 3.2 Let $E$ be an open convex subset of $R^p$. Let $F_1, F_2, \ldots$ be a sequence of random concave functions on $E$ and $F$ a real-valued function on $E$ such that, $\forall x \in E$,
\[ F_n(x) \stackrel{P}{\longrightarrow} F(x), \quad \text{as } n \to \infty. \]
Then:

1. The function $F$ is concave.

2. For all compact subsets $A$ of $E$, as $n \to \infty$,
\[ \sup_{x \in A} |F_n(x) - F(x)| \stackrel{P}{\longrightarrow} 0. \]

3. If $F$ has a unique maximum at $\hat{x}$ and $F_n$ has one at $\hat{X}_n$, then $\hat{X}_n \stackrel{P}{\longrightarrow} \hat{x}$, as $n \to \infty$.

3.3 Mixed Baseline Hazard Model

In addition to the notation we used in Chapter 2, we introduce the following notation for convenience:
\[ N_{\cdot jk}(t) = \sum_{i=1}^{n} N_{ijk}(t), \qquad M_{\cdot jk}(t) = \sum_{i=1}^{n} M_{ijk}(t), \qquad \lambda_{\cdot jk}(t) = \sum_{i=1}^{n} \lambda_{ijk}(t), \]
and, for $d = 0, 1, 2$,
\[ S_{j\cdot}^{(d)}(\beta, t) = \sum_{k=1}^{K} S_{jk}^{(d)}(\beta, t). \]
For two $p$-dimensional vectors $a$ and $b$, the outer product of $a$ and $b$ is denoted as $a \otimes b$, which is a $p$ by $p$ matrix $ab'$ with the $(i, j)$ element $(ab')_{ij} = a_i b_j$. We also write $a^{\otimes 2}$ for the matrix $aa'$. The norm of a vector $a$ or a matrix $A$ is $\|a\| = \max_i |a_i|$ or $\|A\| = \max_{i,j} |A_{ij}|$, respectively. For any vector $a$, $|a|$ is the Euclidean norm: $|a| = \sqrt{\sum_{i} a_i^2}$. For any scalar $a$, $\|a\| = |a|$ is the absolute value of $a$.

To ensure the consistency and asymptotic normality of the estimator $\hat{\beta}_m$ obtained by solving the pseudo partial likelihood score equations (2.10) for the mixed baseline hazard model (2.9), we assume the following sufficient regularity conditions.
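These conditions are similar to those used by Andersen and Gill (1982) for the univariate maximum partial likelihood estimation.

M.1. $\lambda_{0j}(t) \ge 0$ and $\int_0^{\tau} \lambda_{0j}(t)\,dt < \infty$, $j = 1, \ldots, J$.

M.2. There exists a neighborhood $\mathcal{B}$ of the true value $\beta_0$ and scalar, vector, and matrix functions $s_{jk}^{(0)}(\beta, t)$, $s_{jk}^{(1)}(\beta, t)$, and $s_{jk}^{(2)}(\beta, t)$ defined on $\mathcal{B} \times [0, \tau]$ such that, for $d = 0, 1, 2$; $j = 1, \ldots, J$; $k = 1, \ldots, K$,
\[ \sup_{t \in [0, \tau],\ \beta \in \mathcal{B}} \| S_{jk}^{(d)}(\beta, t) - s_{jk}^{(d)}(\beta, t) \| \stackrel{P}{\longrightarrow} 0 \quad \text{as } n \to \infty. \]

M.3. For all $\beta \in \mathcal{B}$, $t \in [0, \tau]$, $j = 1, \ldots, J$, and $k = 1, \ldots, K$,
\[ \frac{\partial}{\partial \beta} s_{jk}^{(0)}(\beta, t) = s_{jk}^{(1)}(\beta, t) \quad \text{and} \quad \frac{\partial}{\partial \beta} s_{jk}^{(1)}(\beta, t) = s_{jk}^{(2)}(\beta, t). \]

M.4. The functions $s_{jk}^{(d)}(\beta, t)$ are bounded and $s_{jk}^{(0)}(\beta, t)$ is bounded away from zero on $\mathcal{B} \times [0, \tau]$, and $s_{jk}^{(d)}(\beta, t)$ are continuous functions of $\beta \in \mathcal{B}$ uniformly in $t \in [0, \tau]$, for $d = 0, 1, 2$; $j = 1, \ldots, J$; and $k = 1, \ldots, K$.

M.5. The matrices
\[ I_j(\beta_0) = \int_0^{\tau} v_j(\beta_0, u)\, s_{j\cdot}^{(0)}(\beta_0, u)\, \lambda_{0j}(u)\,du, \quad j = 1, \ldots, J, \]
are positive definite, where
\[ v_j(\beta, t) = \frac{s_{j\cdot}^{(2)}(\beta, t)}{s_{j\cdot}^{(0)}(\beta, t)} - e_j(\beta, t)^{\otimes 2}, \qquad e_j(\beta, t) = \frac{s_{j\cdot}^{(1)}(\beta, t)}{s_{j\cdot}^{(0)}(\beta, t)}, \]
and
\[ s_{j\cdot}^{(d)}(\beta, t) = \sum_{k=1}^{K} s_{jk}^{(d)}(\beta, t), \quad d = 0, 1, 2. \]

M.6. There exists a matrix $\Sigma_m = \Sigma_m(\beta_0)$ such that, when $n \to \infty$,
\[ \frac{1}{n} \sum_{i=1}^{n} \mathrm{var}(D_i) \longrightarrow \Sigma_m, \]
where
\[ D_i = \sum_{j=1}^{J} \sum_{k=1}^{K} \int_0^{\tau} \{ Z_{ijk}(u) - e_j(\beta_0, u) \}\,dM_{ijk}(u) = \sum_{j=1}^{J} \sum_{k=1}^{K} D_{ijk}(\beta_0), \quad i = 1, \ldots, n, \]
and
\[ D_{ijk}(\beta) = \int_0^{\tau} \{ Z_{ijk}(u) - e_j(\beta, u) \}\,dM_{ijk}(u). \tag{3.1} \]

M.7. For every $\epsilon > 0$, when $n \to \infty$,
\[ \frac{1}{n} \sum_{i=1}^{n} E\left\{ |D_i|^2\, I\left( |D_i| > \epsilon \sqrt{n} \right) \right\} \longrightarrow 0. \]

Note that the probability statements made in the above conditions are relative to the usual filtration defined by $\mathcal{F}(t) = \{ \mathcal{F}_{111}(t), \ldots, \mathcal{F}_{11K}(t), \ldots, \mathcal{F}_{1JK}(t), \ldots, \mathcal{F}_{nJK}(t) \}$, where $\mathcal{F}_{ijk}(t) = \sigma\{ N_{ijk}(u), Y_{ijk}(u+), Z_{ijk}(u+);\ u \le t \}$. Condition M.2 is the asymptotic stability condition for the functions $S_{jk}^{(d)}(\beta, t)$, $d = 0, 1, 2$. Condition M.3 ensures that differentiation with respect to $\beta$ and limits with respect to $n$ may be interchanged, since the condition also holds for the functions $S_{jk}^{(d)}(\beta, t)$. Conditions M.3 - M.5 are regularity conditions similar to those found in standard asymptotic likelihood theory. Conditions M.6 and M.7 are analogous to the variance-covariance stability and the Lindeberg condition of the classical multivariate central limit theory for a sum of independent zero-mean vectors.

Theorem 3.1 (Consistency of $\hat{\beta}_m$) Under conditions M.1 - M.5, $\hat{\beta}_m \stackrel{P}{\longrightarrow} \beta_0$ as $n \to \infty$.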
Proof: The proof follows the argument in Lemma 3.1 of Andersen and Gill (1982) for the univariate independent case except for some modification to include our more general setup. Recall that the pseudo partial likelihood for the mixed baseline hazard model with different baselines for members expressed in the counting process notation has the form . P L({3) = V () [{3'Z ()] }dNii,.(U) K ~ ij: U exp ijk ,u k=1 j=1 i=1 U~O L g=1 Ll=1 Y,jg( u) exp[{3 Z,jg(u)] K II IIJ II II n { 51 and 1(13, t) - log P L(I3, t) Consider the process 1 .1 O({i, t) = -(1(13, t) - 1(13o, t)) = n K L L:: Ojk(l3, t) j::1 k=1 where Define the process J G(I3, t) K = L:: L:: Gjk ({3, t) j=1 k=1 where r{ 1 n iO) (13 u) } Gjk (l3,t)=-L::}o (13-l3o)'Zijk(u)-log t~) dNijk(U). n i=1 ° Sj. (130' u) , First, we are going to show that 0(13, T) is asymptotically equivalent to G(I3, T) in the sense that 0(13, T) - G(I3, T) 2-+ 0 as n --+ 00. From the definitions of Ojk(l3, T) and Gjk(l3, T), we have .. 52 = _ r Jo {IOg S(O)(t.l sj~)«(3, u) -10 s~~\(3, u) } d (N.jk( U)) . U) g S(O)(t.l U) n ). fJO, fJO' J. Based on Conditions M.2 and MA, for every u E [0, r] and (3 E B I og S~~)«(3,u) (0) Sj. «(30' u) ~ as n --+ 00. 1 og - s}~)«(3,u) (0) Sj. «(30' u) 0 Also, from (1) in Lemma 3.1 8 Pr { N .J"k(r) ~TJ } ~ -+Pr n TJ {loT -LYiik(U)Aiik(U)du~8 1 n 0 n } i=1 - ~ + Pr {loT AoA u )sj~)«(3o, u )du ~ 8} . Choose 8 > as n --+ 00, tion MA, ~ loT AOj (u )SJ~) «(30' u )du, then we have Pr {loT AOj(U)Sj~)«(3o, u)du ~ 8} ~ 0 by Condition M.2. As the consequence of Condition M.1 and Condi--+ 0 as TJ --+ 00. Thus, lim lim Pr {N.ik(r) n 'lloon.-oo ~ TJ} = 0 therefore, Ojk«(3,r) - Gj k«(3,r) ~ 0 and consequently 0«(3, r) is asymptotically equivalent to G«(3, r). 53 (3.2) Then, for each {3 E 8, the process 1 n (t{ ({3-{3o)'Z,jk(u)-log s(O)({3 U)} (~) , dM,jk(U) -Lin n ,=1 ° = Sj. 
(3.3) ({30, u) is a locally square integrable martingale in t with respect to the filtration F( t) since the integrand in Equation (3.3) is locally bounded and predictable. The predictable variation process of this martingale at t is + ( log (0) ({3, Sj. (0) Sj. u) ({30, u) ) 2 (0) } Sjk ({30, u) AOj(u)du. It now follows from conditions M.l, M.2 and MA, that for each (3 E B, nHj k({3, r) converges in probability to a finite function of {3. Hence, the inequality of Lenglart (2) in Lemma 3.1 implies that, for each {3 E 8 and as n 54 -+ 00, As the result of conditions M.1, M.2 and MA, ior each (3 E B, Ajk({3, r) converges in probability to gjk({3) = r{({3_{30)'8W({3o'U)-S~~)({3o'U)IOg ~~~)({3,u) }.xOj(U)dU. Jo Sj. ({3o, u) (304) Hence, Gj k({3,r) must converge in proba.bility to gjk({3), as long as (3 E B, for j = 1"", J and k = 1,"', K. Therefore, G({3, r) ~ g({3) where J g({3) = K l: l: gjk({3). j=lk=l Using Condition M.1 and the boundedness conditions of M.3 and MA, we may evaluate the first and the second derivatives of g({3) by taking partial derivatives inside the integral (cf. Corollary 5.9 in Bartle (1966)). Hence, for each {3 E B, {) {){3g({3) which is zero at {3 = {3o. Furthermore, which is negative definite at {3 = (3o by condition M.5. Therefore, G({3, r) converges pointwise in probability, for each (3 E B, to a concave function g({3) on B with a 55 = {3o. The random function G'({3, r) is also concave and has a unique maximum at {3 = {3o when the maximum exists. Following from Lemma 3.2, unique maximum at {3 the random concave function G({3, r) converges to 9(/3) in probability uniformly over 8. Consequently, the maximizing value 13m of G({3, r) converges in probability to the maximizing value of {3o of 9({3), that is, 13m ~ {3o as n ~ 00. 0 Theorem 3.2 (Asymptotic normality of 13m) Assuml~ that conditions M.l - M.1 hold. Then, as n ~ 00, where (3.5) J I({3) = I: I j ({3), j=1 and Ij({3) and 1jm({3) are defined as in conditions At.S and M.6. 
Proof: Recall the pseudo partial likelihood score fUIlction (2.10) is J K - I:I: Ujk({3), j=lk=1 where 56 .. In view of the first order Taylor expansion for U(f3) centered at the true value f30 of f3, the score U(f3) may be written as U(f3) = U(f3o) + :f3 U (f3)I =f3" (f3 - f3o), f3 where f3* is on a line segment between 13m and f3o. Since U«(3m) = 0 by the definition of 13m, 1 ..;nU(f3o) 1( 8)1 =;; - 8f3U(f3) f3=f3*..;n(f3m - f3o) . (3.6) A To prove the asymptotic normality of ..;n«(3m - f3o), it now suffices to prove (i) weak convergence ofn- 1 / 2 U(f3o) to a Gaussian process and (ii) convergence in probability of n- 1 ( -1rJU(f3») 1f3=f3* to a non-singular matrix I(f3o) for any random f3* = f3*(n) such that f3* ~ f30 as n --+ 00. For the first part, we need to show that 1 1 J K ..;nU(f3o) = ..;n;; (; U jk (f3o) ~ N p (0, ,Em (f3o» . First, we shall show that n- 1 / 2 Ujk(f3) is asymptotically equivalent to n- 1 / 2 Djk(f3) in the sense that ~o as n --+ 00, where, 1 1 ..;n D j k(f3) = ..;n tt D ijk (f3), n and D ij k(f3) is defined as in (3.1). By the martingale central limit theorems of Rebolledo [e.g., page 83 of Andersen et al. (1993)], for eachj and k, Mjk(U)/..;n converges 57 weakly in V[O, T] to a zero-mean normal process, say W(u). As the consequence of the tightness of W(u) [c.f., Sen and Singer (1993), p. .330], there exists 6*(f, N*), such that for every f > 0,6 < 6**:5 6*, and n > N*, = T / 6** where 6** is chosen in such a way that h is an integer. It follows from Conditions M.2 and MA, there exists a N** such that when n s}~)u~,u) _ 8~~)(,8,.u) < UE[0~~~E8 S}~)((3, u) Consequently, for n Ph s~~)((3, u) > N** 6** > N** and for any partition of [0, T] with 0 = po < PI < '" < = T, max q sup uE[pq-ltPq),(3e8 S~~)((3,u) () ((3, u) S/ S~~)((3,u) Choose 0 = Po < UE[:'~~E8 S}~)((3, u) < 6**. < PI < '" < Ph =T 8~~)((3,U) s~~)((3,u) 8~~)((3,u) - ~s}~)((3, u) in such a way that the length of each of the interval between Pq-l and Pq is 6** so that h = T / 6*'·. 
Then 1 -vITi - {U 3'k((3) - D 3'k((3)} - _1 ~ ~ t::: LJ LJ = I: 1 fPq+1 in yn q=Oi=1 Pq pq q=O pq +1 (0) Sj. ((3,U) (0) Sj. ((3,U) {S~~)((3, U) _ 8}~)((3, U2} dM.jk(u) S}~) ((3, U) Therefore, for each j and k and n 1 {S~~)((3, u) _ s}~)((3, u) } dM,. ( ) II- -vITi {U 3'k((3) - S~~) ((3, U) > max{ N* , N**} , D 3'k((3)} 58 II vITi 13k u < h8**f - Tf = op(l). Consequently, n- 1 / 2 U(130) is asymptotically equivalent to Notice that n- 1 / 2 D(l3o) is a sum of n independently distributed p-component random vectors D i with mean vector 0 since the integrand is predictable and covariance matrix J var(Di) = E[L: K J K L: L: L: Dijk(l3o)D~jg(l3o)]' j=lk=lj=lg=l It follows from Conditions M.6 and M.7 and the multivariate central limit theorem [c.f., Page 25 of Puri and Sen (1971)] that For the second part of the proof, we need to show n- 1 ( 1(130) for any random 13* = 13*(n) such that 13* ~ -frJ U (I3)) 113=13· ~ 130 as n ~ 00. The proof follows along the lines of the last part of the proof of Theorem 3.2 in (Andersen and Gill (1982)) with some modification to accommodate our more general setup. Because 8 8I3 U (I3) 59 _~ ~ r {S~~)«(3, u) _ LJ LJ 10 - j=lk=1 0 = - K r L L 10 j=lk=1 (0) (S~~)«(3, U)) 0 } ~ dN.. 2 (0) Sj. «(3, u) Sj. «(3, u) LJ i=1 ( ) '3k U J \';«(3, u)dN.jk(u), 0 II ~ (- :13 U (13)) 113=13' - 1(130) I ;t. f;;[ V; (13', u)dN.;k(u) - -1 L LK 10r {\'; (,8* , u)dN.jk(U) - Vj(,8o, tt)s~~)(,8o, u)Aoj(u)du} J n j=1 k=1 0 < tt t.[ r {\,;«(3*, u) - 11j«(3*, un dN.jk(uL j=1 k=1 10 + n t tilT j=1 k=1 ,,;(130' u )-\°)(130' u )>'0;( u )du 0 11j«(30' u) {dN.jk(U) n A.jk(U~dU} I n (3.7) It follows from Condition M.2 and the boundedness conditions in MA that sup l.lE[O,T],(3EB lilT {\';«(3, u) - 11j(,8, unll ~ o. 0 60 This results and {3* ~ {30 together with (3.2) indicate that the first term of (3.7) converges in probability to zero. The continuity in {3, uniformly in t, in Condition MA plus (3.2) show that the second term of (3.7) tends to zero also. 
~; +Pr [foT{(vA{3o,u))a,b}2S)~)({3o,u)'\oj(u)du> 77]. where, (Vj({3, U))a,b is the (a, b) element ofvj({3, u) and M.jk(u) = Ei=l Mijk(u). Hence, Condition M.2 and the boundedness conditions in M.I and MA implies the third term of (3.7) vanishes. Finally, applying Conditions M.I, M.2, and M.3 directly to the fourth term of (3.7), we obtain that this term converges in probability to zero. Thus, when {3* ~ {30, as n ~ 00. The proof is completed. 0 If we assume that the covariate, counting, and censoring processes are identically distributed for i = 1,· .. ,n, that is, in the iid cases across families, a simplified set of conditions can be obtained. The result is summarized in the following theorem. Theorem 3.3 (Sufficient conditions of /3m in the iid case) Assume that the covariate, counting, and censoring processes are identically distributed for i and Yljk are left continuous with right hand limits for j = = 1, ... ,n and Zljk 1,···, J; k = 1,··· , K. Conditions M.l to M.1 are satisfied under the following conditions M.l to M.lV: 61 M.l. AOj(U) ~ °and faT AOj(t)dt < oo,j = 1"", J. M.ll. There exists a neighborhood B of the true value (3o such that, for j = 1,,,,, J = 1,,,,, K and k E{ sup Yijk(t)IZl;k(t)12e{3'Zlik(t)} < 00. te[o,T),{3eB M.Ill. Pr{Yijk(t) =1 Vt E [0, Tn > 0. M.lV. The matrices are positive definite, Vj = 1,' .. ,J, where S~k' S}k' and S~k are now defined by s~~)({3, t) = E{Yijk(t)e{3' Zl i k(t)}, sW({3, t) sW({3, t) = E{Yijk(t)Zljk(t)e{3'Zlik(t)}, = E{Yijk(t)Zljk(t)@2 e{3' Zl ik{t)}, and respectively. Proof: The conditions M.3 and MA now are automatically satisfied by the condition M.IVand the definition of s}~)({3, t), d = 0,1,2. From the condition M.I1, we also have E{ Yijk(t)IZljk(t)le{3' Zlik(t)} < sup 00, te[o,T),{3eB and E{ sup Yijk(t)e{3' Zlik(t)} < 00. te[o,T],{3eB By dominated convergence sW({3, t), sW({3, t), and ,s}i)({3, t) are continuous functions of (3 E B for each t E [0, T), uniformly in 62 t E [0, T). 
They are also bounded on B x [0, r] and, by condition M.III, s~~(f3, t) is bounded away from zero on B x [0, r]. Without loss of generality we may take B to be compact. Following the argument in Theorem 4.1 of Andersen and Gill (1982), we can consider Yijk(t) exp{,8' Zljk(t)} as a random element of D[O, r], where the elements of D[O, r] take values not in R but in the Banach space of continuous functions on B endowed with the supremum norm. Then by Theorem Ill.1 in Andersen and Gill (1982), we have sup IIS~~)(,8, t) - s}~)(,8, t)1I ~ tE[O,'T],,8EB ° as n -+ 00. The same argument works for SW(,8, t) and SW(,8, t). Therefore, sup II sW (,8, t) - sW(,8, t)1I ~ tE[O,'T].,8EB ° as n -+ 00, (3.8) and Condition M.2 holds. It is clear that conditions M.6 and M.7 (Lindeberg condition) are satisfied with K J J K }Jm(,8o) = ~ ~ ~ ~ E{Dljk(,8o)D~!g(,8o)}' j=lk=l!=lg=l The proof is completed. 0 It is natural to estimate the covariance matrix of y'n(13m - ,80)' }Jm(,8o), from the data by (3.9) where I({3) 1 n J K = - ~ ~ ~ fa n i=l j=1 k=1 'T ¥;({3, u)dNijk(U), (3.10) 0 (3.11) 63 (3.12) (3.13) (3.14) xdN.j.(u), (3.15) and n N.j.(U) K =L L Nijk(U), i=l k=l Note that Em is obtained from Em by replacing ,8o,8}~)(,8,u),s}~)(,8,u), and AOj(U) with 13m, S}~)(i3, u), S]~)(i3, u), and dN.j.(u)/ {nS]~\8, un, respectively. The follow- ing theorem states that Em is a consistent estimator of Em. To avoid imposing more conditions, we consider the iid cases. Theorem 3.4 (Consistency of Em(i3m) in iid cases) Suppose that all the assumptions of conditions M.l to M.lV are satisfied and in addition the following condition holds, then Proof: To prove the consistency of the Em' it suffices to show that: 64 as n ~ 00 since using Slutsky's theorem we obtain iJm((3m) ~ E m({3o) from the first part and using Slutsky's theorem again we get the desired results. 
To show that the first part holds it suffices to show that each component of converges in probability to the corresponding components in that is, for a, b = 1,'" ,p, Write the ath element of the vector b jjk as bjjk , a, then (3.17) (3.18) (3.19) 65 1 (?r) foT S(l) (?c ) f T S(l) j.,a 1"Jm' U dN.. ( ) j.,b 1"Jm' U 'N. ( ) L.J Jo (0) 13k U (0) ll. 1jg U +_'"' n i=l 0 Sj. (~m'U) 0 Sj. (~m'U) n A A 1 n foT { Z.o () _ +_'"' n~ 0 13 k,a U ,=1 (3.20) A S(l) (?r j.,a 1"Jm' U)} (0) A Sj. (,8m'U) I Yoijk (U) e ~m Zijk(U) n K , " , '"' dN (0) : nSj. (t'm'U) L.J L.J l=lr=l 13r U 0 ( ) (3.21 ) (3.22) (3.23) - E {foT [Zljk,a( u) - fj,a(~o, U)]dMljk(u) foT [Zljg,b( u) - ~,b(~o, U)]dMljg (U)} - E {foT[Zljk,a(U) - fj,a(~o,u)][dNljk(U) - Y1jk(u)e~~Zljk(U)AOj(u)du] X - foT [Zljg,b(U) - 4,b(~0, u)][dNljg(u) - Yijg(u)e~~ZlJ9(U) AOj(u)du]} E {foT Zljk,a(u)dNljk(U) foT Zlj9,b(U)dNlj9(u)} -E {foT fj,a(~o, u)dNljk(U) foT Zlj9,b(U)dNlJg(u)} 66 (3.24) (3.25) (3.26) +E {loT e;,a(/3 0' u)dN1jk(U) loT e;"b({30' U)dN1f9 (u)} (3.27) +E {loT [Zljk,a(U) - e;,a({3o, U)]Yi;k(U)e{3~Zljk(U) Ao;(u)du X fj,b({3o' U)]Yifg(u)e{3~Zlf9(U) AOf(u)du} loT [Zlfg,b(U) - (3.28) (3.29) (3.30) Note that (3.17) is just an average of iid random variables. Therefore, (3.17) ~ (3.24) by the law of large numbers. Now, let's rewrite (3.21) and (3.28) in the following forms: (3.21) = dN.j. (u )dN. f .(v) (1) (3.31) A S;.,a({3m,U) ( )dN ( ) X (0) (0) dN.;. u .f. v S;. ({3m,u)2Sf. ({3m'V) . A A 67 (3.32) .. (3.33) (1) X where N.j.(u) A (1) A Sj.,a~m,U)SJ.,b(~m,V) dN· (u)dN, (V) S)~)(,8m, U)2S}~)(,8m, V)2 .J. (3.34) .j. = Ei=l E~=l Nljr(U). (3.28) = (3.35) (3.36) (3.37) (3.38) Define 68 and From (3.16), we also have E{ y!jk(U)IZlj/c(U)le,8IZ1ik(U)y!jg(V)e{3IZ1f9(V)} < sup 00 {3EB.UE[O...j,VE[O...j l$M$J,l$k,g$K and E{ sup Y'ijk(U)e{3IZ1ik(U)Y'ijg(v)e{3IZ1f9(V)} < 00. (3 EB,uE[O''').VE[O,..) 
1 $M $J,l $k,g$K Using the above condition and similar arguments as in Theorem 3.3, we have that mjkajgb({3, U, v) is a continuous function of (3 E Band M jkajgb({3, U, v) converges in probability to mjkajgb({3,u,V) uniformly in (u,v) E [o,TF. By (3.8) and the bound- edness of s~~)({3, t), the above results and the consistency of 13m, we have Mjkajgb(/3m,U,v) Sj. fJm'U ) S(O) j. ( {3m'v (O) (l:l A p ) ---+ mjkajgb({3o,U,v) (0) ( Sj. {3o,u ) Sj.(0) ( {3o,v ) uniformly in (u, v) E [0, TF. Note that n- 1 N.j.(t)(t) converges to rSj. ({3o,u)Aoj(u)du, 10 (0) that n- 1 N.j.(t)(t) is bounded in probability and that mjkajgb({3, U, v) resides in the product space ofleft continuous functions, it follows that (3.31) is consistent for (3.35). Using similar arguments, we can show that: (3.32) ~ (3.36), (3.33) ~ (3.37), and (3.34) ~ (3.38). Hence, (3.21) ~ (3.28). Again using the similar argument, we can show that (3.22) ~ (3.29) and (3.23) ~ (3.30). Now, we are going to show that (3.18) is consistent for (3.25). 1(3.18) - (3.25)1 69 where 11 1 n =- ~ n 1=1 [ 1 T 0 S(I)(tI j.,a I-'m' U ) (0) " Sj. (/3 m , U) - (1)(/.1) Sj.,a 1-'0' U (0) S j. (f3, U) dNijk(U) fo T ] IZijg,b(U)ldNijg(u) (3.39) 0 and - E r {ioo S<,I) J(;) Sj. (/3 u) 0' dN1jk(U) (/3, u) iT Zljg,b(U)dNljg(u) }. By the law of large numbers and consistency of n --+ 00. 0 13m we obtain that I 12 ~ 0 as . From the results in (3.8), consistency of i~m, and the iid condition across the families we have 11 since E(Zljg,b(X1j9 )) < 00 1 < f- < f- n ~Nijk(T)IZijg,,~(Xijg)1 n i=1 1 n ~ IZijg,b(Xijg) I n i=1 --+ fE( Zljg,b(X1j9 )) ~ 0 by Condition II. Hence, 11 ~ 0 as n --+ 00. Using simi- lar arguments, we can show that (3.19) ~ (3.26), and (3.20) ~ (3.27). Therefore, • 70 To prove the second part, we write < t.E ~f {~(,8m,U) t.Ellf {,,;(,8m, + + t t lilT j=l k=l - ";(,8m,U)} dN.~'(U)11 u) - ";(Po, u)} Vj({3o, u) 0 {dN. 
njk (u) - dN.~(U) A.jk n I dU} II Then, using the similar argument as for (3.7), we can show that I(j3m) ~ I({3o). o The large sample properties of the maximum partial likelihood estimator j3 d for the distinct baseline hazard model and the maximum partial likelihood estimator j3 c for the common baseline hazard model in the iid cases are given in the form of theorems in the next two sections, respectively. Their proof follows the similar argument as the proof for j3 m because the proof for the mixed baseline hazard model combines the features of both distinct baseline hazard model and the common baseline hazard model. For the sake of conciseness, the proofs for the two theorems are not repeated. 71 3.4 Distinct Baseline Hazard :Model Define K J 1({3) = ~ ~ Ijk(,8) , j=1 k=1 = loT tJjk({3, t)s}~)({3, t)'>'Ojk(t)dt, I jk ({3) d E (,8) J K J K = ~ ~ ~ ~ E{D1jkU")D~/g(,8)}, j=1 k=1 1=1 g=1 1(,8) - r 10 {z.. ()_ ,)k U SW(,8,U)} {dN" ( ) __ Y;. ( ) ,8'Ziilc(U) dN.jk(u) } s~O)(,8) ,)k U ,)k U e s~O)(,8 ) , ')k ,U n)k 72 ,u . . ({3 t) _ V,k , - (2) ({3, t) _ {(I) }02 8jd{3, t) 8 jk (0) Sjk ({3, t) (0) , Sjk ({3, t) and 8W({3, t) is defined as in Condition M.lY, d = 0,1,2. Theorem 3.5 (Asymptotic properties of 130) Assume that the following conditions D.l to D../. are satisfied: D.l. AOjk(t) ~ 0 and faT AOjk(t)dt < 00, j = 1"", Jj k = 1"" ,K. D.2. There exists a neighborhood 8 of the true value (30 such that, for j = 1,"',J and k = 1,'" E{ ,K, lljk(t)IZljk(t)1 2 e{3'Zl j k(t)} < sup 00. te[0,Tj.{3e8 D.S. Pr{lljk(t) = 1 Vt E [0, T]} > O. D../.. The matrices I jk ({3o) are positive definite, Vj = 1,"" 1,,,,, K. 
Then 3.5 Common Baseline Hazard Model Define L L Jor v({3, t)Sjk ({3, t)Ao(t)dt J I({3) K j=lk=1 • - iT (0) 0 V({3, t)s~~)({3, t)Ao(t)dt, 73 J and k = S~.2)«(3, t) = S~~)«(3, t) V«(3, t) K J }JC«(3) }0 , S~~)(~-f,t) {S~.l)(~", t) - 2 K J = ~ ~ ~ ~ E{Dljk(t~)D~Jg«(3)}, j=lk=lj=lg=l 1 = K T ~ ~ ~ 10 V«(3~1 t)dNijk(t), - n i=l j=l k=l 0 S~~)«(3, t) {S~.l)«(3, t) S.~O)«(3, t) - S.~O)«(3, t) V«(3, t) = S~d)«(3, t) J n J K j=l k=l n J = ~ ~ SW«(3, t) 1 K J }02 , fOj~ d = 0,1,2, K ±c«(3) = - ~~ ~ ~ ~ Bijk«(3)B~jg«(3), n i=lj=lk=lJ=lg=l r {z ijk ()U - - 10 S~.l)«(3,U)} s.~O)«(3, u) n N... (u) s~.d)«(3, t) J {dN () Yo ( ) (3'Ziilc(U) dN...(u) } ijk U - ijk U e nS~O)«(3, u)' J K = ~ ~ ~ NijA:(U), i=l j=l k=l K = ~ ~ sW«(3, t), j=lk=l and sW«(3, t) is defined as in Condition M.IV. 74 for d • = 0,1,2, • Theorem 3.6 (Asymptotic properties of /3 c) If the following conditions C.l to C.4 hold: C.l. Ao(t) ~0 and loT Ao(t)dt < oo,j = 1,···, J. C.2. There exists a neighborhood B of the true value J" = 1" ... J and k E{ = 1"... K f3 0 such that, for , sup Yijk(t)IZljk(t)12e,8'Zlik(t)} te[o,Tj,{3eB < 00. C.3. Pr{Y'ijk(t) = 1 'Vt E [0, T]} > O. C.4. The matrix 1({3) is positive definite. Then as n ~ 3.6 00 and E c({3o) can be consistently estimated by if(/3c). Concluding Remarks It may be unrealistic to assume that (X, 6, Z (t)) are identically distributed across (independent) families under our generalized dependence structure, although it was assumed both in the WLW model under the setting of between-subject dependence and in the LWA model in the context of within-subject dependence. We relax the assumption of identical distributions in proving the consistency of the estimators and their asymptotic normality. Simulation studies are conducted in Chapter 4 to assess the adequacy of the proposed large-sample approximation for practical sample sizes. 
We conclude this chapter by noting that it is vital that the marginal models are correctly specified in developing the asymptotic distribution theory of the estimators. The consequences of misspecified marginal models are discussed in Chapter 6.

Chapter 4
Simulation Studies of Parameter Estimators

4.1 Introduction

This chapter deals with the numerical evaluation of the regression parameter estimators for the three types of marginal models. In Chapter 3 we showed analytically that, under some regularity conditions, the estimators are consistent and asymptotically normal with a covariance matrix which can be consistently estimated if the marginal models are correctly specified. In this chapter we study the finite sample properties via simulation to assess the adequacy of the proposed large-sample approximation for practical sample sizes. Recall that the mixed baseline hazard model combines features of both the distinct baseline hazard model and the common baseline hazard model. We therefore confine our attention to the mixed baseline hazard model in the simulations.

4.2 Simulation Parameters

In these simulations, we consider two members ($J = 2$) in each family and two failures ($K = 2$) for each member. The four failure times are independent across families. The failure times $T_{i11}$, $T_{i12}$, $T_{i21}$, and $T_{i22}$ for the $i$th family are generated from the multivariate Clayton-Oakes distribution (Clayton (1978), Oakes (1989), and Clayton and Cuzick (1985)), with a Weibull marginal distribution for each member $j$ (= 1, 2) and for each failure $k$ (= 1, 2):

(4.1)

where the Weibull density is $\gamma\rho(\rho t)^{\gamma-1}\exp\{-(\rho t)^{\gamma}\}$. The exponential distribution is a special case of the Weibull distribution with $\gamma = 1$. In the simulations, both Weibull marginals ($\gamma \ne 1$) and exponential marginals ($\gamma = 1$) were used.

We assume a different baseline for each member in a family and an identical baseline for the two failure times from the same member.
This would be the case, for example, in a vision loss study involving husbands and wives, where we treat vision loss from each eye as one type of failure. It is reasonable there to assume a mixed baseline hazard model with different baseline hazards for the husband and the wife from the same family and an identical baseline hazard for left-eye and right-eye vision loss.

For the multivariate failure time distribution with exponential marginals, a (constant) baseline $\lambda_{01} = 1$ is assumed for Member 1 and a (constant) baseline $\lambda_{02} = 5$ for Member 2, that is, $\gamma = 1$ and $\rho = 1$ in (4.1) for Member 1 and $\gamma = 1$ and $\rho = 5$ for Member 2. In the model with Weibull marginals, $\rho = 1$, $\gamma = 1$ and $\rho = 1$, $\gamma = 3$ were used, which resulted in a baseline $\lambda_{01} = 1$ for Member 1 and a baseline $\lambda_{02} = 3t^2$ for Member 2.

For the sake of simplicity we focus on the case of a single regression parameter for the mixed baseline hazard model in (4.1). We chose values for the true regression coefficient $\beta_0$ of 0 and 0.7, which correspond to relative risks of 1 and approximately 2 ($e^{0.7} \approx 2.01$), respectively. Distributions for the scalar covariate were taken to be independent Bernoulli(0.5) and Normal(0, 1) truncated at $\pm 5$.

The parameter $\theta$ represents the degree of pairwise dependence of failure times. When $\beta_0 = 0$, $\theta \to 0$ gives the maximal positive correlation of 1. Independence is the limiting case of $\theta \to \infty$. In the simulations the dependence parameter $\theta$ took the values 0.25, 0.80, 1.50, and 3.00. Table 4.1 shows the observed means of pairwise correlations of failure times for these $\theta$ values, based on 1,000 simulation runs each with a sample size of 1,000 and a N(0, 1) covariate truncated at $\pm 5$, when the marginal distribution is exponential or Weibull and no censoring is imposed.

Table 4.1: Observed mean pairwise correlations for different $\theta$ values based on 1,000 simulation runs with a sample size of 1,000 for each run and no censoring.
                  Exponential                      Weibull
$\theta$    $\beta_0 = 0$   $\beta_0 = 0.7$   $\beta_0 = 0$   $\beta_0 = 0.7$
0.25        .936            .423              .854            .474
0.80        .711            .321              .622            .344
1.50        .510            .230              .442            .244
3.00        .303            .137              .264            .146

Censoring times $C_{ijk}$ were drawn from the Uniform(0, $\tau$) distribution and were generated independently of each other and of $T_{ijk}$ and $Z_{ijk}$. The numbers of independent clusters ($n$) were 50, 100, and 200.

The larger the number of simulation runs $R$, the more precise the simulation-based summaries. Table 4.2 gives simulation results for $R$ = 500, 1,000, and 10,000 for the configuration with sample size 100, Normal(0, 1) covariate, Uniform(0, 5) censoring, and $\theta = 0.25$. To balance the precision of the estimator against the simulation time required, we carried out 500 simulation runs ($R = 500$) for each simulation configuration.

Table 4.2: Simulation results for $n = 100$, Normal(0, 1) covariate, Uniform(0, 5) censoring distribution (12% censoring for $\beta_0 = 0$ and 14% censoring for $\beta_0 = 0.7$), $\theta = 0.25$, and exponential marginals.

$\beta_0$  R        mean $\hat\beta_m$  var($\hat\beta_m$)  mean $\hat V$
0          500      .001                .0027               .0028
           1,000    .001                .0031               .0028
           10,000   -.000               .0030               .0028
0.7        500      .708                .0061               .0056
           1,000    .708                .0061               .0057

4.3 Summary Statistics

The Newton-Raphson iterative procedure was used in each simulation run to obtain the estimators from the pseudo score equation (2.10). $R$ random samples of size $n$ were generated from the multivariate Clayton-Oakes distribution (4.1) for each of the simulation configurations. For each random sample of size $n$, the maximum partial likelihood estimator of $\beta_0$, $\hat\beta_r$, and the robust variance estimator, $\hat V_r$, were obtained. Notice that here we use $\hat V_r$, $r = 1, \ldots, R$, to denote the scalar version of $\hat\Sigma_m(\hat\beta)$ in (3.9). From these estimators we calculated the Wald-type 90% and 95% confidence intervals. The simulation results are summarized within each configuration by the following: mean $\hat\beta_m$, mean $\hat V$, and the estimated Wald-type coverage probability.
More specifically,

$$\text{mean } \hat\beta_m = \frac{1}{R}\sum_{r=1}^{R} \hat\beta_r, \qquad \text{mean } \hat V = \frac{1}{R}\sum_{r=1}^{R} \hat V_r,$$

$$90\%\ \text{CP} = \frac{1}{R}\sum_{r=1}^{R} I\left\{\hat\beta_r - 1.645\sqrt{\hat V_r} < \beta_0 < \hat\beta_r + 1.645\sqrt{\hat V_r}\right\},$$

and

$$95\%\ \text{CP} = \frac{1}{R}\sum_{r=1}^{R} I\left\{\hat\beta_r - 1.96\sqrt{\hat V_r} < \beta_0 < \hat\beta_r + 1.96\sqrt{\hat V_r}\right\}.$$

To estimate the true variance, $\Sigma_m(\beta_0)$, for the mixed baseline models used in the simulations, we calculated the sampling variance within each configuration as

$$\text{var}(\hat\beta_m) = \frac{1}{R-1}\sum_{r=1}^{R}\left[\hat\beta_r - \text{mean}(\hat\beta_m)\right]^2.$$

Note that good agreement between mean $\hat V$ and var($\hat\beta_m$), which occurs when the ratio of these two statistics is close to 1, indicates that the robust variance estimator is a good estimator of the true variance. The estimated coverage probability CP is a summary measure of the bias of the parameter estimator $\hat\beta_m$, the bias of the robust variance estimator, and the adequacy of the normal approximation for the parameter estimator. Hence, CP assesses the normality of the estimator if both the parameter estimator and its variance estimator are unbiased.

4.4 Results and Discussion

The simulation results are displayed in Tables 4.3 to 4.8. In these tables $\beta_0$ and $\theta$ refer to the true regression parameter and dependence parameter of the underlying distributions, and $n$ is the sample size for each simulation run. Each row in the tables is based on 500 simulation runs.

These simulation results suggest that the estimator $\hat\beta_m$ is approximately unbiased for each simulation configuration. The sandwich-type robust variance estimator appears to be a good estimator of the true variance, as judged by the good agreement between var($\hat\beta_m$) and mean $\hat V$ for sample sizes of 100 or larger. The empirical Wald-type coverage probabilities are close to their nominal levels. This is true for all combinations of the true regression coefficients, the degree of correlation among failure times, the censoring probabilities considered, the covariate distribution, and both the exponential marginals and the Weibull marginals.
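The summary statistics of Section 4.3 are straightforward to compute once the $R$ pairs $(\hat\beta_r, \hat V_r)$ for a configuration are available. The following is a minimal Python sketch, not the program used for the results reported here; the function name `summarize_runs` and its arguments are hypothetical, and the normal critical values 1.645 and 1.96 are those used in the text.

```python
import math
import statistics


def summarize_runs(beta_hats, var_hats, beta0):
    """Summarize R simulation runs for one configuration (hypothetical helper).

    beta_hats: estimates beta_hat_r, r = 1..R
    var_hats:  robust variance estimates V_hat_r of beta_hat_r
    beta0:     true regression coefficient
    """
    R = len(beta_hats)

    def coverage(z_crit):
        # Fraction of Wald intervals beta_hat_r -/+ z_crit * sqrt(V_hat_r)
        # that strictly contain beta0.
        hits = 0
        for b, v in zip(beta_hats, var_hats):
            half_width = z_crit * math.sqrt(v)
            if b - half_width < beta0 < b + half_width:
                hits += 1
        return hits / R

    return {
        "mean_beta": statistics.fmean(beta_hats),
        "mean_V": statistics.fmean(var_hats),
        # Sampling variance of the estimates, with divisor R - 1.
        "var_beta": statistics.variance(beta_hats),
        "cp90": coverage(1.645),
        "cp95": coverage(1.96),
    }
```

When reading the coverage columns, it helps to keep the Monte Carlo error in mind: if the runs are independent, then with $R = 500$ the standard error of an estimated coverage probability is about $\sqrt{0.95 \times 0.05 / 500} \approx 0.010$ at the 95% level and $\sqrt{0.90 \times 0.10 / 500} \approx 0.013$ at the 90% level.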
Tables 4.3, 4.4, and 4.5 compare censoring levels for failure times with a Normal(0, 1) covariate and exponential marginals. Table 4.3 gives the summary statistics of the simulation results when there is no censoring. The simulation results with censoring are shown in Table 4.4 and Table 4.5. The Uniform(0, 5) censoring distribution in Table 4.4 resulted in 12% and 14% of the data censored when $\beta_0 = 0$ and $\beta_0 = 0.7$, respectively. The Uniform(0, 1) censoring distribution in Table 4.5 led to about 42% of the data censored. It appears from these tables that as the censoring proportion increases, the variances of $\hat\beta_m$ become larger.

Comparing the results from Table 4.4 and Table 4.6, we see that the variances of $\hat\beta_m$ for the Weibull marginals are somewhat larger than those for the exponential marginals, consistently for each combination of $\beta_0$, $\theta$, and sample size.

Table 4.4 and Table 4.7 indicate that the variances of $\hat\beta_m$, var($\hat\beta_m$), with the Bernoulli(0.5) covariate are about three times larger than those with the Normal(0, 1) covariate for the different $\theta$ values and sample sizes, when $\beta_0 = 0.7$ and the failure times have exponential marginals. When $n \le 100$, var($\hat\beta_m$) is somewhat larger than mean $\hat V$, and it appears that we need a sample size of $n = 100$ or larger for better variance estimates with the Bernoulli(0.5) covariate distribution.

The results in Table 4.8 show that the variance of $\hat\beta_m$ based on a sample size of 100 for failure times with Weibull marginals is larger than that for failure times with exponential marginals when the Bernoulli(0.5) covariate distribution and the Uniform(0, 5) censoring distribution are used.

The simulation results thus indicate that the proposed large-sample approximations are quite reasonable for practical applications. Further numerical studies are needed to study the impact of heterogeneous associations among failure times.
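The data-generating scheme of Section 4.2 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the program used for the reported results. It assumes that the Clayton-Oakes joint survival function has the form $\{\sum_{j,k} S_{jk}(t_{jk})^{-1/\theta} - 3\}^{-\theta}$, which agrees with the limiting cases described in Section 4.2 ($\theta \to 0$ maximal dependence, $\theta \to \infty$ independence) but is not confirmed against (4.1) here. It samples from that distribution with a shared gamma frailty. The function name `simulate_families` and all of its arguments are hypothetical.

```python
import math
import random


def simulate_families(n, theta, beta, rho, gamma, cens_max=None,
                      covariate="normal", seed=None):
    """Simulate n families with J = len(rho) members and K = 2 failures each.

    Assumption: member j has Weibull baseline cumulative hazard
    (rho[j] * t) ** gamma[j], and dependence comes from one gamma frailty
    per family. Returns a list of families; each family is a list of tuples
    (j, k, x, delta, z) with x = min(T, C) and delta = 1 if T <= C.
    """
    rng = random.Random(seed)
    families = []
    for _ in range(n):
        # Shared frailty W ~ Gamma(shape=theta, scale=1) links the family;
        # the floor only guards against a zero draw.
        w = max(rng.gammavariate(theta, 1.0), 1e-300)
        family = []
        for j in range(len(rho)):
            for k in range(2):
                if covariate == "normal":
                    z = rng.gauss(0.0, 1.0)
                    while abs(z) > 5.0:      # truncate N(0, 1) at +/-5
                        z = rng.gauss(0.0, 1.0)
                else:                        # Bernoulli(0.5)
                    z = 1.0 if rng.random() < 0.5 else 0.0
                # theta * log(1 + E / W) is marginally unit exponential,
                # with Clayton-type dependence within the family.
                unit_exp = theta * math.log1p(rng.expovariate(1.0) / w)
                # Invert the marginal Cox model:
                # (rho * t) ** gamma * exp(beta * z) = unit_exp.
                t = (unit_exp / math.exp(beta * z)) ** (1.0 / gamma[j]) / rho[j]
                c = math.inf if cens_max is None else rng.uniform(0.0, cens_max)
                family.append((j, k, min(t, c), int(t <= c), z))
        families.append(family)
    return families
```

As a plausibility check on the sketch: with `rho=(1.0, 5.0)`, `gamma=(1.0, 1.0)`, `beta=0.0`, and `cens_max=5.0`, the expected censoring fraction is $\{(1 - e^{-5})/5 + (1 - e^{-25})/25\}/2 \approx 0.12$, in line with the 12% reported for the Uniform(0, 5) configuration with exponential marginals.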
Table 4.3: Simulation results (based on 500 simulation runs) for Normal(0, 1) covariate, no censoring, and exponential marginals.

$\beta_0$  $\theta$  n    mean $\hat\beta_m$  var($\hat\beta_m$)  mean $\hat V$  90% CP  95% CP
0          0.25      50   .001                .0055               .0047          .862    .922
                     100  .000                .0024               .0024          .902    .942
                     200  -.001               .0012               .0012          .894    .948
           0.80      50   .001                .0057               .0049          .866    .930
                     100  .000                .0025               .0024          .888    .936
                     200  -.002               .0012               .0012          .890    .936
           1.50      50   .001                .0058               .0050          .862    .920
                     100  -.000               .0026               .0025          .876    .930
                     200  -.002               .0013               .0012          .892    .930
           3.00      50   .002                .0058               .0050          .866    .920
                     100  -.000               .0027               .0025          .874    .926
                     200  -.002               .0013               .0012          .894    .936
0.7        0.25      50   .717                .0125               .0103          .858    .934
                     100  .709                .0061               .0053          .872    .920
                     200  .703                .0028               .0027          .894    .956
           0.80      50   .711                .0100               .0086          .862    .926
                     100  .707                .0049               .0044          .864    .928
                     200  .702                .0023               .0023          .892    .956
           1.50      50   .708                .0089               .0078          .872    .936
                     100  .705                .0043               .0040          .882    .936
                     200  .701                .0020               .0020          .894    .946
           3.00      50   .706                .0081               .0073          .882    .936
                     100  .704                .0039               .0037          .882    .938
                     200  .700                .0019               .0019          .900    .944

Table 4.4: Simulation results (based on 500 simulation runs) for Normal(0, 1) covariate, Uniform(0, 5) censoring distribution (12% censoring for $\beta_0 = 0$ and 14% censoring for $\beta_0 = 0.7$), and exponential marginals.
$\beta_0$  $\theta$  n    mean $\hat\beta_m$  var($\hat\beta_m$)  mean $\hat V$  90% CP  95% CP
0          0.25      50   .005                .0056               .0055          .902    .944
                     100  .001                .0027               .0028          .904    .962
                     200  -.001               .0013               .0014          .922    .962
           0.80      50   .001                .0055               .0056          .900    .956
                     100  .000                .0028               .0029          .900    .954
                     200  -.001               .0013               .0014          .908    .968
           1.50      50   -.001               .0055               .0056          .898    .950
                     100  .000                .0028               .0029          .900    .956
                     200  -.001               .0014               .0014          .912    .956
           3.00      50   -.002               .0056               .0057          .896    .946
                     100  .000                .0028               .0029          .896    .960
                     200  -.001               .0014               .0014          .900    .954
0.7        0.25      50   .713                .0119               .0109          .896    .936
                     100  .708                .0061               .0056          .874    .944
                     200  .701                .0026               .0029          .910    .952
           0.80      50   .707                .0098               .0091          .878    .932
                     100  .706                .0049               .0048          .890    .932
                     200  .701                .0023               .0024          .904    .952
           1.50      50   .703                .0090               .0084          .884    .934
                     100  .704                .0043               .0044          .890    .946
                     200  .700                .0022               .0022          .898    .944
           3.00      50   .701                .0084               .0080          .880    .946
                     100  .704                .0040               .0040          .900    .942
                     200  .700                .0021               .0020          .898    .950

Table 4.5: Simulation results (based on 500 simulation runs) for Normal(0, 1) covariate, Uniform(0, 1) censoring distribution (42% censoring), and exponential marginals.
$\beta_0$  $\theta$  n    mean $\hat\beta_m$  var($\hat\beta_m$)  mean $\hat V$  90% CP  95% CP
0          0.25      50   .004                .0078               .0085          .914    .958
                     100  .001                .0041               .0043          .902    .958
                     200  -.000               .0021               .0021          .912    .956
           0.80      50   .001                .0083               .0086          .902    .954
                     100  .000                .0044               .0043          .900    .950
                     200  -.000               .0022               .0021          .896    .962
           1.50      50   .000                .0086               .0081          .888    .952
                     100  .000                .0046               .0044          .890    .938
                     200  .000                .0022               .0021          .894    .944
           3.00      50   -.004               .0081               .0081          .900    .952
                     100  -.000               .0046               .0044          .904    .944
                     200  -.000               .0023               .0021          .882    .938
0.7        0.25      50   .708                .0131               .0134          .910    .948
                     100  .706                .0073               .0068          .902    .942
                     200  .699                .0033               .0034          .912    .946
           0.80      50   .702                .0115               .0111          .902    .950
                     100  .705                .0061               .0060          .898    .938
                     200  .700                .0030               .0030          .898    .950
           1.50      50   .700                .0112               .0111          .896    .952
                     100  .703                .0056               .0051          .894    .936
                     200  .700                .0029               .0028          .906    .952
           3.00      50   .698                .0109               .0109          .888    .950
                     100  .703                .0054               .0055          .904    .948
                     200  .699                .0029               .0021          .896    .948

Table 4.6: Simulation results (based on 500 simulation runs) for Normal(0, 1) covariate, Uniform(0, 5) censoring distribution (19% censoring for $\beta_0 = 0$ and 21% censoring for $\beta_0 = 0.7$), and Weibull marginals.
$\beta_0$  $\theta$  n    mean $\hat\beta_m$  var($\hat\beta_m$)  mean $\hat V$  90% CP  95% CP
0          0.25      50   .005                .0060               .0060          .890    .950
                     100  .001                .0029               .0031          .900    .956
                     200  -.000               .0014               .0015          .914    .958
           0.80      50   .001                .0061               .0061          .904    .954
                     100  .001                .0030               .0031          .910    .950
                     200  -.000               .0014               .0015          .918    .962
           1.50      50   -.000               .0062               .0062          .894    .958
                     100  .000                .0030               .0031          .910    .950
                     200  -.000               .0015               .0015          .910    .962
           3.00      50   -.002               .0062               .0062          .890    .952
                     100  .000                .0030               .0031          .900    .952
                     200  -.000               .0016               .0015          .896    .944
0.7        0.25      50   .712                .0124               .0114          .898    .930
                     100  .707                .0063               .0059          .884    .934
                     200  .701                .0027               .0030          .904    .954
           0.80      50   .706                .0103               .0097          .888    .944
                     100  .706                .0052               .0050          .892    .928
                     200  .700                .0025               .0026          .900    .950
           1.50      50   .702                .0097               .0090          .894    .936
                     100  .704                .0047               .0047          .894    .932
                     200  .700                .0024               .0023          .894    .952
           3.00      50   .700                .0092               .0086          .904    .946
                     100  .703                .0044               .0044          .880    .942
                     200  .700                .0023               .0022          .902    .954

Table 4.7: Simulation results (based on 500 simulation runs) for $\beta_0 = 0.7$, Bernoulli(0.5) covariate, Uniform(0, 5) censoring distribution (9% censoring), and exponential marginals.

$\theta$  n    mean $\hat\beta_m$  var($\hat\beta_m$)  mean $\hat V$  90% CP  95% CP
0.25      50   .713                .0314               .0275          .876    .934
          100  .704                .0153               .0139          .866    .924
          200  .701                .0074               .0071          .902    .948
0.80      50   .704                .0300               .0257          .872    .936
          100  .700                .0146               .0129          .882    .938
          200  .699                .0067               .0066          .894    .952
1.50      50   .702                .0283               .0248          .870    .934
          100  .698                .0139               .0124          .884    .932
          200  .698                .0063               .0063          .892    .948
3.00      50   .702                .0269               .0241          .878    .944
          100  .698                .0131               .0121          .880    .936
          200  .697                .0060               .0061          .896    .960

Table 4.8: Simulation results (based on 500 simulation runs) for sample size of 100, Bernoulli(0.5) covariate, and Uniform(0, 5) censoring distribution (12% censoring for $\beta_0 = 0$ and 9% censoring for $\beta_0 = 0.7$ with exponential marginals, and 19% censoring for $\beta_0 = 0$ and 16% censoring for $\beta_0 = 0.7$ with Weibull marginals).
$\beta_0$  $\theta$  marginals    mean $\hat\beta_m$  var($\hat\beta_m$)  mean $\hat V$  90% CP  95% CP
0          0.25      Exponential  -.006               .0120               .0112          .878    .938
                     Weibull      -.005               .0127               .0122          .888    .938
           0.80      Exponential  -.009               .0123               .0112          .874    .932
                     Weibull      -.008               .0130               .0122          .872    .930
           1.50      Exponential  -.008               .0123               .0113          .888    .940
                     Weibull      -.008               .0128               .0123          .874    .940
           3.00      Exponential  -.008               .0120               .0113          .892    .954
                     Weibull      -.009               .0129               .0123          .886    .948
0.7        0.25      Exponential  .704                .0153               .0139          .866    .924
                     Weibull      .704                .0160               .0148          .872    .932
           0.80      Exponential  .700                .0146               .0129          .882    .938
                     Weibull      .700                .0153               .0139          .888    .928
           1.50      Exponential  .698                .0139               .0124          .884    .932
                     Weibull      .698                .0146               .0134          .890    .938
           3.00      Exponential  .698                .0131               .0121          .880    .936
                     Weibull      .697                .0140               .0130          .880    .938

Chapter 5
Example

5.1 Introduction

In this chapter we illustrate the methodology using data from the Framingham Heart Study. In Section 5.2 we describe the data and the model which was fit. The estimates and a brief discussion are presented in Section 5.3.

5.2 Data and Model

The Framingham Heart Study was undertaken to determine the risk factors for coronary heart disease (CHD) and other atherosclerotic disorders (Dawber (1980)). It began in 1948. At that time Framingham was a small self-contained community of approximately 28,000 white inhabitants, 18 miles west of Boston. The study included 2,336 men and 2,873 women aged between 30 and 62 at their baseline examination. The sampling unit was the family. The members of the cohort were invited for an examination every two years for the duration of this 30-year study. At each of the biennial visits a medical history, physical examination, blood chemistries, and other laboratory work were completed. Symptoms of illness that had developed since the previous visit were reviewed, and interim hospitalizations or medical visits were registered.
Besides demographic information, risk factors such as blood pressure, cholesterol levels, weight, alcohol use, gout, diabetes mellitus, hemoglobin, and smoking behavior were recorded at baseline and throughout the follow-up period.

The Framingham Heart Study data set collected over the course of thirty years of follow-up examinations is sizeable. For simplicity, in this example we restrict our focus to the time to the first evidence of cerebrovascular accident (CVA) and the time to the first evidence of CHD. The data set used in this example includes all participants in the study who had an examination at age 44 or 45 and were disease-free at that examination. By disease-free we mean having no prior history of hypertension or glucose intolerance and not having experienced a CHD or CVA. The time origin is the time of the examination at which an individual entered the sample. Because some individuals were in the study several years prior to inclusion in the data set, the waiting time (mean 7.4 years) from entering the study to reaching 44 or 45 years of age was used as a covariate to account for the cohort effect.

Of the 1,571 disease-free individuals, 233 later experienced CHD but not CVA, 34 CVA but not CHD, and 17 both CHD and CVA. Three hundred and ten participants were siblings: 113 groups with sibship size of 2, 24 of size 3, and 3 of size 4 ($J = 4$). We considered a mixed baseline model with different baselines for CHD and for CVA and an identical baseline for sibling members. The risk factors of interest were gender (59% females), systolic blood pressure (mean 122 mm Hg), body mass index (mean 24.8 kg/m$^2$), cholesterol level (mean 231 mg/dL), and cigarette smoking (63% smokers). The values of the risk factors were taken from the biennial examination at which an individual was entered into the sample. Because of the small number of CVA events, we used the same regression coefficients for both CHD and CVA.
For illustrative purposes, however, the gender effect was treated as failure-specific. More specifically, the mixed marginal hazard model was

$$\lambda_{ijk}(t; Z_{ijk}) = \lambda_{0k}(t)\exp\{\beta' Z_{ijk}\},$$

where $k = 1$ for CHD, $k = 2$ for CVA, $Z_{ij1}$ = (Smoking, Cholesterol, BMI, SBP, Waiting time, Gender, 0)$'$, and $Z_{ij2}$ = (Smoking, Cholesterol, BMI, SBP, Waiting time, 0, Gender)$'$.

5.3 Results and Discussion

The analysis results for the mixed baseline hazard model are shown in Table 5.1. The p-values are from the Wald test. We conclude from the analysis that cigarette smoking and higher blood cholesterol level, body mass index, and systolic blood pressure significantly increase the hazard rate for CHD and CVA, and that being female significantly reduces the hazard rate of CHD but not of CVA. Failure to establish an association between gender and the time to CVA may reflect the small number of CVA events in the data.

Klein, Moeschberger, Li and Wang (1992) also analyzed this data set using frailty models. Because a common frailty could not capture both between-subject dependence and within-subject dependence concurrently, they had to perform three separate analyses: one analysis for CHD and CVA events which ignored the dependence among siblings, and two further analyses, one for the CHD event and one for the CVA event, which considered the association between the failure times among siblings. Ignoring dependence among failure times, however, may result in erroneous conclusions. Separate analyses for CHD and CVA preclude the direct estimation of the common covariate effect for both CHD and CVA.

Because individuals' susceptibilities to CHD are different from their susceptibilities to CVA and there are no physiological differences among siblings, neither the common baseline hazard model nor the distinct baseline hazard model is appropriate here.
The mixed baseline hazard model offers greater modeling flexibility and applicability and allows us to handle a problem that existing methods cannot deal with.

Table 5.1: Estimates of regression parameters for the Framingham Heart Study data.

Effect                            $\hat\beta$   SE      p-value
Smoking status (Yes=1, No=0)      0.360         0.134   0.007
Cholesterol (mg/dL)               0.004         0.001   0.006
Body mass index (kg/m$^2$)        0.039         0.016   0.016
Systolic blood pressure (mm Hg)   0.017         0.005   < 0.001
Waiting time (year)               0.006         0.019   0.770
Gender (Female=1, Male=0)
    CHD                           -0.616        0.132   < 0.001
    CVA                           -0.308        0.280   0.272

Chapter 6
Misspecification of Marginal Hazard Models

6.1 Introduction

In the proof of the asymptotic distributions for regression parameter estimators in Chapter 3, it is critical that the marginal hazard functions $\lambda_{ijk}(t; Z)$ are correctly specified. In this chapter we study the consequences of misspecification of $\lambda_{ijk}(t; Z)$.

The misspecification of a marginal hazard model may happen in a variety of ways. First, the model family may be misspecified. For example, the Cox regression hazard model is assumed for analysis when, in fact, the accelerated life model holds. Second, even if the choice of the Cox hazard model is correct, the wrong functional form for the regression portion, $\exp\{\beta' Z\}$, may be assumed, which includes omitting important covariates from the model or choosing the wrong functional form for a covariate. Third, the baseline types may be misspecified even when the Cox model is applicable and the correct functional form of $\exp\{\beta' Z\}$ is used, for example when a common baseline hazard model is assumed but a mixed baseline hazard model should be used.

As in Chapter 3, we shall confine our attention to the mixed baseline hazard model in this chapter, since the technical development for the mixed model combines the features of both the distinct baseline hazard model and the common baseline hazard model.
In Section 6.2 we derive the asymptotic properties for the estimator under an assumed mixed baseline model which is possibly misspecified. We then give the asymptotic properties of the estimator under an assumed distinct baseline hazard model and under an assumed common baseline hazard model, respectively, without proof for the sake of conciseness. In Section 6.3 we apply the general results developed in Section 6.2 to some special cases, including the case of misspecifying the type of baseline hazard function for the Cox model when the functional form of the regression portion, $\exp\{\beta' Z\}$, is correct.

To avoid imposing more conditions, we assume in this chapter that the covariate, counting, and censoring processes are independent and identically distributed. In other words, $(X_{ijk}, \delta_{ijk}, Z_{ijk})$ $(i = 1, \ldots, n)$ are $n$ iid realizations of $(X_{1jk}, \delta_{1jk}, Z_{1jk})$ for $j = 1, \ldots, J$ and $k = 1, \ldots, K$. We assume in the sequel that the true underlying marginal hazard function for $(i, j, k)$ is $\lambda_{ijk}(t)$. Note that the true marginal hazard model $\lambda_{ijk}(t)$ may not even belong to the Cox regression model family.

In addition to the notation $S_{jk}^{(d)}(\beta, t)$, $S_{j\cdot}^{(d)}(\beta, t)$, $s_{jk}^{(d)}(\beta, t) = E(S_{jk}^{(d)}(\beta, t))$ $(d = 0, 1, 2)$, $V_j(\beta, t)$, etc., introduced in Chapter 3 under an assumed marginal baseline hazard model, we hereafter use the following notation for the true underlying marginal hazard function $\lambda_{ijk}(t)$:

$$S_{j\cdot}^{(d)}(t) = \sum_{k=1}^{K} S_{jk}^{(d)}(t), \qquad s_{jk}^{(d)}(t) = E(S_{jk}^{(d)}(t)), \qquad d = 0, 1, 2,$$

and

$$s_{j\cdot}^{(d)}(t) = \sum_{k=1}^{K} s_{jk}^{(d)}(t), \qquad d = 0, 1, 2,$$

where the expectations are taken with respect to the true model of $(i, j, k)$.

6.2 Asymptotic Properties of Regression Estimators

6.2.1 Mixed Baseline Hazard Models

Let $\hat\beta$ be defined as the solution to the pseudo partial likelihood score equations (c.f. 2.3)
(,8, u) under an assumed mixed baseline hazard model dNijk(u) = 0 (6.1) Theorem 6.1 (Consistency of /3 under a possibly misspecified marginal mixed base- line hazard model) Let,8* be the unique solution to the system of equations q(,8) = 0, where ,8) q( (1)(,8 t) r {S(I)(t) _ Sj. , s(O)(t)}dt. ~ Jo (0)(,8 t) J T =~ J. J=1 Suppose that the following conditions hold: Condition 1. For j SJ. , J. = 1,···,J and k = 1,···,J{ E{ sup Y'ijk(t)!Zljk(t)1 2 Aljk(t)} < 00. te[O,T] There exists a neighborhood B of,8* such that E{ sup Y'ijk(t) IZljk(t) 12 e,8' Z1 11c(t)} < 00, te[o.T].,8eB 95 (6.2) AOj(t) ~ 0, and loT AOj(t)dt < 00. Condition 2. Pr{¥ijk(t) = 1 Vt E [0, T]} > 0. Condition 9. The matrices . K - E fa = 1"" * (0) "j({3, t).5jk (t)dt 0 k=l are positive definite, Vj T ,J. Then the maximum partial likelihood estimator f3 is a consistent estimator of {3*. Proof: This proof follows the argument as in the proof for Theorem 3.1. First, following the proof of Theorem 3.3, it is easy to show that Condition 1 and Condition 2 implies that for j = 1,,'" J, k = 1",', K, and d = 0,1,2 sup IIS~~)(t) - .W(t)1I .~ te[O,T] ° (6.3) and there exists a neighborhood B of {3* such that, sup IISW({3, t) - .W({3, t)1I ~ te[o,T],{3eB as n ~ 00; ° (6.4) sW(t) is bounded on (0, T) and sWUf, t) on B x (0, T); and s~~)(t) is bounded away uniformly from zero on (0, T) and s.~~)({3, t) away from B x (0, T) as well. Consider the process N G({3, t) 1 = -[I({3, t) - 1({3*, t)] n where 96 .!.... K = L E Gjk ({3, t) ~lbl N . Define the process J G(~, t) K = L: L: Gik(~, t) i=1 ];=1 where Then 1~ r {Iog S(O)(Q* sj~)(~, u) 1 s}~)(~, u) } dN. ) - og - -~!- Jo = _ ,=1 r Jo J. (O)(Q*) IJ, U SJo IJ, U {10g sj~)(~*,u) sj~)(~,u) -10 S}~)(~'U)} d (Noik(U)) . g s}~)(~*,u) n Based on (6.4) and boundedness of s~~)(~, u), for every U E [0, r] and ~ E 1 SJ~)(~, u) 1 s}~)(~, u) og sj~)(~*, u) - og s}~)(~*, u) 1 sj~)(~, u) ) sJ. IJ,U og (O)(Q ~ as n -+ 00. 
1 - sj~)(~*, u) ) sJo IJ ,U og (O)(Q* 0 Also, from (1) in Lemma 3.1 ~ + Pr {faT SJ~)(u)du ~ b}. Choose b > ( ) iik U faT s~~)(u)du, then in view of (6.3) we have ·Pr {faT sj~)(u)du ~ 97 b} ~0 B as n ~ 00. As the consequence of the boundedness condition of s~~){t), ~ ~ 0 as TJ ~ 00. Thus, lim lim Pr {Noik(T) n '1too n_oo ~ TJ 1> = o. J (6.5) Therefore, we have Gik{~' T) - Gik(/J, T) --.£-+ 0, that is, G(/J, T) is asymptotically equivalent to G{~, T). The compensator of Gik(~, t) is Then, for each ~ E = 1 B, the process n -n!'" 1=1 1t{ 0 I~(O){/J,U)} dM"k{U) *' (~ - ~ ) Z"k{U) -log - (O){(.I*) ). I) Si. I) fJ' U is a locally square integrable martingale in t since the integrand is locally bounded and predictable. The predictable variation process of this martingale at t is . 98 + ( log )2} s~~)(,8,u) (0) * Sj. (f3 ,U) Yijk(U)-\ijk(U)du _ .!.n Jor{(f3-f3*)'S(2)(U)(f3-f3*)-2([3-[3*)'S(1)(U)IOg S~~)([3,U) Jk Jk (O)(a*) SJ. + log Sj.(0) ([3, u) ) 2 S(O)(u) } (O)(a*) sJ. ,." U ( ,." U du Jk It now follows from (6.3), (6.4), and the boundedness conditions of s~~)(t) and s~~)([3, t) that for each [3 E B, nHjk ([3, r) converges in probability to a finite function of [3. Hence, the inequality of Lenglart (2) in Lemma 3.1 implies that, for each [3 E Band as n --+ 00, Also, as the result of (6.3), (6.4), and the boundedness conditions of sW(t) and sW([3, t)(d = 0,1), for each [3 E B, A jk ([3, r) ( a) = 9jk,., T 1 o { converges in probability to (O)(a u ) } ([3 - [3 *)' Sjk (1)( (0)( u) - Sjk u)log Sj.,." (0) * duo . Sj. ([3 ,u) Hence, Gjk([3, r) must converge in probability to gjk([3), as long as [3 E B, for j = 1,''', J and k = 1"", K. Therefore, G([3, r) ~ g([3) where K J =L L g([3) gjk([3). j=lk=l Using the boundedness conditions of sW(t) and s}~)([3, t)(d = 0,1,2), we may evaluate the first and the second derivatives of g([3) by taking partial derivatives inside the integral [c.f., Corollary 5.9 in Bartle (1966)]. 
Hence, for each [3 E B, 8 8[3g([3) - ~~ r { S3k(1)( U) _ j=1k=l 0 L..J L..J Jo 99 s~~\[3, u) (0)( )} d (0) S3k u u Sj. ([3, u) ~ ior {8).(1)( U ) _ 8~~)(,8, u) (0)( )} d (0) S). U U - LJ j=1 0 = which is 0 at ,8 Sj. (,8, U) q(,8) = ,8* by the definition of {3*. - Furthermore, _LJ ~ r {8}~)({3, u) _ Jo j=1 0 = - (0) Sj. ({3,u) (8~~ll(,8, U)) 18l (0) Sj. (,8,u) 2 } (0)( S3. )d u U J r L io Vj(,8, u)s~~)(u)du j=1 0 which is negative definite at ,8 = ,8* by Condition a. Therefore, G({3, r) converges pointwise in probability, for each ,8 E B, to a concave function 9(,8) on B with a unique maximum at ,8 = ,8*. The random function 0'(,8, r) is also concave and has a unique maximum at ,8 = ,8* when the maximum exists. Following from Lemma 3.2, the random concave function G(,8, r) converges to 9(/3) in probability uniformly over i3 of 0(,8, r) converges in probability to the maximizing value of ,8* of 9(,8), that is, i3 ~ ,8* as n ~ 0 B. Consequently, the maximizing value 00. The foregoing result in Theorem 6.1 is a multivariate failure time generalization of Theorem 2.1 of Struthers and Kalbfleisch (1986) and (2.1) of Lin and Wei (1989). When the mixed baseline hazard model (6.1) is correct, i.e., note that we then have 8W(t) Hence, ,8* = AOj(t)8W(,80, t), (d:= 0,1). = ,80 is the solution to q(,8) = O. The asymptotic distribution of the maximum partial likelihood estimator fj under a possibly misspecified marginal mixed baseline hazard model is given in the next theorem. 100 Theorem 6.2 (Asymptotic normality of fj under a possibly misspecified marginal mixed baseline hazard model) Assume that Condition 1 - Condition 3 given in Theorem 6.1 are satisfied. vn {SJ~)((3*, t) as n --+ 00, In addition, suppose that vn {Fj.:n(t) - Fj.(t)} and s;~)((3*, t)} are normal processes with mean O. Then where (6.6) J W" (f3)= IJk r{z.. Jo IJk K K J A((3) = E{~ ~ ~ ~ Wljk((3)W~!g((3)}, j=lk=I!=lg=1 8;~)((3'U)}{dN." (U)_l'ijk(U)eXP{(3IZijk(U)}dF-(U)} (0)( (.I) IJk (0)( (.I ) J.' (u)- sJ. sJ. 
fJ' U fJ' U (6.7) K Fj.:n(t) K = ~ Fjk:n(t), Fj.(t) k=1 = ~ Fjk(t) k=1 J 1((3) = ~ I j ((3), ;=1 and 1;((3) is defined as in Condition 3 in Theorem 6.1. Proof: The pseudo partial likelihood score function under the working mixed baseline hazard model is J - I: U;((3) ;=1 where 101 The first order Taylor expansion of U(f3) centered at (3* results in U((3) where = U((3*) + :(3U((3) 1(3=19((3 - (3*) /3 is on a line segment between fj and (3*. Since U(fj) = 0 by the definition of fj, we have where First, we shall show that JnU((3*) ~ N p (0, A((3*)) . To this end, we shall show that n- 1 / 2 U j ((3*) can be expressed as a sum of independent and identically distributed random vectors 1 1 -w((3*) ~ n J K = - E E E tVijk((3*) ~ i=lj=lk=l plus terms that converge in probability to zero. In other words n- 1 / 2 U((3*) is asymptoticallyequivalent to n- 1 / 2 w((3*) in the sense that In Let Wj((3) {U((3*) - w((3*)} = Ei=l Ef:"=l Wijk((3). Notice that __ 1 {U .((3*) - w'((3*)} ~:J :J 102 --!~ o. (6.8) xdFj.(u) (6.9) Now, n 1 / 2 {Fj.:n (t) - Fj.(t)} converges weakly to a. zero-mean normal process by assumption. As the consequence of the tightness of the normal process of n 1/2 {Fj .:n (t) - Fj.(t)} and the boundedness of 8~~)(,8, t), (6.8) vanishes as n that 11(6.9)11 < sup UE[O,T] - {S~~)(,8*,u) (0) * Sj. (,8 ,u) - 8~~)(,8*,u) (0) S j. * (,8 ,u) (0)((.1* Sj. 01'(1) 103 fJ ,U )-1} --+ 00. We also have because (,1*) 5 j.(1)( fJ' U p { S(O)(f3*) uelo,T] j. , U SU - (,1* ) Bj.(1)( fJ' U (O)(f3*) Sj. , U (0) * -1 S· U J. (/3, ) } = 0 p( 1) in view of (6.4) and the boundedness of s}~)(f3, u), plus v'n ISJ~)(f3*, u) - s}~)(/3" u)1 = Op(l) from the assumption of the normal process of ..;n{ S)~) (f3* , u) - s}~) (f3*, u)}. Hence, n- 1/ 2 U(f3*) is asymptotically equivalent to K r { B}~)(/3*,U)} r.::-LLLJo Zijk(U)- (0) * V n i=1 j=1 k=1 Sj. (/3 ,U) lIn 'Ti W(/3*) va J 0 .. ( ) _ )'ijk(u)exp {/3*'Zi jk(U)}dF. ( )} (0) * J. U {dNIJk U Sj. 
(f3 ,u) X • Notice that n- 1/ 2 w(/3*) is a sum of n independently and identically distributed p-component random vectors with mean 0 and cova.riance matrix J A(/3*) = K J K E{L L L L Wljk(/3*)W~jg(/3*)}. j=lk=lj=lg=1 It then follows from the multivariate central limit theorem that What is left is to show the consistency of 1(~) for 1(/3*) for any random /3 = ~(n). We now establish the consistency using the t(~chniques applied in the proof of Theorem 3.2. Notice that r - - j=1 L k=1 L Jo J K Yj(/3, U )dN.jk (u) 0 104 and 1JKr JK -LL}o ¥;(,8,u)dN.;lc(U)-LL}or Vj({3*,u)s~~)(u)du n j=llc=1 0 j=1 lc=1 0 < t. t./[ + + + {11(.8, u) - ,,;(,8, u)} dN.~(u) II t.t.II[ t.t.II[ t.t.11!.' {1I;(,8,u)-"M",u)} dN.~(U)11 ";(/3", u) { dN.~<u) 11;(/3., u) {sj2)(u) - sj2) (u)du }II -w<u)} dull· (6.10) It follows from (6.4) and the boundedness of sW({3, t) that sup ue[O,T].{3e B Therefore, ,8 lilT 0 {¥;({3, u) - Vj({3, unll ~ o. ~ {3* together with (6.5) indicates that the first term of (6.10) con- verges in probability to zero. The continuity in {3, uniformly in t, of sW({3, t) and (6.5) shows that the second term of (6.10) tends to zero also. Following from Lenglart inequality (2) in Lemma 3.1, Pr [I (Vj({3*,u))a.b{ dN.:(u) -S~~)(U)dU} > 8] T 105 where, (Vj (f3 , U))a,b is the (a, b) element of Vj({3, u) and M.jk(U) = E?:l Mijk(U). Hence, (6.4) and the boundedness of sW({3, t) and Condition 1 implies the third term of (6.10) vanishes. Finally, applying (6.3), boundedness conditions of s;:>({3, t)(d = 0,1,2) and Condition 3 directly to the fourth term of (6.10), we obtain that this term converges in probability to zero. Thus, when {3 ~ (3*, as n ~ 00. The proof is completed. 0 Note that Theorem 6.2 is a multivariate failure time generalization of Theorem 2.1 of Lin and Wei (1989). Also, when the marginal mixed baseline hazard model (6.1) is correct, Wijk({3o) reduces to Dijk({3o) and E({3o) = E m ({3) [c.f. (6.7), (3.1), (6.6), and (3.11)]. 
The sufficient regularity conditions of asymptotic normality for y'n(fj - (3*) are stronger than those when the marginal mixed baseline hazard model (6.1) is correct. It is natural to estimate the covariance matrix of y'n(!J - (3*), E({3*), from the data by (6.11) where, I({3) r z: z: z: 10 ¥j({3, u)dNijk(U), n 1 n J K = - i=l j=l k=l (6.12) 0 (6.13) 106 and X Note that { 71.T () Ok U dJ.V; '3 dFi.:n ( u) } SJ~)(f3, u) • (6.14) by replacing f3*, B~~)(f3, t), s~~)(f3, t), and Fi.(t) with and Fi.:n(t), respectively. It is interesting to note also that the variance estimate for Em ( ) (.l' Zoo (u) 'k U e fJ 'JIe '3 E is obtained from E /3, S~~)(f3, t), SJ~)(f3, t), v -.1; /3, E is defined the same as the variance estimate for /3 m' [c.f. (3.9) - (3.15), (6.11) - (6.14)]. The following theorem states that E is a consistent estimator of E. Theorem 6.3 (Consistency of E(/3) under a possibly misspecijied marginal mixed baseline hazard model) Suppose that all the assumptions of Condition 1 to Condition 3 are satisfied and in addition the following condition holds, E{ sup Yiik(U)IZlik(U)lef3'Zlile(U)Yif9(V)IZlf9(V)lef3'Zl/9(V)} < 00 f3EB.UE[O,"I.tlE[O,"1 1$j,J$J,l$k,9$K (6.15) then Proof: This proof follows the argument in the proof for Theorem 3.4. Basically, we first need to show that 107 and 1(~) ~ 1(13*) as n --+ 00. Then using Slutsky's theorem we obtain A(~) ~ A(f3*) and using Slutsky's theorem again we get the desired results. To show that the first part holds it suffices to show that each component of converges in probability to the corresponding components in that is, for a, b = 1,'" ,p, Write the ath element of the vector bijk as bijk, a, then (6.16) (6.17) (6.18) 108 n +.!:. '" [ 'T (1) (1) 'T Sj.,a(~,U) dN.. (U) [ Sf.,b(~'U) dN· (U) A n ~ i o s(O)((.I ) 1=1 J. ,.."U A io S(o)((.I ) f. ,.."U IJk Ijg (6.19) (6.20) (6.21) --n1 i=l L ior { Zijg,b(U) n 0 A 'T X 1 o }ijk(u)e(3 (0) A S(l) ~~~ (/3 U)} dNijg(u) A ' Sf. ((3, U) I Zijlc(U) Sj. 
((3, U) dFj.:n( U) (6.22) (6.23) (6.24) 109 (6.25) (6.26) r x Jo [Zlfg,b(U) o ' ej,b(,8*,U)] ¥if (u)e 13• Zl/9(U) 9 (0) * Sj.(,8,u:) dFj.(u)du } ", (6.27) (6.28) (6.29) Note that (6.16) is just an average of iid random variables. Therefore, (6.16) ~ (6.23) by the law of large numbers. Now, let's rewrite (6.20) and (6.27) in the following forms: (6.20) = dFj .:n ( U )dFj .: n ( v) X S)~)(13, u)S}~)(13, v) (6.30) 110 ,- (6.31) (6.32) (6.33) (6.27) = x dFj.(u)dFf.(v) s~~)(f3*, u )S}~)(f3*, v) Xfj (6.34) (f3* u) dFj.(u)dFf.(v) ,II , s~~)(f3*, u )s~~)(f3*, v) (6.35) - loT loT E{Yif9(V)ef30'Z1f9(V)Zljk,II(U)Yijk(U)ef30'Zlik(U)} X (f.l* fj,b 1J ,V ) dFj.(u)dFf.(v) ( Sj. (f3*, U)S f~)(f3*, V) (0) x fj,II(f3*, u)ef,b(f3*, V) dFj. (U )dFf.(V) S(O)(f.l* U)S(O)(f.l* ). 1J, f. 1J, V ) 111 (6.36) (6.37) Define and " From (6.15), we also have and E{ sup y}jk(u)ef31Z1i1l:(U)y}jg(v)ef31Z1/9(V)} < 00. {3eB,Ue[O,"J,ve[O,"J l$j,J$J,l$k,g$K Using the above condition and a similar argument <l.S in Theorem 3.3, we have that mjkajgb(f3, u, v) is a continuous function of f3 E Band Mjkajgb(f3, U, v) converges in probability to mjkajgb(f3,u,v) uniformly in (u,v) E [O,T]2. By (6.4) and boundedness of s}~)(/3, t), the above results and the consistency of (3, we have Mjkajgb(!3, U, v) p sj~)(!3, u )S}~)(!3, v) ---. mjka!.gb(/3*, u, v) s}~)(/3*, u )s~~)(/3*, v) uniformly in (u,v) E [O,T]2. Note that Fj.:n(u) converges to Fj.(u), that Fj.:n(u) is bounded in probability and that mjkajgb(/3*, u, v) resides in the product space of left continuous functions, it follows that (6.30) is consistent for (6.34). Using similar arguments, we can show that: (6.31) ~ (6.35), (ti.32) ~ (6.36), and (6.33) ~ (6.37). Hence, (6.20) ~ (6.27). Again using the similar argument, we can show that (6.21) ~ (6.28) and (6.22) ~ (6.29). 112 Now, we are going to show that (6.17) is consistent for (6.24). .. < A+B where and " By the law of large numbers and consistency of 13 we obtain that B ~ 0 as n --+ 00. 
From (6.4) and boundedness of s}:) (,8, t), consistency of 13, and the iid condition across the families we have A 1 n L: < fNi;k( T ) IZijg,b(Xijg) 1 n i=l < fIZijg,b(Xijg)1 n i=l --+ p --+ 1 n L: fE( Zljg,b(X1jg)) 0 113 since E(Zljg,b(X1jg)) < 00 by Condition 1. Hence, A ~ 0 as n ----+ 00. Using similar arguments, we can show that (6.18) ~ (6.25), and (6.19) ~ (6.26). Therefore, . To prove the second part, we write 11 1(,8) - 1(,8*)11 1JKr -n E E Jo j=l k=l JK Yj(/3, U)dN.jk ( u) - 0 E E Jor j=l k=l 0 l1j(,8*, U)s~~>C8*, U)AOj( U)du +t. Ell!.' {Vj(~, u) - u)} dN.~( u) I +t.EII!.' V;(p·,U){dN.~(U) '\~'du}11 v;(p', .. - . A A Then, using the same argument as for 6.10, we can show that 1(f3) 6.2.2 Let p ----+ 1(,8*). 0 Distinct Baseline Hazard Models /3 be defined as the solution to the pseudo partial likelihood score equations J EK EE f Jo n k=l j=l i=l 'T { Zijk(U) - sW (f3, u) } dNijk(U) = 0 (0) Sjk (f3, u) under an assumed distinct baseline hazard model (6.38) 114 Let (3* be the unique solution to the system of equations q({3) If J K faT j=1 k=1 0 = L: L: S(I)({3 t) t:) , {sW(t) - Sjk ({3, t) S)~)(t)}dt = o. (6.39) Define E({3) = I-I ({3)A({3)I- 1((3), J K J K L: L: L: A({3) = E{L: Wljk({3)W~Jg({3)}, j=lk=I/=lg=1 " (~) - r {Zoo ( ) _ sW({3,U)} {dN... ( ) _ }'ijk(u)exp{{3'Zijk(U)} dF- ( )} "k U (o)(~) "k U (o)(~ ) ,k U , o Sjk ,." U Sjk ,." U W"k'" - Jo I({3) J K j=1 k=1 = L: L: I jk({3), and - 8 Z (X) ijk { ijk ijk - S(I)(~ X .. )} jk,." sj~)({3, "k X ijk ) {z.o"k (U)_ SW({3,U)} - Inr (0) o Sjk ({3, u) 0 {dl\Too ( ) _ V:o ( ) (3'Zij/c(U) dFjk:n(U)} H"k U .Li,k U e (0) Sjk ({3, u) 115 • Theorem 6.4 (Asymptotic properties of /3 under (l possibly misspecified marginal distinct hazard model) Suppose that the following conditions are satisfied: Condition = 1,···, J, 1. For j k = 1,···, J(, E{ sup Yijk(t)IZljk(t)12 Aljk(i)} < 00. 
Here the supremum is taken over $t \in [0, \tau]$. There exists a neighborhood $\mathcal{B}$ of $\beta^*$ such that

$$E\Big\{ \sup_{t \in [0,\tau],\, \beta \in \mathcal{B}} Y_{1jk}(t)\,|Z_{1jk}(t)|^2\, e^{\beta' Z_{1jk}(t)} \Big\} < \infty,$$

$\lambda_{0jk}(t) \ge 0$, and $\int_0^\tau \lambda_{0jk}(t)\,dt < \infty$.

Condition 2. $\Pr\{Y_{ijk}(t) = 1 \ \forall t \in [0, \tau]\} > 0$.

Condition 3. The matrices $I_{jk}(\beta^*)$ are positive definite.

In addition, assume that $\sqrt{n}\,\{F_{jk:n}(t) - F_{jk}(t)\}$ and $\sqrt{n}\,\{S_{jk}^{(0)}(\beta^*, t) - s_{jk}^{(0)}(\beta^*, t)\}$ converge weakly to normal processes with mean 0. Then

$$\sqrt{n}\,(\hat\beta - \beta^*) \stackrel{d}{\longrightarrow} N_p\big(0, \Sigma(\beta^*)\big)$$

as $n \to \infty$, and $\Sigma(\beta^*)$ can be consistently estimated by $\hat\Sigma(\hat\beta)$.

6.2.3 Common Baseline Hazard Models

Let $\hat\beta$ be defined as the solution to the pseudo partial likelihood score equations

$$\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^\tau \left\{ Z_{ijk}(u) - \frac{S_{\cdot\cdot}^{(1)}(\beta, u)}{S_{\cdot\cdot}^{(0)}(\beta, u)} \right\} dN_{ijk}(u) = 0$$

under an assumed common baseline hazard model

$$\lambda_{ijk}(t) = \lambda_0(t)\exp\{\beta' Z_{ijk}(t)\}. \tag{6.40}$$

Let $\beta^*$ be the unique solution to the system of equations $q(\beta) = 0$, where

$$q(\beta) = \int_0^\tau \left\{ s_{\cdot\cdot}^{(1)}(t) - \frac{s_{\cdot\cdot}^{(1)}(\beta, t)}{s_{\cdot\cdot}^{(0)}(\beta, t)}\, s_{\cdot\cdot}^{(0)}(t) \right\} dt. \tag{6.41}$$

Define

$$\Sigma(\beta) = I^{-1}(\beta)\,A(\beta)\,I^{-1}(\beta),$$

$$A(\beta) = E\Big\{ \sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{f=1}^{J}\sum_{g=1}^{K} W_{1jk}(\beta)\,W_{1fg}'(\beta) \Big\},$$

$$W_{ijk}(\beta) = \int_0^\tau \left\{ Z_{ijk}(u) - \frac{s_{\cdot\cdot}^{(1)}(\beta, u)}{s_{\cdot\cdot}^{(0)}(\beta, u)} \right\} \left\{ dN_{ijk}(u) - \frac{Y_{ijk}(u)\exp\{\beta' Z_{ijk}(u)\}}{s_{\cdot\cdot}^{(0)}(\beta, u)}\, dF_{\cdot\cdot}(u) \right\},$$

$$I(\beta) = \sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^\tau v(\beta, t)\, s_{jk}^{(0)}(t)\,dt, \qquad v(\beta, t) = \frac{s_{\cdot\cdot}^{(2)}(\beta, t)}{s_{\cdot\cdot}^{(0)}(\beta, t)} - \left\{ \frac{s_{\cdot\cdot}^{(1)}(\beta, t)}{s_{\cdot\cdot}^{(0)}(\beta, t)} \right\}^{\otimes 2},$$

$$\hat I(\beta) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^\tau V(\beta, u)\,dN_{ijk}(u), \qquad V(\beta, t) = \frac{S_{\cdot\cdot}^{(2)}(\beta, t)}{S_{\cdot\cdot}^{(0)}(\beta, t)} - \left\{ \frac{S_{\cdot\cdot}^{(1)}(\beta, t)}{S_{\cdot\cdot}^{(0)}(\beta, t)} \right\}^{\otimes 2},$$

$$\hat A(\beta) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{f=1}^{J}\sum_{g=1}^{K} \hat W_{ijk}(\beta)\,\hat W_{ifg}'(\beta),$$

$$F_{\cdot\cdot:n}(t) = \sum_{j=1}^{J}\sum_{k=1}^{K} F_{jk:n}(t), \qquad \text{and} \qquad F_{\cdot\cdot}(t) = \sum_{j=1}^{J}\sum_{k=1}^{K} F_{jk}(t).$$

Theorem 6.5 (Asymptotic properties of $\hat\beta$ under a possibly misspecified marginal common baseline hazard model) Suppose that the following conditions are satisfied:

Condition 1. For $j = 1, \ldots, J$ and $k = 1, \ldots, K$,

$$E\Big\{ \sup_{t \in [0,\tau]} Y_{1jk}(t)\,|Z_{1jk}(t)|^2\,\lambda_{1jk}(t) \Big\} < \infty.$$

There exists a neighborhood $\mathcal{B}$ of $\beta^*$ such that

$$E\Big\{ \sup_{t \in [0,\tau],\, \beta \in \mathcal{B}} Y_{1jk}(t)\,|Z_{1jk}(t)|^2\, e^{\beta' Z_{1jk}(t)} \Big\} < \infty,$$

$\lambda_0(t) \ge 0$, and $\int_0^\tau \lambda_0(t)\,dt < \infty$.

Condition 2. $\Pr\{Y_{ijk}(t) = 1 \ \forall t \in [0, \tau]\} > 0$.

Condition 3. The matrix $I(\beta^*)$ is positive definite.

In addition, assume that $\sqrt{n}\,\{F_{\cdot\cdot:n}(t) - F_{\cdot\cdot}(t)\}$ and $\sqrt{n}\,\{S_{\cdot\cdot}^{(0)}(\beta^*, t) - s_{\cdot\cdot}^{(0)}(\beta^*, t)\}$ converge weakly to normal processes with mean 0. Then

$$\sqrt{n}\,(\hat\beta - \beta^*) \stackrel{d}{\longrightarrow} N_p\big(0, \Sigma(\beta^*)\big)$$

as $n \to \infty$, and $\Sigma(\beta^*)$ can be consistently estimated by $\hat\Sigma(\hat\beta)$.

6.3 Special Cases

The asymptotic results developed in Section 6.2 are very general in the sense that they apply to any kind of model misspecification. In this section we consider some special cases of model misspecification. First, we examine the consequences of baseline type misspecification when the true marginal hazard model is in the Cox regression family and the regression part, $\exp\{\beta' Z(t)\}$, is correct.

Corollary 6.1 If the true underlying marginal hazard model is a common baseline hazard model (6.40), then the estimator $\hat\beta$ under either an assumed mixed baseline hazard model (6.1) or an assumed distinct baseline model (6.38) is consistent for the true value of the regression parameter $\beta_0$.

Proof: If the true marginal hazard model is a common baseline model (6.40) but is misspecified as a mixed baseline hazard model (6.1), we have

$$s_{j\cdot}^{(d)}(t) = \sum_{k=1}^{K} s_{jk}^{(d)}(t) = \sum_{k=1}^{K} E\big\{ \lambda_0(t)\,Y_{1jk}(t)\,Z_{1jk}(t)^{\otimes d}\, e^{\beta_0' Z_{1jk}(t)} \big\} = \lambda_0(t)\sum_{k=1}^{K} E\big\{ Y_{1jk}(t)\,Z_{1jk}(t)^{\otimes d}\, e^{\beta_0' Z_{1jk}(t)} \big\} = \lambda_0(t)\, s_{j\cdot}^{(d)}(\beta_0, t), \qquad d = 0, 1, 2.$$

Hence, from (6.2),

$$q(\beta) = \sum_{j=1}^{J}\int_0^\tau \left\{ s_{j\cdot}^{(1)}(\beta_0, t) - \frac{s_{j\cdot}^{(1)}(\beta, t)}{s_{j\cdot}^{(0)}(\beta, t)}\, s_{j\cdot}^{(0)}(\beta_0, t) \right\} \lambda_0(t)\,dt. \tag{6.42}$$

It is clear that $\beta^* = \beta_0$ is the solution to the system of equations $q(\beta) = 0$ with $q$ as in (6.42). Therefore, by Theorem 6.1, the estimator $\hat\beta$ from the assumed mixed baseline hazard model (6.1) is consistent when the true marginal hazard model is actually a common baseline model (6.40). By the same argument, it is easy to show that the estimator $\hat\beta$ from the assumed distinct baseline hazard model (6.38) is consistent for $\beta_0$ when the true marginal hazard model is, in fact, a common baseline model (6.40). □

The results in Corollary 6.1 guarantee the consistency of the estimator $\hat\beta$ from either a mixed baseline hazard model (6.1) or a distinct baseline model (6.38) when the true model is a common baseline hazard model (6.40). If the common baseline model is true and $\beta_0$ is a scalar, we might anticipate that the variance of the regression estimator from the assumed mixed (or distinct) baseline hazard model would be larger than that from the common baseline hazard model, primarily because the mixed or the distinct baseline model involves more nuisance baseline hazard functions, which over-stratify the risk sets and failure events. On the other hand, the robustness of the "sandwich" type variance estimator may compensate for the possible loss of efficiency.

The following corollary indicates that the converse of Corollary 6.1 is not true.

Corollary 6.2 If the underlying marginal hazard model is a distinct baseline hazard model (6.38), then the estimator $\hat\beta$ under either an assumed mixed baseline hazard model (6.1) or an assumed common baseline hazard model (6.40) is, in general, not consistent for $\beta_0$. Similarly, the estimator $\hat\beta$ from an assumed common baseline hazard model (6.40) is, in general, not consistent for $\beta_0$ when the true model is a mixed baseline hazard model (6.1).

Proof: When the true marginal hazard model is the distinct baseline model (6.38) but is misspecified as the mixed baseline hazard model (6.1), then

$$s_{j\cdot}^{(d)}(t) = \sum_{k=1}^{K} s_{jk}^{(d)}(t) = \sum_{k=1}^{K} E\big\{ \lambda_{0jk}(t)\,Y_{1jk}(t)\,Z_{1jk}(t)^{\otimes d}\, e^{\beta_0' Z_{1jk}(t)} \big\} = \sum_{k=1}^{K} \lambda_{0jk}(t)\, E\big\{ Y_{1jk}(t)\,Z_{1jk}(t)^{\otimes d}\, e^{\beta_0' Z_{1jk}(t)} \big\} = \sum_{k=1}^{K} \lambda_{0jk}(t)\, s_{jk}^{(d)}(\beta_0, t), \qquad d = 0, 1, 2,$$

and from (6.2)

$$q(\beta) = \sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^\tau \left\{ s_{jk}^{(1)}(\beta_0, t) - \frac{s_{j\cdot}^{(1)}(\beta, t)}{s_{j\cdot}^{(0)}(\beta, t)}\, s_{jk}^{(0)}(\beta_0, t) \right\} \lambda_{0jk}(t)\,dt. \tag{6.43}$$

Therefore, $\beta_0$ is not the solution to the system of equations $q(\beta) = 0$ with $q$ as in (6.43), because

$$s_{jk}^{(1)}(\beta_0, t) \ne \frac{s_{j\cdot}^{(1)}(\beta_0, t)}{s_{j\cdot}^{(0)}(\beta_0, t)}\, s_{jk}^{(0)}(\beta_0, t)$$

in general.
It follows from Theorem 6.1 that the estimator $\hat\beta$ from an assumed mixed baseline hazard model (6.1) is not consistent when the true marginal hazard model is a distinct baseline hazard model (6.38). Similar arguments also work for the assertions in the other parts of the corollary. □

The results in Corollary 6.1 and Corollary 6.2 agree with our expectation that over-stratification does not cause inconsistent estimators, but the converse is not true.

When $\beta^*$ is not equal to $\beta_0$, the parameter $\beta^*$ will not be interpretable in general, unless its relationship to $\beta_0$ can be characterized. We focus our attention here on $\beta_{0,1}$, the first component of the true parameter vector $\beta_0$.

Corollary 6.3 Let $\beta_{0,1}$, $Z_{ijk,1}(t)$, and $\hat\beta_1$ be the first components of the parameter vector $\beta_0$, the covariate vector $Z_{ijk}(t)$, and the estimator vector $\hat\beta$, respectively. Suppose that $\beta_{0,1} = 0$ and that $Z_{ijk,1}(t)$ is independent of censoring and of the other $(p - 1)$ components of the covariate vector, $Z_{ijk,2}(t), \ldots, Z_{ijk,p}(t)$. Then $\hat\beta_1$ under an assumed distinct baseline hazard model is consistent, that is,

$$\hat\beta_1 \stackrel{p}{\longrightarrow} 0$$

as $n \to \infty$. Similarly, if $Z_{ijk,1}$ is identically distributed across failure types and $\hat\beta$ is estimated under an assumed marginal mixed baseline hazard model with different baselines for members, or if $Z_{ijk,1}$ is identically distributed across members and failure types and $\hat\beta$ is estimated under an assumed marginal common baseline hazard model, then $\hat\beta_1$ is consistent.

Note that, in the corollary, the true underlying marginal hazard model $\lambda_{ijk}(t)$ may not even be in the Cox regression family.

Proof: Let $s_{jk,1}^{(1)}(t)$ and $s_{jk,1}^{(1)}(\beta^*, t)$ be the first components of $s_{jk}^{(1)}(t)$ and $s_{jk}^{(1)}(\beta^*, t)$, respectively. Let $\hat\beta$ be the estimator obtained from the assumed distinct baseline hazard model (6.38). To show that $\hat\beta_1 \stackrel{p}{\longrightarrow} 0$ when $\beta_{0,1} = 0$, it suffices to show that $\beta_1^* = 0$ when $\beta_{0,1} = 0$, by (6.39) and Theorem 6.4. In other words, it is sufficient to verify that $\beta_1^* = 0$ satisfies the following condition when $\beta_{0,1} = 0$:

$$\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^\tau \left\{ s_{jk,1}^{(1)}(t) - \frac{s_{jk,1}^{(1)}(\beta^*, t)}{s_{jk}^{(0)}(\beta^*, t)}\, s_{jk}^{(0)}(t) \right\} dt = 0. \tag{6.44}$$

Notice that when $\beta_{0,1} = 0$, the independence assumption gives

$$s_{jk,1}^{(1)}(t) = E\{Z_{1jk,1}(t)\}\, s_{jk}^{(0)}(t).$$

Also, under $\beta_1^* = 0$,

$$s_{jk,1}^{(1)}(\beta^*, t) = E\{Z_{1jk,1}(t)\}\, s_{jk}^{(0)}(\beta^*, t).$$

Therefore, (6.44) is satisfied, i.e., $\hat\beta_1$ from the assumed distinct baseline model is consistent.

When $Z_{ijk,1}$ is identically distributed for $k = 1, \ldots, K$, the same factorizations hold for the sums over $k$, $s_{j\cdot,1}^{(1)}(t)$ and $s_{j\cdot,1}^{(1)}(\beta^*, t)$. It is clear that 0 is then the solution to the first component of $q(\beta) = 0$ in (6.2). It follows from Theorem 6.1 that $\hat\beta_1$ obtained under an assumed mixed baseline hazard model is consistent for $\beta_{0,1}$. Using the same argument, we can easily show that $\hat\beta_1$ from an assumed common baseline hazard model is consistent for $\beta_{0,1}$ if $Z_{ijk,1}$ is identically distributed for $j = 1, \ldots, J$ and $k = 1, \ldots, K$. □

Suppose that $Z_{ijk,1}(t)$ is the indicator of the treatment assigned in a randomized clinical trial. The random assignment of the treatment should guarantee that $Z_{ijk,1}(t)$ is independent of censoring and of the other covariates of interest. The results in Corollary 6.3 imply that if the treatment has no effect on failure times, then the estimated treatment effect will converge in probability to zero even if the assumed marginal model is misspecified.

6.4 Concluding Remarks

We derived the asymptotic properties of the maximum partial likelihood estimator $\hat\beta$ under a possibly misspecified marginal Cox regression hazard model. The general results are applied to some special cases, including the case of misspecifying the type of baseline hazard function for the Cox model when the functional form of the regression portion, $\exp\{\beta' Z\}$, is correct. Simulation studies are conducted in Chapter 7 to obtain information on the effect of misspecified marginal hazard models for sample sizes encountered in practice.
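The pseudo score under the working mixed baseline model of Section 6.2.1 can be sketched numerically. The following is a minimal illustration, not the dissertation's implementation: it assumes time-fixed covariates and a "long" data layout with one record per (cluster, member, failure type), and the function name `mixed_baseline_score` and its argument names are hypothetical. The point it shows is that risk sets are pooled over clusters and failure types within each member $j$, which is what the ratio $S_{j\cdot}^{(1)}/S_{j\cdot}^{(0)}$ encodes.

```python
import numpy as np

def mixed_baseline_score(beta, X, delta, Z, member):
    """Pseudo partial likelihood score under a working mixed baseline model.

    X, delta : shape (m,), observed times and event indicators, one entry per
               (cluster i, member j, failure type k) record.
    Z        : shape (m, p), time-fixed covariates (an assumption of this sketch).
    member   : shape (m,), member label j; records with the same j share one
               baseline hazard, so their risk sets are pooled over i and k.
    """
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta)
    Z = np.asarray(Z, dtype=float)
    member = np.asarray(member)

    risk_weight = np.exp(Z @ beta)            # exp(beta' Z) for every record
    score = np.zeros_like(beta)
    for r in np.flatnonzero(delta == 1):      # loop over observed failures
        at_risk = (member == member[r]) & (X >= X[r])
        w = risk_weight[at_risk]
        s0 = w.sum()                                  # n * S_j.^(0)(beta, u)
        s1 = (w[:, None] * Z[at_risk]).sum(axis=0)    # n * S_j.^(1)(beta, u)
        score += Z[r] - s1 / s0
    return score
```

Passing a label that is unique to each (member, failure type) pair instead of `member` would give the distinct baseline version, and passing a constant label would give the common baseline version; a root of the returned score in `beta` plays the role of $\hat\beta$.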
Chapter 7

Simulation Studies of Marginal Hazard Model Misspecification

7.1 Introduction

This chapter deals with the numerical evaluation of marginal hazard model misspecification. In Chapter 6 we derived the asymptotic properties of the maximum partial likelihood estimator $\hat\beta$ under a possibly misspecified marginal Cox regression hazard model. In this chapter we obtain information via simulations on the effects of misspecified marginal hazard models for sample sizes encountered in applications. We confine our attention to the case of misspecifying the baseline type when the true marginal hazard model is in the Cox regression family and the regression part $\exp\{\beta' Z\}$ is correctly specified.

7.2 Simulation Parameters and Summary Statistics

In this simulation study, the true marginal hazard model is a mixed baseline hazard model (6.1). The simulation parameters and methods are similar to those in Chapter 4 and are, therefore, not described in detail here. The simulation results are summarized within each simulation configuration by the following: mean $\hat\beta_m$, mean $\hat V_m$, mean $\hat\beta_d$, mean $\hat V_d$, mean $\hat\beta_c$, and mean $\hat V_c$, where the subscripts $m$, $d$, and $c$ indicate that the estimator is obtained under an assumed marginal mixed baseline hazard model (i.e., the correct model), an assumed distinct baseline hazard model (6.38), and an assumed common baseline hazard model (6.40), respectively. In this simulation study, the only misspecified part of the model is the type of baseline hazards; the $\exp\{\beta' Z\}$ part is correct.

7.3 Results and Discussion

The simulation results are displayed in Tables 7.1 - 7.6. In these tables $\beta_0$ and $\theta$ refer to the regression parameter and the dependence parameter of the underlying model, and $n$ is the sample size for each simulation run. Each row in the tables is based on 500 simulation runs.
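As an illustration of how the per-configuration summaries described above might be computed from the replicate output, here is a small sketch. The function `summarize_runs` and its inputs are hypothetical: each input array is assumed to hold one value per simulation run for one working model. The bias and the empirical variance of the estimates are additions of this sketch and are not among the tabulated summaries; the empirical variance is included only because it is the natural benchmark for judging whether a mean robust variance estimate is too small.

```python
import numpy as np

def summarize_runs(beta_hats, var_hats, beta0):
    """Summarize one simulation configuration for one working model.

    beta_hats : estimates of beta, one per simulation run.
    var_hats  : robust (sandwich) variance estimates, one per run.
    beta0     : true regression parameter used to generate the data.
    """
    beta_hats = np.asarray(beta_hats, dtype=float)
    var_hats = np.asarray(var_hats, dtype=float)
    mean_beta = beta_hats.mean()
    return {
        "mean_beta": mean_beta,                       # "mean beta-hat" column
        "mean_V": var_hats.mean(),                    # "mean V-hat" column
        "bias": mean_beta - beta0,                    # added by this sketch
        "empirical_var": beta_hats.var(ddof=1),       # added by this sketch
    }
```

For example, a mean estimate of .523 against $\beta_0 = 0.7$ (the common baseline column of Table 7.1) corresponds to a bias of about $-.177$, roughly 25% of the true value.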
These simulation results suggest that the estimator $\hat\beta_d$ under an assumed distinct baseline hazard model is approximately unbiased, and that the estimator $\hat\beta_c$ obtained from an assumed common baseline model is severely biased for each simulation configuration with $\beta_0 = 0.7$, when the underlying marginal baseline hazard model is the mixed baseline model. Estimator $\hat\beta_c$ is more severely biased for the exponential marginals than for the Weibull marginals. All estimators, including $\hat\beta_c$, are approximately unbiased for each simulation configuration when $\beta_0 = 0$. The unbiasedness of the estimators when $\beta_0 = 0$ is expected as a result of Corollary 6.3, since the covariate $Z$ is identically distributed for both $j$ and $k$. The sandwich type robust variance estimator under an assumed distinct baseline hazard model agrees with that obtained from the true mixed baseline hazard model, as judged by the values of mean $\hat V_d$ and mean $\hat V_m$, although mean $\hat V_d$ tends to be consistently a little larger than mean $\hat V_m$. This result agrees with the conclusion of Kalbfleisch and Prentice (1980, p. 88) for univariate failure time data that the loss of efficiency in the estimate of $\beta$ is generally not severe when stratification is used unnecessarily. The robust variance estimator under an assumed common baseline hazard model underestimates the underlying true variance, except when $\beta_0 = 0$, in which case $\hat\beta_c$ is consistent.

Table 7.1, Table 7.2, and Table 7.3 compare the effects of censoring for failure times with a Normal(0, 1) distributed covariate and exponential marginals. Table 7.1 gives the summary statistics of the simulation results when there is no censoring. The simulation results with censoring are shown in Table 7.2 and Table 7.3. The Uniform(0, 5) censoring distribution in Table 7.2 resulted in 12% and 14% of the data censored when $\beta_0 = 0$ and $\beta_0 = 0.7$, respectively. The Uniform(0, 1) censoring distribution in Table 7.3 led to about 42% of the data censored.
These tables show clearly that all estimators are approximately unbiased when $\beta_0 = 0$ and that $\hat\beta_c$ is severely biased when $\beta_0 = 0.7$, regardless of the censoring proportion. However, the degree of underestimation of the robust variance estimator under an assumed common baseline hazard model decreases as the censoring percentage increases.

Comparing the results in Table 7.2 and Table 7.4, we see that all estimators are approximately unbiased when $\beta_0 = 0$ and that $\hat\beta_c$ is biased when $\beta_0 = 0.7$, whether the exponential marginals or the Weibull marginals are used. However, $\hat\beta_c$ is more severely biased for the exponential marginals than for the Weibull marginals. Also, the degree of underestimation of the robust variance estimator under the assumed common baseline hazard model decreases when the Weibull marginals are used.

Table 7.2 and Table 7.5 indicate that $\hat\beta_c$ is severely biased when $\beta_0 = 0.7$, whether a normal covariate or a Bernoulli covariate is used. The results in Table 7.6 show that $\hat\beta_c$ is consistently more severely biased for the exponential marginals than for the Weibull marginals when $\beta_0 = 0.7$ and the Bernoulli(0.5) covariate distribution and the uniform(0, 5) censoring distribution are used.

In brief summary, the simulation results indicate that the asymptotic results derived in Chapter 6 for the estimator under a possibly misspecified marginal model are preserved in practical sample sizes. Further numerical study is important to investigate the loss of efficiency in the estimate of $\beta$ when stratification of failures is used unnecessarily.

Table 7.1: Simulation results (based on 500 simulation runs) for the true mixed baseline hazard model with Normal(0,1) covariate, no censoring, and exponential marginals.
β0    θ      n     mean β̂m   mean β̂d   mean β̂c   mean V̂m   mean V̂d   mean V̂c
0     .25    100    .000      .000     -.000      .0024     .0025     .0024
             200   -.001     -.001     -.000      .0012     .0013     .0012
      .80    100    .000      .001      .000      .0024     .0025     .0024
             200   -.002     -.002     -.001      .0012     .0013     .0012
      1.50   100   -.000      .001      .000      .0025     .0025     .0024
             200   -.002     -.002     -.001      .0012     .0013     .0012
      3.00   100   -.000      .000      .000      .0025     .0025     .0025
             200   -.002     -.002     -.001      .0012     .0013     .0012
.7    .25    100    .709      .704      .523      .0053     .0052     .0031
             200    .703      .700      .522      .0027     .0027     .0016
      .80    100    .707      .703      .523      .0044     .0045     .0030
             200    .702      .700      .521      .0023     .0023     .0015
      1.50   100    .705      .702      .523      .0040     .0040     .0030
             200    .701      .699      .521      .0020     .0020     .0015
      3.00   100    .704      .702      .523      .0037     .0037     .0030
             200    .700      .699      .520      .0019     .0019     .0015

Table 7.2: Simulation results (based on 500 simulation runs) for the true mixed baseline hazard model with Normal(0,1) covariate, uniform(0, 5) censoring distribution (12% censoring for β0 = 0 and 14% censoring for β0 = 0.7), and exponential marginals.

β0    θ      n     mean β̂m   mean β̂d   mean β̂c   mean V̂m   mean V̂d   mean V̂c
0     .25    100    .001      .001      .000      .0028     .0029     .0029
             200   -.001     -.001     -.001      .0014     .0014     .0014
      .80    100    .000      .000     -.001      .0029     .0029     .0029
             200   -.001     -.001     -.001      .0014     .0014     .0014
      1.50   100    .000      .000     -.001      .0029     .0029     .0029
             200   -.001     -.001     -.001      .0014     .0014     .0014
      3.00   100    .000      .000     -.001      .0029     .0029     .0029
             200   -.001     -.001     -.001      .0014     .0014     .0014
.7    .25    100    .708      .704      .521      .0056     .0056     .0036
             200    .701      .699      .518      .0029     .0029     .0018
      .80    100    .706      .704      .521      .0048     .0048     .0034
             200    .701      .699      .519      .0024     .0024     .0017
      1.50   100    .704      .703      .520      .0044     .0044     .0034
             200    .700      .699      .518      .0022     .0022     .0017
      3.00   100    .704      .703      .520      .0041     .0041     .0034
             200    .700      .700      .518      .0020     .0021     .0017

Table 7.3: Simulation results (based on 500 simulation runs) for the true mixed baseline hazard model with Normal(0,1) covariate, uniform(0, 1) censoring distribution (42% censoring), and exponential marginals.
β0    θ      n     mean β̂m   mean β̂d   mean β̂c   mean V̂m   mean V̂d   mean V̂c
0     .25    100    .001      .000     -.001      .0043     .0044     .0043
             200   -.000     -.000     -.001      .0021     .0022     .0021
      .80    100    .000     -.000     -.002      .0043     .0044     .0043
             200   -.000     -.000     -.001      .0021     .0022     .0021
      1.50   100    .000      .001     -.002      .0044     .0044     .0044
             200    .000      .000     -.001      .0021     .0021     .0021
      3.00   100   -.000     -.000     -.003      .0044     .0044     .0043
             200   -.000     -.000     -.001      .0021     .0021     .0021
.7    .25    100    .706      .703      .541      .0068     .0068     .0051
             200    .699      .698      .538      .0034     .0034     .0025
      .80    100    .705      .704      .542      .0060     .0060     .0050
             200    .700      .699      .539      .0030     .0030     .0025
      1.50   100    .703      .703      .541      .0057     .0057     .0050
             200    .700      .699      .539      .0028     .0028     .0025
      3.00   100    .703      .703      .541      .0055     .0056     .0050
             200    .699      .699      .539      .0027     .0028     .0025

Table 7.4: Simulation results (based on 500 simulation runs) for the true mixed baseline hazard model with Normal(0,1) covariate, uniform(0, 5) censoring distribution (19% censoring for β0 = 0 and 21% censoring for β0 = 0.7), and Weibull marginals.

β0    θ      n     mean β̂m   mean β̂d   mean β̂c   mean V̂m   mean V̂d   mean V̂c
0     .25    100    .001      .001      .000      .0031     .0032     .0031
             200   -.000     -.000     -.001      .0015     .0016     .0015
      .80    100    .001      .001     -.000      .0031     .0032     .0031
             200   -.000     -.000     -.001      .0015     .0016     .0015
      1.50   100    .000      .001     -.000      .0031     .0032     .0031
             200   -.000     -.000     -.000      .0015     .0015     .0015
      3.00   100    .000      .001      .000      .0031     .0032     .0031
             200   -.000     -.001     -.000      .0015     .0015     .0015
.7    .25    100    .707      .703      .636      .0059     .0059     .0051
             200    .701      .699      .630      .0030     .0030     .0026
      .80    100    .706      .703      .635      .0050     .0051     .0045
             200    .700      .699      .629      .0026     .0026     .0023
      1.50   100    .704      .703      .633      .0047     .0047     .0042
             200    .700      .699      .629      .0023     .0024     .0021
      3.00   100    .703      .703      .633      .0044     .0044     .0040
             200    .700      .700      .629      .0022     .0022     .0020

Table 7.5: Simulation results (based on 500 simulation runs) for the true mixed baseline hazard model with β0 = 0.7, Bernoulli(0.5) covariate, uniform(0, 5) censoring distribution (9% censoring), and exponential marginals.
$\theta$   n     mean $\hat\beta_m$   mean $\hat\beta_d$   mean $\hat\beta_c$   mean $V_m$   mean $V_d$   mean $V_c$
.25        100    .704                 .697                 .515                .0139        .0142        .0113
.25        200    .701                 .697                 .513                .0071        .0072        .0057
.80        100    .700                 .695                 .512                .0129        .0132        .0111
.80        200    .699                 .696                 .511                .0066        .0067        .0056
1.50       100    .698                 .696                 .511                .0124        .0126        .0111
1.50       200    .698                 .696                 .511                .0063        .0064        .0056
3.00       100    .698                 .697                 .511                .0121        .0122        .0111
3.00       200    .697                 .696                 .510                .0061        .0061        .0056

Table 7.6: Simulation results (based on 500 simulation runs) for the true mixed baseline hazard model with sample size of 100, Bernoulli(0.5) covariate, and uniform(0, 5) censoring distribution (12% censoring for $\beta_0 = 0$ and 9% censoring for $\beta_0 = 0.7$ with exponential marginals, and 19% censoring for $\beta_0 = 0$ and 16% censoring for $\beta_0 = 0.7$ with Weibull marginals).

$\beta_0$   $\theta$   marginals     mean $\hat\beta_m$   mean $\hat\beta_d$   mean $\hat\beta_c$   mean $V_m$   mean $V_d$   mean $V_c$
0           .25        Exponential   -.006                -.005                -.006                .0112        .0114        .0113
0           .25        Weibull       -.005                -.005                -.009                .0122        .0125        .0122
0           .80        Exponential   -.009                -.008                -.009                .0112        .0114        .0113
0           .80        Weibull       -.008                -.007                -.012                .0122        .0125        .0123
0           1.50       Exponential   -.008                -.008                -.009                .0113        .0114        .0113
0           1.50       Weibull       -.008                -.009                -.013                .0123        .0125        .0123
0           3.00       Exponential   -.008                -.007                -.010                .0113        .0114        .0113
0           3.00       Weibull       -.009                -.009                -.014                .0123        .0124        .0123
.7          .25        Exponential    .704                 .697                 .515                .0139        .0142        .0113
.7          .25        Weibull        .704                 .697                 .659                .0148        .0151        .0150
.7          .80        Exponential    .700                 .695                 .512                .0129        .0132        .0111
.7          .80        Weibull        .700                 .696                 .655                .0139        .0141        .0140
.7          1.50       Exponential    .698                 .696                 .511                .0124        .0126        .0111
.7          1.50       Weibull        .698                 .696                 .653                .0134        .0136        .0135
.7          3.00       Exponential    .698                 .697                 .511                .0121        .0122        .0111
.7          3.00       Weibull        .697                 .697                 .653                .0130        .0131        .0132

Chapter 8
Remarks

Multivariate failure time data with generalized dependence structure are common in applications. However, practically no work has been done on this issue.
We have proposed using a marginal model approach for multivariate failure time data with generalized dependence structure when the scientific interest is in the effects of covariates on the risk of failures and knowledge of the dependence structure is not available. Depending on whether the baseline hazards are distinguishable, we have proposed three types of marginal hazard models: distinct baseline hazard models, common baseline hazard models, and mixed baseline hazard models. Note that the use of a common regression parameter vector $\beta$ for all members and all types of failures does not preclude the use of different $\beta$ for different members or different failures. We can use member- and failure-specific covariates to introduce different $\beta$ for different members or different failures. For example, if $Z_1 = (Z, 0)'$ and $Z_2 = (0, Z)'$, then $\beta_{01}$ and $\beta_{02}$ represent the effects of $Z$ on failure time 1 and failure time 2, respectively.

The proposed marginal models generalize the WLW model and the LWA model. Our distinct baseline model is equivalent to the WLW model if we stratify our analysis based on both failure types and members in a cluster. Similarly, our common baseline model becomes a more general setup of the LWA model by allowing multiple failures per member. However, to apply the WLW model or the LWA model we have to assume either a different baseline for each combination of failure types and members in a cluster, or an identical baseline for all combinations of failures and subjects in a stratum, and neither assumption may hold in applications. Our mixed baseline hazard model provides significantly greater modeling flexibility and applicability, and enables us to deal with some application problems that existing methods cannot handle, such as the data from the Framingham Heart Study.
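As an illustration of the failure-specific covariate construction mentioned above, the following Python sketch builds $Z_1 = (Z, 0)'$ and $Z_2 = (0, Z)'$ from a single covariate value and shows that one common coefficient vector then carries a separate effect for each failure type. The function name `type_specific_covariates` and all numerical values are hypothetical and chosen only for illustration; they are not taken from the analyses reported here.

```python
import numpy as np

def type_specific_covariates(z, n_types):
    """Expand a covariate vector z into one block-padded row per failure type.

    Row k holds z in block k and zeros elsewhere, e.g. for two types and a
    scalar z the rows are (z, 0) and (0, z).
    """
    z = np.atleast_1d(np.asarray(z, dtype=float))
    p = z.size
    out = np.zeros((n_types, n_types * p))
    for k in range(n_types):
        out[k, k * p:(k + 1) * p] = z
    return out

# Hypothetical covariate value and coefficient vector (illustration only).
Z = type_specific_covariates([1.5], n_types=2)   # rows: (1.5, 0) and (0, 1.5)
beta = np.array([0.7, 0.2])                      # (effect on type 1, effect on type 2)
linear_predictors = Z @ beta                     # type-specific values of beta' Z_k
```

With this layout the first component of the coefficient vector multiplies the covariate only in the model for failure type 1 and the second only in the model for failure type 2, so a single parameter vector can still represent type-specific effects.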
Inference on regression parameters for each type of model is based on a system of pseudo score equations obtained under the working assumption of independence, which is in the framework of generalized estimating equations. It may be more efficient to use estimating equations that take the nature of the dependence into account explicitly by introducing weight matrices, in a spirit similar to that suggested by Liang and Zeger (1986) for uncensored longitudinal data. Cai and Prentice (1995) proposed using the inverse matrix of the covariance functions between counting process martingales to construct the weight matrix for the WLW model or the LWA model. Their simulation results, however, showed that the efficiency gains are important only if the dependence among the failure times is very strong. This may indicate the difficulty of constructing optimal weight matrices because of the censoring and the non-linear nature of the Cox model (Lin, 1994). Further research is needed to study the efficiency of estimators that incorporate the dependence structure into the estimating equations.

Relying on the theory of multivariate counting processes, stochastic integrals and local martingales, we have proven that when the marginal hazard model is correctly specified, the estimators from all three types of the proposed marginal models are consistent and asymptotically normal with a "sandwich" type robust covariance matrix which can be consistently estimated. Note that, because the pseudo partial likelihood score statistic is not a sum of independent terms, the large-sample properties do not follow from the usual argument for likelihood-based statistics. The simulation results show that the proposed large sample approximation is adequate even when the sample size is relatively small (n = 50). The methodology is illustrated on data from the Framingham Heart Study.
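The "sandwich" form of the robust covariance matrix can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the estimator as implemented for the analyses in this work: the function name `sandwich_covariance` is hypothetical, and the inputs (an information-type matrix `A` and a matrix `W` whose rows are cluster-level score contributions evaluated at the estimate) are assumed to have been computed elsewhere. The toy numbers are invented for the example.

```python
import numpy as np

def sandwich_covariance(A, W):
    """Return A^{-1} (sum_i W_i W_i') A^{-1}.

    A : (p, p) array, minus the derivative of the pseudo score at the
        estimate (assumed supplied by the caller).
    W : (n_clusters, p) array; row i is the score contribution of cluster i,
        assumed already summed over its members and failure types.
    """
    A = np.asarray(A, dtype=float)
    W = np.asarray(W, dtype=float)
    A_inv = np.linalg.inv(A)
    B = W.T @ W                      # middle term: sum of outer products W_i W_i'
    return A_inv @ B @ A_inv

# Toy inputs (hypothetical: two parameters, three clusters).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
W = np.array([[0.5, -0.2], [-0.3, 0.4], [0.1, 0.1]])
robust_cov = sandwich_covariance(A, W)
naive_cov = np.linalg.inv(A)         # model-based covariance under independence
```

Summing the score contributions within each cluster before forming the middle term is the design choice that lets the estimate reflect dependence among failure times in the same cluster, whereas the naive inverse of `A` ignores it.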
The validity of the asymptotic distribution theory for the estimator depends critically on the appropriateness of the assumed marginal models. The consequences of misspecified marginal models were investigated. We derived the general asymptotic properties of the maximum partial likelihood estimator under a possibly misspecified marginal Cox regression hazard model. The general results were then applied to some special cases, including the case of misspecifying the type of baseline hazard functions for the Cox model when the correct functional form of $\exp\{\beta' Z\}$ was used. The simulation results indicate that the derived asymptotic results for the estimator under a possibly misspecified marginal hazard model continue to hold in finite samples of the sizes encountered in practice. Further numerical study is needed to investigate the loss of efficiency in the estimate of $\beta$ when stratification of failures is used unnecessarily.

It is important to assess the adequacy of the marginal hazard models. Spiekerman and Lin (1996) developed a class of graphical and numerical techniques using residuals based on marginal martingale processes for checking the adequacy of the marginal Cox model. The applicability of their methods to the mixed baseline hazard models deserves investigation.

Bibliography

Aalen, O. (1988). Heterogeneity in survival analysis, Statistics in Medicine 7: 1121-1137.
Akaike, H. (1973). Information theory and an extension of the likelihood principle, Proceedings of the Second International Symposium of Information Theory, Akadémiai Kiadó: Budapest.
Andersen, P. K. and Borgan, Ø. (1985). Counting process model for life history data: A review, Scandinavian Journal of Statistics 12: 97-158.
Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting processes: A large sample study, The Annals of Statistics 10: 1100-1120.
Andersen, P. K., Borgan, Ø., Gill, R. D. and Keiding, N. (1993). Statistical Models Based on Counting Processes, New York: Springer.
Anderson, G. L. and Fleming, T. R. (1995). Model misspecification in proportional hazards regression, Biometrika 82: 527-541.
Anderson, J. E. and Louis, T. A. (1995). Survival analysis using a scale change random effects model, Journal of the American Statistical Association 90: 669-679.
Bandeen-Roche, K. J. and Liang, K. Y. (1996). Modelling failure-time associations in data with multiple levels of clustering, Biometrika 83: 29-39.
Bartle, R. G. (1966). The Elements of Integration, New York: Wiley.
Block, H. W. and Basu, A. P. (1974). A continuous bivariate exponential extension, Journal of the American Statistical Association 69: 1031-1037.
Cai, J. and Prentice, R. L. (1995). Estimating equations for hazard ratio parameters based on correlated failure time data, Biometrika 82: 151-164.
Chang, I. and Hsiung, C. A. (1995). An efficient estimator for proportional hazards models with frailties, Unpublished Manuscript.
Chang, I. S. and Hsiung, C. A. (1994). Information and asymptotic efficiency in some generalized proportional hazards models for counting processes, The Annals of Statistics 22: 1275-1298.
Clayton, D. (1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence, Biometrika 65: 141-151.
Clayton, D. and Cuzick, J. (1985). Multivariate generalizations of the proportional hazards model, Journal of the Royal Statistical Society, Ser. A 148: 82-117.
Clayton, D. G. (1991). A Monte Carlo method for Bayesian inference in frailty models, Biometrics 47: 467-485.
Cox, D. R. (1972). Regression models and life-tables (with discussion), Journal of the Royal Statistical Society, Ser. B 34: 187-220.
Cox, D. R. (1975). Partial likelihood, Biometrika 62: 269-276.
Cox, D. R. and Oakes, D. (1984). Analysis of Survival Data, London: Chapman and Hall.
Cramér, H. (1946). Mathematical Methods of Statistics, Princeton University Press.
Crowder, M. (1985). A distributional model for repeated failure time measurements, Journal of the Royal Statistical Society, Ser. B 47: 447-452.
Dawber, T. R. (1980). The Framingham Study, the epidemiology of atherosclerotic disease, Harvard University Press.
Efron, B. and Hinkley, D. V. (1978). Assessing the accuracy of the maximum likelihood estimation: observed versus expected information, Biometrika 65: 457-482.
Farlie, D. J. G. (1960). The performance of some correlation coefficients for a general bivariate distribution, Biometrika 47: 307-323.
Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis, New York: Wiley.
Freund, J. E. (1961). A bivariate extension of the exponential distribution, Journal of the American Statistical Association 56: 971-977.
Gail, M. H., Santner, T. J. and Brown, C. C. (1980). An analysis of comparative carcinogenesis experiments based on multiple times to tumor, Biometrics 36: 255-266.
Gail, M. H., Tan, W. Y. and Piantadosi, S. (1988). Tests for no treatment effect in randomized clinical trials, Biometrika 75: 57-64.
Gail, M. H., Wieand, S. and Piantadosi, S. (1984). Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates, Biometrika 71: 431-444.
Genest, C. and MacKay, R. J. (1986). The joy of copulas: Bivariate distributions with univariate marginals, The American Statistician 40: 280-283.
Gill, R. D. (1980). Censoring and Stochastic Integrals, Mathematical Centre Tracts No. 124, Amsterdam: The Mathematical Centre.
Gumbel, E. J. (1960). Bivariate exponential distributions, Journal of the American Statistical Association 55: 698-707.
Hougaard, P. (1986a). A class of multivariate failure time distributions, Biometrika 73: 671-678.
Hougaard, P. (1986b). Survival models for heterogeneous populations derived from stable distributions, Biometrika 73: 387-396.
Hougaard, P. (1987). Modelling multivariate survival, Scandinavian Journal of Statistics 14: 291-304.
Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions, Proceedings of the Fifth Berkeley Symposium in Mathematical Statistics and Probability, University of California Press: Berkeley, pp. 221-233.
Huster, W. J., Brookmeyer, R. and Self, S. G. (1989). Modelling paired survival data with covariates, Biometrics 45: 145-156.
Johansen, S. (1983). An extension of Cox's regression model, International Statistical Review 51: 258-262.
Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data, New York: Wiley.
Kent, J. T. (1982). Robust properties of likelihood ratio tests, Biometrika 69: 19-27.
Klein, J. P. (1992). Semiparametric estimation of random effects using Cox model based on the EM algorithm, Biometrics 48: 795-806.
Klein, J. P., Keiding, N. and Kamby, C. (1989). Semiparametric Marshall-Olkin models applied to the occurrence of metastases at multiple sites after breast cancer, Biometrics 45: 1073-1086.
Klein, J. P., Moeschberger, M., Li, Y. H. and Wang, S. T. (1992). Estimating random effects in the Framingham Heart Study, in J. P. Klein and P. K. Goel (eds), Survival Analysis: State of the Art, Kluwer Academic Publishers: Dordrecht, pp. 99-120.
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency, The Annals of Mathematical Statistics 22: 79-86.
Lagakos, S. W. (1988). The loss in efficiency from misspecifying covariates in proportional hazards regression models, Biometrika 75: 156-160.
Lagakos, S. W. and Schoenfeld, D. A. (1984). Properties of proportional hazards score tests under misspecified regression models, Biometrics 40: 1037-1048.
Lee, E. W., Wei, L. J. and Amato, D. A. (1992). Cox-type regression analysis for large numbers of small groups of correlated failure time observations, in J. P. Klein and P. K. Goel (eds), Survival Analysis: State of the Art, Kluwer Academic Publishers: Dordrecht, pp. 237-247.
Lee, E. W., Wei, L. J. and Ying, Z. (1993). Linear regression analysis for highly stratified failure time data, Journal of the American Statistical Association 88: 557-565.
Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models, Biometrika 73: 13-22.
Liang, K. Y., Self, S. G. and Chang, Y. (1993). Modeling marginal hazards in multivariate failure time data, Journal of the Royal Statistical Society, Ser. B 55: 441-453.
Lin, D. Y. (1994). Cox regression analysis of multivariate failure time data: The marginal approach, Statistics in Medicine 13: 2233-2247.
Lin, D. Y. and Wei, L. J. (1989). The robust inference for the Cox proportional hazards model, Journal of the American Statistical Association 84: 1074-1078.
Lin, J. S. and Wei, L. J. (1992). Linear regression analysis for multivariate failure time observations, Journal of the American Statistical Association 87: 1071-1097.
Marshall, A. W. and Olkin, I. (1967). A multivariate exponential distribution, Journal of the American Statistical Association 62: 30-44.
Marshall, A. W. and Olkin, I. (1988). Families of multivariate distributions, Journal of the American Statistical Association 83: 834-840.
Murphy, S. A. (1994). Consistency in a proportional hazards model incorporating a random effect, The Annals of Statistics 22: 712-731.
Murphy, S. A. (1995). Asymptotic theory for the frailty model, The Annals of Statistics 23: 182-198.
Nielsen, G. G., Gill, R. D., Andersen, P. K. and Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models, Scandinavian Journal of Statistics 19: 25-43.
Oakes, D. (1982). A model for association in bivariate survival data, Journal of the Royal Statistical Society, Ser. B 44: 414-422.
Oakes, D. (1989). Bivariate survival models induced by frailties, Journal of the American Statistical Association 84: 487-493.
Oakes, D. and Manatunga, A. (1992). Fisher information for a bivariate extreme value distribution, Biometrika 79: 827-832.
Prentice, R. L. (1978). Linear rank test with right censored data, Biometrika 65: 167-179.
Prentice, R. L. and Cai, J. (1992). Covariance and survivor function estimation using censored multivariate failure time data, Biometrika 79: 495-512. Amendment 80: 711-712.
Prentice, R. L. and Hsu, L. (1996). Estimating equations for hazard ratio and correlation parameters in multivariate failure time analysis, to appear in Biometrika.
Prentice, R. L. and Zhao, L. P. (1991). Estimation equations for parameters in means and variances of multivariate discrete and continuous responses, Biometrics 47: 825-839.
Prentice, R. L., Williams, B. J. and Peterson, A. V. (1981). On the regression analysis of multivariate failure time data, Biometrika 68: 373-379.
Puri, M. L. and Sen, P. K. (1971). Nonparametric Methods in Multivariate Analysis, New York: Chapman and Hall.
Royall, R. M. (1986). Model robust confidence intervals using maximum likelihood estimators, International Statistical Review 54: 221-226.
Rubin, D. B. (1976). Inference and missing values, Biometrika 63: 81-92.
Sarkar, S. K. (1987). A continuous bivariate exponential distribution, Journal of the American Statistical Association 82: 667-675.
Sen, P. K. (1981). The Cox regression model, invariance principles for some induced quantile processes and some repeated significance test, The Annals of Statistics 9: 109-121.
Sen, P. K. and Singer, J. M. (1993). Large Sample Methods in Statistics: An Introduction with Applications, New York: Wiley.
Solomon, P. J. (1984). Effect of misspecification of regression models in the analysis of survival data, Biometrika 71: 291-298. Amendment 73: 245.
Spiekerman, C. F. and Lin, D. Y. (1996). Checking the marginal Cox model for correlated failure time data, Biometrika 83: 143-156.
Struthers, C. A. and Kalbfleisch, J. D. (1986). Misspecified proportional hazard models, Biometrika 73: 363-369.
Therneau, T. M. (1996). Extending the Cox model, Technical Report no. 58, Mayo Foundation.
Tsiatis, A. (1981). A large sample study of Cox's regression model, The Annals of Statistics 9: 93-108.
Vaupel, J. W., Manton, K. G. and Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality, Demography 16: 439-454.
Wei, L. J., Lin, D. Y. and Weissfeld, L. (1989). Regression analysis of multivariate incomplete failure time data by modeling marginal distribution, Journal of the American Statistical Association 84: 1065-1073.
White, H. (1982). Maximum likelihood estimation of misspecified models, Econometrica 50: 1-25.
Zhao, L. P. and Prentice, R. L. (1990). Correlated binary regression using a quadratic exponential model, Biometrika 77: 642-648.