Clegg, Limin Xu (1997). "Marginal Models for Multivariate Failure Time Data with Generalized Dependence Structure."

University of North Carolina
Institute of Statistics
Mimeo Series # 2184T
MARGINAL MODELS FOR MULTIVARIATE
FAILURE TIME DATA WITH GENERALIZED
DEPENDENCE STRUCTURE
BY: Limin Xu Clegg
MARGINAL MODELS FOR MULTIVARIATE FAILURE
TIME DATA WITH GENERALIZED
DEPENDENCE STRUCTURE
by
Limin Xu Clegg
Department of Biostatistics
University of North Carolina
Institute of Statistics
Mimeo Series No. 2184T
May 1997
MARGINAL MODELS FOR MULTIVARIATE
FAILURE TIME DATA WITH GENERALIZED
DEPENDENCE STRUCTURE
by
Limin Xu Clegg
A dissertation submitted to the faculty of the University of North Carolina at Chapel
Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy
in the Department of Biostatistics, School of Public Health.
Chapel Hill
1997
Approved by:
Co-Advisor
Co-Advisor
Reader
Reader
Reader
© 1997
Limin Xu Clegg
ALL RIGHTS RESERVED
LIMIN XU CLEGG.
Marginal models for multivariate failure time data with
generalized dependence structure. (Under the joint direction of Drs. Jianwen Cai and
Pranab Kumar Sen.)
ABSTRACT
In epidemiologic studies, there is often more than one outcome measured on
the same subject. Multiple failures from the same individual induce multivariate
failure time data involving within-subject dependence. Furthermore, participants
may not be independent within a cluster (e.g., as with members of the same family).
Responses from subjects in the same cluster generate multivariate failure time data
with between-subject dependence.
In this research we consider a generalized dependence structure which consists
of both between-subject and within-subject dependence instead of dealing only with
one of the two types. We propose using a marginal approach to analyze multivariate
failure time data with generalized dependence structure when the scientific interests
are in the effects of covariates on the risk of failures and knowledge of the dependence
structure is not available. Three types of marginal hazard models are proposed:
distinct baseline hazard models, common baseline hazard models, and mixed baseline
hazard models. All these hazard models are in the form of Cox regression models.
The mixed baseline hazard model provides significantly greater modeling flexibility
and applicability, and enables us to deal with some application problems that currently
existing methods cannot handle.
Inference on regression parameters for each type of model is based on a system
of pseudo score equations obtained under the working assumption of independence,
which is in the framework of generalized estimating equations. Relying on the theory
of multivariate counting processes, stochastic integrals and local martingales, we have
proven that the estimators for the proposed models are consistent and asymptotically
normal with a robust covariance matrix which can be consistently estimated. The
simulation results show that the proposed large sample approximation is adequate
even when the sample size is relatively small (n = 50). The methodology is illustrated
on data from the Framingham Heart Study.
The consequences of misspecified marginal models are also investigated. We
derive the asymptotic properties of the pseudo maximum partial likelihood estimator under possibly misspecified marginal Cox regression hazard models. Simulation
studies are conducted to obtain information on the effects of misspecified marginal
hazard models on the sample sizes for practical applications.
ACKNOWLEDGEMENTS
I would like to express my sincere appreciation to my advisors, Dr. Jianwen
Cai and Dr. P. K. Sen, for their guidance and enthusiasm about this work. Their
encouragement and confidence in me have been invaluable in this research. I would
also like to thank the members of my committee, Dr. Gerardo Heiss, Dr. Lawrence
Kupper, and Dr. Bahjat Qaqish, for their constructive comments and suggestions. I
would especially like to thank Dr. Kupper for serving as my academic advisor and also
personal advisor. I owe so much to him for his inspiration, guidance, and friendship.
I am very grateful for the opportunity to work at the Collaborative Studies
Coordinating Center in the Department of Biostatistics, which has provided me with financial
support as well as learning opportunities. This research topic originated from my job
there as a statistician. I would especially like to thank Dr. Woody Chambless for all
I have learned from him and Dr. Ed Davis for his encouragement and support.
I would like to express my gratitude to the statistical faculty members in the Department of Mathematical Sciences at the University of North Carolina at Greensboro
while I was a student there: Dr. Herr, Dr. Kissling, Dr. Ludwig, and Dr. Warrack,
for guiding me into statistics.
Very special thanks go to Dr. Melvin Hurwitz, my advisor while I was in
the Department of Clothing and Textiles at the University of North Carolina at
Greensboro, for his understanding and moral support when I decided to give up
being a Ph.D. candidate in order to pursue my degree in Biostatistics. His confidence in me
and his help are always appreciated.
I am grateful for having been awarded an On-Campus Dissertation Fellowship by the
Graduate School at the University of North Carolina at Chapel Hill for the Fall 1996
semester and a Predoctoral Fellowship by the Veterans Affairs Central Office for the
fiscal year 1996-1997, although I declined both. My efforts were partially supported
by a National Institute of Environmental Health Sciences Training Grant.
I would like to thank my friends, my parents, brother, sisters, and in-laws
for their encouragement, love, and support. Finally, I want to thank my wonderful
husband, Carney, for having been with me every step of the way. This dissertation
would not have been possible without his support.
Contents
1 Introduction and Literature Review
  1.1 Introduction
  1.2 Literature Review
    1.2.1 Univariate Survival Data Analysis
    1.2.2 Modeling Multivariate Failure Time Data
      Introduction
      Joint Distribution
      Conditioning on Past Events or History
      Frailty Models
      Marginal Models
      Concluding Remarks
    1.2.3 Model Misspecification and Robust Inferences
  1.3 Synopsis of Research
2 Marginal Modeling and Estimation
  2.1 Introduction
  2.2 Notation and Definitions
  2.3 Marginal Hazard Models and Estimation
    2.3.1 Distinct Baseline Hazard Models
    2.3.2 Common Baseline Hazard Models
    2.3.3 Mixed Baseline Hazard Models
  2.4 Concluding Remarks
3 Asymptotic Distributions of Parameter Estimators
  3.1 Introduction
  3.2 Two Useful Lemmas
  3.3 Mixed Baseline Hazard Model
  3.4 Distinct Baseline Hazard Model
  3.5 Common Baseline Hazard Model
  3.6 Concluding Remarks
4 Simulation Studies of Parameter Estimators
  4.1 Introduction
  4.2 Simulation Parameters
  4.3 Summary Statistics
  4.4 Results and Discussion
5 Example
  5.1 Introduction
  5.2 Data and Model
  5.3 Results and Discussion
6 Misspecification of Marginal Hazard Models
  6.1 Introduction
  6.2 Asymptotic Properties of Regression Estimators
    6.2.1 Mixed Baseline Hazard Models
    6.2.2 Distinct Baseline Hazard Models
    6.2.3 Common Baseline Hazard Models
  6.3 Special Cases
  6.4 Concluding Remarks
7 Simulation Studies of Marginal Hazard Model Misspecification
  7.1 Introduction
  7.2 Simulation Parameters and Summary Statistics
  7.3 Results and Discussion
8 Remarks
Bibliography
List of Tables
4.1 Observed mean pairwise correlations for different $\theta$ values based on 1,000 simulation runs with sample size of 1,000 and no censoring
4.2 Simulation results for Normal(0,1) covariate, uniform(0, 5) censoring distribution, $\theta = 0.25$, exponential marginals, and sample size of 100
4.3 Simulation results for Normal(0,1) covariate, no censoring, and exponential marginals
4.4 Simulation results for Normal(0,1) covariate, uniform(0, 5) censoring distribution, and exponential marginals
4.5 Simulation results for Normal(0,1) covariate, uniform(0, 1) censoring distribution, and exponential marginals
4.6 Simulation results for Normal(0,1) covariate, uniform(0, 5) censoring distribution, and Weibull marginals
4.7 Simulation results for $\beta_0 = 0.7$, Bernoulli(0.5) covariate, uniform(0, 5) censoring distribution, and exponential marginals
4.8 Simulation results for sample size of 100, Bernoulli(0.5) covariate, and uniform(0, 5) censoring distribution
5.1 Estimates of regression parameters for the Framingham Heart Study data
7.1 Simulation results for the true mixed baseline hazard model with Normal(0,1) covariate, no censoring, and exponential marginals
7.2 Simulation results for the true mixed baseline hazard model with Normal(0,1) covariate, uniform(0, 5) censoring distribution, and exponential marginals
7.3 Simulation results for the true mixed baseline hazard model with Normal(0,1) covariate, uniform(0, 1) censoring distribution, and exponential marginals
7.4 Simulation results for the true mixed baseline hazard model with Normal(0,1) covariate, uniform(0, 5) censoring distribution, and Weibull marginals
7.5 Simulation results for the true mixed baseline hazard model with $\beta_0 = 0.7$, Bernoulli(0.5) covariate, uniform(0, 5) censoring distribution, and exponential marginals
7.6 Simulation results for the true mixed baseline hazard model with sample size of 100, Bernoulli(0.5) covariate, and uniform(0, 5) censoring distribution
Chapter 1
Introduction and Literature Review
1.1 Introduction
This dissertation will consider multivariate failure time data in which more than
one failure can occur on the same experimental unit and all of the event times can
potentially be observed. This problem is different from the one that is commonly
referred to as the competing risks problem, wherein one and only one among several
distinct events can be observed.
One well known study which exemplifies the motivation for this research is the
Framingham Heart Study. One of the purposes of this study was to assess the effects
of risk factors on age at death from all causes and on times until angina pectoris, myocardial infarction, silent myocardial infarction, cerebrovascular accident, coronary
insufficiency, congestive heart failure, and cancer. Some individuals in the Framingham Heart Study are related because the sampling unit is the family; the study includes siblings
and married couples. In the cohort of 4,211 subjects used in the analysis
by Klein (1992), there were 1,146 married couples. Only 1,919 of the 4,211 individuals
were singletons, who were either unmarried or whose spouses were not included in
the dataset. Among these 4,211 cohort members, 1,050 individuals were from 452
sibships, with sibship size ranging from two to six.
The multivariate failure time data from the Framingham Heart Study contain
both between-subject and within-subject dependence. Husbands and wives
who share common environmental conditions, such as diet, hazardous material exposures in the household, smoking, and other lifestyle characteristics, may have
dependent event times. Siblings, who share genetic material and early environmental exposures, will have event times more closely related
than those of non-siblings. Such dependence of event times among family members
constitutes between-subject dependence. Within-subject dependence refers to the dependence among different event times for the same individual, such as time to angina
pectoris and time to myocardial infarction.
The presence of both between-subject and within-subject dependence in these
possibly censored data poses challenges to the statistical analysis. To our knowledge,
the literature does not contain a reference to the problem of censored data with both
between- and within-subject dependence. By contrast, much recent research effort has
been devoted to multivariate failure time data to account for either between-subject
dependence or within-subject dependence, but not both. In the next section we will
review the relevant literature in these areas.
1.2 Literature Review
1.2.1 Univariate Survival Data Analysis
Failure time data analysis, or survival analysis, deals with times to failure or to other
events, which may be censored. What distinguishes failure time data analysis from other
areas of data analysis is censoring, which may be intentional, as when a trial is terminated
before the event of interest occurs, or unintentional, as with loss to follow-up.
There are three main types of censoring: right censoring, left censoring, and
interval censoring. The failure time is interval censored if it is only known that the
failure occurs within the time interval $(t_L, t_R)$. When $t_R \to \infty$, the observation is
right censored. If $t_L = 0$, it is called left censored. The usual methods of handling
right censored data cannot be used for interval censored data. In this dissertation,
we consider only right censoring.
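The three censoring types can be told apart mechanically from the bounds $(t_L, t_R)$. The Python sketch below only illustrates the definitions above; the function name and the convention of coding an exactly observed failure as $t_L = t_R$ are assumptions of this sketch, not part of the text.

```python
import math

def censoring_type(t_left, t_right):
    """Classify an observation known only to satisfy t_left < T < t_right.

    t_right = infinity -> right censored; t_left = 0 -> left censored;
    otherwise interval censored. Coding an exact failure as
    t_left == t_right is a convention of this sketch only.
    """
    if t_left == t_right:
        return "exact"
    if math.isinf(t_right):
        return "right"
    if t_left == 0:
        return "left"
    return "interval"

for bounds in [(2.0, math.inf), (0.0, 3.5), (1.0, 4.0), (2.5, 2.5)]:
    print(bounds, censoring_type(*bounds))
```

Note that a pair with $t_L = 0$ and $t_R = \infty$ (no information at all) is reported as right censored here, because the right-censoring check comes first.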
In general, there are two types of regression models for failure time data. One
type of regression is based on the linear regression model
$$Y = \beta' Z + \epsilon,$$
where $Y = g(X)$ represents either the observed failure time $X$ or any monotone
transformation of it. When $Y$ is the logarithm of the observed
failure time, this model is the accelerated failure time model, in which the covariates
$Z$ alter the time to failure through a multiplicative effect on the failure time
(Kalbfleisch and Prentice (1980)).
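To make the multiplicative effect concrete, the following minimal Python sketch assumes a log-normal accelerated failure time model, $\log T = \beta Z + \sigma\epsilon$ with $\epsilon \sim N(0, 1)$ and a single binary covariate; the error distribution, $\sigma = 0.5$, and $\beta = 0.7$ are illustrative assumptions. Holding the error term fixed, changing $Z$ from 0 to 1 multiplies every failure time by $e^{\beta}$.

```python
import math
import random

def aft_failure_times(beta, z, errors, sigma=0.5):
    """Failure times from the log-linear model log T = beta * z + sigma * error."""
    return [math.exp(beta * z + sigma * e) for e in errors]

rng = random.Random(2024)
errors = [rng.gauss(0.0, 1.0) for _ in range(5)]  # shared standard normal errors
beta = 0.7
t_z0 = aft_failure_times(beta, 0, errors)
t_z1 = aft_failure_times(beta, 1, errors)
ratios = [b / a for a, b in zip(t_z0, t_z1)]
# Each ratio equals exp(beta), about 2.014: Z = 1 stretches time by that factor.
print([round(r, 4) for r in ratios])
```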
The other type of regression model is the well-known Cox regression model
(Cox (1972)), also called the Cox proportional hazards model when the covariates $Z$
are time-independent. Let
$$\lambda(t) = \lim_{h \downarrow 0} \frac{1}{h} \Pr(t \le T < t + h \mid T \ge t)$$
be the hazard function. The Cox regression model postulates that the failure time $T_i$
is associated with the covariates $Z_i$ through the hazard function
$$\lambda_i(t) = \lambda_0(t) \exp\{\beta' Z_i(t)\}, \qquad (1.1)$$
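where $\lambda_0(t)$ is the unknown and unspecified nonnegative baseline hazard. Assuming
the observed failure times are distinct, the partial likelihood function for the Cox
regression model (Cox (1975)) is
$$PL(\beta) = \prod_{i=1}^{n} \left[ \frac{\exp\{\beta' Z_i(t_i)\}}{\sum_{l \in R_i} \exp\{\beta' Z_l(t_i)\}} \right]^{\delta_i}, \qquad (1.2)$$
where $\delta_i$ is the failure indicator for individual $i$ and $R_i = \{l : t_l \ge t_i\}$ is the risk set, i.e., the set of indices corresponding to
individuals still at risk (neither failed nor censored) just before time $t_i$. The corresponding score function is
$$U(\beta) = \frac{\partial \log PL(\beta)}{\partial \beta} = \sum_{i=1}^{n} \delta_i \left[ Z_i(t_i) - \frac{S^{(1)}(\beta, t_i)}{S^{(0)}(\beta, t_i)} \right],$$
where
$$S^{(0)}(\beta, t_i) = \sum_{l \in R_i} \exp[\beta' Z_l(t_i)]$$
and
$$S^{(1)}(\beta, t_i) = \sum_{l \in R_i} Z_l(t_i) \exp[\beta' Z_l(t_i)].$$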
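As an illustration of (1.2) and the score function, the following Python sketch computes $\log PL(\beta)$ and $U(\beta)$ for a single time-independent covariate with distinct observed times, and checks $U(\beta)$ against a numerical derivative of $\log PL(\beta)$. The function name and the small data set are hypothetical; this is a teaching sketch, not a production Cox regression routine (it does not handle ties and costs $O(n^2)$ time).

```python
import math

def cox_log_pl_and_score(beta, times, events, z):
    """Return (log PL(beta), U(beta)) for one time-independent covariate.

    Follows (1.2) with distinct observed times: the risk set R_i holds every l
    with t_l >= t_i, and only failures (delta_i = 1) contribute a factor.
    """
    log_pl, score = 0.0, 0.0
    for t_i, d_i, z_i in zip(times, events, z):
        if d_i == 0:
            continue  # censored subjects enter only through the risk sets
        risk = [z_l for t_l, z_l in zip(times, z) if t_l >= t_i]  # covariates in R_i
        s0 = sum(math.exp(beta * z_l) for z_l in risk)            # S^(0)(beta, t_i)
        s1 = sum(z_l * math.exp(beta * z_l) for z_l in risk)      # S^(1)(beta, t_i)
        log_pl += beta * z_i - math.log(s0)
        score += z_i - s1 / s0
    return log_pl, score

# Small illustrative data set (made up for this sketch).
times = [2.0, 3.5, 1.2, 4.8, 0.7, 2.9]
events = [1, 0, 1, 1, 1, 1]
z = [0.5, 1.5, -0.4, 0.8, 2.0, -1.0]

beta, h = 0.3, 1e-6
lpl, u = cox_log_pl_and_score(beta, times, events, z)
# The analytic score should match a central finite difference of log PL.
numeric = (cox_log_pl_and_score(beta + h, times, events, z)[0]
           - cox_log_pl_and_score(beta - h, times, events, z)[0]) / (2 * h)
print(u, numeric)
```

The maximum partial likelihood estimator would then be obtained by solving $U(\beta) = 0$, for example by Newton-Raphson; that step is omitted here.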
In most analyses, independent censorship and noninformative censoring schemes are
assumed. Independent censoring would hold, for example, if the failure time $T_i$ and
the censoring time $C_i$ are statistically independent conditional on the covariates $Z_i$, for
$i = 1, 2, \ldots, n$. The censoring is noninformative if the parameters of the distribution
of the censoring time do not depend on the parameters of interest, here, the regression
coefficients. Notice that independent censorship does not guarantee noninformative
censoring. In this dissertation, we will also assume independent censorship and a
noninformative censoring scheme.
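The maximum partial likelihood estimator $\hat\beta$ is the solution to $U(\beta) = 0$.
Using the traditional approach, Tsiatis (1981) proved that $\sqrt{n}(\hat\beta - \beta_0)$ converges to a multivariate normal distribution
with mean zero and covariance matrix $\mathcal{I}^{-1}(\beta_0)$, where
$$\mathcal{I}(\beta) = -\lim_{n \to \infty} \frac{1}{n} E\left[\frac{\partial^2}{\partial\beta\,\partial\beta'} \log PL(\beta)\right].$$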
Based on induced order statistics and permutation arguments, Sen (1981) developed
the asymptotic distribution theory of the score function test statistic by using discrete time martingales. Andersen and Gill (1982) elegantly proved the asymptotic
distribution of $\hat\beta$ by applying martingale theory in the counting process framework.
Johansen (1983) demonstrated that the partial likelihood may be viewed as a profile likelihood in which the unknown baseline cumulative hazard function $\Lambda_0(t)$ is replaced in the total
likelihood by a nonparametric maximum likelihood estimate.
A simple extension of the model (1.1) is the inclusion of strata (Kalbfleisch and
Prentice (1980)). The hazard function for individual $i$ in the $j$th stratum is defined
as
$$\lambda_{ij}(t) = \lambda_{0j}(t) \exp\{\beta' Z_{ij}(t)\}, \qquad (1.3)$$
and the partial likelihood is
$$PL(\beta) = \prod_j PL_j(\beta),$$
where $PL_j(\beta)$ is the partial likelihood function for stratum $j$.
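A minimal sketch of the stratified partial likelihood, in Python: risk sets are built separately within each stratum and the stratum-specific log partial likelihoods are summed, so the regression coefficient is shared while each stratum keeps its own baseline hazard. The function names and data are hypothetical, and distinct failure times within each stratum are assumed.

```python
import math
from collections import defaultdict

def log_pl(beta, times, events, z):
    """Cox log partial likelihood (1.2) for one stratum, assuming distinct times."""
    total = 0.0
    for t_i, d_i, z_i in zip(times, events, z):
        if d_i:
            s0 = sum(math.exp(beta * z_l) for t_l, z_l in zip(times, z) if t_l >= t_i)
            total += beta * z_i - math.log(s0)
    return total

def stratified_log_pl(beta, times, events, z, strata):
    """log PL(beta) = sum over strata j of log PL_j(beta).

    Risk sets are formed within each stratum, so each stratum keeps its own
    unspecified baseline hazard lambda_0j, while beta is shared.
    """
    groups = defaultdict(lambda: ([], [], []))
    for t, d, x, s in zip(times, events, z, strata):
        g_times, g_events, g_z = groups[s]
        g_times.append(t)
        g_events.append(d)
        g_z.append(x)
    return sum(log_pl(beta, *g) for g in groups.values())

times = [1.0, 2.0, 3.0, 1.5, 2.5]
events = [1, 1, 1, 1, 0]
z = [0.2, -0.5, 1.0, 0.3, 0.0]
strata = ["A", "A", "A", "B", "B"]

# At beta = 0 each failure contributes -log |R_i|: stratum A gives
# -log 3 - log 2 - log 1 and stratum B gives -log 2, so the total is -log 12.
print(stratified_log_pl(0.0, times, events, z, strata), -math.log(12))
```

Ignoring the strata would pool the risk sets (here giving $-\log 60$ at $\beta = 0$), which is exactly what stratification avoids.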
There are many other important articles on univariate survival analysis, so many
that it is impossible to give a complete account of this topic in this short literature
review. For a more comprehensive discussion of the topic, we refer to the books
by Cox and Oakes (1984), Kalbfleisch and Prentice (1980), Fleming and Harrington
(1991), and Andersen, Borgan, Gill and Keiding (1993).
1.2.2 Modeling Multivariate Failure Time Data
Introduction
Multivariate failure time data arise either when an individual experiences multiple failure events (the so-called subject effect, which induces within-subject dependence) or
when individuals who each experience a single failure event are grouped into clusters because of cluster sampling or matching (the so-called design effect, which induces between-subject
dependence). Here, we exclude the competing risks situation where failure of one type
precludes the occurrence of other types of failure, as for studies involving different
causes of death.
In the setting of multivariate failure time data induced by within-subject dependence, multiple events may fall into one of two categories, ordered and unordered.
In the unordered category, different types of failure processes are acting simultaneously and, at any time, each subject can experience a failure of a particular type,
unless that type of failure has already occurred in that subject or has been censored.
For example, in a bone marrow transplant study of leukemia patients, the unordered
distinct types of failures include chronic graft versus host disease (GVHD), relapse of
leukemia, or death. Other examples of studies involving unordered multiple failures
are studies on diseases of the eye, kidney, lung, etc., if we consider failures of
different eyes, different kidneys, and different lungs as different failures. Repeated,
successive, or recurrent events of the same type of failure are ordered failure outcomes. Examples are repeated myocardial infarctions and bladder tumor
recurrences. However, in the case of multivariate failure time data induced by between-subject dependence, the failures among individuals in the same group are generally
unordered. The ordered or unordered nature of multiple failure events will affect the
way we analyze multivariate failure data. For example, we may choose a conditional
modeling approach for ordered multiple failure events but a marginal approach
for unordered multiple failures.
The models for multivariate failure time data can be classified into two broad
categories: models that specify the structure of dependence and marginal models
which treat the dependence among failure times as a nuisance and avoid specifying
the structure of the dependence.
The structure of the dependence among multivariate failure time data can be
specified in the following ways: (1) assume parametric joint distributions;
(2) condition on past events or history by specifying multiplicative intensity models,
overall intensity models, or other conditional models; (3) use frailty models to
include random effects in the model.
Joint Distribution
Multivariate distributions can be generated or defined in a number of different ways,
each of which is more or less relevant for the specific situation and purpose. Almost
all examples we consider here are bivariate for simplicity of presentation. However, all
can be generalized to higher dimensions, unless stated otherwise.
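Assume $T_1$ and $T_2$ are two random variables representing times to failures of
the first type and the second type, respectively. The hazard function of a failure of
type $i$ at time $t$ given no failure before time $t$ is
$$\lambda_i(t) = \lim_{h \downarrow 0} \frac{1}{h} \Pr(t \le T_i < t + h \mid T_1 \ge t, T_2 \ge t), \qquad i = 1, 2,$$
and the hazard function of a failure of type $i$ at time $s$ given a previous failure of the other
type $j$ at time $t < s$ is
$$\lambda_{i|j}(s \mid t) = \lim_{h \downarrow 0} \frac{1}{h} \Pr(s \le T_i < s + h \mid T_i \ge s, T_j = t), \qquad i = 1, 2, \quad i + j = 3.$$
These four hazard functions determine the joint distribution of $(T_1, T_2)$ if this bivariate
distribution has a continuous density. The joint density for $t_1 < t_2$ is then
$$f(t_1, t_2) = \lambda_1(t_1) \exp\left\{-\int_0^{t_1} [\lambda_1(u) + \lambda_2(u)]\,du\right\} \lambda_{2|1}(t_2 \mid t_1) \exp\left\{-\int_{t_1}^{t_2} \lambda_{2|1}(u \mid t_1)\,du\right\},$$
with a similar expression if $t_1 > t_2$.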
One approach to the construction of bivariate survival distributions is to transform independent random variables, as in the following example.
Example 1.1 (The bivariate 'shock' model of Marshall and Olkin (1967)) Let $V_1$, $V_2$,
and $V_3$ denote the times to failure of type 1 but not type 2, failure of type 2 but not
type 1, and simultaneous failure of types 1 and 2, respectively. Assume that
$V_1$, $V_2$, and $V_3$ are independent exponentially distributed variables with hazards $\lambda_1$, $\lambda_2$,
and $\lambda_3$, respectively. Let $T_1 = \min(V_1, V_3)$ and $T_2 = \min(V_2, V_3)$. Then the bivariate
survival distribution is given by
$$S(t_1, t_2) = \Pr(T_1 > t_1, T_2 > t_2) = \exp\{-\lambda_1 t_1 - \lambda_2 t_2 - \lambda_3 \max(t_1, t_2)\}.$$
The distribution is a curved exponential family. It is not absolutely continuous: with positive probability
$\lambda_3/(\lambda_1 + \lambda_2 + \lambda_3)$ the two components fail simultaneously. Thus, the distribution
should be used in situations where simultaneous failure of the two components is
possible. The observations are independent if $\lambda_3 = 0$. It can be shown that the
marginal distribution of $T_i$ is exponential with hazard $\lambda_i + \lambda_3$, and that the distribution of
$T_{\min} = \min(T_1, T_2)$ is also exponential, with hazard $\lambda_1 + \lambda_2 + \lambda_3$. The distribution
has a bivariate lack of memory property:
$$S(t_1 + s, t_2 + s) = S(t_1, t_2)\, S(s, s) \qquad \text{for all } s, t_1, t_2 \ge 0.$$
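The properties of the Marshall-Olkin model listed above can be checked by simulation. The Python sketch below generates $(T_1, T_2)$ directly from the three independent exponential shocks; the hazard values and the sample size are arbitrary illustrative choices, so the empirical quantities only approximate the theoretical ones.

```python
import random

def marshall_olkin_pairs(lam1, lam2, lam3, n, seed=0):
    """Draw n pairs (T1, T2) with T1 = min(V1, V3) and T2 = min(V2, V3), where
    V1, V2, V3 are independent exponential shocks with hazards lam1, lam2, lam3."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        v1 = rng.expovariate(lam1)  # shock that fails component 1 only
        v2 = rng.expovariate(lam2)  # shock that fails component 2 only
        v3 = rng.expovariate(lam3)  # common shock that fails both
        pairs.append((min(v1, v3), min(v2, v3)))
    return pairs

lam1, lam2, lam3 = 1.0, 2.0, 0.75
pairs = marshall_olkin_pairs(lam1, lam2, lam3, n=100_000)
n = len(pairs)
p_tie = sum(t1 == t2 for t1, t2 in pairs) / n   # theory: lam3 / (lam1 + lam2 + lam3) = 0.2
mean_t1 = sum(t1 for t1, _ in pairs) / n        # theory: 1 / (lam1 + lam3)
mean_tmin = sum(min(p) for p in pairs) / n      # theory: 1 / (lam1 + lam2 + lam3)
print(p_tie, mean_t1, mean_tmin)
```

The exact ties $T_1 = T_2$ occur precisely when the common shock $V_3$ comes first, which is the singular component of the distribution.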
This distribution allows semiparametric generalizations along the lines of the
Cox regression model for failure time data, with event intensities $\lambda_1$, $\lambda_2$, and $\lambda_3$ depending on a vector of covariates $Z$. Partial likelihoods for the regression parameters
may be derived, and in most cases standard Cox regression software may
be applied for the analysis with minor modification of the input data file. Klein,
Keiding and Kamby (1989) applied semiparametric Marshall-Olkin models to the
occurrence of metastases at multiple sites after breast cancer.
To avoid the singularity along the line $t_1 = t_2$, Block and Basu (1974) derived an
absolutely continuous bivariate exponential distribution which retains the bivariate
lack of memory property but does not have exponential marginals. They also proved
that the only absolutely continuous bivariate distribution having both exponential
marginals and the bivariate lack of memory property is the trivial case where
$T_1$ and $T_2$ are independent. Later, Sarkar (1987) derived an absolutely continuous
bivariate exponential distribution which has exponential marginals. Necessarily, he had
to abandon the bivariate lack of memory property.
Hougaard (1986a) derived a multivariate Weibull distribution through the transformation of independent variables as well. A nice result about the dependence in
his formulation is that the correlation between $\log T_1$ and $\log T_2$ is $1 - \alpha^2$, where
$\alpha \in (0, 1]$ is the index of a positive stable distribution. The observations are independent if $\alpha = 1$.
The bivariate exponential model mentioned by Gumbel (1960) is
a special case of Hougaard's bivariate Weibull model. As in the Marshall-Olkin
model, the marginal distributions of $T_i$ ($i = 1, 2$) and the distribution of $T_{\min}$ in
Gumbel's bivariate exponential distribution are exponential. However, Gumbel's
model does not admit a singularity along the line $t_1 = t_2$ in the $(t_1, t_2)$ plane, as does
the Marshall-Olkin model.
The joint distributions can also be defined through conditional distributions.
One of the classical models of this type is the bivariate extension of the exponential
distribution shown in the following example.
Example 1.2 (Freund (1961)) The following is a bivariate exponential density of $(T_1, T_2)$:
$$f(t_1, t_2) = \begin{cases} \alpha_1 \beta_2 \exp\{-\beta_2 t_2 - (\alpha_1 + \alpha_2 - \beta_2) t_1\}, & t_1 < t_2, \\ \alpha_2 \beta_1 \exp\{-\beta_1 t_1 - (\alpha_1 + \alpha_2 - \beta_1) t_2\}, & t_1 > t_2. \end{cases}$$
Unlike the Marshall-Olkin model and the Gumbel model, the marginals of the Freund
model are not necessarily exponential. The marginal distribution of $T_i$ is exponential
if and only if $\alpha_i = \beta_i$, $i = 1, 2$. $T_1$ and $T_2$ are independent if $\alpha_1 = \beta_1$ and $\alpha_2 = \beta_2$.
The distribution of $T_{\min}$ is exponential with hazard $\alpha_1 + \alpha_2$. The probability that
component $i$ fails first is $\alpha_i/(\alpha_1 + \alpha_2)$. Here, the dependence between $T_1$ and $T_2$ is
characterized by the assumption that the failure of component 1 (or component 2)
changes the hazard of failure of component 2 (or component 1) from $\alpha_2$ to $\beta_2$ (or from
$\alpha_1$ to $\beta_1$). This is particularly relevant in situations where the failure of one component
implies an increased load on the other, for instance, in two-organ systems
such as the kidneys of one individual.
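Because Freund's model is defined through a change of hazard at the first failure, it is easy to simulate sequentially. The following Python sketch (parameter values are illustrative assumptions) draws the first failure time with hazard $\alpha_1 + \alpha_2$, picks which component failed with probability $\alpha_i/(\alpha_1 + \alpha_2)$, and then draws the survivor's residual lifetime with the switched hazard, relying on the memoryless property of the exponential distribution.

```python
import random

def freund_pairs(a1, a2, b1, b2, n, seed=0):
    """Simulate Freund's bivariate exponential model.

    While both components work, component i fails at rate a_i; after the
    other component fails, the survivor's rate switches from a_i to b_i.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        t_min = rng.expovariate(a1 + a2)       # time of the first failure
        if rng.random() < a1 / (a1 + a2):      # component 1 fails first
            pairs.append((t_min, t_min + rng.expovariate(b2)))
        else:                                  # component 2 fails first
            pairs.append((t_min + rng.expovariate(b1), t_min))
    return pairs

def mean(xs):
    return sum(xs) / len(xs)

a1, a2, n = 1.0, 0.5, 100_000
loaded = freund_pairs(a1, a2, b1=2.0, b2=3.0, n=n, seed=1)  # a failure raises the survivor's hazard
indep = freund_pairs(a1, a2, b1=a1, b2=a2, n=n, seed=2)     # alpha_i = beta_i

p1_first = mean([t1 < t2 for t1, t2 in loaded])  # theory: a1 / (a1 + a2) = 2/3
mean_tmin = mean([min(p) for p in loaded])       # theory: 1 / (a1 + a2)
mean_t2_indep = mean([t2 for _, t2 in indep])    # theory: 1 / a2 = 2 when beta_2 = alpha_2
mean_t2_loaded = mean([t2 for _, t2 in loaded])  # theory: 2/3 + (2/3)(1/3) = 8/9
print(p1_first, mean_tmin, mean_t2_indep, mean_t2_loaded)
```

With the increased load ($\beta_2 = 3 > \alpha_2 = 0.5$), the mean of $T_2$ drops from 2 to about 0.89, which shows the dependence induced by the change of hazard.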
Sometimes, for example in simulation studies, it is convenient to parameterize the joint distribution by means of the marginal distributions $F_i(\cdot)$ and a single
parameter $\theta$ characterizing the dependence, that is,
$$F(t_1, t_2) = C_\theta\{F_1(t_1), F_2(t_2)\}. \qquad (1.4)$$
Example 1.3 (Gumbel (1960) and Farlie (1960)) If $F_1(t_1)$ and $F_2(t_2)$ are distribution functions, then
$$F(t_1, t_2) = F_1(t_1) F_2(t_2) [1 + \theta \{1 - F_1(t_1)\}\{1 - F_2(t_2)\}]$$
is a bivariate distribution function for $\theta \in [-1, 1]$. When $\theta = 0$, $T_1$ and $T_2$ are
independent. If the marginal distributions are exponential, the correlation between $T_1$
and $T_2$ is $\theta/4$. Consequently, the correlation can be either positive or negative, but
its absolute value cannot exceed 0.25.
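The bound on the correlation in Example 1.3 can be seen empirically. The Python sketch below samples from the Farlie-Gumbel-Morgenstern distribution with unit exponential marginals by the conditional distribution method (the conditional distribution function of the second uniform variable given the first is quadratic, so it can be inverted in closed form), and compares the sample correlation with $\theta/4$. The sampling method, seeds, and sample sizes are choices made for this sketch.

```python
import math
import random

def fgm_exponential_pairs(theta, n, seed=0):
    """Sample n pairs with unit exponential marginals joined by the FGM copula
    C(u, v) = uv[1 + theta(1 - u)(1 - v)], theta in [-1, 1]."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        a = theta * (1.0 - 2.0 * u)
        if abs(a) < 1e-12:
            v = w
        else:
            # Conditional cdf of V given U = u is v(1 + a) - a v^2; invert it at w.
            disc = max((1.0 + a) ** 2 - 4.0 * a * w, 0.0)
            v = ((1.0 + a) - math.sqrt(disc)) / (2.0 * a)
        v = min(v, 1.0 - 1e-15)  # guard against rounding up to exactly 1
        pairs.append((-math.log(1.0 - u), -math.log(1.0 - v)))
    return pairs

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

corr = {}
for theta in (1.0, -1.0):
    t1, t2 = zip(*fgm_exponential_pairs(theta, n=100_000, seed=7))
    corr[theta] = pearson(t1, t2)
    print(theta, round(corr[theta], 3), theta / 4)
```

Even at the extreme values $\theta = \pm 1$ the sample correlations stay near $\pm 0.25$, which is why this family is limited to weak dependence.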
Genest and MacKay (1986) noted that many bivariate distributions belong to
the family of Archimedean copulas, a special case of (1.4) obtained by restricting to
uniform marginals, of the form
$$C(u_1, u_2) = K^{-1}\{K(u_1) + K(u_2)\},$$
where $K(\cdot)$ is a strictly monotone decreasing convex function defined on $(0, 1]$ with $K(1) = 0$. One
widely cited bivariate frailty model due to Clayton (1978) fits into this framework.
Example 1.4 (Clayton (1978)) Assume that, conditional on the frailty $Y$, the failure
times of subjects are independent with hazard $Y\lambda_i(t)$ for subject $i$. Let $Y$ have a
gamma distribution with mean one and variance $1/\theta$. Then the bivariate survival function of $(T_1, T_2)$ is
$$S(t_1, t_2) = \{S_1(t_1)^{-1/\theta} + S_2(t_2)^{-1/\theta} - 1\}^{-\theta},$$
where $S_1$ and $S_2$ are the marginal survival functions. Here, independence is obtained as a limiting case, for $\theta \to \infty$.
This model was further studied by Oakes (1982). Notice that with this derivation,
there is only positive dependence between members of a pair. However, the bivariate
distribution can be extended to allow negative dependence (Genest and MacKay
(1986)). Covariates can be included in the model assuming conditional proportional
hazards (Clayton and Cuzick (1985)), and also in the case of Weibull hazards (Crowder
(1985)).
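The Clayton survival function can be checked against its frailty construction. This Python sketch assumes constant hazards $\lambda_1$ and $\lambda_2$, so that the marginal survival functions are $S_i(t) = (1 + \lambda_i t/\theta)^{-\theta}$, draws a shared gamma frailty with mean one and variance $1/\theta$, and compares the empirical joint survival probability with the closed form; all parameter values are illustrative.

```python
import random

def clayton_survival(t1, t2, theta, lam1, lam2):
    """Closed form S(t1, t2) = {S1^(-1/theta) + S2^(-1/theta) - 1}^(-theta)
    with marginals S_i(t) = (1 + lam_i t / theta)^(-theta) (constant hazards)."""
    s1 = (1.0 + lam1 * t1 / theta) ** (-theta)
    s2 = (1.0 + lam2 * t2 / theta) ** (-theta)
    return (s1 ** (-1.0 / theta) + s2 ** (-1.0 / theta) - 1.0) ** (-theta)

def frailty_pairs(theta, lam1, lam2, n, seed=0):
    """Shared frailty Y ~ Gamma(shape=theta, scale=1/theta), so E Y = 1 and
    Var Y = 1/theta; given Y, T_i is exponential with hazard Y * lam_i."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        y = rng.gammavariate(theta, 1.0 / theta)
        pairs.append((rng.expovariate(y * lam1), rng.expovariate(y * lam2)))
    return pairs

theta, lam1, lam2 = 2.0, 1.0, 0.5
t1, t2 = 0.5, 0.8
pairs = frailty_pairs(theta, lam1, lam2, n=100_000, seed=3)
empirical = sum(a > t1 and b > t2 for a, b in pairs) / len(pairs)
closed_form = clayton_survival(t1, t2, theta, lam1, lam2)
independent = (clayton_survival(t1, 0.0, theta, lam1, lam2)
               * clayton_survival(0.0, t2, theta, lam1, lam2))
print(empirical, closed_form, independent)  # closed_form exceeds the independence product
```

The joint survival probability exceeds the product of the marginals, reflecting the positive dependence induced by the shared frailty.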
Marshall and Olkin (1988) generalized (1.4) to generate families of multivariate distributions through the use of frailty models. Hougaard (1986b) and Oakes
(1989) discussed a wide range of densities for the frailty $Y$. Bandeen-Roche and Liang
(1996) constructed multivariate survival distributions by a recursive nesting of univariate frailty-type distributions, through which Archimedean copula forms are determined for all bivariate margins. Their family of distributions allows for different
levels of association between bivariate margins, instead of an exchangeable dependence
structure only. Hougaard (1986a) derived many joint distributions when the frailty
$Y$ follows a positive stable distribution, using the Laplace transform.
For a more detailed review of bivariate survival models and some of their known
properties, see Hougaard (1987).
Conditioning on Past Events or History
Multiplicative intensity models condition on all past event history. In the counting
process approach of multiplicative intensity models to censored failure time data, the
intensity process for counting process N is modeled, rather than hazard functions for
the failure time $T$. Let $N = (N_1, \ldots, N_n)$ be an $n$-component multivariate counting
process, where $N_i$ counts observed events in the life of the $i$th individual, $i = 1, \ldots, n$,
over the time interval $[0, 1]$. From the definition of the multivariate counting process,
the sample paths of $N_1, \ldots, N_n$ are step functions, zero at time zero, with jumps of
size $+1$ only, and no two component processes can jump at the same time. Properties
of stochastic processes, such as being a local martingale or a predictable process, are
relative to a right-continuous nondecreasing family $(\mathcal{F}_t : t \in [0, 1])$ of sub-$\sigma$-algebras
on the sample space $(\Omega, \mathcal{F}, P)$. In other words, $\mathcal{F}_t$ is the filtration, which represents
everything that happens up to time $t$.
Example 1.5 In the multiplicative intensity model approach, the Cox hazard model
is based on the assumption that $N$ has random intensity process $\lambda = (\lambda_1, \ldots, \lambda_n)$ such
that
$$\lambda_i(t) = Y_i(t) \lambda_0(t) \exp\{\beta' Z_i(t)\},$$
where $\beta$ is a fixed column vector of $p$ coefficients, $\lambda_0(t)$ is a fixed unknown hazard function, and $Y_i(t)$ is a predictable at-risk indicator process.
Therefore, the Cox model is a special case of the multiplicative intensity model with
$N_i(t) = I\{T_i \le t, \delta_i = 1\}$, a zero-one variable, and $Y_i(t) = I\{X_i \ge t\}$, where $X_i$ is the observed, possibly censored, time.
Andersen and Gill (1982) suggested a simple extension of the Cox proportional hazards
model to allow for multiple (recurrent) events per subject by applying multivariate
counting processes, and extended the Cox partial likelihood theory to this situation.
Example 1.6 Andersen and Gill (1982) considered the hazard rate
$$\lambda_i(t) = \lim_{h \downarrow 0} \frac{1}{h} \Pr\{N_i(t + h) - N_i(t) = 1 \mid \mathcal{F}_{t-}\}, \qquad i = 1, \ldots, n, \quad t \in [0, 1].$$
They obtain repeated observed failures of recurrent events by taking $Y_i(t) = 1$
as long as the subject is under observation and $N_i(t) \ge 0$, with the assumption
$N_i(1) < \infty$ a.s. Assume that the counting process $N$ has random intensity process
$\lambda = (\lambda_1, \ldots, \lambda_n)$ such that
$$\lambda_i(t; Z_i(t)) = Y_i(t) \lambda_0(t) \exp\{\beta' Z_i(t)\}, \qquad i = 1, \ldots, n; \quad t \in [0, 1],$$
and $Z_i(t)$ is predictable and locally bounded. Then the processes $M_i$ defined by
$$M_i(t) = N_i(t) - \int_0^t \lambda_i(s; Z_i(s))\,ds$$
are local square integrable martingales on the time interval $[0, 1]$.
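The conditional variance of a martingale $M$ is given by the predictable variation process $\langle M \rangle$,
defined by having the increments
$$d\langle M \rangle(t) = \mathrm{Var}\{dM(t) \mid \mathcal{F}_{t-}\}.$$
The predictable covariation process $\langle M_1, M_2 \rangle$ is defined by having the increments
$$d\langle M_1, M_2 \rangle(t) = \mathrm{Cov}\{dM_1(t), dM_2(t) \mid \mathcal{F}_{t-}\}.$$
Also,
$$\langle M_i \rangle(t) = \int_0^t \lambda_i(s)\,ds$$
and
$$\langle M_i, M_j \rangle(t) = 0, \qquad i \ne j,$$
i.e., $M_i$ and $M_j$ are orthogonal when $i \ne j$.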
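A small simulation can illustrate the martingale property in the simplest case: a Cox model with no covariates and a constant baseline hazard $\lambda$, with independent uniform censoring (both assumptions of this sketch, as are the parameter values). Then $\int_0^t Y_i(s)\lambda\,ds = \lambda \min(X_i, t)$, so the sample mean of $M_i(t)$ should be close to zero and the sample mean of $M_i(t)^2$ close to that of $\langle M_i \rangle(t)$.

```python
import random

def martingale_at(t, lam, n, censor_max=2.0, seed=0):
    """Return lists of M_i(t) and compensators A_i(t) for n subjects.

    Failure times are exponential with constant hazard lam (a Cox model with
    no covariates), censoring is uniform(0, censor_max), and
    M_i(t) = N_i(t) - int_0^t Y_i(s) lam ds = N_i(t) - lam * min(X_i, t).
    """
    rng = random.Random(seed)
    m_vals, comp_vals = [], []
    for _ in range(n):
        fail, cens = rng.expovariate(lam), rng.uniform(0.0, censor_max)
        x, delta = min(fail, cens), fail <= cens  # observed time X_i, indicator delta_i
        n_t = 1 if (delta and x <= t) else 0      # N_i(t) = I{X_i <= t, delta_i = 1}
        comp = lam * min(x, t)                    # Y_i(s) = I{X_i >= s}
        m_vals.append(n_t - comp)
        comp_vals.append(comp)
    return m_vals, comp_vals

m_vals, comp_vals = martingale_at(t=1.5, lam=1.0, n=200_000)
mean_m = sum(m_vals) / len(m_vals)                  # E M_i(t) = 0
mean_m2 = sum(m * m for m in m_vals) / len(m_vals)  # E M_i(t)^2 ...
mean_comp = sum(comp_vals) / len(comp_vals)         # ... equals E <M_i>(t)
print(mean_m, mean_m2, mean_comp)
```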
Under the AG multiplicative intensity model, the risk sets for the $(r + 1)$th
recurrences are not restricted to the subjects who have experienced the first $r$ recurrences. In other words, all subjects who are still under observation are considered at risk of the $(r + 1)$th event, regardless of
whether they have not experienced any event at all or have already experienced $r$ recurrences.
Also, under the assumption of the same baseline intensity for all events,
the risk of a recurrent event for a given subject is unaffected by any earlier events
that occurred to that subject, unless covariates that capture such dependence are included explicitly in the model. Consequently, such a model is somewhat
restrictive in terms of the nature of the within-subject dependence among recurrent
failure times on the same subject. The AG models were later extended to an $nK$-dimensional multivariate counting process by Andersen and Borgan (1985) and Andersen et al. (1993), with elements $N_{ik}(t)$ counting the number of type $k$ (recurrent)
events on $[0, t]$ for individual $i$.
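To illustrate how the Andersen-Gill risk sets are formed, the following Python sketch uses a hypothetical recurrent event data set in which each subject has a list of event times and an end of follow-up. A subject is in the risk set at time $t$ whenever it is still under observation, regardless of how many events it has already had. The function names and data layout are assumptions of this sketch.

```python
def ag_at_risk(data, t):
    """Andersen-Gill risk set at time t: every subject still under observation
    (Y_i(t) = 1, i.e. t <= end of follow-up), whatever its event count."""
    return sorted(i for i, (_, end) in data.items() if t <= end)

def events_before(data, t):
    """Number of events each subject has had strictly before time t."""
    return {i: sum(e < t for e in ev) for i, (ev, _) in data.items()}

# Hypothetical recurrent event data: subject -> (event times, end of follow-up).
data = {"A": ([1.0, 2.5], 4.0), "B": ([], 3.0), "C": ([0.5], 1.8)}

risk = ag_at_risk(data, 2.0)
counts = events_before(data, 2.0)
print(risk, {i: counts[i] for i in risk})  # ['A', 'B'] {'A': 1, 'B': 0}
```

Subject A (one earlier event) and subject B (no events) share the same risk set at $t = 2.0$, which is the feature of the AG model described above.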
In the Cox model, the baseline hazard function is deterministic. Prentice,
Williams and Peterson (1981) (PWP) generalized Cox regression models with stochastic baseline hazards to multiple failures with within-subject
dependence; the resulting model is called the overall intensity model.
Example 1.7 (Prentice et al. (1981)) Let $z(u) = [z_1(u), \ldots, z_p(u)]'$ denote a vector
of covariates, for a study subject, available at time $u \ge 0$. Denote by $Z(t) = \{z(u) : u \le t\}$
the corresponding covariate process up to time $t$. Let $N(t) = \{n(u) : u \le t\}$,
where $n(u)$ is the number of failures on a study subject prior to time $u$. Assume that
(i) the failure time random variables are continuous and (ii) given that at least one
failure occurs in $[t, t + h)$, the limiting probability of two or more failures in $[t, t + h)$ is
zero as $h \to 0$. The counting process $N(t)$ is equivalent to the random failure times
$T_1 < \cdots < T_{n(t)}$ in $[0, t]$ on a given individual. The hazard or intensity function at
time $t$ is defined as the instantaneous rate of failure at time $t$ given the covariate and
counting processes at time $t$. Specifically, one can write
$$\lambda\{t \mid N(t), Z(t)\} = \lim_{h \downarrow 0} \frac{1}{h} \Pr\{t \le T_{n(t)+1} < t + h \mid N(t), Z(t)\}.$$
PWP proposed to permit (a) an arbitrary baseline intensity dependence on either the
time from the beginning of the study (total time) or the time from the immediately
preceding failure (gap time) and (b) the shape of the baseline hazard function to depend
arbitrarily on the number of preceding failures and possibly on other characteristics
of {N(t), Z(t)}. The two semiparametric hazard function models they suggested are:
$$\lambda\{t \mid N(t), Z(t)\} = \lambda_{0s}(t)\exp\{z(t)'\beta_s\} \tag{1.5}$$
and
$$\lambda\{t \mid N(t), Z(t)\} = \lambda_{0s}(t - t_{n(t)})\exp\{z(t)'\beta_s\}, \tag{1.6}$$
where $t_{n(t)}$ is the time of the most recent failure prior to t, where in each case $\lambda_{0s}(\cdot) \ge 0$ ($s = 1, 2, \ldots$) are completely arbitrary baseline intensity functions, where the stratification variable $s = s\{N(t), Z(t), t\}$ may change as a function of time for a given subject, and where $\beta_s$ is a column vector of stratum-specific regression coefficients.
For instance, an important special case for the stratification variable is $s = n(t) + 1$,
for which a subject moves to stratum (j + 1) immediately following his jth failure and
remains there until the (j + 1)th failure or until censorship takes place. The method of
allowing the arbitrary baseline hazard to depend on gap times used by Gail, Santner
and Brown (1980) in the analysis of comparative carcinogenesis experiments is a
two-sample special case of (1.6).
Partial likelihood was used for inferences about $\beta_s$. Let $t_{s1} < \cdots < t_{sd_s}$ denote the ordered (assumed distinct) failure times in stratum s. Suppose subject i fails in stratum s at time $t_{si}$ and let $z_{si}(t_{si})$ denote this subject's covariate vector at $t_{si}$. Also let R(t, s) denote the set of subjects at risk in stratum s just prior to time t and $d_s$ be the total number of failures in stratum s. The partial likelihood function for the first model (1.5) is:
$$L(\beta) = \prod_{s \ge 1}\prod_{i=1}^{d_s}\left[\frac{\exp\{\beta_s' z_{si}(t_{si})\}}{\sum_{l \in R(t_{si}, s)} \exp\{\beta_s' z_l(t_{si})\}}\right].$$
Assume the stratification is restricted to be $s = n(t) + 1$ or finer so that a subject can contribute at most one failure time in a specific stratum. Denote by $u_{s1} < \cdots < u_{sk_s}$ the distinct gap times from the immediately preceding failure on the same subject, for the $k_s$ failures occurring in stratum s. Suppose subject i fails in stratum s at gap time $u_{si}$ and that $z_{si}(t_{si})$ is the corresponding covariate value. Let R(u, s) denote the set of subjects at risk in stratum s at gap time $u - 0$, $u \in (u_{s,i-1}, u_{si})$. Then the partial likelihood for the second model (1.6) is:
$$L(\beta) = \prod_{s \ge 1}\prod_{i=1}^{k_s}\left[\frac{\exp\{\beta_s' z_{si}(t_{si})\}}{\sum_{l \in R(u_{si}, s)} \exp\{\beta_s' z_l(t_l + u_{si})\}}\right],$$
where $t_l$ is the last failure time on subject l prior to entry into stratum s; $t_l = 0$ if there is no preceding failure on the subject. Both partial likelihoods are of the same form as for univariate failure time data, with the exception of the time argument and the risk set definitions.
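To make the two time scales concrete, here is a minimal sketch of the log partial likelihoods for (1.5) and (1.6) under the stratification $s = n(t) + 1$. It assumes, for illustration only, a single time-fixed scalar covariate (so $z_l(t_l + u) = z_l$), distinct failure times, and a dictionary `beta` of stratum-specific coefficients; the function names are hypothetical and not from PWP.

```python
import math

def stratum_intervals(event_times, censor_time):
    """Split one subject's follow-up into (stratum, entry, exit, failed) rows
    under s = n(t) + 1: stratum s runs from the (s-1)th failure to the sth."""
    rows, entry = [], 0.0
    for s, t in enumerate(sorted(event_times), start=1):
        rows.append((s, entry, t, True))
        entry = t
    if censor_time > entry:  # final, censored stay in the next stratum
        rows.append((len(event_times) + 1, entry, censor_time, False))
    return rows

def pwp_log_partial_likelihood(subjects, beta, gap_time=False):
    """subjects: list of (z, event_times, censor_time) with scalar, time-fixed z.
    beta: dict mapping stratum s -> coefficient beta_s.
    gap_time=False gives model (1.5); gap_time=True gives model (1.6)."""
    rows = [(z, s, e, x, failed)
            for z, events, cens in subjects
            for s, e, x, failed in stratum_intervals(events, cens)]
    loglik = 0.0
    for z_i, s, e_i, x_i, failed in rows:
        if not failed:
            continue
        b = beta[s]
        if gap_time:
            u = x_i - e_i  # gap time since the preceding failure
            risk = [z for z, s2, e, x, _ in rows if s2 == s and x - e >= u]
        else:
            # total time: in stratum s and under observation just before x_i
            risk = [z for z, s2, e, x, _ in rows if s2 == s and e < x_i <= x]
        loglik += b * z_i - math.log(sum(math.exp(b * z) for z in risk))
    return loglik

subjects = [(1.0, [2.0], 5.0), (0.0, [1.0, 3.0], 4.0), (1.0, [], 2.5)]
beta = {1: 0.4, 2: -0.2}
print(pwp_log_partial_likelihood(subjects, beta))
print(pwp_log_partial_likelihood(subjects, beta, gap_time=True))
```

Only the risk-set line differs between the two models, which mirrors the remark that the two partial likelihoods share one form.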
Notice that the PWP overall intensity models stratify the data according to the number of previous occurrences. This is done to allow different baseline intensities for different events and to restrict the risk sets for the (r + 1)th recurrences to the subjects who have experienced the first r recurrences, which differs from the AG multiplicative intensity models. Consequently, PWP models are most likely to be useful when the sample size is large and the multiple failures have an 'ordered' nature, such as recurrent events.
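The contrast between the AG and PWP risk sets can be sketched in a few lines. The subject data and function names below are hypothetical, and "at risk just before t" is taken to mean that t does not exceed the end of follow-up.

```python
def ag_risk_set(subjects, t):
    """AG: everyone still under observation just before t is at risk,
    however many events they have already had."""
    return sorted(name for name, (events, end) in subjects.items() if t <= end)

def pwp_risk_set(subjects, t, r):
    """PWP with s = n(t) + 1: at risk of the (r + 1)th event only if the
    subject is still observed and has had exactly r events before t."""
    return sorted(
        name for name, (events, end) in subjects.items()
        if t <= end and sum(1 for e in events if e < t) == r
    )

# hypothetical data: name -> (event times, end of follow-up)
subjects = {"a": ([1.0, 2.0], 6.0), "b": ([3.0], 5.0), "c": ([], 4.0), "d": ([0.5], 1.5)}
print(ag_risk_set(subjects, 3.5))        # everyone still followed at t = 3.5
print(pwp_risk_set(subjects, 3.5, r=1))  # only those with exactly one prior event
```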
PWP did not provide the asymptotic estimation theory for the overall intensity
models they proposed. Chang and Hsiung (1994) showed later that their proposed
estimators are asymptotically normal.
Frailty Models
Frailty models use random effects to represent heterogeneity of 'frailty' or proneness
to failure. In general, the frailty model is used to accommodate between-subject
dependence induced by a group of subjects, such as married couples or siblings, who
share some common characteristics.
The concept of frailty was originally introduced by Vaupel, Manton and Stallard
(1979) in a univariate analysis of life table data. However, as Aalen (1988) noted,
when only one event can be observed for each subject, the introduction of heterogeneity via frailties leads to severe problems of identifiability in failure time data analysis.
This is not surprising. It is analogous to the one-way random effects model in linear models where the between-subject and the within-subject variance components
can only be estimated when there are multiple observations for at least some of the
subjects.
If the frailty term is common to several individuals, it generates dependence
among their failure times. Clayton (1978) considered a model with no covariates
and with frailties distributed according to a gamma distribution for the multivariate
failure time data induced by clusters of subjects. In his model, the univariate frailty
is assumed to affect the failure rates in a multiplicative way. So if the hazard function of a subject with a frailty value of 1 is given by $\lambda(t)$, then the hazard function of a subject with a frailty value of W is given by $W\lambda(t)$.
Clayton and Cuzick (1985) extended the Clayton model to include time-fixed covariates Z, in which a single random effect $W_i$ enters the intensity process specification in a multiplicative way.
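Example 1.8 (Clayton and Cuzick (1985)) Let $T_{ij}$ ($i = 1, \ldots, n$; $j = 1, \ldots, n_i$) denote the survival time of individual j in family i. Assume that, conditional on the family-specific frailty $W_i$ for $i = 1, \ldots, n$, all individuals are independent; then the hazard of $T_{ij}$ is
$$\lambda_{ij}(t \mid W_i) = W_i\,\lambda_{ij}(t). \tag{1.7}$$
Assume that $\lambda_{ij}$ is of the Cox proportional hazards form; then, from (1.1),
$$\lambda_{ij}(t \mid W_i, Z_{ij}) = W_i\,\lambda_0(t)\exp(\beta' Z_{ij}), \tag{1.8}$$
where $W_i \sim \mathrm{Gamma}(\alpha, \nu)$, $\alpha = \nu = \gamma^{-1}$, $E(W_i) = 1$, and $\mathrm{var}(W_i) = \gamma$.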
Note that the frailty is an unobserved random factor applied to the baseline hazard function. Conditional on the value of the unobservable frailty, the failure times follow the Cox regression model. The marginal hazard $\lambda_{ij}(t \mid Z_{ij}) = E_W[W \mid T_{ij} \ge t, Z_{ij}]\,\lambda_0(t)\exp(\beta' Z_{ij})$ no longer follows the proportional hazards model; instead, there is convergence of hazards at a rate determined by the dependence parameter γ (Clayton (1991)). Consequently, the dependence parameter and the regression parameters are confounded. Information for estimation of γ comes partly from the coincidence of failures within families, and partly from the marginal convergence of hazards in relation to covariates. This implies that the dependence parameter also measures something other than dependence.
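The convergence of marginal hazards can be checked numerically. With a gamma frailty of mean 1 and variance γ, the marginal survivor function is $\{1 + \gamma\Lambda(t)\}^{-1/\gamma}$ with $\Lambda(t) = \Lambda_0(t)\exp(\beta' Z)$, so the marginal hazard is $\lambda_0(t)\exp(\beta' Z)/\{1 + \gamma\Lambda(t)\}$. The sketch below additionally assumes a constant baseline hazard and a single binary covariate; the function names are hypothetical.

```python
import math

def marginal_hazard(t, z, beta, gamma, lam0=1.0):
    """Population-averaged hazard when, given W ~ Gamma(mean 1, variance gamma),
    the hazard is W * lam0 * exp(beta * z).  With a constant baseline hazard the
    cumulative hazard for W = 1 is lam0 * t * exp(beta * z), and the marginal
    hazard is lam0 * exp(beta * z) / (1 + gamma * lam0 * t * exp(beta * z))."""
    risk = lam0 * math.exp(beta * z)
    return risk / (1.0 + gamma * risk * t)

def marginal_hazard_ratio(t, beta, gamma, lam0=1.0):
    """Marginal hazard ratio for z = 1 versus z = 0 at time t."""
    return marginal_hazard(t, 1.0, beta, gamma, lam0) / marginal_hazard(t, 0.0, beta, gamma, lam0)

# conditional hazard ratio exp(beta) = 2; the marginal ratio drifts toward 1
for t in (0.0, 1.0, 10.0, 100.0):
    print(t, round(marginal_hazard_ratio(t, beta=math.log(2.0), gamma=0.5), 3))
```

Setting `gamma=0.0` removes the frailty and recovers a constant ratio of $\exp(\beta)$, which is the proportional hazards case.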
Clayton and Cuzick (1985) proposed likelihood-based estimation procedures for γ and β using the distribution of the generalized rank vector (Prentice (1978)), but a proof of the asymptotic properties of the method is not available. A series of approximations was made in order to make this approach computationally feasible; however, their procedures remain computationally complex.
Nielsen, Gill, Andersen and Sørensen (1992) used the profile likelihood via an EM algorithm to estimate the cumulative baseline hazard and the variance of the random effect in the frailty model. The asymptotic distributions of these estimators, along with consistent estimators of their asymptotic variances, are given by Murphy (1994) and Murphy (1995). Klein (1992) also used an EM algorithm based on the profile likelihood to carry out the estimation of parameters in the frailty models for the Framingham Heart Study. The difference between Nielsen et al. (1992) and Klein (1992) is that Nielsen et al. used a one-dimensional search of the profile likelihood while Klein carried out the complete implementation of the EM algorithm. Because a common frailty cannot capture the between-subject dependence among married couples and siblings concurrently, Klein used one frailty model to incorporate the dependence induced by married couples and another frailty model to accommodate the dependence induced by siblings. However, only one endpoint of interest (death) was considered in his frailty model.
Hougaard (1986a) presented positive stable distributions for frailty W, both
with arbitrary and with Weibull individual hazards. The Weibull model is mathematically interesting because it has Weibull marginal distributions and the time to
the first failure in a cluster also follows a Weibull distribution. The Fisher information of the bivariate Weibull was found by Oakes and Manatunga (1992). The
frailty model with accelerated hazards for bivariate failure time data was proposed
by Anderson and Louis (1995). They presented both parametric and semiparametric
techniques for parameter estimation. However, the proof of the asymptotic properties
of the methods remains elusive.
Chang and Hsiung (1995) proposed proportional hazards models with time-dependent frailties. A regular efficient estimator for the relative risk parameter is obtained. They showed that this estimator is asymptotically normal and asymptotically
efficient.
All these methods mentioned so far impose specific structures of dependence
among the multivariate failure data either explicitly or implicitly. If the dependence
structure is misspecified, the estimators may not be valid. When the dependence
structure is not the parameter of interest, we can instead consider modeling only the
marginal hazard functions to avoid the specification of the dependence structure.
Marginal Models
The dependence of related failure times complicates the analysis of multivariate failure time data. Because of censoring, this dependence poses a greater challenge than it does for uncensored multivariate data. By using a marginal model approach, we only specify the marginal model for each failure time variable and leave the dependence among failure times unspecified. Therefore, if we are only interested in the marginal regression parameters and treat the dependence among failure times as a nuisance, a
marginal model approach can be used. Specifically, in a marginal model approach for
regression analysis of multivariate failure time data, we use the following two steps:
first, fit each failure time variable using a univariate model, ignoring the possible
dependence among the multivariate failure time variables; then, replace the naive
covariance matrix with a robust covariance matrix estimator to account for possible
dependence among the multivariate failure time variables.
Let $(T_{ij}, C_{ij}, \delta_{ij}, Z_{ij})$, $i = 1, \ldots, n$, $j = 1, \ldots, J$, denote the failure time, censoring time, censoring indicator, and $p_j$-vector of explanatory variables of the jth failure type of event on the ith subject under study, respectively. We observe $(X_{ij}, \delta_{ij}, Z_{ij})$, where $X_{ij} = \min(T_{ij}, C_{ij})$ and $\delta_{ij} = I\{T_{ij} \le C_{ij}\}$, for $i = 1, \ldots, n$ and $j = 1, \ldots, J$. Denote $Y_{ij}(t) = I\{X_{ij} \ge t\}$, $X_i = (X_{i1}, \ldots, X_{iJ})'$, $Z_i = (Z_{i1}, \ldots, Z_{iJ})'$, and $\delta_i = (\delta_{i1}, \ldots, \delta_{iJ})'$.
Example 1.9 (Wei, Lin and Weissfeld (1989)) In the setting of the regression analysis of multivariate failure time observations on the same subject, Wei, Lin, and Weissfeld (WLW) modeled the marginal hazard of each failure time variable $X_{ij}$ with a Cox regression model. The dependence structure among distinct failure times on each subject was left unspecified. The hazard for the jth failure type on the ith subject has the form
$$\lambda_{ij}(t) = \lambda_{0j}(t)\exp\{\beta_j' Z_{ij}(t)\}, \quad t \ge 0,$$
where $\lambda_{0j}(t)$ is an unspecified baseline hazard function for the jth failure type and $\beta_j = (\beta_{1j}, \ldots, \beta_{p_j j})'$ is the failure-specific regression parameter.
Therefore, the jth failure-specific Cox partial likelihood is
$$L_j(\beta_j) = \prod_{i=1}^n\left[\frac{\exp\{\beta_j' Z_{ij}(X_{ij})\}}{\sum_{l \in R_j(X_{ij})}\exp\{\beta_j' Z_{lj}(X_{ij})\}}\right]^{\delta_{ij}},$$
where $R_j(t) = \{l : X_{lj} \ge t\}$ is the set of subjects at risk just prior to time t with respect to the jth type of failure. The regression parameters $\beta_j$ are estimated by maximizing the failure-specific partial likelihoods, that is, by solving the score equations $\partial \log L_j(\beta_j)/\partial\beta_j = 0$, $j = 1, \ldots, J$.
Under the assumption that $(X_i, \delta_i, Z_i(t))$, $i = 1, \ldots, n$, are independent and identically distributed with bounded $Z_{ij}$, Wei et al. (1989) showed that the resulting estimators across all types of failure, $\sqrt{n}(\hat\beta_1' - \beta_1', \ldots, \hat\beta_J' - \beta_J')'$, are asymptotically jointly normal with mean zero and a covariance matrix which can be consistently estimated by the sandwich-type robust covariance estimator.
It is convenient to let the regression parameters $\beta = (\beta_1, \ldots, \beta_p)'$ be the same for all types of failure. This can always be achieved by introducing failure type specific covariates. We show how this can be done for the WLW model.
Let $\beta = (\beta_1, \ldots, \beta_p)' = (\beta_1', \ldots, \beta_J')'$, where $p = \sum_{j=1}^J p_j$. If we introduce failure type specific covariates for the ith subject on the jth failure, $Z^*_{ij} = (0_{p_1}', \ldots, 0_{p_{j-1}}', Z_{ij}', 0_{p_{j+1}}', \ldots, 0_{p_J}')'$, where $0_{p_k}$ is a $p_k$-vector of zeros, that is, $Z^*_{ij}$ consists of stacking together (J - 1) zero vectors corresponding to the other (J - 1) types of failure and $Z_{ij}$ for the jth type of failure, then $\beta' Z^*_{ij}(t) = \beta_j' Z_{ij}(t)$ is the risk score for the jth type of failure of the ith subject. Denote $Y_{ij}(t) = I\{X_{ij} \ge t\}$, where $I\{\cdot\}$ is an indicator function. Then $Y_{ij}(t) = 1$ if subject i is at risk and under observation just prior to time t for experiencing the jth failure. Therefore, the hazard for the jth failure on the ith subject in the WLW approach can be rewritten as
$$\lambda_{ij}(t) = \lambda_{0j}(t)\exp\{\beta' Z^*_{ij}(t)\}, \quad t \ge 0, \tag{1.9}$$
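and the jth failure-specific Cox partial likelihoods as
$$L_j(\beta) = \prod_{i=1}^n\left[\frac{\exp\{\beta' Z^*_{ij}(X_{ij})\}}{\sum_{l=1}^n Y_{lj}(X_{ij})\exp\{\beta' Z^*_{lj}(X_{ij})\}}\right]^{\delta_{ij}}.$$
The corresponding score equations for all n subjects and all J failures are
$$U(\beta) = \sum_{i=1}^n\sum_{j=1}^J \delta_{ij}\left\{Z^*_{ij}(X_{ij}) - \frac{\sum_{l=1}^n Y_{lj}(X_{ij})\exp[\beta' Z^*_{lj}(X_{ij})]\,Z^*_{lj}(X_{ij})}{\sum_{l=1}^n Y_{lj}(X_{ij})\exp[\beta' Z^*_{lj}(X_{ij})]}\right\} = 0.$$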
Hence, the use of β as regression parameters in the failure-specific model does not
preclude the use of failure specific parameters since we can take into account the failure
specific regression parameters through the use of failure type specific covariates.
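The stacking construction can be sketched directly. The dimensions and coefficient values below are hypothetical; the point is that $\beta' Z^*_{ij}$ picks out only the block $\beta_j$.

```python
def stack_covariates(z_ij, j, dims):
    """Build Z*_ij for failure type j (1-based): the p_j-vector z_ij sits in
    block j and every other block k is a p_k-vector of zeros."""
    if len(z_ij) != dims[j - 1]:
        raise ValueError("z_ij must have length p_j")
    stacked = []
    for k, p_k in enumerate(dims, start=1):
        stacked.extend(list(z_ij) if k == j else [0.0] * p_k)
    return stacked

def risk_score(beta, z_star):
    """beta' Z*_ij for the stacked parameter beta = (beta_1', ..., beta_J')'."""
    return sum(b * z for b, z in zip(beta, z_star))

dims = [2, 1, 3]                              # p_1, p_2, p_3 (hypothetical)
beta = [0.5, -1.0, 2.0, 0.1, 0.2, 0.3]        # (beta_1', beta_2', beta_3')'
z_star = stack_covariates([5.0], 2, dims)     # covariate for failure type 2
print(z_star, risk_score(beta, z_star))       # only beta_2 contributes
```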
Notice that if we consider different types of failures as strata, the WLW model stratifies the analysis based on the failure type by using different baseline hazards (cf. (1.3) and (1.9)).
Example 1.10 Lee, Wei and Amato (1992) considered highly stratified data sets which arose from paired eye data on vision loss due to diabetic retinopathy, where photocoagulation was randomly assigned to one eye of each patient. Each pair of eyes on the same subject was treated as a cluster. The marginal hazard for the failure on the jth eye of the ith subject has the form
$$\lambda_{ij}(t) = \lambda_0(t)\exp\{\beta' Z_{ij}(t)\},$$
where $\lambda_0(t)$ is an unspecified common baseline hazard function and $\beta = (\beta_1, \ldots, \beta_p)'$ is the common regression parameter among the J marginal models.
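Under the assumption that $(X_i, \delta_i, Z_i(t))$, $i = 1, \ldots, n$, are independent and identically distributed with bounded $Z_{ij}$, Lee, Wei, and Amato (LWA) estimated the regression parameters β by maximizing the "partial likelihood"
$$L(\beta) = \prod_{i=1}^n\prod_{j=1}^J\left[\frac{\exp\{\beta' Z_{ij}(X_{ij})\}}{\sum_{l=1}^n\sum_{m=1}^J Y_{lm}(X_{ij})\exp\{\beta' Z_{lm}(X_{ij})\}}\right]^{\delta_{ij}}.$$
The corresponding score equations are
$$U(\beta) = \sum_{i=1}^n\sum_{j=1}^J \delta_{ij}\left\{Z_{ij}(X_{ij}) - \frac{\sum_{l=1}^n\sum_{m=1}^J Y_{lm}(X_{ij})\,e^{\beta' Z_{lm}(X_{ij})}\,Z_{lm}(X_{ij})}{\sum_{l=1}^n\sum_{m=1}^J Y_{lm}(X_{ij})\,e^{\beta' Z_{lm}(X_{ij})}}\right\} = 0.$$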
They also showed that the resulting estimators $\sqrt{n}(\hat\beta_1 - \beta_1, \ldots, \hat\beta_p - \beta_p)'$ are asymptotically jointly normal, with zero mean and a covariance matrix which can be consistently estimated by the "sandwich" type covariance estimator.
Consider the failure on different eyes as different failures. As we pointed out previously, the use of common regression parameters does not preclude the use of failure
specific parameters. Therefore, the only difference between LWA and WLW models is
that LWA postulates a common baseline hazard function among the marginal models
whereas WLW allows different baseline hazard functions. In other words, the WLW
model stratifies the analysis based on the failure type while the LWA model does not.
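A minimal sketch of the LWA working-independence fit, assuming a single scalar covariate, follows. It solves the pooled score equation with one common baseline by Newton-Raphson and forms a cluster-robust sandwich variance from Lin-Wei-type score residuals summed within clusters. This is a standard construction, but the code, the function name `lwa_fit`, and the Breslow-style handling of ties are our own illustrative choices, not LWA's published implementation.

```python
import math

def lwa_fit(clusters, iters=30):
    """Working-independence Cox fit with one common baseline hazard (LWA form)
    and a cluster-robust sandwich variance, for a scalar covariate.
    clusters: list of clusters, each a list of (x, delta, z) observations.
    Returns (beta_hat, robust_var, naive_var)."""
    obs = [(c, x, d, z) for c, cl in enumerate(clusters) for x, d, z in cl]

    def sums(beta, t):
        # S0, S1, S2 over the pooled risk set {(l, m): X_lm >= t}
        s0 = s1 = s2 = 0.0
        for _, x, _, z in obs:
            if x >= t:
                w = math.exp(beta * z)
                s0, s1, s2 = s0 + w, s1 + w * z, s2 + w * z * z
        return s0, s1, s2

    def score_info(beta):
        score = info = 0.0
        for _, x, d, z in obs:
            if d:
                s0, s1, s2 = sums(beta, x)
                score += z - s1 / s0
                info += s2 / s0 - (s1 / s0) ** 2
        return score, info

    beta = 0.0
    for _ in range(iters):                    # Newton-Raphson on U(beta) = 0
        score, info = score_info(beta)
        beta += score / info
    _, info = score_info(beta)

    # score residuals W_ij, summed within each cluster
    events = [(x, *sums(beta, x)[:2]) for _, x, d, _ in obs if d]
    w = [0.0] * len(clusters)
    for c, x, d, z in obs:
        r = math.exp(beta * z)
        res = 0.0
        if d:
            s0, s1, _ = sums(beta, x)
            res += z - s1 / s0
        for t, s0, s1 in events:
            if x >= t:                        # Y_ij(t) = 1
                res -= r / s0 * (z - s1 / s0)
        w[c] += res
    robust_var = sum(v * v for v in w) / info ** 2
    return beta, robust_var, 1.0 / info

clusters = [[(2.0, 1, 0.0)], [(3.0, 1, 1.0)], [(1.0, 1, 1.0)],
            [(4.0, 0, 0.0)], [(5.0, 1, 0.0)], [(2.5, 1, 1.0)]]
paired = [cl + cl for cl in clusters]   # duplicate each observation within its cluster
print(lwa_fit(clusters))
print(lwa_fit(paired))
```

Duplicating every observation within its own cluster illustrates why the robust variance matters: the naive variance halves, as if the copies carried new information, while the cluster-robust variance is unchanged.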
Example 1.11 (Liang, Self and Chang (1993)) A different procedure was proposed to estimate the marginal regression parameters for the LWA model in the context of between-subject dependence induced by grouping individuals into clusters. Liang, Self, and Chang (LSC) based their estimators on pairwise comparisons of individuals who have failed and independent individuals who are at risk at the time of failure. Their estimating equation is similar to LWA's with
$$\frac{\sum_{l=1}^n\sum_{m=1}^J Y_{lm}(X_{ij})\exp[\beta' Z_{lm}(X_{ij})]\,Z_{lm}(X_{ij})}{\sum_{l=1}^n\sum_{m=1}^J Y_{lm}(X_{ij})\exp[\beta' Z_{lm}(X_{ij})]}$$
replaced by pairwise comparisons of independent observations. The resulting estimating equation is
$$U(\beta) = \sum_{i=1}^n\sum_{j=1}^J I_{\{n_i(X_{ij}) > 0\}}\,\delta_{ij}\left\{Z_{ij}(X_{ij}) - n_i^{-1}(X_{ij})\sum_{l \ne i}\sum_k e_{ij,lk}(\beta, X_{ij})\right\} = 0,$$
where $n_i(t) = \sum_{l \ne i}\sum_k Y_{lk}(t)$, $I_A$ is the indicator function of the set A, and $e_{ij,lk}(\beta, t)$ is given by
$$e_{ij,lk}(\beta, t) = \frac{Y_{ij}(t)Z_{ij}(t)\exp\{Z_{ij}(t)'\beta\} + Y_{lk}(t)Z_{lk}(t)\exp\{Z_{lk}(t)'\beta\}}{Y_{ij}(t)\exp\{Z_{ij}(t)'\beta\} + Y_{lk}(t)\exp\{Z_{lk}(t)'\beta\}}.$$
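The pairwise weight $e_{ij,lk}$ is a two-observation version of the usual Cox risk-weighted covariate mean. The sketch below assumes scalar covariates already evaluated at the failure time. In `lsc_term` we take the inner sum to run over the $n_i(X_{ij})$ at-risk observations from other clusters ($Y_{lk} = 1$), which is our reading of the normalization by $n_i$; both function names are hypothetical.

```python
import math

def pairwise_mean(beta, y_ij, z_ij, y_lk, z_lk):
    """e_{ij,lk}(beta, t) for scalar covariates evaluated at t; y_* in {0, 1}."""
    a = y_ij * math.exp(beta * z_ij)
    b = y_lk * math.exp(beta * z_lk)
    return (a * z_ij + b * z_lk) / (a + b)

def lsc_term(beta, z_fail, z_others_at_risk):
    """Contribution of one failure (ij): Z_ij minus the average of e_{ij,lk}
    over the n_i observations from other clusters still at risk."""
    n_i = len(z_others_at_risk)
    if n_i == 0:        # the indicator I{n_i(X_ij) > 0} drops this term
        return 0.0
    avg = sum(pairwise_mean(beta, 1, z_fail, 1, z) for z in z_others_at_risk) / n_i
    return z_fail - avg

print(pairwise_mean(0.0, 1, 1.0, 1, 3.0))   # beta = 0: plain two-point average
print(lsc_term(0.5, 1.0, [0.0, 1.0, 0.0]))
```

Each pairwise comparison involves only one failed observation and one independent at-risk observation, so no within-cluster pair ever enters the same comparison.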
The asymptotic distributions of the estimators of the marginal regression parameters are developed by representing the estimating equations as U-statistics. Their estimators are consistent and asymptotically normal. The estimates and their variances from the LWA and LSC procedures are very similar when the data from the Diabetic Retinopathy Study (Lin (1994)) were used. It would be of interest to compare the efficiencies of the LWA robust estimators and the LSC estimators. The LWA procedure is computationally easy and can be performed using standard statistical software (Therneau (1996)). However, no statistical software is available for the LSC procedure.
In the above marginal approaches, the "partial likelihood" score equations were
derived under the working independence assumption. It may be more efficient to
use weighted estimating equations that take into account the nature of dependence
explicitly as in the case of longitudinal data (Liang and Zeger (1986)). In an attempt
to improve the efficiency of marginal parameter estimators, Cai and Prentice (1995)
proposed weighted estimating equations for the estimation of marginal parameters.
They proved the resulting estimators remain consistent and asymptotically normal
with an estimable covariance matrix under mild regularity conditions on the weight
matrices. The weight matrices can be estimated either parametrically or nonparametrically without affecting the asymptotic distribution of the regression parameter
estimate. Cai and Prentice (1995) and Prentice and Cai (1992) suggested the use of
the inverse matrix of estimated correlations among counting process marginal martingales. Their simulation studies (Cai and Prentice (1995)) indicate that the inclusion
of weights in the estimating equations results in efficiency gains only when the dependence among the failure times is strong and the censoring is not heavy. This may
occur because it is difficult to construct optimal weight matrices as a result of the
censoring and the non-linear nature of the Cox model (Lin (1994)).
Prentice and Hsu (1996) proposed using joint estimating equations for hazard
ratio and correlation parameter, which generalized the idea of Zhao and Prentice
(1990) and Prentice and Zhao (1991) to multivariate failure time situations. The
elements of a multivariate failure time variate are assumed to have marginal hazard
functions of the Cox regression form. The pairwise correlations among cumulative
hazard variates provide summary measures of the dependence among failure times
that do not depend on the corresponding marginal distribution shapes. In the absence
of censorship, the mean and the covariance structure of cumulative baseline hazard
variates, in conjunction with standard baseline hazard function estimators, are used
to develop joint estimating equations for hazard ratio and cumulative hazard variate
correlation parameters. Additional assumptions are required to generalize these estimating equations to allow independent right censorship and time-varying covariates.
Semiparametric models are introduced for the joint distribution of pairs of cumulative hazard variates, with emphasis on the special case of the Clayton model. These
authors argued that the cumulative hazard correlation estimates enjoy some robustness to departures from the assumed semiparametric model forms under conditions
of light censorship, and that the corresponding Clayton model parameter estimates
have a useful interpretation under heavy censorship. They showed that estimators
of hazard ratio and cumulative hazard variate correlation parameters are consistent
and asymptotically normal and that marginal hazard ratio parameters are generally
consistently estimated even if other distributional assumptions do not hold. An attractive feature of these procedures is that it is unnecessary to make assumptions
concerning the joint distribution of failure times, beyond marginal hazard models for
failure times, the mean and the covariance structure of cumulative baseline hazard
variates, and semiparametric models for the joint distribution of pairs of cumulative
hazard variates. However, the computation is cumbersome and the cumulative hazard
correlation estimates depend on the assumed semiparametric model forms.
Instead of using the semiparametric Cox regression model as a marginal
model, Huster, Brookmeyer and Self (1989) used a fully parametric specification
of the marginal hazard function under the independence working assumption in their
marginal approach for bivariate failure time data. They showed that the maximum
likelihood estimates of regression parameters under the independence working assumption are consistent with respect to a specified marginal distribution and that their covariance matrix can be consistently estimated by the robust covariance estimate they derived.
Lin and Wei (1992) and Lee, Wei and Ying (1993) applied the methods of WLW and
LWA to the case of accelerated failure time models, respectively.
Concluding Remarks
A fundamental consideration in choosing a strategy for the analysis of multivariate
failure data is whether the dependence among multivariate failure time data is a
nuisance or a parameter of intrinsic scientific interest. If dependence is a nuisance, a
marginal approach should be used to avoid imposing a special structure on dependence
and to account for dependence at the same time. When dependence is of interest in
its own right, however, other approaches which explicitly model dependence have
to be used. Usually, frailty models are used when between-subject dependence is
considered.
1.2.3 Model Misspecification and Robust Inferences
Assume that $X_1, \ldots, X_n$ are independent and identically distributed (iid) random variables with specified probability density function $f(x; \theta)$. It is well known that, under some regularity conditions, $I^{1/2}(\theta)(\hat\theta - \theta)$ converges in distribution to the standard normal N(0, 1) probability law, where $\hat\theta$ is the maximum likelihood estimator (MLE) of θ and $I(\theta) = -\sum_{i=1}^n E[\partial^2 \log f(\theta; X_i)/\partial\theta\,\partial\theta']$ is the (expected) Fisher information (Cramér (1946)). Efron and Hinkley (1978) suggested that $I(\theta)$ should be replaced by the "observed information", $\hat I(\hat\theta) = -\sum_{i=1}^n \partial^2 \log f(\theta; x_i)/\partial\theta\,\partial\theta'\,\big|_{\theta=\hat\theta}$, to provide inferences which are properly conditioned on ancillary statistics.
In the parametric setting, a number of techniques have been suggested for handling misspecified models for independently identically distributed random variables,
e.g., Huber (1967), White (1982), Kent (1982), Royall (1986), and Gail, Tan and
Piantadosi (1988).
Example 1.12 Huber (1967) discussed the behavior of any estimator $\hat\theta$ obtained as the solution to an estimating equation
$$\sum_{i=1}^n \psi(\hat\theta; X_i) = 0.$$
So, the score equation
$$U(\hat\theta) = \sum_{i=1}^n \frac{\partial}{\partial\theta}\log f(\theta; X_i)\Big|_{\theta=\hat\theta} = 0$$
is a special case of the estimating equation, with $\psi(\hat\theta; X_i) = \partial\log f(\theta; X_i)/\partial\theta\,\big|_{\theta=\hat\theta}$. In particular, Huber was interested in the asymptotic distribution of the MLE $\hat\theta$ for the true parameter $\theta_0$ under model misspecification: the true distribution of the underlying observations is h(·) but it is misspecified as f(·). Huber showed that, under some mild regularity conditions, the MLE $\hat\theta$ converges to a well-defined constant vector $\theta^*$ and $\sqrt{n}(\hat\theta - \theta^*)$ is asymptotically normal with mean vector zero and covariance matrix $I(\theta^*)^{-1}C(\theta^*)I(\theta^*)^{-1}$, where $\theta^*$ satisfies
$$E[\psi(\theta^*; X)] = E\left[\frac{\partial}{\partial\theta}\log f(\theta; X)\right]\Big|_{\theta=\theta^*} = 0.$$
Huber did not discuss the information-theoretic interpretation of the parameter vector $\theta^*$. This interpretation was emphasized by Akaike (1973). He noted that when the true distribution is unknown, the MLE $\hat\theta$ is a natural estimator for $\theta^*$, the parameter vector which minimizes the Kullback-Leibler (1951) information criterion (KLIC),
$$IC(h : f; \theta) = E\left[\log\frac{h(X)}{f(X; \theta)}\right].$$
Here the expectations are taken with respect to the true distribution. The negative of $IC(h : f; \theta)$ is called the entropy of the true distribution H(x) with respect to the working distribution $F(x; \theta)$, where H(·) and F(·) are distribution functions with measurable Radon-Nikodym densities
$$h(x) = \frac{dH(x)}{d\nu} \quad\text{and}\quad f(x; \theta) = \frac{dF(x; \theta)}{d\nu}.$$
Intuitively, $IC(h : f; \theta)$ measures our ignorance about the true distribution. Hence, we might call $\hat\theta$ the 'minimum ignorance' estimator.
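As a numerical illustration of the KLIC minimizer, suppose (our assumption) that the working model is exponential, $f(x; \theta) = \theta e^{-\theta x}$, while the true h is a Gamma(2, 1) law. Only the term $E_h[\log f(X; \theta)] = \log\theta - \theta E_h[X]$ depends on θ, so $\theta^* = 1/E_h[X] = 0.5$. The sketch maximizes a Monte Carlo version of that term over a grid and compares it with the MLE $1/\bar x$; the sample size, seed, and grid are arbitrary.

```python
import math
import random

def avg_log_f(theta, xs):
    """Sample version of E_h[log f(X; theta)] for the exponential working model.
    IC(h : f; theta) = E_h[log h(X)] - E_h[log f(X; theta)], so minimizing the
    KLIC over theta is the same as maximizing this quantity."""
    return sum(math.log(theta) - theta * x for x in xs) / len(xs)

random.seed(2024)
xs = [random.gammavariate(2.0, 1.0) for _ in range(5000)]   # true h: Gamma(shape 2, scale 1)
grid = [0.1 + 0.005 * k for k in range(281)]                # theta in [0.1, 1.5]
theta_grid = max(grid, key=lambda th: avg_log_f(th, xs))
theta_mle = len(xs) / sum(xs)                                # MLE under the working model
print(theta_grid, theta_mle)   # both close to theta* = 1 / E_h[X] = 0.5
```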
Example 1.13 (White (1982)) In the setting of consequence and detection of model misspecification when using maximum likelihood techniques of estimation and inference, White exploited the properties of the information matrix to yield useful tests for model misspecification and rediscovered Huber's results. Let
$$I_n(\theta) = -\frac{1}{n}\sum_{i=1}^n \frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(\theta; X_i),$$
$$C_n(\theta) = \frac{1}{n}\sum_{i=1}^n \left[\frac{\partial}{\partial\theta}\log f(\theta; X_i)\right]\left[\frac{\partial}{\partial\theta}\log f(\theta; X_i)\right]',$$
$$\mathcal{I}(\theta) = -E\left[\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(\theta; X)\right],$$
and
$$C(\theta) = E\left\{\left[\frac{\partial}{\partial\theta}\log f(\theta; X)\right]\left[\frac{\partial}{\partial\theta}\log f(\theta; X)\right]'\right\}.$$
Under some regularity conditions, White showed that the sandwich estimator $I_n^{-1}(\hat\theta)\,C_n(\hat\theta)\,I_n^{-1}(\hat\theta)$ converges to $\mathcal{I}^{-1}(\theta^*)\,C(\theta^*)\,\mathcal{I}^{-1}(\theta^*)$ almost surely, element by element. Note that the Fisher information $\mathcal{I}(\theta)$ should be consistently estimated by either the score derivative $I_n(\hat\theta)$ or the squared score matrix $C_n(\hat\theta)$ under the assumed model $f(\theta; x_i)$, provided that some regularity conditions are satisfied. A significant difference between $I_n(\hat\theta)$ and $C_n(\hat\theta)$ indicates that the assumed model $f(\theta; x_i)$ is incorrect. White suggested using the test statistic $n^{1/2}\{I_n(\hat\theta) - C_n(\hat\theta)\}$ to detect the model misspecification. The asymptotic normality of the test statistic follows easily from a Taylor series expansion and the Lindeberg-Lévy central limit theorem.
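A scalar sketch of White's quantities, assuming (for illustration only) an exponential working model $f(x; \theta) = \theta e^{-\theta x}$. Here the score is $1/\theta - x$ and the second derivative is $-1/\theta^2$, so at $\hat\theta = 1/\bar x$ we get $I_n = \bar x^2$ and $C_n = n^{-1}\sum(\bar x - x_i)^2$. The code reports $C_n/I_n$ and the raw contrast $n^{1/2}(I_n - C_n)$; it does not compute White's standardized test, which would also need a variance estimate for the contrast.

```python
import random

def white_quantities(xs):
    """Exponential working model: returns (theta_hat, I_n, C_n, naive, sandwich)."""
    n = len(xs)
    theta = n / sum(xs)                                  # MLE, 1 / xbar
    i_n = 1.0 / theta ** 2                               # -(1/n) sum of second derivatives
    c_n = sum((1.0 / theta - x) ** 2 for x in xs) / n    # mean of squared scores
    naive = 1.0 / i_n                                    # model-based var of sqrt(n)(theta_hat - theta*)
    sandwich = c_n / i_n ** 2                            # I_n^{-1} C_n I_n^{-1}, scalar case
    return theta, i_n, c_n, naive, sandwich

random.seed(7)
correct = [random.expovariate(1.0) for _ in range(20000)]      # working model is right
wrong = [random.gammavariate(4.0, 1.0) for _ in range(20000)]  # working model is wrong
for name, xs in (("exponential", correct), ("gamma(4)", wrong)):
    theta, i_n, c_n, naive, sandwich = white_quantities(xs)
    contrast = len(xs) ** 0.5 * (i_n - c_n)
    print(name, round(c_n / i_n, 3), round(contrast, 1), round(naive, 4), round(sandwich, 4))
```

For exponential data $C_n/I_n$ stays near 1 and the naive and sandwich variances agree; for the gamma data the ratio is near 0.25 and the naive variance is far too large.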
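Now, let us see how the sandwich-type robust covariance matrix is estimated when the true distribution is $h(\cdot) = h(x; \theta_0)$ but is incorrectly specified as $f(x; \theta)$. First, we consider maximum likelihood estimation for independent and identically distributed random variables. Then
$$U(\theta) = \sum_{i=1}^n \psi(\theta; X_i) = \sum_{i=1}^n \frac{\partial}{\partial\theta}\log f(\theta; X_i) = \sum_{i=1}^n u_i(\theta).$$
By interchanging the order of the expectation and the derivative,
$$I(\theta) = -\frac{\partial}{\partial\theta'}E[U(\theta)] = -E\left[\frac{\partial}{\partial\theta'}U(\theta)\right] = -\sum_{i=1}^n E\left[\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(\theta; X_i)\right],$$
where the expectation is taken with respect to the true distribution. Hence, $I(\theta_0)$ is the expected value of the information matrix, which is naturally estimated by the observed information
$$\hat I_n(\hat\theta) = -\sum_{i=1}^n \frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(\theta; x_i)\Big|_{\theta=\hat\theta}.$$
Since $E[U(\theta_0)] = 0$,
$$C(\theta_0) = \mathrm{Var}[U(\theta_0)] = \sum_{i=1}^n E[u_i(\theta_0)u_i'(\theta_0)] + \sum_{i \ne j} E[u_i(\theta_0)u_j'(\theta_0)] = \sum_{i=1}^n E[u_i(\theta_0)u_i'(\theta_0)]. \tag{1.10}$$
Note that the observations are independent; thus the $u_i(\theta_0)$ are also independent and the cross terms in (1.10) are zero. In view of (1.10), a natural estimator of $C(\theta_0)$ is
$$\hat C_n(\hat\theta) = \sum_{i=1}^n u_i(\hat\theta)\,u_i'(\hat\theta).$$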
When the observations are not independent, the estimator of $C(\theta_0)$ must be adjusted accordingly. A reasonable estimate is available when the correlation is confined to clusters. We assume that the data come from clustered sampling with $j = 1, \ldots, k$ clusters, where there may be correlation within clusters but observations from different clusters are independent. By (1.10), the cross-product terms between clusters can be eliminated if we define
$$C(\theta_0) = \sum_{j=1}^k E[U_j(\theta_0)\,U_j'(\theta_0)],$$
where $U_j = \sum_{i=1}^{n_j} u_{ij}$ is the sum over all $n_j$ subjects in the jth cluster. An estimator of $C(\theta_0)$ is
$$\hat C_n(\hat\theta) = \sum_{j=1}^k U_j(\hat\theta)\,U_j'(\hat\theta).$$
This leads to the modified sandwich estimator $\hat I_n^{-1}(\hat\theta)\,\hat C_n(\hat\theta)\,\hat I_n^{-1}(\hat\theta)$.
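The effect of summing scores within clusters is easiest to see for the simplest possible case: estimating a mean under a working model $x \sim N(\mu, 1)$ with independent observations, so that $u_i = x_i - \mu$ and $I = n$. This toy model and the function name are our own choices for illustration.

```python
def mean_sandwich(clusters):
    """Working model x ~ N(mu, 1), independent, so u_i = x_i - mu and I = n.
    Returns (mu_hat, independence sandwich variance, cluster sandwich variance)."""
    xs = [x for cl in clusters for x in cl]
    n = len(xs)
    mu = sum(xs) / n                                   # solves sum_i (x_i - mu) = 0
    info = float(n)                                    # I = -dU/dmu
    c_indep = sum((x - mu) ** 2 for x in xs)           # sum_i u_i^2 (cross terms dropped)
    c_clust = sum(sum(x - mu for x in cl) ** 2 for cl in clusters)  # sum_j U_j^2
    return mu, c_indep / info ** 2, c_clust / info ** 2

print(mean_sandwich([[1.0, 1.0], [3.0, 3.0]]))   # (2.0, 0.25, 0.5)
print(mean_sandwich([[1.0, 3.0], [1.0, 3.0]]))   # (2.0, 0.25, 0.0)
```

When cluster members move together, the cluster sums $U_j$ are large and the cluster sandwich is larger than the independence version; when scores cancel within clusters, it is smaller.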
The marginal approach can be considered a type of "model misspecification" in terms of the dependence structure. This viewpoint is illustrated by the next example.
Example 1.14 (Liang and Zeger (1986)) Let $Y_i = [Y_{i1}, \ldots, Y_{in}]'$ be the n × 1 vector of outcome values and $X_i = (X_{i1}, \ldots, X_{in})'$ be the n × p matrix of covariate values observed at times $t = 1, \ldots, n$ for the ith subject, $i = 1, \ldots, K$. Assume that the marginal density of $Y_{it}$ is
$$f(y_{it}) = \exp[\{y_{it}\theta_{it} - a(\theta_{it}) + b(y_{it})\}\phi],$$
where $\theta_{it} = h(\eta_{it})$ and $\eta_{it} = X_{it}'\beta$. By ignoring the dependence structure of a subject's observations, they "misspecify" the joint distribution as the product of the marginal distributions. Under such an independence working assumption, the score equations from a likelihood analysis have the form
$$U_I(\beta) = \sum_{i=1}^K X_i'\Delta_i S_i = 0, \tag{1.11}$$
where $\Delta_i = \mathrm{diag}(d\theta_{it}/d\eta_{it})$ is an n × n matrix and $S_i = Y_i - a'(\theta_i)$ is of order n × 1 for the ith subject. The estimator $\hat\beta_I$ is defined as the solution of equation (1.11). Under mild regularity conditions, the resulting estimator $\hat\beta_I$ of β is consistent and $K^{1/2}(\hat\beta_I - \beta)$ is asymptotically multivariate Gaussian as $K \to \infty$ with zero mean and covariance matrix $V_I$ given by
$$V_I = \lim_{K \to \infty} K\left(\sum_{i=1}^K X_i'\Delta_i A_i\Delta_i X_i\right)^{-1}\left(\sum_{i=1}^K X_i'\Delta_i\,\mathrm{Cov}(Y_i)\,\Delta_i X_i\right)\left(\sum_{i=1}^K X_i'\Delta_i A_i\Delta_i X_i\right)^{-1},$$
where the moment calculations for the $Y_i$'s are taken with respect to the true underlying model and $A_i = \mathrm{diag}\{a''(\theta_{it})\}$. The variance of $\hat\beta_I$, $V_I$, can be consistently estimated by
$$\hat V_I = K\,I^{-1}(\hat\beta_I)\left(\left[\sum_{i=1}^K X_i'\Delta_i S_i S_i'\Delta_i X_i\right]_{\beta=\hat\beta_I}\right)I^{-1}(\hat\beta_I),$$
where $I(\beta) = \sum_{i=1}^K X_i'\Delta_i A_i\Delta_i X_i$.
To increase efficiency, Liang and Zeger also took the correlation into account through
a class of weighted estimating equations. The resulting estimators of β remain consistent. Consistent sandwich variance estimates are also available under the weak
assumption that a weighted average of the estimated correlation matrices converges
to a fixed matrix.
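A small instance of (1.11) may help. For a Poisson outcome with log link (the canonical link), $\Delta_i$ is the identity, $A_i = \mathrm{diag}(\mu_{it})$ and φ = 1, so the independence estimating equation is $\sum_i X_i'(Y_i - \mu_i) = 0$. The sketch below assumes an intercept plus one covariate, solves that equation by Newton-Raphson, and returns the model-based covariance $I^{-1}$ together with the robust $I^{-1}(\sum_i X_i'S_iS_i'X_i)I^{-1}$, both on the scale of $\mathrm{Var}(\hat\beta_I)$ (that is, $\hat V_I/K$). The data and helper names are hypothetical.

```python
import math

def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(p, q):
    return [[sum(p[r][k] * q[k][c] for k in range(2)) for c in range(2)] for r in range(2)]

def gee_poisson_independence(subjects, iters=50):
    """subjects: list of (ys, xs) per subject; log E[Y_it] = b0 + b1 * x_it.
    Returns (beta_hat, model_based_cov, robust_cov)."""
    def pieces(beta):
        info = [[0.0, 0.0], [0.0, 0.0]]   # sum_i X_i' A_i X_i
        meat = [[0.0, 0.0], [0.0, 0.0]]   # sum_i X_i' S_i S_i' X_i
        score = [0.0, 0.0]                # U_I(beta) = sum_i X_i' S_i
        for ys, xs in subjects:
            u_i = [0.0, 0.0]
            for y, x in zip(ys, xs):
                row = (1.0, x)
                mu = math.exp(beta[0] + beta[1] * x)
                for r in range(2):
                    u_i[r] += row[r] * (y - mu)
                    for c in range(2):
                        info[r][c] += row[r] * row[c] * mu
            for r in range(2):
                score[r] += u_i[r]
                for c in range(2):
                    meat[r][c] += u_i[r] * u_i[c]   # keeps within-subject cross-products
        return score, info, meat

    beta = [0.0, 0.0]
    for _ in range(iters):                 # Newton-Raphson (= Fisher scoring here)
        score, info, _ = pieces(beta)
        step = inv2(info)
        beta = [beta[r] + step[r][0] * score[0] + step[r][1] * score[1] for r in range(2)]
    _, info, meat = pieces(beta)
    bread = inv2(info)
    return beta, bread, matmul2(matmul2(bread, meat), bread)

subjects = [([2, 3, 1], [0, 1, 0]), ([0, 1, 4], [1, 0, 1]),
            ([5, 2, 3], [0, 1, 1]), ([1, 0, 2], [1, 0, 0])]
beta, naive_cov, robust_cov = gee_poisson_independence(subjects)
print(beta)
print(naive_cov)
print(robust_cov)
```

Because the covariate here is binary, $\hat\beta_I$ reproduces the log of the two group means, while the robust covariance uses subject-level score vectors and therefore reflects any within-subject correlation.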
The consequences of model misspecification for independent and identically distributed (iid) failure time data on statistical inference about the regression parameter β, based on the assumed Cox regression model, have been investigated by several authors, including Gail, Wieand and Piantadosi (1984), Lagakos and Schoenfeld (1984), Solomon
(1984), Struthers and Kalbfleisch (1986), Lagakos (1988), Lin and Wei (1989), and
Anderson and Fleming (1995).
Example 1.15 (Struthers and Kalbfleisch (1986)) Assume that $(X_i, \delta_i, Z_i)$, $i = 1, \ldots, n$, are n independent realizations of $(X, \delta, Z)$, that Z is bounded, and that the support of the failure time T properly contains that of the censoring variable. Let $\lambda(t; Z_i(t)) = \lambda_0(t)\exp(\beta' Z_i(t))$ and $\lambda_i(t)$ be the working hazard model and the true hazard for $(X_i, \delta_i, Z_i)$, respectively. Let
$$S^{(d)}(\beta, t) = \frac{1}{n}\sum_{i=1}^n Y_i(t)\exp(\beta' Z_i(t))\,Z_i(t)^{\otimes d}, \qquad s^{(d)}(t) = E(S^{(d)}(t)),$$
for d = 0, 1, 2, and $s^{(d)}(\beta, t) = E(S^{(d)}(\beta, t))$, where for a column vector a, $a^{\otimes 2}$ refers to the matrix aa', $a^{\otimes 1}$ to the vector a, and $a^{\otimes 0}$ to the scalar 1, and the expectations are taken with respect to the true model of $(X_i, \delta_i, Z_i)$, $i = 1, \ldots, n$. The logarithm of the partial likelihood based on the working hazard model can be expressed as
$$l(\beta) = \sum_{i=1}^n \delta_i[\beta' Z_i(X_i) - \log(S^{(0)}(\beta, X_i))].$$
The corresponding score function is
$$U(\beta) = \sum_{i=1}^n \delta_i\left[Z_i(X_i) - \frac{S^{(1)}(\beta, X_i)}{S^{(0)}(\beta, X_i)}\right].$$
Then the maximum partial likelihood estimator $\hat\beta$, the solution to $U(\beta) = 0$, converges in probability to a constant vector $\beta^*$, the unique solution to the system of equations obtained by setting the limit of $n^{-1}U(\beta)$ equal to zero. The asymptotic distribution of $\hat\beta$ under a possibly misspecified Cox model for iid realizations of $(X, \delta, Z)$ is given in the following example.
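Example 1.16 (Lin and Wei (1989)) In addition to the assumptions in Struthers and Kalbfleisch (1986), assume that $I(\beta^*)$ is positive definite and that other (unspecified) regularity conditions hold. If the maximum partial likelihood estimator $\hat\beta$ is obtained under a possibly misspecified Cox model, the random vector $\sqrt{n}(\hat\beta - \beta^*)$ is asymptotically normal with mean vector zero and with a covariance matrix that can be consistently estimated by $\hat V(\hat\beta)$, where
$$\hat V(\beta) = \hat I^{-1}(\beta)\,\hat C_n(\beta)\,\hat I^{-1}(\beta), \qquad \hat I(\beta) = -\frac{1}{n}\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta'}, \qquad \hat C_n(\beta) = \frac{1}{n}\sum_{i=1}^n W_i(\beta)^{\otimes 2},$$
and
$$W_i(\beta) = \delta_i\left\{Z_i(X_i) - \frac{S^{(1)}(\beta, X_i)}{S^{(0)}(\beta, X_i)}\right\} - \sum_{j=1}^n \frac{\delta_j Y_i(X_j)\exp(\beta' Z_i(X_j))}{n\,S^{(0)}(\beta, X_j)}\left\{Z_i(X_j) - \frac{S^{(1)}(\beta, X_j)}{S^{(0)}(\beta, X_j)}\right\}.$$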
Notice that the "sandwich"-type robust covariance estimator $V(\hat{\beta})$ for the Cox model takes a more complicated form than its parametric counterpart. This is because, in the partial likelihood setting, the score function is no longer a sum of $n$ iid random vectors. In Chapter 6, we will generalize the results of Struthers and Kalbfleisch (1986) and Lin and Wei (1989) from univariate to multivariate failure time data in a marginal model approach.
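To make the sandwich form concrete, here is a minimal numerical sketch of Example 1.16. It assumes untied observed times and time-fixed covariates. The function names, the Newton-Raphson fitting loop, and the simulated data are illustrative choices, not part of the original development. The sketch fits the working Cox model and then assembles $W_i(\hat{\beta})$, $C_n(\hat{\beta})$ and $V(\hat{\beta}) = I^{-1} C_n I^{-1}$ directly from the displayed formulas.

```python
import numpy as np


def cox_pieces(beta, x, z):
    """S^(0), Zbar = S^(1)/S^(0) and S^(2)/S^(0) at every observed time X_i."""
    n = len(x)
    risk = (x[None, :] >= x[:, None]).astype(float)     # risk[i, l] = Y_l(X_i)
    w = np.exp(z @ beta)                                  # exp(beta' Z_l)
    s0 = risk @ w / n
    zbar = risk @ (w[:, None] * z) / n / s0[:, None]
    m2 = np.einsum("il,l,la,lb->iab", risk, w, z, z) / n / s0[:, None, None]
    return s0, zbar, m2, risk, w


def cox_lin_wei(x, delta, z, tol=1e-10, max_iter=100):
    """Maximum partial likelihood fit plus the Lin-Wei sandwich V = I^{-1} C_n I^{-1}.

    Assumes no tied times and time-fixed covariates z of shape (n, p).
    """
    x, delta = np.asarray(x, float), np.asarray(delta, float)
    z = np.asarray(z, float).reshape(len(x), -1)
    n, p = z.shape
    beta = np.zeros(p)
    for _ in range(max_iter):                             # Newton-Raphson for U(beta) = 0
        _, zbar, m2, _, _ = cox_pieces(beta, x, z)
        score = delta @ (z - zbar)
        info = np.einsum("i,iab->ab", delta, m2 - np.einsum("ia,ib->iab", zbar, zbar))
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    s0, zbar, m2, risk, w = cox_pieces(beta, x, z)
    i_mat = np.einsum("i,iab->ab", delta, m2 - np.einsum("ia,ib->iab", zbar, zbar)) / n
    c = delta / (n * s0)                                  # delta_j / (n S^(0)(beta, X_j))
    a = risk.T @ c                                        # sum_j c_j Y_i(X_j)
    b = risk.T @ (c[:, None] * zbar)                      # sum_j c_j Y_i(X_j) Zbar(X_j)
    W = delta[:, None] * (z - zbar) - w[:, None] * (a[:, None] * z - b)
    c_n = W.T @ W / n
    i_inv = np.linalg.inv(i_mat)
    return beta, i_inv @ c_n @ i_inv, i_inv, W


# Illustrative data: exponential failure times with hazard exp(0.5 z), independent censoring.
rng = np.random.default_rng(1)
n = 300
z = rng.normal(size=(n, 1))
t = rng.exponential(1.0 / np.exp(0.5 * z[:, 0]))
cens = rng.exponential(2.0, size=n)
x, delta = np.minimum(t, cens), (t <= cens).astype(float)

beta_hat, V, i_inv, W = cox_lin_wei(x, delta, z)
robust_se = np.sqrt(np.diag(V) / n)      # V estimates the covariance of sqrt(n)(beta_hat - beta*)
model_se = np.sqrt(np.diag(i_inv) / n)   # usual inverse-information standard error
print(beta_hat, robust_se, model_se)
```

Summing the $W_i(\hat{\beta})$ reproduces the score $U(\hat{\beta}) = 0$. When the working model is correct, the robust and model-based standard errors are expected to be close.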
Anderson and Fleming (1995) demonstrated the danger of generalizing the intuition gained from analysis of covariance in linear models to the Cox regression model.
Their results show that covariate adjustment in Cox regression models has little effect
on the variance but may significantly improve the accuracy of the treatment effect
estimator.
We point out in passing that in a marginal model approach, we intentionally "misspecify" the joint distribution of dependent responses as the product of the marginal distributions, as if these responses were independent, to avoid specifying the dependence structure. We then use a robust covariance estimator to account for the misspecified dependence structure. This idea can be traced to Huber (1967). The connection between the robust covariance for model misspecification and the marginal approach has not been made explicit in the literature.
1.3 Synopsis of Research
This research develops suitable methods for modeling multivariate failure time data with generalized dependence structure. Here, we consider a generalized dependence structure which consists of both between-subject and within-subject dependence, instead of dealing with only one of the two. We propose using a marginal model approach to analyze multivariate failure time data with generalized dependence structure when scientific interest lies in the effects of covariates on the risk of failures and knowledge of the dependence structure is not available.

In Chapter 2, three types of marginal models in the form of Cox regression models are proposed, based on whether or not the baseline hazards are distinguishable: distinct baseline models, common baseline models, and mixed baseline models. The proposed marginal models generalize the WLW model and the LWA model. Our distinct baseline model is equivalent to the WLW model if we stratify the analysis on both failure types and subjects in a cluster. Similarly, our common baseline model generalizes the LWA model by allowing multiple failures per subject. However, to apply WLW or LWA models we must assume one of two things: a different baseline for each combination of failure type and subject in a family, or an identical baseline for all combinations of failures and subjects in a stratum. Our mixed baseline hazard model offers significantly greater flexibility and applicability for modeling, and enables us to deal with some applied problems that existing methods cannot handle. Inference on the regression parameters for each type of model is based on a pseudo score equation under the independence working assumption, within the framework of generalized estimating equations.
Relying on the theory of multivariate counting processes, stochastic integrals, and local martingales, we prove in Chapter 3 that the estimators for the proposed models are consistent and asymptotically normal. Their robust covariance matrix can be consistently estimated. The mathematical background for the proofs can be found in Gill (1980), Andersen and Gill (1982), Fleming and Harrington (1991), and Andersen et al. (1993). Simulation studies are conducted in Chapter 4 to assess the adequacy of the proposed large-sample approximation for practical sample sizes. The methodology is illustrated in Chapter 5 on data from the Framingham Heart Study. The consequences of marginal model misspecification are discussed in Chapter 6, and the numerical investigations are given in Chapter 7. In Chapter 8, we summarize this doctoral research and propose possible extensions and areas for further research.
Chapter 2
Marginal Modeling and Estimation
2.1 Introduction
In this chapter we describe three types of marginal models proposed for censored
data with generalized dependence structure. In Section 2.2 some basic notation and
definitions are presented. The proposed models are described and the estimation
methods are introduced in Section 2.3. We conclude the chapter with a brief discussion
of the relationship between the proposed marginal models and the WLW and LWA
models and a quick overview of the asymptotic properties of parameter estimators
which are to be developed in Chapter 3.
2.2 Notation and Definitions
Suppose that there are $n$ independent families. In family $i$, there are $J_i$ members. For member $j$ in family $i$, $K_{ij}$ types of failures may occur. We use $(i, j, k)$ to denote the $k$th type of failure on member $j$ in the $i$th family, for $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, J_i$; $k = 1, 2, \ldots, K_{ij}$. Here, "family" (the independent unit) is used as a generic term for a cluster of related observations, and "member" for an individual in a cluster. An example of such a cluster is the failure times of CHD and CVA observed from family members in the Framingham Heart Study.
The data available in regression problems for multivariate failure time data with generalized dependence are observations on the triplet $(X_{ijk}, \delta_{ijk}, Z_{ijk})$ for $(i, j, k)$. Here $X_{ijk} = \min(T_{ijk}, C_{ijk})$ is the minimum of the potential failure and potential censoring time pair $(T_{ijk}, C_{ijk})$ for the $k$th type of failure on member $j$ in family $i$. The indicator of observing failure for $(i, j, k)$ is $\delta_{ijk} = I\{T_{ijk} \le C_{ijk}\}$. The covariates for $(i, j, k)$ are denoted by a $p$-dimensional column vector $Z_{ijk} = (Z_{ijk1}, \ldots, Z_{ijkp})'$, which may be time-varying.

If the number of members in families and/or the number of failure types on members are not equal, set $J = \max_i(J_i)$ and $K = \max_{ij}(K_{ij})$. A missing failure time $T_{ijk}$ or missing covariates $Z_{ijk}$ can be accommodated by setting the corresponding $C_{ijk}$ to zero. This implies that $X_{ijk} = 0$ and $\delta_{ijk} = 0$, since $T_{ijk}$ is positive. Hence, such cases make no contribution to the estimation procedure. In doing so, we implicitly assume that data are missing completely at random in the sense of Rubin (1976). Therefore, without loss of generality, we assume hereafter that there are $K$ types of failure which may be experienced by each member and $J$ members within each family.

In terms of counting process notation, we let $Y_{ijk}(t) = I(X_{ijk} \ge t)$ denote the at-risk indicator process for $(i, j, k)$. We let $N_{ijk}(t) = I(X_{ijk} \le t, \delta_{ijk} = 1)$ denote the counting process which registers failure for $(i, j, k)$. Finally, $\lambda_{ijk}(t)$ and $M_{ijk}(t) = N_{ijk}(t) - \int_0^t Y_{ijk}(u)\lambda_{ijk}(u)\,du$ denote the corresponding marginal hazard and marginal martingale, respectively. The marginal hazard rate of $(i, j, k)$ is defined as
$$\lambda_{ijk}(t; Z_{ijk}(t)) = \lim_{h \downarrow 0}\frac{1}{h}\Pr\{t \le T_{ijk} < t + h \mid T_{ijk} \ge t, Z_{ijk}(t)\},$$
for $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, J$; $k = 1, 2, \ldots, K$.
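The counting-process quantities above, together with the device of setting $C_{ijk} = 0$ for a missing cell, can be sketched in a few lines. This is an illustrative sketch under our own assumptions. Data are stored as arrays indexed by (family $i$, member $j$, failure type $k$), and the function names `observe` and `counting_processes` are hypothetical.

```python
import numpy as np


def observe(t_fail, c_cens, missing):
    """Return X_ijk = min(T_ijk, C_ijk) and delta_ijk = I(T_ijk <= C_ijk).

    Missing cells get C_ijk = 0, so X_ijk = 0 and delta_ijk = 0 (T_ijk > 0).
    """
    c = np.where(missing, 0.0, c_cens)
    t = np.where(missing, 1.0, t_fail)   # any positive placeholder for an unobserved T_ijk
    return np.minimum(t, c), (t <= c).astype(int)


def counting_processes(x, delta, grid):
    """Evaluate Y_ijk(t) = I(X_ijk >= t) and N_ijk(t) = I(X_ijk <= t, delta_ijk = 1) on a grid."""
    t = np.asarray(grid, float)[:, None, None, None]
    y = (x[None] >= t).astype(int)
    n_proc = ((x[None] <= t) & (delta[None] == 1)).astype(int)
    return y, n_proc


# n = 2 families, J = 2 members, K = 2 failure types; cell (0, 1, 1) is missing.
t_fail = np.array([[[2.0, 5.0], [3.0, np.nan]], [[1.0, 4.0], [6.0, 2.5]]])
c_cens = np.array([[[4.0, 4.0], [4.0, 4.0]], [[3.0, 3.0], [7.0, 7.0]]])
x, delta = observe(t_fail, c_cens, np.isnan(t_fail))
y, n_proc = counting_processes(x, delta, grid=[0.5, 2.0, 3.5])
print(y.sum(axis=(1, 2, 3)))       # at-risk counts at each grid time: [7 6 2]
print(n_proc.sum(axis=(1, 2, 3)))  # cumulative observed failures:     [0 2 4]
```

Because a missing cell has $X_{ijk} = 0$, it is never at risk at any $t > 0$ and never registers a failure. This is why it drops out of every risk set and score.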
As in most failure time data analyses, we assume both independent right censorship and a noninformative censoring scheme. Independent censoring would hold, for example, if for each $(i, j, k)$ the failure time $T_{ijk}$ and the censoring time $C_{ijk}$ are statistically independent conditional on the covariates $Z_{ijk}$, for $i = 1, \ldots, n$; $j = 1, \ldots, J$; $k = 1, \ldots, K$. It is presumed that $J$ and $K$ are small relative to $n$, the number of independent families. We also assume that the number of distinct events, $K$, is fixed for each subject and that the number of subjects, $J$, is fixed in each family; that is, $J$ and $K$ are not functions of $n$. Because we are studying the occurrence of random events in time, all results are presented on a continuous time interval $\mathcal{T} = [0, \tau)$ for a given terminal time $\tau$, $0 < \tau \le \infty$.
2.3 Marginal Hazard Models and Estimation
We propose the following three types of marginal hazard models: the distinct baseline hazard model, the mixed baseline hazard model, and the common baseline hazard model. All of these models take the form of Cox regression models.
2.3.1 Distinct Baseline Hazard Models
The marginal distinct baseline hazard model for the $k$th type of failure on member $j$ in family $i$ is
$$\lambda_{ijk}(t; Z(t)) = \lambda_{0jk}(t)\exp\{\beta' Z_{ijk}(t)\}, \qquad t \in [0, \tau), \qquad (2.1)$$
for $i = 1, \ldots, n$; $j = 1, \ldots, J$; and $k = 1, \ldots, K$. Here $\lambda_{0jk}(t)$ is an unspecified, yet fixed, nonnegative baseline hazard function, and $\beta = (\beta_1, \ldots, \beta_p)'$ is a fixed column vector of $p$ regression parameters. We use $\beta$ to denote the true value of the regression parameter as well as a generic argument. Where it is necessary to make the distinction, $\beta_0$ denotes the true parameter value. Note that assuming a common $\beta_0$ for all $J$ members and $K$ types of failure does not forfeit the generality of model (2.1). Member-specific or failure-type-specific values of $\beta_0$ can be obtained by using member-specific or failure-type-specific covariates, respectively.
In this model, a different baseline hazard $\lambda_{0jk}(t)$ is assumed for each member and for each type of failure in a family. The distinct baseline model is useful when the failure events are of different types and the members of a family have heterogeneous susceptibilities to the same type of failure.

Take the Framingham Heart Study as an example. Suppose that we are interested in the effect of some risk factors on the time to myocardial infarction and the time to cancer among husbands and wives. The incidence and prevalence of myocardial infarction differ from those of cancer, and husbands and wives are likely to have dissimilar physiological resistance to myocardial infarction or to cancer. It is therefore reasonable to assume different baseline hazard functions for myocardial infarction and for cancer. It is equally reasonable to assume different baseline hazard functions for the husband and the wife from the same family. Hence, the distinct baseline marginal model (2.1) should be considered.
The marginal partial likelihood for the $k$th type of failure on the $j$th member among the $n$ independent families is
$$PL_{jk}(\beta) = \prod_{i=1}^{n}\left[\frac{\exp\{\beta' Z_{ijk}(X_{ijk})\}}{\sum_{l \in R_{jk}(X_{ijk})}\exp\{\beta' Z_{ljk}(X_{ijk})\}}\right]^{\delta_{ijk}}, \qquad j = 1, \ldots, J, \quad k = 1, \ldots, K,$$
where $R_{jk}(t) = \{l : X_{ljk} \ge t\}$, that is, the set of families at risk just prior to time $t$ with respect to the $k$th type of failure on member $j$. If the $T_{ijk}$ were statistically independent (that is, under a working independence assumption), the (pseudo) partial likelihood would be
$$PL(\beta) = \prod_{k=1}^{K}\prod_{j=1}^{J}\prod_{i=1}^{n}\left[\frac{\exp\{\beta' Z_{ijk}(X_{ijk})\}}{\sum_{l \in R_{jk}(X_{ijk})}\exp\{\beta' Z_{ljk}(X_{ijk})\}}\right]^{\delta_{ijk}}. \qquad (2.2)$$
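When the pseudo partial likelihood in (2.2) is formulated by means of counting processes, it has the form
$$PL(\beta) = \prod_{k=1}^{K}\prod_{j=1}^{J}\prod_{i=1}^{n}\prod_{u \ge 0}\left\{\frac{Y_{ijk}(u)\exp[\beta' Z_{ijk}(u)]}{\sum_{l=1}^{n} Y_{ljk}(u)\exp[\beta' Z_{ljk}(u)]}\right\}^{dN_{ijk}(u)},$$
where $dN_{ijk}(t) = N_{ijk}(t) - N_{ijk}(t-)$. Let $l(\beta, t)$ be the logarithm of $PL(\beta)$ at time $t$, that is,
$$l(\beta, t) = \log PL(\beta, t),$$
for which the pseudo partial likelihood score processes are
$$U(\beta, t) = \frac{\partial l(\beta, t)}{\partial \beta} = \sum_{k=1}^{K}\sum_{j=1}^{J}\sum_{i=1}^{n}\int_0^t\left\{Z_{ijk}(u) - \frac{\sum_{l=1}^{n} Y_{ljk}(u) Z_{ljk}(u)\exp[\beta' Z_{ljk}(u)]}{\sum_{l=1}^{n} Y_{ljk}(u)\exp[\beta' Z_{ljk}(u)]}\right\} dN_{ijk}(u). \qquad (2.3)$$
Write $\bar{Z}_{jk}(\beta, u)$ for the ratio of sums in the integrand of (2.3). We now show that, at the true value $\beta = \beta_0$, (2.3) is equal to
$$U(\beta_0, t) = \sum_{k=1}^{K}\sum_{j=1}^{J}\sum_{i=1}^{n}\int_0^t\left\{Z_{ijk}(u) - \bar{Z}_{jk}(\beta_0, u)\right\} dM_{ijk}(u). \qquad (2.4)$$
Notice that, since $dN_{ijk}(u) = dM_{ijk}(u) + Y_{ijk}(u)\lambda_{ijk}(u)\,du$,
$$\sum_{i=1}^{n}\int_0^t\{Z_{ijk}(u) - \bar{Z}_{jk}(\beta_0, u)\}\,dN_{ijk}(u) = \sum_{i=1}^{n}\int_0^t\{Z_{ijk}(u) - \bar{Z}_{jk}(\beta_0, u)\}\,dM_{ijk}(u) + \int_0^t\left\{\sum_{l=1}^{n} Z_{ljk}(u)Y_{ljk}(u)\lambda_{ljk}(u) - \bar{Z}_{jk}(\beta_0, u)\sum_{l=1}^{n} Y_{ljk}(u)\lambda_{ljk}(u)\right\}du. \qquad (2.5)$$
Under model (2.1), $\lambda_{ljk}(u) = \lambda_{0jk}(u)\exp\{\beta_0' Z_{ljk}(u)\}$. Therefore $\bar{Z}_{jk}(\beta_0, u)\sum_{l} Y_{ljk}(u)\lambda_{ljk}(u) = \sum_{l} Z_{ljk}(u)Y_{ljk}(u)\lambda_{ljk}(u)$, and the last term in (2.5) is 0. Thus, (2.3) is equal to (2.4).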
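As an illustration of (2.2) and (2.3), here is a minimal sketch of the distinct baseline pseudo log partial likelihood and its score. It assumes time-fixed covariates, no tied times within a stratum, and arrays indexed as (family, member, failure type). The names `distinct_pl` and `newton`, and the simulated data (exponential failure times that are independent given covariates), are hypothetical. The finite-difference comparison in the final lines checks numerically that the score is the gradient of the log pseudo partial likelihood.

```python
import numpy as np


def distinct_pl(beta, x, delta, z):
    """log PL(beta) of (2.2), the score U(beta, tau) of (2.3) and -dU/dbeta.

    x, delta: (n, J, K); z: (n, J, K, p).  Each (j, k) has its own risk set R_jk(t).
    """
    n, J, K, p = z.shape
    loglik, u, info = 0.0, np.zeros(p), np.zeros((p, p))
    for j in range(J):
        for k in range(K):
            xs, ds, zs = x[:, j, k], delta[:, j, k], z[:, j, k]
            eta = zs @ beta
            w = np.exp(eta)
            for i in np.flatnonzero(ds == 1):
                r = xs >= xs[i]                       # R_jk(X_ijk): families still at risk
                s0 = w[r].sum()
                zbar = w[r] @ zs[r] / s0
                loglik += eta[i] - np.log(s0)
                u += zs[i] - zbar
                info += (zs[r].T * w[r]) @ zs[r] / s0 - np.outer(zbar, zbar)
    return loglik, u, info


def newton(fun, beta, *args, tol=1e-10, max_iter=50):
    """Solve U(beta) = 0 by Newton-Raphson, given fun returning (loglik, U, -dU/dbeta)."""
    for _ in range(max_iter):
        _, u, info = fun(beta, *args)
        step = np.linalg.solve(info, u)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(3)
n, J, K, beta_true = 200, 2, 2, 0.7
z = rng.normal(size=(n, J, K, 1))
lam0 = np.array([[0.5, 1.0], [1.5, 2.0]])            # distinct constant baselines lambda_0jk
t = rng.exponential(1.0 / (lam0 * np.exp(beta_true * z[..., 0])))
c = rng.exponential(1.0, size=(n, J, K))
x, delta = np.minimum(t, c), (t <= c).astype(int)

beta_d = newton(distinct_pl, np.zeros(1), x, delta, z)
b, h = np.array([0.3]), 1e-5
fd = (distinct_pl(b + h, x, delta, z)[0] - distinct_pl(b - h, x, delta, z)[0]) / (2 * h)
print(beta_d, fd, distinct_pl(b, x, delta, z)[1])
```

Because each $(j, k)$ cell has its own risk set, this is the same computation as a Cox model stratified on member and failure type, with all strata sharing $\beta$.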
The maximum partial likelihood estimator $\hat{\beta}_d$ for $\beta_0$ is defined as the solution to the pseudo partial likelihood score equations
$$U(\beta) = \frac{\partial \log PL(\beta, \tau)}{\partial \beta} = 0. \qquad (2.6)$$
2.3.2 Common Baseline Hazard Models
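The marginal common baseline hazard model for the $k$th type of failure on member $j$ in family $i$ is
$$\lambda_{ijk}(t; Z(t)) = \lambda_0(t)\exp\{\beta' Z_{ijk}(t)\}, \qquad t \in [0, \tau), \qquad (2.7)$$
for $i = 1, \ldots, n$; $j = 1, \ldots, J$; and $k = 1, \ldots, K$. Here, an identical baseline hazard function $\lambda_0(t)$ is assumed for all types of failure and all members in a family.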
A common baseline hazard model is appropriate when failures are of the same type and members of a given family have similar susceptibilities to that type of failure. For example, consider a study on vision loss caused by diabetic retinopathy involving siblings, where the vision loss of each eye is treated as one type of failure. There is no good evidence for different susceptibilities to vision loss among siblings. There is also no biological reason to consider one eye more or less susceptible than the other. Therefore, the common baseline hazard model is the natural choice.
Under a working independence assumption, the pseudo partial likelihood for the common baseline model (2.7) is
$$PL(\beta) = \prod_{k=1}^{K}\prod_{j=1}^{J}\prod_{i=1}^{n}\left[\frac{\exp\{\beta' Z_{ijk}(X_{ijk})\}}{\sum_{(l,f,g) \in R(X_{ijk})}\exp\{\beta' Z_{lfg}(X_{ijk})\}}\right]^{\delta_{ijk}},$$
where $R(t) = \{(l, f, g) : X_{lfg} \ge t\}$, i.e., the set of family, member, and failure type combinations at risk just prior to time $t$. The maximum partial likelihood estimator $\hat{\beta}_c$ for $\beta_0$ is defined as the solution to the pseudo partial likelihood score equations
$$\sum_{k=1}^{K}\sum_{j=1}^{J}\sum_{i=1}^{n}\int_0^{\tau}\left\{Z_{ijk}(u) - \frac{\sum_{g=1}^{K}\sum_{f=1}^{J}\sum_{l=1}^{n} Y_{lfg}(u) Z_{lfg}(u)\exp[\beta' Z_{lfg}(u)]}{\sum_{g=1}^{K}\sum_{f=1}^{J}\sum_{l=1}^{n} Y_{lfg}(u)\exp[\beta' Z_{lfg}(u)]}\right\} dM_{ijk}(u) = 0. \qquad (2.8)$$
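For the common baseline model, every $(l, f, g)$ combination shares one risk set $R(t)$. Consequently $\hat{\beta}_c$ can be computed as an ordinary Cox fit on the $nJK$ stacked records. The sketch below assumes time-fixed covariates and untied times. It writes the score of (2.8) with $dN_{ijk}$ in place of $dM_{ijk}$, as in (2.3). The names and the simulated data are illustrative. Risk sets are vectorized as a matrix, which is convenient for small examples but uses memory of order $(nJK)^2$.

```python
import numpy as np


def common_pl(beta, x, delta, z):
    """log PL, score and information for the common baseline model (2.7).

    All records are stacked, so R(t) = {(l, f, g) : X_lfg >= t} is a single risk set.
    """
    xs, ds = x.ravel(), delta.ravel()
    zs = z.reshape(-1, z.shape[-1])
    w = np.exp(zs @ beta)
    ev = np.flatnonzero(ds == 1)
    risk = (xs[None, :] >= xs[ev, None]).astype(float)    # risk[e, r] = Y_r(X_e)
    s0 = risk @ w
    zbar = risk @ (w[:, None] * zs) / s0[:, None]
    m2 = np.einsum("er,r,ra,rb->eab", risk, w, zs, zs) / s0[:, None, None]
    loglik = float(np.sum(zs[ev] @ beta - np.log(s0)))
    u = np.sum(zs[ev] - zbar, axis=0)
    info = np.sum(m2 - np.einsum("ea,eb->eab", zbar, zbar), axis=0)
    return loglik, u, info


def newton(fun, beta, *args, tol=1e-10, max_iter=50):
    """Solve U(beta) = 0 by Newton-Raphson, given fun returning (loglik, U, -dU/dbeta)."""
    for _ in range(max_iter):
        _, u, info = fun(beta, *args)
        step = np.linalg.solve(info, u)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(4)
n, J, K, beta_true = 150, 2, 2, 0.7
z = rng.normal(size=(n, J, K, 1))
t = rng.exponential(1.0 / np.exp(beta_true * z[..., 0]))   # common baseline lambda_0(t) = 1
c = rng.exponential(1.0, size=(n, J, K))
x, delta = np.minimum(t, c), (t <= c).astype(int)

beta_c = newton(common_pl, np.zeros(1), x, delta, z)
print(beta_c)
```

The only difference from the distinct baseline computation is the risk set. Here, failures of any member and any type compete with one another.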
2.3.3 Mixed Baseline Hazard Models
The mixed baseline hazard model lies between the distinct baseline hazard model
and the common baseline hazard model. It is useful when, for example, the baseline
hazards are the same for all members, but are heterogeneous for different failure types.
In more general cases, the mixed baseline hazard model may have an identical baseline
for some members of a family but different baselines for the rest of the members in
that family and/or the same baseline for some types of failures but different baselines
for other types of failures. We refer to the model as a mixed baseline hazard model if the baseline hazard function is identical for some combinations of members and failure types but different for other combinations.
Again, take the Framingham Heart Study as an example. Suppose that we are interested in studying the hazard rates for myocardial infarction and cancer among siblings. For each type of failure, myocardial infarction or cancer, we need different baselines for husbands, wives, and siblings to account for their different physiological resistance. However, because of their similar susceptibilities to the same type of failure, an identical baseline is appropriate for all siblings. Hence, a mixed baseline hazard model should be considered.
We give here two mixed baseline hazard models. The first has a different baseline for each member and an identical baseline for all failure types:
$$\lambda_{ijk}(t; Z(t)) = \lambda_{0j}(t)\exp\{\beta' Z_{ijk}(t)\}, \qquad t \in [0, \tau). \qquad (2.9)$$
The second has an identical baseline for all members and a different baseline for each type of failure:
$$\lambda_{ijk}(t; Z(t)) = \lambda_{0k}(t)\exp\{\beta' Z_{ijk}(t)\}, \qquad t \in [0, \tau).$$
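Under a working independence assumption, the pseudo partial likelihood for the mixed baseline model (2.9) is
$$PL(\beta) = \prod_{k=1}^{K}\prod_{j=1}^{J}\prod_{i=1}^{n}\left[\frac{\exp\{\beta' Z_{ijk}(X_{ijk})\}}{\sum_{(l,g) \in R_j(X_{ijk})}\exp\{\beta' Z_{ljg}(X_{ijk})\}}\right]^{\delta_{ijk}},$$
where $R_j(t) = \{(l, g) : X_{ljg} \ge t\}$, that is, the set of family and failure type combinations at risk just prior to time $t$ with respect to member $j$. The maximum pseudo partial likelihood estimator $\hat{\beta}_m$ for $\beta_0$ in the mixed model (2.9) is defined as the solution to the pseudo partial likelihood score equations
$$\sum_{k=1}^{K}\sum_{j=1}^{J}\sum_{i=1}^{n}\int_0^{\tau}\left[Z_{ijk}(u) - \frac{\sum_{g=1}^{K}\sum_{l=1}^{n} Y_{ljg}(u) Z_{ljg}(u)\exp\{\beta' Z_{ljg}(u)\}}{\sum_{g=1}^{K}\sum_{l=1}^{n} Y_{ljg}(u)\exp\{\beta' Z_{ljg}(u)\}}\right] dM_{ijk}(u) = 0. \qquad (2.10)$$
Similarly, consider the mixed baseline hazard model with an identical baseline for all members and a different baseline for each type of failure. Its maximum pseudo partial likelihood estimator $\tilde{\beta}_m$ for $\beta_0$ is defined as the solution to the system of pseudo partial likelihood score equations
$$\sum_{k=1}^{K}\sum_{j=1}^{J}\sum_{i=1}^{n}\int_0^{\tau}\left[Z_{ijk}(u) - \frac{\sum_{f=1}^{J}\sum_{l=1}^{n} Y_{lfk}(u) Z_{lfk}(u)\exp\{\beta' Z_{lfk}(u)\}}{\sum_{f=1}^{J}\sum_{l=1}^{n} Y_{lfk}(u)\exp\{\beta' Z_{lfk}(u)\}}\right] dM_{ijk}(u) = 0.$$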
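The three model types differ only in which $(j, k)$ cells share a baseline hazard, and hence a risk set. One compact way to express this is a $J \times K$ array of stratum labels in which cells with the same label share a baseline; this is our own illustrative device rather than notation from the text. The sketch below computes the pseudo score for any labeling and fits $\hat{\beta}$ under each of the four models described above. It assumes time-fixed covariates and untied times, uses hypothetical names, and simulates data from model (2.9).

```python
import numpy as np


def stratified_pl(beta, x, delta, z, labels):
    """log pseudo PL, score and information when cell (j, k) uses baseline labels[j, k]."""
    n, J, K, p = z.shape
    lab = np.broadcast_to(labels, (n, J, K)).ravel()
    xs, ds, zs = x.ravel(), delta.ravel(), z.reshape(-1, p)
    eta = zs @ beta
    w = np.exp(eta)
    loglik, u, info = 0.0, np.zeros(p), np.zeros((p, p))
    for e in np.flatnonzero(ds == 1):
        r = (xs >= xs[e]) & (lab == lab[e])        # at risk and sharing the same baseline
        s0 = w[r].sum()
        zbar = w[r] @ zs[r] / s0
        loglik += eta[e] - np.log(s0)
        u += zs[e] - zbar
        info += (zs[r].T * w[r]) @ zs[r] / s0 - np.outer(zbar, zbar)
    return loglik, u, info


def newton(fun, beta, *args, tol=1e-10, max_iter=50):
    """Solve U(beta) = 0 by Newton-Raphson, given fun returning (loglik, U, -dU/dbeta)."""
    for _ in range(max_iter):
        _, u, info = fun(beta, *args)
        step = np.linalg.solve(info, u)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


J, K = 2, 2
labelings = {
    "distinct (2.1)": np.arange(J * K).reshape(J, K),
    "common (2.7)": np.zeros((J, K), dtype=int),
    "mixed by member (2.9)": np.repeat(np.arange(J)[:, None], K, axis=1),
    "mixed by failure type": np.repeat(np.arange(K)[None, :], J, axis=0),
}

rng = np.random.default_rng(5)
n, beta_true = 200, 0.7
z = rng.normal(size=(n, J, K, 1))
lam0 = np.array([0.5, 2.0])[None, :, None]         # lambda_0j: one baseline per member
t = rng.exponential(1.0 / (lam0 * np.exp(beta_true * z[..., 0])))
c = rng.exponential(1.0, size=(n, J, K))
x, delta = np.minimum(t, c), (t <= c).astype(int)

fits = {name: newton(stratified_pl, np.zeros(1), x, delta, z, lab) for name, lab in labelings.items()}
for name, b in fits.items():
    print(f"{name:24s} beta_hat = {b[0]:.3f}")
```

Under data generated from (2.9), the "distinct" labeling is a finer stratification that remains valid. The "common" and "by failure type" labelings ignore the member-specific baselines. Comparing the four estimates is one informal way to see the effect of the baseline choice.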
2.4 Concluding Remarks
We propose using a marginal model approach to analyze multivariate failure time data with generalized dependence structure when the specific interest centers on the effect of risk factors. Based on whether or not the baseline hazards are distinguishable, three types of marginal models in the form of Cox regression models are proposed: distinct baseline models, common baseline models, and mixed baseline models.

When $J = 1$ and $K = 1$, it is a trivial consequence that the univariate Cox regression model can be obtained from any one of the three models. When $J = 1$ and $K > 1$, or $J > 1$ and $K = 1$, we obtain the WLW models from the distinct baseline models and the LWA models from the common baseline models. From the mixed baseline models we obtain either the WLW models or the LWA models, depending on whether $J = 1$ or $K = 1$ and on which baseline hazards are assumed to differ in the mixed baseline models.
The proposed marginal models generalize the WLW model and the LWA model. Our distinct baseline model is equivalent to the WLW model if we stratify the analysis on both failure types and subjects in a cluster. Similarly, our common baseline model generalizes the LWA model by allowing multiple failures per subject. However, to apply WLW or LWA models we must assume one of two things: a different baseline for each combination of failure type and member in a family, or an identical baseline for all combinations of failures and subjects in a stratum. Neither assumption may hold in applications. Our mixed baseline model offers significantly greater modeling flexibility and applicability, and enables us to deal with some applied problems that existing methods cannot handle.
In the next chapter, we shall prove that the estimators for all three types of
models are consistent and asymptotically normal with a covariance matrix which
can be consistently estimated, under some sufficient regularity conditions and if the
marginal hazard models are correctly specified.
Chapter 3
Asymptotic Distributions of Parameter Estimators
3.1 Introduction
In this chapter we develop the asymptotic distribution theory for the parameter estimators of the three types of models described in Chapter 2. First, in Section 3.2, we state two lemmas which are useful in proving consistency. The technical development for the mixed model combines the features of both the distinct baseline hazard model and the common baseline hazard model. We therefore confine our attention to the mixed baseline hazard model. In Section 3.3 we derive the consistency and asymptotic normality of the estimators for the mixed baseline model. For the sake of conciseness, the asymptotic distributions of the estimators for the distinct baseline hazard model and for the common baseline hazard model are given without proof in Section 3.4 and Section 3.5, respectively.
3.2 Two Useful Lemmas
When proving consistency, we often use two special cases of Lenglart's inequality, stated in the following lemma. The lemma is drawn from Lemma 8.2.1 in Fleming and Harrington (1991), where its proof can be found. Essentially, the lemma says that we can bound the probability of a large value of a local submartingale anywhere in the whole time interval $[0, \tau]$ using only the probability of a large value of its compensator at the endpoint $\tau$. In other words, the compensator dominates the local submartingale.
Lemma 3.1 Let $N$ be a univariate counting process with continuous compensator $A$ such that $M = N - A$. Let $H$ be a locally bounded and predictable process. Then for all $\delta, \eta > 0$ and any $t \in [0, \tau]$,
1. $\Pr\{N(t) \ge \eta\} \le \dfrac{\delta}{\eta} + \Pr\{A(t) \ge \delta\}$;
2. $\Pr\left\{\sup_{0 \le s \le t}\left|\int_0^s H(u)\,dM(u)\right| \ge \eta\right\} \le \dfrac{\delta}{\eta^2} + \Pr\left\{\int_0^t H^2(u)\,dA(u) \ge \delta\right\}$.
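Part 1 of Lemma 3.1 can be checked numerically in a simple case. The sketch below uses a doubly stochastic Poisson process whose intensity $\Lambda \sim \text{Gamma}(2, 1)$ is drawn at time 0, so that its compensator $A(t) = \Lambda t$ is continuous and predictable. This example is our own and is not taken from Fleming and Harrington (1991). For $t = 1$, $N(1)$ is negative binomial with $\Pr\{N(1) \ge m\} = (m + 2)/2^{m+1}$, which serves as an exact reference.

```python
import numpy as np

rng = np.random.default_rng(2024)
reps, t = 200_000, 1.0
lam = rng.gamma(shape=2.0, scale=1.0, size=reps)   # random intensity, known at time 0
n_t = rng.poisson(lam * t)                            # N(t) given the intensity
a_t = lam * t                                         # continuous compensator A(t)

results = []
for eta, delta in [(5, 3.0), (8, 4.0), (10, 2.0)]:
    lhs = np.mean(n_t >= eta)                         # Pr{N(t) >= eta}
    rhs = delta / eta + np.mean(a_t >= delta)         # delta/eta + Pr{A(t) >= delta}
    exact = (eta + 2) / 2 ** (eta + 1)                # negative binomial tail at t = 1
    results.append((eta, delta, lhs, exact, rhs))
    print(f"eta={eta:2d} delta={delta:.1f}: Pr(N>=eta)={lhs:.4f} "
          f"(exact {exact:.4f}) <= bound {rhs:.4f}")
```

The bound is loose here. Its value in the proofs is that the right-hand side involves only the compensator, whose limit is controlled by Conditions M.1 and M.2 below.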
The next lemma is taken from Appendix II of Andersen and Gill (1982), where its proof is given. This lemma is an extension of a result from standard convexity theory on finding extrema in sequences of concave functions. Basically, the lemma states the following: if a sequence of random concave functions converges pointwise in probability to a real-valued function on an open convex set $E$, then the convergence is uniform in probability on compact subsets of $E$.
Lemma 3.2 Let $E$ be an open convex subset of $\mathbb{R}^p$. Let $F_1, F_2, \ldots$ be a sequence of random concave functions on $E$, and let $F$ be a real-valued function on $E$ such that, for all $x \in E$, $F_n(x) \xrightarrow{P} F(x)$ as $n \to \infty$. Then:
1. The function $F$ is concave.
2. For all compact subsets $A$ of $E$, $\sup_{x \in A}|F_n(x) - F(x)| \xrightarrow{P} 0$ as $n \to \infty$.
3. If $F$ has a unique maximum at $x$ and $F_n$ has one at $X_n$, then $X_n \xrightarrow{P} x$ as $n \to \infty$.
3.3 Mixed Baseline Hazard Model
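In addition to the notation used in Chapter 2, we introduce the following notation for convenience:
$$N_{\cdot jk}(t) = \sum_{i=1}^{n} N_{ijk}(t), \qquad M_{\cdot jk}(t) = \sum_{i=1}^{n} M_{ijk}(t), \qquad A_{\cdot jk}(t) = \sum_{i=1}^{n} A_{ijk}(t),$$
and, for $d = 0, 1, 2$,
$$S^{(d)}_{jk}(\beta, t) = \frac{1}{n}\sum_{i=1}^{n} Y_{ijk}(t) Z_{ijk}(t)^{\otimes d}\exp\{\beta' Z_{ijk}(t)\}, \qquad S^{(d)}_{j\cdot}(\beta, t) = \sum_{k=1}^{K} S^{(d)}_{jk}(\beta, t).$$
For two $p$-dimensional vectors $a$ and $b$, the outer product of $a$ and $b$ is denoted by $a \otimes b$. This is the $p \times p$ matrix $ab'$ with $(i, j)$ element $(ab')_{ij} = a_i b_j$. We also write $a^{\otimes 2}$ for the matrix $aa'$. The norm of a vector $a$ is $\|a\| = \max_i |a_i|$, and the norm of a matrix $A$ is $\|A\| = \max_{i,j}|A_{ij}|$. For any vector $a$, $|a|$ is the Euclidean norm: $|a| = \sqrt{\sum_{i=1}^{p} a_i^2}$. For any scalar $a$, $\|a\| = |a|$.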
We assume the following sufficient regularity conditions. They ensure the consistency and asymptotic normality of the estimator $\hat{\beta}_m$ obtained by solving the pseudo partial likelihood score equations (2.10) for the mixed baseline hazard model (2.9). These conditions are similar to those used by Andersen and Gill (1982) for univariate maximum partial likelihood estimation.
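M.1. $\lambda_{0j}(t) \ge 0$ and $\int_0^{\tau}\lambda_{0j}(t)\,dt < \infty$, $j = 1, \ldots, J$.
M.2. There exist a neighborhood $\mathcal{B}$ of the true value $\beta_0$ and scalar, vector, and matrix functions $s^{(0)}_{jk}(\beta, t)$, $s^{(1)}_{jk}(\beta, t)$, and $s^{(2)}_{jk}(\beta, t)$ defined on $\mathcal{B} \times [0, \tau]$ such that, for $d = 0, 1, 2$; $j = 1, \ldots, J$; $k = 1, \ldots, K$,
$$\sup_{t \in [0,\tau],\, \beta \in \mathcal{B}}\left\|S^{(d)}_{jk}(\beta, t) - s^{(d)}_{jk}(\beta, t)\right\| \xrightarrow{P} 0 \quad \text{as } n \to \infty.$$
M.3. For all $\beta \in \mathcal{B}$, $t \in [0, \tau]$, $j = 1, \ldots, J$, and $k = 1, \ldots, K$,
$$s^{(1)}_{jk}(\beta, t) = \frac{\partial}{\partial \beta}s^{(0)}_{jk}(\beta, t) \quad \text{and} \quad s^{(2)}_{jk}(\beta, t) = \frac{\partial}{\partial \beta'}s^{(1)}_{jk}(\beta, t).$$
M.4. The following hold for $d = 0, 1, 2$; $j = 1, \ldots, J$; and $k = 1, \ldots, K$. The functions $s^{(d)}_{jk}(\beta, t)$ are bounded, and $s^{(0)}_{jk}(\beta, t)$ is bounded away from zero on $\mathcal{B} \times [0, \tau]$. The $s^{(d)}_{jk}(\beta, t)$ are continuous functions of $\beta \in \mathcal{B}$, uniformly in $t \in [0, \tau]$.
M.5. The matrices
$$I_j(\beta_0) = \int_0^{\tau} v_j(\beta_0, u)\, s^{(0)}_{j\cdot}(\beta_0, u)\,\lambda_{0j}(u)\,du, \qquad j = 1, \ldots, J,$$
are positive definite, where
$$v_j(\beta, u) = \frac{s^{(2)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta, u)} - \left(\frac{s^{(1)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta, u)}\right)^{\otimes 2}$$
and
$$s^{(d)}_{j\cdot}(\beta, t) = \sum_{k=1}^{K} s^{(d)}_{jk}(\beta, t), \qquad d = 0, 1, 2.$$
M.6. There exists a matrix $\Sigma_m = \Sigma_m(\beta_0)$ such that, as $n \to \infty$,
$$\frac{1}{n}\sum_{i=1}^{n}\operatorname{var}(D_i) \to \Sigma_m(\beta_0),$$
where
$$D_i = \sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau}\{Z_{ijk}(u) - e_j(\beta_0, u)\}\,dM_{ijk}(u) = \sum_{j=1}^{J}\sum_{k=1}^{K} D_{ijk}(\beta_0), \qquad i = 1, \ldots, n, \qquad (3.1)$$
and
$$e_j(\beta, u) = \frac{s^{(1)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta, u)}.$$
M.7. For every $\varepsilon > 0$, as $n \to \infty$,
$$\frac{1}{n}\sum_{i=1}^{n} E\left\{|D_i|^2\, I\left(|D_i| > \varepsilon\sqrt{n}\right)\right\} \to 0.$$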
Note that the probability statements made in the above conditions are relative to the usual filtration defined by
$$\mathcal{F}(t) = \{\mathcal{F}_{111}(t), \ldots, \mathcal{F}_{11K}(t), \ldots, \mathcal{F}_{1JK}(t), \ldots, \mathcal{F}_{nJK}(t)\},$$
where $\mathcal{F}_{ijk}(t) = \sigma\{N_{ijk}(u), Y_{ijk}(u+), Z_{ijk}(u+);\ u \le t\}$. Condition M.2 is the asymptotic stability condition for the functions $S^{(d)}_{jk}(\beta, t)$, $d = 0, 1, 2$. Condition M.3 ensures that differentiation with respect to $\beta$ and limits with respect to $n$ may be interchanged, since the corresponding derivative relations also hold for the functions $S^{(d)}_{jk}(\beta, t)$. Conditions M.3 to M.5 are regularity conditions similar to those found in standard asymptotic likelihood theory. Conditions M.6 and M.7 are analogous to the variance-covariance stability condition and the Lindeberg condition of the classical multivariate central limit theorem for a sum of independent zero-mean random vectors.
Theorem 3.1 (Consistency of $\hat{\beta}_m$) Under conditions M.1 to M.5, $\hat{\beta}_m \xrightarrow{P} \beta_0$ as $n \to \infty$.
Proof: The proof follows the argument in Lemma 3.1 of Andersen and Gill (1982)
for the univariate independent case except for some modification to include our more
general setup.
Recall that the pseudo partial likelihood for the mixed baseline hazard model with different baselines for members, expressed in counting process notation, has the form
$$PL(\beta) = \prod_{k=1}^{K}\prod_{j=1}^{J}\prod_{i=1}^{n}\prod_{u \ge 0}\left\{\frac{Y_{ijk}(u)\exp[\beta' Z_{ijk}(u)]}{\sum_{g=1}^{K}\sum_{l=1}^{n} Y_{ljg}(u)\exp[\beta' Z_{ljg}(u)]}\right\}^{dN_{ijk}(u)}$$
and
$$l(\beta, t) = \log PL(\beta, t).$$
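Consider the process
$$O(\beta, t) = \frac{1}{n}\{l(\beta, t) - l(\beta_0, t)\} = \sum_{j=1}^{J}\sum_{k=1}^{K} O_{jk}(\beta, t),$$
where
$$O_{jk}(\beta, t) = \frac{1}{n}\sum_{i=1}^{n}\int_0^t\left\{(\beta - \beta_0)' Z_{ijk}(u) - \log\frac{S^{(0)}_{j\cdot}(\beta, u)}{S^{(0)}_{j\cdot}(\beta_0, u)}\right\} dN_{ijk}(u).$$
Define the process
$$G(\beta, t) = \sum_{j=1}^{J}\sum_{k=1}^{K} G_{jk}(\beta, t),$$
where
$$G_{jk}(\beta, t) = \frac{1}{n}\sum_{i=1}^{n}\int_0^t\left\{(\beta - \beta_0)' Z_{ijk}(u) - \log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta_0, u)}\right\} dN_{ijk}(u).$$
First, we show that $O(\beta, \tau)$ is asymptotically equivalent to $G(\beta, \tau)$ in the sense that
$$O(\beta, \tau) - G(\beta, \tau) \xrightarrow{P} 0 \quad \text{as } n \to \infty.$$
From the definitions of $O_{jk}(\beta, \tau)$ and $G_{jk}(\beta, \tau)$, we have
$$O_{jk}(\beta, \tau) - G_{jk}(\beta, \tau) = -\int_0^{\tau}\left\{\log\frac{S^{(0)}_{j\cdot}(\beta, u)}{S^{(0)}_{j\cdot}(\beta_0, u)} - \log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta_0, u)}\right\} d\left(\frac{N_{\cdot jk}(u)}{n}\right).$$
Based on Conditions M.2 and M.4, for every $\beta \in \mathcal{B}$,
$$\sup_{u \in [0,\tau]}\left|\log\frac{S^{(0)}_{j\cdot}(\beta, u)}{S^{(0)}_{j\cdot}(\beta_0, u)} - \log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta_0, u)}\right| \xrightarrow{P} 0 \quad \text{as } n \to \infty.$$
Also, from part 1 of Lemma 3.1,
$$\Pr\left\{\frac{N_{\cdot jk}(\tau)}{n} \ge \eta\right\} \le \frac{\delta}{\eta} + \Pr\left\{\int_0^{\tau}\frac{1}{n}\sum_{i=1}^{n} Y_{ijk}(u)\lambda_{ijk}(u)\,du \ge \delta\right\} = \frac{\delta}{\eta} + \Pr\left\{\int_0^{\tau}\lambda_{0j}(u) S^{(0)}_{jk}(\beta_0, u)\,du \ge \delta\right\}.$$
Choose $\delta > \int_0^{\tau}\lambda_{0j}(u) s^{(0)}_{jk}(\beta_0, u)\,du$. Then, by Condition M.2,
$$\Pr\left\{\int_0^{\tau}\lambda_{0j}(u) S^{(0)}_{jk}(\beta_0, u)\,du \ge \delta\right\} \to 0 \quad \text{as } n \to \infty.$$
As a consequence of Conditions M.1 and M.4, this integral is finite, so $\delta$ can be taken finite and $\delta/\eta \to 0$ as $\eta \to \infty$. Thus,
$$\lim_{\eta \to \infty}\limsup_{n \to \infty}\Pr\left\{\frac{N_{\cdot jk}(\tau)}{n} \ge \eta\right\} = 0; \qquad (3.2)$$
therefore,
$$O_{jk}(\beta, \tau) - G_{jk}(\beta, \tau) \xrightarrow{P} 0,$$
and consequently $O(\beta, \tau)$ is asymptotically equivalent to $G(\beta, \tau)$.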
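Then, for each $\beta \in \mathcal{B}$, the process
$$\frac{1}{n}\sum_{i=1}^{n}\int_0^t\left\{(\beta - \beta_0)' Z_{ijk}(u) - \log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta_0, u)}\right\} dM_{ijk}(u) \qquad (3.3)$$
is a locally square integrable martingale in $t$ with respect to the filtration $\mathcal{F}(t)$, since the integrand in (3.3) is locally bounded and predictable. The predictable variation process of this martingale at $t$ is
$$H_{jk}(\beta, t) = \frac{1}{n}\int_0^t\left\{(\beta - \beta_0)' S^{(2)}_{jk}(\beta_0, u)(\beta - \beta_0) - 2(\beta - \beta_0)' S^{(1)}_{jk}(\beta_0, u)\log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta_0, u)} + \left(\log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta_0, u)}\right)^2 S^{(0)}_{jk}(\beta_0, u)\right\}\lambda_{0j}(u)\,du.$$
It now follows from conditions M.1, M.2, and M.4 that, for each $\beta \in \mathcal{B}$, $nH_{jk}(\beta, \tau)$ converges in probability to a finite function of $\beta$. Hence, part 2 of Lenglart's inequality in Lemma 3.1 implies that, for each $\beta \in \mathcal{B}$ and as $n \to \infty$,
$$G_{jk}(\beta, \tau) - A_{jk}(\beta, \tau) \xrightarrow{P} 0,$$
where
$$A_{jk}(\beta, t) = \int_0^t\left\{(\beta - \beta_0)' S^{(1)}_{jk}(\beta_0, u) - S^{(0)}_{jk}(\beta_0, u)\log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta_0, u)}\right\}\lambda_{0j}(u)\,du$$
is the compensator of $G_{jk}(\beta, t)$. As a result of conditions M.1, M.2, and M.4, for each $\beta \in \mathcal{B}$, $A_{jk}(\beta, \tau)$ converges in probability to
$$g_{jk}(\beta) = \int_0^{\tau}\left\{(\beta - \beta_0)' s^{(1)}_{jk}(\beta_0, u) - s^{(0)}_{jk}(\beta_0, u)\log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta_0, u)}\right\}\lambda_{0j}(u)\,du. \qquad (3.4)$$
Hence, $G_{jk}(\beta, \tau)$ converges in probability to $g_{jk}(\beta)$ for each $\beta \in \mathcal{B}$, $j = 1, \ldots, J$, and $k = 1, \ldots, K$. Therefore,
$$G(\beta, \tau) \xrightarrow{P} g(\beta), \qquad \text{where } g(\beta) = \sum_{j=1}^{J}\sum_{k=1}^{K} g_{jk}(\beta).$$
Using Condition M.1 and the boundedness conditions of M.3 and M.4, we may evaluate the first and second derivatives of $g(\beta)$ by taking partial derivatives inside the integral (cf. Corollary 5.9 in Bartle (1966)). Hence, for each $\beta \in \mathcal{B}$,
$$\frac{\partial}{\partial \beta} g(\beta) = \sum_{j=1}^{J}\int_0^{\tau}\left\{s^{(1)}_{j\cdot}(\beta_0, u) - s^{(0)}_{j\cdot}(\beta_0, u)\frac{s^{(1)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta, u)}\right\}\lambda_{0j}(u)\,du,$$
which is zero at $\beta = \beta_0$. Furthermore,
$$\frac{\partial^2}{\partial \beta\,\partial \beta'} g(\beta) = -\sum_{j=1}^{J}\int_0^{\tau} v_j(\beta, u)\, s^{(0)}_{j\cdot}(\beta_0, u)\,\lambda_{0j}(u)\,du.$$
At $\beta = \beta_0$ this equals $-\sum_{j} I_j(\beta_0)$, which is negative definite by condition M.5. Hence $g(\beta)$ is a concave function on $\mathcal{B}$ with a unique maximum at $\beta = \beta_0$. Since $O(\beta, \tau)$ is asymptotically equivalent to $G(\beta, \tau)$, $O(\beta, \tau)$ also converges pointwise in probability to $g(\beta)$ for each $\beta \in \mathcal{B}$. The random function $O(\beta, \tau)$ is concave and, when the maximum exists, has its unique maximum at $\beta = \hat{\beta}_m$. By Lemma 3.2, $O(\beta, \tau)$ converges to $g(\beta)$ in probability uniformly over compact subsets of $\mathcal{B}$. Moreover, the maximizing value $\hat{\beta}_m$ of $O(\beta, \tau)$ converges in probability to the maximizing value $\beta_0$ of $g(\beta)$; that is,
$$\hat{\beta}_m \xrightarrow{P} \beta_0 \quad \text{as } n \to \infty. \qquad \square$$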
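Theorem 3.2 (Asymptotic normality of $\hat{\beta}_m$) Assume that conditions M.1 to M.7 hold. Then, as $n \to \infty$,
$$\sqrt{n}(\hat{\beta}_m - \beta_0) \xrightarrow{D} N_p\left(0,\ I^{-1}(\beta_0)\,\Sigma_m(\beta_0)\,I^{-1}(\beta_0)\right), \qquad (3.5)$$
where
$$I(\beta) = \sum_{j=1}^{J} I_j(\beta),$$
and $I_j(\beta)$ and $\Sigma_m(\beta)$ are defined as in conditions M.5 and M.6.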
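Proof: Recall that the pseudo partial likelihood score function (2.10) is
$$U(\beta) = \sum_{j=1}^{J}\sum_{k=1}^{K} U_{jk}(\beta),$$
where
$$U_{jk}(\beta) = \sum_{i=1}^{n}\int_0^{\tau}\left\{Z_{ijk}(u) - \frac{S^{(1)}_{j\cdot}(\beta, u)}{S^{(0)}_{j\cdot}(\beta, u)}\right\} dN_{ijk}(u).$$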
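In view of the first-order Taylor expansion of $U(\beta)$ centered at the true value $\beta_0$ of $\beta$, the score may be written as
$$U(\beta) = U(\beta_0) + \left.\frac{\partial}{\partial \beta} U(\beta)\right|_{\beta = \beta^*}(\beta - \beta_0),$$
where $\beta^*$ lies on the line segment between $\hat{\beta}_m$ and $\beta_0$. Since $U(\hat{\beta}_m) = 0$ by the definition of $\hat{\beta}_m$,
$$\frac{1}{\sqrt{n}} U(\beta_0) = \frac{1}{n}\left(-\left.\frac{\partial}{\partial \beta} U(\beta)\right|_{\beta = \beta^*}\right)\sqrt{n}(\hat{\beta}_m - \beta_0). \qquad (3.6)$$
To prove the asymptotic normality of $\sqrt{n}(\hat{\beta}_m - \beta_0)$, it now suffices to prove
(i) convergence in distribution of $n^{-1/2}U(\beta_0)$ to a multivariate normal random vector, and
(ii) convergence in probability of $n^{-1}\left(-\frac{\partial}{\partial \beta}U(\beta)\right)\big|_{\beta = \beta^*}$ to a nonsingular matrix $I(\beta_0)$ for any random $\beta^* = \beta^*(n)$ such that $\beta^* \xrightarrow{P} \beta_0$ as $n \to \infty$.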
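For the first part, we need to show that
$$\frac{1}{\sqrt{n}} U(\beta_0) = \frac{1}{\sqrt{n}}\sum_{j=1}^{J}\sum_{k=1}^{K} U_{jk}(\beta_0) \xrightarrow{D} N_p(0, \Sigma_m(\beta_0)).$$
First, we show that $n^{-1/2}U_{jk}(\beta_0)$ is asymptotically equivalent to $n^{-1/2}D_{jk}(\beta_0)$ in the sense that
$$\left\|\frac{1}{\sqrt{n}}\{U_{jk}(\beta_0) - D_{jk}(\beta_0)\}\right\| \xrightarrow{P} 0 \quad \text{as } n \to \infty,$$
where
$$\frac{1}{\sqrt{n}} D_{jk}(\beta) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} D_{ijk}(\beta),$$
and $D_{ijk}(\beta)$ is defined as in (3.1). By the martingale central limit theorem of Rebolledo [e.g., page 83 of Andersen et al. (1993)], for each $j$ and $k$, $M_{\cdot jk}(u)/\sqrt{n}$ converges weakly in $D[0, \tau]$ to a zero-mean normal process, say $W(u)$. As a consequence of the tightness of $W(u)$ [cf. Sen and Singer (1993), p. 330], there exists $\delta^*(\varepsilon, N^*)$ with the following property. For every $\varepsilon > 0$, $\delta < \delta^{**} \le \delta^*$, and $n > N^*$, with probability at least $1 - \varepsilon$, $|M_{\cdot jk}(v) - M_{\cdot jk}(u)|/\sqrt{n} \le \varepsilon$ whenever $|v - u| \le \delta^{**}$. Here $\delta^{**}$ is chosen in such a way that $h = \tau/\delta^{**}$ is an integer. It follows from Conditions M.2 and M.4 that there exists an $N^{**}$ such that, when $n > N^{**}$,
$$\sup_{u \in [0,\tau],\, \beta \in \mathcal{B}}\left\|\frac{S^{(1)}_{j\cdot}(\beta, u)}{S^{(0)}_{j\cdot}(\beta, u)} - \frac{s^{(1)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta, u)}\right\| < \delta^{**}.$$
Consequently, for $n > N^{**}$ and for any partition $0 = p_0 < p_1 < \cdots < p_h = \tau$ of $[0, \tau]$,
$$\max_{q}\ \sup_{u \in [p_{q-1}, p_q),\, \beta \in \mathcal{B}}\left\|\frac{S^{(1)}_{j\cdot}(\beta, u)}{S^{(0)}_{j\cdot}(\beta, u)} - \frac{s^{(1)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta, u)}\right\| < \delta^{**}.$$
Choose $0 = p_0 < p_1 < \cdots < p_h = \tau$ in such a way that each interval between $p_{q-1}$ and $p_q$ has length $\delta^{**}$, so that $h = \tau/\delta^{**}$. Then
$$\frac{1}{\sqrt{n}}\{U_{jk}(\beta_0) - D_{jk}(\beta_0)\} = -\frac{1}{\sqrt{n}}\sum_{q=0}^{h-1}\sum_{i=1}^{n}\int_{p_q}^{p_{q+1}}\left\{\frac{S^{(1)}_{j\cdot}(\beta_0, u)}{S^{(0)}_{j\cdot}(\beta_0, u)} - \frac{s^{(1)}_{j\cdot}(\beta_0, u)}{s^{(0)}_{j\cdot}(\beta_0, u)}\right\} dM_{ijk}(u) = -\sum_{q=0}^{h-1}\int_{p_q}^{p_{q+1}}\left\{\frac{S^{(1)}_{j\cdot}(\beta_0, u)}{S^{(0)}_{j\cdot}(\beta_0, u)} - \frac{s^{(1)}_{j\cdot}(\beta_0, u)}{s^{(0)}_{j\cdot}(\beta_0, u)}\right\}\frac{dM_{\cdot jk}(u)}{\sqrt{n}}.$$
Therefore, for each $j$ and $k$ and $n > \max\{N^*, N^{**}\}$,
$$\left\|\frac{1}{\sqrt{n}}\{U_{jk}(\beta_0) - D_{jk}(\beta_0)\}\right\| \le h\,\delta^{**}\varepsilon = \tau\varepsilon.$$
Since $\varepsilon$ is arbitrary, $n^{-1/2}\{U_{jk}(\beta_0) - D_{jk}(\beta_0)\} = o_p(1)$.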
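Consequently, $n^{-1/2}U(\beta_0)$ is asymptotically equivalent to
$$\frac{1}{\sqrt{n}} D(\beta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} D_i.$$
Notice that $D(\beta_0)$ is a sum of $n$ independently distributed $p$-component random vectors $D_i$. Each $D_i$ has mean vector 0, since the integrand is predictable, and covariance matrix
$$\operatorname{var}(D_i) = E\left[\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{f=1}^{J}\sum_{g=1}^{K} D_{ijk}(\beta_0) D'_{ifg}(\beta_0)\right].$$
It follows from Conditions M.6 and M.7 and the multivariate central limit theorem [cf. page 25 of Puri and Sen (1971)] that
$$\frac{1}{\sqrt{n}} U(\beta_0) \xrightarrow{D} N_p(0, \Sigma_m(\beta_0)).$$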
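For the second part of the proof, we need to show that $n^{-1}\left(-\frac{\partial}{\partial \beta}U(\beta)\right)\big|_{\beta = \beta^*} \xrightarrow{P} I(\beta_0)$ for any random $\beta^* = \beta^*(n)$ such that $\beta^* \xrightarrow{P} \beta_0$ as $n \to \infty$. The proof follows along the lines of the last part of the proof of Theorem 3.2 in Andersen and Gill (1982), with some modification to accommodate our more general setup. We have
$$\frac{\partial}{\partial \beta}U(\beta) = -\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau}\left\{\frac{S^{(2)}_{j\cdot}(\beta, u)}{S^{(0)}_{j\cdot}(\beta, u)} - \left(\frac{S^{(1)}_{j\cdot}(\beta, u)}{S^{(0)}_{j\cdot}(\beta, u)}\right)^{\otimes 2}\right\} dN_{\cdot jk}(u) = -\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau} V_j(\beta, u)\, dN_{\cdot jk}(u),$$
where $V_j(\beta, u)$ denotes the expression in braces. Because of this,
$$\left\|\frac{1}{n}\left(-\frac{\partial}{\partial \beta}U(\beta)\right)\Big|_{\beta = \beta^*} - I(\beta_0)\right\| = \left\|\frac{1}{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau} V_j(\beta^*, u)\,dN_{\cdot jk}(u) - \sum_{j=1}^{J}\int_0^{\tau} v_j(\beta_0, u)\, s^{(0)}_{j\cdot}(\beta_0, u)\lambda_{0j}(u)\,du\right\|$$
$$\le \left\|\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau}\{V_j(\beta^*, u) - v_j(\beta^*, u)\}\frac{dN_{\cdot jk}(u)}{n}\right\| + \left\|\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau}\{v_j(\beta^*, u) - v_j(\beta_0, u)\}\frac{dN_{\cdot jk}(u)}{n}\right\|$$
$$+ \left\|\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau} v_j(\beta_0, u)\frac{dM_{\cdot jk}(u)}{n}\right\| + \left\|\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau} v_j(\beta_0, u) S^{(0)}_{jk}(\beta_0, u)\lambda_{0j}(u)\,du - \sum_{j=1}^{J}\int_0^{\tau} v_j(\beta_0, u)\, s^{(0)}_{j\cdot}(\beta_0, u)\lambda_{0j}(u)\,du\right\|. \qquad (3.7)$$
It follows from Condition M.2 and the boundedness conditions in M.4 that
$$\sup_{u \in [0,\tau],\, \beta \in \mathcal{B}}\|V_j(\beta, u) - v_j(\beta, u)\| \xrightarrow{P} 0.$$
This result and $\beta^* \xrightarrow{P} \beta_0$, together with (3.2), show that the first term of (3.7) converges in probability to zero. The continuity in $\beta$, uniformly in $t$, in Condition M.4, plus (3.2), shows that the second term of (3.7) also tends to zero. For the third term, part 2 of Lemma 3.1 gives, for any $\varepsilon, \eta > 0$,
$$\Pr\left\{\left|\int_0^{\tau}(v_j(\beta_0, u))_{a,b}\frac{dM_{\cdot jk}(u)}{n}\right| \ge \varepsilon\right\} \le \frac{\eta}{\varepsilon^2} + \Pr\left[\frac{1}{n}\int_0^{\tau}\{(v_j(\beta_0, u))_{a,b}\}^2 S^{(0)}_{jk}(\beta_0, u)\lambda_{0j}(u)\,du > \eta\right],$$
where $(v_j(\beta, u))_{a,b}$ is the $(a, b)$ element of $v_j(\beta, u)$ and $M_{\cdot jk}(u) = \sum_{i=1}^{n} M_{ijk}(u)$. Hence, Condition M.2 and the boundedness conditions in M.1 and M.4 imply that the third term of (3.7) vanishes. Finally, applying Conditions M.1, M.2, and M.3 directly to the fourth term of (3.7), we obtain that this term converges in probability to zero. Thus, when $\beta^* \xrightarrow{P} \beta_0$,
$$\frac{1}{n}\left(-\frac{\partial}{\partial \beta}U(\beta)\right)\Big|_{\beta = \beta^*} \xrightarrow{P} I(\beta_0) \quad \text{as } n \to \infty.$$
The proof is completed. $\square$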
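Theorem 3.2 suggests estimating the covariance of $\hat{\beta}_m$ by plugging empirical quantities into $I^{-1}(\beta_0)\Sigma_m(\beta_0)I^{-1}(\beta_0)/n$. The sketch below does this for model (2.9) under our own assumptions:

- covariates are time-fixed and there are no tied times;
- $D_{ijk}$ is estimated in the same way as Lin and Wei's $W_i(\beta)$ in Example 1.16, with $dM_{ijk}$ replaced by its estimate from a Breslow-type baseline;
- family-level sums $\hat{D}_i = \sum_j \sum_k \hat{D}_{ijk}$ are used.

The data come from a Gaussian copula, so failure times within a family are dependent while each margin follows (2.9). All names are hypothetical, and this plug-in is not necessarily the estimator proposed in the text.

```python
import math
import numpy as np


def member_terms(beta, x, delta, z, j):
    """Score, information and per-family D-hat contributions for member j of model (2.9).

    Risk set R_j(t) = {(l, g) : X_ljg >= t} pools all failure types of member j.
    """
    n, J, K, p = z.shape
    xs, ds = x[:, j, :].ravel(), delta[:, j, :].ravel()   # records ordered (i, k)
    zs = z[:, j, :, :].reshape(-1, p)
    w = np.exp(zs @ beta)
    ev = np.flatnonzero(ds == 1)
    risk = (xs[None, :] >= xs[ev, None]).astype(float)     # risk[e, r] = Y_r(X_e)
    s0 = risk @ w                                           # n S_j.^(0)(beta, X_e)
    zbar = risk @ (w[:, None] * zs) / s0[:, None]           # S_j.^(1) / S_j.^(0) at X_e
    info = np.einsum("er,r,ra,rb->ab", risk / s0[:, None], w, zs, zs) - zbar.T @ zbar
    resid = np.zeros_like(zs)
    resid[ev] = zs[ev] - zbar                               # jump part of the score
    a = risk.T @ (1.0 / s0)                                 # sum_e Y_r(X_e) / (n S^(0))
    b = risk.T @ (zbar / s0[:, None])
    d_rec = resid - w[:, None] * (a[:, None] * zs - b)     # estimated D_ijk, one per record
    return resid.sum(axis=0), info, d_rec.reshape(n, K, p).sum(axis=1)


def fit_mixed_sandwich(x, delta, z, tol=1e-10, max_iter=50):
    """Newton-Raphson for (2.10), then a plug-in I^{-1} Sigma I^{-1} / n."""
    n, J, K, p = z.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        terms = [member_terms(beta, x, delta, z, j) for j in range(J)]
        step = np.linalg.solve(sum(q[1] for q in terms), sum(q[0] for q in terms))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    terms = [member_terms(beta, x, delta, z, j) for j in range(J)]
    info = sum(q[1] for q in terms)
    d = sum(q[2] for q in terms)                            # D-hat_i, one row per family
    i_inv = np.linalg.inv(info / n)
    robust = i_inv @ (d.T @ d / n) @ i_inv / n              # approx. var(beta_hat_m)
    naive = np.linalg.inv(info)                             # ignores within-family dependence
    return beta, d, robust, naive


rng = np.random.default_rng(7)
n, J, K, beta_true, rho = 300, 2, 2, 0.5, 0.6
z = rng.normal(size=(n, J, K, 1))
e = math.sqrt(rho) * rng.normal(size=(n, 1, 1)) + math.sqrt(1 - rho) * rng.normal(size=(n, J, K))
unif = 0.5 * (1.0 + np.vectorize(math.erf)(e / math.sqrt(2.0)))   # Phi(e): Gaussian copula
lam0 = np.array([0.5, 1.5])[None, :, None]                         # lambda_0j
t = -np.log(unif) / (lam0 * np.exp(beta_true * z[..., 0]))        # exponential margins
c = rng.exponential(2.0, size=(n, J, K))
x, delta = np.minimum(t, c), (t <= c).astype(int)

beta_m, d, robust, naive = fit_mixed_sandwich(x, delta, z)
print(beta_m, np.sqrt(np.diag(robust)), np.sqrt(np.diag(naive)))
```

The rows of `d` sum to the score $U(\hat{\beta}_m) = 0$. Under within-family dependence the robust and naive standard errors can differ. The robust one is the empirical analogue of the sandwich form in (3.5).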
If we assume that the covariate, counting, and censoring processes are identically distributed for $i = 1, \ldots, n$, that is, in the iid case across families, a simplified set of conditions can be obtained. The result is summarized in the following theorem.
Theorem 3.3 (Sufficient conditions for $\hat{\beta}_m$ in the iid case) Assume that the covariate, counting, and censoring processes are identically distributed for $i = 1, \ldots, n$. Assume also that $Z_{1jk}$ and $Y_{1jk}$ are left continuous with right-hand limits for $j = 1, \ldots, J$; $k = 1, \ldots, K$. Then conditions M.1 to M.7 are satisfied under the following conditions M.I to M.IV:
M.I. $\lambda_{0j}(u) \ge 0$ and $\int_0^{\tau}\lambda_{0j}(t)\,dt < \infty$, $j = 1, \ldots, J$.
M.II. There exists a neighborhood $\mathcal{B}$ of the true value $\beta_0$ such that, for $j = 1, \ldots, J$ and $k = 1, \ldots, K$,
$$E\left\{\sup_{t \in [0,\tau],\, \beta \in \mathcal{B}} Y_{1jk}(t)\,|Z_{1jk}(t)|^2 e^{\beta' Z_{1jk}(t)}\right\} < \infty.$$
M.III. $\Pr\{Y_{1jk}(t) = 1\ \forall t \in [0, \tau]\} > 0$.
M.IV. The matrices $I_j(\beta_0)$, $j = 1, \ldots, J$, defined as in condition M.5, are positive definite, where $s^{(0)}_{jk}$, $s^{(1)}_{jk}$, and $s^{(2)}_{jk}$ are now defined by
$$s^{(0)}_{jk}(\beta, t) = E\{Y_{1jk}(t)e^{\beta' Z_{1jk}(t)}\}, \qquad s^{(1)}_{jk}(\beta, t) = E\{Y_{1jk}(t) Z_{1jk}(t) e^{\beta' Z_{1jk}(t)}\}, \qquad s^{(2)}_{jk}(\beta, t) = E\{Y_{1jk}(t) Z_{1jk}(t)^{\otimes 2} e^{\beta' Z_{1jk}(t)}\},$$
respectively.
Proof: Conditions M.3 and M.4 are now automatically satisfied by Condition
M.IV and the definition of $s^{(d)}_{jk}(\beta,t)$, $d=0,1,2$. From Condition M.II, we also
have
$$E\left\{\sup_{t\in[0,\tau],\,\beta\in B}Y_{1jk}(t)|Z_{1jk}(t)|e^{\beta'Z_{1jk}(t)}\right\}<\infty\quad\text{and}\quad E\left\{\sup_{t\in[0,\tau],\,\beta\in B}Y_{1jk}(t)e^{\beta'Z_{1jk}(t)}\right\}<\infty.$$
By dominated convergence, $s^{(0)}_{jk}(\beta,t)$, $s^{(1)}_{jk}(\beta,t)$, and $s^{(2)}_{jk}(\beta,t)$ are continuous functions
of $\beta\in B$ for each $t\in[0,\tau]$, uniformly in $t\in[0,\tau]$. They are also bounded on
$B\times[0,\tau]$ and, by Condition M.III, $s^{(0)}_{jk}(\beta,t)$ is bounded away from zero on $B\times[0,\tau]$.
Without loss of generality we may take $B$ to be compact. Following the argument in
Theorem 4.1 of Andersen and Gill (1982), we can consider $Y_{ijk}(t)\exp\{\beta'Z_{ijk}(t)\}$ as
a random element of $D[0,\tau]$, where the elements of $D[0,\tau]$ take values not in $\mathbb{R}$ but
in the Banach space of continuous functions on $B$ endowed with the supremum norm.
Then by Theorem III.1 in Andersen and Gill (1982), we have
$$\sup_{t\in[0,\tau],\,\beta\in B}\|S^{(0)}_{jk}(\beta,t)-s^{(0)}_{jk}(\beta,t)\|\xrightarrow{p}0\quad\text{as } n\to\infty.$$
The same argument works for $S^{(1)}_{jk}(\beta,t)$ and $S^{(2)}_{jk}(\beta,t)$. Therefore,
$$\sup_{t\in[0,\tau],\,\beta\in B}\|S^{(d)}_{jk}(\beta,t)-s^{(d)}_{jk}(\beta,t)\|\xrightarrow{p}0\quad\text{as } n\to\infty,\quad d=0,1,2,\qquad(3.8)$$
and Condition M.2 holds. It is clear that Conditions M.6 and M.7 (the Lindeberg condition) are satisfied with
$$\Sigma_m(\beta_0)=\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{l=1}^{J}\sum_{g=1}^{K}E\{D_{1jk}(\beta_0)D_{1lg}(\beta_0)'\}.$$
The proof is completed. □
It is natural to estimate $\Sigma_m(\beta_0)$, and hence the covariance matrix of $\sqrt{n}(\hat\beta_m-\beta_0)$, from
the data by
(3.9)
where
$$\hat{\mathcal{I}}(\beta)=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau}V_j(\beta,u)\,dN_{ijk}(u),\qquad(3.10)$$
(3.11)
(3.12)
(3.13)
(3.14)
(3.15)
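and
$$N_{\cdot j\cdot}(u)=\sum_{i=1}^{n}\sum_{k=1}^{K}N_{ijk}(u).$$
Note that $\hat\Sigma_m$ is obtained from $\Sigma_m$ by replacing $\beta_0$, $s^{(0)}_{j\cdot}(\beta,u)$, $s^{(1)}_{j\cdot}(\beta,u)$, and $\lambda_{0j}(u)\,du$
with $\hat\beta_m$, $S^{(0)}_{j\cdot}(\hat\beta_m,u)$, $S^{(1)}_{j\cdot}(\hat\beta_m,u)$, and $dN_{\cdot j\cdot}(u)/\{nS^{(0)}_{j\cdot}(\hat\beta_m,u)\}$, respectively.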
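To make the plug-in construction concrete, the following Python sketch fits a scalar-covariate mixed baseline model by Newton-Raphson on the pseudo score and then forms a cluster-robust (sandwich) variance, replacing $\lambda_{0j}(u)\,du$ by $dN_{\cdot j\cdot}(u)/\{nS^{(0)}_{j\cdot}(\beta,u)\}$ in the score residuals as described above. It is only a sketch under stated assumptions, not the thesis's own code: time-fixed scalar covariates, at-risk indicator $Y_{ijk}(t)=1\{X_{ijk}\ge t\}$, no special handling of tied times, and $O(N^2)$ memory in the number of records $N$. The function name `fit_mixed_baseline` and the toy data are hypothetical.

```python
import numpy as np


def fit_mixed_baseline(time, event, z, stratum, cluster, tol=1e-10, max_iter=50):
    """Sketch: scalar-beta pseudo partial likelihood with one baseline per stratum.

    Each array entry is one (i, j, k) record: observed time X_ijk, event
    indicator delta_ijk, covariate Z_ijk, baseline label j, family label i.
    Returns (beta_hat, robust variance of beta_hat, per-record residuals D_hat).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    z = np.asarray(z, dtype=float)
    stratum = np.asarray(stratum)
    _, fam = np.unique(np.asarray(cluster), return_inverse=True)
    n = int(fam.max()) + 1  # number of independent families

    # at_risk[q, r] = 1 if record r shares q's baseline and X_r >= X_q
    at_risk = ((stratum[None, :] == stratum[:, None])
               & (time[None, :] >= time[:, None])).astype(float)

    def moments(beta):
        w = np.exp(beta * z)
        s0 = at_risk @ w / n               # S^(0)_j.(beta, X_q)
        s1 = at_risk @ (z * w) / n         # S^(1)_j.(beta, X_q)
        s2 = at_risk @ (z * z * w) / n     # S^(2)_j.(beta, X_q)
        ebar = s1 / s0
        return w, s0, ebar, s2 / s0 - ebar ** 2   # last entry: V_j(beta, X_q)

    beta = 0.0
    for _ in range(max_iter):
        _, _, ebar, v = moments(beta)
        score = np.sum(event * (z - ebar))      # pseudo score U(beta)
        step = score / np.sum(event * v)        # Newton-Raphson step
        beta += step
        if abs(step) < tol:
            break

    w, s0, ebar, v = moments(beta)
    i_hat = np.sum(event * v) / n               # observed information, as in (3.10)
    # Score residuals with lambda_0j(u)du replaced by dN_.j.(u) / {n S^(0)_j.(beta, u)}
    c = event / (n * s0)
    d_hat = event * (z - ebar) - w * (z * (at_risk.T @ c) - at_risk.T @ (c * ebar))
    d_family = np.bincount(fam, weights=d_hat, minlength=n)  # sum over j and k
    sigma_hat = np.sum(d_family ** 2) / n
    return beta, sigma_hat / (n * i_hat ** 2), d_hat


# Hypothetical toy data: J = 2 members (baselines 1 and 5), K = 2 failures each,
# failure times independent given Z (no frailty), uniform(0, 5) censoring.
rng = np.random.default_rng(2024)
n_fam, true_beta = 400, 0.7
family = np.repeat(np.arange(n_fam), 4)
member = np.tile([1, 1, 2, 2], n_fam)
zz = rng.normal(size=4 * n_fam)
t = rng.exponential(1.0 / (np.where(member == 1, 1.0, 5.0) * np.exp(true_beta * zz)))
cens = rng.uniform(0.0, 5.0, size=4 * n_fam)
beta_hat, var_hat, resid = fit_mixed_baseline(
    np.minimum(t, cens), (t <= cens).astype(float), zz, member, family)
```

Because the residuals sum to the pseudo score, they add up to (numerically) zero at $\hat\beta$. Passing a single stratum label for every record gives the analogue for the common baseline model, and a separate label for each $(j,k)$ gives the analogue for the distinct baseline model.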
The following theorem states that $\hat\Sigma_m$ is a consistent estimator of $\Sigma_m$. To avoid imposing more
conditions, we consider the iid case.
Theorem 3.4 (Consistency of $\hat\Sigma_m(\hat\beta_m)$ in the iid case) Suppose that all the assumptions in Conditions M.I to M.IV are satisfied and, in addition, that the following condition
holds,
then
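Proof: To prove the consistency of $\hat\Sigma_m$, it suffices to show that:
as $n\to\infty$,
since using Slutsky's theorem we obtain $\hat\Sigma_m(\hat\beta_m)\xrightarrow{p}\Sigma_m(\beta_0)$ from the
first part, and using Slutsky's theorem again we get the desired results.
To show that the first part holds, it suffices to show that each component of
converges in probability to the corresponding component of
that is, for $a,b=1,\ldots,p$,
Write the $a$th element of the vector $\hat D_{ijk}$ as $\hat D_{ijk,a}$, and write $\hat e_{j,a}(u)=S^{(1)}_{j\cdot,a}(\hat\beta_m,u)/S^{(0)}_{j\cdot}(\hat\beta_m,u)$ and $d\hat\Lambda_{0j}(u)=dN_{\cdot j\cdot}(u)/\{nS^{(0)}_{j\cdot}(\hat\beta_m,u)\}$. Then $n^{-1}\sum_{i=1}^{n}\hat D_{ijk,a}(\hat\beta_m)\hat D_{ilg,b}(\hat\beta_m)$ is the sum of the following terms:
$$\frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau}Z_{ijk,a}(u)\,dN_{ijk}(u)\int_0^{\tau}Z_{ilg,b}(u)\,dN_{ilg}(u),\qquad(3.17)$$
$$-\frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau}\hat e_{j,a}(u)\,dN_{ijk}(u)\int_0^{\tau}Z_{ilg,b}(u)\,dN_{ilg}(u),\qquad(3.18)$$
$$-\frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau}Z_{ijk,a}(u)\,dN_{ijk}(u)\int_0^{\tau}\hat e_{l,b}(u)\,dN_{ilg}(u),\qquad(3.19)$$
$$+\frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau}\hat e_{j,a}(u)\,dN_{ijk}(u)\int_0^{\tau}\hat e_{l,b}(u)\,dN_{ilg}(u),\qquad(3.20)$$
$$+\frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau}\{Z_{ijk,a}(u)-\hat e_{j,a}(u)\}Y_{ijk}(u)e^{\hat\beta_m'Z_{ijk}(u)}\,d\hat\Lambda_{0j}(u)\int_0^{\tau}\{Z_{ilg,b}(v)-\hat e_{l,b}(v)\}Y_{ilg}(v)e^{\hat\beta_m'Z_{ilg}(v)}\,d\hat\Lambda_{0l}(v),\qquad(3.21)$$
$$-\frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau}\{Z_{ijk,a}(u)-\hat e_{j,a}(u)\}\,dN_{ijk}(u)\int_0^{\tau}\{Z_{ilg,b}(v)-\hat e_{l,b}(v)\}Y_{ilg}(v)e^{\hat\beta_m'Z_{ilg}(v)}\,d\hat\Lambda_{0l}(v),\qquad(3.22)$$
$$-\frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau}\{Z_{ijk,a}(u)-\hat e_{j,a}(u)\}Y_{ijk}(u)e^{\hat\beta_m'Z_{ijk}(u)}\,d\hat\Lambda_{0j}(u)\int_0^{\tau}\{Z_{ilg,b}(v)-\hat e_{l,b}(v)\}\,dN_{ilg}(v).\qquad(3.23)$$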
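On the other hand, with $e_{j,a}(\beta,u)=s^{(1)}_{j\cdot,a}(\beta,u)/s^{(0)}_{j\cdot}(\beta,u)$,
$$\begin{aligned}
E\{D_{1jk,a}(\beta_0)D_{1lg,b}(\beta_0)\}&=E\left\{\int_0^{\tau}[Z_{1jk,a}(u)-e_{j,a}(\beta_0,u)]\,dM_{1jk}(u)\int_0^{\tau}[Z_{1lg,b}(v)-e_{l,b}(\beta_0,v)]\,dM_{1lg}(v)\right\}\\
&=E\Big\{\int_0^{\tau}[Z_{1jk,a}(u)-e_{j,a}(\beta_0,u)][dN_{1jk}(u)-Y_{1jk}(u)e^{\beta_0'Z_{1jk}(u)}\lambda_{0j}(u)\,du]\\
&\qquad\times\int_0^{\tau}[Z_{1lg,b}(v)-e_{l,b}(\beta_0,v)][dN_{1lg}(v)-Y_{1lg}(v)e^{\beta_0'Z_{1lg}(v)}\lambda_{0l}(v)\,dv]\Big\},
\end{aligned}$$
which is the sum of
$$E\left\{\int_0^{\tau}Z_{1jk,a}(u)\,dN_{1jk}(u)\int_0^{\tau}Z_{1lg,b}(v)\,dN_{1lg}(v)\right\},\qquad(3.24)$$
$$-E\left\{\int_0^{\tau}e_{j,a}(\beta_0,u)\,dN_{1jk}(u)\int_0^{\tau}Z_{1lg,b}(v)\,dN_{1lg}(v)\right\},\qquad(3.25)$$
$$-E\left\{\int_0^{\tau}Z_{1jk,a}(u)\,dN_{1jk}(u)\int_0^{\tau}e_{l,b}(\beta_0,v)\,dN_{1lg}(v)\right\},\qquad(3.26)$$
$$+E\left\{\int_0^{\tau}e_{j,a}(\beta_0,u)\,dN_{1jk}(u)\int_0^{\tau}e_{l,b}(\beta_0,v)\,dN_{1lg}(v)\right\},\qquad(3.27)$$
$$+E\left\{\int_0^{\tau}[Z_{1jk,a}(u)-e_{j,a}(\beta_0,u)]Y_{1jk}(u)e^{\beta_0'Z_{1jk}(u)}\lambda_{0j}(u)\,du\int_0^{\tau}[Z_{1lg,b}(v)-e_{l,b}(\beta_0,v)]Y_{1lg}(v)e^{\beta_0'Z_{1lg}(v)}\lambda_{0l}(v)\,dv\right\},\qquad(3.28)$$
$$-E\left\{\int_0^{\tau}[Z_{1jk,a}(u)-e_{j,a}(\beta_0,u)]\,dN_{1jk}(u)\int_0^{\tau}[Z_{1lg,b}(v)-e_{l,b}(\beta_0,v)]Y_{1lg}(v)e^{\beta_0'Z_{1lg}(v)}\lambda_{0l}(v)\,dv\right\},\qquad(3.29)$$
$$-E\left\{\int_0^{\tau}[Z_{1jk,a}(u)-e_{j,a}(\beta_0,u)]Y_{1jk}(u)e^{\beta_0'Z_{1jk}(u)}\lambda_{0j}(u)\,du\int_0^{\tau}[Z_{1lg,b}(v)-e_{l,b}(\beta_0,v)]\,dN_{1lg}(v)\right\}.\qquad(3.30)$$
Note that (3.17) is just an average of iid random variables. Therefore, (3.17) $\xrightarrow{p}$
(3.24) by the law of large numbers.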
Now, let us rewrite (3.21) and (3.28) in the following forms:
(3.21) =
(3.31)
(3.32)
(3.33)
(3.34)
where $N_{\cdot j\cdot}(u)=\sum_{i=1}^{n}\sum_{r=1}^{K}N_{ijr}(u)$, and
(3.28) =
(3.35)
(3.36)
(3.37)
(3.38)
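Define
and
From (3.16), we also have
$$E\left\{\sup_{\beta\in B,\,u,v\in[0,\tau],\,1\le j,l\le J,\,1\le k,g\le K}Y_{1jk}(u)|Z_{1jk}(u)|e^{\beta'Z_{1jk}(u)}Y_{1lg}(v)e^{\beta'Z_{1lg}(v)}\right\}<\infty$$
and
$$E\left\{\sup_{\beta\in B,\,u,v\in[0,\tau],\,1\le j,l\le J,\,1\le k,g\le K}Y_{1jk}(u)e^{\beta'Z_{1jk}(u)}Y_{1lg}(v)e^{\beta'Z_{1lg}(v)}\right\}<\infty.$$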
Using the above conditions and arguments similar to those in Theorem 3.3, we have that
$m_{jkalgb}(\beta,u,v)$ is a continuous function of $\beta\in B$ and that $M_{jkalgb}(\beta,u,v)$ converges in
probability to $m_{jkalgb}(\beta,u,v)$ uniformly in $(u,v)\in[0,\tau]^2$. By (3.8), the fact that
$s^{(0)}_{j\cdot}(\beta,t)$ is bounded away from zero, the above results, and the consistency of $\hat\beta_m$, we have
$$\frac{M_{jkalgb}(\hat\beta_m,u,v)}{S^{(0)}_{j\cdot}(\hat\beta_m,u)S^{(0)}_{l\cdot}(\hat\beta_m,v)}\xrightarrow{p}\frac{m_{jkalgb}(\beta_0,u,v)}{s^{(0)}_{j\cdot}(\beta_0,u)s^{(0)}_{l\cdot}(\beta_0,v)}$$
uniformly in $(u,v)\in[0,\tau]^2$. Note that $n^{-1}N_{\cdot j\cdot}(t)$ converges in probability to
$\int_0^{t}s^{(0)}_{j\cdot}(\beta_0,u)\lambda_{0j}(u)\,du$, that $n^{-1}N_{\cdot j\cdot}(t)$ is bounded in probability, and that $m_{jkalgb}(\beta,u,v)$ resides in the
product space of left-continuous functions; it follows that (3.31) is consistent for (3.35).
Using similar arguments, we can show that (3.32) $\xrightarrow{p}$ (3.36), (3.33) $\xrightarrow{p}$ (3.37), and
(3.34) $\xrightarrow{p}$ (3.38). Hence, (3.21) $\xrightarrow{p}$ (3.28).
Again using similar arguments, we can show that (3.22) $\xrightarrow{p}$ (3.29) and
(3.23) $\xrightarrow{p}$ (3.30).
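Now, we are going to show that (3.18) is consistent for (3.25). We have
$$|(3.18)-(3.25)|\le I_1+I_2,$$
where
$$I_1=\frac{1}{n}\sum_{i=1}^{n}\left|\int_0^{\tau}\left\{\frac{S^{(1)}_{j\cdot,a}(\hat\beta_m,u)}{S^{(0)}_{j\cdot}(\hat\beta_m,u)}-\frac{s^{(1)}_{j\cdot,a}(\beta_0,u)}{s^{(0)}_{j\cdot}(\beta_0,u)}\right\}dN_{ijk}(u)\right|\int_0^{\tau}|Z_{ilg,b}(u)|\,dN_{ilg}(u)\qquad(3.39)$$
and
$$I_2=\left|\frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau}e_{j,a}(\beta_0,u)\,dN_{ijk}(u)\int_0^{\tau}Z_{ilg,b}(u)\,dN_{ilg}(u)-E\left\{\int_0^{\tau}e_{j,a}(\beta_0,u)\,dN_{1jk}(u)\int_0^{\tau}Z_{1lg,b}(u)\,dN_{1lg}(u)\right\}\right|.$$
By the law of large numbers, we obtain that $I_2\xrightarrow{p}0$ as $n\to\infty$.
From the results in (3.8), the consistency of $\hat\beta_m$, and the iid condition across
the families, for any $\epsilon>0$ we have, with probability tending to one,
$$I_1\le\frac{\epsilon}{n}\sum_{i=1}^{n}N_{ijk}(\tau)|Z_{ilg,b}(X_{ilg})|\le\frac{\epsilon}{n}\sum_{i=1}^{n}|Z_{ilg,b}(X_{ilg})|\xrightarrow{p}\epsilon E|Z_{1lg,b}(X_{1lg})|,$$
since $E|Z_{1lg,b}(X_{1lg})|<\infty$ by Condition M.II. Because $\epsilon$ is arbitrary, $I_1\xrightarrow{p}0$ as $n\to\infty$. Using similar
arguments, we can show that (3.19) $\xrightarrow{p}$ (3.26) and (3.20) $\xrightarrow{p}$ (3.27). Therefore, the first part holds.
To prove the second part, we write
$$\begin{aligned}
\|\hat{\mathcal{I}}(\hat\beta_m)-\mathcal{I}(\beta_0)\|&\le\sum_{j=1}^{J}\sum_{k=1}^{K}\left\|\int_0^{\tau}\{V_j(\hat\beta_m,u)-v_j(\hat\beta_m,u)\}\frac{dN_{\cdot jk}(u)}{n}\right\|+\sum_{j=1}^{J}\sum_{k=1}^{K}\left\|\int_0^{\tau}\{v_j(\hat\beta_m,u)-v_j(\beta_0,u)\}\frac{dN_{\cdot jk}(u)}{n}\right\|\\
&\quad+\sum_{j=1}^{J}\sum_{k=1}^{K}\left\|\int_0^{\tau}v_j(\beta_0,u)\left\{\frac{dN_{\cdot jk}(u)}{n}-s^{(0)}_{jk}(\beta_0,u)\lambda_{0j}(u)\,du\right\}\right\|.
\end{aligned}$$
Then, using an argument similar to that for (3.7), we can show that
$\hat{\mathcal{I}}(\hat\beta_m)\xrightarrow{p}\mathcal{I}(\beta_0)$.
□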
The large-sample properties of the maximum partial likelihood estimator $\hat\beta_d$
for the distinct baseline hazard model and of the maximum partial likelihood estimator $\hat\beta_c$
for the common baseline hazard model in the iid case are given in the form
of theorems in the next two sections, respectively. Their proofs follow arguments
similar to the proof for $\hat\beta_m$, because the mixed baseline hazard model
combines the features of both the distinct baseline hazard model and the common baseline
hazard model. For the sake of conciseness, the proofs of the two theorems are not
repeated.
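3.4
Distinct Baseline Hazard Model
Define
$$\mathcal{I}(\beta)=\sum_{j=1}^{J}\sum_{k=1}^{K}\mathcal{I}_{jk}(\beta),\qquad\mathcal{I}_{jk}(\beta)=\int_0^{\tau}v_{jk}(\beta,t)s^{(0)}_{jk}(\beta,t)\lambda_{0jk}(t)\,dt,$$
$$\Sigma^{d}(\beta)=\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{l=1}^{J}\sum_{g=1}^{K}E\{D_{1jk}(\beta)D_{1lg}(\beta)'\},$$
$$\hat D_{ijk}(\beta)=\int_0^{\tau}\left\{Z_{ijk}(u)-\frac{S^{(1)}_{jk}(\beta,u)}{S^{(0)}_{jk}(\beta,u)}\right\}\left\{dN_{ijk}(u)-Y_{ijk}(u)e^{\beta'Z_{ijk}(u)}\frac{dN_{\cdot jk}(u)}{nS^{(0)}_{jk}(\beta,u)}\right\},$$
$$v_{jk}(\beta,t)=\frac{s^{(2)}_{jk}(\beta,t)}{s^{(0)}_{jk}(\beta,t)}-\left\{\frac{s^{(1)}_{jk}(\beta,t)}{s^{(0)}_{jk}(\beta,t)}\right\}^{\otimes2},$$
and $s^{(d)}_{jk}(\beta,t)$ is defined as in Condition M.IV, $d=0,1,2$.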
Theorem 3.5 (Asymptotic properties of $\hat\beta_d$) Assume that the following Conditions
D.1 to D.4 are satisfied:
D.1. $\lambda_{0jk}(t)\ge0$ and $\int_0^{\tau}\lambda_{0jk}(t)\,dt<\infty$, $j=1,\ldots,J$; $k=1,\ldots,K$.
D.2. There exists a neighborhood $B$ of the true value $\beta_0$ such that, for $j=1,\ldots,J$ and $k=1,\ldots,K$,
$$E\left\{\sup_{t\in[0,\tau],\,\beta\in B}Y_{1jk}(t)|Z_{1jk}(t)|^2e^{\beta'Z_{1jk}(t)}\right\}<\infty.$$
D.3. $\Pr\{Y_{1jk}(t)=1\ \ \forall t\in[0,\tau]\}>0$.
D.4. The matrices $\mathcal{I}_{jk}(\beta_0)$ are positive definite for all $j=1,\ldots,J$ and $k=1,\ldots,K$.
Then
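3.5
Common Baseline Hazard Model
Define
$$\mathcal{I}(\beta)=\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau}v(\beta,t)s^{(0)}_{jk}(\beta,t)\lambda_0(t)\,dt=\int_0^{\tau}v(\beta,t)s^{(0)}_{\cdot\cdot}(\beta,t)\lambda_0(t)\,dt,$$
$$v(\beta,t)=\frac{s^{(2)}_{\cdot\cdot}(\beta,t)}{s^{(0)}_{\cdot\cdot}(\beta,t)}-\left\{\frac{s^{(1)}_{\cdot\cdot}(\beta,t)}{s^{(0)}_{\cdot\cdot}(\beta,t)}\right\}^{\otimes2},$$
$$\Sigma^{c}(\beta)=\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{l=1}^{J}\sum_{g=1}^{K}E\{D_{1jk}(\beta)D_{1lg}(\beta)'\},$$
$$\hat{\mathcal{I}}(\beta)=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau}V(\beta,t)\,dN_{ijk}(t),$$
$$V(\beta,t)=\frac{S^{(2)}_{\cdot\cdot}(\beta,t)}{S^{(0)}_{\cdot\cdot}(\beta,t)}-\left\{\frac{S^{(1)}_{\cdot\cdot}(\beta,t)}{S^{(0)}_{\cdot\cdot}(\beta,t)}\right\}^{\otimes2},$$
$$S^{(d)}_{\cdot\cdot}(\beta,t)=\sum_{j=1}^{J}\sum_{k=1}^{K}S^{(d)}_{jk}(\beta,t),\quad d=0,1,2,$$
$$\hat\Sigma^{c}(\beta)=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{l=1}^{J}\sum_{g=1}^{K}\hat B_{ijk}(\beta)\hat B_{ilg}(\beta)',$$
$$\hat B_{ijk}(\beta)=\int_0^{\tau}\left\{Z_{ijk}(u)-\frac{S^{(1)}_{\cdot\cdot}(\beta,u)}{S^{(0)}_{\cdot\cdot}(\beta,u)}\right\}\left\{dN_{ijk}(u)-Y_{ijk}(u)e^{\beta'Z_{ijk}(u)}\frac{dN_{\cdot\cdot\cdot}(u)}{nS^{(0)}_{\cdot\cdot}(\beta,u)}\right\},$$
$$N_{\cdot\cdot\cdot}(u)=\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{k=1}^{K}N_{ijk}(u),$$
$$s^{(d)}_{\cdot\cdot}(\beta,t)=\sum_{j=1}^{J}\sum_{k=1}^{K}s^{(d)}_{jk}(\beta,t),\quad d=0,1,2,$$
and $s^{(d)}_{jk}(\beta,t)$ is defined as in Condition M.IV.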
Theorem 3.6 (Asymptotic properties of $\hat\beta_c$) If the following Conditions C.1 to C.4
hold:
C.1. $\lambda_0(t)\ge0$ and $\int_0^{\tau}\lambda_0(t)\,dt<\infty$.
C.2. There exists a neighborhood $B$ of the true value $\beta_0$ such that, for $j=1,\ldots,J$ and $k=1,\ldots,K$,
$$E\left\{\sup_{t\in[0,\tau],\,\beta\in B}Y_{1jk}(t)|Z_{1jk}(t)|^2e^{\beta'Z_{1jk}(t)}\right\}<\infty.$$
C.3. $\Pr\{Y_{1jk}(t)=1\ \ \forall t\in[0,\tau]\}>0$.
C.4. The matrix $\mathcal{I}(\beta_0)$ is positive definite.
Then
as $n\to\infty$, and $\Sigma^{c}(\beta_0)$ can be consistently estimated by $\hat\Sigma^{c}(\hat\beta_c)$.
3.6
Concluding Remarks
It may be unrealistic to assume that $(X,\delta,Z(t))$ are identically distributed across
(independent) families under our generalized dependence structure, although this was
assumed both in the WLW model under the setting of between-subject dependence
and in the LWA model in the context of within-subject dependence. We relax the
assumption of identical distributions in proving the consistency of the estimators and
their asymptotic normality. Simulation studies are conducted in Chapter 4 to assess
the adequacy of the proposed large-sample approximation for practical sample sizes.
We conclude this chapter by noting that it is vital that the marginal models be correctly specified in developing the asymptotic distribution theory of the estimators.
The consequences of misspecified marginal models are discussed in Chapter 6.
Chapter 4
Simulation Studies of Parameter Estimators
4.1
Introduction
This chapter deals with the numerical evaluation of the regression parameter estimators for the three types of marginal models. In Chapter 3 we showed analytically that
under some regularity conditions the estimators are consistent and asymptotically
normal, with a covariance matrix that can be consistently estimated if the marginal
models are correctly specified. In this chapter we study the finite-sample properties
via simulation to assess the adequacy of the proposed large-sample approximation for
practical sample sizes. Recall that the mixed baseline hazard model combines
features of both the distinct baseline hazard model and the common baseline hazard model.
We confine our attention to the mixed baseline hazard model in the simulations.
4.2
Simulation Parameters
In these simulations, we consider two members ($J=2$) in each family and two failures
($K=2$) for each member. The four failure times are independent across families.
The failure times $T_{i11}$, $T_{i12}$, $T_{i21}$, and $T_{i22}$ for the $i$th family are generated from the
multivariate Clayton-Oakes distribution (Clayton (1978), Oakes (1989), and Clayton
and Cuzick (1985)), with a Weibull marginal distribution for each member $j\,(=1,2)$
and for each failure $k\,(=1,2)$:
(4.1)
where the Weibull density is
$$f(t)=\gamma\rho(\rho t)^{\gamma-1}\exp\{-(\rho t)^{\gamma}\}.$$
The exponential distribution is a special case of the Weibull distribution with $\gamma=1$. In the
simulation, both Weibull marginals ($\gamma\neq1$) and exponential marginals ($\gamma=1$) were
used.
We assume a different baseline for each member in a family and an identical
baseline for the two failure times from the same member. This would be the case,
for example, in a vision loss study involving husbands and wives, where we treat
vision loss in each eye as one type of failure. It would be reasonable to assume a
mixed baseline hazard model with different baseline hazards for the husband and the
wife from the same family and an identical baseline hazard for left-eye and right-eye
vision loss. For the multivariate failure time distribution with exponential marginals,
a (constant) baseline $\lambda_{01}=1$ is assumed for Member 1 and a (constant) baseline
$\lambda_{02}=5$ for Member 2, that is, $\gamma=1$ and $\rho=1$ in (4.1) for Member 1 and $\gamma=1$
and $\rho=5$ for Member 2. In the model with Weibull marginals, $\rho=1,\ \gamma=1$ and
$\rho=1,\ \gamma=3$ were used, which resulted in a baseline $\lambda_{01}=1$ for Member 1 and a
baseline $\lambda_{02}(t)=3t^2$ for Member 2.
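The baseline hazards quoted above follow from the Weibull density $\gamma\rho(\rho t)^{\gamma-1}\exp\{-(\rho t)^{\gamma}\}$, whose hazard is $\gamma\rho(\rho t)^{\gamma-1}$. A small Python check of that arithmetic (the function name is chosen here for illustration only):

```python
def weibull_hazard(t, rho, gamma):
    """Hazard gamma * rho * (rho t)^(gamma - 1) of the Weibull density
    f(t) = gamma * rho * (rho t)^(gamma - 1) * exp{-(rho t)^gamma}."""
    return gamma * rho * (rho * t) ** (gamma - 1)


# Exponential marginals: Member 1 (rho = 1, gamma = 1), Member 2 (rho = 5, gamma = 1)
# Weibull marginals:     Member 1 (rho = 1, gamma = 1), Member 2 (rho = 1, gamma = 3)
for t in (0.5, 1.0, 2.0):
    print(t, weibull_hazard(t, 1, 1), weibull_hazard(t, 5, 1), weibull_hazard(t, 1, 3))
```

The first two columns stay at 1 and 5 for every $t$, and the last column equals $3t^2$ (for example 12.0 at $t=2$).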
For the sake of simplicity we focus on the case of a single regression parameter
for the mixed baseline hazard model in (4.1). We chose values of 0 and 0.7 for the true
regression coefficient $\beta_0$, which correspond to relative risks of 1 and approximately
2 ($e^{0.7}\approx2.01$), respectively. Distributions for the scalar covariate were taken to be independent
Bernoulli(0.5) and Normal(0,1) truncated at ±5.
The parameter $\theta$ represents the degree of pairwise dependence of failure times.
When $\beta_0=0$, $\theta\to0$ gives the maximal positive correlation of 1. Independence is
the limiting case $\theta\to\infty$. In the simulations the dependence parameter $\theta$ took the values
0.25, 0.80, 1.50, and 3.00. Table 4.1 shows the observed means of pairwise
correlations of failure times for these $\theta$ values, based on 1,000 simulation runs, each with
a sample size of 1,000 and a N(0,1) covariate truncated at ±5, when the marginal
distribution is exponential or Weibull and no censoring is imposed.
Table 4.1: Observed mean pairwise correlations for different θ values based on 1,000
simulation runs with a sample size of 1,000 for each run and no censoring.
             Exponential              Weibull
θ       β0 = 0    β0 = 0.7      β0 = 0    β0 = 0.7
.25     .936      .423          .854      .474
.80     .711      .321          .622      .344
1.50    .510      .230          .442      .244
3.00    .303      .137          .264      .146
Censoring times $C_{ijk}$ were drawn from a uniform$(0,\tau)$ distribution and were generated
independently of each other and of $T_{ijk}$ and $Z_{ijk}$. The numbers of independent
clusters ($n$) were 50, 100, and 200.
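A minimal Python sketch of how one simulated data set of this kind could be generated is given below. It assumes the Oakes parametrization of the Clayton-Oakes joint survival function, $S(t_{11},\ldots,t_{22})=\{\sum_{j,k}S_{jk}(t_{jk})^{-1/\theta}-3\}^{-\theta}$, which matches the behavior of $\theta$ described above ($\theta\to0$ strongest dependence, $\theta\to\infty$ independence) but is not quoted from (4.1); it draws that distribution through a gamma frailty with shape $\theta$. Under this assumption Kendall's tau between two failure times of the same family equals $1/(1+2\theta)$ when $\beta_0=0$. The function and argument names are hypothetical.

```python
import numpy as np


def simulate_families(n, beta0, theta, rho, gamma, cens_max, rng):
    """Simulate n families with J = 2 members and K = 2 failures per member.

    Assumed joint survival (Clayton-Oakes, theta -> 0 gives strong dependence):
        S(t_11, ..., t_22) = {sum_jk S_jk(t_jk)**(-1/theta) - 3}**(-theta),
    with marginal S_jk(t | z) = exp{-(rho_j t)**gamma_j * exp(beta0 * z)}.
    Columns are ordered (j, k) = (1,1), (1,2), (2,1), (2,2).
    """
    J, K = 2, 2
    # Gamma frailty with shape theta yields the Clayton copula with alpha = 1/theta
    w = rng.gamma(shape=theta, scale=1.0, size=(n, 1))
    e = rng.exponential(size=(n, J * K))
    u = (1.0 + e / w) ** (-theta)            # dependent uniforms on (0, 1)

    # N(0, 1) covariate truncated at +/-5 by redrawing out-of-range values
    z = rng.normal(size=(n, J * K))
    bad = np.abs(z) > 5
    while bad.any():
        z[bad] = rng.normal(size=bad.sum())
        bad = np.abs(z) > 5

    member = np.repeat(np.arange(J), K)
    rho_c = np.asarray(rho, dtype=float)[member]
    gam_c = np.asarray(gamma, dtype=float)[member]
    # Invert each marginal survival function at its dependent uniform
    t = (-np.log(u) * np.exp(-beta0 * z)) ** (1.0 / gam_c) / rho_c

    if cens_max is None:                      # no censoring
        return t, np.ones_like(t), z, t
    c = rng.uniform(0.0, cens_max, size=t.shape)
    return np.minimum(t, c), (t <= c).astype(float), z, t


def kendall_tau(a, b):
    """Kendall's tau-a from all pairwise sign comparisons (O(m^2) memory)."""
    da = np.sign(a[:, None] - a[None, :])
    db = np.sign(b[:, None] - b[None, :])
    m = len(a)
    return (da * db).sum() / (m * (m - 1))


rng = np.random.default_rng(7)
x, delta, z, t = simulate_families(1000, beta0=0.0, theta=0.25, rho=[1.0, 5.0],
                                   gamma=[1.0, 1.0], cens_max=5.0, rng=rng)
tau_hat = kendall_tau(t[:, 0], t[:, 1])   # near 1 / (1 + 2 * 0.25) = 2/3
cens_frac = 1.0 - delta.mean()            # near 0.119 (see below)
```

With exponential marginals ($\lambda_{01}=1$, $\lambda_{02}=5$), $\beta_0=0$ and uniform(0, 5) censoring, the expected censored fraction is $\{(1-e^{-5})/5+(1-e^{-25})/25\}/2\approx0.119$, in line with the 12% censoring reported for $\beta_0=0$ in Table 4.4.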
The larger the number of simulation runs R, the better the precision of the
estimator based on the simulations. Table 4.2 gives simulation results for R = 500,
1,000, and 10,000 for the configuration with sample size 100, Normal(0,1) covariate,
uniform(0,5) censoring, and θ = 0.25. To balance the precision of the estimator against
the simulation time required, we carried out 500 simulation runs (R = 500) for each
simulation configuration.
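As a rough guide to the Monte Carlo precision behind the choice R = 500, the standard binomial calculation (not taken from the thesis) gives the standard error of an estimated coverage probability $p$ from $R$ independent runs as $\sqrt{p(1-p)/R}$. A short Python sketch, with a function name chosen here for illustration:

```python
import math


def coverage_mc_se(p, runs):
    """Binomial Monte Carlo standard error of an estimated coverage probability."""
    return math.sqrt(p * (1.0 - p) / runs)


for runs in (500, 1000, 10000):
    se90 = coverage_mc_se(0.90, runs)
    se95 = coverage_mc_se(0.95, runs)
    print(f"R = {runs:5d}: SE at 90% = {se90:.4f}, SE at 95% = {se95:.4f}")
```

With R = 500 the standard error at a nominal 95% level is about 0.0097, so an observed 95% CP between roughly .93 and .97 lies within two Monte Carlo standard errors of the nominal level.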
Table 4.2: Simulation results for n = 100, Normal(0,1) covariate, uniform(0,5) censoring distribution (12% censoring for
β0 = 0 and 14% censoring for β0 = 0.7), θ = 0.25, and exponential marginals.
β0    R         mean β̂m   var(β̂m)   mean V̂
0     500        .001      .0027     .0028
      1,000      .001      .0031     .0028
      10,000    -.000      .0030     .0028
.7    500        .708      .0061     .0056
      1,000      .708      .0061     .0057
4.3
Summary Statistics
The Newton-Raphson iterative procedure was used in each simulation run to obtain
the estimators from the pseudo score equation (2.10). R random samples of size n
were generated from the multivariate Clayton-Oakes distribution (4.1) for each of the
simulation configurations. For each random sample of size n, the maximum partial
likelihood estimator of $\beta_0$, $\hat\beta_r$, and the robust variance estimator, $\hat V_r$, $r=1,\ldots,R$,
were obtained. Notice that here we use $\hat V_r$ to denote the scalar version of the covariance matrix
estimator in (3.9). From these estimators we calculated the Wald-type 90% and 95% confidence
intervals. The simulation results are summarized within each configuration by the
following: mean $\hat\beta_m$, mean $\hat V$, and the estimated Wald-type coverage probability.
More specifically,
$$\text{mean }\hat\beta_m=\frac{1}{R}\sum_{r=1}^{R}\hat\beta_r,\qquad\text{mean }\hat V=\frac{1}{R}\sum_{r=1}^{R}\hat V_r,$$
$$90\%\ \text{CP}=\frac{1}{R}\sum_{r=1}^{R}I\left\{\hat\beta_r-1.645\sqrt{\hat V_r}<\beta_0<\hat\beta_r+1.645\sqrt{\hat V_r}\right\},$$
and
$$95\%\ \text{CP}=\frac{1}{R}\sum_{r=1}^{R}I\left\{\hat\beta_r-1.96\sqrt{\hat V_r}<\beta_0<\hat\beta_r+1.96\sqrt{\hat V_r}\right\}.$$
To estimate the true variance of $\hat\beta_m$ for the mixed baseline models used in the simulations, we calculated the sampling variance within each configuration as
$$\mathrm{var}(\hat\beta_m)=\frac{1}{R-1}\sum_{r=1}^{R}\{\hat\beta_r-\text{mean}(\hat\beta_m)\}^2.$$
Note that good agreement between mean $\hat V$ and $\mathrm{var}(\hat\beta_m)$, which occurs when the ratio
of these two statistics is close to 1, indicates that the robust variance estimator is
a good estimator of the true variance. The estimated coverage probability CP is a
summary measure of the bias of the parameter estimator $\hat\beta_m$, the bias of the robust
variance estimator, and the adequacy of the normal approximation of the parameter
estimator. Hence, CP assesses the normality of the estimator if both the parameter
estimator and its variance estimator are unbiased.
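The summary statistics defined above can be computed from the R pairs $(\hat\beta_r,\hat V_r)$ of one configuration as in the following Python sketch; the function name and the four-run input are hypothetical and serve only to show the formulas at work.

```python
import numpy as np


def summarize_runs(beta_hats, v_hats, beta0):
    """Summary statistics of Section 4.3 for one simulation configuration."""
    b = np.asarray(beta_hats, dtype=float)
    v = np.asarray(v_hats, dtype=float)
    se = np.sqrt(v)

    def coverage(zq):
        # Proportion of Wald intervals b_r +/- zq * sqrt(V_r) that contain beta0
        return float(np.mean((b - zq * se < beta0) & (beta0 < b + zq * se)))

    return {
        "mean_beta": float(b.mean()),
        "var_beta": float(b.var(ddof=1)),           # sampling variance, divisor R - 1
        "mean_V": float(v.mean()),
        "ratio": float(v.mean() / b.var(ddof=1)),   # close to 1 means good agreement
        "cp90": coverage(1.645),
        "cp95": coverage(1.96),
    }


# Tiny hypothetical input with R = 4 runs and true beta0 = 0.7
summary = summarize_runs([0.6, 0.7, 0.8, 1.0], [0.01] * 4, beta0=0.7)
print(summary)
```

Here three of the four intervals contain $\beta_0$, so both coverage estimates equal 0.75, and the sampling variance is $0.0875/3\approx0.029$.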
4.4
Results and Discussion
The simulation results are displayed in Tables 4.3 to 4.8. In these tables β0 and θ
refer to the true regression parameter and dependence parameter of the underlying
distributions, and n is the sample size for each simulation run. Each row in the tables
is based on 500 simulation runs.
These simulation results suggest that the estimator $\hat\beta_m$ is approximately unbiased for each simulation configuration. The sandwich-type robust variance estimator
appears to be a good estimator of the true variance, as judged by the good agreement between $\mathrm{var}(\hat\beta_m)$ and mean $\hat V$ for sample sizes of 100 or larger.
The empirical Wald-type coverage probabilities are close to the nominal levels. This is true for all combinations of
the true regression coefficients, the degree of correlation among failure times, the censoring probabilities considered, the covariate distribution, and both the exponential
marginals and the Weibull marginals.
Tables 4.3, 4.4, and 4.5 compare different amounts of censoring for
failure times with a Normal(0,1) distributed covariate and exponential marginals.
Table 4.3 gives the summary statistics of the simulation results when there is no
censoring. The simulation results with censoring are shown in Tables 4.4 and 4.5.
The Uniform(0,5) censoring distribution in Table 4.4 resulted in 12% and 14% of the
data being censored when β0 = 0 and β0 = 0.7, respectively. The Uniform(0,1) censoring
distribution in Table 4.5 led to about 42% of the data being censored. It appears from these
tables that as the censoring proportion increases, the variances of $\hat\beta_m$ become larger.
Comparing the results from Table 4.4 and Table 4.6, we see that the variances
of $\hat\beta_m$ for the Weibull marginals are somewhat larger than those for the exponential
marginals, consistently for each combination of β0, θ, and sample size.
Tables 4.4 and 4.7 indicate that the variances of $\hat\beta_m$, $\mathrm{var}(\hat\beta_m)$, with the
Bernoulli(0.5) covariate are about three times larger than those with the Normal(0,1)
covariate for different θ and sample sizes, when β0 = 0.7 and the failure times
have exponential marginals. When n < 100, $\mathrm{var}(\hat\beta_m)$ is somewhat larger than
mean $\hat V$, and it appears that we need a sample size of n = 100 or larger for better variance
estimates with the Bernoulli(0.5) covariate distribution.
The results in Table 4.8 show that, with a sample size of 100, the variance of $\hat\beta_m$
for failure times with Weibull marginals is larger than that for failure
times with exponential marginals when the Bernoulli(0.5) covariate distribution and
uniform(0,5) censoring distribution are used.
The simulation results thus indicate that the proposed large-sample approximations are quite reasonable for practical applications. Further numerical studies are
needed to study the impact of heterogeneous associations among failure times.
Table 4.3: Simulation results (based on 500 simulation runs) for Normal(0,1) covariate, no censoring, and exponential marginals.
β0    θ      n     mean β̂m   var(β̂m)   mean V̂   90% CP   95% CP
0     .25    50     .001      .0055     .0047    .862     .922
             100    .000      .0024     .0024    .902     .942
             200   -.001      .0012     .0012    .894     .948
      .80    50     .001      .0057     .0049    .866     .930
             100    .000      .0025     .0024    .888     .936
             200   -.002      .0012     .0012    .890     .936
      1.50   50     .001      .0058     .0050    .862     .920
             100   -.000      .0026     .0025    .876     .930
             200   -.002      .0013     .0012    .892     .930
      3.00   50     .002      .0058     .0050    .866     .920
             100   -.000      .0027     .0025    .874     .926
             200   -.002      .0013     .0012    .894     .936
.7    .25    50     .717      .0125     .0103    .858     .934
             100    .709      .0061     .0053    .872     .920
             200    .703      .0028     .0027    .894     .956
      .80    50     .711      .0100     .0086    .862     .926
             100    .707      .0049     .0044    .864     .928
             200    .702      .0023     .0023    .892     .956
      1.50   50     .708      .0089     .0078    .872     .936
             100    .705      .0043     .0040    .882     .936
             200    .701      .0020     .0020    .894     .946
      3.00   50     .706      .0081     .0073    .882     .936
             100    .704      .0039     .0037    .882     .938
             200    .700      .0019     .0019    .900     .944
Table 4.4: Simulation results (based on 500 simulation runs) for Normal(0,1) covariate, uniform(0,5) censoring distribution (12% censoring for
β0 = 0 and 14% censoring for β0 = 0.7), and exponential marginals.
β0    θ      n     mean β̂m   var(β̂m)   mean V̂   90% CP   95% CP
0     .25    50     .005      .0056     .0055    .902     .944
             100    .001      .0027     .0028    .904     .962
             200   -.001      .0013     .0014    .922     .962
      .80    50     .001      .0055     .0056    .900     .956
             100    .000      .0028     .0029    .900     .954
             200   -.001      .0013     .0014    .908     .968
      1.50   50    -.001      .0055     .0056    .898     .950
             100    .000      .0028     .0029    .900     .956
             200   -.001      .0014     .0014    .912     .956
      3.00   50    -.002      .0056     .0057    .896     .946
             100    .000      .0028     .0029    .896     .960
             200   -.001      .0014     .0014    .900     .954
.7    .25    50     .713      .0119     .0109    .896     .936
             100    .708      .0061     .0056    .874     .944
             200    .701      .0026     .0029    .910     .952
      .80    50     .707      .0098     .0091    .878     .932
             100    .706      .0049     .0048    .890     .932
             200    .701      .0023     .0024    .904     .952
      1.50   50     .703      .0090     .0084    .884     .934
             100    .704      .0043     .0044    .890     .946
             200    .700      .0022     .0022    .898     .944
      3.00   50     .701      .0084     .0080    .880     .946
             100    .704      .0040     .0040    .900     .942
             200    .700      .0021     .0020    .898     .950
Table 4.5: Simulation results (based on 500 simulation runs) for Normal(0,1) covariate, uniform(0,1) censoring distribution (42% censoring), and exponential marginals.
β0    θ      n     mean β̂m   var(β̂m)   mean V̂   90% CP   95% CP
0     .25    50     .004      .0018     .0085    .914     .958
             100    .001      .0041     .0043    .902     .958
             200   -.000      .0021     .0021    .912     .956
      .80    50     .001      .0083     .0086    .902     .954
             100    .000      .0044     .0043    .900     .950
             200   -.000      .0022     .0021    .896     .962
      1.50   50     .000      .0086     .0081    .888     .952
             100    .000      .0046     .0044    .890     .938
             200    .000      .0022     .0021    .894     .944
      3.00   50    -.004      .0081     .0081    .900     .952
             100   -.000      .0046     .0044    .904     .944
             200   -.000      .0023     .0021    .882     .938
.7    .25    50     .708      .0131     .0134    .910     .948
             100    .706      .0013     .0068    .902     .942
             200    .699      .0033     .0034    .912     .946
      .80    50     .702      .0115     .0111    .902     .950
             100    .705      .0061     .0060    .898     .938
             200    .700      .0030     .0030    .898     .950
      1.50   50     .700      .0112     .0111    .896     .952
             100    .703      .0056     .0051    .894     .936
             200    .700      .0029     .0028    .906     .952
      3.00   50     .698      .0109     .0109    .888     .950
             100    .703      .0054     .0055    .904     .948
             200    .699      .0029     .0021    .896     .948
Table 4.6: Simulation results (based on 500 simulation runs) for Normal(0,1) covariate, uniform(0,5) censoring distribution (19% censoring for β0 = 0
and 21% censoring for β0 = 0.7), and Weibull marginals.
β0    θ      n     mean β̂m   var(β̂m)   mean V̂   90% CP   95% CP
0     .25    50     .005      .0060     .0060    .890     .950
             100    .001      .0029     .0031    .900     .956
             200   -.000      .0014     .0015    .914     .958
      .80    50     .001      .0061     .0061    .904     .954
             100    .001      .0030     .0031    .910     .950
             200   -.000      .0014     .0015    .918     .962
      1.50   50    -.000      .0062     .0062    .894     .958
             100    .000      .0030     .0031    .910     .950
             200   -.000      .0015     .0015    .910     .962
      3.00   50    -.002      .0062     .0062    .890     .952
             100    .000      .0030     .0031    .900     .952
             200   -.000      .0016     .0015    .896     .944
.7    .25    50     .712      .0124     .0114    .898     .930
             100    .707      .0063     .0059    .884     .934
             200    .701      .0027     .0030    .904     .954
      .80    50     .706      .0103     .0097    .888     .944
             100    .706      .0052     .0050    .892     .928
             200    .700      .0025     .0026    .900     .950
      1.50   50     .702      .0097     .0090    .894     .936
             100    .704      .0047     .0047    .894     .932
             200    .700      .0024     .0023    .894     .952
      3.00   50     .700      .0092     .0086    .904     .946
             100    .703      .0044     .0044    .880     .942
             200    .700      .0023     .0022    .902     .954
Table 4.7: Simulation results (based on 500 simulation runs) for β0 = 0.7,
Bernoulli(0.5) covariate, uniform(0,5) censoring distribution (9% censoring), and
exponential marginals.
θ      n     mean β̂m   var(β̂m)   mean V̂   90% CP   95% CP
.25    50     .713      .0314     .0275    .876     .934
       100    .704      .0153     .0139    .866     .924
       200    .701      .0074     .0071    .902     .948
.80    50     .704      .0300     .0257    .872     .936
       100    .700      .0146     .0129    .882     .938
       200    .699      .0067     .0066    .894     .952
1.50   50     .702      .0283     .0248    .870     .934
       100    .698      .0139     .0124    .884     .932
       200    .698      .0063     .0063    .892     .948
3.00   50     .702      .0269     .0241    .878     .944
       100    .698      .0131     .0121    .880     .936
       200    .697      .0060     .0061    .896     .960
Table 4.8: Simulation results (based on 500 simulation runs) for sample size of 100,
Bernoulli(0.5) covariate, and uniform(0,5) censoring distribution (12% censoring for
β0 = 0 and 9% censoring for β0 = 0.7 with exponential marginals, and 19% censoring
for β0 = 0 and 16% censoring for β0 = 0.7 with Weibull marginals).
β0    θ      marginals     mean β̂m   var(β̂m)   mean V̂   90% CP   95% CP
0     .25    Exponential   -.006      .0120     .0112    .878     .938
             Weibull       -.005      .0127     .0122    .888     .938
      .80    Exponential   -.009      .0123     .0112    .874     .932
             Weibull       -.008      .0130     .0122    .872     .930
      1.50   Exponential   -.008      .0123     .0113    .888     .940
             Weibull       -.008      .0128     .0123    .874     .940
      3.00   Exponential   -.008      .0120     .0113    .892     .954
             Weibull       -.009      .0129     .0123    .886     .948
.7    .25    Exponential    .704      .0153     .0139    .866     .924
             Weibull        .704      .0160     .0148    .872     .932
      .80    Exponential    .700      .0146     .0129    .882     .938
             Weibull        .700      .0153     .0139    .888     .928
      1.50   Exponential    .698      .0139     .0124    .884     .932
             Weibull        .698      .0146     .0134    .890     .938
      3.00   Exponential    .698      .0131     .0121    .880     .936
             Weibull        .697      .0140     .0130    .880     .938
Chapter 5
Example
5.1
Introduction
In this chapter we illustrate the methodology using data from the Framingham Heart
Study. In Section 5.2 we describe the data and the model that was fit. The estimates
and a brief discussion are presented in Section 5.3.
5.2
Data and Model
The Framingham Heart Study was undertaken to determine the risk factors for coronary heart disease (CHD) and other atherosclerotic disorders (Dawber (1980)). It
began in 1948. At that time Framingham was a small self-contained community of
approximately 28,000 white inhabitants, 18 miles west of Boston. The study included
2,336 men and 2,873 women aged between 30 and 62 at their baseline examination.
The sampling unit was the family. The members of the cohort were invited for an examination every two years for the duration of this 30-year study. At each of the biennial
visits a medical history, physical examination, blood chemistries, and other laboratory work were completed. Symptoms of illness that had developed since the previous
visit were reviewed, and interim hospitalizations or medical visits were registered. Besides demographic information, risk factors such as blood pressure, cholesterol levels,
weight, alcohol use, gout, diabetes mellitus, hemoglobin, smoking behavior, and so
on, were recorded at baseline and throughout the follow-up period.
The Framingham Heart Study data set collected over the course of thirty years
of follow-up examinations is sizeable. For simplicity, in this example we restrict our
focus to the time to the first evidence of cerebrovascular accident (CVA) and the
time to the first evidence of CHD. The data set used in this example includes all
participants in the study who had an examination at age 44 or 45 and were disease-free at that examination. By disease-free we mean no prior history of hypertension or
glucose intolerance and no prior CHD or CVA. The time origin is the
time of the examination at which an individual entered the sample. Because some
individuals were in the study several years prior to inclusion in the data set, the
waiting time (mean 7.4 years) from entering the study to reaching 44 or 45 years of
age was used as a covariate to account for the cohort effect.
Of the 1,571 disease-free individuals, 233 later experienced CHD but not CVA,
34 CVA but not CHD, and 17 both CHD and CVA. Three hundred and ten participants were siblings: 113 groups with a sibship size of 2, 24 of size 3, and 3 of size 4
(J = 4). We considered a mixed baseline model with different baselines for CHD and
for CVA and an identical baseline for sibling members. The risk factors of interest
were gender (59% females), systolic blood pressure (mean 122 mm Hg), body mass
index (mean 24.8 kg/m²), cholesterol level (mean 231 mg/dL), and cigarette smoking
(63% smokers). The values of risk factors were taken from the biennial examination
at which an individual was entered into the sample. Because of the small number of
CVA events, we used the same regression coefficients for both CHD and CVA. For
illustrative purposes, however, the gender effect was treated as failure-specific. More
specifically, the mixed marginal hazard model was
$$\lambda_{ijk}(t;Z_{ijk})=\lambda_{0k}(t)\exp\{\beta'Z_{ijk}\},$$
where $k=1$ for CHD, $k=2$ for CVA,
$$Z_{ij1}=(\text{Smoking},\ \text{Cholesterol},\ \text{BMI},\ \text{SBP},\ \text{Waiting time},\ \text{Gender},\ 0)',$$
and
$$Z_{ij2}=(\text{Smoking},\ \text{Cholesterol},\ \text{BMI},\ \text{SBP},\ \text{Waiting time},\ 0,\ \text{Gender})'.$$
5.3
Results and Discussion
The analysis results for the mixed baseline hazard model are shown in Table 5.1.
The p-values are from the Wald test. We conclude from the analysis that cigarette
smoking, higher blood cholesterol level, higher body mass index, and higher systolic blood pressure
significantly increase the hazard rate for CHD and CVA, and that being female
significantly reduces the hazard rate of CHD but not of CVA. The failure to establish
an association between gender and the time to CVA may reflect the small number of
CVA events in the data.
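The Wald p-values in Table 5.1 can be checked approximately from the reported estimates and standard errors via $p=2\{1-\Phi(|\hat\beta/\mathrm{SE}|)\}$. The Python sketch below uses only the standard library; the dictionary labels are chosen here. Because the table entries are rounded to three decimals, recomputed values agree with the published p-values only to within that rounding, and the cholesterol row (0.004/0.001) is left out because its rounding is too coarse for a meaningful check.

```python
import math


def wald_p_value(estimate, se):
    """Two-sided p-value of the Wald statistic estimate / se under N(0, 1)."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * {1 - Phi(|z|)}


table_5_1 = {
    "Smoking status": (0.360, 0.134),
    "Body mass index": (0.039, 0.016),
    "Systolic blood pressure": (0.017, 0.005),
    "Waiting time": (0.006, 0.019),
    "Gender, CHD": (-0.616, 0.132),
    "Gender, CVA": (-0.308, 0.280),
}
p_values = {name: wald_p_value(b, se) for name, (b, se) in table_5_1.items()}
for name, p in p_values.items():
    print(f"{name:25s} p = {p:.3f}")
```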
Klein, Moeschberger, Li and Wang (1992) also analyzed this data set using
frailty models. Because a common frailty could not capture both between-subject
dependence and within-subject dependence concurrently, they had to perform three
separate analyses: one analysis of CHD and CVA events that ignored the dependence among siblings, and two further analyses of either the CHD event or the CVA event,
but not both, that considered the association between failure times among siblings. Ignoring dependence among failure times, however, may result in erroneous
conclusions, and separate analyses for CHD and CVA preclude the direct estimation of
a covariate effect common to both CHD and CVA.
Because individuals' susceptibilities to CHD are different from their susceptibilities to CVA, and there are no physiological differences among siblings, neither the
common baseline hazard model nor the distinct baseline hazard model is appropriate
here. The mixed baseline hazard model offers greater modeling flexibility and applicability and allows us to handle a problem that currently existing methods cannot
deal with.
Table 5.1: Estimates of regression parameters for the Framingham Heart Study data.
Effect                               β̂         SE       p-value
Smoking status (Yes=1, No=0)         0.360     0.134    0.007
Cholesterol (mg/dL)                  0.004     0.001    0.006
Body mass index (kg/m²)              0.039     0.016    0.016
Systolic blood pressure (mm Hg)      0.017     0.005    < 0.001
Waiting time (year)                  0.006     0.019    0.770
Gender (Female=1, Male=0)
   CHD                              -0.616     0.132    < 0.001
   CVA                              -0.308     0.280    0.272
Chapter 6
Misspecification of Marginal Hazard Models
6.1
Introduction
In the proof of the asymptotic distributions of the regression parameter estimators in
Chapter 3, it is critical that the marginal hazard functions $\lambda_{ijk}(t;Z)$ are correctly
specified. In this chapter we study the consequences of misspecifying $\lambda_{ijk}(t;Z)$.
The misspecification of a marginal hazard model may happen in a variety of ways.
First, the model family may be misspecified. For example, the Cox regression hazard
model is assumed for analysis when, in fact, the accelerated life model holds. Second, even if the choice of the Cox hazard model is correct, the wrong functional
form for the regression portion, $\exp\{\beta'Z\}$, may be assumed, which includes omitting
important covariates from the model or choosing the wrong functional form for a
covariate. The baseline type may also be misspecified even when the Cox model is
applicable and the correct functional form of $\exp\{\beta'Z\}$ is used; for example, a common
baseline hazard model is assumed when a mixed baseline hazard model should be used.
As in Chapter 3, we confine our attention in this chapter to the mixed baseline hazard model, since the technical development for the mixed model combines the features of both the distinct baseline hazard model and the common baseline hazard model. In Section 6.2 we derive the asymptotic properties of the estimator under an assumed mixed baseline model that is possibly misspecified. We then give, without proof for the sake of conciseness, the asymptotic properties of the estimator under an assumed distinct baseline hazard model and under an assumed common baseline hazard model. In Section 6.3 we apply the general results developed in Section 6.2 to some special cases, including misspecification of the type of baseline hazard function for the Cox model when the functional form of the regression portion, $\exp\{\beta' Z\}$, is correct.
To avoid imposing further conditions, we assume in this chapter that the covariate, counting, and censoring processes are independent and identically distributed across clusters. In other words, $(X_{ijk}, \delta_{ijk}, Z_{ijk})$, $i = 1, \ldots, n$, are $n$ iid realizations of $(X_{1jk}, \delta_{1jk}, Z_{1jk})$ for $j = 1, \ldots, J$ and $k = 1, \ldots, K$.
We assume in the sequel that the true underlying marginal hazard function for $(i, j, k)$ is $\lambda_{ijk}(t)$. Note that the true marginal hazard model $\lambda_{ijk}(t)$ need not even belong to the Cox regression model family. In addition to the notation $S^{(d)}_{jk}(\beta, t)$, $S^{(d)}_{j\cdot}(\beta, t)$, $s^{(d)}_{jk}(\beta, t) = E\{S^{(d)}_{jk}(\beta, t)\}$ ($d = 0, 1, 2$), $V_j(\beta, t)$, etc., introduced in Chapter 3 under an assumed marginal baseline hazard model, we hereafter use the following notation for the true underlying marginal hazard function $\lambda_{ijk}(t)$:
$$S^{(d)}_{j\cdot}(t) = \sum_{k=1}^{K} S^{(d)}_{jk}(t), \qquad s^{(d)}_{jk}(t) = E\{S^{(d)}_{jk}(t)\}, \qquad d = 0, 1, 2,$$
and
$$s^{(d)}_{j\cdot}(t) = \sum_{k=1}^{K} s^{(d)}_{jk}(t), \qquad d = 0, 1, 2,$$
where the expectations are taken with respect to the true model of $(i, j, k)$.
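6.2 Asymptotic Properties of Regression Estimators

6.2.1 Mixed Baseline Hazard Models

Let $\hat\beta$ be defined as the solution to the pseudo partial likelihood score equations (cf. 2.3)
$$\sum_{k=1}^{K}\sum_{j=1}^{J}\sum_{i=1}^{n}\int_0^{\tau}\left\{Z_{ijk}(u) - \frac{S^{(1)}_{j\cdot}(\beta, u)}{S^{(0)}_{j\cdot}(\beta, u)}\right\} dN_{ijk}(u) = 0 \tag{6.1}$$
under an assumed mixed baseline hazard model, $\lambda_{ijk}(t) = \lambda_{0j}(t)\exp\{\beta' Z_{ijk}(t)\}$.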
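The estimating equation (6.1) can be solved by Newton-Raphson, with risk sets formed separately for each member index $j$ and pooled over clusters $i$ and failure types $k$. The sketch below assumes time-fixed covariates, at most one failure per $(i, j, k)$ record, and no tied failure times; the function names and the flat one-row-per-record layout are illustrative choices, not part of the original development. The cluster index does not enter the point estimate; it matters only for the variance.

```python
import numpy as np

def mixed_score_and_info(beta, time, event, Z, member):
    """Score U(beta) of (6.1) and minus its derivative, for the mixed baseline model.

    Each row is one (i, j, k) record: observed time X_ijk, indicator delta_ijk,
    time-fixed covariates Z_ijk (a simplifying assumption), and member index j.
    Risk sets are built within each j, pooling all clusters i and failure types k.
    """
    p = Z.shape[1]
    U = np.zeros(p)
    info = np.zeros((p, p))
    risk = np.exp(Z @ beta)
    for j in np.unique(member):
        idx = np.flatnonzero(member == j)
        t, d, z, r = time[idx], event[idx], Z[idx], risk[idx]
        for a in np.flatnonzero(d == 1):
            at_risk = t >= t[a]                      # Y(u) = 1 at this failure time
            w, zr = r[at_risk], z[at_risk]
            s0 = w.sum()                             # n * S_j.^(0)(beta, u)
            s1 = w @ zr                              # n * S_j.^(1)(beta, u)
            s2 = (zr * w[:, None]).T @ zr            # n * S_j.^(2)(beta, u)
            ebar = s1 / s0
            U += z[a] - ebar
            info += s2 / s0 - np.outer(ebar, ebar)   # V_j(beta, u)
    return U, info

def fit_mixed(time, event, Z, member, tol=1e-10, max_iter=50):
    """Newton-Raphson solution of U(beta) = 0, starting from beta = 0."""
    beta = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        U, info = mixed_score_and_info(beta, time, event, Z, member)
        step = np.linalg.solve(info, U)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Because the log pseudo partial likelihood is concave in $\beta$, the Newton iteration from zero typically converges in a handful of steps for well-behaved data.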
Theorem 6.1 (Consistency of $\hat\beta$ under a possibly misspecified marginal mixed baseline hazard model) Let $\beta^*$ be the unique solution to the system of equations $q(\beta) = 0$, where
$$q(\beta) = \sum_{j=1}^{J}\int_0^{\tau}\left\{s^{(1)}_{j\cdot}(t) - \frac{s^{(1)}_{j\cdot}(\beta, t)}{s^{(0)}_{j\cdot}(\beta, t)}\, s^{(0)}_{j\cdot}(t)\right\} dt. \tag{6.2}$$
Suppose that the following conditions hold:
Condition 1. For $j = 1, \ldots, J$ and $k = 1, \ldots, K$,
$$E\Big\{\sup_{t\in[0,\tau]} Y_{1jk}(t)\,|Z_{1jk}(t)|^2\,\lambda_{1jk}(t)\Big\} < \infty.$$
There exists a neighborhood $B$ of $\beta^*$ such that
$$E\Big\{\sup_{t\in[0,\tau],\,\beta\in B} Y_{1jk}(t)\,|Z_{1jk}(t)|^2\, e^{\beta' Z_{1jk}(t)}\Big\} < \infty,$$
$\lambda_{0j}(t) \ge 0$, and $\int_0^{\tau}\lambda_{0j}(t)\,dt < \infty$.
Condition 2. $\Pr\{Y_{1jk}(t) = 1\ \forall t \in [0, \tau]\} > 0$.
Condition 3. The matrices
$$I_j(\beta^*) = \sum_{k=1}^{K}\int_0^{\tau} v_j(\beta^*, t)\, s^{(0)}_{jk}(t)\, dt$$
are positive definite for all $j = 1, \ldots, J$.
Then the maximum partial likelihood estimator $\hat\beta$ is a consistent estimator of $\beta^*$.
Proof: This proof follows the argument of the proof of Theorem 3.1. First, following the proof of Theorem 3.3, it is easy to show that Conditions 1 and 2 imply that, for $j = 1, \ldots, J$, $k = 1, \ldots, K$, and $d = 0, 1, 2$,
$$\sup_{t\in[0,\tau]}\big\|S^{(d)}_{jk}(t) - s^{(d)}_{jk}(t)\big\| \xrightarrow{p} 0 \tag{6.3}$$
and that there exists a neighborhood $B$ of $\beta^*$ such that
$$\sup_{t\in[0,\tau],\,\beta\in B}\big\|S^{(d)}_{jk}(\beta, t) - s^{(d)}_{jk}(\beta, t)\big\| \xrightarrow{p} 0 \tag{6.4}$$
as $n \to \infty$; moreover, $s^{(d)}_{jk}(t)$ is bounded on $[0, \tau]$ and $s^{(d)}_{jk}(\beta, t)$ on $B \times [0, \tau]$, and $s^{(0)}_{j\cdot}(t)$ is bounded away from zero uniformly on $[0, \tau]$, as is $s^{(0)}_{j\cdot}(\beta, t)$ on $B \times [0, \tau]$.
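Consider the process
$$G(\beta, t) = \frac{1}{n}\{\ell(\beta, t) - \ell(\beta^*, t)\} = \sum_{j=1}^{J}\sum_{k=1}^{K} G_{jk}(\beta, t),$$
where $\ell(\beta, t) = \sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{t}\{\beta' Z_{ijk}(u) - \log S^{(0)}_{j\cdot}(\beta, u)\}\, dN_{ijk}(u)$ is the log pseudo partial likelihood (up to a term free of $\beta$), so that
$$G_{jk}(\beta, t) = \frac{1}{n}\sum_{i=1}^{n}\int_0^{t}\left\{(\beta - \beta^*)' Z_{ijk}(u) - \log\frac{S^{(0)}_{j\cdot}(\beta, u)}{S^{(0)}_{j\cdot}(\beta^*, u)}\right\} dN_{ijk}(u).$$
Define the process
$$\tilde G(\beta, t) = \sum_{j=1}^{J}\sum_{k=1}^{K}\tilde G_{jk}(\beta, t),$$
where $\tilde G_{jk}(\beta, t)$ is $G_{jk}(\beta, t)$ with $S^{(0)}_{j\cdot}$ replaced by $s^{(0)}_{j\cdot}$:
$$\tilde G_{jk}(\beta, t) = \frac{1}{n}\sum_{i=1}^{n}\int_0^{t}\left\{(\beta - \beta^*)' Z_{ijk}(u) - \log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta^*, u)}\right\} dN_{ijk}(u).$$
Then
$$G_{jk}(\beta, \tau) - \tilde G_{jk}(\beta, \tau) = \int_0^{\tau}\left\{\log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta^*, u)} - \log\frac{S^{(0)}_{j\cdot}(\beta, u)}{S^{(0)}_{j\cdot}(\beta^*, u)}\right\} d\left(\frac{N_{\cdot jk}(u)}{n}\right).$$
Based on (6.4) and the boundedness of $s^{(0)}_{j\cdot}(\beta, u)$ away from zero, for every $u \in [0, \tau]$ and $\beta \in B$,
$$\left|\log\frac{S^{(0)}_{j\cdot}(\beta, u)}{S^{(0)}_{j\cdot}(\beta^*, u)} - \log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta^*, u)}\right| \le \left|\log\frac{S^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta, u)}\right| + \left|\log\frac{S^{(0)}_{j\cdot}(\beta^*, u)}{s^{(0)}_{j\cdot}(\beta^*, u)}\right| \xrightarrow{p} 0$$
as $n \to \infty$. Also, from (1) in Lemma 3.1,
$$\Pr\left\{\frac{N_{\cdot jk}(\tau)}{n} \ge \eta\right\} \le \frac{b}{\eta} + \Pr\left\{\int_0^{\tau} S^{(0)}_{jk}(u)\,du \ge b\right\}.$$
Choose $b > \int_0^{\tau} s^{(0)}_{jk}(u)\,du$; then, in view of (6.3), we have
$$\Pr\left\{\int_0^{\tau} S^{(0)}_{jk}(u)\,du \ge b\right\} \to 0$$
as $n \to \infty$. As a consequence of the boundedness of $s^{(0)}_{jk}(t)$, $b/\eta \to 0$ as $\eta \to \infty$. Thus,
$$\lim_{\eta\to\infty}\lim_{n\to\infty}\Pr\left\{\frac{N_{\cdot jk}(\tau)}{n} \ge \eta\right\} = 0. \tag{6.5}$$
Therefore, we have $G_{jk}(\beta, \tau) - \tilde G_{jk}(\beta, \tau) \xrightarrow{p} 0$; that is, $G(\beta, \tau)$ is asymptotically equivalent to $\tilde G(\beta, \tau)$.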
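The compensator of $\tilde G_{jk}(\beta, t)$ is
$$\tilde A_{jk}(\beta, t) = \frac{1}{n}\sum_{i=1}^{n}\int_0^{t}\left\{(\beta - \beta^*)' Z_{ijk}(u) - \log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta^*, u)}\right\} Y_{ijk}(u)\lambda_{ijk}(u)\,du.$$
Then, for each $\beta \in B$, the process
$$\tilde G_{jk}(\beta, t) - \tilde A_{jk}(\beta, t) = \frac{1}{n}\sum_{i=1}^{n}\int_0^{t}\left\{(\beta - \beta^*)' Z_{ijk}(u) - \log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta^*, u)}\right\} dM_{ijk}(u)$$
is a locally square integrable martingale in $t$, since the integrand is locally bounded and predictable. The predictable variation process of this martingale at $t$ is
$$H_{jk}(\beta, t) = \frac{1}{n^2}\sum_{i=1}^{n}\int_0^{t}\left\{(\beta - \beta^*)' Z_{ijk}(u) - \log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta^*, u)}\right\}^2 Y_{ijk}(u)\lambda_{ijk}(u)\,du$$
$$= \frac{1}{n}\int_0^{t}\left\{(\beta - \beta^*)' S^{(2)}_{jk}(u)(\beta - \beta^*) - 2(\beta - \beta^*)' S^{(1)}_{jk}(u)\log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta^*, u)} + \left(\log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta^*, u)}\right)^2 S^{(0)}_{jk}(u)\right\} du.$$
It now follows from (6.3), (6.4), and the boundedness of $s^{(d)}_{jk}(t)$ and $s^{(d)}_{j\cdot}(\beta, t)$ that, for each $\beta \in B$, $nH_{jk}(\beta, \tau)$ converges in probability to a finite function of $\beta$. Hence, the inequality of Lenglart ((2) in Lemma 3.1) implies that, for each $\beta \in B$, as $n \to \infty$,
$$\tilde G_{jk}(\beta, \tau) - \tilde A_{jk}(\beta, \tau) \xrightarrow{p} 0.$$
Also, as a result of (6.3), (6.4), and the boundedness of $s^{(d)}_{jk}(t)$ and $s^{(d)}_{j\cdot}(\beta, t)$ ($d = 0, 1$), for each $\beta \in B$, $\tilde A_{jk}(\beta, \tau)$ converges in probability to
$$g_{jk}(\beta) = \int_0^{\tau}\left\{(\beta - \beta^*)' s^{(1)}_{jk}(u) - s^{(0)}_{jk}(u)\log\frac{s^{(0)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta^*, u)}\right\} du.$$
Hence, $G_{jk}(\beta, \tau)$ must converge in probability to $g_{jk}(\beta)$, as long as $\beta \in B$, for $j = 1, \ldots, J$ and $k = 1, \ldots, K$. Therefore,
$$G(\beta, \tau) \xrightarrow{p} g(\beta), \qquad\text{where}\qquad g(\beta) = \sum_{j=1}^{J}\sum_{k=1}^{K} g_{jk}(\beta).$$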
Using the boundedness of $s^{(d)}_{jk}(t)$ and $s^{(d)}_{j\cdot}(\beta, t)$ ($d = 0, 1, 2$), we may evaluate the first and second derivatives of $g(\beta)$ by taking partial derivatives inside the integral [cf. Corollary 5.9 in Bartle (1966)]. Hence, for each $\beta \in B$,
$$\frac{\partial}{\partial\beta} g(\beta) = \sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau}\left\{s^{(1)}_{jk}(u) - \frac{s^{(1)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta, u)}\, s^{(0)}_{jk}(u)\right\} du = \sum_{j=1}^{J}\int_0^{\tau}\left\{s^{(1)}_{j\cdot}(u) - \frac{s^{(1)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta, u)}\, s^{(0)}_{j\cdot}(u)\right\} du = q(\beta),$$
which is $0$ at $\beta = \beta^*$ by the definition of $\beta^*$. Furthermore,
$$\frac{\partial^2}{\partial\beta\,\partial\beta'} g(\beta) = -\sum_{j=1}^{J}\int_0^{\tau}\left[\frac{s^{(2)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta, u)} - \left\{\frac{s^{(1)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta, u)}\right\}^{\otimes 2}\right] s^{(0)}_{j\cdot}(u)\,du = -\sum_{j=1}^{J}\int_0^{\tau} v_j(\beta, u)\, s^{(0)}_{j\cdot}(u)\,du,$$
which is negative definite at $\beta = \beta^*$ by Condition 3. Therefore, $G(\beta, \tau)$ converges pointwise in probability, for each $\beta \in B$, to a concave function $g(\beta)$ on $B$ with a unique maximum at $\beta = \beta^*$. The random function $G(\beta, \tau)$ is also concave and has a unique maximum at $\beta = \hat\beta$ when the maximum exists. By Lemma 3.2, the random concave function $G(\beta, \tau)$ converges to $g(\beta)$ in probability uniformly over $B$. Consequently, the maximizing value $\hat\beta$ of $G(\beta, \tau)$ converges in probability to the maximizing value $\beta^*$ of $g(\beta)$; that is, $\hat\beta \xrightarrow{p} \beta^*$ as $n \to \infty$. □
The foregoing result in Theorem 6.1 is a multivariate failure time generalization of Theorem 2.1 of Struthers and Kalbfleisch (1986) and of (2.1) of Lin and Wei (1989). When the mixed baseline hazard model (6.1) is correct, i.e., $\lambda_{ijk}(t) = \lambda_{0j}(t)\exp\{\beta_0' Z_{ijk}(t)\}$, we have
$$s^{(d)}_{jk}(t) = \lambda_{0j}(t)\, s^{(d)}_{jk}(\beta_0, t), \qquad d = 0, 1.$$
Hence, $\beta^* = \beta_0$ is the solution to $q(\beta) = 0$.
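The asymptotic distribution of the maximum partial likelihood estimator $\hat\beta$ under a possibly misspecified marginal mixed baseline hazard model is given in the next theorem.

Theorem 6.2 (Asymptotic normality of $\hat\beta$ under a possibly misspecified marginal mixed baseline hazard model) Assume that Conditions 1–3 given in Theorem 6.1 are satisfied. In addition, suppose that $\sqrt{n}\{F_{j\cdot:n}(t) - F_{j\cdot}(t)\}$ and $\sqrt{n}\{S^{(0)}_{j\cdot}(\beta^*, t) - s^{(0)}_{j\cdot}(\beta^*, t)\}$ are normal processes with mean 0. Then, as $n \to \infty$,
$$\sqrt{n}(\hat\beta - \beta^*) \xrightarrow{d} N_p\big(0, \Sigma(\beta^*)\big),$$
where
$$\Sigma(\beta) = I^{-1}(\beta)A(\beta)I^{-1}(\beta), \qquad A(\beta) = E\Big\{\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{l=1}^{J}\sum_{g=1}^{K} W_{1jk}(\beta)W_{1lg}'(\beta)\Big\}, \tag{6.6}$$
$$W_{ijk}(\beta) = \int_0^{\tau}\left\{Z_{ijk}(u) - \frac{s^{(1)}_{j\cdot}(\beta, u)}{s^{(0)}_{j\cdot}(\beta, u)}\right\}\left\{dN_{ijk}(u) - \frac{Y_{ijk}(u)\exp\{\beta' Z_{ijk}(u)\}}{s^{(0)}_{j\cdot}(\beta, u)}\, dF_{j\cdot}(u)\right\}, \tag{6.7}$$
$$F_{j\cdot:n}(t) = \sum_{k=1}^{K} F_{jk:n}(t), \qquad F_{j\cdot}(t) = \sum_{k=1}^{K} F_{jk}(t), \qquad I(\beta) = \sum_{j=1}^{J} I_j(\beta),$$
and $I_j(\beta)$ is defined as in Condition 3 of Theorem 6.1.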
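Proof: The pseudo partial likelihood score function under the working mixed baseline hazard model is
$$U(\beta) = \sum_{j=1}^{J} U_j(\beta),$$
where
$$U_j(\beta) = \sum_{i=1}^{n}\sum_{k=1}^{K}\int_0^{\tau}\left\{Z_{ijk}(u) - \frac{S^{(1)}_{j\cdot}(\beta, u)}{S^{(0)}_{j\cdot}(\beta, u)}\right\} dN_{ijk}(u).$$
The first-order Taylor expansion of $U(\hat\beta)$ centered at $\beta^*$ gives
$$U(\hat\beta) = U(\beta^*) + \frac{\partial}{\partial\beta} U(\beta)\Big|_{\beta=\bar\beta}(\hat\beta - \beta^*),$$
where $\bar\beta$ lies on the line segment between $\hat\beta$ and $\beta^*$. Since $U(\hat\beta) = 0$ by the definition of $\hat\beta$, we have
$$\sqrt{n}(\hat\beta - \beta^*) = \hat I^{-1}(\bar\beta)\,\frac{1}{\sqrt{n}} U(\beta^*),$$
where $\hat I(\beta) = -n^{-1}\partial U(\beta)/\partial\beta$. First, we shall show that
$$\frac{1}{\sqrt{n}} U(\beta^*) \xrightarrow{d} N_p\big(0, A(\beta^*)\big).$$
To this end, we shall show that $n^{-1/2}U(\beta^*)$ can be expressed as a sum of independent and identically distributed random vectors
$$\frac{1}{\sqrt{n}} w(\beta^*) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{k=1}^{K} W_{ijk}(\beta^*)$$
plus terms that converge in probability to zero. In other words, $n^{-1/2}U(\beta^*)$ is asymptotically equivalent to $n^{-1/2}w(\beta^*)$ in the sense that
$$\frac{1}{\sqrt{n}}\{U(\beta^*) - w(\beta^*)\} \xrightarrow{p} 0.$$
Let $w_j(\beta) = \sum_{i=1}^{n}\sum_{k=1}^{K} W_{ijk}(\beta)$. With $F_{j\cdot:n}(u) = n^{-1}\sum_{i=1}^{n}\sum_{k=1}^{K} N_{ijk}(u)$, notice that
$$\frac{1}{\sqrt{n}}\{U_j(\beta^*) - w_j(\beta^*)\} = \int_0^{\tau}\left\{\frac{s^{(1)}_{j\cdot}(\beta^*, u)}{s^{(0)}_{j\cdot}(\beta^*, u)} - \frac{S^{(1)}_{j\cdot}(\beta^*, u)}{S^{(0)}_{j\cdot}(\beta^*, u)}\right\} d\big[\sqrt{n}\{F_{j\cdot:n}(u) - F_{j\cdot}(u)\}\big] \tag{6.8}$$
$$\qquad + \sqrt{n}\int_0^{\tau}\left\{\frac{S^{(1)}_{j\cdot}(\beta^*, u)}{S^{(0)}_{j\cdot}(\beta^*, u)} - \frac{s^{(1)}_{j\cdot}(\beta^*, u)}{s^{(0)}_{j\cdot}(\beta^*, u)}\right\}\left\{\frac{S^{(0)}_{j\cdot}(\beta^*, u)}{s^{(0)}_{j\cdot}(\beta^*, u)} - 1\right\} dF_{j\cdot}(u). \tag{6.9}$$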
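The split into the terms (6.8) and (6.9) rests on an elementary identity for the $dF_{j\cdot}$ parts. As a worked check, assuming $F_{j\cdot:n}(u) = n^{-1}\sum_i\sum_k N_{ijk}(u)$ and writing, purely as local shorthand, $S_1, S_0$ for $S^{(1)}_{j\cdot}(\beta^*, u), S^{(0)}_{j\cdot}(\beta^*, u)$ and $s_1, s_0$ for their limits:
$$\left(\frac{s_1}{s_0}-\frac{S_1}{S_0}\right)+\frac{S_1-(s_1/s_0)\,S_0}{s_0} = \frac{S_1}{s_0}-\frac{S_1}{S_0}-\frac{s_1S_0}{s_0^{2}}+\frac{s_1}{s_0} = \left(\frac{S_1}{S_0}-\frac{s_1}{s_0}\right)\left(\frac{S_0}{s_0}-1\right).$$
The first factor on the right is $o_p(1)$ uniformly in $u$ by (6.4), and the second is $O_p(n^{-1/2})$, which is why the product term (6.9) vanishes even after multiplication by $\sqrt{n}$.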
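Now, $n^{1/2}\{F_{j\cdot:n}(t) - F_{j\cdot}(t)\}$ converges weakly to a zero-mean normal process by assumption. As a consequence of the tightness of this normal process and the boundedness of $s^{(d)}_{j\cdot}(\beta, t)$, (6.8) vanishes as $n \to \infty$. We also have
$$\|(6.9)\| \le \sup_{u\in[0,\tau]}\left\|\frac{S^{(1)}_{j\cdot}(\beta^*, u)}{S^{(0)}_{j\cdot}(\beta^*, u)} - \frac{s^{(1)}_{j\cdot}(\beta^*, u)}{s^{(0)}_{j\cdot}(\beta^*, u)}\right\| \int_0^{\tau}\sqrt{n}\left|\frac{S^{(0)}_{j\cdot}(\beta^*, u)}{s^{(0)}_{j\cdot}(\beta^*, u)} - 1\right| dF_{j\cdot}(u) = o_p(1)$$
because
$$\sup_{u\in[0,\tau]}\left\|\frac{S^{(1)}_{j\cdot}(\beta^*, u)}{S^{(0)}_{j\cdot}(\beta^*, u)} - \frac{s^{(1)}_{j\cdot}(\beta^*, u)}{s^{(0)}_{j\cdot}(\beta^*, u)}\right\| = o_p(1)$$
in view of (6.4) and the boundedness of $s^{(d)}_{j\cdot}(\beta, u)$, while
$$\sup_{u\in[0,\tau]}\sqrt{n}\,\big|S^{(0)}_{j\cdot}(\beta^*, u) - s^{(0)}_{j\cdot}(\beta^*, u)\big| = O_p(1)$$
from the assumption that $\sqrt{n}\{S^{(0)}_{j\cdot}(\beta^*, u) - s^{(0)}_{j\cdot}(\beta^*, u)\}$ is a normal process. Hence, $n^{-1/2}U(\beta^*)$ is asymptotically equivalent to
$$\frac{1}{\sqrt{n}} w(\beta^*) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau}\left\{Z_{ijk}(u) - \frac{s^{(1)}_{j\cdot}(\beta^*, u)}{s^{(0)}_{j\cdot}(\beta^*, u)}\right\}\left\{dN_{ijk}(u) - \frac{Y_{ijk}(u)\exp\{\beta^{*\prime} Z_{ijk}(u)\}}{s^{(0)}_{j\cdot}(\beta^*, u)}\, dF_{j\cdot}(u)\right\}.$$
Notice that $n^{-1/2}w(\beta^*)$ is a normalized sum of $n$ independent and identically distributed $p$-component random vectors with mean 0 and covariance matrix
$$A(\beta^*) = E\Big\{\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{l=1}^{J}\sum_{g=1}^{K} W_{1jk}(\beta^*)W_{1lg}'(\beta^*)\Big\}.$$
It then follows from the multivariate central limit theorem that
$$\frac{1}{\sqrt{n}} U(\beta^*) \xrightarrow{d} N_p\big(0, A(\beta^*)\big).$$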
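What is left is to show the consistency of $\hat I(\tilde\beta)$ for $I(\beta^*)$ for any random $\tilde\beta = \tilde\beta(n)$ with $\tilde\beta \xrightarrow{p} \beta^*$. We establish the consistency using the techniques applied in the proof of Theorem 3.2. Notice that
$$\hat I(\tilde\beta) = \frac{1}{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau} V_j(\tilde\beta, u)\, dN_{\cdot jk}(u)$$
and
$$\left\|\frac{1}{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau} V_j(\tilde\beta, u)\, dN_{\cdot jk}(u) - \sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau} v_j(\beta^*, u)\, s^{(0)}_{jk}(u)\,du\right\|$$
$$\le \sum_{j=1}^{J}\sum_{k=1}^{K}\left\|\int_0^{\tau}\{V_j(\tilde\beta, u) - v_j(\tilde\beta, u)\}\frac{dN_{\cdot jk}(u)}{n}\right\| + \sum_{j=1}^{J}\sum_{k=1}^{K}\left\|\int_0^{\tau}\{v_j(\tilde\beta, u) - v_j(\beta^*, u)\}\frac{dN_{\cdot jk}(u)}{n}\right\|$$
$$\quad + \sum_{j=1}^{J}\sum_{k=1}^{K}\left\|\int_0^{\tau} v_j(\beta^*, u)\left\{\frac{dN_{\cdot jk}(u)}{n} - S^{(0)}_{jk}(u)\,du\right\}\right\| + \sum_{j=1}^{J}\sum_{k=1}^{K}\left\|\int_0^{\tau} v_j(\beta^*, u)\{S^{(0)}_{jk}(u) - s^{(0)}_{jk}(u)\}\,du\right\|. \tag{6.10}$$
It follows from (6.4) and the boundedness of $s^{(d)}_{j\cdot}(\beta, t)$ that
$$\sup_{u\in[0,\tau],\,\beta\in B}\|V_j(\beta, u) - v_j(\beta, u)\| \xrightarrow{p} 0.$$
Therefore, $\tilde\beta \xrightarrow{p} \beta^*$ together with (6.5) implies that the first term of (6.10) converges in probability to zero. The continuity of $s^{(d)}_{j\cdot}(\beta, t)$ in $\beta$, uniformly in $t$, and (6.5) show that the second term of (6.10) also tends to zero. By the Lenglart inequality ((2) in Lemma 3.1), for any $\delta, \eta > 0$,
$$\Pr\left[\left|\int_0^{\tau}(v_j(\beta^*, u))_{a,b}\,\frac{dM_{\cdot jk}(u)}{n}\right| > \delta\right] \le \frac{\eta}{\delta^2} + \Pr\left[\frac{1}{n}\int_0^{\tau}(v_j(\beta^*, u))_{a,b}^2\, S^{(0)}_{jk}(u)\,du > \eta\right],$$
where $(v_j(\beta, u))_{a,b}$ is the $(a, b)$ element of $v_j(\beta, u)$, $M_{\cdot jk}(u) = \sum_{i=1}^{n} M_{ijk}(u)$, and $dM_{\cdot jk}(u)/n = dN_{\cdot jk}(u)/n - S^{(0)}_{jk}(u)\,du$. Hence, (6.4), the boundedness of $s^{(d)}_{j\cdot}(\beta, t)$, and Condition 1 imply that the third term of (6.10) vanishes. Finally, applying (6.3), the boundedness of $s^{(d)}_{j\cdot}(\beta, t)$ ($d = 0, 1, 2$), and Condition 3 directly to the fourth term of (6.10), we obtain that this term converges in probability to zero. Thus, when $\tilde\beta \xrightarrow{p} \beta^*$,
$$\hat I(\tilde\beta) \xrightarrow{p} I(\beta^*)$$
as $n \to \infty$. This completes the proof. □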
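Note that Theorem 6.2 is a multivariate failure time generalization of Theorem 2.1 of Lin and Wei (1989). Also, when the marginal mixed baseline hazard model (6.1) is correct, $W_{ijk}(\beta_0)$ reduces to $D_{ijk}(\beta_0)$ and $\Sigma(\beta_0) = \Sigma_m(\beta_0)$ [cf. (6.7), (3.1), (6.6), and (3.11)]. The sufficient regularity conditions for the asymptotic normality of $\sqrt{n}(\hat\beta - \beta^*)$ are stronger than those needed when the marginal mixed baseline hazard model (6.1) is correct. It is natural to estimate the covariance matrix of $\sqrt{n}(\hat\beta - \beta^*)$, $\Sigma(\beta^*)$, from the data by
$$\hat\Sigma(\hat\beta) = \hat I^{-1}(\hat\beta)\hat A(\hat\beta)\hat I^{-1}(\hat\beta), \tag{6.11}$$
where
$$\hat I(\beta) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau} V_j(\beta, u)\, dN_{ijk}(u), \tag{6.12}$$
$$\hat A(\beta) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{l=1}^{J}\sum_{g=1}^{K}\hat W_{ijk}(\beta)\hat W_{ilg}'(\beta), \tag{6.13}$$
and
$$\hat W_{ijk}(\beta) = \int_0^{\tau}\left\{Z_{ijk}(u) - \frac{S^{(1)}_{j\cdot}(\beta, u)}{S^{(0)}_{j\cdot}(\beta, u)}\right\}\left\{dN_{ijk}(u) - \frac{Y_{ijk}(u)e^{\beta' Z_{ijk}(u)}}{S^{(0)}_{j\cdot}(\beta, u)}\, dF_{j\cdot:n}(u)\right\}. \tag{6.14}$$
Note that $\hat\Sigma$ is obtained from $\Sigma$ by replacing $\beta^*$, $s^{(1)}_{j\cdot}(\beta, t)$, $s^{(0)}_{j\cdot}(\beta, t)$, and $F_{j\cdot}(t)$ with $\hat\beta$, $S^{(1)}_{j\cdot}(\beta, t)$, $S^{(0)}_{j\cdot}(\beta, t)$, and $F_{j\cdot:n}(t)$, respectively. It is also interesting to note that the variance estimate for $\hat\beta$, $\hat\Sigma$, is defined in the same way as the variance estimate for $\hat\beta_m$ [cf. (3.9)–(3.15) and (6.11)–(6.14)]. The following theorem states that $\hat\Sigma$ is a consistent estimator of $\Sigma$.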
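The estimator (6.11)–(6.14) can be sketched in a few lines. The code assumes time-fixed covariates, no tied failure times, and $F_{j\cdot:n}(u) = n^{-1}\sum_i\sum_k N_{ijk}(u)$ (an assumption here, since that definition is not restated in this section), so that $Y_{ijk}(u)e^{\beta'Z_{ijk}}\,dF_{j\cdot:n}(u)/S^{(0)}_{j\cdot}(\beta,u)$ becomes a Breslow-type increment. The helper names are hypothetical, and the function returns $\hat\Sigma(\beta)/n$, i.e. an approximate covariance matrix for $\hat\beta$ itself rather than for $\sqrt{n}(\hat\beta - \beta^*)$.

```python
import numpy as np

def score_residuals(beta, time, event, Z, member):
    """Rows are W_hat_ijk(beta) of (6.14); also returns n * I_hat(beta) of (6.12)."""
    m, p = Z.shape
    W = np.zeros((m, p))
    info = np.zeros((p, p))
    risk = np.exp(Z @ beta)
    for j in np.unique(member):
        idx = np.flatnonzero(member == j)
        t, d, z, r = time[idx], event[idx], Z[idx], risk[idx]
        for a in np.flatnonzero(d == 1):
            at_risk = t >= t[a]
            w, zr = r[at_risk], z[at_risk]
            s0 = w.sum()
            ebar = (w @ zr) / s0
            info += (zr * w[:, None]).T @ zr / s0 - np.outer(ebar, ebar)
            # dN_ijk part: only the record that fails at t[a]
            W[idx[a]] += z[a] - ebar
            # compensator part: every record of member j still at risk at t[a]
            W[idx[at_risk]] -= (w / s0)[:, None] * (zr - ebar)
    return W, info

def robust_covariance(beta, time, event, Z, member, cluster):
    """Sandwich I^{-1} A I^{-1} of (6.11), scaled to approximate Var(beta_hat)."""
    W, info = score_residuals(beta, time, event, Z, member)
    labels = np.unique(cluster)
    # Summing W_hat_ijk over (j, k) inside each cluster i gives the quadruple sum in (6.13).
    w = np.stack([W[cluster == c].sum(axis=0) for c in labels])
    A = w.T @ w
    info_inv = np.linalg.inv(info)
    return info_inv @ A @ info_inv
```

Summing the residual rows recovers the score $U(\beta)$, because the compensator parts cancel within each risk set; aggregating residuals by cluster before forming the outer products is what makes the estimator robust to arbitrary within-cluster dependence.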
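Theorem 6.3 (Consistency of $\hat\Sigma(\hat\beta)$ under a possibly misspecified marginal mixed baseline hazard model) Suppose that all the assumptions in Conditions 1–3 are satisfied and that, in addition, the following condition holds:
$$E\Big\{\sup_{\substack{\beta\in B,\ u\in[0,\tau],\ v\in[0,\tau]\\ 1\le j,l\le J,\ 1\le k,g\le K}} Y_{1jk}(u)|Z_{1jk}(u)|e^{\beta' Z_{1jk}(u)}\, Y_{1lg}(v)|Z_{1lg}(v)|e^{\beta' Z_{1lg}(v)}\Big\} < \infty. \tag{6.15}$$
Then $\hat\Sigma(\hat\beta) \xrightarrow{p} \Sigma(\beta^*)$ as $n \to \infty$.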
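Proof: This proof follows the argument in the proof of Theorem 3.4. Basically, we first need to show that, for $j, l = 1, \ldots, J$ and $k, g = 1, \ldots, K$,
$$\frac{1}{n}\sum_{i=1}^{n}\hat W_{ijk}(\hat\beta)\hat W_{ilg}'(\hat\beta) \xrightarrow{p} E\{W_{1jk}(\beta^*)W_{1lg}'(\beta^*)\}$$
and
$$\hat I(\hat\beta) \xrightarrow{p} I(\beta^*)$$
as $n \to \infty$. Then, using Slutsky's theorem, we obtain $\hat A(\hat\beta) \xrightarrow{p} A(\beta^*)$, and using Slutsky's theorem again we get the desired result.

To show that the first part holds, it suffices to show that each component of $n^{-1}\sum_i \hat W_{ijk}(\hat\beta)\hat W_{ilg}'(\hat\beta)$ converges in probability to the corresponding component of $E\{W_{1jk}(\beta^*)W_{1lg}'(\beta^*)\}$; that is, for $a, b = 1, \ldots, p$,
$$\frac{1}{n}\sum_{i=1}^{n}\hat W_{ijk,a}(\hat\beta)\hat W_{ilg,b}(\hat\beta) \xrightarrow{p} E\{W_{1jk,a}(\beta^*)W_{1lg,b}(\beta^*)\}.$$
Write the $a$th element of a vector $b_{ijk}$ as $b_{ijk,a}$, and let $\hat e_j(\beta, u) = S^{(1)}_{j\cdot}(\beta, u)/S^{(0)}_{j\cdot}(\beta, u)$ and $e_j(\beta, u) = s^{(1)}_{j\cdot}(\beta, u)/s^{(0)}_{j\cdot}(\beta, u)$. Expanding the product, $n^{-1}\sum_i \hat W_{ijk,a}(\hat\beta)\hat W_{ilg,b}(\hat\beta)$ is the sum of the following seven terms:
$$\frac{1}{n}\sum_{i=1}^{n}\delta_{ijk}Z_{ijk,a}(X_{ijk})\,\delta_{ilg}Z_{ilg,b}(X_{ilg}), \tag{6.16}$$
$$-\frac{1}{n}\sum_{i=1}^{n}\delta_{ilg}Z_{ilg,b}(X_{ilg})\int_0^{\tau}\hat e_{j,a}(\hat\beta, u)\,dN_{ijk}(u), \tag{6.17}$$
$$-\frac{1}{n}\sum_{i=1}^{n}\delta_{ijk}Z_{ijk,a}(X_{ijk})\int_0^{\tau}\hat e_{l,b}(\hat\beta, u)\,dN_{ilg}(u), \tag{6.18}$$
$$\frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau}\hat e_{j,a}(\hat\beta, u)\,dN_{ijk}(u)\int_0^{\tau}\hat e_{l,b}(\hat\beta, u)\,dN_{ilg}(u), \tag{6.19}$$
$$\frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau}\int_0^{\tau}\{Z_{ijk,a}(u) - \hat e_{j,a}(\hat\beta, u)\}\{Z_{ilg,b}(v) - \hat e_{l,b}(\hat\beta, v)\}\frac{Y_{ijk}(u)e^{\hat\beta' Z_{ijk}(u)}\,Y_{ilg}(v)e^{\hat\beta' Z_{ilg}(v)}}{S^{(0)}_{j\cdot}(\hat\beta, u)S^{(0)}_{l\cdot}(\hat\beta, v)}\,dF_{j\cdot:n}(u)\,dF_{l\cdot:n}(v), \tag{6.20}$$
$$-\frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau}\{Z_{ijk,a}(u) - \hat e_{j,a}(\hat\beta, u)\}\,dN_{ijk}(u)\int_0^{\tau}\{Z_{ilg,b}(v) - \hat e_{l,b}(\hat\beta, v)\}\frac{Y_{ilg}(v)e^{\hat\beta' Z_{ilg}(v)}}{S^{(0)}_{l\cdot}(\hat\beta, v)}\,dF_{l\cdot:n}(v), \tag{6.21}$$
$$-\frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau}\{Z_{ilg,b}(u) - \hat e_{l,b}(\hat\beta, u)\}\,dN_{ilg}(u)\int_0^{\tau}\{Z_{ijk,a}(v) - \hat e_{j,a}(\hat\beta, v)\}\frac{Y_{ijk}(v)e^{\hat\beta' Z_{ijk}(v)}}{S^{(0)}_{j\cdot}(\hat\beta, v)}\,dF_{j\cdot:n}(v). \tag{6.22}$$
Correspondingly, $E\{W_{1jk,a}(\beta^*)W_{1lg,b}(\beta^*)\}$ is the sum of the population terms (6.23)–(6.29), obtained from (6.16)–(6.22), respectively, by replacing $\hat\beta$, $\hat e$, $S^{(0)}_{j\cdot}$, $F_{j\cdot:n}$, and the average over $i$ with $\beta^*$, $e$, $s^{(0)}_{j\cdot}$, $F_{j\cdot}$, and the expectation over cluster 1. For example,
$$(6.24) = -E\Big\{\delta_{1lg}Z_{1lg,b}(X_{1lg})\int_0^{\tau} e_{j,a}(\beta^*, u)\,dN_{1jk}(u)\Big\}$$
and
$$(6.27) = \int_0^{\tau}\int_0^{\tau} E\Big[\{Z_{1jk,a}(u) - e_{j,a}(\beta^*, u)\}\{Z_{1lg,b}(v) - e_{l,b}(\beta^*, v)\}\,Y_{1jk}(u)e^{\beta^{*\prime} Z_{1jk}(u)}\,Y_{1lg}(v)e^{\beta^{*\prime} Z_{1lg}(v)}\Big]\frac{dF_{j\cdot}(u)\,dF_{l\cdot}(v)}{s^{(0)}_{j\cdot}(\beta^*, u)\,s^{(0)}_{l\cdot}(\beta^*, v)}.$$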
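Note that (6.16) is just an average of iid random variables. Therefore, (6.16) $\xrightarrow{p}$ (6.23) by the law of large numbers.
Now, let us rewrite (6.20) and (6.27) in the following forms. Write $r_{ijk}(\beta, u) = Y_{ijk}(u)e^{\beta' Z_{ijk}(u)}$, $\hat e_j = S^{(1)}_{j\cdot}/S^{(0)}_{j\cdot}$, $e_j = s^{(1)}_{j\cdot}/s^{(0)}_{j\cdot}$, $d\hat H_j(u) = dF_{j\cdot:n}(u)/S^{(0)}_{j\cdot}(\hat\beta, u)$, and $dH_j(u) = dF_{j\cdot}(u)/s^{(0)}_{j\cdot}(\beta^*, u)$. Then
$$(6.20) = \int_0^{\tau}\int_0^{\tau} M_{jkalgb}(\hat\beta, u, v)\,d\hat H_j(u)\,d\hat H_l(v) \tag{6.30}$$
$$\quad - \int_0^{\tau}\int_0^{\tau}\frac{1}{n}\sum_{i=1}^{n} r_{ijk}(\hat\beta, u)\,r_{ilg}(\hat\beta, v)Z_{ilg,b}(v)\,\hat e_{j,a}(\hat\beta, u)\,d\hat H_j(u)\,d\hat H_l(v) \tag{6.31}$$
$$\quad - \int_0^{\tau}\int_0^{\tau}\frac{1}{n}\sum_{i=1}^{n} r_{ijk}(\hat\beta, u)Z_{ijk,a}(u)\,r_{ilg}(\hat\beta, v)\,\hat e_{l,b}(\hat\beta, v)\,d\hat H_j(u)\,d\hat H_l(v) \tag{6.32}$$
$$\quad + \int_0^{\tau}\int_0^{\tau}\frac{1}{n}\sum_{i=1}^{n} r_{ijk}(\hat\beta, u)\,r_{ilg}(\hat\beta, v)\,\hat e_{j,a}(\hat\beta, u)\,\hat e_{l,b}(\hat\beta, v)\,d\hat H_j(u)\,d\hat H_l(v), \tag{6.33}$$
$$(6.27) = \int_0^{\tau}\int_0^{\tau} m_{jkalgb}(\beta^*, u, v)\,dH_j(u)\,dH_l(v) \tag{6.34}$$
$$\quad - \int_0^{\tau}\int_0^{\tau} E\{r_{1jk}(\beta^*, u)\,r_{1lg}(\beta^*, v)Z_{1lg,b}(v)\}\,e_{j,a}(\beta^*, u)\,dH_j(u)\,dH_l(v) \tag{6.35}$$
$$\quad - \int_0^{\tau}\int_0^{\tau} E\{r_{1jk}(\beta^*, u)Z_{1jk,a}(u)\,r_{1lg}(\beta^*, v)\}\,e_{l,b}(\beta^*, v)\,dH_j(u)\,dH_l(v) \tag{6.36}$$
$$\quad + \int_0^{\tau}\int_0^{\tau} E\{r_{1jk}(\beta^*, u)\,r_{1lg}(\beta^*, v)\}\,e_{j,a}(\beta^*, u)\,e_{l,b}(\beta^*, v)\,dH_j(u)\,dH_l(v). \tag{6.37}$$
Define
$$M_{jkalgb}(\beta, u, v) = \frac{1}{n}\sum_{i=1}^{n} r_{ijk}(\beta, u)Z_{ijk,a}(u)\,r_{ilg}(\beta, v)Z_{ilg,b}(v)$$
and
$$m_{jkalgb}(\beta, u, v) = E\{r_{1jk}(\beta, u)Z_{1jk,a}(u)\,r_{1lg}(\beta, v)Z_{1lg,b}(v)\}.$$
From (6.15), we also have
$$E\Big\{\sup_{\substack{\beta\in B,\ u\in[0,\tau],\ v\in[0,\tau]\\ 1\le j,l\le J,\ 1\le k,g\le K}} Y_{1jk}(u)e^{\beta' Z_{1jk}(u)}\,Y_{1lg}(v)e^{\beta' Z_{1lg}(v)}\Big\} < \infty.$$
Using the above condition and an argument similar to that in Theorem 3.3, we have that $m_{jkalgb}(\beta, u, v)$ is a continuous function of $\beta \in B$ and that $M_{jkalgb}(\beta, u, v)$ converges in probability to $m_{jkalgb}(\beta, u, v)$ uniformly in $(u, v) \in [0, \tau]^2$. By (6.4), the boundedness of $s^{(0)}_{j\cdot}(\beta, t)$, the above results, and the consistency of $\hat\beta$, we have
$$\frac{M_{jkalgb}(\hat\beta, u, v)}{S^{(0)}_{j\cdot}(\hat\beta, u)\,S^{(0)}_{l\cdot}(\hat\beta, v)} \xrightarrow{p} \frac{m_{jkalgb}(\beta^*, u, v)}{s^{(0)}_{j\cdot}(\beta^*, u)\,s^{(0)}_{l\cdot}(\beta^*, v)}$$
uniformly in $(u, v) \in [0, \tau]^2$. Since $F_{j\cdot:n}(u)$ converges to $F_{j\cdot}(u)$, $F_{j\cdot:n}(u)$ is bounded in probability, and $m_{jkalgb}(\beta^*, u, v)$ resides in the product space of left-continuous functions, it follows that (6.30) is consistent for (6.34). Using similar arguments, we can show that (6.31) $\xrightarrow{p}$ (6.35), (6.32) $\xrightarrow{p}$ (6.36), and (6.33) $\xrightarrow{p}$ (6.37). Hence, (6.20) $\xrightarrow{p}$ (6.27).
Again using a similar argument, we can show that (6.21) $\xrightarrow{p}$ (6.28) and (6.22) $\xrightarrow{p}$ (6.29).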
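Now, we are going to show that (6.17) is consistent for (6.24). With $\hat e_j = S^{(1)}_{j\cdot}/S^{(0)}_{j\cdot}$ and $e_j = s^{(1)}_{j\cdot}/s^{(0)}_{j\cdot}$, we have
$$|(6.17) - (6.24)| \le A + B,$$
where
$$A = \left|\frac{1}{n}\sum_{i=1}^{n}\delta_{ilg}Z_{ilg,b}(X_{ilg})\int_0^{\tau}\{\hat e_{j,a}(\hat\beta, u) - e_{j,a}(\beta^*, u)\}\,dN_{ijk}(u)\right|$$
and
$$B = \left|\frac{1}{n}\sum_{i=1}^{n}\delta_{ilg}Z_{ilg,b}(X_{ilg})\int_0^{\tau} e_{j,a}(\beta^*, u)\,dN_{ijk}(u) - E\Big\{\delta_{1lg}Z_{1lg,b}(X_{1lg})\int_0^{\tau} e_{j,a}(\beta^*, u)\,dN_{1jk}(u)\Big\}\right|.$$
By the law of large numbers and the consistency of $\hat\beta$, we obtain that $B \xrightarrow{p} 0$ as $n \to \infty$. From (6.4), the boundedness of $s^{(d)}_{j\cdot}(\beta, t)$, the consistency of $\hat\beta$, and the iid condition across the families, $\varepsilon_n = \sup_{u\in[0,\tau]}|\hat e_{j,a}(\hat\beta, u) - e_{j,a}(\beta^*, u)| \xrightarrow{p} 0$, and, since each $N_{ijk}$ jumps at most once,
$$A \le \varepsilon_n\,\frac{1}{n}\sum_{i=1}^{n} N_{ijk}(\tau)\,|Z_{ilg,b}(X_{ilg})| \le \varepsilon_n\,\frac{1}{n}\sum_{i=1}^{n}|Z_{ilg,b}(X_{ilg})|,$$
where $n^{-1}\sum_i |Z_{ilg,b}(X_{ilg})| \xrightarrow{p} E|Z_{1lg,b}(X_{1lg})|$ and $E|Z_{1lg,b}(X_{1lg})| < \infty$ by Condition 1. Hence, $A \xrightarrow{p} 0$ as $n \to \infty$. Using similar arguments, we can show that (6.18) $\xrightarrow{p}$ (6.25) and (6.19) $\xrightarrow{p}$ (6.26). Therefore, for $a, b = 1, \ldots, p$,
$$\frac{1}{n}\sum_{i=1}^{n}\hat W_{ijk,a}(\hat\beta)\hat W_{ilg,b}(\hat\beta) \xrightarrow{p} E\{W_{1jk,a}(\beta^*)W_{1lg,b}(\beta^*)\}.$$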
To prove the second part, we write
$$\|\hat I(\hat\beta) - I(\beta^*)\| = \left\|\frac{1}{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau} V_j(\hat\beta, u)\,dN_{\cdot jk}(u) - \sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau} v_j(\beta^*, u)\,s^{(0)}_{jk}(u)\,du\right\|,$$
which is bounded by the four terms on the right-hand side of (6.10) with $\tilde\beta$ replaced by $\hat\beta$. Then, using the same argument as for (6.10), we can show that $\hat I(\hat\beta) \xrightarrow{p} I(\beta^*)$. □

6.2.2 Distinct Baseline Hazard Models

Let $\hat\beta$ be defined as the solution to the pseudo partial likelihood score equations
$$\sum_{k=1}^{K}\sum_{j=1}^{J}\sum_{i=1}^{n}\int_0^{\tau}\left\{Z_{ijk}(u) - \frac{S^{(1)}_{jk}(\beta, u)}{S^{(0)}_{jk}(\beta, u)}\right\} dN_{ijk}(u) = 0 \tag{6.38}$$
under an assumed distinct baseline hazard model, $\lambda_{ijk}(t) = \lambda_{0jk}(t)\exp\{\beta' Z_{ijk}(t)\}$. Let $\beta^*$ be the unique solution to the system of equations
$$q(\beta) = \sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau}\left\{s^{(1)}_{jk}(t) - \frac{s^{(1)}_{jk}(\beta, t)}{s^{(0)}_{jk}(\beta, t)}\, s^{(0)}_{jk}(t)\right\} dt = 0. \tag{6.39}$$
Define
$$\Sigma(\beta) = I^{-1}(\beta)A(\beta)I^{-1}(\beta), \qquad A(\beta) = E\Big\{\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{l=1}^{J}\sum_{g=1}^{K} W_{1jk}(\beta)W_{1lg}'(\beta)\Big\},$$
$$W_{ijk}(\beta) = \int_0^{\tau}\left\{Z_{ijk}(u) - \frac{s^{(1)}_{jk}(\beta, u)}{s^{(0)}_{jk}(\beta, u)}\right\}\left\{dN_{ijk}(u) - \frac{Y_{ijk}(u)\exp\{\beta' Z_{ijk}(u)\}}{s^{(0)}_{jk}(\beta, u)}\, dF_{jk}(u)\right\},$$
$$I(\beta) = \sum_{j=1}^{J}\sum_{k=1}^{K} I_{jk}(\beta),$$
and
$$\hat W_{ijk}(\beta) = \int_0^{\tau}\left\{Z_{ijk}(u) - \frac{S^{(1)}_{jk}(\beta, u)}{S^{(0)}_{jk}(\beta, u)}\right\}\left\{dN_{ijk}(u) - \frac{Y_{ijk}(u)e^{\beta' Z_{ijk}(u)}}{S^{(0)}_{jk}(\beta, u)}\, dF_{jk:n}(u)\right\}.$$
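Theorem 6.4 (Asymptotic properties of $\hat\beta$ under a possibly misspecified marginal distinct baseline hazard model) Suppose that the following conditions are satisfied:
Condition 1. For $j = 1, \ldots, J$ and $k = 1, \ldots, K$,
$$E\Big\{\sup_{t\in[0,\tau]} Y_{1jk}(t)|Z_{1jk}(t)|^2\lambda_{1jk}(t)\Big\} < \infty.$$
There exists a neighborhood $B$ of $\beta^*$ such that
$$E\Big\{\sup_{t\in[0,\tau],\,\beta\in B} Y_{1jk}(t)|Z_{1jk}(t)|^2 e^{\beta' Z_{1jk}(t)}\Big\} < \infty,$$
$\lambda_{0jk}(t) \ge 0$, and $\int_0^{\tau}\lambda_{0jk}(t)\,dt < \infty$.
Condition 2. $\Pr\{Y_{1jk}(t) = 1\ \forall t \in [0, \tau]\} > 0$.
Condition 3. The matrices $I_{jk}(\beta^*)$ are positive definite.
In addition, assume that $\sqrt{n}\{F_{jk:n}(t) - F_{jk}(t)\}$ and $\sqrt{n}\{S^{(0)}_{jk}(\beta^*, t) - s^{(0)}_{jk}(\beta^*, t)\}$ are normal processes with mean 0. Then
$$\sqrt{n}(\hat\beta - \beta^*) \xrightarrow{d} N_p\big(0, \Sigma(\beta^*)\big)$$
as $n \to \infty$, and $\Sigma(\beta^*)$ can be consistently estimated by $\hat\Sigma(\hat\beta)$.

6.2.3 Common Baseline Hazard Models

Let $\hat\beta$ be defined as the solution to the pseudo partial likelihood score equations
$$\sum_{k=1}^{K}\sum_{j=1}^{J}\sum_{i=1}^{n}\int_0^{\tau}\left\{Z_{ijk}(u) - \frac{S^{(1)}_{\cdot\cdot}(\beta, u)}{S^{(0)}_{\cdot\cdot}(\beta, u)}\right\} dN_{ijk}(u) = 0 \tag{6.40}$$
under an assumed common baseline hazard model, $\lambda_{ijk}(t) = \lambda_0(t)\exp\{\beta' Z_{ijk}(t)\}$, where $S^{(d)}_{\cdot\cdot}(\beta, u) = \sum_{j=1}^{J}\sum_{k=1}^{K} S^{(d)}_{jk}(\beta, u)$. Let $\beta^*$ be the unique solution to the system of equations $q(\beta) = 0$, where
$$q(\beta) = \int_0^{\tau}\left\{s^{(1)}_{\cdot\cdot}(t) - \frac{s^{(1)}_{\cdot\cdot}(\beta, t)}{s^{(0)}_{\cdot\cdot}(\beta, t)}\, s^{(0)}_{\cdot\cdot}(t)\right\} dt, \tag{6.41}$$
and define
$$\Sigma(\beta) = I^{-1}(\beta)A(\beta)I^{-1}(\beta), \qquad A(\beta) = E\Big\{\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{l=1}^{J}\sum_{g=1}^{K} W_{1jk}(\beta)W_{1lg}'(\beta)\Big\},$$
$$W_{ijk}(\beta) = \int_0^{\tau}\left\{Z_{ijk}(u) - \frac{s^{(1)}_{\cdot\cdot}(\beta, u)}{s^{(0)}_{\cdot\cdot}(\beta, u)}\right\}\left\{dN_{ijk}(u) - \frac{Y_{ijk}(u)\exp\{\beta' Z_{ijk}(u)\}}{s^{(0)}_{\cdot\cdot}(\beta, u)}\, dF_{\cdot\cdot}(u)\right\},$$
$$I(\beta) = \int_0^{\tau} v(\beta, t)\, s^{(0)}_{\cdot\cdot}(t)\,dt, \qquad v(\beta, t) = \frac{s^{(2)}_{\cdot\cdot}(\beta, t)}{s^{(0)}_{\cdot\cdot}(\beta, t)} - \left\{\frac{s^{(1)}_{\cdot\cdot}(\beta, t)}{s^{(0)}_{\cdot\cdot}(\beta, t)}\right\}^{\otimes 2},$$
$$\hat\Sigma(\beta) = \hat I^{-1}(\beta)\hat A(\beta)\hat I^{-1}(\beta), \qquad \hat I(\beta) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau} V(\beta, u)\,dN_{ijk}(u),$$
$$V(\beta, t) = \frac{S^{(2)}_{\cdot\cdot}(\beta, t)}{S^{(0)}_{\cdot\cdot}(\beta, t)} - \left\{\frac{S^{(1)}_{\cdot\cdot}(\beta, t)}{S^{(0)}_{\cdot\cdot}(\beta, t)}\right\}^{\otimes 2}, \qquad \hat A(\beta) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{l=1}^{J}\sum_{g=1}^{K}\hat W_{ijk}(\beta)\hat W_{ilg}'(\beta),$$
$$\hat W_{ijk}(\beta) = \int_0^{\tau}\left\{Z_{ijk}(u) - \frac{S^{(1)}_{\cdot\cdot}(\beta, u)}{S^{(0)}_{\cdot\cdot}(\beta, u)}\right\}\left\{dN_{ijk}(u) - \frac{Y_{ijk}(u)\exp\{\beta' Z_{ijk}(u)\}}{S^{(0)}_{\cdot\cdot}(\beta, u)}\, dF_{\cdot\cdot:n}(u)\right\},$$
$$F_{\cdot\cdot:n}(t) = \sum_{j=1}^{J}\sum_{k=1}^{K} F_{jk:n}(t), \qquad\text{and}\qquad F_{\cdot\cdot}(t) = \sum_{j=1}^{J}\sum_{k=1}^{K} F_{jk}(t).$$

Theorem 6.5 (Asymptotic properties of $\hat\beta$ under a possibly misspecified marginal common baseline hazard model) Suppose that the following conditions are satisfied:
Condition 1. For $j = 1, \ldots, J$ and $k = 1, \ldots, K$,
$$E\Big\{\sup_{t\in[0,\tau]} Y_{1jk}(t)|Z_{1jk}(t)|^2\lambda_{1jk}(t)\Big\} < \infty.$$
There exists a neighborhood $B$ of $\beta^*$ such that
$$E\Big\{\sup_{t\in[0,\tau],\,\beta\in B} Y_{1jk}(t)|Z_{1jk}(t)|^2 e^{\beta' Z_{1jk}(t)}\Big\} < \infty,$$
$\lambda_0(t) \ge 0$, and $\int_0^{\tau}\lambda_0(t)\,dt < \infty$.
Condition 2. $\Pr\{Y_{1jk}(t) = 1\ \forall t \in [0, \tau]\} > 0$.
Condition 3. The matrix $I(\beta^*)$ is positive definite.
In addition, assume that $\sqrt{n}\{F_{\cdot\cdot:n}(t) - F_{\cdot\cdot}(t)\}$ and $\sqrt{n}\{S^{(0)}_{\cdot\cdot}(\beta^*, t) - s^{(0)}_{\cdot\cdot}(\beta^*, t)\}$ are normal processes with mean 0. Then
$$\sqrt{n}(\hat\beta - \beta^*) \xrightarrow{d} N_p\big(0, \Sigma(\beta^*)\big)$$
as $n \to \infty$, and $\Sigma(\beta^*)$ can be consistently estimated by $\hat\Sigma(\hat\beta)$.

6.3 Special Cases

The asymptotic results developed in Section 6.2 are very general in the sense that they apply to any kind of model misspecification. In this section we consider some special cases of model misspecification. First, we examine the consequences of baseline type misspecification when the true marginal hazard model is in the Cox regression family and the regression part, $\exp\{\beta' Z(t)\}$, is correct.

Corollary 6.1 If the true underlying marginal hazard model is a common baseline hazard model (6.40), then the estimator $\hat\beta$ under either an assumed mixed baseline hazard model (6.1) or an assumed distinct baseline hazard model (6.38) is consistent for the true value of the regression parameter, $\beta_0$.
Proof: If the true marginal hazard model is a common baseline model (6.40) but is misspecified as a mixed baseline hazard model (6.1), we have, for $d = 0, 1, 2$,
$$s^{(d)}_{j\cdot}(t) = \sum_{k=1}^{K} s^{(d)}_{jk}(t) = \sum_{k=1}^{K} E\{\lambda_0(t)Y_{1jk}(t)Z_{1jk}(t)^{\otimes d}e^{\beta_0' Z_{1jk}(t)}\} = \lambda_0(t)\sum_{k=1}^{K} E\{Y_{1jk}(t)Z_{1jk}(t)^{\otimes d}e^{\beta_0' Z_{1jk}(t)}\} = \lambda_0(t)\, s^{(d)}_{j\cdot}(\beta_0, t).$$
Hence, from (6.2),
$$q(\beta) = \sum_{j=1}^{J}\int_0^{\tau}\left\{s^{(1)}_{j\cdot}(\beta_0, t) - \frac{s^{(1)}_{j\cdot}(\beta, t)}{s^{(0)}_{j\cdot}(\beta, t)}\, s^{(0)}_{j\cdot}(\beta_0, t)\right\}\lambda_0(t)\,dt. \tag{6.42}$$
It is clear that $\beta^* = \beta_0$ is the solution to the system of equations in (6.42). Therefore, by Theorem 6.1, the estimator $\hat\beta$ from the assumed mixed baseline hazard model (6.1) is consistent when the true marginal hazard model is actually a common baseline model (6.40). By the same argument, it is easy to show that the estimator $\hat\beta$ from the assumed distinct baseline hazard model (6.38) is consistent for $\beta_0$ when the true marginal hazard model is, in fact, a common baseline model (6.40). □

The results in Corollary 6.1 guarantee the consistency of the estimator $\hat\beta$ from either a mixed baseline hazard model (6.1) or a distinct baseline model (6.38) when the true model is a common baseline hazard model (6.40). If the common baseline model is true and $\beta_0$ is a scalar, we might anticipate that the variance of the regression estimator from the assumed mixed (or distinct) baseline hazard model would be larger than that from the common baseline hazard model, primarily because the mixed or distinct baseline model involves more nuisance baseline hazard functions, which over-stratify the risk sets and failure events. On the other hand, the robustness of the "sandwich" type variance estimator may compensate for the possible loss of efficiency. The following corollary indicates that the converse of Corollary 6.1 is not true.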
Corollary 6.2 If the underlying marginal hazard model is a distinct baseline hazard model (6.38), then the estimator $\hat\beta$ under either an assumed mixed baseline hazard model (6.1) or an assumed common baseline hazard model (6.40) is not consistent for $\beta_0$. Similarly, the estimator $\hat\beta$ from an assumed common baseline hazard model (6.40) is not consistent for $\beta_0$ when the true model is a mixed baseline hazard model (6.1).
Proof: When the true marginal hazard model is the distinct baseline model (6.38) but is misspecified as the mixed baseline hazard model (6.1), then, for $d = 0, 1, 2$,
$$s^{(d)}_{j\cdot}(t) = \sum_{k=1}^{K} s^{(d)}_{jk}(t) = \sum_{k=1}^{K} E\{\lambda_{0jk}(t)Y_{1jk}(t)Z_{1jk}(t)^{\otimes d}e^{\beta_0' Z_{1jk}(t)}\} = \sum_{k=1}^{K}\lambda_{0jk}(t)E\{Y_{1jk}(t)Z_{1jk}(t)^{\otimes d}e^{\beta_0' Z_{1jk}(t)}\} = \sum_{k=1}^{K}\lambda_{0jk}(t)\, s^{(d)}_{jk}(\beta_0, t),$$
and, from (6.2),
$$q(\beta) = \sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau}\left\{s^{(1)}_{jk}(\beta_0, t) - \frac{s^{(1)}_{j\cdot}(\beta, t)}{s^{(0)}_{j\cdot}(\beta, t)}\, s^{(0)}_{jk}(\beta_0, t)\right\}\lambda_{0jk}(t)\,dt. \tag{6.43}$$
Therefore, $\beta_0$ is not the solution to the system of equations in (6.43), because $s^{(1)}_{jk}(\beta_0, t) \ne s^{(1)}_{j\cdot}(\beta_0, t)\, s^{(0)}_{jk}(\beta_0, t)/s^{(0)}_{j\cdot}(\beta_0, t)$ in general. It follows from Theorem 6.1 that the estimator $\hat\beta$ from an assumed mixed baseline hazard model (6.1) is not consistent when the true marginal hazard model is a distinct baseline hazard model (6.38). Similar arguments also work for the other assertions of the corollary. □

The results in Corollaries 6.1 and 6.2 agree with our expectation that over-stratification does not cause inconsistent estimators, but the converse is not true.
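To see Corollaries 6.1 and 6.2 at work numerically, $\beta^*$ can be computed by solving an equation of the form $q(\beta) = 0$, as in (6.39) or (6.41), for a fully specified population. The sketch below uses a hypothetical population chosen only for illustration: two baseline groups (think of two failure types for a single member), a binary covariate whose prevalence differs between the groups, exponential marginals, no censoring, and $\beta_0 = 0.7$. Separate risk sets per group correspond to the distinct model; pooled risk sets correspond to the mixed or common model. The integrals are approximated by the trapezoid rule on a finite grid standing in for $[0, \tau]$, and $q$ is solved by bisection, using the fact that $q$ is decreasing in $\beta$.

```python
import numpy as np

BETA0 = 0.7
# Hypothetical population shares SHARE[k, z]: P(z = 1) is 0.2 in group 0 and 0.8 in group 1.
SHARE = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
GRID = np.linspace(0.0, 40.0, 100_001)   # stands in for [0, tau]; almost everyone fails by t = 40
STEP = GRID[1] - GRID[0]

def trapezoid(y):
    """Composite trapezoid rule on the uniform grid GRID."""
    return STEP * (y.sum() - 0.5 * (y[0] + y[-1]))

def q_block(beta, base, groups):
    """Contribution of one risk-set block to q(beta): integral of s1(t) - s1(beta,t) s0(t) / s0(beta,t)."""
    s0 = s1 = s0b = s1b = np.zeros_like(GRID)
    for k in groups:
        for z in (0, 1):
            lam = base[k] * np.exp(BETA0 * z)            # true hazard of cell (k, z)
            at_risk = SHARE[k, z] * np.exp(-lam * GRID)  # E{Y(t)} for the cell, no censoring
            s0 = s0 + at_risk * lam                      # s^(0)(t)
            s1 = s1 + z * at_risk * lam                  # s^(1)(t)
            s0b = s0b + at_risk * np.exp(beta * z)       # s^(0)(beta, t)
            s1b = s1b + z * at_risk * np.exp(beta * z)   # s^(1)(beta, t)
    ratio = np.divide(s1b, s0b, out=np.zeros_like(s1b), where=s0b > 0)
    return trapezoid(s1 - ratio * s0)

def beta_star(base, blocks, lo=-5.0, hi=5.0, iters=60):
    """Root of q(beta) = sum of block contributions, found by bisection (q decreases in beta)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(q_block(mid, base, g) for g in blocks) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

distinct_truth = np.array([1.0, 10.0])   # baseline hazards differ between the groups
common_truth = np.array([1.0, 1.0])      # one common baseline hazard

print(beta_star(distinct_truth, [[0], [1]]))   # separate risk sets
print(beta_star(distinct_truth, [[0, 1]]))     # pooled risk sets
print(beta_star(common_truth, [[0], [1]]), beta_star(common_truth, [[0, 1]]))
```

Whenever each risk-set block shares one baseline hazard, the integrand vanishes identically at $\beta = \beta_0$, so the first and last calls return 0.7 up to bisection error (Corollary 6.1, including the over-stratified fit). Pooling groups with different baselines breaks this, and the pooled call under the distinct truth lands well away from 0.7 (Corollary 6.2).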
When $\beta^*$ is not equal to $\beta_0$, the parameter $\beta^*$ will not be interpretable in general, unless its relationship to $\beta_0$ can be characterized. We focus our attention here on $\beta_{0,1}$, the first component of the true parameter vector $\beta_0$.

Corollary 6.3 Let $\beta_{0,1}$, $Z_{ijk,1}(t)$, and $\hat\beta_1$ be the first components of the parameter vector $\beta_0$, the covariate vector $Z_{ijk}(t)$, and the estimator vector $\hat\beta$, respectively. Suppose that $\beta_{0,1} = 0$ and that $Z_{ijk,1}(t)$ is independent of censoring and of the other $(p - 1)$ components of the covariate vector, $Z_{ijk,2}(t), \ldots, Z_{ijk,p}(t)$. Then $\hat\beta_1$ under an assumed distinct baseline hazard model is consistent; that is,
$$\hat\beta_1 \xrightarrow{p} 0$$
as $n \to \infty$. Similarly, $\hat\beta_1$ is consistent if $Z_{ijk,1}$ is identically distributed across failure types and $\hat\beta$ is estimated under an assumed marginal mixed baseline hazard model with different baselines for members, or if $Z_{ijk,1}$ is identically distributed across members and failure types and $\hat\beta$ is estimated under an assumed marginal common baseline hazard model.

Note that, in the corollary, the true underlying marginal hazard model $\lambda_{ijk}(t)$ need not even be in the Cox regression family.

Proof: Let $s^{(1)}_{jk,1}(t)$ and $s^{(1)}_{jk,1}(\beta^*, t)$ be the first components of $s^{(1)}_{jk}(t)$ and $s^{(1)}_{jk}(\beta^*, t)$, respectively. Let $\hat\beta$ be the estimator obtained from the assumed distinct baseline hazard model (6.38). To show that $\hat\beta_1 \xrightarrow{p} 0$ when $\beta_{0,1} = 0$, it suffices, by (6.39) and Theorem 6.4, to show that $\beta^*_1 = 0$ when $\beta_{0,1} = 0$. In other words, it is sufficient to verify that $\beta^*_1 = 0$ satisfies the following condition when $\beta_{0,1} = 0$:
$$\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^{\tau}\left\{s^{(1)}_{jk,1}(t) - \frac{s^{(1)}_{jk,1}(\beta^*, t)}{s^{(0)}_{jk}(\beta^*, t)}\, s^{(0)}_{jk}(t)\right\} dt = 0. \tag{6.44}$$
Notice that when $\beta_{0,1} = 0$,
$$s^{(1)}_{jk,1}(t) = E\{Z_{1jk,1}(t)\}\, s^{(0)}_{jk}(t).$$
Also, under $\beta^*_1 = 0$,
$$s^{(1)}_{jk,1}(\beta^*, t) = E\{Z_{1jk,1}(t)\}\, s^{(0)}_{jk}(\beta^*, t).$$
Therefore, (6.44) is satisfied; i.e., $\hat\beta_1$ from the assumed distinct baseline model is consistent.
When $Z_{ijk,1}$ is identically distributed for $k = 1, \ldots, K$, we have
$$s^{(1)}_{j\cdot,1}(t) = E\{Z_{1j1,1}(t)\}\, s^{(0)}_{j\cdot}(t)$$
and, under $\beta^*_1 = 0$,
$$s^{(1)}_{j\cdot,1}(\beta^*, t) = E\{Z_{1j1,1}(t)\}\, s^{(0)}_{j\cdot}(\beta^*, t).$$
It is clear that $0$ is the solution to the first component of $q(\beta) = 0$ in (6.2). It follows from Theorem 6.1 that $\hat\beta_1$ obtained under an assumed mixed baseline hazard model is consistent for $\beta_{0,1}$. Using the same argument, we can easily show that $\hat\beta_1$ from an assumed common baseline hazard model is consistent for $\beta_{0,1}$ if $Z_{ijk,1}$ is identically distributed for $j = 1, \ldots, J$ and $k = 1, \ldots, K$. □

Suppose that $Z_{ijk,1}(t)$ is the indicator of the treatment assigned in a randomized clinical trial. The random assignment of the treatment should guarantee that $Z_{ijk,1}(t)$ is independent of censoring and of the other covariates of interest. The results in Corollary 6.3 imply that, if the treatment has no effect on failure times, then the estimated treatment effect will converge to zero even if the assumed marginal model is misspecified.
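A small Monte Carlo sketch of this remark follows. The data-generating model is hypothetical: two baseline groups whose hazards differ tenfold, an omitted prognostic covariate, Uniform(0, 5) censoring, and a randomized treatment indicator with no effect. A Cox fit for the treatment alone is misspecified whether the risk sets are pooled (common baseline) or group-specific (distinct baselines), yet Corollary 6.3 predicts that both estimates converge to zero. The scalar Newton-Raphson helper assumes no tied times and is written only for this illustration.

```python
import numpy as np

def cox_scalar(time, event, z, strata, iters=30):
    """Newton-Raphson for one Cox coefficient; risk sets are formed within strata."""
    beta = 0.0
    for _ in range(iters):
        U = info = 0.0
        r = np.exp(beta * z)
        for s in np.unique(strata):
            m = strata == s
            order = np.argsort(-time[m])                 # latest first, so cumsum gives risk-set sums
            d, zz, rr = event[m][order], z[m][order], r[m][order]
            s0, s1, s2 = np.cumsum(rr), np.cumsum(rr * zz), np.cumsum(rr * zz * zz)
            ev = d == 1
            ebar = s1[ev] / s0[ev]
            U += np.sum(zz[ev] - ebar)
            info += np.sum(s2[ev] / s0[ev] - ebar ** 2)
        beta += U / info
    return beta

def simulate_null_treatment(n, seed):
    """Hypothetical design: treatment z has no effect and is independent of everything else."""
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 2, size=n)                   # baseline group
    x = rng.normal(size=n)                               # omitted prognostic covariate
    z = rng.integers(0, 2, size=n).astype(float)         # randomized treatment
    rate = np.where(group == 0, 1.0, 10.0) * np.exp(0.5 * x)
    t = rng.exponential(1.0 / rate)
    c = rng.uniform(0.0, 5.0, size=n)                    # Uniform(0, 5) censoring
    return np.minimum(t, c), (t <= c).astype(int), z, group

time, event, z, group = simulate_null_treatment(20_000, seed=7)
b_common = cox_scalar(time, event, z, np.zeros_like(group))   # ignores the groups and x
b_distinct = cox_scalar(time, event, z, group)                # ignores x only
print(round(b_common, 3), round(b_distinct, 3))
```

Both printed estimates should be within sampling error of zero. Replacing the null treatment by one with a real effect and unequal prevalence across groups would, by contrast, reproduce the bias of the pooled fit predicted by Corollary 6.2.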
6.4 Concluding Remarks

We derived the asymptotic properties of the maximum partial likelihood estimator $\hat\beta$ under a possibly misspecified marginal Cox regression hazard model. The general results were applied to some special cases, including misspecification of the type of baseline hazard function for the Cox model when the functional form of the regression portion, $\exp\{\beta' Z\}$, is correct. Simulation studies are conducted in Chapter 7 to obtain information on the effect of misspecified marginal hazard models for sample sizes encountered in practice.
Chapter 7
Simulation Studies of Marginal Hazard Model Misspecification

7.1 Introduction

This chapter deals with the numerical evaluation of marginal hazard model misspecification. In Chapter 6 we derived the asymptotic properties of the maximum partial likelihood estimator $\hat\beta$ under a possibly misspecified marginal Cox regression hazard model. In this chapter we use simulations to obtain information on the effects of misspecified marginal hazard models for sample sizes encountered in applications. We confine our attention to misspecification of the baseline type when the true marginal hazard model is in the Cox regression family and the regression part, $\exp\{\beta' Z\}$, is correctly specified.
7.2 Simulation Parameters and Summary Statistics

In this simulation study, the true marginal hazard model is a mixed baseline hazard model (6.1). The simulation parameters and methods are similar to those in Chapter 4 and are, therefore, not described in detail here. The simulation results are summarized within each simulation configuration by the following: mean $\hat\beta_m$, mean $V_m$, mean $\hat\beta_d$, mean $V_d$, mean $\hat\beta_c$, and mean $V_c$, where the subscripts $m$, $d$, and $c$ indicate that the estimator is obtained under an assumed marginal mixed baseline hazard model (i.e., the correct model), an assumed distinct baseline hazard model (6.38), and an assumed common baseline hazard model (6.40), respectively. In this simulation study, the only misspecified part of the model is the type of baseline hazard; the $\exp\{\beta' Z\}$ part is correct.
7.3 Results and Discussion

The simulation results are displayed in Tables 7.1–7.6. In these tables, $\beta_0$ and $\theta$ refer to the regression parameter and the dependence parameter of the underlying model, and $n$ is the sample size for each simulation run. Each row in the tables is based on 500 simulation runs.

These simulation results suggest that, when the underlying marginal baseline hazard model is the mixed baseline model, the estimator $\hat\beta_d$ under an assumed distinct baseline hazard model is approximately unbiased, whereas the estimator $\hat\beta_c$ obtained from an assumed common baseline model is severely biased for each simulation configuration with $\beta_0 = 0.7$. The estimator $\hat\beta_c$ is more severely biased for the exponential marginals than for the Weibull marginals. All estimators, including $\hat\beta_c$, are approximately unbiased for each simulation configuration when $\beta_0 = 0$. The unbiasedness of the estimators when $\beta_0 = 0$ is expected as a result of Corollary 6.3, since the covariate $Z$ is identically distributed over both $j$ and $k$. The sandwich-type robust variance estimator under an assumed distinct baseline hazard model agrees with that obtained from the true mixed baseline hazard model, as judged by the values of mean $V_d$ and mean $V_m$, although mean $V_d$ tends to be slightly larger than mean $V_m$. This result agrees with the conclusion of Kalbfleisch and Prentice (1980, p. 88) for univariate failure time data that the loss of efficiency in the estimate of $\beta$ is generally not severe when stratification is used unnecessarily. The robust variance estimator under an assumed common baseline hazard model underestimates the underlying true variance, except when $\beta_0 = 0$, in which case $\hat\beta_c$ is consistent.
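The size of the bias and of the variance underestimation can be read directly off Table 7.1. The snippet below recomputes, for the $\beta_0 = 0.7$ rows, the relative bias of mean $\hat\beta_c$ and the ratio mean $V_c$ / mean $V_m$. Treating mean $V_m$ from the correctly specified model as a stand-in for the true variance is an assumption made only for this comparison.

```python
# beta0 = 0.7 rows of Table 7.1: (theta, n) -> (mean beta_c, mean V_m, mean V_c)
TABLE_7_1 = {
    (0.25, 100): (0.523, 0.0053, 0.0031),
    (0.25, 200): (0.522, 0.0027, 0.0016),
    (0.80, 100): (0.523, 0.0044, 0.0030),
    (0.80, 200): (0.521, 0.0023, 0.0015),
    (1.50, 100): (0.523, 0.0040, 0.0030),
    (1.50, 200): (0.521, 0.0020, 0.0015),
    (3.00, 100): (0.523, 0.0037, 0.0030),
    (3.00, 200): (0.520, 0.0019, 0.0015),
}
BETA0 = 0.7

def summarize(table, beta0=BETA0):
    """Relative bias of mean beta_c and the variance ratio mean V_c / mean V_m, per row."""
    return {key: ((b_c - beta0) / beta0, v_c / v_m)
            for key, (b_c, v_m, v_c) in table.items()}

for (theta, n), (rel_bias, ratio) in summarize(TABLE_7_1).items():
    print(f"theta={theta:4.2f} n={n:3d}  relative bias of beta_c={rel_bias:+.1%}  V_c/V_m={ratio:.2f}")
```

Every row shows a relative bias of roughly -25% for $\hat\beta_c$, and the variance ratio ranges from about 0.58 to 0.81, moving toward 1 as $\theta$ increases.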
Table 7.1, Table 7.2, and Table 7.3 show the comparison results of censoring for
the failure times with Normal(O, 1) distributed covariate and exponential marginals.
Table 7.1 gives the summary statistics of the simulation results when there is no
censoring. The simulation results with censoring are shown in Table 7.2 and Table 7.3.
The Uniform(O,5) censoring distribution in Table 7.2 resulted in 12% and 14% of the
data censored when (30 = 0 and (30 = 0.7, respectively. The Uniform(O,l) censoring
distribution in Table 7.3 led to about 14% of the data censored. It shows clearly
from these tables that all estimators are approximately unbiased when (30
that
Pc is severely biased when (30 = 0.7 regardless of censoring proportion.
•
= 0 and
However,
the degree of underestimation of the robust variance estimator under an assumed
common baseline hazard model decreases as the censoring percentage increases.
Comparing the results from Table 7.2, and Tahle 7.4, we see that all estimators
are approximately unbiased when (30 = 0 and that
Pc is
biased when (30
= 0.7
no
matter whether the exponential marginals or the Weibull marginals are used. However,
Pc is more severely biased for the exponential marginals than for the Weibull
marginals. Also, the degree of underestimation of the robust variance estimator under
the assumed common baseline hazard model decreases when the Weibull marginals
are used.
126
.
Table 7.2 and Table 7.5 indicate that $\hat{\beta}_c$ is severely biased when $\beta_0 = 0.7$, whether a normal or a Bernoulli covariate is used. The results in Table 7.6 show that, when $\beta_0 = 0.7$ and the Bernoulli(0.5) covariate distribution and Uniform(0, 5) censoring distribution are used, $\hat{\beta}_c$ is consistently more severely biased for the exponential marginals than for the Weibull marginals.
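To see how consistent this pattern is across $\theta$, one can compute the relative bias, (mean $\hat{\beta}_c - \beta_0)/\beta_0$, for each marginal distribution. The sketch below does this with the mean $\hat{\beta}_c$ values for $\beta_0 = 0.7$ copied from Table 7.6. The function names `relative_bias` and `compare` are illustrative only.

```python
# Mean of beta_c_hat when beta0 = 0.7 and n = 100, copied from Table 7.6
# and keyed by the table's theta values.
BETA0 = 0.7
MEAN_BETA_C = {
    0.25: {"Exponential": 0.515, "Weibull": 0.659},
    0.80: {"Exponential": 0.512, "Weibull": 0.655},
    1.50: {"Exponential": 0.511, "Weibull": 0.653},
    3.00: {"Exponential": 0.511, "Weibull": 0.653},
}


def relative_bias(mean_estimate, truth=BETA0):
    """Relative bias (mean estimate - truth) / truth, as a fraction."""
    return (mean_estimate - truth) / truth


def compare(table):
    """Return {theta: (exponential relative bias, Weibull relative bias)}."""
    return {theta: (relative_bias(m["Exponential"]), relative_bias(m["Weibull"]))
            for theta, m in table.items()}


if __name__ == "__main__":
    for theta, (b_exp, b_wei) in compare(MEAN_BETA_C).items():
        print(f"theta = {theta:.2f}: exponential {b_exp:+.1%}, Weibull {b_wei:+.1%}")
```

With these inputs, the relative bias is about -26% to -27% for the exponential marginals and about -6% to -7% for the Weibull marginals, at every value of $\theta$.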
In summary, the simulation results indicate that the asymptotic results derived in Chapter 6 for the estimator under a possibly misspecified marginal model hold in practical sample sizes. Further numerical study is needed to investigate the loss of efficiency in the estimate of $\beta$ when stratification of failures is used unnecessarily.
Table 7.1: Simulation results (based on 500 simulation runs) for the true mixed
baseline hazard model with Normal(0,1) covariate, no censoring, and exponential
marginals.

β0   θ     n       mean β̂m   mean β̂d   mean β̂c   mean Vm   mean Vd   mean Vc
0    .25   100        .000      .000     -.000     .0024     .0025     .0024
           200       -.001     -.001     -.000     .0012     .0013     .0012
     .80   100        .000      .001      .000     .0024     .0025     .0024
           200       -.002     -.002     -.001     .0012     .0013     .0012
     1.50  100       -.000      .001      .000     .0025     .0025     .0024
           200       -.002     -.002     -.001     .0012     .0013     .0012
     3.00  100       -.000      .000      .000     .0025     .0025     .0025
           200       -.002     -.002     -.001     .0012     .0013     .0012
.7   .25   100        .709      .704      .523     .0053     .0052     .0031
           200        .703      .700      .522     .0027     .0027     .0016
     .80   100        .707      .703      .523     .0044     .0045     .0030
           200        .702      .700      .521     .0023     .0023     .0015
     1.50  100        .705      .702      .523     .0040     .0040     .0030
           200        .701      .699      .521     .0020     .0020     .0015
     3.00  100        .704      .702      .523     .0037     .0037     .0030
           200        .700      .699      .520     .0019     .0019     .0015
Table 7.2: Simulation results (based on 500 simulation runs) for the true mixed
baseline hazard model with Normal(0,1) covariate, Uniform(0, 5) censoring
distribution (12% censoring for $\beta_0 = 0$ and 14% censoring for $\beta_0 = 0.7$),
and exponential marginals.

β0   θ     n       mean β̂m   mean β̂d   mean β̂c   mean Vm   mean Vd   mean Vc
0    .25   100        .001      .001      .000     .0028     .0029     .0029
           200       -.001     -.001     -.001     .0014     .0014     .0014
     .80   100        .000      .000     -.001     .0029     .0029     .0029
           200       -.001     -.001     -.001     .0014     .0014     .0014
     1.50  100        .000      .000     -.001     .0029     .0029     .0029
           200       -.001     -.001     -.001     .0014     .0014     .0014
     3.00  100        .000      .000     -.001     .0029     .0029     .0029
           200       -.001     -.001     -.001     .0014     .0014     .0014
.7   .25   100        .708      .704      .521     .0056     .0056     .0036
           200        .701      .699      .518     .0029     .0029     .0018
     .80   100        .706      .704      .521     .0048     .0048     .0034
           200        .701      .699      .519     .0024     .0024     .0017
     1.50  100        .704      .703      .520     .0044     .0044     .0034
           200        .700      .699      .518     .0022     .0022     .0017
     3.00  100        .704      .703      .520     .0041     .0041     .0034
           200        .700      .700      .518     .0020     .0021     .0017
Table 7.3: Simulation results (based on 500 simulation runs) for the true mixed
baseline hazard model with Normal(0,1) covariate, Uniform(0, 1) censoring
distribution (42% censoring), and exponential marginals.

β0   θ     n       mean β̂m   mean β̂d   mean β̂c   mean Vm   mean Vd   mean Vc
0    .25   100        .001      .000     -.001     .0043     .0044     .0043
           200       -.000     -.000     -.001     .0021     .0022     .0021
     .80   100        .000     -.000     -.002     .0043     .0044     .0043
           200       -.000     -.000     -.001     .0021     .0022     .0021
     1.50  100        .000      .001     -.002     .0044     .0044     .0044
           200        .000      .000     -.001     .0021     .0021     .0021
     3.00  100       -.000     -.000     -.003     .0044     .0044     .0043
           200       -.000     -.000     -.001     .0021     .0021     .0021
.7   .25   100        .706      .703      .541     .0068     .0068     .0051
           200        .699      .698      .538     .0034     .0034     .0025
     .80   100        .705      .704      .542     .0060     .0060     .0050
           200        .700      .699      .539     .0030     .0030     .0025
     1.50  100        .703      .703      .541     .0057     .0057     .0050
           200        .700      .699      .539     .0028     .0028     .0025
     3.00  100        .703      .703      .541     .0055     .0056     .0050
           200        .699      .699      .539     .0027     .0028     .0025
Table 7.4: Simulation results (based on 500 simulation runs) for the true mixed
baseline hazard model with Normal(0,1) covariate, Uniform(0, 5) censoring
distribution (19% censoring for $\beta_0 = 0$ and 21% censoring for $\beta_0 = 0.7$),
and Weibull marginals.

β0   θ     n       mean β̂m   mean β̂d   mean β̂c   mean Vm   mean Vd   mean Vc
0    .25   100        .001      .001      .000     .0031     .0032     .0031
           200       -.000     -.000     -.001     .0015     .0016     .0015
     .80   100        .001      .001     -.000     .0031     .0032     .0031
           200       -.000     -.000     -.001     .0015     .0016     .0015
     1.50  100        .000      .001     -.000     .0031     .0032     .0031
           200       -.000     -.000     -.000     .0015     .0015     .0015
     3.00  100        .000      .001      .000     .0031     .0032     .0031
           200       -.000     -.001     -.000     .0015     .0015     .0015
.7   .25   100        .707      .703      .636     .0059     .0059     .0051
           200        .701      .699      .630     .0030     .0030     .0026
     .80   100        .706      .703      .635     .0050     .0051     .0045
           200        .700      .699      .629     .0026     .0026     .0023
     1.50  100        .704      .703      .633     .0047     .0047     .0042
           200        .700      .699      .629     .0023     .0024     .0021
     3.00  100        .703      .703      .633     .0044     .0044     .0040
           200        .700      .700      .629     .0022     .0022     .0020
Table 7.5: Simulation results (based on 500 simulation runs) for the true mixed
baseline hazard model with $\beta_0 = 0.7$, Bernoulli(0.5) covariate, Uniform(0, 5)
censoring distribution (9% censoring), and exponential marginals.

θ     n       mean β̂m   mean β̂d   mean β̂c   mean Vm   mean Vd   mean Vc
.25   100        .704      .697      .515     .0139     .0142     .0113
      200        .701      .697      .513     .0071     .0072     .0057
.80   100        .700      .695      .512     .0129     .0132     .0111
      200        .699      .696      .511     .0066     .0067     .0056
1.50  100        .698      .696      .511     .0124     .0126     .0111
      200        .698      .696      .511     .0063     .0064     .0056
3.00  100        .698      .697      .511     .0121     .0122     .0111
      200        .697      .696      .510     .0061     .0061     .0056
Table 7.6: Simulation results (based on 500 simulation runs) for the true mixed
baseline hazard model with sample size of 100, Bernoulli(0.5) covariate, and
Uniform(0, 5) censoring distribution (12% censoring for $\beta_0 = 0$ and 9% censoring
for $\beta_0 = 0.7$ with exponential marginals, and 19% censoring for $\beta_0 = 0$ and
16% censoring for $\beta_0 = 0.7$ with Weibull marginals).

β0   θ     marginals       mean β̂m   mean β̂d   mean β̂c   mean Vm   mean Vd   mean Vc
0    .25   Exponential       -.006     -.005     -.006     .0112     .0114     .0113
           Weibull           -.005     -.005     -.009     .0122     .0125     .0122
     .80   Exponential       -.009     -.008     -.009     .0112     .0114     .0113
           Weibull           -.008     -.007     -.012     .0122     .0125     .0123
     1.50  Exponential       -.008     -.008     -.009     .0113     .0114     .0113
           Weibull           -.008     -.009     -.013     .0123     .0125     .0123
     3.00  Exponential       -.008     -.007     -.010     .0113     .0114     .0113
           Weibull           -.009     -.009     -.014     .0123     .0124     .0123
.7   .25   Exponential        .704      .697      .515     .0139     .0142     .0113
           Weibull            .704      .697      .659     .0148     .0151     .0150
     .80   Exponential        .700      .695      .512     .0129     .0132     .0111
           Weibull            .700      .696      .655     .0139     .0141     .0140
     1.50  Exponential        .698      .696      .511     .0124     .0126     .0111
           Weibull            .698      .696      .653     .0134     .0136     .0135
     3.00  Exponential        .698      .697      .511     .0121     .0122     .0111
           Weibull            .697      .697      .653     .0130     .0131     .0132
Chapter 8
Remarks
Multivariate failure time data with generalized dependence structure are common in applications. However, practically no work has been done on this issue. We have proposed using a marginal model approach for multivariate failure time data with generalized dependence structure when the scientific interest lies in the effects of covariates on the risk of failures and knowledge of the dependence structure is not available. Depending on whether the baseline hazards are distinguishable, we have proposed three types of marginal hazard models: distinct baseline hazard models, common baseline hazard models, and mixed baseline hazard models. Note that the use of a common regression parameter vector $\beta$ for all members and all types of failures does not preclude the use of different $\beta$ for different members or different failures. We can use member- and failure-specific covariates to introduce different $\beta$ for different members or different failures. For example, if $Z_1 = (Z, 0)'$ and $Z_2 = (0, Z)'$, then $\beta_{01}$ and $\beta_{02}$ represent the effects of $Z$ on failure time 1 and failure time 2, respectively.
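The covariate construction in this example is easy to mechanize. The sketch below places a scalar covariate $Z$ in the slot for failure type $k$. As a result, the linear predictor $\beta' Z_k$ built from one common vector $\beta$ picks out only the $k$th component of $\beta$. The function names and the coefficient values are hypothetical.

```python
from typing import List


def failure_specific_covariate(z: float, k: int, n_types: int) -> List[float]:
    """Put scalar covariate z in slot k (0-based) of an n_types-vector.

    With n_types = 2 this gives Z_1 = (z, 0)' for k = 0 and Z_2 = (0, z)' for k = 1.
    """
    vec = [0.0] * n_types
    vec[k] = z
    return vec


def linear_predictor(beta: List[float], zvec: List[float]) -> float:
    """Compute beta' Z for the common regression vector beta."""
    return sum(b * x for b, x in zip(beta, zvec))


if __name__ == "__main__":
    beta = [0.5, -0.2]  # hypothetical effects of Z on failure types 1 and 2
    z = 1.3
    for k in range(2):
        zk = failure_specific_covariate(z, k, 2)
        # Only the k-th component of beta enters the linear predictor.
        print(f"failure type {k + 1}: Z = {zk}, beta'Z = {linear_predictor(beta, zk):.2f}")
```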
The proposed marginal models generalize the WLW model and the LWA model. Our distinct baseline model is equivalent to the WLW model if we stratify our analysis by both failure type and member in a cluster. Similarly, our common baseline model generalizes the LWA model by allowing multiple failures per member. However, to apply the WLW model or the LWA model, we have to assume either a different baseline for each combination of failure type and member in a cluster, or an identical baseline for all combinations of failures and subjects in a stratum, and neither assumption may be appropriate in applications. Our mixed baseline hazard model provides substantially greater modeling flexibility and applicability. It also enables us to handle some applied problems that existing methods cannot, such as the data from the Framingham Heart Study.
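The three model types differ only in which (failure type, member) combinations share a baseline hazard. The sketch below encodes that choice as a map from cells to stratum labels. The function name, the `kind` strings, and the example grouping are hypothetical. In this sketch a mixed model is any user-supplied grouping of the cells.

```python
from itertools import product
from typing import Dict, Optional, Tuple

Cell = Tuple[int, int]  # (failure type k, member j within a cluster), 0-based


def baseline_strata(n_types: int, n_members: int, kind: str,
                    groups: Optional[Dict[Cell, int]] = None) -> Dict[Cell, int]:
    """Map every (failure type, member) cell to a baseline-hazard stratum label.

    "distinct": each cell has its own baseline hazard (WLW-type stratification).
    "common":   all cells share one baseline hazard (LWA-type).
    "mixed":    cells share baselines according to the user-supplied `groups`.
    """
    cells = list(product(range(n_types), range(n_members)))
    if kind == "distinct":
        return {cell: i for i, cell in enumerate(cells)}
    if kind == "common":
        return {cell: 0 for cell in cells}
    if kind == "mixed":
        if groups is None or set(groups) != set(cells):
            raise ValueError("a mixed model needs a stratum for every cell")
        return dict(groups)
    raise ValueError(f"unknown kind: {kind!r}")


if __name__ == "__main__":
    # Hypothetical grouping: baselines differ by failure type but are shared
    # by the members of a cluster.
    by_type = {(k, j): k for k in range(2) for j in range(2)}
    for kind in ("distinct", "common", "mixed"):
        print(kind, baseline_strata(2, 2, kind, by_type if kind == "mixed" else None))
```

Under this encoding, the distinct and common models are the two extreme groupings, and a mixed model can sit anywhere between them.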
Inference on regression parameters for each type of model is based on a system of pseudo score equations obtained under the working assumption of independence, which falls within the framework of generalized estimating equations. It may be more efficient to use estimating equations that account for the dependence explicitly by introducing weight matrices, in the spirit of the approach suggested by Liang and Zeger (1986) for (uncensored) longitudinal data. Cai and Prentice (1995) proposed using the inverse of the matrix of covariance functions between counting process martingales to construct the weight matrix for the WLW model or the LWA model. Their simulation results, however, showed that the efficiency gains are important only if the dependence among the failure times is very strong. This may reflect the difficulty of constructing optimal weight matrices because of censoring and the nonlinear nature of the Cox model (Lin (1994)). Further research is needed to study the efficiency of estimators that incorporate the dependence structure into the estimating equations.
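Below is a minimal sketch of these pseudo score equations for a single scalar covariate. It assumes there are no tied failure times. Each record is one (member, failure type) observation carrying a baseline stratum label. One label for all records gives a common baseline model, one label per failure type and member gives a distinct baseline model, and any other grouping gives a mixed model. Under working independence each stratum contributes an ordinary Cox partial likelihood score, so the cluster structure plays no role until the variance is estimated. The names `Obs`, `score_and_info`, and `fit` are hypothetical. This is an illustration, not the estimation code used in this dissertation.

```python
import math
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Obs(NamedTuple):
    time: float   # observed time X = min(T, C)
    status: int   # 1 if the failure was observed, 0 if censored
    z: float      # scalar covariate
    stratum: int  # baseline-hazard stratum of this (member, failure type) record


def score_and_info(beta: float, data: List[Obs]) -> Tuple[float, float]:
    """Working-independence pseudo partial likelihood score U and information I.

    Each baseline stratum contributes an ordinary Cox partial likelihood score;
    cluster membership is ignored here. Assumes no tied failure times.
    """
    strata = defaultdict(list)
    for ob in data:
        strata[ob.stratum].append(ob)
    u = info = 0.0
    for records in strata.values():
        for ob in records:
            if ob.status != 1:
                continue
            risk = [o for o in records if o.time >= ob.time]  # risk set at ob.time
            w = [math.exp(beta * o.z) for o in risk]
            s0 = sum(w)
            s1 = sum(wi * o.z for wi, o in zip(w, risk))
            s2 = sum(wi * o.z ** 2 for wi, o in zip(w, risk))
            zbar = s1 / s0
            u += ob.z - zbar
            info += s2 / s0 - zbar ** 2
    return u, info


def fit(data: List[Obs], beta: float = 0.0, tol: float = 1e-10,
        max_iter: int = 50) -> float:
    """Solve the pseudo score equation U(beta) = 0 by Newton-Raphson."""
    for _ in range(max_iter):
        u, info = score_and_info(beta, data)
        step = u / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    data = [
        Obs(1.0, 1, 0.5, 0), Obs(1.5, 1, 0.0, 0), Obs(2.0, 1, -0.3, 0),
        Obs(3.0, 0, 1.2, 0), Obs(0.7, 1, -1.0, 1), Obs(1.2, 0, 0.4, 1),
        Obs(2.5, 1, 1.0, 1), Obs(3.5, 1, 0.2, 1),
    ]
    beta_hat = fit(data)
    print(f"beta_hat = {beta_hat:.4f}, U(beta_hat) = {score_and_info(beta_hat, data)[0]:.2e}")
```

The sketch recomputes each risk set from scratch, which is quadratic in the number of records per stratum. That is adequate for illustration but not for large data sets.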
Relying on the theory of multivariate counting processes, stochastic integrals, and local martingales, we have proven that, when the marginal hazard model is correctly specified, the estimators from all three types of the proposed marginal models are consistent and asymptotically normal. Their "sandwich" type robust covariance matrix can be consistently estimated. Note that, because the pseudo partial likelihood score statistic is not a sum of independent terms, the large-sample properties do not follow from the usual arguments for likelihood-based statistics. The simulation results show that the proposed large-sample approximation is adequate even when the sample size is relatively small ($n = 50$). The methodology is illustrated on data from the Framingham Heart Study.
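The pseudo score is not a sum of independent terms, so the middle ("meat") term of the sandwich is built from per-record score residuals in the manner of Lin and Wei (1989). These residuals are summed within each cluster, and the cluster totals are asymptotically independent. The scalar sketch below shows only that cluster-level aggregation. It assumes the score residuals and the pseudo information at $\hat{\beta}$ have already been computed. The function name and the input values are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Tuple


def cluster_sandwich_variance(contribs: Iterable[Tuple[Hashable, float]],
                              info: float) -> float:
    """Scalar cluster-robust variance A^{-1} B A^{-1} = B / A**2.

    contribs: (cluster_id, w) pairs, one per (member, failure type) record,
              where w is assumed to be that record's score residual at beta_hat.
    info:     the pseudo information A evaluated at beta_hat.
    Contributions are summed within each cluster before squaring, so
    dependence between and within subjects of a cluster is retained.
    """
    totals = defaultdict(float)
    for cluster_id, w in contribs:
        totals[cluster_id] += w
    meat = sum(t * t for t in totals.values())
    return meat / info ** 2


if __name__ == "__main__":
    # Hypothetical score residuals for three clusters.
    contribs = [(1, 0.4), (1, -0.1), (2, -0.5), (2, 0.3), (3, -0.1)]
    print(cluster_sandwich_variance(contribs, info=2.0))
```

With these made-up inputs, the cluster-level estimate is about 0.035. Treating the five records as independent, by squaring each residual separately, would instead give 0.13.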
The validity of the asymptotic distribution theory for the estimator depends critically on the appropriateness of the assumed marginal models, so we investigated the consequences of misspecified marginal models. We derived the general asymptotic properties of the maximum partial likelihood estimator under a possibly misspecified marginal Cox regression hazard model. The general results were then applied to some special cases. These include misspecification of the type of baseline hazard functions for the Cox model when the correct functional form $\exp\{\beta' Z\}$ is used. The simulation results indicate that the derived asymptotic results for the estimator under a possibly misspecified marginal hazard model hold in finite samples of the sizes encountered in practice. Further numerical study is needed to investigate the loss of efficiency in the estimate of $\beta$ when stratification of failures is used unnecessarily.
It is important to assess the adequacy of the marginal hazard models. Spiekerman and Lin (1996) developed a class of graphical and numerical techniques using residuals based on marginal martingale processes for checking the adequacy of the marginal Cox model. The applicability of their methods to the mixed baseline hazard models deserves investigation.
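Spiekerman and Lin's checks are built from residuals of this kind. As a starting point, the sketch below computes marginal martingale residuals for records that share one baseline hazard, using the Breslow estimator of the cumulative baseline hazard. For a mixed baseline model, one would presumably apply it separately within each baseline stratum. The function name is hypothetical. The cumulative residual processes and reference distributions used by Spiekerman and Lin (1996) are not reproduced here.

```python
import math
from typing import List, Tuple


def martingale_residuals(beta: float,
                         data: List[Tuple[float, int, float]]) -> List[float]:
    """Marginal martingale residuals delta_i - Lambda0_hat(X_i) * exp(beta * z_i).

    data: (time, status, z) records that share one baseline hazard.
    Lambda0_hat is the Breslow estimator; tied times are not handled.
    """
    def s0(t: float) -> float:
        # Sum of exp(beta * z) over records still at risk at time t.
        return sum(math.exp(beta * z) for x, _, z in data if x >= t)

    # Breslow increments dLambda0 = 1 / S0 at each observed failure time.
    jumps = [(x, 1.0 / s0(x)) for x, d, _ in data if d == 1]

    def cum_hazard(t: float) -> float:
        return sum(j for x, j in jumps if x <= t)

    return [d - cum_hazard(x) * math.exp(beta * z) for x, d, z in data]


if __name__ == "__main__":
    records = [(1.0, 1, 0.5), (2.0, 1, -0.3), (3.0, 0, 1.2),
               (1.5, 1, 0.0), (0.7, 1, -1.0)]
    res = martingale_residuals(0.4, records)
    print([round(r, 3) for r in res], "sum =", round(sum(res), 12))
```

With the Breslow estimator and no tied times, these residuals sum to zero for any value of $\beta$. This gives a quick sanity check on an implementation.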
Bibliography
Aalen, O. (1988). Heterogeneity in survival analysis, Statistics in Medicine 7: 1121-
1137.
Akaike, H. (1973). Information theory and an extension of the likelihood principle, Proceedings of the Second International Symposium on Information Theory, Akadémiai Kiadó: Budapest.
Andersen, P. K. and Borgan, Ø. (1985). Counting process models for life history data: A review, Scandinavian Journal of Statistics 12: 97-158.
Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting processes:
A large sample study, The Annals of Statistics 10: 1100-1120.
Andersen, P. K., Borgan, Ø., Gill, R. D. and Keiding, N. (1993). Statistical Models Based on Counting Processes, New York: Springer.
Anderson, G. L. and Fleming, T. R. (1995). Model misspecification in proportional
hazards regression, Biometrika 82: 527-541.
Anderson, J. E. and Louis, T. A. (1995). Survival analysis using a scale change random
effects model, Journal of the American Statistical Association 90: 669-679.
Bandeen-Roche, K. J. and Liang, K. Y. (1996). Modelling failure-time associations
in data with multiple levels of clustering, Biometrika 83: 29-39.
Bartle, R. G. (1966). The Elements of Integration, New York: Wiley.
Block, H. W. and Basu, A. P. (1974). A continuous bivariate exponential extension,
Journal of the American Statistical Association 69: 1031-1037.
Cai, J. and Prentice, R. L. (1995). Estimating equations for hazard ratio parameters
based on correlated failure time data, Biometrika 82: 151-164.
Chang, I. and Hsiung, C. A. (1995). An efficient estimator for proportional hazards
models with frailties, Unpublished Manuscript.
Chang, I. S. and Hsiung, C. A. (1994). Information and asymptotic efficiency in some
generalized proportional hazards models for counting processes, The Annals of
Statistics 22: 1275-1298.
Clayton, D. (1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence,
Biometrika 65: 141-151.
Clayton, D. and Cuzick, J. (1985). Multivariate generalizations of the proportional
hazards model, Journal of the Royal Statistical Society, Ser. A 148: 82-117.
Clayton, D. G. (1991). A Monte Carlo method for Bayesian inference in frailty models, Biometrics 47: 467-485.
Cox, D. R. (1972). Regression models and life-tables (with discussion), Journal of the
Royal Statistical Society, Ser. B 34: 187-220.
Cox, D. R. (1975). Partial likelihood, Biometrika 62: 269-276.
Cox, D. R. and Oakes, D. (1984). Analysis of Survival Data, London: Chapman and
Hall.
Cramér, H. (1946). Mathematical Methods of Statistics, Princeton University Press.
Crowder, M. (1985). A distributional model for repeated failure time measurements,
Journal of the Royal Statistical Society, Ser. B 65: 447-452.
Dawber, T. R. (1980). The Framingham Study, the epidemiology of atherosclerotic
disease, Harvard University Press.
Efron, B. and Hinkley, D. V. (1978). Assessing the accuracy of the maximum likelihood estimation: observed versus expected information, Biometrika 65: 457-482.
Farlie, D. J. G. (1960). The performance of some correlation coefficients for a general
bivariate distribution, Biometrika 47: 307-323.
Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis, New York: Wiley.
Freund, J. E. (1961). A bivariate extension of the exponential distribution, Journal
of the American Statistical Association 56: 971-977.
Gail, M. H., Santner, T. J. and Brown, C. C. (1980). An analysis of comparative carcinogenesis experiments based on multiple times to tumor, Biometrics 36: 255-266.
Gail, M. H., Tan, W. Y. and Piantadosi, S. (1988). Tests for no treatment effect in
randomized clinical trials, Biometrika 75: 57-64.
Gail, M. H., Wieand, S. and Piantadosi, S. (1984). Biased estimates of treatment effect
in randomized experiments with nonlinear regressions and omitted covariates,
Biometrika 71: 431-444.
Genest, C. and MacKay, R. J. (1986). The joy of copulas: Bivariate distributions
with univariate marginals, The American Statistician 40: 280-283.
Gill, R. D. (1980). Censoring and Stochastic Integrals, Mathematical Centre Tracts No. 124, Amsterdam: The Mathematical Centre.
Gumbel, E. J. (1960). Bivariate exponential distributions, Journal of the American
Statistical Association 55: 698-707.
Hougaard, P. (1986a). A class of multivariate failure time distributions, Biometrika
73: 671-678.
Hougaard, P. (1986b). Survival models for heterogeneous populations derived from
stable distributions, Biometrika 73: 387-396.
Hougaard, P. (1987). Modelling multivariate survival, Scandinavian Journal of Statistics 14: 291-304.
Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions, Proceedings of the Fifth Berkeley Symposium in Mathematical
Statistics and Probability, University of California Press: Berkeley, pp. 221-233.
Huster, W. J., Brookmeyer, R. and Self, S. G. (1989). Modelling paired survival data
with covariates, Biometrics 45: 145-156.
Johansen, S. (1983). An extension of Cox's regression model, International Statistical Review 51: 258-262.
Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data, New York: Wiley.
Kent, J. T. (1982). Robust properties of likelihood ratio tests, Biometrika 69: 19-27.
Klein, J. P. (1992). Semiparametric estimation of random effects using Cox model based on the EM algorithm, Biometrics 48: 795-806.
Klein, J. P., Keiding, N. and Kamby, C. (1989). Semiparametric Marshall-Olkin models applied to the occurrence of metastases at multiple sites after breast cancer, Biometrics 45: 1073-1086.
Klein, J. P., Moeschberger, M., Li, Y. H. and Wang, S. T. (1992). Estimating random effects in the Framingham Heart Study, in J. P. Klein and P. K. Goel (eds), Survival Analysis: State of the Art, Kluwer Academic Publishers: Dordrecht, pp. 99-120.
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency, The Annals
of Mathematical Statistics 22: 79-86.
Lagakos, S. W. (1988). The loss in efficiency from misspecifying covariates in proportional hazards regression models, Biometrika 75: 156-160.
Lagakos, S. W. and Schoenfeld, D. A. (1984). Properties of proportional hazards
score tests under misspecified regression models, Biometrics 40: 1037-1048.
Lee, E. W., Wei, L. J. and Amato, D. A. (1992). Cox-type regression analysis for
large numbers of small groups of correlated failure time observations, in J. P.
Klein and P. K. Goel (eds), Survival Analysis: State of the Art, Kluwer Academic
Publishers: Dordrecht, pp. 237-247.
Lee, E. W., Wei, L. J. and Ying, Z. (1993). Linear regression analysis for highly stratified failure time data, Journal of the American Statistical Association 88: 557-565.
Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized
linear models, Biometrika 73: 13-22.
Liang, K. Y., Self, S. G. and Chang, Y. (1993). Modeling marginal hazards in multivariate failure time data, Journal of the Royal Statistical Society, Ser. B 55: 441-453.
Lin, D. Y. (1994). Cox regression analysis of multivariate failure time data: The
marginal approach, Statistics in Medicine 13: 2233-2247.
Lin, D. Y. and Wei, L. J. (1989). The robust inference for the Cox proportional
hazards model, Journal of the American Statistical Association 84: 1074-1078.
Lin, J. S. and Wei, L. J. (1992). Linear regression analysis for multivariate failure time
observations, Journal of the American Statistical Association 87: 1071-1097.
Marshall, A. W. and Olkin, I. (1967). A multivariate exponential distribution, Journal of the American Statistical Association 62: 30-44.
Marshall, A. W. and Olkin, I. (1988). Families of multivariate distributions, Journal
of the American Statistical Association 83: 834-840.
Murphy, S. A. (1994). Consistency in a proportional hazards model incorporating a
random effect, The Annals of Statistics 22: 712-731.
Murphy, S. A. (1995). Asymptotic theory for the frailty model, The Annals of Statistics 23: 182-198.
Nielsen, G. G., Gill, R. D., Andersen, P. K. and Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models, Scandinavian Journal of Statistics 19: 25-43.
Oakes, D. (1982). A model for association in bivariate survival data, Journal of the
Royal Statistical Society, Ser. B 44: 414-422.
Oakes, D. (1989). Bivariate survival models induced by frailties, Journal of the American Statistical Association 84: 487-493.
Oakes, D. and Manatunga, A. (1992). Fisher information for a bivariate extreme
value distribution, Biometrika 79: 827-832.
Prentice, R. L. (1978). Linear rank test with right censored data, Biometrika 65: 167-179.
Prentice, R. L. and Cai, J. (1992). Covariance and survivor function estimation using censored multivariate failure time data, Biometrika 79: 495-512. Amendment 80: 711-712.
Prentice, R. L. and Hsu, L. (1996). Estimating equations for hazard ratio and correlation parameters in multivariate failure time analysis, to appear in Biometrika.
Prentice, R. L. and Zhao, L. P. (1991). Estimation equations for parameters in means and variances of multivariate discrete and continuous responses, Biometrics 47: 825-839.
Prentice, R. L., Williams, B. J. and Peterson, A. V. (1981). On the regression analysis
of multivariate failure time data, Biometrika 68: 373-379.
Puri, M. L. and Sen, P. K. (1971). Nonparametric Methods in Multivariate Analysis,
New York: Chapman and Hall.
Royall, R. M. (1986). Model robust confidence intervals using maximum likelihood
estimators, International Statistical Review 54: 221-226.
Rubin, D. B. (1976). Inference and missing values, Biometrika 63: 81-92.
Sarkar, S. K. (1987). A continuous bivariate exponential distribution, Journal of the
American Statistical Association 82: 667-675.
Sen, P. K. (1981). The Cox regression model, invariance principles for some induced
quantile processes and some repeated significance test, The Annals of Statistics
9: 109-121.
Sen, P. K. and Singer, J. M. (1993). Large Sample Methods in Statistics: An Introduction with Applications, New York: Wiley.
Solomon, P. J. (1984). Effect of misspecification of regression models in the analysis
of survival data, Biometrika 71: 291-298. Amendment 73: 245.
Spiekerman, C. F. and Lin, D. Y. (1996). Checking the marginal Cox model for correlated failure time data, Biometrika 83: 143-156.
Struthers, C. A. and Kalbfleisch, J. D. (1986). Misspecified proportional hazard models, Biometrika 73: 363-369.
Therneau, T. M. (1996). Extending the Cox model, Technical Report no. 58, Mayo
Foundation.
Tsiatis, A. (1981). A large sample study of Cox's regression model, The Annals of
Statistics 9: 93-108.
Vaupel, J. W., Manton, K. G. and Stallard, E. (1979). The impact of heterogeneity
in individual frailty on the dynamics of mortality, Demography 16: 439-454.
Wei, L. J., Lin, D. Y. and Weissfeld, L. (1989). Regression analysis of multivariate
incomplete failure time data by modeling marginal distribution, Journal of the
American Statistical Association 84: 1065-1073.
White, H. (1982). Maximum likelihood estimation of misspecified models, Econometrica 50: 1-25.
Zhao, L. P. and Prentice, R. L. (1990). Correlated binary regression using a quadratic exponential model, Biometrika 77: 642-648.