Higgins, J. E. (1978). "A Model for the Analysis of Survival with an Intervening Event."

A MODEL FOR THE ANALYSIS OF SURVIVAL
WITH AN INTERVENING EVENT
by
James Everett Higgins
Department of Biostatistics
University of North Carolina
Institute of Statistics Mimeo Series No. 1199
OCTOBER 1978
A Dissertation submitted to the faculty of The
University of North Carolina at Chapel Hill in
partial fulfillment of the requirements for
the degree of Doctor of Philosophy in the
Department of Biostatistics, School of Public
Health.
Chapel Hill
1978
Approved by:
Adviser
ABSTRACT
JAMES EVERETT HIGGINS. A Model for the Analysis of Survival with an
Intervening Event. (Under the direction of DENNIS B. GILLINGS.)
Methods introduced by Lagakos for incorporating information from a time-dependent covariable (an intervening event) into the analysis of failure times are generalized. The research was motivated by two follow-up studies: one involving industrial workers, where disability retirement is the intervening event, and the other involving patients with coronary artery disease, where the first non-fatal myocardial infarction after diagnosis of the disease is the intervening event. The generalized model is applied to data from these two studies.
The model assumes that individuals are potentially subject to
two paths to failure, one involving the intervening event and the
other not.
Failure time for the path including the intervening event
is the sum of the intervening event time and the time to failure
subsequent to the intervening event.
Failure time for the path with
no intervening event is the time to failure in the absence of the
intervening event.
Additional model assumptions are that the failure
times associated with the two paths are independent and that the time
to failure subsequent to the intervening event is dependent on the
intervening event time.
Model properties are developed in terms of density and survival functions, without need for reference to specific distributions. Methods for testing hypotheses about model parameters and testing the goodness of fit of the model are presented. The model is adapted for situations where the time of the intervening event is recorded only in a specified interval and death times are recorded continuously, or where both the time of the intervening event and the time of death are recorded in specified intervals.
ACKNOWLEDGMENTS
There are several people who have contributed their time and
encouragement to this research.
Foremost, I wish to thank my adviser,
Dr. Dennis B. Gillings, for his continual enthusiasm and valuable
insights.
A special thanks is extended to Dr. Regina C. Elandt-
Johnson for her timely guidance.
I would also like to acknowledge the helpful contributions of the other members of my committee, Dr. Michael J. Symons, Dr. Kerry L. Lee, Dr. Gary G. Koch, and Robert A. Rosati, M.D.
Financial assistance for my graduate study and research was
provided by NCHSR Training Grant 5-TOl-HS00045-0553 administered
through the Health Services Program of the Department of Biostatistics.
Skillful typing of the manuscript was provided by Bea Parker,
also of the Health Services Program.
Finally, my warmest thanks go to my wife, Barbara. She gave me help, encouragement, and a baby boy named Trux.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES

Chapter
I. INTRODUCTION AND REVIEW OF LITERATURE
    1.1 Introduction
    1.2 Fundamental Definitions
    1.3 Methods of Survival Analysis
        1.3.1 Nonparametric Approaches
            1.3.1.1 Life Table Methods
            1.3.1.2 Kaplan-Meier Product-Limit Method
        1.3.2 Parametric Models Considering Concomitant Variables
            1.3.2.1 Exponential-Type Models
            1.3.2.2 Other Parametric Models
        1.3.3 Semiparametric Models
        1.3.4 Time-Dependent Covariables
    1.4 Contents of the Present Work
II. AN INTERVENING EVENT SURVIVAL MODEL FOR CONTINUOUS FOLLOW-UP
    2.1 Introduction
    2.2 Description of the Model
        2.2.1 Maximum Likelihood Estimation
        2.2.2 The Unconditional Survival Function
        2.2.3 Inclusion of Covariates
    2.3 The Assumption of Independent Paths to Failure
    2.4 Dependence Structure Between X2 and X3
    2.5 Likelihood Ratio Tests
        2.5.1 Test of Independence of X2, X3
        2.5.2 Tests of Other Parameters (Model Building)
    2.6 Goodness of Fit
III. SURVIVAL ANALYSIS FOR AN INDUSTRIAL SAMPLE WITH DISABILITY RETIREMENT AS THE INTERVENING EVENT
    3.1 Introduction
    3.2 Description of the Data
    3.3 Estimation and Testing
        3.3.1 Estimation
        3.3.2 Hypothesis Tests and Model Building
        3.3.3 Goodness of Fit
        3.3.4 Discussion of Results
    3.4 Some Derivations Associated with Weibull Distributional Assumptions
        3.4.1 The Survivor Function for the Unconditional Time to Failure
            3.4.1.1 Constant Hazards Model with Covariates (X2, X3 Independent)
            3.4.1.2 Constant Hazards Model with Covariates (X2, X3 Not Independent)
            3.4.1.3 Non-Constant Hazards Model with Covariates
        3.4.2 Some Special Cases of the Pr[X1 > X2]
            3.4.2.1 Constant Hazards Model with Covariates
            3.4.2.2 Non-Constant Hazards Model with Covariates
IV. THE MODEL WHEN FOLLOW-UP IS AT FIXED INTERVALS
    4.1 Introduction
    4.2 Description and Notation for Interval Follow-up
        4.2.1 Follow-up of Intervening Event at Fixed Intervals
        4.2.2 Follow-up of Both Intervening Event and Death at Fixed Intervals
    4.3 Maximum Likelihood Estimation
        4.3.1 Follow-up of Intervening Event at Fixed Intervals
        4.3.2 Follow-up of Both Intervening Event and Death at Fixed Intervals
    4.4 Application of the Model to a Sample of Coronary Artery Disease Patients Where the Intervening Event is Observed in an Interval
        4.4.1 Background and Description of the Data
        4.4.2 Estimation and Testing
        4.4.3 Discussion of Results
V. SUMMARY AND SUGGESTIONS FOR FUTURE RESEARCH
    5.1 Summary
    5.2 Suggestions for Future Research
LIST OF REFERENCES

LIST OF TABLES

Table
3.1 Number of Employees Observed in Each of Four Possible Ways for Different Age Groups (Age at Entry into Study)
3.2 Parameter Estimates for Likelihood (3.1) with Estimated Standard Errors
3.3 Maximized Log Likelihoods for Selected Constraints to Likelihood Function (3.1)
3.4 Parameter Estimates with Estimated Standard Errors for the Likelihood (3.1) with the Constraint 63=0
3.5 Results of Goodness of Fit Test (2.24) Where the Unconditional Survival Function is Provided by (3.2) with Parameter Estimates from Table 3.4
4.1 Parameter Estimates with Estimated Standard Errors for the Likelihood (4.12)
4.2 Maximized Log Likelihoods for Selected Constraints to Likelihood Function (4.12)
4.3 Parameter Estimates with Estimated Standard Errors for Likelihood (4.12) with the Constraint a=0
4.4 Results of Goodness of Fit Test (2.24) Where the Survival Function is Provided by (4.16) with Parameter Estimates λ̂ = 0.00274, γ̂ = 1.36234

LIST OF FIGURES

Figure
2.1 Intervening Event Failure Paths
2.2 Four Types of Observations Possible when Censoring is Present and Follow-up is Continuous
4.1 Four Types of Observations Possible when Censoring is Present and Follow-up of the Intervening Event is at Intervals
4.2 Four Types of Observations Possible when Censoring is Present and Follow-up of the Intervening Event and Death is at Intervals

CHAPTER I
INTRODUCTION AND REVIEW OF LITERATURE
1.1 Introduction
A major focus of many epidemiological, clinical, and other
medical investigations is the time between an individual's entry into
the study and the occurrence of some event that relates to him. The event of interest can vary with the research setting and is often described as a "failure," with the elapsed time termed the failure time. A commonly encountered event of interest is death and the associated time is labelled death time, or more optimistically, lifetime or survival time. In the case of many types of cancer patients, an important criterion for evaluating the treatment that a patient has received is the time from treatment until death. For treatment of chronic infections, the event of interest is often not death but rather the return of the infection after application of a therapy.
Statistical methods aimed at the analysis of failure time data
are called survival analysis.
Data for the analysis of survival are
generally gathered from longitudinal or prospective studies.
These follow-up studies range from controlled clinical trials utilizing well-established principles of experimental design to essentially observational investigations where as many individuals as are available are incorporated in the analysis. Common requirements for follow-up studies are precise definitions of the population to be observed and of the failure or outcome event.
There are several complicating features related to the
collection and analysis of follow-up data that are shared by the
various methods of data gathering.
Since the period of the data
collection is usually of limited duration, the study may end before
all the members exhibit the failure event.
Data of this type can
provide only partial information on times to failures and are said to
be censored.
Additionally, study members may be lost from follow-up as a result of some event that is not of interest. Such events include the withdrawal of an individual from a study, the inability to locate an individual to obtain follow-up information, the death of a study member before an event of interest occurs, and, if death from a specific cause is the event of interest, the death of a patient from some other, competing cause.
In the analysis of failure time, it is often fruitful to consider concomitant information provided by certain features of study members that are not accounted for by the design of the study. These covariables are often helpful in explaining differences in failure times among individuals.
Some concomitant information on study members may be available at an individual's point of entry, while additional information is encountered during follow-up.
Entry point covariates are frequently
referred to as baseline variables and can be divided into two types,
those that do not vary with time and those that do.
Examples of covariables that do not vary with time are race, sex, and date of birth. Variables that are likely to change over time are blood pressure, weight, and level of cholesterol in the blood.
One type of concomitant information that becomes available only after follow-up begins is provided by a time-dependent event that may be related to failure time. Individuals enter the study free of the event but may experience it during the period of follow-up.
For
example, consider the follow-up of an industrial cohort where the
employees were actively working at their times of entry into the study.
Subsequent to entry, an employee may receive an early retirement due
to a disability that indicates health impairment and a possible reduction in lifetime.
The occurrence time of the disability retirement was not available at baseline and could only be revealed during follow-up. In a similar fashion, baseline covariates that vary with time can provide additional concomitant information during follow-up.
After introducing the fundamental technical definitions used in survival analysis, we shall review some of the methods that have been employed in the analysis of failure times. The last section in this chapter will briefly discuss the contents of the present work and outline the material in the remaining chapters.
1.2 Fundamental Definitions
Let $T$ be a non-negative random variable that represents the failure time of an individual. The fundamental definitions and relationships with respect to $T$ are as follows:

Failure Distribution Function

$$F(t) = \Pr(T \le t) = \Pr(\text{an individual fails at or before } t) \tag{1.1}$$

and

$$f(t) = dF(t)/dt \tag{1.2}$$

is the probability density function for $T$.

Survival Function

$$\bar{F}(t) = 1 - F(t) = \Pr(T > t) = \Pr(\text{an individual fails after } t) \tag{1.3}$$

Note that $f(t) = -d\bar{F}(t)/dt$.

Hazard Function

$$\mu(t) = \lim_{\Delta t \to 0} \frac{\Pr(t < T \le t + \Delta t \mid T > t)}{\Delta t} = f(t)/\bar{F}(t) \tag{1.4}$$

Since $\mu(t) = -d \log_e \bar{F}(t)/dt$, we have by integration

$$\bar{F}(t) = \exp\left(-\int_0^t \mu(\tau)\,d\tau\right) \tag{1.5}$$

which, after differentiation, yields

$$f(t) = \mu(t) \exp\left(-\int_0^t \mu(\tau)\,d\tau\right). \tag{1.6}$$

The survival function has also been labeled survivor function and survivorship function while the hazard function is alternatively known as the force of mortality and the instantaneous failure rate.
Similarly, the time-dependent events that provide concomitant
information after the follow-up begins have distribution, survival,
and hazard functions associated with their occurrence times.
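The relationships (1.4) through (1.6) can be checked numerically. The following is a minimal sketch and not part of the original development: it assumes a hypothetical linear hazard of 0.1t, chosen only for illustration, and approximates the integral in (1.5) with the trapezoidal rule. All function names are illustrative.

```python
import math

def survival_from_hazard(hazard, t, steps=10_000):
    """Approximate exp(-integral_0^t hazard(u) du), as in (1.5), by the trapezoidal rule."""
    if t <= 0:
        return 1.0
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * h)
    return math.exp(-total * h)

def density_from_hazard(hazard, t, steps=10_000):
    """Density via (1.6): hazard(t) times the survival function at t."""
    return hazard(t) * survival_from_hazard(hazard, t, steps)

# Hypothetical hazard mu(t) = 0.1 * t (an assumption, not taken from the text).
def example_hazard(t):
    return 0.1 * t

surv_at_2 = survival_from_hazard(example_hazard, 2.0)
dens_at_2 = density_from_hazard(example_hazard, 2.0)
```

For this assumed hazard the integral is 0.05t², so the computed survival value at t = 2 can be compared against exp(-0.2).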
1.3 Methods of Survival Analysis
Techniques for analyzing survival data are known to have been
available since the end of the seventeenth century when life table
methods were first introduced.
Much of the early interest and introduction of methods was in the field of actuarial science. The military and industrial needs of the Second World War and the post-war industrial boom helped motivate the extensive development of methodology called reliability theory for analyzing data on the lifetimes of man-made equipment. Many of the approaches from reliability theory have been adapted for use in living populations. More recently, there has been a great deal of interest focused on methods for analyzing survival data from epidemiologic research, clinical trials, and other medical studies.
In the following subsections we shall review some of the literature on survival analysis, giving more emphasis to methods that are related to the work in later chapters.
1.3.1 Nonparametric Approaches
1.3.1.1 Life Table Methods
Nonparametric approaches to survival analysis have their
beginnings with life table methods developed by actuaries.
Detailed
discussions of life table methods can be found in Chiang (1968) and
Gross and Clark (1975).
The life table is a summary of survival data grouped into
convenient time intervals.
For certain applications (actuarial being
one example), data are actually collected in a grouped form.
In other situations, the data are grouped to provide a simpler and more easily understood format. Typically, the vital information collected in the intervals includes the number alive at the beginning of an interval, the number dying during an interval, the number lost to follow-up during an interval, and the number withdrawn alive during an interval. Using this information, conditional estimates of the proportion surviving to the end of an interval or dying during an interval are computed along with the unconditional cumulative proportion surviving to the beginning of an interval. In addition, estimates of the density and hazard functions for an interval are often obtained.
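The interval calculations described above can be sketched as follows. This is an illustration only: it assumes the common actuarial convention that individuals lost or withdrawn during an interval are at risk for half of it, a convention the text does not state, and the counts are invented.

```python
def life_table(intervals):
    """intervals: list of (alive_at_start, deaths, lost, withdrawn), one tuple per interval."""
    rows = []
    cumulative = 1.0  # proportion surviving to the start of the first interval
    for alive, deaths, lost, withdrawn in intervals:
        # Assumed convention: losses and withdrawals count as half at risk.
        effective = alive - 0.5 * (lost + withdrawn)
        q = deaths / effective if effective > 0 else 0.0
        rows.append({
            "effective_at_risk": effective,
            "prop_dying": q,            # conditional proportion dying in the interval
            "prop_surviving": 1.0 - q,  # conditional proportion surviving the interval
            "cum_surviving_to_start": cumulative,
        })
        cumulative *= 1.0 - q
    return rows

# Hypothetical counts, invented purely for illustration.
example = [(100, 10, 4, 6), (80, 8, 2, 2), (68, 17, 0, 0)]
table = life_table(example)
```

Each row holds the conditional proportions for one interval together with the cumulative proportion surviving to the start of that interval.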
Chiang (1968) points out that the cohort life table and the current life table are the two basic forms of life tables in general use today. A cohort life table follows the mortality experience of a set of individuals born about the same time, while the current life table considers the mortality experience of a specific group whose members usually share characteristics other than a common birth date and are followed for a relatively short period of time. Because of the length of follow-up required for a cohort life table, the method has not attained widespread use for human populations. Classically, current or population life tables have relied on census and vital statistics data taken at a single period of time for all age groups (for example, the United States population in 1970).
A variant of the current life table is the medical follow-up or
clinical life table.
The data for these tables arise from follow-up
studies of individuals who share some characteristic such as a disease
or an environmental exposure.
In many instances, subjects enter the
study at different times and follow-up is frequently terminated before
survival information for all individuals is complete.
The term "cohort"
is currently in popular use to describe such a group of individuals
even though they do not in general share a common birth date.
Early
work on the description and use of clinical life table techniques was
provided by Berkson and Gage (1950) and Cutler and Ederer (1958).
1.3.1.2 Kaplan-Meier Product-Limit Method
The Kaplan and Meier (1958) product-limit approach to estimating
probabilities related to survivorship does not consider survival times
grouped in intervals, but rather the ordered times of observed deaths.
Let $t_{(1)} \le t_{(2)} \le \cdots \le t_{(i)} \le \cdots \le t_{(k)}$ be the ordered times of death and $n_i$ be the number of individuals remaining under observation just prior to $t_{(i)}$ (i.e., including the individual who died at $t_{(i)}$). By convention, Kaplan and Meier treat a death and a loss or censoring that occur at the same time $t_{(i)}$ as if the death occurred immediately before $t_{(i)}$ and the loss immediately after $t_{(i)}$. Then, the Kaplan-Meier estimate of the survival function, designated by $\hat{\bar{F}}(t)$, is

$$\hat{\bar{F}}(t) = \frac{(n_1 - 1)}{n_1} \cdot \frac{(n_2 - 1)}{n_2} \cdots \frac{(n_r - 1)}{n_r} = \prod_{i=1}^{r} \left[\frac{(n_i - 1)}{n_i}\right]$$

where $r = 1, 2, \ldots, k$ and $t_{(r)} \le t < t_{(r+1)}$. $\hat{\bar{F}}(t)$ equals one until the first death and decreases in steps at points where a death occurs. If there are no individuals lost, $\hat{\bar{F}}(t)$ coincides with the usual empirical survival function

$$\hat{\bar{F}}(t) = 1 - \frac{r}{n},$$

where $n$ is the number of individuals initially under observation.
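The product-limit calculation can be sketched in a few lines. This is a minimal illustration under the convention stated above (a death precedes a loss recorded at the same time); the function name and the sample data are hypothetical.

```python
def kaplan_meier(times, died):
    """Return [(death_time, survival_estimate)] from the product-limit formula.

    times: observation times; died: True for a death, False for a loss.
    """
    # At tied times, deaths are ordered before losses (the stated convention).
    order = sorted(zip(times, died), key=lambda pair: (pair[0], not pair[1]))
    at_risk = len(order)
    estimate = 1.0
    steps = []
    for time, is_death in order:
        if is_death:
            estimate *= (at_risk - 1) / at_risk
            if steps and steps[-1][0] == time:
                steps[-1] = (time, estimate)  # merge deaths tied at one time
            else:
                steps.append((time, estimate))
        at_risk -= 1
    return steps

# Hypothetical sample: deaths at 2, 3, and 5; losses at 3 and 8.
example_steps = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
```

With no losses the returned estimates reduce to the empirical survival function, which gives a quick check on the implementation.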
1.3.2 Parametric Models Considering Concomitant Variables
Generally, investigators in medical or epidemiologic follow-up studies deal with populations with individual characteristics that vary widely outside of a relatively small set of common traits. For example, individuals with coronary artery disease differ with respect to age, sex, race, history of hypertension, plasma triglyceride and cholesterol levels, history of diabetes, left ventricular function, and many other characteristics which may be related to survival, given the presence of the disease. Since the factors related to survival with a particular pathologic condition and, perhaps, treatment for the condition, are not always well known and are often hard to hypothesize, it is difficult to apportion patients to homogeneous subgroups to facilitate the comparison of treatment regimens. Such difficulties may necessitate the inclusion of covariables or explanatory variables for non-homogeneous populations.
There exists an abundant literature on failure time models for homogeneous populations. Mann, Schafer, and Singpurwalla (1974) presented various methods for analysis of failure time and lifetime data that are generated by known distributional forms. These distributions include the exponential, gamma, lognormal, and extreme value, with special emphasis given to the Weibull distribution. A presentation by Gross and Clark (1975) discussed most of the common survival distributions utilized in biomedical research and included a detailed coverage of the exponential distribution. Additionally, a thorough discussion of the properties and theoretical bases of many of the distributions commonly encountered in survival analysis is given by Johnson and Kotz (1970a,b). Some of the survival distributions mentioned above have been generalized to take account of concomitant information on individual observations. A variety of these generalizations are reviewed below.
1.3.2.1 Exponential-Type Models
The one parameter exponential distribution is obtained by taking the hazard $\mu(t) = \mu$ to be a constant over the range of $t$. Since the instantaneous failure rate does not involve $t$, the chance of failure in the next time interval is the same regardless of the length of the follow-up. This characteristic is referred to as the memoryless property of the exponential distribution. The probability density and survival functions of $T$ are respectively

$$f(t) = \mu \exp(-\mu t)$$

and

$$\bar{F}(t) = \exp(-\mu t), \quad t > 0.$$

An empirical check of the exponential distribution for a set of failure time data is provided by plotting $-\log_e$ of the survival function estimate versus $t$. If the exponential model is appropriate, the plot should approximate a straight line through the origin and the slope of the line provides a rough estimate of $\mu$.
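The empirical check just described can be sketched numerically. The snippet below is an illustration rather than a prescribed procedure: it fits a least-squares line through the origin to the negative log of the survival estimates, and it is exercised on exact exponential survival values with an assumed rate of 0.5.

```python
import math

def exponential_rate_from_plot(times, survival_estimates):
    """Least-squares slope through the origin of -log_e(survival) against t."""
    ys = [-math.log(s) for s in survival_estimates]
    return sum(t * y for t, y in zip(times, ys)) / sum(t * t for t in times)

# Hypothetical check on exact exponential survival values (assumed rate 0.5).
check_times = [0.5, 1.0, 2.0, 4.0]
check_surv = [math.exp(-0.5 * t) for t in check_times]
mu_rough = exponential_rate_from_plot(check_times, check_surv)
```

With real data the survival estimates would come from a life table or product-limit calculation, and the fitted slope is only a rough guide.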
A generalization of the exponential distribution to allow for concomitant variables can be obtained by letting the hazard rate be a function of the covariates $\mathbf{z}$. If $\mu(t; \mathbf{z})$ represents the hazard at time $t$ for an individual with explanatory vector $\mathbf{z}$, the exponential hazard is represented by

$$\mu(t; \mathbf{z}) = \mu(\mathbf{z}),$$

where the hazard rate does not change with $t$ but varies with $\mathbf{z}$.
Feigl and Zelen (1965) presented a method of estimating survival distributions for individual patients based on different exponential hazard rates of the form

$$\mu(t; z) = (a + bz)^{-1} \tag{1.7}$$
with the extension to multiple covariates being suggested. The method was applied to the non-censored survival data of acute leukemia patients, utilizing the logarithm of the white blood cell count at the time of diagnosis as the covariate. The model was generalized by Zippin and Armitage (1966) to consider censoring of the follow-up times of patients. Asymptotic standard errors of the parameter estimates were obtained for several study lengths given a fixed set of values of the concomitant variable and assuming various modes of entry of the patients into the study. Zippin and Armitage also indicated that the methods could be extended to multiple covariates.
A disadvantage with the hazard model given by (1.7) is that parameter values are constrained to a set that will guarantee $(a + bz)$ to be positive for all $z$'s. An alternative representation of the hazard that remains non-negative regardless of parameter values is given by

$$\mu(t; \mathbf{z}) = \mu_0 \exp(\boldsymbol{\beta}'\mathbf{z}) \tag{1.8}$$

which has frequently been called a log linear model. In effect, this model adjusts the hazard rate by a multiplicative factor which is a function of the covariates $\mathbf{z}$, where $\mathbf{z}' = (z_1, z_2, \ldots, z_p)$.
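The contrast between the two hazard forms can be illustrated with a small sketch. The reciprocal linear form follows (1.7); the log linear form is assumed here to be a constant baseline hazard multiplied by the exponential of a linear combination of the covariates. Function and parameter names are hypothetical.

```python
import math

def linear_reciprocal_hazard(a, b, z):
    """Hazard of form (1.7): (a + b*z)**-1, valid only when a + b*z > 0."""
    denom = a + b * z
    if denom <= 0:
        raise ValueError("a + b*z must be positive for a valid hazard")
    return 1.0 / denom

def log_linear_hazard(mu0, beta, z):
    """Assumed log linear form: mu0 * exp(beta'z), positive whenever mu0 > 0."""
    return mu0 * math.exp(sum(b * zi for b, zi in zip(beta, z)))
```

The first function must reject some parameter and covariate combinations, while the second returns a positive hazard for any real coefficients, which is the advantage noted above.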
Glasser (1967) applied the form given by (1.8) in analysis of the survival after surgery of two groups of patients with lung cancer. Specifically, it was assumed that the hazard rate for the $i$th individual in the $j$th group was

$$\mu_{ij}(t; z_{ij}) = \mu_j \exp(\beta z_{ij}),$$

with $z_{ij}$ being the difference in age of the $ij$th individual from the average of all the individual ages. There were two groups considered. The quantity $\mu_j$ was characterized as the "adjusted" hazard rate in group $j$ and the emphasis of the analysis was on a comparison of survivorship in the two groups. Subsequently, Prentice (1973) explicitly considered multiple covariates in (1.8) and derived inference procedures for covariate parameters and for the hazard function at average values of the covariates taken one at a time.
The log linear model (1.8) was employed by Breslow (1972, 1974) with attention restricted to an underlying hazard function $\mu_0$ which was constant (i.e., derived from an exponential failure model) between the observed uncensored failure times and can be represented by

$$\mu_0(t) = \mu_i, \quad t_{(i-1)} < t \le t_{(i)},$$

where the $t_{(i)}$ were the ordered, uncensored survival times. Additionally, the censored observations which occur in an interval $(t_i, t_{i+1})$ were adjusted so that they were treated as occurring at $t_i$. This model was compared to the linear exponential model of Feigl and Zelen (1965) and the log linear exponential model of Glasser (1967) for the comparison of survival curves related to a clinical trial of maintenance therapy for childhood leukemia. Recently, Holford (1976) has considered a model similar to that presented by Breslow (1974), where the underlying hazard function $\mu_0(t)$ is assumed constant over $k$ follow-up intervals $(x_i, x_{i+1}]$ for $i = 1, 2, \ldots, k$ where $x_1 = 0$.
Other authors have investigated topics related to linear and log linear exponential models.
Cox (1964) discussed some aspects of the linear exponential model (1.7) related to exponential ordered scores. Cox and Snell (1968) presented techniques for evaluating residuals from a fitted regression and applied these to the log linear model (1.8). Sprott and Kalbfleisch (1969) considered the use of the likelihood functions, both exact and resulting from large sample theory, as measures of the plausibility of parameter values. Mantel and Myers (1971) looked at problems of convergence in the solution of maximum likelihood iterative equations.
Additional literature related to applications of linear and log linear exponential survival models has been given by Cooper et al. (1972), Bayard et al. (1974), Byar et al. (1974), Ramirez et al. (1975), and Levin et al. (1976).
1.3.2.2 Other Parametric Models
A useful generalization of the exponential distribution is to allow for a power dependence of the hazard rate on time in the following way

$$\mu(t) = \lambda \gamma t^{\gamma - 1}$$

where $\lambda, \gamma > 0$. This hazard yields the Weibull distribution and is monotone increasing for $\gamma > 1$, monotone decreasing for $\gamma < 1$, and reduces to the constant exponential hazard for $\gamma = 1$. The probability density and survival functions of $T$ are respectively

$$f(t) = \lambda \gamma t^{\gamma - 1} \exp(-\lambda t^{\gamma})$$

and

$$\bar{F}(t) = \exp(-\lambda t^{\gamma}), \quad t > 0.$$

Since

$$\log_e[-\log_e \bar{F}(t)] = \log_e \lambda + \gamma \log_e t,$$

an empirical check of the Weibull distribution for a set of failure time data is provided by a plot of $\log_e[-\log_e \hat{\bar{F}}(t)]$ versus $\log_e t$, where $\hat{\bar{F}}(t)$ is a sample estimate of the survival function. If the Weibull model is appropriate, the plot should give approximately a straight line. The slope of the line provides a rough estimate of $\gamma$ and the $\log_e t = 0$ intercept an estimate of $\log_e \lambda$.
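The Weibull plotting check can be sketched in the same spirit. The example below is only an illustration: it fits an ordinary least-squares line to the transformed points and reads the shape parameter from the slope and the scale parameter from the exponentiated intercept. It is exercised on exact Weibull survival values with assumed parameters (scale 0.3, shape 1.5).

```python
import math

def weibull_plot_fit(times, survival_estimates):
    """Ordinary least-squares line through (log t, log(-log survival))."""
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(s)) for s in survival_estimates]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, math.exp(intercept)  # rough (shape, scale) estimates

# Hypothetical check on exact Weibull survival values exp(-0.3 * t**1.5).
check_times = [0.5, 1.0, 2.0, 4.0]
check_surv = [math.exp(-0.3 * t ** 1.5) for t in check_times]
gamma_rough, lambda_rough = weibull_plot_fit(check_times, check_surv)
```

Survival estimates equal to zero or one cannot be transformed and would need to be excluded before fitting.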
Further, the Weibull distribution can be generalized to obtain a regression model by utilizing the multiplicative or log linear form

$$\mu(t; \mathbf{z}) = \lambda \gamma (\lambda t)^{\gamma - 1} \exp(\boldsymbol{\beta}'\mathbf{z}) \tag{1.9}$$

where $\lambda, \gamma > 0$ and $\lambda \gamma (\lambda t)^{\gamma - 1}$ is the hazard function. Nelson and Hahn (1971) and Prentice and Shillington (1975) used hazard functions similar to (1.9) to investigate, respectively, accelerated life testing in an industrial setting and survivorship of advanced lung cancer patients. Additionally, Dyer (1975) explored model (1.9) and labeled it an exponential-Weibull model.
An alternative approach to the study of survival data is to consider a binary response variable (for example, alive or not alive) at the end of some period of observation. The multiple logistic function (see Truett, Cornfield, and Kannel [1967] and Walker and Duncan [1967]) relates the risk factors $\mathbf{z}_i$ of the $i$th individual of a sample with the probability that the individual dies within a specified period of time by the model

$$P(\mathbf{z}_i) = [1 + \exp(-\boldsymbol{\beta}'\mathbf{z}_i)]^{-1} \tag{1.10}$$

The logistic regression model has several potential deficiencies. It does not utilize the time at which deaths occur, but rather whether or not a death has occurred in a particular time period. The probability estimates derived from the model are limited in applicability to the period of time equal to that used for parameter estimates. Finally, the model is most appropriate when the follow-up time is essentially equal for all study members, which is usually not the case in on-going clinical studies. Myers, Hankey, and Mantel (1973) have discussed an adaptation of the multiple logistic model to incorporate individual response time and have labeled their model a logistic-exponential regression model.
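As a small illustration of the multiple logistic function in its standard form, the reciprocal of one plus the exponential of the negated linear predictor, the sketch below evaluates the probability of death within the fixed period for one individual. The function name is hypothetical, and including a leading 1 in the risk factor vector to carry an intercept is an assumption of this sketch.

```python
import math

def logistic_death_probability(beta, z):
    """Standard logistic form: 1 / (1 + exp(-beta'z))."""
    linear = sum(b * zi for b, zi in zip(beta, z))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients (intercept, one risk factor) and risk factor vector.
p_example = logistic_death_probability([-2.0, 0.5], [1.0, 4.0])
```

The returned value always lies strictly between zero and one, but it says nothing about when within the period the death occurs, which is the first deficiency noted above.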
1.3.3 Semiparametric Models
In 1972 D.R. Cox introduced a survival model based on the assumption that explanatory variables $\mathbf{z}$ (in clinical studies these can be thought of as prognostic and/or treatment factors) exert multiplicative effects on the hazard function of an underlying survival distribution which assumes no particular parametric form. This model is expressed by

$$\mu(t; \mathbf{z}) = \mu_0(t) \exp(\boldsymbol{\beta}'\mathbf{z}) \tag{1.11}$$

Model (1.11) is of the same form as the log linear model (1.8) with the exception that the form of the underlying hazard $\mu_0(t)$ is unspecified. As will be discussed later, the covariate vector may contain time-dependent covariables and so be represented by $\mathbf{z}(t)$.
Suppose that $t_1 < t_2 < \cdots < t_k$ represent the $k$ distinct times to death among a set of survival times, and further suppose that the death times are unique. Let $\mathbf{z}_i$ represent the vector of covariates for the patient dying at $t_i$ and denote by $R_i$ the set of labels for patients whose survival times are at least as large as $t_i$. With this notation, the conditional probability that only individual $i$ dies at $t_i$, given that someone in the risk set $R_i$ died, is represented by

$$\frac{\exp(\boldsymbol{\beta}'\mathbf{z}_i)}{\sum_{\ell \in R_i} \exp(\boldsymbol{\beta}'\mathbf{z}_\ell)}; \tag{1.12}$$

see Cox (1972). The product of these terms for each death point yields the conditional likelihood function

$$L(\boldsymbol{\beta}) = \prod_{i=1}^{k} \frac{\exp(\boldsymbol{\beta}'\mathbf{z}_i)}{\sum_{\ell \in R_i} \exp(\boldsymbol{\beta}'\mathbf{z}_\ell)} \tag{1.13}$$

Standard likelihood methods were suggested for estimating $\boldsymbol{\beta}$ and for significance testing.
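The conditional likelihood (1.13) can be evaluated directly for a small sample. The following is a minimal sketch under the assumption of unique death times; the function name and argument layout are illustrative, and no maximization is attempted.

```python
import math

def cox_log_partial_likelihood(beta, times, died, covariates):
    """Log of the conditional likelihood (1.13), assuming no tied death times.

    times: survival or censoring times; died: True for deaths;
    covariates: one covariate list per patient; beta: coefficient list.
    """
    scores = [sum(b * z for b, z in zip(beta, row)) for row in covariates]
    log_lik = 0.0
    for i, (t_i, is_death) in enumerate(zip(times, died)):
        if not is_death:
            continue  # censored patients contribute only through risk sets
        # Risk set: patients whose times are at least as large as t_i.
        risk_total = sum(math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        log_lik += scores[i] - math.log(risk_total)
    return log_lik
```

In practice this function would be handed to a numerical optimizer to obtain the coefficient estimates; with all coefficients set to zero each death contributes the negative log of the size of its risk set.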
The conditional likelihood (1.13) is based on assumptions about the continuous nature of survival times, while clinical data frequently show ties in survival times.
For handling tied data, Cox (1972) proposed a discrete analog of the model (1.11) based on a linear logistic model. Letting $p_i(\mathbf{z})$ represent the conditional probability of death at $t_i$ for an individual with covariate vector $\mathbf{z}$ who survived up to $t_i$, the discrete time model is given by

$$\log_e\left[\frac{p_i(\mathbf{z})}{1 - p_i(\mathbf{z})}\right] = \boldsymbol{\beta}'\mathbf{z} + \log_e\left[\frac{p_i}{1 - p_i}\right] \tag{1.14}$$

where $p_i$ represents the corresponding conditional probability for the underlying survival distribution.
Kalbfleisch and Prentice (1973) formalized some of Cox's arguments for the case of no time-dependent covariables by utilizing the information available in the rank statistics of the survival times. They also offered an alternative to the discrete version (1.14) of model (1.11). Other considerations and applications of Cox's regression model can be found in Kalbfleisch (1974a, 1974b) and Breslow (1974, 1975).
1.3.4 Time-Dependent Covariables
Both in presentation and application, the regression models of Sections 1.3.2.1 and 1.3.2.2 implicitly assume that the study members' prognostic and/or treatment factors (covariable vector $\mathbf{z}$) are recorded once and at a point in time which marks the beginning of survival time measurements for all patients of interest. That is, the covariates $\mathbf{z}' = (z_1, z_2, \ldots, z_p)$ are all known constants as in the usual regression problem. Changes in the $z$'s might reasonably be expected to occur for some individuals over the course of a lengthy follow-up and the information contained in such changes might be of fundamental interest to the survival study. Several authors have recently directed attention to time-dependent covariables.
The regression model presented by Cox (1972) allowed for covariables which are functions of time as represented by

$$\mu(t; \mathbf{z}(t)) = \mu_0(t) \exp(\boldsymbol{\beta}'\mathbf{z}(t)), \tag{1.15}$$

the hazard function for an individual with a vector $\mathbf{z}(t)$ of time-dependent covariables. For example, in the two sample problem where there is a covariable $z_1$ with binary values 0 and 1 acting as indicators of the two samples, a second covariable $z_2 = t z_1$ can be added to reflect changes over time in the ratio of the two hazards (i.e., if sample 1 is indicated by $z_1 = 0$, then the ratio of the hazard of sample 2 to sample 1 without the time-dependent covariable is the constant $\exp(\beta_1 z_1)$ while the ratio with $z_2$ included is $\exp(\beta_1 z_1 + \beta_2 z_2)$ and varies with time).
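The two sample example can be made concrete with a short sketch. It computes the ratio of the hazard of sample 2 (indicator equal to 1) to that of sample 1 (indicator equal to 0) when the second covariable is time multiplied by the indicator. The function name and coefficient values are hypothetical.

```python
import math

def hazard_ratio(beta1, beta2, t):
    """Ratio of sample 2 (z1 = 1) to sample 1 (z1 = 0) hazards with z2 = t * z1."""
    z1 = 1.0
    z2 = t * z1
    return math.exp(beta1 * z1 + beta2 * z2)

# Hypothetical coefficients: the ratio starts at exp(0.7) and declines with time.
ratios = [hazard_ratio(0.7, -0.1, t) for t in (0.0, 1.0, 5.0)]
```

Setting the second coefficient to zero recovers a ratio that is constant in time, so a test of that coefficient is a check on the proportionality of the two hazards.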
Bayard et al. (1974) utilized, in their words, an "exponential-type" model with time-dependent covariables for comparing treatments for prostatic cancer patients. In the model, the hazard for the $i$th individual assumes the form

$$\mu_i(t) = \sum_{j=1}^{k} \beta_j z_{ij} + \gamma t_i \tag{1.16}$$

where the $z_{ij}$ are assumed to be baseline covariables for the $i$th individual in the $j$th treatment that do not vary with time and $t_i$ represents either the total follow-up time or the time removed from an assigned treatment. The time variable was included to investigate the effect of different withdrawal times from treatments.
The Stanford Heart Transplant Program has been a focus of interest
for considerations of time-dependent covariables.
Patients are subject
to a change of treatment status in that upon entry into the program,
the patient must wait until a suitable donor is located for transplantation.
Several explanatory variables such as waiting time for transplant
are observed during patient follow-up and depend on time elapsed to
transplant.
Turnbull et al. (1974) considered an exponential survival model in which each patient was assumed to have a constant hazard $\mu_1$ that changes at the time of transplant to $\mu_2$. The likelihood for the sample of $N+M$ patients where $N$ ($N-n$ were censored) patients did not receive a transplant and $M$ ($M-m$ were censored) patients did, was given as
$$L = \prod_{i=1}^{n} \mu_1 e^{-\mu_1 w_i} \prod_{k=n+1}^{N} e^{-\mu_1 w_k} \prod_{j=1}^{m} e^{-\mu_1 x_j} \mu_2 e^{-\mu_2 y_j} \prod_{l=m+1}^{M} e^{-\mu_1 x_l} e^{-\mu_2 y_l} \tag{1.17}$$
where $x$ is the transplant time, $w$ represents failure or censoring time for
those who do not receive a transplant during the study, and $y$ corresponds
to failure or censoring time measured from the point of the transplant. One
shortcoming with Turnbull's model is that the time of transplant is
treated as a constant rather than a random variable with an associated
hazard.
Crowley and Hu (1977) adapted the survival model suggested by
Breslow (1972,1974), discussed earlier in Section 1.3.2, to consider a
set of covariates which are subject to change at a random point in time
with the objective of analyzing the change in hazard at transplant.
Two types of time-dependent covariables were utilized with respect to the heart transplant data. One covariable represented the waiting time of a patient to transplant, while a set of covariates was viewed as being zero before transplant, but changing from zero to the actual value of the particular covariates when the transplant took place. As with Turnbull's (1974) model, the random nature of the time-dependent covariables was not fully accounted for in the analysis.
Recently, Lagakos (1976,1977) investigated the survival of
advanced lung cancer patients by using an exponential model that incorporated the stochastic nature of a time-dependent covariable.
The time-dependent covariate was elapsed time until progression of the disease, where progression was objectively defined. In the model, random variable $T$ denoted failure time, $P$ denoted time until progression, and $c$ denoted whether or not a patient experiences progression before death. Random variables $T$, $P$, and $c$ were constructed from the mutually independent exponential random variables $X_1$, $X_2$, and $X_3$ representing (i) death time without progression, (ii) progression time, and (iii) death time subsequent to progression, and with the respective hazards $\lambda_1$, $\lambda_2$, and $\lambda_3$.
The interpretation of the model is that patients begin the study with respective hazard rates for death and progression of $\lambda_1$ and $\lambda_2$. If progression is observed, the death hazard changes to $\lambda_3$ and subsequent failure time is independent of the time to progression.
When there is censoring, Lagakos proposes additional notation. Letting random variable $Y$ represent "potential" censoring time (i.e., the time between entry into the study and an event such as point of analysis that results in censoring), and with $T$ and $P$ as described by (1.18), he further defines
$$U = \min(T, Y), \qquad a = \begin{cases} 0 & \text{if } T > Y \\ 1 & \text{if } T \le Y \end{cases}, \qquad V = \min(P, U), \qquad b = \begin{cases} 0 & \text{if } P \ge U \\ 1 & \text{if } P < U \end{cases} \tag{1.19}$$
In terms of the 4-tuple $(U, a, V, b)$, the four possible types of outcomes that can be observed for a patient are as follows:
1. $(U=s, a=0, V=s, b=0)$: Patient alive at time $s$, progression has not occurred.
2. $(U=s, a=0, V=p, b=1)$: Progression at time $p$, patient alive at time $s$ $(s>p)$.
3. $(U=t, a=1, V=p, b=1)$: Progression at time $p$, death at time $t$ $(t>p)$.
4. $(U=t, a=1, V=t, b=0)$: Death at time $t$ without progression.
Treating $Y$ as a constant and supposing that $(U_i, a_i, V_i, b_i)$, $i = 1, 2, \ldots, n$ are i.i.d. vectors, Lagakos proposes the following likelihood for homogeneous populations:
$$L = \lambda_1^{K_1} \lambda_2^{K_2} \lambda_3^{K_3} \exp\{-(\lambda_1 + \lambda_2) V_{\cdot} - \lambda_3 (U_{\cdot} - V_{\cdot})\},$$
where
$$U_{\cdot} = \sum_{i=1}^{n} U_i, \qquad V_{\cdot} = \sum_{i=1}^{n} V_i, \qquad K_1 = \sum_{i=1}^{n} a_i (1 - b_i), \qquad K_2 = \sum_{i=1}^{n} b_i, \qquad K_3 = \sum_{i=1}^{n} a_i b_i,$$
and $K_1$, $K_2$, and $K_3$ represent, respectively, the number of deaths without progression, the number of progressions, and the number of deaths with progression. Maximum likelihood estimates of $\lambda_1$, $\lambda_2$, and $\lambda_3$ are shown to be
$$\hat{\lambda}_j = K_j / V_{\cdot}, \quad j = 1, 2, \qquad \hat{\lambda}_3 = K_3 / (U_{\cdot} - V_{\cdot}),$$
and likelihood ratio tests are used for hypothesis testing of functions of parameter values. Lagakos also proposes a likelihood for heterogeneous populations by using the log linear model (1.8) to account for differences in individual characteristics as reflected through a vector of covariates.
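The closed-form estimates for the homogeneous case can be sketched in a few lines of Python. The function name `lagakos_mle` and the four records are hypothetical, made up only to show one record of each outcome type; the arithmetic follows the totals $U_{\cdot}$, $V_{\cdot}$ and the counts $K_1$, $K_2$, $K_3$ defined in the text.

```python
def lagakos_mle(observations):
    """Closed-form estimates for the constant-hazard model.

    observations: iterable of (U, a, V, b) tuples, where
      U = min(T, Y), a = 1 if death was observed (T <= Y), else 0,
      V = min(P, U), b = 1 if progression was observed (P < U), else 0.
    Returns (lambda1_hat, lambda2_hat, lambda3_hat).
    """
    obs = list(observations)
    u_total = sum(u for u, _, _, _ in obs)        # total follow-up time
    v_total = sum(v for _, _, v, _ in obs)        # time at risk before progression
    k1 = sum(a * (1 - b) for _, a, _, b in obs)   # deaths without progression
    k2 = sum(b for _, _, _, b in obs)             # progressions
    k3 = sum(a * b for _, a, _, b in obs)         # deaths after progression
    return k1 / v_total, k2 / v_total, k3 / (u_total - v_total)

# Hypothetical records, one of each outcome type.
data = [
    (4.0, 0, 4.0, 0),  # alive at 4, no progression
    (6.0, 0, 2.0, 1),  # progression at 2, alive at 6
    (5.0, 1, 3.0, 1),  # progression at 3, death at 5
    (1.0, 1, 1.0, 0),  # death at 1 without progression
]
print(lagakos_mle(data))
```

Note that the third estimate is undefined when no progression is observed, since then $U_{\cdot} = V_{\cdot}$; the sketch does not guard against that case.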
1.4 Contents of the Present Work
The objectives of the present work are to generalize the approach
by Lagakos (1976) for incorporating information from a time-dependent
covariable into the analysis of failure time and to apply the resulting
methodology for the analysis of survival in a sample of industrial
workers and a sample of patients with coronary artery disease.
In Chapter II, the generalized model is introduced and developed for follow-up data, the Lagakos formulation being a special case. Labeling time-dependent covariates such as progression of disease as intervening events, the model does not require that survival subsequent to the intervening event be independent of the time to the intervening event, and a test of the independence assumption is proposed. Properties of the model are developed in terms of density and survival functions without need for reference to specific distributions. A method for testing the goodness of fit of the model to observed data is discussed.
Chapter III presents an analysis of failure time data from a sample of industrial workers where disability retirement is taken as the intervening event. The two-parameter Weibull family of distributions is used with the model and some associated derivations specific to this distributional choice are provided.
In Chapter IV, the methods of Chapter II are extended to cover the situations where
1. the time of the intervening event is recorded only in a specified (fixed) interval, or
2. both the time of the intervening event and the time of death are recorded only in specified (fixed) intervals.
Data from a sample of patients with coronary artery disease are used to illustrate application of the methods when follow-up of the intervening event is at intervals. For this example, a non-fatal myocardial infarction after entry into the study is used as the intervening event.
Finally, Chapter V provides a summary of the present research and offers suggestions for future research.
CHAPTER II
AN INTERVENING EVENT SURVIVAL MODEL
FOR CONTINUOUS FOLLOW-UP
2.1 Introduction
In some follow-up studies of the death times of individuals, there are events observed that may signal a change in the individual's risk of dying. Including information provided by such events could potentially improve a lifetime analysis. For example, consider individuals in an industrial cohort where an early retirement due to a disability indicates health impairment and possibly a reduction in life expectancy. Information about which employees will be subject to a disability retirement is not available when individuals enter the workforce and can only be collected during follow-up. This precludes treatment of the information as a baseline covariable. Data from this example motivated the work presented in this chapter and more detail about the form of the observations, to be analyzed in a later chapter, deserves mention at this time.
The industrial cohort consists of 6421 white males who had been actively employed as of January 1, 1964 for at least ten years at a single rubber manufacturing location. These employees were followed for eight years and the exact times of disability retirements, if appropriate, along with the exact times of death by any cause were recorded. In addition, baseline information on age is available.
Henceforth, we shall label as intervening events occurrences such as a disability retirement, and use death and the more general term failure interchangeably. Death acts to censor observation of an intervening event, since an intervening event cannot be observed subsequent to a failure. However, a failure can be observed after an intervening event. Individuals can be viewed as failing by one of the two paths illustrated by Figure 2.1 where $x_2$ represents the time at which the intervening event occurs and $t$ the time of the failure. In Figure 2.1, the upper path to failure includes the intervening event prior to death while the lower path does not. We shall need the following random variables:
$X_1$: time to failure in the absence of the intervening event,
$X_2$: time to the intervening event,
$X_3$: time to failure measured from the occurrence of the intervening event.
Figure 2.1. Intervening Event Failure Paths
It is assumed that all the individuals in the study are potentially subject to both the intervening event and death. When follow-up of subjects begins, the intervening event and death are in competition. We define
$$T = \begin{cases} X_1 & \text{if } X_1 \le X_2 \\ X_2 + X_3 & \text{if } X_2 < X_1 \end{cases} \tag{2.1}$$
where $T$ is the time at death. Thus, if the intervening event occurs before failure (i.e., $X_2 < X_1$), the time to failure is given by $T = X_2 + X_3$. However, if $X_1 \le X_2$, the time to failure is taken to be $T = X_1$, and $X_2$ is no longer of concern.
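The two-path construction of $T$ can be sketched in Python as follows. The helper names are hypothetical, and the simulation draws $X_1$, $X_2$, and $X_3$ as mutually independent exponential variables purely for illustration; the model developed in this chapter allows $X_3$ to depend on $X_2$.

```python
import random

def failure_time(x1, x2, x3):
    """Return (T, event_occurred) from the two competing paths.

    If x1 <= x2 the individual fails without the intervening event and
    T = x1; otherwise the intervening event occurs first and T = x2 + x3.
    """
    if x1 <= x2:
        return x1, False
    return x2 + x3, True

def simulate(n, rate1, rate2, rate3, seed=0):
    """Simulate n individuals with independent exponential X1, X2, X3.

    Independence of X3 from X2 is a simplifying assumption made only
    for this illustration.
    """
    rng = random.Random(seed)
    return [
        failure_time(rng.expovariate(rate1),
                     rng.expovariate(rate2),
                     rng.expovariate(rate3))
        for _ in range(n)
    ]

sample = simulate(1000, rate1=0.1, rate2=0.2, rate3=0.5)
share_with_event = sum(1 for _, occurred in sample if occurred) / len(sample)
print(share_with_event)
```

Under these assumed rates, the share of individuals following the upper path should be near $0.2 / (0.1 + 0.2)$, the probability that an exponential $X_2$ precedes an independent exponential $X_1$.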
As is common in follow-up studies, the observation period may
end before either failure or the intervening event, or after observing
only the intervening event.
For either situation, the observation is a
censoring time for each corresponding individual.
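If $c$ corresponds to the time at which censoring was observed and $t$ and $x_2$ are as defined for Figure 2.1, the diagrams in Figure 2.2 represent the four types of observations possible.
[Figure 2.2 time lines. Type 1: failure at $t$ without the intervening event. Type 2: intervening event at $x_2$, then failure at $t$. Type 3: intervening event at $x_2$, then censoring at $c$. Type 4: censoring at $c$ before either event.]
Figure 2.2. Four Types of Observations Possible when Censoring is Present and Follow-up is Continuous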
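A record can be assigned to one of the four observation types from which of the times $t$, $x_2$, and $c$ were recorded. The sketch below is a hypothetical helper, not part of the original analysis; it assumes that an unobserved time is coded as `None`.

```python
def observation_type(t=None, x2=None, c=None):
    """Classify one follow-up record into the four types of Figure 2.2.

    t  : time of failure, or None if no failure was observed
    x2 : time of the intervening event, or None if it was not observed
    c  : censoring time, or None if the individual was not censored
    """
    if (t is None) == (c is None):
        raise ValueError("exactly one of t and c must be given")
    end = t if t is not None else c
    if x2 is not None and x2 > end:
        raise ValueError("intervening event cannot follow the end of follow-up")
    if t is not None:
        # Failure observed: type 1 without the event, type 2 with it.
        return 1 if x2 is None else 2
    # Censored: type 3 after the event, type 4 before either event.
    return 3 if x2 is not None else 4

print(observation_type(t=5.0))
print(observation_type(t=5.0, x2=2.0))
print(observation_type(x2=2.0, c=8.0))
print(observation_type(c=8.0))
```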
The most mathematically manageable assumption to make about the covariance structure of $X_1$, $X_2$, and $X_3$ is that the random variables are mutually independent. This assumption and that of constant hazards for $X_1$, $X_2$, and $X_3$ represent the foundation of the progression of disease model presented by Lagakos (1976) and discussed earlier in Chapter I. With the mutual independence assumption, the likelihood function for the observations is composed of three distinct components corresponding respectively to the observation of realizations of $X_1$, $X_2$, and $X_3$. If desired, these three components can be maximized separately to generate maximum likelihood estimates for the model. Additionally, the constant hazards assumption permits the computation of the maximum likelihood estimates to proceed in a straightforward manner when baseline covariates are not included.
In the remainder of the chapter, we shall consider a model where it will be assumed that random variable $X_1$ is independent of both $X_2$ and $X_3$ (i.e., the two failure paths are independent), but that $X_2$ and $X_3$ are dependent. Properties of the model will be developed in terms of density and survival functions without need for reference to specific distributions and a test of the dependence assumption for $X_2$ and $X_3$ is provided. The Lagakos formulation represents a special case of the model where $X_2$ and $X_3$ are taken to be independent and constant hazards are assumed for $X_1$, $X_2$, $X_3$.
2.2 Description of the Model
For the model we shall assume that random variable $X_1$, corresponding to the time of failure in the absence of the intervening event, is independent of both random variables $X_2$ and $X_3$. This is equivalent to assuming that the two failure paths depicted in Figure 2.1 are independent. A discussion of this assumption is provided in Section 2.3. In addition, we shall assume that $X_3$, representing the time to failure measured from the intervening event, depends on time to the intervening event $X_2$. A way of accounting for the dependence is discussed in Section 2.4 and applied in later chapters.
In terms of the industrial cohort example, the assumptions about the dependence/independence of $X_1$, $X_2$, and $X_3$ are interpreted as meaning that the time to a disability retirement is stochastically independent of the time to death in the absence of a disability retirement. However, given that a disability retirement has occurred, the time to death from the point of the disability retirement depends on the time to the disability retirement.
We introduce the following notation for the distributions of $X_1$, $X_2$, and $X_3 \mid X_2$:
$X_1$ has density function $f(x_1)$, distribution function $F(x_1)$, and survival function $\bar{F}(x_1)$;
$X_2$ has density function $g(x_2)$, distribution function $G(x_2)$, and survival function $\bar{G}(x_2)$;
$X_3 \mid X_2$ has conditional density function $h(x_3 \mid x_2)$, conditional distribution function $H(x_3 \mid x_2)$, and survival function $\bar{H}(x_3 \mid x_2)$.
2.2.1 Maximum Likelihood Estimation
To obtain the likelihood function, it is necessary to determine the likelihood contributions of the four events in Figure 2.2. Considering the events in the same order as in Figure 2.2, the contributions are determined as follows:
1. The conditional distribution of the failure time in the absence of the intervening event is
$$F_1(x_1 \mid X_1 \le X_2) = \Pr[X_1 \le x_1 \mid X_1 \le X_2] = \frac{\Pr[(X_1 \le x_1) \cap (X_1 \le X_2)]}{\Pr[X_1 \le X_2]} = \frac{1}{\Pr[X_1 \le X_2]} \int_0^{x_1} f(y_1) \bar{G}(y_1) \, dy_1. \tag{2.2}$$
Then if $\pi_1 = \Pr[X_1 \le X_2]$, the conditional density is found by differentiation to be
$$f_1(x_1 \mid X_1 \le X_2) = \frac{1}{\pi_1} f(x_1) \bar{G}(x_1) \tag{2.3}$$
and the contribution to the likelihood for an observation that failed at $x_1$ is
$$\pi_1 f_1(x_1 \mid X_1 \le X_2) = f(x_1) \bar{G}(x_1). \tag{2.4}$$
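2. The joint conditional distribution of the intervening event time and failure time after the intervening event is
$$\begin{aligned} F_{23}(x_2, x_3 \mid X_2 < X_1) &= \frac{\Pr[(X_2 \le x_2) \cap (X_3 \le x_3) \cap (X_2 < X_1)]}{\Pr[X_2 < X_1]} \\ &= \frac{1}{\Pr[X_2 < X_1]} \int_0^{x_3} \int_0^{x_2} \int_{y_2}^{\infty} f(y_1) g(y_2) h(y_3 \mid y_2) \, dy_1 \, dy_2 \, dy_3 \\ &= \frac{1}{\Pr[X_2 < X_1]} \int_0^{x_3} \int_0^{x_2} \bar{F}(y_2) g(y_2) h(y_3 \mid y_2) \, dy_2 \, dy_3. \end{aligned} \tag{2.5}$$
Letting $\pi_2 = \Pr[X_2 < X_1]$, the joint conditional density is found by differentiation to be
$$f_{23}(x_2, x_3 \mid X_2 < X_1) = \frac{1}{\pi_2} \bar{F}(x_2) g(x_2) h(x_3 \mid x_2)$$
and the contribution to the likelihood, for an observation with intervening event time $x_2$ and failure time $x_3$ after the intervening event, is
$$\pi_2 f_{23}(x_2, x_3 \mid X_2 < X_1) = \bar{F}(x_2) g(x_2) h(x_3 \mid x_2). \tag{2.6}$$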
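3. Observations of Type 3 end by censoring at time $c$ sometime after the intervening event. Then, using the joint conditional density provided by differentiation of (2.5), we have by integration the contribution to the likelihood
$$\pi_2 \int_{c - x_2}^{\infty} f_{23}(x_2, x_3 \mid X_2 < X_1) \, dx_3 = \bar{F}(x_2) g(x_2) \bar{H}(c - x_2 \mid x_2). \tag{2.7}$$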
4. Since observations of Type 4 end by censoring at time $c$ before the intervening event, the contribution to the likelihood is
$$\Pr[X_1 > c, X_2 > c] = \int_c^{\infty} \int_c^{\infty} f(x_1) g(x_2) \, dx_1 \, dx_2 = \bar{F}(c) \bar{G}(c). \tag{2.8}$$
Referring to Figure 2.2, assume that $n_1$ individuals are observed to fail without the intervening event, $n_2$ are observed having the intervening event and failing after the intervening event, $n_3$ are censored after observing the intervening event, $n_4$ are censored before either failure or the intervening event, and the total number of observations is $n = n_1 + n_2 + n_3 + n_4$. Then by using (2.4), (2.6), (2.7), and (2.8), the likelihood function of a sample of $n$ individuals is proportional to
$$L = \prod_{i=1}^{n_1} f(x_{1i}) \bar{G}(x_{1i}) \prod_{i=n_1+1}^{n_1+n_2} \bar{F}(x_{2i}) g(x_{2i}) h(x_{3i} \mid x_{2i}) \prod_{i=n_1+n_2+1}^{n_1+n_2+n_3} \bar{F}(x_{2i}) g(x_{2i}) \bar{H}(c_i - x_{2i} \mid x_{2i}) \prod_{i=n_1+n_2+n_3+1}^{n} \bar{F}(c_i) \bar{G}(c_i), \tag{2.9}$$
where $x_{1i}$, $x_{2i}$, and $x_{3i} \mid x_{2i}$ are the respective observations of $X_1$, $X_2$, and $X_3 \mid X_2$ and the $c_i$ are censoring times.
The maximum likelihood estimators (MLEs) of the unknown parameters can be found either analytically or numerically, depending on the distributional forms chosen for $X_1$, $X_2$, and $X_3 \mid X_2$. For constant hazard assumptions, closed form expressions for the MLEs can be found in the usual way by differentiation while non-constant hazard assumptions, such as are available through the Weibull family of distributions, require the use of numerical techniques.
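The four likelihood contributions can be collected into a log-likelihood routine that accepts the component functions as arguments. The sketch below is a hypothetical illustration, not the author's program: the record layout `(kind, first, second)` and the helper `exponential_model` are assumptions introduced here, and constant hazards are used only because they give a closed form that is easy to check.

```python
import math

def log_likelihood(records, f, F_bar, g, G_bar, h, H_bar):
    """Log of the four-part likelihood for continuous follow-up.

    records: iterable of (kind, first, second) with
      kind 1: failure at x1 without the event      -> (1, x1, None)
      kind 2: event at x2, failure x3 later        -> (2, x2, x3)
      kind 3: event at x2, censored at time c      -> (3, x2, c)
      kind 4: censored at c before either event    -> (4, c, None)
    h and H_bar take (x3, x2): the conditional density and survival
    function of X3 given X2 = x2.
    """
    total = 0.0
    for kind, first, second in records:
        if kind == 1:
            total += math.log(f(first) * G_bar(first))
        elif kind == 2:
            total += math.log(F_bar(first) * g(first) * h(second, first))
        elif kind == 3:
            total += math.log(F_bar(first) * g(first) * H_bar(second - first, first))
        elif kind == 4:
            total += math.log(F_bar(first) * G_bar(first))
        else:
            raise ValueError("kind must be 1, 2, 3, or 4")
    return total

def exponential_model(lam1, lam2, lam3):
    """Constant-hazard components with X3 independent of X2 (illustration only)."""
    return {
        "f": lambda x: lam1 * math.exp(-lam1 * x),
        "F_bar": lambda x: math.exp(-lam1 * x),
        "g": lambda x: lam2 * math.exp(-lam2 * x),
        "G_bar": lambda x: math.exp(-lam2 * x),
        "h": lambda x3, x2: lam3 * math.exp(-lam3 * x3),
        "H_bar": lambda x3, x2: math.exp(-lam3 * x3),
    }

# Hypothetical records, one of each type.
records = [(1, 1.0, None), (2, 3.0, 2.0), (3, 2.0, 6.0), (4, 4.0, None)]
print(log_likelihood(records, **exponential_model(0.1, 0.2, 0.5)))
```

For non-constant hazards the same routine could be handed to a numerical optimizer; only the six component functions would change.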
2.2.2 The Unconditional Survival Function
As described in Section 2.1, the time to death $T$ is represented two ways depending upon whether or not the intervening event occurs before death. If the intervening event does not occur before death (i.e., $X_1 \le X_2$), the time to death is $T = X_1$, and if the intervening event does occur before death (i.e., $X_2 < X_1$), $T = X_2 + X_3$. Using (2.3) and letting $T = X_1$, we have
$$f_T(t \mid X_1 \le X_2) = \frac{1}{\pi_1} f(t) \bar{G}(t). \tag{2.10}$$
By using the joint conditional density $f_{23}(x_2, x_3 \mid X_2 < X_1)$ available from (2.6) and making the substitutions $T = X_2 + X_3$, $S = X_2$, we have by integration the conditional density of $T$ given that $X_2 < X_1$:
$$f_T(t \mid X_2 < X_1) = \int_0^t f_{23}(s, t - s \mid X_2 < X_1) \, ds = \frac{1}{\pi_2} \int_0^t \bar{F}(s) g(s) h(t - s \mid s) \, ds. \tag{2.11}$$
Then
$$f_T(t) = \pi_1 f_T(t \mid X_1 \le X_2) + \pi_2 f_T(t \mid X_2 < X_1) = f(t) \bar{G}(t) + \int_0^t \bar{F}(s) g(s) h(t - s \mid s) \, ds. \tag{2.12}$$
By definition the survival function for $T$ is
$$\bar{F}_T(t) = \int_t^{\infty} f_T(\tau) \, d\tau = \int_t^{\infty} f(\tau) \bar{G}(\tau) \, d\tau + \int_t^{\infty} \int_0^{\tau} \bar{F}(s) g(s) h(\tau - s \mid s) \, ds \, d\tau. \tag{2.13}$$
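By changing the order of the integration of the second term in (2.13) we have
$$\int_t^{\infty} \int_0^{\tau} \bar{F}(s) g(s) h(\tau - s \mid s) \, ds \, d\tau = \int_t^{\infty} \int_s^{\infty} \bar{F}(s) g(s) h(\tau - s \mid s) \, d\tau \, ds + \int_0^t \int_t^{\infty} \bar{F}(s) g(s) h(\tau - s \mid s) \, d\tau \, ds$$
which, upon substitution in (2.13), gives
$$\bar{F}_T(t) = \int_t^{\infty} f(\tau) \bar{G}(\tau) \, d\tau + \int_t^{\infty} \bar{F}(\tau) g(\tau) \, d\tau + \int_0^t \bar{F}(s) g(s) \bar{H}(t - s \mid s) \, ds. \tag{2.14}$$
From the first two terms of (2.14), we can see that
$$\int_t^{\infty} f(\tau) \bar{G}(\tau) \, d\tau + \int_t^{\infty} \bar{F}(\tau) g(\tau) \, d\tau = \int_t^{\infty} \int_t^{\infty} f(u) g(v) \, du \, dv = \bar{F}(t) \bar{G}(t). \tag{2.15}$$
It follows by substituting (2.15) into (2.14) that
$$\bar{F}_T(t) = \bar{F}(t) \bar{G}(t) + \int_0^t \bar{F}(s) g(s) \bar{H}(t - s \mid s) \, ds. \tag{2.16}$$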
Finding specific values of $\bar{F}_T(t)$ will depend on the particular distributional assumptions chosen. In most instances, the joint survival function in the first term of (2.16) can be evaluated easily. However, the second term may prove to be more difficult and could require numerical techniques. The evaluation of $\bar{F}_T(t)$ is explored in detail in Chapter III where the random variables which underlie the model are assumed to possess Weibull distributions.
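When the second term of (2.16) has no closed form, it can be approximated by numerical quadrature. The following Python sketch uses a composite Simpson rule and checks the result against the constant-hazard case, where the integral can be done by hand. It is a hypothetical illustration under those stated assumptions, not the numerical method used in Chapter III; all function names and rate values are invented for the example.

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def survival_T(t, F_bar, G_bar, g, H_bar, n=200):
    """Unconditional survival: F_bar(t) G_bar(t) plus the integral over (0, t)
    of F_bar(s) g(s) H_bar(t - s | s)."""
    integrand = lambda s: F_bar(s) * g(s) * H_bar(t - s, s)
    return F_bar(t) * G_bar(t) + simpson(integrand, 0.0, t, n)

# Constant hazards (assumed values) so the answer can be checked by hand.
lam1, lam2, lam3 = 0.1, 0.2, 0.5
F_bar = lambda x: math.exp(-lam1 * x)
G_bar = lambda x: math.exp(-lam2 * x)
g = lambda x: lam2 * math.exp(-lam2 * x)
H_bar = lambda x3, x2: math.exp(-lam3 * x3)   # here X3 does not depend on X2

def closed_form(t):
    """Hand-integrated value for constant hazards with lam1 + lam2 != lam3."""
    d = lam1 + lam2 - lam3
    return math.exp(-(lam1 + lam2) * t) + lam2 * math.exp(-lam3 * t) * (1.0 - math.exp(-d * t)) / d

print(survival_T(2.0, F_bar, G_bar, g, H_bar), closed_form(2.0))
```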
Finally, $\bar{F}_T(t)$ can be thought of as the survival function for a situation where an intervening event may occur but cannot be or is not actually observed. For this interpretation, the likelihood function would be proportional to
$$\prod_{i=1}^{n_1+n_2} f_T(t_i) \prod_{i=n_1+n_2+1}^{n} \bar{F}_T(c_i),$$
where, recalling Figure 2.2, $n_1 + n_2$ is the number of observed deaths and $n_3 + n_4$ is the number of censoring times.
2.2.3 Inclusion of Covariates
In addition to failure time, intervening event time, and/or censoring time, suppose that a vector $\mathbf{z}' = (z_1, z_2, \ldots, z_r)$ of covariates or explanatory variables has been observed for each individual and that the values of $\mathbf{z}$ for the $i$th individual are $\mathbf{z}_i' = (z_{1i}, z_{2i}, \ldots, z_{ri})$. A convenient way to be used here for including covariates in the survival process is to incorporate them into the hazard function $\lambda(t)$ by the frequently used form
$$\lambda(t; \mathbf{z}_i) = \lambda_0(t) \exp(\boldsymbol{\beta}' \mathbf{z}_i) \tag{2.17}$$
where $\lambda_0(t)$ is a function of time, sometimes referred to as the underlying hazard (see Cox [1972]), and $\boldsymbol{\beta}$ is a vector of unknown parameters.
Because of the one-to-one relationship between the hazard function and the distribution function, covariates can be easily included in the likelihood function (2.9) by using (2.17). For example, if the hazard function for the $i$th observation of $X_1$ is
$$\lambda_1(t; \mathbf{z}_i) = \lambda_{01}(t) \exp(\boldsymbol{\beta}_1' \mathbf{z}_i),$$
then the survival function is
$$\bar{F}(x_{1i}; \mathbf{z}_i) = \exp\left[ -\int_0^{x_{1i}} \lambda_{01}(t) \exp(\boldsymbol{\beta}_1' \mathbf{z}_i) \, dt \right]$$
as discussed earlier in Chapter I. It should be noted that the parameter vector associated with the covariate vector is subscripted to indicate that the covariate parameters are specific to the distribution of $X_1$. Allowing different covariate parameter vectors for the distributions of $X_1$, $X_2$, and $X_3 \mid X_2$ provides more flexibility and is recommended for the starting model. It may be, of course, that some elements of the covariate parameter vectors will be similar, and it may be possible to set them equal, reducing the total number of parameters required.
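Because the covariates multiply the hazard, they also multiply the cumulative hazard, so the survival function with covariates is the baseline survival function raised to the power $\exp(\boldsymbol{\beta}'\mathbf{z}_i)$. A minimal Python sketch of that relationship follows; the function name, the Weibull-type baseline, and the coefficient values are assumptions for illustration only.

```python
import math

def survival_with_covariates(x, z, beta, cumulative_baseline_hazard):
    """Survival probability at x under a multiplicative covariate effect.

    cumulative_baseline_hazard(x) is the integral of the underlying hazard
    from 0 to x; the covariates scale it by exp(beta'z).
    """
    linear_predictor = sum(b * zj for b, zj in zip(beta, z))
    return math.exp(-cumulative_baseline_hazard(x) * math.exp(linear_predictor))

# Assumed Weibull-type cumulative baseline hazard, for illustration only.
baseline = lambda x: 0.05 * x ** 1.5

print(survival_with_covariates(4.0, [0.0], [0.7], baseline))  # reference individual
print(survival_with_covariates(4.0, [1.0], [0.7], baseline))  # covariate equal to 1
```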
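By incorporating $\mathbf{z}_i$ in the notation for the density and survival functions of $X_1$, $X_2$, and $X_3 \mid X_2$, we shall indicate that a vector of covariates is included for each observation and the likelihood is represented by
$$\begin{aligned} L = {} & \prod_{i=1}^{n_1} f(x_{1i}; \mathbf{z}_i) \bar{G}(x_{1i}; \mathbf{z}_i) \prod_{i=n_1+1}^{n_1+n_2} \bar{F}(x_{2i}; \mathbf{z}_i) g(x_{2i}; \mathbf{z}_i) h(x_{3i} \mid x_{2i}; \mathbf{z}_i) \\ & \times \prod_{i=n_1+n_2+1}^{n_1+n_2+n_3} \bar{F}(x_{2i}; \mathbf{z}_i) g(x_{2i}; \mathbf{z}_i) \bar{H}(c_i - x_{2i} \mid x_{2i}; \mathbf{z}_i) \prod_{i=n_1+n_2+n_3+1}^{n} \bar{F}(c_i; \mathbf{z}_i) \bar{G}(c_i; \mathbf{z}_i) \end{aligned} \tag{2.18}$$
where $n_1, n_2, n_3, n_4$; $x_{1i}, x_{2i}, x_{3i} \mid x_{2i}$; and $c_i$ are as described for (2.9). In most instances, numerical techniques will be required for maximizing (2.18).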
Z.3 The Assumption of Independent Paths to Failure
In addition to being a mathematical convenience, the assumption
that the two failure paths depicted by Figure Z.l are independent can
have a substantive interpretation.
In general terms, we might characterize individuals in certain failure time settings as having predispositions or internal markers for a particular path to failure.
An internal marker can be regarded as some aspect about an individual's
constitution that cannot be perceived before he enters a failure time
study.
Rather, it is revealed only through observation of his route to
failure.
Specifically, let us consider the independence assumption in
the context of the example to be analyzed in Chapter III.
In the industrial cohort example, the occurrence or non-occurrence
of a disability retirement determines which failure path an employee
takes.
For this particular cohort, the eligibility requirements for
disability are the accrual of at least ten years of employment and
experiencing a period of six consecutive months before age sixty-five
in which work is not possible.
A common attribute that might be shared by the disability retirees is a lack of susceptibility to sudden death before age sixty-five. Even though the disability retirees have suffered a condition that has left them unable to work, they are able to survive the condition for at least six months. However, a lack of vulnerability to sudden death does not necessarily imply that disability retirees should, on the average, live longer than non-disability retirees, since a disability retirement must occur before age sixty-five and since a disabled employee might reasonably be expected to be at a greater risk of death.
As an example of a lack of susceptibility to sudden death, consider employees who experience a myocardial infarction and survive. Their ability to avoid death might be attributable to an internal marker which is a cardiovascular system that is especially resilient.
Lack of susceptibility to sudden death represents an internal marker concerned with acute conditions such as myocardial infarction. Disability retirees might be subject to an internal marker that is chronic in nature.
For example, genetic predisposition could make some
employees susceptible to ill effects from certain environmental contaminants in the workplace.
Over an extended period of time, the
accumulated exposure might cause a weakening of health sufficient to
render an employee unable to work.
The stochastic independence required in the model can be viewed
as assuming that failure paths for individuals are controlled by
internal markers that are not subject to change.
Some dependence of
the internal marker on other factors is likely, but the assumption may
hold sufficiently for it to represent a reasonable approximation.
If this assumption seems appropriate, then the use of the model is defensible for reasons other than the ability to model the observed failure times.
2.4 Dependence Structure Between $X_2$ and $X_3$
The model as formulated in Section 2.2 assumes that random variables $X_2$ and $X_3$ are dependent. Further, it requires that the marginal density function for $X_2$ and the conditional density function for $X_3 \mid X_2$ be specified so that the joint density of $X_2, X_3$ be given by the product $g(x_2) \cdot h(x_3 \mid x_2)$.
For the applications which follow in later chapters, the dependence structure between $X_2$ and $X_3$ will be reflected through the hazard function of $X_3 \mid X_2$. The treatment of known values of $X_2$ will be handled in much the same way as described previously for the covariate vector $\mathbf{z}$. However, the values of $X_2$ are realizations of a random variable with distribution function $G(x_2)$ and they must be considered in a proper probabilistic fashion.
In the hazard function for $X_3 \mid X_2$, the dependence between $X_3$ and $X_2$ will be represented by a multiplicative factor that requires that the hazard functions for different conditioning values of $X_2$ be proportional. This assumption provides mathematical convenience and has been widely used by other authors in related settings.
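Let the hazard function for distribution $H(x_3 \mid x_2)$ be denoted by $\lambda(x_3 \mid x_2)$ and let the influence of a value of $X_2$ be of the form
$$\lambda(x_3 \mid x_2) = \lambda(x_3; \boldsymbol{\theta}) e^{\alpha x_2} \tag{2.19}$$
where $\lambda(x_3; \boldsymbol{\theta})$ is a strictly positive function of $x_3$ and an unknown vector of parameters $\boldsymbol{\theta}$. Then the conditional density $h(x_3 \mid x_2)$ is
$$h(x_3 \mid x_2) = \lambda(x_3; \boldsymbol{\theta}) e^{\alpha x_2} \exp\left[ -e^{\alpha x_2} \int_0^{x_3} \lambda(u; \boldsymbol{\theta}) \, du \right]. \tag{2.20}$$
A convenient specification for $\lambda(x_3; \boldsymbol{\theta})$ is a marginal hazard function so that if $\alpha = 0$, random variable $X_3$ will be independent of $X_2$ and have a known lifetime distribution.
In particular, the application to follow later will assume that $\lambda(x_3; \boldsymbol{\theta})$ is the Weibull hazard function
$$\lambda(x_3; \lambda_3, \gamma_3) = \lambda_3 \gamma_3 x_3^{\gamma_3 - 1}.$$
This assumption yields the conditional density
$$h(x_3 \mid x_2) = \lambda_3 \gamma_3 x_3^{\gamma_3 - 1} e^{\alpha x_2} \exp\left( -\lambda_3 e^{\alpha x_2} x_3^{\gamma_3} \right),$$
which is a Weibull density conditional on $X_2 = x_2$. If $\alpha = 0$, $X_3$ is independent of $X_2$ and has a Weibull distribution with parameters $\lambda_3, \gamma_3$.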
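The Weibull specification can be written out directly in Python. The sketch below assumes the multiplicative factor takes the form $e^{\alpha x_2}$, as in the conditional density above; the function names and parameter values are hypothetical and serve only to show that the hazards for two conditioning values of $X_2$ are proportional.

```python
import math

def conditional_hazard(x3, x2, lam3, gamma3, alpha):
    """Weibull hazard for X3 scaled by exp(alpha * x2); requires x3 > 0."""
    return lam3 * gamma3 * x3 ** (gamma3 - 1.0) * math.exp(alpha * x2)

def conditional_survival(x3, x2, lam3, gamma3, alpha):
    """Survival function of X3 given X2 = x2."""
    return math.exp(-lam3 * math.exp(alpha * x2) * x3 ** gamma3)

def conditional_density(x3, x2, lam3, gamma3, alpha):
    """Density of X3 given X2 = x2: hazard times survival."""
    return (conditional_hazard(x3, x2, lam3, gamma3, alpha)
            * conditional_survival(x3, x2, lam3, gamma3, alpha))

# Assumed parameter values, for illustration only.
lam3, gamma3, alpha = 0.3, 1.5, 0.4
ratio = (conditional_hazard(0.7, 2.0, lam3, gamma3, alpha)
         / conditional_hazard(0.7, 1.0, lam3, gamma3, alpha))
print(ratio, math.exp(alpha))  # the hazard ratio does not depend on x3
```

Setting `alpha` to zero removes `x2` from every expression, which is the independence case described in the text.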
2.5 Likelihood Ratio Tests
Hypothesis testing can be accomplished by using likelihood ratio tests based on a transformation of the ratio of the maximized likelihood with a constrained parameter space to the unconstrained maximized likelihood. This frequently used procedure has the form
$$-2 \log_e \left[ \frac{L(\text{constrained parameter space})}{L(\text{unconstrained parameter space})} \right], \tag{2.21}$$
where $L(\cdot)$ represents a maximized likelihood or, more exactly, the value of the likelihood when the associated parameters are replaced by maximum likelihood estimators. The likelihood ratio test (2.21) is asymptotically distributed as a chi-square with degrees of freedom equal to the number of constraints.
2.5.1 Test of Independence of $X_2$, $X_3$
As discussed in the previous section, including the parameter $\alpha$ through the multiplicative term in the hazard function shown in (2.19) introduces dependence between $X_2$ and $X_3$ when $\alpha \ne 0$ and allows the pair to be independent when $\alpha = 0$. Thus, a test of the independence of $X_2$ and $X_3$ is provided by
$$H_0: \alpha = 0 \tag{2.22}$$
where the likelihood ratio test (2.21) with $\alpha$ constrained to zero has one degree of freedom.
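For a one degree of freedom test such as the test of independence, the statistic and its asymptotic p-value can be computed with the standard library alone, because a chi-square variable with one degree of freedom is the square of a standard normal variable. The sketch below is illustrative; the two maximized log-likelihood values are hypothetical and do not come from the data analyzed in later chapters.

```python
import math

def likelihood_ratio_statistic(loglik_constrained, loglik_unconstrained):
    """-2 log of the likelihood ratio, computed from maximized log-likelihoods."""
    return -2.0 * (loglik_constrained - loglik_unconstrained)

def chi_square_1df_pvalue(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses P(Z**2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
    """
    return math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))

# Hypothetical maximized log-likelihoods with alpha = 0 and alpha free.
statistic = likelihood_ratio_statistic(-105.0, -102.0)
print(statistic, chi_square_1df_pvalue(statistic))
```

Tests with more than one constraint need the general chi-square distribution function, which this sketch does not cover.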
2.5.2 Tests of Other Parameters (model building)
The independence test (2.22) is the primary hypothesis to be tested. However, additional likelihood ratio tests may be desired to determine if the parameter space required for the model can be reduced. Reductions can result from deletion of covariables that make no significant contribution to the model likelihood and by setting equal parameters that serve comparable roles for the three underlying random variables yet do not differ appreciably in value. These tests can be considered to be aimed at model building rather than the acceptance or rejection of an hypothesis.
For the $r \times 1$ vector $\mathbf{z}$ of model covariates, an overall test of the explanatory value of these covariates is provided by
$$H_0: \boldsymbol{\beta}_1 = \boldsymbol{\beta}_2 = \boldsymbol{\beta}_3 = \mathbf{0} \tag{2.23}$$
where the likelihood ratio test (2.21) has $3 \times r$ degrees of freedom. Tests of the type (2.23) and analogous tests for subsets of the parameters are used in Chapters III and IV. By constraining subsets of parameters to be equal to values other than zero, likelihood ratio tests of the equality of parameters that are to remain in the model can be constructed. Since the variety of likelihood ratio tests that may be of interest is simpler to demonstrate than to describe and since such tests are in common use, we shall not discuss them further here but rely on their application in later chapters for additional clarification.
2.6 Goodness of Fit
The goodness of fit of the model estimates to the observed survival data can be explored by applying the goodness of fit test statistic devised by Taulbee (1977). Since death before the intervening event precludes observation of the intervening event and since we have no knowledge of which path to failure will be taken by individuals that are censored prior to either the intervening event or death, data are not available to test directly the distributional assumptions that underlie the model. However, if we ignore which path individuals follow to death, there is complete information observable on death and censoring. The unconditional survival function $\bar{F}_T(t)$ given by (2.16) is concerned with the probability of survival regardless of the path to failure and will be used to assess the overall (i.e., without regard to failure path) goodness of fit of the model.
Let $d_i(t)$ be a random variable where
$$d_i(t) = \begin{cases} 0 & \text{if the } i\text{th individual is observed to fail in the interval } (0, t], \\ 1 & \text{if the } i\text{th individual is not observed to fail in the interval } (0, t]. \end{cases}$$
When there is censoring, we define a goodness of fit test statistic at time $t$ by
$$W(t) = \frac{\sum_{i=1}^{N} d_i(t) - \sum_{i=1}^{N} \bar{F}_T[\min(t, p_i); \mathbf{z}_i]}{\left\{ \sum_{i=1}^{N} \bar{F}_T[\min(t, p_i); \mathbf{z}_i] \, F_T[\min(t, p_i); \mathbf{z}_i] \right\}^{1/2}} \tag{2.24}$$
where we use the following notation:
$N$: the number of observations in the sample;
$p_i$: the known time at which the $i$th individual is potentially due for censoring;
$\mathbf{z}_i$: the vector of covariates of the $i$th individual;
$\bar{F}_T(\cdot)$: the unconditional survival function for the model of interest given by (2.16), where $F_T(\cdot) = 1 - \bar{F}_T(\cdot)$.
If the chosen model provides a good fit, we expect $\Pr[d_i(t) = 1] = \bar{F}_T[\min(t, p_i); \mathbf{z}_i]$ and this equality is regarded as the null hypothesis where
$$E[d_i(t)] = \bar{F}_T[\min(t, p_i); \mathbf{z}_i]$$
and
$$\operatorname{Var}[d_i(t)] = \bar{F}_T[\min(t, p_i); \mathbf{z}_i] \cdot F_T[\min(t, p_i); \mathbf{z}_i].$$
Then, by appealing to large sample theory, the square of $W(t)$ will be approximately a chi-square random variable with one degree of freedom. In the case where there is no censoring, $\bar{F}_T[\min(t, p_i); \mathbf{z}_i]$ in (2.24) is replaced by $\bar{F}_T(t; \mathbf{z}_i)$.
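The statistic can be sketched in Python as the standardized difference between the observed and expected numbers of individuals not seen to fail by time $t$. The sketch assumes that the denominator of $W(t)$ is the square root of the summed variances, which is the form implied by the chi-square limit for the square of $W(t)$; the record layout and the function name are hypothetical.

```python
import math

def goodness_of_fit_W(t, records, survival):
    """Standardized observed-minus-expected count of non-failures at time t.

    records : list of (failure_time, p, z), where failure_time is None if
              no failure was observed, p is the potential censoring time,
              and z is the covariate vector for the individual.
    survival: callable (u, z) giving the fitted unconditional survival
              probability at time u for covariates z.
    """
    numerator = 0.0
    variance = 0.0
    for failure_time, p, z in records:
        # d = 0 if the individual was observed to fail in (0, t], else 1.
        d = 0 if (failure_time is not None and failure_time <= t) else 1
        s = survival(min(t, p), z)
        numerator += d - s
        variance += s * (1.0 - s)
    return numerator / math.sqrt(variance)

# Hypothetical fitted model and data, for illustration only.
fitted = lambda u, z: math.exp(-0.1 * u)
data = [(0.5, 8.0, None), (None, 8.0, None), (2.0, 8.0, None), (None, 3.0, None)]
w = goodness_of_fit_W(1.0, data, fitted)
print(w, w * w)  # w squared is compared with a chi-square on 1 degree of freedom
```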
Since the goodness of fit test compares the observed with the estimated frequency of survival at a single point $t$, it will be necessary to make several comparisons at various values of $t$ to determine if the proposed model fits the data adequately. The one degree of freedom chi-square statistics produced by the multiple comparison are not independent and so cannot simply be accumulated to form a multiple degree of freedom test. To assure that the overall significance level of the goodness of fit at $k$ points in time is no greater than $\alpha$, we can use the "addition" or Bonferroni inequality. If there are $k$ tests of significance each at level $\alpha^*$, then an upper bound on the overall significance level is $1 - (1 - \alpha^*)^k$. By expanding the expression for the upper bound, it can be shown that $\alpha \le k\alpha^*$. When $\alpha^*$ is small and $k$ not large, we have $\alpha^* \approx \alpha / k$.
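A short numerical check of the per-test level is sketched below. The function names are hypothetical, and the values of the overall level and of $k$ are chosen only for illustration.

```python
def per_test_level(alpha, k):
    """Per-test significance level alpha / k from the addition inequality."""
    return alpha / k

def overall_bound(alpha_star, k):
    """The bound 1 - (1 - alpha_star)**k on the overall significance level."""
    return 1.0 - (1.0 - alpha_star) ** k

alpha, k = 0.05, 5           # assumed overall level and number of time points
alpha_star = per_test_level(alpha, k)
print(alpha_star, overall_bound(alpha_star, k), k * alpha_star)
```

For these assumed values the bound stays just below the intended overall level, and it never exceeds $k\alpha^*$.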
CHAPTER III
SURVIVAL ANALYSIS FOR AN INDUSTRIAL SAMPLE WITH DISABILITY
RETIREMENT AS THE INTERVENING EVENT
3.1 Introduction
This chapter applies the intervening event survival model
introduced in Chapter II as a means of studying the role of disability
retirement on the survival of the members of a cohort of rubber workers.
However, before the details of the analysis are discussed, a brief
•
description of the usual composition of occupational cohorts and some
cOlIUIlOnly used methods of studying occupational mortality within the
cohorts will be presented along with comments specific to other studies
of mortality among rubber workers.
A typical occupational cohort study supposes that n employees
are kept under continuous surveillance from their entry into the study
until death, the termination of the study, or loss to further observation.
While entry into the study may coincide with the date of the
employee's first employment in a given work location or, more generally,
industry, it is more common that entry commences with the jth anniversary (e.g., tenth or fifteenth) of his hiring or after he has worked
at least j years.
Delaying an individual's entry into the study until
he has accumulated some work time yields a more homogeneous study population and allows time for the effects of employment on health to
develop.
Since there is frequently a strong positive correlation
between age and accumulated work time in an industry, entry into the
study is sometimes governed by the worker's age rather than length of
employment.
The most commonly used methods for studying occupational
mortality have relied on the standardized proportional mortality ratio
(SPMR) and/or the standardized mortality ratio (SMR).
Briefly, the
SPMR represents the ratio of the proportion of deaths from a specific
cause in the industrial population to the proportion of deaths from the
same cause in some appropriate standard population.
Since the age
composition of the sample population is frequently different from the
standard population, adjustments for the differences are commonly made
before ratios are determined. For a specific cause of death, the usual
adjustment or standardizing procedure multiplies the total number of
deaths for a specified age group in the sample by the proportion of all
deaths due to the cause in the comparable group of the standard population.
The product gives the number of deaths that would have occurred
in the group in the sample population if the standard population proportion applied.
This number is referred to as an "expected" number of deaths.
To get the SPMR, observed and expected numbers of deaths are
summed over the groups and a ratio of the observed sum to the expected
sum is formed.
It is not necessary to restrict the formation of groups
to age strata since it may be desirable to adjust for differences in
other variables such as sex and race.
The SPMR includes only information on deaths and so typically
excludes a large portion of the population at risk.
In contrast, the
SMR draws on the entire industrial population at risk (i.e., including
those who survived) to determine the ratio of the death rate in the
occupational population to the death rate in an appropriate standard
population.
As with the SPMR, the ratios are usually adjusted for age
and sometimes other variables.
For age adjustment, the number of
person years accumulated in a specific age group is multiplied by the
death rate for the corresponding group in the standard population and
the product is referred to as an expected number of deaths.
The observed and expected numbers of deaths are summed over the age strata and
a ratio of the observed sum to the expected sum is formed to give the
SMR for a specific cause of death.
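The two standardization procedures described above can be sketched in a few lines. The function names and all numbers below are hypothetical and serve only to show where each quantity enters; they are not taken from any of the studies cited.

```python
def spmr(total_deaths_by_group, std_cause_proportion_by_group, observed_cause_deaths):
    """Observed cause-specific deaths over the expected number, where the
    expected number in each group is (all deaths in the sample group) times
    (proportion of deaths due to the cause in the standard population)."""
    expected = sum(
        d * p for d, p in zip(total_deaths_by_group, std_cause_proportion_by_group)
    )
    return observed_cause_deaths / expected


def smr(person_years_by_group, std_death_rate_by_group, observed_deaths):
    """Observed deaths over the expected number, where the expected number
    in each group is (person years at risk) times (standard death rate)."""
    expected = sum(
        py * r for py, r in zip(person_years_by_group, std_death_rate_by_group)
    )
    return observed_deaths / expected


# Hypothetical three-stratum example (made-up numbers).
example_spmr = spmr([10, 40, 50], [0.10, 0.20, 0.30], observed_cause_deaths=30)
example_smr = smr([1000, 2000, 1000], [0.001, 0.005, 0.020], observed_deaths=62)
```

The only structural difference between the two ratios is the denominator of the expected count: the SPMR uses deaths alone, while the SMR uses person years at risk and therefore draws on survivors as well.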
The use of SMR methodology has been a standard tool in occupational epidemiology, and it has been employed in several studies involving
rubber workers.
Among these studies, McMichael, Spiritas, and Kupper
(1974) and Andjelkovic, Taulbee, and Symons (1976) looked at cause-specific
mortality for a large set of causes of death among respective
cohorts of rubber workers.
In addition, Andjelkovic et al. (1976)
discussed briefly the need to consider the impact of events such as
retirement on mortality.
In another study, Monson and Nakano (1976)
concentrated on deaths from different types of cancer among a cohort
of rubber workers.
Recently, a few authors have discussed the applications of
survival analysis methods to studies of occupational mortality.
Breslow
(1977) detailed several applications of the general type of proportional
hazards model as described by Cox (1972) to problems in epidemiology
with some reference to occupational cohorts.
Survival analysis techniques which do not require a proportional hazards assumption have been
proposed by Taulbee (1977) and applied to a cohort of rubber workers to
estimate their survival in the presence of covariates.
3.2 Description of the Data
The data to be analyzed in this chapter come from a historical
prospective study design which was part of a research program that was
initiated by contract agreement between the United Rubber Workers
Union and six major United States rubber companies.
Permission to use
the data was granted by Dr. Robert L. Harris, Jr., the Director of the
Occupational Health Studies Group of the School of Public Health,
University of North Carolina at Chapel Hill.
The data are composed of white male hourly employees who have
been actively employed for at least ten years as of 1 January 1964, at
a single rubber manufacturing plant in Akron, Ohio.
The mortality
experience of the 6425 sample members has been followed for eight years
and a total of 390 individuals died.
The term "cohort" will be used to
describe the type of sample in the sense that a cohort refers to a group
of individuals defined according to one or more characteristics or
events that are common to all members of that group.
The identification of the reference population for this cohort
is of practical interest.
Since the manufacturing processes at the
Akron plant cover most of the processes performed industry-wide, it
would be reasonable to conjecture that the cohort is representative of
white male hourly rubber industry employees who had been actively
employed for at least ten years as of 1 January 1964.
Beyond that
group, there may be other industries with work environments similar to
the rubber manufacturing industry.
The intervening event of interest for the cohort is disability
retirement.
For an individual to be eligible for a disability retirement, he must have been employed with the company for at least ten years
and, after meeting the length of employment requirement, been unable
to work for a consecutive period of six months.
For the analysis,
employees who retired with other than a disability retirement or who
left the company for other reasons were considered to be censored at
their respective departure times.
A breakdown of the numbers of employees by each of the four
possible ways of observing survival of individuals for the intervening
event survival model is given as follows:
Type 1. There were 295 employees who died during the observation period with no disability retirement.
Type 2. There were 95 employees with a disability retirement
followed by death during the observation period.
Type 3. There were 246 employees with a disability retirement
followed by censoring during the observation period.
Type 4. There were 5789 employees who were censored without
a disability retirement during the observation period.
In addition to death times, disability retirement times, and
censoring times, the ages of individuals on entry into the study were
available.
A breakdown of the four types of observations possible by
grouped ages is given in Table 3.1.
TABLE 3.1
Number of Employees Observed in Each of Four Possible Ways
for Different Age Groups (Age at Entry into Study)

Age Group              Type of Observation
in years (1964)    Type 1   Type 2   Type 3   Type 4
26-35                   6        1        3      524
36-45                  50        5       36     1680
46-55                 137       32      124     1629
56-64                 102       57       83     1956
3.3 Estimation and Testing
For application, the model requires distributional assumptions
about three random variables $X_1$, $X_2$, and $X_3 \mid X_2$ that for this analysis
correspond respectively to the time from entry into the study to death
without disability retirement, the time from entry into the study to
disability retirement, and the time from disability retirement to death.
We shall assume that these three distributions can be adequately represented by the two-parameter Weibull family of distributions.
The times referred to above are measured in years.
Since the cohort is composed
entirely of active workers and since there is an eligibility requirement
of six consecutive months off the job for disability retirement, the
random variable $X_2$ is of the form $X_2 = 0.5 + X_2'$. Random variable $X_2'$
represents time to disability retirement with measurement beginning six
months after the study commences.
The two-parameter Weibull family of distributions provides a
rich source of failure time distributions, including the exponential
family as a special case.
Dyer (1975) assumed a Weibull distribution
in an analysis of survival of an industrial population.
Additional
Weibull distributional assumptions have been used by other authors, as
mentioned in Chapter I, to analyze survival in non-industrial populations.
3.3.1 Estimation
Assuming that the distributions that underlie the model can
reasonably be represented by the Weibull family and taking age as a
covariable, the density and survival functions necessary for the ith
individual in the sample are as follows. Since
$X_2 = 0.5 + X_2'$, where $X_2'$ is distributed as a Weibull with scale
parameter $\lambda_2$ and shape parameter $\gamma_2$, then
\[ g(x_{2i}; z_i) = \lambda_2 \gamma_2 (x_{2i}-0.5)^{\gamma_2-1} e^{\beta_2 z_i} \exp\{-\lambda_2 e^{\beta_2 z_i} (x_{2i}-0.5)^{\gamma_2}\}, \]
\[ \bar{H}(x_{3i} \mid x_{2i}; z_i) = \exp\{-\lambda_3 e^{\beta_3 z_i + \alpha x_{2i}} x_{3i}^{\gamma_3}\}, \]
and $z_i$ is the age of the ith individual on entry into the study.
Following (2.18), the likelihood function is
L = [expression illegible in the source]    (3.1)
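As a rough sketch of how such a likelihood can be assembled, the code below writes one log-likelihood contribution for each of the four observation types of Section 3.2. It is not a transcription of (3.1) or (2.18). It assumes the usual construction for this kind of model: a Type 1 death at $x_1$ contributes $f(x_1)\bar{G}(x_1)$, a Type 2 observation contributes $\bar{F}(x_2)g(x_2)h(x_3 \mid x_2)$, a Type 3 observation contributes $\bar{F}(x_2)g(x_2)\bar{H}(c-x_2 \mid x_2)$, and a Type 4 observation contributes $\bar{F}(c)\bar{G}(c)$. The function names, the dictionary keys, and the example individual (age 50, censored at 8 years) are illustrative assumptions.

```python
import math

SHIFT = 0.5  # six-month eligibility period before disability retirement (years)


def log_density(x, lam, gam, lin):
    """Log Weibull density with proportional-hazards multiplier exp(lin)."""
    return (math.log(lam * gam) + (gam - 1.0) * math.log(x) + lin
            - lam * math.exp(lin) * x ** gam)


def log_survival(x, lam, gam, lin):
    """Log Weibull survival function; equals 0 for x <= 0."""
    return -lam * math.exp(lin) * x ** gam if x > 0 else 0.0


def log_contribution(kind, p, z, x1=None, x2=None, x3=None, c=None):
    """Assumed log-likelihood contribution of one individual of the given type."""
    lin1, lin2 = p["beta1"] * z, p["beta2"] * z
    if kind == 1:    # death at x1 with no disability retirement
        return (log_density(x1, p["lam1"], p["gam1"], lin1)
                + log_survival(x1 - SHIFT, p["lam2"], p["gam2"], lin2))
    if kind == 4:    # censored at c with neither event observed
        return (log_survival(c, p["lam1"], p["gam1"], lin1)
                + log_survival(c - SHIFT, p["lam2"], p["gam2"], lin2))
    # Types 2 and 3 share the path through disability retirement at x2.
    lin3 = p["beta3"] * z + p["alpha"] * x2
    path = (log_survival(x2, p["lam1"], p["gam1"], lin1)
            + log_density(x2 - SHIFT, p["lam2"], p["gam2"], lin2))
    if kind == 2:    # death x3 years after the retirement
        return path + log_density(x3, p["lam3"], p["gam3"], lin3)
    if kind == 3:    # censored at c after the retirement
        return path + log_survival(c - x2, p["lam3"], p["gam3"], lin3)
    raise ValueError("kind must be 1, 2, 3, or 4")


# Ten-parameter estimates as reported in Table 3.2.
TABLE_3_2 = dict(lam1=0.000106, gam1=1.281358, beta1=0.075120,
                 lam2=0.000018, gam2=1.479032, beta2=0.107449,
                 lam3=0.076486, gam3=0.814947, beta3=0.018272, alpha=-0.141730)
ll_type4 = log_contribution(4, TABLE_3_2, z=50, c=8.0)
```

Under these assumptions the full log likelihood is the sum of such contributions over all 6425 individuals, which a general-purpose optimizer could then maximize.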
The maximum likelihood estimates and estimated standard errors
for the parameters in (3.1) are given in Table 3.2.
Table 3.2 appears to indicate that age is an important covariable
for survival prior to or without disability retirement and for time to
disability retirement, as $\hat\beta_1$ and $\hat\beta_2$, respectively, exceed twice their
standard errors.
However, for survival given a disability retirement
at time $x_2$, age appears not to be an important covariable, as $\hat\beta_3$ is less
than its standard error. Also, the estimate of $\alpha$ (with estimate of
standard error) seems to suggest that the time of a disability retirement as measured from entry into the study may be important in determining the time to death given a disability retirement.
These conjectures
will be subjected to likelihood ratio tests in the next section and
final results will be discussed in Section 3.3.4.
TABLE 3.2
Parameter Estimates for Likelihood (3.1)
with Estimated Standard Errors

              Parameter                  Maximum Likelihood   Estimate of
Parameter     Description                Estimate             Standard Error
$\lambda_1$   $X_1$ scale                 0.000106            0.000048
$\gamma_1$    $X_1$ shape                 1.281358            0.067456
$\beta_1$     $X_1$ covariate             0.075120            0.007576
$\lambda_2$   $X_2$ scale                 0.000018            0.000006
$\gamma_2$    $X_2$ shape                 1.479032            0.079328
$\beta_2$     $X_2$ covariate             0.107449            0.011045
$\lambda_3$   $X_3 \mid X_2$ scale        0.076486            0.094218
$\gamma_3$    $X_3 \mid X_2$ shape        0.814947            0.077891
$\beta_3$     $X_3 \mid X_2$ covariate    0.018272            0.020567
$\alpha$      $X_3 \mid X_2$ dependence  -0.141730            0.061644
3.3.2 Hypothesis Tests and Model Building
In this section, we shall test for the independence of $X_3$ and $X_2$
and see if the initial ten parameter model with likelihood (3.1) can
be simplified by deleting or consolidating other parameters.
To test the hypothesis that $X_3$ is independent of $X_2$, $\alpha$ is
constrained to zero and likelihood function (3.1) is maximized.
The first two rows of Table 3.3 contain the maximized log likelihood values
for the unconstrained ten parameter model with parameters as listed in
Table 3.2 and for the nine parameter model where $\alpha=0$.
The nine
parameter model with $\alpha$ constrained to zero has a maximized log likelihood value which is less than the maximized unconstrained log likelihood value by 3.4465.
The resulting likelihood ratio test statistic
provided by (2.21) has a value of 6.893 which is greater than the 95th
percentile of the one degree of freedom chi-square distribution, and so
the hypothesis of independence for $X_2$, $X_3$ is rejected.
TABLE 3.3
Maximized Log Likelihoods for Selected
Constraints to Likelihood Function (3.1)

Number of                                          Value of
Parameters   Constraint                            Log Likelihood
Ten          none                                  -3797.7879
Nine         $\alpha=0$                            -3801.2344
Nine         $\gamma_1=\gamma_3$                   -3805.0549
Nine         $\beta_3=0$                           -3798.2091
Eight        $\beta_1=\beta_2=0$                   -3947.6973
Seven        $\gamma_1=\gamma_2=\gamma_3=1$        -3841.1408
The remaining entries in Table 3.3 are used for model building.
The third row of the table lists the maximized log likelihood when the
shape parameters for the Weibull distributions of $X_1$ and $X_3 \mid X_2$ are
constrained to be equal.
The likelihood ratio test statistic is 14.5340
with one degree of freedom.
As a result, we reject the hypothesis of
proportionality of the hazard functions for $X_1$ and $X_3 \mid X_2$.
The fourth row of the table lists the maximized log likelihood
value when the covariate parameter in the distribution of $X_3$ is constrained to zero and the resulting likelihood ratio test statistic is
0.8424.
This indicates that age is not an important covariate for the
distribution of failure time given a disability retirement.
The fifth row of the table provides a maximized log likelihood
value which can be used to determine if age is an important covariable
in the distributions of $X_1$ and $X_2$. The associated likelihood ratio test
statistic is 299.8, indicating that age is an important explanatory
variable for the respective distributions of failure time prior to
disability retirement and time to disability retirement.
The last row of Table 3.3 gives the value of the maximized log
likelihood for an exponential model with age as a covariate (i.e., the
respective shape parameters are constrained to unity).
The likelihood
ratio test statistic value of 86.7058 does not support a reduction of
the model to one with constant hazards.
From the above tests, we conclude that the initial ten parameter model
can be reduced to a nine parameter model by dropping the age covariate
parameter $\beta_3$. Estimates of the parameters of the reduced model are
given in Table 3.4, along with estimated standard errors. Implications
of some of the test results will be discussed in more detail in
Section 3.3.4.
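The likelihood ratio statistics quoted in this section can be reproduced from the log likelihoods in Table 3.3 as twice the drop from the unconstrained value. The sketch below does this arithmetic. The dictionary layout and names are illustrative, the pairing of each log likelihood with its constraint follows the statistics reported in the text, and the chi-square 95th percentiles for one, two, and three degrees of freedom are standard tabulated values.

```python
LOGLIK_FULL = -3797.7879  # unconstrained ten parameter model

# constraint -> (maximized log likelihood, number of parameters constrained)
CONSTRAINED = {
    "alpha = 0": (-3801.2344, 1),
    "gamma1 = gamma3": (-3805.0549, 1),
    "beta3 = 0": (-3798.2091, 1),
    "beta1 = beta2 = 0": (-3947.6973, 2),
    "gamma1 = gamma2 = gamma3 = 1": (-3841.1408, 3),
}

# 95th percentiles of the chi-square distribution for 1, 2, and 3 df
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815}


def lr_statistic(ll_full, ll_constrained):
    """Likelihood ratio statistic: twice the drop in maximized log likelihood."""
    return 2.0 * (ll_full - ll_constrained)


results = {}
for name, (ll, df) in CONSTRAINED.items():
    stat = lr_statistic(LOGLIK_FULL, ll)
    results[name] = (stat, df, stat > CHI2_95[df])  # True means reject
```

Only the constraint on the age coefficient for the post-retirement distribution falls below its critical value, which matches the decision to drop that parameter.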
TABLE 3.4
Parameter Estimates with Estimated Standard Errors for the
Likelihood (3.1) with the Constraint $\beta_3=0$

              Parameter                  Maximum Likelihood   Estimate of
Parameter     Description                Estimate             Standard Error
$\lambda_1$   $X_1$ scale                 0.000106            0.000048
$\gamma_1$    $X_1$ shape                 1.281358            0.067456
$\beta_1$     $X_1$ covariate             0.075120            0.007576
$\lambda_2$   $X_2$ scale                 0.000018            0.000006
$\gamma_2$    $X_2$ shape                 1.479032            0.079328
$\beta_2$     $X_2$ covariate             0.107449            0.011045
$\lambda_3$   $X_3 \mid X_2$ scale        0.205054            0.045652
$\gamma_3$    $X_3 \mid X_2$ shape        0.813329            0.077787
$\alpha$      $X_3 \mid X_2$ dependence  -0.161418            0.057401
3.3.3 Goodness of Fit
The goodness of fit of the estimated unconditional survival
function discussed in Section 2.2.2 can be determined by using the
goodness of fit test (2.24). From (2.16), the unconditional survival
function of the ith individual for the chosen nine parameter Weibull
model is
\[ \bar{F}_T(t; z_i) = \exp\{-\lambda_1 e^{\beta_1 z_i} t^{\gamma_1}\} \quad \text{for } t \le 0.5, \]
and, for $t > 0.5$,
\[ \bar{F}_T(t; z_i) = \exp\{-(\lambda_1 e^{\beta_1 z_i} t^{\gamma_1} + \lambda_2 e^{\beta_2 z_i} (t-0.5)^{\gamma_2})\} + \int_{0.5}^{t} \lambda_2 \gamma_2 e^{\beta_2 z_i} (s-0.5)^{\gamma_2-1} \exp\{-(\lambda_1 e^{\beta_1 z_i} s^{\gamma_1} + \lambda_2 e^{\beta_2 z_i} (s-0.5)^{\gamma_2})\} \exp\{-\lambda_3 e^{\alpha s} (t-s)^{\gamma_3}\} \, ds. \tag{3.2} \]
The integration for the case where t>0.5 must be performed by numerical
techniques.
More details about (3.2) are given in Section 3.4.
Using
the parameter estimates from Table 3.4 to compute survival probabilities with survival function (3.2), we can see from the results in
Table 3.5 that the model fits well.
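To make the numerical step concrete, the sketch below evaluates the nine parameter unconditional survival function for one individual. It assumes that the function is $\bar{F}(t)\bar{G}(t)$ plus the integral of $\bar{F}(s)g(s)\bar{H}(t-s \mid s)$ over $(0.5, t)$, with no disability retirement possible before 0.5 years. A composite three-point Gauss-Legendre rule is used as a stand-in for whatever quadrature formula the original computation employed. The parameter values are the estimates reported in Table 3.4; the function names and the panel count are illustrative choices.

```python
import math

# Nine parameter estimates as reported in Table 3.4 (beta3 constrained to 0).
EST = dict(lam1=0.000106, gam1=1.281358, beta1=0.075120,
           lam2=0.000018, gam2=1.479032, beta2=0.107449,
           lam3=0.205054, gam3=0.813329, alpha=-0.161418)

# Three-point Gauss-Legendre nodes and weights on [-1, 1].
_NODES = (-math.sqrt(0.6), 0.0, math.sqrt(0.6))
_WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)


def gauss_legendre(func, a, b, panels=200):
    """Composite three-point Gauss-Legendre quadrature of func over [a, b]."""
    h = (b - a) / panels
    total = 0.0
    for k in range(panels):
        mid = a + (k + 0.5) * h
        for x, w in zip(_NODES, _WEIGHTS):
            total += w * func(mid + 0.5 * h * x)
    return 0.5 * h * total


def unconditional_survival(t, age, p=EST, shift=0.5):
    """Probability of surviving past t years for an individual of given age."""
    a1 = p["lam1"] * math.exp(p["beta1"] * age)
    a2 = p["lam2"] * math.exp(p["beta2"] * age)
    no_event = math.exp(-a1 * t ** p["gam1"])
    if t <= shift:
        return no_event          # no disability retirement is possible yet
    no_event *= math.exp(-a2 * (t - shift) ** p["gam2"])

    def integrand(s):
        g = a2 * p["gam2"] * (s - shift) ** (p["gam2"] - 1.0)
        both_alive = math.exp(-(a1 * s ** p["gam1"] + a2 * (s - shift) ** p["gam2"]))
        h_bar = math.exp(-p["lam3"] * math.exp(p["alpha"] * s) * (t - s) ** p["gam3"])
        return g * both_alive * h_bar

    return no_event + gauss_legendre(integrand, shift, t)


# Survival curve at the yearly time points for a hypothetical 50-year-old.
curve = [unconditional_survival(t, age=50) for t in range(1, 9)]
```

Summing such probabilities over all individuals at a given time point gives an expected number of survivors of the kind that is compared with the observed count in a goodness of fit table.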
TABLE 3.5
Results of Goodness of Fit Test (2.24) Where the
Unconditional Survival Function is Provided
by (3.2) with Parameter Estimates From Table 3.4

Time        No. Not Observed      Expected No.    Difference
Interval    to Fail in Interval   of Survivors    (Observed -
(years)                                           Expected)      $X^2$
0-1         6387                  6390.02         -3.02          0.2626
0-2         6335                  6341.44         -6.44          0.5055
0-3         6289                  6287.75          1.25          0.0117
0-4         6240                  6232.42          7.58          0.3115
0-5         6177                  6181.22         -4.22          0.0769
0-6         6127                  6133.88         -6.88          0.3789
0-7         6075                  6081.27         -6.27          0.1260
0-8         6035                  6040.81         -5.81          0.0978
3.3.4 Discussion of Results
The use of the intervening event survival model to analyze the
failure time data from the rubber industry sample provided some interesting insights into employee survival.
Perhaps the most interesting
result was the contribution of age as an explanatory variable.
Age was
determined to be an important covariable with respect to the hazard
rate for death in the absence of a disability retirement and the hazard
rate for the occurrence of a disability retirement.
However, age was
not found to be an important explanatory variable for the hazard rate
of death given a disability retirement.
Thus, employees who were
subject to a disability retirement experienced the same risk of death
regardless of their respective ages at the time of entry into the study.
The range of ages at the time of disability retirement was thirty-six
to sixty-four and approximately 12% (i.e., 40 of 341 occurrences) of
the disability retirees were less than fifty years old at retirement.
Thus, there were a substantial number of relatively young employees
who, after a disability retirement, had the same risk of death as disability retirees who were in some cases twenty or more years older.
If the causes of all or even a large portion of the disability
retirements could be traced to the work environment, then the result
regarding the importance of age as an explanatory variable would represent an indictment of the severity of the health hazard in the rubber
industry.
However, reasons for the disability retirements, as determined by a committee of qualified individuals, were not available for
examination at the time of the analysis.
Accordingly, the circumstances
surrounding each disability retirement must be thoroughly examined
before responsibility can be established.
Other results from application of the model indicate that the
hazard rate for death following a disability retirement decreases with
time measured from the point of the retirement and, further, that the
time from entry into the study to disability retirement is an important
factor in determining the hazard rate for death given a disability
retirement.
The decreasing hazard rate following a disability retirement, with time measured from the point of retirement, may suggest that
individuals are at the greatest risk of death in the period
immediately following a disability retirement and that their probability of survival improves with time.
This, of course, must be
viewed as a relatively short term phenomenon since age will eventually
act to increase the hazard rate.
An explanation for the decreasing
hazard rate may be provided by analyzing the causes of the disability
retirements.
With regard to the time when a disability retirement occurs,
the negative estimate for $\alpha$ implies that the longer disability retirement is postponed following entry into the study, the better the chances
of survival after a disability retirement.
This result may suggest
that as time progressed from 1964 there was a more liberal interpretation of the rules governing disability retirement.
The more liberal
interpretation may have allowed individuals to retire from a disability
in better health than previously permitted.
The conjecture concerning
the source of the negative estimate for $\alpha$ gains credence when it is
observed that the period of the data collection (1964-1972) was a time
of increasing interest on the part of government agencies in the health
and welfare of the nation's work force.
3.4 Some Derivations Associated with Weibull Distributional Assumptions
3.4.1 The Survival Function for the Unconditional Time to Failure
In Section 2.2.2, the unconditional survival function for the
intervening event survival model was found to be
\[ \bar{F}_T(t) = \bar{F}(t)\bar{G}(t) + \int_0^t \bar{F}(s)\, g(s)\, \bar{H}(t-s \mid s)\, ds, \]
where $g(\cdot)$ is the density function for the underlying random variable
$X_2$ and where $\bar{F}(\cdot)$, $\bar{G}(\cdot)$, and $\bar{H}(\cdot)$ are the respective survival functions
for the model random variables $X_1$, $X_2$, and $X_3 \mid X_2$. Following the
notational scheme of Section 2.2.3, the introduction of a covariate
vector $\mathbf{z}_i$ for the ith observation to (2.24) will result in it being
written as
\[ \bar{F}_T(t; \mathbf{z}_i) = \bar{F}(t; \mathbf{z}_i)\bar{G}(t; \mathbf{z}_i) + \int_0^t \bar{F}(s; \mathbf{z}_i)\, g(s; \mathbf{z}_i)\, \bar{H}(t-s \mid s; \mathbf{z}_i)\, ds. \tag{3.3} \]
The remainder of this subsection will contain evaluations of $\bar{F}_T(t; \mathbf{z}_i)$
for some special cases involving Weibull distributional assumptions.
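3.4.1.1 Constant Hazards Model with Covariates ($X_2$, $X_3$ independent)
Assume that $X_1$, $X_2$, and $X_3$ are mutually independent and have
constant hazards. If covariates are handled by the proportional hazards
assumption discussed in Section 2.2.3, the respective density functions
are
\[ f(x_1; \mathbf{z}_i) = \lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i} \exp\{-\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i} x_1\}, \quad g(x_2; \mathbf{z}_i) = \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i} \exp\{-\lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i} x_2\}, \quad h(x_3; \mathbf{z}_i) = \lambda_3 e^{\boldsymbol\beta_3'\mathbf{z}_i} \exp\{-\lambda_3 e^{\boldsymbol\beta_3'\mathbf{z}_i} x_3\}, \tag{3.4} \]
where $\lambda_j e^{\boldsymbol\beta_j'\mathbf{z}_i}$ is the hazard for the jth (j=1,2,3) random variable with
covariate vector $\mathbf{z}_i$, $\lambda_j>0$, and $-\infty<\boldsymbol\beta_j<\infty$.
Then, using assumptions (3.4)
with the survival function (3.3), we have
\[ \bar{F}_T(t; \mathbf{z}_i) = w\, e^{-t(\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i} + \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i})} + (1-w)\, e^{-\lambda_3 e^{\boldsymbol\beta_3'\mathbf{z}_i} t} \tag{3.5} \]
where
\[ w = \frac{\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i} - \lambda_3 e^{\boldsymbol\beta_3'\mathbf{z}_i}}{\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i} + \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i} - \lambda_3 e^{\boldsymbol\beta_3'\mathbf{z}_i}}, \]
provided the denominator of w is not equal to zero.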
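As a check on the constant hazards case, the sketch below compares a weighted two-exponential closed form with a direct numerical evaluation of $\bar{F}(t)\bar{G}(t) + \int_0^t \bar{F}(s)g(s)\bar{H}(t-s)\,ds$. The shorthand `a1`, `a2`, `a3` for the three hazards $\lambda_j e^{\boldsymbol\beta_j'\mathbf{z}_i}$, the function names, and the numerical values are illustrative assumptions. The weight is taken as `w = (a1 - a3) / (a1 + a2 - a3)`, and the zero-denominator case is handled with the limiting form `(1 + a2*t) * exp(-a3*t)`.

```python
import math


def survival_constant_hazards(t, a1, a2, a3):
    """Closed-form unconditional survival for constant hazards a1, a2, a3."""
    denom = a1 + a2 - a3
    if abs(denom) < 1e-12:
        # Limiting form when a1 + a2 equals a3.
        return (1.0 + a2 * t) * math.exp(-a3 * t)
    w = (a1 - a3) / denom
    return w * math.exp(-(a1 + a2) * t) + (1.0 - w) * math.exp(-a3 * t)


def survival_by_quadrature(t, a1, a2, a3, n=2000):
    """Midpoint-rule evaluation of the same survival function."""
    h = t / n
    integral = h * sum(
        a2 * math.exp(-(a1 + a2) * s) * math.exp(-a3 * (t - s))
        for s in ((k + 0.5) * h for k in range(n))
    )
    return math.exp(-(a1 + a2) * t) + integral


# Hypothetical hazards per year.
closed = survival_constant_hazards(5.0, 0.02, 0.03, 0.20)
numeric = survival_by_quadrature(5.0, 0.02, 0.03, 0.20)
```

Agreement between the two values gives some assurance that the weight has been written with the correct sign and denominator.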
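If for some $\mathbf{z}_i$, the denominator of w is equal to zero, then by
setting
\[ \lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i} + \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i} = \lambda_3 e^{\boldsymbol\beta_3'\mathbf{z}_i} \]
we have
\[ \bar{F}_T(t; \mathbf{z}_i) = (1 + \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i} t)\, e^{-\lambda_3 e^{\boldsymbol\beta_3'\mathbf{z}_i} t}. \tag{3.6} \]
3.4.1.2 Constant Hazards Model with Covariates ($X_2$, $X_3$ not independent)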
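Assume that $X_1$ and $X_2$ are as defined in Section 3.4.1.1 and that
$X_2$ and $X_3$ are not independent. If $X_1$ and $X_2$ have the densities given
by (3.4) and $X_3 \mid X_2$ is assumed to have the density
\[ h(x_3 \mid x_2; \mathbf{z}_i) = \lambda_3 e^{\boldsymbol\beta_3'\mathbf{z}_i + \alpha x_2} \exp\{-\lambda_3 e^{\boldsymbol\beta_3'\mathbf{z}_i + \alpha x_2} x_3\}, \]
where $\lambda_3$ and $\boldsymbol\beta_3$ are as defined in Section 3.4.1.1 and $-\infty<\alpha<\infty$, then
\[ \bar{F}_T(t; \mathbf{z}_i) = e^{-t(\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i} + \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i})} + \int_0^t \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i} e^{-s(\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i} + \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i})} \exp\{-\lambda_3 e^{\boldsymbol\beta_3'\mathbf{z}_i + \alpha s}(t-s)\}\, ds \tag{3.7} \]
and the integration in the second term must be evaluated by numerical
techniques.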
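3.4.1.3 Non-Constant Hazards Model with Covariates
Assume that $X_1$, $X_2$, and $X_3 \mid X_2$ have Weibull hazards. If
covariates are handled by the proportional hazards assumption discussed
in Section 2.2.3 and if the conditioning of $X_3$ by $X_2$ is through the
multiplicative term introduced in Section 2.4, the respective density
functions are
\[ f(x_1; \mathbf{z}_i) = \lambda_1\gamma_1 x_1^{\gamma_1-1} e^{\boldsymbol\beta_1'\mathbf{z}_i} \exp\{-\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i} x_1^{\gamma_1}\}, \quad g(x_2; \mathbf{z}_i) = \lambda_2\gamma_2 x_2^{\gamma_2-1} e^{\boldsymbol\beta_2'\mathbf{z}_i} \exp\{-\lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i} x_2^{\gamma_2}\}, \quad h(x_3 \mid x_2; \mathbf{z}_i) = \lambda_3\gamma_3 x_3^{\gamma_3-1} e^{\boldsymbol\beta_3'\mathbf{z}_i + \alpha x_2} \exp\{-\lambda_3 e^{\boldsymbol\beta_3'\mathbf{z}_i + \alpha x_2} x_3^{\gamma_3}\}, \tag{3.8} \]
where $\lambda_j\gamma_j x_j^{\gamma_j-1} e^{\boldsymbol\beta_j'\mathbf{z}_i}$ is the hazard for the jth
(j=1,2,3) random variable with covariate vector $\mathbf{z}_i$, $\lambda_j>0$, $\gamma_j>0$, $-\infty<\boldsymbol\beta_j<\infty$, and $-\infty<\alpha<\infty$. Then,
\[ \bar{F}_T(t; \mathbf{z}_i) = \exp\{-(\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i} t^{\gamma_1} + \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i} t^{\gamma_2})\} + \int_0^t \lambda_2\gamma_2 s^{\gamma_2-1} e^{\boldsymbol\beta_2'\mathbf{z}_i} \exp\{-(\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i} s^{\gamma_1} + \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i} s^{\gamma_2})\} \exp\{-\lambda_3 e^{\boldsymbol\beta_3'\mathbf{z}_i + \alpha s}(t-s)^{\gamma_3}\}\, ds \tag{3.9} \]
and the integral in the second term must be evaluated by numerical
techniques.
Software for the numerical integration is readily available.
Gaussian quadrature formulas were used to evaluate the integral for the
application earlier in the chapter.
These techniques are described by
Krylov (1962) and software is available through the System/360 Scientific Subroutine Package, Version III.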
3.4.2 Some Special Cases of the Pr[$X_1 > X_2$]
In Section 2.2 the probability that the time to death exceeds
the time to the intervening event was found to be
\[ \Pr[X_1 > X_2; \mathbf{z}_i] = \int_0^\infty \bar{F}(\tau; \mathbf{z}_i)\, g(\tau; \mathbf{z}_i)\, d\tau \tag{3.10} \]
where $\bar{F}(\cdot)$ is the survival function for the underlying random variable
$X_1$ and $g(\cdot)$ is the density function of the random variable $X_2$. The
evaluations of the conditional density functions (2.10) and (2.11)
rely on the probability indicated in (3.10).
This section will evaluate Pr[$X_1 > X_2$] for the special cases considered in Section 3.4.1
involving Weibull distributional assumptions about $X_1$ and $X_2$.
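3.4.2.1 Constant Hazards Model with Covariates
Assume that $X_1$ and $X_2$ are independent and have constant hazards
with covariates handled by the proportional hazards assumption. By
referring to the density functions for $X_1$ and $X_2$ given by (3.4), the
probability indicated by (3.10) is found to be
\[ \Pr[X_1 > X_2; \mathbf{z}_i] = \int_0^\infty \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i} e^{-(\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i} + \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i})\tau}\, d\tau = \frac{\lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i}}{\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i} + \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i}}. \tag{3.11} \]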
3.4.2.2 Non-Constant Hazards Model with Covariates
Assume that $X_1$ and $X_2$ are independent and have Weibull hazards
with covariates handled by the proportional hazards assumption. The
appropriate density functions for $X_1$ and $X_2$ have been given by (3.8).
Two cases to be considered are equal and unequal shape parameters.
Case 1: Shape Parameters Equal to $\gamma$
When the shape parameters for $X_1$ and $X_2$ are equal and denoted
by $\gamma$, the probability indicated by (3.10) is found to be
\[ \Pr[X_1 > X_2; \mathbf{z}_i] = \frac{\lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i}}{\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i} + \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i}} \tag{3.12} \]
which is identical to (3.11).
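Case 2: Shape Parameters Not Equal, $\gamma_1 \ne \gamma_2$
When the shape parameters are not equal, the probability
indicated by (3.10) cannot be found in closed form but can be
approximated by using terms from an infinite series.
To evaluate
\[ \Pr[X_1 > X_2; \mathbf{z}_i] = \int_0^\infty \lambda_2\gamma_2 \tau^{\gamma_2-1} e^{\boldsymbol\beta_2'\mathbf{z}_i} \exp\{-(\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i}\tau^{\gamma_1} + \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i}\tau^{\gamma_2})\}\, d\tau \tag{3.13} \]
we shall use the infinite series
\[ \exp\{-\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i}\tau^{\gamma_1}\} = 1 - \frac{\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i}\tau^{\gamma_1}}{1!} + \frac{(\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i}\tau^{\gamma_1})^2}{2!} - \frac{(\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i}\tau^{\gamma_1})^3}{3!} + \cdots \tag{3.13a} \]
and write the probability as
\[ \int_0^\infty \lambda_2\gamma_2 \tau^{\gamma_2-1} e^{\boldsymbol\beta_2'\mathbf{z}_i} \exp\{-\lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i}\tau^{\gamma_2}\}\, d\tau + \sum_{j=1}^\infty \frac{(-\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i})^j}{j!} \int_0^\infty \tau^{\gamma_1 j}\, \lambda_2\gamma_2 \tau^{\gamma_2-1} e^{\boldsymbol\beta_2'\mathbf{z}_i} \exp\{-\lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i}\tau^{\gamma_2}\}\, d\tau. \]
Integration of the first term in the expansion yields one; for the
remaining terms, let
\[ u = \lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i}\tau^{\gamma_2}, \]
so that
\[ \tau = \left(\frac{u}{\lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i}}\right)^{1/\gamma_2}, \qquad d\tau = \frac{du}{\gamma_2\lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i}\tau^{\gamma_2-1}}, \]
which on substitution into the remaining terms yields
\[ 1 + \sum_{j=1}^\infty \left[\frac{-\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i}}{(\lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i})^a}\right]^j \frac{\int_0^\infty u^{aj} e^{-u}\, du}{j!}, \tag{3.14} \]
where $a=\gamma_1/\gamma_2$ and the integral $\int_0^\infty u^{aj} e^{-u}\, du$ is the gamma function denoted
by $\Gamma(aj+1)$. The expression (3.14) can be rewritten as
\[ 1 + \sum_{j=1}^\infty \left[\frac{-\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i}}{(\lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i})^a}\right]^j \frac{\Gamma(aj+1)}{j!}. \tag{3.15} \]
When $0<a<1$, the series (3.15) can be shown to converge and for
$a>1$ the probability Pr[$X_1 > X_2; \mathbf{z}_i$] can be evaluated by considering
1 - Pr[$X_2 > X_1; \mathbf{z}_i$].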
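A partial sum of a series of this form is simple to program. The sketch below assumes the series is $1 + \sum_{j \ge 1} K^j\,\Gamma(aj+1)/j!$ with $K = -a_1/a_2^{\,a}$ and $a = \gamma_1/\gamma_2$, where `a1` and `a2` are shorthand for $\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i}$ and $\lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i}$. The function name, the number of terms, and the example values are illustrative; the terms are computed on the log scale to avoid overflow in the gamma function.

```python
import math


def prob_x1_exceeds_x2_series(a1, a2, gam1, gam2, terms=60):
    """Partial sum of 1 + sum_j K^j * Gamma(a*j + 1) / j! for 0 < a < 1,
    where a = gam1 / gam2 and K = -a1 / a2**a."""
    a = gam1 / gam2
    if not 0.0 < a < 1.0:
        raise ValueError("series is intended for 0 < gam1/gam2 < 1")
    abs_k = a1 / a2 ** a
    total = 1.0
    for j in range(1, terms + 1):
        # log of |K|^j * Gamma(a*j + 1) / j!
        log_mag = (j * math.log(abs_k) + math.lgamma(a * j + 1.0)
                   - math.lgamma(j + 1.0))
        total += (-1.0) ** j * math.exp(log_mag)   # K is negative
    return total


# Hypothetical values with shape parameters 1 and 2, so that a = 0.5.
p_series = prob_x1_exceeds_x2_series(a1=0.5, a2=1.0, gam1=1.0, gam2=2.0)

# For this particular choice the integral can be written with the
# complementary error function, which gives an independent check.
b = 0.5
p_exact = 1.0 - b * math.exp(b * b / 4.0) * (math.sqrt(math.pi) / 2.0) * math.erfc(b / 2.0)
```

The error-function expression applies only to shape parameters 1 and 2 with `a2 = 1`; it is included purely as a check on the series and is not part of the original derivation.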
Convergence for $0 < a = \gamma_1/\gamma_2 < 1$:
For $0<a<1$, we have that $\Gamma(aj+1) < j!$.
Let
\[ K = \frac{-\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i}}{(\lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i})^a}; \]
then if $|K|<1$, the series (3.15) can be seen to converge faster than a
geometric series involving a common ratio equal to K.
For $|K| \ge 1$ we can write
\[ \sum_{j=1}^\infty K^j \frac{\Gamma(aj+1)}{j!} = a \sum_{j=1}^\infty K^j \frac{j\,\Gamma(aj)}{j!}. \tag{3.16} \]
Since
\[ \Gamma(aj) = (2\pi)^{(1-j)/2}\, j^{ja-1/2} \prod_{i=0}^{j-1} \Gamma(a+i/j) \]
and
\[ j! \approx \sqrt{2\pi}\, j^{j+1/2} e^{-j}, \]
we have for (3.16) that
\[ a \sum_{j=1}^\infty K^j \frac{j\,\Gamma(aj)}{j!} \approx a \sum_{j=1}^\infty \left[\frac{1.0844K}{j^{1-a}}\right]^j \prod_{i=0}^{j-1} \Gamma(a+i/j). \]
For values of a where $0<a<1$, we can establish convergence because an
upper bound can be set on $\Gamma(a+i/j)$ since $0<a+i/j<2$.
Although $\Gamma(a+i/j)$
gets very large as $a+i/j$ approaches zero, most values of a in practice
will be greater than 0.01.
Let the upper bound on $\Gamma(a+i/j)$ be denoted
by C.
Then we have that
\[ \prod_{i=0}^{j-1} \Gamma(a+i/j) < C^j \]
which gives
\[ \sum_{j=1}^\infty K^j \frac{\Gamma(aj+1)}{j!} < a \sum_{j=1}^\infty \left[\frac{1.0844KC}{j^{1-a}}\right]^j. \tag{3.17} \]
When j increases so that $j > |1.0844KC|^{1/(1-a)}$, the term $|1.0844KC|/j^{1-a}$
becomes less than one.
As a result, we shall write
\[ \sum_{j=1}^\infty K^j \frac{\Gamma(aj+1)}{j!} = \sum_{j=1}^{[\,|1.0844KC|^{1/(1-a)}\,]} K^j \frac{\Gamma(aj+1)}{j!} + \sum_{j=[\,|1.0844KC|^{1/(1-a)}\,]+1}^{\infty} K^j \frac{\Gamma(aj+1)}{j!} \]
where the limit $[\cdot]$ indicates the largest integer value contained in the
brackets and the first part on the right hand side is an alternating
series with terms that decrease in absolute value.
The second term converges faster than a geometric series with common ratio equal to
$1.0844KC/j^{1-a}$, where $j = [\,|1.0844KC|^{1/(1-a)}+1\,]$.
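Computation for $a > 1$:
When $a>1$ the series (3.15) does not converge.
However, if $a>1$ for the expansion of Pr[$X_1 > X_2; \mathbf{z}_i$], then the corresponding ratio, $1/a = \gamma_2/\gamma_1$, is less than one for the
comparable expansion of Pr[$X_2 > X_1; \mathbf{z}_i$]. As a result, Pr[$X_1 > X_2; \mathbf{z}_i$]
can be found by
\[ 1 - \left\{ 1 + \sum_{j=1}^\infty \left[\frac{-\lambda_2 e^{\boldsymbol\beta_2'\mathbf{z}_i}}{(\lambda_1 e^{\boldsymbol\beta_1'\mathbf{z}_i})^{1/a}}\right]^j \frac{\Gamma(j/a+1)}{j!} \right\}. \]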
CHAPTER IV
THE MODEL WHEN FOLLOW-UP IS AT FIXED INTERVALS
4.1 Introduction
In this chapter we shall adapt the intervening event survival
model introduced in Chapter II to accommodate the situation where the
follow-up of individuals is not continuous but occurs only at specified time points.
Many studies of chronically ill patients rely on
follow-up data collected systematically at intervals, since patients
are generally treated and released from the treatment setting.
Then,
at each data collection time point, information on the patient's health
status is obtained if the patient remains alive.
If the patient has
died since the previous collection point, the exact time of death is
accessible for some studies but not for others.
As a result, investigators may be faced with one of the following situations:
1. The time of the intervening event is recorded only in
a specified interval and death times are recorded
continuously, or
2. Both the time of the intervening event and the time of
death are recorded in specified intervals.
The situation where exact times of intervening events are available but
death times are recorded in specified intervals is not generally
encountered and will not be discussed here.
However, the techniques
for addressing this problem follow closely those developed in the
subsequent sections of this chapter for the two situations described
above.
As an example, consider a group of patients under treatment for
coronary artery disease at a cardiology clinic.
Patients become part
of the study group when the condition is established by cardiac catheterization.
After diagnosis, patients are treated and released from the
clinic.
At established intervals the patient's health status, with
respect to coronary artery disease, is checked and the resulting information recorded.
If the patient has died subsequent to the previous
follow-up point, the exact time of death is recorded.
One piece of
follow-up information that is of interest is the occurrence of a nonfatal myocardial infarction after the disease is established in the
clinic, since such an intervening event might result in increasing the
individual's risk of dying.
However, the occurrence time is not
recorded exactly but only in a follow-up interval.
This example will
model when the time of the intervening event is recorded in a specified
interval.
4.2 Description and Notation for Interval Follow-up
Recalling the notation of Chapter II, we required the following
variables:
$X_1$: time to failure in the absence of the intervening event
$X_2$: time to the intervening event
$X_3$: time to failure measured from the occurrence of the
intervening event
where $X_1, X_2, X_3 > 0$.
Also, we defined the time at death as the random
variable T, with $T = X_1$ if $X_1 \le X_2$ and $T = X_2 + X_3$ if $X_1 > X_2$.
Further, let the follow-up intervals starting from time zero be
represented by $A_j = (a_{j-1}, a_j]$, $j=1,2,\ldots,r$.
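Locating the half-open interval that contains an event time is a lookup in the sorted list of follow-up points. The sketch below is one way to do it with the standard library; the function name and the follow-up schedule are hypothetical.

```python
from bisect import bisect_left


def follow_up_interval(x, cut_points):
    """Return j such that a_{j-1} < x <= a_j, where cut_points is the sorted
    list [a_0, a_1, ..., a_r]; return None if x lies outside (a_0, a_r]."""
    j = bisect_left(cut_points, x)
    if j == 0 or j == len(cut_points):
        return None
    return j


# Hypothetical follow-up schedule in years: a_0 = 0, a_1 = 0.5, ..., a_4 = 3.
cut_points = [0.0, 0.5, 1.0, 2.0, 3.0]
event_interval = follow_up_interval(0.75, cut_points)
```

Using `bisect_left` rather than `bisect_right` is what makes each interval closed on the right, so an event recorded exactly at a follow-up point is assigned to the interval ending at that point.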
4.2.1 Follow-up of Intervening Event at Fixed Intervals
The diagrams in Figure 4.1 describe the four types of observations
possible when the intervening event is recorded in an interval.
In
the diagrams, c corresponds to censoring time, t represents the exact
death time, and $X_2 \in A_j$ corresponds to the intervening event occurring in
the jth follow-up interval.
[Figure 4.1: four timeline diagrams, one for each of Types 1 to 4, marked at the follow-up points $a_0, a_1, \ldots, a_{j-1}, a_j$]
Figure 4.1. Four Types of Observations Possible when Censoring is
Present and Follow-up of the Intervening Event is at
Intervals
Referring to Figure 4.1, the four types of observations possible have
the following description:
1. Type 1 observations correspond to individuals who fail
without an occurrence of the intervening event.
Even
though knowledge of the death is not available until
follow-up point $a_j$, the exact time of death t can be
determined. For estimation purposes we have $X_1 \le X_2$ and
are provided with a failure time for $X_1$
(i.e., $X_1 = t$).
2. Type 2 observations represent individuals who die after
experiencing the intervening event.
In the diagram, the
intervening event occurs in the interval $A_j = (a_{j-1}, a_j]$.
However, it is possible for it to occur in the same interval
as death.
When both are observed in the same interval, for
example $A_{j+1}$, then the time of the intervening event is
placed in $a_j < X_2 \le t$, since the time of death $X_2 + X_3 = t$ is
recorded exactly.
3. Type 3 observations correspond to individuals who experience the intervening event and are subsequently censored.
Since censoring occurs at the end of a follow-up interval,
the time of the intervening event must fall within one of
the $A_j$, $j=1,2,\ldots,r$. In the diagram, the time of the
intervening event is placed as $a_{j-1} < X_2 \le a_j$ while death
time is censored giving $X_2 + X_3 > c = a_{j+1}$.
4. Type 4 observations represent individuals who are censored before either the intervening event or failure.
For these individuals we have that $X_1 > c$ and $X_2 > c$.
4.2.2 Follow-up of Both Intervening Event and Death at
Fixed Intervals
The diagrams in Figure 4.2 describe the four types of observations
possible when both the intervening event and death are recorded in
intervals.
In the diagrams, c
corresponds to censoring time, $T \in A_j$
corresponds to death occurring in the jth follow-up interval, and $X_2 \in A_j$
corresponds to the intervening event occurring in the jth follow-up
interval.
[Figure 4.2: four timeline diagrams, one for each of Types 1 to 4, marked at the follow-up points $a_0, a_1, \ldots, a_{j-1}, a_j, \ldots, a_{j+\ell-1}, a_{j+\ell}$]
Figure 4.2. Four Types of Observations Possible when Censoring is
Present and Follow-up of the Intervening Event and
Death is at Intervals
Referring to Figure 4.2, the four types of observations possible have
the following description:
1. Type 1 observations correspond to individuals who fail in
interval $A_j$ without an occurrence of the intervening event.
2. Type 2 observations correspond to individuals who die in
interval $A_{j+\ell}$ after experiencing the intervening event in
interval $A_j$.
It is possible for the intervening event and
death to occur in the same follow-up interval.
3. Type 3 observations correspond to individuals who experience
the intervening event in interval $A_j$ and are subsequently
censored at time c which comes at the end of a follow-up
interval.
4. Type 4 observations represent individuals who are censored
before either the intervening event or death.
4.3 Maximum Likelihood Estimation
Following the model of Chapter II, we assume that random variable $X_1$ is independent of both $X_2$ and $X_3$, but that $X_2$ and $X_3$ are dependent ($X_2$ and $X_3$ independent being a special case). In addition, we retain the notation for the distributions of $X_1$, $X_2$, and $X_3|X_2$ described there.
Before constructing the likelihoods for the two follow-up cases under consideration, we shall need to introduce some more notation. Following the diagrams in Figures 4.1 and 4.2, let
$n_1$ represent the number of individuals observed to die without the intervening event, with $n_{1j}$ deaths in interval $A_j$, $j=1,2,\ldots,r$, and $n_1 = \sum_{j=1}^{r} n_{1j}$;
$n_2$ represent the number of individuals observed to fail after experiencing the intervening event, with $n_{2jk}$ experiencing the intervening event in interval $A_j$ and death in the interval $A_k$ ($j=1,2,\ldots,r$; $k=j,j+1,\ldots,r$), and $n_2 = \sum_{j=1}^{r}\sum_{k=j}^{r} n_{2jk}$;
$n_3$ represent the number of individuals censored after experiencing the intervening event, with $n_{3j}$ experiencing the intervening event in interval $A_j$, $j=1,2,\ldots,r$, and $n_3 = \sum_{j=1}^{r} n_{3j}$;
$n_4$ represent the number of individuals censored before either the intervening event or death;
where the total number of observations is $n = n_1 + n_2 + n_3 + n_4$.
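The counts defined above can be tallied mechanically from individual records. The following is a minimal sketch, not part of the original analysis: the record layout `(event_time, death_time, censor_time)`, the helper names `interval_index` and `tally`, and the example boundaries and records are all hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def interval_index(time, cuts):
    """Return j with cuts[j-1] < time <= cuts[j], i.e. time falls in A_j."""
    j = bisect_left(cuts, time)
    if j == 0 or j == len(cuts):
        raise ValueError("time lies outside (a_0, a_r]")
    return j


def tally(records, cuts):
    """Count observations by type; each record is (event_time, death_time, censor_time)."""
    n1, n2, n3, n4 = Counter(), Counter(), Counter(), 0
    for event, death, censor in records:
        if death is not None and event is None:       # Type 1: death, no event
            n1[interval_index(death, cuts)] += 1
        elif death is not None:                       # Type 2: event, then death
            n2[(interval_index(event, cuts), interval_index(death, cuts))] += 1
        elif event is not None:                       # Type 3: event, then censored
            n3[interval_index(event, cuts)] += 1
        else:                                         # Type 4: censored before both
            n4 += 1
    return n1, n2, n3, n4


# Hypothetical follow-up boundaries a_0, ..., a_r (in months) and records.
cuts = [0, 6, 12, 24, 36]
records = [
    (None, 5.0, None),    # Type 1, death in A_1
    (4.0, 20.0, None),    # Type 2, event in A_1, death in A_3
    (8.0, 10.0, None),    # Type 2, event and death both in A_2
    (8.0, None, 24.0),    # Type 3, event in A_2
    (None, None, 36.0),   # Type 4
]
n1, n2, n3, n4 = tally(records, cuts)
n = sum(n1.values()) + sum(n2.values()) + sum(n3.values()) + n4
```

Keying `n2` by the pair `(j, k)` mirrors the double subscript of $n_{2jk}$, so the same-interval case $k=j$ needs no special handling at the counting stage.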
The likelihood function can be obtained from the developments in Section 2.2.1, where the contributions to the likelihood function when follow-up is continuous were determined for each of the four types of observations possible as depicted by Figure 2.2.
4.3.1 Follow-up of Intervening Event at Fixed Intervals
By utilizing (2.2) - (2.8), the contributions to the likelihood for interval follow-up of the intervening event are as follows:
1. From (2.3), the contribution to the likelihood for an individual who failed at $x_1$ in the absence of the intervening event is
$$f(x_1)\,\bar{G}(x_1). \tag{4.1}$$
2. From (2.6), the contribution to the likelihood for an individual who experienced the intervening event in interval $A_j$ and failed subsequently in interval $A_k$ at exact time $t = x_2 + x_3$ is ($k>j$)
$$\int_{a_{j-1}}^{a_j} \bar{F}(u)\,g(u)\,h(t-u \mid u)\,du. \tag{4.2}$$
If the intervening event is observed in interval $A_j$ followed by death in the same follow-up interval at exact time $t = x_2 + x_3$, the contribution to the likelihood is
$$\int_{a_{j-1}}^{t} \bar{F}(u)\,g(u)\,h(t-u \mid u)\,du. \tag{4.3}$$
3. From (2.7), the contribution to the likelihood for an individual who experienced the intervening event in interval $A_j$ and was subsequently censored at $c$ is
$$\int_{a_{j-1}}^{a_j} \bar{F}(u)\,g(u)\,\bar{H}(c-u \mid u)\,du. \tag{4.4}$$
4. From (2.8), the contribution to the likelihood for an individual censored at time $c$ before the intervening event is
$$\bar{F}(c)\,\bar{G}(c). \tag{4.5}$$
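Then, applying (4.1) - (4.5), the likelihood function of a sample of $n$ observations is
$$
\begin{aligned}
L ={}& \prod_{i=1}^{n_1} f(x_{1i})\,\bar{G}(x_{1i}) \\
&\cdot \prod_{j=1}^{r}\prod_{k=j+1}^{r}\prod_{i=1}^{n_{2jk}} \int_{a_{j-1}}^{a_j} \bar{F}(u)\,g(u)\,h(t_{ijk}-u \mid u)\,du \\
&\cdot \prod_{j=1}^{r}\prod_{i=1}^{n_{2jj}} \int_{a_{j-1}}^{t_{ijj}} \bar{F}(u)\,g(u)\,h(t_{ijj}-u \mid u)\,du \\
&\cdot \prod_{j=1}^{r}\prod_{i=1}^{n_{3j}} \int_{a_{j-1}}^{a_j} \bar{F}(u)\,g(u)\,\bar{H}(c_{ij}-u \mid u)\,du \\
&\cdot \prod_{i=1}^{n_4} \bar{F}(c_i^0)\,\bar{G}(c_i^0)
\end{aligned}
\tag{4.6}
$$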
where $x_{1i}$, $i=1,2,\ldots,n_1$, are the observed values of $X_1$ for observations of Type 1; $t_{ijk}$, $i=1,2,\ldots,n_{2jk}$, $j=1,2,\ldots,r$, $k=j,j+1,\ldots,r$, are the observed failure times for observations of Type 2; $c_{ij}$, $i=1,2,\ldots,n_{3j}$, $j=1,2,\ldots,r$, are the observed censoring times for Type 3 observations; and $c_i^0$, $i=1,2,\ldots,n_4$, are the observed censoring times for Type 4 observations.
The contribution of a covariate vector $\mathbf{z}$ can be included in the likelihood for each observation in the same manner as described in Section 2.2.3. By incorporating $\mathbf{z}$ in the notation for the density and survival functions of $X_1$, $X_2$, and $X_3|X_2$ (e.g., see (2.18)), we shall indicate that a vector of covariates is included. For observations of Types 1 and 4 with respective numbers of observations $n_1$ and $n_4$, the corresponding individual covariate vectors are represented by $\mathbf{z}_i$. For the $n_2$ observations of Type 2, the corresponding individual covariate vectors are represented by $\mathbf{z}_{ijk}$, while $\mathbf{z}_{ij}$ represents the covariate vectors for the $n_3$ observations of Type 3. With the remainder of the notation the same as for (4.6), the likelihood function with covariates included is
$$
\begin{aligned}
L ={}& \prod_{i=1}^{n_1} f(x_{1i};\mathbf{z}_i)\,\bar{G}(x_{1i};\mathbf{z}_i) \\
&\cdot \prod_{j=1}^{r}\prod_{k=j+1}^{r}\prod_{i=1}^{n_{2jk}} \int_{a_{j-1}}^{a_j} \bar{F}(u;\mathbf{z}_{ijk})\,g(u;\mathbf{z}_{ijk})\,h(t_{ijk}-u \mid u;\mathbf{z}_{ijk})\,du \\
&\cdot \prod_{j=1}^{r}\prod_{i=1}^{n_{2jj}} \int_{a_{j-1}}^{t_{ijj}} \bar{F}(u;\mathbf{z}_{ijj})\,g(u;\mathbf{z}_{ijj})\,h(t_{ijj}-u \mid u;\mathbf{z}_{ijj})\,du \\
&\cdot \prod_{j=1}^{r}\prod_{i=1}^{n_{3j}} \int_{a_{j-1}}^{a_j} \bar{F}(u;\mathbf{z}_{ij})\,g(u;\mathbf{z}_{ij})\,\bar{H}(c_{ij}-u \mid u;\mathbf{z}_{ij})\,du \\
&\cdot \prod_{i=1}^{n_4} \bar{F}(c_i^0;\mathbf{z}_i)\,\bar{G}(c_i^0;\mathbf{z}_i).
\end{aligned}
\tag{4.7}
$$
For most distributional assumptions about $X_1$, $X_2$, and $X_3|X_2$, maximization of (4.6) or (4.7) will require numerical techniques.
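To illustrate what such numerical work involves, the sketch below evaluates contributions of the form (4.2) - (4.5) by composite Simpson's rule. It is an illustration under stated assumptions rather than the procedure used in this study: the exponential forms with a log-linear dependence term $\lambda_3 e^{\alpha u}$, the parameter values, the observation times, and all function names are hypothetical, and covariates are omitted.

```python
import math


def simpson(func, lo, hi, n=200):
    """Composite Simpson's rule on [lo, hi] with n subintervals (n made even)."""
    n += n % 2
    step = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * step)
    return total * step / 3.0


def type2_contribution(t, a_lo, a_hi, lam1, lam2, lam3, alpha):
    """Form (4.2); reduces to (4.3) when death at t falls inside (a_lo, a_hi]."""
    def integrand(u):
        rate3 = lam3 * math.exp(alpha * u)          # assumed hazard of X3 given X2 = u
        return (math.exp(-lam1 * u)                 # survival function of X1
                * lam2 * math.exp(-lam2 * u)        # density of X2
                * rate3 * math.exp(-rate3 * (t - u)))  # density of X3 given X2 = u
    return simpson(integrand, a_lo, min(t, a_hi))


def type3_contribution(c, a_lo, a_hi, lam1, lam2, lam3, alpha):
    """Form (4.4): event in (a_lo, a_hi], censored at c >= a_hi."""
    def integrand(u):
        rate3 = lam3 * math.exp(alpha * u)
        return (math.exp(-lam1 * u)
                * lam2 * math.exp(-lam2 * u)
                * math.exp(-rate3 * (c - u)))       # survival function of X3 given X2 = u
    return simpson(integrand, a_lo, a_hi)


def type4_contribution(c, lam1, lam2):
    """Form (4.5): censored at c before either event."""
    return math.exp(-lam1 * c) * math.exp(-lam2 * c)


# Hypothetical parameter values (rates per month) and observations.
params = (0.003, 0.002, 0.004, 0.01)
log_lik = (
    math.log(type2_contribution(30.0, 6.0, 12.0, *params))
    + math.log(type2_contribution(10.0, 6.0, 12.0, *params))   # same-interval case
    + math.log(type3_contribution(24.0, 6.0, 12.0, *params))
    + math.log(type4_contribution(36.0, params[0], params[1]))
)
```

A log likelihood assembled this way could then be handed to any general-purpose optimizer; the choice of quadrature rule and the number of subintervals are tuning decisions, not part of the model.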
4.3.2 Follow-up of Both Intervening Event and Death at Fixed Intervals
By modifying (2.1) - (2.8), the contributions to the likelihood for interval follow-up of both the intervening event and death are as follows:
1. From (2.3), the contribution to the likelihood for an observation that was observed to fail in interval $A_j$ in the absence of the intervening event is
$$\int_{a_{j-1}}^{a_j} f(u)\,\bar{G}(u)\,du. \tag{4.8}$$
2. From (2.6), the contribution to the likelihood for an observation that experienced the intervening event in interval $A_j$ and died subsequently in interval $A_k$ is
$$\int_{a_{k-1}}^{a_k}\int_{a_{j-1}}^{a_j} \bar{F}(x_2)\,g(x_2)\,h(t-x_2 \mid x_2)\,dx_2\,dt. \tag{4.9}$$
If the intervening event is observed in interval $A_j$ followed by death in the same interval, the contribution to the likelihood is
$$\int_{a_{j-1}}^{a_j}\int_{a_{j-1}}^{t} \bar{F}(x_2)\,g(x_2)\,h(t-x_2 \mid x_2)\,dx_2\,dt. \tag{4.10}$$
3. The contribution is identical to (4.4).
4. The contribution is identical to (4.5).
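Then, applying (4.4), (4.5) and (4.8) - (4.10), the likelihood function of a sample of $n$ observations is
$$
\begin{aligned}
L ={}& \prod_{j=1}^{r}\left[\int_{a_{j-1}}^{a_j} f(u)\,\bar{G}(u)\,du\right]^{n_{1j}} \\
&\cdot \prod_{j=1}^{r}\prod_{k=j+1}^{r}\left[\int_{a_{k-1}}^{a_k}\int_{a_{j-1}}^{a_j} \bar{F}(u)\,g(u)\,h(v-u \mid u)\,du\,dv\right]^{n_{2jk}} \\
&\cdot \prod_{j=1}^{r}\left[\int_{a_{j-1}}^{a_j}\int_{a_{j-1}}^{v} \bar{F}(u)\,g(u)\,h(v-u \mid u)\,du\,dv\right]^{n_{2jj}} \\
&\cdot \prod_{j=1}^{r}\prod_{i=1}^{n_{3j}} \int_{a_{j-1}}^{a_j} \bar{F}(u)\,g(u)\,\bar{H}(c_{ij}-u \mid u)\,du \\
&\cdot \prod_{i=1}^{n_4} \bar{F}(c_i^0)\,\bar{G}(c_i^0)
\end{aligned}
\tag{4.11}
$$
where the notation is the same as that used by (4.6).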
The contribution of a covariate vector $\mathbf{z}$ can be included in the likelihood for each observation in the same way as described in Section 4.3.1 and displayed in likelihood (4.7).
4.4 Application of the Model to a Sample of Coronary Artery Disease
Patients Where the Intervening Event is Observed in an Interval
This section applies the form of the model described in 4.3.1 where the time of the intervening event is recorded in an interval. The data are a sample of medically treated coronary artery disease (CAD) patients, with the first occurrence of a non-fatal myocardial infarction after a diagnosis of CAD at a cardiology clinic used as the intervening event.
4.4.1 Background and Description of the Data
Since 1969, a group of investigators at the Duke University Medical Center has been involved in an ongoing study of consecutively evaluated coronary artery disease patients. The Myocardial Infarction Research Unit at Duke has developed a systematic approach to recording and storing their clinical experience with patients with ischemic heart disease. The primary use of the accumulated clinical experience is patient management. The data bank has been described by the Duke group as a "computerized textbook of chronic disease." A physician can locate a subset of patients in the data bank with selected characteristics similar to his current patient and have available follow-up information on those patients under different therapies.
Although there are many causes of disease in the coronary arteries, atherosclerosis is by far the dominant morbid condition. Atherosclerosis is a narrowing of blood vessels resulting from the deposition of lipid, protein, and calcium particles which takes the form of plaques of fibrins and fatty material.
Coronary atherosclerosis does not necessarily manifest itself in
clinical disturbances of the heart and it may be that most human hearts
are eventually affected by some degree of atherosclerosis.
Clinical
manifestation of coronary atherosclerosis (i.e., through the observation of symptoms and course of coronary artery disease in a living
patient) has interchangeably been called atherosclerotic heart disease,
coronary artery disease, coronary atherosclerotic heart disease, and
ischemic heart disease (see Friedberg, 1966: Chapter 17). Advanced occlusion of coronary arteries can cause insufficient blood flow to cardiac tissue (myocardial ischemia) with manifestations that range from little or no damage to cardiac tissue to massive damage and sudden death.
Cardiac catheterization techniques represent the most advanced approach to recognizing coronary artery disease and measuring related heart functions. Roughly, cardiac catheterization refers to the passage of a small catheter through a vein, usually in the leg, into the heart. The catheter provides a pathway to the heart through which tests can be performed. This technique enables the cardiologist to assess the pathophysiology of cardiac disorders by determining their location and extent. Many times these disorders have been indicated by non-invasive means.
The treatment of coronary artery disease can be classed as medical or surgical. Medical treatment involves various drug regimes and/or adaptations in life style such as dietary and work habits. Currently, the most popular surgical treatment for occluded arteries is the aorta to coronary artery saphenous vein graft (aortocoronary bypass). The choice of treatment for an individual patient is an ongoing research question.
The data to be used in the present application are composed of
1131 medically treated patients from the Duke data bank who have
"significant" CAD.
Significant CAD is defined as a condition where
one or more of the three primary coronary arteries is at least seventy
percent occluded.
The data were collected from 1969 to 1976 so that
some patients have been followed up to seven years after diagnosis by
cardiac catheterization for CAD.
The intervening event of interest is the first non-fatal
myocardial infarction subsequent to diagnosis of significant CAD.
Follow-up of patients begins after diagnosis and occurs at six months,
twelve months, and annually thereafter.
Referring to Figure 4.1, a summary of the number of patients observed in each of the four possible ways is given as follows:
1. There were 213 patients who died during follow-up without experiencing a non-fatal myocardial infarction.
2. There were 19 patients experiencing a non-fatal myocardial infarction and dying later.
3. There were 57 patients experiencing a non-fatal myocardial infarction followed by censoring.
4. There were 842 patients who were censored before either a non-fatal myocardial infarction or death.
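4.4.2 Estimation and Testing
For estimation purposes, we shall assume that the distributions of the underlying model random variables $X_1$, $X_2$, and $X_3|X_2$ are from the exponential family where
$X_1$ corresponds to the time from diagnosis of significant CAD to death without a non-fatal myocardial infarction;
$X_2$ corresponds to the time from diagnosis of significant CAD to the first non-fatal myocardial infarction; and
$X_3|X_2$ corresponds to the time from the first non-fatal myocardial infarction to death,
with time measured in months. In addition, we shall include baseline information on the contraction of the left ventricle as a covariate. This binary covariate $z$ will be defined by
$$
z = \begin{cases} 0 & \text{if contraction of the left ventricle is normal} \\ 1 & \text{if contraction of the left ventricle is abnormal.} \end{cases}
$$
An abnormal pattern of contraction could involve conditions such as asynergy or aneurysm.
Let the density and survival functions for $X_1$, $X_2$, and $X_3|X_2$ with covariate $z$ be represented by the following:
$$
\begin{aligned}
f(x_1;z) &= \lambda_1 e^{\beta_1 z} e^{-\lambda_1 e^{\beta_1 z} x_1}, & \bar{F}(x_1;z) &= e^{-\lambda_1 e^{\beta_1 z} x_1}, \\
g(x_2;z) &= \lambda_2 e^{\beta_2 z} e^{-\lambda_2 e^{\beta_2 z} x_2}, & \bar{G}(x_2;z) &= e^{-\lambda_2 e^{\beta_2 z} x_2}, \\
h(x_3 \mid x_2;z) &= \lambda_3 e^{\alpha x_2} e^{\beta_3 z} e^{-\lambda_3 e^{\alpha x_2} e^{\beta_3 z} x_3}, & \bar{H}(x_3 \mid x_2;z) &= e^{-\lambda_3 e^{\alpha x_2} e^{\beta_3 z} x_3},
\end{aligned}
$$
where $\beta_1$, $\beta_2$, and $\beta_3$ are the baseline covariate parameters and $\alpha$ is the parameter representing dependence between $X_3$ and $X_2$ (see Section 2.4 for a discussion of $\alpha$).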
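Using (4.7), the likelihood function is
$$
\begin{aligned}
L ={}& \prod_{i=1}^{n_1} \lambda_1 e^{\beta_1 z_i} e^{-\lambda_1 e^{\beta_1 z_i} x_{1i}}\, e^{-\lambda_2 e^{\beta_2 z_i} x_{1i}} \\
&\cdot \prod_{j=1}^{r}\prod_{k=j+1}^{r}\prod_{i=1}^{n_{2jk}} \int_{a_{j-1}}^{a_j} e^{-\lambda_1 e^{\beta_1 z_{ijk}} u}\, \lambda_2 e^{\beta_2 z_{ijk}} e^{-\lambda_2 e^{\beta_2 z_{ijk}} u}\, \lambda_3 e^{\alpha u} e^{\beta_3 z_{ijk}} e^{-\lambda_3 e^{\alpha u} e^{\beta_3 z_{ijk}} (t_{ijk}-u)}\,du \\
&\cdot \prod_{j=1}^{r}\prod_{i=1}^{n_{2jj}} \int_{a_{j-1}}^{t_{ijj}} e^{-\lambda_1 e^{\beta_1 z_{ijj}} u}\, \lambda_2 e^{\beta_2 z_{ijj}} e^{-\lambda_2 e^{\beta_2 z_{ijj}} u}\, \lambda_3 e^{\alpha u} e^{\beta_3 z_{ijj}} e^{-\lambda_3 e^{\alpha u} e^{\beta_3 z_{ijj}} (t_{ijj}-u)}\,du \\
&\cdot \prod_{j=1}^{r}\prod_{i=1}^{n_{3j}} \int_{a_{j-1}}^{a_j} e^{-\lambda_1 e^{\beta_1 z_{ij}} u}\, \lambda_2 e^{\beta_2 z_{ij}} e^{-\lambda_2 e^{\beta_2 z_{ij}} u}\, e^{-\lambda_3 e^{\alpha u} e^{\beta_3 z_{ij}} (c_{ij}-u)}\,du \\
&\cdot \prod_{i=1}^{n_4} e^{-\lambda_1 e^{\beta_1 z_i} c_i^0}\, e^{-\lambda_2 e^{\beta_2 z_i} c_i^0}.
\end{aligned}
\tag{4.12}
$$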
The integrals in (4.12) were evaluated by the numerical techniques discussed in Section 3.4 and the likelihood function was maximized by using the iterative procedures of Kaplan and Elston (1972). Maximum likelihood estimates and estimated standard errors for the parameters in (4.12) are given in Table 4.1.
TABLE 4.1
Parameter Estimates with Estimated Standard Errors for the Likelihood (4.12)

Parameter      Maximum Likelihood Estimate      Estimate of Standard Error
$\lambda_1$     0.00268                          0.00043
$\beta_1$       1.36772                          0.17717
$\lambda_2$     0.00192                          0.00036
$\beta_2$       0.41121                          0.23780
$\lambda_3$     0.00411                          0.00276
$\beta_3$       1.20477                          0.63314
$\alpha$       -0.00012                          0.01830
By comparing estimates and estimated standard errors, the last line of Table 4.1 appears to indicate that $\alpha$ does not contribute significantly to the maximized likelihood so that $X_2$ and $X_3$ can be treated as being independent. Further, the estimates and estimated standard errors for $\beta_1$, $\beta_2$, and $\beta_3$ imply that the covariate representing the condition of left ventricular contraction makes an important contribution to the maximized likelihood.
To test these conjectures formally, we shall use likelihood ratio tests as indicated by the second column in Table 4.2. When $\alpha$ is constrained to zero, the maximized log likelihood decreases in value from that of the unconstrained maximized log likelihood by less than 0.001, indicating that the hypothesis of independence between $X_2$ and $X_3$ cannot be rejected. The last row of Table 4.2 indicates that the hypothesis $\beta_1=\beta_2=\beta_3=0$ can be rejected since the constrained maximized log likelihood differs from the unconstrained maximized log likelihood by more than 50.0 and there are three degrees of freedom for testing.
TABLE 4.2
Maximized Log Likelihoods for Selected Constraints to Likelihood Function (4.12)

Number of Parameters      Constraints                      Value of Log Likelihood
Seven                     none                             -1709.7802
Six                       $\alpha=0$                       -1709.7804
Four                      $\beta_1=\beta_2=\beta_3=0$      -1751.7215
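The likelihood ratio comparisons summarized in Table 4.2 can be reproduced with a few lines of arithmetic. The sketch below takes the log likelihood values exactly as printed in the table, forms twice the drop in log likelihood, and refers it to a chi-square distribution. The function `chi2_sf` is an illustrative helper written for this example, using the standard closed forms of the chi-square upper tail for one, two, and three degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate, closed forms for df = 1, 2, 3."""
    y = x / 2.0
    if df == 1:
        return math.erfc(math.sqrt(y))
    if df == 2:
        return math.exp(-y)
    if df == 3:
        return math.erfc(math.sqrt(y)) + 2.0 * math.sqrt(y / math.pi) * math.exp(-y)
    raise ValueError("this sketch only implements df = 1, 2, 3")


# Maximized log likelihoods as printed in Table 4.2.
loglik_full = -1709.7802        # seven parameters, no constraints
loglik_alpha0 = -1709.7804      # six parameters, alpha = 0
loglik_beta0 = -1751.7215       # four parameters, beta_1 = beta_2 = beta_3 = 0

lr_alpha = 2.0 * (loglik_full - loglik_alpha0)   # one constraint
lr_beta = 2.0 * (loglik_full - loglik_beta0)     # three constraints

p_alpha = chi2_sf(lr_alpha, 1)
p_beta = chi2_sf(lr_beta, 3)

print(f"alpha = 0:  LR = {lr_alpha:.4f}, p = {p_alpha:.3f}")
print(f"beta  = 0:  LR = {lr_beta:.4f}, p = {p_beta:.3g}")
```

On the printed values, the statistic for $\alpha=0$ is far below the 5 percent critical value for one degree of freedom (about 3.84), and the statistic for $\beta_1=\beta_2=\beta_3=0$ is far above the critical value for three degrees of freedom (about 7.81), which agrees with the conclusions drawn in the text.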
As a consequence of the tests in Table 4.2, the model is reduced to a six parameter model by dropping the dependence parameter $\alpha$. When $\alpha$ is set to zero in likelihood (4.12), the respective integrals in the likelihood can be evaluated as follows:
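$$
\int_{a_{j-1}}^{a_j} \bar{F}(u;z_{ijk})\,g(u;z_{ijk})\,h(t_{ijk}-u \mid u;z_{ijk})\,du
= \frac{\lambda_2\lambda_3\, e^{z_{ijk}(\beta_2+\beta_3)}\, e^{-\lambda_3 e^{\beta_3 z_{ijk}} t_{ijk}} \left[ e^{-K_{ijk} a_{j-1}} - e^{-K_{ijk} a_j} \right]}{K_{ijk}}
\tag{4.13}
$$
where $K_{ijk} = \lambda_1 e^{\beta_1 z_{ijk}} + \lambda_2 e^{\beta_2 z_{ijk}} - \lambda_3 e^{\beta_3 z_{ijk}}$, $j=1,2,\ldots,r$, $k=j,j+1,\ldots,r$;
$$
\int_{a_{j-1}}^{t_{ijj}} \bar{F}(u;z_{ijj})\,g(u;z_{ijj})\,h(t_{ijj}-u \mid u;z_{ijj})\,du
= \frac{\lambda_2\lambda_3\, e^{z_{ijj}(\beta_2+\beta_3)}\, e^{-\lambda_3 e^{\beta_3 z_{ijj}} t_{ijj}} \left[ e^{-K_{ijj} a_{j-1}} - e^{-K_{ijj} t_{ijj}} \right]}{K_{ijj}}
\tag{4.14}
$$
and
$$
\int_{a_{j-1}}^{a_j} \bar{F}(u;z_{ij})\,g(u;z_{ij})\,\bar{H}(c_{ij}-u \mid u;z_{ij})\,du
= \frac{\lambda_2\, e^{\beta_2 z_{ij}}\, e^{-\lambda_3 e^{\beta_3 z_{ij}} c_{ij}} \left[ e^{-L_{ij} a_{j-1}} - e^{-L_{ij} a_j} \right]}{L_{ij}}
\tag{4.15}
$$
where $L_{ij} = \lambda_1 e^{\beta_1 z_{ij}} + \lambda_2 e^{\beta_2 z_{ij}} - \lambda_3 e^{\beta_3 z_{ij}}$.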
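The step from the integrand of (4.12) with $\alpha=0$ to the closed form for a Type 2 observation can be sketched as follows. The subscripts $ijk$ on $z$, $t$, and $K$ are dropped for readability, a shorthand used only in this sketch, and the exponential forms are those assumed in Section 4.4.2.

```latex
\begin{align*}
&\int_{a_{j-1}}^{a_j}
   e^{-\lambda_1 e^{\beta_1 z} u}\,
   \lambda_2 e^{\beta_2 z} e^{-\lambda_2 e^{\beta_2 z} u}\,
   \lambda_3 e^{\beta_3 z} e^{-\lambda_3 e^{\beta_3 z} (t-u)}\, du \\
&\qquad= \lambda_2 \lambda_3\, e^{z(\beta_2+\beta_3)}\, e^{-\lambda_3 e^{\beta_3 z} t}
   \int_{a_{j-1}}^{a_j} e^{-K u}\, du,
   \qquad K = \lambda_1 e^{\beta_1 z} + \lambda_2 e^{\beta_2 z} - \lambda_3 e^{\beta_3 z}, \\
&\qquad= \lambda_2 \lambda_3\, e^{z(\beta_2+\beta_3)}\, e^{-\lambda_3 e^{\beta_3 z} t}\,
   \frac{e^{-K a_{j-1}} - e^{-K a_j}}{K}.
\end{align*}
```

The last line requires $K \neq 0$; if $K = 0$ the remaining integral is simply $a_j - a_{j-1}$. Replacing the upper limit $a_j$ by the death time gives the same-interval case, and dropping the factor $\lambda_3 e^{\beta_3 z}$ while replacing $t$ by the censoring time gives the Type 3 case.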
The parameter estimates and estimated standard errors for likelihood (4.12) when $\alpha=0$ are listed in Table 4.3.

TABLE 4.3
Parameter Estimates with Estimated Standard Errors for Likelihood (4.12) with the Constraint $\alpha=0$

Parameter      Maximum Likelihood Estimate      Estimate of Standard Error
$\lambda_1$     0.00268                          0.00043
$\beta_1$       1.36772                          0.17717
$\lambda_2$     0.00192                          0.00036
$\beta_2$       0.41121                          0.23780
$\lambda_3$     0.00410                          0.00237
$\beta_3$       1.20527                          0.62968

The parameter estimates and associated estimated standard errors in Table 4.3 for $\lambda_1$, $\beta_1$, $\lambda_3$, and $\beta_3$ appear to indicate that the hazard for the distribution of $X_1$ may not differ significantly from that of $X_3$. A likelihood ratio test of no difference in hazards between $X_1$ and $X_3$ can be performed by constraining $\lambda_1=\lambda_3=\lambda$ and $\beta_1=\beta_3=\beta$ in the six parameter likelihood. The resulting change in the maximized log likelihood is -0.7021. The corresponding likelihood ratio test statistic provided by (2.21) has a value of 1.4042. Since it is smaller than the 95th percentile of the two degree of freedom chi-square statistic, the hypothesis of no difference in hazards cannot be rejected.
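For two degrees of freedom the chi-square upper tail has the closed form $e^{-x/2}$, so this comparison can be checked by hand. The short sketch below, with variable names chosen only for illustration, computes the statistic from the reported drop of 0.7021 in log likelihood, its p-value, and the 95th percentile used as the critical value.

```python
import math

drop_in_loglik = 0.7021                      # magnitude of the change reported in the text
lr_statistic = 2.0 * drop_in_loglik          # likelihood ratio statistic, 2 df
p_value = math.exp(-lr_statistic / 2.0)      # chi-square(2) upper tail is exp(-x/2)
critical_value = -2.0 * math.log(0.05)       # 95th percentile of chi-square(2)
reject = lr_statistic > critical_value

print(f"LR = {lr_statistic:.4f}, p = {p_value:.4f}, critical value = {critical_value:.4f}")
```

The p-value is close to 0.5, so the data give no indication that the hazards for $X_1$ and $X_3$ differ.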
In addition, the maximum likelihood estimate for $\lambda$ is $\hat{\lambda} = 0.00274$ and for $\beta$ is $\hat{\beta} = 1.36234$, with respective standard errors of 0.00042 and 0.17051.
As a consequence of the lack of a significant difference in the hazards for $X_1$ and $X_3$, the unconditional survival function provided by (3.5) is simply
$$P(T > t; z) = \exp\{-\lambda e^{\beta z} t\}. \tag{4.16}$$
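A short derivation, offered as a sketch, shows why equal hazards for $X_1$ and $X_3$ reduce the unconditional survival function to a single exponential. It assumes $\alpha=0$ and the exponential forms of Section 4.4.2, and it assumes that (3.5) has the two-path structure written in the first line. The symbols $T$ for the failure time, $\theta = \lambda e^{\beta z}$ for the common hazard, and $\theta_2 = \lambda_2 e^{\beta_2 z}$ are shorthand introduced here.

```latex
\begin{align*}
P(T > t; z)
  &= \bar{F}(t;z)\,\bar{G}(t;z)
     + \int_0^t \bar{F}(u;z)\, g(u;z)\, \bar{H}(t-u;z)\, du \\
  &= e^{-(\theta+\theta_2) t}
     + \int_0^t e^{-\theta u}\, \theta_2 e^{-\theta_2 u}\, e^{-\theta (t-u)}\, du \\
  &= e^{-(\theta+\theta_2) t}
     + e^{-\theta t} \int_0^t \theta_2 e^{-\theta_2 u}\, du \\
  &= e^{-(\theta+\theta_2) t} + e^{-\theta t}\left(1 - e^{-\theta_2 t}\right)
   = e^{-\theta t}.
\end{align*}
```

The parameters of $X_2$ cancel, which reflects the memoryless property of the exponential distribution: when the hazard of death is the same before and after the intervening event, the timing of that event carries no information about survival.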
Table 4.4 displays the results of applying the goodness of fit test (2.24) with the unconditional survival function (4.16), where $\lambda$ and $\beta$ are estimated respectively by $\hat{\lambda} = 0.00274$ and $\hat{\beta} = 1.36234$. By using an overall level of significance of no greater than 0.05 and evaluating each one degree of freedom chi-square statistic at a level of significance of 0.05/7 = 0.00714 (this corresponds to a one degree of freedom chi-square value of 7.2374), Table 4.4 indicates that model (4.16) provides a good fit for the observed data.
TABLE 4.4
Results of Goodness of Fit Test (2.24) Where the Survival Function is Provided by (4.16) with Parameter Estimates $\hat{\lambda} = 0.00274$, $\hat{\beta} = 1.36234$

Time Interval      No. not observed to      Expected No.       Difference                 $X^2$
(years)            fail in Interval         of Survivors       (Observed - Expected)
0-1                1024                     1041.71            -17.71                     3.91
0-2                 982                      983.95             -1.95                     0.03
0-3                 949                      943.94              5.05                     0.18
0-4                 924                      917.30              6.70                     0.30
0-5                 915                      902.23             12.77                     1.05
0-6                 907                      894.96             12.04                     0.92
0-7                 900                      893.10              6.90                     0.30
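The Bonferroni-style threshold used with Table 4.4 can be recomputed directly, since a one degree of freedom chi-square variate is the square of a standard normal variate. The sketch below uses only the Python standard library; the lists simply transcribe the columns of Table 4.4, and the variable names are illustrative. The chi-square values themselves are taken from the table rather than recomputed, because the form of test (2.24) is not restated in this section.

```python
from statistics import NormalDist

overall_level = 0.05
n_tests = 7
per_test_level = overall_level / n_tests               # about 0.00714

# Chi-square(1) critical value via the standard normal quantile.
z = NormalDist().inv_cdf(1.0 - per_test_level / 2.0)
critical_value = z * z                                 # close to the 7.2374 quoted in the text

# Columns of Table 4.4, intervals 0-1 through 0-7 years.
observed = [1024, 982, 949, 924, 915, 907, 900]
expected = [1041.71, 983.95, 943.94, 917.30, 902.23, 894.96, 893.10]
chi_square = [3.91, 0.03, 0.18, 0.30, 1.05, 0.92, 0.30]

differences = [round(o - e, 2) for o, e in zip(observed, expected)]
any_rejected = any(x > critical_value for x in chi_square)

print(f"per-test level = {per_test_level:.5f}, critical value = {critical_value:.4f}")
print("differences:", differences)
print("any interval rejected:", any_rejected)
```

None of the seven statistics approaches the threshold; the largest, 3.91 for the first year, is roughly half the critical value.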
4.4.3 Discussion of Results
The analysis revealed that concomitant information from interval follow-up of the first non-fatal myocardial infarction time did not contribute significantly to the estimation of the survival function. As a result, an exponential model with a single baseline covariate representing the condition of left ventricular contraction was used to estimate survival and provided a good fit of the observed data. A better fit of the data might be furnished by a model, such as a Weibull model, that would allow the hazard to vary with time. However, the original exponential model represented by the likelihood function (4.12) seems adequate to ascertain that the myocardial infarction event did not provide useful explanatory information with respect to the survival of medically treated patients.
There are several possible explanations for the non-significance of the myocardial infarction covariate. Of the 1131 total observations, 76 experienced a non-fatal myocardial infarction and 19 of those died in the period covered by the analysis. A longer follow-up with more occurrences of the intervening event might reveal different information. Also, the occurrence times of non-fatal myocardial infarctions were recorded in intervals. Exact times would provide more accurate information and possibly new insight, although it does not appear that the results of the analysis would be altered for the present set of data.
CHAPTER V
SUMMARY AND SUGGESTIONS FOR FUTURE RESEARCH
5.1 Summary
The research presented in the preceding chapters was motivated by two follow-up studies--one involving industrial workers and the other, patients with coronary artery disease. The methods given by Lagakos (1976) for incorporating information from a time-dependent covariable into the analysis of failure time data were generalized and the resulting model applied to data from these two studies. In the case of the industrial sample, we were interested in looking at the relationship between disability retirement and death time; for the medical data, the relationship between the first non-fatal myocardial infarction after diagnosis of coronary artery disease and death time in medically treated patients was of interest.
Chapter I introduced the reader to the field of survival analysis and presented a review of some of the pertinent literature. The literature review emphasized methods of analysis that account for concomitant information associated with individual failure times. Among the methods reviewed was that offered by Lagakos (1976). The Lagakos model utilized three random variables to describe failure time data that included information on a time-dependent event other than death. In the model, $X_1$ corresponded to death time in the absence of an occurrence of the other event, $X_2$ represented the occurrence time of the other event, and, given an occurrence of the other event, $X_3$ corresponded to death time measured from the occurrence. Lagakos assumed that the three random variables were mutually independent and exponentially distributed.
In Chapter II, a generalization of the Lagakos model was introduced and the time-dependent event other than survival was labeled an intervening event. The generalization relaxed the covariance assumptions of Lagakos by not requiring that survival subsequent to the intervening event be independent of the time to the intervening event (i.e., $X_3$ and $X_2$ may be dependent). A test of the independence assumption was provided and properties of the model were developed in terms of density and survival functions, without need for reference to specific distributions. Since failure times without regard to whether or not the intervening event occurs are potentially completely observable, the unconditional survival function was utilized in a goodness of fit test of the model.
The generalized model was then applied in Chapter III to failure time data from a sample of rubber workers where disability retirement was the intervening event. The age of employees on entry into the study was used as a non-time-dependent covariate. The two major results of the analysis were the following:
1. An employee's age was not an important explanatory variable for death time after he had experienced a disability retirement;
2. The time to death after a disability retirement was related to date (year) of the disability retirement.
Possible implications of these results were also discussed. For this application, it was assumed that the underlying model random variables possessed distributions from the two-parameter Weibull family. In the last section of Chapter III, some derivations specific to the Weibull assumptions were offered.
In Chapter IV, the methods of Chapter II were adapted for data that are not collected continuously but only at specified time intervals. Two possible collection schemes that were considered are as follows:
1. The time of the intervening event is recorded only in a specified interval and death times are recorded continuously;
2. Both the time of the intervening event and the time of death are recorded in specified intervals.
An analysis of clinical data collected by the first scheme was used to illustrate application of the model. These data were a sample of medically treated coronary artery disease patients where the first non-fatal myocardial infarction after diagnosis of the disease was the intervening event. An assessment of the left ventricular function at the time of the diagnosis was also utilized as an explanatory variable. The primary result of the analysis was that information provided by the time of myocardial infarction did not contribute significantly to the estimation of failure times.
5.2 Suggestions for Future Research
The model given by Lagakos assumed that $X_1$, $X_2$, and $X_3$ were mutually independent. In the present paper, there is no requirement that $X_2$ and $X_3$ be independent. Models involving other assumptions about the covariance structure of the underlying random variables may also be of interest.
The most general assumption about the covariance structure of $X_1$, $X_2$, and $X_3$ would relax the requirement of pairwise independence for all three pairs of random variables. For this case, let $f_{123}(x_1,x_2,x_3)$ represent the joint density function of $X_1$, $X_2$, and $X_3$. It is convenient to use this joint notation, as it is in the case of the classical parametric competing risks problem, even though we can only observe either death before the intervening event or death after the intervening event for a given individual. Following the methods and notation given in Section 2.2, especially relationships (2.2) - (2.8), the contributions to the likelihood function for the four possible types of observations shown in Figure 2.2 would be as follows:
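Type 1: $$\int_{x_1}^{\infty}\int_{0}^{\infty} f_{123}(x_1,x_2,x_3)\,dx_3\,dx_2 \tag{5.1}$$
Type 2: $$\int_{x_2}^{\infty} f_{123}(x_1,x_2,x_3)\,dx_1 \tag{5.2}$$
Type 3: $$\int_{c-x_2}^{\infty}\int_{x_2}^{\infty} f_{123}(x_1,x_2,x_3)\,dx_1\,dx_3 \tag{5.3}$$
Type 4: $$\int_{c}^{\infty}\int_{c}^{\infty}\int_{0}^{\infty} f_{123}(x_1,x_2,x_3)\,dx_3\,dx_2\,dx_1 \tag{5.4}$$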
where we recall that $c$ represents censoring time. The contributions (5.1) - (5.4) could be assembled in the same manner described in Section 2.2.1 to form the likelihood function. While the model is appealing because it lacks independence assumptions, it requires specification of a trivariate failure time distribution before further model properties can be determined. Methods for including covariates that are not time-dependent will also require investigation.
Assessing the contribution of the intervening event to a survival analysis presents an interesting problem. In the case of the application of the model in Chapter IV, we were able to determine by likelihood ratio testing that the information provided by the intervening event did not contribute significantly to our estimation efforts. The ability to make this determination through likelihood ratio testing was directly related to the constant hazards assumption used for the application. In those cases in which constant hazards assumptions are not deemed appropriate, the application of likelihood ratio tests or other types of tests for assessing the contribution of the intervening event would require further study.
Extension of the model to accommodate more than one intervening event would provide additional research problems. One approach would specify random variables for the occurrence times of each intervening event and for the death times subsequent to each intervening event. We might also wish to account for the order of occurrence of the multiple intervening events.
Finally, there are additional research problems associated with the follow-up studies. For the rubber workers, an analysis of the circumstances leading to disability retirements would be of interest. In the case of the coronary artery disease patients, it would be useful to investigate the relationship between non-fatal post-diagnostic myocardial infarctions and subsequent death times for surgically treated patients.
LIST OF REFERENCES
Andjelkovic, D., Taulbee, J., and Symons, M. (1976). Mortality experience of a cohort of rubber workers, 1964-1973. Journal of Occupational Medicine 18, 387-394.
Bayard, S. et al. (1974). Comparison of treatments for prostatic cancer using an exponential-type life model relating survival to concomitant information. Cancer Chemotherapy Reports Part 1 58, 845-859.
Berkson, J. and Gage, R.P. (1950). Calculation of survival rates for cancer. Proceedings of the Staff Meetings of the Mayo Clinic 25, 270-286.
Breslow, N. (1972). Comment on D.R. Cox (1972) paper. Journal of the Royal Statistical Society B 34, 187-202.
Breslow, N. (1974). Covariance analysis of censored survival data. Biometrics 30, 89-100.
Breslow, N. (1975). Analysis of survival data under the proportional hazards model. International Statistical Review 43, 45-58.
Breslow, N. (1977). Some statistical models useful in the study of occupational mortality. In Whittemore, A., Ed. Environmental Health: Quantitative Methods, 88-102. Philadelphia: Society for Industrial and Applied Mathematics.
Byar, D.P., Huse, R., and Bailar, J.C. III. (1974). An exponential model relating censored survival data and concomitant information for prostate cancer patients. Journal of the National Cancer Institute 52, 321-326.
Chiang, C.L. (1968). Introduction to Stochastic Processes in Biostatistics. John Wiley and Sons, New York.
Cooper, I.A. et al. (1972). Combination chemotherapy (MOPP) in the management of advanced Hodgkin's disease. Medical Journal of Australia 1, 41ff.
Cox, D.R. (1964). Some applications of exponential ordered scores. Journal of the Royal Statistical Society B 26, 103-110.
Cox, D.R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society B 34, 187-202.
Cox, D.R. and Snell, E.J. (1968). A general definition of residuals (with discussion). Journal of the Royal Statistical Society B 30, 248-275.
Crowley, J. and Hu, M. (1977). Covariance analysis of heart transplant survival data. Journal of the American Statistical Association 72, 27-36.
Cutler, S.J. and Ederer, F. (1958). Maximum utilization of the life table method in analyzing survival. Journal of Chronic Diseases 8, 699-713.
Dyer, A.R. (1975). An analysis of the relationship of systolic blood pressure, serum cholesterol, and smoking to 14-year mortality in the Chicago Peoples Gas Company Study 1, Total Mortality in an Exponential-Weibull model. Journal of Chronic Diseases 28, 565-570.
Feigl, P. and Zelen, M. (1965). Estimation of exponential survival probabilities with concomitant information. Biometrics 21, 826-838.
Friedberg, C.K., ed. (1966). Diseases of the Heart. W.B. Saunders, Philadelphia.
Glasser, M. (1967). Exponential survival with covariance. Journal of the American Statistical Association 62, 561-568.
Gross, A.J. and Clark, V.A. (1975). Survival Distributions: Reliability Applications in the Biomedical Sciences. John Wiley and Sons, New York.
Holford, T.R. (1976). Life tables with concomitant information. Biometrics 32, 587-598.
Johnson, N.L. and Kotz, S. (1970a). Continuous Univariate Distributions 1. Houghton Mifflin Co., Boston.
Johnson, N.L. and Kotz, S. (1970b). Continuous Univariate Distributions 2. Houghton Mifflin Co., Boston.
Kalbfleisch, J.D. (1974a). Some efficiency calculations for survival distributions. Biometrika 61, 31-38.
Kalbfleisch, J.D. (1974b). Some extensions and applications of Cox's regression and life model. Paper presented at the Joint Statistical Meeting, Tallahassee, Florida.
Kalbfleisch, J.D. and Prentice, R.L. (1973). Marginal likelihoods based on Cox's regression and life models. Biometrika 60, 267-278.
Kaplan, E.L. and Meier, P. (1958). Non-parametric estimation from incomplete observations. Journal of the American Statistical Association 53, 457-481.
Kaplan, E.B. and Elston, R.C. (1972). A subroutine package for maximum likelihood estimation (MAXLIK). Institute of Statistics Mimeo Series No. 823, Department of Biostatistics, School of Public Health, University of North Carolina, Chapel Hill, N.C.
Krylov, V.J. (1962). Approximate Calculations of Integrals. Macmillan, New York, 100-111 and 337-340.
Lagakos, S.W. (1976). A stochastic model for censored-survival data in the presence of an auxiliary variable. Biometrics 32, 551-559.
Lagakos, S.W. (1977). Using auxiliary variables for improved estimates of survival time. Biometrics 33, 399-404.
Levin, M. et al. (1976). BCNU (NSC-409962) and Procarbazine (NSC-77213) treatment for malignant brain tumors. Cancer Treatment Reports 60, 243-249.
Mann, N.R., Schafer, R.D., and Singpurwalla, N.D. (1974). Methods for Statistical Analysis of Reliability and Life Data. John Wiley and Sons, New York.
Mantel, N. and Myers, M. (1971). Problems of convergence of maximum likelihood iterative procedures in multiparameter situations. Journal of the American Statistical Association 66, 484-491.
McMichael, A.J., Spirtas, R., and Kupper, L.L. (1974). An epidemiologic study of mortality within a cohort of rubber workers, 1964-72. Journal of Occupational Medicine 16, 458-464.
Monson, R.R. and Nakano, K.K. (1976). Mortality among rubber workers I. White male union employees in Akron, Ohio. American Journal of Epidemiology 103, 284-296.
Myers, M., Hankey, B.F., and Mantel, N. (1973). A logistic-exponential model for use with response-time data involving regressor variables. Biometrics 29, 257-269.
Nelson, W.B. and Hahn, G.J. (1972). Linear estimation of a regression relationship from censored data, part 1--simple methods and their applications (with discussion). Technometrics 14, 247-276.
Prentice, R.L. (1973). Exponential survivals with censoring and explanatory variables. Biometrika 60, 279-288.
Prentice, R.L. and Shillington, E.R. (1975). Regression analysis of Weibull data and the analysis of clinical trials. Utilitas Mathematica 8, 257-276.
Ramirez, G. et al. (1975). Combination chemotherapy in breast cancer--randomized study of 4 versus 5 drugs. Oncology 32, 101-108.
Sprott, D. and Kalbfleisch, J.D. (1969). Examples of likelihoods and comparison with point estimates and large sample approximations. Journal of the American Statistical Association 64, 468-484.
Taulbee, J.D. (1977). A general model for the hazard rate with covariables and methods for sample size determination for cohort studies. Institute of Statistics Mimeo Series No. 1150, Department of Biostatistics, School of Public Health, University of North Carolina, Chapel Hill, N.C.
Truett, J., Cornfield, J., and Kannel, W. (1967). A multivariate analysis of the risk of coronary heart disease in Framingham. Journal of Chronic Diseases 20, 511-524.
Turnbull, B.W., Brown, B.W., and Hu, M. (1974). Survivorship analysis of heart transplant data. Journal of the American Statistical Association 69, 74-80.
Walker, S.H. and Duncan, D.B. (1967). Estimation of the probability of an event as a function of several independent variables. Biometrika 54, 167-179.
Zippin, C. and Armitage, P. (1966). Use of concomitant variables and exponential survival parameters. Biometrics 22, 665-672.