Portier, K.M. (1979). "Some Multi-Stage Models of Association Between Two Causes of Death."

SOME MULTI-STAGE MODELS
OF ASSOCIATION BETWEEN TWO CAUSES OF DEATH
by
Kenneth M. Portier
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1193
February 1979
A Dissertation submitted to the faculty of The University of North
Carolina at Chapel Hill in partial fulfillment of the requirements
for the degree of Doctor of Philosophy in the Department of Biostatistics
Chapel Hill
January 1979
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
I. INTRODUCTION AND REVIEW OF LITERATURE
  1.1 Introduction
  1.2 Multiple Causes of Death
    1.2.1 Basic Concepts
    1.2.2 Death Certificate
    1.2.3 Multiple Causes of Death Studies
  1.3 Illness-Death and Competing Risks Models
    1.3.1 Illness-Death Models
    1.3.2 Competing Risks
  1.4 Discussion
II. DISEASE PROCESSES
  2.1 Natural History of Chronic Diseases
  2.2 Disease Association
  2.3 Example: Cerebrovascular Disease and Ischemic Heart Disease
    2.3.1 Cerebrovascular Disease
    2.3.2 Ischemic Heart Disease
    2.3.3 Evidence for Association
III. STATISTICAL MODELS OF DISEASE ASSOCIATION
  3.1 Introduction
  3.2 Waiting Times Random Variables
  3.3 The Onset-Dependence Model
    3.3.1 Component Random Variables
    3.3.2 Crude Probability Functions
  3.4 The Duration-Onset Dependence Model
  3.5 The Independent Action Model
IV. INVESTIGATION OF MODELS USING MULTIPLE CAUSES OF DEATH DATA
  4.1 Introduction
  4.2 Data Adjustments
  4.3 Pseudo-Likelihood
  4.4 Parametric Forms of the Models
    4.4.1 Model 1 - Proportional Hazard Gompertz Independent Action Model
    4.4.2 Model 2 - Proportional Hazard Gompertz Onset Dependence Model
    4.4.3 Model 3 - Gamma Onset Dependence Model
  4.5 Parametric Estimates and Evaluation of Model Fit
    4.5.1 Distribution of Age at Death Conditional on Cause
    4.5.2 Distribution of Cause of Death in a Given Age Group
    4.5.3 Estimated Component and Waiting Times Random Variable Distributions
  4.6 Additional Parametric Forms of the Models
    4.6.1 Model 4 - Nonproportional Hazard Gompertz Independent Action Model
    4.6.2 Model 5 - Nonproportional Hazard Gompertz Onset Dependence Model
    4.6.3 Model 6 - Gompertz-Exponential Onset Dependence Model
  4.7 Parametric Estimates and Evaluation of Fit for the Additional Forms of the Model
    4.7.1 Distribution of Age at Death Conditional on Cause
    4.7.2 Distribution of Causes of Death in a Given Age Group
    4.7.3 Estimated Component and Waiting Time Random Variable Distributions
V. CONCLUDING REMARKS AND SUGGESTIONS FOR FUTURE RESEARCH
BIBLIOGRAPHY
APPENDIX
ACKNOWLEDGEMENTS
I wish to express appreciation to my advisor Dr. Regina C. Elandt-Johnson for her guidance and encouragement during the course of this research. I would also like to thank Dr. Kenneth Manton for his overall cooperation and advice, and for providing the data used in this study. In addition, I extend thanks to the other members of my committee, Dr. Norman L. Johnson, Dr. David G. Hoel, Dr. Dana E.A. Quade and Dr. Fredric K. Pfaender for their helpful comments.
The financial support for this research and my graduate study in the Department of Biostatistics was through Training Grant #5 T01 ES00133 from the National Institute of Environmental Health Sciences. Data were provided through a grant to Dr. Manton from the National Institute on Aging: Grant #9 R01 AG01159-02.
I wish to thank my wife, Mary, for the love and encouragement she gave which enabled me to complete this research. I also appreciate the support given me by my family and friends during my graduate studies.
Finally I also thank Ms. Jackie W. O'Neal for her excellent typing of this manuscript.
LIST OF TABLES
1.1 Reporting Trends of Multiple Causes of Death
4.1 Life Table - United States, White Males, 1969
4.2 Observed and Life Table Deaths by Causes, White Males, 1969
4.3 Parametric Estimates for the Initial Model Forms
4.4 Observed and Predicted Proportions of 'Total Deaths' from Different Causes
4.5 Observed and Predicted Number of Deaths from Different Causes Conditional on Age at Death
4.6 Age Specific and Total Pseudo Chi-Squared Values
4.7 Estimated Theoretical Component Means and Standard Deviations
4.8 Estimated Means and Variances of Waiting Times Random Variables
4.9 Estimated Correlations Between Waiting Times Random Variables
4.10 Parameter Estimates for Additional Model Forms
4.11 Age Specific and Total Pseudo Chi-Squared Values
4.12 Estimated Theoretical Component Means and Standard Deviations for Additional Model Forms
4.13 Estimated Means and Standard Deviations of Waiting Times Random Variables for Additional Model Forms
4.14 Estimated Correlations Between Waiting Times Random Variables for the Additional Model Forms
LIST OF FIGURES
1.1 Cause Section of Death Certificate, North Carolina State Board of Health, Office of Vital Statistics
1.2 Illness-Death Model for Acquiring Two Diseases and Succumbing to Them
2.1 Model of Disease Natural History
3.1 Events and Random Variables Associated With the Model of Disease Natural History
3.2 Two Disease Onset Dependence Model
4.1 Plots of Empirical and Estimated Conditional Densities by Cause Category - Proportional Hazard Gompertz Independent Action Model - Model 1
4.2 Plots of Empirical and Estimated Conditional Densities by Cause Category - Proportional Hazard Gompertz Onset Dependence Model - Model 2
4.3 Plots of Empirical and Estimated Conditional Densities by Cause Category - Gamma Onset Dependence Model - Model 3
4.4 Plots of Empirical and Estimated Conditional Densities by Cause Category - Nonproportional Hazard Gompertz Independent Action Model - Model 4
4.5 Plots of Empirical and Estimated Conditional Densities by Cause Category - Nonproportional Hazard Gompertz Onset Dependence Model - Model 5
4.6 Plots of Empirical and Estimated Conditional Densities by Cause Category - Gompertz-Exponential Onset Dependence Model - Model 6
ABSTRACT
KENNETH PORTIER. Some Multi-Stage Models of Association Between Two Causes of Death (under the direction of Regina C. Elandt-Johnson).
Renewed interest in multiple cause mortality data in recent years has led to consideration of the possible use of such data in the analysis of human mortality and chronic disease associations. To study certain aspects of this problem, a parametric model of mortality which takes into account disease natural history and the informational content of multiple causes of death data is presented. This model considers diseases to progress in stages and is able to incorporate different assumptions about the independence/dependence of causes of death.
The scope of this study is restricted to consideration of a model of mortality for a population assumed susceptible to only two chronic diseases. The basic model components involve concepts of hypothetical age of onset of chronic conditions and age at death due to specific causes. Two models are discussed: one in which the times of onset of the two diseases are assumed independent and one in which the times of onset are assumed to be positively correlated.
Specific parametric forms of distributions were fitted to deaths in the U.S. life table constructed from the 1969 white male death certificates. The main focus of this study was to examine the appropriateness of using the Gompertz distribution in these models. Parameters were estimated using a 'pseudo-likelihood' function. Although we did not obtain a perfect fit, because of the simplifying assumptions used in these models, this study does indicate certain possibilities for further development of models for disease association.
CHAPTER I
INTRODUCTION AND REVIEW OF LITERATURE
1.1 Introduction
One of the basic problems in the study of human mortality, especially mortality due to chronic diseases, is that death is often the result of multiple pathological events and that such events may exhibit various types of interdependency. Since these pathological events may be related to a number of distinct diseases, it has been suggested that these diseases may be associated. The nature of this association may be represented in a number of ways. The difficulty lies in realistically representing the nature of the dependency among diseases as causes of death in such a manner as to be useful in the analysis of available mortality information.
Information on 'multiple causes of death', as available from death certificates, has been suggested for use in the study of disease association. The purpose of this dissertation is to study the question of dependent causes of death. A parametric model of mortality which takes into account the history of the disease and the informational content of multiple causes of death data is developed. This model considers diseases to progress in stages and is able to incorporate different assumptions about the independence/dependence of causes of death. Using this model we examine various aspects of the mortality process which are not apparent from the original data.
The scope of this study is restricted to consideration of a
model of mortality for a population assumed to be susceptible to only
two chronic diseases. Even though a human population of this type does
not exist, the insight gained from the two disease models should suggest ways to consider mortality due to multiple causes.
The fairly
simple biological and mathematical assumptions used for the two disease
models result in complex mathematics.
This suggests that study of the
restricted model is necessary to the formulation of future, more
realistic models.
The properties of the model developed in this study depend upon
the limitations of the data and the assumptions used.
For this reason
we discuss, in the next section, some general properties of the death
certificate and multiple causes of death data.
Illness-death models
and competing risks models are also reviewed to provide background
material on suggested methods for analysis of human mortality and
disease association.
1.2 Multiple Causes of Death
1.2.1 Basic Concepts
When an individual dies, his death is assigned, for statistical purposes, to some 'cause of death.' The cause of death may be a disease, condition, accident or act of violence. For many years now, mortality medical statistics have been based on the underlying cause of death concept. The underlying cause of death is defined as "either the single disease or injury leading directly to death or the circumstance of the accident or violence which produced the fatal injury" (W.H.O. 1949). At present, mortality medical data are classified under the provisions of the Eighth Revision of the International Classification of Diseases as Adapted for Use in the United States, denoted ICDA 8th (NCHS 1968). "This classification system includes a series of rules for defining the causal relationship of the entries made by the medical certifier in order to identify the underlying cause of death and for qualifying that cause in certain situations. The underlying cause of death identified initially may be changed to give preference to certain classification categories or to consolidate two or more medical entries into a single classification category when they meet ICDA criteria" (Templeton and Evans 1970). The National Center for Health Statistics, utilizing the ICDA classification system, assigns an underlying cause to each death.
It has been suggested that assigning a single cause to a death, especially a death which is the result of a chronic illness, may not adequately describe the medical status of the individual at death (Weiner et al. 1955, Krueger 1966). This, coupled with increasing demands for more comprehensive medical data on cause of death, has led researchers to consider assigning 'multiple causes' to each death. By 'multiple causes' we mean the assigning of more than one morbid condition as contributory to the death of the individual. The multiple causes usually include the underlying cause and additional 'associated causes' (Proceedings PHCRS 1962). Information on multiple causes of death is obtained from death certificate data.
1.2.2 Death Certificate
The cause section of the death certificate, as illustrated in Figure 1.1, consists of two parts. In Part I, the certifier is instructed to record all 'immediate' (line a), 'intermediate' (line b) and 'underlying' (line c) conditions relating to death. The underlying cause of death is recorded on the lowest used line of Part I. The sequence of morbid events when viewed in chronological order should appear on the death certificate as
    immediate cause of death (line a)  <- a result of -  intermediate condition (line b)  <- a result of -  underlying condition (line c)
Intermediate or immediate conditions may not be present on the certificate, in which case the one cause reported is the underlying cause.
Part II of the cause section contains 'contributory conditions', defined to be any important disease or condition that was present at the time of death which may have contributed to death but which was not related to the immediate cause of death (Pitts 1976). These conditions may contribute to death by modifying the effect of the underlying condition.
FIGURE 1.1. Cause Section of Death Certificate, North Carolina State Board of Health, Office of Vital Statistics.
Since 1968, the National Center for Health Statistics has used a computer program called ACME (Automated Classification of Medical Entities) to select the underlying cause of death using all information available on the death certificate (Pitts 1976). This process involves entry into the computer system of all diseases, conditions, accidents, and injuries given in the medical certifier's statement. The information enters the system in the form of ICDA codes, in the same order given by the certifier, thus allowing access to detailed medical data for multiple cause of death analysis as well as permitting computer selection of the underlying cause based on the International Rules for determining the underlying cause. Special instructions for applying the ICDA codes to all diseases, conditions and injuries were developed in order to retain, in coded form, as much information as possible from the medical certifier's statement.
The underlying cause of death (UCD) specified by the certifier on
the death certificate in many cases will be different from that selected
by the ACME program.
The ACME program UCD selection rules are designed
to yield useful tabulations of deaths by specified underlying causes.
Thus ACME may be biased toward selecting certain diseases as an underlying cause in order to produce specific mortality tables and hence
underlying causes selected may not correspond to the definition of
underlying cause of death given in section 1.2.1.
As an example, consider the case where the certifier indicates General Arteriosclerosis as the underlying cause with Ischemic Heart Disease as an associated condition. The ACME program will almost always select Ischemic Heart Disease as the underlying cause. This is because Ischemic Heart Disease tabulations are preferred for vital statistics publications and not because we assume General Arteriosclerosis to be a result of the Ischemic Heart Disease.
For this study we will use the medical certifier's statement of cause of death as reported in an attempt to avoid the bias of the ACME program discussed above. It is assumed then that the certifier's ordering of entities as underlying and non-underlying may have a more biological justification than the ordering selected by ACME. This assumption may not always be valid, as has been suggested from studies of multiple causes of death.
Many problems with the information obtained by means of the death certificate have been identified. The number, position, language and reliability of reported entries depend on such factors as age and sex of decedent and place of residence at time of death (Beadenkopf et al. 1963, Janssen 1940, Guralnick 1966). It is reasonable to assume that not all conditions present at death, either known or unknown to the certifier, are reported (Moriyama et al. 1966, James et al. 1955, Polheen and Emerson 1942, 1943). Those entities reported may represent the circumstances of death rather than correspond to the instructions on the death certificate (Guralnick 1966). These difficulties affect the accuracy and validity of multiple causes of death data, which, in turn, will affect any analysis performed on these data.
1.2.3 Multiple Causes of Death Studies
Until recently only four surveys of multiple causes of death had been undertaken at the national level in the United States: 1917 (Bureau of the Census 1920), 1925 (Bureau of the Census 1927), 1940 (Vital Statistics 1943) and 1955 (NCHS 1965). These surveys were based on samples of total deaths from the years surveyed.
In the early part of this century the majority of deaths were due to acute conditions. This has changed with time until today the majority of deaths are the result of some chronic disease. As mortality patterns have changed, the inadequacy of coding only the underlying cause of death has become more evident (Guralnick 1966, Olsen et al. 1962). Because chronic diseases develop over many years and usually include many conditions and symptoms, the certifier of death has more information on the course of disease and records more information on the certificate in order to clearly specify cause of death. This trend toward reporting more information on the cause of death is illustrated in Table 1.1. This information was obtained from the surveys mentioned above and, for 1969 deaths, from Pitts (1976).
TABLE 1.1. Reporting Trends of Multiple Causes of Death.
Year    Percent of Certificates With More Than One Condition Reported
1917    35%
1925    44%
1940    55%
1955    58%
1969    82%
Early advocates of multiple causes of death data (in particular, Dublin and Von Buren 1924, 1925; Janssen 1940) suggested studying the extent of disease association manifested in these data as well as using these data to ascertain future mortality patterns. But the time and manpower needed to tabulate multiple causes of death data by hand prevented these researchers from exploring these concepts very deeply.
Recent studies of multiple causes of death (National Center for Health Statistics 1955 Multiple Cause Study, Guralnick 1966 and Olsen et al. 1962) have also considered the question of disease association. Their approach has been to consider combinations of diseases to determine whether the joint occurrence of these diseases could arise from the random occurrence of the separate diseases. This approach failed to allow for age and competing risks effects. In addition it does not recognize that, as a result of the instructions on death certificates, all diseases mentioned together must be associated, regardless of how often they appear together (Olsen et al. 1962).
It seems clear that the next step in studying disease association concepts within the restrictions of multiple causes of death data would be to include the age and competing risks effects in the analysis. One method of doing this is to develop a model which explains the age distribution of deaths due to multiple causes and which includes competing risks concepts. Before proceeding with this step, we will consider illness-death and competing risks models in order to examine how age, competition and disease association are incorporated into existing models of human morbidity and mortality.
1.3 Illness-Death and Competing Risks Models
1.3.1 Illness-Death Models
A simple stochastic model representing a disease process as a sequence of health, illness and death states was discussed by Fix and Neyman in 1951. Mathematical properties of this type of illness-death model are given in Chiang (1968).
Basically an illness-death model allocates the population of interest among illness (or well) and death states. Illness may be repetitive or curable as in acute illnesses, or irreversible as in a chronic illness. Because an individual cannot stay in an illness state (or well state) for an infinite amount of time, such states are called transient in the standard terminology for Markov chains. Death is irreversible and terminal and hence all death states are termed 'absorbing'. When more than one illness state or death state is considered, the problem of competing causes of death arises.
Transitions from one state to another are assumed to be governed by intensities of risks (also called hazard functions or forces of mortality) of illness and death, which are defined as follows:
$$\lambda_{ij}(t)\,\Delta t + o(\Delta t) = \Pr[\text{individual in illness state } i \text{ at time } t \text{ will be in illness state } j \text{ at time } t + \Delta t]$$
$$\mu_{ij}(t)\,\Delta t + o(\Delta t) = \Pr[\text{individual in illness state } i \text{ at time } t \text{ will be in death state } j \text{ at time } t + \Delta t]$$
In general the hazards, $\lambda_{ij}(t)$ and $\mu_{ij}(t)$, are functions of time, $t$, where $t$ may be age of the individual or may represent time under observation. In certain situations the hazards may also depend on the length of time spent in the present state (state $i$). If the hazard functions are assumed constant, then the duration of stay in any illness (transient) state is distributed as a negative exponential. This process is Markovian since we assume that given the current state of an individual, his past history is irrelevant to prediction of his future status.
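As a minimal sketch of the constant-hazard case described above, the following Python fragment computes, for a single transient state, the total exit rate, the mean duration of stay, and the probability that the stay ends in each destination. The function names and the numerical rates are illustrative assumptions and are not taken from this study.

```python
import math

def sojourn_summary(exit_rates):
    """Summarize a transient state whose exit hazards are all constant.

    exit_rates maps a destination label to its constant hazard
    (a lambda_ij for an illness state, a mu_ij for a death state).
    """
    total = sum(exit_rates.values())
    if total <= 0:
        raise ValueError("at least one exit hazard must be positive")
    return {
        "total_rate": total,            # rate of the exponential duration of stay
        "mean_stay": 1.0 / total,       # mean of that exponential distribution
        "exit_prob": {dest: rate / total for dest, rate in exit_rates.items()},
    }

def prob_still_in_state(total_rate, t):
    """Pr[duration of stay > t] when the duration is exponential."""
    return math.exp(-total_rate * t)

# Hypothetical hazards (per year) for leaving the well state.
rates = {"illness_A": 0.02, "illness_B": 0.03, "death": 0.05}
summary = sojourn_summary(rates)
```

With these assumed rates the total exit hazard is 0.10 per year, so the mean stay is 10 years and half of all exits from the state are deaths.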
Using the hazard functions, $\lambda_{ij}(t)$ and $\mu_{ij}(t)$, one can derive the form of the crude transition probabilities, which are defined to be:
$$P^{*}_{ij}(T) = \Pr[\text{individual in illness state } i \text{ at time } t = 0 \text{ will be in illness state } j \text{ at time } t = T]$$
$$P_{ij}(T) = \Pr[\text{individual in illness state } i \text{ at time } t = 0 \text{ will be in death state } j \text{ at time } t = T].$$
The transition probabilities are termed 'crude' to emphasize that the risk of transfer between two states is affected by other competing risks.
Concepts of association between two diseases have been studied by Wijsman (1958) (also Clifford (1977)) within the framework of an illness-death model. He discusses the concepts of incidence and mortality independence by means of the simple illness-death model for acquiring diseases A and B given in Figure 1.2.
FIGURE 1.2. Illness-Death Model for Acquiring Two Diseases and Succumbing to Them.
We define $S_0$ as the well state, and $S_1$ and $S_2$ as the illness states of having only disease A or disease B, respectively. $S_3$ represents the illness state of having both diseases A and B. $D_j$ represents the state of being dead having been in state $S_j$ just prior to death. The assumption that no transition is reversible is indicated by the direction of the arrows in Figure 1.2. Thus once an individual leaves a state, no return to this state is possible. This implies that diseases A and B are considered incurable (as would be expected if A and B were chronic diseases, for example).
Using this simple model, Wijsman defined diseases A and B to be incidence independent if $\lambda_{01}(t) = \lambda_{23}(t)$ and $\lambda_{02}(t) = \lambda_{13}(t)$, i.e., the risk of acquiring either disease is independent of whether or not the other disease is present. Positive incidence association would be assumed if $\lambda_{01}(t) < \lambda_{23}(t)$ and $\lambda_{02}(t) < \lambda_{13}(t)$. Negative incidence association would be defined with these inequalities reversed. When the inequalities were not both in the same direction, the association could not be described as either positive or negative.
The two diseases A and B would be defined to be mortality independent if
$$\mu_{33}(t) = \mu_{11}(t) + \mu_{22}(t) - \mu_{00}(t),$$
expressing the additivity of risks of two independent causes of death. Positive (negative) mortality association would be assumed if
$$\mu_{33}(t) > (<)\ \mu_{11}(t) + \mu_{22}(t) - \mu_{00}(t).$$
Diseases A and B would be considered completely independent (or to progress independently) if they were both incidence and mortality independent. These concepts can also be extended to the situation where hazards are functions of age and duration in a state.
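The incidence and mortality definitions above can be sketched as two small classification functions. This is only an illustration under stated assumptions: the function names, the tolerance argument, and the numerical hazards are hypothetical and are not part of Wijsman's paper or of this study.

```python
import math

def incidence_association(lam01, lam23, lam02, lam13, t, tol=1e-12):
    """Classify incidence association at time t from the illness hazards."""
    d_a = lam23(t) - lam01(t)  # change in risk of acquiring A when B is present
    d_b = lam13(t) - lam02(t)  # change in risk of acquiring B when A is present
    if abs(d_a) <= tol and abs(d_b) <= tol:
        return "independent"
    if d_a > tol and d_b > tol:
        return "positive"
    if d_a < -tol and d_b < -tol:
        return "negative"
    return "undetermined"      # inequalities not both in the same direction

def mortality_association(mu00, mu11, mu22, mu33, t, tol=1e-12):
    """Compare mu33 with the additive value mu11 + mu22 - mu00 at time t."""
    excess = mu33(t) - (mu11(t) + mu22(t) - mu00(t))
    if abs(excess) <= tol:
        return "independent"
    return "positive" if excess > 0 else "negative"

# Hypothetical hazards, chosen only to exercise the definitions.
def lam01(t): return 0.001 * math.exp(0.05 * t)
def lam02(t): return 0.002 * math.exp(0.04 * t)
def lam23(t): return 1.5 * lam01(t)   # A is acquired faster when B is present
def lam13(t): return 1.2 * lam02(t)   # B is acquired faster when A is present

incidence = incidence_association(lam01, lam23, lam02, lam13, t=60.0)
mortality = mortality_association(lambda t: 0.01, lambda t: 0.03,
                                  lambda t: 0.04, lambda t: 0.06, t=60.0)
```

With these assumed hazards the incidence association is classified as positive, while the death hazards satisfy the additivity condition and are classified as mortality independent.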
These concepts of incidence and mortality association are useful
in the study of disease association when the hazard functions can be
estimated.
To do this, information on age-specific prevalence and/or
incidence of the two diseases and their joint occurrence would be
needed.
More complex models, such as the semi-Markov illness-death
models proposed by Weiss and Zelen (1965) and Lagakos (1976), which
allow the hazard rate to be functions of both age and length of stay
in a state, would require even more information for parameter estimation.
This type of information is not readily available for many
disease combinations.
When no information on disease incidence is available other than the cause of death, the hazards for transition between illness states, $\lambda_{ij}(t)$, cannot be estimated directly. Clifford (1977) has shown that even for the serial sacrifice animal experiment, where information on disease incidence is determined by sacrificing some animals before death due to natural causes can occur, the transition rates between illness states are nonidentifiable. Rates (or parameters in general) are defined as nonidentifiable if and only if for every $\lambda_{ij}$ there exists some other $\lambda^{*}_{ij}$ such that
$$P^{*}_{ij}(t \mid \lambda_{ij}) = P^{*}_{ij}(t \mid \lambda^{*}_{ij}),$$
where $\lambda_{ij} \neq \lambda^{*}_{ij}$ and $P^{*}_{ij}$ is the crude transition probability distribution defined earlier in this section.
If, in addition to information on cause of death, certain relationships between the hazard rates are specified, it is conceivable that estimates for the parameters could be obtained. The main hindrance to use of this technique is deciding on reasonable forms for the relationships between the rates. A recent paper by Tolley, Burdick, Manton and Stallard (1978) utilizes this approach to estimate the transition rates for a model of cancer latency. In this study transition probabilities to unobserved states are modeled as explicit functions of time (age). Information from epidemiologic and animal studies as well as theories of carcinogenesis is used to select specific mathematical expressions for the relationship. The purpose of the approach used by Tolley et al. is to synthesize available information on disease development and related mortality into a comprehensive model which may offer a basis for further analysis and insight on these diseases.
1.3.2 Competing Risks
Models which incorporate the concept of competing risks (or causes) of death have been developed from the Markov illness-death model and are used to analyze mortality data where more than one cause of death is of interest. A history of competing risks is given by Seal (1977) and a comprehensive review of competing risks models is given by Gail (1975). Basic results in competing risks are given by Chiang (1961, 1968), David (1974), Berkson and Elveback (1960) and recently David and Moeschberger (1978).
Consider a population in which each individual is exposed to $k$ causes (or risks) of death, denoted $C_1, C_2, \ldots, C_k$. For each death we obtain information on the time and specific (underlying) cause of death. A useful concept when discussing competing risks is that of assigning hypothetical 'times to death', denoted $X_1, X_2, \ldots, X_k$, representing the times at which an individual dies of cause $C_1, C_2, \ldots, C_k$ respectively.
With this framework we define the joint survival function as
$$S_{1,2,\ldots,k}(x_1, x_2, \ldots, x_k) = \Pr\Big[\bigcap_{j=1}^{k} (X_j > x_j)\Big].$$
Similarly we may define the survival function from cause $C_j$ when $C_j$ is the only cause of death acting in the population, called the net survival function for $C_j$, as
$$S_j(x_j) = \Pr[X_j > x_j]$$
with associated force of mortality (hazard)
$$\mu_j(x_j) = -\frac{d \ln S_j(x_j)}{dx_j}.$$
The fundamental assumption in classical competing risks is that each death is due to a single cause. Thus an individual who dies of one cause is unable to die from any other cause at a later time and hence competition among causes is developed. This implies that for each death only
$$W = \min_j \{X_j\} \quad \text{and} \quad \delta = j \ \text{ for which } W = X_j \ (\text{or } X_j < X_i \text{ for all } i \neq j)$$
are observed. Hence the overall survival function (crude survival function) from all causes is:
$$S_W(x) = \Pr[\min_j \{X_j\} > x] = S_{1,2,\ldots,k}(x, x, \ldots, x).$$
For each specific cause $C_j$ we observe the crude probability function
$$P^{*}_j(x) = \Pr[(X_j > x) \cap (\delta = j)]$$
with associated
$$\mu^{*}_j(x) = -\frac{dP^{*}_j(x)}{dx} \Big/ S_W(x),$$
the hazard due to cause $C_j$ acting in the presence of all other causes.
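To make the crude quantities concrete, the following Python sketch evaluates them in the simplest special case: independent times to death, each exponential with a constant hazard. Both the independence and the exponential form are assumptions made only for this illustration (they are not the models fitted in this study), and the function names and rates are hypothetical. In this case $S_W(x) = e^{-\Lambda x}$ and $P^{*}_j(x) = (\lambda_j / \Lambda)\, e^{-\Lambda x}$, where $\Lambda$ is the sum of the hazards.

```python
import math

def crude_quantities(rates, x):
    """Return S_W(x) and the list of P*_j(x) for independent exponential risks."""
    total = sum(rates)
    s_w = math.exp(-total * x)                  # Pr[min_j X_j > x]
    crude = [r / total * s_w for r in rates]    # Pr[X_j > x and delta = j]
    return s_w, crude

def crude_hazard(rates, j, x, h=1e-6):
    """mu*_j(x) = -(dP*_j(x)/dx) / S_W(x), using a central difference."""
    _, up = crude_quantities(rates, x + h)
    _, down = crude_quantities(rates, x - h)
    s_w, _ = crude_quantities(rates, x)
    return -(up[j] - down[j]) / (2.0 * h) / s_w

# Hypothetical constant hazards for three causes of death.
rates = [0.010, 0.025, 0.005]
s_w, crude = crude_quantities(rates, 10.0)
mu_star_0 = crude_hazard(rates, 0, 10.0)
```

Under these assumptions the crude probabilities sum to $S_W(x)$ and the crude hazard for each cause equals its net hazard; neither property can be expected to hold once the times to death are dependent.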
The purpose of a competing risk study is to estimate the net survival functions $S_j(x_j)$ and to predict patterns of mortality to be expected under such hypothetical conditions as the removal of one disease condition as a cause of death or the removal of all causes of death other than this one. In this context, the joint distribution of the hypothetical 'times to die' is of mathematical convenience.
Most competing risks models which have been proposed assume that the causes of death operate independently of each other. This assumption implies that the risk of death from one cause is independent of and unaffected by changes in the risk of death from other causes. This in turn implies that the times to death are mutually independent.
A competing risks model which permits death to be the result of multiple causes was suggested by David (1974) and Lee and Thompson (1974) based on a strategy developed by Marshall and Olkin (1967). Manton, Tolley and Poss (1976) consider the biological implications of this model and the question of cause elimination. In the case where we assume only two causes of death ($k = 2$), say A and B, we may define $2^k - 1 = 3$ patterns of death for analysis: deaths due to disease A alone, B alone and both A and B in combination. This model assumes independent risks of death from each of the three patterns.
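The count of death patterns is the number of nonempty subsets of the set of causes. A short Python sketch, with a hypothetical helper name, enumerates them:

```python
from itertools import combinations

def death_patterns(causes):
    """List every nonempty combination of causes, smallest combinations first."""
    patterns = []
    for size in range(1, len(causes) + 1):
        patterns.extend(combinations(causes, size))
    return patterns

patterns = death_patterns(["A", "B"])
# patterns == [('A',), ('B',), ('A', 'B')]
```

For three causes the same function returns 7 patterns, which shows how quickly the number of independent risk patterns grows with the number of causes.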
The theoretical (or mathematical) importance of the assumption of independent times to death can be appreciated by the fact that if independent times to death are not assumed, then the model of hypothetical survival times is nonidentifiable. That is, the crude probability functions $P^{*}_j(x)$ do not uniquely specify a form for the joint survival function, $S_{1,2,\ldots,k}(\cdot)$, nor for the net survival functions, $S_j(\cdot)$ (Tsiatis 1976). This is similar to the problem encountered in the Markov illness-death model.
For mathematical convenience or when we wish to fit a particular mathematical model, the joint distribution of the times to death is often presented in a parametric form. Most studies with this method have included the assumption of independent times to death. Exceptions to this are the studies of Moeschberger and David (1971), Moeschberger (1974) and Elandt-Johnson (1976). Moeschberger and David present a general likelihood function for observed deaths assuming the (parametric) joint density function of (potential) times to death is given. Moeschberger considers the specific case of a bivariate normal density function for times to death. Elandt-Johnson discusses the failure time distributions when a cause is eliminated and failure times are assumed dependent.
The assumption of independent risks in models applied to human populations has little justification for the majority of causes of death. Nevertheless, it is a necessary first step in the modeling process when no a priori knowledge is available on the dependence of risks. Alternatively, parametric models of the joint distribution of the times of death require this a priori knowledge to specify the form of the distribution and the nature of the dependency. One advantage of the parametric approach is the ability to incorporate different assumptions as to the mode of action and interaction of diseases into the analysis of mortality data. With the parametric model we may also be able to study certain characteristics of the disease process which are not readily apparent from the data.
1.4 Discussion
The main aim of this dissertation is to develop and examine a
model of human mortality which incorporates certain concepts of disease
association.
Developing this model involves accounting for the effects
of disease association and competing risks on the distribution of age
at death.
For various reasons, as discussed in this chapter, illness-death and competing risks models are inadequate for our purposes. Certain concepts illustrated in these models are used to develop a new model of human mortality.
The essential components of this new model
are derived from a discussion of the biological processes associated
with chronic diseases and what information on these processes is available from multiple causes of death data.
In the next chapter we discuss a biological model of disease progression. The particular concepts of disease association incorporated into the mathematical model, as well as the two diseases used to illustrate the model, are also discussed. In Chapter III we present the statistical model and calculate those probabilities necessary for fitting the model to data. In Chapter IV we attempt to apply this model to mortality data on deaths due to two specified diseases.
CHAPTER II
DISEASE PROCESSES
Insight into the nature of chronic disease processes is useful
to a proper understanding of multiple causes of death data.
A review
of the important events in the development of diseases and in particular noting which events are observable, will suggest a model for
analysis of multiple causes of death data.
2.1 Natural History of Chronic Diseases
A biological model of the development of a chronic disease can be
obtained by consideration of the natural history of the disease.
The
natural history of a chronic disease is a description of the natural
course of events and stages in the development of the disease
(Mausner and Bahn 1974, Sartwell and Merrell 1952).
Although each disease has its own natural history, we will assume that chronic diseases have natural histories defined by the four events: Birth, Initiation, Clinical Onset and Death (see Fig. 2.1).
This, of course,
is a simplification of the true biological processes but is a necessary simplification if we are to illustrate the major components of
chronic disease natural history without considering each disease
separately.
Figure 2.1. Model of Disease Natural History
[Timeline: the events Birth, Initiation, Clinical Onset and Death delimit the stages 'Well', Incubation or Latent Period, and Advanced Disease.]
The event Birth is simply the beginning of life. Birth is an observable event.
Initiation of a disease is that point in time at which an individual is first considered as having the disease.
For many diseases,
especially chronic diseases, it is very difficult to define exactly
when initiation of disease has occurred.
The pathological changes
associated with the early stages of disease may not be manifested as
symptoms and it may be difficult if not impossible to detect these
changes using available medical procedures.
Even though we recognize
that at some point in time initiation must occur, for practical purposes it is an unobservable event.
We consider Clinical Onset of a disease to be that point in
time, or more precisely, short interval of time, during which the
disease first manifests itself in the form of certain specified symptoms.
Since the development of a disease is an irregularly evolving
process, some symptoms may appear before others, making the time
or age of clinical onset of a disease very difficult to determine
accurately.
For many diseases there is no specified list of symptoms which define onset of the disease and hence there is much flexibility in the definition of onset. But clinical onset, unlike initiation, is an observable event and is useful in dividing individuals into two categories, those for which the disease does not onset and those for which it does.
The event Death is fairly simple to define but the event Death due to a disease is much more difficult to define. Prior to death an individual usually manifests numerous symptoms and conditions which may be associated with a number of 'distinct' diseases. At death it becomes particularly difficult to assign the underlying cause of death (see section 1.2).
This is related to the difficulty in defining
the onset of a disease.
Thus though death is an observable event,
there are problems in assigning the death to the specific disease
which is the underlying cause.
The period of time from clinical onset to death due to a disease
may be called the stage of advanced disease (see Fig. 2.1).
In this
stage the disease exists as a threat to the life of the individual.
The amount of time spent in this stage may depend on many factors,
including age at onset, efficacy of treatment and presence or
absence of other diseases and risk factors.
The period from initiation to clinical onset is called the latency or incubation period. During this stage the disease is present but has not manifested sufficient symptoms for it to be declared as present in the individual.
The stage in the natural history of a disease from birth to
initiation is difficult to name.
For our purposes we call this the 'well' stage, indicating that the early form of the disease is not present during this time period. Since the age of initiation is not observable we cannot easily separate individuals into the 'well' or incubation stages.
For the diseases in which we are interested, we will assume
that once an individual has reached the stage of clinical onset for
a disease, he is, from that point on, considered to be in the
advanced disease stage for this disease, i.e. clinical onset for a
disease is irreversible.
Death may be due to any disease for which
clinical onset has occurred.
For now, the many factors which may
affect the rate of transition from one stage to the next in the
disease natural history will not be included in the development of
the model.
Of importance to the model will be the fact that clinical onset, when observed, indicates that the individual is in the advanced disease stage.
2.2 Disease Association
The question of associations between diseases has always been of
interest to medical investigators.
As long as the predominant diseases
were caused by single agents (acute diseases caused by germs or
viruses) which could be isolated using appropriate techniques, very
little was done to exploit the potential value to the epidemiologists
of disease associations.
Now our attention is more and more occupied by chronic diseases, caused by multiple factors and developing over long periods of time. Since the presence of one or a group of risk factors may predispose a population to a number of apparently distinct diseases, the question of disease association assumes greater importance (Neil, 1962).
From the epidemiological point of view, the effect of risk factors on the natural history of disease and associations among
diseases is very important.
In all three stages of disease natural
history, risk factors play an important role by modifying the rates
at which crucial events occur and by affecting the degree of interaction between diseases.
The important components for model building purposes are the 'observable' events: clinical onset and death. The corresponding quantities 'age at onset' and 'duration of advanced disease' are important in specifying the nature of disease association to be incorporated into the model.
Based on the discussion of disease
association above and the model of disease natural history discussed
in section 2.1, we define three types of disease association:
Independent action, Onset dependence and Duration-onset dependence.
i) Independent Action:
For two diseases, denoted disease A and disease B, we simply
assume that disease A proceeds independently of disease B.
Thus, for
this case, the age at death from disease A is considered as independent of the age at death from disease B.
Similarly, the age at onset
of disease A would be independent of the age at onset of disease B.
From a biological point of view it is unlikely that two chronic
diseases are truly independent.
Because of the possibility of common
risk factors or precursors and the complexity and balance of the human
body, disturbances due to the presence of one disease affect the whole
system and hence affect any other disease present or developing.
ii) Onset Dependence:
As mentioned at the beginning of this section, one way in
which two diseases could be associated is through the presence of
risk factors or precursors which are common to both diseases.
Since
the presence of common risk factor(s) or precursor(s) could be expected
to increase the risk of onset of both diseases, we would expect the
onset of disease A and the onset of disease B to be positively correlated.
It is this concept of (indirect) association between two
diseases which is incorporated into the mathematical model of the
next chapter.
iii) Duration-Onset Dependence:
This case may be considered as a complication of the onset-dependence case.
Assuming onset-dependence exists, it is further
suggested that the duration of advanced disease (time from clinical
onset to death) may depend upon the age at which clinical onset of
the disease occurred.
This implies that the times from onset to
death for these two diseases are also dependent.
For many disease
combinations this type of dependence may be a more realistic representation of the interaction of disease processes.
It is certain that within the simple framework we have been
using, other concepts of association could be developed.
For purposes of this study we will restrict discussion to the above three definitions of association.
2.3 Example: Cerebrovascular Disease and Ischemic Heart Disease
To illustrate the concepts of disease natural history and association discussed so far, two chronic diseases, Cerebrovascular disease (ICDA 8th 430-438.9) and Ischemic heart disease (ICDA 8th 410-414.9), will be used. Each of these diseases encompasses a large number of conditions and situations (see Appendix A), any one of which could have been selected for this illustration.
The
choice of these grouped disease categories was made because the
data on multiple causes for these grouped disease categories was
readily available.
By using grouped categories we lose the ability
to make specific definitions of the important events in the disease
process but we obtain more information on the distribution of deaths
in all age groups, especially the very young and very old age groups.
In an exploratory model of this type our initial concern is assuring sufficient data with which to estimate parameters in the model.
Once
the properties of this model have been examined it would then be useful to consider more specific disease categories.
2.3.1 Cerebrovascular Disease
Cerebrovascular disease (denoted CVD) is a leading cause of
death in the United States today.
CVD refers to those processes
which alter the supply of blood to the brain resulting in damage
and eventual death.
Mechanisms for the occurrence of CVD include
cerebral thrombosis, cerebral hemorrhage and cerebral embolism
with the major cause being cerebral thrombosis.
Those factors which define the initiation of CVD are not known.
It is suspected that in the incubation stage basic metabolic changes
in the blood and arteries occur which lead to the creation of emboli,
cerebral atherosclerosis and/or weakening of the arteries.
The
disease may be first manifested (clinical onset) as acute hypertension and/or in a cerebrovascular attack.
The time spent in the
advanced disease stage may be short as when the first cerebrovascular
attack is fatal or it may be much longer, depending to a large extent
upon how onset is defined.
Duration of advanced disease would be
expected to depend on many factors, two of which would be the age
at onset of CVD and the presence of other diseases, especially heart
diseases.
Studies of the quality of death certificate data for deaths
classified to Cerebrovascular disease under the Eighth Revision of
the International Classification of Diseases, Adapted For Use in the
United States (denoted ICDA 8th) are not available.
A study of a
comparable Seventh Revision ICDA Classification, Vascular Lesions
Affecting Central Nervous Systems (ICDA 7th 330-334.9, comparability
ratio .9905, NCHS 1975), was done by Moriyama, Dawber and Kannel
(1966).
This study estimated that in at least 54% of deaths classified to ICDA 7th 330-334.9 the clinical statement of cause of death was found to be reasonable or better.
This study was based on a
sample of death certificates and the results of a questionnaire sent
to the certifiers of these deaths.
The conclusions of this study,
though tentative as the result of a fairly high non-response rate,
suggest difficulties in the assigning of deaths to CVD and suggest
also caution in the interpretation of the results of the model fitting exercise.
2.3.2 Ischemic Heart Disease
Ischemic heart disease (denoted IHD) refers to death resulting
from the deficiency of blood to the heart due to functional constriction
or actual obstruction of coronary blood vessels.
The mechanisms for
the occurrence of IHD are similar to those of CVD and include coronary artery embolism, occlusion, hemorrhage and thrombosis.
There
is strong evidence (Kuller 1976) that coronary atherosclerosis with
subsequent coronary artery stenosis is a major underlying determinant
of this disease.
Many researchers have suggested that initiation of IHD occurs
with the appearance of fibrous plaques on the arterial walls (atherosclerosis).
Clinical heart disease is almost always associated with
severe coronary artery stenosis and its precursor atherosclerosis.
But not all individuals with coronary artery stenosis manifest clinical disease.
This suggests that either a critical degree of coronary artery stenosis is required for a heart attack (clinical onset) to occur or some other factor determines the likelihood of the heart attack in the presence of severe coronary artery stenosis. Like CVD, the length of time spent in the advanced disease stage may be short, as when the first heart attack is fatal, or may be long, depending on age at onset, extent of coronary artery stenosis and other factors.
Some indication of the quality of certification for deaths due
to Ischemic heart disease (ICDA 8th 410-414.9) is given from studies
of the comparable Seventh Revision ICDA group called Arteriosclerotic
heart disease (ICDA 7th 420-420.2, comparability ratio 1.1456,
NCHS 1975).
The study by Moriyama, Dawber and Kannel (1966), discussed
previously, estimated that approximately 80% of deaths classified
to arteriosclerotic heart disease (denoted AHD) had the clinical
statement of cause of death determined as reasonable or better.
In another study, Beadenkopf, Abrams, Daoud and Marks (1962) compared the results of autopsies of individuals with what was specified on the death certificate. In 82% of the cases where AHD was indicated as a cause of death by the death certificate, the autopsy indicated that AHD was a reasonable choice of cause of death. On the other hand, in only 50% of deaths where autopsy indicated AHD as a cause of death did the death certificate specify AHD as a cause.
Both the studies by Moriyama et al. and Beadenkopf et al. indicate
that the accuracy of the diagnostic information decreases with
increasing age and that secondary or contributory conditions such
as diabetes mellitus and hypertension are underrepresented on the
death certificate.
If we are willing to accept the findings of these studies of
Seventh Revision ICDA coded deaths as valid for the 1969 data, then
we must recognize that information from the death certificate yields
a biased picture of the distribution of deaths by cause and age at
death.
Since we lack sufficient information to adjust for this
bias, we use the data as coded and recognize the limitations this
places on interpretation of the model.
2.3.3 Evidence for Association
The existence of a group of common risk factors or precursors,
in particular atherosclerosis and hypertension, suggests that
Cerebrovascular disease and Ischemic heart disease may be associated.
For IHD, coronary atherosclerosis is a major risk factor.
Hypertension is suspected as enhancing the development of coronary atherosclerosis and is a major risk factor for the initial heart attack. Hypertension is also a precursor for cerebral hemorrhage and cerebral thrombosis. Other findings suggest elevated blood lipids, cigarette smoking and diabetes mellitus are risk factors for both CVD and IHD.
The 'hypothesis' we explore with the mortality model states that the ages at clinical onset of CVD and IHD are positively correlated (onset dependence). This 'hypothesis' is studied by examining the fit of the mortality model to observed data. We compare the fit of the mortality model when onset dependence is assumed with the fit of the mortality model when independent action of the diseases is assumed.
We cannot prove or disprove the hypothesis
with the mortality model.
Comparison of fit of the models can be
used to decide the appropriateness of different assumed modes of
action of diseases in describing available data.
CHAPTER III
STATISTICAL MODELS OF DISEASE ASSOCIATION
3.1 Introduction
In this chapter we develop a mathematical model of human mortality which incorporates the concepts of natural history of a disease and disease association discussed in Chapter II.
The goal of this
model is specification of the age-at-death distribution for deaths
observed from multiple causes data.
A coherent model is presented in
which the times spent in the individual stages in the disease natural
history are assumed to have different distributions with functional
forms specified except for unknown parameters.
We begin by considering a hypothetical cohort of individuals who are assumed susceptible to only two chronic diseases, disease A and disease B. We follow this cohort until all individuals have died. At the death of each individual in the population we determine:
1) The age at death of the individual (in presence of both competing causes).
2) The underlying cause of death (either disease A or disease B).
3) Presence of a secondary condition (disease) at the time of death (whether disease A or B occur as a nonunderlying cause).
We make the following assumptions:
1) Each death is assigned to a single underlying cause.
2) Each individual in the cohort is liable to die from either cause.
3) An underlying cause cannot also be considered a non-underlying cause for the same individual.
Let $A_1$ ($B_1$) denote the 'event' that disease A (B) is listed as the underlying cause of death. Let $A_2$ ($B_2$) denote the 'event' that disease A (B) is listed as a non-underlying cause of death. $\bar{A}_2$ ($\bar{B}_2$) represents the complement of the event $A_2$ ($B_2$). From assumptions (1) and (3) we have:
$$A_1 \cup B_1 = \Omega, \qquad A_1 \cap A_2 = B_1 \cap B_2 = A_1 \cap B_1 = \emptyset$$
where $\Omega$ is the sample space, representing all deaths due to disease A or disease B as underlying cause.
With the information determined at death we can classify all deaths in $\Omega$ to the following four events.
$E_1$: $A_1 \cap \bar{B}_2$ - Disease A is recorded as the underlying cause and disease B is not recorded.
$E_2$: $A_1 \cap B_2$ - Disease A is recorded as the underlying cause and disease B is recorded as a non-underlying cause.
$E_3$: $B_1 \cap \bar{A}_2$ - Disease B is recorded as the underlying cause and disease A is not recorded.
$E_4$: $B_1 \cap A_2$ - Disease B is recorded as the underlying cause and disease A is recorded as a non-underlying cause.
Clearly $E_1$ through $E_4$ are mutually exclusive events and
$$\Omega = \bigcup_{j=1}^{4} E_j. \tag{3.1}$$
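The classification of a death record into the four events can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_death` and the record layout (a string for the underlying cause and a flag for whether the other disease appears as a non-underlying cause) are hypothetical and are not part of the data format used in this study.

```python
from enum import Enum


class Event(Enum):
    """The four mutually exclusive death categories (labels follow the text)."""
    E1 = "A underlying, B not recorded"
    E2 = "A underlying, B recorded as non-underlying"
    E3 = "B underlying, A not recorded"
    E4 = "B underlying, A recorded as non-underlying"


def classify_death(underlying: str, other_recorded: bool) -> Event:
    """Map a death record to one of E1..E4.

    underlying      -- 'A' or 'B', the single underlying cause (assumption 1)
    other_recorded  -- True if the other disease is listed as a
                       non-underlying cause on the certificate
    """
    if underlying == "A":
        return Event.E2 if other_recorded else Event.E1
    if underlying == "B":
        return Event.E4 if other_recorded else Event.E3
    raise ValueError("underlying cause must be 'A' or 'B'")


if __name__ == "__main__":
    for cause in ("A", "B"):
        for flag in (False, True):
            print(cause, flag, classify_death(cause, flag).name)
```

Because every record maps to exactly one category, the four categories partition the sample space, which is the content of (3.1).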
The aim of the model is to specify the distribution of age-at-death conditional on being in $\Omega$ and being classified into one of the four events $E_1$, $E_2$, $E_3$ or $E_4$. We have that
$$\Pr(\Omega) = \sum_{j=1}^{4} \Pr(E_j) \tag{3.2}$$
where
$\Pr(\Omega)$ = {probability an individual in $\Omega$ dies}
and
$\Pr(E_j)$ = {probability an individual in $\Omega$ dies and is classified in event $E_j$}.
3.2 Waiting Times Random Variables
Using the concepts discussed in Chapter II we define the following events in the natural history of a disease for each individual in $\Omega$:
$B_0$ - Birth of the individual;
$O_1$ - Clinical onset of disease A;
$O_2$ - Clinical onset of disease B;
$D_1$ - Death due to disease A as the underlying cause ($A_1$);
$D_2$ - Death due to disease B as the underlying cause ($B_1$).
For each of the events we may define hypothetical waiting times to occurrence of the events as follows:
$T_1$ - Potential time to clinical onset of disease A ($B_0 \to O_1$);
$T_2$ - Potential time to clinical onset of disease B ($B_0 \to O_2$);
$X_1$ - Potential time to death due to disease A as underlying cause ($B_0 \to D_1$);
$X_2$ - Potential time to death due to disease B as underlying cause ($B_0 \to D_2$).
These waiting times are assumed to be random variables having some joint distribution function. These random variables represent the theoretical times to occurrence of the events in the absence of competing risks.
For disease A these events and random variables are illustrated in Figure 3.1.
Figure 3.1. Events and Random Variables Associated With the Model of Disease Natural History
[Timeline for disease A: $B_0$ (Birth), Initiation, $O_1$ (Clinical Onset), $D_1$ (Death), with $T_1$ spanning Birth to Clinical Onset and $X_1$ spanning Birth to Death.]
Since an individual is unable to die from a disease unless he first gets the disease we have the restrictions
$$T_1 \le X_1, \qquad T_2 \le X_2. \tag{3.3}$$
In addition, assumption (1) states that each death is assigned only one underlying cause, thus just one of the events $D_1$ and $D_2$ will be observed. Hence either
$$X_1 < X_2 \ \text{(Disease A is underlying cause)} \quad \text{or} \quad X_2 < X_1 \ \text{(Disease B is underlying cause)} \tag{3.4}$$
will be recorded for any individual death. This restriction on observed information is similar to that used in competing risks models, section 1.3.2.
Conditions on the age at death for an individual classified in one of the events $E_1$, $E_2$, $E_3$ or $E_4$ can be expressed in terms of the waiting times random variables if we make the following additional assumption:
4) A secondary condition (non-underlying cause) is recorded only if the secondary condition manifested symptoms (i.e. onset) prior to death from the underlying cause.
Thus a non-underlying cause is assumed to be recorded if $T_2 \le X_1 < X_2$ or $T_1 \le X_2 < X_1$. We make no assumption as to which disease onsets first. Our interest is only in the age at onset of the secondary condition with respect to the age at death from the underlying cause.
In principle the question 'Had onset of the non-underlying disease
occurred prior to death?' could be answered by analysis of the condition of the individual at death.
What cannot be answered easily is
'How long prior to death did onset occur?'. To answer this question we would need observations on the age at onset. This information is not available from mortality data and for only few diseases is any information of this type available at all.
Keeping in mind the assumptions on the data and the restrictions placed on the random variables, conditions on the age-at-death for each of the events are given as follows.
$$E_1: (X_1 < X_2) \cap (X_1 < T_2) = (X_1 < T_2 \le X_2) = (X_1 < T_2) \tag{3.5}$$
since $T_2 \le X_2$ with probability one.
$$E_2: (X_1 < X_2) \cap (T_2 \le X_1) = (T_2 \le X_1 < X_2). \tag{3.6}$$
$$E_3: (X_2 < X_1) \cap (X_2 < T_1) = (X_2 < T_1 \le X_1) = (X_2 < T_1). \tag{3.7}$$
$$E_4: (X_2 < X_1) \cap (T_1 \le X_2) = (T_1 \le X_2 < X_1). \tag{3.8}$$
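As an illustration of conditions (3.5) through (3.8), the following Python sketch assigns a set of (hypothetical, fully observed) waiting times to one of the four events and returns the resulting age at death. In real mortality data the onset times are not observed, so the function `classify_from_times` is purely a device for checking the logic; its name and its handling of ties are assumptions made here.

```python
def classify_from_times(t1: float, t2: float, x1: float, x2: float):
    """Return (event label, age at death) from potential waiting times.

    t1, t2 -- potential ages at clinical onset of diseases A and B
    x1, x2 -- potential ages at death from diseases A and B
    Restriction (3.3) requires t1 <= x1 and t2 <= x2.
    """
    if t1 > x1 or t2 > x2:
        raise ValueError("onset must not follow death from the same disease")
    if x1 == x2:
        # Ties have probability zero for continuous variables; rejected here.
        raise ValueError("x1 and x2 must differ")
    if x1 < x2:
        # Disease A is the underlying cause; B is recorded only if it
        # had onset by the age at death x1.
        return ("E2" if t2 <= x1 else "E1", x1)
    # Disease B is the underlying cause; A is recorded only if it
    # had onset by the age at death x2.
    return ("E4" if t1 <= x2 else "E3", x2)


if __name__ == "__main__":
    print(classify_from_times(50.0, 80.0, 70.0, 90.0))  # A dies first, B not yet onset
    print(classify_from_times(60.0, 65.0, 70.0, 90.0))  # A dies first, B already onset
```

The two branches mirror the split in (3.4), and within each branch the single comparison of the other disease's onset time with the age at death reproduces assumption (4).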
3.3 The Onset-Dependence Model
To specify the age at death distributions for deaths classified in the events $E_1$, $E_2$, $E_3$ and $E_4$ we need to specify the joint distributions of the waiting times random variables. In addition we would like to introduce into the model at this point the concept of onset-dependence discussed in Chapter II. Thus along with the restrictions placed on the random variables by equation (3.3) we also require the covariance of the random variables $T_1$ and $T_2$ to be positive.
3.3.1 Component Random Variables
One method of specifying the joint distribution of the waiting times random variables which satisfies all conditions placed on these variables is given as follows. Let
$$T_1 = Y_0 + Y_1$$
$$T_2 = Y_0 + Y_2$$
$$X_1 = T_1 + Y_3 = Y_0 + Y_1 + Y_3$$
$$X_2 = T_2 + Y_4 = Y_0 + Y_2 + Y_4$$
where the $Y_i$'s are independent, non-negative random variables, denoting component random variables and having density functions
$$f_{Y_i}(y_i; \theta_i) = f_{Y_i}(y_i), \qquad i = 0, 1, 2, 3, 4$$
where $\theta_i$ are the parameters which specify the form of the density.
The component random variables $Y_3$ and $Y_4$ are non-negative thus satisfying the restriction that $T_1 \le X_1$ and $T_2 \le X_2$ with probability one. Since the waiting times random variables $T_1$ and $T_2$ (onset times) have the component random variable $Y_0$ in common, they are positively correlated. The covariance of $T_1$ and $T_2$ is given simply as the variance of the component random variable $Y_0$.
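The statement that the covariance of the two onset times equals the variance of the shared component follows from independence of the components, since Cov(Y0 + Y1, Y0 + Y2) = Var(Y0). A small simulation can make this concrete. The sketch below assumes, purely for illustration, exponential component distributions with hypothetical means of 40, 10 and 15 years; these choices are not the distributions fitted in Chapter IV.

```python
import random


def simulate_onsets(n, mean_y0=40.0, mean_y1=10.0, mean_y2=15.0, seed=1979):
    """Draw n triples (Y0, T1, T2) with T1 = Y0 + Y1 and T2 = Y0 + Y2.

    Exponential components and their means are illustrative assumptions.
    """
    rng = random.Random(seed)
    y0s, t1s, t2s = [], [], []
    for _ in range(n):
        y0 = rng.expovariate(1.0 / mean_y0)  # expovariate takes a rate
        y1 = rng.expovariate(1.0 / mean_y1)
        y2 = rng.expovariate(1.0 / mean_y2)
        y0s.append(y0)
        t1s.append(y0 + y1)
        t2s.append(y0 + y2)
    return y0s, t1s, t2s


def covariance(xs, ys):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


y0s, t1s, t2s = simulate_onsets(100_000)
cov_t1_t2 = covariance(t1s, t2s)
var_y0 = covariance(y0s, y0s)

if __name__ == "__main__":
    print("sample Cov(T1, T2):", round(cov_t1_t2, 1))
    print("sample Var(Y0):    ", round(var_y0, 1))
    print("theoretical Var(Y0) for an exponential with mean 40:", 40.0 ** 2)
```

With these assumed means the theoretical value is 1600, and the two sample quantities should agree with it up to simulation error.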
There are, of course, other ways of representing the waiting times random variables in terms of the independent component random variables such that all conditions on the waiting times random variables are satisfied. The representation used was chosen because in addition to satisfying the conditions it is possible to speculate on the interpretation of these random variables as they relate to the stages in the natural history of a disease. For example, the components $Y_3$ and $Y_4$ have natural interpretation as
$Y_3$ - the (potential) time to death after onset for disease A or simply the potential length of the advanced disease stage for disease A;
$Y_4$ - the potential length of the advanced disease stage for disease B.
Interpretation of the components $Y_0$, $Y_1$ and $Y_2$ is more difficult. We might describe components $Y_1$ and $Y_2$ as follows
$Y_1$ - the potential time spent in the incubation (or latent) stage for disease A;
$Y_2$ - the potential time spent in the incubation (or latent) stage for disease B.
For component $Y_0$ we have
$Y_0$ - the potential age of joint initiation of diseases A and B.
Thus the concept of onset dependence association between diseases A and B would be equivalent to the assumption of simultaneous initiation of disease A and B under the above interpretation of component random variables $Y_0$, $Y_1$ and $Y_2$ (see Figure 3.2). This is speculation and it is possible that other interpretations for these components could be given.
The joint density function of the waiting times random variables can be expressed in terms of the densities of the component random variables. The joint probability density function of the component random variables is
$$f_{Y_0,Y_1,Y_2,Y_3,Y_4}(y_0,y_1,y_2,y_3,y_4) = \prod_{i=0}^{4} f_{Y_i}(y_i). \tag{3.9}$$
Figure 3.2. Two Disease-Onset Dependent Association Model-Component Random Variable.
[Diagram: a common 'well' stage of length $Y_0$, followed for disease A by an incubation stage $Y_1$ ending at $O_1$ and an advanced disease stage $Y_3$, and for disease B by an incubation stage $Y_2$ ending at $O_2$ and an advanced disease stage $Y_4$.]
We can express the joint probability density function of $T_1$, $T_2$, $X_1$ and $X_2$ in terms of the density functions of the $Y_i$'s. We have
$$Y_0 = Y_0, \quad T_1 = Y_0 + Y_1, \quad T_2 = Y_0 + Y_2, \quad X_1 = T_1 + Y_3, \quad X_2 = T_2 + Y_4$$
so that
$$Y_0 = Y_0, \quad Y_1 = T_1 - Y_0, \quad Y_2 = T_2 - Y_0, \quad Y_3 = X_1 - T_1, \quad Y_4 = X_2 - T_2$$
with Jacobian
$$\frac{\partial(y_0, y_1, y_2, y_3, y_4)}{\partial(y_0, t_1, t_2, x_1, x_2)} = 1.$$
We have
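$$f_{Y_0,T_1,T_2,X_1,X_2}(y_0,t_1,t_2,x_1,x_2) = f_{Y_0}(y_0)\, f_{Y_1}(t_1 - y_0)\, f_{Y_2}(t_2 - y_0)\, f_{Y_3}(x_1 - t_1)\, f_{Y_4}(x_2 - t_2). \tag{3.10}$$
Integrating out $Y_0$ we obtain the joint probability density function of $T_1$, $T_2$, $X_1$, $X_2$,
$$f_{T_1,T_2,X_1,X_2}(t_1,t_2,x_1,x_2) = f_{Y_3}(x_1 - t_1)\, f_{Y_4}(x_2 - t_2) \int_0^{\min(t_1,t_2)} f_{Y_0}(y_0)\, f_{Y_1}(t_1 - y_0)\, f_{Y_2}(t_2 - y_0)\, dy_0 \tag{3.11}$$
or
$$f_{T_1,T_2,X_1,X_2}(t_1,t_2,x_1,x_2) = f_{Y_3}(x_1 - t_1)\, f_{Y_4}(x_2 - t_2) \int_0^{t_1} f_{Y_0}(y_0)\, f_{Y_1}(t_1 - y_0)\, f_{Y_2}(t_2 - y_0)\, dy_0 \tag{3.12a}$$
if $t_1 \le t_2$ and $t_1 \le x_1 < \infty$, $t_2 \le x_2 < \infty$, and
$$f_{T_1,T_2,X_1,X_2}(t_1,t_2,x_1,x_2) = f_{Y_3}(x_1 - t_1)\, f_{Y_4}(x_2 - t_2) \int_0^{t_2} f_{Y_0}(y_0)\, f_{Y_1}(t_1 - y_0)\, f_{Y_2}(t_2 - y_0)\, dy_0 \tag{3.12b}$$
if $t_2 < t_1$ and $t_1 \le x_1 < \infty$, $t_2 \le x_2 < \infty$.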
3.3.2 Crude Probability Functions
We have now derived a model of disease development which incorporates a concept of association between two diseases. This model is confined to consideration of those individuals dying of causes determined by events $E_1$, $E_2$, $E_3$, $E_4$ in $\Omega$. In order to fit this model to the data on deaths in $\Omega$ we need to determine the probabilities of dying before a certain time (age), say $\tau$, and being classified to one of the death categories, $E_1$, $E_2$, $E_3$ or $E_4$.
Consider the following (crude) probabilities
$$Q_1(\tau) = \Pr\{\text{an individual of } \Omega \text{ dies before time } \tau \text{ from disease A before onset of disease B is observed}\}$$
$$= \Pr\{(X_1 \le \tau) \cap E_1\} = \Pr\{(X_1 \le \tau) \cap (X_1 < T_2)\}. \tag{3.13}$$
We label this probability 'crude' (using competing risks terminology) because it is determined for an individual who is assumed susceptible to classification in any one of the four mutually exclusive categories, $E_1$, $E_2$, $E_3$, $E_4$, which comprise $\Omega$. Thus this probability is determined in the presence of the four competing categories.
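Evaluation of $Q_1(\tau)$ is performed by expressing the appropriate events in terms of the component random variables. The independence of the $Y_i$'s results in a simpler formula for the $Q_1(\tau)$ than could be obtained using the waiting time random variables. We have (see 3.13)
$$Q_1(\tau) = \Pr\{(Y_0 + U \le \tau) \cap (Y_0 + U < Y_0 + Y_2)\} \tag{3.14}$$
where $U = Y_1 + Y_3$. Notice that $(Y_0 + U < Y_0 + Y_2) = (U < Y_2)$ so that
$$Q_1(\tau) = \Pr\{(U \le \tau - Y_0) \cap (U < Y_2)\}. \tag{3.15}$$
Because of the independence of the $Y_i$'s, $Y_0$, $Y_2$ and $U$ are independent with joint density
$$f_{Y_0,Y_2,U}(y_0,y_2,u) = f_{Y_0}(y_0)\, f_{Y_2}(y_2)\, f_U(u). \tag{3.16}$$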
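Hence,
$$\Pr\{(U \le \tau - Y_0) \cap (U < Y_2)\} = \int_0^{\tau} \int_0^{\tau - u} \int_u^{\infty} f_{Y_0,Y_2,U}(y_0,y_2,u)\, dy_2\, dy_0\, du. \tag{3.17}$$
Using equation (3.16) we can simplify equation (3.17)
$$Q_1(\tau) = \int_0^{\tau} f_U(u) \left[\int_0^{\tau - u} f_{Y_0}(y_0)\, dy_0\right] \left[\int_u^{\infty} f_{Y_2}(y_2)\, dy_2\right] du$$
or
$$Q_1(\tau) = \int_0^{\tau} f_U(u)\, F_{Y_0}(\tau - u)\, S_{Y_2}(u)\, du \tag{3.18}$$
where $F_{Y_i}(\cdot)$ is the distribution function for random variable $Y_i$ and $S_{Y_i}(\cdot)$ is the survival distribution function for the random variable $Y_i$.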
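Now
$$\Pr\{E_1\} = \Pr\{\text{individual dies of disease A, before onset of disease B is observed}\} = \Pr\{X_1 < T_2\}$$
$$= \Pr\{U < Y_2\} = \int_0^{\infty} \int_u^{\infty} f_{Y_2}(y_2)\, dy_2\, f_U(u)\, du = \int_0^{\infty} S_{Y_2}(u)\, f_U(u)\, du = Q_1(\infty). \tag{3.19}$$
Thus the distribution (CDF) of the age at death among those who died from disease A before onset of disease B is observed is given by
$$\Pr\{X_1 \le \tau \mid E_1\} = \frac{Q_1(\tau)}{\Pr\{E_1\}} = \frac{Q_1(\tau)}{Q_1(\infty)}. \tag{3.20}$$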
E we have
Z
QZ(T) = Pr{an individual in
n
dies before time
T
from
disease A after onset of disease B has been observed}
Now since the component random variables are independenf,
YO' Y ' Y4
Z
42
and
U = Yl + Y3
are also independent with joint density
(3.22)
Thus
(3.23)
=
)J
oo
fTfu(U)JT-U fy (YO)dYojUfy (y
f y (y )dy d}2du
2
4
4
4
o
0
0
0 2
u-Y
2
=
JTfu(U)F y (T-u)JUfy (y )Sy (u-y )dy du
2
2
o
002 2
4
(3.24)
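Now
$$\int_0^{u} f_{Y_2}(y_2)\, S_{Y_4}(u - y_2)\, dy_2 = F_{Y_2}(u) - F_V(u) \tag{3.25}$$
where $F_V(\cdot)$ is the distribution function of $V = Y_2 + Y_4$. Thus the expression (3.24) takes the form
$$Q_2(\tau) = \int_0^{\tau} f_U(u)\, F_{Y_0}(\tau - u)\, F_{Y_2}(u)\, du - \int_0^{\tau} f_U(u)\, F_{Y_0}(\tau - u)\, F_V(u)\, du$$
$$= \int_0^{\tau} f_U(u)\, F_{Y_0}(\tau - u)\, [F_{Y_2}(u) - F_V(u)]\, du. \tag{3.26}$$
Now
$$F_{Y_2}(u) - F_V(u) = S_V(u) - S_{Y_2}(u).$$
Thus equation 3.26 becomes
$$Q_2(\tau) = \int_0^{\tau} f_U(u)\, F_{Y_0}(\tau - u)\, [S_V(u) - S_{Y_2}(u)]\, du$$
$$= \int_0^{\tau} f_U(u)\, F_{Y_0}(\tau - u)\, S_V(u)\, du - \int_0^{\tau} f_U(u)\, F_{Y_0}(\tau - u)\, S_{Y_2}(u)\, du \tag{3.27}$$
$$= \int_0^{\tau} f_U(u)\, F_{Y_0}(\tau - u)\, S_V(u)\, du - Q_1(\tau). \tag{3.28}$$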
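$$\Pr\{E_2\} = \Pr\{\text{an individual dies and is classified in category } E_2\}$$
$$= \Pr\{T_2 \le X_1 < X_2\} = \Pr\{Y_2 \le U < V\} = \Pr\{U < V\} - \Pr\{U < Y_2\}$$
$$= \int_0^{\infty} S_V(u)\, f_U(u)\, du - \int_0^{\infty} S_{Y_2}(u)\, f_U(u)\, du$$
$$= \int_0^{\infty} S_V(u)\, f_U(u)\, du - Q_1(\infty) = Q_2(\infty), \tag{3.29}$$
and the distribution of age of death among those who died and were classified in category $E_2$ is given as
$$\Pr\{X_1 \le \tau \mid E_2\} = \frac{Q_2(\tau)}{Q_2(\infty)}. \tag{3.30}$$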
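The step from (3.24) to (3.26) rests on the identity that the integral of f_Y2(y) S_Y4(u - y) over [0, u] equals F_Y2(u) - F_V(u) with V = Y2 + Y4. In words, the left side is the probability that onset-type component Y2 has occurred by u while the sum V has not, and the event {V <= u} is contained in {Y2 <= u}. The sketch below checks this identity numerically under the illustrative assumption of exponential components with hypothetical means of 15 and 20 years; the rates and function names are not taken from the source.

```python
import math

# Hypothetical exponential rates for Y2 and Y4 (means 15 and 20 years).
A, B = 1 / 15, 1 / 20


def f_y2(y):
    """Density of Y2."""
    return A * math.exp(-A * y)


def surv_y4(y):
    """Survival function of Y4."""
    return math.exp(-B * y)


def cdf_y2(u):
    """Distribution function of Y2."""
    return 1.0 - math.exp(-A * u)


def cdf_v(u):
    """Distribution function of V = Y2 + Y4 (two different exponential rates)."""
    return 1.0 - (B * math.exp(-A * u) - A * math.exp(-B * u)) / (B - A)


def lhs(u, steps=2000):
    """Trapezoidal approximation of the convolution-type integral on [0, u]."""
    h = u / steps
    total = 0.5 * (f_y2(0.0) * surv_y4(u) + f_y2(u) * surv_y4(0.0))
    for k in range(1, steps):
        y = k * h
        total += f_y2(y) * surv_y4(u - y)
    return total * h


def rhs(u):
    """Right side of the identity: F_Y2(u) - F_V(u)."""
    return cdf_y2(u) - cdf_v(u)


if __name__ == "__main__":
    for u in (5.0, 20.0, 60.0):
        print(u, lhs(u), rhs(u))
```

The two columns should agree to within the error of the trapezoidal rule for each value of u.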
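For the deaths classified in categories $E_3$ and $E_4$ we may determine $Q_3(\tau)$ and $Q_4(\tau)$ using arguments similar to those used for $Q_1(\tau)$ and $Q_2(\tau)$. For this reason the formulas for $Q_3(\tau)$ and $Q_4(\tau)$ will not be derived, but just stated. Let $U = Y_1 + Y_3$ and $V = Y_2 + Y_4$. Then
$$Q_3(\tau) = \int_0^{\tau} F_{Y_0}(\tau - v)\, S_{Y_1}(v)\, f_V(v)\, dv \tag{3.31}$$
$$\Pr\{E_3\} = Q_3(\infty) = \int_0^{\infty} S_{Y_1}(v)\, f_V(v)\, dv, \tag{3.32}$$
$$\Pr\{X_2 \le \tau \mid E_3\} = \frac{Q_3(\tau)}{Q_3(\infty)} \tag{3.33}$$
and
$$Q_4(\tau) = \int_0^{\tau} F_{Y_0}(\tau - v)\, S_U(v)\, f_V(v)\, dv - Q_3(\tau) \tag{3.34}$$
$$\Pr\{E_4\} = Q_4(\infty) = \int_0^{\infty} S_U(v)\, f_V(v)\, dv - Q_3(\infty) \tag{3.35}$$
$$\Pr\{X_2 \le \tau \mid E_4\} = \frac{Q_4(\tau)}{Q_4(\infty)}. \tag{3.36}$$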
The $Q_j(\tau)$ (for events $E_1$ to $E_4$) are used to calculate the (two-way) age by event distribution of deaths in $\Omega$. To fit the onset-dependence model we need to specify the distribution functions for the component random variables and estimate values for the unknown parameters. To illustrate one method of fitting this model to multiple causes of death data, assumptions as to the form of the component distribution functions are made. Discussion of choice of distributions as well as model fitting problems are given in Chapter IV.
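The two-way age by event distribution can also be approximated by direct simulation of the component random variables, which gives an independent check on the integral formulas. The sketch below is a Monte Carlo illustration only: the exponential components, their hypothetical means, the 20-year age bands and the function name `simulate_cohort` are all assumptions made here and are not the distributions or the fitting method of Chapter IV.

```python
import random
from collections import Counter

# Hypothetical component means in years.
MEANS = {"y0": 40.0, "y1": 10.0, "y2": 15.0, "y3": 5.0, "y4": 20.0}


def simulate_cohort(n, means=MEANS, seed=1979, width=20):
    """Simulate n deaths and tabulate counts by (event, lower age-band limit)."""
    rng = random.Random(seed)
    table = Counter()
    for _ in range(n):
        y = {name: rng.expovariate(1.0 / m) for name, m in means.items()}
        t1 = y["y0"] + y["y1"]          # onset of disease A
        t2 = y["y0"] + y["y2"]          # onset of disease B
        x1 = t1 + y["y3"]               # potential death from A
        x2 = t2 + y["y4"]               # potential death from B
        if x1 < x2:
            event, age = ("E2" if t2 <= x1 else "E1"), x1
        else:
            event, age = ("E4" if t1 <= x2 else "E3"), x2
        table[(event, int(age // width) * width)] += 1
    return table


table = simulate_cohort(50_000)
n_total = sum(table.values())
pr_event = {
    e: sum(c for (ev, _), c in table.items() if ev == e) / n_total
    for e in ("E1", "E2", "E3", "E4")
}

if __name__ == "__main__":
    for e, p in pr_event.items():
        print(e, round(p, 4))
    for key in sorted(table):
        print(key, table[key])
```

For these assumed exponential components, Pr{E1} = Pr{U < Y2} and Pr{E3} = Pr{V < Y1} have simple closed forms (0.45 and 0.4/3), so the simulated proportions can be compared with them.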
3.4 The Duration-Onset Dependence Model
To incorporate the concept of duration-onset dependence into the model proposed in the previous sections simply requires that we allow the time-to-death-after-onset random variables, $Y_3$ and $Y_4$, to have distributions which are dependent upon the age-at-onset random variables, $T_1$ and $T_2$. One way by which this can be accomplished is to assume that the random variables $Y_3$ and $Y_4$ are distributed conditionally with respect to $T_1$ and $T_2$.
Formally, assume the same model as given in section 3.1 but now let $Y_3$ and $Y_4$ have conditional density functions, given the onset times $T_1$ and $T_2$ respectively, defined on $[0, \infty)$. Because the onset times $T_1$ and $T_2$ are correlated, the component random variables $Y_3$ and $Y_4$ are also correlated. In addition the waiting time random variables $X_1$ and $X_2$ are no longer simple convolutions of the component random variables and the variance-covariance structure of $X_1$ and $X_2$ is now complex.
Assuming this new structure for the distribution of component random variables, we can express the $Q_j(T)$ probabilities as follows:
$$Q_1(T) = \Pr\{(Y_0 \le T - U) \cap (U < Y_2)\} , \qquad (3.37)$$
where $U = Y_1 + Y_3$, and
$$Q_2(T) = \Pr\{(Y_0 \le T - U) \cap (Y_2 \le U < V)\} , \qquad (3.38)$$
where $V = Y_2 + Y_4$. Similarly
$$Q_3(T) = \Pr\{(Y_0 \le T - V) \cap (V < Y_1)\} \qquad (3.39)$$
and
$$Q_4(T) = \Pr\{(Y_0 \le T - V) \cap (Y_1 \le V < U)\} . \qquad (3.40)$$
Incorporating the duration dependence concept into the initial onset dependence model has the effect of changing the form of the integral representation of the age at death distributions, the $Q_j(T)$'s. The problem of specifying the densities of the five component random variables remains. However, for the duration-onset dependence model this process is further complicated by the requirement that the densities of the components $Y_3$ and $Y_4$ depend upon the onset times $T_1$ and $T_2$. There are numerous ways this can be accomplished but discussion of these will be left for later study.
3.5
The Independent Action Model
The models discussed so far have been constructed on the assumption that the waiting times random variables are positively correlated. It seems reasonable to consider at this point a model which assumes the waiting times random variables $T_1$ and $T_2$ are independently distributed. We have then a model which represents the case where two diseases operate independently of each other (independent action).
Using the structure and notation similar to the onset dependence model (section 3.1.4) we can express the waiting times random variables in terms of the component random variables as follows:
$$T_1 = Y_1 , \qquad T_2 = Y_2 ,$$
$$X_1 = T_1 + Y_3 , \qquad X_2 = T_2 + Y_4 .$$
Because the component random variables $Y_1$, $Y_2$, $Y_3$ and $Y_4$ are assumed independent, the (potential) onset times random variables, $T_1$ and $T_2$, are uncorrelated and the (potential) age at death random variables, $X_1$ and $X_2$, are also uncorrelated.
The interpretation of the component random variables $Y_3$ and $Y_4$ is the same as suggested in the onset dependence models. The component random variables, $Y_1$ and $Y_2$, now simply represent the (potential) age at clinical onset for diseases A and B respectively.
The restriction that the model be applicable to multiple causes of death data and the assumption of independent times to onset imply that it is unnecessary to incorporate disease events which occur between birth and clinical onset into the model. For example, if we assume that the diseases have independent (potential) times to initiation, denoted by $Y'_0$ and $Y''_0$ respectively, and independent potential latency times, denoted $Y'_1$ and $Y''_2$, we simply define the component random variables $Y_1$ and $Y_2$ as
$$Y_1 = Y'_0 + Y'_1 , \qquad Y_2 = Y''_0 + Y''_2 .$$
To calculate the crude probabilities, $Q_j(T)$, for the independent action model, we need the distribution of component random variables $Y_1$ and $Y_2$.
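Integral formulas for the crude probabilities in terms of the densities of the component random variables are given as
$$Q_1(T) = \Pr\{(Y_1 + Y_3 \le T) \cap (Y_1 + Y_3 < Y_2)\} = \int_0^T S_{Y_2}(u)\,f_{X_1}(u)\,du , \qquad (3.44)$$
where $S_{Y_2}$ is the survival function of $Y_2$ and $f_{X_1}$ is the density of $X_1 = Y_1 + Y_3$;
$$Q_2(T) = \Pr\{(Y_1 + Y_3 \le T) \cap (Y_2 \le Y_1 + Y_3 < Y_2 + Y_4)\}$$
$$= \int_0^T S_{X_2}(u)\,f_{X_1}(u)\,du - \int_0^T S_{Y_2}(u)\,f_{X_1}(u)\,du$$
$$= \int_0^T f_{X_1}(u)\,[S_{X_2}(u) - S_{Y_2}(u)]\,du , \qquad (3.45)$$
where $S_{X_2}$ is the survival function of $X_2 = Y_2 + Y_4$; and
$$Q_3(T) = \Pr\{(X_2 \le T) \cap (X_2 < T_1)\} = \int_0^T S_{Y_1}(v)\,f_{X_2}(v)\,dv , \qquad (3.46)$$
$$Q_4(T) = \int_0^T S_{X_1}(v)\,f_{X_2}(v)\,dv - \int_0^T S_{Y_1}(v)\,f_{X_2}(v)\,dv = \int_0^T f_{X_2}(v)\,[S_{X_1}(v) - S_{Y_1}(v)]\,dv . \qquad (3.47)$$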
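The event definitions behind these integrals can be checked by simulation. The following is a minimal sketch, not the numerical method used in this study: it draws the four independent components, classifies each simulated individual into $E_1$ to $E_4$, and estimates the $Q_j(T)$ by relative frequencies. The Gompertz form of the components anticipates section 4.4.1; the function names and the parameter values are hypothetical and chosen only to exercise the code.

```python
import math
import random

def gompertz_sample(a, R, rng):
    """Draw one Gompertz variate by inverting S(y) = exp{(R/a)(1 - e^{a y})}."""
    u = 1.0 - rng.random()                      # uniform on (0, 1], so log(u) is finite
    return math.log(1.0 - (a / R) * math.log(u)) / a

def classify(y1, y2, y3, y4):
    """Return (event number, age at death) under the independent action model."""
    t1, t2 = y1, y2                             # potential onset ages, diseases A and B
    x1, x2 = t1 + y3, t2 + y4                   # potential ages at death from A and B
    if x1 < x2:                                 # death is due to disease A
        return (1 if x1 < t2 else 2), x1        # E1: B never had onset; E2: B present
    return (3 if x2 < t1 else 4), x2            # E3: A never had onset; E4: A present

def crude_probabilities(T, a, R, n=20000, seed=1):
    """Monte Carlo estimates of Q_1(T), ..., Q_4(T); R = (R1, R2, R3, R4)."""
    rng = random.Random(seed)
    counts = [0, 0, 0, 0]
    for _ in range(n):
        ys = [gompertz_sample(a, r, rng) for r in R]
        event, age = classify(*ys)
        if age <= T:
            counts[event - 1] += 1
    return [c / n for c in counts]

# Hypothetical parameter values (assumptions, not estimates from this study).
Q = crude_probabilities(T=float("inf"), a=0.09, R=(2e-5, 8e-5, 0.13, 0.53))
```

With $T = \infty$ the four estimates are the simulated $\Pr\{E_j\}$ and sum to one, since every individual dies from exactly one of the two diseases in this model.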
In the next chapter we attempt to fit the four component, independent action model and the five component, onset-dependence model
to multiple causes of death data and compare the fits of these two
models.
CHAPTER IV
INVESTIGATION OF MODELS USING MULTIPLE
CAUSES OF DEATH DATA
4.1
Introduction
In this chapter we attempt to apply the models developed in Chapter III to some data on multiple causes for 1969 U.S. white male deaths (physician coded, see section 1.2.2). The models developed in Chapter III assume mortality information for a completed birth cohort of individuals susceptible to only two causes of death. Since the available multiple causes of death data are cross-sectional data (i.e. not cohort data) and some individuals are assigned an underlying cause of death which is other than Cerebrovascular disease or Ischemic heart disease, the models cannot be directly applied. A life table adjustment of these data is necessary before we can attempt to fit these models (sec. 4.2).
The purpose of this chapter is twofold:
1) To examine possibilities for the distribution of the component random variables in an attempt to determine combinations of distributions which afford the better fitting models.
2) To determine whether the Independent Action model and onset dependence model give the same degree of fit to the data when the assumptions of the distribution of component random variables are the same for the two models.
Candidates for the distributions of the component random variables to be studied are the Gompertz, Gamma and Exponential distributions. Because it was not feasible to study all possible combinations of these distributions in the models, we restricted consideration to two forms of the Independent Action model and four forms of the Onset Dependence model.
Five of the models involve the use of Gompertz distributed component random variables. The six forms of the model are discussed in two groups of three models each (sections 4.4 and 4.6); the second group of three models are modifications of those in the first group.
The parameters of each model are estimated using the 'pseudo-likelihood' function discussed in section 4.3. A summary of the results of this (fitting) exercise is given in Chapter V.
4.2
Data Adjustments
Observed multiple causes of death data represent the mortality experience of a population over some fairly short period of time (usually 1-3 years). This population consists of individuals from many different birth cohorts, each cohort having a different mortality experience over time. Since this population is neither stable nor stationary, each cohort will have different size populations-at-risk at any point in time. To adjust for the differences in population-at-risk in the different age categories (representing different birth cohorts), the multiple decrement life table technique (Jordan, 1961, Chapter 2) is used.
Census figures for white male U.S. population, 1969-71 (NCHS 1975h) and total observed white male deaths for 1969 (Bureau of the Census 1973), grouped into five year age categories, are used to calculate conditional probabilities of dying in each age interval. Table 4.1 presents the census data, observed total mortality data and life table functions for United States White Males, 1969.
Notation (see Table 4.1)
${}_5K_x$ - 1969-71 census population size of white males in age group $x$ to $x+5$
${}_5D_x$ - observed white male deaths in age group $x$ to $x+5$ for calendar year 1969
${}_5q_x$ - proportion of persons alive at the beginning of age interval $[x, x+5)$ who die before reaching the end of the interval
$\ell_x$ - number of persons living at the beginning of the age interval $[x, x+5)$ out of the total number of births assumed for the radix, $\ell_0$, of the table. We take $\ell_0 = 10^6$.
${}_5d_x$ - the number of white males who would die in the age interval $[x, x+5)$ out of the total number of births assumed for the radix of the table.
These 'expected' deaths, ${}_5d_x$, represent the mortality experience for a hypothetical cohort (of size $\ell_0$ individuals) from a population which is both stable and stationary.
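The life table columns can be sketched in a few lines. The conversion from the central death rate $m = {}_5D_x / {}_5K_x$ to ${}_5q_x$ is not stated in the text; the sketch below assumes the common actuarial approximation ${}_5q_x = 5m/(1 + 2.5m)$, which appears to reproduce the first rows of Table 4.1. The function name and its arguments are hypothetical.

```python
def life_table(K, D, radix=1_000_000, width=5, open_last=True):
    """Return (q, l, d) lists for consecutive age groups of equal width.

    K, D  : census populations and observed deaths per age group
    radix : assumed number of births, l_0
    If open_last is True the final group is open-ended and q is set to 1.
    """
    q, l, d = [], [], []
    alive = float(radix)
    last = len(K) - 1
    for i, (k, deaths) in enumerate(zip(K, D)):
        m = deaths / k                                  # central death rate
        if open_last and i == last:
            qx = 1.0                                    # everyone alive dies in the open interval
        else:
            qx = width * m / (1.0 + 0.5 * width * m)    # assumed actuarial approximation
        q.append(qx)
        l.append(alive)                                 # survivors at start of interval
        d.append(alive * qx)                            # life table deaths in interval
        alive -= alive * qx
    return q, l, d

# First three age groups of Table 4.1 (0-5, 5-10, 10-15).
K = [7374333, 8633093, 9033725]
D = [37139, 4267, 4295]
q, l, d = life_table(K, D, open_last=False)
```

Each ${}_5d_x$ is simply $\ell_x \cdot {}_5q_x$, and the next $\ell$ is obtained by subtraction, so rounding to whole persons can be deferred until the table is printed.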
Table 4.1.  Life Table - United States, White Males, 1969

x to x+5       5K_x     5D_x     5q_x       l_x     5d_x
0-5         7374333    37139  .024868   1000000    24868
5-10        8633093     4267  .002468    975132     2407
10-15       9033725     4295  .002374    972725     2310
15-20       8291270    12333  .007410    970415     7191
20-25       6940820    13472  .009658    963225     9303
25-30       5849792     9900  .008426    953922     8038
30-35       4925069     9874  .009069    945884     8578
35-40       4784375    12763  .013250    937305    12419
40-45       5194497    22355  .021289    924886    19690
45-50       5257619    36230  .033871    905196    30660
50-55       4832555    53123  .053494    874536    46782
55-60       4310921    76162  .084600    827754    70028
60-65       3647243    97868  .125732    757727    95271
65-70       2807974   112941  .182733    662456   121053
70-75       2107552   124712  .257741    541403   139542
75-80       1437628   125827  .359055    401862   144290
80-85        805564   102617  .483082    257571   124428
85-90        330213    60589  .628927    133143    83737
90-95         92024    23178  .772760     49406    38179
95-100        18704     5195  .819620     11227     9202
100-105        1764      583  .904858      2025     1832
105+            144       50  1.0           193      193

For our purposes we will distinguish five mutually exclusive subpopulations (or categories) to which deaths are assigned according to information recorded on the death certificate. Let disease A represent Cerebrovascular disease (ICDA 8th 430-438.9) and disease B represent Ischemic heart disease (ICDA 8th 410-414.9). The five categories, which will simply be called 'causes' of death, are:
$C_1$: deaths due to disease A as underlying cause with disease B not reported on the death certificate (compare to event $E_1$, section 3.1)
$C_2$: deaths due to disease A as underlying cause with disease B reported as a non-underlying cause (compare to $E_2$)
$C_3$: deaths due to disease B as underlying cause with disease A not reported on the death certificate (compare to $E_3$)
$C_4$: deaths due to disease B as underlying cause with disease A reported as a non-underlying cause (compare to $E_4$)
$C_5$: deaths due to an underlying cause other than disease A or disease B (complement of $C_1 \cup C_2 \cup C_3 \cup C_4$).
We denote by ${}_5D_{xj}$ the observed number of deaths in age group $x$ to $x+5$ and cause $C_j$ over a calendar year, and by ${}_5d_{xj}$ the corresponding 'expected' deaths from the life table. In Table 4.2 we present both the observed and life table deaths for the five causes, $C_1$ to $C_5$, by five year age groups for 1969 white males.
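The allocation of life table deaths to causes is not spelled out in the text. A minimal sketch, assuming the usual multiple decrement rule that the ${}_5d_x$ life table deaths are split among causes in the same proportions as the observed deaths, i.e. ${}_5d_{xj} = {}_5d_x \cdot {}_5D_{xj} / {}_5D_x$, is given below; the function name is hypothetical. Under this assumption the 35-40 row of Table 4.2 is reproduced after rounding.

```python
def expected_cause_deaths(D_by_cause, d_total):
    """Split life table deaths d_total among causes in proportion to observed deaths."""
    D_total = sum(D_by_cause)
    return [d_total * D_j / D_total for D_j in D_by_cause]

# Age group 35-40: observed deaths for causes C1..C5 and the life table total 5d_x.
D_35_40 = [258, 11, 2057, 19, 10418]
d_35_40 = expected_cause_deaths(D_35_40, 12419)
rounded = [round(v) for v in d_35_40]
```

Because the split is proportional within an age group, the cause proportions computed from life table deaths agree with those computed from observed deaths, up to rounding.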
Table 4.2.  Observed and Life Table Deaths by Causes, White Males, 1969

            Total Deaths          C1             C2              C3              C4              C5
x to x+5     5D_x     5d_x    5D_x1  5d_x1   5D_x2  5d_x2    5D_x3   5d_x3   5D_x4  5d_x4    5D_x5   5d_x5
0-5         37139    24868       74     50       0      0       12       8       0      0    37053   24810
5-10         4267     2407       37     21       0      0        5       3       1      1     4224    2382
10-15        4295     2310       45     24       0      0        4       2       0      0     4246    2284
15-20       12333     7191       70     41       0      0       17      10       0      0    12246    7140
20-25       13472     9303       69     48       0      0       57      39       1      1    13345    9215
25-30        9900     8038      118     96       1      1      159     129       5      4     9617    7808
30-35        8974     8578      149    142       2      2      571     546       4      4     8248    7884
35-40       12763    12419      258    251      11     11     2057    2002      19     18    10418   10137
40-45       22353    19690      475    418      23     20     5574    4909      83     73    16200   14270
45-50       36230    30660      824    697      64     54    11433    9675     224    190    23685   20044
50-55       53123    46783     1285   1132     110     97    18208   16035     449    395    33071   29123
55-60       76162    70028     2026   1863     284    261    26334   24213     845    777    46673   42914
60-65       97868    95271     3124   3041     449    437    33403   32517    1466   1427    59426   57849
65-70      112941   121053     4473   4794     799    856    37387   40072    2217   2376    68065   72955
70-75      124712   139541     6267   7012    1267   1418    39190   43850    3360   3759    74628   83502
75-80      125827   144290     7658   8782    1586   1819    37456   42952    3994   4580    75133   86157
80-85      102617   124428     7177   8702    1488   1804    28770   34885    3759   4558    61423   74479
85-90       60589    83737     4541   6276     944   1305    16367   22620    2237   3092    36500   50444
90-95       23178    38179     1634   2691     331    545     6090   10031     747   1230    14376   23682
95-100       5195     9202      366    648      59    104     1337    2368     134    237     3299    5845
100-105       583     1832       34    107       6     19      142     446       8     25      393    1235
105+           50      193        2      8       0      0       15      58       0      0       33     127
Total      944573  1000000    40706  46844    7424   8753   264588  287370   19553  22747   612302  634286

The models developed in Chapter III are fit to the life table (expected) deaths for which either Cerebrovascular disease or Ischemic heart disease is the underlying cause (the ${}_5d_{xj}$, $j = 1,2,3,4$). The life table deaths assigned to other underlying causes, subpopulation $C_5$, are not utilized since no provision is made for these deaths in the models. The effect of using only those deaths in categories $C_1$ to $C_4$ is to modify the interpretation of components in the models. The model, as fitted, then represents the mortality experience of individuals who not only have clinical onset of one or both of these diseases (CVD or IHD) but also die with one of these diseases as the underlying cause, as if no other causes were acting in the population.
4.3
Pseudo-Likelihood
For model fitting purposes we treat the four sets of life table deaths, ${}_5d_{xj}$, $j = 1,2,3,4$, as if they were the observed deaths. The method of maximum likelihood is used to estimate the parameters in the models. Since these 'data' are not random variables but expected values from the life table model (and thus depend on the choice of $\ell_0$) and since we have excluded deaths due to other underlying causes (cause $C_5$), the 'likelihood function' is clearly not a likelihood in a probabilistic sense. We will call this function a pseudo-likelihood function (and denote it by PL) because it is constructed in a formal manner as if the life table deaths, $\{{}_5d_{xj},\ j = 1,2,3,4\}$, were random variables. Since the 'data' are grouped into five-year age categories the 'multinomial distribution formula' is used. The pseudo-likelihood function is used simply as a basis for a mathematical technique of fitting models and obtaining parameter estimates. These estimates, of course, do not have the properties of true maximum likelihood estimates.
We recall (section 3.3.1) that $f_{Y_i}(y_i)$ represents the density of component random variable $Y_i$. We write this density as $f_{Y_i}(y_i; \boldsymbol{\theta}_i)$, where $\boldsymbol{\theta}_i$ are the parameters of this distribution. Let $\boldsymbol{\theta} = \{\boldsymbol{\theta}_i,\ i = 0,1,2,3,4\}$. We also recall that $Q_j(x)$ represents the (crude) probability function for deaths from cause $C_j$ which occur before age $x$. Thus
$$Q_j(x+5; \boldsymbol{\theta}) - Q_j(x; \boldsymbol{\theta}) = \psi_{xj}(\boldsymbol{\theta})$$
$$= \Pr\{\text{Death occurs between age } x \text{ and } x+5 \text{ from } C_j \text{ in the presence of the remaining three causes}\} .$$
Let $PL_x(\boldsymbol{\theta})$ denote the pseudo-likelihood associated with (life table) deaths in age group $x$ to $x+5$ from causes $C_1$ through $C_4$. Then
$$PL_x(\boldsymbol{\theta}) = \prod_{j=1}^{4} [\psi_{xj}(\boldsymbol{\theta})]^{{}_5d_{xj}} .$$
The pseudo-likelihood for all 'deaths', denoted $PL(\boldsymbol{\theta})$, is obtained by taking the product of the $PL_x(\boldsymbol{\theta})$ over all age groups. Thus
$$PL(\boldsymbol{\theta}) = \prod_{x \in M} PL_x(\boldsymbol{\theta}) = \prod_{x \in M} \prod_{j=1}^{4} [\psi_{xj}(\boldsymbol{\theta})]^{{}_5d_{xj}} , \qquad (4.1)$$
where $M = \{0, 5, 10, \ldots, 105\}$.
The values of $\boldsymbol{\theta}$ which maximize $PL(\boldsymbol{\theta})$ or equivalently, which maximize
$$\log_e [PL(\boldsymbol{\theta})] = \sum_{x \in M} \sum_{j=1}^{4} {}_5d_{xj} \log_e [\psi_{xj}(\boldsymbol{\theta})] , \qquad (4.2)$$
are our point estimates of $\boldsymbol{\theta}$ and are denoted by $\hat{\boldsymbol{\theta}}$.
Because of the complexity of the (crude) probabilities $Q_j(x; \boldsymbol{\theta})$, it is not possible to maximize $PL(\boldsymbol{\theta})$ using classical differential calculus methods. Instead a multivariate search procedure is used to determine numerically the value for each element of $\boldsymbol{\theta}$ which maximizes $\log_e [PL(\boldsymbol{\theta})]$. This is accomplished by the computer subroutine package MAXLIK. MAXLIK was originally developed to perform iterative maximum likelihood estimation (Kaplan and Elston 1972).
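To make equation 4.2 concrete, the following is a minimal sketch of the log pseudo-likelihood as a function of a table of life table deaths and a table of cell probabilities. It assumes the $\psi_{xj}$ have already been computed for a trial parameter value; it is not the MAXLIK routine, and the function name and the small example tables are hypothetical.

```python
import math

def log_pseudo_likelihood(d, psi):
    """Equation 4.2: sum over age groups x and causes j of d[x][j] * log(psi[x][j]).

    d   : life table deaths, one row per age group, one column per cause
    psi : cell probabilities psi_xj evaluated at a trial parameter value
    Cells with zero deaths contribute nothing and are skipped, so a zero
    probability is tolerated there; a zero probability in a cell with
    deaths raises ValueError from math.log.
    """
    total = 0.0
    for d_row, psi_row in zip(d, psi):
        for d_xj, psi_xj in zip(d_row, psi_row):
            if d_xj > 0:
                total += d_xj * math.log(psi_xj)
    return total

# Hypothetical 2 x 2 example: two age groups, two causes.
d = [[50, 8], [21, 3]]
n = sum(sum(row) for row in d)
psi_matched = [[v / n for v in row] for row in d]   # cell proportions of the 'data'
psi_other = [[0.5, 0.2], [0.2, 0.1]]                # another probability table
better = log_pseudo_likelihood(d, psi_matched) > log_pseudo_likelihood(d, psi_other)
```

Among all probability tables summing to one, the function is largest when $\psi_{xj}$ equals the observed cell proportions, which is why a model whose predicted cell probabilities track the 'data' closely attains a larger value of (4.2). A search routine would call such a function repeatedly with $\psi$ recomputed from each trial $\boldsymbol{\theta}$.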
4.4
Parametric Forms of the Models
The final and most important step in the model fitting process is specification of the parametric form of the distributions of the component random variables. These distributions should be suggested from the literature on the diseases under study. Since the actual distributions of the component random variables cannot be observed (because of the effect of competing risks) we must rely on theoretical arguments or assumptions in choosing initially among possible candidate distribution functions. Continued research with these models using different assumed distributions may make clearer the appropriateness of certain distributions for these types of models.
Since it is not feasible to consider, in this dissertation, all combinations of parametric distributions for the component random variables and since the literature on Cerebrovascular disease and Ischemic heart disease does not suggest specifically appropriate forms, it was decided to use the Gompertz and Gamma distributions for this initial study. The Gompertz distribution has a long history of use in the analysis of human mortality and has also been introduced in connection with the theory of aging and mortality (Mildvan and Strehler 1960). The Gamma distribution has had a much wider use in statistical reliability theory and is related to the Poisson process. Since the Gompertz distribution is left-skewed and the Gamma distribution is right-skewed, comparisons between the fit of models based on these distributions should indicate the degree of sensitivity of this model to fairly different kinds of assumed distributions.
Using the Gompertz and Gamma distributions we consider the following three forms of the model.
4.4.1
Model 1 - Proportional Hazard Gompertz Independent Action Model
As Model 1 we use the Independent Action model of section 3.5 and assume the component random variables to be Gompertz distributed with common 'a' parameter and different 'R' parameters. The Gompertz distribution is of the form
$$F_{Y_i}(y_i; a, R_i) = 1 - \exp\left\{ \frac{R_i}{a} \left(1 - e^{a y_i}\right) \right\} ,$$
with density function
$$f_{Y_i}(y_i; a, R_i) = R_i \exp\left\{ a y_i + \frac{R_i}{a} \left(1 - e^{a y_i}\right) \right\} ,$$
and hazard function
$$h_{Y_i}(y_i; a, R_i) = R_i \exp\{a y_i\} .$$
The first two moments of the Gompertz distribution, $\mu'_1$ (4.3) and $\mu'_2$ (4.4), are expressed in terms of $\gamma$ = Euler's constant = 0.57721..., $\pi$ = 3.14159..., and an infinite series over $k = 1, 2, \ldots$ (Hoel 1972).
The assumption of a common 'a' parameter is made to reduce the number of parameters in the model. This assumption also implies that the hazard functions for the component random variables are proportional, since
$$\frac{h_{Y_i}(t)}{h_{Y_{i'}}(t)} = \frac{R_i e^{at}}{R_{i'} e^{at}} = \frac{R_i}{R_{i'}} , \qquad i \ne i' ,\ \ i, i' = 1,2,3,4 .$$
We will call this model the Proportional Hazard Gompertz Independent Action Model to remind us of the form of the model and to help distinguish it from other forms of the model. The parameter set for this model is $\boldsymbol{\theta}^{(1)} = \{a, R_1, R_2, R_3, R_4\}$.
The sum of two independent Gompertz distributed random variables does not have a Gompertz distribution, hence neither the (marginal) distributions of the ages at onset ($T_1$ and $T_2$) nor of the ages at death ($X_1$ and $X_2$) will be Gompertz. Numerical approximation of the convolution of Gompertz distributions as well as numerical evaluation of the crude probabilities, $Q_j(x)$, is necessary to fit Model 1. Both Simpson's rule and (closed form) Newton-Cotes formula (Abramowitz and Stegun 1972) were used for this purpose.
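The numerical convolution can be sketched as follows. This is an illustration only, assuming composite Simpson's rule on a fixed grid; it is not the program used for the fits, and the function names, grid sizes and parameter values are hypothetical. It evaluates $\Pr\{Y_1 + Y_3 \le t\} = \int_0^t f_{Y_1}(u)\,F_{Y_3}(t-u)\,du$ for two Gompertz components with a common 'a'.

```python
import math

def gompertz_sf(y, a, R):
    """Survival function S(y) = exp{(R/a)(1 - e^{a y})}."""
    return math.exp((R / a) * (1.0 - math.exp(a * y)))

def gompertz_pdf(y, a, R):
    """Density f(y) = R e^{a y} S(y), i.e. hazard times survival."""
    return R * math.exp(a * y) * gompertz_sf(y, a, R)

def simpson(f, lo, hi, n=400):
    """Composite Simpson's rule with n (made even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3.0

def sum_cdf(t, a, R1, R3, n=400):
    """Pr{Y1 + Y3 <= t} by numerical convolution of density and distribution function."""
    integrand = lambda u: gompertz_pdf(u, a, R1) * (1.0 - gompertz_sf(t - u, a, R3))
    return simpson(integrand, 0.0, t, n)

# Hypothetical parameter values (assumptions, not the fitted estimates).
a, R1, R3 = 0.09, 2e-5, 0.13
total_mass = simpson(lambda y: gompertz_pdf(y, a, R1), 0.0, 200.0)
F_sum_90 = sum_cdf(90.0, a, R1, R3)
```

The same quadrature routine can be reused for the crude probabilities, since each $Q_j(x)$ is an integral of a density against a survival function over $[0, x]$.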
4.4.2
Model 2 - Proportional Hazard Gompertz Onset Dependence Model
We now consider the Onset Dependence model of Section 3.3. As in the previous model, we assume each component random variable is Gompertz distributed with common 'a' parameter and different '$R_i$' parameters. The parameter set for this model is $\boldsymbol{\theta}^{(2)} = \{a, R_0, R_1, R_2, R_3, R_4\}$.
A comparison of the fit of the Proportional Hazard Gompertz Independent Action Model (Model 1) and the Proportional Hazard Gompertz Onset Dependence Model (Model 2) is useful in evaluating the appropriateness of the onset dependence assumption for the two diseases under study. Since the parameter set for Model 2 has one more element than the parameter set for Model 1 we expect a better fit from Model 2. It is important to assess whether the difference in fit between these two models is greater than could be explained by the introduction of an additional parameter. The unexplained increase in fit would then be assumed to be the result of incorporation of the onset dependence concept into the model.
4.4.3
Model 3 - Gamma Onset Dependence Model
To investigate the sensitivity of the onset-dependence model to different assumptions about component random variable distributions, we consider the Onset Dependence model of section 3.3 using the assumption of Gamma distributed random variables. To reduce the number of parameters in the model and simplify somewhat the mathematics, we assume a common scale parameter '$\beta$' but different shape parameters '$\alpha_i$'.
The Gamma distribution has density function of the form (Johnson and Kotz, 1970)
$$f_{Y_i}(y_i; \alpha_i, \beta) = \frac{y_i^{\alpha_i - 1} e^{-y_i/\beta}}{\beta^{\alpha_i}\,\Gamma(\alpha_i)} , \qquad 0 < \alpha_i ,\ 0 < \beta ,\ 0 \le y_i < \infty ,$$
where $\Gamma(\alpha)$ is the Gamma function. The first two moments of the Gamma distribution are given as:
$$\mu'_1 = \alpha\beta \qquad (4.5)$$
$$\mu'_2 = \alpha\beta^2 + (\alpha\beta)^2 . \qquad (4.6)$$
The sum of two independent Gamma distributed random variables having common scale parameter '$\beta$' is also a Gamma distributed random variable. Hence the waiting times random variables have Gamma (marginal) distributions. Numerical approximation of the Gamma distribution was performed using a continued fraction representation (computer program MDGAM, IMSL 1978). Methods similar to those used for the Gompertz models, Models 1 and 2, are used to calculate numerically the crude probabilities, $Q_j(x)$, for Model 3.
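The closure of the Gamma family under convolution with a common scale can be checked numerically. The sketch below is an illustration, not the MDGAM routine: it compares a midpoint-rule convolution of two Gamma densities with the Gamma density whose shape is the sum of the two shapes. The function names are hypothetical; the numerical values are the Model 3 estimates $\beta$, $\alpha_0$ and $\alpha_1$ reported in Table 4.3, used here only as convenient inputs.

```python
import math

def gamma_pdf(y, alpha, beta):
    """Gamma density with shape alpha and scale beta, computed on the log scale."""
    if y <= 0.0:
        return 0.0
    log_f = (alpha - 1.0) * math.log(y) - y / beta - alpha * math.log(beta) - math.lgamma(alpha)
    return math.exp(log_f)

def convolve_at(t, alpha1, alpha2, beta, n=2000):
    """Midpoint-rule approximation of the density of the sum at t:
    integral over [0, t] of f1(u) * f2(t - u) du."""
    h = t / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += gamma_pdf(u, alpha1, beta) * gamma_pdf(t - u, alpha2, beta)
    return h * total

beta, alpha0, alpha1 = 2.690, 12.581, 18.925
numeric = convolve_at(80.0, alpha0, alpha1, beta)       # density of Y0 + Y1 at age 80
closed_form = gamma_pdf(80.0, alpha0 + alpha1, beta)    # Gamma(alpha0 + alpha1, beta) density
```

Agreement of the two values illustrates why, under Model 3, the onset time $T_1 = Y_0 + Y_1$ needs no numerical convolution: its distribution is available directly from the Gamma distribution function.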
Comparisons between the fits of the Proportional Hazard Gompertz Onset Dependence Model (Model 2) and the Gamma Onset Dependence Model (Model 3) should indicate the degree of sensitivity of the Onset-Dependence model to different assumed component distributions. Because mortality data are usually left skewed, we would expect Model 2 to have the better fit. But it was not clear, initially, that assuming Gamma distributed component random variables must cause the crude probabilities, $Q_j(x)$, to be right skewed also, although it is likely they would be.
4.5
Parameter Estimates and Evaluation of Model Fit
The forms of the models discussed in the previous section were fit to the life table deaths assigned to Cerebrovascular disease and Ischemic heart disease. Parameter estimates obtained using the pseudo-likelihood function of section 4.3 are presented in Table 4.3.
Table 4.3.  Parameter Estimates for the Initial Model Forms

Model 1                          Model 2                          Model 3
Proportional Hazard Gompertz     Proportional Hazard Gompertz     Gamma
Independent Action Model         Onset Dependence Model           Onset Dependence Model

a   = 9.026 x 10^-2              a   = 1.316 x 10^-1              beta    = 2.690
                                 R_0 = 1.313 x 10^-4              alpha_0 = 12.581
R_1 = 1.897 x 10^[illegible]     R_1 = 8.671 x 10^-4              alpha_1 = 18.925
R_2 = 8.079 x 10^-5              R_2 = 3.721 x 10^-4              alpha_2 = 13.937
R_3 = 1.296 x 10^-1              R_3 = 1.928 x 10^-1              alpha_3 = 1.396
R_4 = 5.340 x 10^-1              R_4 = 7.401 x 10^-1              alpha_4 = 0.589

To evaluate the fit of these models to 'data' we will consider three questions:
1) How well do the models predict the distribution of age at death for those individuals who die from cause $C_j$?
2) How well do the models predict the distribution of death among the four causes $C_j$, $j = 1,2,3,4$, in each age group?
3) How consistent are the models in their estimates of the means, standard deviations, and covariances of (potential) waiting times and component random variables?
4.5.1
Distribution of Age at Death Conditional on Cause
Let
$${}_5d_{\cdot j} = \sum_{x \in M} {}_5d_{xj} , \qquad M = \{0, 5, 10, \ldots, 105\} ,$$
represent the total number of (life table) deaths 'observed' in subpopulation $C_j$, $j = 1,2,3,4$. The proportion of these deaths which occur in age interval $[x, x+5)$ is
$$p^o_{xj} = \frac{{}_5d_{xj}}{{}_5d_{\cdot j}} . \qquad (4.7)$$
Since the sizes of the age intervals are all equal, the $p^o_{xj}$ represent the histogram ('empirical' density) of the age at death for individuals in subpopulation $C_j$.
From the models, we can estimate the probability of dying in age interval $[x, x+5)$, conditional on dying from cause $C_j$, as
$$\hat{p}_{xj} = \frac{Q_j(x+5; \hat{\boldsymbol{\theta}}) - Q_j(x; \hat{\boldsymbol{\theta}})}{Q_j(\infty; \hat{\boldsymbol{\theta}})} = \frac{\psi_{xj}(\hat{\boldsymbol{\theta}})}{Q_j(\infty; \hat{\boldsymbol{\theta}})} . \qquad (4.8)$$
The 'observed' distribution of deaths, $p^o_{xj}$ (histograms), and predicted distributions of deaths, $\hat{p}_{xj}$ (continuous lines), from causes $C_j$, $j = 1,2,3,4$, for the three models are plotted in Figures 4.1 to 4.3. A review of these plots indicates:
Figure 4.1.  Plots of Empirical and Estimated Conditional Densities by Cause Category - Proportional Hazard Gompertz Independent Action Model - Model 1. [Four panels, one per cause $C_1$ to $C_4$; horizontal axis: age.]
Figure 4.2.  Plots of Empirical and Estimated Conditional Densities by Cause Category - Proportional Hazard Gompertz Onset Dependence Model - Model 2. [Four panels, one per cause $C_1$ to $C_4$; horizontal axis: age.]
Figure 4.3.  Plots of Empirical and Estimated Conditional Densities by Cause Category - Gamma Onset Dependence Model - Model 3. [Four panels, one per cause $C_1$ to $C_4$; horizontal axis: age.]
1) The Proportional Hazard Gompertz Onset Dependence Model (Model 2) seems to give the best overall fit of the three models considered.
2) The Gamma Onset Dependence Model (Model 3) predicts right skewed distributions of age at death whereas observed distributions are left skewed. This suggests it is inappropriate to assume all of the component random variables have Gamma distributions with common '$\beta$' parameters.
3) The major differences in fit between the Gompertz Models (Models 1 and 2) occur for causes $C_2$ and $C_4$ (CVD alone and IHD alone). It is unlikely that the better fit of Model 2 is due only to the added freedom afforded by an additional parameter. The dependence of onset times might be expected to affect the distribution of age at death where both diseases are involved in the death (causes $C_2$ and $C_4$). That both forms of the Onset Dependence Model (Models 2 and 3) predict the distribution of age at death from causes $C_2$ and $C_4$ better than the Proportional Hazard Gompertz Independent Action Model does lend strength to our assumption of dependent times to onset for the two diseases under study.
4) All three of the models failed to predict adequately the age at death distribution for cause $C_1$. It is quite possible that this failure of the models may be due to the restrictions placed on the model parameters. This suggestion will be examined in section 4.6.
Thus, an evaluation of the three models with respect to their ability to predict the age at death distributions for deaths from the four causes indicates that the Proportional Hazard Gompertz Onset Dependence Model gave the best fit. This does not mean that Model 2 is necessarily a good fit. It simply indicates that it seems to be a better candidate for the structure of distributions of component random variables for future models than the other models considered.
4.5.2
Distribution of Causes of Death in a Given Age Group
Continuing the analysis of the three models we now wish to examine how well these models predict the distribution of deaths to the four cause categories, conditional on dying in a given age interval (i.e. given death occurs in a specified age interval, how well does the model predict the probability of death being due to a given cause).
In the age interval $[x, x+5)$ the proportion of 'total deaths' (i.e. those deaths for which CVD or IHD is the underlying cause) which are classified as due to cause $C_j$, denoted by $\pi^o_{xj}$, is calculated as:
$$\pi^o_{xj} = \frac{{}_5d_{xj}}{\sum_{k=1}^{4} {}_5d_{xk}} = \frac{{}_5D_{xj}}{\sum_{k=1}^{4} {}_5D_{xk}} . \qquad (4.9)$$
The two ratios agree because, within an age group, the life table deaths are proportional to the observed deaths. Given that the model is correct, we calculate the 'expected' proportion, $\hat{\pi}_{xj}$, as (see Table 4.4)
$$\hat{\pi}_{xj} = \frac{\psi_{xj}(\hat{\boldsymbol{\theta}})}{\sum_{k=1}^{4} \psi_{xk}(\hat{\boldsymbol{\theta}})} \qquad (4.10)$$
Table 4.4.  Proportion of 'total deaths' from different causes

Model 1 = Proportional Hazard Gompertz Independent Action Model
Model 2 = Proportional Hazard Gompertz Onset Dependence Model
Model 3 = Gamma Onset Dependence Model

Cause C1: 'observed' proportion and predicted proportions
x to x+5   'Observed'   Model 1   Model 2   Model 3
0-5           .8605      .0906     .0930     .0000
5-10          .8604      .1194     .1162     .0000
10-15         .9184      .1349     .1277     .0000
15-20         .8046      .1425     .1333     .0002
20-25         .5433      .1464     .1360     .0009
25-30         .4169      .1485     .1369     .0025
30-35         .2052      .1497     .1370     .0060
35-40         .1100      .1504     .1369     .0120
40-45         .0771      .1507     .1370     .0211
45-50         .0656      .1507     .1373     .0334
50-55         .0641      .1504     .1379     .0480
55-60         .0687      .1499     .1386     .0638
60-65         .0812      .1489     .1392     .0794
65-70         .0996      .1474     .1391     .0935
70-75         .1251      .1453     .1381     .1055
75-80         .1510      .1424     .1366     .1151
80-85         .1742      .1391     .1347     .1222
85-90         .1885      .1360     .1330     .1271
90-95         .1856      .1341     .1319     .1302
95-100        .1930      .1335     .1314     .1320
100-105       .1789      .1334     .1313     .1331
105+          .1176      .1334     .1313     .1336
Total         .1225      .1448     .1371     .0942

Table 4.4.  continued
Cause C2: 'observed' proportion and predicted proportions
x to x+5   'Observed'   Model 1   Model 2   Model 3
0-5           .0         .0000     .0004     .0000
5-10          .0         .0000     .0008     .0000
10-15         .0         .0000     .0013     .0000
15-20         .0         .0000     .0019     .0000
20-25         .0         .0001     .0028     .0000
25-30         .0035      .0001     .0036     .0000
30-35         .0027      .0002     .0050     .0000
35-40         .0046      .0003     .0058     .0002
40-45         .0037      .0005     .0064     .0007
45-50         .0051      .0008     .0068     .0017
50-55         .0054      .0012     .0073     .0034
55-60         .0096      .0019     .0079     .0058
60-65         .0116      .0029     .0089     .0087
65-70         .0178      .0044     .0103     .0119
70-75         .0252      .0066     .0121     .0151
75-80         .0312      .0094     .0142     .0181
80-85         .0361      .0127     .0163     .0205
85-90         .0391      .0158     .0182     .0224
90-95         .0376      .0177     .0194     .0237
95-100        .0311      .0184     .0199     .0246
100-105       .0315      .0184     .0200     .0251
105+          .0         .0184     .0200     .0253
Total         .0223      .0070     .0120     .0134

Table 4.4.  continued
Cause C3: 'observed' proportion and predicted proportions
x to x+5   'Observed'   Model 1   Model 2   Model 3
0-5           .1395      .9092     .9048    1.0000
5-10          .1162      .8804     .8800     .9999
10-15         .0816      .8648     .8666     .9999
15-20         .1954      .8571     .8583     .9997
20-25         .4488      .8531     .8520     .9990
25-30         .5618      .8507     .8464     .9973
30-35         .7865      .8492     .8417     .9937
35-40         .8771      .8480     .8383     .9869
40-45         .9856      .8470     .8360     .9758
45-50         .9113      .8457     .8339     .9597
50-55         .9080      .8440     .8314     .9389
55-60         .8730      .8416     .8279     .9148
60-65         .8689      .8381     .8231     .8892
65-70         .8331      .8329     .8173     .8641
70-75         .7824      .8256     .8108     .8414
75-80         .7388      .8160     .8037     .8223
80-85         .6984      .8048     .7965     .8073
85-90         .6794      .7943     .7904     .7964
90-95         .6918      .7877     .7865     .7892
95-100        .7051      .7855     .7849     .7847
100-105       .7473      .7852     .7845     .7821
105+          .8823      .7852     .7845     .7808
Total         .7963      .8244     .8124     .8589

Table 4.4.  continued
Cause C4: 'observed' proportion and predicted proportions
x to x+5   'Observed'   Model 1   Model 2   Model 3
0-5           .0         .0000     .0016     .0000
5-10          .0232      .0000     .0028     .0000
10-15         .0         .0001     .0043     .0000
15-20         .0         .0002     .0063     .0000
20-25         .0078      .0003     .0091     .0000
25-30         .0176      .0004     .0126     .0000
30-35         .0055      .0007     .0161     .0002
35-40         .0081      .0011     .0188     .0008
40-45         .0134      .0017     .0205     .0022
45-50         .0178      .0027     .0218     .0050
50-55         .0223      .0042     .0233     .0094
55-60         .0286      .0065     .0254     .0154
60-65         .0381      .0099     .0286     .0226
65-70         .0494      .0151     .0331     .0302
70-75         .0670      .0223     .0388     .0377
75-80         .0787      .0319     .0454     .0443
80-85         .0912      .0432     .0523     .0498
85-90         .0928      .0537     .0582     .0539
90-95         .0848      .0603     .0621     .0567
95-100        .0706      .0624     .0636     .0585
100-105       .0421      .0627     .0640     .0596
105+          .0         .0627     .0640     .0601
Total         .0588      .0236     .0383     .0330

where we recall that
occurs in
[x. x+S)
WXj(~)
represents the probability that death
C..
and is from cause
4
0
I nxk
4
=
I nxk
Note that
J
= 1
.
k=l
k""l
4
L SOxk deaths are observed in age interval
k=l
as due to CVD or IHD as underlying cause, we calculate the
Given that
[x, x + S)
SO; =
predicted number of these deaths from cause
C.
J
as
n xj • 5 0*x
(4.11)
(see Table 4.5).
A 'pseudo-chi square' value (denoted by $PX^2$) can be calculated
for each age interval to measure agreement between observed and
expected proportions. This measure is denoted 'pseudo' for the same
reasons we used the term pseudo-likelihood function. The pseudo-chi
square value for age interval $[x, x+5)$, denoted $PX^2_x$, is
calculated as

$$PX^2_x = {}_5D^*_x \sum_{j=1}^{4} \frac{(\pi^0_{xj} - \hat{\pi}_{xj})^2}{\hat{\pi}_{xj}} \qquad (4.12)$$

Since the observed deaths in each age interval are (conditionally)
independent, we simply sum the age specific pseudo-chi square values
to obtain an overall measure of fit. The age specific and overall
measures are used to compare the fit of the three models (see Table
4.6). No formal test of goodness of fit is used because the
distribution of the pseudo-chi square is unknown.
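The calculation in (4.11) and (4.12) can be sketched in a few lines of Python. This is only an illustration, not the program used for the thesis: the function names are hypothetical, the summand is written in the usual Pearson form on counts (which equals the proportion form once predicted counts are obtained from (4.11)), and skipping cells with a predicted count of zero is an assumption made so that rounded table entries do not cause a division by zero. The example numbers are the 65-70 row of Table 4.5 for Model 3.

```python
def predicted_deaths(proportions, total_deaths):
    """Equation (4.11): predicted deaths = predicted proportion * observed total."""
    return [p * total_deaths for p in proportions]


def pseudo_chi_square(observed, predicted):
    """Sum of (observed - predicted)^2 / predicted over the cause categories.

    Assumption: cells with a predicted count of zero are skipped.
    """
    total = 0.0
    for obs, pred in zip(observed, predicted):
        if pred > 0:
            total += (obs - pred) ** 2 / pred
    return total


# Age interval 65-70, causes C1..C4, as read from Table 4.5 (Model 3).
observed_65_70 = [4473, 799, 37387, 2217]
model3_65_70 = [4199.1, 538.2, 38780.7, 1358.1]

px2 = pseudo_chi_square(observed_65_70, model3_65_70)
print(round(px2, 1))
```

The printed value is close to the entry 737.5 reported for Model 3 in the 65-70 row of Table 4.6, which suggests that the tabulated values follow this form.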
Table 4.5. Observed and predicted number of deaths from
different causes conditional on age at death

Model 1 = Proportional Hazard Gompertz Independent Action Model
Model 2 = Proportional Hazard Gompertz Onset Dependence Model
Model 3 = Gamma Onset Dependence Model

Cause $C_1$: observed ${}_5D_{x1}$ (Data) and predicted ${}_5\hat{D}_{x1}$

[x, x+5)     Data     Model 1    Model 2    Model 3
0-5            74        7.8        8.0        0.0
5-10           37        5.1        5.0        0.0
10-15          45        6.6        6.3        0.0
15-20          70       12.4       11.6        0.0
20-25          69       18.6       17.3        0.1
25-30         118       42.0       38.8        0.7
30-35         149      108.7       99.5        4.4
35-40         258      352.8      321.1       28.2
40-45         475      927.7      843.3      130.3
45-50         824     1890.9     1722.8      419.5
50-55        1285     3017.2     2765.3      964.2
55-60        2026     4420.5     4088.7     1883.3
60-65        3124     5725.6     5351.9     3052.6
65-70        4473     6617.5     6243.5     4199.1
70-75        6267     7278.6     6921.2     5288.6
75-80        7658     7223.1     6925.7     5836.8
80-85        7177     5732.7     5551.1     5034.5
85-90        4541     3278.0     3205.2     3061.8
90-95        1634     1180.9     1161.3     1146.3
95-100        366      253.2      249.3      250.4
100-105        34       25.4       25.0       25.3
105+            2        2.3        2.2        2.3
Total       40706    48127.8    45564.0    31328.6
Table 4.5, continued

Cause $C_2$: observed ${}_5D_{x2}$ (Data) and predicted ${}_5\hat{D}_{x2}$

[x, x+5)     Data     Model 1    Model 2    Model 3
0-5             0        0.0        0.0        0.0
5-10            0        0.0        0.0        0.0
10-15           0        0.0        0.1        0.0
15-20           0        0.0        0.2        0.0
20-25           0        0.0        0.4        0.0
25-30           1        0.0        1.1        0.0
30-35           2        0.2        3.7        0.0
35-40          11        0.8       13.8        0.6
40-45          23        3.2       39.5        4.7
45-50          64       10.1       85.8       22.5
50-55         110       25.0      146.5       69.8
55-60         284       56.7      235.3      172.7
60-65         449      113.3      345.4      337.5
65-70         799      200.2      466.0      538.2
70-75        1267      330.9      609.4      761.2
75-80        1586      478.9      722.1      918.9
80-85        1488      525.9      675.4      847.6
85-90         944      382.0      439.7      540.9
90-95         331      156.6      171.1      209.3
95-100         59       34.9       37.8       46.7
100-105         6        3.5        3.8        4.8
105+            0        0.3        0.3        0.4
Total        7424     2322.6     3997.4     4476.0
Table 4.5, continued

Cause $C_4$: observed ${}_5D_{x4}$ (Data) and predicted ${}_5\hat{D}_{x4}$

[x, x+5)     Data     Model 1    Model 2    Model 3
0-5             0        0.0        0.1        0.0
5-10            1        0.0        0.1        0.0
10-15           0        0.0        0.2        0.0
15-20           0        0.0        0.5        0.0
20-25           1        0.0        1.2        0.0
25-30           5        0.1        3.6        0.0
30-35           4        0.5       11.7        0.2
35-40          19        2.7       44.1        1.9
40-45          83       10.8      126.3       13.9
45-50         224       34.2      273.9       63.2
50-55         449       84.6      467.7      189.6
55-60         845      192.1      751.0      455.8
60-65        1466      383.7     1101.9      869.1
65-70        2217      677.7     1486.3     1358.1
70-75        3360     1120.3     1943.5     1888.9
75-80        3994     1621.6     2303.4     2249.2
80-85        3759     1781.0     2155.4     2052.5
85-90        2237     1294.4     1403.7     1299.4
90-95         747      531.0      546.7      499.8
95-100        134      118.5      120.7      111.1
100-105         8       11.9       12.2       11.3
105+            0        1.1        1.1        1.0
Total       19553     7866.1    12755.5    11065.1
Table 4.5, continued

Cause $C_3$: observed ${}_5D_{x3}$ (Data) and predicted ${}_5\hat{D}_{x3}$

[x, x+5)     Data     Model 1    Model 2    Model 3
0-5            12       78.2       77.8       86.0
5-10            5       37.9       37.8       43.0
10-15           4       42.4       42.5       49.0
15-20          17       74.6       74.7       87.0
20-25          57      108.4      108.2      126.9
25-30         159      240.8      239.6      282.2
30-35         571      616.6      611.3      721.4
35-40        2057     1988.8     1965.9     2314.3
40-45        5574     5213.3     5145.8     6006.1
45-50       11433    10609.8    10462.5    12039.8
50-55       18208    16925.2    16672.5    18828.3
55-60       26334    24819.7    24414.0    26977.2
60-65       33403    32219.2    31642.6    34182.6
65-70       37387    37380.7    36680.3    38780.7
70-75       39190    41354.3    40609.9    42145.2
75-80       37456    41370.3    40742.7    41689.1
80-85       28770    33154.4    32812.1    33259.3
85-90       16367    19134.6    19040.4    19186.9
90-95        6090     6933.5     6922.9     6946.6
95-100       1337     1489.4     1488.2     1487.8
100-105       142      149.2      149.1      148.6
105+           15       13.3       13.3       13.3
Total      264587   273954.5   269954.1   285401.3
Table 4.6. Age Specific and Total Pseudo Chi-Squared Values

Model 1 = Proportional Hazard Gompertz Independent Action Model
Model 2 = Proportional Hazard Gompertz Onset Dependence Model
Model 3 = Gamma Onset Dependence Model

Age [x, x+5)    Model 1    Model 2    Model 3
0-5               617.9      599.9    [illegible]
5-10              473.2      239.6    [illegible]
10-15             257.5      274.9    [illegible]
15-20             311.8      339.1    [illegible]
20-25             183.6      179.5    41879.3
25-30             361.2      189.6    21284.5
30-35              62.1       33.0     4998.9
35-40             260.7       31.4     2232.9
40-45             850.5      218.2     1358.1
45-50            2008.5      573.5      906.3
50-55            2950.4      943.6      504.9
55-60            4519.8     1213.4      430.2
60-65            5273.0     1176.7      466.2
65-70            5982.4     1112.7      737.5
70-75            7379.5     1853.4     1869.8
75-80            6426.5     2616.8     2836.0
80-85            4900.8     3144.7     3420.2
85-90            2400.1     2005.1     2105.9
90-95             558.5      515.3      506.1
95-100             84.4       83.3       76.5
100-105             6.3        6.2        4.5
105+                1.6        1.6        1.7
Total           45870.3    17352.9    4.8 x 10^9
The pseudo-chi square values could also be calculated using the
life table deaths in each age interval, i.e. using, for interval
$[x, x+5)$, the deaths ${}_5d^*_x = \sum_{k=1}^{4} {}_5d_{xk}$ (see
equation 4.9). Unlike the observed deaths, ${}_5D^*_x$, the number of
deaths from the life table, ${}_5d^*_x$, depends upon the value of the
radix of the table. The pseudo-chi square values using the life table
deaths are indeed proportional to the values using the observed
deaths, but by using the observed deaths we give the pseudo-chi square
some degree of objectivity for our model comparisons.
A review of the age-specific and overall pseudo-chi square values
for the three models under study indicates

1) The Proportional Hazard Gompertz Onset Dependence Model
(Model 2) gives the best overall fit among the three models studied
but it does not give the best fit in every age interval.

2) The Gamma Onset Dependence Model (Model 3) fits (i.e. predicts
proportions) best in the age intervals [50 - 70) and [90 - 105),
although in the latter interval the fit is not much better than that
of the Proportional Hazard Gompertz Onset Dependence Model (Model 2)
(and little weight is attached to the fit of the models at the
advanced ages).

3) Model 3 does not provide a very good fit to the data in the
[0 - 45) age intervals.

4) The differences in fit between the Proportional Hazard
Gompertz Independent Action and Onset Dependence Models may be no
more than what we would expect from the added freedom afforded by
the extra parameter in the Onset Dependence model. But, judging
from the magnitude of this difference, it seems that the better fit
of Model 2 can be ascribed, in part, to a real improvement consequent
upon insertion of the dependency concept into the model.

5) The differences in fit between the two forms of the Onset
Dependence model suggest that the Onset Dependence model is fairly
sensitive to the choice of distributions for the component random
variables. This also agrees with the conclusions of the previous
section.
Additional insight into the fit of these models may be obtained
from Tables 4.4 and 4.5. In particular we find:

1) All three models tend to overestimate the proportion of
'total deaths' which are due to causes $C_1$ and $C_3$ (i.e. where A
and B are acting alone), thus underestimating the proportion due to
causes $C_2$ and $C_4$ (where A and B are both present).

2) In the age interval [0 - 35) the observed data indicate
most deaths are due to cause $C_1$ (CVD alone) whereas the models
predict most deaths to be from cause $C_3$ (IHD alone).

3) The models fail to account for the increase and subsequent
decrease in the proportions with increasing age of death from cause
$C_1$ and the mirror changes for the proportions of deaths from cause
$C_3$.

4) The overall lack of fit of the Gamma Onset Dependence Model
in young age groups reflects the failure of this model to predict the
proportions of deaths from causes $C_1$ and $C_3$ correctly.
Thus an evaluation of the three models with respect to their
ability to predict the proportion of 'total deaths' (in a specified
age group) observed as due to each of the four causes suggests that
the Proportional Hazard Gompertz Onset Dependence Model gives the best
fit. This agrees with the conclusions of the last section. Again we
note that Model 2 gives the best fit of the three models studied but
it is not necessarily a 'good' fit.

4.5.3  Estimated Component and Waiting Time Random Variables
Distributions
Additional comparisons of the fit of the three models are possible
if we examine the forms of the (marginal) distributions of the
component and waiting times random variables. Specifically, using
equations (4.3) to (4.6) we can calculate the means and standard
deviations of the component random variables assuming the parameter
estimates as given from Table 4.3 (see Table 4.7).

Table 4.7. Estimated Theoretical Component Means
and Standard Deviations

Model 1 = Proportional Hazard Gompertz Independent Action Model
Model 2 = Proportional Hazard Gompertz Onset Dependence Model
Model 3 = Gamma Onset Dependence Model

                 Model 1              Model 2              Model 3
Component    mean   std. dev.     mean   std. dev.     mean   std. dev.
$Y_0$                             48.2      9.6        33.9      9.6
$Y_1$        87.4     14.1        34.0      9.1        51.1     11.7
$Y_2$        71.4     14.0        23.6      8.3        37.6     10.1
$Y_3$         5.1      3.8         3.5      2.6         3.8      3.2
$Y_4$         1.6      1.4         1.2      1.0         1.6      2.1
Use of the Gompertz and Gamma distributions in the Onset Dependence
model represents quite different assumptions on the distribution of
potential time spent in the different stages of the disease process.
This is clearly illustrated by the values in Table 4.7. If we are
willing to assume that the interpretation of the component random
variables conjectured in section 3.3.1 (in particular that component
$Y_0$ represents the time spent in the 'well' stage) is valid, we find
that the Proportional Hazard Gompertz Onset Dependence Model
represents a situation where the two diseases have fairly late
initiation whereas the Gamma Onset Dependence Model represents
diseases with fairly early initiation. A comparison of the results of
the fitted models with current medical assumptions as to the mean ages
for initiation (or other events) could help in deciding the
appropriateness of different forms of the model. Further research in
this area is needed.
Despite the differences in mean component values, the three
models lead to fairly similar estimates for the mean (potential) ages
at onset and mean (potential) ages at death (see Table 4.8). In
particular, for deaths due to Ischemic heart disease as underlying
cause, the mean (potential) age at onset and age at death ($T_2$ and
$X_2$ random variables) estimated from the three models are almost
identical. Differences do occur in the estimated means for the random
variables $T_1$ and $X_1$. It is possible that these differences are
the result of the small proportion of deaths which are due to
Cerebrovascular disease (about 14%) affecting the ability of the
fitting procedure to estimate those model parameters associated with
$Y_3$ and $Y_4$. Exactly why all three models yield similar estimates
for the random variable means is unknown. Further research on this
finding should be rewarding and is necessary to a full understanding
of these models.
Table 4.8. Estimated Means and Standard Deviations of
Waiting Times Random Variables

Model 1 = Proportional Hazard Gompertz Independent Action Model
Model 2 = Proportional Hazard Gompertz Onset Dependence Model
Model 3 = Gamma Onset Dependence Model

Random           Model 1              Model 2              Model 3
Variable     mean   std. dev.     mean   std. dev.     mean   std. dev.
$T_1$        87.4     14.1        82.2     13.2        85.0     15.1
$T_2$        71.4     14.0        71.8     12.7        71.5     13.9
$X_1$        92.5     14.6        85.7     13.5        88.8     15.5
$X_2$        73.0     14.1        73.0     12.7        73.1     14.0
The estimated correlations of the waiting times random variables
are presented in Table 4.9. These correlations present another vehicle
for making comparisons among the three models. For example, all three
models have high correlations between the age at onset and age at
death random variables (corr($T_1$,$X_1$) and corr($T_2$,$X_2$)). This
is the result of relatively small estimated variances for components
$Y_3$ and $Y_4$ as compared to the estimated variance of random
variables $T_1$ and $T_2$. In addition, for the Proportional Hazard
Gompertz Onset Dependence Model (Model 2), component $Y_0$ has the
largest estimated variance, hence the remaining correlations have
values greater than 0.5. For the Gamma Onset Dependence Model
(Model 3), components $Y_1$ and $Y_2$ have the largest estimated
variances, hence the correlations between $T_1$ and $T_2$ as well as
between $X_1$ and $X_2$ are less than 0.5.
Table 4.9. Estimated Correlations Between Waiting Times
Random Variables

Model 1 = Proportional Hazard Gompertz Independent Action Model
Model 2 = Proportional Hazard Gompertz Onset Dependence Model
Model 3 = Gamma Onset Dependence Model

                       Model 1    Model 2    Model 3
Corr($T_1$,$T_2$)                  0.547      0.435
Corr($T_1$,$X_1$)       0.965      0.981      0.978
Corr($T_2$,$X_1$)                  0.537      0.426
Corr($T_1$,$X_2$)                  0.513      0.430
Corr($T_2$,$X_2$)       0.995      0.997      0.989
Corr($X_1$,$X_2$)                  0.535      0.421
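The way the component moments of Table 4.7 carry over to Tables 4.8 and 4.9 can be illustrated with a short Python sketch. It rests on assumptions that are inferred from the tabulated values rather than quoted from the text: the components are taken to be independent, and the waiting times are taken to be the sums $T_1 = Y_0 + Y_1$, $T_2 = Y_0 + Y_2$, $X_1 = T_1 + Y_3$ and $X_2 = T_2 + Y_4$. The function name is hypothetical.

```python
import math


def waiting_time_summary(components):
    """components: dict name -> (mean, std. dev.) for independent Y0..Y4.

    Assumed structure (inferred from the tables, not quoted from the text):
      T1 = Y0 + Y1, T2 = Y0 + Y2, X1 = T1 + Y3, X2 = T2 + Y4.
    """
    mean = {k: m for k, (m, s) in components.items()}
    var = {k: s * s for k, (m, s) in components.items()}

    # Each waiting time is a sum of a subset of the independent components.
    parts = {
        "T1": ("Y0", "Y1"),
        "T2": ("Y0", "Y2"),
        "X1": ("Y0", "Y1", "Y3"),
        "X2": ("Y0", "Y2", "Y4"),
    }
    means = {w: sum(mean[c] for c in cs) for w, cs in parts.items()}
    sds = {w: math.sqrt(sum(var[c] for c in cs)) for w, cs in parts.items()}

    def corr(u, v):
        # Covariance of two sums = total variance of the shared components.
        shared = set(parts[u]) & set(parts[v])
        return sum(var[c] for c in shared) / (sds[u] * sds[v])

    return means, sds, corr


# Model 2 component means and standard deviations from Table 4.7.
model2 = {"Y0": (48.2, 9.6), "Y1": (34.0, 9.1), "Y2": (23.6, 8.3),
          "Y3": (3.5, 2.6), "Y4": (1.2, 1.0)}
means, sds, corr = waiting_time_summary(model2)
print(round(means["T1"], 1), round(sds["T1"], 1), round(corr("T1", "T2"), 2))
```

Under these assumptions the sketch gives values close to the Model 2 entries of Tables 4.8 and 4.9, and it makes the remark in the text concrete: the correlation between $T_1$ and $T_2$ is driven entirely by the variance of the shared component $Y_0$.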
4.6  Additional Parametric Forms of the Models

The conclusions drawn from the previous section suggest that the
Proportional Hazard Gompertz Onset Dependence Model yields the best
representation of the 'data' among the three models considered.
Certain discrepancies between the 'data' and the model predictions
were pointed out, leading us to conclude that the best fit is not
necessarily a good or adequate fit. Continuing our illustration, we
consider three additional parametric forms of the models which
incorporate different parameterizations of the Gompertz models
(Models 1 and 2) discussed in section 4.4. We are attempting to obtain
a better fitting model by removing, to some extent, the restrictions
placed on the 'a' parameter in the Gompertz models. This seems an
obvious next step in our search for an adequate form of the model.
4.6.1  Model 4 - Nonproportional Hazard Gompertz Independent
Action Model

In Model 4 we allow those component random variables in the
independent action model assigned to disease A (CVD) events, i.e.
$Y_1$ and $Y_3$, to have Gompertz distributions with common parameter
'$a_1$' and different parameters '$R_1$' and '$R_3$'. In the same way
we assume component random variables $Y_2$ and $Y_4$ (assigned to
disease B (IHD)) have Gompertz distributions with common parameter
'$a_2$', not necessarily equal to '$a_1$', and different parameters
'$R_2$' and '$R_4$'. This is an obvious extension of Model 1 - the
Proportional Hazard Gompertz Independent Action Model.

The assumption of possibly different parameter values $a_1$ and
$a_2$ implies that the hazard functions for components $Y_1$ and $Y_3$
are not proportional to the hazard functions for components $Y_2$ and
$Y_4$. Thus to distinguish this form of the model from Model 1, we
call it the Nonproportional Hazard Gompertz Independent Action Model.
The parameter set for this model is
$\theta^{(4)} = \{a_1, R_1, R_3, a_2, R_2, R_4\}$.
4.6.2  Model 5 - Nonproportional Hazard Gompertz Onset Dependence
Model

An obvious modification of the Proportional Hazard Gompertz Onset
Dependence Model (Model 2) is to assume the component random variables
have Gompertz distributions with parameter structure given as

$$Y_0 \sim \text{Gompertz}(a_0, R_0)$$
$$Y_1 \sim \text{Gompertz}(a_1, R_1) \qquad Y_3 \sim \text{Gompertz}(a_1, R_3)$$
$$Y_2 \sim \text{Gompertz}(a_2, R_2) \qquad Y_4 \sim \text{Gompertz}(a_2, R_4)$$

This form of the model, having the largest number (8) of parameters
of any model form considered,
$\theta^{(5)} = \{a_0, R_0, a_1, R_1, R_3, a_2, R_2, R_4\}$,
should fit the 'data' best of all forms considered. Since the hazard
functions for many of the component random variables are not
proportional we call this model the Nonproportional Hazard Gompertz
Onset Dependence Model.
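The distinction between the 'proportional' and 'nonproportional' forms can be made explicit with a short worked equation. It assumes, as an illustration only, that a Gompertz($a$, $R$) component has hazard $h(t) = R e^{at}$; the thesis defines its own parameterization earlier, and this form is adopted here because it reduces to a constant hazard $R$ when $a = 0$.

```latex
% Assumed Gompertz hazard for component Y_i:
h_i(t) = R_i \, e^{a_i t}

% Ratio of the hazards of two components:
\frac{h_i(t)}{h_k(t)} = \frac{R_i}{R_k} \, e^{(a_i - a_k)\, t}

% The ratio is constant in t (proportional hazards) if and only if a_i = a_k.
% With a_i \neq a_k it changes by the factor e^{(a_i - a_k)\Delta} over any
% interval of length \Delta.
```

Under this assumption, components sharing an 'a' parameter (for example $Y_1$ and $Y_3$ in Model 5) have hazards in the constant ratio $R_1/R_3$, while components with different 'a' parameters (for example $Y_1$ and $Y_2$) do not.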
4.6.3  Model 6 - Gompertz-Exponential Onset Dependence Model

From Table 4.7 we note that for component random variables $Y_3$
and $Y_4$, the estimated means are almost equal to the estimated
standard deviations. This result was seen for all three models fit in
section 4.4 and, as we shall see, is also seen for the Nonproportional
Hazard Gompertz Independent Action and Onset Dependence Models (Models
4 and 5) of this section. These findings suggested that we consider
a form of the Onset Dependence model which allows the component random
variables $Y_3$ and $Y_4$ to have exponential distributions with
parameters $\lambda_3$ and $\lambda_4$ respectively. (This is
equivalent to assuming they are Gompertz distributions with common
parameter 'a' equal to zero and different parameters
'$R_3 = \lambda_3$' and '$R_4 = \lambda_4$'.) The other component
random variables are assumed to have Gompertz distributions with
common 'a' parameter and different '$R_i$' parameters. Thus the
parameter set for this model is
$\theta^{(6)} = \{a, R_0, R_1, R_2, \lambda_3, \lambda_4\}$.

Since the use of the Gompertz and Exponential distributions
represents different assumptions on the nature of the underlying
disease processes, we will be particularly interested in comparing
the fit of the Proportional Hazard Gompertz Onset Dependence Model
(Model 2) with the Gompertz Exponential Onset Dependence Model
(Model 6).
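The remark that an exponential distribution is a Gompertz distribution with 'a' equal to zero, and the near equality of mean and standard deviation that motivated Model 6, can be checked numerically. The sketch below is an illustration under an assumed parameterization (hazard $h(t) = R e^{at}$, so that the survival function is $\exp\{-(R/a)(e^{at} - 1)\}$); the function names, integration limit and step count are arbitrary choices, not part of the thesis.

```python
import math


def gompertz_survival(t, a, R):
    """Survival function for the assumed hazard h(t) = R * exp(a * t).

    With a = 0 the hazard is constant and the distribution is exponential.
    """
    if a == 0:
        return math.exp(-R * t)
    return math.exp(-(R / a) * math.expm1(a * t))


def mean_and_sd(a, R, upper=200.0, steps=100_000):
    """Mean = integral of S(t); E[T^2] = integral of 2 t S(t) (trapezoid rule)."""
    h = upper / steps
    m1 = 0.0
    m2 = 0.0
    for i in range(steps + 1):
        t = i * h
        w = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        s = gompertz_survival(t, a, R)
        m1 += w * s
        m2 += w * 2.0 * t * s
    m1 *= h
    m2 *= h
    return m1, math.sqrt(m2 - m1 * m1)


# Exponential case (a = 0): mean and standard deviation coincide at 1/R.
print(mean_and_sd(0.0, 0.2711))
# Gompertz case with a > 0: standard deviation is much smaller than the mean.
print(mean_and_sd(0.1315, 1.133e-4))
```

The two calls use values of the kind reported for Model 6 (an exponential rate for $Y_3$ and the Gompertz parameters for $Y_0$); under the assumed parameterization they give moments close to those tabulated for that model.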
4.7  Parameter Estimates and Evaluation of Fit for the Additional
Forms of the Model

The forms of the models defined in the previous section were fit
to the life table deaths assigned to Cerebrovascular disease and
Ischemic heart disease using the same techniques applied to the models
of section 4.4. The parameter estimates obtained using the pseudo
likelihood function are presented in Table 4.10 (compare these
estimates to those given in Table 4.3).
Table 4.10. Parameter Estimates for Additional Model Forms

Model 4 = Nonproportional Hazard Gompertz Independent Action Model
Model 5 = Nonproportional Hazard Gompertz Onset Dependence Model
Model 6 = Gompertz Exponential Onset Dependence Model

Parameter       Model 4           Model 5           Model 6
$a$                                                 1.315 x 10^-1
$a_0$                             1.318 x 10^-1
$a_1$           9.918 x 10^-2     1.900 x 10^-1
$a_2$           8.507 x 10^-2     1.236 x 10^-1
$R_0$                             1.377 x 10^-4     1.133 x 10^-4
$R_1$           9.979 x 10^-6     1.884 x 10^-4     8.862 x 10^-4
$R_2$           9.136 x 10^-5     4.134 x 10^-3     3.810 x 10^-3
$R_3$           1.415 x 10^-1     2.466 x 10^-1
$R_4$           5.315 x 10^-1     8.306 x 10^-1
$\lambda_3$                                         2.711 x 10^-1
$\lambda_4$                                         8.612 x 10^-1
4.7.1  Distributions of Age at Death Conditional on Cause

The observed distribution of deaths ($p^0_{xj}$) and predicted
distributions of deaths ($\hat{p}_{xj}$) from causes $C_j$,
$j = 1,2,3,4$, for the three additional forms of the models are
plotted in Figures 4.4 to 4.6. A review of these plots and comparisons
with Figures 4.1 to 4.3 indicate

1. The additional parameter in the Nonproportional Hazard
Gompertz Independent Action Model (Model 4) does not improve the
overall fit as compared to the fit of the Proportional Hazard Gompertz
Independent Action Model (Model 1) (Figure 4.1).

2. A comparison of the (6-parameter) Proportional Hazard
Gompertz Onset Dependence Model (Model 2) with the (6-parameter) Model
4 strengthens our argument that the Onset Dependence models fit best.

3. As was suggested, the (8-parameter) Nonproportional Hazard
Gompertz Onset Dependence Model (Model 5) does give a better overall
fit to the observed distributions than does Model 2. We note in
particular that Model 5 fits the age distribution for cause $C_1$
(CVD alone) very well but the fits for the cause $C_2$ and $C_4$
distributions (both CVD and IHD present) are not as good as for
Model 2.

4. The Gompertz Exponential Onset Dependence Model (Model 6)
seems to fit the cause $C_2$ and $C_4$ distributions slightly better
than does Model 2 but there is little difference between these models
for the $C_1$ and $C_3$ distributions.

A comparison of the six forms of the models suggests that the
best fit to the distributions of age at death is given by the
Nonproportional Hazard Gompertz Onset Dependence Model.
Figure 4.4. Plots of Empirical and Estimated Conditional Densities
by Cause Category - Nonproportional Hazard Gompertz
Independent Action Model - Model 4
[Plots not reproduced; horizontal axis: age.]
Figure 4.5. Plots of Empirical and Estimated Conditional Densities
by Cause Category - Nonproportional Hazard Gompertz
Onset Dependence Model - Model 5
[Plots not reproduced; horizontal axis: age.]
Figure 4.6. Plots of Empirical and Estimated Conditional Densities
by Cause Category - Gompertz Exponential Onset Dependence
Model - Model 6
[Plots not reproduced; horizontal axis: age.]
4.7.2  Distribution of Causes of Death in a Given Age Group

An evaluation of the three additional forms of the model based
on the age specific pseudo chi-square values of Table 4.11 and those
in Table 4.6 indicates

1. The Proportional Hazard Gompertz Onset Dependence Model
(Model 2) has the smallest total pseudo chi-square value. The
Gompertz Exponential Onset Dependence Model (Model 6) has total
pseudo chi-square value almost as small as that found for Model 2.

2. Interestingly enough, the Nonproportional Hazard Gompertz
Onset Dependence Model (Model 5) had larger total pseudo chi-square
than that observed for either Model 2 or Model 6 even though it had
two additional parameters. Model 5 seems to fit worse than Models
2 or 6 in the [0-35) and [70-85) age groups. (We point out here
that Model 6 had larger pseudo likelihood value than that observed
for any of the other five forms of the model, as would be expected).

3. The Nonproportional Hazard Gompertz Independent Action
Model (Model 4) has larger total pseudo chi-square than that found
for the Proportional Hazard Gompertz Independent Action Model (Model
1), indicating the additional parameter was of little benefit in
obtaining a better fit for the Independent Action model.

4. The additional forms of the model were found to have the same
defects as the first three forms in that they overestimate the
proportion of 'total deaths' which are due to causes $C_1$ and $C_3$
and thus underestimate the proportion due to causes $C_2$ and $C_4$.
All the models predicted most deaths in the [0-35) age group to be
from cause $C_3$ whereas the observed 'data' indicate most deaths in
this age group are from cause $C_1$.
Table 4.11. Age Specific and Total Pseudo Chi-Squared Values

Model 4 = Nonproportional Hazard Gompertz Independent Action Model
Model 5 = Nonproportional Hazard Gompertz Onset Dependence Model
Model 6 = Gompertz Exponential Onset Dependence Model

Age [x, x+5)    Model 4    Model 5    Model 6
0-5              1219.6     2387.3      525.7
5-10              886.2      880.6      226.1
10-15             493.6      891.4      265.8
15-20             596.1      998.9      331.2
20-25             366.8      525.5      175.6
25-30             669.5      555.8      185.5
30-35             165.7      185.4       31.5
35-40             346.8       38.1       32.3
40-45             956.4        3.6      223.0
45-50            2187.8       48.7      583.4
50-55            3133.2      144.5      956.5
55-60            4910.3      310.6     1236.7
60-65            5910.9      583.7     1228.9
65-70            6900.5     1228.9     1239.2
70-75            8604.7     2967.9     2123.1
75-80            7330.7     3947.6     2930.2
80-85            5342.4     4028.7     3391.9
85-90            2476.1     2227.6     2097.5
90-95             554.6      522.4      527.8
95-100             84.5       84.1       83.5
100-105             6.0        5.7        6.2
105+                1.6        1.6        1.6
Total           53149.3    22569.6    18404.3
Thus an evaluation of all six forms of the models, with respect
to their ability to proportion 'total deaths' to the four causes in
each age group, indicates that the Proportional Hazard Gompertz Onset
Dependence Model gives the best fit, though not necessarily a 'good'
fit.

4.7.3  Estimated Component and Waiting Time Random Variable
Distributions

The estimated means and standard deviations of the component
random variables (Table 4.12) and the waiting times random variables
(Table 4.13) for the fitted additional forms of the model may be
compared with the corresponding estimated means and standard
deviations for Models 1, 2 and 3 (Tables 4.7 and 4.8) to get some idea
of the differences in construction of the models. As discussed in
section 4.5.3 these differences represent different assumptions on the
nature of the underlying disease processes.
Table 4.12. Estimated Theoretical Component Means and Standard
Deviations for Additional Model Forms

Model 4 = Nonproportional Hazard Gompertz Independent Action Model
Model 5 = Nonproportional Hazard Gompertz Onset Dependence Model
Model 6 = Gompertz Exponential Onset Dependence Model

                 Model 4              Model 5              Model 6
Component    mean   std. dev.     mean   std. dev.     mean   std. dev.
$Y_0$                             47.7      9.6        49.3      9.6
$Y_1$        87.0     12.9        33.4      6.6        33.9      9.1
$Y_2$        73.7     14.8        23.9      8.7        23.4      8.3
$Y_3$         4.7      3.5         2.6      1.9         3.7      3.7
$Y_4$         1.6      1.5         1.1      0.9         1.2      1.2
Table 4.13. Estimated Means and Standard Deviations of Waiting Times
Random Variables for Additional Model Forms

Model 4 = Nonproportional Hazard Gompertz Independent Action Model
Model 5 = Nonproportional Hazard Gompertz Onset Dependence Model
Model 6 = Gompertz Exponential Onset Dependence Model

Random           Model 4              Model 5              Model 6
Variable     mean   std. dev.     mean   std. dev.     mean   std. dev.
$T_1$        87.0     12.9        81.1     11.6        83.2     13.2
$T_2$        73.7     14.8        71.6     12.9        72.7     12.7
$X_1$        91.7     13.4        83.7     11.8        86.9     13.8
$X_2$        75.3     14.9        72.7     12.9        73.9     12.7
Table 4.14. Estimated Correlations Between Waiting Times Random
Variables for the Additional Model Forms

Model 4 = Nonproportional Hazard Gompertz Independent Action Model
Model 5 = Nonproportional Hazard Gompertz Onset Dependence Model
Model 6 = Gompertz Exponential Onset Dependence Model

                       Model 4    Model 5    Model 6
Corr($T_1$,$T_2$)                  0.608      0.549
Corr($T_1$,$X_1$)       0.966      0.987      0.963
Corr($T_2$,$X_1$)                  0.600      0.529
Corr($T_1$,$X_2$)                  0.606      0.546
Corr($T_2$,$X_2$)       0.995      0.996      0.996
Corr($X_1$,$X_2$)                  0.598      0.526

For purposes of comparison we have calculated the estimated
correlations between waiting times random variables (compare Table
4.14 to Table 4.9).
CONCLUDING REMARKS AND SUGGESTIONS FOR FUTURE RESEARCH

In the previous chapter we considered six specific forms of the
disease model developed in Chapter III. Two forms of the Independent
Action Model were studied. Both of these models assumed Gompertz
distributed component random variables: the difference between the
two was in the restrictions placed on the parameters (Model 1, section
4.4.1 and Model 4, section 4.6.1). Four forms of the Onset Dependence
Model were studied. Two of these models have Gompertz distributed
component random variables (Model 2, section 4.4.2 and Model 5,
section 4.6.2) with parameter restrictions similar to those used for
the Independent Action Models. An Onset Dependence Model which assumes
Gamma distributed random variables having common shape parameters
(Model 3, section 4.4.3) and an Onset Dependence Model using both the
Gompertz and exponential distributions (Model 6, section 4.6.3) were
also considered. With these particular forms of the model we were able
to make some comparisons between the Independent Action Model and the
Onset Dependence Model with respect to fit to the 'data' under similar
distributional assumptions. In addition we were able to examine the
effect of different distribution (and parameterization) assumptions on
the fit of the model.
An evaluation of the above models with respect to their ability
to predict the distribution of age at death for specific causes
(sections 4.5.1, 4.7.1), as well as to predict the distribution of
deaths among the four causes in each age interval (sections 4.5.2,
4.7.2), clearly indicates a perfect fit was not obtained. Taking into
account the simplifying assumptions used in these models, the fact
that we ignore deaths due to other causes, and the nature (and size)
of the data to which these models were applied, this lack of fit was
not unexpected. Nevertheless, certain of these models were partially
successful in fitting the 'data' and from these models we will draw
our conclusions.

A comparison of the fit of the Independent Action Models (Models
1 and 4) with the corresponding Onset Dependence Models (Models 2 and
5) suggests that, at least for the two diseases studied, the Onset
Dependence Model seemed to provide the better fit (conditional on
the use of the Gompertz distribution). The Independent Action Models
were unable to adequately predict the distribution of age at death
from causes $C_2$ and $C_4$ (deaths for which both diseases are
present) whereas the Onset Dependence Models were much more successful
in this aspect of fitting. Since we might expect the effect of disease
association to be more involved in the age distribution of deaths when
both diseases are present, the better fit observed for the Onset
Dependence Models for the age distribution for deaths from causes
$C_2$ and $C_4$ suggests that the assumption of dependent onset times
is not completely unreasonable.
Three of the models, the Proportional Hazard Gompertz Onset
Dependence Model (Model 2), the Nonproportional Hazard Gompertz Onset
Dependence Model (Model 5) and the Gompertz Exponential Onset
Dependence Model (Model 6) were partially successful in fitting the
'data'. Models 2 and 6 were unsuccessful in fitting the age
distribution of deaths due to cause $C_1$ (CVD alone). Model 5 was
somewhat more successful in fitting the age distribution of deaths due
to cause $C_1$ but gave worse fit for causes $C_2$ and $C_4$ than the
fits obtained from Models 2 and 6. The almost identical fits obtained
from Model 2 and Model 6 suggest that the distribution of the
time-to-death-after-onset component random variables, $Y_3$ and $Y_4$,
may be exponential and need not necessarily have the same distribution
as the other component distributions. The fit of Model 5 suggests that
the distribution of component random variable $Y_0$ may also be
different from the other distributions.

From the above discussion we can conclude that two reasons for
the lack of fit of the particular models studied were the choice of
distributions for component random variables and the limitations
placed on the parameters of these distributions. The Gompertz
distribution was used mainly because of its use in the analysis of
human mortality. Restrictions on model parameters were made to reduce
the total number of parameters estimated since the cost of estimating
each parameter increases as the number of parameters increases. This
suggests that in future models, rather than increase the number of
parameters in the Gompertz model (Independent Action or Onset
Dependence Model), other distributions, in addition to the Gompertz
distribution, be used. For certain diseases, information on the age
at onset may be obtainable and in that case would be useful in
specifying the model distributions.
Another reason for the overall lack of fit of the particular
models studied is that these models do not take into account the
effect of the presence of other causes of death ('cause' $C_5$). By
limiting our discussion to mortality in a cohort of individuals
susceptible to only two causes of death, we have been able to
develop a simple model for disease association which could be examined
within the constraints of a dissertation. These models offer a
framework for the development of future models of disease association
which would include the possibility of death due to other causes.
Initially we might consider other causes of death to be independent
of the two specific causes under study since dependence between a
specific cause and a group of other diseases would be difficult to
define (in biological terms) and might well prove mathematically
intractable.

With an extended model (i.e. one accounting for all deaths) we
may be able to draw inferences to the whole population (or cohort),
something we are unable to do with the models discussed in this study.
We will still be required to fit these models to the 'data' from the
multiple decrement life table and thus a new 'pseudo likelihood'
function will be used to estimate parameters. As in this study, the
statistical properties of such 'pseudo likelihood' estimates cannot
be determined directly. It may be possible to use Monte Carlo methods
to investigate the properties of these estimates.
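As a starting point for such a Monte Carlo investigation, one would first need to simulate waiting times from a fitted model. The sketch below shows only that simulation step, not the estimation step, and it is a hypothetical illustration: it assumes the Gompertz hazard $h(t) = R e^{at}$, assumes the onset ages are $T_1 = Y_0 + Y_1$ and $T_2 = Y_0 + Y_2$ with independent components, and uses the Model 6 estimates from Table 4.10 as inputs. All function names are invented for the example.

```python
import math
import random


def sample_gompertz(a, R, rng):
    """Inverse-transform draw for the assumed hazard h(t) = R * exp(a * t), a > 0."""
    u = 1.0 - rng.random()  # u lies in (0, 1], which avoids log(0)
    return math.log1p(-(a / R) * math.log(u)) / a


def simulate_onset_ages(a, R0, R1, R2, n, seed=0):
    """Simulate n pairs (T1, T2) with T1 = Y0 + Y1 and T2 = Y0 + Y2 (assumed)."""
    rng = random.Random(seed)
    t1, t2 = [], []
    for _ in range(n):
        y0 = sample_gompertz(a, R0, rng)  # component shared by both diseases
        t1.append(y0 + sample_gompertz(a, R1, rng))
        t2.append(y0 + sample_gompertz(a, R2, rng))
    return t1, t2


def correlation(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Model 6 estimates as read from Table 4.10 (a, R0, R1, R2).
t1, t2 = simulate_onset_ages(0.1315, 1.133e-4, 8.862e-4, 3.810e-3, n=20000)
print(sum(t1) / len(t1), correlation(t1, t2))
```

Repeating the simulation, refitting the model to each simulated data set, and examining the spread of the refitted parameters would be one way to study the sampling behaviour of the 'pseudo likelihood' estimates.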
Finally, in future studies, multi-stage models of the type
discussed in this study should be applied to more specific disease
(cause) categories than those discussed in Chapter II. Careful
and precise definitions of disease stages in terms of (observed and
unobserved) disease pathology as well as clear definitions of the
dependency concepts assumed are necessary to the development of
a useful model. In Chapter II we attempted to make such definitions
but the large disease categories studied (by this we mean that each
disease category (i.e. CVD or IHD) included a large number of separate
disease entities (see Appendix I)) implied that precise definitions
of disease stages in terms of disease pathology were impossible.

The models developed in this dissertation and the above
suggestions should provide a useful starting point for discussion
between medical researchers and biostatisticians on the questions of
dependent causes of death and disease association. Such discussion
should result in more realistic and useful models.
104
BIBLIOGRAPHY
Abramowitz, M., and Stegun, I.A., eds. 1972. Handbook of Mathematical
Functions with Formulas, Graphs and Mathematical Tables. National
Bureau of Standards, Applied Mathematics Series no. 55.
Washington, D.C.: U.S. Government Printing Office.
Beadenkopf, W.G.; Abrams, M.; Daoud, A.; and Marks, R.U. 1963. An
assessment of certain medical aspects of death certificate data
for epidemiologic study of arteriosclerotic heart disease.
Journal of Chronic Diseases 16:249-62.
Berkson, J. and Elveback, L. 1960. Competing exponential risks, with
particular reference to the study of smoking and lung cancer.
Journal of the American Statistical Association 55:415-28.
Bureau of the Census, 1920. Mortality Statistics: 1918, pp. 49-91.
Washington, D.C.: U.S. Government Printing Office.
Bureau of the Census, 1927. Mortality Statistics: 1925, Part I, pp.
425-55. Washington, D.C.: U.S. Government Printing Office.
Bureau of the Census, 1973. Census of Population, 1970, Vol. 1, Part
1, Section 1, Table 50, pp. 1-265. Washington, D.C.: U.S.
Government Printing Office.
Chiang, C.L. 1961. A stochastic study of the life table and its applications: III The follow-up study with the consideration of competing risks. Biometrics 17:57-78.
Chiang, C.L. 1968. Introduction to Stochastic Processes in Biostatistics. New York: John Wiley & Sons.
Clifford, P. 1977. Nonidentifiability in stochastic models of illness
and death. Proceedings of the National Academy of Sciences
74(4):1338-40.
David, H.A. 1974. Parametric approaches to the theory of competing
risks. In Reliability and Biometry: Statistical Analysis of
Lifelength, pp. 275-90. Edited by F. Proschan and R.J. Serfling.
Philadelphia: SIAM.
David, H.A. and Moeschberger, M.L. 1978. Theory of Competing Risks.
New York: Macmillan Publishing Co.
Dublin, L. and VanBuren, G. 1924. Contributing causes of death,
their importance and suggestions for their classification.
American Journal of Public Health 14:100-105.
Elandt-Johnson, R.C. 1976. Conditional failure time distributions
under competing risk theory with dependent failure times and
proportional hazard rates. Scandinavian Actuarial Journal, pp.
37-51.
Fix, E. and Neyman, J. 1951. A simple stochastic model of recovery,
relapse, death and loss of patients. Human Biology 23:205-41.
Gail, M. 1975. A review and critique of some models used in competing risks analysis. Biometrics 31:209-22.
Guralnick, L. 1966. Some problems in the use of multiple causes of
death. Journal of Chronic Diseases 19:979-90.
Hoel, D.G. 1972. A representation of mortality data by competing
risks. Biometrics 28:475-88.
IMSL, 1978. Computer program MDGAM. Houston, Texas: International
Mathematical and Statistical Libraries.
James, G.; Patton, R.E.; and Heslin, A.S. 1955. Accuracy of cause-of-death
statements in death certificates. Public Health Reports
70:39-51.
Janssen, T. 1940. Importance of tabulating multiple causes of death.
American Journal of Public Health 30:871-79.
Johnson, N.L. and Kotz, S. 1970. Distributions in Statistics:
Continuous Univariate Distributions I and II. New York: John
Wiley & Sons.
Jordan, C.W. 1961. Life Contingencies. Chicago: Society of
Actuaries.
Kaplan, E.B. and Elston, R.C. 1972. A subroutine package for maximum
likelihood estimation (MAXLIK). Institute of Statistics Mimeo
Series No. 823. Chapel Hill: Department of Biostatistics,
University of North Carolina.
Kaplan, E.L. and Meier, P. 1958. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53:457-81.
Krueger, D.E. 1966. New numerators for old denominators - Multiple
causes of death. In National Cancer Institute Monograph #19, pp.
431-44. Washington, D.C.: U.S. Government Printing Office.
Kuller, L.H. 1976. Epidemiology of cardiovascular disease: current
perspectives. American Journal of Epidemiology 104(4):425-56.
Lagakos, S.W. 1976. A stochastic model for censored survival data
in the presence of an auxiliary variable. Biometrics 32:551-59.
Lee, L. and Thompson, W.A. 1974. Results on failure time and pattern
for the series system. In Reliability and Biometry: Statistical
Analysis of Lifelength, pp. 291-302. Edited by F. Proschan and
R.J. Serfling. Philadelphia: SIAM.
Manton, K.G.; Tolley, H.D.; and Poss, S.S. 1976. Life table techniques
for multiple cause mortality. Demography 13(4):541-64.
Marshall, A.W. and Olkin, I. 1976. A multivariate exponential distribution. Journal of the American Statistical Association
63:30-44.
Mausner, J.S. and Bahn, A.K. 1974. Epidemiology: An Introductory
Text, pp. 12-15. Philadelphia: W.B. Saunders Co.
Mildvan, A.S. and Strehler, B.L. 1960. A critique of theories of
mortality. In The Biology of Aging, pp. 216-35. Edited by
B.L. Strehler. Washington, D.C.: American Institute of
Biological Sciences.
Moeschberger, M.L. and David, H.A. 1971. Life tests under competing
causes of failure and the theory of competing risks. Biometrics
27:909-33.
Moeschberger, M.L. 1974. Life tests under dependent competing causes
of failure. Technometrics 16(1):39-47.
Moriyama, I.M.; Dawber, T.R.; and Kannel, W.B. 1966. Evaluation of
diagnostic information supporting medical certification of deaths
from cardiovascular disease. In National Cancer Institute
Monograph #19, pp. 405-20. Washington, D.C.: U.S. Government
Printing Office.
National Center for Health Statistics, 1965. Vital Statistics of
the United States, 1955, Supplement: Multiple Causes of Death.
Washington, D.C.: U.S. Government Printing Office.
National Center for Health Statistics, 1967. Eighth Revision of International Classification of Diseases, Adapted for Use in the
United States, Vol. 1. Washington, D.C.: U.S. Government
Printing Office.
National Center for Health Statistics, 1975a. Comparability of Mortality Statistics for the Seventh and Eighth Revisions of the
International Classification of Diseases, United States. U.S.
Department of Health, Education and Welfare Publication No. (HRA)
76-1340, Rockville, MD.
National Center for Health Statistics, 1975b. United States Life
Tables: 1967-1971, Vol. I, No. 1. Department of Health, Education and Welfare Publication No. (HRA) 75-1150. Rockville, MD.
National Office of Vital Statistics, 1943. Vital Statistics of the
United States, 1940, Part I., pp. 570-623. Washington, D.C.:
U.S. Government Printing Office.
Neil, J.V. 1962. The Use of Vital and Health Statistics for Genetic
and Radiation Studies. Geneva: World Health Organization.
Olson, F.E.; Norris, F.D.; Hammes, L.M.; and Shipley, P.W. 1962.
A study of multiple causes of death in California. Journal of
Chronic Diseases 15:157-70.
Pitts, A.M. 1976. Some Notes on the Collection of U.S. Multiple
Cause of Death Data with Illustrative Multiple Cause Tabulations for 1969. Durham, N.C.: Center for Demographic Studies,
Duke University, October.
Pohlen, K. and Emerson, H. 1942. Errors in clinical statements of
cause of death. American Journal of Public Health 32:251-60.
Pohlen, K. and Emerson, H. 1943. Errors in clinical statements of
cause of death: second report. American Journal of Public
Health 33:505-16.
Public Health Conference on Records and Statistics, 1962. Proceedings
of the 9th National Meeting, p. 11. PHCRS Document No. 574.
Sartwell, P.E. and Merrell, M. 1952. Influence of the dynamic character of chronic disease on the interpretation of morbidity
rates. American Journal of Public Health 42:579-84.
Seal, H.L. 1977. Studies in the history of probability and statistics
XXXV: Multiple decrement or competing risks. Biometrika 64(3):
429-39.
Templeton, M.C. and Evans, M.C. Automated Classification of Medical
Entities (ACME) for Selection of Causes of Death. Presentation
at the 98th Annual Meeting of the American Public Health Association Statistics Section, October 29, 1970. Houston, Texas.
Tolley, H.D.; Burdick, D.; Manton, K.G.; and Stallard, E. 1978. A
compartment model approach to the estimation of tumor incidence
and growth: investigation of a model of cancer latency.
Biometrics 34:377-90.
Tsiatis, A. 1975. A nonidentifiability aspect of the problem of competing risks. Proceedings of the National Academy of Sciences
72(1):20-22.
Weiner, L.; Bellows, M.T.; McAvoy, G.H.; and Cohen, E.V. 1955. Use
of multiple causes in the classification of deaths from cardiovascular-renal disease. American Journal of Public Health 45:
492-501.
Weiss, G.H. and Zelen, M. 1965. A semi-Markov model for clinical
trials. Journal of Applied Probability 2:269-85.
Wijsman, R.A. 1958. Contribution to the study of the question of
association between two diseases. Human Biology 30:219-36.
World Health Organization, 1949. Manual of the International Statistical Classification of Diseases, Injuries and Causes of Death, p. 45.
Geneva: World Health Organization.
APPENDIX 1
Detailed List of Eighth Revision ICDA Codes for Ischemic
Heart Disease and Cerebrovascular Disease (NCHS 1967)
ISCHEMIC HEART DISEASE (410-414)
410     Acute myocardial infarction
        Includes:
            cardiac infarction
            coronary (artery):
                embolism
                occlusion
                rupture
                thrombosis
            infarction of heart, myocardium, or ventricle
            rupture, heart or myocardium
        Also includes:
            any condition in 412 specified as acute or
            with a stated duration of eight weeks
            or less
  410.0   With hypertensive disease
            Any condition in 410 with any condition in 400-404
 *410.1   Repeat myocardial infarction during current hospitalization
  410.9   Without mention of hypertensive disease
411     Other acute and subacute forms of ischemic heart disease
        Includes:
            angina decubitus
            coronary:
                failure
                insufficiency
            intermediate coronary syndrome
            micro-infarct of heart
            pre-infarction syndrome
            subendocardial infarction
  411.0   With hypertensive disease
            Any condition in 411 with any condition in 400-404
  411.9   Without mention of hypertensive disease
412     Chronic ischemic heart disease
        Includes:
            aneurysm of heart
            arteriosclerotic heart (disease)
            cardiovascular:
                arteriosclerosis
                degeneration
                disease
                sclerosis
            coronary (artery):
                arteriosclerosis
                atheroma
                disease
                sclerosis
                stricture
            healed myocardial infarct
            ischemic degeneration, heart or myocardium
            ischemic heart disease
            postmyocardial infarct syndrome
        Also includes:
            any condition in 410 specified as chronic or
            with a stated duration of over eight weeks
  412.0   With hypertensive disease
            Any condition in 412 with any condition in 400-404
  412.9   Without mention of hypertensive disease
413     Angina pectoris
        Includes:
            angina:
                NOS
                pectoris
            anginal syndrome
            cardiac angina
            stenocardia
  413.0   With hypertensive disease
            Any condition in 413 with any condition in 400-404
  413.9   Without mention of hypertensive disease
414     Asymptomatic ischemic heart disease
        Includes:
            ischemic heart disease diagnosed on ECG but presenting
            no symptoms
  414.0   With hypertensive disease
            Any condition in 414 with any condition in 400-404
  414.9   Without mention of hypertensive disease
CEREBROVASCULAR DISEASE (430-438)
        Excludes:
            with malignant hypertension (400.2)
430     Subarachnoid hemorrhage
        Includes:
            meningeal hemorrhage
            ruptured (congenital) cerebral aneurysm
            subarachnoid hemorrhage
  430.0   With hypertension (benign)
  430.9   Without mention of hypertension
431     Cerebral hemorrhage
        Includes:
            hemorrhage (of):
                basilar
                bulbar
                cerebellar
                cortical
                extradural (not traumatic)
                internal capsule
                intracranial
                intrapontine
                pontine
                subcortical
                subdural
                ventricular
            rupture of blood vessel in brain
            subdural hematoma, not due to trauma
  431.0   With hypertension (benign)
  431.9   Without mention of hypertension
432     Occlusion of precerebral arteries
        Includes:
            embolism, thrombosis, occlusion (of) (artery):
                basilar
                carotid (common) (internal)
                precerebral arteries NOS
                vertebral
  432.0   With hypertension (benign)
  432.9   Without mention of hypertension
433     Cerebral thrombosis
        Includes:
            cerebral artery occlusion NOS
            cerebral infarction NOS
            thrombosis, thrombotic:
                apoplexy
                brain
                cerebral
                intracranial
                paralysis
                softening of brain
  433.0   With hypertension (benign)
  433.9   Without mention of hypertension
434     Cerebral embolism
        Includes:
            embolism, embolic:
                apoplexy
                brain
                cerebral
                intracranial
                paralysis
                softening of brain
  434.0   With hypertension (benign)
  434.9   Without mention of hypertension
435     Transient cerebral ischemia
        Includes:
            basilar artery syndrome
            intermittent cerebral ischemia
            spasm of cerebral arteries
            vertebral artery syndrome
  435.0   With hypertension (benign)
  435.9   Without mention of hypertension
436     Acute but ill-defined cerebrovascular disease
        Includes:
            acute cerebrovascular disease
            apoplectiform convulsions
            apoplexy, apoplectic:
                NOS
                bulbar
                cerebral
                fit
                hemiplegia
                seizure
                stroke
            cerebral seizure
            cerebrovascular accident NOS
            hypertensive encephalopathy
            stroke (paralytic)
  436.0   With hypertension (benign)
  436.9   Without mention of hypertension
437     Generalized ischemic cerebrovascular disease
        Includes:
            atheroma of cerebral arteries
            cerebral:
                arteriosclerosis
                endarteritis
                ischemia NOS
                thrombo-angiitis obliterans
            cerebrovascular:
                degeneration
                insufficiency
                sclerosis
  437.0   With hypertension (benign)
  437.9   Without mention of hypertension
438     Other and ill-defined cerebrovascular disease
        Includes:
            cerebellar softening
            cerebral:
                arteritis
                edema
                hemiplegia
                hyperemia
                necrosis
                paralysis
                softening NOS
            cerebrospinal softening
            cerebrovascular disease NOS
            hemiplegia specified as arteriosclerotic or hypertensive
            necrosis of brain
            softening of brain
            thrombosis, nonpyogenic, of intracranial sinus (any)
            thrombosis of spinal cord
        Excludes:
            hemiplegia NOS (344.1)
            thrombosis (pyogenic origin) of intracranial sinus (321)
            thrombosis of pyogenic origin of spinal cord (322)
  438.0   With hypertension (benign)
  438.9   Without mention of hypertension