ESTIMATION OF AGE AT DEATH DISTRIBUTION
FOR A SPECIFIC CAUSE
by
Regina C. Elandt-Johnson
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1120
May 1977
ESTIMATION OF AGE AT DEATH DISTRIBUTION
FOR A SPECIFIC CAUSE*
Regina C. Elandt-Johnson
Department of Biostatistics
University of North Carolina
Chapel Hill, NC 27514, U.S.A.
SUMMARY
The age at death distribution is defined for a (hypothetical) population of individuals who are liable to die only from one specific cause (Section 1). The theory of competing risks is briefly reviewed. A general model of a mixture of age at death distributions for a joint survival function is introduced (Section 2), and the likelihood function (for estimating parameters) is constructed (Section 3). Of special concern is a model which represents a mixture of two survival functions associated with two competing causes: a specific cause ($C_1$), with liability $\phi$ $(0 < \phi \le 1)$ to die from $C_1$, and 'all other causes' ($C_2$), with liability 1 (Section 4). Since the joint survival function cannot be uniquely determined from mortality data alone, it is further assumed that the force of mortality of one cause in the presence of the other cause is the same as if the other cause were ignored. Since $C_2$ represents a group of many causes, this assumption might be a fair approximation in some situations. It is, in fact, used in actuarial work. Constructing a single decrement life table for a specific cause ($C_1$) from a multiple decrement life table, we obtain the marginal distribution, the proportion $\phi$, and the age at death distribution for this cause (Section 5). An example for cancer mortality, using the US life tables 1959-61, is presented.
Key words: Competing risks; Mixture of failure distributions; Age at death distribution.

*This investigation was supported by NIH research grant number 1 R01 CA17107 from the National Cancer Institute.
1. INTRODUCTION

1.1. Suppose that $k$ competing causes of death, $C_1, C_2, \ldots, C_k$, say, are operating in a population. Let $X_1, X_2, \ldots, X_k$ denote (hypothetical) 'times to die' from $C_1, C_2, \ldots, C_k$, respectively. Let

$$S_{1\ldots k}(x_1, \ldots, x_k) = \Pr\{X_1 > x_1, \ldots, X_k > x_k\} \tag{1.1}$$

be the corresponding joint survival function, and

$$S_{\alpha\cdot}(x_\alpha) = S_{1\ldots k}(0, \ldots, x_\alpha, \ldots, 0) \tag{1.2}$$

be the marginal survival function. ($F_{\alpha\cdot}(x) = 1 - S_{\alpha\cdot}(x)$ is the corresponding marginal failure distribution.)
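In the theory of competing risks, it is usually assumed that each death is due to one (so called 'underlying') cause. Then the time at death, $X$, say, is

$$X = \min(X_1, X_2, \ldots, X_k) \tag{1.3}$$

and the (observable) overall survival function is

$$S_X(x) = \Pr\{X > x\} = S_{1\ldots k}(x, x, \ldots, x) \tag{1.4}$$

with the force of mortality (hazard rate)

$$a\mu_X(x) = -\frac{d \log S_X(x)}{dx}. \tag{1.5}$$

We can also observe the cause of death. The (crude) probability of eventually dying from cause $C_\alpha$ in presence of all other causes is

$$\pi_\alpha = \int_0^\infty a\mu_\alpha(x)\, S_X(x)\, dx, \tag{1.6}$$

where

$$a\mu_\alpha(x) = -\frac{1}{S_X(x)} \left. \frac{\partial S_{1\ldots k}(x_1, \ldots, x_k)}{\partial x_\alpha} \right|_{x_1 = \cdots = x_k = x} \tag{1.7}$$

is the so called 'crude' hazard rate for cause $C_\alpha$. It follows that

$$a\mu_X(x) = a\mu_1(x) + a\mu_2(x) + \cdots + a\mu_k(x). \tag{1.8}$$

Note that

$$P_\alpha(x) = \int_x^\infty a\mu_\alpha(t)\, S_X(t)\, dt \tag{1.9}$$

is the proportion of those who died from $C_\alpha$ after age $x$ in presence of all other causes, and

$$aS_\alpha(x) = \frac{P_\alpha(x)}{\pi_\alpha} \tag{1.10}$$

is the (proper) survival function for cause $C_\alpha$ among those who eventually die from $C_\alpha$ in presence of all other causes, that is, the conditional survival function given that death is ultimately due to $C_\alpha$.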
1.2. Because of the condition (1.3), the joint survival function (1.1) cannot be uniquely identified from mortality data alone: to each model with dependent 'times to die' there exists a model with independent 'times to die', both yielding the same likelihood (Tsiatis (1975)). In other words, there exists another joint survival function

$$S^*_{1\ldots k}(x_1, x_2, \ldots, x_k) = \prod_{\alpha=1}^{k} S^*_{\alpha\cdot}(x_\alpha), \tag{1.11}$$

say, for which the marginal hazard rates, $\mu^*_{\alpha\cdot}(x)$, satisfy the condition

$$\mu^*_{\alpha\cdot}(x) = a\mu_\alpha(x), \qquad \alpha = 1, \ldots, k, \tag{1.12}$$

and which yields the same $S_X(x)$ and $P_\alpha(x)$'s as the joint survival function defined in (1.1).

In our further discussion, we assume then that the 'times to die' are independent.
1.3. Imagine an isolated (hypothetical) population in which $C_\alpha$ is the only cause of death. Let

$$S_\alpha(x) = \Pr\{X_\alpha > x\} \tag{1.13}$$

be the survival function in this population, and

$$\mu_\alpha(x) = -\frac{d \log S_\alpha(x)}{dx} \tag{1.14}$$

be the corresponding force of mortality. We define the failure distribution

$$F_\alpha(x) = 1 - S_\alpha(x) \tag{1.15}$$

as the age at death distribution from cause $C_\alpha$ acting alone. This is, in fact, the distribution of age of 'hypothetical' deaths from $C_\alpha$.
1.4. In competing risk theory, it is usually assumed that each individual is liable to die from any of the $k$ causes. Therefore, under the additional assumption of independence of 'times to die', we have

$$S_{\alpha\cdot}(x) = S_\alpha(x), \qquad \alpha = 1, \ldots, k, \tag{1.16}$$

so that the marginal distributions represent the age at death distributions.

In real populations, however, some individuals are often more prone to die from one (or a few) specific cause(s), but have a low risk of dying from other causes. The population can be regarded as a mixture with respect to risk (liability) to die from different causes.

The topic of the present paper is the estimation of the age at death distribution from a specific cause, using mortality data of a population which is heterogeneous with respect to liability of dying from different causes.
2. MIXTURE OF SURVIVAL FUNCTIONS

2.1. In an ordinary (human) population, the number of all possible causes is vast. For convenience of the argument, we confine ourselves to three causes: two specific causes $C_1$ and $C_2$, and the third, $C_3$, which denotes 'all other causes' except $C_1$ or $C_2$. Extension to $k$ causes is straightforward.

Consider a population in which $C_1$, $C_2$ and $C_3$ are operating simultaneously. Since everybody must die sometime, it seems reasonable to assume that everybody is liable to die from 'other causes' ($C_3$). However, for the specific causes, we assume that only a proportion $\phi_1$ is liable to die from $C_1$ (of course, as well as from $C_3$); a proportion $\phi_2$ from $C_2$ and $C_3$; and a proportion $\phi_{12}$ from $C_1$, $C_2$ and $C_3$, while the remaining proportion $1 - (\phi_1 + \phi_2 + \phi_{12})$ of individuals could die from $C_3$ only. Further, we assume that (conditional on liability) the 'times to die', $X_1$, $X_2$, $X_3$, are independent. Therefore, the joint survival function for this population is a mixture

$$S_{123}(x_1, x_2, x_3) = \phi_1 S_1(x_1) S_3(x_3) + \phi_2 S_2(x_2) S_3(x_3) + \phi_{12} S_1(x_1) S_2(x_2) S_3(x_3) + [1 - (\phi_1 + \phi_2 + \phi_{12})]\, S_3(x_3). \tag{2.1}$$

We may also write (2.1) in the form

$$S_{123}(x_1, x_2, x_3) = \{1 - \phi_1 F_1(x_1) - \phi_2 F_2(x_2) - \phi_{12}[1 - S_1(x_1) S_2(x_2)]\}\, S_3(x_3). \tag{2.1a}$$

The marginal survival distributions are

$$S_{1\cdot}(x_1) = 1 - (\phi_1 + \phi_{12}) F_1(x_1), \qquad S_{2\cdot}(x_2) = 1 - (\phi_2 + \phi_{12}) F_2(x_2), \qquad S_{3\cdot}(x_3) = S_3(x_3). \tag{2.2}$$

Of course, in the general case

$$S_{123}(x_1, x_2, x_3) \ne S_{1\cdot}(x_1)\, S_{2\cdot}(x_2)\, S_{3\cdot}(x_3), \tag{2.3}$$

so that unconditionally, $X_1$, $X_2$, $X_3$ are not independent.
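Note that

$$S_{\alpha\cdot}(\infty) = 1 - (\phi_\alpha + \phi_{12}), \qquad \alpha = 1, 2, \tag{2.4}$$

that is, $S_{1\cdot}(x)$ and $S_{2\cdot}(x)$ are improper distributions.

2.2. Denoting by $f_\alpha(x) = -dS_\alpha(x)/dx$ the probability density function of the random variable $X_\alpha$, we express (from (1.7)) the 'crude' hazard rate $a\mu_1(x)$ in the form

$$a\mu_1(x) = \frac{[\phi_1 + \phi_{12} S_2(x)]\, f_1(x)\, S_3(x)}{S_X(x)}, \tag{2.5}$$

where $S_X(x) = S_{123}(x, x, x)$. A similar expression is obtained for $a\mu_2(x)$ by exchanging the subscripts 1 and 2, while

$$a\mu_3(x) = \frac{f_3(x)}{S_3(x)} = \mu_3(x). \tag{2.6}$$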
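2.3. Special cases. Let $E_\alpha$ denote the event that an individual is liable to die from $C_\alpha$ $(\alpha = 1, 2, 3)$. We have

$$\Pr\{E_1\} = \phi_1 + \phi_{12}, \qquad \Pr\{E_2\} = \phi_2 + \phi_{12}, \qquad \Pr\{E_1 \cap E_2\} = \phi_{12}, \qquad \Pr\{E_3\} = 1. \tag{2.7}$$

We may identify three special cases:

(i) The events $E_1$ and $E_2$ are mutually exclusive. In this case $\phi_{12} = 0$, so that (2.1a) takes the form

$$S_{123}(x_1, x_2, x_3) = [1 - \phi_1 F_1(x_1) - \phi_2 F_2(x_2)]\, S_3(x_3). \tag{2.8}$$

(ii) The events $E_1$ and $E_2$ are independent. Put $\phi_1 + \phi_{12} = \gamma_1$ and $\phi_2 + \phi_{12} = \gamma_2$. In view of independence of events $E_1$ and $E_2$ we have $\phi_{12} = \gamma_1 \gamma_2$, so that $\phi_1 = \gamma_1(1 - \gamma_2)$ and $\phi_2 = \gamma_2(1 - \gamma_1)$. The joint survival function (2.1a) takes the form

$$S_{123}(x_1, x_2, x_3) = [1 - \gamma_1 F_1(x_1)]\,[1 - \gamma_2 F_2(x_2)]\, S_3(x_3). \tag{2.9}$$

This model has been discussed by Hoel (1972). Note that (2.9) does not necessarily coincide with $S^*_{123}(x_1, x_2, x_3)$ defined in (1.11).

(iii) Each individual is liable to die from any cause. Finally, if we assume $\gamma_1 = \gamma_2 = 1$ (or equivalently $\phi_1 = \phi_2 = 0$ and $\phi_{12} = 1$), the joint survival function (2.1a) takes the form

$$S_{123}(x_1, x_2, x_3) = S_1(x_1)\, S_2(x_2)\, S_3(x_3). \tag{2.10}$$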
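The mixture described in 2.1 and the special case (ii) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes exponential component survival functions with arbitrary, purely illustrative rates, builds the four-component mixture from the verbal description of (2.1), and confirms that with independent liabilities it factorizes into the product form of case (ii). The function names and the rate values are hypothetical.

```python
import math

def joint_survival(x1, x2, x3, phi1, phi2, phi12, S1, S2, S3):
    """Mixture joint survival function for three causes.

    phi1: liable to C1 and C3; phi2: liable to C2 and C3;
    phi12: liable to C1, C2 and C3; the rest: liable to C3 only.
    """
    rest = 1.0 - (phi1 + phi2 + phi12)
    return (phi1 * S1(x1) + phi2 * S2(x2)
            + phi12 * S1(x1) * S2(x2) + rest) * S3(x3)

def product_form(x1, x2, x3, g1, g2, S1, S2, S3):
    """Factorized form for independent liabilities (special case (ii))."""
    return (1 - g1 * (1 - S1(x1))) * (1 - g2 * (1 - S2(x2))) * S3(x3)

# Hypothetical exponential components, chosen only for illustration.
S1 = lambda x: math.exp(-0.01 * x)
S2 = lambda x: math.exp(-0.02 * x)
S3 = lambda x: math.exp(-0.03 * x)

# Independent liabilities: phi12 = g1*g2, phi1 = g1*(1-g2), phi2 = g2*(1-g1).
g1, g2 = 0.6, 0.3
phi12 = g1 * g2
phi1 = g1 * (1 - g2)
phi2 = g2 * (1 - g1)

mix = joint_survival(40, 50, 60, phi1, phi2, phi12, S1, S2, S3)
prod = product_form(40, 50, 60, g1, g2, S1, S2, S3)
agree = abs(mix - prod) < 1e-12
print(agree)
```

Setting `g1 = g2 = 1` in the same sketch reduces the mixture to the ordinary product of the three component survival functions, which is case (iii).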
3. ESTIMATION OF AGE AT DEATH DISTRIBUTION:
PARAMETRIC APPROACH

Suppose that we have a sample of $n$ individuals followed until all $n$ had died. Let $n_\alpha$ denote the number of deaths from cause $C_\alpha$, and $x_{\alpha j}$ be the time at death of the $j$th individual who died from cause $C_\alpha$ in presence of all other causes. The corresponding likelihood is

$$L_\alpha = \prod_{j=1}^{n_\alpha} Q_\alpha(x_{\alpha j}) = \prod_{j=1}^{n_\alpha} a\mu_\alpha(x_{\alpha j})\, S_X(x_{\alpha j}), \tag{3.1}$$

where

$$Q_\alpha(x) = -\left. \frac{\partial S_{1\ldots k}(x_1, \ldots, x_k)}{\partial x_\alpha} \right|_{x_1 = \cdots = x_k = x}. \tag{3.2}$$

The overall likelihood is

$$L = \prod_{\alpha=1}^{k} L_\alpha. \tag{3.3}$$

Likelihood for grouped data can be obtained using the probabilities of dying in specified intervals (multinomial).

Of course, in view of nonidentifiability of the joint survival function, we cannot be sure that our estimates of the $S_\alpha(x)$'s are correct, even if the model fits the data well, unless there are reasons for assuming a special parametric form of the joint survival function. (See Section 1.2.)
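As a hedged illustration of the likelihood (3.1)-(3.3), consider the simplest parametric case: every individual is liable to every cause and the 'times to die' are independent exponentials with rates $\lambda_\alpha$ (an assumption made here only for the example, not a model proposed in the text). Then the crude hazard is $a\mu_\alpha(x) = \lambda_\alpha$ and $S_X(x) = \exp(-x \sum_\alpha \lambda_\alpha)$, so each death at age $x$ from cause $C_\alpha$ contributes $\lambda_\alpha \exp(-x \sum_\beta \lambda_\beta)$. The sample data and function names below are hypothetical.

```python
import math

def log_likelihood(times_by_cause, rates):
    """Log of the overall likelihood (3.3) under independent exponential causes.

    times_by_cause[a] holds the ages at death of those who died from cause a;
    rates[a] is the assumed constant hazard of cause a.
    """
    total_rate = sum(rates)
    ll = 0.0
    for lam, times in zip(rates, times_by_cause):
        for x in times:
            # log of crude hazard times overall survival at age x
            ll += math.log(lam) - total_rate * x
    return ll

def mle_rates(times_by_cause):
    """Closed-form maximizer for this model: deaths from a cause / total exposure."""
    exposure = sum(sum(times) for times in times_by_cause)
    return [len(times) / exposure for times in times_by_cause]

# Hypothetical complete sample: two deaths from C1, four from C2.
times = [[2.0, 8.0], [3.0, 5.0, 9.0, 3.0]]
rates_hat = mle_rates(times)
print(rates_hat, log_likelihood(times, rates_hat))
```

For a mixture model such as (2.1), the same structure applies, with the crude hazards and $S_X(x)$ replaced by the corresponding expressions of that model; the maximization then generally has to be done numerically.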
4. SURVIVAL MODEL WITH TWO
CAUSES OF DEATH. NONPARAMETRIC APPROACH

4.1. Nonparametric estimation of marginal survival functions can be derived only for models based on independence of (unconditional) 'times to die'. For example, in our case of $k = 3$ causes, with incomplete liability, only model (2.9) can be considered; for the remaining models only bounds can be obtained (Peterson (1975)).

In deriving the nonparametric estimates it is convenient to distinguish only two causes: the specific cause, $C_1$, say, and the 'other causes' (except $C_1$), $C_2$, say. Our model (2.1) takes now the form

$$S_{12}(x_1, x_2) = \phi S_1(x_1) S_2(x_2) + (1 - \phi) S_2(x_2), \tag{4.1}$$

or

$$S_{12}(x_1, x_2) = [1 - \phi F_1(x_1)]\, S_2(x_2), \tag{4.1a}$$

where $\phi$ $(0 < \phi < 1)$ is the proportion of individuals who are liable to die from $C_1$ as well as from $C_2$, while the proportion $(1 - \phi)$ is 'susceptible' only to 'other causes', $C_2$.

It is worthwhile to notice that (4.1) resembles a model discussed by Berkson and Gage (1952). In their problem, the population consists of patients who had an onset of cancer, and $(1 - \phi)$ ($= c$ in their notation) corresponds to patients who are 'cured', while $\phi$ ($= 1 - c$) are those who died from cancer.
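4.2. By life table techniques, we can only obtain the marginal distributions, $S_{\alpha\cdot}(x)$ $(\alpha = 1, 2)$. For the specific cause, $C_1$, we have

$$S_{1\cdot}(x) = \phi S_1(x) + (1 - \phi) = 1 - \phi F_1(x). \tag{4.2}$$

This implies that the proportion of those who are liable to die from $C_1$ is

$$\phi = 1 - S_{1\cdot}(\infty). \tag{4.3}$$

Therefore, the age at death distribution, $F_1(x)$, can be obtained from the formula

$$F_1(x) = \frac{1 - S_{1\cdot}(x)}{1 - S_{1\cdot}(\infty)}, \tag{4.4}$$

and

$$S_1(x) = 1 - F_1(x) = \frac{S_{1\cdot}(x) - S_{1\cdot}(\infty)}{1 - S_{1\cdot}(\infty)}. \tag{4.4a}$$

For 'other causes', $C_2$, we have, of course,

$$S_{2\cdot}(x) = S_2(x). \tag{4.5}$$

The force of mortality of the marginal (improper) survival function, $S_{1\cdot}(x)$, is

$$\mu_{1\cdot}(x) = \frac{\phi f_1(x)}{1 - \phi + \phi S_1(x)} = \frac{f_1(x)}{\dfrac{1 - \phi}{\phi} + S_1(x)}, \tag{4.6}$$

while the force of mortality of $S_1(x)$ is

$$\mu_1(x) = \frac{f_1(x)}{S_1(x)}. \tag{4.7}$$

Of course,

$$\mu_{1\cdot}(x) \le \mu_1(x). \tag{4.8}$$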
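The step from the improper marginal to the age at death distribution is a simple rescaling. The sketch below applies (4.3), (4.4) and (4.4a) to a few values of the marginal survival function $S_{1\cdot}(x)$ taken from Table 1 of the example (ages 0, 45, 70, 95 and the limiting value at infinity); the function name is hypothetical, and the last entry of the input is assumed to stand for $S_{1\cdot}(\infty)$.

```python
def age_at_death_from_marginal(marginal):
    """Rescale an improper marginal survival function.

    marginal: values of the marginal survival function on an increasing age
    grid; the final value is taken as its limit at infinity.
    Returns (phi, F1, S1) following (4.3), (4.4) and (4.4a).
    """
    s_inf = marginal[-1]
    phi = 1.0 - s_inf
    F1 = [(1.0 - s) / phi for s in marginal]
    S1 = [1.0 - f for f in F1]
    return phi, F1, S1

# Marginal values for cancer at ages 0, 45, 70, 95 and infinity (Table 1).
marginal = [1.0000, 0.9911, 0.8948, 0.5673, 0.2358]
phi, F1, S1 = age_at_death_from_marginal(marginal)
print(round(phi, 4), [round(s, 4) for s in S1])
```

The resulting proper survival values agree, up to rounding, with the corresponding column of Table 1.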
4.3. When the data are complete, and the individual times at death, $t_{\alpha j}$ $(\alpha = 1, 2)$, are recorded, the product-limit estimate (often called the Kaplan-Meier estimate) can be used for estimating the marginal distributions $S_{\alpha\cdot}(x)$. Examples with this kind of data are given by Hoel (1972).

From population data, we construct the multiple decrement life tables. It is customary to construct a single decrement life table associated with a given multiple decrement life table, where a specific cause ($C_1$) is eliminated. This is equivalent to calculating the marginal (proper) distribution $S_{2\cdot}(x)$ of model (4.1). Of course, one may also construct by the same method the marginal distribution $S_{1\cdot}(x) = 1 - \phi F_1(x)$, and then the age at death distribution, $F_1(x)$. This problem is discussed in the next section.
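For complete individual data, the product-limit estimate of the marginal survival function for one cause can be sketched as follows. This is an illustrative implementation under the independence assumption discussed above, in which deaths from the other cause are treated as withdrawals; the function name and the small data set are hypothetical.

```python
def product_limit(times, causes, cause):
    """Product-limit estimate of the marginal survival function for `cause`.

    times[i] is the age at death of individual i and causes[i] its cause.
    Deaths from any other cause are treated as withdrawals (censoring).
    Returns a list of (age, survival) pairs at the ages where deaths
    from `cause` occur.
    """
    data = sorted(zip(times, causes))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0      # deaths from the cause of interest at age t
        removed = 0     # all individuals leaving the risk set at age t
        while i < len(data) and data[i][0] == t:
            if data[i][1] == cause:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

# Hypothetical complete sample of six deaths from causes 1 and 2.
times = [2, 3, 3, 5, 8, 9]
causes = [1, 2, 1, 2, 1, 2]
curve = product_limit(times, causes, cause=1)
print(curve)
```

In this toy sample the estimate levels off at a positive value because the last death is from the other cause, which is how an improper marginal shows up in data; one minus that terminal value plays the role of the liability proportion in (4.3).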
5. APPLICATION TO POPULATION
MORTALITY DATA: LIFE TABLES

It is assumed that the reader is familiar with the terminology of multiple decrement life tables. Notation is, in fact, not the same in different countries. In the present paper, we try to conform to English notation, but using subscripts rather than superscripts (e.g. $al_{\alpha x}$ instead of $al_x^{(\alpha)}$, etc.). We also assume that the survival functions are the 'true' functions, since the life tables are models, not data.

5.1. Overall survival function, $S_X(x)$. The $al_x$ column gives the number of survivors at exact age $x$ out of $al_0$ newborns. Thus we have

$$S_X(x) = \frac{al_x}{al_0}. \tag{5.1}$$
5.2. The survival function for $C_1$ in presence of all other causes, $aS_1(x)$. Multiple decrement life tables also give columns of $ad_{\alpha x}$, the number of life table deaths between age $x$ and $x + 1$ from cause $C_\alpha$. Using the $ad_{\alpha x}$ column, we can calculate

$$al_{\alpha x} = \sum_{y=x}^{\infty} ad_{\alpha y},$$

the number of individuals present at age $x$ who eventually die from $C_\alpha$. The crude probability is $\pi_\alpha = al_{\alpha 0}/al_0$. Hence $P_\alpha(x) = al_{\alpha x}/al_0$, and the survival function for those who died from $C_\alpha$ in presence of all other causes is

$$aS_\alpha(x) = \frac{P_\alpha(x)}{\pi_\alpha} = \frac{al_{\alpha x}}{al_{\alpha 0}}. \tag{5.2}$$

5.3. Age at death distribution, $F_1(x)$. Consider two causes, $C_1$ (specific) and $C_2$ ('other causes'). Let $l_{1x}$ denote the number of survivors at age $x$ in a population in which other causes are ignored. Thus $S_{1\cdot}(x) = l_{1x}/l_{10}$ represents the marginal survival function for cause $C_1$.

Let $ad_x = al_x - al_{x+1}$ denote the number of deaths between age $x$ and $x + 1$, and $ad_{2x} = ad_x - ad_{1x}$ the number of deaths from 'all other' causes ($C_2$). The crude conditional probabilities of dying from all causes, from $C_1$, and from $C_2$, in presence of all causes acting simultaneously, are respectively

$$aq_x = \frac{ad_x}{al_x}, \qquad aq_{\alpha x} = \frac{ad_{\alpha x}}{al_x}, \qquad \alpha = 1, 2. \tag{5.3}$$

Let $q_{1x}$ denote the conditional probability of dying from $C_1$ when $C_2$ is ignored.
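The crude quantities of 5.2 are reverse cumulative sums of the cause-specific death column. The sketch below computes $al_{1x}$, $\pi_1$, $P_1(x)$ and $aS_1(x)$ from the $al_x$ and cancer death columns of Table 1 (abridged age groups starting at 0, 1, 5, 10, ..., 95, 100, the last one open-ended). The variable and function names are hypothetical; the numbers are those printed in Table 1.

```python
# Columns al_x and nad_1x of Table 1 (US Life Tables 1959-61, White Males).
# Age groups start at 0, 1, 5, 10, ..., 95, 100 (the last one is open-ended).
AL = [10_000_000, 9_740_831, 9_701_491, 9_675_807, 9_650_295, 9_590_771,
      9_510_576, 9_440_060, 9_358_859, 9_242_738, 9_053_254, 8_742_387,
      8_246_310, 7_548_471, 6_583_375, 5_382_535, 4_020_739, 2_599_327,
      1_306_474, 460_007, 95_642, 11_513]
AD1 = [723, 4_832, 4_548, 3_522, 4_557, 5_518, 7_333, 9_942, 15_388, 27_634,
       51_293, 92_672, 140_748, 197_779, 235_224, 241_044, 214_255, 159_413,
       80_748, 24_257, 3_813, 327]

def crude_quantities(al, ad1):
    """Return (pi1, P1, aS1) for cause C1 in presence of all other causes."""
    # al_{1x}: deaths from C1 at age x and beyond (reverse cumulative sum).
    al1 = []
    running = 0
    for d in reversed(ad1):
        running += d
        al1.append(running)
    al1.reverse()
    pi1 = al1[0] / al[0]                  # crude probability of dying from C1
    P1 = [v / al[0] for v in al1]         # P_1(x) = al_{1x} / al_0
    aS1 = [v / al1[0] for v in al1]       # aS_1(x) = al_{1x} / al_{10}, as in (5.2)
    return pi1, P1, aS1

pi1, P1, aS1 = crude_quantities(AL, AD1)
print(round(pi1, 4), round(P1[15], 4), round(aS1[15], 4))  # index 15 is age 70
```

Up to rounding, the values at age 70 match the corresponding entries of Table 1.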
~13-
(i)
One method of estimating
if they were withdrawals.
qlx
is to consider, deaths
ad
2x
as
Thus
aqlx
al
-1 ad
x 2
2x
= -1---
~
(5.4 )
1 - 2 aq2x
(e.g. see Jordan (1967), p. 279.)
(ii)
There are some other approximations to
qlx'
Among these the
following approximate formula was used in constructing the US Life Tables
1959-61 (1968)
1
1 - - aq
2x
2
l-aq 2x
Then
llX
(5.5)
is calculated from the formula
(5.6)
where
is the commonly chosen radix.
The marginal survival function, Slo (x)
is then
(5.7)
(5.8)
The age at death distribution,
Fl(x),
l~SlO
(x)
l-S
(00)
l'
is from (Y,y)
(5.9)
and
Sl (x)
In a similar way we derive
(5.9a)
1 - F1 (x)
=
S2 (x) .
0
In fact, we have
52 (x)
= 52.ex).
-14-
Notice that we should have
¢ > 7T l'
because there should be more
individuals susceptible to a specific cause than individuals who actually
die from this case.
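The construction of the single decrement table for $C_1$ can be sketched end to end. The code below is an illustration, not the original computation: it assumes that the approximation $q_{1x} \approx aq_{1x}(1 - \tfrac12 aq_{2x})/(1 - aq_{2x})$ is applied to every age group of Table 1, including the final open-ended one, and that the radix is 1. Under those assumptions it reproduces the marginal survival column, $\phi$, and the proper survival function for cancer to within rounding. Names are hypothetical; the data are the $al_x$ and cancer death columns of Table 1.

```python
# Columns al_x and nad_1x of Table 1 (US Life Tables 1959-61, White Males).
AL = [10_000_000, 9_740_831, 9_701_491, 9_675_807, 9_650_295, 9_590_771,
      9_510_576, 9_440_060, 9_358_859, 9_242_738, 9_053_254, 8_742_387,
      8_246_310, 7_548_471, 6_583_375, 5_382_535, 4_020_739, 2_599_327,
      1_306_474, 460_007, 95_642, 11_513]
AD1 = [723, 4_832, 4_548, 3_522, 4_557, 5_518, 7_333, 9_942, 15_388, 27_634,
       51_293, 92_672, 140_748, 197_779, 235_224, 241_044, 214_255, 159_413,
       80_748, 24_257, 3_813, 327]

def net_quantities(al, ad1):
    """Return (marginal, phi, S1) for cause C1 with other causes ignored.

    marginal[i] is the marginal survival value at the start of age group i;
    one extra final entry is its value after the open-ended last group.
    """
    # Deaths from all causes per age group; everyone alive at the start of
    # the last (open-ended) group dies in it.
    ad = [al[i] - al[i + 1] for i in range(len(al) - 1)] + [al[-1]]
    marginal = [1.0]
    for lx, d, d1 in zip(al, ad, ad1):
        aq1 = d1 / lx                 # crude probability of dying from C1
        aq2 = (d - d1) / lx           # crude probability of dying from C2
        q1 = aq1 * (1 - 0.5 * aq2) / (1 - aq2)   # approximation (5.5)
        marginal.append(marginal[-1] * (1 - q1))
    s_inf = marginal[-1]
    phi = 1 - s_inf                                   # as in (4.3)
    S1 = [(s - s_inf) / phi for s in marginal]        # as in (4.4a)
    return marginal, phi, S1

marginal, phi, S1 = net_quantities(AL, AD1)
print(round(phi, 4), round(marginal[15], 4), round(S1[15], 4))  # index 15 is age 70
```

Replacing the line that computes `q1` with `aq1 / (1 - 0.5 * aq2)` gives the withdrawal-type approximation (5.4) instead; the two differ noticeably only at the oldest ages, where deaths from other causes dominate.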
5.4. Survival function in a population with additional risk. In many investigations of survivorship, and especially in clinical trials, it is important to know the mortality pattern (life table) of the population exposed to the additional risk of dying from a specific cause such as, for example, cancer, diabetes, tuberculosis, etc. Follow-up studies of such populations are difficult and expensive: usually the population represents a mixture of persons of different ages; patients enter the study at various time points; and it takes a long time (and high cost) to follow each patient until death. It seems that a survival function of the form

$$S_X^{\#}(x) = S_1(x)\, S_2(x) \tag{5.10}$$

might sometimes be a fair approximation to the survival function for such a population. It can be interpreted as the survival function among individuals who all are exposed to risk of dying from a specific cause ($C_1$) as well as from other causes ($C_2$). Clearly, we will have

$$S_X^{\#}(x) \le S_X(x), \tag{5.11}$$

where $S_X(x)$ is defined in (5.1).
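In life table notation, we have

$$S_X^{\#}(x) = \frac{l_{1x} - l_{1\infty}}{l_{10} - l_{1\infty}} \cdot \frac{l_{2x}}{l_{20}}, \tag{5.12}$$

where $l_{2x}$ is obtained for $C_2$ in the same way as $l_{1x}$ for $C_1$.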
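A quick numerical check of the product form and of the inequality with the overall survival function can be made with a few rows of Table 1. The sketch below uses the printed four-decimal values at ages 45, 70 and 95; the dictionary names are hypothetical.

```python
# Selected Table 1 values (White Males, cancer as C1) at ages 45, 70 and 95.
S1 = {45: 0.9883, 70: 0.8624, 95: 0.4338}   # cancer acting alone
S2 = {45: 0.9135, 70: 0.6015, 95: 0.0170}   # other causes
SX = {45: 0.9053, 70: 0.5383, 95: 0.0095}   # overall survival function

# Survival function of a population fully exposed to both causes.
S_sharp = {age: S1[age] * S2[age] for age in S1}

for age in sorted(S_sharp):
    print(age, round(S_sharp[age], 4), S_sharp[age] <= SX[age])
```

The products agree with the last column of Table 1 up to rounding, and each lies below the overall survival value at the same age.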
For the method to be efficient, it is rather important that the life tables be as complete as possible. Unfortunately, most available multiple decrement life tables are abridged and, what is worse, the last recorded age is very often only 85. The coefficient $\phi$ is rather sensitive to the greatest age $x$ for which a value is given, and clearly age 85 is not a sufficiently old age. However, the US multiple decrement life tables 1959-61 have some columns extending to age 100, still with five year intervals. This should lead to a somewhat better estimate of $\phi$, though obviously not the best.
EXAMPLE. We now apply the method to cancer mortality, using the US multiple decrement life tables 1959-61 (1968) for White Males. Columns 2-4 of our Table 1 are extracted from these tables. Of course, we are now using the notation for the abridged life tables: ${}_n ad_x$ instead of $ad_x$, etc. The remaining columns (6-12) exhibit various survival functions which have been discussed in this section.

Note that $\pi_1 = P_1(0) = 0.1526$ is the proportion of White Males who ultimately die from cancer in presence of all other causes acting in the population, while $\phi = 1 - S_{1\cdot}(\infty) = 1 - 0.2358 = 0.7642$ is the proportion of White Males who were under additional risk of dying from cancer.

Figures 1A through 4A represent the following survival functions for White Males: the overall survival function, $S_X(x)$ (Fig. 1A); the
TABLE 1
Various 'survival' distributions associated with cancer (U.S. Life Tables 1959-61, White Males)

Age group
x to x+n        al_x      nad_x   nad_1x     nad_2x   S_X(x)   P_1(x)  aS_1(x)  S_1.(x)   S_1(x)   S_2(x)  S#_X(x)
(1)              (2)        (3)      (4)        (5)      (6)      (7)      (8)      (9)     (10)     (11)     (12)

0-1       10,000,000    259,169      723    258,446   1.0000    .1526   1.0000   1.0000   1.0000   1.0000   1.0000
1-5        9,740,831     39,340    4,832     34,508    .9741    .1525    .9995    .9999    .9999    .9742    .9741
5-10       9,701,491     25,684    4,548     21,136    .9701    .1520    .9964    .9994    .9993    .9707    .9700
10-15      9,675,807     25,512    3,522     21,990    .9676    .1515    .9934    .9990    .9986    .9686    .9673
15-20      9,650,295     59,524    4,557     54,967    .9650    .1512    .9911    .9986    .9982    .9664    .9646

20-25      9,590,771     80,195    5,518     74,677    .9591    .1507    .9881    .9981    .9975    .9609    .9585
25-30      9,510,576     70,516    7,333     63,183    .9511    .1502    .9845    .9975    .9968    .9534    .9503
30-35      9,440,060     81,201    9,942     71,259    .9440    .1495    .9797    .9968    .9958    .9471    .9431
35-40      9,358,859    116,121   15,388    100,733    .9359    .1485    .9731    .9957    .9944    .9399    .9346
40-45      9,242,738    189,484   27,634    161,850    .9243    .1469    .9631    .9941    .9922    .9298    .9226

45-50      9,053,254    310,867   51,293    259,574    .9053    .1442    .9449    .9911    .9883    .9135    .9028
50-55      8,742,387    496,077   92,672    403,405    .8742    .1390    .9113    .9854    .9809    .8872    .8702
55-60      8,246,310    697,839  140,748    557,091    .8246    .1298    .8506    .9747    .9669    .8461    .8180
60-65      7,548,471    965,096  197,779    767,317    .7548    .1158    .7583    .9574    .9443    .7884    .7445
65-70      6,583,375  1,200,840  235,224    965,616    .6583    .0959    .6287    .9309    .9096    .7072    .6433

70-75      5,382,535  1,361,796  241,044  1,120,752    .5383    .0724    .4745    .8948    .8624    .6015    .5187
75-80      4,020,739  1,421,412  214,255  1,207,157    .4021    .0483    .3165    .8495    .8030    .4733    .3801
80-85      2,599,327  1,292,853  159,413  1,133,440    .2599    .0269    .1760    .7945    .7311    .3272    .2392
85-90      1,306,474    846,467   80,748    765,720    .1306    .0109    .0715    .7269    .6427    .1799    .1156
90-95        460,007    364,365   24,257    340,108    .0460    .0028    .0186    .6502    .5423    .0710    .0385

95-100        95,642     84,129    3,813     80,316    .0095    .0004    .0027    .5673    .4338    .0170    .0074
100+          11,513     11,513      327     11,186    .0012    .0000    .0002    .4854    .3266    .0024    .0008
∞                                                      .0000    .0000    .0000    .2358    .0000    .0000    .0000

Columns (6)-(12) are evaluated at the exact age x at which each age group begins.
[Fig. 1A. Survival function $S_X(x)$. Horizontal axis: AGE (X).]
[Fig. 2A. Survival function $aS_1(x)$. Horizontal axis: AGE (X).]
[Fig. 3A. Survival function $S_1(x)$. Horizontal axis: AGE (X).]
[Fig. 4A. Survival function $S_X^{\#}(x)$. Horizontal axis: AGE (X).]

[Fig. 1B. 'Curve of deaths' $H_X(x)$. Horizontal axis: AGE (X).]
[Fig. 2B. 'Curve of deaths' $aH_1(x)$. Horizontal axis: AGE (X).]
[Fig. 3B. 'Curve of deaths' $H_1(x)$. Horizontal axis: AGE (X).]
[Fig. 4B. 'Curve of deaths' $H_X^{\#}(x)$. Horizontal axis: AGE (X).]
survival function of those who eventually die from cancer in presence of all other causes, $aS_1(x)$ (Fig. 2A); the survival function from cancer acting alone, $S_1(x)$ (Fig. 3A), which is the complement to the age at death distribution from cancer, $F_1(x)$; finally, the survival function among all those who are subject to the additional risk of dying from cancer, $S_X^{\#}(x)$ (Fig. 4A).

Figures 1B through 4B are the corresponding proportionate distributions of deaths in the form of histograms. We call them 'curves of deaths' and, for simplicity, they are denoted as $H_X(x)$, $aH_1(x)$, $H_1(x)$ and $H_X^{\#}(x)$, respectively.
We notice that the overall survival function $S_X(x)$ (Fig. 1A) and the survival function with additional risk of cancer $S_X^{\#}(x)$ (Fig. 4A) are similar. (Only a slight difference is noticeable for ages above 60.) The corresponding 'curves of deaths' are also similar (Figures 1B and 4B).

Figures 2A and 3A (and also Figures 2B and 3B) describe the mortality from cancer in presence and in absence of other causes, respectively. There are remarkable differences between $aS_1(x)$ (Fig. 2A) and $S_1(x)$ (Fig. 3A), which become more apparent in the corresponding graphs of $aH_1(x)$ (Fig. 2B) and $H_1(x)$ (Fig. 3B). The peak of deaths from cancer in presence of other causes is between ages 65 and 75 (Fig. 2B), while in their absence it is beyond age 100 (Fig. 3B). This indicates that a fairly high proportion of White Males who would have died from cancer at older ages is not, in fact, observed, because they died from other causes before reaching extreme old age.
Of course, we should always keep in mind the assumptions under which multiple decrement life tables are constructed. Especially, we should be aware of the assumption $\mu_{\alpha\cdot}(x) = a\mu_\alpha(x)$ $(\alpha = 1, 2)$, which cannot be tested (or proved) and on which the construction of single decrement tables and the derivation of age at death distributions from specific causes are based.
REFERENCES

1. Berkson, J. and Gage, R.P. (1952). Survival curve for cancer patients following treatment. J. Amer. Statist. Assoc. 47, 501-515.
2. Hoel, D.G. (1972). A representation of mortality data by competing risks. Biometrics 28, 475-488.
3. Jordan, C.W. (1967). Life Contingencies. The Society of Actuaries, Chicago. Chapter 14, p. 279.
4. Peterson, A.V. (1975). Bounds for a joint distribution function with fixed subdistribution functions: Application to competing risks. Technical Report No. 7, Stanford University, July 15, pp. 1-12.
5. Tsiatis, A. (1975). A nonidentifiability aspect of the problem of competing risks. Proc. Nat. Acad. Sci. U.S.A. 72, 20-22.
6. United States Life Tables by Causes of Death: 1959-61 (1968). U.S. Department of Health, Education, and Welfare, Public Health Service. Vol. 1, No. 6, Washington, D.C.
Acknowledgement
I would like to thank Mrs. Anna Colosi for the calculations in
Table 1 and for plotting the graphs using an IBM 370 Model 155 computer.