Read, K.L., Harris, R.R., Noura, and Gillings, Dennis; (1982)Survival Analysis with Time-Related Covariates."

•
THE INSTITUTE
OF STATISTICS
THE UNIVERSITY OF NORTH CAROLII'\A
.
SURVIVAL ANALYSIS WITH TIME-RELATED COVARIATES
by
K.L.Q. Read, R.R. Harris, A.A. Noura and Dennis Gillings
Department of Biostatistics
University of Norch Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1375
.
January 1982
DEPARTMENT OF BIOSTATISTICS
Chapel Hill, North Carolina
/
SURVIVAL
ANALYSIS
\H'l1I
TUIE-RELATED
COVARIATES
by
K. L.Q. READ· r• R.R. HARRIS"', A.A. NOURAt and DENNIS GILLINGS';'i'
SUMMARY
A general approach to the analysis of survival data is
developed, based on the fitting of log-linear and similar models
to multidimensional contingency tables formed from such data by
suitable categorisation of the time domain and the covariates
(including treatment).
Consideration is given to non-proportional
hazards, time-dependent or sequentially realised covariates,
alternative transformations of the survivor function and subsets
of the parameters which may be involved non-linearly.
The work
is presented in terms of weighted least squares estimation,
though this is not necessary to the underlying formulation and
other methods, of fitting may be adopted.
Some keywonds:
Survival analysis, categorical data, log-linear
models, conplementary log-log
transforn~tion,
time-related covariates, weiehted least squares.
'I-
p
Department of Mathematioat Statistios and
Unive~sity of Exeter.
Ope~ational
Researoh,
Department of Ri06tatistios, Sohoot of Publio Health, University
of No~th Carolina at Chapet Hill.
-1-
1.
Introduction
The purpose of this paper is to describe a general method for modelling
and analysis of survival data with censoring when a variety of risk
factors may be expected to condition the experiences of the individuals
under study.
This situation is in marked contrast to many industrial
applications (reliability and life-testing) in which conponents or
materials of supposedly uniform quality
a~e
tested under controlled conditions.
Survival processes in animal and human populations involve heterogeneous
individuals living under at least partly controlled condition.
The
analyst must first identify factors which, singly or in combination,
may affect mortality.
Secondly, the relevant factors serve to specify risk
groups of the subjectR or individuals under study; and thirdly it is required
to model the survivor functions appropriate to these groups.
In common with
much published work on survival analysis, this paper was motivated by
consideration of the results of a clinical trial concerned with the treatment
of a chronic disease.
Although this has conditioned terminology, the basic
approach is applicable more generally, for example to surveys of diseased
patients, or to workers exposed to long term occupational and other
environmental hazards.
Also 'death' may be interpreted broadly as the
occurrence of some critical event in the problem under study, such
a~
'death £! a repeat myocardial infarction'.
In a typical study, relevant factors such as age, sex, and previous medical
history may be known before onset of the disease, initial symptoms may be
recorded at or soon after onset, and subsequently symptoms that identify critical
development~
in the patient's condition may be observed.
In the case of a
clinical trial, treatment is itself a factor which will be realised at or after the
time of onset.
It is fair to assume that the procies ways in which these
-2-
factors condition survival through time will not be known a priori:
rather, it is in the hope of gaining luch knowledgc that survival .tudici
are undertaken.
A robust analysis based on a
~nimum
of structural
assumptions, expressed or implied, about the data i8 therefore called
for.
Otherwise, strong restrictions on the form of model which are
unsupported by prior knowledge may act like a straitjacket on the
data, so that erroneous relationships may be implied and true
patterns go undetected.
However, a review of the literature suggests
that the possibility of varying forms of time-dependence across risk
groups has not been fully explored.
Thus a good deal of effort has been
invested in the proportional hazards (PH) model, in which the hazards
for all subjects are multiples of an underlying function of time.
In
Cox's (1972) analysis, for example, no explicit form for this time
dependence need be assumed;
however, although the possibility is acknow-
1edged, thQrQ appears to he relatively little work on Reneral
alternatives to the PH model.
On the other hand, the fitting of linear
models to multidimensional contingency tables of categorised data has
Pfoved a fruitful means of catering for time-independent covariates
without building strong presuppositions of structure into the analysis.
In this paper, a synthesis of these two approaches is attempted.
In
the simplest (non-sequential) situation when the definition of z does not
change with time a two-way table is set up in which the columns
correspond to successive time-intervals and the rows to different subpopulations or risk groups.
determined by his risk vector
Each individual is then located 1n the cell
~
and survival time.
A full set of
parameters (one less than the number of intervals) is available to
describe the experience of each risk group through time, and there need
be no prior or built-in relationships between the parameters of different
groups.
When concomitant information arises sequential IX at specific points
-~
in time after t • 0, it is natural to divide the time axis at these points.
Since the definition of z and the numbers of risk groups then vary between
intervals, it is necessary to modify the above construction of the twoway table, but this entails no new problems of principle.
In both cases
strong prior assumptions about the effect of z are avoided by the use of
indicator variables, and a corresponding representation is adopted for
(the cumulative integral of) A (t).
o
A review of relevant literature is presented in Section 2.
In
Sections 3 and 4 respectively the analysis framework is described for
the cases on non-sequential and sequential covariates as distinguished
above.
A brief discussion is given in Section 5, and this is followed
by an Appendix in which specific systems of suitable parametric models are
considered.
-4-
2.
A Review of Some Earlier Work:
Comparison-invariance and Proportional Hazards
If
denotes time measured' from an arbitrary origin .uch as the 1n1ttal
t
exposure to an occupational or environmental risk or from the onset of a
disease or experimental test , then the survivor function of a typical
individual or subject may be written aa
p(T > tl.!.) •
~(tl!.)
(2.1)
• exp {- ft A(ul!.)du}
o
for a process in continuous time, where A(tl~) is the hazard function
at time
t
for a subject with concomjtant vector
~
which for the present is
assumed not to depend on time and which can therefore be specified at time t • O.
Analysis of the survival data is aimed at modelling
~r
equivalently A.
.~
Two aspects of this are (i) the dependence of ~
on z for fixed t. i.e. the variation of the chances of survival
,
-
to time t with the vector of risk factors z; and (ii) the mortality
experience through time for subjects with a given concomitant (or risk)
vector .:.'
h!.:..
the dependence of ~(tl.:.) on t.
In regard to (i) a special
case arisea when, if the hazards of all subjects are ordered at time t,
say as
, ). (t\z ) ,
-0
their functional
fo~
entail the same ordering at all other times.
In this event the survival .probabilities are in the reverse order
(2.2)
determined for all time by the time-independent risks z.
as comparisons
(~orderings)
In so far
of the chance. of .urvival of subjects
with different risk vectors are invariant through time, (2.2) represents
a property of cOmparison-lnvariance, which we 'hall refer to as CI.
-5-
A sufficient but not nece•• ary condition l for the CI of a .et of ~urvivor
It functions
is that the hazard A(t!.!) be uparable, Le. factoriae as the
product of a functiQD of time only.by a function of.! only:
>.(tlz)
-
= A0 (t)h(z)
-
(2.3)
(2.3) is often referred to al the proportional haurd. (PH) model; clearly
the functions
~
and h are arbitrary to within a con.tant multiple. A (t)
o
0
may conveniently be regarded as a reference or 'minimum ri.k' hazard function
appropriate to standard or normative condition. coded as z
~
which
h(~)
- 1.
Furthermore, the
repre.e~tation
(say) and for
(2.3) will later
be used generally on the ba.is that.! may depend on time.
Cox's Approach and ·Some Developments
In many cases, .uch as finding the be.t form. of treatment for classes
e
of patients suffering from a degenerative or other severe disease. interest
mainly resides in identifying, and then comparing outcome. for. different risk
groups (- risk-factor-by- treatment combinationl, altogether indexed by.!).
By contrast. there may be little knowledge of (or concern with) the timedependence of A(tl.!), that i., given proportional hazards, of the form of
AO(t).
Analytical methods which do not need .trong prior (and possibly VTOng)
structural assumption. about Ao (t)
. (for example, that all .urvival time
distributions belong to the exponential family) may be expected to have a
major advantase of robuatne...
Thu. the method of Cox (1972) con.ist. in
maximising a marginal likelihood which doe. not involve AO(t) (which could
be regarded as a nui.ance function).
It
Kalbfleilch and Prentice (1973) have
1 Suppose that we have two group. of .ubject., indexed by z-o and z-1, and
that A(tl~) - A(l+t)z for t>O, l on the RHS being a pOlitive con.tant.
Clearly the hazard. of the two sroup. are not proportional. However, the
.urvivor function. q(t!O) and .(tll) are re.pectiveJy siven by expC-At) and
exp(- A~2 - At), and =l(t!U <
~t !O)
for all
t )
o.
-6-
.hown that the ordering of individual. who die (fail) by their
~ failure times is a marr.1nally luff1c1ent .tat1.tic for t~e
p.rameterl of h(~). i.e ••ufficient for th••e p.rameter. in the ab.ence of
knowledge of ~o(t). rbi. le.d. to the formul.tion of the marainal likelihood
L a.
n
L· failure time.
til i-l ••••• nf
(2.4)
in which a tota'l of nf .ubjectl die (and known numben of .ubjectl may be
t
cen.ored or IOlt to follow-up at variou. known ci... ). and tei
run. over all
i
tho.e who .urvive until time t i (or t i - 0). that i •• over the rilk .et ii
With proportional "haaard. the null h.aard ~o(.) cancel. from
at time tie
(2.4) and Cox'. choice h(!.> • exp (!'!.> then aive.
(2.5)
L•
The lineor form !'~ permit. a .y.tematic explor.tion of the d.ta uling
likelihood ratio te.t. corre.pondina to
the ne.ted hypothe.i, .trategy for
linear mod.l. whil.:t the re.ultina ML I.timate. of parameter. are marginally
.ufficient and have the u.u.l BAN propertie..
Given
i.
Cox .ub.equently
.,timete. ~o(t) a•• ',um of lep.r.tely "tLa,ated contribution. ari.ina from
the f.ilure point. t i not l.t.r theA ttme t. taking ~ o (t) . ! 0 .xcept .t the.e
point.. Bre.low (1974) r.pre••nt. Ao(t) •• takina ••erie. of con.tant value.
(to be .,timeted) between .ucce••ive failure point. and e.timate••11 p.rameter.
,imultaneou.ly.l
~
Cox'. tre.tment of the (.liahtlY-arouped) di.crete-time c••••
1 De.pite the likely l.rll number of p.rameter. to be e.timet,d, Bre.lov'. fin.l
Iquation, for 8 .r. idlntic.l with tho.e of Cox if, in the ca.e of .imultaneou.
failure' followinl .11aht aroupina on • continuou. time .c.le. Peto' ••pproxim.tion
to L replace. that of Cox (Cox (1972). di.cu•• ion). It may b, conjectur.~ on the
ba.i. of this equiv.lence th.t thl e.timation of •• many parameter••• f.llurepoint. .imultan.ou.ly with! .ay not n.c••••rily up,et the inf.rence. on!
"~I'
"..
I
I,
(f"t,.-,...,
II~,- .. ,-~
-7-
in which tied failure times may oecur, is refined by Kalbfleisch and Prentice
4It(l973), who eive the marsinal likelihood for
~
co which COX'I likelihood is a
fair approximation when the groupins intervals are narrow.
Following Cox, the.e
authorl first estimate! and then in terms of their! eltimate contributions
to Ao(t) arising from eaeh failure time.
The resulting eltimate ofl(tl!) i.
a step function related to the Kaplan-Meier (1958) produet limit estimate.
Kalbfleilch and Prentice also analYle with cholen (Imaller) number of time
(not failure) pointl, A et) beins a.sumed con.tant between luccessive time
o
points.
of
S and
The HL estimates of the.e constants are easily obtained in terms
A
o
(t) may thus be modelled in te~ of al many or al few parameter.
as desired.
This development yieldl continuou. eltimatel of survivor functionl
and in the use of prelpecified time intervall parallels our own approach.
The use of time-dependent covariatel z in the PH model was mentioned by
~cox
If the .tandard conditions or rilkl
(1972).
(e.g. if z
.
-0
~
do not vary with time
.
• 0) then Ao(t) Itill represent. the correlponding reference
hazard function, but comparilon-invariance will not hold in general when the
effect. (or definition) of
~
depands on time.
Several author. have noted thi.
possibility (for example Holt et al (1974), Thomp.on (1977), Prentice et al
(1978), ~reslow (1978», but we are aware of relatively few who have uled it in
applications.
Three who have are:
Cox (1972), who u.ed a .imple parametric
form to illustrate a po•• ible model of dep,rture from the PH model; Kay (1977).
(implicitly. by allowins different function. A (t) in different 'atrata' of
o
.ubjects); and Taulbee (1979), who characterised the hazard function by a
polynomial in t with coefficient. dependinl on
~ •.
-~
Analysis of Survival Data in Contingency Tables
Grizzle. Starmer and Koch (1969) - abbrieviated as
ho~ wei~lted
GS~have
de.cribed
least squares analysis of linear regresoion models may be used
to test hypotheses and fit simplified models to a multi-way table (such
as we envisage above) of cell frequencies arising from the cross-classification of subjects by categoricAl variable..
As stAted by Koch et at
(1972), linear and nonlinear functions of the cell proportions and a
corresponding dispersion matrix may be estimated, and the resulting test
statistics, which belong to the Minimum Modified Chi-Square
~CS)
ClASS
(Neyman, 1949) or equivalently to the quadratic form criterion of
Wold (1943), have central X2 distributions when the null hypotheses
at which they are directed are true.
Goodman (1970, 1971), NeIder (1977»
(1971»
Maximum Likelihood (~) {Bishop (1969),
and Minimum Discrimination
(MP)
(Ku et aZ
approaches can also be used in this situation and together with the
GSK method are based on BAN estimates (Koch et at (1972»
asymptotically equivalent.
and hence
In principle, therefore, the structural
models for contingency tables of. survival data put forward in this paper
could be anAlysed by any of the GSK, ML or KD methods.
GSK over
~~
Our preference for
and KD is secondary (though not always unimportant), arising
from the computational convenience of having explicit analytic formulAe
for the estimate..
All data are taken to be (or are rendered) categorical
in form, and the justification of the Normal theory X2 statistics essentially
ariseS
fr~m
the asymptotic Normality of multinomially distributed cell
frequencies when (roughly speaking) the average of these frequencies over
the whole table i ••ufficiently large (Johnson st at
least 25
ob8e~vation.
(1~71)
sugge.t at
per cell).
Illustrative applications of the GSK method to .urvival snalysi. with
covariates which do not depend on time are de.cribed in Koch et al (1972)
and Freeman et at (1974).
Thus, in an analysis of 5-year survival rate.
- 9 -
from breast cancer Koch et at, following Cutler and Kyerl (1967), considered
18 risk groups arising from 3 underlying factors (degree of skin fixation,
node status and tumour size) classified into 3, 2 and 3 levels respectively.
Two alternative saturated models involving 18 parametero were fitted to the
5-year survival rates from the 18 risk group..
Progressive simplification
by a process of exploratory hypothesis testing led to a 5-parameter model
(with a residual 'X 2 value of 2.76 on 13 degreel of freedom) ,~ich was used
as the basis of an empirical Itatistical classification of the extent of
disease in cancer of the breast.
Adaptations of the GSK approach to deal
with withdrawals, losses to follow-up and period (as opposed to cohort)
studies are also discussed in the above two references and will not be
further considered here.
-10-
3.
Categorical Data Analysis
Non-sequential CAGe
Basic Theorv. of the CSK method
The purpose of this section is to discu.. the modelling of ha:ards
-
A(tl~) in which the covariates z are defined at time t •
-
.
o.
For com-
pleteness of exposition we begin by repeating (with minor adaptation.)
elements of the basic GSK formulation
Koch et at (1972).
as given in CSK (1969) and
For simplicity we consider the ca.e of a follow-up
study of a disease
in which no losses to observation occur until
the planned final time of follow-up (or cen.oring),
from each individual's onset of disease.
~
measured
We divide the
time axis into k + I suitabl,. defined intervals
For subjects surviving a time
~
we note merely that t
otherwise the interval , t.j - < t , t •
j
l
of death is recorded.
> ~,
but
Dr ideally the exact time l
Since the concomitant data are categorical, we
have a strictly limited number of pOlsible different vectors z which can
be indexed a priori a,
categorical
~
~,
1 • 1, ••• , N.
Subject. with the lame
value are treated a. a homogeneous riSK group. and it i.
assumed that death. due to caule. unrelated to the diwease of interest
are excluded.
Then .ubjects may be classified in an N x (k + 1) contingency
table as envisaged in Section 1.
the cell frequency n
t j _l < t
~
ij
t j ( or t >
For i • 1. 2, ••• ,~, and J • 1. 2•••• , k
denote. the number of deaths in the jth interval
~,1. f
j • k + 1) occu;rln
• g i n the l. th
., group
r18~
(which is of totAl size n. equal to the i th row total), •.. denote. the
~.
lJ
"probability that a member of this group dies in the jth time interval, and
I
For a given choice of k, t l , ••• , ~-l' the preci.e times of death are
irrelevant for the GSK approa~h becau.e we are concerned only with the
chance of death iu each in:orval, which it estimated ~y wI tinomial
maxim~ likelihood in ter~ of the interval (or cell) frequenciel only.
However. l~nowledge of the exact time. permits one to reanalyse with
different numberl and choice. of interval. and compare their effectl
on the rusulta.
+
1
-11-
n ..
e
P•. denotes the corresponding multinomial estimate given by-!!.
LJ
n.1..
Writing
-t(t I-1.
z.) -~. (t) for brevity. ve .ee that. for all i and j.
1.
~.(t.)
1.
3
k+1
j
- 1 - I w1' "
..-1
-
I
wi...
.. _j +1
with corresponding multinomial estimate
.
:f 1..(t.)
J
We
00\1
j
k+l
- 1 - t P'a
1-1 1 ..
-
t
..-j+l
PU'
formvcctor&!.i andE,i' where.!!.i'· (w il •· .. , wi(k+1,>an~.£i'·
(Pil •••. ' Pi(k+l)' and composite vectors!. and and.£ given by
"
1f
I • (.!!.l' !.2.···'
P'
(pI
p'.
I
.!N)
pI)
• -1' -2····· -N
Then the GSK method can in principle be applied to fit models of the form
p
(w). X
(3.1)
Q
uxv .e.vxl·
-wel -
where! is a vector of u components which are twice differentiable
functions of the probabilities
w•.
.
13 (and u ~... tfk, the total degrees of freedom
of the data layout
v ,where v " u, and
{n1j }), X is a prescribed design matrix of full rank
B is a vector of v unknown parameters.
versions of the CSK computer program handle functions
!
Current
which can be represented
as a succession of linear, logarithmic or. exponential operations on the vector
~,
and these possibilities have facilitated many practical application••
We further define
e
V(w.)
-L
(jc.+l) x(k+.l)
-
1
n.
L.
wit (l-"U)
- "U 1f i2
·....
-wil"!(k+l)
-w i2 "U
-12 (l-w i2 )
·....
-w12" i(k+1)
•
-w i (k+1)w i1 -"i(k+l)w i2
·....
.
wi (k+l) (l-wi(k+l»
-12V(~l.)
- sample estimate of V(n.)j
-1
V(.e.) - block diagonal matrix having V(.e.i} on the main diagonal;
f (n) - any twice-differentiable scalar function of the elements
m-
~ ~;
w•• of w, m - 1, 2, ••• , u
lJ
-
f m(.E,) - f m('II') evalua ted at .2J
F' -(F(R-», - (f 1 (.E,),f (.E,),. .. , fu(E,»;
2
H
u x [(k+l)N] -
[afa:.~
('11')
lJ
S uxu -
ljJl
n •. - p ••
lJ
HV(n)H'.
..
The matrix S is the
.~)le
estimate of the covariance matrix of
I, exactly
when the components f 'are linear in the p.. 's and asymptotically as given
m
lJ
.
by the 'delta' method otherwise. We assume that the components fare
m
functionally independent and independent of the constraints
k+l
In.. - l, i-l, ••• ,N •
. 1 lJ
'
Jthen Hand S are of full rank u.
(3.2)
If difficulties arise .from occasional
zero cells, such zeros are r.placed by l/(k+l).
The weighted least squares estimate of
to
~ini~ise
! in the model
(3~1)
is b
determine~
the quadratic form
namely
The test statistic for the fit of the model (3.1) is
SS(!(.!) - X!) -
!.'s-~ - !' (X'S-IX)!,
which is asymptotically X2-di.tributed on u-v DF if (J.l) holds.
Aasuming (3.1),
-13-
the test of H :CB • 0, where C i. an arbitrary d x v matrix of full rank
o -
-
d , v, is based on
k'£' &(X'S-lX)-lc'J-lCk
SS(CB .,2,) •
which is asymptotically X2-distributed on d DF if H is true.
o
A'pplication to the Proportional Hazards (PP.) tfodel
As the genuine PH model implies, we begin by supposing that the parameter
vector
! representing the effects of the covariates is the same in all k+l
time intervals.
Putting h(.!.) •
!',!.
in equation (2.3) and then substituting
in (2.1) gives
q(tl.~.> • exp { - fot>'o(U)du ltxp<.f.!)}·
Applying the double-log transformation, we obtain
e
10[-10 ;t(tl.!.~
•
10P:Ao (UldU]. !'.!.'
To model the function >'0 (t) in the intervals t j -1
introduce a subvector of additional parameters
<
t
~
t j , j ·1, ....k, we now
1, where l' •
t.
f J).
exp(.,) •
J
0
(t1, .•• ,tk ) and
(u)du,
(3.3)
0
so that
.'j
I.n [- In'f i (t j )]
+ !'.!.!, i·l •••• ,N;j·l .....k. ~.
The model 0.4) is of the form !(.!) • Xi, where
l' • (1' ,!')
(3.4)
is the transposed
full vector of unknnwn parameters, and so 'can be analysed using the GSK method.
Explicitly.
!(!) •
~
[-
~
(A!)]
(3.5)
where A is a (Nk) x [N(k+18 block-diagonal matrix
A•
B
0
0
B
... • • •
0
0
0
• ••
• ••
• ••
•••
•••
B
• ••
' '1 ar formulation of this equation has been presented independently by
t A Slml
"'", "I'"
1)
Vr"~""
-,.,\0.
~ ..
(r'
rftcen,t. .,a.,er to the Ro"al Statistical Society.
-14-
vith blocks B given by .
B
kx(k+l)
-
0
1
1
• ••
1
0
0
1
• ••
1
•••
•••
0
•••
• ••
• ••
0
• ••
1
0
and
~
applied to a vector argument means the logarithm of each component.
Furthermore. from WTiting
~.(t.)
(3.6)
J
1
ve find that the matrix H of firlt partial derivatives of the components of
(3.5) with respect to the w.• 's il a block-diagonal matrix
1J
H1
0
H(Nk) x (N (k+l»
•
0
• ••
0
H
2
• ••
•••
• ••
• ••
•••
...
0
0
in which for i·l ••••N the k x (k+l) matrix
·..
H~
•
"N
is given by
vhere the k x k matrix C. is given by
1
and
y .. 1J
Had we writ ten
differently.
~.(t.)
J
~1
1n;.(t.)J -1. j-l, •.• ,k.
'f.1 (t.)
J
J
1
j
as 1 - t ~.1. the matrix H. would have come out Ilightly
1-1 1
1
However, the re.u1ting asymptotic covariance matrix S - HVH'
is invariant under ule. of the conltraint. (3.2).
e
The general form of the
design matrix X il
X(Nk) x (k,!"N-l)
.~
~
z '
-1
~
.!k
~
- ...
,~
z .'
• ••
.!k .!N'
,
(3.7)
-15-
where \. denotes the k x k identity matrix,
~
denotes a vector of units
~ and the (N-l)-dimensional vectors ~l' ••• '!N indicate the covariate levels
(or values) pertaining to the N risk groups.
We note that, given a
parametrisation 1 of the underlying hazard function A (t), only N-l parameters
a
(indicator variables, component. of
between the mortality
~)
are neces.ary to discriminate fully
experience. of the N groups.
The modelling of A (t) expressed by (3.2) appear. similar tat but is
a
much less restrictive than, that of Kalbfleisch and Prentice (1973).
Thus,
if for j • 1, ...• k we represent A (t) a.
a
then (3.2) implies that
9.
e J -
+J'-1
0
A.J • -t- -- t _- -
j
j l
•••
J
However, we do not require A (t) to be constant throughout each of a defined
-
a
set of time intervals.
Furthermore, specific distributional forms for Ao(t),
which we may not have wished to a.sume a priori, translate into linear or
nonlinear hyPothesea about the subvector
1.
Thes. hypothesea can be examined
by graphical or other methods, and, if linear, can be tested within the GSK
framework and built into the model if satisfactory.
Some example. with details
are listed in the Appendix.
Extension ,to the Cue of Time-dependent Covariate Effects
It will be seen from the dimensions of X as defined at (3.5) that (save
~ when k-l)
the number
(k+~-l)
of parameters fitted in the above analysis of
the PH model is le.s than the total number Nk required to specify a saturated
-16-
for the table of cell frequencies {n }. The difference representl the
lj
number of parameterl which in the context of the data l.yout {n .. } are
~del
~J
.vailable for modelling viol.tion. of .the PH ...umption.
However. the matrix (3.5)
need only be enlarged to the .quare matrix
X(Nlt)x(Nlt) •
~
Zl
~
Z2
•• •
Ik
•••
(3.8)
Zu
where. for i·l. 2••••• N. Zi il • k x (N-I)k block diagonal matrix with blocks
z ' • and the componentl of the. parameter vector
-i
1 are ordered as in
(3.9)
in which for j·l •••••k
1j i • • •ubvector of N-l p.rameterl which characterise
the relative mortality experience. of the N risk groups in the jth time interval.
A model based on (3.8) neces.arily fitl the data layout perfectly and any
reduced (unsaturated) line.r model involving fewer that Nk parameter.
CLn
then
be tested.
An
Example
In a given applic.tion, the .pecific reduced models of intere.t are likely
to depend on the ba.ic factor. underlying the definition of the N risk groups.
A natural situation, at least in the early .tages of analysis, is when the
ri~k
groups corre.pond to cell. in • cro••-cl.ssification of the basic factors
suitably categori.ed into lmall number. of level..
Taking the simple case of
two basic factor., .uppo.e th.t age (A) and relevant dileale history (H) are
of interest. thele being dichotomiled al .g. lell than 60 years
(A·
~
0)
versus age .60 or over"CA. 1) and a. pre.ence (B. 1) v.rsus
ab.ence (H • 0) of di••••• hi.tory.
We further choo.e to work with four
time intervals altosether: th.t il. k • 3 interv.l. up to the time of fin.l
follow-up.
To avoid complication. we ignore the po•• ibility 6f unknown or
-!l-
missing data.
The N.2 2 -4 risk groupa may be indexed by a vector
~,
where
Zl • 1 when there i. a hi.tory of the dileaae,
• 0 otherwise;
(3.10)
Z2 • 1 when the age is 60 or over
• 0 otherwise ;
We now display the vectors z.
-1
(whi~h
corre6pondi~g
to the different risk groups
are arbitrarily numbered 1, 2, 3, 4 .a shown):
H • A • 0 .•• -1
z • • (0, 0, 0)
H • 1,A-D •.~
z • • (l, 0, 0)
Group 1
Group 2
(3.11)
Group 3 : H • O,A-l : !.3 • - (0, 1, 0)
Group 4 :H-A-1 .=4
• z • - (1, 1, 1)
Underlying this coding ia • aaturated (interval-specific) ANOVA-like
e
parametrisation
('j' BRj , BAj , l!HAjj-l, 2, 3),
in terms of which we have, for all (four) posaib1e (v.1u~of) ~,
ln [-In q.(t j
IE] •
or in matrix terms
In [-In
t(
t l l.!.l)]
Ln[-1n '(t21.!l)]
In[-ln ~(t31.!.1)]
lnl-In ~(tll.!.2)1
In[-ln ~(t21.!2)j
1n[-1n ~(t31.!.1)]
1n[-IJl ~(tll.!.3>J
In[-In y<t21.!3)]
In [-In :f<t J I.!.J)]
In[-ln ~tll!4)]
In[-ln ~t21.!4)J
In [-In ~t3I!J'>:1
+j + BRjz l + BAj z 2 + BHAjZ3,
-
'0 0 o '0 0 o 10
I
I
0 1 0 0 0 o '0 0 o 10
I
0 0 1 10 0 0 0 0 o 10
--- - r - - - - - - - - - 1 0 0 1 0 0 '0 0 0 10
I
0 1 o '0 0 011 0 o 10
I
0 0 1L
0_0-0- 0
____
1- 0 o 1 1
I 1 0
-
1 0
0 1
o
o
,
o ,0 1 0 0
o 10 0 0 1 0
0 _1_10- -0 _0. l0_
___
o 11 1 1,0
0 1 0,0 0 o ,1
1 0
0 0 1 I 0 0 0,0
0
0
0
0
0
0
0
0
0
0
---
0
0
- - 1 - 0 0 0 0 0
1 0'0 0 0
0 0 1 0 -1 - 0
0- 0 1
0 0
1 1'0 0 0
0 0 11 1 1
0
I
.~
'1
'.2
'3
BH1
BAl
BHAl
SH2
BAl
SHA2
8H3
BA)
BHA3
(3.12)
-18-
in which the parameter vector conforms to the ordering apecified for 1 in (3.9)
~
In this illustration, Group 1 (H-A-o), for which ~l -~, is the standard
or reference group which his hazard function
). (t)
o
paraaeterbed
by +1' +2 and t 3 • The group having the leaat
unfavourable concomitant vector z is a natural candidate for this role.
-
Some
possible hypotheses of interest in this example are the following:
corresponding to hazards proportional across all risk groupSj
corresponding to proportional hazards across levels of A for fixed Hj
corresponding to no effect of A in the first intervalj and
Hod : BHAI • BHA2 - BHA3 • 0,
corresponding to additivity of H and A effects on the double-log scale.
Alternatively, when patterna of significant interactions occur in the ANOVA
version, a clearer picture may be obtained from a hierarchical representation
based on
Zl as defined in (3.10);
z2 • I when age is 60 or over and zl • 0 (and z2 • 0 othervio.) } 3.13)
1 - 1 when age is 60 or over and 11
3
- 1 (and 13 • 0 otherwiae) •
,
Underlying this coding ia a hierarchical parametriaation .
<+j' SHj' B(A in HJj' SeA in H1)j' j • 1,2, 3),
w~ere
'A in Ho' refers to the ri.k group of subjects with A • 1 within the
~ group with H • 0, and so on.
On
this basis the risk vector for Group 4 (a-A-I)
is now given by
.!q' • (1, 0, 1),
(3.14)
-19-
but ~l' ~2 and ~3 remain aa defined at (3.11).
Corre.ponain~ly,
·e
j • 1, 2, 3,
or in matrix terms (writing!fi(t.) for 1-(t./z.»
J
1n(-1n~1(tl)J
J
-1
10
O' 0
00 1 0
0
0 10
0
0
0
1
0: 0
0
0
0
I0
0
0
1n[-1n Y-l (t 3)]
_ 0_
~
]. ,-0_
~
_0~
2 _~ _0 ~ 0_ ~
0
+3
1n[-1n q.2(t )]
l
1n[-1n:f2(t2)]
1
0
0 I 1
0
0
0
0
BHl
0
0
_0_0_1.J~ ~ !1_O_O_041_~
0
B(A in Ho> 1
B(A in HI> 1
1n[-1n
011 (t 2)]
0
10
1
0
0
0 I 0
In[':'ln~3(tl)J
In[-1n.~3(t2)]
1n[-1n Y3(t >]
3
1n[-1n ~4 (t )]
l
1n[-1n~4(t2)]
In[-ln~4(t3)J
•
0
011
0
I
I
1n[-1ny.2(t 3)]
0
0 10
0
1
I
0
+1
+2
100101010001000
B
H2
0
B(A in "0)2
B(A in H )2
1
O' 0
0
0' 0
1
0
0
0 II. _
0 0_0 1.....0 _
1 -0
1, 0
_0_ 0_ 1
~
1
0
0
1
0
0
1
010
0
I
1
0
0,0
0 I 0
0
0
0
0
0 1 1 0 110 0 0
0011000 1 000:101
BH3
B(A in'
1
"0) 3
B(A in HI) 3
Again the parameter vector i. ordered as in (3.9).
The model (3.15), which like (3.12) il laturated and hence fully general, may
be deemed more convenient for testing hypotheles luch a'
Hoe: B(A in ~)l • B(A in Hl)l • B(A in Hl )2 • 0
which implies no effect of A in the firlt interval at level 0 of H and no effect
of A in the first two intervall at level 1 of H.
H
oe
In the ANOVA-like formulation
becomes
If we do not wi.h to work with an ANOVA-type or hierarchical design matrix, a
.imp1e
unstructured alternative i. to u.e the
~)
x (Nk) identity which
associates one parameter with each rilk-group by interval (l, ••• ,k) combination.
Por example,
IYlt.~tic
equaliti.1 of mortality rat•• can ea.i1y be explored
with this reprelentation.
(3.15)
-Lu4.
Categorical Data Analysis:
The Sequential Case
Generalities
In this section we extend our approach to deal with survival data when
additional concomitant information accrues at times after t - O.
To
an init1al set of factors such as age and relevant medical history
which are defined for all the subjects under study at time
t - O. new
factors may be obtained from clinical assessments or follow-up reports.
tlhen the new data arise at prescribed times in the disease episode it
would seem natural to divide the time axis at these points.
time t
i
Then at each
more risk groups can be identified by cross-classifying by the
newly observed factors those already defined.
On this basis the number of
risk groups is multiplied by the number of levels of each new factor introduced.
T~e
problems arising when several factors are used to classify a relatively
small set of data would call for further consideration in any given
application: whilst their obvious practical impoTtance is not denied,
the7 are not. however a major concern of this paper.
In general, similar
difficulties pertain to the discussion of survey data involving a large
number of potential explanatory factors. and require the exercise of
statistical substantive judgement in the context of whatever methods of
analysis are adopted.
Here we consider first a reformulation of the non-
sequential case which allows for the subdivision of risk groups at different
points in time.
The general representation of the sequential case is then
obtained and illustrated by an example.
Conditional Binomial Formulation of the Non-sequential Case
The work of this paragraph i. baaed on the fact that the multinomial
probability of observing death frequencies nil' n
i2
' •••• n ik in the i
th
risk group can a1ao be viewed aa a product of binomial probabilities in a
-1.1-
conditional sequence.
Thus, in the first interval nil is a realisation
* • ni • and w*il • wil
of a binomial distribution with parameters nil
Subsequently, that is for
(we use the notation nil - B(n i ., w*
il
j • 2, 3, ••• , k, given that
».
,j-l
- t~l nit • nie
(4.1)
.
1, n
B(*
h
.
.
subJects
surv1ve
to enter the J' th 1nterva
nij , w*)
ij , were
ij -
j
*
Wij -
lI'i'J
j-l
1 - z: 1I'it
t-1
<and 1
*1J
-
1' ••
-
1 - t lI"t
t-1
1
(4.2)
),
j-1
1 - t 11'.
t-1 1t
*
n't
Corresponding sample estimates Pij are given by writing Pit (- n~ ) for
1.
n ..
j - 2, 3, ••• ,k) •
* - P'l'
*1J
(In fact Pu
1 and p,.
n·lJ.
On this basis the data for the i th risk group are now laid out in a k )( 2
'lr
-4,
.t • 1,.,., k, in (4.2>U '
,
contlngency
tabl e, t h
e 'J th row of which refers to the interval t, 1 < t
.
ln
. n*..
Whlch
l.J
probabilities
.
subjects
*lJ
11'.,
J-
~
t,
J
are represented as dying or surviving with respective
and 1 -
*
11'".
1J
Accordingly, the D)del (3.5)
be written in terms of vectors !i,* and l?.i,* ,
11'
* and E.*'
will now
where
* lI'i2'
* ••• , lI'ik)'
*
(-~*,)' - (1I'i1'
i - 1, 2, ••• , N,
and
.
Ei* and .2.*
. are ana.logously
dehned.
We obtain
* - .!:!. (- C* .!:!. <1 - .!.*) ]
! * (!..J
where C* is a (Nk) )( (Nk) block-diagpnal matrix
the representat i on
(4.3)
-aIII
III
C
-
B
0
0
Bill
•• •
0
• ••
0
1
0
0
1
1
0
0
• ••
·.. 0
·.. B*...
• ••
III
with blocks B given by
III
~xk •
·..
... ... ...
1
1
0
• ••
0
...
• ••
1
1
1
III
The appropriate 'covariance matrix', V say, is an (Nk) x (Nk) block diagonal
matrix with blocks V~, i-l, ••• ,N where V~ is a k x k diagonal matrix with
1
1
III
III
•• (1-11' 1].
•• ) .
W1. th (.J,J. ) e 1emeut 11' 1]
1n wh·lC11, f or J•
_.0
III
~
2,nIII.. expresses t h e cond
···
ltl0nln;;
lJ
n ••
1J
on the outcome of the preceding interval.
III
The matrix H of first Dart;Al
derivatives of the components of (4.3) with respect to the
*, s 1S
.
11' ••
1J
a block-
diagonal matrix
III
III
H(Nk)X(Nk)
-
Hr
0
0
~
III
·..
·..
•••
...
• ••
0
0
• ••
0
0
...
III
~
in which for i - 1, 2 , ••• , N t he k x k matrlx
. HiIII is given by
*
III
III
III
Hi • - Gi B Pi
where the k x·k matrix G~ is given by
1
III
*
*
Gi - diag (Yil' ••• , Yik)
and
• tn
and the k x k matrix
P; is given by
[1 - 1-1~"Ifi1 J
(4.4)
-23-
(p*.)-l
1
-
* ' ••• , 1 - wik)·
*
diag (1 - wil
- diag (1 - wi1 ' ••• ,
k
1 - t 1r
. . t-1 it
k-1
1-tl'&.
t-1
It is not
diffic~lt
J ·
1 ...
to verify from these definitions that the corresponding
sample covariance matrix of the NIt random qU&l1tities 1n[ -1n
~i
(t j )] ,
namely
* - H*
S
* *
*
V (.E.) (H )',
agrees exactly with, the matrix S - K VC!?) H' defined in Section 3.
Since risk groups are independent, it is enough to show agreement for the
. SR* say, relating to a general risk
k x k submatrices, SR'
identifying suffix (i hitherto) omitted.
gro~p
R with the
The general U, R.) element of
Sa is then obtained from (3.6) and following results in Section 3, and
(~.4)
as
c
c
1
P
m-l m
I
Sj1 -
tnt- ipJ tnF- ~ .J
n·F- ~ pJ
m-1 m
mal P
mal m
* * t P
y.y
J 1 mal m
n.
[1-J1
Pm]
*.
•
where c - min (j ,1); and that of SR
18 g1.ven by.
*
* *
SjR. - y.J Y1
c
t
mal
*
Pm
n* (l-p* )
m
m
-
* *
~ Yt
c
n.
m-1
t
r_1
1- t P
n-1 n
Pm
[1- ~ pJ
n-l
Equality is easily proved by induction on c, being obvious for c - 1 and
resulting for general c from the fact that
c+l
c
t P
I P
Pc+l
maIm
m-l m
c+l
c+l
c
c
(1- t P ) (1- t P )
1- t P
1- t P
m-lm
mal III
.1 III
m-1 III
-
It follows frolll the above results that the binomial reformulation applied
to all N risk groups yields an (NIt) x 2 table of data, analysis of which in
-/.4-
terms of the 'starred' model (4.3) yields estimates and inferences precisely
equivalent to those obtained for the multinomial model treated in Section 3.
The Model for the Sequential Case: Outline of General Construction
Our starting point for modelling data involving sequentially defined
covariates is the (Nk) x 2 layout of binomial data categorised by the N risk
groups distinguished at time t-o.
For convenience, we begin by reordering the
rows of this table so that they are grouped by time interval instead of by
risk group.
For consistency, the same permutation is then applied to the
.. as given at
rows of
~
3.
We now consider a general risk group defined in the (j_l)th interval,
(4.3) and to the rows of the design matrix X of Section
j • 2, 3, ••• ,k, which is to be split on the b~sis of new data into two or more
'n the J,th 1·nterval.
groups L
Th e numb ers 8urv1v1ng
.,
f rom t he or1gLna
•. I ( or parent )
group at t j _ are divided into their new groupl. In each new group the numbers
l
·
.
1 are represente das
b 'LnOmLa
• 11y
•
, and surviv1ng
dy1ng
Ln
th
e 'J th 1nterva
distributed and the rows pertaining to these croups then replace the row
corresponding to the original (undivided) group in the jth interval.
Similarly,
the survival probabilities of these groups replace the original survival
'l'Lty f or t he or1g1na
" l group 1n
'h
.
I and correspon d'Lug rows
prob ab L
t e 'J th 1nterva,
identifying the mortality experiences of these groups up to and within the jth
interval are inserted into the matrix C in-place of the row corresponding to the
original group,
The inserted rows will of course agree with the row being
replaced in regard to all elements related to the common history of survival
•
1s. By wor k'1ng
of t h~ new groups t h rough t he 1st , 2nd , ••• , {'J- l)th 1nterva
, 1y t h rough t he 2nd , 3rd , ••• , k th 1nterva
•
1s, eacht1me
'
,
uSlng
t h e vector
succeSSLve
of survival probabilities
(1 -~)
.
and the matrix C just obtained, the initial
(Nk) x 2 table and the associated representation (4.3) may be expanded
vertically as often as necessary to include risk groups discriminated
sequentially at the time. t lf t2, ••• ,tk~1' There i8 of course no point in
-25introducing new groups at the final follow-up time t
k
because. by definition.
after this time no further mortality data arise by which such groups may be
distinguished in our analysis.
Reordering and division of the original risk groups as described leads
to an (No - NI + N2 + ••• + Nk ) x 2 table. where. for j - 1. 2••••• k, N, is the
J
numb er
' k groups f'lna 11y 1'd entl'f'led'ln t h e J.th,lnterva.
I
rlS
0f
The resulting
table is 'stratified' by interval - that is. by time - in that the first N
l
rows of data pertain to the first interval, the next N rows to the second,
2
and so on.
The accompanying representation in terms of the
(verti~ally
expanded) equation (4.3) may 'then be parametrised as ~•• where in the usual
initial case of a saturated model X is a No x N0 design matrix and ~
d.· is a
corresponding vector of No parameters. and analysed by means of the general
GSK theory quoted in Section 3.
An Illustration
A full statement of the general layout of the sequential case, with
arbitrary numbers and levels of factors each arbitrarily subdivided at each
time point. is not given here on account of the excessively complicated
notation required.
Instead we give a simple illustration involving one initial
covariate (FO say) defined at three levels (coded 0, 1 and 2) and two binary
i
(0. 1) covariates F defined respectively at times t,. i - I , 2.
1
Incorporation
2
of Fl and F in the model demands at least four time-intervals (k ~ 3): for
simplicity.we take k-3 and the time of final follow-up is then denoted by t 3 •
rd ,
nd
' 1 e case when t he rlS
'k groups ln
'h
I n t he Slmp
t e 2 and 3
lnterva I s are
defined by exhaustive cross-classification, we then have NI -3. N2-6. N3-l2 and
N -3+6+12-21. Accordingly, the data are laid out in a 21 x 2 table. To
o
clarify matters, this table, together with the corresponding probability
~
vector
~
and its sample estimate
~.
is dilplayed below.
To be consistent
with the earlier work of this section the n'l, w's and p's should carry ·'s
but, again for the sake of clarity, the *'s have been omitted.
-26-
dying
surviving
(d)
(8)
n.
-n
1
n
I-interval
l
1
-n
•
2
n.
1
2
-n1
00 00
n 1 -n 2
01
01
n 1 -n 2
10
10
n1 -n 2
I
2
nd .
lnterva1
11
11
20
20
-n 2
n 1 -n 2
~
n
2
3
n
rd .
lnterval
2.
- n1
20
21
- nl +n1
00.
• n
1
11
+~
- -000- -001-
2 • ni
00
11'2
+n 2
010 011
•
n
• n2
2 +n 2
100 101
10.
1"Q
• n2
2 .
110 111
• n 2 +n 2
01
11'2
200 201
- n 2 +n 2
21.
210 211
- n2
- n 2 +n 2
20
11'2
01.
10
2
• ni
1J
11
11'2
.,
21
11'2
2
--n
I
2
3
3
n110_n110
2
J
I 111 111
I n 2 -n J
~10
3
IP2
-n 2 In 1
I 01 01 01
i P2 -n 2 /n1
i 10 10 10
;P2 81\2 In l
!P211
i
8Q
11
2
11
Irll
!,
20
20 20
/
: P2 81\2 n l
I 21
21/ 21
i P2 -n 2
i
nllO
n
00-
nl
000
3
3
111
1I'J
P3
I
i P\0l:..J.30l/n~01
J
; p\10-n\10 In~IO
\ 111
P3
Ill/Ill
n2
-n 3
~01_J301
p2301 -n23(\ 1'1 ~20 1
2lO
J1()_xt lO
pliO ·r1-iO' /~lO
n
n
2
3
211
3
n
211
2
3
-n
211
J
I
I'
I
II
2
3
';'01
3
I
!
pl300 -n23OO 1r1-2OO
I
I.
000, I
~OO-JJOO
r1:.300
I
!I pOl3 l-.re3l1 /rf?,1l2
I
I 100 -n100 In100.
~Ol
111
'-00--00
I
3
n10l_n101
2
3
3
2 2
-nl/n.
-n 3 '/n 2
(4.5)
; 001 001 001 ~
! P3 -n 3 /ni '
! 010 010 010,
; P3 8Q 3 /n 2 '
n101
3
2
PI
: 000
100 100
n
-n 3
2
3
1
-n1/n.
Ip3
001
-n
1
Pl
1- - - - - - .
nOll_nOll
100
3
10
-~
010 010
n
3
n
2
1.
-n1
3
001
001
3
010
n
n
3
21
21
-n
2
1
000 000
n
-n
000
n
nOll
;-°
°°
PI .nl/n.
° °
°1
n
211
P3
211/ 211
-n 3 n 2
In the table, the left column (d) contains the numbers dying, and the
right column (s) the numbers surviving, within the interval considered.
Each
I
-27row of the table corresponds to a unique risk group which is identified by the
O
superscripted levels of the hitherto defined covariates (F in the 1st interval,
o
1
nd
0
1
2 .
rd
st
F and F in the 2 and F , P and F ln the 3 ). In the 1
interval the
numbers
n~'
•
n~
-
n~
surviving in each of the previously defined (original)
risk groups are subdivided as nto, n~l corresponding to the values 0 and 1
1
of the factor F which is defined on all surviving subjects at time t • The
1
• · , 1 , 2 t h en comprlse
•
• .
n umb ers n 1iO ' ni11 ' 1
t he 6 row tota 1 s. pertalnlng
to t h e
nd
2
interval. The numbers surviving the 2nd interval are similarly subdivided
°
according to the value of F2 realised at time t 2 , g~ving rise to the 12 row
rd
totals for the 3 interval as shown. The double-log transformation of the
survivor function is given by
(compare (4.3»where C is a 21 x 21 'block triangular' matrix in which all the
,
-I
1
I
1 I
11
I
1
1
1
1
I
I
I
I
I
-- -
I
1
C
•
1
I 1
1
1
1
1
11
1 I'
1I
- - -
-
(4.6)
1
1
I
11
- - --
1
1
1
1
-,
1
1
I
I
11
-- 1
I
I
1
1
1
I
1
I
I
- - - --
-
I
1 I
1
- - - - - -
I
1
I
1 I
r-
I
-+
1
1
1
1
1
1
1
1
1
1
1
1
1
1
-28-
omitted elements are zero.
A saturated hierarchical model (for example). in
which parameters refer specifically to the effects of newly defined covariates
within each of the risk groups previously defined. is given by
,,
below. zero elements of X being
I
1
1
,
I
1
1
as shown
omitte~
1
I
I
1
X~
I
-I
I
I
I
1
1
1
1
1
1
1
1
1
1
1
1
1
.- - 1 -
,I
,
I
1
(4.7)
1
I1
1
I1
1
1
I
I1
1
,1
1
1
1
I 1
, 1
1
1
1
1
1
1
,
I
1
1
I
1
1
I
1
1
1
1
1
1
1
1
In this representation S~ denotes the effect (on the double-log scale) in the
J
,th lnterva
,
l
J
of F O
at level i. j • 1. S
2. 3: i • 1.2.'11D1'I ar 1Y. Qi1
Pj deno t es
O
the effect in the jth interval of ,1 at level 1 when F is at level i. j • 2, 3;
2
i • 0, 1. 2; and in the 3rd interval ~;ml denotes the effect of F at level 1
-29-
e
l
O
when F and F are at respective levels i and m, m • 0, 1; i • 0, 1, 2.
The
underlying structure may also be displayed in a tree diagram as shown on
page 30.
When estimates of parameters and of In[-ln~J (and of their standard
errors) are inserted at appropriate points in the tree, the resulting diagram
is often a valuable aid in the process of model reduction.
The technique is
of, course also useful in the non-sequential case.
All saturated models are formally equivalent (although not
necessarily equally useful as starting points for interpreting a given set of
data), and many alternative design matrices X (for example, of the 'ANOVA type'
mentioned in Section 3) are possible.
sometimes not
exhausti~ely
However, even saturated models are
cross-classified, for example when such is logically
precluded by the definitions of the covariates. Suppose
1
2
O
that F • 1 cannot arise when F • 0 and that F • 1 cannot arise
~
when Fl. 0.
Then the corresponding rows are removed from the tableau (4.5)
\
and the 6 categories asterisked in the tree are deleted, along with the
Sool SOl SOlI SlOl
d S20l and the
specifically related parameters SOl
2' 3
3' 3 ' 3
an
3
th
th
.
th
th
th
th and 19 ) rows and columns of the
correspondlng (5 ,11 ,12 , 13 , 15
matrices C and X defined at (4.6) and (4.7).
The result is a saturated model
appropriate to the non-exhaustive classification, and this can be
investigated in the usual way.
e
e
I
I
F°-o
FO.2
~
1
"1+ 81
t 1+8 l
,
I
F1-o
*F1.1
01
+'2+ B2
. "2
1
r
F 2-o
I
*p2.1
I·
*F 2-o
"3+8~1
t
2
"3
r
t
J
*F 2.1
3
2
f
I
F2-o
3
j
.t3+B 3
3
I
I
F2-o
*F 2.1
'+B1+S 101
-3 3 3
1
t +8°1
"I
I
1
F2.1
.-
1
F2-o
*F 2.1
I
•-3+B 23+S 201
3
'+Sl+ 811+8111
3 3 3 3
11
t 3+B 3+B 3
,
Fl .1
2 21
~2+B2+82
F1-o
2
~2+82
F1-o
F1.1
1111
~2+82
~2+B2+S2'
+'3+ B01 +8°11
3
I
FO.1
to
t
e
2
'-3+ B3
·
I
f
F2-o
-3
2
I
F2.1
+B~+B21+B211
3 3
3
21
.'-3+a"'3+a"'3
I
w
Figure:
.Tree Diagram fot"3x2x2 Sequentia1'Classification Example.
?
-315.
Discussion
In this paper we have descrihed an approach to the analysis of survival
data with arbitrarily time-related covariate information.
If, as leems
desirable, we wish to avoid strong implicit or explicit prior assumptions
of structure underlying the covariate effects of their dependence on time,
then a natural step is to categorise the' data in the form of multidimensional
contingency tables.
In Sections 3 and 4 we have developed a formulation
in which arbitrarily general linear models may be fitted to such tables,
with or without transformations of the cell frequencies.
Proportional
hazards models and various known survival time distributions are accessible
as special cases of this representation (see Section 3 and the Appendix) and
sequentially
Section 4.
realise~
concomitant data can be handled as discussed in
Primarily for the convenience of having explicit analytic
formulae for the estimates, we have worked in terms of the GSK (Grizzle,
Starmer and Koch) methodology for modelling and testing,'based on minimum
modified X2 (MMCS) statistics.
As noted in Section 2, however, maximum
likelihood (ML) and other BAN asymptotically equivalent methods can in
principle be applied to fit models within our formulation, and with equal
generality to that of GSK.
Elsewhere (Read et al (1980»
we
describe a practical. application - a clinical trial of randomised duration
of bedrest in the treatment of males suffering from an acute myocardial
infarction - and compare results obtained by MHCS and ML methods.
paper we also review and disculs modelling strategy.
In that
References
Bishop, Y.M.M. (1969). Full contingency tables, logits and split contingency
tables. Biometrics 25, pp. 383-399.
Breslow, N. (1974).
CovariRnce analysis of censored survival data.
Biometrics 30, pp. 89-99.
Breslow, N. (lQ78). The proportional hazards models: application in
epidemiology. Communications in Statistics - Theory and Methods A7(4),
pp. 315-332.
Cox, D.R. (1972); Regression models and life tables (with discussion).
Journal of the Royal Statistical Society Series B 30, pp. 248-275.
Cutler, S.J. and Myers, M.H. (1967). Clinical classification of extent
of disease in cancer of the breast •. Journal of the National Institute
of Cancer 39, pp. 193-207.
Freeman, D.H., Freeman, J.L. and Koch, G.G. (1974). A modified x2 approach
for fitting somebivariateWeibull models to synthetic life tables.
University of North Carolina Institute of Statistics ~eo Series No. 958.
Goodman, L.A. (1970); The multivariate analysis of qualitative data:
Interaction among multiple classifications. Journal of the American
Statistical Association 65, pp. 226-256 •
.4It
Goodman, L.A. (1971). Partitioning of chi-square, analysis of marginal
contingency tables, and estimation of expected frequencies in multidimensional
contingency tables. Journal of the American Statistical Association 66,
pp. 339-344.
Grizzle, J.E., Starmer, C.F. and Koch, G.G. (1969).
data by linear models. Biomet~s 25, pp. 489-504.
Analysis of categorical
Grizzle, J.E. and Williams, O.D. (1972). Log linear models and tests of
independence for contingency tables. Biometrics 28, pp. 137-156.
Holt, J.D., and Prentice, R.L. (1974). Survival analysis in twin studies
and matched pair experiments. Biometrika 61, 1, pp. 17-30.
Johnson, W.D. and Koch, G.G. (1971). A.note on the weighted least squares
analysis of Ries-Smith contingency table.·data. Technometrics 13, pp. 438-447.
Kalbfleisch, J.D. and Prentice, R.L. (1973). Marginal likelihoods based on
Cox's regression and life model. Biometrika 60, 2, pp" 267-278.
Kay, R. (1977). Proportional hazard regression models and the analysis of
censored survival data. Applied Statistics 27 No.3, pp. 227-237.
Koch, G.G., Johnson, W.D. and Tolley, D. (1972). A linear models approach
to the analysis of survival and extent of disease in multidimensional
contingency tables. Journal of the American Statistical Association 67, No. 340,
pp. 783-796. (Application Section).
References Cont.
Ku, H.H., Varner, R.N. and KUllback, S. (1971). Analysis of multidimensional
contingency tables. Journal of the American Statistical Association 66,
pp. 55-64.
NeIder, J.A. (1977).
A reformulation of linear models (with discussion).
JournaZ of the Royal Statistical Society Series A 140, pp. 48-76.
Neyman, J. (1949).
Contributions to the theory of the
X2
test.
Proceedings of the BeFkeley Symposium on Mathematical Statistics and
probability, BeFkeZey and Los Angeles: university of CaZifornia ~ess
pp. 239-272.
Prentice, R.L. (1973). Exponential survivals with censoring and explanatory
variables. Biometrika 60, pp. 279-288.
Prentice, R.L. and Gloeckler, L.A. (1978). Regression analysis of grouped
survival data with application to breast cancer data. Biometrics 34,
pp. 57-67.
Read, K.L.Q., et a1 (1980). Survival Analysis in a Sequenced Clinical Trial
with Covariates. To be pFesented at the 1980 RSS/IAS Conference on Design
and Analysis and published in the Bulletin in Applied Statistios.
Taulbee, J.D. (1979). A general model for the hazard rate with covariables.
BiometFics 35, pp. 439-450.
Thompson, W.A. (1977). On the treatment of grouped observations in life
studies. Biometrics 33, pp. 463-470.
Wald, A. (1943). Test of statistical hypotheses concerning several parameters
when the number of observations is large. Transactions oj: the American
Mathematical Society 54, pp. 426-489.
A.l
Appendix: Some Parametric Models for the Reference Hazard Function A (t)
0-
A.I Generalities
The purpose of this appendix is to present some specific models for the
standard or reference hazard function A (t) (or the corresponding survivor
o
function Gro(t»
which may serve as useful reductions of saturated linear
models fitted to the double-log transformation of the survivor function as
in Section 3.
We are thus considering non-sequential situations in which,
although the effects of the covariates may vary arbitrarily in successive time
intervals, the definition of the concomitant vector z is invariant through time.
Since any saturated model fits the data layout perfectly, there is in one sense
no need to consider transformations other than the double-log form, or models
which are not linear in all the parameters.
However, in practice interest
invariably lies in simplified models with substantially reduced numbers of
parameters, and in any particular case there is no reason why the double log
~
(rather than some other) transform of ~ should necessarily have the simplest
or most economical representation in terms of linear (rather than non-linear)
models.
In the course of our account, other transforms and non-linear models
will therefore arise.
As in Section 3 we work with a total parameter vector
in which the subvectors
of the cQvariates.
subvector
!
~
,;t.
1, where 1 L ·(f' ,!'),
and -S respectively parametrise A0 (t) and the effects
Here we consider replacing the k-vector
i by a smaller
of parameters identifying members of a relatively simple family
of distributions.
In the current (not necessarily saturated) linear model
fitted, say
F (.£) ..
Xl,
we may partition X as eXt; X ] where X consists of the first k columns of X
t
B
and X stands for the remaining columns.
B
!
(.£) • X~ + Xe!
Then
(A. 1)
A.2
Before we deal with specific cases, two general approaches are worth
~
noting.
The first, which can be applied within the double-log formulation,
consists in replacing Xt! in (~.l) by a vector function ~(tl!), where
t' - (tl, ••• ,t k), which normally has some explicit parametric dependence on
time expressed through the reduced subvector!.
Then if
e is involved nonlinearly,
through some function ~(!I!), where S'- (tl, ••• ,t k ), weighted least
squares analysis implies minimising with respect to the components of
1
a
function of the general form
Q -
CF<.2,) - a(ll.!) -
Xe!]' S-l [F(.2.)
- .s.(!li) -
xa!l.
(A.2)
Although computationally unattractive, the nonlinear minimisation of Q may be
justified by the performance of the resulting model.
No essential difficulties
of principle are raised since, if the form of model expressed in (A.I) is
correct, Q is still asympto~ically x2-distributed.
~
~(!Ie) agrees closely with the linear form
that
6 will
xti which
Furthermore. if
it replaces, it is likely
have been little changed by this substitution.
At least as a
first step, estimating! conditional on the original (linear model) estimate
of
!
may reduce the computational burden considerably.
Secondly, a linear models formulation which embraces a number of cases
of interest is that in which
Tis
a conveniently invertible function. G say.
of a function, h(t, ~11) say, which is linear in the components of the vector
1 of unknown parameters. We may then write
'i-<t I.!)
- G(h) ,
where
(A.3)
where L is the dimensionality of i and hl""'~ are known, prespecified
functions of t and z which are such that within sone region of parameter space
A.
~
is a legitimate survivor function.
).(t
I!.) -
it [- ln
q.(t
j
The corresponding hazard is
I.!)]
A subsystem of (A.3) which preserves the linear combination !',! intact is
given by
(A. 4)
where t(t) is a known prespecified function of t, g is linear in the
components of the subvector! and
(A.4) are not
eI
1
is given by
1'-
(!',!').
(A.3) and
in general, although the separable product case,
(A.S)
obtained by restricting e to be a scalar and
(assuming that covariate' effects subsumed in
intervals).
~utting
!
=1
clearly is eI
do ~ vary between
The original 'double-log system is seen to fall within the scope
of (A.4) if we take G-l(!f) - In(-R.nT), replace! by
take T(t)
= e ~ (t),
get/e)
and define
. R(tl~) as ~. when t. 1 <
- .x.
J
J-
!
as defined at (3.3)"
t
~ t., j-l, ••• ,k.
J
A.2 Analyses for Particular Families of Distributions
Our starting point is the N x (k+l) data layout of Section 3, the
reference or standard hazard function). (t) being parametrised by the
o
subvector.1 defined at (3.3). With proportional hazards, covariate effects
are parametrised by the subvector! as at (3.4); otherwise, when hazards
are not proportional,
!
is composed of further subvectors
!l""'~
representing
covariate effects in the time interval t. 1 < t < t., j-l, ••• ,k, as at
J-
J
(3.9).
In either case it may be of interest to elicit from the vector
estimate
1, and then test, a simple parametric form for ).o(t) and hence
...
for the corresponding standard survivor function ~ (t).
When proport;'onal
hazards do not apply, it may also be useful to look for similar simple time
A.4
dependences relating covariate effects in different time intervals
~
As need hardly be said, the list which follows is illustrative, not
exhaustive.
Not unnaturally, our choice of examples has been conditioned
by mathematical convenience.
(i)
The Weibull Distribution
For the two parameter Weibull distribution we have
80
that
tn[-tnTo (t)] • a tn p + a tn t,
If~o(t)
which is of the form 6 1 + 6 2 tnt.
of the estimates
e.J
is of Weibull form then a plot
against t~.)should be well fitted by a straight line.
J
If proportional hazards apply, this could be formally tested by fitting the
(still linear) reduced model expressed for i • l, ••• ,N; j • l, ••• ,k by
(A.6)
otherwise
a'
-
in (A.6) should be replaced by 8!, the jth subvector of the
-J
extended!' defined in (3.9).
The further specialisation 6 2 • 1, corresponding
to the exponential distribution, can be treated in the same way.
For the three parameter Weibull distribution, (A.5) becomes
leading to a nonlinear problem of the form (A.l).
62-1 corresponds to a
displaced exponential distribution.
(ii)
The Generalised Rayleigh Distribution
This distribution has a polynomial hazard function,
~
(t)·
o
p
t
..
6 t,
.. -0 l
where ao' al, ••• ,a p are such that
~o(t) ~
0 for all
t >
o.
It follows that
A.4
,.
dependences relating covariate effects in different time intervals
~
As need hardly be said. the list which follows is illustrative. not
exhaustive.
Not unnaturally. our choice of examples has been conditioned
by mathematical convenience.
(i)
The Weibull Distribution
For the two parameter Weibull distribution we have
:f 0 (t)
•
-
exp [ -(pt)
aJ •
t
> 0. a > 0. p > 0.
so that
tn[-tnT
(t)] • a 1n p + a tn t •
.0
which is of the form 61 + 62 tnt.
If~o(t) is of Weibull form then a plot
of the estimates ~. against t~.)should be well fitted by a straight line.
J
J
If proportional hazards apply. this could be formally tested by fitting the
(still linear) reduced model expressed for i • 1 •••.• N; j • l ••••• k by
tn rL:-ln ••1 (t.)]
• 8 1 + 8 2 ln t.J + -B'z.;
J
-1
otherwise B' in (A.6) should be replaced by
-
extended~'
defined in (3.9).
B!.
-J
(A.6)
the jth subvector of the
The further specialisation 82 • 1. corresponding
to the exponential distribution. can be treated in the same way.
For the three parameter Weibull distribution. (A.S) becomes
leading to a nonlinear problem of the form (A.l).
82-1 corresponds to a
displaced exponential distribution.
(ii)
The Generalised Rayleigh Distribution
This distribution has a polynomial hazard function.
~
where 6
0
,
p
o
(t)·
..
t 8ftt.
t-<)"
8 ••••• 8 are such that AO(t)
1
p
~
0 for all
t >
O.
It follows that
P
~ (t) • exp (- t
o
t-o
ett"+lj
1.+1
and that
t
P al.t.
exp(,.) • t .2:...J.. j.l ..... k.
J
t-O 1.+1
indicating an obvious graphical way of aS8essing this form of distribution
(preferably with small p) in the light of estimates .l ••••• i • Within the
k
double-log system this distribution gives a nonlinear model of the form (A.2).
Alternatively, a general system of linear models based on (A.2) and (A.3)
can be conveniently investigated by means of a single log transformation,
namely.
~.(t.) - exp
1
J
p
- E
f
t+l
ett.
t-o
J
1.+1
- t(t.)S'z.}
J --1
or
p
-I.n[~.(t.)]
• t
1
J
".
+ t(t.)S'z .•
J --1
taO
where teO) • 0 and t(t) > 0 when t > 0;
by choosing p • k - 1 this can be made a saturated model of generality equal
to that of the whole double-log system.
A corresponding
eI
subsystem of reduced
models can then be derived from (A.4) if required.
I
(iii)
Logistic Systems
The standard logistic distribution function is (l+et}-l, -~ s t
< e.
This may be adapted so that time runs from 0 (the median of the standard
distribution), giving the survivor function
"!:i.
~o(t)
• 2(1 + e
at -1
)
• t >
o. e > o.
for which the appropriate logistic transformation is
1n [2- To (t>]- . at
To(t)
(A.7)
A.6
and
). (t)
o
It follows that, in the usual notation of the double-log formulation,
which is a nonlinear model of the form (A.2).
Alternatively, if we apply (A.3)
a generalised linear model can be set up using the adapted logit transformation
. (A,.7), namely,
one of many possible esamples being
It would be interesting to compare the results of using these models with
those obtained by an early use of the multiple logistic function in the
.,"
Framingham study of coronary heart disease reported by Truett et al (1967).
(iv)
Inverse Power Laws
For the inverse power survivor function
!I o (t)
o
• (l +
....!
0
1
-0
t)
1,
t>O, 8 >0, 8 >0.
2
1
we have
(A.8)
and
). (t)
o
A. I
Since ~. •
J
t.
r0
J
A (u)du, we have
0
(A.9)
If we know or can guess at or are prepared to try possible values for 8 ,
1
(A.9) indicates an obvious graphical method for assessing this form of distribution;
or, for the reference group of subjects with
could be based on (A.8).
~
•
Q,
a linear models analysis
From the double-log transformation we obtain the
nonlinear form
8
2
tn[-tn~.(t.)l • tne
l.
+ tn[tn(1+8- t.)] + B'z ••
J -.1
1 J
- -1
As in (iv) reparametrisation of 8 with e~ - tne
-
1
1
may be helpful.
As in (i), a displacement parameter can be incorporated if required.
If the index of the power law is known, say k, models of the form
_1
L
~~'i(tj)J k • 1 + t hl.(tj'.!i)~l'
1-1
based on (A.3) with G(x)
= x-k ,represent
a wide range of possibilities which
are still linear in the parameters.
(v)
The Gompertz Force of Mortality
In this model, which has been widely used in actuarial studies, we have
or
Corresponding nonlinear models accessible via the double-log transformation
are of the form
A.8
,
Evidently we may work in terms of 8 • ln6 instead of 61 , and then only
1
l
62 is involved nonlinearly.
The approach represented by (A.3) may be less helpful here, since the
invertible transforcation ofrwhich is required demands knowledge of the
•
ratlo
8t
e-.
2