SURVIVAL ANALYSIS FROM THE VIEWPOINT
OF HAMPEL'S THEORY FOR ROBUST ESTIMATION

by

Steven J. Samuels

Department of Biostatistics
School of Public Health
University of North Carolina at Chapel Hill, NC 27514

Institute of Statistics Mimeo Series No. 116~

JUNE 1978
TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES

Chapter

1. INTRODUCTION
   1.1 Survival Background, Parameterization
   1.2 Description of Contents

2. IDEAS FROM THE THEORY OF ROBUST ESTIMATION
   2.1 Estimators as Functionals
   2.2 Fisher Consistency
   2.3 Robustness, the Influence Curve
       The Influence Curve
       Theoretical Considerations
       Use of the I.C. with Outliers
       Assessing Robustness with the I.C.
       Interpretation of the I.C. in Regression

3. EXTENSION OF HAMPEL'S FRAMEWORK TO SURVIVAL DATA
   3.1 Introduction
   3.2 The Random Censorship Model with Covariates
   3.3 Related Work
   3.4 Independence of Censoring and Survival
   3.5 Limitations of Hampel's Approach

4. THE ONE-PARAMETER EXPONENTIAL MODEL
   4.1 The Model
       The Estimator
       Functional Form
       Fisher Consistency
   4.2 The Influence Curve
       Interpretation
   4.3 The Limiting Variance and Distribution of $\hat{\lambda}_n$
   4.4 Other Functions of $\lambda$

5. NONPARAMETRIC ESTIMATION OF AN ARBITRARY SURVIVAL FUNCTION
   5.1 Note on Rank Estimators
   5.2 The Product Limit and Empirical Hazard Estimators
   5.3 Functional Form: Fisher Consistency Under Random Censoring
   5.4 The Influence Curve
   5.5 Discussion
   5.6 The Squared I.C.

6. AN EXPONENTIAL REGRESSION MODEL
   6.1 Introduction
   6.2 Likelihood Equations
   6.3 The Estimator as a Von Mises Functional, Fisher Consistency
   6.4 The Influence Curve
   6.5 The Squared I.C. under Random Censorship
   6.6 Discussion

7. COX'S ESTIMATOR
   7.1 Introduction
   7.2 The Proportional Hazards Model, Time Dependent Covariates
       Random Censorship
   7.3 Cox's Likelihood
   7.4 Solvability of the Likelihood Equations
   7.5 The Estimator as a Von Mises Functional
   7.6 Fisher Consistency
   7.7 The Influence Curve
   7.8 The Squared I.C. under Random Censorship
   7.9 The Empirical I.C. and Numerical Studies of Cox's Estimate
   7.10 The Simulation Study
   7.11 Conclusions and Recommendations

8. ESTIMATION OF THE UNDERLYING DISTRIBUTION IN COX'S MODEL
   8.1 Introduction
   8.2 Cox's Approach
   8.3 The Approaches of Kalbfleisch and Prentice
   8.4 The Likelihoods of Oakes and Breslow
   8.5 Functional Form; Fisher Consistency of Breslow's Estimator
   8.6 The Influence Curve
   8.7 Estimation of $G(s|z)$
   8.8 Conjectured Limiting Distribution for Breslow's Estimator

BIBLIOGRAPHY

Appendices
A1. APPENDIX TO CHAPTER FIVE
A2. APPENDIX TO CHAPTER SEVEN
A3. APPENDIX TO CHAPTER EIGHT
A4. COMPUTER LISTING OF SAMPLES FOR CHAPTER SEVEN PLOTS
LIST OF FIGURES

Figure

7.9.1. Plots of the empirical influence curve for Cox's estimate against t: n=49, $\hat{\beta}_n = 0.36$
7.9.2. Plots of the normalized difference in Cox estimates against t: n=49, $\hat{\beta}_n = 0.36$
7.9.3. Plots of the empirical influence curve for Cox's estimate against t: n=49, $\hat{\beta}_n = 1.36$
7.9.4. Plots of the normalized difference in Cox estimates against t: n=49, $\hat{\beta}_n = 1.36$
7.9.5. Plots of the empirical influence curve for Cox's estimate against z: n=49, $\hat{\beta}_n = 0.36$
7.9.6. Plots of the normalized difference in Cox estimates against z: n=49, $\hat{\beta}_n = 0.36$
7.9.7. Plots of the empirical influence curve for Cox's estimate against z: n=49, $\hat{\beta}_n = 1.36$
7.9.8. Plots of the normalized difference in Cox estimates against z: n=49, $\hat{\beta}_n = 1.36$
LIST OF TABLES

Table

7.10.1. Percentiles for the Regression Coefficient Data
7.10.2. Percentiles for the Studentized W Statistic
ACKNOWLEDGMENTS

I am pleased to acknowledge the teaching and help of my advisor, Professor Lloyd Fisher. The comments of Professor Norman Breslow led to improvements in the final draft, while a conversation with Ms. Anne York was very helpful at an early stage of the work. I want to express special appreciation to Professors O. Dale Williams and C. E. Davis, for making possible the completion of this study. Dr. Geoff Gordon wrote the computer programs, and Mrs. Gay Hinnant did the superb typing. To all these I am grateful. Finally, I want to thank my parents, for their unfailing encouragement over the years.

This work was supported, in part, by U.S.P.H.S. Grant 5-T01-GM-1269 and, in part, by U.S. National Heart, Lung, and Blood Institute Contract NIH-NHLBI-71-2243 from the National Institutes of Health.
1. INTRODUCTION

Clinical trials, industrial life testing, and longitudinal studies of human or animal populations often require the statistical analysis of censored survival data. (An elementary account can be found in the book by Gross and Clark (1976). A review article with many references is Breslow (1975). Farewell and Prentice (1977) provide advanced parametric methods.)

In this study we are concerned with the robustness of some estimators encountered in survival analysis. In particular we extend to the censored survival problem a formal framework, due to Hampel (1968, 1974), for the study of robustness. By robustness, we mean, loosely, insensitivity of an estimator to deviant data and departures from assumptions. For concreteness we will speak mostly in terms of human follow-up studies; but the results may apply to other fields.
1.1. Survival Background, Parameterization

In survival analysis, interest centers on time to an inevitable response, which we term "death" or "failure." The time to failure we call "failure time" or "survival time." (No distinction is intended.) Failure time is modeled as a non-negative random variable $S$.

Suppose we follow a sample of n persons, each with a potential failure time $S_i$ $(i = 1, \ldots, n)$. Censoring arises when $S_i$ is unobservable. This can happen for a variety of reasons. For example, the study may end before all n people fail; a person may drop out or move away. (A different kind of censoring, more common in industrial life testing, is progressive censoring. A progressive censoring scheme is one in which the experimenter deliberately removes objects from study.)
2
Because of censoring, we observe for the i-th person in the sample:
t
i
= time
observed without failure and a censoring indicator 0i;
t. -= S
1.
i
1
0.
1.
(uncensored)
=
o
(censored).
In most studies, known factors such as age, treatment group, sex,
and extent of disease may affect survival time.
px 1 vector
0
·
. h z,T
f covar1.ates
z., W1.t
-1.
-1.
=
We express these as a
(Zil,zi2, •.• ,zip)'
The data for
the experiment then consist of (ti,Oi) or (t"o"z.), i=l, ... ,n.
1.
1.-1.
Throughout this paper, lower case "t" denotes a time variate; upper
case "T" denotes matrix or vector transpose.
Unfortunately, both symbols
may appear in the same expression, e.g. 6.1.1.
The distribution of the failure time random variable $S$ can be described by:

(1.1.1)  The cumulative distribution function
         $G(t) = P(S < t)$;

(1.1.2)  The density function
         $g(t) = \frac{d}{dt}G(t)$;

(1.1.3)  The survival function
         $\bar{G}(t) = 1 - G(t) = P(S \ge t)$;

(1.1.4)  The hazard function
         $\lambda(t) = g(t)/\bar{G}(t) = \lim_{\Delta t \to 0} P(t \le S < t + \Delta t \mid S \ge t)/\Delta t$;

(1.1.5)  The integrated or cumulative hazard
         $\Lambda(t) = \int_0^t \lambda(u)\,du = -\log_e(\bar{G}(t))$.

When failure time depends on covariates $z$, we write (1)-(5) above conditioned on the value of $z$: $G(t|z)$, and so on.
Inference from survival data requires the choice of models and tests of hypotheses about parameters. These are important topics, but in this thesis we concentrate on estimators and their properties.
1.2. Description of Contents

Chapter 2 sketches the main results from Hampel's robustness theory. A central idea is to define estimators based on $x_1, x_2, \ldots, x_n$ as functionals of the empirical distribution function $F_n$: $\hat{\theta}_n = w(F_n)$. Such estimators are called von Mises functionals. An estimator is Fisher consistent if $w(F_\theta) = \theta$, where $F_\theta$ is the underlying distribution of the x's. The influence curve (I.C.) is introduced. The I.C. is essentially a derivative of the functional $w(F)$ and shows the sensitivity of $w(F)$ to local changes in the underlying model. An empirical version of the I.C. shows how the estimator reacts to perturbations in the data. Related theory is discussed. In particular, if $\hat{\theta}_n$ is asymptotically normal, the asymptotic variance $\sigma^2$ is often related to the I.C. by

(1.2.1)  $\sigma^2 = E_F[IC^2(x; w, F)]$,

or by a multivariate version of (1.2.1) if $\theta$ is a vector.

The general method for applying Hampel's ideas to censored survival data is outlined in Chapter 3. The idea is to write $x = (t, \delta)$ or $x = (t, \delta, z)$ and define the empirical distribution function $F_n(x)$ accordingly. We also need to model the x's as random variables with a common underlying distribution $F(x)$. One such distribution is proposed, termed a random censorship model with covariates. Limitations of Hampel's approach are discussed, and some related work in the survival literature is reviewed.

Each of the remaining chapters is devoted to a particular estimation problem and associated estimator. Chapter 4 considers the simple exponential model $\lambda(t) \equiv \lambda$ and the maximum likelihood estimator (M.L.E.) for the model. In Chapter 5, a nonparametric estimator for an arbitrary survival function $\bar{G}(t)$ is examined. Chapter 6 studies an exponential regression model $\lambda(t|z) = \exp(\beta^T z)\lambda_0$ and the corresponding M.L.E.'s.

Each of the estimators in these chapters is shown to be a von Mises functional and, under random censorship, is proved Fisher consistent. The influence curve is derived, and, again under random censorship, the relationship (1.2.1) is proved.

Chapters 7 and 8 are devoted to a major topic of this thesis--the proportional hazards (P.H.) model of D. R. Cox (1972). In this model $\lambda(t|z) = \exp(\beta^T z)\lambda_0(t)$, where $\lambda_0(t)$ is the hazard function of an arbitrary unspecified underlying distribution. Cox's estimator for $\beta$ is examined in detail in Chapter 7. In Chapter 8 we study an estimator (due to Breslow, 1972a,b) for the underlying distribution in the P.H. model.
2. IDEAS FROM THE THEORY OF ROBUST ESTIMATION

This chapter introduces those ideas from the theory of robust estimation which underlie the succeeding chapters. The formal theory originated in Hampel's doctoral dissertation (Hampel, 1968). For illuminating reviews of robust statistics, including Hampel's theory, see Huber (1972) and Hampel (1973, 1974). There is also a brief introduction in Cox and Hinkley (1974, Section 9.4). Hampel began by characterizing estimators as functionals on the space of probability distributions.
2.1. Estimators as Functionals

The statistical problem is to estimate a $p \times 1$ parameter vector $\theta \in \Theta$. At hand are observations $x_1, x_2, \ldots, x_n$, possibly multivariate, with empirical distribution function $F_n(x)$.

Definition: An estimator $\hat{\theta}_n$ for $\theta$ is said to be a von Mises functional if $\hat{\theta}_n$ can be written as a functional of $F_n$, that is if

$\hat{\theta}_n = w(F_n)$,

where the functional $w(\cdot)$, defined on the space of probability measures and taking values in $\Theta$, does not depend on $n$ (von Mises, 1947; Hampel, 1968, 1974; Huber, 1972).

Examples

(1) The arithmetic mean can be written $\bar{x} = \int x\, dF_n(x) = w(F_n)$, where

(2.1.1)  $w(F) = \int x\, dF(x)$.
(2) The M estimator for a location parameter (Huber, 1964, 1972) is implicitly defined by

(2.1.2)  $\sum_{i=1}^n \psi(x_i - \hat{\theta}_n) = 0$,

for an odd function $\psi(\cdot)$. (A unique solution is not guaranteed.) We can divide both sides of (2.1.2) by $n$. Then $\hat{\theta}_n = w(F_n)$, where $w(F_n)$ is implicitly defined by

(2.1.3)  $\int \psi(x - w(F))\, dF(x) = 0$.

The sample mean in Example 1 is a special case, with $\psi(x) = x$.
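A minimal numerical sketch of (2.1.2)-(2.1.3) at $F = F_n$ follows (Python; Huber's $\psi$, its tuning constant, and the sample values are illustrative assumptions). Because $\sum_i \psi(x_i - \theta)$ is nonincreasing in $\theta$ for a nondecreasing $\psi$, bisection locates a root.

```python
# M estimator (2.1.2): solve sum_i psi(x_i - theta) = 0 by bisection.
def psi_huber(u, k=1.345):
    return max(-k, min(k, u))              # clip the residual at +/- k

def m_estimate(xs, tol=1e-10):
    lo, hi = min(xs), max(xs)              # a root lies within the data range
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(psi_huber(x - mid) for x in xs) > 0:
            lo = mid                       # the sum decreases in theta: move right
        else:
            hi = mid
    return (lo + hi) / 2

sample = [0.9, 1.1, 1.3, 0.8, 1.0, 8.0]    # hypothetical data with one outlier
print(m_estimate(sample), sum(sample) / len(sample))   # M estimate vs the mean
```

The outlier moves the mean well above 1, while the M estimate stays near the bulk of the data--the kind of insensitivity the influence curve of Section 2.3 quantifies.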
(3) Regression. In multiple regression, the observations consist of a (univariate) response $y_i$ and a $1 \times p$ vector of covariates $z_i^T$, where $z_i$ may include indicators. Define $X_i = (y_i, z_i)$, $i = 1, \ldots, n$, and let $F_n(X)$ be the empirical distribution function of the X's. In the model $E(y_i|z_i) = z_i^T\beta$, $i = 1, \ldots, n$, the least squares estimator $\hat{\beta}_n$ is defined by the normal equations:

$Z^T(y - Z\hat{\beta}_n) = 0$,

where $Z$ $(n \times p)$ is the design matrix, with $z_i^T$ the i-th row, and $y^T = (y_1, \ldots, y_n)$. We can rewrite the normal equations as a sum over the sample:

(2.1.4)  $\sum_{i=1}^n z_i(y_i - z_i^T\hat{\beta}_n) = 0$.

Now divide both sides of (2.1.4) by n to find:

$\frac{1}{n}\sum_{i=1}^n z_i(y_i - z_i^T\hat{\beta}_n) = 0$.

Therefore $\hat{\beta}_n = w(F_n)$, where $w(F)$ is defined by

(2.1.5)  $\int_Z \int_{-\infty}^{\infty} z(y - z^T w(F))\, dF(y, z) = 0$,

and $Z$ is the domain of the z's. In this case, we can of course explicitly solve (2.1.5) for $w(F)$:

$w(F) = \left[\int z z^T\, dF(y, z)\right]^{-1} \int z y\, dF(y, z)$.

The reader can easily show that this gives the usual least squares solution when $F = F_n$.
An estimator that is not exactly a von Mises functional itself may be nearly equal to such a functional. The sample median $M_n$, for example, depends for its definition on whether the sample size is even or odd; but in large samples

(2.1.6)  $M_n \approx w(F_n)$.

The right hand side (hereafter abbreviated r.h.s.) of (2.1.6) is a von Mises functional $w(F_n)$, defined by

(2.1.7)  $w(F) = F^{-1}(\tfrac{1}{2}) = \inf\{x : F(x) \ge \tfrac{1}{2}\}$.
2.2. Fisher Consistency

Suppose $w(F_n)$ is a von Mises functional for estimating $\theta \in \Theta$. Now assume that the observations $x_1, x_2, \ldots, x_n$ are independent random variables, with common cumulative distribution function $F_\theta(x)$.

Definition: The estimator $w(F_n)$ is Fisher consistent (F.C.) iff

$w(F_\theta) = \theta$, identically in $\theta$ (Hampel, 1974).

Remarks

1. The virtue of a Fisher consistent estimator is that it estimates the right quantity if faced with the "true" distribution.

2. The definition given above is the one used by Hampel (1968, 1974) and others. Rao (1965, 5c.1) requires in addition that $w(\cdot)$ be a weakly continuous functional over the space of probability distributions.

3. Fisher consistency does not refer to a limiting operation. Fisher consistency and ordinary weak consistency (convergence in probability) cannot be related without regularity conditions on $w(\cdot)$ and the space of probability distributions.

4. The parameter $\theta$ need not completely identify the distribution $F_\theta$.
Examples

1. Consider the family of distributions $F_\mu$ with finite first moment $\mu$. Then the functional (2.1.1) evaluated at $F_\mu$ gives

$w(F_\mu) = \int x\, dF_\mu(x) = \mu$.

Therefore the arithmetic mean is F.C.

2. In the regression setup of Example 3 in Section 2.1, let the covariates $z$ be drawn from a probability space $Z$ with distribution function $K(z)$. Conditional on $z$, draw the response $y$ from a distribution $G_\beta(y|z)$ such that

$E_{y|z}(y|z) = \int y\, dG_\beta(y|z) = z^T\beta$.

Then the joint distribution of $(y, z)$ is

$dF_\beta(y, z) = dG_\beta(y|z)\, dK(z)$.

If this distribution is substituted in the normal equations (2.1.5), one can easily show that $w(F_\beta) = \beta$ is a solution. (Details are left to the reader.) Therefore, the least squares estimator $w(F)$ defined by (2.1.5) is, under this model, Fisher consistent.
2.3. Robustness, the Influence Curve

Hampel chose to study an estimator $w(F)$ in terms of its functional properties. For example, the natural requirement that $w(F)$ estimate the "right thing" is translated, in functional terms, into Fisher consistency. "Robustness" is identified, in part, with stability of $w(F)$ to changes in $F$. As outlined by Huber (1972) and Hampel (1974), there are three aspects to this stability: continuity of the estimator, its "breakdown point," and sensitivity to infinitesimal changes in $F$.

Continuity is the requirement that small changes in $F$ (errors, contamination, distortion) lead to only small changes in the estimator $w(F)$. Formally, a metric is introduced onto the space of probability distributions, and continuity of $w(F)$ in this metric is required. The Prokhorov metric is often convenient for studying robustness, though other metrics can be used (Prokhorov, 1956; Hampel, 1968, 1974; Martin, 1974). Among familiar estimators, the sample mean defined by (2.1.1) is nowhere continuous in the space of distributions.

The breakdown point is, roughly, the smallest percentage of observations that can be grossly in error (at $\pm\infty$, say) before the estimator itself becomes unbounded. For example, the breakdown point of the mean is zero; that of the median is 50 percent. Again, there is a formal definition in terms of the Prokhorov metric.

The Prokhorov metric proves difficult to work with in the survival setup of the next chapter. Therefore, we will not formally study continuity and the breakdown point in this thesis. Instead we concentrate on the third aspect of robustness--stability of $w(F)$ in the face of local or infinitesimal changes in $F$. This leads naturally to consideration of a "derivative" of $w(F)$ at $F$ and consequently to the influence curve.
The Influence Curve

The influence curve (I.C.) was introduced by Hampel (1968). Among published discussions are an expository article by Hampel (1974) and sections of Huber (1972) and of Cox and Hinkley (1974, Sec. 9.4). Most applications of the I.C. have been to estimation of univariate location and scale (Andrews, et al., 1972). For multivariate applications see Mallows (1974) (regression and correlation) and Devlin, Gnanadesikan, and Kettenring (1975) (correlation).

We first define a general first order von Mises derivative. Let $F(x)$ and $K(x)$ be probability measures on a complete, separable metric space. Suppose $w(\cdot)$ is a vector valued von Mises functional (estimator). Form the $\epsilon$-mixture distribution $F_\epsilon = (1-\epsilon)F + \epsilon K$. For sufficiently nice $F$, the estimator $w(\cdot)$ has a von Mises derivative at $F$:

(2.3.1)  $\frac{d}{d\epsilon}[w(F_\epsilon)]\Big|_{\epsilon=0} = \lim_{\epsilon \to 0} \frac{w(F_\epsilon) - w(F)}{\epsilon} = \int IC(x; w, F)\, dK(x)$,

for some function $IC(x; w, F)$ (von Mises, 1947; Filippova, 1962; Hampel, 1968).

In particular, we can find $IC(x; w, F)$ by letting $K(x) = I_x$, the distribution function which puts mass one at $x$. The I.C. is given (pointwise) by:

(2.3.2)  $IC(x; w, F) = \lim_{\epsilon \to 0} \frac{w[(1-\epsilon)F + \epsilon I_x] - w(F)}{\epsilon}$,

where $F_\epsilon = (1-\epsilon)F + \epsilon I_x$.
Example 2.3.1 (means)

Let

$w(F) = E_F(b(x)) = \int b(u)\, dF(u)$,

for some function $b(x)$. Then, contaminating $F$ with the point $x^*$, we have

$w(F_\epsilon) = \int b(u)\, d[(1-\epsilon)F + \epsilon I_{x^*}](u) = (1-\epsilon)\int b(u)\, dF(u) + \epsilon\, b(x^*) = w(F) + \epsilon[b(x^*) - w(F)]$.

And

$\frac{w(F_\epsilon) - w(F)}{\epsilon} = b(x^*) - w(F)$  $(\forall \epsilon > 0)$.

Therefore

(2.3.3)  $IC(x^*; w, F) = b(x^*) - E_F(b(x))$.

When $F = F_n$, the empirical I.C. of the sample mean $(b(x) = x)$ is

(2.3.4)  $IC(x; w, F_n) = x - \bar{x}_n$.
Remarks

1. If $w(F)$ is a $p \times 1$ vector valued estimator, $w(F)^T = (w_1(F), w_2(F), \ldots, w_p(F))$, the $IC(x; w, F)$ is also a $p \times 1$ vector. The k-th coordinate of the I.C. is

(2.3.5)  $IC_k(x; w, F) = IC(x; w_k, F)$.

That is, $IC_k$ is just the I.C. of $w_k$.

2. Let us note for future use that $\epsilon = 0$ corresponds to the "central case": $F_0 = F$ and $w(F_0) = w(F)$.

3. How does the I.C. express the "influence" of an observation on an estimator? For one answer, we consider the empirical I.C. for a sample with d.f. $F_n$. Let $w(F_n)$ be the estimator based on $F_n$ and $x$ a possible observation (not necessarily in the sample). Form $F_{n,\epsilon} = (1-\epsilon)F_n + \epsilon I_x$. A first order Taylor's series expansion about $w(F_n)$ using (2.3.2) gives

(2.3.6)  $w(F_{n,\epsilon}) - w(F_n) = \epsilon\, IC(x; w, F_n) + O(\epsilon^2)$.

Now let $\epsilon = 1/(n+1)$. Then each of the $n$ observations $x_i$ $(i = 1, \ldots, n)$, which in $F_n$ had mass $1/n$, has in $F_{n,1/(n+1)}$ mass $(1-\epsilon)/n = 1/(n+1)$. The "new" observation $x$ has mass $\epsilon = 1/(n+1)$. Therefore, the mixture distribution $F_{n,1/(n+1)}$ is the distribution of the augmented sample of size $n+1$, $F_{n+1}^{+x}$ say, consisting of $x_1, \ldots, x_n$ and $x$. The estimator $w(F_{n,1/(n+1)}) = w(F_{n+1}^{+x})$ is the estimator based on the augmented sample, and the expansion (2.3.6) becomes

(2.3.7)  $w(F_{n+1}^{+x}) - w(F_n) = \frac{IC(x; w, F_n)}{n+1} + O(1/(n+1)^2)$.

Approximately,

(2.3.8)  $(n+1)[w(F_{n+1}^{+x}) - w(F_n)] \approx IC(x; w, F_n)$.

In other words, the empirical I.C. measures the normalized change in the estimator caused by the addition of a point $x$ to the sample.

For the case of the sample mean $w(F_n) = \bar{x}_n$, one can easily show

$(n+1)[w(F_{n+1}^{+x}) - w(F_n)] = x - \bar{x}_n = IC(x; w, F_n)$.

That is, the relationship (2.3.8) is exact. In other cases, where $w(F)$ is not linear in $F$, (2.3.8) may provide only a qualitative approximation. (The approximation will be examined in Chapter 7 for the case of Cox's estimator.)
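The exactness of (2.3.8) for the mean is easy to verify numerically. The following sketch (Python; the sample values are hypothetical) compares the normalized change $(n+1)[w(F_{n+1}^{+x}) - w(F_n)]$ with the empirical I.C. $x - \bar{x}_n$ of (2.3.4).

```python
# Check that (2.3.8) is exact for the sample mean: adding a point x changes
# the mean by exactly IC(x; w, F_n)/(n+1) = (x - xbar_n)/(n+1).
sample = [1.2, 0.7, 3.4, 2.1, 5.0]               # hypothetical data
n = len(sample)
xbar_n = sum(sample) / n

for x in (0.0, 2.5, 100.0):                      # the last value mimics an outlier
    xbar_aug = (sum(sample) + x) / (n + 1)       # mean of the augmented sample
    change = (n + 1) * (xbar_aug - xbar_n)       # l.h.s. of (2.3.8)
    print(x, change, x - xbar_n)                 # the two columns agree exactly
```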
4. If (2.3.1) holds, then

(2.3.9)  $E_F[IC(x; w, F)] = \int IC(x; w, F)\, dF(x) = 0$.

In particular, when $F = F_n$,

(2.3.10)  $\sum_{i=1}^n IC(x_i; w, F_n) = 0$.

For the estimators examined in this study, we will not prove the existence of the general von Mises derivative satisfying (2.3.1). But equations (2.3.9) and (2.3.10) will be checked for the I.C.'s derived in the chapters to come.
Theoretical Considerations

When the general von Mises derivative (2.3.1) exists, another Taylor series expansion may hold:

(2.3.11)  $w[(1-\epsilon)F + \epsilon K] - w(F) = \epsilon \int IC(x; w, F)\, dK(x) + O(\epsilon^2)$.

This leads to a proof of the asymptotic normality of $\hat{\theta}_n = w(F_n)$ (Cox and Hinkley, 1974). Set $\epsilon = 1/n$ and let $K(x) = nF_n(x) - (n-1)F(x)$. Then $(1-\epsilon)F + \epsilon K = F_n$, and by (2.3.11)

(2.3.12)  $w(F_n) - w(F) = (1/n)\sum_{i=1}^n IC(x_i; w, F) + O((1/n)^2)$.

Note that the I.C. with respect to $F$, not $F_n$, appears on the r.h.s. of (2.3.12). Here $IC(x_i; w, F)$ expresses the "influence" of $x_i$ on $w(F)$ in a way different from (2.3.7). The summands $IC(x_i; w, F)$ are i.i.d. random variables with mean zero (by (2.3.9)) and variance-covariance matrix

(2.3.13)  $A(w, F) = E_F[IC(x; w, F)\, IC(x; w, F)^T]$.

Appeal to the central limit theorem leads to the conclusion that

(2.3.14)  $\sqrt{n}\,(w(F_n) - w(F)) \to N(0, A(w, F))$.

The conditions under which (2.3.14) holds have been discussed by von Mises (1947), Filippova (1962), and Miller and Sen (1972). The conditions are so heavy that in practice, asymptotic normality is usually proved by other means. When asymptotic normality has been proved by other theory for estimates in succeeding chapters, we will check that $A(w, F)$ agrees with the limiting covariance matrix.

A sample approximation to $A(w, F)$ is (for $p = 1$)

(2.3.15)  $A(w, F_n) = (1/n)\sum_{i=1}^n IC^2(x_i; w, F_n)$.

This estimate of $A(w, F)$ is discussed by Cox and Hinkley (1974) and by Mallows (1974). For one estimator studied in this thesis, the Breslow estimate of the underlying distribution in the Cox model (Chapter 8), no limiting results are known. In this case we will conjecture that asymptotic normality holds, with limiting variance $A(w, F)$ given by (2.3.13) and estimated (before some simplification) by $A(w, F_n)$.
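For the sample mean, where $IC(x; w, F_n) = x - \bar{x}_n$, the estimate (2.3.15) reduces to the (biased) sample variance. A minimal Python sketch with hypothetical data:

```python
# Sample approximation (2.3.15) to the limiting variance A(w, F),
# illustrated for the mean; the empirical I.C.s also sum to zero, (2.3.10).
sample = [1.2, 0.7, 3.4, 2.1, 5.0, 0.3, 2.8]     # hypothetical data
n = len(sample)
xbar = sum(sample) / n
ic = [x - xbar for x in sample]                  # empirical I.C.s
A_hat = sum(u * u for u in ic) / n               # (2.3.15): here, the sample variance
print(A_hat, sum(ic))                            # variance estimate; sum is ~ 0
```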
Use of the I.C. with Outliers

Devlin, Gnanadesikan, and Kettenring (1975) advocated the empirical I.C. as an indicator of the influence of each data point $x_i = (y_{i1}, y_{i2})$ on the sample correlation coefficient $\hat{\rho}_n$. They found the complicated exact expression for $\hat{\rho}_{n-1}^{-x_i}$, the estimate with $x_i$ deleted, and showed

(2.3.16)  $n(\hat{\rho}_n - \hat{\rho}_{n-1}^{-x_i}) \approx IC(x_i; \rho, F_n)$.

Devlin, et al. recommended plots of the empirical $IC(x_i; \hat{\rho}_n, F_n)$ contours. For estimators of dimension $p \ge 2$, functions like

$Q_i = IC(x_i; w, F_n)^T IC(x_i; w, F_n)$

might with similar reasoning show the effect of a point $x_i$ on the estimate $w(F_n)$.

Unfortunately, equation (2.3.8) suggests that the empirical $IC(x_i)$ better estimates the effect of adding another point at $x_i$ than of deleting the point already there. If $x_i$ is an outlier it may already have distorted $\hat{\theta}_n = w(F_n)$. The effect of adding the same outlier to the sample a second time will not, in general, be as great as the effect of deleting the outlier altogether. In practice, Devlin et al. found the approximation (2.3.16) using the empirical I.C. satisfactory, but this may not be so in other problems.
Assessing Robustness with the I.C.

The I.C. helps to assess robustness of estimators against two kinds of disturbances (Hampel, 1974). The first kind is the "throwing in" of bad data points--contamination, outliers. A bounded I.C. indicates protection against such errors. (Hampel calls $\sup_x \|IC(x; w, F)\|$ the "gross error sensitivity.")

A second kind of disturbance comes from "wiggling" observations: rounding, grouping, local shifts. A measure of robustness to "wiggling" is the "local shift sensitivity":

$\lambda = \sup_{x \ne y} |IC(x) - IC(y)|/|x - y|$.

This will be infinite if the I.C. jumps as a function of $x$. In such cases, the estimator will jump with infinitesimal shifts in the data. This is true, for example, of the Cox estimator in Chapter 7.
Interpretation of the I.C. in Regression

Three of the chapters in this thesis discuss regression problems: Chapter 6, on an exponential model, and Chapters 7 and 8, on the proportional hazards model of Cox. In regression, the influence of each observation comes not only from the (random) response variable $y$ but also from the covariables $z$.

Let us illustrate by solving for the I.C. of the least squares estimator $\hat{\beta}$ defined by (2.1.5). Suppose we want to find the I.C. at the point $x^* = (y^*, z^*)$. We write $F_\epsilon = (1-\epsilon)F + \epsilon I_{x^*}$ and substitute $F_\epsilon$ for $F$ in (2.1.5). The equation defining $\beta(\epsilon) = w(F_\epsilon)$ is then

(2.3.18)  $0 = \int z(y - z^T\beta(\epsilon))\, dF_\epsilon(y, z) = (1-\epsilon)\int z(y - z^T\beta(\epsilon))\, dF(y, z) + \epsilon\, z^*(y^* - z^{*T}\beta(\epsilon))$.

Now differentiate (2.3.18) w.r.t. $\epsilon$ and evaluate at $\epsilon = 0$. This gives (using (2.3.9)),

(2.3.19)  $0 = -\left[\int z z^T\, dF(y, z)\right]\beta'(0) + z^*(y^* - z^{*T}\beta(0))$.

Or

(2.3.20)  $IC(y^*, z^*; \beta, F) = \beta'(0) = \left[\int z z^T\, dF(y, z)\right]^{-1} z^*(y^* - z^{*T}\beta(0))$.

When $F = F_n$, $\beta(0) = \beta(F_n) = \hat{\beta}_n$, the L.S. estimate based on $F_n$. The empirical I.C. is therefore

(2.3.21)  $IC(y^*, z^*; \beta, F_n) = n(Z^T Z)^{-1} z^*(y^* - z^{*T}\hat{\beta}_n)$.

This implies that

(2.3.22)  $\hat{\beta}_{n+1}^{+x^*} - \hat{\beta}_n \approx \left(\frac{n}{n+1}\right)(Z^T Z)^{-1} z^* r^*$,

(2.3.23)  where $r^* = y^* - z^{*T}\hat{\beta}_n$.

We see that the point $(y^*, z^*)$ influences the estimate $\hat{\beta}$ in two ways. One, as expected, is by means of the discrepancy between $y^*$ and its predicted value $z^{*T}\hat{\beta}_n$; this discrepancy is measured by the residual $r^* = y^* - z^{*T}\hat{\beta}_n$. The second way is by means of the magnitude of the independent variables $z^*$. Even if the residual is slight, extreme values of $z^*$ may strongly affect the estimate. Hampel (1973) calls this effect of $z^*$ the "influence of position in factor space."

Errors in the data may give rise to large residuals, but a (relatively) large $z^*$ may not be an error at all. We may nonetheless want to make an estimator robust against the influence of z's far from the data (Hampel, 1973; Huber, 1973). This influence can stand out strongly in estimators that are otherwise robust (e.g. the Cox estimator, based on ranks, in Chapter 7).
3. EXTENSION OF HAMPEL'S FRAMEWORK TO SURVIVAL DATA

3.1. Introduction

In this chapter we show how Hampel's ideas can be applied to censored survival data with covariates. In the last part of the chapter, some other work on the robustness of survival estimators is reviewed.

Application of Hampel's ideas first requires definition of an empirical distribution function $F_n$. Recall from Chapter 1 that in a survival study the following are recorded for each patient: $t_i$ is the observed time on study, $\delta_i$ is a censoring indicator (0 = censored, 1 = uncensored), and $z_i$ is a vector of covariates. We therefore find it natural to make the following definitions:

Definition 3.1.1. The observations in the survival problem consist of vectors $x_1, x_2, \ldots, x_n$, where

$x_i = (t_i, \delta_i, z_i)$

if covariates are present; and

$x_i = (t_i, \delta_i)$

if covariates are not present. To simplify notation the wavy underscore is omitted from the x's, though not from other vectors, throughout this dissertation.

Definition 3.1.2. The empirical distribution function $F_n(x)$ based on the sample $x_1, x_2, \ldots, x_n$ is that distribution function which puts mass $1/n$ at each observed $x_i$ $(i = 1, \ldots, n)$.

With $F_n$ defined, we can study survival estimators as von Mises functionals $w(F_n)$.
3.2. The Random Censorship Model with Covariates

Application of Hampel's theory also requires that the observations $x_1, x_2, \ldots, x_n$ be modeled as i.i.d. random variables with distribution function $F_\theta$. (Here $\theta$ is the parameter to be estimated.) This allows consideration of Fisher consistency and properties of the influence curve with regard to an underlying distribution.

This requirement poses a problem: In survival analysis, the only distribution specified is $G_\theta(s|z)$ (or $G_\theta(s)$, a special case), the distribution of the failure time random variable $S$; the only parameters of interest are parameters of $G_\theta$. The observations $x = (t, \delta, z)$ depend, however, not only on $G_\theta$ but on the occurrence of censoring and on the covariates $z$. The mechanisms generating the censoring and the covariates are arbitrary nuisance mechanisms, which may not be random at all. The data can therefore arise in many ways.

To apply Hampel's ideas, we define a distribution $F(x)$ in which the censoring and covariates are random but otherwise unspecified. This distribution is an extension of the model of random censorship (Gilbert, 1962; Breslow, 1970; Breslow and Crowley, 1974; Crowley, 1973). We may therefore call this new model the random censorship model with covariates.

Definition 3.2.1. An observation $x_i = (t_i, \delta_i, z_i)$ from the random censorship model with covariates is generated in the following steps (a simulation sketch follows the definition):

1. The $z_i$ are a random sample from a distribution with d.f. $K(z)$ on domain $Z$. We may assume that the distribution has a density $k(z)\,dz = dK(z)$ (defined suitably when $z$ contains both continuous and discrete elements).

2. Obtain the failure time $S_i$ from $G(s|z_i)$ with density $g(s|z_i)$, where $z_i$ was generated as in (1).

3. Independent of step (2), pick the censoring time $C_i$ from a distribution $H(c|z_i) = P\{C < c \mid z_i\}$, with density $dH(c|z_i) = h(c|z_i)\,dc$, and with $\bar{H}(c|z_i) = 1 - H(c|z_i)$. Again $z_i$ is from step (1). In other words, $C_i$ is, conditional on $z_i$, independent of $S_i$.

4. Form $t_i = \min(S_i, C_i)$ and $\delta_i = I[S_i \le C_i]$ (where $I(A)$ denotes the indicator of the event $A$).

5. Then $x_i = (t_i, \delta_i, z_i)$, and the resulting distribution function is $F(x)$.

If there are no covariates, the process consists of picking $S_i$ and $C_i$ independently from $G(s)$ and $H(c)$ respectively, and of forming $(t_i, \delta_i)$ as in steps (4) and (5).
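To make the generating steps concrete, here is a minimal simulation sketch in Python. The distributional forms (standard normal $K$, exponential failure with hazard $\lambda_0 e^{\beta z}$, exponential censoring with rate $\mu$) and all parameter values are illustrative assumptions, not specifications taken from the text.

```python
# Sketch of Definition 3.2.1: draws from a random censorship model with one
# covariate. Every distributional choice below is an illustrative assumption.
import math
import random

def draw_observation(beta=0.5, lam0=1.0, mu=0.7):
    z = random.gauss(0.0, 1.0)                         # step 1: z from K
    s = random.expovariate(lam0 * math.exp(beta * z))  # step 2: S from G(s|z)
    c = random.expovariate(mu)                         # step 3: C from H(c|z),
                                                       #   independent of S given z
    t = min(s, c)                                      # step 4: observed time
    delta = 1 if s <= c else 0                         #   and censoring indicator
    return (t, delta, z)                               # step 5: the observation x

sample = [draw_observation() for _ in range(100)]
print(sum(d for _, d, _ in sample), "failures out of", len(sample))
```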
We can calculate expressions for the joint distribution $f(t, \delta, z)$ as follows. For the moment, let upper case "T" denote the random variable, while lower case "t" denotes a particular value.

$P(T < t, \delta = 1 \mid z) = P(S < t, S \le C \mid z) = \int_0^t \bar{H}(y|z)\, g(y|z)\, dy$.

The conditional density is found by differentiating the last expression with respect to $t$:

(3.2.1)  $f(t, \delta = 1 \mid z) = \bar{H}(t|z)\, g(t|z)$.

We will denote this $f(t, 1|z) = \partial F(t, 1|z)/\partial t$. Then

(3.2.2)  $f(t, 1, z) = f(t, 1|z)\, k(z)$.

Similarly,

$f(t, 0, z) = \frac{\partial^2 F(t, 0, z)}{\partial t\, \partial z} = \bar{G}(t|z)\, h(t|z)\, k(z)$,

where, again, $f(t, 0, z)$ is short for $f(t, \delta = 0, z)$.

The total density, with respect to the product of Lebesgue and counting measure, is

(3.2.3)  $f(t, \delta, z) = [\bar{H}(t|z)\, g(t|z)]^\delta\, [\bar{G}(t|z)\, h(t|z)]^{1-\delta}\, k(z)$

(3.2.4)  $= [g(t|z)^\delta\, \bar{G}(t|z)^{1-\delta}]\, [h(t|z)^{1-\delta}\, \bar{H}(t|z)^{\delta}]\, k(z)$.

In estimation problems, the parameters of interest are parameters of $G$ and $g$. The factorization of $f(t, \delta, z)$ is noted only implicitly, and only the first term

(3.2.5)  $g(t|z)^\delta\, \bar{G}(t|z)^{1-\delta}$

appears in the likelihood.

The symbols $F$ and $f$ will designate marginal, conditional, and unconditional distributions of $(t, \delta, z)$. Thus we write

(3.2.6)  $f(t, 1) = \frac{\partial F(t, 1)}{\partial t} = \int_Z f(t, 1|z)\, dK(z)$

for the marginal density of observed failure times. We will always assume enough regularity so that integrations in $t$, $z$, or in $C$, $S$, and $z$ can be performed in any order. Derivations, when there are no covariates, are easily made.
Remarks

1. This random censorship model with covariates was inspired by earlier work of Breslow (1970). Breslow studied a generalization of the Kruskal-Wallis test for comparing survival distributions of K populations. The long run proportion of observations from the j-th population was $\lambda_j$ $(\sum_{j=1}^K \lambda_j = 1)$; and a different censoring distribution was allowed for each population. Define K 0-1 variables $Z_j$. Let prob$\{Z_j = 1, Z_l = 0, l \ne j\} = \lambda_j$. Then the K sample problem becomes a special case of random censorship with covariates.

2. By allowing a different censoring distribution for each population, Breslow implicitly weakened the usual assumption of survival analysis that censoring and survival be unconditionally independent. In extension to regression problems, the current model requires only that censoring and failure time be independent conditional on the value of z's. This further weakens the usual assumption.

Suppose, for example, that younger patients are more likely than older patients to move away during the course of a clinical trial. Then short censoring times are apt to be associated with long (unobserved) survival times, since both are associated with younger ages. But, conditional on age at entry, censoring and survival may be independent. The inclusion of age as a covariate in such a case allows the inference to proceed.

3. Not all censoring schemes are covered by this model. Especially excluded are progressive schemes in which censorship depends on events in the course of the trial. Many statistical procedures are valid for such schemes; the only requirement is that failure times of the censored observations not be affected by censoring. We cannot, however, study such procedures by means of the random censorship model.

4. To what extent can covariate vectors $z_i$ be regarded in practice as i.i.d. random variables? Some covariates are truly random. For example, treatment assignment may be the outcome of a randomization process. Or, the patients may represent some sampled population, entry into the trial being a "random" process.

At the other extreme are covariates which are fixed. The need for an intercept in a regression model, for example, may require a covariate $z_{i1} \equiv 1$. We may think of this covariate value as being sampled with probability one.

Between the extremes of truly random and truly fixed covariates are others, in which the sampled distribution may be hard to describe at best: patient assignment schemes, "balancing" of patients within treatments, biased or arbitrary recruitment and selection. In such cases (which are common), we would only conjecture that the covariates behave like random variables with some regularity conditions.

5. In Cox's (1972) regression model, the covariate values may change with time. Such covariates may also be considered in a random censorship model. A discussion is deferred until Chapter Seven, where the study of Cox's model takes place.
3.3. Related Work

In this section, some related work on robustness in the survival literature is outlined. These references do not constitute a complete survey; they are intended only to alert the reader to other approaches.

Fisher and Kanarek (1974a) studied the extent to which choice of an incorrect model affected inference in survival analysis. They studied four exponential regression models $\lambda_1(z), \ldots, \lambda_4(z)$, with the distribution depending on one covariate $z$. Model 2 is the model of Feigl and Zelen (1965), $\lambda_2(z) = 1/(\theta_0 + \theta_1 z)$. Model 4 is an exponential model which gives the usual multiple logistic probability of survival past $t_0$. A distance between models $i$ and $j$ was defined by the average (over $z$) of the supremum norm:

(3.3.1)  $d(\lambda_i, \lambda_j) = E_z\left(\sup_{t>0} |e^{-\lambda_i(z)t} - e^{-\lambda_j(z)t}|\right)$.

The covariate $z$ was given a uniform $[-1, 1]$ distribution; censoring was not considered.

When model $i$ is known to be the true model, the parameters must still be estimated. Here maximum likelihood estimation was used. The distance of the estimated model $\hat{\lambda}_i$ from the true model was defined as:

(3.3.2)  $d(\hat{\lambda}_i, \lambda_i) = E_{\hat{\lambda}_i}\left(E_z\left(\sup_{t>0} |e^{-\hat{\lambda}_i(z)t} - e^{-\lambda_i(z)t}|\right)\right)$.

The outer expectation was taken with respect to the asymptotic normal distribution of $\hat{\lambda}_i$.

Examination of (3.3.1) and (3.3.2) showed how the loss of precision from incorrect model choice compared to the loss from the variability of estimation. (It might have been informative to compute $d(\hat{\lambda}_i, \lambda_j)$, but this was not done.) The results showed that models 1, 3, and 4 were very close ($d(\lambda_i, \lambda_j) \le d(\hat{\lambda}_i, \lambda_i)$), except when $z$ had a strong effect or when sample size was large. On the other hand, model 2, the Feigl-Zelen model, was considerably different from the others.

Prentice (1974, 1975) and Farewell and Prentice (1977) embedded a number of well known parametric models (Weibull, log normal, logistic, gamma) in a single parametric family. In this family, the log of failure time is modeled as a linear combination of the covariates:

(3.3.3)  $\log S(z) = \alpha + \beta^T z + \sigma w$.

Here $\sigma$ is a scale parameter and $w$ is an error random variable. According to the generality required, $w$ follows either the distribution of the log of a gamma R.V. (one parameter) or the log of an F random variable (two parameters). Censoring is taken into account; likelihood methods provide tests and estimates. Prentice and Farewell show that inference about $\beta$ may be more robust to spurious response times, if the model (3.3.3) is fitted.
3.4. Independence of Censoring and Survival

Most procedures in survival analysis depend upon some kind of assumption of the independence of censoring and survival. (In the random censorship model, we require independence conditional on $z$.) The assumption enables one to factor likelihoods, so that estimation of the survival function $\bar{G}$ can proceed.

In many applications, the assumption of independence is suspect. Patients who drop out of clinical trials may be more sickly than those who stay in, or they may be healthier. For example, patients who drop out because of drug side effects may be constitutionally different from those who do not suffer such side effects. Or, patients who are healthier may be better able to move away from the city in which a study is located.

What are the consequences for survival analysis? This question has been studied by several authors. Peterson (1975) and Tsiatis (1975) in a competing risks set-up proved the following: that any underlying distribution $F(t, \delta)$ (or $F(t, \delta, z)$) of the observed data may arise from an infinite number of survival distributions $\bar{G}$, if dependent censoring is possible. Therefore $\bar{G}$ is not identifiable.

Peterson studied nonparametric estimation of $\bar{G}$ via the Kaplan-Meier estimate (described in Chapter Five). He obtained sharp upper and lower bounds for the estimate. The lower bound is equivalent to designating all censored observations as failures immediately after censoring. The upper bound is found by treating all censored observations as if they never fail. For some data sets (in which there is little censoring) the bounds can be narrow; with heavy censoring they can be wide.

Fisher and Kanarek (1974b) also studied the problem of estimating a single survival function $\bar{G}$, when censoring and failure time are dependent. They presented a model with two kinds of censoring. The first kind was "administrative" censoring caused by end of study; such censoring should be independent of survival. The second censoring was loss to follow-up due to dropping out. In the model, such loss to follow-up affected failure time through a non-estimable scaling parameter $a$, $0 \le a \le \infty$. Large values of $a$ ($>1$) corresponded to poor survival following censoring; small values ($<1$) corresponded to better survival. The extremes of $a = 0$ and $a = \infty$ corresponded to Peterson's upper and lower bounds, respectively, while $a = 1$ was equivalent to independence. By varying $a$ and finding the M.L. estimate of $\bar{G}$ for each value, an investigator could assess the robustness of his estimates to the independence assumption.

Fisher and Kanarek also discussed ways of using auxiliary information to test the assumption of independence. Suppose that at time $t_j$, $N_j$ people are being followed and $K_j$ of these are lost to follow-up (in some small interval following $t_j$). If the assumption of independence holds, any set of $K_j$ people of the $N_j$ at risk might, with equal probability, have been lost. If the assumption is not true, the covariate values for the $K_j$ who were lost should be somewhat separated from those who were not lost. These facts are exploited to give tests of the independence assumption.

The applicability of these results in regression problems is not clear. Certainly one can apply the trick of failing censored observations at the time of censoring or of letting them live forever. This should be easy to do and to program on a computer. Perhaps an analogue of Fisher and Kanarek's (1974b) a-technique can be extended to the regression problem; this might provide a more realistic assessment than the extreme bounds (corresponding to $a = 0$ and $a = \infty$).
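The "trick" of the last paragraph is indeed easy to program. The sketch below (Python; hypothetical data) computes Peterson's two extreme bounds with the product-limit formula of Chapter Five, equation (5.2.1): the lower bound fails every censored observation at its censoring time, and the upper bound lets every censored observation live forever.

```python
# Peterson's bounds on the product-limit estimate under possibly dependent
# censoring: censored -> immediate failure (lower), censored -> never fails (upper).
def product_limit(obs, s):
    """P.L. estimate (5.2.1) of P(S > s); obs is a list of (t, delta), no ties."""
    surv = 1.0
    for t, d in sorted(obs):
        if t > s:
            break
        if d == 1:
            at_risk = sum(1 for u, _ in obs if u >= t)  # number at risk at t
            surv *= 1.0 - 1.0 / at_risk
    return surv

data = [(1.0, 1), (1.5, 0), (2.0, 1), (2.7, 0), (3.1, 1), (4.0, 0)]  # hypothetical
s = 3.0
lower = product_limit([(t, 1) for t, _ in data], s)
upper = product_limit([(t, d) if d else (float("inf"), 0) for t, d in data], s)
print(lower, product_limit(data, s), upper)             # bounds bracket the estimate
```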
3.5. Limitations of Hampel's Approach

First consider the estimation problem when there are no covariates. We would like to study the continuity of $\hat{\theta} = w(F_n)$ with respect to changes in $\bar{G}$, the survival function. We are restricted to studying continuity with respect to changes in $F$, the argument of the functional $w$. Results quoted in the last section show that the same $F$ may correspond to many different survival distributions $\bar{G}$, if censoring and survival are not independent. This nonidentifiability property makes a general study of continuity extremely difficult.

One approach is the following: Assume that random censorship holds (thus ensuring independence of failure and censoring times). In addition, assume that the censoring distribution $H(c)$ is fixed. The resulting distribution $F$ of $x = (t, \delta)$ should be a 1-1 functional of $\bar{G}$: $F = q(\bar{G})$, say. The estimator is then also a functional of $\bar{G}$, defined by $\Omega(\bar{G}) = w(q(\bar{G}))$. The formal study of the functional $\Omega = w \circ q$ can then proceed; we shall not, however, do so.

With covariates, the situation is more complicated. We cannot define the estimator as a functional $\hat{\theta} = \Omega(\bar{G})$, because there is no single underlying $\bar{G}$. Instead there is a different distribution $\bar{G}(\cdot|z)$ for each value of the covariables $z$. In regression problems, therefore, the functional approach is restricted to study of $\hat{\theta} = w(F)$.

The restriction of Hampel's ideas to estimation is another limitation. Some extensions to test statistics are possible, but we will not pursue the subject in this study.
4. THE ONE-PARAMETER EXPONENTIAL MODEL

4.1. The Model

We start with a simple parametric model: the one-parameter exponential. In this model, the hazard, density, and survivor functions are:

(4.1.1)  $\lambda(t) \equiv \lambda$, $g_\lambda(t) = \lambda e^{-\lambda t}$, $\bar{G}_\lambda(t) = e^{-\lambda t}$, $t \ge 0$.

Actually, the model (4.1.1) is a special case of the Glasser multiple regression exponential model, which will be studied in Chapter Six. We study (4.1.1) for two reasons: (1) the simple exponential model is often fitted in its own right to survival data; and (2) the calculations and results serve as a good warm-up to the more complex models in the chapters to come.

The Estimator

The goal is to estimate $\lambda$, and the usual method is by maximum likelihood. Each observation $(t_i, \delta_i)$ contributes to the likelihood a term

(4.1.2)  $g_\lambda(t_i)^{\delta_i}\, \bar{G}_\lambda(t_i)^{1-\delta_i} = \lambda^{\delta_i} e^{-\lambda t_i}$.

(See (3.2.5).)
The log likelihood is

$\ell(\lambda) = \sum_{i=1}^n \ell_i(\lambda) = (\ln \lambda)\sum_{i=1}^n \delta_i - \lambda\sum_{i=1}^n t_i$.

The estimating equation is therefore

(4.1.3)  $\frac{d}{d\lambda}\ell(\lambda) = \sum_{i=1}^n U_i(\lambda) = \sum_{i=1}^n \frac{\delta_i}{\lambda} - \sum_{i=1}^n t_i = 0$,

where $U_i(\lambda) = \delta_i/\lambda - t_i$ is the efficient score for a single observation. Equation (4.1.3) is solved by

(4.1.4)  $\hat{\lambda}_n = \sum_{i=1}^n \delta_i \Big/ \sum_{i=1}^n t_i$.

$\hat{\lambda}_n$ is the ratio of the observed number of failures to the total exposure time in the sample.
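In code, the estimator is a single line; the sketch below (Python, hypothetical data) serves only to fix the notation $(t_i, \delta_i)$.

```python
# M.L.E. (4.1.4) for the one-parameter exponential model:
# observed failures divided by total exposure time.
data = [(2.3, 1), (1.1, 0), (4.0, 1), (0.7, 1), (5.2, 0)]   # (t_i, delta_i) pairs
lam_hat = sum(d for _, d in data) / sum(t for t, _ in data)
print(lam_hat)                                # here 3 failures / 13.3 time units
```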
Functional Form

Let $F_n$ be the sample distribution function of $(t, \delta)$. $\hat{\lambda}_n$ can be written

(4.1.5)  $\hat{\lambda}_n = \frac{(1/n)\sum \delta_i}{(1/n)\sum t_i} = \int \delta\, dF_n \Big/ \int t\, dF_n = A(F_n)/B(F_n)$, say,

where $A(F) = E_F(\delta)$ and $B(F) = E_F(t)$. Therefore $\hat{\lambda}_n = w(F_n)$, with

(4.1.6)  $w(F) = A(F)/B(F)$.
Fisher Consistency

Let $H(c)$ be an arbitrary distribution of censoring time. With the model (4.1.1) for failure times, a random censorship model (without covariates) is easily defined, as in Section 3.2. Let $F_\lambda(t, \delta)$ be the resulting distribution of the data. Fisher consistency will hold if

(4.1.7)  $w(F_\lambda) = A(F_\lambda)/B(F_\lambda) = \lambda$.

We show this by evaluating $A(F_\lambda)$ and $B(F_\lambda)$, which are the expectations (over $F_\lambda$) of $\delta$ and $t$, respectively:

(4.1.8)  $A(F_\lambda) = E_{F_\lambda}(\delta) = P\{S \le C\} = \int_0^\infty \bar{H}(t)\, g_\lambda(t)\, dt = \lambda \int_0^\infty \bar{H}(t)\, e^{-\lambda t}\, dt$.

(4.1.9)  $B(F_\lambda) = E_{F_\lambda}(t) = \int_0^\infty (1 - F_\lambda(t))\, dt$  (Rao, 1965, 2b.2.1).

(4.1.10)  $1 - F_\lambda(t) = P\{S > t\}\, P\{C > t\}$, by independence, $= \bar{G}_\lambda(t)\bar{H}(t) = e^{-\lambda t}\bar{H}(t)$.

Therefore,

(4.1.11)  $B(F_\lambda) = \int_0^\infty e^{-\lambda t}\bar{H}(t)\, dt$,

and

$w(F_\lambda) = A(F_\lambda)/B(F_\lambda) = \lambda$,

so that (4.1.7) holds.
4.2. The Influence Curve

Suppose we contaminate a distribution $F$ with a point $x = (t, \delta)$. We write the contaminated distribution as $F_\epsilon = (1-\epsilon)F + \epsilon I_{(x)}$, where $I_{(x)}$ is the c.d.f. which puts mass 1 at $x$. The estimator of the exponential hazard based on $F_\epsilon$ is:

(4.2.1)  $w(F_\epsilon) = A(F_\epsilon)/B(F_\epsilon)$,

or, to simplify notation:

(4.2.2)  $w(\epsilon) = A(\epsilon)/B(\epsilon)$.

The influence curve is given by:

(4.2.3)  $IC(x; w, F) = \frac{d}{d\epsilon}w(\epsilon)\Big|_{\epsilon=0} = \frac{B(0)A'(0) - A(0)B'(0)}{B^2(0)}$,

where $A'(0) = \frac{d}{d\epsilon}A(\epsilon)\big|_{\epsilon=0}$, etc., and $A(0) = A(F)$. The derivatives $A'(0)$ and $B'(0)$ are easy to evaluate from (2.3.3), since $A(F)$ and $B(F)$ are means:

(4.2.4)  $A'(0) = \delta - A(F)$, $B'(0) = t - B(F)$.

Simplifying (4.2.3), we find

(4.2.5)  $IC((t, \delta); w, F) = \frac{\delta - [A(F)/B(F)]\,t}{B(F)} = \frac{\delta - w(F)\,t}{B(F)}$.

The empirical I.C. is found by substituting $F_n$ for $F$ in (4.2.5):

(4.2.6)  $IC((t, \delta); w, F_n) = \frac{\delta - \hat{\lambda}_n t}{B(F_n)} = \frac{\delta - \hat{\lambda}_n t}{\bar{t}_n}$.

Check on Computation

By (2.3.9), an I.C. should satisfy $E_F(IC(x; w, F)) = 0$. We can easily show this for the I.C. (4.2.5). By definition, we have for any $F$, $E_F(\delta) = A(F)$ and $E_F(t) = B(F)$. By (4.2.5),

(4.2.7)  $E_F(IC((t, \delta); w, F)) = \frac{E_F\{\delta - [A(F)/B(F)]\,t\}}{B(F)} = \frac{A(F) - A(F)}{B(F)} = 0$.

The argument above holds for $F = F_n$, so that

$\sum_{i=1}^n IC(x_i; w, F_n) = 0$,

which is (2.3.10).
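These checks are easy to reproduce numerically. The sketch below (Python; hypothetical data) evaluates the empirical I.C. (4.2.6), verifies the zero sum (2.3.10), and compares the I.C. with the normalized change of (2.3.8) when a point is added.

```python
# Empirical I.C. (4.2.6) for the exponential hazard estimate.
data = [(2.3, 1), (1.1, 0), (4.0, 1), (0.7, 1), (5.2, 0)]   # hypothetical
n = len(data)
lam_n = sum(d for _, d in data) / sum(t for t, _ in data)
tbar = sum(t for t, _ in data) / n

def emp_ic(t, d):
    return (d - lam_n * t) / tbar                 # (4.2.6)

print(sum(emp_ic(t, d) for t, d in data))         # ~ 0, which is (2.3.10)

t_new, d_new = 10.0, 0                             # a large censored time
lam_aug = sum(d for _, d in data) / (sum(t for t, _ in data) + t_new)
print((n + 1) * (lam_aug - lam_n), emp_ic(t_new, d_new))   # compare via (2.3.8)
```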
Interpretation

The influence curves for the estimated hazard rate are fairly easy to interpret. We go into some detail, because the principles will apply in later chapters.

How will the addition of a new data point $x = (t, \delta)$ affect an estimate $\hat{\lambda}_n$ based on a sample of size $n$? For an answer, let us first examine the formula (4.1.5) for $\hat{\lambda}_n$:

$\hat{\lambda}_n = \frac{(1/n)\sum \delta_i}{(1/n)\sum t_i}$.

We see that $\delta = 1$, an observed failure, will increase the estimate if the corresponding $t$ is less than or equal to $\bar{t}_n$. Similarly, $\delta = 0$ and $t > \bar{t}_n$ will result in $\hat{\lambda}_{n+1}^{+x} < \hat{\lambda}_n$. The change in other cases depends on the relative changes in $(1/n)\sum\delta_i$ and $(1/n)\sum t_i$. Intuitively, a relatively large $t$ ought to decrease the estimator, even if $\delta = 1$.

Let us now see how the empirical I.C. clarifies matters. Recall this is

(4.2.6)  $IC((t, \delta); w, F_n) = \frac{\delta - \hat{\lambda}_n t}{\bar{t}_n}$.

From (2.3.7), we regard the I.C. as a (first order) approximation to $(n+1)(\hat{\lambda}_{n+1}^{+x} - \hat{\lambda}_n)$. To an approximation, then, we expect the estimate to decrease if $\delta/t < \hat{\lambda}_n$, to increase if $\delta/t > \hat{\lambda}_n$, and to show no change if $\delta/t = \hat{\lambda}_n$. Note that the first inequality is not entirely correct. An observation $(t, \delta)$ with $\delta = 0$ can still result in a decreased estimate if $t$ is small enough. Another shortcoming of the approximation is also apparent. For large enough $t$, the approximation implies that $\hat{\lambda}_{n+1}$ can decrease below zero, but $\hat{\lambda} > 0$.

Another point of interest is this: We can regard $\delta$ as the observed number of failures (O) in $x = (t, \delta)$ and $\hat{\lambda}_n t$ as the expected number of failures (E) in time $t$. Then the empirical I.C. is proportional to O$-$E, a kind of "residual" of the observation $(t, \delta)$. We therefore see that the I.C. gives at least a qualitative indication of the influence of $(t, \delta)$, when $(t, \delta)$ is added to the sample.
4.3. The Limiting Variance and Distribution of $\hat{\lambda}_n$

We now compare the limiting distribution of $\hat{\lambda}_n$ from standard likelihood theory to that suggested by the theory of von Mises derivatives. Suppose that the data arise from a random censorship model $F_\lambda$, with survival given by (4.1.1). The information in a single observation $(t, \delta)$ is

(4.3.1)  $I(\lambda) = E_{F_\lambda}(\delta)/\lambda^2 = A(F_\lambda)/\lambda^2$.

If the ordinary regularity conditions apply to the density $f_\lambda(t, \delta)$, then we have the standard result (Rao, 1965, 5f):

(4.3.2)  $\sqrt{n}(\hat{\lambda}_n - \lambda) \to N(0, \sigma^2)$,

where

(4.3.3)  $\sigma^2 = I(\lambda)^{-1} = \lambda^2/A(F_\lambda)$.

The regularity conditions apply to the total random censorship model, which includes the distribution of the censoring times $C_i$ $(i = 1, \ldots, n)$. This poses no real problem, as one can first condition on $C_i$ before taking expectations. See Cox and Hinkley (1974, 4.8).

The result (4.3.2) can also be obtained directly from the formula $\hat{\lambda}_n = \sum\delta_i/\sum t_i$. The asymptotic normality of $\sqrt{n}(\sum\delta_i/n, \sum t_i/n)$ is proved by a multivariate central limit theorem. The limiting distribution of $\sqrt{n}(\hat{\lambda}_n - \lambda)$ is then obtained by the "$\delta$ method" (Rao, 1965, 6a.2). Details are left to the reader.

According to the theory of von Mises derivatives,

(4.3.4)  $\sqrt{n}(\hat{\lambda}_n - \lambda) \to N(0, E_{F_\lambda}(IC^2(x; w, F_\lambda)))$.

We now show that the limiting variances of (4.3.4) and (4.3.3) agree. The I.C. (4.2.5) can be written

(4.3.5)  $IC(x; w, F_\lambda) = \frac{\delta - \lambda t}{B(F_\lambda)} = \frac{\lambda}{B(F_\lambda)}\, U_i(\lambda)$,

where $U_i(\lambda)$ is the efficient score defined at (4.1.3). Therefore

(4.3.6)  $E_{F_\lambda}(IC^2(x; w, F_\lambda)) = \frac{\lambda^2}{B^2(F_\lambda)}E(U_i^2(\lambda)) = \frac{\lambda^2}{B^2(F_\lambda)}I(\lambda) = \frac{A(F_\lambda)}{B^2(F_\lambda)} = \frac{\lambda^2}{A(F_\lambda)} = I(\lambda)^{-1}$,

the last step following from $A(F_\lambda) = \lambda B(F_\lambda)$ (compare (4.1.8) and (4.1.11)). Therefore the two variances are identical.
4.4. Other Functions of $\lambda$

Suppose we are interested not only in estimating $\lambda$ but in estimating a parameter $\theta$ which is a 1-1 function of $\lambda$: $\theta = q(\lambda)$. The maximum likelihood estimate of $\theta$ is then $\hat{\theta}_n = q(\hat{\lambda}_n)$. We now show that the results for $\hat{\lambda}_n$ extend in a straightforward way to $\hat{\theta}_n$.

First, it is easy to see that $\hat{\theta}_n$ is a von Mises functional defined by $r(F_n) = q(w(F_n))$. This functional is Fisher consistent:

$r(F_\lambda) = q(w(F_\lambda)) = q(\lambda) = \theta$.

The influence curve for $r(F)$ follows directly from that of $w(F)$. Write $r(F_\epsilon) = q(w(F_\epsilon))$. Provided that $q(\cdot)$ is differentiable, we have

(4.4.1)  $IC(x; r, F) = \frac{d}{d\epsilon}r(F_\epsilon)\Big|_{\epsilon=0} = q'(w(F))\left[\frac{d}{d\epsilon}w(F_\epsilon)\Big|_{\epsilon=0}\right] = q'(w(F))\, IC(x; w, F)$.

If the theory of von Mises derivatives is applied to $\hat{\theta}$, then $\sqrt{n}(\hat{\theta}_n - \theta) \to N(0, E_F(IC^2(x; r, F)))$, where

(4.4.2)  $E_F(IC^2(x; r, F)) = [q'(w(F))]^2\, E_F(IC^2(x; w, F))$.

This is exactly the same result obtained from applying the "$\delta$ method" to the function $q(\hat{\lambda}_n)$.

Example 4.4.1. Let $\theta = 1/\lambda$, the mean of the exponential distribution (4.1.1). Here $\hat{\theta}_n = 1/\hat{\lambda}_n = \sum t_i/\sum\delta_i$. Formally $q(\lambda) = \lambda^{-1}$, and $r(F) = w(F)^{-1}$. By (4.4.1),

(4.4.3)  $IC((t, \delta); r, F) = -w(F)^{-2}\, IC((t, \delta); w, F) = A(F)^{-1}(t - \theta\delta)$.

As expected, a large observation time $t$ will probably increase the estimate of $\theta$. In form, the I.C. (4.4.3) for $\theta$ is unbounded as a function of $t$. In practice, censoring will usually impose an upper limit, $C_{max}$ say, on the observation times. Deviant survival times greater than $C_{max}$ are "moved back" to $C_{max}$ and have influence curve no greater than $A(F)^{-1}C_{max}$. If only one failure time is observed in a sample of size $n$, the rest being censored, $\hat{\theta}_n$ will remain bounded. Therefore, the estimate does not break down easily, unlike the mean for uncensored observations. Nonetheless, the bound $A(F)^{-1}C_{max}$ may still be large, if $C_{max}$ is. And, as Hampel (1968, p. 89) remarks in a similar context, the estimate $\hat{\theta}_n$ may be sensitive to the distribution of failure times near $C_{max}$.

Example 4.4.2. Let $\theta = \log\lambda$, a location parameter for log failure time. The influence curve is easily found to be

$IC(x; r, F) = \frac{IC(x; w, F)}{w(F)} = \frac{\delta}{A(F)} - \frac{t}{B(F)}$.
5. NONPARAMETRIC ESTIMATION OF AN ARBITRARY SURVIVAL FUNCTION

5.1. Note on Rank Estimators

In this chapter and in Chapters Seven and Eight, we consider estimators based on the rank ordering of observations. Some comment is needed to show how the rank representation is based on the sample $F_n$.

Let the data be $(t_i, \delta_i)$ or $(t_i, \delta_i, z_i)$, $(i = 1, \ldots, n)$. The marginal distribution of $t$, in the sample, is

(5.1.1)  $F_n(t) = \frac{1}{n}\sum_{i=1}^n I[t_i \le t]$.

If $t_{(i)}$ is the i-th ordered observation, then (5.1.1) means that

(5.1.2)  $F_n(t_{(i)}) = i/n$,

and that

(5.1.3)  $F_n(t_{(i)}^-) = (i-1)/n$.

Here $F_n$ is defined to be continuous from the right, following Breslow and Crowley (1974). Note that the rank of $t_i$ is usually defined by

(5.1.4)  $R_i = \sum_{j=1}^n I[t_j \le t_i]$.

Therefore

(5.1.5)  $1 - F_n(t_i^-) = (n - R_i + 1)/n$,

a fact used repeatedly.
5.2. The Product Limit and Empirical Hazard Estimators

When a parametric model is not specified, a nonparametric estimate of $\bar{G}(s)$ is called for. A well known estimator is the product limit (P.L.) estimate, discovered by Kaplan and Meier (1958). If there are no ties among the failure times, the P.L. estimate is defined by

(5.2.1)  $\bar{G}_n^{PL}(s) = \prod_{i:\, t_i \le s}\left[\frac{n - R_i}{n - R_i + 1}\right]^{\delta_i} = \prod_{i:\, t_i \le s}\left[1 - \frac{\delta_i}{n - R_i + 1}\right]$.

This is a step function, jumping at observed failure times and constant between. The P.L. estimator has been studied by Efron (1967) and Breslow and Crowley (1974), with extensions to multiple decrement models by Peterson (1975) and Aalen (1976).

Peterson (1975) has succeeded in writing (5.2.1) as a functional of $F_n$. His representation is difficult to work with, so we consider instead a nearly equivalent estimator of $\bar{G}(s)$. This is the empirical hazard estimator (Grenander, 1956; Altschuler, 1970; Breslow and Crowley, 1974):

(5.2.2)  $\bar{G}_n^e(s) = \exp(-\Lambda_n^e(s))$,

where

(5.2.3)  $\Lambda_n^e(s) = \sum_{i:\, t_i \le s}\frac{\delta_i}{n - R_i + 1}$

estimates the cumulative hazard $\Lambda(s)$. (Again the absence of ties is assumed.) If the largest observation, $t_{max}$, is a failure time, then $\Lambda_n^e(s) = +\infty$ for $s \ge t_{max}$.

The empirical hazard estimate can be derived as the maximum likelihood estimate of $\bar{G}(s)$, assuming a constant hazard function between observed failure times. Breslow and Crowley prove the following lemma relating the two estimators:

Lemma: Let $n(t)$ equal the number of individuals at risk of failure at $t$. Then,

$0 \le -\log_e\{\bar{G}_n^{PL}(s)\} - \Lambda_n^e(s) \le \frac{n - n(s)}{n\, n(s)}$.

Proof: Breslow and Crowley, 1974, Lemma 1.

The estimates are therefore close and are asymptotically equivalent.
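The lemma is easy to observe numerically. The sketch below (Python; a small hypothetical untied sample) computes both estimates and the difference $-\log\bar{G}_n^{PL}(s) - \Lambda_n^e(s)$, which is nonnegative and small.

```python
# Product-limit (5.2.1) and empirical hazard (5.2.3) estimates, without ties.
import math

data = sorted([(1.0, 1), (1.5, 0), (2.0, 1), (2.7, 0), (3.1, 1), (4.0, 0)])
n = len(data)

def estimates(s):
    pl, cum_haz = 1.0, 0.0
    for rank, (t, d) in enumerate(data, start=1):   # rank R_i of t_i (no ties)
        if t > s:
            break
        at_risk = n - rank + 1                      # n - R_i + 1, cf. (5.1.5)
        if d == 1:
            pl *= (at_risk - 1) / at_risk           # factor in (5.2.1)
            cum_haz += 1.0 / at_risk                # term in (5.2.3)
    return pl, cum_haz

pl, ch = estimates(3.0)
print(pl, math.exp(-ch))            # the two estimates of G-bar(s)
print(-math.log(pl) - ch)           # nonnegative and small, as the lemma states
```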
5.3. Functional Form: Fisher Consistency under Random Censoring

We choose to work primarily with $\Lambda_n^e(s)$; conclusions about $\bar{G}_n^e(s)$ will follow immediately. From (5.1.5) and (5.2.3), we see that

(5.3.1)  $\Lambda_n^e(s) = \int_0^s (1 - F_n(t^-))^{-1}\, dF_n(t, 1)$.

This representation was first given by Breslow and Crowley (1974, eq. 7.6). For general $F$, we define the functional

(5.3.2)  $\Lambda(s, F) = \int_0^s (1 - F(t^-))^{-1}\, dF(t, 1)$.

Then $\Lambda_n^e(s) = \Lambda(s, F_n)$. (For the continuous $F$ considered below, $F(t^-) = F(t)$, and we write simply $F(t)$.)

Fisher consistency under random censorship now follows, as Breslow and Crowley showed. For, suppose $F$ is a random censorship distribution. By (3.2.2), with no covariates,

$dF(t, 1) = \bar{H}(t)\, g(t)\, dt$.

Also,

$1 - F(t) = \bar{H}(t)\, \bar{G}(t)$.

Therefore

$\Lambda(s, F) = \int_0^s \frac{g(t)}{\bar{G}(t)}\, dt = \int_0^s \lambda(t)\, dt = \Lambda(s)$.

The estimator of $\bar{G}(s)$ in functional form is

$\bar{G}_n^e(s) = \exp(-\Lambda(s, F_n))$,

and this is also Fisher consistent: $\exp(-\Lambda(s, F)) = \bar{G}(s)$.

Breslow and Crowley actually proved the weak convergence of $\Lambda(s, F_n)$ to $\Lambda(s)$. This follows from continuity of $\Lambda(s, F)$ in the supremum norm and from the convergence in that norm of $F_n$ to $F$.
5.4.
The Influence Curve
We now find the I.C. of A(s,F).
x*
to F.
Write the contaminating point as
* * where the asterisks are temporarily added for clarity.
(t,O),
F£ :: (l-£)F + £1
The estimator based on F
c
x
*
= F + £(1 .-F).
x
of the cumulative hazard, is A(s,F ), and
£
the I.C. will be given by
(5.4.1)
•
I.C. (x ;A(s,F»
=
I
dd £ A(s,F)
£ £= O'
From 5.3.2
(5.4.2)
A(S,F ) =
£
f
I[
.
y~s
Let
lO(l-F (y»
£
-1
dF(y,O).
43
We expand this:
=
I
I[ < l0(l-F (y»
y_s
(5.4.3)
-1
dF(y,O)
.
£[I[t*~S)6*(1-F£(t*»-1
+
- £J I[Y~S)6(1-F£(y»-ldF(Y,O»).
Let us assume that we can differentiate with respect to £ under the
integral sign in 5.4.3.
Then
I
IC{{t*,O*); l\(s,F»
0
d l\(s,F)
d£
£ £=
(5.4.4)
< )O{dd (l-F (y»-ll 0}dF(y,6)
£
£
£=
J I[ y _s
+
since
F
d
We must evaluate -(l-F (y»
de
£
d
-d(l-F
(y»
£
E
-11
E=
I
[t *<s]
6*(l-F(t*»
-1
- l\(s,F),
F.
o
-1
:
d (l-F (y»
0 = -(l-F(y» -2 [-d
E
E·
I
£=
0]
(5.4.5)
d
+ ( 1-F (y) ) -2 [-d-F
. (y)
(L
I
_=() 1-
L
The last derivative is
(5.4.6)
d
dE [F(y) + E (I [y>t*l - !"(y»
1
=
I [y>t*) - F(y) •
The indicator in 5.4.6 in the distribution fWlction wi.tlt
t*, evaluated at y.
(5.4.7)
Y'2.t *
y>t*
or
(C,.4.8)
ffidSS
1 at
44
Incorporating 5.4.6 and 5.4.5 into 5.4.4, we find,
s
= f
IC«t*,<5*);l\(S,F»)
(1-F(y»-2 I [y>t*ldF(y,1)
o
(5.4.9)
s
(l-F(y»
- f
-2
F(y)dF(y,l)
o
+ I
[t*~s I
<5*(l-F(t*»-l
- l\(s,F).
This can be simplified, if we add and subtract from the first two
terms the following expression:
min(s,t*)
J (1-F(y»-2 dF (y,1)
D
o
(5.4.10)
s
=
f
(l-F(y» -2 1 [y<t*ldF(y ,1) •
o
~len
the sum of the first two Lerms in 5.4.9 is
s
J
o
(5.4.11)
(1-F(y»
-2
(I [y~t*l + I [y>t*l - F(y) )dF(y,l) - 0
s
f
(1- F ( y) )
-2
(1- F ( y) ) dF (y , 1)
- D
o
l\(s,F)
Therefore, we can
- D.
rewritl~
the I.C. 5.4.9, dropping the asterisks,
as
min(t,s)
I C ( (t , <5); 1\ ( s , F» = -
f
(1- F (y) ) -
2
dF (y , 1 )
o
(5.4.12)
+ <5 I [t<sl (1-F(t»
-1
•
45
The empirical I.C. of Λ(s,F_n) is found by substituting F_n for F in 5.4.12:

(5.4.13)    IC((t,δ);Λ(s,F_n)) = -(1/n) Σ_{i: δ_i=1, t_i<=min(t,s)} (1 - F_n(t_i))^{-2} + δ I[t<=s] (1 - F_n(t))^{-1}.

In Appendix A1, it is shown that E_F{IC(x;Λ(s,F))} = 0 for differentiable F, and that the empirical I.C. sums to zero over the sample. Note that 5.4.12 and 5.4.13 will be undefined for any t such that F(t) = 1. We therefore agree to define the estimators only for s such that F(s) < 1.
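The sum-to-zero property is easy to check numerically. A minimal sketch of ours follows; as in the earlier listing, (1 - F_n(y))^{-1} is evaluated through the at-risk counts.

```python
import numpy as np

def empirical_ic_cumhaz(times, deltas, t, delta, s):
    """The empirical I.C. 5.4.13 of Lambda(s, F_n) at a point (t, delta),
    with (1 - F_n(y))^{-1} computed as n / n(y), n(y) the at-risk count."""
    times = np.asarray(times, dtype=float)
    deltas = np.asarray(deltas, dtype=int)
    n = len(times)
    ic = 0.0
    for t_i, d_i in zip(times, deltas):        # -(1/n) * sum over failures
        if d_i == 1 and t_i <= min(t, s):      #   with t_i <= min(t, s)
            ic -= (n / np.sum(times >= t_i)) ** 2 / n
    if delta == 1 and t <= s:                  # + delta I[t<=s](1-F_n(t))^{-1}
        ic += n / np.sum(times >= t)
    return ic

t_s, d_s, s = [1.0, 2.0, 3.0, 4.0], [1, 0, 1, 0], 3.5
ics = [empirical_ic_cumhaz(t_s, d_s, t_i, d_i, s) for t_i, d_i in zip(t_s, d_s)]
print(ics, sum(ics))    # [0.75, -0.25, 0.75, -1.25], summing to zero
```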
The influence curve for

(5.4.14)    G(s,F) = exp{-Λ(s,F)},

the estimator of G based on F, follows easily from 5.4.12:

(5.4.15)    IC(x;G(s,F)) = (d/dε) exp{-Λ(s,F_ε)} |_{ε=0}
          = -G(s,F) [(d/dε) Λ(s,F_ε) |_{ε=0}]
          = -G(s,F) IC(x;Λ(s,F)).

In particular, when F = F_n, we obtain the empirical I.C. of G^e_n(s):

(5.4.16)    IC((t,δ);G(s,F_n)) = -G^e_n(s) IC((t,δ);Λ(s,F_n)).
5.5. Discussion

Let us examine the empirical I.C. for G^e_n(s) (5.4.16) to see how a new point (t,δ) affects the estimator. We consider separately the cases δ=0 and δ=1.

Case 1: δ=0.

When (t,0) is added to the sample, the I.C. is

(5.5.1)    [G^e_n(s)/n] Σ_{i: δ_i=1, t_i<=min(s,t)} (1 - F_n(t_i))^{-2}.

For t less than the smallest sample failure time, the I.C. is zero, implying no change in the estimate. This is also apparent by reference to 5.2.2 and 5.2.3.
Suppose there are m failure times, ordered t_(1) < t_(2) < ... < t_(m). Then as t moves past t_(j), for t_(j) < s, the I.C. is increased by

(5.5.2)    [G^e_n(s)/n] (1 - F_n(t_(j)))^{-2}.

Each term 5.5.2 is larger than the previous ones, suggesting that larger censored observations have a stronger influence than smaller ones. This larger influence is clearly related to the diminishing number at risk, expressed by increasing F_n(t).

As soon as t > s, the I.C. does not add any more terms but instead remains constant. This constant influence indicates that G^e_n(s) is not affected badly by outliers greater than s.
Case 2: δ=1.

When a failure time (t,1) is added to the sample, the picture changes dramatically. Now there is a negative contribution to the I.C. from

(5.5.3)    -G^e_n(s) I[t<=s] (1 - F_n(t))^{-1}.

For t < t_(1), the negative contribution of 5.5.3 constitutes the entire I.C. As t moves past the sample failure times, 5.5.3 grows larger in absolute value, but its contribution is offset by 5.5.1. The tradeoff continues until t > s; at that point the contribution of 5.5.3, which had been increasing, becomes zero. Thereafter the observation has the same influence as a censored time greater than s.

The lesson in all this is fairly clear and expected. Failure times tend to decrease the estimated probability of survival. Late failure times and censored times are influential because the risk sets have diminished in size.
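These two cases can be traced numerically. The sketch below is an illustration of ours, reusing empirical_ic_cumhaz from the previous listing; it moves an added point (t,δ) across the time axis and prints the empirical I.C. of G^e_n(s) from 5.4.16.

```python
import numpy as np

# Trace IC((t, delta); G(s, F_n)) = -G^e_n(s) * IC((t, delta); Lambda(s, F_n))
# as the added point moves across [0, 4], for delta = 0 and delta = 1.
t_s, d_s, s = [1.0, 2.0, 3.0, 4.0], [1, 0, 1, 0], 3.5
g_s = np.exp(-0.75)                    # G^e_n(3.5) from the earlier example
for delta in (0, 1):
    for t in np.arange(0.0, 4.5, 0.5):
        ic = -g_s * empirical_ic_cumhaz(t_s, d_s, t, delta, s)
        print(f"delta={delta}  t={t:3.1f}  IC={ic:+.3f}")
# delta=0: a nonnegative step function of t, constant once t > s (Case 1);
# delta=1: an added negative jump that disappears once t > s (Case 2).
```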
5.6. The Squared I.C.

Suppose 0 < s < u. Define the functional

(5.6.1)    Λ(s,u,F) = [Λ(s,F), Λ(u,F)]^T    (2x1).

Then

(5.6.2)    Λ(s,u,F_n) = Λ^e_n(s,u), say.

For arbitrary F, the influence curve of 5.6.1 is

(5.6.3)    IC(x;Λ(s,u,F)) = [IC(x;Λ(s,F)), IC(x;Λ(u,F))]^T    (2x1).

Now assume that the observations x_i = (t_i,δ_i), i = 1,...,n, are distributed according to F(t,δ), not necessarily a random censorship distribution. According to the theory of von Mises derivatives (2.3.14),

(5.6.4)    sqrt(n)(Λ^e_n(s,u) - Λ(s,u,F)) -->_L N(0, A(F)),

where

(5.6.5)    A(F) = E_F{IC(x;Λ(s,u,F)) IC(x;Λ(s,u,F))^T}    (2x2).
Explicitly, the elements of A(F) are

(5.6.6)    A_11 = E_F{IC^2(x;Λ(s,F))},  A_22 = E_F{IC^2(x;Λ(u,F))},
           A_12 = A_21 = E_F{IC(x;Λ(s,F)) IC(x;Λ(u,F))}.

Breslow and Crowley (1974) provide a rigorous proof of the limiting normality of sqrt(n)(Λ^e_n(s,u) - Λ(s,u,F)). Their proof (which does not depend on the von Mises theory) holds if (a) F(t) < 1 for t < ∞ and (b) F(t) and F(t,1) are continuous. The covariance matrix of the limiting normal process is (w_ij) (2x2), where for s <= u, w_11 = w_12 = w_21 = W(s), defined by

(5.6.7)    W(s) = ∫_0^s (1 - F(t))^{-2} dF(t,1),

and w_22 = W(u).
If a random censoring model holds, condition (b) above is replaced by (b)': G and H are continuous. In this case, the covariance function W(s) becomes

    W(s) = -∫_0^s (1 - F(T))^{-1} G(T)^{-1} dG(T).

For the result of Breslow and Crowley to agree with the theory of von Mises derivatives, we must show A(F) = W, or, for s <= u,

(5.6.8)    E_F{IC^2(x;Λ(s,F))} = W(s),
(5.6.9)    E_F{IC(x;Λ(s,F)) IC(x;Λ(u,F))} = W(s).

We now prove (5.6.8), assuming that F is differentiable. The proof of (5.6.9) is similar and is therefore omitted.
We start by expanding the I.C. (5.4.12):

    IC((t,δ);Λ(s,F)) = -∫_0^{min(t,s)} (1 - F(y))^{-2} dF(y,1) + δ I[t<=s] (1 - F(t))^{-1}

(5.6.10)    = -I[t<=s] ∫_0^t (1 - F(y))^{-2} dF(y,1) - I[t>s] ∫_0^s (1 - F(y))^{-2} dF(y,1)
              + δ I[t<=s] (1 - F(t))^{-1}

(5.6.11)    = -I[t<=s] B(t) - I[t>s] B(s) + δ I[t<=s] (1 - F(t))^{-1},

where

(5.6.12)    B(t) = ∫_0^t (1 - F(y))^{-2} dF(y,1).
Squaring the I.C., we have

(5.6.13)    IC^2((t,δ);Λ(s,F)) = I[t<=s] B^2(t) + I[t>s] B^2(s) + δ I[t<=s] (1 - F(t))^{-2}
              - 2δ I[t<=s] B(t)(1 - F(t))^{-1} + 2 I[t<=s] I[t>s] B(t)B(s)
              - 2δ I[t<=s] I[t>s] B(s)(1 - F(t))^{-1}.

Because I[t<=s] I[t>s] = 0, the last two cross products are zero. The squared I.C. therefore reduces to:

(5.6.14)    IC^2((t,δ);Λ(s,F)) = I[t<=s] B^2(t) + I[t<=s] δ (1 - F(t))^{-2} + I[t>s] B^2(s)
              - 2δ I[t<=s] B(t)(1 - F(t))^{-1}.
Taking the expectation of these terms, we find

(5.6.15)    ∫ IC^2((t,δ);Λ(s,F)) dF(t,δ) = ∫_0^s B^2(t) dF(t) + ∫_0^s (1 - F(t))^{-2} dF(t,1)
              + B^2(s)(1 - F(s)) - 2 ∫_0^s B(t)(1 - F(t))^{-1} dF(t,1).

The second integral on the r.h.s. of 5.6.15 can be recognized as W(s). Let the sum of the remaining terms be denoted Q(s). Then 5.6.8 will hold if Q(s) = 0 for all s > 0.
(5.6.16)    Q(s) = ∫_0^s B^2(t) dF(t) + B^2(s)(1 - F(s)) - 2 ∫_0^s B(t)(1 - F(t))^{-1} dF(t,1).

Now differentiate both sides of 5.6.16 with respect to s. Because B(0) = 0, we may neglect terms multiplied by B(0). Then

(5.6.17)    Q'(s) = B^2(s)(dF(s)/ds) + 2B(s)B'(s) - B^2(s)(dF(s)/ds)
              - 2B(s)B'(s)F(s) - 2B(s)(1 - F(s))^{-1} (dF(s,1)/ds)

(5.6.18)          = 2B(s){B'(s)(1 - F(s)) - (1 - F(s))^{-1} (dF(s,1)/ds)}.

Because B'(s) = (1 - F(s))^{-2} (dF(s,1)/ds), the term in brackets is zero, and Q'(s) = 0 for all s > 0.
This implies Q(s) is constant. With the immediate side condition Q(0) = 0, we must have Q(s) = 0, proving 5.6.8.

The asymptotic normality of sqrt(n)(G^e_n(s) - G(s)) follows from application of the δ-method to G^e_n(s) = exp{-Λ^e_n(s)}. The limiting variance of sqrt(n) G^e_n(s) is given by

(5.6.19)    V(s) = G^2(s) W(s).

As shown in 4.4, this is identical to E_F{IC^2(x;G(s,F))}.
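As a check on this identity, the mean squared empirical I.C. and the plug-in estimate G^2(s)W(s) can be compared on simulated data. A sketch of ours, reusing empirical_hazard and empirical_ic_cumhaz from the earlier listings (the sampling model is our own choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
s_fail = rng.exponential(1.0, n)          # failure times, G(t) = exp(-t)
c_cens = rng.uniform(0.0, 4.0, n)         # censoring times
t_obs = np.minimum(s_fail, c_cens)
d_obs = (s_fail <= c_cens).astype(int)

s = 1.0
g = np.exp(-empirical_hazard(t_obs, d_obs, s))     # G^e_n(s)
w = sum((n / np.sum(t_obs >= t_i)) ** 2 / n        # W(s) with F_n plugged in
        for t_i, d_i in zip(t_obs, d_obs) if d_i == 1 and t_i <= s)
ics = np.array([-g * empirical_ic_cumhaz(t_obs, d_obs, t_i, d_i, s)
                for t_i, d_i in zip(t_obs, d_obs)])
print(g * g * w, np.mean(ics ** 2))   # the two variance estimates nearly agree
```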
6. AN EXPONENTIAL REGRESSION MODEL

6.1. Introduction

In this chapter we encounter the first of the two regression models we will study. This model is exponential, with hazard function depending on the covariates but not on time:

(6.1.1)    λ_β(t|z) = exp(β^T z).

Here β is a p×1 vector of regression coefficients, including intercepts. (Recall upper case "T" denotes a transpose.)
Model (6.1.1) was proposed by Feigl and Zelen (1965), who gave the
likelihood equations for uncensored data with one intercept and one continuous covariate.
Glasser (1967) wrote down the likelihood equations for
censored data with several intercepts.
This work was in turn extended by
Breslow (1972a,1974) to multiple covariates.
Prentice (1973) treated the
problem from the viewpoint of structural inference.
The likelihood equations below are slightly different from those
appearing in earlier work.
In an analysis of covariance with parallel
lines, the estimated treatment effects (intercepts) can be written in terms
of the other estimated parameters.
The likelihood equations and the
observed information matrix are thereby reduced in dimension and changed
in form.
In this chapter, we retain full generality and do not separate
treatment intercepts from other coefficients.
6.2. Likelihood Equations

The density and survival functions corresponding to the hazard 6.1.1 are

(6.2.1)    g_β(t|z) = exp(β^T z) exp(-exp(β^T z)t)

and

(6.2.2)    G_β(t|z) = exp(-exp(β^T z)t),

respectively.
From 3.2.5 the contribution of a point x_i = (t_i,δ_i,z_i) to the log likelihood is

(6.2.3)    ℓ(β;x_i) = log[g_β(t_i|z_i)^{δ_i} G_β(t_i|z_i)^{1-δ_i}] = β^T z_i δ_i - exp(β^T z_i)t_i.

The efficient score for the observation is then (utilizing vector notation)

(6.2.4)    U(β;x_i) = ∂ℓ(β;x_i)/∂β = z_i(δ_i - exp(β^T z_i)t_i).
Based on the sample x_1,x_2,...,x_n, the M.L.E. β_n is the solution to the p likelihood equations:

(6.2.5)    U(β_n) = Σ_{i=1}^n z_i(δ_i - exp(β_n^T z_i)t_i) = 0    (p×1).

The equations are usually solved by a Newton-Raphson procedure. The observed second derivative of ℓ(β;x_i) is

(6.2.6)    C(β;x_i) = -z_i z_i^T exp(β^T z_i)t_i.
The observed average information matrix is therefore

(6.2.7)    Î_avg = -(1/n) Σ_{i=1}^n C(β;x_i) = (1/n) Σ_{i=1}^n z_i z_i^T exp(β^T z_i)t_i    (p×p).

The actual average information is

(6.2.8)    I_avg(β) = E{U(β;x_i) U(β;x_i)^T} = E{z_i z_i^T exp(β^T z_i)t_i},

if the z's are regarded as random variables, or

(6.2.9)    I_avg(β) = lim_{n→∞} {(1/n) Σ_{i=1}^n z_i z_i^T exp(β^T z_i) E(t_i|z_i)}

in general.
According to standard likelihood theory for nonidentically distributed variables (Cox and Hinkley, 1975, Sec. 9.2),

(6.2.10)    sqrt(n)(β_n - β) -->_L N(0, I_avg^{-1}(β)),

assuming the model is true. In practice I_avg^{-1}(β) is estimated by Î_avg^{-1}(β_n).
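In code, the Newton-Raphson iteration β ← β + Î_avg^{-1}(β)(1/n)U(β) is only a few lines. The following is a minimal sketch of ours (the function name and the simulated check are not from the text):

```python
import numpy as np

def exp_reg_mle(z, t, d, n_iter=25):
    """Newton-Raphson for the exponential regression model 6.1.1.
    z: (n, p) covariate matrix (include a column of ones for an intercept),
    t: (n,) observation times, d: (n,) censoring indicators."""
    z, t, d = np.asarray(z, float), np.asarray(t, float), np.asarray(d, float)
    n, p = z.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = np.exp(z @ beta) * t               # the "expected failures" E_i
        score = z.T @ (d - mu) / n              # (1/n) U(beta), eq. 6.2.5
        info = (z * mu[:, None]).T @ z / n      # observed avg. information 6.2.7
        beta = beta + np.linalg.solve(info, score)
    return beta, info

# Simulated check with true beta = (0.5, -1.0) and uniform censoring:
rng = np.random.default_rng(1)
n = 500
z = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
beta_true = np.array([0.5, -1.0])
s = rng.exponential(1.0 / np.exp(z @ beta_true))   # hazard exp(beta'z)
c = rng.uniform(0.0, 4.0, n)
t, d = np.minimum(s, c), (s <= c).astype(int)
beta_hat, info = exp_reg_mle(z, t, d)
print(beta_hat)    # close to (0.5, -1.0)
```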
6.3. The Estimator as a von Mises Functional, Fisher Consistency
We now show that β_n is implicitly defined as a functional of F_n. The proof is similar to that in the multiple regression example in 2.1; we simply divide the likelihood equations 6.2.5 by n:

(6.3.1)    (1/n) U(β_n) = (1/n) Σ_{i=1}^n z_i(δ_i - exp(β_n^T z_i)t_i) = 0,

or

(6.3.2)    ∫ z(δ - exp(β_n^T z)t) dF_n = 0.

For the formal approach, we define

(6.3.3)    U(β,F) = ∫ z(δ - exp(β^T z)t) dF    (p×1),

where β is p×1 and F is an arbitrary distribution function. Then the Glasser estimator based on F is defined to be that value of β such that

(6.3.4)    U(β,F) = 0,

assuming a unique solution exists. For the solution to 6.3.4, write

(6.3.5)    β = B(F),

where B(·) is a functional. Then B(F) is defined by

(6.3.6)    U(B(F),F) = ∫ z(δ - exp(B(F)^T z)t) dF = 0.

The finite sample M.L.E. is just β_n = B(F_n), and the likelihood equations 6.3.2 are

(6.3.7)    U(β_n,F_n) = 0.
Suppose now that the random censorship model with covariates (see 3.2) governs the distribution of the data x = (t,δ,z). Failure time is assumed to follow the exponential regression model 6.1.1, and we label the resulting distribution F_β.

By the definition 6.3.6 of the estimator B(F), Fisher consistency will hold if

(6.3.8)    U(β,F_β) = 0.

But 6.3.8 is just the condition

(6.3.9)    E_{F_β}{U(β;x)} = 0,

which follows from a standard result for regular likelihood problems (Cox and Hinkley, 1975, 4.8).

We will also find it helpful to define the average information matrix 6.2.8 as a functional of F:

(6.3.10)    I_avg(β,F) = ∫ z z^T exp(β^T z)t dF.

Then the reader can easily check that the observed sample information matrix Î_avg(β_n) (6.2.7) is, in functional form, written I_avg(B(F_n),F_n).
6.4. The Influence Curve

We contaminate an arbitrary distribution F with a point x* = (t*,δ*,z*) and write

    F_ε = (1-ε)F + ε I_{x*}.

Let β(F_ε) be the regression estimator based on F_ε. By 6.3.6, β(F_ε) is defined by the relation

(6.4.1)    U(β(F_ε),F_ε) = 0.

For notational convenience, we write

(6.4.2)    β(ε) = β(F_ε).

Then β(0) = β(F). Recall that the influence curve of the estimate is given by

    IC(x*;β,F) = (d/dε) β(ε) |_{ε=0},

where the differentiation is coordinate-by-coordinate (2.3.5). We will find the I.C. by implicit differentiation of the estimating equation 6.4.1.
Expanding F_ε, we find

(6.4.3)    0 = U(β(ε),F_ε) = (1-ε) ∫ z{δ - exp(z^T β(ε))t} dF + ε z*{δ* - exp(z*^T β(ε))t*}.

Or

(6.4.5)    0 = U(β(ε),F_ε) = U(β(ε),F) + ε[z*{δ* - exp(z*^T β(ε))t*} - U(β(ε),F)].

We differentiate 6.4.5 with respect to ε at ε=0. The first term on the r.h.s. can be evaluated by vector calculus if we differentiate under the integral sign:

(6.4.6)    (∂/∂ε) U(β(ε),F) |_{ε=0} = -∫ z [(∂/∂ε) exp(z^T β(ε))t |_{ε=0}] dF
         = -∫ z z^T exp(z^T β(0))t dF [(d/dε) β(ε) |_{ε=0}]
         = -I_avg(β(0),F) IC(x*;β,F)    (p×1).

The third term on the r.h.s. of (6.4.5) is U(β(0),F) = 0, by definition of β(0) and 6.4.1. Thus

(6.4.7)    0 = -I_avg(β(F),F) IC(x*;β,F) + z*{δ* - exp(z*^T β(F))t*}.

Now we drop the asterisks from x* = (t*,δ*,z*) and solve (6.4.7) for the I.C.:

(6.4.8)    IC((t,δ,z);β,F) = I_avg^{-1}(β(F),F) {z(δ - exp(z^T β(F))t)}    (p×1),

where a generalized inverse may be used.
We recognize the I.C. as

(6.4.9)    IC(x;β,F) = I_avg^{-1}(β,F) U(β;x),

where U(β;x) is the efficient score 6.2.4 based on x = (t,δ,z).

The empirical I.C. follows by substituting F_n for F:

(6.4.10)    IC((t,δ,z);β,F_n) = I_avg^{-1}(β_n,F_n) {z(δ - exp(z^T β_n)t)}.
We can easily show that the basic property 2.3.9 holds for the I.C. That is,

(2.3.9)    ∫ IC(x;β,F) dF = 0,

for arbitrary F. For, integrating 6.4.8 over F, we find

(6.4.11)    ∫ IC(x;β,F) dF = I_avg^{-1}(β(F),F) ∫ z{δ - exp(z^T β(F))t} dF
          = I_avg^{-1}(β(F),F) U(β(F),F) = 0.

The proof applies to F = F_n, so that also

(6.4.12)    Σ_{i=1}^n IC(x_i;β,F_n) = 0.
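Both 6.4.10 and 6.4.12 are easy to verify on the simulated sample of the earlier listing. The following check is our own; it reuses z, t, d, beta_hat, info, and exp_reg_mle from that sketch, and also compares the I.C. with the actual effect of adding a point, in the sense of 2.3.8.

```python
import numpy as np

# Empirical I.C. 6.4.10 at every sample point:
resid = d - np.exp(z @ beta_hat) * t                  # the O - E residuals of 6.6
ic = np.linalg.solve(info, (z * resid[:, None]).T).T  # (n, p); rows are I.C.'s
print(ic.sum(axis=0))                                 # ~ (0, 0), property 6.4.12

# The I.C. versus the actual change from adding one new point:
t_new, d_new, z_new = 3.0, 0, np.array([1.0, 0.9])    # a large censored time
beta_plus, _ = exp_reg_mle(np.vstack([z, z_new]),
                           np.append(t, t_new), np.append(d, d_new))
ic_new = np.linalg.solve(info, z_new * (d_new - np.exp(z_new @ beta_hat) * t_new))
print((len(t) + 1) * (beta_plus - beta_hat), ic_new)  # roughly equal
```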
6.5. The Squared I.C. Under Random Censorship
Suppose that the model 6.1.1 is true and that random censorship holds. The asymptotic covariance matrix of β_n given by likelihood theory (6.2.10) is I_avg^{-1}(β,F_β). From the theory of von Mises derivatives (2.3.14), we therefore expect:

(6.5.1)    E_{F_β}{IC(x;β,F_β) IC(x;β,F_β)^T} = I_avg^{-1}(β,F_β).

But by 6.4.9 the l.h.s. is equal to

(6.5.2)    I_avg^{-1}(β,F_β) E_{F_β}{U(β;x) U(β;x)^T} I_avg^{-1}(β,F_β)
         = I_avg^{-1}(β,F_β) I_avg(β,F_β) I_avg^{-1}(β,F_β)
         = I_avg^{-1}(β,F_β).
6.6. Discussion

Recall from 2.3.8 that for x = (t,δ,z),

(6.6.1)    (n+1)(β_{n+1}^{+x} - β_n) ≈ IC((t,δ,z);β,F_n) = I_avg^{-1}(β_n,F_n) {z(δ - exp(z^T β_n)t)},

where β_{n+1}^{+x} is the estimate based on x_1, x_2, ..., x_n and x. From this representation, we can draw some conclusions about the way β_n is affected by individual data points:

1. The I.C. is continuous as a function of t, for t < C_max, the maximum possible observation time. The estimator therefore depends (for fixed z) strongly on the value of t.
2. The term

    R = δ - exp(z^T β)t

multiplies all entries in the I.C. For any random censorship model F, it can be shown that

    E_F(δ) = E_F{Λ(t|z)},

where Λ(t|z) is the cumulative hazard 1.1.5 of the failure time. In the exponential model, Λ(t|z) = exp(z^T β)t; therefore

    E_F(δ) = E_F{exp(z^T β)t}.

Now, δ can be considered the observed number of failures (O) in the observation (t,δ,z), and exp(z^T β)t may be regarded as the expected number of failures (E) in the observation. Therefore

    R = δ - exp(z^T β)t = Observed - Expected = O - E

is a kind of residual associated with the observation. (This is the viewpoint of Peto and Peto, 1972, p. 193.)

A negative residual large in absolute value can arise when exp(z^T β) >> 1 (implying early failure) and instead a large t is observed. A large positive residual can occur when a small predicted hazard exp(z^T β_n) is coupled with an early observed failure. Though the residuals sum to zero over the sample (6.4.12), their distribution will be highly skewed.
3. The residual O-E is multiplied by z. Therefore the influence of a contaminating observation depends on the relative size of z, apart from the size of the residual. This phenomenon was encountered in the multiple regression example at 2.3.19.
4. Note the effect of I_avg^{-1}(β,F_n) in 6.6.1. An extreme value in one of the coordinates of z (say the k-th) affects not only β_k but the other coefficients as well. The strength of the influence of z_k on β_j depends on the size of the (k,j)-th element of I_avg^{-1}(β_n,F_n). This element is an estimate of the asymptotic covariance of sqrt(n)β_j and sqrt(n)β_k.
5. With the interpretation of the residual in remark 2, the I.C. 6.4.10 shows a remarkable correspondence to that for the least squares estimator in multiple linear regression (2.3.21). In both,

    IC(x;β,F) = A z R,

where A is a p×p matrix proportional to the estimated covariance matrix of β; z is the p×1 vector of covariates for the contaminating data point; and R is a scalar residual, appropriately defined. The I.C.'s in both cases depend continuously on the response (y or t) and on z. We conclude that the exponential estimator will, like its least squares counterpart, be sensitive to outliers in the data.
7. COX'S ESTIMATOR

7.1. Introduction

In many applications prognostic factors may vary with time.

Example 7.1.1. If a patient undergoes a series of treatments, define a covariate z(t), indicating the treatment current at t. Or z(t) may indicate the whole sequence of treatments up to t.

Example 7.1.2. The covariate might be the current value of some biological variate: white blood count, number of cigarettes smoked per day, tumor size, cholesterol, and so on. Where such variables act through cumulative exposure, one might define z(t) as the "effective dose" at t. Such a dose might be measured as a weighted average of past exposure levels.
With such time dependent variables, D. R. Cox (1972) introduced the proportional hazards (P.H.) model:

(7.1.1)    λ_β(t|z) = λ_0(t) exp(β^T z(t)).

Here λ_0(t) is an arbitrary, unspecified underlying hazard function. Cox introduced likelihood methods for estimating β, and in this chapter we consider Cox's estimator. Estimation of the underlying distribution is of interest in its own right, and some estimates will be examined in Chapter 8.
In the next section, we further develop the model 7.1.1, including an extension of random censorship. Section 7.3 presents the likelihood equations for estimating β. In 7.4 we briefly state some conditions under which the likelihood equations have no finite solution. Section 7.5 contains the demonstration that the estimator β_n is a von Mises functional. In 7.6 Fisher consistency under random censorship is proved when the covariates do not depend on time. The influence curve of β and related theory are contained in Sections 7.7 and 7.8. The chapter closes with some numerical and Monte Carlo studies of the effects of outliers.
7.2. The P.H. Model

Let us define the covariate z as a function of time.

Definition 7.2.1. A time dependent covariate z (p×1) is defined by

    z = {z(t) : t >= 0},

where z_q(t), q = 1,...,p, is a real function of t. We assume z is an element of a (measurable) space Z.

With time dependent z, define the integrated hazard from 7.1.1:

(7.2.1)    Λ_β(t|z) = ∫_0^t λ_β(y|z) dy = ∫_0^t exp(β^T z(y)) λ_0(y) dy.

The survival function is

(7.2.2)    G_β(t|z) = exp{-Λ_β(t|z)}.

If z does not depend on time, this simplifies to:

(7.2.3)    G_β(t|z) = exp{-exp(β^T z) Λ(t)} = G(t)^{exp(β^T z)},

where

(7.2.4)    Λ(t) = ∫_0^t λ_0(y) dy  and  G(t) = exp{-Λ(t)}

are the underlying integrated hazard and survival functions.
Again, with time dependent z, the density in the P.H. model is

(7.2.5)    g_β(t|z) = λ_β(t|z) G_β(t|z).

With z a function of time, we can again define each observation as x_i = (t_i, δ_i, z_i), i = 1,...,n.
1-1
The fact that z is a function of time creates a host of potential
problems for the results in this chapter:
definition of F(t,o,z), measura-
bility, the existence of expectations, and so on.
compromise:
-z
(1)
We adopt the following
All proofs will be valid, with conditions stated, when
is not a function of time; (2) when z is a function of time, we regard
--
our methods as heuristic, the manipulations as formal only.
As examples of the "heuristic" method, we "define" the sample
empirical d.f. F (t,e,z) to be that distribution function which puts mass
n
I
- a t each observed sample point x.
n
1
= (t.,6.,z.).
1
1-1
We also will "define"
arbitrary distributions F(t,6,z) and write down expectations over these
distributions.
Random Censorship

Our formal manipulations with time dependent z will not be extended to the random censorship model. For example, our proof of Fisher consistency for β will be carried out only for time independent covariates; the expectation of the squared I.C. will be found under the same conditions.

Let us, however, briefly indicate how the random censorship model could be extended to the case of time dependent covariates. We define a Borel field B on the space Z of the functions z = {z(t)}. Let P(·) be a probability measure on B, so that (Z,B,P) is a probability space. With a suitable metric on Z, let K(z) be a distribution function corresponding to P. Then observations x = (t,δ,z) are formed in the usual way: choose z according to K (or P); contingent on z, choose failure and censoring times S and C; let t = min(S,C); and set δ = I[S <= C].

This model has promise, but we will not pursue it any further. To summarize the strategy for this chapter, formal manipulations of F(t,δ,z) for time dependent covariables will be carried out only in Section 7.5 (showing β is a von Mises functional) and in Section 7.7 (finding the influence curve). In other sections--involving random censorship--the proofs involving F(t,δ,z) will be confined to covariates which do not depend on time.
7.3. Cox's Likelihood

Suppose the sample consists of (t_i,δ_i,z_i), i = 1,...,n, where now z_i = {z_i(t) : 0 <= t <= t_i}. (We do not observe z_i(t) if t > t_i.)

Definition 7.3.1. The risk set R(t) at time t is the set of labels for individuals still at risk of failing (observed) at time t. That is:

(7.3.1)    R(t) = {i : t_i >= t}.

Suppose now there are m distinct failure times in the sample, at t_(1) < ... < t_(m). By convention the t_(j), j = 1,...,m, are failure times, a subset of the t_i, i = 1,...,n, the observation times. The label of the (j)-th failure is (j).

In what follows, we assume no ties among failure times. Modifications to Cox's likelihood in case of ties have been suggested by Cox (1972), Peto and Peto (1972b), Efron (1977), Kalbfleisch and Prentice (1973), and Breslow (1975); Breslow's approach will be reviewed in Chapter Eight, in conjunction with his estimate of the underlying distribution.

Assuming no ties, Cox argued as follows: with the P.H. model 7.1.1 and conditional on (a) the risk set R(t_(j)) and (b) the fact that a failure occurs at t_(j), the probability that the observed failure was (t_(j),z_(j)) is the ratio of hazards:

(7.3.2)    p(t_(j),β) = exp(β^T z_(j)(t_(j))) / Σ_{k∈R(t_(j))} exp(β^T z_k(t_(j))).
The reasoning is this. Let E_k be the event {individual k fails at t_(j); the others in R(t_(j)) survive}. The probability density of E_k is, by independence, 7.1.1 and 7.2.5,

(7.3.3)    f(E_k) = g_β(t_(j)|z_k) Π_{ℓ∈R(t_(j)), ℓ≠k} G_β(t_(j)|z_ℓ).

The events {E_k : k∈R(t_(j))} are mutually exclusive, so that the probability that a single failure occurs at t_(j), conditional on R(t_(j)), is Σ_{k∈R(t_(j))} f(E_k). Therefore, the probability that the failed individual at t_(j) has label (j), conditional on R(t_(j)), is

(7.3.4)    f(E_(j)) / Σ_{k∈R(t_(j))} f(E_k),

and this reduces immediately to p(t_(j),β). Notice that both numerator and denominator in (7.3.4) are probability densities conditional on R(t_(j)).
Multiplication of the p(t_(j),β) leads to Cox's likelihood for β:

(7.3.5)    L(β) = Π_{j=1}^m p(t_(j),β).

A notable fact about this likelihood is that it depends on the observed failure times in the sample only through their ranks.
The likelihood (7.3.5) was a subject of controversy when it first appeared. Cox (1972) originally called it a "conditional" likelihood, but this was incorrect: the conditioning events were not identical at each failure time t_(j). Kalbfleisch and Prentice derived (7.3.5) as the marginal likelihood of β, based on the ranks of the failure and censoring times, when there were no ties nor time dependent covariates. Breslow (1972, 1974, 1975) derived (7.3.5) as an unconditional likelihood for β, assuming λ_0(t) jumped at observed failures and was constant between. (We repeat Breslow's argument in connection with his estimate of λ_0(t) in Chapter 8.) Finally, Cox, in a fundamental paper "Partial Likelihood" (1975), showed that 7.3.5 can be treated as an ordinary likelihood, for purposes of inference.
Let us write the efficient score at t_(j):

(7.3.6)    u_j(β) = d log p(t_(j),β)/dβ = z_(j)(t_(j)) - μ(t_(j),β)    (p×1),

where

(7.3.7)    μ(t_(j),β) = Σ_{k∈R(t_(j))} z_k(t_(j)) exp(β^T z_k(t_(j))) / Σ_{k∈R(t_(j))} exp(β^T z_k(t_(j))).

Note that μ(t_(j),β) is an "exponentially weighted" average of the z_k(t_(j)) in the risk set at t_(j). The equation for estimating β is

(7.3.8)    Σ_{j=1}^m u_j(β) = 0.

Or, by writing β_n for the solution, we define β_n by the equation

(7.3.9)    Σ_{j=1}^m [z_(j)(t_(j)) - μ(t_(j),β_n)] = 0.

A solution to (7.3.9) may not exist, as discussed in the next section.
The observed p×p information matrix is

(7.3.10)    I(β) = Σ_{j=1}^m C(t_(j),β),

where

(7.3.11)    C(t_(j),β) = Σ_{k∈R(t_(j))} z_k(t_(j)) z_k(t_(j))^T exp(β^T z_k(t_(j))) / Σ_{k∈R(t_(j))} exp(β^T z_k(t_(j))) - μ(t_(j),β) μ(t_(j),β)^T.

That is, C(t_(j),β) is the variance-covariance matrix of the z's in the risk set at t_(j), under the scheme of exponentially weighted sampling.
In the 1975 paper, Cox shows that under a wide variety of censoring schemes, E(u_j(β)) = 0 and var(u_j(β)) = -E(∂u_j(β)/∂β^T) = E(C(t_(j),β)) = i_j, say. Write

(7.3.12)    I_avg(β) = lim_{n→∞} (1/n) Σ_j i_j.

The sum Σ i_j has m terms; the divisor n is needed for work with F_n. If m/n → Q, say, then I_avg(β) = Q (lim_{m→∞} (1/m) Σ_j i_j). Under regularity conditions, a C.L.T. for dependent R.V.'s applies to n^{-1/2} Σ_j u_j(β), and

(7.3.13)    sqrt(n)(β_n - β) -->_L N(0, I_avg^{-1}(β)).

I_avg(β) will be consistently estimated by

(7.3.14)    (1/n) I(β_n),

providing for large sample inference.
We can, therefore, base large sample inference on the fact that

(7.3.15)    I(β_n)^{1/2} (β_n - β) -->_L N(0, I_{p×p}).
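For reference, here is a minimal sketch of the score 7.3.9, the information 7.3.10, and the resulting Newton iteration, for time-independent covariates and no tied failure times (our own implementation, not Cox's):

```python
import numpy as np

def cox_score_info(beta, z, t, d):
    """Score 7.3.9 and observed information 7.3.10 for Cox's likelihood.
    z: (n, p) time-independent covariates; assumes no tied failure times."""
    p = z.shape[1]
    score, info = np.zeros(p), np.zeros((p, p))
    for j in np.where(d == 1)[0]:
        risk = t >= t[j]                         # risk set R(t_(j)), eq. 7.3.1
        w = np.exp(z[risk] @ beta)
        mu = (w @ z[risk]) / w.sum()             # weighted mean mu, eq. 7.3.7
        score += z[j] - mu
        zc = z[risk] - mu
        info += (zc * w[:, None]).T @ zc / w.sum()   # C(t_(j), beta), 7.3.11
    return score, info

def cox_mle(z, t, d, n_iter=25):
    """Newton-Raphson solution of the estimating equation 7.3.9."""
    beta = np.zeros(z.shape[1])
    for _ in range(n_iter):
        score, info = cox_score_info(beta, z, t, d)
        beta = beta + np.linalg.solve(info, score)
    return beta
```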
7.4. Solvability of the Likelihood Equations
In this section, we briefly outline some conditions under which the estimating equations 7.3.9 have no finite solution. For simplicity, assume the z's do not depend on time. Then the likelihood equation 7.3.9 is

(7.4.1)    Σ_{j=1}^m (z_(j) - μ(t_(j),β)) = 0.

There will be no finite solution if, for some coordinate of z, say the q-th, and for all β,

(7.4.2)    z_{q(j)} >= (<=) μ_q(t_(j),β),  j = 1,...,m,

with at least one strict inequality. In words, we require z_q for the observed failure at t_(j) to be greater than (less than) or equal to the exponentially weighted mean of the z_q's in the risk set at t_(j).
Let us see how this can arise in practice. Suppose z_q is a 0-1 indicator of group membership. Then 7.4.2 will hold if all of the 1's are failed or censored before the first 0 fails. For, suppose the first 0-failure takes place at t_(ℓ). Then z_{q(1)} = z_{q(2)} = ... = z_{q(ℓ-1)} = 1. But the risk sets {R(t_(j)) : j = 1,...,ℓ-1} all contain some 0's. As a result,

(7.4.3)    μ_q(t_(j),β) < 1,  j = 1,...,ℓ-1,  -∞ < β < ∞,

precisely because μ_q(t_(j),β) is an average of the z_q's. But if there are no 1's left in the risk set at t_(ℓ), then

(7.4.4)    z_{q(j)} - μ_q(t_(j),β) = 0,  ℓ <= j <= m.

Together, 7.4.3 and 7.4.4 imply 7.4.2, and Cox's likelihood has no finite solution in this case.
Note that if there are any 1's left in the risk set at t_(ℓ), then z_{q(j)} < μ_q(t_(j),β) for at least one j >= ℓ and for any β. The estimating equations 7.4.1 may then have a finite solution, illustrating a peculiar role of censoring.
With no censoring, condition 7.4.2 will hold in general if, for some q, 1 <= q <= p,

(7.4.5)    z_{q(1)} >= z_{q(2)} >= ... >= z_{q(n)},

with at least one strict inequality. (The direction of the inequalities 7.4.5 may of course be reversed.)
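The 0-1 example is easy to reproduce numerically: when every z=1 item leaves the risk set before the first z=0 failure, the score stays positive for every finite β, and the Newton iterates diverge. A quick check of ours, using cox_score_info from the sketch above (the data are hypothetical):

```python
import numpy as np

# All three z = 1 items fail before any z = 0 item: condition 7.4.2 holds.
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
d = np.array([1, 1, 1, 1, 1, 1])
z = np.array([[1.0], [1.0], [1.0], [0.0], [0.0], [0.0]])
for b in (0.0, 2.0, 4.0, 8.0):
    score, _ = cox_score_info(np.array([b]), z, t, d)
    print(b, score)   # positive for every beta, approaching 0 only as b -> inf
```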
7.5. The Estimator as a von Mises Functional

With time dependent z's, F_n is again defined to be the distribution with mass 1/n at (t_i,δ_i,z_i), i = 1,...,n. Note that z_i(t) is only observed if t_i >= t; otherwise we may set z_i(t) = 0 if t_i < t. Our strategy is to write the sums in 7.3.7 and 7.3.9 over the whole sample and divide by n. The statement "k ∈ R(t)" about risk set membership is converted to "I[t_k >= t] = 1".

We begin by writing the exponentially weighted mean μ(t,θ) as a functional of F_n. Let θ be a p×1 vector. Then

(7.5.1)    μ(t,θ) = [(1/n) Σ_{j=1}^n I[t_j >= t] z_j(t) exp(z_j(t)^T θ)] / [(1/n) Σ_{j=1}^n I[t_j >= t] exp(z_j(t)^T θ)].

The division of numerator and denominator by n has converted the sums to averages over the sample. We make the dependence on F_n explicit by redefining μ:

(7.5.2)    μ(t,θ,F_n) = A(t,θ,F_n) / B(t,θ,F_n),
where (letting y be a dummy for time):

(7.5.3)    A(t,θ,F) = ∫ I[y >= t] z(t) exp(z(t)^T θ) dF(y,z),

and

(7.5.4)    B(t,θ,F) = ∫ I[y >= t] exp(z(t)^T θ) dF(y,z).

F(y,z) is the marginal distribution of (y,z), and integration is over [0,∞) × Z, where Z is the domain of the covariates.
Note that 7.5.3 and 7.5.4 are formally equivalent to

(7.5.5)    A(t,θ,F) = E_z{P(Y >= t|z) z(t) exp(z(t)^T θ)},
(7.5.6)    B(t,θ,F) = E_z{P(Y >= t|z) exp(z(t)^T θ)}.

Here t is a fixed time point, while Y and z(t) are random quantities. We must assume the existence of the expectation on the r.h.s. of 7.5.5 to avoid measurability problems. Similar remarks apply to 7.5.6 and other expressions in this chapter.
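A small sketch of the conversion for time-independent z (our own code; the 1/n factors cancel in the ratio, so 7.5.2 agrees with the risk-set mean 7.3.7):

```python
import numpy as np

def mu_functional(t0, theta, z, t):
    """mu(t0, theta, F_n) = A(t0, theta, F_n) / B(t0, theta, F_n) (7.5.2),
    with A and B the sample averages 7.5.3 and 7.5.4, z time-independent."""
    at_risk = (t >= t0).astype(float)           # I[t_j >= t0]
    w = at_risk * np.exp(z @ theta)
    a = (w[:, None] * z).mean(axis=0)           # A(t0, theta, F_n)
    b = w.mean()                                # B(t0, theta, F_n)
    return a / b
```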
There is a connection here with the work of Efron (1977) on the efficiency of Cox's likelihood. Efron makes his calculations on the basis of the fixed sample z_1,z_2,...,z_n. The randomness in the risk sets R(t) is, in his work, due only to censorship and failure of the sample items. For example, let P_i(t) be the probability that sample item i is in R(t). Then Efron (cf. his equation 3.18) works with quantities like the following:

(7.5.7)    E{Σ_{i∈R(t)} z_i(t) exp(z_i(t)^T θ)} = Σ_{i=1}^n P_i(t) z_i(t) exp(z_i(t)^T θ).

Now, the probability P_i(t) is P(Y_i >= t|z_i). Therefore each term P_i(t) z_i(t) exp(z_i(t)^T θ) summed in 7.5.7 can be identified with the quantity whose expectation is taken in 7.5.5.

The approach of this chapter explicitly recognizes the randomness of the z's, while the approach of Efron does not. However, asymptotic arguments with Efron's work require some sort of long run distribution for the z's. Indeed, in the two sample problem presented by Efron as an example (z_i = 0 or z_i = 1), the two samples are present in fixed proportions p and 1-p.
Using 7.5.2 we can write the estimating equation 7.3.9 for β_n over the whole sample and divide by n:

(7.5.8)    (1/n) U(β_n) = (1/n) Σ_{i=1}^n δ_i (z_i(t_i) - μ(t_i,β_n,F_n)) = 0.

For an arbitrary distribution F, let us define a functional β(F) implicitly by:

(7.5.9)    U(β(F),F) = ∫ δ(z(t) - μ(t,β(F),F)) dF(t,δ,z) = 0.

Then, by 7.5.8, Cox's estimator is a von Mises functional, with β_n = β(F_n).

Note that existence and uniqueness of a solution to 7.5.9 are not guaranteed. Nevertheless, we will continue to speak of "the" Cox estimator defined by 7.5.9.
7.6. Fisher Consistency

Suppose now that the covariates z do not depend on time. Let the observations x = (t,δ,z) be from a random censorship model (Definition 3.2.1), with failure time S from a P.H. distribution 7.1.1: λ_β(t|z) = λ_0(t) exp(β^T z). Label the resulting random censorship distribution F_β(t,δ,z).

In this section we show that Cox's estimator is Fisher consistent under the random censoring. The equation defining the estimator for arbitrary F is 7.5.9:

(7.6.1)    U(β(F),F) = ∫ δ{z - μ(t,β(F),F)} dF(t,δ,z) = 0.

Fisher consistency will hold if U(β,F_β) = 0, that is, if:

(7.6.2)    ∫ δ{z - μ(t,β,F_β)} dF_β(t,δ,z) = 0.
Let us rewrite 7.6.2 as

(7.6.3)    ∫_0^∞ ∫_Z z dF_β(t,1,z) = ∫_0^∞ ∫_Z μ(t,β,F_β) dF_β(t,1,z),

making explicit the region of integration as Z × [0,∞] and δ=1. We will evaluate both sides of 7.6.3 and show them to be equal.

We assume enough regularity in the distribution F_β(t,1,z) so that a density function (with respect to Lebesgue and counting measures) exists (3.2.2):

(7.6.4)    dF_β(t,1,z) = g_β(t|z) H(t|z) k(z) dz dt.

We also assume that z f_β(t,1,z) and μ(t,β,F_β) f_β(t,1,z) are integrable on Z × [0,∞), so that Fubini's theorem may be invoked.
We begin by writing

    μ(t,β,F_β) = A(t,β,F_β) / B(t,β,F_β)

and evaluating A(t,β,F_β) and B(t,β,F_β). By 7.5.3,

(7.6.5)    A(t,β,F_β) = ∫_Z ∫_0^∞ I[y >= t] z exp(β^T z) dF_β(y,z).

By definition, we have

(7.6.6)    dF_β(y,z) = f_β(y|z) k(z) dy dz,

where f_β(y|z) is the density of the observation time Y given z. When 7.6.6 is substituted into 7.6.5, let us assume that the resulting function is integrable. We apply Fubini's theorem, finding

(7.6.7)    A(t,β,F_β) = ∫_Z z exp(z^T β) k(z) [∫_t^∞ f_β(y|z) dy] dz
         = ∫_Z z exp(z^T β) k(z) P_β(Y >= t|z) dz
         = ∫_Z z exp(z^T β) k(z) H(t|z) G_β(t|z) dz.

The last equation holds because

(7.6.8)    P_β(Y >= t|z) = H(t|z) G_β(t|z),

by the conditional independence of S and C, given z, in the random censorship model. In a similar fashion, we can show

(7.6.9)    B(t,β,F_β) = ∫_Z exp(z^T β) k(z) H(t|z) G_β(t|z) dz.
We now write the r.h.s. of 7.6.3 as

(7.6.10)    R.H.S. = ∫_0^∞ ∫_Z [A(t,β,F_β) / B(t,β,F_β)] dF_β(t,1,z),

using Fubini's theorem. By 7.6.4, this is

(7.6.11)    R.H.S. = ∫_0^∞ [A(t,β,F_β)/B(t,β,F_β)] [∫_Z g_β(t|z) H(t|z) k(z) dz] dt

(7.6.12)          = ∫_0^∞ [A(t,β,F_β)/B(t,β,F_β)] λ_0(t) [∫_Z exp(β^T z) G_β(t|z) H(t|z) k(z) dz] dt

(7.6.13)          = ∫_0^∞ [A(t,β,F_β)/B(t,β,F_β)] B(t,β,F_β) λ_0(t) dt

(7.6.14)          = ∫_0^∞ A(t,β,F_β) λ_0(t) dt.

We further evaluate 7.6.14 using 7.6.7:

    R.H.S. = ∫_0^∞ ∫_Z z exp(z^T β) k(z) H(t|z) G_β(t|z) λ_0(t) dz dt
           = ∫_Z ∫_0^∞ z g_β(t|z) H(t|z) k(z) dt dz
           = ∫_0^∞ ∫_Z z dF_β(t,1,z),  by 7.6.4,
           = L.H.S. of 7.6.3.

This proves Fisher consistency. □
The reader is invited to try a "proof" of Fisher consistency with
Remark:
time dependent z's, along the lines of this section.
Surprisingly, if.the
formal manipulations are carried out with Fubini's theorem--such a "proof"
will go through.
7.7. The Influence Curve
In this section, we derive the I.C. for Cox's estimator, allowing the covariates to depend on time. The strategy for finding the I.C. is similar to that in Chapter 6. We contaminate an arbitrary F with x* = (t*,δ*,z*); the resulting distribution is F_ε = (1-ε)F + ε I_{x*}. The Cox estimator based on F_ε is defined by U(β(F_ε),F_ε) = 0 (7.5.9). This equation will be differentiated with respect to ε in order to find (d/dε) β(F_ε) |_{ε=0}, the influence curve.

Before we proceed, it will be helpful to restate the definitions of some functionals needed in the sequel and to define some new functionals. To avoid confusion with the contaminating point (t*,δ*,z*), let y and u be variables for time.
From 7.5.2, 7.5.3, and 7.5.4, we have the exponentially weighted mean at y:

    μ(y,θ,F) = A(y,θ,F) / B(y,θ,F)    (p×1),

where

    A(y,θ,F) = ∫ I[u >= y] z(y) exp(z(y)^T θ) dF(u,z)    (p×1)

and

    B(y,θ,F) = ∫ I[u >= y] exp(z(y)^T θ) dF(u,z).

Define the matrix D(y,θ,F) by

(7.7.1)    D(y,θ,F) = ∫ I[u >= y] z(y) z(y)^T exp(z(y)^T θ) dF(u,z)    (p×p).

Then let

(7.7.2)    C(y,θ,F) = D(y,θ,F)/B(y,θ,F) - μ(y,θ,F) μ(y,θ,F)^T.

The reader may verify that C(y,θ,F_n) is just the exponentially weighted covariance matrix of the z's at y, defined at 7.3.11. Finally let

(7.7.3)    I_avg(θ,F) = ∫ δ C(y,θ,F) dF(y,δ) = ∫ C(y,θ,F) dF(y,1)    (p×p).
Note that I_avg(θ,F) is analogous to the average information matrix I_avg(β) defined at 7.3.12. The relationship will be discussed in 7.8.

To ease the notational burden, we adopt the convention

(7.7.4)    β(ε) = β(F_ε).

Then β(0) = β(F). The defining equation for β(ε) is

(7.7.5)    U(β(ε),F_ε) = ∫ δ (z(y) - μ(y,β(ε),F_ε)) dF_ε = 0.

The I.C. will be (d/dε) β(ε) |_{ε=0}. We now expand 7.7.5.
(7.7.6)    0 = U(β(ε),F_ε) = (1-ε) ∫ δ (z(y) - μ(y,β(ε),F_ε)) dF + ε δ*(z*(t*) - μ(t*,β(ε),F_ε)).

We now differentiate 7.7.6 coordinate by coordinate with respect to ε and evaluate at ε=0:

(7.7.7)    0 = (d/dε) U(β(ε),F_ε) |_{ε=0}
         = (∂/∂ε) ∫ δ (z(y) - μ(y,β(ε),F_ε)) dF |_{ε=0} + δ*(z*(t*) - μ(t*,β(0),F)) - U(β(0),F),

because U(β(0),F) = 0, by 7.5.9.
Now

(7.7.8)    (∂/∂ε) ∫ (z(y) - μ(y,β(ε),F_ε)) dF(y,1) |_{ε=0}.

We assume sufficient regularity in 7.7.8 to allow differentiation under the integral sign. Sufficient conditions are given, for example, in Theorem 17.3e of Fulks (1969). Assume:

a. ∫ μ(y,β(ε),F_ε) dF(y,1) and ∫ (∂/∂ε) μ(y,β(ε),F_ε) dF(y,1) are continuous with respect to y and ε.

b. There is an ε' for which ∫ μ(y,β(ε'),F_{ε'}) dF(y,1) converges (true here for ε' = 0).

c. ∫ (∂/∂ε) μ(y,β(ε),F_ε) dF(y,1) converges uniformly for 0 <= ε <= 1.
With these assumptions,

(7.7.9)    (∂/∂ε) ∫ (z(y) - μ(y,β(ε),F_ε)) dF(y,1) |_{ε=0} = -∫ [(∂/∂ε) μ(y,β(ε),F_ε) |_{ε=0}] dF(y,1).

The inner derivative is:

(7.7.10)    (∂/∂ε) μ(y,β(ε),F_ε) |_{ε=0} = (∂/∂ε) [A(y,β(ε),F_ε) / B(y,β(ε),F_ε)] |_{ε=0}

(7.7.11)    = {B(y,β(0),F) [(∂/∂ε) A(y,β(ε),F_ε) |_{ε=0}] - A(y,β(0),F) [(∂/∂ε) B(y,β(ε),F_ε) |_{ε=0}]} / B^2(y,β(0),F).
We require (∂/∂ε) A(y,β(ε),F_ε) |_{ε=0} and (∂/∂ε) B(y,β(ε),F_ε) |_{ε=0}. We find the first of these expressions by writing out A(y,β(ε),F_ε):

(7.7.12)    A(y,β(ε),F_ε) = A(y,β(ε),F) + ε[I[t* >= y] z*(y) exp(z*(y)^T β(ε)) - A(y,β(ε),F)].

Here

(7.7.13)    I[t* >= y] = 1 if t* >= y, and 0 otherwise.

Then, differentiating 7.7.12,

(7.7.14)    (∂/∂ε) A(y,β(ε),F_ε) |_{ε=0} = (∂/∂ε) A(y,β(ε),F) |_{ε=0} + I[t* >= y] z*(y) exp(z*(y)^T β(0)) - A(y,β(0),F).
The first term of 7.7.14 is

(7.7.15)    (∂/∂ε) A(y,β(ε),F) |_{ε=0} = (∂/∂ε) ∫ I[u >= y] z(y) exp(z(y)^T β(ε)) dF(u,z) |_{ε=0}.

Again let us assume that we can differentiate with respect to ε under the integral sign in 7.7.15. (Recall that the regularity conditions in Theorem 17.3e of Fulks (1969) will apply only when z does not depend on time.) Then by the chain rule of multivariable calculus, we have

(7.7.16)    (∂/∂ε) A(y,β(ε),F) |_{ε=0} = ∫ I[u >= y] z(y) [(∂/∂ε) exp(z(y)^T β(ε)) |_{ε=0}] dF(u,z)
          = ∫ I[u >= y] z(y) z(y)^T exp(z(y)^T β(0)) dF(u,z) [(d/dε) β(ε) |_{ε=0}]
          = D(y,β(0),F) β'(0)    (p×p)(p×1),

where D(y,β(0),F) was defined at 7.7.1. Note that β'(0) is the influence curve we are seeking.
Therefore, collecting terms from 7.7.16 and 7.7.14, we have

(7.7.17)    (∂/∂ε) A(y,β(ε),F_ε) |_{ε=0} = D(y,β(0),F) β'(0) + I[t* >= y] z*(y) exp(z*(y)^T β(0)) - A(y,β(0),F).

In a similar fashion, we can find:

(7.7.18)    (∂/∂ε) B(y,β(ε),F_ε) |_{ε=0} = (∂/∂ε) B(y,β(ε),F) |_{ε=0} + I[t* >= y] exp(z*(y)^T β(0)) - B(y,β(0),F),

paralleling 7.7.12 - 7.7.14. The first term on the r.h.s. of 7.7.18 is

(7.7.19)    (∂/∂ε) B(y,β(ε),F) |_{ε=0} = (∂/∂ε) ∫ I[u >= y] exp(z(y)^T β(ε)) dF(u,z) |_{ε=0} = A(y,β(0),F)^T β'(0)    (1×p)(p×1),

assuming we can differentiate under the integral sign. Again β'(0) is the I.C. Therefore

(7.7.20)    (∂/∂ε) B(y,β(ε),F_ε) |_{ε=0} = A(y,β(0),F)^T β'(0) + I[t* >= y] exp(z*(y)^T β(0)) - B(y,β(0),F).

We now simplify notation by writing
(7.7.21)    A(y) = A(y,β(0),F)    (p×1)
            B(y) = B(y,β(0),F)
            μ(y) = A(y)/B(y)    (p×1)
            D(y) = D(y,β(0),F)    (p×p)
            C(y) = C(y,β(0),F)    (p×p).

We also write β(0) = β, suppressing the dependence on F.
We now substitute from 7.7.20 and 7.7.17 into 7.7.11:

(7.7.22)    (∂/∂ε) μ(y,β(ε),F_ε) |_{ε=0}
  = [B(y) D(y) β'(0) + B(y) I[t* >= y] z*(y) exp(z*(y)^T β) - B(y) A(y)
     - A(y) A(y)^T β'(0) - A(y) I[t* >= y] exp(z*(y)^T β) + A(y) B(y)] / B^2(y)

(7.7.23)    = [D(y)/B(y) - A(y)A(y)^T/B^2(y)] β'(0) + I[t* >= y] exp(z*(y)^T β) (z*(y) - μ(y)) / B(y)
            = C(y) β'(0) + I[t* >= y] exp(z*(y)^T β) (z*(y) - μ(y)) / B(y).
This result is substituted into 7.7.8:

(7.7.25)    (∂/∂ε) U(β(ε),F) |_{ε=0} = -[∫_0^∞ C(y) dF(y,1)] β'(0)
              - ∫_0^∞ I[t* >= y] exp(z*(y)^T β) (z*(y) - μ(y)) / B(y) dF(y,1)

(7.7.26)    = -I_avg(β,F) β'(0) - ∫_0^{t*} exp(z*(y)^T β) (z*(y) - μ(y)) / B(y) dF(y,1).

We substitute 7.7.26 in turn back into 7.7.7:

(7.7.27)    0 = -I_avg(β,F) β'(0) - ∫_0^{t*} exp(z*(y)^T β) (z*(y) - μ(y)) / B(y) dF(y,1)
              + δ*(z*(t*) - μ(t*)).
We can now solve for β'(0) = IC(x*;β,F). The result is

Theorem 7.7.1. Let Cox's estimator β(F) be defined by 7.5.9, and let I_avg(β(F),F)^{-1} be an inverse of I_avg(β(F),F). (A generalized inverse will do.) Then the influence curve for Cox's estimator at x* = (t*,δ*,z*) is

(7.7.28)    IC((t*,δ*,z*);β,F) = I_avg(β(F),F)^{-1} {-∫_0^{t*} exp(z*(y)^T β(F)) (z*(y) - μ(y)) / B(y) dF(y,1) + δ*(z*(t*) - μ(t*))}.

In Appendix A2 it is shown that E_F{IC(x*;β,F)} = 0. The empirical I.C. and robustness are discussed in 7.9.
7.8. The Squared I.C. Under Random Censorship

According to the theory of von Mises derivatives, we expect

(7.8.1)    sqrt(n)(β_n - β) -->_L N(0, V(F)),

where

(7.8.2)    V(F) = E_F{IC(x;β,F) IC(x;β,F)^T}.

Suppose now that the P.H. model holds in the random censorship setup of Section 7.6. (In that setup, recall that the covariates do not depend on time.) As in 7.6, call the resulting distribution F_β. The main result of this section is Theorem 7.8.1, which states:
(7.8.3)    E_{F_β}{IC(x;β,F_β) IC(x;β,F_β)^T} = I_avg(β,F_β)^{-1}.

This implies V(F_β) = I_avg(β,F_β)^{-1}.
How does this compare to the asymptotic covariance matrix from Cox's (1975) partial likelihood? That theory, as quoted in 7.3, gave a covariance matrix of I_avg(β)^{-1}, where I_avg(β) was defined at 7.3.12. Writing 7.3.12 in functional form, we see that

(7.8.4)    I_avg(β) = lim_{n→∞} E_{F_β}{(1/n) Σ_{i=1}^n δ_i C(t_i,β,F_n)}.

The identity of I_avg(β) and I_avg(β,F_β) should follow from the mild conditions of Cox's theory, but no proof will be given.

We begin the proof of the theorem with two closely related lemmas. These were implicitly proved in Section 7.6. Lemma 7.8.1 will be of use in Chapter 8.
Assume the P.H. model 7.1.1 for failure time, with a random censorship distribution as in Section 7.6, and the integrability conditions of Section 7.6. Then:

Lemma 7.8.1.    dF_β(t,1) = B(t,β,F_β) λ_0(t) dt.

Lemma 7.8.2.    dF_β(t,1|z) = exp(z^T β) λ_0(t) G_β(t|z) H(t|z) dt.

Proofs: By 7.6.4 and 7.2.5,

(7.8.5)    dF_β(t,1,z) = exp(z^T β) λ_0(t) G_β(t|z) H(t|z) k(z) dz dt.

Then

(7.8.6)    dF_β(t,1) = ∫_Z dF_β(t,1,z) = λ_0(t) [∫_Z exp(z^T β) G_β(t|z) H(t|z) k(z) dz] dt = B(t,β,F_β) λ_0(t) dt,

by 7.6.9. This proves Lemma 7.8.1. Also,

(7.8.7)    dF_β(t,1,z) = dF_β(t,1|z) k(z) dz,  so that  dF_β(t,1|z) = exp(z^T β) λ_0(t) G_β(t|z) H(t|z) dt,

by 7.6.8, proving the second lemma. □
Theorem 7.8.1. With the assumptions of the previous lemmas,

(7.8.8)    E_{F_β}{IC(x;β,F_β) IC(x;β,F_β)^T} = I_avg(β,F_β)^{-1}.

Proof: By Theorem 7.7.1, with time independent z,

(7.8.9)    IC((t,δ,z);β,F_β) = I_avg(β,F_β)^{-1} {-w(t,z) + δ(z - μ(t))},  say,

where μ(t) = μ(t,β,F_β) and

    w(t,z) = ∫_0^t exp(z^T β) (z - μ(y)) / B(y) dF_β(y,1)    (p×1).
Then,

(7.8.10)    IC IC^T = I_avg(β,F_β)^{-1} {M(t,z) - N(t,δ,z) + δ(z - μ(t))(z - μ(t))^T} I_avg(β,F_β)^{-1},

where

    M(t,z) = w(t,z) w(t,z)^T    (p×p)

and

(7.8.11)    N(t,δ,z) = δ{w(t,z)(z - μ(t))^T + (z - μ(t))w(t,z)^T}    (p×p).

Then

(7.8.12)    E_{F_β}{IC((t,δ,z);β,F_β) IC((t,δ,z);β,F_β)^T}
  = I_avg(β,F_β)^{-1} {E_{F_β}(M(t,z) - N(t,δ,z))} I_avg(β,F_β)^{-1}
    + I_avg(β,F_β)^{-1} {E_{F_β}(δ(z - μ(t))(z - μ(t))^T)} I_avg(β,F_β)^{-1}.
We complete the proof in two parts. In Part I, we show that E_{F_β}(M(t,z) - N(t,δ,z)) = 0. In the second part, we show that E_{F_β}{δ(z - μ(t))(z - μ(t))^T} = I_avg(β,F_β). The theorem follows immediately.

Part I. To show: E_{F_β}(M(t,z) - N(t,δ,z)) = 0.

The following step is crucial. Define the p×p matrix M_1(t,z) = ∂M(t,z)/∂t, where the differentiation is coordinate by coordinate. Then

(7.8.13)    M_1(t,z) = exp(z^T β) [dF_β(t,1)/dt] B(t)^{-1} {w(t,z)(z - μ(t))^T + (z - μ(t))w(t,z)^T}.

Comparing M_1(t,z) to N(t,δ,z) defined at 7.8.11, we see that

(7.8.14)    N(t,δ,z) = δ M_1(t,z) B(t) [exp(z^T β) dF_β(t,1)/dt]^{-1}.

By Lemma 7.8.1, this is

(7.8.15)    N(t,δ,z) = δ M_1(t,z) / [exp(z^T β) λ_0(t)].
Then

(7.8.16)    E_{F_β}(N(t,δ,z)) = ∫_Z ∫_0^∞ [M_1(t,z) / (exp(z^T β) λ_0(t))] dF_β(t,1,z)
          = ∫_Z [∫_0^∞ M_1(t,z) (1 - F_β(t|z)) dt] k(z) dz,

by Lemma 7.8.2, where F_β(t|z) is the d.f. of the observation time given z. We also evaluate E_{F_β}(M(t,z)):

(7.8.17)    E_{F_β}(M(t,z)) = ∫_Z [∫_0^∞ M(t,z) dF_β(t|z)] dK(z).
Therefore

(7.8.18)    E_{F_β}(M(t,z) - N(t,δ,z)) = ∫_Z [∫_0^∞ {M(t,z)(dF_β(t|z)/dt) - M_1(t,z)(1 - F_β(t|z))} dt] k(z) dz.

The elementary rule for differentiation of products reveals

(7.8.19)    (∂/∂t)[M(t,z)(1 - F_β(t|z))] = M_1(t,z)(1 - F_β(t|z)) - M(t,z)(dF_β(t|z)/dt).

Let us assume that for every z, the elements of the matrix M(∞,z) are finite. This will be true if, for instance, the covariates are bounded. Or, more realistically, the assumption will hold if there is an upper limit of observation, C_max, so that H(C_max|z) = 0. With the assumption, the inner integral in 7.8.18 can be evaluated by the fundamental theorem of calculus and 7.8.19:

    -∫_0^∞ (∂/∂t)[M(t,z)(1 - F_β(t|z))] dt = -(M(∞,z) × 0 - M(0,z) × 1) = 0,

since M(0,z) = 0. Therefore, the entire integral 7.8.18 is zero, and E_{F_β}(M(t,z) - N(t,δ,z)) = 0.
Part II. We must show that

(7.8.20)    E_{F_β}{δ(z - μ(t))(z - μ(t))^T} = I_avg(β,F_β).

It is easier to start from the r.h.s., defined at 7.7.3:

    I_avg(β,F_β) = ∫_0^∞ C(t,β,F_β) dF_β(t,1).

Recall from the remark at 7.7.2 that C(t,β,F_β) is the exponentially weighted covariance matrix of the z's in R(t); we can, therefore, write it as:

(7.8.21)    C(t,β,F_β) = ∫_Z [∫_t^∞ exp(z^T β) (z - μ(t))(z - μ(t))^T B(t)^{-1} dF_β(y|z)] k(z) dz.

And, assuming we can apply Fubini's theorem:

    I_avg(β,F_β) = ∫_0^∞ ∫_Z exp(z^T β) (z - μ(t))(z - μ(t))^T B(t)^{-1} [∫_t^∞ dF_β(y|z)] k(z) dz dF_β(t,1)

(7.8.22)    = ∫_0^∞ ∫_Z exp(z^T β) (z - μ(t))(z - μ(t))^T (1 - F_β(t|z)) B(t)^{-1} k(z) dz dF_β(t,1).
With Lemmas 7.8.1 and 7.8.2, we can write

(7.8.23)    exp(z^T β) (1 - F_β(t|z)) k(z) dz dF_β(t,1) = dF_β(t,1|z) k(z) dz B(t) = dF_β(t,1,z) B(t).

Substituting the last expression into 7.8.22, we find that B(t) cancels in numerator and denominator, and

    I_avg(β,F_β) = ∫_0^∞ ∫_Z (z - μ(t))(z - μ(t))^T dF_β(t,1,z) = E_{F_β}{δ(z - μ(t))(z - μ(t))^T},

as we were to show. The theorem now follows, since by 7.8.12,

    E_{F_β}{IC(x;β,F_β) IC(x;β,F_β)^T} = I_avg(β,F_β)^{-1} I_avg(β,F_β) I_avg(β,F_β)^{-1} = I_avg(β,F_β)^{-1}. □
7.9. The Empirical I.C. and Numerical Studies of Cox's Estimate

Suppose β_n is Cox's estimate for a sample of size n, with failures at y_1 < y_2 < ... < y_m. The empirical version of the I.C. 7.7.28, evaluated at a point x = (t,δ,z) (z time dependent), is

(7.9.1)    IC((t,δ,z);β_n,F_n) = I_avg(β_n,F_n)^{-1} {-(1/n) Σ_{k: y_k <= t} exp(z(y_k)^T β_n) (z(y_k) - μ(y_k,β_n,F_n)) / B(y_k,β_n,F_n) + δ(z(t) - μ(t,β_n,F_n))}.
Suppose the point x = (t,δ,z) is now actually added to the original sample. Let β_{n+1}^{+x} denote the estimate based on the augmented sample. Then by 2.3.8 we expect that

(7.9.2)    IC(x;β_n,F_n) = (n+1)(β_{n+1}^{+x} - β_n).

In this section, we shall discuss robustness of β_n by examination of the empirical I.C. As a check, computations of both sides of 7.9.2 have been carried out for actual samples with a scalar covariate. Two samples of size n=49 were drawn.
The observations for each sample were created in the following way. A scalar covariate z was generated from a uniform [-1,1] distribution. Conditional on z, a failure time S was drawn from an exponential distribution with hazard e^{βz}. For the first sample, β=0, and for the second, β=1. A censoring time C was drawn from a U(0,4) distribution. From z, S, and C, the observation x = (t,δ,z) was formed. A similar procedure was used in the Monte Carlo study described in the next section.
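The sampling scheme just described is easy to reproduce. The sketch below is our own code (with an arbitrary seed, so it will not recreate the Appendix A4 samples exactly); it draws one such sample and computes Cox's estimate with cox_mle from the earlier sketch.

```python
import numpy as np

def draw_sample(n, beta, rng):
    """One sample from the scheme of this section: z ~ U[-1, 1],
    S | z exponential with hazard exp(beta * z), C ~ U(0, 4)."""
    z = rng.uniform(-1.0, 1.0, n)
    s = rng.exponential(1.0 / np.exp(beta * z))
    c = rng.uniform(0.0, 4.0, n)
    return z[:, None], np.minimum(s, c), (s <= c).astype(int)

rng = np.random.default_rng(7)
z, t, d = draw_sample(49, 1.0, rng)
print(cox_mle(z, t, d))    # an estimate in the neighborhood of beta = 1
```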
The two resulting samples are listed in Appendix A4 for reference. The resulting estimates of β were: β_49 = 0.36 and β_49 = 1.36. The first sample thus provides a "small" estimated β and the second a "large" one, as desired.
We begin our examination of the empirical I.C. by observing that it is the sum of two parts, both multiplied by I_avg(β_n)^{-1}. The first part is

    Q_1(t,z) = -(1/n) Σ_{k: y_k <= t} exp(z(y_k)^T β_n) (z(y_k) - μ(y_k,β_n,F_n)) / B(y_k,β_n,F_n).

The second part is

    Q_2(t,δ,z) = δ(z(t) - μ(t,β_n,F_n)).

In the augmented sample, the added observation is present in those risk sets R(y_k) such that y_k <= t. This presence is represented by Q_1. On the other hand, Q_2 can contribute to the I.C. only if the added observation is a failure (δ=1). In this case a new risk set at t is formed. This may not be one of the original m risk sets. As a convention, we agree that if t is larger than all n sample times, then μ(t,β_n) = z(t). In this case there is therefore no contribution from Q_2. Indeed, in the augmented sample, with t the largest observation, we must have z(t) - μ(t,β_{n+1}^{+x}) = 0, since only z is in R(t). Reference to the original likelihood equations (7.3.9) shows that the contribution of x then does not depend on β.
We now consider in detail how an observation x = (t,δ,z) influences β_n. To do so, it seems useful to consider two cases separately. In the first, we assume z and δ are fixed, but t moves. In the second case, t and δ are held constant, but z is allowed to vary.
Case 1: Fix z and δ; vary t.

For fixed z = {z(y) : y>=0}, the term Q_1(t,z) is a step function in t, taking on m+1 possible values. For t < y_1, Q_1(t,z) = 0. A term is added to the sum for each failure y_k with y_k <= t. For t > y_m, Q_1(t,z) is constant. Q_1 is therefore bounded.

The term Q_2(t,δ,z) is also bounded. For time dependent z's, z(t) and μ(t,β_n) may vary continuously with t. Therefore Q_2 is not, in general, limited to a finite number of values, as Q_1 is. For z's which are not time dependent, Q_2 takes on at most n+1 values; there is a value for each original observation t_i, i = 1,...,n, and zero, as agreed, for t > t_n, the largest sample value.

From these observations we may conclude that the I.C. is discontinuous as a function of t and is bounded. The same conclusions should apply to the quantity 50(β_50^{+x} - β_49). Let us turn to the sample data to check these conclusions.
For the first sample, β_49 = 0.36. The largest observation was a failure at t = 3.69. In Figure 7.9.1 the IC(t,δ,z) is plotted for the four combinations resulting from z = {-1,+1} and δ = {0,1}. Thus the curve labeled "1" is the combination (z=-1, δ=0); the curve labeled "2" is (z=-1, δ=1); and so on. The horizontal axis is plotted from t=0 to t=4. The corresponding normalized differences 50(β_50^{+x} - β_49) are designated "DIFF" and are plotted in Figure 7.9.2. The I.C. and DIFF were both calculated at increments of 0.10 in t.

We can see that the I.C. plots are very similar in shape to the corresponding DIFF plots. The I.C. exaggerates the true influence at some values of t. Both the I.C. and DIFF curves are discontinuous in t, as predicted. The curves are generally monotone, except for a few instances.
The curves at z=+1 decrease with increasing t. This is to be expected. As t increases, we expect the estimated hazard to decrease. For z = +1, this hazard is e^β; therefore the estimate of β should decrease. Also at z = +1, an added failure (δ=1) should result in a larger estimated hazard than an added censored time (δ=0), no matter what the value of t is. This is precisely the behavior shown by the I.C. and DIFF.

For z = -1, the estimated hazard is e^{-β}. By symmetry we can account for the increase in β with increasing t. For either value of z (-1 or +1), the value of t does not affect the estimate for t > 3.69, the maximum sample observation. In this sample, the largest change in β_50 occurs in the case (z=1, δ=1). The re-estimated values of β_50^{+x} range from 0.42 at t=0 to 0.11 for t = 3.69.
Now let us turn to the second sample, which resulted in β_49 = 1.36. In this sample, the maximum observation was a censoring time at t = 3.99. The four empirical influence curves for z = {-1,+1} and δ = {0,1} are plotted in Figure 7.9.3. The DIFF curves for the normalized change in β are plotted in Figure 7.9.4.

Again the I.C. curves strongly mimic the shape of the DIFF curves. But the influence curve also badly overestimates the large changes in the estimator for z = +1. The discontinuous effect of the risk sets is also apparent here. Indeed, of 27 failures, only two occurred after t = 1.5. The curves for z = +1 show much more of a change with t than do the curves for z = -1. The original data give, for z = +1, an estimated mean failure time of E(S) = e^{-1.36} = 0.26. Larger observed times dramatically decrease the estimate (from β_50^{+x} = 1.42 at t=0.0 and δ=1 to β_50^{+x} = 0.95 for t = 4.0 and δ=1).

From these examples, we may conclude that the influence of t on β_n is indeed bounded.
A virtue of the Cox estimator is that only the ranks and not the values of t enter into the estimate. Therefore any monotone transformation of the original sample times leaves the values of the I.C. unchanged. The most that an outlier can do is shift from being the largest observation to the smallest. The actual value does not matter.

The discontinuities in the I.C. and normalized difference curves have been noted. These discontinuities mean that β is sensitive to local shifts in the data--the "wiggling" mentioned in Section 2.3.
Case 2: Fix t and δ; vary z.

We have seen that the I.C. and normalized change in the estimate are both bounded, as functions of t. As a function of z, the I.C. is continuous and unbounded. There are terms linear in z, and there are also exponential terms. If the actual change (β_{n+1}^{+x} - β_n) depends exponentially on z, this would pose a serious problem. For outliers in z could potentially cause large changes in β. This phenomenon of "influence" of position of the independent variables was encountered in the cases of multiple least squares regression (Section 2.3) and of exponential survival regression (Chapter Six).

It is therefore of interest to look at some examples. Plots of the I.C. and of DIFF = (n+1)(β_{n+1}^{+x} - β_n), as functions of z, were made from the two samples of n = 49 described earlier. Four combinations of (t,δ) were considered. In each, z ranges from -5 to +5 in increments of 0.2; recall that in the original samples z is in the interval [-1,1]. The four combinations of (t,δ) are: (1) t=0, δ=1; (2) t=1, δ=0; (3) t=1, δ=1; and (4) t=4. By our earlier discussion, the curves for t=4 will be the same for δ=0 and for δ=1. Also, an added point with t=0 and δ=0 has no influence on the estimate. Therefore the points chosen cover the range of possible (t,δ) values.
For the first sample (β = 0.36), the I.C.'s are plotted in Figure 7.9.5 and the corresponding DIFF curves are plotted in Figure 7.9.6. For the I.C., all curves except that at t=0 show the effect of the exponential term. Indeed the curve at t = 4.0 has a value of -401.0 at z=5. When we turn to the actual DIFF plots, we notice that the predicted exponential declines in β do not take place in the curves 2, 3, and 4. These curves seem to level off as z becomes larger; indeed this is true of curve 1 also.

Nonetheless, the I.C. mimics the qualitative behavior of all four DIFF curves. For example, both curves for situation 1 are approximately linear. It is interesting that the DIFF curves are not necessarily monotone in z, for curves 2, 3, and 4. The I.C. captures this behavior, at least for curves 3 and 4.
Plots of the I.C.'s from the second sample (β = 1.36) are shown in Figure 7.9.7. With the larger value of β, the exponential fall-off in curves 2, 3, and 4 is more precipitous than before. The corresponding DIFF curves are plotted in Figure 7.9.8. Again the actual fall-off in β_50^{+x} is not nearly so severe as the I.C.'s predict. The I.C. therefore has exaggerated the extreme effects of outliers in z. But comparison of the I.C. and DIFF curves shows close agreement for smaller values of z. And, in both samples, the changes in β_50 caused by outliers in z are more extreme than the changes caused by the variation in t.
Remarks

We close this section with a few miscellaneous remarks on the I.C. A small Monte Carlo simulation is presented in Section 7.10. Conclusions and recommendations will be made in Section 7.11.
[Fig. 7.9.1. Plots of the empirical influence curve for Cox's estimate against t: n=49, β_n=0.36. The cases are: 1: (z=-1,δ=0); 2: (z=-1,δ=1); 3: (z=1,δ=0); 4: (z=1,δ=1).]

[Fig. 7.9.2. Plots of the normalized difference in Cox estimates against t: n=49, β_n=0.36. The cases are: 1: (z=-1,δ=0); 2: (z=-1,δ=1); 3: (z=1,δ=0); 4: (z=1,δ=1).]

[Fig. 7.9.3. Plots of the empirical influence curve for Cox's estimate against t: n=49, β_n=1.36. The cases are: 1: (z=-1,δ=0); 2: (z=-1,δ=1); 3: (z=1,δ=0); 4: (z=1,δ=1).]

[Fig. 7.9.4. Plots of the normalized difference in Cox estimates against t: n=49, β_n=1.36. The cases are: 1: (z=-1,δ=0); 2: (z=-1,δ=1); 3: (z=1,δ=0); 4: (z=1,δ=1).]

[Fig. 7.9.5. Plots of the empirical influence curve for Cox's estimate against z: n=49, β_n=0.36. The cases are: 1: (t=0,δ=1); 2: (t=1,δ=0); 3: (t=1,δ=1); 4: (t=4).]

[Fig. 7.9.6. Plots of the normalized difference in Cox estimates against z: n=49, β_n=0.36. The cases are: 1: (t=0,δ=1); 2: (t=1,δ=0); 3: (t=1,δ=1); 4: (t=4).]

[Fig. 7.9.7. Plots of the empirical influence curve for Cox's estimate against z: n=49, β_n=1.36. The cases are: 1: (t=0,δ=1); 2: (t=1,δ=0); 3: (t=1,δ=1); 4: (t=4).]

[Fig. 7.9.8. Plots of the normalized difference in Cox estimates against z: n=49, β_n=1.36. The cases are: 1: (t=0,δ=1); 2: (t=1,δ=0); 3: (t=1,δ=1); 4: (t=4).]
1. The discussion in this section has been mainly in terms of a scalar covariate. The sensitivity of Cox's estimator to outliers will not be simple to describe when the z's are multivariate. One cause of this complexity is multiplication by $I_{avg}^{-1}(\hat{\beta}_n)$ in 7.9.1. This means that an outlier in only one coordinate of z can affect the estimates of coefficients for the other coordinates. This phenomenon is also encountered in the exponential survival model (Section 6.6, Remark 4) and in least squares regression.

2. The I.C. (7.9.1) shows why time dependent covariables are natural when used with Cox's model. For example, $IC((t,\delta,z);\hat{\beta},F)$ is zero if $z(y) = \mu(y)$ for every $y \ge 0$.

3. Most of the discussion in this section has centered around the empirical I.C. The $IC(x;\hat{\beta},F)$ evaluated at a true long run distribution F serves as the limit of the empirical I.C. (Mallows, 1974). No attempt has been made here to evaluate the I.C. for a theoretical distribution. Still, some observations can be made. The behavior of the I.C. as a function of z should be analogous to the behavior of the empirical I.C. As a function of t, the I.C. should be smoother. Discontinuities in the censoring pattern and in values of the time dependent z's may still produce discontinuities in the I.C. itself.
7.10. The Simulation Study

The last section showed the effects of outliers in single samples. To provide a probabilistic look at the problem of outliers, a simulation study was carried out. The simulation was designed to show the influence outliers have in a random censorship setting.

Five models were studied.

1. B0: no outliers, beta = 0. Each observation $(t,\delta,z)$ was picked according to a random censorship model. In this model, the covariate z was picked from a U[-1,1] distribution. Conditional on z, the failure time S was chosen from an exponential distribution with hazard $e^{\beta z}$. For model B0, beta is zero, so the hazard is equal to one. Finally, a censoring time C was generated from a U[0,4] distribution. The endpoint 4 was chosen to achieve approximately a 25% censoring rate at z=0. To complete the observation, t is set to the minimum of S and C, and $\delta$ indicates whether the failure time was observed. This process is repeated 100 times to form a sample.

2. B1: no outliers, beta = 1. Model B1 is the same as B0, except that the failure times S are exponential with hazard $e^{z}$.

3. B0/T4: In model B0/T4, the basic setup is similar to B0, but each observation has a 10% chance of being t = 4.0, $\delta=0$, and z=5. In B0/T4, we expect the estimates to be decreased by the 10% contamination.

4. B1/T4: B1/T4 is model B1 contaminated with 10% outliers with t = 4.0, $\delta=0$, and z=5.

5. B0/T0: B0/T0 is model B0 contaminated with approximately 10% outliers at t=0, $\delta=1$, z=5. In contrast to B0/T4, the contamination is expected to increase the estimate of beta.

For each of these models, 1000 samples of size n=100 were formed. The estimated regression coefficient in Cox's model was found, with its estimated variance, by Newton-Raphson iteration. In about six cases, a solution to the likelihood equations could not be found, and replacement samples were required. An attempt was made to produce estimates from a B1/T0 model analogous to the B0/T0 model above. However, a solution to the likelihood equation could not be found for over 10% of the samples generated; therefore a B1/T0 case is not included.

Computations were carried out in single precision arithmetic on an IBM 370/65 at the Triangle Universities Computation Center, Research Triangle Park, North Carolina. The basic random number generator was a Tausworthe generator of uniform [0,1] numbers, described by J. Whittlesey, Communications of the A.C.M. 11 (1968). The same uniform numbers were reused for each model; therefore the covariate and censoring times are identical for samples in different models, if the observation is not an outlier.
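For concreteness, here is a minimal sketch of the sampling scheme and the Newton-Raphson partial likelihood fit just described, assuming a scalar covariate and no ties. The helper names, the seed, and numpy's generator are illustrative stand-ins (the study itself used the Tausworthe generator cited above); this is a sketch, not the original program.

```python
import numpy as np

rng = np.random.default_rng(1978)

def draw_sample(beta, n=100, outlier=None, p_out=0.10):
    """One sample (t, d, z) under random censorship; optional contamination."""
    z = rng.uniform(-1.0, 1.0, n)              # covariate ~ U[-1, 1]
    s = rng.exponential(np.exp(-beta * z))     # failure time, hazard e^{beta z}
    c = rng.uniform(0.0, 4.0, n)               # censoring time ~ U[0, 4]
    t = np.minimum(s, c)
    d = (s <= c).astype(float)                 # delta = 1 if failure observed
    if outlier is not None:                    # e.g. (4.0, 0.0, 5.0) for B0/T4
        mask = rng.uniform(size=n) < p_out
        t[mask], d[mask], z[mask] = outlier
    return t, d, z

def cox_fit(t, d, z, tol=1e-8):
    """Newton-Raphson for Cox's partial likelihood; scalar covariate."""
    beta = 0.0
    for _ in range(50):
        risk = t[:, None] >= t[None, :]        # risk[j, i]: j at risk at t_i
        w = np.exp(beta * z)
        s0 = risk.T @ w                        # sums over each risk set
        s1 = risk.T @ (w * z)
        s2 = risk.T @ (w * z**2)
        u = np.sum(d * (z - s1 / s0))          # score
        info = np.sum(d * (s2 / s0 - (s1 / s0) ** 2))
        step = u / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / info                    # estimate and estimated variance

t, d, z = draw_sample(beta=0.0, outlier=(4.0, 0.0, 5.0))   # one B0/T4 sample
print(cox_fit(t, d, z))
```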
Percentile points for the estimated regression coefficients appear in Table 7.10.1. The tabled percentiles are: 1, 2.5, 5, 10, 25, 50, 75, 90, 95, 97.5, 99.
In addition to regression coefficients, a Student-type statistic was calculated for each sample. For the j-th sample let $\hat{\beta}_j$ be the estimated regression coefficient and $v_j$ its estimated variance. If the uncontaminated portion of the sample was generated with the true beta equal to zero, define

(7.10.1)    $W_j = \hat{\beta}_j / \sqrt{v_j}$.

For those samples from models B1 and B1/T4,

$W_j = (\hat{\beta}_j - 1) / \sqrt{v_j}$.

Since outliers also affect the estimated variances, the W distributions need not show the same patterns as the Beta distributions. Percentiles for the W's are presented in Table 7.10.2. All the percentiles were calculated according to an empirical distribution function definition. For comparative purposes, the percentage points from a N(0,1) distribution are also tabled with the W's.
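The studentization and tabulation are easy to script; the sketch below reuses the illustrative `draw_sample` and `cox_fit` helpers from the sketch above. Note that `np.percentile` interpolates by default, whereas the thesis used an empirical distribution function definition, so small discrepancies in the extreme percentiles are expected.

```python
import numpy as np

def simulate_w(beta_true, n_samples=1000, **sample_kwargs):
    """Collect beta-hat and the studentized W of (7.10.1) over samples."""
    betas, ws = [], []
    for _ in range(n_samples):
        t, d, z = draw_sample(beta=beta_true, **sample_kwargs)
        b, v = cox_fit(t, d, z)
        betas.append(b)
        ws.append((b - beta_true) / np.sqrt(v))   # beta_true = 0 or 1
    return np.array(betas), np.array(ws)

betas, ws = simulate_w(beta_true=0.0)             # model B0
pcts = [1, 2.5, 5, 10, 25, 50, 75, 90, 95, 97.5, 99]
print(np.percentile(ws, pcts))                    # compare with Table 7.10.2
```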
TABLE 7.10.1

PERCENTILES FOR THE REGRESSION COEFFICIENT BETA

                                Model
Percentile      B0       B0/T4     B0/T0      B1      B1/T4
   1.0        -0.51     -0.73      0.45      0.48     -0.39
   2.5        -0.45     -0.67      0.53      0.58     -0.37
   5.0        -0.34     -0.62      0.59      0.64     -0.35
  10.0        -0.26     -0.58      0.63      0.71     -0.32
  25.0        -0.15     -0.52      0.69      0.85     -0.28
  50.0        -0.01     -0.45      0.75      0.99     -0.23
  75.0         0.13     -0.39      0.82      1.15     -0.18
  90.0         0.25     -0.34      0.88      1.29     -0.12
  95.0         0.32     -0.29      0.91      1.37     -0.09
  97.5         0.38     -0.26      0.94      1.76     -0.05
  99.0         0.46     -0.22      0.98      1.56      0.00
TABLE 7.10.2

PERCENTILES FOR THE STUDENTIZED W STATISTIC

                                   Model
Percentile      B0       B0/T4     B0/T0      B1       B1/T4     N(0,1)
   1.0        -2.44     -4.76      2.72     -2.40     -15.76     -2.33
   2.5        -2.19     -4.57      3.58     -1.94     -15.35     -1.96
   5.0        -1.70     -4.43      4.17     -1.66     -15.15     -1.64
  10.0        -1.36     -4.28      4.80     -1.25     -14.89     -1.28
  25.0        -0.69     -3.98      5.50     -0.71     -14.30     -0.67
  50.0        -0.04     -3.62      6.10     -0.07     -13.66      0.00
  75.0         0.64     -3.24      6.51      0.62     -12.93      0.67
  90.0         1.20     -2.81      6.79      1.21     -12.28      1.28
  95.0         1.54     -2.45      6.91      1.54     -11.87      1.64
  97.5         1.83     -2.27      6.99      1.88     -11.42      1.96
  99.0         2.18     -1.98      7.06      2.19     -10.79      2.33
Here are some short comments on the results for each model.

1. Model B0. Both Beta and W show a slight negative bias. In addition, the left tails are slightly more stretched than the right tails.

2. Model B0/T4. The distribution of Beta is shifted to the left, as expected: the shift is about -0.44 at the 50-th percentile and is greater in the right tail than in the left. The resulting distribution is much more tightly bunched than the B0 distribution. Similar remarks apply to the W distribution.

3. Model B0/T0. Both the Beta and W distributions show the expected shifts to the right. The W distribution has a longer right tail than left tail. The Beta distribution shows the same tendency to a lesser extent.

4. Model B1. For this no-outlier model, the Beta percentiles show a slight tendency to a longer right tail. The W distribution, on the other hand, shows a slightly more stretched left tail. This was a characteristic of the B0 W distribution; the similarity may be due to use of the same random numbers.

5. Model B1/T4. Both the Beta and W distributions show a large shift to the left from the B1 distribution. For Beta, the shift at the 50-th percentile is about -1.22. The greater magnitude of this shift compared to that from B0 to B0/T4 is expected from the plots in Section 7.9 of the normalized changes in $\hat{\beta}$: the influence of outliers at $t=4$ and $z=5$ was stronger in the sample with $\hat{\beta}_{49} = 1.36$ than in the sample with $\hat{\beta}_{49} = 0.36$. The right tail is slightly longer than the left, and the distribution is much tighter than the B1 case with no outliers. Both tails of the W distribution are heavier than the corresponding N(0,1) distribution; in addition, the right tail is more stretched than the left tail. The middle part of the distribution is symmetric around the median; the interquartile range is 1.37, very close to the theoretical 1.35.

Perhaps the results for the outlier models are not surprising, since the outliers are unusually bad. Ten percent gross errors is not unusual in practice (Hampel, 1973).
7.11. Conclusions and Recommendations

From the results of the last two sections, some conclusions can be drawn about the robustness of Cox's estimator:

1. The estimator is robust to outliers in the observed sample times. This robustness reflects the nonparametric rank approach of the Cox procedure.

2. On the other hand, the estimator is relatively more sensitive to outliers in the covariates. The influential covariates are likely to be found at the extremes of the sample ranges. In this sensitivity to covariate position, the Cox estimator is like parametric regression models.

3. Infinitesimal shifts in sample times can cause abrupt changes in the estimated coefficients. The influential local shifts are those which move an observation in or out of a risk set. These shifts may result in practice from errors, rounding, and grouping.

The user of Cox's estimate can take some practical action to protect the analysis against outliers in the covariates:

1. Identification of extreme values is essential.

2. Once extremes in the covariates are identified, a number of steps can be taken (Hampel, 1973). Impossible values should be rejected. On the other hand, the extreme values may provide most of the variation in a particular covariate; in such cases it may be necessary to keep and trust the values.

3. As a middle way, Hampel suggests moving doubtful outlying z's towards the rest of the data. This might be done by applying a transformation.

4. At some point, the statistician may choose to delete observations with outliers to see how Cox's estimate is affected. It may be the case that the influence is not large.

5. The approach described above deals with extremes in the z's on a coordinate by coordinate basis. Assessment of the total influence of each observation would be helpful. To do this one might calculate the empirical influence curve. The use of the empirical I.C. to detect outliers was not dealt with in this chapter, but it is a topic worthy of investigation; for further discussion, see Chapter Two. As a substitute for calculating the I.C., observations could be ranked in order of their estimated hazards.
There seems to be no simple remedy for the sensitivity of the estimator to local shifts in the observation times. To check on this sensitivity, the estimation procedure might be run with different scalings of time. Observations measured in days might, for example, be grouped in two-day or weekly units. Another approach is to decrease all censoring times by a fixed amount, or, equivalently, to increase all failure times. This tactic should alter some of the risk sets sufficiently to provide an additional check on the sensitivity to local shifts. Both checks are sketched below.
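A minimal sketch of the two checks, assuming the illustrative scalar-covariate `cox_fit` from Section 7.10; the bin width and the shift amount are analyst choices, not values from the thesis.

```python
import numpy as np

def local_shift_checks(t, d, z, width=2.0, shift=0.05):
    """Refit after regrouping times and after shrinking censoring times."""
    base, _ = cox_fit(t, d, z)
    # grouping into coarser units (e.g. two-day bins) introduces ties,
    # which cox_fit's risk sets handle in Breslow's fashion
    grouped, _ = cox_fit(np.ceil(t / width) * width, d, z)
    # decrease censoring times only; failures are left alone
    t_shrunk = np.where(d == 0, np.maximum(t - shift, 0.0), t)
    shifted, _ = cox_fit(t_shrunk, d, z)
    return base, grouped, shifted   # large gaps flag risk-set sensitivity
```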
If one finds that a sample estimate is sensitive to perturbations or outliers, the question to ask is: how sensitive? One measure may be the change in the estimate in comparison to the size of its estimated standard error. Of course the standard error, too, may be disturbed.

Of poor comfort is the fact that the influence curve is related to a change in estimates multiplied by sample size. This fact implies that the influence of any particular observation diminishes as sample size grows. Unfortunately, the number of bad data points is likely to increase at the same time.
8. ESTIMATION OF THE UNDERLYING DISTRIBUTION IN COX'S MODEL

8.1. Introduction

Recall that in the P.H. model (7.1.1), Cox's likelihood for $\beta$ did not require knowledge of the underlying survival distribution G(t), with hazard $\lambda_0(t)$. Nonparametric estimation of G(t) is of obvious interest in its own right, and several approaches have been proposed. We assume that the covariates do not depend on time.

We will describe briefly the estimation schemes of Cox (1972) and of Kalbfleisch and Prentice (1973). A simple proof of the inconsistency of Cox's method is given. The main topic of this chapter is an estimator for G(t) (in two versions) due to Breslow (1972a,b) and considered in a slightly different form by Oakes (1972).
8.2. Cox's Approach

Cox (1972) chose to model the underlying distribution as discrete, with mass at the observed failure times $t_{(1)}, t_{(2)}, \ldots, t_{(m)}$. Let $\pi_k(z)$ be the conditional probability of failure at $t_{(k)}$, for a patient with covariates z who has survived up to $t_{(k)} - 0$. In the underlying distribution, $z = 0$, and the underlying conditional probability is $\pi_k(0)$. To estimate $\pi_k(z)$, Cox postulated a logistic relationship:

(8.2.1)    $\dfrac{\pi_k(z)}{1 - \pi_k(z)} = \exp(\beta^T z)\, \dfrac{\pi_k(0)}{1 - \pi_k(0)}, \qquad k = 1, \ldots, m.$

The previously derived value of $\hat{\beta}$ was inserted for $\beta$ in (8.2.1), and separate maximum likelihood estimation for $\pi_k(z)$ was carried out at each observed failure point.

It is natural to regard the discrete probabilities $\pi_k(z)$ defined by (8.2.1) as approximations to probabilities in continuous time. Doing so leads to Cox's estimator for $G_0(t)$:

(8.2.2)    $\hat{G}(t) = \prod_{k:\, t_{(k)} \le t} \big(1 - \hat{\pi}_k(0)\big).$

Unfortunately, the logistic discrete model (8.2.1) is not compatible with any continuous time model. This inconsistency was noted by Breslow (1972a) and by Kalbfleisch and Prentice (1973). We give a (new) simple proof here.

Suppose that $t_1$ and $t_2$ ($t_1 < t_2$) are successive failure times. In continuous time, we would write the conditional probability of failure in $[t_1, t_2)$ as

(8.2.3)    $\pi(z) = P\{t_1 \le S < t_2 \mid S \ge t_1;\, z\},$

with $\pi(z)$ related to $\pi(0)$ by (8.2.1). Now introduce a new failure time $t'$, with $t_1 < t' < t_2$. Define

(8.2.4)    $\pi_1(z) = P\{t_1 \le S < t' \mid S \ge t_1;\, z\}, \qquad \pi_2(z) = P\{t' \le S < t_2 \mid S \ge t';\, z\},$

the probabilities corresponding to $\pi(z)$. We will show that the logistic relationship (8.2.1) cannot hold for $\pi(z)$, $\pi_1(z)$, and $\pi_2(z)$ simultaneously.

We start with two equivalent equations relating $\pi(z)$ to $\pi_1(z)$ and $\pi_2(z)$:

(8.2.5)    $\pi(z) = \pi_1(z) + \big(1 - \pi_1(z)\big)\,\pi_2(z)$

(8.2.6)    $1 - \pi(z) = \big(1 - \pi_1(z)\big)\big(1 - \pi_2(z)\big)$

(Equation 8.2.5 is obtained from the fact that failure in $[t_1, t_2)$ requires failure in $[t_1, t')$ or, conditional on survival in $[t_1, t')$, failure in $[t', t_2)$. Equation 8.2.6 is obtained by subtraction or by noting that survival of the entire interval $[t_1, t_2)$ requires survival of the two subintervals.) Therefore

(8.2.7)    $\dfrac{\pi(z)}{1 - \pi(z)} = \dfrac{\pi_1(z)}{(1 - \pi_1(z))(1 - \pi_2(z))} + \dfrac{\pi_2(z)}{1 - \pi_2(z)}.$

In particular, for $z = 0$,

(8.2.8)    $\dfrac{\pi(0)}{1 - \pi(0)} = \dfrac{\pi_1(0)}{(1 - \pi_1(0))(1 - \pi_2(0))} + \dfrac{\pi_2(0)}{1 - \pi_2(0)}.$

Now apply the logistic relationship (8.2.1) to $\pi(z)$:

(8.2.9)    $\dfrac{\pi(z)}{1 - \pi(z)} = \exp(\beta^T z)\, \dfrac{\pi(0)}{1 - \pi(0)}.$

Substituting for $\pi(0)/(1 - \pi(0))$ from 8.2.8 into 8.2.9 we find:

(8.2.10)    $\dfrac{\pi(z)}{1 - \pi(z)} = \exp(\beta^T z)\left[\dfrac{\pi_1(0)}{(1 - \pi_1(0))(1 - \pi_2(0))} + \dfrac{\pi_2(0)}{1 - \pi_2(0)}\right].$

By (8.2.1) we also have

(8.2.11)    $\dfrac{\pi_1(z)}{1 - \pi_1(z)} = \exp(\beta^T z)\, \dfrac{\pi_1(0)}{1 - \pi_1(0)} \quad\text{and}\quad \dfrac{\pi_2(z)}{1 - \pi_2(z)} = \exp(\beta^T z)\, \dfrac{\pi_2(0)}{1 - \pi_2(0)}.$

We can substitute from 8.2.11 into 8.2.7 and find another expression for $\pi(z)/(1 - \pi(z))$:

(8.2.12)    $\dfrac{\pi(z)}{1 - \pi(z)} = \exp(\beta^T z)\left[\dfrac{\pi_1(0)}{(1 - \pi_1(0))(1 - \pi_2(z))} + \dfrac{\pi_2(0)}{1 - \pi_2(0)}\right].$

If we equate the right hand sides of 8.2.10 and 8.2.12, we conclude that

(8.2.13)    $\pi_2(z) = \pi_2(0).$

But 8.2.13 implies that $\pi_2(z)$ does not depend on z, a clear contradiction. □
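A quick numerical check of the contradiction, with illustrative values $\pi_1(0) = 0.1$, $\pi_2(0) = 0.2$, and $\beta^T z = 1$ (these numbers are chosen only for the demonstration): the two expressions (8.2.10) and (8.2.12) for the same odds disagree.

```python
import numpy as np

p1, p2 = 0.10, 0.20          # pi_1(0), pi_2(0), chosen for illustration
e = np.exp(1.0)              # exp(beta^T z)

# (8.2.10): odds from the logistic relation applied directly to pi(z)
odds_10 = e * (p1 / ((1 - p1) * (1 - p2)) + p2 / (1 - p2))

# (8.2.12): odds from the logistic relations for pi_1(z) and pi_2(z)
pi2_z = 1.0 - 1.0 / (1.0 + e * p2 / (1 - p2))      # pi_2(z) via (8.2.1)
odds_12 = e * p1 / ((1 - p1) * (1 - pi2_z)) + e * p2 / (1 - p2)

print(odds_10, odds_12)      # approx 1.057 vs 1.187: no common solution
```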
We should note that Cox used the discrete model 8.2.1 not only for estimation of G but also for estimation of $\beta$ when tied failure times were present. This estimate, as shown by Kalbfleisch and Prentice, was inconsistent.
8.3. The Approaches of Kalbfleisch and Prentice

Kalbfleisch and Prentice (1973) proposed a discrete model which is compatible with the P.H. model in continuous time. The time axis is again partitioned at the observed failure times $t_{(k)}$, $k = 1, \ldots, m$. The model specifies

(8.3.1)    $1 - \pi_k(z) = \big(1 - \pi_k(0)\big)^{\exp(\beta^T z)},$

where $\pi_k(z)$ is the conditional probability of failure at $t_{(k)}$. To simplify notation, let us follow Kalbfleisch and Prentice by writing

(8.3.2)    $\alpha_k = 1 - \pi_k(0).$

The probability that a person with covariates z fails at $t_{(k)}$ is:

(8.3.3)    $\Big(1 - \alpha_k^{\exp(\beta^T z)}\Big) \prod_{j=0}^{k-1} \alpha_j^{\exp(\beta^T z)}.$

The probability that a person with covariate z survives past $t_{(k)}$ is, in the discrete model,

(8.3.4)    $\prod_{j=0}^{k} \alpha_j^{\exp(\beta^T z)}.$

Expression (8.3.3) is therefore the contribution to the likelihood of a person failing at $t_{(k)}$. Expression (8.3.4) is the contribution of a person censored in $[t_{(k)}, t_{(k+1)})$. The use of (8.3.4) is equivalent to moving a censoring time in $[t_{(k)}, t_{(k+1)})$ back to $t_{(k)} + 0$.

For a fixed value of $\beta$, the likelihood equation for $\alpha_k$ is:

(8.3.5)    $\sum_{j \in F(k)} \frac{\exp(\beta^T z_j)}{1 - \alpha_k^{\exp(\beta^T z_j)}} = \sum_{j \in R(k)} \exp(\beta^T z_j),$

where $F(k)$ is the set of individuals failing at $t_{(k)}$. (Therefore ties are allowed.) The previously derived estimate $\hat{\beta}$ is inserted into (8.3.5), which is then solved for $\hat{\alpha}_k = 1 - \hat{\pi}_k(0)$. (Iteration is necessary if there is more than one failure at $t_{(k)}$.) The estimate of the survivor function is given by (8.2.2), resulting in a step function.
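One way to carry out the iteration: the left side of (8.3.5) increases in $\alpha_k$ from $\sum_{F(k)} \exp(\beta^T z_j)$ toward $+\infty$, so bisection on $(0,1)$ is safe. The sketch below assumes a scalar covariate; the argument names are illustrative.

```python
import numpy as np

def solve_alpha_k(beta, z_fail, z_risk, iters=200):
    """Solve (8.3.5) for alpha_k; z_fail, z_risk hold the covariates
    of the failure set F(k) and the risk set R(k)."""
    wf = np.exp(beta * np.asarray(z_fail, dtype=float))
    rhs = np.sum(np.exp(beta * np.asarray(z_risk, dtype=float)))
    g = lambda a: np.sum(wf / (1.0 - a ** wf)) - rhs
    lo, hi = 1e-12, 1.0 - 1e-12      # g(lo) < 0 < g(hi) when F(k) < R(k)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# With a single failure (no ties) the closed form agrees:
# alpha_k = (1 - exp(beta*z)/rhs) ** exp(-beta*z)
```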
Kalbfleisch and Prentice also proposed a method of estimating $G_0(t)$ by a connected series of straight lines. For fixed intervals $I_1 = [y_0, y_1),\, I_2 = [y_1, y_2),\, \ldots,\, I_r = [y_{r-1}, y_r)$, the hazard function was approximated by a step function $\lambda_0(t) = \lambda_\ell$ for $t \in I_\ell$.

Let $R_\ell$ be the risk set at the fixed point $y_\ell$, and let $R_{\ell-1}/R_\ell$ be the set of individuals failed or censored in $I_\ell$. Of these, $m_\ell$ are observed failures in $I_\ell$. Then, for $\beta = \hat{\beta}$ (previously derived), the likelihood function of $\lambda_1, \lambda_2, \ldots, \lambda_r$ is proportional to

(8.3.6)    $\prod_{\ell=1}^{r} \lambda_\ell^{m_\ell} \exp\Big\{-\lambda_\ell \Big( \sum_{i \in R_{\ell-1}/R_\ell} (t_i - y_{\ell-1}) \exp(\hat{\beta}^T z_i) + \sum_{k \in R_\ell} (y_\ell - y_{\ell-1}) \exp(\hat{\beta}^T z_k) \Big)\Big\}.$

The maximum likelihood estimator of $\lambda_\ell$ is easily found to be

(8.3.7)    $\hat{\lambda}_\ell = \frac{m_\ell}{\sum_{i \in R_{\ell-1}/R_\ell} (t_i - y_{\ell-1}) \exp(\hat{\beta}^T z_i) + \sum_{k \in R_\ell} (y_\ell - y_{\ell-1}) \exp(\hat{\beta}^T z_k)}.$

A continuous estimate of the underlying survival function is:

(8.3.8)    $\hat{G}(t) = \exp\{-\hat{\Lambda}_0(t)\}, \quad\text{where}\quad \hat{\Lambda}_0(t) = \int_0^t \hat{\lambda}_0(u)\, du.$

This estimate was suggested by the estimators of Oakes (1972) and Breslow (1972a,b), which are considered in the next section.

We note that if $\hat{\lambda}_\ell = m_\ell / Q_\ell$, with $Q_\ell$ the denominator of (8.3.7), is inserted into (8.3.6), the resulting likelihood is a function of $\beta$ alone. This likelihood can in turn be maximized with respect to $\beta$, without recourse to a previously derived estimate. Holford (1976) gives details of this approach. The resulting estimator $\hat{\beta}$ is extremely dependent on the stepwise exponential assumption, on the intervals chosen, and on the exact times of failure and censoring within those intervals. Holford's estimator $\hat{\beta}$ is therefore not recommended, especially as the much more robust estimator of Cox is available.
8.4. The Likelihoods of Oakes and Breslow

Oakes (1972) and Breslow (1972a,b) both approximated the underlying hazard by a step function constant between observed failure times. Suppose there are failure times $t_{(1)}, t_{(2)}, \ldots, t_{(m)}$. Then the model is

(8.4.1)    $\lambda_0(t) = \lambda_k \quad\text{for}\quad t \in (t_{(k-1)}, t_{(k)}].$

Oakes' estimator is essentially that of equation (8.3.7), where now there are m random intervals. Oakes' estimate therefore incorporates knowledge of exact censoring times between failures; a previously derived rank estimator of $\beta$ is required.

Breslow, on the other hand, followed the practice of Kalbfleisch and Prentice (1973) with regard to the likelihood (8.3.5): he moved censoring times in $[t_{(k)}, t_{(k+1)})$ back to $t_{(k)}$. The resulting likelihood provided for simultaneous estimation of the $\lambda_k$'s and $\beta$. We develop Breslow's estimators, following the technical report (Breslow, 1972a) in which they were first derived. See also Crowley and Hu (1977).

With constant hazard between failures, the probability that a person with covariates z fails at $t_{(k)}$ is:

(8.4.2)    $\lambda_k \exp(\beta^T z) \exp\Big\{-\exp(\beta^T z) \sum_{j=1}^{k} \lambda_j \big(t_{(j)} - t_{(j-1)}\big)\Big\}.$

With Kalbfleisch and Prentice's simplification of the censoring times, the probability of being censored at $t_{(k)}$ is:

(8.4.3)    $\exp\Big\{-\exp(\beta^T z) \sum_{j=1}^{k} \lambda_j \big(t_{(j)} - t_{(j-1)}\big)\Big\}.$

The likelihood for the data is therefore

(8.4.4)    $\prod_{k=1}^{m} \lambda_k^{m_k} \exp(\beta^T s_k) \exp\Big\{-\lambda_k \big(t_{(k)} - t_{(k-1)}\big) \sum_{j \in R_k} \exp(\beta^T z_j)\Big\},$

where $s_k$ is the sum of the covariates for those who fail at $t_{(k)}$, and $R_k/R_{k+1}$ is the set of individuals failing or censored in $[t_{(k)}, t_{(k+1)})$. [This is the same as (8.3.6), with the adjustment to censoring times.]

The log likelihood is

(8.4.5)    $\sum_{k=1}^{m} \Big[ m_k \log \lambda_k + \beta^T s_k - \lambda_k \big(t_{(k)} - t_{(k-1)}\big) \sum_{j \in R_k} \exp(\beta^T z_j) \Big].$

The likelihood equation for $\lambda_k$ therefore gives

(8.4.6)    $\hat{\lambda}_k = \frac{m_k}{\big(t_{(k)} - t_{(k-1)}\big) \sum_{j \in R_k} \exp(\beta^T z_j)}.$

When these values of $\lambda_k$ are substituted into the log likelihood (8.4.5), the resulting likelihood equation for $\beta$ is

(8.4.7)    $U(\beta) = \sum_{k=1}^{m} \big[ s_k - m_k\, \mu(t_{(k)}) \big] = 0,$

where $\mu(t_{(k)})$ is the exponentially weighted mean (7.3.1).

When there are no ties ($m_k \equiv 1$), the likelihood equation (8.4.7) is identical to Cox's original likelihood (7.3.9). In any case, ties do not make the estimation of $\beta$ or the $\lambda_k$'s overly complex, as is the case with other approaches.

The estimator $\hat{\beta}$ is inserted into (8.4.6), yielding

(8.4.8)    $\hat{\lambda}_k = \frac{m_k}{\big(t_{(k)} - t_{(k-1)}\big) \sum_{j \in R_k} \exp(\hat{\beta}^T z_j)}.$

This estimator is also a first order approximation to the solution of Kalbfleisch and Prentice's equation (8.3.5).
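A minimal sketch of (8.4.8) for the no-ties case ($m_k = 1$), where the hazard increments $\hat{\lambda}_k (t_{(k)} - t_{(k-1)})$ reduce to $1/\sum_{j \in R_k} \exp(\hat{\beta} z_j)$. A scalar covariate and a previously fitted `beta` are assumed.

```python
import numpy as np

def breslow_increments(t, d, z, beta):
    """Return the failure times t_(k) and the increments
    lambda_k-hat * (t_(k) - t_(k-1)) = 1 / sum_{R_k} exp(beta z_j)."""
    order = np.argsort(t)
    t, d, z = t[order], d[order], z[order]
    w = np.exp(beta * z)
    at_risk = w[::-1].cumsum()[::-1]      # sum of exp(beta z_j) over t_j >= t_i
    fail = d == 1
    return t[fail], 1.0 / at_risk[fail]
```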
For the remainder of the chapter, we consider in detail properties of estimates of G(t) based on Breslow's estimator (8.4.8). Both Oakes and Breslow called their estimates "maximum likelihood," but this is not strictly correct, as Kalbfleisch and Prentice (1973) remark: the model is not chosen independent of the data; the parameters depend on the sample; and usual likelihood inference does not apply. We will therefore consider estimators based on (8.4.8) under the general random censorship model and will disregard the original piecewise model (8.4.1).
8.5. Functional Form; Fisher Consistency of Breslow's Estimate

From the estimated hazard rate $\hat{\lambda}(t)$ (8.4.8), two different estimators of G(t) can be defined. The first treats the quantities $\hat{\lambda}_k (t_{(k)} - t_{(k-1)})$ as conditional probabilities of failure in $(t_{(k-1)}, t_{(k)}]$; thus

(8.5.1)    $\hat{G}_n^{(1)}(t) = \prod_{k:\, t_{(k)} \le t} \Big(1 - \frac{m_k}{\sum_{j \in R_k} \exp(\hat{\beta}^T z_j)}\Big).$

This version therefore generalizes the product limit estimator (5.2.1) (Breslow, 1974).

The second estimator treats the quantities $\hat{\lambda}_k (t_{(k)} - t_{(k-1)})$ as estimates of $\int_{t_{(k-1)}}^{t_{(k)}} \lambda(u)\, du$:

(8.5.2)    $\hat{G}_n^{(2)}(s) = \exp\Big\{- \sum_{k:\, t_{(k)} \le s} \frac{m_k}{\sum_{j \in R_k} \exp(\hat{\beta}^T z_j)}\Big\}$

(Breslow, 1975). $\hat{G}_n^{(2)}$ generalizes the empirical cumulative hazard process estimator (5.2.2).

Just as the product limit and empirical hazard process estimators are close, so we expect $\hat{G}_n^{(1)}$ and $\hat{G}_n^{(2)}$ to be close in practice. $\hat{G}_n^{(2)}$ is easier to write as a von Mises functional, so it will be studied in detail. A small computational sketch of the two versions follows.
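As mentioned above, here is a sketch of the two versions, built from the illustrative `breslow_increments` helper of Section 8.4. With small increments, the product and the exponentiated sum nearly coincide.

```python
import numpy as np

def breslow_survival(t, d, z, beta, s):
    """G-hat^(1)(s) of (8.5.1) and G-hat^(2)(s) of (8.5.2), no ties."""
    tk, inc = breslow_increments(t, d, z, beta)
    use = tk <= s
    g1 = np.prod(1.0 - inc[use])       # product-limit form (8.5.1)
    g2 = np.exp(-np.sum(inc[use]))     # cumulative-hazard form (8.5.2)
    return g1, g2

# the covariate-specific estimate (8.7.1) is then g2 ** np.exp(beta * z0)
```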
From now on let us assume that there are no ties among the observed times in the data ($m_k = 1$, $\forall k$). Then the solution $\hat{\beta}_n = \beta(F_n)$ to Breslow's likelihood is a von Mises functional, Fisher consistent, with a known I.C., on the basis of the last chapter. These facts make the proofs of the next theorems easy.

Theorem (8.5.1). $\hat{G}_n^{(2)}(s)$ is a von Mises functional.

Proof: With $\hat{\beta}_n = \beta(F_n)$, we can write $\hat{G}_n^{(2)}(s)$ as

(8.5.3)    $\hat{G}_n^{(2)}(s) = \exp\Big\{-\int_0^s \frac{dF_n(t,1)}{B(t, \beta(F_n), F_n)}\Big\},$

where B was defined at 7.2. Therefore $\hat{G}_n^{(2)}(s) = G(s, F_n)$, a von Mises functional with

(8.5.4)    $G(s,F) = \exp\Big\{-\int_0^s \frac{dF(t,1)}{B(t, \beta(F), F)}\Big\}.$  □

Theorem (8.5.2). Assume that the observations $(t, \delta, z)$ are generated from a random censorship model $F_0$ with covariates, in which the P.H. model (7.1.1) holds. Assume the regularity conditions required in Section 7.6, so that $\beta$ is Fisher consistent. Then $\hat{G}_n^{(2)}(s)$ is Fisher consistent.

Proof: By Lemma 7.8.1, $dF_0(t,1) = \lambda_0(t)\, B(t, \beta, F_0)\, dt$. Therefore,

(8.5.5)    $G(s, F_0) = \exp\Big\{-\int_0^s \frac{dF_0(t,1)}{B(t, \beta, F_0)}\Big\} = \exp\Big\{-\int_0^s \lambda_0(t)\, dt\Big\} = G(s).$  □
8.6. The Influence Curve

The influence curve for $\hat{G}^{(2)}(s)$ follows from results in Chapter 7. First we let the added point be $x^* = (t^*, \delta^*, z^*)$, and write $F_\varepsilon = (1-\varepsilon) F + \varepsilon I_{[x^*]}$. The estimate for G(s) based on $F_\varepsilon$ is:

$G(s, F_\varepsilon) = \exp\Big\{-\int_0^s \frac{dF_\varepsilon(t,1)}{B(t, \beta(\varepsilon), F_\varepsilon)}\Big\},$

where, following our previous practice, we write $\beta(\varepsilon) = \beta(F_\varepsilon)$ and $\beta = \beta(0) = \beta(F)$. The I.C. is given by

(8.6.1)    $IC(x^*; \hat{G}^{(2)}(s), F) = \frac{d}{d\varepsilon}\, G(s, F_\varepsilon)\Big|_{\varepsilon=0} = -G(s,F)\, \frac{d}{d\varepsilon}\Big[\int_0^s \frac{dF_\varepsilon(t,1)}{B(t, \beta(\varepsilon), F_\varepsilon)}\Big]_{\varepsilon=0}.$

Let us expand the integral in (8.6.1) and differentiate each term:

(8.6.2)    $\int_0^s \frac{dF_\varepsilon(t,1)}{B(t,\beta(\varepsilon),F_\varepsilon)} = \int_0^s \frac{dF(t,1)}{B(t,\beta(\varepsilon),F_\varepsilon)} + \varepsilon\Big[\frac{\delta^*\, I_{[t^* \le s]}}{B(t^*, \beta(\varepsilon), F_\varepsilon)} - \int_0^s \frac{dF(t,1)}{B(t,\beta(\varepsilon),F_\varepsilon)}\Big].$

We differentiate the first term on the r.h.s. of (8.6.2) under the integral sign:

(8.6.3)    $\frac{d}{d\varepsilon} \int_0^s \frac{dF(t,1)}{B(t,\beta(\varepsilon),F_\varepsilon)}\Big|_{\varepsilon=0} = -\int_0^s \Big[\frac{d}{d\varepsilon}\, B(t,\beta(\varepsilon),F_\varepsilon)\Big|_{\varepsilon=0}\Big] \frac{dF(t,1)}{B^2(t,\beta,F)}.$

By 7.7.20, this is

(8.6.4)    $-\Big[\int_0^s \frac{\mu(t,\beta,F)\, dF(t,1)}{B(t,\beta,F)}\Big]^T IC(x^*; \hat{\beta}, F) - \int_0^{\min(t^*,s)} \frac{\exp(\beta^T z^*)\, dF(t,1)}{B^2(t,\beta,F)} + \int_0^s \frac{dF(t,1)}{B(t,\beta,F)},$

where $IC(x; \hat{\beta}, F)$ is the I.C. of $\hat{\beta}$ (7.7.28). The derivative of the second term in (8.6.2) is simply

(8.6.5)    $\frac{d}{d\varepsilon}\, \varepsilon\Big[\frac{\delta^*\, I_{[t^* \le s]}}{B(t^*, \beta(\varepsilon), F_\varepsilon)} - \int_0^s \frac{dF(t,1)}{B(t,\beta(\varepsilon),F_\varepsilon)}\Big]_{\varepsilon=0} = \frac{\delta^*\, I_{[t^* \le s]}}{B(t^*, \beta, F)} - \int_0^s \frac{dF(t,1)}{B(t,\beta,F)}.$

Adding (8.6.4) and (8.6.5) and multiplying by $-G(s,F)$ as in (8.6.1), we find that the I.C. of $\hat{G}^{(2)}(s)$ at $x^* = (t^*, \delta^*, z^*)$ is

(8.6.6)    $IC(x^*; \hat{G}^{(2)}(s), F) = G(s,F)\Big\{-\frac{\delta^*\, I_{[t^* \le s]}}{B(t^*, \beta, F)} + \int_0^{\min(t^*,s)} \frac{\exp(\beta^T z^*)\, dF(t,1)}{B^2(t,\beta,F)} + \Big[\int_0^s \frac{\mu(t,\beta,F)\, dF(t,1)}{B(t,\beta,F)}\Big]^T IC(x^*; \hat{\beta}, F)\Big\}.$
8.7. Estimation of $G(s \mid z)$

A statistician may wish to estimate not G(s) (corresponding to $z = 0$) but

$G(s \mid z) = G(s)^{\exp(\beta^T z)}.$

The obvious estimate, based on a sample of size n, is

(8.7.1)    $\hat{G}_n^{(2)}(s \mid z) = \big[\hat{G}_n^{(2)}(s)\big]^{\exp(\hat{\beta}_n^T z)}.$

The properties of $\hat{G}_n^{(2)}(s \mid z)$ follow in a straightforward way from those of $\hat{G}_n^{(2)}(s)$ and $\hat{\beta}_n$. Thus $\hat{G}_n^{(2)}(s \mid z)$ is a von Mises functional and is Fisher consistent under random censorship.

The influence curve of $\hat{G}^{(2)}(s \mid z)$, based on arbitrary F, is easily derived. Let us write $G^\varepsilon(s \mid z)$ for the estimate based on $F_\varepsilon = (1-\varepsilon)F + \varepsilon I_{[x^*]}$, temporarily dropping the superscript (2):

$G^\varepsilon(s \mid z) = [G^\varepsilon(s)]^{\exp(\beta(\varepsilon)^T z)},$

where $G^\varepsilon$ and $\beta(\varepsilon)$ are functionals based on $F_\varepsilon$. Then the I.C. of $\hat{G}^{(2)}(s \mid z)$ is

$IC((t^*, \delta^*, z^*); \hat{G}^{(2)}(s \mid z), F) = \frac{d}{d\varepsilon}\, G^\varepsilon(s \mid z)\Big|_{\varepsilon=0} = \frac{d}{d\varepsilon} \exp\big(\log G^\varepsilon(s)\, \exp(\beta(\varepsilon)^T z)\big)\Big|_{\varepsilon=0}$

(8.7.2)    $= \hat{G}^{(2)}(s \mid z)\Big\{\log G(s)\, \exp(\beta^T z)\, IC(x^*; \hat{\beta}, F)^T z + \exp(\beta^T z)\, \frac{IC(x^*; \hat{G}^{(2)}(s), F)}{\hat{G}^{(2)}(s)}\Big\}.$

This can be simplified by using 8.6.6 and the fact that

$\log \hat{G}^{(2)}(s) = -\int_0^s \frac{dF(t,1)}{B(t,\beta,F)}:$

(8.7.3)    $IC((t^*, \delta^*, z^*); \hat{G}^{(2)}(s \mid z), F) = \exp(\beta^T z)\, \hat{G}^{(2)}(s \mid z)\Big\{-\frac{\delta^*\, I_{[t^* \le s]}}{B(t^*)} + \exp(\beta^T z^*) \int_0^{\min(t^*,s)} \frac{dF(t,1)}{B^2(t)} + \Big[\int_0^s \big(\mu(t,\beta,F) - z\big)\, \frac{dF(t,1)}{B(t)}\Big]^T IC(x^*; \hat{\beta}, F)\Big\}.$

In Appendix A3, we show that

$E_F\big(IC(\,\cdot\,; \hat{G}^{(2)}(s \mid z), F)\big) = 0, \quad \forall\, s \ge 0.$

The empirical I.C. follows when $F_n$ is substituted for F in 8.7.3. Let $t_{(j)}$, $j = 1, \ldots, m$, be the observed failure times. Then, in particular, the first two terms in braces become

(8.7.4)    $-\frac{\delta^*\, I_{[t^* \le s]}}{B_n(t^*)} + \exp(\hat{\beta}_n^T z^*)\, \frac{1}{n} \sum_{\{j:\, t_{(j)} \le \min(t^*, s)\}} \frac{1}{B_n^2(t_{(j)})},$

where the sum runs over the observed failure times.
8.8. Conjectured Limiting Distribution for Breslow's Estimate

Let $s \le u$ be two points in time and z a covariate, with Breslow's estimates $\hat{G}_n^{(2)}(s \mid z)$ and $\hat{G}_n^{(2)}(u \mid z)$. If F is the true distribution of the data, then $\hat{G}_n^{(2)}(s \mid z)$ and $\hat{G}_n^{(2)}(u \mid z)$ are, by definition, Fisher consistent for some constants $G(s, F \mid z)$ and $G(u, F \mid z)$, respectively. If these constants exist, they may have no meaning unless a P.H. model holds.

Now define

(8.8.1)    $W_{1n} = \sqrt{n}\,\big(\hat{G}_n^{(2)}(s \mid z) - G(s, F \mid z)\big), \quad W_{2n} = \sqrt{n}\,\big(\hat{G}_n^{(2)}(u \mid z) - G(u, F \mid z)\big), \quad W_n = (W_{1n}, W_{2n})^T.$

If the theory of von Mises derivatives is applied to the functionals $G^{(2)}(\cdot \mid z)$, we would conclude that $W_n$ has a limiting multivariate normal distribution:

(8.8.2)    $W_n \xrightarrow{\;L\;} N\big(0, A(F)\big),$

where $A(F) = (a_{ij})$ is $2 \times 2$, and

(8.8.3)    $a_{12} = E_F\big\{IC(X^*; \hat{G}^{(2)}(s \mid z), F)\, IC(X^*; \hat{G}^{(2)}(u \mid z), F)\big\}, \quad a_{22} = E_F\big\{IC^2(X^*; \hat{G}^{(2)}(u \mid z), F)\big\},$

and similarly for $a_{11}$.

In other words, we conjecture that the estimator $\hat{G}_n^{(2)}(s \mid z)$ is, as a function of s, a normal process. No rigorous proof (or disproof) of this conjecture is known. It is interesting that the von Mises theory suggests such a conclusion for general F. The topic deserves further investigation, perhaps through Monte Carlo methods.

To estimate the variance $a_{11}$ of Breslow's estimate, the empirical influence curve may be used. The proposed estimate is

(8.8.4)    $\hat{a}_{11} = \frac{1}{n-1} \sum_{i=1}^{n} IC^2\big(x_i^*; \hat{G}^{(2)}(s \mid z), F_n\big),$

where the summands are the squared empirical I.C.'s (8.7.4), evaluated at $x_i^*$, $i = 1, \ldots, n$. The denominator $(n-1)$ is suggested because $\sum_{i=1}^{n} IC\big(x_i^*; \hat{G}^{(2)}(s \mid z), F_n\big) = 0$. Again, the utility of 8.8.4 must be checked, and this will be the topic of further research.
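A crude finite-difference version of (8.8.4), for readers who prefer not to code the analytic empirical I.C.: the I.C. at $x_i$ is approximated by the normalized deletion difference $(n-1)\big(T(F_n) - T(F_{n,(i)})\big)$, in the spirit of the empirical curves of Section 7.9. Here `estimator` is an assumed helper returning $\hat{G}^{(2)}(s \mid z_0)$ from a sample; this is a sketch under that assumption, not the thesis's proposal.

```python
import numpy as np

def var_via_empirical_ic(t, d, z, s, estimator):
    """Estimate a_11 of (8.8.4) from approximate empirical influence curves."""
    n = len(t)
    full = estimator(t, d, z, s)
    ic = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # deletion approximation to IC(x_i; G^(2)(s|z), F_n)
        ic[i] = (n - 1) * (full - estimator(t[keep], d[keep], z[keep], s))
    ic -= ic.mean()                    # the exact I.C.'s sum to zero (Appendix A3)
    return np.sum(ic ** 2) / (n - 1)   # variance estimate for the scaled W_1n
```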
BIBLIOGRAPHY

[1] Aalen, O. (1976). Nonparametric inference in connection with multiple decrement models. Scand. J. Statist. 3, 15-27.

[2] Altschuler, B. (1970). Theory for the measurement of competing risks in animal experiments. Mathematical Biosciences 6, 1-11.

[3] Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H., and Tukey, J. W. (1972). Robust Estimates of Location: Survey and Advances. Princeton University Press.

[4] Breslow, N. (1970). A generalized Kruskal-Wallis test for comparing k samples subject to unequal patterns of censorship. Biometrika 57, 579-594.

[5] Breslow, N. (1972a). Covariance analysis of censored survival data. Technical Report. Department of Biostatistics, University of Washington.

[6] Breslow, N. (1972b). Contribution to the discussion on the paper by D. R. Cox, Regression Models and Life Tables. J. Roy. Stat. Soc. (B) 34, 216-217.

[7] Breslow, N. (1974). Covariance analysis of censored survival data. Biometrics 30, 89-100.

[8] Breslow, N. (1975). Analysis of survival data under the proportional hazards model. Int. Stat. Rev. 43, 45-58.

[9] Breslow, N. and Crowley, J. (1974). A large sample study of the life table and product limit estimates under random censorship. Annals of Statistics 2, 437-453.

[10] Byar, D. P., Huse, R., Bailar, J. C. III, and the Veterans Administration Cooperative Urological Research Group (1974). An exponential model relating censored survival data and concomitant information for prostatic cancer patients. J. Nat. Cancer Institute 52, 321-326.

[11] Cox, D. R. (1972). Regression models and life tables (with discussion). J. Roy. Stat. Soc. (B) 34, 187-220.

[12] Cox, D. R. (1975). Partial likelihood. Biometrika 62, 269-274.

[13] Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics. Chapman and Hall, London.

[14] Crowley, J. (1973). Nonparametric analysis of censored survival data, with distribution theory for the k-sample generalized Savage statistic. Unpublished Ph.D. dissertation, University of Washington.

[15] Devlin, S. J., Gnanadesikan, R., and Kettenring, J. R. (1975). Robust estimation and outlier detection with correlation coefficients. Biometrika 62, 531-545.

[16] Efron, B. (1967). The two sample problem with censored data. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 4, 831-853. University of California Press.

[17] Efron, B. (1977). The efficiency of Cox's likelihood function for censored data. J. Amer. Statist. Assoc. 72, 557-565.

[18] Feigl, P. and Zelen, M. (1965). Estimation of exponential survival probabilities with concomitant information. Biometrics 21, 826-838.

[19] Farewell, V. T. and Prentice, R. L. (1977). A study of distributional shape in life testing. Technometrics 19, 69-76.

[20] Filippova, A. A. (1961). Mises' theorem on the asymptotic behavior of functionals of empirical distribution functions and its statistical applications. Theory Prob. and Its Appl. 7, 24-57.

[21] Fisher, L. and Kanarek, P. (1974a). Is there a practical difference between some common survival models? Preliminary version. Department of Biostatistics, University of Washington.

[22] Fisher, L. and Kanarek, P. (1974b). Presenting censored survival data when censoring and survival times may not be independent. In Reliability and Biometry: Statistical Analysis of Lifelength, pp. 303-326, eds. Proschan, F. and Serfling, R. J. Society for Industrial and Applied Mathematics, Philadelphia.

[23] Grenander, U. (1956). On the theory of mortality measurement, Part II. Skand. Aktuarietidskr. 39, 125-153.

[24] Hampel, F. R. (1968). Contributions to the theory of robust estimation. Unpublished Ph.D. dissertation, University of California, Berkeley.

[25] Hampel, F. R. (1973). Robust estimation: a condensed partial survey. Zeit. Wahrscheinlichkeitstheorie Verw. Geb. 27, 87-104.

[26] Hampel, F. R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69, 383-393.

[27] Holford, T. R. (1976). Life tables with concomitant information. Biometrics 32, 587-598.

[28] Huber, P. J. (1972). Robust statistics: a review. Ann. Math. Statist. 43, 1041-1067.

[29] Huber, P. J. (1973). Robust regression: asymptotics, conjectures, and Monte Carlo. Ann. Statist. 1, 799-821.

[30] Kalbfleisch, J. D. and Prentice, R. L. (1973). Marginal likelihoods based on Cox's regression and life model. Biometrika 60, 267-278.

[31] Kanarek, P. (1973). Investigation of exponential models for evaluating competing risks of death in a population of diabetics subject to different treatment regimens. Unpublished Sc.D. dissertation, Harvard University.

[32] Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. J. Amer. Statist. Assoc. 53, 457-481.

[33] Mallows, C. L. (1974). On some topics in robustness. Unpublished ms. Bell Labs., Murray Hill, N.J.

[34] Martin, R. D. (1974). Robust estimation: theory and algorithms. Preliminary rough draft, October 18, 1974. Department of Electrical Engineering, University of Washington.

[35] Miller, R. G., Jr. and Sen, P. K. (1972). Weak convergence of U-statistics and von Mises' differentiable statistical decision functions. Annals of Math. Stat. 43, 31-41.

[36] Oakes, D. (1972). Contribution to the discussion on the paper by D. R. Cox, Regression Models and Life Tables. J. Roy. Stat. Soc. (B) 34, 208.

[37] Peterson, A. V. (1975). Nonparametric estimation in the competing risks problem. Technical Report No. 73. Department of Statistics, Stanford University.

[38] Peto, R. and Peto, J. (1972). Asymptotically efficient rank invariant test procedures. J. Roy. Stat. Soc. (A) 135, 185-206.

[39] Prokhorov, Yu. V. (1956). Convergence of random processes and limit theorems in probability theory. Theory Prob. and Its Appl. 1, 157-214.

[40] Rao, C. R. (1965). Linear Statistical Inference and Its Applications. Wiley, New York.

[41] Truett, J., Cornfield, J., and Kannel, W. (1967). A multivariate analysis of the risk of coronary heart disease in Framingham. J. Chronic Diseases 20, 511-524.

[42] Tsiatis, A. (1975). A nonidentifiability aspect of the problem of competing risks. Proc. Nat. Acad. Sci. U.S.A. 72.

[43] von Mises, R. (1947). On the asymptotic distribution of differentiable statistical decision functions. Ann. Math. Stat. 18, 309-348.
A1. APPENDIX TO CHAPTER FIVE

Theorem A1.1. For differentiable F and for the $IC(x; \Lambda(s,F))$ defined by 5.4.12,

(A1.1)    $M(s) = \int IC(x; \Lambda(s,F))\, dF = 0.$

Proof: Define

(A1.2)    $B(t) = \int_0^t (1 - F(y))^{-2}\, dF(y,1).$

Then the I.C. 5.4.12 can be written as

(A1.3)    $IC((t,\delta); \Lambda(s,F)) = -B(\min(t,s)) + \delta\, I_{[t \le s]}\, (1 - F(t))^{-1},$

and

(A1.4)    $M(s) = \int IC((t,\delta); \Lambda(s,F))\, dF = A_1(s) + A_2(s),$

where

(A1.5)    $A_1(s) = -\int_0^s B(t)\, dF(t) - B(s)(1 - F(s)), \qquad A_2(s) = \int_0^s (1 - F(t))^{-1}\, dF(t,1) = \int_0^s (1 - F(t))(1 - F(t))^{-2}\, dF(t,1).$

We can evaluate $A_1(s)$ by integration by parts:

(A1.6)    $-\int_0^s B(t)\, dF(t) = -\big[B(t)F(t)\big]_0^s + \int_0^s F(t)\, B'(t)\, dt.$

Now $B(0) = F(0) = 0$, and

(A1.7)    $B'(t)\, dt = (1 - F(t))^{-2}\, dF(t,1).$

Therefore

(A1.8)    $-\int_0^s B(t)\, dF(t) = -B(s)F(s) + \int_0^s F(t)(1 - F(t))^{-2}\, dF(t,1).$

Adding the pieces,

(A1.9)    $M(s) = -B(s)F(s) - B(s)(1 - F(s)) + \int_0^s \big[F(t) + (1 - F(t))\big](1 - F(t))^{-2}\, dF(t,1) = -B(s) + B(s) = 0.$  □

Theorem A1.2. Let $(t_i, \delta_i)$, $i = 1, \ldots, n$, constitute the sample used to estimate $\Lambda_n(s)$. We assume there are no ties among the $t_i$'s. Let the I.C. 5.4.13 be written $IC(t_i, \delta_i; \Lambda_n(s))$ for simplicity. Then $\forall s \ge 0$, 2.3.10 holds:

(A1.10)    $\sum_{i=1}^n IC(t_i, \delta_i; \Lambda_n(s)) = 0.$

Proof: The proof is by induction. For convenience we reorder the sample $t_{(1)} < \cdots < t_{(n)}$. Then the conclusion of the theorem is

(A1.11)    $\sum_{i=1}^n IC(t_{(i)}, \delta_{(i)}; \Lambda_n(s)) = 0.$

The proof rests on the observation that, considered as a function of s, each term takes on at most $i+1$ distinct values, $\forall i = 1, \ldots, n$. The possible values occur when $s < t_{(1)}$, $s = t_{(1)}$, $s = t_{(2)}$, ..., $s = t_{(n)}$. In fact,

(A1.12)    $IC(t_{(i)}, \delta_{(i)}; \Lambda_n(s)) = \begin{cases} 0, & s < t_{(1)}, \\[4pt] -\dfrac{1}{n} \displaystyle\sum_{j=1}^{k} \delta_{(j)} (1 - F_n(t_{(j)}))^{-2}, & \text{for } k \text{ such that } t_{(k)} \le s < t_{(k+1)} \le t_{(i)}, \\[4pt] -\dfrac{1}{n} \displaystyle\sum_{j=1}^{i} \delta_{(j)} (1 - F_n(t_{(j)}))^{-2} + \delta_{(i)} (1 - F_n(t_{(i)}))^{-1}, & t_{(i)} \le s. \end{cases}$

For example, $IC(t_{(1)}, \delta_{(1)}; \Lambda_n(s))$ takes on at most two values: for $s < t_{(1)}$ and for $s \ge t_{(1)}$.

To prove A1.11, therefore, it is sufficient to consider the cases $s < t_{(1)}$ and $s = t_{(k)}$, $k = 1, \ldots, n$. The proof is by induction on k.

Case I: $s < t_{(1)}$. Then $IC(t_{(i)}, \delta_{(i)}; \Lambda_n(s)) = 0$, $\forall i = 1, \ldots, n$, and the conclusion holds in this case.

Case II: $s = t_{(1)}$ (or $t_{(1)} \le s < t_{(2)}$). Recall that $(1 - F_n(t_{(j)})) = (n - j + 1)/n$, so that $(1 - F_n(t_{(1)})) = 1$. Now

(A1.13)    $IC(t_{(1)}, \delta_{(1)}; \Lambda_n(t_{(1)})) = -\frac{1}{n}\, \delta_{(1)} (1 - F_n(t_{(1)}))^{-2} + \delta_{(1)} (1 - F_n(t_{(1)}))^{-1} = \delta_{(1)}\Big(1 - \frac{1}{n}\Big),$

and

(A1.14)    $IC(t_{(i)}, \delta_{(i)}; \Lambda_n(t_{(1)})) = -\frac{1}{n}\, \delta_{(1)}, \quad\text{for } i = 2, \ldots, n.$

Therefore

(A1.15)    $\sum_{i=1}^n IC(t_{(i)}, \delta_{(i)}; \Lambda_n(t_{(1)})) = \delta_{(1)}\Big(1 - \frac{1}{n}\Big) - \frac{n-1}{n}\, \delta_{(1)} = 0.$

Now assume that the lemma holds for $s = t_{(k)}$ (or, equivalently, for $t_{(k)} \le s < t_{(k+1)}$). This means that the sum

(A1.16)    $H_k = \sum_{i=1}^n IC(t_{(i)}, \delta_{(i)}; \Lambda_n(t_{(k)})) = 0.$

Explicitly,

(A1.17)    $H_k = -\frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{i} \delta_{(j)} (1 - F_n(t_{(j)}))^{-2} + \sum_{i=1}^{k} \delta_{(i)} (1 - F_n(t_{(i)}))^{-1} + (n-k)\Big[-\frac{1}{n} \sum_{j=1}^{k} \delta_{(j)} (1 - F_n(t_{(j)}))^{-2}\Big].$

The first two sums are contributions from the first k observations. The last term comes from the $(n-k)$ observations greater than $t_{(k)}$, all having the same value for the I.C. at $s = t_{(k)}$. We must show $H_{k+1} = 0$. $H_{k+1}$ may be evaluated in three parts:

1. A contribution to $H_{k+1}$ from $t_{(1)}, t_{(2)}, \ldots, t_{(k)}$:

(A1.18)    $C_1 = -\frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{i} \delta_{(j)} (1 - F_n(t_{(j)}))^{-2} + \sum_{i=1}^{k} \delta_{(i)} (1 - F_n(t_{(i)}))^{-1}.$

2. The contribution from $IC(t_{(k+1)}, \delta_{(k+1)}; \Lambda_n(t_{(k+1)}))$:

(A1.19)    $C_2 = -\frac{1}{n} \sum_{j=1}^{k+1} \delta_{(j)} (1 - F_n(t_{(j)}))^{-2} + \delta_{(k+1)} (1 - F_n(t_{(k+1)}))^{-1}.$

3. The contribution from the I.C.'s of the $n-k-1$ observations $t_{(k+2)}, t_{(k+3)}, \ldots, t_{(n)}$ larger than $t_{(k+1)}$:

(A1.20)    $C_3 = -\frac{n-k-1}{n} \sum_{j=1}^{k+1} \delta_{(j)} (1 - F_n(t_{(j)}))^{-2}.$

Now writing $H_{k+1} = C_1 + C_2 + C_3$ and collecting terms multiplied by $\delta_{(k+1)}$, we have:

(A1.21)    $H_{k+1} = H_k + \delta_{(k+1)} (1 - F_n(t_{(k+1)}))^{-2} \Big[-\frac{1}{n} + (1 - F_n(t_{(k+1)})) - \frac{n-k-1}{n}\Big].$

The first part is $H_k$, which is zero by the induction hypothesis. The last term is (using $1 - F_n(t_{(k+1)}) = (n-k)/n$)

(A1.22)    $\delta_{(k+1)} (1 - F_n(t_{(k+1)}))^{-2}\, \frac{1}{n}\big[-1 + (n-k) - (n-k-1)\big] = 0.$

Therefore $H_{k+1} = 0$ and the theorem is proved.  □
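A small numerical check of (A1.11), with the I.C. written out as in (A1.12) and $1 - F_n(t_{(j)}) = (n - j + 1)/n$; the sum vanishes (up to float rounding) at every jump point. The sample here is arbitrary, generated only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
t = np.sort(rng.exponential(size=n))            # ordered sample, no ties
delta = rng.integers(0, 2, size=n).astype(float)
surv = (n - np.arange(1, n + 1) + 1) / n        # 1 - F_n(t_(j)) = (n-j+1)/n

for s in t:                                      # the I.C. changes only at jumps
    k = np.searchsorted(t, s, side="right")      # number of t_(j) <= s
    total = 0.0
    for i in range(n):
        m = min(i + 1, k)                        # terms in the running sum of (A1.12)
        ic = -np.sum(delta[:m] / surv[:m] ** 2) / n
        if t[i] <= s:
            ic += delta[i] / surv[i]
        total += ic
    print(round(total, 12))                      # 0.0 for every s
```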
A2. APPENDIX TO CHAPTER SEVEN

In this appendix we show that the I.C. (7.7.28) of Cox's estimator satisfies 2.3.9:

(A2.1)    $E_F\big(IC(x^*; \hat{\beta}, F)\big) = 0,$

where F is arbitrary. We drop the asterisks from $x = (t, \delta, z)$, where $z = \{z(y): y \ge 0\}$ depends on time. (As in Chapter Seven, the proof will be strictly valid only when z does not depend on time.) Again write $\beta = \beta(F)$ for the estimator defined by 7.5.6. Then

(A2.2)    $E_F\big(IC((t,\delta,z); \hat{\beta}, F)\big) = I_{avg}^{-1}(\beta)\, E_F\Big\{\delta\big(z(t) - \mu(t,\beta,F)\big) - \int_0^t \exp(\beta^T z(y))\big(z(y) - \mu(y)\big)\, \frac{dF(y,1)}{B(y)}\Big\}.$

Writing out the first term in brackets, we find

(A2.3)    $E_F\big\{\delta\big(z(t) - \mu(t,\beta,F)\big)\big\} = 0$

by the definition 7.5.6 of Cox's estimator $\beta = \beta(F)$. Then $E_F(IC(x; \hat{\beta}, F)) = 0$ if

(A2.4)    $E_F\Big\{\int_0^t \exp(\beta^T z(y))\big(z(y) - \mu(y)\big)\, \frac{dF(y,1)}{B(y)}\Big\} = 0, \quad\text{or}\quad Q(\beta, F) = 0, \text{ say}.$

We have

$Q(\beta,F) = \int_Z \int_0^\infty \Big[\int_0^\infty I_{[y \le t]} \exp(\beta^T z(y))\big(z(y) - \mu(y)\big)\, \frac{dF(y,1)}{B(y)}\Big]\, dF(t,z).$

Let us assume we can interchange the order of integration in $(t,z)$ and y. (The conditions for Fubini's theorem will, of course, strictly apply only if z does not depend on time.) With the interchange,

$Q(\beta,F) = \int_0^\infty \frac{1}{B(y)} \Big[\int_Z \int_0^\infty I_{[t \ge y]} \exp(\beta^T z(y))\big(z(y) - \mu(y)\big)\, dF(t,z)\Big]\, dF(y,1) = \int_0^\infty w(y)\, dF(y,1), \text{ say}.$

We complete the proof by showing $w(y) = 0$, $y \ge 0$:

$w(y) = \frac{1}{B(y)}\Big[\int_Z \int_y^\infty \exp(\beta^T z(y))\, z(y)\, dF(t,z) - \mu(y) \int_Z \int_y^\infty \exp(\beta^T z(y))\, dF(t,z)\Big] = \frac{A(y) - \mu(y)\, B(y)}{B(y)}, \text{ by 7.5.3 and 7.5.4,}$

$= \mu(y) - \mu(y) = 0.$  □

The proof applies to $F = F_n$, so that we also have

$\sum_{i=1}^n IC(x_i; \hat{\beta}_n, F_n) = 0.$
A3. APPENDIX TO CHAPTER EIGHT

Proposition: For an arbitrary distribution F, and for $IC(x^*; \hat{G}^{(2)}(s), F)$ given by 8.6.6,

(A3.1)    $E_F\big\{IC(x^*; \hat{G}^{(2)}(s), F)\big\} = 0, \quad \forall\, s \ge 0.$

Proof: Let us drop the asterisks from $x^*$, so that $x = (t, \delta, z)$. Then

(A3.2)    $IC(x; \hat{G}^{(2)}(s), F) = \hat{G}^{(2)}(s)\Big\{-\frac{\delta\, I_{[t \le s]}}{B(t)} + \exp(\beta^T z) \int_0^{\min(t,s)} \frac{dF(y,1)}{B^2(y)} + \Big[\int_0^s \frac{\mu(y)\, dF(y,1)}{B(y)}\Big]^T IC(x; \hat{\beta}, F)\Big\}.$

By the result of A2, $E_F\{IC(x; \hat{\beta}, F)\} = 0$. Therefore the expected value of the last bracketed term in (A3.2) is zero. The proposition will hold if (ignoring the factor G(s))

(A3.3)    $E_F\Big\{\frac{\delta\, I_{[t \le s]}}{B(t)}\Big\} = E_F\Big\{\exp(\beta^T z) \int_0^{\min(t,s)} \frac{dF(y,1)}{B^2(y)}\Big\}.$

The l.h.s. of A3.3 can be written

(A3.4)    $\text{L.H.S.} = \int_0^s \frac{dF(t,1)}{B(t)} = \Lambda_0(s, F).$

The r.h.s. is

(A3.5)    $\text{R.H.S.} = E_F\Big\{\exp(\beta^T z) \int_0^\infty I_{[y \le t]}\, I_{[y \le s]}\, \frac{dF(y,1)}{B^2(y)}\Big\} = \int_Z \int_0^\infty \exp(\beta^T z)\Big\{\int_0^\infty I_{[y \le t]}\, I_{[y \le s]}\, \frac{dF(y,1)}{B^2(y)}\Big\}\, dF(t,z).$

Here the inner integral is with respect to y; the outer ones are with respect to $(t,z)$. (Z is the domain of z.) Let us interchange the order of the $(t,z)$ and y integration. Then A3.5 becomes

$\text{R.H.S.} = \int_0^s \frac{1}{B^2(y)}\Big[\int_Z \int_0^\infty I_{[t \ge y]} \exp(\beta^T z)\, dF(t,z)\Big]\, dF(y,1) = \int_0^s \frac{1}{B(y)}\, dF(y,1), \text{ by definition 7.5.4 for } B(y),$

$= \Lambda_0(s, F) = \text{L.H.S.}$  □
144
A4.
COMPUTER LISTING OF SAMPLES FOR CHAPTER SEVEN PLOTS
"-
Sample I.
n=49, l3 =0.36.
n
0.0007
0.0035
0.0127
0.0302
0.0350
0.0650
O.OqqO
0.11190
O.lAIlO
0.1979
0.2231
0.2386
0.2567
0.2577
0.2655
0.3445
0.340Q
0.3699
0.11004
0.430R
0.4558
0.4878
0.5295
0.5362
0.580B
0.f'015
0.7461
0.763Q
0.7801
O. fHiqa
0.975';
1.0633
1.1368
1. 1 II 70
1. 2178
1.32B9
1. 3q:n
1.3921
1.4045
1.6322
1.6565
1.781A
1.8607
2.07Ql
2. 26Q,
2.11268
2.6624
2.9724
3.6949
1
1
1
0
0
1
1
1
0
1
0
1
0
1
0
1
0
0
1
0
1
1
1
1
1
0
1
1
0
1
,,
1
0
1
1
,,
,
0
0
1
0
1
0
1
1
0
1
The observations are (t,o,z) .
0.750
O.qqll
-0.333
0.821
-0.667
0.372
-0.558
-0. lfJ6
-0. 8t1 7
0.589
-0.8RO
-0.010
0.902
-0.383
0.878
0.692
-0.855
-0.414
0.472
0.125
-0.228
0.053
-0.038
0.860
0.251
-0.360
-0.316
0.710
0.6Ql
-0.603
0.419
-0.463
-0.868
0.BQ3
-0.12A
0.208
0.638
-0.747
-0.693
-0.574
-0.785
C.029
-0.166
0.971.>
0.871l
-0.6511
0.517
-0.72<)
-0.619
BtOMATHEMATICS TRAINING PROGRAM
145
A4.
Sample II.
Sn =1.36.
n=49,
0.0003
0.0013
0.0177
0.0302
0.0350
0.0448
0.1098
O. 1725
0.1730
0.1812
0.1840
0.2231
0.2270
O. 2411
0.2497
0.2567
0.2655
0.3489
O.369C}
0.3755
0.3780
0.4308
0.4518
O. qf; 28
0.5502
0.r;727
0.5814
0.6015
0.6415
0.70Q9
0.7801
0.1866
1.0234
1.0789
1. 1291
1.3102
1.3203
1.3309
1.4111
1. 4845
1.5877
1.6565
1.7302
1.8607
1. 8611
2.9l76
2.9724
3. q 2 41
3.9QSq
1
1
1
o
o
1
1
1
1
1
o
o
1
1
1
o
o
o
o
1
1
o
1
1
1
1
1
o
1
,
o
1
1
1
o
1
o
o
o
o
1
o
1
o
o
1
o
o
o
0.750
0.994
-0.333
0.821
-0.667
0.372
0.589
0.6c}2
-0.558
-0.196
-0.8In
-0.880
. 0.860
-0.010
0.412
0.902
0.878
-0.855
-0.414
0.110
-0.383
0.125
O. 251
0.053
-0.038
-0.228
0.893
-0.360
O. 419
0.638
0.691
0.910
-0.)16
0.208
-0.46)
0.A15
-0.868
-0.128
-0.603
-0.693
0.517
-0.785
0.029
-0.166
.-0.574
-0.741
-0.729
-0.619
-0.654
(Continued)
The observations are (t,o,z).