A General Class of Nonparametric Tests for Survival Analysis Author

A General Class of Nonparametric Tests for Survival Analysis
Author(s): Michael P. Jones and John Crowley
Source: Biometrics, Vol. 45, No. 1 (Mar., 1989), pp. 157-170
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/2532042
Accessed: 05/10/2008 17:31
Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at
http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless
you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you
may use content in the JSTOR archive only for your personal, non-commercial use.
Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at
http://www.jstor.org/action/showPublisher?publisherCode=ibs.
Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed
page of such transmission.
JSTOR is a not-for-profit organization founded in 1995 to build trusted digital archives for scholarship. We work with the
scholarly community to preserve their work and the materials they rely upon, and to build a common research platform that
promotes the discovery and use of these resources. For more information about JSTOR, please contact [email protected].
International Biometric Society is collaborating with JSTOR to digitize, preserve and extend access to
Biometrics.
http://www.jstor.org
BIOMETRICS
45, 157-170
March1989
A GeneralClass of NonparametricTests for SurvivalAnalysis
Michael P. Jones
Departmentof PreventiveMedicine,Collegeof Medicine,
Universityof Iowa, Iowa City, Iowa 52242, U.S.A.
and
John Crowley
Fred HutchinsonCancerResearchCenter, 1124 ColumbiaStreet,
Seattle,Washington98104, U.S.A.
SUMMARY
Taroneand Ware(1977, Biometrika64, 156-160) developeda generalclass of s-sampletest statistics
for right-censoredsurvival data that includes the log-rank and modified Wilcoxon procedures.
Subsequently,many authors have consideredtwo- and s-sample classes in detail. In this paper a
family of nonparametricstatistics is shown to unify existing and generate new test statistics for
the s (>2)-sample,s-sampletrend, and singlecontinuouscovariateproblems.
1. Introduction
Many of the popularnonparametrictwo-sampletest statisticsfor censoredsurvivaldata,
such as the log-rank(Mantel, 1966), generalizedWilcoxon (Gehan, 1965), and Peto-Peto
(1972) test statistics,have been shown to be specialcases of a generaltwo-samplestatistic,
differingonly in the choice of weight function (Taroneand Ware, 1977; Gill, 1980). This
work has been extendedto a generals-samplestatistic(Taroneand Ware, 1977; Andersen
et al., 1982) which includes the s-samplelog-rank,Breslow(1970), generalizedWilcoxon,
and Prentice (1978) linear rank statistics. There are, however, many survival analysis
proceduresthat do not obviously fit into the generalclass above. Examplesare statistics
that deal with the s-sample trend problem, such as the Jonckheere(Gehan, 1965) and
Tarone (1975) trend statistics, and statisticsthat deal with continuous, fixed, and timedependentcovariateproblems,such as the Cox (1972) score test, the linear rank statistics
of Prentice(1978), the modifiedKendallrankcorrelationof Brown,Hollander,and Korwar
(1974), and the logit ranktest of O'Brien(1978). In this paperwe proposea nonparametric
statisticthat is applicableto the s-sample,s-sampletrend, and single continuouscovariate
problems,and as a generalclass unifies all of the aforementionedtest proceduresand some
new statisticsthat are robustto outliersin the covariablespace. In particular,the Kendall
and Jonckheerestatistics,whose textbook definitions are in terms of U-statistics,will be
written as rank statistics.One of the new statisticsis a generalizationof the log-ranktest
appropriatefor the single continuouscovariatesettingand robustto extremecovariates.
In a common approachto developingtest proceduresfor survivaldata in the spirit of
Mantel (1966), a contingencytable is formed at each failuretime containingthe risk set,
covariableinformation,and identificationof the failing individual(s).Most of the ad hoc
Key words. Censored data; Kendall rank correlation; Log-rank test; Time-dependent covariates.
157
158
Biometrics, March 1989
nonparametricproceduresas well as Cox regressionfor time-dependentcovariates(Cox,
1972, 1975) can be derived from these contingency tables. In Section 2 we present the
contingencytable approachto the survivalproblem,and from these tables we motivate a
class of single-covariatenonparametrictest statistics.In Section 3 we show that many of
the common survivalanalysistests are special cases of this generalstatistic.This class is
also extended to the s-sample problem. In Section 4 we discuss severaltopics including
stratificationand small-sampleproperties,beforesummarizingthe benefitsof the proposed
generalclass of procedures.
2. The Class of Statistics
Suppose there are k distinct failure times,
t(l) <
. .
<
t(k),
and that t(o) = 0. Let R(t(j))
denote the risk set at time t(i). Each of the n(t(j)) individualsat risk at t(i) has a covariate
Xj(t(I)),j E R(t(j). At time t(i), b(t(j)) individualsfail. Each individualwill eventuallyfail or
be censored. Conditional on the covariate, the censoring and failure mechanisms are
assumed to be mutually independent. For the rest of this paper, we shall assume all
n individualsunderstudy are at risk at time 0, i.e., n = n(O).
Each of the n(t) individualsat risk at time t has a quantitativelabel Zj(t), j E R(t). We
shallassumethat the labels Zj(t ),j E R(t) have been constructedso that increasinglabels
correspondto either monotone increasingor monotone decreasingchances of failing,but
consistentlyone or the other. This is our nonparametricmodel relatingthe label to the
hazardof failure,typicallycalled an orderedhazardsmodel. The Zj(t) may representthe
covariatevalues but can be more generaland representsome function of the covariates,
e.g., the ranks of the covariates.Define an at-riskindicator Yj(t) = 1 if j E R(t) and 0
otherwise,and an observedfailuretime indicatorJj(t ) = 1 if the j th personis observedto
fail at time t and 0 otherwise.At time t the data can be summarizedin a contingencytable
(Table 1).
Table 1
Summaryof data at time t
Individual
Label
Indicator of failure
At risk
2
Z,(t)
Z2(t)
J2(t)
J2(t)
Y,(t)
Y(t)
n
Sum
Znl(t)
il -t
.Yn(t)
b(t)
n(t)
2
The generalclass of statisticswe proposeis given by
k
n
T(w, Z) = , w(t(i)) , Jj(t(i))[Zj(t(i)) j=1
i=1
(t(i))],(1
where Z(t) = n-'(t) E Yj(t)Zj(t) is the average label at risk at time t, and w(t) is a
weight function. T(w, Z) will often be abbreviatedto T when the meaning is clear from
the context. If the failuremechanismpreferslargevalues of the labels,then T will be large;
if it prefers small labels, T will be small. If we fix the labels and margins of Table 1,
(J, (t),
...,
Jn(t)) has a multiple hypergeometric distribution under Ho with mean
b(t)n-'(t) x (Y,(t), .. ., Yn(t))and covariancematrixwith (j, I)th element
b(t)[n(t
- b(t)]n-'(t)[n(t)
- ll-'Yj(t)[bj,n(t)-
Y,(t)]5
A Class of NonparametricSurvivalTests
wherebQ= 1 if j
=
159
I and 0 otherwise.In this frameworkit is easy to verifythat
E[ X Jj(t)Zj(t)0= b(t)Z(t),
n
1
~ ~
n(t) - b(t)
b(t) ( ))
vart E Jj(t)Zj(t)]
(2)
Czz(t),
(3)
where CM0(t)= n- '(t) , Yj(t) [Zj(t) -Z(t)]2 is the sample variance of the labels of
those at risk at time t, in which n-1(t) is used rather than [n(t) - I]-'. Let b*(t) =
b(t)[n(t) - b(t)]/[n(t) - 1]. When there are no tied failuretimes, b*(t) -1. As a variance
estimator,therefore,let us propose
k
V(w, Z)=
E
w2(t(I))b*(t(I))Czz(t(i)).
(4)
One would hope to be able to refer the statistic T/NJVito a standardnormal table to
obtain a significancelevel for the associationbetween the label and the hazard.Note that
in (2) and (3) the mean and varianceare functions of random variables.Furthermore,in
the calculationof V, the hypergeometricvariancesare summedacrosstablesas if they were
independent. Of course, they are highly dependent. However, it still seems intuitively
reasonablethat one should be able to condition acrosstime on previouseventsand proceed
as above. In fact, many authors have used martingaletheory to derive the statistical
propertiesof tests whose motivation is based on changes in the risk set over time. Gill
(1980) appliedthis theory to a class of two-sampletest statisticswith quite generalweight
function,and Andersenand Gill (1982) investigatedthe Cox relativeriskregression.Using
similartechniques,Jones and Crowley(TechnicalReportNo. 88-1, Departmentof Preventive Medicine,Universityof Iowa, 1988)showedthat underthe null hypothesisand suitable
is asymptoticallya standard
regularityconditions, Vis consistentfor var(T) and that T/ JIV
normal random variablefor a generalclass of weight functions w and label functions Z.
The failuretime distributionneed not be continuous.
The T statisticdefinedin (1) representsa broadclass of nonparametricprocedureswhose
individualmembers are specified when particularchoices for the weight function w and
label Z are made. We shall introduceseven differentchoices for the label Zj(t):
(Li) The most obvious choice of label is Zj(t) = Xj(t), in which case T is the Cox score
statistic of Ho: : = 0. Since Cox regressionis quite sensitive to outliers in the
covariable space, appropriatealternative choices might be labels robust to such
outliers.
(L2) Let rj(t) be the rank of Xj(t) among {Xi(t): i E R(t)}; then Zj(t) = rj(t)/n(t) would
be a robustchoice. Becauseof the difficultyin rankinga multidimensionalpoint, this
procedureis typicallyconfined to a single covariate.
(L3) If Zj(t) is a function of rj(t)/n(t), such as normal scores or approximatenormal
scores,then one not only gains in robustnessbut also in approximatenormalityof T
in small samples,wherethe centrallimit theorem does not hold. This should lead to
an improvementin the Type I errorrate in small studies.
(L4) If the covariateis truncatedso that
if Xj (t) < XL
(XL
Zj(t)
=
AXj(t) if XL Xj(t) Xu
Xuif
Xj(t)>
Xu
whereXLand Xu are predefinedconstants,then outliersare assignedvalues XLand
Xu, and the actualcovariatedata betweenXLand Xu are used.
160
Biometrics, March 1989
(L5) If the covariatevalues at risk {Xj(t): j E R(t) are replacedby a Winsorizedsample
IZ,(t): j E R(t) in which au % of the uppertail have been moved down to the next
observedcovariatevalue, and similarlyfor aL% of the lower tail, then once again
most of the actualcovariatesare used with some protectionagainstoutliers.
(L6) A grouped covariate, e.g., Z1(t) = di when xi-, < Xj(t) - xi where i = 1, . . ., s and
-oo = xo < xi < ... < xsl < xs = ?o are predefined constants, would provide
protectionagainstdata that are only approximatelycorrect.The constants{diI might
be chosen to represent any type of trend, e.g., linear (di = i), quadratic (di = i2), or
logarithmic (di = log i), i = 1, . . ., s. Alternatively, they could be midpoints of the
intervals [xi-,, xi] for i = 2, . .., s - 1 with d, = some minimum and ds = some
maximum. The value of di may varywith time, e.g., the averageor median covariate
within [xi-1, xi] among those at risk at time t.
(L7) Finally, consider Zj(t) = T[Xj(t) - X(t)], where T downweightsoutliers similarto
the robust T function used in M-estimationprocedures.The generalform
c + g(Ix - cl)
T(x) = Xx
-c - g(l x + cl)
if x > c
if lxiI c
if x < -c
for some nonnegativefunctiong and constantc coversa wide class.For example,the
functiong( Ia l) = log(1 + Ia I) bringsin outlierswithoutboundingtheircontribution
to T, whereasthe functiong 0 boundsthe outliercontribution.Similarly,one could
modify the truncatedcovariatewith such a g function.
The weight function w(t) is chosen to be sensitive to a prescribed alternative hypothesis.
As seen in (1) when w
1, T weights failures evenly over time. For convenience,
let W'(t; a, f) = [SKM(t )]a[ 1 - SKM(t)] , where SA is the Kaplan-Meierestimator.Note
that
W'(t;
1, 0) gives more weight to early failure times,
W'(t;
0, 1) to late failure times,
and wi(t; 1, 1) to mid failuretimes. Severalpopularforms of T employ w(t) = n(t), the
numberat risk.
3. Equivalence to Popular Tests
In this section we show that various forms of the trend statistic T, specifiedby particular
choices of the label and weight functions, are equivalentto many of the commonly used
survivalanalysisprocedures.The types of labelsto be consideredherearethe rawcovariate,
the groupedcovariate,the covariaterank,and the transformedrank of the covariate.
3.1 Raw Covariate
Cox relative risk regressionbased on the relative risk model Xj(t) = Xo(t)exp[3Xj(t)]
(Cox, 1972) is the principalsurvivalregressionmethod based on covariatesin use today.
The Cox score statistic of Ho: = 0 based on Breslow's (1974) approximatepartial
likelihoodis
k
E
i=l
[S(t(i))- b(t(i))XV(t(i))]j
(5)
where S(t) is the sum of the covariatesof the failing individualsand X(t) is the average
covariate at risk. The Cox score statistic (5) and T given by (1) are equivalent when
w(t) = 1 and Z = X. Thus, the Cox score statisticcan be writtenas T(1, X).
A Class of NonparametricSurvivalTests
161
Prentice(1978) developedanotherregressionapproachfrom the marginallikelihood of
a rank vector of residuals using the accelerated failure time hazard models Xj(t) =
exp(fXj)X0[t exp(3Xj)]. The subsequentcensoreddata linear ranktests of Ho: = 0 have
the generalform
k
v
=
E (X(i)ci + S*.)Ci),
i=1
wherethe covariatesare fixed, there are no tied failuretimes, X(E)is the covariateof the
individualobservedto fail at t(i),S*(/) is the sum of the covariatesof those censoredduring
(t(i), t(I+l)), and ci and C1 are as defined by Prentice (1978). Prentice and Marek (1979)
showed that in the two-sample problem v is of the form (1) with w(t(1))= c1 - C1 and
b(t)
1 whenever
n(t(j))C,_j = ci + [n(t(1))- I]C1.
(6)
The substitution
k
n-I(t()) E (x(i)+ S(*))= Z(t())
.=J
1
Michalek,
and
holds.
Mehrotra,
whenever
(6)
fixed
covariate
problem
in the general
Mihalko (1982) showed that (6) is always true. Hence, Prentice's linear rank tests of
in their equation (6) shows that v is equivalent to (1) with the same w(t(1)) and b(t(i))
Ho: 3 = 0 can be written as T(ci - Ci, X).
Much attention has been focused on proceduresthat test the equality of s survival
distributions.Taroneand W-are(1977) introduceda class of statisticsin which the log-rank
and Gehan tests for the two-sampleproblem and the log-rankand Breslow tests for the
s-sampleproblemdifferonly in the choice of weightfunction.Subsequently,many authors
have studied such generalclasses for two or more samples, including Aalen (1978), Gill
(1980), Fleming et al. (1986), Breslow (1982), and Andersen et al. (1982). Before the
relationshipwith the Tarone-Wares-sampleclassis studied,the Tstatisticmustbe extended
to the s-samplecase. Let XJ'(t) = (Xi 1(t), .. , Xj,(t)), whereXjrn(t) = 1 if thejth individual
is in group m at time t, and 0 otherwise.The obvious extensions of (1) and (4) are a T
statisticvectorwhose mth component is
.
k
Tn2(w, X) =
E
n
W(t(i)) E Jj(t(i))[X;m(t(i))
-
X,(t(i))],
(7)
j=l
i=-l
where X,n(t) = n-'(t) E Yj(t) Xjfl(t) is the average mth component of the covariate vector
and a covariancematrixV(w, X) whose (m, l)th element is
k
w
w2(t(i))b*(t(i))Cxlx,(t(i)),
V,7,1(W, X)=
(8)
i=l1
where Cx,7IX/(t)is the sample covariance, using n-(t) rather than [n(t) - I]-, of the mth
and Ith components of the covariates at risk at time t. Let O,72(t)= E Jj(t)Xj,,(t) be the
number of observed failures in group m at time t, andf(t) = E Yj(t)Xjm(t) be the number
at risk in group m at t. Defining E,m(t)=fn(t)b(t)/n(t) to be the hypergeometricmean for
0,,,(t), substitutioninto (7) yields
k
T,n(W5 X)
1= W(t(i))[1in(t(i))-Etn(t(i))1i=lI
(9)
162
Biometrics,March 1989
Direct substitutionshowsthat
C,K, ,A(t )
=n-
(t )
fi
Y;(t0)[X;,n(0- V,n(t)][XjlI(t) - Xz(t )]
j=1
f(tO)
(
f_
(t
n(t)J'
n(t)
so that (8) becomesthe familiarform
V/,n(w, X) = E W2(t(i))b*(t(i))
n(t)
n2(t)
i=1
Letting T' = (T,,...,
(10)
-
Ts) and VT be a generalized inverse, the quadratic form T'V-T is
identicalto that of Tarone and Ware (1977) and basicallythe same as that of Andersen
et al. (1982). The latterfirst introducea separateweight function for each component but
later note that most examples of interest are covered by a common weight; our w is
equivalentto their L. The only other differenceresultsfrom their assumptionof no tied
failuretimes so that their varianceestimationis given by (10) without the tied failuretime
adjustmentfactor[n(t) - b(t)]/[n(t) - 1]. Becauseof this equivalenceto the Tarone-Ware
class, much is known about T in the s-sample setting. For the two-sample problem in
particular,T encompassesthe Efron(1967) test with
w(t)
=
n (t)f-'(t)f2j'(t)SKM,I(t)SKM,2(t)
wheneverfi(t)f2(t) > 0, whereSKm,i(t)is the Kaplan-Meierestimatorat time t for sample
i (i = 1, 2); it also encompassesthe differencein cumulativehazardsfor the two samples
with w(t) = n(t)fV'(t)f-1 (t) wheneverJ (t)f2(t) > O.Table 2 containsother specialcases.
3.2 GroupedCovariate
to show that
Using the groupedcovariatelabel defined in Section 2, it is straightforward
T2/ V is identical to the trend statisticfirst proposedby Tarone (1975) in the case w 1
and later extendedto more generalweightfunctionsby Taroneand Ware(1977).
3.3 CovariateRank
For j E R(t) let rj(t) be the rank of Xj(t) among {Xl(t): I E R(t)}, where ties are
handled by midranks,and let Zj(t) = rj(t)/n(t). Then Z(t) = [1 + n-'(t)]/2. Suppose
there are e(t) distinct covariates at time t. Let m1(t) be the number of covariates at
risk equal to the smallest distinct value, m2(t) the number equal to the next smallest
value, and so on. Therefore, mi(t) of the 1rj(t):j E R(t)} are equal to the average
+ mi-1(t) + 1, ..., mi(t) +
+ mi-I(t) + mi(t). Note also that
of mi(t) +
n(t) = ml(t) + * + me(t)(t). From Lehmann (1975, p. 330, eq. A.18),
...
...
n2(t)
-
1
e(t)
m (t)[m2(t)
-
1]
so that
lCzzC~()=12L
(t ) =
-eel) [i
m(t)fl
i(1
Ei[n( n(t))
1=
The generalcovariaterankversionsof T and V are obtainedby substituting[1 + n 1(t )]/2
for Z(t) and (11) into equations (1) and (4). Note in particularthat T[n(t), r(t)/n(t)] =
T[1, r(t)]. We preferrj(t)/n(t) over rj(t) as the label since rj rangesbetween 1 and n(t),
A Class of Nonparametric Survival Tests
163
Table 2
Special cases of T
Label
Setting
Weight
General
Fixed covariate
s samples
Covariate
2 samples
2 samples, stratify
on second covariate
Grouped covariate
s-sample trend
n`'(t) x Ranked
covariate
s-sample
Fixed continuous
covariate
General
Transformed
ranked covariates
Equivalent test
Cox (1972) score test of Ho:3 = 0
Prentice (1978) linear rank statistics
Tarone and Ware (1977); Andersen
et al. (1982)
1
Log-rank
n(t)
Breslow (1970)
Prentice (1978)
ci - Ci
Tarone-Ware (1977); Gill's (1980)
w(t)
K class
1
Log-rank (Mantel, 1966)
Gehan (1965)
n(t)
wi(t; 1, 0) Peto and Peto (1972)
w(t; a, F) Fleming et al. (1986)
w(t)
Stratified versions of 2-sample tests
1
ci - Ci
w(t)
1
w(t)
w(t)
Tarone (1975)
Tarone and Ware (1977)
Same as for covariate label
n(t)
Brown et al. (1974) modification of
Kendall rank correlation
T(w, r/n) (?3.3)
Generalized log-rank (?3.3)
Weighted Kendall (?3)
Jonckheere trend test
(Gehan, 1965)
s-sample trend
w(t)
1
n(t)w(t)
n(t)
Continuous
1
Logit rank test (O'Brien, 1978)
which inherently weighs early failure times. On the other hand, rj(t)/n(t) always stays
within (0, 1];the only weightingcomes from the weightfunction w.
Let us considerthe covariaterank version of T for the s-sample, continuous covariate,
and s-sampletrend problems.Define X (t) = (Xj 1(t), .. . , Xj,(t)), Om(t), fm(t), and Em(t)
as in Section 3.1. Then the rank of Xjm(t) among lXim(t): i E R(t)} is
n(t)
rim(t) =
-
fm(2)
if j E group m at time t,
n(t) - fm(t) + 1 if j e group m at time t.
2
The averagemth component of (rjI(t), . . ., rj,(t)) is P.m(t) = [n(t) + 1]/2. Noting that
2rjm(t) = [2n(t) -fm(t)
+ 1]Xjm(t) + [n(t) -fm(t)
+ 1][1 - Xjm(t)],
it is easy to show that substitution of Zj(t) = rj(t)/n(t) into (7) yields Tm(w, rln)
2Tm(W,X).
=
164
Biometrics, March 1989
Furthermore,
Yi
Y(t)[rims(t) - .,n(t)][rj1(t) -
C,,7,,()=n`(t)
j=1
n(t)
4
'
.,(t)
1
E
- X.12(t)][Xjl(t)
Yj(t)[XjJn(t)
X(t)]
-
so that
Cz",z1(t) = (4)Cx,,x1(t)
and
Vmi(W,
rln)
=
(7)Vml(w,
X).
Thus,
r/n)IT(w, r/n)} = {T(w, X)}'V-(w,
{T(w, r/n)}'V-(w,
X)IT(w, X)}
so that the Tarone-Wareclass for the s-sampleproblemcan be derivedfrom the T statistic
using eitherthe covariatesor the ranksof the covariatesfor the label. Hence, the log-rank
test can be denoted by eitherT(1, X) or T(1, rln).
Next let us considerthe rank version of T when the covariateis continuous. Extending
the work of Brown et al. (1974), we shall introducea weightedKendall rank correlation
statisticmodifiedfor right-censoreddata for a fixed covariate.In the absenceof tied failure
times the rankversionof T is equivalentto a specificform of the weightedKendallstatistic.
To facilitatethe proof we shall derivea U-statisticversionof T. Assume for the continuous
covariateproblemXj = Xj (t) = Xj (0) and there are no tied failuretimes. Define
01
I
U11
12
Lo
ifXj>XI
if = XI
ifXj<XI.
Ifj E R(t), we have
1+
rj(t)=
E
(12)
Ujl.
IER(t)-{i}
With fixed covariates the survival data typically are representedas 1(yj, Xj, bj), =
1, .. ., n}, whereyj is the time under study for the jth individual,and bj= 1 if he left the
study througha failureand 0 if he was censored.Let tj be the actual survivaltime for the
jth individualwhich is observedonly when bj= 1. For simplicity,reorderthe indices so
that yi - *** - yn and j < I whenever yj = y,, bj = 1, and &1= 0. Then from (1) and (12)
the U-statisticform is
n-I
T[w(t)n(t), r(t)/n(t)] = E w(yj)Qj
i=I
since n(yj) = n
-
U,iI
-
-Aj
)
(13)
1ER(Yj1)-i}j
j + 1 when bj = 1. In creating a framework for the Kendall statistic,
define
1
c(u, v)
=
if u is definitely largerthan v
if u = v or uncertain
(-1 if u is definitely smallerthan v.
For example, if (y5, bi) = (5, 1), (Y2,
62) =
(4,
0), and (y3, 63) = (4, 1), then c(tl, t2)
=
0,
and C(t2, t3) = 1. Since the covariates 1XjI are all known, the "uncertainty"
option of c(X;, XI) is never used. We introducehere a weightedKendallrank statistic
C(tl, t3) =
1,
K(v) = 2 ,
j<i
vjlc(tj, tl)c(Xj, XI).
(14)
A Class of Nonpararnetric Survival Tests
165
When vjl-1, (14) is the Kendallrankstatisticintroducedby Brownet al. (1974, henceforth
denoted by BHK).
Proposition 1 In the absence of tied failure times and when vjl = w(yj), K(w) =
-4T[w(t)n(t), r(t)/n(t)].
*..
Proof Since yi,
< yn and j < I whenever yj = yl, bj = 1, and 1 = 0, then
c(tj, t,) = -bj forj < 1.Since c(Xj, XI) = 2Uj, - 1, (14) becomes
K(w) = 2 Z
j<1
w(yj)(-bj)(2Uj,
= -4
j=l I
w(yj)bj
= -4T(w, r)
l=j+I
=
-
1)
Ujl -
-4T[w(t)n(t),
by (13) and the factR(yj) - {j} = {j + 1, ...,
i
j=1
w(yj)bj 2-
1
r(t)/n(t)]
n}.
When there are tied failuretimes, T and K are not equivalent.In essence the T statistic
compares covariablesof simultaneouslyfailing individuals,whereas the K statistic does
not since c(tj, t,) = 0 when tj = ti. The BHK modification of the Kendall statistic is
K(1) = -4T[n(t), r(t)/n(t)]. As such, there are two distinct advantagesto K(1) belonging
to the T class. First, the BHK Kendall statistic requiresfixed covariates,whereasas a T
statistic it is extended to time-dependentcovariates. Second, K(1) can be extended to
having weights, i.e., as T[w*(t)n(t), r(t)/n(t)] for some function w*(t), although not to
the extent of the weightedKendallstatistic(14).
Note that the Kendall statistic for the general covariateproblem and the Gehan twosample statisticare both representedby T(n, r/n). This confirmsthe observationof BHK
that the Kendall statisticapplied to the two-sampleproblem is the Gehan statistic.Since
w(t) = n(t), the Gehan-Kendallstatisticweightsearlyfailuretimes more heavily;however,
since the weight depends on the censoringmechanism,there can be severe consequences
as shown by Prentice and Marek (1979) in the two-samplecase. Let us propose a new
statisticT( 1, rln) that appliesequal weightingacrosstime and that is equivalentto the logrank test in the two-samplecase. This "generalizedlog-ranktest" will be appropriatefor
the proportionalhazardsmodel and yet robust to outliers in the covariablespace. If one
wants a test that is sensitive to hazarddifferenceswhich are not constant over time, then
one should use the generalform T(w, rln), where the weight w is chosen appropriately.
Sometimes the weighting scheme can be predeterminedexactly as a function of time.
Oftentimesone may know only whetherearly,mid, or late failuretimes shouldbe weighted,
in which case the aforementioned w(t; a, ,B) is an excellent candidate. To avoid the
problems experienced by the Gehan-Kendall statistic, the weight function should be
independentof the censoringdistribution.
Finally, let us consider T in the s-sample trend problem.In particular,let us study the
specialcase in which the fixed covariatelabels for the s categoriesare ordered.In contrast
to Section 3.2, no particulartrend alternative is assumed here. Let us now define a
Jonckheeretrend statisticbased on a two-sampletest. Suppose the group index ordering
reflectsthe covariateordering.If Sab is a statisticused to test the differencebetweengroup
a and group b, then a reasonabletest statisticof no differenceamong s groupsagainstan
orderedcategorytrend alternativeis the Jonckheerestatistic E Sab, where summation is
over all grouppairingsa < b.
Proposition2 In the s-sampletrendproblemthe Jonckheeretrendstatistic(Gehan, 1965)
based on Gehan'stwo-sampletest is equivalentto the BHK censored-datamodificationof
166
Biometrics,March 1989
Kendall rank correlationK(1). In fact, K(1) = -2 E Sab, where Sab is the Gehan test for
groupsa and b and summationis over all pairsa < b.
Proof Without loss of generality assume the s distinct covariate values are I1, ..., s}.
The data can be presented as I(yj, bj, Xj ), j = 1, . . ., n}, where yj, bj, and Xj are as defined
before. However, for simplicity of this proof it will be convenient to consider another
indexingsystem(a, i) wherebyYaiis the observedtime on study for the ith personin group
a. The terms jai, Xai, and taj are defined similarly. Hence, Xai = a. The data are now
representedby I(yaij, ai, Xai), i = 1, ... ,fa(O)and a = 1, . . ., s}, wherefa(O)is the size of
groupa at time 0. Then
K(1)=
c(ti, tj)c(Xi, Xj)
i j
S
=E
a=1
Since c(u, v) =-c(v,
for a < b,
S
fa
fb
E
E
E C(tai, tbj)c(Xai,
b=1
u), c(Xai, Xaj)
XbJ).
j=1 j=1
=
c(a, a) = 0, and c(Xai, Xbj)
K(1) = -2
= -2
E
a<b
i
E
Sab,
C(tai,
C
=
c(a, b) =-1
tbj)
j
a<b
where Sab is the Gehan test comparinggroups a and b. Because the Jonckheerestatistic
is equivalent to the Kendall statistic, it is also equivalent to the trend statistic T by
Proposition 1 when there are no tied failuretimes and the weightsare all equal to 1.
3.4 TransformedRanked Covariates
In this section we consideronly continuous covariates.Section 3.3 can be generalizedby
employingtransformationsof the rankedcovariatelabels,i.e., the labelforthej th individual
at time t is Zj(t) = gt[rj(t)], wheregt is a monotone function. In this fashion one might
replacethe actual covariatesby the expected or approximateorder statisticsunder some
distributionfor the covariates,e.g., the normal. Except for a few distributions,expected
order statistics are difficult to find. To calculate approximate order statistics, let
gt(l) = H 1 {(l - ')/n(t)}, where / = 1, ..., n(t), and Ht is a distribution function. Note
that (I - ')/n(t) is never 0 or 1 so that g(l, t) will be finite. The choice Ht = 1, the normal
distributionfunction,will give approximatenormalscores.The logit functionalso provides
a fair approximationto the normal.
Proposition3 The trend statisticsT(1, Z)/vfV(1, Z) where Zj(t) = V-1{[rj(t) - ']In(t))
and Z.(t) = logit [rj(t) - -]/n(t )} are equivalentto the inversenormaland logit ranktests,
respectively,proposedby O'Brien(1978).
Proof This propositionfollows readilyupon comparisonof (1) and (4) with the correspondingequationsin O'Brien(1978).
As a summaryto this section, Table 2 lists some tests that are membersof the general
class of statisticsspannedby T(w, Z).
4. Example
Table 3 specifiesthe posttreatmentsurvivaltimes and agesat time of treatmentfor 28 male
patients with low-gradegliomas (brain tumors) who comprised a randomized study of
radiotherapywith and without CCNU, an oral nitrosourea(chemotherapy).These data are
A Class of Nonparametric Survival Tests
167
Table 3
Survivaltimes (days) and ages (years)for
28 male patientswith low-gradegliomas
Survival time Age
6
61
179+
236
296
370
420
474
535
547+
587+
637+
639
747
Survival time Age
38.4
59.6
18.9
65.5
67.8
42.1
66.2
38.1
57.3
34.5
37.0
30.1
49.5
50.9
797+
954
967+
1,050
1,134
1,141
1,213+
1,349+
1,506+
1,517+
1,624
1,828+
1,983+
2,237+
36.9
49.0
26.8
66.9
30.4
41.6
35.1
22.2
42.0
43.1
30.9
35.7
29.5
57.8
Table 4
Analysesof unmodifiedand modifiedTable3 data
Standardizedtest statistic
Unmodified
data
3.15
2.92
3.10
Modified
data
1.40
2.69
2.87
Statistic
X)
T(1,
T(1, rln)
T(n, r/n)
Name
Cox score
Generalizedlog-rank
Kendall
T(SKM,r/n)
Survival-weighted T
3.11
2.87
Logit rank
2.88
2.35
T(1, Iogit(
2
from the SouthwestOncologyGroupProtocol7983. Data indicatingCCNU treatmentare
omitted. In order to demonstratethe differentialeffects of extreme covariate values on
variousforms of the T statistic,a second data set is createdfrom that of Table 3 by simply
changingthe age of the last patient to exit the study from 57.8 to 97.8. Howeverunlikely
this modification may be for this particulardata set, it is certainlywithin the range of
extremecovariatevalues found in many studies.Five differentversionsof the normalized
T statistic were computed for the unmodified and modified data and are presented in
Table 4. For the unmodified data the Cox score test yields the most significantresult,
followed closely by the Kendall and survivalfunction weightedcovariaterank version of
T, both of which emphasizeearly failuretimes. Data sets can be found for which each of
these five statisticsis most significant.Analysis of the modified data set, as seen in Table
4, illustratesthe considerableinfluence of a single extreme covariateon a covariate-based
procedure,such as the Cox score test which has droppedfrom 3.15 to 1.40, whereasthe
covariaterank-basedproceduresremainreasonablyrobust.
5. Discussion
The T statisticis designedto consider only a single covariate,although some extensions
are possible,as seen in the s-sampleproblemin Section 3.1. In a multicovariatesituation
one often wants to test for the significanceof one particularcovariatewhile adjustingfor
168
Biometrics, March 1989
the effects of the others.In a nonparametricsettinga popularapproachto this problemis
to stratifythe databasedon valuesof the covariatesto be adjustedfor, form the test statistic
Ti and its varianceestimatorViwithin each stratum,and finallyform a normalizedstatistic
i 12, wheresummationis over the numberof strata.For example,in a data set
-(ET1)( V4)consistingof three covariates,sex with two levels, age groupwith five levels, and city with
two levels, one could stratifyon the ten sex-age group combinations and then test for
a survival difference between the two cities using a stratified log-rank or a stratified
Peto-Peto test. Of course, the decision on how to stratifya continuous covariateis often
guesswork.
For significancetesting purposeswe would hope that T/lVi is approximatelya N(O, 1)
random variableunder Ho. Under regularityconditions T/VT7 is in fact asymptotically
normal (Jones and Crowley, Technical Report No. 88-1, Department of Preventive
Medicine, University of Iowa, 1988); however, in small samples the approximation
may be poor. The large-sampleapproximationrelies on the central limit theorem. Let
K = b(t1) + *** + b(tk) be the total numberof individualsobservedto fail. As seen in (1),
T is approximatelynormal for small K underHo when the {Zj(t)I are close to normal;but
when the IZj(t )} are far from normal,a good approximationmay requireverylargeK. The
obvious solution for small samplesis to transformthe covariatesto be as close to normal
random variables as possible. Let us consider the continuous covariate situation.
One possibilitywould be to replacea covariatewith an approximatenormal score; e.g.,
Zj(t) = b- '[(rj(t) - ')/n(t)] or Zj(t) = logit[(rj(t) - ')/n(t)] as mentioned in Section 3.4.
O'Brien(1978) has shown throughsimulationthat the logit rank test, based on the latter
label, has better control over the Type I error level than the Cox score test. Normal
approximationto T may be poor if the distributionof covariatesis skewed or there are
outliers. With fixed continuous covariatesanother solution would be to transformthe
initialcollectionof covariatesso that they are more normallydistributed-perhapsa simple
log transform.Certaininvarianceprinciplesshould be noted here: T/ i V based on the raw
covariates is invariant to the linear transformf(X) = a + bX; T/I/V based on logtransformed covariates is invariant to the transform f(X) = aXb; and T//V
based on
rankedor transformedrankedcovariatesis invariantto any monotone transformation.
The generaltrendstatisticT has severalverydesirableproperties:(i) it handlestied failure
times, (ii) it allowstime-dependentcovariates,(iii) its varianceestimatorV does not require
the censoringdistributionto be independentof the covariate,(iv) it allows fairlygeneral
weighting schemes, and (v) it represents a fairly general class of nonparametrictest
proceduresfor survivalanalysis.Its two majordisadvantagesderivefrom its nonparametric
nature:its principal use is for a single covariate and there are no parametersper se to
estimateand interpret.The weight function should be chosen with a particularalternative
in mind. For example, if one believes that the associationbetween the failurehazardand
the label is strongerearly in time ratherthan later, the weight function should be chosen
accordingly.To avoid erroneous results due to the censoring,distribution,the weight
function should be chosen independent of censoring-dependentquantities, such as the
numberat risk.
As a generalclass, T contains many of the survivalanalysisproceduresin common use
(Table 2), includingboth linear and nonlinearrank statistics.The linear rank statisticsof
Prentice(1978) use the ranksof the failuretimes and the actual covariatevalues, whereas
the nonlinearKendall rank correlationof BHK (1974) uses the rank of the failuretimes
and the successive rerankingof the covariates at each failure time (?3.3). There are
severalbenefitsin belongingto such a generalclass. If we invent a new test, or are interested
in an alreadyexistingone, and we can show that it is a memberof the T class,then several
extensionsare immediatelyavailable.For example,in the form of (1) it can accommodate
weight functions and time-dependentcovariates.Severalof the statisticsof Table 2 were
A Class of Nonparametric Survival Tests
169
designed with neither in mind. We can also write down (4) as its variance estimator, which
can handle tied failure times and is appropriate even when the censoring distribution
depends on the covariate. Conditional on the covariate we still require the failure and
censoring mechanisms to be independent. As an example of such an extension, as seen in
Section 3.3, the Kendall statistic of BHK (1974) can be modified to handle a weight
function and time-dependent covariates. Another benefit of the T class is to extend old
tests into completely different covariate settings, sometimes creating new tests, while
retaining the same weight function and label. For example, the two-sample log-rank
procedure, as shown in Sections 3.1 and 3.3, can be written as either T( 1, X) or T( 1, rln).
Extending this to the continuous covariate setting, T( 1, X) is the Cox score statistic and
T(1, rln) is a new procedure, a generalized log-rank. Likewise, the entire Tarone-Ware
class of two-sample tests can be generalized to the continuous label statistics T(w, X) or
T(w, rln). If there are outliers in the covariable space, then instead of using the Cox score
test T(w, X), we may opt for a more robust test T(w, Z), where the label Zj(t) is rj(t)/n(t)
or perhaps the truncated covariate, the latter retaining most of the original data. For small
sample sizes or highly skewed covariate situations, we may decide to replace the covariate
by a transformed rank of the covariate or by a transformation of the covariate (?3.4) to
have better control over the Type I error. Another benefit of such a general class is in the
study of the asymptotic properties. Given a general central limit theorem for T/VY ( Jones
and Crowley, unpublished technical report, 1988), asymptotic normality for any member
of the T class, specified by particular weight and label functions, can be established as a
corollary to the general theorem. Finally, because so many tests differ only by the choice
of weight function and label, such a unifying class promotes a better understanding of the
relationships among test procedures, which in turn helps in the decision as to which
procedure should be used in a given situation.
ACKNOWLEDGEMENTS
This work was done while the first author was supported by National Cancer Institute
training grant T32 CAO9168 and research grant CA39065 and the second author by a
NIGMS grant. Data for the example in Section 4 are used with the permission of Dr
Harmon Eyre, Chairman of the Brain Tumor Committee of the Southwest Oncology
Group, and Dr Charles A. Coltman, Jr., Chairman of the Southwest Oncology Group.
RESUMm
Tarone et Ware (1977, Biometrika64, 156-160) ont developpe une classe generalede test de s
echantillonspour des donnees de survie censureesa droite;cette classe inclut la proceduredu lograng et celle de Wilcoxon modifiee. Par la suite, de nombreuxauteursont etudie des classes pour
deux ou s echantillonsen detail. Dans ce papier,on montrequ'une famille de statistiquesnonparametriquesunifie l'existantet permetd'engendrerde nouvellesstatistiquesde test pour le problemede
s (> 2) echantillons,le problemed'unetendancepours echantillonset celui d'unecovariablecontinue.
REFERENCES
Aalen, 0. 0. (1978). Nonparametricinferencefor a familyof countingprocesses.Annalsof Statistics
6, 701-726.
Andersen,P. K., Borgan,O., Gill, R. D., and Keiding, N. (1982). Linear nonparametrictests for
comparisonof counting processeswith applicationsto censored survival data. International
StatisticalReview50, 219-258.
Andersen,P. K. and Gill, R. D. (1982). Cox'sregressionmodel for countingprocesses:A large-sample
study. Annals of Statistics 10, 1100-1120.
Breslow,N. E. (1970). A generalizedKruskal-Wallistest for comparingK samplessubjectto unequal
57, 579-594.
patternsof censorship.Biormetrika
Breslow,N. E. (1974). Covarianceanalysisof censoredsurvivaldata. Biometrics30, 89-99.
170
Biometrics, March 1989
Breslow, N. E. (1982). Comparison of survival curves. In The Practice of Clinical Trials, M. Buyse,
M. Staquet, and R. Sylvester (eds), Chap. 2, Part 6. Oxford: Oxford University Press.
Brown, B. W., Jr., Hollander, M., and Korwar, R. M. (1974). Nonparametric tests of independence
for censored data, with applications to heart transplant studies. In Reliability and Biometry,
Statistical Analysis of Lifelength, F. Proschan and R. J. Serfling (eds), 327-354. Philadelphia:
Society for Industrial and Applied Mathematics.
Cox, D. R. (1972). Regression models and life-tables (with discussion). Journal of the Royal Statistical
Society,Series B 34, 187-220.
Cox, D. R. (1975). Partial likelihood. Biometrika 62, 269-276.
Efron, B. (1967). The two-sample problem with censored data. In Proceedings of the 5th Berkeley
Symposium 4, 831-854. Berkeley, California: University of California Press.
Fleming, T. R., Augustini, G. A., Elcombe, S. A., and Offord, K. P. (1986). The SURVDIFF procedure.
Supplemental Release, SAS User's Group International. Cary, North Carolina: SAS Institute,
Inc.
Gehan, E. A. (1965). A generalized Wilcoxon test for comparing arbitrarily singly-censored samples.
Biometrika 52, 203-223.
Gill, R. D. (1980). Censoring and stochastic integrals. Mathematical Centre Tracts 124. Amsterdam:
Mathematisch Centrum.
Lehmann,E. L. (1975).Nonparametrics:
StatisticalMethodsBasedon Ranks.San Francisco:HoldenDay.
Mantel, N. (1966). Evaluation of survival data and two new rank-order statistics arising in its
consideration. Cancer Chemotherapy Reports 50, 163-170.
Mehrotra, K. G., Michalek, J. E., and Mihalko, D. (1982). A relationship between two forms of linear
rank procedures for censored data. Biometrika 69, 674-676.
O'Brien, P. C. (1978). A nonparametric test for association with censored data. Biometrics 34,
243-250.
Peto, R. and Peto, J. (1972). Asymptotically efficient rank-invariant test procedures (with discussion).
Journalof the Royal StatisticalSociety,SeriesA 135, 185-206.
Prentice, R. L. (1978). Linear rank tests with right-censored data. Biometrika 65, 167-179.
Prentice, R. L. and Marek, P. (1979). A qualitative discrepancy between censored data rank tests.
Biometrics 35, 861-867.
Tarone, R. E. (1975). Tests for trend in life table analysis. Biometrika 62, 679-682.
Tarone, R. E. and Ware, J. (1977). On distribution-free test for equality of survival distributions.
Biometrika 64, 156-160.
ReceivedFebruary1987; revisedMarch 1988.