A General Class of Nonparametric Tests for Survival Analysis Author(s): Michael P. Jones and John Crowley Source: Biometrics, Vol. 45, No. 1 (Mar., 1989), pp. 157-170 Published by: International Biometric Society Stable URL: http://www.jstor.org/stable/2532042 Accessed: 05/10/2008 17:31 Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/action/showPublisher?publisherCode=ibs. Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is a not-for-profit organization founded in 1995 to build trusted digital archives for scholarship. We work with the scholarly community to preserve their work and the materials they rely upon, and to build a common research platform that promotes the discovery and use of these resources. For more information about JSTOR, please contact [email protected]. International Biometric Society is collaborating with JSTOR to digitize, preserve and extend access to Biometrics. http://www.jstor.org BIOMETRICS 45, 157-170 March1989 A GeneralClass of NonparametricTests for SurvivalAnalysis Michael P. Jones Departmentof PreventiveMedicine,Collegeof Medicine, Universityof Iowa, Iowa City, Iowa 52242, U.S.A. and John Crowley Fred HutchinsonCancerResearchCenter, 1124 ColumbiaStreet, Seattle,Washington98104, U.S.A. SUMMARY Taroneand Ware(1977, Biometrika64, 156-160) developeda generalclass of s-sampletest statistics for right-censoredsurvival data that includes the log-rank and modified Wilcoxon procedures. Subsequently,many authors have consideredtwo- and s-sample classes in detail. In this paper a family of nonparametricstatistics is shown to unify existing and generate new test statistics for the s (>2)-sample,s-sampletrend, and singlecontinuouscovariateproblems. 1. Introduction Many of the popularnonparametrictwo-sampletest statisticsfor censoredsurvivaldata, such as the log-rank(Mantel, 1966), generalizedWilcoxon (Gehan, 1965), and Peto-Peto (1972) test statistics,have been shown to be specialcases of a generaltwo-samplestatistic, differingonly in the choice of weight function (Taroneand Ware, 1977; Gill, 1980). This work has been extendedto a generals-samplestatistic(Taroneand Ware, 1977; Andersen et al., 1982) which includes the s-samplelog-rank,Breslow(1970), generalizedWilcoxon, and Prentice (1978) linear rank statistics. There are, however, many survival analysis proceduresthat do not obviously fit into the generalclass above. Examplesare statistics that deal with the s-sample trend problem, such as the Jonckheere(Gehan, 1965) and Tarone (1975) trend statistics, and statisticsthat deal with continuous, fixed, and timedependentcovariateproblems,such as the Cox (1972) score test, the linear rank statistics of Prentice(1978), the modifiedKendallrankcorrelationof Brown,Hollander,and Korwar (1974), and the logit ranktest of O'Brien(1978). In this paperwe proposea nonparametric statisticthat is applicableto the s-sample,s-sampletrend, and single continuouscovariate problems,and as a generalclass unifies all of the aforementionedtest proceduresand some new statisticsthat are robustto outliersin the covariablespace. In particular,the Kendall and Jonckheerestatistics,whose textbook definitions are in terms of U-statistics,will be written as rank statistics.One of the new statisticsis a generalizationof the log-ranktest appropriatefor the single continuouscovariatesettingand robustto extremecovariates. In a common approachto developingtest proceduresfor survivaldata in the spirit of Mantel (1966), a contingencytable is formed at each failuretime containingthe risk set, covariableinformation,and identificationof the failing individual(s).Most of the ad hoc Key words. Censored data; Kendall rank correlation; Log-rank test; Time-dependent covariates. 157 158 Biometrics, March 1989 nonparametricproceduresas well as Cox regressionfor time-dependentcovariates(Cox, 1972, 1975) can be derived from these contingency tables. In Section 2 we present the contingencytable approachto the survivalproblem,and from these tables we motivate a class of single-covariatenonparametrictest statistics.In Section 3 we show that many of the common survivalanalysistests are special cases of this generalstatistic.This class is also extended to the s-sample problem. In Section 4 we discuss severaltopics including stratificationand small-sampleproperties,beforesummarizingthe benefitsof the proposed generalclass of procedures. 2. The Class of Statistics Suppose there are k distinct failure times, t(l) < . . < t(k), and that t(o) = 0. Let R(t(j)) denote the risk set at time t(i). Each of the n(t(j)) individualsat risk at t(i) has a covariate Xj(t(I)),j E R(t(j). At time t(i), b(t(j)) individualsfail. Each individualwill eventuallyfail or be censored. Conditional on the covariate, the censoring and failure mechanisms are assumed to be mutually independent. For the rest of this paper, we shall assume all n individualsunderstudy are at risk at time 0, i.e., n = n(O). Each of the n(t) individualsat risk at time t has a quantitativelabel Zj(t), j E R(t). We shallassumethat the labels Zj(t ),j E R(t) have been constructedso that increasinglabels correspondto either monotone increasingor monotone decreasingchances of failing,but consistentlyone or the other. This is our nonparametricmodel relatingthe label to the hazardof failure,typicallycalled an orderedhazardsmodel. The Zj(t) may representthe covariatevalues but can be more generaland representsome function of the covariates, e.g., the ranks of the covariates.Define an at-riskindicator Yj(t) = 1 if j E R(t) and 0 otherwise,and an observedfailuretime indicatorJj(t ) = 1 if the j th personis observedto fail at time t and 0 otherwise.At time t the data can be summarizedin a contingencytable (Table 1). Table 1 Summaryof data at time t Individual Label Indicator of failure At risk 2 Z,(t) Z2(t) J2(t) J2(t) Y,(t) Y(t) n Sum Znl(t) il -t .Yn(t) b(t) n(t) 2 The generalclass of statisticswe proposeis given by k n T(w, Z) = , w(t(i)) , Jj(t(i))[Zj(t(i)) j=1 i=1 (t(i))],(1 where Z(t) = n-'(t) E Yj(t)Zj(t) is the average label at risk at time t, and w(t) is a weight function. T(w, Z) will often be abbreviatedto T when the meaning is clear from the context. If the failuremechanismpreferslargevalues of the labels,then T will be large; if it prefers small labels, T will be small. If we fix the labels and margins of Table 1, (J, (t), ..., Jn(t)) has a multiple hypergeometric distribution under Ho with mean b(t)n-'(t) x (Y,(t), .. ., Yn(t))and covariancematrixwith (j, I)th element b(t)[n(t - b(t)]n-'(t)[n(t) - ll-'Yj(t)[bj,n(t)- Y,(t)]5 A Class of NonparametricSurvivalTests wherebQ= 1 if j = 159 I and 0 otherwise.In this frameworkit is easy to verifythat E[ X Jj(t)Zj(t)0= b(t)Z(t), n 1 ~ ~ n(t) - b(t) b(t) ( )) vart E Jj(t)Zj(t)] (2) Czz(t), (3) where CM0(t)= n- '(t) , Yj(t) [Zj(t) -Z(t)]2 is the sample variance of the labels of those at risk at time t, in which n-1(t) is used rather than [n(t) - I]-'. Let b*(t) = b(t)[n(t) - b(t)]/[n(t) - 1]. When there are no tied failuretimes, b*(t) -1. As a variance estimator,therefore,let us propose k V(w, Z)= E w2(t(I))b*(t(I))Czz(t(i)). (4) One would hope to be able to refer the statistic T/NJVito a standardnormal table to obtain a significancelevel for the associationbetween the label and the hazard.Note that in (2) and (3) the mean and varianceare functions of random variables.Furthermore,in the calculationof V, the hypergeometricvariancesare summedacrosstablesas if they were independent. Of course, they are highly dependent. However, it still seems intuitively reasonablethat one should be able to condition acrosstime on previouseventsand proceed as above. In fact, many authors have used martingaletheory to derive the statistical propertiesof tests whose motivation is based on changes in the risk set over time. Gill (1980) appliedthis theory to a class of two-sampletest statisticswith quite generalweight function,and Andersenand Gill (1982) investigatedthe Cox relativeriskregression.Using similartechniques,Jones and Crowley(TechnicalReportNo. 88-1, Departmentof Preventive Medicine,Universityof Iowa, 1988)showedthat underthe null hypothesisand suitable is asymptoticallya standard regularityconditions, Vis consistentfor var(T) and that T/ JIV normal random variablefor a generalclass of weight functions w and label functions Z. The failuretime distributionneed not be continuous. The T statisticdefinedin (1) representsa broadclass of nonparametricprocedureswhose individualmembers are specified when particularchoices for the weight function w and label Z are made. We shall introduceseven differentchoices for the label Zj(t): (Li) The most obvious choice of label is Zj(t) = Xj(t), in which case T is the Cox score statistic of Ho: : = 0. Since Cox regressionis quite sensitive to outliers in the covariable space, appropriatealternative choices might be labels robust to such outliers. (L2) Let rj(t) be the rank of Xj(t) among {Xi(t): i E R(t)}; then Zj(t) = rj(t)/n(t) would be a robustchoice. Becauseof the difficultyin rankinga multidimensionalpoint, this procedureis typicallyconfined to a single covariate. (L3) If Zj(t) is a function of rj(t)/n(t), such as normal scores or approximatenormal scores,then one not only gains in robustnessbut also in approximatenormalityof T in small samples,wherethe centrallimit theorem does not hold. This should lead to an improvementin the Type I errorrate in small studies. (L4) If the covariateis truncatedso that if Xj (t) < XL (XL Zj(t) = AXj(t) if XL Xj(t) Xu Xuif Xj(t)> Xu whereXLand Xu are predefinedconstants,then outliersare assignedvalues XLand Xu, and the actualcovariatedata betweenXLand Xu are used. 160 Biometrics, March 1989 (L5) If the covariatevalues at risk {Xj(t): j E R(t) are replacedby a Winsorizedsample IZ,(t): j E R(t) in which au % of the uppertail have been moved down to the next observedcovariatevalue, and similarlyfor aL% of the lower tail, then once again most of the actualcovariatesare used with some protectionagainstoutliers. (L6) A grouped covariate, e.g., Z1(t) = di when xi-, < Xj(t) - xi where i = 1, . . ., s and -oo = xo < xi < ... < xsl < xs = ?o are predefined constants, would provide protectionagainstdata that are only approximatelycorrect.The constants{diI might be chosen to represent any type of trend, e.g., linear (di = i), quadratic (di = i2), or logarithmic (di = log i), i = 1, . . ., s. Alternatively, they could be midpoints of the intervals [xi-,, xi] for i = 2, . .., s - 1 with d, = some minimum and ds = some maximum. The value of di may varywith time, e.g., the averageor median covariate within [xi-1, xi] among those at risk at time t. (L7) Finally, consider Zj(t) = T[Xj(t) - X(t)], where T downweightsoutliers similarto the robust T function used in M-estimationprocedures.The generalform c + g(Ix - cl) T(x) = Xx -c - g(l x + cl) if x > c if lxiI c if x < -c for some nonnegativefunctiong and constantc coversa wide class.For example,the functiong( Ia l) = log(1 + Ia I) bringsin outlierswithoutboundingtheircontribution to T, whereasthe functiong 0 boundsthe outliercontribution.Similarly,one could modify the truncatedcovariatewith such a g function. The weight function w(t) is chosen to be sensitive to a prescribed alternative hypothesis. As seen in (1) when w 1, T weights failures evenly over time. For convenience, let W'(t; a, f) = [SKM(t )]a[ 1 - SKM(t)] , where SA is the Kaplan-Meierestimator.Note that W'(t; 1, 0) gives more weight to early failure times, W'(t; 0, 1) to late failure times, and wi(t; 1, 1) to mid failuretimes. Severalpopularforms of T employ w(t) = n(t), the numberat risk. 3. Equivalence to Popular Tests In this section we show that various forms of the trend statistic T, specifiedby particular choices of the label and weight functions, are equivalentto many of the commonly used survivalanalysisprocedures.The types of labelsto be consideredherearethe rawcovariate, the groupedcovariate,the covariaterank,and the transformedrank of the covariate. 3.1 Raw Covariate Cox relative risk regressionbased on the relative risk model Xj(t) = Xo(t)exp[3Xj(t)] (Cox, 1972) is the principalsurvivalregressionmethod based on covariatesin use today. The Cox score statistic of Ho: = 0 based on Breslow's (1974) approximatepartial likelihoodis k E i=l [S(t(i))- b(t(i))XV(t(i))]j (5) where S(t) is the sum of the covariatesof the failing individualsand X(t) is the average covariate at risk. The Cox score statistic (5) and T given by (1) are equivalent when w(t) = 1 and Z = X. Thus, the Cox score statisticcan be writtenas T(1, X). A Class of NonparametricSurvivalTests 161 Prentice(1978) developedanotherregressionapproachfrom the marginallikelihood of a rank vector of residuals using the accelerated failure time hazard models Xj(t) = exp(fXj)X0[t exp(3Xj)]. The subsequentcensoreddata linear ranktests of Ho: = 0 have the generalform k v = E (X(i)ci + S*.)Ci), i=1 wherethe covariatesare fixed, there are no tied failuretimes, X(E)is the covariateof the individualobservedto fail at t(i),S*(/) is the sum of the covariatesof those censoredduring (t(i), t(I+l)), and ci and C1 are as defined by Prentice (1978). Prentice and Marek (1979) showed that in the two-sample problem v is of the form (1) with w(t(1))= c1 - C1 and b(t) 1 whenever n(t(j))C,_j = ci + [n(t(1))- I]C1. (6) The substitution k n-I(t()) E (x(i)+ S(*))= Z(t()) .=J 1 Michalek, and holds. Mehrotra, whenever (6) fixed covariate problem in the general Mihalko (1982) showed that (6) is always true. Hence, Prentice's linear rank tests of in their equation (6) shows that v is equivalent to (1) with the same w(t(1)) and b(t(i)) Ho: 3 = 0 can be written as T(ci - Ci, X). Much attention has been focused on proceduresthat test the equality of s survival distributions.Taroneand W-are(1977) introduceda class of statisticsin which the log-rank and Gehan tests for the two-sampleproblem and the log-rankand Breslow tests for the s-sampleproblemdifferonly in the choice of weightfunction.Subsequently,many authors have studied such generalclasses for two or more samples, including Aalen (1978), Gill (1980), Fleming et al. (1986), Breslow (1982), and Andersen et al. (1982). Before the relationshipwith the Tarone-Wares-sampleclassis studied,the Tstatisticmustbe extended to the s-samplecase. Let XJ'(t) = (Xi 1(t), .. , Xj,(t)), whereXjrn(t) = 1 if thejth individual is in group m at time t, and 0 otherwise.The obvious extensions of (1) and (4) are a T statisticvectorwhose mth component is . k Tn2(w, X) = E n W(t(i)) E Jj(t(i))[X;m(t(i)) - X,(t(i))], (7) j=l i=-l where X,n(t) = n-'(t) E Yj(t) Xjfl(t) is the average mth component of the covariate vector and a covariancematrixV(w, X) whose (m, l)th element is k w w2(t(i))b*(t(i))Cxlx,(t(i)), V,7,1(W, X)= (8) i=l1 where Cx,7IX/(t)is the sample covariance, using n-(t) rather than [n(t) - I]-, of the mth and Ith components of the covariates at risk at time t. Let O,72(t)= E Jj(t)Xj,,(t) be the number of observed failures in group m at time t, andf(t) = E Yj(t)Xjm(t) be the number at risk in group m at t. Defining E,m(t)=fn(t)b(t)/n(t) to be the hypergeometricmean for 0,,,(t), substitutioninto (7) yields k T,n(W5 X) 1= W(t(i))[1in(t(i))-Etn(t(i))1i=lI (9) 162 Biometrics,March 1989 Direct substitutionshowsthat C,K, ,A(t ) =n- (t ) fi Y;(t0)[X;,n(0- V,n(t)][XjlI(t) - Xz(t )] j=1 f(tO) ( f_ (t n(t)J' n(t) so that (8) becomesthe familiarform V/,n(w, X) = E W2(t(i))b*(t(i)) n(t) n2(t) i=1 Letting T' = (T,,..., (10) - Ts) and VT be a generalized inverse, the quadratic form T'V-T is identicalto that of Tarone and Ware (1977) and basicallythe same as that of Andersen et al. (1982). The latterfirst introducea separateweight function for each component but later note that most examples of interest are covered by a common weight; our w is equivalentto their L. The only other differenceresultsfrom their assumptionof no tied failuretimes so that their varianceestimationis given by (10) without the tied failuretime adjustmentfactor[n(t) - b(t)]/[n(t) - 1]. Becauseof this equivalenceto the Tarone-Ware class, much is known about T in the s-sample setting. For the two-sample problem in particular,T encompassesthe Efron(1967) test with w(t) = n (t)f-'(t)f2j'(t)SKM,I(t)SKM,2(t) wheneverfi(t)f2(t) > 0, whereSKm,i(t)is the Kaplan-Meierestimatorat time t for sample i (i = 1, 2); it also encompassesthe differencein cumulativehazardsfor the two samples with w(t) = n(t)fV'(t)f-1 (t) wheneverJ (t)f2(t) > O.Table 2 containsother specialcases. 3.2 GroupedCovariate to show that Using the groupedcovariatelabel defined in Section 2, it is straightforward T2/ V is identical to the trend statisticfirst proposedby Tarone (1975) in the case w 1 and later extendedto more generalweightfunctionsby Taroneand Ware(1977). 3.3 CovariateRank For j E R(t) let rj(t) be the rank of Xj(t) among {Xl(t): I E R(t)}, where ties are handled by midranks,and let Zj(t) = rj(t)/n(t). Then Z(t) = [1 + n-'(t)]/2. Suppose there are e(t) distinct covariates at time t. Let m1(t) be the number of covariates at risk equal to the smallest distinct value, m2(t) the number equal to the next smallest value, and so on. Therefore, mi(t) of the 1rj(t):j E R(t)} are equal to the average + mi-1(t) + 1, ..., mi(t) + + mi-I(t) + mi(t). Note also that of mi(t) + n(t) = ml(t) + * + me(t)(t). From Lehmann (1975, p. 330, eq. A.18), ... ... n2(t) - 1 e(t) m (t)[m2(t) - 1] so that lCzzC~()=12L (t ) = -eel) [i m(t)fl i(1 Ei[n( n(t)) 1= The generalcovariaterankversionsof T and V are obtainedby substituting[1 + n 1(t )]/2 for Z(t) and (11) into equations (1) and (4). Note in particularthat T[n(t), r(t)/n(t)] = T[1, r(t)]. We preferrj(t)/n(t) over rj(t) as the label since rj rangesbetween 1 and n(t), A Class of Nonparametric Survival Tests 163 Table 2 Special cases of T Label Setting Weight General Fixed covariate s samples Covariate 2 samples 2 samples, stratify on second covariate Grouped covariate s-sample trend n`'(t) x Ranked covariate s-sample Fixed continuous covariate General Transformed ranked covariates Equivalent test Cox (1972) score test of Ho:3 = 0 Prentice (1978) linear rank statistics Tarone and Ware (1977); Andersen et al. (1982) 1 Log-rank n(t) Breslow (1970) Prentice (1978) ci - Ci Tarone-Ware (1977); Gill's (1980) w(t) K class 1 Log-rank (Mantel, 1966) Gehan (1965) n(t) wi(t; 1, 0) Peto and Peto (1972) w(t; a, F) Fleming et al. (1986) w(t) Stratified versions of 2-sample tests 1 ci - Ci w(t) 1 w(t) w(t) Tarone (1975) Tarone and Ware (1977) Same as for covariate label n(t) Brown et al. (1974) modification of Kendall rank correlation T(w, r/n) (?3.3) Generalized log-rank (?3.3) Weighted Kendall (?3) Jonckheere trend test (Gehan, 1965) s-sample trend w(t) 1 n(t)w(t) n(t) Continuous 1 Logit rank test (O'Brien, 1978) which inherently weighs early failure times. On the other hand, rj(t)/n(t) always stays within (0, 1];the only weightingcomes from the weightfunction w. Let us considerthe covariaterank version of T for the s-sample, continuous covariate, and s-sampletrend problems.Define X (t) = (Xj 1(t), .. . , Xj,(t)), Om(t), fm(t), and Em(t) as in Section 3.1. Then the rank of Xjm(t) among lXim(t): i E R(t)} is n(t) rim(t) = - fm(2) if j E group m at time t, n(t) - fm(t) + 1 if j e group m at time t. 2 The averagemth component of (rjI(t), . . ., rj,(t)) is P.m(t) = [n(t) + 1]/2. Noting that 2rjm(t) = [2n(t) -fm(t) + 1]Xjm(t) + [n(t) -fm(t) + 1][1 - Xjm(t)], it is easy to show that substitution of Zj(t) = rj(t)/n(t) into (7) yields Tm(w, rln) 2Tm(W,X). = 164 Biometrics, March 1989 Furthermore, Yi Y(t)[rims(t) - .,n(t)][rj1(t) - C,,7,,()=n`(t) j=1 n(t) 4 ' .,(t) 1 E - X.12(t)][Xjl(t) Yj(t)[XjJn(t) X(t)] - so that Cz",z1(t) = (4)Cx,,x1(t) and Vmi(W, rln) = (7)Vml(w, X). Thus, r/n)IT(w, r/n)} = {T(w, X)}'V-(w, {T(w, r/n)}'V-(w, X)IT(w, X)} so that the Tarone-Wareclass for the s-sampleproblemcan be derivedfrom the T statistic using eitherthe covariatesor the ranksof the covariatesfor the label. Hence, the log-rank test can be denoted by eitherT(1, X) or T(1, rln). Next let us considerthe rank version of T when the covariateis continuous. Extending the work of Brown et al. (1974), we shall introducea weightedKendall rank correlation statisticmodifiedfor right-censoreddata for a fixed covariate.In the absenceof tied failure times the rankversionof T is equivalentto a specificform of the weightedKendallstatistic. To facilitatethe proof we shall derivea U-statisticversionof T. Assume for the continuous covariateproblemXj = Xj (t) = Xj (0) and there are no tied failuretimes. Define 01 I U11 12 Lo ifXj>XI if = XI ifXj<XI. Ifj E R(t), we have 1+ rj(t)= E (12) Ujl. IER(t)-{i} With fixed covariates the survival data typically are representedas 1(yj, Xj, bj), = 1, .. ., n}, whereyj is the time under study for the jth individual,and bj= 1 if he left the study througha failureand 0 if he was censored.Let tj be the actual survivaltime for the jth individualwhich is observedonly when bj= 1. For simplicity,reorderthe indices so that yi - *** - yn and j < I whenever yj = y,, bj = 1, and &1= 0. Then from (1) and (12) the U-statisticform is n-I T[w(t)n(t), r(t)/n(t)] = E w(yj)Qj i=I since n(yj) = n - U,iI - -Aj ) (13) 1ER(Yj1)-i}j j + 1 when bj = 1. In creating a framework for the Kendall statistic, define 1 c(u, v) = if u is definitely largerthan v if u = v or uncertain (-1 if u is definitely smallerthan v. For example, if (y5, bi) = (5, 1), (Y2, 62) = (4, 0), and (y3, 63) = (4, 1), then c(tl, t2) = 0, and C(t2, t3) = 1. Since the covariates 1XjI are all known, the "uncertainty" option of c(X;, XI) is never used. We introducehere a weightedKendallrank statistic C(tl, t3) = 1, K(v) = 2 , j<i vjlc(tj, tl)c(Xj, XI). (14) A Class of Nonpararnetric Survival Tests 165 When vjl-1, (14) is the Kendallrankstatisticintroducedby Brownet al. (1974, henceforth denoted by BHK). Proposition 1 In the absence of tied failure times and when vjl = w(yj), K(w) = -4T[w(t)n(t), r(t)/n(t)]. *.. Proof Since yi, < yn and j < I whenever yj = yl, bj = 1, and 1 = 0, then c(tj, t,) = -bj forj < 1.Since c(Xj, XI) = 2Uj, - 1, (14) becomes K(w) = 2 Z j<1 w(yj)(-bj)(2Uj, = -4 j=l I w(yj)bj = -4T(w, r) l=j+I = - 1) Ujl - -4T[w(t)n(t), by (13) and the factR(yj) - {j} = {j + 1, ..., i j=1 w(yj)bj 2- 1 r(t)/n(t)] n}. When there are tied failuretimes, T and K are not equivalent.In essence the T statistic compares covariablesof simultaneouslyfailing individuals,whereas the K statistic does not since c(tj, t,) = 0 when tj = ti. The BHK modification of the Kendall statistic is K(1) = -4T[n(t), r(t)/n(t)]. As such, there are two distinct advantagesto K(1) belonging to the T class. First, the BHK Kendall statistic requiresfixed covariates,whereasas a T statistic it is extended to time-dependentcovariates. Second, K(1) can be extended to having weights, i.e., as T[w*(t)n(t), r(t)/n(t)] for some function w*(t), although not to the extent of the weightedKendallstatistic(14). Note that the Kendall statistic for the general covariateproblem and the Gehan twosample statisticare both representedby T(n, r/n). This confirmsthe observationof BHK that the Kendall statisticapplied to the two-sampleproblem is the Gehan statistic.Since w(t) = n(t), the Gehan-Kendallstatisticweightsearlyfailuretimes more heavily;however, since the weight depends on the censoringmechanism,there can be severe consequences as shown by Prentice and Marek (1979) in the two-samplecase. Let us propose a new statisticT( 1, rln) that appliesequal weightingacrosstime and that is equivalentto the logrank test in the two-samplecase. This "generalizedlog-ranktest" will be appropriatefor the proportionalhazardsmodel and yet robust to outliers in the covariablespace. If one wants a test that is sensitive to hazarddifferenceswhich are not constant over time, then one should use the generalform T(w, rln), where the weight w is chosen appropriately. Sometimes the weighting scheme can be predeterminedexactly as a function of time. Oftentimesone may know only whetherearly,mid, or late failuretimes shouldbe weighted, in which case the aforementioned w(t; a, ,B) is an excellent candidate. To avoid the problems experienced by the Gehan-Kendall statistic, the weight function should be independentof the censoringdistribution. Finally, let us consider T in the s-sample trend problem.In particular,let us study the specialcase in which the fixed covariatelabels for the s categoriesare ordered.In contrast to Section 3.2, no particulartrend alternative is assumed here. Let us now define a Jonckheeretrend statisticbased on a two-sampletest. Suppose the group index ordering reflectsthe covariateordering.If Sab is a statisticused to test the differencebetweengroup a and group b, then a reasonabletest statisticof no differenceamong s groupsagainstan orderedcategorytrend alternativeis the Jonckheerestatistic E Sab, where summation is over all grouppairingsa < b. Proposition2 In the s-sampletrendproblemthe Jonckheeretrendstatistic(Gehan, 1965) based on Gehan'stwo-sampletest is equivalentto the BHK censored-datamodificationof 166 Biometrics,March 1989 Kendall rank correlationK(1). In fact, K(1) = -2 E Sab, where Sab is the Gehan test for groupsa and b and summationis over all pairsa < b. Proof Without loss of generality assume the s distinct covariate values are I1, ..., s}. The data can be presented as I(yj, bj, Xj ), j = 1, . . ., n}, where yj, bj, and Xj are as defined before. However, for simplicity of this proof it will be convenient to consider another indexingsystem(a, i) wherebyYaiis the observedtime on study for the ith personin group a. The terms jai, Xai, and taj are defined similarly. Hence, Xai = a. The data are now representedby I(yaij, ai, Xai), i = 1, ... ,fa(O)and a = 1, . . ., s}, wherefa(O)is the size of groupa at time 0. Then K(1)= c(ti, tj)c(Xi, Xj) i j S =E a=1 Since c(u, v) =-c(v, for a < b, S fa fb E E E C(tai, tbj)c(Xai, b=1 u), c(Xai, Xaj) XbJ). j=1 j=1 = c(a, a) = 0, and c(Xai, Xbj) K(1) = -2 = -2 E a<b i E Sab, C(tai, C = c(a, b) =-1 tbj) j a<b where Sab is the Gehan test comparinggroups a and b. Because the Jonckheerestatistic is equivalent to the Kendall statistic, it is also equivalent to the trend statistic T by Proposition 1 when there are no tied failuretimes and the weightsare all equal to 1. 3.4 TransformedRanked Covariates In this section we consideronly continuous covariates.Section 3.3 can be generalizedby employingtransformationsof the rankedcovariatelabels,i.e., the labelforthej th individual at time t is Zj(t) = gt[rj(t)], wheregt is a monotone function. In this fashion one might replacethe actual covariatesby the expected or approximateorder statisticsunder some distributionfor the covariates,e.g., the normal. Except for a few distributions,expected order statistics are difficult to find. To calculate approximate order statistics, let gt(l) = H 1 {(l - ')/n(t)}, where / = 1, ..., n(t), and Ht is a distribution function. Note that (I - ')/n(t) is never 0 or 1 so that g(l, t) will be finite. The choice Ht = 1, the normal distributionfunction,will give approximatenormalscores.The logit functionalso provides a fair approximationto the normal. Proposition3 The trend statisticsT(1, Z)/vfV(1, Z) where Zj(t) = V-1{[rj(t) - ']In(t)) and Z.(t) = logit [rj(t) - -]/n(t )} are equivalentto the inversenormaland logit ranktests, respectively,proposedby O'Brien(1978). Proof This propositionfollows readilyupon comparisonof (1) and (4) with the correspondingequationsin O'Brien(1978). As a summaryto this section, Table 2 lists some tests that are membersof the general class of statisticsspannedby T(w, Z). 4. Example Table 3 specifiesthe posttreatmentsurvivaltimes and agesat time of treatmentfor 28 male patients with low-gradegliomas (brain tumors) who comprised a randomized study of radiotherapywith and without CCNU, an oral nitrosourea(chemotherapy).These data are A Class of Nonparametric Survival Tests 167 Table 3 Survivaltimes (days) and ages (years)for 28 male patientswith low-gradegliomas Survival time Age 6 61 179+ 236 296 370 420 474 535 547+ 587+ 637+ 639 747 Survival time Age 38.4 59.6 18.9 65.5 67.8 42.1 66.2 38.1 57.3 34.5 37.0 30.1 49.5 50.9 797+ 954 967+ 1,050 1,134 1,141 1,213+ 1,349+ 1,506+ 1,517+ 1,624 1,828+ 1,983+ 2,237+ 36.9 49.0 26.8 66.9 30.4 41.6 35.1 22.2 42.0 43.1 30.9 35.7 29.5 57.8 Table 4 Analysesof unmodifiedand modifiedTable3 data Standardizedtest statistic Unmodified data 3.15 2.92 3.10 Modified data 1.40 2.69 2.87 Statistic X) T(1, T(1, rln) T(n, r/n) Name Cox score Generalizedlog-rank Kendall T(SKM,r/n) Survival-weighted T 3.11 2.87 Logit rank 2.88 2.35 T(1, Iogit( 2 from the SouthwestOncologyGroupProtocol7983. Data indicatingCCNU treatmentare omitted. In order to demonstratethe differentialeffects of extreme covariate values on variousforms of the T statistic,a second data set is createdfrom that of Table 3 by simply changingthe age of the last patient to exit the study from 57.8 to 97.8. Howeverunlikely this modification may be for this particulardata set, it is certainlywithin the range of extremecovariatevalues found in many studies.Five differentversionsof the normalized T statistic were computed for the unmodified and modified data and are presented in Table 4. For the unmodified data the Cox score test yields the most significantresult, followed closely by the Kendall and survivalfunction weightedcovariaterank version of T, both of which emphasizeearly failuretimes. Data sets can be found for which each of these five statisticsis most significant.Analysis of the modified data set, as seen in Table 4, illustratesthe considerableinfluence of a single extreme covariateon a covariate-based procedure,such as the Cox score test which has droppedfrom 3.15 to 1.40, whereasthe covariaterank-basedproceduresremainreasonablyrobust. 5. Discussion The T statisticis designedto consider only a single covariate,although some extensions are possible,as seen in the s-sampleproblemin Section 3.1. In a multicovariatesituation one often wants to test for the significanceof one particularcovariatewhile adjustingfor 168 Biometrics, March 1989 the effects of the others.In a nonparametricsettinga popularapproachto this problemis to stratifythe databasedon valuesof the covariatesto be adjustedfor, form the test statistic Ti and its varianceestimatorViwithin each stratum,and finallyform a normalizedstatistic i 12, wheresummationis over the numberof strata.For example,in a data set -(ET1)( V4)consistingof three covariates,sex with two levels, age groupwith five levels, and city with two levels, one could stratifyon the ten sex-age group combinations and then test for a survival difference between the two cities using a stratified log-rank or a stratified Peto-Peto test. Of course, the decision on how to stratifya continuous covariateis often guesswork. For significancetesting purposeswe would hope that T/lVi is approximatelya N(O, 1) random variableunder Ho. Under regularityconditions T/VT7 is in fact asymptotically normal (Jones and Crowley, Technical Report No. 88-1, Department of Preventive Medicine, University of Iowa, 1988); however, in small samples the approximation may be poor. The large-sampleapproximationrelies on the central limit theorem. Let K = b(t1) + *** + b(tk) be the total numberof individualsobservedto fail. As seen in (1), T is approximatelynormal for small K underHo when the {Zj(t)I are close to normal;but when the IZj(t )} are far from normal,a good approximationmay requireverylargeK. The obvious solution for small samplesis to transformthe covariatesto be as close to normal random variables as possible. Let us consider the continuous covariate situation. One possibilitywould be to replacea covariatewith an approximatenormal score; e.g., Zj(t) = b- '[(rj(t) - ')/n(t)] or Zj(t) = logit[(rj(t) - ')/n(t)] as mentioned in Section 3.4. O'Brien(1978) has shown throughsimulationthat the logit rank test, based on the latter label, has better control over the Type I error level than the Cox score test. Normal approximationto T may be poor if the distributionof covariatesis skewed or there are outliers. With fixed continuous covariatesanother solution would be to transformthe initialcollectionof covariatesso that they are more normallydistributed-perhapsa simple log transform.Certaininvarianceprinciplesshould be noted here: T/ i V based on the raw covariates is invariant to the linear transformf(X) = a + bX; T/I/V based on logtransformed covariates is invariant to the transform f(X) = aXb; and T//V based on rankedor transformedrankedcovariatesis invariantto any monotone transformation. The generaltrendstatisticT has severalverydesirableproperties:(i) it handlestied failure times, (ii) it allowstime-dependentcovariates,(iii) its varianceestimatorV does not require the censoringdistributionto be independentof the covariate,(iv) it allows fairlygeneral weighting schemes, and (v) it represents a fairly general class of nonparametrictest proceduresfor survivalanalysis.Its two majordisadvantagesderivefrom its nonparametric nature:its principal use is for a single covariate and there are no parametersper se to estimateand interpret.The weight function should be chosen with a particularalternative in mind. For example, if one believes that the associationbetween the failurehazardand the label is strongerearly in time ratherthan later, the weight function should be chosen accordingly.To avoid erroneous results due to the censoring,distribution,the weight function should be chosen independent of censoring-dependentquantities, such as the numberat risk. As a generalclass, T contains many of the survivalanalysisproceduresin common use (Table 2), includingboth linear and nonlinearrank statistics.The linear rank statisticsof Prentice(1978) use the ranksof the failuretimes and the actual covariatevalues, whereas the nonlinearKendall rank correlationof BHK (1974) uses the rank of the failuretimes and the successive rerankingof the covariates at each failure time (?3.3). There are severalbenefitsin belongingto such a generalclass. If we invent a new test, or are interested in an alreadyexistingone, and we can show that it is a memberof the T class,then several extensionsare immediatelyavailable.For example,in the form of (1) it can accommodate weight functions and time-dependentcovariates.Severalof the statisticsof Table 2 were A Class of Nonparametric Survival Tests 169 designed with neither in mind. We can also write down (4) as its variance estimator, which can handle tied failure times and is appropriate even when the censoring distribution depends on the covariate. Conditional on the covariate we still require the failure and censoring mechanisms to be independent. As an example of such an extension, as seen in Section 3.3, the Kendall statistic of BHK (1974) can be modified to handle a weight function and time-dependent covariates. Another benefit of the T class is to extend old tests into completely different covariate settings, sometimes creating new tests, while retaining the same weight function and label. For example, the two-sample log-rank procedure, as shown in Sections 3.1 and 3.3, can be written as either T( 1, X) or T( 1, rln). Extending this to the continuous covariate setting, T( 1, X) is the Cox score statistic and T(1, rln) is a new procedure, a generalized log-rank. Likewise, the entire Tarone-Ware class of two-sample tests can be generalized to the continuous label statistics T(w, X) or T(w, rln). If there are outliers in the covariable space, then instead of using the Cox score test T(w, X), we may opt for a more robust test T(w, Z), where the label Zj(t) is rj(t)/n(t) or perhaps the truncated covariate, the latter retaining most of the original data. For small sample sizes or highly skewed covariate situations, we may decide to replace the covariate by a transformed rank of the covariate or by a transformation of the covariate (?3.4) to have better control over the Type I error. Another benefit of such a general class is in the study of the asymptotic properties. Given a general central limit theorem for T/VY ( Jones and Crowley, unpublished technical report, 1988), asymptotic normality for any member of the T class, specified by particular weight and label functions, can be established as a corollary to the general theorem. Finally, because so many tests differ only by the choice of weight function and label, such a unifying class promotes a better understanding of the relationships among test procedures, which in turn helps in the decision as to which procedure should be used in a given situation. ACKNOWLEDGEMENTS This work was done while the first author was supported by National Cancer Institute training grant T32 CAO9168 and research grant CA39065 and the second author by a NIGMS grant. Data for the example in Section 4 are used with the permission of Dr Harmon Eyre, Chairman of the Brain Tumor Committee of the Southwest Oncology Group, and Dr Charles A. Coltman, Jr., Chairman of the Southwest Oncology Group. RESUMm Tarone et Ware (1977, Biometrika64, 156-160) ont developpe une classe generalede test de s echantillonspour des donnees de survie censureesa droite;cette classe inclut la proceduredu lograng et celle de Wilcoxon modifiee. Par la suite, de nombreuxauteursont etudie des classes pour deux ou s echantillonsen detail. Dans ce papier,on montrequ'une famille de statistiquesnonparametriquesunifie l'existantet permetd'engendrerde nouvellesstatistiquesde test pour le problemede s (> 2) echantillons,le problemed'unetendancepours echantillonset celui d'unecovariablecontinue. REFERENCES Aalen, 0. 0. (1978). Nonparametricinferencefor a familyof countingprocesses.Annalsof Statistics 6, 701-726. Andersen,P. K., Borgan,O., Gill, R. D., and Keiding, N. (1982). Linear nonparametrictests for comparisonof counting processeswith applicationsto censored survival data. International StatisticalReview50, 219-258. Andersen,P. K. and Gill, R. D. (1982). Cox'sregressionmodel for countingprocesses:A large-sample study. Annals of Statistics 10, 1100-1120. Breslow,N. E. (1970). A generalizedKruskal-Wallistest for comparingK samplessubjectto unequal 57, 579-594. patternsof censorship.Biormetrika Breslow,N. E. (1974). Covarianceanalysisof censoredsurvivaldata. Biometrics30, 89-99. 170 Biometrics, March 1989 Breslow, N. E. (1982). Comparison of survival curves. In The Practice of Clinical Trials, M. Buyse, M. Staquet, and R. Sylvester (eds), Chap. 2, Part 6. Oxford: Oxford University Press. Brown, B. W., Jr., Hollander, M., and Korwar, R. M. (1974). Nonparametric tests of independence for censored data, with applications to heart transplant studies. In Reliability and Biometry, Statistical Analysis of Lifelength, F. Proschan and R. J. Serfling (eds), 327-354. Philadelphia: Society for Industrial and Applied Mathematics. Cox, D. R. (1972). Regression models and life-tables (with discussion). Journal of the Royal Statistical Society,Series B 34, 187-220. Cox, D. R. (1975). Partial likelihood. Biometrika 62, 269-276. Efron, B. (1967). The two-sample problem with censored data. In Proceedings of the 5th Berkeley Symposium 4, 831-854. Berkeley, California: University of California Press. Fleming, T. R., Augustini, G. A., Elcombe, S. A., and Offord, K. P. (1986). The SURVDIFF procedure. Supplemental Release, SAS User's Group International. Cary, North Carolina: SAS Institute, Inc. Gehan, E. A. (1965). A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika 52, 203-223. Gill, R. D. (1980). Censoring and stochastic integrals. Mathematical Centre Tracts 124. Amsterdam: Mathematisch Centrum. Lehmann,E. L. (1975).Nonparametrics: StatisticalMethodsBasedon Ranks.San Francisco:HoldenDay. Mantel, N. (1966). Evaluation of survival data and two new rank-order statistics arising in its consideration. Cancer Chemotherapy Reports 50, 163-170. Mehrotra, K. G., Michalek, J. E., and Mihalko, D. (1982). A relationship between two forms of linear rank procedures for censored data. Biometrika 69, 674-676. O'Brien, P. C. (1978). A nonparametric test for association with censored data. Biometrics 34, 243-250. Peto, R. and Peto, J. (1972). Asymptotically efficient rank-invariant test procedures (with discussion). Journalof the Royal StatisticalSociety,SeriesA 135, 185-206. Prentice, R. L. (1978). Linear rank tests with right-censored data. Biometrika 65, 167-179. Prentice, R. L. and Marek, P. (1979). A qualitative discrepancy between censored data rank tests. Biometrics 35, 861-867. Tarone, R. E. (1975). Tests for trend in life table analysis. Biometrika 62, 679-682. Tarone, R. E. and Ware, J. (1977). On distribution-free test for equality of survival distributions. Biometrika 64, 156-160. ReceivedFebruary1987; revisedMarch 1988.
© Copyright 2025 Paperzz