HSRP 734: Advanced Statistical Methods
July 10, 2008

Objectives
Describe the Kaplan-Meier estimated survival curve
Describe the log-rank test
Use SAS to implement

Kaplan-Meier Estimate of Survival Function S(t)
The Kaplan-Meier estimate is a simple, useful, and popular estimate of the survival function.
This estimate incorporates both censored and noncensored observations.
It breaks the estimation problem down into small pieces.

For grouped survival data,

$$\hat{S}(t) = \text{Estimated } \Pr(\text{Survive beyond } t) = \prod_{j:\ \text{bins 1 thru } t} \left(1 - \frac{y_j L_j}{N_j}\right)$$

Let the interval lengths $L_j$ become very small, all of length $L = \Delta t$, and let $t_1, t_2, \ldots$ be the times of events (survival times).

There are 2 cases to consider in the previous equation.

Case 1. No event in a bin (interval)

$$y_j = 0 \ \Rightarrow\ 1 - \frac{y_j L_j}{N_j} = 1 - \frac{0 \cdot L_j}{N_j} = 1$$

$\hat{S}(t)$ does not change, which means that we can ignore bins with no events.

Case 2.
$y_j$ events occur in a bin (interval)
Also:
$n_j$ persons enter the bin
Assume any censored times that occur in the bin occur at the end of the bin

$$1 - \frac{y_j L_j}{N_j} = 1 - \frac{y_j \Delta t}{n_j \Delta t} = \frac{n_j - y_j}{n_j}$$

So, as $\Delta t \to 0$, we get the Kaplan-Meier estimate of the survival function S(t):

$$\hat{S}(t) = \prod_{j:\ t_j \le t} \frac{n_j - y_j}{n_j}, \qquad \hat{S}(0) = 1 \ \text{(by convention)}$$

This is also called the "product-limit estimate" of the survival function S(t).
Note: each conditional probability estimate, $(n_j - y_j)/n_j$, is obtained from the observed number at risk for an event and the observed number of events.

We begin by:
Rank ordering the survival times (including the censored survival times)
Defining each interval as starting at an observed time and ending just before the next ordered time
Identifying the number at risk within each interval
Identifying the number of events within each interval
Calculating the probability of surviving within that interval
Calculating the survival function for that interval as the probability of surviving that interval times the probability of surviving to the start of that interval

Example - AML
Weeks in remission, i.e., time to relapse

Group                         Weeks in remission
Maintenance chemo (X=1)       9, 13, 13+, 18, 23, 28+, 31, 34, 45+, 48, 161+
No maintenance chemo (X=0)    5, 5, 8, 8, 12, 16+, 23, 27, 30+, 33, 43, 45

+ indicates a censored time to relapse; e.g., 13+ = more than 13 weeks to relapse

Calculation of Kaplan-Meier estimates, using

$$\hat{S}(t_j) = \hat{S}(t_{j-1}) \times \frac{n_j - y_j}{n_j}$$

In the "not maintained on chemotherapy" group:

Time t_j   At risk n_j   Events y_j   S^(t_j)
0          12            0            1.000
5          12            2            1.000 x ((12-2)/12) = 0.833
8          10            2            0.833 x ((10-2)/10) = 0.666
12         8             1            0.666 x ((8-1)/8) = 0.583
23         6             1            0.583 x ((6-1)/6) = 0.486
27         5             1            0.486 x ((5-1)/5) = 0.389
33         3             1            0.389 x ((3-1)/3) = 0.259
43         2             1            0.259 x ((2-1)/2) = 0.130
45         1             1            0.130 x ((1-1)/1) = 0

In the "maintained on chemotherapy" group:

Time t_j   At risk n_j   Events y_j   S^(t_j)
0          11            0            1.000
9          11            1            1.000 x ((11-1)/11) = 0.909
13         10            1            0.909 x ((10-1)/10) = 0.818
18         8             1            0.716
23         7             1            0.614
31         5             1            0.491
34         4             1            0.368
48         2             1            0.184

Example - AML (cont'd)
The "Kaplan-Meier curve" plots the estimated survival function vs. time, with separate curves for each group.
[Figure: Kaplan-Meier curves for Maintained=0 and Maintained=1; y-axis: Survival (0.0 to 1.0), x-axis: Time (0 to 150)]

Notes
Can count the number of distinct event times by counting the number of steps; tied events share one step.
If feasible, picture the censoring times on the graph as shown above.

Kaplan-Meier Estimate Using SAS

Comments on the Kaplan-Meier Estimate
If an event time and a censoring time are tied, we assume that the censoring time is slightly larger than the death time.
If the largest observation is an event, the Kaplan-Meier estimate drops to 0 at that time.
If the largest observation is censored, the Kaplan-Meier estimate remains constant after the last event time and never reaches 0.

If we plot the empirical survival estimates, we observe a step function.
If there are no ties and no censoring, the step function drops by 1/n at each event.
With every censored observation, the size of the subsequent steps increases.
When does the number of intervals equal the number of deaths in the sample?
When does the number of intervals equal n?

The Kaplan-Meier estimate is a consistent estimate of the true S(t). That means that as the sample size gets large, the KM estimate converges to the true value.
The Kaplan-Meier estimate can be used to empirically estimate any cumulative distribution function.

The step function in the K-M curve really looks like this:
[Figure: close-up of the K-M step function]
If you have a failure at $t_1$, then you want to say that survivorship at $t_1$ should be less than 1.
For small data sets it matters, but for large data sets it does not matter.
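The product-limit calculation from the AML example can be sketched in a few lines of Python. This is a minimal illustration, not the SAS procedure the slides refer to. The function name `kaplan_meier` and the 0/1 event coding are assumptions made for this sketch. The code carries full precision from step to step, so a third decimal may differ slightly from the hand-rounded table (for example 0.667 rather than 0.666 at week 8).

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t_j, n_j, y_j, S(t_j))] for each distinct event time.

    times  -- observed times (event or censoring)
    events -- 1 if the time is an event, 0 if it is censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    rows = []
    surv = 1.0  # S(0) = 1 by convention
    for t in sorted(deaths):
        # Number at risk: everyone whose observed time is >= t. A time
        # censored exactly at t is still counted, i.e. it is treated as
        # slightly larger than the event time.
        n_j = sum(1 for u in times if u >= t)
        y_j = deaths[t]
        surv *= (n_j - y_j) / n_j
        rows.append((t, n_j, y_j, surv))
    return rows


# AML example; an event flag of 0 marks a censored time (the "+" values).
times0 = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]    # X=0
events0 = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
times1 = [9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161]    # X=1
events1 = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]

km0 = kaplan_meier(times0, events0)
km1 = kaplan_meier(times1, events1)

for label, rows in (("X=0", km0), ("X=1", km1)):
    print(label)
    for t, n_j, y_j, s in rows:
        print(f"  t={t:>3}  at risk={n_j:>2}  events={y_j}  S={s:.3f}")
```

Counting the number at risk as "observed time >= t" is what implements the tie convention: a subject censored at 13 weeks is still at risk for the event at 13 weeks.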
Confidence Interval for S(t) - Greenwood's Formula
Greenwood's formula for the variance of $\hat{S}(t)$:

$$\widehat{\mathrm{Var}}\left(\hat{S}(t)\right) = \hat{S}(t)^2 \sum_{j:\ t_j \le t} \frac{y_j}{n_j (n_j - y_j)}$$

Using Greenwood's formula, an approximate 95% CI for S(t) is

$$\hat{S}(t) \pm 1.96 \sqrt{\widehat{\mathrm{Var}}\left(\hat{S}(t)\right)}$$

There is a "problem": the 95% CI is not constrained to lie within the interval (0,1).

Confidence Interval for S(t) - Alternative Formula
Based on $\log(-\log S(t))$, which ranges from $-\infty$ to $\infty$
Find the standard error of the above, find the CI of the above, then transform the CI to one for S(t)
This CI will lie within the interval [0,1]
This is the default in SAS

Log-rank test for comparing survivor curves
Are two survivor curves the same?
Use the times of events: $t_1, t_2, \ldots$ (do not include censoring times)
Treat each event and its "set of persons still at risk" (i.e., risk set) at each time $t_j$ as an independent table
Make a 2×2 table at each $t_j$

           Event    No Event       Total
Group A    a_j      n_jA - a_j     n_jA
Group B    c_j      n_jB - c_j     n_jB
Total      d_j      n_j - d_j      n_j

At each event time $t_j$, under the assumption of equal survival (i.e., $S_A(t) = S_B(t)$), the expected number of events in Group A out of the total events ($d_j = a_j + c_j$) is in proportion to the number at risk in Group A relative to the total at risk at time $t_j$:

$$E(a_j) = d_j \times \frac{n_{jA}}{n_j}$$

Differences between $a_j$ and $E(a_j)$ represent evidence against the null hypothesis of equal survival in the two groups.

Use the Cochran-Mantel-Haenszel idea of pooling over events $j$ to get the log-rank chi-squared statistic with one degree of freedom:

$$\chi^2 = \frac{\left[\sum_j \left(a_j - E(a_j)\right)\right]^2}{\sum_j \widehat{\mathrm{Var}}(a_j)} \sim \chi^2_1$$

$$\widehat{\mathrm{Var}}(a_j) = \frac{d_j (n_j - d_j)\, n_{jA}\, n_{jB}}{n_j^2 (n_j - 1)}$$

Idea summary:
Create a 2×2 table at each uncensored failure time
The construction of each 2×2 table is based on the corresponding risk set
Combine information from all the tables
The null hypothesis is $S_A(t) = S_B(t)$ for all time t.
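The pooling of 2×2 tables can be sketched directly from the formulas for $E(a_j)$ and $\widehat{\mathrm{Var}}(a_j)$. The following Python is a minimal illustration under stated assumptions, not the SAS implementation: the function name `log_rank`, the 0/1 event coding, and the choice of the maintained group as "Group A" are all made up for this sketch. The p-value uses the fact that a chi-squared variable with one degree of freedom is a squared standard normal.

```python
import math


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test: returns (observed, expected, variance, chi2, p)."""
    # Only uncensored failure times define tables.
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e == 1}
        | {t for t, e in zip(times_b, events_b) if e == 1}
    )
    observed = expected = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)  # at risk in Group A
        n_b = sum(1 for u in times_b if u >= t)  # at risk in Group B
        a = sum(1 for u, e in zip(times_a, events_a) if u == t and e == 1)
        c = sum(1 for u, e in zip(times_b, events_b) if u == t and e == 1)
        n, d = n_a + n_b, a + c
        observed += a
        expected += d * n_a / n
        if n > 1:  # a table with one person at risk contributes no variance
            variance += d * (n - d) * n_a * n_b / (n**2 * (n - 1))
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of chi-squared with 1 df: P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return observed, expected, variance, chi2, p_value


# AML example; an event flag of 0 marks a censored time (the "+" values).
maintained_t = [9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161]
maintained_e = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
not_maintained_t = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
not_maintained_e = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]

obs, exp, var, chi2, p = log_rank(
    maintained_t, maintained_e, not_maintained_t, not_maintained_e
)
print(f"observed={obs:.0f} expected={exp:.2f} variance={var:.2f}")
print(f"chi2={chi2:.2f} p={p:.3f}")
```

Swapping which group is labeled A changes the sign of the summed difference but not the statistic, since the difference is squared and the variance is symmetric in $n_{jA}$ and $n_{jB}$.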
Comparisons across Groups
Extensions of the log-rank test to several groups require knowledge of matrix algebra.
In general, these tests are well approximated by a chi-squared distribution with G-1 degrees of freedom, where G is the number of groups.
Alternative tests:
Wilcoxon family of tests (including the Peto test)
Likelihood ratio test (SAS)

Comparison between Log-Rank and Wilcoxon Tests
The log-rank test weights each failure time equally. No parametric model is assumed for failure times within a stratum.
The Wilcoxon test weights each failure time by a function of the number at risk. Thus, more weight tends to be given to early failure times. As in the log-rank test, no parametric model is assumed for failure times within a stratum.
Between these two tests, the Wilcoxon test will tend to be better at picking up early departures from the null hypothesis, and the log-rank test will tend to be more sensitive to departures in the tail.

Comparison with Likelihood Ratio Test in SAS
The likelihood ratio test employed in SAS assumes that the data within the various strata are exponentially distributed and that censoring is noninformative.
Thus, this is a parametric method that smooths across the entire curve.
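The difference between the two weighting schemes can be made concrete with a weighted version of the pooled statistic, $\left[\sum_j w_j (a_j - E(a_j))\right]^2 / \sum_j w_j^2\, \widehat{\mathrm{Var}}(a_j)$. The Python below is a sketch under assumptions, not SAS's implementation: the function name `weighted_test` is hypothetical, and the Wilcoxon weight is taken to be $w_j = n_j$ (Gehan's choice), which is only one member of the Wilcoxon family mentioned above. A constant weight recovers the log-rank statistic.

```python
def weighted_test(times_a, events_a, times_b, events_b, weight):
    """Chi-squared statistic (1 df) with weight w_j = weight(n_j) per table."""
    event_times = sorted(
        {t for t, e in zip(times_a + times_b, events_a + events_b) if e == 1}
    )
    num = den = 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)  # at risk in Group A
        n_b = sum(1 for u in times_b if u >= t)  # at risk in Group B
        a = sum(1 for u, e in zip(times_a, events_a) if u == t and e == 1)
        c = sum(1 for u, e in zip(times_b, events_b) if u == t and e == 1)
        n, d = n_a + n_b, a + c
        w = weight(n)
        num += w * (a - d * n_a / n)  # weighted observed minus expected
        if n > 1:
            den += w**2 * d * (n - d) * n_a * n_b / (n**2 * (n - 1))
    return num**2 / den


# AML example; an event flag of 0 marks a censored time (the "+" values).
maintained_t = [9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161]
maintained_e = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
not_maintained_t = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
not_maintained_e = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]

data = (maintained_t, maintained_e, not_maintained_t, not_maintained_e)
log_rank_chi2 = weighted_test(*data, weight=lambda n: 1.0)      # equal weights
wilcoxon_chi2 = weighted_test(*data, weight=lambda n: float(n))  # assumed w_j = n_j

print(f"log-rank chi2 = {log_rank_chi2:.2f}")
print(f"Wilcoxon (Gehan weights) chi2 = {wilcoxon_chi2:.2f}")
```

Because the number at risk shrinks over time, weighting by $n_j$ gives early tables more influence, which is the mechanism behind the Wilcoxon test's sensitivity to early departures from the null hypothesis.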