Leukemia (2011) 25, 1433–1438
© 2011 Macmillan Publishers Limited. All rights reserved 0887-6924/11
www.nature.com/leu

ORIGINAL ARTICLE

Recommendations to meet statistical challenges arising from endpoints beyond overall survival in clinical trials on chronic myeloid leukemia

M Pfirrmann1, A Hochhaus2, M Lauseker1, S Saußele3, R Hehlmann3 and J Hasford1

1Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie - IBE, Ludwig-Maximilians-Universität, München, Germany; 2Abteilung Hämatologie/Onkologie, Universitätsklinikum Jena, Jena, Germany and 3III. Medizinische Klinik, Medizinische Fakultät Mannheim der Universität Heidelberg, Mannheim, Germany

The aim of this work was to provide guidelines for appropriate statistical analyses regarding the most common endpoints in clinical trials on chronic myeloid leukemia: hematologic, cytogenetic and molecular results, failure-free and event-free survival, and progression-free and overall survival. The reasons for the specified recommendations are explained and important issues are outlined by comprehensive examples. Particular attention is paid to warning against the application of suboptimal methods that may lead to seriously biased results and conclusions. In the presence of a competing risk like death, Kaplan-Meier analysis should not be applied for time-to-remission endpoints. The appropriate method to estimate the probabilities of a time-to-remission endpoint is the calculation of its cumulative incidence function. However, the exact date of remission is rarely known. Detection of remission depends strongly on evaluation frequencies. Complex composite endpoints comprising many events with considerably heterogeneous severity imply difficulties with interpretation. Time-to-remission and complex composite endpoints are not recommended for primary judgment on efficacy.
It is rather advisable to investigate remission status at a fixed time point as a primary endpoint, followed by progression-free and overall survival. For patients with the intended remission success at the time point of interest, relapse-free survival provides an additional primary outcome.
Leukemia (2011) 25, 1433–1438; doi:10.1038/leu.2011.116; published online 20 May 2011
Keywords: chronic myeloid leukemia; clinical trials; competing risks; time-to-event endpoints; composite endpoints

Correspondence: Dr M Pfirrmann, Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie - IBE, Ludwig-Maximilians-Universität, Marchioninistraße 15, 81377 München, Germany.
E-mail: [email protected]
Received 1 April 2011; accepted 26 April 2011; published online 20 May 2011

Introduction

Overall survival (OS) used to be the undisputed primary endpoint for efficacy evaluation of treatments in chronic myeloid leukemia (CML). Apart from its role as the main clinical concern, OS with death from any cause as the only event has some convenient characteristics from a pure statistical point of view. The date of the event is known exactly for almost every patient. Provided that follow-up was sufficiently long, complete and did not differ between the randomized treatment groups and provided that censoring carried no prognostic information about subsequent failure time, analysis of OS probabilities should lead to unbiased results.1 The statistical methods for calculating Kaplan-Meier survival plots, the P-value (mostly) of the log-rank test, estimates of treatment effects, and corresponding confidence intervals, are well documented.1–4

Since the beginning of this century, tyrosine kinase inhibitors (TKIs) have become the treatment of choice for CML. Due to early promising results, already in the phase III clinical trial (IRIS) leading to the approval of imatinib (the first TKI) for newly diagnosed CML patients, efficacy assessment was not based on OS any longer.5 The same applies to later trials like the recent ones comparing imatinib in different doses and treatment combinations,6 imatinib with nilotinib7 or with dasatinib.8 Instead of OS, the rates of major molecular remission (MMR) and of complete cytogenetic remission (CCR), both at 12 months, had been the primary endpoints.6–8 They were preferred over the classical endpoint because TKIs improved OS in a way that possible statistically significant differences between treatments will not be observed for a long time.9 With the need to arrive at insights on therapeutic improvement earlier on, decisions on the efficacy of a new therapy are based on remission parameters like MMR and CCR which are regarded as surrogate markers for OS. Cumulative probability curves for complex composite endpoints and endpoints like ‘time to the first MMR’ became more and more common.

Methods

It is our observation that the analysis of time-to-remission endpoints is still erroneously performed by plotting 1–KM (Kaplan-Meier) curves: ‘death before a possible observation of a first MMR’ is censored at the time it occurred and the fact that it actually is a competing event (risk) to the event of interest, which is ‘first MMR’, is ignored. Instead of 1–KM curves, cumulative incidence functions (CIFs) for ‘time to first MMR’ with appropriate consideration of competing risks must be calculated.10–12 As not only the procedure of computing CIFs but also its necessity is not well understood, we present a simple example which could easily be recalculated. Not all statistical software packages support CIF analysis.
Hence, a program published by Scrucca et al.13 based on the freely available programming software R is described in the Supplementary appendix in detail. Time-to-remission endpoints are compared with ‘remission at a certain time’. We give reasons why the latter should be preferred for a primary outcome. Explicit proceedings in determining the remission status at a certain time are provided. Finally, methodological issues regarding the consideration of composite endpoints like progression-free survival, event-free and failure-free survival are discussed.

Results

Time-to-event endpoints and competing risks
With the following example based on data of five hypothetical patients, we would like to illustrate a typical situation where the calculation of CIFs instead of 1–KM curves is needed. Let us assume that the primary event of interest was ‘time to first CCR after randomization’ with CCR as defined by the European LeukemiaNet.14 For the moment, the dates of first CCR are supposed to be known exactly.

Erroneous censoring of competing risks through use of the Kaplan-Meier estimator. Columns (1–3) of Table 1 display the ordered event times $t_j$, $j = 1, \ldots, 5$, and the quality of the event observed for the five patients. At the time of randomization ($j = 0$), none of the patients was in cytogenetic remission. The fourth column contains the number of patients at risk $n_j$ just prior to event time $t_j$ which is followed by the total number of events $e_j$ at $t_j$. In ignoring ‘death before a (possible) CCR’ as a competing risk and calculating the familiar KM estimator with first CCR as the only event,

$$\hat{S}_{CCR}(t_j) = \hat{S}_{CCR}(t_{j-1}) \left( 1 - \frac{e_{jCCR}}{n_j} \right), \qquad (1)$$

‘death’ is censored and formally set to zero (columns (6–7)). By definition, Kaplan-Meier estimates change only at event times.1 Thus, it is sufficient to apply the recursive formula at times $t_j$ where $e_j > 0$. With $t_0 = 0$ and $\hat{S}_{CCR}(0) = 1$, at $t_1 = 93$ the probability of survival without the achievement of a first CCR is estimated by

$$\hat{S}_{CCR}(t_1 = 93) = \hat{S}_{CCR}(t_0 = 0) \left( 1 - e_{1CCR}/n_1 \right) = 1 \cdot (1 - 1/5) = 0.8$$

and so on. Then, it is often claimed that the complement 1–KM (Table 1, column (8) and Figure 1) represents the estimated probability of observing the event of interest (first CCR) up to a time t. A prerequisite for the application of the Kaplan-Meier estimator is ‘non-informative censoring’ meaning that the event of interest might be observed in the future, that is, the patient is still ‘at risk’.1,3,4 It implies that the reason for censoring carries no information on the subsequent event probabilities. For the competing event ‘death before a possible first CCR,’ this prerequisite is clearly not met. If a patient died, he is not at risk any longer and the probability of a later first CCR is reduced to zero. Consequently, the erroneous censoring of competing risks leads to systematically biased (overestimated) probabilities of observing the event of interest.12,15 From $t_5 = 523$, the wrongly estimated probabilities of the two mutually exclusive events ‘achievement of a first CCR’ and ‘death before achievement of a first CCR’ even add up to a total probability greater than one (Figure 1):

$$\left( 1 - \hat{S}_{CCR}(t_5 = 523) \right) + \left( 1 - \hat{S}_{death}(t_5 = 523) \right) = 1 + 0.25 = 1.25.$$

Table 1 Inappropriate versus appropriate estimation of the probabilities of achievement of a first CCRa

Columns (6–8): first CCR as the only event, competing risk(s) inappropriately censored. Columns (9–12): first CCR as the event of interest, competing risk(s) appropriately considered.

| (1) Event time ordering $j$ | (2) Event time in days $t_j$ | (3) Event | (4) No. at risk $n_j$ | (5) Total no. of events $e_j$ | (6) No. of first CCR $e_{jCCR}$ | (7) Estimated survival w/o CCR $\hat{S}_{CCR}(t_j)$ | (8) Estimated 1–KM probability $1 - \hat{S}_{CCR}(t_j)$ | (9) Estimated survival w/o any event $\hat{S}(t_j)$ | (10) Estimated CCR rate $\hat{\lambda}_{CCR}(t_j)$ | (11) Estimated CCR probability $\hat{p}_{CCR}(t_j)$ | (12) Estimated cumulative incidence $\hat{I}_{CCR}(t_j)$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | - | 5 | - | - | 1 | 0 | 1 | - | - | 0 |
| 1 | 93 | 1st CCR | 5 | 1 | 1 | 0.8 | 0.2 | 0.8 | 0.2 | 0.2 | 0.2 |
| 2 | 147 | Death | 4 | 1 | 0 | 0.8 | 0.2 | 0.6 | 0 | 0 | 0.2 |
| 3 | 209 | 1st CCR | 3 | 1 | 1 | 0.53 | 0.47 | 0.4 | 0.33 | 0.2 | 0.4 |
| 4 | 360 | Censoring | 2 | 0 | 0 | 0.53 | 0.47 | 0.4 | 0 | 0 | 0.4 |
| 5 | 523 | 1st CCR | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0.4 | 0.8 |

Abbreviations: CCR, complete cytogenetic remissions; KM, Kaplan-Meier.
a It was assumed that the exact dates of the first achievement of a CCR were known.

[Figure 1: step curves of the probabilities of achieving a 1st CCR and of death before a 1st CCR; x-axis: months after randomization (0 to 18); y-axis: probability estimate according to 1–KM (0.0 to 1.0).]
Figure 1 Estimation of probabilities of ‘achieving a first CCR’ and of ‘death before achieving a first CCR’ based on the Kaplan-Meier method. The dates of both, mutually exclusive events were supposed to be known exactly. ‘Death before first CCR’ and ‘first CCR (before death)’ were the competing events. At the time of the last event ($t_5 = 523$), the probabilities of the curves add up to 1.25.

Appropriate consideration of a competing risk: calculation of the cumulative incidence function. The appropriate approach to estimate the probabilities of achieving a first CCR in the presence of competing risks is the calculation of the CIF.10–12,15 Following the proceeding of Putter et al., the first step is the estimation of the Kaplan-Meier probabilities of being event-free in relation to any kind of the competing events:12

$$\hat{S}(t_j) = \hat{S}(t_{j-1}) \left( 1 - \frac{e_j}{n_j} \right). \qquad (2)$$

In our example, refer to Table 1, columns (4, 5, 9). Next, we calculate the hazard that a patient experiences event k at time $t_j$, with k = ‘(first) CCR’ or k = ‘death’, under the condition that the patient had no event up to time $t_j$:

$$\hat{\lambda}_k(t_j) = \frac{e_{jk}}{n_j}. \qquad (3)$$

Accordingly, $\hat{\lambda}_{CCR}(t_j)$ (column (10)) is calculated dividing column (6) by column (4). To receive the unconditional probability of observing event k at $t_j$, $\hat{p}_k(t_j)$ (column (11)), we need to multiply the event rate $\hat{\lambda}_k(t_j)$ by the probability of still being event-free at $t_j$, $\hat{S}(t_{j-1})$:

$$\hat{p}_k(t_j) = \hat{\lambda}_k(t_j) \, \hat{S}(t_{j-1}). \qquad (4)$$

Because the event-free survival condition became an integrated part of $\hat{p}_k(t_j)$, the estimator expresses the probability of observing event k at $t_j$, without further constraint. Finally, the cumulative incidence of event k, at $t_j$, $\hat{I}_k(t_j)$ (column (12)), is given by

$$\hat{I}_k(t_j) = \hat{I}_k(t_{j-1}) + \hat{p}_k(t_j). \qquad (5)$$

At $t_1 = 93$, the event rate for a first CCR is $\hat{\lambda}_{CCR}(t_1 = 93) = e_{1CCR}/n_1 = 1/5 = 0.2$. For the probability of observing a first CCR at $t_1$, we calculate $\hat{p}_{CCR}(t_1 = 93) = \hat{\lambda}_{CCR}(t_1 = 93) \, \hat{S}(t_0 = 0) = 0.2 \cdot 1 = 0.2$. The CIF of a first CCR at $t_1$ is then estimated by $\hat{I}_{CCR}(t_1 = 93) = \hat{I}_{CCR}(t_0 = 0) + \hat{p}_{CCR}(t_1 = 93) = 0 + 0.2 = 0.2$ and so on (Table 1, columns (9–12) and Figure 2).
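The five-patient example can be recalculated with a few lines of code. The following is a minimal sketch in Python rather than the R program referred to in the text; the function names and the event labels "ccr", "death" and "censored" are assumptions made for illustration, and the code assumes that no two patients share an event time (as in Table 1).

```python
from fractions import Fraction

# Five hypothetical patients of Table 1: (event time in days, event type).
# The labels "ccr", "death" and "censored" are names chosen for this sketch.
observations = [(93, "ccr"), (147, "death"), (209, "ccr"),
                (360, "censored"), (523, "ccr")]


def one_minus_km(observations, event):
    """Naive 1-KM curve for `event`; every other outcome is treated as censored."""
    surv = Fraction(1)
    at_risk = len(observations)
    curve = []
    for time, kind in sorted(observations):
        if kind == event:
            surv *= 1 - Fraction(1, at_risk)   # equation (1)
        curve.append((time, 1 - surv))
        at_risk -= 1                           # one patient leaves the risk set
    return curve


def cumulative_incidence(observations, event):
    """CIF for `event`; all other non-censoring outcomes count as competing events."""
    event_free = Fraction(1)                   # S(t_{j-1}), survival without any event
    at_risk = len(observations)
    cif = Fraction(0)
    curve = []
    for time, kind in sorted(observations):
        if kind == event:
            cif += Fraction(1, at_risk) * event_free   # equations (3) to (5)
        if kind != "censored":
            event_free *= 1 - Fraction(1, at_risk)     # equation (2)
        curve.append((time, cif))
        at_risk -= 1
    return curve


km_total = one_minus_km(observations, "ccr")[-1][1] + one_minus_km(observations, "death")[-1][1]
cif_total = cumulative_incidence(observations, "ccr")[-1][1] + cumulative_incidence(observations, "death")[-1][1]
print(float(km_total), float(cif_total))
```

With the data above, the two 1–KM estimates at day 523 sum to 1.25, whereas the two CIF estimates sum to 1, in agreement with columns (8) and (12) of Table 1.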
Comparing column (8) with column (12) and Figure 1 with Figure 2, we learn that the differences between the 1–KM and the CIF estimators appear as soon as the first event of interest directly after the first detection of a competing risk has been recorded. Both estimators go up to their maximum possible value, if all subjects with the longest observation time experience an event of interest. Whatever the statistic, a sample of n = 5 patients cannot provide reliable estimations. But as the probabilities of the CIF estimator are appropriately corrected for competing risks, the observation of an estimated total probability greater than one for the mutually exclusive events is now prevented (Figure 2):

$$\hat{I}_{CCR}(t_5 = 523) + \hat{I}_{death}(t_5 = 523) = 0.8 + 0.2 = 1.$$

Figure 2 Estimation of probabilities of ‘achieving a first CCR’ and of ‘death before achieving a first CCR’ based on the CIF. The dates of both, mutually exclusive events were supposed to be known exactly. ‘Death before first CCR’ and ‘first CCR (before death)’ were the competing events. At the time of the last event ($t_5 = 523$), the probabilities of the curves add up to 1.

Some features around the calculation of cumulative incidence functions. The observation time of patients with no events should be restricted to the date of the last confirmation that none of the competing events had been experienced. If more than one competing risk is considered, the CIF probabilities of the event of interest are calculated with all other competing events formally grouped together. For outcome interpretation, each of the mutually exclusive competing events should be examined in turn.
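That the CIF estimates of mutually exclusive events cannot add up to more than one follows from equations (2) to (5). A brief derivation sketch in the notation of the example, where $K$ denotes the set of competing event types (a symbol introduced here for illustration only) and $e_j = \sum_{k \in K} e_{jk}$:

```latex
\begin{align*}
\sum_{k \in K} \hat{p}_k(t_j)
  &= \hat{S}(t_{j-1}) \sum_{k \in K} \frac{e_{jk}}{n_j}
   = \hat{S}(t_{j-1}) \, \frac{e_j}{n_j}
   = \hat{S}(t_{j-1}) - \hat{S}(t_j), \\
\sum_{k \in K} \hat{I}_k(t_j)
  &= \sum_{i=1}^{j} \bigl( \hat{S}(t_{i-1}) - \hat{S}(t_i) \bigr)
   = 1 - \hat{S}(t_j) \le 1 .
\end{align*}
```

In the five-patient example, $\hat{S}(t_5) = 0$, so the cumulative incidences of ‘first CCR’ and ‘death before first CCR’ sum to exactly 1.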
Instead of the inappropriate use of the log-rank test, the comparison of CIF estimates between two or more patient groups can be performed through the Gray test.16 A widespread proposal to calculate 95% pointwise confidence intervals for the CIF estimates was introduced by Choudhury.17 Scrucca et al.13 published a program providing many functions for the analyses of CIFs. The program is based on the R package ‘cmprsk’. Its application is described in the Supplementary appendix (see also Supplementary Figure 1).

Critical considerations about time-to-event analysis of remission parameters. At the beginning, we made the assumption that the exact date of a first CCR was known. Actually, this is rarely the case. According to many study protocols, patients should have a cytogenetic evaluation every three or six months within the first years of treatment. Thus, the real, unknown date of a first CCR lies within a time interval between scheduled visits leading to delayed, ‘interval-censored’12 observation of the first CCR. Statistical tools might be suggested to deal with interval-censored data, also in the presence of competing risks. However, the first detection of a CCR depends on protocol compliance, too. Patients will not see the doctor exactly every three or six months. Many patients will have cytogenetic evaluations less frequently than intended. Hence, CCR discovery is further delayed or even missed. In addition, relapses are not accounted for. A treatment might produce shorter times to a first CCR, but also shorter times to a loss of CCR afterwards. For these reasons, we prefer ‘remission status at a certain time point’.

Remission status at a certain time point as a primary outcome
In evaluating remission status at a certain time point, several issues have to be considered.

Assignment of remission outcome to a time point.
At first, clinicians have to determine reasonable time points during the course of therapy at which the evaluation of a remission variable should be performed. Every result should be assignable to exactly one of these time points. For instance, with a three-monthly evaluation of CCR status (CCR: yes or no), the following assignment of results in dependence on t = ‘cytogenetic evaluation time after randomization’ is suggested:
1.5 < t ≤ 4.5 months is allocated to the time point 3 months
4.5 < t ≤ 7.5 months is allocated to the time point 6 months
7.5 < t ≤ 10.5 months is allocated to the time point 9 months
and so on. Obviously, any drug needs some time to induce cytogenetic remission in Philadelphia chromosome (Ph)-positive14 CML patients. For this reason and to obtain identical duration for the time intervals, 1.5 months was chosen as the beginning of the first interval. All cut offs lie exactly halfway between two subsequent time points.

More than one result at a time. Some patients will have more than one evaluation attributable to a certain time point. Before a first look at the data, a proceeding of how to choose one out of several results has to be defined. It is of utmost importance that this definition is independent of the remission outcome. As an example, we determine the CCR status 18 months after randomization. The corresponding time interval is given by 16.5 < t ≤ 19.5 months. Of all evaluations falling into this period, our priority is to select the closest result to 18 months which was evaluated ≥18 months after randomization. If there was no such t, the closest result to 18 months out of 16.5 < t < 18 months is chosen. The motivation behind this is the preference of a result after the (possible) completion of the whole treatment period (of 18 months) over maximum adjacency to the investigated time point.
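The allocation of evaluation times to time points and the outcome-independent choice among several results can be written down as explicit rules. The sketch below is one possible reading of the proceeding described above; the function names, the parameter names and the representation of an evaluation as a (months after randomization, CCR yes/no) pair are assumptions made for illustration.

```python
import math


def assign_time_point(t_months, spacing=3.0):
    """Map an evaluation time (months after randomization) to its scheduled time point.

    With three-monthly evaluations the windows are 1.5 < t <= 4.5 -> 3 months,
    4.5 < t <= 7.5 -> 6 months, and so on. Returns None for t <= 1.5 months.
    """
    half = spacing / 2
    if t_months <= half:
        return None
    return spacing * math.ceil((t_months - half) / spacing)


def select_result(evaluations, time_point=18.0, half_width=1.5):
    """Choose one (t_months, ccr) evaluation for a time point without looking at the outcome.

    Priority 1: the evaluation closest to the time point among those at or after it.
    Priority 2: otherwise, the closest evaluation before it that still lies in the window.
    """
    lower, upper = time_point - half_width, time_point + half_width
    in_window = [e for e in evaluations if lower < e[0] <= upper]
    at_or_after = [e for e in in_window if e[0] >= time_point]
    if at_or_after:
        return min(at_or_after, key=lambda e: e[0])
    before = [e for e in in_window if e[0] < time_point]
    return max(before, key=lambda e: e[0]) if before else None


# Hypothetical evaluations of one patient: the result at 18.6 months is chosen
# over the one at 17.8 months although the latter is closer to 18 months.
print(select_result([(17.8, True), (18.6, False)]))
```

Because the selection key uses only the evaluation time, the rule cannot be influenced by the remission result itself, which is the property the text asks for.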
To support the availability of results just after the completed period, physicians could be asked to see patients directly after 18 months rather than at 18 months. Choosing time point and response level for the primary endpoint. To decide on the optimal time for the primary endpoint and the level of response, it is important to balance risk against gain of information. On the one hand, the endpoint evaluation should be early enough to prevent an undue amount of serious events like blast crisis14 or deaths. On the other hand, the primary endpoint should also be late enough to catch the bulk of the expected number of responses. Furthermore, at the time of choice, the achievement of the clinical goal connected with the primary endpoint, for example, CCR needs to show sufficient stability. Finally, a noticeable quality as a surrogate marker for survival should have been established for the endpoint.18 For instance, a landmark analysis19 may suggest that the CCR status at 18 months carries considerable prognostic information with respect to progression-free survival.20 In many cases, the decision on time point and response level of the primary endpoint has to be based on information that was retrieved from pilot studies or formerly common treatments. Determination of remission outcome. With CCR status at 18 months as the primary endpoint, a cytogenetic evaluation at this time point is inevitable. A substitution by the result of hematologic or molecular remission is not satisfactory. A very small BCR-ABL to ABL ratio14 might imply a CCR. But what level does still imply CCR? 
Taking the previous considerations with regard to CCR status determination at 18 months into account, the following classification is suggested:
Patients without detectable Ph: CCR: yes
Ph-positive patients: CCR: no
Patients who died within 19.5 months: CCR: no
Patients whose disease phase changed to accelerated phase21 or blast crisis (disease progression)21 within 19.5 months: CCR: no
Patients who experienced a severe (grade 3 or 4) adverse event (AE)22 and had to stop treatment within 19.5 months: CCR: no.
Disease progression and severe AEs are serious indicators for treatment failure. Regarding them as early negative remission outcomes, both events should be considered as competing risks in prior CIF analyses, too. In any case, it is important to describe each expression of lacking remission success and its probabilities on its own.

Handling of missing values. Using the last reported value before the time point of interest (last value carried forward (LVCF)) is not encouraged. LVCF does not work for nonresponders in a period of rapid increases of the remission rate. It is likely that a patient with CCR at 6 or 9 months has the same status at 12 months. However, a patient with no CCR at 6 or 9 months could well be in CCR at 12 months.20 Thus, the application of LVCF in substituting the missing values at 12 months by the observed values at 6 or 9 months would underestimate the proportion of patients with CCR at 12 months. Two further approaches are common: either the observations are counted as ‘no CCR’ or they are discarded from analysis. In case of the first approach, the probability of CCR is systematically underestimated. Both approaches share the problem that the amount of and the reasons for missing values could be associated with the randomized treatment, that is, data is not missing at random. To circumvent at least systematic underestimation, we prefer the second approach and work with complete data only.
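The suggested classification and the complete-case handling of missing values can be sketched as follows. The data structure, its field names and the rule that an early failure event takes precedence over any cytogenetic result are assumptions made for this illustration; ‘within 19.5 months’ is read here as ‘at or before 19.5 months’.

```python
from dataclasses import dataclass
from typing import Optional

WINDOW_END = 19.5  # months; upper bound of the window for the 18-month time point


@dataclass
class Patient:
    # All field names are hypothetical and chosen for this sketch only.
    ph_detected: Optional[bool] = None            # selected 18-month cytogenetic result; None if missing
    death_month: Optional[float] = None
    progression_month: Optional[float] = None     # accelerated phase or blast crisis
    severe_ae_stop_month: Optional[float] = None  # grade 3 or 4 AE that ended treatment


def ccr_status_18(p: Patient) -> Optional[bool]:
    """Return True (CCR: yes), False (CCR: no) or None (outcome missing)."""
    for month in (p.death_month, p.progression_month, p.severe_ae_stop_month):
        if month is not None and month <= WINDOW_END:
            return False                          # early failure counts as 'CCR: no'
    if p.ph_detected is None:
        return None                               # no usable cytogenetic evaluation
    return not p.ph_detected


def ccr_proportion_complete_cases(patients):
    """Proportion with CCR among patients with a determinable status (complete data only)."""
    statuses = [s for s in map(ccr_status_18, patients) if s is not None]
    return sum(statuses) / len(statuses) if statuses else None


# Hypothetical cohort: one CCR, one Ph-positive patient, one death, one missing result.
cohort = [Patient(ph_detected=False), Patient(ph_detected=True),
          Patient(death_month=10.0), Patient()]
print(ccr_proportion_complete_cases(cohort))
```

Counting the patient with the missing result as ‘no CCR’ instead would lower the estimate from 1/3 to 1/4 in this small illustration, which is the direction of bias the text describes.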
However, in particular with a primary endpoint, missing outcome data should be avoided in any circumstances. Together with the treatment group, the reasons for missing values need to be recorded.

Composite time-to-event endpoints
For discussion, four time-to-event endpoints are introduced:
Overall survival
Events: Death from any cause
Progression-free survival
Events: Death from any cause; beginning of blast crisis; beginning of accelerated phase
Event-free survival
Events: Same as progression-free survival plus observation of a severe AE (grade 3 or 4)
Failure-free survival
Events: Same as event-free survival plus less than complete hematologic remission (CHR)14 at 3 months; less than CCR at 18 months; loss of partial cytogenetic remission (PCR).14

If at least two different events are summarized as a result of the same kind, the corresponding outcome variable is usually denoted as a ‘composite endpoint’. Whichever of the three events ‘accelerated phase’, ‘blast crisis’ or ‘death’ is observed first ends the time of the composite endpoint progression-free survival (PFS) in patients who started treatment in chronic phase.21 With the extension by ‘severe AE’ as a fourth event, event-free survival (EFS) provides an additional composite endpoint. A definition of failure-free survival (FFS) might comprise all events of EFS plus ‘loss of PCR’ which means that >35% Ph-positive metaphases were observed, again.14 At diagnosis, almost every patient has >35% Ph-positive metaphases. To avoid the bias that patients who have never achieved a CCR are never at risk for a loss, FFS should also consider the failure to obtain remission until a certain time. For instance, ‘no CCR at 18 months’ could be regarded as an extra event. Further based on the experience with TKIs, ‘no CHR at 3 months’ may be added.
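The nesting of the four endpoints, and the rule that the first observed component event ends the time of a composite endpoint, can be illustrated with a small sketch. The event labels, the dictionary layout and the function name are hypothetical; times are months after randomization.

```python
# Component events per endpoint, following the definitions in the text.
# The string labels are names chosen for this sketch.
PFS_EVENTS = ("death", "blast_crisis", "accelerated_phase")
EFS_EVENTS = PFS_EVENTS + ("severe_ae",)
FFS_EVENTS = EFS_EVENTS + ("no_chr_at_3_months", "no_ccr_at_18_months", "loss_of_pcr")
ENDPOINTS = {"OS": ("death",), "PFS": PFS_EVENTS, "EFS": EFS_EVENTS, "FFS": FFS_EVENTS}


def composite_endpoint(event_months, last_follow_up, endpoint):
    """Return (time, first_event) for one patient.

    `event_months` maps event labels to the month of occurrence. If none of the
    endpoint's component events was observed, the patient is censored at
    `last_follow_up` and first_event is None.
    """
    observed = [(event_months[name], name) for name in ENDPOINTS[endpoint]
                if event_months.get(name) is not None]
    if not observed:
        return last_follow_up, None
    return min(observed)   # earliest component event defines the endpoint


# Hypothetical patient: severe AE at 7 months, accelerated phase at 20, death at 31.
patient = {"severe_ae": 7.0, "accelerated_phase": 20.0, "death": 31.0}
for name in ENDPOINTS:
    print(name, composite_endpoint(patient, last_follow_up=40.0, endpoint=name))
```

Returning the event label together with the time keeps the individual causes of failure available, so that each component can still be reported on its own.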
‘Loss of PCR’ was preferred over ‘loss of CCR’ because ‘loss of cytogenetic remission’ should not already be backed through the possibly exceptional observation of just one Ph-positive metaphase. With 3 months for CHR and 18 months for CCR chosen in accordance with clinical knowledge, it is important to give the complete time for the drug to develop the aspired quality of remission, unless progression or a severe AE is observed before. Individual decisions on premature drug discontinuation with lacking objective traceability would hamper the generalizability of the results. To determine the remission status at a certain time point, the proceeding introduced in the previous section is suggested. The failure definition does not demand a second evaluation in order to confirm a remission status, and loss of CHR is implicitly captured through either lacking CCR at 18 months or loss of PCR thereafter. Compared with OS with death as the only event, composite endpoints have the advantage of a possibly earlier discovery of treatment differences. More events usually demand lower sample sizes. The motivation to choose a composite endpoint like FFS is the notion to cover all events leading to treatment discontinuation in clinical reality. However, in particular with FFS, the gravity of the events counted as ‘failures’ differs considerably which leads to difficulties with respect to interpretation.

Discussion

Due to the treatment success of TKI, the mortality with CML has become remarkably low. A possible further improvement in OS, for example, due to a next generation TKI, will not be recognized for a long time. However, investigators want to identify advantages of a certain treatment approach earlier on. Accordingly, alternative primary efficacy endpoints like remission parameters were introduced.6–8 To master implicit methodological challenges, we suggested recommendations for sound statistical analyses. Remission parameters are regarded as surrogate markers for OS.
Some investigators like to present a time-to-first-remission analysis, with ‘remission’ defined, for example, by CCR or MMR. Here, due to the inevitable competing risk ‘death before a possible observation of a first remission’, the Kaplan-Meier method should not be applied. A prerequisite for Kaplan-Meier analysis is non-informative censoring. With competing risks, this prerequisite is no longer met (Table 1, Figures 1 and 2). The appropriate approach to estimate the probabilities of a time-to-event endpoint in the presence of competing risks is the calculation of its CIF. Scrucca et al. provided a program for CIF analyses (see the example in the Supplementary appendix).13 For further reading on competing risks, the tutorial of Putter et al. is recommended.12 Due to interval censoring, the exact date of first remission is hardly known. Additionally, remission detection depends very much on protocol compliance. Differences in evaluation times and frequencies may cause seriously biased effect sizes between treatments. For these reasons, we do not recommend the use of ‘time to a certain level of remission’ as a primary endpoint to decide on the efficacy of a treatment. Also a composite time-to-event outcome like FFS does not qualify for a primary endpoint. FFS has (too) many events whose severity and accuracy of measurement are too heterogeneous. It could be beyond the capability of a clinical trial to follow each patient with sufficient intensity in order not to miss important information at any scheduled evaluation time point. In accordance with accuracy and ease of interpretation, the order of preference for time-to-event outcomes is OS, PFS and EFS. The chance to identify essential treatment differences earlier due to a higher number of failures could speak for EFS where all possible events mean a serious deterioration for the patient, which will not pass unnoticed. 
For interpretation of composite endpoints, it is advisable to also report each event on its own, in particular if a randomized comparison between therapies is involved. Otherwise, differences in the distribution of the causes of failure between two therapies A and B could go unnoticed: the total number of failures might have been higher for therapy A, but for therapy B, considerably more deaths might have been reported. Composite endpoints demand caution with respect to analysis and interpretation.18,23 Suggested proceeding for randomized treatment comparison In summary, we suggest the following proceeding for a randomized comparison between treatments in CML: As the first primary endpoint, the investigators choose a remission status which should be achieved at a certain time point, for example, CCR at 18 months. They might base their decision on the results of a prior ‘time-to-first-CCR’ analysis that, despite delayed discovery times, gives an impression of the velocity of drug response and of the time until a certain remission should be waited for. For both analyses, the same adverse competing risks are defined. Only the occurrence of a competing risk should lead to drug discontinuation, that is, the investigators agree on a clinically meaningful waiting time allowing reasonable safety also for non-responders. Whereas non-responders might be recruited for a research question on second-line treatment, responders qualify for the examination of the next (primary) endpoint following from the a priori ordering of the hypotheses. Naturally, this second primary endpoint could be relapse-free survival, the time until a certain level of remission is lost. The definition of loss should not only provide a clear-cut differentiation between the intended remission success and its loss, but also consider the safety of the patients. 
If the intended remission success was complete molecular remission,14 loss of MMR would clearly indicate a relapse whereas loss of complete molecular remission might rather be subjected to a temporary fluctuation. Sticking to first-line treatment until loss of MMR is reasonably safe. With caring patient management, loss of remission will be noticed before any more serious event occurs.24 The beginning of the accelerated disease phase is a serious failure which only postpones death, if no major treatment change is undertaken. Thus, like OS, PFS could be chosen as a further primary outcome, even though it is a composite endpoint. On the contrary, EFS and FFS should rather be regarded as secondary endpoints. Primary analyses are intention-to-treat (ITT), that is, in accordance with randomization.25 With the investigators’ agreement on the safety of the remission endpoints, the corresponding results of the ITT analyses should be well relatable to first-line treatment. This relation might be less close when the endpoints PFS or OS are considered. Many patients with insufficient remission results change to second-line treatment with the obvious intention to influence survival probabilities in the long run. Still, the results of the ITT analyses with regard to these important survival endpoints need to be reported. Furthermore, surrogate markers are not deterministic.18 The assumed correlation of a surrogate marker with future survival probabilities should be investigated. In deviation from the ITT principle, alternative samples may only include patients who were adhering to the protocol or comprise patients in accordance with their actual treatment. However, many decisions on treatment changes are subjective and can lead to samples with unbalanced patient characteristics.
The ITT analysis based on the randomized groups prevents this selection bias.25 ‘Per-protocol’ or ‘as-treated’ analyses may support the interpretation of the ITT results but cannot replace the ITT analysis.

Our recommendations support the conduct of good quality clinical trials in CML. With respect to primary endpoints, they name problems and provide solutions for data collection and analyses. They also stress that the methods used to determine the outcome of an endpoint must be described in detail. Sufficient clarification eases the understanding of variations in outcome definition between trials, one possible reason for differences in findings. In any case, results from treatment arms of different trials are not directly comparable. Instead, effect sizes between randomized treatments can be collected from related trials and combined in a common meta-analysis.26,27

Conflict of interest

The authors declare no conflict of interest.

Acknowledgements

The valuable assistance of A Gil and S Döring is gratefully acknowledged. This work was supported by the German CML Study Group; Grant no. BMBF 01GI0270 from Deutsches Kompetenznetz für Akute und Chronische Leukämien; Grant no. DJCLS R05/23 from Deutsche José-Carreras Leukämiestiftung and The EUropean Treatment and Outcome Study (EUTOS), a project supported by Novartis Oncology Europe.

References

1 Clark TG, Bradburn MJ, Love SB, Altman DG. Survival analysis part I: basic concepts and first analyses. Br J Cancer 2003; 89: 232–238.
2 Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Amer Statist Assoc 1958; 53: 457–481.
3 Altman DG. Practical Statistics for Medical Research. Chapman & Hall: London, 1991, pp 365–387.
4 Collett D. Modelling Survival Data in Medical Research. Chapman & Hall: London, 1994, pp 15–49.
5 O’Brien SG, Guilhot F, Larson RA, Gathmann I, Baccarani M, Cervantes F et al. Imatinib compared with interferon and low-dose cytarabine for newly diagnosed chronic-phase chronic myeloid leukemia. N Engl J Med 2003; 348: 994–1004.
6 Hehlmann R, Lauseker M, Jung-Munkwitz S, Leitner A, Müller MC, Pletsch N et al. Tolerability-Adapted Imatinib 800 mg/d Versus 400 mg/d Versus 400 mg/d Plus Interferon-α in Newly Diagnosed Chronic Myeloid Leukemia. J Clin Oncol 2011; 29: 1634–1642.
7 Saglio G, Kim DW, Issaragrisil S, le Coutre P, Etienne G, Lobo C et al. Nilotinib versus imatinib for newly diagnosed chronic myeloid leukemia. N Engl J Med 2010; 362: 2251–2259.
8 Kantarjian H, Shah NP, Hochhaus A, Cortes J, Shah S, Ayala M et al. Dasatinib versus imatinib in newly diagnosed chronic-phase chronic myeloid leukemia. N Engl J Med 2010; 362: 2260–2270.
9 Deininger M, O’Brien SG, Guilhot F, Goldman JM, Hochhaus A, Hughes TP et al. International Randomized Study of Interferon Vs STI571 (IRIS) 8-Year Follow up: Sustained Survival and Low Risk for Progression or Events in Patients with Newly Diagnosed Chronic Myeloid Leukemia in Chronic Phase (CML-CP) Treated with Imatinib. ASH Annual Meeting Abstracts 2009; 114: 1126.
10 Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. Wiley: New York, 1980, pp 167–171.
11 Klein JP, Moeschberger ML. Survival Analysis: Techniques for Censored and Truncated Data, 2nd edn. Springer: New York, 2003, pp 50–57.
12 Putter H, Fiocco M, Geskus RB. Tutorial in biostatistics: competing risks and multi-state models. Stat Med 2007; 26: 2389–2430.
13 Scrucca L, Santucci A, Aversa F. Competing risk analysis using R: an easy guide for clinicians. Bone Marrow Transplant 2007; 40: 381–387.
14 Baccarani M, Cortes J, Pane F, Niederwieser D, Saglio G, Apperley J et al. Chronic myeloid leukemia: an update of concepts and management recommendations of European LeukemiaNet. J Clin Oncol 2009; 27: 6041–6051.
15 Gooley TA, Leisenring W, Crowley J, Storer BE. Estimation of failure probabilities in the presence of competing risks: new representations of old estimators. Stat Med 1999; 18: 695–706.
16 Gray RJ. A class of k-sample tests for comparing the cumulative incidence of a competing risk. Ann Stat 1988; 16: 1141–1154.
17 Choudhury JB. Non-parametric confidence interval estimation for competing risks analysis: application to contraceptive data. Stat Med 2002; 21: 1129–1144.
18 Fleming TR, Rothmann MD, Lu HL. Issues in using progression-free survival when evaluating oncology products. J Clin Oncol 2009; 27: 2874–2880.
19 Anderson JR, Cain KC, Gelber RD. Analysis of survival by tumor response. J Clin Oncol 1983; 1: 710–719.
20 Hasford J, Baccarani M, Hoffmann V, Guilhot J, Saussele S, Rosti G et al. Predicting complete cytogenetic response and subsequent progression-free survival in 2060 patients with CML on imatinib treatment. The EUTOS score. Blood 2011; e-pub ahead of print 2 May 2011; doi:10.1182/blood-2010-12-319038.
21 Baccarani M, Saglio G, Goldman J, Hochhaus A, Simonsson B, Appelbaum F et al. Evolving concepts in the management of chronic myeloid leukemia: recommendations from an expert panel on behalf of the European LeukemiaNet. Blood 2006; 108: 1809–1820.
22 U.S. Department of Health and Human Services, National Institutes of Health, National Cancer Institute. Common Terminology Criteria for Adverse Events (CTCAE), Version v4.03: June 14, 2010. http://evs.nci.nih.gov/ftp1/CTCAE/CTCAE_4.03_2010-06-14_QuickReference_5x7.pdf.
23 Mell LK, Jeong JH. Pitfalls of using composite primary end points in the presence of competing risks. J Clin Oncol 2010; 28: 4297–4299.
24 Mahon F-X, Réa D, Guilhot J, Guilhot F, Huguet F, Nicolini F et al. Discontinuation of imatinib in patients with chronic myeloid leukaemia who have maintained complete molecular remission for at least 2 years: the prospective, multicentre Stop Imatinib (STIM) trial. Lancet Oncol 2010; 11: 1029–1035.
25 Altman DG. Practical Statistics for Medical Research. Chapman & Hall: London, 1991, pp 463–464.
26 Woodward M. Epidemiology: study design and analysis. Chapman & Hall/CRC: Boca Raton, 2005, pp 673–720.
27 Chronic Myeloid Leukemia Trialists Collaborative Group. Interferon Alfa Versus Chemotherapy for Chronic Myeloid Leukemia: A Meta-analysis of Seven Randomized Trials. J Natl Cancer Inst 1997; 89: 1616–1620.

Supplementary Information accompanies the paper on the Leukemia website (http://www.nature.com/leu)