Recommendations to meet statistical challenges arising

Leukemia (2011) 25, 1433–1438
& 2011 Macmillan Publishers Limited All rights reserved 0887-6924/11
www.nature.com/leu
ORIGINAL ARTICLE
Recommendations to meet statistical challenges arising from endpoints beyond overall
survival in clinical trials on chronic myeloid leukemia
M Pfirrmann1, A Hochhaus2, M Lauseker1, S Sauele3, R Hehlmann3 and J Hasford1
1
Institut für Medizinische Informationsverarbeitung, Biometrie und EpidemiologieFIBE, Ludwig-Maximilians-Universität,
München, Germany; 2Abteilung Hämatologie/Onkologie, Universitätsklinikum Jena, Jena, Germany and 3III. Medizinische
Klinik, Medizinische Fakultät Mannheim der Universität Heidelberg, Mannheim, Germany
The aim of this work was to provide guidelines for appropriate
statistical analyses regarding most common endpoints in clinical
trials on chronic myeloid leukemia: hematologic, cytogenetic and
molecular results, failure-free and event-free survival, and progression-free and overall survival. The reasons for the specified
recommendations are explained and important issues are outlined by comprehensive examples. Particular attention is paid to
the warning of the application of suboptimal methods that may
lead to seriously biased results and conclusions. In the presence
of a competing risk like death, Kaplan-Meier analysis should
not be applied for time-to-remission endpoints. The appropriate
method to estimate the probabilities of a time-to-remission
endpoint is the calculation of its cumulative incidence function.
However, the exact date of remission is hardly known. Detection
of remission depends strongly on evaluation frequencies.
Complex composite endpoints comprising many events with
considerably heterogeneous severity imply difficulties with
interpretation. Time-to-remission and complex composite endpoints are not recommended for primary judgment on efficacy. It
is rather advisable to investigate remission status at a fixed time
point as a primary endpoint, followed by progression-free and
overall survival. For patients with the intended remission success
at the time point of interest, relapse-free survival provides an
additional primary outcome.
Leukemia (2011) 25, 1433–1438; doi:10.1038/leu.2011.116;
published online 20 May 2011
Keywords: chronic myeloid leukemia; clinical trials; competing
risks; time-to-event endpoints; composite endpoints
Introduction
Overall survival (OS) used to be the undisputed primary
endpoint for efficacy evaluation of treatments in chronic
myeloid leukemia (CML). Apart from its role as the main clinical
concern, OS with death from any cause as the only event has
some convenient characteristics from a pure statistical point of
view. The date of the event is known exactly for almost every
patient. Provided that follow-up was sufficiently long, complete
and did not differ between the randomized treatment groups and
provided that censoring carried no prognostic information about
subsequent failure time, analysis of OS probabilities should lead
to unbiased results.1 The statistical methods for calculating
Kaplan-Meier survival plots, the P-value (mostly) of the log-rank
test, estimates of treatment effects, and corresponding confidence
intervals, are well documented.1–4
Since the beginning of this century, tyrosine kinase inhibitors
(TKIs) have become the treatment of choice for CML. Due to
Correspondence: Dr M Pfirrmann, Institut für Medizinische Informationsverarbeitung, Biometrie und EpidemiologieFIBE, Ludwig-MaximiliansUniversität, Marchioninistrae 15, 81377 München, Germany.
E-mail: [email protected]
Received 1 April 2011; accepted 26 April 2011; published online
20 May 2011
early promising results, already in the phase III clinical trial
(IRIS) leading to the approval of imatinib (the first TKI) for newly
diagnosed CML patients, efficacy assessment was not based on
OS any longer.5 The same applies to later trials like the recent
ones comparing imatinib in different doses and treatment
combinations,6 imatinib with nilotinib7 or with dasatinib.8
Instead of OS, the rates of major molecular remission (MMR)
and of complete cytogenetic remission (CCR), both at 12
months, had been the primary endpoints.6–8 They were
preferred over the classical endpoint because TKIs improved
OS in a way that possible statistically significant differences
between treatments will not be observed for a long time.9 With
the need to arrive at insights on therapeutic improvement earlier
on, decisions on the efficacy of a new therapy are based on
remission parameters like MMR and CCR which are regarded as
surrogate markers for OS. Cumulative probability curves for
complex composite endpoints and endpoints like ‘time to the
first MMR’ became more and more common.
Methods
It is our observation that the analysis of time-to-remission
endpoints is still erroneously performed by plotting 1–KM
(Kaplan-Meier) curves: ‘death before a possible observation of a
first MMR’ is censored at the time it occurred and the fact that it
actually is a competing event (risk) to the event of interest, which
is ‘first MMR’, is ignored. Instead of 1–KM curves, cumulative
incidence functions (CIFs) for ‘time to first MMR’ with appropriate
consideration of competing risks must be calculated.10–12 As not
only the procedure of computing CIFs but also its necessity is not
well understood, we present a simple example which could
easily be recalculated. Not all statistical software packages
support CIF analysis. Hence, a program published by Scrucca
et al.13 based on the freely available programming software R is
described in the Supplementary appendix in detail.
Time-to-remission endpoints are compared with ‘remission at
a certain time’. We give reasons why the latter should be
preferred for a primary outcome. Explicit proceedings in
determining the remission status at a certain time are provided.
Finally, methodological issues regarding the consideration of
composite endpoints like progression-free survival, event-free
and failure-free survival are discussed.
Results
Time-to-event endpoints and competing risks
With the following example based on data of five hypothetical
patients, we would like to illustrate a typical situation where the
Analyzing endpoints beyond overall survival in CML
M Pfirrmann et al
Leukemia
F
0.2
0
0.2
0
0.4
F
0.2
0
0.33
0
1
1
0.8
0.6
0.4
0.4
0
0
0.2
0.2
0.47
0.47
1
0
93
147
209
360
523
Abbreviations: CCR, complete cytogenetic remissions; KM, Kaplan-Meier.
a
It was assumed that the exact dates of the first achievement of a CCR were known.
1
0.8
0.8
0.53
0.53
0
F
1
0
1
0
1
F
1
1
1
0
1
ej ccr
ej
F
1st CCR
Death
1st CCR
Censoring
1st CCR
5
5
4
3
2
1
Estimated
CCR
probability11
p̂CCR(tj)
Estimated
CCR
rate10
^lCCR(tj)
Estimated
survival
w/o any event9
Ŝ(tj)
Estimated
1–KM
probability8
1ŜCCR(tj)
nj
No. of
first CCR6
Total no.
of events5
Estimated
survival
w/o CCR7
ŜCCR(tj)
0
1
2
3
4
5
ð3Þ
tj
^lk ðtj Þ ¼ ejk :
nj
j
In our example, refer to Table 1, columns (4, 5, 9). Next, we
calculate the hazard that a patient experiences event k at time tj,
with k ¼ ‘(first) CCR’ or k ¼ ‘death’, under the condition that the
patient had no event up to time tj:
No. at
risk4
priate approach to estimate the probabilities of achieving a first
CCR in the presence of competing risks is the calculation of the
CIF.10–12,15 Following the proceeding of Putter et al., the first
step is the estimation of the Kaplan-Meier probabilities of being
event-free in relation to any kind of the competing events:12
^ j Þ ¼ Sðt
^ j1 Þ 1 ej :
ð2Þ
Sðt
nj
Event3
Appropriate consideration of a competing risk: calculation of the cumulative incidence function. The appro-
Event time
in days2
ð1 S^CCR ðt5 ¼ 523ÞÞ þ ð1 S^death ðt5 ¼ 523ÞÞ ¼ 1 þ 0:25 ¼ 1:25:
Table 1
‘death’ is censored and formally set to zero (columns (6–7)). By
definition, Kaplan-Meier estimates change only at event times.1
Thus, it is sufficient to apply the recursive formula at times tj
where ej40. With t0 ¼ 0 and ŜCCR(0) ¼ 1, at t1 ¼ 93 the
probability of survival without the achievement of a first CCR
is estimated by S^CCR ðt1 ¼ 93Þ ¼ S^CCR ðt0 ¼ 0Þð1 e1CCR =n1 Þ ¼
1ð1 1=5Þ ¼ 0:8 and so on. Then, it is often claimed that the
complement 1–KM (Table 1, column (8) and Figure 1) represents
the estimated probability of observing the event of interest (first
CCR) up to a time t.
A prerequisite for the application of the Kaplan-Meier
estimator is ‘non-informative censoring’ meaning that the event
of interest might be observed in the future, that is, the patient is
still ‘at risk’.1,3,4 It implies that the reason for censoring carries
no information on the subsequent event probabilities. For the
competing event ‘death before a possible first CCR,’ this
prerequisite is clearly not met. If a patient died, he is not at
risk any longer and the probability of a later first CCR is reduced
to zero. Consequently, the erroneous censoring of competing
risks leads to systematically biased (overestimated) probabilities
of observing the event of interest.12,15 From t5 ¼ 523, the
wrongly estimated probabilities of the two mutually exclusive
events ‘achievement of a first CCR’ and ‘death before achievement of a first CCR’ even add up to a total probability greater
than one (Figure 1):
Inappropriate versus appropriate estimation of the probabilities of achievement of a first CCRa
display the ordered event times tj, j ¼ 1,y,5, and the quality of
the event observed for the five patients. At the time of
randomization (j ¼ 0), none of the patients was in cytogenetic
remission. The fourth column contains the number of patients at
risk nj just prior to event time tj which is followed by the total
number of events ej at tj. In ignoring ‘death before a (possible)
CCR’ as a competing risk and calculating the familiar KM
estimator with first CCR as the only event,
ejCCR
S^CCR ðtj Þ ¼ S^CCR ðtj1 Þ 1 ;
ð1Þ
nj
Event time
ordering1
Erroneous censoring of competing risks through use of
the Kaplan-Meier estimator. Columns (1–3) of Table 1
First CCR as the only event,
competing risk(s) inappropriately
censored
First CCR as the event of interest,
competing risk(s)
appropriately considered
Estimated
cumulative
incidence12
IˆCCR(tj)
calculation of CIFs instead of 1–KM curves is needed. Let us
assume that the primary event of interest was ‘time to first CCR
after randomization’ with CCR as defined by the European
LeukemiaNet.14 For the moment, the dates of first CCR are
supposed to be known exactly.
0
0.2
0.2
0.4
0.4
0.8
1434
Analyzing endpoints beyond overall survival in CML
M Pfirrmann et al
Probabilities of achieving a 1st CCR
0.9
Probabilities of death before a 1st CCR
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
0
1
2
3
4
5
6
7
8
9 10 11 12 13 14 15 16 17 18
Months after randomization
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
Probabilities of achieving a 1st CCR
Probabilities of death before a 1st CCR
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Months after randomization
Figure 1 Estimation of probabilities of ‘achieving a first CCR’ and of
‘death before achieving a first CCR’ based on the Kaplan-Meier
method. The dates of both, mutually exclusive events were supposed
to be known exactly. ‘Death before first CCR’ and ‘first CCR (before
death)’ were the competing events. At the time of the last event
(t5 ¼ 523), the probabilities of the curves add up to 1.25.
Accordingly, ^lCCR(tj) (column (10)) is calculated dividing
column (6) by column (4). To receive the unconditional
probability of observing event k at tj, p̂k(tj) (column (11)), we
need to multiply the event rate ^lk(tj) by the probability of still
being event-free at tj, Ŝ(tj1):
^ j1 Þ:
^k ðtj Þ ¼ ^lk ðtj ÞSðt
p
ð4Þ
Because the event-free survival condition became an integrated
part of p̂k(tj), the estimator expresses the probability of observing
event k at tjFwithout further constraint. Finally, the cumulative
incidence of event k, at tj, Îk(tj) (column (12)), is given by
^Ik ðtj Þ ¼ ^Ik ðtj1 Þ þ p
^k ðtj Þ:
Probability estimate according to CIF
Probability estimate according to 1-KM
1435
1.0
ð5Þ
At t1 ¼ 93, the event rate for a first CCR is l^CCR ðt1 ¼ 93Þ ¼
e1CCR =n1 ¼ 1=5 ¼ 0:2: For the probability of observing a first
^ 0 ¼ 0Þ ¼
^CCR ðt1 ¼93Þ¼ ^lCCR ðt1 ¼93ÞSðt
CCR at t1, we calculate p
0:21 ¼ 0:2: The CIF of a first CCR at t1 is then estimated
by ^ICCR ðt1 ¼ 93Þ ¼ ^ICCR ðt0 ¼ 0Þ þ p^CCR ðt1 ¼ 93Þ ¼ 0 þ 0:2 ¼ 0:2
and so on. (Table 1, columns (9–12) and Figure 2).
Comparing column (8) with column (12) and Figure 1 with
Figure 2, we learn that the differences between the 1–KM and the
CIF estimators appear as soon as the first event of interest directly
after the first detection of a competing risk has been recorded.
Both estimators go up to their maximum possible value, if all
subjects with the longest observation time experience an event of
interest. Whatever the statistic, a sample of n ¼ 5 patients cannot
provide reliable estimations. But as the probabilities of the CIF
estimator are appropriately corrected for competing risks, the
observation of an estimated total probability greater than one for
the mutually exclusive events is now prevented (Figure 2):
^ICCR ðt5 ¼ 523Þ þ ^Ideath ðt5 ¼ 523Þ ¼ 0:8 þ 0:2 ¼ 1:
Some features around the calculation of cumulative
incidence functions. The observation time of patients with
no events should be restricted to the date of the last confirmation
that none of the competing events had been experienced. If
more than one competing risk is considered, the CIF probabilities of the event of interest are calculated with all other
competing events formally grouped together. For outcome
Figure 2 Estimation of probabilities of ‘achieving a first CCR’ and of
‘death before achieving a first CCR’ based on the CIF. The dates of
both, mutually exclusive events were supposed to be known exactly.
‘Death before first CCR’ and ‘first CCR (before death)’ were the
competing events. At the time of the last event (t5 ¼ 523), the
probabilities of the curves add up to 1.
interpretation, each of the mutually exclusive competing events
should be examined in turn. Instead of the inappropriate use of
the log-rank test, the comparison of CIF estimates between two
or more patient groups can be performed through the Gray
test.16 A widespread proposal to calculate 95% pointwise
confidence intervals for the CIF estimates was introduced by
Choudhury.17 Scrucca et al.13 published a program providing
many functions for the analyses of CIFs. The program is based on
the R package ‘cmprsk’. Its application is described in the
Supplementary appendix (see also Supplementary Figure 1).
Critical considerations about time-to-event analysis of
remission parameters. At the beginning, we made the
assumption that the exact date of a first CCR was known.
Actually, this is rarely the case. According to many study
protocols, patients should have a cytogenetic evaluation every
three or six months within the first years of treatment. Thus,
the real, unknown date of a first CCR lies within a time
interval between scheduled visits leading to delayed, ‘intervalcensored’12 observation of the first CCR.
Statistical tools might be suggested to deal with intervalcensored data also, in the presence of competing risks.
However, the first detection of a CCR depends on protocol
compliance, too. Patients will not see the doctor exactly every
three or six months. Many patients will have cytogenetic
evaluations less frequently than intended. Hence, CCR discovery is further delayed or even missed. In addition, relapses
are not accounted for. A treatment might produce shorter times
to a first CCR, but also shorter times to a loss of CCR afterwards.
For these reasons, we prefer ‘remission status at a certain time
point’.
Remission status at a certain time point as a primary
outcome
In evaluating remission status at a certain time point, several
issues have to be considered.
Assignment of remission outcome to a time point. At
first, clinicians have to determine reasonable time points during
the course of therapy at which the evaluation of a remission
variable should be performed. Every result should be assignable
to exactly one of these time points. For instance, with a
Leukemia
Analyzing endpoints beyond overall survival in CML
M Pfirrmann et al
1436
three-monthly evaluation of CCR status (CCR: yes or no), the
following assignment of results in dependence on t ¼ ‘cytogenetic
evaluation time after randomization’ is suggested:
1.5otp4.5 months is allocated to the time point 3 months
4.5otp7.5 months is allocated to the time point 6 months
7.5otp10.5 months is allocated to the time point 9 months
and so on.
Obviously, any drug needs some time to induce cytogenetic
remission in Philadelphia chromosome (Ph)-positive14 CML
patients. For this reason and to obtain identical duration for the
time intervals, 1.5 months was chosen as the beginning of the
first interval. All cut offs lie exactly halfway between two
subsequent time points.
More than one result at a time. Some patients will have
more than one evaluation attributable to a certain time point.
Before a first look at the data, a proceeding of how to choose
one out of several results has to be defined. It is of utmost
importance that this definition is independent of the remission
outcome. As an example, we determine the CCR status 18
months after randomization. The corresponding time interval is
given by 16.5otp19.5 months. Of all evaluations falling into
this period, our priority is to select the closest result to 18
months which was evaluated X18 months after randomization.
If there was no such t, the closest result to 18 months out of
16.5oto18 months is chosen. The motivation behind this is
the preference of a result after the (possible) completion of the
whole treatment period (of 18 months) over maximum
adjacency to the investigated time point. To support the
availability of results just after the completed period, physicians
could be asked to see patients directly after 18 months rather
than at 18 months.
Choosing time point and response level for the primary
endpoint. To decide on the optimal time for the primary
endpoint and the level of response, it is important to balance risk
against gain of information. On the one hand, the endpoint
evaluation should be early enough to prevent an undue amount
of serious events like blast crisis14 or deaths. On the other hand,
the primary endpoint should also be late enough to catch
the bulk of the expected number of responses. Furthermore, at
the time of choice, the achievement of the clinical goal
connected with the primary endpoint, for example, CCR needs
to show sufficient stability. Finally, a noticeable quality as a
surrogate marker for survival should have been established for
the endpoint.18 For instance, a landmark analysis19 may suggest
that the CCR status at 18 months carries considerable prognostic
information with respect to progression-free survival.20
In many cases, the decision on time point and response level
of the primary endpoint has to be based on information that was
retrieved from pilot studies or formerly common treatments.
Determination of remission outcome. With CCR status at
18 months as the primary endpoint, a cytogenetic evaluation at
this time point is inevitable. A substitution by the result of
hematologic or molecular remission is not satisfactory. A very
small BCR-ABL to ABL ratio14 might imply a CCR. But what
level does still imply CCR? Taking the previous considerations
with regard to CCR status determination at 18 months into
account, the following classification is suggested:
Patients without detectable Ph: CCR: yes
Ph-positive patients: CCR: no
Patients who died within 19.5 months: CCR: no
Leukemia
Patients whose disease phase changed to accelerated phase21
or blast crisis (disease progression)21 within 19.5 months:
CCR: no
Patients who experienced a severe (grade 3 or 4) adverse
event (AE)22 and had to stop treatment within 19.5 months:
CCR: no.
Disease progression and severe AEs are serious indicators for
treatment failure. Regarding them as early negative remission
outcomes, both events should be considered as competing risks
in prior CIF analyses, too. In any case, it is important to describe
each expression of lacking remission success and its probabilities on its own.
Handling of missing values. Using the last reported value
before the time point of interest (last value carried forward
(LVCF)) is not encouraged. LVCF does not work for nonresponders in a period of rapid increases of the remission rate. It
is likely that a patient with CCR at 6 or 9 months has the same
status at 12 months. However, a patient with no CCR at 6 or 9
months could well be in CCR at 12 months.20 Thus, the
application of LVCF in substituting the missing values at 12
months by the observed values at 6 or 9 months would
underestimate the proportion of patients with CCR at 12 months.
Two further approaches are common: either the observations
are counted as ‘no CCR’ or they are discarded from analysis. In
case of the first approach, the probability of CCR is systematically underestimated. Both approaches share the problem that
the amount of and the reasons for missing values could be
associated with the randomized treatment, that is, data is not
missing at random. To circumvent at least systematic underestimation, we prefer the second approach and work with
complete data only. However, in particular with a primary
endpoint, missing outcome data should be avoided in any
circumstances. Together with the treatment group, the reasons
for missing values need to be recorded.
Composite time-to-event endpoints
For discussion, four time-to-event endpoints are introduced:
Overall survival
Events: Death from any cause
Progression-free survival
Events: Death from any cause
Beginning of blast crisis
Beginning of accelerated phase
Event-free survival
Events: Same as progression-free survival plus
Observation of a severe AE (grade 3 or 4)
Failure-free survival
Events: Same as event-free survival plus
Less than complete hematologic remission (CHR)14
at 3 months
Less than CCR at 18 months
Loss of partial cytogenetic remission (PCR).14
If at least two different events are summarized as a result of the
same kind, the corresponding outcome variable is usually
denoted as a ‘composite endpoint’. Whichever of the three
events ‘accelerated phase’, ‘blast crisis’ or ‘death’ is observed
first ends the time of the composite endpoint progression-free
survival (PFS) in patients who started treatment in chronic
phase.21 With the extension by ‘severe AE’ as a fourth event,
Analyzing endpoints beyond overall survival in CML
M Pfirrmann et al
1437
event-free survival (EFS) provides an additional composite
endpoint.
A definition of failure-free survival (FFS) might comprise all
events of EFS plus ‘loss of PCR’ which means that 435%
Ph-positive metaphases were observed, again.14 At diagnosis,
almost every patient has 435% Ph-positive metaphases. To
avoid the bias that patients who have never achieved a CCR are
never at risk for a loss, FFS should also consider the failure to
obtain remission until a certain time. For instance, ‘no CCR at 18
months’ could be regarded as an extra event. Further based on
the experience with TKIs, ‘no CHR at 3 months’ may be added.
‘Loss of PCR’ was preferred over ‘loss of CCR’ because ‘loss of
cytogenetic remission’ should not already be backed through
the possibly exceptional observation of just one Ph-positive
metaphase. With 3 months for CHR and 18 months for CCR
chosen in accordance with clinical knowledge, it is important to
give the complete time for the drug to develop the aspired
quality of remissionFunless progression or a severe AE is
observed before. Individual decisions on premature drug
discontinuation with lacking objective traceability would
hamper the generalizability of the results.
To determine the remission status at a certain time point, the
proceeding introduced in the previous section is suggested. The
failure definition does not demand a second evaluation in order
to confirm a remission status, and loss of CHR is implicitly
captured through either lacking CCR at 18 months or loss of PCR
thereafter.
Compared with OS with death as the only event, composite
endpoints have the advantage of a possibly earlier discovery of
treatment differences. More events usually demand lower
sample sizes. The motivation to choose a composite endpoint
like FFS is the notion to cover all events leading to treatment
discontinuation in clinical reality. However, in particular with
FFS, the gravity of the events counted as ‘failures’ differs
considerably which leads to difficulties with respect to interpretation.
Discussion
Due to the treatment success of TKI, the mortality with CML has
become remarkably low. A possible further improvement in OS,
for example, due to a next generation TKI, will not be
recognized for a long time. However, investigators want to
identify advantages of a certain treatment approach earlier on.
Accordingly, alternative primary efficacy endpoints like remission parameters were introduced.6–8 To master implicit methodological challenges, we suggested recommendations for
sound statistical analyses.
Remission parameters are regarded as surrogate markers for
OS. Some investigators like to present a time-to-first-remission
analysis, with ‘remission’ defined, for example, by CCR or
MMR. Here, due to the inevitable competing risk ‘death before a
possible observation of a first remission’, the Kaplan-Meier
method should not be applied. A prerequisite for Kaplan-Meier
analysis is non-informative censoring. With competing risks, this
prerequisite is no longer met (Table 1, Figures 1 and 2).
The appropriate approach to estimate the probabilities of a
time-to-event endpoint in the presence of competing risks is the
calculation of its CIF. Scrucca et al. provided a program for CIF
analyses (see the example in the Supplementary appendix).13
For further reading on competing risks, the tutorial of Putter et al.
is recommended.12
Due to interval censoring, the exact date of first remission is
hardly known. Additionally, remission detection depends very
much on protocol compliance. Differences in evaluation times
and frequencies may cause seriously biased effect sizes between
treatments. For these reasons, we do not recommend the use of
‘time to a certain level of remission’ as a primary endpoint to
decide on the efficacy of a treatment.
Also a composite time-to-event outcome like FFS does not
qualify for a primary endpoint. FFS has (too) many events whose
severity and accuracy of measurement are too heterogeneous. It
could be beyond the capability of a clinical trial to follow each
patient with sufficient intensity in order not to miss important
information at any scheduled evaluation time point. In
accordance with accuracy and ease of interpretation, the order
of preference for time-to-event outcomes is OS, PFS and EFS.
The chance to identify essential treatment differences earlier due
to a higher number of failures could speak for EFS where all
possible events mean a serious deterioration for the patient,
which will not pass unnoticed. For interpretation of composite
endpoints, it is advisable to also report each event on its own, in
particular if a randomized comparison between therapies is
involved. Otherwise, differences in the distribution of the causes
of failure between two therapies A and B could go unnoticed:
the total number of failures might have been higher for therapy
A, but for therapy B, considerably more deaths might have been
reported. Composite endpoints demand caution with respect to
analysis and interpretation.18,23
Suggested proceeding for randomized treatment
comparison
In summary, we suggest the following proceeding for a
randomized comparison between treatments in CML:
As the first primary endpoint, the investigators choose a
remission status which should be achieved at a certain time
point, for example, CCR at 18 months. They might base their
decision on the results of a prior ‘time-to-first-CCR’ analysis that,
despite delayed discovery times, gives an impression of the
velocity of drug response and of the time until a certain
remission should be waited for. For both analyses, the same
adverse competing risks are defined. Only the occurrence of a
competing risk should lead to drug discontinuation, that is, the
investigators agree on a clinically meaningful waiting time
allowing reasonable safety also for non-responders.
Whereas non-responders might be recruited for a research
question on second-line treatment, responders qualify for the
examination of the next (primary) endpoint following from the a
priori ordering of the hypotheses. Naturally, this second primary
endpoint could be relapse-free survival, the time until a certain
level of remission is lost. The definition of loss should not only
provide a clear-cut differentiation between the intended
remission success and its loss, but also consider the safety of
the patients. If the intended remission success was complete
molecular remission,14 loss of MMR would clearly indicate a
relapse whereas loss of complete molecular remission might
rather be subjected to a temporary fluctuation. Sticking to firstline treatment until loss of MMR is reasonably safe. With caring
patient management, loss of remission will be noticed before
any more serious event occurs.24
The beginning of the accelerated disease phase is a serious
failure which only postpones death, if no major treatment
change is undertaken. Thus, like OS, PFS could be chosen as a
further primary outcome, even though it is a composite
endpoint. On the contrary, EFS and FFS should rather be
regarded as secondary endpoints.
Primary analyses are intention-to-treat (ITT), that is, in
accordance with randomization.25 With the investigators’
Leukemia
Analyzing endpoints beyond overall survival in CML
M Pfirrmann et al
1438
agreement on the safety of the remission endpoints, the
corresponding results of the ITT analyses should be well
relatable to first-line treatment. This relation might be less close
when the endpoints PFS or OS are considered. Many patients
with insufficient remission results change to second-line
treatment with the obvious intention to influence survival
probabilities in the long run. Still, the results of the ITT analyses
with regard to these important survival endpoints need to be
reported. Furthermore, surrogate markers are not deterministic.18 The assumed correlation of a surrogate marker with future
survival probabilities should be investigated.
In deviation from the ITT principle, alternative samples may
only include patients who were adhering to the protocol or
comprise patients in accordance with their actual treatment.
However, many decisions on treatment changes are subjective
and can lead to samples with unbalanced patient characteristics.
The ITT analysis based on the randomized groups prevents this
selection bias.25 ‘Per-protocol’ or ‘as-treated’ analyses may
support result interpretation of ITT analysis but not substitute it.
Our recommendations support the conduct of good quality
clinical trials in CML. With respect to primary endpoints, they
name problems and provide solutions for data collection and
analyses. They also stress the importance that the methods in
determining the outcome of an endpoint be described in detail.
Sufficient clarification eases the understanding of variations in
outcome definition between trials, one possible reason for
differences in findings. In any case, results between two treatment
arms of different trials are not directly comparable. Instead, effect
sizes between randomized treatments can be collected from
related trials and combined in a common meta-analysis.26,27
Conflict of interest
The authors declare no conflict of interest.
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Acknowledgements
The valuable assistance of A Gil and S Döring is gratefully
appreciated. This work was supported by the German CML
Study Group; Grant no. BMBF 01GI0270 from Deutsches
Kompetenznetz für Akute und Chronische Leukämien; Grant no.
DJCLS R05/23 from Deutsche José-Carreras Leukämiestiftung and
The EUropean Treatment and Outcome Study (EUTOS), a project
supported by Novartis Oncology Europe.
21
22
References
1 Clark TG, Bradburn MJ, Love SB, Altman DG. Survival analysis
part I: basic concepts and first analyses. Br J Cancer 2003; 89:
232–238.
2 Kaplan EL, Meier P. Nonparametric estimation from incomplete
observations. J Amer Statist Assoc 1958; 53: 457–481.
3 Altman DG. Practical Statistics for Medical Research. Chapman &
Hall: London, 1991, pp 365–387.
4 Collett D. Modelling Survival Data in Medical Research. Chapman
& Hall: London, 1994, pp 15–49.
5 O’Brien SG, Guilhot F, Larson RA, Gathmann I, Baccarani M,
Cervantes F et al. Imatinib compared with interferon and low-dose
cytarabine for newly diagnosed chronic-phase chronic myeloid
leukemia. N Engl J Med 2003; 348: 994–1004.
6 Hehlmann R, Lauseker M, Jung-Munkwitz S, Leitner A, Müller MC,
Pletsch N et al. Tolerability-Adapted Imatinib 800 mg/d
Versus 400 mg/d Versus 400 mg/d Plus Interferon-a in Newly
23
24
25
26
27
Diagnosed Chronic Myeloid Leukemia. J Clin Oncol 2011; 29:
1634–1642.
Saglio G, Kim DW, Issaragrisil S, le Coutre P, Etienne G, Lobo C
et al. Nilotinib versus imatinib for newly diagnosed chronic
myeloid leukemia. N Engl J Med 2010; 362: 2251–2259.
Kantarjian H, Shah NP, Hochhaus A, Cortes J, Shah S, Ayala M
et al. Dasatinib versus imatinib in newly diagnosed chronic-phase
chronic myeloid leukemia. N Engl J Med 2010; 362: 2260–2270.
Deininger M, O’Brien SG, Guilhot F, Goldman JM, Hochhaus A,
Hughes TP et al. International Randomized Study of Interferon Vs
STI571 (IRIS) 8-Year Follow up: Sustained Survival and Low Risk
for Progression or Events in Patients with Newly Diagnosed
Chronic Myeloid Leukemia in Chronic Phase (CML-CP) Treated
with Imatinib. ASH Annual Meeting Abstracts 2009; 114: 1126.
Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time
Data. Wiley: New York, 1980, pp 167–171.
Klein JP, Moeschberger ML. Survival Analysis: Techniques for
Censored and Truncated Data, 2nd edn. Springer: New York,
2003, pp 50–57.
Putter H, Fiocco M, Geskus RB. Tutorial in biostatistics: competing
risks and multi-state models. Stat Med 2007; 26: 2389–2430.
Scrucca L, Santucci A, Aversa F. Competing risk analysis using R:
an easy guide for clinicians. Bone Marrow Transplant 2007; 40:
381–387.
Baccarani M, Cortes J, Pane F, Niederwieser D, Saglio G,
Apperley J et al. Chronic myeloid leukemia: an update of concepts
and management recommendations of European LeukemiaNet. J
Clin Oncol 2009; 27: 6041–6051.
Gooley TA, Leisenring W, Crowley J, Storer BE. Estimation of
failure probabilities in the presence of competing risks: new
representations of old estimators. Stat Med 1999; 18: 695–706.
Gray RJ. A class of k-sample tests for comparing the cumulative
incidence of a competing risk. Ann Stat 1988; 16: 1141–1154.
Choudhury JB. Non-Parametric confidence interval estimation for
competing risks analysis: application to contraceptive data. Stat
Med 2002; 21: 1129–1144.
Fleming TR, Rothmann MD, Lu HL. Issues in using progression-free
survival when evaluating oncology products. J Clin Oncol 2009;
27: 2874–2880.
Anderson JR, Cain KC, Gelber RD. Analysis of survival by tumor
response. J Clin Oncol 1983; 1: 710–719.
Hasford J, Baccarani M, Hoffmann V, Guilhot J, Saussele S, Rosti G
et al. Predicting complete cytogenetic response and subsequent
progression-free survival in 2060 patients with CML on imatinib
treatment. The EUTOS score. Blood 2011; e-pub ahead of print
2 May 2011; doi:10.1182/blood-2010-12-319038.
Baccarani M, Saglio G, Goldman J, Hochhaus A, Simonsson B,
Appelbaum F et al. Evolving concepts in the management of
chronic myeloid leukemia: recommendations from an expert
panel on behalf of the European LeukemiaNet. Blood 2006; 108:
1809–1820.
U.S. Department of Health and Human Services, National
Institutes of Health, National Cancer Institute. Common Terminology Criteria for Adverse Events (CTCAE), Version v403: June 14,
2010. http://evs.nci.nih.gov/ftp1/CTCAE/CTCAE_4.03_2010-06-14_
QuickReference_5x7.pdf.
Mell LK, Jeong JH. Pitfalls of using composite primary end points
in the presence of competing risks. J Clin Oncol 2010; 28:
4297–4299.
Mahon F-X, Réa D, Guilhot J, Guilhot F, Huguet F, Nicolini F et al.
Discontinuation of imatinib in patients with chronic myeloid
leukaemia who have maintained complete molecular remission for
at least 2 years: the prospective, multicentre Stop Imatinib (STIM)
trial. Lancet Oncol 2010; 11: 1029–1035.
Altman DG. Practical Statistics for Medical Research. Chapman &
Hall: London, 1991, pp 463–464.
Woodward M. Epidemiology: study design and analyis. Chapman
& Hall/CRC: Boca Raton, 2005, pp 673–720.
Chronic Myeloid Leukemia Trialists Collaborative Group. Interferon Alfa Versus Chemotherapy for Chronic Myeloid Leukemia: A
Meta-analysis of Seven Randomized Trials. J Natl Cancer Inst
1997; 89: 1616–1620.
Supplementary Information accompanies the paper on the Leukemia website (http://www.nature.com/leu)
Leukemia