EDITORIALS

When Is a Confirmatory Randomized Clinical Trial Needed?

Donald A. Berry*

* Correspondence to: Donald A. Berry, Ph.D., Institute of Statistics and Decision Sciences, Duke University, 223 Old Chemistry Bldg., Durham, NC 27708-0251.
Few clinical trials show important advantages for one therapy
over another. When one does, a confirmatory trial will help
decide whether the observation in the first trial was real. The
need to replicate experimental results is fundamental in science.
Systematic differences occur across clinical trials for a variety
of reasons: differences in patient populations (geographically
and over time), different patient referral patterns, differential application of entry criteria by investigators, and so on. Results
vary across trials, just as they vary across patients within a trial,
although intertrial variability is usually less than interpatient
variability. If the differences among trials affect all treatment
arms similarly, then treatment differences may be similar in different trials, but trial settings may well interact with treatment.
The need for replication suggests that the answer to the question in the title is "Always." But carrying out a confirmatory
trial has implications other than scientific ones. First, trials require commitments of time and resources. More important, the
original trial suggested that the control therapy is inferior, and
randomly assigning patients to receive a possibly inferior
therapy is ethically questionable. Randomly assigning patients
to therapy is the same scientifically as, say, randomly assigning
plants to pots, but it is not the same ethically.
Different people view trial results differently, in part for ethical, economic, and political reasons. So there are differences of
opinion concerning the need for a confirmatory trial. In this
issue of the Journal, Parmar et al. (1) suggest that differences of
opinion can be explained from a Bayesian perspective (2) as differences in prior probability distributions of treatment effectiveness. They suggest using a skeptic's prior distribution to address
whether a confirmatory trial is necessary. A confirmatory trial is
recommended if, after updating this prior distribution on the
basis of results from the original trial, the skeptic's resulting
posterior probability of a clinically important treatment benefit
is small. This Bayesian calculation is easy to make, and it can be
very useful. Quantifying opinions (of real people as well as of
"enthusiasts" and "skeptics") and showing how they are updated
on the basis of empiric evidence can be pivotal in a decision
process. The point of using posterior probabilities is not to dictate the final decision but to elucidate the decision process.
Decision makers are not constrained to choose the option indicated by such an analysis, but if they choose otherwise, they
should be able to identify which assumptions in the analysis are
wrong, and they can redo the analysis with the repaired assumptions. Anyone considering a confirmatory trial will find the
analyses of Parmar et al. instructive. Furthermore, enthusiastic
and skeptical posterior probability distributions of treatment
benefit would be useful additions to the discussion sections of
clinical trial reports.
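As a concrete illustration, the sketch below works through one version of that calculation under a normal approximation on the log hazard ratio scale; the skeptical prior, the threshold for a clinically important benefit, and the trial result are illustrative assumptions, not values taken from Parmar et al. or from the CALGB data.

```python
# A minimal sketch of a skeptical-prior calculation under a normal approximation
# on the log hazard ratio scale.  All numbers are illustrative assumptions.
from math import log, sqrt
from statistics import NormalDist

# Skeptic's prior: centered at no effect (log HR = 0), with a standard deviation
# chosen so that only 5% prior probability lies below the clinically important
# hazard ratio (illustratively, HR = 0.75).
clinically_important = log(0.75)
prior_mean = 0.0
prior_sd = abs(clinically_important) / NormalDist().inv_cdf(0.95)

# Hypothetical result of the original trial: observed log HR and standard error.
obs_loghr, obs_se = log(0.70), 0.15

# Conjugate normal update: the posterior mean is a precision-weighted average.
prior_prec, data_prec = 1 / prior_sd**2, 1 / obs_se**2
post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * obs_loghr) / post_prec
post_sd = sqrt(1 / post_prec)

# Skeptic's posterior probability of a clinically important benefit.
p_benefit = NormalDist(post_mean, post_sd).cdf(clinically_important)
print(f"skeptic's posterior P(HR < 0.75) = {p_benefit:.2f}")
```

With these particular numbers the skeptic's posterior probability of an important benefit remains well below one half, the kind of result that, in the framework of Parmar et al., argues for a confirmatory trial.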
In addressing the decision to confirm a trial, Parmar et al.
restrict their consideration to data from the original trial. This is
seldom the only available information. Consider the authors' example of the Cancer and Leukemia Group B (CALGB) trial of
nonresectable stage III non-small-cell lung cancer (3,4). Prior to
or during the initiation of the confirmatory trial (5), several
other randomized trials (6-9) had compared radiotherapy plus
chemotherapy with radiotherapy alone, although none used exactly the same chemotherapy regimen as did the CALGB. This
other evidence could be included in a statistical analysis, and the
Bayesian approach is ideal for synthesis. However, it would be
wrong to regard patients from different studies as exchangeable
and simply pool them. Hierarchical models (10-12) provide an
appropriate alternative. In a hierarchical analysis, patients and trials
are viewed as two different levels of experimental units. Patients
are regarded as sampled from a trial's population, and trials are
regarded as sampled from some larger population—a random effects model. The original trial is a sample of size 1 from this
population, and this is so irrespective of the trial's size.
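The sketch below illustrates the random-effects idea with a simple method-of-moments (DerSimonian-Laird) calculation, which can be read as a rough stand-in for the fully Bayesian hierarchical analysis described above; the trial results are hypothetical and are not data from the cited lung cancer trials.

```python
# A minimal sketch of a random-effects synthesis across trials.  Trial-level
# estimates are treated as draws from a larger population of trials; the
# between-trial variance is estimated by the method of moments.  The
# (log hazard ratio, standard error) pairs below are hypothetical.
trials = [(-0.45, 0.12), (-0.05, 0.15), (-0.30, 0.18), (0.10, 0.20)]

# Fixed-effect (inverse-variance) estimate and Cochran's Q statistic.
w = [1 / se**2 for _, se in trials]
fixed = sum(wi * y for wi, (y, _) in zip(w, trials)) / sum(w)
q = sum(wi * (y - fixed) ** 2 for wi, (y, _) in zip(w, trials))

# Method-of-moments estimate of the between-trial variance tau^2.
k = len(trials)
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (q - (k - 1)) / c)

# Random-effects estimate: every trial is down-weighted by the between-trial
# variance, so no single trial, however large, dominates the synthesis.
w_re = [1 / (se**2 + tau2) for _, se in trials]
pooled = sum(wi * y for wi, (y, _) in zip(w_re, trials)) / sum(w_re)
print(f"fixed-effect log HR {fixed:.3f}; random-effects log HR {pooled:.3f}")
```

The last comment reflects the point above: because trials, and not only patients, are experimental units, even a very large original trial remains a sample of size 1 from the population of trial settings.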
Another virtue of a Bayesian approach is that it fits very well
into a decision analysis. Parmar et al. indicate that a confirmatory trial may not be appropriate, despite a low (skeptical)
probability of a clinically meaningful benefit, if the disease being
treated is rare or if there are no reasonable alternative therapies.
More generally, the issues of prevalence of disease and availability of alternative therapies can be neatly incorporated into a
formal decision analysis, a topic beyond the scope of this editorial (13,14).
Despite its advantages, a Bayesian analysis that ignores the
manner of selection of clinical trials as candidates for confirmation is no better than any other analysis. Trials that are chosen
for replication tend to be those showing the greatest therapeutic
advantage. Observed differences in therapies have at least two
possible explanations: they are real and they are the result of
random variation. Whether or not the first applies, the second almost always does. A trial selected because of a large observed
therapeutic difference is biased. Replicating such a trial will
likely give a smaller difference, even if the circumstances of the
two trials are identical.
This bias results from the regression effect—also called regression to the mean. It is ubiquitous and insidious. Failure to
appreciate its presence is among the most serious of errors made
in medical (and other) research (2). It is possible to adjust for
biases resulting from the regression effect without a confirmatory trial—theoretically at least. One method is to use a hierarchical model including all clinical trials. Appropriate
adjustments depend on many things, including sample sizes of
the various trials. The spirit of such an adjustment is easy to
convey: Estimating the true benefit of a therapy to be a fraction—such as one half—of the observed benefit is likely to be closer to the truth than the observed benefit itself. (Baseball sophisticates
know that a player's batting average in the current year is not a
good estimate of his batting average next year and that a better
estimate is the simple average of his batting average this year
and this year's batting average of the entire league.) However,
the most appropriate adjustment is not obvious because trials
are often considered for confirmation for reasons other than observed treatment differences. Other reasons include the following: medical importance, the presence of corroboratory clinical
or preclinical information, economics, and politics. In a particular trial, it is difficult to know how much of an observed
treatment benefit is real and how much is random. A subjective
assessment using a Bayesian approach is possible (2). However,
this may vary considerably from one assessor to another and not
lead to a consensus.
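The sketch below makes that shrinkage adjustment explicit under a simple normal-normal model; the numbers are purely illustrative, and when the between-trial and within-trial variability happen to be equal the adjustment reduces to the "keep one half of the observed benefit" rule of thumb mentioned above.

```python
# A minimal sketch of a shrinkage (regression-to-the-mean) adjustment under a
# normal-normal model.  The observed benefit is pulled toward the overall mean
# of comparable trials; all numbers are illustrative assumptions.

observed_benefit = -0.36   # large observed effect that prompted confirmation (log HR)
overall_mean = -0.10       # typical effect across comparable trials
between_trial_sd = 0.15    # spread of true effects across trials
within_trial_se = 0.15     # standard error of the observed effect

# Shrinkage factor: the fraction of the observed deviation from the overall
# mean that is retained.  Equal variances give a factor of one half.
shrink = between_trial_sd**2 / (between_trial_sd**2 + within_trial_se**2)
adjusted = overall_mean + shrink * (observed_benefit - overall_mean)
print(f"shrinkage factor {shrink:.2f}; adjusted log HR {adjusted:.2f}")
```

This is the same logic as the batting-average example: the adjusted estimate is an average of the trial's own result and the result for the "league" of comparable trials.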
Overcoming the regression effect is itself a good reason to
replicate a trial. Indeed, since large deviations are usually the
ones checked by replication and the replicated results are usually smaller, the regression effect has probably played an important role in establishing replication as a scientific standard. In
cases where the second trial shows less of a treatment difference
than the first—the typical case, and the case in examples considered by Parmar et al.—one might average the results of the
two trials to obtain an overall estimate. However, because of selection and regression-effect bias, the second trial's results have more
credibility and should be given more weight. In extreme cases, the
first trial might reasonably be discounted entirely.
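One simple way to encode this unequal weighting, sketched below, is to combine the two trials by inverse-variance weighting after inflating the original trial's variance to reflect its selection; the inflation factor and the trial results are illustrative assumptions, not a method given by Parmar et al.

```python
# A minimal sketch of combining an original and a confirmatory trial with
# unequal weights.  The original trial's variance is inflated (illustratively,
# by a factor of 3) to discount its selection and regression-effect bias.

def combine(estimates, variances):
    """Inverse-variance weighted average of trial-level estimates."""
    weights = [1 / v for v in variances]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

first_loghr, first_var = -0.36, 0.15**2     # selected because it looked large
second_loghr, second_var = -0.15, 0.15**2   # confirmatory trial, smaller effect

naive = combine([first_loghr, second_loghr], [first_var, second_var])
adjusted = combine([first_loghr, second_loghr], [3 * first_var, second_var])
print(f"equal-weight average {naive:.2f}; selection-adjusted average {adjusted:.2f}")
```

In the extreme, letting the inflation factor grow without bound discounts the first trial entirely, as suggested above.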
There are settings in which the decision to carry out a confirmatory trial is not difficult. One setting is when the results of the
original trial have, at most, a moderate impact on clinical practice. Apparently, such was the case after the CALGB lung cancer trial that compared radiotherapy plus chemotherapy with
radiotherapy alone (15). Moreover, one of the confirmatory
trial's authors (16) suggests that, even now, patients are usually
treated with thoracic irradiation alone.
Parmar et al. do not address the closely related question of
monitoring trials with the possibility of early stopping. The circumstances in the previous paragraph apply as well when considering stopping a trial prematurely. If a trial's monitoring
committee can foresee that, despite publication of the trial's results, most patients in clinical practice would continue to be
treated with the arm that is the apparent loser—radiotherapy
alone, for example—then it is not ethically imperative to stop the
trial. On the other hand, it may be reasonable to stop an apparently
positive trial early simply to make way for a confirmatory trial.
To summarize, confirmatory trials are always important from
a scientific perspective. Whether they are ethical is another,
much more complicated matter. The methods of Parmar et al.
are useful in assessing the need to confirm a trial, with two
reservations: 1) The setting of the original trial may be special
and Bayesian or other analyses should recognize the possibility
of trial differences, and 2) the regression effect makes it difficult
to assess the magnitude of a treatment benefit on the basis of information from any single trial.
References
(1) Parmar MK, Ungerleider RS, Simon R. Assessing whether to perform a confirmatory randomized clinical trial. J Natl Cancer Inst 1996;88:1645-51.
(2) Berry DA. Statistics: a Bayesian perspective. Belmont (CA): Duxbury,
1996.
(3) Dillman RO, Seagren SL, Propert KJ, Guerra J, Eaton WL, Perry MC, et
al. A randomized trial of induction chemotherapy plus high-dose radiation
versus radiation alone in stage III non-small-cell lung cancer [see comment citations in Medline]. N Engl J Med 1990;323:940-5.
(4) Dillman RO, Herndon J, Seagren SL, Eaton WL Jr, Green MR. Improved
survival in stage III non-small-cell lung cancer: seven-year follow-up of
Cancer and Leukemia Group B (CALGB) 8433 trial [see comment citation
in Medline]. J Natl Cancer Inst 1996;88:1210-5.
(5) Sause WT, Scott C, Taylor S, Johnson D, Livingston R, Komaki R, et al.
Radiation Therapy Oncology Group (RTOG) 88-08 and Eastern Cooperative Oncology Group (ECOG) 4588 preliminary results of a phase III trial
in regionally advanced, unresectable non-small-cell lung cancer. J Natl
Cancer Inst 1995;87:198-205.
(6) Mattson K, Holsti LR, Holsti P, Jakobsson M, Kajanu M, Liippo K, et al. Inoperable non-small cell lung cancer: radiation with or without chemotherapy. Eur J Cancer Clin Oncol 1988;24:477-82.
(7) Trovo MG, Minatel E, Veronesi A, Roncadin M, De Paoli A, Franchin G,
et al. Combined radiotherapy and chemotherapy versus radiotherapy alone
in locally advanced epidermoid bronchogenic carcinoma. A randomized
study. Cancer 1990;65:400-4.
(8) Morton RF, Jett JR, McGinnis WL, Earle JD, Therneau TM, Krook JE, et
al. Thoracic radiation therapy alone compared with combined chemoradiotherapy for locally unresectable non-small cell lung cancer. A randomized, phase III trial [see comment citation in Medline]. Ann Intern
Med 1991;115:681-6.
(9) Le Chevalier T, Arriagada R, Quoix E, Ruffie P, Martin M, Tarayre M, et
al. Radiotherapy alone versus combined chemotherapy and radiotherapy in
nonresectable non-small-cell lung cancer: first analysis of a randomized
trial in 353 patients [see comment citation in Medline]. J Natl Cancer Inst
1991;83:417-23.
(10) Stangl DK. Hierarchical analysis of continuous-time survival models. In:
Berry DA, Stangl DK, editors. Bayesian biostatistics. New York: Marcel
Dekker, 1996:429-50.
(11) Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis. London: Chapman & Hall, 1995.
(12) Gilks WR, Richardson S, Spiegelhalter D. Practical Markov chain Monte Carlo. New York: Chapman & Hall, 1996.
(13) Berry DA. Decision analysis and Bayesian methods in clinical trials. In:
Thall P, editor. Recent advances in clinical trial design and analysis. New
York: Kluwer Press, 1995:25-54.
(14) Berry DA, Stangl DK. Bayesian methods in health-related research. In:
Berry DA, Stangl DK, editors. Bayesian biostatistics. New York: Marcel
Dekker, 1996:1-66.
(15) Tannock IF, Boyer M. When is a cancer treatment worthwhile? [editorial]
[see comment citation in Medline]. N Engl J Med 1990;323:989-90.
(16) Johnson DH. Combined-modality therapy for unresectable, stage III non-small-cell lung cancer—caveat emptor or caveat venditor? [editorial] [see comment citation in Medline]. J Natl Cancer Inst 1996;88:1175-7.