A practical approach for eliciting expert prior beliefs about cancer

Journal of Clinical Epidemiology 62 (2009) 431e437
A practical approach for eliciting expert prior beliefs about cancer
survival in phase III randomized trial
Anne Hiancea, Sylvie Chevretb,c, Vincent Levya,c,*
a
INSERM 9504, Centre d’Investigations Cliniques, H^
opital Saint Louis, Assistance Publique-H^
opitaux de Paris, Universit
e Paris 7, France
Department de Biostastique et Informatique M
edicale, H^
opital Saint Louis, Assistance Publique-H^
opitaux de Paris, Universit
e Paris 7, France
c
INSERM U 717
b
Accepted 14 April 2008
Abstract
Objective: To propose and compare practical approaches that allow eliciting and using expert opinions about the benefit effect on a censored endpoint, such as event-free survival (EFS), used in the planning of a clinical trial based on Bayesian methodology.
Study Design and Setting: Individual interviews of 37 experts. Bayesian normal models on the log hazard ratio (HR) of EFS were
implemented. We illustrate our approach by using a trial of autologous stem cell transplantation (ASCT) vs. chemotherapy (CT) in chronic
lymphocytic leukemia (CLL). We elicited experts’ prior beliefs about the difference in 3-year EFS between the two treatment arms, either
roughly or throughout weights over the difference scale. Subsequently, a Bayesian synthesis of the information reported in the trial protocol
with that in the experts’ prior was performed, using: (1) the postulated treatment effect based on null (skeptical) and alternative (enthusiastic) hypotheses with shared standard error; and (2) the expected difference derived from experts’ distributions.
Results: As compared with the priors based on the trial protocol data, expert priors agreed with some average from enthusiastic and
skeptical information, with close standard errors.
Conclusion: This case study illustrates a rational approach to construct an expert-based prior. It should be considered as part of the
design of future Bayesian trials. Ó 2009 Elsevier Inc. All rights reserved.
Keywords: Elicitation; Prior; Bayes; Expert opinion; Chronic lymphocytic leukemia; Randomized trial
1. Introduction
Bayesian methods provide an interesting and innovative
alternative to classical statistical approaches in the design,
monitoring, and reporting of clinical trials. One of the
advantages of a Bayesian approach is the ability to make
explicit probability statements about hypotheses, for
instance, the probability that the experimental treatment
is superior to the standard one. Indeed, such probability
statements cannot be inferred through classical approaches,
although many clinicians misinterpret P values and confidence intervals (CIs) as so. In contrast, Bayes’ inference
directly considers uncertainty in treatment effect using
a probability distribution [1,2]. This Bayesian requirement
for a so-called ‘‘prior’’ distribution, which reflects the
* Corresponding author. CIC, H^opital Saint Louis, 1 Avenue, Claude
Vellefaux 75475, Paris, France. Tel.: þ33-1-42-49-97-42; fax: þ33-1-4249-93-97.
E-mail address: [email protected] (V. Levy).
0895-4356/09/$ e see front matter Ó 2009 Elsevier Inc. All rights reserved.
doi: 10.1016/j.jclinepi.2008.04.009
initial information about the treatment effect, including
the investigator’s understanding of the biology of the disease, historical results for the investigational and related
treatments, and preclinical results for these treatments, thus
inherently subjective, has been long reported as the main
drawback of such approaches. Indeed, in such a regulatory
setting, it is important for sponsors and regulators to agree
in advance to the prior distribution(s) that will be used [1].
For overcoming concerns about the subjective nature of
prior distributions, a common approach is to assume a prior
distribution that is ‘‘non-informative’’ [1]. In this setting,
the results of the trial will carry essentially all the influence
in the so-called posterior distribution. Another is to consider a variety of prior distributions in attempting to
approximate the posterior distribution held by all types of
readers. Spiegelhalter et al. [3] and Parmar et al. [4]
proposed to use both enthusiastic and skeptical priors,
based on the alternative and null hypotheses stated when
computing the trial sample size, respectively. Finally, collecting and documenting subjective prior beliefs from
knowledgeable experts about the potential results of
432
A. Hiance et al. / Journal of Clinical Epidemiology 62 (2009) 431e437
a clinical trial have been reported to gain/encompass many
advantages [5]. It is classically described in four separate
stages [6,7]: (1) set-up stage, that is, selecting and training
the experts and identifying the aspects of the problem to
elicit; (2) elicitation itself, that is, interaction with the
experts; (3) fit of a probability distribution to those summaries; and (4) assessing the adequacy of the elicitation.
Different methods have already been used to elicit subjective opinion of informed experts in health care assessment. Four main categories have been described in
literature: (1) informal discussion [8,9]; (2) structured interviewing and formal pooling of opinion (Freedman and
Spiegelhalter described an interviewing technique, in
which, a set of experts were individually interviewed and
hand-drawn plots of their prior distribution were made
[10]); (3) structured questionnaire [4]; and (4) computerbased elicitation [11]. However, despite a literature widely
developed on prior beliefs’ elicitation [12], few researches
have been carried out on this process in clinical trials.
Therefore, the purpose of this article was to illustrate the
conduct and detailed results of prior elicitation with a phase
III randomized clinical trial in chronic lymphocytic leukemia (CLL), which is currently taking place, as a case study.
Several experts were asked their opinions with regard to the
expected difference between the studied treatments, namely
autologous stem cell transplantation (ASCT) and conventional chemotherapy (CT). We describe how we elicited
expert prior beliefs about the effects of ASCT for the treatment of CLL, from structured interviewing and questionnaire. The resulting priors are compared with those
reached from the protocol design.
2. Materials and methods
2.1. Trial information
As an illustrative example, we choose the auto-CLL,
a phase III randomized clinical trial designed to assess
the benefit of ASCT-consolidation treatment following
induction CT in patients with previously untreated Binet’s
stage B or C CLL [13]. All patients were scheduled to
receive three cycles of ChOP (cyclophosphamide, adriamycin, vincristine, prednisone) followed by three cycles of fludarabine. Once stem cell collection was completed, patients
responding to CT (complete or partial response according
to the National Cancer Institute (NCI) recommendations
[14]) had to be randomized between a watch-and-wait policy and ASCT, with conditioning regimen consisting in
total body irradiation (TBI) plus high-dose cyclophosphamide. The primary outcome was the 3-year event-free
survival (EFS) rate.
When planning the trial in 2000, an absolute difference
of 40% in 3-year EFS was expected with ASCT over CT
(CLL 90 trial [15]). This led to a recommended sample size
of 80 responders (2 40), with an 80% power and two-
sided significance level of 5%. Owing to the 40% expected
prevalence of response, a total of 200 required CLL patients
had to be enrolled.
2.1.1. Elicitation of prior opinion
Thirty-seven French hematologist experts, all involved
in the treatment of patients with CLL, were interviewed.
The demographic characteristics of the experts are detailed
in Table 1. Interviews took place either during the French
Haematology Society’s (SFH) annual meeting in Paris in
March 2006, or by e-mail, followed by telephone calls.
The investigators’ prior opinions were elicited by the use
of a questionnaire.
To make the elicitation as easy as possible for subjectmatter experts to tell us what they believe, while reducing
how much they need to know about probability theory to
do so, we decided to ask predictive rather than structural
questions, following the argument of Kadane and Wolfson
[16]. Thus, experts were asked to assess only observable
quantity conditionally on the treatment group, which is
the probability of experiencing the endpoint within the first
3 years of ASCT as consolidation treatment and CT alone,
respectively. Subsequently, the experts were asked to give
their own weights of belief over the difference scale,
involving, in some sense, the assessment of individual probabilities. The questionnaire also aimed to capture the investigators’ range of equivalence, which is the interval of
difference in 3-year EFS, within which, the investigator
feels that the two treatments are equivalent. In addition,
they were also asked for their opinion on the 3-year EFS
Table 1
Main characteristics of the 37 experts
Variables
Experts (N 5 37)
Sex
Male
Female
N 5 24 (64.8%)
N 5 13 (35.1%)
Time from MD degree (years)
Median: 17.5
Range: 3e29
Institution
University Hospital
General Hospital
Cancer Center
N 5 34 (91.9%)
N 5 2 (5.4%)
N 5 1 (2.7%)
Discipline
Clinician
Biologist
N 5 34 (91.9%)
N 5 3 (8.1%)
Function
Head
Professor
Associate professor
Assistant professor
Attending
CLL as first domain of interest
Inclusions in the auto-CLL trial
Interviewed by phone
N57
N 5 10
N 5 14
N55
N51
N 5 15
N 5 18
N58
Abbreviation: CLL, chronic lymphocytic leukemia.
(18.9%)
(27%)
(37.8%)
(13.5%)
(2.7%)
(40.5%)
(50%)
(21.6%)
A. Hiance et al. / Journal of Clinical Epidemiology 62 (2009) 431e437
for CT alone, and the number of CLL patients to whom they
had given auto-graft in the preceding 12 months. Otherwise, experts were asked to consider themselves as the
main investigators of this trial and to give the number of
patients they would reasonably plan to include to answer
the question of the superiority of ASCT over CT. Finally,
they had to quantify the number of patients they would
be ready to treat with ASCT, so that only one patient would
benefit of a lengthening of EFS, that is, the number needed
to treat (NNT).
The questionnaire was closely based on that used by
both Parmar et al. [4] and Tan et al. [17]. It is given in
Appendix 1. It was either hand-delivered or sent to the
investigators by e-mail, by two physicians (A.H. and V.L.).
2.1.2. Constructing the priors
Because the original protocol used a fixed hazard ratio
(HR) model, and because the HR represents the most
commonly used scale for comparing survival between treatments, as it is specifically designed to allow for censoring
and time to an event [18], we decided to use these grounds
alone. Therefore, a transformation was required to convert
the difference results to the log HR (LHR) scale, throughout the equation:
LHR 5 log½logðP1Þ=logðP2Þ
ð1Þ
where P1 is the 3-year EFS rate in the experimental
group (CT þ ASCT), and P2 is the 3-year EFS rate in the
conventional group (CT). An LHR of more than 0 (i.e.,
HR O 1) indicates an advantage to CT and the converse.
Furthermore, the use of LHR and its variance allows prior
normal distributions to be fitted directly [19]. We assumed
LHR to be normally distributed, with m0 and s0 being the
prior mean and the prior standard deviation (SD), respectively. This prior is equivalent to a normal likelihood arising from a hypothetical trial of n0 patients with an
observed value m0 of the treatment difference statistic
LHR [1].
Several normal prior distributions were fitted, differing
in terms of mean m0 and s0.
First, we considered two, either enthusiastic or skeptical,
priors based on the alternative and null hypotheses used when
computing the trial sample size, respectively, with shared
variance, as proposed by Spiegelhalter et al. [3] and Parmar
et al. [4]. The enthusiastic prior was, thus, directly computed
from equation (1) mentioned earlier, on the basis of a 3-year
EFS rate of P2 5 50% for the control group and P1 5 90%
for the experimental group, resulting in a mean LHR of
m0 5 1.884. The variance, s20, was approximated by 4/n0
[19], where n0 is the total number of adjusted deaths from
all the studies summarized [20]. In absence of external evidence, we based this computation on the information used
when planning the trial from the Schoenfeld formula [21],
that is, n0 5 4 (F(1 a/2) þ F(1 b))2/m20, where F() is
the standard normal distribution function, and a and b are
the type I and the type II errors, respectively. Both type errors
433
were fixed at 5%. Therefore, the prior variance was assumed
to be fixed at s20 5 4/14.65 5 0.273.
In contrast, the skeptical prior represents the belief of
individuals who are reluctant to accept that high response
rates are possible; hence, it formalizes the belief that large
treatment difference is unlikely, based on a 3-year EFS rate
of 50% for both treatment groups, that is, P1 5 P2 5 50%.
Consequently, the prior skeptical mean was m0 5 0, with
the same variance as the enthusiastic prior [22]. In other
words, the skeptical prior was equivalent to having performed a trial with n0 patients, all of whom have died
and, in which, no difference has been observed between
the two arms. Otherwise, the skeptical prior distribution
was such that the probability of an effect as large or larger
than log (h1) was 0.013, in agreement with its skeptical
nature [23].
Subsequently, we used the experts’ information achieved
by the questionnaire to solicit their prior beliefs. Because
two main results were available, namely individual 3-year
EFS in both groups, and individual weights on the treatment difference scale, different priors were computed.
The first expert prior averaged the informative prior beliefs
about the treatment effect of all experts. It was built by: (1)
transforming each individual answer in terms of difference
on 3-year EFS in the LHR scale by using equation (1); and
(2) computing mean and SD of observed distribution
among the experts. Otherwise, each expert’s weights could
be considered as a discrete probability distribution for the
expected 3-year difference in outcomes across randomized
groups, on a discrete scale of 18 predetermined values. To
synthesize these different distributions into a single distribution, unweighted average of the individual probability
distributions comprising it [24] were computed. First, we
averaged the own full priors of each expert by combining
expert weights (expert prior 2). Briefly, it consisted in:
(1) plotting distributions of individual weights across
LHR after having transformed differences in that scale
(Fig. 1A); and (2) estimating the mean and SD of such a distribution by integration over the whole LHR scale. Finally,
to better reflect the fact that individual weights corresponded to a discrete probability distribution, assuming
all experts to have access to the same amount of information, we fitted a multinomial model to the data; this will
be referred further as expert prior 3.
2.2. Statistical analysis
Summary statistics (mean with 95% CI) were computed
for each item of the questionnaire. Association between
experts’ characteristics and prior beliefs regarding the difference in the 3-year EFS between the two arms were tested
by using the Wilcoxon rank-sum test. They consisted of
sex, time from Medical Doctor Degree (MD), CLL as the
first domain of interest, participation as an investigator of
the Auto-LLC trial, number of CLL patients treated with
ASCT, and the number of ASCT in the previous 12 months.
434
A. Hiance et al. / Journal of Clinical Epidemiology 62 (2009) 431e437
A
B
Table 2
Main results of the questionnaire on the prior beliefs of the 36 experts
with regard to treatment effect in the CLL trial
80
0.20
60
40
Pr
%
0.15
0.10
20
0.00
(-2.8,-2.6] (-0.6,-0.4]
LogHR
Mean (95% CI)
3-year EFS for CT alone
Minimum differencea in the 3-year EFS
between the two arms
Range of equivalence: minimum boundary
Range of equivalence: maximum boundary
Sample size of a planned trial
Number needed to treatb
43.7% (38.9e48.6%)
22.4% (17.2e27.6%)
10%
20%
200
5
(10e15)
(20e30)
(200e300)
(1.5e9)
Abbreviations: CI, confidence interval; CT, chemotherapy; EFS, eventfree survival.
a
Stated as ‘‘discounted,’’ ‘‘interesting,’’ or ‘‘minimum worthwhile’’
difference in the questionnaire.
b
Stated as number of patients expert would be ready to treat with autologous stem cell transplantation, so that only one patient would benefit of
a lengthening of EFS.
0.05
0
Items
-2.0
-1.0
0.0
1.0
LogHR
Fig. 1. Empirical distribution of the difference on the 3-year event-free
survival (EFS) rate measured on the log hazard ratio (LHR) scale (A), with
fitted multinomial distribution (B), both based on the 36 experts’ weight
for each interval of difference.
Statistical analysis was performed on R (R Foundation
for Statistical Computing, Vienna, Austria, URL http://
www.R-project.org) software package.
3. Results
3.1 Results of the questionnaire
The questionnaire was proposed to 37 experts. Table 1
summarizes their main characteristics. About two-thirds
were male (65%), and one-half had their MD degree for
more than 17.5 years. Ninety-two percent worked in a university teaching hospital, and were clinicians. More than
40% declared CLL as their main domain of interest. The
median number of ASCT performed during the previous
12 months was 45 (range: 0e140), with a median of CLL
patients treated with ASCT of 2 (range: 0e9).
All fulfilled the information requested, except one hematologist, who declined to provide any answer. Time devoted
for each interview ranged from 20’ to 45’.
Table 2 summarizes the main answers to the questionnaire. The average prior beliefs on the 3-year EFS for CT
alone were estimated at 43.7% (95% CI: 38.9e48.6%).
This was less optimistic than that used when planning the
trial, because a 3-year EFS rate of 50% was expected in
the reference group. The estimated total median sample
size was 200 (range: 76e400). It was close to the initial
calculation, because, in 2000, 208 patients were planned
to be included in this trial. Therefore, clinicians have a good
perception of the number of patients required to give a statistically significant result related to the question the trial
aims to answer. The median NNT was 5 (range: 0e50).
Of note, 17 clinicians refused to estimate NNT, whereas
five clinicians estimated a null NNT.
The mean reported ‘‘interesting,’’ ‘‘discounted,’’ or
‘‘minimum worthwhile’’ difference in the 3-year EFS rate
between the two arms was 22.4% (95% CI: 17.2e27.6%).
Only the inclusion in auto-CLL trial appeared to modify
this finding; experts who included patients in the trial
reported a mean difference of 22.11% (95% CI:
16.29e27.91%), whereas those who did not participate in
the trial reported a mean difference of 16.94% (95% CI:
12.42e21.46%). Otherwise, the overall mean prior belief
regarding the difference in the 3-year EFS between the
two treatments was in the range of equivalence, which is
12.5% to 23.2%. Eleven of the 36 clinicians stated the
range of equivalence to be 10% to 20% in favor of ASCT.
Although experts mostly estimated the minimum worthwhile difference at the lower, the upper or between the
limits of the range of equivalence, nine experts estimated
minimum worthwhile difference far from it.
The distributions of the probability values that experts
individually assigned to the various intervals of 3-year differences in EFS was centered above 10, indicating that experts mostly expected that ASCT could improve the 3-year
EFS rate compared with CT alone. Most experts expected
a difference of at least 15%, which is below their demanded
minimum worthwhile difference, meaning that experts are
quite pessimistic with regard to the trial’s results.
3.1.1 Estimation of the prior distribution for treatment
effect size based on expert data
Having obtained the information on the 3-year EFS for
CT alone, and that for the difference in 3-year EFS, we
transformed the data to the LHR scale for each expert, using equation (1). The resulted mean LHR among the 37 experts was 0.69, with SD 5 0.494 (expert prior 1; Table 3).
The box plots of the 18 recorded weights across individual experts are displayed in Fig. 1A. The second expert
prior was first fitted to the observed expert distribution corresponding to the mean’s weight of beliefs for each treatment difference given in the questionnaire, as revealed
A. Hiance et al. / Journal of Clinical Epidemiology 62 (2009) 431e437
Table 3
Estimated priors from the expert opinion
Enthusiastic
Skeptical
Expert’s prior
Expert’s prior
Expert’s prior
Expert’s prior
1a
2b
3c
4d
LHR (m0)
SD (s0)
Proportion in the range
of equivalence (%)
1.884
0
0.690
0.717
0.3565
0.660
0.523
0.523
0.494
0.628
0.313
0.393
!0.01
8.81
27.29
21.76
38.02
42.67
Abbreviations: LHR, log hazard ratio; SD, standard deviation.
a
According to the minimum worthwhile difference on the 3-year EFS
rate.
b
According to the average of weights over the scale of expected differences in the 3-year EFS rate.
c
According to the multinomial fit of weights over the scale of expected differences in the 3-year EFS rate.
d
According to the minimum worthwhile difference on the 3-year EFS
among experts who did not participate in the auto-CLL trial.
earlier. This reached a mean LHR of 0.717 with
SD 5 0.628. Finally, the third expert prior, resulting from
fitting the multinomial model to individual weights
(Fig. 1B), has a mean LHR of 0.3565 and SD of
0.3133. Of note, these resulting experts’ prior distributions
lie in between the enthusiastic and skeptical priors derived
from the protocol information (Fig. 2). This suggests some
substantial uncertainty about which of the treatments would
benefit the patient most, fulfilling the principle of
2.0
Enthousiastic
Sceptical
Experts: Prior 1
Experts: Prior 2
Experts: Prior 3
Experts: Prior 4
1.5
1.0
0.5
0.0
-4
-3
-2
-1
0
1
2
logHR
ASCT superior
CT alone superior
ASCT = autologous stem-cell transplantation; CT = chemotherapy
Fig. 2. Enthusiastic, skeptical, and experts’ prior distributions. Expert’s
prior distributions were constructed according to the observed distribution
among experts of the minimum worthwhile difference (expert’s prior 1 regarding all clinicians and expert’s prior 4 regarding clinicians who did not
participate in the auto-chronic lymphocytic leukemia [CLL] trial) or according to the average of the 36 expert probability distributions (expert’s
prior 2) or the multinomial fit of the 36 expert probability distributions (expert prior 3).
435
equipoise. As expected, the experts’ priors 1 and 2 were almost superposed, whatever it was, based on the minimum
worthwhile difference or on the weights on the expected
difference, although the variability of the latter was increased. In contrast, expert prior 3 was closest to the skeptical prior, illustrating that elicited distributions varied from
other distributions, which might also be consistent with the
expert’s knowledge.
4. Discussion
Bayesian methods are being increasingly recognized as
appropriate for statistical analysis of clinical trial data. Advantages of Bayesian analysis over classical analysis of
clinical trials include: the ability to incorporate prior information regarding treatment effect into the analysis; the ability to make multiple unscheduled inspections of
accumulated data without increasing the error rate of the
study; and the ability to calculate the probability that one
treatment is more effective than another.
However, the implementation of Bayesian methods requires the specification of prior distributions. A prior is often the purely subjective assessment of an experienced
expert. However, subjectivity in prior distributions is explicit [3] and open to examination (and critique) by all.
The final report of trial analysis could display the dependence of the posterior distribution on the different choices
of informative priors, including non-informative priors as
reference models to be used as standard of comparison
against situations where subjective priors are used
[25,26]. To formulate a prior distribution, opinion of informed experts is needed, and this process is defined by
elicitation. Elicitation is largely described in literature, because it can be applied in many domains. Moreover, eliciting prior beliefs is ethically important for documenting the
nature of the uncertainty or equipoise [5,17]. However, despite the existence of a broad and diverse literature in elicitation, which has provided many valuable procedures and
insights, there remains very considerable scope for easy application. Regarding randomized phase III trials, Bayesian
methods and elicitation process have been less described.
The literature devoted to elicitation techniques used in clinical trials is based on a small number of experts (n 5 4 [27],
n 5 10 [28], n 5 11 [4], or n 5 14 [17]). In this study, 37
experts were elicited, and we performed the elicitation process during an individual interview. Face-to-face elicitation
proved to be useful, considering the time spent with each
expert (ranging approximately from 20’ to 45’).
Elicitation of prior distributions is based on statistic interpretation, and experts are not necessarily familiar with it.
Literature on elicitation recommended the training of the
experts first to familiarize them with the interpretation of
probabilities, but this method was difficult to perform in
practice. We used a questionnaire with clinically meaningful questions and direct interaction with clinicians, although some were contacted by e-mail.
436
A. Hiance et al. / Journal of Clinical Epidemiology 62 (2009) 431e437
Clinicians were asked to estimate observable quantities
that appeared easier to think about than regression parameters. Many inconsistencies were encountered, possibly because of misunderstandings. For instance, the fact that
clinicians’ answers broadly differed from the question concerning the minimum clinically worthwhile difference and
that regarding the range of equivalence revealed some elicitation issues. Parmar et al. [4] interpreted the range of
equivalence as the minimum worthwhile difference, but
nine clinicians (25%) answered in a very different way to
the two questions. This highlights the impact of questions’
formulation and the cautious interpretation of clinicians’
answers. Otherwise, results regarding the NNT were difficult to analyze. On one hand, 17 clinicians (47%) refused
to estimate this quantity and, on the other hand, five quantified the NNT to 0. No agreement between reported NNT
and benefit was observed.
Prior distributions from the experts were not obtained
before the trial onset. However, despite the name, ‘‘prior’’
suggests a temporal relationship; it does not refer to time,
but to a situation where we assess what our evidence would
have been if we had no data [3]. In reality, no significant
external evidence of ASCT benefit in CLL has been reported. Of significance is the inclusion of patients in the
auto-CLL trial as a significant factor influencing minimum
worthwhile 3-year EFS difference (P 5 0.031). However,
the 17 experts who did not include patients in the trial were
rather more skeptical than the others. Finally, little or no
impact of the trial data on the beliefs of the experts has
been previously reported [29].
There were conceptual main differences between the expert priors. Indeed, the first expert prior did not correspond
to the belief of any one expert, but averaged the informative
prior beliefs about the treatment effect of all experts. Instead, the other priors averaged the own full priors of each
expert by combining expert weights. To synthesize these
different weights into a single distribution, we used one
of the most popular methods known as ‘‘opinion pools,’’
which was a weighted average of the individual probability
distributions comprising it, with simple equal weighting of
the experts [24]. As different techniques may produce different distributions, because the method of questioning
may have some effect on the way the problem is viewed,
some differences in the resulting expert priors were expected [24]. We performed these different approaches as
a sensitivity analysis to allow determining whether, when
the elicited distribution varied from other distributions that
might also be consistent with the expert’s knowledge, the
results derived from that distribution change appreciably.
In reality, observed expert prior distributions 1 and 2 were
very close, whereas distribution 3 was somewhat different.
It can be partly explained by the fact that the expert prior 2
was derived from a normative method for combining the
quantitative judgments of the experts with regard to probability distribution. Expressing opinions as probability measures seemed to lead invariably to arithmetic averaging, as
previously reported in the work of Genest and Zidek [24].
Nevertheless, this allowed getting further insights in the
variability of experts’ beliefs, as illustrated by larger SD
of the expert prior 2 contrasting with the smaller SD of expert prior 3. Finally, ‘‘appreciable’’ changes could have
been more obvious if differences in priors had been translated, as previously proposed [20], on a sample size scale.
Indeed, because the expert prior’s LHR mean ranged from
0.3565 to 0.717, this would result in a required number
of deaths (with a 5 b 5 0.05) ranging from 101 to 409,
contrasting with the required 15 deaths scheduled from
the alternate hypothesis.
In summary, Bayesian approaches for randomized clinical trials appear promising. The elicitation process from
a small set of experts should be considered as part of the
design of future trials. We have shown that choices must
be made in converting expert opinion into statistical distributions. Nevertheless, as stated by Spiegelhalter et al. [3],
there is no ‘‘correct’’ or ‘‘true’’ prior; instead, a ‘‘community’’ of priors, expressing a range of reasonable opinions
that have at least consensus support to be generally accepted by a wider community, is used.
Acknowledgment
The authors wish to thank the 37 experts who participated in the survey (Appendix 2).
References
[1] Berry DA. Bayesian clinical trials. Nat Rev 2006;5:27e36.
[2] Freedman LS, Spiegelhalter DJ, Parmar MK. The what, why and how
of Bayesian clinical trials monitoring. Stat Med 1994;13(13e14):
1371e83. discussion 1385e1389.
[3] Spiegelhalter DJ, Myles JP, Jones DR, Abrams K. Bayesian methods
in health technology assessment: a review. Health Technol Assess
2000;4(38):1e130.
[4] Parmar MKB, Spiegelhalter DJ, Freedman LS. The CHART trials:
Bayesian design and monitoring in practice. Stat Med 1994;13:
1297e312.
[5] Chaloner K,, Rhame FS. Quantifying and documenting prior beliefs
in clinical trials. Stat Med 2001;20(4):581e600.
[6] Garthwaite PH, Kadane JB, O’Hagan A. Statistical methods for eliciting probability deistributions. J Am Stat Assoc 2005;100:680e701.
[7] O’Hagan A. Eliciting expert beliefs in substantial practical applications. Statistician 1998;47(Part 1):21e35.
[8] Lilford RJ, Thornton JG, Braunholtz D. Clinical trials and rare diseases: a way out of a conundrum. BMJ 1995;311(7020):1621e5.
[9] Rosner GL,, Berry DA. A Bayesian group sequential design for a multiple arm randomized clinical trial. Stat Med 1995;14(4):381e94.
[10] Freedman LS, Spiegelhalter DJ. The assessment of subjective opinion
and its use in relation to stopping rules for clinical trials. Statistician
1983;32:153e60.
[11] Chaloner K, Freedman LS, Spiegelhalter DJ. Graphical elicitation of
a prior distribution for a clinical trial. Statistician 1993;42:341e53.
[12] O’Hagan O, Buck CE, Daneshkhah A, Eiser JR, Ggarthwaite PH,
Jenkinson DJ, et al. Uncertain judgement. Eliciting experts’ probabilities. Statistics in Medicine. John Wiley and Sons Ltd; 2006.
[13] .www.sfgm-tc.com/documents/ProtocolerandomisedPhaseIII.pdf.
[14] Cheson BD, Bennett JM, Grever M, Kay N, Keating MJ, O’Brien S,
Rai KR. National Cancer Institute-sponsored Working Group
A. Hiance et al. / Journal of Clinical Epidemiology 62 (2009) 431e437
[15]
[16]
[17]
[18]
[19]
[20]
[21]
guidelines for chronic lymphocytic leukemia: revised guidelines for
diagnosis and treatment. Blood 1996;87(12):4990e7.
Leporrier M, Chevret S, Cazin B, Boudjerra N, Feugier P,
Desablens B. Randomized comparison of fludarabine, CAP, and
ChOP in 938 previously untreated stage B and C chronic lymphocytic
leukemia patients. Blood 2001;98(8):2319e25.
Kadane JB,, Wolfson LJ. Experiences in elicitation. Statistician
1998;47(1):3e19.
Tan SB, Chung YFA, Tai BC, Cheung YB, Machin D. Elicitation of
prior distributions for phase III randomized controlled trial of adjuvant therapy with surgery for hepatocellular carcinoma. Control Clin
Trials 2003;24:110e21.
Kalbfleish JD. Nonparametric Bayesian analysis of survival time
data. J R Stat Soc Ser B 1978;40:214e21.
Tsiatis AA. The asymptotic joint distribution of the efficient scores
test for the proportional hazards model calculated over time. Biometrika 1981;68:311e5.
Tan S, Chung YFA, Tai B, Cheung Y, Machin D. Can external and
subjective information ever be used to reduce the size of randomized
controlled trials? Contemporary Clinical trials, 2007.
Schoenfeld D. Sample-size formula for the proportional-hazard regression model. Biometrics 1983;39:499e503.
437
[22] Spiegelhalter DJ, Freedman LS, Parmar MK. Applying Bayesian
ideas in drug development and clinical trials. Stat Med
1993;12(15-16):1501e11. discussion 1513e1517.
[23] Fayers P, Ashby D, Parmar M. Tutorial in biostatistics. Bayesian data
monitoring in clinical trials. Stat Med 1997;16:1413e30.
[24] Genest C,, Zidek JV. Combining probability distributions: a critique
and an annotated bibliography. Stat Sci 1986;1(1):114e35.
[25] Bernardo JM. Reference posterior distributions for Bayesian inference. [with discussion]. J R Stat Soc Ser B 1979;41:113e47.
[26] Van Dongen S, Talloen W, Lens L. High variation in developmental
instability under non-normal developmental error: a Bayesian perspective. J Theor Biol 2005;236:263e75.
[27] White IR, Pocock SJ, Wang D. Eliciting and using expert opinions
about influence of patient characteristics on treatment effects:
a Bayesian analysis of the CHARM trials. Stat Med 2005;24(24):
3805e21.
[28] Abrams K, Ashby D, Errington D. Simple Bayesian analysis in clinical trials: a tutorial. Control Clin Trials 1994;15(5):349e59.
[29] Rovers MM, van der Wilt GJ, van der Bij S, Straatman H, Ingels K,
Zielhuis GA. Bayes’ theorem: a negative example of a RCT on
grommets in children with glue ear. Eur J Epidemiol 2005;20(1):
23e8.
437.e1
A. Hiance et al. / Journal of Clinical Epidemiology 62 (2009) 431e437
Survival difference rate compared to chemotherapy
alone of:
Appendix 1
Questionnaire submitted to the panel of 37 experts to
elicit their opinion (translated from French)
j_j_j % at 2 year
j_j_j % at 3 year
j_j_j % at 5 year
The aim of this questionnaire is to ask your opinion regarding a randomized phase III trial assessing the benefit
over conventional chemotherapy of intensification with autologous stem cell transplantation as the first line treatment
in patients with B or C Chronic Lymphocytic Leukemia
(Principal investigator: Laurent Sutton).
Median difference:
j_j_j_j months in terms of survival
j_j_j_j months in terms of EFS
Question 1
Question 3
For chemotherapy alone, what Event Free Survival
(EFS) rate do you expect?
We are interested in your estimation of the difference in
the 3-year Event Free Survival rate that you expect between
chemotherapy alone and intensification with autologous
stem cell transplantation.
Please enter your weights of belief in each of the possible intervals of difference shown in the table below.
The stronger you believe that the difference will truly lie
in a given interval, the greater should be your weight for
that interval. If you believe it is impossible that the difference lies in a given interval your weight should be zero.
Your weights should add up to 100.
Example: This is the example of a clinician who believes
the most plausible situation is that intensification with autologous stem cell transplantation is better than chemotherapy alone by 10e15%. However, he is prepared to concede
that it may do better than this. He even accepts the possibility that chemotherapy may harm 0e5%. As a consequence
j_j_j % at 2 year
j_j_j % at 3 year
j_j_j % at 5 year
Question 2
Please try to quantify the interesting or discounted or
minimum worthwhile difference of EFS, survival, and median survival.
EFS difference rate compared to chemotherapy alone of:
j_j_j % at 2 year
j_j_j % at 3 year
j_j_j % at 5 year
Diff erence in 3- ye ar Ev ent Free Survi val rat e
Intensification with autologous stem cell transplantation worse than
50+
0
45-50
40-45
35-40
30-3 5
25-30
20-2 5
15-2 0
10-15
5-10
0
0
0
0
0
0
0
0
0
0- 5
5
Intensification with autologous stem cell transplantation better than
0-5
20
5-10
10-15
15-20
20-2 5
25-30
30-3 5
35-4 0
40-45
45-50
20
35
20
0
0
0
0
0
0
10-15
5-10
0- 5
40-45
45-50
50+
50+
0
TOTA L
100
Your opinion
Diff erence in 3- ye ar Ev ent Free Survi val rat e
Intensification with autologous stem cell transplantation worse than
50+
45-50
40-45
35-40
30-3 5
25-30
20-2 5
15-2 0
Intensification with autologous stem cell transplantation better than
0-5
5-10
10-15
15-20
20-2 5
25-30
30-3 5
35-4 0
TOTA L
100
A. Hiance et al. / Journal of Clinical Epidemiology 62 (2009) 431e437
he weighs his beliefs as 35 for the most likely outcome, 5
for the unfavourable outcome, and divides the remaining
units over the range 0e5, 5e10, 15e20, equally in blocks
of 20.
NB: 0e5% advantage to intensification with autologous
stem cell transplantation means that the 3-years EFS rate
for this treatment will be between 0% and 5% higher than
that for chemotherapy alone.
Question 4
We are also interested in what importance you place on
the extra difficulty and possible extra toxicity of autologous
stem cell transplantation procedure.
Please indicate in the table below the absolute advantage
in 3-year EFS rate above which you would routinely switch
to intensification with autologous stem cell transplantation
rather than just chemotherapy alone?
Please also indicate the rate below which you would definitely retain the use of chemotherapy alone.
Example: This is the opinion of a clinician who feels he
would definitely use the intensification with autologous
stem cell transplantation if it gives a 20% increase in the
3-year EFS rate. He would definitively retain the use of
chemotherapy alone if the intensification with autologous
stem cell transplantation resulted in less than 10%
improvement.
Between these limits (i.e., 10%e20%) he feels that the
treatment are roughly equivalent:
Absolute advantage in 3-year EFS rate above which
you use intensification with stem cell transplantation
20%.
Absolute advantage in 3-year EFS rate below which
you use chemotherapy alone as compared to intensification with autologous stem cell transplantation
10%.
Your opinion
Please fill in your own judgement
Absolute advantage in 3-year EFS rate above which you use intensification
with autologous stem cell transplantation
437.e2
j_j_j %
Absolute advantage in 3-year EFS rate below which you use chemotherapy
alone as compared to intensification with autologous stem cell
transplantation
Question 5
You are the principal investigator of a randomized phase
III trial assessing intensification with autologous stem cell
transplantation versus chemotherapy alone and EFS in the
primary outcome.
How many patients would you reasonably plan to include in this trial?
Question 6
In the same context of a randomized trial, how many patients are you ready to autograft so that one patient would
benefit for a lengthening of survival and EFS?
Question 7
How many patients did you autograft in the last 12
months?
j_j_j total
j_j_j LLC
Appendix 2
List of the 37 experts who answered the questionnaire
N. Ifrah, X. Troussard, F. Lefrere, S. Choquet, M. Divine, V. Ugo, L. Sanhes, A. Jaccard, F. Isnard, E. Solary,
G. Laurent, A. Aoudjhane, J.F. Rossi, J.J. Kiladjan, B.
Rio, P. Rousselot, A. Buzyn, B. Royer, M. Hunault, P. Feugier, T. Facon, V. Leblond, M. Malphettes, A. Delabarthe,
E. Raffoux, P. Brice, A. Delmer, M. Michallet, J.P. Fermand, B. Arnulf, O. Tournilhac, F.X. Mahon, N. Milpied,
A. Thyss, D. Bordessoule, F. Bauduer, B. Dreyfus.