Palmer, Christopher Ralph; (1988).A Clinical Trials Model for Determining the Best of Three Treatments Having Bernmoulli Response."

A a.. INI CAL TRIALS MODEL FOR DETERMINING TIIE BEST
OF TIIREE TREATMENTS HAVING BERNOULLI RESPONSES
by
Christopher Ralph Palmer
A dissertation submitted to the faculty of The
University of North Carolina at Chapel Hill in
partial fulfilment of the requirements for the
degree of Doctor of Philosophy in the Department
of Statistics.
Chapel Hill
1988
Approved by
Advisor
Reader
Reader
aIRISTOPHER RALPH PALMER.
A Clinical Trials Model for Determining the
Best of Three Treatments Having Bernoulli Responses.
(Under the direction of GORDON D. SIMONS.)
ABSTRACT.
Clinical trials are commonly performed primarily to answer,
"Is a proposed new treatment an improvement over existing therapy?"
For certain diseases. notably those for which the response to
treatment occurs relatively quickly. the techniques of sequential
analysis. introduced by Wald (1947). are especially well-suited to the
statistical problem posed.
Noting. however. the apparent lack of use of such methods in
practice. reasons for their implementation are considered from
theoretical. ethical and practical points of view.
These concerns
also motivate a study of a clinical trials model using a decisiontheoretic approach. in the particular case having three treatments
with responses classified as successes or failures.
This paper. in extending the work of Simons (1986) to three
treatments. develops the methodology for a simple. well-performing.
ethical. pragmatic and potentially useful model.
Three cases are
studied. both theoretically and numerically. the main case allowing
for the early. sequential elimination of the poorer treatments.
In the presence of multiple treatments. it also addresses the
question of allocating two or more than two treatments at a time. and
concludes the latter on qualitative more so than on quantitative
grounds.
Though mostly based on a special type of prior. known to
perform well in the two-treatment case. some theoretical results are
extended to general prior distributions on the treatment success
probabi Ii ties.
Aa<NOWLEDGEMENTS
First. I would like to thank the Graduate School and the
Statistics Department* for their financial support. without which this
endeavour would not have been possible.
1banks also go to my
committee members. Dr.s V.G. Kulkarni. J.S. Marron. W.L. Smith and
M.J. SYmons for their careful reading of the manuscript.
My dissertation advisor. Gordon Simons. is to be thanked in
addition for his expert guidance. clear insight. ready availability
and. above all. his personal friendship over the years.
Perhaps the
way I can best express my gratitude to him is in quoting and aspiring
to satisfy our Lord' swords. "It is enough for a student to be I ike
his teacher."
Numerous friends made during my time in Chapel Hill. notably from
Bible study groups past and present. the Statistics Department and the
Chapel Hill Bible Church have been sources of great encouragement to
me.
Two in particular deserving a special mention are Whit Jones and
Cathy-Joan MacDonald.
I thank them and all for their love and support
throughout my graduate studies.
Last. but not least. I thank my loving parents at home for all
that they mean to me and their constant devotion.
It gives me much
joy to dedicate this work to them.
*
Research partially supported by National Science Foundation Grants
DMS-8400602 and DMS-8701201.
PREFACE
An Early Clinical Trial
"Daniel (who had resolved not to defile himself with the royal
food) said to the guard whom the chief official had appointed over
Daniel. Shadrach. Meshach and Abednego. "Please test your servants for
ten days.
Give us nothing but vegetables to eat and water to drink.
Then cOmPare our appearance with that of the young men who eat the
royal food. and treat your servants in accordance with what you see."
"So he agreed to this and tested them for ten days.
At the end
of ten days they looked healthier and better nourished than any of the
young men who ate the royal food.
So the guard took away their choice
food ... and gave them vegetables instead."
Daniel 1:11-16
(c. 605 B.C.)
TABLE OF OONTENTS
aIAPIER I.
INTROOOCTIaf
Opening Remarks
1
The Background in Practice
3
The Background in Theory
5
Motivation
14
Mathematical Approach and Terminology
17
aIAPIER II.
CASE 1: FROM 11IREE TREA1lIENTS TO
am
Introduction
21
Prior
21
Posterior
22
Reward
23
Transition States
25
Transition Probabilities
26
Optimal Stopping Rule
28
Dynamic Equation in Qt+3(j,k)
29
Three Assertions
30
Numerical Questions
32
aIAPIER III.
CASE 2. STAGE II: FROJI 1'10 TREATMENTS TO
am
Introduction
35
Prior
32
Posterior
36
Reward
37
Transition States and Probabilities
38
III (Continued)
Dynamic Equation in Qt+2(x,y}
40
Generalisations
51
aIAPrER IV.
CASE 2. STAGE I: FIDI 1lIREE TREA'I1InfI'S TO T10
Introduction
61
Prior and Posterior
62
Reward
63
Dynamic Equation in SMAX t +3 (j,k}
64
Programming Considerations and Applications
72
Numerical Results
76
(1) Small Horizons
76
(2) Robustness in (a,b)
78
(3) Universal Optimal Stopping Points
80
Four Assertions
83
aIAPrER V.
CASE 3: T10 PAIRWISE TRIALS
Introduction
87
Terminology and Notation
89
Dynamic Equation in Vt +2 (j,k}
90
Four Assertions
104
Programming Considerations
105
Numerical Results
114
(1) Large Horizons
114
(2) Rewards
115
(3) Robustness and Minimax Policy
115
Introduction
119
Quantitative Comparisons
120
Qualitative Comparisons
123
Discussion
124
Closing Remarks
127
APPENDIX 1.
Proof of Theorem 3.2
APPENDIX 2.
Case 2 Program Listing and Sample Output
= 2. N = 36. (a.b) = (.6 .. 5)
4. Modes Table for t = 60. N = 100. (a.b) = (.6 •. 4)
5{a). Modes Tables for t = 68 & 70. N = 100. (a.b) = (.6 .. 4)
5{b). Modes Table for t = 72. N = 100. (a.b) = (.6 .. 4)
5{c). Modes Tables for t = 86-100. N = 100. (a.b) = (.6 .. 4)
APPENDIX 3.
APPENDIX
APPENDIX
APPENDIX
APPENDIX
REFERENCES
Modes Table for t
aIAPTER I
INTROOOCfION
Opening Remarks
Issues surrounding clinical trials are necessarily complex. since
they include a delicate balancing of some essential considerations:
(1) Is a proposed new treatment an improvement over existing therapy?
(2) How well does the new treatment perform?
(3) Is the design of the trial ethical?
(4) Is the design of the trial practical?
These inter-dependent questions from theoretical. ethical and
practical perspectives address the purpose. justification and
relevance of any particular clinical trial.
Their answers define four
corresponding goals or principles of identification. estimation.
ethics and impact.
These goals provide the framework for the
background. the motivation and the mathematical approach adopted in
the current paper. and so it is fitting to begin by considering them
in more detail.
Each issue is vitally important in its own right. yet can be made
redundant without due consideration to the others.
For instance.
ignoring for the moment secondary effects. the second question can be
rendered moot if the first turns out to be negative.
The distinction
between goals (1) and (2) cannot be smeared into non-existence.
aims of identifying an improvement and of quantifying such an
The
2
improvement, in serious illness trials at least, are sufficiently
different, it seems, that it is unreasonable to apply the same
statistical methodology to both.
Equally, it is unwise to expect both
issues of identification and estimation to be fully and well addressed
in a single trial.
This fundamental dichotomy between a "pragmatic"
approach designed to come to a conclusion about the best treatment and
an "explanatory" approach to increase scientific knowledge, such as
the estimation of treatment effects, is more fully discussed in
Schwartz, Flamant
& Lellouch's (1980) insightful book.
The third issue, ethical considerations, is precisely what sets
the theory of clinical trials apart from almost all other branches of
statistics, and must playa dominating role in any trial's design.
The more serious the disease being studied, the more serious are the
consequences of inferior treatments and the less important are the
secondary effects.
Thus, the highest ethical standards should apply
to trials whose purpose is to make therapeutic advances against
chronic, life-threatening ailments.
Of course, there are many ethical
concerns underlying a clinical trial (see Tagnon (1984) or Bulpitt
(1983), pp.12-27, for examples).
Statistically, most pertinent is the
dilemma between weighing the needs of current patients entered into
the trial with those of future patients (including concurrent ones not
in the trial) who stand to benefit from the results.
This, of course, is conditional upon the fourth issue which is
raised for reasons that will become clear later.
It is recognised
that anyone trial will have a limited and time-lagged impact on the
practicing medical community.
However, if all clinical trials were
designed and conducted as though they were the only one. then their
sum total effect would be broader and more immediate.
In order for a
3
clinical trial to be of practical worth. two major sub-issues arise:
(4a) Is it statistically valid?
(4b) Is it simple enough to implement?
These are co-requisites to a workable model. since both components
have to be present in order to persuade anyone who actually conducts a
clinical trial of the usefulness of a design. however otherwise
meritorious.
This is because if a design is deemed irrelevant or
infeasible by those who employ them. it will not be initiated.
So far. we have touched on some topics that remain the source of
great debate between (and even within) theoretical statisticians and
clinical practitioners.
The object of doing so has been to be neither
contentious nor to "preach to the converted".
Instead. it should
serve to illustrate some of the many difficulties inherent in clinical
trials methodology and to begin to motivate the reasoning behind the
attack of the particular problem studied in this paper.
There follows
a brief description of the status quo and a literature review. tracing
the historical development of medical experimentation from Daniel's
day (see Preface) to the present.
In discussing the background, it is
intended to further motivate the need for a careful study of a
clinical trials model for determining the best of three treatments
yielding Bernoulli responses.
The Background in Practice
Owing to its unique ethical concerns, a potential new treatment
has to pass through a number of stages before being accepted as a
therapeutic advance.
From its test-tube origins in the laboratory, a
new drug must first be successful in extensive animal studies before
undergoing up to four phases with human subjects.
First. it is
4
administered under careful scrutiny to volunteers in a "Phase I"
study, a small-scale toxicity screen to determine a safe dosage level.
Next come "Phase II" trials which are fairly small-sized ("not
beyond 100-200 on a drug", says Pocock. (1984»
and, i f promising,
full-scale "Phase III" investigations. Both of these phases involve
comparisons with a control or standard therapy.
Phase II trials,
then, are a further screening process, a pilot study, aimed at
answering question (1) above. at least provisionally, whereas
Phase III trials are undertaken in the hope of confirming such early
promise and looking at question (2) in the process.
Finally, "Phase
IV" trials are conducted on treatments, unlike the vast majority, that
have not been dropped after earlier phases.
These amount to follow-up
studies with a special view to monitoring any long-term and sideeffects.
Far more details are available in the clinical trials
textbooks, but for our purposes we note that Phases II and III are
both comparative. and are performed with different aims in mind.
The model pursued here (which assumes any new treatment to have
passed through Phase I studies) applies to both the comparative
phases. but, by its design fits in best with the constraints and
purposes of Phase II trials.
Most attention in the literature has
been given to the larger Phase III trials, though this does not
detract from the vital role played by the preliminary trials.
In
Cancer Clinical Trials, in the chapter summarising multi-stage designs
for (two-treatment) Phase II trials, Herson (1984) concludes.
"Considerable further development of statistical methods for Phase II
trials is needed."
Since each step of a drug's progress acts as a
screen, it seems all the more relevant to consider an example with
multiple treatments for Phase II applications especially.
5
The Background in Theory
The rationale for asking "(1) Is a proposed new treatment an
improvement over existing therapy?" is that, if so, one would like to
know with reasonable certainty so it might become the standard for
future patients, and if not, the trial can be terminated so it is not
prolonged unduly for the sake of the participants.
an early and reliable answer is sought.
In either event,
For certain diseases. notably
reasonably common ones for which the response to treatment occurs
relatively quickly, the techniques of sequential analysis, pioneered
by Wald (1947), are particularly well-suited to the statistical
problem posed.
Much work has been done in sequential ranking and selection
procedures from two or more populations such as in Sobel and Huyett
(1957), Paulson (1967, 1969) and Bechhofer, Sobel & Weiss (1968). but
not specifically in a clinical trials context.
Their procedures are
based on the idea of controlling error probabilities. so that if 0 is
some measure of distance between population parameters. then the
probability of correct selection (of superior population) is reqUired
*
to exceed a given value P* whenever 0 exceeds a given O.
According
to Bechhofer et a 1 ., "th e specification of any particu 1ar pal' r (~*,p*)
u
is ideally based on economic considerations," (p. 257) and "if
o ~ 0 ~ 0*, the experimenter is indifferent to certain incorrect
selections" (p. 258).
Both of these statements seem inappropriate for
clinical trials, due to question (3) above.
First. since ethical
concerns regarding the allocation of inferior treatments outweigh. and
indeed, are incommensurate with the financial cost of experimentation.
it is common in clinical trials theory to neglect the latter when
calculating the risks involved.
Secondly, one ought to be concerned
6
about detecting even minute differences in treatment success
probabilities (for even a fraction of one percent of the total of all
cancer victims. for instance. is still a large number).
So, methods
that are entirely suitable for industrial or agricultural settings,
for example. tend to be unacceptable when the "experimental units"
involved are human beings.
For some trials, the quantity of interest is a continuous
variable and so some authors such as Anscombe (1963). Chernoff (1967)
and Chernoff and Petkau (1981) assume treatment responses to be
normally distributed and with known. constant variance.
Besides these
assumptions. the analysis can involve some advanced technical details
mentioned in Simons (1986) and lack direct applicability to the
discrete variable case.
By contrast. many treatments display a
dichotomous response (it either works or not). so it is appropriate to
consider Bernoulli responses of "successes" and "failures" instead.
In such cases. the primary question becomes a matter of identifying
the treatment of maximal success probability.
It seems there are two opposing viewpoints commonly held among
theoreticians of clinical trials, most notably brought to light in
Anscombe's (1963) paper reviewing Armitage' s (1960) book. "Sequential
Medical Trials".
(The distinction roughly parallels the gulf between
classical and Bayesian statisticians.)
Armitage's approach is to
consider three hypotheses - that Treatment A is superior to Treatment
B or vice versa or that A and B show no difference in their effect and to control the error probabilities of reaching incorrect
conclusions (similar in style to Bechhofer etc.).
Based on Wald's
(1947) sequential probability ratio test. Armitage considers both
"open" and "closed" (or "restricted") plans having potentially
7
unlimited and pre-determined maximum sample sizes respectively.
(Restricted plans are only introduced because of the small probability
that an open plan may require a large sample size before reaching a
conclusion.)
The trial terminates when a sample path traverses a
boundary line which could be either (i) one of two pairs of parallel
lines (in the open plan). or, (ii) one of a pair of diverging lines
and a vertical or convex connecting boundary (in the restricted plan).
Compared to any fixed sample size design, these sequential stopping
rules greatly reduce the expected sample size while maintaining the
same error probabilities.
Anscombe. on the other hand, suggests having only two possible
conclusions (not the third, non-committal statement) and those in his
camp exchange the difficulty of specifying error probabilities for one
of specifying the total number of patients to be treated, the socalled "horizon", N.
He uses a two-stage approach with an initial
"testing stage" having pairs of patients on two treatments that is
stopped sequentially according to a risk minimisation, decisiontheoretic criterion.
current paper.
It is this approach that is adopted in the
Colton (1963, 1965), also in the continuous variable
case, uses the same approach, as do Cornfield et al. (1969) who extend
Colton's theory to an adaptive multi-stage version.
Canner (1970)
adds to Anscombe's model by including a (financial) cost of
experimentation.
Robbins (1952) studied what became to be known as the "two-armed
bandit" problem, though not specifically in the context of clinical
trials.
He postulated the ethically satisfying "play-the-winner"
rule, which was later studied as a clinical trials model by Sobel and
Weiss (1971) and Hoel (1972).
The rule was modified by Zelen (1969)
8
into a two-stage approach (as per Anscombe) in which using about N/3
patients in the testing stage generates near optimum results.
Feldman
(1962) solved an important problem by showing that myopic (one-steplook-ahead) strategies are optimal when both arms (or treatments) are
Bernoulli with known success probabilities but unknown labels.
Berry
(1978) shows that in the two treatment case with Bernoulli responses.
one only needs moderately good Feldman-type two-point priors to obtain
satisfactory results.
Bandit processes provide an especially valuable tool in analysing
the sort of problems encountered in clinical trials where one desires
to balance immediate losses with future potential gain.
Much research
has been done in the area. notably by Gittins (1979) and in the
clinical trials context by Glazebrook (1978. 1980) and Bather (1981).
Recently. Eick (1988) introduced a realistic modification
incorporating delayed responses.
This changes the nature of his
bandit from a stopping problem to one of determining a manifold in a
multi-dimensional state space.
One or other of two treatments is
optimal according to which side of the manifold the current state is
located.
Part of the present paper uses a similar approach in the
determination of optimal continuation regions.
The monograph "Bandit Problems: Sequential Allocation of
Experiments" by Berry and Fristedt (1985) is a most comprehensive
account, including a substantial, annotated bibliography.
They give
the background to dYnamic programming and results proving the
existence of optimal procedures.
Discount sequences are used to
enable the option of putting less weight on future patients compared
to current ones (and dispense with the finite horizon selection).
One
chapter covers the minimax approach as a means to bypassing the valid
9
criticism of the undesirable, subjective choice of prior in the
Bayesian approach.
These minimax considerations motivate the joint
paper by Bather and Simons (1985) as well as the line of reasoning in
the current paper whereby certain "worst-case" choices of parameters
are made.
However, counter-balancing their appeal, the problem with bandits
in general is that they are notoriously complex - perhaps too complex
for routine use in clinical trials owing to issue (4) above.
Since
two-armed bandits tend to be quite unWieldy, multi-armed bandits can
be close to untractable without making simplifying restrictions.
This
theme of impracticability will recur later.
Most studies in the literature (and with good reason) address
trials haVing two treatments, Flehinger and Louis (1971). Hoel. Sobel
& Weiss (1972) and Simons (1986) being just some of many examples.
For any illness, however, there are always numerous potential new
treatments, such as different methods (surgery or drugs), different
drugs (existing standard or new), different dosages (in level or in
timing) or different combinations of treatments.
In "Clinical Trials:
A Practical Approach", Pocock (1984) says, "When a clinical trial is
being proposed it is ... not uncommon to find a substantial number of
treatments that it is reasonable to consider", and later, "Since most
trials experience difficulty in getting enough patients, one commonsense rule is to avoid having more than two treatments unless one is
confident that sufficient patients per treatment can be obtained with
three or more treatments." (p. 138).
His reasoning is that the power
of a test depends not on the total number of patients but on the
number of patients per treatment.
Thus, the situation of more than
two treatments is not unusual but to conduct a trial with more than
10
three would be rather rare.
The methods Pocock employs for multiple treatments is to regard.
say, the case of three treatments as three pairwise, ongoing trials.
each being analysed by so-called "group sequential" methods, described
in Pocock (1977) and further discussed by McPherson (1984).
This
amounts to selected, rePeated significance testing with typically up
to five or so interim analyses of the data.
Because of the high risk
of false-positive (and false-negative) results, nominal significance
levels are fixed to be very small for early looks at the accruing data
so that the final significance level is kept to the traditional 0.05
or 0.01 for the overall Type I error probability.
With multiple
treatments there is an even greater chance of finding a falsely
significant difference so the early nominal significance levels have
to be very stringent indeed.
Pocock discusses comprehensively the
number and size of groups, various nominal significance levels and
adverse consequences of unscheduled, ad hoc interim analyses (c.f.
"data-snooping").
Whitehead (1985), on the other hand, argues that these group
sequential methods should still involve frequent inspections.
In his
book, "Sequential Clinical Trials", he modifies Armi tage' s fully
sequential approach by advocating a "triangular test" (or "doubly
triangular" for two-sided al ternatives).
(His work is an appl ication
of theoretical advances made by Anderson (1960).)
In the triangular
test, a pair of test statistics are plotted, usually the sum of
observations versus the number of observations. The trial is
terminated when the sample path crosses one of two sides of a
triangular region, dictating the appropriate inference.
Whitehead's
final chapter is devoted to the comparison of more than two
11
treatments, so the trial becomes an elimination procedure for the
worst treatments.
Such procedures amount to "conducting each of the
r(r-l)/2 pairwise comparisons possible amongst r treatments" (p. 205).
Each pairwise trial is treated as though it involved the only two
treatments (using his Armitage-type methods) and as soon as the "best
vs. worst" attains nominal significance, the worst is discontinued.
As with Pocock's approach, with multiple treatments it is necessary to
be conservative by decreasing corresponding non-sequential a-levels by
a factor of the number of pairwise comparisons being made.
Only a few authors of papers deal directly with multiple
treatments, but amongst these are two who generalise Feldman's (1962)
results: Zaborskis (1976) and Rodman (1978).
In the former. it is
supposed there are k Bernoulli treatments and a finite horizon.
Two
problems are considered; first, an optimal strategy is found for
determining the most effective treatment when it is known that
precisely one has success probability "a", higher than "b" for the
other k-l treatments.
With an objective function to maximise the
expected sum of successes, Zaborskis shows that the myopic strategy is
again optimal, i.e., the best policy is always to select that
treatment for which the a priori success probability is greatest.
(Note, from a physician's perspective this is also ethically
appealling.)
His second problem allows the inferior treatments to
have success probabilities less than or equal to b.
Although some
asymptotic properties are given, he writes, "the construction of an
optimal plan in this case is possible if and only if one takes into
consideration the effectiveness of all k treatment methods. but such a
plan is very complicated and hardly deserves a place in practical
applications."
So once again, the unrestricted freedom of choice of
12
treatment. the multi-armed bandit. is just too difficult to solve (let
alone ever explain to the clinicians).
Rodman (1978) relaxes the assumption of Bernoulli responses.
In
one model. all but one of k arms have distribution Q. the other P with
higher expected reward. while in another model. all but one of the k
arms are from P. the other being from the inferior Q.
Within each
model. not knowing the superior population(s). the aim is to find a
rule for the choice of treatment at each trial that minimises the
expected loss.
case k
= 2.
Once more. it is shown that what Feldman proved in the
with P and Q Bernoulli. still holds in these more general
settings - the myopic strategy is optimal.
In more recent articles. Kulkarni and Jennison (1986) typify,
extend and refer to earlier work by the joint authors. notably
Bechhofer and Kulkarni (1982).
They proposed a class of closed
adaptive procedures for Bernoulli population selection along the lines
of Bechhofer et al.·s (1968) approach. governed by maintaining
probabilities of correct selection (already discussed herein) with the
modification that no more than a fixed n observations are to be
sampled from anyone of the k populations.
common constraint in clinical trials.)
(This. though. is not a
Kulkarni and Jennison prove
that their procedure minimises the expected sample size before
termination if the true success probabilities are known to be large
enough (according to a simple condition in their Section 5).
We have seen that myopic strategies (at least with certain
priors) are appeal ling from both theoretical and ethical points of
view. but have the serious disadvantage of impracticability since they
require a technical decision to be made after every single patient's
response.
The present paper considers a stopping rule which is much
13
easier to put into practice since it requires only two decisions in
the course of the trial by disqualifying a treatment once and for all.
if it is found to be unpromising.
This "restricted bandit" idea is used in the study by Zhang
(1987) which is a quite general and wide-ranging exposition of
sequential clinical trials including much work by earlier authors as
examples or special cases.
The non-Bayesian techniques are the same
as those used in a series of four papers possessing a strictly timedecreasing sequence of authors: Lai. Levin. Robbins & Siegmund (1980).
Lai. Robbins & Siegmund (1983). Lai and Robbins (1985) and Lai (1987).
In the first of these papers. Lai et al. (1980) consider three
sequential stopping rules - including that of Anscombe - using
Monte-carlo methods.
All are shown to be asymptotically optimal and
an improvement over any fixed sample size plan.
The 1983 paper
further develops theoretical properties of these rules and also looks
at asymptotic properties of a rule proposed by Begg and Mehta (1979).
one which allocates pairs of patients in a testing stage until
stopping becomes at least as favourable as any fixed size continuation
of that stage.
(This rule is shown to be sub-optimal asymptotically.)
The 1985 and 1987 Lai papers address in the multi-armed bandit
setting a rule based on the comparison of "certain upper confidence
bounds for the mean of an apparently inferior population with the
estimated mean of the (current) leader."
With suitable restrictions
on the confidence bounds. an entirely reasonable allocation rule is
proven to be asymptotically efficient. meaning a lower bound for the
"regret" function is attained as the sample size increases to
infinity.
(The regret is the expected loss or shortfall that one
seeks to minimise that arises from using some policy instead of the
14
superior treatment throughout the trial.)
In Zhang's (1987) paper, a
lower bound for the regret is found that is no smaller than that of
Lai and Robbins (1985), in spite of his simpler allocation rule that
gives "failed" treatments no second chance.
Zhang proves two
theorems; the first, under certain smoothness conditions on the
(general) parametric family of response distributions, that the regret
attains an asymptotic limit, and the second, that the said limit is
actually a lower bound, with the corollary that his policy is
asymptotically optimal.
The only drawback of this advanced theory is, once again, its
practical limitations, since the price paid for such general results
is the use of continuously updated estimators of treatment success
-1"
probabilities (witness
Pin = n Sin' equation (116) in Zhang's
tf
'"
paper) rendering his rule rather too complicated for general
applications.
In the current paper, besides sharing Zhang's
simplifying restriction concerning failed treatments, fixed success
probabilities analogous to Feldman's
a
and
b
are assumed
throughout, a great simplification, yet the results are demonstrated
(empirically) to maintain asymptotic optimality.
Motivation
We have reviewed a variety of sequential designs for clinical
trials models.
These approaches can be further subdivided, due to
Iglewicz (1984), as fully sequential (typified by Armitage), adaptive
allocation (Zelen), multi-stage sequential (Pocock) or decisiontheoretic (Anscombe).
In spite of the ethical backing and tremendous
progress in the theory behind these "alternative" designs since the
time of Wald, it seems unfortunate that such techniques have been very
15
rarely employed in practice. where the norm has been to adopt
classical. fixed sample size methodology.
Armitage (1985). expounds.
"Data-dependent allocation methods are ... just the sort of contribution
that statistics should be making to the design. execution and analysis
of clinical trials.
Yet most of the theoretical work done in this
tradition. over the last 20 years or so. has found no application
whatsoever in the actual conduct of trials.
This lack of contact
between theory and practice seems to me quite deplorable."
It is for
this reason that question (4) above (practicality) was raised and has
been highly influential in the design of the model in this study.
The theoretical, and by implication. ethical. advantages in terms
of expected sample sizes of sequential over classical methods are
well-known and well-documented. see for instance. Wetherill (1975). or
more specifically in a clinical trials context. Bather (1985). He
shows that. "sequential allocation rules can achieve a similar pattern
of error probabilities for a small fraction of the expected cost to
the volunteers employed in the experiment."
The fact that sequential trials generally require fewer patients
has practical implications too.
A common complaint amongst
statisticians involved in clinical trials. such as Peto (main author
of Peto. Pike. Armitage et al. (1976»
is that samples are often too
small for statistical validity under current fixed sample size methods
(see (4a) above).
Pocock. on page 133 of his book. reports that in a
random sample of 50 randomized cancer trials. the median accrual rate
was 33 patients per year - too slow to attain the excess of 100
originally planned for in most of the trials (Pocock et al. (1978».
Also. Zelen found a median trial size of just 50 patients in a survey
of trials reported in Cancer from 1977 to 1979.
16
Finally, on the matter of small samples, Anscombe (1963) notes that
the usual effect of monetary considerations in a clinical trial is to
impose a limit on the number of patients.
Thus, practical, as well as
theoretical and ethical, considerations indicate a strong case for the
application of sequential methods to clinical trials.
Fully adaptive bandit strategies, we have noted, tend to be too
difficult to put into practice.
Armitage's and Pocock's fully and
group sequential methods respectively, in essence, mimic repeatedly
the fixed sample size hypothesis testing methods, whereas Anscombe's
decision-theoretic approach concentrates on the pragmatic need for a
conclusion without dealing directly with
sma~l
error probabilities.
(However, Bather (1985) does give reasons for using them at least as a
means of comparing decision-theoretic methods with either classical or
other sequential approaches.)
From an ethical perspective, governing
trials by "small error probabilities" raises problems when dealing
with serious diseases.
In a survey article, Simon (1977) reports,
"University Group Diabetes Program trials, National Breast Cancer
Surgical Adjuvant trials, and many other trials, have shown it is not
ethically feasible to continue a comparative trial until the 'error
probability' is extremely small when dealing with life-threatening
diseases."
On the practical side, Novick and Grizzle (1965) support
Anscombe's approach by "demonstrating the usefulness of Bayesian
inference methods when facing (practical problems in clinical
trials)".
Sylvester and Staquet (1977, 1980) recommend the decision-
theoretic approach especially for Phase II trials.
Further
theoretical support for Anscombe's model is found in Chernoff and
Petkau (1981) in a continuous time setting, while Lai et al. (1980)
17
conjecture, and Lai et al. (1983) prove, that Anscombe's approach is
"asymptotically optimal from both Bayesian and frequentist points of
view".
Finally, Simons (1986) in the two-treatment, Bernoulli
response case proves that his stopping rule, (generated by letting
success probabilities tend to 0.5 and considering an outer envelope)
is best amongst admissible rules for symmetric priors, in the sense
that it minimises the probability of selecting the inferior treatment.
Furthermore, his rule is attractively elegant in its simplicity:
Stop the testing stage after
n
pairs, having seen
the two treatments respectively, if
2n
exceeds
r, s successes on
N - T ' where N is
k
the patient horizon; k the absolute difference, Ir - sl; and
readily computed sequence with initial values
T
k
is a
2, 14, 41. 82, 136.
Mathematical Approach and Terminology
The main problem in this paper is to generalise Simons (1986) by
studying a clinical trials model involving three Bernoulli treatments.
Simons adopted Anscombe's model (With initial testing stage treating
pairs of patients at a time).
The model of most interest here has an
additional stage at the beginning during which triplets of patients
are allocated one to each of the treatments.
Thus, there are two
critical decisions to be made: precisely when to drop from three
treatments to two, and from two down to one?
We have already seen
that this set-up is ethical, pragmatic and potentially useful.
It
turns out that it gives rise to a straightforward, (easy to apply) and
well-performing (regret-minimising, asymptotically optimal) rule.
Three particular cases are considered.
We will call the model
just described "Case 2", itself a generalisation of "Case I" which
only allows one decision to switch from three treatments directly to
18
one.
Case 1 is studied both to see if it yields a sensible model and
to develop the theory for the more complicated Case 2.
"Case 3" is
developed for COmParison purposes, haVing the restriction of only
sampling at most two treatments at a time.
Essentially, Case 3
comprises two, serial pairwise trials, say Treatment X vs. Treatment Y
followed immediately by the "winner" vs. Treatment Z, and later, the
better of these two alone.
So, Cases 2 and 3 have dual testing stages, and Case 1 has a
single testing stage.
We will say in Case 2 that "Stage I" ends and
"Stage II" begins at the transition to the two-treatment stage.
In
Case 3, "Stage I" and "Stage II" refer to the two pairwise trials
respectively.
Both Case 1 and Case 2 begin by observing patient
triplets; Case 3 only observes at most pairs.
In this paper, Chapter
II examines Case 1 and Chapter V considers Case 3.
Chapters III and
IV are concerned with Stages I and II of Case 2 respectively.
In all three cases, the aim is to determine an optimal stopping
rule for deciding when to make the first transition.
In Cases 2 and
3, we are less concerned with the transition out of the testing stage.
since the problem is studied in Simons (1986).
We will, however, need
to extend some of his results to a slightly more general prior in the
process of determining the exact moments of transition.
Again. in all
of the cases, the constraints are:
(i)
A patient horizon, N
(11) To maximise the objective function, or "reward", the expected
number of successes for the entire horizon.
(Note, (ii) is equivalent to minimising the regret, defined earlier.)
The prior will be discussed in more detail separately in each
case.
It is worth noting, though, that we presume each case begins
19
with an initial indifference to the three treatments on the part of
the physician.
(This. by the way. coincides with the general stance
adopted in practice. when potential new treatments are being proposed
for a clinical trial.)
In Case 1. following Feldman's (1962) ideas. we restrict
attention to prior distributions on the unordered triple (v ,v ,v ) of
1 2 3
unknown success probabilities for the three treatments that are of the
form (a.b.c). where
a
~
b
~
c.
Thus. we consider the prior having
the six permutations in a. b and c as being equally likely and allow
our results to indicate the most promising treatment.
Note that in
our pragmatic approach. the object is to identify this treatment and
..
is not directly concerned with either the exact ranking or estimation
of the true success probabilities.
Again. Simons (1986) gives a
number of reasons to motivate interest in analogous two-point priors
in the two-treatment case.
b
=c
It turns out that the situation having
seems to yield the longest duration trials and. therefore, in a
sense. provides the worst-case in terms of identifying the best
treatment.
Clearly. in the presence of a very poorly performing
treatment. a three-treatment trial would that much sooner become a
two-treatment trial. so it is intUitively quite reasonable to expect
b
=c
to be the slowest case to decide on a winner.
For this reason,
and for implicit theoretical simplifications. only this situation is
pursued in detail in Chapter III and beyond.
For a general prior distribution on (v .v .v ). the Bayes risk
l 2 3
depends on the horizon. N: the number sampled at any time and the
numbers of successes achieved at that time. say. sl' s2 and s3' by
each treatment.
(We are free to specify sl
~
s2
~
s3')
However, when
the prior is of the type described. the Bayes risk simplifies.
In
20
Case 1. for instance. it depends on these parameters only through the
number of patients remaining. denoted by "t", (most conveniently
thought of as the "time to go") and the pair of score differences,
(j,k)
= (sl-
s2' s2- s3)'
Note that the sum, j + k, has a simple
interpretation as the difference in successes accumulated between the
current "winner" and "loser".
The "state" (or "point") (t,j,k) created is Markovian, and the
associated transition probabilities have an elegant and useful
representation.
Certain states are deemed "optimal continuation"
states. whereas others are "optimal stopping" states (and some can be
both simultaneously).
Finding the optimal stopping rule, then, is a
question of identifying which states are which.
In each of the three
cases described, this question is explored both theoretically and
numerically.
CHAPTER II
CASE 1: FROM TIIREE TREATMENTS TO ONE
Introduction
Case 1 aims to find the optimal stopping rule for when to
discontinue allocating triplets of patients. one per treatment. and to
begin giving the best appearing treatment to all subsequent patients.
Clearly. Case 1 is a special case of Case 2. to be discussed more
thoroughly in Chapters III and IV. and is somewhat less realistic. but
it does. nevertheless. prOVide a simpler working model and a suitable
starting point to develop the theory.
Let us suppose the success probabilities associated with the
three treatments are
a. b and c
where
1
~
a
~
b
~
c
~
0
with the case of equality throughout disqualified since we are seeking
a discriminatory rule.
Let
vI' v
2
and v
3
denote the true unknown success probabilities
of treatments 1. 2 and 3 respectively and set
U
= (v 1 .v2 .v3 ).
Define the Feldman-type prior as follows:
= (a.b.c» = r 1
P(U = (a.c.b» = r 2
P(U = (b.a.c» = r 3
P(U
P(U
where
r
i
~
0
for each i
and
= (b.c.a» = r 4
P(U = (c.a.b» = r 5
P(U = (c.b.a» = r 6
};
r.
1
= 1.
22
The symmetric prior having each r. equalling one-sixth
1
corresponds to the physician having no initial preference for the
three treatments, so this is the case of most interest that will be
pursued.
This also bypasses the ethical problem of allocating
treatments when one has a preferred treatment that could be given to
all the patients.
Posterior
Suppose we have observed
sl' s2 and s3 successes
respectively.
(0
~
m triplets of patients and recorded
si
~
m) on treatments I, 2 and 3
This means that we label "treatment I" that which has
accumulated the greatest number of successes and "treatment 3" the
least.
This is for future convenience and does not cause problems in
the case of equalities among the numbers of successes.
Here, let
given
PI
denote the posterior probability that
sl' s2 and s3 successes after
m triplets.
IT
= (a,b,c)
Then, after
cancelling binomial coefficients,
where
ai
=a
si
(I-a)
m-s i
for
i
= 1,
2, 3
and similarly for b , c ..
1
i
Letting
j
k
= sl
= s2
- s2 (excess successes of current winner over median treatment)
- s3 (excess successes of median treatment over current loser)
•
23
"J\ - a(l-b)
- bel-a)
(so that
I
< "J\ < Jl
and
in the case
a
_ a(l-c)
Jl - c(1-a)
> b > c)
"J\j j+k
we have
PI
= "J\jJlj+k
+ "J\j+kJlj + Jll+ k + Jlj + "J\j+k + "J\j
Let the denominator be denoted by:
So, given the results after
m triplets, we have the following
posterior probabilities:
"J\j+k
Ps
= P(U = (c,a,b) I
P6
= P(U = (c,b,a) I m,
m,
5
1
,5 ,5 3 )
= D(j,k)
,5 3 )
= D(j,k)
2
"J\j
5 1 ,5
2
Reward
As a suitable reward function we consider the expected number of
successes gained by stopping optimally over continuing with triplets
for all remaining patients.
We shall consider a finite horizon, say
N, being the total number of patients haVing the disease (or whatever)
including those who join the trial after it has already begun.
the above notation, we can replace the Markovian state (m,
5
1
,5
With
2
,5
3
)
24
by
It is convenient to consider
triplets.
t
as the "time to go" after
m
Recall that before allocating treatments one is faced with
the choice of continuing with all three treatments or (for now)
SWitching to the currently favoured treatment.
The expected number of successes gained by switching immediately
to the current winner, when in state (t,j,k)
instead of continuing
with triplets for all remaining patients, denoted by Rt(j,k), is given
by:
=3
t
[
. k k
. k
. k
]
D(j,k) (2a-b-c)(~)J(~ +A ) + (2b-a-c)~J(~ +1) + (2c-a-b)A J (A +1)
in terms of the state parameters.
The expected number of successes gained by SWitching at the
optimal time to the current winner, when in state (t,j,k), instead of
continuing with triplets for all remaining patients, denoted by
St(j,k), is defined recursively by:
St+3(j,k)
and
= max
A
A
{ Rt +3 (j,k) , E[St(j,k)
S s (j,k)
= Rs (j,k)
I
(t+3,j,k)] }
for s
= 0,
for t
~
0
I, 2
where (t,j,k) denotes the immediately attainable states from (t+3,j,k)
and
E[.]
denotes expectation.
In order to write this recurrence relation in St(j,k) more
explici tly, it is necessary to specify these "immediately attainable"
(meaning accessible on the very next transition) states and their
associated conditional transition probabilities.
These states and
probabilities are the subjects of the next two sections.
25
Transition States
The eight possible results on the triplet undergoing treatment at
time t yield in general seven new states. since the score differences
(j.k) are unaltered in both the event of three successes and of three
failures.
The exact number of new states depends on the positivity or
otherwise of j and k.
The following table shows this. where the
"result" is the observed outcome (0 for failure. 1 for success) on the
current winner. median and trailing treatment respectively.
All new
states are of the form (t.j.k) (though t is omitted below for brevity)
when the current state is (t+3.j.k).
Table 2.1
Transition states in the four cases.
k
Result
0
0
0
1
1
1
1
0
0
j
]
>0
>0
k
j
>0
=0
k
j
=0
>0
k
j
=0
=0
(j .k)
(O.k)
(j.O)
(0.0)
0
(j+l.k)
(1.k)
(j+l.0)
(1. 0)
1
0
(j-1.k+l )
(l.k)*
(j-1.1)
(1. 0) *
0
0
1
(j.k-l)
(O.k-l)
(j-1.1) *
(l.0)*
1
1
0
(j .k+l)
(O.k+l)
(j .1)
(0.1)
1
0
1
(j+l.k-l)
(1.k-l )
. 1) *
(J.
(0.1)*
0
1
1
(j-l.k)
(l.k-l)*
(j-l.0)
(O.lt
*
Indicates a change of ordering of the treatments occurs before
the next transition. but with the aforementioned convention of
labelling treatments according to their rank of accumulated successes
this does not concern us (nor indeed is the initial labelling of any
consequence) .
26
Transition Probabilities
We introduce the notation
a
= I-a.
b
= I-b.
c
•
= I-c.
Then. in terms of the above posterior probabilities. the transition
probabilities to the indicated state (t.j.k) conditional upon being in
state (t+3.j.k) are given in the following two tables.
Table 2.2 Transition probabiLities. generaL case: j
= abc + abC
ql = abc(PI+P2)
q2 = abc(P3+P5)
q3 = abc(P4+P6 )
q4 = abC(P I +P3 )
q5 = abc(P2+P4 )
% = abc(P5+P6 )
> O.
> O.
k
~
(j.k)
(j+l.k)
(j-l,k+l)
(j .k-l)
(j .k+l)
(j+l, k-l)
(j-l,k)
+ abc(P3+P4) + abc(P5+P6)
+ abc(PI+P6 ) + abc(P 2+P 4 )
+ abc(P2+P5 ) + abc(P l +P3 )
+ abc(P2+P5 ) + abc(P 4+P6)
+ abc(P l +P6 ) + abc(P3 +P5)
+ abc(P3+P4 ) + abc(P I +P2 )
e
•
Table 2.3 Transition states and probabiLities. other cases.
>0
j
>0
j
(O.k)
~
(j.O)
~
(0.0)
qo
(I.k)
ql+ q2
(j+I.O)
ql
(1,0)
ql+ q2+ q3
(O.k-l)
q3
(j-l,l)
q2+ q3
(0.1)
q4+ q5+ q6
(O.k+l)
q4
(j.l)
q4+ q5
(I.k-l)
q5+
(j-l.O)
%
k
> O.
j
%
> O.
k
= O.
k
=0
In order to produce more elegant formulae. define the indicator
variables J and K by:
27
J
and
= I{j > O}
K
= I {k > O}
Then, the above transition probabilities
can
.
be summarised in
a
transition array:
j-l
k-l
j+l
j
o
[q5+
k
~(l-J)]K
[q2+q3(I-K)](I-J) + ql
<to
a
[q5+ ~(l-J)](l-K) + q4
k+l
o
=
say.
The values of these entries are found to be functions of D(j,k) as
shown below.
Table 2.4 Transition array entries (multiplied by a factor).
> 0, k > 0
> 0, k
=0
= 0,
>0
=k =a
Value of
j
q3KD(j ,k)
abeD(j ,k-l)
0
abeD(O,k-l)
0
rIKD(j,k)
abcD(j+I,k-l)
0
28:bcD( I , k-l )
0
~JD(j,k)a
a 2 beD(j-I,k)
a 2 beD(j-I,O)
0
a
_2
_2
r D(j,k)a
2
r JD(j,k)
3
r D(j,k)
4
a bcD(j+l, k)
a bcD(j+I,O)
2a bcD(l,k)
38: bcD(l.O)
abeD( j-l, k+ I)
2abeD(j-I,k+l)
0
a
abcD(j ,k+ 1)
28:bcD(j.l)
abcD(O,k+l)
38:bcD(O.l)
Recall: D(j ,k)
j
j
k
_2
j
2
28
To verify, for instance the first entry, note that for
q3D{j,k)
-- j j
= abc{~
+A )
+
- -
abc{~
k
> O.
j J'+k j+k
-j j+k J'+k
A +A
) + abc{A ~
+~
)
= abc{~j+Aj) + abc{~jAj+k-l+Aj+k-l) + abc{Aj~j+k-l+~j+k-l)
= abCD{j ,k-l),
The dynamic equation in St{j,k) can now be written:
for t
with
where
Ss (j,k)
= Rs (j,k)
for s
= 0,
~
0
I, 2
A denotes the linear operator given by:
A Zt{j,k) = [ abCD{j,k-l)KZt{j,k-l) + abcD{j+l,k-l){2-J)KZ t {j+l.k-l) +
a 2 bc/a)D{j-l,k)JZ t {j-l,k) + (abc + abc)D{j,k)Zt(j.k) +
_2
(a bc/a)D(j+l,k)[1+(2-K)(1-J)]Zt(j+l,k) +
abcD(j-l,k+l)(2-K)JZ (j-l,k+l) +
t
abcD(j,k+l)[1+(2-J)(1-K)]Zt(j,k+l) ] + D(j,k).
Optimal Stopping Rule
Clearly certain states (t,j,k) are optimal continuation points or
optimal stopping points (or possibly both),
A point (t+3.j,k) is said
to be in the optimal continuation region if
A St(j,k)
than
Rt +3 (j,k)
is not less
and is in the optimal stopping region if
is no more than Rt + (j ,k),
3
A St(j,k)
Here "stopping" means switching from all
three treatments to the best appearing treatment for the remaining
patients in the trial,
Note that, within the optimal continuation region, St(j,k)
29
satisfies the difference equation:
Define
so that
Qt(j.k)
represents the advantage (in terms of the expected
number of successes gained) of stopping optimally over stopping
immediately when in state (t.j.k).
Thus. the optimal stopping rule says to stop if and only if
Furthermore. the shape of the optimal continuation region is entirely
defined by the function Qt(j,k).
Dynamic Equation in Qt+3.Li....kl
The dYnamic equation in St(j.k) is easily modified to read:
Qt+3(j.k) = max { 0 , A Qt(j.k) + A Rt(j.k) - Rt(j,k) + a + b +
C
- 3[a(Pl+P2) + b(P3+P4) + c(P5+P6)] }
with
Qs (j.k) = 0
for s = O. 1. 2.
It turns out. after lengthy algebra. that the value of
for j
= [(a_b)2~k + (a-c)~k + (b-c)2]t
2(~k + ~k + 1)
> 0,
for k
k 2 0;
> 0;
and
1
2
2
2
A Rt(O.O) - Rt(O.O) = ~(2-a)(b-c)
+ (2-b)(a-c) + (2-c)(a-b) ]t .
That is.
where J and K are the previously defined indicator functions, and
( b)2 k
(
)~k + (b-c )2 + (I-K)~a(b-c)
1
2
2
2
P =ag +a-c
+ b(a-c) + c(a-b) J.
k
k
2(~k + X + 1)
30
Defining
ITj,k by the equation
the recurrence relation in Qt(j,k) can be recast:
for t
for s
equality if and only if
Pi
= 61
0
I, 2.
for all integers k and also that
Note that
j
= 0,
~
IT. k
J,
~
0
with
for each i, that is, if and only if
= k = O.
Three Assertions
There now follows three assertions that the optimal continuation
region is monotone, or nested, in each of its parameters. For j and k
non-negative integers, we have:
Assertion 2.1
If (t,j,k) is an optimal continuation point,
then so too is the point (t+l,j,k).
Assertion 2.2
If (t,j+l,k) is an optimal continuation point,
then so too is the point (t,j,k).
Assertion 2.3
If (t,j,k+l) is an optimal continuation point,
then so too is the point (t,j,k).
Assertions 2.2 and 2.3 are merely saying that if it is optimal to
continue for any given pair of success differences, then it is optimal
31
to continue for any reduced component-wise pair of success
differences.
These assertions remain unproven, although neither of
them surprises ones intuition and each is strongly supported by
computer-generated numerical evidence.
The proof of Assertion 2.1
depends on Lemma 2.1 to follow.
For simplicity, we shall restrict attention to those values of t
that are multiples of three (effectively considering N to be divisible
by 3, therefore) but the proof can very easily be modified to apply to
all integer values of t.
Lemma 2.1
for j,k non-negative integers,
t
By induction on
Since
~
is zero, clearly this holds for t
Qs+6(j,k)
If Qs+3(j,k)
= 0,
= max
= max
0
Qt+3(j,k) - Qt(j,k)
Assume for some (multiple of 3) s that
Now,
~
a multiple of 3.
~
O.
= O.
Qs+3(j,k) - Qs(j,k) l O.
{ 0 , 4 Qs+3(j,k) + pk (1-J}(s+3) - ITj,k }
{O,*},
say.
then we are done, since Qs+6(j,k)
So take the case
Qs +3(j,k}
=4
~
O.
Qs (j,k) + Pk(l-J}s - IT.J, k
> O.
Then,
The second and third terms combine by linearity of A to a non-negative
term, and the first and fourth terms are positive and non-negative
respectively.
Hence
and further
o
32
Proof of Assertion 2.1
Let
Now.
~
denote the optimal continuation region.
(t+3.j.k)
(~
if and only if
Qt+3(j.k)
= O.
Let us assume
that this is so. i.e .• (t+3.j.k) is an optimal stopping point.
Then. by Lemma 2.1.
Qt(j.k) ~ O. so
That is.
Qt+3(j.k)
Qt(j.k)
(t.j.k) (
~
Qt(j.k)
and. by definition.
= O.
~.
Therefore. if (t.j.k)
€~.
then so too do we have (t+3.j.k) €
~.
0
Numerical Questions
With the aid of the computer. the optimal continuation region can
be thoroughly explored by addressing certain questions numerically,
three of which will be discussed briefly here.
First of all. for any given set of values of (a.b.c). and any
given pair of success differences (j.k). for what numbers of patients
remaining t is it optimal to stop the testing stage of the trial (that
is. to switch all future patients to the best appearing treatment)?
It turns out that when j
= O.
so that the top two treatments have
accumulated the same number of successes. it is generally optimal to
continue sampling by triplets.
The only exception is if the number of
patients remaining is very small. say less than 25. when it is
sometimes optimal to stop and randomly choose between the two
treatments (since a clear-cut winner has not emerged) for those
patients yet to be treated.
An example of an optimal continuation region then. for values of
j
> O.
with (a.b.c) equalling (.6 .. 5 .. 4) and some specified values of
t is given in Table 2.5 to follow.
33
Table 2.5
For
Optimal continuation region For (a.b.c)
t
t
t
t
t
= 30.
= 60.
= 90.
= 120.
= 150.
continue for
j
continue for
j
continue for
j
and
j
= 1.
= 1.
= 1.
= 2.
continue for
j
and
j
continue for
j
and
j
= (.6 .. 5 .. 4).
k
= O.
otherwise stop
k
~
2.
otherwise stop
k
~
5
k
= O.
= 1.
k
~
13
= 2.
= 1.
= 2.
k
~
1.
k
~
14
k
~
2.
otherwise stop
otherwise stop
otherwise stop.
Clearly. with the stopping rule only allowing for the switch to
be from three treatments to one treatment. this model is continuing
for large values of k when j is very small
(~2).
Thus. what is
important is the value of t when the model begins stopping for new
(higher) levels of j.
T
j
= min
Let us define these "jump values" of t by:
{ t : (t.j.k)
€ ~ }
for
j
~
o.
The first seven values for the same triple (.6 .. 5 .. 4) are:
The second question addressed numerically is that of universal
optimal stopping points.
That is. for which states (t.j.k) does one
terminate the testing stage irrespective of the values of a. b and c?
This is tantamount to finding the largest possible optimal
continuation region. or eqUivalently. the smallest set of values of T.
J
just defined.
definition of
(Numerical evidence suggests we can restrict our
T
j to apply to k
=0
only.)
It seems that no triple of values (a.b.c) uniformly minimises
these Tj·S. but those triples that come close to doing so have
34
a
approximately
0.5
and
b
=c
approximately
0.47 .
T' =
min
{T},
j
all (a,b,c)
j
Defining
then, empirical evidence suggests the first eight of these values to
be as follows:
So we can say that irrespective of the values assigned (a,b.c) it is
always optimal to stop the testing phase if the current state is
(t,j,k)
and
t
< Tj.
A further, related question of practical importance is when one
of the treatments has a known success probability, as happens when one
treatment is a standard whose performance is well-documented.
Here.
4It
it is useful to know the jump values defined by:
Tj =
where
min
{
all (a,b,c)
Tj
one of (a,b,c) equals
o}
V
o is the known success probability.
V
Again, numerical evidence suggests restricting attention to those
cases where b
= c,
so there are two cases, namely (a,b,c) equalling
where
a
> Vo > b,
to investigate.
However.
due to the above-mentioned problem with this model, namely its
tendency to continue in the presence of a clear loser (c.r. small j
and large k), this avenue has not been pursued in detail.
having noted that the "b
= c"
Instead.
setting appears to be the slowest to
decide on a winner, we shall now turn our attention to the more
realistic setting of Case 2. incorporating this situation.
aIAPrER III
CASE 2 , STAGE I I: FROM TWO TREATMENTS TO ONE
Introduction
Case 2 generalises Case 1 by allowing for the early elimination
of the worst-appearing treatment.
Thus, one need not wait until both
losing treatments are sufficiently behind the leader to warrant their
joint discontinuation.
Case 2 has an initial testing stage, called
"Stage 1", after which one decides to cease allocating the single
worst-appearing treatment. giving pairs of patients. instead, the top
two treatments,
Secondly. one decides after "Stage II", when to
switch all remaining patients to the better of these two treatments.
Naturally, the Stage I decision depends on the Stage II decision, so
the latter is investigated in some detail first.
Prior
Pairs of patients are to be allocated one each on two treatments
in Stage I.
The six-point prior haVing weight on each point (a,b),
(a.c). (b.a). (b.c). (c.a) and (c.b) is the most general case arising
out of Case 1. but we shall concentrate on the simpler three-point
prior having weight on (a,b). (b,a) and (b,b) only.
corresponds to the case b
= c,
As noted, this
the worst-case scenario, it seems, as
far as identifying the best of three treatments is concerned.
Just as
in Case 1. the priors are of the Feldman type. so that if, here.
36
~1
and
~2
denote the true unknown success probabilities. then
P{{~I'~2) = (a.b» = wI
P{{~I'~2) = (b.a»
= w2
P{{~I'~2) = (b. b» = w3
and
In the case of interest arising from Case 1. the ratio w /w equals
1 2
j
A • where j denotes the difference in the number of successes achieved
at the time of transition into Stage I by the top two treatments.
With this in mind. it is convenient to define the number
i
by the
equation:
i
=
In wI - In w2
In A
so that
Posterior
Suppose treatments 1 and 2 have accumulated
respectively.
x
and y
successes,
Here. we are not necessarily labelling according to the
number of successes. so x is not necessarily larger than y.
Suppose
further that this is after observing n pairs of patients from a total
horizon N.
=
Then. after cancelling binomial coefficients.
- - n
(b/a) w
3
.
37
=
=
- - n
(b/a) w
3
x
Y
W1A + W A + (b/a)nw3
2
As before, let t denote the number of patients remaining after n
pairs, so that
t
=N -
2n.
Define a new denominator function, Dt{x,y) by:
Dt{x,y)
= W1Ax
- - (N-t)/2
y
+ W2A + w3 {b/a)
.
Reward
As reward function we consider the expected number of successes
gained by stopping optimally over continuing with pairs for all
remaining patients.
The expected number of successes gained by
switching immediately, when in state (t,x.y), versus continuing with
pairs for all remaining patients, denoted by Rt{x,y), is given by:
Rt{x,y)
= t[aPl+
bP2+ bP3 ] -
~[(a+b){Pl+P2)
since we switch to the winner, and
So. in general,
Rt{x,y)
t
= ~a-b){P2-Pl)
Rt{x,y)
= ~a-b) Ip 1- P21.
+ 2bP3 ]
38
X
t(a-b)lw 1X -w2XY /
that is,
2D (x,y)
t
t(a-b)w2/XX+i_XYI
=
2D (x,y)
t
As before, the reward for stopping (that is, switching to best
treatment from state (t,x,y»
at the optimal time as the expected
advantage over continuing with pairs for all remaining patients,
denoted by St(x,y), is defined recursively by:
for t 2. 0
and
Ss (x,y)
= Rs (x,y)
for s
= 0,
1
where (t,x,y) denotes the attainable states from (t+2,x,y).
Transition States and Probabilities
The four possible results on the pair undergoing treatment at
time t yield four new states, with the following transition
probability distribution from the state (t+2,x,y).
Table 3.1
Transition states and probabilities.
Result
New State
Probability
(0,0)
(t ,x,y)
aj;P1 + baP2 + bbP3
(1,0)
(t,x+l,y)
abp
1 + biP2 + bbP3
(0,1)
(t ,x,y+1)
abp
1 + baP 2 + bbP3
(1, 1)
( t ,x+ 1, y+1)
abp
1 + baP2 + bbP3
= abD t (x,y)lD t +2 (x,y)
= abD t (x+1,y)lD t +2 (x,y)
= abD t (x,y+1)lD t +2 (x,y)
= ab~t(x+1,y+1)lDt+2(x,y)
b
To verify the first entry, note that:
39
ab(w1AX+ w "l-l+ (b/a) (N-t)/2W3 )
2
=
Dt +2 (x,y)
= abD t (x,y)lD t +2 (x,y).
Thus, the recurrence relation defining St(X'y) can be re-written:
St+2(X,y)
=
max {R + (x,y) , [abDt(x,y)St(x,y) + abDt(x+l,y)St(x+l,y) +
t 2
abD t (x,Y+l)St(x,Y+l) +
,
Now define
St(x,y)
ab~t(x+l,y+l)St(X+l,y+l)J / Dt +2 (x,y)}"
b
= St(x,y)Dt(x,y)
and
Then we have the simpler recurrence:
,
St+2(x,y)
= max
.
,
{ Rt +2 (x,y) , (iiIb) A St(x,y) }
for t
~
0
where, here, the operator A is defined by:
Note that
Notice also that the coefficients of A sum to unity, and that this
recurrence is independent of w , the prior weight on (b,b).
3
So this
recurrence relation encapsulates the optimal stopping rule:
it is optimal to switch to the better of the two treatments if and
,
only if
,
Rt +2 (x,y) ~ (iiIb) A St(x,y).
40
As in Case I, we can define Qt(x,y) to be the advantage (in terms
of expected number of successes gained) of stopping optimally over
stopping immediately when in state (t,x,y), so that
and further define
.
= Qt(x,y)Dt(x,y).
Qt(x,y)
Thus, we have
.
Qt(x,y)
.
= St(x,y)
,
- Rt(x,y).
Now the optimal stopping rule says to stop if and only if
or, equivalently,
,
Note that GQ(x,y)
.
=0 and Ql(x,y) =O.
.
Dynamic Equation in Qt+2(x,y}
.
Now, for t l 0, we have:
Qt+2(x,y)
.
.
{
o , (a/b) A St(x,y)
- Rt +2 (x,y) }
= max {
o , (a/b) A Qt(x,y)
+ (a/b) A Rt(x,y) - Rt +2 (x,y) } .
= max
,
The quantity
.
.
.
(a/b) A Rt(x,y) - Rt +2 (x,y) has to be evaluated with
some care in each of four cases, depending on the value of
x-y+i.
summarise, the value of this quantity is:
(i)
Y
-w (a-b) (Xx+i -X)
2
if x-y+i ~ 1
(ii)
-w2 (a-b)(XY-Xx+i )
if x-y+i ~ -1
(iii)
Y _ abt(XY+1_XX+ i )]
-w2 (a-b)[(Xx+i -X)
if
(iv)
Y x+i _ abt(XX+ i +1_XY)]
-w2 {a-b)[{X -X
)
if -1
o
x-y+i
~
~
x-y+i
~
1
~
o.
To
41
Defining indicator functions
II
= I{
0 ~ x-y+i
I_I
= I{
-1
<1
}
< x-y+i < 0 }
we can write:
,
,
(8Jb) A Rt(x,y) - Rt + (x,y)
2
= -w2(a-b)[IAx+i_AYI
- abt{(AY+l_AX+i)Il + (AX+i+l_AY)I_l}]'
Therefore, we have
Q~+2(x,y)
= max
{ 0 , (8Jb) A Q~(x,y) - (a-b)w2IAx+i_AYI +
(a-b)w2abt{(Ay+l_AX+i)Il + (AX+i+l_AY)I_l} }
with
QQ
=Q1 = O.
It should be noted that in cases of interest arising from Case 1.
i is an integer (non-negative) so that I_I is zero and II becomes
if x-y
= -i,
and zero otherwise.
corresponds to WI
= w2 .
Further, the case having i
When additionally w3
=0
1
=0
the situation
reduces to the two-point prior case studied by Simons (1986).
,
Lemma 3.1
,
Qt+2(x,y) - Qt(x,y)
~
0
for all (t,x,y).
,
Proof
(By induction.)
,
Since
~(x,y) ~
0 for all x, y then, clearly
,
~(x,y)
- QQ(x,y)
~
O.
Assume for some (multiple of two) s that
,
,
Qs+2(x,y) - Qs(x.y)
,
Qs+4(x.y)
= max
~
0 for all x. y.
Now
- -'
I x+i YI
{ 0 • (alb) A Qs+2(x.y) - (a-b)w2 A -A
+
(a-b)w2ab(s+2){(AY+l_AX+i)Il + (AX+i+l_AY)I_1} }.
,
If Qs+2(x.y)
=0
the inductive step is complete. since
,
Qs+4(x.y) - Qs+2(x.y) ~ O.
42
Thus, we take
Qs+2(x,y)
= (atb)
A Qs(x,y) - (a-b)w2IAx+i_AYI
> o.
+ (a-b)w2abs{(AY+l_AX+i)Il + (AX+i+l_AY)I_l}
So
(B/b) A Q~+2(x.y) - (a-b)w2IAx+i_AY/ + (a-b)w2ab(s+2){ ... }
= Qs+2(x,y)
o
+ (atb) A Qs+2(x,y) - (B/b) A Qs(x,y) + 2(a-b)w2ab{ ... }
,
= Qs+2(x. y ) + (B/b) A [Qs+2(x,y) - Qs (x,y)] + 2(a-b)w2ab{ ... }.
The terms on the right-hand side are positive, non-negative (by
linearity of A and the inductive hypothesis) and non-negative
respectively, and hence the left-hand side is positive.
Therefore, it
follows that
o
Qs+4(x.y) - Qs+2(x,y) ~ 0
establishing the lemma for even values of t.
But the proof for odd
values of t is identical except for the initialising step
,
,
OJ(x,y) - Ql(x,y)
Result 3.1
~
O.
o
If (t.x,y) is an optimal continuation point,
then so too is (t+2,x.y).
As in Case I, but using Lemma 3.1 above.
Thus the optimal continuation region is nested in t.
The next result has important consequences on the shape of the
optimal continuation region, and gives rise to much simplification
theoretically.
o
Lenuna 3.2
Proof
43
.
.
Qt(x+l,Y+l} = XQt(x,y}
(By induction.)
for all (t ,x,y).
= O.
The assertion is trivially true for t
Assume there exists an integer s
1.
0 such that
.
Q (x+l,y+l) = XQ (x,y).
~
s
To es tabI ish
s
Qs+2(x+l,y+l)
= max
Qs+2(x+l,y+l}
= XQs+2(x,y),
note that, for s
~
O.
- .
I x+l+i -Xy+l
{ 0 , (alb) A Qs(x+l,y+l} - (a-b)w2 X
I
+
(a-b)w2abs{(Xy+2_XX+l+i}Il + (XX+i+2_xY+l)I_l} }
= max
-'
{ 0 , (alb)
A XQs(x,y)
- (a-b)w2XIXx+i -XYI +
(a-b}w2abXs{(xY+l_xX+i)Il + (XX+i+l_XY)I_l} }
o
So the proof is complete.
Result 3.2
If (t,x,y) is an optimal continuation point.
then so too is (t,x+l,y+l) .
Qt(x,y)
=0
.
if and only if Qt(x+l,y+l)
= O.
o
Thus, even in the three-point prior case, the parameter of interest in
determining optimal continuation region is the difference in successes
accumulated, x - y.
The above lenuna allows us to re-write the
recurrence in terms of
z
=x
- y
and
t
alone.
II
Define
Then
Qt(z)
= Qt(x-y) = Qt(x-y,O).
Q~(x,y)
= XQ~(X-l,y-l} =... = XYQ~(X-y,O)
44
Note that II and I_I are functions of z. so we can now write the
.
recurrence in terms of Qt(z) as follows:
.
Qt+2(z)
= max
" "
{ 0 . (ab+ab)Qt(z) + abQt(z-I) + abQt(z+I) +
(a-b)w abt{(A-A Z+ i )II + (AZ+i+I-I)I_I} - (a-b)w2IAz+i_1
2
QQ
with initial values
I}
=QI =O.
It is optimal to stop in the (new) Markovian state (t.z)
if and only if
..
Qt(z)
= O.
In cases of interest arising from Case 1. this reduces to:
"
Qt+2(z)
= max
" " "
{ 0 • (ab+ab)Qt(z) + abQt(z-I) + abQt(z+l) +
(a-b)2w2tI {z = -i} - (a-b)w2IAz+i_II }
where
i
is a non-negative integer.
The next results concerning the optimal continuation region depend
.
only on the positivity or otherwise of Qt(z). so for convenience the
positive constant (a-b)w can be factored out and the recurrence can
2
be written in terms of
v = ab
Setting
+
Qt(z)
abo
defined by:
we have
Qt+2(z) =
max{ 0 • vQt(z) + abQt(z-I) + abQt(z+I) + (a-b)tI{z
= -i}
z i
- IA + -1
I }
= O.
1.
with initial values
Lemma 3.3
Proof
Qt (-i+k)
Since
QQ
= Ak Qt (-i-k)
=QI = O.
for all k
€
71..
the result is trivially true for
t
45
> O.
First, consider k
Assume the result holds for some s
~
o.
Then, we lmow
Qs+2(-i+k)
k
= max{
0, vQs(-i+k) + abos(-i+k-l) + abQs(-i+k+l) - A +1}
= max{
0, vQ (-i-k) + abo (-i-k-l) + abQ (-i-k+l) + A- -1}.
and
Q (-i-k)
s+2
k
s
s
s
Therefore,
Akos+2 (-i-k)
=
k 1
k 1
k
max{ 0, vAko (-i-k) + abA + Q (-i-k-l) + abA - Q (-i-k+1) - A +1}
s
s
= Qs+2 (-i+k)
If k
< 0,
s
establishing the result for
k
> o.
the result says,
is the same as the case
and finally, if k
= 0,
k
>0
(on multiplying both sides by A1kl )
o
there is nothing to prove!
={ z
Zt
Now define
which
€ I
: Q (z)
t
>0 }
so Zt is the set of integer values of treatment success differences
for which, for a given t. it is optimal to continue sampling by pairs.
Lemma 3.4
(i) Zt
(ii) Zt
Z
t
~
Zt+2
=0
for all
= 0,
1,2 ....
t ~ 2 :
if
= { -i
t
}
if
t:
3~
t
where. recall, i satisfies
<
14 ~ 6+ 2[2abba
1-~_]
Ai
and
v
= ab
+ abo
46
Proof
(i) The assertion that
consequence of Result 3.1.
Zt
is non-decreasing in t is a direct
An induction argument similar to that in
the proof of Lemma 3.1 shows that:
for all
So it follows that if
that is, if
(ii)
z
€
Zt then
By definition,
Further,
Thus, Zt
=0
if
t
~
{ 0 ,
~(z)
so
~(-i)
and
~(z)
€
then Qt+2(z)
> 0,
Zt+2 .
= Ql(z) = 0 for all z
- IAz+i -1 I } = 0 for all
€ 71..
z € 71..
2.
= max
Now,
z
GQ(z)
= max
~(z)
>0
Qt(z)
z € 71..
{ 0 ,
(a-b)I{z
= -i}
-
IAz+i -1 I
}
= (a-b) > 0
for all
= 0
z # -i, showing Z3 = { -i }.
The rest follows by examining early values of Qt more closely.
so
and
~(-i)
Q4(z)
~(-i+l)
Qg(-i)
for all
= 2(a-b) > 0
z # -i.
= 2(a-b)(v+2).
~(-i-l)
and
=0
Q4(-i)
= max
= max
= max
~(z)
{
0 , 2ib(a-b) - lA-I-II}
{
0 , (l/ab)(a-b)(2abba-l) }
{
0 , (l/ab)(a-b)(2abba-l) }
=0
=0
=0
(since
for any other value of z.
= 2(a-b)(v2+2v+3).
Qg(-i-l)
= max
{
0 , (l/ab)(a-b)(2abba(v+2)-I) }
= O.
Qg(-i+l)
= max
{
0 , (l/ab)(a-b)(2abba(v+2)-I) }
=0
and
Qg(z)
=0
for any other value of z.
--
2abba
< 8"1 ).
47
Omitting some algebraic details.
02(n+2)(-i±l)
>0
for n such that:
n
2abba( v n - 1 + 2v - 2 + 3vn - 3 + ... + (n-l)v + n ) - 1
The lowest integer m is such that
and
2ab~m_l ~ 1
>1
2ab~
m
~n = vn - 1 + 2vn - 2 + 3vn - 3 + ... + (n-l)v + n.
where
~
Note
> O.
n -
v~
n- 1 = n.
so subtracting v times the the first inequality
from the second yields:
2abbam
= [~~~]
and hence
m
Therefore
Qt(-i±l)
given by
2(m+2).
>0
+ 1.
where
> I-v
[.]
denotes integer part.
for the first (lowest) value of t that is
Lastly. the minimum value of
I-v
is 4. attained when
a
= b = 0.5.
2abba
so
m is at least 5 and hence t is at least 14.
(Further minimum
values of t are discussed after Lemma 3.6 to follow.)
Thus Zt = { -i } for precisely the values of t stated.
( iii)
This can be proven. by virtue of Lemma 3.3. if it can be shown
that
The assertion is verified immediately for t
~
2 from (ii).
The first
inequality follows directly from (i). so it only remains to establish
for
To show this by induction. assume for some
for some integer
Zo
This is true for
s
s
t
~
~
3.
3.
and non-negative integer L. so that
=3
and 4
by (ii). with
Zo = -i
Izs I = L+l.
and
L = O.
48
Note
Z
is non-empty since
s
Consider
-i
Z
€
max { 0 .
and
Qs+2(z)
=0
=
similarly.
by virtue of (i) and (ii).
s
-IA
zO-2+i
- 11 } = o.
for all
max { 0 .
-IA
zO+l+2+i
-11 } = O.
for all
and
Hence
and so
Part (iii) of this lemma is entirely analogous to the two-point
prior case studied in Bather and Simons' (1985) Theorem 5.
Lemmas 3.3 and 3.4 show that there is a sYmmetry about
-i
in
the optimal continuation region. so that if the set is non-empty
(t
> 2)
then
= { -i.
Zt
-HI. .., . -i±k }
for some non-negative integer k. for any given t.
The optimal
continuation region then is characterised by a sequence of values of
at which
depend upon
a
and
b.
k
expands.
Tk
These values of
Lemma 3.4(ii) shows the first values to be
T
T
Zt
I
= 6 + 2
1
I-v__ .
[2abba
is defined as follows:
T
so that. for t
k
< Tk
= min
{ t
~ 0
: Qt(-i+k)
>0
for k
}
t€71
we have
Qt(-i+k)
= O.
ork (-i+k)
but
> O.
Thus.
Qt+2(-i+k)
= max{
-
k
O,vQt(-i+k) + abQ t (-i+k-I) - A +I}
for the first value of t at
t
= Tk
(we know
Q (-i+k+l)
t
>0
=0
by
~
O.
49
Lemma 3.4{i», so
Qt+2 (-i+k) = abQ t (-i+k-l) - Ak +l > 0
for the first time when
= Tk
t
- 2.
This proves the next lemma.
Lemma 3.5 Tk
= min
{ t
2
k
> (1/ab){A
-1)
Qt_2{-i+k-l)
} for k
~
O.
tel:
Qt{-i+k) > 0
This says that
t) that
~
Qt_2{-i+k-l)
for the first time (lowest value of
> (1/ab){Ak-l).
The next lemma gives precise formulae for
Q
t
for values of
t
satisfying
Lemma 3.6
TO
For
Q (-i)
t < T1 '
~
= (a-b)2
if t is even
(I-v)
t
Q (-i)
= (a-b)
{(I+v)v(t-l)/2 - 2 + (I-v)t}
if t is odd
(l-v)2
t
and
{ 2v t / 2 - 2 + {1-v)t }
Qt(z)
=0
Z ~
for
-i.
For all values of t in the given range,
Qt+2(-i)
= vQt(-i)
+ (a-b)t.
This is a simple linear recurrence in Qt(-i) with complementary
solution
Av t / 2
for some constant A, chosen so that
~(-i)
=0
for t even;
and particular solution
Bt + C
B = (a-b)
(I-v)
Ql(-i)
=0
for todd
where
_ -2(a-b)
C -
(I-v)
2
50
The value of
A specifies the stated result and. lastly.
for values of z other than
-i
by restriction of
t
Qt(z)
=0
< T1 .
o
Understandably. the formulae for Q for higher values of t become
t
increasingly more complex.
examine the behaviour as
However. it is possible and useful to
a
and
b
tend to the same limit one half.
since there is a maximal optimal continuation region that is found by
letting
a
and
b
approach this limit.
This generates an outer
envelope for a universal optimal continuation region (applicable for
all choices of a and b) and is characterised by a sequence of integers
where .. here. for each
T
=
k
k
~
O. we define:
min Tk = lim
T
k
(a.b)
a~.5
~5
Lemma 3.4(ii) shows
the lemma shows that
(a.b)
~
TO
=3
and
Q3(-i)
= a-b
(.5 .. 5) . so that we have
T
1
= 14.
However.since the proof of
. this also tends to zero as
TO
= 2.
These first two values agree with those found in Simons (1986)
which dealt with a two-point prior (so w
3
= 0)
in the case
i
= o.
Reasons are given therein for justifying interest in the limit as both
a and b tend to one half.
Since the formula for
Q
t
of w . it is reasonable to suppose that the values of
3
is independent
T
k
he found
apply equally in this more general setting.
(The initial values of
the sequence reported in his Theorem 5 are
2 or 3. 14. 41. 82. 136.)
The reason for concentrating on the prior that puts weight on the
points (a.b). (b.a) and (b,b) is two-fold.
First. the recurrence
relation for the more general six-point prior (as considered in
Case 1). though it can be written down. is too unwieldy to analyse.
Secondly. and more importantly. as mentioned earlier, the given three-
51
point prior corresponds to the worst-case scenario when it comes to
identifying the best of three treatments.
That is, the statistical
task of identifying the Bernoulli population with maximal success
probability from more than one Bernoulli population is the most
difficult when all but one of the success probabilities are the same.
See, for example, Appendix II of Sobel and Huyett (1957), which proves
under the circumstances of their interest, the (a,b,b) case is the
least favourable configuration amongst all possible choices of
(a,b,c).
Under the present circumstances of interest, computer
generated numerical evidence strongly supports the same notion.
Generalisations
(Note: this section is incidental to the clinical trials model
being investigated.
It does, however, shed some light on the
possibility of extending results to more general prior distributions
than the special Feldman-type we have been using so far.)
The results obtained for the three-point prior generalise to a
mixed prior having weight wI
and w
3
= l-w 1-w2
~
> 0,
w
2
>0
on (a, b). (b,a) respectively
0 smeared along the diagonal (9,9) for
0
~
9
~
1.
We shall call this measure G so that
J:
dG(B)
= w3
.
We begin by finding the posterior probability distribution for
(~1'~2)'
the unknown success probabilities of the two treatments. and
define
PI
and
P2
= P((~1'~2) = (a,b)
= P((~l '~2) = (b,a)
(n,x,y»
(n,x,y»
where, again, we suppose treatments 1 and 2 have accumulated x and y
successes, respectively. after n pairs of patients from a horizon. N.
52
On cancelling binomial coefficients, we find:
x
n-~y
n-y
a (I-a)
D (I-b)
wI
and
(blb)X+Y(1/ab)nJ19x +Y(1-9)2n-X-YdG(9)
o
Define the denominator function, Ft(x,y) by
where
t
= N-2n
denotes number of patients remaining after n pairs.
For notational convenience, define P3(9) to be the posterior
"density" corresponding to (9,9) for any 9 € [0,1].
notation a
= 1-9,
Then, with
Table 3.2 below shows the transition states from
. (t+2,x,y) and associated transition probability distribution.
Table 3.2 Transition states and probabiLities.
Result
New State
Probability
(0,0)
(t,x,y)
abp1 + baP2 + Jaa p3 (9)d9
(1,0)
( t,x+l,y)
abp
(0,1)
(t,x,y+1)
abp
(1, 1)
( t ,x+ 1, y+ 1) abp
p
1 + ~P2 + J9a 3 (9)d9
1
1
+ baP
+ baP
= abF t (x,y)IF t +2 (x,y)
= abF t (x+1,y)/F t +2 (x,y)
2
+ Jaap3(9)d9
= abF t (x,y+l)lF t +2 (x,y)
2
+ J99P3(9)d9
= ab~t(x+l,y+l)lFt+2(x.y)
b
-
53
To verify the second entry for instance, note that
abP l + baP2 +
J~99P3(9)d9
abwIAX+l+abw2AY+ab (blb)x+l+Y(1/ab)(N-t)/2 9x +1+Y{1_9)N-t-x-l-YdG(8)
=
F t +2 (x,y)
If Rt(x,y) and St(x,y) have their former meanings (expected
advantage over taking pairs all the way over stopping immediateLy and
optimaLLy respectively), then the same recurrence relation in St is
modified to read:
St+2(x,y)
=
max {R t +2 (x,y) , [abFt(x,y)St(x,y) + abFt(x+l,y)St(x+l,y) +
abFt(x,Y+l)St(x,Y+l) +
ab~t(x+l,y+l)St(X+l,y+l)] / F t +2 (x,y)}
b
= Rs (x,y)
= 0,
where
Ss (x,y)
and
X
t(a-b) IwI A -W2AYI
Rt(x,y) =
2F (x,y)
t
for s
1
To see the form of Rt(x,y), the expected advantage by switching
immediately to the current winner instead of taking pairs for all
remaining patients when in state (t,x,y), note that, if PI
switch to treatment 1):
> P2
(so we
54
(and if
P2
< PI
we switch to treatment 2 and find the negative of this
quantity).
So, just as before, we can define and work with suitable
multiples of Rt and St' namely:
St(X,y)
= St(x,y)Ft(x,y)
Rt(x,y)
= Rt(x,y)Ft(x,y)
and their difference, if
Qt(x,y)
= St(x,y)
- Rt(x.y).
Now the above recurrence can be written:
for t
~
0
where A is the linear operator defined previously.
i
(Recall i is such that A
Thus,
Rt(x,y)
= w1/w2
.(x,y)
=R
.)
(defined earlier)
and once again the recurrence can be written in terms of Qt:
Qt+2(x,y)
= max
{ 0 , (8/b) A Qt(x.y) - (a-b)w2IAx+i_AYI +
(a-b)w2abt{(Ay+l_AX+i)Il + (AX+i+l_AY)I_l} }
with
Qo
=Q1 = 0
and. as before,
I
- I
1 - {O
~
x-y+i
<1
}
I_I
= I{
-1
< x-y+i < 0
}.
55
Qt
That is.
satisfies the same recurrence as Q and the same initial
t
values and hence is the identical function.
Therefore. the optimal continuation region for the generalised
prior having mass smeared along the diagonal (8.8) is identical to
that of the three-point prior studied earlier.
However. because of
the different denominator functions D and Ft' the rewards are not the
same. only the positivity or otherwise of Qt(x.y) for any given
(t.x.y).
This fact is proven and made more explicit by virtue of
Theorem 3.1 to follow.
As a consequence. all the results of this chapter so far.
including those studied in more detail for the particular case of
i
being an integer. apply in this more general setting.
To begin to explain this similarity. we define p1 and p2 to be
measures by:
respectively. whereas;
WI and w2 on (v 1 .v2 )
p2 puts weight
respectively. and
= (a.b)
w is smeared along (v .v )
1 2
3
1
according to a measure
G.
such that
J dG(8)
O
where
and
and (v .v )
1 2
WI
> 0,
w
2
>0
WI
> O.
w2
> O.
WI
w
and
w3
~
WI + w2
0
and
= (b,a)
= (8,8)
= w3
=1
WI + w2 + w3
=1
2
w
WI
2
so that the relative mass on (a.b) to (b.a) is the same for p1 and p2.
Note that
The Radon-Nikodym derivative
dp1
dp2
exists and is given by
56
(essentially the likelihood ratio):
dp1
dp2
=
so
dp1
Thus, dP2 is just the ratio of the denominator functions:
D(x,y}
F t{x,y)
where
D(x,y)
= w1Ax
+
Ft(x.y) = w\Xx +
and
t
=N -
W
2
AY,
w~y
+ (bIb)X+Y(\/ab)nJ:ox+Y(\-O)2n-X-YdG(O)
2n.
Theorem 3.1 to follow relates the quantities that measure the
advantage of stopping at the optimal time over immediate stopping
under the two priors p1 and p2 respectively, but first, we introduce
some notation.
i
Let Qt{x,y) denote the the advantage in terms of the expected
number of successes gained by stopping at the optimal time instead of
stopping immediately when the current state is (t,x,y) and the prior
is according to measure pi ; i
= I,
2.
(Recall "stopping" means "switching to the currently most favoured
treatment for all remaining patients".)
i
Let Rt{x,y) denote the the advantage in terms of the expected
number of successes gained by stopping immediateLy instead of taking
57
pairs for all remaining patients when the current state is (t,x,y) and
the prior is according to measure pi ; i
= 1,
2.
i
Let St(x,y) denote the the advantage in terms of the expected
number of successes gained by stopping at the optimaL time instead of
taking pairs for all remaining patients when the current state is
i
(t,x,y) and the prior is according to measure P
; i
= 1,
Note, therefore, that
for i
-i
Lastly, let Zt(x,y) be defined for i
Zt(x,y)
-1
1
= Zt(x,y)D(x,y)
-2
2
= Zt(x,y)Ft(x,y)
Zt(x,y)
2.
= 1,
for Z
= 1,
2.
2 by:
= Q,
R and S successively.
Since D and F are both positive functions, it follows that:
t
z~ > 0
if and only if
Zi
t
>0
for Z = Q, R and S successively.
i
Recall that the positivity of Qt(x,y) determines that (t,x,y) is an
optimal continuation point, for i
Theorem 3.1
Step 1:
Q~.
= 1,
2.
With notation as above,
i
We first establish that the assertion holds when R replaces
t
From their definitions,we have:
t(a-b)lw1AX-W2AYI
2 D(x,y)
for t
~
0; and
for t
~
O.
t(a-b)lw1AX-w2AYI
2 Ft(x,y)
58
Since we mow
dp1 _ D(x.y)
dp2 - Ft(x.y)
and
then. clearly
Step 2:
-1
-2
It also follows from their definitions that R and R are
t
t
related by the equation:
Step 3:
-1
-2
St and St both satisfy recurrence relations involving the
operator A. where. as previously.
namely. for t
~
-i
0 and i
St+2(x.y)
where for s
= O.
= 1.
= max
1 and i
2
-i
- -
-i
{ Rt +2 (x.y) . (alb) A St(x.y) }.
= 1.
2.
-i
-i
S = R .
s
s
The derivation of these recurrences is given earlier in the two cases
representing the two different priors (only p1 is somewhat simpler
since there is zero weight on the point (b.b».
From Step 2. it follows (by a straightforward induction argument) that
Step 4:
Thus we have
~(x.y)
Step 5:
Finally. subtraction of the result in Step 1 from that in
Step 4. we find
o
59
So Theorem 3.1 explains why the optimal continuation regions are
identical for the two-point symmetric prior pi and for an extended
version of it, p2 which has the same proportionate weights on (a.b)
and (b,a) but additional mass along the diagonal vI
according to some measure G.
= v2
smeared
The result of interest has G associating
all of its mass on the single point (b,b), since this is the case
arising from the situation of current interest.
Theorem 3.1 also raises the question of how general the result
is.
For instance, is the optimal continuation region the same for any
specified prior, say G and for an extended prior that puts some
1
weight on G and the remainder on the diagonal vI
1
some measure?
= v2
according to
That is, more specifically, if we define such priors.
G and G given by:
1
2
o < '"l' < 1
where
and
G puts mass along vI
= v2 '
then do G and G give rise to the same optimal continuation region?
2
1
It turns out that the answer is "yes", so that Theorem 3.1 is
itself a special case of the follOWing more general result, in which
the notation has the same meaning as in Theorem 3.1 except that G and
1
G replace pi and p2, respectively.
2
Theorem 3.2
i
where Qt{x,y) denotes the the advantage in terms of the expected
number of successes gained by stopping at the optimal time instead of
stopping immediately, when the current state is (t,x,y) and the prior
is according to measure G., i
I
G
1
= 1.
2.
Also,
is a general prior distribution for (v ,v );
1 2
G puts mass along VI
= v2 '
G
2
= (1-'"l')G 1
and '"l' is a constant such that 0
+ '"l'G;
< '"l' < 1.
60
(Since this represents something of a diversion from the main
topic under consideration in this paper, the proof of this theorem is
deferred to Appendix 1.)
Theorem 3.2 shows that
i f and only i f
and hence that the optimal continuation region is the same for any
given prior distribution and an extension of it that smears some mass
along the diagonal.
Theorem 3.1 gives this result in the context of
the two-point prior under consideration.
Theorem 3.2 states the
result in its most general form.
For practical purposes, one would typically choose G to be a
1
symmetric prior to reflect an initial indifference to the treatments
being studied.
It appears from a theoretical perspective that even
slight generalisations of our "two points off the diagonal" prior give
rise to formulae for the stopping rewards that are overly complex for
further analysis.
We can now use the theory of this chapter, which extends Simons
(1986) to cope with a three-point prior, in constructing the optimal
stopping rule for Case 2.
OIAPTER IV
CASE 2, STAGE I: FROM TIIREE TREATMENTS TO TWO
Introduction
Having developed the theory for making the decision to drop from
two treatments to one at the optimal time, we shall now turn attention
to the decision that precedes it chronologically in Case 2.
This,
recall, is the situation of most interest in the current paper,
allowing for early termination of the worst-appearing treatment.
The
goal of this chapter is to derive the dynamic equation that governs
the transition from Stage I.
A corresponding computer program,
suitable for analysing the performance of the stopping rule for
arbitrarily large patient horizons, is given in Appendix 2.
Stage I of Case 2 begins just as in Case 1, with triplets of
patients assigned one each to the treatments.
Once the results of
each triplet are known (we assume this is practically immediately
after treatment). the question is whether or not to continue this mode
of operation for the next three patients.
The alternative is to
reject the "losing" treatment and commence Stage II, allocating pairs.
In Case 1, the mathematics was made somewhat easier by considering
expected number of successes in terms of the advantage over continuing
with triplets for all remaining patients.
were made in Case 2. Stage II.
Similar simplifications
This cannot be done in Stage I,
however, because we are seeking to compare triplets with doublets.
62
The optimal policy is determined by asking, "Is it better to
continue with another triplet or is this the time to drop the losing
treatment?"
The answer is found by a direct comparison of what one
expects (in terms of the number of successes arising from the
remaining pool of patients) in both instances, with the policy
naturally choosing the larger quantity.
Prior and Posterior
The prior distribution of success probabilities for the three
treatments is as follows:
p(rr
= (rr l ,
rr
where
= (a,b,b» = p(rr = (b,a,b» = p(rr = (b,b,a» = 31
rr2 , rr3 ) denotes the vector of true success
probabilities of treatments I, 2 and 3 respectively.
a
and
b
The parameters
are known and the object of the clinical trial is to
determine which has success probability
(> b).
a
corresponds to the particular case having
b
=c
This set-up
in Case 1 with the
symmetric prior, and so the reasoning behind the posterior
distribution has already been given (in Chapter II).
After observing
m triplets and
sl' s2' s3 successes
respectively (with the convention of ordering from greatest to least)
the posterior depends only on the parameters
t, j and k
which
together form the Markovian state:
where
N is the patient horizon.
So, j and k denote certain
differences between the current numbers of successes when t patients
are yet to enter the trial.
Explicitly, the posterior distribution is:
63
wI = p(rr
w
2
= (a.b.b)l(t+3.j.k»
Aj +k
= Aj +k + Ak + I
= p(n = (b,a,b)l(t+3,j.k» = Aj +k
k
A
+ Ak + I
(where the notation deliberately corresponds to that of the prior in
Case 2. Stage II).
Reward
We shall define the reward function
SMAXt(j.k)
to be the
expected number of successes gained by switching at the optimal time
from three treatments to two. when the current state is (t,j,k).
(It
should be remembered that, under our model. "optimal" means subject to
the constraint of disallowing further use of dropped treatments.)
Mathematically, SMAX + (j.k) is defined recursively as being the
t 3
larger of (i) the expected number of successes gained acting optimally
with the two leading treatments alone. from the state (t+3,j.k); and
(ii) the amount
a+2b + E[SMAX (J.k)I(t+3.j,k)].
t
where
a+2b
is what one expects from the next triplet and the
'" '"
conditional expectation term is based on (t.j.k). the immediately
attainable states from (t+3.j.k).
From an earlier argument. the quantity in (ii) can be expressed:
where
AI is the (in general. seven-point) linear operator derived in
Chapter II. evaluated at b
= c.
64
Dynamic Equation in SMAX t +3llJU
The following reasoning and series of lemmas develop the
expression for (i) in the dynamic equation satisfied by SMAX + (j,k).
t 3
Since the posterior at (t+3.j.k) in Stage I becomes the prior for
Stage II. we can expect with probability w • to see b(t+3} successes
3
under any policy. as it corresponds to making the error of eliminating
the truly superior treatment and entering a ttb vs. b tt situation.
Also. with probability (l-w ). we can expect the best possible from
3
two treatments in an tta vs. b tt situation with t+3 patients remaining
and initial between-treatment separation j.
Let us denote. for given
t and j. this expected number of successes from two treatments in an
a vs. b situation. by
By commencing Stage II. then. when the current Stage I state is
(t+3.j.k). the expected number of successes is given by:
+
Now. St+3(j} itself satisfies another recurrence relation; one
which can be generated most readily in terms of the expected advantage
over taking pairs for all remaining patients.
Conditioning on not
being in the b vs. b situation. let us suppose we have observed
and
y
successes on the two treatments since beginning Stage II and
that there now remain t patients to be treated.
transition associates weight
w
2
x
= w2 (I-w3 }-1
on the point (a.b). and
on the point (b.a).
Note that the ratio
=
Then the prior at
=
65
Define
Rt(x,y;j)
to be the advantage in terms of expected
number of successes gained by stopping Stage II immediately instead of
taking pairs for the remaining patients, t, j, x and y as above.
(Recall the initial separation was j in favour of the treatment now
having accumulated x successes in Stage II).
Then earlier reasoning from Chapter III, setting w
3
where
that is,
Letting
d
=a
= 0,
shows:
- b,
_ ! ~Ax-y+j - 1 L.
Rt(x,y;j) - 2
St(x,y;j)
Ax - y + j + 1
denote the corresponding expected advantage over
pairs but stopping optimally, we have, in the usual notation:
together with initial conditions
S.1
=R.
for i
1
= 0,
1.
The conditional expectation term is written more fully as:
Dropping the j's on the right-hand side, we have
= max{
R + (x,y), abst(x,y) +
t 2
+
Lemma 4.1
St(x+1,y+1;j)
t, j, x and y.
Ax - y - 1+ j + 1
Ax - y + j + 1
= St(x,y;j)
Ax - y + 1+ j + 1
Ax - y + j + 1
abS t (x+1,y)
abs (x,Y+1) + abS (x+1,y+1)}.
t
t
for all non-negative integers
66
.
Smce
Proof
.
Rt(x,y;J)
we have R (x+l,y+l;j)
t
t ~1~X-y+j - 1 L
+'
,
~x-y J + 1
= Rt(x,y;j)
S (x+l,y+l;j)
s
= S s (x,y;j)
•
for all non-negative t, j, x and y.
So the assertion holds in particular for t
Assume
a
= 2"
= 0,
1.
for some s and all j, x and y.
Then consider (again dropping some j's):
S
s+
2(x+l,y+l; j)
= max{
R 2(x+l,y+l), abs (x+l,y+l) +
s+
s
+
= max{
~x-y-l+j+ 1
~x-y+j+
~x-y+l+j+ 1
~x-y+j+
1
abS (x+2,y+l)
s
abs (x+l,y+2) abS (x+2,y+2)}
s
s
1
~x-y+l+j+ 1
R + (x,y), abs (x,y) +
s 2
s
~x-y+j+
1
abS (x+l,y)
s
~x-y-l+j+ 1
+
= S s+2(x,y;j)
~x-y+j+
abs (x,y+l) + abS (x+l,y+l)}
s
s
1
completing the inductive step.
o
Therefore, it follows that
Let us define
Rt(z;j)
and
St(z;j)
where
z
= dRt(z,O;j)
= dSt(z,O;j),
is any integer and
d
=a
- b
is a multiplicative constant.
This simplifies the recurrence to:
St+2(z;j) = max{ R t +2 (z;j), vSt(z;j) +
+
where
v
= ab
+ ab, and S.
I
=R.
I
for i
~z+l+j+ 1
~z+j+ 1
~z-l+j+ 1
~z+j+ 1
= 0,
1.
abS t (z+l;j)
abs (z-l;j)}
t
67
for integers t
Z
t
Rt(z'J')
= '
2
JA +
j
z+J.
A
-
S
s+
Ss (z;j)
2(z+I,j-l)
+ 1
= Ss (z+l;j-l)
= max{
O. j
~
0 and z.
1L
= R (z+I'J'-I)
t"
so. in particular. the assertion holds for t
Assume
~
= O.
1.
for some integer s L O.
R 2(z+l;j-l).
s+
Then,
Az - 1+ j + 1 abS (z;j-l)
s
AZ+ j + 1
AZ+1+ j + 1 + vS (z+l;j-l) +
abS (z+2;j-l)}
s
AZ + j + 1
s
completing the induction.
Consequently,
and similarly for
St(Z;j)
= St(z+l;j-l) =... = St(z+j;O)
Rt(z;j).
Further defining
Y
R (y) = R (y;O) = -tJA -IL
t
t
2 AY+l
and
St(Y)
= St(Y;O)
we have:
with initial values
S.
1
=R.
1
for i
= 0,
1.
Then the quantity of interest is
because the factor
d
needs to be re-inserted and the expression
given in terms of pure expected number of successes as opposed to an
advantage over pairs for all remaining patients.
Finally. the recurrence can be tidied by some algebraic
manipulations into hyperbolic functions. such as:
o
68
Rt(y)
and
y 1
A + +1
- - - ab
AY+l
where
a
= 2"1
= ~ Itanh ayl = ~
y 1
+l
-Y2- ab
2
A +l
= fJn
=A+
In A
and
f3
tanh alyl
cosh(y+l)a
--cosh ya
2-= abba.
Slightly adapting the notation from Simons (1986). setting
u
= f3
y
cosh(y-l)a
cosh ya
and
; = f3
cosh(y+l)a
cosh ya
y
the recurrence reduces to the one he derived. namely:
with
S.1 == R.1
for i
= O.
1 and
Rt(y)
=~
tanh alyl.
The preceding lemmas reduce the number of parameters from four
down to two. thereby greatly facilitating efficient computer
programmability.
With this view in mind. the next lemma is useful for
considerably reducing the required storage space (and execution time)
of the program.
for non-negative integers t and y.
Proof
By virtue of tanh(·) being an even function. the assertion
holds for t
= O.
1.
Assume
S (-y)
s
= S s (y)
for some integer s
Z O.
Then
= max{
R
(-) f3 cosh(-y-1)a S (- -1) + vS (-y)
s+2 y.
cosh -ya
s y
s
+ f3 cosh(-y+l)a S (- +1)}
cosh -ya
s y
= max{Rs+2 ()
y .
n cosh(y+l)a S ( +1) + vS ( ) + n cosh(y-l)a S ( -I)}
cosh ya
s y
s Y
fJ
cosh ya
s y
fJ
(since cosh(·) is also even).
o
69
This means that we can restrict attention to non-negative integer
values of y.
We can also make use of the case y
=0
as follows:
~
St+2(0)
= max
=
{ Rt +2 (0), uOS t (l) + vSt(O) + wOS t (I)}
max { 0, vSt(O) + (l-v)St(I)}
= vSt(O)
Result 4.1
+ (l-v)St(I).
For any given state (t+3,j,k) in Stage I, the reward
satisfies:
i\j+k
where
wI = w1 (j,k) = i\j+k+ i\k+
1
A1 Z t (.J, k)
for i = 0, I, 2
SMAX. (j ,k) = i[b + dw 1 ]
1
wi th ini t ial values
= [ abbD(j,k-l)KZ t (j,k-1)
1
w3 = w3 (j,k) =
i\j+k+ i\k+ 1
+ abbD(j+l,k-1)(2-J)KZ t (j+1,k-1)
+ (a2 bb/a)D(j-l,k)JZ (j-l,k) + (abb + abb)D(j,k)Zt(j,k)
t
_2
+ (a bb/a)D(j+l,k)[1+(2-K)(1-J)]Zt(j+1,k)
+ abbD(j-1,k+1)(2-K)JZ (j-l,k+1)
t
+ abbD(j,k+1)[l+(2-J)(1-K)]Zt(j,k+1) ] + D(j,k)
and, here,
Further,
D(j,k)
St+3(j)
= 2i\j(i\j+k
+ i\k + 1).
satisfies
~
= max{
Ss+2(y)
and
with
Ss+ 2(0)
Si
=Ri
= vS s (0)
for i
= 0,
1
-1
v = ab + ab ; a = 2" In i\
y
= {3
>0
+ (l-v)S s (1),
and
u
for y
Rs +2 (y), uySs (y-1) + vSs(y) + wySs (Y+1)}
cosh(y-1)a
cosh ya
and
; = {3
y
cosh(y+l)a
cosh ya
{3
2-= abba
70
Proof
All that remains to be proven after Lemmas I, 2 and 3 above is
the initial conditions for the main recurrence and its leading term.
To dispose of the latter, this is known to be
+
(1-w3 )St+3(j) + w3b(t+3)
= (1-w3 )dS t +3 (j)
M3
+ ~a+b)(1-w3) + w3 b(t+3)
= (1-w3 )dS t +3 (j)
t+3
+ ~a+b-dw3)
as stated in the result.
Clearly, when no patients remain, SMAX
O
=o.
If only one remains,
then assigning the best treatment (randomizing in the event of a tie)
yields a success with probability:
aWl + bW2 + bW3
=b
+ dw .
1
Finally, with two remaining, one has a choice, so that:
This reflects choosing either allocating both on the best appearing
treatment or using both the top two treatments once each.
The former
is always at least as good as the latter because, on subtraction:
d[2w
1
+ w3 -1]
with equality if and only if j
It follows that
SMAXi(j,k)
Lemma 4.4
= i[b
= d(w 1-w2 )
=0
(since
SMAX (j,k) = 2 SMAX (j,k)
2
1
+ dw (j.k)]
1
for i
= 0,
~
0
·+k
AJ
= Ak
must hold).
and, in particular,
I, 2.
0
If Stage I or Stage II has continued until just three
patients remain, then it is always optimal (irrespective of the values
of a and b) to allocate the leading treatment alone if there is one.
Otherwise, in the event of a tie between the top two treatments, one
should allocate those treatments to the next pair of patients.
71
Proof
First, suppose we are in Stage II and t
= 3.
For j
> O.
- max{ ~ tanh ·a ~ cosh(j-l)a Itanh(j-l)al + ~2 tanh .
J , 2 cosh ja
Ja
2
+ ~ cosh(j~l)a Itanh ( ·+l)al}
2 cosh Ja
J
The former quantity within { , } is the larger of the two (indicating
a switch to a single treatment) if and only if:
j
3(A -l)
> abA j - 1
+ (ab+ab)A
j
that is,
A (3-ab-ab-ab-ab)
j
but this is true since A
>1
when j
j
+ abA j + 1 - 1
= 2A j > 2
> O.
Thus, switching to the
single leading treatment is optimal.
If j
= 0,
then
S3(0)
= vS1(0)
+ (l-v)Sl(l)
= 21
(l-v)tanh
a
= 2d > 0
(after a little algebra) indicating that taking a pair of treatments
is optimal.
when t
This completes the assertion if Stage II is in progress
= 3.
Secondly, if Stage I (taking triplets) is still in progress when
t
= 3,
and there is a unique leader (j
> 0),
then:
The former quantity is the larger if and only if
j k j
j
j
3d(A j +k+Ak )(A j -l) + 3(A +l)(a+b)(A + +A +l) - 3d(A +l)
> 2(a+2b)(A j +l)(A j +k+A j +l)
that is,
72
4A j +k - 2Ak - 2
or
>0
but this always holds since j
Therefore. SMAX {j.k)
3
triplet.
> a+2b.
> O.
and so it is optimal to not take a
Instead. the policy is as if Stage II were in progress.
In the final case to consider. j
= O.
we have:
The first term exceeds the latter if and only if:
that is.
or
This says the optimal policy when j
treatments.
Ak {d+1) > 1. which is indeed true.
=0
is to select the pair of
o
This completes the proof.
Programming Considerations and Applications
The dYnamic equation for SMAXt{j.k) involves a nested recurrence
and a seven-point linear operator. so perhaps unsurprisingly, is
analytically intractable for values of t above 3.
Nonetheless. one of
the advantages of dealing with Bernoulli responses is that progress
can be made by a direct numerical approach by computer.
The algorithm
in Result 4.1 lends itself to a rapid and efficient program that can
be performed on a personal computer in (approximately) linear time,
i.e .. one which is suitable for arbitrarily large patient horizons.
yet only requiring the storage of small two-dimensional matrices.
This. incidentally. is an important consideration.
For example,
Kulkarni and Kulkarni (1987) in a section on optimal Bayes procedures
(for two treatments. but a Beta prior setting) noted. "space and time
requirements grew at a rate proportional to n
4
making it impractical
73
to compute the decisions even for moderate values of n, say n
~
50."
Again, Armitage (1985) commented on the two-armed bandit problem for a
finite horizon, "The solution ... can in principle be obtained by
dynamic programming methods, but in practice the computation involved
is prohibitive except for trivially small horizons."
The rest of this chapter is concerned with numerical results and
assertions based on studying the computer output for a wide variety of
values of a, b and N.
Assertion 4.1
For any given values of a, b, t
< N,
j and k, if it is
optimal to continue Stage I in the state (t,j,k) then it is also
optimal to continue Stage I in the state (t+1,j,k).
Assertion 4.2
The policy described in Result 4.1 is asymptotically
optimal in the sense that as the horizon increases, the expected
number of successes approaches a limit equal to the product of the
horizon and the largest of the treatment success probabilities. i.e ..
the asymptotic regret is zero, or:
lim
N~oo
-1
N
SMAXN{O,O)
= a.
Assertion 4.1 was proven in the analogous situation in Case 1
(see Assertion 2.1) and seems intuitively reasonable in this more
general setting too.
However, a formal proof remains elusive. even
with t+3 replacing t+1.
Its usefulness is that it allows the optimal
continuation region for Stage I to be characterised by an array of
minimal numbers of remaining patients, t, for which it is optimal to
commence Stage II for any given pair of success differences (j,k) and
74
all higher values of t.
(j,k)
= (2,1)
then
is an optimal (Stage I) continuation point so long as at
least 153 patients are remaining.
Further such minimal t-values are
shown in this example of a "policy table".
Table 4.1
= (.6,.4)
For example, if (a,b)
PoLicy tabLe For (a,b)
(Missing values exceed N.)
= (.6 .. 4),
N
= 300.
k\j
o
1
2
3
4
o
6
21
57
127
269
1
40
79
153
299
2
117
194
3
267
Besides giving the policy the program also shows the expected
number of successes from any point (t,j,k) for given a and b.
supreme interest is the quantity
Of
SMAXN(O,O) corresponding to the
expected number of successes prior to observing any data.
Assertion
4.2 is empirically supported by numerous cases, for example. with
(a,b) = (.6, .5),
SMAX9 ,999(O,O}
= 5,965.74
or
59.66%.
This is as opposed to the 60% success rate expected if we knew
ahead of time which of the three treatments was truly the best (and
recall two of them have 50% success rates in actuality).
The regret
is the expected shortfall due to ignorance of the best treatment prior
to sampling, so here, it is less than 35.
Note that this means one
only expects this many patients (out of 9,999) to fare any worse than
if the truly superior, by 10%, treatment were allocated to all.
From
this in turn, we infer that we expect to make just 350 allocations of
the inferior treatment throughout the entire horizon.
75
When (a,b)
= (.6,.4),
SMAX9 ,ggg(O,O)
= 5,978.91
or
59.80%.
With an even greater difference between a and b, results for horizons
that are higher still can be computed in reasonable time, e.g ..
when (a,b)
= (.75,.25),
SMAX60 ,OOO(O,O)
= 44,989.00
or
74.98%.
This says that, with this horizon, one expects the single 75%
successful treatment to become clear after 22 allocations of the 25%
treatments.
The expected number of successes, the regret and the
expected number of inferior allocations in a trial all express the
same information, so we will not continue stating all three
quantities.
These large horizon results, then are highly suggestive of
asymptotic optimality as asserted.
It should be mentioned, however.
that this does not come as a surprise.
Few 'sensible' sequential
rules fail to achieve this sort of criterion (being one of several
attributes of sequential methods).
Many further questions of interest naturally arise, for instance:
(1) How well does the rule perform for small or moderate values of N?
(2) How critical is the choice of (a,b)?
(3) Are there universal stopping points analogous to Case I?
(4) How does the rule compare to two serial, pairwise trials?
The fourth question is especially interesting, since it provides
valuable insight into the usefulness of handling three treatments
simultaneously as opposed to the more commonly employed comparison of
just two treatments at a time.
This issue, Case 3, will be the
subject of Chapter V to follow.
We shall briefly address here the first three questions by
studying further some numerical results.
76
Numerical Results
(1) Small Horizons
Assertion 4.2 suggests the rule performs very well when the
horizon is very large.
It is also important to see how it performs
for smaller horizons, both from a theoretical perspective and for
practical reasons - with a view to applications in Phase II trials
(recall, these are the small-scale comparative studies designed to
detect early promise of a new drug), and in addition, because we have
seen, in Chapter I, that even Phase III trials with 50 or so patients
are commonly reported in the medical literature.
Table 4.2 gives the expected number of successes or reward,
S~(O,O),
for three treatments, one known to be 60% successful and
two known to be 40% successful, for various values of N.
Table 4.2
Reward For (a,b)
N
= (.6,.4),
S~(O,O)
various horizons, N.
N-1S~(O,O)X100%
Log N
12
5.89
49.10
1.08
24
12.04
50.15
1.38
48
24.83
51.73
1.68
72
37.98
52.75
1.86
100
53.66
53.66
2.00
200
111.10
55.55
2.30
300
169.61
56.54
2.48
500
287.94
57.59
2.70
1000
585.76
58.58
3.00
3000
1782.82
59.42
3,48
9999
5978.91
59.80
4.00
77
Obviously. the larger the horizon. the higher the expected
percentage of successes for the clinical trial (reading down the third
column).
But. even the smallest tabulated values are acceptable.
especially if compared with the corresponding expected percentages for
analogous fixed-sample size trials.
these. or just 46.67% for (a.b)
One expects 100(a+2b)/3 % for
= (.6 .. 4).
independent of N.
For detecting a 5% difference in treatment effects. consider as
an example (a.b)
= (.525 .. 475).
(We will see later the relevance of
this particular choice).
Table 4.3
Reward for (a.b)
N
= (.525 •. 475).
S~(O.O)
various horizons. N.
N-1S~(0.0)X100%
Log N
100
49.63
49.63
2.00
200
99.65
49.83
2.30
300
149.93
49.98
2.48
400
200.41
50.10
2.60
500
251.05
50.21
2.70
600
301.84
50.31
2.78
700
352.75
50.39
2.85
800
403.76
50.47
2.90
900
454.88
50.54
2.95
1000
506.08
50.61
3.00
1100
557.36
50.67
3.04
1200
608.71
50.73
3.08
For values of log N between 2 and 3 (so horizons are between 100
and 1.000) there appears to be an approximate linear increase in the
expected proportion of successes in the scale of log N.
For moderate-
sized horizons it seems (based on the above two and other examples)
78
that horizons around 300. when log N is approximately 2.50. will
perform satisfactorily.
Of course. there is still room for
improvement. but only at the cost of magnifying the size of the trial.
In short. the benefit in terms of the increase in expected proportion
of successes is about the same in increasing N from 100 to 300 as it
is in increasing N from 300 to 1.000.
(2) Robustness in (a.b)
As with any Bayesian approach. one is always concerned with the
robustness of the prior.
The usefulness of the rule in practice will
require a lack of sensitivity to small changes in parametric values.
especially the values of a and b.
For any particular disease being
studied in a clinical trial. at least an approximate success rate for
one of the treatments will generally be known. so it is unlikely that
both success probabilities are going to be severely mis-estimated.
Some results are given in Table 4.4 for the optimal stopping policy
when the horizon is 300 (chosen due to (1) above) and various (a.b).
The entry in position (j.k) in any policy table is the minimal
number of remaining patients for which it is optimal to continue
Stage I. sampling by triplets.
Table 4.4
(i) (a.b)
(A
PoLicy tabLes For N
= (.6 •. 55)
= 1.23)
(All missing entries exceed N.)
= 300.
various (a.b).
~j
0
1
2
3
4
5
0
6
23
54
101
163
2~
1
45
84
137
207
293
2
130
189
263
3
~2
79
Table 4.4 (Continued)
(ii) (a. b) = (.6 .. 5)
k'\j
0
1
2
3
4
5
0
6
22
53
103
174
275
1
42
79
134
210
2
119
176
256
3
237
5
(A = 1.50)
(iii) (a. b)
(A
(iv) (a. b)
(A
(v) (a. b)
(A
= (.6 .. 45)
k'\j
0
1
2
3
4
0
6
21
54
111
204
1
40
78
138
236
2
115
178
279
3
241
= 1.83)
= (.55 . .4)
= 1.83)
= (.5 .. 4)
= 1.50)
k'\j
0
1
2
3
4
0
6
21
54
110
203
1
40
77
138
236
2
114
178
278
3
241
5
k'\j
0
1
2
3
4
5
0
6
21
52
101
172
273
1
42
79
133
209
2
118
175
255
3
236
80
Table 4.4 (Continued)
(vi) (a, b)
(A
= (.45, .4)
= 1.23)
k'\j
0
1
2
0
6
21
52
1
44
83
136
2
128
187
261
3
250
4
5
98
160
239
205
291
3
Comparing Tables 4.4(i)-(iii) and Tables 4.4(iv)-(vi), there are
no major changes in the values of the entries when one parameter is
held constant and the other incremented by 0.05. indicating the rule
is fairly robust in (a,b).
(Rather remarkably, the (j,k)
= (2,1)
entry only varies between 133 and 138 for all six of these tables.)
Furthermore, it is also interesting to see how close the values
of the entries are for the same value of A, e.g. Tables 4.4(i) and
4.4(vi), indicating the difference between the treatment success
probabilities is a more crucial factor than the precise values of the
unknown success probabilities themselves.
This is re-assuring since
it makes the selection of a and b just that much less critical.
Finally, it is worth noting that the choice of (a,b) has greatest
impact on the duration of the testing stage, not on the final decision
that is inferred.
The optimal rule is most likely to stop very early
when we would want it to, namely when a - b is large.
(3) Universal Optimal Stopping Points
Naturally, one would like the choice of (a,b) to be completely
uninfluential.
A conservative approach would be to find the values of
a and b for which the entries in the policy tables are minimal.
This
is analogous to Case I, for if such a pair (a,b) could be found. then
81
the corresponding policy would at worst prolong the testing stage of
the trial to. in a sense. its maximal duration (compared with using
values of a and b closer to reality).
It turns out. however. perhaps
unsurprisingly given Case l's failure to possess a uniformly
minimising pair (a. b). that no such values exist.
That this is so can
be seen by inspecting a few policy tables with values of (a,b) close
to (.5 •. 5) (since if such uniformly minimising (a.b) exist, we have
reason to believe they would be in that neighbourhood).
Table 4.5
Policy table for (a.b)
= (.51 .. 49).
k\j
o
1
2
o
6
22
53
1
45
85
138
2
135
197
273
3
267
N
4
5
97
156
228
206
288
3
This yields relatively low values for the k
expense of rather high values for the j
(0.3) position.
(a.b)
=
= 300.
=0
=0
row. but at the
column. such as 267 in the
The (5.0) entry is 228 here. and 237 by contrast for
(.525 •. 475) suggesting the minimal entry for a given (j.O)
occurs with a and b close to each other and to 0.5 (see Assertion 4.3
also).
In seeking to minimise the entries for small j and large k
values. one necessarily increases the k = 0 values.
Thus it seems
that the optimal continuation region displays some sort of "twist" in
three-dimensional space. prohibiting the existence of a universal set
of minimal entries in a policy table.
(Of course. it should be
remembered that we have restricted attention to the case of equality
between the two inferior treatments. so general statements about the
optimal continuation region without this constraint cannot be made.)
82
So, just as in Case 1 (c.f. definition of T:), an even more
J
conservative minimax-type approach can be adopted, namely, to find the
minimal entry for each (j,k) and any choice of (a,b).
This would be
the sort of rule recommended for practical use in the event of
absolutely no prior information on any of the treatments, (an unlikely
scenario), so this search for minimal t-values has not been pursued in
earnest.
However, it can at least be said that such a policy would
have entries bounded above by the values in Table 4.6 (being a hybrid
of several policy tables having different (a,b) choices).
Table 4.6
GLobaL poLicy tabLe for arbitrary (a,b), N
k'\j
0
1
2
0
5
21
52
1
40
77
133
2
114
174
255
3
236
= 300.
4
5
97
156
228
206
288
3
In order to generate a useful, practical model for a trial that
included as standard therapy a treatment that was known to succeed,
say, 35% of the time, then one could easily adopt the sort of search
just described but with the condition that either a or b equals 0.35.
The object of generating such minimal sets of entries and recombining into a single policy table is to be able to make the
following claim: that irrespective of the values assigned to a and b
(alternatively, conditional upon one treatment being a known standard)
it remains optimal to cease Stage I if the current state is (t,j,k)
and t is less than its corresponding tabulated value in this "global"
(or "restricted global") policy table.
83
Four Assertions
Several further theoretical propositions suggest themselves based
on the numerical evidence compiled.
First. some assertions can be
made concerning the order of appearance of entries in any given policy
table.
Loosely speaking, there tends to be a progression along
successive diagonals.
This is stated more explicitly in Assertions
4.3. 4.4 and 4.5 below.
Assertion 4.6 deals with some propositions
concerning the actual values of the entries in the policy tables.
Assertion 4.3
Among (j.k) such that j + k
~
i. for any given
positive integer i. and for any given (a. b). the first (lowest) entry
in the policy table occurs in position
Assertion 4.4
(i.O).
Among (j.k) such that j + k
~
i. for any given
positive integer i, and for given (a. b). the order of entries
appearing in the policy table is such that the entry at
(j.k)
= (i-i.i)
precedes the entry at (i-i-l.i+l) for i
= 0.1, .... i-I.
Note that Assertion 4.4 is a stronger statement than Assertion
4.3. since it concerns more than just the k
Assertion 4.5
=0
row entry.
In any policy table. if to is the entry at (jo,kO)'
then there exist entries numerically less than to in all positions
(j.k) such that
j + k
< jo
+ k O'
Together. these last two assertions indicate how policy tables
expand with the horizon.
(It may be helpful to look back at the
earlier policy tables with N
= 300.)
Diagrammatically. if the policy
84
table has the following shape in outline. then
X
marks the only
candidates to be the next entry (Assertion 4.4):
TO ~ T1 ~ ~ ~
and numerically (Assertion 4.5):
T3 .
The final assertion is based on observing the values of the
minimal t entries in policy tables for large N.
displayed. the first having (a.b)
= (.6 .. 5)
second (a.b) = (.75 .. 25) with N = 60.000.
Two examples will be
with N
= 9.999
and the
These are extracts from the
corresponding policy tables. where the figure in brackets below each
t-value is the ratio of the (j.k) entry to the (j-1.k) entry.
Table 4.7
(a.b)
k\j
e
Extracts from two Large horizon poLicy tabLes.
6
7
= (.6 .. 5).
9
8
A
10
= 1.5
11
12
13
14
0
9CX)
417
615
1311
1911
2802
4122
6090
9033
(1.51 ) (1. 47) (1.46) (1.46) (1.46) (1.47) (1.47) ( 1.48) (1. 48)
1
4200
6171
9117
669
957
1374
1980
2874
(1.44) (1.43) (1.44) (1.44) (1.45) (1.46) (1. 47) (1. 48)
(a.b) = (.75 .. 25). A = 9.0
k\j
0
0
6
1
51
1
2
3
4
27
195
(4.50) (7.22)
1659
(8.51)
14784
(8.91)
222
1689
(4.35) (7.61)
15036
(8.90)
e
85
Assertion 4.6
(i) For a given (a. b). and for fixed k. the ratio of
successive minimal t-values tends to a limit; and
(ii) this limiting ratio equals A.
We will discuss briefly some implications of Assertion 4.6.
detecting a specified difference. say
For
100 0 %. there exists a
limiting rate of increase of the minimal t-values in any policy table.
This is because A has a minimal value amongst all (a.b) with fixed
d
=a
- b
occurring when
elementary calculus.)
a + b
= 1.
(The proof of this is by
So. to find this minimal limiting constant
ratio of successive entries. for a given O. one only need evaluate the
value of A corresponding to (a.b)
denote such minimal A by
Table 4.8
= (.5
- 0/2 • . 5 + 0/2).
If we
AmIn
. . some sample values are shown below.
MinimaL A for various
o.
A
min
0.10
1.49
0.05
1.22
0.02
1.08
0.01
1.04
This means that in seeking to detect a difference of 5%. one can
expect the policy table entries to grow in the k
at least. for large j.
=0
row by about 22%.
This. in turn. gives extra information
concerning the robustness of the optimal stopping rule in N. and
furthermore. adds weight to the evidence that our concentration on the
case having a + b
regions.
= 1.
does indeed give rise to maximal continuation
(Additional comments on this will be given in Chapter V.)
86
Applying the above reasoning to the (admittedly. rather extreme)
case in which (a.b)
N
= 60.000.
= 5.
(.75 •. 25). then. based on computer output for
we can assert fairly reliable estimates for the minimal
t-values in the k
j
=
=0
row for sufficiently large N.
6 and 7. approximately
132.000;
1.180.000
These are. for
and
10.500.000
respectively.
Combined with Assertion 4.4. this says that the policy table for
(a.b)
= (.75 •. 25)
has entries in excess of 10.000.000 for j + k
In conjunction with the supposition that a + b
=1
~
7.
gives rise to
maximal continuation regions. then this last statement also applies to
any (a.b) with a and b separated by 0.5 or more.
Based on this. one could easily devise a stopping rule for
detecting sizeable treatment success differences (note that 0.5 would
be far greater than what would ordinarily be encountered in practice)
that continue sampling by triplets only until the leading treatment
has accumulated. say. 7 more successes than the worst.
Such a rule
would be highly insensitive in the selected value of patient horizon.
N. where N can be very large indeed.
This example. in spite of its
extremity. serves to illustrate how robust in N. but more importantly.
how simple. an optimal rule can be to put into practice.
Furthermore.
applied to such large horizons. Assertion 4.2 indicates that one
expects negligibly few patients to receive the inferior treatment.
CHAPTER V
CASE 3: TWO PAIRWISE TRIALS
Introduction
Case 3 is developed primarily for comparison purposes with
Case 2, in order to address the advisability of sampling only two
treatments at a time as opposed to using all three.
We consider
performing two pairwise trials, one immediately after the other. so
again, there are two stages of testing prior to switching all future
patients to an indicated treatment.
"Stage I" is the first
comparative trial (in practice, one would want this to include a
standard therapy) and "Stage II" commences once enough evidence has
accumulated in one treatment's favour, with that treatment being
compared with the third treatment.
The better of this latter pair is
then to be used for the remaining patients.
Each stage is stopped by
the usual criterion of maximising the total expected number of
successes for all patients.
At the transition to Stage II, we assume that the two treatments
begin on an equal footing, so as to most closely imitate a practical
situation whereby trials are conducted independently.
The only
difference, however, is that Stage I is terminated by a rule that is
cognisant of a third treatment being available.
To be compatible with
Case 2, we impose the same restrictions:
(i) the prior at the start of Stage I is the analogous Feldman-type
88
distribution with success probabilities a. b and b (where a
> b):
(ii) the total patient horizon is N;
(iii) the objective function is the same as before; and
(iv) once a treatment is dropped from contention. it is never recalled
for further use.
We are most interested in examining the behaviour of the optimal
stopping rule at its moment of transition out of Stage I.
The aim of
this chapter is to develop the necessary theory behind a program. in
order to describe this transition. suitable for both small and large
horizons. N. and to analyse the results of such a program.
In our
quest for the optimum two-at-a-time rule. under the aforementioned
restrictions. we incorporate two somewhat impractical options:
(a)
allowing for the possibility of never using the third treatment;
and
(b)
allowing for the possibility of switching all to a previously
unused treatment.
Naturally. one would never trust one's prior beliefs to such an
extent in real life. but these options are useful.
(a) ensures that
the rule performs consistently for odd and even horizons.
(Any
sensible rule should meet this fundamental nesting requirement. that
one expects. e.g .. the reward for 15 patients to be between those
corresponding to 14 and 16 patients. respectively.)
(b) further
improves the performance of the rule by increasing the numerical value
of the rewards (as opposed to not including it).
Together. they
guarantee that the Case 2 comparison (which. after all. is the present
emphasis) is against no "straw man". but rather. really is with the
best possible alternative under our restrictions.
Happily. these
unrealistic options only occur at extremes of likely (plausible-to-
89
arise) states, so the probability of invoking one or other of them is
only small, and, further, the larger the horizon, the less likely they
are encountered as the means of terminating Stage I.
Options (a) and
(b) appear below in (3) and (4), respectively.
The optimal policy when in Stage I of Case 3 selects the best of
four options:
(1)
Continue Stage I by sampling another pair on the same two
treatments;
(2)
Begin Stage II. a new pairwise trial between the current leader
and the so far unused treatment;
(3)
Stop the testing stage, putting all future patients on the
current leader; and
(4)
Stop the testing stage, putting all future patients on the so far
unused treatment.
Terminology and Notation
To avoid excessive subscript notation, we shall, throughout this
chapter, use j and k to denote (what was formerly denoted sl and s2)
the number of successes accumulated by the current leader and the
second-best treatment, respectively, at any given time in Stage I of
Case 3.
Note, in particular, that j is no longer a score difference
between treatments, as in preceding chapters, and that j
convenience, we shall later re-parameterize anyway.)
~
k.
(For
Once again, if t
represents the number of remaining patients from the original horizon
at any given time, the state (t,j,k) is Markovian.
Let us denote the maximum expected number of successes gained by
acting optimally from the state (t,j,k) by Vt(j,k).
We will further define ~(j,k), for m
= I,
2, 3 and 4 as follows:
90
Vt(j,k)
= max {V~(j,k), ~(j,k), ~(j,k),
V;(j,k)},
where the superscripts refer to the four options just described above.
We shall say that a state (t,j,k) is "m-mode optimal" if
for m = I, 2, 3 or 4.
(We will later set conventions for the case of more than one mode
being optimal simultaneously.)
Thus, the present quantity of interest in Case 3, for a given
horizon N, is VN(O,O).
S~(O,O)
It is this that we seek to compare with
to infer any quantitative differences between sampling by
at most pairs (Case 3) instead of triplets (Case 2).
Result 5.1 to follow gives the exact formula for V + (j,k).
t 2
Since no new theory is involved, it is stated without proof.
Suffice
to say that the prior associates equal weight on the three points
(a,b), (b,a) and (b,b).
The posterior distribution
PI' P2 and P3 ,
after observing j and k successes by time t+2, which is necessary for
V~+2(j,k), is derived in Chapter III.
The reasoning behind
is given in Chapter IV, and lastly, the formulae for
y3
V~+2(j,k)
4
and V are
quite straightforward.
Dynamic Equation in V + (J,kl
t 2
Resul t 5.1
For any given state (t+2,j,k) in Stage I, the reward
satisfies:
with initial conditions,
91
where
~+2{j,k)
~+2{j,k)
Also
=
= PIt+2{.J, k)
PI
and
02 _ I-b
- I-a
where we define 0 > 0 by:
Now, in general, for j
(> 1).
> k, we define here,
A Vt{j,k)
=
[abP I + baP2 + bbP3] Vt{j,k)
+ [abP I + baP2 + bbP3 ] Vt{j,k+l)
+ [abP I + ~P2 + bbP3 ] Vt {j+l,k)
+ [abp 1 + baP2 + bbP 3 ] Vt {j+I,k+l).
If j = k, then
A Vt{j,j)
=
[abP I + baP 2 + bbP3 ] Vt{j,j)
+ [(ab+~){I-P3) + 2bbP 3 ] Vt{j+I,j)
+ [abp
I
+ baP
2
+ bbP ] Vt (j+l,j+l).
3
Lastly, St+2{O) is found from the recurrence given in Result 4.1,
which, for ease of reference, is repeated here:
St+2{y)
with
= max{
S.
1
=R.
1
'"
Rt +2 {y), UySt(y-l) + vSt(y) + WySt(y+I)}
for i
= 0,
I
for y
>0
92
and
v
u
Note:
- {3 cosh(y-1)a
y cosh ya
and
ab :
= ab
+
w
y
= (3
=~
a
In 'A
{32
= abba
cosh(y+1)a
cosh ya
We shall also use the notation p.(j,k) to mean p~+2(j,k),
1
1
unless stated otherwise or if a superscript is needed for clarity.
Lemma 5.1
j * , which, for any given t, j, k;
There exists an integer
~
PI
P3
if and only if
~
j
j
*.
From their definitions,
Proof
PI
~
P3
i f and only i f
That is, i f and only i f
j
~
'A
j
N-t-2
0
,
~
(N-t-2h
2
02 = l-b
I-a
where
where
'Y
=
> 1.
2 In 0
In 'A
so J.* = [(N-t-2
2 h ] ,where [.] denotes integer part, is that integer. 0
Corollary 5.1
and for j
Proof
> j*,
Vt(j,k)
From the definitions of
~(j,k) ~
that
y3
= max
{ V;,
V~, ~
}.
and V4 , and Lemma 5.1, it follows
V;(j,k)
<=>
j
~
j*.
Hence the result.
This corollary simplifies proofs of the following results, since
for a given jo' a point (t,jo,k) cannot be 3-mode optimal for one
value of k and 4-mode optimal for another.
0
93
the value of j * when t
The value of
= 2.
incidentally, is always strictly between 0 and 1.
~,
= I,
the event that a + b
~
= 0.5
In
precisely.
The next result characterises the optimal stopping rule for when
t
= 2.
Proof
(a) Y2(j,k)
= Y21
if and only if
(b) Y2 (j,k)
=~
if
j
**
~ j
(c) Y (j,k)
2
= Y24
if
j
~ j
=k
and
Since we are considering the special case t
= 2,
Lemma. 5.2
will be dropped throughout this proof.
j
j
~
j **
and
**
subscripts
Establishing the lemma. is a
matter of verifying some simple inequalities among the expressions:
1
=a
+ b - dP ,
3
y3 = 2b
y2 = a
+ b - dP 2 ,
Y = 2b + 2dP3 .
Y
(a)
y1 ~
y3 <=>
1 - P
3
~
+ 2dp 1 '
4
2P 1
<=>
<=>
l\k ~ oN-2
P2 ~ PI
j
y 1 ~y2
<=>
P2 ~ P3
y1 ~ y4
<=>
1 - P
3
<=>
l\k ~ oN-2
~
<=> 3P 3 ~
<=> j _> J.** ,
2P 1 ~ 1 - P2
y3 ~
j
<=>
<=>
PI
j ~ j **
<=>
2P3
y3 ~ y2 <=>
y4
l\k ~ l\j
= k.
<=>
~
~ j** (from Lemma. 5.1).
if j
= k.
1
if j
Thus, (2,j,k) is I-mode optimal if and only i f
(b)
<=>
P3
j
= k.
>
_ J.**
<=>
j
and j
~ j**
= k.
94
j ~ j**
Thus, (2,j,k) is 3-mode optimal if
between j and k, then v3
= V1 ,
In the event of equality
and we will adopt the convention of
giving precedence to 1-mode, the continuation of Stage I.
c
()
J
J. <.**
I n t h e rema i n i ng i nstances,
integer, then again there is equality, V4
and
= v3,
V
V4
=.
If j ** is an
and the convention
o
will be to give precedence to 4-mode.
Thus, the array or "modes table" displaying optimal modes is
completely characterised at t
Modes tabLe, t
Table 5.1
j
k
=#
=#
= 2,
= 2.
(j,k) co-ordinates.
successes on leading treatment
successes on second treatment
N-2
k\j
0
1
j **-1
0
4
4
4
3
3
3
4
4
3
3
3
4
3
3
3
1
3
3
1
'**
j -1
j **
j **+1
j** j**+l
1
(k
~
""2
3.
3
j)
N-2
1
3
1
""2
For an example, refer to Appendix 3, which displays the modes
table at t
=2
when the horizon, N
As one would anticipate, when
maximal.
= 36 and (a,b) = (.6,.5).
t = 2, v2(j,k) is never uniquely
4
(It can at best equal v3(j,k) and V (j,k) in the case
e
95
j
= j **
and if j ** is an integer.)
This says. if Stage I has
continued until just two patients remain. there is no point in
starting Stage II for one pair.
Instead. it is optimal to put both
patients on either the current leader or on the so far unused
treatment. whichever has the higher expectation.
Notice that when t
= 1.
3-mode or 4-mode is optimal.
there are only two choices - either
Corollary 5.1 shows that this
distinction is independent of the value of k. so the modes table
consists simply of 4's in the upper left corner (an "upper triangle of
4·s". we shall say) and 3's elsewhere.
The next results are concerned with the patterns displayed in
modes tables for higher values of t.
These results were first
suggested by examining computer output giving the optimal mode for
each (t.j.k) and small horizons.
The remainder of this section is
most readily understood after referring to an example.
case. having parametric values N
= 100.
t
= 60
and (a.b)
A typical
= (.6 .. 4)
is
given in AppendiX 4.
First. some terminology: we shall say (t+2.j.k) has as its
"immediately attainable states". those states that can be attained
after sampling the next pair of patients.
Specifically. for k
< j.
these are the elements of the set
{(t.j.k). (t.j+l.k). (t.j.k+1). (t.j+l.k+1)}
and. for k = j. elements of
{(t.j.j). (t.j+1.j). (t.j+1.j+1)}.
Lemma 5.3
(a) If (t+2.j.k) is 4-mode optimal for given t. j and k.
then (t+2.j.k-1) is also 4-mode optimal. provided for both states. all
of their immediately attainable states are 4-mode optimal.
96
(b) If (t+2,j,k) is 4-mode optimal for given t, j and k, then
(t+2,j-l,k-l) is also 4-mode optimal, provided for both states, all of
their immediately attainable states are 4-mode optimal.
Note: The immediately attainable states referred to in (a) are, in
general, for
0
<k <
< N-t-2
2 ,as
j
follows:
(t,j,k), (t,j+l,k), (t,j,k+l), (t,j+l,k+l), (t,j,k-l), (t,j+l,k-l).
or, in particular, a certain subset of these states if any inequality
is replaced by equality.
In (b), similarly, there are seven
immediately attainable states, in general, that are reqUired to be
4-mode optimal.
Proof
The proof is stated in the general case having
o < k < j < N-t-2
2
only.
The modifications necessary for equal or
extreme values of k and j are minor, and so are omitted.
(a)
Observe that the 4-mode optimality of (t+2,j,k) implies, by
Corollary 5.1, that j
Next, consider
~
j * so (t+2,j,k-l) cannot be 3-mode optimal.
1
.
Vt +2 (J,k-l)
= a+b
t+2 .
- dP3 (J,k-l) + A Vt(j,k-l),
where A is a linear operator (given in Result 5.1) with coefficients
summing to one.
Each attainable state from (t+2,j,k-l) is, by
assumption, 4-mode optimal.
A Vt(j,k-l)
Therefore,
= Alt(b+dp~(j,k-l»
+ A2t(b+dp~(j,k»
+ ~t(b+dp~(j+l,k-l»
where
AI' A ,
2
~
+ A4t(b+dP~(j+l,k»,
and A are known functions (implicitly stated in
4
Result 5.1) of t, j and k such that their sum is one.
So, it follows that
In general,
A Vt(j,k-l)
= t[b
+ d A p~(j,k-l)].
~
97
-- t+2
-- t+2
-- t+2
t
[abp
(j.k-l) + baP
(j,k-l) + bbP
(j,k-l)] P (j,k-l)
2
3
3
1
=
- t+2
- t+2
- t+2
t
(j,k-l) + baP
(j,k-l) + bbP
(j,k-l)] p (j,k)
+ [abp
1
3
3
2
- t+2
- t+2
- t+2
t
(j,k-l) + baP
(j,k-l) + bbP
(j,k-l)] P (j+l,k-l)
+ [abp
1
3
3
2
t+2 .
t+2 .
t+2 .
t
(J,k-l) + bbP
(J,k-l)] p (j+l,k).
+ [abp 1 (J,k-l) + baP
3
3
2
The first term, on expansion, is:
abA
j
~ baAk - 1+ bboN- t - 2 pt(j,k-l) _ ab(A j + Ak - 1+ oN-t)
AJ + Ak - 1+ oN-t-2
3
-
oN-t
Aj + Ak - 1+ oN-t-2 Aj + Ak - 1+ oN-t
-- 2 t+2 .
= abo
P3 (J,k-l).
Similar reductions take place in the second, third and fourth terms,
so that these are, respectively,
- 2 t+2
abo P
(j,k-l),
3
- 2 t+2
abo P
(j,k-l), and
3
- 2 2 t+2 .
ab 0 P
(J,k-l).
3
I-b
(Note, this "cancelling of denominators" phenomenon occurs in earlier
simplifications seen in the posterior probability derivations in
Chapters II and III.)
This means that
t
A P (j,k-l)
3
= [b-2
2 t+2 .
+ 2bb + b ]P
(J,k-l)
3
.
= P3t+2 (J,k-l)
A Vt(j,k-l)
= t[b
+ dp;+2(j,k-l)]
V~+2(j,k-l)
= a+b
+ bt + (t-l)dp;+2(j,k-l).
Similarly,
A Vt(j,k)
= t[b
+ dp;+2(j,k)]
so
V~+2(j,k) = a+b + bt + (t-l)dP;+2(j,k).
and so
Now,
V;+2(j.k-l)
= (t+2)(b
and
2
p t + (J',k-l)
3
= Aj
+ dp;+2(j,k-l»
N- t - 2
J:
u
k-l
+ A
N-t-2
+ 0
> Pt+2(.J, k) .
3
98
V~+2(j.k-l)
Since
<
-
V~+2(j.k) = d(t-I)(p~+2(j.k-l)
4
4
Vt +2 (j.k-l) - Vt +2 (j.k)
= d(t+2)(P3t+2 (j.k-I)
-
p~+2(j,k»
t+2
- P3 (j,k».
it follows that
where the latter inequality is by an assumption.
Thus.
and hence. (t+2.j.k-l) cannot be I-mode optimal.
Now it only remains to show that it cannot be 2-mode optimal either.
To prove this. it is sufficient to demonstrate:
again. with the latter holding by assumption.
the desired inequality:
V;+2(j.k-l)
From this. we obtain
~ ~+2(j.k-I).
So. consider
and. from above.
4.
4
t+2 .
t+2 .
Vt +2 (J.k-l) - Vt +2 (j.k) = d(t+2)(P3 (J.k-I) - P3 (J.k».
We will need a tight upper bound for St+2(O). found by exploiting the
assumed 4-mode optimality of (t+2.j.k).
Since V;+2(j.k)
~ ~+2(j.k).
(t+2)(b + dP 3 (j.k»
and hence.
we know:
~ (I-P2 (j.k»dS t +2 (O) + ~t+2)(a+b - dP 2 (j,k»
99
(I- P2(j.k»dS t +2 (O)
~ ~t+2)[2P3(j.k)
= ~t+2)[P3(j·k)
Therefore.
St+2(O)
+ P2 (j.k) - 1]
- P1(j.k)].
(t+2)[p3 (j·k) - P1(j·k)]
~ ---~---....;;."..-­
2[P3 (j·k) + P1(j·k)]
and so
This implies that
2
2
Vt+2 (J. . k-l) - Vt+2 (J' ' k) ~ d(t+2)p (J·.k) P2 (j·k) - P2 (j·k-l)
3
1 - p
This is indeed less than or equal to
2
(j,k)
V;+2(j.k-l) - V;+2(j,k).
if it is true that (after cancelling d(t+2) from both sides):
That is. if and only if
but this is so. because the left and right-hand sides are, in fact,
equal for any j. k. N and t.
Both sides equal the amount
oN-t-2
(AJ + oN-t-2)
(A j + Ak - 1+ oN-t-2) (A j + Ak + oN-t-2)
Thus. (t+2.j.k-l) cannot be 2-mode optimal. and hence. by elimination.
(t+2,j.k-l) is 4-mode optimal as asserted.
(b)
An entirely similar proof holds for
appropriate places.
j-l
replacing
j
in
0
100
If (t+2,f,f) is 4-mode optimal for some integer
Corollary 5.2
t
~
1 and (t,j.k) is 4-mode optimal for each (j.k) such that
k
~
j
~
f + 1, then (t+2,j,k) is 4-mode optimal for each (j,k)
~
satisfying k
Proof
f.
j
~
f.
Follows immediately from repeated applications of Lemma 5.3.
0
This corollary essentially says if there exists an upper triangle
of 4's in a modes table at some value of t, then the existence of a
particular 4-mode optimal point along the diagonal at the next higher
value of t, ensures the continuation of the upper triangle of 4's.
The assumption about some (t+2,f,f) being 4-mode optimal on the
diagonal is not restrictive, for example at t
= 2,
we know f
= j ** .
Empirical evidence based on small horizon programs suggests the
assumption can be dropped altogether; see Assertion 5.4 to follow.
(It is included in the statement of Corollary 5.2 in order for Lemma
5.3 to hold.)
The result could be stated more strongly for
off-diagonal states, by separating (a) and (b) as in the lemma, but
for programming application purposes, it is most useful to combine
them into one statement concerning the diagonal.
The next lemma and corollary address 3-mode optimality.
Lemma 5.4
(a) If (t+2,j,k) is 3-mode optimal for given t, j and k.
then (t+2,j+1,k+1) is also 3-mode optimal, prOVided for both states,
all of their immediately attainable states are 3-mode optimal.
(b) If (t+2,j,k) is 3-mode optimal for given t, j and k, then
(t+2,j,k-1) is also 3-mode optimal, provided for both states, all of
their immediately attainable states are 3-mode optimal.
101
Note: The immediately attainable states referred to in (a) are, in
general, for
o < k < j < N-t-2
2 '
as follows:
(t,j,k), (t,j+l,k), (t,j,k+l), (t,j+l,k+l),
(t,j+2,k+l), (t,j+l,k+2), (t,j+2,k+2),
or subset of these if any inequality is replaced by equality.
In (b),
there are six immediately attainable states, in general, that need to
be 3-mode optimal.
Proof
(Again, for simplicity, the proof is stated in full for the
most general case,
(a)
o < k < j < N-t-2
2 ,only.)
First, note that the 3-mode optimality of (t+2,j,k) means j
> j*,
so that (t+2,j+l,k+l) cannot be 4-mode optimal.
Secondly, consider,
V~+2(j+l,k+l) = a+b - dp~+2(j+l,k+l) + A Vt(j+l.k+l).
The assumption concerning immediately attainable states implies that:
A Vt(j+l,k+l)
= t[b
+ d A p~(j+l,k+l)].
In general,
A
p~(j+l,k+l)
=
[abPl + baP2 + bbP 3 ] p~(j+l,k+l)
+ [abp
1
+ baP
2
+ bbP ] p~(j+l.k+2)
3
+ [abP l + ~P2 + bbP3 ] p~(j+2,k+l)
+ [abp 1 + baP2 + bbP3 ] p~(j+2,k+2),
where each Pi within [ ] abbreviates
p~+2(j+l,k+l) for i = 1. 2. 3.
The first term, on eXPansion, is:
Again, as in the proof of Lemma 5.3, similar reductions take place in
the second, third and fourth terms, so that these are, respectively,
102
- t+2
abp i (j+l.k+I).
- t+2
abp i (j+l.k+I). and
abp
t+2
i
(j+l.k+I).
This implies
t
A PI(j+l.k+l)
so
-= Cab
t+2
+ ab + ab + ab]P I (j+l.k+l)
I
Vt +2 (j+l.k+l)
= PIt+2 (j+l.k+l)
t+2
t+2
- dP 3 (j+l.k+l) + t[b + dP
(j+I,k+I)J.
I
I
t+2 .
t+2 .
Vt +2 (j.k) = a+b - dP 3 (J.k) + t[b + dP
(J.k)].
I
Also we know
= a+b
Hence their difference.
V~+2(j+l.k+l) - V~+2(j.k)
.
= d[p3t+2 (J.k)
t+2 .
t+2 .
t+2 .
- P3 (J+I.k+I)] + dt[P I (J+I.k+l) - PI (J,k)J.
Furthermore.
~+2(j+l.k+l) - ~+2(j.k) = d(t+2)[p~+2(j+l.k+l) - p~+2(j.k)] > 0
Aj +1
= Aj +l + Ak+l + oN-t-2
> PI
p~+2(j+l,k+l)
Ak+1
= Aj +l + Ak+l + oN-t-2
> P2t+2(.J. k)
p~+2(j+l.k+l)
= Aj +l +
p~+2(j+I.k+l)
since
t+2(.J. k) .
Note also that
and
aN- t - 2
Ak+ l + oN-t-2
< P3t+2(.J. k) .
Now.
~+2(j+l.k+l)
-
~+2(j.k)
-
V~+2(j+l.k+l)
+
V~+2(j.k)
t+2 .
t+2 .
t+2 .
t+2
(J+I.k+l) - PI (J.k)] - d[p3 (J.k) - P3 (j+l.k+I)J
= 2d[P I
=
d{2p~+2(j+l.k+l) + p~+2(j+l.k+l) - 2P~+2(j,k) _ p~+2(j,k)}
= d{P It+2 (j+l.k+l)
t+2 .
t+2 .
t+2
- P2 (J+l.k+l) - [PI (J.k) - P2 (j,k)J}.
This expression is non-negative provided we have
Aj + l _ Ak+1
Aj +1+ Ak+ l + oN-t-2
This holds. since j
Therefore.
~
~
Aj _ Ak
Aj + Ak+ oN-t-2
Aj + l _ Ak+1
= Aj +l + Ak+l + AaN- t - 2
k and the latter has the largest denominator.
e
103
~+2(j+1,k+1)
V~+2(j+1,k+1) ~ ~+2(j,k)
-
V~+2(j,k) ~
-
0
and so it follows that (t+2,j+l,k+l) cannot be I-mode optimal.
~+2(j+l,k+l) ~ ~+2(j+1,k+1),
Finally, to show that
so that 2-mode
cannot be optimal either, we will demonstrate that:
~+2(j+l,k+l) - ~+2(j,k) ~ ~+2(j+l,k+l)
-
~+2(j,k).
To see this, note that,
V~+2(j+l,k+l) - V~+2(j,k)
= d[P2(j,k)
- p2 (j+l,k+l)]{St+2(0) +
~t+2)}
~ d(t+2)p (j,k) [p2 (j,k) - p2 (j+l,k+l)]
1
1 - p (j,k)
2
using a similarly obtained upper bound to that used in the proof of
Lemma 5.3.
This proof is complete provided:
that is, if and only if
But this is so, because this says, on multiplying both sides by
(A j + 1+ Ak + 1+ oN-t-2)(A j + Ak + oN-t-2):
Aj + 1 (A j + oN-t-2) ~ Aj (A j + 1 + oN-t-2)
Aj+loN-t-2 ~ Aj oN- t - 2
or
which clearly holds, since A
(b)
> 1.
o
This is proven in a similar fashion.
Corollary 5.3
and t
both j
~
If
(t+2,f,~)
is 3-mode optimal for some integers
f.
~
I, and (t,j,k) is 3-mode optimal for each (j,k) satisfying
~
f and j-k
~
f
-~,
(j,k) in the same range.
then (t+2,j,k) is 3-mode optimal for each
104
Proof
This is a direct consequence of repeated applications of
Lemma 5.4.
o
Corollary 5.3 essentially says: if all states in a certain region
are 3-mode optimal at time t, then the 3-mode optimality of one
specific state at time t+2 preserves the whole 3-mode optimal region.
This region is the set of states that lie at least equidistant from
the diagonal and are above or to the right (in the modes table) of the
specific state mentioned.
Once again, empirical evidence suggests a much stronger result is
true; the condition requiring any state to be 3-mode optimal at time t
seems unnecessary.
This claim is among the next set of assertions
concerning patterns in modes tables.
As yet, these are unproven, but
not contradicted in computer program runs using a wide variety of
values for a and b, and moderate sized N.
Four Assertions
Assertion 5.1
(a) For any given t, j and k
< j,
if (t,j,k) is I-mode
optimal, then (t,j,k+l) is also I-mode optimal.
(b) For any given t and j
< N;t,
if (t,j,j) is I-mode optimal, then
(t,j+l,j+l) is also I-mode optimal.
Assertion 5.2
For any given t, j and k
> 0,
if (t,j,k) is 2-mode
optimal, then (t,j,k-l) is also 2-mode optimal.
Assertion 5.3
(a) For any given t, j and k
> 0,
optimal, then (t,j,k-l) is also 3-mode optimal.
if (t,j,k) is 3-mode
105
(b) For any given t, k and j
N-t
< ~'
if (t,j,k) is 3-mode optimal. then
(t,j+l,k+l) is also 3-mode optimal.
Assertion 5.4
(a) For any given t, j and k
> 0,
if (t.j,k) is 4-mode
optimal, then (t,j,k-l) is also 4-mode optimal.
(b) For any given t, j and k
> 0,
if (t,j,k) is 4-mode optimal, then
(t,j-l,k-l) is also 4-mode optimal.
Of course, each of these assertions, stated only for neighbouring
points within any modes table, can easily be applied recursively.
For example, i f a "2" occurs in any modes table, then the entire
column of entries above it (meaning same j, lower k values) consists
only of 2's.
The same, and more, can be said for 3's and for 4's.
Repeated applications of Assertions 5.3 and 5.4 demonstrate whole
regions are "unimodal".
Each assertion has a contra-positive version.
for instance, Assertion 5.1 suggests that a "non-I" anywhere in a
modes table implies an entire column of non-l's above it.
Programming Considerations
We shall now turn attention to the problem of computing the
optimal stopping rule and associated expected reward for given values
of a, b and N.
Since the prime motivation behind studying Case 3 is
to be able to make comparisons with Case 2, we are interested in
seeing how well it performs both for small and large N.
Large horizons need to be handled carefully to overcome the
difficulty of storing prohibitively large matrices.
Determining the
optimal stopping rule is a matter of keeping track of the values and
locations of the I-mode optimal points within the modes tables.
This
106
is because if we know a particular entry is 2. 3 or 4-mode optimal. we
can easily compute its corresponding value
from its (j.k) location alone.
(v2. y3
or y 4 respectively)
(We will explain later the behaviour
of the I-mode optimal states and the extent to which the unproven
assertions are relied upon.)
The dynamic equation in Yt is nested for
odd and even values of t by its construction (as mentioned before).
that is. for any (t.j.k) we have.
Therefore. there is no loss in considering just even horizons, so from
now on we will take N to be even.
As a first step toward large
horizon programs. it is helpful to re-parameterize so as to focus
attention on those states that are I-mode optimal.
We replace (t.j.k) by another. equivalent Markovian state
( t • f . i). where:
f
i
= number of failures on leading treatment at time
= difference in number of successes at time t = j
t
N-t
=~
-
- k
j
nO).
Thus. in terms of these parameters.
and
The posterior probabilities can easily be re-written in terms of f and
P~{f.i) =
i as f 0 11 ows :
t
2
and
where we define c
>0
1
-i
1 + X
f N-t'
+ Xc
X-i
p {f. i)
=
t
P3 {f.i)
Xf c N-t
=
-i
f N-t'
1 + X + XC
by:
c
f N-t
-i
1 + X + Xc
2
b
a
«
1).
Further. define Bt{f.i) to be the expected number of successes from
the state (t.f.i) by acting optimally thereafter.
Thus.
107
For reference. these are:
where. in general. for i
> O.
A Bt(f.i)
=
[abP I + baP 2 + bbP3 ] Bt(f+l.i) + [abp + biP + bbP ] B (f.i+1)
t
i
2
3
+ [abP I + baP2 + bbP3 ] Bt(f.i-l) + [abP I + baP2 + bbP ] Bt(f.i)
3
. h PI
WIt
(If i
= PIt+2(f . 1.).
= O. A Bt
etc.
is the same. except that Bt(f.i-l) has to be replaced
by Bt(f.l).)
B~+2 = d[l
Also.
and
-
p~+2(f.i)]St+2(O)
B~+2 = (t+2)(b
+
4
Bt +2
t+2
+ dP 3 (f.i».
= (t+2)(b
+
t+2
t+2.
(f.l».
~a+b-dp2
dP~+2(f.i»
The quantity of interest for comparison with Case 2 is now
~(O.O).
representing the expected number of successes before sampling
any patients from the horizon. N.
The effect of the re-parameterization to (f.i) co-ordinates is to
transform the modes table.
Instead of the triangle of 4's appearing
in the upper left corner. it now appears in the upper right corner.
and. more usefully. the diagonal l's are now in the first row.
example. at t
5.2.
= 2.
For
the new modes table is of the form given in Table
108
Table 5.2
Modes tab1.e, t
f
i
= # failures
= difference
(f,i) co-ordinates.
in # successes
N-2
0
1
f**-l
f**
f**+l
0
1
1
1
4
4
1
3
3
3
4
4
£**-1
3
3
3
4
4
f**
3
3
3
4
f**+l
3
3
3
T
4
4
3
T
f **
e
on leading treatment
i\f
N-2
where
= 2,
3
N-2
=T
-
j
**
Notice that the total number of B values in the table is
2
1 N-2
2T
2
x
N-2
1
This, it turns out, limits a program that calculates B ,
T'
3
4
B , B and B
to evaluate
for each (2,f,i) cell to N approximately 150.
~(O,O)
In order
for larger values of N, one has to reduce the
number of calculations performed without compromising accuracy.
We noted already that the key to determining the optimal stopping
rule is to search for the I-mode optimal states.
As it happens. these
l's enter the modes table for higher values of t in essentially two
distinct ways.
As t increases by 2, an entire row of l's may replace
the row of 3's immediately below an existing row of l's, or, a column
of l's may replace some, or all, of a column of 2's, 3's or
4'5
at a
109
boundary between two types (i.e. between 3's and 2's; 3's and 4's or
2's and 4's).
For an illustration of this latter effect, refer to the
modes tables for t
5(a).
= 68,70
when N
= 100,
(a,b)
= (.6,.4)
in Appendix
A new row of l's seems to remain for all higher values of t.
However, the columns of l's, acting as a "fence" between distinct
stopping modes, appear irregularly.
See Appendix 5(b), giving the
next higher value of t from Appendix 5(a).
For completion, the last
few values, as t approaches N, are displayed in Appendix 5(c).
The intuitive reason behind the temporary columns of l's is
unclear, except to say it is as if the mechanism for determining
whether to continue or not is sometimes unable to decide which of two
stopping modes to choose, when at a precise borderline, and instead.
"sits on the fence".
This phenomenon is genuine in that the I-mode
optimal values are strictly greater than their stopping mode
counterparts.
Neither is it a consequence of special choices of
parametric values, such as a and b summing to 1, causing
(N-t)~
to be
an integer.
Being pragmatic, this fence behaviour only minutely affects the
final value of
~(O,O).
So, instead of performing calculations for
the full range of i, from 0
N-2
to~,
it suffices to consider a
truncated range, from 0 to i ' say, where i
O
independent of N.
O
is a constant,
The smaller i ' the quicker computations are
O
performed, but obViously, it has to be large enough to ensure accuracy
of results.
This can be verified by checking agreement to a desired
number of decimal places with i +1 replacing i '
O
O
horizons, say up to 300, i
O
= 10;
For moderate
for larger horizons, i O
sufficient for acceptable accuracy.
= 20.
is
(The difference, d. between a and
b is also influential - the smaller d, the larger i O has to be.)
110
The second programming consideration to reduce requisite storage
space is an application of Assertion 5.4 (recall, a slightly stronger
statement than Corollary 5.2).
Storing B values beyond encountering
t
4-mode optimality in the first row (c.f. i = 0, or j = k in former
parametric terms) is unnecessary, on the grounds that all subsequent
values for higher f will also be 4-mode optimal, and therefore, of
known value.
of f, f
max
This means, for any given t, that there exists a value
say, depending on t, such that
for all f
The initial value of f
the sequence {f
highest f
max
} for t
max t
occurring at t
max
= 2,4, ... ,N.
from Lemma 5.2.
f
~
max
=2
is the largest of
We can easily find this
4
1
It occurs when B (f,O) exceeds B (f,O)
2
2
for the first time (i.e. lowest value of f), which, in turn, is when
P3 first exceeds Pl'
A little algebra shows that, at t
= 2,
if [oJ
denotes integer part, then
Note, for general t, f
max
equals [N;t(l_'Y}] approximately.
The preceding two arguments justify a considerable reduction in
the size of matrix to be stored to fmaxx i '
O
One further reduction is
necessary, though, to handle horizons above about 1,000 because f
still depends on N.
max
The last programming consideration exploits the
fact that for large horizons, values of Bt(f,i) tend to a limit for
small f as N increases.
This is a consequence of the follOWing
observation.
For a given (t,f,i), as N
~ 00,
111
and
P~(f.i)
-+
~l{i) =
p~(f. i)
-+
~2{i) =
t
P3(f. i)
-+
O.
1
+
A-i'
1
-i
A
-i
1 + A
For practical purposes. this means that when N is large, Bt{f.i) tends
to a limiting value as f decreases.
This limit only depends on i.
So. there exists a threshold value of f. f . say. such that
mIn
Bt{f . . 0) and Bt{f . -1.0) agree to q. perhaps 16. decimal places.
mIn
mIn
Hence. Bt{f.O) is effectively constant for all f
~
f
min
.
To evaluate this f . . we argue as follows:
mIn
Ip~{f.i) - ~l{i)1 ~ 10-q <=> Afc N- t ~ 10-q
(and this only occurs for f
~
f mIn
. (log A)
Now. "Y
f . ).
mIn
~
Taking logarithms to base 10.
N-t
---2 (log b/a) - q.
= log b/a
= InInb/a
log A '
A
so
f min
=
[N;t{ 1-"Y)] -
boi
A] - 1.
Thus. the values of Bt(f.i) are only "interesting" for f ranging
from f . to f
• since outside this range. values either attain a
mIn
max
limit (to q decimal places) or else. are 4-mode optimal. and so are
readily calculable.
Now f . and f
are both independent of i. but depend on t.
mIn
max
noted earlier. f
max
is maximal when t
= 2.
We have. therefore. proven
the next result. which ensures the programmability of Bt{f.i) for
arbitrarily large values of N.
As
This is quite remarkable, because
112
there is no a priori reason. it seems. for the extremes of interesting
(necessary) f-values for a given a and b. f
the form:
g(N-t) + constant
Lemma 5.5
For any N. a and b. at t
max
and f . . to both be of
mIn
for the identical function g(.).
f max- f min
= 2.
= [loi
the range
A] + 2.
and hence is independent of N.
Note. for t
> 2.
the result still holds except the range may be
slightly larger than the one quoted. but only by the addition of a
small constant.
This is due to f
max
at higher values of t not always
exactly coinciding with the value of f for which P first exceeds Pl·
3
This means that
~(O.O)
can be computed for any N using only
matrices of constant dimensions.
(f max- f mIn
. ) x i ' and consequently.
O
in linear time (Le. a program for N
as for N
= 1,000) .
By evaluating
= 2.000
boi
easily see that if the difference. d
A]
takes about twice as long
for various A and q. one can
= a-b.
is larger than 0.2. for
instance. fewer than 50 f-values are interesting.
To detect a
treatment difference as small as 0.05. and working to as many as 8
decimal places. fewer than 100 f-values need be evaluated (at each
even t less than or equal to N).
The final result enables multiple inferences to be made from a
single program run in a special case: a + b
= 1.
As mentioned in
Chapter IV. there are reasons to believe that this is also. in some
sense. the most interesting case. among all cases with a fixed
difference. a-b.
For example. Theorem 4 of Bather and Simons (1985)
says. in relation to error probabilities. that the upper bound for
113
the maximum risk in their two-treatment case occurs along
= 1.
a + b
(The "error probabi Ii ty" is the probabiIi ty that patients beyond the
testing stage are mistakenly assigned the inferior treatment.)
Further evidence is found in Bather (1985), Simons (1986) and in
Kulkarni and Kulkarni (1987), who suggest among certain symmetrical
procedures, the largest expected number of observations occurs, once
again, when a + b
= 1.
We introduce superscript notation on
B~N), to designate the
reward with t patients remaining from the initial horizon. N.
Lemma 5.6
If a + b
= 1,
then for any given N, t, f and i,
In this special case, A
2
= (alb).
Further, PI' P2 and P3
depend on N only through Afe N- t , which is e N- t - 4f
(Recall: e
2
b
= -.)
a
Consequently, with t fixed, increasing N by 4 is exactly compensated
for by increasing f by 1.
Since the posterior probabilities dictate
1
4
the values of B to B , (as seen in earlier proofs, before changing
o
parameters) the result follows.
Corollary 5.4
If a + b
= 1,
then for any given N and f,
~N)(O,O)
Proof
i
=0
= ~N+4f)(f,O).
This follows immediately from Lemma 5.6 above, upon setting
and t
= N.
o
114
This means that the reward for any particular horizon can be
inferred from intermediate results from a larger horizon, in the
special case a + b
= 1.
Numerical Results
We have described the theoretical background for obtaining
numerical results for optimal stopping policies and rewards in Case 3
by computer for arbitrarily large horizons.
We shall present some
results of program runs here to see how well two serial trials
perform.
extent.
The program depended on unproven assertions to a minimal
The contra-positive of Assertion 4.1 and Assertion 4.4 were
the two results relied upon for large horizon output.
Direct comparison with Case 2 will be deferred until the next and
final chapter.
(1) Large Horizons
The first observation is that Case 3 seems to preserve the
property of asymptotic optimality, in the sense that the regret tends
to zero as N tends to infinity.
B10 ,ooo(O,O)
For example, when (a,b)
= 5976.69,
or
= (.6 .. 4).
59.77%.
Define the "regret percent" to be the difference between the expected
percentage reward under the policy and the percentage of successes
expected if we knew ahead of time (and applied throughout) the
superior treatment.
Then, here, the regret percent is just 0.23%.
= (.75,.25),
B10 ,ooo(O,O) = 7488.86%,
Another example, when (a,b)
so the regret percent is only 0.11%.
has
or
74.89%,
115
(2) Rewards
The optimal rule still performs well for smaller horizons also.
The corresponding Case 2 values to Table 5.3 below have already been
given in Table 4.2.
(These two tables will be discussed later. in
Chapter VI.)
Table 5.3
Case 3 rewards for (a.b)
N
~(O.O)
= (.6 .. 4).
Expected %
various N.
Regret %
12
5.88
49.01%
10.99%
24
12.03
50.13%
9.87%
48
24.80
51.66%
8.34%
72
37.91
52.65%
7.35%
100
53.55
53.55%
6.45%
200
110.84
55.42%
4.58%
300
169.22
56.41%
3.59%
1000
584.72
58.47%
1.53%
(3) Robustness and Minimax Policy
We will say the "policy vector" summarising the optimal stopping
rule. given by
for i
= O.
(1 , II' 1 .... )
0
2
is the sequence of minimal t-values
1. 2. etc. for which it is optimal to terminate Stage I
(continuation or I-mode) and to begin the appropriate stopping mode
(2. 3 or 4).
Recall that i denotes the difference between successes
accumulated by the two treatments.
When (a.b)
= (.6 .. 5).
so that d
has as its first few entries:
= 0.1
(or 10%). the policy vector
(2. 16. 44. 90. 158. 256 ... ).
Thus. supposing there are t patients remaining in this case and the
116
current difference in successes accumulated is 4. then it is optimal
to continue Stage I so long as t is at least 158.
For any fixed horizon. the last values of a policy vector given
by the program are influenced by N. but only for a small range of
t-values close to N.
All policy vectors quoted are unaffected by the
particular choice of N used to generate them. so they are in this
sense independent of the horizon.
(That is. in all examples. N is
large enough to ensure no horizon-dependent end effects are present.)
Another example of a policy vector. with d
(a. b)
= (.6 • .45).
begins:
= 15%.
and
(2. 16. 46. 100. 190 .... ).
Table 5.4 shows the early entries in policy vectors for various
a + b = 1.
(a.b) subject to
Empirical evidence suggests (based on
more than just the two examples already quoted) that these minimal
t-values are themselves minimal amongst all choices of (a.b) sharing
the same difference. d. as was observed in Case 2.
This means that
given a specific d. the entries in the policy vector are bounded below
by the a + b
Table 5.4
=1
situation.
PoLicy vectors. various (a.b) with a + b
= 1.
d
(a.b)
50%
(.75 •. 25)
2
24
190
1652
30%
(.65 .. 35)
2
18
64
208
**
*
20%
(.6 • .4)
2
16
50
118
258
*
10%
(.55 .. 45)
2
16
44
88
156
256
388
578
850
*
5% (.525 . .475)
2
14
42
84
140
214
306
418
554
716
908
*
Note: All values are constrained to be even.
*
denotes the first entry exceeding 1.000;
**
exceeding 10,000.
117
So the choice of (a.b) is critical. but once again it turns out
that the difference between a and b is the most influential.
For a
specified d. the policy does behave robustly - neither very much more
nor less so than Case 2.
It is interesting to note the similarity between the last
tabulated policy vector and the sequence of numbers Simons found (some
of which are given in Chapter I) in his two-treatment case. as (a.b)
tended to (.5 .. 5). defining an outer envelope for universal optimal
continuation points.
The minimax policy for Case 3, then. appears to
coincide with the two-treatment case.
Additional evidence for this
assertion is found in noting that the first tabulated policy vector,
having (a,b)
= (.75 .. 25)
has two-treatment case analogue:
(2. 23. 190. 1652).
This suggests that the minimax stopping rule for Stage I of
Case 3 behaves as though it is in an "a vs. b" situation (as opposed
to the alternative "b vs. b") and that there are only two treatments
present.
This is not too surprising. since we are in the restricted
setting of the worst two of the three treatments being of equal
success probability. and. one would expect a vs. b to decide on a
winner before b vs. b.
Hence. realising the finality of the selection
of treatment to be rejected. the minimax policy expects to stop no
sooner and no later than the a vs. b situation.
Case 2. general comments without this "b
= c"
Of course. just as in
restriction cannot be
made. except to say we still believe this constraint gives rise to the
"longest-lasting" trials.
Also as in Case 2. if one knew the success probability reliably
for one of the treatments. then a corresponding minimax policy vector
could be calculated taking this extra information into account.
This
118
vector would have entries at least as great as if we had no prior
lmowledge of the success probabilities and we used the "worst-case"
choice of a and b summing to one.
To be most conservative. one could
use the limiting values of the policy vector found by letting (a.b)
tend to (.5 .. 5) in order to make a rule. independent of the prior.
suitable for putting into practice.
CHAPTER VI
<X>MPARISONS AND <X>NCLUSIONS
Introduction
In the Opening Remarks of Chapter I. we brought up some questions
fundamental to clinical trials methodology and three associated goals
or principles.
In closing. we shall make comparisons and draw
conclusions. bearing in mind each of these perspectives: theoretical.
ethical and practical.
We begin with quantitative comparisons. in
terms of both the optimal reward and the policy. and later. consider
qualitative comparisons.
Cases 2 and 3.
For the most part. we will concentrate on
This is because Case 1. from three treatments to one.
recall. was explored mostly to develop the theory and proved to be
somewhat unrealistic. particularly with regard to the artificial
constraint of requiring a simultaneous decision on the two worstappearing treatments.
Also. any differences between Cases 2 and 3
will be helpful in gaining insight into the question of allocating two
or more than two treatments at a time. in the presence of multiple
treatments.
In the discussion following the Case 2 and Case 3 comparisons. we
address briefly several topics pertinent to the model we have
developed. including: the intended applications. a comparison with
fixed sample size methods and suggestions for further research.
120
Quantitative Comparisons
By comparing Tables 4.2 and 5.3. one can see that the rewards for
the case (a.b)
= (.6 •. 4)
and various N are quite similar.
Both seem
asymptotically optimal and perform entirely satisfactorily (in terms
of the expected number of successes) for a wide range of horizons, and
there is little practical difference between their numerical answers,
except to say Case 2 seems always to be marginally superior.
This
would be expected. mainly because after Stage I is over in Case 3
(only). the two remaining treatments start afresh. as if in a new
clinical trial. thereby "wasting" some valuable information about the
relative efficacy of the treatments.
Table 6.1 shows the rewards. corresponding expected percentages
("Exp.%") of successfully treated patients and the regret percent
("Regret%"). defined in Chapter V. for various (a.b) and a fixed
horizon. N
= 300.
for both Case 2 and Case 3.
(We will return to
discuss the fixed sample size column (F.S.S.) shortly.)
Table 6.1
d
Comparison of rewards among cases. various (a.b), N
(a.b)
CASE
2
CASE
SMAX(O.O) Exp.% Regret%
20%
3
B(O.O)
Exp.% Regret%
= 300.
F.S.S.
Regret%
(.6 •. 4)
169.61
56.54
3.46
169.22
56.41
3.59
13.33
15% (.6 .. 45)
168.99
56.33
3.67
168.71
56.24
3.76
10.00
10%
(.6 •. 5)
169.39
56.46
3.54
169.24
56.41
3.59
6.67
5% (.6 •. 55)
172.46
57.49
2.51
172.42
57.47
2.53
3.33
15% (.55 • .4)
153.99
51.33
3.67
153.70
51.23
3.77
10.00
10%
(.5 •. 4)
139.38
46.46
3.54
139.23
46.41
3.59
6.67
5% (.45 •. 4)
127.46
42.49
2.51
127.42
42.47
2.53
3.33
121
So, the rewards in each case behave very similarly across a
variety of choices of (a, b).
Once again, Case 2 performs consistently
slightly better than Case 3, in spite of the latter's construction to
be about the strongest possible contender to Case 2, under the
identical restrictions.
This observation is based on reading the
percent columns, but, when translated back into expected numbers of
successes, the two reward columns indicate this effect is quite
insignificant.
It comes as no surprise for Case 2 to eclipse Case 3
as (a,b) varies, for the reason already stated in the context of
changing N.
It is interesting to note how nearly identical the regret
percents are within each case, for the same value of d.
This provides
further evidence for the robustness of the optimal policies in the
difference between a and b. (It should be mentioned, however. that
this effect is exaggerated due to the value of A also being the same
for two of the tabulated pairs having a common d.)
Next, we will compare the optimal policies for Cases 2 and 3 in a
We have seen that the constraint a + b
typical example.
=1
apparently gives rise to the largest optimal continuation regions in
both cases, and hence to the minimax-type policies for a specified d.
For this reason. together with a realistic choice of d. the optimal
policies are given below for (a,b) equalling (.525,.475).
The policy
table for Case 2 has been adapted to plot the minimal t-values, not
for j and k, but for j + k and k.
This is to facilitate a direct
comparison with I. values, since both
1
j + k
and
i
represent the
score difference between the current winning and losing treatments in
their respective cases.
122
Table 6.2
Minimal. t-val.ues to begin Stage II. (a.b)
Case 3
Case 2
1
0
II
1
2
1
3
1
4
1
5
2
14
42
84
140
214
k\k+j
0
1
2
3
4
5
0
6
24
54
99
159
237
45
84
135
201
285
126
186
258
+
246
+
+
1
2
3
= (.525 .. 475).
(where + denotes above 300)
The minimal t-values for Case 3 are uniformly lower than their
counterparts in Case 2.
This means that Stage I lasts longer (samples
more patients in the first testing stage) on average for Case 3. but
this alone does not necessarily make it better or worse.
(We are
concerned with the expected number of successes throughout the entire
trial. not just after Stage I.)
A possible explanation for this is
that. in Case 3. there is a one-third probability of Stage I being
conducted between two equally successful treatments (i.e. the b vs. b
situation).
Intuitively. it is not obvious whether this is any worse
than starting the trial with an a vs. b situation.
It seems. then. that there is not much difference in the
performance of the optimal stopping rules in Case 2 and Case 3. from a
theoretical viewpoint.
(i)
To summarise:
both yield high expected numbers of successes for large and small
values of N;
(ii) both are fairly robust in (a.b). with the difference being far
more influential than the individual values; and
e
123
(iii) both can be made into a minimax-type rule that does not depend
on the choice of (a. b).
If anything. Case 2 performs marginally better than Case 3. but
clearly other considerations besides theoretical ones need to be taken
into account to distinguish between them.
Qualitative Comparisons
Here we will address first ethical and then practical differences
between the cases.
ethical reasons.
Note that Case 1 is not a satisfactory model for
It is difficult to justify a clinical trial which
continues sampling by triplets when the top two treatments are doing
equally well. while the loser is performing relatively very poorly.
This. however. can occur since the rule is constrained to make a
single switch to a unique treatment.
Even if such an occurrence is
unlikely. one cannot ethically support the implementation of Case 1.
Case 3 involves shelving a potentially good treatment for an
undetermined time span. if indeed there are three treatments waiting
to be tested in a trial. as we are assuming.
ethical difficulty.
This too raises an
Case 2. on the other hand. has neither of these
problems and. by its design. meets high ethical standards.
Further
comments will be given in the discussion to follow.
From a practical perspective. both Case 2 and Case 3 have easily
implemented. simple stopping rules. each requiring just two technical
decisions.
However. Case 3 does incorporate the two impractical
options. discussed in Chapter V. and so. Case 2 is again preferable.
Hence. it is for qualitative more so than quantitative reasons
that Case 2 is recommended as a suitable model for certain clinical
trials involving three treatments.
124
Discussion
These aforementioned clinical trials are for those involving
treatments seeking to combat very serious. perhaps life-threatening.
diseases (so that "failure" is death). and for which treatment
response occurs soon after it is administered (relative to the rate of
patient accrual).
Also. in the case of serious diseases. one is less
concerned with secondary effects. which are not included in the model
we have studied.
Though there is an argument for applying decision-
theoretic designs to Phase III trials. Case 2 provides a model best
suited to the smaller-scale Phase II trials.
Recall. these are the
preliminary. comparative trials having as primary goal the
identification of an improvement over existing therapy.
Decision-theoretic approaches. by design. are based upon the
pragmatic need for a conclusion to be reached about relative
efficacies of competing treatments.
Because this also means that
testing stages of such trials tend to involve small numbers of
patients (both in absolute terms and relative to the traditional fixed
sample size designs). Case 2 is not suitable for estimation purposes.
However. as mentioned in Chapter I. it is difficult for one trial to
attempt to answer more than one major question:
(1) Is the new treatment better than the oLd? or
(2) How successfuL is the new treatment?
Case 2 does not seek to answer (2). but is set up to answer (1). the
principal motivation behind Phase II trials.
In short. Case 2 is most
applicable to serious illness. Phase II trials.
A natural question to ask of decision-theoretic models is about
the choice of N.
In theory. it is that large number of all patients
125
who actually would receive the indicated treatment once the testing
stage of the trial is complete.
In practice. this number is unknown.
and furthermore. the inherent assumption that one trial has such a
high imPact is clearly false (but see comments in Chapter I).
Anscombe (1963) and others have written at length on the role played
by N. noting especially the unimportance of its precise specification.
For Phase III applications. N would be very large, but for Phase
II. one might choose N to be moderate. say 1.000. and only expect 50
or so patients to actually partake in the testing stages of Case 2.
The majority of trials are conducted in practice on about this many
patients anyway. and there is no need under any sequential model to
know the exact number of patients coming forward with a given illness.
That is. it is quite practical to enter a small and random number of
patients into a trial.
If ethical considerations make (1) above a more important
question than (2). then fixed sample size methods become hard to
justify.
This was mentioned earlier in the motivation part of Chapter
I. and is now supported numerically.
Referring to the table at the
start of this chapter. one can see that with the smallest of treatment
differences. the "F.S.S." regret percent is only slightly worse than
that of Case 2.
However. as the difference between a and b widens,
the fixed sample size regret grows considerably faster.
This means
that if there does turn out to be a single treatment. amongst the
three. that is clearly best. then there is a significant risk to the
trial particiPants.
Suppose we are conducting hypothetical trials in which death is
the endpoint by the two different approaches.
If N is 1.000 and there
is a difference of 5% between the best and worst treatment. then about
126
14 lives are at stake.
For illustration purposes. let us take only a
slightly more extreme example.
Suppose (a.b)
= (.6 .. 4)
and N
= 3.000.
The reward expected would be 1.782.55 for Case 2. some 380 more than
if 1.000 were allocated to each treatment.
Of course. in all
likelihood. this latter trial would be called off before completion by
its ethical committee.
To avoid getting into ethical dilemmas in the
first place. Case 2. by contrast. is designed to terminate its testing
stage precisely when sufficient evidence has accumulated to indicate
the superior treatment.
Pocock (1984) discusses "collective" and "individual" ethics.
arguing that each clinical trial requires a balance between the two.
(The former is concerned with ensuring the best is done for future
patients; the latter for current patients.)
Case 2 places a high
premium on individual ethics (only a minimal. small percentage of
in-trial patients are expected to end up any worse off than if the
truly superior treatment were allocated to all). and yet future
patients are given equal weight.
Of course. ethical arguments become
less serious with disease severity but. conversely. for serious
illnesses they are foremost.
There are several avenues that can be pursued for further
research.
Assertions based on empirical evidence need to be proven
rigourously.
Assertion 4.6(ii) seems especially interesting and
perhaps its proof will require quite different techniques to those
employed in this paper.
The natural extension to four or more
treatments is probably not worthwhile. partly because the degree of
complexity is too severe.
The analogous dynamic equation can be
written down. but involves a recurrence relation that is "three deep"
when there are four treatments.
(Case 2 has one recurrence embedded
127
in another and hence is "two deep".)
Also. as noted in Chapter I.
very few trials in practice. are set up to consider more than three
treatments.
To deflate objections to the choice of N. it could be
made random. most readily with a geometric distribution.
This
generalisation has already been performed by Simons (1986b) in the
two-treatment setting.
Closing Remarks
To summarise. there are three main conclusions.
1.
Case 2 provides a simple. well-performing. ethical. pragmatic and
potentially useful model for a clinical trial involving three
treatments with dichotomous responses.
It is most applicable to rapid
response. serious illness. Phase II trials.
2.
In the presence of multiple treatments to be tested. it is
preferable to begin sampling patients in triplets. rather than perform
two. serial pairwise trials. but more so for qualitative than
quantitative reasons.
3.
This study does raise serious questions about the conduct of
trials using fixed sample size methodology when ethical considerations
indicate the need for pragmatism.
APPENDIX 1
Proof of Theorem 3.2
We shall follow the steps as in the proof of Theorem 3.1.
Step 1:
For convenience. define. for i
= 1.
2
(Thus. the Radon-Nikodym derivative is just the ratio of E~ to E~.)
With R~(X.y) denoting the expected advantage of immediate switching
when in state (t.x.y) instead of continuing with pairs for all
Now. if G replaces G.• then the numerator is zero. vanishing because
1
and hence
-i
Step 2:
This can be re-written in terms of Rt(x,y)'s defined
analogously as before:
for i
= I,
2
so that
Step 3:
Defining
-i
St(x,y)
i
i
= St(x,y)Et(x,y)
for i
= I,
2 we now
establish that the S~(x'Y)'s satisfy identical recurrence relations
save for a multiplicative constant.
To see this, we need to know the transition probabilities for the four
possible results after sampling another pair of patients from state
(t+2,x,y).
Table A.1
Result
These are summarised in the table below (using notation
Transition states and probabilities.
New State
Probability (under prior G.l
1
(0,0)
(t ,x,y)
i
i
II v1v2dGi(~l'~21 t+2,x,y) = E t (x,y)lE t +2 (x,y)
(1,0)
( t,x+l,y)
i
i
II ~lv2dGi(~l'~21 t+2,x,y) = Et (x+1,y)lE t +2 (x,y)
(0,1)
(t,x,y+1)
i
i
II v1~2dGi(~l'~21 t+2,x,y) = Et (x,y+1)lE t +2 (x,y)
(1, 1)
( t , x+ 1, y+1)
i
i
II ~1~2dGi(~1'~21 t+2,x,y) = E t (x+1,y+1)lE t +2 (x,y)
To verify the third entry, for instance, note that
=
=
where, as usual, n is the number of pairs sampled when t patients are
remaining from the original horizon N.
Thus we can write, for
i
= 1,
2
and
t
0,
~
i i i
i i i
St+2(x,y) = max {R t +2 (x,y) , [Et(x,y)St(x,y) + E t (x+l,y)St(x+l,y) +
i i i
i
i
E t (x,y+l)St(x,y+l) + E t (x+l,y+l)St(x+l,y+l)] / E + (x,y)}
t 2
and for
s
= 0,
1
If v denotes the linear operator such that:
v Z(x,y)
then we have for t
= Z(x,y)
~
+ Z(x+l,y) + Z(x,y+l) +Z(x+l,y+l)
0, both
-1
-1
-1
St+2(x,y) = max { Rt +2 (x,y) , v St(x,y) }
-2
-2
-2
St+2(x,y) = max { Rt +2 (x,y) , v St(x,y) }
and
with initial conditions
Si
s
=Rsi
for s
= 0,
1 and i
= 1,
2.
By the relation given in Step 2, it follows that
Step 4:
Hence
Step 5:
Lastly, Steps 1 and 4 combine (on subtraction) to give the
stated result.
o
APPENDIX 2
5 REM THIS IS "DANIEL". IT PRINTS OUT THE MINIMAL T-VALUES FOR
      WHICH IT IS OPTIMAL TO CONTINUE SAMPLING BY TRIPLETS (STAGE I,
      CASE 2) FOR ANY GIVEN HORIZON (MULTIPLE OF 3) AND PAIR (A,B).
10 REM J = SCORE DIFF. BETW. "BEST" AND "MEDIAN" TREATMENT ;
      K = SCORE DIFF. BETW. "MEDIAN" AND "WORST" TREATMENT.
15 REM "DANIEL" CAN HANDLE LARGE HORIZONS H, SO LONG AS THE
      CRITICAL VALUE OF Y (YC) IS LESS THAN THE DIMENSION OF THE
      S-VECTORS IN 55. NOTE THAT GG HAS TO BE SET LARGER THAN THE
      MAXIMAL J1 VALUE WHERE (H,J1,0) IS A CONTINUATION POINT.
20 REM TO ENSURE YC < 500, NEED A-B > .02 . HERE, DIM(S)=500; GG=35.
25 DEFDBL A-F,L-S,U-X,Z,I
30 INPUT "HORIZON, N =";H
35 IF INT(H/3)<>H/3 THEN INPUT "ENTER MULTIPLE OF 3, H =";H
40 INPUT "A, B =";A,B
45 GG=35
50 HH=H/3 : HHH=GG+INT((H+1)/2)
55 DIM P(3),Q(6),SX(35,35),TMIN(35,35),SEO(500),SEN(500),SOO(500),
      SON(500),SO(500)
60 AA=1#-A : BB=1#-B : C=A-B
65 IF C<.02# THEN PRINT " BEWARE: CHECK YC VALUE"
70 LPRINT "HORIZON =";H;TAB(24);"A =";A;"  B =";B
75 L=A*BB/B/AA : BE=(A*B*BB*AA)^.5#
80 E=A*B*BB : F=A*BB*BB : EE=AA*B*B : FF=AA*B*BB
85 QC=A*B*B + AA*BB*BB : V=A*B + AA*BB : VV=1#-V
90 DEF FNR(Y) = .5#*(L^(Y/2#) - L^(-Y/2#))/(L^(Y/2#) + L^(-Y/2#))
95 DEF FNCH(Y) = .5#*(L^(Y/2#) + 1#/(L^(Y/2#)))
100 FOR Y=0 TO 500 STEP 10 : PY=Y
105 IF ABS(.5# - FNR(PY)) + ABS(L^.5# - FNCH(PY+1#)/FNCH(PY))
      < .000000000000001# THEN GOTO 115
110 NEXT Y
115 YC=Y : PRINT "YC =";YC
120 PRINT TAB(1);FNR(PY),FNCH(PY-1#)/FNCH(PY),FNCH(PY+1#)/FNCH(PY)
125 IF YC>499 THEN GOTO 750
130 FOR J=0 TO 500 : PJ=J
135 SEO(J)=0# : SEN(J)=0# : SOO(J)=0# : SON(J)=0# : SO(J)=0#
140 NEXT J
145 FOR J=0 TO GG : PJ=J
150 FOR K=0 TO GG : PK=K
155 SX(J,K)=0# : TMIN(J,K)=60001!
160 NEXT K : NEXT J
165 REM SO FAR WE'VE INITIALISED
170 REM NEXT WE FIND S1 AND S3
175 FOR Y=0 TO HHH : PY=Y
180 IF Y=YC THEN GOTO 195
185 SOO(Y)=FNR(PY)
190 NEXT Y : GOTO 200
195 SOO(YC) = .5#
200 SON(0) = V*SOO(0) + VV*SOO(1)
205 FOR Y=1 TO HHH-1 : PY=Y
210 IF Y=YC THEN GOTO 230
215 SON(Y) = (BE*FNCH(PY-1#)*SOO(Y-1) + V*FNCH(PY)*SOO(Y) +
      BE*FNCH(PY+1#)*SOO(Y+1))/FNCH(PY)
220 IF SON(Y) < 3#*FNR(PY) THEN SON(Y)=3#*FNR(PY)
225 NEXT Y : GOTO 235
230 SON(YC) = 1.5#
235 FOR Y=0 TO HHH-1
240 IF Y=YC THEN GOTO 255
245 SOO(Y)=SON(Y) : SO(Y)=SON(Y)
250 NEXT Y : GOTO 260
255 SOO(YC)=SON(YC) : SO(YC)=SON(YC)
260 T=3 : PT=T
265 J1=0
270 REM NEXT IS THE SMAX MODULE
275 IN=0
280 FOR J=0 TO J1+1 : PJ=J
285 FOR K=0 TO J1+1-J : PK=K
290 IJ=1# : IK=1# : JJ=J-1 : KK=K-1
295 IF J=0 THEN IJ=0#
300 IF J=0 THEN JJ=0
305 IF K=0 THEN IK=0#
310 IF K=0 THEN KK=0
315 PI=PJ+PK : D=L^PI + L^PK + 1#
320 P(1)=(L^PI)/D : P(2)=(L^PK)/D : P(3)=1#/D
325 X = (1#-P(3))*C*SO(J) + PT*(A+B+C*P(3))*.5#
330 Q(1)=F*P(1) + FF*(1#-P(1)) : Q(4)=E*(1#-P(1)) + EE*P(1)
335 Q(2)=F*P(2) + FF*(1#-P(2)) : Q(5)=E*(1#-P(2)) + EE*P(2)
340 Q(3)=F*P(3) + FF*(1#-P(3)) : Q(6)=E*(1#-P(3)) + EE*P(3)
345 X = X + SX(J+1,K)*(Q(1)+(1#-IJ)*(Q(2)+Q(3)*(1#-IK)))
350 X = X + SX(J,K+1)*(Q(2)+Q(3)*(1#-IK)) +
      SX(J+1,K+1)*(Q(4)+(1#-IK)*(Q(5)+Q(6)*(1#-IJ)))
355 XX = X + A + 2#*B
360 IF X>XX THEN GOTO 395
365 IF K>0 THEN GOTO 385
370 IF J<J1 THEN GOTO 385
375 IN=1
380 X=XX
385 PRINT TAB(1);"SMAX(";T;",";J;",";K;") =";XX;TAB(50);"3"
390 IF T < TMIN(J,K) THEN TMIN(J,K) = T
395 SX(GG-K,GG-J) = X
400 NEXT K : NEXT J
405 IF T=H THEN GOTO 635
410 FOR J=0 TO J1+1
415 FOR K=0 TO J1+1-J
420 SX(J,K) = SX(GG-K,GG-J)
425 NEXT K : NEXT J
430 T=T+3 : PRINT T;
435 IF IN=1 THEN J1=J1+1
440 IF INT(T/2)<>T/2 THEN GOTO 545
445 REM IF T EVEN GOTO 455
450 REM NEXT COMES THE NEXT THREE EVEN S's
455 T=T-6
460 FOR G=0 TO 2
465 T=T+2 : PT=T
470 SEN(0) = V*SEO(0) + VV*SEO(1)
475 FOR Y=1 TO HHH-T/2 : PY=Y
480 IF Y=YC THEN GOTO 500
485 SEN(Y) = (BE*FNCH(PY-1#)*SEO(Y-1) + V*FNCH(PY)*SEO(Y) +
      BE*FNCH(PY+1#)*SEO(Y+1))/FNCH(PY)
490 IF SEN(Y) < PT*FNR(PY) THEN SEN(Y) = PT*FNR(PY)
495 NEXT Y : GOTO 505
500 SEN(YC) = .5#*PT
505 FOR Y=0 TO HHH-T/2
510 IF Y=YC THEN GOTO 525
515 SEO(Y) = SEN(Y) : SO(Y) = SEN(Y)
520 NEXT Y : GOTO 530
525 SEO(YC)=SEN(YC) : SO(YC)=SEN(YC)
530 NEXT G
535 GOTO 275
540 REM NEXT COMES THE NEXT THREE ODD S's
545 T=T-6
550 FOR G=0 TO 2
555 T=T+2 : PT=T
560 SON(0) = V*SOO(0) + VV*SOO(1)
565 FOR Y=1 TO HHH-T/2+.5 : PY=Y
570 IF Y=YC THEN GOTO 590
575 SON(Y) = (BE*FNCH(PY-1#)*SOO(Y-1) + V*FNCH(PY)*SOO(Y) +
      BE*FNCH(PY+1#)*SOO(Y+1))/FNCH(PY)
580 IF SON(Y) < PT*FNR(PY) THEN SON(Y) = PT*FNR(PY)
585 NEXT Y : GOTO 595
590 SON(YC) = .5#*PT
595 FOR Y=0 TO HHH-T/2+.5
600 IF Y=YC THEN GOTO 615
605 SOO(Y) = SON(Y) : SO(Y) = SON(Y)
610 NEXT Y : GOTO 620
615 SOO(YC)=SON(YC) : SO(YC)=SON(YC)
620 NEXT G
625 GOTO 275
630 REM NEXT WE PRINT A TABLE OF MINIMUM T VALUES
635 IF J1 > 10 THEN GOTO 720
640 LPRINT : LPRINT "  K \ J";
645 FOR J=0 TO J1
650 LPRINT TAB(8+6*J);J;
655 NEXT J
660 FOR K=0 TO J1 : LPRINT : LPRINT K;
665 FOR J=0 TO J1
670 IF TMIN(J,K) > H THEN TMIN(J,K) = -1
675 LPRINT TAB(8+6*J);TMIN(J,K);
680 NEXT J : NEXT K : LPRINT : LPRINT : LPRINT
685 REM SOME SUMMARISING STATEMENTS
690 LPRINT TAB(1);"SMAX(";H;", 0 , 0 ) =";SX(GG,GG)
695 LPRINT : LPRINT "  SO EXPECTED % = ";USING "##.###";100*SX(GG,GG)/H;
700 LPRINT "%   WITH A =";A;"  B =";B
705 LPRINT "  AND REGRET = ";USING "##.###";A*H-SX(GG,GG);
710 LPRINT "   WITH N =";H
715 LPRINT : LPRINT : GOTO 750
720 LPRINT : LPRINT : LPRINT
725 FOR J=0 TO J1
730 FOR K=0 TO J1
735 IF TMIN(J,K) < H THEN LPRINT "TMIN(";J;",";K;") =";TMIN(J,K)
740 NEXT K : NEXT J
745 GOTO 690
750 END
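Read as a whole, the listing does three things: it advances a pair of value vectors (the S-arrays) by backward induction, it tests at each horizon T and each state (J,K) whether continuing to sample in triplets is optimal, and it records in TMIN(J,K) the smallest T at which continuation first becomes optimal, printing the TMIN table at the end.  The sketch below re-expresses only that bookkeeping in modern form; the optimality test itself (the Bayesian comparison carried out in lines 275-400) is left abstract as a user-supplied function, and the names used are illustrative rather than the program's.

def tabulate_tmin(horizon, continue_is_optimal, j_max=10):
    """Record, for each score-difference state (j, k), the smallest
    horizon t (a multiple of 3, t <= horizon) at which continuing to
    sample in triplets is optimal; -1 means "never".  The decision rule
    itself is supplied as the function continue_is_optimal(t, j, k)."""
    tmin = {(j, k): -1 for j in range(j_max + 1) for k in range(j_max + 1)}
    for t in range(3, horizon + 1, 3):        # t = patients still to be treated
        for j in range(j_max + 1):            # j = lead of "best" over "median"
            for k in range(j_max + 1 - j):    # k = lead of "median" over "worst"
                if tmin[(j, k)] == -1 and continue_is_optimal(t, j, k):
                    tmin[(j, k)] = t
    return tmin

def print_tmin_table(tmin, j_max=6):
    """Print the table in the same K \\ J layout as the sample output."""
    print(" K \\ J" + "".join("%6d" % j for j in range(j_max + 1)))
    for k in range(j_max + 1):
        print("%6d" % k + "".join("%6d" % tmin.get((j, k), -1)
                                  for j in range(j_max + 1)))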
Sample Output
HORIZON = 300                A = .525    B = .475

 K \ J     0     1     2     3     4     5     6
     0     6    45   126   246    -1    -1    -1
     1    24    84   186    -1    -1    -1    -1
     2    54   135   258    -1    -1    -1    -1
     3    99   201    -1    -1    -1    -1    -1
     4   159   285    -1    -1    -1    -1    -1
     5   237    -1    -1    -1    -1    -1    -1
     6    -1    -1    -1    -1    -1    -1    -1

SMAX( 300 , 0 , 0 ) = 149.9290715369244

  SO EXPECTED % = 49.976%   WITH A = .525   B = .475
  AND REGRET =  7.571       WITH N = 300
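The two summary lines are simple rearrangements of SMAX.  Reading SMAX(300, 0, 0) as the expected total number of successes for the whole trial (as the "EXPECTED %" label indicates), and writing a = .525 for the larger success probability and N = 300,

$$\frac{149.929\ldots}{300} = 0.49976 \;(= 49.976\%), \qquad
aN - \mathrm{SMAX}(300,0,0) = 157.5 - 149.929\ldots = 7.571;$$

that is, the printed regret is the shortfall of the expected number of successes from the aN that would be obtained if all N patients could receive a treatment with success probability a.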
APPENDIX 3
Modes Table for t = 2, N = 36, (a,b) = (.6,.5)

Key

The setting is Case 3, Stage I.

j = # successes accumulated on leading treatment
k = # successes accumulated on second treatment

"1" = Continue Stage I
"2" = Commence Stage II
"3" = Switch all to leading treatment
"4" = Switch all to third treatment
t = 2

 k \ j  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
   0    4  4  4  4  4  4  4  4  4  4  3  3  3  3  3  3  3  3
   1       4  4  4  4  4  4  4  4  4  3  3  3  3  3  3  3  3
   2          4  4  4  4  4  4  4  4  3  3  3  3  3  3  3  3
   3             4  4  4  4  4  4  4  3  3  3  3  3  3  3  3
   4                4  4  4  4  4  4  3  3  3  3  3  3  3  3
   5                   4  4  4  4  4  3  3  3  3  3  3  3  3
   6                      4  4  4  4  3  3  3  3  3  3  3  3
   7                         4  4  4  3  3  3  3  3  3  3  3
   8                            4  4  3  3  3  3  3  3  3  3
   9                               4  3  3  3  3  3  3  3  3
  10                                  1  3  3  3  3  3  3  3
  11                                     1  3  3  3  3  3  3
  12                                        1  3  3  3  3  3
  13                                           1  3  3  3  3
  14                                              1  3  3  3
  15                                                 1  3  3
  16                                                    1  3
  17                                                       1
APPENDIX 4
Modes Table for t = 60, N = 100, (a,b) = (.6,.4)
Key
See Appendix 3.
t = 60

 k \ j  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
   0    4  4  4  4  4  4  4  4  4  2  2  2  3  3  3  3  3  3  3  3  3
   1       4  4  4  4  4  4  4  4  2  2  2  3  3  3  3  3  3  3  3  3
   2          4  4  4  4  4  4  4  2  2  2  3  3  3  3  3  3  3  3  3
   3             4  4  4  4  4  4  2  2  2  3  3  3  3  3  3  3  3  3
   4                4  4  4  4  4  2  2  2  3  3  3  3  3  3  3  3  3
   5                   4  4  4  4  2  2  2  3  3  3  3  3  3  3  3  3
   6                      4  4  4  2  2  2  3  3  3  3  3  3  3  3  3
   7                         4  4  2  2  2  3  3  3  3  3  3  3  3  3
   8                            4  2  2  2  3  3  3  3  3  3  3  3  3
   9                               1  1  1  3  3  3  3  3  3  3  3  3
  10                                  1  1  1  3  3  3  3  3  3  3  3
  11                                     1  1  1  3  3  3  3  3  3  3
  12                                        1  1  1  3  3  3  3  3  3
  13                                           1  1  1  3  3  3  3  3
  14                                              1  1  1  3  3  3  3
  15                                                 1  1  1  3  3  3
  16                                                    1  1  1  3  3
  17                                                       1  1  1  3
  18                                                          1  1  1
  19                                                             1  1
  20                                                                1
APPENDIX 5(a)

Modes Tables for t = 68 and 70, N = 100, (a,b) = (.6,.4)

Key

See Appendix 5(b).
t = 68

 i \ f  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
   0    1  1  1  1  1  1  1  1  1  1  4  4  4  4  4  4  4
   1    1  1  1  1  1  1  1  1  1  2  4  4  4  4  4  4
   2    1  1  1  1  1  1  1  1  2  2  4  4  4  4  4
   3    3  3  3  3  3  3  1  2  2  2  4  4  4  4
   4    3  3  3  3  3  3  3  2  2  2  4  4  4
   5    3  3  3  3  3  3  3  2  2  2  4  4
   6    3  3  3  3  3  3  3  2  2  2  4
   7    3  3  3  3  3  3  3  2  2  2
   8    3  3  3  3  3  3  3  2  2
   9    3  3  3  3  3  3  3  2
  10    3  3  3  3  3  3  3
  11    3  3  3  3  3  3
  12    3  3  3  3  3
  13    3  3  3  3
  14    3  3  3
  15    3  3
  16    3

t = 70

 i \ f  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
   0    1  1  1  1  1  1  1  1  1  1  4  4  4  4  4  4
   1    1  1  1  1  1  1  1  1  2  1  4  4  4  4  4
   2    1  1  1  1  1  1  1  2  2  1  4  4  4  4
   3    3  3  3  3  3  3  1  2  2  1  4  4  4
   4    3  3  3  3  3  3  1  2  2  1  4  4
   5    3  3  3  3  3  3  1  2  2  1  4
   6    3  3  3  3  3  3  1  2  2  1
   7    3  3  3  3  3  3  1  2  2
   8    3  3  3  3  3  3  1  2
   9    3  3  3  3  3  3  1
  10    3  3  3  3  3  3
  11    3  3  3  3  3
  12    3  3  3  3
  13    3  3  3
  14    3  3
  15    3
APPENDIX 5(b)

Modes Table for t = 72, N = 100, (a,b) = (.6,.4)

Key

The setting is Case 3, Stage I.

f = # failures on more successful treatment
i = difference in # successes accumulated between treatments

"1" = Continue Stage I
"2" = Commence Stage II
"3" = Switch all to leading treatment
"4" = Switch all to third treatment
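To illustrate the key with a hypothetical state (the counts s1 and s2 below are illustrative and not taken from the dissertation): at t = 72 the number of pairs already sampled is n = (N - t)/2 = 14, so a state in which the more successful treatment has s1 = 9 successes and the other has s2 = 5 gives

$$f = n - s_1 = 14 - 9 = 5, \qquad i = s_1 - s_2 = 9 - 5 = 4,$$

and the mode for that state is then read from row i = 4, column f = 5 of the table below.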
t = 72

 i \ f  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
   0    1  1  1  1  1  1  1  1  1  4  4  4  4  4  4
   1    1  1  1  1  1  1  1  1  2  4  4  4  4  4
   2    1  1  1  1  1  1  1  2  2  4  4  4  4
   3    3  3  3  3  3  1  2  2  2  4  4  4
   4    3  3  3  3  3  3  2  2  2  4  4
   5    3  3  3  3  3  3  2  2  2  4
   6    3  3  3  3  3  3  2  2  2
   7    3  3  3  3  3  3  2  2
   8    3  3  3  3  3  3  2
   9    3  3  3  3  3  3
  10    3  3  3  3  3
  11    3  3  3  3
  12    3  3  3
  13    3  3
  14    3
APPENDIX 5(c}
Modes Tables for t - 86-100, N - 100, (a,b) - (.6, .4)
e
•
Key
See Appendix 5(b).
t = 86

 i \ f  0  1  2  3  4  5  6  7
   0    1  1  1  1  1  1  4  4
   1    1  1  1  1  1  1  4
   2    1  1  1  2  2  1
   3    3  1  1  2  2
   4    3  3  1  2
   5    3  3  1
   6    3  3
   7    3

t = 88

 i \ f  0  1  2  3  4  5  6
   0    1  1  1  1  1  1  4
   1    1  1  1  1  2  4
   2    1  1  1  2  2
   3    3  1  2  2
   4    3  1  2
   5    3  1
   6    3

t = 90

 i \ f  0  1  2  3  4  5
   0    1  1  1  1  1  4
   1    1  1  1  1  1
   2    1  1  2  2
   3    1  1  2
   4    3  1
   5    3

t = 92

 i \ f  0  1  2  3  4
   0    1  1  1  1  1
   1    1  1  1  2
   2    1  1  2
   3    1  2
   4    1

t = 94

 i \ f  0  1  2  3
   0    1  1  1  1
   1    1  1  1
   2    1  2
   3    1

All remaining entries in the modes tables for t = 96-100 are "1".
REFERENCES
Anderson, T.W. (1960) A Modification of the Sequential Probability Ratio Test to Reduce Sample Size, Ann. Math. Statist. 31 165-97.
Anscombe, F.J. (1963) Sequential Medical Trials, J. Am. Statist. Assoc. 58 365-83.
Armitage, P. (1960) Sequential Medical Trials, Blackwell: Oxford.
Armitage, P. (1985) The Search for Optimality in Clinical Trials, Internat. Statist. Rev. 53 15-24.
Bather, J.A. (1981) Randomized Allocation of Treatments in Sequential Experiments (with Discussion), J. R. Statist. Soc. B 43 265-92.
Bather, J.A. (1985) On the Allocation of Treatments in Sequential Medical Trials, Internat. Statist. Rev. 53 1-13.
Bather, J.A. and Simons, G.D. (1985) The Minimax Risk for Two-Stage Procedures in Clinical Trials, J. R. Statist. Soc. B 47 466-75.
Bechhofer, R.E., Kiefer, J. & Sobel, M. (1968) Sequential Identification and Ranking Problems, Univ. of Chicago Press.
Bechhofer, R.E. and Kulkarni, R.V. (1982) Closed Adaptive Sequential Procedures for Selecting the Best of k ≥ 2 Bernoulli Populations, Third Purdue Symp. on Stat. Decision Theory I 61-108.
Begg, C.B. and Mehta, C.R. (1979) Sequential Analysis of Comparative Clinical Trials, Biometrika 66 97-103.
Berry, D.A. (1978) Modified Two-Armed Bandit Strategies for Certain Clinical Trials, J. Am. Statist. Assoc. 73 339-45.
Berry, D.A. and Fristedt, B. (1985) Bandit Problems: Sequential Allocation of Experiments, Chapman and Hall: London.
Bulpitt, C.J. (1983) Randomised Controlled Clinical Trials, Martinus Nijhoff: The Hague.
Canner, P.L. (1970) Selecting One of Two Treatments when the Responses are Dichotomous, J. Am. Statist. Assoc. 65 293-306.
Chernoff, H. (1967) Sequential Models for Clinical Trials, Fifth Berkeley Symp. on Math. Stat. & Prob. IV 805-12.
Chernoff, H. and Petkau, J. (1981) Sequential Medical Trials involving Paired Data, Biometrika 68 119-32.
Colton, T. (1963) A Model for Selecting One of Two Medical Treatments, J. Am. Statist. Assoc. 58 388-400.
Colton, T. (1965) A Two-Stage Model for Selecting One of Two Treatments, Biometrics 21 169-80.
Cornfield, J., Halperin, M. & Greenhouse, S.W. (1969) An Adaptive Procedure for Sequential Clinical Trials, J. Am. Statist. Assoc. 64 759-70.
Eick, S.G. (1988) The Two-Armed Bandit with Delayed Responses, Ann. Statist. 16 254-64.
Feldman, D. (1962) Contributions to the "Two-Armed Bandit" Problem, Ann. Math. Statist. 33 847-56.
Flehinger, B.J. and Louis, T.A. (1971) Sequential Medical Trials with Data Dependent Treatment Allocation, Sixth Berkeley Symp. on Math. Stat. & Prob. IV 43-51.
Gittins, J.C. (1979) Bandit Processes and Dynamic Allocation Indices (with Discussion), J. R. Statist. Soc. B 41 148-77.
Glazebrook, K.D. (1978) On the Optimal Allocation of Two or More Treatments in a Controlled Clinical Trial, Biometrika 65 335-40.
Glazebrook, K.D. (1980) On Randomized Dynamic Allocation Indices for the Sequential Design of Experiments, J. R. Statist. Soc. B 42 342-46.
Herson, J. (1984) Statistical Aspects in the Design and Analysis of Phase II Clinical Trials. In Cancer Clinical Trials: Methods and Practice (Buyse, M.E., Staquet, M.J. & Sylvester, R.J., eds), Oxford University Press 239-60.
Hoel, D.G. (1972) An Inverse Stopping Rule for Play-the-Winner Sampling, J. Am. Statist. Assoc. 67 148-51.
Hoel, D.G., Sobel, M. & Weiss, G.H. (1972) A Two-Stage Procedure for Choosing the Better of Two Binomial Populations, Biometrika 59 317-22.
Iglewicz, B. (1984) Alternative Designs: Sequential, Multi-Stage, Decision Theory and Adaptive Designs. In Cancer Clinical Trials: Methods and Practice (Buyse, M.E., Staquet, M.J. & Sylvester, R.J., eds), Oxford University Press 311-36.
Kulkarni, R.V. and Jennison, C. (1986) Optimal Properties of the Bechhofer-Kulkarni Bernoulli Selection Procedure, Ann. Statist. 14 298-314.
Kulkarni, R.V. and Kulkarni, V.G. (1987) Optimal Bayes Procedures for Selecting the Better of Two Bernoulli Populations, J. Statist. Planning and Inference 15 311-30.
Lai, T.L. (1987) Adaptive Treatment Allocation and the Multi-Armed Bandit Problem, Ann. Statist. 15 1091-1114.
Lai, T.L., Levin, B., Robbins, H. & Siegmund, D. (1980) Sequential Medical Trials, Proc. Nat'l. Acad. Sci. USA 77 3135-38.
Lai, T.L. and Robbins, H. (1985) Asymptotically Efficient Adaptive Allocation Rules. In Advances in Applied Mathematics 6, Academic Press: New York 4-22.
Lai, T.L., Robbins, H. & Siegmund, D. (1983) Sequential Design of Comparative Clinical Trials. In Recent Advances in Statistics (Rizvi, M., Rustagi, J. & Siegmund, D., eds), Academic Press: New York 51-68.
McPherson, K. (1984) Interim Analysis and Stopping Rules. In Cancer Clinical Trials: Methods and Practice (Buyse, M.E., Staquet, M.J. & Sylvester, R.J., eds), Oxford University Press 407-22.
Novick, M.R. and Grizzle, J.E. (1965) A Bayesian Approach to the Analysis of Data from Clinical Trials, J. Am. Statist. Assoc. 60 81-96.
Paulson, E. (1967) Sequential Procedures for Selecting the Best One of Several Binomial Populations, Ann. Math. Statist. 38 117-23.
Paulson, E. (1969) A New Sequential Procedure for Selecting the Best One of k Binomial Populations, Ann. Math. Statist. 40 1865-74.
Peto, R., Pike, M.C., Armitage, P. et al. (1976) Design and Analysis of Randomized Clinical Trials requiring prolonged observation of each patient, Br. J. Cancer 34 585-612.
Pocock, S.J. (1977) Group Sequential Methods in the Design and Analysis of Clinical Trials, Biometrika 64 191-99.
Pocock, S.J. (1984) Clinical Trials: A Practical Approach, John Wiley: London.
Pocock, S.J., Armitage, P. & Galton, D.A.G. (1978) The Size of Cancer Clinical Trials: An International Survey, UICC Tech. Rep. Ser. 36 5-34.
Robbins, H. (1952) Some Aspects of the Sequential Design of Experiments, Bull. Amer. Math. Soc. 58 527-35.
Rodman, L. (1978) On the Many-Armed Bandit Problem, Ann. Probab. 6 491-98.
Schwartz, D., Flamant, R. & Lellouch, J. (1980) Clinical Trials, Academic Press: London. (Trans. Healy, M.J.R.)
Simon, R. (1977) Adaptive Treatment Assignment Methods and Clinical Trials, Biometrics 33 743-49.
Simons, G.D. (1986) Bayes Rules for a Clinical Trials Model with Dichotomous Responses, Ann. Statist. 14 954-70.
Simons, G.D. (1986b) A Randomized Horizon Clinical Trials Model (unpublished manuscript).
Sobel, M. and Huyett, M.J. (1957) Selecting the Best One of Several Binomial Populations, Bell System Tech. J. 36 537-76.
Sobel, M. and Weiss, G.H. (1971) Play-the-Winner Rule and Inverse Sampling in Selecting the Better of Two Binomial Populations, J. Am. Statist. Assoc. 66 545-51.
Sylvester, R.J. and Staquet, M.M. (1977) An Application of Decision Theory in Phase II Clinical Trials in Cancer. In Recent Advances in Cancer Treatment (Tagnon, H.J. and Staquet, M.M., eds), Raven Press: New York.
Sylvester, R.J. and Staquet, M.M. (1980) Design of Phase II Clinical Trials in Cancer using Decision Theory, Cancer Treat. Rep. 64 519-24.
Tagnon, H.J. (1984) Ethical Considerations in Controlled Clinical Trials. In Cancer Clinical Trials: Methods and Practice (Buyse, M.E., Staquet, M.J. & Sylvester, R.J., eds), Oxford University Press 14-25.
Wald, A. (1947) Sequential Analysis, John Wiley: New York.
Wetherill, G.B. (1975) Sequential Methods in Statistics, Chapman and Hall: London.
Whitehead, J. (1983) The Design and Analysis of Sequential Clinical Trials, Ellis Horwood: Chichester.
Zaborskis, A.A. (1976) Sequential Bayesian Plan for Choosing the Best Method of Medical Treatment, Avtomatika i Telemekhanika 11 144-53.
Zelen, M. (1969) Play the Winner Rule and the Controlled Clinical Trial, J. Am. Statist. Assoc. 64 131-46.
Zhang, C.-H. (1987) Asymptotically Optimal Sequential Clinical Trials (unpublished manuscript).