Lectures on Optimal Design and
Sequential Analyses
Herman Chernoff
Inst. of Statistics Mimeo Series #1349
Preface
Professor Herman Chernoff, Massachusetts Institute of Technology, delivered the Harold Hotelling Lectures on January 26-29, 1981, under the titles
January 26, 1981:
Optimal Design of Experiments
January 27, 1981 :
Sequential Design of Experiments
January 28, 1981 :
Continuous Time Sequential Problems
January 29, 1981 :
Massachusetts Number Game
These are the Lecture Notes of the first three lectures, with contributions
by Paul P. Gallo and Leonard A. Stefanski.
Also taking notes were Ying So,
Kenneth Risko, David Giltinan, Jed Frees, Bruce J. Collings and Reuel L. Smith.
Paul Gallo and I did the proofreading of the typescripts.
I. M. Chakravarti
June 1981
LECTURES ON OPTIMAL DESIGN AND SEQUENTIAL ANALYSES
Herman Chernoff
Massachusetts Institute of Technology
I. Introduction
These notes are the framework of the Hotelling Lectures, a series of
three lectures presented at the Department of Statistics at the University
of North Carolina at Chapel Hill on January 26-28, 1981.
These lectures touch briefly on three topics which are discussed more
fully in the SIAM Monograph No. 8, entitled Sequential Analysis and Optimal Design.
The titles of the lectures are
1) Optimal Design of Experiments for Fixed Sample Size,
2) Sequential Design of Experiments, and
3) Continuous Time Sequential Problems.
Many of the ideas and results are introduced in terms of examples with relatively little explanatory introduction, but I believe that these notes present a comprehensible glimpse of a subject of importance and interest.
II. Optimal Design of Experiments

A. Optimal Design in Estimation
We first consider some fixed sample size estimation problems:
Example 1. X_1, X_2, ..., X_n are i.i.d. N(μ, σ²), with σ² known. We wish to estimate μ; how large a sample should be taken?

Costs involved:

(i) cost of sampling: it is often reasonable to assume that this is linear in n;

(ii) "cost" of bad decision: squared error loss is often used.

Thus, if we use the sample mean X̄ to estimate μ, then for some c, k, the cost is

    cn + k(X̄ - μ)².

We want to minimize the expected cost:

    cn + kσ²/n.

This is done by choosing n = n_0 = (kσ²/c)^{1/2}, and the optimal cost is C_0 = 2(ck)^{1/2} σ.
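A minimal numerical check of this computation, with illustrative values of c, k and σ (not taken from the text):

```python
# Optimal fixed sample size for Example 1: minimize cn + k*sigma^2/n.
# The values of c, k and sigma below are illustrative assumptions.
import math

c, k, sigma = 0.5, 200.0, 3.0
n0 = math.sqrt(k * sigma ** 2 / c)                 # n0 = (k*sigma^2/c)^(1/2)
cost_at_n0 = c * n0 + k * sigma ** 2 / n0          # expected cost at n0
print("n0 = %.1f, cost = %.2f, 2*sigma*sqrt(ck) = %.2f"
      % (n0, cost_at_n0, 2 * sigma * math.sqrt(c * k)))
```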
We now consider regression experiments in which we are interested in estimating
some specific functions of regression parameters.
Example 2. Y_i = α + βx_i + u_i, i = 1, ..., n; u_i i.i.d. N(0, σ²); |x_i| ≤ 1, i = 1, ..., n. We want to choose x_1, ..., x_n so that the least squares estimate of β has minimal variance.

Example 3. Y_i = βx_i + γx_i² + u_i, i = 1, ..., n; u_i i.i.d. N(0, σ²); 0 ≤ x_i ≤ x*, i = 1, ..., n. We want to choose x_1, ..., x_n so that the least squares estimate of θ = βx_0 + γx_0² has minimal variance.
Elfving (1952, A.M.S.) derived an elegant solution to the following more
general problem, which includes the two previous examples.
Example 4. Y_i = β_1 x_{1i} + β_2 x_{2i} + u_i, i = 1, ..., n; u_i i.i.d. N(0, σ²); x_i = (x_{1i}, x_{2i}) ∈ some set S. We want to choose the x_i so that the least squares estimate of θ = a_1 β_1 + a_2 β_2 has minimal variance.

Elfving's solution: Consider the convex set generated by S and -S (the pointwise reflection of S about the origin). Draw the ray from the origin through the point a = (a_1, a_2), and consider the point k where the ray penetrates the convex set. If k ∈ S, repeat the level corresponding to that point n times; if k ∈ -S, repeat the level corresponding to -k n times; if k ∉ S ∪ -S, then it is a convex combination of two points s_1, s_2 ∈ S ∪ -S, say, with weights w_1, w_2; repeat the levels corresponding to the points ±s_1, ±s_2 in proportions determined by these weights. Then

    Var(θ̂) = (σ²/n) ‖a‖² / ‖k‖².
In Example 2 the set S is the segment {(1, x): |x| ≤ 1}, and S and -S generate the square with corners s_1 = (1,1), (1,-1), and their reflections.

[Figure: the Elfving set for Example 2, with the ray in the direction a = (0,1) meeting the boundary at k = (0,1), the midpoint of the edge joining (1,1) and (-1,1).]

To estimate β we take a = (0,1); then k = (0,1), and thus we put half the observations at x = +1 and half at x = -1. If we instead wished to estimate α + ½β, an optimal design puts all observations at x = 1/2. An alternative is to note that (1, 1/2) is a weighted average of (1,1) and (1,-1) and to put 3/4 of the observations at x = 1 and the rest at x = -1. To estimate α + 2β optimally, one must put 3/4 of the observations at x = 1 and the rest at x = -1.
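These design claims can be checked directly from the least squares information matrix; a short sketch, with an illustrative n and σ:

```python
# Variance of the least squares estimate of a'(alpha, beta) in Example 2 for a
# two-point design: Var = sigma^2 * a' M^{-1} a with M = sum_i (1, x_i)(1, x_i)^T.
# n = 20 and sigma = 1 are illustrative assumptions.
import numpy as np

def variance(a, levels, counts, sigma=1.0):
    M = sum(c * np.outer((1.0, x), (1.0, x)) for x, c in zip(levels, counts))
    return sigma ** 2 * float(a @ np.linalg.solve(M, a))

n = 20
# estimating beta: half the observations at +1, half at -1
print(variance(np.array([0.0, 1.0]), [1.0, -1.0], [n / 2, n / 2]))
# estimating alpha + 2*beta: 3/4 of the observations at +1, the rest at -1
print(variance(np.array([1.0, 2.0]), [1.0, -1.0], [3 * n / 4, n / 4]))
# Both agree with Elfving's formula (sigma^2/n) * ||a||^2 / ||k||^2.
```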
In Example 3, our decision depends on the particular value of x_0: it may be optimal to place all observations at x_0 (Fig. 1), or to place some proportion of them at x* and the rest at some smaller value (see Fig. 2, where the slope of the ray from 0 to Z is x_0).

[Fig. 1 and Fig. 2: the Elfving set generated by S = {(x, x²): 0 ≤ x ≤ x*} and -S, with the point (x*, x*²) marked, illustrating the two cases.]
Comments:

1) Suboptimal designs can be compared with optimal ones and are often preferred for practical considerations arising outside the problem statement.

2) For each choice of x = (x_1, x_2)ᵀ ∈ S, the corresponding information matrix is σ⁻² x xᵀ. This has rank one, which is to be expected since repetitions of this experimental level can only yield consistent estimates of one function of β.
Non-regression problems:

Example 5. Probit (dose response model). We assume that the dose level x of a drug required to achieve a response is N(μ, σ²), with μ and σ² unknown. Then the probability of response to dose x is

    p(x; μ, σ²) = Φ((x - μ)/σ) = Φ(w)

with w = (x - μ)/σ, and where Φ is the standard normal distribution function:

    φ(t) = (1/√(2π)) e^{-t²/2},    Φ(x) = ∫_{-∞}^{x} φ(t) dt.

We want to select dose levels x_1, ..., x_n to minimize the variance of the estimate of μ - 2σ, the level required to achieve probability Φ(-2) = .0228 of response.

The information matrix corresponding to level x becomes

    I_x(θ) = [φ²(w) / (σ² Φ(w)(1 - Φ(w)))] (1, w)ᵀ(1, w) = V Vᵀ,   where V = φ(w)[σ² Φ(w)(1 - Φ(w))]^{-1/2} (1, w)ᵀ

(φ(w) = dΦ(w)/dw). Since a solution to this design problem depends only on the information matrices, and the information matrices of this problem are like those of the regression problem with (x_1, x_2) replaced by

    (φ(w)[Φ(w)(1 - Φ(w))σ²]^{-1/2},  w φ(w)[Φ(w)(1 - Φ(w))σ²]^{-1/2})

and (a_1, a_2) replaced by (1, -2), the solution can be obtained using the Elfving method:
[Figure: the Elfving curve traced out by these points, with x = μ (w = 0), x = μ + σ (w = 1) and x = μ + 1.57σ marked, and the ray drawn in the direction of (1, -2).]

The solution is to put the observations at the levels μ ± 1.57σ in proportions determined by c_1 and c_2. However, the values μ ± 1.57σ are unknown at the start of the experiment. We need preliminary estimates of μ and σ. Thus we must use a sequential design.
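As a numerical illustration of the design itself (not of the sequential procedure), one can search over two-point designs of the symmetric form indicated above and compare the best level with 1.57. The grid search and the restriction to symmetric two-point designs are choices of this sketch:

```python
# c-optimal two-point design for the probit problem of Example 5: minimize
# a' M^{-1} a with a = (1, -2) over designs putting weight p at w and 1-p at -w.
# The constant sigma^2 factor in the information matrix is dropped.
import numpy as np
from scipy.stats import norm

a = np.array([1.0, -2.0])

def crit(w, p):
    def info(u):
        lam = norm.pdf(u) ** 2 / (norm.cdf(u) * (1.0 - norm.cdf(u)))
        v = np.array([1.0, u])
        return lam * np.outer(v, v)
    M = p * info(w) + (1.0 - p) * info(-w)
    return float(a @ np.linalg.solve(M, a))

ws = np.linspace(0.2, 3.0, 141)
ps = np.linspace(0.05, 0.95, 91)
best = min((crit(w, p), w, p) for w in ws for p in ps)
print("a'M^{-1}a = %.3f at w = %.2f with weight %.2f at +w" % best)
# The minimizing w can be compared with the value 1.57 quoted above.
```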
B. Optimal Design in Testing Hypotheses
In a simple hypothesis vs. simple hypothesis testing situation, we can always arrange for both error probabilities to converge to zero as n → ∞, exponentially fast, say, as e^{-nρ} for some ρ > 0. As before, we might consider minimizing risk functions of the form

    cn + k e^{-nρ}.

This is minimized by n_0 = (1/ρ) log(kρ/c). As the cost per observation c approaches zero we have n_0 → ∞. Indeed n_0 ≈ -(1/ρ) log c, and the minimum cost is approximately

    -(1/ρ) c log c + kc.

We can interpret -(1/ρ) c log c as the cost of sampling and kc as the cost of errors. Although these approximations are rough, they do indicate how large a sample size to take when the cost per observation is small. Note that the cost of experimentation should be the main part of the overall risk.
A design problem to consider:

Example 6. Consider a device, the lifetime of which is exponentially distributed with failure rate θ (mean θ⁻¹). We want to test H_1: θ = θ_1 vs H_2: θ = θ_2. The following restrictions are imposed:

(i) we can only observe our sample at two time points t_1, t_2; thus we can determine only the number of failures in the three intervals (0, t_1), (t_1, t_2), (t_2, ∞);

(ii) letting m_i = number of components still functioning at time t_i, i = 1, 2, we must base our test on the statistic T = m_1 + m_2.

The problem is to find values t_1, t_2 which provide an optimal design for a test of the above hypotheses to be based on T, where H_1 is rejected if T ≤ k for some k. In this example we may regard the statistic T as nȲ where

    Y_i = 0   if Z_i ≤ t_1
        = 1   if t_1 < Z_i ≤ t_2
        = 2   if Z_i > t_2

and Z_i is the lifetime of the i-th unit.

To find the asymptotic behavior of the error probabilities of this test procedure we use
Theorem 1. If a ≤ EX, then

    -(1/n) log P[X̄_n ≤ a] → -log inf_t E[e^{t(X-a)}] = ρ(a).

This result may be paraphrased to state that P[X̄_n ≤ a] is roughly of the order of magnitude of e^{-nρ(a)} = [m(a)]ⁿ where

    m(a) = inf_t E[e^{t(X-a)}].

Here the term "roughly" refers to the fact that the approximation in the paraphrase may neglect factors which approach 0 or ∞ like powers of n. Indeed in many applications P(X̄_n ≤ a) ≈ n^{-1/2} mⁿ(a). However, the exponential part is the most important part of this approximation for our purposes. Note that if a ≥ E(X), it is easy to show that P[X̄_n ≥ a] is roughly of the order of mⁿ(a) = e^{-nρ(a)}.
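A quick numerical illustration of ρ(a) for a three-valued variable like the Y of Example 6 (the distribution below is hypothetical, not derived from the text):

```python
# rho(a) = -log inf_t E[exp(t(X - a))] for a discrete X; for a <= EX the
# infimum is attained at some t <= 0, so a grid over t <= 0 suffices.
import numpy as np

values = np.array([0.0, 1.0, 2.0])      # the possible values of Y in Example 6
probs = np.array([0.2, 0.3, 0.5])       # hypothetical cell probabilities
a = 1.2                                  # here E[Y] = 1.3, so a <= E[Y]

t = np.linspace(-20.0, 0.0, 20001)
mgf = np.exp(t[:, None] * (values[None, :] - a)) @ probs
rho = -np.log(mgf.min())
print("rho(%.2f) = %.4f" % (a, rho))
# P[Ybar_n <= a] is then roughly exp(-n * rho(a)), up to powers of n.
```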
Our test procedure consists of rejecting H_1 if Ȳ ≤ a = k/n. By selecting a between E_{H_1}(Y) and E_{H_2}(Y), we can show that the error probabilities of our procedure approach 0 exponentially fast in n, at rates ρ_1(a) and ρ_2(a), determined by a, θ_1 and θ_2 (and implicitly on t_1 and t_2). The value of a for which ρ_1(a) = ρ_2(a) = ρ_0 will yield error probabilities α = ε_1 and β = ε_2 for which ε_1 ε_2 is approximately minimized and for which roughly ε_1 ≈ ε_2 ≈ e^{-nρ_0}. Our optimal design is obtained by selecting t_1 and t_2 appropriately to maximize ρ_0.
Suppose now that we decide to remove the restriction on the use of T. Given the design, i.e. the choice of t_1 and t_2, we may wish to use the efficient likelihood-ratio test. However, the likelihood-ratio test is also a test of the form: reject H_1 if Ȳ ≤ a.
To see this we note that the likelihood-ratio test for testing H_1: f(x) = f_1(x) vs H_2: f(x) = f_2(x) consists of rejecting H_1 if

    λ = Π_{i=1}^{n} f_1(X_i)/f_2(X_i) ≤ k_1,

or if

    Σ Y_i ≤ k_2 = log k_1,

or if Ȳ ≤ a = k_2/n, where

    Y_i = log[f_1(X_i)/f_2(X_i)].
In the Bayesian context, where the wrong decision under H_i costs r_i and H_i has prior probability π_i, one can see that for the Bayes tests the error probabilities approach zero roughly like

    m_0ⁿ = e^{-nρ_0}   where   m_0 = inf_{0≤t≤1} ∫ f_1ᵗ(x) f_2^{1-t}(x) dx,

if f_1 and f_2 correspond to continuous distributions. The integral is replaced by a sum if the distributions are discrete.
Thus each design choice in our less restricted problem consists of selecting t_1 and t_2 appropriately. That is, for each t_1, t_2, there is a corresponding f_1 and f_2 and m_0, where f_1 and f_2 are discrete distributions with three possible values. We select t_1 and t_2 to minimize m_0 or maximize ρ_0 = -log m_0. We shall not pursue further the numerical details for this problem.
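For a concrete sense of those numerical details, the following sketch carries out the maximization of ρ_0 over (t_1, t_2) for hypothetical failure rates θ_1 = 1 and θ_2 = 2 (values not fixed in the text):

```python
# Choose t1 < t2 to maximize rho0 = -log m0, where
# m0 = inf_{0<=lam<=1} sum_j p1(j)^lam * p2(j)^(1-lam)
# over the three cells (0,t1], (t1,t2], (t2,inf) of the grouped exponential data.
import numpy as np

theta1, theta2 = 1.0, 2.0        # hypothetical failure rates under H1 and H2

def cells(theta, t1, t2):
    return np.array([1 - np.exp(-theta * t1),
                     np.exp(-theta * t1) - np.exp(-theta * t2),
                     np.exp(-theta * t2)])

def rho0(t1, t2, lams=np.linspace(0.0, 1.0, 201)):
    p1, p2 = cells(theta1, t1, t2), cells(theta2, t1, t2)
    m0 = min((p1 ** lam * p2 ** (1 - lam)).sum() for lam in lams)
    return -np.log(m0)

grid = np.linspace(0.1, 4.0, 40)
best = max((rho0(t1, t2), t1, t2) for t1 in grid for t2 in grid if t1 < t2)
print("rho0 = %.4f at t1 = %.2f, t2 = %.2f" % best)
```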
III. Sequential Design of Experiments

A. Sequential Analysis for Testing Simple Hypotheses versus Simple Alternatives
Wald (1947) introduced the SPRT (Sequential Probability Ratio Test) for testing H_1: f(x) = f_1(x) versus H_2: f(x) = f_2(x), where f(x) is the distribution function of the i.i.d. random variables X_1, X_2, .... This test procedure is to

    reject H_1 if λ_n ≤ B,
    accept H_1 if λ_n ≥ A,
    and continue sampling if B < λ_n < A,

where

    λ_n = Π_{i=1}^{n} f_1(X_i)/f_2(X_i)

is the sequential probability ratio and B < 1 < A. Then

    S_n = log λ_n = Σ_{i=1}^{n} Y_i

where the

    Y_i = log[f_1(X_i)/f_2(X_i)]

are i.i.d. observations on Y.
The SPRT has several basic properties:

1) The SPRT leads to termination with probability one. More precisely, let N be the sample size n when λ_n first fails to be within (B, A). We let N = ∞ if B < λ_n < A for all n. As long as P_{H_i}(Y = 0) < 1 for i = 1, 2, i.e. f_1(x) and f_2(x) represent distinct probability distributions, P_{H_i}(N = ∞) = 0, i = 1, 2.
2) The error probabilities α = P_{H_1}(reject H_1) and β = P_{H_2}(accept H_1) satisfy

    α/(1 - β) ≤ B   and   (1 - α)/β ≥ A.

Indeed, these inequalities are approximate equalities under conditions where the "excess" is small. The excess corresponds to the overshoot beyond A or B of the probability ratio at termination of sampling. These approximations, for which bounds can be constructed, state that

    α ≈ B(A - 1)/(A - B),   β ≈ (1 - B)/(A - B).
3) The expected sample sizes E_{H_i}(N) satisfy E_{H_i}(N) ≈ E_{H_i}(S_N)/E_{H_i}(Y). Assuming that the excess is negligible, S_N ≈ log A when H_1 is accepted and S_N ≈ log B when H_1 is rejected, and hence

    E_{H_1}(S_N) ≈ α log B + (1 - α) log A
    E_{H_2}(S_N) ≈ (1 - β) log B + β log A.

Moreover,

    E_{H_1}(Y) = I(f_1, f_2) = ∫ f_1(x) log[f_1(x)/f_2(x)] dx   and   E_{H_2}(Y) = -I(f_2, f_1) = -∫ f_2(x) log[f_2(x)/f_1(x)] dx,

where I(f_1, f_2) and I(f_2, f_1) are the Kullback-Leibler information numbers for discriminating between f_1 and f_2. Thus

    E_{H_1}(N) ≈ (1/I(f_1, f_2)) {α log B + (1 - α) log A}

and

    E_{H_2}(N) ≈ (1/I(f_2, f_1)) {(β - 1) log B - β log A}.
4) The sequential probability ratio test is optimal. Given an SPRT with error probabilities α and β, for any other test with error probabilities α′ ≤ α and β′ ≤ β, both expected sample sizes under H_1 and H_2 are at least equal to those for the SPRT. Moreover, the Bayes sequential tests are SPRT's if the cost of sampling is linear in sample size.
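A minimal simulation sketch of the SPRT, for the illustrative case of unit-variance normal hypotheses and Wald's approximate boundaries (none of these particular values come from the text):

```python
# SPRT for H1: N(mu1, 1) vs H2: N(mu2, 1) with Wald's approximate boundaries
# B = alpha/(1 - beta), A = (1 - alpha)/beta.  All numerical values are
# illustrative assumptions.
import math, random

def sprt(sample, mu1=0.5, mu2=-0.5, alpha=0.05, beta=0.05, max_n=100_000):
    log_A = math.log((1 - alpha) / beta)     # accept H1 when S_n >= log A
    log_B = math.log(alpha / (1 - beta))     # reject H1 when S_n <= log B
    s = 0.0
    for n in range(1, max_n + 1):
        x = sample()
        # Y_i = log f1(x)/f2(x) for unit-variance normal densities
        s += (mu1 - mu2) * x - (mu1 ** 2 - mu2 ** 2) / 2.0
        if s >= log_A:
            return "accept H1", n
        if s <= log_B:
            return "reject H1", n
    return "no decision", max_n

random.seed(1)
print(sprt(lambda: random.gauss(0.5, 1.0)))   # data generated under H1
# Wald's formula predicts E_H1(N) roughly {alpha*log B + (1-alpha)*log A}/I(f1,f2),
# with I(f1,f2) = (mu1 - mu2)^2 / 2 here.
```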
B. Asymptotic Behavior of the Bayes SPRT
Suppose that the cost of the wrong decision when H_i is true is r_i and that the cost per observation is c. Then the Bayes sequential test as c → 0 involves taking a large number of observations. This means that A → ∞ and B → 0. Assuming that log A and -log B are of the same order of magnitude, one may approximate

    α ≈ B,   β ≈ A⁻¹,   E_{H_1}(N) ≈ log A / I(f_1, f_2),   E_{H_2}(N) ≈ -log B / I(f_2, f_1).

Then minimizing the Bayes risk

    R = π_1 R_1 + π_2 R_2,

where the risks are

    R_1 = c E_{H_1}(N) + r_1 α,    R_2 = c E_{H_2}(N) + r_2 β

and the π_i are the prior probabilities of the H_i, leads to B and A⁻¹ of the order of magnitude of c and

    E_{H_1}(N) ≈ -log c / I(f_1, f_2),    E_{H_2}(N) ≈ -log c / I(f_2, f_1).

Thus the main part of the expected cost for the Bayes procedure comes from the cost of sampling and is of the order of magnitude of -c log c. Furthermore, this cost is inversely proportional to the Kullback-Leibler information number and relatively insensitive to the prior probabilities and the costs of incorrect decisions.
To see the relevance of this result for the problem of design of experiments, suppose that there are available two experiments for sequentially testing H_1 vs H_2. Under the first experiment we observe X_1, X_2, ... with information numbers I(f_1, f_2) = I_1 and I(f_2, f_1) = I_2. Suppose that for the second experiment we observe X_1*, X_2*, ... with information numbers I_1* and I_2*. If I_1 > I_1* and I_2 > I_2* (and the same cost c per observation), it seems clear that we should prefer to observe the X_i. If I_1 > I_1* but I_2 < I_2*, the situation is not as clear. Then X_1, X_2, ... are to be preferred if H_1 is true but X_1*, X_2*, ... if H_2 is true. If we knew which hypothesis were true, there would be no need to experiment. However, as data cumulate, one may be still unsure enough about which hypothesis is true to invest in further observations without being sure enough to stop and make a final decision. In that event it makes sense to select X if λ_n > 1 and X* if λ_n < 1. This strategy would involve shifting from X to X* or vice versa as λ_n changes during experimentation.
Another insight to be gained from the above asymptotic analysis is the following. Let π_{1n} and π_{2n} be the posterior probabilities after n observations X_1, X_2, ..., X_n. Then Bayes' theorem states that

    λ_n = (π_2/π_1)(π_{1n}/π_{2n}).

Thus the asymptotic Bayes sequential rule corresponds to stopping and selecting H_1 when π_{2n} goes below some number roughly of the order of magnitude of c. Similarly we stop and select H_2 when π_{1n} goes below some such number.
C. A Simple Composite Hypothesis Testing Problem

The simplest composite problem will shed some light on our problem. Let us test H_1: θ = θ_1 vs H_2: θ = θ_2 or θ = θ_3. Let π = (π_1, π_2, π_3) represent the prior probability distribution. In the Bayesian framework we may compute the posterior distribution (π_{1n}, π_{2n}, π_{3n}). Let

    S_{12n} = Σ_{i=1}^{n} log[f(x_i|θ_1)/f(x_i|θ_2)] = Σ_{i=1}^{n} Y_{12i}
    S_{13n} = Σ_{i=1}^{n} log[f(x_i|θ_1)/f(x_i|θ_3)] = Σ_{i=1}^{n} Y_{13i}.

Suppose now that H_1: θ = θ_1 is true. Then S_{12n} and S_{13n} grow roughly like nI(θ_1,θ_2) and nI(θ_1,θ_3), and, assuming π_1 > 0, π_{2n} and π_{3n} will approach zero roughly as exp[-nI(θ_1,θ_2)] and exp[-nI(θ_1,θ_3)]. The posterior probability of H_2 = π_{2n} + π_{3n} will approach zero as the sum of these, i.e., like exp[-nI(θ_1)] where

    I(θ_1) = min[I(θ_1,θ_2), I(θ_1,θ_3)].

When H_1 is true the experimenter should select an experiment which maximizes I(θ_1).
The following example is informative.

Example 7. We have three experiments with the I(θ_1,θ_i) described in the following table.

                  e_1    e_2    e_3
    I(θ_1,θ_2)     6      3      2
    I(θ_1,θ_3)     2      3      6
    I(θ_1)         2      3      2

Of these three experiments, e_2 will maximize I(θ_1). However, if the statistician were to randomize and select e_1 and e_3 each with probability 1/2, we would have I(θ_1,θ_2) = I(θ_1,θ_3) = I(θ_1) = 4. This suggests a potential advantage of selecting experiments by using randomization.
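A short check of the maximin calculation, using the table above (λ is the probability of choosing e_1 rather than e_3 in the randomized experiment):

```python
# Maximin information I(theta1) for the pure experiments of Example 7 and for
# randomized mixtures of e1 and e3.
import numpy as np

I12 = {"e1": 6.0, "e2": 3.0, "e3": 2.0}
I13 = {"e1": 2.0, "e2": 3.0, "e3": 6.0}
print({e: min(I12[e], I13[e]) for e in I12})          # I(theta1) for e1, e2, e3

lam = np.linspace(0.0, 1.0, 1001)
vals = np.minimum(lam * I12["e1"] + (1 - lam) * I12["e3"],
                  lam * I13["e1"] + (1 - lam) * I13["e3"])
k = vals.argmax()
print("best mixture of e1 and e3: lambda = %.2f, I(theta1) = %.1f" % (lam[k], vals[k]))
```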
D. A Proposed Plan for the Sequential Design of Experiments

Assembling the suggestions of the preceding section, we propose a procedure for the problem of sequential design of experiments formulated as follows:

Let E be a set of available elementary experiments e whose outcome X_e has density f(x|θ,e) with respect to a measure μ_e. Let H_1: θ ∈ Θ_1 and H_2: θ ∈ Θ_2 be two hypotheses, where Θ_1 and Θ_2 are disjoint sets whose union Θ represents the set of possible states of nature. Let r(θ) be the cost of deciding incorrectly and let c be the cost per observation. After each observation a decision is made to stop or continue sampling. If sampling is stopped, either H_1 or H_2 is accepted. If sampling is continued, a new experiment e_{n+1} ∈ E is selected. While its choice may depend on the past data (X_1, X_2, ..., X_n), once the experiment is selected its outcome is independent of the past.
To describe the proposed solution, let E* be the set of randomized experiments generated by E. For e ∈ E*,

    Y(θ,φ,e) = log[f(X_e|θ,e)/f(X_e|φ,e)]

has expectation I(θ,φ,e) and variance v(θ,φ,e) when θ is the true state of nature. Let Y_n(θ,φ) = log[f(X_n|θ,e_n)/f(X_n|φ,e_n)],

    S_n(θ,φ) = Σ_{i=1}^{n} Y_i(θ,φ),

and let θ̂_n be the maximum-likelihood estimate (m.l.e.) of θ based on X_1, X_2, ..., X_n. Let h(θ) = Θ_i and a(θ) = Θ - Θ_i if θ ∈ Θ_i, i = 1, 2, and let θ̃_n be the m.l.e. of θ based on X_1, ..., X_n when θ is restricted to a(θ̂_n). Here h(θ) is identified with the "hypothesis of θ" and a(θ) with the "alternative to the hypothesis of θ", and θ̃_n is the m.l.e. on the alternative to the hypothesis of θ̂_n.

We describe our sequential procedure, Procedure A: stop sampling and accept the hypothesis h(θ̂_n) if

    S_n(θ̂_n, θ̃_n) > -log c;

otherwise select e_{n+1} ∈ E* to maximize

    inf_{φ ∈ a(θ̂_n)} I(θ̂_n, φ, e).
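A compact sketch of Procedure A for a finite Θ and a finite set of Bernoulli experiments. The states, hypotheses, success probabilities and cost c below are illustrative assumptions, and the maximization is taken over pure experiments only (the randomized set E* is omitted):

```python
# Procedure A with Theta = {th1, th2, th3}, H1 = {th1}, H2 = {th2, th3}, and
# Bernoulli experiments e1, e2.  All numerical values are illustrative.
import math, random

thetas = ["th1", "th2", "th3"]
hypothesis = {"th1": "H1", "th2": "H2", "th3": "H2"}
experiments = ["e1", "e2"]
p = {("e1", "th1"): 0.8, ("e1", "th2"): 0.4, ("e1", "th3"): 0.7,
     ("e2", "th1"): 0.7, ("e2", "th2"): 0.6, ("e2", "th3"): 0.2}
c = 1e-4                                    # cost per observation

def loglik(x, e, th):
    q = p[(e, th)]
    return math.log(q if x == 1 else 1.0 - q)

def kl(e, th, phi):                         # I(theta, phi, e) for Bernoulli outcomes
    a, b = p[(e, th)], p[(e, phi)]
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def procedure_A(true_theta, rng):
    L = {th: 0.0 for th in thetas}          # accumulated log-likelihoods
    n = 0
    while True:
        th_hat = max(thetas, key=L.get)                       # m.l.e. of theta
        alt = [phi for phi in thetas if hypothesis[phi] != hypothesis[th_hat]]
        th_til = max(alt, key=L.get)                          # m.l.e. on the alternative
        if L[th_hat] - L[th_til] > -math.log(c):              # S_n > -log c: stop
            return hypothesis[th_hat], n
        e = max(experiments, key=lambda e: min(kl(e, th_hat, phi) for phi in alt))
        x = 1 if rng.random() < p[(e, true_theta)] else 0
        for th in thetas:
            L[th] += loglik(x, e, th)
        n += 1

print(procedure_A("th1", random.Random(0)))
```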
On the basis of our motivation one would anticipate that the procedure would be asymptotically optimal and that its risk would be asymptotically

    R(θ) ≈ -c log c / I(θ)

where

    I(θ) = sup_{e ∈ E*} inf_{φ ∈ a(θ)} I(θ, φ, e).

Indeed, the following theorem holds:
Theorem. If E and Θ are finite and for every e ∈ E, I(θ,φ,e) > 0 and v(θ,φ,e) < ∞ for all θ and φ ∈ Θ with θ ≠ φ, and I(θ) > 0, then the risk R(θ) for Procedure A satisfies

    R(θ) ≈ [1 + o(1)] {-c log c}/I(θ)   for all θ,

and any procedure for which the risk R*(θ) = O(-c log c) for all θ satisfies

    R*(θ) ≥ [1 + o(1)] {-c log c}/I(θ)   for all θ.

In short, this theorem states that Procedure A is optimal in the sense that to do substantially better for any state of nature, one must do worse by an order of magnitude for some states of nature.

This result has been generalized considerably. It has two major shortcomings. It breaks down when points of Θ_1 and Θ_2 can get close in the sense of I(θ_1,θ_2) → 0. And it is not very dependable in the early stages of experimentation and requires supplementation for problems which will require moderate sample sizes.
IV. Continuous Time Sequential Problems

A. Sequential Problems

We list three problems.
1) Deciding the sign of the mean of a normal distribution. Let X_1, X_2, ..., X_n, ... be i.i.d. N(μ, σ²) with σ² known. It is desired to test H_1: μ > 0 vs H_2: μ ≤ 0. The SPRT was designed to test a simple hypothesis versus a simple alternative and could be modified for composite hypotheses. However, such modifications are inadequate for this case, where points of the alternative hypotheses are close to one another. The classical approach consists of assuming the existence of an indifference zone (μ_1, μ_2) within which it doesn't matter what decision is made, and testing H_1′: μ = μ_1 vs H_2′: μ = μ_2. This approach leads to unduly large sample sizes for μ = (μ_1 + μ_2)/2 and is theoretically inadequate if the cost of sampling is small and there is a positive cost of wrong decision for all μ ≠ 0.

We complete the statement of the problem in a decision theoretic form and apply a Bayesian approach to state a clean cut optimization problem. The cost of sampling is c per observation. The cost of a wrong decision is r(μ) = k|μ|. Finally we assume that μ has the N(μ_0, σ_0²) prior distribution. We seek a sequential decision procedure which minimizes the expected cost.
2) One armed bandit. Let X_1, X_2, ..., X_M be i.i.d. N(μ, σ²), with σ² known. A player receives X_1 + X_2 + ... + X_N if he stops at N = n ≤ M, where the stopping time N may be based on the past data (X_1, ..., X_N). The unknown mean μ has the N(μ_0, σ_0²) prior distribution.
This problem is a normal version of the one armed bandit problem, where a player may play up to M times on a one armed bandit in an environment where most "bandits" may be unfavorable but it is known that some are favorable.
3) Canonical problem. Let Y_n be a process starting at y_0 for n = n_0, where n_0 is a negative integer. As n increases, Y_{n+1} = Y_n + u_n, where the u_n are independent N(0,1) random variables. A player may stop with no payoff or, at a cost of one, increase n by one. If he continues till n = 0, the game ends and he receives nothing if Y_0 < 0 and Y_0² if Y_0 ≥ 0.
B. Bayesian Analysis

In Problems 1 and 2, the data X_n = (X_1, X_2, ..., X_n) and the prior N(μ_0, σ_0²) give rise to the posterior distribution

    L(μ|X_n) = N(Y_n, s_n)

where

    Y_n = s_n(σ_0⁻² μ_0 + n σ⁻² X̄_n)   and   s_n = (σ_0⁻² + n σ⁻²)⁻¹

is decreasing in n. It is possible to show that, as data cumulate, Y_n, the Bayesian estimate of μ, changes according to independent increments, and

    L(Y_n - Y_m | Y_m) = N(0, s_m - s_n)   for n ≥ m.

In Problem 1, the posterior risk upon stopping at N = n is d_1(Y_n, s_n), where

    d_1(y, s) = k s^{1/2} ψ(y s^{-1/2}) + c σ² s⁻¹ - c σ² σ_0⁻²

and

    ψ(u) = φ(u) - u[1 - Φ(u)]   for u ≥ 0
         = φ(u) + u Φ(u)        for u ≤ 0.
Thus our sequential Problem 1 has been reduced to the stopping problem of observing the Gaussian stochastic process {Y_n; n = 1, 2, ...} of independent increments and stopping with cost d_1(y, s) if stopping takes place at Y_n = y, s_n = s. The object is to find the stopping rule which minimizes the expected stopping cost E[d_1(Y_N, s_N)].
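A small sketch of this reduction: the posterior (Y_n, s_n) and the stopping cost d_1, with illustrative prior parameters and costs:

```python
# Posterior N(Y_n, s_n) after n observations and the stopping cost d1(y, s)
# of Problem 1.  The prior, costs and data summary below are illustrative.
import math
from scipy.stats import norm

sigma, mu0, sigma0, k, c = 1.0, 0.0, 2.0, 1.0, 0.01

def posterior(xbar, n):
    s = 1.0 / (sigma0 ** -2 + n * sigma ** -2)                    # s_n
    return s * (mu0 * sigma0 ** -2 + n * xbar * sigma ** -2), s   # (Y_n, s_n)

def psi(u):
    if u >= 0:
        return norm.pdf(u) - u * (1 - norm.cdf(u))
    return norm.pdf(u) + u * norm.cdf(u)

def d1(y, s):
    # decision loss k|mu| on the wrong side of 0, plus the sampling cost
    # already incurred, c * sigma^2 * (1/s - 1/sigma0^2)
    return k * math.sqrt(s) * psi(y / math.sqrt(s)) + c * sigma ** 2 * (1 / s - sigma0 ** -2)

y, s = posterior(xbar=0.3, n=25)
print("Y_n = %.3f  s_n = %.4f  d1 = %.4f" % (y, s, d1(y, s)))
```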
A similar analysis shows that Problem 2 also reduces to a stopping problem, with a corresponding stopping cost d_2(Y_n, s_n) for 0 ≤ n ≤ M. Here stopping is enforced at n = M if it has not taken place earlier. Problem 3 is also easily seen to be a stopping problem with {Y_n; n = n_0, n_0+1, ..., 0}, s = -n, and stopping cost

    d_3(y, s) = -(y² + s)   if s = 0 and y ≥ 0
              = -s          otherwise.
C. Continuous Time Version of Stopping Problems

Each of the three problems of Section A has a continuous time version. These versions are of interest in themselves. The solutions of the discrete time problems are approximated by those of the continuous versions, which are more readily subject to mathematical analysis.

Problem 1*. We observe X(t), a Wiener process (Gaussian process of independent increments) with drift μ per unit time and variance σ² per unit time; i.e., E[dX(t)] = μ dt, Var[dX(t)] = σ² dt. Once again the prior distribution is N(μ_0, σ_0²) and the cost of wrong decision is r(μ) = k|μ|.

Here the posterior distribution of μ given X(t′), 0 ≤ t′ ≤ t, is

    L(μ | X(t′), 0 ≤ t′ ≤ t) = N(Y, s)

where

    Y = Y(s) = s(σ_0⁻² μ_0 + σ⁻² X(t))   and   s = (σ_0⁻² + t σ⁻²)⁻¹.

The process Y(s) is a Wiener process in the -s scale starting at (Y(s), s) = (μ_0, σ_0²), i.e., E[dY(s)] = 0 and Var[dY(s)] = -ds.

The Bayes strategy for this problem is the solution of the continuous time stopping problem involving {Y(s): σ_0² ≥ s > 0} with stopping cost d_1(y, s). The transformation s* = a²s, y* = aY permits one to normalize this problem to one with stopping cost

    d_1*(y, s) = s⁻¹ + s^{1/2} ψ(y s^{-1/2}).

Problem 2*. The continuous time version of the one armed bandit problem is similar to Problem 1* except for the stopping cost d_2(y, s), which can be normalized to

    d_2(y, s) = -y/s   for s ≥ 1,

with stopping imposed at s = 1.

Problem 3*. The continuous time version of this problem involves observing {Y(s): s_0 ≥ s ≥ 0} with Y(s_0) = y_0, stopping imposed at s = 0, and stopping cost

    d_3(y, s) = -(y² + s)   if s = 0 and y ≥ 0
              = -s          otherwise.
D. Free Boundary Problem

In this section we shall relate the optimal solution of continuous time stopping problems to that of a free boundary problem involving the heat equation.

First let b(y, s) represent the risk or expected cost for a specified stopping rule given that the stochastic process Y has reached Y(s) = y. That is

    b(y, s) = E{d[Y(S), S] | Y(s) = y}

where S ≤ s is a stopping time (identified with the rule). Let ρ(y, s) be the optimal risk achievable given Y(s′) for s′ ≥ s and Y(s) = y. Since Y is a process of independent increments, ρ depends on Y(s′) only through (y, s). Then ρ(y, s) ≤ d(y, s), and for a procedure to be optimal it must call for stopping only when ρ(y, s) = d(y, s). Thus we may confine attention to stopping rules which are determined by a stopping set S and the complementary continuation set C in the (y, s) plane.
Clearly, b(y, s) = d(y, s) for (y, s) ∈ S. For (y, s) an interior point of C,

    b(y, s+δ) = E{b(Y(s), s) | Y(s+δ) = y},

which, together with the fact that the conditional distribution of Y(s) is normal with mean y and variance δ, yields the heat equation

    ½ b_yy = b_s   for (y, s) ∈ C.

A somewhat more delicate analysis shows that the optimality condition for the stopping rule reduces to smoothness of b across the boundary of C. Thus the problem of finding an optimal stopping set S is related to that of finding the solution (ρ, C) of the following free boundary problem (f.b.p.):

    ½ ρ_yy = ρ_s         on C
    ρ(y, s) = d(y, s)    on S
    ρ_y = d_y            on B (smooth fit),

where B is the boundary of C and S is the complement of C.
The relation between the optimization problem and the free boundary problem permits one to apply the methods of partial differential equations. Three major techniques have been found to be useful when closed form solutions are not available.

One of these, due to Bather (1962), is to generate solutions of the heat equation and to find what stopping problems they solve. By relating these stopping problems to ours, we find bounds on the solution to our problem. A simpler variation is the following:

Let us consider the normalized version of Problem 1* with

    d_1(y, s) = s⁻¹ + s^{1/2} ψ(y s^{-1/2}).

A special solution u_K(y, s) of the heat equation, depending on a constant K, can be written down explicitly. For K > 3(π/2)^{1/3}, the equation u_K(y, s) = d_1(y, s) determines the curves ±ỹ_1*(s) for s ≥ s_K, with ỹ_1*(s_K) = 0. Then let

    b_1(y, s) = u_K(y, s)   on C = {(y, s): s > s_K, |y| < ỹ_1*(s)}
              = d_1(y, s)   on S,

and b_1 is the risk function corresponding to the stopping rule determined by the continuation set C. This implies that the optimal risk ρ_1* ≤ b_1. Thus any point (y, s) for which b_1(y, s) < d_1(y, s) is one for which ρ_1*(y, s) < d_1(y, s), and hence a point of the optimal continuation set.
The function b_2(y, s) defined by

    b_2(y, s) = -y                   for s > 1, y > 0      (C_2**)
              = d_2(y, s) = -y/s     for s ≤ 1 or y ≤ 0     (S_2**)

is a solution of the ordinary boundary value problem for d_2(y, s). Hence b_2 is the risk for the suboptimal procedure C_2**. For any point (y, s) for which b_2 < d_2, the optimal risk ρ_2 for Problem 2* will certainly satisfy ρ_2 < d_2 and (y, s) will be in C_2, the optimal continuation set for d_2. But b_2 < d_2 for y > 0 and s > 1. Hence

    C_2 ⊃ {(y, s): s > 1, y > 0}.
A second method, related to the first, consists of developing asymptotic expansions for large s and for small s. Thus for Problem 1*, the symmetric solution has the property

    ỹ_1(s) s^{-1/2} ~ {log s³ - log 8π - 6(log s)⁻¹ + ···}^{1/2}                as s → ∞,
    ỹ_1(s) s^{-1/2} ~ (0.25) s^{3/2} {1 - (1/12)s³ + (7/(15·16))s⁶ - ···}       as s → 0,

where ±ỹ_1(s) are the boundaries of the optimal continuation set. For Problem 2*, the optimal continuation set lies above the curve ỹ_2(s) for which

    α̃ = Φ[s^{-1/2} ỹ_2(s)] ≈ 2s⁻¹   for s → ∞,    ≈ 1/2   for s → 1.
Note that α̃ can be interpreted as a nominal significance level which leads to termination, and s⁻¹ represents the proportion of the total potential information that is available at the time.

A third approach is to use numerical methods. These will be described in the next section. In the meantime let us observe that Problem 3* has a trivial solution, which is given by

    ρ(y, s) = -(y² + s)   for y ≥ 0
    ρ(y, s) = -s          for y ≤ 0.

Then ρ is a solution of the f.b.p. and the optimal procedure is to stop whenever Y(s) ≤ 0.
E. Numerical Approximation

The continuous time stopping problems were introduced to furnish analytical descriptions of approximations to the discrete time problem. Closed form solutions are rarely available and numerical methods are desirable. One natural numerical method consists of approximating the continuous time problem by a related discrete time version where stopping is permitted only at closely spaced discrete time points. The latter problem may be solved using backward induction.

We seem to have gone in a logical circle. This is not completely the case. First, the excursion to continuous time has permitted us to construct bounds and expansions. Second, a single discrete approximation can be used to approximate the continuous problem, which in turn is an approximation to a whole class of discrete time problems with a given normalized version.

A direct approach to this numerical solution by backward induction has two shortcomings. First, this approach involves many numerical integrations corresponding to expectations with respect to normal densities. Second, a doubling of accuracy involves 2³ times as many calculations.
We shall reduce the integration problem by replacing the Gaussian process by the discrete process

    Y(s-δ) = Y(s) + √δ   w.p. 1/2
           = Y(s) - √δ   w.p. 1/2.

This process has independent increments with mean 0 and variance 1 per unit time. The backward induction equation which replaces the numerical integrations when the Wiener process is used is the relatively simple

    ρ(y, s+δ) = min[d(y, s+δ), ½ρ(y+√δ, s) + ½ρ(y-√δ, s)].

When δ is small, this process will approximate the continuous time solution.
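A sketch of this backward induction for the normalized Problem 1*, with d(y, s) = s⁻¹ + √s ψ(y/√s). The grid spacing, the truncation of the y and s ranges, and the treatment of the grid edges are choices of this sketch:

```python
# Backward induction over the +/- sqrt(delta) random walk approximation.
import numpy as np
from scipy.stats import norm

def psi(u):
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0,
                    norm.pdf(u) - u * (1 - norm.cdf(u)),
                    norm.pdf(u) + u * norm.cdf(u))

def d(y, s):                                   # normalized stopping cost of Problem 1*
    return 1.0 / s + np.sqrt(s) * psi(y / np.sqrt(s))

delta = 0.05                                   # step in s; the y-grid spacing is sqrt(delta)
h = np.sqrt(delta)
ys = np.arange(-12.0, 12.0 + h / 2, h)
s = 0.2
rho = d(ys, s)                                 # start the induction at the smallest s
boundary = []
while s < 8.0:
    s += delta
    cont = 0.5 * (np.roll(rho, 1) + np.roll(rho, -1))   # E[rho(y +/- sqrt(delta), s)]
    stop = d(ys, s)
    cont[[0, -1]] = stop[[0, -1]]                       # crude treatment of the grid edges
    rho = np.minimum(stop, cont)
    inside = ys[cont < stop - 1e-12]                    # continuation points at this s
    boundary.append((s, inside.max() if inside.size else 0.0))

for s_val, y_b in boundary[::40]:
    print("s = %5.2f   approximate boundary  y ~ %5.2f" % (s_val, y_b))
# A finer grid, together with the 0.5*sqrt(delta) correction described below,
# sharpens these boundary values.
```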
The second problem is alleviated considerably by two facts. First, for all the problems with which I have dealt, this backward induction gives surprisingly good approximations with very coarse grids (δ, √δ) in the s and y scales. Secondly, there is a correction for the discreteness which effectively increases the accuracy of the method considerably, thereby replacing the cube factor for refinement by something very close to linear. Thus this method is very useful for practical purposes, although it still leaves something to be desired for those who require highly accurate solutions.
The correction factor can be derived by an argument which points out that
to a "small
statistician" located near the boundary of the continuous time
solution, the stopping problem very much resembles Problem 3*.
Then the relation
between the corresponding discrete and continuous time solutions of Problem 3*
provides a good approximation to the difference between the discrete and continuous time solutions of the general stopping problem.
We pose Problem 3**. Let Y**(s) be such that

    Y**(s) = Y**(s+1) ± 1   w.p. 1/2 each,   s = 0, 1, 2, ...,

and let d_3(y, s) be the stopping cost. Find the optimal stopping procedure and the associated risk. The optimal strategy is to stop when Y**(s) ≤ ỹ_3*(s), where ỹ_3*(s) is monotone decreasing in s and approaches -1/2.
Hence a first correction to the solution of a continuous time stopping problem consists of correcting the boundary of the discrete approximation by expanding the continuation region boundary by 0.5√δ along the y direction.
This correction is rather crude because it represents 1/2 the grid size in the y direction, and there is a little vagueness about where the boundary belongs if two neighboring grid points are such that one is a stopping point and the other a continuation point for the discrete problem. This difficulty is resolved by studying the nearby risk values and comparing the optimal continuous time and discrete time risks for the canonical problem as s → ∞. The continuous time optimal risk is -y² - s for y ≥ 0 and -s for y ≤ 0. The discrete time optimal risk, for large s, behaves like v(y) - s for y ≥ -.5 and -s for y ≤ -.5, where

    v(y) = -y² + inf{(y + j)²: j an integer}.

This leads to an approximation in which the location of the boundary is interpolated from the risk values at y_0 and y_1, the two continuation points closest to the optimal boundary of the discrete problem.
REFERENCES

Bather, J. A., Bayes procedures for deciding the sign of a normal mean, Proc. Cambridge Philos. Soc. 58 (1962), pp 599-620.

Breakwell, J. V. and H. Chernoff, Sequential tests for the mean of a normal distribution II (large t), Ann. Math. Statist. 35 (1963), pp 162-173.

Chernoff, H., Locally optimal designs for estimating parameters, Ibid., 24 (1953), pp 586-602.

________, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Ibid., 23 (1952), pp 493-507.

________, Sequential analysis and optimal design, SIAM, Philadelphia, Pennsylvania, 1972.

________, Sequential design of experiments, Ann. Math. Statist. 30 (1959), pp 755-770.

________, Sequential tests for the mean of a normal distribution, Proc. Fourth Berkeley Symp. Math. Statist. 1 (1961), pp 79-91.

________, Sequential tests for the mean of a normal distribution III (small t), Ann. Math. Statist. 36 (1965), pp 28-54.

________, Sequential tests for the mean of a normal distribution IV (discrete case), Ibid., 36 (1965), pp 55-68.

________, and A. J. Petkau, An optimal stopping problem for sums of dichotomous random variables, Ann. Probab. 4 (1976), pp 875-889.

________, and S. N. Ray, A Bayes sequential sampling inspection plan, Ibid., 36 (1965), pp 1387-1407.

Elfving, G., Optimum allocation in linear regression theory, Ann. Math. Statist. 23 (1952), pp 255-262.

Wald, A., Sequential Analysis, John Wiley, New York, 1947.