Bohrer, R.E.; (1965)On Bayes sequential design of experiments."

I
I.
I
I
I
I
I
I
I
ON BAYES SEQUENTIAL DESIGN OF EXPERIMENTS
by
R. E. Bohrer
Research
Institute
and
University of North Carolina at Chapel Hill
Triangl~
Institute of Statistics Mimeo Series
No. 442
August 1965
I
I
I
I
I
I
I
,.
I
This research was supported by the Air Force Office of Scientific
Research under Contract AF 49(638)-1544 and Grant AF-AFOSR-760-65,
at different times. Much of this work appeared previously as an
uncirculated report under the above contract.
DEPARTMENT OF STATISTICS
UNIVERSITY OF NORTH CAROLINA
Chapel Hill, North Carolina
I
I.
I
I
I
I
I
I
I
I
I
I
I
I
I
I
,.
I
ii
TABLE OF CONTENTS
Chapter
Page
ACKNOWLEDGEMENTS
iv
I
INTRODUCTION AND DISCUSSIuN OF RESULTS
1
II
FORMlJIATION OF THE PROBLEM AND SOME GENERAL RESULTS
5
III
2.1
Elements of the problem
5
2.2
Extension of some sequential, non-design results
to sequential design problems
8
AN EXAMPLE: PAULSON t S PROBLEM WITH UNIFORM ALTERNATIVES
21
3.1 Framework
21
3.2 The non-sequential, non-design (ns-nd) case
23
3.3 The non-sequential, design (ns-d) case
24
3.4 The sequential, non-design (s-nd) case
26
3.5 The sequential design (s-d) case
28
3.6 Relation of the Bayes d r
in
~
co
and Chernoff's
d r
3.7 Numerical comparisons
IV
32
*
36
THIi: EXPERIMENTATION RULE X
4.1 Introduction
36
4.2 X* experimentation rules
36
4.3 A result for general convex stopping sets
c , I+c'
1)
4. 4 A partial characterization of the X*( I+c'
37
rule for small c
4.5 Asymptotic optirna.lity of
v
31
5:
39
42
AN APPLICATION: THE CASE OF LOST IABEU3 AND THE NO-OVERSHOOT
*
48
X RULE
5.1 Introduction
48
5.2 Specification of the no-overshoot X* rule
in ~(c2,c1)'
X*
I
I.
I
I
I
I
I
I
I
I_
I
I
I
I
I
I
I
••
I
iii
5.3 Some Fourier analysis
53
5.4 Some characteristics of the Bayes stopping rule
57
5.5
58
An example
BIBLlOORAPHY
69
nIDEX OF NarATION
71
I
iv
I-
I
I
I
ACKNOWLEDGEMENTS
I wish to express my thanks to Professors W. Hoeffding, W. J. Hall, and
M. R. Leadbetter for their guidance and encouragement throughout;
J. S. MacNerney and W. M. Whyburn for their attempt to teach me to teach me
I
mathematics;
I
I
I
Welfare for financing my graduate study and thesis research;
I
I
I
I
I
I
I
.I
to Professors
to the Research Triangle Institute, Air Force Office of Scientific
Research, University of North Carolina, and Department of Health, Education, and
Carrie for coexisting and helping with it.
and to Joyce and
I
I.
I
I
I
I
I
I
I
CHAPTER I
INTRODUCTION AND DISCUSSION OF RESULTS
The study of sequential design of experiments, as discussed, for example,
by Chernoff [3J, is related to the more fully developed study of sequential analysis, which I will call sequential, non-design (s-nd) analysis to emphasize the
relationship.
In both cases, observations on a process of interest are taken
in a sequence of trials; after each trial, the experimenter decides, on the basis
of observations at hand, whether to take additional observations or to stop and
take some action, i.e., terminal decision, concerning the state of nature
erning the process.
e
gov-
The difference is that in s-nd analysis, if one decides to
continue observation, he repeats the same experiment at each trial.
In the se-
quential design (s-d) situation, he has a class E of possible experiments from
which to choose the next experiment; this choice may be based on previous observations.
The experimenter thus has somewhat greater freedom in the s-d case; his
goal is to formulate an experimentation rule (e r) which capitalizes on this.
I
I
I
I
I
I
I
••
I
Thus the s-d formulation
generali~es
the s-nd formulation of problems in
the same way as s-nd analysis generalizes fixed sample-size analysis.
a decision rule (d r) with fixed sample size n is a s-nd
That is,
d r, viz., one in which
the experimenter decides to stop making trials when, and only when, n trials have
been made.
Similarly, a s-nd
d r is a s-d
d r in which the experimenter, if
he decides to make another trial, decides to use the same experiment as at all
previous trials.
class of s-nd
Thus, the class of fixed sample-size d r is a subclass of the
d r, which is, in turn, a subclass of the class of s-d
therefore follows that, for any given problem, the best s-d
good as the best s-nd
d r.
It
d r is at least as
d r, which is itself at least as good as the best fixed
sample-size d r.
In Chapter 2, extensions of several results of s-nd theory to the s-d case
are cited.
Among these is the fact that the Bayes risk p(g) of a s-d procedure
I
2
with prior distribution
~
p(~)
satisfies, subject to boundary conditions.
e
inf
€
where c is the cost per trial,
J pU
E Xe'
~e,x
+ c,
x) f t (x) d\)(x)
s,e
is the posterior distribution of
trial with experiment e has outcome x, and f t (x) is the
s,e
experiment e, averaged with respect
to~.
(1) ,
e
given the
density of x, using
The solution to (1) in a very special
decision problem is obtained in Chapter 3.
Comparisons are made there of the
risks of the best fixed sample-size, s-nd, and s-d
d r, so that, in this case,
the average saving gained by using s-d methods is assessed.
In cases where solution to (1) is possible, the Bayes e r is specified as
~,
the infimum in (1) is attained.
(If it cannot be attained for some
I
I
I
I
I
follows:
If the posterior distribution is
--I
then do that experiment for which
~,
I
then Bayes rules may not exist and one may have to be satisfied with
€-Bayes rules.)
In most cases, however, solution of (1) is quite difficult and not yet accomp1ished.
For these cases, it is of interest to find a
d r for which the risk
is relatively small, e.g., as compared with the risk of the best s-nd procedure.
Two results mentioned in Chapter 2 simplify this problem by showing that
if a Bayes rule exists, attention can be restricted to a greatly simplified class
of d r.
First, it can be deduced from the form of (1) that a Bayes rule --- if
one exists --- can be defined in such a way that it depends on observations only
through the posterior distribution which results from those observations.
With
this form of the Bayes rule, the rules concerning when to stop observation and
which action to take define "action sets" in the space of possible posterior
distributions, i.e., sets of posterior distributions for which the Bayes rule
says to stop and take a given action.
These Bayes action sets are convex
pointsets.
In Chapter 4, an
e r
for problems where E is finite, two states of
nature are possible, and two actions are available is investigated.
cal
In c1assi-
inference terminology, this is the case of testing a simple hypothesis
I
I
I
I
I
I
I
-.
I
I
3
I.
I
I
I
I
I
This e r, called the X* e r, is proposed for use
against a simple alternative.
with given, convex action sets.
risk for prior distribution
and does experiment e
~,
~
It is defined as follows.
of the s-nd
E at each trial.
E
X* uses an experiment
e(~)
Let
r(~;e)
be the
d r which uses the given action sets
When the posterior distribution is
which satisfies
r(~,
e(~)
)
r(~,e).
inf
e E E
That is, X* acts at each trial as if a single experiment from E were going to be
used at all succeeding trials; it selects the experiment which would be preferred
in this circumstance.
The definition
special case by Haggstrom [5].
of this rule has been investigated in a
Two general results for the two-action and two
I
state of nature case are proved in Chapter 4 here.
I
and the proposed e r X .
*
First, let p * be the risk function of the d rousing
the given action sets
*
'I.
Let P be the risk function for the d r
n
o* which uses
n
the same e r through the first n trials and the same action sets as 0* , but which
uses the same experiment at
each trial t for t
> n.
Then Theorem 4.3 states
that Pn*+1 <
P*
and that p*
n
I
I
I
I
I
I
I
,.
I
* both under quite general conditions.
lim Pn'
n=oo
Second, an asymptotic result is established, for the case where E is finite,
'I.
by Theorem 4.5.2. If the cost per trial is c, let 0c be the d r which stops as soon
as the posterior probability of either state of nature is ~ c/(l+c) and which
*
uses the e r X.
at
~
If P* (~,c) is the risk of 0* at
c
~
and
p(~,c)
the Bayes risk
for cost c, then
*
lim p (E"c)
c=o -c log c
lim
c=o
p(Lc)
-c log c
Thus, 0* achieves the minimum limiting risk as c approaches 0; i.e. , 0*c is
c
asymptotically optimal in the sense of Chernoff [3].
The d r
sense.
0* is certainly not the only asymptotically optimal one in this
c
There are, among others, such rules proposed by Chernoff [3], Abramson
[1], and Kiefer and Sacks [9].
The reason for consideration of the X*
e r is
its seemingly reasonable definition for any size cost c per trial and not just
I
4
asymptotic
optimality.
This is made precise by Theorem 4.3, whick says that for
any size cost the risk of the best d r using e r
best s-nd
X* is less than that of the
d r, since this latter risk is the minimum, over all choices of convex
*
action sets, of PO'
In Chapter 5, the case of "lost labels" is considered.
This problem has
the same general structure as the "two-anned bandit" problem [4} except in its
goal.
The goal here is to decide what the true state of nature is, whereas the
two-anned bandit one's goal is to use the "better" experiment most often.
this problem, an approximate X* rule is specified very simply.
For
For a binomial
case with special values for cost,a X* d r is shown to be a Bayes rule and its
risk is compared with that of Chernoff's rule.
It is of interest to note that,
although the goals in the two-anned bandit and lost labels problems may seem
related, in this binomial case the experimentation rules are seen (by comparing
results of [4) with those of Chapter 5) to be exactly opposite.
--I
I
I
I
I
I
I
I
I
I
I
I
I
I
-.
I
I
I.
I
I
I
I
I
I
CHAPTER II
FORMULATION OF THE PROBLEM AND SOME GENERAL RESULTS
2.1
Elements of the Problem.
Let E be the class of available experiments
e and X the class of possible observations x using experiments in E.
There is a a-field j containing each
Assumption.
and a
a-field
e containing
Notation.
!n
~
single-point subset of X
each single-point subset of E.
denotes the infinite, real vector with ith component zi and
denotes the (n x 1) real vector with ith component zi'
real vector with ith component the ith component of z.
Z
(z) is the (m x 1)
-m -
The "vector" z
-0
will
appear as a subscript in defining d r; it should be read as the subscript
° in
these places.
I
The following five elements are basic in formulating a problem in decision
theory.
(1)
I
I
I
I
I
I
I
••
I
Sample space, ZOo
It is convenient to work with the sample space
For interpretation and notation, (z2p_l,z2p)
the p-th stage of experimentation when e
p
(e ,x ) represents a trial at
p
p
--
is the experiment and x
p
the outcome
(I)
Alternatively, Zo is written as the countable Cartesian product
Z is E for p odd and X for p even.
p
Z(n) denotes the product
X Z , where
p=l p
CD
X
p=n+l
e
P
e.
(2)
A class
(3)
A class A of available actions or terminal decisions (t d) a.
(4)
A non-negative loss function L defined on
incurred when
(5)
e
of possible states of nature
Z .
e x A.
L(8,a) is the loss
is the true state of nature and action a is taken.
A cost-of-sampling or
~
function.
Throughout, the work assumes
that each trial is performed at cost c> 0, i.e., that sampling cost is a multiple c of sample size.
Definition.
An experimentation rule X is a class of probability measures
I
6
e: E & x e: X for 1 < p < n} together with a probabil i ty measure X ' each
O
p
-defined on the 'measurable space (E,e). If SEe, then x
(S) gives the proba{X
: e
~2n
p
~2n
bi1ity, using e r X, that an experiment in S is used in trial n+l given the previous trials
~2n;
XO(S) is the X-probability that an experiment in S is used at
the first trial.
Assumption.
The outcome of experiment e when
determined by the probability measure
There is a measure
D
e
is the state of nature is
De ,e ,independently
which dominates each measure in (U
Remark and notation.
used where convenient.
Thus, the
5o denotes
density
of previous trials.
e. , e : a E e
functions f
e ,e
& e E E) .
= dUe
,e
/du are
the a-algebra generated by the measurable
rectangles [6] of ZOo
With the probability structure given, a probability function
measurable rectangles in Zo is defined.
nature, e r X is used, and S =
2m
S x Z
p
X
p=l
a is
Specifically, if
(2m)
, X from
the
the state of
2p-l
dX
(e )
z
p
-2p-2
m
dX
~2m-2
Remark.
X and write
for
~e
, X for
I t can be shown flO
with
~a
(e )
m
dU(x )
m
x
n fa
p=l
,e p
(x).
p
Except when confusion will result, I suppress the dependence on
~e
on the
brevity.
J that there is a unique measure on (Zo' 50) which agrees
measuTable rectangles.
This measure will also be denoted by
~a=~a,x'
Definition.
A decision rule (d r) 0 for a sequential design problem con-
sists of three classes of probability measures, viz., an experimentation rule
(e r)X, a stopping rule (s r)*, and a terminal decision rule (t d
(A,I1) be a measurable space.
n > O· e
-
,
p
E E & x
p
Then a
I
I
I
I
I
I
is a measurable rectangle, then
J
S
~e
.-I
r)~.
Let
t d r ¢ is a class of probability measures
E X for 1 :::; p :::; n} .
For SEA, ~
~2n
(S) is the 0 -prob-
I
I
I
I
I
I
I
••I
I
7
I.
I
I
I
I
I
I
I
ability that a
t d in S is made, given the n trials
of binomial probabi lity measures (W
If n trials
Zz
-n
n > 0; e
~Zn
have been performed, then W
qn
is discontinued after the nth trial.
Remark and notation.
fication of the s r.
p
€
~2n'
A s r
E & xp
€
W is a class
X for 1 ~ p ~ n} .
is the 5-probability that sampling
E r were defined previously.
It is convenient to introduce an alternative speci-
For ~
€
ZO' define (rrn(~):
n ~ O} by rrO(~) = W and
o
() nITl (l-W
(».
Thus rrn(~) is the 5-probability that,
p= l
Zz
Z
pwith sample point ~,0 terminates after exactly n trials. Any s r specifies a
rr (z)
n -
=
(l-W ) W
0
Zz
Z
n-
unique set of such rrn(~) for each ~ € ZOo Also, for ~ € Z and p ~ n ,
o
n-l
rr (z) / [1 - sIn rr (~)} is the probability that, with sample point ~, 5 termis
P nates after exactly p trials, given that 5 does not terminate before n trials.
n
Definition.
A d r
5 is truncated
The risk of the d r
Definitions.
n if, for z
-
ZO'
€
L rr (z)
p=O n -
=
1.
5 when A is the state of nature is
+J
rrn(~) [nc
r(A,5)
when defined.
~
A
L(A,a)d¢z (a)J d~A(~)'
-Zn
(1) ,
Here c is the cost per trial of sampling so that the risk is the
average cost plus average loss, when 5 is used and A is the state of nature.
I
I
I
I
I
I
I
,.
I
Let
u be
a a-algebra of
e
subsets which contains each 8
probability measure on (8,u).
€
8 and let
Then the average risk of 5 at
r(s,5)
J r(A,5)
S
S be
a
is
ds (8),
(2),
8
when defined.
Remark.
The integrals in (1) and (2) are defined if L is a measurable
function with respect to (8 x A,u x II) and i f the functions involving wand ¢
are measurable.
All subsequent work considers only rules for which the inte-
grals are defined.
Bayes criterion.
A
d r
5 is Bayes at
S in
the class 6 of d r if 5
and
5'
inf r(L5')
€ 6
r(s,5).
€
6
I
8
The Bayes risk at
~
.!!!
~
..£!.!.!!
t::. of d r is
p( ~ ,I:::.)
Notation.
inf
o€
r( ~,o)
I:::.
The class of d r of most interest is in this work is I:::.
the
co
class of d r for which (1) and (2) are defined and which terminate with prob~nn(~) =
ability one, i.e., for which
9
8.
€
The classes I:::. , n
n
~
~9
(~)
for almost all
0, of d r truncated at n for which (1) and (2) are
defined will also be considered.
2.2
1 almost every where
For brevity, write p
n
(~)
=
p(~,1:::.
n
) and
Extension of some sequential, non-design results to sequential design
problems.
The results of this section can, once the founcation work of Section
2.1 has been done, be proved by methods quite similar to those for analogous
s-nd results.
Each of the results will be used in the investigations in the
succeeding three chapters.
.'I
I
I
I
I
I
I
The first theorem establishes an integral equation for the rist of s-d d r
in I:::.
As a corollary to this, integral equations for the Bayes risks in I:::.
n
CD
and I:::.
CD
are derived.
These Bayes risk equations are s-d analogs of standard s-nd
results, see e.g., Theorem 9.3.2 of [2J.
used in [lJ, [5J, and [13J.
The extension to the s-d case has been
It is proved here, in the framework of Section 2.1,
for completeness, since the equation and its consequences are important througk)ut
the present work.
Before proceeding to the theorem, I belabor a point in integration theory
which is perhaps "obvious" and certainly necessary. Let S(n) be the cr-algebra
n
generated by the measurable rectangles in Z(n) .
of sets in Z(n) = X Z
p=l P
Lemma 2.2.1
A.
A necessary and sufficient condition that a subset S of
be in S(n) is that S x z(n) be in
B.
So .
Suppose f(n) is a measurable function on
satisfies
f(~) = f(n)(~(~»'
If Sn
€
~(n)
(Z(n)'~(n»
and S
= Sn
and f on (Zo,50 )
x z(n), then
(1) ,
I
I
I
I
I
I
I
.,
I
I
9
I.
I
I
I
I
I
I
I
on the sense that if one side exists, then so does the other, and the two are
equal.
Here,
~e,n
is a measure on (Z(n),S(n»
defined (by virture of part A)
on S(n) by
~ e,n (S) = ~ e (S x z(n»
Proof.
(Part A)
Let R be the class of measurable rectangles of Z(n)
n
and show that S(n) x Z(n) is the rr-algebra R(say) generated by R x Z(n) as
n
S(n) x Z(n) is a rr-algebra, since S(n) is; moreover, S(n) x Z(n)
follows.
contains R
n
x z(n) and hence also the rr-algebra R.
From this follows that R can
be written as R* x Z(n) for some class R* of Z(n) subsets.
Since R is a
oJ(
rr-algebra, R* is; and since R CR , R* contains S(n)' i. e. , R contains
n
5 . x z(n) . Since 0 contains R, ifS E S(n) , then S x z(n) E S •
(n)
0
S
Let
5*
be the class of Zo subsets E such that each n-dimensional Z(n) section
~
of E is in .5 (n)'
,*
J
~
(V
is a rr-algebra, since J(n) is, and J~ contains the
class of measurable rectangles in Zoo
S x z(n) E So' then for
~
E z(n),
Hence
j~
is a subset of
S*.
If
the z-section of S x z(n), viz., S, is in
S(n)' to complete proof of A.
I
I
I
I
I
I
I
••
I
(Part B)
Assume for definiteness that the left hand integral in (1)
exists; the other case is similar.
is f.
By definition of
~e
By part A, if f(n) is measurable, then so
(see page 6) and ~e,n' the result follows if f ( .
n)
is an indicator function, hence also if f(n) is simple.
If f (n) is any ~e,n
integrable function, then there are increasing, mean fundamental sequences of
integrable simple functions
+
-
f(n) and f( n )' respectively.
for s =
+,-.
(f~n)p} and (fCn)p} which converge in measure to
+
- } on Z by f s (z) = f(s ) (z (z»
Define (f p } and(f p
op n p -n -
These functions are measurable (by part A) and simple and form
mean fundamental sequences which converge in
hand integral in ( 1) exists and the
Defini tion.
~e-measure
monotone
to f.
Hence the right
convergence theorem proves part B.
o ~<
I f 5 is a d r and ~2n is in Z(2n)' define 50 = 5 (~2n) by
*
I
10
41
0
z
-Zn'-Zp
-Zp
'" z
0
-Zp
.1
*
41 z
z
z* z
= '"-Zn'-Zp
= Xz*
z
-Zn'-Zp
for n;::' O.
Thus 50 is the d r which "follows 5 from the n-th trial onward".
Definition.
The posterior distribution, g
~Zn
Zz have been observed is defined by
-n
bution is g and trials
n
dg
Also, f.
s,e
n
(9) = IT f 9
(x )dg(9)/f IT f
(x )d~(Q)
~Zn
p=l
,e p p
e p=l 9,e p p
(x) =
f f
e ,9e
Theorem Z. Z. 1
r(L5)
(x)dS(9).
If 5
€
6.
00
,
e
A
o
5)
JJ
E X
By definition, for 5
~,
€
eA
o
+
r(~
0
,5 (zZ»f. (x)du(x)dX (e)J,
~2
s,e
0
6.00 and ~Zn
J J L(9,a)d4!
n
(a)d~(9)
0
+ (l-n )[c +
r(
_I
then
J J L(9,a)d4!
n
o
Proof.
, of 9 when the prior distri-
0
= ~2n(~)'
(a)dS(9) + h(~.5)
where
h( ~. 5)
J
Jnn(~)[nc
8 Z
o
+ J L(9.a)d4!z
A
Let nO = rro(z) be the nn-type s r of 50
n
n-
I
I
I
I
I
I
I
-2n
(a)Jd~9(~)d~(9)
*
oo (~2)'
Then for n
~
O.
(2) •
I
I
I
I
I
I
I
••
I
I
I.
I
I
I
I
I
I
I
11
/\
/\
By Lemma 2.2.1, wi th rcn defined by
JJJ
eE
/\
(rc'(~2)[c
J
L:
••
I
~2
A
7tn(~2n) [nc +
n= 2 Z (2n)
f
t
s,e 1
J L(9,a)d¢~2n(a) J
A
n
II
p= 2
f9
,e
p
(x )dX
(e )dD(x )} x
p
z2 2 P
P
- p-
(x )dD(x 1)dX (e 1)dg (9)
1
0
~2
(1- rc)c+
o
I_
I
I
I
I
I
I
I
+ J L(9,a)d¢ (a)J +
X
00
+
-2n (z»
- ,
n (z)
- = rcn (z
l'C
f
g,e
(x)dD(X)dX (e)
0
J J r(g~2 'Oo(~2»fg,e
(l-rc )[c +
E X
o
1
(x 1)dD(x 1)dX (e 1)J
0
to prove Theorem 2.2.1.
Corollary 2.2.1
for n
~
Pn(g) = min[po(s), c
+ei~fE
1; and
p(s) = min[po(g), c + e
_
Proof.
inf
E
E
J p(ge x)f t e (X)dD(X)J
X
the
'
s,
From the theorem, r(s,B) = rc oA(s,o) + (l-rc 0 )B(s,O), where A is
a function of rc o only and B is independent of
d r
J p l(s e,x )f s,e
(x)dD(X)J
t
X n-
l'C
0
and ¢.
0
If A <
B, then the
-
°* with * = 1 and ¢* = ¢ satisfies r(s,o)* ~ r(s,o); if B ~ A,
d r °* with the same t d rand eras 0, but with * = 0 and
l'C
0 0 0
l'C
o
then
I
12
rr*
n
*
=
rr /(l-rr ), satisfies r(;,O ) < r(;,O).
n
0
-
one need consider only d r with rr o
Thus, to consider risk minimization,
° or rro = 1.
=
where 6.* is the subset of 6. on which rr
= 0.
o
If
Consequently,
OO(~2)
6. ' then
I) E
n
6. _ '
E
n l
so that the minimum can be obtained by minimizing the average, with respect to
f t (x)du(x) dX (e) ,of p(; ). This is done using a d r (or, sequence of d r)
s,e
0
~2
which minimizes (or, with limit which minimizes) the average of p(; ). The
~2
average (or, limiting average) risk of this d r (or, sequence of d r) is just
as given on the statement of the corollary.
The derivation of the integral
equation for p follows in exactly the same way by noting (Bo(~2) : B
Theorem 2.2.2
e
If L(.,.) is bounded on
€
6.oJ= 6.
00
'
x A, then there is at most one
solution to the functional equation
p(O .. min[p (;), c
o
+ inf
e
€
J
p(;
EX
e,x
)fl
i,e
(x)du(x)]
(3),
where
J
e
() = a inf
Po;
€ A
f!:2.Qi.
n
J p 1(; e,x )f ;,e (x)du(x)]
X n-
n ~ 1
J I
po'(;) = 0, Pn'(;) = rnin[p o (0, e inf
€ E
(x)du(x) ]
t
X pn- 1(; e,x )f s,e
Let p(;) be any solution of (3).
for 0
~
p
~
n-1, that p(;)
~
Then p(;) < p (;) for all ;.
-
Hence p (;) >
n
Also,
-
p(~)
p~(;) ~
0
n ~ 1.
Suppose
pp(;) for all ;, then
p (0 ~ min[ pO), c +
n
I
I
I
I
I
I
I
_I
L(9,a)d;(9)
Consider p (;) and p'(;) defined by
n
_I
0
e
inf
€
J pO e,x )f s,e (x)du(x)]
EX
t
for each nand ;.
po(;) for all;.
Suppose for 0
~
p
~
n-1 and for each;
p(;).
I
I
I
I
I
I
I
-.
I
I
I.
I
I
I
I
I
I
I
13
that
p(E)
~
p' (E).
p
Then
p' (E) :::: min[po(E), c
n
P~
i.e.,
+ ei~fE X
f p(E e,x)f s,e
(x)du(x)]" p(E)
t
(E) :::: p(E) for each nand E.
Note that pn is the Bayes risk function for the class of procedures
truncated at n observations and p' the Bayes risk for procedure truncated at
n
n observations in the modified structure wherein a decision d(n) is available
at stage n only with L(9,d(n»
with risk p'(E).
n
.. O.
Let B'(E)
be d r in the modified structure
n
With r denoting the risk function in the usual structure, it
follows that
(4) ,
1\
where L .. eS~PA L(9,a) and PE(n) ..
sf Pr9{B~
takes n observations} dE(9).
1\
1\
Since p~(E) :::: po(E) :::: Land ncPE(n) ~ p~(E), Pg(n) ~ Line and from (4)
I
I
I
I
I
I
I
••
I
to complete the proof.
Definition.
A dr B is a non-randomized function of the posterior distri-
bution if (a) E .. E , implies Xz = Xz ' , Wz = WZI , lP z = lP z ' , and
~2n
~2m
-2n -2m -2n -2m -2n -2m
(b) at any possible posterior distribution E,there is 5-probability 1 that
sampling continues with a given experiment e(g) or that sampling terminates with
a given action a(g).
For such a d r, N .. N(5,
trials before termination with sample point
!>
is defined as the number of
~.
Theorems 2.2.3 and 2.2.4 establish conditions under which existence of
a Bayes d r guarantees existence of a Bayes d r which is a non-randomized
function of the posterior distribution.
I
14
Theorem Z.Z.3
If 5 is a Bayes rule, then the subset Z of Z on which
a
(a) fails and the subset Zb of Zo on which (b) fails are
~
a
null for almost
all (s)9:
n-l
(a)
for each n such that
1:
(!.> < 1, there is e
1f
p=O
p
e(zZ
-n) € E such
that
J
X
where
t
~n
(b)
inf E Jp([s J, )f
p([sJ
)f t
(x)dt(x) = el€
,(x)du(x)
n e,x ~n,e
X
n e ,x t~n,e
=
t
•
~z
-Zn
'
for each n such tha t
J
e
'h (!.) >
L(9,a)ds
(9)
!.Zn
0, there is a • a(!.Zn)
inf
al € A
L(9,a')ds
Je
€
A such that
(9)
~Zn
Moreover, if !. € Z , z' € Z , and s
= s, = s (say), then one can take
a a
!.Zn !.Zm
a(!.Zn) = a(!.2m) = a(s) and e(!.Zn) = e(!.2m) = e(s).
E!221.
Suppose Za is not null.
Then,
inf
where Za,m =(zZ
- m
€
E
if z € Zo' then M(!.2m' !.> = m} ,
m
f
(x) =
s ,~ -m
and where 5(z
)
-0
J IT
e pel
f
9 ,e p
I
I
I
I
I
I
_I
Let M(!.>, defined on Za' be the least
integer for which (a) fails with sample point z, and let S = S
m
!.Zm
by Theorem Z.Z.l,
e
--I
(x )ds(9)
p
5 in the term for m • O.
Since, if (a) fails, the integrand
I
I
I
I
I
I
I
-.
I
I
I.
I
I
I
I
I
I
I
15
is everywhere positive on a set supposed to have positive measure,
r(~,5)
> 0, so that 5 cannot be Bayes. This contradiction proves the
- p(O
assertion that Z is null.
a
A similar argument proves that Zb is null, and the final assertion follows
from Corollary 2.2.1.
at
~,
then there is a Bayes rule at
~.
I
which is a non-randomized function of
By virture of and with the notation of Theorem 2.2.3, a d r 5 is
defined almost everywhere by
where
~
n
= ~z
forn~O;
-2n
-2n
••
~
there is a Bayes rule
the posterior distribution.
~z
I
I
I
I
I
I
I
e x A and
If L(.,·) is bounded on
Theorem 2.2.4
(a(~z
». 1;
-2n
and
X
(e(~
~2n
~2n
» =1
If definition of 0 is completed by defining it as a non-randomized function of
the posterior distribution on
Za\}~'
then 0 is such a rule for each
The theorem is proved by proving that 0 is Bayes at~.
and
and
~
n
=~
~2n
Xn • X -
,let
p(~)s;c+
xn •
Xn (z2
- n- 1)· (x n
X.
By definition of the d r
n
€
X
o
n
5,
inf
e
E
E
~
E Zoo
For ~2n-lE Z(2n-l)
I
16
where
00
L:
n-1
n=l
J
~
XX
p=l P
and where ep+1= e(sz ).
-2p
in /::;.
00
(Corollary 2.2.1), of which there is at most one by Theorem 2.2.2, satisfies
~(s)
and
A solution p of the integral equation for Bayes risk
can be shown by induction to satisfy, for m ~ 1,
h_
·1.m
(0 +
R (s)
m
where
J
J
XX
X
n
n-1
p=l P
[pc + po(sz )] f
-2p
t
s,~
m
J
[me +
X X
p=l P
The sequence (h
h (s).
1
1m (s) : m~ 1} is bounded and non-decreasing and thus has limit
Hence, h(s,o) = h (s) and r(s,o) = p(~), to complete the proof.
1
Notation
If d r
° is
a non-randomized function of the posterior distri-
bution, then the space of possible posterior distributions can be written as
,...,
L:.;:,
E
e
+
.....*
~
L:":' , where ":'e consists of
A
a
those posterior distributions at which
sampling is continued using experiment e at the next trial and
-
* consists
~a
of
those posterior distributions at which sampling is discontinued and action a is
taken.
I
I
I
I
I
I
(x )du\x )
-n
-n
and
R(O
m
--I
I
I
I
I
I
I
I
-.
I
I
17
I.
I
I
I
I
I
I
I
Theorem 2.2.5
° is
If
a Bayes d r which is a non-randomized function of
the posterior distribution, then the sets
Proof.
2*a
are convex.
The proof is the same as in the s-nd case, see, for example,
Theorem 9.4.3 of [2J.
The remaining results of this section are proved in generality sufficient
for the applications of them in the following chapters.
Note, however, that some
of them can be extended easily to more general situations.
Theorem 2.2.6 adapts Stein's s-nd theorem [14J on probability of termination to the s-d case.
2.
° is
S-d d r
e ..
1.
Theorem 2.2.6
(1,2)
a non-randomized function of the posterior distribution with
, specified by numbers m and M, 0 < m $ M < 1, and the rule
Stop as soon as the posterior probability that e
3.
1 is • (m,M).
There are positive numbers 01 and 02 such that either
(i)
If e
€
E, then pr e (f 2 ,e(x)/f l ,e(x) > e
(ii)
If e
€
E, then pr e (f 2 ,e(x)/f 1 ,e(x) < e
°1 )
> 02
or
-0
I
I
I
I
I
I
I
I·
I
~
1) > 02
Pre(N < m) = 1; in fact there is a positive number b such that
A.
for m ;:: O.
<D
for some t > O.
c. eeNk <
<D
for k;:: O.
~.
L(~2n)
tN
<
B.
eee
Let D = 1
-1
+ (M-m) 01 and suppose assumption 3(i) holds. Let
denote the random logarithm-of-the-1ike1ihood-ratio after n trials,
using 0, Le.,
I
18
where e
e (g
)using 0.
p ~2p-2
p
prS{~
If n
l
~
L(~2n +2n ) - L(~2n )
1
2
0 and n
M-m}
2
>
1
~
0, then
°m >
2
I
I
I
I
I
I
0
If n ~ 0,
to prove A. B. and C. follow exactly as in [14J.
The proof where 3(ii), but
not 3(i), holds is nearly identical.
Definition.
e
x
e
The Kullback-Leibler information numbers are defined on
x E by
I(S,S' .e) = e S log(f S ,e /f S ' ,e )
A s-d corollary of Wald's equation is Theorem 2.2.6, the proof of which
follows Johnson's proof [8
Theorem 2.2.7
where again e
p
Proof.
inf
€
e
x E, then
n
E I(S,3-S,e) ~ e S
~
(5) ,
p=l
= e (s
) using 0.
p ~2p-2
Define
Then the right hand side of (5) is
00
_I
J for the s-nd case.
If assumptions 1 and 2 of Theorem 2.2.6 hold and if
I(S,3-S,e) is bounded on
egN. e
--I
~ J eS(y log(f 2 (x )/f 1 (x»
n=l E
n
,e n
,e n
Ien
~ e} d Pr(e
n
= e}
I
I
I
I
I
I
I
-.
I
I
19
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
••
I
•
co
~
n=l
JE
ee Y ee(log(f 2
(x )/f l
(x» Ie • e} d Pr(e = e},
n
,en n
,en n
n
n
since Y and the log-ratio are conditionally independent given en.
n
(6),
But for
any e,
ee(log(f 2 ,e (x)/f l ,e (x))
~
inf
e'E E I(e,3-e, e'),
The theorem follows from (6) and (7), since
co
co
00
~
eeYn = ~ Pr(N ~ n} = ~ n Pr(N • n} • eeN
n=l
n=l
n=l
Chernoff
proves the following theorem in [3].
Theorem 2.2.8
away from zero on
e = (l,2) and if I(9,3-9,e) is bounded and bounded
If
e
x E, then there is a number b
n
Pre ( ~:
Proof.
~
p=l
log(f
e ,e
p
> 0 such that
) < O} < e -bn
/f _
3 e ,e p
See the proof of Lemma 1 in [4].
Corollary 2.2.8
Suppose the assumptions of Theorem 2.2.8 as well as
assumption 2 of Theorem 2.2.6 with m <
~
< M are valid.
Then there is a
number b > 0 such that
A.
B.
el(N(5,~)
I~z
~
e2(N(5,~
I~z
~ M} Pr2(~z
E.!.22i.
-2N
m} Prl(~z
~
-2N
-2N
m} ~ e
~ M} ~ e
-2N
To prove A, note
-b
-b
/(l-e
/(l-e
-b 2
)
-b 2
)
that the left-hand side is
co
co
nPr (N = n, ~
<m)~~nPrl(~z~~}
l
~2Nn=l
-2n
n-l
~
ne -bn ,where b > 0 from Theorem 2.2.8,
= e -b /(l-e -b ) 2
The proof of B is analogous •
(7) •
I
20
Theorem 2.2.9
function of S
~.
(see [2]).
= sell,
If
e
= [1,2}, then the Bayes risk function p, as a
the prior probability that 9
= 1,
is continuous.
This is a direct consequence of the proof of Theorem 2.2.5
--I
I
I
I
I
I
I
I
I
I
I
I
I
I
-.
I
I
I.
I
I
I
I
I
I
I
CHAPTER III
AN EXAMPLE:
PAULSON'S PROBLEM WITH UNIFORM ALTERNATIVES
3.1
Framework.
For the example considered in this chapter, the Bayes se-
quential-design (s-d) risk equation of Corollary 2.2.1 is evaluated directly and
the Bayes d r
is easily specified.
In addition, Bayes procedures and risks are
determined for cases where the experimenter's alternatives are restricted more
than in the s-d case, viz., the non-sequential non-design, non-sequential design,
and sequential non-design cases.
Evaluation of the risk-improvement provided by
adding experimental flexibility is carried out for several values of the parameters of the problem.
The example considered is that proposed by Paulson [11,12] with a very special probability structure.
An interesting hypothetical case in which it arises
is the following, which is closely related to a problem considered by ~iryaev
I
I
I
I
I
I
I
I·
I
n~.
Suppose it is of interest to detect the presence of a target in one's
vicinity by scanning with radar.
It is not known whether the target is located
in the north or in the south, or whether there is a target present at all.
More-
over, because of interference, one cannot expect to be sure of the presence
~r
absense of target based on a single radar reading.
However, he must decide,
based on radar readings from either direction, either to fire at a supposed
target in one of the two directions or not to fire at all.
The larger the
number of observations he makes, the more sure he is that his action is correct
so that he will incur no loss.
On the other hand, each observation is made at
a cost c, so that he wants to take some action as soon as possible.
In the notation of Section 2.1, the possible states of nature are 8=0 if no
target is present, B-1 or 2 according as a target is present in the north or in
the south.
E. {l,2}, with experiment 1 being to take a radar reading
f~om
the
I
22
north and experiment 2 being a radar reading from the south.
action 1 or 2 being to fire north or south, respectively.
A
= (0,1,2}
with
Suppose the observa-
tions, i.e., radar readings, are represented in some way as being uniformly
distributed on [O,lJ if no target is present and as uniformly distributed on
[b, l+b] if a target is present; here 0 < b < 1 for cases of interest.
Hence
one is sure that no target is present in a given direction if a radar reading
is < b; i f a reading is> 1, then a target is surely present.
The probability structure of section 2.1 has U real Lebesgue measure and
f 9,e given in Table I, where g0 is the uniform density on [0,1] and gl is the
uniform density on rb, l+b].
Table I.
9
a
1
The most general probability measure
where sp
2
S on
(S,u) can be denoted
The problem is to decide how to take now many radar readings when the cost
per reading is c and losses for incorrect decisions are given by
a
if
9
1
if
9"a
a
In the following sections, this problem of optimal allocation is solved for
several different cases.
These cases differ in the amount of flexibility
allowed in deciding on how to take readings.
Numerical comparisons are given
in the final section.
Admittedly, this loss structure is quite specialized.
The most general loss
structuring for the problem is specified by nine constants, and it seems that
analogous results for this case should be little more difficult to derive than
those for the "0-1" structure used here.
In particular, note that since the
units of loss are unspecified, a 0-1 loss is no less general than a O-L loss,
for any L> O.
I
I
I
I
I
I
1
= s«(p}).
L(9,a)
.-I
I
I
I
I
I
I
I
••
I
I
23
I.
I
I
I
I
I
I
I
This problem seems to be another one wherein a sequential modification is
"obviously" preferred (cf.
p1ing [7]).
the "classical" example of binomial acceptance sam-
For example, a=O can be taken as soon as one sample with each ex-
periment is obtained which is less than b.
Or a=B for B = 1,2 can be taken as
soon as a trial with e=B yields an outcome which exceeds 1.
These actions can
be taken with no probability of error even though some pre-assigned sample size
has not been attained.
This reasoning extends to comparison of the sequential, .
non-design and sequential design fOTmU1ations of the problem.
Since, intui-
tive1y, no additional infoTmation can result from observations using e=B after
having obtained one such observation outside [b,l], a design procedure which
discontinues observations with e=B in such an event would seem preferable to a
procedure without this flexibility.
3.2
The non-sequential. non-design (ns-nd) case. This is the fixed sample
size case, where, for n fixed, decisions are based on n observations with each
experiment.
In the notation of Section 2.1,
x
o
x
(1)
~p
(1)
1
X
~p-2
(2) for 1
~ P ~
n
(1)
I
I
I
I
I
I
I
I·
I
*0 • *!.2p
-
0 for 1
~
p < 2n ,
1
with the e rand s r otheTWise arbitrary.
Remark.
Class designations, such as ns-nd, are given to clarify, but not
to define, the experimental flexibility of procedures in the class.
For example,
the ns-nd class provides some design flexibility, in the sense that fixing of
the sample size, n, is up to the experimenter.
Non-design is meant to indicate
that the experimenter cannot decide how many of each experiment to perform,
given n.
Let 6(n) be the class of ns-nd d r using 2n trials.
A Bayes drat
6(n) is a d r in 6(n) which minimizes, as a function of 0,
t
in
I
24
~2n
where
is. as usual. the vector with pth component x
z2p in the notation
p
of Section 2.1.
--I
It is easily checked that this minimization is accomplished with the d r
and risk of Result 1.
Result 1.
~
~n
A Bayes procedure in
~(n)
is given by (1) together with
~a
<1 for
max(~o'~1'~2) and b<x
-p-
~a
1<1. some x 2p<b for
max(~o'~l) and b<x
- 2p--
l~r
and b<x <1, some x _ <b for
- 2pa = max(~o'~2)
2p l
l~n
(a) • 1 i f
~
x
p
> I and e =a for some p
p
which is defined with probability one if the
as the
~
1~2n
~a
of the conditions are interpreted
corresponding to the least a for which the condition holds.
The aver-
age risk of this procedure is
PI (i. n ) • 2nc + (l-b)
2n
[I-max ~e] + (l_b)n[l_(l_b)n][ min
e
8=0,2
~e+ min
e=o, I
se]'
I
I
I
I
I
I
Definition. A Bayes ns-nd procedure at i is the Bayes procedure with respect to i in
~~(n) with average risk
n=l
inf
PI (i)
n
2:.
°
PI (i. n ).
This infimum can be attained, since Pl(i.O) = I-max ~e<I<2n'c where n' is
I
8
any integer>
Thus, the Bayes fixed sample size N satisfies OSN<n'.
ZZ.
Help in determining the Bayes ns-nd sample size is provided by Result 2.
Result 2, PI(i.n) is a convex function of n.
Proof.
PI (i,n) is a linear combination with non-negative coefficients of
n
2n
the three convex functions n,(l-b) , and (I-b) .
Since PI is a convex function of n, it is minimized for real n by no such
that
d
dn
Pl(~,n)ln=n
=
o
° and
for integral n by [no] or [no] + 1, where [x]
denotes the greatest integer not greater than the number x.
3.3
The non-sequential. design (ns-d) case.
This is the fixed sample
siz~
case wherein the experimenter, before taking any observations, allocates each
member of his total sample to one of the two experiments.
For a sample of n,
I
I
I
I
I
I
I
-.
I
I
I.
I
I
I
I
I
I
I
25
the s r is
\jr
o
\jrz
-2p
=0
for 1 < P < n, \jr
z
-2n
E r of interest are Xm, n for 0
~
m ~ nand n
~
=1,
(1).
0, where Xm,n is the n-sample
rule which assigns the first m samples to e=l and the remaining n-m samples to
Thus X
is defined by
m,n
X (1)
o
·G
p<m
i f m-O
and X
(1)
!.2p
otherwise
where the subscript m,n has been omitted.
lent to
X
n,n
(2),
otherwise
Note that the ns-nd e r is equiva-
The minimum-risk t d r for use with the s r of (1) and e r of
(2) is that which yields risk
r(i~n,m)
2
L:
h(S ,n,m)
~
e-O
.. nc + h(i,n,m), where
e
J L(e,a)d~
A
n
(a)
II
!.2n
p=l
Result 3.
I
I
I
I
I
I
I
I·
I
..
(x )dx
p
--n
fee
(x ) dx
p
p
--n
.
The average risk in the class of procedures using (1) and (2)
defined, with probability 1,
~
is minimized by using 5(i,m,n) with t d r
!.2n
by
a-O and for some
~m
a-1 and for some
~,
and p'>m, x p<b and x'<b
p
x >1
p
a-2 and for some p>m, x >1
P
if
~
(a)
10
1 if
!.2
n
if
10
e}
t}
b<x <1 for
1, ' - p-
12 '
b$X~1
The average risk of this procedure is
for l$PSn.
and for some p>m,x <b
P
b<x <1 for p>m; and for some
- p-
a is the least integer in a with
and
~_m;
~
a
• max
e
~e
~m,x
-
p
<b
I
26
~e(l-b)m[l-(l-b)n-mJ+ min
nc+ min
P (5o,n,m)
2
e=0,1
e=0,2
[1 - max ~eJ (l_b)n.
+
Definition.
'e(l-b)n-m[l-(l-b)m J +
I
I
I
I
I
I
e
A Bayes ns-d procedure at i is the o(i,m l ,n l
)
with risk
..
As in the ns-nd case, and by an analogous argument, this infimum can be
attained.
To determine the (m,n) value which characterizes the best Bayes ns-d procedure, the following result is helpful.
Result 4.
(b)
(a)
1 - max,
e
-
min
e
,
8=0,1
e
-
min
'e
e=0,2
~
0.
° and f(x,y) .. c(x+y) + ko~X+Y + k1~x(1-~Y) + k2~Y(1-~x).
Then (i) f x - ° ~ ~X+Y(ko-k1-k2) + k1~x = c/(-log ~)
(ii) f y " ° ~ ~X+Y(ko-k1-k2) + k2~y .. c/(-log ~)
Suppose ~ >
(iii) F
.. [log
2
(iv)
~(2
(k o -k -k 2 )
1
~
Remark on proof and use.
possible orderings of
[f
f
°
xx
f
yx
~
L1~X
0
fXYJ
yy
(k -k -k ) [1
012
1
2
:]
F2 is positive definite.
Part (a) follows from considering each of the six
~0'~1'~2'
The first three parts of (b) are simple anal-
ysis results, and (iv) follows from definition.
The identifications
and k Z " min ~e show that best (m,n-m)
e=0,2
values are within unity of the corresponding solutions of (i) and (ii), since,
x-m,y-n-m,k -l-max
o
e
~e'
oJ+
k ~y
k l ..
min
~e'
8=0,1
by (a) and (iv), the matrix of second partial derivatives is positive definite.
3.4
The sequential, non-design (s-nd) case.
In this case, each stage of
sampling consists of one trial with each experiment; after observing the outcome of a stage, the experimenter decides whether to stop and make a decision
or to observe another stage.
\
(2)
=-4n-Z
=1
for n
~
1.
In
symbol., X
!.t.n
(1)
=X (1)
0
= 1 and
••I
I
I
I
I
I
I
I
••
I
I
I.
I
I
I
I
I
I
I
27
Bayes d r with respect to i in the class of s-nd procedures satisfy a risk
equation similar to that of Corollary 2.2.1 of Chapter 2, viz., that of Theorem
9.3.2 of [2].
specification of the Bayes d r
••
I
is neglected, since evaluation of the risk
suffices for comparing the s-nd with other procedures.
Result S.
The Bayes risk P3(i) in the class of s-nd d r
P3 (i) • min [l-max ~e' 2c
e
satisfies
+ h] ,
(1),
Evaluation of Bayes risk using (1) can be accomplished straightforwardly
in this case because of the small number of possible
listed, along with the
1_
I
I
I
I
I
I
I
This theorem is specialized to the present case in Result 5;
~
~
~
values.
These are
values to which each corresponds, in Table II.
Table II
,
~z
Set
~
-'02
S2
= (liZ: X z >
1}
Sl • (li2: xl > 1}
=
(liZ: b$x1~l, b$xZ~l}
Use of Table II with (1) gives
=
(1,0,0)
= (~0/[~0+~2],0'~zI[~0+S2])
12 = (0,0,1)
's'1
101
S
for li2 in this set
(0,1,0)
= (EO/[~O+~l], S/[~O+~l],O)
i = (~'Sl'~2)
I
28
Note that P3(ip)
=0
from (1), for p
= 1,2,
so that the integral over Sp is 0 for 0
~
p
~
2.
Again
(3)
where
h
op
•
(4) .
From (3) and (4) for p
P3
= 1,2,
(~
)
--op.
~o
E,.,
2c
-J
f+f"
' --'"E +E 'b '
o pop
min [
(5).
Using (5) in (2), obtain
••I
I
I
I
I
I
I
_I
(6).
(6) in (1) gives Result 6.
Result 6.
3.5
The sequential design (s-d) case.
This is the general case of Chapter
2 in which a rather direct application of Corollary 2.2.1 is used to evaluate
Bayes risks in 6.
00
Result 7.
•
The Bayes risk relative to
1 in
6.
00
,
p(E), is
I
I
I
I
I
I
I
••
I
I
I.
I
I
I
I
I
I
I
29
From Corollary 2.2.1,
~.
pel)
= min [1 - max
e
se'
c + hl,
(1),
where, with the notation for prior distributions introduced in Table II,
h
=
=
min ([p(lo 3 )(s + s3 )1 b + p(l)(l-b)},
e-1,2
,-e
0
-e
(2)
Next, reapplying Corollary 2.2.1,
-op )
=
min[s 0 !(~ 0 +s p ), s p
!(s
o+s
p ), c + h op 1,
(3)
h
=
min
J p([la lz ) f~ e (xl) dx l ,
P -2
-op 1
e 1=1 ,2 X
(4) .
p(~
where
op
The integral in (4) is minimized with respect to e
the integral is (l-b)
l
by taking el=p, since then
p(~
-op )(~ 0 +~ p ) whereas with e 1 = 3-P the integral is
By (3), i f
1_
< min
~o
2-J
[ ~,~+~
o pop
,
then
I
I
I
I
I
I
I
I·
I
c + (l-b) p(la p ) , i.e.,
min
~
~o
2- cJ
So+S p's
o+S
p'
(5).
b
Use of (5) in (2) and the result in (1) completes the proof.
Ad r
0 is now defined, and Result 8 proves that r(i,o)
Notation.
T
for
e
with
p(~)
= min
[T o ,T 1 ,T 2 ], where
o
= 1,2.
Definition.
(a)
For brevity, set
If p(i) = To' then
~
a
= max
e
0
= oBi
as follows:
= 1 and
~o(a)
= 1 if a is the least integer in A
Define the d r
se.
~o
Otherwise, 0 is defined arbitrarily, in the sense that other
rules in 0 are restricted only
by the requirement that 0 be in
~oo.
I
30
(b)
If
= Te
p(l)
=1
for e
or 2, then let N denote the (random) least integer
n such that x n is not in [b,l].
The e r
X is defined, with probability 1, on
[O,N] by
X
z
-2p
(i)
and
If x
> 1, then 1jI
N
!.2p
X (e) = 1
-
for
0
== 1jI 0 = 0 = 1 - 1jI
1 $ P $ N.
for 1 $ p < n,
If
~
< band min
e-o,e
(0) = 1 - 41
!.2N
arbi trari ly.
(e)
<
~e
1,
!.2N
c
$ b; then 1jIz
== 1jI
o
-2p
1 only in case
!.2N
If ~
(iii)
b and
~
< s e ; and
o -
=0
° is
= l-1jI
for 1 $ p < N;
!.2N
otherwise defined
c
> b'
then let Nt denote the (random) least
min
e=o ,e
integer greater than n such that x ' is not in [b,l].
N
p
(e)
41
!.2N
° is otherwise defined arbitrarily.
(ii)
41
(e)
The e r
is defined for
> N by
X
!.2p
The s r
(3-e) - 1
n > N.
for
is defined, with probability 1 for
for
is defined at Nt by 41
The t d r
(0)
Otherwise
Result 8.
Proof.
° is defined arbitrarily.
~
p < N'.
1 only in case
(3-e)
!.21Q'
P(i,oBi) = p(l)·
If case (a) obtains, then the proof is straightforward.
For the
case (b), evaluate risk as the sum of average loss and average cost, and note
that
Pre{N=n}
=
(b-i), p(l,o) =
(I-b)
c
b'
n-l
b, so that
eeN
=
00
~
nb(l-b)
n-l
1
Thus in case
n-l
since with probability 1 this case can obtain only if e=e.
c
Similarly for (b-ii), p(i,o) = -b + min(s o ,s).
e
= b'
Note that, with probability 1,
(b-iii) does occur if 8=e and that if e=0,3-e t.hen N' is distributed identically
as and independently of N.
so that r(l,o)
c
= b(2-s
).
e
Thus for
c
e = e, r(e,o) = b and otherwise r(e,o) =
Comparison of risks derived for the several possible
cases with those of Result 7 proves Result 8.
I
I
I
I
I
I
p $ Nt by
0
1 - 41
!.2N'
x Nt < b.
~
••I
2c
~,
I
I
I
I
I
I
I
••
I
I
31
I.
I
I
I
I
I
I
I
Relation of the Bayes d r
3.6
~
00
in
defines an e r for the two action case.
and Chernoff's d r. Chernoff [3]
An extension to the present three
action framework which seems to preserve the substance of Chernoff's rule is as
follows .
...
Let e
n
denote the maximum likelihood estimator of e after n trials and let
e
I(e,e',e) on
e
x
x E be the Ku11back- Leib1er information numbers as used in
[4], i.e., I(8,8',e) •
ee
log (fee/fe'e)'
A three action Chernoff-type e r is:
X chooses by some random rule between e=l and e=2 and X (1) = 1-X
(2) = 1
o
!.2n
!.2n
if min 1(8 ,8,1) > min 1(8 ,8,2), and some random selection is made if the
n
n
~e n
~8n
minima are equal.
If more than one e maximizes the likelihood at stage n then
select the next experiment by any random method.
That the Bayes rule 5 is a special case of Chernoff's rule, i.e., Chernoff's
rule with special randomization ru1es,is seen as follows.
Maximum likelihood
estimates for cases of concern are given in Table III.
Table III.
I
I
I
I
I
I
I
I·
I
en
z
-n
0,1,2
b<x
<1 for l$p$.n
- pb<x <1 for l$P<n; e =8
- p-
n
'
x >1 for 8 = 0,1.
n
b<x <1 for l$P<n; e =e x n<b for
- pn '
e
8
0,8
=0,1.
Information numbers relevant to definition of Chernoff's rule are in Table IV.
Table IV. ~(e,e,e)
A
e
(8,e) (0,1)
°1
00
2
o
(0,2)
(1,1)
00
(1,2)
(2,1)
o
o
°
00
00
00
CD
(2,2)
00
00
e * : Chernoff
experiment
1 or 2
*
min
~'e I(~,e,e )
o
1
00
2
00
I
32
Thus if
ee
>
~3
-e • the Bayes rule is that form of the Chernoff rule which
always chooses e when randomization is necessary.
to the Bayesian in the following sense.
This would seem satisfying
His procedure is not worse than a pro-
cedure assuming no prior information, of which it is a special case.
On the
other hand, it is better in the sense that he performs first the experiment
which, in his prior belief, is more likely to result in the decision without
loss which requires the fewest possible observations, viz., N instead of N' in
the notation of the previous section.
3.7
Numerical comparisons.
For given prior distributions, the results of
the previous sections can be used to evaluate and compare risks of Bayes procedures with different degrees of experimental freedom.
As noted in Chapter 1
and made explicit in Sections 3.2-3.4, the class of s-nd d r, the class of
ns-nd d r, and the class of ns-d d r are each a subset of the class of s-d d r.
Hence the best, i.e., Bayes, s-d d r is no worse than the best d r in any of
the other three classes.
_.
I
I
I
I
I
I
I
The extent to which it is better can be judged in
the two cases considered by the calculations of this section.
For case
In the terminology of the hypothetical example
of Section 3.1 this is the case when it is not known whether a target is present
to the north or to the south, or whether one is present at all, and when each
of these possibilities is considered equally likely.
For case 2,
~
=
(.5,.1, ,4),
By comparison with case 1, this might occur when bad weather decreases the
probability that a target is present at all but makes it more likely that the
target is in the south if it is present.
Risks Pl,P2,P3' and P are calculated
as a function of the cost c per observation for these cases and are presented
in Figures 1 and 2.
In both cases, the risks of all four Bayes d r are the same for very large
cost, i.e., cost so large that none of the d r can afford even one trial.
Risks
are small for very small costs, the case when much sampling can be done.
For "moderate" cost, substantial saving can be made by adopting the s-d formulation.
For example, in case 2, Figure 2 shows that use of the Bayes s-d d r
I
I
I
I
I
I
I
-.
I
I
II
I
I
I
I
I
I
33
RISK
Ie
I
I
I
I
I
I
I
.-
I
COST
I
34
_I
I
I
I
I
I
I
I
RISK
O.S
_I
0.2
,{,ute' 2
.{.
<.5,
.1, .4)
COST
I
I
I
I
,I
I
I
-.
I
I
35
I-
leads to savings of as much as .15 over the s-nd d rand .18 over the ns-nd and
I
ns-d d r.
I
I
of the s-d drover the s-nd d r is as much as $150,000.
I
I
I
I
I
I
I
I
I
I
I
II
For example, if the units of cost is $1 million i.e., the loss for
not firing when a target is present is of this size, then the average saving
I
.1
I
I
CHAPTER IV
THE EXPERIMENTATION RULE X*
4.1
Introduction.
In the remaining two chapters, attention is confined
to the "simple hypothesis versus simple alternative" case where each of a
finite class of experiments gives information concerning the state of nature.
In terms of the elements of Section 2. I, the situation is described by:
e
fl,2} = A;
L
(L(e,a):
c
e
~
e, a
€
~
$ 1.
L(l,l) = L(2,2)
on the subsets of 8 can be denoted by
~=~(1)=1-~(2),
Without essential loss of generality, the assumption
0 can be and is made.
This chapter concerns the X* e r described in Chapter 1.
defined in Section 4.2.
I
I
A};
cost per trial.
Any probability measure
for 0 $
€
I
I
I
These e rare
In Section 4.3, it is shown that there is, generally,
d r using the X* e r which is better than anyone-experiment, s-nd rule.
a
The asymptotic optimality of the d r
5 , defined in Chapter I, is proved in
c
Section 4.5.
It will be noted that Theorem 4.3 is valid for general classes E, whereas
additional assumptions are necessary to extend Theorem 4.5.2 to infinite E.
4.2
X* experimentation rules.
The d r to be considered are all in the
class of d r which are non-randomized functions of the posterior distribution
with convex action sets that include
all such procedures.
subset of
~ro'
search in
~
*
!'"
ro
~=O
and
Two facts are noteworthy.
*
Let
~=l.
~
First,
i.e., the search for a Bayes d r in
*
~
denote the class of
*
~
is a tractible
is much simpler than the
Second, Theorems 2.2.2 and 2.2.5 prove that a Bayes rule in
is a Bayes rule in
~
00
for a Bayes d r to the class
Therefore, nothing is lost by restricting the search
*,
~
whereas simplicity is gained.
I
I
I
I
I
I
·1
I
I
37
I.
I
I
I
I
I
I
I
I_
I
I
I
I
I
I
6(~1,E2)
Definitions.
is the subset of 6 * consisting of d r which
continue sampling if and only if the posterior probability (that eel) is in
(E ,E 2 ) and which use a Bayes t d r.
l
I;
e
L(e,a)
He)
$ is Bayes !,!l6* i f
The t d r
•
a
~~ I; L(e,a')~(e)
€A
e
when a is the decision of $ at posterior distribution
The e r
s(2», 0 $ s(l) $ 1.
X* • X* (~I,E2) is defined for given s r wand t d r $, viz.,
rules such that (W,$,X*) is in
6(sl'~2)'
Hence, the best d r using X* is the
one which, by choice of (sl,E ),minimizes the risk among such d r.
2
Definition.
Let r(s',e) • r(s'. e,sl,s2) be the Bayes risk at
of s-nd d r in 6(SI,s2) which use experiment e
€
E at each stage.
e(s') • e(I',sl,s2) be an experiment in E satisfying r(s',e(s'»
Define X*
= X* (sl' s2)
X* (e(s'»
!.2n
= I
if and only if
o*
0* (s,sl,s2) is the d r for prior probability
e r
X* (El's2)'
Sz
=
-2n
in the
Let
• min r(s',e).
s'.
S in 6(sl,E 2 ) which uses
4.3 A result for general convex stopping sets.
provides some indication of
S'
E
by
The theorem of this section
"goodness" of the X* e r
in the class 6(sl's2) for
any (El,sZ) with 0 < E < E < 1. In particular, the corollary proves that, in
l
Z
0* has smaller average risk than any single experiment s-nd
6(sl,s2)' the d r
d r.
Further, the best s-nd d r in 6 (X) is known [16] to be in 6 * ; suppose it
is in 6(E * ,E *).
l 2
* * has risk no larger
The corollary proves the X* rule X* (El,sZ)
As usual, dependence on (sl,E Z) is suppressed
than that of the best s-nd d r.
for brevity.
Definition.
as 0* and with e r
For n ~ 0, let 8(n) be the d r
with the same stopping sets
X(n). x(n)(s) defined by
•
*z
X
for
-2p
and
for
I
(~(l),
p
~
n.
I
38
Let p* be the risk using o(n) and p* the risk using 0*.
n
Thus 0 (n) is the d. r. which "follows" 0* for n trials and uses the same
experiment for all trials after the (n-1)st.
Theorem 4.3
*
*
If 0 ~ ~ ~ 1, then Pn+1(~)
~ Pn(~)'
A.
If condition 3 of Theorem 2.2.6 holds, then p* (~) • lim
n-oo
B.
(E1'~2)
[Note that dependence on stopping values
*
Pn(~)
for 0
~ ~ ~
1.
is again implicit but suppressed.
*
Proof (Part A) Let Zn be the set of Zo points for which ep+1 is the
experiment of X(n) for each z2 and p < n.
-p
.
**
{!.
Z
n
€
*
Z :
n
Then
where
~
*
denotes complementation with respect to Zn'
integer p with
= 1, i.e., the decisive sample
~
!.2p
size for!. using d r
d~
e,x
() (!.)
n
o(n).
d~
=
e,x
If z = (z
z(2n»
then
-2n' ,
(n)(!.
(2n)
I!.2n)
n
IT f
(x) dU(x ) a.e. ~
()'
p
p=l e ,e p p
e,x n
for B=1,2, so that a.e.
n
IT f
(x)
p=l e ,e p p
rule for selecting e
1
~(e)
at prior distribution f
dU(x ),
p
Therefore,
!.2n
nJ [f{nc
xz a
+ r(~
,a(e »)dE (e)]£ (x )dun
~
n
~
~e-n
'
~n
-n
P=l2p
where o(e) is the s-nd d.r. using experiment e (with the given stopping sets)
and f t
(x )dun =
s~ -n
Z*n
~
e
; f
(x) dU(x )~(e).
p=l e e p p
p
is the same using either o(n) or 0(n+1).
By definition of o(n) and 0(0+1),
On
z**,
n
Theorem 2.2.1 gives the
J
.-I
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
I
II
I
I
I
I
I
I
39
l'fsk'contribution ~s
f (n+1)c + f r(~
*
X
~2n+2
n
n
, o(e »f t
(x )du(x )} f t (x )du using o(n), (1),
n
sen
n
s~ -n
~2n n
and
f
(n+1)c +
r(~
f
X*
~2n+2
, 0(en+1»
ft
s
~2
n
n
(x )du(x )}f t (x )dun using 0(n+1),(2),
n
n
se-n
en+1
-n
where X* is the set of outcomes possible at the first n trials of a point in
n
**
Zn
*
*
But by definition of Xn , i.e., of X , use of en+l at trial n+l minimizes
the integral over X in the expressions (1) and (2) for each
~2n'
In summary,
risk contributions on z** are the same for o(n) and 0(n+1), while on Z** the
n
n
risk contribution using o(n+l) is uniformly no greater than that using o(n),
to prove part A.
(Part B).
Suppose
€
> O. Theorem 2.2.6 insures that there is a number M
eeN < Musing
such that, for &=1,2,
0* or o(n) for n
~ O. Also, there is an
!
Pre(N(o,z) > Tj}E(e) < €!M for 0=0* or o=o(n). On
8=1
- Let z* denote those Z
Z** ,risk contribution using 0('1]) is given by (1).
integer
'I]
such that p
=
'I]
o
'I]
I
points for which ep+l is the experiment of X* for each p and
I
On "'**
Z ,the contributions using either o(Tj) or 0* are, by definition, the
Tj
I
I
I
I
I
.-
I
"'**
Tj
contribution on Z* -Z
~2p'
The risk
using 0* is given by (2) with o(en+l) replaced by 0*
same; these are the first integral in P*
n
to complete the proof.
Corollary 4.3.1
If condition 2 of
The~em
2.2.6 holds, then in
~(El,E2)'
P* is not greater than the risk of the best (i.e., minimum risk) s-nd d r.
~.
The risk of the best s-nd d r
establishes the corollary.
4.4
* so part B of the Theorem
is Po'
*
A partial characterization of the X
c
1
(I+C' I+C)
rule for small c.
The lemma of this section will be used in proving the asymptotic optimality of
the d r
8* which uses a X* e r.
c
Definitions.
o(e) = o(e,c) is the d r in
c
c
+c
+c
~ =~-l--'-l--)
c
which uses experiment
I
4lJ
e in each trial.
5* is the d r in 6 which uses e r
c
c
*
c
1
X (Hc' 'i+c')'
X*
c
n
L(z2 ) = ~ log(f 2
/f
).
- n
p=l
,e p 1 ,e p
Lemma 4.4
1.
I
I
I
I
I
I
L(e,a) is finite for 8,a=1,2.
e
x E, m ~ 1(8,3-e,e)
~
2.
There are numbers (m,M) such that for (e,e)
3.
There is only one experiment, ee (say), in E which maximizes l(e,3-e,.) on E.
4.
There is a number B such that, with d r
€
5(e) in 6
2N
c
M.
c
ee(L(~ ) - log 1 -L
~
1-e
B
and
~
uniformly in 8,e, and c.
B
(This requires that the average "overshoot" of the
action set boundaries be uniformly bounded; see Wa1d [15, Appendix A3.2].)
>
There are numbers (A1,A ) such that if
2
X*
c'~2n
(e )
1
1
(e 2)
1
if
ez
Al <
E
r
> 0, then c 1 > 0 exists with
<
-2n
[ + 1-£
1
c
2
--I
(1),
and
X*
c'~2n
if
c
1-£
2/
[ + '_£]
1
c
2
<
e.[2n <
A ,
2
I
(2),
whenever 0 < c < c .
1
Proof.
Proof of (1) is given; (2) is proved analogously.
that there is a Al such that, for c sufficiently small and A <
1
the best s-nd d r in 6
c
uses experiment e .
Writing N for N(5(e),.[) and
1
e'
At
e',
e'
< [ l+c
l-~]-l
This can be proved as follows.
to represent any possible prior distribution,
de' ,5(e»
the s r of 5(e) is to stop as soon as
I
The lemma asserts
(3) .
,
I
I
I
I
I
-.
I
I
I.
I
I
I
I
Thus by Chernoff's Lemma 3 [3], if the prior distribution is ξ', the expected terminal loss is at most cL̄, where L̄ = max_{Θ×A} L(θ,a). By Wald's equation, if the prior distribution is ξ' and δ(e) is used, then

[−log c − log(ξ'/(1−ξ')) + (2c(1−ξ')/ξ')(log c − B)] / I(1,2,e) ≤ e₁N ≤ [−log c − log(ξ'/(1−ξ')) + B] / I(1,2,e)

and

[−log c + log(ξ'/(1−ξ')) + (2cξ'/(1−ξ'))(log c − B)] / I(2,1,e) ≤ e₂N ≤ [−log c + log(ξ'/(1−ξ')) + B] / I(2,1,e).

From (3), for each e ∈ E and δ(e) ∈ Δ_c,

r(ξ',δ(e)) ≥ [cξ'/I(1,2,e)][−log c − log(ξ'/(1−ξ')) + (2c(1−ξ')/ξ')(log c − B)] + [c(1−ξ')/I(2,1,e)][−log c + log(ξ'/(1−ξ')) + (2cξ'/(1−ξ'))(log c − B)]   (4),

and, with e₁ the experiment which maximizes I(1,2,·),

r(ξ',δ(e₁)) ≤ cξ'[(−log c − log(ξ'/(1−ξ')) + B)/I(1,2,e₁) + L̄ξ'] + c(1−ξ')[(−log c + log(ξ'/(1−ξ')) + B)/I(2,1,e₁) + L̄(1−ξ')]   (5).

Let D₁ = inf_{e≠e₁} [1/I(1,2,e) − 1/I(1,2,e₁)]; by assumption 3, D₁ > 0. Then, using (4) and (5), the difference in the risk using e and that using e₁ is
r(ξ',δ(e)) − r(ξ',δ(e₁)) ≥ −c[ξ'D₁ − 2(1−ξ')/m][log c + log(ξ'/(1−ξ'))] + 2c(log c − B)/m − cB/m − cL̄ ≡ B_c(ξ'), say.

B_c(x) is a concave function of x on (0,1). Also,

B_c(2/(2+mD₁)) ≥ −b c log c [1 + R₁(c)] for some b > 0, where lim_{c→0} R₁(c) = 0.

Hence, there is a c₁₁ > 0 such that if 0 < c < c₁₁, then B_c(2/(2+mD₁)) > 0, i.e., X*_{c,ξ_{2n}}(e₁) = 1 if ξ_{z_{2n}} = 2/(2+mD₁). For ε > 0, let ξ_{ε,c} = [1 + c^{1−ε/2}]^{−1}. Then

B_c(ξ_{ε,c}) ≥ −(ε/4) D₁ c log c [1 + R₂(c)], where lim_{c→0} R₂(c) = 0   (6).

Hence c₁₂ > 0 exists such that if 0 < c < c₁₂, then B_c(ξ_{ε,c}) > 0, i.e., X*_{c,ξ_{2n}}(e₁) = 1 if ξ_{z_{2n}} = ξ_{ε,c}. Since B_c is concave and positive at both ξ' = 2/(2+mD₁) and ξ' = ξ_{ε,c} for each c < c₁ = min(c₁₁,c₁₂), B_c is positive on the interval between 2/(2+mD₁) and ξ_{ε,c}; i.e., X*_{c,ξ_{2n}} uses e₁ if ξ_{z_{2n}} is in this interval. This proves (1) with A₁ = 2/(2+mD₁).
l
* Theorem 4.5.1 specializes a result of
Asymptotic optimality of 0c;
Chernoff [3] to the situation of this chapter.
Theorem 4.5.1
Let 0c' ad
1
trial c, be defined for 0 < c < 1.
lim inf
c=O
r(Lo c )
-c log c
~
for a given
dec~"i0r.
Then if 0 <
sup Iil,2,e)
E
s<
proh1pl"l with cost per
1,
1-5
+ sup I(2,1,e)
E
I
I
I
I
I
1 i f ~z
0, i.e.,
_.
I
I
I
I
I
I
I
-.
I
I
II
I
I
I
I
I
I
Proof. This is Theorem 2 of [3].

By virtue of this theorem, asymptotic optimality of a class of d r is defined as follows.

Definition. Let δ_c, a d r for a given decision problem with cost per trial c, be defined for 0 < c < 1. Then the class {δ_c : 0 < c < 1} is asymptotically optimum at ξ if

lim sup_{c→0} r(ξ,δ_c)/(−c log c) ≤ ξ/sup_E I(1,2,e) + (1−ξ)/sup_E I(2,1,e).

Remark. Asymptotic optimality of {δ*_c : 0 < c < 1} can thus be proved by showing that if α > 0, then c₁ > 0 exists such that if 0 < c < c₁ and θ = 1,2, then

r(θ,δ*_c)/(−c log c) ≤ (1+α)/I(θ,3−θ,e_θ)   (1),

where e_θ is as in Lemma 4.4. For this is equivalent to lim sup_{c→0} r(θ,δ*_c)/(−c log c) ≤ 1/I(θ,3−θ,e_θ) for θ = 1,2, which, since sup_E I(θ,3−θ,e) = I(θ,3−θ,e_θ), implies the inequality of the definition at each ξ.
Theorem 4.5.2. Under the conditions of Lemma 4.4, {δ*_c : 0 < c < 1} is asymptotically optimum at ξ for 0 < ξ < 1.

Proof. (1) is proved for θ = 1, the proof for θ = 2 being completely analogous. The theorem then follows by the preceding remark. The numbers m, c₁, A₁, B are as in Lemma 4.4. Suppose α > 0 and ε ∈ (0,1) are given. For notational convenience, let ζ = log(ξ/(1−ξ)). Consider c ∈ (0,c₂], where c₂ satisfies c₂ ∈ (0,c₁] and log c₂ ≤ log((1−A₁)/A₁).

Partition the sample space Z₀ into components which differ in the way termination is achieved. Z₀ points which do not lead to termination are neglected, since Theorem 2.2.6 proves that this set is null. Reference to the schematic diagram of Figure 3 may be of help in following the argument.
[Figure 3. Schematic diagram of the stopping and experimentation boundaries on the L(z_{2n}) scale, with the corresponding posterior points ξ_{z_{2n}}; not reproduced.]
Define

Z_Ac = {z: for some n, L(z_{2n}) ≥ −log c + ζ and, for p < n, L(z_{2p}) > ¼ε log c + ζ},

and

Z_Bc = {z: for some n, L(z_{2n}) ≤ ¼ε log c + ζ and, for p < n, ¼ε log c + ζ < L(z_{2p}) < −log c + ζ}.

Let N_Ac(z), defined on Z_Ac, be the least n such that L(z_{2n}) ≥ −log c + ζ, and N_1c(z), defined on Z_Bc, be the least n such that L(z_{2n}) ≤ ¼ε log c + ζ. Let N_Bc, defined on Z_Bc, be the least n > N_1c such that L(z_{2n}) ≥ ¼ε log c + ζ or L(z_{2n}) ≤ (1−¼ε) log c. Consider

Z_B1c = {z ∈ Z_Bc: for some n > N_1c(z), L(z_{2n}) ≥ ¼ε log c + ζ and, for N_1c(z) ≤ p < n, L(z_{2p}) > (1−¼ε) log c},

and

Z_B2c = {z ∈ Z_Bc: for some n, L(z_{2n}) ≤ (1−¼ε) log c and, for N_Bc(z) ≤ p < n, L(z_{2p}) < ¼ε log c + ζ}.

Let N_c(z) be the least n such that L(z_{2n}) ∉ (log c + ζ, −log c + ζ), and, on Z_Bc, let N_2c = N_Bc − N_1c and N_3c = N_c − N_Bc.

The average sample number (ASN) of δ*_c when θ = 1 is
e₁N_c = e₁{N_c(z) | z ∈ Z_Ac} Pr₁{z ∈ Z_Ac} + e₁{Σ_{p=1}^{3} N_pc(z) | z ∈ Z_B1c} Pr₁{z ∈ Z_B1c} + e₁{Σ_{p=1}^{3} N_pc(z) | z ∈ Z_B2c} Pr₁{z ∈ Z_B2c}   (2).

By Theorem 2.2.6, there is a b > 0 such that the first term is ≤ e^{−b}/(1−e^{−b}).
Next, e₁{N_1c(z) | z ∈ Z_Bc} is not greater than the average of the number of trials, n, such that L(z_{2n}) > ¼ε log c + ζ, which by Theorem 2.2.4 is ≤ [−¼ε log c + (B + ζ)]/m; the term B/m here is a bound on the average "overshoot" of ¼ε log c + ζ. Now note that, from Lemma 4.4, δ*_c uses e₁ at trial n+1 whenever

(1−¼ε) log c + ζ ≤ L(z_{2n}) ≤ log((1−A₁)/A₁) + ζ,

and, in particular, throughout the N_2c phase on Z_Bc.
Let k(z) on Z_Bc be the "overshoot" of ¼ε log c + ζ by L(z_{2N_1c}). For any k = k(z), the sum of the integrals, i.e., the contributions to e₁N_2c over Z_B1c and Z_B2c, is just the ASN of a s-nd d r using experiment e₁ and terminating as soon as the log likelihood ratio S⁰_n satisfies S⁰_n ∉ ((1−¼ε) log c − k, ¼ε log c − k). By Theorem 2.2.7, this is, for any k,

≤ [−(1−¼ε) log c + B]/I(1,2,e₁) · Pr₁{z ∈ Z_Bc}.

The expectation e{N_3c(z) | z ∈ Z_B1c} is, by Theorem 2.2.7, not greater than an upper bound on the ASN of a s-d d r which stops after the first trial at which L(z_{2n}) ∉ (2 log c, −(1+ε) log c). Such a bound is
[−2 log c + B]/m. Similarly, e{N_3c(z) | z ∈ Z_B2c} ≤ [−¼ε log c + B]/m. Required bounds on probabilities are

Pr₁{z ∈ Z_B2c} ≤ 1,   Pr₁{z ∈ Z_B1c} ≤ Pr{S⁰ terminates ≤ ¼ε log c} ≤ c^{¼ε},

by Chernoff's Lemma 3 [3]. Also from this lemma, it follows that

Pr₁{z ∈ Z_Ac} ≤ c e^{−ζ}   (3).
To bound r(1,δ*_c), use (loss bound) ≤ L̄ c e^{−ζ} together with (2), (3), and the preceding facts to obtain, for some numbers b₁ > 0 and b₂ > 0 and for 0 < c < c₂,

r(1,δ*_c) ≤ (loss bound) + (average-cost bound on Z_Ac) + (average-cost bound for N_1c trials on Z_Bc) + (average-cost bound for N_c − N_1c trials on Z_B1c) + (average-cost bound for N_c − N_1c trials on Z_B2c)

≤ L̄ c e^{−ζ} + c e^{−b₁}/(1−e^{−b₁})² + [−¼ε c log c + cB]/m + c^{¼ε}[−2 c log c + cB]/m + [−(1−¼ε) c log c + cB]/I(1,2,e₁) + [−¼ε c log c + cB]/m

≤ −c log c [1/I(1,2,e₁) + b₂ε + R₃(c)],
where lim_{c→0} R₃(c) = 0. Since ε may be taken small enough that b₂ε < α, there is a c₁ ∈ (0,c₂) such that if 0 < c < c₁, then

r(1,δ*_c) ≤ −c log c [1/I(1,2,e₁) + α],

to complete proof of (1) for θ = 1. The theorem follows as described at the beginning of this proof.
CHAPTER V

AN APPLICATION: THE CASE OF LOST LABELS AND THE NO-OVERSHOOT X* RULE

5.1 Introduction. The elements of the decision problem in this chapter specialize those of Chapter 4. Here Θ = A = E = {1,2} and L(θ,a) > 0 if and only if θ ≠ a. The probability structure involves two probability densities, g₁ and g₂, with respect to a given measure ν on a given measurable space (X,S). Specifically, f_{θ,e} = g₁ if θ = e and f_{θ,e} = g₂ if θ ≠ e, as illustrated in Table V. Cost per trial is a constant, c. The probability measure ξ on the subsets of Θ can be denoted by ξ = ξ(1) = 1 − ξ(2).
Table V. The densities f_{θ,e}

            e = 1    e = 2
  θ = 1      g₁       g₂
  θ = 2      g₂       g₁
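In code, the structure of Table V is a single exchange of g₁ and g₂; the sketch below uses, as placeholder densities, the binomial η values that appear in Section 5.5.

    g1 = lambda x, eta=0.7457: eta if x == 1 else 1.0 - eta
    g2 = lambda x, eta=0.5808: eta if x == 1 else 1.0 - eta

    def f(theta, e, x):
        """Density f_{theta,e}(x) of Table V: g1 when theta == e, else g2."""
        return g1(x) if theta == e else g2(x)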
This, for example, is the structure in the problem of target detection in Chapter 3 when

(a) radar readings are random variables with density g₁ if a target is present and g₂ if no target is present, and

(b) a target is known to be present, but it is not known whether it is in the north (θ=1) or in the south (θ=2).
In another practical application, this situation can be called the "case of lost labels", because of the following interpretation. A physician is treating a disease which can be cured quickly using drug 1 and less quickly using drug 2, the performance of drug d on any particular individual being measured by a random variable with density g_d. Suppose there is a supply of each drug, but that the labels on the two supplies have been lost. The physician must determine, by experimentation with patients, which supply contains drug 1. If the supplies are called s₁ and s₂, then he is working in the present framework if the cost of administering either drug is c and where, for p = 1,2, θ = p if drug 1 is in supply s_p, and e = p if the drug from supply s_p is administered.

Note also that the Θ, E, and probability structure here are the same as in the generalized two-armed bandit problem, as solved by Feldman [4]. However, as indicated in Chapter 1, the loss-cost structures differ, so that in the numerical example of this chapter, the e r for the Bayes two-armed bandit is just the opposite of the Bayes solution here.

For this lost-labels case, a "no-overshoot" X* e r of Section 2.1 is defined explicitly, i.e., each of the probability measures X*_{ξ_{2n}} is defined explicitly. By a no-overshoot X* e r is meant one in which the quantities r(ξ,δ(e)) are evaluated only approximately, the approximation introduced by "neglecting the excess over the stopping boundaries", i.e., assuming the posterior distribution when sampling is stopped is either exactly c₁ or exactly c₂.

A method of evaluating the risk of s-d d r in Δ(c₂,c₁), adapted from the work of Whittle [17], is sketched and applied in an example with certain binomial distributions to obtain no-overshoot approximations to the risks of d r using X* e r and Chernoff e r. It is shown that, for some special parameter values, a Bayes solution in this binomial case is a X*-type d r.
5.2 Specification of the no-overshoot X* rule in Δ(c₂,c₁).

Notation. For convenience, let

∇I = I(θ,3−θ,θ) = ∫_X (log(g₁/g₂)) g₁ dν

and

ΔI = I(3−θ,θ,θ) = ∫_X (log(g₂/g₁)) g₂ dν.
gl
The no-overshoot approximation to be used, as indicated in the previous
se~
tion, can be written as
log (l-C l -1-) .. t
c
L(!.2N)
•
1
log (
where
~
l-~
l
l-c
2 -1-) = t
c2
l-~
l
2
if
if
is the prior probability that e=l.
The no-overshoot approximations of Wald [15] to error probabilities are
"
ex
these do not depend on which d r
o(e) is used.
Approximations to average L(!.2N)
values are
and
..
(
l-c
c
l-c -12 -1-) + (l-c )(~-c ) log (-c---l 1 t) J
l-~
1
2
1
-s
2
To approximate the risk
the approximations above are used to give
r(ξ,δ(e)) ≈ L(1,2) c₂(c₁−ξ)/(c₁−c₂) + L(2,1)(1−c₁)(ξ−c₂)/(c₁−c₂)
  − [c/I(1,2,e)] [c₁(ξ−c₂)t₁ + c₂(c₁−ξ)t₂]/(c₁−c₂)
  + [c/I(2,1,e)] [(1−c₁)(ξ−c₂)t₁ + (1−c₂)(c₁−ξ)t₂]/(c₁−c₂)   (1).

Alternatively, this result is obtained by the methods of [17].
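These approximations are easily evaluated numerically. The sketch below implements the formulas as reconstructed above; every input, including the information numbers I(1,2,e) and I(2,1,e), is a placeholder value.

    import math

    def no_overshoot_risk(xi, c1, c2, cost, L12=1.0, L21=1.0, I12=0.25, I21=0.25):
        """No-overshoot approximation to r(xi, delta(e)) for c2 < xi < c1."""
        t1 = math.log((1 - c1) / c1 * xi / (1 - xi))   # L(z_2N) at the c1 boundary
        t2 = math.log((1 - c2) / c2 * xi / (1 - xi))   # L(z_2N) at the c2 boundary
        span = c1 - c2
        err1 = c2 * (c1 - xi) / span              # xi * Pr_1{stop at c2}
        err2 = (1 - c1) * (xi - c2) / span        # (1-xi) * Pr_2{stop at c1}
        e1L = (c1 * (xi - c2) * t1 + c2 * (c1 - xi) * t2) / span       # xi * e_1 L
        e2L = ((1 - c1) * (xi - c2) * t1 + (1 - c2) * (c1 - xi) * t2) / span
        return L12 * err1 + L21 * err2 + cost * (-e1L / I12 + e2L / I21)

    print(no_overshoot_risk(xi=0.5, c1=0.9, c2=0.1, cost=0.01))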
Definition. d(ξ) = r̃(ξ,δ(2)) − r̃(ξ,δ(1)), where r̃ denotes the no-overshoot approximation (1).

The no-overshoot X̃* e r in Δ(c₂,c₁) is given by

X̃*_{ξ_{2n}}(1) = 1 if ξ_{2n} = ξ' and d(ξ') ≥ 0, and X̃*_{ξ_{2n}}(1) = 0 otherwise.

By (1), after some simplification, the sign of d(ξ') is determined by comparing g(ξ') with the chord joining g(c₂) and g(c₁), where g(x) = log((1−x)/x) for 0 < x < 1; the comparison is stated precisely in Theorem 5.2.
Theorem 5.2. A. If ∇I > ΔI, then X̃* uses e = 2 for those, and only those, posterior ξ' satisfying

g(ξ') > g(c₂) + [(ξ'−c₂)/(c₁−c₂)][g(c₁) − g(c₂)].

B. If ∇I < ΔI, then X̃* uses e = 2 for those, and only those, posterior ξ' satisfying

g(ξ') < g(c₂) + [(ξ'−c₂)/(c₁−c₂)][g(c₁) − g(c₂)].

C. If ∇I = ΔI, then X̃* uses e = 1 for each trial.

Two simplifications are worth note. First, X̃* in Δ(c₂,c₁) can be specified in "cook book" form, with no calculations required of its user, by use of Figure 4 as follows.
[Figure 4. The curve g(x) = log((1−x)/x) with the chord ℓ(c₁,c₂) joining the ordinates g(c₁) and g(c₂); not reproduced.]

(a) Connect the ordinates g(c₁) and g(c₂) with a line ℓ(c₁,c₂), say.

(bA) If ∇I < ΔI and the posterior probability ξ' is in (c₂,c₁), then take another observation, using e = 2 if and only if the curve g is below the line ℓ(c₁,c₂) at ξ'.

(bB) If ∇I > ΔI and the posterior probability ξ' is in (c₂,c₁), then take another observation, using e = 2 if and only if the curve g is above the line ℓ(c₁,c₂) at ξ'.
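The cook-book rule translates directly into code; a sketch assuming only the chord test of Theorem 5.2 (the two information numbers enter through their order alone):

    import math

    def g(x):
        return math.log((1.0 - x) / x)            # g(x) = log((1-x)/x)

    def no_overshoot_xstar(post, c1, c2, nabla_I, delta_I):
        """No-overshoot X* e r of Figure 4: returns the experiment (1 or 2)
        for a posterior post in (c2, c1)."""
        chord = g(c2) + (post - c2) / (c1 - c2) * (g(c1) - g(c2))
        if nabla_I > delta_I:
            return 2 if g(post) > chord else 1    # case (bB)
        if nabla_I < delta_I:
            return 2 if g(post) < chord else 1    # case (bA)
        return 1                                  # equal information: e = 1 always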
"*
Second, the facts of Lemma 5.2 concerning g aid in specifying X •
Lemma 5.2 A.
g'
< 0 on (0,1).
> 0 on (O,!) and gil < 0
B.
gil
c.
If
0 ~
on (!,l).
<!, then g(tt-x)
x
= -g(~x).
For (c ,c2 ) with 0 < c2 < c l < 1, there is at most one ~* in (c ,c ) satisl
2 l
fying h( ~*> = 0, where
_ x-c
2
hex) = g(x) - g(c )
[g(c ) - g(c )] •
2
cl -c2
l
2
Suppose one such ξ* exists and suppose ξ* < ½; the case ξ* ≥ ½ is treated similarly. Since h is convex up on (0,½], h has at most two zeros on (0,½), viz., c₂ and ξ*, and h(½) > 0. Define

f(x) = [(x−½)/(c₁−½)] g(c₁)

and h* = g − f. On (½,c₁), h* > 0, since h*(½) = 0 = h*(c₁) and since h* is convex downward on [½,1). But h − h* = f − ℓ, where ℓ is the chord in the definition of h; this difference is linear, positive at ½, and 0 at c₁, so h − h* > 0 on [½,c₁); i.e., h > h* > 0 on [½,c₁).

Therefore, there is at most one value ξ* of the posterior probability such that one experiment is preferred for all ξ' > ξ* while the other is preferred if ξ' < ξ*. If no such ξ* exists, then the proof of Lemma 5.2 shows that X̃* uses one experiment at all trials. Note that if c₁ + c₂ = 1, then ξ* exists and equals ½.
5.3 Some Fourier analysis. Consider the lost labels problem with

L(θ,a) = 1 if θ ≠ a, and L(θ,a) = 0 if θ = a   (1).

This type of symmetry, i.e., errors of either kind being of equal seriousness, is also part of the structure of the two-armed bandit problem in [4].
As usual, ρ(ξ) is the Bayes risk at prior distribution ξ. From Corollary 2.2.1, experiment e is preferred for the first trial at prior distribution ξ₀ (hence also when ξ_{z_{2n}} = ξ₀) if and only if R_e(ξ₀) < R_{3−e}(ξ₀), where

R_e(ξ) = ∫_X ρ(ξ_{e,x}) f_{ξ,e}(x) dν(x).
From the symmetry of the problem, some useful facts can be deduced. First, ρ(ξ) = ρ(1−ξ). For let δ be a Bayes rule which is a non-randomized function of the posterior distribution for the given problem. Consider the same problem formulated in terms of θ* = 3−θ, e* = 3−e, and a* = 3−a. The probability densities are again as in Table V and the loss function as in (1), except with θ replaced by θ*, e replaced by e*, and a replaced by a*. This revised problem thus has the same formal structure as the original one. Hence δ is a Bayes d r in this case also, and

ρ(ξ(θ)) = ρ(ξ*(θ*)) = ρ(1−ξ(θ))   (2).

Suppose δ uses experiment e when the posterior probability that θ=1 is ξ. This means that δ uses experiment e* = 3−e when the posterior probability that θ*=1 is ξ, i.e., using δ, e(1−ξ) = 3−e(ξ). Moreover, by (2), it follows from Theorems 2.2.5 and 2.2.9 that there is a Bayes rule in Δ(1−c₁,c₁) for some c₁ in [½,1).
Let Δ' denote the class of d r which are in Δ(c₁,1−c₁) for some c₁ in [½,1) and for which, if ½ < ξ < c₁ and if e(ξ) = e, then e(1−ξ) = 3−e. Any such d r can be denoted δ_E, where E = (Ē₁,Ē₂), as defined immediately preceding Theorem 2.2.5. Symmetry arguments like those above show that if δ ∈ Δ', then r(ξ,δ) = r(1−ξ,δ).

Suppose δ_E ∈ Δ', and let ζ = ζ(ξ) = log(ξ/(1−ξ)) and y = y(x) = log[g₁(x)/g₂(x)].
F_{θE}(ζ) = 0 if e^ζ/(1+e^ζ) ∈ Ē_θ(E). Note that F_{1E}(ζ) = F_{2E}(−ζ), since r(ξ,δ_E) = r(1−ξ,δ_E). Therefore, Theorem 2.2.1 implies renewal-type equations for F_{1E} and F_{2E}   (3), (4),

with inhomogeneous terms

ψ_{1E}(ζ) = [e^ζ/(1+e^ζ)²] ∫ [min(1,e^ζ) − min(1,e^{ζ+y})] G(y) dν*(y) if e^ζ/(1+e^ζ) ∈ Ē₁(E), and 0 otherwise;

ψ_{2E}(ζ) = [e^ζ/(1+e^ζ)²] ∫ [min(1,e^ζ) − min(e^ζ,e^y)] g̃(y) dν*(y) if e^ζ/(1+e^ζ) ∈ Ē₂(E), and 0 otherwise.

Here, G is the density, with respect to ν* (say), of y = y(x) when x has density g₂, and g̃(y) = G(y)e^y.

Forming Fourier–Stieltjes transforms in (3) and (4) gives two linear equations in the transforms of F_{1E} and F_{2E}. Let f⁰ denote the Fourier transform of the function f. Then F⁰_{1E} and F⁰_{2E} are given by

F⁰_{1E}(θ) = [ψ⁰_{1E}(θ)(1 − g̃⁰(−θ)) + ψ⁰_{2E}(θ) g̃⁰(−θ)] / [1 − g̃⁰(−θ) − G⁰(θ)]   (5),

F⁰_{2E}(θ) = [ψ⁰_{2E}(θ)(1 − G⁰(θ)) + ψ⁰_{1E}(θ) G⁰(θ)] / [1 − g̃⁰(−θ) − G⁰(θ)]   (6),

where h⁰(θ) = ∫ e^{iθy} h(y) dν*(y) for h = G, g̃. Also,

F⁰_E(θ) = F⁰_{1E}(θ) + F⁰_{2E}(θ)   (7).
Equations (5)–(7) provide solutions, at least in concept, to the problems of evaluating risks of a given d r in Δ' and of searching for a Bayes rule. If the Fourier inversion F_E of F⁰_E in (7) can be found, then whether or not δ_E is Bayes is easily verified, since δ_E is Bayes if and only if F_E satisfies the Bayes risk equation of Corollary 2.2.1 written in terms of ζ. The prospective usefulness of this criterion, however, seems limited. One of the simplest discrete cases uses binomial distributions, where g_p is the density of a binomial random variable with success probability η_p, i.e., g_p(1) = η_p = 1 − g_p(0). In this case, h⁰(θ)(1 + e^{iθ}) is analytic for h = G, g̃, and the denominator in (5) and (6) is an explicit trigonometric polynomial   (8).

If the residue theorem is to be used to evaluate the inversion theorem integral, then the zeros of (8) must be found. This itself is a difficult problem. For special binomial random variables, however, the modified Fourier analysis of [17] leads to discovery of a Bayes rule, as proved in Section 5.5. The same difficulty of inversion arises in one of the simplest cases with continuous random variables, viz., for exponential densities g_p(x) = λ_p e^{−λ_p x}; in this case one must find the zeros of an explicit combination of sin(z log r) and cos(z log r), where r = λ₁/λ₂ and sin and cos are the complex trigonometric functions.
Equations (5)–(7) provide a second necessary and sufficient condition for δ_E to be Bayes. This follows from the fact that r(ξ,δ_E) < r(ξ,δ_{E'}) if and only if F_E(ζ(ξ)) > F_{E'}(ζ(ξ)). Therefore, r(ξ,δ_E) is the Bayes risk at ξ if and only if, for any E' such that δ_{E'} ∈ Δ', F⁰_E − F⁰_{E'} is a non-negative multiple of a characteristic function of a probability distribution. Although this criterion does not seem practically applicable in proving some δ_E to be a Bayes d r, it may be of use in proving that some δ_{E'} ∈ Δ' is inadmissible, by finding another such d r δ_E such that F⁰_E − F⁰_{E'} is a positive multiple of a characteristic function.
Use of (5) and (6) in (7) gives, formally, on expanding the common denominator in the series Σ_{n=0}^{∞} [G⁰(θ) + g̃⁰(−θ)]ⁿ,

F_E = Σ_{n=0}^{∞} [(ψ_{1E} + ψ_{2E}) + h₁ * (ψ_{1E} − ψ_{2E})] * h₂*ⁿ,

where

h₁(y) = G(y) − g̃(−y),   h₂(y) = G(y) + g̃(−y),

* denotes convolution, and f*ⁿ is the n-fold convolution of f with itself. Since ψ_{θE} is non-negative, a Bayes d r in Δ' is one which maximizes, by choice of E, the function F_E(ζ(ξ)) for all ξ simultaneously, if possible.
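Formally truncating this series suggests a simple numeric scheme. In the sketch below the grid densities and ψ terms are placeholder arrays, and np.convolve stands in for the convolutions written above.

    import numpy as np

    def truncated_FE(psi_sum, psi_diff, h1, h2, n_terms=50):
        """Sum F_E = sum_n [ (psi_1E + psi_2E) + h1 * (psi_1E - psi_2E) ] * h2^{*n}
        over n = 0..n_terms on a common grid (all arguments same-length arrays)."""
        base = psi_sum + np.convolve(h1, psi_diff, mode="same")
        term, total = base.copy(), base.copy()
        for _ in range(n_terms):
            term = np.convolve(term, h2, mode="same")   # one more h2-convolution
            total += term
            if np.max(np.abs(term)) < 1e-12:            # effectively converged
                break
        return total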
5.4 Some characteristics of the Bayes stopping rule. The results of this section were indicated to me by W. Hoeffding. They are included here, since they provide an interesting characterization of the Bayes s r for the problem of this chapter. Theorem 5.4.1 holds for general s-d problems. The other results are for the lost labels problem with the symmetric loss structure of equation (5.3.1).

Theorem 5.4.1. ρ(ξ) ≥ min[ρ₀(ξ), c/A], where

A = 1 − inf_{e∈E} ∫_X inf_{θ∈Θ} f_{θ,e}(x) dν(x).

Note that in the lost labels case

A = ½ ∫_X |g₁(x) − g₂(x)| dν(x).

Proof. Since

ρ(ξ) = min[ρ₀(ξ), c + inf_{e∈E} ∫_X ρ(ξ_{e,x}) f_{ξ,e}(x) dν(x)]

and

inf_{e∈E} ∫_X ρ(ξ_{e,x}) f_{ξ,e}(x) dν(x) = inf_{e∈E} ∫_X [inf_{δ∈Δ} Σ_θ r(θ,δ) ξ(θ) f_{θ,e}(x)] dν(x) ≥ inf_{e∈E} ∫_X ρ(ξ) inf_{θ∈Θ} f_{θ,e}(x) dν(x) ≥ (1−A) ρ(ξ),

it follows, from Corollary 2.2.1, that if ρ(ξ) < ρ₀(ξ), then ρ(ξ) ≥ c + (1−A)ρ(ξ), i.e., ρ(ξ) ≥ c/A.
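For the binomial example of Section 5.5 the bound is immediate; a sketch using the η values given there:

    eta1, eta2 = 0.7457, 0.5808
    # A = (1/2) * sum over x in {0,1} of |g1(x) - g2(x)|
    A = 0.5 * (abs(eta1 - eta2) + abs((1 - eta1) - (1 - eta2)))
    for c in (0.0125, 0.000915):
        print(f"c = {c}: rho(xi) >= min(rho_0(xi), {c / A:.4f})")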
Theorem 5.4.2. A necessary and sufficient condition for the existence of ξ such that a Bayes d r at ξ can take at least one trial is that c ≤ ½A.

Proof. Suppose that c > ½A. Then ρ₀(ξ) ≤ ½ < c/A, so that min[ρ₀(ξ), c/A] = ρ₀(ξ). Since by Corollary 2.2.1 ρ(ξ) ≤ ρ₀(ξ), it follows from Theorem 5.4.1 that ρ(ξ) = ρ₀(ξ), and there is a Bayes d r which takes no trials. Next, suppose that c ≤ ½A. Then

ρ₁(½) = c + ½(1−A) ≤ ½ = ρ₀(½),

so that a Bayes d r at ξ = ½ can take at least one trial.

Definition. If δ is a non-randomized function of the posterior distribution, define the continuation region C̃ of δ as the set of posterior distributions at which δ takes another observation.

Theorem 5.4.3. If δ is a Bayes d r which is a non-randomized function of the posterior distribution, then

((1−β)/2, (1+β)/2) ⊆ C̃ ⊆ (c/A, 1−c/A),   where β = (A−2c)/(2−A).

Proof. Since, in the symmetric-loss lost labels problem, ρ is symmetric and since each optimal terminal-action set is convex, it follows that if C̃ is not empty, then ½ ∈ C̃. If |ε| ≤ (A−2c)/(2−A), then

ρ₁((1+ε)/2) − ρ₀((1+ε)/2) ≤ c + (1+|ε|)(1−A)/2 − (1−|ε|)/2 ≤ 0,

to establish the first inclusion. If ξ ∈ C̃, then ρ₀(ξ) > ρ(ξ), so that, using Theorem 5.4.1, min(ξ,1−ξ) = ρ₀(ξ) > ρ(ξ) ≥ c/A. Hence, if min(ξ,1−ξ) ≤ c/A, then ξ ∉ C̃, to complete the proof.
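The two inclusions are easily tabulated; a sketch for the binomial example, with the two costs used in Section 5.5:

    eta1, eta2 = 0.7457, 0.5808
    A = abs(eta1 - eta2)                      # total-variation A for these binomials
    for c in (0.0125, 0.000915):
        beta = (A - 2 * c) / (2 - A)
        inner = ((1 - beta) / 2, (1 + beta) / 2)   # contained in the continuation region
        outer = (c / A, 1 - c / A)                 # contains the continuation region
        print(f"c = {c}: ({inner[0]:.4f}, {inner[1]:.4f}) within C "
              f"within ({outer[0]:.4f}, {outer[1]:.4f})")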
5.5 An example. In this section, an example where g₁ and g₂ are binomial densities with a special property is considered. In this case, for certain c, a X* d r is Bayes. For this special case, an extension to s-d d r of Whittle's method [17] of evaluating the risk function is sketched and used to compare no-overshoot approximations to the Bayes risk of the X*-type d r and the risks of two Chernoff-type d r. Conditions under which this method of evaluating risks is justified in more general problems are of interest, although no such justification is given here. Solutions by this same method in more general cases of interest are presently being sought.
In the special case considered, η₁ = g₁(1) = 1 − g₁(0) = .7457 and η₂ = g₂(1) = 1 − g₂(0) = .5808. The special feature of these densities is that

log((1−η₂)/(1−η₁)) = −2 log(η₂/η₁)   (1);

in particular,

log(η₁/η₂) = ¼ and log((1−η₂)/(1−η₁)) = ½   (2).

The methods applicable for the special g_p values here extend without modification to any binomial distributions satisfying (1), but probably no further. The locus of "success probability" pairs (η₁,η₂) which satisfy (1) is shown in Figure 5.

[Figure 5. The locus of pairs (η₁,η₂) satisfying (1); not reproduced.]
The simplification which results from (1) and (2) is because of the fact that if ξ is the prior distribution, i.e., the prior probability that θ = 1, then for any z and any n,

ξ_{z_{2n}} = [1 + ((1−ξ)/ξ) e^{k/4}]^{−1},

where k is an integer. This is true since if, before a trial, the prior distribution is ξ, then the possible posterior distributions after the trial are as given in Table VI.
Table VI. Posterior distribution after trial (e,x)

                e = 1                              e = 2
  x = 0   [1 + ((1−ξ)/ξ)e^{1/2}]^{−1}     [1 + ((1−ξ)/ξ)e^{−1/2}]^{−1}
  x = 1   [1 + ((1−ξ)/ξ)e^{−1/4}]^{−1}    [1 + ((1−ξ)/ξ)e^{1/4}]^{−1}

The loss function is L(θ,a) = 0 or 1 according as θ = a or θ ≠ a.
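Conditions (1) and (2) and the lattice property of Table VI can be checked in a few lines; the update rule below restates Table VI in terms of the integer k.

    import math, random

    eta1, eta2 = 0.7457, 0.5808
    print(4 * math.log(eta1 / eta2))              # ~ 1.0: log(eta1/eta2) = 1/4
    print(2 * math.log((1 - eta2) / (1 - eta1)))  # ~ 1.0: log((1-eta2)/(1-eta1)) = 1/2

    def next_k(k, e, x):
        """Table VI on the lattice xi_k = [1 + ((1-xi)/xi) e^{k/4}]^{-1}:
        e=1: x=0 -> k+2, x=1 -> k-1; e=2: x=0 -> k-2, x=1 -> k+1."""
        if e == 1:
            return k + 2 if x == 0 else k - 1
        return k - 2 if x == 0 else k + 1

    k = 0                                          # prior xi_0 = 1/2
    for _ in range(10):
        k = next_k(k, random.choice((1, 2)), random.choice((0, 1)))
    print("k =", k, " posterior =", 1.0 / (1.0 + math.exp(k / 4.0)))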
In the remainder of this section, attention will be confined to cases where the prior distribution is of the form [1 + e^{k/4}]^{−1} for k an integer. The analysis for any other prior ξ can be carried out by the same method. Let ξ_k = [1 + e^{k/4}]^{−1}.

With the (η₁,η₂) values of the example, the X̃* and Chernoff e r in Δ(c₂,c₁) are specified as follows. If c₁ = 1−c₂, then X̃*_{ξ_{2n}}(1) = 1 if and only if ξ_{z_{2n}} ≤ ½, i.e., X̃*_{ξ_{2n}}(1) = 1 if and only if ξ_{z_{2n}} = ξ_k and k ≥ 0. Let X^C denote the Chernoff e r [3] when the prior distribution is ξ. If the prior distribution is ξ_{k'}, then X^C_{ξ_{2n}}(1) = 1 if and only if ξ_{z_{2n}} < ξ_{k'}, i.e., if and only if ξ_{z_{2n}} = ξ_k and k > k'.
From Theorem 2.2.1, the risk r(ξ_{k'}, δ^C_{k'}) of the d r δ^C_{k'} in Δ(c₂,c₁) which uses X^C_{k'} is the solution at s = k' of the following functional equations:

r(ξ_s) = ρ₀(ξ_s) if ξ_s ∉ (c₂,c₁)   (3),

r(ξ_s) = c + r(ξ_{s+2}) f_{ξ_s,1}(0) + r(ξ_{s−1}) f_{ξ_s,1}(1) if s ≥ k' and ξ_s ∈ (c₂,c₁)   (4),

r(ξ_s) = c + r(ξ_{s−2}) f_{ξ_s,2}(0) + r(ξ_{s+1}) f_{ξ_s,2}(1) if s < k' and ξ_s ∈ (c₂,c₁)   (5).

Similarly, the risk r(ξ_k, δ*) of the d r δ* is the solution at s = k of the functional equations (3)–(5) with 1−c₁ in place of c₂, and with the condition "s ≥ k'" replaced by "s ≥ 0" in (4) and "s < k'" replaced by "s < 0" in (5).
Let N₂ be the smallest integer such that ξ_{N₂} ≤ c₂ and N₁ the largest integer such that ξ_{N₁} ≥ c₁. Then, following Whittle, it can be established that a solution to (3)–(5) is

r(ξ) = min(ξ, 1−ξ) if ξ = ξ_s and s ∉ (N₁, N₂);
r(ξ) = r₁(ξ) if ξ = ξ_s and k ≤ s < N₂;
r(ξ) = r₂(ξ) if ξ = ξ_s and N₁ < s < k;

the functions r₁ and r₂ are given by

r₁(ξ) = c log((1−ξ)/ξ)(1/∇I − 1/ΔI) + ∫_{N₁} ξ^v (1−ξ)^{1−v} df₁(v)   (6.1),

r₂(ξ) = c log(ξ/(1−ξ))(1/∇I − 1/ΔI) + ∫_{N₂} ξ^v (1−ξ)^{1−v} df₂(v)   (6.2),

where

N_e = {v: Σ_{x=0}^{1} (f_{1,e}(x))^v (f_{2,e}(x))^{1−v} = 1}

and f_e(v) is determined by the requirement that it be a measure on a class of subsets of N_e such that (6.e) satisfies (4) or (5), for e = 1,2, and such that

r₁(ξ_k) = c + r₁(ξ_{k+2}) f_{ξ_k,1}(0) + r₂(ξ_{k−1}) f_{ξ_k,1}(1)   (7.1)

and

r₂(ξ_{k−1}) = c + r₁(ξ_k) f_{ξ_{k−1},2}(1) + r₂(ξ_{k−3}) f_{ξ_{k−1},2}(0)   (7.2).

The solution to the corresponding equations for r(ξ_{k'}, δ*) with δ* in Δ(ξ_N, ξ_{−N}) is the solution of the preceding paragraph with N₁ = −N, N₂ = N, k = 0, and ξ = ξ_{k'}.

For the present, this "solution" is useful, since it suggests a possible form of the Bayes solution and since it will provide an approximate evaluation, albeit inadequately justified, of the risks of δ^C and δ*. Rigorous justification of the method is of interest but is not given here.
There is, in fact, a δ* rule which is Bayes for various values of c, as proved by the following result.

Theorem 5.5. If c is the cost per trial and there is an integer N such that c, N, η₁, and η₂ satisfy the relation obtained by imposing equation (7.1) on the form (8) below, then the d r δ* in Δ(ξ_N, ξ_{−N}) is a Bayes d r.

Proof. The first step in the proof is to verify that this δ* has risk r(ξ_k, δ*) of the form (6), with each f_e concentrated on the points v = 0 and v = 1 of N_e and the two atoms fitted by the end conditions r_e(ξ_k) = ρ₀(ξ_k) for |k| = N, N+1   (8).

This is accomplished by showing that the function in (8) satisfies the functional equation relevant to δ* in Theorem 2.2.1.
Specifically, the following can be verified straightforwardly:

(a) r₁(ξ_k) = ξ_k for k = N, N+1;
(b) r₂(ξ_k) = 1 − ξ_k for k = −N, −N−1;
(c) equation (7.1) is satisfied for k = 0;
(d) r₁(ξ_k) = c + r₁(ξ_{k+2}) f_{ξ_k,1}(0) + r₁(ξ_{k−1}) f_{ξ_k,1}(1), for 1 ≤ k < N;
(e) r₂(ξ_k) = c + r₂(ξ_{k−2}) f_{ξ_k,2}(0) + r₂(ξ_{k+1}) f_{ξ_k,2}(1), for −N < k ≤ −1.

In fact, because of the symmetry of the framework of the problem and the fact that r₁(ξ_k) = r₂(ξ_{−k}), only (a), (c), and (d) need be verified.

The proof is completed by verifying that the function in (8) satisfies the functional equation for Bayes risk of Corollary 2.2.1. This is verified for ξ ≤ ½; the proof for ξ > ½ is again established by symmetry.
The "optimum boundary" condition, by (d), is that ρ₀(ξ_k) ≥ r₁(ξ_k) if 0 ≤ k < N, while ρ₀(ξ_k) ≤ r₁(ξ_k) if k ≥ N. This is satisfied, since r₁ is a concave function of ξ and since, by (a), r₁ and ρ₀ agree at k = N, N+1.

It remains to show that r satisfies the "optimum experiment" condition, viz.,

(f) if 2 ≤ k < N and ξ = ξ_k, then r₁(ξ) ≤ Σ_{x=0}^{1} r₁(ξ_{2,x}) f_{ξ,2}(x) + c, and
(g) r₁(ξ₁) ≤ r₁(ξ₂) f_{ξ₁,2}(1) + r₂(ξ_{−1}) f_{ξ₁,2}(0) + c.

Condition (f) is satisfied, since if 2 ≤ k < N and ξ = ξ_k, then (9) holds for the (η₁,η₂) pair of the example. In fact (9) holds for k = 1. To verify (g), note that if k > 0, then r₁(ξ_k) > r₁(ξ_{−k}), i.e., r₂(ξ_{−1}) = r₁(ξ₁) > r₁(ξ_{−1}). Hence, by (9) with ξ = ξ₁, the left side of (g) is bounded by a quantity which is less than the right-hand side in (g), to complete the proof.
Remark. The key to the preceding result is the assumed form of the Bayes risk in equation (8). It may therefore be of interest to indicate the motivation for assuming this form. The key is a comparison with the form in (6). In fact, I arrived at (8) by beginning with (6), putting f_e(v) = 0 on N_e except at the real numbers v = 0 and 1, and fitting the two end conditions r_e(ξ_k) = ρ₀(ξ_k) for |k| = N, N+1. From this form, the value of N for which equation (7.1) holds was determined, as in the statement of the theorem.

An approximate comparison of the average risk of a Bayes rule δ* with a Chernoff rule δ^C is also available through equations (6).
Again in this case the approximation is of the no-overshoot type, in the sense that for a d r in Δ(c₂,c₁), the posterior distribution at termination is assumed to be exactly c₁ or c₂. With this approximation, attention can be confined to the real subset {0,1} of N_e, as defined following equations (6), to obtain

r̃(ξ) = ρ₀(ξ) if ξ ∉ [c₂,c₁];
r̃(ξ) = c log((1−ξ)/ξ)(1/∇I − 1/ΔI) + ξ f₁(1) + (1−ξ) f₁(0) if c₂ ≤ ξ ≤ ξ_k;
r̃(ξ) = c log(ξ/(1−ξ))(1/∇I − 1/ΔI) + ξ f₂(1) + (1−ξ) f₂(0) if ξ_k < ξ ≤ c₁   (10),

as the no-overshoot approximation to the risk of the d r δ^C_k in Δ(c₂,c₁). The function f₁ is determined by (7.1) and the no-overshoot boundary condition r̃(c₂) = ρ₀(c₂); similarly, f₂ is evaluated using (7.2) and r̃(c₁) = ρ₀(c₁).

The corresponding no-overshoot approximation to r(ξ,δ*) in Δ(1−c₁,c₁) is given by (10) with k = 0, c₂ = 1−c₁, and the same method of determining f₁ and f₂.

For the special values of c defined in Theorem 5.5, the exact risk of this procedure is available through (8).
Two particular s r are of interest with the X^C e r. First is the type which Chernoff proposed and which, since it does not depend on the prior distribution, does not require that prior knowledge be specified. This rule is to stop after n observations if n is the least integer m for which L(z_{2m}) falls outside some interval (−k,k). I refer to this type of s r as L-symmetric. The second boundary considered is ξ-symmetric, in the sense that it stops after n observations if n is the least integer m for which ζ(ξ_{z_{2m}}) falls outside some interval (−k,k).

Formulae for no-overshoot risk approximations are given in Table VII. A remark concerning these formulae is of interest. The (approximately) best choice of the boundary-determining value N is that which minimizes the expressions in Table VII as functions of N. For small c and large N, the equation (d/dN) r̃(ξ) = 0 is approximately equivalent to c = ¼ e^{−N/4}. (Note that as c decreases it is reasonable that the ASN of the best d r, which increases as N increases, should
Table VII. Computing formulae for no-overshoot risks when the prior distribution is ξ_k ∈ [c₂,c₁]. For each d r the table gives its boundary pair (c₂,c₁) and its risk expression in c, k, N, η₁, η₂, ∇I, and ΔI: the Bayes-type rule δ* with boundaries (ξ_N, ξ_{−N}); the L-symmetric Chernoff rule δ^C_k with boundaries (ξ_{k+N}, ξ_{k−N}); and the ξ-symmetric Chernoff rule δ^C_k with boundaries (ξ_N, ξ_{−N}). [The individual entries are not reproduced.]
increase.) With N = N(c) so specified, it is easily verified that, for each no-overshoot approximation, lim_{c→0} r̃(ξ)/(−c log c) is the asymptotically optimum value of Section 4.5, so that, asymptotically at least, the approximate solutions are well-behaved.

For c = .0125 and c = .000915, the risk of the Bayes d r is computed using Theorem 5.5. For each case, the no-overshoot approximations to the risks of the "best" L-symmetric and ξ-symmetric Chernoff-type d r are shown in Figures 6 and 7. The Chernoff d r risks were computed for several values of N in the neighborhood of −log c. The "best" Chernoff rule is that which gives a relative minimum risk in the neighborhood of −log c. In each case, these best boundaries are the same as the boundaries for the corresponding Bayes rule. Also in each case, for ξ = ½, the change from the L-symmetric s r to the ξ-symmetric s r reduces risk appreciably, whereas the improvement of the Bayes δ* d r over the δ^C d r, both with ξ-symmetric boundaries, is at best slight. Note, however, that this conclusion holds with this approximate evaluation as with an exact evaluation (since δ* is a Bayes d r).
[Figure 6. Risk comparison at c = .0125: best L-symmetric Chernoff rule, best ξ-symmetric Chernoff rule, and Bayes X* rule; not reproduced.]

[Figure 7. Risk comparison at c = .000915: best L-symmetric Chernoff rule, best ξ-symmetric Chernoff rule, and Bayes X* rule; not reproduced.]
BIBLIOGRAPHY

[1] Abramson, Lee R. (196?). "Sequential design of experiments with two random variables." Thesis, Columbia University, New York.
[2] Blackwell, D. and M. A. Girshick (1954). Theory of Games and Statistical Decisions. New York: Wiley.
[3] Chernoff, H. (1959). "Sequential design of experiments." Annals of Mathematical Statistics, 30, 755-70.
[4] Feldman, D. (1962). "Contributions to the 'two-armed bandit' problem." Annals of Mathematical Statistics, 33, 847-56.
[5] Haggstrom, G. (1964). "Optimal stopping and experimental design." Thesis, University of Illinois, Urbana.
[6] Halmos, P. R. (1950). Measure Theory. Princeton: Van Nostrand.
[7] Hotelling, H. (1963). "Different meanings of experimental design." Le Plan d'Experiences. Paris: Centre International de la Recherche Scientifique, pages 39-49.
[8] Johnson, N. L. (1959). "A proof of Wald's theorem on cumulative sums." Annals of Mathematical Statistics, 30, 1245-7.
[9] Kiefer, J. and J. Sacks (1963). "Asymptotically optimum sequential inference and design." Annals of Mathematical Statistics, 34, 705-50.
[10] Kolmogorov, A. N. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Berlin: Springer.
[11] Paulson, E. (1952). "On the comparison of several experimental categories with a control." Annals of Mathematical Statistics, 23, 239-46.
[12] Paulson, E. (1962). "A sequential procedure for comparing several experimental categories with a standard or control." Annals of Mathematical Statistics, 33, 438-43.
[13] Siryaev, A. N. (1964). "On the theory of decision functions and control of an observational process with incomplete data." Transactions of the Third Prague Conference on Information Theory, Statistical Decision Functions, Random Processes. New York: Academic Press, pages 657-81. (Russian)
[14] Stein, C. (1946). "A note on cumulative sums." Annals of Mathematical Statistics, 17, 498-9.
[15] Wald, A. (1947). Sequential Analysis. New York: Wiley.
[16] Wald, A. and J. Wolfowitz (1948). "Optimum character of the sequential probability ratio test." Annals of Mathematical Statistics, 19, 326-39.
[17] Whittle, P. (1964). "Some general results in sequential analysis." Biometrika, 51, 123-41.
INDEX OF NOTATION

This is a list of notations and abbreviations each of which appears in several places in the report. A few of these are defined or clarified here. For each of the others, the page where it is defined is cited.

Definitions.

ξ — generic notation for prior or posterior distribution. Where it is important to distinguish between prior and posterior distributions in Chapters 4 and 5, the former are denoted ξ and the latter ξ'.
e_θ — expectation when θ is the state of nature.
Pr_θ — probability when θ is the state of nature.
δ probability — probability defined by the d r δ, i.e., by the measures X, ψ, φ.
S — generic notation for a set.

Notations. (The number following the symbol is the page where the notation is defined.)

a, A — 5;  c — 8;  d r — 1;  e r — 1;  e, E, ē — 5;  f — 6;  f_{θ,e} — 6;  I(θ,θ',e) — 18;  L(θ,a) — 5;  L(z_{2n}) — 40;  N = N(δ,z) — 13;  ns-d — 24;  ns-nd — 23;  r(θ,δ) — 7;  r(ξ,δ) — 7;  s-d — 1;  s r — 6;  t d — 5;  t d r — 6;  X^C — 60;  X*(ξ₁,ξ₂) — 37;  z_m, z̄_m — 5;  z_{2n} — 8;  Δ — 8;  Δ_c — 50;  Δ(ξ₁,ξ₂) — 37;  δ* — 3;  δ^C — 59;  ∇I, ΔI — 49;  ζ — 51;  η_p — 59;  θ, Θ — 5;  μ, μ_{θ,x} — 6;  ξ — 5;  ξ_k — 60;  ξ_{2n} — 7;  ξ_{z_{2n}} — 10;  π_n — 7;  ρ — 7;  ρ_n — 8;  ρ₁ — 24;  ρ₂ — 26;  ρ₃ — 27;  φ, φ_z — 7;  ψ, ψ_z — 7.