W. Hoeffding (1959). "Lower bounds for the expected sample size and the average risk of a sequential procedure."

LOWER BOUNDS FOR THE EXPECTED SAMPLE SIZE AND
THE AVERAGE RISK OF A SEQUENTIAL PROCEDURE
by
Wassily Hoeffding
University of North Carolina
This research was supported by the United
States Air Force through the Air Force
Office of Scientific Research of the Air
Research and Development Command, under
Contract No. AF 49(638)-261. Reproduction
in whole or in part is permitted for any
purpose of the United States Government.
Institute of Statistics
Mimeograph Series No. 227
June, 1959
LOWER BOUNDS FOR THE EXPECTED SAMPLE SIZE
AND THE AVERAGE RISK OF A SEQUENTIAL PROCEDURE¹
By Wassily Hoeffding
University of North Carolina
Summary. Sections 1-6 are concerned with lower bounds for the expected sample size $E_0(N)$ of an arbitrary sequential test whose error probabilities at two parameter points $\theta_1$ and $\theta_2$ do not exceed given numbers $\alpha_1$ and $\alpha_2$, where $E_0(N)$ is evaluated at a third parameter point $\theta_0$. The bounds in (1.3) and (1.4) are shown to be attainable or nearly attainable in certain cases where $\theta_0$ lies between $\theta_1$ and $\theta_2$. In section 7 lower bounds for the average risk of a general sequential procedure are obtained.
1. Introduction and main results. Let $X_1, X_2, \ldots$ be a sequence of independent random variables having a common probability density $f$ with respect to a $\sigma$-finite measure $\mu$. One of two decisions, $d_1$ and $d_2$, is to be made. Let $f_1$ and $f_2$ be two probability densities such that decision $d_2$ ($d_1$) is considered as wrong if $f = f_1$ ($f = f_2$). We shall consider sequential tests (decision rules) for making decision $d_1$ or $d_2$ such that the probability of a wrong decision does not exceed a positive number $\alpha_i$ when $f = f_i$ ($i = 1, 2$). Let $N$ denote the (random) number of observations required by such a test.
This paper is mainly concerned with lower
1. This research was supported by the United States Air Force through the Air Force Office of Scientific Research of the Air Research and Development Command, under Contract No. AF 49(638)-261. Reproduction in whole or in part is permitted for any purpose of the United States Government. Part of this work was done while the author was a Visiting Professor at Stanford University.
bounds for $E_0(N)$, the expected sample size when $f = f_0$, where $f_0$ is in general different from $f_1$ and $f_2$.
The background of this problem is as follows. Suppose that $f$ depends on a real parameter $\theta$ and that $f_i$ corresponds to the value $\theta_i$, where $\theta_1 < \theta_2$. Suppose further that decision $d_1$ or $d_2$ is preferred according as $\theta \le \theta_1$ or $\theta \ge \theta_2$, and that neither decision is strongly preferred if $\theta_1 < \theta < \theta_2$. If we require that the probability of a wrong decision does not exceed $\alpha_1$ ($\alpha_2$) if $\theta \le \theta_1$ ($\theta \ge \theta_2$), the condition of the preceding paragraph will be satisfied. (In many important cases a test which satisfies the latter condition also satisfies the former.)

It is known [14] that Wald's sequential probability ratio (SPR) test for testing $\theta_1$ against $\theta_2$, with error probabilities equal to $\alpha_1$ and $\alpha_2$, minimizes the expected sample size at these two parameter values. In typical cases its expected sample size is largest when $\theta$ is between $\theta_1$ and $\theta_2$ (that is, when neither decision is strongly preferred), and in general there exist tests whose expected sample size at these intermediate $\theta$ values is smaller than that of the SPR test. (A special case in which a SPR test minimizes the maximum expected sample size will be discussed in section 4.) In principle it is possible to construct a test which minimizes the expected sample size at an arbitrary $\theta$ value or minimizes the maximum expected sample size. Kiefer and Weiss [7] have proved important qualitative properties of such tests. However, the actual construction of a test having this property, as well as the evaluation of its expected sample size and its error probabilities, meets with difficulties which have not been overcome so far (except for a few special cases). Therefore attempts have been made to find a test which, without actually minimizing the maximum expected sample size, comes close to this goal, or at least substantially improves upon the performance of known tests.
I mention in particular the work of Donnelly [5] and Anderson [1] who, independently of each other, considered a class of tests such that, if $\theta$ is the mean of a normal distribution, the boundaries for the cumulative sums are not parallel lines, as in the SPR test, but converging straight lines. (Anderson also considered truncated tests of this type.)
The performance of these and other tests can, to some extent, be judged by comparing, at any parameter point $\theta$, the expected sample size of the test with the smallest expected sample size attainable by any test having the same error probabilities at $\theta_1$ and $\theta_2$. In ignorance of the minimum expected sample size, the comparison may be made with a lower bound for this minimum. If the discrepancy is small, both the test (as judged by this criterion) and the bound cannot be greatly improved.
Our main concern will be with bounds which are best when $\theta_0$ is between $\theta_1$ and $\theta_2$.

We admit arbitrary (in general, randomized) sequential tests which terminate with probability one under each of $f_0$, $f_1$ and $f_2$. We also assume with no loss of generality that $E_0(N) < \infty$. To exclude trivialities we suppose that $\alpha_1 + \alpha_2 < 1$.
The first lower bound for the expected sample size was given by Wald ([10], p. 156), who proved for the case $f_0 = f_1$ that

(1.1)    $E_1(N) \ge \dfrac{\alpha_1 \log\dfrac{\alpha_1}{1 - \alpha_2} + (1 - \alpha_1) \log\dfrac{1 - \alpha_1}{\alpha_2}}{\int f_1 \left[\log\,(f_1/f_2)\right] d\mu}$

and an analogous inequality for $f_0 = f_2$. (Wald's proof assumes a nonrandomized test, but this restriction is easy to remove.) Both the numerator and the denominator in (1.1) are positive (since $\log x > 1 - x^{-1}$ for $x > 0$ unless $x = 1$); the integral in the denominator can be equal to $+\infty$, in which case the lower bound has the trivial value 0. The sign of equality in (1.1) can be attained with a SPR test in the case where the ratio $f_1/f_2$ takes on the two values $C$ and $1/C$ only, provided that the values $\alpha_1$ and $\alpha_2$ can be achieved as error probabilities in this test. In certain other cases the sign of equality can be nearly attained with a SPR test.
An extension of (1.1) to the case of an arbitrary $f_0$ has been given by the author [6]:

(1.2)    $E_0(N) \ge \sup_{0 < c < 1} \dfrac{-\log\left[\alpha_1^{\,c}\,(1 - \alpha_2)^{1-c} + (1 - \alpha_1)^{\,c}\,\alpha_2^{\,1-c}\right]}{c \int f_0 \left(\log \dfrac{f_0}{f_1}\right) d\mu + (1 - c) \int f_0 \left(\log \dfrac{f_0}{f_2}\right) d\mu}\,.$

For $f_0 = f_1$ and $c \to 1$, (1.2) reduces to (1.1). This bound is likely to be close to the minimum expected sample size when $f_0$ is close to $f_1$ or $f_2$.
o
In this paper two new inequalities will be proved,
and
(1.4)
2
when f
5
where
and
Note that
ti~O,
and ti>O is f o and f i are densities of different dis-
tributions.
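Both bounds are explicit in terms of simple integrals. The following minimal numerical sketch (an illustration only; it assumes unit-variance normal densities with means $0$, $-\delta$, $\delta$, the special case examined in section 6, and evaluates the integrals by the trapezoid rule) computes (1.3) and (1.4):

```python
import numpy as np

def bounds(delta=0.1, alpha1=0.05, alpha2=0.05):
    """Lower bounds (1.3) and (1.4) on E_0(N), assuming f0, f1, f2 are
    unit-variance normal densities with means 0, -delta, delta
    (the special case treated in section 6)."""
    x = np.linspace(-12.0, 12.0, 200001)
    dx = x[1] - x[0]
    integ = lambda g: float(np.sum(0.5 * (g[1:] + g[:-1])) * dx)  # trapezoid rule

    f0 = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    f1 = np.exp(-(x + delta)**2 / 2) / np.sqrt(2 * np.pi)
    f2 = np.exp(-(x - delta)**2 / 2) / np.sqrt(2 * np.pi)

    # (1.3): (1 - a1 - a2) / (1 - integral of min(f0, f1, f2))
    b13 = (1 - alpha1 - alpha2) / (1 - integ(np.minimum(f0, np.minimum(f1, f2))))

    # (1.5)-(1.6): for normal densities the log ratios are linear in x
    log01 = delta * x + delta**2 / 2        # log(f0/f1)
    log02 = -delta * x + delta**2 / 2       # log(f0/f2)
    t1, t2 = integ(f0 * log01), integ(f0 * log02)
    t = max(t1, t2)                         # = delta**2 / 2 here
    yj = (log01 - log02) - (t1 - t2)        # Y_j of (1.9): log(f2/f1) - (t1 - t2)
    tau = np.sqrt(integ(f0 * yj**2))        # = 2 * delta here

    # (1.4): positive root of  t*u**2 + (tau/2)*u + log(a1 + a2) = 0,  u = sqrt(E0(N))
    u = (np.sqrt(tau**2 / 4 - 4 * t * np.log(alpha1 + alpha2)) - tau / 2) / (2 * t)
    return b13, u * u

print(bounds(0.1, 0.05, 0.05))  # the second value is about 187.0; cf. Table 1
```

In this special case (1.4) is much the better bound, as the comparison in section 4 will confirm.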
In the proof of (1.4) it will be assumed that, in addition to the existence of the integrals in (1.5) and (1.6),

(1.7)    $\min\,(f_1(x), f_2(x)) > 0$

except perhaps on a set of probability zero under $f_0$, and that the equation

(1.8)    $E_0\left[(Y_1 + \cdots + Y_N)^2\right] = \tau^2 E_0(N)$

is satisfied, where

(1.9)    $Y_j = \log \dfrac{f_2(X_j)}{f_1(X_j)} - (t_1 - t_2)\,.$

Concerning the last assumption we note that $t_1 - t_2 = \int f_0 \log\,(f_2/f_1)\, d\mu$, so that $E_0(Y_j) = 0$ and, by (1.6), $E_0(Y_j^2) = \tau^2$. Equation (1.8) has been proved by Wald [9] and Wolfowitz [15] under certain conditions; see also Seitz and Winkelbauer [8]. It certainly holds if $N$ is bounded or if $Y_1 + \cdots + Y_m$ is bounded for $m < N$. It is clear that if condition (1.8) is satisfied for a test which minimizes $E_0(N)$, then inequality (1.4) is true also for any other test. In particular this is the case under the assumptions of Theorem 4 of Kiefer and Weiss [7], which imply that for a test which minimizes $E_0(N)$, $N$ is bounded.
Inequalities (1.3) and (1.4) will be proved and discussed in the following sections. Here we mention only the conditions for the attainment or near-attainment of equality. In inequality (1.3) the sign of equality holds under certain conditions which are typified by the following two cases. In the first case the densities $f_i$ are arbitrary except that $f_0(x) \ge \min\,(f_1(x), f_2(x))$, but $\alpha_1$ and $\alpha_2$ are restricted to values which are attainable with a test which requires at most one observation, $x_1$, and decision $d_1$ ($d_2$) is made if $f_1(x_1) - f_2(x_1) > 0$ ($< 0$). In the second case the $f_i$ are rectangular densities on intervals of common length such that the mean of $f_0$ is between the means of $f_1$ and $f_2$. Then equality in (1.3) is attained with a version of the SPR test for arbitrary values of $\alpha_1$ and $\alpha_2$. In (1.4) strict equality is not attainable except in trivial cases.
If $f_0$, $f_1$ and $f_2$ are normal probability densities with variance 1 and respective means $0$, $-\delta$ and $\delta$, then for $\alpha_1 = \alpha_2 = \alpha < \tfrac{1}{2}$ equality in (1.4) is nearly attained with a fixed sample size test if $\alpha$ is very small, and with a SPR test if $\alpha$ is sufficiently large. For $\alpha = 0.05$ and $\alpha = 0.01$ the expected sample size at $\theta = 0$ of a test considered by Anderson [1] comes remarkably close to the lower bound in (1.4).

In section 7 lower bounds for the average risk of a general sequential procedure are derived. Inequality (1.3) can be obtained from one of these bounds.
2. Some lemmas. A randomized sequential test for deciding between $d_1$ and $d_2$, based on the sequence $X_1, X_2, \ldots$, can be characterized by two sequences of random variables, $\pi_0, \pi_1, \pi_2, \ldots$ and $\phi_0, \phi_1, \phi_2, \ldots$, such that $\pi_n \ge 0$, $\pi_0 + \pi_1 + \pi_2 + \cdots \le 1$, $0 \le \phi_n \le 1$, and both $\pi_n$ and $\phi_n$ are functions of $x_1, \ldots, x_n$ only; $\pi_0$ and $\phi_0$ are constants. Here $\pi_n$ denotes the probability of $N = n$, under the condition that the values $x_1, \ldots, x_n$ have been observed, where $N$ is the number of observations taken before making a terminal decision, and $\phi_n$ ($1 - \phi_n$) is the probability of making decision $d_2$ ($d_1$) under the condition that $N = n$ and the values $x_1, \ldots, x_n$ have been observed. A test defined in this way will be denoted by $(\pi_n, \phi_n)$. The sequence $(\pi_n)$ will be referred to as the stopping rule and $(\phi_n)$ as the terminal decision rule of the test.
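As an illustration of the formalism, the following minimal sketch (an assumed toy setting, with unit-variance normal densities with means $\mp\delta$ and a boundary $a$ chosen for roughly five-percent errors) simulates Wald's SPR test, whose stopping rule sets $\pi_n = 1$ the first time the log probability ratio leaves an interval, and estimates its error probabilities and expected sample size:

```python
import numpy as np

rng = np.random.default_rng(0)

def spr_test(theta, delta=0.5, a=np.log(19.0), reps=20000, nmax=10000):
    """Simulate the SPR test of f1 = N(-delta, 1) vs f2 = N(delta, 1).
    The stopping rule puts pi_n = 1 at the first n with |L_n| >= a, where
    L_n = sum of log(f2(X_j)/f1(X_j)) = 2*delta*S_n; the terminal rule puts
    phi_n = 1 (decide d2) if L_N >= a.  Returns (P(decide d2), E(N))."""
    d2, ns = 0, 0
    for _ in range(reps):
        L, n = 0.0, 0
        while abs(L) < a and n < nmax:
            x = rng.normal(theta, 1.0)
            L += 2 * delta * x      # log(f2(x)/f1(x)) for the assumed densities
            n += 1
        d2 += L >= a
        ns += n
    return d2 / reps, ns / reps

print(spr_test(-0.5))  # under f1: first entry estimates E_1(phi_N) = alpha_1
print(spr_test(0.5))   # under f2: 1 - first entry estimates E_2(1 - phi_N)
print(spr_test(0.0))   # under f0: second entry estimates E_0(N)
```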
It will be assumed that $N < \infty$, that is, $\pi_0 + \pi_1 + \cdots = 1$, with probability one when the common probability density $f$ of the independent random variables $X_1, X_2, \ldots$ is any one of the functions $f_0, f_1, f_2$. We note that the probability of making decision $d_2$ when $f = f_i$ equals $E_i(\phi_N)$. The probability density $\prod_{j=1}^n f_i(x_j)$ with respect to the product measure $\mu^n$ ($n \ge 1$) will be written $f_{i,n}$ for short. It will be convenient to define $f_{i,n} = 1$ if $n = 0$, in accordance with the convention that an empty product is equal to 1. Similarly, the empty sum $\sum_{j=1}^n$ with $n = 0$ is defined to be 0.
The notation $(\phi_n^*)$ will serve to denote any terminal decision rule such that for $n = 1, 2, \ldots$

(2.1)    $\phi_n^* = \begin{cases} 1 & \text{if } f_{1,n} < f_{2,n}\,, \\ 0 & \text{if } f_{1,n} > f_{2,n}\,. \end{cases}$

The following lemmas will be needed. Lemmas 1, 3, 4, 5 and 6 will be used in the proof of inequality (1.3), Lemmas 1, 2, 4 and 7 in the proof of (1.4). Most of the lemmas are known. The simple proofs of all but Lemma 4 are included for convenience.
Lemma 1. If $(\pi_n, \phi_n)$ is an arbitrary sequential test,

(2.2)    $E_1(\phi_N) + E_2(1 - \phi_N) \ge \pi_0 + \sum_{n \ge 1} \int \pi_n \min\,(f_{1,n}, f_{2,n})\, d\mu^n\,.$

Lemma 2. If $f_0(x) = 0$ implies $\min\,[f_1(x), f_2(x)] = 0$, except perhaps on a set of probability zero under $f_0$, then for any stopping rule $(\pi_n)$,

(2.3)    $\pi_0 + \sum_{n \ge 1} \int \pi_n \min\,(f_{1,n}, f_{2,n})\, d\mu^n = E_0\left[\min\left(\dfrac{f_{1,N}}{f_{0,N}}, \dfrac{f_{2,N}}{f_{0,N}}\right)\right].$

Lemma 3. For any stopping rule $(\pi_n)$,

(2.4)    $\pi_0 + \sum_{n \ge 1} \int \pi_n \min\,(f_{1,n}, f_{2,n})\, d\mu^n \ge E_0\left[\min\left(1, \dfrac{f_{1,N}}{f_{0,N}}, \dfrac{f_{2,N}}{f_{0,N}}\right)\right],$

where the sign of equality holds if $\min\,(f_{1,n}, f_{2,n}) \le f_{0,n}$ on the set where $\pi_n > 0$, for each $n$, except perhaps on a set of probability zero under $f_0$.
To prove these three lemmas we note that for any test $(\pi_n, \phi_n)$,

(2.6)    $E_1(\phi_N) + E_2(1 - \phi_N) = \pi_0 + \sum_{n \ge 1} \int \pi_n \left[\phi_n f_{1,n} + (1 - \phi_n) f_{2,n}\right] d\mu^n \ge \pi_0 + \sum_{n \ge 1} \int \pi_n \min\,(f_{1,n}, f_{2,n})\, d\mu^n\,,$

with equality for $\phi_n = \phi_n^*$, $n = 1, 2, \ldots$. This proves Lemma 1.

If the condition of Lemma 2 is satisfied, we can write $\min\,(f_{1,n}, f_{2,n}) = \left[\min\,(f_{1,n}, f_{2,n})/f_{0,n}\right] f_{0,n}$ in the integrand in (2.6), which implies Lemma 2.

Finally, using (2.6),

$E_1(\phi_N) + E_2(1 - \phi_N) \ge \pi_0 + \sum_{n \ge 1} \int \pi_n \min\,(f_{0,n}, f_{1,n}, f_{2,n})\, d\mu^n\,.$

Upon dividing and multiplying by $f_{0,n}$ in the integrand we obtain inequality (2.4). The condition for equality in Lemma 3 is easy to verify.
Lemma 4. If $E_0(N) < \infty$ and $t(x)$ is a real-valued function such that $E_0\left[t(X_1)\right]$ exists, then

(2.5)    $E_0\left[\sum_{j=1}^N t(X_j)\right] = E_0\left[t(X_1)\right] E_0(N)\,.$

Equation (2.5) is originally due to Wald [11] and has been proved under the present assumptions, except for the trivial extension to randomized tests, by Blackwell [3]; see also Wolfowitz [15].
Lemma 5. If $g_0$, $g_1$, $g_2$ are nonnegative functions, then for any $x_1, \ldots, x_n$,

$\min_{i=0,1,2}\ \prod_{j=1}^n g_i(x_j) \ge \prod_{j=1}^n\ \min_{i=0,1,2} g_i(x_j)\,.$

The proof is by induction.

Lemma 6. If $0 \le d_j \le 1$, $j = 1, \ldots, n$, then

$\prod_{j=1}^n (1 - d_j) \ge 1 - \sum_{j=1}^n d_j\,.$

The sign of equality is attained if and only if all but at most one of $d_1, \ldots, d_n$ are equal to 0.

Lemma 6, with the condition for equality, follows from the identity

$\prod_{j=1}^n (1 - d_j) - 1 + \sum_{j=1}^n d_j = \sum_{m=2}^n d_m \left(1 - \prod_{j=1}^{m-1} (1 - d_j)\right).$
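The identity, and with it the condition for equality, is easy to check numerically; the following minimal sketch (an illustration only) evaluates both sides on random inputs:

```python
import numpy as np

rng = np.random.default_rng(1)

for _ in range(1000):
    d = rng.uniform(0.0, 1.0, size=int(rng.integers(2, 8)))
    lhs = np.prod(1 - d) - 1 + d.sum()
    rhs = sum(d[m - 1] * (1 - np.prod(1 - d[: m - 1]))
              for m in range(2, d.size + 1))
    assert abs(lhs - rhs) < 1e-12  # the identity holds

# equality in Lemma 6 holds when all but at most one d_j vanish:
d = np.array([0.0, 0.7, 0.0])
assert np.prod(1 - d) == 1 - d.sum()
```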
Lemma 7. If $U$ is a random variable, $E(e^U) \ge e^{E(U)}$ whenever the expectations exist. The sign of equality holds if and only if $U$ is equal to a constant with probability one.

Proof. Let $V = U - E(U)$. By Taylor's formula, $e^V \ge 1 + V$, with equality only if $V = 0$. Hence $E(e^V) \ge 1$, and the lemma follows.
3. Proof of inequality (1.3). By Lemma 1, for any test which satisfies $E_1(\phi_N) \le \alpha_1$ and $E_2(1 - \phi_N) \le \alpha_2$,

(3.1)    $\alpha_1 + \alpha_2 \ge \pi_0 + \sum_{n \ge 1} \int \pi_n \min\,(f_{1,n}, f_{2,n})\, d\mu^n\,.$

By Lemma 3,

(3.2)    $\pi_0 + \sum_{n \ge 1} \int \pi_n \min\,(f_{1,n}, f_{2,n})\, d\mu^n \ge E_0\left[\min\left(1, \dfrac{f_{1,N}}{f_{0,N}}, \dfrac{f_{2,N}}{f_{0,N}}\right)\right].$

By Lemma 5,

(3.3)    $\min\,(f_{0,n}, f_{1,n}, f_{2,n}) \ge \prod_{j=1}^n \min\left[f_0(x_j), f_1(x_j), f_2(x_j)\right].$

Hence if we write

$r(x) = \min\left[f_0(x), f_1(x), f_2(x)\right] / f_0(x)\,,$

we have

(3.4)    $E_0\left[\min\left(1, \dfrac{f_{1,N}}{f_{0,N}}, \dfrac{f_{2,N}}{f_{0,N}}\right)\right] \ge E_0\left[\prod_{j=1}^N r(X_j)\right].$

Note that $0 \le r(x) \le 1$. If we apply Lemma 6 and then Lemma 4, we obtain

(3.5)    $E_0\left[\prod_{j=1}^N r(X_j)\right] \ge E_0\left\{1 - \sum_{j=1}^N \left[1 - r(X_j)\right]\right\} = 1 - E_0(N)\, E_0\left[1 - r(X_1)\right] = 1 - E_0(N)\left\{1 - \int \min\,(f_0, f_1, f_2)\, d\mu\right\}.$

Combining (3.1), (3.2), (3.4) and (3.5) and solving for $E_0(N)$, we obtain inequality (1.3).
4. Discussion of inequality (1.3). The sign of equality in (1.3) holds if and only if it holds in each of the four inequalities used in the proof. Equality in (3.1) is attained if

(4.1)    $\phi_n = \phi_n^* \quad (n = 1, 2, \ldots), \qquad E_1(\phi_N^*) = \alpha_1\,, \quad E_2(1 - \phi_N^*) = \alpha_2\,.$

In (3.2) it is attained, by Lemma 3, if

(4.2)    $f_0(x) \ge \min\,(f_1(x), f_2(x))\,.$

By Lemma 6, equality in (3.5) holds if and only if, for each $n \ge 1$, $N = n$ implies that all but at most one of $r(X_1), \ldots, r(X_n)$ are equal to 1, with probability one under $f_0$. This is the case if

(4.3)    with probability one under $f_0$, $N = n$ implies $r(X_j) = 1$ for $j = 1, \ldots, n - 1$.

If this condition is satisfied, equality also holds in (3.4). Thus conditions (4.1), (4.2) and (4.3) are sufficient for the attainment of equality in (1.3).

Condition (4.2) is satisfied for many common one-parameter families of distributions when $\theta_0$ is between $\theta_1$ and $\theta_2$. Under this condition, (1.3) reduces to

(4.4)    $E_0(N) \ge \dfrac{1 - \alpha_1 - \alpha_2}{\tfrac{1}{2} \int |f_1 - f_2|\, d\mu}\,.$
Condition (4.3) is satisfied, and equality holds in (4.4), in the following two cases.

The first case is where the densities are arbitrary, subject only to (4.2), but the values $\alpha_1$ and $\alpha_2$ are such that they can be attained as error probabilities with a test which requires at most one observation ($N \le 1$), and if an observation $x$ is taken, decision $d_1$ ($d_2$) is made if $f_1(x) - f_2(x)$ is $> 0$ ($< 0$).

The second case in which equality in (4.4) is attained is where, in addition to (4.2), the set $C = \{x \mid f_0(x) = f_1(x) = f_2(x)\}$ has a positive probability. Let $C_0 \subseteq C$, and let the complement of $C_0$ be subdivided into two disjoint sets $C_1$ and $C_2$ such that $f_1(x) - f_2(x) \ge 0$ if $x \in C_1$ and $\le 0$ if $x \in C_2$. Let $N$ be the least $n$ such that $x_n \notin C_0$. Decision $d_i$ is made if $x_N \in C_i$, $i = 1, 2$. (Instead, suitable randomized decisions can be made when $x_n \in C_0$.) Then it can be readily verified that equality holds in (4.4), with $E_0(N) = (1 - p_0)^{-1}$, $\alpha_1 = p_{12}(1 - p_0)^{-1}$, $\alpha_2 = p_{21}(1 - p_0)^{-1}$, where $p_0$ is the probability of $C_0$ (under any $f_i$) and $p_{ij}$ is the probability of $C_j$ under $f_i$. In the particular case where $\mu$ is linear Lebesgue measure, $f_i(x) = g(x - \theta_i)$, $g(x) = 1/L$ for $-L/2 \le x \le L/2$, $g(x) = 0$ otherwise, $0 < \theta_2 - \theta_1 < L$, and $\theta_1 \le \theta_0 \le \theta_2$, we have $C = [\theta_2 - L/2,\ \theta_1 + L/2]$. Let $\theta_2 - L/2 \le c \le d \le \theta_1 + L/2$, $C_0 = (c, d)$, $C_1 = (-\infty, c]$, $C_2 = [d, +\infty)$. Then with the test just described, perhaps preceded by a randomized decision as to whether to take at least one observation, any error probabilities can be attained ($\alpha_1 > 0$, $\alpha_2 > 0$, $\alpha_1 + \alpha_2 < 1$). Moreover, the maximum with respect to all real $\theta$ of the expected sample size of this test when the density is $g(x - \theta)$ is attained when $\theta$ is between $\theta_1$ and $\theta_2$. Hence the test minimizes the maximum expected sample size. It should be noted that the present test is a modified version of the SPR test as defined in Wald [12], p. 120. It differs from the latter only in this respect: if the probability ratio after $n$ observations equals one of the two numbers $A$ and $B$ (in Wald's notation; in our case $A = B = 1$), the stopping decision and the terminal decision may depend on the position of the sample point in the corresponding sets, instead of being randomized decisions.
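The equality claims in the second case are easy to check by simulation. The sketch below (an illustration under assumed values $L = 1$, $\theta_1 = 0$, $\theta_2 = 0.3$, $\theta_0 = 0.15$, and $C_0 = (c, d)$ with $c = -0.1$, $d = 0.4$) estimates $\alpha_1$, $\alpha_2$ and $E_0(N)$; with these values $p_0 = 0.5$, $\alpha_1 = \alpha_2 = 0.2$, and $E_0(N) = 2$, which is exactly the right side of (4.4):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed values, for illustration: L = 1, theta1 = 0, theta2 = 0.3,
# theta0 = 0.15, and C0 = (c, d) with c = -0.1, d = 0.4.
th = {0: 0.15, 1: 0.0, 2: 0.3}
c, d = -0.1, 0.4

def run(i, reps=100000):
    """Sample from f_i until x leaves C0; decide d1 on C1, d2 on C2.
    Returns (P(decide d2), E(N))."""
    lo, hi = th[i] - 0.5, th[i] + 0.5   # support of the rectangular density f_i
    d2, ns = 0, 0
    for _ in range(reps):
        n = 0
        while True:
            n += 1
            x = rng.uniform(lo, hi)
            if not (c < x < d):         # stop on leaving C0
                break
        d2 += x >= d
        ns += n
    return d2 / reps, ns / reps

print(run(1))  # first entry estimates alpha_1 = 0.2
print(run(2))  # 1 - first entry estimates alpha_2 = 0.2
print(run(0))  # second entry estimates E_0(N) = 2 = (1 - 0.4)/0.3, the bound (4.4)
```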
It is of interest to note that the bound in (1.3) is always positive, whereas the bounds in (1.1) and (1.2) take on the trivial value 0 if the integrals in their denominators are equal to $+\infty$. However, in most of the more common cases the bounds (1.1) and (1.2) (as well as (1.4)) are better than (1.3). For instance, if $f_0$, $f_1$, and $f_2$ are normal distributions with a common variance and respective means $0$, $-\delta$ and $\delta$, the bound in (1.3) is of the order $\delta^{-1}$, but those in (1.2) and (1.4) are proportional to $\delta^{-2}$ and hence better than (1.3) if $\delta$ is small.
There is an interesting similarity between Wald's inequality (1.1) and inequality (1.3) (or (4.4)) for $f_0 = f_1$. If $\alpha_1$ and $\alpha_2$ denote the actual error probabilities, both inequalities are of the form

$E_1(N) \ge \dfrac{D(f_1^*, f_2^*)}{D(f_1, f_2)}\,,$

where $D$ is the measure of discrepancy between two distributions which appears in the denominators of (1.1) and (4.4), and $f_i^*$ denotes the distribution on the two points $d_1, d_2$ of the decision space such that the probability assigned to $d_j$ is the probability of making decision $d_j$ when $f = f_i$; more precisely, $f_i^*$ is the probability density with respect to a measure $\mu^*$ such that $\mu^*(d_1) = \mu^*(d_2) = 1$ and $1 - f_1^*(d_1) = f_1^*(d_2) = \alpha_1$, $1 - f_2^*(d_2) = f_2^*(d_1) = \alpha_2$.
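This form can be verified directly from the definitions of $f_1^*$, $f_2^*$ just given:

```latex
% For D(f,g) = \int f \log(f/g)\, d\mu^* (the discrepancy in the denominator of (1.1)):
D(f_1^*, f_2^*) = (1-\alpha_1)\log\frac{1-\alpha_1}{\alpha_2}
                + \alpha_1\log\frac{\alpha_1}{1-\alpha_2}\,,
% which is the numerator of (1.1).
% For D(f,g) = \tfrac12 \int |f-g|\, d\mu^* (the discrepancy in the denominator of (4.4)):
D(f_1^*, f_2^*) = \tfrac12\left[\,\left|(1-\alpha_1)-\alpha_2\right|
                + \left|\alpha_1-(1-\alpha_2)\right|\,\right]
                = 1 - \alpha_1 - \alpha_2\,,
% which is the numerator of (4.4), using \alpha_1 + \alpha_2 < 1.
```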
It will be seen in section 7 that inequality (1.3) can be deduced from a lower bound for the average risk of a general sequential procedure. However, the direct proof given in section 3 makes it easier to determine the conditions for equality.
5. Proof of inequality (1.4). We assume that the integrals $t_1$, $t_2$ and $\tau^2$ in (1.5) and (1.6) exist and that the conditions (1.7) and (1.8) are satisfied. Let, for $i = 1, 2$,

(5.1)    $Z_{i,n} = \log \dfrac{f_{0,n}}{f_{i,n}} - t_i n\,,$

and let

(5.2)    $Z_n = Z_{1,n} - Z_{2,n} = \sum_{j=1}^n Y_j\,,$

where $Y_j$ is defined in (1.9). Then

$\dfrac{f_{i,n}}{f_{0,n}} = e^{-Z_{i,n} - t_i n}\,.$

Hence, by Lemma 2,

(5.3)    $\pi_0 + \sum_{n \ge 1} \int \pi_n \min\,(f_{1,n}, f_{2,n})\, d\mu^n \ge E_0\left[e^{-\max(Z_{1,N},\, Z_{2,N}) - tN}\right],$

where $t = \max\,(t_1, t_2)$. By Lemma 7,

(5.4)    $E_0\left[e^{-\max(Z_{1,N},\, Z_{2,N}) - tN}\right] \ge \exp\left\{-E_0\left[\max\,(Z_{1,N}, Z_{2,N})\right] - t E_0(N)\right\}.$

Since $2\max\,(Z_{1,N}, Z_{2,N}) = Z_{1,N} + Z_{2,N} + |Z_{1,N} - Z_{2,N}|$, $Z_{1,N} - Z_{2,N} = Z_N$, and, by Lemma 4, $E_0(Z_{1,N}) = E_0(Z_{2,N}) = 0$, we have

(5.5)    $E_0\left[\max\,(Z_{1,N}, Z_{2,N})\right] = \tfrac{1}{2}\, E_0\left(|Z_N|\right).$

Also

(5.6)    $E_0\left(|Z_N|\right) \le \left[E_0\left(Z_N^2\right)\right]^{1/2} = \tau \left[E_0(N)\right]^{1/2},$

where we have used equation (1.8). Thus if $(\pi_n, \phi_n)$ is any test such that $E_1(\phi_N) \le \alpha_1$, $E_2(1 - \phi_N) \le \alpha_2$, and equation (1.8) is satisfied, it follows from Lemma 1 and the relations (5.3), (5.4), (5.5) and (5.6) that

(5.7)    $\log\,(\alpha_1 + \alpha_2) \ge -\dfrac{\tau}{2} \left[E_0(N)\right]^{1/2} - t\, E_0(N)\,.$

Solving this inequality for $E_0(N)$, we obtain (1.4).
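In detail: writing $u = [E_0(N)]^{1/2}$, (5.7) states that $t u^2 + \tfrac{\tau}{2} u + \log\,(\alpha_1 + \alpha_2) \ge 0$, and since $\log\,(\alpha_1 + \alpha_2) < 0$, the admissible $u \ge 0$ are those at least equal to the positive root of the quadratic:

```latex
u \;\ge\; \frac{-\tfrac{\tau}{2} + \sqrt{\tfrac{\tau^2}{4} - 4t\log(\alpha_1+\alpha_2)}}{2t}\,,
\qquad\text{hence}\qquad
E_0(N) \;\ge\; \frac{1}{4t^2}\left[\left(\frac{\tau^2}{4}
  - 4t\log(\alpha_1+\alpha_2)\right)^{1/2} - \frac{\tau}{2}\right]^2,
```

which is (1.4).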
6. Discussion of inequality (1.4). Inequality (1.4) has been obtained by combining the four inequalities (3.1), (5.3), (5.4) and (5.6). Equality in (3.1) is always attainable for suitable $\phi_n$ and $\pi_n$, and in (5.3) it holds if $t_1 = t_2$ ($= t$). In (5.4) the sign of equality holds if and only if $\max\,(Z_{1,N}, Z_{2,N}) + tN$ is constant with probability one (see Lemma 7), and in (5.6) it holds if and only if $|Z_N|$, that is $|Z_{1,N} - Z_{2,N}|$, is constant with probability one, both probabilities evaluated under $f_0$. The last two conditions cannot be satisfied simultaneously except in trivial cases.

To obtain an idea of how close the bound in (1.4) can come to the minimum attainable value of $E_0(N)$, we shall consider the following special case. Let $f_i$ be the normal probability density with variance 1 and mean $\theta_i$, where $\theta_0 = 0$, $\theta_1 = -\delta$, $\theta_2 = \delta$ and $\delta > 0$. Then $t_1 = t_2 = \delta^2/2$, $\tau = 2\delta$, and inequality (1.4) becomes

(6.1)    $E_0(N) \ge \delta^{-2}\left[\left(1 - 2\log 2\alpha\right)^{1/2} - 1\right]^2,$

where $2\alpha = \alpha_1 + \alpha_2$. This bound will be compared with the values of $E_0(N)$ for a fixed sample size test, Wald's SPR test, and a test considered by Anderson, with error probabilities $\alpha_1 = \alpha_2 = \alpha$ ($< \tfrac{1}{2}$) in each case.
Let $S_n = X_1 + \cdots + X_n$. For a fixed sample size test such that decision $d_1$ or $d_2$ is made according as $S_n < 0$ or $S_n > 0$, the error probabilities at $\theta = -\delta$ and $\theta = \delta$ are both equal to $\Phi(-\delta n^{1/2})$, where

$\Phi(-\lambda) = (2\pi)^{-1/2} \int_{-\infty}^{-\lambda} e^{-y^2/2}\, dy\,.$

Hence $E_0(N)$ is the least $n$ such that $\Phi(-\delta n^{1/2}) \le \alpha$. If $\lambda = \lambda(\alpha)$ is defined by $\Phi(-\lambda) = \alpha$, we have

(6.2)    $E_0(N) = \delta^{-2} \lambda^2$

exactly or with a good approximation. If $\alpha \to 0$, then $\lambda \to \infty$. Hence

$\lambda^2 = -2\log\alpha + O\left[\log\,(-2\log\alpha)\right].$

The factor of $\delta^{-2}$ in inequality (6.1) is

$\left[\left(1 - 2\log 2\alpha\right)^{1/2} - 1\right]^2 = -2\log\alpha + O\left[(-2\log\alpha)^{1/2}\right].$

Thus if $\alpha$ is small enough, the bound in (6.1) is nearly attained with a fixed sample size test, although the asymptotic approach is extremely slow. It follows that the fixed sample size test nearly minimizes the expected sample size at $\theta = 0$ when $\alpha$ is (very) small.
Now consider the SPR test which stops as soon as $2\delta |S_n| \ge \log A$ ($A > 1$). Then $(\log A)^2 \le 4\delta^2 E_0(S_N^2) = 4\delta^2 E_0(N)$ by (1.8), and $A \le (1 - \alpha)/\alpha$. These inequalities are close approximations for $\alpha$ fixed and $\delta$ small enough (Wald [10]). With this approximation,

$E_0(N) = \delta^{-2}\left(\tfrac{1}{2}\log\dfrac{1 - \alpha}{\alpha}\right)^2.$

Put $\alpha = (1 - \epsilon)/2$ and let $\epsilon \to 0$. Then

$\left(\tfrac{1}{2}\log\dfrac{1 - \alpha}{\alpha}\right)^2 = \left(\tfrac{1}{2}\log\dfrac{1 + \epsilon}{1 - \epsilon}\right)^2 = \epsilon^2 + \tfrac{2}{3}\epsilon^4 + \tfrac{23}{45}\epsilon^6 + \cdots$

and the factor of $\delta^{-2}$ in (6.1) is

$\left[\left(1 - 2\log\,(1 - \epsilon)\right)^{1/2} - 1\right]^2 = \epsilon^2 + \tfrac{2}{3}\epsilon^4 + \cdots\,.$

Thus if $\alpha$ is close to its upper bound $\tfrac{1}{2}$ and $\delta$ is small enough, the lower bound in (6.1) is nearly attained with a SPR test. Hence the SPR test nearly minimizes $E_0(N)$ in this case. Table 1 shows that even for $\alpha = 0.2$ the expected sample size exceeds the lower bound by only 3.0%. (The lower bound in (1.2) with $c = \tfrac{1}{2}$ also approaches $E_0(N)$ for the SPR test as $\alpha \to \tfrac{1}{2}$. However, inequality (6.1) is better than (1.2), as applied to the present case, for all values of $\alpha$.)
For $\alpha$ values not close to 0 or $\tfrac{1}{2}$ we compare the bound in (6.1) with the expected sample size of a test considered by Anderson [1]. This test stops as soon as $|S_n| \ge c + dn$, where $d < 0 < c$. Anderson approximated the sequence $(S_n)$ by a Wiener process, so that his values for the expected stopping time $E_0(T)$, when the mean of the process is 0, are approximations to $E_0(N)$. He chose the constants $c$ and $d$ so as to minimize $E_0(T)$ subject to prescribed error probabilities $\alpha_1 = \alpha_2 = \alpha$ at $\theta = \pm\delta$, for $\delta = 0.1$ and $\alpha = 0.01$ and $0.05$. Anderson's values are given in Table 1. The expected sample sizes exceed the lower bounds only by 3.6% and 2.8%, respectively. This shows that both Anderson's test (as judged by the expected sample size at $\theta = 0$) and inequality (6.1) cannot be greatly improved in these cases.
TABLE 1

Values of $E_0(N)$ and of the lower bound in (6.1) for $\delta = 0.1$.

                    α =  0.0001   0.001    0.01    0.05     0.1     0.2     0.3
Fixed sample size          1383     955   541.2   270.6   164.3    70.8    27.5
SPR test                   2121    1193   527.9   216.7   120.7    48.0    17.9
Anderson's test               -       -   402.2   192.2       -       -       -
Lower bound (6.1)          1054     710   388.3   187.0   111.1    46.6    17.8
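The first, second and fourth rows of Table 1 follow from (6.2), from the SPR approximation $E_0(N) = \delta^{-2}(\tfrac{1}{2}\log\frac{1-\alpha}{\alpha})^2$, and from (6.1); a minimal sketch recomputing them (assuming SciPy is available for the normal quantile function):

```python
import numpy as np
from scipy.stats import norm

delta = 0.1
for a in (0.0001, 0.001, 0.01, 0.05, 0.1, 0.2, 0.3):
    lam = -norm.ppf(a)                                   # Phi(-lambda) = alpha
    fixed = (lam / delta) ** 2                           # (6.2)
    spr = (0.5 * np.log((1 - a) / a) / delta) ** 2       # Wald's approximation
    bound = ((np.sqrt(1 - 2 * np.log(2 * a)) - 1) / delta) ** 2   # (6.1)
    print(f"{a:7}: fixed {fixed:7.1f}  SPR {spr:7.1f}  bound {bound:7.1f}")
```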
To conclude this section, it will be shown that for each of the two sequential tests here considered the expected sample size attains its maximum when the mean $\theta$ of the normal distribution is 0. In conjunction with the preceding results this implies that each of these tests (as well as the fixed sample size test) comes close to minimizing the maximum expected sample size for certain $\alpha$ values.

Both tests are such that sampling is stopped as soon as $|S_n| \ge c_n$, where $c_1, c_2, \ldots$ are nonnegative constants. The expected value of $N$ at $\theta$ is the sum of the probabilities $P[N > n \mid \theta]$. We can write

$P[N > n \mid \theta] = \int_A f(y - \theta z)\, dy\,,$

where $y = (y_1, \ldots, y_n)$, $z = (1, \ldots, 1)$, $f$ is the probability density of $n$ independent normal random variables with mean 0 and variance 1, and

$A = \{y : |y_1 + \cdots + y_m| < c_m,\ m = 1, \ldots, n\}\,.$

The set $A$ is convex, and $y \in A$ implies $-y \in A$. It follows from a theorem of Anderson [2] that $P[N > n \mid \theta]$ attains its maximum at $\theta = 0$ (and is monotone for $\theta < 0$ and $\theta > 0$). Thus the same conclusion is true for the expected value of $N$.
7. Lower bounds for the average risk. In this section a sequence of increasingly better lower bounds for the average risk of a general sequential procedure will be derived. Under certain conditions these bounds converge to the minimum average risk. They are similar to the bounds given by Blackwell and Girshick [4], and will be obtained as a consequence of results of Wald and Wolfowitz [13] which are also contained in Wald's book [12]. In slight extension of the assumptions in [13] and [4], the cost per observation will be allowed to depend on the parameter; due to this assumption the bounds can be used to obtain lower bounds for the expected sample size (see section 8).

The random variables $X_1, X_2, \ldots$ are assumed to be independent with a common probability density $f_\theta$ with respect to a $\sigma$-finite measure $\mu$, where the parameter $\theta$ is contained in a space $\Omega$. To simplify the exposition, the assumptions of [13], section 2, will be made (with some obvious changes in notation), with two exceptions stated below. In particular, $\mu$ is Lebesgue or counting measure on the real Borel sets (this is not essential); the loss function $W$ on $\Omega \times D$ is nonnegative and bounded; the terminal decision space $D$ is compact in the sense of the convergence $\sup_\theta |W(\theta, d_i) - W(\theta, d_0)| \to 0$; the a priori distributions $\xi$ are the probability measures on a fixed Borel field of subsets of $\Omega$. The cost of $m$ observations is assumed to be $c(\theta)\, m$, where $c(\theta)$ is nonnegative, bounded and measurable on the given Borel field of subsets of $\Omega$. (In [13], $c(\theta)$ is a constant.) In addition, we assume that the function $\inf_{\theta \in \Omega} f_\theta$ is Borel measurable. The class $\Delta$ consists of all sequential decision functions $\delta$ which satisfy the needed measurability conditions as specified in [13]. For the other measurability assumptions we also refer to [13].
Denote by $r(\theta, \delta)$ the risk (expected loss plus expected cost) when the decision function $\delta$ is used and the parameter is $\theta$. For any a priori distribution $\xi$ over $\Omega$ let $r(\xi, \delta) = \int r(\theta, \delta)\, d\xi$. Let $\rho(\xi)$ denote the infimum of the average risk $r(\xi, \delta)$ for $\delta \in \Delta$. Let

$\rho_0(\xi) = \inf_{d \in D} \int W(\theta, d)\, d\xi\,,$

let $f_\xi(y) = \int f_\theta(y)\, d\xi(\theta)$ and $c(\xi) = \int c(\theta)\, d\xi$, and let $\xi_y$ denote the distribution over $\Omega$ defined by $d\xi_y = f_\theta(y)\, d\xi / f_\xi(y)$. Then the function $\rho(\xi)$ satisfies the equation

(7.1)    $\rho(\xi) = \min\left\{\rho_0(\xi),\ \int \rho(\xi_y)\, f_\xi(y)\, d\mu(y) + c(\xi)\right\}.$

This is a straightforward extension of Theorem 3.2 of [13].

For $n \ge 0$ let $\rho_n(\xi)$ denote the infimum of $r(\xi, \delta)$ for $\delta$ in the class $\Delta_n$ of all decision functions in $\Delta$ which terminate after at most $n$ observations. (This is consistent with the definition of $\rho_0(\xi)$ above.) By direct extension of Theorem 3.1 of [13] we have

(7.2)    $\rho_n(\xi) = \min\left\{\rho_0(\xi),\ \int \rho_{n-1}(\xi_y)\, f_\xi(y)\, d\mu(y) + c(\xi)\right\}, \qquad n = 1, 2, \ldots\,.$

Clearly $\rho_0(\xi) \ge \rho_1(\xi) \ge \rho_2(\xi) \ge \cdots \ge \rho(\xi)$. In [13] it is shown that if $c(\theta) = c > 0$, then $\lim \rho_n(\xi) = \rho(\xi)$.
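For a finite parameter space the recursion (7.2) is easy to carry out numerically. The following sketch (an illustration under assumed toy choices: $\Omega = \{1, 2\}$, $f_i$ unit-variance normal with means $\mp\delta$, $W(i, d_j) = 1$ if $i \ne j$ and 0 otherwise, constant cost $c$) computes $\rho_0, \rho_1, \ldots$ on a grid of priors $\xi = (g, 1 - g)$:

```python
import numpy as np

def bayes_recursion(delta=0.5, c=0.01, nsteps=30, gridsize=401):
    """Backward recursion (7.2) for Omega = {1, 2}, f_i = N(-/+delta, 1),
    0-1 terminal loss, constant cost c per observation (assumed toy case).
    Returns rho_n(xi) on a grid of priors g = xi({1})."""
    g = np.linspace(0.0, 1.0, gridsize)        # prior probability of theta = 1
    y = np.linspace(-8.0, 8.0, 2001)           # observation grid
    dy = y[1] - y[0]
    f1 = np.exp(-(y + delta) ** 2 / 2) / np.sqrt(2 * np.pi)
    f2 = np.exp(-(y - delta) ** 2 / 2) / np.sqrt(2 * np.pi)

    rho0 = np.minimum(g, 1 - g)                # risk with no observation
    rho = rho0.copy()
    for _ in range(nsteps):
        cont = np.empty_like(g)
        for i, gi in enumerate(g):
            m = gi * f1 + (1 - gi) * f2        # predictive density of y
            post = gi * f1 / m                 # posterior xi_y({1})
            cont[i] = np.sum(np.interp(post, g, rho) * m) * dy + c(,)
        rho = np.minimum(rho0, cont)           # recursion (7.2)
    return g, rho0, rho

g, rho0, rho = bayes_recursion()
print(rho[len(g) // 2])   # rho_30 at g = 1/2, an upper bound for rho(xi)
```

The successive iterates decrease, in accordance with $\rho_0(\xi) \ge \rho_1(\xi) \ge \cdots \ge \rho(\xi)$.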
Blackwell and Girshick ([4], pp. 255-256) have given lower bounds $r_n^*(\xi)$ for $\rho(\xi)$ which with the present cost function can be defined as follows. Let $r_0^*(\xi) = 0$ and define recursively for $n = 1, 2, \ldots$

(7.3)    $r_n^*(\xi) = \min\left\{\rho_0(\xi),\ \int r_{n-1}^*(\xi_y)\, f_\xi(y)\, d\mu(y) + c(\xi)\right\}.$

It will now be shown that the lower bounds (7.3) can be improved with the help of an inequality of Wald and Wolfowitz [13]. Sufficient conditions for the convergence of these lower bounds and of the upper bounds $\rho_n(\xi)$ to $\rho(\xi)$ when $c(\theta)$ is not constant will also be given.

Let

(7.4)    $\lambda = 1 - \int \inf_{\theta \in \Omega} f_\theta\, d\mu\,.$

Excluding the trivial case where all distributions $f_\theta$ are identical, we have $0 < \lambda \le 1$. Now define

(7.5)    $\rho_0'(\xi) = \min\left\{\rho_0(\xi),\ \lambda^{-1} c(\xi)\right\}$

and recursively for $n = 1, 2, \ldots$

(7.6)    $\rho_n'(\xi) = \min\left\{\rho_0(\xi),\ \int \rho_{n-1}'(\xi_y)\, f_\xi(y)\, d\mu(y) + c(\xi)\right\}.$

We shall write $f_{\theta,n}$ for $\prod_{j=1}^n f_\theta(x_j)$, $f_{\xi,n}$ for $\int f_{\theta,n}\, d\xi$, and $\xi^{(n)}$ for the a posteriori distribution over $\Omega$ after $n$ observations $x_1, \ldots, x_n$,

$d\xi^{(n)} = f_{\theta,n}\, d\xi / f_{\xi,n}\,.$
Theorem 1. We have

(7.7)    $\rho_0'(\xi) \le \rho_1'(\xi) \le \rho_2'(\xi) \le \cdots \le \rho(\xi)\,.$

In order that

(7.8)    $\lim_{n \to \infty} \rho_n'(\xi) = \lim_{n \to \infty} \rho_n(\xi) = \rho(\xi)$

it is sufficient that either

(7.9)    $\lim_{n \to \infty} \int \rho_0(\xi^{(n)})\, f_{\xi,n}\, d\mu^n = 0$

or that

(7.10)    $\xi\{\theta : c(\theta) > 0\} = 1\,.$

Remark 1. We always have $\rho_{n-1}'(\xi) \ge r_n^*(\xi)$. If $\lambda = 1$, then $\rho_{n-1}'(\xi) = r_n^*(\xi)$, so that the two sequences of bounds are equivalent.

Remark 2. The integral in (7.9) is the risk of the (fixed sample size) Bayes procedure based on $n$ observations when $c(\theta) \equiv 0$. Thus condition (7.9) is satisfied for all $\xi$ if the maximum expected loss of some decision rule based on $n$ observations tends to 0 as $n \to \infty$. An upper bound for the integral in (7.9) (which, in turn, is an upper bound for $\rho_n(\xi) - \rho_n'(\xi)$) in the case of finite $\Omega$ is given in Theorem 2 below.

Remark 3. In section 8 it will be shown that the inequality $\rho(\xi) \ge \rho_0'(\xi)$ implies inequality (1.3). The discussion in section 4 shows that equality in $\rho(\xi) \ge \rho_0'(\xi)$ is attained in special cases.
Proof of Theorem 1. Since $\rho(\xi) = \inf_\delta \int r(\theta, \delta)\, d\xi$ and

$\int \rho(\xi_y)\, f_\xi(y)\, d\mu(y) = \int \inf_\delta \left[\int r(\theta, \delta)\, f_\theta(y)\, d\xi(\theta)\right] d\mu(y)\,,$

we have

(7.11)    $\int \rho(\xi_y)\, f_\xi(y)\, d\mu(y) \ge (1 - \lambda)\, \rho(\xi)\,.$

(This is essentially equivalent to inequality (3.22) of [13].) Hence, by (7.1), if $\rho(\xi) < \rho_0(\xi)$, then $\rho(\xi) \ge \lambda^{-1} c(\xi)$. Therefore $\rho(\xi) \ge \rho_0'(\xi)$. It now follows from (7.1) and (7.6) by induction that $\rho(\xi) \ge \rho_n'(\xi)$ for all $n \ge 0$.

To complete the proof of (7.7) we now show that

(7.12)    $\rho_n'(\xi) \ge \rho_{n-1}'(\xi)\,, \qquad n = 1, 2, \ldots\,.$

It can be seen in a similar way as in the proof of (7.11) that

$\int \rho_0'(\xi_y)\, f_\xi(y)\, d\mu(y) \ge (1 - \lambda)\, \rho_0'(\xi)\,.$

Hence, by (7.6) with $n = 1$,

$\rho_1'(\xi) \ge \min\left\{\rho_0(\xi),\ (1 - \lambda)\, \rho_0'(\xi) + c(\xi)\right\}.$

It is readily shown that the right side of this inequality is equal to $\rho_0'(\xi)$. Thus (7.12) is proved for $n = 1$. For $n = 2, 3, \ldots$ the result follows by induction from (7.6).
To prove the remaining part of the theorem, we first observe that $\rho_n'(\xi)$ (just as $r_n^*(\xi)$; see [4]) can be interpreted as the minimum average risk in a modified decision problem. Let $D'$ denote the original terminal decision space $D$, augmented by a terminal decision $d_0 \notin D$. Let the loss function be $W(\theta, d)$ if $d \ne d_0$, but $\lambda^{-1} c(\theta)$ if $d = d_0$. The cost function is that of the original problem. Let $\Delta_n'$ denote the class of all sequential decision functions (subject to measurability assumptions analogous to those in [13]) which terminate after at most $n$ ($\ge 0$) observations, such that decision $d_0$ is allowed only after the $n$-th observation has been taken. If $r'(\theta, \delta)$ denotes the risk function in the modified problem, it can be seen that the minimum of $r'(\xi, \delta)$ for $\delta$ in $\Delta_n'$ is equal to $\rho_n'(\xi)$ as defined by (7.5) and (7.6).

Since $\rho_n'(\xi) \le \rho(\xi) \le \rho_n(\xi)$, (7.8) will be proved if we show that

(7.13)    $\lim_{n \to \infty}\left[\rho_n(\xi) - \rho_n'(\xi)\right] = 0\,.$

For a fixed a priori distribution $\xi$, let $\delta_n'$ be a Bayes decision function in $\Delta_n'$, so that $\rho_n'(\xi) = r'(\xi, \delta_n')$. Let $\delta_n$ be the decision function in $\Delta_n$ which is identical with $\delta_n'$ before the $n$-th observation is taken and makes the optimal terminal decision after the $n$-th observation. Denote by $\psi_n = \psi_n(x_1, \ldots, x_{n-1})$ the probability that the sample size $N'$ required by procedure $\delta_n'$ is equal to $n$, given that the first $n - 1$ observations are $x_1, \ldots, x_{n-1}$. Then

$r(\xi, \delta_n) \le r'(\xi, \delta_n') + \int \psi_n\, \rho_0(\xi^{(n)})\, f_{\xi,n}\, d\mu^n\,.$

Therefore

(7.14)    $\rho_n(\xi) - \rho_n'(\xi) \le \int \psi_n\, \rho_0(\xi^{(n)})\, f_{\xi,n}\, d\mu^n\,.$

It follows immediately that condition (7.9) is sufficient for (7.13) and hence for (7.8).
Also" ifW is an upper bound for W(O,d) and hence for
po(e), (7.14) implies
p (~) n
(7.l5)
p'(~)
<W
n
-
}"IjI'f/: d~n =WP/:(N'
n ~"n
10
= n)
Now
W-> p'(e)
=
n
r'(~,e')
n
>
-
J
c(O)EI"\(N')de
..
~J c(O}nPo(N'=n)dg ~
=
nl/2fp~ (N'=n)
~
n / fp£ (N' = n) -
l 2
l 2
n /
tc(o)
J:?
PO(N'
l 2
n- / }
= n}dt
J
PO(N'=n)dC7
l
2
(c(O) < n- / }
~
l 2
{ c(O} < n- / }
_7
Thus
( 7.16)
~tting
n ->
CD"
it follows from (7.15) and (7.16) that condition (7.10) is
sufficient for (7.8).
This completes the proof of the theorem.
The section is concluded by a theorem which shows that if $\Omega$ is finite, then under a natural assumption on the loss function the difference $\rho_n(\xi) - \rho_n'(\xi)$ converges to 0 uniformly in $\xi$ at an exponential rate.

Theorem 2. If $\Omega$ consists of $k$ points, $\theta = 1, 2, \ldots, k$, say, and if for each $\theta$ there is a $d_\theta \in D$ such that $W(\theta, d_\theta) = 0$, then

(7.17)    $\rho_n(\xi) - \rho_n'(\xi) \le \overline{W}\,(k - 1)\, \gamma^n\,,$

where $\overline{W}$ is an upper bound for $W(\theta, d)$ and

$\gamma = \max_{i \ne j} \int (f_i f_j)^{1/2}\, d\mu\,.$

We remark that $\gamma < 1$ if it is understood that no two of the functions $f_1, \ldots, f_k$ are densities of the same distribution. The theorem exhibits a particularly simple bound for $\rho_n(\xi) - \rho_n'(\xi)$; closer bounds are contained in the proof.
To prove the theorem we note that by (7.14),

(7.18)    $\rho_n(\xi) - \rho_n'(\xi) \le \int \rho_0(\xi^{(n)})\, f_{\xi,n}\, d\mu^n\,.$

Let $\xi$ assign probability $g_i$ to the point $\theta = i$. Then

$\int \rho_0(\xi^{(n)})\, f_{\xi,n}\, d\mu^n = \int \inf_{d} \sum_{i=1}^k g_i\, W(i, d)\, f_{i,n}\, d\mu^n \le \int \min_{j=1,\ldots,k} \sum_{i=1}^k g_i\, W(i, d_j)\, f_{i,n}\, d\mu^n \le \sum_{j=1}^k \int \phi_j \sum_{i=1}^k g_i\, W(i, d_j)\, f_{i,n}\, d\mu^n\,,$

where $\phi_1, \ldots, \phi_k$ are arbitrary nonnegative measurable functions of $x_1, \ldots, x_n$ such that $\phi_1 + \cdots + \phi_k = 1$. Hence, recalling that $W(i, d_i) = 0$,

$\int \rho_0(\xi^{(n)})\, f_{\xi,n}\, d\mu^n \le \overline{W} \sum_{i=1}^k g_i \int (1 - \phi_i)\, f_{i,n}\, d\mu^n\,.$

Let, in particular, $\phi_i = \prod_{j=1}^k \phi_{ij}$, where $\phi_{ii} = 1$ and, for $i < j$, $1 - \phi_{ji} = \phi_{ij} = 1$ if $f_{i,n} > f_{j,n}$ and $= 0$ otherwise. The conditions $\phi_i \ge 0$ and $\phi_1 + \cdots + \phi_k = 1$ are satisfied. (Note that $\phi_i = 1$ if $f_{i,n} = \max_j f_{j,n}$.) By Lemma 6 of section 2, $1 - \phi_i \le \sum_{j=1}^k (1 - \phi_{ij})$, where the term with $j = i$ is zero. Hence

$\int (1 - \phi_i)\, f_{i,n}\, d\mu^n \le \sum_{j \ne i} \int (1 - \phi_{ij})\, f_{i,n}\, d\mu^n\,.$

Now if $i \ne j$,

$\int (1 - \phi_{ij})\, f_{i,n}\, d\mu^n \le \int \min\,(f_{i,n}, f_{j,n})\, d\mu^n \le \int (f_{i,n}\, f_{j,n})^{1/2}\, d\mu^n = \left[\int (f_i f_j)^{1/2}\, d\mu\right]^n \le \gamma^n\,.$

Hence

$\int \rho_0(\xi^{(n)})\, f_{\xi,n}\, d\mu^n \le \overline{W} \sum_{i=1}^k g_i\, (k - 1)\, \gamma^n = \overline{W}\,(k - 1)\, \gamma^n\,,$

and the theorem follows from (7.18).
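As a concrete instance of the rate in (7.17): for unit-variance normal densities $f_i$ with means $\theta_i$ (the setting of section 6), the integral in the definition of $\gamma$ has a closed form,

```latex
\int (f_i f_j)^{1/2}\, d\mu
  = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}
    \exp\left\{-\tfrac14\left[(x-\theta_i)^2 + (x-\theta_j)^2\right]\right\} dx
  = e^{-(\theta_i - \theta_j)^2/8},
```

so that $\gamma = e^{-\Delta^2/8}$, where $\Delta$ is the smallest distance between two of the means, and the difference in (7.17) decreases like $e^{-n\Delta^2/8}$.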
8. Further lower bounds for the expected sample size. The lower bounds for $\rho(\xi)$ in section 7 can be used to obtain lower bounds for the expected sample size at a specified parameter point $\theta_0$ in terms of upper bounds on the expected loss or (by choosing a suitable loss function) in terms of upper or lower bounds on the probabilities of various decisions at selected parameter points. For this purpose one chooses the cost function so that $c(\theta) = 0$ for $\theta \ne \theta_0$ and $c(\theta_0) > 0$. The explicit result will be stated only for a two-decision problem; extensions to problems involving more than two decisions will be obvious.

Let $\Omega$ consist of the three points 0, 1, 2, and let there be two decisions, $d_1$ and $d_2$. Put $W(1, d_2) = W(2, d_1) = 1$, $W(i, d_j) = 0$ otherwise, $c(0) = 1$, $c(1) = c(2) = 0$. Let $\xi$ assign probability $g_i$ to the point $i$ ($i = 0, 1, 2$). Then $\rho_0(\xi) = \min\,(g_1, g_2)$, $c(\xi) = g_0$, and, with $\delta = (\pi_n, \phi_n)$,

(8.1)    $E_0(N) \ge \dfrac{\rho_n'(\xi) - g_1\, E_1(\phi_N) - g_2\, E_2(1 - \phi_N)}{g_0}\,, \qquad n = 0, 1, 2, \ldots\,.$

This gives a sequence of increasingly better lower bounds for $E_0(N)$. In particular,

$\rho_0'(\xi) = \min\,(g_1, g_2, \lambda^{-1} g_0)\,,$

where $\lambda = 1 - \int \min\,(f_0, f_1, f_2)\, d\mu$. The ratio in (8.1) with $n = 0$ is maximized by letting $g_1 = g_2 = \lambda^{-1} g_0$, and the resulting inequality is equivalent to (1.3).
References

[1] T. W. Anderson, "A modification of sequential analysis to reduce the sample size," Ann. Math. Stat.

[2] T. W. Anderson, "The integral of a symmetric unimodal function," Proc. Amer. Math. Soc., Vol. 6 (1955), pp. 170-176.

[3] David Blackwell, "On an equation of Wald," Ann. Math. Stat., Vol. 17 (1946), pp. 84-87.

[4] David Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions, John Wiley and Sons, Inc., New York, 1954.

[5] T. G. Donnelly, "A family of sequential tests," Ph.D. dissertation, University of North Carolina, 1957.

[6] Wassily Hoeffding, "A lower bound for the average sample number of a sequential test," Ann. Math. Stat., Vol. 24 (1953), pp. 127-130.

[7] J. Kiefer and Lionel Weiss, "Some properties of generalized sequential probability ratio tests," Ann. Math. Stat., Vol. 28 (1957), pp. 57-74.

[8] J. Seitz and K. Winkelbauer, "Remark concerning a paper of Kolmogorov and Prohorov," Czechoslovak Math. J., Vol. 3 (78) (1953), pp. 89-91. (Russian with English summary.)

[9] Abraham Wald, "On cumulative sums of random variables," Ann. Math. Stat., Vol. 15 (1944), pp. 283-296.

[10] Abraham Wald, "Sequential tests of statistical hypotheses," Ann. Math. Stat., Vol. 16 (1945), pp. 117-186.

[11] Abraham Wald, Sequential Analysis, John Wiley and Sons, Inc., New York, 1947.

[12] Abraham Wald, Statistical Decision Functions, John Wiley and Sons, Inc., New York, 1950.

[13] A. Wald and J. Wolfowitz, "Bayes solutions of sequential decision problems," Ann. Math. Stat., Vol. 21 (1950), pp. 82-99.

[14] A. Wald and J. Wolfowitz, "Optimum character of the sequential probability ratio test," Ann. Math. Stat., Vol. 19 (1948), pp. 326-339.

[15] J. Wolfowitz, "The efficiency of sequential estimates and Wald's equation for sequential processes," Ann. Math. Stat., Vol. 18 (1947), pp. 215-230.