Read, Campbell Burgess; (1969)On minimizing the risk in certain sequential tests for known or unknown costs."

ON MINIMIZING THE RISK IN CERTAIN SEQUENTIAL TESTS, FOR
,
KNOWN OR UNKNOWN COST
by
CAMPBELL BURGESS READ
Department of Statistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 626
JUNE 1969
Research partially supported by the United States Air Force Office of
Scientific Research Grant No. AFOSR-68-l4l5 and the National Institutes
of Health, Institute of General Medical Science Grant No. GM-l2865-05.
ii
TABLE OF CONTENTS
CHAPTER
PAGE
tv
ACKNOWLEDGEMENTS
I.
INTRODUCTION
1.l.
II.
1.2.
The SPRT: Notation and Main Results
2
l.. 3.
Optimum Property of the SPRT
5
1.4.
Unknown Cost and the Partial Sequential Probability
Ratio Test
7
MINIMIZING THE AVERAGE RlSK IN A S.P.R.T.
2.1.
Introduction
11
2.2.
The Absolute Minimum Risk for a Multinormal
Discrimination Problem with Equal Loss
14
The Absolute Minimum Risk for Symmetric
Nonnorma1 Tests
25
2.4.
The General Case
32
2.5.
Comparison of a.m. risks:
2.3.
III.
1
w=2
39
THE PARTIAL S.P.R.T. PROCEDURE
3.1.
Definition, and Application to the Exponential
Family
50
3.2.
Testing a Gamma Parameter
55
3.3.
Testing a Normal Mean with Known Variance:
the O.C. Function
61
Testing a Normal Mean with Known Variance:
the Average Sample Number
69
Numerical Results and Comparisons
75
3.4.
3.5.
IV
1
ON MINIMIZING THE RISK WHEN THE COST PER OBSERVATION IS UNKNOWN
4.1.
Introduction
83
iii
CHAPTER
V.
VI.
VII.
PAGE
4.2.
Excess Risk due to Use of Incorrect Cost
84
4.3.
Average Excess with Variable Cost
89
ON MINIMIZING THE RISK WHEN THE COST IS UNKNOWN:
A SPECIAL CASE
5.1.
Introduction
95
5.2.
Extension of the Chernoff a.m. Risk to the PSPRT
of a Normal Mean
96
5.3.
Average Excess
103
5.4.
Method of Quadrature
110
5.5.
Numerical Results
113
ACCURACY OF APPROXIMATIONS TO THE A.M. RISK OF A P.S.P.R.T.
6.I.
Introduction
6.2.
Bounds on the
6.3.
Bounds on the Average Sample Number
134
6.4.
Numerical Results
138
6.5.
Bounds on the Risk of a P.S.P.R.T.
143
130
o. c.
Function
131
A MULTIVARIATE EXTENSION OF A SEQUENTIAL DISCRIMINATION
PROCEDURE
7.1.
Introduction
147
7.2.
The Test Procedure
150
7.3.
The O.C. Function
152
7.4.
The A.S.N.
155
7.5.
Suggestions for Further Research
161
APPENDIX
1.
Lemma A.I.
165
II.
On Choosing a Suitable Estimator of Cost
167
BIBLIOGRAPHY
169
iv
ACKNOWLEDGEMENTS
I wish to acknowledge the guidance and inspiration of Professor
Normal L. Johnson throughout the preparation of this dissertation.
His
invaluable comments and suggestions were a constant encouragement; it
has been a privilege to have been taught by him and to have worked under
him.
CHAPTER I
INTRODUCTION
1.1.
Between 1944 and 1946 Abraham Wald published a series of papers
([18], [19], [20], [21], [22], [23]) which effectively brought Sequential Analysis into the main stream of statistical theory.
Using the im-
portant Fundamental Lemma [18], Wald developed the properties of the Sequential Probability Ratio Test (SPRT) of a simple hypothesis against a
simple alternative, and evaluated approximations to the Operating Characteristic (D.C.) Function and to the Average Sample Number (ASN) of the
procedure for a large class of sequences of independently and identically distributed (i.i.d.) random variables.
He also calculated bounds
on the D.C. function and on the ASN ([19]).
Since these approximations
playa large part in subsequent chapters, a summary of the main results
is given in 1. 2.
In 1.3., the optimum property of Wald's test ({25], [26])
i~
re-
lated to restrictions which are usually placed on the boundaries of the
continuation region of the test and is also related to the "risk" of the
procedure when the prior distribution of the unknown parameter
given.
The background of Chapter II is completed in 1.3.
8
is
2
A modification of Wa1d's SPRT is introduced and examined in
Chapter III, and in 1.4. the background and main results of the rest of
this paper are set out.
The modified SPRT is used in Chapter V, which
together with Chapter IV explores the problem of minimizing the risk
A multivariate test discussed in Chapter VII
when the cost is unknown.
is also introduced in 1.4.
1.2.
The
Notation and Main.Results.
SPRT~
Xl' X ' •••••
2
every
j,
is a sequence of i.i.d. random variables, and for
has a probability density function (p.d.f.) or likelihood
X.
J
The likelihood of
is denoted by
n
IT
j=l
Pe(x.) •
J
In the standard Wa1d SPRT, we wish to test the simple hypothesis
HO:
e = eO
against the simple alternative
The test
T
is then as follows:
given two constants
A
and
B
such that
sampling so long as
Pe (~)
B
<
1
Pe (~)
<
A
a
If
stop and.accept
o
< B < A,
continue
3
Pe (~)
1
If
Pe (~)
:::;;
B,
stop and accept
HO
0
a = log A,
If
b = log B,
Zo = 0,
z.
J
= log
Pe (Xj )
1
j=1;2, ••• ,
Peo (x j )
and
z. = Z.
J
J
- Zj_1 ,
j
= 1, 2,
... ,
then sampling is continued until
n
b <
~
Zj < a
j=l
is violated, and
ratio;
T may be rephrased in terms of the log-like1ihood-
in many situations it can be discussed in terms of a particle
on a random walk between two absorbing barriers.
The D.C. function is denoted by
L(e)
Let
L(e),
and
= Pr(accept
Hole)
N denote the stopping variable.
In this paper, the stopping
variable
is synonymous with the sample size.
..
procedure is denoted. by
The ASN of any sequential
E(NIS).
Wa1d's Fundamental Lenuna [18], as applied to
under certain conditions
~ h = h(e)
+0
such that
E ehzl = 1,
i.e.
[pe, (xlT dF e (xl)
JG Peo(x l )
zl'
=
1,
states that
4
where the integral is a sum if
Xl
is a discrete variable, and
the cumulative distribution function of
pending on·· e,
1 - a'
13'
and let
a, 13
is
G is a set, not de-
Xl'
such that
Pr(X l
Let
Fe
=
L(e o)
=
L(e l )
~G)
= 1.
be the nominal error probabilities when excess over the
boundaries is neglected.
Further, we introduce the constraint
o <.B
< 1 < A
or
-
This constraint .is
import~nt
SPRT in the next section.
If
=
log 1-13
b
=
log _13_
l-a
a' + 13'
<
a + 13
E(Zlle)
imations to
00
in considering the optimum property of the
1-13 '
log --,a
::;
a
<
13 '
log --a
1
'
1
exp [ah ( e)] - 1
exp[ah(e)] - exp[bh(e)]
+o.
h(e)
bL(e) + a{l - L(e)}
E(zlle)
The above approximations are better when
E(Nje)
<
+0,
E(N Ie)
when
a
When it holds, Wald showed that
a
L (e)
< b < 0 <
00
is large.
L(e)
and to
a'
and
13'
are small, or
Page [15] and ~emp [13] improved Wald's approx~
E(NI e)
for the case when
X.
J
is a normal
5
r.v. with known variance and mean
8.
Bhate [4] derived upper and lower
bounds for the cumulative distribution of
and applied them to spe-
N,
cial cases.
Wald's bounds on
will be used in Chapter VI: --
L(8)
let
lim
[I;;
E(ehzlehZ :::; 1.) ]
=
n
[p
E(ehzlehz ~ 1.)]
=
cS
I;;
lim
I;;
P
p
where it is understood that
e ah - 1
e
ah
bh
- ne
:::;
n
and
cSe
L(8)
depend on
cS
ah
ah
ue
s:
1
-
bh
e
8.
Then
if
h >
0
if
h <
a
and
1 - ne
L(8)
e
Wald calculated
nand
buted r.v. having mean
cS
8
bh
ah
- ne
ah
for the case where
and known variance.
X.
1.
is normally distri-
These values are used in
Section 6.2.
1.3.
Optimum Property of the SPRT.
Wald and Wolfowitz ([25], [26]) showed that in the class of all se-
Ho and
quential tests between two simple hypotheses
Pr(accept
H118 = 8 0 )
:::;
a'
Pr(accept
Hole =
:::;
s'
81)
hold, the SPRT is "best" in the sense that whenever
it requires on the average fewest observations.
by
P.
HI'
8 = 8
for which
o or e
=
e1 ,
We denote this property
6
Burkholder and Wijsman [6] pointed out, however, that property
P
had been shown valid only when
(i)
B :::; 1 :::; A
(ii)
E(N!e) <
00
•
They showed that condition (ii) is not necessary, but that condition (i)
is necessary.
It is for this reason that (i) will be assumed to hold
throughout this paper.
Burkholder and Wijsman also showed that if attention is restricted
to the class of tests having at least one observation, then property
P
of the SPRT holds without condition (i), other than the obvious condition
B:::; A.
Tests having no observations will, however, be considered
in Chapter II.
The optimum property may be regarded in another light [14].
pose
e
to have a prior distribution in which
w
=
Pr(e =
eo)
1 - w
=
Pr(e =
e1)
=
(w,
denoted by
/
Also let
A
w
c
a test procedure
0
H
O and
HI
and
S'
respectively.
the losses for
The risk function of
Lowa ' + L 1 (1-w)S' + cE(N!e)
are the exact error probabilities of
the class of tests
0
SPRT minimizes
).
W
RO(A
Lo ' L 1
is
RO(AW)
a'
1 - w)
be the cost per observation, and
incorrectly rejecting
where
Sup-
having
aI
and
SI
This is property
o.
Then in
as error probabilities, the
P.
If the class of tests is
not restricted by bounds on the error probabilities, then the optimum
test minimizing
RO(A
)
W
is either a SPRT or a decision made without
7
taking any observations.
In ,Chapter II, this will be discussed in
detail, using an iterative procedure,in terms of Wald's nominal error
probabilities
a
and
f3.
Chernoff [7] suggested approximate error probabilities leading to
a nominal minimum risk, and although these error probabilities are not
always close to those obtained by iteration, we find that they often lead
to a risk which is close to that obtained by iteration.
The special
case of testing a normal mean is discussed, and also the class of sym1
When w=2'
metric tests, as well as more general testing problems.
a symmetric test is frequently as nearly optimal as a nonsymmetric test.
Tables give results when
1.4.
X.l.
has a variety of distributions.
Unknown Cost and the Partial Sequential Probability Ratio Test.
If the cost per observation
the risk, we can replace
c
c
is unknown, and we wish to minimize
by an estimator.
The question of the most
suitable estimator arises, and is touched on in Appendix II.
main part of the thesis, we use
n
observations.
c '
n
the mean observed cost based on
Chapter IV covers the case in which
in a previous experiment.
In the
c
n
is observed
The risk is minimized as if the true cost were
and the excess risk over that attainable for the case in which
is known is calculated.
c
If the observed cost has a Gamma distribution
with small variance, then the excess risk is averaged over the distribution of
c.
n
The analysis uses Chernoff's approximations [7], for
SPRT procedures having at least one observation.
For the case in which no previous estimate of
have adopted a two-stage procedure.
In the first,
c
is available, we
n
observations are
8
made;
and
x
-n
c
are observed.
n
In the second stage, a Wald SPRT proA
n
cedure is followed, with boundaries
advance, but the observed cost
c
and
B.
n
determines
n
A
n
n
is determined in
and
B
n
if the risk
is to be minimized (Chapter V).
This procedure is termed the Partial Sequential Probability Ratio
Test (PSPRT), and is of interest in itself (Chapter III).
test
T(x )
is first defined, given
'-11
x •
n
A conditional
Sampling contin-
ues if
Pe (~)
=
B' (x )
B __0-:---:-
n '-11
n Pe (x )
m
<
=
L:
j=n+l
1'-11
A' (x )
n
'-11
m > n.
where
In order that property
ceed beyond
n
P
shall hold, i.e., for sampling to pro-
observations,
B' (x ) ~ 1 ~ A' (x ) •
n --n
n --xl
If
B'(x) > 1,
n --xl
sampling stops with the first stage and
HO is
A'(x)
sampling stops with the first stage and
HI
accepted.
If
n
--xl
<
1,
is
accepted.
The approximate
DC
function and
over the joint distribution of
x
--xl
imations to the unconditional PSPRT.
ASN
of
T(x)
--xl
are averaged
to give extensions of Wald's approxFor an exponential family, these
are obtained from the univariate distribution of a sufficient statistic
and the cases in which
is a Gamma variable and a normal var-
iable are considered, with numerical analysis in the latter case.
bounds on
L(e)
and on
for the same case.
E(Nle)
Wald's
are extended to the PSPRT in Chapter VI
9
The PSPRT does not appear to have been studied in the literature,
although Billard [5] recently developed modifications of procedures of
Sobel and Wa1d [16] and of Armitage [2] for testing three hypotheses
jointly.
Her modifications are also analogs of the PSPRT.
Tables show that Anderson's modified SPRT [1] is better than the
PSPRT, but it is not appropriate to the problem of unknown cost.
If
h(e)
= 0,
a fixed sample-size test of equivalent power may re-
quire fewer observations thana Wa1d SPRT on the average (see, for
example, [12]).
The PSPRT may then be better than either (see Tables
3.5.7 and 3.5.8).
If
and
n = 1,
E(Nle)
the PSPRT is a Wa1d SPRT.
The approximations to
L(8)
derived in Chapter III seem to improve very slightly on
Wa1d's but not as substantially as Kemp's [13] (see Tables 3.5.4 and
3.5.5) •
Approximations to the minimum risk, and the average excess risk
using the estimator
c
n
are considered in Chapter V;
be normal with unit variance,
La
= L1 ,
and
1
w = '2.
X.
J
is assumed to
Numerical results
by the method of statistical differentials (shown to be reasonably
close when compared with quadrature for small values of
an optimum value of
n
c) demonstrate
which minimizes the average excess risk; tables
and diagrams are included.
Chapter VIII extends work by Baker [3] and Hall [9] for a two-stage
test of the mean of a normally distributed variable with unknown variance.
Hall suggested a modified SPRT, replacing the variance by an
estimator.
The error probabilities have known bounds.
This extends to
a test of the mean of a mu1tinorma11y distributed population in which
the covariance matrix is known except for a scalar multiplier.
10
NOTATION.
~
denotes approximate equality throughout.
means. "is distributed as
" in referring to
ra~dom
variables.
CHAPTER II
MINIMIZING THE AVERAGE RISK IN A S.P.R.T.
2.1.
Introduction.
Let
Xl' X2 , •••
be a sequence of independent identically distri-
buted random variables, indexed by a parameter
8.
shall be interested in testing a simple hypothesis
a simple alternative
HI:
8
= 81 ,
In this chapter, we
Ho:
8
=
against
80
where the parameter has a prior dis-
tribution
are losses relating to wrong rejection of
Ho and
HI
respectively;
c
is the cost per observation.
In the sequel, we shall be considering SPRT's of
HO
VB
HI'
but
the error probabilities considered will be the nominal error probabilities when excess over the boundaries is neglected.
To this extent, the
results will be approximate, since we shall minimize the risk function
using these nominal error probabilities rather than the exact ones.
When the error probabilities are small, however, the nominal error probabilities come very close to the exact ones.
In general, we shall find
that if cost per observation is small compared to loss
Lo or
Ll
,
then the error probabilities appropriate to the minimum risk are small,
and in many practical applications this will be so.
_ _ _ _ _ _J
12
a
Let
be the SPRT for testing
Ho vs
having
HI'
(nominal) error probabilities, and constant cost
Let
RO(A
)
W
the prior on
a
be the risk corresponding to
a
is
c
a
and
S as
per observation.
when the distribution of
Aw • Then
(2.1.1)
where
N is the stopping variable, possibly equal to zero, and
a', S'
the error probabilities.
feN), and nominal error probabi1-
Using Wa1d's approximation for
ities
a,S,
(2.1. 2)
Xl' X2 ,
In (2.1.2), we suppose
variables, each having a p.d.f.
ues are
Xl' x 2 '
to be a sequence of i.i.d. random
Pe(')'
If the respective observed va1-
••• ,
=
and
=
e.) ,
~
i
= 0,
1
13
We shall consistently use the notation
cw
=
(2.1.3)
=
It is a property of the likelihood ratio that
Pe
flOg
(x)
I
Pe
S
Pe
(x)
o
(x) dx
<
o
(x) dx
>
o
o
and
pel
f
log
S
where
S
[ Pe
is a set such that
(X)]
Pe
( )
o
IS
x
I
Pe. (x)dx =
1;
i
= 0,
1,
and such
~
that
Pe. (x)
=
0
at most on a set of measure zero ill
S.
~
Hence we have, if we also assume
c > 0,
0 < w < 1,
(2.1.4)
and
~
6
1-6
Lowa + L I (1-w)6 + {k o (l-a) + k I S}logl_a + {kOa + kl(1-6)}lo~
(2.1.5)
In problems in discriminant analysis, we may wish to minimize the
risk for a given prior
A • The experimenter is free to choose nominal
error probabilities
and
a
w
6
if he uses a Wald SPRT procedure, by
appropriate choice of boundaries.
We know that among the class
Co
of
all sequential procedures (including those for which decisions are made
without making any observations at all), Wa1d l s SPRT is a Bayes rule for
14
given
CO'
A,
a.
w
and
[14] •
(3
In minimizing the risk for procedures in
therefore, it seems reasonable to restrict attention to the class
C1 of Wa1d SPRT's.
given
A
given
A ,
Thus the (approximate) minimum risk in
C1 for
Co
will be the (approximate) absolute minimum risk in
W
w
for
which will therefore be termed the "nominal a.m. risk."
The problem will be analyzed in two special cases first; those of
testing a mu1tinorma1 mean with known covariance matrix, and of certain
synunetric non-normal tests in which
(3
= a..
tests in the general case is discussed.
Then a procedure for simple
The chapter concludes with some
comp aris ons •
2.2.
The Absolute Minimum Risk for a Multinorma1 Discrimination Problem
with Equal Loss.
Let
lb,
be a sequence of mutually independent
X2 ,
dom vectors, each distributed mu1tinormal1y with mean
iance-covariance matrix
Then the
log
L;.
We wish to test
HO:..§.
6
px 1
ran-
and known var-
=!o against
likelihood ratio is
=
So
=
To minimize
R (A )
8 w
=
- ..!.(
6
2 -1
- 6 )' L; - 1 ( 6
-0
-
by a suitable procedure
-1
8'
E
- _6 ).
0
C1 ,
we have
(2.2.1)
15
~
T) = - 1-(3
If we write
oR
-= 0
o~
=>
'V = _(3_
,
l-~
,
then
1
LOw - k olog (T)'V) - k 1 (T) - 'V) = 0
(2.2.2)
Similarly
oR
as
=
0
=>
Eqs. (2.2.2) apply to the general problem to be solved later.
In the
meantime we note that (2.2.2) would be easier to deal with if the terms
in
10g(T)'V)
could be eliminated.
An interesting approach is to solve the problem first with the restriction
+ (l-w)(3
w~
=
(2.2.3)
n
so that the expected error rate
Putting
~(~,
(3)
= w~ +
n
(l-w)(3,
is fixed.
we can use Lagrange's method to
solve
oR + h.£1
0(3
Since
of
log (T)'V) ,
=
=
0(3
wk 1 + (l-w)k o
= 0,
eliminating
h
O.
gives an equation free
i.e.,
(2.2.4)
16
Special Case:
La
= L1 = L
Solving (2.2.4) with (2.2.3) gives concise solutions when the
losses are equal.
In this case, (2.2.4) reduces to
i. e. ,
(2.2.5)
Solving (2.2.5) with (2.2.3), we get the formal solutions
a
7f(1 - w - 7f)
w(1-27f)
=
(2.2.6)
7f (w-7f)
(l-w) (1-27f)
S =
leading to nominal error probabilities minimizing the risk subject to
(2.2.3), for tests having at least one observation.
These results can be incorporated into
x
THEOREM 2. 1.
X
-1' -2
•••
is a sequence of
px 1
mutually independent and having the multinormal
If in testing
7f
i = io
i = i1
vs
p -
distribution.
-
= L1 = L,
then
C1
(i)
7f
<
max(w, l-w)
'd c;
E
(11)
7f
<
min(w, l-w)
<
in the class
minimizing
RC;(A
)
W
N (6, E)
the expected error rate
= wa + (l-w)S is fixed, and the losses La
i>
random vectors,
C1 ,
the procedure
subject to (2.2.3) is a SPRT given
by (2.2.6), having at least one observation.
(iii)
min(w, l-w)
<
7f
<
max(w, l-w)
the procedure minimizing
<-->
RC;(AW)
in the class
C1 ,
subject to (2.2.3)
is a decision made without taking any observations.
17
Proof:
(i) In any SPRT having at least one observation, we assume
a + S < 1.
So
TI
= wa + (l-w)S <
ma~(w,
l-w) (a + S) < max(w, 1-w).
Now suppose we have a randomized decision with no observations.
is the decision which accepts
Hi'
i = 0, 1,
If
d.
~
then
p = Pr(d o) = Pr(dol~ = ~o) = 1 - a
= Pr(dol~= i
Then (2.2.3)
=
S,
a+S=
1.
1)
TI-W
-->
so that for such a decision,
p
and
if
P = 1-2TI
. TI
~
w
and
1
w <2
or
TI
~
w
and
w
and
:
p
i. e. ,
p
~
~
0 -->
{
1 =>
So for such a procedure,
~
l-w
and
~
l-w
and
1
w=-
if
2
2
>l2
1
2
1
w >2
w
lies between w and
TI
-1
:;::
<-
,
or
.
the risk is
l-w;
This proves (i) , and sufficiency in (iii).
LTI.
We showed above that for SPRTs having one observation,
(ii)
patab1e
a
~
S
1
2
1
TI
>-
TI
< 2'
a
+S
and
a
~
0
0
are given by (2.2.6).
S
in (2.2.6)
in (2.2.6)
>
===>
TI
~
0
and
TI
1-w -
TI
~
0
and
TI
TI
~
0
and
TI
TI
~
0
and
TI
{ ww-
is not admissible, since then
>
< 1
TI
if
~
min(w, 1-w) •
TI
~
Then
1-w -
{
TI
>
<l
or
2
1
>2
<l
2
1
>2
max(w, 1-w) •
It can be checked that
min(w, 1-w)
com-
,
or
Hence
a
~
1, S
~
1,
in (2.2.6).
To prove necessity in (ii) and (iii), we need only show that if the
minimizing procedure subject to (2.2.3) is not a SPRT having at least
18
one observation, then
TI
t
min(w, 1-w).
But this must be so, since
such a procedure must then be based on no observations, implying what
we showed in (iii).
This completes the proof of the theorem, except that we need to
establish that the solution (2.2.6) in (ii) does give a minimum.
Expressing the risk in terms of
S
only,
=
+
{(l~S)(l-w)
- (TI - (l-w)S)}
w(l-S)
log TI _ (l-w)S ].
So
W-TI}
{l-W-TI}
- (w-TI) { S{w-TI+(I-w)S} + (l-w-TI) (l-S)(TI-(I-w)S)·
For
S small and
For
l-S
small and
TI
TI
aR < 0 •
as
aR
min(w, l-w) , - > 0 •
as
min(w, l-w) ,
<
<
-
Since there is only one turning value for
R,
this gives the re-
Q.E.D.
quired minimum when (2.2.6) holds.
Situations in sequential analysis could arise in which the experimenter would be more interested in fixing the expected error rate than
in fixing both error probabilities.
if a prior distribution for
Such an approach would only apply
Ho and HI
were known, but as shown in
the theorem for this problem of testing a multinorma1 mean, it leaves
scope for flexibility in choosing
volved.
a
and
S
to reduce the risk in-
19
R (A ' TI), subject to (2.2.3), can be ex8 W
The nominal minimum risk
pressed in terms of
TI.
For when (2.2.6) holds,
=
-
(l-w) (1-13) - wa
=
1 - w - TI,
log _13_
1-a
=
10 g{(l:W)
1-13
log a
=
log {(l:W)
(l-w) 13 .,. w(l-a)
(w-TI) ,
. (l:TI)}
and
,
. (l~TI)} •
Then we substitute in
to get the solution
TI
<
min(w, l-w)
TIL ,
(2.2.7)
Now that the problem is solved for fixed expected error rate
it remains to evaluate the nominal a.m. risk by minimizing
with respect to
TI.
For the case
ITI -
minimized with a greatest lower bound
TI
= min(w,
For
~I
<
Iw -
~I,
L xmin(w, 1-w) ,
TI,
R (A, TI)
8
the risk
TIL
is
at
1-w).
TI < min(w, 1-w),
the function in (2.2.7) is to be minimized.
gives
or
where
p
-1
=
P
=
P
-1
1
1
1-TI
210g - + (1-2TI)(-; - 1-TI)
TI
1-TI
1 - 2TI
210g - - TI(l-TI) = 0
TI
2c
=
LLl 2
= o
(2.2.8)
(2.2.9)
20
The value of
TI
required is a root of (2.2.8).
Raphson iterative method can be used to solve for
is free of
w,
~
TI
=
~ > 0
1-TI
1
(l-TI) 2
(2.2.10)
o
<
<
TI
<
1,
dep
inf
TI < 1 dTI
and
=
1
TI=2·
at
16
is monotonic increasing from
is skew-symmetric about the point
compared with
E1 (Z1)'
or
L
(2.2.8) will be small.
gives values of
P
-1
For values of
-
00
to
1
(2' p
then
p
-1
).
+
00
If the cost
Figure 2.2.2 shows a graph of
other than
1
40'
paper suggests that
p = HTI
-1
TI '" P
r
for some
TI
in
(0,1),
Further, the function
which correspond to roots
p
for
c
ep
is small
is small, and the single root of
Hand
ep, and Table 2.2.1
of (2.2.8).
the graph of
'slides' up or down but is similar in shape.
is small,
p
in (2.2.7).
and the method will give a rapid convergence.
TI
noting that
[TI(;-TI)] 2
0
ep
TI
for
dTI
TI
l + -!.2 + _2_ +
=
dTI
Hence
TI,
and that therefore the prior distribution appears only
in the boundary values for
Thus
The Newton-
ep
in Figure I
The linearity on log
r,
and certainly when
A good first approximation to the solution of
(2.2.8) was given by Chernoff [7], who showed that
(l-w) c
a '" E (Z1)L w '
1
O
(2.2.11)
give an approximate a.m. risk, using Wa1d's crude approximations, for
procedures in
C1 having at least one observation.
Here this gives a
21
250
100
50
25
Values of
p
~1
I
L
= - E (Z e)
ell
correspondin? to values of
10
n when
RO(AW' n)
is minimized, and when
n
l~w).
<
min(w,
_1
P
5
2.5
1;--------..-------...------...----_-----_---.-1.003
.005
.01
.025
.05
.10
.25
.40
22
<p< 7f)
FIGURE 2.2.2.
VS
where
7f ,
-p1 = 40
40
30
20
10
o
0.1
0.2
0.3
0.4
0.5
7f
TABLE 2.2.1.
Values of
7f
p
-1
.001
7f'
.0025
minimizing
R (A
.005
.01
8
,7f)
W
.05
for given
.10
.15
p:
'IT
.20
< min(w, 1-w)
.30
.40
.5
1012.81 409.98 209.58 108.18 24.84 13.28 8.960 6.523 3.599 1.644 0.0
23
first approximate root
=
The nominal risk arising from
a.m. risk."
c
=
7T
l
=P
=
(2.2.12)
p
will be called the "Chernoff
The risk arising from the solution
7T*
to (2.2.8) will be
called the "nominal a.m. risk."
A second approximation, by Newton's method, is
=
C/>(7T l )
7T l - </>'(7T )
l
i. e. ,
7T'2
=
P
-
1
{p (l-p)} 2 {P
-
1-p
l-2p
2log - - p(l-p) }
P
1
- - - 2log hp
= p _ p2(1_p)2 {1-p
p
If
p
p
(2.2.13)
p - p2(1 + 2logp) + 0(p 310gp).
is large, and
1
7T .'" -
If
= 0.25;
7T
2
<
0,
(2.2.14)
it is conceivable that
2'
poor first approximation to the root
p
.
is small, we can write
=
If
}
we replace
7T*
of
C/>(7T) = 0, leading to
by
pp
for
0
then we start the iterative process again.
it can be shown (see Appendix, I) that
that the n-thapproximation
7T
n
may be. a
7T* = 7T
2
<
p
<
But if
+ 0(p 3 l ogp )
is of this form.
1, say
7T
2
>
0,
in the sense
Eventually, and
usually quickly, there is rapid convergence to any disired tolerance.
Having obtained
plied.
7T*,
the analysis leading to (2.2.7) is then ap-
The nominal a.m. risk is then
R (A ,7T*),
O W
with nominal error
24
probabilities given by substituting
n*
that
lies between wand
n* < min(w,
But if
1-~).
n*
fbr
n
in (2.2.6), provided
1-w,
then
the minimum risk is
min[Lw, L(l-w)]
using a procedure having no observations, where
=
p
Pr(d o)
=
0
or
1.
We state the conclusions in
THEOREM 2.2.
n*
For the mu1tinorma1 problem defined in Theorem 2.1., let
be the root.of
~(n)
=p
-
a.m. risk among all procedures
L
x
1-n
1-2n
21og-;- - n(l-n)
-1
8
€
Co
Then the nominal
= O.
is defined by
min(w, 1-w) ,
(2.2.15)
It can be checked that when
n*
= min(w,
1-w) , the two forms agree,
i.e. for the a.m. procedure, Wa1d's approximation to
E(N)
is zero at
n* = min(w, 1-w).
in (2.2.7) is attained when w
1-w
- (l-2w)log--::w
d
(dw[-
•
Then
8'
decreases as
1-w
(l-2w)lo~]
=
1
= 2'
w increases from
since
0
to
1
2
~(w)).
is a SPRT with at least one observation, and from (2.2.6)
the minimizing nominal error probabilities are
~* =
S* = n*.
25
2.3.
The Absolute Minimum Risk for Symmetric Nonnorma1 Tests.
We consider procedures
C1 ,
8' E
not restricted to the mu1ti-
normal case, in which the error probaPi1ities
tests of a simple
where the losses
HO: 8
La
and
=
a
and
S are equal, in
80
against a simple alternative
L1
may be unequal, and
8'
HI: 8 = 6 1 ,
has at least one
observation.
(2.1.5) may be expressed
(2.3.1)
To minimize, we have
oR
=
oa
oR
oa
So
=
°
gives
=
1
1-a
k* - 210g
a -
1-2a
a(l-a)
= o.
(2.3.2)
This equation has the same form as (2.2.8), with
instead of
p
>
0,
k*
=
p.
We note that
(2.3.3)
k* >
° by virtue of (2.1.4), and since
the formal analysis for solving (2.3.2) is the same as discussed
previous 1y.
A first approximation to
a
is required, and the argument used by
Chernoff [7] for general tests is applied.
The error probability is
supposed small, and Wa1d's crude approximations are used, so that
(2.3.4)
26
Putting
f3 = a,
RO'(AW)
'"
(2.1.2) gives
c(l-w)
{Low + L 1 (1-w)}a + E ~ ) 10ga - E (Z ) 10ga
{LoW +
L 1 (l-w)
}a
a
1
(k 1
-
1
1
k o) log a.
(2.3.5)
Then
aR
aa
'"
Low + L 1 (l-w)
=
0
k 1-k o
a
at
k 1-k o
a1
Since earlier
TIl = P
k*
=
Low + L 1 (l-w)
(2.3.6)
gave (2.12.13) and (2.12.14),
leads to the improved approximation
2
k*
.
k* (l-k*)
2
a2
to the root
1
1-k*
{l-k* - 210g k *
a
a
1
=
k*
of (2.3.2)
}
(2.3.7)
a+f3<l =>
a1
= k*
>
1
2'
a <
1
2'
and numerical results .indicate that if
it can be replaced by
a1
= 0.45
as a suitable first ap-
proximation.
Hence the method of solution is straightforward, and covers a wide
class of problems, including those with unequal loss.
arising from
a
1
=
k*
The risk (2.3.1)
will be called the "Chernoff symmetric a.m. risk,"
and the risk (2.3.1) arising from the solution of (2.3.2) will be called
the "no.mina1 symmetric a. m. risk."
For the problem of testing a multinormal mean with equal loss, as
considered in 2.2, the case in which
solution, since (2.2.6) gives
a
= f3
w = -1
2
. lds the above symmetric
y~e
when w =
21 .
27
Tables 2.3.1 to 2.3.4 show solutions
the initial
a
1
=
k*
'
a
to (2.3.2), together with
the Chernoff symmetric a.m. risk, the nominal
symmetric a.m. risk, and the nominal a.m. risk for
in the next section.
When
c
S
~
a
as discussed
is small, the Chernoff risk is close to
the iterated solution; and the iterated solution appears to come closest
to the non-symmetric nominal a.m. risk whenever w
The question may be raised:
= ~.
does a non-symmetric test procedure
give a substantial improvement in the a.m. risk over a symmetric procedure?
The computed results for Bernoulli and Gamma distributions with
equal loss indicate that a symmetric procedure is effectively as good
when
w
1
= 2'
but not necessarily otherwise.
hold whether the cost is large or small.
This statement appears to
Further discussion appears
later in 2.5.
If there is no prior information about
well take
60
6,
the experimenter might
to be as equally likely a priori as
61 ,
and the nu-
merical results suggest that one would do almost as well with a symmetric test.
In the case of normality,
If
Lo
Eo(Z) = -E 1 (Z) => k 1
-
k o = c/E 1 (Z).
= L 1 = 1 in addition, then the risk (2.3.1) is free of Aw •
Hence in this case the nominal symmetric a.m. risk is invariant under
changes in
and is effectively the same as the non-symmetric nominal
A ,
a.m. risk for
w
w
1
2
28
Tables 2.3.
*
means the 'no observations' rule is best.
Lo = L1 = 1
in all cases.
TABLE 2.3.1
= -1-
P (x)
e
v'27fe
e
- -.L2
2e
Chernoff
;
EN
CI.
c
w
x2
Symmetric a.m. Risk
Chernoff
nominal
nominal
Chernoff
nominal
Nominal
a.m. risk
.25 .1
.1715
.2190
3.024
1.226
.4739
.3416
.2499
.5
.1
.2191
.2609
3.327
1.092
.5518
.37002
.36947
.75 .1
.2667
.2935
3.525
.968
.6192
.3903
.25*
.25 .01
.0172
.0194
6.974
6.465
.0869
.0841
.0798
.01
.0219
.0254
8.372
7.586
.1056
.10126
.09985
.75 .01
.0267
.0316
9.666
8.555
.1233
.1171
.0958
.25 .001
.00172
.00175
10.924
10.848
.0126
.0126
.0123
.5
.001
.0022
.0023
13.418
13.299
.0156
.01555
.01534
.75 .001
.0027
.0028
15.808
15.636
.0185
.0184
.0160
.25 .0001 .00017
.00017
14.873
14.863
.00166
.00166
.00163
.5
.0001 .00022
.00022
18.463
18.447
.00207
.002065
.002043
.75 .0001 .00027
.00027
21. 950
21.93
.00246
.00246
.00222
.5
29
TABLE 2.3.2
6e
=
-6x
,
x > 0,
6 > 0; 6 0 = 1,
c
a.m. Risk Nominal
EN
a
w
Chernoff
Chernoff
nominal
nominal
.25 .01
.4019
4.842
.01
.3991
4.977
.75 .01
.3661
5.119
.5
For c ~ .1, the 'no
observations' procedure is nominally
the best.
Chernoff
nominal
a.m. risk
.4503
*
*
*
.4488
.4473
.25 .001
.0621
.0810
172.49
126.30
.2345
.2073
.1719
.001
.0602
.0784
169.21
125.17
.2294
.20353
.20351
.75 .001
.0584
.0757
165.87
123.96
.2242
.1997
.1691
.25 .0001 .0310
.0374
215.50
186.56
.1388
.1306
.1125
.0001 .0301
.0361
210.95
183.47
.1356
.12786
.12784
.75 .0001 .0292
.0349
206.35
180.31
.1324
.1251
.1101
.5
.5
30
TABLE 2.3.3
P (x)
e
=
e2
xe -ex ,
x > 0,
>
EN
a
c
w
e
Chernoff
eI
= 2
Symmetric a.m. Risk Nominal
Chernoff
nominal
nominal
0;
Chernoff
nominal
a.m. Risk
.25 .1
.2349
.2726
3.403
1.049
.5752
.3775
*
.5
.1
.2109
.2544
3.282
1.114
.5392
.36580
.36564
.75 .1
.1869
.2337
3.135
1.183
.5004
.3519
.25 .01
.0235
.0274
8.811
7.922
.1116
.1067
.0905
.5
.01
.0211
.0244
8.139
7.404
.1025
.09840
.09803
.75 .01
.0187
.0213
7.439
6.846
.0931
.0898
.0824
.25 .001
.0024
.0024
14.220
14.084
.0166
.0165
.0148
.5
.0021
.0022
12.995
12.884
.0151
.01504
.01499
.00187
.00191
11. 743
11.654
.0136
.0136
.0129
.25 .0001 .00024
.00024
19.628
19.610
.0022
.0022
.0020
.0001 .00021
.00021
17.851
17.836
.0020
.001995
.001990
.75 .0001 .000187 .000187 16.047
16.035
.00179
.00179
.00172
.001
.75 .001
.5
*
31
TABLE 2.3.4
Pr(x
=
1 = {6
o<
0)
6 < 1
•7
1 - 6
EN
ex.
w
c
Chernoff
nominal
Symmetric a.m. Risk Nominal
Chernoff
nominal
.25 .1
.5
.1
0.262
.4475
.75 .1
Chernoff
nominal
*
*
*
.47364
a.m. Risk
*
.47364
*
.25 .01
.1198
.1606
25.42
13.456
.3740
.2951
.2287
.5
.01
.1181
.1584
25.23
13.480
.3704
.29318
.29318
.75 .01
.1164
.1562
25.04
13.501
.3668
.2912
.2279
.25 .001
.0120
.0132
53.02
50.35
.0650
.0635
.0567
.5
.001
.0118
.0130
52.43
49.83
.0642
.06281
.06281
.75 .001
.0116
.0128
51. 84
49.31
.0635
.06209
.0559
.25 .0001 .00120
.00122
80.61
80.22
.0093
.0092
.0086
.5
.0001 .00118
.00120
79.63
79.25
.0091
.00912
.00912
.75 .0001 .00116
.00118
78.65
78.28
.0090
.0090
.0084
32
2.4.
The General Case.
We now return to the problem of minimizing (2.1.5), viz.
We seek solutions of equations (2.2.2);
~(n,
these are
~(n, v) :::;:
aR
as:::;:
if
n:::;: 1~13'
~~:::;: LOw - kolog(nv) - kl(~ -
v) :::;:
and
v : :;: _13_
l-a'
v) :::;: 0,
1
L1(1-w) + k1log(nv) + ko(v - n) : :;: O.
a + 13 < 1
We note that if
o<
n
<
1,
for Wald's SPRT, then
(2.4.1)
O<v<l.
Also
a
:::;:
n (l-v)
l-nv
< n
(2.4.2)
13
:::;:
v(l-n)
l-nv
< v.
We apply the Newton-Raphson. method for two unknowns to solve for
and
v.
If
(no' vo)
is a first approximation to the roots of (2.2.2),
then we take as our second approximation in
section with
Z:::;: 0
n
(n, v, Z)-space
of the tangent planes to the surfaces
the interZ:::;:
~(n,
v)
So we solve
+ E'(.£!)
¢eno' vo) + E(!t)
av 0
an 0
:::;:
0
(2.4.3)
+ E' (~)
Hn o ' vo) + E(~)
av 0
an 0
for
E
and
E' ,
where
(~)
an 0
:::;:
0
is the value of ~
an
at
(no'v o)' etcc
33
n1 = no +
Then,
vI =
o
<
n1
V
< 1,
terminant
o+
and
D of 0,
:,}
gives a second approximation, provided
0 < vI < 1.
In solving (2.4.3), we need that the de-
where
D =
=
k
k
- -O+ -1
nO n~
k1
- ko +-
no
k
o+ k
- V
1
o
ko k 1
- -+-v2
Vo
0
i. e. ,
=
D
Hence
k1
>
0,
D
>
k1
- k o)(k 1
no
(-
since
0,
0 <
k o < 0 by (2.1.4);
ko
1
-)(-V
novo
1)
.
no
V
< 1
o
< 1,
0 <
o
(2.4.4)
from (2.4.1), and
thus each factor in (2.4.4) is strictly
positive.
The second approximation to the set of minimizing error probabi1ities is then derived from
n1 = no +
- 4>(n o'v o) .£!)
dV 0
- 1/J(n o' V o) ~)
dV
D
vI
=
V
o+
.£!) - cHn ' v )
o o
dn 0
~)
dn 0 - 1/J(n o'v o)
D
(2.4.5)
34
The Chernoff Approximation, [7].
Using Waldls cruder approximations to the ASN
(2.4.6)
=
we have
Then
,
oR
oa '"
k1
Low
a
a
'" L 1 (l-w) +13-
013
p
<
(a
o
-1
+ 13 )
0
•
ao
LOw
and
Unless
13 0
w or
L 1 (l-w)
a O + 13 0
replaced by
l-w
<
pa
If
l.
and
O
then
a O and
13 0
will be small.
than
!E O(Zl)
Cases where
1
w or
a* + 13*
>
cision
without taking any observations at all.
and
13 1
give
no
and
13 0
or
L1 ,
and
lEI (Zl)l;
l-w
is small
a*
and
13* with
in turn (2.4.5) and (2.4.2) give
as second approximations.
necessary to start again with
replace
1,
and the minimum risk would then be taken by making a de-
and
a1
~
where
Pl3 o '
L O or to
are of less interest, and often lead to solutions for
1,
a o + 13 0
is small, however, in mos t prac-
tical situations the cost will be small compared to
will be of the same or smaller order
(2.4.7)
kO
=
13 0
as first approximations, provided that
we start again with
O
giving
ko
oR
kl
=
by
pSo.
As before, if
a O replaced by
pa o ;
al
if
0,
<
13
1
<
it is
0,
Eventually the convergence process should com-
mence within the unit square
°
<
n
<
1,
°
<
v < 1.
35
Before continuing further, we refer to Tables 2.4, which give the
minimizing error probabilities and nominal a.m. risks for a number of
hypothesis-testing problems with
of
c
and
w.
L
o = L1 = L
The Chernoff approximations
and for differing values
a O and
entered, as are the risks (2.1.5) based upon them;
So
in (2.4.7) are
these risks turn out
to be close to the nominal a.m. risks evaluated by computer, and a discussion of this follows later.
Originally, Chernoff approximated the
minimum risk by
=
=
using (2.4.6).
Computer results show this not to be a better approx-
imation to the a.m. risk.
What we wish to make clear is that Chernoff's
approximations (2.4.7) to the minimizing error probabilities do give a
close value to the a.m. risk when entered into
R (A )
8 W
in (2.1.5).
discussion on why this is so follows later in this section.
A
For the
moment, we notice from the tables that although Chernoff's a.m. risk is
close to the computed value, the appropriate error probabilities are not
necessarily close.
(a*, S*) does give a minimum
a2R at (a*,
2
a2R -for R (A ) , we examine
and [aaas] - aa 2
S*) ;
8 W
as 2
we show that (a*, S*) gives a minimum when a* and S* are small.
To show that the stationary point
a2R
aa 2
aR
aa
=
<P(n,
.
a2R
v);
aR
as
=
1Ji(n, v).
36
Under the usual conditions of continuity, and existence of partial
derivatives,
a 2R
aa 2
+ 1.1~
av aa
1.112l
an aa
=
k1 k o
(2 --)
>
0
1
l-S
+
(k
for small
a
and
n
n
1
-
ko
S )
(1-a)2
-)(-
v
S
(and so for small
n
v) since the first term dominates.
and
In a similar manner,
=
-
k1 kO
a
( - - - ) +-L (k
1
I-a
(1-S)2 n 2
n
a 2R
=
as 2
-
k1
a
((l-S) 2 n
a2 R
aaaS
-
kO)
-
ko
-)
v
k1 k
o
+_1_ ( --)
I-a v
v2
Hence
k v - k
_1_+
l-S
1
0]2
vel-a)
k1v - k o
k 1 - kon
k1v - k o
I-a ] [1-{3
+ v 2 (I-a) ]
=
u + -v] 2
[--
n
\)
say, where
k \) - k
u
=
v
=
1
I-a
o.
'
u > 0,
v
>
o.
37
When
n
and
(hence
v
1 o
2
n v2
So, if
(a*, 13*).
<
13) are small, this quantity is
2 -1
and
(nv)
k 2
1 . 1
. -l-a
1-13
since each term
and
(n 2 v 2 )-1, (n 2 v) -1
dominated by terms in
k k
a
k 2
• _1_ • 1
2
1-a
1-13
nv
0
-
1 . 1
. -1-a'
1-13
1
2
n v
-
Le., by
,
RO(A
)
W
is approximately minimized at
a* + 13*
But, as mentioned earlier,
>
1
implies that a de-
cision should be made without taking any observations.
It should be
noted that the Wa1d approximations are only good when
(a*, 13*)
0
o.
a* + 13* < 1,
small, and when
<
E(N)
a
and
13
are
is large, so that an investigation of whether
is a minimizing point when
a*
and
13*
are not small is not
of very much interest.
The tables indicate that for
greatest around
1
w=2'
Lo
= L1 ,
the nominal a.m. risk is
which is reasonable since we have the least
prior information about
e
when w
ated solution is greater than
=
1
"2.
We remark that if the iter-
min(Low, L 1 (l-w)),
then the best rule is
a decision with no observations.
Closeness of Chernoff a.m. Risk to Nominal a.m. Risk
RO*(A
):
W
The following argument is intended to account for the closeness of
R
oo (Aw)
(2.4.7).
to
RO*(AW )
Since
that the surface,
,
when
(a*, 13*)
R
0
and
= g(a,13)
We examine the behavior of
0
is the SPRT defined by
(a o' 13 0)
ao
and
13 0
need not be close, it appears
say, is somewhat flat near the minimum.
~~ when 13
in
is fixed.
38
where
f(S)
Let
a O =a* + 0,
probability.
where
a O is the Chernoff minimizing error
Then
+ 0 + k (. S*
l-S*)
( *) - kolog a*
l-a*-o
1 l-a*-o - a*+o
=
f S
o
0
l-a*
f(S*) - k o [log(l + a*) - log(l - l-a*) - log~]
1
k
1
l-S*.
1
a*
1 +
..9a*
o
0
S*
- k o [log(l + a*) - log(l - l-a*)] + k 1 l-a*(
k
1
l-S* (
Ci.*
dRI
0
0
1 - l-a*
1)
- 1)
1 + a*
since
da
1
1
o.
(a*,S*)
For small error probabilities, the dominating term is the lasto
assume
ko
cases, or if
Chernoff
and
c
k1
<
.01
to be small, which is so if
in many cases.
a O is close to
a*), then
If
I:* I
dRI
da
c
~
.001
We
in most
is small (i. eo, if the
is going to be small;
(aO'S*)
39
or if
is small (and this is the important criterion in cases where
may not be small), then
1;*1
aRI
aa (a ,(3*)
will be small.
o
The same kind of argument holds about any point
aR
aa
which
is small; i.e., it can be applied to
Similarly, if
small, where
a
(30 = (3* +
E,
or if
for
a'=a'+8'.
o
aR\
a(3 (a*,(30)
is held fixed,
(a', (3')
is small i f
is
is small.
These results do not prove the flatness of
R (A )
8 W
near
(a*, (3*),
even if all the quantities involved are small, but they are indicative
of reasons why this flatness exists.
creases (and
a*, (3*
The tables show that as
c
de-
decrease), the percentage saving in using the it-
erative method, rather than the Chernoff approximation, becomes negligible.
Tables 2.4.1 to 2.4.7 show values of the Chernoff a.m. risk and
nominal a.m. risk for a number of cases, including the univariate normal of Section 2.2.
2.5.
They are given on pp 43-49.
Comparison of a.m. risks:
w =
1
2'
The tables also indicate that for
Lo
= L 1 .= 1
and
1
w=2'
the
nominal a.n. risk in the class of synunetric tests is very close to the
nominal a.m. risk in the class of general tests,
We write
R
min
the a.m. risk in the latter class, and
R
sym
drop the star (*) notation.
(a, (3) minimize R (A )
8 W
eral tests, and
min(a,(3)
~
a'
~
a'
Thus, let
for synunetric tests,
max(a,B),
and indeed
a'
a
+ (3
y
=
2
for the former.
for
We also
for gen-
In all cases in the tables,
seems to be very close to
40
Let
R"
ability is
be the risk of the symmetric test in which the error prob-
R" - Rmin'
Consider
y.
and
R" - R
•
sym
The first of
these appears from computed results to be much larger numerically than
the second.
Consider first
R" - R . •
mJ.n
The following applies to any
(0'., (3)
satisfying the conditions below, and not only to the minimizing values.
From (2.1.5) and (2.3.1),
R" - R .
mJ.n
where
=
-
[1 - (0'.+(3) ]log
2-(0'.+(3)
(0'.+(3)
(l-a)log _(3__ aJDg 1-(3
1-0'.
.0'.
(2.5.1)
=
1
Hence, if
(i)
w =
2'
(2.5. 2)
R" - R .
mJ.n
and there is no contribution from the loss term.
a substantial difference; for example, if
w
=
This factor could make
.25
and
L O = L1
= L,
R" - RmJ.n
.
The first term
(R" - Rmin)/Rmin
this term when
w
=
1
w
1
-y L
2
could be large enough to make the ratio
substantially greater than zero.
are close (when
~ may explain why R" and R.
mJ.n
2")' but not necessarily close when w
from
The disappearance of
+2.1
A further reason arises
41
ii). Examining the terms in
E = ~(s - a).
so that
consider
=
k o and
y
a
=y -
and
E,
=Y +
S
l-v
( 1- 2y )log -=---.L..
(Y+E) [log - L + leg
l -y
1
-
Y
y
E
y are small, and
1
l+~
Y
+
_E_
l-y
E
---
l-y) ]
E
1
y
2dog l-y _ (1-2y)
E
_ 1:. ~ (1 _ -L)2 + O(E:).
y
y(l-y)
2 y2
l-y
y
Since e:logy
as
0
-+-
~
y -+- 0 ,
1£1
y
is small, then
J
1
,
it follows that whenever
a
=
la
and similarly
J '
R" - R.
m1n
(2.5.3)
-s I
+ (3
are small; and hence for
o
w= 2
that
(iii)
An examination of the numerical results showed that the condi-
is small.
tions for (ii) to hold were often, but not always, present.
rises to as much as
tity
.431.
In such cases
are small, (2.5.2) shows that
whenever the cost
c
Next, consider
Writing
=
m1n
will be small.
J1
ko
are
and
This is so
is small enough.
R" - R
sym
R" = f(y),
R"
R" - R.
The quan-
J O and
not small, but neither will they be large, and hence whenever
(iv)
E,
J 1.
- (1 - y - E) [log -y + log (
1
let
Suppose further that
1
=
k1,
and
•
y = a'
R
+ E'f'(a')
sym
+ E', we get by Taylor expansion
+1:.2
E'2f"(a) + ••• .
42
But from (2.3.2),
f'(a')
=
~(a')
= 0,
and with (2.2.10),
(2.5.4)
R" - R
sym
This is small i f
and
most of the numerical cases investigated,
1
is so close to 2(a+S)
,
E'
E'
is small.
In
is very small, since
a'
are small, or if
a'
and (2.5.4) is negligible.
The above discussion throws some light on the remarkable agreement
in computed results between
R.
m~n
and
when
R
sym
w =
1
2' The prob-
lem of whether such agreement holds under wider conditions might repay
further investigation.
Tables 2.4.1 - 2.4.7 follow.
procedure is best.
L O = L1
inal values are both shown;
=1
the
* means
that the 'no observations'
in all cases.
% Saving
The Chernoff and
is also shown.
nom-
This is
given by
Percentage Saving
=
Chernoff a.m. risk - nominal a.m. risk x 100 •
nominal a.m. risk
e
e
e
TABLE 2.4.1
Nominal a.m. Risk
P6(X)
=
1
-- e
-~(x-6)2
;
6
ili
For
c
~
0.005,
1
-
6
0
=
E1 (z)
0.1
=
- E (Z)
a
EN
S
c
.1
.001
.2
.001
.3
.001
.4667
.7298
.0857
.0376
103.82
46.737
.4
.001
.3000
.4270
.1333
.1242
151.20
98.302
.5
.001
.2000
.2453
.2000
.2453
166.36
.1
.0005
.9000
.2
.0005
.4000
.6125
.0250
.0151
.3
.0005
.2333
.3470
.0429
.4
.0005
.1500
.2142
.5
.0005
.1000
.1
.0001
.2
Nom.
.005
the 'no observations' procedure is best.
w
Cher.
=
o
Cher.
Nom •
Cher.
Nom.
a.m. Risk
Cher.
Nom.
*
*
*
% Sav.
*
*
.2920
2.74
.3512
.3436
2.22
114.52
.3664
.3598
1.82
188.30
105.75
.1942
.1874
3.59
.0435
284.75
204.32
.2424
.2367
2.40
.0667
.0814
335.54
255.89
.2678
.2625
2.02
.1345
.1000
.1345
351. 56
272 .10
.2758
.2706
1.84
.1800
.2112
.0022
.0021
395.99
363.99
.0596
.0594
.37
.0001
.0800
.0936
.0050
.0053
580.97
549.19
.0781
.0779
.26
.3
.0001
.0467
.0543
.0086
.0095
679.5
647.8
.0880
.0878
.23
.4
.0001
.0300
.0347
.0133
.0151
731. 0
699.3
.0931
.0929
.20
.5
.0001
.0200
.0230
.0200
.0230
747.2
715.5
.0947
.0945
.20
.0111
15.070
*
*
.po
w
e
e
e
TABLE 2.4.2
Pe(X)
=
~ e-~(X-e)2.
ili
'
e1
a
-
eO
= 0.25
E1 (Z)
= - Eo(Z)
EN
13
.03125
a.m. Risk
Cher.
Nom .
w
c
.1
. 01
.2
.01
.3
.01
.7467
.4
.01
.4800
.6278
.2133
.1177
6.707
5.912
.3871
.3809
.5
.01
.3200
.3218
.3200
.3218
8.684
8.507
.4068
.4068
.005
.1
.001
.2880
.3609
.0036
.0029
46.01
38.63
.0780
.0773
.93
.2
.001
.1280
.1595
.0080
.0085
75.55
68.26
.1075
.1069
.57
.3
.001
.0747
.0924
.0137
.0156
91. 29
84.03
.1233
.1227
.48
.4
.001
.0480
.0588
.0213
.0252
99.53
92.28
.1315
.1310
.44
.5
.001
.0320
.0387
.0320
.0387
102.12
94.88
.1341
.1335
.3
.1
.0001
.0288
.0299
.00036
.00036
126.30
125.18
.0158
.0158
0
.2
.0001
.0128
.0133
.00080
.00082
155.93
154.81
.0188
.0188
0
.3
.0001
.0075
.0077
.00137
.00141
171. 70
170.58
.0204
.0204
0
.4
.0001
.0048
.0050
.0021
.0022
179.95
178.83
.0212
.0212
0
.5
.0001
.0032
.0033
.0032
.0033
182.55
181.43
.0215
.0215
0
Cher.
Nom.
Cher.
Nom.
.1371
Cher.
Nom.
*
*
*
1.366
% Save
*
*
*
1.63
..j:>..j:>-
e
e
e
TABLE 2.4.3
Pe(X)
w
c
-l- e -;Hx-e) 2 •
l2TI
'
61 -
6
0
=
0.5
E1(Z)
7
Eo(Z)
.125
EN
a.m. Risk
B
Chernoff nominal Chernoff nominal Chernoff nominal Chernoff nominal % saving
().
.25 .1
.5
=
.1
.4231
.4231
.381
*
*
*
.4613
.25 .01
.2400
.3491
.0267
.0259
12.060
8.975
.2006
.1965
2.10
.5
.oaoo
.1067
.0800
.1067
16.413
13.369
.2441
.2404
1.54
.25 .• 001
.0240
.0259
.0027
.0028
33.55
32.95
.0416
.0415
.05
.5
.0080
.0086
.0080
.0086
37.95
37.34
.0460
.0459
.04
.0024
.0024
.00027
.00027
52.56
52.47
.00606
.00606
0
.00081
.00080
.00081
56.95
56.86
.00650
.00650
0
.01
.001
.25 .0001
.5
.0001 ..• 00080
~
\Jl
e
e
e
TABLE 2.4.4
Pe(X)
_
-
1
-~
l:2TIe
1
exp(- ~-
2e 2
eo
2
X );
=
1,
e1
=
2;
Eo(Z)
E1 (Z)
- .31815,
.80685
.
Chernoff nominal Chernoff
EN
I
nominal Chernoff nominal
.25 .1
.3718
.9672
.1048
.0041
1.184
.5
.1
.1239
.2305
.3143
.2879
1.605
l.103
.3797
.75 .1
.0413
.25 .01
.0372
.0481
.0105
.0110
6.299
5.958
.0801
.0798
.38
.5
001
.0124
.0156
.0314
.0341
7.815
70498
.1001
.0999
.21
.75 .01
.0041
.0048
.0943
.1034
6.925
6.631
.0959
.0958
.14
.25 0001
.00372
.0039
.00105
.00106
10.539
10.493
.0123
.0123
.08
.5
.001
.00124
.00129
.0031
.0032
13.152
13.106
.0153
.0153
0
.75 .001
.00041
.00043
.0094
.0096
13.357
13.311
.0160
.0160
0
.25 .0001 .00037
.00037
.000105
.000105 14.534
14.536
.0016
.0016
0
w
B
CI.
c
a.m. Risk
Chernoff nominal % saving
*
.9430
*
.2499
.3695
2.76
*
.0001 .000124
0000125 000031
.00031
18.243
18.240
.0020
.0020
0
.75 .0001 .000041
.000041 .00094
.00094
19.543
19.539
.0022
.0022
0
.5
1
.j::-o
'"
e
e
e
TABLE 2.4.5
Pe(X)
8e- ex ,
=
x > 0,
For
w
c
81
8 > 0;
~
.01,
8
=
-
E1 (z)
.01768,
=
.01565
the 'no observations' procedure is best.
EN
S
C't
c
E (Z)
a
= 1.2
0
a.m. Risk
Ch.ernoff nominal Chernoff nominal Chernoff nominal Chernoff nominal % saving
.25 .005
.5
.005
*
.3195
.3159
.2828
.3092
20.187
17.795 .4021
.75 .005
*
*
.4015
*
.25 .001
.1917
.2586
.0189
.0203
111.91
91.92
.1740
.1718
.13
.001
.0639
.0814
.0566
.0753
145.51
125.18
.2057
.2035
1.10
.75 .001
.0213
.0223
.1697
.2405
113.28
92.27
.1717
.1691
1.52
.25 .0005 .0958
.1163
.0094
.0105
163.99
151.11
.1130
.1125
.44
.0005 .0319
.0378
.0283
.0344
196.47
183.46
.1284
.1278
.75 .0005 .0107
.0117
.0848
.1060
162.85
149.63
.1106
.1101
.5
.5
.p..
.......
e
e
e
TABLE 2.4.6
P6(X)
6 2xe- 6x ,
=
w
x > 0,
6 > 0;
a
c
1,
61
6
2
2
EN
S
- .6137,
EO(Z)
E1 (Z)
=
.3863
a.m. Risk
Chernoff nominal Chernoff nominal Chernoff nominal Chernoff nominal % saving
.25 .1
.5
.1
*
*
.2589
.2689
.1629
.2389
1.619
1.118
.3728
.25 .01
.0777
.0878
.0054
.0061
6.718
6.396
.0907
.0905
.18
.5
.01
.0259
.0289
.0163
.0196
7.715
7.381
.0982
.0980
.21
.75 .01
.0086
.0092
.0489
.0599
6.396
6.049
.0827
.0824
.32
.25 .001
.0078
.0079
.00054
.00056
12.438
12.389
.0148
.0148
0
.5
.0026
.0026
.00163
.00168
12.884
12.834
.0150
.0150
0
.00086
.00088
.0049
.0050
11.012
10.963
.0129
.0129
0
.25 .0001 .00078
.00078
.000054
.000054 17.895
17.890
.00202
.00202
0
.0001 .00026
.00024
.000163
.00016
17.789
17.870
.00199
.00199
0
.00049
15.365
15.359
.00172
.00172
0
.75 .1
.001
.75 .001
.5
*
.75 .0001 .000086
.000087 .00049
.3656
1.96
*
.j::-
00
e
e
e
TABLE 2.4.7
Pr(x
1
= {O)
6
c
w
6
{1 - 6'
0
05,
6
1
=
Eo(Z)
.7;
= -
E1 (Z)
.08228
EN
a.m. Risk
S
% saving
Chernoff nominal Chernoff nominal Chernoff nominal Chernoff nominal
Ci.
.25 .1
.5
.08718,
.1
.4480
.4470
.262
.75 .1
*
*
*
.4736
*
*
.25 .01
.3646
.5544
.0382
.0277
110631
6.936
.2361
.2287
3.24
.5
.01
.1215
.1603
.1147
.1564
180136
13.480
.2995
.2932
2.15
.75 .01
00405
.0290
.3441
.5426
11. 971
7.048
.2361
.2279
3.62
.25 .001
.0365
.0403
00038
.0041
44.77
43.55
.0568
.0567
.09
.5
00122
00133
.0115
.0126
51.05
49.83
.0629
.0628
.08
.75 .001
.00405
.0043
.0344
.0383
44.35
43.12
.0560
.0559
.09
.25 .0001
000365
.0037
.00038
.00039
73.54
73.36
.00855
.00855
.5
.00122
.00123
.00115
.00116
79.42
79.24
.00912
.00912
.000). .• 00042
.00041
.0034
.0035
72.33
72.15
.00840
.00840
.1. 75
.001
.0001
"
lJ
.po
\0
CHAPTER III
THE PARTIAL SPRT PROCEDURE
3.1.
Definition, and Application to the Exponential Family.
In Chapter II, the problem of minimizing the risk for Wald SPRTs
was considered.
We shall be concerned with the problem of minimum risk
when the cost is unknown in Chapters IV and V.
The procedure to be dis-
cussed in Chapter V, when no estimate of the cost is available, is a
variant of the SPRT, in which
n
observations are taken initially, and
a SPRT procedure is introduced at stage
n.
This will be defined pres-
ently, and will be called the Partial Sequential Probability Ratio Test,
or
PSPRT.
Although no discussion of such a procedure has appeared in the lit-
erature, Billard [5] has recently derived properties of an analogous
type of test procedure for two-sided alternative hypotheses, deriving
the OC function and ASN for testing a normal mean with known variance,
and a binomial variable.
In this chapter, the properties of the PSPRT will be derived, and
compared with other procedures.
In this section, general formulae for
the O.C. function and the ASN will be derived when the r.vo under consideration belongs to the exponential family.
In Section 3.2., these
51
formulae will be further developed when the r.v.
X has a Gamma distri-
bution; in Sections 3.3. and 3.4.,
N(8, 1)
X will be a
It can be noted at this stage that when
Wa1d SPRT.
= 1,
n
variable.
the PSPRT is a
The results of Sections 3.3. and 3.4. will be compared in
this case numerically with exact results and approximations of Wa1d and
Kemp [13], in Section 3.5.
A comparison with Anderson's modified SPRT
[1] is also made.
Definition of PSPRT:
Xl' X2 , ••• ,
is a sequence of independently and identically dis-
tributed random variables indexed by a parameter
hypothesis
bers
n
H O:
8
=
8 0 vs
an alternative
8.
8
HI:
We want to test a
=
81 •
A fixed num-
of observations is taken, and further observations are taken
according to whether
B
<
Pln'
--
<
A,
n'
~
n
Pan'
or not, where
Pin'
= joint
likelihood of the first
under
Hi' i
= 0,
n
observations
1.
This is a GSPRT, and may be relevant to the problem of unknown cost
considered earlier, since the information from the first
n
observa-
tions may be used to estimate the cost (or mean cost), and hence the
values of
Let
A and
T(x)
"""'""11
n
"""'""11
=
chosen to minimize the risk.
be the conditional test, given
the close of stage
B' (x )
B
n,
x,
"""'""11
and starting at
in which the continuation region is
=
A' (x ); n'
n
"""'""11
~
n.
(3.1.1)
52
If the inequality
o
< B' < 1 < A'
n
n
(3.1.2)
does not hold, then we decide at stage
if
n.
A~ <
If
1,
B~ >
1,
Let
b
10gB,
Let
Tn
be the unconditional test procedure with
HI
is accepted;
Ha is accepted.
a = logA,
vance, having DC function
b' = 10gB'
n
n'
L (6), ASN
n
a'
n
=
logA'
etc.
n'
n
fixed in ad-
En (NI6), and error probabilities
sn .
L (6), E (Nle),
x
x
--n
-n
quantities for test T(x),
If
a(x)
--n
= EL x
n
S(x)
--n
are the equivalent
then
-n
L (e)
and
(e)
--n
En(Nle)
(3.1. 3)
EEx (Nle),
=
-n
taking expectations over the joint distribution of
x •
-n
Exponential Family.
We confine attention to the case where
Pe(x)
_
If
then· s-
n
s
A(x)exp[e sex) - ,(e)].
=
1 n
= _. ~
n
n i=l
(3.1. 4)
s(x.),
(3.1. 5)
1.
is a sufficient statistic for
e
up to
n
observations, and
hence expectations in (3.1.4) may be taken over the distribution of
-
s •
n
We have
PIn
log - Pan
=
n(~s
n
- T),
say.
(3.1.6)
53
Dropping the subscript
n
on
s- ,
servations are continued beyond stage
b
we have from (3.1.2) that ob-
n
n
if and only if
n(~s ~ T)
<
a
<
or
btl
~+T
n
where we assume
la+T
.;:.:n,--~~_ <
61
>
60
s < ..::;n,--~_ _
(3.1. 7)
n
0),
~ >
(i.e.,
a"
=
and will retain this assump-
tion for the rest of the chapter.
a.
C. Function.
a.c.
Using Wa1d's approximation, the
1
L
x
--n
(6)
'"
s
if
if
b"
n
n
s
--n
is
n
n
if
r(x )
< btl
1 - exp {h ( e) a' }
n
exp{h(6)b'} - exp{h(6)a'}
a
function in test
>
<
s
<
a"
n
(3.1.8)
a"
n
where
I
=
1,
if
S
-
< b II
E[{ P 1 (x) }h ( 6) 6]
Po (x)
h :f O.
So
1
L
x
--n
(6)
e
hn(~s - T)
hb
e
0
-
e
e
ha
ha
if
if
n
b" < s- < a"
n
n
s-
>
a"
n
(3.1.8a)
54
Hence
L (e)
=
n
and if
E(L
x
(e»,
-n
s-
has c.d.f.
F,
a"
=
=
F (b")
e
n
+ hb 1 ha
- e
e
Jn
{ hn(4s - T) _ ha}
e
dF e(s)
e
b"
n
ehbF e(b~) - ehaF (a")
hb
ha e n
e
- e
+
--:-hb,........;1~..,....h-a
e
Jail
n e
b"
e
hn(4s - T)
dFe(S).
n
(3.1. 9)
A. S. N.
ge
Denote by
the quantity
i. e. ,
ge
=
4E(s(x)
Ie) -
T.
(3.1.10)
Then
n
s
if
< b"
n
Ex (N I e)
-n
n
+ JL {L
ge
~
or
a"
n
>
(e)b' + (1 - L
n
~
(e»a'}
otherwise.
n
(3.1.11)
55
Hence
E (N I e)
n
=
n
Ja~
e hn(t:.s - T) - e ha ]
a'
(a'
b')
hb
ha
dFe(S)
[ n
n
n
e
- e
btl
+ 1ge
n
=
Ja~
[a - n(~s btl
n + 1-
ge
T) -
(a -
b)
e hn(~s-T) - e ha
e
hb
- e
n
ha
J
-
dFe(S)
(3.l.12)
=
n~
g
a
n" s dFe(S)
J
e b"
n
Ja"
a~-__b~-- n ehn(~S-T)dFe(S).
( hb
ha)
ge e
- e
b"
n
(3.1.12a)
We shall consider two particular cases for study; those in which
x
~
3.2.
N(e, 1)
and
X is a Gamma variable with
m degrees of freedom.
Testing a Gamma Parameter.
Here
1
m m-1 -ex
e x
e
r(m)
Pe (x)
=
Pe(~)
=
K(x )e-n(ex - m10g e)
s-
=
- x-
T
=
- m 10g(e 1 /e O)
x > 0;
e >
o.
(3.2.1)
And
-n
So
and
(3.2.2)
56
x 'has
a Gamma distribution where
xP8 (x)
1
mn mn-I -n8x
r(mn) (n8)
x
e
=
Representing the distribution of
-x
by
x >
o.
G,
=
F 8 (t)
(3.2.3)
(3.2.4)
Hence (3.1.9) gives
ehbG8(-b~) - ehaG8(-a~)
L (8)
n
=
1 -
----.;=--~hb,.---""""h-a......;.,-~
e
+
-h~b~l"--":"'"h-a J-b~ r (~n)
e
=
- e
- e
(n8) nm ynm-1 e hnm log (~)
-a"
n
hb
e (G (-b") - ehaG (-a")
8
e n+
1
(l)hnm
8
)nm
8
n
1 hb
ha
hb ha
(8+h(8 -8 )
e
- e
e -e
0
1 0
e
C~
1
r (nm)
[n(8 + (8 -8 )h]nm ynm-1 e-n (8 + h(81-80»
1 0
dy.
n
If we use the notation for the Incomplete Gamma Function ratio
P (a, x)
1
= r (a)
JX0
t a - 1 e- t dt,
(3.2.5)
57
then this reduces to
L (6) '" 1 -
n
hb
e
1
hb
6
81
ha
8
61
ha [e P(mn,~(-b+mnlo~» - e P(ron,~(-a+mnlo~»]
a
-e
a
6
1
[(_l)hnm ( 6 )nm {PC
6+h~ (-b + mn 1
hb ha
8
6+h~
mn, ~
og
0
e -e
+
8+h~
~
- peron,
(-a + mn log
6
ea1 »
81
e»}]
(3.2.6)
a
or
L (6)
1 -
n
hb
e
+
e
hb
1
-e
1
hb
ha
ha [e
P(mn, - n8b~) - e
P(mn, - n6a~)]
- e
8 1 hnm
8 nm{
ha (-6)
(8+h~)
P(mn, - n(6+h~)b")- P(mn,
- n(6+h~)a")}
a n n
(3.2.7)
where
a"
n
bIt
n
For any value of
evaluated.
8,
=
if
To determine
- m
h( 6)
h,
L (8)
n
PI (x)
log
() =
Po x
is known,
since
can be approximate ly
-~x
8
+ mlog - 1
8
0
i. e. ,
(3.2.8)
58
m = 1,
For the case
this gives
h
as the non-zero solution of
(3.2.9)
The solution of (3.2.9) will, by Wa1d's Lemma [18], satisfy (3.2.8),
so that for distributions of the form (3.2.1),
m,
h(8)
is independent of
and given by (3.2.9).
The A.S.N.
Using the notation of (3.1.10), with one obvious modification,
PI (x)
E(log
() 18)
Po x
8
=
E(- l:Ix) +m log (_1)
=
8
l:I
- - m + m log (..1:..)
8
80
80
(3.2.10)
From (3.1.12), (3.2.3) and (3.2.4), we get
En (N 18)
J:....
g8
n +
.
+
.
8
hb b ha
[-nm log -81 + a\b - ~a ] {G
(-b") - G
(-a")}
8,mn
n
8,mn
n
o
e-e
_bl'
~l:I J.
8
n
1
r(mn)
• (n8)mn xmn e- n8x dx
-aII
n
(a - b)
-~h:-:b--7h-
g (e
8
-e a)
J-b~ ehnm10g(~-)
1
0 r (mn)
-a"
n
(n8) nmxmn-1 e -n(hl:l+8) x dx
59
i. e. ,
1
({-nm 1
n + ---.;;;---
E (N 18)
n
m(loJl80
~)
8
8
ha
-be HG
(-b")
hb ha
8,mn
n
1 + ae
eo
og
hb
e-e
- G
(-a")} + nt. • m {G
+l(-b") - G
+l(-a")}
8 ,mn
n
n
n
8 ,mn
8 ,ron
8
8
a - b
e
hb
-e
ha
We can relate
e
-n8y
(8~)hnm
to
G
8,mn+l
dy
=
1
{G8+ht.,mn(-b~) - G8+ht.,mn(-a~)}).
(8:t.h)nm
G
8,mn
mn
- n8. Y
e
•
-n8y
For
+
1
n6
f
ron .y
mn-l
e
-nay
dy.
Hence
G
(b")
G
( a")
8 ,mn+ 1 - n 8 ,mn+1 - n
(n8) mn+l [ ( _ a II ) mn e n8a"n
1
---.,._-,....
r (mn+l)
8
n
=
- (_b")mn e
n
-
n8b"
n]
+ {G
8,mn
(-b")
n
(-a")}.
8,mn· n
G
So
E (N 18) ~ n +
n
+
A
U
(n8)mn+l
n8a"
n8b"
2 [(_a")mn e
n _ (_b")mn e
n]
r(mn) 8
n
n
1
~t.
m(lol;;n-:- - -)
80
8
8
hb
ha
1
({
1
nt.m
ae -be HG
(-b")
-mnl08'8 + -8- +
hb ha
8,mn
n
81
t.
o
e -e
m(log-;;- - -)
80
8
- G8 ,mn (-a")}
n
a - b
hb
e
ha
-e
(3.2.11)
60
or
E (NI e)
n
n{l - G
(-b")
e,mn
n
""
1
+
e
tJ.
m(102 - -)
eO
+
G
e,mn
(-a")}
n
A
(ne)mn
nea"
(.::.!!. •
{(_a")mn e
n
e
r (mn)
n
(_b,,)mn e
neb"
n}
n
e
hb b ha
+ae - e
{G
(-b")-G
(-a")}
hb ha
e,mn
n
e,mn
n
e -e
i.e.,
E (N!e) "" n{l- P(mn, - neb") + P(mn, - nea")}
n
n
+
n
A
(ne)mn
nea"
1
(.::.!!. •
{(_a")mn e
n
e
tJ.
r(mn)
n
e
m(lo2 - -)
eo
e
hb b ha
+ ae - ~a {P(mn, - neb~) - P(rnn, - nea~)}
hb
e -e
a - b e l hmn
e
mn
hb ha (""6)
(e+tJ.h) {P(mn, - n(e+tJ.h)b~)
e -e
0
- P(mn, - n(e+tJ.h)a")}).
n
(3.2.12)
61
3.3.
Testing a Normal Mean with Known Variance:
the O.C. Function.
Suppose without loss of generality that
Then in (3.1.4),
1
---- e
=
P8 (x)
-~(x-8)2
(3.3.1)
•
&
s(x) = x
and
n
= x-
,(8) = ~82;
T = ~(8t - 8t).
Hence
s-
a"
n
=
=
1
n
I:
i=l
x.1-
(3.3.2)
a + k(
nt.
"2 8 1 + 8 0 )
(3.3.3)
and
In the case of the present problem,
(3.3.4)
h(8)
So for the O.C. function, (3.1.8a) gives for the conditional test
T(x),
-n
Hence
1
L
x
-n
( 8)
~
exp [hnt.{(x-8) - ~ht.}] - e
hb
ha
e
- e
0
-
if
X
<
b II
n
if
b"
n
<
x
if
x
>
a"
n
ha
<
a".
n
(3.3.5)
62
So for the PSPRT
T,
if
n
~
is the c.d.f. of a
N(O,l)
variable,
(3.1. 9) gives
hb
ha
Ln(e) ~ h~ ha ~(In(b~-e)) - h~ ha ~(In(a~-e))
e -e
e -e
+ hb
e
and since
1
-e
ha
J
a~~
n
2TI exp[hn~(x
- e -
~h~) -
n 2
I(x-e)
]dx
(3.3.6)
b"
n
hnA(x-e-~n~h) - ~n(x-e)2
= -~n(x-e+h~)2,
a"-e+h~
=
Jn
b"-e+h~
the last integral
~ _~y2
2TI
e
d
y
n
where
y
= x-
- e +
h~
•
Hence, since
In(b" - e)
n
In(b" - e + h~)
n
and similarly for terms in
L (e)
n
~
e
e
hb
hb
-e
ha
a"
n'
~(--E- + ~Inh~)
In~
=
=
(3.3.6) gives
ha
e
~(--1!.... + ~/nh~)
hb ha
In~
e -e
(3.3.7)
63
An alternative form is
L (e)
n
- ep(l_ .:1'vn(e +e -2e))}.
1 o
IriIl
(3.3.8)
We introduce the following notation, to be used here and in
Chapter V:
ep+ (a, e)
=
ep (~ + .:1'vnhll)
.JD.ll
(3.3.9)
ep (a, e)
-
=
ep(~ - .:1'lUhll)
.JD.ll
The derivatives will be required later;
=
if
ep'(x) =
~(x),
write
ep~(a,e)
(3.3.10)
and similarly for
fined.
~_(a,e);
The dependence of
ep
also expressions in
and
~
on
nand
b
II
are similarly deis to be under-
stood.
Then concisely, (3.3.7) gives
L (e) '"
n
(3.3.11)
64
n + 0,
We note that as
stochastically; i.e., as a continuous
real number,
1 _ e ha
L (6)
n
e
hb
ha'
-e
the approximate D.C. function of a Wald SPRT.
L (6)
+
n
1 or 0
Further, as
n
+
00,
according as
Error Probabilities.
Let
and
an
and
L(6 1 ).
PSPRT,
at
Sn
be the nominal error probabilities
Then, writing in terms of the boundaries
6 = 60 ,
a
h(6) = 1,
and at
6 = 61 ,
h
=
1 - L(6 0 )
A and
-1.
B
of the
So
n
=
1
+ ~ ~{.-E- +
A-B
full
lUll} _ ~
2
A-B
~{-2..- +
full
full}
2
(3.3.12)
65
If we use the notation
(3.3.13)
and
<p_ (a)
<P(...2- - ~;;L'l)
fuL'l
=
with similar notation for expressions in
b,
and for
¢+(a) ,
¢_(a),
etc., then
(3.3.14)
We next investigate some properties of the error probabilities.
For a Wa1d SPRT,
a + B
<
1 whenever
0
<
B
1
<
<
A.
The same holds
for the PSPRT considered in this section.
LEMMA 3.1.
an + B
n
<
1
whenever
0
<
B
<
1
<
A.
Proof:
<
1 + AB - A + 1 - B
A - B
<p ( )
+ a
since
i. e. ,
<
<
since
A- 1
<
1,
A- B
and
B < 1;
and
+ B - AB + A - 1 <P+(b)
A - B
66
It is easy to show that the conditional and nominal conditional
error probabilities
S'
x
-n
a'
x
-n
a' + S'
x
x
-n
-n
and
ax' Sx
-n -n
satisfy
(3.3.15)
<
But one cannot deduce that
a' + S' < a + Sn'
n
n
n
since integration
is performed w.r.t. two different measures.
Behavior of Error Probabilities.
THEOREM 3.2.
For given boundaries
a <~~2,
increases, whenever
Otherwise
Sn
decreases as
n
a
write
B,
increases as
n
n
n
decreases as
and
Ibl < ~n~2,
increases.
Q
"'n
instead of
n,
with
t > O.
From Eqs. (3.3.12),
Clat
at
=
or whenever
We consider the procedure as a continuous time process, and
t
~
1
B [ -b
+-] exp[A-B 2t 3/ 2 ~
4t~ I2TI
1 [ b
A-B 2t V2
~
~
1
4t~
l21T
--]
A [ -a
+ ~J)
A-B 2t 3/ 2 ~
4e
--l:....
ili
b2
t~2
t~2
~(-+-+
exp[-
b2
~(-t~2
4
b)]
t~2
+ - - - b)]
4
2
~2
exp[- ~(.1!- + _t_ + a)]
t~2
4
~
1
1
-a
exp[+ A-B [ 2t¥2 ~ - ~]
2
4t
I2TI
~2
2 + _t__
a) ]
t~2
4
k(~
2
n
increases.
increases, whenever
Otherwise
a
a > ~~2
or whenever
and
Proof:
A and
67
and since
e
-~b
= -
1
etc., we get
IE
=
lB
D.
1
• 2Yt
I2TI
A-B
IA
-A-B
-
. -D. .
A-B
D.
. -.
<
0
if
1
-
l2TI
21t
=
1
b2
tD.
exp[-~(-2
• -
1
21t
I2TI
exp[-
~(--
2
tD.
+)]
4
a2
tD. 2
2
tD.
+)]
4
exp[-
~(-
exp [_
~(....L _ ltD.) 2 ]
ltD.
2
b
ltD.) 2 ]
- -
ltD.
2
<
exp[-
exp[-
a - ltD.) 2 ]
ltD.
2
~(-
(~ _ 1tD.)2
i.e.
ltD.
2
i.e.
since
b < 0,
aa.
i. e.
at
t
t > 0,
<
0
-b
if
If
a < ~ tD. 2 ,
this gives
If
a > ~ tD. 2 ,
i t gives
a
+
+ tD.
2
2
a > b,
b
>
Ja - t f
a - ltD.) 2 )}
ltD.
2
~(-
I·
which is so.
68
Similarly
=
A
b
A-B
2t¥2 t:,.
_. { -
t:,.
4rt
b2
1
- - } - exp[- ~(--2
ili
tt:,.
+ -tt:,.2 - b)]
B
-a
t:,.
1
a2
{
- --} exp [- ~(-2
A-B 2t3/2 t:,.
4rt fu
tt:,.
- -
AB
+A B {
-
2t
-a
$12
1
t:,.
+ - } - - exp [-
41t ili
t:,.
AB
-b
- A B { 3i2
2t
t:,.
t:,.
1
4rt
&
+ -} -
.fA • B t:,.
1
--:..;;.--'- - - • {exp [A-B 2rt
I2TI
-
<
b2
exp [- ~(-2
tt:,.
a2
tt:,.2
~(-
A • VB t:,.
1
b2
{exp[- ~(A-B
~ r.::tt:,.2
2vt
°
a2
tt:,.
~(-2
V27T
rtt:,.
> ~tt:,.2,
+ a)]
tt:,.2
+ -4- +
b)]
tt:,.2
+ -4) ] }
rtt:,.1
2
>
or
< ~tt:,.2,
>
0,
a
>
0.
this gives
a
Ibl
tt:,.2
4
tt:,.2
4
t
i.e.
If
+
- a)]
+ -)]}
i. e.
Ibl
tt:,.2
4
+-
if
I~+
If
4
+
tt:,.2
b
>
-
(-b) - a
<
tt:,.2.
we get
a >
b
(true).
Q.E.D.
69
COROLLARY 3. 3.
Sn
For all symmetric procedures, with
decrease as
The case
n
increases, for all
= 0:
o
o if
h(8)
(3.3.11) gives
b = -a,
a
n
and
n.
i.e.
= 0,
h(8)
and hence we use d'H6pital's Rule;
(3.3.16)
Monotonicity of theO.C. Function.
This can be easily established, using the sufficient conditions for
the exponential family.
We know that the procedure terminates with
probability 1;
Pr{ accep t HI IN =
i.e.
L:
n'
X.
I
1.
L (8)
n
Hence
If
3.4.
81
1
for
uK
Pr{accept HI IN
where
=
n' }
is hon-increasing, in
<
80 ,
<
=>
<
<
Similarly
n'}
L (8)
n
.,
0
=>
L:
n'
I
x.1.
> uK.
8.
is non-decreasing in
Testing a Normal Mean with Known Variance:
8.
the Average Sample
Number.
This again is constructed using Wald's approximation to the ASN of
the conditional test
x
for test
T.
n
T(x),
-n
and integrating over the distribution of
70
From (3.1.10), and (3.3.4),
ge
=
~e - ~~(el+eO)
=
- ~h~2.
So (3.1.12) gives
En (N I e)
hn(~(x-e) - ~h~2)
-
(a-b) • e
hb
e
=
n - _2_
h WA2
ha
ha
- e
] dFe(x)
- e
ha
- b~ }{iP(fu(a"-e» - iP(fu(b"-e)}
hb
a n n
e
- e
{~h~2 + ae
hb
a"
(a-b)
hb ha
e
-e
f
n
rn
[2;
-
exp [hnMx - e - ~h~)
b"
n
+
2n~ fa~
h~2
b"
[Il
J2if
n 2
(x-e) e - 7(x-e) dx •
n
Since
1
-~ny2 y=~-e
[- - - e
]
hnn
y=b~-e
= -1-
b
r
2
[exp{- ~(-- + ~vn(el + 6 - 26) }
0
/2nn
fu~
71
and using the result of (3.3.6), in which
In(a" - e)
n
etc.,
we get
En (N Ie)
'"
n[l -
H~ +
J§lnhl1) +
1nl1
+_2_
hl1 2
a-b
{H~
hb ha
I
e -e
rnl1
<p(.-E.- +
J§lnhl1)]
1nl1
-
J§vnhl1) -
H.-E.- -
J§vnhl1)}
I
rnl1
(3.4.1)
+ 2.
2
hl1
(3.4.2)
An alternative form can be obtained using the concise form (3.3.11) for
the O.C. function.
72
E
n
I
1
{bL (e) + a(l - L (e»}
Ee(Z)
n
n
(N e)
+ n[l -
~+(a,e)
+
~+(b,e)]
1
Ee(Z) [b~+(b,e) + a(l - ~+(a,e»
I
+ vnli{tP+(b,e) - tP+(a,e)}]
(3.4.3)
where the first term is the Wald form of the approximation to
E(N!e)
for a SPRT procedure.
increases,
the terms in
n
The A.S.N. when
and
It can be seen from (3.4.3) that as
In
h = 0
n
begin to dominate the right-hand side.
(i. e.
Since we cannot apply Wald's equation
=
E(Zl) = 0,
when
the case
the ASN in a SPRT.
h
= 0 leads to a different approximation to
We use the result [10]
N
E{( E Z~)2}
i=l
~
Here
=
E(N) Var(Zi),
where
(3.4.4)
Z~
~
So
For the conditional test
procedure begins with the
T(x),
-n
suppose (3.1.2) holds, so that a SPRT
(n+l)th observation.
73
Then
N
E
E{ (
i::;n+l
a'
n
neglecting excess over the conditional boundaries
and
b'
n'
=
h = 0,
When (3.1.2) holds and
L ;x
( 8)
=
-n
a'
n
a'-b' ,
n n
so
1 - L
x
( 8)
=
-xl
b'
n
a'-b' ;
n n
and
a' - b'
n
n
=
a - b;
therefore
a'
n
b'2
= n
a - b
N
E{
E
i=n+l
Z'}2
i
=
-
b'
n
a'2
n a - b
- alb'
n n
(3.4.5)
and so if (3.1. 2) holds,
E
x (N18 =
80+8 1
.
-n
2
)
1
= n--a'b'
/::.2 n n
i.e.
n
8 +8
E
=
x (N18
-n
1
2
0
)
'"
if
PIn
POn
-:5 B
or
~
A
n - {b - n/::.(i-8)}{a - nt:.(i-8)}
if
PIn
POn
B < -- < A
. -/::.21
74
Hence unconditionally,
E (NI e =
n
n -
-..!-2
tl
Jb" x
n
<
[ab - ntl (a+b )
<
(x- e)
a"
n
(3.4.6)
•
The las t term
t
n { -ye -br 2]
= - ---
If.;
a
a
ID.tl
+
b
J ~ e-b2 dy}
vntl
vntl
So the last term
where
y = lO(x-e)
at
e
=
-t( e +e ).
1
0
75
Hence (3.4.6) gives concisely
An alternative form can be obtained in terms of
case of a SPRT, with DC function
L(6),
L (6).
n
In the
the approximation to the ASN
is given by
Here, (3.3.16)
E (N I
n
6 +6
1
2
0)
and (3.4.7) lead to the form
n(l - <P(~) +
vnLl
<p(....E-»
vnLl
(3.4.8)
3.5.
Numerical Results and Comparisons.
In this section, the results obtained for testing a normal mean will
be used to present some interesting and useful numerical results and
comparisons.
Tables 3.5.1 and 3.5.2 give the DC function and ASN of the PSPRT
when
Ll
=
.50,
n
= 1,
5, 15 and 30,
for the two cases
a
= -b = 3
76
and
a = -b = 5.
In this last case, when the boundaries are farther
apart, the ASN shows virtually no increase from
only begins to increase substantially with
n
n
for
= 1 to n = 5, and
n
>
15.
TABLES 3.5.
function and ASN for PSPRT;
DC
b = e1
Table 3.5.1.
b =
DC
050
eO;
n
a = -b
3
-
test of normal mean.
= 1, 5, 15, 30.
Function
e-eo\n
1
5
15
30
.50
.375
.25
.0474
.1824
.5000
.0472
.1822
.5000
.0398
.1734
.5000
.0251
.1510
.5000
__ e_
ASN
Tab Ie 3. 5 • 2.
b =
.50
e-e o\ n
1
5
15
30
.50
.375
.25
21. 724
30.487
36.000
21. 787
30.550
36.066
24.758
33.386
39.036
34.811
42.784
48.751
a = -b
5
OC Function
e-e o\
.50
.375
.25
n
1
5
15
30
.00669
.0759
.5000
.00669
.0759
.5000
.00653
.0755
.5000
.00522
.0718
.5000
ASN
e-eo\n
.50
.375
.25
1
39.465
67.863
100.00
5
39.465
67.863
100.00
15
30
39.826
68.112
100.27
44.214
71.197
103.50
77
Comparison with Wa1d SPRT.
The following table compares the ASN of a PSPRT with a Wa1d SPRT'
having the same error probabilities
= a;
Pr{wrong decision/H.}
1.
a,
and a fixed sample-size test:
= 0, 1.
i
a, Band
E(N)
are
nominal.
Tab 1e 3.5. 3.
8 1 - 80
f:,
= 1,
= 15,
n
(Wa1d) E(Nln = 15)
8 - 8 E(N)
a
= -b
a
5, a
= B = .002356
(PSPRT) Fixed sample size
0.5
36.554
29.748
31.945
0.0
12.035
16.164
31. 945
n'
The PSPRT procedure does better than the fixed sample size proh
=0
or
80
cedure, and improves the SPRT at
in observations required at
However, for
are
.0067,
a
81
= -b = 5.0,
so that fixing
is
= ~).
The difference
4.
the nominal SPRT error probabilities
n = 15
error probabilities here by about
(8 - 8 0
with these boundaries reduces the
3/5.
Tables 3.5.1 and 3.5.2 demonstrate what intuitton would suggest,
namely that separating the boundaries
value of
n
may be allowed before
stantia11y, or
L (e)
n
a
E(N)
and
b
means that a larger
begins to increase sub-
to change subs tantia11y.
Separating the
boundaries results of course in decreased error probabilities.
Comparison with Other Methods and Exact Results.
As noted earlier, the approximations given above for
E (Nle)
n
apply to a Wa1d SPRT when
n
= 1.
L (8)
n
and
Effectively they give exact
78
probabilities of stopping at the first observation, and are approximate
thereafter.
We give some numerical results below.
It will be seen
that the differences from Wald's approximations are small, but that they
are nevertheless an improvement.
Using a method of Page [15], Kemp [13] developed in 1958 formulae
for calculating other approximations to
procedure.
Table 3.5.4.
n = 1
L(6)
and
E(NI6)
in the SPRT
The following tables are partially based upon his paper.
ASN:
60
= -~, 6 1 = ~, h = -26.
a
= 7.5, b = -2.5;
in PSPRT
6
-.50
-.25
0
.25
.50
Wald
PSPRT
Kemp
Exact
4.990
9.324
18.750
18.733
13.359
5.007
9.341
18.771
18.751
13.369
7.0
13.0
27.7
24.8
16.2
6.4
12.0
25.2
23.2
15.4
The improvement in approximating to the ASN using the PSPRT approach is less than
1.2%
over Wald's formulae.
some errors in computing the Wald ASN:
gets ASN values of
for
Kemp appears to have
6 = -.25, .25 and .50,
8.7, 15.8 and 12.3 respectively.
The next table gives a similar comparison for the
a.c.
function.
he
79
Table 3.5.5.
b
Function:
DC
= -2.5.
n = 1
80
= -~,
=~,
81
h
= -28.
a
= 7.5,
in PSPRT.
8
Wa1d
-0.25
-0.125
0
0.125
0.25
.9831
.9223
.7500
.4942
.2817
PSPRT
Kemp
.9872
.9254
.7155
.4058
.1896
.9831
.9223
.7498
.4939
.2811
Kemp also makes a comparison for
Exact
.9861
.9241
.7239
.4276
.2113
= 5.0 = -b.
a
(3.3.11) gives
the same results as Wa1d's approximation, since the c.d.f. terms are
~(5±8)
~(-5±8),
or
compared.
which are nearly
1
or
0
in the range of
~(-2.5±8)
In the above table, however, terms in
~(7.5±8)
and
appear, allowing small improvements over Wa1d's results.
It appears therefore that the PSPRT, which is a SPRT when
only shows small improvements in the approximations to
E(NI8),
8
L(8)
n
= 1,
and
and these become negligible when the boundaries of the test
procedure are further apart.
Although exact values are not available, we give a second example,
with closer boundaries, giving approximate error probabilities
and
.022
.097.
Tab 1e 3. 5 •6 •
81 - 8 0
1;
a
= 3.2,
b
8 - 80 Wa1d DC PSPRT DC
0
.25
.50
.75
1.00
.96317
.85261
058182
.26997
.09657
.96316
.85244
.58119
.26891
.09552
= -2.3.
n
=1
for PSPRT data.
Wa1d ASN PSPRT ASN
4.1949
5.9574
7.3600
6.8607
5.3378
4.2233
5.9830
7.3836
6.8790
5.3499
80
Comparison with Anderson's Modified SPRT [1].
It was noted earlier that at
h
= 0,
the procedure we are con-
sidering can do better than either the SPRT or fixed sample-size procedures (the latter may indeed do better than the SPRT at
h
= 0).
Anderson [1] developed approximations to the DC function and ASN for a
sequential test having boundaries which are linear functions of
the number of observations.
the lower bound at
h
,
He obtained ASN values which are close to
= 0, but which are only slightly greater than the
optimum values of the SPRT at
symmetric tests having
a
h
= fl.
= B = 0.05
He obtained his results for
and
0.01,
with
8 1 - 80
In the PSPRT procedure, it was found that by varying both
a(= -b),
n'
=
.25.
nand
and keeping the error probabilities constant, the ASNs at all
values of
8 would vary, increasing with
Tables 3.5.7./8.
n
as shown below.
Comparison of PSPRT with Anderson's Modified SPRT for
given Error Probabilities.
Table 3.5.7.
a =
B
.05
a = -b Initial. fixed
2.944
2.7
2.6
2.5
2.4
2.2
(n = 0
n ASN (approx:
gives SPRT data)
h = ±l) ASN (approx:
132.50
143.88
149.86
155.96
162.40
176.04
216.70
208.59
207.44
206.96
207.33
210.34
Lower Bound
132.5
187.0
Anderson Modified SPRT
139.2
192.2
270.6
270.6
0
96
111
124
136
158
Fixed Sample-size
h = 0)
81
The ASN of the PSPRT reaches a minimum for
but the ASN for
Table 3.5.8.
= -b
a
a
h = ±1
=S=
is increasing with
Initial fixed
4.59
4.3
4.0
3.7
3.5
3.4
3.3
3.0
~
.01
= 81
n ASN (approx:
0
173
228
272
298
310
323
357
h
at
n
= ±1)
(n
o
gives SPRT data)
ASN (approx:
h
225.20
244.79
271. 20
299.43
318.50
327.78
338.29
366.88
527.90
492.03
468.35
453.61
447.87
445.86
445.69
446.41
Lower Bound
225.2
388.3
Anderson Modified SPRT
249.4
402.2
541. 2
541. 2
Fixed Sample-size
125,
~
n.
0.25
- 80
h = 0
= 0)
It can be seen that Anderson's procedure is better than the PSPRT
in general.
n
~
320,
occurs.
The ASN of the PSPRT does reach a minimum for
but the ASN for
h
=
±1
h
=
0
at
shows consistent changes as this
The optimum combination for a PSPRT thus depends on the cir-
cumstances, but we might take
20 observations (on average) at
improvement at
h = 0
a
= -b = 3.5,
h = ±1
n
= 298,
might not make
since the extra
t~e
2.18
worthwhile.
In Table 3.5.7, the partial procedure does produce an ASN at
h = 0
a
which is a substantial gain; an optimum procedure might be for
= -b =
2.7,
on the SPRT at
= 96.
n
h
= 0,
The results indicate that in order to improve
the error probabilities should be fairly small.
Anderson's results are much better.
The PSPRT procedure could be
used, however, when some results are already available, or if the cost
82
is unknown, a situation which will be considered in Chapter V.
Com-
putationally, the PSPRT may be easier to work with than Anderson's procedure, although there seems no analytical method of finding optimum
values of
a, band
the case of the PSPRT.
n
corresponding to given error probabilities in
The combinations of
examples were obtained by trial and error.
a, band
n
in the above
CHAPTER IV
ON MINIMIZING THE EXCESS RISK WHEN THE COST PER OBSERVATION IS UNKNOWN.
4.1.
Introduction.
In this chapter, we shall consider the problem of obtaining opti-
mum procedures when the cost per observation is unknown.
for an 'optimum' procedure in a class
The criterion
C of tests will be that such a
procedure minimizes the excess risk over the Chernoff a.m. risk, or
that it minimizes the average excess risk when the cost of an observationis a random variable having a known distribution.
Throughout the
discussion, the variability in cost will be assumed to be independent
of the observations.
In Section 4.2, expressions will be derived for the nominal excess
risk when the true cost
c'
1 c.
c
is erroneously assumed to be some quantity
These will be derived both for symmetric SPRTs and general
SPRTs, using results in Chapter 2.
In Section 4.3, the approximate
average excess risk will be derived when a previous estimate
c
n
of
the cost is available, for symmetric SPRTs and general SPRTs, and when
the cost
bution.
c.
~
per observation is assumed to have a certain Gamma distri-
84
Throughout the chapter, it will be assumed that the appropriate
a.m. risks are based on SPRTs having at least one observation, and that
the nominal error probabilities which minimize the risk also satisfy
a+S<l.
4.2.
Excess Risk due to Use of Incorrect Cost.
Symmetric SPRTS.
We use the notation introduced in Chapter II.
that for a minimizing error probability
a,
It will be recalled
the corresponding nominal
risk is
(4.2.1)
We shall express (4.2.1) in terms of Chernoff's approximation to
the minimizing error probability, i.e., from (2.3.6)
=
=
where the notation
y
and
yc,
=
(4.2.2)
V is introduced for convenience.
Then
(k1-k o) [1 + (1 -
=
If
•
yc
2(k -k )
V - (k -k )
1 O
V
) log { k _ \ 0 }]
Vyc[l + (1-2yc) log (
1 - yc
yc
)]
(4.2.3)
= a is supposed small,
= Vyc(l - logyc + 2yc logyc - yc) + O(y3 c 3).
(4.2.4)
85
Now suppose that the value
vation, while the true cost is
by the error probability
a'
c'
c.
is assumed for the cost per obserThen the risk will be "minimized"
= yc',
and the actual risk is approx-
imately
= Vy[c' + c(l-2yc') log (l~~;')].
(4.2.5)
Since the wrong value has been assigned to cost, this risk will be
larger than the desired minimum.
Hence if we define excess risk to be
that amount by which it exceeds the a.m. risk, and using Chernoff's
approximation,
Excess risk
~
Vy[(c'-c) + clog
l~~;'
_ clog
l~~C
l-yc' _ clog l-yc}]
- 2yc{c'10g --'-:-yc'
yc
~
Vy[(c' - c + clog
4-)
c
+ cy(c-c')
- 2yc(clogyc - c'logyc')] + O(y3 c3),
where
c'
Excess risk
is assumed to be of the same order of smallness as
~
c.
c
Vy(c' - c + clog -I) + Vy 2 c [-(l-2logy) (c' -c)
c
- 2(clogc - c'logc')] + O(y3 c 3).
In (4.2.6), the expression
Vy(c'
part, under the assumption that
•
So
c'
c
c + clog -I)
c
and
c
(4.2.6)
plays the dominant
are both small.
It
will be
86
seen presently that a similar expression dominates the Chernoff excess
risk for more general SPRT's; later it will be studied more closely.
General SPRTs.
We return to Chernoff's approximating error probabilities (2.4.7),
i. e. ,
L 1 (l-w) ,
where, if
c
is the true cost,
=
from (2.1.3).
Let
k O = y ac
(4.2.7)
2cw
2c(l-w)
- /;2 ,
/),2
The Chernoff a.m. risk is
and
k1
write
A
=
-.l....+
Low
We temporarily introduce the notation
1
L 1 (l-w)
(4.2.8)
87
k
U(c)
=
YO
L1(l-w)
a
L 1(l-w)
k1
1-Low
=
Y
1
1
-c -Low
(4.2.9)
W(c)
k
o
1 + L (l-w)
1
=
k1
LoW
1
Yo
-+
c L1 (l-w)
=
Y1
Low
Then
(Y 1-Y a )c + c{Ya-YaY1cA}log(U(c»
If
c'
+ c{Y 1+Y1 Ya cA }log(W(c».
(4.2.10)
is supposed - wrongly - to be the value of the cost, the
'supposed' a.m. risk will be based on
ct.
The actual nominal risk is
then
The excess risk is defined:
Excess risk
'"
U(c')
W(c')
(YI-Ya)(c'-c) + cYa log (U(C) ) + cY1 log (W(c) )
88
=
1
Y1
(Yl-YO) (c'-c) + cy [log(- - - .)
a
cLaw
1
Yo
1
Y1
log(cr - L w)]
a
1
Yo
+ cy 1 [log(cr + L (l-w)) - 10g(c: + L (l-w))]
1
1
1
Yo
1
Y1
- c10g{(- +
)(- - --)}]
c
L1 (l-w) cLaw
c)
113
+ Y1Y ocA(2c'log cr - 2clog
+ O(c )
=
+ 2cY Oy 1A(c10gc - c'logc') + 0(c 3 ).
(4.2.12)
In (4.2.10), as in (4.2.6), the leading term dominates, i.e.,
(y 1 - yo) (c' - e + e log ee,) •
89
The coefficients
k
i.e.
Y1 - YO
1
-
k
and
Vy
represent the same quantity,
o
c
c' - c + clog
4c =
c'
c'
c(- - 1 - log-)
c
c
0,
since
x - 1 - log x
~
°
for every
x > 0,
and we assume
c > 0,
c' > 0.
So the dominant term is non-negative.
This is of course desirable,
since here we are dealing with approximations to the a.m. risk and to
the excess risk, and it is important that non-negativeness of excess in
the true risks should be reflected in the approximations.
4.3.
Average.Excess with Variable Cost.
Suppose that the cost per observation,
independent of
X.
c '
i
is a random variable,
In this section, we shall derive the Chernoff excess
risk, averaged over the distribution of
when the mean observed
cost
-c
1
n
n
1:
n j=l
c.
J
from a previous experiment is used to estimate the true mean cost
Thus
E(c)
n
= c,
and if
c
c.
were known, the a.m. risk could be at-
tained as shown in Chapter II.
We suppose that
=
c
i
has a Gamma distribution, i.e.,
c. > O.
~
(4.3.1)
90
Then
and
E(c.) = c,
~
Var(c.)
m
~
Notationally, write
m
r (m, -)
•
c
Then
c
n
ron
r(mn, - ) .
c
Symmetric SPRTs.
In (4.2.6), if
c'
risk when the estimate
is replaced by
-c
cn'
we get the Chernoff excess
is used for cost instead of
n
c.
Hence if
the average excess is defined as the expectation of the excess risk,
Average excess
'"
VyE(c
- c + c10gc - c10ge )
n
n
+ Vy 2 cE[-(l-2logy)(en-c) - 2(clogc - e n logen )]
=
Since
-c
n
E log c n
where
x ,. . , r (m',
Vyc(logc - Eloge ) - 2Vy 2 c (clogc - E(c loge »
n
n
n
is a Gamma variable,
=
mn
c
log -
+ 1ji(mn) ,
is the Digamma Function
6),
(4.3.2)
d
dZ {log
r (Z)};
and i f
91
oo
E(X1ogX)
fo
=
em'
(m'+l)-l -ex d
f(m') (1 ogx) x
e
x
m'
""6
E(logY) where Y '" f(m'+l, e)
=
m'
=
e
m'
e
(- loge + ljJ(m'+l»
1
(- loge + ljJ(m') +-)
m'
Hence
=
E(e loge)
n
n
mn
1
- clog - + c(ljJ(mn) + - )
c
mn
(4.3.3)
0
(4.3.2) and (4.3.3) give above,
Average excess
'"
Vyc(log(mn) - ljJ(mn»
=
2
Vyc(1-2yc){log(mn) - ljJ(mn)} + 2Vy 2 -c + O(y3 c 3) •
ron
(4.3.4)
In many problems, the variance of
E(c.)
1.
n
is not necessarily small.
c.
1.
Hence
will be small, even when
m may be large.
may be large, and the Gamma-type distribution of
to normality.
-c
n
Alternatively,
will be close
The asymptotic form of the Diagamma Function can then be
used, Le.,
ljJ(Z)
log
z-
1
2Z -
~
~
B 2j
j=l 2jZ
2J"
(4.3.5)
92
where
B2j
1J;(Z)
are Bernoulli numbers.
1ogZ
Thus, if
_
J:... __1_ +
12Z 2
2Z
mn
1
120Z 4
Then
1 + 1
252Z 6
240Z 8
is reasonably large, (4.3.4) and (4.3.6) give
Vyc(l+2yc)
2mn
+ Vyc(1-2yc)(
Average excess
(403.6)
In (4.3.7), it can be seen that if
ron
1
2 2
12m n
~
10,
the average excess, given that the true cost is
=
c {l-w
2mn E(Zll e 1 )
-
(4.3.7)
w}
E(Zll eo)
and if
c,
c
is small,
is of the order of
(4.3.8)
General SPRTs.
We now replace
c'
by
-c
n
in (4.2.10) and take expectations, to
obtain
=
=
(4.3.9)
93
Asymptotically, if
mn
is large, we may use (4.3.6), obtaining
Average excess
(4.3.10)
If
mn
~
10,
say, and if
c
is small, then as in (4.3.7), the
average excess is of the order of
Since
Yo < 0
and
y
I
(k l
-
k o)/2mn.
there is no inconsistency in the op-
> 0,
posite signs which appear in the first terms of (4.3.7) and (4.3.10),
Le.
in the quantity
y(1+2yc)
of (4.3.7) and in the quantity
The smallness of the approximate average excess is derived from
( 4. 3. 7) and ( 4 • 3.10), 1. e.
(k l
k o)/2mn,
-
may more appropriately be
studied in its ratio to the Chernoff a.m. risks in (4.2.3) and (4.2.10).
To terms in
clog c,
the leading quantity in (4.2.3) is
Vyc(l - logyc)
which dominates the risk
R '(>"w)
for synnnetric SPRTs.
o
The ratio of
the approximate average excess to this risk is of the order
Vyc
1
2mn • Vyc(l-logyc) ,
=
1
2mn(1-logyc) •
(4.3.11)
94
For general SPRTs, the equivalent ratio, from (4.2.10), is
1
k
1
- k 1log(-)
L w
o
=
2mn[Y 1 - YO +
=
1
Yo
2mn[1 - loge + -""-YI-YO
(4.3.12)
Hence under circumstances in which the expansions above could be
used, i.e., when
i)
c
ii)
mn
is small
is large,
the Chernoff average excess risk is small in proportion to the Chernoff
a.m. risk, as evidenced by (4.3.11) and (4.3.12).
The situation considered in this chapter, when some estimate of the
cost is available, is likely to be more frequent than one in which no
estimate is available at all.
However, the latter situation, to be con-
sidered in the next chapter, is one in which the Partial Sequential
Probability Ratio Test procedure, whose properties were studied in
Chapter III, is a suitable one to use.
CHAPTER V
ON MINIMIZING THE RISK WHEN THE COST IS UNKNOWN:
5.1.
A SPECIAL CASE.
Introduction.
When no estimate of the mean cost per observation in a sequential
testing procedure is available, the analysis of Chapter IV does not
apply.
A procedure which has some appeal in this situation is the PSPRT of
Chapter III.
The first
n
observations are used to estimate the cost,
and we shall use the observed mean cost
-c
n
where
c.
~
=
1
n
c. ,
I:
(5.1.1)
~
n i=l
is the observed cost of the i-th observation.
tation of Chapter III, the conditional test
SPRT on the basis of the observed
T(x )
w(x )
-n
A
n
and of
risk is used,
61
-
60 ,
and
-c.
n
A
n
B
n
of
T(x
)
-n
x
-n
and
.~
cn'
~
and
c •
n
the test
will generally be functions of
In one special case, however, if Chernoff's a.m.
and
B
n
are free of
and on the expected error rate.
Lo = L 1 = L,
is constructed as an
... , xn )
x = (xl' x 2 '
-n
To minimize the conditional risk, given
boundaries
-n
Using the no-
w(x)
-n
and depend on
-c
n'
on
This is the case in which
X is a normal r.v. with known variance, and
E(x) = 6.
96
This leads to a PSPRT procedure, in the sense that boundaries
and
B are to be chosen which depend on the first
--
n
observations only
through· cn '
but which are free of the posterior probabilities
and
of
l-w(x)
--n
and
after
x
--n
A
has been observed.
If
w(x )
--n
-c
n
were the true cost, these boundaries would lead to an approximately
minimum conditional risk for
T(x),
--n
and since they would do so for
all
the procedure would approximately minimize the unconditional
risk for
T.
n
In Section 5.2, the boundaries
procedure is developed.
tigated when the cost
A and
B are obtained, and the
In Section 5.3, the average excess is invesc
and for the special case
has the Gamma distribution of Chapter IV,
i
w
=~.
In Sections 5.4 and 5.5, the average
excess is espressed in explicit but approximate forms using the methods
of statistical differentials and of quadrature respectively.
Section
5.5 gives numerical results and graphs which show
i)
the existence of an optimum value of
n
minimizing the
average excess.
ii)
that the method of statistical differentials geves reasonably accurate results when compared with quadrature.
5.2.
Extension of the Chernoff a.m. Risk to the PSPRT of a Normal Mean.
In this section, the boundaries
and the procedure is justified.
A and
B
are explicitly obtained
Later, the lack of information on cost
is considered, but in developing the procedure we shall denote by
the estimator of the true cost.
~
c
n
97
The posterior probabilities of
l-w(x)
-n
8
0
and
81
are
w(x)
-n
and
where
w(x )
-n
w
(5.2.1)
w + (l-w)exp{n~xn - ~n(81-8~)} •
The a.m. Conditional Risk at Stage
Let
r(x)
-n
n.
be the nominal a.m. risk, given that at least one more
observation is taken, and that beyond stage
of a SPRT.
n
the procedure is that
In Theorem 2.2, it was shown that the nominal a.m. risk is
-c
l-1T*
l-w(x )
n
1T*L + E (Z) {(1-21T*) log ~ - (1-2w(~»log( w<;J )}
1
where
1T*
is the root of (2.2.8) •
1T*
is a function of
but
~,
Chernoff's approximation (2.2.12), viz.
-c
1T*1
=
n
E1 (Z)L
2c
=
n
~~2L
gives a good approximation to the a.m. risk, and is free of
1T*
does not depend on
r(x )
-n
=
x
-n
x •
-n
Thus
very much, and approximately
l-1T*
l-w(x )
1T*L[l + (1-21T*) log ~ - (1-2w(~»log w(x)]
-n
(5.2.2)
98
If
min[Lw(x),
L(l - w(x »]
"""'"'1J.
we stop and decide at stage
r(x) ,
<
"""'"'1J.
(5.2.3)
"""'"'1J.
n.
In this case,
w(~) >
J§
=>
accept
H
o:
6 = 60 •
From Theorem 2.2, it can be seen that we accept
HI
at stage
n
if
w(x ) < J§
"""'"'1J.
and
i*
> w(x ) •
"""'"'1J.
The second inequality gives
w < TI*[w + (l-w) exp{n~xn -
n(6f-6~)}] •
So
i.e., on taking logs and rearranging,
1-TI*
(5.2.4)
TIT)
This is the condition for stopping at stage
If we assume that the expected error rate
the condition
w(x) < J§
"""'"'1J.
-
x
n
TI*
1.
"2(6 1+6 0 ) + -
1
n~
and accepting
is less than
gives
>
n
w
log - 1-w
J§,
HI.
then
99
and follows from (5.2.4), since
if
Similarly, from Theorem 2.2, we stop and accept
1T*
< ~.
Ho at stage
n
if
w(x)
~
>
-n
1T*
and
1 - w(x) •
>
-n
The second inequality leads, analogously to (4.1.4), to
(5.2.5)
Then
1T*
< ~
> ~.
w(x )
=V
-n
Consider now the PSPRT procedure of Chapter III.
(5.2.4) and
(5.2.5) can be compared with Eqs. (3.3.3) of that chapter, and i t is
clear that a PSPRT procedure which approximately minimizes the unconditiona1 risk (since it does so for every
a
x)
is
n
given by boundaries
=
(5.2.6)
1T*
·w
b
10g(1_w • 1-1T*)
or
A =
B
Further,
0
<
B
<
1
1T*
<
1-w
. 1-1T*
1T*
w
1-w
. 1-1T*
W
(5.2.7)
1T*
A if
<
min(w, 1-w)
as for the SPRT in Chapter II •
Inequality (5.2.8) ensures that
(5.2.8)
100
It is understood that
-
upon· c n '
A, B, a
and
b
are functionally dependent
TI*.
through
With these parameter values, the approximate a.m. risk is given by
=
R(w,n,c Ic)
n
where
a
L(wa
+ (l-w)S ) + c{wE (N!6 0 ) + (l-w)E (N!6 1 )},
n
n
n
(5.2.9)
are given by (3.3.12) of Chapter III and
and
n
n
En (N /6)
by (3.4.1).
We note from (5.2.7) that if
symmetric one, with
Numerical Results
B
w
=~,
the best procedure is a
= l/A, b = -a.
Chernoff a.m. Risks for PSPRT Procedures.
=
Tables 5.2 following show the approximate a.m. risks when
and
6 1 -6 0
.001,
= ~
~,
and
and for unit loss
cedures having
n
L.
If the cost
then of course the optimum rule will be for
according as
w
Ho or
H1
n = 1,
c
is known,
or not to take any
is accepted in the latter case,
The "no observations" rule would apply for the case
w
= .25,
c
= .01,
b = -1.852,
For
n
= 100
t:,
so that
= 0.5,
(and
n
=1
and
< ~.
or
>
~
The a.m. risks are calculated for pro-
initial observations.
observations at all.
.01
for costs in the neighborhood of
w =
since risk
0
<
B
<
1
for
n
=1
>
Lw
(and
<
A does not hold).
the a.m. risk more than doubles from
to
n
= 50
for
c
= .01),
t:,
a
= -.345,
n
while for
the increase is less , particularly for smaller cost.
= .25,
= 1 to
t:,
= 0.25
101
Tables 5.2.
Chernoff a.m. Risks for PSPRT Procedures with Known Cost.
= 8 0 vs HI:
Test of
Ho:
N(8, 1)
distribution.
8
8
= 8 1 , where 8 is mean of
= 8 1-8 0 •
~
TABLE 5.2.1
~
= 0.5,
=
w
.5, L
=1
= .25
w
n\c
.0075
.009
.01
.011
.0125
.01
1
5
10
15
25
50
.2053
.2056
.2105
.2219
.2599
04026
.2296
.2303
.2377
.2535
.. 3029
.4794
.2441
.2452
.2547
.2736
.3309
.5304
.2576
.2590
.2708
.2931
.3586
.5813
.2758
.2781
.2937
.3214
.3995
.6574
.2229
.2233
.2368
.2597
.3222
.5274
TABLE 5.2.2
~
= OE5,
w
= .5, L = 1
w = .25
n\c
.00075
.001
.00125
1
5
25
50
100
c03629
.03629
.03737
.04569
.07652
.04595
004595
.04775
005980
.10J78
.05503
.05503
.05770
.07371
.12701
..
.• 001
.04197
.04197
.04481
.05846
.10154
TABLE 5.2.3
~
= 0.5,
w
=
.5,
L
=1
n\c
.0001
.00001
.000001
1
50
100
200
500
.006495
.007064
.010443
.020018
.0500001
.000835
.000853
.001093
.002004
.0001019
.0001023
.0001172
.0002010
102
TABLE 5.204
~
=
0.25, w = .5, L = 1
w = .25
In\c
.0075
001
00125
.01
1
5
10
15
25
50
.3839
03847
,3909
.4024
.4359
.5531
.4068
.4106
.4256
.4476
.5040
06814
.4319
04373
.4608
.4934
.5725
.8099
.2726
.3288
.3693
.4047
.4750
.6658
I
TABLE 5.2.5
~
= 0.25,
w
= .5, L = 1
w = .25
n\c
.00075
.001
.00125
.001
1
5
25
50
100
010866
.10866
010870
010990
.12069
013412
013412
313423
.13656
015359
015695
015695
.15720
.16104
.18513
.12136
.12136
.12160
.12564
.14667
TABLE 5.2.6
~
1
= 0.25, w = .5, L = 1
n\c
00001
cOOO01
.000001
1
100
200
300
.02146
.02180
.02544
0002893
0002897
.003043
0003515
00003632
.0003632
.0003675
.0003937
103
Tables 5.2.3 and 5.2.6 show how the Chernoff a.m. risk for the
SPRT procedure behaves when the cost becomes small.
the value of
n,
As the cost
~
0,
at which the increase of the risk ceases to be slow,
becomes larger.
It can be noted that the steepest increase in the risk for larger
n
results from the growing domination of the term
of the first
n
observations.
For very large
n,
cn,
due to the cost
the other terms
appear, in fact, to become negligible, as evidenced in Table 5.2.3 for
n =
500.
When
is smaller, the risks are larger, but the proportional
~
increases in risk are smaller over the same change in
5.3.
n.
Average Excess.
For the special case in which
w=
~
and loss
L = 1,
investigate the effect of lack of information on cost.
we next
In the previous
section, the apparent a.m. risk is based upon a supposed cost
although
-c
If the true cost is
is an estimator.
n
W
1-7T
= 10g(1_w'-;-)
7T
=
with
w =
~;
b = -a
and
2e
=
/;2'
Using the notation of (5.2.9), we are interested in
ER(~, n,
over the distribution of
C
n
e
n
,
c, boundaries for
the same test procedure would be determined by
a
-e
cn Ie)
As in Section 4.3, suppose that
104
c.
~
~
m
f(m, -),
c
so that
1
m m m-l -.!!!x
x
e c
c
p(c )
i
= f(m)
(-)
E(c.)
= c,
Var(c. )
~
c2
= m
~
Then
-c
n
mn) ,
(mn, c
~
Ecn = c '
Var(c")
n
mn
Evaluating (5.2.9), we get from (3.3.12)
et
n
+ 13
n
B(A-1)
b
rA(l-B)
a
rA-B q>(- + ~vntl) - A-B qi(- + ~vntl)
=
1 -
=
~ + ~lUtI,
and
qi(~ + ~lntl)
=
~
~
Now put
g+
- = ~
IUtI
g
inti
~lUtI
(5.3.1)
If we also write
qi+
=
1rltl
qi(g+)
(5.3.2)
qi
=
qi (~ -
inti
~lntl)
=
qi(g-)
and since
b
= -a
=>
qi(-E- + ~~) = 1 - qi
inti
qi (-E-
inti
(5.3.3)
- ~lntl ) = 1 - qi+
105
and since, further,
1-21T*
A - B = -1T-*=-(1""':-;;';'1T;""*-'-)
and,
1T* =
2c
--;:zn ,
we have
(5.3.4)
Next, we require
h =1
and
-1
En(N!So)
and
En(Nls 1 ).
From (3.4.1), by putting
we get
+ a-b {~ + ~ -1}
B-A +
+ a-b {~ + ~ _ 1}
A-B +
(5.3.5)
Now
bA-aB
B-A
bB-aA
A-B
=
-
a-b
A-B
-
=
2aA
A2_1 •
106
Hence
E (N Ie) + E (N Ie)
nOn
1
=
(5.3.6)
Hence (5.2.9) gives
=
+ c[(n - 2a
~2
(1-2TI*)(1-~ _~
+ -
) +
zln
{~(g )
~
-
_
~(g+)}]
,
(5.3.7)
Ht) = ~'(t)
where
If, for given
TI *
and
n,
Zc
=
---.!!.
~2
the apparent risk is (5.3.7), then the excess
over the (approx.) a.m. risk when
Excess
= R(%,
nr
•
Cn Ic)
Over the distribution of
c
is known, is
- R(%, n, c/c).
(5.3.8)
the average excess is
ER(%, n,
cn Ic)
- R(%, n, clc).
The exact evaluation of this quantity presents analytical difficulties,
but when
-c
n
,
i. e. ,
c
is small, and
1
;;n
,
m is large, the coefficient of variation of
is small, and we can apply the statistical differential
method to approximate the average excess.
This method has the advantage
107
that if we only take the expansion to the variance term, then the dis-
cn
trlbu tiona!- form of
moments •
doesn't matter beyond knowing the firs t two
Thus
2
1 Var(TI*) OTI*2
0
ER(~,n,cnlc) ~ R(~,n,clc) + If
[R(~,n,cnlc)]
IC =c
n
(5.3.9)
Evaluating the required terms separately,
4
= flt
a
1
l-TI*
""'* [-- log --*- +
0"
/Ufl
TI
g'
+
1
- --
IUfl
.
I
~y'nfl]
1
(5.3.10)
g"(TI*)
+
=
0
OTI*
[g'(TI*)]
+
(1-2TI*)
= _1_ •
TI*2(1-TI*) 2
/Ufl
=
g: (TI*)
=
g'(TI*)
+
~(g
+
(TI*»
108
Abbreviating the notation, write
as before.
g
-
, cp - , etc. , are similarly defined.
Then
d2
[4>+]
d7f*2
=
(gil _ g g' 2)ep
+
+ +
+
d
d7f* (CP+)
=
- g+g~cp+
d2
(cp+)
d7f*2
=
+ g gil _ g2 g '2)ep
- (g'2
+
++
++
+
=
4>+ + 7f*g~cp+
=
2g'cp
+ 7f*(g" - g+g~ 2)ep+
++
+
log~}
=
(1-27f*)
-)
- 210g (1-7f*
7f*(1-7f*)
7f*
d2 { (1-27f 1<)
1-7f*
log~}
d7f*2
=
1-27f*
2
2
+
2
7f*(1-7f*)
7f*2(1-7f*)
(1-7f*) 2
=
1-47f*2
7f*2(1-7f*) 2
d
d7f* (4)+
. 7f*)
d2
(7f*4>+)
d7f*2
d
{ (1-27f*)
d7f*
1-7f*
Also, for an arbitrary twice-differentiable function
=
=
where
f(7f*) ,
109
Hence
2
{(1 2 *) 1
(1-7f*)'1>}
d
d7f*2
- 7f
og
7f*
+
=
1-7f*
+ 210g ~}g~
Finally, in order to evaluate (5.3.9), we substitute the above
quantities, and evaluate them at
2c
7f* = 7f = -f),2
Le., at
~
c
n
Ec
n
= c.
This gives
2c 2
[2g~~_ + 7f(g~ - g_g~2)~_ + 2g~~+
mn
ER(~,n,cnlc) - R(~,n,clc) ~ f),4
2
+ 7f(g" - g g' 2)~ + -2c • 1-47f
('1>+ + '1>
+
+ +
+
f),2
7f2(1-7f)2
evaluated at
- 1) - -2c
f),2
1 27f
7f(1-7f)
{-7.-:-'-7"
(5.3.11)
110
For the sake of a better comparison, it might be desirable to
measure the average excess over the a.m. risk in the class
C of SPRT
procedures, including those in which no observations are made.
The
boundaries of such an SPRT, leading through Chernoff's approximations to
a risk very close to the a.m. risk, are the same as above, with error
probability
2c
a = TI = - - under both hypotheses.
/::,2
Using the notation of Chapter 2, when at least one observation is
taken,
+ {nc - TI10g (l-TI) (1-2TI)}(1 TI
=
<P
+
-
<P )
-
[nc - TI{l + (1-2TI) log 1~TI}](2 - <P+ - <p_)
(5.3.12)
So
ER(ilf, n,
cn I c)
- R (A
o
W
Ic)
is approximately given by summing
(5.3.11) and (5.3.12).
5.4.
Method of Quadrature.
A more accurate measurement of the average excess is given by
evaluating
f
R(ilf, n, C Ic)dF(c )
n
n
(5.4.1)
in the form
(5.4.2)
111
summed at intervals of length
h
and truncated at suitable points to
be set out presently.
The s.d. of the distribution of
c
n
c
is
vii1ii '
and a suitable
degree of accuracy to the above integral seems" to be given if
c
=
h
(5.4.3)
lOItiill
-c
summed from
max(O, c - ~)
n
Ir;m
The choice of
h,
to
6c
c + --
Ir;m
as one-tenth of the standard deviation of
~
c ,
n
is chosen because, for bell-shaped distributions whose density functions
are not too far removed from normality, it can give a very close approximation to the area under the curve, i.e., for this value of
h
c cn )
h
mn mn
- mn- 1
(mn
f(mn) (7)
E cn
exp is very close to
1 when
generally smaller than
1,
mn
is not small.
Since
and decreases overall as
R(~,
c
n,
Cn Ic)
is
decreases, it
would seem reasonable that (5.4.2) and (5.4.3) give a close fit to
(5.4.1).
This is a heuristic argument, but the probability integral was
computed by the method, in a few cases, and found to be extremely close
to
1.
We use the refined form of Stirling's approximation [8]
k
(mn) I "'" (2n)"2 (mn)
mn+k
1
1
"2 exp(- mn + - - )
l2mn
360m 3n 3 •
(5.4.4)
The supplementary factors in (5.4.4) are required in order that the
average excess computed by quadrature gives a sufficiently accurate
result.
It turns out that without these factors, the approximation to
112
average excess using statistical differentials gives a more accurate
result in many cases; but using (50404), the statistical differential
approach still gives results agreeing with those obtained by quadrature
to within
3.5%.
(5.4.2), (5.4.3) and (5.4.4) now give (replacing
1
r (mn)
by
mn
(mn) !)
hmn
c
--->-~--­
• e
- mncc -c)
c n
1
• exp[- l2mn +
-
·c
. -- . L:R(c )
) ""
f R(,!€,n,cn !c)dF(cn;z.;;:-(mn)
mn+ e- mn
(mn)mn
n
1
(..1!.) mn-l e-mn
c
3 3
360m n
-
1
1
1
_
cn mn-l - mn(c -c)
=
exp (- - - +
) L: R( c ) (-)
e c n
3
3
10&
l2mn
360m n
n c
(5.4.5)
The summation is as described above.
Although the analytical presentation here of the quadrature method
is considerably briefer and more concise than that of 5.3 by statistical
differentials, the computation of (5.4.5) involves a substantially
greater amount of time on the computer than do (5.3.11) and (5.3.12),
and this is a disadvantage.
The only reason, therefore, to use quad-
rature in preference to statistical differentials would be a gain in
accuracy.
Numerical comparisons, to be presented in 5.5, showed, how-
ever, that for smaller values of
c,
the method of 503 is adequate,
both in evaluating the average excess based on Chernoff's a.m. risk,
and in evaluating the optimum value of
to an overall minimum average excess.
n
in PSPRT procedures leading
113
5.5.
Numerical Results.
Tables 5.5.1 to 5.5.4 show the approximate average excess as eva1-
uated using the method of statistical differentials for cases in which
~
and
c
are fixed.
It will be seen from these, and from the diagrams
in Figs. 5.5.1 to 5.5.4 that although
n,
m varies, the optimum value of
minimizing the average excess, does not change very much under these
conditions.
The value of the Chernoff a.m. risk for aWa1d SPRT, given that
the corresponding cost
c
is known, is stated with each table, and it
should be noted that throughout the entire range of
n
displayed in
each table; the ratio (average excess risk)/(Chernoff a.m. risk) is
small.
Examples might arise where this would not necessarily be so
for instance, if the variance of
c.
1
were larger.
The curves in each figure lie inside one another as
m decreases,
and the (approximate) minimum average excess is greater when
smaller.
Var(c.)
1
This last feature is what one would expect, since
= c 2 /m
Tables 5.5.
is greater when
m is smaller.
Approximate Average Excess of PSPRT procedures for tests
of normal mean;
•
m is
02
= 1•
114
TABLE 5.5.1
~
=
6 -6
1
0
=
.25,
C
n\m
3
1
3
5
8
10
12
13
14
15
16
17
18
19
20
22
25
30
.005668
.001889
.001134
.000708
.000566
.000470
.000433
.000402
.000374
.000351
.000332
.000316
.000304
.000296
.000290
.000312
.000443
=
0001,
Chernoff a.m. risk
.13412
I
17
25
.0010003
.0003334
.0002001
.0001250
.0001000
.0000834
.0000774
.0000726
.0000691
.0000671
.0000666
.0000678
.0000710
.0000764
.0000951
.0001470
.0003152
.0006802
.0002267
.0001360
.0000850
.0000680
.0000569
.0000530
.0000500
.0000482
.0000476
.0000484
.0000507
.0000550
.0000614
.0000818
.0001357
.0003064
10
.001701
.000567
.000340
.000213
.000170
.000141
.000131
.000122
.000115
0000110
.000106
.000105
0000106
.000109
.000124
.000172
.000334
=
I
TABLE 5.5.2
~
=
.50,
n\m
1
3
5
6
7
8
10
11
12
15
20
C
=
.001,
3
.001355
.000452
.000271
.000225
.000192
.000169
.000148
.000150
.000164
.000282
.000812
Chernoff a.m. risk
10
.0004064
.0001355
.0000812
.0000677
.0000587
.0000536
.0000585
.0000712
.0000930
.0002300
.0007787
=
.04595
17
25
.0002390
.0000797
.0000478
.0000400
.0000351
.0000332
.0000428
.0000572
.0000805
.0002209
.0007729
.0001625
.0000542
.0000325
.0000273
.0000243
.0000239
.0000356
.0000508
.0000748
.0002167
.0007702
115
TABLE 5.5.3
6 = .25,
In\m
I
35
40
45
48
50
52
53
54
55
56
57
58
60
65
70
75
c = .00001,
Chernoff a.m. risk = .002893
3
10
17
.000001524
.000001334
.000001187
.000001113
.000001071
.000001032
.000001015
.000000999
.000000983
.000000969
.000000957
.000000946
.000000927
.000000909
.000000945
.000001059
.000000457
.000000401
.000000356
.000000337
.000000324
.000000316
.000000312
.000000309
.000000306
.000000304
.000000303
.000000304
.000000307
.000000338
.000000418
.000000570
.000000268
.000000235
.000000211
.000000199
.000000193
.000000189
.000000188
.000000187
0000000187
.000000186
.000000189
.000000190
.000000198
.000000238
.000000325
.000000483
25
.000000183
.000000160
.000000144
.000000136
.000000133
.000000131
.000000131
.000000131
.000000132
.000000133
.000000136
.000000138
.000000147
.000000192
.0000002821
.000000444
116
TABLE 5.5.4
~
=
.50,
n m
1
3
5
8
10
12
15
17
18
19
20
21
23
25
28
30
33
37
c
=
.00001,
3
.000013335
.000004444
.000002665
.000001666
.0000013331
.000001111
.000000888
.000000786
.000000744
.000000709
.000000681
.000000661
.000000648
.000000686
.000000888
.000001163
.000001875
.000003579
Chernoff a.m. risk
10
.000003999
.000001332
.000000799
.000000500
.000000400
0000000334
.000000266
.000000238
.000000227
.000000220
.000000217
.000000220
.000000247
.000000320
.000000567
.000000867
.000001613
.000003356
17
.000002353
.000000784
.000000469
.000000293
.000000236
.000000196
.000000156
.000000141
.000000136
.000000133
.000000135
.000000141
.000000176
.000000256
.000000510
.000000815
.000001566
.000003317
=
.0008346
25
.000001600
.000000533
.000000319
.000000200
.000000159
.000000133
.000000106
.000000097
.000000094
.000000094
.000000097
.000000106
.000000144
.000000226
.000000484
.000000792
.000001546
.000003299
117
From computed results, the method of statistical differentials is
undesirable for larger values of
~
=
.25,
a negative excess.
cedure with
=
c
.01
and
~
c,
yielding as it does for
c
=
.01,
(However, the aomo risk for an SPRT pro-
=
.25
comes close to that obtained if a
decision is made with no observations; see Table 202.20)
A comparison
with the results obtained by quadrature follows in 506, but here it
can be noted that for cases such as
c
= 001,
~
=
.25,
quadrature does
give valid results.
If
crease as
c
and
~
m are fixed, the optimum value of
n
appears to in-
decreases (see Figs. 50505 and 5.5.6), as does the approx-
imate average excess.
This is what intuition would suggest.
It is of interest to give an example illustrating how the average
excess splits into two non-negative components.
imate average excess by
E,
If we denote the approx-
then
E
=
where
average risk of PSPRT using estimator
- (risk of PSPRT when true cost
E2
=
Risk of PSPRT when true cost
- (risk of Wald SPRT when
The risks in
E1
and
would expect,
E1
increases and
c
n
E2
c
c
c
n
for cost
is known)
is known
is known).
are Chernoff a.m. risks for given
dominates
~
c
c
~
E
for smaller values of
in probability,
E2
n,
n.
As one
but as
n
becomes dominant.
118
TABLE 5.5.5
Components of approximate average excess.
m =25,
8 1-6 0 = .25,
c = .001,
E2
n
E1
1
10
15
18
25
.00068022
.00006792
.00004449
.00003622
.00002403
Chernoff a.m. risk = .13412
E = approx. Av. Excess
.00068022
.00006800
.00004821
.00005075
.00013566
0
.00000008
.00000372
.00001453
.00011163
The next comparison shows the effect of changes in cost when
and
~
are fixed.
m
Table 5.5.6 and Fig. 5.5.7 are compiled to show the
correspondence between the optimum value of
n
and the costo
On a
log-scale for cost, the graph indicates that the increase in optimum
is linear with log (decrease in cost).
A,
For each combination of
n
m and
the ratio
Approx. Average Excess
Cost
appears to remain in the same order of magnitude as the cost varies.
119
Figs. 5.5.
Graphs of Average Excess Risk
E
against
n
for tests of
normal means by PSPRT procedureso
35
30
m
=3
25
20
15
10
5
25
6
10
Fig. 5.5.1.
15
E x 10
5
n
against n.
20
c
25
= .001, ~ = .25.
120
lfl
L{)
0
r-l
a
lfl
x
'i
C'l
a
'i
rJ:.1
Fig. 5.5.2.
E x
10
5
against n.
c = .001,
f'j.
= .50.
121
90
rn = 3
80
70
60
50
45
40
35
rn = 10
30
25
rn = 17
20
rn = 25
15
50
40
30
Fig. 5.5.3.
E x 10
8
against n.
n
70
60
c = .00001,
f:,
.25.
122
o
(Y)
o
C'I
~~I ~
--,---.,..--------r
oo
o
o
Fig. 5.5~4.
xo
~
~
r-
,-j
E
x
10
8
against n.
o
<1
~
~
c = .00001,
C'I
~ = .50.
123
9
/: :,
.25
8
7
/:::, = .375
6
5
4
3.2
-l-----y------..~----__r_-----_r_---15
20
5
10
n
Fig. 5.5.5.
E x
10 5 against n.
c = .001,
m = 17.
124
45
40
35
to. "" .375
30
25
= • 25
20
15
13
80
60
40
20
n
Fig. 5.5.6.
E x 10
8
against n.
c "" .00001,
m = 17.
e
e
Fig. 5.5.7.
50
e
Correspondence between optimum values of n and unknown true cost
when Average Excess Risk is approximately minimized (logarithmic scale
for cost).
;::; 10,
f:,;::;.
25
40
o
'"0
::l
m ;:: 25,
f:,;::
rt
}-I.
S
§
30
20
=::::-10
-!o,
m ;::
f:,;::.
m
25,
5
f:,
.5
o
.000025
.00005
.0001
Cost
.00025
c
.0005
.001
I-'
N
VI
126
TABLE 5.5.6
Minimum (approx.) average excess and Optimum values of
procedures for given values of
~
True
Cost
Opt.
n
= .25
Min. Av.
Excess
~
= .25
Opt.
n
~
Min. Av.
Excess
.01
.001
.0001
18
37
.00001 57
.00010507 16
10- 6 x 4.87 33
10- 7 x 3.04 53
10- 5
10-
6
10- 7
x
x
x
4.76
in PSPRT
~.
m = 10,
m = 25,
m :;: 10,
m and
n
m = 25,
= .5
Opt.
n
Min. Av.
Excess
Opt.
n
4
.001282
3
8
2.13 14
1. 31 20
1010-
7
Min. Av.
Excess
x
3.14 13
.000630
10- 5 x 2.39
10- 6 x 1. 38
x
2.17 19
10
10- 5 x
6
= .5
~
5.36
8
-7
x
0.94
Comparison with Quadrature.
Returning to the discussion at the end of 5.4, a comparison was
made for similar values of
m,
~
and
c
of the average excess as
approximated by statistical differentials and by quadrature.
5.5.7 shows the results.
Table
127
TABLE 5.5.7
Comparison of Approximations to Average Excess in PSPRT Procedures:
Test of Normal Mean
R= Chernoff
a.m. Risk
in Wa1d SPRT
= 25
/:; = .25
c = .00001
m
n
1
= .002893
= .25
c = .00001
/:;
= .002893
10-
7
10-
7
54
10-
7
x
1.365
60
10- 7 x
10- 5
53
30
40
50
57
60
70
80
1
= 25
/:; = .50
c = .00001
R
= .0008346
6.443
10-
50
58
m
10- 6
x
7
56
R
10- 6
10-
35
1
m = 10
Av. Excess by
Stat. Diff1so
= ED
7
52
R
Av. Excess by
Quadrature
= E
Q
10-
7
-7
10
10- 7
10-
7
10-
7
10- 7
10-
7
10-
7
10
-7
10- 6
7
x
x
1.367
x
1.351
x
x
x
1.333
x
10
7
10- x
1.311
-7
I
10-
7
10- 5
x
1. 601
x
5.339
x
4.005
x
3.242
x
3.035
1.013
I 1.014
5.362
10-
7
-7
10
10- 7
30
10- 7 x
1.004
1.006
10-
x
3.032
1.012
3.070
10- 7 x
3.038
1.011
3.111
4.232
x
3.067 I 1.014
x
4.182
1.012
x
8.232
1.010
x
1.600
1.006
x
1.058
1.023
10- 7 x
0.943
1.012
x
0.942
1.014
x
0.969
1.023
"
2.257
1.008
7.915
1.004
10-
7
10-
7
-7
1.082
10- 7
I
I
I
3.068
x
25
1.037
II 1.014
7
10- 6
20
1.044
10-
3.076
1.610
x
I
7
x
10
10- 7
1.033
1.624
10
18
1.043
1. 474
8.312
10- 7 x
-7
1.039
x
I
x
10-
1.308
I
I
I
I
I
I
1.528
3.285
x
x
10-
1.006
I 1.025
1. 308
x
x
1.827
7
I
EQ/E D
x
4.027
x
x
7
I Ratio
I
I 10- 7
x
x
6.404
I 10-
1.385
x
15
19
1.872
x
I
0.954
0.955
x
0.991
x
2.274
7.943
I 10- 7
10
-7
-7
10
10- 7
-7
10 .
>(
128
Table 5.5.7 (continued)
1
m
fj,
c
R
m
fj,
c
= 17
= .375
= .001
=
.07279
= 25
= .5
= .01
7
10
=
.244126
10-
5
10- 5
5
11
10-
12
10- 5
5
13
10-
14
10- 5
5
15
10-
1
10- 4
2
10- 4
3
4
5
R
10- 5 x
10
10-
4
x
x
x
x
x
x
x
10-
I 1.009
x
6.141
1.004
4.531
10- 5
x
4.459
1.016
4.395
10- 5
x
4.298
1.023
4.508
10- 5
x
4.385
1.028
4.928
10- 5
x
4.779
1.031
5.723
10- 5
x
5.549
L031
5
x
6.771
1.029
10- 4
x
18.416
10- 4 x
9.005
x
x
10- 5 x
6.166
10- 5
6.968
10-
7.217
10-
4
x
6.298
I 1.031
I 1.059
1.146
8.662
10- 4
x
7.515
1.153
x 12.916
x 106.248
1.093
x 18.991
x
9.540
10- 4 x
4
43.011
43.417
14.110
10- 4 x 107.003
10-
4
10- 4
1.007
129
The results shown by quadrature can be considered more accurate,
and they give an approximate average excess which is consistently a
little greater than that given by statistical differentials (and if
c
~
.001
5.5.7.
only very slightly so), as shown in the last column of Table
This indicates that the statistical differential method gives
reasonably accurate results for the average excess based on the Chernoff
a.m. risk, whenever the true value
c
of the cost is small enough.
Since the results by the latter method seem to lag behind the corresponding results by quadrature, it may be that the contribution from the
third-order term in the differential expansion is sizeable enough to
correct the lag, viz.,
1I. E[ 1T* _ 1T) 3 ]
-3
a3 3
d1T*
[(k
R;2, n, ~c
n
I c )]
c =c
n
What is important, however, is that the optimum value of
all examples studied never varies by more than
1,
n
in
c
= .01
15%.
This
even when
and the average excess approximations differ by as much as
perhaps is a more important point to be noted than the comparison of
average excess values.
CHAPTER VI
ACCURACY OF APPROXIMATIONS TO THE A.M. RISK OF A P.S.P.R.T.
6.1.
Introduction.
In this chapter, some attempt is made to study how close the approx-
imations used in the last three chapters are to their true values.
different kinds of approximations have been used.
Two
The first is the set
of Wald approximations to the O.C. function and ASN of a SPRT procedure,
based on nominal error probabilities [24].
ditional test
T(x)
""""'n
of Chapter III to evaluate approximations to the
O.C. function and ASN of a PSPRT procedure.
on
E(N 18)
than if
80
These were used for the con-
Wald's bounds on
for a SPRT are generally closer i f
<
8
<
81 ,
8:::; 8 a
or
L(8)
and
8::::: 8 1
and since the main interest in these chapters is
the risk function for the two-point parameter space
(8 0 , 8 1 ),
the
analysis should not lead to bounds which are too far apart to be of interest in the case of the PSPRT.
In Sections 6.2 and 6.3, these bounds
are obtained for the OC function and ASN respectively, and in Section
6.4 some numerical results are given for the problem of testing a normal
mean when the variance is known.
In Section 6.5, bounds for the
Chernoff a.m. risk are given, and further numerical results are tabled.
131
The second kind of approximation which has been used is Chernoff's
approximation to the nominal a.m. risk, based upon nominal error probabilities [7].
We noted at the end of 2.4 that when
th~
cost is small
the Chernoff a.m. risk in the computed results is close to the nominal
a.m. risk.
Analytically, the determination of general bounds on the
true a.m. risk would repay consideration, but has not been successfully
considered here.
6.2.
Bounds on the D.C. function.
(based on Wald)
If
e +e
1
1 - lP(- Ie °e
=
e +e
1 - Hie -
1
2
2
e +e
0\)
°2
lP(le =
°1)
11)
e +e
lP( - Ie -
O
2
(6.2.1)
) I)
and
1
ne
(6.2.2)
~
then for
h(e)
=h
>
0,
and when (301.2) holds,
hal
e
n - 1
hal
hb'
e n - n e n
e
::;
L
x
-n
(e)
oee
hal
n - 1
- e
hb'n
using the notation of
Wald [22] and of 3.1, i.e.,
(6.2.3)
132
L (e): -- For the lower bound, we replace
n
e
For the upper bound, we replace
e
hb
ha
hb
by
nee
by
oeeha
in (3.3.11).
in (3.3.11).
Then, as in (3.3.9), let
=
~{
Similarly, defining
hb
nee
hb ha
nee -e
:S
L (e)
n
-
eJ?
::;
~+(b,e)
e
-
a
~(el-eO)
~+(b,e),
ha
e
hb ha
nee -e
hb
hb 0 ha
e - ee
~+(b,e)
-
+ ~~ (e 1+e O-2e)}
(6.2.4)
we get
~_(b,e),
~+(a,e)
1
+
nee
oeeha
ehb - 0 e ha
e
hb
-e
~+(a,e)
ha
+
{~_(a,e)
-
1
hb 0 ha
e - ee
~ _ (b,
{~
e)}
- (a, e)
(b, e)}
i. e. ,
e
hb
e hb - 0 eeha
::; L (s)
n
::;
~+ (b,
e)
-
hb
e
ehb - 0 e ha
e
ha
°ee
ehb - 0 eha
e
~+ (b,
e) -
~+ (a, e)
oee
+
°e
hb 0 ha
e - ee
{~
(a, e)
- ~ - (b ,e) }
ha
hb 0 ha
e - ee
w+ (a, e) +
1
ehb - 0 eha
e
{~_(a,e)
(6.2.5)
133
The accuracy of these bounds is measured by their difference, viz.,
(6.2.6)
as compared with
for the SPRT procedure.
Clearly, closer bounds are given in the PSPRT procedure, since
Further,
e 1 -e O > 0,
ep (a, e)
-
<
0: -
h
where
become closer as
h
- ep- (b, e)
1 - e
n
>
is decreasing in n for all a, b, e and
8 1+e O
=> e <
Hence the bounds on L (e)
2
n
°
increases.
h~
1 - n e
L
x
-n
ha'
n
e
(e) ::; -.,.....,...-=----,---;hb I
ha I
e
n
-n e
n
(6.2.7)
•
e
So, for the lower bound of
L (8),
n
replace
e
and for the upper bound of
L (e),
n
replace
e
hb
ha
by
oeehb
by
nee
ha
in (3.3.11),
in (3.3.11).
This leads to
oeehb
ha
e
-;::--:h::""b-"h-a ep+ (b, e) hb ha
u e
-e
0 e -e
e
e
::; L (e)
n
(6.2.8)
134
The accuracy is again measured by the difference
o -1
h~ h
10 ee -e al
again decreasing as
n
{~_(a,e) - ~_(b,e)},
increases.
Since we shall be largely concerned with the risk when
is the prior distribution on
e
interest when
=
eo
G
=
(eo' 6 1),
(w, l-w)
the above bounds are of
when
or
(6.2.9)
=
6.3.
Bounds on the Average Sample Number.
Again, using Wald's bounds, and writing· L, L
for upper and lower
bounds of the D.C. function, let
~6
=
~[6 + ~(~)]
H6)
(6.3.1)
~'
=
_
6
=
6 - t(6 1+6 0) •
6
where
~[_ 6 + ~(-6)]
~(-6)
Pe (X)
Then if
h < 0,
and
Z = log
1
-~.,....
p e (X)'
with the notation of 3.1,
o
-
L
x
-n
(e)(b'+~6')
n
+ (l-L
Ee(Z}
x
-n
(6»a
Lx (e)b + (I-Lx (e»(a+~6)
-n
Ee(Z)
~ E(Nlx ,6) - n ::;-n
-n
(6.3.2)
whenever sampling is continued beyond stage
n.
(h < 0
=¢
Ee(Z)
>
0).
135
So for the conditional test
-
a'n - Lx
(e)(a'-b'-~')
nne
r(x),
if
-n
an' +
I
~e
::;; Ex (N e) ::;;
-n
_ _ _-n-'--;:,...-;'::7'"
Ee(Z)
so if
N
>
N
- -x
L
>
nand
h < 0,
(e)(a'+~e-b')
n
n
-n_=--.,--,.----
Ee(Z)
n,
lower bound for
Ex (Nle)
oe-eha'n
an' - - - - - - - (an' - b n' - ~e') + n,
hb'
ha'
Ee(Z) is
x
-n
oe
e
=
n -e
n
a - n~(x -e) + ~h~2 n
e +
• (a - b - ~')
n
(6.3.3)
from (6.2.7).
Hence, integrating, the lower bound for
E (Nle)
n
is given by
[c
n
-
J
n~
a~(i
b"
n
u
e
e
where
hb
n
n
oe(a-b-~')
e
J:
- 6)dF(i )
-e
ha
e
~h~2
exp{n~(xn- e) }dF(X.~
n~1
C
n
a" b"
n' n
are defined in (3.3.3).
,
(6.3.4)
136
The integrals in (6.3.4) have been already evaluated in 3.3 and 3.4.
get for
h
<
°
ha
E
(N
I
e)
=
n
+
_2_[
-{a
+
~nhil2
+
e
b ~')}{
\t) (
-n
hil2
0 hb ha (a-e
+ a, e) ee -e
Using a similar procedure, we get for
Ee(Z)
x
We
(upper bound for
a' +
n
~
e
-
E~(Nle»
1-e
oe
e
<
0,
when
N
>
+ (b., e)}
n,
is given by
hal
n
--=......,;;...--
hb'
h
\t)
n_ e
ha'
(a - b
+
~e)
(6.3.6)
n
and
The accuracy of the approximation to
can be measured by the difference
E (N I e)
n
in 3.4 when
h(e)
<
°
137
(6.3.8)
However, for comparison, a better measure of accuracy might be the
ratio
2{E (N/e) - E (N!e)}
n
--n
E (Nle) + E (Nle)
n
If
h
>
0,
-n
we get for the conditional test
£x
T(x )
--n
::;;--n
(e)(b'+~e') + (l-L (e»a'
n
x
n
--n
Ee(Z)
(6.3.9)
therefore
Ee(Z)
x
lower bound
=
ha'
l-e n
a' + ~ - ---.;;:;......;;--- (a - b +
n
e
hb' ha'
n e n_ e n
e
So, for test
T,
we get when h
n
- ~+(b,e)} +
(a-b+~e)
nee
>
°
ha {~ (a,e) - ~_(b,e)}
hb
-e
l;e)
(6.3.10)
138
Similarly, for
T(x )
-n
n -e
e
al
n
Ee(Z) x upper bound
hal
n
hb I hal
n e n_ e n
(a - b - ~ I)
(6.3.12)
e
e
and for test
En(Nle)
=n
+
n
+
T,
n
when
h > 0,
h~2
[-{a +
~nh~2
ha
+
~b
nee
ha
-e
(a-b-~~)}{~+(a,e) - ~+(b,e)}
e (a-b-~ eI)
n e
e
hb
-e
ha
The accuracy for
h > 0
is again measured by
En (Nle)
- E (Nle)
-n
in a result similar to (6.3:8)
6.4.
Numerical Results.
As one would expect, the bounds on $L_n(\theta)$ and on $E_n(N \mid \theta)$ narrow as $n$ increases, a reflection of the increasing probability of stopping observations at stage $n$ as $n$ increases; since this event has a probability whose value is known exactly, the information we have on the O.C. function and on the ASN is correspondingly more complete.
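The narrowing can also be checked by simulation.  The sketch below is illustrative only (the function name, seed and replication count are mine): it runs a PSPRT for a normal mean with $\theta_0 = 0$, $\theta_1 = 1$, $a = -b = 2$, taking the first $n_0$ observations in one stage and then sampling singly, and estimates the O.C. value and ASN that the bounds in Table 6.4.1 bracket.

```python
import numpy as np

rng = np.random.default_rng(0)

def pspr_trial(theta, n0, a=2.0, b=-2.0, theta0=0.0, theta1=1.0):
    """One PSPRT run: a first stage of n0 observations, then one at a time.
    Increments are the normal-mean log-likelihood ratios
    Z_i = Delta*(x_i - (theta0+theta1)/2).  Returns (accepted_H0, N)."""
    delta, mid = theta1 - theta0, 0.5 * (theta0 + theta1)
    s, n = 0.0, 0
    while True:
        k = n0 if n == 0 else 1              # whole first stage, then singly
        s += delta * (rng.normal(theta, 1.0, k) - mid).sum()
        n += k
        if s <= b:
            return True, n                   # accept H0
        if s >= a:
            return False, n                  # accept H1

for n0 in (1, 5, 10):
    res = [pspr_trial(1.0, n0) for _ in range(20000)]
    print(n0, np.mean([h for h, _ in res]), np.mean([n for _, n in res]))
```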
The following tables show some of the results of numerical analysis.  One interesting case, when $n = 1$, gives bounds for a Wald SPRT procedure, and (6.2.6) shows that they are closer than those given by Wald.  However, the factor $\varphi_-(a,\theta) - \varphi_-(b,\theta)$ is generally too close to 1 for the improvement to be at all substantial.  Only if $a$ is not too large, say $a \le 2$, and $\Delta$ is not too small, say $\Delta \ge .5$, is any noticeable improvement observed, and Table 6.4.1 shows this for $n = 1$, $a = -b = 2$, $\Delta = 1$.  (The factor here is … .)
Comparing the differences in bounds in Table 6.4.1, the following is of interest.  When $\theta = \theta_0$ or $\theta = \theta_1$, $\bar L(\theta) - \underline L(\theta)$ improves by .0055, in .070, and $\bar E(N \mid \theta) - \underline E(N \mid \theta)$ improves by .196, in 2.485.  When $\theta = \theta_i + (-1)^i\Delta/3$, $\bar L(\theta) - \underline L(\theta)$ improves by .0073, in .143, and $\bar E(N \mid \theta) - \underline E(N \mid \theta)$ improves by .450, in 8.823.  (Here "improves by $x$ in $y$" means the Wald-SPRT difference narrows by $x$, to the PSPRT value $y$ at $n = 1$.)
140
TABLE 6.4.1.
Bounds for
o.c.
function and A.S.N. in PSPRT and SPRT
a = -b = 2,
-n
Ln (8) Difference
-n
.1282
.1251
.0720
.0337
.0756
.0701
.0294
.0120
2.810 5.491
2.902 5.387
5.652 6.696
10.255 10.679
2.681
2.485
1.044
.424
.3903
.3865
.3354
.2866
.1504
.1431
.0898
.0630
.900 10.173
1.122 9.945
4.827 10.359
9.851 13.735
9.273
8.823
5.532
3.884
8-8 0 L (8)
n
SPRT: Wa1d' s
Approximations
1
1
1
5
10
1
1
.0526
.0550
.0426
.0217
.667
.667
.667
.667
.2399
.2434
.2456
.2236
SPRT: Wa1d's
Approximations
1
5
10
8 1-8 0 = 1
E (N 18)
The difference in bounds drops noticeably as
n
En (N 18) Difference
increases.
TABLE 6.4.2.  Bounds in PSPRT; a = 4, b = -2.5, θ₁ - θ₀ = .25.

n    θ-θ₀   L̲ₙ(θ)   L̄ₙ(θ)   Diff.    E̲ₙ(N|θ)   Ēₙ(N|θ)   Diff.
1    0      .9829    .9862    .0033     76.34     83.80     7.46
50   0      .9834    .9864    .0030     87.12     92.49     5.37
1    .083   .8248    .8438    .0190    127.39    159.00    31.61
50   .083   .8233    .8410    .0177    137.42    165.54    28.12
1    .167   .3354    .3667    .0313    148.28    187.68    39.40
50   .167   .3251    .3521    .0270    157.03    192.38    35.35
1    .25    .0661    .0810    .0149    110.67    120.57     9.90
50   .25    .0569    .0673    .0104    115.63    123.94     8.31
TABLE 6.4.3.  Bounds in PSPRT; a = 4, b = -2.5, θ₁ - θ₀ = .50.

n    θ-θ₀   L̲ₙ(θ)   L̄ₙ(θ)   Diff.    E̲ₙ(N|θ)   Ēₙ(N|θ)   Diff.
1    0      .9827    .9887    .0060    19.05     22.96      3.91
15   0      .9838    .9890    .0052    23.11     25.63      2.52
30   0      .9884    .9915    .0031    33.99     35.31      1.32
1    .167   .8185    .8550    .0365    30.01     45.88     15.87
15   .167   .8181    .8511    .0330    34.00     47.57     13.57
30   .167   .8229    .8505    .0276    44.21     55.13     10.92
1    .333   .3114    .3716    .0602    34.61     54.27     19.66
15   .333   .3012    .3508    .0496    38.24     55.28     17.04
30   .333   .2753    .3143    .0390    47.72     61.56     13.84
1    .500   .0541    .0812    .0271    27.55     32.58      5.03
15   .500   .0455    .0625    .0170    29.61     33.57      3.96
30   .500   .0284    .0371    .0087    37.69     40.03      2.34
With the same test boundaries $a$ and $b$, but $\Delta$ double its value in Table 6.4.2, Table 6.4.3 shows a proportionally more spectacular decrease in the difference in bounds as $n$ increases.  For the O.C. function, the factor $\varphi_-(a,\theta) - \varphi_-(b,\theta)$ would suggest this to be so; for given $a$ and $n$, it decreases as $\Delta$ increases.
A final example illustrates how the upper bound $\bar E_n(N \mid \theta)$ may decrease initially as $n$ increases, before beginning to increase with $n$.  Also, the lower bound for the error probabilities may increase initially.

TABLE 6.4.4a.  Behavior of Ēₙ(N|θ) as n increases; a = -b = … .

n             1        5        15      30
θ-θ₀ = 0      25.945   25.935   27.77   36.44
θ-θ₀ = .333   46.395   46.350   47.50   54.64

TABLE 6.4.4b.  Upper bounds ᾱₙ of error probabilities as n increases.

n     1        5        15       30
ᾱₙ    .03176   .03180   .02857   .01909
6.5.  Bounds on the Risk of a P.S.P.R.T.
The bounds calculated earlier for the O.C. function and ASN of a PSPRT can be used to calculate bounds for the risk.  Suppose $\theta_0$ and $\theta_1$ have prior probabilities $w$ and $1-w$; then for cost $c$ and unit loss, the true risk lies between

$$w\{1 - \bar L(\theta_0)\} + (1-w)\underline L(\theta_1) + c\big[w\,\underline E(N \mid \theta_0) + (1-w)\,\underline E(N \mid \theta_1)\big]$$

and

$$w\{1 - \underline L(\theta_0)\} + (1-w)\bar L(\theta_1) + c\big[w\,\bar E(N \mid \theta_0) + (1-w)\,\bar E(N \mid \theta_1)\big].$$
Of particular interest are bounds for the a.m. risk.  The procedure used earlier to yield the Chernoff nominal a.m. risk and the bounds developed here will not give bounds for the true a.m. risk.  All that is known is that the Chernoff nominal a.m. risk is usually close to the nominal a.m. risk produced by the iterative methods of Chapter II.  The tables below, therefore, give bounds for the risk when the boundaries of the test procedure are those of (5.2.6), viz.,

$$a = \log\Big\{\frac{w}{1-w}\cdot\frac{1 - 2c/\Delta^2}{2c/\Delta^2}\Big\}, \qquad b = \log\Big\{\frac{w}{1-w}\cdot\frac{2c/\Delta^2}{1 - 2c/\Delta^2}\Big\},$$
so that $a$ and $b$ are given by

$$\log\frac{w}{1-w} \;\pm\; \log\frac{1 - 2c/\Delta^2}{2c/\Delta^2}\,. \tag{6.5.1}$$

These boundaries arise from the Chernoff nominal error probabilities, and yield the Chernoff a.m. risk.
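As a quick check on (6.5.1), the boundaries in Table 6.5.1 can be reproduced directly.  A minimal sketch (the function name is mine):

```python
import math

def chernoff_boundaries(w, c, delta):
    """Boundaries of (5.2.6)/(6.5.1): log(w/(1-w)) +/- log((1-q)/q),
    with q = 2c/delta^2 the Chernoff nominal error probability."""
    q = 2.0 * c / delta**2
    base = math.log(w / (1.0 - w))
    spread = math.log((1.0 - q) / q)
    return base + spread, base - spread      # (a, b)

print(chernoff_boundaries(0.5, 0.0001, 0.5))   # a = -b = 7.1301, as in Table 6.5.1
```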
The optimum value of $n$, minimizing the approximate average excess, is also shown for the case in which (using the notation of Chapter V) $\bar c_n \sim \Gamma(mn,\ mn/c)$ and $m = 25$.
TABLE 6.5.1.  Bounds on the Chernoff a.m. risk; w = .5, c̄ₙ ~ Γ(mn, mn/c), m = 25.

Δ = .5, c = .0001, a = -b = 7.1301, optimum n = 13:

n      Lower bound   Chernoff a.m. risk   Upper bound
1      .006231       .006495              .006856
25     .006269       .006518              .006860
50     .006906       .007064              .007281
100    .010406       .010443              .010494

Δ = .5, c = .001, a = -b = 4.8203, optimum n = 8:

n      Lower bound   Chernoff a.m. risk   Upper bound
1      .04328        .04595               .04973
15     .04366        .04615               .04970
25     .04575        .04775               .05059
50     .05889        .05980               .06109
100    .10162        .10178               .10202

Δ = .5, c = .01, a = -b = 2.4423, optimum n = 3:

n      Lower bound   Chernoff a.m. risk   Upper bound
1      .2144         .2441                .2907
5      .2170         .2452                .2892
10     .2319         .2547                .2903
15     .2557         .2736                .3017
25     .3196         .3309                .3486

Δ = .25, c = .0001, a = -b = 5.7414, optimum n = 33:

n      Lower bound   Chernoff a.m. risk   Upper bound
1      .02087        .02146               .02215
100    .02130        .02180               .02239
200    .02519        .02544               .02575
300    .03651        .03659               .03668

Δ = .25, c = .001, a = -b = 3.4095, optimum n = 16:

n      Lower bound   Chernoff a.m. risk   Upper bound
1      .1281         .1341                .1422
25     .1283         .1342                .1421
50     .1314         .1366                .1434
150    .1814         .1834                .1861
250    .2626         .2634                .2644

Δ = .25, c = .01, a = -b = .7538, optimum n = 3:

n      Lower bound   Chernoff a.m. risk   Upper bound
1      .2990         .4068                .5247
10     .3582         .4256                .4992
15     .3922         .4476                .5082
25     .4628         .5040                .5491
50     .6566         .6814                .7084
The above data show:

i)  The bounds are proportionally closer when $n$ is larger, for any given cost.

ii)  The difference narrows considerably as $n$ increases, and becomes very narrow as the risk $\to cn$.  This is to be expected, because as the contribution from the term $cn$ begins to dominate, the information on the ASN becomes correspondingly more complete.

iii)  The upper bound for the risk again decreases initially as $n$ increases, in many instances; this is reflected in similar behavior noted earlier in upper bounds for the ASN.

iv)  The (approximate) optimal values of $n$ minimizing the average excess risk for unknown cost are generally such that the bounds above have not altered substantially beyond their values for $n = 1$; a slight sharpening of bounds is all that it seems possible to claim.
This last point is perhaps important.  The approximate optimal values of $n$ for minimizing average excess are those found earlier in Chapter V using approximate methods, for $m = 25$, where the cost has a Gamma distribution.  Since the optimal $n$ values did not appear to vary much with $m$, the choice of $m$ is not too crucial.  It seems intuitive, however, that the optimal $n$ would lie in the range of $n$ for which the risk function does not begin to increase markedly.  Consequently, the PSPRT bounds for the true risk when the cost is unknown are not going to be much narrower than for a Wald SPRT with the same test boundaries, except in the cases noted earlier (i.e., $a = -b \le 2$, $\Delta \ge .5$, say).
CHAPTER VII

A MULTIVARIATE EXTENSION OF A SEQUENTIAL DISCRIMINATION PROCEDURE
7.1.  Introduction.
In this chapter, we shall extend a procedure developed by Baker [3]
and Hall [9] as a sequential analog of Stein's two-stage test [17] of
hypotheses about the mean of a normal population with unknown variance
and given bounds on the error probabilities.
The notation is based on
Hall's, and hence differs from that used in earlier chapters.
Let $\mu$ and $\sigma^2$ be the mean and variance of the population respectively, both unknown.  We wish to test the composite hypotheses

$$H_0\colon\ \mu \le 0,\ \sigma > 0 \qquad\text{vs}\qquad H_1\colon\ \mu \ge \Delta > 0,\ \sigma > 0,$$

with error probabilities bounded above by $\alpha$ and $\beta$.  When no bound on $\sigma$ is known, a fixed sample-size test does not exist, and the sequential $t$-test may be unsatisfactory because $H_1$ has to be stated in s.d. units.
In Hall's test procedure $T$, a first stage estimates $\sigma^2$ as in Stein's two-stage test, but sampling then proceeds one observation at a time.  The procedure relates closely to that of Baker [3]; compared with results by Stein's procedure, savings are possible in the ASN of $T$.  Hall obtained a sequence of upper and lower bounds on both the O.C. function and the ASN of $T$.
The extension discussed in this chapter is to the case of a multinormal population in which the covariance matrix $V$ is known except for a scalar multiplier.  When $V$ is generally unknown, the sequential $T^2$-test can be used, but no approximations to the O.C. function or to the ASN exist.  (See Jackson and Bradley, 1961 [10].)
Suppose $X_1, X_2, \ldots$ are independent $p \times 1$ random vectors, multinormally distributed with common mean $\mu$ and variance-covariance matrix $V$, and it is known that either $\mu = \mu_0$ or $\mu = \mu_1$.  If $V$ is known, then Wald's SPRT can be applied as follows.  Let

$$Z_i = X_i - \tfrac{1}{2}(\mu_1 + \mu_0), \qquad a = \log A, \qquad b = \log B;$$

if $b < \big(\sum_{i=1}^N Z_i\big)'V^{-1}(\mu_1 - \mu_0) < a$, take a further observation;

if $\big(\sum_{i=1}^N Z_i\big)'V^{-1}(\mu_1 - \mu_0) \le b$, assign to the population with mean $\mu_0$;

if $\big(\sum_{i=1}^N Z_i\big)'V^{-1}(\mu_1 - \mu_0) \ge a$, assign to the population with mean $\mu_1$.
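A minimal sketch of this decision rule (the function name and the "continue" return convention are mine; the statistic and boundaries are as stated above):

```python
import numpy as np

def mv_sprt(xs, mu0, mu1, V, A, B):
    """Wald SPRT between mean mu0 and mu1 of a p-variate normal with known
    covariance V, cumulating (sum Z_i)' V^{-1} (mu1 - mu0) with
    Z_i = x_i - (mu1 + mu0)/2, as in the rule above."""
    a, b = np.log(A), np.log(B)
    w = np.linalg.solve(V, mu1 - mu0)        # V^{-1}(mu1 - mu0), computed once
    s = 0.0
    for n, x in enumerate(xs, start=1):
        s += (x - 0.5 * (mu1 + mu0)) @ w
        if s <= b:
            return "mu0", n                  # assign to population with mean mu0
        if s >= a:
            return "mu1", n                  # assign to population with mean mu1
    return "continue", len(xs)               # undecided on the data supplied
```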
Frequently the elements of $V$ are not known in full, and here we consider the case in which it is known that $V = \lambda V_0$, where $V_0$ is known, but the positive scalar $\lambda$ is replaced by an estimator $\hat\lambda$ based on $m$ observations.  These $m$ observations come either from the first stage of the procedure or from a previous experiment.
The above SPRT procedure is then carried out with $V$ replaced by $\hat\lambda V_0$; $a$ and $b$ are replaced by quantities such that the error probabilities are bounded by the required values.
The properties of this test, as well as the procedure itself, are almost identical with those of Hall's in the univariate case, in which the variance $\sigma^2$ is replaced by an estimator $s^2$.  The O.C. function has the same form, provided $\mu = \theta\mu_1 + (1-\theta)\mu_0$, and for the multinormal problem, $h(\theta) = -2(\theta - \tfrac{1}{2})$.  The approximations to the O.C. function and to the A.S.N. function are then notationally the same as those demonstrated by Hall and Baker, except that we have $p\nu$ (or $pf$) degrees of freedom instead of $\nu$ (or $f$).  Baker's and Hall's tables can be used with this modification.
In 7.2, the test procedure is presented, and in 7.3 and 7.4 the O.C. function and ASN approximations are discussed.
It should be stated that Hall's and Baker's procedures apply whether the estimate $s^2$ of the variance $\sigma^2$ is based upon a previous experiment or upon the first $m$ observations of a two-stage test.  In the latter case, however, it should be pointed out that not all the information is used, since Hall's approximations assume that the test does not terminate with the first stage.  If it does so terminate after $m$ observations, it would seem appropriate to study an analog of the PSPRT procedure of Chapter III, replacing $\sigma^2$ by $s^2$.  A discussion of this possibility is presented in 7.5.
In the sequel, some familiarity with the notation and content of
Sections 1-5 of Hall's paper will be assumed.
7.2.  The Test Procedure.
With the above notation, the maximum likelihood estimator of $\lambda$ based on $m$ observations is

$$\hat\lambda = \frac{1}{mp}\sum_{i=1}^m (X_i - \bar X)'V_0^{-1}(X_i - \bar X), \qquad \bar X = \frac{1}{m}\sum_{i=1}^m X_i. \tag{7.2.1}$$

Now, in terms of distributions,

$$\chi^2_{mp} = \chi^2_{p(m-1)} + \chi^2_p, \tag{7.2.2}$$

with the components on the right independent, so that $mp\hat\lambda/\lambda \sim \chi^2_{p(m-1)}$.  Hence

$$E\hat\lambda = \frac{m-1}{m}\,\lambda. \tag{7.2.3}$$

We shall use the unbiased estimator

$$\tilde\lambda = \frac{m}{m-1}\,\hat\lambda, \tag{7.2.4}$$

which is distributed as a $(\lambda/\nu)\chi^2_\nu$ variable, with

$$\nu = p(m-1). \tag{7.2.5}$$

(In the univariate case, $\nu = m - 1$, as used by Hall.)
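A numerical sketch of the estimators (7.2.1) and (7.2.4) follows; the function name and the Monte Carlo check are mine, the check merely illustrating the unbiasedness $E\tilde\lambda = \lambda$ derived above.

```python
import numpy as np

rng = np.random.default_rng(1)

def lambda_estimators(X, V0):
    """MLE (7.2.1) and unbiased estimator (7.2.4) of lambda in V = lambda*V0."""
    m, p = X.shape
    R = X - X.mean(axis=0)
    lam_hat = np.einsum('ij,ij->', R @ np.linalg.inv(V0), R) / (m * p)
    return lam_hat, m * lam_hat / (m - 1)    # tilde ~ (lambda/nu)chi2_nu, nu = p(m-1)

# Monte Carlo check of E(lambda-tilde) = lambda, here with lambda = 2, V0 = I_3:
V0, lam, m = np.eye(3), 2.0, 10
tildes = [lambda_estimators(rng.multivariate_normal(np.zeros(3), lam * V0, size=m), V0)[1]
          for _ in range(5000)]
print(np.mean(tildes))                       # close to 2.0
```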
Using Hall's notation, let $T$ be the test based on the SPRT for known $V$ described in 7.1, but with $V = \lambda V_0$ replaced by $\tilde\lambda V_0$.
So, if

$$Z_n = \sum_{i=1}^n Z_i, \tag{7.2.6}$$

let

$$r_n(\lambda) = Z_n'(\lambda V_0)^{-1}(\mu_1 - \mu_0). \tag{7.2.7}$$

Then $T$ is:

stop and accept $H_0$ (decision $d_0$) if $r_n(\tilde\lambda) \le b_m = \log B_m$;

stop and accept $H_1$ (decision $d_1$) if $r_n(\tilde\lambda) \ge a_m = \log A_m$;

otherwise continue sampling.
Analogous to Hall's $T(s, \sigma)$, define $T(\tilde\lambda, \lambda)$ as the conditional SPRT of $\mu = \mu_0$ vs $\mu = \mu_1$, given $\tilde\lambda$, with boundaries

$$\tilde a_m = \frac{\tilde\lambda}{\lambda}\,a_m = \frac{\tilde\lambda}{\lambda}\log A_m \qquad\text{and}\qquad \tilde b_m = \frac{\tilde\lambda}{\lambda}\,b_m = \frac{\tilde\lambda}{\lambda}\log B_m; \tag{7.2.8}$$

then it is seen that we decide according as $r_n(\lambda) \ge \tilde a_m$ or $r_n(\lambda) \le \tilde b_m$, and since $r_n(\tilde\lambda) = (\lambda/\tilde\lambda)\,r_n(\lambda)$, decisions based on $T(\tilde\lambda, \lambda)$ and on $T$ are the same if $\tilde\lambda$ is computed from the first $m$ observations and $n \ge m$.  Hence

$$E\,\Pr\{d_i \text{ from test } T(\tilde\lambda,\lambda) \mid \tilde\lambda, \lambda, \mu\} = \Pr\{d_i \text{ from test } T \mid \lambda, \mu\}. \tag{7.2.9}$$
But using Wald's bounds on the error probabilities,

$$\Pr\{d_1 \text{ from test } T(\tilde\lambda,\lambda) \mid \tilde\lambda, \lambda, \mu = \mu_0\} < A_m^{-\tilde\lambda/\lambda}, \qquad \Pr\{d_0 \text{ from test } T(\tilde\lambda,\lambda) \mid \tilde\lambda, \lambda, \mu = \mu_1\} < B_m^{\tilde\lambda/\lambda}. \tag{7.2.10}$$
Hence, since $\tilde\lambda \sim (\lambda/\nu)\chi^2_\nu$, we get

$$E\,\Pr\{d_1 \text{ from test } T(\tilde\lambda,\lambda) \mid \lambda, \mu = \mu_0\} < E\exp(-a_m\tilde\lambda/\lambda) = \Big(1 + \frac{2a_m}{\nu}\Big)^{-\nu/2} \le \alpha$$

if

$$a_m = \tfrac{1}{2}\nu(\alpha^{-2/\nu} - 1) = (-\log\alpha)\Big[1 + \frac{-\log\alpha}{\nu} + O(\nu^{-2})\Big]. \tag{7.2.11}$$

Similarly,

$$\Pr\{d_0 \text{ from } T(\tilde\lambda,\lambda) \mid \lambda, \mu = \mu_1\} < \beta$$

if

$$b_m = -\tfrac{1}{2}\nu(\beta^{-2/\nu} - 1) = -(-\log\beta)\Big[1 + \frac{-\log\beta}{\nu} + O(\nu^{-2})\Big]. \tag{7.2.12}$$
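A sketch of the boundary inflation (7.2.11) (names and parameter values mine); the second printed value confirms that $a_m = \tfrac{1}{2}\nu(\alpha^{-2/\nu} - 1)$ makes $(1 + 2a_m/\nu)^{-\nu/2}$ exactly $\alpha$, and the third shows the two-term expansion:

```python
import math

def a_m(alpha, nu):
    """Inflated upper boundary of (7.2.11): a_m = (nu/2)(alpha^(-2/nu) - 1),
    chosen so that E exp(-a_m*lt/l) = (1 + 2a_m/nu)^(-nu/2) equals alpha
    when lt/l ~ chi2_nu/nu."""
    return 0.5 * nu * (alpha ** (-2.0 / nu) - 1.0)

alpha, nu = 0.05, 30                  # e.g. p = 3, m = 11 gives nu = p(m-1) = 30
am = a_m(alpha, nu)
print(am)
print((1.0 + 2.0 * am / nu) ** (-nu / 2.0))             # reproduces alpha exactly
print(-math.log(alpha) * (1.0 - math.log(alpha) / nu))  # two-term expansion
```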
We shall demonstrate shortly that the bounds $\alpha$ and $\beta$ above also hold for the composite hypotheses

$$H'_0\colon\ \mu = \theta\mu_1 + (1-\theta)\mu_0,\ \theta \le 0; \qquad H'_1\colon\ \mu = \theta\mu_1 + (1-\theta)\mu_0,\ \theta \ge 1.$$
7.3.  The O.C. Function.
We first point out that, in this multivariate problem, the form of the O.C. function, neglecting excess over the boundaries, is given by

$$L(\theta) = \Pr\{d_0 \text{ from } T(\tilde\lambda,\lambda) \mid \tilde\lambda, \lambda, \mu\} = \frac{A_m^{h\tilde\lambda/\lambda} - 1}{A_m^{h\tilde\lambda/\lambda} - B_m^{h\tilde\lambda/\lambda}} \qquad (h \ne 0) \tag{7.3.1}$$

for the conditional test, given $\tilde\lambda$; for $\mu = \theta\mu_1 + (1-\theta)\mu_0$ there exists $h = h(\theta) = -2(\theta - \tfrac{1}{2})$ such that (7.3.1) is satisfied.  We also remark that $L(\theta)$ is monotonic decreasing in $\theta$, so that the bounds $\alpha$ and $\beta$ hold on the error probabilities for $H'_0$ and $H'_1$ respectively, as stated earlier.
So

$$\Pr\{d_1 \text{ from } T \mid \lambda, \mu\} < \alpha \quad\text{for all } \theta \le 0, \qquad \Pr\{d_0 \text{ from } T \mid \lambda, \mu\} < \beta \quad\text{for all } \theta \ge 1.$$
Successive upper and lower bounds on the O.C. function of test $T$ follow as in Hall's paper.  Taking expectations in (7.3.1),

$$E\,\Pr\{d_0 \mid \lambda, \mu\} = E\bigg[\frac{1 - \exp(-a_m h\tilde\lambda/\lambda)}{1 - \exp\{-(a_m - b_m)h\tilde\lambda/\lambda\}}\bigg] \quad\text{if } h > 0,\ \text{i.e., if } \theta < \tfrac{1}{2},$$

$$= E\Big[\{1 - \exp(-a_m h\tilde\lambda/\lambda)\}\Big\{\sum_{r=0}^\infty \exp(-r(a_m - b_m)h\tilde\lambda/\lambda)\Big\}\Big] \tag{7.3.3}$$

$$= 1 - \Big(1 + \frac{2a_m h}{\nu}\Big)^{-\nu/2} + \Big(1 + \frac{2(a_m - b_m)h}{\nu}\Big)^{-\nu/2} - \Big(1 + \frac{2(2a_m - b_m)h}{\nu}\Big)^{-\nu/2} + \Big(1 + \frac{2(2a_m - 2b_m)h}{\nu}\Big)^{-\nu/2} - \cdots, \tag{7.3.4}$$

using (7.2.11) and (7.2.12).
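Because the terms of (7.3.4) alternate with decreasing magnitude, stopping after a positive term gives an upper bound, and after a negative term a lower bound; this is what the successive bounds below amount to.  A minimal sketch, assuming the reconstruction of (7.3.4) above (function name and arguments mine):

```python
def oc_bounds(a_m, b_m, h, nu, terms=10):
    """Successive bounds for E Pr{d0 | lam, mu} from the series (7.3.4) as
    reconstructed above (h > 0).  e(t) = E exp(-t*h*lt/l) = (1+2th/nu)^(-nu/2);
    stopping after a positive term gives an upper bound, after a negative
    term a lower bound (terms alternate with decreasing magnitude)."""
    e = lambda t: (1.0 + 2.0 * t * h / nu) ** (-nu / 2.0)
    s, out = 0.0, []
    for r in range(terms):
        s += e(r * (a_m - b_m))           # positive term -> upper bound
        upper = s
        s -= e(a_m + r * (a_m - b_m))     # negative term -> lower bound
        out.append((s, upper))
    return out

print(oc_bounds(3.2, -3.2, 1.0, 30)[:3])
```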
These results are notationally Hall's, with $h$ and $\nu$ appropriate to the multivariate problem being considered (his equations (9)).  If $h < 0$, Hall's equations (10) apply, and if $\beta = \alpha$, partial sums again give successive upper and lower bounds in the result (Hall's (11))

$$L(\theta) = \sum_{r=0}^\infty \{\cdots\}, \tag{7.3.5}$$

where $j = 0$ if $\theta < \tfrac{1}{2}$, and $j = 1$ if $\theta > \tfrac{1}{2}$.

Case $h = 0$ (i.e., $\theta = \tfrac{1}{2}$).
For this case, we evaluate in (7.3.1)

$$\lim_{h \to 0} \Pr\{d_0 \mid \tilde\lambda, \lambda, \mu\} = \lim_{h \to 0} \frac{a_m \exp(a_m h\tilde\lambda/\lambda)}{a_m \exp(a_m h\tilde\lambda/\lambda) - b_m \exp(b_m h\tilde\lambda/\lambda)}, \quad\text{using L'Hopital's rule,}$$

$$= \frac{a_m}{a_m - b_m}\,. \tag{7.3.6}$$
These results are independent of $\tilde\lambda$, and hold therefore for the O.C. function of $T$, i.e., for $\Pr\{d_0 \mid \lambda, \mu\}$.  Hall illustrates the rapid convergence of the series (7.3.5) when $\alpha = \beta = .05$ and $m = 16,\ 31$.  His remarks apply here, as does the table, modified in terms of $\theta$.
7.4.  The A.S.N.
Again, we follow Hall's method.  Under certain conditions, Wald's equation

$$E\Big(\sum_{i=1}^N Y_i\Big) = E(N)\,E(Y_1)$$

holds for a sequence $Y_1, Y_2, \ldots$ of i.i.d. random variables.  A simple extension gives the same result for multivariate variables, where each "observation" covers $p$ characters.

LEMMA 7.1.  If

a)  $Z_1, Z_2, \ldots$ is a sequence of independent, identically distributed $p \times 1$ random vectors;

b)  $E|Z_{1i}| < \infty$, $i = 1, 2, \ldots, p$;

c)  $N$ is a positive integer-valued r.v. such that (i) the event $\{N \le j\}$ and $Z_k$ are independent if $j < k$, and (ii) $E(N) < \infty$;

then

$$E\Big(\sum_{k=1}^N Z_k\Big) = E(N)\,E(Z_1).$$

Following Johnson's method [11], the proof consists in applying the univariate result to each component of $Z_1$.

Then from (7.2.6), where $Z_n = \sum_{i=1}^n Z_i$,

$$E\,r_N(\lambda) = E(N)\,E\{Z_1'V_0^{-1}(\mu_1 - \mu_0)\}/\lambda. \tag{7.4.1}$$
Again, using conditional expectations, and following Hall's approach,

$$E\,r_N(\lambda) = E[E(r_N(\lambda) \mid \tilde\lambda)] = E\Big[\sum_{i=0}^1 E\{r_N(\lambda) \mid \tilde\lambda, d_i\}\Pr(d_i \mid \tilde\lambda)\Big],$$

neglecting excess over the boundaries.  Hence if $h > 0$,

$$E\,r_N(\lambda) \approx a_m - (a_m - b_m)E[(\tilde\lambda/\lambda)\Pr(d_0 \mid \tilde\lambda)] = a_m - (a_m - b_m)E\Big[(\tilde\lambda/\lambda)\{1 - \exp(-a_m h\tilde\lambda/\lambda)\}\Big\{\sum_{r=0}^\infty \exp(-r(a_m - b_m)h\tilde\lambda/\lambda)\Big\}\Big], \tag{7.4.2}$$

using Wald's approximation

$$\Pr(d_0 \mid \tilde\lambda, \lambda, \mu) \approx \frac{\exp(a_m h\tilde\lambda/\lambda) - 1}{\exp(a_m h\tilde\lambda/\lambda) - \exp(b_m h\tilde\lambda/\lambda)}\,.$$
Hence

$$E\,r_N(\lambda) \approx a_m - (a_m - b_m)\big[E(\tilde\lambda/\lambda) - E\{(\tilde\lambda/\lambda)\exp(-a_m h\tilde\lambda/\lambda)\} + E\{(\tilde\lambda/\lambda)\exp(-(a_m - b_m)h\tilde\lambda/\lambda)\} - E\{(\tilde\lambda/\lambda)\exp(-(2a_m - b_m)h\tilde\lambda/\lambda)\} + E\{(\tilde\lambda/\lambda)\exp(-2(a_m - b_m)h\tilde\lambda/\lambda)\} - \cdots\big]$$

$$= b_m + (a_m - b_m)\Big[\Big(1 + \frac{2a_m h}{\nu}\Big)^{-1-\nu/2} - \Big(1 + \frac{2(a_m - b_m)h}{\nu}\Big)^{-1-\nu/2} + \Big(1 + \frac{2(2a_m - b_m)h}{\nu}\Big)^{-1-\nu/2} - \Big(1 + \frac{2(2a_m - 2b_m)h}{\nu}\Big)^{-1-\nu/2} + \cdots\Big]. \tag{7.4.3}$$
Combining with (7.4.1), we obtain for $h > 0$

$$E(N) \approx \frac{\lambda\,E\,r_N(\lambda)}{E\{Z_1'V_0^{-1}(\mu_1 - \mu_0)\}}\,, \tag{7.4.4}$$

which on the right is Hall's (16); partial sums of (7.4.3) give successive upper and lower bounds on the ASN of $T$.  For $h < 0$, interchange $a_m$ and $b_m$ in (7.4.3) and put $-h$ for $h$; in (7.4.4) interchange $\alpha$ and $\beta$.  With the same notation as in (7.3.5), putting $\alpha = \beta$ in (7.4.4) leads to an expansion of the form

$$\tfrac{1}{2}\nu(\alpha^{-2/\nu} - 1)\Big[1 - 2\sum_{r=1}^\infty \{\cdots\}\Big]. \tag{7.4.5}$$
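A sketch of the partial-sum evaluation of (7.4.3), under the reconstruction above (function name and arguments mine); it uses only the moment identity $E[(\tilde\lambda/\lambda)\exp(-t\tilde\lambda/\lambda)] = (1 + 2t/\nu)^{-1-\nu/2}$ for $\tilde\lambda/\lambda \sim \chi^2_\nu/\nu$:

```python
def e_rN(a_m, b_m, h, nu, terms=50):
    """E r_N(lam) for h > 0 under the reconstruction of (7.4.3) above, using
    E[(lt/l)exp(-t*lt/l)] = (1 + 2t/nu)^(-1-nu/2) for lt/l ~ chi2_nu/nu."""
    e1 = lambda t: (1.0 + 2.0 * t * h / nu) ** (-1.0 - nu / 2.0)
    s = sum(e1((r + 1) * a_m - r * b_m) - e1((r + 1) * (a_m - b_m))
            for r in range(terms))
    return b_m + (a_m - b_m) * s

# E(N) then follows from Wald's identity (7.4.1):
# E(N) = lam * E r_N(lam) / E{Z_1' V0^{-1} (mu1 - mu0)}.
print(e_rN(3.2, -3.2, 1.0, 30))
```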
These approximations involve the unknown $\lambda$.  At $\mu = \mu_0$ and $\mu = \mu_1$ (i.e., $\theta = 0$ and $\theta = 1$), we can use the bounds

$$\Pr\{d_1 \text{ from } T(\tilde\lambda,\lambda) \mid \tilde\lambda, \lambda, \mu_0\} < A_m^{-\tilde\lambda/\lambda} = \exp(-a_m\tilde\lambda/\lambda), \qquad \Pr\{d_0 \text{ from } T(\tilde\lambda,\lambda) \mid \tilde\lambda, \lambda, \mu_1\} < B_m^{\tilde\lambda/\lambda} = \exp(b_m\tilde\lambda/\lambda).$$
Since $E\,r_N(\lambda) = b_m + (a_m - b_m)E[(\tilde\lambda/\lambda)\Pr\{d_1 \text{ from } T(\tilde\lambda,\lambda) \mid \tilde\lambda, \lambda, \mu_0\}]$, (7.4.1) gives for $h = 1$ at $\mu = \mu_0$

$$E(N)\cdot(\mu_1 - \mu_0)'V_0^{-1}(\mu_1 - \mu_0)/\lambda \;>\; -2b_m - 2(a_m - b_m)E[(\tilde\lambda/\lambda)\exp(-a_m\tilde\lambda/\lambda)], \tag{7.4.6}$$

i.e.,

$$>\; \nu\big[\alpha^{-2/\nu} - 1 - 2\alpha(1 - \alpha^{2/\nu})\big] \quad\text{if } \beta = \alpha. \tag{7.4.7}$$

If $\mu = \mu_1$, then $\theta = 1$ and $h = -1$, and we get (7.4.6) with $a_m$ and $b_m$, $\alpha$ and $\beta$ interchanged.
There remains the special case $\mu = \tfrac{1}{2}(\mu_1 + \mu_0)$, where $h = 0$.  We consider

$$\{r_n(\lambda)\}^2 = \Big\{\sum_{i=1}^n z_i\Big\}^2 = \sum_{i=1}^n z_i^2 + \sum_{i \ne j} z_i z_j, \qquad z_i = Z_i'(\lambda V_0)^{-1}(\mu_1 - \mu_0).$$

Given $N = n$, the second sum has zero expectation, by the independence of $N$ and the $z_j$ for given $n$, and since $h = 0$ implies $E z_i = 0$.  Also, the distribution of $Z_1'V_0^{-1}(\mu_1 - \mu_0)$ is $N(0,\ \lambda(\mu_1 - \mu_0)'V_0^{-1}(\mu_1 - \mu_0))$.  Hence

$$E\{r_N(\lambda)\}^2 = E(N)\cdot(\mu_1 - \mu_0)'V_0^{-1}(\mu_1 - \mu_0)/\lambda. \tag{7.4.8}$$
Again,

$$E\{r_N(\lambda)\}^2 = E\Big[\sum_{j=0}^1 E[\{r_N(\lambda)\}^2 \mid \tilde\lambda, d_j]\,\Pr(d_j \text{ from } T(\tilde\lambda,\lambda) \mid \tilde\lambda, \lambda)\Big],$$

and, using (7.3.6) at $h = 0$,

$$E\{r_N(\lambda)\}^2 \approx \Big\{a_m^2 - (a_m^2 - b_m^2)\frac{a_m}{a_m - b_m}\Big\}E(\chi_\nu^4/\nu^2) = -a_m b_m(1 + 2/\nu), \tag{7.4.9}$$

so that

$$E(N \mid h = 0)\cdot(\mu_1 - \mu_0)'V_0^{-1}(\mu_1 - \mu_0) \approx \tfrac{1}{4}\lambda\nu^2(\alpha^{-2/\nu} - 1)(\beta^{-2/\nu} - 1)(1 + 2/\nu), \tag{7.4.10}$$

and if $\beta = \alpha$, this becomes $\tfrac{1}{4}\lambda\nu^2(\alpha^{-2/\nu} - 1)^2(1 + 2/\nu)$.
7.5.  Suggestions for Further Research.
We return to the univariate problem of Hall's paper, where the information arising from the statistic $\bar x_m = \frac{1}{m}\sum_{i=1}^m x_i$ at the $m$-th observation is not fully taken into consideration.  Suppose that the PSPRT procedure of Chapter III (in which $n$ is here replaced by $m$) is modified when the variance is unknown, so that $\sigma^2$ is replaced by $s^2$.  The test procedure $T_m$ is now as follows: with

$$r_n(s) = \frac{\Delta}{s^2}\sum_{i=1}^n (x_i - \tfrac{1}{2}\Delta),$$

stop and accept $H_0$ if $r_n(s) \le b_m$; stop and accept $H_1$ if $r_n(s) \ge a_m$; continue sampling if $b_m < r_n(s) < a_m$, $n \ge m$. (7.5.1)
Let $T_m(s^2, \bar x_m)$ be the conditional test given the first $m$ observations $(x_1, x_2, \ldots, x_m)$.  This will depend on $(x_1, \ldots, x_m)$ only through the sufficient statistics $\bar x_m$ and $s^2$.  Then (cf. (3.1.1), (3.1.7), (3.3.3)) $T_m(s^2, \bar x_m)$ is as follows: with

$$a'_m = a_m - r_m(s), \qquad b'_m = b_m - r_m(s), \tag{7.5.2}$$

$$r^*_n(s) = r_n(s) - r_m(s), \tag{7.5.3}$$

stop and accept $H_0$ if $r^*_n(s) \le b'_m$; stop and accept $H_1$ if $r^*_n(s) \ge a'_m$; continue sampling if $b'_m < r^*_n(s) < a'_m$.
If the inequality $b'_m < 0 < a'_m$ does not hold, then we decide on the $m$ observations of the first stage: if $a'_m \le 0$, $H_1$ is accepted; if $b'_m \ge 0$, $H_0$ is accepted.
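A sketch of the modified procedure (7.5.1)-(7.5.3) for data supplied as an array follows; the function name and the "continue" convention are mine, and the first-stage decision implements the rule just stated.

```python
import numpy as np

def pspr_unknown_var(xs, m, delta, a_m, b_m):
    """Sketch of (7.5.1)-(7.5.3): estimate sigma^2 by s^2 from the first m
    observations, decide at once if r_m(s) already crosses a boundary
    (equivalently a_m' <= 0 or b_m' >= 0), else continue one at a time."""
    xs = np.asarray(xs, dtype=float)
    s2 = xs[:m].var(ddof=1)                          # first-stage variance estimate
    r = delta * (xs[:m] - 0.5 * delta).sum() / s2    # r_m(s)
    if r >= a_m:
        return "H1", m                               # a_m' = a_m - r_m(s) <= 0
    if r <= b_m:
        return "H0", m                               # b_m' = b_m - r_m(s) >= 0
    for n in range(m, len(xs)):
        r += delta * (xs[n] - 0.5 * delta) / s2
        if r >= a_m:
            return "H1", n + 1
        if r <= b_m:
            return "H0", n + 1
    return "continue", len(xs)                       # undecided on the data supplied
```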
This analog of the PSPRT seems appropriate, since the probability, given $\sigma^2$, that observations terminate with the first stage can be exactly determined, and we thus have more information on the O.C. function and ASN.
Let $T_m(s^2, \bar x_m, \sigma^2)$ be the SPRT based upon the first stage, with known $\sigma$, defined by:

$$\tilde a_m = a'_m s^2/\sigma^2, \qquad \tilde b_m = b'_m s^2/\sigma^2; \tag{7.5.4}$$

stop and accept $H_0$ if $r^*_n(\sigma) \le \tilde b_m$; stop and accept $H_1$ if $r^*_n(\sigma) \ge \tilde a_m$; otherwise continue sampling.  If $\tilde b_m < 0 < \tilde a_m$ does not hold, stop and decide at the end of the first stage as in $T_m(s^2, \bar x_m)$.
It can be noted that $T_m(s^2, \bar x_m, \sigma^2)$ is the same as $T(\bar x_m)$ of 3.3, with boundaries $a_m s^2/\sigma^2$ and $b_m s^2/\sigma^2$ for the unconditional test.
Further, since $r^*_n(\sigma) = r^*_n(s)\cdot s^2/\sigma^2$, $T_m(s^2, \bar x_m, \sigma^2)$ has the same decision rule at each stage as does $T_m(s^2, \bar x_m)$.  The O.C. function of $T_m$ is given by

$$L = E\,E\,\Pr\{\text{accept } H_0 \mid \bar x_m, s^2\}, \tag{7.5.5}$$

where expectations are taken w.r.t. the distributions of the independent statistics $\bar x_m$ and $s^2$.
I.e.,

$$\Pr\{\text{accept } H_0 \mid \bar x_m, s^2\} = \begin{cases} 1 & \text{if } r^*_m(\sigma) \le \tilde b_m, \\[4pt] \dfrac{1 - \exp(h\tilde a_m)}{\exp(h\tilde b_m) - \exp(h\tilde a_m)} & \text{if } \tilde b_m < r^*_m(\sigma) < \tilde a_m, \\[4pt] 0 & \text{otherwise.} \end{cases} \tag{7.5.6}$$

Then in (3.3.7), we replace $a$ by $as^2/\sigma^2$, $b$ by $bs^2/\sigma^2$, and $n$ by $m$; taking expectations over the distribution of $s^2$ then gives the O.C. function. (7.5.7)
In this form (7.5.7) is unlikely to lead to an infinite series of
the form of Hall's equations (9), but further research on this approach
using bounds on the normal c.d.f. might yield results extending Hall's
procedure to the case in which the test terminates with the first stage.
The same approach may well be applied to the ASN.
APPENDIX I
LEMMA A.1.  Let $\pi_1, \pi_2, \ldots$ be a sequence of approximations to the root $\pi^*$ of the equation

$$\phi(\pi) = p^{-1} - 2\log\frac{1-\pi}{\pi} - \frac{1-2\pi}{\pi(1-\pi)} = 0,$$

such that

$$\pi_{n+1} = \pi_n - \frac{\phi(\pi_n)}{\phi'(\pi_n)}\,.$$

If, for some $k \ge 1$,

$$\pi_k = p - p^2(1 + 2\log p) + O(p^3\log p),$$

where $p$ is small, then for every $n > k$, $\pi_n$ has the same form as $\pi_k$.

PROOF:  [$\pi_1, \pi_2, \ldots$ are obtained by the Newton-Raphson process of iteration.  The condition $\pi_k > 0$ follows from the monotonicity of $\phi$ from $-\infty$ to $+\infty$ for $\pi$ in $(0, 1)$; as shown in 2.2, if $\pi_2 \le 0$, a new first approximation $\pi_1$ would be chosen.  If $\pi_2 > 0$, however, rapid convergence follows (see Fig. 2.2.2).]
By induction, assume the result true for $\pi_n$.  Then, since $\phi'(\pi) = \{\pi^2(1-\pi)^2\}^{-1}$,

$$\pi_{n+1} = \pi_n - \{\pi_n^2(1-\pi_n)^2\}\Big[\frac{1}{p} - 2\log\frac{1-\pi_n}{\pi_n} - \frac{1-2\pi_n}{\pi_n(1-\pi_n)}\Big]$$

$$= p - p^2(1+2\log p) - \{p - p^2(1+2\log p)\}^2\{1 - p + p^2(1+2\log p)\}^2\Big[\frac{1}{p} - 2\log\frac{1 - p + p^2(1+2\log p)}{p - p^2(1+2\log p)} - \frac{1 - 2p + 2p^2(1+2\log p)}{\{p - p^2(1+2\log p)\}\{1 - p + p^2(1+2\log p)\}}\Big] + O(p^3\log p)$$

$$= p - p^2(1+2\log p) - p^2\{-2\log p + 2\log p + 2p - 2p + O(p)\} + O(p^3\log p)$$

$$= p - p^2(1 + 2\log p) + O(p^3\log p),$$

which is of the same form.  Q.E.D.
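The iteration is straightforward to run numerically.  A minimal sketch (function name and the choice of $p$ are mine), using $\phi'(\pi) = \{\pi^2(1-\pi)^2\}^{-1}$ as in the proof, and comparing the converged root with the expansion $p - p^2(1 + 2\log p)$:

```python
import math

def newton_pi(p, iters=8):
    """Newton-Raphson for phi(pi) = 1/p - 2*log((1-pi)/pi) - (1-2pi)/(pi(1-pi)),
    using phi'(pi) = 1/(pi^2 (1-pi)^2) as in the proof, started at pi_1 = p."""
    pi = p
    for _ in range(iters):
        phi = 1.0/p - 2.0*math.log((1.0 - pi)/pi) - (1.0 - 2.0*pi)/(pi*(1.0 - pi))
        pi -= pi*pi * (1.0 - pi)**2 * phi       # pi - phi(pi)/phi'(pi)
    return pi

p = 0.01
print(newton_pi(p))                             # root pi*
print(p - p*p*(1.0 + 2.0*math.log(p)))          # expansion of Lemma A.1, close for small p
```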
APPENDIX II
ON CHOOSING A SUITABLE ESTIMATOR OF COST
The choice of a PSPRT procedure in Chapter V is based largely on mathematical convenience, as is the choice of $\bar c_n$ as a suitable estimator.  Research into the choice of a "best" estimator, in some sense, has not been done, to the writer's knowledge.  The following approach may be useful.

We consider a class of test procedures between two simple hypotheses $H_0$ and $H_1$ about the parameter $\theta$, where the i.i.d. sequence of observations is indexed by $\theta$.  To distinguish the present problem from the loss and risk considered earlier, we refer to "cost-loss" and "cost-risk".

The cost-loss refers to incorrect use of an estimator $\hat c_n$ of the cost per observation $c$, where $\hat c_n$ is a function of the observed cost on the first $n$ individuals, i.e., $\hat c_n$ is a function of $\bar c_n$.  Using the notation of 5.2, let $R(w, n, \hat c_n \mid c)$ be the a.m. risk in a class of test procedures, each procedure being well-defined when $w$, $n$ and $c$ are given.  Define the cost-loss to be

$$L(\hat c_n \mid c) = R(w, n, \hat c_n \mid c) - R(w, n, c \mid c). \tag{A.2.1}$$
Thus the cost-loss is the excess, as, for example, in (5.3.8).  Ideally, the estimator $\hat c_n$ should be chosen to minimize $E\,L(\hat c_n \mid c)$, averaged over the distribution of $\bar c_n$; but $c$ is unknown.  However, if we can postulate a prior distribution $\Pi$ for $c$, then we try to choose $\hat c_n$ to minimize the cost-risk

$$\int\!\!\int L(\hat c_n \mid c)\,dF(\bar c_n \mid c)\,d\Pi(c),$$

where the integrals are assumed to exist, and the conditions for Fubini's theorem are assumed to apply.  Then, effectively, we try to minimize the posterior cost-risk of $c$ over the posterior distribution of $c$, given $\bar c_n$.  This may lead to a Bayes estimator $\hat c_n$ of the true cost.
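As one hypothetical concrete instance of this program (an assumption for illustration, not from the text): if the observed average cost is modelled as $\bar c_n \sim \Gamma(k,\ \text{scale } c/k)$, so that $E\bar c_n = c$, an inverse-gamma prior on the scale $c$ is conjugate, and the posterior mean is available in closed form.  All names and parameter values below are illustrative.

```python
def bayes_cost_estimate(cbar_n, k, alpha0, beta0):
    """Hypothetical concrete case (an assumption, not from the text): if
    cbar_n ~ Gamma(shape k, scale c/k), so E(cbar_n) = c, then an
    inverse-gamma prior IG(alpha0, beta0) on c is conjugate and the
    posterior mean of c is (beta0 + k*cbar_n)/(alpha0 + k - 1)."""
    return (beta0 + k * cbar_n) / (alpha0 + k - 1.0)

print(bayes_cost_estimate(cbar_n=0.011, k=250, alpha0=3.0, beta0=0.02))
```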
BIBLIOGRAPHY

[1]  Anderson, T.W., "A modification of the sequential probability ratio test to reduce the sample size", Annals of Mathematical Statistics, Volume 31 (1960), pp 165-197.

[2]  Armitage, P., "Sequential analysis with more than two alternative hypotheses, and its relation to discriminant function analysis", Journal of the Royal Statistical Society, Series B, Volume 12 (1950), pp 137-144.

[3]  Baker, A.G., "Properties of some tests in sequential analysis", Biometrika, Volume 37 (1950), pp 334-346.

[4]  Bhate, D.H., "Approximations to the distribution of sample size for sequential tests. I. Tests of simple hypotheses", Biometrika, Volume 46 (1959), pp 130-138.

[5]  Billard, L., "Sequential tests for two-sided alternative hypothesis", University of New South Wales, Ph.D. Thesis, 1968.

[6]  Burkholder, D.L. and Wijsman, R.A., "Optimum properties and admissibility of sequential tests", Annals of Mathematical Statistics, Volume 34 (1963), pp 1-17.

[7]  Chernoff, H., "Sequential design of experiments", Annals of Mathematical Statistics, Volume 30 (1959), pp 755-770.

[8]  Feller, W., An Introduction to Probability Theory and Its Applications, Volume I, Second Edition, John Wiley and Sons, Inc., New York, 1958.

[9]  Hall, W.J., "Some sequential analogs of Stein's two-stage test", Biometrika, Volume 49 (1962), pp 367-378.

[10]  Jackson, J.E. and Bradley, R.A., "Sequential χ²- and T²-tests", Annals of Mathematical Statistics, Volume 32 (1961), pp 1063-1077.

[11]  Johnson, N.L., "A proof of Wald's theorem on cumulative sums", Annals of Mathematical Statistics, Volume 30 (1959), pp 1245-1247.

[12]  Johnson, N.L., "Sequential analysis: a survey", Journal of the Royal Statistical Society, Series A, Volume 124 (1961), pp 372-411.

[13]  Kemp, K.W., "Formulae for calculating the operating characteristic and the average sample number of some sequential tests", Journal of the Royal Statistical Society, Series B, Volume 20 (1958), pp 379-386.

[14]  Lehmann, E.L., Testing Statistical Hypotheses, John Wiley and Sons, Inc., New York, 1959.

[15]  Page, E.S., "An improvement to Wald's approximation for some properties of sequential tests", Journal of the Royal Statistical Society, Series B, Volume 16 (1954), pp 136-139.

[16]  Sobel, M. and Wald, A., "A sequential decision procedure for choosing one of three hypotheses concerning the unknown mean of a normal distribution", Annals of Mathematical Statistics, Volume 20 (1949), pp 502-522.

[17]  Stein, C., "A two-sample test for a linear hypothesis whose power is independent of the variance", Annals of Mathematical Statistics, Volume 16 (1945), pp 243-258.

[18]  Wald, A., "On cumulative sums of random variables", Annals of Mathematical Statistics, Volume 15 (1944), pp 283-296.

[19]  Wald, A., "Sequential tests of statistical hypotheses", Annals of Mathematical Statistics, Volume 16 (1945), pp 117-186.

[20]  Wald, A., "Sequential method of sampling for deciding between two courses of action", Journal of the American Statistical Association, Volume 40 (1945), pp 277-306.

[21]  Wald, A., "Some generalizations of the theory of cumulative sums of random variables", Annals of Mathematical Statistics, Volume 16 (1945), pp 287-293.

[22]  Wald, A., "Some improvements in setting limits for the expected number of observations required by a sequential probability ratio test", Annals of Mathematical Statistics, Volume 17 (1946), pp 466-474.

[23]  Wald, A., "Differentiation under the expectation sign in the fundamental identity of sequential analysis", Annals of Mathematical Statistics, Volume 17 (1946), pp 493-497.

[24]  Wald, A., Sequential Analysis, John Wiley and Sons, Inc., New York, 1947.

[25]  Wald, A. and Wolfowitz, J., "Optimum character of the sequential probability ratio test", Annals of Mathematical Statistics, Volume 19 (1948), pp 326-339.

[26]  Wald, A. and Wolfowitz, J., "Bayes solutions of sequential decision problems", Annals of Mathematical Statistics, Volume 21 (1950), pp 89-99.