Hall, William J. (1954). "Most economical multiple-decision rules." (Air Research and Development Command)

MOST ECONOMICAL MULTIPLE-DECISION RULES¹

by

Wm. Jackson Hall
Institute of Statistics
University of North Carolina

Institute of Statistics
Mimeograph Series No. 115
August, 1954

1. This research was supported by the United States Air Force, through the Office of Scientific Research of the Air Research and Development Command.
ACKNOWLEDGMENT

The author wishes to express his sincere thanks to Professor Wassily Hoeffding, whose stimulating lectures inspired this research, and whose criticisms and suggestions have been invaluable in accomplishing it. Thanks are also due the Office of Naval Research and the United States Air Force for their generous financial assistance, and to the U. S. International Information Administration for the Fulbright Grant which enabled the author to study abroad and to incorporate into this research many valuable suggestions of Mr. D. V. Lindley, to whom the author is gratefully indebted.
TABLE OF CONTENTS

CHAPTER                                                          PAGE

ACKNOWLEDGMENT . . . . . . . . . . . . . . . . . . . . . . . . .   ii
INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . .    v

I    MOST ECONOMICAL MULTIPLE-DECISION RULES FOR
     SIMPLE DISCRIMINATION . . . . . . . . . . . . . . . . . . .    1
     1.1 Formulation of the Problem . . . . . . . . . . . . . . .   1
         1.1.1 Basic Assumptions and Definitions . . . . . . . .    1
         1.1.2 Problems of Simple Discrimination . . . . . . . .    3
     1.2 Minimax Decision Rules for Fixed Sample Sizes . . . . .    9
     1.3 Most Economical Decision Rules Relative to a Vector a .   12
         1.3.1 First Minimax Approach . . . . . . . . . . . . . .  13
         1.3.2 Second Minimax Approach . . . . . . . . . . . . .   18
         1.3.3 An Alternative Approach . . . . . . . . . . . . .   22
         1.3.4 Admissibility . . . . . . . . . . . . . . . . . .   30
         1.3.5 Likelihood Ratio Decision Rules . . . . . . . . .   35
         1.3.6 Four Examples . . . . . . . . . . . . . . . . . .   38
     1.4 Most Economical Decision Rules Relative to a Matrix β .   41
     1.5 A Generalization of Most Economical Decision Rules
         Relative to a Vector . . . . . . . . . . . . . . . . . .  49

II   MOST ECONOMICAL MULTIPLE-DECISION RULES FOR
     COMPOSITE DISCRIMINATION . . . . . . . . . . . . . . . . . .  52
     2.1 The Problem of Composite Discrimination . . . . . . . .   52
     2.2 Minimax Decision Rules for Fixed Sample Sizes . . . . .   57
     2.3 Most Economical Decision Rules Relative to a Vector a .   70
     2.4 Most Economical Decision Rules Relative to a Matrix β .   78
     2.5 Some Parametric Examples of Three-Decision Rules . . . .  81
     2.6 A Non-Parametric Example of a Three-Decision Rule . . .   96

III  EXISTENCE THEOREMS FOR MOST ECONOMICAL
     MULTIPLE-DECISION RULES . . . . . . . . . . . . . . . . . .  101
     3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . 101
     3.2 Non-Trivial, Selective, and Consistent Sequences of
         Decision Rules . . . . . . . . . . . . . . . . . . . . . 103
     3.3 Existence Theorems for Most Economical Decision Rules .  119

IV   MOST ECONOMICAL DECISION FUNCTIONS . . . . . . . . . . . . . 124
     4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . 124
     4.2 Preliminary Theory . . . . . . . . . . . . . . . . . . . 126
     4.3 Most Economical Decision Functions . . . . . . . . . . . 132
     4.4 Decision Functions with Bounds on the Maximum
         Expected Cost . . . . . . . . . . . . . . . . . . . . .  137

APPENDIX: SOME "TWO-SIDED" TWO-DECISION RULES AND SYMMETRIC
     THREE-DECISION RULES FOR COMPOSITE DISCRIMINATION . . . . .  138
     A.1 Some "Two-Sided" Two-Decision Rules . . . . . . . . . .  138
         A.1.1 Introduction . . . . . . . . . . . . . . . . . . . 138
         A.1.2 The Mean of a Normal Distribution, Variance
               Known . . . . . . . . . . . . . . . . . . . . . .  139
         A.1.3 The Mean of a Normal Distribution, Variance
               Unknown . . . . . . . . . . . . . . . . . . . . .  141
         A.1.4 The Parameter of a Binomial Distribution . . . . . 142
         A.1.5 The Median of an Unspecified Distribution . . . .  143
     A.2 Nomographic Solutions . . . . . . . . . . . . . . . . .  143
         A.2.1 The Nomographs . . . . . . . . . . . . . . . . . . 143
         A.2.2 The Mean of a Normal Distribution, Variance
               Known . . . . . . . . . . . . . . . . . . . . . .  146
         A.2.3 The Mean of a Normal Distribution, Variance
               Unknown . . . . . . . . . . . . . . . . . . . . .  147
         A.2.4 The Parameter of a Binomial Distribution and
               the Median of an Unspecified Distribution . . . .  148
     A.3 Approximate Equivalence with Symmetric Three-Decision
         Rules . . . . . . . . . . . . . . . . . . . . . . . . .  150
     A.4 Tables of Most Economical Sample Sizes . . . . . . . . . 154

BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . .  158
INTRODUCTION

Most economical multiple-decision theory is primarily an extension of Professor Hoeffding's formulation of most economical two-decision theory, as given in his course lectures at Chapel Hill. The concept is an application of Wald's formulation of sequential analysis to non-sequential decision theory in which the choice of a sample size is at the disposal of the experimenter - instead of minimizing the expected sample size of a sequential experiment subject to bounds on the probabilities of error, Hoeffding suggests minimizing the fixed sample size of a non-sequential experiment subject to these bounds. Assuming the cost of experimentation to be proportional to the sample size, such procedures are, in a sense, "most economical". In terms of the now classical Neyman-Pearson theory of testing hypotheses, such decision rules adjust the sample size so as to obtain the desired power against certain alternatives; such an approach is implicit in the writings of Neyman and Pearson. In the words of Ferris, Grubbs, and Weaver [5], "... if the sampler wants a given degree of assurance in rejecting the null hypothesis when a particular alternative is true, he would like to know the minimum sample size which would accomplish this when the probability of rejecting the null hypothesis when true is given." For this purpose, these authors have supplied graphs of the power functions for various sample sizes for some of the common statistical tests. Hoeffding's lectures constitute a formalization, and generalization as well, of this approach to testing hypotheses and give, in addition, a number of existence theorems.
The multiple-decision extensions treated here are, first, to the consideration of decision rules for deciding among m alternatives, rather than just two, which minimize the sample size subject to bounds on the m probabilities of choosing the alternative which is to be preferred, or which, in a sense, is the "correct" alternative, under the prevailing one of m possible situations (true distributions or classes of distributions). A second extension is to decision rules which minimize the sample size subject to bounds on each of the probabilities (less than ℓ·m in number) of making "incorrect" decisions - the probabilities of choosing one of the m alternatives which is not to be preferred when one of the ℓ possible situations prevails. Solutions to the first problem turn out to be "likelihood ratio" decision rules, and to the latter, "unlikelihood ratio" decision rules, defined precisely in the text. They are obtained by an application of Wald's minimax theory for fixed sample sizes; however, the minimax theory is used solely as a tool and does not lend itself to any interpretation according to Wald's theories. But it is an effective tool in that it is proved that one has nothing to lose by restricting his consideration solely to minimax solutions with respect to certain artificial loss functions for various sample sizes.
Problems of both "simple" and "composite" discrimination are considered, and characterization theorems and existence theorems are given. Various properties of the decision rules are derived, as well as relationships with works of Wald, Wolfowitz, Lindley, Rao, and others. A number of examples are treated which are generalizations of some of the common statistical tests. The theory as given is applicable to multivariate problems as well as univariate ones, and this generality enables some k-population problems - such as deciding which of several populations has the largest mean - to be covered also by considering a set of k observations, one from each univariate population, as one observation from a k-variate population.
Finally, utilizing the analogies with Wald's statistical decision functions, a generalization to a most economical theory of decision functions is considered in which the maximum expected cost is minimized subject to bounds on the expected loss function. This approach to statistical decision functions overcomes one of the common criticisms of Wald's theory, that losses due to incorrect decisions and cost of experimentation are treated on an equal footing, simply by summing them. The relationship with analogous work of Blyth and Kontjn is pointed out.
Some of the theorems and their proofs, as well as some of the concepts, are fairly straightforward generalizations and adaptations of the work of Hoeffding, Wald, Lindley, and others, and indication to that effect is given wherever appropriate.

In Chapter I, the basic assumptions and definitions are given and problems of "simple discrimination" are treated - that is, problems in which one of a finite number of possible distributions underlies the decision problem. Chapter II treats problems of composite discrimination - discrimination among possible classes of distribution functions. Chapter III gives some very general necessary and sufficient conditions for the existence of most economical decision rules for both simple and composite discrimination. And Chapter IV treats generalizations of the foregoing theory to statistical decision functions, giving existence theorems as well as indicating various applications. An appendix treats some particular examples of two- and three-decision rules; a nomograph is given for obtaining such rules explicitly, and some brief tables of most economical sample sizes are computed from it.
NOTATION

We use X's to denote random variables and x's to denote observations on the corresponding random variables. X or x with a subscript denotes one real- or vector-valued variable, and without a subscript it denotes a sequence of such variables, usually n in number. For clarity, we use "//" to denote the completion of a proof. Numbers in square brackets refer to the bibliography. "Decision rule" is abbreviated "d.r." and "most economical", "M.E."; "with respect to" is abbreviated "w.r.t.".

For conciseness, any symbol, say s, when underlined denotes m (or sometimes ℓ) of the same with subscripts running from 1 to m (or ℓ); thus, s denotes s_1, s_2, ..., s_m. Throughout, m is a positive integer greater than unity. All other notation is introduced as needed.
CHAPTER I
MOST ECONOMICAL MULTIPLE-DECISION RULES
FOR SIMPLE DISCRIMINATION

1.1 Formulation of the Problem.

1.1.1 Basic Assumptions and Definitions. We are concerned with a sequence X_1, X_2, ..., of real- or vector-valued, independent, and identically distributed random variables, each having a generalized density function f(x) w.r.t. a measure μ(x). f(x) is not completely known, but is assumed to belong to some specified class Ω of density functions w.r.t. the specified measure μ.

We suppose we are faced with a number m of alternative decisions, A_1, ..., A_m, one of which is to be chosen after having taken a sample of size n, that is, after having observed the first n random variables X_1, ..., X_n, denoted simply X, n being at our disposal. We denote the sample values by x = (x_1, ..., x_n). n is to be fixed in advance of experimentation and is to be completely non-random, not depending on the observations nor on any chance mechanism. We assume further that the cost of experimentation is proportional to the sample size. It would be necessary to assume only that the cost be some specified increasing function of the sample size by making minor alterations throughout, but we make the former assumption for expediency.
The decision procedure consists first in choosing a non-negative integer n, and then, after taking an observation x = (x_1, ..., x_n) on X, in choosing one of the alternative decisions A_1, ..., A_m. The decision problem is to formulate a rule for choosing n, and having taken a sample of size n, for choosing among A_1, ..., A_m. A multiple-decision rule D (hereafter abbreviated m-d.r., or simply d.r., m denoting the number of alternatives available) for choosing among A_1, ..., A_m after an observation x on X has been taken is defined by an ordered set of non-negative, real-valued, and measurable functions φ(x) = (φ_1(x), ..., φ_m(x)) on the space 𝒳 of x having the property

Σ_{i=1}^m φ_i(x) = 1 identically in x

(for n = 0, D is defined by a set of non-negative constants φ = (φ_1, ..., φ_m) summing to unity). D is interpreted as follows: having taken a sample of size n and computed φ(x), a chance mechanism is used to select one of the alternatives A_1, ..., A_m, the probability of choosing A_i being φ_i(x) (i = 1, ..., m) when x is observed. Such a d.r. is said to be a "randomized" d.r. A d.r. is said to be "non-randomized" if the values of the φ_i's are restricted to 0 and 1; then each φ_i is a characteristic (or "indicator") function of a set, commonly called a region, say R_i, of the sample space 𝒳, and the property Σφ_i = 1 implies that R_1, ..., R_m are mutually exclusive and exhaustive. We call such regions "acceptance regions" since a non-randomized d.r. is of the type: if x ∈ R_i, accept A_i (i = 1, ..., m). For every d.r. D, a set of functions φ is implicitly assumed, and we shall refer to a d.r. either by D or by the set of functions φ. A subscript n on D is sometimes used to denote the corresponding sample size; φ_n denotes the set of functions defining D_n. Unless otherwise specified, we assume the class of d.r.'s at our disposal consists of all possible d.r.'s as defined above for every possible fixed sample size n = 0, 1, 2, ....
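As a computational aside (ours, not the thesis's; the regions below are a hypothetical choice), the interpretation of a d.r. - compute φ(x), then let a chance mechanism select A_i with probability φ_i(x) - can be sketched in Python:

```python
import random

def apply_decision_rule(phi, x, rng=random.Random(0)):
    """Given phi = (phi_1, ..., phi_m) with sum_i phi_i(x) = 1,
    return the (1-based) index i of the alternative A_i chosen."""
    probs = [p(x) for p in phi]
    assert abs(sum(probs) - 1.0) < 1e-9   # the defining property of a d.r.
    u, acc = rng.random(), 0.0            # the chance mechanism
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i + 1
    return len(probs)

# A non-randomized 2-d.r.: each phi_i is the indicator of a region R_i,
# here R_1 = {x : sample mean < 0.5} (a hypothetical region).
phi = (lambda x: 1.0 if sum(x) / len(x) < 0.5 else 0.0,
       lambda x: 0.0 if sum(x) / len(x) < 0.5 else 1.0)
print(apply_decision_rule(phi, (0.2, 0.4, 0.3)))  # mean 0.3, so A_1: prints 1
```

For a non-randomized rule the chance mechanism is vacuous; randomization matters only at x where some 0 < φ_i(x) < 1.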
1.1.2 Problems of Simple Discrimination. We suppose throughout this chapter that Ω consists of a finite number, say ℓ, of elements f_1, ..., f_ℓ, and we denote Ω = f = (f_1, ..., f_ℓ); when this is the case, we say that the corresponding decision problem is one of "simple discrimination" and a d.r. is a d.r. for "simple discrimination" or for "discriminating among f". We use the same notation to denote the joint density functions of the first n random variables in the sequence, sometimes using a superscript n to denote the sample size when necessary to avoid confusion; similarly, the measure μ may be used to denote a joint measure of n random variables; that is, we denote

f_j^n(x) = Π_{i=1}^n f_j(x_i) = f_j(x),

where x without a subscript denotes an n-vector, and we say f_j(x) is a density function w.r.t. μ(x) (j = 1, ..., ℓ).
A d.r. D will be characterized by the functions

(1.1)  p_ij(D) = Pr(D chooses A_j | f_i) = ∫ φ_j(x) f_i(x) dμ = E_i φ_j(X)

(i = 1, ..., ℓ; j = 1, ..., m), where the subscript on the expectation operator E denotes the corresponding density function. If D is a non-randomized d.r., (1.1) implies

p_ij(D) = ∫_{R_j} f_i(x) dμ.
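For intuition (an illustration of ours, not from the text): when the f_i are univariate normal densities with common variance and the acceptance regions R_j are consecutive intervals of the sample mean, the sample mean is again normal and the matrix p_ij(D) of (1.1) has a closed form. A sketch using only the Python standard library, with hypothetical means and cutpoints:

```python
from math import inf, sqrt
from statistics import NormalDist

def p_matrix(n, mus, cuts, sigma=1.0):
    """p_ij(D) = Pr(D chooses A_j | f_i) for the non-randomized d.r. whose
    acceptance regions R_j are consecutive intervals of the sample mean."""
    edges = [-inf, *cuts, inf]
    rows = []
    for mu in mus:                            # f_i: n i.i.d. Normal(mu, sigma)
        G = NormalDist(mu, sigma / sqrt(n))   # law of the sample mean under f_i
        rows.append([G.cdf(edges[j + 1]) - G.cdf(edges[j])
                     for j in range(len(edges) - 1)])
    return rows

P = p_matrix(n=9, mus=(0.0, 1.0, 2.0), cuts=(0.5, 1.5))
for row in P:
    print([round(p, 3) for p in row])   # diagonal entries are the p_ii(D)
```

Each row sums to one, reflecting the property Σ_j φ_j(x) = 1.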
We shall consider two different criteria for choosing a d.r. for simple discrimination. The first criterion is applicable only to the case when the number ℓ of elements in Ω is equal to the number m of alternatives and the alternative decisions correspond to the possible true density functions in such a way that the decision A_i is to be preferred when f_i is true (i = 1, ..., m); A_i is said to be a "correct" decision if f_i is true and "incorrect" otherwise. We write p_i(D) = p_ii(D) and q_i(D) = 1 - p_i(D) (i = 1, ..., m), so that when the d.r. D is used and f_i is true, p_i is the probability of a correct decision and q_i the probability of an incorrect decision. We now formulate the first criterion:

DEFINITION 1.1: Let a = (a_1, ..., a_m) be a given vector of positive¹ constants each less than one. A d.r. D_N, based on a sample of size N, is said to be a most economical m-decision rule relative to the vector a for discriminating among f = (f_1, ..., f_m) if it satisfies

(1.2)  p_i(D_N) ≥ a_i  (i = 1, ..., m)

and if N is the least integer n for which (1.2) may be satisfied by some m-d.r. D_n based on a sample of size n. N is said to be the most economical sample size.

Thus, by this definition (and assuming the sampling cost to be proportional to the sample size), a most economical d.r. (hereafter abbreviated M.E. d.r.) is one with a minimum sampling cost subject to lower bounds on the m probabilities of correct decisions. We note that (1.2) implies upper bounds on the probabilities of an incorrect decision, viz., q_i(D) ≤ 1 - a_i (i = 1, ..., m).

1. If any a_i were zero, we may just as well never choose A_i and consider an (m-1)-d.r. for discriminating among the m-1 f_j's (j ≠ i).
A second criterion for choosing a d.r. will now be given. We no longer require that ℓ = m, but shall suppose that corresponding to each f_i one or more of the alternatives A_j is preferable (or "correct") when f_i is true.

DEFINITION 1.2: Let β = (β_ij) be a given ℓ × m matrix of positive constants such that β_ij = 1 for every i, j pair for which A_j is a correct decision when f_i is true. A d.r. D_N, based on a sample of size N, is said to be a most economical m-decision rule relative to the matrix β for discriminating among f = (f_1, ..., f_ℓ) if it satisfies

(1.3)  p_ij(D_N) ≤ β_ij  (i = 1, ..., ℓ; j = 1, ..., m)

and if N is the least integer n for which (1.3) may be satisfied by some m-d.r. D_n based on a sample of size n. N is said to be the most economical sample size.
Thus, by this definition, a M.E. d.r. is one with a minimum sample size subject to an upper bound on each of the probabilities of incorrect decisions. If we wish to relinquish control of the probability of any particular incorrect decision, we simply set the corresponding β_ij = 1. If ℓ = m and A_i is preferred when f_i is true (i = 1, ..., m), then a M.E. d.r. relative to β also controls the probabilities of correct decisions if Σ_{j≠i} β_ij < 1, since (1.3) implies

p_ii(D) = 1 - Σ_{j≠i} p_ij(D) ≥ 1 - Σ_{j≠i} β_ij  (i = 1, ..., m).

For some applications, one may wish to restrict the class of d.r.'s to the class of all non-randomized d.r.'s and define M.E. non-randomized d.r.'s relative to a vector or matrix by the same definitions, but this may require an increased sample size. Alternatively, one may wish to allow a random choice between two (or more) consecutive sample sizes, and this may decrease the average sample size over a number of experiments. (Hoeffding [8] has considered such procedures in the two-decision case.) However, we shall assume throughout that the sample size is non-random, but shall allow randomized d.r.'s unless otherwise specified.
In the case of 2-d.r.'s when ℓ = m = 2, we have p_12 = 1 - p_11 and p_21 = 1 - p_22. Suppose A_1 is the decision to accept the hypothesis H_1: f = f_1 and A_2 is the decision to reject H_1 in favor of H_2: f = f_2. Then p_12 and p_21 are the classical first and second kinds of error, respectively, associated with a test of H_1 against H_2, and, denoting α = 1 - a_1 = β_12 and β = 1 - a_2 = β_21, it follows that both (1.2) and (1.3) reduce to upper bounds (α, β) on the two kinds of error. Hence, Definitions 1.1 and 1.2 are equivalent (if ℓ = m = 2), both defining a M.E. 2-d.r. as one with minimum sample size subject to these bounds. Consider for each sample size n the most powerful test T_n of size α of H_1 against H_2. It may be shown that if for some n the power of T_n is at least 1 - β, then the test T_N, where N is the least n for which this is true, is "most economical". Hoeffding [8] has considered such two-decision problems in some detail. Definitions 1.1 and 1.2 are two extensions of these concepts to multiple-decision problems.
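As a worked instance of this two-decision case (our illustration, with hypothetical numbers): for the mean of a normal distribution with known variance σ², testing H_1: μ = 0 against H_2: μ = δ, the most powerful size-α test rejects for large sample means, and its power at μ = δ is at least 1 - β exactly when n ≥ ((z_{1-α} + z_{1-β})σ/δ)², so the least such integer is the M.E. sample size N. A sketch using only the Python standard library:

```python
from math import ceil, sqrt
from statistics import NormalDist

def me_sample_size(alpha, beta, delta, sigma=1.0):
    """Least n for which the most powerful size-alpha test of H1: mu = 0
    against H2: mu = delta has power at least 1 - beta."""
    z = NormalDist().inv_cdf
    return ceil((z(1 - alpha) + z(1 - beta)) ** 2 * sigma ** 2 / delta ** 2)

def power(n, alpha, delta, sigma=1.0):
    """Power at mu = delta of the size-alpha test based on n observations."""
    crit = NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)  # reject if mean > crit
    return 1 - NormalDist(delta, sigma / sqrt(n)).cdf(crit)

N = me_sample_size(0.05, 0.10, delta=0.5)
print(N, power(N, 0.05, 0.5) >= 0.90, power(N - 1, 0.05, 0.5) < 0.90)
```

Here N = 35: the power requirement holds at N but fails at N - 1, which is exactly the defining property of the M.E. sample size.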
In the following sections we shall be concerned with deriving both types of M.E. m-d.r.'s for simple discrimination, as defined by Definitions 1.1 and 1.2. We shall consider minimax d.r.'s w.r.t. certain weight functions relative to the class of all d.r.'s based on a sample of fixed size n and shall prove that in order to obtain M.E. d.r.'s we need only consider minimax d.r.'s for various values of n. The question of "admissibility" of M.E. d.r.'s obtained in this way is also considered. It should be emphasized that we are using minimax theory only as a tool; the loss functions introduced are somewhat artificial. In Section 1.3.3 we shall consider another approach to obtaining M.E. d.r.'s, but shall show that it is virtually equivalent to the minimax approach. First, in the next section, we shall review Wald's theory of minimax d.r.'s for fixed sample sizes when the number of possible decisions and the number of possible distributions are both finite.
1.2 Minimax Decision Rules for Fixed Sample Sizes. We shall review briefly some of the concepts introduced by Wald in [22], Section 1.1, as applied in Section 5.1.1, and shall repeat some of his theorems of Sections 3.5 and 5.1.1, altering his notation slightly to conform to the notation introduced in the previous section. There are only minor differences between the "data of the decision problem" as assumed by Wald and here: he treats distribution functions rather than generalized density functions w.r.t. some measure, and he assumes univariate random variables rather than allowing the possibility of multivariate random variables, but these and other differences are inessential in what follows. We assume throughout that the sample size n has been fixed. We assume a bounded weight function W(f_i, A_j) = W_ij (i = 1, ..., ℓ; j = 1, ..., m), representing the 'loss' incurred by accepting A_j when f_i is true. The corresponding risk function when using a d.r. D is defined as the expected loss:

(1.4)  r(f_i, D) = Σ_{j=1}^m W_ij p_ij(D).
We introduce an a priori distribution ξ = (ξ_1, ..., ξ_ℓ) over Ω, where ξ_i ≥ 0 and Σ_{i=1}^ℓ ξ_i = 1; ξ_i represents the 'probability that f_i is true'. The average risk relative to ξ is defined:

(1.5)  r(ξ, D) = Σ_{i=1}^ℓ ξ_i r(f_i, D) = Σ_{i,j} ξ_i W_ij p_ij(D).

A d.r. D* is said to be a Bayes d.r. relative to ξ if it minimizes the average risk relative to ξ; i.e., if r(ξ, D*) = inf_D r(ξ, D). A d.r. D^0 is said to be a minimax d.r. if it minimizes the maximum risk; i.e., if max_i r(f_i, D^0) = inf_D max_i r(f_i, D). An a priori distribution ξ^0 is said to be a least favorable distribution if it maximizes w.r.t. ξ the minimum of the average risk; i.e., if inf_D r(ξ^0, D) = sup_ξ inf_D r(ξ, D). The following two theorems are given by Wald ([22], parts of Theorems 5.1, 5.3, and 3.9):
THEOREM 1.1: A necessary and sufficient condition for a d.r. D to be a Bayes d.r. relative to a given a priori distribution ξ is that φ_j(x) = 0 for any x (except perhaps in a set of ξ-measure zero²) and for any j for which

Σ_{i=1}^ℓ ξ_i W_ij f_i(x) > min_{1≤k≤m} [ Σ_{i=1}^ℓ ξ_i W_ik f_i(x) ].

THEOREM 1.2: (i) There exists a minimax d.r. D^0; (ii) there exists a least favorable distribution ξ^0; (iii) any minimax d.r. is a Bayes d.r. relative to any least favorable distribution, and conversely; (iv) for any i for which ξ_i^0 > 0, r(f_i, D^0) = max_{1≤j≤ℓ} r(f_j, D^0).

ASSUMPTION 1.1: The measure μ is non-atomic.³

2. By the ξ-measure of a subset R of 𝒳 we mean ∫_R [ Σ_{i=1}^ℓ ξ_i f_i(x) ] dμ.

3. A measure is non-atomic if every set of non-zero measure has a subset of different non-zero measure; e.g., Lebesgue measure is non-atomic.
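Theorem 1.1 characterizes Bayes d.r.'s pointwise: mass may be placed only on alternatives j minimizing Σ_i ξ_i W_ij f_i(x). A sketch of this selection (ours; the weights, prior, and density values are hypothetical):

```python
def bayes_choice(f_values, xi, W):
    """Return a (1-based) j minimizing sum_i xi_i * W[i][j] * f_i(x), i.e.
    an alternative on which a Bayes d.r. may place positive mass; f_values
    holds the density values f_1(x), ..., f_l(x) at the observed x."""
    m = len(W[0])
    scores = [sum(xi_i * W_i[j] * f_i
                  for xi_i, W_i, f_i in zip(xi, W, f_values))
              for j in range(m)]
    return 1 + scores.index(min(scores))

# Two densities, two decisions, 0-1 loss W_ij = 1 - delta_ij, uniform prior:
W = [[0, 1], [1, 0]]
print(bayes_choice((0.2, 0.6), (0.5, 0.5), W))  # f_2(x) larger, so A_2: prints 2
```

With 0-1 loss and a uniform prior this is just the maximum likelihood choice.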
Dvoretzky, Wald, and Wolfowitz [4] have proved under Assumption 1.1 that any randomized d.r. for simple discrimination may be replaced by an "equivalent" non-randomized d.r., "equivalent" implying equality of risk functions. We conclude this section with a lemma which will be useful later on. Conditions under which r_n, defined below, tends to zero are given in Chapter III.
LEMMA 1.1: For every fixed sample size n = 0, 1, 2, ..., let D_n^0 be a minimax d.r. and denote r_n = max_i r(f_i, D_n^0). Then the sequence {r_n}, n = 0, 1, 2, ..., is a non-increasing sequence.

Proof: Let 𝒟_n denote the class of all m-d.r.'s based on a sample of fixed size n (n = 0, 1, 2, ...). Now 𝒟_n ⊂ 𝒟_N for all n ≤ N (N = 0, 1, 2, ...) since we may take φ(x_1, ..., x_N) independent of x_{n+1}, ..., x_N, in which case φ defines a d.r. in 𝒟_n and in 𝒟_N. Hence

r_n = inf_{D∈𝒟_n} max_i r(f_i, D) ≥ inf_{D∈𝒟_N} max_i r(f_i, D) = r_N  (n ≤ N). //
1.3 Most Economical Decision Rules Relative to a Vector a. We shall apply the theory of Section 1.2 to two specific weight functions W_ij and develop in each case a method of obtaining M.E. d.r.'s as defined by Definition 1.1. In both cases we assume ℓ = m. Various properties of these decision rules are considered and some examples indicated.
1.3.1 First Minimax Approach. Let

(1.6)  W_ij = -δ_ij/a_i  (i, j = 1, ..., m)

where δ_ij denotes the Kronecker δ-function.⁴ Then, from (1.4),

4. The loss function (1.6) may be interpreted in terms of gains. One might develop a decision theory completely analogous to Wald's theory but with Wald's loss function replaced by a gain function which is positive when a "correct" decision is made and zero otherwise. A Bayes d.r. might be defined as one which maximizes the average expected gain, and a "maximin" d.r. as one which maximizes the minimum expected gain. But such a theory would be completely equivalent to the Wald theory if one defines loss as negative gain, or, if one prefers a non-negative loss function, if one defines loss as the difference between the maximum possible gain (maximum over all possible distributions as well as over all possible decisions) and the actual gain. (Savage [19] considered Wald's loss functions in terms of gains, defining loss as the difference between the maximum gain for the prevailing true distribution and the actual gain, so that his loss is always zero when a correct decision is made and positive otherwise.) Either of these loss functions satisfies Wald's requirements (if all gains are finite) though it is not necessarily zero when a correct decision is made nor necessarily positive otherwise, as suggested -- but never required mathematically -- by Wald. Thus, we may consider G_ij = δ_ij/a_i as a gain function and define the loss function in (1.6) by W_ij = -G_ij, or, alternatively, we may replace (1.6) by W_ij = (max_{i,j} G_ij) - G_ij.
the risk w.r.t. W_ij is

(1.7)  r(f_i, D) = -Σ_{j=1}^m δ_ij p_ij(D)/a_i = -p_i(D)/a_i  (i = 1, ..., m).

Theorem 1.2 asserts the existence of a minimax d.r. D^0 for any (fixed) sample size. We shall consider minimax d.r.'s for each value of the sample size n = 0, 1, 2, ..., and prove the following theorem:

THEOREM 1.3: For each n = 0, 1, 2, ..., let D_n^0 be a minimax d.r. w.r.t. the weight function (1.6) for samples of fixed size n. Suppose for some n,

(1.8)  max_i r(f_i, D_n^0) ≤ -1,

and let N be the least such integer. Then D_N^0 is a M.E. d.r. relative to the vector a for discriminating among f. Conversely, if there exists a M.E. d.r. relative to a for discriminating among f and the M.E. sample size is N, then D_N^0 is a M.E. d.r.
Proof: From (1.1) and (1.8), it follows that D_N^0 satisfies (1.2). Now suppose for some n < N there exists a d.r. D_n satisfying (1.2). But D_n^0 is a minimax d.r., so that

(1.9)  max_i r(f_i, D_n^0) ≤ max_i r(f_i, D_n) = max_i [-p_i(D_n)/a_i].

Since D_n satisfies (1.2), we have from (1.9) that max_i r(f_i, D_n^0) ≤ -1, in contradiction to the fact that N is the least integer n for which this is true. Hence, D_N^0 is a M.E. d.r.

To prove the converse, suppose D_N is a M.E. d.r. Then

-1 ≥ max_i [-p_i(D_N)/a_i] = max_i r(f_i, D_N) ≥ max_i r(f_i, D_N^0)

since D_N^0 is a minimax d.r. Hence, (1.8) is satisfied for n = N, and since N is the M.E. sample size, D_N^0 is a M.E. d.r. //
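Theorem 1.3 licenses a search over n = 0, 1, 2, ...: stop at the first n whose minimax d.r. satisfies (1.8), i.e. p_i(D_n^0) ≥ a_i for all i. As an illustration outside the text: for two unit-variance normal densities with means a distance sep apart and equal bounds a_1 = a_2 = a, symmetry suggests the equal-constants likelihood ratio rule ("decide the nearer mean"), whose common probability of a correct decision is Φ(½·sep·√n), so the scan is immediate (names and setup are ours):

```python
from math import sqrt
from statistics import NormalDist

def p_correct(n, sep=1.0):
    """Common p_1(D_n) = p_2(D_n) of the symmetric rule 'decide the nearer
    of two unit-variance normal means a distance sep apart' (n = 0: 1/2)."""
    return NormalDist().cdf(0.5 * sep * sqrt(n)) if n > 0 else 0.5

def least_n(a, sep=1.0):
    """Least n with p_i(D_n) >= a for both alternatives, as in (1.2)."""
    n = 0
    while p_correct(n, sep) < a:
        n += 1
    return n

print(least_n(0.95))  # least n with Phi(sqrt(n)/2) >= 0.95: prints 11
```

Lemma 1.1 justifies stopping at the first success: once the minimax risk drops below the threshold it stays there as n grows.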
Hence, to obtain M.E. d.r.'s relative to a, we need consider only minimax d.r.'s for various values of n. Lemma 1.1 should prove helpful in finding the M.E. sample size since it assures us that any n for which (1.8) is violated is too small. Now let us consider the structure of minimax d.r.'s for a fixed sample size n. Let ξ be an a priori distribution. Now

Σ_{i=1}^m ξ_i W_ij f_i(x) = -ξ_j f_j(x)/a_j  (j = 1, ..., m).

Hence, by Theorem 1.1, a necessary and sufficient condition for D to be a Bayes d.r. relative to ξ is that for any x (except perhaps in a set of ξ-measure zero) and any j for which φ_j(x) > 0, we have

(1.10)  ξ_j f_j(x)/a_j = max_{1≤i≤m} ξ_i f_i(x)/a_i.

DEFINITION 1.3: A d.r. D defined by φ(x) is said to be a likelihood ratio d.r. if there exist positive constants c_1, ..., c_m such that for any j and any x for which φ_j(x) > 0, c_j f_j(x) ≥ c_i f_i(x) for all i ≠ j. (Note that c_1, ..., c_m determine φ completely except in sets of x for which c_i f_i(x) = max_j c_j f_j(x) for more than one value of i.)

Setting c_j = ξ_j/a_j, it follows from (1.10) that a Bayes d.r. relative to any ξ for which all ξ_i > 0 is a likelihood ratio d.r., and conversely.
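Computationally, a likelihood ratio d.r. just selects a j maximizing c_j f_j(x), with f_j(x) the joint density of the sample. A sketch (ours; the three normal densities and the constants are hypothetical, not from the text):

```python
from math import exp, pi, sqrt

def normal_pdf(t, mu, sigma=1.0):
    return exp(-((t - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def joint_density(sample, mu):
    """f_j(x) = product of the univariate densities (i.i.d. observations)."""
    d = 1.0
    for t in sample:
        d *= normal_pdf(t, mu)
    return d

def likelihood_ratio_rule(sample, mus, c):
    """Choose a (1-based) j maximizing c_j * f_j(x); ties are broken by the
    smallest index, one admissible way of resolving them."""
    scores = [cj * joint_density(sample, mu) for cj, mu in zip(c, mus)]
    return 1 + scores.index(max(scores))

# Discriminating among three normal means with equal constants c_j,
# which makes this the maximum likelihood choice:
print(likelihood_ratio_rule((0.9, 1.3, 1.1), (0.0, 1.0, 2.0), (1.0, 1.0, 1.0)))
```

With c_j = ξ_j/a_j this realizes the Bayes condition (1.10).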
Moreover, it follows from (1.7) and Theorem 1.2 (iv) that if all components of a least favorable distribution are positive, any minimax d.r. D^0 has the property:

(1.11)  p_1(D^0)/a_1 = p_2(D^0)/a_2 = ... = p_m(D^0)/a_m.

We shall give sufficient conditions for this to be true.

ASSUMPTION 1.2: If R is a subset of 𝒳 for which ∫_R f_i(x) dμ = 0 for some i, then ∫_R f_i(x) dμ = 0 for all values of i. (Whenever this assumption is made, we shall tacitly assume that 𝒳 is redefined so as to exclude all such sets R; this implies that f_i(x) > 0 for all i and for all x ∈ 𝒳.)

We prove a theorem analogous to Wald's Theorem 5.4 [22]⁵; the proof is also analogous.

THEOREM 1.4: If Assumption 1.2 holds, all components of a least favorable distribution ξ^0 w.r.t. the weight function (1.6) are positive.

Proof: Consider the d.r. D defined by φ_i(x) = a_i/Σ_{j=1}^m a_j identically in x (i = 1, ..., m); and let D^0 be a minimax d.r. Then

(1.12)  max_j r(f_j, D^0) = inf_D max_j r(f_j, D) ≤ max_j r(f_j, D) = -1/Σ_i a_i < 0.

5. It might be noted that Wald's condition (iii) of Theorem 5.4 is superfluous since it is always fulfilled; e.g., in Wald's notation, let δ_i = 1/u (i = 1, ..., u) identically in x, and then r(F_j, δ) = (u - 1)/u < 1 for j = 1, ..., k.
18
Now suppose for some j, l;~
= 0.
lhcn, under Assumption 1.2,
° = l;jf° J.(x)ja.J < maxi l;.f.
° (x)/ai
(1.13)
~ ~
for all x.
I
But DO is a Bayes d.r. relative to ~O (by Theorem 1.2) so that
(1.10) and (1.13) imply ~(x)
J
p.(DO) = 0 and r(f., DO)
J
~~
J
J
>
° for
in
identically in x.
Hence,
contradiction to (1.12).
Hence,
II
all j.
1. 3.2
= 0,
=0
Second Minimax Approach.
lve consider a second
weight function W•• :
~J
l-I( f ., A.)
~
Let 1 - a j
(1.15)
= ~j'
J
= TN.~J• = (1
- 5 •. ) 1(1
~J
-
a.)
J
(i,j
=
1, ... ,m).
Then
r(f.,
D) =
~
q.(D)/~.
~
~
(i
= 1,
... , m).
Theorem 1.2 asserts the existence of a minimax d.r.
Proceeding
as with the first weight function, we stato the following theorem,
the proof of which is not given since it is analogous to the
proof of Theorem 1.3.
19
THEORm'~ 1.5: For each n = 0, 1, 2, ••• , let D~ be a minimax d.r.
w.r.t. tho weight function (1.14) for samples of fixed size n.
Suppose for same n, max r(f., DO) < 1, and let N be the least such
•
1.
1.
n-
Then D~ is a M.E. d.r. relative to the vector ~ for dis-
integer.
criminatinG among
1. Conversely, if there exists a M.E. d.r. rom-
tive to ~ ffi1d N is the M.E. sample size, then D~ is a M.E. d.r.
Thus, we
h.:lVO
a second method of obtaining
d. r. t s (if exis-
M. E.
tent) from minimax d.r. 's.
Now let
~
be an a priori distribution.
m
~
m
t'.
H. .f . (x) =
i=l~1.
1.J 1.
•
~
t'.
f . (x) 18.
'1.
s1. 1.
1.=1
- ~ J.f J.(x ) IA~J.
Then
(J' = 1, ..., m).
By an argument analogous to the one usod with the first"weight
function, we have: a nocessary and sufficient condition for D to
be a Bayes d.r. relative to a givon a priori distribution
that for any x (except perhaps in a set of
for any
where b j
j
~-measure
for which ¢.(x) > 0, b.f.(x) > b.f.(x)
J J
J
= ~j/~j
(j
= 1,
... , m) •
-
IS,
(i
is
zero) and
= 1, ... , m)
Hence, Bayes d.r. 's w.r.t. the
weight function (1.14) relative to any
also likelihood ratio d.r.
1. 1.
~
r
for which all ~i >
and conversely.
° are
20
Horoover, it follows from (1.1$) and Thoorem 1.2 (iv) that
it all components of a least favorablo distribution arc positivo,
any minimax d. r. DO has thc property:
(1.16)
We shall givo sufficient conditions for this to be true.
t:I!MMA 1.2:
If Aseumption 1.2 holds, and if there exists some d.r.
D for which r(fi' D) < 1/ max ~j
( i · 1, ••• , m), then all com-
l~j~
ponents of a least favorable distribution arc positive.
Proof:
Let DO
be a
minimax d.r.; then
max
j
< l/max ~j
j
r(f j , nO) .::
M?
r(fj,D)
J
so that
for all j.
(1.17)
Now suppose for some j, ~~
= O.
By an argument analogous to the
one in the proof of 'lheorom 1.4, this implios ~(x) • 0 identically in x.
Hence, qj(nO)
diction to (1.17).
/1
=1
and ref., DO) • 1/~., in contraJ
J
21
LEMMA 1.3: If β_i < (1/(m − 1)) Σ_{j=1}^m β_j for all i (i.e., a_i > (Σ_j a_j − 1)/(m − 1)), then there exists a d.r. D for which r(f_i, D) < 1/max_j β_j for all i.

Proof: Let D be defined by φ_i(x) = 1 − (m − 1)β_i/Σ_j β_j identically in x (i = 1, …, m). Then r(f_i, D) = q_i(D)/β_i = (m − 1)/Σ_j β_j < 1/β_j for all j (i = 1, …, m). //

LEMMA 1.4: Suppose Assumption 1.2 holds. Then ξ_i^0 > 0 implies ξ_j^0 > 0 for every j for which β_j ≤ β_i.

Proof: Suppose ξ_i^0 > 0 and ξ_j^0 = 0 for some j. In the proof of Lemma 1.2, we found that ξ_j^0 = 0 implies r(f_j, D^0) = q_j(D^0)/β_j = 1/β_j. By Theorem 1.2 (iv), max_k r(f_k, D^0) = r(f_i, D^0) since ξ_i^0 > 0; hence, we need only prove that r(f_i, D^0) < 1/β_i. Under Assumption 1.2, ξ_i^0 > 0 implies p_i(D^0) > 0, so that q_i(D^0) < 1 and r(f_i, D^0) < 1/β_i. Then 1/β_j = r(f_j, D^0) ≤ r(f_i, D^0) < 1/β_i, so that β_j > β_i, a contradiction. //
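The constant randomized rule constructed in the proof of Lemma 1.3 is easy to check numerically. The following sketch (Python; the β values are an illustrative assumption, not from the text) verifies that its decision probabilities sum to one and that its risks are all equal and lie below 1/max_j β_j:

```python
def constant_rule(beta):
    """Lemma 1.3's d.r.: choose A_i with probability phi_i = 1 - (m-1)*beta_i/sum(beta),
    regardless of the observations."""
    m, s = len(beta), sum(beta)
    assert all(b < s / (m - 1) for b in beta), "hypothesis of Lemma 1.3"
    return [1.0 - (m - 1) * b / s for b in beta]

def risks(beta):
    """r(f_i, D) = q_i(D)/beta_i = (m-1)/sum(beta), the same for every i."""
    phi = constant_rule(beta)
    return [(1.0 - p) / b for p, b in zip(phi, beta)]

beta = [0.25, 0.20, 0.10]   # illustrative beta_i = 1 - a_i
phi = constant_rule(beta)   # the phi_i sum to one
r = risks(beta)             # all equal (m-1)/sum(beta), below 1/max(beta)
```

Because the rule ignores the data entirely, it is useful only for establishing existence, as in the lemma, not as a practical d.r.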
THEOREM 1.6: Suppose Assumption 1.2 holds. For any sample size greater than or equal to the M.E. sample size, all components of a least favorable distribution are positive.

Proof: Suppose n ≥ N, the M.E. sample size, and that D_n^0 is a minimax d.r. for samples of size n; then, using Lemma 1.1 and Theorem 1.5, D_n^0 satisfies (1.2). This implies q_i(D_n^0) ≤ β_i and

    r(f_i, D_n^0) = q_i(D_n^0)/β_i ≤ 1 < 1/max_j β_j.

Lemma 1.2 completes the proof of the theorem. //
Hence, under the conditions of the theorem, D_n^0 is a likelihood ratio d.r. A possible advantage in using this second weight function in order to obtain M.E. d.r.'s is that if for some n one of the components of a least favorable distribution is zero, we know immediately that n is less than the M.E. sample size. (It cannot be greater, by Lemma 1.1.)

1.3.3 An Alternative Approach. It was shown in the previous sections that, under Assumption 1.2, a M.E. d.r. relative to a obtained by one of the minimax methods satisfies (1.11) or (1.16). This suggests an alternative approach for obtaining these d.r.'s, which is described below, the proofs of the various statements being indicated but not always given in detail. This method gives some geometric insight into the properties of the d.r.'s thus obtained. Assume the sample size n is fixed, and, given a, define two classes of d.r.'s C = C(a) and I = I(a) as follows:

    C = {D : p_1(D)/a_1 = ⋯ = p_m(D)/a_m},    I = {D : q_1(D)/β_1 = ⋯ = q_m(D)/β_m},

where β_i = 1 − a_i (i = 1, …, m) as before; that is, C is the class of d.r.'s for which the m probabilities of a correct decision are in assigned ratios, and I is the class of d.r.'s for which the m probabilities of an incorrect decision are in assigned ratios. A d.r. is said to be optimum in C (or I) if it maximizes over all d.r.'s in C (or I) all probabilities of correct decisions; that is, if it maximizes the common ratio p_i(D)/a_i (or minimizes the common ratio q_i(D)/β_i). Clearly, optimum d.r.'s in C and I, when existent, are minimax d.r.'s w.r.t. the weight functions (1.6) and (1.14), respectively.
Rao ([17], page 311) has considered the class I of d.r.'s for problems of classification in multivariate analysis, in which case the sample consists of one observation from a multivariate population. This is no restriction, of course, since a sample of size n from a p-variate population may be treated as a sample of size 1 from an np-variate population. Rao gives sufficient conditions for a d.r. to be optimum in I; his conditions are that if there exists a likelihood ratio d.r. in I, then it is optimum in I. However, his conditions cannot always be fulfilled, since I may be a null class (if m > 2). Heuristically, this may be seen
as follows: suppose one β_i, say β_1, is very close to unity and all others are very close to zero; then for any d.r. in I, the common ratio q_i(D)/β_i is the same for all i, so that each q_i(D) (i > 1) must be made nearly as small as β_i; even for a d.r. for which p_1 = 0 it may not be possible to make all the other p_i's sufficiently large for this to hold. If this is the case, the obvious thing to do is to always reject A_1 and consider (m − 1)-d.r.'s in the corresponding class I.

A theorem completely analogous to Rao's theorem may be proved for the class C, giving the same condition as sufficient characterization of an optimum d.r. in C; and we shall see that C is never null. It will be shown by a geometric argument that similar conditions are also necessary for a d.r. to be optimum. Much of this development was suggested by similar arguments of Lindley [16].
Consider the m-dimensional Euclidean space of points p = (p_1, …, p_m); every d.r. D has a corresponding point p with p_i = p_i(D). Let P denote the set of points corresponding to all possible d.r.'s (for a fixed sample size), and let C′ and I′ denote sub-sets of P corresponding to C and I respectively. It may be proved (e.g., as a corollary to Lindley's Theorem 1.1 [16], since P is a subspace of the space considered by him, or as a special case of a theorem of Dvoretzky, Wald, and Wolfowitz (see footnote 2 in [24]) for the weight function (1.6) or (1.14)) that P is a convex body; i.e., P is closed, bounded, and convex. Clearly, P is enclosed in the unit cube U, and all points with one coordinate unity and the other m − 1 coordinates zero are points in P. Since P is convex, the intersection of U with the hyperplane Σp_i = 1 determined by these m points is contained in P; denote this "flat" sub-space of P by P_0.^6
The equations p_1/a_1 = ⋯ = p_m/a_m determine a line L_C passing through the interior of U (since all a_i > 0) and passing through the origin, and C′ is the segment of this line intersecting P; such a segment exists since L_C must intersect P_0. Since P is a convex body, C′ is closed, and the endpoint furthest from the origin, say p^C, corresponds to any optimum d.r. in C. Let a_1, …, a_m be constants (at least one of which is positive) determining a supporting hyperplane to P at p^C; i.e., Σa_i p_i = Σa_i p_i^C, and for any p ∈ P, Σa_i p_i ≤ Σa_i p_i^C, or

(1.18)    Σ_{i=1}^m a_i ∫_X φ_i(x) f_i(x) dμ ≤ Σ_{i=1}^m a_i ∫_X φ_i^C(x) f_i(x) dμ,

6. It may be pointed out that all points on and below P_0 (that is, on the same side of P_0 as the origin) correspond to "trivial" d.r.'s and all points above P_0 to "non-trivial" ones, as defined in Chapter III.
where φ^C defines an optimum d.r. in C and φ defines an arbitrary d.r. Let Δ be a "small" subset of X such that, for a "small" positive ε, φ_1^C(x) ≥ ε for all x ∈ Δ and ∫_Δ f_1(x) dμ > 0. That such a set exists, at least for small ε, is guaranteed since p_1^C > 0. For some j > 1, define φ(x) as follows:

    φ_i(x) = φ_i^C(x) − ε    if x ∈ Δ, for i = 1;
    φ_i(x) = φ_i^C(x) + ε    if x ∈ Δ, for i = j;
    φ_i(x) = φ_i^C(x)        if x ∉ Δ for i = 1, j, and for all x for i ≠ 1, j.

Then (1.18) implies

    ∫_Δ [a_1 f_1(x) − a_j f_j(x)] dμ ≥ 0,

which is true for any j, so that by taking Δ and ε sufficiently small we have that a_1 f_1(x) ≥ a_j f_j(x) (j = 1, …, m) for all x for which φ_1^C(x) > 0 and f_1(x) > 0. Moreover, since at least one a_i is positive and p_i^C > 0, a_1 must be non-negative. A similar relation holds for i > 1 as well as for i = 1, so that any d.r. D for which p(D) = p^C is of the likelihood ratio form with non-negative constants a_1, …, a_m, which are not in general necessarily all positive. But clearly, since all p_i^C > 0, under Assumption 1.2 all a_i's must be positive, so that in this case an optimum d.r. in C is a likelihood ratio d.r.
Now consider the line L_I determined by the equations (1 − p_1)/β_1 = ⋯ = (1 − p_m)/β_m. This line passes through the interior of U (since all β_i < 1) and through the point (1, 1, …, 1), but it does not necessarily intersect P_0 and may not intersect P (see the remarks in the third paragraph of this section). In fact, it may be shown that L_I intersects P_0 if and only if every β_i ≤ Σ_{j=1}^m β_j/(m − 1).^7 If L_I intersects P, I′ is the segment of L_I intersecting P, and p^I, the endpoint of I′ closest to the point (1, 1, …, 1), corresponds to any optimum d.r. in I. If L_I intersects P, it may be shown, as in the previous case, that

7. The sufficiency is proved by constructing a d.r. which is in both P_0 and I, as in Lemma 1.3. The necessity is proved by showing that if β_1 > Σ_j β_j/(m − 1), say, then L_I intersects the hyperplane Σp_i = 1 in a point with a negative first coordinate, and hence outside of U.
any d.r. D for which p(D) = p^I is of the likelihood ratio form defined by the non-negative constants b_1, …, b_m which determine a supporting hyperplane of P at p^I, and, if Assumption 1.2 holds, all b_i are positive.

If a M.E. d.r. relative to a exists, it may be proved under Assumption 1.2 that L_I intersects P for the M.E. sample size; i.e., that I is not null. A direct proof has been established but will not be given, since Theorem 1.6 proves the same thing. The fact that I may be null for smaller sample sizes is essentially equivalent to the fact that some of the components of a least favorable distribution may be zero for such sample sizes.
Now let P_n, C_n, and I_n denote the corresponding classes P, C, and I for samples of size n, n = 0, 1, 2, …. For any n, the two lines L_{C_n} and L_{I_n} intersect in the point p = a, which is interior to U. Intuitively, the least n (if any) for which this point is contained in P_n is the M.E. sample size; formally, we have:

THEOREM 1.7: Suppose for some n the point p = a is in P_n, and let N be the least such integer. Then N is the M.E. sample size. Moreover, there exist optimum d.r.'s in C_N and I_N, and any such d.r. is a M.E. d.r.
Proof: We assume Σa_i > 1, since otherwise 0 is the M.E. sample size and the theorem is trivial. Clearly, any d.r. D for which p(D) = a satisfies (1.2), and any such D is in C and in I, so that d.r.'s corresponding to p^C and p^I exist and satisfy (1.2). We need only prove that N is the M.E. sample size. Suppose it is not, and that D_n is a M.E. d.r. for n < N. Let D_n^C be an optimum d.r. in C_n, and suppose a_1′, …, a_m′ are non-negative constants defining a supporting hyperplane at p^{C_n}. Then Σa_i′ p_i^{C_n} ≥ Σa_i′ p_i(D_n) ≥ Σa_i′ a_i, since D_n satisfies (1.2); but since the p_i^{C_n}'s are proportional to the a_i's, this implies p_i^{C_n} ≥ a_i for all i, so that the point p = a, being between p^{C_n} and P_0, is in P_n. This is a contradiction, so that N is the M.E. sample size. //

Clearly, for n = 0, P_n = P_0 as defined previously, and, similarly to Lemma 1.1, we have that P_{n_1} ⊆ P_{n_2} if n_1 ≤ n_2. Under "favorable" conditions, we might expect P_n to be a proper subset of P_{n+1} for every n and P_n to tend in the limit to U, thus guaranteeing the existence of a M.E. d.r. relative to any a. Such "favorable" conditions will be given in Chapter III. (Though stated in different terms, it is easy to verify that Theorem 3.6 does just this.)
1.3.4 Admissibility. When considering M.E. d.r.'s relative to a vector a, we shall define admissibility as follows: a d.r. D is said to be admissible if there does not exist a d.r. D′ for the same sample size for which

    p_i(D′) ≥ p_i(D)    (i = 1, …, m)

with strict inequality for at least one i. For either of the weight functions introduced in Sections 1.3.1 and 1.3.2, this is equivalent to Wald's definition (see [22] or [24]).
A Bayes d.r. D* relative to the sequence of a priori distributions (ξ^1, …, ξ^h) is defined inductively as follows: for h = 1, D* is a Bayes d.r. relative to ξ^1; for h > 1, D* minimizes the average risk relative to ξ^h w.r.t. all d.r.'s which are Bayes d.r.'s relative to the sequence (ξ^1, …, ξ^{h−1}). This definition is due to Wald and Wolfowitz [24]; they proved the following theorem:
THEOREM 1.8: A necessary and sufficient condition for a d.r. to be admissible is that it is a Bayes d.r. relative to a sequence of h (≤ m) a priori distributions (ξ^1, …, ξ^h) such that (i) for any j, ξ_j^i > 0 for some i, and (ii) the sequence (ξ^1, …, ξ^{h−1}) does not have property (i).

It follows that a Bayes d.r. relative to any ξ of which all components are positive is admissible. Hence, any likelihood ratio d.r. is admissible, and M.E. d.r.'s obtained by either of the minimax approaches of Sections 1.3.1 and 1.3.2 are admissible if all components of a least favorable distribution are positive; under Assumption 1.2, this latter condition holds (by Theorems 1.4 and 1.6).
But suppose Assumption 1.2 does not hold. We shall now show, for any sample size, how a minimax d.r. w.r.t. the weight function (1.6) may be altered to obtain an admissible d.r. The same method may be used for the M.E. sample size for the weight function (1.14). This will enable us to obtain M.E. d.r.'s which are admissible.

Geometrically, the situation is this: the line L_C intersects the surface of P in a point p^C; if this point is at a "flat" part of the surface of P, then we may be able to obtain a "better" point by choosing a boundary point of this flat sub-space which is further from P_0; that is, without changing any of the coordinates of p^C determining the sub-space, we may increase some of the coordinates of p^C within the subspace.
Let ξ^0 be a least favorable distribution and D^0 a minimax d.r. w.r.t. the weight function (1.6), and suppose k (0 < k < m) of the components, say ξ_1^0, …, ξ_k^0, are zero. We shall introduce below a distribution ξ^1 with the first k components positive and re-define D^0, without increasing the maximum risk, in such a way that it is a Bayes d.r. relative to the sequence (ξ^0, ξ^1). Using Theorem 1.8, this d.r. will be an admissible minimax d.r.

Define the class 𝒟^0 = {D : D is a Bayes d.r. relative to ξ^0}; i.e., 𝒟^0 is the set of all minimax d.r.'s. The d.r. D^0, defined by φ^0(x), is a particular member of this class. Let S = {x : f_{k+1}(x) = ⋯ = f_m(x) = 0}; we shall verify presently that S is not empty. Now for all x ∈ X − S, max_{k<i≤m} ξ_i^0 f_i(x)/a_i > 0, so that by Theorem 1.1 (see (1.10)), φ_1(x) = ⋯ = φ_k(x) = 0 for all x ∈ X − S for any φ(x) defining a d.r. D ∈ 𝒟^0. Hence, using (1.1), for any such d.r.,
(1.20)    p_i(D) = ∫_S φ_i(x) f_i(x) dμ    for i = 1, …, k;
          p_i(D) = ∫_{X−S} φ_i(x) f_i(x) dμ    for i = k+1, …, m.

Then, if we re-define φ(x) in S, we will not affect p_{k+1}(D^0), …, p_m(D^0). Let s_i = ∫_S f_i(x) dμ (i = 1, …, k). Suppose s_i = 0 for some i, say i = 1; then from (1.20), p_1(D^0) = 0, implying r(f_1, D^0) = 0, in contradiction to (1.12) (which does not depend on Assumption 1.2). Therefore, all s_i > 0, implying that S is not empty as well.
Let f_i*(x) (i = 1, …, k) be the conditional density function of x w.r.t. μ given x ∈ S; i.e., f_i*(x) = f_i(x)/s_i. Let ξ* = (ξ_1*, …, ξ_k*) be some a priori distribution over (f_1*, …, f_k*), and let φ*(x) define a k-d.r. D*, say, which is a Bayes d.r. relative to ξ* for discriminating among f_1*, …, f_k*; i.e., D* satisfies
(1.21)    r(ξ*, D*) = inf_D r(ξ*, D) = −sup_D Σ_{i=1}^k (ξ_i*/a_i) ∫_S φ_i(x) f_i*(x) dμ
                    = −c sup_D Σ_{i=1}^k (ξ_i^1/a_i) ∫_S φ_i(x) f_i(x) dμ,

where ξ_i^1 = ξ_i*/(s_i c) and c = Σ_{i=1}^k ξ_i*/s_i. (If k = 1, ξ* has the single component "1".) By Theorem 1.1, for any x ∈ S and for any j for which φ_j*(x) > 0, we have

(1.22)    ξ_j* f_j*(x)/a_j = max_{1≤i≤k} ξ_i* f_i*(x)/a_i,  or  ξ_j^1 f_j(x)/a_j = max_{1≤i≤k} ξ_i^1 f_i(x)/a_i.

Let ξ^1 = (ξ_1^1, …, ξ_m^1), where ξ_{k+1}^1 = ⋯ = ξ_m^1 = 0, so that Σ_i ξ_i^1 = 1, and let D^1 be an m-d.r. defined by

    φ^1(x) = φ^0(x)    if x ∈ X − S;
    φ^1(x) = (φ_1*(x), …, φ_k*(x), 0, …, 0)    if x ∈ S.

Now, by definition, D^1 is a Bayes d.r. relative to the sequence (ξ^0, ξ^1) if D^1 ∈ 𝒟^0 and

(1.23)    r(ξ^1, D^1) = inf_{D ∈ 𝒟^0} r(ξ^1, D) = −sup_{D ∈ 𝒟^0} Σ_{i=1}^k (ξ_i^1/a_i) ∫_S φ_i(x) f_i(x) dμ.

Now D^0 and D^1 are equal except in S, so that, by (1.20), r(ξ^0, D^1) = r(ξ^0, D^0); hence, D^1 ∈ 𝒟^0. Moreover, any D ∈ 𝒟^0 for which the supremum in (1.23) is attained must have φ_{k+1}(x) = ⋯ = φ_m(x) = 0 for x ∈ S, since Σ_i φ_i(x) = 1. Clearly, then, from (1.21), (1.22), and (1.23), D^1 is a Bayes d.r. relative to (ξ^0, ξ^1).
1.3.5 Likelihood Ratio Decision Rules. It was shown in the previous sections that to obtain M.E. d.r.'s relative to a vector a under Assumption 1.2 we need consider only likelihood ratio d.r.'s, and that any likelihood ratio d.r. is admissible. Suppose the density functions in Ω are completely specified except for a parameter θ taking on one of the values θ_1, …, θ_m (assuming ℓ = m). Clearly, then, to obtain a likelihood ratio d.r. we need consider only sufficient statistics, if existent. The following theorem is of a related nature.
THEOREM 1.9: Suppose there exists a statistic t = t(x) which is a monotone increasing function of f_j(x)/f_i(x) for every j and for every i < j (for some ordering of the subscripts). Then there exist constants c_1, …, c_{m−1} such that, if φ(x) defines a likelihood ratio d.r., we have, for any j and any x ∈ X, that φ_j(x) > 0 implies

    t ≤ c_1            for j = 1;
    c_{j−1} ≤ t ≤ c_j  for j = 2, …, m−1;
    t ≥ c_{m−1}        for j = m.

Proof: By Definition 1.3, there exist positive constants a_1, …, a_m such that φ_j(x) > 0 implies f_j(x)/f_i(x) ≥ a_i/a_j for all i ≠ j. Now f_j/f_i ≥ a_i/a_j for all i < j implies t ≥ b_{ij} for all i < j, for some constants b_{ij} (i < j); and f_j/f_i ≥ a_i/a_j for all i > j (or f_i/f_j ≤ a_j/a_i) implies t ≤ b_{ji} for all i > j. Hence φ_j(x) > 0 implies

    t ≤ min_{i>1} b_{1i}                        for j = 1;
    max_{i<j} b_{ij} ≤ t ≤ min_{i>j} b_{ji}     for j = 2, …, m−1;
    t ≥ max_{i<m} b_{im}                        for j = m;

or, in particular,

    t ≤ b_{12}                  for j = 1;
    b_{j−1,j} ≤ t ≤ b_{j,j+1}   for j = 2, …, m−1;
    t ≥ b_{m−1,m}               for j = m.

Set c_j = b_{j,j+1} (j = 1, …, m−1), and the proof is completed. //
This theorem characterizes M.E. d.r.'s obtained by either of the minimax methods of Sections 1.3.1 and 1.3.2. If all probabilities of correct decisions are to be positive, then −∞ < c_1 ≤ ⋯ ≤ c_{m−1} < ∞, and

(1.24)    φ(x) = (1, 0, 0, …, 0)            if t < c_1;
               = (a_1, 1−a_1, 0, …, 0)      if t = c_1;
               = (0, 1, 0, …, 0)            if c_1 < t < c_2;
               = (0, a_2, 1−a_2, …, 0)      if t = c_2;
               = (0, 0, 1, …, 0)            if c_2 < t < c_3;
               . . . . . . . .
               = (0, 0, 0, …, 1)            if t > c_{m−1};

where a_1, …, a_{m−1} are appropriately chosen constants between zero and one. If Assumption 1.1 is satisfied, the a_i's may be set equal to 0 or 1 arbitrarily, and φ then defines a non-randomized d.r.
Note that if each f_i(x) is of the form

(1.25)    f_i(x) = A(x) exp[P(θ_i) t(x) + T(θ_i)],

where θ_1, …, θ_m are so ordered that P(θ_1) < ⋯ < P(θ_m), then the conditions of Theorem 1.9 are satisfied.
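The interval structure asserted by Theorem 1.9 is easy to see numerically for densities of the form (1.25). The sketch below (Python; the constants P, T, a and the grid are illustrative assumptions, not from the text) chooses the alternative j maximizing a_j f_j(x), which is the likelihood ratio form of the Bayes condition, and checks that the chosen index is a nondecreasing step function of t:

```python
import math

def lr_decision(t, P, T, a):
    """Likelihood ratio d.r. for f_i = A(x) exp(P_i t + T_i): choose the j
    maximizing a_j f_j, i.e. maximizing log a_j + P_j t + T_j (A(x) cancels).
    Returns a 0-based index."""
    return max(range(len(P)), key=lambda j: math.log(a[j]) + P[j] * t + T[j])

# Illustrative constants with P increasing, as (1.25) requires.
P, T, a = [-1.0, 0.0, 1.0], [0.0, 0.0, 0.0], [1.0, 3.0, 1.0]
decisions = [lr_decision(t, P, T, a) for t in [x / 10.0 for x in range(-50, 51)]]
# The decision index never decreases as t increases: the acceptance regions
# are the intervals t <= c_1, c_1 <= t <= c_2, t >= c_2 of Theorem 1.9.
assert all(d2 >= d1 for d1, d2 in zip(decisions, decisions[1:]))
```

Because the exponents are linear in t with increasing slopes P_j, the maximizing index can only move upward as t grows, which is exactly the cut-point characterization (1.24).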
1.3.6 Four Examples. In this section we give four examples of M.E. m-d.r.'s relative to a vector a for simple discrimination, each of which is obtained by the characterization theorem of the previous section. For m = 2, they reduce to the standard "one-sided" tests of hypotheses.
EXAMPLE 1: Mean of Normal Distribution. Each f_i(x) is a normal density function with variance σ² (known) and with mean θ_i (−∞ < θ_1 < ⋯ < θ_m < ∞). The m alternatives A_1, …, A_m correspond to the true densities f_1, …, f_m. Now f_i(x) is of the form (1.25) with t(x) = x̄, the sample mean, so that, by the results of the previous section, a non-randomized d.r. D_n with acceptance regions

(1.26)    R_1^n = {x : x̄ ≤ c_1^n},
          R_j^n = {x : c_{j−1}^n < x̄ ≤ c_j^n}    (1 < j < m),
          R_m^n = {x : c_{m−1}^n < x̄}

is a likelihood ratio d.r. Since Assumption 1.2 is satisfied, a likelihood ratio d.r. satisfying (1.11) (or (1.16)), with n chosen so that the ratio in (1.11) is approximately, but not less than, unity, is a M.E. d.r. Hence, we may obtain a M.E. d.r. by first solving the following equations for n, c_1^n, …, c_{m−1}^n, with ρ_n = 1:

(1.27)    p_1(D_n) = Φ(√n (c_1^n − θ_1)/σ) = a_1 ρ_n;
          p_j(D_n) = Φ(√n (c_j^n − θ_j)/σ) − Φ(√n (c_{j−1}^n − θ_j)/σ) = a_j ρ_n    (1 < j < m);
          p_m(D_n) = 1 − Φ(√n (c_{m−1}^n − θ_m)/σ) = a_m ρ_n;

where Φ denotes the standard normal distribution function; and then, choosing N to be the least integer ≥ n and re-solving (1.27) for c_1^N, …, c_{m−1}^N, ρ_N, with n = N, the acceptance regions (1.26) with n = N will define a M.E. d.r. relative to a which is a minimax d.r. w.r.t. (1.6) for samples of size N.
Alternatively, we may replace (1.27) by equations of the form

    1 − p_i(D_n) = (1 − a_i) ρ_n    (i = 1, …, m)

and proceed as before, obtaining a M.E. d.r. which is a minimax d.r. w.r.t. (1.14). Or, solving for N as above, we may choose any c_i's for which a likelihood ratio d.r. D_N satisfies (1.2), and this d.r. will be a M.E. d.r.
Equations (1.27) with ρ_n = 1 may be solved iteratively by choosing a trial value of n, solving the first equation for c_1^n, the second for c_2^n, and so on until the jth equation is unsolvable, and then choosing another trial value of n, and so on; at each stage it will be obvious from the jth equation whether to try a larger or smaller n.
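That iteration can be sketched as follows (Python; the bisection inverse of Φ, the linear search over n, and the example values are implementation assumptions, not from the thesis). For each trial n we solve the first m − 1 equations of (1.27) with ρ_n = 1 in turn for the cut points, and accept the least n for which the last equation gives p_m(D_n) ≥ a_m:

```python
import math

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def Phi_inv(p):
    """Inverse of Phi by bisection (an assumed helper, adequate here)."""
    lo, hi = -12.0, 12.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if Phi(mid) < p else (lo, mid)
    return 0.5 * (lo + hi)

def cut_points(n, theta, a, sigma):
    """Solve the first m-1 equations of (1.27) (rho_n = 1) for c_1 <= ... <= c_{m-1};
    return None when the jth equation is unsolvable, i.e. a larger n is needed."""
    c, rn = [], math.sqrt(n)
    for j in range(len(theta) - 1):
        # p_j(D_n) = Phi(rn*(c_j - theta_j)/sigma) - Phi(rn*(c_{j-1} - theta_j)/sigma) = a_j
        target = a[j] + (Phi(rn * (c[-1] - theta[j]) / sigma) if c else 0.0)
        if target >= 1.0:
            return None
        c.append(theta[j] + sigma * Phi_inv(target) / rn)
    return c

def me_sample_size(theta, a, sigma, n_max=1000):
    """Least n for which (1.27) can hold with rho_n >= 1, so that (1.2) is met."""
    for n in range(1, n_max + 1):
        c = cut_points(n, theta, a, sigma)
        if c is None:
            continue
        p_m = 1.0 - Phi(math.sqrt(n) * (c[-1] - theta[-1]) / sigma)
        if p_m >= a[-1]:
            return n, c
    return None

# Two means one standard deviation apart, a = (0.95, 0.95): N = 11.
n, c = me_sample_size([0.0, 1.0], [0.95, 0.95], 1.0)
```

The search here simply increments n; the thesis's suggestion of adjusting n up or down according to which equation fails would converge faster but yields the same N.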
EXAMPLE 2: Variance of Normal Distribution. Each f_i(x) is a normal density function with mean μ (known) and variance σ_i². f_i(x) is again of the form (1.25), but now with t(x) = Σ_k (x_k − μ)²/n. Hence, the results are similar to Example 1, with x̄ replaced by Σ_k (x_k − μ)²/n in (1.26) and the normal distribution functions replaced by the corresponding χ_n² distribution functions in (1.27).
EXAMPLE 3: Binomial Parameter. Each f_i(x) is a point binomial probability function with parameter θ_i (0 < θ_1 < ⋯ < θ_m < 1), and hence is of the form (1.25) with t = Σ_k x_k. The results are similar to Example 1, except that here a randomized d.r. will usually be required. The necessary modifications to the methods of Example 1 are obvious.
EXAMPLE 4: Poisson Parameter. Each f_i(x) is a Poisson probability function with parameter λ_i (0 < λ_1 < ⋯ < λ_m < ∞), and hence is of the form (1.25) with t = x̄. The results are similar to those of Example 3.
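As a sketch of how the probabilities of correct decision come out for such a discrete case, the following Python fragment (the cut point, parameter values, and non-randomized interval rule are illustrative assumptions; the randomization at the cut points that Example 3 notes is usually required is omitted here) computes p_i(D_n) exactly for an interval rule based on the sample total:

```python
import math

def poisson_cdf(k, mu):
    """P(T <= k) for T ~ Poisson(mu)."""
    return sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k + 1))

def correct_probs(cuts, lams, n):
    """p_i(D_n) for the interval rule on the total t of the n observations:
    accept A_1 if t <= cuts[0], A_j if cuts[j-2] < t <= cuts[j-1] (1 < j < m),
    and A_m if t > cuts[-1].  Under f_i, t is Poisson(n * lambda_i)."""
    probs = []
    for i, lam in enumerate(lams):
        mu = n * lam
        upper = poisson_cdf(cuts[i], mu) if i < len(cuts) else 1.0
        lower = poisson_cdf(cuts[i - 1], mu) if i > 0 else 0.0
        probs.append(upper - lower)
    return probs

p = correct_probs([3], [1.0, 3.0], n=2)  # two alternatives, cut at t = 3
```

Because t is integer-valued, the attainable p_i jump as the cut moves, which is exactly why a randomized d.r. is generally needed to meet (1.2) with the least n.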
1.4 Most Economical Decision Rules Relative to a Matrix β. We shall derive a method of obtaining M.E. d.r.'s as defined by Definition 1.2, using minimax theory. In this development we replace each β_ij which is equal to unity by +∞, which, of course, makes no effective change in Definition 1.2. This has the same effect as omitting completely all corresponding terms throughout, but because it is complicated notationally to do so we use the above device.
Suppose n fixed, and let Ω′ be a set of density functions g_ij w.r.t. μ (i = 1, …, ℓ; j = 1, …, m), where g_ij = f_i identically in x; thus, there are ℓm elements in Ω′. Define a weight function W(g_ij, A_k) = W_ijk, where

(1.28)    W_ijk = 1/β_ij if j = k, and W_ijk = 0 otherwise    (i = 1, …, ℓ; j, k = 1, …, m).
We shall consider m-d.r.'s D for choosing among A_1, …, A_m when one of the ℓm density functions g_ij is "true", and where the "loss" incurred by choosing A_k when g_ij is "true" is W(g_ij, A_k). This interpretation is meaningless, but it serves to clarify the approach. The risk function when using a d.r. D and g_ij is "true" is r(g_ij, D) = Σ_k W_ijk p_ijk(D), where p_ijk(D) = Pr(D chooses A_k | g_ij) = p_ik(D). From (1.28), then, we have

(1.29)    r(g_ij, D) = p_ij(D)/β_ij    (i = 1, …, ℓ; j = 1, …, m).

Theorem 1.2, when applied to the class Ω′ and the weight function (1.28), asserts the existence of a minimax d.r. D^0; i.e., a d.r. D^0 such that

    max_{i,j} [p_ij(D^0)/β_ij] = inf_D max_{i,j} [p_ij(D)/β_ij].

We have the theorem:
THEOREM 1.10: For each n = 0, 1, 2, …, let D_n^0 be a minimax d.r. w.r.t. the weight function (1.28) for discriminating among (g_11, g_12, …, g_ℓm) for samples of fixed size n. Suppose for some n,

(1.30)    max_{i,j} r(g_ij, D_n^0) ≤ 1,

and let N be the least such integer. Then D_N^0 is a M.E. d.r. relative to the matrix β for discriminating among (f_1, …, f_ℓ). Conversely, if there exists a M.E. d.r. relative to β and N is the M.E. sample size, then D_N^0 is a M.E. d.r.

Proof: From (1.29) and (1.30), it follows that D_N^0 satisfies (1.3). Now suppose for some n < N there exists a d.r. D_n satisfying (1.3). But D_n^0 is a minimax d.r., so that

    max_{i,j} [p_ij(D_n^0)/β_ij] = max_{i,j} r(g_ij, D_n^0) ≤ max_{i,j} r(g_ij, D_n) = max_{i,j} [p_ij(D_n)/β_ij] ≤ 1

by (1.3), in contradiction to the fact that N is the least integer n for which this is true. Hence, D_N^0 is a M.E. d.r.

To prove the converse, suppose D_N is a M.E. d.r. Then 1 ≥ max_{i,j} [p_ij(D_N)/β_ij] ≥ max_{i,j} r(g_ij, D_N^0). Hence, (1.30) is satisfied for n = N, and since N is the M.E. sample size, D_N^0 is a M.E. d.r. //
Thus, to obtain M.E. d.r.'s relative to β, we need only consider minimax d.r.'s relative to the weight function (1.28) for various values of n. Lemma 1.1 may be helpful in finding the M.E. sample size.

Now let us consider the structure of minimax solutions. Let ξ = (ξ_11, ξ_12, …, ξ_ℓm), where ξ_ij ≥ 0 and Σ_{i,j} ξ_ij = 1, denote an a priori distribution over Ω′.
By Theorem 1.1, a necessary and sufficient condition for a d.r. D to be a Bayes d.r. relative to ξ is that, for any x (except perhaps in a set of μ-measure zero) and any k for which φ_k(x) > 0,

    Σ_{i,j} ξ_ij W_ijk g_ij(x) = min_{1≤v≤m} Σ_{i,j} ξ_ij W_ijv g_ij(x),

or

(1.31)    Σ_{i=1}^ℓ ξ_ik f_i(x)/β_ik ≤ Σ_{i=1}^ℓ ξ_ij f_i(x)/β_ij    (j = 1, …, m).

Since β_ij = ∞ for any i, j pair corresponding to a correct decision, the sums in (1.31) may be replaced by sums over all i for which i, k on the left or i, j on the right correspond to incorrect decisions. Setting a_ij = ξ_ij/β_ij, we thus have from (1.31) that
any Bayes d.r. relative to ξ is a "minimum unlikelihood" d.r., as defined by Lindley [16]. Hereafter, we shall suppose ξ_ij = 0 for every i, j for which β_ij = ∞, without loss of generality.
Theorem 1.2 asserts the existence of a least favorable distribution ξ^0, and that any Bayes d.r. relative to ξ^0 is a minimax d.r. and conversely; moreover,

(1.32)    p_ij(D^0)/β_ij = max_{i,j} [p_ij(D^0)/β_ij]    for any i, j for which ξ_ij^0 > 0.

Apparently, however, there are no general conditions under which all ξ_ij^0 are positive, and consequently we have no proof of the admissibility of a minimax d.r.
In fact, supposing ℓ = m and the β_ij's satisfy Σ_{j=1}^m β_ij = 1 for every i, if ξ_ij^0 > 0 for all i, j, then p_ij = β_ij regardless of the sample size. This, of course, is too much to expect! Geometrically, the convex body in the ℓm-dimensional space with coordinate axes p_ij, corresponding to all possible d.r.'s for a fixed sample size, is not necessarily intersected by the line determined by p_ij/β_ij = p_i′j′/β_i′j′ for all pairs of subscripts corresponding to incorrect decisions. (See [16] and Section 1.3.3.) However, we do have the following theorem in this regard, assuming ℓ = m and that A_i is "correct" when f_i is true (i = 1, …, m).
THEOREM 1.11: Suppose Assumption 1.2 holds and that Σ_{j≠i} β_ij < 1 for every i. For any sample size greater than or equal to the M.E. sample size, a least favorable distribution ξ^0 has the property Σ_{i=1}^m ξ_ij^0 > 0 for every j.

Proof: Suppose the theorem false; i.e., for some j, ξ_ij^0 = 0 for every i. Then, for some k ≠ j, ξ_ik^0 > 0 for at least one i. Hence, for all x (using Assumption 1.2), Σ_i ξ_ij^0 f_i(x)/β_ij < Σ_i ξ_ik^0 f_i(x)/β_ik. Therefore, a Bayes d.r. relative to ξ^0 must have φ_k(x) = 0 identically in x. Hence, denoting a minimax d.r. by D^0, p_ik(D^0) = 0 for all i; in particular, p_kk(D^0) = 0. Now, since the sample size is ≥ N (the M.E. sample size), and using Lemma 1.1, D^0 satisfies (1.30) — i.e., it satisfies (1.3) — so that
    0 = Σ_{j=1}^m p_kj(D^0) − 1 = Σ_{j≠k} p_kj(D^0) + p_kk(D^0) − 1 ≤ Σ_{j≠k} β_kj + p_kk(D^0) − 1 < p_kk(D^0) = 0,

a contradiction. //

According to this theorem and (1.32), it follows that p_ij(D_N^0)/β_ij attains its maximum for at least one value of i for every j, where D_N^0 is a minimax d.r. for samples of the M.E. size. This may be useful for obtaining M.E. d.r.'s.
EXAMPLE: Mean of a Normal Distribution, Variance Known. We shall consider briefly minimum unlikelihood d.r.'s for samples of size n for discriminating among m normal density functions f_1, …, f_m with means θ_i (−∞ < θ_1 < ⋯ < θ_m < ∞) and common variance σ². For simplicity, suppose σ² = 1 and ℓ = m = 3, the alternatives A_1, A_2, A_3 corresponding respectively to the densities f_1, f_2, f_3. Suppose further, without loss of generality, that θ_2 = 0. We need consider only non-randomized d.r.'s. A d.r. with acceptance regions
(1.33)    R_1^n = {x : h_1^n(x) ≤ h_2^n(x), h_1^n(x) ≤ h_3^n(x)},
          R_2^n = {x : h_2^n(x) < h_1^n(x), h_2^n(x) ≤ h_3^n(x)},
          R_3^n = {x : h_3^n(x) < h_1^n(x), h_3^n(x) < h_2^n(x)},

where h_i^n(x) = a_ji f_j^n(x) + a_ki f_k^n(x) and (i, j, k) is a permutation of (1, 2, 3), is a minimum unlikelihood d.r. for the weights (a_ij). Denoting the sample mean by x̄, we have

    g_i^n(x̄) = a_ji exp[n(θ_j x̄ − θ_j²/2)] + a_ki exp[n(θ_k x̄ − θ_k²/2)],

the common factor A(x) having been dropped. Clearly, we may replace h_i^n(x) by g_i^n(x̄) throughout (1.33), and thus the acceptance regions depend on x only through x̄. It may easily be verified that g_1^n is an increasing function of x̄ and g_3^n is a decreasing function of x̄. Setting dg_2^n/dx̄ = 0 and solving for x̄, we obtain the single stationary point of g_2^n,

    x̄ = (θ_1 + θ_3)/2 + [n(θ_3 − θ_1)]^{−1} log(−θ_1 a_12/θ_3 a_32),

and the second derivative is everywhere positive. Hence, by sketching the three g_i^n functions, it is clear that, if none of the acceptance regions is to be empty, one of three possibilities must obtain:
the acceptance regions are of the form

(1.34)    R_1 = {x : x̄ ≤ c_1 or c_3 ≤ x̄ ≤ c_4},
          R_2 = {x : c_2 ≤ x̄ ≤ c_3},
          R_3 = {x : c_1 ≤ x̄ ≤ c_2 or x̄ ≥ c_4},

where either c_1 = c_2, or c_3 = c_4, or both. (Equality signs have been assigned everywhere in (1.34) for simplicity.)
Let c (= 2 or 3) denote the number of c_i's to be determined. The c_i's are obtained by solving c + 1 of the six equations

    p_ij = ρ β_ij    (i, j = 1, 2, 3; i ≠ j)

for the c_i's and ρ, the choice of the equations to be solved being such that p_ij ≤ ρ β_ij for all six pairs of subscripts. Theorem 1.11 may be helpful in this choice of equations to be solved. To obtain a M.E. d.r., the sample size n is to be minimized subject to ρ = ρ_n ≤ 1. Similar methods may be applied to simple discrimination problems concerning any distribution of the form (1.25).
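A minimum unlikelihood rule of the form (1.33) is straightforward to evaluate numerically. The sketch below (Python; the means, equal weights, and test points are illustrative assumptions, not from the text) chooses the alternative with the smallest unlikelihood g_i^n(x̄), the common factor A(x) having been dropped:

```python
import math

def min_unlikelihood(xbar, n, theta, a):
    """Choose A_i (returned 1-based) minimizing the unlikelihood
    g_i = sum over j != i of a[j][i] * exp(n*(theta_j*xbar - theta_j**2/2)),
    for weights a[j][i] and sigma = 1."""
    m = len(theta)
    g = [sum(a[j][i] * math.exp(n * (theta[j] * xbar - theta[j] ** 2 / 2.0))
             for j in range(m) if j != i)
         for i in range(m)]
    return 1 + min(range(m), key=g.__getitem__)

# Equal weights, means -1, 0, 1: small xbar -> A_1, middle -> A_2, large -> A_3.
ones = [[1.0] * 3 for _ in range(3)]
theta = [-1.0, 0.0, 1.0]
```

With equal weights the rule reduces to nearest-mean classification of x̄, and unequal weights a_ij shift the cut points exactly as the discussion of (1.34) describes.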
1.5 A Generalization of Most Economical Decision Rules Relative to a Vector.

DEFINITION 1.4: Given an m × m matrix W = (w_ij) of non-negative elements and a vector β = (β_1, …, β_m) of positive constants, a d.r. D_N, based on a sample of size N, is said to be a M.E. m-d.r. relative to β w.r.t. the matrix W for discriminating among f = (f_1, …, f_m) if it satisfies

(1.35)    Σ_{j=1}^m w_ij p_ij(D_n) ≤ β_i    (i = 1, …, m)

with n = N, and if N is the least integer n for which (1.35) may be satisfied by some m-d.r. D_n based on a sample of size n.

Letting w_ij = 1 − δ_ij (Kronecker δ) and β_i = 1 − a_i, this definition reduces to Definition 1.1.
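The defining constraint is simple to check for a candidate rule. The sketch below (Python; the probability matrix and bounds are illustrative assumptions) also exhibits the reduction to Definition 1.1 when w_ij = 1 − δ_ij:

```python
def satisfies_1_35(p, w, beta):
    """Constraint (1.35): sum_j w[i][j] * p[i][j] <= beta[i] for every i,
    where p[i][j] = Pr(choose A_j | f_i true)."""
    m = len(beta)
    return all(sum(w[i][j] * p[i][j] for j in range(m)) <= beta[i]
               for i in range(m))

m = 3
w = [[0 if i == j else 1 for j in range(m)] for i in range(m)]  # w_ij = 1 - delta_ij
p = [[0.90, 0.05, 0.05],
     [0.10, 0.80, 0.10],
     [0.00, 0.10, 0.90]]
# With this w, each row of (1.35) reads 1 - p_ii <= beta_i, i.e. p_i >= a_i
# as in Definition 1.1.
ok = satisfies_1_35(p, w, [0.15, 0.25, 0.20])
```

The same function applies unchanged to any non-negative w, e.g. the squared-error weights discussed next.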
If we interpret the alternative A_j as the decision to estimate the true θ by θ_j, then the estimate is a random variable with probability function p_ij if θ_i is the true θ, and w_ij is the squared error. Hence, a M.E. d.r. relative to β w.r.t. W is a d.r. with minimum sample size subject to bounds (β) on the m mean-squared deviations corresponding to the m possible true values of θ.

Minimax theory for fixed sample sizes may be applied as before to obtain M.E. d.r.'s. Let
(1.36)    W(f_i, A_j) = w_ij/β_i    (i, j = 1, …, m),

so that r(f_i, D) = Σ_j w_ij p_ij(D)/β_i. For each sample size, there exists a minimax d.r. D^0 for the weight function (1.36). By considering minimax d.r.'s D_n^0 for each sample size n, we may prove, in a manner completely analogous to Theorem 1.5, that D_N^0 is a M.E. d.r. if N is the least n for which max_i r(f_i, D_n^0) ≤ 1.

In Chapter IV, we shall consider further generalizations.
CHAPTER II

MOST ECONOMICAL MULTIPLE-DECISION RULES FOR COMPOSITE DISCRIMINATION

In this chapter we no longer require that Ω is a finite class of density functions w.r.t. a specified measure μ, but allow a denumerable or non-denumerable infinity of elements as well, all of which are density functions. We shall assume that Ω is a parametric class of generalized density functions, completely specified except for some unknown real- or vector-valued parameter θ; this is somewhat restrictive, but with only minor changes the whole of this chapter may be extended to more general classes of distributions. We denote the parameter space by Ω and the corresponding density function by f(x, θ). All other assumptions and definitions of Section 1.1.1 carry over.
We suppose, moreover, that some finite number, say ℓ, of disjoint subsets ω_1, ..., ω_ℓ of the space Ω are specified such that for every pair i, j (i = 1, ..., ℓ; j = 1, ..., m) there is a definite preference for or against the decision A_j if the true θ ∈ ω_i; we suppose that none of the decisions is definitely preferred if θ ∈ (Ω − ∪_{i=1}^ℓ ω_i), the "indifference region", and therefore we shall hereafter denote for simplicity Ω = ∪_{i=1}^ℓ ω_i, which we often denote ω = (ω_1, ..., ω_ℓ). Under these assumptions, we say that the corresponding decision problem is one of "composite discrimination" and a d.r. is a d.r. for "composite discrimination" or for "discriminating among ω = (ω_1, ..., ω_ℓ)". As before, f or f_n denotes the joint density function w.r.t. μ = μ_n(x) of the first n random variables.

A d.r. D will be characterized by the functions P_j(θ,D), j = 1, ..., m, defined for all θ ∈ Ω, where

(2.1)    P_j(θ,D) = ∫_X φ_j(x) f(x,θ) dμ = E_θ φ_j(x),

where the subscript on the expectation operator E denotes the corresponding parameter point. If D is a non-randomized d.r. with acceptance regions R_1, ..., R_m,

P_j(θ,D) = ∫_{R_j} f(x,θ) dμ.
As in the case of simple discrimination, we shall consider two different criteria for choosing a d.r. for composite discrimination. The first criterion is applicable to the case when the number of subsets, ℓ, of Ω is equal to the number, m, of alternatives and the alternative decisions correspond to the possible true parameter points in such a way that A_i is preferred when θ ∈ ω_i (i = 1, ..., m); A_i is said to be a correct decision if θ ∈ ω_i and incorrect if θ ∈ ω_j (j ≠ i).

DEFINITION 2.1: Let a = (a_1, ..., a_m) be a given vector of positive constants each less than one. A d.r. D_N, based on a sample of size N, is said to be a most economical m-decision rule relative to the vector a for discriminating among ω = (ω_1, ..., ω_m) if it satisfies

(2.2)    P_i(θ, D_N) ≥ a_i  for all θ ∈ ω_i    (i = 1, ..., m)

and if N is the least integer n for which (2.2) may be satisfied by some m-d.r. D_n based on a sample of size n. N is said to be the most economical sample size.

Thus, assuming the sampling cost to be proportional to the sample size, a M.E. d.r. is one with minimum sampling cost subject to lower bounds on the probabilities of correct decisions. (2.2) may be written

inf_{θ∈ω_i} P_i(θ, D_N) ≥ a_i    (i = 1, ..., m)

or

sup_{θ∈ω_i} q_i(θ, D_N) ≤ 1 − a_i    (i = 1, ..., m),

where q_i(θ,D) = 1 − P_i(θ,D), so that upper bounds on the probabilities of an incorrect decision are implicit in the definition.
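The inf form of (2.2) can be made concrete with a small numerical sketch (not part of the text; the rule, its cutpoints, and the three parameter sets are assumed for illustration): for a monotone rule on the mean of n normal observations, the infimum of P_i over each ω_i is attained at the boundary point nearest the indifference region, so checking (2.2) reduces to checking finitely many boundary values.

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def P(i, theta, n, c1=-0.5, c2=0.5):
    # three-decision rule on the sample mean (unit variance):
    # accept A_1 if xbar <= c1, A_2 if c1 < xbar <= c2, A_3 if xbar > c2
    z1, z2 = (c1 - theta) * sqrt(n), (c2 - theta) * sqrt(n)
    return (Phi(z1), Phi(z2) - Phi(z1), 1.0 - Phi(z2))[i - 1]

n = 25
grid = [k / 100.0 for k in range(-300, 301)]
omega = {1: [t for t in grid if t <= -1.0],
         2: [t for t in grid if -0.1 <= t <= 0.1],
         3: [t for t in grid if t >= 1.0]}
# inf over (a grid on) each omega_i of the probability of a correct decision
infs = [min(P(i, t, n) for t in omega[i]) for i in (1, 2, 3)]
```

The infima sit at θ = −1.0, ±0.1, and 1.0 respectively, so (2.2) holds at n = 25 for, say, a = (0.97, 0.95, 0.97).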
A second criterion for choosing a d.r. will now be given. We suppose that corresponding to each ω_i one or more alternatives A_j is preferable, or correct, when θ ∈ ω_i.

DEFINITION 2.2: Let β = (β_ij) be a given ℓ x m matrix of positive constants where, for every i, j pair for which A_j is a correct decision when θ ∈ ω_i, β_ij = 1. A d.r. D_N, based on a sample of size N, is said to be a M.E. d.r. relative to the matrix β for discriminating among ω = (ω_1, ..., ω_ℓ) if it satisfies

(2.3)    P_j(θ, D_N) ≤ β_ij  for all θ ∈ ω_i    (i = 1, ..., ℓ; j = 1, ..., m)

and if N is the least integer n for which (2.3) may be satisfied by some m-d.r. D_n based on a sample of size n. N is said to be the M.E. sample size.

The remarks following Definition 1.2, with minor modification, are applicable here as well. In particular, when ℓ = m = 2, the two definitions are equivalent. Considering a 2-d.r. as a test of the hypothesis that θ ∈ ω_1 against the class of alternatives θ ∈ ω_2, both (2.2) and (2.3) specify bounds on the two kinds of error; Hoeffding [8] has proved that a M.E. 2-d.r. may be obtained by considering, for each n, tests of size α (= 1 − a_1) w.r.t. ω_1 which maximize the minimum power w.r.t. ω_2, and choosing that test for which n is a minimum subject to the minimum power being at least 1 − β (= a_2). Definitions 2.1 and 2.2 are two extensions of these concepts to multiple decision problems.

In the following sections, we shall derive both types of M.E. d.r.'s for composite discrimination from minimax d.r.'s w.r.t. certain weight functions relative to the class of all d.r.'s based on a fixed sample size. First, we shall review parts of Wald's theory, with some additional theorems, to be applied to M.E. theory afterwards.
2.2 Minimax Decision Rules for Fixed Sample Sizes.

We shall review briefly some of Wald's concepts of Sections 1.1 and 5.14 [22], with minor changes as indicated in Section 1.2 above. We assume throughout that the sample size n has been fixed. We assume a bounded¹ weight function W(θ,A_j) = W_j(θ) (j = 1, ..., m), defined for all θ ∈ Ω, representing the "loss" incurred by accepting A_j when θ is true. The corresponding risk function when using a d.r. D is:

(2.4)    r(θ,D) = Σ_{j=1}^m W_j(θ) P_j(θ,D).

1. The boundedness is not always required (see Ghosh [6]) but is not restrictive here.

We introduce an a priori distribution, denoted ξ = (ξ̄, λ), over the Borel subsets of Ω, where ξ̄ = (ξ_1, ..., ξ_ℓ), ξ_i being the probability assigned to ω_i, and λ = (λ_1, ..., λ_ℓ), λ_i being the conditional distribution over ω_i (i = 1, ..., ℓ). The average risk relative to ξ is:

(2.5)    r(ξ,D) = ∫_Ω r(θ,D) dξ = Σ_{i=1}^ℓ ξ_i ∫_{ω_i} r(θ,D) dλ_i.
A d.r. D* is said to be a Bayes d.r. relative to ξ if it minimizes the average risk relative to ξ; i.e., if r(ξ,D*) = inf_D r(ξ,D). A d.r. D* is said to be a Bayes d.r. relative to the infinite sequence {ξ^ν} of a priori distributions if

lim_{ν→∞} [inf_D r(ξ^ν,D) − r(ξ^ν,D*)] = 0.

D* is said to be a Bayes d.r. in the strict sense if there exists a ξ such that D* is a Bayes d.r. relative to it, and a Bayes d.r. in the wide sense if there exists an infinite sequence of distributions such that D* is a Bayes d.r. relative to the sequence. A d.r. D^0 is said to be a minimax d.r. if it minimizes the maximum risk over Ω; i.e., if sup_Ω r(θ,D^0) = inf_D sup_Ω r(θ,D). An a priori distribution ξ^0 is said to be least favorable if it maximizes w.r.t. ξ the minimum of the average risk; i.e., if for any other ξ,

(2.6)    inf_D r(ξ^0,D) ≥ inf_D r(ξ,D).
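These definitions can be exercised directly in a tiny case. The sketch below is illustrative and not from the text: two simple hypotheses N(0,1) and N(1,1), one observation, 0-1 loss, with the search restricted to threshold rules (adequate here, since Bayes rules against these priors are of threshold form); it grid-searches a least favorable prior and a minimax threshold.

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Risks of the threshold rule D_c (accept the second hypothesis when x > c):
def risks(c):
    return 1.0 - Phi(c), Phi(c - 1.0)

cs = [k / 1000.0 for k in range(-1000, 2001)]

def bayes_risk(xi):
    # inf_D r(xi, D) over the threshold family
    return min(xi * r1 + (1.0 - xi) * r2 for r1, r2 in map(risks, cs))

# A least favorable prior maximizes the minimum average risk, cf. (2.6).
xi0 = max((k / 100.0 for k in range(101)), key=bayes_risk)

# A minimax rule minimizes the maximum risk; by Theorem 2.2 (iii) its
# maximum risk equals the Bayes risk under the least favorable prior.
c0 = min(cs, key=lambda c: max(risks(c)))
```

By symmetry the least favorable prior is (1/2, 1/2) and the minimax threshold is c = 1/2, and the two extreme risks agree.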
Wald has proved the following two theorems ([22], remarks on page 148 and parts of Theorems 3.8, 3.9, 3.10, and 5.12):

THEOREM 2.1: A necessary and sufficient condition for a d.r. D to be a Bayes d.r. relative to a given a priori distribution ξ is that φ_j(x) = 0 for any x (except perhaps on a set of measure zero) and for any j for which

∫_Ω W_j(θ) f(x,θ) dξ > min_{1≤k≤m} ∫_Ω W_k(θ) f(x,θ) dξ.

THEOREM 2.2: (i) There exists a minimax d.r. D^0; (ii) any minimax d.r. is a Bayes d.r. in the wide sense; (iii) suppose there exists a least favorable distribution ξ^0; then any minimax d.r. D^0 is a Bayes d.r. relative to it, and conversely; and sup_Ω r(θ,D^0) = r(ξ^0,D^0); (iv) if ξ^0 is a least favorable distribution, D^0 a minimax d.r., and ω' the set of all θ ∈ Ω for which r(θ,D^0) < sup_Ω r(θ,D^0), then ξ^0(ω') = 0.

We formulate two assumptions ([22], Assumptions 5.1 and 5.6) and state a lemma and a theorem proved by Wald ([22], Lemma 5.1 and Theorem 5.11).
ASSUMPTION 2.1: Ω is closed and bounded and W_j(θ) (j = 1, ..., m) is a continuous function of θ.

ASSUMPTION 2.2: If {θ^ν} is a sequence of parameter points such that lim_{ν→∞} θ^ν = θ^0, then

lim_{ν→∞} ∫_R f(x,θ^ν) dμ = ∫_R f(x,θ^0) dμ

uniformly in all subsets R of X.

LEMMA 2.1: If f(x,θ) is continuous in θ, Assumption 2.2 holds.

THEOREM 2.3: If Assumptions 2.1 and 2.2 hold, then there exists a least favorable distribution.

Another existence theorem for least favorable distributions has been given by Lehmann [14].

We consider another assumption and prove three theorems which may be helpful in finding minimax d.r.'s. Sverdrup [20] gives some other theorems which may prove useful in this same regard.
ASSUMPTION 2.3: For each i, j pair (i = 1, ..., ℓ; j = 1, ..., m), W_j(θ) equals a constant, say W_ij, for all θ ∈ ω_i.

This assumption is that for each alternative the loss varies only from subset to subset among ω_1, ..., ω_ℓ and not within any subset. For a given set of conditional distributions λ = (λ_1, ..., λ_ℓ), define

(2.8)    f_i^λ(x) = ∫_{ω_i} f(x,θ) dλ_i    (i = 1, ..., ℓ).
THEOREM 2.4: If Assumption 2.3 holds, a necessary and sufficient condition for a d.r. D* to be a Bayes d.r. relative to ξ = (ξ̄,λ) for discriminating among ω = (ω_1, ..., ω_ℓ) is that D* be a Bayes d.r. relative to ξ̄ for discriminating among f^λ = (f_1^λ, ..., f_ℓ^λ) w.r.t. the weight function W_ij. The average risks in the two cases are equal.

Proof: Using Assumption 2.3 and (2.8), we have

(2.9)    ∫_{ω_i} W_j(θ) f(x,θ) dλ_i = W_ij f_i^λ(x).

The first part of the theorem follows immediately from (2.9) and Theorems 1.1 and 2.1. We now show that, for any d.r. D, the two risks are equal. We have, using (2.4), Assumption 2.3, (2.1), Fubini's Theorem, (2.8), (1.1), and (1.4):

(2.10)    ∫_{ω_i} r(θ,D) dλ_i = Σ_{j=1}^m ∫_{ω_i} W_j(θ) P_j(θ,D) dλ_i
                              = Σ_j W_ij ∫_X φ_j(x) f_i^λ(x) dμ
                              = Σ_j W_ij p_ij(λ,D) = r(f_i^λ, D),

where we have denoted p_ij(λ,D) = Pr(D chooses A_j | f_i^λ). Hence,

(2.11)    r(ξ,D) = Σ_{i=1}^ℓ ξ_i ∫_{ω_i} r(θ,D) dλ_i = Σ_i ξ_i r(f_i^λ, D) = r_λ(ξ̄,D),

where r_λ(ξ̄,D) is the average risk relative to ξ̄ when discriminating among f^λ. //
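The bookkeeping in (2.10)–(2.11) can be verified exactly in a discrete toy case. Below, the observation is Bernoulli, and the two subsets ω_i, their conditional distributions λ_i, and the rule are all assumed for illustration; the point is only that the composite average risk and the risk computed from the averaged densities f_i^λ agree.

```python
# Bernoulli(theta) observation, one draw, W_ij = 1 - delta_ij (0-1 loss)
f = lambda x, th: th if x == 1 else 1.0 - th
omega = [{0.2: 0.6, 0.3: 0.4},     # lambda_1 over omega_1 (illustrative)
         {0.7: 0.5, 0.8: 0.5}]     # lambda_2 over omega_2 (illustrative)
phi = lambda x: (1, 0) if x == 0 else (0, 1)   # accept A_1 on x=0, A_2 on x=1
W = [[0.0, 1.0], [1.0, 0.0]]
xi = [0.5, 0.5]

def P(j, th):        # P_j(theta, D), cf. (2.1)
    return sum(phi(x)[j] * f(x, th) for x in (0, 1))

def r(th, i):        # risk (2.4) with W_j(theta) = W[i][j] on omega_i
    return sum(W[i][j] * P(j, th) for j in (0, 1))

# left side: average the risk over each omega_i, then over xi
lhs = sum(xi[i] * sum(lam * r(th, i) for th, lam in omega[i].items())
          for i in (0, 1))

def f_avg(i, x):     # f_i^lambda of (2.8)
    return sum(lam * f(x, th) for th, lam in omega[i].items())

# right side: risk of D under the averaged densities, then over xi
rhs = sum(xi[i] * sum(W[i][j] * sum(phi(x)[j] * f_avg(i, x) for x in (0, 1))
                      for j in (0, 1))
          for i in (0, 1))
```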
THEOREM 2.5: Suppose Assumption 2.3 holds. Necessary and sufficient conditions that ξ^0 = (ξ̄^0,λ^0) be a least favorable distribution and D^0 a minimax d.r. for discriminating among ω are that (i) ξ̄^0 is a least favorable distribution and D^0 is a minimax d.r. w.r.t. W_ij for discriminating among f^{λ^0}; and (ii) for any i for which ξ_i^0 > 0,

∫_{ω_i} r(θ,D^0) dλ_i^0 = sup_{θ∈ω_i} r(θ,D^0).

Moreover, the maximum risks in the two cases are equal; i.e.,

(2.12)    sup_Ω r(θ,D^0) = max_i r(f_i^{λ^0}, D^0).

Proof: We first prove the necessity. Since (2.6) holds for any ξ = (ξ̄,λ), we have in particular, for any ξ̄, inf_D r(ξ^0,D) ≥ inf_D r((ξ̄,λ^0),D), so that, using (2.11), inf_D r_{λ^0}(ξ̄^0,D) ≥ inf_D r_{λ^0}(ξ̄,D); that is, ξ̄^0 is a least favorable distribution for discriminating among f^{λ^0}. By Theorem 2.2 (iii), D^0 is a Bayes d.r. relative to ξ^0, so that, by Theorems 2.4 and 1.2 (iii), D^0 is a minimax d.r. for discriminating among f^{λ^0}. We shall now verify (2.12). By Theorem 1.2 (iv),

Σ_{i=1}^ℓ ξ_i^0 r(f_i^{λ^0}, D^0) = r_{λ^0}(ξ̄^0, D^0) = max_i r(f_i^{λ^0}, D^0),

so that, together with (2.11) and Theorem 2.2 (iii), we have (2.12). For any i for which ξ_i^0 > 0, we have r(f_i^{λ^0},D^0) = max_i r(f_i^{λ^0},D^0) by Theorem 1.2 (iv), and sup_{ω_i} r(θ,D^0) = sup_Ω r(θ,D^0) by Theorem 2.2 (iv), which, together with (2.12) and (2.10), prove (ii).
We now prove the sufficiency. By Theorems 1.2 (iii) and 2.4, D^0 is a Bayes d.r. relative to ξ^0 = (ξ̄^0,λ^0); i.e.,

(2.13)    r(ξ^0,D^0) = inf_D r(ξ^0,D).

Suppose ξ^0 is not a least favorable distribution; then there exists a ξ = (ξ̄,λ) such that

(2.14)    inf_D r(ξ^0,D) < inf_D r(ξ,D).

But

(2.15)    inf_D r(ξ,D) ≤ r(ξ,D^0) = Σ_{i=1}^ℓ ξ_i ∫_{ω_i} r(θ,D^0) dλ_i ≤ sup_Ω r(θ,D^0).

By Theorem 1.2 (iv), for any i for which ξ_i^0 > 0, r(f_i^{λ^0},D^0) = max_i r(f_i^{λ^0},D^0), which, together with (2.10) and (ii), implies

sup_{ω_i} r(θ,D^0) = max_i sup_{ω_i} r(θ,D^0) = sup_Ω r(θ,D^0).

Hence, from (ii),

r(ξ^0,D^0) = Σ_i ξ_i^0 ∫_{ω_i} r(θ,D^0) dλ_i^0 = sup_Ω r(θ,D^0),

in contradiction to (2.13), (2.14), and (2.15). Hence, ξ^0 is a least favorable distribution, and, by Theorem 2.2 (iii), D^0 is a minimax d.r. for discriminating among ω. //
THEOREM 2.6: Suppose Assumption 2.3 holds, and suppose {λ^ν} is a sequence of sets of conditional a priori distributions and D^0 a d.r. such that

(2.16)    lim_{ν→∞} ∫_{ω_i} r(θ,D^ν) dλ_i^ν = sup_{θ∈ω_i} r(θ,D^0)    (i = 1, ..., ℓ),

where, for each ν = 1, 2, ..., D^ν is a minimax d.r. for discriminating among f^{λ^ν}. Then D^0 is a minimax d.r. for discriminating among ω.

Proof: By Theorem 1.2 (ii) and (iii), for each ν there exists a least favorable distribution ξ̄^ν for discriminating among f^{λ^ν}, and D^ν is a Bayes d.r. relative to ξ^ν = (ξ̄^ν,λ^ν); i.e., for any d.r. D,

(2.17)    Σ_{i=1}^ℓ ξ_i^ν ∫_{ω_i} r(θ,D^ν) dλ_i^ν ≤ Σ_{i=1}^ℓ ξ_i^ν ∫_{ω_i} r(θ,D) dλ_i^ν ≤ sup_Ω r(θ,D).

Now each sequence {ξ_i^ν} has at least one limit point; let {ν_j}, j = 1, 2, ..., be a subsequence of {ν} for which each ξ_i^{ν_j} converges to a limit, say ξ_i^0; then Σ_i ξ_i^0 = 1. By Theorem 1.2 (iv) and (2.10), for each i for which ξ_i^ν > 0, ∫_{ω_i} r(θ,D^ν) dλ_i^ν = max_k ∫_{ω_k} r(θ,D^ν) dλ_k^ν, so that, from (2.16), for each i for which ξ_i^0 > 0,

sup_{ω_i} r(θ,D^0) = max_i sup_{ω_i} r(θ,D^0) = sup_Ω r(θ,D^0).

Hence, from (2.16),

lim_{j→∞} Σ_{i=1}^ℓ ξ_i^{ν_j} ∫_{ω_i} r(θ,D^{ν_j}) dλ_i^{ν_j} = Σ_i ξ_i^0 sup_{ω_i} r(θ,D^0) = sup_Ω r(θ,D^0),

which, together with (2.17), asserts sup_Ω r(θ,D^0) ≤ sup_Ω r(θ,D) for any d.r. D; i.e., D^0 is a minimax d.r. for discriminating among ω. //
If a least favorable distribution exists, the problem reduces to one of simple discrimination, so that if μ is non-atomic only non-randomized d.r.'s need be considered. We state a lemma for the case of composite discrimination analogous to Lemma 1.1; the proof (not given) is also analogous.

LEMMA 2.1: For every fixed sample size n = 0, 1, 2, ..., let ρ_n = inf_D sup_Ω r(θ,D), the infimum being taken over all d.r.'s based on samples of size n; then {ρ_n}, n = 0, 1, 2, ..., is a non-increasing sequence.
2.3 Most Economical Decision Rules Relative to a Vector a.

As in Sections 1.3.1 and 1.3.2, we shall apply the theory of Section 2.2 to two specific weight functions W_j(θ) and develop in each case a method of obtaining M.E. d.r.'s as defined by Definition 2.1. In both cases we assume ℓ = m. First, let

(2.18)    W_j(θ) = −1/a_i  if θ ∈ ω_i and j = i,  = 0 otherwise    (i = 1, ..., m).

From (2.4), the risk w.r.t. W_j(θ) is

(2.19)    r(θ,D) = −P_i(θ,D)/a_i  if θ ∈ ω_i    (i = 1, ..., m),

and

sup_{ω_i} r(θ,D) = −inf_{ω_i} P_i(θ,D)/a_i    (i = 1, ..., m).
By Theorem 2.2, there exists a minimax d.r. D^0. We consider such d.r.'s for each sample size n and prove:

THEOREM 2.7: For each n = 0, 1, 2, ..., let D_n^0 be a minimax d.r. w.r.t. the weight function (2.18) for samples of fixed size n.² Suppose for some n, sup_Ω r(θ,D_n^0) ≤ −1, and let N be the least such integer. Then D_N^0 is a M.E. d.r. relative to the vector a for discriminating among ω. Conversely, if there exists a M.E. d.r. relative to a for discriminating among ω and the M.E. sample size is N, then D_N^0 is a M.E. d.r.

Proof: The proof is exactly like the proof of Theorem 1.3, replacing P_i(D_n) by inf_{ω_i} P_i(θ,D_n). //

2. The remarks in footnote 4, Chapter I, are applicable here as well.
Note that the weight function (2.18) satisfies Assumption 2.3 with W_ij given by (1.6). Hence, if a least favorable distribution ξ^0 = (ξ̄^0,λ^0) exists, Theorems 2.4 and 2.5 imply that the composite discrimination problem may be treated as a simple discrimination problem with

f_i(x) = f_i^{λ^0}(x) = ∫_{ω_i} f(x,θ) dλ_i^0,

and the theory of Chapter I will be applicable. If a least favorable distribution does not exist, Theorem 2.6 asserts that, by a similar treatment for a sequence of a priori distributions having certain properties in the limit, it may be possible to solve the composite discrimination problem. Now suppose a least favorable distribution ξ^0 = (ξ̄^0,λ^0) exists. Then, by Theorem 2.5,

(2.20)    ∫_{ω_i} P_i(θ,D^0) dλ_i^0 = inf_{θ∈ω_i} P_i(θ,D^0)  for any i for which ξ_i^0 > 0.

ASSUMPTION 2.4: If R is a subset of X for which ∫_R f(x,θ) dμ = 0 for some θ ∈ Ω, then ∫_R f(x,θ) dμ = 0 for all θ ∈ Ω.

This assumption implies Assumption 1.2 for the density functions f_1^λ, ..., f_m^λ, defined by (2.8), for any set of conditional distributions λ. If Assumption 2.4 holds, and if a least favorable distribution exists, it follows from Theorems 1.4 and 1.2 (iv) and (2.20) that

(2.21)    inf_{ω_1} P_1(θ,D^0)/a_1 = ... = inf_{ω_m} P_m(θ,D^0)/a_m,

where D^0 is a minimax d.r.
We now consider a second weight function W_j(θ):

(2.22)    W_j(θ) = 1/β_i  if θ ∈ ω_i and j ≠ i,  = 0 otherwise    (i = 1, ..., m),

where β_i = 1 − a_i as before. From (2.4), the risk w.r.t. W_j(θ) is

(2.23)    r(θ,D) = q_i(θ,D)/β_i  if θ ∈ ω_i    (i = 1, ..., m),

and

sup_{ω_i} r(θ,D) = sup_{ω_i} q_i(θ,D)/β_i    (i = 1, ..., m).

By Theorem 2.2, there exists a minimax d.r. D^0. The proof of the following theorem is analogous to the proof of Theorem 2.7 and therefore will not be given.

THEOREM 2.8: For each n = 0, 1, 2, ..., let D_n^0 be a minimax d.r. w.r.t. the weight function (2.22) for samples of fixed size n. Suppose for some n, sup_Ω r(θ,D_n^0) ≤ 1, and let N be the least such integer. Then D_N^0 is a M.E. d.r. relative to the vector a for discriminating among ω. Conversely, if there exists a M.E. d.r. relative to a for discriminating among ω and the M.E. sample size is N, then D_N^0 is a M.E. d.r.
Note that the weight function (2.22) satisfies Assumption 2.3 with W_ij given by (1.14). Hence, Theorems 2.4 to 2.6 and the theory of Chapter I may be applied to obtain M.E. d.r.'s. Suppose a least favorable distribution ξ^0 = (ξ̄^0,λ^0) does exist. Then, by Theorem 2.5 and (2.23),

(2.24)    ∫_{ω_i} q_i(θ,D^0) dλ_i^0 = sup_{ω_i} q_i(θ,D^0)  for any i for which ξ_i^0 > 0.

If Assumption 2.4 holds, if a least favorable distribution exists, and if

(2.25)    sup_Ω r(θ,D^0) < 1/max_j β_j,

where D^0 is a minimax d.r., then by Lemma 1.2 and (2.24)

(2.26)    sup_{ω_1} q_1(θ,D^0)/β_1 = ... = sup_{ω_m} q_m(θ,D^0)/β_m.
The following two lemmas give sufficient conditions for (2.25) to hold.

LEMMA 2.2: Suppose Assumption 2.4 holds. For any sample size greater than or equal to the M.E. sample size, (2.25) holds.

Proof: Suppose n ≥ N, the M.E. sample size, and that D_n^0 is a minimax d.r. for samples of size n; then, using Lemma 2.1 and Theorem 2.8, D_n^0 satisfies (2.2). This implies q_i(θ,D_n^0) ≤ β_i, so that sup_Ω r(θ,D_n^0) ≤ 1 < 1/max_j β_j. //

LEMMA 2.3: If β_i < Σ_{j=1}^m β_j/(m−1) (i.e., a_i > (Σ_j a_j − 1)/(m−1)) for all i, then (2.25) holds.

The lemma may be proved analogously to Lemma 1.3. //
Thus, we have given two methods for obtaining M.E. d.r.'s relative to a for composite discrimination. If Assumption 2.4 holds and if a least favorable distribution exists, the first method leads to M.E. d.r.'s for which the minimum probabilities of a correct decision are in the ratios a_1 : a_2 : ... : a_m (see (2.21)), and the second method leads to M.E. d.r.'s for which the maximum probabilities of an incorrect decision are in the ratios β_1 : β_2 : ... : β_m (see (2.26)). This suggests an alternative approach: we may consider for each n the class of d.r.'s satisfying (2.21) (or (2.26)) and define an "optimum" d.r. from among this class as a d.r. which maximizes (or minimizes) the common ratio. By considering these "optimum" d.r.'s for n = 0, 1, 2, ..., and choosing the least n for which the ratio is ≥ 1 (or ≤ 1), we can obtain M.E. d.r.'s. Results similar to Theorems 2.5 and 2.6, stated in terms of the weight function (2.18) (or (2.22)), may be derived as sufficient conditions for a d.r. to be optimum in the corresponding class. This is clearly equivalent to the minimax approach.

Thus, to obtain a M.E. d.r., we look for a set of "least favorable conditional distributions", or a sequence of distributions which is "least favorable in the limit", and then determine n and the constants defining a likelihood ratio d.r. for discriminating among these "average" density functions for which (2.21) (or (2.26)) are satisfied.
Wald's definition of admissibility for either of the weight functions considered here is: a d.r. D is said to be admissible if there does not exist a d.r. D' for the same sample size for which P_i(θ,D') ≥ P_i(θ,D) for all θ ∈ ω_i (i = 1, ..., m) with strict inequality for at least one θ ∈ ω. No proof of admissibility of the M.E. d.r.'s derived in this section has been obtained. However, if Assumption 2.4 holds and there exists a least favorable distribution, it can easily be verified that there does not exist a d.r. D_N' based on a sample of size N for which inf_{ω_i} P_i(θ,D_N') ≥ inf_{ω_i} P_i(θ,D_N) (i = 1, ..., m) with strict inequality for at least one i, where D_N is a M.E. d.r. obtained by either of the minimax methods of this section.
2.4 Most Economical Decision Rules Relative to a Matrix β.

We now consider M.E. d.r.'s for composite discrimination as defined by Definition 2.2. Just as the approach of Section 1.3.2 was extended in Section 1.4, we shall extend the approach of Section 2.3 in this section. The argument is very brief because of this analogy.

We replace each β_ij which is equal to 1 by +∞. Suppose n fixed, and consider parameter spaces Ω_1, ..., Ω_m, each Ω_j being identical to Ω, and denote Ω' = ∪_j Ω_j. For each j, denote the corresponding subsets by ω_1j, ..., ω_ℓj. Define a weight function W_k(θ), for k = 1, ..., m, by

(2.27)    W(θ,A_k) = W_k(θ) = 1/β_ij  if θ ∈ ω_ij and j = k  (i = 1, ..., ℓ; j = 1, ..., m),  = 0 otherwise.

From (2.4), we obtain

r(θ,D) = P_j(θ,D)/β_ij  if θ ∈ ω_ij.

Let ξ be an a priori distribution over Ω', with ξ_ij the probability assigned to ω_ij and λ_ij the corresponding conditional distribution over ω_ij. For a given set of conditional distributions λ = (λ_11, λ_12, ..., λ_ℓm), denote

(2.28)    g_ij^λ(x) = ∫_{ω_ij} f(x,θ) dλ_ij.
In a manner completely analogous to Sections 2.3 and 1.4, we have:

THEOREM 2.9: For each n = 0, 1, 2, ..., let D_n^0 be a minimax d.r. w.r.t. the weight function (2.27) for samples of fixed size n. Suppose for some n, sup_{Ω'} r(θ,D_n^0) ≤ 1, and let N be the least such integer. Then D_N^0 is a M.E. d.r. relative to the matrix β for discriminating among ω = (ω_1, ..., ω_ℓ). Conversely, if there exists a M.E. d.r. relative to β for discriminating among ω and the M.E. sample size is N, then D_N^0 is a M.E. d.r.

The theorems of Section 2.2 may be applied to obtain minimax d.r.'s for composite discrimination w.r.t. the weight function (2.27) by replacing single subscripts i by ij and ℓ in the theorems by ℓ' = ℓ·m, and replacing f_i^λ by g_ij^λ. If a least favorable distribution exists, then the composite discrimination problem reduces to a problem of simple discrimination among the "average" density functions g_ij^λ defined by (2.28) w.r.t. a set of "least favorable conditional distributions" λ, and Theorem 1.11 and the remarks of Section 1.4 are applicable. Thus, this method of solution gives minimum unlikelihood d.r.'s as M.E. d.r.'s. If a least favorable distribution does not exist, then a minimax d.r. will be a Bayes d.r. in the wide sense and Theorem 2.6 may be applicable.
2.5 Some Parametric Examples of Three-Decision Rules.

EXAMPLE 1: Normal Mean, Variance Known. Suppose f(x,θ) is a normal density function with variance σ² (known) and mean θ, and

ω_1 = {θ : θ ≤ θ_1},  ω_2 = {θ : θ_2' ≤ θ ≤ θ_2''},  ω_3 = {θ : θ ≥ θ_3},

where θ_1 < θ_2' ≤ θ_2'' < θ_3. Define a set of conditional distributions λ = (λ_1, λ_2, λ_3) where λ_i assigns probability one to θ_i (i = 1, 2, 3) and θ_2 = θ_2' or θ_2'', to be determined later. For fixed n, we shall show that such a set of λ's is "least favorable" in the sense of Theorem 2.5 (ii), w.r.t. the weight function (2.18), and hence this composite discrimination problem reduces to the simple discrimination problem considered in Example 1, Section 1.3.6, with θ_2 determined as follows:

(2.29)    θ_2 = θ_2'  if P_2(θ_2',D') ≤ P_2(θ_2'',D''),  = θ_2''  otherwise,

where D' and D'' are the solutions to the corresponding simple discrimination problems (with θ_2 = θ_2' or θ_2'') for fixed n; we shall show below that such a determination of θ_2 is complete and consistent. We shall only treat this extension of Example 1 of Section 1.3.6, but the other three examples of that section may be extended in a completely analogous manner. Also, we may use the weight function (2.22) instead of (2.18) by making minor changes throughout.
We now show that λ satisfies Theorem 2.5 (ii) for fixed n. Let D, defined by c_1 and c_2, be a minimax d.r. for discriminating among θ_1, θ_2, θ_3, as given in Section 1.3.6; primes on any symbol refer to the corresponding value of θ_2. Since Φ is an increasing function, and using (1.26),

inf_{ω_1} P_1(θ,D) = P_1(θ_1,D),

and similarly

inf_{ω_3} P_3(θ,D) = P_3(θ_3,D).

Furthermore, P_2(θ,D) is an increasing function of θ for θ ≤ (c_1 + c_2)/2 and a decreasing function of θ for θ ≥ (c_1 + c_2)/2. Hence, if θ_2 is chosen according to (2.29), Theorem 2.5 (ii) is satisfied.
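The shape of P_2 invoked here — increasing up to (c_1 + c_2)/2 and decreasing beyond it, so that its infimum over an interval is at an endpoint — is easy to confirm numerically. In the sketch below the constants n, c_1, c_2 and the interval endpoints are assumed values, with unit variance.

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def P2(theta, n, c1, c2):
    # probability of accepting A_2 (c1 < xbar <= c2), unit variance
    return Phi((c2 - theta) * sqrt(n)) - Phi((c1 - theta) * sqrt(n))

n, c1, c2 = 16, -0.6, 0.4
grid = [k / 1000.0 for k in range(-2000, 2001)]
peak = max(grid, key=lambda th: P2(th, n, c1, c2))

# the infimum of P_2 over an interval containing the peak sits at an endpoint
lo, hi = -0.2, 0.3
inf_val = min(P2(th, n, c1, c2) for th in grid if lo <= th <= hi)
```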
We now show that such a choice of θ_2 is possible by showing that if P_2(θ_2'',D'') > P_2(θ_2',D') then P_2(θ_2'',D') > P_2(θ_2',D'), and conversely. From (1.21), with either a prime or a double prime throughout, P_2(θ_2,D) may be expressed as a function of p, and it is obviously a decreasing function of p for fixed θ_2. Denote a_2 p' = P_2(θ_2',D') and a_2 p'' = P_2(θ_2'',D''), so that, upon subtracting,

(2.30)    a_2(p'' − p') = P_2(θ_2'',D'') − P_2(θ_2',D').

Suppose P_2(θ_2'',D'') > P_2(θ_2',D'); then, from (2.30), p'' > p'. Since P_2(θ_2'',D) is a decreasing function of p, we then have P_2(θ_2'',D') > P_2(θ_2'',D'') > P_2(θ_2',D'). Conversely, in the same manner, if P_2(θ_2'',D') > P_2(θ_2',D'), then p'' must be greater than p' (otherwise P_2(θ_2'',D'') ≥ P_2(θ_2'',D') > a_2 p' ≥ a_2 p'', a contradiction), and hence, from (2.30), P_2(θ_2'',D'') > P_2(θ_2',D').
EXAMPLE 2: Normal Mean, Variance Unknown. Now suppose f(x,θ), θ = (ξ,σ), is a normal density function with both mean ξ and variance σ² unknown, and suppose, given θ_1, θ_2', θ_2'', θ_3 (θ_1 < θ_2' ≤ θ_2'' < θ_3), that

ω_1 = {θ : ξ/σ ≤ θ_1, 0 < σ < ∞},  ω_2 = {θ : θ_2' ≤ ξ/σ ≤ θ_2'', 0 < σ < ∞},  ω_3 = {θ : ξ/σ ≥ θ_3, 0 < σ < ∞}.

We shall show that, for a fixed sample size n, a non-randomized d.r. D^0 with acceptance regions of the form (1.26), with x̄ replaced by Student's t-statistic, is a minimax d.r. w.r.t. the weight function (2.18). c_1, c_2, and p are determined by equations of the form (1.27) with the normal distribution functions replaced by non-central t distribution functions. By considering such d.r.'s for various sample sizes, a M.E. d.r. may be obtained according to Theorem 2.7. (Alternatively, we may use the weight function (2.22) and Theorem 2.8.)

To prove that D^0 is minimax, we consider a sequence of distributions {λ^ν} and a corresponding sequence of minimax d.r.'s {D^ν} for discriminating among f^{λ^ν}, defined by (2.8), and apply Theorem 2.6. The methods of this example are adapted from Hoeffding's lecture notes [8], where he shows that a test based on Student's t maximizes the minimum power against a one-sided class of alternatives.
Let n be fixed throughout. For each ν = 1, 2, 3, ..., consider a set of conditional distributions (λ_1^ν, λ_2^ν, λ_3^ν) where λ_i^ν assigns probability one to sets of θ in which ξ/σ = θ_i (i = 1, 2, 3), θ_2 = θ_2' or θ_2'', to be determined later as in (2.29), and σ is distributed over (0,∞) according to the probability density

(2.31)    c^m τ^{m−1} e^{−cτ} / Γ(m),

where τ = 1/(2σ²), m = 1/ν, and c is a positive constant. For each ν (or m), a non-randomized likelihood ratio d.r. for discriminating among (f_1^{λ^ν}, f_2^{λ^ν}, f_3^{λ^ν}) which satisfies (1.11) is a minimax d.r. w.r.t. the weight function (1.6) for simple discrimination. Denote such a d.r. by D^ν. Now D^ν is determined by the ratios L_ij^{λ^ν}(x) = f_j^{λ^ν}(x)/f_i^{λ^ν}(x) for various values of i, j (i < j). We have, using (2.8) and (2.31),

(2.32)    f_i^{λ^ν}(x) = const · e^{−nθ_i²/2} ∫_0^∞ τ^{n/2+m−1} exp[−τ(c + Σ_k x_k²) + θ_i √(2τ) Σ_k x_k] dτ    (i = 1, 2, 3).
Let u = τ(c + Σ_k x_k²) and denote t_c(x) = Σ_k x_k / √(Σ_k x_k² + c); then L_ij^{λ^ν} is a function of t_c only. By a theorem of Kruskal ([12], see his equation (4.2)), L_ij^{λ^ν} is an increasing function of t_c for θ_i < θ_j (i < j), so that Theorem 1.9 is applicable, and D^ν is of the form (1.26) with x̄ replaced by t_c and the constants c_{1mc}, c_{2mc}, and p^ν determined by equations of the form (1.11). Conditionally on σ, the acceptance probabilities depend on σ only through τ; denote

(2.33)    s_1(cτ, c_1) = Pr(t_c ≤ c_1 | ξ/σ = θ_1, τ),  s_2(cτ, c_1, c_2) = Pr(c_1 < t_c ≤ c_2 | ξ/σ = θ_2, τ),  s_3(cτ, c_2) = Pr(t_c > c_2 | ξ/σ = θ_3, τ).

Then, using (2.31),

(2.34)    a_1 p^ν = ∫_0^∞ s_1(cτ, c_{1mc}) (c^m τ^{m−1}/Γ(m)) e^{−cτ} dτ = ∫_0^∞ s_1(u, c_{1mc}) (u^{m−1}/Γ(m)) e^{−u} du,

and similar equations may be obtained with a_2 p^ν and a_3 p^ν on the left. These are the equations determining c_{1mc}, c_{2mc}, and p^ν, and thus it is clear that c_{1mc} and c_{2mc} are independent of c, which therefore is arbitrary. Hereafter, we omit c as a subscript.
Now the distribution of t depends on θ only through ξ/σ, so that P_i(θ_i, D^0) is independent of σ (i = 1, 2, 3). It may be verified as in Example 1 that

(2.35)    inf_{ω_i} P_i(θ, D^0) = P_i(θ_i, D^0)    (i = 1, 2, 3),

where θ_2 = θ_2' or θ_2'', to be determined as in (2.29). (That such a choice of θ_2 is possible may be proved as in the previous example.)
Student's t may be written

t(x) = √(n−1) t_0 / √(n − t_0²),  where t_0 = Σ_k x_k / √(Σ_k x_k²),

and clearly t is an increasing function of t_0. Hence, the d.r. D^0 may be expressed in terms of t_0 rather than t, with corresponding constants c_1' and c_2' defining the acceptance regions; that is, c_1', c_2', and p are determined by P_i(θ_i, D^0)/a_i = p (i = 1, 2, 3); explicitly, using the notation introduced in (2.33),

(2.36)    P_1(θ_1,D^0) = Pr(t_0 ≤ c_1' | θ_1) = s_1(0, c_1') = a_1 p,
          P_2(θ_2,D^0) = Pr(c_1' < t_0 ≤ c_2' | θ_2) = s_2(0, c_1', c_2') = a_2 p,
          P_3(θ_3,D^0) = Pr(t_0 > c_2' | θ_3) = s_3(0, c_2') = a_3 p.
We shall prove presently that

(2.37)    lim_{ν→∞} p^ν = p.

Assuming it to be true temporarily, we have from (2.32) and (2.36)

(2.38)    lim_{ν→∞} ∫_{ω_i} P_i(θ,D^ν) dλ_i^ν = lim_{ν→∞} a_i p^ν = a_i p = P_i(θ_i, D^0)    (i = 1, 2, 3).

From (2.38), (2.35), and (2.19), (2.16) is satisfied for the weight function (2.18), so that D^0 is a minimax d.r. We need only prove that (2.37) holds. First, we prove

(2.39)    lim_{m→0} c_{im} = c_i'    (i = 1, 2),

where c_1' and c_2' are defined by (2.36). For any v > 0, we have
(2.40)    ∫_v^∞ (u^{m−1}/Γ(m)) e^{−u} du ≤ (1/v) ∫_v^∞ (u^m/Γ(m)) e^{−u} du ≤ (1/v) ∫_0^∞ (u^m/Γ(m)) e^{−u} du = m/v.

Hence, remembering that 0 ≤ s_1 ≤ 1, we have from (2.34)

(2.41)    ∫_0^v s_1(u, c_{1m}) (u^{m−1}/Γ(m)) e^{−u} du ≤ a_1 p^ν ≤ ∫_0^v s_1(u, c_{1m}) (u^{m−1}/Γ(m)) e^{−u} du + m/v.

Let v = √m, so that, from (2.41),

(2.42)    (1/a_1) ∫_0^{√m} s_1(u, c_{1m}) (u^{m−1}/Γ(m)) e^{−u} du ≤ p^ν ≤ (1/a_1) [ ∫_0^{√m} s_1(u, c_{1m}) (u^{m−1}/Γ(m)) e^{−u} du + √m ];

similarly, we may obtain

(2.43)    (1/a_2) ∫_0^{√m} s_2(u, c_{1m}, c_{2m}) (u^{m−1}/Γ(m)) e^{−u} du ≤ p^ν ≤ (1/a_2) [ ∫_0^{√m} s_2(u, c_{1m}, c_{2m}) (u^{m−1}/Γ(m)) e^{−u} du + √m ]

and

(2.44)    (1/a_3) ∫_0^{√m} s_3(u, c_{2m}) (u^{m−1}/Γ(m)) e^{−u} du ≤ p^ν ≤ (1/a_3) [ ∫_0^{√m} s_3(u, c_{2m}) (u^{m−1}/Γ(m)) e^{−u} du + √m ].
Suppose, for at least one i (i = 1, 2), c_{im} does not tend to c_i'; e.g., suppose that lim sup_{m→0} c_{1m} = c_1' + ε (ε > 0). Then there exists a sequence {m_j}, j = 1, 2, ..., such that lim_{j→∞} m_j = 0 and lim_{j→∞} c_{1m_j} = c_1' + ε, and, since s_1(u, c_1) is continuous in u and c_1, the right-hand side of (2.42) tends to s_1(0, c_1' + ε)/a_1 and the left side tends to p* = lim sup_{ν→∞} p^ν. Next let δ be such that lim sup_{j→∞} c_{2m_j} = c_2' + δ; then there exists a subsequence of {m_j}, redefined as {m_k}, k = 1, 2, ..., such that lim_{k→∞} c_{2m_k} = c_2' + δ, and the right sides of (2.43) and (2.44) tend to s_2(0, c_1'+ε, c_2'+δ)/a_2 and s_3(0, c_2'+δ)/a_3, respectively, whereas the left sides both tend to p*. But, from (2.36), it is seen that s_1(0, c_1) is an increasing function of c_1, s_2(0, c_1, c_2) is a decreasing function of c_1 and an increasing function of c_2, and s_3(0, c_2) is a decreasing function of c_2. Hence,

s_2(0, c_1', c_2')/a_2 = s_1(0, c_1')/a_1 < s_1(0, c_1'+ε)/a_1 = s_2(0, c_1'+ε, c_2'+δ)/a_2, implying δ > 0,

and

s_3(0, c_2')/a_3 = s_1(0, c_1')/a_1 < s_1(0, c_1'+ε)/a_1 = s_3(0, c_2'+δ)/a_3, implying δ < 0,

a contradiction. Hence, lim sup_{m→0} c_{1m} ≤ c_1'. Similarly, we may show that lim sup_{m→0} c_{2m} ≤ c_2' and lim inf_{m→0} c_{im} ≥ c_i' (i = 1, 2). Consequently, (2.39) holds. Taking the limit in (2.42) and using (2.36), (2.37) is verified.
Note that in the sequence of distributions {λ^ν}, the parameter c was left completely arbitrary (c > 0). Note also that the sequence of d.r.'s {D^ν} does not converge to D^0 if c is held fixed. This can be achieved, however, by letting c tend to zero as ν tends to infinity.
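The algebraic link between Student's t and the statistic t_0 used in this example can be checked directly; the sample values below are arbitrary.

```python
from math import sqrt

def student_t(xs):
    # t = sqrt(n) * xbar / s, with the usual n-1 denominator in s^2
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return sqrt(n) * xbar / sqrt(s2)

def t_from_t0(xs):
    # the same t expressed through t_0 = sum(x) / sqrt(sum(x^2)):
    # t = sqrt(n-1) * t_0 / sqrt(n - t_0^2), an increasing function of t_0
    n = len(xs)
    t0 = sum(xs) / sqrt(sum(x * x for x in xs))
    return sqrt(n - 1) * t0 / sqrt(n - t0 * t0)

xs = [0.3, -1.2, 0.8, 2.1, -0.4]
```

Since t_0² < n unless all observations are equal, the map t_0 → t is well defined and strictly increasing, which is why the acceptance regions can be rewritten in terms of t_0.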
2.6 A Non-Parametric Example of a Three-Decision Rule.

We shall give an extension of the sign test for the median of an arbitrary distribution function by adapting an example given by Hoeffding in [7]. (See also Ruist [18] in this regard.) In an analogous manner, a M.E. d.r. concerning any quantile of an arbitrary distribution may be derived.

As was remarked in Section 2.1, the theory holds also for non-parametric classes of density functions. Let Ω be the class of all density functions f w.r.t. a fixed measure μ on the real line such that μ{x : x < 0} > 0 and μ{x : x > 0} > 0. Given f ∈ Ω, denote

f_n(x) = Π_{i=1}^n f(x_i)  and  θ(f) = ∫_0^∞ f(x) dμ.

Denote

ω_1 = {f : θ(f) ≤ θ_1},  ω_2 = {f : θ_2' ≤ θ(f) ≤ θ_2''},  ω_3 = {f : θ(f) ≥ θ_3},

where θ_1 < θ_2' ≤ θ_2'' < θ_3. The alternatives, A_1, A_2, A_3, corresponding to ω_1, ω_2, ω_3, might be that the median of the unknown distribution is "significantly" less than zero, "close" to zero, "significantly" greater than zero, respectively.
Consider the density function

f(x,θ) = (θ/A'')^{b(x)} ((1−θ)/A')^{1−b(x)}  if |x| ≤ A,  = 0 otherwise,

where b(x) = 1 if x ≥ 0 and 0 otherwise, A' = μ((−A,0)), A'' = μ([0,A]), and A is an arbitrary positive constant, and denote f_n(x,θ) = Π_{i=1}^n f(x_i,θ). Define a set of conditional distributions λ = (λ_1, λ_2, λ_3) where λ_i assigns probability 1 to f(x,θ_i) (i = 1, 2, 3), hereafter denoted simply f_i, and where θ_2 = θ_2' or θ_2'', to be determined later as in the parametric examples above. Note that θ(f_i) = ∫_0^∞ f(x,θ_i) dμ = θ_i, so that f_i is in ω_i (i = 1, 2, 3).

Consider the simple discrimination problem of discriminating among f_n(x,θ_1), f_n(x,θ_2), f_n(x,θ_3), and let D_n be a minimax d.r. w.r.t. the weight function (1.6) for samples of fixed size n; clearly, D_n is a likelihood ratio d.r. Now f_n(x,θ_i) is of the form (1.25) with t(x) = Σ_k b(x_k), so that Theorem 1.9 is applicable, and D_n, defined by t(x), is of the form (1.24) with m = 3.

Now t has a binomial distribution with parameter θ = θ(f) and index n, where f is the true density function. Clearly, then, for D_n defined above, P_i(f,D_n) depends on f only through θ = θ(f). Denote the binomial distribution function and probability function by B_{n,θ} and b_{n,θ}, respectively.
The d.r. D_n is thus determined by two cut-off values c_1, c_2 of t together with randomization constants a_1, a_2, where a_1, a_2, c_1, c_2 depend on n and are determined so that

    inf_{f∈ω_i} P_i(f, D_n) = α_i ρ   (i = 1, 2, 3)

for some ρ. Now B_{n,Q}(t) is a decreasing function of Q, and b_{n,Q}(t) is a decreasing or an increasing function of Q according as t < or > (n−1)Q. Hence P_1(Q, D_n) is a decreasing function of Q; similarly, P_3(Q, D_n) is an increasing function of Q. Since Q(f_i) = Q_i, clearly Theorem 2.5 (ii) for the weight function (2.18) is satisfied for i = 1, 3. Now P_2(Q, D_n) may be shown to have a maximum between Q = Q_2' and Q = Q_2'', say at Q = Q_0, which is near Q = (c_1 + c_2)/2(n−1) (if this point is between Q_2' and Q_2''; otherwise the maximum is at Q_2' or Q_2''), and to be an increasing or a decreasing function of Q according as Q < or > Q_0. Hence,

    inf_{f∈ω_2} P_2(f, D_n) = min [P_2(Q_2', D_n), P_2(Q_2'', D_n)],

and since Q(f_2) = Q_2, by a proper choice of Q_2 = Q_2' or Q_2'', Theorem 2.5 (ii) is satisfied for i = 2. That such a choice of Q_2 is possible may be proved as in Section 2.5.
Thus λ is "least favorable" in the sense of Theorem 2.5 (ii), and a M.E. d.r. may be obtained by solving P_i(f_i, D_n) = α_i for n, taking N equal to the least integer ≥ n, and re-solving for ρ, c_1, c_2, a_1, a_2; D_N is then a M.E. d.r. for discriminating among ω_1, ω_2, ω_3.
CHAPTER III

EXISTENCE THEOREMS FOR MOST ECONOMICAL MULTIPLE-DECISION RULES
3.1 Introduction.

In the two-decision case, it can easily be shown that the existence of a uniformly consistent sequence of d.r.'s (see Definition 3.3) implies the existence of M.E. d.r.'s. Berger [1] has given sufficient conditions for the existence of such sequences (in the two-decision case). Hoeffding [8], having defined non-trivial 2-d.r.'s for fixed sample sizes (see Definition 3.1) and given some sufficient conditions for their non-existence, has proved the existence of a uniformly consistent sequence from the existence of a non-trivial d.r. for some n by an adaptation of Berger's theorem. Berger and Wald [2] have given rather broad sufficient conditions for the existence of certain two-decision rules (what we shall define as strongly selective d.r.'s), the existence of which implies the existence of non-trivial 2-d.r.'s, as defined by Hoeffding, for any n > 0. Briefly, their conditions are that ω_1* and ω_2*, defined below, be disjoint, assuming the existence of least favorable distributions¹:

1. See Theorem 2.3 and the remarks following it.
Here ω_i* denotes the class of all density functions of the form

    f_i*(x) = ∫ f(x, θ) dξ_i(θ),

where ξ_i (i = 1, 2) is any distribution function over ω_i. Thus, Berger and Wald's results supply sufficient conditions for the existence of a uniformly consistent sequence of 2-d.r.'s and, therefore, for the existence of a M.E. 2-d.r.
We shall extend this work to the case of m-decision rules. The concept of non-triviality does not appear very fruitful in this case; instead, we define "strongly selective" m-d.r.'s and use them to prove that the existence of certain non-trivial 2-d.r.'s is both necessary and sufficient for the existence of a uniformly consistent sequence of m-d.r.'s. Finally, the existence of M.E. m-d.r.'s (relative to a or β for ℓ = m) is proved from the existence of a uniformly consistent sequence of m-d.r.'s. Thus, the existence of M.E. m-d.r.'s basically depends on the existence of certain non-trivial 2-d.r.'s, sufficient conditions for which are given by Berger and Wald, and some necessary conditions for which are given by Hoeffding's sufficient conditions for non-existence.
Hoeffding's results are that if there exists a sequence {λ^v = (λ_1^v, λ_2^v)}, v = 1, 2, ..., of conditional distributions over ω_1 and ω_2, respectively, such that

    lim_{v→∞} ∫ |f_1^v(x) − f_2^v(x)| dμ = 0,

where f_i^v is defined by (2.8), then a non-trivial 2-d.r. for discriminating between ω_1 and ω_2 does not exist.

The results are derived for composite discrimination among parametric classes of density functions, ω_1, ..., ω_m, but they hold as well for more general classes of density functions and hence, in particular, for simple discrimination.
3.2 Non-Trivial, Selective, and Consistent Sequences of Decision Rules.

Many of the results of this section hold for sequences of random variables which are not necessarily identically distributed, and some do not require independence; but since the results are derived primarily for application to most economical theory, we make both assumptions throughout.

We find it convenient throughout this section to specify a d.r. D by φ(x), sometimes adding a superscript n to denote the sample size.
DEFINITION 3.1: A d.r. φ(x) is said to be non-trivial for discriminating among ω_1, ..., ω_m if the conditions

    inf_{ω_i} E_θ φ_i(X) ≥ a_i   (i = 1, ..., m)

can be satisfied with some numbers a_1, ..., a_m (0 < a_i < 1) such that Σ_{i=1}^m a_i > 1; or, equivalently, if

    Σ_{i=1}^m inf_{θ∈ω_i} E_θ φ_i(X) > 1.

The term "non-trivial" is used since, if Σ a_i ≤ 1, the conditions can always be satisfied without taking any observations. (See footnote 7, Chapter I.)
The existence of a non-trivial 2-d.r. for discriminating between ω_i and ω_j for some i, j (i≠j) implies the existence of a non-trivial m-d.r. for discriminating among ω_1, ..., ω_m, whatever the remaining ω's may be; for, suppose φ_i(x) + φ_j(x) = 1 identically in x and inf_{ω_i} E_θ φ_i + inf_{ω_j} E_θ φ_j > 1; set φ_k(x) = 0 identically in x for all k ≠ i, j; then

    Σ_{ℓ=1}^m φ_ℓ(x) = 1   identically in x,   and   Σ_{ℓ=1}^m inf_{ω_ℓ} E_θ φ_ℓ(X) > 1.

(A proof can also be given in which φ_k(x) > 0 for k ≠ i, j.) Hence, the concept of non-trivial m-d.r.'s does not appear to be particularly useful except for m = 2. We shall introduce a slightly more restrictive class of d.r.'s:
DEFINITION 3.2: A d.r. φ(x) is said to be selective (or weakly selective) for discriminating among ω_1, ..., ω_m if

    inf_{ω_i} E_θ φ_i(X) ≥ sup_{ω_i} E_θ φ_j(X)   for all j ≠ i   (i = 1, ..., m).

A d.r. φ(x) is said to be strongly selective for discriminating among ω_1, ..., ω_m if all the above inequalities hold strictly.

(We should be careful not to confuse selectivity with the similar concept of unbiasedness, which might be defined as implying

    inf_{ω_i} E_θ φ_i(X) ≥ sup_{ω_j} E_θ φ_i(X)   for all j ≠ i   (i = 1, ..., m);

this definition reduces to what has been termed an unbiased test in the literature.) In the classical notation, strongly selective in the two-decision case means α < 1/2, β < 1/2, whereas non-trivial means α + β < 1.
It may easily be shown that selectivity does not imply non-triviality but strong selectivity does. We give some sufficient conditions for strong selectivity:

    (1) inf_{ω_i} E_θ φ_i(X) > (m−1)/m   (i = 1, ..., m);

    (2) inf_{ω_i} E_θ φ_i(X) > 1/2 for all i, since then

        sup_{ω_i} E_θ φ_j(X) ≤ 1 − inf_{ω_i} E_θ φ_i(X) < 1/2 < inf_{ω_i} E_θ φ_i(X)   (j ≠ i).
DEFINITION 3.3: The sequence {φ^n(x)}, n (sample size) = 0, 1, 2, ..., of d.r.'s is said to be uniformly consistent for discriminating among ω_1, ..., ω_m if, for every a_1, ..., a_m (0 ≤ a_i < 1), there exists an N = N(a) such that for n ≥ N, E_θ φ_i^n(X) ≥ a_i for all θ ∈ ω_i (i = 1, ..., m); or, equivalently, if

    lim_{n→∞} inf_{ω_i} E_θ φ_i^n(X) = 1   (i = 1, ..., m).

THEOREM 3.1: If there exists a strongly selective d.r. for discriminating among ω_1, ..., ω_m for some n, then there exists a uniformly consistent sequence of d.r.'s for discriminating among ω_1, ..., ω_m.

Proof: Part 1: First, suppose for n = 1 a strongly selective d.r. φ(x) = (φ_1, ..., φ_m) exists.
We define, for n = 1, 2, ..., the functions

    φ̄_i^n(x) = (1/n) Σ_{j=1}^n φ_i(x_j)   (i = 1, ..., m),

and note the following properties of them (for any θ, and i, j = 1, ..., m):

    (i) E_θ φ̄_i^n(X) = E_θ φ_i(X_1);

    (ii) Var_θ φ̄_i^n ≤ 1/n;

    (iii) Var_θ(φ̄_i^n − φ̄_j^n) ≤ Var_θ φ̄_i^n + Var_θ φ̄_j^n + 2(Var_θ φ̄_i^n · Var_θ φ̄_j^n)^{1/2} ≤ 4/n.

We define, for each n, the d.r. ψ^n(x) = (ψ_1^n, ..., ψ_m^n) by

    ψ_i^n(x) = 1/(k+1)  if φ̄_i^n ≥ φ̄_j^n for all j ≠ i, with equality for k values of j;
             = 0        otherwise   (i = 1, ..., m).

Clearly, Σ_{i=1}^m ψ_i^n(x) = 1 identically in x. We have, for i = 1, ..., m,

    E_θ ψ_i^n(X) ≥ P_θ(φ̄_i^n > φ̄_j^n for all j ≠ i),

taking only the first term in the sum; hence, using (i),

    E_θ ψ_i^n(X) ≥ 1 − P_θ(φ̄_i^n ≤ φ̄_j^n for at least one j ≠ i) ≥ 1 − Σ_{j≠i} P_θ(φ̄_i^n ≤ φ̄_j^n).

For θ ∈ ω_i, strong selectivity gives E_θ(φ̄_i^n − φ̄_j^n) ≥ δ > 0. Thus

    E_θ ψ_i^n(X) ≥ 1 − Σ_{j≠i} Var_θ(φ̄_i^n − φ̄_j^n) δ^{−2}

by Tchebycheff's Inequality. By (3.2) and (iii), we have

    E_θ ψ_i^n(X) ≥ 1 − 4(m−1)/(n δ²)   (i = 1, ..., m).

Hence, for i = 1, ..., m,

    lim_{n→∞} inf_{ω_i} E_θ ψ_i^n(X) = 1;

i.e., {ψ^n(x)} is a uniformly consistent sequence of d.r.'s for discriminating among ω_1, ..., ω_m.

Part 2: Now, suppose a strongly selective d.r. φ(x_1, ..., x_v) exists for an integer v. Take n = μv for some integer μ, and define for i = 1, ..., m

    φ̄_i^n(x_1, ..., x_n) = (1/μ)[φ_i(x_1, ..., x_v) + φ_i(x_{v+1}, ..., x_{2v}) + ... + φ_i(x_{(μ−1)v+1}, ..., x_{μv})].

The remainder of the proof is analogous to that in Part 1. //
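The Part 1 construction (average each component of a strongly selective rule over the sample and decide by the largest average, splitting ties evenly) can be sketched directly. The particular single-observation rule φ below is an illustrative assumption chosen so that its components sum to one pointwise; it is not a rule derived in the text.

```python
import random

def psi_from_phi(phi, sample):
    """Derived rule of Theorem 3.1, Part 1: average each component of a
    single-observation rule phi = (phi_1, ..., phi_m) over the sample and
    split the decision weight equally among the indices attaining the
    maximum average (the 1/(k+1) tie rule)."""
    n, m = len(sample), len(phi)
    bars = [sum(phi[i](x) for x in sample) / n for i in range(m)]
    top = max(bars)
    winners = [i for i in range(m) if bars[i] == top]
    return [1.0 / len(winners) if i in winners else 0.0 for i in range(m)]

# Illustrative strongly selective rule for m = 3 (an assumption for the
# demo): a soft vote on the sign region of each observation; for every x
# exactly one component equals 0.8 and the others 0.1, so they sum to 1.
phi = [
    lambda x: 0.8 if x < -0.5 else 0.1,
    lambda x: 0.8 if -0.5 <= x <= 0.5 else 0.1,
    lambda x: 0.8 if x > 0.5 else 0.1,
]

random.seed(0)
sample = [random.gauss(-2.0, 0.5) for _ in range(200)]  # true case: index 0
psi = psi_from_phi(phi, sample)
```

With 200 observations centered at −2, the averaged first component dominates and ψ concentrates all its weight on the first decision, which is the consistency behaviour the Tchebycheff bound quantifies.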
It may be shown that the above theorem does not hold for weakly selective d.r.'s. In the two-decision case, a more powerful theorem is possible:

THEOREM 3.2: If there exists a non-trivial 2-d.r. for discriminating between ω_i and ω_j (i≠j) for some n, then there exists a uniformly consistent sequence of 2-d.r.'s for discriminating between ω_i and ω_j.

This theorem is given in [8], being an adaptation of a theorem in [1], and will not be proved here. The proof of Theorem 3.1 given here is somewhat analogous to the proofs quoted above.
THEOREM 3.3: If there exists a non-trivial 2-d.r. for discriminating between ω_i and ω_j for some n_ij (sample size) for every i, j = 1, ..., m (i≠j), then, for some n, there exists a strongly selective m-d.r. for discriminating among ω_1, ..., ω_m.
m
Consider a particular pair i, j (iFJ).
By Theorem 3.2,
there exists a uniformly consistent sequence of d.r.'s for
discriminating between w. and 00.; denote such a
.
n
~ij(x)
n
= (¢1(j)'
J
~
se~ue~ce
by
n
¢j(1»' n
= 1,
exists an Nij such that for n
~
2, ••••
Nij ,
This implies that there
113
This is true for every pair i,
for all n
> N, (3.3)
j
(ifJ).
holds for every i,
Consider some such n and define wni(x)
(i
= 1,
••• , m).
o -< W.l -< 1
Now jn(x)
= (w~,
Then,
= 1,
j
1
=--(m)
2
••• , m
••• , w:) is an m-d.r. since
m
~ w~(x)
i=l
1
(~)
n
I.I.
irk
n
~ ¢i(k)(X)
kfi
and
= 2 - -
(1Fj).
¢i(k) (x)
m
so that Z wn (x) = 1 identically in x. We have
i=l 1
114
2
> m(m-1)
(m-1)
=
furthermore, for
jri,
2(m-1)
:2
m
m-1
-m-
(1
by
(
;.3)
= 1, ••• ,m);
115
(
m-l)
<1- m
-l(m)
m
2
(i
= 1,
by (3.3)
••• , m) •
From (3.4) and (3.5) it follows that jn(x) is a strongly selective
d.r. for discriminating among w.
II
We shall now give a converse to Theorem 3.3:

THEOREM 3.4: If there exists a strongly selective m-d.r. for discriminating among ω_1, ..., ω_m, then there exists a non-trivial 2-d.r. for discriminating between ω_i and ω_j for every i, j = 1, ..., m (i≠j).
Proof: Suppose φ(x) = (φ_1, ..., φ_m) is a strongly selective d.r. for discriminating among ω_1, ..., ω_m; i.e.,

    inf_{ω_i} E_θ φ_i(X) > sup_{ω_i} E_θ φ_j(X)   for all j ≠ i   (i = 1, ..., m).   (3.6)

Consider some particular i, j (i≠j), and suppose

    inf_{ω_i} E_θ φ_i(X) ≥ inf_{ω_j} E_θ φ_j(X).   (3.7)

Then, combining (3.6) and (3.7), we obtain

    inf_{ω_i} E_θ φ_i(X) + inf_{ω_j} E_θ[1 − φ_i(X)] = inf_{ω_i} E_θ φ_i(X) + 1 − sup_{ω_j} E_θ φ_i(X) > 1.

Hence, (φ_i, 1−φ_i) is a non-trivial 2-d.r. for discriminating between ω_i and ω_j. If instead of (3.7) we have inf_{ω_j} E_θ φ_j > inf_{ω_i} E_θ φ_i, a similar argument will prove (1−φ_j, φ_j) to be a non-trivial 2-d.r. for discriminating between ω_i and ω_j. This is true for every i, j (i≠j). //
THEOREM 3.5: A necessary and sufficient condition for the existence of a uniformly consistent sequence of m-d.r.'s for discriminating among ω_1, ..., ω_m is that there exist non-trivial 2-d.r.'s for discriminating between ω_i and ω_j for some n_ij (sample size) for every i, j = 1, ..., m (i≠j).
Proof: The sufficiency follows directly from Theorems 3.3 and 3.1. To prove the necessity, suppose φ^n = (φ_1^n, ..., φ_m^n), n = 1, 2, ..., is a uniformly consistent sequence of d.r.'s for discriminating among ω_1, ..., ω_m. Then let N be an integer such that for n ≥ N, inf_{ω_i} E_θ φ_i^n > 1/2 for i = 1, ..., m. By the second sufficient condition for strong selectivity given earlier in this section, and Theorem 3.4, the proof is completed. //

For some sufficient conditions for the existence, and for the non-existence, of non-trivial 2-d.r.'s, see the remarks in Section 3.1.
We shall now consider the case of simple discrimination. We say that two density functions f(x) and g(x) w.r.t. a measure μ are "distinct" if the set of all x for which f(x) ≠ g(x) has positive μ-measure; and a set of density functions is said to be "distinct" if every pair in the set is distinct. Let f_1, ..., f_m be density functions w.r.t. a measure μ; we have the theorem:
THEOREM 3.6: A necessary and sufficient condition for the existence of a uniformly consistent sequence of m-d.r.'s for discriminating among f_1, ..., f_m is that f_1, ..., f_m be distinct.
Proof: The distinctness of any pair f_i, f_j (i≠j) implies the existence of a non-trivial 2-d.r. for discriminating between f_i and f_j for some n_ij (sample size) by Wald and Berger's theorem quoted in Section 3.1. The sufficiency part of this theorem follows, then, as a special case of Theorem 3.5, which is true for general classes of density functions as well as for parametric classes.

The necessity is proved as follows: choose an ε (0 < ε < 1/2). Then, for some n = n_ε, there exists a d.r. φ(x) such that

    E_i φ_i(X) > 1 − ε   (i = 1, ..., m),   (3.8)

where the subscript on the expectation operator refers to the corresponding density function as in Chapter I. Now suppose for some i, j (i≠j), f_i and f_j are not distinct. Then

    E_i φ_i(X) + E_j φ_j(X) = E_i φ_i(X) + E_i φ_j(X) ≤ Σ_{k=1}^m E_i φ_k(X) = 1,

in contradiction to (3.8). Hence, f_i and f_j are distinct for all i, j (i≠j). //
3.3 Existence Theorems for Most Economical Decision Rules.

We assume throughout this section that ℓ = m and that a pair i, j corresponds to a correct decision if i = j and an incorrect decision if i ≠ j.

THEOREM 3.7: A necessary and sufficient condition for the existence of a M.E. d.r. relative to any vector a for discriminating among ω = (ω_1, ..., ω_m) is that there exist a uniformly consistent sequence of m-d.r.'s for discriminating among ω.
Proof: The theorem is obvious from the definitions involved; however, a formal proof may be given analogous to the proof of Theorem 3.8. //
THEOREM 3.8: A necessary and sufficient condition for the existence of a M.E. m-d.r. relative to any matrix β for discriminating among ω = (ω_1, ..., ω_m) is that there exist a uniformly consistent sequence of m-d.r.'s for discriminating among ω.
Proof: Part 1: Sufficiency: Suppose there exists a uniformly consistent sequence of d.r.'s {D_n}. Then, for any positive ε < 1, there exists an N = N_ε such that for n ≥ N_ε,

    inf_{ω_i} P_i(θ, D_n) ≥ 1 − ε   (i = 1, ..., m).

Hence, for all i, j (i≠j),

    sup_{ω_i} P_j(θ, D_n) ≤ sup_{ω_i} Σ_{j≠i} P_j(θ, D_n) = 1 − inf_{ω_i} P_i(θ, D_n) ≤ ε,

so that D_n satisfies (2.3). Since (2.3) can be satisfied for some n, there exists a least n for which it may be satisfied; i.e., there exists a M.E. d.r. relative to β.
Part 2: Necessity: Suppose, given any β (0 < β_ij ≤ 1, i≠j), there exists a M.E. d.r. relative to β. Given ε (0 < ε < 1), let

    β_ij^ε = 1 if i = j;   β_ij^ε = ε/(m−1) if i ≠ j   (i, j = 1, ..., m),

and let D^ε be a M.E. d.r. relative to β^ε. Then, since sup_{ω_i} P_j(θ, D^ε) ≤ ε/(m−1) for all i, j (i≠j), we have

    inf_{ω_i} P_i(θ, D^ε) = 1 − sup_{ω_i} Σ_{j≠i} P_j(θ, D^ε) ≥ 1 − ε   (i = 1, ..., m).   (3.9)

Let {ε_v}, v = 1, 2, ..., be a decreasing sequence of positive constants converging to zero. Let D^{ε_v} be a M.E. d.r. relative to (β_ij^{ε_v}) defined above and let N_v be the corresponding M.E. sample size. Now {N_v} is a non-decreasing sequence since, for μ < v, N_μ is the least integer for which sup_{ω_i} P_j(θ, D) ≤ β_ij^{ε_μ} = ε_μ/(m−1) (i, j = 1, ..., m; i≠j) can be satisfied by some D, and N_v is the least integer for which the corresponding conditions with ε_v can be satisfied by some D, and hence N_μ ≤ N_v. We shall suppose the sequence {N_v} does not contain any integer more than once, for if it does we may delete some terms from {ε_v} and re-number the subscripts so that it will not. Consider the sequence {D_n}, n = n_0, n_0 + 1, ..., where n_0 = min_v N_v and where D_n = D^{ε_v} if N_v ≤ n < N_{v+1}. Hence, using (3.9), we have

    lim_{n→∞} inf_{ω_i} P_i(θ, D_n) ≥ lim_{v→∞} (1 − ε_v) = 1   (i = 1, ..., m);

i.e., {D_n} is a uniformly consistent sequence of d.r.'s. //
Theorems 3.7 and 3.8 hold, of course, for the analogous cases of simple discrimination as well. According to these two theorems, the necessary and sufficient conditions for the existence of a uniformly consistent sequence of d.r.'s given by Theorems 3.5 and 3.6 provide sufficient conditions for the existence of a M.E. d.r. relative to any specific a or β, and necessary conditions for the existence of M.E. d.r.'s relative to every a or β. Thus we have existence theorems for M.E. d.r.'s defined by Definitions 1.1, 1.2, 2.1, and 2.2 (for ℓ = m), and hence sufficient conditions for the assumptions of Theorems 1.3, 1.5, 1.7, 1.10, 2.7, 2.8, and 2.9, deriving M.E. d.r.'s from sequences of minimax d.r.'s for fixed sample sizes, to be satisfied.
CHAPTER IV

MOST ECONOMICAL DECISION FUNCTIONS

4.1 Introduction.

In this chapter we extend the concept of M.E. d.r.'s to more general decision problems. We assume the formulation of the statistical decision problem as given by Wald ([22], Chapter 1) in its complete generality, except for one modification: we shall be concerned with two pairs of loss and cost functions. For conciseness, we denote the sum of one pair (one loss function and one cost function) by W_1 and the sum of the other pair by W_2. (In applying the results, we shall suppose that one of the W_i's is simply a loss function and the other a cost function, but we state the problem in this manner for symmetry and generality.) W_1 and W_2 are referred to as "weight functions". In considering risk functions, we denote by r^1 and r^2 the risks corresponding to Wald's r_1 and r_2, the superscript i designating that r^i is the risk w.r.t. the weight function W_i. We assume Wald's subsequent definitions, notations, and theorems throughout. All references to Wald refer to [22] unless otherwise specified.
Let D denote the class of all decision functions at the disposal of the experimenter, and define the following subclasses:

    D* = {δ ∈ D : r^1(F, δ) ≤ 1 for all F},   D_r = {δ ∈ D : r^2(F, δ) ≤ r for all F},

for any non-negative real r. We shall consider the following problem: to find a minimax solution w.r.t. W_2 relative to the class D*; that is, to find a decision function δ which minimizes the maximum risk w.r.t. the weight function W_2 subject to the condition that the risk w.r.t. the weight function W_1 is nowhere greater than unity.
Blyth [3] considered this problem (with minor modifications) and proved, under suitable conditions, that a minimax solution δ^0 w.r.t. cW_1 + W_2 (relative to D), where c is chosen so that sup_{F∈Ω} r^1(F, δ^0) = 1, is a solution to the problem. His conditions are that there exist a class C of minimax solutions w.r.t. cW_1 + W_2 (for some c) and that, for every value L between the minimum and maximum of W_1 (over X, Ω, D^t, s), there exist a δ_L ∈ C for which

    sup_{F∈Ω} r^1(F, δ_L) = L.
We shall consider a different approach. We prove under very general assumptions that a minimax solution w.r.t. W_1 relative to D_{r_0} is such a decision function, where r_0 is the minimum r for which a minimax solution w.r.t. W_1 relative to D_r is in D*, and shall also give sufficient conditions for the existence of r_0 and of such minimax solutions.
4.2 Preliminary Theory.

Let δ_r denote any minimax solution (if existent) w.r.t. W_1 relative to D_r, and define

    D^0 = {δ_r : δ_r ∈ D*}   and   D^{0'} = {δ_r ∈ D^0 : sup_Ω r^2(F, δ_r) = r}.

LEMMA 4.1: Suppose there exists a minimax solution w.r.t. W_1 relative to D_r for every r for which D_r is non-null. Then D^0 = D^{0'}.
Proof: Every δ ∈ D^{0'} is also in D^0, so that we need only show that every δ ∈ D^0 is also in D^{0'}.

Consider an arbitrary element δ' of D^0; then δ' must be a minimax solution w.r.t. W_1 relative to some D_r, say D_{r'}. If sup_Ω r^2(F, δ') = r', then δ' ∈ D^{0'}. Now suppose sup_Ω r^2(F, δ') = r'' < r'. Then δ' ∈ D_{r''}, so that

    sup_Ω r^1(F, δ') ≥ inf_{D_{r''}} sup_Ω r^1(F, δ).   (4.1)

But D_{r''} ⊆ D_{r'}, since r'' < r', so that

    inf_{D_{r''}} sup_Ω r^1(F, δ) ≥ inf_{D_{r'}} sup_Ω r^1(F, δ) = sup_Ω r^1(F, δ').   (4.2)

From (4.1) and (4.2) we have that δ' is a minimax solution w.r.t. W_1 relative to D_{r''}; but sup_Ω r^2(F, δ') = r'', so that δ' ∈ D^{0'}. //

According to Lemma 4.1, under certain conditions we may consider δ_r as a minimax solution w.r.t. W_1 relative to the class of all decision functions in D for which sup_Ω r^2(F, δ) = r.
LEMMA 4.2: Suppose there exists a minimax solution w.r.t. W_2 relative to D* and a minimax solution w.r.t. W_1 relative to D_r for every r for which D_r is non-null, and denote

    r* = inf_{D*} sup_Ω r^2(F, δ).   (4.3)

Then (i) any minimax solution w.r.t. W_1 relative to D_{r*} is a minimax solution w.r.t. W_2 relative to D^0, and conversely; and (ii) any minimax solution w.r.t. W_2 relative to D^0 is a minimax solution w.r.t. W_2 relative to D*.

Proof: Let δ* be a minimax solution w.r.t. W_2 relative to D*. D_{r*} is non-null since δ* ∈ D_{r*}; let δ_{r*} be a minimax solution w.r.t. W_1 relative to D_{r*}. Since D^0 ⊆ D* and δ_{r*} ∈ D_{r*},

    inf_{D^0} sup_Ω r^2(F, δ) ≥ inf_{D*} sup_Ω r^2(F, δ) = r* ≥ sup_Ω r^2(F, δ_{r*}).   (4.4)

Now δ* ∈ D_{r*} ∩ D*, so that

    sup_Ω r^1(F, δ_{r*}) ≤ sup_Ω r^1(F, δ*) ≤ 1.

Therefore δ_{r*} ∈ D* and hence δ_{r*} ∈ D^0. Hence the equality signs must hold in (4.4), and δ_{r*} is a minimax solution w.r.t. W_2 relative to D^0, proving the first part of (i).

Since equality must hold in (4.4), we have inf_{D^0} sup_Ω r^2(F, δ) = inf_{D*} sup_Ω r^2(F, δ), and therefore, since D^0 ⊆ D*, (ii) is proved.

By Lemma 4.1, we may replace D^0 by D^{0'}. Let δ' be a minimax solution w.r.t. W_2 relative to D^{0'}. Since δ' ∈ D^{0'}, it must be a minimax solution w.r.t. W_1 relative to some D_r, say D_{r'}, and sup_Ω r^2(F, δ') = r'. Then

    r' = sup_Ω r^2(F, δ') = inf_{D^{0'}} sup_Ω r^2(F, δ) = inf_{D*} sup_Ω r^2(F, δ);

that is, r' = r* and D_{r'} = D_{r*}. Thus, δ' is a minimax solution w.r.t. W_1 relative to D_{r*}, proving the converse of (i). //
THEOREM 4.1: Suppose there exists a minimax solution w.r.t. W_2 relative to D* and a minimax solution δ_r w.r.t. W_1 relative to D_r for every r for which D_r is non-null. Then r_0 = min {r : δ_r ∈ D*}, say, exists, and δ_{r_0} is a minimax solution w.r.t. W_2 relative to D*; i.e.,

    sup_Ω r^2(F, δ_{r_0}) = inf_{D*} sup_Ω r^2(F, δ).   (4.5)

Moreover, if δ' is any other minimax solution w.r.t. W_2 relative to D*, then

    sup_Ω r^1(F, δ_{r_0}) ≤ sup_Ω r^1(F, δ').   (4.6)

Proof: Now inf_{D^{0'}} sup_Ω r^2(F, δ) = inf {r : δ_r ∈ D*}. By Lemma 4.2 (i) and Lemma 4.1, there exists a minimax solution w.r.t. W_2 relative to D^0 or D^{0'}. Hence, min {r : δ_r ∈ D*} exists and δ_{r_0} is a minimax solution w.r.t. W_2 relative to D^0. Lemma 4.2 (ii) completes the proof of (4.5). To prove (4.6), note that r_0 = r*, where r* is defined by (4.3); since any other minimax solution δ' w.r.t. W_2 relative to D* lies in D_{r*}, while δ_{r_0} is a minimax solution w.r.t. W_1 relative to D_{r*}, (4.6) follows. //
Clearly, if D satisfies Wald's Assumptions 3.1 to 3.5, then any subset of D also satisfies them. Konijn [11] has proved that if D satisfies his formulation of Wald's Assumption 3.6, then D* and D_r (for any non-negative real r) satisfy this assumption. Then, by Wald's Theorem 3.1, asserting the existence of a minimax solution under his Assumptions 3.1 to 3.6, we have Theorem 4.2 stated below.
ASSUMPTION 4.1: The stochastic process X = (X_1, X_2, ...) underlying the decision problem, the class Ω of possible distribution functions of X, the space D^t of possible terminal decisions, both weight functions W_1 and W_2, and the class D of decision functions at the disposal of the experimenter satisfy Wald's Assumptions 3.1 to 3.5 and Konijn's formulation of Wald's Assumption 3.6.¹

1. Some of these assumptions may be relaxed somewhat; e.g., see Ghosh [6] and Lehmann [14].
THEOREM 4.2: Suppose Assumption 4.1 holds. Then (i) if D* is non-null, there exists a minimax solution w.r.t. W_2 relative to the class D*; and (ii) for any r for which D_r is non-null, there exists a minimax solution w.r.t. W_1 relative to D_r. Moreover, minimax solutions relative to D* and D_r are characterized according to Wald's general theory as given in [22], Chapter 3.
4.3 Most Economical Decision Functions.

Let c(x; s) be a cost function as defined by Wald and denote the expected cost function when using δ by r^2(F, δ). Let w(F, d^t) be a loss function as defined by Wald and denote the expected loss function when using δ by r^1(F, δ). Let β(F) be a given positive-valued function defined for all F ∈ Ω.

DEFINITION 4.1: A decision function δ^0 is said to be most economical (relative to the class D) if it satisfies

    r^1(F, δ^0) ≤ β(F)   for all F ∈ Ω,   (4.7)

and if, for any other δ satisfying (4.7),

    sup_{F∈Ω} r^2(F, δ^0) ≤ sup_{F∈Ω} r^2(F, δ).

Thus, a most economical (M.E.) decision function minimizes the maximum expected cost subject to upper bounds on the expected loss.
Define two weight functions:

    W_1 = w(F, d^t)/β(F),   W_2 = c(x; s).   (4.8)

Clearly, the problem of finding a most economical decision function is simply the problem introduced in Section 4.1 with W_1 and W_2 defined by (4.8). Theorem 4.1 gives a method of obtaining such decision functions: Define the class D_r of decision functions for which the expected cost is nowhere greater than r and obtain a minimax solution δ_r w.r.t. W_1 relative to D_r. Letting r_0 be the minimum r for which δ_r satisfies (4.7), δ_{r_0} is a M.E. decision function. And Theorem 4.2 gives sufficient conditions for the existence of a solution by this method; explicitly, if Assumption 4.1 is satisfied and if there exists some decision function satisfying (4.7), then there exists a M.E. decision function.
Wald's sequential probability ratio test (see [21]) is an example of a M.E. decision function. Suppose X_1, X_2, ... are independent and identically distributed and that Ω consists of but two elements, F_0 and F_1. Suppose the cost depends only on the sample size and is proportional to it. Suppose D^t has but two elements, d_0 and d_1, corresponding to "F_0 is true" and "F_1 is true", respectively. Suppose the class of decision functions at the disposal of the experimenter is unrestricted. Let w(F_i, d_j) = 1 − δ_ij (Kronecker δ; i, j = 0, 1), and let β(F_0) = α and β(F_1) = β. Then, according to a theorem of Wald and Wolfowitz [23], the sequential probability ratio test of H_0: F = F_0 against H_1: F = F_1 is most economical, where α and β are bounds on the two types of errors. In fact, it does more than minimize the maximum expected cost; it minimizes the expected cost at each of F_0 and F_1.

Blyth [3] gives solutions to some estimation problems. His solutions may be used to find "M.E. estimators" for the mean of a variable which is (1) normal with known variance, or (2) rectangular with known range, where the cost is proportional to the sample size and the loss function is an arbitrary non-decreasing function of the absolute error of estimate.
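The sequential probability ratio test in the example above can be sketched in a few lines. The specific hypotheses (N(0,1) against N(1,1)) and the error bounds below are illustrative assumptions, and the stopping boundaries are Wald's approximate ones.

```python
from math import log

def sprt(loglik_ratio, xs, alpha, beta):
    """Wald's sequential probability ratio test of F0 vs F1 (a minimal
    sketch): accumulate the log likelihood ratio and continue sampling
    while it stays between log(beta/(1-alpha)) and log((1-beta)/alpha)."""
    lo, hi = log(beta / (1 - alpha)), log((1 - beta) / alpha)
    s = 0.0
    for n, x in enumerate(xs, start=1):
        s += loglik_ratio(x)
        if s >= hi:
            return ("accept F1", n)
        if s <= lo:
            return ("accept F0", n)
    return ("no decision", len(xs))

# Example: F0 = N(0,1), F1 = N(1,1); the per-observation log likelihood
# ratio is x - 1/2.
decision, n_used = sprt(lambda x: x - 0.5,
                        [1.2, 0.9, 1.1, 1.4, 0.8, 1.0, 1.3],
                        0.05, 0.05)
```

The cost here is the (random) sample size n; the Wald-Wolfowitz theorem cited above is what guarantees this procedure minimizes the expected sample size at both F_0 and F_1 among all tests meeting the error bounds.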
Now let us consider non-sequential M.E. decision functions. We suppose (X_1, ..., X_p), (X_{p+1}, ..., X_{2p}), ... are independent and identically distributed random vectors. Let D_n be the class of decision functions for which the probability is one that the first np random variables in X are observed and no more, and suppose the class of decision functions at the disposal of the experimenter is D = ∪_n D_n (that is, we only admit samples of a fixed size n, n = 0, 1, 2, ..., from the p-variate population). Suppose the cost function is a function of n only, say c(n). Denote decision functions in D_n by δ_n.
Then a M.E. decision function δ_N is one which satisfies

    r^1(F, δ_N) ≤ β(F)   for all F ∈ Ω,   (4.9)

and c(N) ≤ c(n) for all n for which some δ_n ∈ D_n satisfies (4.9). We then say that c(N) is the minimum cost and N is the M.E. sample size. If c(n) is an increasing function of n, then by defining a weight function and a β-function as in Sections 1.3.1, 1.3.2, 1.4, 2.3, or 2.4, this definition reduces to Definition 1.1, 1.2, 2.1, or 2.2, respectively.
To find such decision functions, Theorem 4.1 suggests the following procedure, analogous to the corresponding procedures given in Chapters I and II: Let δ_n^0 be a minimax solution w.r.t. W_1 (defined by (4.8)) relative to D_n (which surely exists if Assumption 4.1 holds). Suppose the minimum of c(n) over all n for which δ_n^0 satisfies (4.9) is attained for n = N. Then δ_N^0 is a M.E. decision function. Moreover, Theorem 4.1 also asserts that if δ_N' is any other M.E. decision function, then

    inf_{F∈Ω} [β(F) − r^1(F, δ_N^0)]/β(F) ≥ inf_{F∈Ω} [β(F) − r^1(F, δ_N')]/β(F).   (4.10)

No property analogous to (4.10) was proved for the special cases in Chapters I and II.
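The fixed-sample search this procedure describes can be sketched for a toy case: scan n in order of increasing cost and stop at the first n whose minimax rule meets the loss bound. Everything specific below is an illustrative assumption of the sketch: two simple hypotheses N(0,1) and N(1,1), the midpoint rule on the sample mean as the minimax solution, cost c(n) = n, and a constant bound β.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def me_sample_size(beta, theta0=0.0, theta1=1.0, n_max=10_000):
    """Least n whose symmetric minimax rule (decide theta1 iff the sample
    mean exceeds the midpoint between theta0 and theta1) has error
    probability at most beta under both hypotheses; with c(n) = n this n
    is the M.E. sample size of the toy problem."""
    delta = abs(theta1 - theta0)
    for n in range(1, n_max + 1):
        # Common error probability of the midpoint rule, unit variance.
        err = norm_cdf(-sqrt(n) * delta / 2)
        if err <= beta:
            return n, err
    raise ValueError("no n <= n_max meets the bound")
```

Since the error probability decreases in n, the first n accepted is the cheapest; this mirrors ordering the δ_n^0 by increasing cost and taking the first that satisfies (4.9).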
A possible extension of non-sequential M.E. decision functions as given above is to problems of k populations. We consider a cost function c(n̄), where n̄ = (n_1, ..., n_k) and n_i is the fixed sample size for the i-th population. We consider classes of decision functions D_c = {δ ∈ D : c(n̄) = c}, and let δ_c be a minimax solution w.r.t. W_1 relative to D_c. Then, by Theorem 4.1, if we order the δ_c's according to increasing cost and choose the first δ_c in the sequence for which r^1(F, δ_c) ≤ β(F) for all F, we will obtain a M.E. decision function.
4.4 Decision Functions with Bounds on the Maximum Expected Cost.

We make the same definitions and assumptions as given in the first paragraph of Section 4.3, but instead of (4.8), we define the two weight functions

    W_1 = c(x; s)/β(F),   W_2 = w(F, d^t).

Then a minimax solution w.r.t. W_2 relative to D* will be a decision function which minimizes the maximum expected loss subject to the bounds on the expected cost: r^2(F, δ) ≤ β(F) for all F ∈ Ω. If β(F) is independent of F, then, in terms of the weight functions (4.8), such decision functions are simply the minimax solutions w.r.t. W_1 relative to D_r (with r = 1) that were considered previously in obtaining M.E. decision functions. Thus, the approach to finding M.E. solutions by using Theorem 4.1 is similar to attacking this problem directly. Though Theorems 4.1 and 4.2 are applicable, their method of solving this problem does not appear to be very practical.

It may be noted that Blyth's results are applicable to this problem also. He gives the solution to the two estimation problems referred to in Section 4.3.
APPENDIX

SOME "TWO-SIDED" TWO-DECISION RULES AND SYMMETRIC THREE-DECISION RULES FOR COMPOSITE DISCRIMINATION

A.1 Some "Two-Sided" Two-Decision Rules.

A.1.1 Introduction.

Let A_1, A_2 denote two alternative decisions, the acceptance of A_i being preferred when an unknown parameter θ of the density function f(x, θ) of a random variable X, to be observed, is in a subset ω_i of the parameter space (i = 1, 2). Given two numbers θ_1, θ_2 (θ_1 > θ_2 ≥ 0), we define:

    ω_1 = {θ : |θ| ≥ θ_1},   ω_2 = {θ : |θ| ≤ θ_2}.

Given ā = (a_1, a_2), it is desired to construct a M.E. d.r. relative to ā for discriminating between ω_1, ω_2. We call such a decision problem a "two-sided" two-decision problem for obvious reasons.

We consider four different situations, using the above notation for each. M.E. d.r.'s for each situation are given below, and nomographs for obtaining such d.r.'s explicitly are given in Section A.2; some examples are given in Section A.4. The derivations of the d.r.'s are not given, however, since they are analogous to the derivations given in Sections 2.5 and 2.6. Alternatively, they may be derived as tests which maximize the minimum power using a theorem of Hoeffding referred to in Section 2.1. (Many of these tests which maximize the minimum power have been derived in the literature, especially for the case of θ_2 = 0.)
Moreover, we feel that such two-sided problems are not usually very realistic except as approximations to the analogous symmetric three-decision problem where alternatives A_1', A_2', A_3' correspond to

    ω_1' = {θ : θ ≤ −θ_1},   ω_2' = ω_2,   ω_3' = {θ : θ ≥ θ_1},

respectively, and a_1' = a_3', a_2' = a_2. We shall show in Section A.3 that this approximation is usually very good, so that the solutions to the "two-sided" two-decision problem may be used as solutions to this symmetric three-decision problem.

The M.E. 2-d.r.'s referred to above are given below.
A.1.2 The Mean of a Normal Distribution, Variance Known. Suppose f(x, θ) is a normal density function with variance σ² (known) and mean μ; i.e., X is N(μ, σ²). Letting μ be the θ introduced above, a M.E. 2-d.r. relative to (a1, a2) is: Take a sample of size N and

(A.1)    choose A1 if |x̄| > c = c_N,
         choose A2 if |x̄| < c,

where (n, c_n) is a pair of solutions of

(A.2)    Φ[√n(θ1 + c)/σ] - Φ[√n(θ1 - c)/σ] ≤ 1 - a1,
         Φ[√n(θ2 + c)/σ] - Φ[√n(θ2 - c)/σ] ≥ a2,

and N is the least integer n for which a pair of solutions exists. Solutions may be obtained by first solving the two equations

(A.3)    Φ[√n(θi + c)/σ] - Φ[√n(θi - c)/σ] = γi    (i = 1, 2),

where γ1 = 1 - a1 ρ, γ2 = a2 ρ, with ρ = 1 for n and c, and then, taking N to be the least integer ≥ n, re-solving (A.3) for c_N and ρ.
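The computation prescribed by (A.2) and (A.3) can be sketched numerically. The following Python fragment is ours, not the thesis's: it solves the pair (A.3) with ρ = 1 by nested bisection (c from the second equation, then n from the first), treating n as continuous and rounding up; Φ is obtained from the error function.

```python
import math

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gamma(n, c, theta, sigma):
    """Left side of (A.3): P(|xbar| <= c) when xbar is N(theta, sigma^2/n)."""
    r = math.sqrt(n) / sigma
    return Phi(r * (theta + c)) - Phi(r * (theta - c))

def solve_c(n, theta2, a2, sigma):
    """Solve gamma(n, c, theta2, sigma) = a2 for c; the left side increases with c."""
    lo, hi = 0.0, 100.0 * sigma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma(n, mid, theta2, sigma) < a2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def me_sample_size(theta1, theta2, a1, a2, sigma=1.0):
    """Continuous n solving (A.3) with rho = 1, rounded up to the M.E. sample size N."""
    def resid(n):
        c = solve_c(n, theta2, a2, sigma)
        return gamma(n, c, theta1, sigma) - (1.0 - a1)   # decreases as n grows
    lo, hi = 0.25, 1.0
    while resid(hi) > 0.0:          # grow the bracket until the residual changes sign
        lo, hi = hi, 2.0 * hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if resid(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    n = 0.5 * (lo + hi)
    N = math.ceil(n)
    return N, solve_c(N, theta2, a2, sigma)
```

The recovered values agree with the σ-known entries of Table A.2.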
A M.E. 3-d.r. for the corresponding symmetric three-decision problem is given in Section 1.3.6 where, by symmetry, c_n1 = -c_n2 = -c_n; that is, take a sample of size N and

    choose A1 if x̄ < -c,
    choose A2 if |x̄| ≤ c,
    choose A3 if x̄ > c,

where N and c are chosen as in Section 1.3.6.
A.1.3 The Mean of a Normal Distribution, Variance Unknown. Again, we suppose X is N(μ, σ²), but now both μ and σ are unknown. We let μ/σ = θ and denote s² = Σ (x_i - x̄)²/(n - 1), the sum running from i = 1 to n. A M.E. 2-d.r. relative to (a1, a2) is: Take a sample of size N and choose A1 or A2 according to (A.1) with x̄ replaced by x̄/s, and where N and c are chosen as in Section A.1.2 with (A.2) replaced by

(A.4)    T_{n-1,√n θ1}(√n c) - T_{n-1,√n θ1}(-√n c) ≤ 1 - a1,
         T_{n-1,√n θ2}(√n c) - T_{n-1,√n θ2}(-√n c) ≥ a2,

where T_{f,δ}(t) denotes the non-central t distribution function with f degrees of freedom and non-centrality factor δ (i.e., t = √f(z + δ)/χ_f, where z is N(0, 1) and χ_f² has a chi-squared distribution with f degrees of freedom). (A.4) may be solved as in Section A.1.2 with (A.3) replaced by

(A.5)    T_{n-1,√n θi}(√n c) - T_{n-1,√n θi}(-√n c) = γi    (i = 1, 2).
A.1.4 The Parameter of a Binomial Distribution. Suppose f(x, θ) is a point binomial distribution with parameter p. Let θ = p - 1/2 (1/2 > θ1 > θ2 ≥ 0), and denote by t_n the number of observations in a sample of size n that are 0. We restrict ourselves to non-randomized d.r.'s; a M.E. randomized d.r. may be obtained by making obvious modifications.

A M.E. non-randomized 2-d.r. relative to (a1, a2) is: Take a sample of size N and choose A1 or A2 according to (A.1) with x̄ replaced by (t_n/n - 1/2), and where N and c are chosen as before with (A.2) replaced by

(A.6)    B_{n,1/2+θ1}(n/2 + nc) - B_{n,1/2+θ1}(n/2 - nc - 1) ≤ 1 - a1,
         B_{n,1/2+θ2}(n/2 + nc) - B_{n,1/2+θ2}(n/2 - nc - 1) ≥ a2,

and (A.3) replaced by

(A.7)    B_{n,1/2+θi}(n/2 + nc) - B_{n,1/2+θi}(n/2 - nc - 1) = γi    (i = 1, 2).

B_{n,p}(t) denotes the binomial distribution function as in Chapter I.
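The search for N implied by (A.6) can also be carried out exactly with binomial sums rather than nomographs. This sketch (function names are ours, not the author's) enumerates the symmetric non-randomized acceptance regions {t: |t - n/2| ≤ nc} for each n and returns the least n at which both inequalities in (A.6) can be met; the cap `n_max` is an arbitrary safeguard.

```python
import math

def binom_cdf(k, n, p):
    """B_{n,p}(k) = P(t <= k) for t binomial(n, p), by exact summation."""
    return sum(math.comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k + 1))

def accept_prob(lo, hi, n, p):
    """P(lo <= t <= hi) for t binomial(n, p)."""
    return binom_cdf(hi, n, p) - (binom_cdf(lo - 1, n, p) if lo > 0 else 0.0)

def me_binomial_n(theta1, theta2, a1, a2, n_max=500):
    """Least n for which some symmetric region [n - hi, hi] (i.e. |t - n/2| <= nc)
    satisfies both inequalities in (A.6); returns (n, hi)."""
    for n in range(1, n_max + 1):
        for hi in range((n + 1) // 2, n + 1):
            lo = n - hi
            if accept_prob(lo, hi, n, 0.5 + theta2) >= a2 and \
               accept_prob(lo, hi, n, 0.5 + theta1) <= 1.0 - a1:
                return n, hi
    return None
```

For sample sizes up to 50 this reproduces the exact-table entries of Table A.3.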
A.1.5 The Median of an Unspecified Distribution. Suppose X has an unknown density function f(x) w.r.t. a measure μ on the real line such that μ(x ≤ 0) > 0 and μ(x > 0) > 0, and denote

    θ = θ(f) = ∫_{-∞}^{0} f(x) dμ - 1/2.

Denote by t_n the number of observations in a sample of size n which are ≤ 0. Restricting ourselves to non-randomized d.r.'s, a M.E. 2-d.r. relative to (a1, a2) is: Take a sample of size N and choose A1 or A2 according to (A.1) with x̄ replaced by (t_n/n - 1/2), and where N and c are chosen as before with (A.2) replaced by (A.6) and (A.3) replaced by (A.7); that is, a M.E. d.r. for the parameter of a binomial distribution is also a M.E. d.r. for the median of an unspecified distribution, in the above sense.
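Applying the rule is mechanical once N and c are in hand; a minimal illustration (ours, not the thesis's), with A1 the "outer" alternative of (A.1):

```python
def decide_median(xs, c):
    """Apply (A.1) with xbar replaced by t_n/n - 1/2, where t_n = #{x_i <= 0}:
    choose A1 if the statistic exceeds c in absolute value, else A2."""
    n = len(xs)
    stat = sum(1 for x in xs if x <= 0) / n - 0.5
    return "A1" if abs(stat) > c else "A2"
```

A sample lying almost entirely on one side of 0 leads to A1 (median far from 0); a balanced sample leads to A2.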
A.2 Nomographic Solutions. In this section we develop a nomographic method for solving the equations in (A.3) for n and c_n. Then, by using normal approximations to the non-central t and binomial distributions, we use the same method for solving the equations in (A.5) and (A.7), thus making it possible to obtain explicitly the M.E. d.r.'s of Section A.1.
A.2.1 The Nomographs. Consider the function

(A.8)    γ(x, y) = Φ(y + x) - Φ(y - x)

and the inverse function y = y(x, γ). Figure A.1 below is a graph of y as a function of x for various values of γ. Figure A.2 below is a graph of the function

    z(x; γ1, γ2) = y(x, γ2)/y(x, γ1)

as a function of x for various pairs of values (γ1, γ2).

Now suppose x is a function of n and c, say

(A.9)    x = f(n, c).

Suppose further that y may be factored as

(A.10)    y = y(f(n, c), γ) = g(n, c)·h(γ).

Then we have z = z(f(n, c); γ1, γ2) = h(γ2)/h(γ1), independent of n and c.

Suppose we are given two values of γ, (γ1, γ2), and we wish to solve the two corresponding simultaneous equations expressed in (A.3) for (n, c), where x and y are given by (A.9) and (A.10). We proceed as follows: Enter Figure A.2 at z = h(γ2)/h(γ1) and read off x = x0 as the abscissa of the (γ1, γ2)-curve. Enter Figure A.1 at x0 and read off y(x0, γ1) = y0 as the
FIGURE A.1. y = y(x, γ) defined by γ(x, y) = Φ(y + x) - Φ(y - x); curves of y against x for various values of γ (e.g., γ = .50).

FIGURE A.2. z(x; γ1, γ2) = y(x, γ2)/y(x, γ1), plotted against x. For x > 4: z ≈ (x - 1.6449)/(x + 1.2816) on the (γ1, γ2) = (.10, .95) curve, and z ≈ (x - 2.3263)/(x + 1.6449) on the (.05, .99) curve.¹

1. This figure was constructed with the aid of Tables of Normal Probability Functions [26].
ordinate of the γ1-curve. Then solve x0 = f(n, c) and y0 = g(n, c)·h(γ1) for n and c.

Note that if x0 + y0 is large, say > 3, then Φ(y0 + x0) ≈ 1 and y0 ≈ x0 + Φ⁻¹(1 - γ1). (See Section A.3.) This gives an alternative method for obtaining y0 without using Figure A.1. (By using Figure A.1, additional curves may be plotted on Figure A.2 if needed.)
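The graphical lookups in Figures A.1 and A.2 can be reproduced by bisection. The fragment below is a sketch of ours, not the author's: `y_of` inverts (A.8) in y, `z_of` forms the Figure A.2 ratio, and `x0_of` finds the entry abscissa, assuming (as the figures indicate) that z increases with x.

```python
import math

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def y_of(x, g):
    """Figure A.1: the y >= 0 with Phi(y + x) - Phi(y - x) = g (left side decreases in y)."""
    lo, hi = 0.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid + x) - Phi(mid - x) > g:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def z_of(x, g1, g2):
    """Figure A.2: z(x; g1, g2) = y(x, g2)/y(x, g1)."""
    return y_of(x, g2) / y_of(x, g1)

def x0_of(zstar, g1, g2):
    """Abscissa where the (g1, g2)-curve equals zstar.  z grows from 0,
    starting at the x with Phi(x) - Phi(-x) = g2."""
    lo, hi = 0.0, 50.0
    for _ in range(200):            # locate where y(x, g2) first becomes positive
        mid = 0.5 * (lo + hi)
        if Phi(mid) - Phi(-mid) < g2:
            lo = mid
        else:
            hi = mid
    lo, hi = 0.5 * (lo + hi), 50.0  # then bisect z on [x_min, 50]
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if z_of(mid, g1, g2) < zstar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With γ1 = .10, γ2 = .95 and h(γ2)/h(γ1) = θ2/θ1 = .1, this reproduces a continuous sample size n ≈ 11.2 for θ1 = 1, σ = 1, hence N = 12.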
We now apply this method to solving the equations of Section A.1.
A.2.2 The Mean of a Normal Distribution, Variance Known. We shall use the nomographs of the previous section for solving the equations in (A.3) for n and c_n for specified values of a1 and a2 with ρ = 1. Note that the equations in (A.3) are of the form (A.8) with x = f(n, c) = √n c/σ, where h(γi) = θi (i = 1, 2). Therefore, by the procedure of Section A.2.1, we may obtain x0 = √n c/σ and y0 = y(x0, γ1) = √n θ1/σ and then solve for n and c, obtaining

    n = (σ y0/θ1)²,    c = x0 θ1/y0.

The least integer ≥ n is the M.E. sample size.

A.2.3 The Mean of a Normal Distribution, Variance Unknown.
Using a normal approximation to the non-central t distribution given by Johnson and Welch [10],

    T_{f,δ}(t) ≈ Φ[(t - δ)/√(1 + t²/2f)],

the equations in (A.5) may be approximated by

    Φ[√n(θi + c)/√(1 + nc²/(2n-2))] - Φ[√n(θi - c)/√(1 + nc²/(2n-2))] = γi    (i = 1, 2),

and these are of the form (A.8) with x = f(n, c) = √n c/√(1 + nc²/(2n-2)), where h(γi) = θi (i = 1, 2). We may obtain x0 and y0 = y(x0, γ1) by the procedure of Section A.2.1 and then solve for n and c, obtaining

    c = x0 θ1/y0,    n = y0²/θ1² + x0²/2,

the latter giving a satisfactory approximation except for very small n.

However, these solutions are only approximations since the non-central t distribution has been approximated by a normal distribution. They may be improved iteratively, or simply checked by seeing if a solution to (A.4) exists for n = N - 1, computing the non-central t distribution function more accurately from its Edgeworth Type A Series, as discussed in [10] (p. 388).
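The closed forms c = x0 θ1/y0 and n = y0²/θ1² + x0²/2 combine with the Johnson and Welch approximation into a few lines. A sketch of ours, specialized to θ2 = 0 so that x0 solves Φ(x0) - Φ(-x0) = a2 directly:

```python
import math

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def T_jw(t, f, delta):
    """Johnson-Welch normal approximation to the non-central t d.f. T_{f,delta}(t)."""
    return Phi((t - delta) / math.sqrt(1.0 + t * t / (2.0 * f)))

def inv_gamma_pair(a1, a2):
    """x0, y0 of Section A.2.1 for gamma1 = 1 - a1, gamma2 = a2, theta2 = 0:
    Phi(x0) - Phi(-x0) = a2 and Phi(y0 + x0) - Phi(y0 - x0) = 1 - a1."""
    lo, hi = 0.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) - Phi(-mid) < a2:
            lo = mid
        else:
            hi = mid
    x0 = 0.5 * (lo + hi)
    lo, hi = 0.0, 50.0
    for _ in range(200):            # left side decreases in y
        mid = 0.5 * (lo + hi)
        if Phi(mid + x0) - Phi(mid - x0) > 1.0 - a1:
            lo = mid
        else:
            hi = mid
    return x0, 0.5 * (lo + hi)

def me_sample_size_t(theta1, a1, a2):
    """Approximate M.E. sample size, variance unknown, theta2 = 0,
    via n = y0^2/theta1^2 + x0^2/2 and c = x0*theta1/y0."""
    x0, y0 = inv_gamma_pair(a1, a2)
    n = y0 ** 2 / theta1 ** 2 + x0 ** 2 / 2.0
    c = x0 * theta1 / y0
    return math.ceil(n), c
```

The extra x0²/2 term is what separates the unknown-variance column of Table A.2 from the known-variance one.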
A.2.4 The Parameter of a Binomial Distribution and the Median of an Unspecified Distribution. The equations to be solved to obtain M.E. d.r.'s in Sections A.1.4 and A.1.5 are identical, so the two problems are treated simultaneously here. Using the normal approximation to the binomial distribution,

    B_{n,p}(k) ≈ Φ[(k + 1/2 - np)/√(np(1 - p))],

the equations in (A.7) may be approximated by

    Φ[(nc + 1/2 - nθi)/√(n(1/4 - θi²))] - Φ[(-nc - 1/2 - nθi)/√(n(1/4 - θi²))]
        = Φ[√n(θi + c + 1/2n)/√(1/4 - θi²)] - Φ[√n(θi - c - 1/2n)/√(1/4 - θi²)] = γi    (i = 1, 2),

and these are of the form (A.8) with

    xi = √n(c + 1/2n)/√(1/4 - θi²),    yi = √n θi/√(1/4 - θi²) = g(n, c)·h(γi),

where h(γi) = θi/√(1/4 - θi²) and g(n, c) = √n (i = 1, 2). But x is not a function of (n, c) only, so the above method cannot be applied directly. Consider θ1 and θ2 fixed (implying x2/x1 fixed), and let

    x' = f(n, c) = √n(c + 1/2n)/√(1/4 - θ1²).

Construct a graph of z as a function of x' from Figure A.1 by taking the ratio of the ordinates of the γ2-curve at the abscissa

    x'' = √[(1/4 - θ1²)/(1/4 - θ2²)]·x'

and the ordinates of the γ1-curve at the abscissa x'. Entering this graph at z = h(γ2)/h(γ1), we obtain x', and y = y(x', γ1) may be obtained from Figure A.1; then x' and y as functions of n and c may be solved, obtaining

    n = y²(1/4 - θ1²)/θ1²,    c = x' θ1/y - 1/2n.

To check or improve the accuracy of the solutions, tables of the binomial distribution function may be used.
A.3 Approximate Equivalence with Symmetric Three-Decision Rules. We first consider the case of decision rules concerning the mean of a normal distribution, variance known. The equations to be solved for n and c to obtain a M.E. symmetric 3-d.r. are, from Section 1.3.6,

    Φ[√n(θ1 - c)/σ] = a1,    Φ[√n(θ2 + c)/σ] - Φ[√n(θ2 - c)/σ] = a2,

whereas to obtain a M.E. two-sided 2-d.r., (A.3) must be solved with ρ = 1. Clearly, if Φ[√n(θ1 + c_n)/σ] ≈ 1, then the solutions (n, c) of the two pairs of equations will be equal. We shall show for various lower bounds on a1 and a2 (see Table A.1) that 1 - Φ[√n(θ1 + c_n)/σ] < .0005 independently of θ1, θ2, and σ.
Consider the equations

(A.11)    Φ(yi + x) - Φ(yi - x) = γi    (i = 1, 2)

(y1 > y2 ≥ 0, x > 0), where γ1 = 1 - a1 and γ2 = a2. It is sufficient to show that y1 + x is "large" for ai ≥ ai⁰ (i = 1, 2), say, and for all θ1, θ2, σ (θ1 > θ2 ≥ 0). Denoting the standard normal density function by φ and considering (A.11) as defining yi(x, γi), we have

    ∂γi/∂yi = φ(yi + x) - φ(yi - x) < 0, since |yi + x| > |yi - x|    (i = 1, 2),

so that yi is an increasing function of x for fixed γi and a decreasing function of γi for fixed x, and x increases with yi for fixed γi.
Writing x as a function of yi and γi, the two equations in (A.11) may be written

(A.12)    x(y1, γ1) = x(y2, γ2).

From the above monotonicity relations, we thus have

(A.13)    x = x(y2, γ2) ≥ x(0, a2⁰) = x⁰.

Writing y1 as a function of x and γ1, we therefore have

(A.14)    y1 = y1(x, γ1) ≥ y1(x⁰, 1 - a1⁰) = y1⁰.

Thus, x⁰ is the solution of Φ(x⁰) - Φ(-x⁰) = a2⁰, and y1⁰ is the solution of Φ(y1⁰ + x⁰) - Φ(y1⁰ - x⁰) = 1 - a1⁰, and for all ai ≥ ai⁰ (i = 1, 2), y1 + x ≥ y1⁰ + x⁰, or Φ(y1 + x) ≥ Φ(y1⁰ + x⁰).

Table A.1 below gives pairs (a1⁰, a2⁰) of lower bounds on (a1, a2) for which Φ(y1⁰ + x⁰) > .9995, so that for ai ≥ ai⁰ (i = 1, 2), 1 - Φ(y1 + x) < .0005 for all θ1, θ2, σ. For all such values of (a1, a2), the symmetric three-decision problem is virtually equivalent to the two-sided two-decision problem.
TABLE A.1

Lower Bounds (a1⁰, a2⁰) on (a1, a2) for which 1 - Φ[√n(θ1 + c_n)/σ] < .0005

    .001, .999      .501, .900      .950, .600
    .035, .990      .800, .800      .990, .300
    .265, .950      .900, .700      .999, .100
.999, .100
If tho variance is unknown, and we assume the normal approximation to the non-central t-distribution givon in Section
A.2.3 to be sufficiently accurate, then the above argument
holds for this caso as well so that the symmetric throe-decision
problem and the two-sided two-decision problem are virtually
equivalent i f
~,
a
2
satisfy bounds such as those given in Table
1)..1.
The cases of decision rules concerning the parameter of a
binomial distribution or the median of an unspecified distribution can be treated in an analogous manner with one minor modification, assuming the normal approximation to the binomial
distribution is sufficiently accurate.
As in Section A.2.4, we
sec that a subscript i is required on x in (1.11) and the argument following it.
Equation (1.12) must be replaced by
154
and hence xl > x • Hence, (A.13) may be replaced by xl > x 2
2
~ x2(O, a~) = xO and thus (A.14) still holds.
Hence, tho results
of the argument are the same and the tvlO decision problems are
virtually equivalent for aI' a
2
satisfying bounds such as those in
Table 11..1.
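The quantities x⁰, y1⁰ and the bound Φ(y1⁰ + x⁰) are easy to evaluate directly; a sketch (function names ours):

```python
import math

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bisect(f, lo, hi, iters=200):
    """Root of an increasing function f on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def equivalence_bound(a1_0, a2_0):
    """Phi(y1^0 + x^0), where x^0 solves Phi(x) - Phi(-x) = a2^0 and y1^0 solves
    Phi(y + x^0) - Phi(y - x^0) = 1 - a1^0 (left side decreasing in y)."""
    x0 = bisect(lambda x: Phi(x) - Phi(-x) - a2_0, 0.0, 50.0)
    y0 = bisect(lambda y: (1.0 - a1_0) - (Phi(y + x0) - Phi(y - x0)), 0.0, 50.0)
    return Phi(y0 + x0)
```

Pairs at or above the Table A.1 bounds give Φ(y1⁰ + x⁰) near 1, while a pair such as (a1, a2) = (.5, .5) gives only about .75, so the equivalence genuinely depends on the bounds.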
A.4 Tables of Most Economical Sample Sizes. In this section, we give some tables of M.E. sample sizes, exemplifying the decision problems considered above. Table A.2 gives M.E. sample sizes relative to the vectors a = (.90, .95) and a = (.95, .99) for various values of τ1 = θ1/σ and τ2 = θ2/σ for the "two-sided" decision problem concerning the mean of a normal distribution, variance known, as introduced in Section A.1.2. The sample sizes have been computed from the nomographs as indicated in Section A.2.2. Table A.2 also gives the M.E. sample sizes as computed from the nomographs for the analogous decision problem when the variance is unknown, as set out in Section A.1.3, where now τi = θi (i = 1, 2). However, these latter sample sizes are not necessarily exact since their computation utilized the normal approximation of the non-central t distribution given in Section A.2.3; but, by using an Edgeworth expansion of the non-central t distribution, several of the values were checked, all of which were found to be correct. In any case, the relative sizes indicated may be of more interest.
Table A.3 gives the M.E. sample sizes for "two-sided" non-randomized d.r.'s concerning the parameter of a binomial distribution or the median of an unspecified distribution as defined in Sections A.1.4 and A.1.5. All sample sizes between 0 and 50 listed were taken from Tables of the Binomial Probability Distribution [25] and are accurate. All other values were computed from the nomographs as outlined in Section A.2.4, and the last digit is very approximate due to the use of a normal approximation to the binomial distribution. A better approximation might be expected for randomized d.r.'s since the introduction of randomization essentially has the effect of making a discrete distribution continuous, just as the normal approximation does. In any case, the relative values may be useful.

Of particular interest in both tables is the increase in the M.E. sample size when θ2 (or τ2) is increased from zero to a slightly larger number.
TABLE A.2

Most Economical Sample Sizes: Normal Mean, Variance Known and Unknown
(τi = θi/σ if σ known; τi = θi if σ unknown)

                     a1 = .90, a2 = .95       a1 = .95, a2 = .99
    τ1     τ2       σ known   σ unknown      σ known   σ unknown
    2.0    0           3          5              5          8
    2.0    .10         3          5              5          8
    2.0    .50         4          9              8         14
    1.0    0          11         13             18         22
    1.0    .10        12         14             20         24
    1.0    .50        35         45             64         83
    0.5    0          43         44             72         75
    0.5    .10        54         57             99        104
    0.5    .20        96        102            176        188
    0.2    0         263        265            446        449
    0.2    .05       382        385            701        708
    0.2    .10       857        868           1578       1598
    0.1    0        1051       1053           1782       1785
    0.1    .02      1347       1350           2466       2471
    0.1    .05      3428       3439           6310       6329
TABLE A.3

Most Economical Sample Sizes: Binomial Parameter or Median of Unspecified Distribution

    θ1     θ2     a1 = .90, a2 = .95    a1 = .95, a2 = .99
    .40    0              11                    19
    .40    .05            15                    25
    .40    .10            17                    32
    .40    .20            37                    61
    .30    0              24                    40
    .30    .05            32                    53
    .30    .10            45                    81
    .30    .20           161                   298
    .20    0              62                   105
    .20    .02            66                   115
    .20    .05            89                   163
    .20    .10           195                   359
    .10    0             259                   439
    .10    .01           277                   485
    .10    .02           331                   606
    .10    .05           837                  1542
    .05    0            1047                  1775
    .05    .01          1333                  2454
    .05    .02          2367                  4941
    .02    0            6564                 11128
BIBLIOGRAPHY

[1]  "On Uniformly Consistent Tests", Annals of Mathematical Statistics, 22 (1951), 289-293.

[2]  Berger, A., and Wald, A., "On Distinct Hypotheses", Annals of Mathematical Statistics, 20 (1949), 104-109.

[3]  Blyth, Colin R., "On Minimax Statistical Decision Procedures and Their Admissibility", Annals of Mathematical Statistics, 22 (1951), 22-42.

[4]  Dvoretzky, A., Wald, A., and Wolfowitz, J., "Elimination of Randomization in Certain Statistical Decision Procedures and Zero-Sum Two-Person Games", Annals of Mathematical Statistics, 22 (1951), 1-21.

[5]  Ferris, C. D., Grubbs, F. E., and Weaver, C. L., "Operating Characteristics for the Common Statistical Tests of Significance", Annals of Mathematical Statistics, 17 (1946), 178-197.

[6]  Ghosh, M. N., "An Extension of Wald's Decision Theory to Unbounded Weight Functions", Sankhya, 12 (1952), 8-26.

[7]  Hoeffding, Wassily, "'Optimum' Nonparametric Tests", Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1951, 83-92.

[8]  Hoeffding, Wassily, "Lectures on Estimation and Testing of Hypotheses", University of North Carolina, Fall Quarter 1952 (unpublished).

[9]  Hoeffding, Wassily, "Lectures on Non-Parametric Inference", University of North Carolina, Winter Quarter 1953 (unpublished).

[10] Johnson, N. L., and Welch, B. L., "Applications of the Non-Central t-Distribution", Biometrika, 31 (1939), 362-389.

[11] Konijn, H. S., "On Certain Classes of Statistical Decision Procedures", Annals of Mathematical Statistics, 24 (1953), 440-448.

[12] Kruskal, William, "The Monotonicity of the Ratio of Two Noncentral t Density Functions", Annals of Mathematical Statistics, 25 (1954), 162-165.

[13] Lehmann, E. L., Theory of Testing Hypotheses, Associated Students Store, University of California.

[14] Lehmann, E. L., "On the Existence of Least Favorable Distributions", Annals of Mathematical Statistics, 23 (1952), 408-416.

[15] Lehmann, E. L., "Some Principles of the Theory of Testing Hypotheses", Annals of Mathematical Statistics, 21 (1950), 1-26.

[16] Lindley, D. V., "Statistical Inference", Journal of the Royal Statistical Society, Series B, 15 (1953), 30-76.

[17] Advanced Statistical Methods in Biometric Research, New York, John Wiley and Sons, 1952.

[18] "Comparison of Tests for Non-Parametric Hypotheses", Arkiv för Matematik, 3 (1954), 133-163.

[19] Savage, L. J., "Theory of Statistical Decision", Journal of the American Statistical Association, 46 (1951), 55-67.

[20] Sverdrup, Erling, "Weight Functions and Minimax Procedures in the Theory of Statistical Inference", Archiv for Mathematik og Naturvidenskab, 51 (1952), 1-73.

[21] Wald, Abraham, Sequential Analysis, New York, John Wiley and Sons, 1947.

[22] Wald, Abraham, Statistical Decision Functions, New York, John Wiley and Sons, 1950.

[23] Wald, A., and Wolfowitz, J., "Optimum Character of the Sequential Probability Ratio Test", Annals of Mathematical Statistics, 19 (1948), 326-339.

[24] Wald, A., and Wolfowitz, J., "Characterization of the Minimal Complete Class of Decision Functions when the Number of Distributions and Decisions is Finite", Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1951, 149-157.

[25] Tables of the Binomial Probability Distribution, National Bureau of Standards Applied Mathematics Series, 6, 1950.

[26] Tables of Normal Probability Functions, National Bureau of Standards Applied Mathematics Series, 23, 1953.