ADMISSIBLE DECISION RULES

by

GEORGE E. McARTHUR

Department of Mathematics
Master of Science
McGill University
Montreal

ABSTRACT

This paper discusses the concept of admissibility with respect to decision theory in a comprehensive yet theoretical manner. Very little knowledge of decision theory is assumed. Admissible phenomena are investigated both for their own sake and for their inherent connection with completeness and with Bayes and Minimax decision rules. A summary of results is also included in the final chapter.
ADMISSIBLE DECISION RULES

by

GEORGE E. McARTHUR

A thesis submitted to the Faculty of Graduate Studies and Research at McGill University, in partial fulfilment of the requirements for the degree of Master of Science.

Department of Mathematics
McGill University
Montreal

April, 1969

© George E. McArthur 1969
ACKNOWLEDGEMENTS

I would like to express my deep appreciation to Prof. V. Seshadri, of McGill University, for suggesting the subject of this dissertation, and for providing assistance and encouragement in the organization and evolution of the content.

I am also grateful for the opportunities and facilities provided by the Mathematics Department of McGill University during the course of my work.

Finally, I am indebted to Miss Hilde Schroeder for the patience and competence she exercised in typing this thesis.
TABLE OF CONTENTS

ACKNOWLEDGEMENTS
TABLE OF CONTENTS
INTRODUCTION
CHAPTER I:    The Decision Problem
CHAPTER II:   Admissibility and Completeness
CHAPTER III:  The Admissibility of Bayes and Minimax Decision Rules
CHAPTER IV:   Summary
BIBLIOGRAPHY
INTRODUCTION

It was Abraham Wald who first suggested, in 1939, the use of game theory principles in theretofore strictly statistical problems. The mathematical foundations of the theory of games had been developed by E. Borel (1921) and by J. von Neumann and Oskar Morgenstern, and were expounded in a definitive book by the latter two in 1944, entitled Theory of Games and Economic Behaviour. At the same time statistical theory had evolved along the guidelines set down by the likes of Karl Pearson and R.A. Fisher and had been formulated formally by J. Neyman and Egon Pearson. However, it remained for Wald to fully appreciate that statistics can be viewed as a game against nature. He realized that the evaluation of a statistical procedure should be based on its consequences as opposed to other more inherent properties. This of course is the very principle of decision theory; the concept was first fully developed by Wald in the 1940's and resulted in his book Statistical Decision Functions in 1950. Since then the decision-theory approach to statistics has become increasingly popular and its practical implications have multiplied accordingly.

This thesis is related to the desirability, in some sense, of making one (type of) decision rather than another (type). This implies the search for a decision rule (or class of rules) that is an optimal strategy in the sense of its consequences (i.e. loss or risk to the statistician).
The principle investigated here is that of admissibility, in which a rule a (or class of rules) is considered when no other rule is "better than a". Although it seems certain that no single rule is suitable for selecting a specific course of action that would be universally agreeable in decision theory problems, admissibility is often of considerable value in deciding what not to do. To see this, consider a situation where we propose a certain strategy a for consideration. Then two possibilities arise: (i) there is no other strategy that is "better than a", or (ii) there is a strategy a' that is better than a. In case (i), a is certainly admissible but not necessarily preferred (and therefore not optimal), since other, competing, admissible strategies may exist. However, in case (ii) we certainly need not consider a.
In chapter one the decision theory approach to statistical problems in general is introduced. The basic concepts of decision theory are defined, discussed and illustrated with examples. The formal mathematical tools needed in decision theory are constructed in a logical sequence, beginning with the statistical decision problem itself and including sample spaces, loss functions, and risk functions. Decision rules in general are discussed and broken down and defined as both randomized and non-randomized strategies. The two common avenues of arriving at randomized decision rules are exploited and delineated. The uses of decision theory for testing hypotheses and for estimating parameters and intervals are briefly discussed. Finally, the two principles most commonly used to order decision rules are defined and illustrated, namely Bayes and Minimax rules. They are investigated briefly in chapter one both to illustrate how the action space can be ordered and also to provide specific types of decision rules that allow the reader to place the general concepts of chapter two in a better perspective.
In chapter two the main theorems of decision theory are offered, in addition to results dealing with the phenomenon of admissibility. Admissibility is thoroughly investigated in this chapter. The concepts of minimal complete and minimal essentially complete classes, closure under equivalence of a class, equivalence (by risk) classes, lower quantants, lower boundaries of the risk set S, closure, boundedness, and compactness of S and the convex hull of S are all defined and used to develop the theoretical results involved in admissibility. Again, examples are used to illustrate some of these theoretical results. The desirability of admissibility as a choice criterion, the consequences of its employment and the situations in which it may arise are discussed at length in this chapter. The concepts of completeness and essential completeness of classes of rules, both inherently connected with admissibility, are developed and elaborated on through a succession of theorems and lemmas. Results dealing with restrictions on the parameter space (i.e. the states of nature), the action space, the sample space, and the loss (risk) function are offered in this chapter. Even situations in which the various restrictions themselves arise are discussed. In particular, the circumstances yielding the essential completeness of the class of non-randomized decision rules are investigated. These results have far-reaching theoretical and practical implications that are of particular value in association with the other results of this chapter.
The majority of the theorems deal with the determination of an admissible rule or class of rules within the available action space. The last three theorems, however, suggest methods of constructing admissible rules under certain assumptions.

In chapter three, Bayes and Minimax decision rules, both introduced in chapter one, are studied from the viewpoint of admissibility. All the possible relationships between Bayes, Minimax and admissible strategies are investigated. These relationships are illustrated with examples wherever possible. These problems are employed with an eye to using the theory available up to the point of their insertion. In many cases the examples are themselves offered as results concerned with the study of admissibility. Apart from Bayes and Minimax rules, other types of strategies are introduced in this chapter, namely rules that are unique up to equivalence, rules that are $\epsilon$-Bayes, $\epsilon$-minimax and $\epsilon$-admissible, equalizer rules and extended Bayes rules. Other notions defined and used in the evolution of results include those of a least favourable distribution and the value of a statistical game. Chapter three finishes with two examples that illustrate a somewhat delicate and easily misunderstood idea, i.e. though it is often true that a minimax equalizer rule is admissible, it is not always so, for Bayes rules (which are minimax if they are equalizer, by the theorem of this chapter) need not be unique up to equivalence.

Chapter four serves as a useful and comprehensive summary of the results of chapters two and three. Two tables are included, which are to a large extent a condensation of chapters two and three respectively. This last chapter finishes, appropriately, with a brief report on when and how one might hope to find an admissible rule (or rules) in various decision theory statistical problems.
CHAPTER I

The Decision Problem

The fundamental problem with which decision theory is concerned is the selection by a statistician of a suitable action (from all of his possible actions) in a particular decision-making context. The statistician must contend with the possible "states of nature" underlying the problem. These different states of nature alter the consequences of the actions.
Generally, we are confronted with the following essential components:

(1) A non-empty set $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n, \ldots\}$ of possible states of nature (often referred to as the parameter space when we use decision theory as an estimation procedure).

(2) A non-empty set $\mathcal{A} = \{a_1, a_2, \ldots\}$ of actions available to the statistician.

Given any $\theta_i \in \Theta$, the statistician is faced with a "cost" for each action he pursues (while $\theta_i$ is the true state of nature). This cost is tabulated as a "loss function", i.e. a function of $\theta_i$ and whichever action the statistician chooses. Hence our third component is:

(3) A loss function $L(\theta,a)$ defined on $\Theta \times \mathcal{A}$.

We shall rigorously define loss functions for decision theory problems later.
The decision-making problem is of interest when the statistician has a choice of alternate actions (i.e. $\mathcal{A}$ consists of two or more possible actions) and the consequence of taking one of these actions must depend on the state of nature. Of course the difficulty in deciding which action to take is due to the fact that it is not known which of the possible states of nature is the true one. It is the loss function that measures the "cost" of taking actions $a_1, a_2, \ldots$ respectively when the states of nature are $\theta_1, \theta_2, \ldots$ respectively.

It is of interest now to note that the statistician is generally free to hypothesize what the true state of nature really is. He will do this by performing "experiments" aimed at revealing (at least partially) the true state of nature. Hence, in a situation where there is no experimentation, the statistician's "strategies" (i.e. avenues of action) are pure in the sense that they are simply the possible actions of $\mathcal{A}$. However, in a situation with even a single experiment by the statistician the number of strategies for the statistician is vastly increased, for he must now decide on a rule that will associate with each possible outcome of the experiment a point $a \in \mathcal{A}$. Such a rule will be called a decision rule, and we shall now give the appropriate definitions:
Definition 1.1. A statistical decision problem is a triplet $(\Theta, \mathcal{A}, L)$ together with an experiment involving a random observable $X$ whose distribution $P_\theta$ depends on the true state of nature $\theta \in \Theta$.
Definition 1.2. We shall consider $X$ to be a random variable or a random vector, the observed value, $x$, of which the statistician may see before making a decision. $(\mathcal{X}, \mathcal{B}, P_\theta)$, then, is said to be the sample space, consisting of the set $\mathcal{X}$ of values of $X$, a field $\mathcal{B}$ of subsets of $\mathcal{X}$, and the distribution $P_\theta$ of $X$ when $\theta$ is the true state of nature.
Definition 1.3. Let $\mathcal{X} = (X, \mathcal{B}, P_\theta)$ be a sample space, and let $\mathcal{A}$ be an arbitrary space of actions or decisions. Then a function, $d$, defined on $\mathcal{X}$ and mapping $\mathcal{X}$ into $\mathcal{A}$ is called a decision rule (function). Such a function is non-randomized and represents a pure strategy for the statistician. We shall denote the class of all non-randomized decision rules by $D$.
Note that a decision rule, $d \in D$, is a random variable, and it can be considered as a partition of the set $\mathcal{X}$ into mutually exclusive subsets $S_a = \{x : d(x) = a\}$ whose union is $\mathcal{X}$. Hence if the outcome of a single experiment is an element of $S_a$, action $a$ is taken.

Definition 1.4. Let $\mathcal{X} = (X, \mathcal{B}, P_\theta)$ be a sample space, and let $\mathcal{A}$ be an arbitrary space of actions or decisions. Then a bounded, real-valued function $L$ defined on the product space $\Theta \times \mathcal{A}$, with values $L(\theta,a)$, is called a loss function.
Note that the function $L(\theta,a)$ may take negative values, so that the statistician may in fact experience a gain.
Definition 1.5. Let $\mathcal{X} = (X, \mathcal{B}, P_\theta)$ be a sample space, let $\mathcal{A}$ be an arbitrary space of actions, let $D$ be the class of non-randomized decision rules mapping $\mathcal{X}$ into $\mathcal{A}$, and let $L(\cdot,\cdot)$ be a loss function defined on $\Theta \times \mathcal{A}$. Then the expected value of $L(\theta,d(X))$ when $\theta$ is the true state of nature is called the risk function. It is written as follows:

(1.1)    $R(\theta,d) = E_\theta\, L(\theta,d(X))$,

and it represents the average "loss" to the statistician when the true state of nature is $\theta$ and the statistician uses the function $d$. We may simply take the expectation in (1.1) to mean:

(a)    $R(\theta,d) = E_\theta\, L(\theta,d(X)) = \sum_{x \in \mathcal{X}} L(\theta,d(x))\, P_\theta(x)$,

OR

(b)    the Lebesgue integral $R(\theta,d) = E_\theta\, L(\theta,d(X)) = \int_{\mathcal{X}} L(\theta,d(x))\, dP_\theta(x)$.

Note that since we defined the loss function $L$ to be bounded (see Definition 1.4), the risk function is finite.
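To make the discrete form (a) concrete, the following minimal sketch in Python computes $R(\theta,d)$ for a small finite problem; the two states, two sample points, losses and probabilities are hypothetical, chosen only for illustration.

```python
# A minimal sketch of computing the risk R(theta, d) for a finite problem.
# All states, sample points, losses and probabilities here are hypothetical.
states = ["theta1", "theta2"]
sample = ["x1", "x2"]

# Sampling distribution P_theta(x), one row per state of nature.
p = {"theta1": {"x1": 0.7, "x2": 0.3},
     "theta2": {"x1": 0.4, "x2": 0.6}}

# Loss L(theta, a) for two actions a1, a2.
loss = {("theta1", "a1"): 0.0, ("theta1", "a2"): 1.0,
        ("theta2", "a1"): 1.0, ("theta2", "a2"): 0.0}

def risk(theta, d):
    """R(theta, d) = sum over x of L(theta, d(x)) * P_theta(x)."""
    return sum(loss[(theta, d[x])] * p[theta][x] for x in sample)

# A non-randomized rule d maps each observed x to an action.
d = {"x1": "a1", "x2": "a2"}
for theta in states:
    print(theta, risk(theta, d))   # theta1: 0.3, theta2: 0.4
```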
Definition 1.6. We shall now proceed to allow the statistician more flexibility in his decision-making by introducing mixed strategies (i.e. randomized decision rules).

DEFINITION A. Any probability distribution $\delta$ on the space $D$ of non-randomized decision rules is called a randomized decision rule (function). The space of all non-randomized and randomized decision rules will be denoted by $D^*$. To avoid non-existence difficulties we assume that $\mathcal{A}^*$ contains all probability distributions giving mass one to a finite number of points of $\mathcal{A}$. Hence, assuming $D^*$ to contain all probability distributions giving mass one to a finite number of points of $D$, we have that $D \subset D^*$, for we may identify a point $d \in D$ with the probability distribution degenerate at the point $d$. Of course, by the definition of $\mathcal{A}^*$ as the set of probability distributions on the arbitrary action space $\mathcal{A}$, $\mathcal{A}^*$ must contain $\mathcal{A}$.

Example: Given $\mathcal{A} = \{a_1, a_2\}$, the set $\mathcal{A}^*$ of probability distributions on $\mathcal{A}$ may be taken as the interval $[0,1]$, such that $\pi \in \mathcal{A}^*$ represents the probability of taking action $a_1$, whereas $1-\pi$ is the probability of taking action $a_2$.
With a two-point sample space $\{x_1, x_2\}$, $D$, the set of non-randomized decision rules, consists of four elements, $D = \{d_1, d_2, d_3, d_4\}$, where

    $d_1(x_1) = a_1$    $d_1(x_2) = a_1$
    $d_2(x_1) = a_1$    $d_2(x_2) = a_2$
    $d_3(x_1) = a_2$    $d_3(x_2) = a_1$
    $d_4(x_1) = a_2$    $d_4(x_2) = a_2$
$D^*$, the set of randomized decision rules, may then be represented as

    $\delta = \sum_{i=1}^{4} p_i d_i$,  $p_i \ge 0$,  $\sum_{i=1}^{4} p_i = 1$,

where rule $d_1$ is chosen with probability $p_1$, rule $d_2$ is chosen with probability $p_2$, etc.
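As a small bridge to Definition B below, the following sketch (with a hypothetical mixing distribution) shows that mixing the four rules of the example induces, for each observed $x$, a probability of taking action $a_1$; this is precisely the "behavioural" form introduced next.

```python
import itertools

# The four non-randomized rules d: {x1, x2} -> {a1, a2} of the example above.
rules = [dict(zip(["x1", "x2"], acts))
         for acts in itertools.product(["a1", "a2"], repeat=2)]

# A randomized rule: a probability vector over the four rules (hypothetical).
probs = [0.5, 0.0, 0.0, 0.5]

def behavioural(probs, rules):
    """P(action = a1 | x) induced by choosing rules[i] with probability probs[i]."""
    return {x: sum(p for p, d in zip(probs, rules) if d[x] == "a1")
            for x in ["x1", "x2"]}

print(behavioural(probs, rules))   # {'x1': 0.5, 'x2': 0.5}
```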
DEFINITION B. We may equivalently (for proof see Wald and Wolfowitz, 1951) consider the following definition of randomized decision rules:
Let $\mathcal{X} = (X, \mathcal{B}, P_\theta)$ be a sample space, let $\mathcal{A}$ be an arbitrary space of actions, and let $\mathcal{D}$ be a class of functions $\varphi$ from $\mathcal{X}$ into $\mathcal{A}^*$; then $\varphi(x)$ is said to be a randomized decision rule (often referred to as a "behavioural" decision rule; in fact, we shall denote the class of behavioural rules by $\mathcal{D}$). $\mathcal{D}$, then, is a class of functions $\varphi$ defined on $(\mathcal{X}, \mathcal{B})$ such that, for each $x \in \mathcal{X}$, $\varphi(x)$ is a probability distribution on $\mathcal{A}$. That is, to select a randomized strategy $\varphi$ is to select the following rule of behaviour: for every outcome $x$ of a single experiment, the statistician chooses action $a$ with probability

    $\varphi_x(a) = \varphi(a \mid x)$.
Example: Let $X$ be a random variable whose distribution is binomial with sample size $n$, where $\theta \in \Theta = \{\theta_1, \theta_2\}$ is the true state of nature, with $\theta_1 = 3/4$ and $\theta_2 = 1/2$. Let $\mathcal{A} = \{a_1, a_2\}$, so that $\mathcal{A}^*$, the space of probability distributions on $\mathcal{A}$, is the interval $[0,1]$, $\pi$ representing the probability of taking action $a_1$ and $1-\pi$ the probability of taking action $a_2$. Then the set of behavioural decision rules is

    $\mathcal{D}$ = the set of functions $\varphi$ from the sample space $\{0, 1, \ldots, n\}$ into $\mathcal{A}^*$
       $= \{ (\pi_0, \pi_1, \ldots, \pi_n) : 0 \le \pi_0 \le 1,\; 0 \le \pi_1 \le 1, \ldots,\; 0 \le \pi_n \le 1 \}$,

and the risk of a rule $\pi = (\pi_0, \ldots, \pi_n) \in \mathcal{D}$ is found as follows (see the risk for Definition B below):

    $R(\theta_i, \pi) = E\, L(\theta_i, Z)$,

where $Z$ is a random variable taking values in $\mathcal{A}$ whose distribution is given by $\pi$; i.e.

    $R(\theta_i, \pi) = \sum_{j=0}^{n} \binom{n}{j} \theta_i^{\,j} (1-\theta_i)^{n-j} \left[ \pi_j L(\theta_i, a_1) + (1-\pi_j) L(\theta_i, a_2) \right]$.
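The following Python sketch evaluates this risk formula numerically; the sample size, the losses and the threshold rule $\pi_j = 1$ for $j \ge 3$ are hypothetical choices made only for illustration.

```python
from math import comb

n = 4
states = [3/4, 1/2]    # theta_1 and theta_2 of the example

# Hypothetical loss: a1 is the "correct" action under theta_1, a2 under theta_2.
def loss(theta, a):
    if theta == 3/4:
        return 0.0 if a == "a1" else 1.0
    return 1.0 if a == "a1" else 0.0

# A behavioural rule: pi[j] = probability of taking a1 when X = j is observed.
pi = [1.0 if j >= 3 else 0.0 for j in range(n + 1)]

def risk(theta, pi):
    """Sum over j of C(n,j) theta^j (1-theta)^(n-j) [pi_j L(theta,a1) + (1-pi_j) L(theta,a2)]."""
    return sum(comb(n, j) * theta**j * (1 - theta)**(n - j)
               * (pi[j] * loss(theta, "a1") + (1 - pi[j]) * loss(theta, "a2"))
               for j in range(n + 1))

for theta in states:
    print(theta, risk(theta, pi))
```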
As indicated, we shall not prove the equivalence of Definition A and Definition B here. However, it will suffice to say that the computation of the respective risks is slightly different but that the result is the same:

for Definition A:  $R(\theta,\delta) = E\, R(\theta,Z)$, where $Z$ is a random variable taking values in $D$ whose distribution is given by $\delta$;

for Definition B:  $R(\theta,\delta) = E_\theta\, L(\theta,\delta(x))$, where $L(\theta,\delta(x))$ is defined by $L(\theta,\delta(x)) = E\, L(\theta,Z)$, $Z$ being a random variable taking values in $\mathcal{A}$ whose distribution is given by $\delta(x)$.

Note that the choice of a decision function should depend only on the risk function $R(\theta,d)$ and not otherwise on the distribution of the random variable $L(\theta,d(X))$.

(b)
The uses of decision rules in other statistical problems:

(i) Problems in testing hypotheses: i.e. where the action space consists of only two points.

(ii) Point estimation of a real parameter: where the action space consists of the real line, i.e. $\mathcal{A} = (-\infty, +\infty)$.

Example: Let $\mathcal{A} = (-\infty, +\infty)$ and let $L(\theta,a) = c(\theta - a)^2$, where $c$ is some positive constant. Here a decision rule $d$ (which is really a real-valued function defined on the sample space) is actually an "estimate" of the true unknown state of nature $\theta$; i.e. the statistician chooses the function $d$ to minimize the risk function,
    $R(\theta,d) = c\, E_\theta\, (\theta - d(X))^2$,

which is $c$ times the mean squared error of the estimate $d(X)$. Note that the criterion involved here (i.e. that of choosing an estimate with a small mean squared error in some sense) is exactly the criterion most frequently used in classical statistics. We will see later how one of the most valuable applications of decision theory is in estimation procedures, and how the study of various types of decision rules will help solve estimation problems.
(iii) Interval Estimation: A third use of decision rules in statistical problems is that of interval estimation, where the action space $\mathcal{A}$ may consist of the two-dimensional plane and $\Theta$ may be the real line. Here the loss function is of a special type, to allow estimation by an interval belonging to $\mathcal{A}$.

Example: Let $\Theta$ be the real line, let $\mathcal{A}$ be the half-plane $\{(y,z) : y \le z\}$, and let the loss function be

    $L(\theta,(y,z)) = k(z-y) - I_{(y,z)}(\theta)$,

where $k$ is a positive constant and $I_{(y,z)}$ denotes the indicator of the interval $(y,z)$. Here a decision rule is essentially an interval estimate of the true state of nature.
(c) Ordering principles and decision rules:

Naturally the statistician would search for a "best" decision rule in the sense of a rule that has the smallest risk regardless of the true state of nature. Generally, however, though there may be a best action for the statistician to take for each state of nature, this best action will differ for different states of nature, so that no one action can be presumed best overall. To select a decision rule, then, other criteria must be used.

Several principles are commonly used to choose a decision rule. Any principle that orders decision rules on some scale of desirability or suitability will serve as a basis for a decision.

Two important principles that serve to order decision rules are the Bayes principle and the Minimax principle. These two common principles, which lead to so-called Bayes and Minimax decision rules respectively, are of considerable value in the discussion and comparison of admissible decision rules (i.e. the form of decision rule that this thesis deals with). We shall now define Bayes and Minimax decision rules.
(i) Bayes decision rules: We first introduce a so-called prior distribution on the parameter space $\Theta$. We take $\Theta^*$ to be any space of distributions $\tau$ on $\Theta$ that satisfy the two following conditions:

(a) $\Theta^*$ contains all finite distributions on $\Theta$;

(b) $\Theta^*$ is linear, i.e. $\tau_1 \in \Theta^*$, $\tau_2 \in \Theta^*$ implies $\alpha\tau_1 + (1-\alpha)\tau_2 \in \Theta^*$ for all $\alpha$ such that $0 \le \alpha \le 1$.

Hence we define a decision rule $\delta_0$ to be Bayes with respect to some prior distribution $\tau \in \Theta^*$ if

    $r(\tau,\delta_0) = \inf_{\delta \in D^*} r(\tau,\delta)$,

where $r(\tau,\delta_0)$, the "Bayes risk" of a decision rule $\delta_0$ with respect to the prior distribution $\tau$, is given by

    $r(\tau,\delta_0) = E\, R(T,\delta_0)$,

$T$ being a random variable over $\Theta$ with distribution $\tau$.
Note that in using the Bayes principle, the statistician acts as if the true state of nature were actually a random variable whose distribution he knows. Essentially, then, we see that Bayes principles never require randomization, a fact that some statisticians consider a drawback. For a fixed distribution $\tau \in \Theta^*$ the statistician prefers a rule $\delta_1$ to a rule $\delta_2$ if $\delta_1$ has smaller Bayes risk. This sets up a linear ordering on the space of decision rules. A Bayes decision rule is one that is best with respect to this ordering.

Note also that if the actual $\inf_{\delta \in D^*} r(\tau,\delta)$ is not attained by any rule, then the statistician must choose a decision rule whose Bayes risk is closest to the minimum value. (These so-called $\epsilon$-Bayes rules will be dealt with in Chapter three.)
Example: Given $\Theta = \mathcal{A} = (0,\infty)$ and $L(\theta,a) = c\,|\theta - a|$, where $c$ is a positive constant, suppose $X$ is distributed as

    $f(x \mid \theta) = 1/\theta$ if $0 < x < \theta$, and $0$ otherwise,

where $\theta$ is the true state of nature. We will find a Bayes rule with respect to the following prior distribution on $\Theta$:

    $g(\theta) = \theta \exp(-\theta)$ if $\theta > 0$, and $0$ otherwise.

Solution:

    $h(x,\theta) = f(x \mid \theta)\, g(\theta) = \exp(-\theta)$ if $0 < x < \theta$, and $0$ otherwise;

    $f(x) = \int_x^\infty \exp(-\theta)\, d\theta = \exp(-x)$ if $x > 0$, and $0$ otherwise;

    $g(\theta \mid x) = h(x,\theta)/f(x) = \exp(x - \theta)$ if $0 < x < \theta$, and $0$ otherwise.
Therefore the Bayes rule $d_0(x)$ minimizes the posterior expected loss $\int_x^\infty c\,|\theta - d|\, \exp(x-\theta)\, d\theta$ over $d$, i.e. $d_0(x)$ is the median of $g(\theta \mid x)$; solving $\int_x^{d} \exp(x-\theta)\, d\theta = 1/2$ gives

    $d_0(x) = x + \log 2$

as the Bayes rule with respect to $g(\theta)$.
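A quick numerical check of this solution (a sketch; the observed value $x$ is arbitrary) confirms that the posterior distribution function equals $1/2$ at $x + \log 2$:

```python
from math import exp, log

def posterior_cdf(d, x):
    """Distribution function of g(theta | x) = exp(x - theta), theta > x, evaluated at d."""
    return 1.0 - exp(x - d) if d > x else 0.0

x = 1.3                        # arbitrary observed value
d0 = x + log(2)                # claimed Bayes rule (the posterior median)
print(posterior_cdf(d0, x))    # 0.5
```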
(ii) Minimax decision rules:
the minimax principle orders decision rules according to the worst that could happen to the statistician, i.e. a decision rule $\delta_1$ is preferred to a rule $\delta_2$ if $\sup_\theta R(\theta,\delta_1) < \sup_\theta R(\theta,\delta_2)$. This procedure effectively orders the space $D^*$ of decision rules linearly. Hence a decision rule $\delta_0$ is minimax if

(1.2)    $\sup_{\theta} R(\theta,\delta_0) = \inf_{\delta \in D^*} \sup_{\theta} R(\theta,\delta)$,

i.e. $\delta_0$ is the "best" in the given order.

Note that there may not be a minimax decision rule (even if the value on the right-hand side of (1.2) is finite), whereupon the statistician must choose a decision rule whose maximum risk is close to the right-hand side of (1.2), i.e. close to the so-called "minimax" value. (These $\epsilon$-minimax rules, as they are called, will be dealt with in chapter three.)
Example: Consider the following decision problem: $\Theta = \{\theta_1, \theta_2\}$, $\mathcal{A} = \{a_1, a_2\}$, and the loss $L(\theta,a)$ is given by the following table:

                  $a_1$    $a_2$
    $\theta_1$     -2        3
    $\theta_2$      3       -4

A randomized strategy $\delta \in \mathcal{A}^*$ may be represented as a number $q$, $0 \le q \le 1$, such that $a_1$ is taken with probability $q$ and $a_2$ with probability $1-q$. Hence the risk set for the problem is

    $S = \{ (q L(\theta_1,a_1) + (1-q) L(\theta_1,a_2),\; q L(\theta_2,a_1) + (1-q) L(\theta_2,a_2)) : 0 \le q \le 1 \}$
      $= \{ (-2q + 3 - 3q,\; 3q - 4 + 4q) : 0 \le q \le 1 \}$
      $= \{ (3 - 5q,\; -4 + 7q) : 0 \le q \le 1 \}$.

Now, for a rule to be minimax, we must have that

    $\max_{\theta \in \Theta} R(\theta,\delta_0) = \min_{\delta \in D^*} \max_{\theta \in \Theta} R(\theta,\delta)$,

i.e. we must find $q$ such that

    $\max(3-5q,\; -4+7q) = \min_{0 \le q \le 1} \max(3-5q,\; -4+7q)$.

But this only occurs for $3-5q = -4+7q$, or when $q = 7/12$. Hence the minimax strategy for this problem is to choose action $a_1$ with probability $7/12$, and to choose action $a_2$ with probability $5/12$.
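A numerical sketch of this example (a crude grid search rather than the algebra above) recovers the same mixing probability $q = 7/12 \approx 0.5833$ and the minimax value $1/12$:

```python
# Risk coordinates of the mixed strategy q: R(theta1,q) = 3 - 5q, R(theta2,q) = -4 + 7q.
def worst_risk(q):
    return max(3 - 5*q, -4 + 7*q)

# Grid search for the minimax mixing probability on [0, 1].
qs = [i / 10000 for i in range(10001)]
q_star = min(qs, key=worst_risk)
print(q_star, worst_risk(q_star))   # ~0.5833 (= 7/12), value ~0.0833 (= 1/12)
```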
CHAPTER II

Admissibility and Completeness

This chapter will deal with the development of some general decision theory results arising from the interactions between the parameter space (states of nature), the space of actions of the statistician, and the loss or risk function. We will introduce two concepts, namely admissibility and completeness, and the purpose of this chapter is to define and study these notions as well as to discuss the relations between them and the situations in which they arise.

The following definitions are fundamental to this chapter.

Definition 2.1. A decision rule $\delta_1$ is said to be as good as a rule $\delta_2$ if

    $R(\theta,\delta_1) \le R(\theta,\delta_2)$ for all $\theta \in \Theta$.

A decision rule $\delta_1$ is said to be better than a rule $\delta_2$ if

    $R(\theta,\delta_1) \le R(\theta,\delta_2)$ for all $\theta \in \Theta$, and
    $R(\theta,\delta_1) < R(\theta,\delta_2)$ for some $\theta \in \Theta$.

A decision rule $\delta_1$ is said to be equivalent to a rule $\delta_2$ if

    $R(\theta,\delta_1) = R(\theta,\delta_2)$ for all $\theta \in \Theta$.
Note that this natural ordering of the space $D^*$ of decision rules is in fact a partial ordering.

Definition 2.2. Admissible Decision Rule: A rule $\delta$ is said to be admissible if there exists no rule better than $\delta$; i.e. if there is no $\delta' \in D^*$ with

    $R(\theta,\delta') \le R(\theta,\delta)$ for all $\theta \in \Theta$ and
    $R(\theta,\delta') < R(\theta,\delta)$ for some $\theta \in \Theta$.

This of course implies that such a strategy $\delta$ is certainly admissible but not necessarily preferred, since other admissible strategies will be competing for attention. Hence the name admissible. If a rule is not admissible, then it is inadmissible. Note that in a given decision problem every rule may be inadmissible, there being for each rule some other rule better than it. (We will see later, for example, that this is so when the set $S$ of risks doesn't contain its lower boundary points.)
Definition 2.3. A class $C \subset D^*$ of decision rules is said to be complete with respect to the class $D^*$ of decision rules if for all $\delta \in D^*$ such that $\delta$ does not belong to $C$, there exists a rule $\delta_0 \in C$ that is better than $\delta$ (see Definition 2.1).

A class $C$ of rules is said to be essentially complete with respect to the class $D^*$ if for all $\delta \in D^*$ there exists a $\delta_0 \in C$ that is as good as $\delta$ (see Definition 2.1).

If the reference class is $D^*$, the class of all (randomized and non-randomized) decision rules, then we simply say that $C$ is (essentially) complete for the given problem.
Note that from the risk function point of view, one loses nothing by restricting consideration to an (essentially) complete class of decision rules.

We observe the following immediate results of the above definitions:

Lemma 2.1. A complete class will necessarily contain all admissible strategies; i.e. if $C$ is a complete class and $A$ is the class of all admissible rules, then $A \subset C$.

Proof. If $\delta \in A$ then there is no rule better than $\delta$; but if $\delta \notin C$ then, since $C$ is complete, there is a $\delta_0 \in C$ better than $\delta$, a contradiction. Hence $\delta \in C$, i.e. $A \subset C$.

Example: Consider the following problem, where only non-randomized decision rules are considered:

    Loss          $a_1$    $a_2$
    $\theta_1$      0        1
    $\theta_2$      1        2

Then $a_1$ is better than $a_2$; hence $A = \{a_1\}$ is the admissible class.
Note that it is not true, however, that a complete class can always be restricted to only admissible strategies, since if $\delta$ is inadmissible, it doesn't follow that any of the rules that are better than it will necessarily belong to the admissible class.

Lemma 2.2. If $C$ is an essentially complete class and there exists an admissible decision rule $\delta$ not in $C$, then there exists a $\delta' \in C$ which is equivalent to $\delta$.
Proof. (i) Since $\delta \notin C$ and $C$ is essentially complete, there exists a $\delta_0 \in C$ such that

    $R(\theta,\delta_0) \le R(\theta,\delta)$ for all $\theta \in \Theta$.

(ii) Since $\delta$ is admissible, no other decision rule is better than $\delta$; i.e. if $\delta' \in C$ then either

    (1) $R(\theta,\delta') = R(\theta,\delta)$ for all $\theta \in \Theta$, or
    (2) $R(\theta,\delta') > R(\theta,\delta)$ for some $\theta \in \Theta$.

But for $\delta' = \delta_0$, (2) is impossible by (i); therefore

    $R(\theta,\delta_0) = R(\theta,\delta)$ for all $\theta \in \Theta$, i.e. $\delta_0$ is equivalent to $\delta$.

Note that for any decision problem, a complete class of strategies exists, namely the class of all strategies. Also, every complete class is of course essentially complete by definition.
Definition 2.4. A class $C$ of decision rules is said to be minimal complete if $C$ is complete and if no proper subclass of $C$ is complete.

A class $C$ of decision rules is said to be minimal essentially complete if $C$ is essentially complete and if no proper subclass of $C$ is essentially complete.

Example. Let $D^* = C = \{\delta_1, \delta_2, \delta_3\}$; then $C$ is complete. If also $\delta_1$ is better than both $\delta_2$ and $\delta_3$, then $K = \{\delta_1\}$ is a complete proper subclass of $C$.

We now proceed to offer several results that give relationships among the above ideas. We start with the relationship between admissible decision rules and minimal complete classes.

Theorem 2.1. (a) If a minimal complete class exists, it consists of exactly the admissible rules.

(b) If the class of admissible decision rules is complete, it is minimal complete.

Proof. (a) Let $C$ denote a minimal complete class and let $A$ denote the class of all admissible rules. Lemma 2.1 implies $A \subset C$, since a minimal complete class is always complete. It remains, then, to show $C \subset A$. We assume $C \not\subset A$ and evolve a contradiction. Let $\delta_0 \in C$ and suppose $\delta_0 \notin A$. We shall show there exists a
$\delta_1 \in C$ which is better than $\delta_0$, giving the desired contradiction. Since $\delta_0$ is inadmissible, there is a $\delta$ that is better than $\delta_0$. If $\delta \in C$ we may take $\delta_1 = \delta$. If $\delta \notin C$, then since $C$ is complete there exists a $\delta_1 \in C$ that is better than $\delta$, and therefore better than $\delta_0$ also. In either case we have shown there exists a $\delta_1 \in C$ that is better than $\delta_0$.

Now let $C_1 = C - \{\delta_0\}$, and let $\delta$ be an arbitrary rule not belonging to $C_1$. If $\delta = \delta_0$, then $\delta_1 \in C_1$ is better than $\delta$. If $\delta \ne \delta_0$, then $\delta \notin C$, and since $C$ is complete there exists a rule in $C$ better than $\delta$; if that rule is $\delta_0$ itself, then $\delta_1$, being better than $\delta_0$, is better than $\delta$ also. In either case there exists a rule in $C_1$ better than $\delta$, and hence $C_1$ is complete. But by construction $C_1 \subsetneq C$, and we have a contradiction to the minimal completeness of $C$. Hence $C \subset A$, and therefore $C = A$.

(b) We must show that no proper subclass of the complete class $A$ is complete. Let $A'$ be any proper subclass of $A$; assume $A'$ is complete and we will show a contradiction to the admissibility of $A$. Since $A'$ is complete, for any $\delta \in A - A'$ (so that $\delta \notin A'$) there exists a $\delta_1 \in A'$ such that $\delta_1$ is better than $\delta$. But $\delta \in A$ is admissible, so there is no rule better than $\delta$. Hence we have a contradiction: $A'$ is not complete, i.e. no proper subclass of $A$ is complete, and $A$ is minimal complete.

Theorem 2.2. A class $\mathcal{R}$ of decision rules which is closed under equivalence is complete if and only if it is essentially complete.
Proof. Only if: It suffices to show that every complete class is essentially complete. This follows since $\mathcal{R}$ complete implies that for all $\delta \notin \mathcal{R}$ there is a $\delta_0 \in \mathcal{R}$ that is better than $\delta$; this means $\delta_0$ is as good as $\delta$, i.e. $\mathcal{R}$ is essentially complete.

If: We wish to show that if $\mathcal{R}$ is essentially complete and closed under equivalence, then $\mathcal{R}$ is complete. $\mathcal{R}$ essentially complete means that if $\delta \notin \mathcal{R}$ (or $\delta \in \mathcal{R}$ in the degenerate case) then there is a $\delta_0 \in \mathcal{R}$ such that $\delta_0$ is as good as $\delta$, i.e.

    $R(\theta,\delta_0) \le R(\theta,\delta)$ for all $\theta \in \Theta$.

$\mathcal{R}$ closed under equivalence means that if $\delta_0 \in \mathcal{R}$ and there exists a $\delta_1$ such that $R(\theta,\delta_1) = R(\theta,\delta_0)$ for all $\theta \in \Theta$, then $\delta_1 \in \mathcal{R}$ also.

Now for $\mathcal{R}$ to be complete we must have: if $\delta \notin \mathcal{R}$, there exists a $\delta_0 \in \mathcal{R}$ such that

    $R(\theta,\delta_0) \le R(\theta,\delta)$ for all $\theta \in \Theta$, and
    $R(\theta,\delta_0) < R(\theta,\delta)$ for some $\theta \in \Theta$.

Hence we shall consider all possibilities:

(i) For $\delta \notin \mathcal{R}$ there exists a $\delta_0 \in \mathcal{R}$ such that

    $R(\theta,\delta_0) \le R(\theta,\delta)$ for all $\theta \in \Theta$ and $R(\theta,\delta_0) < R(\theta,\delta)$ for some $\theta \in \Theta$,

i.e. $\mathcal{R}$ is complete.

(ii) For $\delta \notin \mathcal{R}$ there exists a $\delta_0 \in \mathcal{R}$ such that

    $R(\theta,\delta_0) = R(\theta,\delta)$ for all $\theta \in \Theta$.

But since $\mathcal{R}$ is closed under equivalence, (ii) implies $\delta \in \mathcal{R}$ also, a contradiction; i.e. case (i) is the only possibility, but this implies $\mathcal{R}$ is complete.
Theorem 2.3. The class $A$ of admissible decision rules is closed under equivalence.

Proof. Assume $A$ is not closed under equivalence, i.e. there exist $\delta_0 \in A$ and $\delta \notin A$ such that

    $R(\theta,\delta) = R(\theta,\delta_0)$ for all $\theta \in \Theta$.

But since $\delta_0 \in A$ is admissible, there is no rule better than $\delta_0$; and since $\delta$ is equivalent to $\delta_0$, any rule better than $\delta$ would be better than $\delta_0$. Hence there is no rule better than $\delta$, i.e. $\delta \in A$. But this contradicts our assumption; i.e. if $A$ is the class of admissible rules, it is closed under equivalence.
Theorem 2.4. For each $\delta_0$ belonging to the class $A$ of admissible decision rules, let $S_{\delta_0}$ be the class of decision rules $\delta$ such that

    $R(\theta,\delta) = R(\theta,\delta_0)$ for all $\theta \in \Theta$  (i.e. $\delta \equiv \delta_0$).

Then the sets $S_{\delta_0}$ form a partition of $A$ into equivalence classes. Hence an essentially complete class of decision rules necessarily contains at least one element from each equivalence class of $A$.
Proof. Let $\mathcal{R}$ be an essentially complete class. Assume now that $\mathcal{R}$ does not contain at least one element from each equivalence class of $A$, i.e. there is at least one $\delta \in A$ such that no rule of $\mathcal{R}$ belongs to $S_\delta$. Since $\mathcal{R}$ is essentially complete, there is a $\delta_1 \in \mathcal{R}$ such that

    $R(\theta,\delta_1) \le R(\theta,\delta)$ for all $\theta \in \Theta$.

Now either

    (i) $R(\theta,\delta_1) = R(\theta,\delta)$ for all $\theta \in \Theta$, or
    (ii) $R(\theta,\delta_1) < R(\theta,\delta)$ for some $\theta \in \Theta$.

Here (i) is impossible, since then $\delta_1 \in S_\delta$, which contradicts our assumption; and (ii) is impossible, since $\delta \in A$ is admissible and no rule is better than $\delta$. Hence our initial assumption is wrong and the result is proved.
Theorem 2.5. If the class $A$ of admissible decision rules is complete, then every class containing one and only one element from each of its equivalence classes (see Theorem 2.4), and no other element, is essentially complete.

Proof. Let $\mathcal{R}$ be a class of decision rules containing exactly one element from each of the equivalence classes of $A$. We must show that for any decision rule $\delta$ there is a rule $\delta' \in \mathcal{R}$ that is as good as $\delta$. Since $A$ is complete and, by Theorem 2.3, closed under equivalence, $A$ is essentially complete by Theorem 2.2. Hence there is a $\delta^* \in A$ as good as $\delta$, i.e.

    $R(\theta,\delta^*) \le R(\theta,\delta)$ for all $\theta \in \Theta$.

Now $\delta^*$ belongs to some equivalence class $S_{\delta^*}$ of $A$, and $\mathcal{R}$ contains an element $\delta'$ of $S_{\delta^*}$, for which

    $R(\theta,\delta') = R(\theta,\delta^*) \le R(\theta,\delta)$ for all $\theta \in \Theta$;

i.e. $\delta'$ is as good as $\delta$. This holds whether or not $\delta$ itself belongs to $A$, which means $\mathcal{R}$ is essentially complete. This completes the theorem.
Theorem 2.6. If a minimal essentially complete class exists, it consists of one and only one element from each of the equivalence classes of $A$, and no other elements.

Proof. (i) Let $\mathcal{R}$ be a minimal essentially complete class of decision rules. Being essentially complete, $\mathcal{R}$ must, by Theorem 2.4, contain at least one element from each of the equivalence classes of $A$.

(ii) Assume now that $\mathcal{R}$ contains at least one other element $\delta$ that does not belong to any $S_{\delta_0}$, $\delta_0 \in A$, in addition to at least one element from each $S_{\delta_0}$, $\delta_0 \in A$. This means that (a) $\delta \notin A$ and (b) $\delta \notin S_{\delta_0}$ for any $\delta_0 \in A$. But there is a $\delta_0 \in A$ such that either

    (1) $R(\theta,\delta) = R(\theta,\delta_0)$ for all $\theta \in \Theta$, or
    (2) $R(\theta,\delta) > R(\theta,\delta_0)$ for some $\theta \in \Theta$, with $R(\theta,\delta) \ge R(\theta,\delta_0)$ for all $\theta$,

and (b) implies that (1) is impossible; hence (2) must hold, i.e. $\delta_0$ (and therefore also the representative of $S_{\delta_0}$ already in $\mathcal{R}$) is as good as $\delta$. But then $\mathcal{R} - \{\delta\}$ is still essentially complete, which contradicts the minimality of $\mathcal{R}$; hence the initial assumption of (ii) is wrong, i.e. there are no elements of $\mathcal{R}$ that do not belong to any $S_{\delta_0}$.

(iii) We have now shown that for $\mathcal{R}$ to be minimal essentially complete it must contain at least one element from each of the equivalence classes of $A$, but no other rules (i.e. no rules not belonging to any of the equivalence classes). Now let $\mathcal{R}$ be minimal essentially complete, and assume that there is at least one $\delta \in \mathcal{R}$ belonging to some $S_{\delta_0}$, $\delta_0 \in A$, that is already occupied by another member of $\mathcal{R}$; i.e. there are two rules belonging to $\mathcal{R}$ from at least one $S_{\delta_0}$. Now this fact does not contradict the essential completeness of $\mathcal{R}$, since it simply implies that the other rule $\delta' \in \mathcal{R}$ belonging to $S_{\delta_0}$ is such that

    $R(\theta,\delta') = R(\theta,\delta)$ for all $\theta \in \Theta$.

However, this does contradict the minimality requirement on $\mathcal{R}$, since we observe that the class of decision rules containing only one $\delta \in \mathcal{R}$ from each $S_{\delta_0}$, $\delta_0 \in A$, will also be essentially complete but will be a proper subset of our just assumed class $\mathcal{R}$ of (iii). Moreover, the class consisting of one and only one rule from each equivalence class of $A$, and no other(s), will certainly be the smallest (i.e. minimal) essentially complete class.
Before proceeding to the next result we must introduce some basic ideas.

Definition 2.5. The risk set $S$ for a decision problem is defined as follows for finite $\Theta = \{\theta_1, \ldots, \theta_n\}$:

    $S = \{ y = (y_1, \ldots, y_n) : \text{for some } \delta \in D^*, \; y_j = R(\theta_j,\delta) \text{ for } j = 1, \ldots, n \}$.

Definition 2.6. A set $S$ in $n$-dimensional Euclidean space $E_n$ is said to be bounded from below if there exists a finite number $M$ such that for all $y \in S$,

    $y_j > -M$ for $j = 1, \ldots, n$.

Definition 2.7. Let $x$ be a point in $E_n$ space. The lower quantant at $x$, denoted by $Q_x$, is defined by the following set:

    $Q_x = \{ y \in E_n : y_j \le x_j \text{ for } j = 1, \ldots, n \}$.

Thus $Q_x$ is the set of risk points as good as $x$, and $Q_x - \{x\}$ is the set of risk points better than $x$.
Definition 2.8. We say $\bar{S}$ is the closure of the set $S$ if

    $\bar{S} = S \cup \{\text{all limit points of } S\}$.

Definition 2.9. A subset $S$ of $E_n$ space is said to be convex if, whenever $y = (y_1,\ldots,y_n)$ and $y' = (y_1',\ldots,y_n')$ are elements of $S$, the points

    $\alpha y + (1-\alpha)y' = (\alpha y_1 + (1-\alpha)y_1', \ldots, \alpha y_n + (1-\alpha)y_n')$

are elements of $S$ for $0 \le \alpha \le 1$.

Definition 2.10. A point $x$ is said to be a lower boundary point of a convex set $S$ if $Q_x \cap \bar{S} = \{x\}$. The set of lower boundary points of a convex set $S$ is denoted by $\lambda(S)$.

Example: The lower boundary of the unit square

    $S = \{ (y_1,y_2) : 0 \le y_1 \le 1, \; 0 \le y_2 \le 1 \}$

is the set consisting of one point, $\lambda(S) = \{(0,0)\}$.

Definition 2.11. A convex set $S$ is closed from below if $\lambda(S) \subset S$.
Example: (A sketch of a convex set closed from below appeared here in the original.)

Theorem 2.7. If $\Theta$ is finite and the risk set $S \subset E_n$ is closed and closed from below, then, writing $x_0 = (R(\theta_1,\delta_0), \ldots, R(\theta_n,\delta_0))$ for the risk point of a rule $\delta_0$:

(i) if $x_0 \in \lambda(S)$, then $\delta_0$ is an admissible decision rule;

(ii) if $\delta_0$ is admissible, then $x_0 \in \lambda(S)$.

Proof. (i) Assume that $\delta_0$ is not admissible, i.e. there is a $\delta$ that is better than $\delta_0$. Then the risk point $x_1 = (R(\theta_1,\delta), \ldots, R(\theta_n,\delta))$ satisfies $x_1 \in Q_{x_0}$ with $x_1 \ne x_0$, by Definition 2.7. Hence $Q_{x_0} \cap \bar{S}$ consists of $x_0$ and $x_1$ at least, i.e. $Q_{x_0} \cap \bar{S} \supset \{x_0\} \cup \{x_1\}$. But by Definition 2.10 this means $x_0 \notin \lambda(S)$, a contradiction; i.e. $\delta_0$ is admissible.

(ii) $\delta_0$ admissible implies that for any other rule $\delta$, with risk point $x_1$, we have either

    (1) $R(\theta,\delta) = R(\theta,\delta_0)$ for all $\theta \in \Theta$, or
    (2) $R(\theta,\delta) > R(\theta,\delta_0)$ for some $\theta \in \Theta$.

If (1) holds, then $x_1 = x_0$; if (2) holds, then $x_1 \notin Q_{x_0}$, by Definition 2.7. Hence $Q_{x_0} \cap \bar{S} = \{x_0\}$ (using the closedness of $S$), and then $x_0 \in \lambda(S)$ by Definition 2.10.

Example. Consider the problem of the example following Lemma 2.1, in which action $a_1$ has risk point $x_0 = (0,1)$ and action $a_2$ has risk point $(1,2)$. Then $Q_{x_0} \cap \bar{S} = \{x_0\}$, i.e. $x_0 \in \lambda(S)$; and certainly $a_2$ is not better than $a_1$, i.e. $a_1$ is admissible.
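As a sketch of Theorem 2.7's criterion on a finite risk set (the third risk point is hypothetical, added only for illustration), a risk point is admissible exactly when no other point lies in its lower quantant:

```python
# Hypothetical finite risk set: (R(theta1, delta), R(theta2, delta)) per rule.
risk_points = {"a1": (0, 1), "a2": (1, 2), "a3": (2, 0)}

def better(y, x):
    """y is better than x: <= in every coordinate and < in at least one."""
    return (all(yi <= xi for yi, xi in zip(y, x))
            and any(yi < xi for yi, xi in zip(y, x)))

def admissible(name):
    """No other risk point lies in the lower quantant of this one (minus itself)."""
    x = risk_points[name]
    return not any(better(y, x) for other, y in risk_points.items() if other != name)

for name in risk_points:
    print(name, admissible(name))   # a1: True, a2: False, a3: True
```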
We now proceed to give some results for finite $\Theta$ with certain assumptions on the risk set $S$ (i.e. we subsequently assume $S$ to be both bounded from below and closed from below). Before presenting these results, we accordingly investigate conditions placed on the action space $\mathcal{A}$, the loss function $L$, and the random variable $X$ to this end.

Consider the decision problem with $\Theta = \{\theta_1, \ldots, \theta_n\}$ finite and $L$ bounded, and allow fixed sample-size experimentation on which to base a non-randomized decision rule $d \in D$. Let

    $W = \{ (L(\theta_1,a), \ldots, L(\theta_n,a)) : a \in \mathcal{A} \}$.

Then $W$ is a subset of $E_n$, where $\mathcal{X} = (X, \mathcal{B}, P_\theta)$ is the sample space, $P_\theta$ is the probability function of $X$ given $\theta$, and $L(\theta,a_j)$ is the loss when $\theta$ is the true state of nature and action $a_j$ is taken by the statistician. For fixed $\theta \in \Theta$, let the probability distribution on $X$ be $p(x;\theta)$. Hence the risk function based on a non-randomized decision rule $d \in D$ is given by:

    $R(\theta,d) = \sum_{x \in \mathcal{X}} L(\theta,d(x))\, p(x;\theta)$.
Theorem 2.8. Consider the risk set

    $S = \{ y = (y_1,\ldots,y_n) : y_j = R(\theta_j,d), \; j = 1,\ldots,n, \text{ for some } d \in D \}$.

Then if $W$ (see above) is closed and bounded, so is $S$.

Proof: We give the proof for the discrete case. Since $W$ is bounded, $L(\theta_i,a)$ is bounded for all $\theta_i \in \Theta$ and all $a \in \mathcal{A}$; hence $L(\theta_i,d(x))$ is bounded for all $\theta_i \in \Theta$ and $x \in \mathcal{X}$. Since $p(x;\theta)$ is a probability function,

    $R(\theta_i,d) = \sum_{x \in \mathcal{X}} L(\theta_i,d(x))\, p(x;\theta_i)$

is bounded for all $i = 1, \ldots, n$, i.e. $S$ is bounded. It remains to show that $S$ is closed.
Now since $\Theta$ is finite, we may assume, without loss of generality, that $\mathcal{X}$ consists of a countable number of elements $x_1, x_2, \ldots$. Let, for all $d \in D$,

    $a_{k\theta_j} = L(\theta_j, d(x_k))$,  $k = 1,2,\ldots$ and $j = 1,\ldots,n$;

i.e. we may write a point $s(d) \in S$ as

    $s(d) = \left( \sum_k a_{k\theta_1}\, p(x_k;\theta_1), \ldots, \sum_k a_{k\theta_n}\, p(x_k;\theta_n) \right)$.

Hence we may consider a decision rule $d \in D$ to be a specification of a sequence of vectors

    $(a_{k\theta_1}, \ldots, a_{k\theta_n})$,  $k = 1,2,\ldots$,

belonging to $W$ and such that if $x_k$ is observed, the $k$th vector is chosen from the sequence and the loss is $a_{k\theta_j}$ if $\theta_j$ is the true state of nature.

Now, recalling that $S \subset E_n$ and therefore a point in $S$ has $n$ coordinates, consider a convergent sequence of points in $S$. We shall represent the sequence as

    $R(\theta,d_m) = \left( \sum_k a^{(m)}_{k\theta_1}\, p(x_k;\theta_1), \ldots, \sum_k a^{(m)}_{k\theta_n}\, p(x_k;\theta_n) \right)$,  $m = 1,2,\ldots$.

If we show that the limit of $\{R(\theta,d_m)\}$ belongs to $S$, we will have that $S$ is closed. Consider, then, the sequence $\{d_m\}$, each $d_m$ specifying a sequence of vectors belonging to $W$.
Now, by the Cantor diagonal method we may select a subsequence $\{d_n'\}$ of $\{d_m\}$ whose $k$th vectors converge, for each $k$, to some vector as $n \to \infty$; and since $W$ is closed, each limit vector belongs to $W$. We now must show that the corresponding subsequence $\{R'(\theta,d_n)\}$ of $\{R(\theta,d_m)\}$ converges to $R(\theta,d^*)$, where $d^*$ is the specification of the sequence of limit vectors, i.e.

    $d^* = \left( (a^*_{k\theta_1}, \ldots, a^*_{k\theta_n}),\; k = 1,2,\ldots \right)$.

To show this we use the following lemma:

Lemma 2.3. Let $\mathcal{X} = (X, \mathcal{B}, P)$ be any sample space and suppose that for $x \in X$, $p(x;\theta_n) \to p(x;\theta)$ as $n \to \infty$, where $\theta$ is some element of $\Theta$; and suppose $f_n(x) \to f(x)$ as $n \to \infty$ with $|f_n(x)| < M$ for all $n$. Then

    $\sum_{x} f_n(x)\, p(x;\theta_n) \to \sum_{x} f(x)\, p(x;\theta)$ as $n \to \infty$.

Now, in our theorem, since

    $R'(\theta,d_n) = \left( \sum_k a^{(n)}_{k\theta_1}\, p(x_k;\theta_1), \ldots, \sum_k a^{(n)}_{k\theta_n}\, p(x_k;\theta_n) \right)$,

putting $f_n(x_k) = a^{(n)}_{k\theta_j}$ in Lemma 2.3, coordinate by coordinate, we get the convergence of $\{R'(\theta,d_n)\}$ to $R(\theta,d^*)$. Hence our original sequence $\{R(\theta,d_m)\}$ of points in $S$ converges to a point of $S$, i.e. $S$ is also closed.
Note that it can be shown that this theorem holds not only for discrete probability distributions but also for densities. In fact it can be shown that, in the case of densities and under the conditions of Theorem 2.8, not only is $S$ closed and bounded, but also convex. The implications of this will be seen in subsequent results dealing with the essential completeness of the class of non-randomized decision rules.

Note also that the restrictive conditions of the above theorem (i.e. $W$ closed and bounded) hold in many cases. For example, the boundedness of the loss function $L$ (which is usually assumed in any case) implies the boundedness of $W$ from the definition of $W$. The closure of $W$ is provided for, for example, if the action space is finite, or is a bounded closed subset of $E_n$ and $L(\theta,a)$ is a continuous function of $a$ for each $\theta$ (which is also usually true in any case).
Theorem 2.9. If the action space $\mathcal{A}$ is finite and the random variable $X$ assumes only a finite number of values, then $S_0$, the risk set of all non-randomized decision rules, is compact.

Proof. Since $\mathcal{A}$ is finite, then by definition, so is $D$, the set of all non-randomized decision rules. Hence $S_0$ consists of a finite number of points. This immediately implies the closure of $S_0$. Also, provided the expected values of the loss function are finite for all $\theta$ and $d \in D$, each coordinate of each point of $S_0$ will be bounded, and hence $S_0$ is bounded. Hence $S_0$ is closed and bounded and therefore compact.
Definition 2.12. The convex hull of a set $S_0$ is the smallest convex set containing $S_0$, i.e. the intersection of all convex sets containing $S_0$.

Lemma 2.4. The convex hull of a subset $S_0$ of $E_n$ space is the set of all convex linear combinations of at most $n+1$ points of $S_0$, i.e.

    $\left\{ z : z = \sum_{i=1}^{n+1} \lambda_i y_i, \; y_i \in S_0, \; \lambda_i \ge 0, \; \sum_{i=1}^{n+1} \lambda_i = 1 \right\}$.

Lemma 2.5. If $S$ is a convex subset of $E_n$ and $Z$ is an $n$-dimensional random vector for which $P(Z \in S) = 1$ and for which $EZ$ exists and is finite, then $EZ \in S$.

Theorem 2.10. If $S_0$ is compact (closed and bounded), then so is $S$.

Proof. We first show that the convex hull of the compact set $S_0$ is itself compact. Define

    $g(\lambda_1, \ldots, \lambda_{n+1}, y_1, \ldots, y_{n+1}) = \sum_{i=1}^{n+1} \lambda_i y_i$

on the compact set

    $\left\{ (\lambda_1, \ldots, \lambda_{n+1}) : \lambda_i \ge 0, \; \sum_{i=1}^{n+1} \lambda_i = 1 \right\} \times S_0 \times \cdots \times S_0$.

Now $g$ is a continuous function defined on a compact set, hence its image will be compact. But by Lemma 2.4 this image is simply the convex hull of $S_0$. Hence we have that the convex hull of $S_0$ is itself compact. Now we proceed to show that $S$ is in fact the convex hull of $S_0$, and this will give us our desired result.
We show (1) that the convex hull of $S_0$ is contained in $S$, and (2) that $S$ is contained in any convex set $C$ containing $S_0$; (1) and (2) imply our desired result, for then $S$ is exactly the convex hull of $S_0$.

(1) We first show $S$ is convex. Let $y_1$ and $y_2$ be arbitrary points of $S$. Then by Definition 2.5 there exist $\delta_1$ and $\delta_2$ in $D^*$ such that $y_i = (R(\theta_1,\delta_i), \ldots, R(\theta_n,\delta_i))$ for $i = 1,2$. Let, now, $c$ be an arbitrary number between $0$ and $1$, and consider the decision rule $\delta_c \in D^*$ which chooses a non-randomized decision rule in $D$ according to the distribution mixing $\delta_1$ with probability $c$ and $\delta_2$ with probability $1-c$. Then the risk point of $\delta_c$ is the point whose $j$th coordinate is

    $R(\theta_j,\delta_c) = c\,R(\theta_j,\delta_1) + (1-c)\,R(\theta_j,\delta_2)$,  $j = 1, \ldots, n$,

namely $c\,y_1 + (1-c)\,y_2$, and this point must belong to $S$; i.e. $S$ is convex. Clearly also $S_0 \subset S$, since $D \subset D^*$. But this implies that the convex hull of $S_0$ must belong to $S$, since the convex hull is by definition the intersection of all convex sets containing $S_0$, and $S$ is one such set; i.e., letting $S_1$ denote the convex hull of $S_0$, we have $S_1 \subset S$.

(2) Let $C$ be any convex set containing $S_0$; we wish to show $S \subset C$. Let $Z$ be a vector belonging to $S$. Then there is a $\delta \in D^*$ such that $Z = (R(\theta_1,\delta), \ldots, R(\theta_n,\delta))$; $Z$, then, is the expectation of a random vector with values in $S_0$, namely the risk point of the non-randomized rule selected at random according to $\delta$. This expectation is finite, since $S_0$ is bounded; hence, since $C$ is convex and the random vector takes its values in $S_0 \subset C$, it follows from Lemma 2.5 that $Z \in C$. Hence $S \subset C$, and in particular $S \subset S_1$.

Hence we have shown that $S_1 \subset S$ and $S \subset S_1$, i.e. $S = S_1$; and since $S_1$, the convex hull of the compact set $S_0$, is itself compact, $S$ is compact.

Theorem 2.11. If $D' \subset D^*$ is essentially complete and its risk set $S'$ is closed, then $S$ is closed from below.
Proof. Let $Z \in \lambda(S)$; we must show $Z \in S$. Since $Z \in \lambda(S)$ we have $Q_Z \cap \bar{S} = \{Z\}$, and in particular $Z \in \bar{S}$, so there is a sequence of rules $\delta_m \in D^*$ whose risk points $Z_m$ converge to $Z$. Since $D'$ is essentially complete, for each $\delta_m$ there exists a rule $\delta_m' \in D'$ that is as good as $\delta_m$, i.e. with

    $R(\theta,\delta_m') \le R(\theta,\delta_m)$ for all $\theta \in \Theta$,

so that $\delta_m'$ yields a risk point $Z_m' \in Q_{Z_m} \cap S'$. Recalling that $S$ is bounded from below, every limit point $Y$ of the sequence $\{Z_m'\}$ satisfies $Y \in \bar{S}$ and, since $Z_m' \le Z_m$ coordinatewise with $Z_m \to Z$, also $Y \in Q_Z$; hence $Y = Z$, since $Q_Z \cap \bar{S} = \{Z\}$. Thus $Z_m' \to Z$, and since $S'$ is closed, $Z \in S' \subset S$. We have shown $\lambda(S) \subset S$, i.e. $S$ is closed from below by Definition 2.11.
Corollary 2.11.1. If the non-randomized risk set $S_0$ is closed and $D$, the set of all non-randomized rules, is essentially complete, then the risk set $S$ is closed from below.

We will now offer the heretofore mentioned results for finite $\Theta$ with certain assumptions on the risk set $S$, having given rather general situations which lead to the desired assumptions on $S$. We now prove a lemma which we will need in the following theorem.
Lemma 2.6. If a non-empty convex set $S$ is bounded from below, then $\lambda(S)$ is not empty.

Proof. $S$ bounded from below implies there is an $M < \infty$ such that $y_j > -M$ for all $y \in S$ (Definition 2.6). Now since $S$ is bounded from below and non-empty, a greatest lower bound point $y_0$ of $S$ exists. Consider a sequence of points $\{y^{(n)}\} \subset S$ such that $y^{(n)} \to y_0$. Clearly $y_0$ is finite (since $S$ is bounded from below), and since $y_0$ is a limit point of $S$, $y_0 \in \bar{S}$ and $\{y_0\} \subset Q_{y_0} \cap \bar{S}$. Also, if $y_1$ were any point of $Q_{y_0} \cap \bar{S}$ other than $y_0$ itself, then $y_1 \le y_0$ with $y_1 \ne y_0$; but then $y_1$ would be a limit point of $S$ lying below $y_0$, which is impossible, since $y_0$ is the greatest lower bound of $S$ and therefore there is no sequence of points belonging to $S$ that converges to such a $y_1$. Hence we have $Q_{y_0} \cap \bar{S} = \{y_0\}$, i.e. $y_0 \in \lambda(S)$, and $\lambda(S)$ is not empty.
Theorem 2.12. The class of decision rules corresponding to points in the lower boundary of the risk set $S$ is a minimal complete class; i.e. suppose that $\Theta = \{\theta_1, \ldots, \theta_n\}$ and that $S$ is bounded from below and closed from below, and let

    $D_0 = \{ \delta \in D^* : (R(\theta_1,\delta), \ldots, R(\theta_n,\delta)) \in \lambda(S) \}$.

Then $D_0$ is a minimal complete class of decision rules for the given decision problem.

Proof. We first show that $D_0$ is a complete class. Let $\delta$ be any rule not in $D_0$ and let $y = (R(\theta_1,\delta), \ldots, R(\theta_n,\delta))$; then $y \in S$ but $y \notin \lambda(S)$, by the definition of $D_0$. Let $S_1 = Q_y \cap \bar{S}$. Then $S_1$ is non-empty, bounded from below (since $S$ is bounded from below) and convex (since the closure of a convex set is convex and the intersection of two convex sets is convex). Hence, applying Lemma 2.6, we observe that there exists a $y_1 \in \lambda(S_1)$, and $y_1 \ne y$ since $y \notin \lambda(S)$. Moreover $y_1 \in \lambda(S)$ also, since $Q_{y_1} \subset Q_y$ gives

    $Q_{y_1} \cap \bar{S} = Q_{y_1} \cap S_1 = \{y_1\}$.

And now, since $S$ is closed from below (i.e. $\lambda(S) \subset S$), there exists a $\delta_0 \in D_0$ such that $y_1 = (R(\theta_1,\delta_0), \ldots, R(\theta_n,\delta_0))$; and since $y_1 \in Q_y$ with $y_1 \ne y$,

    $R(\theta_j,\delta_0) \le R(\theta_j,\delta)$ for all $j = 1, \ldots, n$, and
    $R(\theta_j,\delta_0) < R(\theta_j,\delta)$ for some $j$,

i.e. $\delta_0$ is better than $\delta$, and $D_0$ is complete.
Now by Theorem 2.7 we observe that every rule $\delta_0 \in D_0$ is admissible. Hence no proper subclass of $D_0$ could be complete, since by Lemma 2.1 every complete class must contain all admissible decision rules; i.e. $D_0$ is a minimal complete class.

Corollary 2.12.1. If $S$ is compact in Theorem 2.12, then $D_0$ is a minimal complete class.

Proof. $S$ compact implies $S$ closed and bounded, which means $S$ is both bounded from below and closed from below; then Theorem 2.12 gives the result.

Corollary 2.12.2. The class $D_0$ of Theorem 2.12 consists exactly of the admissible decision rules.

Proof. The result is immediate from Theorem 2.1.
We now give some results that deal with the valid elimination of randomized decision rules in statistical decision problems. The following theorem gives an interesting and useful result dealing with the essential completeness of the non-randomized decision rules when the loss function is convex.

Definition 2.13. Consider a real-valued function $f(X)$, where $X$ is a column vector, defined on a convex subset $S$ of $E_n$. Then $f$ is said to be convex if for $X_1 \in S$, $X_2 \in S$, and $0 < \alpha < 1$, we have the following relationship:

    $f(\alpha X_1 + (1-\alpha) X_2) \le \alpha f(X_1) + (1-\alpha) f(X_2)$.
Lemma 2.7 (Jensen's Inequality). If $f(X)$ is a convex real-valued function on a non-empty convex subset $S \subset E_n$, and if $Z$ is an $n$-dimensional random vector with finite expectation $EZ = (EZ_1, \ldots, EZ_n)$ such that $P(Z \in S) = 1$, then

    (1) $EZ \in S$, and
    (2) $f(EZ) \le E\,f(Z)$.
Theorem 2.13. Let $\mathcal{A}$ be a convex subset of $E_n$, and let $L(\theta,a)$ be a convex function of $a$ for all $\theta \in \Theta$. Suppose that for some $\theta' \in \Theta$ there exist an $\epsilon > 0$ and a $c$ such that

    $L(\theta',a) \ge \epsilon\,|a| + c$ for all $a \in \mathcal{A}$,

where $|a|$ is the length of the column vector $a$. Then for all $P \in \mathcal{A}^*$ there exists an $a_0 \in \mathcal{A}$ such that

    $L(\theta,a_0) \le L(\theta,P)$ for all $\theta \in \Theta$,

where $L(\theta,P) = E\,L(\theta,Z)$, $Z$ being a random vector with values in $\mathcal{A}$ whose distribution is given by $P$, and where $P$ is a probability distribution giving mass one to a finite number of points of $\mathcal{A}$.

Proof. Let $Z$ be a random vector with values in $\mathcal{A}$ whose distribution is given by $P$. Since, by the given,

    $L(\theta',P) = E(L(\theta',Z)) \ge E(\epsilon\,|Z| + c) = \epsilon\,E|Z| + c$,

and $L(\theta',P)$ is finite, hence $E|Z| < \infty$, and therefore $E\,Z$ is finite;
i.e. we may now use Lemma 2.7, which here implies $E\,Z \in \mathcal{A}$ and

    $L(\theta, E\,Z) \le E\,L(\theta,Z) = L(\theta,P)$ for all $\theta \in \Theta$;

therefore $L(\theta,a_0) \le L(\theta,P)$ for all $\theta \in \Theta$, where $a_0 = E\,Z \in \mathcal{A}$.

Note that the imposed condition on the loss function is guaranteed in the case of $\mathcal{A}$ being a bounded set. The actual condition on the loss function simply ensures that every element of $\mathcal{A}^*$ (in the case of Theorem 2.13, that is $P$) has a finite expected value.
Hence, we observe that for the statistical problem $(\Theta, \mathcal{A}, R)$ with the conditions of Theorem 2.13, for every element of $\mathcal{A}^*$ there exists an element of $\mathcal{A}$ with no larger risk. This implies the elements of $D$ (i.e. the class of all non-randomized rules) are always as good as the elements of $D^*$. Hence the non-randomized rules form an essentially complete class.
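A numerical sketch of the mechanism behind Theorem 2.13 (hypothetical numbers; squared-error loss, which is convex): replacing a randomized action by its expectation never increases the loss, by Jensen's inequality.

```python
theta = 2.0
loss = lambda a: (theta - a) ** 2       # a convex loss L(theta, a)

# A randomized action P: mass on a few points of the action space (hypothetical).
actions = [0.0, 1.0, 4.0]
probs = [0.2, 0.5, 0.3]

expected_loss = sum(p * loss(a) for p, a in zip(probs, actions))   # L(theta, P)
mean_action = sum(p * a for p, a in zip(probs, actions))           # a0 = E Z

# Jensen's inequality: L(theta, E Z) <= E L(theta, Z).
print(loss(mean_action), "<=", expected_loss)   # 0.09 <= 2.5
```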
Note that there is a definite restriction in Theorem 2.13, namely the existence of the risks. The restriction that expectations exist is overcome by assumption in the following more restrictive theorem, given here without proof.

Theorem 2.14. If the loss $L(\theta,a)$ is convex, if $\mathcal{A}$ is a subset of $E_n$ space, and if we consider only decision rules yielding finite expectations for the risk, then the class of non-randomized decision rules is essentially complete.

The restriction to decision rules with finite expectations allows the use (as in Theorem 2.13) of Jensen's Inequality (Lemma 2.7) in the proof.

Note also that, by replacing the restriction to rules yielding finite expectation with the requirement that for all $a \in \mathcal{A}$ there exists an $M_a$ such that

    $|a_1 - a| \ge M_a$ implies $L(\theta,a_1) > L(\theta,a)$,

we then have (since $L$ is convex) that the risk associated with $\delta$ is infinite whenever $E(\delta)$ is infinite (and thus the restriction to finitely expected rules is inherent in this new requirement).

A good example of a convex loss function is squared error (i.e. variance).
We now offer a result of A. Lyapunov that is fundamental to the theorems that follow.

Lemma 2.8. Let $Y = \{y\}$ be any space, and let $\mathcal{B} = \{S\}$ be a Borel field of subsets of $Y$. Let $f_k(S)$, for $k = 1, \ldots, q$, be a finite number of real-valued, countably additive set functions defined for all $S \in \mathcal{B}$, and suppose the set functions $f_k$ have no point masses (i.e. if for some $S \in \mathcal{B}$, $f(S) \ne 0$, then there exists an $S' \subset S$, $S' \in \mathcal{B}$, such that $f(S') \ne f(S)$ and $f(S') \ne 0$). Then if we let $\delta_j(y)$, $j = 1, \ldots, m$, be real, non-negative, $\mathcal{B}$-measurable functions satisfying

    $\sum_{j=1}^{m} \delta_j(y) = 1$ for all $y \in Y$,

there is a decomposition of $Y$ into $m$ disjoint subsets $S_1, \ldots, S_m$, all belonging to $\mathcal{B}$ and having the property that

(L 2.8.1)    $\int_Y \delta_j(y)\, df_k(y) = f_k(S_j)$ for $j = 1, \ldots, m$ and $k = 1, \ldots, q$;

and if

    $\delta_j^*(y) = 1$ for all $y \in S_j$ and $\delta_j^*(y) = 0$ for any other $y$,  $j = 1, \ldots, m$,

then (L 2.8.1) can be written as

(L 2.8.2)    $\int_Y \delta_j(y)\, df_k(y) = \int_Y \delta_j^*(y)\, df_k(y)$ for $j = 1, \ldots, m$ and $k = 1, \ldots, q$.
Theorem 2.15. If $\mathcal{A}$ is finite, if $\Theta$ is finite also, and if $P_\theta$, the distribution of the random variable $X$ given $\theta$ as the true state of nature, has no point masses for all $\theta \in \Theta$, then for any rule $\delta$ there exists a non-randomized rule $d$ that is as good as $\delta$; i.e. the class of all non-randomized rules is essentially complete.

Proof. Since $\mathcal{A}$ is finite, so is $D$, the space of all non-randomized rules. Let $\Theta = \{\theta_1, \ldots, \theta_p\}$, let the pure decisions be $d_1, \ldots, d_m$, and suppose $P_{\theta_i}(x)$ has no point masses for all $i$, where $x$ is any observed value of the random variable $X$ such that $x \in \mathcal{X} \subset E_1$ (the real line). Hence we observe that any decision rule $\delta$ may be represented as a vector function that assigns, for every $x$, probabilities to each $d_j$, $j = 1, \ldots, m$; i.e. $\delta_j(x)$ is the probability that the pure decision $d_j$ will be made when $x$ is the observed value of $X$.
-49-
We also have
m
B .(x)
J
>
0 ,
J
j=l
The risk function when
Pe (x)
since we are dealing with probabilities.
B .(x) = 1
Z
9
i
is the true state of nature (and then
X)
is the distribution of
and when the decision rule
B(x)
is
i
èhoSérl, is given by
R(9 i ,B)
= Ee L(9 i ,B)
m
J
Z
=
L(ei ,5.) 5.(x) dP 9 .(x)
J
J
~
X
j=l
Now we are of course able to specify more precisely the nature of
a vector function of a non-randomized decision rule.
0 and 1 for aIl
J
x.
S of ~
We let, for any measurable subset
(2.15.1)
V.. (S)
~J
=
J L(9 i ,d.)
S
Then by the properties of
QD
dP
for
(x)
J
9i
and
P9
i = l, •• ,p
the measures
i
V
ij
and j = l, ••• ,m •
are finite,
countably additive, and have no point masses (Le. if for sorne
Vi/S) i= 0,
then there is an
t:
S' C S
and
V .. (S)
~J
and
V.. (S ')
~J
S'
t:
€
JR. such
0
We may now rewrite our risk as
R(9.~ ,5)
=
is
5.(x)
* (j = l, ••• ,m) can take only
a vector function whose components
the values
* ie. d(x)
B,
m
Z
j=l
J 5. (x)
X
J
dV .. (x) •
.
1J
that ,
S
€
1R ,
-50-
We .now apply lemma 2.8
as
follows:
replace the space Y by ~,
replace the set function
{V
ij :
i
= l, ••• ,p
; j
= l, ••• ,m
}.
exists a non-randomized decision rule
Then it fo110wsthat there
ô * (x)
such that:
J
(2.15.2)
for
X
i
= 1, ••• ,p
and
j
= 1, ••• ,m
•
But this means, in terms of risk, that for every decision ru1e
there is a non-randomized decision ru1e that is at 1east as good as
~he
given one, i.e.
the c1ass of non-randomized decision ru1es is
essentia11y complete.
Theorem 2.16.
Pe (x)
Given
®
=
{9 , ••• ,9 } ,
n
1
is compact, and each e1ement
is such that it has no point masses, then the c1ass of non-
i
randomized decision rules from an essentia11y complete c1ass.
Proof.
Since
D is compact
(because ~ is compact)
then we know that
from any arbitrary open covering we can select a finite subcovering.
Now since there may in fa ct be any number of arbitrary open coverings
for D, we are able to select any number of finite subcoverings.
Rence
we sha11 construct an infini te decreasing sequence of subcoverings. The
construction is as. fol1ows.
(a)
From an arbitrary open covering, select a finite subcovering,
i.e. a finite number of non-empty, measurab1e sets with the properties
1isted be10w.
This will provide the first term of our sequences.
will write this term as:
We
-51-
(c(l), c(2), ••• ,c(k ) )
1
whe!'e k is the number of sets in this sub-
covering and the subscript
1
refers to the first se1ected sub-
covering.
(b)
From another arbitrary open covering, select another fini te
subcovering with the properties 1isted be10w.
second term of our sequence.
where
This will provide the
Write this term as
the subscripts
1,2
(c(1,1),c(1,2), ••• ,c(2,1),
refer to the fact that we
have successive1y se1ected two subcoverings now, or in other words, that
this is the second subcovering se1ected.
(c)
Continue the selection process ad infinum so that the
term of our sequence is:
subscripts
(c(1,1, ••• ,1), ••• ,c(k , ••• ,k2 »
1
1, ••• ,t represent the selection of the
tth
where
tth successive
subcovering.
The properties that must be satisfied by the se1ected subcoverings are:
(i)
Any two sets
C which have the same number of indices, not a11
identica1, are disjoint (note that including "not a11 identical" exc1udes
the trivial case)
(ii)
The sum of a1l sets with the same number of indices is D, i.e. we
specify exactly what each subcovering shou1d consist of in the sense that
it must not simply properly contain D, but must equa1 D.
(iii)
If the sequence of indices of one set
C constitutes a proper
unitia1 part of the sequence of indices of another set, the first set
proper1y contains the second.
i.e.
C(l,l)
C(l)
for examp1e.
-52-
(iv)
by
The diameters of al1 sets with
g('z)
and
lim
t
g('z)
=
t
indices are bounded above
O.
~'CO
Note that the sets that are derivedfrom our arbitrary open
coverings will no longer benecessarily open after being modified to
satisfy the four above properties.
To realize the implications of
these properties, consider thefollowing example of subcoverings:
The second term of the sequence could be
where both
of both
Also
c(l,l)
c(l,l)
c(l,l)
and
,Z..
(2.16.1)
x
c(1,2)
c(1,2)
and c(1,2).
Now we fix
i.e.
and
belong to
(c(l,l), c(1,2), ••• ,c(k ,k »
l 2
c(l) and where the diameters
are less than the diameter of
c(l).
are disjoint by property (i).
and define an arbitr.ary decision rule as follows:
(x)
is the observed value of X, then action
a
€
an (independent) c~ance mechanism such that for the set
~ is selected by
c(m ,.·.,m,Z)
1
€
6(,
the probabi1ity that the se1ected action a belongs to
Renee, (2.16.1)
represents a general randomized decison rule
We wish to show there exists a non-randomized decision rule
is as good as
B(x).
B(x).
B* (x) that
-53-
Now we let also
1
~1 .. ~.
for
x
fixed and i f ' !::.
ml!::.' •• m.e
= 0
for
x
> 0
f1ged and if
= 0
~
1
Hence, taking expectations, we have:
(2.16.2)
!::.
. ml
Consider a decision subspace
mi = 1, •.• ,k
and
i = 1, ••• ,
D.e with e1ements
m.e
(x) d Pi(x)
d
with
ml' ••• ,m.e
.et
and putting the 10ss
L(9 , d
) = Wl.' [x, c(m , •.• ,m n) ] ,
i
1
ml' ••• , m.e
.fi
(2.15.1) and (2.15.2)
sequence of measurab1e
and 1emma 2.8
functions:
then equations
imp1y that there exists a finite
-54-
such that
(x)
=
0 or· 1
for ail x.
m.e
-
·6
Z
Z
ro.e
ml
ml· ••
-
(x)
6 m _· .
m.e
l
and
J
=
0
m.e
(x)
W. [x, c(m , ••• ,m ll
1
1
/fi
= J w.
X
Let now B(x)
"'8 [
1
1
whenever 6
)]
X
(2.16.3)'
=
ml
for all
x
... ro.e
(x)
6
ml··. m.e
=
0
(x)
[x, c(m , ••• ,m ll )]6
(x)
l
/fi
ml· •• ID.e
dP. (x)
1
be the'decision function for which
c(ml, ••• ,m.e)/X ]
= 6m
1
.•.
m (x)
.e
B(n
m] ••• m.e
(2.16.4)
where the right hand side of
denominator is zero.
(2.16.2)
/x)
is defined to be when the
-55-
Then from (2.16.2)
Renee, putting
decision ru1e
5~(x)
and (2.16.3)
we have that
t = 1 in the above imp1ies the existence of a
such that:
(1)
the choice among the
Crs
with one indec is non-random.
(2)
the decision, once given the
C with one index that is
chosen, is made according to
5(x).
i
= 1, ••• ,n.
We now repeat the above prodedure for every C with two indices
using W [x,C(m ,m2 ) ] as weight function and
i
1
~eci8ion
ru1e.
1
5 (x)
as our given
Again we have the existence of a decision ru1e
2
5 (x)
such that:
(1)
the choice among the Crs with two indices is non-randomized
(2)
the decision, once given the C with two indices that is chosen,
is made according to 5 1 (x) and hence in accordance with
5(x).
(3)
J J
X C(m )
1
2
L(Qi,a)d5 (x) dP i(x)
-56-
for
x
fixed
and
ml
= 1,2, ••• ,kl .
and
= l, ••• ,n
i
We now repeat the above procedure for a11
~
= 3,4, ••• ,
8~(x)
ad
infinum.
At the
•
w1.tlt
Cfs
t indices,
~th we will have a decision rule
such that:
witn
~
(1)
the decision among the C's
(2)
the decision, once given the chosen C with
according to
ô
~l'
2
indices ia non-random,
1
(x), ••• ,ô (x),. (x)
~'indices,
and hence
ia made
8(x) ,
(3)
and
. for
J
x
~
f
d8 (x) dP.(x)
1
i = 1, ••. , n;m = 1, ••. , k , mt-l = l, .•. , k t-l •
l
l
fixed and
Now, holding
x
fixed and letting
C(x;t)
denote that C with t
C(x;~l)
is a proper subset
indices such that:
J
C(x; t)
Then, by our initial construction
of C(x; t)
for a11
~
>
o.
Moreover, the sequence
C(x;t), t = 1,2, •..
compactness of D, a unique limit point C(x)
€
D.
determines, by the
But we have shown
the existence of a corresponding sequence of probability'measures (1. e.
t
8 (x); t = 1,2, ••• ) that must, accordingly, converge ta a limit
-57-
probabi1ity measure
8 * (x)
which assigns probabi1ity one to any measurab1e
set which contains the point
C(x).
Since
L(9 ,a) is continuous in a,
i
we have
(2.16.5 )
lim
1,-+00
Now let
of
L(9,a)
x
f
S
1,
L(9 ,a) d8 (x)
i
vary over
= f
8
X(i.e.1Fl).
L(9.,a) d8*(x)
~
Then (2.16.5) and the boundedness
imp lies tha t
R(9. ,8)
~
for a11 x.
Since
a1so
i=l, ••• ,n.
Hence the probabi1ity measure
8* (x), which by. our deve10pment
represents a non-randomized decision ru1e, is as good as 8(x).
8(x)
But
was taken as any arbitrary decision ru1e and therefore we have
that the set of non-randomized decision rules is essentia11y complete.
Note that it can be shown that
of
x
for
D
c:
Definition 2.14.
8 * (Dix)
is a measurab1e function
t(.
For every
€
> 0, we say
ô(x)
and
1
8 (x) are
€-equiva1ent if
for a 11
Q
€
lE9 .
-58-
Definition 2.15.
A metric space
C is sàid to be conditiona11y
compact if any sequenèe (Ci}' i = 1,2, ••• ; of e1ements of C admits a
Cauchy subsequence
ij
: j
= 1,2, •••
},
subsequence such that
i.e.
where
(C
P(·,·)
is the metric of the space.
Thëorëm' 2.11 •.. If
n of
the space
Pe(x)
has no
point masses for aIl
e €
dt,
if
Pe's is conditiona11y compact in the sense of the
metrics:
sup
S
measurable subset of
S
=
ct is
sup
a,x
conditionally compact in the sense of the metric
p(a 1 ,a 2 )
for any
is any
1
1R,
and
and if
1J dPe
€ > 0
=
!UP 1L(e~.al)
~,x
and for any decision rule
€-equivalent non-randomized decision rule
- L(e,a 2 ) "
ô(x)
then
there exists an
ô* (x), i.e.
the class
of non-randomized decision rules are an "€"-essentially complete class.
-59-
Definition. 2.16.
We define
and
as
Theorem 2.18.
X)
sup
S
=
sup
a,x€S
Let aIl elements
no point mass~s.
space
=
Pg(x)
1S
J dP 9 - f dP 9 1 =
IS
2
1L(el'~)
of il
- L(Q2,a)'
P2S
be suchthat they have
If there exists a decomposition of~(i.e. the samp1e
into a sequence
{Ri: i = 1,2·,...}
of disjoint subsets such
that il is conditiona1ly compact for the following metrics:
for each
=
P
and P
1Ri
2Ri
and such that ~ is conditional1y compact in
i = 1,2, ••• ,
the sense of the metric
=
for each
sup
Q,X€ R.
1
i
= 1,2, ••••
Then for any
€ > 0
and for any decision rule
exists an €-equivalent non-randomized decision fuIe
* i.e.
ô,
ô
there
the class
of non-randomized decision rules are an n€"-essentially complete class.
We now observe that for the theorems above dealing with the
elimination of randomization in statistical decision problems,
assumed finte.
by the following
~ was
That this condition is indeed necessary is illustrated
example~'.
-60-
Let
Examp1e
@=
~=
Y
€
=
l
L(9,a )
1
=
1[1,2] (9)
N(9,1)
X
O
Ô
[-2,-1]
(9)
and let the observable random variable
by
Let
{a ,a }
o 1
L(9,a )
0
Let
[1,2 ]
[-2,-1]
=
f
1Yl
if
1yi
Y
if
Jyi <
X be defined
> 1
1
be the decision ru1e such that
=
of random variable
<Y < 0
-1
L.
if
o
if
o< y<
1/2
if
y
> 1
1
y the observed value
Y.
First, we show the
babi1ity density function
admissibi1ity of
g(9)
=
1/2
o
o
5 (x).
if
-2
otherwise
Consider the pro-
< 9 < -1 or
1 < 9 < 2
-61-
Considering
the posterior
then the
-1
g(Q)
as the prior
probabi1ity of the
e
distributio~
probabi1ity
interva1 [-2,-1]
posterior probability of the
of Q
is greater (less)
Q interva1 [1,2]
when
< x < 0 (P < x < 1) and the posterior probabi1ities of the two
interva1s are equal to eaeh other when x
=0
x > 1.
or
o
Renee B (x)
is a Bayes Bo1ution with respect to the prior distribution
g(~),
i.e.
f
(1)
-1
R(Q,Bo)dQ +
-2
f
2
o
R(Q,B ) dQ <
of
QIs
we have that
of zero measure.
are continuous functions of
everyw h ere an d h ence
1
B (x)
Now, let
~o(x)
u
~
since
Y is
R(9,B)d9
1
o
R(Q,B )
for a11
e.
Q 6:
o
R(9,B) < R(Q,B ). can hold on1y on a set
Since by the given
Q, it fo11ows that
R(Q,B)
o
and R(9,Q )
R(9,B)
= R(Q.B o )
is admissible.
be any decision ru1e such that
for a11
i.e.
2
f
B.
Suppose now, B is such that R(9,B)
Then from (1)
R(9,B)dQ +
-2
1
. for any decision rule
-1
f
Q
€
®,
N(Q,l),
/
( 2)
1
2rc
00
f
_00
o
for a11
Q
€8.
-62-
e-_
Now c1ear1y
5~(a1)
- 8!(a 1 )iS:il bounded function of y.
from the uniqueness properties of the Laplace transform,(2)
Rence
can on1y
ho1d if
Reoce no non-randomized
decision function exists such that
for a11
9
€8.
Renee the class of non-randomized decision ru1es for a given
decision problem cannot be essentia11y complete if
GD
is not finite.
We now offer another resu1t dea1ing with essentia1 <.comp1eteness
that appears somewhat restrictive (because of the assumpçions on the
action space ~ and the states of nature ~)
but which is of
considerable practica1 significance (since practical prob1ems often
dea1 with rea1 values).
Theorem 2.19.
Consider49 to be finite and tQ be10ng to a rea1-va1ued
interval, Le.
El
)f
space
= (X,.,
= ( G1 , ••• ,Gn )
p), where
c1R.
[G ,G ]
Consider the samp1e
1 n
X is a set of rea1 numbers. Consider a1so,
€
the action spa ce to consist of those rea1 numbers that are the range
of these functions that map
C(-
00,+00).
X into the rea1s again, i.e.
If we have, also,for a11
8 € [a,b]
~
€ct€~,
(1)
L(G,5)
is a non-decreasing function of 9 when 9 > 8
(2)
L(9,5)
is a non-increasing function of 9 when
Then the
c~ss
of decision ru1e
be1ong~ng
G< 8 •
to [a,b] is essentia11y complete.
•
-63-
Proof.
Let
ô*
Define
as:
ô*
(2.19.1)
Then
* :S
R(e,ô )
and (2)
be any decision rule belonging to the action·space ~ •
Ô
=
R(e,ô)
of the given.
ô
if
Ô €
[a,b]
a
if
ô
<
a
b
if
ô
>
1;>
for aH
follows directly from (1)
But this implies the class of decision rules
formulated by procedure (2.19.1) (i.e. those decision rules belonging
to [a,b])
is essentially complete.
Example.
If the elements of
{Pel
are probability distributions
or densities of the generalized form
Pa
where
and
if
QD is
(x)
f(x;e)
= p(x/Q)
f(x;e) g(x)
is such that if
Xl > x 2 ' 9 1 > Q2'
then
a real-valued interval, then the class of monotone procedures
is a complete cla ss.
We now give some results that offer a different approach to the
decisiori groblem.
Instead of Relecting a class of admissible (or
-64-
complete, essentially complete, etc.)
decision rules from the
entire action space as we have previously done, we will now actually
f~t:lOl
• decision rule that is by construction admissible.
For this
procedure we must of course be flexible with respect to the actions we
may use.
We consider the case when the random variable
X is distributed
according to a distribution belonging to the exponential family of
probability
dens~ties,
p(x;9)
=
i.e.
t3 (9) exp (x9)
for a11 x and for specified 9
€ ••
•:1
Furthermore, {P9} consists of distributions that are densities with
respect to a fixed a-finite measure
~.
We wish to consider decision rules based on a single observation
x,on X.
There is no loss of -generality in restricting our attention
-to the case of a single observation because of sufficient stati8tic
for
n
observations from an exponential distribution i8 the sum of the
obervations whose distribution is also a member of the exponential family •
We consider the 10ss function
1'.-
=
[9 - a]
2
We also assume that. is a truncated space, 1. e.
e = (e/9 > g
in taking
g = 0
the zero case.
1
L(9,a)
J.
Note that there is no real 108s of generality
sinee the development for
g
~
0
follow8 direetly from
-65-
Theorem 2.20.
Let
®
Let
p(x;Q)
Cg, 92 ),
=
function defined on
GP
>
J
g
be joint1y measurab1e in x and 9.
Let
g(Q)
1
d9
~
K> 0
ô
€
~ with bounded risk
such that
f
(2.20.1) ,
as
00
g(9)
If there.exists a decision ru1e
and a
be a positive, measurab1e
and such that
ru
00
(P9)
€
[ô(x)-Q] g(9) P(X;9)d9'::S Kg(ru) p(x;ru)
for a11 ru
€
(g,92 )
g
and for a11
x,
ô*
then if
(2.20.2)
is any decision ru1e such that
R(Q,ô*)
::s
R(9,Ô*) =
OR
Proof.
ô
R( 9,ô)
R(9,ô)
for a11
9
€ ••
it fo11ows that
e € ~)
is a decision ru1e that is admissible a.e.
From
(2.20.2)
we have that
J
[ô * (x)-9] 2 p(x;9)d (x) <
-00
9
qf),
a1most everywhere (Le. for almost a11
00
(2.20.3)
€
We now . show
~
-
00
J
_00
[ô(x)-9]
2
p(x;Q)d~(x)
for a11
-66-
co
J
(2.20.4)
00
[ô*(x)-ô(x)]
2
d~(x) ~ 2 J [Ô(x)-Ô*][ô(x)-9]p(X;9)d~(X)
p(x;9)
-co
-00
as follows,
the left-hand side of (2.20.4) expands as:
00
J
(ô * (x» 2 p(x;9)d~(x) +
_00
00
J
ô 2 (x)p(x;9)d~(x) -
_00
00
(ô * (x» ô(x) p(x;9)d (x) ,
J
-2
~
_00
the right-hand side of (2.20.4)
expands as:
00
2
2
Ô (x)p(x;9)d (x)- 2
J
_00
~
00
J
00
ô(x)Q p(x;9)d (x)- 2
~
.
_00
J ô*(x)ô(x)p(x;9)d
-00
~
(x) +
00
+ 2 J
ô* (x) Q p(x;Q)d
~
_00
00
Now, adding and subtracting
2
ô (x)p(x;Q)d (x)
J
~
-00
00
andJ
(x).
*
ô (x)p(x; 9)d (x),
IJ.
_00
to the left-hand side yields
2
OOJ
_00
22
ooJ * .
ooJ2
(ô (x» p(x;9)d (x)- 2
ô (x)à(x)p(x;Q)d (x) + 2 ô (x)p(x;e)d (x)
IJ.
~
_00
~
_00
*
à 2 (x)p(x; 9)d (x) - J (à (x»
_00
clJ._ oo
00
-J
00
2
p(x;Q)d lJ.(x).
-67-
00
now adding and subtracting
2
J
(8* (x» 2 p(x;Q)d~(x)
to the right-
_00
hand side yields
*
00
2
J
2'
p(x;Q)d~(x) - 2
(8 (x»
_00
00
J 8(x)
Qp(x;Q)d (x) + 2
~
_00
00*
J8
(x)Qp(x;Q)d (x)
~
_00
now, cancelling terms on both sides, we have,
00
left-hand side
=
-
00
8 2 (x)p(x;Q)dll(x) -
J
~
_00
00
right-hand side
=-
2
-L
(8 * (x»
J
(8 * (x» 2p(x;Q)d (x)
~
-00
00
2p(x;Q)d~(x) -
00
:L
8(x)Qp(x;Q)d~(x)
+
'Ii
+ 2 J 8
(x)Qp(x;Q)d~(x)
-00
Now, suppose, for the sake of contradiction, that the left-hand side
of (2.20.4)
is greater than the right-hand side.
transférrin~~term~'
00
J
-00
(8 *(x»
This implies, by
thàt:
00
00
2p(x;Q)d~(x) - 2J.8(x)Qp(x;Q)d~(x)
*
>
_00
J
8* (x)p(x;Q)d~(x) -
_00
00
-2
J 8(x)
-00
Qp(x;Q)d~(x)
-68-
which 1s a contradiction to (2.20.3).
and (2.20.4) holds.
Rence our assumpt10n 1s wrong
Let us now expand
and s1mplify (2.20.4):
co
Let
=
D(e)
J [Ô*(x)
- ô(x)] 2 p(x;Q)d~(x),
-co
then
(2.20.4)
imp1ies,
co
D(e) S 2
J
-co
therefore
[ô(x) - 5*(x)] [ô(x) - e] p(x;Q)d (x)
I..J.
(.l)
J
S2
D(e) g(Q)dQ
ro
Jg(e)
g
g
co
J
[ô(x)-ô * (x)] [ô(x)-e]
p(x;Q)d~(x)dQ
-co
therefore
co
fJ
-co
and fram (2.20.1)
g
J
g(e)[ ô(x)-e] p(x;Q)ded (x)
~
g
we have:
ro
J
[5(x)-ô *(x)]
J
D(e)g(e)de < 2
g
ro
co
D(e)g(Q)de S 2Kg(ro)
JI ,~(x)-ô * (x),
p(x;ro)d (x),
~
-co
Schwarz:' inequality we have
ro
(2.20.5)
J
g
1/2
D(Q)g(Q)dQ <
2Kg(ro) [ r(ro) ]
and by
-69-
ru
Let
now,
J
J(m) =
D(9) g(9) d9
g
We will show by contradiction that
Assume there exists a ru'
(2.20.5)
€
(g,92 )
J(ru) = 0
such that
for aIl
J(ru') > O.
m
€
(g,92 ).
Then
implies:
1
-12
4K
g(ru) D(ru)
<
g(ru)
,
ru' :Sru< 92
J2 (ru)
m*
1
Le.
J
2
4K
J
m:
1
cm
ru*
<
g(ru)
.Now applying: limiting condition as
g(m) D(m)
J
ru'
/(ru)
ru -4 9 2 ,
we have:
cm,
-'ru' :Sru* <
m*
00
1
J
>
ru'
g(9)
ru*
g(m) D(ru)
J
This implies
d9
m'
~
00
dru 2:
/ (ru)
1
*
ru-4
as
9
2
ru*
J
_1_
4K2 · ru' g(m)
d:o~
00
as
m*
But
J
dru
remains bounded as
m'
a contradiction.
M(W*)
=
To see this put
7*
(J)I
g(w) D(ru)
J2«(J)
dw+
1
J(ru*)
m*
~
which yields
e2
-70-
then M'(m)
*
exists and is zero almost everywhere on [m',e ).
2
Also
M is an absolutely continuous function on every interval contained i n . .
Therefore M is constant over ••
*
m
Rence
g(m) n(m)
J,
M(ru*) _ _1-:*:-J(m )
can be written as
i(m)
m
which remains bounded as' ru*
Rence we have shown J(m) = 0
construction
J(ru)
for 'a11
ru
€ .,
since by
2 o.
m
Now, since
J
J(ru) =
n(e) g(e) de
g
therefore
n(e)
= 0 for almost all e
Now, to show that
that
to
(2.20.4)
~.
But
ô(x)
=0
ô * (x)
= ô(x)
essentially implies this.
=
Theorem 2.21.
is admissible a.e., it suffices to show
is only true when
n(e)
€4D.
R(9,ô)
for almost a11
a.e.
with respect
Renee we have:
9
€
eJ.
Let us restriet~as follows; eonsider the exponential
*'
family of densities with respect to the a-finite measure
also, the sample spaee
x
and
(;)
€
Consider,
on the real line, measurable with respect to
That is, we have the family of densities {Pg}
for a11
~.
where
p(x;9)
e where
J
exp
(x9)d~(x)
<
00).
= ~(9)
~.
exp(xg)
Tl::
-71-
Then the following funetion of a single observataion
ô(x) =
f3(g) exp(gx)
x +
x
€
X,
is an
00
J
[3(e)exp(x9)de
g
admissible a.e. deeision rule.
Proof.
For simp1ieity, but as was mentioned ear1ier, with no loss
of genera1ity, we assumeg
(2.21.1)
=
h(ro)
~
O.
In Theorem 2.20, let
g(9)
= 1.
Let
[3(ro) exp(xro)
00
J
[3(e)exp(x9)d9
ro
Renee we may write
Q2
J
(2.21.2)
00
[Jcl-h(g)-Q]
[3(9)exp(xQ)dQ)
= J [3(9)exp(x9)d9
9
2
g
and th en the derivative
hl (ro)
=
of
h
beeomes, by differentiating (2.21.1),
h(ro) [x-e + h(ro)] • Now, from this it follows, sinee 9 > 0
here, that
00
J
Q <
ro
J
Q < x + h(ro) ,
9f3(e)exp(x9)de
=
00
ro
i.e.
[h(9 )-h(g) ]
2
[3( 9) exp(xQ)de
x + h(ro)
-72-
therefore
0 < [x-9 + h(ro)]
h(w) [ x-9
+
Le.
hl (w)
h(w) ] >
am since h(w) > 0
for al1
fi
then
0,
> O.
This implies
is an increasing function of w and therefore
h
h(9 ) - g(9 ) > 0,
2
1
i.e. then, taking this with
(2.21.2)
imp1ies that:
9
2
J
~(9)exp(xQ)d9
[(X+h(g)-9]
<
~(92) exp(x9 )
2
g
But this is precise1y the condition (with K
Theorem 2.20
= 1)
required in
to imply the admissibi1ity a.e. of the decision rule
x + h(g).
That is
or
ô(x)
= x + h(g)
j3(g) [exp(gx)]
ô(x)
=x +
is admissible a.e.
00
J
j3(Q)exp(x9)d9
g
We sha11 now investigate admissibi1ity a.e. in the non-truncated case,
Le. whereeis an interva1 with end points
9 ,92 •
1
The fo11owing theorem gives a sufficient condition for a decision
rule to be admissible a.e. in the non-truncated case.
-73-
Theorem 2.22.
~
to
Consider
= ~(9)
exp(x9),a density with respect
and jointly measurable in x and 9.
with endpoints
on
p(x;9)
G&.
9 ,Q2; g(9)
1
Consider
G&
as an interval
is a positive'measurable function defined
Consider, also, a decision rule
5
with bounded risk.
Now if,
(J)2
(2.22.1.)
>
00
1
f
g(Q)
dQ
~
00
as
~
(J)2
9
2
C
and
C
(2.22.2)
00
1
J
>
(1)1
where
(2.22.3)
C.
€
(9 ,9 ),
1 2
,
f
(1)2
g(9)
d9
~
as
00
(J)1
~ QI
K> 0
and if there exists a
such that
1
[ô(x)-Q]g(9)p(x;9)d9 ~ K [g«(J)2)P(x;(J)2)
+ g«(1)l)P(x;(1)l) ]
(J)1
for aIl
9 < (J)1 < (1)2 < 9
1
2
and for aIl
x
€
X.
Then if
5*
is any
decision rule such that
R(9,ô*) ~
R(Q,Ô*)
.
-
i.e.
ô
R(Q,ô)
=
is admissible
R(Q,ô)
a.e •
for aIl
9
€
fb,
for a1most aIl
it follows that
Q
€
~,
-74-
Proof.
The fact that
R(Q,5*):5 R(Q,5)
for a11 Q
€@,
implies
simi1ari1y to'the proof of Theorem 2.20 that,
J [5*(x)-5(x)J 2
00
p(x;Q)d~(x):5 2
_00
J
00
*
[5(x)-5 (x)]Iô(x)-Q] p)x;Q)d (x)
~
_00
Letting, simi1ari1y to Theorem 2.20 a1so,
00
D(Q)
= J
[5 *(x)-ô(x)] 2
p(x;Q)d~(x),
D(Q)
is measurab1e and
_00
finite.
00 2
J
Then (2.22.3)
a110ws us to write
*
00
D(Q)g(Q)dQ:5 2
f
002
J g(Q)[ô(x)-Q]
[ô(x)-ô (x)]
00
_00
p(x;Q)d (x)
~
1
therefore
00
2
co
therefore
+ f:IB(X)-B*(X)]
g(œl)p(x;œl)d~ (X)}
Then, applying
Schwarz,' inequality,
OR
-75-
ro
(2.22.4)
J2
D(9)g(9)d9
1/
2
S 2K {[g(ro
l )]
ro _.
l
We now proceed to show that
let
J(ro 2 )
a > 0
find an
= J
D(9)
=0
a.e.
then from (2.22.4)
g(9)D(9)d9
we can
roI
such that, for
ro
sufficiently close to
2
9 , we have
2
,
then upon integration and rearrangement,
(2.22.5)
a
2
l
l
- J(V )
2
right-hand side of (2.22.5)
l
>
goes to
00
d ro
2
,
and the left-hand side remains bounded •.
Hence we have a contradicting result and therefore,
ro
g("'2)
[D('''2)]
1/2
~
~
ll.· m
2
... 9
2
where
cannot b e greater th an zero.
-76-
(II)
°
9
J
=
g( 9) D(9)d9,
(.1)1
By (2.22.4)
and our given condition (2.22.2)
of the enuniciation, we have that
Assume now that there exists
such that W«(.I) ) > O.
o
Then
mo
1
d m
1
J
but then, as
~
m'
the righ-hand side goes to 00
9 ,
1
hand side remains bounded.
th~
But by the reasoning of
for almost a11
Example
9
If
€
e9
9= (-
each of the intervals
g(9)
and that hence
00, + 00)
(0,00)
and
1
=
J
exp(xQ)dJ.l.(x)
and
*
J.I.
=
°
for a1most a1l
possesses positive measure in
(- 00,0),
then we may take
But then
g(Q)
~
° as
< m
0
W( ml) = O.
this means D(Q)
R(Q,ô ) = R(Q,ô)
~~,
whi1e the 1eft-
This is an impossibility and so
proof of Theorem 2.20,
for
-77-
\gl ~
ho1d
~,
imp1ying that the assumptions on g(g) of Theorem 2.22
for a11
9
€
~
and consequently all decision rules
are positive functions of x, are admissible.
8,
that
-78-
CHAPTER III.
The Admissibility of Bayes and Minimax Decision Rules.
Where chapter two dealt exclusively with some general admissibility
concepts of decision rules, chapter three will now look into the relationships between admissibility criteria and specifie ty.pes of deci sion rules.
As heretofore unheard of types of decision rules are introduced,
they will be defined.
The results of theorems expressing the above
mentioned relationships, valuable in themselves, will also be supplemented with examples.
The two main types of decision rules that we
will be concerned with in this chapter are Minimax and Bayes decision
rules, both discussed briefly in chapter one.
Definition 3.1.
bution
on G.
Unique up to equivalence:
Consider the prior distri-
If any two Bayes rules with respect to
't'
have the
same risk function, then any Bayes rule with respect to
't'
is said to
't'
be unique up to equivalence.
A sufficient condition for a decision rule
Theorem 3.1.
Bayes with respect to a given prior distribution
is that
Proof.
such that
50
't',
5, that is
o
to be admissible
be unique up to equivalence.
Assume
50
is not admissible, therefore there exists a
51
-79-
and
< R(e,ô)
o
< E R(T,ô 0 )
therefore
-
random variable distributed on
But
E
R(T,8.~
o
=
Rence we must have
But then
51
€
e
Q €
e where
Ô
is Bayes •
for
D*
ER(T,5 ) = ER(T,ô )
l
o
o
for aU
must also be Bayes with respect to 't'.
=
contradicts our assumption,
3.1.1.
T
is a
e by 't'
any two Bayes rules with respect ot
Corollary
for a11
E R(T,ô)
inf
ô
for some
R(e,ô)
o
e
eS.
But we have that
't'are equivalent and therefore
for
aIl
e e
aB
This
and completes the proof.
A sufficient condition for a decision ru le , that is
Bayes with respect to a prior distribution 't', to be admissible is that it
be unique up to equivalence among the non-randomized decision rules.
Proof. It suffi ces to show that if a Bayes rule with respect to
exists, there a1so exists an equivalent
't'
non-randomized Bayes rule with
respect to 't'.
Assume, then, that
the random variable over
is Bayes
with respect to 't'.
D whose distribution i8 given by
after changing the order of integration we may write
E r('t',Z)
50
Let
Z
Then
be
-80-
But
S
r(~,80)
with respect to~.
r(~,Z)
= r(~,5),
o
r(~.d)
d € D since
for all
80
Hence with probability one we will have that
which implies that any
d € D
that
implies that
d
Theorem 3.2.
Given that
is Bayes with respect to
GP =
dition for a decision rule
prioD distribution
Z
chooses,
This furtper
will, with probability one, satisfy
j
is Bayes
~.
{Sl",.,Sk} , then a sufficient con-
8, that is Bayes with respct to the
o
(Pl""'P k ),
to be admissible is that
p. > 0,
J
= 1, •.• , k.
Proof.
Assume, for the sake of contradiction, that
Therefore there exists a
8
1
(3.2.1.)
m~st
J
for a11
0
)
< R(S.,5
J 0
And now, since a1l
But sinee for
-
50
p.
J
> 0,
=
k
Z
j=l
is inadmissible.
such that
D*
< R(S.,5)
R(S.,8 )
l
J
and
€
80
= l, ... ,k
j
for some
j
we have
<
1
p. R(El .,5 )
J
J
k
Z p. R(S .,5 )
j=l J
J 0
to be Bayes with respect ot
(Pl"'"
Pk)
have
k
inf
5 € D*
Z
j=l
then the strict inequality of (3.2.1)
p.R(El.,5) ,
J
implies
J
50
is not Bayes.
we
-81-
. This contradiction means our assumption is wrong and
Note that in the 1ight of Coro11ary 3.1.1,
0
is admissible •
Theorem 3.2
confirms
the existence of randomized, admissible Bayes ru1es in special circumstances.
The necessity of theee circumstances i8 evident in the
fo110wing counter examp1e:
action
p =1
1
9
P2 =0
9
For
00
1
Loss = 1
1
2
2
0
1
o
1
(P1,P2)
we must have
2
to be Bayes with respect to
Ee
In this case,
ô
R(9,00)
E R(9,0)
inf
1. e.
al
€
D*
E
E R(9,0)
= inf
0 € D*
2
p. R(9.,0) for a11
Z
J
J
j=l
R(9,ô)
ô
inf
€
Renee, any decision rule whose
to
(Pl,P2)
is Bayes in this case.
*
e
€
e
1
D
"Bayes" risk is
1 wihh respect
-82-
Consider
ôi
Then
But
5 such that
0
is Bayes since
50
R(9 ,5 )
=
1
R(9 2 ,ô o )
=
1
l
ER(9,ô)
0
o .
=
1.
is not admissible sinee there exists
l
R(9 ,ô l )= l, R(92 ,ô 2 ) = 0
l
with probability l, such that
is better than
that chooses al
Ô '
(Therefore
ô
l
5 ).
o
Note that this example is effectively a counter example to both
Theorem 3.1
and 3.2, since not aIl Bayes rule have the same risk
(condition of Theorem 3.1) and certainly
p. 'j 0
J
for a11
j (condi tion
of Theorem 3.2).
Coro11ary 3.2.1;
Given that
is bounded from below and closed from below, then a sufficient condition
for an admissible Bayes rule to exist
is that
Proof.
p. > 0
for
J
From Theorem 3.2
with respect to a prior distribution
j = l, ••• ,k.
we observe that it is only necessary to show
the given condition is sufficient for the existence of a Bayes rule.
Rence, letting
w
k
~
=
PJ'YJ' where
j=l
w
and
y = (Yl""'Y k ) € S
k
=
( w =
~
j=l
PJ'YJ'
for some
y
€
Then
S }.
the boundedness of S from be10w implies W is also bounded from below.
o be the greatest lower bound of W.
W
that
k
(n)
~ p.y.
j=l J J
-+w.
0
is bounded above.
Consider the sequence (y(n)} < S
Then each p. > 0
J
k
'~l p.y<?
J=
J J
=w o
such
( n)
implies that each sequence (y.
J
This means there exists a finite limit point
sequence (y(n)} and that
Let
y
O
}
of the
-83-
Now y
o
is a limit point of
(see chapter two).
is any point of
Also
Q0
Q
f\
yO
S, y
0
s C (yo),
y ,
y'
j=l
S there would be points
€
Y € S
and
since otherwise,
k
Z
o
other than
, y
r'
S
€
<
PjY'j
W
o
if
y'
and then i f
k
Z
such that
j=l
This contradicts the fact of
Q (l -S
Hence
yO
0
= (y),
Now, since S
w 's
o
lower boundedness of W.
i.e.
is closed from below,
yO e S and the lowest (minimum)
k
value of
Z
PJ' R(QJ.,8)
is achieved by a point of S.
j=l
Hence,any
B
€
o
R(Q.,B) = y.
D* such that
J
for
J
therefore a Bayes decision rule with respect ot
Corollary
3.2.2. Given that
et
= (Ql, ••. ,e k )
j = 1, ... ,k,
is
(Pl"'.,Pk)'
and that the risk set
S
is closed from below and bounded, then Bayes rules with respect to aIl
prior distributions on
t&
exist.
However they will only be admissible
for prior distributions of the type of Corollary 3.2.1.
Proof.
The result follows straight from the proof of Corollary 3.2.1.
Then
S
is convex, bounded from below, and closed from below.
Take the prior distribution (Pl,P2)
such that
p
1
=1
and P2
= O.
2
Then
P .y. = YI' 1. e. the minimum "Bayes " risk over the points of S
Z
J J
j=l
is O. However zero does not belong to S, hence there does not exist a
Bayes rule with respect ta (1,0).
-84-
This examp1e shows that the restriction p1aced on the prior distribution of Coro11ary 3.2.1
must
and 3.2.2 cannot be removed (i.e.
be positive for a11 j).
a e ~ (the rea1 1ine)
Definition 3.2.A point
support of a (probability) distribut.ion
the interva1
f
on
is said to be in the
1R
is such thst f(a-e,a+e) > O.
(a-e, a+e)
e > 0
if for a11
Note that
the support of a distribution on ~ is a1ways a c10sed set for otherwise
f(a-e,a+e)
for any
Theorem 3.3.
function of
e > 0
is not necessari1y positive.
R(Q,ô)
Suppose that the risk unction
Q
for a11
ô e D*
G9 =
and that
~
Then if the support
of~ is
1JR,
Consider
•
~,
Bayes ru1e with respect to the prior distribution
is finite.
is a continuous
ô, a
·0
whose Bayes risk
we have that
Ô
o
is ad-
missibl:e.
Proof.
Assume, for the sake of contradiction, that
missible.
Rence there exists a Ô e D*
and
Now, since
R(f),ô)
R(f),ô)
for a11
f) e
for some
Q.
o
~
R(Q,ô )
o
o
is not ad-
such that
< R(Q,ô )
R(Q,ô)
Ô
is a continuous function of
f)
®
for a1l
ô e D*,
then
R(f),ô)
and where
We have from this:
< R(e,ô)
+ ~2
o
R(QO~Ô
o
) - R(e0 ,ô)
whenever
= a >
0
IQo - el
< e
-85-
(3.3.1)
> g2 ~(Qo
where
given by
+
- €, Q
0
€)
Tisa random variable whose distribution onf19 is
't'.
And now, since
is in the support of
Q
o
r(~,ôo)
assumption and hence
-
> 0,
r(~,ô)
~,
(3.3.1)
which contradicts our
is admissible.
Ô
o
Note that the assumption that therisk function
continuous function of
important cases.
9
for aH
ô
€
If
(a)
(b)
( c)
is satisfied in many
(b)
R(Q,ô):,.
L(9,a)
is continuous in Q uniformly for
For aH bounded
Then, the risk function
(a)
of
is bounded
-measurable functions
g(x) p(x;Q) d",,(x)
function of
R(Q,d)
4& = ~
li
integral
continuou5
If
i5 a
L(9,a)
Lebesgue
Theorem 3.5.
D*
R(Q,ô)
For example, the following two theorems are typical
of conditions that yie1d the continuity
Theorem 3.4.
gives us
a
€
~
g, the
i5 a
Q.
is continuous in
Q for al1 d
€
D.
(the real linel
The distributions (prior) on8 should be of the
one-parameter exponential family.
-86-
(c)
Given any
and
EI9 ,
~\,e2 €
Then,
R(e,d)
L(e,a)
G (e ,e2 )
l l
bounded on the compact sets of QI x
G2 (e ,e2 )
l
such that for all
(d)
there exists functions
a
€
we have:
"
is continuous in
e
funct~on
of
ls a continuous
QD
for aIl
e
a
for aIl
~.
€
d
D.
€
Note that these two theorems deal only with non-randomized
decision rules.
However, when the non-ramdomized decision rules form
an essentially complete class (see the conditions for this in the
results of chapter two) . then continuity of the risk for non-randomized
rules is quite sufficient
Example
Given a sample
with unknown Mean
squared error, i.e.
the problem is
"estimate"
e
Xl"",Xn
and known variance n.
=(~ea)2.
L(e,a)
X, the sample mean.
The loss function is
A sufficient statistic
This is also
We will show the admissibili ty of X.
We have that
X is distributed normally with Mean
forn aIl
d e D (since condition
(c)
R(e,d)
is satisfied for
2
We assume, for the sake of contradiction, that
X
is not admissible,
e
and variance 1.
is continuous . . ·in
G (e ,e ) = 2(e - e ) ) .
2 l 2
l
2
dl(X) =
for
the minimax
of e.
Now we have a1so from Theorem 3.5 that
and
from a normal population
G(9 ,e 2 )
l
e
2
-87-
therefore
there exists
a
d
€
2
D such that
and
and since
R
for sorne
is eontinuous in
Q,
o
threfore there exists
€.
€
>
0
such
implies
(A)
~a = N(O,a 2 ) and the Bayes
Now consider the prior distribution
ru1e
Q
d
a
with respect to
The joint density of X and Q is given by:
g(Q, x)
1
=
exp
2rca
Renee the marginal density of
f(x)
=[
h(Q/x)
(
N
L]
-
2 .
2a
X is, after integrating,
-1/2
2
2rc(1 + a ) ]
and the prior distribution of
[
® given
1+a
2rca
exp
X
[-
=x
is then given by
exp
[-
2) 1/2
2
(~
1 + i
-88-
Rence the Bayes rule
d
a
with respect to
da
x a
=
1
is
Ta
2
which has the following Bayes risk
+ a2
Renee, we may write
r(T
,dl) - r(T ,d )
a
a a
=l
a
_
l
2
+
l
i
Now consider,
=
a
exp [-
Then from (A)
-2-:.."....~-] ldQ
above we have
Q +€
o
a €
J
Q -€
o
Now
d
a
Bayes implies that
> (RQ,d)
0'
(B)
therefore
-89-
Now since
(B)
holds and
a
2
0,
then
a [r(T ,d 2 ) - r(T ,d )] > 0
a
0
a
OR
a[ r(T a ,d 2 ) -r(T a ,dl)] + a[ r(T a ,dl) - r(T,d
)]
a {1
which
> 0
implies that
9 +€
o'
€
f
p;-
(- L)d9
exp
20
9 -€
2
a
+
1
+a
2
> 0
0
Now,
as
0-+
IX)
9 -€
€
_ 92
0
exp (
f
21!
20
e -€
2
-2€
-+
)de
2
fT,!
0
and
_0_
1 + a
2
=
a
-Z
a
..L2
-+
0
+ 1
a
But
cannot be positive, hence we have a contradiction
and our assumption is wrong, i.e. X is an admissible estimate.
~D~e~f~i~n~i~t~i~o~n~.__~3~.~3.
A decision rule
€-Bayes rule:
with respect to prior distribution
<
if for some
inf
ô
is said to be €-Bayes
ôo
€
D*
r(T,ô)
+
€
€
> 0
-90-
Definition 3.4.Extended Bayes ru1e:
extended Bayes if
50
is
A decision
€-Bayes for every
ru~e
Ôo
€ > O.
is said to be
Note that it
fo110ws direct1y that any Bayes ru1e is extended Bayes.
Examp1e.
6& =
Given
and
L(9,a)
<!= 1R. (the
{O,OO),
=
(9-a)
real Hne)
2
X is given by the Poisson distribution:
The distribution of
fX (x/9)" =
for
exp ( -9)
x
=
> 0
9
The prior distribution"of
9
0,1,2 •••
is given by the gamme distribution:
exp(g(9)
Now for a ru1e
for every
ô
o
€ > 0,
=
for
= x to be extended Bayes we must have:
there exists a
r(1',ô )
o
5
l'
such that
5
-
where
l'
< inf r(1',5) + €
< r(1'.ô ) +
l'
i.e.
0:,13,9 > 0
€
is Bayes with respect to
't'.
But for our 10ss function, this means we shou1d have:
E(ô
o
- 9)2 < E(5
-
l'
- 9)2 + €,
inequa1ity can be expanded as:
and the 1eft-hand side of this
-91-
E(8
=
T 9)2
E(8 - 8 + 8
OTT
=
E [(8-8 ) (8
But
T
T
~9)
_ 9)2
- 9)2 + 2E[(8~8 ) (8
- 8)2 + E(8
E(8
T T .
]'
Renee we have that for
=
T
T
- 9)]
0
to be extended Bayes rule we should have:
50
_ 9)2 +
E(8
€
T
OR
E(8
Sinee
2
E(X ) =
g(9)
- 8)2 <
€
(a,~)
is Gamma
we have
E(X)
=
a~
and
a2~2 + a~2 + a~
Consider
Bayes.
T
T
to be Gamma
Well, sinee
~(a
8
T
i8 Bayes with respect tQ
E(8
- 5 )2
T
(a,~)
0
=
~
T,
=
+
+
now, and lets see if
is extended
50
x)~
1
then
E«a + x)~
~
1
1
+
_ x)2
1
E [(a+x)~ - (~+l)x]
E(x-a~)
2
2
92-
(T. E X
2
?~-2 2
=
+ 1)2
«(3
=
1
and
= n
A
I-'n
= n
'
and now, choosing ex , (3
n
n
( 1+n)2
i.e.
B
o
=
1
1
+
<
n
€-minimax if for some
for sufficient1y large n.
< inf sup
B
e
e
Bo
is
A decision rule
€-minimax if for
el € ~ and all
a1l
Least favourable distribution:
is said to be least favourable i f
ô
r('t' ,B)
o
=
sup
't'
ô € D*
R(e,B) + €
e
inf
is said to be
R(e,B) + €
sup
Definition 3.6.
Bo
€ > 0
sup R(e,B )
o
't' o € .*
€
is extended Bayes.
Definition 3.5. €-minimax decision ru1e:
i.e.
such
we have
l+n
=
+ ex (3
'U(3Ex
-
inf
B
r('t',ô)
A prior distribution
-93-
The right-hand side term is known as the maximin or 10wer value of
the "game".
Note that in a given statistica1 prob1em, there may not be a 1east
favourab1e distribution.
Lemma 3.1.
If
and if for a11
00
00
e
is Bayes with respect to the prior distribution
€
~
is a1so a minimax
Proof.
~o
ru1e, and
~o
is a
1east favourable distribution.
The resu1ts of the lemma fo110w direct1y from Definition 3.6,
the definition of minimaxity of chapter one, and the following:
inf
o
€
D* ~
r(~,o)
sup
€
1)*
< sup
e R(Q,o 0) -<
r(~
,0 ) <
00-
infr(~
0
0
,0) <
< sup *
~ € S
Examp1e
Given
(19 = (e ,e2 ),
l
on the rea1 line from
° to
Pe
[O,l't/2] =
the closed interva1
l't/2 , and the loss":function:
We have an experiment as fo110ws:
probability
~ =
of heads, where
a biased coin is tossed once with
Pe = 1/3, Pe = 2/3.
1
2
-94-
We will firstly compute
Let
x
(x,y)
for all
be a point in the plane such that we estimate
if heads is observed and
and
R(Ql,d) and R(Q2,d)
R(Q2' (x,y)
y
=
1
3' (- cos x - 2 cos y)
=
3' L(Q2'x) + 3' L(Q2'y)
=
1
(-2 sin x - sin y)
3
2
1: 0
,
8o
=
=
~
e
1
2
1/2
min
(x,y)
d
Therefore
To(Q) R(Q,(x,y»
R(el,(X,y»
1
+ 2 R(Q2'(x,y»
(-2 sin x - sin y)]
r(To'(x,y»
d
therefore
be
to each state of nature.
~ [(- cos x - 2 cos y) +
and we want
ta
D.
with respect
to the prior
.
which gives probability
Well
Q
€
1
We shall now find the Bayes rule
distribution
if tails is observed.
d
x,y
then eqpating ta 0, we have
= 61
(sin x + 2 sin y - 2 cos x - cos y)
-95-
sin x + 2 sin y
But this holds for
or for
80
i.e.
=
= sin
cos. x
:rr
= 4' =
x
= cos
x
y + 2 cos x
= cos
y
= sin
y
y
(xo'Yo) = (:rr/4,:rr/4)
is the desired Bayes ru1e with respect
to ;r •
o
We will now show
By lemma 3.1
8
is minimax and that
0
ft suffices to show.
Q €
We11
is least favourab1e.
't'o
=
3'1 (-
cos
4'11: -
2 cos
EIfJ
4':rr )
1
rz
i.e.
and
- rz
1
R{Q,8 o )
is mimimax
Definition 3.7.
and
't'o
<
1
~
= r('t' ,8 )
0
for a11
Q
0
€
<i
is least favourab1e.
€-admissible decision rule:
A decision rule
said to be €-admissible if there does not exist a rule
for a11
Q
81
€ Ci.
80
is
for which
-96-
Theorem 3.6.
If for some
e > 0 , a decision ru1e
50
is e-Bayes,
then it is e-admissib1e.
Proof.
Assume, for the sake of contradiction, that
e-admissib1e.
Then there exists
51
Letting
by
1a not
such that
R(Q ,5 ) - e
o
50
0
T be a random variable on
Et
for some '9
o
whose distribution is given
consider
'r,
=
But thts contradicts the e-Bayes condition on
Rence
5
Theorem
o
is
50 .
e-admissib1e.
Let ~ denote the class of aU Bayes rules, (extended
3.7.
given statistica1 decision prob~em.
Bayes rules) ini rru
Then if (B is
an essentia11y complete c1ass af decision rules, ~ is a complete class
of decision ru1es.
Proof.
Since
tB
there exists a
is essential1y complete, we have that for a11
5o e
œ such
R(Q,5 0 )
that
< R(Q,5) for a11
Q
eG.
Now assume, for the sake of contradiction, that ~ is not complete.
1. e.
there does not exist any
Q
o
e
aD
such that
-97-
But the essential completeness of ~ then implies
=
R(Q,ô)
o
Also
Ô
o
for a11
Q€
6)
is Bayes and hence
r(:~
o
,ô )
=
r(~ ,ô)
o
0
r(~
inf
Ô €
0
~o on ~ •
distribution
Now
R(Q,ô)
n*
~ ER(T,ô)
But, since
therefore
R(Q,ô )
o
=
,ô)
where
0
whose distribution is given by
o
~
o
R(Q,ô)
with respect to sorne prior
T
is a random variable on
G&
.
for al1
Q
€
c&
ER(T,ô)
=
ER(T,ô) •
But this means
ô
is a1so Bayes, which contradicts our initial
o
construction and implies our assumption was wrong.
1. e.
~ is complete.
We shal1 now give sorne wel1 known, but basic results.
The Minimax Theorem:
Theorem 3.8.
risk set
S
Given ~
{el' ... ,ek
} a nd the
is bounded below, then
(a)
inf
Ô €
n*
r(-r,ô)
~
sup *
€Q9
and there exists a least favourable distribution
~
o
•
-98-
(b)
If a1so,
an
S is c10sed from be10w, then there exists
admissible minimax decision ru1e
wi th respect to
't'
o
00 and 00
is Bayes
•
The proof, which is we11 known and offerEed in a11 decision
theory books, is omitted here.
6& = {91 , ••• ,9k }
Theorem 3.9. Given that
°
ru1e
Proof.
is Bayes with respect to some prior distribution.
Since
°
is admissible, then we have
(R(9~,0),
Qxn S =
Q
Hence
X
~
{X}
VI :s v'z
and
Now
y
p. > 0
J -
••• ,R(9k'0»
= {X}
•
and
S are disjoint convex sets.
exists a non-zero vector
taking
then any admissible decision
such that
V
for a11
y
j
::;:
for a11
so that
This meaas there
y
€
R(9.,0)
for some
J
j
and
QX- (X)
= 1, ••. ,k,
°
S, where
Z
€
€
n* .
for if some
p. < 0, then by
J
is sufficeint1y negative, we wou1d have
k
k
p. y.
j =1 J J
~
>
~
j=l
p. x.
J J
k
We may a1so norma1ize
V such that
~
j=l
makes V a probabi1ity distribution ofer
p.
J
= 1,
but this, in effect,
and hence,
",
-99-
k
~
< V'Z
Pj R(ej,Ô)
Z
for al1
S
€
j=l
This implies
ô
:iheorem 3.10.
is a Bayes rule with respect to V.
The Complete C1ass Theorem:
Given
q& =
{el, ••• ,e }
k
and the risk set S is both bounded and c10sed frombe1ow, then the clase of
al1 Bayes rules is complete and the admissible Bayes ru1e form a minimal
complete class.
Proof.
The required result follows from Theorem 3.9
Theorem 3.11
Given that
ce
= "{el".' ,et}
S is both bounded and c10sed from below.
give a1l their mass to
~.
of B* ,
elemen~of
d
o
€
of e1ements of
D* which
B, is a complete c1ass of decision rules.
Ô* €
Consider any decision ru1e
there exists a
and that the risk set
Then if B is the c1ass of a1l
B*
non-randomized Bayes rules, the class
and Theorem 2. 12.
B*.
B such that
Then by the definition
p(Z
*
- d ) = 1
o
where
Z
*
is a random variable on the action space whose distribution is given
This implies
R(Q,ô * )
=
R(e,d o )
Now, hy the definition of B, d
o
distribution
i.e.
r(T .d )
o
0
=
for a1l
e
€
e.
ie Bayes with respect to some prior
,
,
' J •.
-100-
r(~
and then, certainly
Pi > 0
such that
o
<
,d)
0
and for aIl
Now consider any
ô d
r(T~,d)
d
n*
d
€ D
D.
€
certairtly then,
Z is a random variable on the action space, whose distribution
and where
is given by
5.
k
k
And then
~
~
r(~
5*
=
p. R(9 .• d )
i=l
i.e.
for a11
u
~
o
,d )
0
0
~
i=l
=
r(~
o
,5 * )
is Bayes (with respect to
So, we have shown that
B*
c:
is.
T)
o
class of Bayes rules.
Now we will show
the inverse inclusion.
Consider any rule 51 that is Bayes with respect to the prior
,
distribution
~l
,
= (Pl , ••• ,Pk)'
k
=
Then we have
~
p~ R(9.,5)
i=l
And then, since
De
D* we have that
~
~
=
inf
*
5 € D
-101-
< inf
r('t'1'ô 1 )
therefore,
R( Qi ,5 i )
d
<
€ D
inf
d
k
~
i=l
,
Pi R(Q.,d)
1
R(Qi,d)
for a11 i
R(Qi,d)
for a11
such that
D
€
,
Pi > 0
At the same time we have
R(Qi,5 1 )
Rence
>
inf
d
?
€
,
i
such that Pi > 0
D
inf
R(Qi,d)
for a11
i
such that
,
p. > 0
1
d € D
therefore
r('t'l'ô 1 )
Now if
to
ô1
dl
rh· ,d)
1
D ia the non-randomized Bayes ru1e corresponding
€
and with respect to
Then
Tl (see Coro11ary 3.1.1. and Coro11ary 3.11.2)
=
OR
i.e.
= inf
d
and since
the c1ass of Bayes ru1es be10ngs to
Rence
B*
since
S
dl
€
B ~ B* we have
*
B
is the c1ass of a11 Bayes ru1es and therefore from Theorem 3.10
is c10sed and bounded be10w,
B*
is a complete c1ass of decision
ru1es.
Coro11ary 3.11.1.
Given that. is Unite and that the risk set
both c10sed and bounded from be10w and that
randomized Bayes ru1e.
S
is
B is the c1ass of a11 non-
Then we have that" the c1ass B*
of e1ements
of D*
that give a11 their mass to e1ements of B, is the c1ass of a11 Bayes ru1es.
-102-
Proof.
See the
p~oof
Defini tion 3.8.
of
Coro11ary 3.1.1.
A decision ru1e
Equalizer Rule:
equa1izer ru1e if it is such that
e
al1
€
~
R(Q,ô )
o
=
El =
[Q,-l]
ô
ia sa id to be an
o
some constaRb
C for
•
Examp1e.
Given •
= (0,1)
t
(Q _ a)2
=
L(Q,a)
and
Q(l-Q)
The distribution of
X J is binomial with
n
trials and probability
Q of success
i. e.
(x/Q)
f
=
x
= O,l, ••• ,n
X
Let a1so, the prior distribution on. be unifo~m,
i.e.
g(Q)
=
1
= 0
for
0< Q < 1
otherwise
We will show that the decision ru1e
respect to
i.e.
g(Q)
g(Q/x)
=
X
n
which is Bayes with
is an equa1tzer ru1e.
R(Q,d(X»
E (L(Q,d(X)/X = x}
1
Well
d(X)
E (L(Q,d(X»/X = x)=
is the posterior
computed as fo11ows:
J
(Q _
o
Q( 1
= c
for a11
Q
€
69
d(X»2
g(Q/x)dQ
where
- Q)
distribution of Q given
X
= x and g(Q/x) is
-103-
h(x,Q)
x = 0, l, .•• ,n
o
=
otherwise,
therefore
1
f(x)
f
=
0
(n)
x
=
1
f0
[(l)(n) 9X(1_9)n-x] dg
x
gX (1_9)n-x df>
for
x
= 0,1, ••. ,
n
(n+1)!
g(f)/X = x)
and
= h(x,f»
f(x)
=
for
xl
0
< 9<
l, x = 0, l, ••• , n
(n-x)!
Renee
R(f),d(x»
=
(n+1)!
x! (n-x)!
=
(n+1) !
x! (n-x)!
1
f
o
(X+1) ~ (n-x-x:)!
(n+1)
1
f)X+1(1_9)n-x-1 dg - 2d(x)
- 2d(x)
x!(n-x-1)!
n!
f 9X(1_9)n-x-1
o
dg +
+ d2(x) (x-1)!(n-x-l)!
(n-1) !
-104-
putting
=
d(x)
x
-x+1
n-x 1
= n
X
n
d(X)
Theorem
Proof.
A1so sinee
Bo
Bo
sueh that
(n+l)x
(n-x)n
+
C
is an equa1izer ru1e.
is admissible, it is minimax.
B ,
o
R(Q,B ) =
is an equalizer ru1e,
o
C
B, i.e.
o
for every
B
*
there exists a
€ D
R(Q0
,B) < 0
R(Q ,B)
Then
for a11
tnf
*
B € D
this implies
i.e.
for a11
is admissible, tben there does not exist any other
is betËer than
But
=
2x (n+1)
n(n-x)
3.12. If an equa1izer ru le
Sinee
OR
x(n-x) .
therefore
n
=
R(Q,d(x»
i.e.
(n+1)n
.!...±...!. - 2 d(x) E....±-.l
n - x
n - x
=
50
sup
e
R(Q,B)
sup R(Q,5 )
e
0
is minimax.
B
€
*
D
> C
= inf
5
€
sup R(Q, B)
D*
Q
Q
o
€
e
B that
...
-105-
Exame1e.
,
[0,1]
Given
8=
{9 ,92
1
~=
},
and the 10ss function is
the c10sed unit interva1
2
L(9 ,a)
1
=
L(9 2 ,a)
= 1-a
a
The experiment performed is that a coin is tossed once with the
probabi1ity of heads being 1/3 if 9
1
if
9
is the true state of nature and 2/3
is the true state of nature.
2
Note, first of a11, that since
is convex in a for a11
9
€.,
El is
finite and the 10ss function
we may, by the resu1ts of chapter two,
restrict our attention to non-randomized decision ru1es.
We may represent the c1ass D of non-randomized decision ru1es as a
. subset of the plane as fo11ows·:
D=
to be
x
{(x,y): 0 < x < 1,
if heads is observed and
We may find
R(9 ,(x,y»
1
y
and
O~
y
~
1 } where
Q
is estimated
if tai1s is observed.
R(9 ,(x,y»
2
where
(x,y)
€
D
as fo110ws:
__
R(9 2 ,(x,y»
_1 x2 + _2 y2
3
3
for a11 (x,y)
€
D
= ~ (l-x) + ~ (l-y) for a11 (x,y)
€
We now observe that the c1ass of non-randomizedBayes ru1es for this
prob1em is given by
D•
-106-
B
=
= 0,
{(x, y): y
We shal1 now pro~eed
0
<
Je
<
1}
€
D
to find a minimax rule among the class of (aIl)
Bayes ru1es for this prob1em. WeIl Theorem 3.i2
s
suggesœthat if an
equa1izer rule is admissible, it is minimax
A1so
Consider
d
Then
is an
d
o
do
(1,0)
=
B •
equa1iz~r
rule since
R(9,d o·) = 1/3
for a11
e
€
is admissible, for eonsider the fo11swing eomparisons:
rule
d0
€
o
R(9 , (x,y»
2
R(9 l ,(x,y?)
= (1,0)
1/3
1/3
(0,0)
0
1
(0,1)
2/3
2/3
(1,1)
1
0
1/4
1/2
(1/2,1/2)
1.
i.e
there is no
d
€
D sueh that
and
Renee
do
R(9,d) < R(9,d ) for al1
-
0
R(9,d) < R(9,d)
9
€
for some
9
A deeision problem is said to have a value if
V
o
~
is minimax.
Definition 3.9.
where:
V
=
BUp
inf
ô
€
D*
T
€.*
r(T,ô)
= upper value of the problem
=V
e
-107-
and
v =
sup
*
~ €.
*
inf
Ô €
= 10wer value of the promlem
r(~,ô)
D
Corollary 3.12.1. If an equalizer rule is
admiss~ble(arld
therefore minimax)
the game (problem) does not necessarily have a value.
Proof.
It suffices to offer the following counter example:
Let
69=
{1,2,3, ••• }.:
a=
(0,1,2, • •• }
L(e,a)
=
+ 1
if
a
<e
=
o
if
a
=e
if
a >
-.1
=
€GP.
a
=0
e
X be degenerate at zero for all
Also let the random variable
e
or
Then we have that
=
l(r(T,ô)
and
R(Q,ô)
r(~"ô) €
therefore
=
~
~(e)
R(Q,ô)
e
E L(e,ô)
and
ô,~,
[-l, +1 ] for al17
Le.
sup
T
for aU
Similar1y
ô
inf
r(~,ô)
= -1
for aU
't'.
ô
Le.
V = -1
1=
V =+1.
r(~,ô)
= +1
-108-
toro1lary 3.12.2. If
a deeision prob1em has a minimax admissible
equalizer rule then this rule is not neeessarily Bayes.
~.
Again, it suffiees to offer
~he
fo11owing eounter example.
In the eounter examp1e of Corollary 3.12.1 we show that
d
o
= 0 is
an admissible minimax equa1izer rule.
i.e.
d·
is an equalizer rule sinee
O'
for a11
9
and
€
R(9,5)
o
then
5 = l,
< R(9,d o )
R(9,1)
5
=
5
then
R(9,5)
=
o
+1
R(9,5)
=
d
o
for,
5
aueh that·
9 = 1
9 = 1,2,3,4
0
if
9 = 5
otherwise,
.
.
0 and action 1 with equal
1
ER(9,Z) = 2 R(9,0) +
< R(9,d)
o
Renee by Theorem 3.12
9,
=0
otherwise
to be the ru1e ehoosing action
probability, then
e
if
= +1
5
if
9 €
= -1
=
take
a11
for
for some
=
take
E L(9,d0 )
is admissible sinee there ia no rule
R(9,5) < R(9,d )
o
and
take
d
=
R(9,d)
0
for a11
is minimax also.
21
9.
R(9,1)
-109-
However,
do
is not Bayes for
inf
=
5
for
€
consider~
r('t',5)
There is no
't'
such that this is true,
D*
instance, take
then
t('t',d o ) =
1
r( 't' .1) = 2:
But
21
L(1,0) +
L(1,1) +
21
21
L(2,1)
L(2,0)
=
1.
=
Hence a minimax admissible equalizer ru1e need not be .Bayes.
Theorem 3.13.
If an equa1izer ru1e is extended Bayes, it is a minimax
ru1e.
Proof.
Given
r('t' ,5 )
o
0
50
is an equa1izer ru1e and extended Bayes then
= . some constant C < inf r('t' ,5) +
5
€
D*
€
€
0
°
>
Rence we have certain1y sup
't' €e*
therefore
€.
sup
..:'t'
*
r('t',5 ) = C = r('t' ,5 ) <
o
0
0
inf *
5 € D
r('t' ,5) +
o
€
and
inf
5
€
D*
= C = r(~ o ,5 0 ) <
+
€
inf
5
€
D*
r('t' o ,5)
+
< sup
inf
r('t',5) +
- 't' €.* 5 € D*
€
-110-
But sinee for a11
T,Ô
we a1ways have
r(T,ô')
inf
*
Ô'€ D
and therefore
sup
"T
1. e.
V
<
inf
€®*
r(T,Ô)
Ô € D*
<
inf
sup
Ô € D*
T €
ri
r(T',ô)
V
then we have,
V< V <
sup
T
whieh imp1ies for
€
o
e*
arbitrari1y sma11 that
€
V
r(T,ô )
=
sup
T €
is minimax
i.e.
e*
Coro11ary 3.13.1. If an equa1izer ru1e is Bayes, it is minimax.
Proof.
The resu1t is obvious from the definitions of Bayes and extended
Bayes ru1es 1. e.
Examp1e,:,.
A Bayes ru1e is
Given
alWQ~S
extended Bayes.
CID
=
the haH-open interva1
~
=
the e10sed interva1
L(e,a)
=
(e-a)
( 1-e)
2
10,1)
[0,1]
-111-
Let the observable random variable
.
i.e.
(a)
= (1-9) 9
We will first express
R(9,d)
=
l:
X have the geometric distribution
x
x
R(9,d)
L(9,d(x»
for
= 0,1,2, •••
d
€
D as a power series in 9
f(x/9)
x
00
=
l:
x=O
.. 2
(9-a)
(1-9)
(1-9)9
x
00
=
l:
x=O
(b)
We will now show that the only non-randomized equalizer rule is
d(O)
WeIl
R(9,d(x»
= 1/2,
=
dei) = d(2)
~ (9 2+x _ 2a 9x+l + a 2 9x )
x=O
and for
= ••• = 1.
x = 0,
a = d(O) = 1/2
x = 1,
a
= d(l) = 1
-112-
therefore
for
9 = 0,
9
R(Q,9(X»
1/4
=
1
= 2' R(Q,d(x»
1
1
16 - 4
+ (
We note that
R(9,d(x»
1
1
16
=
8' +
=
1/4.
+
1
+ 4) + (
1
32
1
32
+ ... ) + ...
+ ••.
diverges for a11 other
d'(x), hence this
is the on1y llon-randomized equalizer ru1e for thts prob1em.
(c)
We now show that if a ru1e in
distribution
where the
E((l)
~i
T,it is of the form
d(i) =
i
= 0,1,2, ... ,
are the moments of the distribution T ,i.e.
E(9)
= ~i' where T is the distribution on Et.
For
d(x)
to be Bayes with respect to
E R(9,d(x»
= r(T,d(x»
00
E R(9,d(x»
T
therefore
E
T
= ET (
R(Q,d(x»
l:
x=O
T , we must have
= inf r(T,d)
d
T
Now,
D is Bayes with respect to a
€
D
(9 2+x _ 2 d(x) QX+1 + (d(x»2 QX »
= ~1'
-113-
Now, taking the derivative
Bayes risk with respect to
and equating to zero to minimize the
d(x), we have
otherefore
(d)
=
=
d(x)
is Bayes with respect to
We now show the ru1e
By part (b)
0
d(x)
of part
(b)
l' •
is minimax.
it is an equalizer rule, therefore(Theorem 3.13) if it were an
extended Bayes rule it would:be minimax.
Bayes for we observe hy part (c)
The rule of part (b) is not
we must find a distribution such that
its moments satisfy the equalities,
III
Ilo
Since
1
Ili+l
= d(l) = l, .•• ,
III
® = [0,1)
continuous.
112
= 2'
= d(O)
=
d(i)
= l, •••
Ili
the possible prior distributions on. must be
The on1y possibilities among the usual distributions are the
uniform and the beta, but neither of these are satisfactory, for:
(i)
f(Q)
then
1
Cl! < 9 <
=-
III
=
and
1-12
III
1
2
~
2
~ê-Cl!2
IlZ
12
=
2
3
Cl! = 0,
~,
+ .ill:±Iù
4
i= 1
2
=
1/3
~
=1
here
-114-
~ii)
Beta:
the general moments of Beta
(~+r+l)!
=
a+r+l
a+!3+r+2
Renee we must show our
a!
{:: 1
d(x)
for all values of
of part (b)
We must show, then, that there is a
we can have
r(T€, d(x»
~
inf
€
of
a,~
> 0
is extended Bayes.
such that for every
and by part (c)
€ > 0
if we
€
that would make our
a~e.moments
~i ~i+l
where
T€
r(T ,d) + €,
d € D
could find a T
aEe
(a+!3+ l)! (a+r)!
=
therefore
(a,~)
d(i)
as close as we like to
T , then we'd
€
have our
d(x)
~i+l
is
extended Bayes.
Consider the above mentioned Beta distribution on~.
generally
~i
we can get
~i+l
=
a+(i+l)
We saw that
and we observe that for every
€ > 0
a~+(i+2)
as close to one as we like by choosing a
large enough
~i
and
~
small enough.
such that
i.e.
r(T, d(x»
€
d(x)
or
implies for every
< inf
d € D
(T ,d)
€
€
>
0
there exists a
+ €
is extended Bayes
[
d(x)
This
d(O) i;: 1/2
L d(l)
= d(2)
is minimax
=
= 1
-115-
We now observe and confirm by the following counter example,
that the converse of Theorem 3.12
is not necessarily true,
i.e. a
minimax equalizer rule is not necessrily admissible.
~
Example. Given
=
(1,2)
~=
(O,1,2)
=
L(e,a)
We see that
R(Q,O)
+1
=
if
<
a
if
°
=
=
if
-1
=
a
a
Q
° or
Q
=a
>e
° for a11
Q €
e
and
therefore
ao
=
is an equalizer rule.
Consider the prior distribution
then
ao
=
° is Bayes with respect to
r(T o ,a 0 )
Renee by Cor011ary
Rowever,
i. e.
a
T
o
3.13.1,
o
on ~ giv~ng mass one to Ql'
'1",
for,
=0 =
=
a
o
r(T ,5)
inf
5
€
D*
o
is minimax .
is not admissible for
al
= 1 is better than a o
and
We might now point out that though the above examp1e refutes
inevitability of a minimax equalizer ru1e being admissible, this is
not a1ways the case, as illustrated in the following:
the
°
-116-
Example.
6
Given
The distribution of
= (0,1),
(! =
[0,1]
X is binomial with
and
n
= (9-a) 2 •
L(9,a)
trials and probability 9
of success,
(nx) 9x (1_9)n-x ,
--
f X (/9)
x
Consider the prior distribution of
g(9) =
Then
h(x,9) =
9
Fa-fi3)
ra~
to be
çp-1
x
Be(
a,~),
(1_9)()-1
(n) 9~-1 ( 1_.9)n-JCtj3-1
x
= 0, l , ••• ,n
•
i.e.,
for . 0 < 9 < 1; a,13 > O.
l (a-fi3)'
rarr
, x = 0,1,2, ••• ,n,
.1
therefore
f(x)
=
h(x,Q)dQ
J
0
l
J
o
and
g(9/X=x)
=
e~-l (1_9)n-JCtj3-1 d9
h(x,9)
f(x)
=
Be (a+x,
~+n-x).
Then the posterior expected 10ss given
X= x
1
E (L(Q,a)
g(Q/X~)
} =
J
o
(Q-a)
is given by the expectation.
2
Be(d+x,
~+n-x)
dQ
-117-
Now, in seareh of a Bayes deeision rule, we would like to minimize the
posterior expeeted loss with respect to a.
But sinee the loss is squared
error this is done for a equal to the mean of the posterior distribution
of
9, i.e., the expeetation is minimized for a equal to the mean of
Be( a+n,
~+n~x).
Renee a Bayes rule for this proi1em, with respect
Be(a,~)
to the prior distribution
a+X
a+j3+n
=
d(x)
is given by
sinee the first moment of
a
a
+
~
Now, the risk funetion for
=
.9
2
(a+j3)2
(n
where
sinee
d
-n]
+a +
is given by
+ 9 [n-2(x(cx+j3)] +
ci
:2
~)
X is binomial with
n
trials and probability 9
of sueeess.
We now observe that i f a
independent of
9, i.e.
d0 (X)
OR
d (X)
0
= ~ = 1ü/2, then R(9,d(X»
R(9,d(X»
=
beeomes
is constant for a1l 9 when.
n
+X
2
n
+
2
is an equalizer rule.
=
.!!+n
2
Renee we have
X + n/2
n+
n
d (X)
0
is Bayes
-118-
Be(
with respect to
by Coro11ary 3.13.1
d (X)
o
, JÏi)
2
and an equa1izer ru1e, therefore
it is minimax.
is a1so admissible for this prob1em. WeIl
"
is Bayes with respect to Be ~ .en
2 ' 2) and by our derivation of
We now show
do eX)
n2
d (X)
o
rn
we observed that it was unique up to equiva1ence
,i.e.,
by
construction our Bayes ru1e is unique in the sense that its risk is
independent of
Q
and therefore wilUle the same no matter what the
prior distribution of
Q
is.
We May, then, app1y Theorem 3.1
prob1em here and rea1ize the admissibi1ity of
d (X).
o
to our
-119-
CHAPTER IV
SUMMARY
It is the intention of the author that this;conc1usive chapter
serve as a usefu1 ana1ysis of the preceeding resu1ts.
Where as
chapters two and three have evo1ved and--expounded the main theorems
dea1ing with admissibi1ity criteria in decision theory, this chapter
co11ects and tabula tes them.
It is hoped that the two tables that fo110w
will constitute an
easi1y accessible source of information both concerning this thesis
and the admissible aspects of decision theory in genera1.
Tables l
and II are in effect, then, offerred as a workab1e counterpart of the
the ory of chapters two and three.
Table l condenses the resu1ts of chapter two, and Table II the
resu1ts of chapter three.
Theorems and coro11aries are 1isted in
their order of appearance in the two ear1ier chapters, and therefore
the tables serve a1so as a handy index of resu1ts.
The assumptions
invo1ved in the specified theorems are denoted by an "A" and the results by a "C".
The relevant conditions are appropriate1y 1isted as
headings in th:! tables.
We now offer, in conclusion, a brief summary of direct methods that may be applied to determine when a decision rule is admissible. (Note that the source theory behind these methods can be found in chapters two and three and is inherent in the tables of this chapter.)

(I) When a decision rule, in general, is admissible.

(a) A decision rule δ ∈ D* is admissible if there is no other rule δ₁ ∈ D* that is better than δ, i.e. if there is no δ₁ ∈ D* such that R(θ,δ₁) ≤ R(θ,δ) for all θ ∈ Θ and R(θ,δ₁) < R(θ,δ) for some θ ∈ Θ. (This check, together with (II)(c), is illustrated in the sketch following this summary.)

(b) A decision rule δ₀ is admissible if Θ = {θ₁,…,θₙ}, S ⊂ Eⁿ space, and x₀ = (R(θ₁,δ₀),…,R(θₙ,δ₀)) belongs to λ(S).

(c) A decision rule δ is admissible if it belongs to a class of rules that is minimal complete for the problem.

(II) When a Bayes rule is admissible.

(a) A Bayes rule that is unique up to equivalence is admissible.

(b) A Bayes rule that is unique up to equivalence among the nonrandomized rules is admissible.

(c) If Θ is finite (i.e. Θ = {θ₁,…,θₙ}) and δ is a Bayes rule with respect to p = (p₁,…,pₙ) with pⱼ > 0 for j = 1,…,n, then δ is admissible.

(d) If Θ = {θ₁,…,θₙ}, S is closed and bounded from below, and there is a prior distribution p = (p₁,…,pₙ) such that pⱼ > 0 for j = 1,…,n, then there is a rule δ that is Bayes with respect to p and admissible.

(e) If Θ = R (the real line), R(θ,δ) is continuous in θ for all δ ∈ D*, δ₀ is Bayes with respect to τ₀, the support of τ₀ is R, and the Bayes risk of δ₀ is finite, then δ₀ is admissible.

(f) If Θ is finite and S is bounded and closed from below, then there exists a least favourable distribution τ₀ and a rule δ₀ that is Bayes with respect to τ₀ and admissible.

(III) When a minimax rule is admissible.

(a) If Θ is finite and S is closed and bounded from below, then there exists a minimax rule that is also admissible.
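As an editorial illustration of methods (I)(a) and (II)(c), and not part of the thesis, the sketch below checks admissibility by direct risk comparison within a small, hypothetical, finite set of candidate rules (so "admissible" here means admissible relative to that set); it assumes Python with NumPy, and the risk points are invented numbers.

```python
# Editorial sketch (not from the thesis): methods (I)(a) and (II)(c) for a
# finite Theta = {theta_1, theta_2} and a hypothetical finite set of rules.
import numpy as np

# Row i: (R(theta_1, rule_i), R(theta_2, rule_i)).
risks = np.array([
    [1.0, 4.0],
    [2.0, 2.0],
    [4.0, 1.0],
    [3.0, 3.0],   # weakly dominated by rule 1, hence inadmissible
])

def better(r1, r2):
    """Method (I)(a): r1 is better than r2 if r1 <= r2 everywhere, < somewhere."""
    return bool(np.all(r1 <= r2) and np.any(r1 < r2))

admissible = [i for i in range(len(risks))
              if not any(better(risks[j], risks[i])
                         for j in range(len(risks)) if j != i)]
print(admissible)                     # [0, 1, 2]

# Method (II)(c): a Bayes rule w.r.t. a prior with every p_j > 0 is
# admissible; here the average-risk minimizer lands in the admissible set.
p = np.array([0.5, 0.5])
bayes = int(np.argmin(risks @ p))
print(bayes, bayes in admissible)     # 1 True
```

Note that as the prior weights are varied over strictly positive values, the Bayes rule traces out different points of the lower boundary of the risk set, in keeping with the complete class results of chapter two.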
TABLE I

Theorems and corollaries of Chapter II, cross-classified against the conditions listed below; in the table an "A" marks a condition assumed by a result and a "C" a condition it concludes. The condition headings are:

- Θ finite
- Θ real
- 𝒜 finite
- 𝒜 a convex subset of Eⁿ space
- 𝒜 compact
- 𝒜 real
- 𝒜 cond. compact wrt. some metric(s)
- 𝒳 finite
- P_θ has no point masses for all θ ∈ Θ
- Space of P_θ's cond. compact wrt. some metric(s)
- L(θ,a) convex in a ∈ 𝒜
- S ⊂ Eⁿ space
- S closed
- S closed and bdd.
- S compact
- S closed from below
- S closed and bdd. from below
- S₀ (nonrandomized risk set) compact
- x₀ = (R(θ₁,δ₀),…,R(θₙ,δ₀)) ∈ λ(S)
- Nonrandomized rules ess. com.
- Nonrandomized rules ε-ess. com.
- L(θ′,a) > c|a| + c for some θ′ ∈ Θ
- Only rules with finite risks
- D₀ = {δ : risk point ∈ λ(S)} is a min. com. class
- D ⊂ D* is an ess. com. class
- S′ (risk set for D) closed
- W = {(L(θ₁,a),…,L(θₙ,a)) : a ∈ 𝒜} closed and bdd.
- Class contains only 1 rule from each S_δ (δ ∈ A) and no more
- Class contains only 1 rule from each S_δ (δ ∈ A)
- Class contains at least 1 rule from each S_δ (δ ∈ A)
- Class of rules closed under equivalence
- Existence of equivalence classes of A (S_δ, δ ∈ A)
- Min. ess. com. class
- Min. com. class
- Ess. com. class
- Com. class
- Class or rule admissible (A)
TABLE II

Theorems and corollaries of Chapter III, cross-classified against the conditions listed below; as in Table I, an "A" marks an assumed condition and a "C" a concluded one. The condition headings are:

- Θ finite
- Θ = R (the real line)
- Class of rules giving all their mass to nonrandomized Bayes rules
- R(θ,a) continuous in θ for all a ∈ 𝒜
- S bdd.
- S bdd. from below
- S closed from below
- S closed and bdd. from below
- Prior distribution such that pⱼ > 0 for all j
- Rule unique up to equivalence
- Rule unique up to equivalence among the nonrandomized rules
- Rule is an equalizer
- Support of prior distribution is R
- Least favourable distribution exists
- Value exists
- Class or rule is nonrandomized
- Rule extended Bayes
- Class or rule ε-admissible
- Class or rule ε-Bayes
- Bayes risk finite
- Ess. complete class
- Min. complete class
- Complete class
- Value doesn't necessarily exist
- Rule not necessarily Bayes
- Minimax rule(s) exist
- Bayes rule(s) exist wrt. some τ
- Class or rule minimax
- Class or rule Bayes wrt. some τ
- Class or rule admissible (A)
BIBLIOGRAPHY

1. Blackwell, D., and Girshick, M.A. (1954). Theory of Games and Statistical Decisions. Wiley, New York.
2. Blyth, Colin R. (1951). "On minimax statistical decision procedures and their admissibility". Ann. Math. Stat., 22, pp. 24-42.
3. Chernoff, H., and Moses, L.E. (1959). Elementary Decision Theory. Wiley, New York.
4. Dvoretzky, A., Wald, A., and Wolfowitz, J. (1950). "Elimination of randomization in certain problems of statistics and of the theory of games". Proc. Nat. Acad. Sci. Wash., 36, pp. 256-260.
5. Dvoretzky, A., Wald, A., and Wolfowitz, J. (1951). "Elimination of randomization in certain statistical decision procedures and zero-sum two-person games". Ann. Math. Stat., 22, pp. 1-21.
6. Elfving, G. (1952). "Sufficiency and completeness in decision function theory". Ann. Acad. Sci. Fennicae, 135.
7. Ferguson, Thomas S. (1967). Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York and London.
8. Freund, John E. (1962). Mathematical Statistics. Prentice-Hall, Inc., Englewood Cliffs, N.J.
9. Girshick, M.A., and Savage, L.J. (1950). "Bayes and minimax estimates for quadratic loss functions". Proc. Second Berkeley Symp. on Math. Stat. and Probability, pp. 53-73.
10. Hodges, J.L., Jr., and Lehmann, E.L. (1950). "Some problems in minimax point estimation". Ann. Math. Stat., 21, pp. 182-197.
11. Hodges, J.L., Jr., and Lehmann, E.L. (1951). "Some applications of the Cramér-Rao inequality". Proc. Second Berkeley Symp. on Math. Stat. and Probability.
12. Hodges, J.L., Jr., and Lehmann, E.L. (1952). "The use of previous experience in reaching statistical decisions". Ann. Math. Stat., 23, pp. 396-407.
13. Karlin, S. (1958). "Admissibility for estimation with quadratic loss". Ann. Math. Stat., 29, pp. 406-436.
14. Katz, M.W. (1961). "Admissibility and minimax estimates for parameters in truncated spaces". Ann. Math. Stat., 32, pp. 136-142.
15. Kiefer, J. (1953). "On Wald's complete class theorems". Ann. Math. Stat., 24, pp. 70-75.
16. Lehmann, E.L., and Stein, Charles (1953). "The admissibility of certain invariant statistical tests involving a translation parameter". Ann. Math. Stat., 24, pp. 473-479.
17. Parzen, E. (1960). Modern Probability Theory and Its Applications. Wiley, New York.
18. Wald, A. (1945). "Statistical decision functions which minimize the maximum risk". Ann. Math. (2), 46, pp. 265-280.
19. Wald, A. (1947). "An essentially complete class of admissible decision functions". Ann. Math. Stat., 18, pp. 549-555.
20. Wald, A. (1949). "Statistical decision functions". Ann. Math. Stat., 20, pp. 165-205.
21. Wald, A. (1950). "Basic ideas of a general theory of statistical decision rules". Proc. Int. Cong. Math., 1, pp. 231-243.
22. Wald, A. (1950). Statistical Decision Functions. John Wiley and Sons, New York.
23. Wald, A., and Wolfowitz, J. (1948). "Optimum character of the sequential probability ratio test". Ann. Math. Stat., 19, pp. 326-339.
24. Wald, A., and Wolfowitz, J. (1950). "Characterization of the minimal complete class of decision functions when the number of distributions and decisions is finite". Proc. Second Berkeley Symp. on Math. Stat. and Probability, pp. 149-157.
25. Wald, A., and Wolfowitz, J. (1951). "Two methods of randomization in statistics and the theory of games". Ann. Math. (2), 53, pp. 581-586.
26. Wasan, M.T. (1962). "Minimax estimation of a negative binomial parameter". Ann. Math. Stat., 33, p. 501.
27. Wasan, M.T. (1967). Parametric Estimation, Vol. 1. Queen's Papers in Pure and Applied Mathematics, No. 9, Queen's University, Kingston, Ontario.
28. Widder, D.V. (1946). The Laplace Transform. Princeton University Press, Princeton, N.J.
29. Wolfowitz, J. (1950). "Minimax estimates of the mean of a normal distribution with known variance". Ann. Math. Stat., 21, pp. 218-230.
30. Wolfowitz, J. (1951). "On ε-complete classes of decision functions". Ann. Math. Stat., 22, pp. 461-465.