Sengupta, Debapriya (1988). Improved Estimation in Some Nonregular Situations.

IMPROVED ESTIMATION IN SOME NONREGULAR SITUATIONS
by
Debapriya Sengupta
A dissertation submitted to the faculty of the
University of North Carolina at Chapel Hill in
partial fulfillment of the requirements for the
degree of Doctor of Philosophy in the Department
of Statistics
Chapel Hill
1988
Approved by
Advisor
Reader
Reader
Improved estimation in some nonregular situations
DEBAPRIYA SENGUPTA
(Under the guidance of PRANAB KUMAR SEN)
Abstract: Two related topics in the area of improved estimation in the nonregular case are considered. First, we consider the multivariate normal mean model when the mean vector is restricted to a positively homogeneous cone. In this context, many authors have advocated the use of the restricted MLE to estimate the mean vector. Stein-rule or shrinkage versions of the restricted estimators are discussed, and, in light of the usual quadratic error loss, the risk dominance properties of various proposed estimators are established.

The second part is devoted to a general theory, and the Stein phenomenon is explained in greater detail from the frequentist's point of view. Some important examples are considered along the way. Finally, a unified admissibility result for generalized Bayes rules is established in some multivariate parametric models of interest. This is eventually used to derive a dichotomy result for componentwise generalized Bayes rules in simultaneous estimation of a number of location parameters.
TABLE OF CONTENTS

CHAPTER I    INTRODUCTION
  1.1  Outline of the problem
  1.2  Literature review
    1.2.1  The Stein effect
    1.2.2  Estimation in restricted parameter spaces
  1.3  Summary of results

CHAPTER II   SHRINKAGE ESTIMATION IN A RESTRICTED PARAMETER SPACE
  2.1  Introduction
  2.2  Preliminary notions
  2.3  RSMLE when Σ is known
  2.4  RSMLE when Σ = σ²V, V is known
  2.5  RSMLE when Σ is arbitrary (unknown)
  2.6  Some extensions and important applications
    2.6.1  Sub-orthant models
    2.6.2  Univariate linear model: Restricted MLE
    2.6.3  Ordered alternative problems
  2.7  Relative performance of SMLE and RSMLE
    2.7.1  Relative risks at pivot
    2.7.2  Directional variation of the risk
    2.7.3  Dominance properties of the RSMLE
    2.7.4  Risk computations for other versions of shrinkage estimators

CHAPTER III  IMPROVED ESTIMATION UNDER GENERAL SET UP
  3.1  Introduction
  3.2  Formulation of the problem
    3.2.1  The model
    3.2.2  Loss functions
    3.2.3  Class of improved estimators
  3.3  A Stein type identity
  3.4  Heuristic solution of the Stein type inequality
  3.5  Some applications

CHAPTER IV   A UNIFIED ADMISSIBILITY RESULT
  4.1  Introduction
  4.2  Results from exponential families
  4.3  A unified admissibility result for a more general set up
    4.3.1  The set up
    4.3.2  Structure of Bayes estimators
  4.4  Some applications
  4.5  Tail inadmissibility of generalized Bayes estimators

NOTATIONS AND ABBREVIATIONS
BIBLIOGRAPHY
CHAPTER I
INTRODUCTION
1.1. Outline of the problem:
The estimation of the mean or location of a multivariate distribution is a well attended problem in the theory of parametric inference. In particular, properties such as minimaxity and admissibility of various proposed estimators have been studied in great detail. Perhaps the most studied situation in this context is the estimation of the mean of a multivariate normal distribution. Traditionally, it was believed that the sample mean was the best estimate of the population mean, until Stein (1956) established that while estimating the mean of a p-variate normal distribution (p ≥ 3) the sample mean is inadmissible under quadratic loss. This result is indeed surprising and very paradoxical. That is why this fact is typically referred to as "Stein's paradox".
The above observation triggered off a great deal of research, essentially in two directions. Brown (1966, 1971, 1979, 1980), in a series of papers, addressed the question of admissibility in a more general context. A series of papers by Efron and Morris (1973), Berger (1976a, 1982) and Angers and Berger (1986) gave interpretations of the above "Stein effect" from the Bayesian and the empirical Bayes points of
view. These investigations give us a much better understanding of this interesting phenomenon (with regard to its applicability and limitations) in Statistical Decision Theory.
A second direction was pursued by Hudson (1978) and others, extending the above ideas to other types of distributions (essentially to general exponential families). See Hudson (1978), Berger (1980), Berger (1982), Ghosh, Hwang and Tsui (1983, 1984) and Berger (1985) for these extensions.
In all the problems discussed above the parameter space is unrestricted. For example, in the estimation of a p-variate normal mean the unknown mean vector could be any point in R^p. But there are many problems where the parameter space is restricted in a very natural way. To motivate this, consider the following examples:

(i) Ordered alternative problems: Here the parameter space is

    Θ = { θ ∈ R^p : θ_1 ≤ θ_2 ≤ ... ≤ θ_p },

so that the null hypothesis relates to θ_1 = θ_2 = ... = θ_p, i.e., θ = θ_0 1 for some θ_0.

(ii) Orthant problems: Here we know that the mean vector θ is coordinatewise bigger than some prefixed vector in R^p. The parameter space in this case is

    Θ = { θ ∈ R^p : θ ≥ θ_0 }.

In these situations it is well known that one can find other estimators of the parameters which perform better over the restricted parameter space than the standard estimators in the unrestricted setup. However, a different picture might emerge on the complementary parameter space.
Other estimators in the restricted setup and their applications in various restricted parameter space problems (i.e., mostly testing problems) have been considered by Bartholomew (1959, 1961), Chacko (1963), Kudo (1963), Nüesch (1966), Perlman (1969), Barlow et al. (1972), Sen and Saleh (1985) and others.
An interesting question arising at this point concerns the relationship between the Stein effect and restricted parameter space problems. Some partial results have been obtained in this area by Chang (1981, 1982) and Judge and Yancey (1986). Establishing the Stein phenomenon in a restricted parameter space setup will be the first objective of this dissertation.
Another objective will be to develop a general theory of improved estimators. In most of the available results Stein's phenomenon is established by using a special type of identity which typically follows from Green's theorem. We refer to these problems as "regular problems". Although this technique yields results in some cases of interest, its applicability is quite limited. Our primary objective will be to obtain improved estimators (in the decision theoretic sense) in those problems which are "nonregular". This justifies the title of this dissertation, and with this in mind we proceed to the next section.
1.2 Literature Review:
1.2.1 The Stein Effect:
We begin this section with the pioneering work of Stein in this area. Stein (1956) pointed out that in a multivariate normal location problem the usual estimator of the population mean, namely the sample mean (which is also the best invariant estimator, UMVUE and the MLE), is inadmissible for p ≥ 3 (where p is the dimension of the parameter space). But for p = 1, 2 the sample mean is admissible.
Let X be a random p-vector which is normally distributed with completely unknown mean θ and dispersion matrix Q. Consider the following loss function:

    L( θ, δ ) = ( δ - θ )' Q^{-1} ( δ - θ ),

which is the standardized quadratic loss. In this case the usual estimator of θ is given by

    δ_0 = X.

Further, the risk function of δ_0 is R( δ_0, θ ) = p, identically in θ.
Stein (1956) showed that the following estimator, with a, b > 0,

    δ_1 = ( 1 - b / ( a + X' Q^{-1} X ) ) X,

would have smaller risk than the usual estimator δ_0 for sufficiently large a and small enough b. He obtained the following inequality:

    R( δ_1, θ ) ≤ p - 2b ( p - 2 - b/2 ) / ( a + θ' Q^{-1} θ ) + o( 1 / ( a + θ' Q^{-1} θ ) ).

The o-term goes to zero uniformly over θ as a → ∞. So, for the dominance in risk, it is enough to have a sufficiently large and 0 < b < 2(p-2). For p = 2, the admissibility of δ_0 can be shown by first obtaining a lower bound for the risk function of a general estimator.
For p ≥ 3, Stein also showed that among all estimators of θ of the form

    δ_h = ( 1 - h( X' Q^{-1} X ) ) X

(which are called the spherically symmetric estimators of θ), the best order of the asymptotic improvement (i.e., as θ' Q^{-1} θ → ∞) over the standard estimator X is given by (p-2)² / ( θ' Q^{-1} θ ).
Essentially motivated by this type of asymptotic optimality criterion in the class of spherically symmetric estimators of θ, James and Stein (1961) obtained the celebrated James-Stein estimate of θ, which is given by

    δ_JS = ( 1 - (p-2) / ( X' Q^{-1} X ) ) X.

This is the estimate one obtains by putting a = 0 and b = p-2 in the earlier expression. The risk function of the above estimator turns out to be

    R( δ_JS, θ ) = p - (p-2)² E_θ ( X' Q^{-1} X )^{-1}.

It is easy to see that the above estimator is asymptotically optimal in the sense described above.
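As a numerical illustration (appended here; not part of the original development), the risk dominance of the James-Stein estimate is easy to check by Monte Carlo. The following minimal Python sketch assumes Q = I and a hypothetical true mean θ:

    import numpy as np

    rng = np.random.default_rng(0)
    p, n_rep = 10, 100_000
    theta = np.full(p, 0.5)                 # hypothetical true mean vector
    X = rng.standard_normal((n_rep, p)) + theta
    sq = (X ** 2).sum(axis=1)               # X'Q^{-1}X reduces to X'X when Q = I
    js = (1 - (p - 2) / sq)[:, None] * X    # James-Stein estimate of theta
    risk_mle = ((X - theta) ** 2).sum(axis=1).mean()   # approximately p
    risk_js = ((js - theta) ** 2).sum(axis=1).mean()   # strictly below p for p >= 3
    print(risk_mle, risk_js)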
The estimator described above possesses the property of shrinking the original estimator towards the origin. For that reason it is sometimes called a "shrinkage estimator", and the origin is called the pivot of the estimator. In fact the pivot can be chosen to be any point, or even any lower dimensional subspace of R^p. Hence, we have different versions of the original James-Stein estimator:

a)  δ̂_JS¹ = θ_0 + ( 1 - (p-2) / ||X - θ_0||_Q² ) ( X - θ_0 ),

where θ_0 is any pivot;

b)  δ̂_JS² = X̄ 1 + ( 1 - (p-3) / ||X - X̄ 1||_Q² ) ( X - X̄ 1 ),

where X̄ = p^{-1} Σ_{i=1}^p X_i is the common mean. This form of the estimator is referred to as "Lindley's form" (Lindley and Smith (1972)).

Note: ||θ||_Q² = θ' Q^{-1} θ for any θ ∈ R^p.

The above estimator (Lindley's form) has finite risk only for p ≥ 4.
When the covariance matrix Q is unknown, the following estimator improves upon the sample mean:

    δ̂_JS^M = ( 1 - ( (p-2) / (n-p+3) ) ( X' S^{-1} X )^{-1} ) X,

where S ~ W( n-1, p, Q ) is some estimator of Q which is independent of X. Here, W(·,·,·) stands for the Wishart distribution with the respective parameters. The risk function of the above estimator is

    R( δ̂_JS^M, θ ) = p - ( (n-p+1) / (n-p+3) ) (p-2)² E_θ ||X||_Q^{-2}.

Baranchik (1971) obtained a class of minimax estimators for the multivariate normal mean; we present the improved version here, due to Strawderman (1971).
Theorem: If X is distributed as N_p( θ, I ) and if p ≥ 3, then any estimator of the form

    δ(X) = ( 1 - t( ||X||² ) / ||X||² ) X,

where 0 ≤ t(·) ≤ 2(p-2) and t is nondecreasing, is minimax.

The method of proof in both cases utilises the fact that t(·) is nondecreasing, together with some properties of the noncentral chi-squared distribution.
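For instance (an illustration appended here), the positive-part James-Stein estimator ( 1 - (p-2)/||X||² )⁺ X is a member of this class: it corresponds to t(y) = min( y, p-2 ), which is nondecreasing and bounded by 2(p-2).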
This more or less describes the nature of the developments in this area which are very closely related to the main idea in Stein (1956). Next we give a brief discussion of the Bayes and the empirical Bayes extensions of the above ideas.

Perhaps the first significant result in this direction came from Blyth (1951). He developed a technique for proving the admissibility of certain limits of Bayes rules under suitable conditions. The statement of this result in its most precise form is given in Berger (1985):
In a decision problem where the parameter space Θ is a convex subset of R^p, a decision procedure δ with continuous risk function is admissible in the class of all estimators with continuous risk function if there exists a sequence { π_n } of (generalized) priors such that:

a) the Bayes risks r_n(δ) and r_n(δ_b^n) are finite for all n (where δ_b^n is the Bayes estimate with respect to π_n);

b) for any nondegenerate convex set C ⊂ Θ, there exist K > 0 and an integer N such that ∫_C dπ_n(x) ≥ K for all n ≥ N;

c) lim_{n→∞} { r_n(δ) - r_n(δ_b^n) } = 0.
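As a classical illustration of this technique (appended here; viz. Berger (1985)), the admissibility of the sample mean X for p = 1 under squared error loss can be established by taking the (unnormalized) prior densities π_n(θ) = exp{ -θ²/(2n) }: condition (b) holds since π_n increases to 1 pointwise, and a direct computation shows that r_n(δ_0) - r_n(δ_b^n) → 0 fast enough, so that X is admissible in one dimension. The analogous construction fails for p ≥ 3, in accordance with Stein's paradox.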
Although this technique is very easy to describe and is very useful in some situations, the choice of the sequence { π_n } is often quite difficult. A heuristic method for finding this sequence can be found in Brown (1979).

Following a line of reasoning similar to that used by Blyth (1951), an interesting characterization of admissibility was obtained by Stein (1955). See also Diaconis and Stein (1983).
Stein (1955) defined the concept of strict admissibility, which reduces to usual admissibility within a certain weakly compact class of estimators, and proved that a rule δ_1 in that class is admissible if and only if a certain nonnegativity condition on the associated Bayes risk differences holds for every competing rule, where L is a loss function satisfying some regularity conditions.

Le Cam (1955) and Farrell (1968 a, b) considered similar problems later. Stein (1955) mentioned that the necessity part of this result was useful heuristically in recognizing the inadmissibility of the usual estimator of the mean of a multivariate normal distribution of dimension bigger than 2. This shows some of the intuitive appeal of this result.
Motivated primarily by the above result, Stein (1959) obtained a similar result for general location families (viz., inadmissibility of the best invariant estimator under quadratic loss). The admissibility result in one dimension had already been obtained by Karlin (1958); Stein (1959) relaxed some of these conditions.

It is important to point out that all the calculations done by Stein (1959) were under the quadratic loss. Brown (1966) considered the fixed sample size and the sequential cases and established the dichotomy result under very general loss functions in arbitrary location problems. His results actually suggest a more general dichotomy phenomenon in decision theory, but some counterexamples are also available. See Blackwell (1951), James and Stein (1960) and Brown (1966).
Brown (1971) pursued this direction further and came up with a beautiful result connecting the above phenomenon to ergodic properties of Brownian motions. It was established that in a p-variate normal location problem under quadratic loss, to each 'possibly admissible estimator' there corresponds, in a natural way, a diffusion process in p-dimensional Euclidean space. The result essentially says that the estimator is admissible if the corresponding diffusion process is recurrent. Both problems reduce to solving the same boundary value problem; that is how these two apparently different problems are connected.
Another point of view emerged in the early 70's, due to Efron and Morris. This was the empirical Bayes aspect of Stein's phenomenon. Efron and Morris (1973) considered the multivariate normal problem with normal priors whose variances are unknown. By constructing the Bayes estimator of the unknown mean and substituting an estimated value for the unknown variance appearing in the Bayes estimate, it was shown that the usual James-Stein estimator is a reasonable empirical Bayes estimator.

The above point of view might have had its origin in Brown (1971), where he showed that, in some sense, while looking for an admissible estimator one should always look at generalized Bayes rules.
Stein (1973) developed another technique for proving inadmissibility in the multivariate normal context. This technique turned out to be very useful later on, and can be described as follows. If X is distributed as N_p( θ, I ), then

    E_θ [ ( X - θ ) f(X) ] = E_θ [ ∇f(X) ]

for any real-valued differentiable function f. Now any generalized Bayes estimator of θ can be written as

    δ(X) = X + ∇ log f(X),   for some f > 0.

Thus, by the above integration-by-parts formula, one obtains (with ∇² denoting the Laplacian)

    E_θ || X + ∇ log f(X) - θ ||² = p + E_θ { 2 ∇²f / f - ||∇f||² / f² }.

Therefore, we can assert that

    p + 2 ∇²f(X) / f(X) - ||∇f(X)||² / f(X)²

is an unbiased estimate of the risk of a nearly arbitrary generalized Bayes estimator. Now, by considering a superharmonic f (i.e., ∇²f ≤ 0), one can prove the inadmissibility of X.
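To make the connection with the James-Stein estimator explicit (an illustrative remark appended here), take f(x) = ||x||^{-(p-2)} for p ≥ 3. This f is superharmonic (indeed, harmonic away from the origin), and ∇ log f(x) = -(p-2) x / ||x||², so that X + ∇ log f(X) = ( 1 - (p-2)/||X||² ) X is exactly the James-Stein estimate.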
Hudson (1978) obtained a general Stein-type identity for certain distributions in the exponential family. This opened up a new direction for research in this area. We have already observed that the main trick in Stein's identity is to be able to write

    R( X + g(X), θ ) - R( X, θ ) = E_θ 𝒟_g(X)

for some operator 𝒟. We call 𝒟_g(X) an unbiased estimator of the risk differential. In general it is not easy to find the operator 𝒟_g.

Hudson obtained an easy way of deriving the above identity for certain members of the exponential family (which includes the normal, gamma, Poisson etc., among others). He established the following identity:

    E_θ [ g(X) ( X - θ ) ] = E_θ [ a(X) g'(X) ]

for an almost arbitrary function g (g' denotes the first derivative of g). Using this identity and then solving the ensuing differential inequality (like ∇²f ≤ 0 in the normal case), some improved estimators in the multiparameter exponential families were obtained.
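In the normal case one has a(x) ≡ σ², so the identity is just Stein's lemma; this is easy to verify numerically (a sketch appended here for illustration, with σ = 1 and the test function g(x) = sin x):

    import numpy as np

    rng = np.random.default_rng(1)
    theta, n = 0.7, 1_000_000
    x = rng.normal(theta, 1.0, n)
    lhs = (np.sin(x) * (x - theta)).mean()   # E[ g(X) (X - theta) ]
    rhs = np.cos(x).mean()                   # E[ a(X) g'(X) ], with a(x) = 1 here
    print(lhs, rhs)                          # the two agree up to Monte Carlo error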
Berger (1980) considered the problem of finding improved estimators for gamma scale parameters in a simultaneous estimation situation. In developing a constructive general theory for improving upon inadmissible estimators of the natural parameters of an exponential family, two natural identities were derived. Let

    f( x, θ ) = β(θ) t(x) exp{ -θ r(x) } I_(a,b)(x)

denote the density of a certain exponential-family type distribution. Then, under certain conditions,

    E_θ [ θ h(X) ] = E_θ [ s'(X) / t(X) ]

for some differentiable h, where s(x) = t(x) h(x) / r'(x). Also,

    E_θ [ h(X) ] = E_θ [ θ r'(X) g(X) / t(X) ],

where g(x) = ∫_a^x t(y) h(y) dy. Actually, these are straightforward generalizations of the identities obtained by Hudson (1978).
Next he considered loss functions of a particular additive form, with a component loss of the same algebraic type for all i (the precise form is given in Berger (1980)). Here we are concerned with the simultaneous estimation of the parameter θ = ( θ_1, θ_2, ..., θ_p ), where X_i is distributed with p.d.f. f( x, θ_i ) for all i. An unbiased estimate of the risk differential can be found by repeated application of the above identities. The next step is then to solve the differential inequality

    Δ̂R ≤ 0,

where Δ̂R denotes the unbiased estimate of the risk differential R( δ + b, θ ) - R( δ, θ ), so that the perturbed estimator δ + b dominates δ. As an application of the general technique, the problem of simultaneous estimation of gamma scale parameters was considered.
Suppose X_1, X_2, ..., X_p are independent random variables with X_i ~ 𝒢( α_i, σ_i ); i.e., the density function is given by

    f( x, α, σ ) = σ^α x^{α-1} exp{ -σ x } / Γ(α),   0 < x < ∞.

Then:

i) Under the first loss function, an improved estimator can be constructed as follows. Define δ¹ = ( δ¹_1, ..., δ¹_p ), where the coordinates δ¹_i, i = 1, ..., p, involve a constant b > 0 (the explicit form is given in Berger (1980)). Then, for p ≥ 2 and 0 < c < 2(p-1), δ¹ dominates the usual estimator of θ.

ii) Under the second loss function, the usual estimator δ_0 described above is again inadmissible for p ≥ 3. Estimators of the form [ 1 - ··· ] δ_0 (the explicit form is again in Berger (1980)) would dominate δ_0; for p ≥ 3, such an estimator is an improved estimator for suitable choices of the constants b and c.
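Since the explicit coordinate formulas are only sketched above, it may help to note how such dominance claims are checked in practice: by Monte Carlo estimation of the two risk functions. The following generic Python sketch (appended here; the loss and the baseline estimator are placeholders, not Berger's exact choices) illustrates the pattern for gamma scale parameters:

    import numpy as np

    rng = np.random.default_rng(2)
    p, n_rep = 5, 200_000
    alpha = np.full(p, 3.0)               # known shape parameters (hypothetical)
    sigma = np.linspace(1.0, 2.0, p)      # true scale parameters (hypothetical)
    X = rng.gamma(shape=alpha, scale=1.0 / sigma, size=(n_rep, p))

    def loss(d, s):                       # placeholder quadratic loss
        return ((d - s) ** 2).sum(axis=1)

    est0 = (alpha - 1) / X                # unbiased for sigma_i when alpha_i > 1
    print(loss(est0, sigma).mean())       # baseline risk, to be compared with the
                                          # risk of a candidate improved estimator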
As pointed out by Berger (1980), improved estimation under a general setup raises some extra issues:

i) It is not necessary to have dimension three or more for improvement: there is a characteristic number corresponding to each simultaneous estimation problem (which may be 2 or 3 or even higher).

ii) The improvement may not have the general property of shrinking the original estimator towards a lower dimensional subspace or a pivot.

A better understanding of the above phenomena was pursued by Berger and others afterwards; see Berger (1982 a, b) and Brown (1980) for details.
Ghosh, Hwang and Tsui (1983) considered the construction of improved estimators under loss functions similar to those described in Berger (1980) and developed estimators with the property of shrinking the original estimator towards some prefixed point. The same group of authors considered the same problem in continuous exponential families (i.e., developing improved estimators with the property of shrinking the original estimator towards some prefixed point); see Ghosh, Hwang and Tsui (1984) for more details.
There are numerous other results in this area dealing with simultaneous estimation in exponential families; see Dasgupta and Sinha (1986), where improvements over polynomial-type estimators were sought in the gamma family. But a general theory for the construction of a reasonably large class of minimax estimators is not available yet. See Berger (1985) for an extensive literature in this area.
While going through the literature in this area it becomes quite clear that the basis of Stein's phenomenon lies in some type of expansion of the loss function. The whole idea of finding an unbiased estimator of the risk differential depends heavily on the particular algebraic form of the underlying density and the loss function. So it is reasonable to conclude that an interplay between the functional form of the loss function and the density function is the cause of the Stein effect.
We can now ask questions like, "Why does the quadratic loss fit so perfectly in normal mean problems?" or "Why is it so convenient to consider the losses introduced by Berger (1980) for the general exponential family setup?" The available literature attempting to answer the question in this generality is relatively scarce. A brief review is given below.
Berger (1976) found a broad class of minimax estimators for the multivariate normal mean under a general quadratic loss of the form

    L( θ, δ ) = ( δ - θ )' Q ( δ - θ ),

where Q is a p.d. matrix and p ≥ 3. The class of minimax estimators obtained can be described as

    δ(X) = ( 1 - r(X) ( X' C X )^{-1} B ) X,

where the matrices B, C and the function r: R^p → R satisfy some technical conditions.
Berger (1978) obtained improved estimators for the multivariate normal problem under certain polynomial-type losses. He considered those estimators δ for which it was possible to write down the risk differential ΔR = R( θ, δ ) - R( θ, X ) in terms of quantities of the form

    h_k(X) = Σ_{i=1}^p [ γ_i(X) ]^{m(k,i)} ( X_i - θ_i )^{n(k,i)},

where δ(X) = X + γ(X), γ(X) = ( γ_1(X), ..., γ_p(X) ), and m(k,i) and n(k,i) are nonnegative integers for each k and i. Such an estimator δ would be an improvement upon the standard estimator X if it belongs to the following class of estimators:

    { δ : δ(X) = ( 1 - ( d + X' C X )^{-1} B ) X }

for suitable choices of the matrices B and C and of d > 0. The technique employed
here was yet another application of the fundamental identity obtained by Stein (1973), which has already been discussed.

Brandwein and Strawderman (1980) considered the problem of estimating the location of a spherically symmetric density under loss functions of the form

    L( θ, δ ) = F( ||δ - θ||² ),

where F: R⁺ → R⁺ is a concave function. It is known that the standard estimator X is minimax with constant risk in this setting. It was found that estimators of the form

    δ_{a,r}(X) = ( 1 - a r( ||X||² ) ||X||^{-2} ) X,

where (i) 0 ≤ a ≤ 2(p-2)/p, and (ii) r is a monotone increasing function with 0 ≤ r ≤ 1, would be minimax and hence would have risk no larger than that of X.
Berger and Haff (1983) constructed minimax estimators of the multivariate normal mean for unknown covariance matrix and under arbitrary quadratic loss.

With this we complete our discussion of the development of improved estimators under various models and loss functions (when the parameter space is unrestricted, or natural). We have tried to highlight various important issues which have been emphasized by various researchers in this area during the last three decades.
1.2.2 Estimation in restricted parameter spaces:
We have already mentioned that one of our main objectives is to apply some of these techniques to restricted parameter space problems. Restricted problems, though not very frequently encountered in statistics, arise often enough to merit special attention. In this case the structural symmetry of the problem, which led people to seek improvement over the best invariant estimator, is no longer apparent. Thus, in a restricted setup, the nature of the usual estimator may change quite a bit from the unrestricted case.

As we have remarked, there is no point in looking for the best invariant estimator, and the UMVUE is no longer a reasonable estimator. So the only property of the standard estimator which might be exploited in this case is its being the MLE. We shall therefore try to improve upon the MLE in the restricted setup.
Most of the research done in this area has been devoted to solving maximum likelihood equations under various models of interest. The solution of a restricted maximum likelihood equation is often quite complicated; typically no closed form solution is available. Construction of likelihood ratio tests for multivariate one-sided alternatives has also been of some interest to researchers in this area. See Barlow et al. (1972) for an extensive literature in this area.
The normal mean problem, which will comprise a substantial part of the ensuing investigation, has been treated by several authors. Bartholomew (1961) considered the following problem in the context of multivariate one-sided testing. Suppose X_1, X_2, ..., X_p are independent normal random variables with X_i having the N( μ_i, σ_i² ) distribution for 1 ≤ i ≤ p. The problem is to find a reasonable test for

    H_0 : μ_1 = μ_2 = ... = μ_p   against   H_1 : μ_1 ≤ μ_2 ≤ ... ≤ μ_p

(or similar types of order restrictions). A generalized χ²-test statistic was derived in this paper using a simple likelihood ratio argument. The power function was computed at certain points in the alternative space, and it indicated that no further improvement was available over this likelihood ratio test. Thus, it was concluded that the utilization of the prior information of order restriction led to a considerable gain in power compared to the case where no such information was used (i.e., the unrestricted χ²-test). The test he obtained was essentially a mixture of χ² statistics. Chacko (1963) also considered similar problems.
Kudo (1963) generalized the above test to the case where the observations are dependent, although he needed the covariance structure to be known. Algorithms for constructing the likelihood ratio test statistic were given, and its distribution theory was also discussed. In many cases order-restricted problems can be reduced to a positive orthant problem (i.e., one where the parameter space is the positive orthant) by a linear transformation. So, after the reduction, the main remaining task is to solve the following optimization problem (in the normal case):

    minimize  ( X - θ )' Q^{-1} ( X - θ )   subject to the constraint  θ ≥ 0.

Here X denotes the observation vector and Q the covariance matrix.
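Computationally (an aside appended here), this quadratic program is the projection of X onto the positive orthant in the Q^{-1} metric and can be solved with standard tools; a minimal Python sketch using scipy, with illustrative values of X and Q:

    import numpy as np
    from scipy.optimize import minimize

    X = np.array([0.8, -0.3, 1.2])               # hypothetical observation vector
    Q = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.0]])              # hypothetical covariance matrix
    Qinv = np.linalg.inv(Q)

    def objective(theta):                        # (X - theta)' Q^{-1} (X - theta)
        d = X - theta
        return d @ Qinv @ d

    res = minimize(objective, x0=np.maximum(X, 0),
                   bounds=[(0, None)] * len(X))  # constraint: theta >= 0
    print(res.x)                                 # the restricted MLE of theta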
Kudo (1963) derived some geometric properties of this solution along with an algorithm for computing it. Nüesch (1966) obtained a manageable analytic expression for the solution of the above problem. The solution θ̂_ml (which is the MLE in this case) can be expressed as

    θ̂_ml = Σ_{a ⊆ N_p} ξ_a( X_{a:a'}, 0 ) I( Q^{-1}_{a'a'} X_{a'} ≤ 0, X_{a:a'} > 0 ),

where the above notations are described below.
For each a ⊆ N_p = { 1, 2, ..., p }, X_a is the |a| × 1 vector consisting of the coordinates X_i, i ∈ a. (Here |a| denotes the cardinality of the subset a; note that the above definition is vacuously satisfied for a = ∅.) Let a' denote the complement of the subset a, and assume that the subsets considered here are naturally ordered. Define:

    Q_{aa}  : the covariance matrix of X_a;
    Q_{aa'} : the cross covariance matrix between the random vectors X_a and X_{a'};
    X_{a:a'} = X_a - Q_{aa'} Q^{-1}_{a'a'} X_{a'} : the residual after regressing X_a on the vector X_{a'};
    ξ_a( X_{a:a'}, 0 ) : a p × 1 vector whose i-th coordinate is the corresponding coordinate of the vector X_{a:a'} for i ∈ a, and 0 otherwise.
Also, it can be shown that the following 2^p regions

    𝒮_a(Q) = { x : Q^{-1}_{a'a'} x_{a'} ≤ 0, x_{a:a'} > 0 },   a ⊆ N_p,

are mutually exclusive and their union is the whole of R^p.
Nüesch (1966) also derived the form of the MLE when the covariance matrix is unknown. It turns out that the MLE in this case can be expressed in the same way as before, except that the covariance matrix is now replaced by the estimated covariance matrix S (the sample dispersion matrix).
Perlman (1969) treated the same problem in a more general setup. He considered testing the null hypothesis H_0 : θ ∈ ω_1 against the alternative H_1 : θ ∈ ω_2, ω_1 ⊂ ω_2, where ω_1 and ω_2 are two positively homogeneous cones in R^p. (A positively homogeneous cone is defined as a set D ⊂ R^p such that if x ∈ D then ax ∈ D for any a > 0.) The likelihood ratio test (LRT) for this problem was derived, and the null distribution of the test statistic was obtained along with the nonnull distributions under some special alternatives. A similar program was carried out for the case where the covariance matrix is unknown. Some local power comparisons have also been done; see Sen and Saleh (1985) for references.
Obtaining improved versions of these standard estimators in the restricted setup is a relatively new area of research. The only results we are aware of at this moment are by Chang (1981, 1982), where he derived some improved estimators in some special cases. In Chapter II we will discuss his work in greater detail.
1.3 Summary of results:
In Section 1.2 we reviewed the literature relevant to the ensuing analysis.

In Chapter II we derive improved estimators for the multivariate normal mean when the mean vector is restricted by some linear inequalities. By a suitable linear transformation we can reduce the problem to the following simpler case: X is distributed as N_p( θ, Σ ), with θ' = ( θ^(1)', θ^(2)' ), where θ^(1) and θ^(2) are 1 × p_1 and 1 × p_2 vectors, respectively (p_1 + p_2 = p). Moreover, we have the restriction that θ^(1) ≥ 0.

First we deal with the case where p_2 = 0. There we obtain estimators improving upon the restricted MLE (Σ may be known or unknown). Also, we develop a class of improved estimators in this setup which is a direct generalization of the class of minimax estimators obtained by Baranchik and Strawderman in the unrestricted setup. Next we consider an example in the one-sided linear model framework. In this case p_2 is not equal to zero anymore, and we need a small modification to handle this situation. Finally, we study the risk dominance picture of various competing estimators.
In Chapter III we consider a simultaneous estimation situation (where we have p independent estimation problems and consider the simultaneous estimation of the parameters in the compound problem under the additive loss structure). Here we deal with the following model: X_1, ..., X_p are independent random variables, where X_i is distributed according to the probability law P^i_{θ_i}, 1 ≤ i ≤ p, and the θ_i are unknown real-valued parameters taking values in an open interval of the real line. For the sake of mathematical simplicity we assume that

    E_{θ_i} X_i = θ_i,   1 ≤ i ≤ p.

The most crucial assumption we will make in the above setup is that

    sup_i sup_{θ_i} σ_i(θ_i) / |θ_i| < ∞,

where σ_i(θ_i) stands for the standard deviation of X_i. We will refer to these types of models as models with bounded coefficients of variation.
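For instance (an example appended here for concreteness), if X_i has a gamma distribution with a fixed shape α > 0 and mean θ_i > 0, then σ_i(θ_i)/θ_i = α^{-1/2} for all θ_i, so the assumption holds; by contrast, in the Poisson case σ_i(θ_i)/θ_i = θ_i^{-1/2} is unbounded near the origin.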
Our objective here is to estimate the vector of parameters θ = ( θ_1, θ_2, ..., θ_p ) simultaneously under an additive loss function. The main result of this chapter shows that, given a componentwise estimator of θ in this setup (i.e., one in which θ_i is estimated by a function of X_i only), we can always get an improvement if p is sufficiently large (i.e., any componentwise estimator is inadmissible for large p). Also, we derive the form of the improved estimator. This practically generalizes all the existing results dealing with improved estimation in simultaneous estimation problems. On the other hand, because of its universality, this might not give the best form of the improvement in some special cases.
Finally, in Chapter IV we address the question concerning the admissibility of componentwise estimators in low dimensions. These types of dichotomy results are known for the best invariant estimator in location or scale families under invariant loss functions (for example, in a location problem the best invariant estimator is admissible for p ≤ 2 and inadmissible otherwise (Brown (1966))). We derive a unified admissibility result for some multivariate parametric models. This essentially generalizes an analogous result obtained by Brown and Hwang (1982) for multiparameter exponential families under the quadratic loss. Using that, we obtain a dichotomy result for (generalized) componentwise Bayes estimators in location models.
CHAPTER II
SHRINKAGE ESTIMATION IN A RESTRICTED PARAMETER SPACE
2.1. Introduction:
Under a quadratic loss, improved estimation of the mean vector (θ) of a p(≥3)-variate normal distribution has received a great deal of attention (see Stein (1956), James and Stein (1961), and the extensive literature cited in Berger (1985)). A variant of this problem, relating to a restricted parameter space (Θ ⊂ R^p), is treated here. In the unrestricted case, i.e., for Θ = R^p, shrinkage (or Stein-rule) estimators (SRE) of θ dominate the classical maximum likelihood estimator (MLE). In the restricted case, the pivot (θ_0) underlying the construction of SREs may not be an inner point of Θ [a situation quite comparable to the hypothesis testing problem against a restricted alternative, where under the null hypothesis θ may lie on the boundary of Θ], so that the unrestricted MLE may not be generally optimal. Generally, the restricted MLE (RMLE), derived under the parameter restraints, performs better than the unrestricted MLE (UMLE) on the restricted parameter space, although a different picture may emerge on the complementary parameter space (i.e., R^p \ Θ). One may therefore wonder: can the RMLE be dominated (in risk) in the same manner as the UMLE is dominated by an SRE? The primary objective of the current study is to focus on this basic issue.
To motivate, we may mention some typical problems where the RMLE is advocated.

(i) Ordered alternative problem: Here Θ = { θ ∈ R^p : θ_1 ≤ ... ≤ θ_p }, so that the pivot (or null hypothesis) relates to θ_1 = ... = θ_p, i.e., θ lying on a line (R^1). The RMLE computed under the ordered restrictions is termed the isotonic MLE [viz., Barlow et al. (1972)].

(ii) Orthant alternative problem: Here Θ = { θ ∈ R^p : θ ≥ θ_0 } for some specified θ_0 ∈ R^p. Thus, the RMLE is computed under the orthant restraint that θ ≥ θ_0. We may refer to Kudo (1963), Nüesch (1966) and Perlman (1969) for various properties of the RMLE in restricted models.
It is well anticipated (and demonstrated in some specific cases) that over a restricted parameter space Θ the RMLE performs better than the UMLE. This raises the question: under a quadratic loss and confined to Θ, is the RMLE admissible? We shall provide a negative answer by showing that for p ≥ 3 a Stein-type shrinkage of the RMLE is indeed possible. An explicit expression for this improved RMLE and its risk function are also provided.
Along with the preliminary notions, the RMLE is considered in Section 2. Section 3 deals with the proposed restricted shrinkage MLE (RSMLE) in the case where the covariance matrix (Σ) is known and Θ = (R⁺)^p = { θ : θ ≥ 0 }. Some earlier results in this situation are also considered. Chang (1983) dealt with the case where Σ = I, and established the inadmissibility of the coordinatewise admissible estimator (obtained earlier by Katz (1961)); Judge and Yancey (1986) developed a single-inequality restricted Stein-rule estimator. These will be considered in Section 3, where the dominance of the RSMLE over the RMLE is established. A general class of restricted minimax estimators (of θ over Θ⁺) is also considered in the same section. Section 4 is devoted to the case of Σ = σ²V, V known and σ² unknown, and the results of Section 3 are extended to this model. The more general case of Σ arbitrary (and unknown) is then treated in Section 5. Utilizing some intricate properties of the Wishart distribution (of sub-matrices), the dominance of the RSMLE over the RMLE is established for this general model too. More general forms of the restricted space Θ, arising particularly in some common linear models, are treated in Section 6. Finally, in Section 7, the usual SMLE and the proposed RSMLE are compared in the light of their risks. While the SMLE is invariant under non-singular transformations, the RSMLE is not. This eliminates the possibility of a canonical reduction of Σ (on Θ⁺), and hence the relative picture is drawn for diverse situations dealing with appropriate subspaces of Θ⁺.
2.2 Preliminary notions:
For every positive integer p, let

(2.1)    N_p = { 1, ..., p }.

By a ⊆ N_p we mean a naturally ordered subset of N_p, and a' = N_p \ a, the complement of a, is also taken in its naturally ordered version. Further, |a|, the cardinality of a, stands for the number of elements of a; thus |N_p| = p, |a'| = p - |a|, and |∅| = 0 (∅ = null set), 0 ≤ |a| ≤ p. For later use, we also define, for p ≥ 3,

(2.2)    N*_p = { a : a ⊆ N_p and |a| > 2 }.

For a p-vector X = ( X_1, ..., X_p )', we define for every a ⊆ N_p the |a|-vector X_a consisting of the components X_i, i ∈ a; the case a = ∅ can be treated conventionally.
For every a ⊆ N_p such that |a| = r (0 ≤ r ≤ p), if u and v are r- and (p-r)-vectors, respectively, then ξ_a( u, v ) is the p-vector such that

(2.3)    the coordinates of ξ_a( u, v ) in a are those of u, and the coordinates in a' are those of v,

where the order in which the components of u and v enter is determined by the natural ordering.
For a p × p positive definite (p.d.) matrix Q and for a ⊆ N_p, b ⊆ N_p, Q_{ab} denotes the |a| × |b| submatrix of Q consisting of the rows in a and the columns in b. Further, for a p-vector X and a p.d. matrix Q, for every a ⊆ N_p, b ⊆ N_p, let

(2.4)    X_{a:b} ( = X_{a:b}(Q) ) = X_a - Q_{ab} Q^{-1}_{bb} X_b ,

(2.5)    Q_{a:b} = Q_{aa} - Q_{ab} Q^{-1}_{bb} Q_{ba} .

Let then

(2.6)    𝒮_a(Q) = { X ∈ R^p : Q^{-1}_{a'a'} X_{a'} ≤ 0, X_{a:a'}(Q) > 0 },   ∅ ⊆ a ⊆ N_p .

Then R^p = ∪_{∅ ⊆ a ⊆ N_p} 𝒮_a, where the 𝒮_a are disjoint [viz., Kudo (1963)].
- - p
We consider here specifically the orthant model for which
(2.7)
e
= e+=
. .{O
.. €
RP : ....
0 ~ ....
O}.
A general class of restricted alternative models can be reduced to this
specific model. and we shall make more detailed comments in section 6.
Suppose now that ....
X •...• -n
X are independent and identically distributed
1
(i.i.d.) random vectors (r.v.) having the p-variate normal distribution
~
with mean vector! and dispersion matrix
(p.d.).
Note that the UMLE
of ....
0 is given by
(2.8)
!n
Note that
= n
-1 ...n
~i=1!i
....X
is unbiased for ....
0 but not admissible for p ~ 3.
and
!n
~
-1
Np(!.n
~).
In the case of Σ known, the RMLE of θ is given by

(2.9)    θ̂_RM = Σ_{∅ ⊆ a ⊆ N_p} ξ_a( X̄_{a:a'}(Σ), 0 ) 1( X̄_n ∈ 𝒮_a(Σ) ),

with the notations adapted from (2.1), (2.3) and (2.6) [viz., Nüesch (1966)]. The RMLE is not unbiased for θ.

Consider next the case of Σ unknown. Note that ( X̄_n, S_n ) are jointly sufficient for ( θ, Σ ), X̄_n and S_n are mutually independent,

(2.10)    X̄_n ~ N_p( θ, n^{-1} Σ ),

and S_n has a Wishart distribution (2.11). The UMLE of θ is still X̄_n. Restricted to Θ⁺, the RMLE of θ in this case is given by [viz., Nüesch (1966)]

(2.12)    θ̂*_RM = Σ_{∅ ⊆ a ⊆ N_p} ξ_a( X̄_{a:a'}( S_n ), 0 ) 1( X̄_n ∈ 𝒮_a( S_n ) ).

Note that in (2.9) and (2.12), only one (out of the 2^p) indicator functions is equal to 1, so that we actually have a single (random) a, written in this closed form. In the next three sections we shall study the dominance of the RMLE (over the UMLE) and propose some RSMLEs which dominate the RMLE. In this context, we introduce the quadratic loss function

(2.13)    L( θ̂, θ ) = ( θ̂ - θ )' Σ^{-1} ( θ̂ - θ ),

so that for an estimator θ̂ of θ the risk is given by

(2.14)    R( θ̂, θ ) = E_θ L( θ̂, θ ).

Note that an estimator θ̂_1 dominates another estimator θ̂_2 (of the common θ) if R( θ̂_1, θ ) ≤ R( θ̂_2, θ ) for all θ ∈ Θ, with the strict inequality sign holding for some θ.
2.3 RSMLE when Σ is known:
For simplicity of notation we take n = 1, i.e., X ~ N_p( θ, Σ ). Note that { 𝒮_a(Σ) : ∅ ⊆ a ⊆ N_p } is a partitioning of R^p into 2^p disjoint subsets. Also, on 𝒮_a(Σ) the problem of estimating θ under the constraint θ ≥ 0 essentially reduces to solving an |a|-dimensional normal location problem without a constraint (as on 𝒮_a(Σ), X_{a:a'}(Σ) > 0 and it coincides with the UMLE in an |a|-dimensional problem). Thus, adapting the classical James-Stein (1961) shrinkage version individually on each of the 2^p subsets 𝒮_a(Σ), ∅ ⊆ a ⊆ N_p (with the shrinkage factor depending on the respective |a|), we propose the following RSMLE of θ (in the case of known Σ):

(2.15)    θ̂_RSM = Σ_{∅ ⊆ a ⊆ N_p} { 1 - c_a ( θ̂'_RM Σ^{-1} θ̂_RM )^{-1} } ξ_a( X_{a:a'}(Σ), 0 ) 1( X ∈ 𝒮_a(Σ) ),

where

(2.16)    0 < c_a < 2( |a| - 2 )   if |a| > 2,   and   c_a = 0   for |a| ≤ 2.
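To fix ideas, here is a minimal computational sketch (appended for illustration; it is not part of the original text) for the special case Σ = I, where X ∈ 𝒮_a(I) reduces to X_a > 0 and X_{a'} ≤ 0, the RMLE is the positive part of X, and X_{a:a'} = X_a; the choice c_a = ( |a|-2 )⁺ anticipates Remark 1 below:

    import numpy as np

    def rsmle_identity_cov(x):
        # RSMLE of theta for X ~ N_p(theta, I), theta >= 0 (sketch, Sigma = I)
        pos = x > 0                      # active set a = { i : X_i > 0 }
        rmle = np.where(pos, x, 0.0)     # restricted MLE: positive part of X
        c_a = max(pos.sum() - 2, 0)      # shrinkage constant (|a|-2)^+
        norm2 = (rmle ** 2).sum()        # ||rmle||^2 = ||X_a||^2 here
        if c_a > 0 and norm2 > 0:
            rmle = (1 - c_a / norm2) * rmle
        return rmle

    print(rsmle_identity_cov(np.array([2.0, -0.5, 1.5, 0.7, -1.1])))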
In this context, one should compare the above form of the RSMLE with the type of improved estimator suggested by Chang (1983); it is quite comparable to the form of the proposed estimator in his Theorem 1. For δ⁰_i(X) = X_i + t(X_i), i = 1, 2, ..., p, Chang suggested the improved version

    δ¹_i(X) = X_i + t(X_i) - c X_i / ||X||²   if X_j > 0 for all j,   and   δ¹_i(X) = X_i + t(X_i)   otherwise,

for p ≥ 3 and 0 < c < 2(p-2). Here the improvement is considered only on the subset 𝒮_a(Σ) with a = N_p in our notation, and the nature of the improvement is quite analogous. But in (2.15) we get some improvement in the risk on each 𝒮_a(Σ), a ⊆ N_p, |a| ≥ 3. So it is quite natural to expect that a modification of the type (2.15) will dominate the above estimator. This will be elaborated further in Remark 6.
Our main intention is to prove the following.

THEOREM 2.1. Under the quadratic loss in (2.13), for every p ≥ 3,

(2.17)    R( θ̂_RSM, θ ) ≤ R( θ̂_RM, θ ),   for all θ ∈ Θ⁺,

where the strict inequality holds in a neighborhood of the pivot 0 (∈ Θ⁺). Thus, over Θ⁺, the RSMLE dominates the RMLE.
For the proof of this theorem and of some other results (to follow), we need a basic lemma, which is presented first. Write

(2.18)    ||x||_Σ² = <x, x>_Σ ,   where   <x, y>_Σ = x' Σ^{-1} y ,

and

(2.19)    ψ_k( θ, Σ ) = E_{0,Σ} { <θ, X>_Σ^k ||X||_Σ^{-k} 1( X > 0 ) },   k ≥ 0,

where E_{0,Σ} denotes expectation under X ~ N_p( 0, Σ ).

Lemma 2.1. Assume that X ~ N_p( θ, Σ ), Σ p.d. Then, for every (r, s) with s ≥ 0 and p + r + s > 0,

(2.20)    E_θ { ||X||_Σ^r <θ, X>_Σ^s 1( X > 0 ) } = exp{ -½ ||θ||_Σ² } Σ_{k≥0} (k!)^{-1} ψ_{k+s}( θ, Σ ) 2^{(r+s+k)/2} Γ( (r+s+k+p)/2 ) / Γ( p/2 ).
Proof of Lemma 2.1: By the Cauchy-Schwarz inequality applied to <θ, X>_Σ and the dominated convergence theorem, the above expectation is finite if s ≥ 0 and r + s + p > 0. Expanding exp{ <θ, X>_Σ } and collecting terms, the left-hand side of (2.20) equals

(2.21)    exp{ -½ ||θ||_Σ² } Σ_{k≥0} (k!)^{-1} E_{0,Σ} { <θ, X>_Σ^{s+k} ||X||_Σ^r 1( X > 0 ) }.

Now it is a known fact that under N_p( 0, Σ ) the pair ( X / ||X||_Σ , 1( X > 0 ) ) is jointly independent of ||X||_Σ. So (2.21) can be rewritten (after multiplying and dividing the integrand by ||X||_Σ^{s+k}) as

(2.22)    exp{ -½ ||θ||_Σ² } Σ_{k≥0} (k!)^{-1} E_{0,Σ} [ <θ, X>_Σ^{s+k} ||X||_Σ^{-(s+k)} 1( X > 0 ) ] E_{0,Σ} [ ||X||_Σ^{r+s+k} ]

          = exp{ -½ ||θ||_Σ² } Σ_{k≥0} (k!)^{-1} ψ_{k+s}( θ, Σ ) 2^{(r+s+k)/2} Γ( (r+s+k+p)/2 ) / Γ( p/2 ),

which completes the proof of the lemma.
Proof of Theorem 2.1. By (2.13), (2.14), (2.15) and the disjoint nature of the 𝒮_a(Σ), a ⊆ N_p, on defining N*_p as in (2.2), we obtain that for p ≥ 3,

(2.23)    R( θ̂_RSM, θ ) = R( θ̂_RM, θ ) - 2 Σ_{a ∈ N*_p} E_θ { c_a 1( X ∈ 𝒮_a(Σ) ) } + 2 Σ_{a ∈ N*_p} E_θ { c_a <θ, θ̂_RM>_Σ ||θ̂_RM||_Σ^{-2} 1( X ∈ 𝒮_a(Σ) ) } + Σ_{a ∈ N*_p} E_θ { c_a² ||θ̂_RM||_Σ^{-2} 1( X ∈ 𝒮_a(Σ) ) }.

Thus, it suffices to show that for every θ ∈ Θ⁺ the net contribution of the last three terms is negative. Towards this, note that

(2.24)    Σ_{a ∈ N*_p} E_θ { c_a 1( X ∈ 𝒮_a(Σ) ) } = Σ_{a ∈ N*_p} c_a P_θ { X_{a:a'}(Σ) > 0 } P_θ { Σ^{-1}_{a'a'} X_{a'} ≤ 0 },

as X_{a:a'}(Σ) and X_{a'} are mutually independent for every a ⊆ N_p. Thus, by (2.20) with p = |a| and r = s = 0, (2.24) reduces to

(2.25)    Σ_{a ∈ N*_p} c_a P_θ { Σ^{-1}_{a'a'} X_{a'} ≤ 0 } exp{ -½ ||θ_{a:a'}(Σ)||²_{Σ_{a:a'}} } Σ_{k≥0} (k!)^{-1} ψ_k( θ_{a:a'}(Σ), Σ_{a:a'} ) 2^{k/2} Γ( (k+|a|)/2 ) / Γ( |a|/2 ).
Similarly,

(2.26)    Σ_{a ∈ N*_p} E_θ { c_a <θ, θ̂_RM>_Σ ||θ̂_RM||_Σ^{-2} 1( X ∈ 𝒮_a(Σ) ) } = Σ_{a ∈ N*_p} c_a E_θ { 1( X_{a:a'}(Σ) > 0, Σ^{-1}_{a'a'} X_{a'} ≤ 0 ) <θ, θ̂_RM>_Σ ||θ̂_RM||_Σ^{-2} },

where we may note that on 𝒮_a(Σ), θ̂_RM = ξ_a( X_{a:a'}(Σ), 0 ),

(2.27)    ||θ̂_RM||_Σ² = X'_{a:a'} Σ^{-1}_{a:a'} X_{a:a'} ,   X_{a:a'} = X_{a:a'}(Σ),

and

(2.28)    <θ, θ̂_RM>_Σ = ( θ'_a Σ^{aa} + θ'_{a'} Σ^{a'a} ) X_{a:a'}(Σ) = θ'_{a:a'} Σ^{-1}_{a:a'} X_{a:a'}(Σ),

where we have used the identities

(2.29)    Σ^{aa} = Σ^{-1}_{a:a'}   and   Σ^{a'a} = -Σ^{-1}_{a'a'} Σ_{a'a} Σ^{-1}_{a:a'} ,

Σ^{ab} denoting the (a, b) block of Σ^{-1}. Thus, again using (2.20) (now with r = -2 and s = 1), (2.27) and (2.28), (2.26) reduces to
(2.30)    Σ_{a ∈ N*_p} c_a P_θ { Σ^{-1}_{a'a'} X_{a'} ≤ 0 } exp{ -½ ||θ_{a:a'}(Σ)||²_{Σ_{a:a'}} } Σ_{k≥0} (k!)^{-1} ψ_{k+1}( θ_{a:a'}(Σ), Σ_{a:a'} ) 2^{(k-1)/2} Γ( (k+|a|-1)/2 ) / Γ( |a|/2 ).

Further, by using (2.20) with s = 0, r = -2 and p = |a| (so that r + s + |a| > 0 for all a ∈ N*_p), the last term of (2.23) can similarly be reduced to

(2.31)    Σ_{a ∈ N*_p} c_a² P_θ { Σ^{-1}_{a'a'} X_{a'} ≤ 0 } exp{ -½ ||θ_{a:a'}(Σ)||²_{Σ_{a:a'}} } Σ_{k≥0} (k!)^{-1} ψ_k( θ_{a:a'}(Σ), Σ_{a:a'} ) 2^{(k-2)/2} Γ( (k+|a|-2)/2 ) / Γ( |a|/2 ).
Therefore, by (2.25), (2.30) and (2.31), we obtain from (2.23) that

(2.32)    R( θ̂_RSM, θ ) - R( θ̂_RM, θ ) = -Σ_{a ∈ N*_p} c_a ( 2(|a|-2) - c_a ) P_θ { Σ^{-1}_{a'a'} X_{a'} ≤ 0 } exp{ -½ ||θ_{a:a'}(Σ)||²_{Σ_{a:a'}} } Σ_{k≥0} (k!)^{-1} ψ_k( θ_{a:a'}(Σ), Σ_{a:a'} ) 2^{(k-2)/2} Γ( (k+|a|-2)/2 ) / Γ( |a|/2 ) ≤ 0,

as on N*_p, |a| > 2 and 0 < c_a < 2( |a|-2 ) (the series in k is nonnegative, being, by (2.20) with r = -2, s = 0, proportional to E_θ { ||X_{a:a'}||^{-2}_{Σ_{a:a'}} 1( X_{a:a'} > 0 ) }). This completes the proof of the theorem.
REMARK 1. Note that in (2.32), for each a ∈ N*_p, the factor c_a( 2|a| - 4 - c_a ) is a maximum when c_a = ( |a|-2 )⁺ = max{ |a|-2, 0 }. Thus, on letting c_a = ( |a|-2 )⁺, for a ⊆ N_p, we obtain that the maximum value of (2.32) (pointwise in θ ∈ Θ⁺) corresponds to the risk reduction

(2.33)    Σ_{a ∈ N*_p} ( |a|-2 )² P_θ { Σ^{-1}_{a'a'} X_{a'} ≤ 0 } exp{ -½ ||θ_{a:a'}(Σ)||²_{Σ_{a:a'}} } Σ_{k≥0} (k!)^{-1} ψ_k( θ_{a:a'}(Σ), Σ_{a:a'} ) 2^{(k-2)/2} Γ( (k+|a|-2)/2 ) / Γ( |a|/2 ).

This form is very similar to that for the classical James-Stein estimator in the unrestricted case, where the reduction in risk is

(2.34)    ( p-2 )² E_θ ||X||_Σ^{-2} .

Thus, we see that in the restricted case (of Θ⁺) the improvements in the various |a|-dimensional problems (for |a| > 2) add up to the total one in (2.33).
REMARK 2. As expected, the risk reduction due to shrinkage is a maximum at the pivot 0, where (2.33) reduces to

(2.35)    Σ_{a ∈ N*_p} ( |a|-2 ) P_0 { X ∈ 𝒮_a(Σ) }.

As θ moves away from 0 (inside Θ⁺), (2.32) converges to 0; this is because both the RMLE and the RSMLE are minimax.
REMARK 3. In (2.15) we have considered the case n = 1. If X is replaced by X̄_n, the only change in (2.15) would be to replace θ̂'_RM Σ^{-1} θ̂_RM by n θ̂'_RM Σ^{-1} θ̂_RM. As for the reduction in risk in (2.32), it still holds if in (2.13) we choose the loss as n( θ̂ - θ )' Σ^{-1} ( θ̂ - θ ).

REMARK 4.
A positive-rule version of (2.15) may also be considered:

(2.36)    θ̂⁺_RSM = Σ_{∅ ⊆ a ⊆ N_p} { 1 - c_a ( θ̂'_RM Σ^{-1} θ̂_RM )^{-1} }⁺ ξ_a( X_{a:a'}(Σ), 0 ) 1( X ∈ 𝒮_a(Σ) ).

Here also, the (lower) truncation of the shrinkage factor is made to depend on the sets 𝒮_a(Σ), ∅ ⊆ a ⊆ N_p. Writing { 1 - c_a ( θ̂'_RM Σ^{-1} θ̂_RM )^{-1} }⁺ = 1( θ̂'_RM Σ^{-1} θ̂_RM ≥ c_a ) { 1 - c_a ( θ̂'_RM Σ^{-1} θ̂_RM )^{-1} } and virtually repeating the proof of Theorem 2.1, it follows that

(2.37)    R( θ̂⁺_RSM, θ ) ≤ R( θ̂_RSM, θ ),   for all θ ∈ Θ⁺,

so that θ̂⁺_RSM dominates θ̂_RSM over Θ⁺. However, for an arbitrary θ in R^p [instead of Θ⁺], (2.37) may not hold [and the picture is quite comparable to the unrestricted case].
REMARK 5. Chang (1983) considered different versions of restricted shrinkage estimators. If X has a normal distribution with mean μ (≥ 0) and variance 1, then by Katz (1961) one can obtain an admissible estimator of μ by considering

(2.38)    δ(X) = X + φ_0(X),   where   φ_0(X) = exp{ -½X² } / ∫_{-∞}^X exp{ -½u² } du .

Now, for a multivariate random variable X with mean θ (≥ 0) and dispersion matrix I, he considered the following estimators:

(2.39)    δ⁰_i(X) = X_i + t(X_i),   for i = 1, 2, ..., p,

and

(2.40)    δ¹_i(X) = δ⁰_i(X) - c X_i / ||X||²   if X_j > 0 for all j,   and   δ¹_i(X) = δ⁰_i(X)   otherwise.

(Note that (2.38) is included as a special case if we allow any t ≥ 0.) Then, by Chang, in the simultaneous estimation of μ under the quadratic error loss, (2.40) dominates (2.39) for p ≥ 3 and 0 < c < 2(p-2). If we consider the form of improvement analogous to (2.15), we get a further reduction in the risk function. The natural alternative here is:
    δ²_i(X) = X_i + t(X_i) - c_a X_i / ||X_a||²,   for i ∈ a, on 𝒮_a(I), a ⊆ N_p .

And the nonnegativity of the corresponding risk reduction,

    Σ_{a ∈ N*_p} E_θ [ { Σ_{i∈a} ( X_i + t(X_i) - θ_i )² - Σ_{i∈a} ( X_i + t(X_i) - c_a X_i / ||X_a||² - θ_i )² } 1( X ∈ 𝒮_a(I) ) ] ≥ 0,

follows from the nonnegativity of t(·) and Theorem 2.1.
REMARK 6. In Theorem 2 of Chang (1983), another type of improvement is suggested, with 0 < c < 2(p-2) and p ≥ 3. We can obtain a (2.15)-type improvement here as well. Define, for a ⊆ N_p, on 𝒮_a(Σ),

(2.43)    δ³_i(X) = X_i + φ_0(X_i) - c_a ( X_i + φ_0(X_i) ) / Σ_{j∈a} ( X_j + φ_0(X_j) )²   if i ∈ a,   and   δ³_i(X) = X_i + φ_0(X_i)   if i ∉ a,

where 0 ≤ c_a ≤ 2( |a|-2 )⁺ and p ≥ 3.
Then, by manipulations similar to those above,

(2.44)    R( δ⁰, θ ) - R( δ³, θ ) = Σ_{a ⊆ N_p} c_a P( X_{a'} ≤ 0 ) ΔR_a ,

where, denoting E_θ { f( X_a ) 1( X_a > 0 ) } by E*_θ f( X_a ), one obtains a representation (2.45) for ΔR_a. Now, an application of integration by parts with respect to the i-th component in the first term of (2.45) yields (2.46). Using the fact that φ_0(·) ≥ 0 and following the same line of argument given in Theorem 2 of Chang (1983), it follows that

    ΔR_a > 0   for a ∈ N*_p .

Hence, we get the desired result.
Motivated by the positive-rule shrinkage estimators and by the general class of minimax estimators (in the unrestricted case) introduced by Baranchik (1970) and Strawderman (1971), we are tempted to consider more general minimax estimators in the restricted case. In the unrestricted case, θ̂_JS as well as the other minimax estimators are equivariant under nonsingular transformations X → Y = BX, B nonsingular (Σ → BΣB' = Σ*). This equivariance may not generally hold in the restricted case; i.e., Θ⁺ is not generally invariant under θ → Bθ. This makes it difficult to use a canonical reduction (on θ, Σ) to establish the desired result. However, we are able to establish a general result under an additional condition on ( θ, Σ ). Let us define the ψ_k( θ, Σ ) as in (2.19), and let

(2.46)    Θ*₊ = { θ ∈ Θ⁺ : ψ_k( θ_{a:a'}(Σ), Σ_{a:a'} ) ≥ 0 for all k ≥ 0 and every a ∈ N*_p }.

It may be noted that in order for Θ*₊ to be non-empty it suffices to assume that

(2.47)    Σ_{a:a'} is diagonal   and   Σ^{-1}_{a:a'} θ_{a:a'}(Σ) ≥ 0,   for every a ∈ N*_p ,

and this is, in particular, true when Σ_{aa'} = 0 for all a ∈ N*_p (so that Σ_{a:a'} = Σ_{aa} and θ_{a:a'} = θ_a ≥ 0). Thus, the theorem to follow holds for the whole of Θ⁺ when Σ is a diagonal matrix.
For every a, ∅ ⊂ a ⊆ N_p, let t_a(y) : R⁺ → [ 0, 2( |a|-2 )⁺ ] be a nondecreasing, nonnegative and bounded function of y, and for p ≥ 3 let

(2.48)    θ̂* = Σ_{∅ ⊆ a ⊆ N_p} { 1 - 1( |a| > 2 ) t_a( ||θ̂_RM||_Σ² ) ||θ̂_RM||_Σ^{-2} } ξ_a( X_{a:a'}(Σ), 0 ) 1( X ∈ 𝒮_a(Σ) ).

Note that if we let t_a(y) = y 1( 0 < y ≤ c_a ) + c_a 1( y > c_a ), a ∈ N*_p, then (2.48) reduces to (2.36), so that the positive-rule estimator is a member of this class.
Let ~ '" Np(~'~) where ~ is such that ~ €
TIIEOREM 2.2.
e:
then
is a
in (2.38).
Also. assume that ta(y) is non-negative and monotone increasing in y
(€ R+) and
(2.49)
0
~ ta(y) ~ 2(lal-2). VaS N:.
~
Then. for p
e:)
~
3. any estimator
and hence is minimax (over
Proof.
~
A
of the form (2.40) dominates
~ (~
e:).
We very much follow the lines of the proof of Theorem 2.1.
Parallel to (2.23), we have here

(2.50)    R( θ̂*, θ ) = R( θ̂_RM, θ ) - 2(I) + 2(II) + (III),

where, compared to (2.25), (2.30) and (2.31), we now have (writing U = ||X_{a:a'}||²_{Σ_{a:a'}} and suppressing the argument Σ)

(2.51)    (I) = Σ_{a ∈ N*_p} P_θ { Σ^{-1}_{a'a'} X_{a'} ≤ 0 } exp{ -½ ||θ_{a:a'}||²_{Σ_{a:a'}} } Σ_{k≥0} (k!)^{-1} ψ_k( θ_{a:a'}, Σ_{a:a'} ) E_0 [ U^{k/2} t_a(U) ],

(2.52)    (II) = Σ_{a ∈ N*_p} P_θ { Σ^{-1}_{a'a'} X_{a'} ≤ 0 } exp{ -½ ||θ_{a:a'}||²_{Σ_{a:a'}} } Σ_{k≥0} (k!)^{-1} ψ_{k+1}( θ_{a:a'}, Σ_{a:a'} ) E_0 [ U^{(k-1)/2} t_a(U) ],

(2.53)    (III) = Σ_{a ∈ N*_p} P_θ { Σ^{-1}_{a'a'} X_{a'} ≤ 0 } exp{ -½ ||θ_{a:a'}||²_{Σ_{a:a'}} } Σ_{k≥0} (k!)^{-1} ψ_k( θ_{a:a'}, Σ_{a:a'} ) E_0 [ U^{(k-2)/2} t_a²(U) ].

Thus, from (2.50) through (2.53), we obtain that

(2.54)    R( θ̂_RM, θ ) - R( θ̂*, θ ) = Σ_{a ∈ N*_p} P_θ { Σ^{-1}_{a'a'} X_{a'} ≤ 0 } exp{ -½ ||θ_{a:a'}||²_{Σ_{a:a'}} } Σ_{k≥0} (k!)^{-1} ψ_k( θ_{a:a'}, Σ_{a:a'} ) E_0 { t_a(U) [ 2U^{k/2} - 2k U^{k/2-1} - U^{k/2-1} t_a(U) ] }.
Now, under θ = 0,

(2.55)    U = ||X_{a:a'}||²_{Σ_{a:a'}} ~ χ²_{|a|} ,   for every ∅ ⊂ a ⊆ N_p .

Therefore, for every a ∈ N*_p,

(2.56)    E_0 { t_a(U) [ 2U^{k/2} - 2k U^{k/2-1} - U^{k/2-1} t_a(U) ] }
          = { 2^{|a|/2} Γ( |a|/2 ) }^{-1} ∫_0^∞ e^{-u/2} u^{|a|/2-1} [ t_a(u) { 2u^{k/2} - 2k u^{k/2-1} - u^{k/2-1} t_a(u) } ] du
          = 2^{k/2-1} [ Γ( (|a|+k-2)/2 ) / Γ( |a|/2 ) ] E { t_a( χ²_{|a|+k-2} ) [ 2χ²_{|a|+k-2} - 2k - t_a( χ²_{|a|+k-2} ) ] }
          ≥ 2^{k/2-1} [ Γ( (|a|+k-2)/2 ) / Γ( |a|/2 ) ] E { t_a( χ²_{|a|+k-2} ) [ 2χ²_{|a|+k-2} - 2( |a|+k-2 ) ] },

as by (2.49), t_a(u) ≤ 2( |a|-2 ). Now t_a(y) and 2y - 2( |a|+k-2 ) are both monotone nondecreasing in y, and E χ²_q = q for all q ≥ 0. Therefore, the right-hand side of (2.56) is greater than or equal to

(2.57)    2^{k/2-1} [ Γ( (|a|+k-2)/2 ) / Γ( |a|/2 ) ] E { t_a( χ²_{|a|+k-2} ) } E { 2χ²_{|a|+k-2} - 2( |a|+k-2 ) } = 0,   for all k ≥ 0, a ∈ N*_p ,

as |a|+k-2 ≥ 0 for all a ∈ N*_p. Finally, by (2.46), the ψ_k( θ_{a:a'}, Σ_{a:a'} ) are all nonnegative.
Hence, (2.54) is nonnegative too.
This completes the
proof of the theorem.
In passing, we may remark that besides the case of diagonal Σ, (2.47) [and hence (2.46)] may also hold in other situations (viz., ordered alternatives, intra-class correlation models, etc.).
2.4 RSMLE when Σ = σ²V, V known:
In this case we consider the reduced model:

(2.58)    X̄ ~ N_p( θ, σ²V ),   m S²/σ² ~ χ²_m ,   X̄ and S² mutually independent,

and we want to estimate θ subject to θ ∈ Θ⁺, with a plausible null pivot. We take s² = m(m+2)^{-1} S², set Σ̂ = s²V, and define θ̂_RM as well as the 𝒮_a(Σ̂) accordingly. Then, in (2.15), we replace Σ by Σ̂ and denote the resulting estimator by θ̃_RSM. Note that 𝒮_a(Σ̂) = 𝒮_a(Σ) for all ∅ ⊆ a ⊆ N_p (by the positive homogeneity of these sets); thus the only adjustment is made with the test statistic.
If we proceed as in the proof of Theorem 2.1, we obtain here

(2.59)    R( θ̃_RSM, θ ) = R( θ̂_RM, θ ) - 2 Σ_{a ∈ N*_p} E_{θ,σ} { c_a ( s²/σ² ) 1( X̄ ∈ 𝒮_a(Σ) ) } + (the analogues of the terms in (2.30) and (2.31), carrying factors s²/σ² and ( s²/σ² )², respectively).

Next, note that (i) s² and X̄ are independent, and (ii), by (2.58),

(2.60)    E( s²/σ² ) = E( ( s²/σ² )² ) = m/(m+2).

Thus, we may take the expectation of ( s²/σ² ) [or of ( s²/σ² )²] and of the rest separately. Hence, for the second term on the right-hand side of (2.59) we have (2.25) multiplied by m/(m+2), and the same multiplicative factor appears in (2.30) as well as in (2.31). Thus, we obtain that (2.32) holds here with an additional factor m/(m+2); hence

(2.61)    R( θ̃_RSM, θ ) ≤ R( θ̂_RM, θ ),   for all θ ∈ Θ⁺ .

In a similar manner it follows that Theorem 2.2 also extends to this case with the specific choice of s² = m(m+2)^{-1} S²; the same multiplicative factor m/(m+2) appears in (2.48), (2.49) and elsewhere, but that does not alter the relative picture. Finally, (2.37) can be handled in the same way. Hence, we arrive at the following.

THEOREM 2.3. For the case of Σ = σ²V, V known, and for s² = (m+2)^{-1} m S² with m S²/σ² ~ χ²_m, independently of X̄ ~ N_p( θ, σ²V ), the basic results in Theorems 2.1 and 2.2 and in (2.37) hold.

It may be noted that the factor m/(m+2) indicates that the cost of not knowing σ² is 2/(m+2), and it goes to 0 as m increases.
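For instance (a numerical note appended here), with m = 18 degrees of freedom the factor is m/(m+2) = 0.9, so not knowing σ² forfeits only about ten percent of the attainable risk reduction.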
2.5 RSMLE when Σ is arbitrary (unknown):
In this case the RMLE of θ is given by (2.12), since [by (2.11)] ( X̄_n, S_n ) are jointly sufficient for ( θ, Σ ). We can reduce the original problem to the following:

(2.63)    X ~ N_p( θ, Σ ),   S ~ W( p, m, Σ ),   X and S mutually independent,

where m is a positive integer and W(···) stands for the Wishart distribution. We want to estimate θ, subject to θ ∈ Θ⁺, when it is plausible that θ is "close to" 0. In view of the close similarity of (2.9) and (2.12), we consider here the following RSMLE:
(2.64)    θ̂*_RSM = Σ_{∅ ⊆ a ⊆ N_p} { 1 - c*_a ||θ̂*_RM||_S^{-2} } ξ_a( X_{a:a'}(S), 0 ) 1( X ∈ 𝒮_a(S) ),

where θ̂*_RM, defined by (2.12), may be written [as in (2.27)] as

(2.65)    θ̂*_RM = ξ_a( X_{a:a'}(S), 0 )   on 𝒮_a(S);   ||θ̂*_RM||_S² = ( X_{a:a'}(S) )' S^{-1}_{a:a'} ( X_{a:a'}(S) ),

for every ∅ ⊆ a ⊆ N_p, and the shrinkage factors { c*_a } satisfy

(2.66)    0 < c*_a < 2( |a|-2 ) / ( m-p+3 )   if |a| > 2,   and   c*_a = 0   otherwise.

THEOREM 2.4.
a
Under (2.63) and (2.66). the RSMLE in (2.64) dominates the
RMLE in (2.12) [under the quadratic loss in (2.13)].
Proof. Following the lines of the proof of Theorem 2.1, we have here

(2.67)    R( θ̂*_RSM, θ ) = R( θ̂*_RM, θ ) - 2 Σ_{a ∈ N*_p} E_{θ,Σ} { c*_a 1( X ∈ 𝒮_a(S) ) ||θ̂*_RM||_Σ² ||θ̂*_RM||_S^{-2} } + 2 Σ_{a ∈ N*_p} E_{θ,Σ} { c*_a 1( X ∈ 𝒮_a(S) ) <θ, θ̂*_RM>_Σ ||θ̂*_RM||_S^{-2} } + Σ_{a ∈ N*_p} E_{θ,Σ} { c*_a² 1( X ∈ 𝒮_a(S) ) ||θ̂*_RM||_Σ² ||θ̂*_RM||_S^{-4} }.

Note that, by virtue of (2.63), for every a, ∅ ⊂ a ⊆ N_p,

(2.68)    S_{a:a'} ~ W( |a|, m - |a'|, Σ_{a:a'} ),

and

(2.69)    S_{a:a'} is independent of S_{a'a'} (or S^{-1}_{a'a'}) and of S_{aa'} S^{-1}_{a'a'}, regardless of Σ

[see Anderson (1984, Theorem 1.3.6)]; moreover, S_{a:a'} is independent of X_{a:a'}(S) ( = X_a - S_{aa'} S^{-1}_{a'a'} X_{a'} ) and of S^{-1}_{a'a'} X_{a'}; the last result follows from the independence of S and X and (2.69). Thus, for every a ∈ N*_p, using the classical Wijsman (random) orthogonal transformation on X_{a:a'}(S) and S_{a:a'}, we obtain that, given X_{a:a'}(S) and S^{-1}_{a'a'} X_{a'} ( X ∈ 𝒮_a(S) ),

(2.70)    ||θ̂*_RM||_Σ² / ||θ̂*_RM||_S² ≅ χ²_{m-|a'|-|a|+1} = χ²_{m-p+1} ,

which does not depend on the vectors held fixed; here ≅ stands for the equality of distributions. Hence, we have
(2.71)    Σ_{a ∈ N*_p} E_{θ,Σ} { c*_a 1( X ∈ 𝒮_a(S) ) ||θ̂*_RM||_Σ² ||θ̂*_RM||_S^{-2} } = Σ_{a ∈ N*_p} c*_a ( m-p+1 ) E_{θ,Σ} { 1( S^{-1}_{a'a'} X_{a'} ≤ 0 ) E[ 1( X_{a:a'}(S) > 0 ) | S^{-1}_{a'a'} X_{a'} ] }.

Next, note that

(2.72)    E{ X_{a:a'}(S) | S^{-1}_{a'a'} X_{a'} } = E{ X_a - S_{aa'} S^{-1}_{a'a'} X_{a'} | S^{-1}_{a'a'} X_{a'} } = θ_{a:a'}(Σ),

as E( S_{aa'} S^{-1}_{a'a'} ) = Σ_{aa'} Σ^{-1}_{a'a'}. Similarly,

(2.73)    V( X_{a:a'}(S) | S^{-1}_{a'a'} X_{a'} ) = Σ_{a:a'} { 1 + X'_{a'} S^{-1}_{a'a'} X_{a'} } = Σ̃_{a:a'}( X_{a'} ),   say.

Further, given S^{-1}_{a'a'} X_{a'}, the vector X_{a:a'}(S) is normally distributed. Hence (2.71) reduces (by using Lemma 2.1) to

(2.74)    Σ_{a ∈ N*_p} c*_a ( m-p+1 ) E_{θ,Σ} { 1( S^{-1}_{a'a'} X_{a'} ≤ 0 ) exp{ -½ ||θ_{a:a'}||²_{Σ̃_{a:a'}(X_{a'})} } Σ_{k≥0} (k!)^{-1} ψ_k( θ_{a:a'}, Σ̃_{a:a'}(X_{a'}) ) 2^{k/2} Γ( (|a|+k)/2 ) / Γ( |a|/2 ) }.

Similarly, the cross-product term in (2.67) reduces to

(2.75)    Σ_{a ∈ N*_p} c*_a ( m-p+1 ) E_{θ,Σ} { 1( S^{-1}_{a'a'} X_{a'} ≤ 0 ) exp{ -½ ||θ_{a:a'}||²_{Σ̃_{a:a'}(X_{a'})} } Σ_{k≥0} (k!)^{-1} ψ_{k+1}( θ_{a:a'}, Σ̃_{a:a'}(X_{a'}) ) 2^{(k-1)/2} Γ( (|a|+k-1)/2 ) / Γ( |a|/2 ) },

and the last term of (2.67) to

(2.76)    Σ_{a ∈ N*_p} c*_a² ( m-p+1 )( m-p+3 ) E_{θ,Σ} { 1( S^{-1}_{a'a'} X_{a'} ≤ 0 ) ( 1 + X'_{a'} S^{-1}_{a'a'} X_{a'} )^{-1} exp{ -½ ||θ_{a:a'}||²_{Σ̃_{a:a'}(X_{a'})} } Σ_{k≥0} (k!)^{-1} ψ_k( θ_{a:a'}, Σ̃_{a:a'}(X_{a'}) ) 2^{k/2-1} Γ( (|a|+k-2)/2 ) / Γ( |a|/2 ) }.

From (2.67), (2.71), (2.74), (2.75) and (2.76), we obtain that

(2.77)    R( θ̂*_RM, θ ) - R( θ̂*_RSM, θ ) = Σ_{a ∈ N*_p} c*_a ( m-p+1 ) E_{θ,Σ} { 1( S^{-1}_{a'a'} X_{a'} ≤ 0 ) ( 1 + X'_{a'} S^{-1}_{a'a'} X_{a'} )^{-1} exp{ -½ ||θ_{a:a'}||²_{Σ̃_{a:a'}(X_{a'})} } Σ_{k≥0} (k!)^{-1} ψ_k( θ_{a:a'}, Σ̃_{a:a'}(X_{a'}) ) 2^{k/2-1} [ Γ( (|a|+k-2)/2 ) / Γ( |a|/2 ) ] [ ( 1 + X'_{a'} S^{-1}_{a'a'} X_{a'} )( 2|a|-4 ) - c*_a ( m-p+3 ) ] }.

Now, under (2.66), for every a ∈ N*_p,

(2.78)    c*_a ( m-p+3 ) < 2( |a|-2 ) ≤ ( 2|a|-4 )( 1 + X'_{a'} S^{-1}_{a'a'} X_{a'} ),   for all X_{a'} ,

while we may rewrite (2.77) as

(2.79)    Σ_{a ∈ N*_p} ( m-p+1 ) c*_a E { E[ 1( S^{-1}_{a'a'} X_{a'} ≤ 0 ) ||θ̂*_RM||^{-2}_{Σ̃_{a:a'}(X_{a'})} | S^{-1}_{a'a'} X_{a'} ] [ ( 1 + X'_{a'} S^{-1}_{a'a'} X_{a'} )( 2|a|-4 ) - c*_a ( m-p+3 ) ] } ≥ 0,   for all θ ∈ Θ⁺ .

This completes the proof of the theorem.
Suppose now that in (2.64), we let c*
a
= {lal-2}+/(m-p+3},
a eN.
p
Then, we have the following.
Corollary 2.4.1.
For the specific choice of c* = (lal-2)+/(m-p+3), a C
a
N*
the reduction of the risk due to shrinkage is
p'
(2.80)
-2
-1
IIfJnvll"
(X,) }(1+2 X'
X )}.
--nrl .L. •
, -a
-a ,S
-a , a , -a
-a·a
The proof follows from (2.79) and particular substi tution of the
c* .
a
A positive-rule version
A*+
(!RSM)
of (2.64) can be considered as in
(2.36), and (2.37) extends in this case too (With
A*+
A*
replaced by !RSM and!RsM' respectively).
Let us indicate the modifications for the case where X
-
= -n
X
= n- 1
n
:I !i and the !i are i.i.d.r.v. with N (O,:I).
i=l
P .... -
we have X .... N (O,n- 1 :I).
-
p-
A
-1
In this case, in {2.63},
Also, here are replace S by S
....
-n
.... W{p,n-1,:I).
....
Note that:I
S. Thus, here m=n-1, and hence, by Theorem 2.4
-n = {n-1} -n
and Corollary 2.4.1, we may take the shrinkage estimator
49
I
(2.81)
~Oi
- - p
where
~:£n)
(2.82)
--nn
=
I
~Oi
~ (X .. (i
a -a. a
A
),0)
-n -
on
~
(9 ),
a -n
- - p
and the other notations are as in (2.65) with proper replacement.
REMARK 6.
Comparing (2.15) and (2.81) (with c
a
= (lal-2)+. a C N*), we
- p
A
may note that (by Remark 3) here we have replaced I by 9
-
additional factor (n-l)/(n-p+2).
1. while for p
adjustments.
a.s.
>
3.
-n
For p=3. the later factor is equal to
>
(n-l )/(n-p+2)
Further. (n-l)/(n-p+2)
~
1.
indicating large shrinkage
~ m
1 as n
(p fixed). and 9
-n
(as well as in the rth mean, for every (fixed r
> O.
two versions become asymptotically risk equivalent.
becomes large. for every a
and have an
* nll9 . . II~2
eN.
p
-a·a ~.
-a·a
,
~ m,
~
I
-
Thus, the
However. as n
so that there is no
effective shrinkage (or reduction in risk due to shrinkage). for large
n.
- -
for any fixed 9 #- O.
In the unrestricted case. Sen (1984) has
advocated the use of Pitman al ternatives to the pivot. for which the
asymptotic
risk has a
non-degenerate
form.
and
this
consideration
remains in tact for the restricted alternative under consideration.
2.6.
Some extensions and important applications:
In
this
section.
we consider
some extensions of
the
obtained in the earlier sections to more general models.
results
and also
consider some specific applications in some practical problems related
to linear models.
2.6.1
Sub-orthant models:
We consider the following model:
50
where !l and Ml are m-vectors, Ml unrestricted,
~
€
!2 and
~
are p-vectors,
(R+}P, and we assume, for the time-being, that ~ is a (p+m) x (p+m)
p.d. known matrix.
We desire to consider improved estimation of M under
the above sub-orthant restraint (when we have the null pivot for
~).
We denote by NO = {rn+l, ... ,rn+p} and for every a : 'P Cae NO,
p
-
-
p
the
Also, we define !l:a' (~) =
complementary subset is denoted by a'.
Xl " X . , (~) = X . , and ~ . " ~ , , as in earlier sections.
"':a -a·a'"
-a·a
-a·a -a a
THEOREM 2.5.
m
) for Ml € IR
Under the model (
2.83,
and
~ €
+p
(R ) , the
RMI...E of M is given by
(2.84)
Proof.
"
u..... =
""1'<M
-1
~ 0 ~N
(Xl' "X . ItO) l(~ , ,X , ~ 0, X . ,
cpCa.CN
m,a "'·a -a·a'"
-a a -a
,., -a·a
- - p
-1
Let us partition ~
ij
as «~ }}i,j=l,2' and let
Then the RMI...E of M is given by
(2.86)
"
"
U(MlRM'~}
> O}.
"
I!RM
"
= (Mi RM'
= min{U(Ml'~}
: Ml
€
M2
RM)', where
m
+p
R , ~ € (R ) }.
Since Ml is unrestrained, differentiation of (2.85) with respect to Ml
yields that
(2.87)
"
11 -1 12
"
!l - MlRM = -(~)
~ (!2-~)'
Also, we may rewrite (2.85) as
(2.88)
11 -1 12
Noting that (~)
~
-1
= ~ll
~12'
we obtain from (2.87) that
51
,.
(2.89)
!1:2 - ~1:2 RM
= Q.
Hence, it suffices to minimize
-1
(~-~)' I22(~-~)
over
,.
to
~
RM' and then to use (2.87) to obtain
in (2.9) with
(2.90)
~
replaced by
~22'
~
~ ~
Q, leading
RM' we have the solution
so that
,.
~RM = ~~O ~a{!a:a .. Q) 1(~ € !:a{~»'
- - p
while, by (2.87),
(2.91)
,.
~1RM = !1 + (~
11 -1
)
12
~
,.
(~- ~RM)
-1
Next, we note that on !:a(~) (={~ : ~'a' !a' ~ 0, !a:a'
> Q}),
Hence, on ~ € !:a(~)'
(2.93)
,.
12-1
~1RM =!1 + ~11.2 ~ ~a{~' Ia'a'
so that by (2.90) and (2.93), on
for every a : ~ cae NO.
-
yield that on
!:a(~)'
-
p
!.a"
!a')'
!:a(~)'
Finally, some standard matrix manipulations
52
=
~N
(Xl'"
m' a "'·a
X . ., 0), V
-a.·a
-
cP Cae
-
and hence. (2.84) follows from (2.94) and (2.95).
NO,
p
Q.E.D.
Motivated by the RMLE in (2.84) and the Stein-rule estimators in
Section 3. we proceed next to construct improved estimators of
this sub-orthant model.
subspace
{~1 €
m
R ,
~ =
~
for
In this context, we consider the pivot as the
Q}.
and the quadratic error loss in (2.13), as
amended for the (p+m)-dimensional case.
THEOREM 2.6.
The shrinkage estimator
(2.96)
where 0
~ ca ~ 2(lal+m-2)+, ~ ~ a
C
N~, dominates the RMLE in (2.84)
m
[QYtl ~ € R x (R+)P] when p+m~3.
Proof.
Note that Lenuna. 3.2 extends directly to the case where the
positive orthant
[~ ~
QJ
is replaced by any positively homogeneous cone,
and hence. we may Virtually repeat the proof of Theorem 3.1, where we
need to replace (2.15) by (2.84).
For intended brevity, the details are
omi tted.
-
For unknown covariance matrix, whenever we have a Wishart matrix S
(with M DF),
-
independent of X,
we may directly use the results of
Section 5, and arrive at the following:
THEOREM 2.7.
For the sub-orthant model. the RMLE. given by
(2.97)
is dominated by the shrinkage estimator.
53
(2.99)
We may also remark that parallel to (2.81) and (2.82). here also.
-
for ....
X = "'Il
X = n
the RML with
-1
~
~
X. , ....
Xi i.i.d.r.v. with Np+m (I~.~).
1= 1 ....1
'" ,., we may construct
" as in (2.84). and for the SMLE in (2.98).
replaced by ~
~.
we may take the shrinkage factors as
(2.100)
°
"* -2 th
1 - (n-1)(n-m-p+2) -1 n -1 (m+ Ia I-2) + "~"i'
\f'S a S Np '
"'Il
The modifications for the case of ........
~ =
02v.
with a known ,.,
V. can be
done as in (2.58) by defining s2 = m(m-2)-l S2. and then proceeding as in
For this special case. by virtue of the adjustment for
(2.97)-(2.99).
2
s • in (2.99). we may allow
°~
c* ~ 2(lal+m-2)+.
a
<l>c-
Restricted Ml.E.
Cons ider the usual
a
This
P
special case will be found to be useful for the linear model setup to be
C NO.
-
considered next.
2.6.2
Univariate linear model:
model
( 2.101)
where
=
~
X = C
"'Il
-
R
I::'.
+ e ; e
"'Il
"'Il
.... N (0.02 I ).
n....
"'Il
is a (known) matrix (of order nxp*) of regression constants.
(Ili. Ili)
is a })*-vector of unknown regression parameters.
Pj-vector. j=1.2;
p*
2
= PI+P2' a (0
l!j
Il'
is a
< a < ~)
is unknown. and we consider
~ €
+ P2
(R)
is restricted to the
the following situation:
(2.102)
III
€
mPI
is unrestricted. while
nonnegative orthant in
mP2
54
we may actually partition £ as [£1 £2] and rewrite (2.97) as
~ = £1 ~1 + £2 ~2 +
(2.103)
and we desire
restraint on
(2.94)
to consider
~2.
> p*
n
!n:
~1 €
PI
m.
+ P2
(m)
.
improved estimators of
~
under
~
= P2 = p.
This condition enables us to use RSMLE for
In passing.
~
(or
~2)
which dominates the
we may remark that the K-sample (K
location model is a special case of (2.101). where p* =K.
i f we want
~
2)
In this case.
to estimate the means (° •...• OK)' subject to the ordered
1
restraint:
°1
~
...
< OK'
P2=p=K-l.
P1=01
and
Pj =Oj-Oj_l. j =2 •.... K.
may note that
centers on
the set
We set P2=P and assume that
and Rank of
usual RMLE.
~2 €
~1
~2.
we may reparametrize as in (2.101) with Pl=l.
Referring back to (2.103). we
behaves as a nuisance parameter. and our main interest
Thus. we may as well allow £1 to be possibly of less
than full rank.
-
First. consider the case of C being of full rank.
(2.105)
i = (ii. i')'
(2.106)
!! = (n-p)
2
*
-1
(£.£}-1£,~
=
Then
and
A2
II~~ ~II
are jointly sufficient for (~.02). and. moreover.
2
0 (£.£)-1).
(2.107)
i - NpM(~'
(2.108)
* A2 2
(n-p)o / 0
and
A
2
independent
of
~.
or '<n-pM'
In this case. we may use Theorem 2.7 with the modifications for the case
of
~
=
0
2
!.!
= (£'£)
-1
•
0
2
unlmown. and obtain the estimator
57
[considered in detail in Barlow et al. (1972)].
two-way layout:
e.
the RSMLE of
2.7
= ~i
Xij
We may also consider a
+ ~j + ~ij' I ~ j ~ n. I ~ i ~ r and obtain
under (2.117).
Relative performance of SMLE and RSMLE.
In earlier sections. we have noticed the similari ty of the RSMLE
and SMLE in various situations.
Also. the shrinkage versions dominate
their respective classical versions.
SMLE in the light of their risks.
We intend to compare the RSMLE and
In this respect. we may note a basic
difference between the two versions of the Stein-rule estimators.
SMLE is invariant under any nonsingular transformation:
nonsingular.
model X '"
"""
through A
But. the RSMLE is not generally so.
J(
(O.:I). :I known.
p"""'"
- -:I -O.
= O'
Y = ,...,.,.,
BX. ""'B"
""'"
Thus. for the simple
the risk of the SMLE depends on !J. only
I1V
-I
x -+
""""
The
...--
while the risk of the RSMLE. computed in (2.23).
elements of
depends on the unknown 0 in a much more involved manner and may depend
on the individual
consider the model
O.
Thus. for the SMLE. it suffices to
I
2
where!O = (A .0 •...• )·. A
(2.121)
~
O.
o
where. of course. 0 € e . the positive orthant. but on the boundary. On
+
-I
the other hand. for the RSMLE. the risk at a point! (such that !'~ ! =
A). on the A-constant curve. may not remain the same.
It turns out that
the comparison of the risk of SMLE and RSMLE on this A-constant contour
(or any other one) becomes very hard (technically). for general p
> 3.
We attack this problem in the following order:
(~);
(I)
COmparison of the risks at the pivot
(2)
Study of the differential picture of the risk of the RSMLE in
58
the different subspace of 9+. and
(3)
Comparison of the risk of SMLE and RSMLE under (2.121). or
parallel models.
2.7.1.
(for
Relative risks at the pivot. For X
...
1FI)
= "!-!!" 2
Thus. at the pivot
€
p"''''
p
~
3. the SMLE
is given by
s
2
so that "! -!!"
= 2.
~ ~ (O.~).
V p ~ 3.
~.
- 2(p-2)1I!1I
-2,
2
2
(!-!!)! + (p-2) "!" .
the risk of the SMLE is equal to p-2(p-2)+(p-2)
Likewise. for the RMLE ~. we have 1I~_!!1I2
=
~
~SNp
1(!
2
2
~a ){IIX
9=0.
-a.-0
-a. 11 + 110
-a. .1I }. and hence. at ...
... the risk of the RMLE is
equal to
2-P{ ~
(2.123)
~-CN
lal}
~p
= 2-P
(p)r
r=O r
~-p
= 2-P
p_2P- 1
= p/2.
Thus. by (2.35) and (2.123). we obtain that at the pivot
R(~.!!)
(2.124)
= p/2
- ~(P-4) - (p+2)2-P
... ...
Consequently. at the pivot 9=0. for every p
A
(2.125)
~
- (p+2)2-P .
3•
P
A
R(!sM.!!) -
=2
~.
= (p+2)/2 > O.
R(~.!!)
......
In the above derivation. for simplicity. we have taken 1:=1.
same result holds for
be equal to
2
... and
~
.•.
-a.. a
~
2 2-P (p+2).
where the right hand side of (2.125) would
Modifications for general
...
~
can be made
In the computation of the risk. 11_11 2 has to be replaced by
readily.
II-II~
0
2
I.
......
But the
for the RMLE. the X
have to be replaced by -a.
X .. a • and
-a.
The relative picture. however. remains the same.
modifications for the case of
~
(or
0
2
)
~
""8a
by
Similarly. the
unknown can also be instituted
59
along the same line to establish a similar relative risk picture of the
SMLE and RSMLE.
2.7.2
steps.
RMLE.
Directional variation of the risk of the RSMLE.
First. we study the directional variation of the
risk of the
Then. by virtue of (2.32). we draw the picture for the RSMLE.
Again. we consider the model ,..,
X - Hp"""'"
(0.1). "..,
a €
(2.126)
We do it in two
R(~:nv.a)
-nl'l
=
-
e+"..,
. 0
pivot. so that
2
Ea {l(X€!: )[IIX-aIl 2 + 119 ,1I ]}
_
a
-a. -a.
-a.
:I
~CN
- - p
Let eJ>(x). x € R be the standard normal d. f. and 4P(x) = eJ>' (x) be the
corresponding density function.
~
Q}
(2.127)
Pa{X,
(2.l28)
Ea {IIX -a 11
-a. -a.
_ -a.
-
2
= j€a
IT
•
l{X
-a.
~
Then. for every a : 4P cae N •
-
eJ>(-a j ). Pa{X
_ -a.
O)} = Pa{X
~
~
Q}
-a.
p
= j€a
IT eJ>(a j );
O} Ea {IIX -a 11
-a. -a.
--
-
-
2
-
I -X a~. -O}
Thus. by (2.126). (2.127) and (2.128). we obtain that
+
It is qui te clear that
2
:I
On}.
t€a' c;
(2.l29) depends on the individual elements
01 ....• 0p in an involved manner (and not simply on 1I!1I
2
= !'!).
To make
60
this
point
clear.
we
consider
the
special
case of
(2.121)
where
1
~!0={A2.0 ....• 0)1.
Through some routine steps. we obtain from{2.129)
that
(2.130)
R{~.!O)
=
1
~P
+
[~(A2)_~]
1
A[1~{A2)] ~
where by virtue of the Mills ratio.
1
2
the difference A
1
~(A2)
1
-
A[1~{A2)]
-+
°
1
A2
as
1
~(A2).
A
-+
VA
~
0. and
Thus. from
00.
(2.130). we conclude that R{~.!O) ~ (p+l)/2. V A ~ 0; at A=O. (2.130)
equals to p/2. while as A increases.
it monotonically increases and
attains its upper bound asymptotically as A....p;l.
If we look at (2.122)
- -° °
that at A=O. R{X .! ) = 2 and it monotonically increases
and compute its risk R{Xs .9). then it follows from the James-Stein
(1961) result
s
as A increases. and its upper asymptote is equal to p.
Two conclusions
may readily be made from the above results:
(i)
For p=3. under (2.121). the RMLE dominates the SMLE. so that
by virtue of (2.32). the RSMLE dominates both the RMLE and SMLE.
(ii) For any p
~
3. the risk of the SMLE can not be smaller than
that of the RMLE for all A (particularly. for large A). and hence. the
SMLE fails to dominate the RMLE (and hence. RSMLE) under (2.121).
Actually. we shall exploit (2.32) and (2.130) to establish the
--
dominance of the RSMLE over RMLE and SMLE when 9=9°. 9>0.
we consider the other extreme case where
(2.131)
! = 9 1. 9 >
-
°
Before that.
2
(so that A = p 9 ).
If we let a = ~(9). f3 = 1-9~(9)/~{9)
a{l+'Y)=l). then (2.129) simplifies to
and
'Y =
~(-9)/~{9)
(so that
61
(2. 132}
a
P
{p(I-~}(I+~}
p-l
2
+ p 9
~(1+~)
= p{~(9)
- 9
p-l
~(9)
}
=p
+ p
a[a(I+~}]
-1
A[I~(9}]};
Note that at 9=0, (2.132) reduces to p/2, and for p
than (2) the risk of the SMLE at 9=0.
not perform better than the SMLE.
note that for g(x}
= p{~(x)
{JP
~(xVP) -
JP
~
2
{1-~+9 ~)
A
=p
2
9
5, this is larger
Thus, in this case, the RMLE may
To compare (2.130) and (2.132), we
- x~(x) + x2[1~(x}]} - [~(xVP) - xVP ~(xVP)
+ p x2[1~(xVP)]}, we have g'(x)
~(x)} -
p-l
= p{~(x)-~(x)-x ~'(x)
~(xVP) - x p ~'(xVP) +
~(xVP)} = 2 px[~(xVP) - ~(x)] ~ 0, V x ~
o.
2p
+ 2x[I~(x)] - x
2
x[I~(xVP)] - p3/2 x2
Thus, (2.132) equals to
(2.130) at A=O and their difference is a monotone nondecreasing function
of A.
The upper bound (asymptote) for this difference is (p-l)/2
(~1).
In a similar manner, if we take
1
(2.133)
!
= !(k) = «A/k)2 lk,
Qp-k)', A
~
0, (k=I, ... ,p),
we can show that the risk of the RMLE at !(k) is a monotone increasing
function of k (1
~
k
~
pl.
This picture makes it more appealing to
-
consider the RMLE instead of the classical MLE when 9 lies on a lower
dimensional edge of 8+, the most favorable case being R+ x Q (up to
permutations of coordinates).
Note that by (2.124) and (2.125), at !=Q, the RSMLE has a smaller
risk than the RMLE as well as the SMLE.
-
However, when 9 deviates from
the pivot (inside 8+), the risk of the MLE is a constant, the SMLE has a
risk depending only through (A,p), while the risk of the RMLE and RSMLE
-
depend on 9 in much more involved manners.
A single picture can not be
drawn for the entire domain 8+, although the RSMLE dominates the RMLE
over thi s domain.,
62
2.7.3
Dominance properties of the RSMLE.
We have already remarked
earlier that for the model (2.121). the RSMLE dominates (over
the other three (i .e .. MLE. RMLE and SMLE) when p=3.
e+ )
all
We therefore
consider the case of Pl4 and study this relative dominance picture.
A
By virtue of (2.130) and (2.32) [and the fact that
2
-2
2
(p-2) E("J,.A)' A =
we obtain that under (2.121)
R(~.!)
=P =
"!" ].
A
(2.134)
R(~.!)
-
A
= p/2
R(~M'!)
First. consider the case of p=4.
~
2
- (p-2)~(~- 2) - [~(9)-1/2]
p.9
The right hand side of (7.14) reduces
to
(2.135)
~
2-4E(~
2) -
1
[~(9)-2-9~(9)
2
+ 9 (1~(9)]
4.9
= A4(9)
B4(9). (say).
+
Note that B4 (9) is non-negative. for every 9 l O.
show that A (9) l O. V 9 l O.
Towards this note that
r
Therefore. from (2.125) and (2.126). we obtain that
(2.137)
A (O)
4
= 0;
= 29{4
Thus. it suffices to
-4
E(~
2) 6.9
[1~(9)]}.
63
Next, we note that (by partial integration)
_!a 2
[1 - ~(a)] ~ a-I ~(a) = e 2
(~ a)-I, va> 0,
(2.139)
while by definition,
-4)
(2 . 140) 4 E( ~6,a2
=e
_!a 2
2 ~
r
4
_!a2
2
(!a )!_
> ! e 2 va' 0
~r~O 2
r! (2+2r)(4+2r)
2
£
> ../2/11' (which is less than .8), (2.138) is posi tive. At a=o,
Thus, for a
< a < ../2/11', we note that
(2.138) is equal to 0, while for 0
(a/aa){4E(~-4 2) - [1~(a)]}
(2.141)
= ¢(a) - 16a
6,a
= e
=
_~a2 {
1
-I2:rr - a };r~O
8,a
1 2 r I I }
r! -(2-+-2-r-)(-4-+-2-r)-(-6+-2-r-)
(2a )
!a2
2
~(a){l - ~ (a/3)e
~ ~(a){1 - (213)e r - 1} > 0, v 0 ~ a ~
2
!a
-1
2
-.".
as ae
~ v2/.". e
every a
~
o.
E(~-6 2)
and e
so that A4 (a)
~
-1
11'
o.
< 3/2.
Va
~
../2/.".
Thus. (2.141) is nonnegative for
o.
Hence. (2.135) is
~O.
Va
~
o.
so that for p=4. the RSMLE dominates the SMLE [under (2.121). for every
a
~
0].
Consider next the general case of p
Xl - X(a.1) while Xj
-
X(O.I). for 2
~ j
~
5.
Note that under (2.121)
~
p.
We denote the last term
on the right hand side of (2.134) by C (a) and write it as
p
64
Note that by direct evaluation.
= ~(-9)2-(p-1) ~
r=2
(p-1)(r-2)
r
Note that by definition
(2.144)
C(2)(0)
P
For every r
= (p-3)/4+2-P
and C (0)
P
= (p-4)/2
+ (p+2)2-P .
> 2. we denote by
Then. we may write
Noting that the joint pdf of X1 .···.Xr is
obtain that
(2.147)
(8189) A (9)
r
~(Xl-9) ~(X2)
= E9{(X l -9)IIX-11-r2 -l(Xr
...
~(Xr).
we
~ O)}. V r > 2.
Since in (2.147) Xl is confined to R+ • the derivative is posi tive at
9=0+ (as well as for 9 ~ 9~r). for some 9~r)
> 0).
while i t may be
65
negative for 0
m;n Oar».
(2.148)
> 00 ,
COnsequently,
C~2)(0) is nondecreasing in 0
€
(0,
Further, by (2.147), we immediately obtain that
(8/80) Ar(O)
> -0
>0
Ar(O), V 0
and
r
> 2.
COnsequently, from (2.146) and (2.147), we obtain that
(2.149)
(8/80)C~2)(0) > -0 C~2)(0), V 0 > O.
The last inequality leads us to
Thus, by (2.142), (2.143), (2.144) and (2.145), we have
p-5
(2.151)
Cp(O)
> 4>(-0) {""""2
p+ 1
--0 p-3
1
12{
+ 2 - 1} + e 2
4 + 2P} , V 0
P
> o.
COnsequently, the right hand side of (2.134) is bounded from below by
(2.152)
p/2 - (p-2)
2
-2
2) +
p,9
E(~
{O~(O)
2
1
- 0 [1~(0)]} - 2
p-3 p+ 1
--9 p-3
1
12{
}
+ 4>(-9) {""""2 + 2P-l } + e 2
+ 2 , V0
P
4
At 9=0, (2.152) reduces to (p+2)/2P
> 0,
> O.
and proceeding as in (2.136)
through (2.141) [but replacing 4 by p], it follows that the derivative
(Wi th
respect to 9) of (2.152) is nonnegative for all 0, and hence,
(2.152) is nonnegative for all 0
dominates the SMLE for every p
~
> O.
Hence, under (2.121), the RSMLE
3.
Let us next consider the relative risk picture under (2.131).
this case, (2.32) reduces to
In
66
say.
(2.153)
where
(2.154)
o
A (a) = f ... f
r
Rr +
":s" -2
r
1I
j=l
cp(xj-a}dx1 ... dx.
r
2
<r
~
p.
Proceeding as in (2.147) and (2.148). we obtain that
(2.155)
Consequently. proceeding as in (2. 149}-(2. 150}. we obtain from (2.153).
(2.154) and (2.155) that
P-4
[2
P+2] • V a ~ o.
+ 2
P
Next. we note that under (2.131).
p{~(a}-acp(a} + a2[1~(a}]} + c*(a}
p
1 2
--pa
~ p{l~(a} + acp(a}-a2[1~(a}]} + e 2 C*(O} - (p-2}~(X-2 2)'
p
p.pa
Now the problem is to show that (2.157) is nonnegative for all p and a
posi tive.
But.
unfortunately,
further simplification of
expression becomes highly complicated.
the above
Available mathematical tools do
not work well to give us the desired result for all p and a.
Hence we obtain following some partial results.
a}
Nonnegativityof (2.157) for large a.
To see this we first write:
67
1
1
p
E ---- = ~ (p) E
• I(X
X ~ 0 X
X < O}
a IIXII2 r=O r
a IIX 11 2
1····· r
. r+l····· p
-r
Then rewriting (2.157) we obtain.
A
A
(2.158) R(aSM.a) - R(aRSM.a) =
where.
(
)
A p-r (a) = I
r
or
p{~(-a)
2
II! 11-
r
rI
j=1
+a
¢<xj-a}
~(a)
p
rI
j=r+l
2
- a
~(-a)}
¢<xj+a}~.
Now. note that A(p-r)(a}
= o(~(-a}} as ~ if p-r > 0 (i.e .. if r < pl.
r
(p-r)
-2 r thr
p
Because. Ar
(a) = 1_0+ II! II
rI ,+,\xj-a} rI
IK'
j=l
r+l
thr
,+,\xta)~
= A(p-l-r} (a) ~(-a).
p
It is now easy to see that A(p-l-r} (a) ~ 0 as~.
Hence. A(p-r}(a} =
r
o(~( -a}}
as a
~ co
1£
r
<
p.
r
Next observe in (2.158) the terms
corresponding to r=p are equal in both the sums. hence they cancel each
other.
Thus we have.
A
A
R(aSM.a} - R(aRSM.a} =
~
p
~(-a)
+
p{~(-a}
o(~(-a}}
+
a(~(-a}
as a
~ co.
-
~(-a}}}
+
O(~(-a}}
58
.....
.....
So we have proved that a
RSM
dominance is
b}
dominates aSM for large a and the order of
~{-a}.
Nonnegativity of {2.157} for p=3 and 4 {for all a}:
consider the case p=3.
Let us first
In this case {2.158} reduces to
{2.159} R{9SM .a} - R{9RSM .a}
- Ea{II!1I
2
= 3[1~{a}
+
a ¢(a) - a2 ~{-a}]
I{! ) O}}
Now.
and
E {IIXII-2 I{X)O}}
a -
=I
--
~
2 1
IR3+
IIx 11-2 e-3/ 2a -2 11! II
2
-
j=l
!8 EaIIXII-2
Thus.
Call the rhs of above equality
-2
E{)(3.A}
so that
Thus.
1
=e
-2A
D{a}. Further.
1 r 1 1
k~ {2A} r! 1+2r
m
3
11 e 9xj {2lr}-3/2dX
-
69
Therefore,
D(a}
~ 3[~(-a}
2
+ a ~(a) - a
~(-a)
7
~
2
2
3
4
- 8[1 - a + 5 a ]
~ 3
= 3(I-a }[I~(a} - 24] + 3a[~(a} - 40 a ]
7
2 17
3
= 3(I-a }[24 - ~(a)] + 3a[~(a} - 40 a ].
17
Set aO such that
= 24.
~(aO)
(Note 0
we obtain immediately that D(9}
< aO < I). Then noting that
> 0 for 0
~
9
~
90 .
For 90
<9
~
1, we
note that
17
~(a)
- 24
= ~(a)
so that 0(9)
~
-
~(aO)
= (a-aO)
~(haO +
~ (a-aO}~(aO)
(l-h}a)
2
-3(1-9 )(9-90 ) ,,(90 ) + 39["(9) -
positive, for every 9
~
Consider the case when 9
1.
~
thus D(9}
1.
>0 V0
Note that
~
9
~
~
1.
3
9 ] and r.h.s is
70
I 3+
= 3
3 2 --lIxli
1
2
--8
1I~1I-2 e 2
2
IV
e
8x -AY_-9x_
1 --~ ---:lea} -3/2~
IR
2 1
2
I 3+ ,,~,,-2 e -3/28 -2"~" e
+
-ax
-(he -8x....
1 --~
.j
(2lr) -3/2~
IR
Thus,
1 2
D(8)
~ 3[{~(-8}
+
oM
~8}
- 8
2
~(-8}}
1
- 2
E(~
-2
2}e
3,28
-28
212
-8 E( -2 } _ __ -3/28]
4 e
~
2
24 e
.
3,8
_!
Also note that,
-~A ~
-2
E(~3, A}
= e
k:a (~A)
r 1
1
;l"1+2r
~
-!A
~
2 e 2
k:a
(~A)
r 1
1
;l" 2+2r
Therefore,
D(8} ~ 3 {~(-8) + 8 ¢(8) - 8
[
-
_1_
2~
(1 - e
1 2
-2,8
) e
_82
2
-
~(-8}} -
~
24
e
1
-:2
2 _!82
8
(1 - e- }e 2
28
-3/282 ]
Finally,
4>(-0)
+
0 ,,(0) - 02 ,,(-0)
= 0 ,,(0)
- [0:21] 02 4>(-0)
71
2
9 -1
= 9 ~(9) - ---2- {9
~(9)
1
- 9
~(9)
+
0(9
-3
)~(9)}
9
Hence after some trivial simplifications we have D(a)
~
0
Va
1.
~
The
same treatment holds for p=4 (as. ~ = p-2) but not for p ~ 5 (as ~
p-2).
Thus we obtain the following result:
Theorem 2.8.
The restricted Stein rule estimator 9
dominates the
RSM
unrestricted one 9
for p=3 and 4. under the model (2.131).
SM
p
~
<
5 the same conclusion holds for large values of
2.7.4
Further if
a.
Risk computations for other versions of shrinkage estimators:
Here. we are mainly interested in the computation of the risk
functions
of
posi tive
rule
versions
of
the
standard
shrinkage
estimators.
In the restricted case.
the posi tive rule estimator assumes the
form:
(7.38)
0+
-JS
= (1 -
E:!...t~.
IIxll2
-
-
and
(Throughout this section we assume the covariance to be I.)
It is quite difficult to handle the above quantity for general
the asynunetry.
It
-near the pivot.
--
significant near 9=0.
9=0.
is known
that
the
reduction
in risk
!.
due to
is only
So. our main intention is to compare the risks
72
which after some routine simplifications yield.
(2.160)
For
the
A+
R{9
.Q)
JS
_.2
= 2 P{~
positive
> p-2).
rule part
of
computation is quite similar.
(2.161)
R{O+RSM.Q)
=
the
spherical
I * P{X .
aCJN
a
symmetry
shrinkage MLE.
the
Here.
~
Q) Ell (1 -
2
~2
+]x 11
IIX II
a
a
- p
(Using
restricted
of
the
normal
l{X
a
distribution
> 0)
and
some
simplifications similar to those done in section 2.3.)
Now. rhs of (2.161) is a function of lal only.
One more step of
simplifying yields.
(2.162)
R{~.Q) = 1-P ~ (~)
2
Now.
r=3
2
P{~ > r-2).
direct evaluation of these numbers for different values of p show
that (2.162) is not smaller than (2.160) for any p. As a future project
we would like to consider some numerical coputational aspects of this
problem.
aIAPTER III
IMPROVED ESTIKATIOO UNDER GENERAL SE11JP
Introduction
3.1
Consider a (statistical compound decision problem
consisting of
p(~l)
component decision problems
relates to the estimation of a parameter OJ' 1
~l'
~
j
...
~
~ (=(~l' ... '~p»
'~
P
~.
• such that
p.
J
Thus. we have
the simultaneous estimation problem for the vector! = (OI ....• Op)' in a
compound decision theoretic setup, where for OJ we may conceive of an
A
A
estimator OJ and a loss function 2 j (=2 j (Oj.Oj»' such that we have some
admissible estimator 6 j of OJ' for j = l •...• p.
It may be qui te natural to combine the 6 j into ! = (6 •.... 6 ) ,
1
p
A
under a compound loss function (e.g .• 2(!.!) = !~=1 2 j (6 j .O j » which may
A natural question arises:
or may not be additive in the components.
A
Is 6 admissible for
#01
° with respect
#01
to the compound loss function?
For the multi-normal mean vector
° (and dispersion matrix a
#01
Stein (1956) had the fundamental observation that for p
2
respect to the quadratic error loss 1Ij!-!II. the MLE of
~
A
2
I ).
-p
3. with
e is not
#01
-
admissible. although for each component OJ' the MLE 6 j is admissible.
A
2
A
2
In this context. the additive loss II!-!II may be replaced by II!-!II Q =
A
A
(!-!) , g(!-!) and the dispersion matrix may as well be of the form
for
some arbitrary
(p.d.)
!.
Thus.
from
the
point
of
~.
view of
multi-parameter estimation (in a general multivariate setup). one may be
74
interested in establishing the inadmissibi 11 ty of the component-wise
estimators without necessarily imposing additivity on the component loss
functions and/or independence of the component estimators.
Led by Stein
this had indeed been a very active area of frui tful research
(1956).
during the past thirty years.
However.
in this chapter we deal
with simultaneous estimation
problems under additive losses.
In
Section
3.2
assumptions regarding
stated.
the
general
problem
is
the statistical m9del and
formulated
loss
and
the
functions are
In Section 3.2.3 we describe the class of estimators containing
the improved estimators we are seeking.
In Section 3.3 we derive a Stein type inequa11 ty for the risk
differential between the standard estimator and some estimator belonging
to the class containing all potential improvements.
In Section 3.4 a heuristic way of solving the inequality obtained
in the previous section is described.
these results in the next section.
We give some applications of
Finally in Section 3.6 we deal with
the case of discontinuous loss functions which is not covered in earlier
sections.
3.2 Formulation of the problem
3.2.1
The Model:
Simultaneous multiparameter estimation:
consider the improvement over a
problem.
In this section we want to
standard estimator in the combined
Since we have already indicated that we would sacrifice
mathematical abstraction we begin with the simplest formulation of the
problem:
Suppose, Xl' X ,.· .. X are p-independent random variables.
2
p
For
75
~
each i. 1
i
~
p.
X.1 is distributed according to the probability law
i
P9
with
i
distribution function F i ( •• 9 ) and F i has a density with respect to
i
Lebesgue measure
~.
(3.1)
9
i
a.e.
is the unlmown parameter.
(~).
1 ~ i ~ P
We assume that 9
is a
i
real valued
parameter. where.
Observe that if 9
i
is fini te then by a translation on the original
problem we can always assume that 9
about
1
= O. A similar comment can be made
ai .
Remark:
We might as well have a nuisance parameter associated with the
main parameter of interest.
Assume for the time being that there is
none or even if there is any it is known.
We shall discuss the case
where there is an unknown nuisance parameter later.
Our
objective
is
to
estimate
the
parameter
simultaneously.
Suppose for
of 9 i • 1
~
the i th estimation problem.
6 (Xi)
i
9
=
is an estimator
i ~ p.
= 6 i (Xi )
we can as well assume
without any loss of generality.
We shall also assume
By choosing the transformation. Y
i
that 6 i (X i )
-
= Xi'
that 9 's are mean like parameters for the distributions of Xi·s.
i
least we have to assume that 9
i
(At
is an one-to-one and invertible function
of the mean with certain smoothness properties.)
76
So. we make the following model assumptions:
(3.2)
Ml:
X1 •...• X are independent random variables.
p
M2:
E Xi = 0i' for 0i € 0i = (0 .0 ). 1 ~ i ~ p.
M3:
0i is an open interval in the real line.
M4:
The sample space for the ith problem.
i -i
~i'
is also a subset
of the real line.
M5:
For each i. 1
~
i
~
p. the map. 0i
~
Po
is continuous
i
w. r. t.
suitable topologies in the respective spaces.
(Topology in the space of PO's should be such that. 0
~
EO g
is continuous for the class of functions g such that.
EO Igl
M6:
f
<m
•
for some f
> O.
The two end measures are singular in the following sense:
and
M8:
sup Eluilt
°i
<m
V t ~ f.
Discussions about the assumptions:
1.
The motivation for MI-M2 has already been discussed.
M3-M4 will be
regarded as basic assumptions in the context of simultaneous estimation.
In fact M3-M4 can be relaxed somewhat. but we would not attempt to do
that here.
2.
M5 is helpful in dealing with the compact subset of the parameter
•
77
space, specifically, we may need to conclude that,
if for some function g(.), E g(X)
a
inf
E
a€KOl
a g > 0,
if K is compact.
>0
for all
a,
then,
M5 is relevant in that context.
M6 gives us the liberty to deal with the problem restricted near the two
boundary points almost independently of each other.
This turns out to
be a very useful assumption while analyzing the edge behavior.
3.
M7 is required for the validity of the most of the analysis we would
be
doing.
It
can be
seen
the assumptions M7 and M8 mean
In fact M8 implies M7 in most of the
essentially the same thing.
situations.
that
But we would still like to include M7 because of its nice
intuitive interpretation (i.e., bounded coefficient of variation model).
4.
Finally, in order to verify that the above assumptions are satisfied
by all standard families consider the following examples:
ex.
1:
Natural
parametrized
exponential
families
coefficient of variation:
Let, X '" Pa where
d P
a -_ eex,,(a)
-) ( x ) ,
( ~,x
a < a <9
without loss of generality assume that
a < 0 < 9,
(3.3)
dx
hex) I
f'
h(u)du
<
ClO
x
E
a
x = ~(a),
•
2
Ea(x,,(a»
= ..~(a).
Then bounded coefficient of variation implies:
sup
a
••
~(a)
/
-2
~
(a)
<
ClO
so that,
with
bounded
78
(This condition is satisfied in normal (a,l) and f-scale families, for
example.)
MI-M4 are trivially satisfied after suitable reparametrization through
Ea X
=~.
The same is true with M5.
For M6. observe that for any x. x
< x}
Pa(X
(3.4)
< x <x.
= e~(a}
~ e 9u h(u}du
~
(e 9x V
e~}e~(a)
=
e9x~(a) ~ h(u}du va> o.
i x h(u}du
va
x
~ K e
fX h(u} du
e(x-x'}
x
, (K =
-:=----)
for any x' > x.
fX,
h(u} du
x
For, if x' > x,
e~(e) ) ~, e 9u h(u)du ) e 9x ' ~, h(u)du.
Thus. in case. e is finite. since ~(a) ~ ~ as e ~
~
9. hence. Pa(X < x)
~ 0
as
a ~ e.
If. 9
= ~.
9. e9x~(a) ~
O. as a
by above inequality we
have.
Pe(X
< x)
Hence. M6 is satisfied.
o
as
ex. 2:
~
K e
a(x-x , )
~
0
as
e~9
= ~.
(A similar proof goes through for. Pe(X > x)
~
a ~ e.)
Let. X "" Uniform (o.a). e ) O.
It can be checked that for this
family also MI-M7 hold.
3.2.2
Loss functions:
As we have already indicated the loss function
79
for the simul taneous estimation problem would be taken as an addi tive
loss
The
function.
use
of
the additive
loss
function
is
quite
Since the decision problems are independent among
appropr iate here.
themselves it does not seem appropriate to consider a loss function
where the sample loss in the ith problem depends on the true value of
some 9 j ' j # 1.
In case the loss function is not additive. we shall
classify the si tuation as a mul tivariate decision problem.
(1980)
for an example of
the non-additive
loss
in a
See Brown
simultaneous
estimation problem.
Loss functions for the component problems would typically cover all
important loss functions.
For non-differentiable or discontinuous loss
functions we have to consider different techniques of analysis.
Here.
we
first
deal
wi th continuous
loss
functions which are
differentiable except possibly at a fini te number of points.
have:
We need the following assumptions for each 'i to satisfy:
Ll:
For each fixed 9 • 'i is continuous in x.
i
L2:
Except possibly at finitely many points til •...• tiSi.'i is
differentiable.
> O.
1...3:
For some a
L4:
For the same a as above. define
So. we
so
and we have.
1.5:
(Growth
condition
on
t
ii
):
Outside
containing til •... tiSi. t ii is Lipshitz of
L6:
some
order~.
compact
interval
i.e .•
The discontinuities are not spread very far away in the scale of
the standard deviation. i.e .•
L1:
a.
Moments of various derivatives of t i·.
For some r
> O.
Let.
b.
for all 9 .
i
c.
s. t .•
t ii
~
0
if
Xi
<fi
~
0
if
Xi
>fi .
Discussion of the assumptions:
1.
The non-differentiability occurs for some important loss functions.
In order to include loss functions like.
81
l{x.O)
= Ix-o I
or.
k
l{x.O) = Ix-o I
k
= Im-Ol
if
if
Ix-o I
Ix-o I
<m
>m
we have to consider possible non-differentiability.
2.
Lipchitz type conditions are allowed near the discontinuity points
in order to include loss functions of the form:
l.{x.O)
I
= lx-ala.
< a < 1.
0
And. in that case. iii also satisfies the required condition.
3.
For LP-type losses. where. 1
everywhere.
< p < 2. iii is no more differentiable.
Hence. in order to include those types of losses. 1.5 is
assumed.
4.
Typically.
the first derivative of li'
iii' gives a scale for
estimating the pointwise bias of the estimator.
li{x.O)
= (x-0)2.
iii
= 2{x-0)
As for example. if
gives us the pointwise bias of X.
So. in
order to make the loss function reasonable. it is natural to assume that
if the estimator is too much to the left (or. right) of a. the bias is
negative (or. positive).
Thus. 18 can be justified from this point of
view.
3.2.3
Class of improved Estimators:
As indicated earlier. we want to
improve essentially on the estimator.
It is also noted that in many cases by a suitable transformation we can
pretend that ,..
X is the standard estimator which is intended
improved upon.
6 I {!)
(3.7)
where.
=!
to be
So. in general. we seek improvement of the form:
+
z{!).
z{!) = (~I{!)'···'~p{!»··
But unfortunately it is very hard to have a handle upon very
82
general
~i{!)'s
unless we restrict to some situations where the exact
Stein type identities are available (like. normal or Gamma case under
quadratic type loss functions).
Also. by looking at the form of improvements already obtained for
various cases. it can be said that they follow a general pattern.
For example. in the normal case. the usual Stein rule estimator is
given by:
(3.8)
Also. as found by Berger (1980). in the simultaneous estimation of Gamma
scales. one comes up with solutions of the form:
(3.9)
1
~
i ~ p.
where. in most cases.
gj{x)
=x
d
j
or log x.
and.
Hence. it seems that it is really enough to restrict to the class
of rational functions while seeking the form of the improvement in
general.
But. as we shall see later. the need for this restricted class
of improved estimators is dictated by the fact that we have to compute
approximately some eXPectations involving
~i·s.
For those computations
we can restrict to the following class of estimators. which includes
some more general functions other than rational functions.
Define a class of functions:
lO
= {g
: g :
mP ~ m.
Ealglf
<m
for all! € 0
S3
s. t. ,
= b(~)
EO g(!)
....
where,
+ a(~) g(~)
supla(~)1
,o
...
sup Ib(~) I
<~
< ~}.
,o
...
p
a
It is easy to see that the monomials of the form, n Xi i are included in
1
this class of functions provided, a
i
~ r
and
sup Eluilr
oi '
< ~.
Now, we define the class of estimators in which we shall seek the
improved estimator:
Define the class,
(3.11)
where, gij and hij's are included in lO'}
So, II contains all rational functions in particular.
Now, typically we
need the denominator to be posi tive, but for general purposes it is
sufficient to assume that the functions
~i's
have finite moment so that
later calculations are meaningful.
Remark:
There is an interesting connection between the class lO and the
Hudson's identity.
The above identity considers the class of distributions for which,
= EO
EO g(X)(X-O)
a(X)g'(X)
of functions.
g'(O)
E~(X)(X-a)
=
~
0
for all g in certain class
Now, for any g in the class lO such that,
for all a outside a compact set,
* )(X-O) 2 ,
E~'(X
OK
we have,
after expanding g in Taylor series (EX=O).
Now, if g belongs to lO' then g is polynomially dominated.
Hence, g' also belongs to lO' and
84
(3.12)
g (X*)
I
:~ E9 { Ig'(9)
Now, if
I
_..2
co
< ,
u-}
then,
= g (9)
= [a(9)
2
I
E9 g(X)(X-9)
0
(9)
g'(X*) . .2
E9 g'(9) ur
02(9)]g'(9).
So, the above can be considered as an identity written in a parametric
form of
0
2
the Hudson's identi ty.
Also, note that for models where,
(9)/9 ... 0, we shall have,
g' (X*) -'" 1
E9 g'(9)
~
as
9 ... boundary of the parameter space.
so,
(3.13)
E9 g(X)(X-9)
= g'(9)
0
2
(9), which is the parametric form of the
Stein's identity.
Next,
we
prove
the
following
lemma,
which
essentially
generalization of the Stein type identi ty discussed above.
is
a
In what
follows, we assume the condi tions MI-M8 on the model and LI-LS on the
loss function.
COnsider an improvement,
=
g(X )
i
A + I heX
~i(!)'
of the following form:
>0
(3.14)
~i(!)
where,
g and h are continuously differentiable functions
j
)'
A
arguments.
Also, assume the following holds:
(3.15)
g'(x)
i)
>0
for large Ixl.
(i.e., x near ~ and x)
of
their
85
ii)
hex) ~ 1 for all x. and. h(~{X}) behaves like a polynomial
function of a for fixed x.y.
Now. state the lemma.
Lemma 3.1:
For
~.(X)
1
...
defined above. if
v.(9.)
1
1
~i
= x.(9.)lg(9
1
1
i)
<~
h'(fi)1
And.
~i
Proof:
=
sup
A.e i
Consider the expectation in the left hand side of (3.16).
Let us define the following quantities:
~
= h("k)
f
and ~
k
= f k (9k )
Then. rewrite (3.17) as:
= h(fk ).
1 ~ k ~ p.
is defined in LB.
where.
86
A
where, f
i
is an intermediate point.)
Consider the second term in the above expression:
(3.19)
where, Zi(Oi)
= lii(Xi,Oi)
A
g(X i ) h'(fi)(Xi-f i )·
Then, the random variable,
A
*
lii(Xi-f i ) g(Xi)h'(f i )
Zi = Ai(Oi) g(Oi)h'(f i )
is normal i zed proper ly for large
-1
(fi-Oi)a
tyPe.)
1°j I.
(Because, g' ) 0 u It imate ly ,
(Oi) is bounded in moment and both g and h are polynomial
Now. if Zi* € lO (as defined before).
Hence. we have the following:
87
Ai (9 i )lg(9 i )h'(f i )1
A + h + }; h
iO
j
j;1!i
where,
<
CIO
Thus. rewriting (3.19) once more. we obtain:
(3.23)
= ~i
Ai (9 i } Ig{9 i }h'(f i }1 EO
1
2
... (A + D )
i
Hence. we have the lemma.
QED
3.3 A Stein type identity for general decision problems.
We have seen from the heuristics of previous sections that in order
to get an improved estimator, we essentially have to analyze the problem
near the boundary of the parameter space.
Following Berger {1976c}, we
first establish a representation for the risk-differential. A'l (0).
near
...
the boundary of the parameter space.
We shall call that a Stein tyPe
identi ty and try to justify that nomenclature.
Then.
section
construct
we
estimators.
assumed.
would
use
this
representation
to
in the next
improved
All the assumptions (M.L) stated in previous sections are
Now. suppose that the potential improvement 'li for the ith
component is given as before. i.e .•
A
> O.
c
>0
88
e
I
Assume the following for
II:
~i
and
~ij
12:
i}
il}
~i:
is continuously differentiable up to the first order.
{the jth partial or
sup
!
biHDI
°i{Si}
~i}
Also.
~i
belongs to the class lO·
< c for c sufficiently small and A large.
sup sup I~i .{!} I = d
j
x
J
< 1.
#>I
Then. we have the following lemma.
Lemma 3.2:
Suppose that for fixed 1
Assume the model assumptions MI-M8.
that si=1 in L2.
~
i
~
p. li satisfies LI-LS.
Without loss of generality assume
Then. there exists an open neighborhood. Ni • such
that.
as 0 -+ 00.
#>I
ii}
Proof:
li is regular outside N {i.e .• Taylor series is possible.}
Let. t
i
i
denote the discontinuity of lii.
write.
Now. define. N as follows:
i
Then. by L3 we can
89
Hence,
For fixed, !(-i) = (xl'··· ,x i - 1 ,xi+1'··· ,xp ) expanding "1' i in one step
Taylor series about t
i
yields,
.....
(3.28)
"1'i(xi'!(_i»
= "1'i(ti'!(_i»
+ "1'ii(ti'!(-i»(xi-ti).
Thus, by 12,
From (3.29) it follows that,
Hence, the right hand side of (3.27) reduces to:
~
(1 +
l~)a EO
-(-i)
<
Now,
l"1'i(ti'!(_i»l
l"1'i(ti'!(_i)I
1 - d
)
a
Po (lxi-til
i
90
Jl
= POi (lUi
-
(Oi)
~i(Oi)
Next. notice that by L6 and Il-I2 we have that.
and.
Hence. for A sufficiently large and c small. we have
(3.32)
POi(IXi-til <
= O(
l~i(ti'~(_i»I(I-d)-I)
(l~i(ti'!(-i»I)
0i(Oi)
) for A large (uniformly in !(-i) and 0i)·
Hence. (3.31) can be rewritten as:
-
8 -+
(since. g. h
€
1
20 , EO
A + I h(X)
-( -i)
j;l!i
j
an.
= O(A
1
+ I h(O » near
Thus. from (3.27). (3.31) and (3.33) we obtain.
j
an.)
91
Hence, (i) follows.
(ii) now follows trivially from the definition of
QED
c
Next, on the set Ni , we can expand the difference,
li(x i ,9 i »
in Taylor series.
= lii(x i ,9 i )
(li(xi+~i,9i)
-
Hence, for fixed !(-i)' we have
"
~i(!) + (lii(x i ,9 i ) - lii(xi,9i»~i(!)
where, lii(x" i ,9 i ) is the derivative of li evaluated at an intermediate
point between Xi and (xi +
~i(!»·
Hence, we have,
Now, by the growth condition L5 on lii' we can write,
(3.37)
Hence, by previous arguments we get the following representation:
Next, we write,
92
The following lemma establishes an bound for the second term above.
Lemma 3.3:
Under the above definitions and regularity conditions:
(3.39)
Proof:
As before, without loss of generali ty we assume li as only
singularity at t .
i
Hence,
Now, by the assumption L4, we have that the above quantity is less than
or equal to:
as
-e
-+ c30
This follows from the same argument used to derive the lemma 3.2.
Remark:
QED
In some situations of interests, the above term is actually of
smaller order.
In case the following is true:
we can say little more, namely that
93
Typically. we would have.
"Y i i un
-+
0
...o -+
as
an.
Hence. we have.
(3.39) is of smaller order.
We state the main theorem of this section now.
This type of
representation has been considered by Berger (1976c) while considering
the concept of tail minimaxity in location families.
This type of upper
bound on the difference between the risk functions of the standard
estimator.
!. and the potential improvement. "Y(!). will finally let us
to solve for the improvement
Theorem 3.1:
Assume
A.
estimators.
~
~(!)
the regularity conditions M.L.1.
Define
the
A.
and
21 as follows:
A.
(3.44)
"Y i (!).
A.
=! and 21(!)
= X IV
c f?j(!)
------~
P
X - "Y(X)
IV
,.,,'"
A + I hj(X j )
j=l
where.
f?j(:S) = (gl(x 1 ) ..... ~(xp»'
and
A. c
> O.
values of A and small values of c are of interest).
(Typically. large
Also. assume that.
...o -+ 00 .
Then. we have.
(3.16)
A~(2) ~ -j~l c{+j(Oj)
p
+
O( I
j=1
EOj A +lD j
~j
vj(Oj) EOj (A +lD )21
j
kj(Oj)I"Yj(!)11+a
p
l+P
a (9 )
) + O( I dj (9 j >l"Y j (!)I
)
j
j
j=1
94
before.
Moreover. if li is differentiable everywhere. the second term in
(3.46) vanishes.
Proof:
The proof follows from earlier lemmas.
As 0
""
an.
~
p
'"
= j:1 [IN~ [-~j(~)ljj(Xj.Oj) - ~j(~){ljj(Xj.aj)
1+a
Kj(O j) I~ jO!> 1
- ljj(Xj,O j )}] fO(!)~ + O(
U (0 )
""
j
)
j
(follows from the lemma 3.2)
p
= - 1 EO
j=1 -
ljj~j
p
+
1 IN ljj ~j f
j=1
i
p
!
+ O( 1
j=1
11+a
I~
K
j
uj
j
)
95
p
+ O{ I
j=l
The last steps follow from lemmas 3.1. 3.2. 3.3 and (3.45).
Remark:
QED
The above theorem is very crucial in obtaining the form of the
Stein type improvement.
Actually. what it shows is that some kind of
differential inequality holds near the boundary of the parameter space.
This type of differential inequality is very similar to ones obtained by
integration by parts (we already mentioned earlier).
But. to show that
!is inadmissible. we do not need the inequality in the exact form. (the
exact form is possible to derive only if there is some specific relation
between l. and f 0).
J
,.,
Actually. by adjusting c and A. we can use the
above inequality to find the improved estimator.
heuristic solution to (3.46).
We describe below a
This heuristic solution will give us the
reqUired form of the improvement and also the critical dimension needed
for inadmissibility of X.
,., in some situations.
3.4 Heuristic solution of the Stein type inequality:
Here. we describe a way to solve for! in (3.46).
the D-terms in that expression.
the other.
First consider
Typically. one of them will dominate
While giving the general heuristic for solving (3.46). we do
not have to decide which one is the dominant term. except for observing
that both the terms have the following structure:
(3.47)
~ wj{Oj)lg{Oj)l
q
j=l {A + I h{Oj»q
for some weight function w and exponent q.
j
Next. consider the first term.
Break it into two parts.
Hence. we
96
obtain the following,
Now, in case we have, aj(Oj)/Oj
~
But,
in general,
as
assuming
OJ
~ ~j
or OJ' we can find out
-1 and E(A + D ) -2 as ~ ~
precise order of the expectations, EO(A + D )
j
j
an.
0
only
the
bounded
coefficient
of
variation, all we can say is,
1 "7':(O::-j~) ,
( 3.48)- E~ A +1Dj ~ aOj ~A-+-:I;:-;"hj
Actually,
in some
estimates of a
Oj
special
and a
1j
.
cases
it
v
is
j
possible
to
obtain better
Now, (3.48) is, in turn, smaller than,
~ ~j a 1j Vj(Oj)]
j=1
A + :I hj(Oj)
1
A + ~ h(Oj)
j=l
provided
~j(Oj)
>0
for all OJ.
Now, intuitively it seems quite clear
that (although, we can not give a formal proof of that fact here), one
should try to obtain a gj such that the following holds:
Any such gj
(actually,
there would be essentially unique such gj)
satisfying (3.50) would be called a cross product (or,
stabilizer with 'jj.
claim:
covariance)
Once we obtain such a gj' we can invnediately
97
Now. we go back to (3.41) once more.
The order is given by:
! wjlgjlq
(3.52)
(A + ! hj}q
Notice that (3.47) is always positive.
So. to find a solution in (3.46)
one must have the following:
1
which leads to the following choice of h :
j
1
(3.53)
h ,.. w q -l
j
j
as
ej
... ~j
or
ej
And. we have to verify one more condition at this stage.
(Typically.
for most of the problems this will be automatically satisfied.)
(3.54)
Then. we can write. after using everything derived above.
As we can see from the earlier considerations that the sequence
{a
1j
} are uniformly bounded under the assumptions M.L.!.
{~j}
Thus. if
and
98
~
= pc
A~(!) ~
We get
0 for all
I
!
!. if A is large enough. This is essentially a
generalization of the large dimensional argument considered by Stein
For. precise lower bound of the critical dimension p • we need
(1956).
to
c
know more
about
the
distribution
and
some
more
manipulations
(sometimes quite complicated) in special cases might be ,necessary.
3.5
Some Applications:
The above techniques and heuristics can be used virtually in any
simul taneous estimation problem.
Example 1:
(location problems)
where.
are
real
We consider a few examples below:
Consider the following model
valued
parameters
and
are
independent random variables wi th mean O. and fini te moment up to some
order (to be specified later).
Consider the loss function of the following form:
Here. wj is a positive weight function and P is a nonnegative function
j
such that.
i)
Pj is differentiable everywhere (except possibly at 0).
ii) Pj(x)
= ~j(x)
is nondecreasing.
(x
~
0)
Also. assume that. if P is not differentiable at O. p(x) small.
Hence. from preVious considerations. we have a=1.
Ixl for Ixl
Also. Kj(Oj)
•
99
= wj (9 j ).
Next consider
defined before.
improvements of
the form 'l'j(!) which have been
Then. we would have.
T
= sup
j
sup I'l'j(!)I
< m.
!
and.
sup l~j(x+t-9) - ~j(x-9)1 ~ Gj (x-9)T
Itl<T
except possibly in a neighborhood of
o.
Hence.
for T small.
~=1.
Now. the first step in solving for the improvements is to find out
the asymptotic covariance stabilizer of lii.
Here.
"-
(Note that here we take the best invariant estimator
standard estimator.
later.
~
=
!
as the
we will consider Bayes estimators with
respect to some improper priors.)
Hence. we want to stabilize.
Now. observe that for Xi to be admissible in one dimensional problems.
we mus t have.
otherwise. some shift of Xi would be a better estimator.
Actually. we
can assume (3.60) without loss of generality.
We solve for g in (3.59) next.
(3.59) can be rewritten as:
100
If there exists a solution such that, g'(e)
(Since, both g and
(Eg)(E
~i (U
i
»
~
= 0.)
gi(x)
everywhere, then,
are nondecreasing so we have, E
= jX Wi~U)
For that, choose, g such that,
du
for
Ixl large.
Now, from the considerations of the earlier section:
(3.63)
mOj
Also, note that
a=~=1.
Hence we obtain the following:
-
as 8
-+
an.
Also, in place of (3.52) we obtain:
(3.65)
:I w/9 j )lg j
{:I h
j
I2
g(9i+Ui)~i(Ui)
~
Hence, all we need to do is to solve for the
stability of (3.61) near boundary.
In other words,
>0
(q=2)
+ A)2
Hence, (3.53) yields that the choice of h
j
should be
101
And, (3.66) in turn gives us the following expression:
as c
-+
0
(3.67) follows after observing that,
=2
wj (9 j }gj(9 j } Ig(9 j }I
2
wj (9 j } (1 +
wj(9 j }
(W (9 )
j
-+
0
wj(9 j }
W
j
(9
)}
j
as
j
The above expression reduces to more familiar expression if we consider
the following loss function:
=x
In that case, choosing g(x}
(3.69)
A (!) ~ r
proving that,
of the form:
c
A +
~(!)
suffices, and thus we obtain:
2 {p-2+0(1}}
11 9 j l
is inadmissible for p
~
3 and an improvement exists
102
(3.70)
~I09 =
! -
x...
c A + IIXII 2
...
Notice that this result actually follows from Brown (1966).
Remark:
In this analysis, (3.63) is a very crucial expression.
The
exact evaluation of (3.63) might be qui te hard in some cases wi th
general weight functions.
Actually,
a better choice of g can be
described as follows:
Consider the expression (3.63).
g(x)-g(S)
iri ting, gO (S + au) = --x-_""':S:--- we
obtain the following:
(3.64)
(g(X) - g(S»
gO(S)
Es
~(X-S).
It can be shown that the above expression equals,
(3.65)
1
gO(S)
eo
I-co
F(v)
°
g (S+v)dv.
where,
F(v)
=
{
J~ 1~(u)lf(u)du
for
v
~
1~(u)lf{u)du
for
v
>0
leo
If
eo
I-co
~(u)f{u)du
increasing on
=
v
0,
then F
is a
positive continuous
(-CO,O) and decreasing on (O,eo).
best choice of g can be given as; choose g
holds:
(3.66)
1
i~f gM'{S)
eo
I-co F{v)
M'
g
{S+v)dv
0
M
function
In that situation, the
such that the following
103
1
sup
=
co
•
i:f g'(6) J-co F(v) g (6+v) dv.
g
lim g' (a)w(a)=1
lal~
where. g* obviously satisfies
lim g*' (a)w(a) = 1.
lal~
It seems to us that
for most of the choices of w. there will exist a g* such that.
(3.67)
co
(1-E.)E U,,-,(U) = (I-E.) [J-co F(v)dv]
a
~
co
*'
J-co
F(v) g (a+v)dv
1
~ inf
*'
g
(a)
E U/J(U)
for a sufficiently small
E..
Even if (3.66) can not be satisfied by any g* • (3.67) will have some
solution.
Example 2:
(Poisson family)
Suppose. X •...• X are independent Poisson random variables. with
1
p
Xi distributed as Poisson wi th mean a .
i
We want to estimate.
!
=
(a •...• 6 )· under loss functions of the following form:
p
1
(3.68)
This
type
of
loss
functions
(weighted
considered by Ghosh. Hwang and Tsui (1984).
quadratic
loss)
has
been
Their method consisted of
repeated use of the Stein type identity for poisson distribution which
leads to an unbiased estimator of the risk differential.
The next step
is to solve the differential inequality:
UgO£) ~ 0
where.
EO Ug (!) = R(! + g(!).!) - R(!.!).
-
A solution to the above inequality leads to an estimator improving upon
1M
the standard estimator!.
Using our technique we get the form of the
improved estimator quite easily.
Exact evaluation of the best choice of
those constants
in
(A,C)
involved
the
improvement might need some
further justification.
There is one special adjustment needed in this case.
the fact that, a(9}/9
~ ~
as 9
~
Because of
0 for Poisson distribution, the model
assumptions MI-M7 are not satisfied near 9=0.
But, that can be adjusted
by 'not improving near the bad corner' (which shall be made clearer).
First notice
the following fact that if mj
~
0, we can not find out any covariance
stabilizer near O.
For any g :
(3.69)
U {O}
~
~~,
consider the quantity:
m
[E g(X}(X-9}]9,
9
m€ Z
where, X - Poi(9}, and 9 is close to O.
(3.70)
(3.69) equals,
-9
-9
m
9k +m
[g(O)(-9)e
+ g(I)(1-9)ge
+ ... ]9 . - kg(k) ~as 9
where k is the smallest integer such that, g(k}
m
Hence, [E g(X}(X-9)]9 ~ 0
9
as
9 ~0
whenever m ~ O.
~
g(j)
Now, as 9
(3.72)
=0
~~,
for j ~ Iml
and
g(lml}
But, for m
> O.
we can use our previous argument to show that,
m
9 E g(X}(X-9} - 9m g'(9) E(X-9}2
9
= 9m+l
Hence, the form of the covariance stabilizer near
g'(9}.
~,
0,
O.
we can choose g such that,
(3.71)
~
is given by:
< 0,
•
105
(3.73)
if m = 0
g(O) - log 0
'" olml
'" (1 - 0
-m
)
if m
<0
if m
> O.
As an example. first consider the case of the standardized quadratic
loss where.
m1 =
Choose.
g(O)
Hence.
~
= ...
= mp = -1.
= O.
0-
1
EO g(X)(X-O)
= 0- 1 E X(X-O) = 1..
Also. from previous considerations «3.65). (3.66»
the quadratic part
of the expansion turns out to be:
Hence. the obvious choice of h j is:
hj(Oj)
= OJ.
Thus. we obtain the following form of the improvement.
(3.74)
'Y
j
09
P
j=I.2 •...• p.
A+ I ~
k=1
Also. it turns out that estimator of the form (!-!(!» dominates the
standard estimator in risk. if 0
<
(This needs some more calculations.
c
< 2(p-l)
and A
~
p-l for p
~
2.
Our results only say that the above
estimator dominates! for c small and A large.)
In the second example assume that.
106
ml
= ~ = ... = mp = o.
This is the case of the standard quadratic error loss where there is no
covariance stabilizing transformation.
First observe that. by the Stein
type identity for Poisson variables. we have.
9 E9 g{X)
Thus.
E9 g{X){X-9)
= E X g{X-l).
= E[g{X)
for g such that g{-I)
= O.
- g{X-l)]X.
So. by choosing.
x
gl{x) = I
j=1
1
J'
= 0
x ~ 1
x = 0
we obtain the following:
Note that.
for x large as suggested by (3.73).
gl{x) - log x
Also. the quadratic part turns out to be:
Thus. h j {9 j )
2
= gl{9
i)
2
- log
9j
for large 9 j .
Hence. we obtain the following form of the improvement:
A+
P
I
j=l
2
log (X j +l)
But. now due to the fact that E g{X){X-9) - 9 for small 9. we have to
9
consider the following modification to (3.76):
107
(3.77)
where. N(!)
p
=
rr
j=1
I(X j
~
1).
This is what is meant by the statement that one should not try to
improve near the bad boundary point!=Q.
Thus. the estimator. 6 (!)
I
such that.
(3.78)
1
-
will dominate X.
~
j
~
p
~
for c small and A large for p
3.
This is the
estimator obtained by Ghosh (1983). where the choice of N(!) is little
more general.
Actually.
any bounded monotone
increasing
(in each
coordinate) N(!) will do the job.
Example 3:
(scale family)
Here. we shall consider the following model:
Suppose Xl •...• Xp are
independent random variables where.
where
Ul •...• Up
are
independent
distributions are independent of
scale factor!
= (Ol •...• Op)·.
positive
a.
random
shall
be
primarily
where. ! € 0
concerned
considered by Berger (1980).
(3.80)
whose
Our purpose is to estimate the
= {! > Q}.
includes the commonly discussed Gamma scale problem.
We
variables.
with
the
This example
See Berger (1980).
similar
loss
Define the loss functions as:
functions
108
The best scale invariant estimator of Oj1 in the jth component problem
is given by:
(3.81)
A-I
OJ
cj
=cj
Xj •
1
~
j
~
P.
where.
= E Uj
E U2
j
To find out the covariance stabilizer. we need to solve the g from the
following quantity.
(This can be derived just by expanding the loss
function) :
(3.82)
0m+1 EO g(X) (c X 0-1) = constant.
since. U
= XO
has a distribution independent of O. it is natural to look
q
at polynomial improvements. i.e .• g(x) = x • for some q.
By choosing. g(x)
(3.83)
= x(m+1)
we obtain.
0m+1 EO X(m+1)(c XO-l)
= E U(m+l)(c
U-l). if this is finite.
But. the above expression turns out to be 0 for m=O because.
E U(c U-l)
=Ecu2-EU
= (E
u2)
~
- EU
= O.
In that special case. the required choice is:
g(x)
=x
log x.
so that. we have.
(3.84)
0 EO X log X (c XO-l)
= 0 E 0- 1 U (log U-Iog 0) (c U-l)
109
= (E U2
log U) c - log
U2 log- U
EU2
= -{EU}- E
eE
U{c U-1)
provided this is finite.
Now, coming to the quadratic part of the risk differential, we get,
(3.SS)
Hence, the right choice of h
j
turns out to be:
In other words,
m
(3.S7)
h .(x.) = x
J
J
=
j
j
Itn xj 12
Observe that above arguments work, only when the relevant expectations
are fini te.
For various choices of m ' s we get various forms of
improvements.
Below, we show how they coincide with the form of the
j
estimators obtained by Berger (1980) in Gamma scale problems.
Running through the choices m₁ = ⋯ = m_p = 1 down to m₁ = ⋯ = m_p = −2, one recovers, case by case, the forms of improvement obtained by Berger (1980); in the last case (m_j ≡ −2) the improvement term has the denominator A + Σ_j 1/x_j². Estimators of the above form dominate the standard estimators for small c and large A. Also, whether we have to subtract or add (i.e., shrink or expand) is determined by the constants obtained in (3.83) or (3.84). In the case m_j ≡ −2, Berger obtained another term of smaller order together with our form of improvement; but by adjusting A properly the second term can be omitted. Again, the precise calculation of c and A is not pursued here.
3.6 Treatment for discontinuous loss functions:

Suppose we have a loss function ℓ_i which is discontinuous at t_{i1}(θ_i), ..., t_{is_i}(θ_i) for some i, where the t_{ik}'s satisfy

sup_{θ_i} sup_k |t_{ik}(θ_i)| < ∞.

Then it can be seen that the techniques discussed in the previous section become quite messy (although they work). In this type of situation we would like to use the technique considered by Brown (1966).

First consider the function H: ℝ^p → ℝ^p defined by H(x) = x + η β(x). Then

∂H/∂x = I + η ∂β/∂x.

Now, by assumption E, ∂β/∂x is bounded. Thus, for η sufficiently small, we have

|∂H/∂x| = |I + η ∂β/∂x| > δ > 0

for all x, for some δ > 0. Hence we can say that the transformation H is 1-1 for η sufficiently small.
Let

Z = H(x) = x + η β(x).

Then, after inverting H, we can write

x = z + k β̃(z),

with k → 0 as η → 0. Thus, instead of parametrizing the family of estimators by β, we can as well do it by using β̃. Now, it is an obvious consequence of the previous arguments that we can do the same thing with single components, i.e., considering the map for each i but with x_(−i) fixed. Then the map would be inverted as

x_i = z_i + k β̃_i(z_i, x_(−i)).
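The inversion of H can also be carried out numerically by fixed-point iteration, since x ↦ z − ηβ(x) is a contraction when η sup|β′| < 1; a small sketch with an arbitrary smooth β chosen purely for illustration:

```python
import numpy as np

def beta(x):
    # illustrative smooth perturbation with bounded derivative (sup|beta'| = 1)
    return np.tanh(x)

def invert_H(z, eta, n_iter=50):
    """Solve z = x + eta*beta(x) for x by the fixed-point iteration
    x <- z - eta*beta(x), a contraction when eta * sup|beta'| < 1."""
    x = z.copy()
    for _ in range(n_iter):
        x = z - eta * beta(x)
    return x

z = np.linspace(-3, 3, 7)
eta = 0.1
x = invert_H(z, eta)
print(np.max(np.abs(x + eta * beta(x) - z)))   # ~ 0: H(x) = z is recovered
```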
Now, with this in mind, consider

(3.88)  ∫ ℓ_i(x_i + η β_i(x), θ_i) dP_θ(x).

If dP_θ(x_i) has a density which satisfies properties L1-L5, then (3.88) can be written as

E_{x_(−i)} [ ∫ ℓ_i(z_i, θ_i) p_i(z_i + k β̃_i(z_i, x_(−i)), θ_i) (1 + k β̃_ii)^{−1} dz_i ].

So, the risk differential becomes

E [ ∫ ℓ_i(z_i, θ_i) { p_i(z_i + k β̃_i(z_i, x_(−i)), θ_i)/(1 + k β̃_ii) − p_i(z_i, θ_i) } dz_i ],

where β̃_ii denotes ∂β̃_i/∂z_i. The same program can be carried out in this case also, and one can show, as before, that Lemmas 5.1 and 5.2 hold on N_i; on N_i^c the corresponding bound holds. Here we assume p = 1 for the sake of simplicity. Also, observe that one term of the order β̃_ii is added here.
For simplicity, assume that a = 1, so that S_u is bounded near its possible discontinuities. Hence, we have the following lemma:

Lemma 5.4:

| ∫_{N_i} ℓ_i(z_i, θ_i) [ p_i(z_i + k β̃_i, θ_i)/(1 + k β̃_ii) − p_i(z_i, θ_i) ] dz_i |
   = k² A_i(θ_i) K_i(θ_i) |β̃_{i0}| |β̃_{ii0}| + O( σ_{ii}(θ_i) |β̃_{i0}|² + σ_{1i}(θ_i) |β̃_{ii0}|² ).

Proof: Fix k, 1 ≤ k ≤ s_i. By (3.89),

(3.91)  A_i(θ_i) ∫_{|z_i − t_{ik}| < k β̃_i} ⋯ dz_i

bounds the contribution of the kth discontinuity. If the family w_i(z_i, θ_i) satisfies the properties M1-M7, then, following the arguments as in Lemma 3.1, we obtain the stated bound.
As a final remark, one should note that obtaining the form of the familiar estimator without going through the laborious route of solving the standard differential inequalities (coming out of the unbiased estimate of the risk difference) is a clear advantage of our method; it also allows us to deal with a very general situation. The disadvantage is that, in this process, we might not be able to obtain good estimates of the constants C and A, and also, in some situations, the right critical dimension of inadmissibility. But in the cases where our method fails to give the right critical dimension, the question cannot be resolved by the standard technique either. Moreover, as long as we are interested in finding a single improved estimator to establish the inadmissibility of the standard estimator, the exact evaluation of C and A is not necessary. Overall, this technique gives us better insight into Stein's paradox.
CHAPTER IV

A UNIFIED ADMISSIBILITY RESULT

4.1 Introduction:

In this chapter, we address some general admissibility questions related to Stein's phenomenon. When a compound decision problem, as described in Chapter 3, is considered, it is generally assumed that the estimator which is to be improved upon is componentwise admissible. Otherwise, in simultaneous estimation problems, componentwise improvements will already give us an overall improvement.

The main result we attempt to prove in this chapter can be described as follows. Suppose that we have a componentwise admissible estimator. When we consider Stein type improvements, they seem to work only after a critical dimension (e.g., 2 for the classical normal mean problem). Now we ask the question: what happens if the dimension is less than the critical value? Is the standard estimator then admissible?

The answer is known to be "yes" in some special cases; for example, in the normal mean problem or the gamma scale problems. See Brown (1966), Berger (1980), Brown & Hwang (1982).

A positive answer to this question shows some kind of optimality of the Stein type improvement and establishes it as a very basic method of improvement in general. The fact that such a result can be true stems from the argument brought up by Stein (1956). He showed that, in the normal location problem, the improvement over the best invariant estimator can at most be of the order O(1/‖θ‖) near the boundary of the parameter space (i.e., for ‖θ‖ large), and the type of improvement suggested in that paper (of the order 1/‖X‖) was shown to attain the optimal order. Also, Berger (1976c) established similar results for location problems under arbitrary loss functions, while describing the concept of tail minimaxity.
When we talk about the admissibility of an estimator, it is very difficult to handle the situation mathematically without introducing Bayes estimators. In many situations it can be shown (at least in standard one dimensional problems) that the Bayes or generalized Bayes estimators form a complete class. Bayes estimators with respect to proper priors are straightaway admissible. So, by virtue of the independence of the component problems and the additivity of the loss functions, componentwise Bayes estimators will be admissible in the compound estimation problem. We shall restrict ourselves to standard estimators that are componentwise generalized Bayes.

In this situation, the most commonly used technique is Blyth's method. Brown & Hwang (1982) and Dasgupta & Sinha (1986) considered the exponential family of distributions under squared error loss and obtained a fairly complete answer to the issues related to the admissibility of standard estimators in compound problems. Again, the technique was basically a nontrivial application of the Blyth method.

In this chapter, our main objective will be to extend the results obtained earlier to more general situations, and then apply them to obtain the desired results about the optimality of Stein type estimators.
4.2. Results from exponential families:

Consider the following setup, where the random vector X has a distribution F(·, θ) on ℝ^p with a density

(4.1)  f_θ(x) = e^{θ·x − ψ(θ)}  with respect to ν,

where θ is an ℝ^p-valued parameter and ν is a σ-finite Borel measure on ℝ^p. The natural parameter space is defined to be

Ω = { θ : ∫ e^{θ·x} dν(x) < ∞ };

Ω is typically an open subset of ℝ^p. Also, if ν is a product measure then the X_j's are independent. Consider the problem of estimating the mean vector ∇ψ(θ) under the quadratic loss function

(4.2)  L(a, θ) = ‖a − ∇ψ(θ)‖².

Then the risk of any nonrandomized estimator is given by R(θ, δ) = E_θ L(δ, θ).
Let Π be any nonnegative measure on Ω such that Π(K) < ∞ for every compact K ⊆ Ω. If Π admits a density π, then the Bayes estimate (possibly generalized) of E_θ X given Π is

(4.3)  δ_π(x) = x + I_x(∇π)/I_x(π),

where, for any h ≥ 0,

(4.4)  I_x(h) = ∫_Ω f_θ(x) h(θ) dθ,  a.e. (ν).

Now, from the standard theory it follows that if Π is a finite measure, then δ_π(x) is an admissible estimator of ∇ψ(θ). But many of the commonly used estimators of E_θ X cannot be obtained as proper Bayes estimators. Typically, they will be Bayes with respect to some generalized prior distribution (i.e., Π(Ω) = ∞). For example, the UMVUE is obtained when π ≡ 1, and if the parameter space is unbounded, this is an improper prior.
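Formula (4.3) can be checked directly in the normal mean problem, where the proper-prior Bayes estimator is available in closed form; a numerical sketch (the prior, the point x, and the quadrature limits are illustrative):

```python
import numpy as np
from scipy.integrate import quad

tau = 2.0                                    # prior: theta ~ N(0, tau^2)
pi  = lambda t: np.exp(-t**2 / (2 * tau**2))
dpi = lambda t: -t / tau**2 * pi(t)          # derivative of the prior density
f   = lambda t, x: np.exp(-(x - t)**2 / 2)   # N(theta, 1) density; constants cancel

def delta(x):
    I_pi  = quad(lambda t: f(t, x) * pi(t),  -50, 50)[0]
    I_dpi = quad(lambda t: f(t, x) * dpi(t), -50, 50)[0]
    return x + I_dpi / I_pi                  # formula (4.3)

x = 1.7
print(delta(x), tau**2 * x / (1 + tau**2))   # both equal the posterior mean
```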
For the generalized Bayes estimators, admissibility results were obtained by many authors. Those are essentially special applications of Blyth's lemma, which has already been described in Section 1.2 of Chapter 1.

Under the loss function (4.2), the following general result was obtained by Brown & Hwang (1982).

Theorem 4.1: Assume (4.1) and (4.2). Consider the generalized Bayes estimator described in (4.3) with respect to the prior Π, and suppose the following conditions are satisfied:

i) ∫_{Ω∖S} π(θ) / (‖θ‖² ln²(‖θ‖ ∨ 2)) dθ < ∞, where S ⊆ Ω = ℝ^p is compact and Π(S) > 0;
ii) ∫ ‖∇π(θ)‖² / π(θ) dθ < ∞;
iii) sup_K R(θ, δ_π) < ∞ for every compact K ⊆ Ω.

Then δ_π is an admissible estimator of ∇ψ(θ).

Proof: See Brown & Hwang (1982).

Remark 1: The above result can be restated under slightly modified loss functions (namely, weighted quadratic loss). The only change will appear in the statements of conditions (i) and (ii).

Remark 2: Dasgupta & Sinha (1986) proved an analogous result for estimating linear parametric functionals of the form L∇ψ(θ) and considered the admissibility of the corresponding linear estimators. This essentially reduces to estimating ∇ψ(θ) under the loss (a − ∇ψ(θ))′ L′L (a − ∇ψ(θ)), where L may be of smaller rank than p.

Remark 3: For the case p = 1, the above result is a generalization of Karlin's (1958) result on the admissibility of linear estimators of the mean. One point of interest in this regard is to prove the converse of that result, which is known as Karlin's conjecture; see Joshi (1969) for a proof in some particular cases.
4.3 A unified admissibility result for a more general setup:

4.3.1 The setup:

Here we go back to the usual setup in multivariate analysis; the simultaneous estimation situation arises as a special case of this general situation. The main purpose is to obtain a general admissibility result similar to Theorem 4.1.

Suppose that we have a family of probability measures {P_θ(·) : θ ∈ Ω} on (ℝ^p, 𝔅), where 𝔅 is the Borel σ-field on ℝ^p. Let

dP_θ(x) = f_θ(x) dν(x),  a.e. (ν),

where ν is a σ-finite dominating measure and Ω ⊆ ℝ^p is an open subset. For simplicity, assume Ω = ⨯_{i=1}^p (a_i, b_i), a product of open intervals. Assume the following:

A1: E_θ X = θ;
A2: Cov_θ(X) = Σ(θ), where Σ(θ) is p.d. for all θ ∈ Ω, and lim sup tr(Σ(θ))/‖θ‖² < ∞ as θ approaches the boundary of Ω.

Along with the above assumptions, we also assume the earlier conditions M1-M7 in their generalized form, which are fairly straightforward to verify.

Remark 4: Usually, the natural parameterization in exponential families will not satisfy A1; one then has to use the mean parameterization.
Next, to complete the required decision theoretic setup, we define the loss function for the estimation of θ. Here, ℓ is considered to be a quadratic type loss admitting a Taylor series expansion of the following type:

(4.5)  ℓ(t, θ) = ℓ(s, θ) + ∇ℓ(s, θ)′(t − s) + ½ (t − s)′ H(s, t; θ) (t − s),

where H = H(s, t; θ) is the Hessian of ℓ (when it exists), evaluated at an intermediate point. We assume further that

(4.6)  H(s, t; θ) is nonnegative definite, where it exists,

which makes ℓ a piecewise convex function.

A plausible loss function should be, in some sense, an increasing function of ‖t − θ‖. To have that property, we impose a condition of the following form. For fixed θ, let the map θ → ρ(θ) be defined by

(4.7)  ℓ(ρ(θ), θ) = min_t ℓ(t, θ)  (unique minimum);

ρ(θ) is continuous and, moreover, for compact sets K ⊆ Ω and A ⊆ ℝ^p, there exists another compact set B ⊇ A, B ⊆ ℝ^p, such that

(4.8)  sup_{(t,θ)∈A×K} ℓ(t, θ) < inf_{(t,θ)∈B^c×K} ℓ(t, θ) − δ,  for some δ > 0.

Remark 5: Actually, nondifferentiable loss functions satisfying L1-L5 of Chapter 3 can be handled in this very general setup too. But that would lead to immense mathematical technicalities rather than broader generality, so it is omitted here.
Now we consider some prior distribution (possibly improper) G(θ) on Ω. Assume that G(θ) has a density g(θ) with respect to Lebesgue measure on Ω; this is not a very severe restriction.

Definition 4.1: For a prior distribution G on Ω (possibly improper), δ_g(x) is called a generalized Bayes estimator with respect to G if

(4.9)  φ(δ_g(x), x) = min_{t ∈ ℝ^p ∪ {∞}} φ(t, x),  where φ(t, x) = ∫_Ω ℓ(t, θ) f_θ(x) g(θ) dθ.

The above definition is actually the standard definition of a Bayes estimator. But (4.9) is not convenient to handle mathematically; it will be useful to rewrite (4.9) in an estimating equation form. The following proposition rewrites (4.9) in a more convenient form.

Proposition 4.1: Assume the above setup and (4.6), (4.7), (4.8), and further the following:

PA1: Define N = {(t, θ) : ℓ(t, θ) is not twice continuously differentiable}. Then ν × G(N) = 0.

PA2: For a.e. x (ν), φ(t, x) is strictly convex in a neighborhood of its minimizer.

PA3: Differentiation under the integral sign is valid, i.e.,
  i) ∇φ(t, x) = ∫ ∇ℓ(t, θ) f_θ(x) g(θ) dθ, a.e. (ν);
  ii) the analogous identity holds for the Hessian H_φ(t, x).

Then the minimum in (4.9) is unique and is given by the unique solution of the estimating equation

(4.10)  ∫_Ω ∇ℓ(t, θ) f_θ(x) g(θ) dθ = 0,  a.e. (ν).

Proof: Due to the differentiability of φ, if the minimum occurs inside ℝ^p then (4.10) must be satisfied. So we first prove that the minimum occurs inside a compact set. For 0 < ξ ≪ δ, choose a compact set K ⊆ Ω such that

(4.11)  ∫_{K^c} f_θ(x) g(θ) dθ < ξ M,  where M = ∫_Ω f_θ(x) g(θ) dθ.

Now, by the continuity of ρ (see (4.7)), A₀ = {ρ(θ) : θ ∈ K} is compact. Next, by the above assumptions, there is a compact set K′ ⊆ Ω, K′ ⊇ K, such that (4.12) holds. Choose A = {ρ(θ) : θ ∈ K′}. Now, by (4.8), there is another compact set B ⊇ A such that

(4.13)  ∫_{K′} [ℓ(t₁, θ) − ℓ(t₀, θ)] f_θ(x) g(θ) dθ > δ(1 − ξ) M  for all t₁ ∈ B^c, t₀ ∈ A  (by (4.11)),

while (4.14) follows from (4.12) and the nonnegativity of ℓ. Hence, combining (4.13) and (4.14), we get

φ(t₁, x) − φ(t₀, x) = ∫ [ℓ(t₁, θ) − ℓ(t₀, θ)] f_θ(x) g(θ) dθ > δ(1 − ξ) M − ξ > 0

for ξ sufficiently small. Hence the minimum occurs inside a compact set.

Next, by the above assumptions, if N = {(t, θ) : ℓ(t, θ) is not twice differentiable at t}, then by assumption PA1, ν × G(N) = 0. This implies that G(N_t) = 0 a.e. (ν), where N_t = {θ : (t, θ) ∈ N}. If ℓ(t, θ) is twice differentiable at t, then it must be convex in a neighborhood B(t, θ) of t. And by assumption PA2, for an arbitrary vector λ ∈ ℝ^p, λ ≠ 0,

λ′ H_φ(t, x) λ > 0,  a.e. (ν).

Here H_φ and H_ℓ are the respective second derivative matrices of φ and ℓ. The above steps follow after noting that, whenever H_ℓ exists, it is nonnegative definite, and on N_t^c it is positive definite. So we have proved the existence and uniqueness of the minimum; hence (4.10) holds true. (QED)

Towards this end, notice that all the interchanges of limit operations (differentiation under the integral sign) were made possible by assumption PA3, which may be replaced by more concrete bounds on the integrands; an application of the Dominated Convergence Theorem then gives the desired result.
4.3.2. Structure of Bayes estimators:

In Definition 4.1 and Proposition 4.1 we obtained two representations of the Bayes estimators. Here we shall use the representation in terms of the estimating equation (i.e., (4.10)) to obtain a representation for the difference between a Bayes estimator and a prefixed estimator, and also for the difference between two Bayes estimators with respect to two different prior distributions. These representations will be used to derive the main result of this chapter.

Suppose that r₀(x) is an estimator of θ. Actually speaking, r₀ may not be a reasonable estimator, but it will be used as a base estimator (just like the origin in the space of all estimators). Assume the following about r₀:

(4.17)  E_θ ‖∇ℓ(r₀(x), θ)‖² < ∞,

where H_ℓ will denote the second derivative matrix of ℓ(·, θ). We now present the following theorem giving the representation of a Bayes estimator δ_g, with respect to a prior G (with p.d.f. g), with origin r₀.

Theorem 4.2: Suppose δ_g is the Bayes estimator with respect to the prior G, as given by (4.10) (assuming PA1-PA3 hold). Also assume that ℓ is piecewise convex (as in (4.5)). Then we have the following:

(4.18)  δ_g − r₀ = −[ ∫ H_ℓ,θ(r_g*) f_θ(x) g(θ) dθ ]^{−1} ∫ ∇ℓ(r₀(x), θ) f_θ(x) g(θ) dθ,

where
i) r_g* lies on the line joining r₀ and δ_g, and
ii) ‖r_g* − r₀‖ ≤ ‖δ_g − r₀‖.

If H is another prior with density π, then

(4.19)  δ_π − δ_g = −[ ∫ H_ℓ,θ(δ_g**) f_θ(x) π(θ) dθ ]^{−1} ∫ ∇ℓ(δ_g(x), θ) f_θ(x) π(θ) dθ,

where δ_g** is an intermediate point, together with the symmetric expression obtained by interchanging the roles of π and g.
Proof: By (4.10) we have

(4.20)  ∫ ∇ℓ(δ_g, θ) f_θ(x) g(θ) dθ = 0.

Now, δ_g = r₀ + (δ_g − r₀). By a one step Taylor series, we can write

(4.21)  ∇ℓ(δ_g, θ) = ∇ℓ(r₀, θ) + H_ℓ,θ(r_g*)(δ_g − r₀).

Also, it is well known that H_ℓ,θ(r_g*) will be a measurable function. Hence, (4.20) and (4.21) together imply

[ ∫ H_ℓ,θ(r_g*) f_θ(x) g(θ) dθ ] (δ_g − r₀) = −∫ ∇ℓ(r₀, θ) f_θ(x) g(θ) dθ.

So (4.18) follows from PA2, where we virtually assert that the matrix on the left hand side is positive definite, a.e. (ν).

For (4.19), observe that if we use the estimating equation for H, we obtain

(4.22)  ∫_Ω ∇ℓ(δ_π, θ) f_θ(x) π(θ) dθ = 0.

From there, expanding ∇ℓ about δ_g, one gets the first expression in (4.19). Similarly, changing the roles of π and g in the previous argument, we get the second expression in (4.19). (QED)

Remark 6: Consider the first expression in (4.19) and compare it with (4.18); that gives another way of computing the difference (δ_π − δ_g). If π and g are very close to each other, then we can write the following identities (approximately):

i) δ_π(x) ≈ δ_g(x);
ii) from (4.22),

∫ ∇ℓ(δ_g, θ) f_θ(x) π(θ) dθ = −∫ [∇ℓ(δ_π, θ) − ∇ℓ(δ_g, θ)] f_θ(x) π(θ) dθ
  = −∫ [ {∇ℓ(δ_π, θ) − ∇ℓ(r₀, θ)} − {∇ℓ(δ_g, θ) − ∇ℓ(r₀, θ)} ] f_θ(x) π(θ) dθ.

This argument just shows that adding and subtracting ∇ℓ(r₀, θ) gives us (4.18) from (4.19).
We want to say a few words about the choice of r₀, which behaves like a reference point in the space of all estimators of θ. Suppose that X has a distribution from the exponential family, so that the p.d.f. of X can be written as

f_θ(x) = e^{θ·x − ψ(θ)}  w.r.t. ν.

Then, under quadratic loss, for estimating E_θ X = ∇ψ(θ), the Bayes estimator under the prior G is given by

(4.23)  δ_G(x) = x + I_x(∇g)/I_x(g),

where g is the density function of G, and for any function h of θ we define

I_x(h) = ∫_Ω f_θ(x) h(θ) dθ.

Now it can readily be seen that, with the choice r₀(x) = x, (4.23) coincides with (4.18). Because, in this case, ∇ℓ(x, θ) is proportional to x − ∇ψ(θ), an application of Green's theorem (or integration by parts) will yield

∫ ∇ℓ(x, θ) f_θ(x) g(θ) dθ = ∫ (x − ∇ψ(θ)) e^{θ·x − ψ(θ)} g(θ) dθ = ∫ ∇g(θ) e^{θ·x − ψ(θ)} dθ = I_x(∇g).

It can also be observed that, for the above problem, r₀(x) = x acts as a natural origin for Bayes estimators. In general this will not be the case. Although it may not be the case every time, we shall choose r₀(x) = x and impose some conditions so that this choice will do the job. So, without loss of much generality, we can assume that r₀(x) = x.

At this point, observe another fact. If f_θ(x) = e^{θ·x − ψ(θ)} a.e. (ν), then x − ∇ψ(θ) is exactly the score function ∇_θ log f_θ(x). Hence the quadratic loss in the exponential family is nothing but a special form of the Fisher score function loss. This is actually very similar to the entropy loss. We define these loss functions below.
a) Fisher's score function loss: In one dimensional problems where the log-likelihood is strictly convex, its derivative, which is Fisher's score function, is a monotone function. In this case the unique solution of the equation

(∂/∂θ) log f_θ(x) = 0

is the maximum likelihood estimate. From standard asymptotic theory it is quite clear that the MLE tries to locate θ in the scale of the score function. So there is a strong connection between the MLE and a loss function whose first derivative is the score function S_θ(x). (Actually, the Cramér-Rao inequality shows that connection!)

So, if we want to construct a loss function such that its first derivative with respect to the action coordinate (a) is the score function, we must take

ℓ(a, θ) = ∫_{τ(θ)}^{a} (∂/∂θ) ln f_θ(u) du,  where (∂/∂θ) ln f_θ(τ(θ)) = 0.

This will be called the Fisher score function loss. In multi-dimension we can easily generalize this definition using line integrals. In this case, we define

(4.24)  ℓ_F(a, θ) = ∫_{τ(θ)}^{a} ∇_θ ln f_θ(u) · du,

where τ(θ) is the unique root of ∇_θ ln f_θ(·). Also, it must be mentioned that (4.24) can be made into a proper loss function only when inf_a ℓ_F(a, θ) > −∞ for all θ. In that case, by putting

(4.25)  ℓ̄_F(a, θ) = ℓ_F(a, θ) − ρ(θ),  with ρ(θ) = inf_a ℓ_F(a, θ),

we get a new loss function which will always be nonnegative. For the exponential family of distributions we obtain

ℓ_F(a, θ) = ½ ‖a − ∇ψ(θ)‖²,

which is the quadratic error loss function. That is why the quadratic loss becomes so natural for exponential families.
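The computation behind this last statement is immediate from (4.1): the score at the action point is

∇_θ ln f_θ(u) = u − ∇ψ(θ),  so that  τ(θ) = ∇ψ(θ),

and the line integral in (4.24) evaluates to

ℓ_F(a, θ) = ∫_{∇ψ(θ)}^{a} (u − ∇ψ(θ)) · du = ½ ‖a − ∇ψ(θ)‖²,

i.e., one half of the quadratic error loss.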
b) Entropy or Kullback-Leibler loss functions: In this situation, one attempts to measure the Kullback-Leibler distance between the true parameter θ and its estimated value. So, in that case, ℓ_k(a, θ) takes the following form:

(4.26)  ℓ_k(a, θ) = −E_θ[ log f_a(X) − log f_θ(X) ],  X ~ P_θ.

A trivial application of Jensen's inequality shows that the above function is nonnegative. But for the finiteness of ℓ_k we need mutual absolute continuity of the densities. Now, if we compute the gradient of the above loss function (provided it is differentiable), we get ∇_a ℓ_k(a, θ) = −E_θ[∇_a log f_a(X)]. Again, for an exponential family,

log f_θ(x) = θ·x − ψ(θ),

so

ℓ_k(a, θ) = E_θ[ θ·X − ψ(θ) − a·X + ψ(a) ] = ½ (a − θ)′ H_ψ (a − θ),

where H_ψ is the Hessian of ψ, evaluated at an intermediate point. As it is, the above loss function is not the same as the quadratic loss described before; ℓ_k has a dependence on higher moments (skewness and kurtosis, for example) of the random vector. But in large samples, when the parameter can be estimated with a negligible amount of error, we can write

ℓ_k(μ̂, μ) ≈ ½ (μ̂ − μ)′ H_ψ^{−1} (μ̂ − μ),

where μ̂ is a √n-consistent estimator of μ = ∇ψ(θ). Note that H_ψ is the covariance, i.e., the inverse of the Fisher information matrix for the mean parameter in an exponential family. So, typically, the entropy loss approximates the normalized quadratic loss.
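For a concrete one dimensional check, take X ~ Poisson(θ), for which ℓ_k(a, θ) = θ log(θ/a) + a − θ, while the normalized quadratic loss is (a − θ)²/(2θ); a small sketch comparing the two near a = θ (the values of θ and a are illustrative):

```python
import numpy as np

def kl_poisson(a, theta):
    """Entropy loss (4.26) for Poisson: E_theta[log f_theta(X) - log f_a(X)]."""
    return theta * np.log(theta / a) + a - theta

theta = 10.0
for a in [8.0, 9.0, 11.0, 12.0]:
    print(a, kl_poisson(a, theta), (a - theta) ** 2 / (2 * theta))
```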
Now, we are in a position to state the main results of this section. We shall basically be concerned with general loss functions in the first result; in this case, we need some stronger moment conditions. Next, we shall consider the special losses discussed above. In that situation, an application of the integration by parts formula reduces the complexity significantly. Here we proceed to the main result, which will be for strictly convex loss functions. Where the loss is not strictly convex, the same theorem can be proved with some extra difficulty. Also, as we shall see later, we will need a modified form of Blyth's method; in this form, we shall need h_n > 0. We state the modified form below.

Lemma 4.2 (Modified Blyth's method): Let δ_g denote the Bayes estimator under the loss function ℓ with respect to the prior g (possibly improper). Let {h_n} be as follows:

i) h_n : Ω → (0, 1] are smooth functions, n ≥ 1;
ii) h_n(θ) ↑ 1 pointwise as n → ∞;
iii) (a) ∫_Ω g(θ) h_n²(θ) dθ < ∞ for all n;
   (b) there exists a compact set K such that h_n(θ) ≥ ½ for all θ ∈ K, n ≥ 1;
iv) if δ_n denotes the Bayes estimator with respect to the prior g_n = g h_n², then

∫_Ω ∫ ℓ(δ_n, θ) f_θ(x) g(θ) h_n²(θ) dν(x) dθ < ∞  for all n.

Then, if

Δ_n = ∫_Ω [R(δ_g, θ) − R(δ_n, θ)] g(θ) h_n²(θ) dθ → 0  as n → ∞,

δ_g is admissible.

Proof: Same as the usual proof; see Brown & Hwang (1982). QED
Next consider

φ_n(t, x) = ∫_Ω ℓ(t, θ) f_θ(x) g(θ) h_n²(θ) dθ,  n ≥ 1.

Since h_n(θ) ↑ 1 pointwise, by the monotone convergence theorem,

φ_n(t, x) ↑ φ(t, x) = ∫ ℓ(t, θ) f_θ(x) g(θ) dθ.

Now define

S_n = { t : φ_n(t, x) ≤ φ(δ_g(x), x) },  n ≥ 1.

φ_n ↑ φ implies that {S_n} is a decreasing family of sets; φ₁ is convex with a unique minimum, and that implies S₁ is bounded. Also, let

d_n(x) = sup_{t ∈ S_n} ‖t − δ_g(x)‖,  n ≥ 1.

Claim:
i) d_n(x) < ∞ a.e. (ν);
ii) d_n(x) ↓ 0 as n → ∞;
iii) ‖δ_n(x) − δ_g(x)‖ → 0 as n → ∞.

Proof: (i) follows trivially from the fact that S₁ ⊇ S₂ ⊇ ⋯ and S₁ is bounded. For (ii), notice that, since φ_n(δ_g(x), x) ≤ φ(δ_g(x), x), each S_n is nonempty. Now choose an arbitrarily small compact set K containing δ_g(x). By the convexity and the uniqueness of the minimum of φ, there exists ε > 0 such that φ(t, x) > φ(δ_g(x), x) + ε outside K. Now, the fact that φ_n ↑ φ, and the fact that each φ_n is convex with a unique minimum, together imply that for n sufficiently large, S_n ⊆ K. Since K was arbitrary, (ii) follows. (iii) is a trivial consequence of (ii), because δ_n ∈ S_n by the definition of S_n.
Now consider the following:

(4.29)  ℓ(δ_g, θ) − ℓ(δ_n, θ) = ∇ℓ(δ_n, θ)′(δ_g − δ_n) + ½ (δ_g − δ_n)′ K (δ_g − δ_n)  (by (4.5)),

where K = K(δ_g, δ_n, θ) = H_ℓ,θ(δ_n*) is the second derivative evaluated at an intermediate point. Let

Δ_n = ∫ [R(δ_g, θ) − R(δ_n, θ)] g(θ) h_n²(θ) dθ.

Then, using (4.29), we can rewrite Δ_n as

(4.30)  Δ_n = ∫ E_θ[∇ℓ(δ_n, θ)′(δ_g − δ_n)] g_n(θ) dθ + ½ ∫ (δ_g − δ_n)′ [ ∫ K f_θ(x) g_n(θ) dθ ] (δ_g − δ_n) dν(x)
      = ½ ∫ (δ_g − δ_n)′ [ ∫ K f_θ(x) g_n(θ) dθ ] (δ_g − δ_n) dν(x),

since ∫ ∇ℓ(δ_n, θ) f_θ(x) g_n(θ) dθ = 0 a.e. (ν), by the definition of δ_n.

Next, let δ_g**, r_g* and δ_n* be the intermediate points defined by (4.18), (4.19) and (4.29) respectively, and let

Z = ( (X₁ − θ₁)/σ(θ₁), ..., (X_p − θ_p)/σ(θ_p) )′,

where σ²(θ_i) denotes the ith component variance. Assume the following:

TA1: sup_θ E_θ ‖Z‖^t < ∞ for all 0 < t ≤ k;
TA2: there exists k > 1 such that ℓ(t, θ) ≤ A + B ‖t‖^k, where A and B are nonnegative functions of θ;
TA3: ℓ is strictly convex;
TA4: g is a prior such that ‖δ_g(x)‖ = O(‖x‖) and ‖d₁(x)‖ = O(‖x‖) near the boundary of the sample space.
TA5: There exist a > 0 and q > 0 such that

lim_{ε→0} lim sup_{θ→∂Ω} P(θ) E_θ[ (a + ‖Z‖^q)^{−1} I(‖Z‖ < ε) ] = 0,  i = 1, 2, ..., p,

where P(θ) is a polynomial function of degree at most r (for some r > 0) in (θ₁, ..., θ_p) or in (θ₁^{−1}, ..., θ_p^{−1}) (depending on whether the boundary is finite or infinite).

The following lemma is an immediate consequence of the above assumptions.

Lemma 4.3: Under the assumptions A1, A2, TA1 and TA5, we have

E_θ[ ‖X‖^s / (a + ‖X‖^q) ] = O( ‖θ‖^s / (a + ‖θ‖^q) )  as θ → ∂Ω,

for all s < k and q < r.

Proof: Writing X_i = θ_i + σ(θ_i) Z_i, we have ‖X‖^s = O(‖θ‖^s (1 + ‖Z̃‖^s)), where Z̃ is Z rescaled by the factors σ(θ_i)/‖θ‖ (by the definition of Z), and E_θ‖Z̃‖^s = O(E_θ‖Z‖^s) = O(1) by TA1, for s < k. Splitting the expectation over the events {‖Z‖ < ε} and its complement, the first part is controlled by TA5 and the second by the moment bound. The result now follows from TA5. QED

TA6: sup_n E_θ tr(B_n^{−1} A_n B_n^{−1} C_n) ≤ k₀ < ∞ for all θ, where A_n, B_n and C_n are the matrices defined in the proof of Theorem 4.3 below.

There can be several sufficient conditions for TA6 to hold; we give one that seems to be the most intuitive.

Proposition 4.2: Suppose that there exists a neighborhood B(x, r(‖x‖)) of x such that:

i) x ∈ F^c, for some compact subset F, and k > 0 (possibly ∞);
ii) inf_{x ∈ F^c} inf_n [ ∫_B (λ′ H_ℓ,θ(δ) λ) f_θ(x) g_n(θ) dθ ] / [ ∫_Ω (λ′ H_ℓ,θ(δ) λ) f_θ(x) g_n(θ) dθ ] ≥ δ > 0 for some δ > 0 and all λ ≠ 0;
iii) there exist m, M > 0 such that, for all λ ≠ 0 and all θ,
  a) 0 < m ≤ λ′ H_ℓ,θ(t) λ / λ′ H_ℓ,θ(s) λ ≤ M for t, s in the relevant range, and
  b) the same bounds hold with g_n in place of g.

Then TA6 holds.
Proof: First note that, for any matrix A,

sup_{λ≠0} ‖Aλ‖² / ‖λ‖² = λ_max(A′A) = the maximum eigenvalue of A′A.

Now, for any symmetric nonnegative definite matrix A, tr(A) = O(λ_max(A)) as λ_max(A) → 0 or ∞. Thus, following similar lines of argument, it is enough to show that the relevant eigenvalue ratios remain bounded; and this is essentially guaranteed by iii)a and iii)b above, after noting that

‖δ_n‖ = O(‖x‖)  and  ‖δ_g‖ = O(‖x‖),  uniformly in n.

If we consider A_n B_n^{−1}, then

tr(A_n B_n^{−1}) = O( sup_{λ≠0} λ′A_n λ / λ′B_n λ ).

Now,

λ′A_n λ = ∫ (λ′ H_ℓ,θ(δ) λ) f_θ(x) g_n(θ) dθ,

and by conditions iii)a and iii)b it is possible to bound the above ratio. Similarly, the other conclusions follow. QED

Remark 7: Note that, for a purely quadratic loss function, ∂²ℓ/∂t² is independent of (t, θ), and hence the above condition is satisfied (with k = ∞ and m = M = 1).
Now, we state the main theorem.

Theorem 4.3: Under assumptions A1-A2 and TA1-TA6, let δ_g(x) be the generalized Bayes estimator with respect to the (possibly improper) prior g. Assume also that there exists a sequence of functions {h_n} satisfying the conditions of Lemma 4.2. Then δ_g is admissible, by Lemma 4.2, if the following integrals are finite:

i) ∫_Ω tr(Σ(θ)) ρ̄(θ) sup_n{ ‖∇h_n(θ)‖² } g(θ) dθ < ∞;
ii) ∫_Ω tr(Σ(θ)) E_θ‖δ_g(X) − X‖² sup_n{ ‖∇h_n(θ)‖² } g(θ) dθ < ∞;

here ρ̄(θ) is the quantity identified below in (4.42).

Proof: Consider the expression (4.30). Use Lemma 4.1 and (4.19) to rewrite the integrand as

(4.31)  (δ_g − δ_n)′ [ ∫_Ω H_ℓ,θ(δ_n*) f_θ(x) g_n(θ) dθ ] (δ_g − δ_n) = T_n′ S_n T_n,

where T_n = B_n^{−1} ∫_Ω ∇ℓ(δ_g, θ) f_θ(x) g_n(θ) dθ, and A_n, B_n are defined as before. Now, because

∫_Ω ∇ℓ(δ_g, θ) f_θ(x) g(θ) dθ = 0,

we can write, using the new origin r₀(x) = x and (4.18),

(4.32)  ∫_Ω ∇ℓ(δ_g, θ) f_θ(x) g_n(θ) dθ = D_n − V_n V^{−1} D,

where δ* is an intermediate point between x and δ_g(x); this follows from (4.18) with r₀(x) = x and a one-step Taylor series applied to ∇ℓ(δ_g, θ) − ∇ℓ(x, θ). Now, using the notation

D = ∫_Ω ∇ℓ(x, θ) f_θ(x) g(θ) dθ,  V = ∫_Ω H_ℓ,θ(δ*) f_θ(x) g(θ) dθ,

with D_n, V_n the corresponding quantities with g replaced by g_n, we write (4.32) as

(4.33)  T_n = D_n − V_n V^{−1} D.

(Note that, by the Courant theorem, for p.d. matrices A and B,

(4.34)  sup_{x≠0} x′Ax / x′Bx = max_{1≤i≤p} λ_i(AB^{−1}),

where λ_i(·) stands for the ith characteristic root.) Since 0 < h_n(θ) ≤ 1, we have

(4.35)  λ′ V_n λ ≤ λ′ V λ  for all λ.

Hence (4.35), together with (4.34) (with A replaced by V_n and B replaced by V), implies

(4.36)  sup_n tr(V_n V^{−1}) = O( max_{1≤i≤p} λ_i(V_n V^{−1}) ) ≤ constant < ∞,  a.e. (ν).

Next consider (4.31):

(4.37)  T_n′ S_n T_n = (D_n − V_n V^{−1} D)′ S_n (D_n − V_n V^{−1} D)
     ≤ 2 [ D_n′ S_n D_n + D′ V^{−1} V_n S_n V_n V^{−1} D ].

So, to conclude that Δ_n → 0 as n → ∞, it is enough to show that:
i) a_n = ∫ D_n′ S_n D_n dν(x) → 0 as n → ∞;
ii) b_n = ∫ D′ V^{−1} V_n S_n V_n V^{−1} D dν(x) → 0.

First consider a_n. Since

D_n − h_n²(x) D = ∫ ∇ℓ(x, θ) [h_n(θ) − h_n(x)] [h_n(θ) + h_n(x)] f_θ(x) g(θ) dθ,

and h_n(θ) − h_n(x) = (θ − x)′ ∇h_n(θ*) for an intermediate point θ*, we have

(4.39)  D_n′ S_n D_n = ‖S_n^{1/2} D_n‖²
     ≤ 2 tr(S_n C_n) ∫ (∇h_n′ H_ℓ,θ^{−1} ∇h_n) [1 + 2 h_n²(x)/h_n(θ)] f_θ(x) g(θ) dθ

(by the Cauchy-Schwarz inequality for vectors), where

C_n = ∫ H_ℓ,θ(δ_g) f_θ(x) g_n(θ) dθ.

So, by Fubini's theorem and using (4.39), we can write

a_n ≤ 2 ∫_Ω E_θ[ tr(S_n C_n) tr(∇ℓ′ H_ℓ,θ^{−1} ∇ℓ) ] ‖∇h_n(θ)‖² [1 + 2 h_n²(x)/h_n(θ)] g(θ) dθ.

Now, by assumption TA6, the factor E_θ tr(S_n C_n) remains bounded. Hence, if

∫ sup_n { ρ̄(θ) ‖∇h_n(θ)‖² } g(θ) dθ < ∞,

then a_n → 0 by the Dominated Convergence Theorem.

For b_n, first write

(4.40)  b_n = ∫ ‖S_n^{1/2} V_n V^{−1} D‖² dν(x),

and b_n → 0 follows by the same logic as before, provided the corresponding integral is finite. Now,

V^{−1} D = −[δ_g(x) − x].

So, from (4.40), letting χ(θ) = E_θ[tr(W)],

b_n = O( ∫ tr(Σ) χ(θ) E_θ‖δ_g(x) − x‖² ‖∇h_n(θ)‖² g(θ) dθ ).

Hence, if

∫ tr(Σ) E_θ‖δ_g − X‖² sup_n{ χ(θ) ‖∇h_n(θ)‖² } g(θ) dθ < ∞,

then b_n → 0 by the Dominated Convergence Theorem. Finally, since tr(∇ℓ′ H_ℓ,θ^{−1} ∇ℓ) = tr(H_ℓ,θ^{−1} ∇ℓ ∇ℓ′), one obtains

ρ̄(θ) = tr(Σ) E_θ tr(∇ℓ′ H_ℓ,θ^{−1} ∇ℓ).
Hence, (4.39) is equivalent to the finiteness of the integral

(4.42)  ∫ tr(Σ) E_θ[ tr(∇ℓ′ H_ℓ,θ^{−1} ∇ℓ) ] sup_n{ ‖∇h_n(θ)‖² } g(θ) dθ.

So the convergence of a_n and b_n can be written as a single condition. Combining the two integrals, a_n + b_n → 0 if

(4.43)  ∫_{K^c} tr(Σ) g(θ) sup_n{ ‖∇h_n(θ)‖² } { [E_θ tr(∇ℓ′ H_ℓ,θ^{−1} ∇ℓ)] ∨ [E_θ‖δ_g − X‖² E_θ tr(W)] } dθ < ∞

(since, by TA6, the remaining factors are O(1) as θ → ∂Ω), where

i) K is a compact subset of the interior of Ω;
ii) a ∨ b = max(a, b);
iii) W = W_g(x, θ) = H_ℓ,θ(r_g*) H_ℓ,θ^{−1}(δ_g**) H_ℓ,θ(r_g*).

Thus, if (4.43) is true, δ_g is admissible.
Remark 8: One may find the above proof quite similar to the proof given by Brown and Hwang (1982) for the exponential family of distributions under quadratic loss. Due to the special structure of quadratic loss in the exponential family, their proof becomes somewhat simpler. But if we look at (4.33), where the decomposition of T_n is given, one can easily observe the relation between D_n and the asymptotic growth condition of Brown & Hwang (1982); also, b_n is somewhat related to their asymptotic flatness of g. Actually, in the exponential family, the flatness of the prior controls E_θ‖δ_g(x) − x‖². In general, no such explicit relationship exists.

Remark 9: The choice of {h_n} is left somewhat ambiguous in the theorem. Brown & Hwang choose {h_n} as

h_n(θ) = 1  if ‖θ‖ ≤ 1;  = 1 − ln‖θ‖/ln n  if 1 ≤ ‖θ‖ ≤ n;  = 0  if ‖θ‖ > n,

for the case Ω = ℝ^p. In our case, we need to modify h_n so that it becomes positive everywhere, while decreasing fast enough for ‖θ‖ > n. Now, it can be seen that, for h_n satisfying the conditions of Lemma 4.2, the optimal bound for sup_n ‖∇h_n‖² is

[ ‖θ‖ ln(‖θ‖ ∨ 2) ]^{−2}.

In our case, some adjustment near the boundary is needed to achieve the above rate; in general, that adjustment will depend on the particular situation. So, the question of the choice of {h_n} is left open.
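For reference, the Brown-Hwang choice can be written down explicitly and the gradient envelope checked numerically; a sketch (the radial form of h_n is as stated above, and the final print verifies that sup_n |h_n′(t)| behaves like [t ln t]^{−1}):

```python
import numpy as np

def h_n(t, n):
    """Radial Brown-Hwang taper: 1 on [0,1], 1 - ln(t)/ln(n) on [1,n], 0 beyond."""
    t = np.asarray(t, dtype=float)
    return np.clip(1.0 - np.log(np.maximum(t, 1.0)) / np.log(n), 0.0, 1.0)

# On (1, n) the derivative is -1/(t ln n); maximizing over n >= t gives the
# envelope 1/(t ln t), i.e. sup_n |grad h_n|^2 is of the order [t ln t]^(-2).
t = np.geomspace(2.0, 1e4, 5)
for n in [10, 100, 10_000]:
    grad = np.where(t < n, 1.0 / (t * np.log(n)), 0.0)
    print(n, np.max(grad * t * np.log(t)))   # bounded by 1, approaching 1 as n decreases to t
```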
Now we state the following corollary.

Corollary 4.1: Under the above assumptions, if the loss function ℓ can be written as

ℓ(t, θ) = (t − θ)′ Q(θ) (t − θ),

where Q(θ) is a positive definite matrix for all θ, then δ_g is admissible if

(4.44)  ∫ sup_n{ ‖∇h_n‖² } tr(Σ(θ)) tr(Q(θ)) { 1 ∨ E_θ‖δ_g − X‖² } g(θ) dθ < ∞.

Proof: Follows from (4.43), after observing that here H_ℓ,θ = 2 Q(θ) is free of the action variable. QED
Now we proceed to the next theorem, where we prove an analogous result for the special types of loss functions described before (i.e., the entropy and the score function losses). Here an application of Green's theorem gives somewhat stronger results. Suppose that f_θ(x) is a family of densities with respect to the dominating measure dν. Assume the following:

(4.45)
i) the loss function ℓ is such that ∇ℓ(t, θ) = φ(t) + A (∂/∂θ) log f_θ(t), for a matrix A = A(t);
ii) H(t, θ) = H = A (∂²/∂t ∂θ) log f_θ(t) is positive definite;
iii) supp(P_θ) is independent of θ, and the boundary terms involving g, A and f_θ vanish, so that Green's theorem can be applied.

Now, if g and g_n = g h_n² are defined as before, then we can write, just as in (4.30),

(4.46)  ℓ(δ_g, θ) − ℓ(δ_n, θ) = ∇ℓ(δ_n, θ)′(δ_g − δ_n) + ½ (δ_g − δ_n)′ H(δ*)(δ_g − δ_n),

so that we get

(4.47)  Δ_n = ∫ [R(δ_g, θ) − R(δ_n, θ)] g(θ) h_n²(θ) dθ
      = ∫ (δ_g − δ_n)′ [ ∫_Ω H(δ*) f_θ(x) g_n(θ) dθ ] (δ_g − δ_n) dν(x).
Next note that, by (4.19), the integrand of (4.47) becomes

(4.49)  I_x( [φ(δ_g) + A S_θ(δ_g)] g_n )′ S_n I_x( [φ(δ_g) + A S_θ(δ_g)] g_n ),

where

I_x(h(θ)) = ∫ h(θ) f_θ(x) dθ,  a.e. (ν),

B_n and S_n are as before, and S_θ(·) = (∂/∂θ) log f_θ(·) is the score function. At this stage, because of the flexibility due to the special form of ∇ℓ, we can proceed in a slightly different way. Applying Green's theorem via (4.45)(iii), and after some standard manipulations, we can rewrite (4.49) as

(J₁ + J₂ + J₃)′ S_n (J₁ + J₂ + J₃),

where, in particular,

J₂ = I_x(∇(gA) h_n²) − I_x(H* g_n) I_x(H* g)^{−1} I_x(∇(gA)),

and

H* = H_ℓ,θ(r_g*) = A J_θ(r_g*),

with J_θ denoting the mixed second derivative of log f_θ. Also, we define the operator ∇ on matrices as follows: if A(θ) is a p×p matrix then so is ∇A, with (∇A)_ij = ∂A_ij/∂θ_j.

So, as before, it is enough to show that

(4.52)  a_i(n) = ∫ J_i′ S_n J_i dν(x) → 0  for i = 1, 2, 3, as n → ∞.

First consider a₃(n):

(4.53)  a₃(n) = ∫ ‖S_n^{1/2} I_x(g A ∇h_n²)‖² dν(x)
      ≤ 2 ∫ tr(S_n C_n) I_x[ (∇h_n′ A′ H*^{−1} A ∇h_n) g ] dν(x)

(using arguments similar to those in (4.38)). Now, by TA6, it follows that

a₃(n) ≤ Const. ∫ sup_n E_θ[ ∇h_n′ A′ H_ℓ,θ^{−1}(X) A ∇h_n ] g(θ) dθ,

and hence a₃(n) → 0 if the corresponding integral (4.54) is finite.
For a₂(n) we have, using (4.49) and (4.51),

(4.55)  a₂(n) = ∫ ‖S_n^{1/2} J₂‖² dν(x)
      = ∫ ‖S_n^{1/2} [ I_x(∇(gA) h_n²) − I_x(H* g_n) I_x(H* g)^{−1} I_x(∇(gA)) ]‖² dν(x).

Let

T = ∇(gA)/g − H* I_x(H* g)^{−1} I_x(∇(gA)).

Thus, (4.55) can be rewritten as

(4.56)  a₂(n) = ∫ ‖S_n^{1/2} I_x(T g_n)‖² dν(x) ≤ Const. ∫ I_x( T′ H_ℓ,θ^{−1}(δ_g) T g_n ) dν(x),

by TA6, since g h_n² ≤ g. Actually, it turns out that, under assumption TA6, both terms of T contribute the same order, because the first is controlled by

I_x( ∇(gA)′ H_ℓ,θ^{−1}(δ_g) ∇(gA) / g ),

while, by TA6, I_x(tr(H*) g) < M < ∞ a.e. (ν), so that the second can again be bounded by

(4.60)  M · I_x( ∇(gA)′ H*^{−1} ∇(gA) / g ).

From these bounds, along with TA6, it can be concluded that a₂(n) → 0 if

(4.61)  ∫ [ ∇(gA)′ E_θ[H_ℓ,θ^{−1}] ∇(gA) / g ] dθ < ∞

(again by the D.C.T.).
For a₁(n), notice that if we replace ∇(gA) by φ(θ) g in T we get J₁. So an analogous calculation shows that a₁(n) → 0 if

(4.62)  ∫ φ′(θ) E_θ[ H_ℓ(δ_g, θ) ]^{−1} φ(θ) g(θ) dθ < ∞.

So we can interpret these conditions, (4.54), (4.61) and (4.62), following Brown and Hwang (1982), as:

Asymptotic growth condition: (4.54);
Asymptotic flatness condition: (4.61);

with an extra condition coming up due to the bias in the loss function:

Bias condition: ∫_Ω { φ′(θ) E_θ[J_θ^{−1}(X)] φ(θ) } g(θ) dθ < ∞.

And hence we have the following theorem for the special losses related to score functions:
Theorem 4.4: Assume TA1-TA6 and consider the loss function described in (4.45). If, for a generalized prior g, the above conditions (growth, flatness and bias) are satisfied for some choice of {h_n} as in Blyth's lemma, then δ_g is admissible.

Proof: Follows from the above considerations. QED

Brown and Hwang's result for the exponential family follows as a corollary.

Corollary 4.2: Suppose X has an exponential family of distributions with natural parameter θ. Consider the estimation of μ = ∇ψ(θ) under the loss function

ℓ(t, θ) = (t − ∇ψ(θ))′ B′ A B (t − ∇ψ(θ)),

where A is a positive definite matrix and B is nonsingular. Let g be an improper prior. The corresponding Bayes estimator δ_g is admissible if

(4.64)
i) ∫_Ω sup_n { ∇h_n′ Σ ∇h_n } g(θ) dθ < ∞;
ii) ∫_Ω { ∇g′ Σ ∇g / g + (∇A′ A^{−1} ∇A) g } dθ < ∞.

Proof: Follows from Theorem 4.4, with J_θ = I. QED
4.4 Some applications:

From the corollaries and Theorems 4.3 and 4.4, various results for exponential families follow. Also, Karlin's (1958) theorem can be proved as a special case of Theorem 4.4 (as has been shown by Brown and Hwang (1982)). We consider a few examples that cannot be obtained from Brown & Hwang.

Example 1 (Location problem): Suppose X has density f(x − θ) (a.e. Lebesgue measure on ℝ^p), θ ∈ Ω = ℝ^p. We shall assume that

(4.65)  ∫ ‖x‖^k f(x − θ) dx < ∞

for k sufficiently large, so that we can apply Theorem 4.3. Consider a convex function ρ: ℝ → ℝ⁺ with the following properties:

i) ξ(x) = ρ′(x) is strictly increasing, and ρ(0) = min_x ρ(x);
ii) φ(x) = ρ″(x) > 0 for all x, and φ is continuous.

Then consider a loss function of the following form:

(4.66)  ℓ(t, θ) = Σ_{j=1}^p w_j(θ) ρ(t_j − θ_j).

Now consider priors g with support ℝ^p satisfying the following property: there exist positive functions F_j (1 ≤ j ≤ p) such that, for some ε > 0 and M > 0,

(4.67)
i) lim_{‖x‖→∞} inf_{‖z‖<(1−ε)‖x‖} g(x + z) w_j(x + z) / F_j(x) ≥ 1 for all j;
ii) lim sup_{‖x‖→∞} sup_{‖z‖<(1−ε)‖x‖} g(x + z) w_j(x + z) / F_j(x) ≤ M for all j.
Then we have the following proposition.

Proposition 4.4: Let δ_g denote the Bayes estimate of θ with respect to the prior g. Assume (4.67) and the following:

(4.68)
i) ∫_{‖z‖<δ} ρ(t − z_j) f(z) dz > 0 for some δ > 0, for all t sufficiently large, for 1 ≤ j ≤ p;
ii) the analogous tail condition holds for all j.

Then

(4.69)  sup_x ‖δ_g(x) − x‖ < ∞.

Proof: Substitute z = x − θ in the integral defining the posterior loss. Then we have to minimize e(s), where s = (s₁, ..., s_p)′ = x − t. Now, minimizing e(s) with respect to s is equivalent to minimizing e_j with respect to s_j, where

e_j(s_j) = ∫ ρ(s_j − z_j) w_j(x − z) g(x − z) f(z) dz.

Consider e_j(0) first:

(4.72)  e_j(0) = ∫ ρ(−z_j) w_j(x − z) g(x − z) f(z) dz
    = F_j(x) [ ∫_{‖z‖≤(1−ε)‖x‖} ρ(−z_j) (w_j(x−z) g(x−z)/F_j(x)) f(z) dz
        + ∫_{‖z‖>(1−ε)‖x‖} ρ(−z_j) (w_j(x−z) g(x−z)/F_j(x)) f(z) dz ].

By (4.67)(ii), for ‖x‖ sufficiently large, the first integral is less than M ∫ ρ(−z_j) f(z) dz. Again, by (4.68)(ii), for sufficiently large ‖x‖, the second integral in (4.72) can be made smaller than an arbitrarily small ξ. Hence, we have

(4.74)  e_j(0) ≤ F_j(x) [ (M + 1) ∫ ρ(−z_j) f(z) dz + ξ ].

On the other hand,

(4.75)  e_j(s_j) ≥ ∫_{‖z‖<δ} ρ(s_j − z_j) w_j(x − z) g(x − z) f(z) dz ≥ F_j(x) ∫_{‖z‖<δ} ρ(s_j − z_j) f(z) dz,

by (4.67)(i), for sufficiently large ‖x‖. Now use the fact that ρ is a nondecreasing convex function for large arguments. Thus e_j(s_j) − e_j(0) > 0 for all |s_j| large, uniformly in ‖x‖. This concludes the proof, because e_j(s_j) → ∞ as |s_j| → ∞, uniformly in ‖x‖. Hence (4.69) follows. QED
Remark: Actually, we can prove a stronger statement than the one proved above, provided the condition (4.76) below is satisfied. In that case, by (4.19) we can write the jth component of δ_g(x) − x as

δ_gj(x) − x_j = [ ∫ ξ(x_j − θ_j) w_j(θ) g(θ) f(x − θ) dθ ] / [ ∫ φ(δ*_j − θ_j) w_j(θ) g(θ) f(x − θ) dθ ],

where δ* = a x + (1 − a) δ_g(x) for some 0 < a < 1. Consider the denominator first. By Proposition 4.4 we have

sup_x ‖δ* − x‖ = T* < ∞.

Let

u₀(z) = inf_{|s|<T*} φ(s + z) ≥ 0  and  u₁(z) = sup_{|s|<T*} φ(s + z).

Then, under some mild restrictions on φ, we can conclude from the above:

(4.80)  0 < m₀ ≤ lim_{‖x‖→∞} ∫ φ(δ*_j − θ_j) (w_j(θ) g(θ)/F_j(x)) f(x − θ) dθ ≤ m₁ < ∞.

Next consider the numerator, which, after the substitution θ = x + z, takes the form

F_j(x) ∫ ξ(−z_j) (w_j(x + z) g(x + z)/F_j(x)) f(−z) dz

(this follows from (4.76)). Now, assume the following:

ASSUMPTION: the loss is unbiased, i.e., ∫ ξ(−z_j) f(−z) dz = 0.

Then, by (4.76)(ii), the tail part of the numerator is negligible, while on the complement, i.e., on {z : ‖z‖ < (1−ε)‖x‖}, by (4.67)(i) we have

(4.82)  sup_{‖z‖<(1−ε)‖x‖} | (w_j(x + z) g(x + z) − g(x) w_j(x)) / F_j(x) | ≤ (2M + 1)

for sufficiently large ‖x‖, and the bound is integrable. Thus, by the dominated convergence theorem, we have

(4.84)  lim_{‖x‖→∞} ∫_{‖z‖<(1−ε)‖x‖} | (w_j(x + z) g(x + z) − g(x) w_j(x)) / F_j(x) | f(z) dz = 0.

Hence, by (4.80), (4.81) and (4.84), we have

(4.85)  lim_{‖x‖→∞} |δ_gj(x) − x_j| = 0  for all j.
The next step is to choose the sequence of priors (involved in Theorem 4.3) properly. As required in the application of Blyth's lemma, one must have finite Bayes risk for the sequence {g_n = g h_n²} for each finite n. Also, in the general situation, one must have h_n > 0 for each n. We shall choose h_n of the following form:

(4.86)  h_n(θ) = h_n(‖θ‖).

Now, the basic form of h_n is going to be the same as in Brown & Hwang, but at the tail (i.e., for ‖θ‖ large) we need h_n → 0 rapidly enough. Suppose that there exists γ > 0 such that

(4.87)  ∫ ‖θ‖^{−2γ} w_j(θ) g(θ) ρ(s + x_j − θ_j) f(x − θ) dθ < ∞  for all j and for all s.

Then define, for n sufficiently large,

(4.88)  h_n(θ) = 1  if 0 ≤ ‖θ‖ ≤ 1;
    = 1 − ln‖θ‖/ln n  if 1 ≤ ‖θ‖ ≤ a_n n;
    = d_n / (‖θ‖^γ ln‖θ‖)  if ‖θ‖ > a_n n,

where

d_n = [ (a_n n)^γ ln(a_n n) / ln n ] (γ + 1/ln(a_n n))^{−1},

and a_n (< 1 for large n) is determined by matching the two branches at ‖θ‖ = a_n n. Actually, this choice of a_n and d_n makes h_n and h_n′ continuous except at ‖θ‖ = 1. Moreover, h_n is a monotone decreasing function of ‖θ‖. Also, the following claims are true:
(4.89) Claim:
i) sup_n |h_n′(x)| ≤ C/(x ln x) for all x > 1, for some C < ∞;
ii) sup_n sup_{(1−ε)x ≤ y ≤ x} h_n(y)/h_n(x) ≤ C < ∞.

Proof: For 0 ≤ x ≤ 1, h_n′(x) = 0 for all n. For 1 ≤ x ≤ a_n n,

|h_n′(x)| = 1/(x ln n) ≤ 1/(x ln x),

since a_n < 1 for sufficiently large n. For x > a_n n,

h_n′(x) = −d_n ( γ/(x^{γ+1} ln x) + 1/(x^{γ+1} ln² x) ),

and a direct computation, using the expression for d_n, again gives |h_n′(x)| ≤ C/(x ln x). Hence sup_n |h_n′(x)| ≤ C/(x ln x), and (i) follows. For (ii), notice that, since h_n is monotone decreasing,

sup_{(1−ε)x<y<x} h_n(y)/h_n(x) = h_n((1−ε)x)/h_n(x).

If x ≤ a_n n, both points lie in the logarithmic range, and

h_n((1−ε)x)/h_n(x) = [ln n − ln((1−ε)x)] / [ln n − ln x] = [(ln n − ln x) − ln(1−ε)] / (ln n − ln x) ≤ c < ∞;

if (1−ε)x < a_n n < x, or if x > a_n n, similar elementary estimates, using the expressions for a_n and d_n, give a bound which is uniform in n and x. Hence (ii) in the above claim follows.

From the above claim it follows that, if w_j and g satisfy (4.67), then so do w_j and g h_n²; in that case we only need to replace F_j accordingly. Also, note that since h_n is bounded by 1,

(4.90)  sup_{y<(1−ε)x} h_n(y)/h_n(x) ≤ 1/h_n(x).

This leads to the fact that, if the condition (4.91) below is satisfied, we can prove Proposition 4.4 with δ_g replaced by δ_n (the Bayes estimator of θ with respect to the prior g_n = g h_n²). Thus, if (4.91) is satisfied, we have

(4.92)  sup_n sup_x ‖δ_n − x‖ < ∞.

(This follows from the proof of Proposition 4.4 without much difficulty.)
Now we are in a position to apply Theorem 4.3. For that, we need to verify the conditions TA1-TA6. TA1-TA5 follow trivially from the fact that we are in a location family situation, together with the above arguments. For TA6, use (4.92) and follow the same line of argument used to derive (4.80). We obtain the following:

(4.93)  tr(S_n C_n) < ∞  and  tr( H_ℓ,θ^{−1}(δ_g**) H_ℓ,θ(r_g*) H_ℓ,θ^{−1}(δ_g**) H_ℓ,θ(r_g*) ) < ∞,  a.e. (dν)

(using the notation of the previous section). Hence, we have the following result.

Theorem 4.5: Suppose X has p.d.f. f(x − θ), θ ∈ ℝ^p. Consider the problem of estimating θ under the loss function (4.66). Let g be a prior with support ℝ^p such that (4.67) is satisfied, and assume the regularity conditions (4.94) controlling the behavior as ‖x‖ → ∞ and as ‖θ‖ → ∞. Then δ_g is admissible if

(4.95)  ∫_{‖θ‖>k} [ Σ_{j=1}^p w_j(θ) ] g(θ) / (‖θ‖² ln²‖θ‖) dθ < ∞.

Proof: Under the above assumptions, the conditions of Theorem 4.3 can be verified with some difficulty; δ_g is then admissible if the integral in (4.43) is finite. Now, by the previous arguments,

(4.96)  E_θ‖δ_g − X‖² = O(1)  and  tr(Σ) = Const.,

while the remaining factors in (4.43) stay bounded as ‖θ‖ → ∞. Hence the result. QED
Remark: Notice that if we have an invariant loss function, i.e., if w_j ≡ 1 for each j, and if g ≡ 1 (the noninformative prior), we get δ_g(x) = x. In that case (4.95) is finite for p = 1, 2. Hence the best invariant estimator (or the Pitman estimator) is admissible in one and two dimensional problems. This has already been obtained by Stein (1959) and Brown (1966).

One interesting thing happens if we let w_j ≡ 1 for each j and take priors of the form

(4.98)  g(θ) = Π_{j=1}^p |θ_j|^β  for ‖θ‖ large.

Then

∫_{‖θ‖>k} g(θ) / (‖θ‖² ln²‖θ‖) dθ ≤ C ∫_k^∞ r^{p−1+pβ−2} / (K₁ ln² r) dr

(after making a polar transformation). The above integral is finite if

p − 1 + pβ − 2 ≤ −1,  i.e., if p ≤ 2/(1 + β).

Hence, if we take priors of the form (4.98), δ_g is admissible up to the dimension 2(1+β)^{−1}. For the invariant prior, β = 0, and thus we get p = 2 as the critical dimension. Hence there is a close connection between the invariance of the problem and the commonly obtained critical dimension for admissibility. Another important feature of (4.98) is that for β < −1 the prior becomes integrable; thus δ_g becomes a proper Bayes estimator, and hence it is admissible for every p. This is desirable, because as β decreases to −1 the upper bound 2/(1 + β) becomes infinite.
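As a quick check of the dimension count (a worked version of the display above):

∫_{‖θ‖>k} ‖θ‖^{pβ} / (‖θ‖² ln²‖θ‖) dθ = C_p ∫_k^∞ (r^{pβ} / (r² ln² r)) r^{p−1} dr = C_p ∫_k^∞ r^{p−3+pβ} / ln² r dr,

which converges precisely when p − 3 + pβ ≤ −1, i.e., p(1 + β) ≤ 2; at equality the factor ln^{−2} r rescues convergence (substitute u = ln r). In particular, p = 2 with β = 0 falls exactly on the boundary, consistent with the admissibility of the Pitman estimator in two dimensions.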
Now the next question concerns the inadmissibility of the estimator for p > 2/(1 + β). A partial result is obtained in Theorem 4.7 below.

Next we give an example from the scale family. Here the situation is completely different from the exponential family situation (because the support might depend on the parameter).

Example 2: Consider the following simultaneous estimation problem in the scale family of distributions:

(4.99)  X_j = U_j / θ_j,  1 ≤ j ≤ p,

where the U_j's are independent random variables with density f(u). Typically, a scale family model refers to the case where f has support (0, ∞) and the θ_j's are positive real valued parameters to be estimated. For simplicity, we shall assume θ_j ∈ (0, ∞) for each j as well.

For the loss functions, we consider weighted averages of scale invariant loss functions of the following form:

(4.100)  ℓ(t, θ) = Σ_{j=1}^p w_j(θ) ρ(t_j θ_j),

where t = (t₁, ..., t_p)′, the w_j's are positive weight functions, and ρ is an increasing convex function such that ρ(|x|) → ∞ as |x| → ∞.

Under this setup, similar results for the scale family can be obtained following the same line of argument that led to Theorem 4.5. The only difference is that the technicality conditions will now take a different form.
Suppose now that g is a prior with support Ω = (0, ∞)^p. If we consider the integral (4.101) defining the posterior loss, one obtains, after substituting u_j = θ_j x_j,

(4.102)  ∫ w_j(u x^{−1}) g(u x^{−1}) ρ(u_j) Π_{i=1}^p x_i^{−1} f(u_i) du,  1 ≤ j ≤ p.

(Here, for any positive vector a = (a₁, ..., a_p)′, we write a^{−1} = (a₁^{−1}, ..., a_p^{−1})′, and u x^{−1} is taken coordinatewise.) Then, from (4.102), under the regularity condition (4.103), holding for all x, it can be shown (analogously to Proposition 4.4) that the boundedness conclusion (4.104) holds. (Here, because of the fact that θ > 0 a.e., we have X > 0 a.e.)

Now we define h_n. In this case, since we have a finite boundary point in the closure of the parameter space, we have to modify h_n.
Instead of using the usual function of ‖θ‖, consider the following: let f̄: (0, ∞) → (0, ∞) be defined by

f̄(x) = ln² x  for 0 < x < 1;  = (x − 1)²  for x ≥ 1.

Then f̄ is a continuously differentiable function blowing up at both ends of the interval. Then define

(4.105)  h_n(θ) = 1  if 0 ≤ A(θ) ≤ 1;
    = 1 − ln A(θ)/ln n  if 1 ≤ A(θ) ≤ a_n n;
    = d_n / ( [A(θ)]^γ ln A(θ) )  if A(θ) > a_n n,

where A(θ) = Σ_{j=1}^p f̄(θ_j). Notice that we have basically retained the same structure of h_n as in the previous example, except for replacing the usual Euclidean norm by A(θ), which is suitable for this purpose.
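A sketch of this modified taper follows. The aggregate A(θ) = Σ_j f̄(θ_j) is an assumption, since the text does not fully pin down A, and the outermost branch of (4.105) is omitted:

```python
import numpy as np

def f_bar(x):
    """C^1 function on (0, inf) blowing up at 0 and at infinity, as in (4.105)."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 1.0, np.log(x) ** 2, (x - 1.0) ** 2)

def A(theta):
    # assumed aggregate: A(theta) = sum_j f_bar(theta_j)
    return f_bar(theta).sum(axis=-1)

def h_n(theta, n):
    """Scale-family analogue of the Brown-Hwang taper, with A(theta) replacing
    the Euclidean norm; the tail branch (A > a_n * n) is not implemented here."""
    a = A(theta)
    return np.clip(1.0 - np.log(np.maximum(a, 1.0)) / np.log(n), 0.0, 1.0)

theta = np.array([[0.1, 2.0, 5.0], [1.0, 1.0, 1.0]])
print(h_n(theta, n=100))
```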
Now, with some difficulty, (4.103) can be verified for the prior g_n = g h_n². Hence we would have the following:

(4.106)  sup_n max_j | δ_{g_n, j}(x) / x_j | < ∞.

Next, in order to verify the conditions TA1-TA6, we need the existence of a sufficient number of moments of U_j and U_j^{−1}. In common examples (i.e., in the Gamma family, etc.) that is never a problem. Hence, we get the following result:
Theorem 4.6: Under regularity conditions similar to those in the previous theorem, the Bayes estimator δ_g with respect to a prior g with support Ω = (0, ∞)^p (under the model (4.99) and the loss function (4.100)) is admissible if

(4.107)  ∫ [ Σ_{j=1}^p w_j(θ) ] (‖θ‖⁴ ∨ 1) / (A²(θ) ln² A(θ)) g(θ) dθ < ∞

holds.

Proof: Follows from Theorem 4.3.

Remark: Observe that under the above criterion we cannot get the admissibility of the best invariant estimator under invariant loss. That is because Theorem 4.3 was specially designed for location type problems. Actually, it can be modified so that it works for the scale family as well: it can be shown that the same conclusion holds if the integral in (4.108) is finite, where

D = ∫ ∇ℓ(x, θ) f_θ(x) g(θ) dθ,

under a slightly different argument.
4.5 Tail inadmissibility of generalized Bayes estimators:

As indicated at the beginning of this chapter, our main objective was to relate the Stein effect to the results obtained in the earlier sections. In the location case, it was seen that for priors of the form (4.98), Theorem 4.5 says that the generalized Bayes estimator is admissible up to the dimension 2(1+β)^{−1} (equal to 2 in the case β = 0, giving the standard result for the best invariant estimator). Here we shall obtain a partial result saying that, if the dimension is more than 2(1+β)^{−1}, a Stein type improvement will dominate the generalized Bayes estimator, at least asymptotically. For small values of θ it seems that the above statement is still true, but it needs more justification. Actually, for small θ the question of dominance depends more on the exact form of the density function than for large θ (where it depends essentially on the moment structure).

Suppose that X has a distribution determined by the location parameter θ. Assume all the conditions needed for Theorem 4.5. Then, by (4.19), we have the following:

(4.109)  δ_gj(x) − x_j = [ ∫ ξ(x_j − θ_j) f(x − θ) h_j(θ) dθ ] / [ ∫ φ(δ*_j − θ_j) f(x − θ) h_j(θ) dθ ],

where h_j(θ) = w_j(θ) g(θ). Consider the numerator and substitute z = x − θ. So we have

(4.110)  ∫ ξ(z_j) f(z) h_j(x − z) dz,

where J_j is the Jacobian of h_j and 0 < a < 1. Next, assume that the loss function ℓ (= Σ w_j ℓ_j) is unbiased, i.e.,

(4.111)  ∫ ξ(z_j) f(z) dz = 0.

Then the leading term of (4.110) is governed by the first order variation of h_j. Next consider the denominator; by (4.80), it equals C_j(x) F_j(x), where 0 < m₀ < C_j(x) < m₁ < ∞. Thus, from the above, we can write

(4.112)  δ_g(x) = x + C^{−1} k + C^{−1} R,

where C = diag(C₁(x), ..., C_p(x)) and k = (k₁, ..., k_p)′ is the vector of first order terms. Now, for all priors and weight functions of interest, the following will hold:

(4.113)  ‖C^{−1} R‖ = o(‖k‖)  as ‖x‖ → ∞.

Hence, near the boundary (i.e., for large x) we have the following representation of a Bayes estimator:

(4.114)  δ_g(x) = x + C^{−1} k + o(‖k‖).
Consider the following special cases:

i) w_j ≡ 1 and g(θ) = ‖θ‖^γ for large θ. Then h_j(θ) = ‖θ‖^γ, and hence k_j behaves like γ x_j/‖x‖² for large x.

ii) g(θ) = Π_{j=1}^p |θ_j|^γ. Then k_j behaves like γ/x_j.

Now, for simplicity of the ensuing analysis, we assume that the component problems are independent and also that the prior g is a product of marginals. (Actually, the most general setup (4.114) can also be handled in the same way; see Brown (1966) for the analysis of the dependent situation in the invariant case, i.e., γ = 0.)

Next consider the following loss function:

ℓ(t, θ) = Σ_{j=1}^p θ_j^{−2} ρ(t_j − θ_j),  so that w_j ≡ θ_j^{−2}.

Hence, we are now back to the simultaneous estimation situation, so we can apply the techniques of Chapter 3 to develop Stein type improvements. In this case, choose

g(θ) = Π_{j=1}^p |θ_j|^γ  for large θ.

Hence (4.114) reduces to

(4.115)  δ_gj(x) = x_j + (γ − 2) C_j^{−1} / x_j.

(When g(θ) is of the above form, we shall refer to the problem as balanced; in an unbalanced situation the exponents differ across the coordinates.)
Next. by lenuna 3.1.
in order to find out the asymptotic covariance
stabilizer we need to find q(x ) such that.
j
(4.116)
~(~-Oj) - Canst. for large OJ.
E a(x j )
Oj
At this stage we must mention one thing.
The asymptotic covariance
stabilizer may not have positive cross product for all OJ.
whether it happens we need to know more about f.
Now
=
E q(x j )
= q(Oj)
~(Zj
1
+ ~ + o(~».
xj
IX I
j
1
E ~(Zj + ~ + o(~»
xj
IX I
j
To see
173
Now for large OJ' using conditions (4.67). (4.68) and (4.79) one can
show the following:
provided::~ 0 as OJ ~~.
Thus (4.117) reduces to.
Choose q(y)
=y
for largely.
Then we get
Now
if
we
consider
the
corresponding
corresponding to this q. we get
-
t x
And. by lemma 3.1 and theorem 3.1. we get
(4.121)
R(O.6) - R(O.6 )
~
I
"ttl
""
'V
Stein
type
improvement
174
p{l+"Y) - 2 ~
~I
Hence.
A + 11911 2
-
Eo
for large A and large
(I + o{l»
maxle.l.
j
J
starts working after the dimension
(4.122)
Hence. we get the following result:
Theorem 4.7: In a location problem with independent components and invariant loss function, the Bayes estimator δ_g(x) with respect to the prior g is admissible (under the above regularity conditions) up to the dimension [2(1+γ)^{−1}], where g is of the form

(4.123)  g(θ) = Π_{j=1}^p |θ_j|^γ  for large ‖θ‖.

Moreover, under the conditions above, δ_g is inadmissible for p ≥ [2(1+γ)^{−1}] + 1.
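For concreteness, the dimension count behind (4.121) and (4.122) works out as follows (assuming γ > −1):

p(1 + γ) − 2 > 0  if and only if  p > 2/(1 + γ),

so the leading term of the risk difference in (4.121) is negative exactly when p ≥ [2(1+γ)^{−1}] + 1. For the invariant case γ = 0, this gives 2/(1 + γ) = 2, recovering the familiar picture: admissibility for p ≤ 2 and strict domination by a Stein type improvement from p = 3 on.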
Although we do not attempt to generalize this result in this dissertation, it seems clear to us that it can be strengthened considerably. In the context of Stein estimation under a general setup, there are several interesting questions which demand attention; we shall pursue those problems as a continuation of this work.
NOTATIONS AND ABBREVIATIONS

x, y, ... : column vectors
A, B, ... : matrices
∅ : null set
φ : standard normal density, depending on the context
Φ : standard normal c.d.f.
ℝ : set of real numbers
ℤ : set of integers
ℕ : set of natural numbers
ℕ_p : set of first p natural numbers
ℝ^p : p-dimensional Euclidean space
‖·‖ : standard Euclidean norm
|·| : absolute value
‖x‖_Q : the quadratic norm √(x′Qx), for a nonnegative definite matrix Q
f′, g′ : first derivative of a function with respect to suitable arguments
U(0, θ) : uniform distribution on (0, θ)
N_p(θ, Σ) : p-variate normal distribution with mean θ and dispersion matrix Σ
∇ℓ : gradient of a scalar field ℓ
tr(A) : trace of the matrix A
λ_max(A) : maximum eigenvalue of A
x′, A′ : transpose of the vector x or the matrix A
sup : supremum
inf : infimum
supp : support
D.C.T. : Dominated Convergence Theorem
MLE : maximum likelihood estimator
BIBLI<X;RAPHY
Anderson. T.W. (1984). An Introduction
Analysis. 2nd Ed. Wiley. New York.
to Multivariate Statistical
Angers. J-F. and Berger. J.O. (19S6). The Stein effect and Bayesian
analysis: A reexamination. Commun. Statist. 15. 2005 - 2023.
Barlow. R.E . • Bartholomew. D.J . • Bremner. J.M and Brunk. H.D. (1972).
Statistical Inference under Order Restrictions. Wiley. New York.
(1959).
A test of
Bartholomew. D.J.
alternatives. Biometrika 46. 36 -48.
homogeneity
for
ordered
Bartholomew. D.J. (1961). A test of homogeneity of means under
restricted al ternatives (with discussions).]. Roy. Statist.
Soc.{Ser.B) 23. 239 - 281.
Baranchik, A.J. (1970). A family of minimax estimators of the mean of a multivariate normal distribution. Ann. Math. Statist. 41, 642-645.

Berger, J.O. (1975). Minimax estimation of location vectors for a wide class of densities. Ann. Statist. 3, 1318-1328.

Berger, J.O. (1976a). Admissibility results for generalized Bayes estimators of coordinates of a location vector. Ann. Statist. 4, 334-356.

Berger, J.O. (1976b). Minimax estimation of a multivariate normal mean under arbitrary quadratic loss. J. Multivariate Anal. 6, 256-264.

Berger, J.O. (1976c). Tail minimaxity in location vector problems and its applications. Ann. Statist. 4, 33-50.

Berger, J.O. (1978). Minimax estimation of a multivariate normal mean under polynomial loss. J. Multivariate Anal. 8, 173-180.

Berger, J.O. (1980). Improving on inadmissible estimators in continuous exponential families. Ann. Statist. 8, 545-571.

Berger, J.O. (1982). Bayesian robustness and the Stein effect. J. Amer. Statist. Assoc. 77, 358-368.

Berger, J.O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer-Verlag, New York.

Berger, J.O. and Haff, L.R. (1983). A class of minimax estimators of a normal mean vector for arbitrary quadratic loss and unknown covariance matrix. Statist. Dec. 1, 105-129.
Blackwell, D. (1951). On the translation parameter problem for discrete variables. Ann. Math. Statist. 22, 393-399.
Blyth, C.R. (1951). On minimax statistical decision procedures and their admissibility. Ann. Math. Statist. 22, 22-42.

Brandwein, A. and Strawderman, W. (1980). Minimax estimators of location parameters for spherically symmetric distributions under concave loss. Ann. Statist. 8, 279-284.

Brown, L.D. (1966). On the admissibility of invariant estimators of one or more location parameters. Ann. Math. Statist. 37, 1086-1136.

Brown, L.D. (1971). Admissible estimators, recurrent diffusions, and insoluble boundary value problems. Ann. Math. Statist. 42, 855-903.

Brown, L.D. (1979). A heuristic method for determining admissibility of estimators - with applications. Ann. Statist. 7, 960-994.

Brown, L.D. (1980). Examples of Berger's phenomenon in the estimation of multivariate normal means. Ann. Statist. 8, 572-585.

Brown, L.D. and Hwang, J.T. (1982). A unified admissibility proof. Statistical Decision Theory and Related Topics III (S.S. Gupta, Ed.), Vol. 1, 205-230. Academic Press, New York.

Brown, L.D. and Hwang, J.T. (1986). Some limitation on Stein's phenomenon. Commun. Statist. 15, 2025-2042.

Chacko, V.J. (1963). Testing homogeneity against ordered alternatives. Ann. Math. Statist. 34, 945-956.

Chang, Y.-T. (1981). Stein type estimators for parameters restricted by linear inequalities. Keio Sci. Tech. Rep. 34, 83-95.

Chang, Y.-T. (1982). Stein type estimators for parameters in truncated spaces. Keio Sci. Tech. Rep. 35, 185-193.

Dasgupta, A. and Sinha, B.K. (1980). On the admissibility of polynomial estimators in the one parameter exponential families. Sankhya (Ser. B) 42, 129-142.

Dasgupta, A. and Sinha, B.K. (1986). Estimation in the multiparameter exponential family: Admissibility and inadmissibility results. Statist. Dec. 4, 101-130.

Diaconis, P. and Stein, C. (1983). Lectures on Statistical Decision Theory. Unpublished lecture notes, Stanford University, Stanford.
Efron, B. and Morris, C. (1973). Stein's estimation rule and its competitors - an empirical Bayes approach. J. Amer. Statist. Assoc. 68, 117-130.

Farrell, R.H. (1968a). Towards a theory of generalized Bayes tests. Ann. Math. Statist. 39, 1-22.

Farrell, R.H. (1968b). On a necessary and sufficient condition for admissibility of estimators when strictly convex loss is used. Ann. Math. Statist. 39, 23-28.

Ghosh, M., Hwang, J.T. and Tsui, K. (1983). Construction of improved estimators in multiparameter estimation for discrete exponential families (with discussions). Ann. Statist. 11, 351-356.

Ghosh, M., Hwang, J.T. and Tsui, K. (1984). Construction of improved estimators in multiparameter estimation for continuous exponential families. J. Multivariate Anal. 14, 212-220.

Hudson, H.M. (1978). A natural identity for exponential families with applications. Ann. Statist. 6, 473-484.

James, W. and Stein, C. (1960). Estimation with quadratic loss. Proc. Fourth Berkeley Symp. Math. Statist. Probab., University of California Press, Berkeley.

Judge, G. and Yancey, T.A. (1986). Improved Methods of Inference in Econometrics. North-Holland, New York.

Karlin, S. (1958). Admissibility for estimation with quadratic loss. Ann. Math. Statist. 29, 406-436.

Katz, M.W. (1961). Admissible and minimax estimates of parameters in truncated spaces. Ann. Math. Statist. 32, 136-142.

Kudo, A. (1963). A multivariate analogue of the one sided test. Biometrika 50, 403-418.
Le Cam, L. (1955). An extension of Wald's theory of statistical decision functions. Ann. Math. Statist. 26, 69-81.

Lindley, D.V. and Smith, A.F.M. (1972). Bayes estimates for the linear model (with discussions). J. Roy. Statist. Soc. (Ser. B) 34, 1-41.

Nuesch, P.E. (1964). Multivariate tests of location for restricted alternatives. Doct. Dissert., Swiss Federal Institute of Technology. Juris-Verlag, Zurich.

Nuesch, P.E. (1966). On the problem of testing location in multivariate populations for restricted alternatives. Ann. Math. Statist. 37, 113-119.

Perlman, M.D. (1969). One sided problems in multivariate analysis. Ann. Math. Statist. 40, 549-567.

Sen, P.K. and Saleh, A.K. (1985). On some shrinkage estimators of multivariate location. Ann. Statist. 13, 272-281.
Stein, C. (1955). A necessary and sufficient condition for admissibility. Ann. Math. Statist. 26, 518-522.

Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proc. Third Berkeley Symp. Math. Statist. Probab., University of California Press, Berkeley.

Stein, C. (1959). The admissibility of Pitman's estimator of a single location parameter. Ann. Math. Statist. 30, 970-979.

Stein, C. (1973). Estimation of the mean of a multivariate normal distribution. Proc. Prague Symp. Asymp. Statist.

Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9, 1135-1151.

Stein, C. (1986). Approximate Computation of Expectations. IMS Lecture Notes - Monograph Series, Vol. 7 (S.S. Gupta, Ed.).

Strawderman, W.E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. Ann. Math. Statist. 42, 385-388.

Strawderman, W.E. (1974). Minimax estimation of location parameters for certain spherically symmetric distributions. J. Multivariate Anal. 4, 255-264.

Strawderman, W.E. and Cohen, A. (1971). Admissibility of estimators of the mean vector of a multivariate normal distribution under quadratic loss. Ann. Math. Statist. 42, 270-296.