ON THE ASYMPTOTIC DISTRIBUTIONAL RISKS OF SHRINKAGE
AND PRELIMINARY TEST VERSIONS
OF MAXIMUM LIKELIHOOD ESTIMATORS
by
Pranab Kumar Sen
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1478
December 1984
ON THE ASYMPTOTIC DISTRIBUTIONAL RISKS OF SHRINKAGE AND PRELIMINARY TEST
VERSIONS OF MAXIMUM LIKELIHOOD ESTIMATORS*
By PRANAB KUMAR SEN
University of North Carolina, Chapel Hill, N.C., USA
SUMMARY. Both the preliminary test and shrinkage versions of the maximum
likelihood estimators aim to reduce, in a meaningful setup, the risk of
the estimator. Asymptotic distributional risks for these versions of the
maximum likelihood estimators are studied, and it is shown that neither of
these estimators dominates the other in the light of the asymptotic
distributional risk criterion.
1. INTRODUCTION
The maximum likelihood estimators (MLE) play a fundamental role in the classical parametric estimation theory. Under quite general regularity conditions, the MLE are (asymptotically) optimal, where the optimality has been interpreted in various ways; we may refer to Ibragimov and Has'minskii (1981) for an excellent account of this theory. Typically, in a multi-parameter model, a proper subspace ($\omega$) of the parameter space ($\Omega$) may be identified from extraneous considerations, and a restricted MLE may be defined by limiting considerations to $\omega$, and this may fare better than the unrestricted MLE when the assumed parametric restraints actually hold. On the other hand, negation of such assumed restraints may render the restricted MLE biased, inefficient, and even, possibly, inconsistent. Thus, the restricted MLE may have super-efficiency within a restricted part of the parameter space, but lacks validity-robustness. This conflicting picture on the performance
* Work partially supported by the National Heart, Lung and Blood Institute,
Contract NIH-NHLBI-71-2243-L from the National Institutes of Health.
AMS Subject Classification Numbers: 62C15, 62E20, 62F12.
Key Words and Phrases: Asymptotic distribution; asymptotic distributional
risk; James-Stein rule; maximum likelihood estimator; preliminary test
estimation; shrinkage estimation.
characteristics of the restricted MLE's may be of genuine concern in a class of problems where the prior information on $\omega$ is rather uncertain (though quite likely to be true): the use of the restricted MLE's may not be advocated without reservation, although it is tempting to take into consideration this prior information to achieve higher efficiency if possible. In such a situation, both the preliminary test (PT) and shrinkage versions of MLE's can be used with some distinct advantages. In a preliminary test estimation problem, first, a preliminary test (on the validity of $\omega$) is made, and depending on the outcome of this test, a restricted or an unrestricted MLE is used. Thus, a PTMLE is a convex combination of the restricted and unrestricted MLE's, where the mixing coefficients are the indicator variables derived from the preliminary test. On the other hand, in a shrinkage estimation problem, a test statistic (for testing the validity of $\omega$) is directly incorporated in the estimation rule in providing a smooth estimator of the parameter of interest. In either case, confined to the parameter space $\omega$, the estimator performs better than the unrestricted MLE, though the restricted MLE may still be better than either of them. On the other hand, unlike the case of the restricted MLE, when $\theta$ lies outside $\omega$, either of these versions has a more robust performance characteristic, i.e., the negation of the assumed restraints affects the performance of either of these versions to a much lesser extent than the restricted MLE.
For (multivariate) normal and other specific parametric models, the theory of PTE and shrinkage estimation has been neatly developed by a host of workers [viz., Judge and Bock (1978), where other references are also cited]. The exactness of this theory stumbles into immeasurable difficulties for a general parametric (or nonparametric) model where the underlying distribution may not necessarily belong to the exponential family. Nevertheless, in such a case, asymptotic theory has emerged with remarkable simplifications; we may refer to Sen (1979, 1984), Saleh and Sen (1978, 1984) and Sen and Saleh (1979, 1985), among others. It is clear from these results that in an asymptotic setup, we need to confine ourselves to a shrinking neighbourhood of $\omega$, for which the unrestricted, restricted, PT and shrinkage versions of the MLE have distinct (and meaningful) performance characteristics, amenable to comparative studies. For the PTMLE, the asymptotic theory has already been studied in detail in Sen (1979). The object of the current investigation is to focus mainly on the shrinkage MLE and to compare the asymptotic performances for the two versions (i.e., PTE and shrinkage) of the MLE's.
The PTMLE and shrinkage MLE are both introduced in Section 2, where the preliminary notions are also presented. Asymptotic distribution theory of the shrinkage estimator is considered in Section 3, and the asymptotic distributional risk results are also presented there. The last section deals with the asymptotic risk efficiency results for the PTMLE and shrinkage MLE. In this context, it is shown that neither of these versions dominates the other in the light of this asymptotic distributional risk criterion.
2. PTMLE, SHRINKAGE MLE AND PRELIMINARY NOTIONS
We closely follow the basic setup in Sen (1979) where the PTMLE have already been discussed. Consider $k$ ($\ge 1$) independent samples of sizes $n_1, \ldots, n_k$, respectively. Let $X_{i1}, \ldots, X_{in_i}$ be $n_i$ independent and identically distributed (i.i.d.) random vectors (r.v.) with a distribution function (d.f.) $F_i(x, \theta)$, $x \in E^m$ and $\theta = (\theta_1, \ldots, \theta_r)' \in \Omega \subset E^r$, for some $m \ge 1$, $r \ge 1$, for $i = 1, \ldots, k$. For each $i$ ($= 1, \ldots, k$) and $\theta \in \Omega$, we assume that $F_i(x, \theta)$ admits a density function $f_i(x, \theta)$ (with respect to some sigma-finite measure $\mu$). Then, the log-likelihood function is defined by
$$\log L_n(X_n, \theta) = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \log f_i(X_{ij}, \theta), \quad \theta \in \Omega, \qquad (2.1)$$
where $X_n = (X_{11}, \ldots, X_{kn_k})$ is the sample point ($\in E^{mn}$), and $n = n_1 + \cdots + n_k$.
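The double sum in (2.1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the unit-variance normal log-density `normal_logpdf`, the helper name `pooled_log_likelihood`, and the two small samples are hypothetical stand-ins for the general $f_i$ and $X_{ij}$, not part of the original setup.

```python
import math

def normal_logpdf(x, theta):
    """Log-density of N(theta, 1) at x; a hypothetical stand-in for log f_i(x, theta)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - theta) ** 2

def pooled_log_likelihood(samples, log_densities, theta):
    """Evaluate (2.1): sum over samples i and observations j of log f_i(X_ij, theta)."""
    total = 0.0
    for sample, log_f in zip(samples, log_densities):
        total += sum(log_f(x, theta) for x in sample)
    return total

# Two hypothetical samples (k = 2, n_1 = 3, n_2 = 2) sharing one location parameter.
samples = [[0.1, -0.4, 0.3], [1.2, 0.8]]
log_densities = [normal_logpdf, normal_logpdf]
value_at_zero = pooled_log_likelihood(samples, log_densities, 0.0)
```

Passing one log-density per sample mirrors the fact that the $k$ samples may come from different families $F_i$ while sharing the same parameter $\theta$.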
We define the restricted parameter space $\omega$ as
$$\omega = \{\theta \in \Omega : h(\theta) = (h_1(\theta), \ldots, h_p(\theta))' = 0\}, \quad \text{for some } p \le r. \qquad (2.2)$$
Then, an unrestricted and a restricted MLE of the true parameter $\theta$, denoted by $\tilde\theta_n$ and $\hat\theta_n$, respectively, are defined as
$$\log L_n(X_n, \tilde\theta_n) = \sup_{\theta \in \Omega} \log L_n(X_n, \theta), \qquad (2.3)$$
$$\log L_n(X_n, \hat\theta_n) = \sup_{\theta \in \omega} \log L_n(X_n, \theta). \qquad (2.4)$$
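For testing the null hypothesis $H_0: \theta \in \omega$, the classical (log-) likelihood ratio test statistic is
$$\mathcal{L}_n = -2 \log\{L_n(X_n, \hat\theta_n)/L_n(X_n, \tilde\theta_n)\}. \qquad (2.5)$$
The null hypothesis $H_0$ is then accepted or rejected according as $\mathcal{L}_n$ is $<$ or $\ge \ell_{n,\alpha}$, the level $\alpha$ ($0 < \alpha < 1$) critical value for $\mathcal{L}_n$. The PTMLE $\hat\theta_n^{PT}$ is then defined as
$$\hat\theta_n^{PT} = \begin{cases} \hat\theta_n, & \text{if } \mathcal{L}_n < \ell_{n,\alpha}, \\ \tilde\theta_n, & \text{if } \mathcal{L}_n \ge \ell_{n,\alpha}. \end{cases} \qquad (2.6)$$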
To introduce the shrinkage MLE, we consider the usual James-Stein (1961) rule and define
$$\hat\theta_n^{S} = \tilde\theta_n - (p-2)\mathcal{L}_n^{-1}(\tilde\theta_n - \hat\theta_n). \qquad (2.7)$$
It may be noted that in (2.7), instead of $(p-2)$, we may consider a sequence $\{c_n\}$ of positive numbers, such that as $n \to \infty$, $c_n$ converges to a positive constant $c$: $0 < c < 2(p-2)$. However, the asymptotic picture with the risk of such estimators would remain the same, and, in this context, $c = (p-2)$ appears to be the most desirable solution. Also, instead of the scalar multiplication of $(p-2)$, it is possible to use some matrix multiplication (as is usually done in a general shrinkage estimation of the multinormal mean). However, such a matrix has to be specifically chosen from the sample estimates of the unknown information matrix, and would therefore complicate the procedure. The end picture, however, remains essentially the same. Hence, for the sake of simplicity, we shall specifically consider the particular shrinkage estimator as in (2.7) and append a small discussion on the other possibilities. Our primary concern is to study the asymptotic distributional risk properties of $\hat\theta_n^{S}$ and to compare these with those of the PTMLE and the other two versions of the MLE.
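The two rules can be put side by side in a short Python sketch. It assumes the two MLE's and the likelihood ratio statistic have already been computed; the function names and the numerical inputs are hypothetical, and the shrinkage rule is written in the form restated in (4.14), i.e. the restricted MLE plus $\{1 - (p-2)/\mathcal{L}_n\}$ times the difference of the two MLE's.

```python
def preliminary_test_mle(theta_unres, theta_res, lr_stat, critical_value):
    """Rule (2.6): keep the restricted MLE when the LR statistic is below the critical value."""
    return list(theta_res) if lr_stat < critical_value else list(theta_unres)

def shrinkage_mle(theta_unres, theta_res, lr_stat, p, c=None):
    """Shrinkage rule as restated in (4.14); c defaults to p - 2 as in (2.7)."""
    if c is None:
        c = p - 2
    weight = 1.0 - c / lr_stat
    return [res + weight * (unres - res) for unres, res in zip(theta_unres, theta_res)]

# Hypothetical inputs: r = 4 parameters, p = 3 restraints forcing the first three to 0.
theta_unres = [1.0, 2.0, 3.0, 4.0]
theta_res = [0.0, 0.0, 0.0, 4.0]
pt_estimate = preliminary_test_mle(theta_unres, theta_res, lr_stat=4.0, critical_value=7.81)
shrunk = shrinkage_mle(theta_unres, theta_res, lr_stat=4.0, p=3)
```

The preliminary test rule switches abruptly between the two MLE's, whereas the shrinkage rule moves smoothly between them as the statistic grows; for a statistic smaller than $p-2$ the weight is negative, which is a known feature of the plain James-Stein form.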
For this study, we need to introduce the regularity conditions under which the proposed estimators have the desirable asymptotic properties. For the classical MLE, these regularity conditions have been considered in increasing generality in Cramér (1946), Huber (1967), Hájek (1970), Inagaki (1973) and others. In Sen (1979), these regularity conditions have been stated along with the possible generalizations (to cover more general cases). Hence, for the sake of brevity, we omit these details. However, we introduce the following notations which will be needed in the sequel.
$\Omega$ is assumed to be a convex, compact subspace of $E^r$, and, for each $i$ ($= 1, \ldots, k$), $\log f_i(x, \theta)$ is almost everywhere (a.e.) differentiable with respect to $\theta$ (at least twice), these derivatives are dominated by some integrable functions, and the second derivative matrix has the continuity property in the first mean. Further, each of these densities has a finite Kullback-Leibler information. Let us define for each $i$ ($= 1, \ldots, k$),
$$B_\theta^{(i)} = \Big(\Big( \int_{E^m} (\partial/\partial\theta_j)\log f_i(x, \theta)\, (\partial/\partial\theta_l)\log f_i(x, \theta)\, dF_i(x, \theta) \Big)\Big)_{j,l = 1, \ldots, r}, \qquad (2.8)$$
and assume that $B_\theta^{(1)}, \ldots, B_\theta^{(k)}$ are all continuous in $\theta$ in some neighbourhood of the true parameter point $\theta_0$, and that
$$B^*_{\theta_0} = \sum_{i=1}^{k} (n_i/n) B_{\theta_0}^{(i)} \ \text{is positive definite (p.d.)}. \qquad (2.9)$$
We also assume that $h(\theta)$ possesses continuous first and second order derivatives with respect to $\theta$, for every $\theta \in \Omega$. Let then
$$H_\theta = \big(\big( (\partial/\partial\theta) h(\theta) \big)\big) \quad (\text{of order } r \times p), \qquad (2.10)$$
and, we assume that
$$H_{\theta_0} \ \text{is of rank } p\ (\le r). \qquad (2.11)$$
Regarding the individual sample sizes $n_1, \ldots, n_k$, we make the conventional assumption that there exist positive numbers $\rho_1, \ldots, \rho_k$, such that
$$\lim_{n \to \infty} (n_i/n) = \rho_i \ \text{exists for } i = 1, \ldots, k; \quad \sum_{i=1}^{k} \rho_i = 1. \qquad (2.12)$$
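Finally, we assume that the following matrix of order $(p+r) \times (p+r)$
$$\begin{pmatrix} B^*_{\theta_0} & -H_{\theta_0} \\ -H'_{\theta_0} & 0 \end{pmatrix} \qquad (2.13)$$
is p.d. Note that under (2.12), $B^*_{\theta_0}$ in (2.9) converges to
$$B_{\theta_0} = \sum_{i=1}^{k} \rho_i B_{\theta_0}^{(i)}. \qquad (2.14)$$
Also, note that if we denote the reciprocal matrix of (2.13) by
$$\begin{pmatrix} P^*_{\theta_0} & Q^*_{\theta_0} \\ Q^{*\prime}_{\theta_0} & R^*_{\theta_0} \end{pmatrix}, \qquad (2.15)$$
then, under (2.12), (2.15) converges to
$$\begin{pmatrix} P_{\theta_0} & Q_{\theta_0} \\ Q'_{\theta_0} & R_{\theta_0} \end{pmatrix}, \quad \text{where} \quad \begin{pmatrix} B_{\theta_0} & -H_{\theta_0} \\ -H'_{\theta_0} & 0 \end{pmatrix} \begin{pmatrix} P_{\theta_0} & Q_{\theta_0} \\ Q'_{\theta_0} & R_{\theta_0} \end{pmatrix} = I_{p+r}. \qquad (2.16)$$
Note that $B_{\theta_0}$, $P_{\theta_0}$ and $R_{\theta_0}$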
are all symmetric matrices. Also, we need the following definition of the likelihood score function
$$\Lambda_n(\theta) = n^{-1/2} (\partial/\partial\theta) \log L_n(X_n, \theta), \qquad (2.17)$$
and we write $\Lambda_n^0 = \Lambda_n(\theta_0)$, where $\theta_0$ is the true parameter point.

3. ASYMPTOTIC DISTRIBUTION OF SHRINKAGE MLE

For simplicity of presentation, consider first the case where $\theta_0 \in \omega$. It follows from Sen (1979) that under the assumed regularity conditions, as $n \to \infty$,
$$n^{1/2}(\tilde\theta_n - \theta_0) = B_{\theta_0}^{-1}\Lambda_n^0 + o_p(1) \sim N(0, B_{\theta_0}^{-1}), \qquad (3.1)$$
$$n^{1/2}(\hat\theta_n - \theta_0) = P_{\theta_0}\Lambda_n^0 + o_p(1) \sim N(0, P_{\theta_0}), \qquad (3.2)$$
$$\mathcal{L}_n = (\Lambda_n^0)'(P_{\theta_0} - B_{\theta_0}^{-1}) B_{\theta_0} (P_{\theta_0} - B_{\theta_0}^{-1})\Lambda_n^0 + o_p(1) \sim \chi^2_p, \qquad (3.3)$$
where $\chi^2_p$ is a r.v. having the central chi square d.f. with $p$ degrees of freedom (DF); we denote the upper $100\alpha\%$ point of this d.f. by $\chi^2_{p,\alpha}$. We may also write
$$\mathcal{L}_n = n(\tilde\theta_n - \hat\theta_n)' B_{\theta_0} (\tilde\theta_n - \hat\theta_n) + o_p(1). \qquad (3.4)$$
By virtue of (3.1) through (3.4), under $H_0: \theta_0 \in \omega$, we have
$$n^{1/2}(\hat\theta_n^{S} - \theta_0) = n^{1/2}(\tilde\theta_n - \theta_0) - (p-2)\mathcal{L}_n^{-1} n^{1/2}(\tilde\theta_n - \hat\theta_n)$$
$$= \big[ I - (p-2)\big((\Lambda_n^0)' A \Lambda_n^0\big)^{-1}(I - P_{\theta_0}B_{\theta_0}) \big] B_{\theta_0}^{-1}\Lambda_n^0 + o_p(1), \qquad (3.5)$$
where
$$A = (I - P_{\theta_0}B_{\theta_0}) B_{\theta_0}^{-1} (I - B_{\theta_0}P_{\theta_0}). \qquad (3.6)$$
Consider next the case of $\theta_0 \notin \omega$. It follows from the results in Sen (1979) that for every $\theta_0 \notin \omega$, $n^{-1}\mathcal{L}_n$ converges, in probability, to a positive constant (depending on $h(\theta_0) \ne 0$), so that by (2.7),
$$n^{1/2}(\hat\theta_n^{S} - \theta_0) = n^{1/2}(\tilde\theta_n - \theta_0) + o_p(1). \qquad (3.7)$$
The situation is quite different when $\theta_0$ lies near the boundary of $\omega$. For this case, we consider the following sequence $\{K_n\}$ of local alternatives
$$K_n: \ h(\theta_0) = n^{-1/2}\gamma, \quad \gamma \ \text{a real vector in } E^p. \qquad (3.8)$$
Note that for the nonparametric estimation problems, for shrinkage estimators, it has been shown in Sen (1984) and Sen and Saleh (1985) that in such a shrinking neighbourhood of the null hypothesis parameter space, the shrinkage estimators have nice asymptotic properties and dominate the usual ones. In the current study, we like to present the same picture for the shrinkage MLE. First, we note that by virtue of (2.10)-(2.11) and (3.8), we can conceive of a sequence $\{\theta_0^{(n)}\}$ of parametric points, such that
$$h(\theta_0^{(n)}) = 0 \quad \text{and} \quad \lim_{n \to \infty} n^{1/2}(\theta_0^{(n)} - \theta_0) = \gamma^* \ \text{exists, where} \ \gamma = -H'_{\theta_0}\gamma^*. \qquad (3.9)$$
As such, proceeding as in Section 4 of Sen (1979), it follows that under $\{K_n\}$,
$$n^{1/2}(\hat\theta_n - \theta_0) = \gamma^* + P_{\theta_0}\Lambda_n^0 + o_p(1), \qquad (3.10)$$
$$n^{1/2}(\hat\theta_n - \tilde\theta_n) = \gamma^* + (P_{\theta_0} - B_{\theta_0}^{-1})\Lambda_n^0 + o_p(1), \qquad (3.11)$$
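where we note that by letting $B_{\theta_0} = DD'$ ($D$ nonsingular),
$$D^* = I - B_{\theta_0}P_{\theta_0} \qquad (3.12)$$
and
$$D_0 = I - D'P_{\theta_0}D \ \text{are idempotent matrices} \qquad (3.13)$$
(of rank $p$). Further, (3.1) holds under $\{K_n\}$ as well. Thus, from (2.7), (3.1), (3.10), (3.11), (3.12) and (3.13), we have under $\{K_n\}$ and the assumed regularity conditions,
$$n^{1/2}(\hat\theta_n^{S} - \theta_0) = B_{\theta_0}^{-1}\Big\{ \Lambda_n^0 - \frac{(p-2)[D^*\Lambda_n^0 - B_{\theta_0}\gamma^*]}{[D^*\Lambda_n^0 - B_{\theta_0}\gamma^*]' B_{\theta_0}^{-1} [D^*\Lambda_n^0 - B_{\theta_0}\gamma^*]} \Big\} + o_p(1)$$
$$= (D')^{-1}\Big\{ U_n - \frac{(p-2)[D_0U_n - D'\gamma^*]}{[D_0U_n - D'\gamma^*]'[D_0U_n - D'\gamma^*]} \Big\} + o_p(1), \qquad (3.14)$$
where
$$U_n = D^{-1}\Lambda_n^0 \ \text{converges in law to} \ U \sim N(0, I_r), \qquad (3.15)$$
and
$$[D_0U - D'\gamma^*]'[D_0U - D'\gamma^*] \sim \chi^2_p(\Delta); \quad \Delta = \gamma^{*\prime}DD'\gamma^* = \gamma^{*\prime}B_{\theta_0}\gamma^*, \qquad (3.16)$$
where $\chi^2_p(\Delta)$ stands for a r.v. having the non-central chi square d.f. with $p$ DF and noncentrality parameter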
$\Delta$. We are now in a position to define the asymptotic distributional risk (ADR) of the shrinkage and other estimators. For a suitable estimator $\theta_n^*$ of $\theta_0$, we denote the asymptotic d.f. by
$$G^*(x) = \lim_{n \to \infty} P\{ n^{1/2}(\theta_n^* - \theta_0) \le x \mid K_n \}, \qquad (3.17)$$
where we assume that this asymptotic d.f. is nondegenerate. Also, we consider a suitable positive semidefinite (p.s.d.) matrix $W$. Then, the ADR of $\theta_n^*$, at the point $\theta_0$, is defined by
$$\rho(\theta^*; W) = \mathrm{Trace}\Big( W \Big[ \int \cdots \int xx'\, dG^*(x) \Big] \Big). \qquad (3.18)$$
Note that by (3.1) and (3.18),
$$\rho(\tilde\theta; W) = \mathrm{Tr}(WB_{\theta_0}^{-1}), \quad \text{for every } \gamma^* \in E^r, \qquad (3.19)$$
while, by (3.10) and (3.18),
$$\rho(\hat\theta; W) = \mathrm{Tr}(WP_{\theta_0}) + (\gamma^{*\prime}W\gamma^*), \qquad (3.20)$$
where we have made use of the identity that $P_{\theta_0}B_{\theta_0}P_{\theta_0} = P_{\theta_0}$. Let us denote by $W_0 = D^{-1}W(D')^{-1}$, where $D$ is defined before (3.12). Then, by (3.14) through (3.16), we obtain that
$$\rho(\hat\theta^{S}; W) = \mathrm{Tr}(W_0) + (p-2)^2 E\Big\{ \frac{(D_0U - D'\gamma^*)'W_0(D_0U - D'\gamma^*)}{([D_0U - D'\gamma^*]'[D_0U - D'\gamma^*])^2} \Big\} - 2(p-2) E\Big\{ \frac{(D_0U - D'\gamma^*)'W_0U}{[D_0U - D'\gamma^*]'[D_0U - D'\gamma^*]} \Big\}, \qquad (3.21)$$
where
$$\mathrm{Tr}(W_0) = \mathrm{Tr}(D^{-1}W(D')^{-1}) = \mathrm{Tr}(WB_{\theta_0}^{-1}). \qquad (3.22)$$
Note that by (3.13), $(I - D_0)D_0 = D_0 - D_0 = 0$, so that $(I - D_0)U$ and $D_0U$ are independent r.v.'s with normal distributions with null mean vectors and dispersion matrices $I - D_0$ and $D_0$, respectively. For the last term on the right hand side of (3.21), we write $(D_0U - D'\gamma^*)'W_0U = (D_0U - D'\gamma^*)'W_0D_0U + (D_0U - D'\gamma^*)'W_0(I - D_0)U$, and make use of the independence of $D_0U$ and $(I - D_0)U$. With this manipulation, the last term on the right hand side of (3.21) is equal to
$$-2(p-2)E\big\{ ([D_0U - D'\gamma^*]'[D_0U - D'\gamma^*])^{-1}(D_0U - D'\gamma^*)'W_0D_0U \big\} + 0$$
$$= -2(p-2)\big\{ E(\chi^{-2}_{p+2}(\Delta))\mathrm{Tr}(W_0) + (\gamma^{*\prime}W\gamma^*)[E(\chi^{-2}_{p+2}(\Delta)) + E(\chi^{-4}_{p+4}(\Delta))] \big\}, \qquad (3.23)$$
where $\Delta$ is defined by (3.16), and in the last step, we have made use of the Stein identity [viz., Appendix B of Judge and Bock (1978)]. In this context, note that $\mathrm{Tr}(W_0(D'\gamma^*\gamma^{*\prime}D)) = \mathrm{Tr}(W(\gamma^*\gamma^{*\prime})) = (\gamma^{*\prime}W\gamma^*)$, as it appeared in (3.20). Similarly, by using the Stein identity, the second term on the right hand side of (3.21) reduces to
$$(p-2)^2\big\{ E(\chi^{-4}_{p+2}(\Delta))\mathrm{Tr}(W_0) + (\gamma^{*\prime}W\gamma^*)E(\chi^{-4}_{p+4}(\Delta)) \big\}. \qquad (3.24)$$
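Therefore, from (3.21), (3.22), (3.23) and (3.24), we conclude that the ADR of the shrinkage MLE $\hat\theta_n^{S}$ is given by
$$\rho(\hat\theta^{S}; W) = \mathrm{Tr}(WB_{\theta_0}^{-1}) + (p-2)^2\big\{ \mathrm{Tr}(WB_{\theta_0}^{-1})E(\chi^{-4}_{p+2}(\Delta)) + (\gamma^{*\prime}W\gamma^*)E(\chi^{-4}_{p+4}(\Delta)) \big\}$$
$$- 2(p-2)\big\{ \mathrm{Tr}(WB_{\theta_0}^{-1})E(\chi^{-2}_{p+2}(\Delta)) + (\gamma^{*\prime}W\gamma^*)[E(\chi^{-2}_{p+2}(\Delta)) + E(\chi^{-4}_{p+4}(\Delta))] \big\}, \qquad (3.25)$$
where we note that by the Courant theorem,
$$\Delta^* = \gamma^{*\prime}W\gamma^* = \Delta(\gamma^{*\prime}W\gamma^*)/(\gamma^{*\prime}B_{\theta_0}\gamma^*) \le \Delta\, \mathrm{ch}_{\max}(WB_{\theta_0}^{-1}) \le \Delta\, \mathrm{Tr}(WB_{\theta_0}^{-1}), \quad \text{for every } \gamma^* \in E^r. \qquad (3.26)$$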
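The chain of inequalities in (3.26) can be checked numerically for any positive definite pair. The NumPy sketch below does this for one randomly generated example; the seed, the dimension `r = 4`, and the construction of `B` and `W` are arbitrary illustrative choices, not quantities from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 4

# Hypothetical positive definite B (information matrix) and W (weight matrix).
A1 = rng.standard_normal((r, r))
A2 = rng.standard_normal((r, r))
B = A1 @ A1.T + r * np.eye(r)
W = A2 @ A2.T + np.eye(r)

gamma_star = rng.standard_normal(r)
delta = float(gamma_star @ B @ gamma_star)        # Delta  = gamma*' B gamma*
delta_star = float(gamma_star @ W @ gamma_star)   # Delta* = gamma*' W gamma*

WB_inv = W @ np.linalg.inv(B)
# W B^{-1} has real positive eigenvalues when W and B are positive definite.
ch_max = float(np.linalg.eigvals(WB_inv).real.max())
trace_bound = float(np.trace(WB_inv))
```

The first inequality is the Rayleigh quotient bound for the pair $(W, B)$, and the second holds because all characteristic roots of $WB_{\theta_0}^{-1}$ are nonnegative, so the largest root cannot exceed their sum.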
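We conclude this section with the following result on the ADR of the PTMLE, which follows directly from the results in Section 5 of Sen (1979):
$$\rho(\hat\theta^{PT}; W) = \mathrm{Tr}(WB_{\theta_0}^{-1}) - \mathrm{Tr}(W(B_{\theta_0}^{-1} - P_{\theta_0}))\Pi_{p+2}(\chi^2_{p,\alpha}; \Delta) + \Delta^*[2\Pi_{p+2}(\chi^2_{p,\alpha}; \Delta) - \Pi_{p+4}(\chi^2_{p,\alpha}; \Delta)]$$
$$= [1 - \Pi_{p+2}(\chi^2_{p,\alpha}; \Delta)]\rho(\tilde\theta; W) + \Pi_{p+2}(\chi^2_{p,\alpha}; \Delta)\rho(\hat\theta; W) + \Delta^*[\Pi_{p+2}(\chi^2_{p,\alpha}; \Delta) - \Pi_{p+4}(\chi^2_{p,\alpha}; \Delta)], \qquad (3.27)$$
where $\Pi_q(x; \delta)$ is the non-central chi square d.f. with $q$ DF and noncentrality parameter $\delta$.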
4. ASYMPTOTIC RISK-EFFICIENCY RESULTS
First, we note that by (3.13), $B_{\theta_0}^{-1} - P_{\theta_0} = B_{\theta_0}^{-1}D^*$ is of rank $p$, and we assume that $p > 2$ (otherwise, the shrinkage estimation would not lead to any reduction in the risk). To avoid trivial consequences, we therefore assume that for the chosen $W$,
$$\mathrm{Tr}(W(B_{\theta_0}^{-1} - P_{\theta_0})) = c_p(W, B_{\theta_0}, P_{\theta_0}) > 0. \qquad (4.1)$$
From (3.19) and (3.20), we have then
$$\rho(\hat\theta; W) = \rho(\tilde\theta; W) - c_p(W, B_{\theta_0}, P_{\theta_0}) + \Delta^*, \qquad (4.2)$$
so that
$$\rho(\hat\theta; W) \gtrless \rho(\tilde\theta; W) \ \text{according as} \ \Delta^* \gtrless c_p(W, B_{\theta_0}, P_{\theta_0}). \qquad (4.3)$$
In particular, under $H_0$, $\gamma^* = 0$, so that $\rho(\hat\theta; W) < \rho(\tilde\theta; W)$. On the other hand, outside the ellipsoid $\{\gamma^*: \gamma^{*\prime}W\gamma^* \le c_p(W, B_{\theta_0}, P_{\theta_0})\}$, $\rho(\hat\theta; W) > \rho(\tilde\theta; W)$; moreover, as $\gamma^*$ moves away from the origin (i.e., $\Delta^*$ or $\Delta$ increases), $\rho(\hat\theta; W)$ converges to $+\infty$, while $\rho(\tilde\theta; W)$ does not depend on $\gamma^*$ (or $\Delta^*$ or $\Delta$). Thus, in the light of the ADR, neither of $\hat\theta_n$ and $\tilde\theta_n$ dominates the other. Further, this also reflects the lack of (risk-) efficiency robustness of the restricted MLE against any departure from the assumed restraints for which $\Delta^*$ is not small.
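From (3.19) and (3.25), we have
$$\rho(\hat\theta^{S}; W)/\rho(\tilde\theta; W) = 1 + (p-2)^2\big\{ E(\chi^{-4}_{p+2}(\Delta)) + [\Delta^*/\mathrm{Tr}(WB_{\theta_0}^{-1})]E(\chi^{-4}_{p+4}(\Delta)) \big\}$$
$$- 2(p-2)\big\{ E(\chi^{-2}_{p+2}(\Delta)) + [\Delta^*/\mathrm{Tr}(WB_{\theta_0}^{-1})][E(\chi^{-2}_{p+2}(\Delta)) + E(\chi^{-4}_{p+4}(\Delta))] \big\}, \qquad (4.4)$$
where, by virtue of (3.26), $\Delta^*/\mathrm{Tr}(WB_{\theta_0}^{-1}) \le \Delta$, for every $\gamma^* \in E^r$. This familiar expression arises in connection with the James-Stein rule estimator of the multivariate normal mean vector, and hence, we may conclude that the following holds:
(i) Under $H_0: \theta_0 \in \omega$, i.e., $\gamma^* = 0$, $\Delta = \Delta^* = 0$, so that (4.4) reduces to $2/p$ ($\le 1$), for every $p \ge 2$. The larger the value of $p$, the smaller is the ratio in (4.4), so that the greater is the reduction of the ADR, under $H_0$.
(ii) The right hand side of (4.4) is never greater than 1; in fact, the upper asymptote 1 is attained as $\Delta$ or $\Delta^*$ goes to $+\infty$. Thus, in the light of the ADR, the shrinkage estimator $\hat\theta_n^{S}$ dominates over the unrestrained MLE $\tilde\theta_n$.
Consider next the case of $\{\hat\theta_n^{S}\}$ vs. $\{\hat\theta_n\}$. By (3.20), (3.25), (4.2) and (4.4), we have then
$$\rho(\hat\theta^{S}; W)/\rho(\hat\theta; W) = \{\rho(\hat\theta^{S}; W)/\rho(\tilde\theta; W)\}\{ 1 - [c_p(W, B_{\theta_0}, P_{\theta_0}) - \Delta^*]/\mathrm{Tr}(WB_{\theta_0}^{-1}) \}^{-1}. \qquad (4.5)$$
Note that for every $\gamma^* \notin E = \{\gamma^*: \gamma^{*\prime}W\gamma^* \le c_p(W, B_{\theta_0}, P_{\theta_0})\}$, $\Delta^* > c_p(W, B_{\theta_0}, P_{\theta_0})$, so that
$$\sup_{\gamma^* \notin E} \frac{\rho(\hat\theta^{S}; W)}{\rho(\hat\theta; W)} < \sup_{\gamma^* \notin E} \frac{\rho(\hat\theta^{S}; W)}{\rho(\tilde\theta; W)} \le 1. \qquad (4.6)$$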
On the other hand, the picture may be quite different for $\gamma^* \in E$ (and hence, under $H_0$ as well). Note that
$$c_p(W, B_{\theta_0}, P_{\theta_0})/\mathrm{Tr}(WB_{\theta_0}^{-1}) = 1 - \mathrm{Tr}((WB_{\theta_0}^{-1})B_{\theta_0}P_{\theta_0})/\mathrm{Tr}(WB_{\theta_0}^{-1}), \qquad (4.7)$$
where $B_{\theta_0}P_{\theta_0}$ is an idempotent matrix of rank $r - p$. Thus, whenever $W$ is of full rank ($r$), the second term on the right hand side of (4.7) reduces to $(r-p)/r$. On the other hand, $W$ need not be of full rank (as will be explained later on). In that case, if $W$ has the rank $r'$ ($\le r$), $WB_{\theta_0}^{-1}$ will have the same rank $r'$, while the rank of $WP_{\theta_0}$ is bounded from below by $r' + (r-p) - r$ ($= r' - p$) and above by $\min(r', r-p)$. We denote the rank of $WP_{\theta_0}$ by $r^*$, so that $r' - p \le r^* \le \min(r', r-p)$. In this case, some standard matrix manipulations lead us to conclude that (4.7) reduces to $1 - r^*/r'$, so that under $H_0: \gamma^* = 0$, (4.5) reduces to
$$(2/p)(r'/r^*) = 2r'/(pr^*) \gtrless 1 \ \text{according as} \ r^* \lessgtr 2r'/p. \qquad (4.8)$$
Thus, in one extreme case, when $r' = r$ and $r^* = r - p$, in order that (under $H_0$) (4.5) is less than one, we need that $r > (p-2)^{-1}p^2$, $p > 2$; the opposite inequality holds (in (4.5)) when $r < (p-2)^{-1}p^2$. In the other extreme case, when $r' = p$ and $r^* \le 1$, in order that (4.5) is greater than one, we need $p \ge 1$, i.e., in this case, under $H_0$, $\hat\theta_n$ dominates the shrinkage MLE $\hat\theta_n^{S}$. This is, of course, not surprising, as under $r' = p$, for the restricted MLE $\hat\theta_n$, tacitly, $W$ attaches full weight only to the restraints, so that it has no contribution from the remaining $(r-p)$ components, and $\mathrm{Tr}(WP_{\theta_0}) \le p^{-1}\mathrm{Tr}(WB_{\theta_0}^{-1})$. To sum up, we may therefore conclude that unlike the case of the unrestrained MLE, here, the shrinkage MLE may or may not dominate over the restricted MLE (under $H_0$), depending on the ranks $r'$ and $r^*$. To illustrate this point, we consider the following classical example.
Let $\{X_i, i \ge 1\}$ be i.i.d.r.v.'s having the $m$-variate normal distribution with mean vector $\mu$ and dispersion matrix $\Sigma$. First, consider the case of known $\Sigma$, and let $\theta = \mu$ and $h(\theta) = \theta$. Thus, $H_0$ refers to $\mu = 0$. In this case, $p = m$ ($= r$), while $P_{\theta_0}$ is a null matrix, so that $r^* = 0$ and $r' = p$. Thus, (4.7) is equal to 1, and hence, (4.5) is $+\infty$. This is not surprising as $\hat\theta_n = 0$, with probability 1, and hence, under $H_0$, the ADR of the restricted MLE is equal to 0, while the ADR of the shrinkage MLE is a positive number. Consider next the case of an unknown $\Sigma$, and denote by $\sigma$ the $\binom{m+1}{2}$-vector consisting of the elements in the upper triangle of $\Sigma$. Let then $\theta = (\mu', \sigma')' = (\theta_1', \theta_2')'$, and let $h(\theta) = \theta_1$. Then, $p = m$ and $r = m(m+3)/2$. If our sole interest lies in the estimation of $\mu$ (treating $\sigma$ as a nuisance parameter), we may choose $W$ such that
$$W = \begin{pmatrix} W_{11} & 0 \\ 0 & 0 \end{pmatrix},$$
where $W_{11}$ is a p.d. matrix of order $p \times p$. In that case, we have again $r' = p$ and $r^* = 0$, so that (4.5) is equal to $+\infty$ (under $H_0$). On the other hand, if $W$ is chosen to be a p.d. matrix, then, we have $r' = r = p(p+3)/2$ and $r^* = r - p = \binom{p+1}{2}$, so that $p^{-1}2r' = p + 3$ and $r^*$ is $>$ (or $\ge$) $p^{-1}2r'$, for every $p >$ (or $\ge$) 3. Thus, for $m \ge 4$, $\hat\theta_n^{S}$ dominates over $\hat\theta_n$ (under $H_0$ as well as elsewhere), while for $m = 3$, under $H_0$, they have the same ADR, though, elsewhere $\hat\theta_n^{S}$ would dominate over $\hat\theta_n$.
Consider next the case where $\Sigma = \sigma^2 I$, $\sigma$ unknown. Here, we have $r = m + 1 = p + 1$ and $r^* = r - p = 1$. Thus, for $W$ of rank $p + 1$, we have $r' = r = p + 1$, and hence, $2r'/(pr^*) = 2r'/p = 2 + 2/p > 1$, so that under $H_0$, (4.5) is equal to $2(1 + p^{-1}) > 2$. Similarly, if $\Sigma = \sigma^2[(1 - \rho)I + \rho 11']$, where $\sigma$ and $\rho$ are unknown, we have $r = m + 2 = p + 2$ and $r^* = 2$. Thus, for $W$ of rank $r$ ($= p + 2$), we have $2r'/(pr^*) = (p+2)/p = 1 + 2/p > 1$, and hence, under $H_0$, (4.5) exceeds 1. In either of the last two cases, if we take $W$ of rank $p$ (with null elements for the last one or two rows and columns), as before, we have under $H_0$, (4.5) equal to $+\infty$. This explains clearly the dependence of the ADR-efficiency on the choice of $W$, particularly through the rank of $WP_{\theta_0}$.
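The null-hypothesis ratios discussed above reduce to simple arithmetic, which the following Python sketch reproduces with exact fractions. It is a hedged illustration: the function names are hypothetical, the inverse moments $E(\chi^{-2}_{p+2}) = 1/p$ and $E(\chi^{-4}_{p+2}) = 1/\{p(p-2)\}$ are the standard central chi square values (valid for $p > 2$), and the ranks fed into the second function are the ones quoted in the normal example.

```python
import math
from fractions import Fraction

def shrinkage_vs_unrestricted_h0(p):
    """Ratio (4.4) at Delta = Delta* = 0, using central chi-square inverse moments (p > 2)."""
    e_inv_2 = Fraction(1, p)              # E[chi^{-2}_{p+2}(0)] = 1/p
    e_inv_4 = Fraction(1, p * (p - 2))    # E[chi^{-4}_{p+2}(0)] = 1/(p(p-2))
    return 1 + (p - 2) ** 2 * e_inv_4 - 2 * (p - 2) * e_inv_2

def shrinkage_vs_restricted_h0(p, r_prime, r_star):
    """Ratio (4.5) under H_0 as given in (4.8): 2r'/(p r*), infinite when r* = 0."""
    if r_star == 0:
        return math.inf
    return Fraction(2 * r_prime, p * r_star)

# Unknown dispersion matrix, W positive definite: r' = p(p+3)/2 and r* = p(p+1)/2.
ratio_m3 = shrinkage_vs_restricted_h0(3, 9, 6)
ratio_m4 = shrinkage_vs_restricted_h0(4, 14, 10)
# Sigma = sigma^2 I with W of rank p + 1: r' = p + 1 and r* = 1 (here p = 5).
ratio_spherical = shrinkage_vs_restricted_h0(5, 6, 1)
# Known dispersion matrix: r' = p and r* = 0.
ratio_known = shrinkage_vs_restricted_h0(5, 5, 0)
```

A ratio below 1 favours the shrinkage MLE over the restricted MLE under $H_0$, a ratio of exactly 1 is a tie, and an infinite ratio corresponds to a restricted MLE with zero ADR.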
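Let us finally compare the ADR of the shrinkage and the PT MLE's. From (3.25), (3.27), (4.4) and (4.5), we obtain that
$$\frac{\rho(\hat\theta^{S}; W)}{\rho(\hat\theta^{PT}; W)} = \frac{\rho(\hat\theta^{S}; W)}{\rho(\tilde\theta; W)} \Big\{ 1 - \Pi_{p+2}(\chi^2_{p,\alpha}; \Delta) + \Pi_{p+2}(\chi^2_{p,\alpha}; \Delta)\frac{\rho(\hat\theta; W)}{\rho(\tilde\theta; W)} + \frac{\Delta^*}{\mathrm{Tr}(WB_{\theta_0}^{-1})}[\Pi_{p+2}(\chi^2_{p,\alpha}; \Delta) - \Pi_{p+4}(\chi^2_{p,\alpha}; \Delta)] \Big\}^{-1}. \qquad (4.9)$$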
Note that for given $p$, $\alpha$: $0 < \alpha < 1$ and $\Delta$ ($\ge 0$), $\Pi_q(\chi^2_{p,\alpha}; \Delta)$ is nonincreasing in $q$ ($\ge 1$), and for given $p$, $q$ and $\alpha$, it is nonincreasing in $\Delta$. Thus, for every $\alpha \in (0, 1)$ and $\Delta$ ($\ge 0$), $\Pi_{p+2}(\chi^2_{p,\alpha}; \Delta) - \Pi_{p+4}(\chi^2_{p,\alpha}; \Delta)$ is nonnegative; by using (3.26), it also follows that $(\Delta^*/\mathrm{Tr}(WB_{\theta_0}^{-1}))[\Pi_{p+2}(\chi^2_{p,\alpha}; \Delta) - \Pi_{p+4}(\chi^2_{p,\alpha}; \Delta)]$ is equal to 0 at $\Delta = 0$ and it converges to 0 as $\Delta \to +\infty$. In fact, this is a bounded nonnegative function, where the (finite) upper bound depends on $p$, $\alpha$, $W$ and $B_{\theta_0}$. Hence, defining the ellipsoid $E$ as in before (4.6), we obtain from (4.4), (4.7) and (4.9) that
$$\rho(\hat\theta^{S}; W)/\rho(\hat\theta^{PT}; W) \le 1, \quad \text{for every } \gamma^* \notin E; \qquad (4.10)$$
actually, the left hand side is usually close to 1 and it converges to 1 as $\Delta \to +\infty$.
Thus, the dominance of the shrinkage MLE over the PTMLE (in the light of the ADR) in the domain $\{\gamma^*: \gamma^* \notin E\}$ follows from (4.10). Using the results in (4.7) and (4.8), we also claim that under $H_0$, (4.9) reduces to
$$\frac{2}{p}\{ 1 - \Pi_{p+2}(\chi^2_{p,\alpha}; 0) + \Pi_{p+2}(\chi^2_{p,\alpha}; 0)(r^*/r') \}^{-1}, \qquad (4.11)$$
where $r'$ and $r^*$ are defined as in before (4.8). Consequently, under $H_0$ (and for $p \ge 2$), (4.9) is $\gtrless 1$ according as
$$r^*/r' \ \text{is} \ \lessgtr 1 - (p-2)/\{p\,\Pi_{p+2}(\chi^2_{p,\alpha}; 0)\}, \qquad (4.12)$$
where $\Pi_{p+2}(\chi^2_{p,\alpha}; 0) < \Pi_p(\chi^2_{p,\alpha}; 0) = 1 - \alpha$. Since $r^* \le r'$, for $p = 2$, (4.9) is always greater than or equal to 1. On the other hand, for $p \ge 3$, the picture depends on the rank of $W$ and other factors. When $W$ is of full rank ($r$), we have, as in before, $r' = r$ and $r^* = r - p$, so that (4.12) reduces to
$$\Pi_{p+2}(\chi^2_{p,\alpha}; 0) \gtrless r(p-2)/p^2. \qquad (4.13)$$
In particular, if $r = p$ (as was the case discussed for the normal mean vector with known $\Sigma$), (4.9) exceeds one whenever $\Pi_{p+2}(\chi^2_{p,\alpha}; 0) > (p-2)/p$. This is the case for $\alpha = 0.10$ when $p \le 11$ and for $\alpha = 0.05$ when $p \le 21$. Thus, unless $p$ is very large (and/or $\alpha$ is very small), under $H_0$, for the case of $r = p$, the PTMLE dominates over the shrinkage MLE.
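The comparison of $\Pi_{p+2}(\chi^2_{p,\alpha}; 0)$ with $(p-2)/p$ only needs the central chi square d.f. and its upper percentage points, so it can be explored without tables. The pure-Python sketch below is one way to do this under stated assumptions: the d.f. is evaluated through the power series of the regularized incomplete gamma function and the percentage point through bisection, and all function names are hypothetical. No particular cut-off values of $p$ are asserted here.

```python
import math

def chi2_cdf(x, df):
    """Central chi-square d.f. via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-17 * total and n < 100000:
        n += 1
        term *= z / (a + n)
        total += term
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    return min(1.0, math.exp(log_prefactor) * total)

def chi2_upper_point(alpha, df):
    """Upper 100*alpha% point of the central chi-square d.f., found by bisection."""
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ptmle_beats_shrinkage_under_h0(p, alpha):
    """Condition for (4.9) to exceed one when r = p: Pi_{p+2}(chi^2_{p,alpha}; 0) > (p-2)/p."""
    critical_value = chi2_upper_point(alpha, p)
    return chi2_cdf(critical_value, p + 2) > (p - 2) / p

small_p_case = ptmle_beats_shrinkage_under_h0(3, 0.05)
large_p_case = ptmle_beats_shrinkage_under_h0(50, 0.05)
```

Looping `ptmle_beats_shrinkage_under_h0` over a range of $p$ for a fixed $\alpha$ gives a direct way to see where the inequality switches direction.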
Let us next go back to (4.13) when $r > p$; there, under $H_0$, (4.9) will be less than 1 when $\Pi_{p+2}(\chi^2_{p,\alpha}; 0) < r(p-2)/p^2$. Since $\Pi_{p+2}(\chi^2_{p,\alpha}; 0) < \Pi_p(\chi^2_{p,\alpha}; 0) = 1 - \alpha$, a sufficient condition for (4.9) to be less than 1 (under $H_0$) is that $1 - \alpha \le r(p-2)/p^2$, and this condition is easy to verify when $r$ is large compared to $p$. It also shows that choosing a large $\alpha$ (the significance level) for the preliminary test may lead to an increased ADR for the PTMLE. Let us consider next the case where $W$ is not of full rank (as was discussed in the normal d.f. example earlier). In this case, we may have an extreme situation where $r' = p$ and $r^*$ is either 0 or 1. For $r^* = 0$, by virtue of (4.12), in order that under $H_0$, (4.9) is greater than one, we need to have $\Pi_{p+2}(\chi^2_{p,\alpha}; 0) > (p-2)/p$, and this agrees with (4.13) when $r = p$. Thus, for $\alpha = 0.10$, for $p \le 11$ and for $\alpha = 0.05$, for $p \le 21$, (4.9) is greater than 1, indicating the dominance of the PTMLE (over the shrinkage MLE) under $H_0$. For $r' = p$ and $r^* = 1$, in order that under $H_0$, (4.9) exceeds one, we need to have
$$\Pi_{p+2}(\chi^2_{p,\alpha}; 0) > (p-2)/(p-1) = 1 - (p-1)^{-1}.$$
Again, by reference to the chi square distributional tables, we conclude that for $\alpha = 0.05$, for all $p \le 9$ (and for all $p \le 5$ for $\alpha = 0.10$), (4.9) exceeds 1 (under $H_0$). In such a case, depending on $p$ and $\alpha$, the shrinkage MLE may dominate over the PTMLE (under $H_0$ and hence, elsewhere too), though this feature is not generally true.
The general conclusion from the above discussion is that the shrinkage MLE may not dominate over the PTMLE, though it does so outside a small neighbourhood of the parameter space ($\omega$) under $H_0$. Moreover, over the parameter space where the shrinkage estimator dominates, the PTMLE and the shrinkage MLE both have very close ADR, both converging to a common limit [$\mathrm{Tr}(WB_{\theta_0}^{-1})$] as $\gamma^*$ moves away from the origin (i.e., $\Delta$ or $\Delta^*$ converges to $+\infty$); however, in this case, the ADR of the shrinkage MLE always lies below the asymptote, while that of the PTMLE may be slightly above the asymptote for some intermediate values of $\gamma^*$. Further, the PTMLE does not need that $p \ge 3$ (as needed by the shrinkage MLE), and hence, for smaller values of $p$ (say, less than 6), the PTMLE may have an edge over the shrinkage MLE. On the other hand, as $p$ becomes larger, the shrinkage MLE seems to have a more dominating picture.
We conclude this section with some remarks on a general form of the shrinkage MLE. We may rewrite (2.7) as
$$\hat\theta_n^{S} = \hat\theta_n + \{1 - (p-2)/\mathcal{L}_n\}(\tilde\theta_n - \hat\theta_n). \qquad (4.14)$$
In line with the general shrinkage estimator of the normal mean vector with unknown covariance matrix, we may replace the scalar factor in the second term on the right hand side of (4.14) by an appropriate matrix, and consider a general shrinkage estimator as
$$\theta_n^* = \hat\theta_n + [I - c_n d_n \mathcal{L}_n^{-1} W^{-1} S_n^{-1}](\tilde\theta_n - \hat\theta_n), \qquad (4.15)$$
where $\{c_n\}$ is a sequence of positive numbers converging to a positive $c$ (as $n \to \infty$) with $c \in (0, 2(p-2))$,
$$d_n = \text{smallest characteristic root of } WS_n, \qquad (4.16)$$
and $S_n$ is a BAN estimator of $B_{\theta_0}^{-1} - P_{\theta_0}$. Ideally, $c$ may be taken as $p - 2$. In (4.2), we have tacitly assumed that $S_nW$ is of full rank, in probability. If $WS_n$ is not of full rank, we may work with a generalized inverse, and in that case, for $d_n$ in (4.16), we take the smallest positive root. Replacing $S_n$ by $B_{\theta_0}^{-1} - P_{\theta_0}$ in (4.16), the corresponding quantity is denoted by $\delta$. Then,
$$S_n \to B_{\theta_0}^{-1} - P_{\theta_0} \ \text{in probability, and} \ d_n \to \delta \ \text{in probability, as } n \to \infty, \qquad (4.17)$$
so that if we define
$$\theta_n^{**} = \hat\theta_n + [I - c\delta \mathcal{L}_n^{-1} W^{-1}(B_{\theta_0}^{-1} - P_{\theta_0})^{-1}](\tilde\theta_n - \hat\theta_n), \qquad (4.18)$$
then, it follows by a direct use of the Slutzky theorem that under $\{K_n\}$,
$$n^{1/2}\|\theta_n^* - \theta_n^{**}\| \to 0 \ \text{in probability, as } n \to \infty, \qquad (4.19)$$
so that both the estimators in (4.15) and (4.18) would have the same ADR. On the other hand, for (4.18), we may use (3.10)-(3.13) and derive that
under $\{K_n\}$ and the assumed regularity conditions,
$$n^{1/2}(\theta_n^* - \theta_0) = (P_{\theta_0}\Lambda_n^0 + \gamma^*) + [I - c\delta \mathcal{L}_n^{-1} W^{-1}(D^*)^{-1}B_{\theta_0}](B_{\theta_0}^{-1}D^*\Lambda_n^0 - \gamma^*) + o_p(1)$$
$$= (D')^{-1}\Big\{ U_n - c\delta (W_0^{-1}) \frac{(D_0U_n - D'\gamma^*)}{(D_0U_n - D'\gamma^*)'(D_0U_n - D'\gamma^*)} \Big\} + o_p(1), \qquad (4.20)$$
where we adopt the same notations as in (3.13)-(3.15), and assume that $W$ is of full rank (note that $D_0$ and $D^*$ are idempotent matrices, so that their (generalized) inverses are the same matrices, and this adjustment is necessary when $D^*$ or $D_0$ is not of full rank). We define $W_0$ as in after (3.20), and proceed as in (3.21) through (3.24), and obtain that
$$\rho(\theta^*; W) = \mathrm{Tr}(WB_{\theta_0}^{-1}) - 2c\delta\big\{ E(\chi^{-2}_{p+2}(\Delta))\mathrm{Tr}(D_0) + (\gamma^{*\prime}D^*B_{\theta_0}\gamma^*)[E(\chi^{-2}_{p+2}(\Delta)) + E(\chi^{-4}_{p+4}(\Delta))] \big\}$$
$$+ c^2\delta^2\big\{ E(\chi^{-4}_{p+2}(\Delta))\mathrm{Tr}(W_0^{-1}D_0) + E(\chi^{-4}_{p+4}(\Delta))(\gamma^{*\prime}D^*B_{\theta_0}W^{-1}B_{\theta_0}D^{*\prime}\gamma^*) \big\}. \qquad (4.21)$$
This expression is quite similar to that of (3.25), and parallel to (4.4), it can be shown that $\theta_n^*$ dominates over the unrestricted MLE $\tilde\theta_n$ in the light of the ADR. The discussions following (4.7) (through (4.9)) also pertain to the case of $\rho(\theta^*; W)/\rho(\hat\theta; W)$ (under $H_0$), and this shows that $\theta_n^*$ may not dominate over the restricted MLE when $H_0$ holds. The same conclusion holds for the PTMLE. However, the relative picture of (3.25) and (4.21) is a lot more complicated and depends heavily on all the matrices appearing in the two expressions. In particular, if we let $W = B_{\theta_0}$, then we have $\mathrm{Tr}(WB_{\theta_0}^{-1}) = r$, $\delta$ = smallest positive characteristic root of $D^*$ ($= 1$), $\mathrm{Tr}(D_0)$ = rank of $D_0$ = $p$, $\mathrm{Tr}(W_0^{-1}D_0)$ = rank of $D^*$ = $p$, and $\gamma^{*\prime}D^*B_{\theta_0}W^{-1}B_{\theta_0}D^{*\prime}\gamma^* = \gamma^{*\prime}D^*B_{\theta_0}\gamma^* = \gamma^{*\prime}B_{\theta_0}\gamma^* - \gamma^{*\prime}B_{\theta_0}P_{\theta_0}B_{\theta_0}\gamma^* \le \gamma^{*\prime}B_{\theta_0}\gamma^* = \Delta$, so that the two expressions become very much comparable.
REFERENCES
CRAMÉR, H. (1946). Mathematical Methods of Statistics. Princeton Univ. Press, New Jersey.
HUBER, P.J. (1967). The behavior of maximum likelihood estimators under nonstandard conditions. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1, 221-234.
IBRAGIMOV, I.A. AND HAS'MINSKII, R.Z. (1981). Statistical Estimation: Asymptotic Theory. Springer-Verlag, New York.
INAGAKI, N. (1973). Asymptotic relations between the likelihood estimating function and the maximum likelihood estimator. Ann. Inst. Statist. Math. 25, 1-26.
JUDGE, G.G. AND BOCK, M.E. (1978). The Statistical Implications of Pre-test and Stein-Rule Estimators in Econometrics. North-Holland, Amsterdam.
SALEH, A.K.Md.E. AND SEN, P.K. (1978). Nonparametric estimation of location parameter after a preliminary test on regression. Ann. Statist. 6, 154-168.
SALEH, A.K.Md.E. AND SEN, P.K. (1984). Least squares and rank order preliminary test estimation in general multivariate linear models. Proc. Indian Statist. Inst., Golden Jub. Confer. Statist. Appl. New Direc. (ed. J.K. Ghosh), 237-253.
SEN, P.K. (1979). Asymptotic properties of maximum likelihood estimators based on conditional specifications. Ann. Statist. 7, 1019-1033.
SEN, P.K. (1984). A James-Stein detour of U-statistics. Commun. Statist. Theor. Meth. 13, 2725-2747.
SEN, P.K. AND SALEH, A.K.Md.E. (1979). Nonparametric estimation of location parameter after a preliminary test on regression in the multivariate case. Jour. Multivar. Anal. 9, 322-331.
SEN, P.K. AND SALEH, A.K.Md.E. (1985). On some shrinkage estimators of multivariate location. Ann. Statist. 13, in press.