Sen, P.K. (1978). "Asymptotic Properties of Maximum Likelihood Estimators Based on Conditional Specification."

BIOMATHEMATICS TRAINING PROGRAM
ASYMPTOTIC PROPERTIES OF MAXIMUM LIKELIHOOD
ESTIMATORS BASED ON CONDITIONAL SPECIFICATION
by
P. K. Sen
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1158
February 1978
ASYMPTOTIC PROPERTIES OF MAXIMUM
LIKELIHOOD ESTIMATORS BASED ON CONDITIONAL SPECIFICATION*
Pranab Kumar Sen
University of North Carolina, Chapel Hill
ABSTRACT
Along with the asymptotic distribution, expressions for the asymptotic
bias and asymptotic dispersion matrix of the preliminary test maximum likelihood estimator for a general multi-sample parametric model (when the null
hypothesis relating to the restraints on the parameters may not hold) are
derived and compared with the parallel expressions for the unrestricted and
restricted maximum likelihood estimators.
This study reveals the robustness
property of the preliminary test estimator when the assumed restraints may
not hold.
AMS 1970 Classification Nos.: 62A10, 62F99

Key Words & Phrases: Asymptotic bias, asymptotic dispersion matrix, asymptotic relative efficiency, conditional specification, maximum likelihood (restricted and unrestricted) estimators, preliminary test estimators, restraints on parameters.
*Work supported partially by the National Institutes of Health, Contract No. NIH-NHLBI-71-2243 from the National Heart, Lung and Blood Institute.
1. INTRODUCTION

In a parametric model, assuming that the underlying distributions are of specified forms and the parameter (vector) $\theta$ belongs to a suitable parameter space $\Omega$, the (unrestricted) maximum likelihood estimator (MLE) $\hat{\theta}_n$ of $\theta$ is obtained by maximizing (over $\Omega$) the likelihood function of the sample observations. Under appropriate regularity conditions, $\hat{\theta}_n$ is an (asymptotically) optimal estimator of $\theta$. In certain problems, a proper subspace $\omega$ of $\Omega$ can be identified from extraneous considerations, and a restricted MLE $\tilde{\theta}_n$ can be derived by maximizing the likelihood function subject to the restraint that $\theta \in \omega$. When actually $\theta \in \omega$, $\tilde{\theta}_n$ is (asymptotically) a better estimator than $\hat{\theta}_n$. But, if contrary to this assumed restraint, $\theta \notin \omega$, then $\tilde{\theta}_n$ may not only lose its optimality but also may be a biased (or even an inconsistent) estimator. This lack of validity-robustness of $\tilde{\theta}_n$ may be of some concern in a class of problems arising in applied statistics, where $\omega\ (\subset \Omega)$ can be suggested from certain practical considerations, but there may not be sufficient a priori evidence of $\theta \in \omega$ to warrant the use of $\tilde{\theta}_n$ without any reservation. In such a case, a compromise between $\hat{\theta}_n$ and $\tilde{\theta}_n$ based on a conditional specification appears to be appealing: a preliminary (likelihood-ratio) test for $H_0: \theta \in \omega$ is made, and the preliminary test (PT) estimator $\theta_n^*$ is then taken to be $\tilde{\theta}_n$ or $\hat{\theta}_n$ according as $H_0: \theta \in \omega$ is tenable or not.

For a variety of specific problems, mostly relating to univariate and multivariate normal distributions, various workers have considered various PT estimators; we may refer to Kitagawa (1963) and a recent bibliography by Bancroft and Han (1977). The object of the present investigation is to study the asymptotic properties of the PTMLE $\theta_n^*$ under the classical regularity conditions pertaining to the asymptotic theory of $\hat{\theta}_n$ or $\tilde{\theta}_n$ [viz., Aitchison and Silvey (1958)]. As is usually the case with PT estimators, $\theta_n^*$ is not generally (asymptotically) optimal or unbiased when $\theta \in \omega$; nevertheless, it has good asymptotic properties when $\theta \notin \omega$. Indeed, for $\theta$ close to the boundary of $\omega$, $\theta_n^*$ may perform better than either of $\hat{\theta}_n$ and $\tilde{\theta}_n$. Expressions for the asymptotic bias and dispersion matrix have been derived for each of $\hat{\theta}_n$, $\tilde{\theta}_n$ and $\theta_n^*$ and incorporated in the study of the comparative performances of these estimators.

Along with the preliminary notions, these estimators are introduced in Section 2. Section 3 deals with their asymptotic distributions when $H_0: \theta \in \omega$ holds. Parallel results for the non-null case are presented in Section 4. The last section is devoted to the asymptotic comparisons of these estimators.
2. PRELIMINARY NOTIONS

Since in a PTE problem, typically, a multi-sample situation may be involved, we conceive of $k\ (\geq 1)$ independent samples and let $X_{i1}, \dots, X_{in_i}$ be $n_i$ independent and identically distributed random vectors (i.i.d.r.v.) with a distribution function (df) $F_i(x, \theta)$, for $i = 1, \dots, k$, where $x$ ranges over $E^p$, the $p\ (\geq 1)$-dimensional Euclidean space, and $\theta = (\theta_1, \dots, \theta_t)'$ belongs to a parameter space $\Omega \subset E^t$. Actually, $F_i$ may not depend on all the parameters $\theta_1, \dots, \theta_t$; rather, each element of $\theta$ is associated with at least one of the df's $F_1, \dots, F_k$. Further, we assume that for each $i\ (= 1, \dots, k)$, $F_i(x, \theta)$ admits a density function $f_i(x, \theta)$ (with respect to some sigma-finite measure $\mu$). Then, the (log-)likelihood function is defined by

(2.1)  $\log L_n(X_n, \theta) = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \log f_i(X_{ij}, \theta)$,

where $n = \sum_{i=1}^{k} n_i$ and $X_n = (X_{11}, \dots, X_{kn_k})$ is the sample point. The true parameter $\theta_0\ (\in \Omega)$ is not known. An unrestricted MLE $\hat{\theta}_n$ is an element of $\Omega$ such that

(2.2)  $\log L_n(X_n, \hat{\theta}_n) = \sup_{\theta \in \Omega} \log L_n(X_n, \theta)$.

Suppose now that $\theta_0$, though unknown, belongs to a subset $\omega$ of $\Omega$, where

(2.3)  $\omega = \{\theta \in \Omega : h(\theta) = 0\}$, $h(\theta) = (h_1(\theta), \dots, h_r(\theta))'$.

Then, a restricted MLE $\tilde{\theta}_n$ is an element of $\omega$ such that

(2.4)  $\log L_n(X_n, \tilde{\theta}_n) = \sup_{\theta \in \omega} \log L_n(X_n, \theta)$.

For testing $H_0: \theta_0 \in \omega$, the classical likelihood ratio statistic is

(2.5)  $\mathcal{L}_n = -2 \log\{\sup_{\theta \in \omega} L_n(X_n, \theta) / \sup_{\theta \in \Omega} L_n(X_n, \theta)\} = -2 \log\{L_n(X_n, \tilde{\theta}_n) / L_n(X_n, \hat{\theta}_n)\}$.

Let $\ell_{n,\alpha}$ be a real number such that

(2.6)  $\alpha \geq P\{\mathcal{L}_n > \ell_{n,\alpha} \mid H_0\}$,

where $\alpha\ (0 < \alpha < 1)$ is the desired level of significance of the test. Then, the likelihood ratio test consists in rejecting $H_0: \theta \in \omega$ when $\mathcal{L}_n > \ell_{n,\alpha}$ and accepting it otherwise. The PTMLE $\theta_n^*$ is defined by

(2.7)  $\theta_n^* = \tilde{\theta}_n$ if $\mathcal{L}_n \leq \ell_{n,\alpha}$;  $\theta_n^* = \hat{\theta}_n$ if $\mathcal{L}_n > \ell_{n,\alpha}$.

Our primary concern is to study the asymptotic properties of $\{\theta_n^*\}$ and compare them with those of $\{\hat{\theta}_n\}$ and $\{\tilde{\theta}_n\}$ when $H_0: \theta \in \omega$ may or may not hold. For our study, we make the following assumptions.

[A1]: $\Omega$ is a convex, compact subspace of $E^t$, and for every $\theta_1 \neq \theta_2$ (both $\in \Omega$),

(2.8)  $f_i(x, \theta_1) \neq f_i(x, \theta_2)$ for at least one $i\ (= 1, \dots, k)$,

at least on a set of measure non-zero.

[A2]: For every $\theta \in \Omega$ and every $i\ (= 1, \dots, k)$, $Z_i(\theta) = \int_{E^p} \log f_i(x, \theta)\, dF_i(x, \theta_0)$ exists. In fact, for the $i$th density, the Kullback-Leibler information is

(2.9)  $I_i(\theta, \theta_0) = \int_{E^p} \log\{f_i(x, \theta_0)/f_i(x, \theta)\}\, dF_i(x, \theta_0) = Z_i(\theta_0) - Z_i(\theta) \geq 0$, for every $\theta \in \Omega$,

with the strict equality only when $f_i(x, \theta) = f_i(x, \theta_0)$ almost everywhere (a.e.).

[A3]: For every $i\ (= 1, \dots, k)$, $\log f_i(x, \theta)$ is (a.e.) thrice differentiable with respect to $\theta$, and there exist functions $G_s(x)$, $s = 1, 2, 3$, such that in some neighbourhood of $\theta_0$,

(2.10)  the $s$th order partial derivatives of $\log f_i(x, \theta)$ are bounded, in absolute value, by $G_s(x)$, $s = 1, 2, 3$,

and

(2.11)  $\int_{E^p} G_s(x)\, dF_i(x, \theta_0) < \infty$, for $i = 1, \dots, k$ and $s = 1, 2, 3$.

[It is possible to eliminate the third order derivative conditions in (2.10)-(2.11) by imposing, instead, a smoothness condition (2.12) on the second order derivatives. It is also possible to avoid both the second and third order derivative conditions in (2.10)-(2.11) by those in Huber (1967) and Inagaki (1973). But these alternative conditions, in turn, require extra conditions on the first and second order moments of the derivatives of $\log f_i(x, \theta)$ in a $\delta$-neighbourhood of $\theta_0$, for small $\delta\ (> 0)$. In the sequel, we shall deal with (2.10)-(2.11) only, though towards the end of Section 3, we shall make certain comments on these alternative conditions.]

[A4]: For every $i\ (= 1, \dots, k)$,

(2.13)  $\int_{E^p} (\partial/\partial\theta_j) f_i(x, \theta)\, d\mu(x) = 0$, $\forall\, j = 1, \dots, t$,

(2.14)  $\int_{E^p} (\partial^2/\partial\theta_j \partial\theta_l) f_i(x, \theta)\, d\mu(x) = 0$, $\forall\, j, l = 1, \dots, t$.

Let us define for each $i\ (= 1, \dots, k)$,

(2.15)  $B_\theta^{(i)} = \big(\big( \int_{E^p} \{(\partial/\partial\theta_j) \log f_i(x, \theta)\}\{(\partial/\partial\theta_l) \log f_i(x, \theta)\}\, f_i(x, \theta)\, d\mu(x) \big)\big)_{j,l = 1, \dots, t}$.

[A5]: $B_\theta^{(1)}, \dots, B_\theta^{(k)}$ are all continuous in $\theta$ in some neighbourhood of $\theta_0$, and

(2.16)  $B_{n,\theta_0}^* = \sum_{i=1}^{k} (n_i/n) B_{\theta_0}^{(i)}$ is positive definite.

[A6]: $h(\theta)$ possesses continuous first and second order derivatives with respect to $\theta$, for $\theta \in \Omega$. Let then

(2.17)  $H_\theta = \big(\big( (\partial/\partial\theta) h(\theta) \big)\big)$ (of order $t \times r$).

[A7]: $H_{\theta_0}$ is of rank $r\ (< t)$.

[A8]: The following matrix (of order $(t+r) \times (t+r)$)

(2.18)  $\begin{pmatrix} B_{n,\theta_0}^* & -H_{\theta_0} \\ -H_{\theta_0}' & 0 \end{pmatrix}$

is of full rank, and we denote by

(2.19)  $\begin{pmatrix} B_{n,\theta_0}^* & -H_{\theta_0} \\ -H_{\theta_0}' & 0 \end{pmatrix}^{-1} = \begin{pmatrix} P_{n,\theta_0}^* & Q_{n,\theta_0}^* \\ Q_{n,\theta_0}^{*\prime} & R_{n,\theta_0}^* \end{pmatrix}$.

Note that $B_{n,\theta_0}^*$ (and hence, $P_{n,\theta_0}^*$, $Q_{n,\theta_0}^*$ and $R_{n,\theta_0}^*$) may depend on $n$ through $n_1, \dots, n_k$. We make the final assumption.

[A9]: $\lim_{n\to\infty} n_i/n = p_i$ exists, with $0 < p_i < 1$, $\forall\, 1 \leq i \leq k$, and $\sum_{i=1}^{k} p_i = 1$.

Under [A9], $B_{n,\theta_0}^*$ converges to

(2.20)  $B_{\theta_0} = \sum_{i=1}^{k} p_i B_{\theta_0}^{(i)}$,

and in (2.19), on replacing $B_{n,\theta_0}^*$ by $B_{\theta_0}$, the corresponding matrices on the right hand side (rhs) are denoted by $P_{\theta_0}$, $Q_{\theta_0}$ and $R_{\theta_0}$, respectively; these do not depend on $n$. Note that, by definition,

(2.21)  $\begin{pmatrix} B_{\theta_0} & -H_{\theta_0} \\ -H_{\theta_0}' & 0 \end{pmatrix} \begin{pmatrix} P_{\theta_0} & Q_{\theta_0} \\ Q_{\theta_0}' & R_{\theta_0} \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}$,

and that $B_{\theta_0}$, $P_{\theta_0}$ and $R_{\theta_0}$ are all symmetric matrices. For later use, we also define the vectors

(2.22)  $\Lambda_n(\theta) = n^{-1/2} (\partial/\partial\theta) \log L_n(X_n, \theta)$, $\Lambda_n^0 = \Lambda_n(\theta_0)$,

and for vectors or matrices use the notations $o_p$ and $O_p$ (or $o$ and $O$) in the sense that these orders apply to the individual elements of them.
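As a concrete illustration of (2.2), (2.4) and (2.7), the following sketch (not part of the original development) specializes the setup to the simplest one-sample normal-mean model with known unit variance, $t = r = 1$ and the restraint $\omega = \{0\}$; all names in it are illustrative:

```python
import numpy as np
from scipy.stats import chi2

def pt_mle(x, alpha=0.05):
    """Preliminary test MLE of a normal mean (unit variance) under the
    restraint omega = {0}: keep the restricted MLE when the likelihood
    ratio test accepts H0, the unrestricted MLE otherwise (cf. (2.7))."""
    n = len(x)
    theta_hat = x.mean()                # unrestricted MLE, cf. (2.2)
    theta_tilde = 0.0                   # restricted MLE, cf. (2.4)
    lr_stat = n * theta_hat ** 2        # -2 log likelihood ratio, cf. (2.5)
    crit = chi2.ppf(1 - alpha, df=1)    # asymptotic critical value, cf. (3.14)
    return theta_tilde if lr_stat <= crit else theta_hat

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.3, scale=1.0, size=50)
estimate = pt_mle(sample)               # either 0.0 or the sample mean
```

The conditional specification is visible in the last line of `pt_mle`: the reported estimator depends on the outcome of the preliminary test.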
3. ASYMPTOTIC DISTRIBUTION THEORY UNDER $H_0: \theta_0 \in \omega$

First, consider the case of the unrestricted MLE $\hat{\theta}_n$. Under the assumptions made in Section 2, $\hat{\theta}_n$ exists (a.e.), it almost surely (a.s.) converges to $\theta_0$ [viz., Silvey (1959)], and further, as $n \to \infty$,

(3.1)  $n^{1/2}(\hat{\theta}_n - \theta_0) = B_{\theta_0}^{-1} \Lambda_n^0 + o_p(1)$.

Also, by a direct application of the multivariate central limit theorem,

(3.2)  $\Lambda_n^0 \to N_t(0, B_{\theta_0})$, in law, as $n \to \infty$.

Consequently, from (3.1) and (3.2), we have

(3.3)  $n^{1/2}(\hat{\theta}_n - \theta_0) \to N_t(0, B_{\theta_0}^{-1})$, in law, as $n \to \infty$.

For the restricted MLE $\tilde{\theta}_n$, consider the equations

(3.4)  $(\partial/\partial\theta) \log L_n(X_n, \theta) + H_\theta \lambda = 0$, $h(\theta) = 0$,

where $\lambda\ (\in E^r)$ is a Lagrangian multiplier vector; the solutions for $\theta$ and $\lambda$ are $\tilde{\theta}_n$ and $\tilde{\lambda}_n$, respectively. From the results in Section 7 of Silvey (1959), we conclude that under $H_0: \theta_0 \in \omega$ and the regularity conditions of Section 2, as $n \to \infty$,

(3.5)  $n^{1/2}(\tilde{\theta}_n - \theta_0) = P_{\theta_0} \Lambda_n^0 + o_p(1)$,

(3.6)  $n^{1/2} \tilde{\lambda}_n = Q_{\theta_0}' \Lambda_n^0 + o_p(1)$,

and further, for the likelihood ratio statistic $\mathcal{L}_n$ in (2.5),

(3.7)  $\mathcal{L}_n = n(\hat{\theta}_n - \tilde{\theta}_n)' B_{\theta_0} (\hat{\theta}_n - \tilde{\theta}_n) + o_p(1)$.

Note that by (2.21),

$B_{\theta_0} P_{\theta_0} - H_{\theta_0} Q_{\theta_0}' = I$,  $B_{\theta_0} Q_{\theta_0} - H_{\theta_0} R_{\theta_0} = 0$,  $H_{\theta_0}' P_{\theta_0} = 0$  and  $H_{\theta_0}' Q_{\theta_0} = -I$,

so that, in particular, $P_{\theta_0} B_{\theta_0} P_{\theta_0} = P_{\theta_0}$. Thus, from (3.1) and (3.5), we have

(3.8)  $n^{1/2}(\hat{\theta}_n - \tilde{\theta}_n) = (B_{\theta_0}^{-1} - P_{\theta_0}) \Lambda_n^0 + o_p(1)$.

Hence, from (3.2), (3.5) and (3.8), we have under $H_0: \theta_0 \in \omega$, as $n \to \infty$,

(3.9)  $n^{1/2}(\tilde{\theta}_n - \theta_0) \to N_t(0, P_{\theta_0})$, in law.

Also, from (3.1), (3.5), (3.6) and (3.7), we have (on using the identities presented before (3.8)) under $H_0: \theta_0 \in \omega$,

(3.10)  $\mathcal{L}_n = -n \tilde{\lambda}_n' R_{\theta_0}^{-1} \tilde{\lambda}_n + o_p(1) = -\Lambda_n^{0\prime} Q_{\theta_0} R_{\theta_0}^{-1} Q_{\theta_0}' \Lambda_n^0 + o_p(1)$,

where it is easy to show (by using (2.21)) that

(3.11)  $-Q_{\theta_0} R_{\theta_0}^{-1} Q_{\theta_0}' = B_{\theta_0}^{-1} - P_{\theta_0}$,

(3.12)  $(B_{\theta_0}^{-1} - P_{\theta_0}) B_{\theta_0} (B_{\theta_0}^{-1} - P_{\theta_0}) = B_{\theta_0}^{-1} - P_{\theta_0}$, of rank $r$.

Hence, by (3.2), (3.10), (3.11), (3.12) and the Cochran theorem on quadratic forms in (asymptotically) normally distributed random vectors, we obtain that under $H_0$,

(3.13)  $\mathcal{L}_n \to \chi_r^2$, in law, as $n \to \infty$.

Let $\chi_{r,\alpha}^2$ be the upper $100\alpha\%$ point of the chi-square df with $r$ degrees of freedom (DF). Then, from (2.6) and (3.13), we have

(3.14)  $\ell_{n,\alpha} \to \chi_{r,\alpha}^2$, as $n \to \infty$.

Let us now consider the case of $\{\theta_n^*\}$. By (2.7), we have for every $y \in E^t$,

(3.15)  $P\{n^{1/2}(\theta_n^* - \theta_0) \leq y\} = P\{n^{1/2}(\tilde{\theta}_n - \theta_0) \leq y,\ \mathcal{L}_n \leq \ell_{n,\alpha}\} + P\{n^{1/2}(\hat{\theta}_n - \theta_0) \leq y,\ \mathcal{L}_n > \ell_{n,\alpha}\}$.

By (3.2), (3.5), (3.6), (3.10) and (3.14), the first term on the rhs of (3.15) is asymptotically equivalent to

(3.16)  $P\{P_{\theta_0} \Lambda_n^0 \leq y,\ -n \tilde{\lambda}_n' R_{\theta_0}^{-1} \tilde{\lambda}_n \leq \chi_{r,\alpha}^2 \mid H_0\}$.

Note that by (2.21), $Q_{\theta_0}' B_{\theta_0} P_{\theta_0} = R_{\theta_0} H_{\theta_0}' P_{\theta_0} = 0$, so that $P_{\theta_0} \Lambda_n^0$ and $n^{1/2} \tilde{\lambda}_n$ are asymptotically independent, and hence, (3.16) reduces (asymptotically) to

(3.17)  $G_t(y; 0, P_{\theta_0})\, P\{\chi_r^2 \leq \chi_{r,\alpha}^2\} = (1 - \alpha)\, G_t(y; 0, P_{\theta_0})$,

where $G_t(y; \delta, \Sigma)$ stands for a $t$-variate multinormal df with mean vector $\delta$ and dispersion matrix $\Sigma$. Let us also denote by

(3.18)  $G_r(z; 0, -R_{\theta_0})$

the limiting df of $n^{1/2} \tilde{\lambda}_n$ [note that, by (2.21), $Q_{\theta_0}' B_{\theta_0} Q_{\theta_0} = -R_{\theta_0}$]. Then, by (3.1), (3.10) and (3.14), the second term on the rhs of (3.15) is asymptotically equivalent to

(3.19)  $P\{B_{\theta_0}^{-1} \Lambda_n^0 \leq y,\ -n \tilde{\lambda}_n' R_{\theta_0}^{-1} \tilde{\lambda}_n > \chi_{r,\alpha}^2\}$,

where by (3.2) and (3.6), under $H_0$, $(B_{\theta_0}^{-1} \Lambda_n^0, n^{1/2} \tilde{\lambda}_n)$ has asymptotically a $(t+r)$-variate normal df with 0 mean and dispersion matrix

(3.20)  $\begin{pmatrix} B_{\theta_0}^{-1} & Q_{\theta_0} \\ Q_{\theta_0}' & -R_{\theta_0} \end{pmatrix}$.

Hence, the conditional df of $n^{1/2}(\hat{\theta}_n - \theta_0)$, given $n^{1/2} \tilde{\lambda}_n = z$, is asymptotically multinormal with mean vector $-Q_{\theta_0} R_{\theta_0}^{-1} z$ and dispersion matrix $B_{\theta_0}^{-1} + Q_{\theta_0} R_{\theta_0}^{-1} Q_{\theta_0}' = P_{\theta_0}$, so that (3.19) converges to

(3.21)  $\int_{\{z:\, -z' R_{\theta_0}^{-1} z > \chi_{r,\alpha}^2\}} G_t(y + Q_{\theta_0} R_{\theta_0}^{-1} z;\, 0, P_{\theta_0})\, dG_r(z; 0, -R_{\theta_0})$.

From (3.15), (3.17) and (3.21), we obtain that under $H_0: \theta_0 \in \omega$,

(3.22)  $\lim_{n\to\infty} P\{n^{1/2}(\theta_n^* - \theta_0) \leq y\} = (1 - \alpha)\, G_t(y; 0, P_{\theta_0}) + \int_{\{z:\, -z' R_{\theta_0}^{-1} z > \chi_{r,\alpha}^2\}} G_t(y + Q_{\theta_0} R_{\theta_0}^{-1} z;\, 0, P_{\theta_0})\, dG_r(z; 0, -R_{\theta_0})$.

Thus, the asymptotic distribution of $\theta_n^*$ is, in general, non-normal. In particular, if $P_{\theta_0} = B_{\theta_0}^{-1}$ (which also implies that $-Q_{\theta_0} R_{\theta_0}^{-1} Q_{\theta_0}' = 0$), (3.22) reduces to

(3.23)  $G_t(y; 0, B_{\theta_0}^{-1})$,

so that all the three estimators have the same limiting normal distribution.

Remark. Our (3.1), (3.5) and (3.6), as adapted from Silvey (1959), rest on the assumptions made in Section 2. As mentioned in Section 2, (2.12) may replace (2.10)-(2.11) for $s = 3$; in such a case, our (3.1), (3.5) and (3.6) would follow from the results of Feder (1968). Also, we may proceed as in Inagaki (1973) and show that (3.1), (3.5) and (3.6) follow, provided the alternative conditions mentioned in Section 2 satisfy appropriate growth conditions. The rest of the formulae in this section remains the same irrespective of the particular approach we choose.
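The convergence in (3.13)-(3.14) can be checked numerically. The sketch below (illustrative only, for the one-sample normal-mean model with the restraint $\theta = 0$, so that $r = 1$ and $\mathcal{L}_n = n\bar{X}_n^2$) verifies that the preliminary test accepts $H_0$ with probability close to $1 - \alpha$:

```python
import numpy as np
from scipy.stats import chi2

# Under H0 (theta_0 = 0), the LR statistic n * xbar^2 is asymptotically
# chi-square with r = 1 DF, so the preliminary test accepts H0 with
# probability approaching 1 - alpha, cf. (3.13)-(3.14).
rng = np.random.default_rng(1)
n, reps, alpha = 200, 20000, 0.05
xbar = rng.normal(0.0, 1.0 / np.sqrt(n), size=reps)  # null sampling dist. of the mean
lr_stat = n * xbar ** 2
accept_rate = np.mean(lr_stat <= chi2.ppf(1 - alpha, df=1))
```

With these settings `accept_rate` settles near $1 - \alpha = 0.95$, which is the coefficient of the first (normal) component in (3.22).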
4. ASYMPTOTIC NON-NULL DISTRIBUTION THEORY

Note that if $H_0: \theta_0 \in \omega$ does not hold (i.e., $h(\theta_0) \neq 0$), then there exists a $\theta^*\ (\in \omega)$ such that

(4.1)  $\sum_{i=1}^{k} p_i (\partial/\partial\theta) Z_i(\theta)\big|_{\theta = \theta^*} + H_{\theta^*} \lambda^* = 0$, $h(\theta^*) = 0$,

where $\lambda^*\ (\in E^r)$ is a Lagrangian multiplier and $\theta^* \neq \theta_0$ (by assumptions [A1] and [A2]). In this case, $\hat{\theta}_n$ in (2.2) stochastically converges to $\theta_0$, while $\tilde{\theta}_n$ in (2.4) converges stochastically to $\theta^*$, and hence, $\mathcal{L}_n$ in (2.5) tends (in probability) to $\infty$ as $n \to \infty$. Consequently, by (2.7) and (3.14),

(4.2)  $\lim_{n\to\infty} P\{\theta_n^* = \tilde{\theta}_n \mid \theta_0 \notin \omega\} \leq \lim_{n\to\infty} P\{\mathcal{L}_n \leq \ell_{n,\alpha} \mid \theta_0 \notin \omega\} = 0$,

and hence, noting that (3.3) does not depend on $H_0: \theta_0 \in \omega$ being true or not, we have from (3.3) and (4.2), for every $y \in E^t$,

(4.3)  $\lim_{n\to\infty} P\{n^{1/2}(\theta_n^* - \theta_0) \leq y\} = G_t(y; 0, B_{\theta_0}^{-1})$.

Thus, for any (fixed) alternative, $n^{1/2}(\theta_n^* - \theta_0)$ and $n^{1/2}(\hat{\theta}_n - \theta_0)$ are asymptotically equivalent in probability and have the same (asymptotic) multinormal distribution. The situation becomes different when $\theta_0$ lies near the boundary of $\omega$. For this study, we conceive of the following sequence $\{K_n\}$ of local alternatives:

(4.4)  $K_n: \theta_{(n)} = \theta_0 + n^{-1/2} \xi$, where $\theta_0 \in \omega$ and $\xi$ is a (fixed) real vector,

and consider the asymptotic distributions of the estimators under $\{K_n\}$.

First, (3.3) holds irrespective of $H_0$ or $\{K_n\}$, and hence

(4.5)  $\lim_{n\to\infty} P\{n^{1/2}(\hat{\theta}_n - \theta_{(n)}) \leq y \mid K_n\} = G_t(y; 0, B_{\theta_0}^{-1})$.

Also, under $K_n$, the solutions in (4.1) depend on $n$ and are denoted by $\theta_{(n)}^*$, $\lambda_{(n)}^*$, respectively; note that $h(\theta_{(n)}^*) = 0$. Hence, under (4.4) and the assumptions of Section 2,

(4.6)  $\gamma^* = \lim_{n\to\infty} n^{1/2}(\theta_{(n)}^* - \theta_{(n)})$  and

(4.7)  $\delta^* = \lim_{n\to\infty} n^{1/2} \lambda_{(n)}^*$

both exist, and from (4.6) and (4.7), we conclude that

(4.8)  $B_{\theta_0} \gamma^* = H_{\theta_0} \delta^*$ and $H_{\theta_0}'(\gamma^* + \xi) = 0$,

(4.9)  so that $\gamma^* = Q_{\theta_0} H_{\theta_0}' \xi = Q_{\theta_0} R_{\theta_0}^{-1} \delta^*$ and $\delta^* = R_{\theta_0} H_{\theta_0}' \xi$.

From (3.4), (4.4), (4.6) through (4.9) and under the assumptions of Section 2, we obtain that under $\{K_n\}$,

(4.10)  $n^{1/2}(\tilde{\theta}_n - \theta_{(n)}) = \gamma^* + P_{\theta_0} \Lambda_n^0 + o_p(1)$,

(4.11)  $n^{1/2} \tilde{\lambda}_n = \delta^* + Q_{\theta_0}' \Lambda_n^0 + o_p(1)$,

where $\tilde{\theta}_n$ is the restricted MLE, $\tilde{\lambda}_n$ is the Lagrangian multiplier in (3.4), and now $\Lambda_n^0 = \Lambda_n(\theta_{(n)})$. Comparing (3.5) and (4.10) and using the same arguments as in (3.8) and (3.9), we obtain that for every $y \in E^t$,

(4.12)  $\lim_{n\to\infty} P\{n^{1/2}(\tilde{\theta}_n - \theta_{(n)}) \leq y \mid K_n\} = G_t(y - \gamma^*; 0, P_{\theta_0})$.

It follows similarly that (3.7) continues to hold under $\{K_n\}$, where by (3.1) and (4.10), we have

(4.13)  $n^{1/2}(\hat{\theta}_n - \tilde{\theta}_n) = -\gamma^* + (B_{\theta_0}^{-1} - P_{\theta_0}) \Lambda_n^0 + o_p(1)$.

Thus, under $\{K_n\}$, as $n \to \infty$,

(4.14)  $\mathcal{L}_n = -n \tilde{\lambda}_n' R_{\theta_0}^{-1} \tilde{\lambda}_n + o_p(1)$.

On noting that, by (2.21), $-R_{\theta_0}$ is positive definite, we conclude from (4.11) and (4.14) that $\mathcal{L}_n$ has asymptotically a non-central chi-square df with $r$ DF and non-centrality parameter

(4.15)  $\Delta^* = -\delta^{*\prime} R_{\theta_0}^{-1} \delta^*$;

we denote this df by $\Pi_r(x; \Delta^*)$. Then, from (2.7) and (3.14),

(4.16)  $P\{n^{1/2}(\theta_n^* - \theta_{(n)}) \leq y \mid K_n\} = P\{n^{1/2}(\tilde{\theta}_n - \theta_{(n)}) \leq y,\ \mathcal{L}_n \leq \ell_{n,\alpha} \mid K_n\} + P\{n^{1/2}(\hat{\theta}_n - \theta_{(n)}) \leq y,\ \mathcal{L}_n > \ell_{n,\alpha} \mid K_n\}$.

Since by (2.21), $P_{\theta_0} B_{\theta_0} Q_{\theta_0} = 0$, we conclude from (4.10) and (4.11) that under $K_n$, $n^{1/2}(\tilde{\theta}_n - \theta_{(n)})$ and $n^{1/2} \tilde{\lambda}_n$ (and hence, by (4.14), $\mathcal{L}_n$) are asymptotically independent, so that by (3.2), (4.10), (4.11), (4.14) and (4.15), the first term on the rhs of (4.16) converges to

(4.17)  $G_t(y - \gamma^*; 0, P_{\theta_0})\, \Pi_r(\chi_{r,\alpha}^2; \Delta^*)$.

Also, let

(4.18)  $E_\alpha = \{z \in E^r: -z' R_{\theta_0}^{-1} z > \chi_{r,\alpha}^2\}$.

Then, by (3.1), (3.2), (4.11) and (4.14) [and the fact that by (2.21), $Q_{\theta_0}' B_{\theta_0} Q_{\theta_0} = -R_{\theta_0}$], we obtain that the second term on the rhs of (4.16) is

(4.19)  $\int_{E_\alpha} G_t(y + Q_{\theta_0} R_{\theta_0}^{-1}(z - \delta^*);\, 0, P_{\theta_0})\, dG_r(z; \delta^*, -R_{\theta_0}) + o(1)$.

From (4.16), (4.17) and (4.19), we arrive at the following.

Theorem 4.1. Under $\{K_n\}$ in (4.4) and the assumptions of Section 2,

(4.20)  $\lim_{n\to\infty} P\{n^{1/2}(\theta_n^* - \theta_{(n)}) \leq y \mid K_n\} = G_t(y - \gamma^*; 0, P_{\theta_0})\, \Pi_r(\chi_{r,\alpha}^2; \Delta^*) + \int_{E_\alpha} G_t(y + Q_{\theta_0} R_{\theta_0}^{-1}(z - \delta^*);\, 0, P_{\theta_0})\, dG_r(z; \delta^*, -R_{\theta_0})$.

Here also, if $P_{\theta_0} = B_{\theta_0}^{-1}$, (4.20) reduces to

(4.21)  $G_t(y - \gamma^*; 0, P_{\theta_0})\, \Pi_r(\chi_{r,\alpha}^2; \Delta^*) + [1 - \Pi_r(\chi_{r,\alpha}^2; \Delta^*)]\, G_t(y; 0, B_{\theta_0}^{-1})$

(that is, a mixture of two multinormal df's); but, in general, it is non-normal. For later use, we denote the probability density functions (pdf) corresponding to $G_t$ and the df in (4.20) by $g_t$ and $g_t^*$, respectively, for every $y \in E^t$.
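The non-central chi-square df $\Pi_r(x; \Delta^*)$ of (4.15) governs how often the preliminary test accepts under $\{K_n\}$. A small numerical sketch (illustrative only; it assumes SciPy's `ncx2` parameterization, in which the non-centrality argument corresponds to $\Delta^*$) is:

```python
from scipy.stats import chi2, ncx2

def accept_prob(delta_star, r=1, alpha=0.05):
    """Asymptotic probability that the preliminary test accepts H0 under
    a local alternative with non-centrality delta_star:
    Pi_r(chi2_{r,alpha}; Delta*), cf. (4.15)-(4.17)."""
    crit = chi2.ppf(1 - alpha, df=r)           # upper 100*alpha% point
    return ncx2.cdf(crit, df=r, nc=delta_star)  # non-central chi-square df
```

As $\Delta^* \to \infty$ this probability tends to 0, which is why the PTMLE behaves like the unrestricted MLE far from $\omega$ (cf. (4.2)-(4.3)).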
5. ASYMPTOTIC COMPARISON OF THE ESTIMATORS

Let $\{T_n\}$ be a sequence of estimators of $\theta_0$ such that $n^{1/2}(T_n - \theta_{(n)})$ has a limiting distribution with finite second moments. Then the mean vector and dispersion matrix of this limiting df are taken as the asymptotic bias and asymptotic dispersion matrix (a.d.m.) of $n^{1/2}(T_n - \theta_{(n)})$. Here, we study the asymptotic bias and a.d.m. of each of the three estimators considered in earlier sections and compare them. We confine ourselves to the sequence $\{K_n\}$ of alternative hypotheses in (4.4), so that the null hypothesis case follows by letting $\xi = 0$. It follows from (3.1), (3.2) and (3.3) that, when $\{K_n\}$ holds,

(5.1)  asymptotic bias of $n^{1/2}(\hat{\theta}_n - \theta_{(n)})$: $\beta_1 = 0$,

(5.2)  a.d.m. of $n^{1/2}(\hat{\theta}_n - \theta_{(n)})$: $\nu_1 = B_{\theta_0}^{-1}$.

Similarly, from (3.2) and (4.10), we have, when $\{K_n\}$ holds,

(5.3)  asymptotic bias of $n^{1/2}(\tilde{\theta}_n - \theta_{(n)})$: $\beta_2 = \gamma^*$,

(5.4)  a.d.m. of $n^{1/2}(\tilde{\theta}_n - \theta_{(n)})$: $\nu_2 = \gamma^* \gamma^{*\prime} + P_{\theta_0}$.

At this stage, we note that for a multinormal df $G_p(x; 0, D)$ and $\delta = a' D^{-1} a \geq 0$,

(5.5)  $\int_{\{x:\, (x+a)' D^{-1} (x+a) > c\}} x\, dG_p(x; 0, D) = a\, [\Pi_p(c; \delta) - \Pi_{p+2}(c; \delta)]$,

(5.6)  $\int_{\{x:\, (x+a)' D^{-1} (x+a) > c\}} x x'\, dG_p(x; 0, D) = \{1 - \Pi_{p+2}(c; \delta)\}\, D - a a' \{\Pi_p(c; \delta) - 2\Pi_{p+2}(c; \delta) + \Pi_{p+4}(c; \delta)\}$.

Thus, from (4.20), (4.21) and (5.5), we have, when $\{K_n\}$ holds,

(5.7)  $\beta^* = \gamma^* \Pi_r(\chi_{r,\alpha}^2; \Delta^*) - Q_{\theta_0} R_{\theta_0}^{-1} \delta^* [\Pi_r(\chi_{r,\alpha}^2; \Delta^*) - \Pi_{r+2}(\chi_{r,\alpha}^2; \Delta^*)] = \gamma^* \Pi_{r+2}(\chi_{r,\alpha}^2; \Delta^*)$

(as $\gamma^* = Q_{\theta_0} R_{\theta_0}^{-1} \delta^*$), and, from (4.20), (4.21) and (5.6), we have, when $\{K_n\}$ holds,

(5.8)  $\nu^* = B_{\theta_0}^{-1} + (P_{\theta_0} - B_{\theta_0}^{-1})\, \Pi_{r+2}(\chi_{r,\alpha}^2; \Delta^*) + \gamma^* \gamma^{*\prime} \{2\Pi_{r+2}(\chi_{r,\alpha}^2; \Delta^*) - \Pi_{r+4}(\chi_{r,\alpha}^2; \Delta^*)\}$.

We now proceed to compare the asymptotic bias and a.d.m. of the three estimators. First, consider the null hypothesis case, where $\xi = 0$, so that $\gamma^* = 0$, $\delta^* = 0$ and $\Delta^* = 0$. From (5.1), (5.3) and (5.7), we have

(5.9)  $\beta_1(0) = \beta_2(0) = \beta^*(0) = 0$,

so that all three estimators are asymptotically unbiased. Also, from (5.2), (5.4) and (5.8), we have for $\theta_0 \in \omega$,

(5.10)  $\nu_1(0) = B_{\theta_0}^{-1}$, $\nu_2(0) = P_{\theta_0}$, and $\nu^*(0) = B_{\theta_0}^{-1} - (B_{\theta_0}^{-1} - P_{\theta_0})\, \Pi_{r+2}(\chi_{r,\alpha}^2; 0)$,

where

(5.11)  $0 < \Pi_{r+2}(\chi_{r,\alpha}^2; 0) < \Pi_r(\chi_{r,\alpha}^2; 0) = 1 - \alpha < 1$.

Also, by the identities in (3.8) and prior to it, we note that both $P_{\theta_0} B_{\theta_0}$ and $I - P_{\theta_0} B_{\theta_0}$ are idempotent matrices and $B_{\theta_0}$ is non-singular. Hence, it follows that

(5.12)  $B_{\theta_0}^{-1} - P_{\theta_0}$ is positive semi-definite (p.s.d.).

From (5.10), (5.11) and (5.12), we conclude that

(5.13)  $\nu_2(0) \leq \nu^*(0) \leq \nu_1(0)$, in the sense that the differences are p.s.d.

In the multiparameter case, the relative efficiency may be judged by the generalized variance (D-optimality) or the trace of the covariance matrix (A-optimality) criterion. In view of the fact that by (2.18)-(2.20), $B_{\theta_0}$ is positive definite, the a.d.m.'s $\nu_1(0)$, $\nu_2(0)$ and $\nu^*(0)$ are all well defined, so that there is no difficulty in applying the first criterion; similar results hold for the second criterion too. We define the "asymptotic generalized variance" as the $t$-th root of the determinant of the a.d.m. In this light, the asymptotic relative efficiency (A.R.E.) of $\{\tilde{\theta}_n\}$ with respect to $\{\hat{\theta}_n\}$ when $H_0: \theta_0 \in \omega$ holds is

(5.14)  $e(\tilde{\theta}, \hat{\theta} \mid 0) = \{|\nu_1(0)| / |\nu_2(0)|\}^{1/t} \geq 1$, by (5.13),

where the equality sign holds when $B_{\theta_0}^{-1} = P_{\theta_0}$, or equivalently, $Q_{\theta_0} R_{\theta_0}^{-1} Q_{\theta_0}' = 0$. Similarly, the A.R.E. of $\{\theta_n^*\}$ with respect to $\{\hat{\theta}_n\}$ is

(5.15)  $e(\theta^*, \hat{\theta} \mid 0) = \{|\nu_1(0)| / |\nu^*(0)|\}^{1/t} \geq 1$, by (5.13),

where the equality sign holds when $B_{\theta_0}^{-1} - P_{\theta_0} = 0$. Thus, under $H_0$, $\{\theta_n^*\}$ may not perform as well as $\{\tilde{\theta}_n\}$; nevertheless, it is asymptotically at least as good as $\{\hat{\theta}_n\}$. In this sense, we have the ordered relation

(5.16)  $\{\hat{\theta}_n\} \prec \{\theta_n^*\} \prec \{\tilde{\theta}_n\}$, when $H_0$ holds,

in the sense of increasing asymptotic efficiency.

Let us now consider the general case when $\{K_n\}$ holds. It follows from (5.1), (5.3) and (5.7) that

(5.17)  $\beta_1 = 0$, $\beta_2 = \gamma^*$ and $\beta^* = \gamma^* \Pi_{r+2}(\chi_{r,\alpha}^2; \Delta^*)$,

where

(5.18)  $0 < \Pi_{r+2}(\chi_{r,\alpha}^2; \Delta^*) \leq \Pi_{r+2}(\chi_{r,\alpha}^2; 0) < \Pi_r(\chi_{r,\alpha}^2; 0) = 1 - \alpha < 1$.

Thus, on noting that $\lim_{\Delta^* \to \infty} \Pi_{r+2}(\chi_{r,\alpha}^2; \Delta^*) = 0$, we conclude that whereas $\hat{\theta}_n$ is asymptotically unbiased, $\tilde{\theta}_n$ and $\theta_n^*$ are not so; moreover, the asymptotic bias of $\tilde{\theta}_n$ goes to $\infty$ as $\|\xi\| \to \infty$ ($\Rightarrow \Delta^* \to \infty$), while the asymptotic bias of $\theta_n^*$ goes to 0 as $\Delta^* \to \infty$. Thus, $\theta_n^*$ has smaller asymptotic bias than $\tilde{\theta}_n$ and has an edge over $\tilde{\theta}_n$ with respect to the asymptotic bias.

From (5.2) and (5.4), the A.R.E. of $\{\hat{\theta}_n\}$ with respect to $\{\tilde{\theta}_n\}$ under $\{K_n\}$ is given by

(5.19)  $e(\hat{\theta}, \tilde{\theta} \mid \gamma^*) = \{|\nu_2| / |\nu_1|\}^{1/t} = \{|P_{\theta_0} + \gamma^* \gamma^{*\prime}| / |B_{\theta_0}^{-1}|\}^{1/t} = e(\hat{\theta}, \tilde{\theta} \mid 0)\, \{1 + \gamma^{*\prime} P_{\theta_0}^{-1} \gamma^*\}^{1/t}$,

as $\gamma^* \gamma^{*\prime}$ is of rank 1, so that $|I + P_{\theta_0}^{-1} \gamma^* \gamma^{*\prime}|$, being the product of the characteristic roots, equals $1 + \gamma^{*\prime} P_{\theta_0}^{-1} \gamma^*$. Thus, if we let

(5.20)  $S_1 = \{\gamma^*: 1 + \gamma^{*\prime} P_{\theta_0}^{-1} \gamma^* \geq [e(\hat{\theta}, \tilde{\theta} \mid 0)]^{-t}\}$,

then, from (5.19) and (5.20), we conclude that

(5.21)  $e(\hat{\theta}, \tilde{\theta} \mid \gamma^*) \geq 1$, for every $\gamma^* \in S_1$.

Similarly, it follows from (5.4) and (5.8) that

(5.22)  $e(\theta^*, \tilde{\theta} \mid \gamma^*) = \{|\nu_2| / |\nu^*|\}^{1/t}$, where $\nu^* = B_{\theta_0}^{-1} - (B_{\theta_0}^{-1} - P_{\theta_0})\, a(\Delta^*) + \gamma^* \gamma^{*\prime}\, b(\Delta^*)$,

and

(5.23)  $a(\Delta^*) = \Pi_{r+2}(\chi_{r,\alpha}^2; \Delta^*)$, $b(\Delta^*) = 2\Pi_{r+2}(\chi_{r,\alpha}^2; \Delta^*) - \Pi_{r+4}(\chi_{r,\alpha}^2; \Delta^*)$.

Now $a(\Delta^*) > 0$ and $b(\Delta^*) \geq 0$, $\forall\, \Delta^* \geq 0$, and both converge to 0 as $\Delta^* \to \infty$. Thus, by (5.12), (5.22) and (5.23), we conclude that for $\gamma^*$ close to 0, (5.22) is less than 1, while it exceeds 1 for $\gamma^*$ for which $\Delta^*$ is large. Thus, if we let

(5.25)  $S_2 = \{\gamma^*: \nu_2 - \nu^*\ \text{is p.s.d.}\}$,

then, from (5.22)-(5.25), we conclude that

(5.26)  $e(\theta^*, \tilde{\theta} \mid \gamma^*) \geq 1$, for every $\gamma^* \in S_2$.

Finally, we have from (5.2), (5.8) and (5.23),

(5.27)  $e(\theta^*, \hat{\theta} \mid \gamma^*) = \{|\nu_1| / |\nu^*|\}^{1/t} = \{|I - a(\Delta^*)(I - B_{\theta_0} P_{\theta_0}) + b(\Delta^*)\, B_{\theta_0} \gamma^* \gamma^{*\prime}|\}^{-1/t}$.

Thus, for $\gamma^*$ close to 0, (5.27) exceeds one, while it is $\leq 1$ when $\gamma^*$ is away from 0. Note that both (5.19) and (5.20) go to $\infty$ as $\Delta^* \to \infty$, while (5.27) converges to 1 as $\Delta^* \to \infty$. Thus, for large $\Delta^*$, $\{\theta_n^*\}$ and $\{\hat{\theta}_n\}$ have similar performances (better than $\{\tilde{\theta}_n\}$), while for smaller values of $\Delta^*$, (5.16) holds.

Combining these with (5.9), (5.16) and (5.17), we conclude that the restricted MLE $\tilde{\theta}_n$ is asymptotically optimal when $H_0: \theta_0 \in \omega$ holds, though it is (asymptotically) biased and its a.d.m. becomes larger (in the sense of trace or generalized variance) when $\gamma^*$ moves away from 0, and this makes it less efficient too. On the other hand, the unrestricted MLE $\hat{\theta}_n$ remains asymptotically unbiased for $\theta_0$, irrespective of $H_0: \theta_0 \in \omega$ holding or not, but is usually not optimal when $H_0$ holds. As a compromise, the PTMLE $\theta_n^*$ performs better than $\hat{\theta}_n$ when $\Delta^*$ is small, and has uniformly a lower order of bias than $\tilde{\theta}_n$; for large $\Delta^*$, it performs better than $\tilde{\theta}_n$ and very similarly to $\hat{\theta}_n$. Non-optimality of the preliminary test estimator (under $H_0: \theta_0 \in \omega$) has been studied by Huntsberger (1955). His conclusions do not remain valid when $H_0$ does not hold. Indeed, in our setup, we have observed that the PTMLE $\theta_n^*$ may perform better than either $\hat{\theta}_n$ or $\tilde{\theta}_n$ when $\theta_0$ is near the boundary of $\omega$. Hence, it can be recommended on the grounds of robustness against any deviation from $H_0: \theta_0 \in \omega$.
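The ordering claims of this section can be checked by simulation. The following Monte Carlo sketch (illustrative only, again for the one-sample normal-mean model with restraint $\theta = 0$, so $t = r = 1$) estimates $n \times$ MSE of the three estimators under the local alternatives $\theta = \xi / \sqrt{n}$ of (4.4):

```python
import numpy as np
from scipy.stats import chi2

def scaled_mse(xi, n=400, reps=40000, alpha=0.05, seed=2):
    """Monte Carlo n*MSE of the unrestricted, restricted and PT estimators
    of a normal mean under the local alternative theta = xi / sqrt(n),
    with the restraint omega = {0} (so r = 1)."""
    rng = np.random.default_rng(seed)
    theta = xi / np.sqrt(n)
    xbar = rng.normal(theta, 1.0 / np.sqrt(n), size=reps)  # unrestricted MLE
    crit = chi2.ppf(1 - alpha, df=1)
    pt = np.where(n * xbar ** 2 <= crit, 0.0, xbar)        # PTMLE, cf. (2.7)
    n_mse = lambda est: n * np.mean((est - theta) ** 2)
    return n_mse(xbar), n_mse(np.zeros(reps)), n_mse(pt)

# Near omega the restricted MLE wins and the PTMLE beats the unrestricted
# MLE; far from omega (large xi, hence large Delta*) the restricted MLE
# breaks down while the PTMLE tracks the unrestricted MLE.
m_hat0, m_tilde0, m_pt0 = scaled_mse(0.0)
m_hat8, m_tilde8, m_pt8 = scaled_mse(8.0)
```

This reproduces the qualitative picture of (5.16) at $\xi = 0$ and the reversal for $\Delta^*$ large; the PTMLE's risk stays bounded in both regimes.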
REFERENCES

1. Aitchison, J. and Silvey, S.D. (1958). Maximum likelihood estimation of parameters subject to restraints. Ann. Math. Statist. 29, 813-828.
2. Bancroft, T.A. and Han, C.P. (1977). Inference based on conditional specification: a note and a bibliography. Internat. Statist. Review 45, 117-128.
3. Feder, P.I. (1968). On the distribution of the log likelihood ratio test statistic when the true parameter is near the boundaries of the hypothesis region. Ann. Math. Statist. 39, 2044-2055.
4. Huber, P.J. (1967). The behavior of maximum likelihood estimators under nonstandard conditions. Proc. 5th Berkeley Symp. Math. Statist. Prob. 1, 221-234.
5. Huntsberger, D.V. (1955). A generalization of a preliminary test procedure for pooling data. Ann. Math. Statist. 26, 734-743.
6. Inagaki, N. (1973). Asymptotic relations between the likelihood estimating function and the maximum likelihood estimator. Ann. Inst. Statist. Math. 25, 1-26.
7. Kitagawa, T. (1963). Estimation after preliminary tests of significance. Univ. of Calif. Publ. Statist. 3, 147-186.
8. Silvey, S.D. (1959). The Lagrangian multiplier test. Ann. Math. Statist. 30, 389-407.