
ASYMPTOTIC THEORY OF SEQUENTIAL SHRUNKEN ESTIMATION OF STATISTICAL FUNCTIONALS

Pranab Kumar Sen
Departments of Biostatistics and Statistics
University of North Carolina
Chapel Hill, NC 27599 USA
Based on the sample (empirical) distribution function $F_n$, let $\mathbf{T}_n = (T_1(F_n), \ldots, T_p(F_n))'$ be an estimator of a smooth statistical functional $\mathbf{T} = (T_1(F), \ldots, T_p(F))'$. Incorporating the cost of sampling and a quadratic error loss, the risk of $\mathbf{T}_n$ is taken as $E\|\mathbf{T}_n - \mathbf{T}\|_{\mathbf{W}}^2 + cn$, $c > 0$, where $\mathbf{W}$ is a given positive definite matrix. Employing a jackknifed estimator of $n\,E\|\mathbf{T}_n - \mathbf{T}\|_{\mathbf{W}}^2$, a stopping number $N_c$ can be so formulated that $\mathbf{T}_{N_c}$ has asymptotically (as $c \downarrow 0$) the minimum risk. For $p \ge 3$, $\mathbf{T}_n$ may be dominated by a Stein-rule version $\mathbf{T}_n^S$; an asymptotic treatment of this dominance is presented. The theory is extended to the sequential case of $\mathbf{T}_{N_c}^S$, and some plausible forms of stopping numbers are discussed in this context.
INTRODUCTION
Let $\{\mathbf{X}_i;\ i \ge 1\}$ be a sequence of independent and identically distributed random vectors (i.i.d.r.v.) with a distribution function (d.f.) $F$, defined on $E^q$, for some $q \ge 1$. Consider a transformation $\mathbf{X} \to \mathbf{Y} = (Y^{(1)}, \ldots, Y^{(p)})'$, where $Y^{(j)}$ has a marginal d.f. $G_j$, $1 \le j \le p$, and define a smooth statistical functional

$$\mathbf{T} = \mathbf{T}(F) = (T_1(G_1), \ldots, T_p(G_p))', \qquad p \ge 1. \qquad (1)$$

We are primarily interested in an (asymptotically) optimal estimation of $\mathbf{T}$.
Based on a sample $(\mathbf{X}_1, \ldots, \mathbf{X}_n)$ of size $n\,(\ge 1)$, let

$$F_n(\mathbf{x}) = n^{-1} \sum_{i=1}^{n} I(\mathbf{X}_i \le \mathbf{x}), \qquad \mathbf{x} \in E^q, \qquad (2)$$

be the sample (empirical) d.f., where $I(A)$ stands for the indicator function of the set $A$; the empirical d.f.'s $G_{n1}, \ldots, G_{np}$ are defined in a similar fashion [replacing the $\mathbf{X}_i$ by the $Y_i^{(j)}$, $1 \le j \le p$]. Then, granted some mild regularity conditions, a natural estimator of $\mathbf{T}$ is $\mathbf{T}_n = (T_{n1}, \ldots, T_{np})'$, where

$$T_{nj} = T_j(G_{nj}), \qquad 1 \le j \le p. \qquad (3)$$
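For concreteness, the plug-in construction in (2)-(3) can be illustrated by a small numerical sketch (not part of the original development; the choice of the coordinatewise functional, here an $\alpha$-trimmed mean, is only one example of a smooth functional of the marginal d.f.'s):

```python
import numpy as np

def trimmed_mean(y, alpha=0.1):
    # T_j(G_nj): the alpha-trimmed mean of the j-th marginal empirical d.f.
    y = np.sort(np.asarray(y))
    k = int(np.floor(alpha * len(y)))
    return y[k:len(y) - k].mean()

def plug_in_estimate(Y, functional=trimmed_mean):
    # T_n = (T_1(G_n1), ..., T_p(G_np))': the functional applied to each marginal.
    return np.array([functional(Y[:, j]) for j in range(Y.shape[1])])

rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 3))   # n = 200 observations of a p = 3 vector Y
print(plug_in_estimate(Y))
```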
Consider a quadratic error loss incurred due to estimating $\mathbf{T}$ by $\mathbf{T}_n$ and a cost $c\,(>0)$ per unit of sampling, so that the risk in estimating $\mathbf{T}$ by $\mathbf{T}_n$ is given by

$$\rho_c(\mathbf{T}_n, \mathbf{T}) = cn + E\{(\mathbf{T}_n - \mathbf{T})'\mathbf{W}(\mathbf{T}_n - \mathbf{T})\} = cn + E\|\mathbf{T}_n - \mathbf{T}\|_{\mathbf{W}}^2, \qquad (4)$$

where $\mathbf{W}$ is a given positive definite (p.d.) matrix. Note that (4) depends on $c$, $n$, $\mathbf{W}$ as well as on $F$ through the (joint) distribution of $\mathbf{T}_n$. Thus, (4) is the sum of two nonnegative terms, one $\uparrow$ in $n$ and the other one $\downarrow$ in $n$. It may not be unreasonable to assume that $E\|\mathbf{T}_n - \mathbf{T}\|_{\mathbf{W}}^2$ is $\downarrow$ in $n$, although its precise form may depend on $F$. Hence, we may assume that there exists a positive integer $n_c^0\ (= n^0(c, F, \mathbf{W}))$, such that

$$\rho_c^0(F) = \rho_c(\mathbf{T}_{n_c^0}, \mathbf{T}) = \min_{n \ge n_0} \rho_c(\mathbf{T}_n, \mathbf{T}), \qquad (5)$$

where $n_0$ is the minimum sample size for which $\rho_c$ in (4) exists. Thus, $\rho_c^0(F)$ is the minimum risk and $\mathbf{T}_{n_c^0}$ is the minimum risk estimator (MRE) of $\mathbf{T}$. Note that $n_c^0$, as defined by (5), may generally depend on the unknown $F$, so that for any chosen $n$, $\mathbf{T}_n$ may not be MRE when $F$ is allowed to vary within a class $\mathcal{F}$. For this reason, sequential estimation rules are generally advocated to achieve the MRE property (simultaneously for all $F \in \mathcal{F}$), at least in an asymptotic setup (where $c \downarrow 0$). A systematic account of this sequential estimation problem (in a nonparametric setup) is given in Sen (1981, Ch. 10).
Before extending this theory to the current problem, we present some related developments which have far-reaching effects on this MRE problem. For the multivariate normal mean vector estimation problem, for $p \ge 3$, the inadmissibility of the classical maximum likelihood estimator (MLE) was established by Stein (1956). Later on, James and Stein (1962) constructed an alternative (Stein-rule or shrinkage) estimator which dominates the MLE (in quadratic error loss). In the sequential case, parallel results are due to Ghosh, Nickerson and Sen (1987) and others. In the general nonparametric case, an exact treatment of this Stein phenomenon becomes difficult. Nevertheless, the asymptotic theory follows the same track: see Sen (1984) and Sen and Saleh (1985, 1987), among others. Motivated by this, we are naturally tempted to consider an appropriate Stein-rule version $\mathbf{T}_n^S$ of $\mathbf{T}_n$ and to establish the asymptotic dominance (of $\mathbf{T}_n^S$ over $\mathbf{T}_n$) in some well-defined manner. Our main objective is to formulate, along with $\mathbf{T}_n^S$, its sequential version $\mathbf{T}_{N_c}^S$, where $N_c$ is an appropriate stopping number, and to establish the asymptotic MRE property of $\mathbf{T}_{N_c}^S$ in a systematic manner. These versions of $\mathbf{T}_n$ are introduced in the next section. The main results are then presented in the third section, and their derivations in the following one. Some general discussions are made in the concluding section.
SHRUNKEN ESTIMATION OF $\mathbf{T}$
We assume that for each $j\ (= 1, \ldots, p)$, $T_j(\cdot)$ is Hadamard-continuous at $G_j$, so that

$$T_j(G_j + H) - T_j(G_j) \to 0 \quad \text{as } \|H\| \to 0,\ H \in \mathcal{V}, \qquad (6)$$

where $\mathcal{V}$ is a topological vector space (containing the differences of the relevant d.f.'s). Actually, we assume more: for each $j$, $T_j(\cdot)$ is Hadamard (or compact) differentiable at $G_j$, so that

$$T_j(G_j + H) = T_j(G_j) + \int T_{j1}(G_j; x)\, dH(x) + R_j(H), \qquad (7)$$

where

$$R_j(H) = o(\|H\|), \quad \text{uniformly in } H \in \mathcal{K}, \qquad (8)$$

$\mathcal{K}$ being a class of compact subsets of $\mathcal{V}$; $\|G - F\|$ refers to the usual sup-norm, and $T_{j1}(G_j; \cdot)$ is the compact derivative (or influence function) of $T_j(\cdot)$ at $G_j$, which can be so normalized that

$$\int T_{j1}(G_j; x)\, dG_j(x) = 0, \qquad 1 \le j \le p. \qquad (9)$$
For the shrunken estimators (to be considered here), we need to estimate $E_F(\mathbf{T}_n - \mathbf{T})(\mathbf{T}_n - \mathbf{T})'$. For this purpose, we shall employ the classical jackknifing method, which rests on the construction of pseudovariables, and we formulate them first. For each $j\ (= 1, \ldots, p)$, we denote by

$$G_{(n-1)j}^{(i)}(x) = (n-1)^{-1} \sum_{r=1,\, r \ne i}^{n} I(Y_r^{(j)} \le x), \qquad x \in E,\ i = 1, \ldots, n, \qquad (10)$$

$$T_{(n-1)j}^{(i)} = T_j(G_{(n-1)j}^{(i)}) \quad \text{and} \quad T_{nj,i} = n\,T_{nj} - (n-1)\,T_{(n-1)j}^{(i)}, \qquad i = 1, \ldots, n. \qquad (11)$$

Then the $T_{nj,i}$ are the pseudovariables, and the classical jackknifed estimator $\mathbf{T}_n^* = (T_{n1}^*, \ldots, T_{np}^*)'$ is given by

$$T_{nj}^* = n^{-1} \sum_{i=1}^{n} T_{nj,i}, \qquad j = 1, \ldots, p. \qquad (12)$$

Let $\mathbf{T}_{n,i} = (T_{n1,i}, \ldots, T_{np,i})'$, $1 \le i \le n$, and let

$$\mathbf{V}_n^* = (n-1)^{-1} \sum_{i=1}^{n} (\mathbf{T}_{n,i} - \mathbf{T}_n^*)(\mathbf{T}_{n,i} - \mathbf{T}_n^*)'. \qquad (13)$$

Further, define

$$\boldsymbol{\nu}^{**}(H) = \Bigl(\Bigl(\int\!\!\int T_{j1}(H_j; x)\, T_{\ell 1}(H_\ell; y)\, dH_{j\ell}(x, y)\Bigr)\Bigr)_{j,\ell = 1, \ldots, p}, \qquad (14)$$

where the $H_j$ and $H_{j\ell}$ denote, respectively, the univariate and bivariate marginals of $H$, and set $\boldsymbol{\nu}^* = \boldsymbol{\nu}^{**}(G)$, $G$ being the joint d.f. of $\mathbf{Y}$. We assume that $\boldsymbol{\nu}^*$ is p.d. and has finite elements. Further, we assume that $\boldsymbol{\nu}^{**}(\cdot)$ is Hadamard-continuous at $G$. Then, proceeding as in Sen (1988), it follows that

$$\mathbf{V}_n^* \to \boldsymbol{\nu}^* \quad \text{almost surely (a.s.), as } n \to \infty. \qquad (15)$$
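The construction in (10)-(13) is straightforward to carry out numerically; the following sketch (illustrative only, with a generic coordinatewise functional supplied by the user) computes the pseudovariables, the jackknifed estimator $\mathbf{T}_n^*$ and the matrix $\mathbf{V}_n^*$:

```python
import numpy as np

def jackknife(Y, functional):
    # Pseudovariables T_{nj,i} = n*T_nj - (n-1)*T_j(G^{(i)}_{(n-1)j})  [eq. (11)],
    # the jackknifed estimator T*_n  [eq. (12)], and the matrix V*_n  [eq. (13)],
    # which estimates the asymptotic covariance of sqrt(n)(T_n - T).
    n, p = Y.shape
    T_n = np.array([functional(Y[:, j]) for j in range(p)])
    pseudo = np.empty((n, p))
    for i in range(n):
        Y_minus_i = np.delete(Y, i, axis=0)
        T_del = np.array([functional(Y_minus_i[:, j]) for j in range(p)])
        pseudo[i] = n * T_n - (n - 1) * T_del
    T_star = pseudo.mean(axis=0)
    V_star = (pseudo - T_star).T @ (pseudo - T_star) / (n - 1)
    return T_star, V_star

rng = np.random.default_rng(1)
Y = rng.standard_normal((100, 3))
T_star, V_star = jackknife(Y, lambda y: np.mean(np.sort(y)[5:-5]))  # a trimmed mean
print(T_star)
print(np.diag(V_star))
```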
The Stein-rule estimators rest on the choice of a suitable (known) pivot $\mathbf{T}^0$, whose plausibility plays a vital role in their effective dominance; without any loss of generality, we may let $\mathbf{T}^0 = \mathbf{0}$. Let then

$$d_n = \text{smallest characteristic root of } \mathbf{W}\mathbf{V}_n^*, \qquad (16)$$

$$\xi_n = n\,\|\mathbf{T}_n - \mathbf{T}^0\|_{\mathbf{V}_n^{*-1}}^2 = n\,\mathbf{T}_n'\mathbf{V}_n^{*-1}\mathbf{T}_n, \qquad (17)$$

and let $k:\ 0 < k < 2(p-2)$ ($p \ge 3$) be a shrinkage factor. Then we may proceed as in Sen (1986) and consider the following shrinkage version of $\mathbf{T}_n$:

$$\mathbf{T}_n^S = (\mathbf{I} - k\,d_n\,\xi_n^{-1}\,\mathbf{W}^{-1}\mathbf{V}_n^{*-1})\,\mathbf{T}_n. \qquad (18)$$

[If $\mathbf{W}\mathbf{V}_n^*$ is not of full rank, we may use a generalized inverse of $\mathbf{W}\mathbf{V}_n^*$.]
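As a minimal numerical sketch of (16)-(18) (illustrative only; $\mathbf{W}$, $k$, $n$ and the inputs are placeholders supplied by the user, and the pivot is taken as $\mathbf{0}$ as in the text):

```python
import numpy as np

def stein_rule(T_n, V_star, W, n, k):
    # T^S_n = (I - k d_n xi_n^{-1} W^{-1} V*^{-1}) T_n   [eq. (18)], with
    # d_n = smallest characteristic root of W V*_n       [eq. (16)] and
    # xi_n = n T_n' V*^{-1} T_n (pivot T^0 = 0)          [eq. (17)].
    p = len(T_n)
    V_inv = np.linalg.inv(V_star)
    d_n = np.min(np.real(np.linalg.eigvals(W @ V_star)))
    xi_n = n * T_n @ V_inv @ T_n
    return (np.eye(p) - (k * d_n / xi_n) * np.linalg.inv(W) @ V_inv) @ T_n

rng = np.random.default_rng(2)
T_n = rng.normal(0.0, 0.1, size=4)          # an estimate lying near the pivot 0
print(stein_rule(T_n, V_star=np.eye(4), W=np.eye(4), n=100, k=3.0))
```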
Similarly, replacing $\mathbf{T}_n$ in (18) by its jackknifed version $\mathbf{T}_n^*$, we may consider a Stein-rule estimator $\mathbf{T}_n^{*S}$. Our first objective is to study the asymptotic dominance of $\mathbf{T}_n^S$ (or $\mathbf{T}_n^{*S}$) over $\mathbf{T}_n$ (or $\mathbf{T}_n^*$). But, given the picture in (5), we intend to consider their sequential versions. For this purpose, first, we need to introduce suitable stopping numbers. In this context, we may note that [viz., Sen (1988)], under the assumed regularity conditions,

$$n^{1/2}(\mathbf{T}_n - \mathbf{T}) \to \mathcal{N}_p(\mathbf{0}, \boldsymbol{\nu}^*), \quad \text{as } n \to \infty \qquad (19)$$

(and the "convergence in law" may as well be replaced by "convergence in second mean"), so that (at least) for large $n$, (4) looks like

$$\rho_c(\mathbf{T}_n, \mathbf{T}) = cn + n^{-1}\,\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*) + o(n^{-1}). \qquad (20)$$

Thus, if $\boldsymbol{\nu}^*$ were known and $c\,(>0)$ is small, then the optimal sample size $n_c^0$ in (5) [for $\{\mathbf{T}_n\}$] is given by

$$n_c^0 \sim c^{-1/2}\{\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*)\}^{1/2} \quad (\text{as } c \downarrow 0), \qquad (21)$$

and, as a result,

$$\rho_c^0(F) \sim 2\{c\,\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*)\}^{1/2}, \quad \text{as } c \downarrow 0. \qquad (22)$$
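For completeness, (21)-(22) follow by minimizing the leading terms of (20), treating $n$ as continuous:

$$\frac{d}{dn}\bigl\{cn + n^{-1}\,\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*)\bigr\} = c - n^{-2}\,\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*) = 0 \;\Longrightarrow\; n_c^0 = c^{-1/2}\{\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*)\}^{1/2},$$

and substituting this value back gives $c\,n_c^0 + (n_c^0)^{-1}\,\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*) = 2\{c\,\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*)\}^{1/2}$.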
Keeping (15) in mind, we formulate a stopping number $N_c$ by letting

$$N_c = \inf\{n \ge n_0 :\ n^2 \ge [\mathrm{tr}(\mathbf{W}\mathbf{V}_n^*) + n^{-a}]/c\}, \qquad c > 0, \qquad (23)$$

where $n_0\ (\ge p)$ is a positive integer and $a\ (>0)$ is a suitable number; the factor $n^{-a}$ eliminates a too early stopping (for small $c$) and thereby helps us in the manipulations with the asymptotic theory of $N_c$ or $\mathbf{T}_{N_c}$. The stopping number $N_c$ may also be defined in some alternative ways, and we will briefly discuss this at the end.
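A schematic implementation of the rule (23) is sketched below (illustrative only; the sampling mechanism, the coordinatewise functional, and the constants $n_0$, $a$, $c$ are placeholders, and the jackknife step follows (10)-(13)):

```python
import numpy as np

def trimmed_mean(y, alpha=0.1):
    y = np.sort(np.asarray(y))
    k = int(np.floor(alpha * len(y)))
    return y[k:len(y) - k].mean()

def jackknife_cov(Y, functional):
    # Jackknifed estimator T*_n and covariance matrix V*_n of (12)-(13).
    n, p = Y.shape
    T_n = np.array([functional(Y[:, j]) for j in range(p)])
    pseudo = np.array([n * T_n - (n - 1) *
                       np.array([functional(np.delete(Y[:, j], i)) for j in range(p)])
                       for i in range(n)])
    T_star = pseudo.mean(axis=0)
    return T_star, (pseudo - T_star).T @ (pseudo - T_star) / (n - 1)

def sequential_estimate(draw, functional, W, c, n0=10, a=1.0):
    # Stopping rule (23): stop at the first n >= n0 with
    #   n**2 >= [tr(W V*_n) + n**(-a)] / c,
    # and return the stopping number N_c together with T*_{N_c}.
    Y, n = draw(n0), n0
    while True:
        T_star, V_star = jackknife_cov(Y, functional)
        if n * n >= (np.trace(W @ V_star) + n ** (-a)) / c:
            return n, T_star
        Y, n = np.vstack([Y, draw(1)]), n + 1

rng = np.random.default_rng(3)
draw = lambda m: rng.standard_normal((m, 2))      # placeholder sampling mechanism
N_c, T_Nc = sequential_estimate(draw, trimmed_mean, W=np.eye(2), c=1e-3)
print(N_c, T_Nc)
```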
Combining (18) and (23), we consider the following sequential shrunken version of the estimator $\mathbf{T}_n$:

$$\mathbf{T}_{N_c}^S = (\mathbf{I} - k\,d_{N_c}\,\xi_{N_c}^{-1}\,\mathbf{W}^{-1}\mathbf{V}_{N_c}^{*-1})\,\mathbf{T}_{N_c}, \qquad k > 0. \qquad (24)$$

Using the same risk function as in (4), but adapted to the sequential case, we may now define

$$\rho_c^*(F) = c\,E(N_c) + E\|\mathbf{T}_{N_c} - \mathbf{T}\|_{\mathbf{W}}^2, \qquad (25)$$

$$\rho_c^{*S}(F) = c\,E(N_c) + E\|\mathbf{T}_{N_c}^S - \mathbf{T}\|_{\mathbf{W}}^2. \qquad (26)$$
Then, our main interest lies in the comparative study of $\rho_c^0(F)$ in (5) and $\rho_c^*(F)$ and $\rho_c^{*S}(F)$, when $c \downarrow 0$. In this context, we shall find it convenient to make use of the concept of asymptotic distributional risk (ADR). We may recall that the Stein phenomenon is essentially a local one: only in a neighborhood of the pivot does a Stein-rule estimator usually dominate its classical counterpart, and this dominance becomes imperceptible as the true parameter point $\mathbf{T}$ moves away from the pivot (here $\mathbf{0}$). Also, note that by (21), $n_c^0\ (= O(c^{-1/2})) \to \infty$ as $c \downarrow 0$, and hence the effective domain of this dominance of the Stein-rule version shrinks to the pivot as $c \downarrow 0$. The situation is comparable to the case of (sequential) Stein-rule MLE's treated in detail in Sen (1987a). As such, we consider the following sequence $\{K_c\}$ of (local) alternatives:

$$K_c :\ \mathbf{T} = \mathbf{T}(F_c) = c^{1/4}\boldsymbol{\lambda}, \qquad \boldsymbol{\lambda} \in E^p\ (\text{fixed}),\ c \downarrow 0. \qquad (27)$$

In (4), (5), (25) and (26), $F$ is no longer treated as a (fixed) d.f.; rather, $F$ is replaced by a sequence $\{F_c\}$, where $F_c$ satisfies $K_c$, so that the desired expectations are all computed under $\{K_c\}$ in (27). As we shall see later on, under $\{K_c\}$ (containing $H_0 :\ \mathbf{T} = \mathbf{0}$ as a particular case), $c^{-1/4}(\mathbf{T}_{N_c} - \mathbf{T})$ (or $c^{-1/4}(\mathbf{T}_{N_c}^S - \mathbf{T})$) has a limiting (as $c \downarrow 0$) distribution which may be incorporated in the evaluation of the (asymptotic) expectations in (4), (25) and (26). There are two main advantages of this adaptation: (i) use of such an asymptotic distribution of the estimator and the stopping number leads to considerably simpler asymptotic expressions, and (ii) this approach may generally require less stringent regularity conditions than in the usual case. We shall elaborate these two points in the current context in the concluding section. However, we shall adopt this "asymptotic distributional risk" (ADR) approach and present our main results in the next section.
For a suitable sequential estimator $\hat{\mathbf{T}}_{N_c}$ of $\mathbf{T}$, we denote by

$$F^{0*}(\mathbf{x}) = \lim_{c \downarrow 0} P\{c^{-1/4}(\hat{\mathbf{T}}_{N_c} - \mathbf{T}) \le \mathbf{x}\}, \qquad \mathbf{x} \in E^p, \qquad (28)$$

and assume that $F^{0*}$ is nondegenerate with a finite (p.d.) dispersion matrix

$$\boldsymbol{\nu}^{0*} = \int \mathbf{x}\mathbf{x}'\, dF^{0*}(\mathbf{x}). \qquad (29)$$

Then, based on the same risk as in (4) [but incorporating the asymptotic distributional approach], the ADR of $\hat{\mathbf{T}}_{N_c}$ is defined by

$$\bar{\rho}_{c,\lambda}(F) = c\,E(N_c) + c^{1/2}\,\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^{0*}), \qquad (30)$$

where $E$ refers to the expectation with respect to the asymptotic distribution of $N_c$ (and $F^{0*}$). Note that $N_c = n_c^0 + (N_c - n_c^0)$, where $n_c^0$ is defined by (21), so that, whenever $(N_c - n_c^0)/n_c^0 \to 0$ (as we shall see later on), $E(N_c)$ can as well be replaced by $n_c^0 + o(c^{-1/2})$, and hence (30) can be expressed as

$$\bar{\rho}_{c,\lambda}(F) = c^{1/2}\bigl\{\{\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*)\}^{1/2} + \mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^{0*})\bigr\} + o(c^{1/2}), \quad \text{as } c \downarrow 0. \qquad (31)$$

We shall find this expression suitable for our subsequent analysis.
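Explicitly, the step from (30) to (31) is just the substitution of (21):

$$c\,E(N_c) = c\,n_c^0 + o(c^{1/2}) = c^{1/2}\{\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*)\}^{1/2} + o(c^{1/2}),$$

which, added to the term $c^{1/2}\,\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^{0*})$ in (30), yields (31).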
Also, for the jackknifed versions $\mathbf{T}_{N_c}^*$, $\mathbf{T}_{N_c}^{*S}$, etc., we may need a second-order Hadamard differentiability of $\mathbf{T}(\cdot)$ at $G$; this is a one-step extension of (7) and (8) which incorporates a quadratic term in (7) and for which (8) is $o(\|H_j\|^2)$, uniformly in $H_j \in \mathcal{K}$, $1 \le j \le p$. We omit the details of these manipulations here and refer to Sen (1988) for some detailed accounts.
ADR RESULTS: SEQUENTIAL CASE
The ADR versions of (25) and (26) [in the light of (30)] will be denoted by $\bar{\rho}^*_{c,\lambda}(F)$ and $\bar{\rho}^{*S}_{c,\lambda}(F)$, respectively; the parallel measures for $\mathbf{T}_{N_c}^*$ and $\mathbf{T}_{N_c}^{*S}$ are denoted by $\bar{\rho}^{*0}_{c,\lambda}(F)$ and $\bar{\rho}^{*0S}_{c,\lambda}(F)$, respectively. Then, we have the following.

THEOREM 1. If $\mathbf{T}(\cdot)$ is first-order Hadamard differentiable at $G$ and $\boldsymbol{\nu}^{**}(\cdot)$ is Hadamard continuous at $G$, then, in the light of the ADR, $\mathbf{T}_{N_c}$ is asymptotically (as $c \downarrow 0$) MRE, i.e., for $p \ge 1$,

$$\lim_{c \downarrow 0}\{\bar{\rho}^*_{c,\lambda}(F)/\rho_c^0(F)\} = 1. \qquad (32)$$

If, in addition, $\mathbf{T}(\cdot)$ is second-order Hadamard differentiable at $G$, then the same asymptotic MRE property holds for $\mathbf{T}_{N_c}^*$.
THEOREM 2. Suppose that $p \ge 3$ and the hypothesis of Theorem 1 holds. Then

$$\lim_{c \downarrow 0}\{\bar{\rho}^{*S}_{c,\lambda}(F)/\bar{\rho}^*_{c,\lambda}(F)\} \le 1 \quad \text{and} \quad \lim_{c \downarrow 0}\{\bar{\rho}^{*0S}_{c,\lambda}(F)/\bar{\rho}^{*0}_{c,\lambda}(F)\} \le 1, \qquad (33)$$

for all $\boldsymbol{\lambda} \in E^p$, where the strict inequality signs hold for all $\boldsymbol{\lambda}$ close to the pivot ($\mathbf{0}$). Thus, $\mathbf{T}_{N_c}^S$ (or $\mathbf{T}_{N_c}^{*S}$) has the desired dominance property.
Before we proceed to sketch the proofs of Theorems 1 and 2, we present the following remarks:

(i) In (33), the equality sign is attained in the limit as $\|\boldsymbol{\lambda}\|_{\mathbf{W}} \to \infty$. The implication of this result is that for any significant detour from the pivot, the shrinkage effect is asymptotically negligible. Thus, the Stein-rule estimators (in the sequential case too) are advocated only when one has a prior belief that, for the true parameter point $\mathbf{T}$ (though unknown) and the chosen pivot $\mathbf{T}^0$, $\|\mathbf{T} - \mathbf{T}^0\| = O(c^{1/4})$, where the cost per unit of sampling, $c$, is small.

(ii) For both theorems, the ADR measures may be replaced by their asymptotic risk (AR) counterparts [computed from (25) and (26)] under the same asymptotic setup in (27). The conclusions would have been the same. However, this could require more stringent regularity conditions on $\mathbf{T}(\cdot)$. We shall discuss these in the concluding section.

(iii) Often, it is of interest to study the asymptotic distribution theory (viz., normality) of the stopping number $N_c$ [in a standardized form: $(n_c^0)^{-1/2}(N_c - n_c^0)$, as $c \downarrow 0$]. This is also possible under additional regularity conditions, and we shall discuss them in the last section.
(iv) The asymptotic dominance result in Theorem 2 rests on the particular adaptation of $\mathbf{W}$ in (24). If in (24) one uses a different matrix (say, $\mathbf{W}^*$), while in (4) $\mathbf{W}$ is adapted, then (33) may not hold for every $(\mathbf{W}, \mathbf{W}^*)$. In other words, the dominance of the (sequential or fixed-sample-size) Stein-rule estimator over the classical version may depend very much on the chosen $\mathbf{W}$. In practice, $\mathbf{W}$ may not be unique, and hence there may be an issue regarding the robustness aspect of the Stein-rule estimators (with respect to the variation in the chosen $\mathbf{W}$). In the parametric case, $\mathbf{W}$ may be linked to the Fisher information matrix, and a natural analogue of this in the nonparametric case is $\mathbf{W} = (\boldsymbol{\nu}^*)^{-1}$, where $\boldsymbol{\nu}^*$ is the dispersion matrix of the influence functions, defined in (14). In this setup, $\mathbf{W}$ is unknown, so that in the definition of $\mathbf{T}_n^S$ etc., we need to make some adjustments. One simple way is to estimate $\mathbf{W}$ by $\mathbf{V}_n^{*-1}$, where $\mathbf{V}_n^*$ is defined in (13); the justification is provided by (15). In this case, $d_n = 1$ with probability one, and (18) simplifies to

$$\mathbf{T}_n^S = (1 - k\,\xi_n^{-1})\,\mathbf{T}_n, \qquad 0 < k < 2(p-2),\ p \ge 3. \qquad (34)$$
This simplified form corresponds to the classical James-Stein (1962) version. In this case, one may even consider a positive-rule version:

$$\mathbf{T}_n^{S+} = (1 - k\,\xi_n^{-1})^{+}\,\mathbf{T}_n, \qquad (35)$$

where $a^{+} = \max(a, 0)$. A positive-rule version $\mathbf{T}_{N_c}^{S+}$ of $\mathbf{T}_{N_c}^{S}$ may be defined in the same fashion. Also, (34) and (35) extend readily to their jackknifed versions $\mathbf{T}_n^{*S}$ and $\mathbf{T}_n^{*S+}$. The positive-rule versions usually have a smaller ADR than their usual Stein-rule counterparts.
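A minimal sketch of the simplified forms (34)-(35), with $\mathbf{W}$ estimated by $\mathbf{V}_n^{*-1}$ so that $d_n = 1$ (illustrative only; the inputs are placeholders):

```python
import numpy as np

def james_stein(T_n, V_star_inv, n, k):
    # Simplified Stein-rule version (34): (1 - k/xi_n) T_n,
    # with xi_n = n T_n' V*^{-1} T_n and pivot 0.
    xi_n = n * T_n @ V_star_inv @ T_n
    return (1.0 - k / xi_n) * T_n

def positive_rule(T_n, V_star_inv, n, k):
    # Positive-rule version (35): the shrinkage factor is truncated at 0,
    # so the estimator is pulled all the way to the pivot whenever xi_n <= k.
    xi_n = n * T_n @ V_star_inv @ T_n
    return max(1.0 - k / xi_n, 0.0) * T_n

T_n = np.array([0.05, -0.02, 0.04])                 # an estimate very close to the pivot
print(james_stein(T_n, np.eye(3), n=100, k=1.5))    # shrinkage factor is negative here
print(positive_rule(T_n, np.eye(3), n=100, k=1.5))  # truncated at the pivot
```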
(v) Note that in Theorems 1 and 2, the jackknifed covariance matrix $\mathbf{V}_n^*$ rests on the first-order compact differentiability only, while for the jackknifed estimators $\mathbf{T}_{N_c}^*$, $\mathbf{T}_{N_c}^{*S}$ and $\mathbf{T}_{N_c}^{*S+}$, we have invoked the second-order differentiability property of $\mathbf{T}(\cdot)$. This subtle point will be made clear in the next section.
OUTLINE OF PROOFS OF (32) AND (33)
First, defining $n_c^0$ and $N_c$ as in (21) and (23), we show that

$$N_c/n_c^0 \to 1 \quad \text{a.s., as } c \downarrow 0. \qquad (36)$$

Note that, by the definition in (23), for every $n \ge n_0$,

$$\{N_c > n\} \subseteq \{c n^2 < \mathrm{tr}(\mathbf{W}\mathbf{V}_n^*) + n^{-a}\}, \qquad (37)$$

and, letting $n_c^* \sim c^{-(2+a)^{-1}}$, we have (by the $n^{-a}$ term alone)

$$P\{N_c \ge n_c^*\} = 1, \quad \forall\, c > 0, \quad \text{where } n_c^* \to \infty \text{ as } c \downarrow 0. \qquad (38)$$

Let $n_{c,j}^0 = (1 + (-1)^j \varepsilon)\, n_c^0$, $j = 1, 2$, $\varepsilon\,(>0)$ arbitrary. Further, by (21), for $c \downarrow 0$, we may replace $n_c^0$ by $c^{-1/2}\{\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*)\}^{1/2}$, while, by (15) and (38),

$$\max_{m \ge n_c^*} |\mathrm{tr}(\mathbf{W}\mathbf{V}_m^*) - \mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*)| \to 0 \quad \text{a.s., as } c \downarrow 0. \qquad (39)$$

Then, putting $n = n_{c,1}^0$ and noting that, for every $m \le n_{c,1}^0$,

$$c\{m^2 - (n_c^0)^2\} \le c\,(n_c^0)^2[(1-\varepsilon)^2 - 1] = -\varepsilon(2-\varepsilon)\, c\,(n_c^0)^2 \to -\varepsilon(2-\varepsilon)\,\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*) < 0,$$

we obtain from (23), (38) and (39) that

$$N_c \ge n_{c,1}^0 = n_c^0(1 - \varepsilon) \quad \text{a.s., as } c \downarrow 0. \qquad (40)$$

Similarly, for $n = n_{c,2}^0$, $c\{n^2 - (n_c^0)^2\} \to \varepsilon(2+\varepsilon)\,\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*) > 0$, so that, by (23), (37) and (39), we obtain by a few simple steps that

$$N_c \le n_{c,2}^0 = n_c^0(1 + \varepsilon) \quad \text{a.s., as } c \downarrow 0. \qquad (41)$$

Then, (36) follows from (40) and (41).
(41)
It is interesting to note that for
this a.s. convergence property of the stopping number, for {V},
* the first
n
order differentiability of r(e) suffices.
For every
n(~l)
let us denote by
rO = (rO
n, 1' ... ' rO
n,p ). where
"'Il
(42)
Then, under the first order Hadamard differentiability of r(e), we have
[viz., Parr (1985), Sen (1988)]
T
"'Il
where 119n-Q1I
= max{IIGnj -{;jll,
:=0
-
,...
T
= "'Il
T + o«IIG
-{;II),
"'Il ,...
1
~
j
~
pl.
(43)
From the classical results on the
weak convergence of Kolmogorov-Smirnov statistic it follows that under (36).
as c ! 0,
(44)
so that by (43) and (44), as c ! 0
O~
(nc )
Since the
T
j1
[It.
c
-r]
o ~ :=0
TN
= (n)
c
"',
+
0
c
p
(1).
(45)
(e) are square integrable, we have
(46)
and the classical Anscombe condition holds, i.e.,
max
m: Im-n I~c5n
n~lIrO - rOil
"1Il
a as
n -+
GO
(c5
>0
small).
"'Il
From (45), (46) and (47), we directly obtain that as c ! 0
(47)
(48)
(49)
Given (48) and (49), we may evaluate (30) or (31) (for $\mathbf{T}_{N_c}$) by using the limiting d.f. $F^{0*}$ (when $c \downarrow 0$), and this leads to (32). To obtain the parallel result for $\{\mathbf{T}_{N_c}^*\}$, we may note that the magnifying factors ($n$ and $n-1$) in (11) may call for some extra manipulations in the derivations of the desired results. Under the (assumed) second-order differentiability condition on $\mathbf{T}(\cdot)$ [see Theorem 1], we may proceed as in Sen (1988) and write

$$\mathbf{T}_n^* = \mathbf{T}_n + O(n^{-1}) \ \text{a.s. (as } n \to \infty) = \mathbf{T}_n^0 + o(\|\mathbf{G}_n - \mathbf{G}\|) + O(n^{-1}) \ \text{a.s.} \qquad (50)$$

Hence, we may repeat (44) through (49) and obtain (32) (for $\mathbf{T}_{N_c}^*$).
Let us proceed to the proof of Theorem 2. Parallel to (17), we write $\xi_n^* = n\,\|\mathbf{T}_n\|^2_{\boldsymbol{\nu}^{*-1}} = n\,\mathbf{T}_n'\boldsymbol{\nu}^{*-1}\mathbf{T}_n$. Then, by using the Courant theorem (on the ratio of two quadratic forms), we obtain by (15) that

$$\xi_n/\xi_n^* \to 1 \quad \text{a.s., as } n \to \infty. \qquad (51)$$

On the other hand, using (45) and some standard steps, we have

$$|\xi_{N_c} - \xi_{n_c^0}^*| \to 0, \quad \text{in probability, as } c \downarrow 0, \qquad (52)$$

[under $H_0 :\ \mathbf{T} = \mathbf{0}$ as well as $\{K_c\}$ in (27)]. Combining (51) and (52), it follows (by noting that $n_c^0 \sim c^{-1/2}\{\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*)\}^{1/2}$) that, under $\{K_c\}$ or $H_0$, as $c \downarrow 0$,

$$\xi_{N_c} - \xi_{n_c^0}^* \to 0 \quad \text{in probability}, \qquad (53)$$

$$d_{N_c} \to d^0 \quad \text{a.s.}, \qquad (54)$$

where $d^0$ is the smallest characteristic root of $\mathbf{W}\boldsymbol{\nu}^*$. In fact, if we let $\mathbf{Z}_{n_c^0} = (n_c^0)^{1/2}\,\mathbf{T}_{n_c^0}^0$ and note that, under $\{K_c\}$, $\mathbf{Z}_{n_c^0} \to \mathbf{Z} \sim \mathcal{N}_p(\boldsymbol{\gamma}, \boldsymbol{\nu}^*)$, with $\boldsymbol{\gamma} = \{\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*)\}^{1/4}\boldsymbol{\lambda}$, then we have, under $\{K_c\}$,

$$(n_c^0)^{1/2}(\mathbf{T}_{N_c}^S - \mathbf{T}) \to (\mathbf{Z} - \boldsymbol{\gamma}) - k\,d^0\,(\mathbf{Z}'\boldsymbol{\nu}^{*-1}\mathbf{Z})^{-1}\,\mathbf{W}^{-1}\boldsymbol{\nu}^{*-1}\mathbf{Z}, \qquad (55)$$

and, parallel to (49), we have then

$$c^{-1/4}(\mathbf{T}_{N_c}^S - \mathbf{T}) \to \{\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*)\}^{-1/4}\bigl[(\mathbf{Z} - \boldsymbol{\gamma}) - k\,d^0\,(\mathbf{Z}'\boldsymbol{\nu}^{*-1}\mathbf{Z})^{-1}\,\mathbf{W}^{-1}\boldsymbol{\nu}^{*-1}\mathbf{Z}\bigr], \quad \text{as } c \downarrow 0. \qquad (56)$$

Now, (55) permits us to evaluate (29) for the Stein-rule version $\mathbf{T}_{N_c}^S$, and, as such, using (29)-(31) along with the expression (for the ADR of the Stein-rule estimator in the nonsequential case for M-estimators) in Sen and Saleh (1987), we obtain that

$$\bar{\rho}^{*S}_{c,\lambda}(F) = \bar{\rho}^{*}_{c,\lambda}(F) - c^{1/2}\,\ell(\boldsymbol{\lambda}) + o(c^{1/2}), \qquad (57)$$

where $\ell(\cdot)$ is nonnegative for every $k:\ 0 < k < 2(p-2)$, $p \ge 3$ and $\boldsymbol{\lambda} \in E^p$; it is strictly positive at $\boldsymbol{\lambda} = \mathbf{0}$ as well as for $\boldsymbol{\lambda}$ in a neighborhood of $\mathbf{0}$, and it goes to 0 as $\boldsymbol{\lambda}$ moves away from the pivot. Under the second-order compact differentiability of $\mathbf{T}(\cdot)$, using (50), it follows that (55) extends to $\mathbf{T}_{N_c}^{*S}$ as well, and hence (57) also pertains to $\mathbf{T}_{N_c}^{*S}$. This shows that (33) holds.
SOME CONCLUDING REMARKS
We make some remarks here on the rationale for the use of the ADR (instead of the asymptotic risk (AR)) in the sequential case. If we look at the conventional estimator $\mathbf{T}_n$, its AR can be computed under fairly general regularity conditions. Relatively more stringent regularity conditions are needed for the computation of the AR of $\mathbf{T}_{N_c}$. This is primarily due to a "uniform integrability condition" on the $n\|\mathbf{T}_m - \mathbf{T}\|^2_{\mathbf{W}}$ (for $m :\ |m - n| < \delta n$, $n \to \infty$, $\delta \downarrow 0$), as well as on the elements of $\mathbf{V}_m^*$, $|m - n| < \delta n$. The second uniform integrability condition demands some $L_1$-norm approximations for $\mathbf{V}_n^*$. In the particular case of U-statistics (i.e., von Mises functionals), by assuming that the kernel has a finite $r$th (absolute) moment for some $r > 4$, Sen and Ghosh (1981) were able to verify this uniform integrability condition. However, in the current context, the assumed first-order Hadamard differentiability of $\mathbf{T}(\cdot)$ (and the continuity of $\boldsymbol{\nu}^{**}(\cdot)$) may not suffice. If we assume that $\mathbf{T}(\cdot)$ is second-order Hadamard differentiable, then, for the jackknifed covariance matrix $\mathbf{V}_n^*$, the desired uniform (square) integrability can be established along the lines of Sen (1988), under a finite $r$th moment condition ($r \ge 4$) on the terms of the (second-order) expansion. This would also entail the asymptotic normality of the stopping number $N_c$ (i.e., $(n_c^0)^{-1/2}(N_c - n_c^0)$ will have asymptotically (as $c \downarrow 0$) a normal distribution with 0 mean and a finite variance). The situation is far more complicated with the sequential shrunken estimators.
Even in the fixed-sample case, $\mathbf{T}_n^S$ in (18) encounters the same difficulty. Not only would one require uniform square integrability for the elements of $\mathbf{V}_n^{*-1}$, but also that $\xi_n^{-1}$ has a finite expectation. In the standard normal theory model, $\xi_n$ has a (central or noncentral) variance-ratio distribution, and hence, for $p \ge 3$, $E\,\xi_n^{-1}$ exists. On the other hand, under $H_0$ or $\{K_c\}$, $\xi_n$ converges in law to a central or noncentral chi-square variable (with $p$ degrees of freedom). But this "convergence in law" does not ensure the "convergence in negative moments" of $\xi_n$ to that of a chi-square variable. In fact, in some cases, $\xi_n$ may be arbitrarily close to 0 with a positive probability (however small it may be), and this can push the expected value of $\xi_n^{-1}$ far beyond the value that would be obtained by using the appropriate chi-square distribution. This technical problem can be taken care of in some ways. First, if one considers a positive-rule shrinkage estimator [as in (35)], then for $\xi_n \le k$, by forcing $\mathbf{T}_n^{S+}$ to $\mathbf{0}$, one avoids this highly inflated status of $\mathbf{T}_n^S$ near $\xi_n \approx 0$, while for $\xi_n > k$, $\xi_n \to_{\mathcal{D}} \chi^2_{p,\Delta}$ implies that $E\{\xi_n^{-1} I(\xi_n > k)\} \to E\{\chi^{-2}_{p,\Delta} I(\chi^2_{p,\Delta} > k)\}$. Secondly, as in Sen and Saleh (1985), we may allow a small truncation near the origin, and this will enable us to strengthen the convergence in law of $\xi_n$ to $L_1$-norm convergence on the set $\xi_n \ge \eta > 0$. In either case, if $\mathbf{T}(\cdot)$ is assumed to be second-order Hadamard differentiable, so that $\mathbf{V}_n^*$ has the desired uniform integrability property, then the AR results can be obtained for the parallel sequential versions. Whenever these AR results hold, they are in agreement with the corresponding ADR results. Hence, from the interpretational point of view, the ADR results serve the right purpose without unnecessarily calling for these extra regularity conditions or modifications.
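The point about negative moments can be seen in a small simulation of the kind sketched below (illustrative only; here $\xi$ is simply drawn from a distribution that is close in law to a central $\chi^2_p$ but places a tiny amount of mass near 0, as a stand-in for the finite-sample behaviour of $\xi_n$ described above):

```python
import numpy as np

rng = np.random.default_rng(4)
m, p, k = 200_000, 3, 1.5

# Close in law to chi^2_p, but with a small contamination near 0.
xi = rng.chisquare(p, size=m)
near_zero = rng.random(m) < 1e-3
xi[near_zero] = rng.uniform(0.0, 1e-6, size=near_zero.sum())

print("E[1/xi] (contaminated)   :", np.mean(1.0 / xi))                # badly inflated
print("E[1/xi; xi > k]          :", np.mean((1.0 / xi) * (xi > k)))   # stable
print("E[1/chi^2_p] (reference) :", np.mean(1.0 / rng.chisquare(p, size=m)))  # ~ 1/(p-2)
```

The first expectation is driven entirely by the negligible mass near 0, whereas restricting attention to $\{\xi > k\}$, as the positive-rule version does, removes the problem.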
As has been explained in (34)-(35), there are good reasons for considering the James-Stein (1962) versions. In such a case, the stopping rule in (23) may as well be replaced by a nonstochastic integer:

$$N_c^0 \sim (p/c)^{1/2}, \quad \text{as } c \downarrow 0, \qquad (58)$$

so that the sequential rules may all be replaced by their nonsequential counterparts with $n$ given by (58).
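Indeed, with $\mathbf{W} = \boldsymbol{\nu}^{*-1}$, the trace in (21) no longer depends on $F$:

$$\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*) = \mathrm{tr}(\boldsymbol{\nu}^{*-1}\boldsymbol{\nu}^*) = \mathrm{tr}(\mathbf{I}_p) = p, \qquad \text{so that, by (21),}\quad n_c^0 \sim (p/c)^{1/2},$$

which is the nonstochastic rule (58).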
However, even in this special case, a genuine stopping time may arise in a natural way. Suppose that in (20) we replace $\mathbf{T}_n$ by $\mathbf{T}_n^S$ and denote the corresponding covariance matrix by $\boldsymbol{\nu}^S$ (instead of $\boldsymbol{\nu}^*$). If it is possible to derive a suitable estimator $\mathbf{V}_n^S$ of $\boldsymbol{\nu}^S$ (and we choose $\mathbf{W} = \boldsymbol{\nu}^{*-1}$), then, parallel to (23), we would have a stopping number

$$N_c^S = \inf\{n \ge n_0 :\ n^2 \ge [\mathrm{tr}(\mathbf{W}\mathbf{V}_n^S) + n^{-a}]/c\}, \qquad c > 0. \qquad (59)$$

The stochastic convergence of $\{N_c^S\}$ to a suitable sequence $\{n_c^{0S}\}$ would naturally depend on (15) as well as on the convergence properties of $\{\mathbf{V}_n^S\}$, and, granted this, the rest of the results would follow on parallel lines. Since $\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^S) \le \mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*)$, it can be shown that $N_c^S/N_c \le 1$ a.s., as $c \downarrow 0$, and hence the use of this stopping number leads to an asymptotically smaller ASN too. However, we may remark that the difference $\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*) - \mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^S)$ becomes smaller as the true parameter point $\mathbf{T}$ moves away from the pivot. Thus, for any (fixed) $\boldsymbol{\lambda} \ne \mathbf{0}$, as $c \downarrow 0$, $\mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^*) - \mathrm{tr}(\mathbf{W}\boldsymbol{\nu}^S)$ may have asymptotically (as $c \downarrow 0$) a positive limit, and in that case, the stopping time in (59) leads to a real reduction in the ASN (in the asymptotic case where $c \downarrow 0$).
Finally, we make a general comment on the mapping $\mathbf{X} \to \mathbf{Y}$ leading to the definition of the functionals in (1). In many situations we may have the $\mathbf{X}_i$ as $p$-vectors and the $\mathbf{Y}_i$ representing the same vectors, i.e., $\mathbf{Y}_i = \mathbf{X}_i$ (and $p = q$). For example, consider the $p$-variate location model, where the d.f. $F$ stands for a $p$-variate d.f., while $G_1, \ldots, G_p$ are the $p$ univariate marginal d.f.'s and $T_j(G_j)$ is a typical location parameter for the $j$th marginal, $1 \le j \le p$. A very similar case may arise when the $T_j(G_j)$ are suitable scale parameters for the marginal d.f.'s. In either case, one may choose the $T_j(G_j)$ as suitable L-, M- or R-functionals, for which the sample estimates $\{T_{nj}\}$ are known to have good robustness and efficiency properties. Functionals corresponding to the trimmed or Winsorized means may also be considered in the same vein. Viewed from this angle, the functionals corresponding to R- or M-estimators may require some regularity conditions more stringent than the usual ones needed for a direct approach [see Jurečková and Sen (1982) and Sen (1980)]. There may be other situations where $\mathbf{X} \ne \mathbf{Y}$ and $q \ne p$. For example, consider the case where $F$, defined on $E^4$, is the d.f. of $(X^{(1)}, X^{(2)}, X^{(3)}, X^{(4)})$, and we are interested in the pairwise association (correlation) parameters [involving the $\binom{4}{2} = 6$ bivariate distributions]. Thus, $q = 4 < 6 = p$. For each pair $(X^{(r)}, X^{(s)})$, we may consider an appropriate association parameter $\tau_{rs}$, such as the Kendall tau, the Spearman grade correlation coefficient, and others [discussed in detail in Hoeffding (1948)], for $1 \le r < s \le 4$. The sample counterparts are Hoeffding's U-statistics and/or von Mises functionals, and hence they can be treated in the same setup as in the current study. Alternatively, the results of Sen (1987b) may also be used for them. In either approach, the functional form of the d.f. $F$ is not assumed to be given, and hence, to retain the nonparametric structure of $F$, we may need to take recourse to an asymptotic setup (where $c \downarrow 0$), as has been adopted here. A finite $c\,(>0)$ with an unspecified $F$ may call for an altogether different (and presumably parametric) approach and may be much more complicated. In fact, generating such an optimal solution (for a given (fixed) $c > 0$) in a genuine nonparametric formulation is still an open problem!
REFERENCES

[1] Ghosh, M., Nickerson, D.M. and Sen, P.K. Sequential shrinkage estimation. Ann. Statist. 15 (1987), 817-829.

[2] Hoeffding, W. A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 19 (1948), 293-325.

[3] James, W. and Stein, C. Estimation with quadratic loss. In Proc. 4th Berkeley Symp. Math. Statist. Probab. 1 (1962), 361-380.

[4] Jurečková, J. and Sen, P.K. M-estimators and L-estimators of location: Uniform integrability and asymptotically risk-efficient sequential versions. Sequen. Anal. 1 (1982), 27-56.

[5] Parr, W.C. Jackknifing differentiable statistical functionals. J. Roy. Statist. Soc. Ser. B 47 (1985), 56-66.

[6] Sen, P.K. On nonparametric sequential point estimation of location based on general rank order statistics. Sankhyā Ser. A 42 (1980), 201-219.

[7] Sen, P.K. Sequential Nonparametrics: Invariance Principles and Statistical Inference. John Wiley, New York, 1981.

[8] Sen, P.K. A James-Stein detour of U-statistics. Commun. Statist. Theor. Meth. A 13 (1984), 2725-2747.

[9] Sen, P.K. On the asymptotic distributional risks of shrinkage and preliminary test versions of maximum likelihood estimators. Sankhyā Ser. A 48 (1986), 354-371.

[10] Sen, P.K. Sequential Stein-rule maximum likelihood estimation: General asymptotics. In Statistical Decision Theory and Related Topics IV (eds. S.S. Gupta and J.O. Berger), Vol. 2 (1987), 195-208.

[11] Sen, P.K. Sequential shrinkage U-statistics: General asymptotics. Rev. Brasileira de Prob. Estatist. 1 (1987), 1-20.

[12] Sen, P.K. Functional jackknifing: Rationality and general asymptotics. Ann. Statist. 16 (1988), 450-469.

[13] Sen, P.K. and Ghosh, M. Sequential point estimation of estimable parameters based on U-statistics. Sankhyā Ser. A 43 (1981), 331-344.

[14] Sen, P.K. and Saleh, A.K.M.E. On some shrinkage estimators of multivariate location. Ann. Statist. 13 (1985), 272-281.

[15] Sen, P.K. and Saleh, A.K.M.E. On preliminary test and shrinkage M-estimation in linear models. Ann. Statist. 15 (1987), 1580-1592.

[16] Stein, C. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proc. 3rd Berkeley Symp. Math. Statist. Prob. 1 (1956), 197-206.