
No. 2002
STATISTICAL FUNCTIONALS, STOPPING TIMES AND
ASYMPTOTIC MINIMUM RISK PROPERTY

PRANAB KUMAR SEN
Departments of Biostatistics & Statistics
University of North Carolina,
Chapel Hill, NC 27599-3260, USA
ABSTRACT

In a nonparametric estimation problem, for a statistical functional, the
risk (defined suitably) depends on n, the sample size, as well as F, the
underlying distribution function. Sequential estimators based on appropriate
stopping times possess the (first order) asymptotic minimum risk property,
although the picture is somewhat different for shrinkage versions of such
sequential statistical functionals. In this context, the role of stopping
times is explored in a systematic manner.
AMS (1980) Subject Classification Nos: 62L12, 62L15

Key Words and Phrases: Asymptotic distributional risk; asymptotically
minimum risk; differentiable statistical functions; jackknifing; minimum
risk; optimal sample size; quadratic error loss; sequential estimation;
shrunken estimators; statistical functionals.

Short Title: STOPPING TIMES FOR STATISTICAL FUNCTIONALS
1. INTRODUCTION

Let {X_i; i ≥ 1} be a sequence of independent and identically distributed
(i.i.d.) random vectors (r.v.) with a distribution function (d.f.) F, defined
on E^r, for some r ≥ 1. The functional form of F is not assumed to be given,
and it is only assumed that F ∈ ℱ, where ℱ is a suitable class of d.f.'s
defined on E^r. In this nonparametric setup, an estimable parameter θ is
regarded as a (possibly vector valued) functional of the d.f. F, i.e., we set

    θ = T(F) = (T_1(F), ..., T_p(F))',                                   (1.1)

where p ≥ 1, and T_1(·), ..., T_p(·) are suitable functionals. Based on a
sample (X_1, ..., X_n) of size n (≥ 1), let

    F_n(x) = n^{-1} Σ_{i=1}^n I(X_i ≤ x),  x ∈ E^r,                      (1.2)

be the empirical (sample) d.f. Then, a natural nonparametric estimator of θ
is

    T_n = T(F_n) = (T_{n1} = T_1(F_n), ..., T_{np} = T_p(F_n))',  n ≥ n_0.   (1.3)
In estimating θ by T_n, we conceive of a plausible loss function

    L(T_n, θ) = L(||T_n − θ||),                                          (1.4)

where L(t), t ≥ 0, is nonnegative and nondecreasing in t, and ||T_n − θ||
stands for a suitable norm of T_n − θ. For example, we may consider a
quadratic error loss L(a, b) = (a−b)'Q(a−b) = ||a−b||²_Q, where Q is a given
positive definite (p.d.) matrix. For p = 1, another possibility is to
consider the absolute error loss L(a, b) = |a−b|, and this definition can
also be extended for p ≥ 1. Also, let c (> 0) be the cost of sampling per
unit observation, so that, incorporating the cost of sampling along with the
chosen loss function, we may formulate the following risk function for
estimating θ by T_n:

    ρ(n; c, F) = E_F{L(T_n, θ) + cn},  F ∈ ℱ,  c > 0,  n ≥ n_0.          (1.5)

Note that, in general, the risk function may depend on θ as well. But, by
(1.1), θ = T(F) is itself a functional of the d.f. F, and hence, the risk
ρ(n; c, F) is regarded as a function of (n, c) and a functional of the
d.f. F.
Invoking the consistency of T_n (requiring usually some mild regularity
conditions on T(·) and F), we may argue that ||T_n − θ|| becomes
stochastically smaller as n increases, so that, incorporating an appropriate
integrability condition on L(||T_n − θ||), it may be quite reasonable to
assume that

    E_F L(T_n, θ) = ν_n(F) → 0,  as n → ∞,                               (1.6)

where ν_n(F) may depend in general on the unknown F (∈ ℱ). Thus, by (1.5)
and (1.6), we have

    ρ(n; c, F) = ν_n(F) + cn,  c > 0,  n ≥ n_0,  F ∈ ℱ,                  (1.7)

where ν_n(F) is ↓ in n, while cn is ↑ in n. Therefore, there exists an n_c^0
(depending generally on c and F), such that

    ρ(n_c^0; c, F) = inf{ρ(n; c, F) : n ≥ n_0};  F ∈ ℱ,  c > 0.          (1.8)

[If there is more than one such n_c^0, we choose the smallest one among them
as the desired solution.]
In this formulation, n_c^0 stands for the optimal sample size (for a given F
and c) and ρ_c^0(F) = ρ(n_c^0; c, F) stands for the minimum risk attainable
for the estimation of θ when one confines oneself to the class of
nonparametric estimators

    𝒞 = {T_n = T(F_n);  n ≥ n_0},                                        (1.9)

and adopts the risk function in (1.5).
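To fix ideas, the minimization in (1.8) can be carried out by direct
enumeration once ν_n(F) is given. The sketch below is purely illustrative:
it assumes a univariate mean functional under squared error loss, so that
ν_n(F) = σ²/n, with σ² and the cost values chosen arbitrarily; none of these
choices come from the paper.

```python
import numpy as np

def optimal_fixed_n(nu, c, n0=2, n_max=100000):
    """Brute-force search for n_c^0 = argmin_{n >= n0} {nu(n) + c*n},
    cf. (1.5)-(1.8); nu(n) plays the role of nu_n(F)."""
    n = np.arange(n0, n_max + 1)
    risk = nu(n) + c * n
    i = int(np.argmin(risk))          # smallest minimizer, as in the text
    return n[i], risk[i]

# Illustrative choice (not from the paper): T(F) = mean, quadratic loss,
# so nu_n(F) = sigma^2 / n with an assumed sigma^2 = 4.
sigma2 = 4.0
for c in (1e-2, 1e-3, 1e-4):
    n_opt, rho_opt = optimal_fixed_n(lambda n: sigma2 / n, c)
    # For this nu_n, n_c^0 is close to (sigma^2/c)^{1/2} and
    # rho_c^0 is close to 2*(c*sigma^2)^{1/2}.
    print(c, n_opt, round(rho_opt, 4))
```

For this toy ν_n the search reproduces n_c^0 ≈ (σ²/c)^{1/2} and ρ_c^0(F) ≈
2(cσ²)^{1/2}, anticipating (2.3) and (2.6) of Section 2 with q = 1.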
As has been mentioned earlier, ν_n(F) in (1.6) generally depends on the
unknown F (∈ ℱ). The current situation is more complex than a parametric
model where ℱ is of a given functional form (involving some unknown
parameters appearing as algebraic constants), so that ν_n(F) is also of a
known functional form (involving the same parameters (or a subset of them)
as algebraic constants). In our nonparametric formulation, ν_n(F) is a
functional of the unknown F (∈ ℱ), and it satisfies (1.6). It is therefore
clear from the above discussion that n_c^0 generally depends on F (∈ ℱ) as
well as on c (> 0), so that for any chosen n_c^0 = n^0, say, T_{n^0} may not
have the minimum risk property for all F ∈ ℱ. Thus, a fixed sample size (n)
estimator T_n may not be a minimum risk estimator (MRE) of θ, simultaneously
for all F ∈ ℱ.
In the negation of the MRE property of fixed sample size estimators, it is
of genuine interest to consider suitable sequential estimators {T_{N_c};
c > 0} based on appropriate stopping times {N_c; c > 0}, such that in some
meaningful way (viz., asymptotically as c ↓ 0), T_{N_c} has the desired MRE
property. In other words, we intend to formulate suitable stopping times
{N_c; c > 0}, where for every c > 0, N_c is a positive integer valued r.v.,
such that

    lim_{c↓0} E_F[L(T_{N_c}, θ) + cN_c] / ρ_c^0(F) = 1,  ∀ F ∈ ℱ.        (1.10)

In the literature, (1.10) relates to the first order asymptotic minimum risk
property, and any {T_{N_c}} satisfying (1.10) is termed an asymptotically
(first order) minimum risk estimator (AMRE) of θ.
Section 2 is devoted to the formulation of such AMRE of θ. In this context,
asymptotic expansions for ν_n(F) and sequential estimation of the leading
terms are of prime importance, and these will be discussed there. For p ≥ 3,
for any fixed n, T_n may not have the smallest risk: there exist suitable
shrinkage (or Stein-rule) estimators which may dominate T_n (in an exact or
asymptotic sense) with respect to the risk in (1.4). This Stein phenomenon
also holds in the sequential case [viz., Ghosh, Nickerson and Sen (1987),
dealing with a specific parametric model, and Sen (1989), for a general
nonparametric setup]. Thus, based on the stopping time {N_c} leading to the
AMRE property of T_{N_c}, it may be possible to construct a suitable
shrinkage version of T_{N_c} (say, T^S_{N_c}), such that the (asymptotic)
risk of T^S_{N_c} is smaller than (or equal to) that of T_{N_c}, uniformly
in θ. This relative risk dominance picture is depicted in Section 3. In
passing, it may be remarked that the usual shrinkage versions of the T_{N_c}
may not belong to the class 𝒞. As such, there remain some open questions
regarding the (asymptotic) optimality of the stopping times in the context
of such shrinkage estimators. The main objective of the current study is to
focus on this issue.
2. ASYMPTOTICALLY OPTIMAL STOPPING TIMES FOR STATISTICAL FUNCTIONALS
The formulation of stopping times and sequential estimators of θ depends
very much on the nature of ν_n(F) in (1.6). Let us assume that for some
positive number q,

    ν_n(F) = n^{-q} {ν^{(1)}(F) + n^{-1} ν^{(2)}(F) + ...},              (2.1)

where the ν^{(j)}(F) are suitable functionals of the d.f. F. Typically, if
we consider the case of a quadratic error loss with a given (p.d.) Q, then
ν_n(F) = trace{Q E_F[T_n − θ][T_n − θ]'}, so that, invoking the usual
asymptotic expansions for the second order moment of n^{1/2}(T_n − θ), we
obtain that (2.1) holds with q = 1. Actually, for Hoeffding's (1948)
U-statistics and von Mises' functionals, such an expansion follows readily
from the so called Hoeffding decomposition (into orthogonal components); for
more general forms of such statistics, a Hoeffding-type decomposition has
been neatly worked out by van Zwet (1984). For differentiable statistical
functions, parallel results (up to the order n^{-2}) have been considered by
Sen (1988). Note that by (1.7) and (2.1), we have

    ρ(n; c, F) = cn + n^{-q} ν^{(1)}(F) + n^{-q-1} ν^{(2)}(F) + ... .    (2.2)
Thus, if we define

    n_c^* = {c^{-1} q ν^{(1)}(F)}^{(1+q)^{-1}},  c > 0,                  (2.3)

then it is easy to show that, as c ↓ 0,

    n_c^* − n_c^0 = o(c^{-(1+q)^{-1}}),                                  (2.4)

    ρ(n_c^*; c, F) − ρ_c^0(F) = o(c^{q(1+q)^{-1}}),                      (2.5)

    ρ_c^0(F) = (q^{(1+q)^{-1}} + q^{-q(1+q)^{-1}}) {ν^{(1)}(F)}^{(1+q)^{-1}} c^{q(1+q)^{-1}}
               + o(c^{q(1+q)^{-1}}).                                     (2.6)

Thus, for c ↓ 0, n_c^* provides a first order approximation for n_c^0, and
the asymptotic risk of T_{n_c^*} satisfies the limit in (1.10). For this
reason, in the sequel, we shall replace n_c^0 by n_c^* for our subsequent
analysis.
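The expressions (2.3) and the leading term of (2.6) can be verified by
minimizing the two leading terms of (2.2) with n treated as a continuous
variable; the following display records that routine calculation.

```latex
% Minimizing the two leading terms of (2.2) over n (treated as continuous):
\[
  \frac{\partial}{\partial n}\Bigl\{cn + n^{-q}\,\nu^{(1)}(F)\Bigr\}
  = c - q\,n^{-q-1}\,\nu^{(1)}(F) = 0
  \quad\Longrightarrow\quad
  n_c^{*} = \bigl\{c^{-1} q\,\nu^{(1)}(F)\bigr\}^{1/(1+q)} ,
\]
% and substituting n_c^* back into cn + n^{-q} nu^{(1)}(F) gives
\[
  c\,n_c^{*} + (n_c^{*})^{-q}\,\nu^{(1)}(F)
  = \bigl(q^{1/(1+q)} + q^{-q/(1+q)}\bigr)
    \,\{\nu^{(1)}(F)\}^{1/(1+q)}\, c^{q/(1+q)} ,
\]
% which is the leading term of (2.6); the remaining terms of (2.2) are of
% smaller order, o(c^{q/(1+q)}), as c decreases to 0.
```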
It is worth mentioning in this context that for this first order equivalence
result we may not even need an asymptotic expansion (as in (2.1)); it
suffices to assume that there exist a positive q and a positive ν^{(1)}(F),
such that, as n → ∞,

    n^q ν_n(F) → ν^{(1)}(F),                                             (2.7)

and this can be justified even under less stringent regularity conditions.
Also, (2.3) clearly reveals the dependence of n_c^* (or n_c^0) on F through
ν^{(1)}(F) (and the ν^{(j)}(F)). As such, a sequential estimator of
ν^{(1)}(F) can be incorporated in the formulation of an appropriate stopping
rule.
In the simplest situation, we may assume that the functional form of
ν^{(1)}(·) is known, and define

    V_n = ν^{(1)}(F_n),  n ≥ n_0,                                        (2.8)

as the sample counterpart of ν^{(1)}(F); it is tacitly assumed that

    V_n → ν^{(1)}(F)  a.s. (almost surely), as n → ∞.                    (2.9)

Then (2.3) and (2.9) lead us to consider a stopping time

    N_c = inf{n ≥ n_0 : n^{q+1} ≥ c^{-1} q [V_n + n^{-h}]},  c > 0,      (2.10)

where h > 0 (specified), and the term n^{-h} protects against too early
termination (if V_n is too small).
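A minimal simulation sketch of the rule (2.10) is given below. It assumes
the simplest concrete setting — T(F) the mean of a univariate F, squared
error loss, q = 1, so that ν^{(1)}(F) is the variance and V_n the sample
variance — and an arbitrary normal sampling distribution; these choices are
illustrative and are not prescribed by the paper.

```python
import numpy as np

def stopping_time_Nc(sample_stream, c, q=1.0, h=1.0, n0=5):
    """Stopping rule (2.10): N_c = inf{n >= n0 : n^(q+1) >= c^{-1} q [V_n + n^{-h}]},
    with V_n = v^(1)(F_n).  Here the functional is the mean, so V_n is the
    sample variance (an illustrative choice, not prescribed by the paper)."""
    xs = []
    for x in sample_stream:
        xs.append(x)
        n = len(xs)
        if n < n0:
            continue
        V_n = np.var(xs, ddof=1)                  # plug-in estimate of v^(1)(F)
        if n ** (q + 1.0) >= (q / c) * (V_n + n ** (-h)):
            return n, np.mean(xs)                 # (N_c, T_{N_c})
    raise RuntimeError("stream exhausted before stopping")

rng = np.random.default_rng(0)
c = 1e-4
# theta = 0, sigma^2 = 4: for this toy model the theory of this section
# suggests N_c should concentrate near n_c^* = (q*sigma^2/c)^{1/(1+q)} = 200.
stream = iter(rng.normal(loc=0.0, scale=2.0, size=10**6))
N_c, T_Nc = stopping_time_Nc(stream, c)
print(N_c, round(T_Nc, 4))
```

The n^{-h} term plays exactly the role noted after (2.10): it prevents
stopping on an atypically small early value of V_n.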
It may be remarked that {[F_n(x) − F(x)]; x ∈ E^r}, n ≥ 1 (or {[F_n(x),
x ∈ E^r], n ≥ 1}) is a reverse martingale (process) and ν^{(1)}(·) is
nonnegative. So whenever ν^{(1)}(·) is a convex functional, such that
E ν^{(1)}(F_n) exists for some n = n_0 (≥ 1), then

    {V_n; n ≥ n_0} is a nonnegative reverse submartingale.               (2.11)

Incorporating this reverse submartingale property along with appropriate
moment convergence results for {V_n}, one may virtually repeat the steps in
Sen and Ghosh (1981) and conclude that the AMRE property in (1.10) holds for
{T_{N_c}}. Thus, in this sense, {N_c; c > 0} is asymptotically (as c ↓ 0) an
optimal stopping time. In the negation of such a reverse submartingale
property of the V_n, one may require more stringent moment convergence
properties of the V_n for the derivation of the AMRE property of {T_{N_c}}.
In dealing with rank based (R-) estimators of location, such an approach has
been systematically explored in Sen (1980, 1981) [see also Sen (1984) for
the multiparameter case], and similar results can be derived for other cases
too.
In some situations (particularly in the multiparameter case), ν^{(1)}(F) may
be of quite complicated form, so that the formulation of V_n may involve
rather complex computational schemes. For example, suppose that T(·) is
differentiable at F, so that T(F_n) can be expanded around T(F). In this
expansion, the first term is ∫ T^{(1)}(F; x) dF_n(x) = n^{-1} Σ_{i=1}^n
T^{(1)}(F; X_i), where T^{(1)}(F; x) is the so called influence function (at
F). If we choose a quadratic error loss function (with a given p.d. matrix
Q), then ν^{(1)}(F) = trace{Q Σ*(F)}, where Σ*(F) is the dispersion matrix
of T^{(1)}(F; X_1). Usually, the functional form of T^{(1)}(F; x) depends on
the unknown F, so that for the estimation of Σ*(F) (or ν^{(1)}(F)), one may
require the estimation of T^{(1)}(F; x) as well. Jackknifing methods can be
used to advantage in such a case.
From the base sample (X_1, ..., X_n) of size n, we delete the i-th
observation (X_i) and denote the corresponding empirical d.f. by
F_{n−1}^{(i)}, so that T_{n−1}^{(i)} = T(F_{n−1}^{(i)}) is an estimator of
θ, for i = 1, ..., n. Let then

    T_{n,i} = n T_n − (n−1) T_{n−1}^{(i)},  i = 1, ..., n;
    T_n^* = n^{-1} Σ_{i=1}^n T_{n,i},                                    (2.12)

and

    V_n^* = (n−1)^{-1} Σ_{i=1}^n (T_{n,i} − T_n^*)(T_{n,i} − T_n^*)'.    (2.13)

The T_{n,i} are the so called pseudovariables, T_n^* is the classical
jackknifed estimator of θ, and V_n^* is the jackknifed estimator of Σ*(F).
Under quite general regularity conditions [viz., Sen (1988)], we have
    n ||T_n − T_n^*|| = O(1)  almost surely (a.s.), as n → ∞,            (2.14)

    V_n^* → Σ*(F)  a.s., as n → ∞,                                       (2.15)

so that

    V_n = trace(Q V_n^*) → ν^{(1)}(F) = trace(Q Σ*(F))  a.s., as n → ∞.  (2.16)

Note that for every n (≥ n_0), T_{n,1}, ..., T_{n,n} are exchangeable
r.v.'s, but their dependence structure changes with n. As such, in general,
it may be difficult to obtain a reverse submartingale characterization for
{V_n}. However, using the differentiability approach in Sen (1988), moment
convergence properties of V_n can be studied under appropriate regularity
conditions on T(·) and F, and hence, the asymptotic optimality of the
stopping time {N_c, c > 0} can be established as before.
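The pseudovalue construction (2.12)-(2.13), together with the plug-in
V_n = trace(Q V_n^*) of (2.16), can be coded directly. In the sketch below
the functional T (coordinatewise means of a bivariate sample) and the matrix
Q = I_2 are placeholders chosen only for illustration.

```python
import numpy as np

def jackknife_pseudovalues(X, T):
    """Pseudovalues (2.12): T_{n,i} = n T(F_n) - (n-1) T(F_{n-1}^{(i)}),
    where F_{n-1}^{(i)} drops the i-th row of X (an n x r data matrix)."""
    n = X.shape[0]
    Tn = T(X)
    Tni = np.array([n * Tn - (n - 1) * T(np.delete(X, i, axis=0))
                    for i in range(n)])
    Tn_star = Tni.mean(axis=0)                       # jackknifed estimator of theta
    # V_n^* in (2.13): jackknifed estimator of the dispersion matrix Sigma*(F)
    D = Tni - Tn_star
    Vn_star = D.T @ D / (n - 1)
    return Tn_star, Vn_star

# Illustrative functional (not from the paper): coordinatewise means of a
# bivariate sample, with Q = I_2, so V_n = trace(Q V_n^*) estimates v^(1)(F).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
Q = np.eye(2)
Tn_star, Vn_star = jackknife_pseudovalues(X, T=lambda A: A.mean(axis=0))
V_n = np.trace(Q @ Vn_star)                          # cf. (2.16)
print(Tn_star.round(3), round(V_n, 3))
```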
We conclude this section with the remark that the asymptotic optimality of
the stopping time {N_c; c ↓ 0} in (2.10) is relative to the AMRE property in
(1.10). In this context it is tacitly assumed that ρ_c^0(F) is the minimum
risk attainable for the class (𝒞) of estimators in (1.9). For the
multiparameter model (i.e., p ≥ 3), it is possible to construct some
shrinkage versions of T_n which have smaller (asymptotic) risk. This leads
to the basic question: Is N_c, c ↓ 0, an asymptotically optimal stopping
time for shrunken estimators too? We study this problem in the next section.
3. STOPPING TIMES AND SHRUNKEN STATISTICAL FUNCTIONALS
Shrunken versions of statistical functionals have been considered in Sen
(1989), where the asymptotic dominance of sequential shrunken functionals
over their classical counterparts has been discussed. To motivate our main
results, we recapitulate briefly the basic results on shrunken functionals.
Let us define T_n, V_n^* and Q as in the earlier sections, and let

    d_n = smallest characteristic root of Q V_n^*,  n ≥ n_0.             (3.1)

Side by side, let δ = the smallest characteristic root of Q Σ*(F), where
Σ*(F) is the dispersion matrix of the asymptotic (multi-normal) distribution
of n^{1/2}(T_n − θ). Then note that

    d_n → δ  a.s., as n → ∞.                                             (3.2)

Next, Stein-rule (or shrinkage) estimators are basically testimators: an
appropriate test statistic (say, ℒ_n) is incorporated in the estimator
itself, and this test statistic is primarily a measure of the divergence of
θ from the assumed pivot θ_0. Without any loss of generality, we may set
θ_0 = 0 (otherwise, subtract θ_0 from T_n and reduce the case to a null
pivot). Then an appropriate ℒ_n to test the adequacy of the pivot (0) is

    ℒ_n = n T_n' V_n^{*-1} T_n,                                          (3.3)

where under the null hypothesis (θ = 0), ℒ_n has asymptotically the
(central) chi square distribution with p degrees of freedom (DF). Larger
values of ℒ_n depict the inadequacy of the assumed null pivot. We consider
the following type of shrinkage functionals:

    T_n^S = {I − k d_n ℒ_n^{-1} Q^{-1} V_n^{*-1}} T_n,                   (3.4)

where p is taken greater than 2, the shrinkage factor k is a positive number
(0 < k < 2(p−2)) and, in this context, the loss function L(a, b) =
(a−b)'Q(a−b) is quadratic with the p.d. matrix Q as its discriminant.
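In matrix terms, (3.1), (3.3) and (3.4) reduce to a few lines of linear
algebra. The sketch below uses stand-in inputs (p = 4, Q = I, an arbitrary
T_n and V_n^*); it is a sketch of the construction, not an implementation
tied to any particular functional.

```python
import numpy as np

def shrunken_functional(Tn, Vn_star, Q, n, k):
    """Stein-rule version (3.4): T_n^S = [I - k d_n L_n^{-1} Q^{-1} V_n^{*-1}] T_n,
    with d_n the smallest characteristic root of Q V_n^* (3.1) and
    L_n = n T_n' V_n^{*-1} T_n the pivot-adequacy statistic (3.3).
    The pivot is taken as 0; subtract theta_0 from Tn first otherwise."""
    Vinv = np.linalg.inv(Vn_star)
    d_n = np.linalg.eigvals(Q @ Vn_star).real.min()
    L_n = float(n * Tn @ Vinv @ Tn)
    shrink = np.eye(len(Tn)) - (k * d_n / L_n) * np.linalg.inv(Q) @ Vinv
    return shrink @ Tn

# Illustrative numbers only: p = 4, Q = I, k inside (0, 2(p-2)).
rng = np.random.default_rng(2)
p, n, k = 4, 100, 2.0
Tn = rng.normal(scale=0.1, size=p)      # stands in for T(F_n)
Vn_star = np.eye(p)                     # stands in for the jackknife estimate (2.13)
print(shrunken_functional(Tn, Vn_star, np.eye(p), n, k).round(4))
```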
In the parametric (normal theory) model, T_n has the multinormal
distribution, (n−1)V_n^* has the Wishart(Σ, p, n−1) distribution, and T_n
and V_n^* are independent. This enables one to compute the dispersion matrix
of T_n^S and to incorporate the exact distribution of ℒ_n (which has the
Hotelling T²-distribution, noncentral when θ ≠ 0) in the computation of the
risk of T_n^S. In the normal theory case, the technique employed by Ghosh,
Nickerson and Sen (1987) rests heavily on the stochastic independence of
{T_n; n ≥ 1} and {V_n^*; n ≥ 2} and their joint sufficiency. But, in a
nonparametric model, the exact distribution of T_n^S (or the independence of
T_n and V_n^*) may be extremely difficult to comprehend, and
moment-convergence results for ℒ_n^{-1} (with an unbounded loss function)
may require quite stringent regularity conditions. This poses a serious
problem in the computation of the (asymptotic) risk of T_n^S, and the
problem becomes harder in the sequential case. Hence, parallel results may
not hold for the nonparametric model, and the theory developed so far [Sen
(1989)] relates to the asymptotic case only.
There are two easy ways of eliminating the technical difficulty caused by
the presence of ℒ_n^{-1} in (3.4). First, note that

    d_n Q^{-1} V_n^{*-1}
      = {largest characteristic root of Q^{-1} V_n^{*-1}}^{-1} Q^{-1} V_n^{*-1},   (3.5)

so that the characteristic roots of d_n Q^{-1} V_n^{*-1} are all less than
or equal to 1 (these are nonnegative too). As such, on the set {ℒ_n > k},
{I − k d_n ℒ_n^{-1} Q^{-1} V_n^{*-1}} is positive semi-definite (actually
p.d. with probability one) and has characteristic roots all nonnegative and
bounded by 1. The picture can be quite different on the set {ℒ_n ≤ k},
particularly when ℒ_n is close to zero. This prompts us to consider a
"positive rule" version of T_n^S in (3.4). We denote by

    U_n^* = { 0,                                    if ℒ_n ≤ k,
            { I − k d_n ℒ_n^{-1} Q^{-1} V_n^{*-1},  if ℒ_n > k,          (3.6)

and consider the modified estimator

    T_n^{S+} = U_n^* T_n.                                                (3.7)

Note that T_n^{S+} = 0 on the set {ℒ_n ≤ k}, while for ℒ_n > k, ℒ_n^{-1} <
k^{-1}, so that moment-convergence results for T_n^{S+} may not require any
stringent regularity conditions (pertaining otherwise to the integrability
of ℒ_n^{-1}). In fact, such a positive-rule version is known to possess
better dominance properties than T_n^S itself [see Sclove, Morris and
Radhakrishnan (1972) for the normal theory model].
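A corresponding sketch of the positive-rule version (3.6)-(3.7); the only
change from the previous sketch is the truncation on the set {ℒ_n ≤ k}.
Inputs are again illustrative.

```python
import numpy as np

def positive_rule_functional(Tn, Vn_star, Q, n, k):
    """Positive-rule version (3.6)-(3.7): the shrinkage factor is applied only
    on {L_n > k}; on {L_n <= k} the estimator is set to the pivot (0).
    This keeps L_n^{-1} below k^{-1}, avoiding the integrability problem
    noted in the text.  (Sketch; notation as in (3.1)-(3.4).)"""
    Vinv = np.linalg.inv(Vn_star)
    d_n = np.linalg.eigvals(Q @ Vn_star).real.min()
    L_n = float(n * Tn @ Vinv @ Tn)
    if L_n <= k:
        return np.zeros_like(Tn)                         # U_n^* = 0
    U_n = np.eye(len(Tn)) - (k * d_n / L_n) * np.linalg.inv(Q) @ Vinv
    return U_n @ Tn                                      # T_n^{S+} = U_n^* T_n

# Usage with the same kind of illustrative inputs as in the previous sketch:
rng = np.random.default_rng(3)
p, n, k = 4, 100, 2.0
Tn = rng.normal(scale=0.02, size=p)     # close to the pivot: L_n may fall below k
print(positive_rule_functional(Tn, np.eye(p), np.eye(p), n, k))
```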
The other possibility is to use the original estimator T_n^S in (3.4), but
to adopt a modified definition of the asymptotic risk. Towards this, we may
note that if the true parameter point θ is different from the pivot (0),
then, under quite general regularity conditions, as n → ∞,

    n^{-1} ℒ_n → Δ = θ'[Σ*(F)]^{-1} θ > 0,  in probability,              (3.8)

so that ℒ_n = O_p(n) when θ ≠ 0.
In this context, by (3.4), we have

    n ||T_n^S − T_n||²_Q = k² d_n² n ℒ_n^{-2} T_n' V_n^{*-1} Q^{-1} V_n^{*-1} T_n
                         = (k² d_n² ℒ_n^{-1})(n ℒ_n^{-1})(T_n' V_n^{*-1} Q^{-1} V_n^{*-1} T_n)
                         = O_p(n^{-1}),  as n → ∞.                       (3.9)

Thus, n^{1/2}(T_n^S − θ) and n^{1/2}(T_n − θ) are stochastically equivalent,
and they share the common asymptotic properties. In other words, there is no
asymptotic improvement due to shrinkage if the actual θ (fixed) is different
from the assumed pivot. The situation is different when θ is "close to" the
assumed pivot. In that case, there is some improvement due to shrinkage, and
this will be considered here. We take into account such a 'small
neighborhood' of the pivot (in conjunction with c ↓ 0) in our formulation of
the "asymptotic risk" function.
For simplicity, we confine ourselves to a quadratic error loss, so that, as
in Section 2, we have

    n_c^0 = O(c^{-1/2}),  as c ↓ 0.                                      (3.10)

It will be seen later on that the class of stopping times {N_c; c > 0} may
also be so chosen that

    N_c = O_p(c^{-1/2}),  as c ↓ 0.                                      (3.11)

As such, looking at (3.8) and (3.9) along with (3.10) and (3.11), we may
conceive of a sequence {Λ(c); c > 0} of nested neighborhoods,

    Λ(c) = {θ : ||θ − 0|| ≤ λ c^{1/4}},  c > 0,                          (3.12)

where λ (0 < λ < ∞) is a fixed positive number (otherwise, arbitrary).
Within this domain (Λ(c)) and for N_c satisfying (3.11), c^{-1/4}[T^S_{N_c}
− T_{N_c}] has a nondegenerate limit distribution (as c ↓ 0), and hence,
T^S_{N_c} and T_{N_c} are not generally asymptotically equivalent, while
outside Λ(c), as c ↓ 0, (3.9) holds [with O_p(n^{-1}) replaced by o_p(1)].
As such, we may as well consider the asymptotic (as c ↓ 0) distributions of
c^{-1/4}(T_{N_c} − θ) and c^{-1/4}(T^S_{N_c} − θ), denote them by G and
G^{(S)} respectively, and compute the asymptotic risk by reference to such
asymptotic distributional risk (ADR) [see Sen (1987a, 1989), for example].
The main advantage of using the ADR criterion is that we do not have to
consider more stringent regularity conditions pertaining to the asymptotic
value of the actual risk ρ(n; c, F), and whenever such a limit exists, it
would also agree with the ADR.
For the positive rule estimator T^{S+}_{N_c}, the asymptotic risk and the
ADR would have a common value (under the usual regularity conditions). The
important fact is that the results based on these ADR comparisons remain
applicable to the ones based on the asymptotic risks, when additional
regularity (i.e., integrability) conditions are met.
It follows along the lines of Sen (1989) that, in the light of the ADR and
adapted to the stopping time {N_c; c > 0} in (2.10), T^S_{N_c} dominates
T_{N_c}. A very similar conclusion holds for T^{S+}_{N_c} (vs T_{N_c}).
Thus, with respect to the asymptotically optimal stopping times pertaining
to the classical case (treated in Section 2), further reduction in the
asymptotic risk is possible by shrinking the functionals towards a pivot,
when the true parameter lies "close to" the pivot. This is in agreement with
the general results on sequential shrinkage estimation for the normal theory
case [Ghosh, Nickerson and Sen (1987)], where also the dominance is
perceptible only in a 'small' neighborhood of the pivot. However, our
conclusions remain applicable to a much larger class of estimators and a
wider class of nonparametric models.
With these results at hand, we now consider the final question regarding
asymptotically optimal stopping times relating to shrinkage estimators. In
this context, first, we provide an answer to the query posed at the end of
Section 2. An affirmative answer to the asymptotic optimality of N_c, c > 0,
in (2.10) for shrinkage estimators can be made when the true θ is away from
the pivot. In other words, if a pivot is so chosen that it is not likely to
be 'very close' to the actual parameter point θ, then shrinkage estimators
fail to provide any perceptible improvement over their classical
counterparts, and hence, their asymptotic risk equivalence entails the
asymptotic optimality of the stopping time N_c, c > 0, in (2.10). However,
we may have an altogether different answer when θ lies "close to" the
assumed pivot. We explore this situation in the remainder of this section.
Let us now confine ourselves to the case of θ ∈ Λ(c), c ↓ 0. Towards this,
we consider a sequence {F_{(c)}; c > 0} of d.f.'s such that

    θ_c = T(F_{(c)}) = c^{1/4} λ,  λ = (λ_1, ..., λ_p)' fixed,  c > 0.   (3.13)

This is more effectively conceived by allowing F_{(c)} → F as c ↓ 0, such
that θ = T(F) = 0 (the pivot), so that {θ_c} corresponds to a sequence of
local alternatives. Note that the dispersion matrix of the asymptotic
distribution of n^{1/2}(T_n − θ) is not affected by {θ_c} in (3.13), and
hence, the ADR of T_{N_c} (when N_c is defined by (2.10)) does not depend on
λ (or the pivot). On the other hand, the distribution of n^{1/2}(T^S_n − θ)
(even asymptotically) may depend on λ, and hence, the ADR of T^S_{N_c}
depends on λ as well.
We denote by Γ the dispersion matrix of the asymptotic distribution of
n^{1/2}(T_n − θ), and for L(·) we take a quadratic error loss (with a p.d.
Q), so that, as c ↓ 0,

    ρ(n; c, F_{(c)}) = n^{-1} trace(QΓ) + cn,                            (3.14)

whenever cn² is bounded from below by some positive η. For every n with
n²c = O(1) (as c ↓ 0), we denote by

    Δ_{nc} = n θ_c' Γ^{-1} θ_c = n c^{1/2} λ' Γ^{-1} λ   and
    Δ^0_{nc} = n θ_c' Γ^{-1} Q^{-1} Γ^{-1} θ_c.                          (3.15)
Then, following the lines of Sen (1987a, b), it can be shown that the ADR of
T^S_n (whenever n²c = O(1) and n²c > 0) is given by

    cn + n^{-1} [ trace(QΓ) − 2kδ(p−2) E{χ_p^{-2}(Δ_{nc})}
                  + k²δ² trace(Q^{-1}Γ^{-1}) E{χ_{p+2}^{-4}(Δ_{nc})}
                  + k²δ² Δ^0_{nc} E{χ_{p+4}^{-4}(Δ_{nc})} ],             (3.16)

where k is the shrinkage factor [0 < k < 2(p−2)], δ is defined after (3.1),
and, denoting by H_q(x; Δ) the non-central chi-square d.f. with q DF and
noncentrality parameter Δ,

    E{χ_q^{-2r}(Δ)} = ∫_0^∞ x^{-r} dH_q(x; Δ),  r ≥ 0,  Δ ≥ 0.           (3.17)

Since Δ_{nc} and Δ^0_{nc} both depend on λ in (3.13), it is clear that
(3.16) depends on (c, n), Q, Γ as well as λ. For any given c, n, Q and Γ,
(3.16) attains a minimum at Δ_{nc} = Δ^0_{nc} = 0 (i.e., λ = 0), and this
minimum value is given by
    cn + n^{-1} {trace(QΓ) − kδ[2 − p^{-1}(p−2)^{-1} kδ trace(Q^{-1}Γ^{-1})]},   (3.18)

where δ trace(Q^{-1}Γ^{-1}) = trace(Q^{-1}Γ^{-1})/ch_max(Q^{-1}Γ^{-1}) ≤ p
and 0 < k < 2(p−2). Hence, (3.18) is less than cn + n^{-1} trace(QΓ) =
ρ(n; c, F), so that, for λ = 0, the minimum value of (3.18) (over n) is
given by

    2{c[trace(QΓ) − kδ(2 − p^{-1}(p−2)^{-1} kδ trace(Q^{-1}Γ^{-1}))]}^{1/2},   (3.19)

which is smaller than ρ_c^0(F) = 2{c trace(QΓ)}^{1/2}. Thus, if we consider
a stopping time N_c^0, c > 0, defined by
    N_c^0 = inf{n ≥ n_0 : n² ≥ c^{-1}[trace(Q V_n^*)
              − k d_n{2 − p^{-1}(p−2)^{-1} k d_n trace(Q^{-1} V_n^{*-1})} + n^{-h}]}   (3.20)

(for c > 0), then it follows by the same technique as in Sen (1989) that the
asymptotic (as c ↓ 0) risk of T^S_{N_c^0}, when θ = 0, is given by (3.19),
and moreover, defining N_c, c > 0, as in (2.10) (with q = 1),

    lim_{c↓0} (N_c^0 / N_c)
      = {1 − kδ[2 − p^{-1}(p−2)^{-1} kδ trace(Q^{-1}Γ^{-1})] / trace(QΓ)}^{1/2} < 1,
    almost surely.                                                       (3.21)
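As a quick numerical illustration of (3.21), consider the (assumed, not
paper-specific) special case Q = Γ = I_p:

```latex
% Assumed special case (purely for illustration): Q = \Gamma = I_p, so that
% \mathrm{tr}(Q\Gamma) = p, \ \delta = 1, \ \mathrm{tr}(Q^{-1}\Gamma^{-1}) = p.
% Then (3.21) reads
\[
  \lim_{c \downarrow 0} \frac{N_c^0}{N_c}
  = \Bigl\{\, 1 - \frac{k}{p}\Bigl(2 - \frac{k}{p-2}\Bigr) \Bigr\}^{1/2} .
\]
% For p = 4 and k = 2 (inside 0 < k < 2(p-2) = 4) this equals
% \{1 - (2/4)(2 - 1)\}^{1/2} = (1/2)^{1/2} \approx 0.71, i.e. the
% pivot-adapted rule N_c^0 stops, in the limit, with roughly 71% of the
% observations required by N_c.
```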
Thus, at the pivot (θ = 0), the stopping time N_c^0 is stochastically
smaller than N_c and, at the same time, leads to a smaller asymptotic risk
for the shrunken estimator T^S_{N_c^0} (compared to T^S_{N_c}). The picture
can, however, be different when λ ≠ 0 (i.e., θ ≠ 0). To see this, we may
note that under (3.13), the ADR of T^S_{N_c^0} is given by

    c^{1/2} {trace(QΓ) − kδ[2 − p^{-1}(p−2)^{-1} kδ trace(Q^{-1}Γ^{-1})]}^{1/2}
    + c^{1/2} {trace(QΓ) − kδ[2 − p^{-1}(p−2)^{-1} kδ trace(Q^{-1}Γ^{-1})]}^{-1/2}
      × {trace(QΓ) − kδ(2 − p^{-1}(p−2)^{-1} kδ trace(Q^{-1}Γ^{-1}))
         + h(δ, Q, Γ, Δ_{n_c^0,c}, Δ^0_{n_c^0,c})},                      (3.22)
where n_c^0 = c^{-1/2}[trace(QΓ) − kδ{2 − p^{-1}(p−2)^{-1} kδ
trace(Q^{-1}Γ^{-1})}]^{1/2} and

    h(δ, Q, Γ, Δ_{nc}, Δ^0_{nc}) = (3.16) − (3.18)  (at n);              (3.23)

(3.23) is nonnegative and bounded from above by kδ[2 − p^{-1}(p−2)^{-1} kδ
trace(Q^{-1}Γ^{-1})] (where the upper bound is attained when ||λ|| → ∞). At
this stage, we may note that the ADR for T^S_{N_c} is given by

    c^{1/2} {trace(QΓ)}^{1/2} [1 + {trace(QΓ)}^{-1} {trace(QΓ)
      − kδ(2 − p^{-1}(p−2)^{-1} kδ trace(Q^{-1}Γ^{-1}))
      + h(δ, Q, Γ, Δ_{n_c^{0*},c}, Δ^0_{n_c^{0*},c})}],                  (3.24)

where n_c^{0*} ~ c^{-1/2}{trace(QΓ)}^{1/2} > n_c^0. By using (3.23) along
with the upper bound for h(·), we obtain that (3.24) cannot be larger than
2c^{1/2}{trace(QΓ)}^{1/2} = ρ_c^0(F). Hence, it suffices to show that (3.22)
can be larger than ρ_c^0(F) for large values of ||λ||.
Towards this, note that for 0 ≤ d ≤ b < a,

    (a−b)^{1/2} + (a−b)^{-1/2}(a−b+d) = (a−b)^{-1/2}(2a−2b+d)
                                      = (2a−b)(a−b)^{-1/2} − (b−d)(a−b)^{-1/2}.   (3.25)

Since (1 − b/2a)² = 1 − b/a + b²/4a² > 1 − b/a, ∀ b < a, we have
(2a−b)(a−b)^{-1/2} = 2a^{1/2}(1 − b/2a)(1 − b/a)^{-1/2} > 2a^{1/2}, so that
the right hand side of (3.25) can be made > 2a^{1/2} for all d sufficiently
close to b. Now, as ||λ|| → ∞, h(δ, Q, Γ, Δ_{n_c^0,c}, Δ^0_{n_c^0,c}) →
kδ{2 − p^{-1}(p−2)^{-1} kδ trace(Q^{-1}Γ^{-1})}, so that, identifying a with
trace(QΓ), b with kδ{2 − p^{-1}(p−2)^{-1} kδ trace(Q^{-1}Γ^{-1})} and d with
h(·), by (3.22) and (3.25) we conclude that the ADR of T^S_{N_c^0} can
indeed exceed that of T_{N_c} (and hence, of T^S_{N_c}) when θ is away from
the pivot (0). This exhibits the lack of asymptotic optimality of {N_c^0,
c > 0}, uniformly in λ ∈ E^p. A similar criticism applies to {N_c, c > 0}
[in (2.10)] when one intends to use shrinkage functionals [as in a
neighborhood of the pivot (i.e., for ||λ|| small), (3.24) will be larger
than (3.22)], although {N_c, c > 0} has the nice property that the ADR of
T^S_{N_c} can't be larger than that of T_{N_c} (while that of T^S_{N_c^0}
can be so). This leads us to the question: Does there exist a stopping rule
{N_c^*, c > 0}, such that (a) N_c^* is stochastically smaller than N_c, and
(b) the ADR of T^S_{N_c^*} is ≤ the ADR of T_{N_c}, uniformly in λ?
We construct an adaptive stopping time to meet both these requirements. In
this context, we may note that there are nice series expansions for
E(χ_q^{-2}(Δ)) and E(χ_{q+2}^{-4}(Δ)), q > 2, Δ ≥ 0, in terms of Poisson
weights, and hence, h(d_n, Q, V_n^*, a_{n1}, a_{n2}) can be conveniently
computed for suitable a_{n1}, a_{n2}, d_n, Q and V_n^*. Let us then define
N_c^0, c > 0, as in (3.20). Also, for every n ≥ n_0, let

    a_n^* = trace(Q V_n^*)
            − k d_n{2 − p^{-1}(p−2)^{-1} k d_n trace(Q^{-1} V_n^{*-1})}
            + h(d_n, Q, V_n^*, a_{n1}, a_{n2}),                          (3.26)

where

    a_{n1} = max{0, n T_n' V_n^{*-1} T_n − p},
    a_{n2} = max{0, n T_n' V_n^{*-1} Q^{-1} V_n^{*-1} T_n − trace(Q^{-1} V_n^{*-1})},

and the other notations have already been introduced earlier. Let then

    N_c^{(1)} = inf{n ≥ n_0 : n² ≥ c^{-1}[a_n^* + n^{-h}]},  c > 0,      (3.27)

and, by iteration (with the arguments a_{n1}, a_{n2} of h(·) re-evaluated at
the preceding stopping point), we define N_c^{(r+1)} in the same manner,
∀ r ≥ 1.                                                                 (3.28)
Finally, we define

    N_c^* = lim_{r→∞} N_c^{(r)}.                                         (3.29)

It is clear from the definitions in (3.26)-(3.29) that N_c^* is
stochastically smaller than N_c. [Note that a_n^* ≤ trace(Q V_n^*), ∀ n ≥
n_0, and hence N_c^{(r)} ≤ N_c, ∀ r ≥ 0.]
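The Poisson-weight expansions mentioned above rest on the representation of
the noncentral chi-square as a Poisson(Δ/2) mixture of central chi-squares,
for which E(χ_ν^{-2}) = (ν−2)^{-1} and E(χ_ν^{-4}) = {(ν−2)(ν−4)}^{-1}. The
sketch below evaluates E(χ_q^{-2r}(Δ)) of (3.17) in this way; it is a
generic utility with an arbitrarily chosen truncation tolerance, not code
taken from the paper.

```python
import math

def inv_moment_noncentral_chisq(q, delta, r=1, tol=1e-12):
    """E[ chi_q^{-2r}(delta) ] for r = 1 or 2, via the Poisson mixture:
    chi_q^2(delta) ~ chi^2_{q+2J} with J ~ Poisson(delta/2), and
    E[chi_nu^{-2}] = 1/(nu-2), E[chi_nu^{-4}] = 1/((nu-2)(nu-4))."""
    lam, total, j, w = delta / 2.0, 0.0, 0, math.exp(-delta / 2.0)
    while True:
        nu = q + 2 * j
        term = 1.0 / (nu - 2) if r == 1 else 1.0 / ((nu - 2) * (nu - 4))
        total += w * term
        j += 1
        w *= lam / j                       # next Poisson weight
        if w < tol and j > lam:
            return total

# Checks against the central case (delta = 0): 1/(p-2) and 1/(p(p-2)) for p = 6.
p = 6
print(inv_moment_noncentral_chisq(p, 0.0, r=1), 1 / (p - 2))
print(inv_moment_noncentral_chisq(p + 2, 0.0, r=2), 1 / (p * (p - 2)))
# With these expectations, h(d_n, Q, V_n^*, a_n1, a_n2) in (3.26) can be
# evaluated by plugging a_n1, a_n2 in for the noncentrality parameters.
```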
Further, note that under {θ_c} in (3.13),

    n T_n' V_n^{*-1} T_n − p = n c^{1/2} λ' V_n^{*-1} λ
        + 2 n c^{1/4} λ' V_n^{*-1} (T_n − c^{1/4} λ)
        + n (T_n − c^{1/4} λ)' V_n^{*-1} (T_n − c^{1/4} λ) − p,          (3.30)

where, for n such that n/n_c^* converges to a positive finite limit, the
second term on the right hand side of (3.30) has asymptotically a normal
distribution with zero mean and variance 4 n c^{1/2} λ' Γ^{-1} λ, while the
third term has the central chi squared d.f. with p DF. Thus, n T_n'
V_n^{*-1} T_n − p behaves like Δ_{nc} + Z_c + (χ²_p − p), where Z_c ~
N(0, 4Δ_{nc}). A similar result holds for a_{n2}. Thus, a_{n1} and a_{n2}
are not stochastically equivalent to Δ_{nc} and Δ^0_{nc}, respectively; they
are subject to some random variation [due to Z_c and χ²_p] of the same
order.
This creates additional trouble for finding out the ADR of T^S_{N_c^*}.
Fortunately, this relates to the case where

    N_c^{(r)} / n_c^* → (1 − γ) + U_r,  as c ↓ 0,                        (3.31)

where 0 < γ < 1 and U_r has a nondegenerate distribution over (0, ∞). This
representation allows us to use the central limit theorem for random sample
sizes [cf. Billingsley (1968, p. 147)], and it justifies the computation of
the ADR of T^S_{N_c^*} from its asymptotic distribution. It follows that the
ADR of T^S_{N_c^*} is less than (or equal to) that of T_{N_c}, for all λ.
Thus, the adoption of the stopping time N_c^*, c > 0, leads to a
stochastically smaller stopping time without violating the asymptotic risk
dominance of T^S_{N_c^*} over T_{N_c}.
But, viewed from a practical point of view, when θ is not close to the pivot
(even under (3.13)), T^S_{N_c^*} and T^S_{N_c} may be very close to each
other (with respect to their ADR), and {N_c, c > 0} signals a great
simplification of the computation of the ADR, without changing the stopping
time. Hence, one may as well favor N_c, c > 0, as a working rule.
REFERENCES

1. Billingsley, P. Convergence of Probability Measures. John Wiley, New
   York, 1968.
2. Ghosh, M., Nickerson, D.M. and Sen, P.K. Sequential shrinkage
   estimation. Ann. Statist. 15 (1987), 817-829.
3. Hoeffding, W. A class of statistics with asymptotically normal
   distribution. Ann. Math. Statist. 19 (1948), 293-325.
4. Sclove, S.L., Morris, C. and Radhakrishnan, R. Nonoptimality of
   preliminary test estimators for the mean of a multivariate normal
   distribution. Ann. Math. Statist. 43 (1972), 1481-1490.
5. Sen, P.K. On nonparametric sequential point estimation of location based
   on general rank order statistics. Sankhya Ser. A 42 (1980), 201-219.
6. Sen, P.K. Sequential Nonparametrics: Invariance Principles and
   Statistical Inference. John Wiley, New York, 1981.
7. Sen, P.K. On sequential nonparametric estimation of multivariate
   location. Proc. Third Prague Conf. Asymp. Meth. (Eds: Huskova et al.),
   1984, 119-130.
8. Sen, P.K. Sequential Stein-rule maximum likelihood estimation: General
   asymptotics. Statist. Dec. Th. & Rel. Top. IV (Eds: Gupta, S.S. and
   Berger, J.O.), 2 (1987), 195-208.
9. Sen, P.K. Sequential shrinkage U-statistics: General asymptotics. Rev.
   Brasileira de Prob. Estatist. 1 (1987), 1-21.
10. Sen, P.K. Functional jackknifing: Rationality and general asymptotics.
    Ann. Statist. 16 (1988), 450-469.
11. Sen, P.K. Asymptotic theory of sequential shrunken estimation of
    statistical functionals. Proc. 4th Prague Conf. Asymp. Meth. (Eds: M.
    Huskova et al.), 1989, in press.
12. Sen, P.K. and Ghosh, M. Sequential point estimation of estimable
    parameters based on U-statistics. Sankhya Ser. A 43 (1981), 331-344.
13. van Zwet, W.R. A Berry-Esseen bound for symmetric statistics. Zeit.
    Wahrsch. verw. Geb. 66 (1984), 425-440.