Sen, Pranab Kumar (1991). "Some Informative Aspects of Jackknifing and Bootstrapping."

SOME INFORMATIVE ASPECTS OF JACKKNIFING AND BOOTSTRAPPING
Pranab Kumar Sen
Departments of Biostatistics and Statistics
University of North Carolina
Chapel Hill, NC 27599-7400(3260), USA
The present study addresses a general query: Can an estimator be improved adaptively by a resampling plan? In this context, the conventional quadratic risk and the Pitman closeness criteria are incorporated in studying some informative aspects of the two principal resampling plans, namely, jackknifing and bootstrapping.
AMS Subject Classifications: 62G05, 62G99, 62F99
KEYWORDS AND PHRASES: Asymptotic bias; Asymptotic mean squared error;
Asymptotic median unbiasedness; Pitman closeness; Quadratic risk;
Shrinkage estimation; Statistical functionals.
1. INTRODUCTION
Let X_1, …, X_n be n independent and identically distributed random variables (i.i.d.r.v.) having a (cumulative) distribution function (d.f.) F, defined on the real line ℝ. Let θ = θ(F), a functional of the unknown d.f. F, be a parameter of interest; based on the sample {X_1, …, X_n}, we want to draw statistical conclusions on θ(F). Let F_n(x) = n^{-1} Σ_{i=1}^n I(X_i ≤ x), x ∈ ℝ, be the empirical (sample) d.f. Then a natural (nonparametric) estimator of θ(F) is T_n = θ(F_n). Although, sans the case of linear functionals, T_n may not be unbiased for θ(F), under quite general regularity conditions T_n is consistent, and as n increases,

    n^{1/2} {T_n − θ(F)} →_D N(0, v²(F)),                                (1.1)
where v²(F) (generally, a functional of the unknown d.f. F) is a nonnegative parameter. If the functional form of v²(F) is known and simple in nature, then v_n² = v²(F_n) can be taken as a consistent estimator of v²(F), so that for large values of n,

    n^{1/2} {T_n − θ(F)} / v_n →_D N(0, 1).                              (1.2)

This last result provides the key to drawing statistical conclusions on θ(F) for large n. Sans this simple situation, in many other cases a resampling scheme may be incorporated in estimating v²(F) in a nonparametric fashion. Among these, the jackknifing and bootstrapping methods are the most popular ones.
Let F_{n-1}^{(i)} be the empirical d.f. of {X_1, …, X_n} ∖ {X_i} and T_{n-1}^{(i)} = θ(F_{n-1}^{(i)}), and let

    T_{ni} = n T_n − (n−1) T_{n-1}^{(i)},   i = 1, …, n;                 (1.3)

    T_{nJ} = n^{-1} Σ_{i=1}^n T_{ni},   v_{nJ}² = (n−1)^{-1} Σ_{i=1}^n (T_{ni} − T_{nJ})².    (1.4)

Then T_{nJ} is the classical jackknifed version of T_n, and v_{nJ}² is the jackknifed (Tukey) estimator of v²(F).
If

    E(T_n) = θ(F) + n^{-1} β(F) + o(n^{-1}),

so that β(F) is taken as the asymptotic bias [of n(T_n − θ(F))], then E(T_{nJ}) = θ(F) + o(n^{-1}), so that jackknifing reduces the bias of an estimator. Under fairly general regularity conditions [viz., Sen (1988a,b)], as n increases,

    n {T_{nJ} − T_n} → −β(F) a.s.   and   v_{nJ}² → v²(F) a.s.           (1.5)
Thus, essentially, jackknifing leads to a bias reduction (up to the order n^{-1}) without compromising the other properties of T_n, and it provides a nonparametric estimator of v²(F) as well.
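As a computational aside, the pseudovalue construction in (1.3)-(1.4) is straightforward to implement. The following minimal Python sketch (with an exponential sample and the plug-in variance functional chosen purely for illustration) computes T_{nJ} and v_{nJ}².

    import numpy as np

    def jackknife(x, theta):
        # Pseudovalues (1.3), the jackknifed estimator T_nJ and the
        # Tukey variance estimator v_nJ^2 of (1.4).
        n = len(x)
        T_n = theta(x)
        # T_{n-1}^{(i)}: the functional recomputed with X_i deleted
        T_del = np.array([theta(np.delete(x, i)) for i in range(n)])
        pseudo = n * T_n - (n - 1) * T_del
        T_nJ = pseudo.mean()
        v_nJ2 = np.sum((pseudo - T_nJ) ** 2) / (n - 1)  # estimates v^2(F)
        return T_nJ, v_nJ2

    rng = np.random.default_rng(0)
    x = rng.exponential(size=50)
    # theta(F_n) = n^{-1} sum (X_i - Xbar)^2, the (biased) plug-in variance
    T_nJ, v_nJ2 = jackknife(x, lambda s: s.var())

For this particular functional, T_{nJ} coincides with the unbiased sample variance n(n−1)^{-1} s_n², in line with the bias reduction property noted above.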
Consider now M (usually large) independent samples, each of n independent observations drawn (with replacement) from the d.f. F_n. The observations in the i-th sample are denoted by X*_{i1}, …, X*_{in} and the corresponding empirical d.f. by F*_{n(i)}, for i = 1, …, M. Let then

    T*_{ni} = θ(F*_{n(i)}),   T*_{nB} = M^{-1} Σ_{i=1}^M T*_{ni};        (1.6)

    v_{nB}² = (M−1)^{-1} Σ_{i=1}^M n (T*_{ni} − T*_{nB})².               (1.7)
We term T*_{nB} a bootstrap version of T_n, and v_{nB}² the bootstrap variance estimator. Then, under essentially the same regularity conditions as in the case of jackknifing, as n → ∞, M → ∞,

    T*_{nB} − T_n →_P 0   and   v_{nB}² →_P v²(F).                       (1.8)
Moreover, if we consider the set {Y*_{ni} = n^{1/2} (T*_{ni} − T_n), 1 ≤ i ≤ M} and denote the empirical d.f. of the Y*_{ni} by G*_n (termed the bootstrap distribution), then, under parallel regularity conditions,

    G*_n →_w Φ(·; 0, v²(F)),   as M → ∞, n → ∞,                          (1.9)

where Φ(x; a, b²) stands for the d.f. of a normal (a, b²) r.v. [viz., Efron (1982)]. Although in the literature it has been emphasized that (1.9) may hold even if Φ were not a normal d.f., such a claim may not be universally true [viz., Athreya (1987)].
Further, for the computation of T*_{nB}, v_{nB}² or G*_n, one needs to generate nM (= M*, say) i.i.d.r.v.'s from F_n. Although the advent of modern computers has made it possible to accomplish this task without much pain (even for large values of n and M!), there remains some arbitrariness in the proper choice of M (relative to n) as well as in the outcome (T*_{nB}, v_{nB}², G*_n).
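In the same computational vein, a minimal Python sketch of (1.6)-(1.7), together with the bootstrap distribution G*_n of (1.9), may be written as follows (the exponential sample and the plug-in variance functional are again purely illustrative).

    import numpy as np

    def bootstrap(x, theta, M, rng):
        # Bootstrap replicates T*_{ni} of (1.6), their average T*_{nB},
        # the variance estimator v_{nB}^2 of (1.7), and the points
        # Y*_{ni} = n^{1/2}(T*_{ni} - T_n) whose empirical d.f. is G*_n.
        n = len(x)
        T_n = theta(x)
        T_star = np.array([theta(rng.choice(x, size=n, replace=True))
                           for _ in range(M)])
        T_nB = T_star.mean()                                # (1.6)
        v_nB2 = n * np.sum((T_star - T_nB) ** 2) / (M - 1)  # (1.7)
        Y = np.sqrt(n) * (T_star - T_n)                     # support of G*_n
        return T_nB, v_nB2, Y

    rng = np.random.default_rng(1)
    x = rng.exponential(size=50)
    T_nB, v_nB2, Y = bootstrap(x, lambda s: s.var(), M=500, rng=rng)

The nM draws from F_n referred to above are visible here as the M calls to rng.choice, each of size n.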
Two bootstrap samplers may generate different sets of bootstrap samples from the common base sample (X_1, …, X_n), and may thereby end up with possibly different values of T*_{nB}, v_{nB}² and G*_n. Also, the bootstrap samples are drawn from F_n (and not directly from F). Albeit F_n being sufficient for F (under general regularity conditions), the information contained in F_n cannot presumably be equal to (or more than) that in F. Hence, we may pose a general query: Can T*_{nB} (or T_{nJ}) be regarded as an improved version of T_n? That is, is it possible to improve an estimator adaptively by a resampling plan? Incorporating the conventional quadratic risk and the Pitman closeness criteria, we intend to study some informative aspects of jackknifing and bootstrapping, the two popular resampling schemes.
2. SINGLE PARAMETER CASE
First, let us consider the case of the jackknifed version T_{nJ} in (1.4). It follows from Sen (1977) that

    T_{nJ} = T_n + (n−1) E{(T_n − T_{n-1}) | ℬ_n},                       (2.1)

where ℬ_n = ℬ(X_{n:1}, …, X_{n:n}; X_{n+j}, j ≥ 1) is the sigma-field generated by the order statistics X_{n:1} ≤ … ≤ X_{n:n} and by X_{n+j}, j ≥ 1, so that ℬ_n is nonincreasing (↓) in n. Thus, T_{nJ} incorporates the bias correction through the conditional expectation in (2.1).
If {T_n, ℬ_n; n ≥ n_0} forms a reversed martingale sequence (so that E[T_{n-1} | ℬ_n] = T_n a.e., ∀ n ≥ n_0), as is the case with U-statistics, then T_{nJ} = T_n with probability one, ∀ n ≥ n_0, and hence they are equally informative. The main role of jackknifing in this case relates to the nonparametric estimation of v²(F). As is typically the case with nonlinear functionals (such as the von Mises functionals of degree ≥ 2) or nonlinear functions of reversed martingales, {T_n, ℬ_n; n ≥ n_0} may not strictly form a reversed martingale sequence (and may not be unbiased either). In such a case, (2.1) provides an operational rule for bias reduction, in addition to the variance estimation role of jackknifing.
In the so-called "smooth" case, we assume that

    E(T_n) = θ(F) + n^{-1} β(F) + o(n^{-1}),                             (2.2)

so that, under fairly general regularity conditions [viz., Sen (1988a,b)], (1.5) holds. In such a case, we have therefore

    Asymptotic mean squared error (AMSE) of n^{1/2}(T_n − θ(F))
        = AMSE of n^{1/2}(T_{nJ} − θ(F)) + n^{-1} β²(F) + o(n^{-1}),     (2.3)

so that, in the light of the usual quadratic risk criterion, T_{nJ} and T_n are asymptotically equivalent. T_{nJ} can still be regarded as an improved version of T_n, in the sense that the second term on the right-hand side of (2.3) is nonnegative, so that the bias correction in T_{nJ} has a variance reduction role (albeit a very insignificant one).
Next, we examine the picture in the light of the Pitman closeness criterion (PCC). Recall that for two competing estimators, say T_n^{(1)} and T_n^{(2)}, of a parameter θ, the PC measure may be defined as

    PCM_F(T_n^{(1)}, T_n^{(2)}) = P_F{|T_n^{(1)} − θ| < |T_n^{(2)} − θ|} − P_F{|T_n^{(1)} − θ| > |T_n^{(2)} − θ|},    (2.4)

and if (2.4) is nonnegative for all θ (and strictly positive for some θ), then T_n^{(1)} is said to be 'closer' to θ than T_n^{(2)}, in the Pitman (1937) sense. If (2.4) converges to 0, uniformly in θ, {T_n^{(1)}} and {T_n^{(2)}} are said to be asymptotically PC-equivalent. If, for some positive integer k, (2.4) is O(n^{-(k-1)/2}), uniformly in θ (in a closed interval), {T_n^{(1)}} and {T_n^{(2)}} are said to be k-th order PC-equivalent. In the current context, θ = θ(F), and hence we shall conceive of a compact family ℱ of d.f.'s and allow F to be a member of ℱ. Thus, we have
    PCM_F(T_{nJ}, T_n) = P_F{(T_{nJ} − T_n)² + 2(T_{nJ} − T_n)(T_n − θ(F)) < 0}
                         − P_F{(T_{nJ} − T_n)² + 2(T_{nJ} − T_n)(T_n − θ(F)) > 0}
                       = P_F{(n−1)(T_{nJ} − T_n)(T_{nJ} + T_n − 2θ(F)) < 0}
                         − P_F{(n−1)(T_{nJ} − T_n)(T_{nJ} + T_n − 2θ(F)) > 0}.    (2.5)
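When F is known, the measure in (2.4)-(2.5) can be approximated by simulation. The sketch below (an illustrative Monte Carlo check, with an exponential(1) d.f. assumed, for which θ(F) = Var(X) = 1) compares the plug-in variance T_n = s_n² with its jackknifed version, which for this functional is the unbiased sample variance.

    import numpy as np

    def pcm(est1, est2, theta, n, reps, rng):
        # Monte Carlo approximation of (2.4):
        # P{|T1 - theta| < |T2 - theta|} - P{|T1 - theta| > |T2 - theta|}
        closer = farther = 0
        for _ in range(reps):
            x = rng.exponential(size=n)
            d1, d2 = abs(est1(x) - theta), abs(est2(x) - theta)
            closer += d1 < d2
            farther += d1 > d2
        return (closer - farther) / reps

    rng = np.random.default_rng(2)
    # est1 = T_nJ (unbiased sample variance), est2 = T_n = s_n^2
    print(pcm(lambda s: s.var(ddof=1), lambda s: s.var(),
              theta=1.0, n=20, reps=20000, rng=rng))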
Note that, by (1.2), {T_n} is {n^{1/2}}-consistent, so that if, for a positive integer k,

    lim_{n→∞} sup_{F∈ℱ} n^{(k-1)/2} |P_F{T_n ≤ θ(F)} − 1/2| = 0,         (2.6)

we term {T_n} k-th order asymptotically median unbiased (AMU-k) for θ(F) [viz., Akahira and Takeuchi (1979)].
Note that, by virtue of (1.2), n^{1/2}{med(T_n) − θ(F)} → 0 as n → ∞ (a.e. F), so that (2.6) holds for some k ≥ 1. On the other hand, if in (1.5) β(F) ≠ 0, then n^{-1/2}β(F) contributes a term of the order n^{-1/2} to P{T_{nJ} + T_n − 2θ(F) < 0}, so that even if {T_n} is AMU-2, (2.5) is O(n^{-1/2}) (not o(n^{-1/2})). Thus, by virtue of (1.2), (1.5) and (2.6), we obtain that, for β(F) ≠ 0,

    PCM_F(T_{nJ}, T_n) = O(1) if T_n is AMU-1, and = O(n^{-1/2}) if T_n is AMU-k, k ≥ 2.    (2.7)

If β(F) = 0 (a.e. F) and {T_n} is AMU-k for some k ≥ 2, then the O(n^{-1/2}) in (2.7) can be replaced by o(n^{-1/2}). Therefore, {T_n} and {T_{nJ}} are asymptotically first order PC-equivalent; for β(F) = 0 and {T_n} AMU-k for some k ≥ 2, {T_n} and {T_{nJ}} are 2nd order PC-equivalent.
In passing, we may remark that if β(F) > 0 and med(T_n) ≥ θ(F) + ½ n^{-1} β(F) (a.e. F), then (2.5) is ≥ 0; similarly, if β(F) < 0 and med(T_n) ≤ θ(F) + ½ n^{-1} β(F) (a.e. F), then (2.5) is ≥ 0. Thus, the PC dominance of {T_{nJ}} over {T_n} may be judged by looking at β(F) and med(T_n); but this effect will be imperceptible for large n [by virtue of (2.7)].
Let us next consider a 'non-smooth' case where (2.2) may not hold, and the picture may be somewhat different. In the context of asymptotic normality and other related properties of n^{1/2}(T_n − θ(F)), it is not uncommon to assume that T_n admits a first order asymptotic expansion [see Jureckova (1984)]:

    T_n − θ(F) = n^{-1} Σ_{i=1}^n φ_F(X_i) + R_n,                        (2.8)

where the φ_F(X_i) are i.i.d.r.v. with

    E φ_F(X_i) = 0,   0 < v²(F) = E φ_F²(X_i) < ∞,   and   n^{1/2} R_n → 0, as n → ∞.    (2.9)

Such a representation is actually justified by the classical Hoeffding-Hajek projection of T_n into independent summands, and in the literature φ_F(x) is known as the influence function of T_n (at x, F).
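For the variance functional θ(F) = Var(X), for instance, the influence function is φ_F(x) = (x − μ)² − θ(F), with μ the mean of F, and the remainder in (2.8) is exactly R_n = −(X̄_n − μ)² = O_p(n^{-1}). A short numerical check follows (an illustration only; the exponential(1) d.f., for which μ = θ(F) = 1, is an assumption of the example).

    import numpy as np

    rng = np.random.default_rng(3)
    mu, theta = 1.0, 1.0              # mean and variance of exponential(1)
    n = 2000
    x = rng.exponential(size=n)

    T_n = x.var()                     # plug-in variance theta(F_n)
    phi = (x - mu) ** 2 - theta       # influence function values
    R_n = T_n - theta - phi.mean()    # remainder in (2.8)
    print(np.sqrt(n) * R_n)           # small: R_n = -(Xbar - mu)^2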
Let us write φ̄_{Fn} = n^{-1} Σ_{i=1}^n φ_F(X_i), n ≥ 1, and note that if T_n were a linear functional, then R_n = 0 with probability 1, and we would have

    n(T_n − T_{n-1}) = n(T_n − θ(F)) − n(T_{n-1} − θ(F)) = n(n−1)^{-1} {φ_F(X_n) − φ̄_{Fn}}.    (2.10)

Now, by the Khintchine strong law of large numbers, φ̄_{Fn} → 0 a.s., as n → ∞, while, by (2.9), n^{-1/2} {max_{1≤i≤n} |φ_F(X_i)| / v(F)} → 0 a.s., as n → ∞. Hence, from (2.10), we have

    n(T_n − T_{n-1}) = o(n^{1/2}) a.s., as n → ∞.                        (2.11)
Motivated by (2.11), we assume that {T_n} admits a first order representation [as in (2.8)-(2.9)] and that, in addition, (2.11) holds. Note that (2.8) and (2.11) ensure that

    n(R_n − R_{n-1}) = o(n^{1/2}) a.s., as n → ∞,                        (2.12)

and hence, using a version of the Hewitt-Savage zero-one law and defining ℬ_n as in (2.1), we obtain that

    E{n(R_n − R_{n-1}) | ℬ_n} = o(n^{-1/2}) a.s., as n → ∞.              (2.13)
As such, from (2.1), (2.8) and (2.13), we have, under (2.11),

    (n−1) E[(T_n − T_{n-1}) | ℬ_n]
        = n^{-1}(n−1) [E{n(n−1)^{-1}(φ_F(X_n) − φ̄_{Fn}) | ℬ_n} + E{n(R_n − R_{n-1}) | ℬ_n}]
        = n^{-1}(n−1) E{n(R_n − R_{n-1}) | ℬ_n}
        = o(n^{-1/2}) a.s., as n → ∞,                                    (2.14)

since, by exchangeability, E{φ_F(X_n) | ℬ_n} = φ̄_{Fn}. Consequently, by (2.1) and (2.14), we have

    T_{nJ} = T_n + o(n^{-1/2}) a.s., as n → ∞.                           (2.15)
As such, parallel to (2.3), we have for large n

    AMSE of n^{1/2}(T_n − θ(F)) = AMSE of n^{1/2}(T_{nJ} − θ(F)) + o(1),  (2.16)

and, parallel to (2.7), we have

    PCM_F(T_{nJ}, T_n) → 0, as n → ∞.                                    (2.17)

If we desire to improve the asymptotic equivalence results in (2.16) and (2.17), we may need a suitable rate of convergence in (2.15), and this, in turn, may call for more refined results than in (2.11)-(2.12).
Let us now consider parallel results for the bootstrap estimator T*_{nB}. Since, for each i (= 1, …, M), T*_{ni} = θ(F*_{n(i)}), under the usual regularity conditions on θ(·) and F (⇒ F_n ⇒ F*_{n(i)}), we may assume that, given F_n, the T*_{ni} admit a first order representation:

    T*_{ni} − T_n = n^{-1} Σ_{j=1}^n φ_{F_n}(X*_{ij}) + R*_{ni},   1 ≤ i ≤ M,    (2.18)

where the φ_{F_n}(X*_{ij}) are conditionally (given F_n) i.i.d.r.v.'s with

    E{φ_{F_n}(X*_{ij}) | F_n} = 0,   E{φ²_{F_n}(X*_{ij}) | F_n} = v²(F_n),   and   v²(F_n) → v²(F) a.s., as n → ∞.    (2.19)
Moreover, the R*_{ni} are conditionally (given F_n) i.i.d.r.v.'s, and

    n E(R*²_{ni} | F_n) → 0, in the first mean, as n → ∞.                (2.20)

Thus, on letting R*_{nB} = M^{-1} Σ_{i=1}^M R*_{ni}, we have

    T*_{nB} − T_n = (nM)^{-1} Σ_{i=1}^M Σ_{j=1}^n φ_{F_n}(X*_{ij}) + R*_{nB},    (2.21)

where, invoking the i.i.d. (given F_n) structure of the R*_{ni},

    n E(R*²_{nB} | F_n) ≤ n E(R*²_{n1} | F_n) → 0, in the first mean, as n → ∞.    (2.22)
From (2.21) and (2.22), we have

    n E_F(T*_{nB} − θ(F))² = n E_F[(T*_{nB} − T_n) + (T_n − θ(F))]²
        = n E_F{E([(T*_{nB} − T_n) + (T_n − θ(F))]² | F_n)}
        = M^{-1} E_F[v²(F_n)] + n E_F(R*²_{nB})
          + n E_F(T_n − θ(F))² + 2 E_F[n R*_{nB}(T_n − θ(F))].           (2.23)

The first two terms on the right-hand side of (2.23) are both nonnegative; the first term behaves like M^{-1} v²(F), while the second one converges to 0 as n → ∞. Similarly, the last term (not affected by M) converges to 0 as n → ∞. Actually, when R*_{n1} has expectation (given F_n) equal to 0 (a.e. F_n), this last term is identically equal to 0. But, in (2.21), R*_{nB} contains the conditional bias of T*_{nB} (given F_n), and hence, in general, it may not have mean identically equal to 0. This conditional bias, however, is generally o(n^{-1/2}), and this supports (2.20).
Hence, from (2.23), we have

    AMSE of n^{1/2}(T*_{nB} − θ(F)) = AMSE of n^{1/2}(T_n − θ(F)) + M^{-1} v²(F) + o(1), as n → ∞,    (2.24)

so that the bootstrap version T*_{nB} is asymptotically (as n → ∞) equally informative as the original T_n only when M is chosen large.
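A crude Monte Carlo check of (2.24) is easy to set up; in the sketch below (again with the exponential(1) d.f. and the plug-in variance functional assumed only for illustration), the simulated mean squared error of n^{1/2}(T*_{nB} − θ(F)) should exceed that of n^{1/2}(T_n − θ(F)) by roughly M^{-1} v²(F), apart from smaller order bias terms of the kind discussed below.

    import numpy as np

    def amse_pair(n, M, reps, rng):
        # Simulated MSE of sqrt(n)(T_n - theta) and sqrt(n)(T*_nB - theta)
        theta = 1.0
        se_n = se_B = 0.0
        for _ in range(reps):
            x = rng.exponential(size=n)
            T_n = x.var()
            T_B = np.mean([rng.choice(x, size=n).var() for _ in range(M)])
            se_n += n * (T_n - theta) ** 2
            se_B += n * (T_B - theta) ** 2
        return se_n / reps, se_B / reps

    rng = np.random.default_rng(4)
    print(amse_pair(n=100, M=20, reps=500, rng=rng))
    # For exponential(1), v^2(F) = E(X - 1)^4 - 1 = 8, so the predicted
    # excess is about 8 / M = 0.4 here.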
It is interesting to compare (2.3) and (2.24); whereas by jackknifing we are able to reduce the AMSE (albeit insignificantly), by bootstrapping we have an increased value of the AMSE, unless M is very large. In fact, under the regularity conditions pertaining to (1.5), it can be shown that n E(R*²_{nB}) = O(n^{-1}), uniformly in M, so that in (2.24) the o(1) can be written as O(n^{-1}). Hence, in order to limit the excess of AMSE[n^{1/2}(T*_{nB} − θ(F))] over AMSE[n^{1/2}(T_n − θ(F))] to O(n^{-1}) [as in (2.3)], we may need M = O(n). However, compared to (2.16), for (2.24) it suffices to take M large (without necessarily being O(n)). On the other hand, there is a basic concern with M = O(n) when θ(·) is not a linear functional.
To illustrate this point, consider a 'smooth' functional θ(·) admitting a second order expansion, for which in (2.18) we may write [as in Sen (1988a,b)]

    R*_{ni} = n^{-1} b_n(F*_{n(i)}) + R**_{ni},   i ≥ 1,                 (2.25)

where |b_n(F*_{n(i)}) − b_n(F_n)| → 0 a.s. (a.e. F_n), as n → ∞, while R**_{ni} has conditional expectation, given F_n, o(n^{-1}) and E(R**²_{ni} | F_n) = o(n^{-2}). We assume further that b_n(F_n) → β*(F) a.s., as n → ∞, where |β*(F)| < ∞. If we use (2.25) in the definition of R*_{nB}, we have

    R*_{nB} = n^{-1} b_n(F_n) + o_p(n^{-1}) = n^{-1} β*(F) + o_p(n^{-1}), as n → ∞.    (2.26)

[Note that β*(F) may not be equal to β(F) in (1.5).] Under appropriate regularity assumptions [on the b_n(F*_{n(i)})], we may even replace o_p(n^{-1}) by o(n^{-1}) in second mean, so that, for M/n → λ (0 < λ < ∞), we obtain that, given F_n,

    (nM)^{1/2} (T*_{nB} − T_n) →_D N(λ^{1/2} β*(F), v²(F_n)),            (2.27)

and the AMSE of n^{1/2}[T*_{nB} − θ(F)] is given by

    AMSE of n^{1/2}[T*_{nB} − θ(F)]
        = AMSE of n^{1/2}[T_n − θ(F)] + n^{-1} {λ^{-1} v²(F) + β*²(F) + 2β(F)β*(F)} + o(n^{-1}),    (2.28)

where in (2.28) we have made use of (2.23), (2.26) and (2.2) (for T_n − θ).
Note that β*(F) may be different from 0 even if β(F) = 0, and, in general, β*(F) and β(F) may not be equal or of the same sign. However, noting that the AMSE of n^{1/2}[T_n − θ(F)] = v²(F) + n^{-1} β²(F) + o(n^{-1}), we obtain that (2.28) can be written as

    v²(F) + n^{-1} {λ^{-1} v²(F) + (β*(F) + β(F))²} + o(n^{-1}).         (2.29)

It may be recalled that β(F) arises mainly due to the expected value of the second order compact derivative of θ(·) at F, and, similarly, b_n(F_n) comes from the conditional expectation (given F_n) of the second order compact derivative of θ(·) at F_n, when we assume that θ(·) is second order Hadamard differentiable in a neighborhood of F (containing the F_n a.s., as n → ∞).
Since typically this second order compact derivative is neither a constant nor a linear functional, we may not have

    E_F b_n(F_n) = b_n(E_F F_n) = b_n(F),                                (2.30)

and hence the convergence of b_n(·) to some b(·) may not necessarily imply that E_F b_n(F_n) → b(F). Now (2.29) clearly illustrates the role of the asymptotic bias terms β(F) and β*(F), and of λ (= M/n), in the inflation of the AMSE of the bootstrap estimator.
In some simple situations, β(F) = β*(F), and (2.29) indicates that bootstrapping incorporates a four times effect of the bias square in the second order term of its AMSE, as well as a factor O(M^{-1}) due to bootstrapping. As an illustration, consider the simple functional:

    θ(F) = Variance of X.                                                (2.31)

Here β(F) = −θ(F), θ(F_n) = n^{-1} Σ_{i=1}^n (X_i − X̄_n)² = s_n², and b_n(F_n) = −s_n² → −θ(F) a.s., as n → ∞, so that β*(F) = β(F) = −θ(F). Thus, using the biased estimator s_n² of θ(F), bootstrapping has a fourfold effect of (bias)² in its AMSE. If we use T_n = n(n−1)^{-1} s_n², the unbiased estimator of θ(F), we would have β*(F) = β(F) = 0; nevertheless, M^{-1} v²(F) reflects the excess due to bootstrapping.
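A back-of-the-envelope evaluation of the second order coefficient in (2.29) for this example (with the exponential(1) d.f. of the earlier sketches assumed, so that v²(F) = 8 and β(F) = β*(F) = −1, and with λ = 1 taken for illustration) makes the fourfold effect explicit.

    # Second-order coefficient in (2.29): lam^{-1} v^2(F) + (beta* + beta)^2
    v2, beta, lam = 8.0, -1.0, 1.0
    plug_in  = v2 / lam + (2.0 * beta) ** 2   # biased s_n^2: 8 + 4 = 12
    unbiased = v2 / lam                       # beta = beta* = 0: just 8
    print(plug_in, unbiased)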
If the regularity conditions pertaining to (2.26) do not hold [viz., θ(F) = F^{-1}(p) for some p ∈ (0,1)], (2.28) may involve other terms O(n^{-λ}) for some 1 > λ > 0, and hence the excess may become O(n^{-λ}), for some λ ∈ (0,1), even when we choose M = O(n). Hence, if λ is known, M may as well be chosen as O(n^λ), and there is no incentive in choosing M unnecessarily large. The picture becomes somewhat diffused for moderate values of n, and hence there is a good deal of interest in the moderate sample asymptotics in this context.
Let us next consider the PC performance of the bootstrap version T*_{nB} relative to T_n. Writing T*_{nB} − θ(F) = (T*_{nB} − T_n) + (T_n − θ(F)) and proceeding as in (2.5), we obtain that

    PCM_F(T*_{nB}, T_n) = E_F[P{(T*_{nB} − T_n)² + 2(T*_{nB} − T_n)(T_n − θ(F)) < 0 | F_n}
                          − P{(T*_{nB} − T_n)² + 2(T*_{nB} − T_n)(T_n − θ(F)) > 0 | F_n}].    (2.32)

First consider the "smooth" case where, under appropriate regularity conditions, we invoke (2.27), (1.1), (2.2), and assume that M/n → λ: 0 < λ < ∞. We write

    (T*_{nB} − T_n)² + 2(T*_{nB} − T_n)(T_n − θ(F)) ≤ (≥) 0 according as
    M^{-1/2} {(Mn)^{1/2}(T*_{nB} − T_n)}² + 2 {(Mn)^{1/2}(T*_{nB} − T_n)}{n^{1/2}(T_n − θ(F))} ≤ (≥) 0,    (2.33)

where, by (2.27), {(Mn)^{1/2}(T*_{nB} − T_n)}² = O_p(1), and, by (1.1), T_n is AMU-1, so that, by (2.33) and some routine steps, (2.32) converges to 0 as n (or M) → ∞. Thus,

    PCM_F(T*_{nB}, T_n) → 0, as n → ∞,                                   (2.34)

and hence {T*_{nB}} and {T_n} are asymptotically PC-equivalent.
Note that if T_n is AMU-2 for θ(F), then, by (2.27) and (2.33), (2.32) is O(M^{-1/2}) = O(n^{-1/2}), so that the picture is quite comparable to (2.7). Actually, for (2.34) to hold, we do not need (2.27). Recall that n^{1/2}(T*_{nB} − T_n) →_P 0 as n → ∞, and hence, dropping the factor M on the right-hand side of (2.33), we may still claim that (2.34) holds. Thus, for the asymptotic PC-equivalence result in (2.34), (1.1) and (1.8) suffice. But, to obtain an O(n^{-1/2}) rate for (2.34), a conditional AMU-k property of T*_{nB} (given F_n), for k ≥ 2, may require more elaborate analysis than in the case of T_n.
Let us compare T_{nJ} and T*_{nB} in the light of the PCM. Unfortunately, due to a possible lack of transitivity [cf. Blyth (1972)], (2.7) and (2.34) may not by themselves convey the entire picture. However, we may proceed as in (2.32) and write

    PCM_F(T*_{nB}, T_{nJ}) = E_F[P{(T*_{nB} − T_{nJ})² + 2(T*_{nB} − T_{nJ})(T_{nJ} − θ(F)) < 0 | F_n}
                             − P{(T*_{nB} − T_{nJ})² + 2(T*_{nB} − T_{nJ})(T_{nJ} − θ(F)) > 0 | F_n}].    (2.35)

If T_{nJ} is median unbiased (MU) for θ(F) while, conditionally given F_n, T*_{nB} is MU for T_{nJ}, then (2.35) is ≤ 0, so that, in the light of the PCM, T_{nJ} is closer to θ(F) than T*_{nB}. In a variety of situations, particularly in the location-scale models, T_{nJ} (≈ T_n) has a d.f. symmetric about θ(F), so that T_{nJ} is MU for θ(F). In scale models, T_n may be so adjusted by a scalar factor that T_{nJ} becomes MU for θ(F), and the choice of such a jackknifed version may be made on the ground of equivariance, as has been elaborated in Ghosh and Sen (1989) and Nayak (1990).
On the other hand, F_n is a discrete d.f. with probability mass n^{-1} at the given points X_1, …, X_n, and hence the conditional distribution of T*_{nB}, given F_n, may not be symmetric about T_n (or T_{nJ}), or may not even have median equal to T_{nJ}.
On the set (of F_n) where the conditional median of T*_{nB} − T_{nJ}, given F_n, is nonnegative, (2.35) is ≤ 0, while on the complementary set, (2.35) may be either ≥ 0 or ≤ 0, depending on the dispersion of T*_{nB} − T_{nJ}. Invoking symmetry of F and of the functional θ(·), as is the case in R-, M- and L-estimation of location, we may argue that the part of the sample space (in ℝⁿ) for which the conditional median of T*_{nB} − T_{nJ}, given F_n, is ≥ 0 has probability 1/2, and hence, in such a case, (2.35) is ≤ 0, so that T_{nJ} dominates T*_{nB} in the PCC. This argument may run into difficulties in a general case where such symmetric structures are not easy to comprehend.
Nevertheless, we may argue as after (2.32) and show that

    PCM_F(T*_{nB}, T_{nJ}) → 0, as n → ∞,                                (2.36)

while in the "smooth" case, (2.36) can be strengthened to O(n^{-1/2}) when M = O(n) and both T_{nJ} and T*_{nB} are AMU-k for some k ≥ 2. In the small to moderate sample cases, by a skillful choice of T_{nJ}, we expect a better PC performance than with T*_{nB}, although for large values of n (and M) this effect is imperceptible.
In the formulation of the bootstrap estimator T*_{nB}, we have adopted the so-called 'equal probability sampling' scheme, so that the T*_{ni}, i ≥ 1, are conditionally, given F_n, i.i.d.r.v.'s. There are some other variants of bootstrap sampling schemes, primarily designed to reduce M subject to a level of accuracy of the bootstrap estimators comparable to that of the equal probability sampling scheme; balanced sampling, importance sampling and a few other schemes have been incorporated in this context. We may refer to Hall (1989a,b), where other references have been cited. So long as (2.18) is justifiable for the individual bootstrap sample estimates, we may as well derive a variant form of (2.21), and thereby verify that the conclusions made before all pertain to such schemes, allowing M to be comparatively small.
Nevertheless, one needs to assume that M → ∞ as n → ∞, although the rate is generally somewhat slower. Judged from the PCM point of view, it seems more pertinent to choose an MU T_{nJ} and to adopt a bootstrap sampling scheme such that the conditional median of T*_{nB}, given F_n, is ≤ T_{nJ} more often than being > T_{nJ}. That would bring (2.35) close to 0 and make T*_{nB} quite favorably comparable to T_{nJ}.
3. MULTI-PARAMETER CASE
Consider a generalization of the model in Section 1 where

    θ = θ(F) = (θ_1(F), …, θ_p(F))′                                      (3.1)

is a p-vector (p ≥ 1) of functionals of the underlying d.f. F, and the d.f. F may as well be a multivariate one; the sample observations are denoted by X_1, …, X_n. We denote the empirical d.f. by F_n and consider the estimator (vector)

    T_n = (T_{n1}, …, T_{np})′ = (θ_1(F_n), …, θ_p(F_n))′.               (3.2)

As a natural extension of (1.1), we have, under appropriate regularity conditions,

    n^{1/2} (T_n − θ(F)) →_D N_p(0, Σ(F)),                               (3.3)

where the nonnegative definite (n.n.d.) matrix Σ(F), a matrix-valued functional of F, may again be consistently estimated by the jackknifing or bootstrapping methods; we denote these estimators by V_{nJ} and V*_{nB}, respectively.
For the jackknifed version of T_n, we define the pseudovalues (vectors) T_{ni} as in (1.3), and let [as in (1.4)]

    T_{nJ} = n^{-1} Σ_{i=1}^n T_{ni};   V_{nJ} = (n−1)^{-1} Σ_{i=1}^n (T_{ni} − T_{nJ})(T_{ni} − T_{nJ})′.    (3.4)

For the bootstrapped version of T_n, we define the T*_{ni} as in (1.6), and let

    T*_{nB} = M^{-1} Σ_{i=1}^M T*_{ni};   V*_{nB} = (M−1)^{-1} Σ_{i=1}^M n (T*_{ni} − T*_{nB})(T*_{ni} − T*_{nB})′.    (3.5)
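A vector version of the earlier jackknife sketch yields T_{nJ} and the covariance estimator V_{nJ} of (3.4); here θ maps a sample to a p-vector, and the pair (mean, variance) below is only an illustrative choice.

    import numpy as np

    def jackknife_vector(x, theta):
        # Vector pseudovalues as in (1.3) and the estimator V_nJ of (3.4)
        n = len(x)
        T_n = theta(x)
        T_del = np.array([theta(np.delete(x, i, axis=0)) for i in range(n)])
        pseudo = n * T_n - (n - 1) * T_del      # n x p array
        T_nJ = pseudo.mean(axis=0)
        c = pseudo - T_nJ
        V_nJ = c.T @ c / (n - 1)                # estimates Sigma(F)
        return T_nJ, V_nJ

    rng = np.random.default_rng(5)
    x = rng.exponential(size=200)
    T_nJ, V_nJ = jackknife_vector(x, lambda s: np.array([s.mean(), s.var()]))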
In the conventional case, we may consider a quadratic loss

    L_n(T_n, θ) = n (T_n − θ)′ Q (T_n − θ),                              (3.6)

where Q is a given n.n.d. matrix, and define

    ρ_n(θ) = E L_n(T_n, θ),   ρ_{nJ}(θ) = E L_n(T_{nJ}, θ),              (3.7)

    ρ_{nB}(θ) = E L_n(T*_{nB}, θ).                                       (3.8)

Then we may compare ρ_n(θ), ρ_{nJ}(θ) and ρ_{nB}(θ) to draw conclusions about the relative performance of the jackknifed and bootstrapped versions of T_n.
In the "smooth" case (where (2.2) holds in a vector-form). we have
parallel to (2.3). as n
Pn(~)
~
00.
= PnJ(~)
+ n
-1
-1
[~(F)]' g[~(F)] + o(n
(3.9)
).
so that the jackknifed version has a slight advantage over the classical
version. and this is mainly due to the "bias" reduction role of jackknifing.
Similarly. parallel to (2.24). we have
PnB(~)
= Pn(~)
-1
+ M
Tr(9£) + 0(1).
as
M~
00.
n
~oo.
(3.10)
1
and in (3.10). 0(1) can be replaced by O(M- ) if M = O(n) and other smoothness
conditions discussed after (2.24) hold in the vector case too.
From (3.9) and
(3.10). we may conclude that jackknifing may be generally (slightly) more
informative than bootstrapping.
Let us examine the relative picture in the light of the PCC (which we need to extend to the vector case too). With the definition of the loss function L_n(·,·) in (3.6), we extend (2.4) as

    GPCM_F(T_n^{(1)}, T_n^{(2)}) = P_F{L_n(T_n^{(1)}, θ(F)) < L_n(T_n^{(2)}, θ(F))}
                                 − P_F{L_n(T_n^{(1)}, θ(F)) > L_n(T_n^{(2)}, θ(F))}.    (3.11)

As such, as in (2.5), we have

    GPCM_F(T_{nJ}, T_n) = P_F{(T_{nJ} − T_n)′ Q (T_{nJ} + T_n − 2θ(F)) < 0}
                        − P_F{(T_{nJ} − T_n)′ Q (T_{nJ} + T_n − 2θ(F)) > 0}.    (3.12)
As in Sen (1991), we introduce the notion of multivariate median unbiasedness (MMU) and extend (2.6) as follows. If, for a positive integer k and an n^{1/2}-consistent estimator {T_n},

    lim_{n→∞} sup_{a∈U} n^{(k-1)/2} |P_F{a′(T_n − θ(F)) ≤ 0} − 1/2| = 0,  (3.13)

where U = {a : ||a|| = 1}, then {T_n} is k-th order asymptotically multivariate median unbiased (AMMU-k) for θ(F). Diagonal symmetry (or nearly so) of the d.f. of T_n around θ(F) ensures the MMU property. In the smooth case, taking into account (1.5) (coordinatewise), we see that whenever T_n is AMMU-k, for some k ≥ 1, (3.12) is o(1), so that T_{nJ} and T_n are asymptotically GPC-equivalent. An order of n^{-1/2} may also be obtained when (3.13) holds for some k ≥ 2. Also, for (3.12) to be o(1), it is not necessary to be confined only to the smooth case. If we assume that the first order representation in (2.8), along with (2.11), holds coordinatewise, then (2.17) directly extends to GPCM_F(T_{nJ}, T_n) → 0 as n → ∞. The treatment of the asymptotic GPC-equivalence of T*_{nB} and T_n runs parallel to that in (2.32) through (2.34), with direct adaptations from (3.12) and (3.13). Hence, omitting these steps, we claim that GPCM_F(T*_{nB}, T_n) → 0 as n → ∞. Finally, (2.35) also extends directly to the vector case, and hence, if T_{nJ} is MMU for θ(F), it may have an advantage over the bootstrap estimator T*_{nB} when n is not so large.
In the multiparameter case, T_n may not necessarily be optimal (even asymptotically), although in a coordinatewise setup an (asymptotic) optimality property can be established. In the literature, this effect is known as the Stein effect, and there exist some shrinkage estimators (also known as Stein-rule estimators) which dominate T_n in quadratic risk (as well as in other criteria). Stein-rule versions are typically non-linear, having a dominant "pre-test" flavor, and are generally biased. The introduction of the bias term is deliberate, with the objective of lowering the risk. On the other hand, jackknifing is primarily incorporated to reduce the 'bias' without compromising on the dispersion of an estimator. As such, it seems counter-intuitive to use jackknifing on Stein-rule estimators. In fact, in some simple multivariate models it has been shown [viz., Sen (1986)] that jackknifing a Stein-rule estimator may not reduce the risk any further (and may, rather, make it comparatively larger) and may fail to estimate the risk as well. If we assume that coordinatewise T_n satisfies the first order representation in (2.8) [along with (2.11)], then similar conclusions hold for jackknifed Stein-rule versions of T_n. As such, there is not much incentive in examining the informative aspects of such estimators. The situation may be somewhat different with bootstrapping, and hence we shall discuss some Stein-rule bootstrap versions of T_n for depicting this situation.
Recall that the bootstrap version of T_n considered here is

    T*_{nB} = M^{-1} Σ_{i=1}^M T*_{ni},                                  (3.14)

where the T*_{ni} are i.i.d. random vectors (conditionally on F_n), so that T*_{nB} really estimates T_n. Hence, it may be of some interest to inquire whether shrinking T*_{nB} towards T_n makes it more informative (on θ(F)). As such, we consider the following bootstrap Stein-rule estimator:
    T_{nB}^{*s} = T_n + {I − c_n d_n (ξ*_n)^{-1} Q^{-1} V_{nJ}^{-1}} (T*_{nB} − T_n),    (3.15)

where c_n is a positive number (converging to a limit c > 0), d_n = ch_p(Q V_{nJ}) is the smallest characteristic root of Q V_{nJ}, V_{nJ} is defined by (3.4), and

    ξ*_n = nM (T*_{nB} − T_n)′ V_{nJ}^{-1} (T*_{nB} − T_n).              (3.16)

It is also possible to replace V_{nJ} by the bootstrap version V*_{nB} defined by (3.5). But, conditional on F_n, V*_{nB} behaves like a stochastic matrix converging stochastically to V_{nJ}, while V_{nJ} is held fixed under the bootstrap sampling scheme. Hence, it is more natural to consider V_{nJ} instead of V*_{nB} in (3.15) and (3.16).
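Computationally, the shrinkage step in (3.15)-(3.16) amounts to a few lines of linear algebra. The following minimal sketch assumes T_n, T*_{nB}, V_{nJ} and Q are already available as NumPy arrays, with c supplied by the user.

    import numpy as np

    def stein_bootstrap(T_n, T_nB, V_nJ, Q, n, M, c):
        # Shrink T*_{nB} towards T_n as in (3.15), with d_n the smallest
        # characteristic root of Q V_nJ and xi*_n as in (3.16).
        p = len(T_n)
        d_n = np.linalg.eigvals(Q @ V_nJ).real.min()
        diff = T_nB - T_n
        xi = n * M * diff @ np.linalg.solve(V_nJ, diff)    # (3.16)
        shrink = np.eye(p) - (c * d_n / xi) * (np.linalg.inv(Q)
                                               @ np.linalg.inv(V_nJ))
        return T_n + shrink @ diff                         # (3.15)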
Recall that, by (3.15),

    T_{nB}^{*s} − θ(F) = [T_n − θ(F)] + {I − c_n d_n (ξ*_n)^{-1} Q^{-1} V_{nJ}^{-1}} (T*_{nB} − T_n),    (3.17)

so that we have

    E{(T_{nB}^{*s} − θ(F))′ Q (T_{nB}^{*s} − θ(F))}
        = E[(T_n − θ(F))′ Q (T_n − θ(F))]
          + 2 E{(T_n − θ(F))′ Q [I − c_n d_n (ξ*_n)^{-1} Q^{-1} V_{nJ}^{-1}] (T*_{nB} − T_n)}
          + E{(T*_{nB} − T_n)′ [I − c_n d_n (ξ*_n)^{-1} Q^{-1} V_{nJ}^{-1}]′ Q
              [I − c_n d_n (ξ*_n)^{-1} Q^{-1} V_{nJ}^{-1}] (T*_{nB} − T_n)}.    (3.18)
The last term on the right-hand side of (3.18) is nonnegative and is of the order (nM)^{-1}. For linear functionals, E{T*_{nB} − T_n | F_n} = 0, and in our setup E{T*_{nB} − T_n | F_n} = o_p(n^{-1/2}), while it can be O_p(n^{-1}) under additional regularity conditions [including M = O(n)]. Also, in the contemplated bootstrap sampling scheme, given F_n, as n increases,

    ξ*_n →_D χ²_p, a central chi square r.v. with p DF.                  (3.19)

As such, we may use a variant of the Stein identity in a conditional (but asymptotic) setup, and claim that, as n increases,

    E{[I − c_n d_n (ξ*_n)^{-1} Q^{-1} V_{nJ}^{-1}] (T*_{nB} − T_n) | F_n} = o_p(n^{-1/2}),    (3.20)

while it can as well be equal to 0 (a.e.) for linear functionals, or O(n^{-1}) under additional regularity conditions.
Since n^{1/2}(T_n − θ(F)) is asymptotically normal, it seems that the second term on the right-hand side of (3.18) can be made o(n^{-1}) in general, and O(n^{-3/2}) under additional regularity conditions; it may also be equal to 0 in some special cases. Thus,

    n E{(T_{nB}^{*s} − θ(F))′ Q (T_{nB}^{*s} − θ(F))}
        = n E{(T_n − θ(F))′ Q (T_n − θ(F))} + o(1) + O(M^{-1}), as n → ∞.    (3.21)

Consequently, shrinking T*_{nB} towards T_n may not provide any extra incentive unless M is large. Even so, T_{nB}^{*s} − T_n = O_p((nM)^{-1/2}), and hence such a shrinking is hardly of any perceptible effect. A very similar picture holds for the GPC criterion.
Next, we may enquire whether shrinking a bootstrap estimator towards an arbitrary pivot θ_0 makes it more appealing than the parallel shrinkage version of T_n. Without any loss of generality, we let θ_0 = 0. Consider then

    T_n^{s} = {I − c_n d_n ξ_{n0}^{-1} Q^{-1} V_{nJ}^{-1}} T_n,   ξ_{n0} = n T_n′ V_{nJ}^{-1} T_n,    (3.22)

and

    T_n^{*s} = {I − c_n d_n (ξ*_{n0})^{-1} Q^{-1} V_{nJ}^{-1}} T*_{nB},   ξ*_{n0} = nM (T*_{nB})′ V_{nJ}^{-1} T*_{nB}.    (3.23)

Again, we may write

    T_n^{*s} − θ(F) = (T*_{nB} − T_n) + (T_n − θ(F)) + W_n,              (3.24)

where

    W_n = −c_n d_n (ξ*_{n0})^{-1} Q^{-1} V_{nJ}^{-1} T*_{nB}.            (3.25)
In passing, we may remark that, in (3.16), conditional on F_n, ξ*_n has asymptotically (in probability) a central chi square distribution with p DF, since (nM)^{1/2}(T*_{nB} − T_n) has asymptotically the N_p(0, V_{nJ}) law, conditionally on F_n. For the last result, we need to presume that the asymptotic bias of (nM)^{1/2}(T*_{nB} − T_n), conditional on F_n, is negligible, i.e.,

    E{(T*_{nB} − T_n) | F_n} = o_p((nM)^{-1/2}).                         (3.26)

But E(T*_{nB} | F_n) = E(T*_{n1} | F_n), ∀ M, and hence (3.26) may also be written as

    E{(T*_{n1} − T_n) | F_n} = o_p((nM)^{-1/2}), as n → ∞.               (3.27)

In particular, if we let M = O(n), then (3.27) leads to o_p(n^{-1}).
Let us also note that

    E{(ξ*_{n0})^{-1} | F_n} = M^{-1} ξ_{n0}^{-1} {1 + o_p(1)},           (3.28)

so that E(ξ_{n0}^{-1} − (ξ*_{n0})^{-1}) is (asymptotically) a nonnegative quantity.
no
Thus.
operating with the loss function in (3.6). using (3.24) and incorporating a
two-stage expectation (first under F and then over F ). we obtain that
n
F_
-F
s
Ln (T
*. ~9(F»
'Vfl
= -F
F_
n
s
Ln (T
. ~9(F»
'Vfl
+
M- 1 Tr(Qv)
+ 0(1).
~
(3.29)
so that we have a similar conclusion that for large M. granted (3.27). TS * and
'Vfl
S
T are asymptotically (as n ~ 00) equivalent. while for finite M. the shrinkage
'Vfl
S
bootstrap estimator T
*
'Vfl
is less informative than T
S
'Vfl
if we use (3.24) and the GPC-criterion.
•
A similar result holds
In passing, we may remark that, in an asymptotic setup, the shrinkage estimator T_n^{s} behaves better than T_n only when θ(F) lies in a Pitman neighborhood of θ_0, i.e., n^{1/2} ||θ(F) − θ_0|| = O(1); otherwise, n^{1/2} ||T_n^{s} − T_n|| →_P 0, as n → ∞. As such, we need to confine ourselves to a Pitman neighborhood of the pivot θ_0 (= 0) when studying (3.29), and even so, the second term on the right-hand side clearly reflects the dominant role M plays in this context.
4. SOME GENERAL REMARKS
In Sections 2 and 3, we have tacitly assumed that T_n is closely approximable by a linear functional for which the asymptotic normality result holds under appropriate regularity conditions. There may be some situations where such a linear functional is degenerate, so that n^{1/2}(T_n − θ(F)) has asymptotically a degenerate distribution. In such a case, (1.9) is redundant. Much of the charm of bootstrapping may also vanish in such a non-regular case; Athreya's (1987) findings are pertinent in this context. In this setup, the use of a quadratic risk function may be dubious, and a generalized PC criterion may provide a better interpretation. In such a non-regular case, the variance estimation role of jackknifing may also become questionable. However, the bias reduction role of jackknifing remains a dominant one, especially because the bias of T_n and its asymptotic standard error may be of the same order. Thus, from this point of view too, jackknifing may have some advantages over bootstrapping.
Let us consider some other situations which may arise when T_n is linearly approximable, but not too smoothly. As an example, consider the 'bundle strength of filaments' model [Sen (1973)], where X_1, …, X_n are i.i.d.r.v.'s with a d.f. F defined on ℝ⁺, and where

    T_n = n^{-1} { max_{1≤i≤n} (n − i + 1) X_{n:i} },                    (4.1)

and the X_{n:i} are the order statistics corresponding to the X_i. We may rewrite T_n as

    T_n = sup{ x [1 − F_n(x−)] : x ∈ ℝ⁺ },                               (4.2)

and define

    θ(F) = sup{ x [1 − F(x)] : x ∈ ℝ⁺ }.                                 (4.3)
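The functional in (4.1)-(4.2) is simple to compute from the order statistics; a short Python sketch follows (the exponential d.f. is assumed only for the illustration, for which θ(F) = sup_x x e^{-x} = e^{-1} ≈ 0.368).

    import numpy as np

    def bundle_strength(x):
        # T_n of (4.1): with X_{n:1} <= ... <= X_{n:n} the order statistics,
        # T_n = n^{-1} max_i (n - i + 1) X_{n:i} = sup_x x [1 - F_n(x-)]
        xs = np.sort(x)
        n = len(xs)
        return np.max((n - np.arange(n)) * xs) / n

    rng = np.random.default_rng(6)
    print(bundle_strength(rng.exponential(size=1000)))   # near e^{-1}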
Note that {T_n; n ≥ 1} forms a reverse sub-martingale, and if x_0 is the unique point such that x_0 (1 − F(x_0)) = θ(F), then, on letting

    M_n = x_0 [F_n(x_0) − F(x_0)] = n^{-1} Σ_{i=1}^n {x_0 [I(X_i ≤ x_0) − F(x_0)]},

we have

    n^{1/2} [(T_n − θ(F)) + M_n] → 0 a.s., as n → ∞.                     (4.4)

It has also been shown that

    n^{1/2} |E(T_n) − θ(F)| → 0, as n → ∞,                               (4.5)

although E(T_n) − θ(F) may not be O(n^{-1}). The asymptotic normality result for n^{1/2}(T_n − θ(F)) follows directly from (4.4) and (4.5), although jackknifing may not eliminate the bias up to the order n^{-1}.
Motivated by this example, we consider the case where, for a functional T_n = θ(F_n),

    E(T_n) = θ(F) + n^{-λ} β(F) + o(n^{-λ}), for some λ ∈ (1/2, 1].      (4.6)

In this case, we have

    E(T_{nJ}) = θ(F) + (1 − λ) n^{-λ} β(F) + o(n^{-λ}).                  (4.7)
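To see how (4.7) follows from (4.6), note that the pseudovalue construction in (1.3) gives E(T_{nJ}) = n E(T_n) − (n−1) E(T_{n-1}^{(1)}), so that, by (4.6),

    E(T_{nJ}) = θ(F) + [n^{1−λ} − (n−1)^{1−λ}] β(F) + o(n^{-λ}) = θ(F) + (1 − λ) n^{-λ} β(F) + o(n^{-λ}),

since n^{1−λ} − (n−1)^{1−λ} = (1 − λ) n^{-λ} + O(n^{-1−λ}); for λ = 1 the leading bias term cancels, recovering the usual bias elimination.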
As such, we do not expect (1.5) to hold; rather, we would have

    n^{λ} (T_{nJ} − T_n) → −λ β(F), in the mean, as n → ∞.               (4.8)

This result may play an intricate role in the assertion of the AMU-k property in (2.6), and may lead to the supremacy of one of T_n and T_{nJ} over the other in the PC sense; in the quadratic risk sense, we would have in (2.3) the term n^{-1} β²(F) replaced by n^{1−2λ} λ² β²(F). Thus, for λ < 1, we have a smaller discrepancy.
It is not necessary to assume that the T_n (= θ(F_n)) are functionals of F_n alone. We may as well consider arbitrary T_n satisfying (2.8) and (2.11) (for suitable φ_F) and draw similar conclusions on T_{nJ} and T*_{nB}. Besides L_n(·,·), other loss functions may as well be used in the multiparameter cases.
REFERENCES

[1] Akahira, M. and Takeuchi, K. (1979). The concept of asymptotic efficiency and higher order asymptotic efficiency in statistical estimation theory. Lecture Notes, Univ. of Tokyo, Japan.

[2] Athreya, K.B. (1987). Bootstrap of the mean in the infinite variance case. Ann. Statist. 15, 724-731.

[3] Blyth, C.R. (1972). Some probability paradoxes in choice from among random alternatives. J. Amer. Statist. Assoc. 67, 366-373.

[4] Efron, B. (1979). Bootstrap methods: another look at the jackknife. Ann. Statist. 7, 1-26.

[5] Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. SIAM Publ., Philadelphia.

[6] Ghosh, M. and Sen, P.K. (1989). Median unbiasedness and Pitman closeness. J. Amer. Statist. Assoc. 84, 1089-1091.

[7] Hall, P. (1989a). On efficient bootstrap simulation. Biometrika 76, 613-617.

[8] Hall, P. (1989b). Antithetic resampling for the bootstrap. Biometrika 76, 713-724.

[9] Jureckova, J. (1984). M-, L- and R-estimators. In Handbook of Statistics, Vol. 4: Nonparametric Methods (eds. P.R. Krishnaiah and P.K. Sen). North Holland, Amsterdam.

[10] Nayak, T. (1990). Estimation of location and scale parameters using generalized Pitman nearness criterion. J. Statist. Plan. Infer. 24, 259-268.

[11] Pitman, E.J.G. (1937). The closest estimators of statistical parameters. Proc. Camb. Phil. Soc. 33, 212-222.

[12] Sen, P.K. (1973). On fixed size confidence bands for the bundle strength of filaments. Ann. Statist. 1, 526-537.

[13] Sen, P.K. (1977). Some invariance principles relating to jackknifing and their role in sequential analysis. Ann. Statist. 5, 316-329.

[14] Sen, P.K. (1986). Whither jackknifing in Stein-rule estimation? Commun. Statist. Theor. Meth. 15, 2245-2266.

[15] Sen, P.K. (1988a). Functional jackknifing: rationality and general asymptotics. Ann. Statist. 16, 450-469.

[16] Sen, P.K. (1988b). Functional approaches in resampling plans: a review of some recent developments. Sankhya, Ser. A 50, 394-435.

[17] Sen, P.K. (1990). On the Pitman closeness of some sequential estimators. Sequen. Anal.

[18] Sen, P.K. (1991). Pitman closeness of statistical estimators: latent years and the renaissance. In Issues in Statistical Inference: Essays in Honor of D. Basu (eds. M. Ghosh and P.K. Pathak). IMS Lecture Notes Monogr. Ser. No. 17.

[19] Sen, P.K., Kubokawa, T. and Saleh, A.K.M.E. (1989). The Stein paradox in the sense of the Pitman measure of closeness. Ann. Statist. 17, 1375-1386.
Ann. Statist. 17.