Sen, Pranab Kumar; (1985).A Second-Order Asymptotic Distributional Representation M-Estimators with Discontinuous Score Functions."

A SHX::NIrORDER ASYMP'IDTIC DISTRIBurIaw. REPRESENrATION
M-ESTlMATORS WITH DISCONI'INlOUS SCORE FUNCTlffiS
by
Pranab Kurrar Sen
IEpartrrent of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mirreo Series No. 1485
May 19135
A SECOND-ORDER ASYMPTOTIC DISTRIBUTIONAL REPRESENTATION OF
M-ESTI~~TORS
WITH DISCONTINUOUS SCORE FUNCTIONS
By Jana Jureckova and Pranab Kumar Sen
I
Charles University. Prague. Czechoslovakia.
and University of North Carolina. Chapel Hill
For a nondecreasing score function having finitely
many jump-discontinuities. a representation of M-estimators
with the second order asymptotic distribution is established.
and the result is also extended to one-step versions of Mestimators.
Let {X.; i
Introduction.
1.
1
>
-
I} be a sequence of independent and
identically distributed random variables (i.i.d.r.v.) with a distribution
function (d.f.) F(x - 8). where 8 is an unknown location parameter.
~:
1
R
(1.1)
~
Let
1
R be a function such that for
fOO
A (t)
_00
~(x-t) dF(x).
t
R\
E:
O.
A(0)
Consider an estimating function
(1. 2)
M (t)
n
n
L:. I
1=
~(x.-t).
1
t
E:
I
R • n
>
1.
AMS Subject Classification: 60F17, 62E20. 62F35. 62G05
Key Words and Phrases: Jump discontinuity. M-estimator. one-step version
of M-estimator, random change of time, weak convergence of M-processes.
1) Work of this author was partially supported by the Office of Naval
Research, Contract No. N00014-83-K-0387.
2
The allied M-estimator 8
of 8 (corresponding to a fixed scale of F)
n
is defined as a solution of the equation
o·,
M (t)
(1.3)
n
•
under quite general regularity conditions, 8
estimator of 8.
(1.4)
8
For nondecreasing
1/2
n
may be written as
1
n
-2(su P{t: M (t) > O} + inf{t: M (t)
n
The existence of 8
of n
~,8
is a (weakly) consistent
n
n
<
n
0:).
in general as well as the boundedness (in probability)
~
18 n - 81 has been studied by a host of workers.
estimator T
n
For a general
e"
= Tn (xl""'xn ) of 8, denoting by IF T (') the influence
function of T , an asymptotic representation of the form
n
(1. 5)
R
n
o (1),
p
is of primary interest, and has been studied by many authors (viz.,
Serfling (1980), Huber (1981) and the
references cited therein).
A
representation of this type for }1-estimators of location, supplemented by
the order of R , was studied by Carroll (1978) for strongly consistent
n
versions of 8_n and for smooth
exact orders of R
n
~-functions.
.Jure~kova
for smooth as well as discontinuous
(1980) derived the
~-functions;
this
.-
result was extended to the regression model by Jureckova and Sen (198la,b).
Jureckova, Janssen and Veraverbeke (1985) obtained an analogous result for
M-estimators of general parameters (including the maximum likelihood estimator
•
3
and the Pitman estimator).
The representation in (1.5), supplemented by the order of R ,
n
has important applications.
Firstly, asymptotic relations of different
types of estimators (e.g., L-, M-, R-estimators), up to different orders
of equivalence, can be fruitfully studied with the aid of (1.5).
We may
~
refer to Jureckova (1983) for some of these developments.
M-estimation theory, generally,
en
Secondly, in
is not scale-equivariant, and hence,
either one-step versions or some other modifications (see, for example,
~
Jureckova and Sen (1984)) are usually adapted to suit the purpose.
Ap-
proximating an estimator by its one-step version or some other form, may
also be effectively studied by incorporating (1.5) for such aT.
n
Actually,
in sequential analysis (or in related areas), for such a representation,
one needs to be more precise on the order of R
n
(see for example,
Jure~kova and Sen (1982)), and this constitutes an important area of
research too.
There are numerous other applications of such a representation.
The next natural step in representations of type (1.5) (which we term
as the first order asymptotic representations) would be to supplement the
order of R
n
by the asymptotic distribution (if any) of the same.
A result
of this kind was derived by Kiefer (1967) in the context of Bahadur's
(19b6) representation of sample quantiles.
Asymptotic representations
supplemented by asymptotic distribution of remainder term will be referred
to as the second order asymptotic (distributional) representations; the
asymptotic distributions of R will be typically nonnormal.
n
The second order
asymptotic representation of M-estimator of location generated by a smooth
~'-function
"
.(1985), who used the method
was recently derived by Jureckova
of random change of time in the invariance principle connected with M-estimator.
4
~-
Our aim will be to extend this result to possibly discontinuous
functions.
More precisely, the representation (1.5) for M-estimator 6
n
takes
on the form
•
(1.6)
under quite general conditions, provided y
=
y(~,
F)
# O.
If
~
is
l 2
smooth (i.e., twice differentiable almost everywhere (a.e.), then R =0 (n- / )
n p
1 4
while, for ~ admitting finitely many jump discontinuities, R =0 (n- / ).
n p
Let further T
n
be an estimator of 8 admitting a first order representation
i.e. ,
(1.7)
•
where
(1.8)
II
~(x) dF(x)
o
and 0 < 0
~
00,
R
Then, for smooth
~,
as n
2
~
V
(1.9)
n(T
n
- 8) - na(T
- 8)2 + y-I{M (T ) - M (8)}
n
n n
n
~ ~l
•
~2
•
with a
= I
~(x)
dF(x)/2y and where ~
=
(~l'
~2) has a bivariate normal
~
distribution (see Jure~kova (1985».
(1.9) applies also for 8 , for the
n
least-square estimator (LSE) as well as for the maximum likelihood estimator
5
(MLE) in the role of T
n
it breaks down when
~
under general regularity conditions.
However,
is not smooth.
The primary objective of the present study is to focus on such
a second order representation in the case where
discontinuities.
~
admits of jump-
A different normalizing rate (in
~)
as well as a
different type of limiting law arises in this context comparing with
However, the result is in correspondence with that of Kiefer
(1. 9) .
(1967).
n l / 2 (T
(1.10)
n
Specifically, for an estimator T , satisfying (1.7), we have
D
- 8) -+ E,
n
and
n- l / 4 {n(T -8) - y-l[M (T ) - M (8)]}
n
n n
n
where, letting I[A] be the indicator function of the set A,
(1.11)
s
has a normal distribution with zero mean and W , W are independent
z
l
copies of Wiener process on [0,00).
Along with the preliminary notions, the main results are presented
in Section 2 and they are derived in Section 3.
The method used there
is based on a random change of time in certain invariance principles for
~
~
H-statistics, studied earlier by Jure~kova (1980) and Jureckova and
Sen (1981a, b).
The last section deals with the second order behavior of
the one-step version of M-estimators and, in this context, illustrates the
effect of the choice of an initial estimator.
6
2.
A second order representation theorem.
~,
function
we assume that
~l(x) + ~2(x),
~(x)
(2.1)
where
~l
x
~2
a.e., and
is a step-function.
there exist real numbers
-
j=O,l, ... p, where _00 = a
for x
E., 0
E
J
1 < j <
E
1
R
is absolutely continuous on any bounded interval in R
(~(l)
it possesses first and second derivatives
P (>1),
Concerning the score
p.
< j
-
O
<
a
l
~(2),
•
and
respectively)
Specifically, we assume that for some
8. and open intervals E.=(a., a.+ l ),
J
J
J
J
< ...
<
a
O
<
a +1 = 00, such that
p
< p; conventionally, we let
-
Also, we assume that
and
l
~
is
~2(a.)
J
~2(x)=S.
J
1
= -2 (S. + 8. 1)'
J
J-
increasing and that 1
R
1
~(x)
dF(x)=O.
Concerning the d.f. F, we assume that F possesses an absolutely
continuous and symmetric density f having a finite Fisher information
1(f)
(2.2)
(f'(x)/f(x»
2
dF(x)
«00)
,
where f (x)
(d/dx) f(x).
1
(2.3)
R
1
Also, we assume that
~lv)(x)
dF(x)
exists
(v
1,2) ,
(2.4)
and, either,
~l
is constant outside a fixed interval [-kl,k], k > 0,
or, there exist positive and finite numbers
0
and K such that
~
7
(~i2)(X ± t»2 dF(x)
1
(2.5)
R
1
<
Viti
00
<
0.
Further, we assume that f' is bounded and continuous in a neighbourhood
of a., j=l, ... ,p.
J
Denote
(2.6)
p
L. 1 (13 .-13
Y2v
J=
a
(2.7)
J
J-
1)
v
f(a.),
J
v
L~ 1 (13.-13. 1) f' (a.);
J=
J JJ
1'Z
=
1,2,;
Y=
\1 +
Z1 '
Y
*
Y
where we assumed
(2.8)
Y
:f 0,
Y
22
> O.
Then, we have the following
THEOREM 2.1.
Under the assumed regularity conditions, provided
for any {T } satisfying (1.7), the r.v.
n
(2.9)
Z
n
converges in law to
(2.10)
~
*
H{I[~ > 0] W1(1~1)
+
I[~ < 0] W2(1~1)}
~Z
:f 0,
8
where W , W are independent copies of
2
l
standard Wiener process on
~
[0,00) and ~ has ~ standard normal distribution, independently of WI'
(0
H
(2.11)
ep
Y22
)1/ 2
-1
Y
,.
i.e., Zn has an asymptotic d.f.
,~
p(~
(2.12)
where
f
co
2. x)
2 J (i( H x t
-1/2
o
is the standard normal d.f.
) d ep ( t)
If, however,
~2
- 0 in (2.1), then
(1. 9) holds.
Remarks.
(i)
For (1.9) to hold, we need that
(2.12) , it is not necessary to assume that
~l
~2
= 0 although, for (2.9) -
= o.
. h a 2 = J 'fI 2 (x) dF(x) , so that
O
l
R
2
(2.9) - (2.11) holds with H = (a 0 Y22 I Y-3)1/
3/4
(T - 8)2
(iii) The quadratic term in (2.9) may be omitted as n
(ii)
2
-2
8 we have a ep = y a 2
n
O
For T
n
w~t
n
op
l 4
(n- / ) = 0 (1).
(iv)
p
In particular, let 0
<
p
<
1 and put 'fI
then f 'fI(x - t) dF(x) - 0 for t - F-l(p).
~(x
l
- F- ) (p», we have Y21 = f(F-l(p»
we arrive at Kiefer's (1967) result.
l
=0
and
~2(x)
= p - I[x
<
Replacing ~(x) by ~*(x)=
Y22 = Y,
a~
= pel - p); hence
0];
.'
9
Consider the process
The proof of Theorem 2.1.
3.
(3.1)
2,
+n 1/2 t - y*t }
*
where yand yare
defined as in Section 2.
(3.2)
Z
t
E:
1
R
By (2.9),
n
~
By Jure~kova (1980) (see Corollary on p. 69), the process
"i~
=
W
n
.,'(
{W (t),
n
It I 2.k}, where
*
-K
W (t)
n
(3.3)
converges to Gaussian process W*
Vt
E:
< t
<
K
{W* (t), It I < k} with EW* (t)
°
[-K, K] and
EW * (s) W* (t)
(3.4)
0,
{
Is I
A
It I,
st
<
°
st
>
0,
in the Scorokhod topology on D[-K, K] for any K >
° as
n
~
00.
Hence,
to prove Theorem 2.1, we may make use of a random change of time (i.e.,
t
~
n 1/2(T
n
- 8»
and use (3.2) , (3.3) and (3.4) .
Towards this, we
consider the weak convergence of {(WO(t), £(T' - 8) ) ,
n
n
Note that by (1. 7) and (1. 8)
It I
< K}.
10
n
m(T
(3.5)
n
L:
- 8)
U .
i=l
nl
+
Q..
(1)
p
where
U .
nl
(3.6)
= n-1 / 2 ~(x.1
-
O, E U2.
nl
8); E U .
nl
1, ... ,n.
i
Le t us firs t cons ider the case If - If 2 •• Then, by (1. 2) and by
the definition of '±'2;
n
*
L: U • (t)
(3.7)
i=l nl
where
*
nl
U . (t)
n
-1/4
P
-1/2
Z (S.-S'_l){[I[t<O] I[a.+n
t < X.- 8< a.] j=1 J J
J
1
J
i=l, ... ,n
(3.8)
l 2
are i.i.d. random variables and n is so large that aj+l-a j > n- / K,
o .::..
j
< O.
Then
(3.9)
Var (U *.. (t»
Jl
It In
Cov (U ., U* .(t»
nl
nl
-1
Y22 + o(n
*nl
-1
)
E (U • • U . (t»
nl
and
11
hence
Cov(n- l / 4 [M (8 + n- l / 2 t) - M (8)],
(3.10)
n
n
as
n
-+
00.
(3.5) - (3.10) readily leads to the convergence of the finite dimensional
a
-1 1/2
r-
distributions of {W , vn(T - 8)} to those of {(y
n
n
~O - N(OI o~).
Also, !J1(T
n
Y22
*
W,
~
- 8) is relatively compact, while
a)}
where
{W~(t),
It I ~ K} is relatively compact due to the weak convergence of W* of (3.3).
n
Thus, as n
-+
00,
a
{(Wn(t), m(T n - 8», It I ~KI}
(3.11)
V ( -1 1/2 ,~
-+ Y Y22
W,
Since, for any n > 0, there exists a K(O < K < 00) such that
p(I~OI ~ K) > 1 - n, we may use (3.11) and apply the random change of
time:
t
2
[n l / (T
-+
n
- 8)]K (where yK
= Y if IYI ~ K and yK = a otherwise).
Using the results in Section 17 of Billingsley (1968), we obtain that as
n ->- 00,
-1
(3.12) Z
n
where
(3.13)
~
~
o/01' 'it.:
W (t)
N(Oll).
1/2 W'~(I:"O)
22
'?
( l' Y22 ) 1/2 y-l 1/(1:")
'?
°
Y
Y
It
is easy to show that
I[t>O] Wl (Itl) + I[t<O] w2(ltl),
12
where WI and W are two independent copies of a standard Wiener process
2
on [0,00).
This completes the proof for
~
= ~2'
The
proof for
~
= ~l
is
~
contained in Jureckova (1985).
~
If
=
~l
~2
+
where none of the components vanish, it follows from
Lemma 3.1 of Jureckova (1985) that
(3.14)
IW~l (t) I
sup
Itl~K
o
p
(n- l / 4 )
as
n
00
-r
where
n
-1/4
n
-1
L
-y
i=l
+
(3.15 )
n
1/2
t -
2
(Y12/2y) t },
is the component of W~(t) corresponding to ~l'
of (WO(t), In(T -8»
n
n
t
t:
1
R
Hence, the weak convergence
follows that of the component corresponding to ~2'
Q.E.D.
4.
Representation of one-step versions of M-estimators.
Theorem 2.1 to T
n
- cl
n
Applying
of (1.4), we get that
(4.1)
where R
n
(4.2)
is the remainder term in the representation (1.6) and
13
where W and W are independent copies of a standard Wiener process and
l
z
has the standard normal distribution and
~
*
(4.5)
H = (0 0 YZZ)
l/Z
-3/Z
Y
en
It is often convenient to approximate
by its one-step version
Ai'e
on
of the following form:
starting from an initial estimate T satisfying
n
In(T -6) = 0 (1), we put
p
n
T ,
{ n
(4.4)
T
n
+
A
(n y)
n
-1
M (T ),
n
n
where Y is a consistent estimator of y, and may be taken as
n
{M (T
(4.5)
n
with t , t (t
z l
l
t
l
<
= -t z = t(> 0).
n + n -l/Zt)
1 - Mn (T n + n -l/Z) }/(Ir1(t
.
n Z- t»
1
t ) being some arbitrary real numbers; often, we let
Z
The one-stop M-estimators were first considered by
..
Bickel (1975) and later on studied by Jure~kova (198Z) (see also Janssen,
..
Jure~kova and Veraverbeke (1985) for a more general setup) who showed that,
t
provided \jIZ
I y-n
A
(4.6)
0,
1
y-l!
o
p
(n- l / 4)
and
A* A
16 n -6 n I
for any In-consistent initial estimator T.
Hence, the asymptotic
n
" i'e
distribution of 6
n
coincides with that of 6
n
and the effect of the choice
14
of T
n
may appear only in the second order asymptotic properties.
Hence,
parallel to (4.1), we are interested in the limit distribution (if any) of
(4.7)
This is given in the following theorem.
Assume that the conditions of Theorem Z.l are satisfied.
THEOREM 4.1
A*
A
Let en be the one-step version of en given in (4.4) with Y of (4.5) and
n
with T
--
n
satisfying (1.7) and (1.8).
*
(4.8)
H{W (0 -
(tZ-t )
l
-1
Then
*
tl
[W (1;+ --)
0<1>
where W* is the Gaussian process of (3.4),
the vector I;
=
H is defined in (Z.l1) and
(I;, 1;0) has the bivariate normal distribution
Nz(~~
(1 p»
P 1
with
00
(4.9)
Proof.
p
f
00
_00
¢(x)
~(x) dF(x) • {f
00
_00
¢2(x) dF(x) • f -co
It follows from (3.1) and (4.5) that
".
~Z(x) dF(x)}-l/Z.
15
n
V
n
1/4
A
(y -y)
n
(4.10)
Also, by the Slutsky
v*n
(4.11)
n
theorem,
1/4(Ayy-1 - 1)
Further, by (3.1) and (4.4), we have
,,<
Z
n
n 1 / 4R +
n
y
-1 n1/4(y~-1 _ 1) n- 1 / 2 [M (T ) _ M (8 )] +
n
nn
nn
0
p
(1)
(4.12)
Moreover, utilizing (1.6) and (1.7), we obtain that the limiting distribution
of
(4.13)
m(T -tl )
n
n
coincides with that of
0~ ~ - y-100~0 where (~, ~O) has the bivariate
normal distribution with zero expectation and correlation matrix (lP )
P1
with P of (4.9).
16
It follows from the proof of Theorem Z.l that the joint (trivariate)
distribution of (WO(rn(T -8)), WO(rn(T -8-' + t ), WO(rn(T -8) + t z)
n
n
n
n'
n
n
l
converges to that of
(4.14)
Moreover, following the lines of the proof of (3.11), we get that
rr(4.15) {(WO(t),
n
yn(T
-8), yn(8
-8)),
n
n
A
It I
V
<
K}
*
-liZ W, ~
(y-1 yzz
+
Combining (4.lZ), (4.13), (4.14) and (4.15), we arrive at (4.8).
Q.E.D.
Remark.
(4.8) shows the effect of the choice of Tn as well as of Yn
(more precisely, of t
l
and t ).
Z
It follows from (4.8) that the limiting
distribution of Z* coincides with that of n
n
8
n
l/4
R
(Le., 8
n
up to the second order term) if and only if T
to 8
n
n
Go =
° \l1iUl probability 1).
n
coincides with
is asymptotically equivalent
17
References
[1]
Bahadur, R.R. (1968). A note on quantiles in large samples.
Ann. Math. Statist. 1I, 577-581.
[2]
Bickel, P.J. (1975). One-step Huber estimates in the linear
model. J. Amer. Statist. Assoc. lQ, 428-433.
[3]
Billingsley, P. (1968).
J. Wiley.
[4]
Carroll, R.J. (1978). On almost sure expansions for M-estimates.
Ann. Statist. ~, 314-318.
[5]
Huber, P.J. (1981).
[6]
Janssen, P., Jureckova, J. and Veraberbeke (1985). Rate of
convergence on one- and two-step M-estimators with application
to maximum likelihood and Pitman estimators. Ann. Statist.
13, No.3.
[7]
Jureckova, J. (1980). Asymptotic representation of M-estimators
of location. Math. Operationsforsch. Statist., Series
Statistics 11, 61-73.
[8]
Jure~kova, J. (1983). Robust estimation of location and regression
parameters and their second order asymptotic relations. Trans.
9th Prague Conf. on Inform. Theory, Statist. Decis. Functions
and Random Processes, 19-32. Reidel, Dordrecht.
[9]
Jureckova, J. (1985). Representation of M-estimators with the second
order asymptotic distribution. Statistics! Decisions, 1, in press.
Convergence of Probability Measures.
Robust Statistics.
.
J. Wiley, New York •
[10] Jure~kova, J. and Sen, P.K. (1981a). Invariance principles for some
stochastic processes relating to M-estimators and their role in
sequential statistical inference. Sanehya A, ~, 190-210.
[ 11] Jure~kova, J. and Sen, P.K.
(198lb). Sequential procedures based on
X-estimators with discontinuous score functions. Jour. Statist.
Plan. In£. ~.' 253-266.
J
[12] Jureckova, J. and Sen, P.K. (1982). M-estimators and L-estimators
of location: Uniform integrability and asymptotic risk-efficient
sequential versions. Sequen. Anal. l, 27-56 .
....
[13] Jureckova, J. and Sen, P.K. (1984).
M-estimators in linear models.
On adaptive scale-equivariant
Stat. ! Dec. ~, Suppl. 31-46.
[14] Kiefer, J. (1967). On Bahadur's representation of sample quantiles.
Ann. ~fath. Statist. 38, 1323-1342.
[15] Serfling, R.J. (1980). Approximation Theorems of Mathematical
Statistics. J. Wiley, New York.