Ren, Jian-Jian (1990). "On Hadamard Differentiability and M-Estimation in Linear Models."

ON HADAMARD DIFFERENTIABILITY
AND M-ESTIMATION IN LINEAR MODELS
by
Jian-Jian Ren
A dissertation submitted to the faculty of the University
of North Carolina at Chapel Hill in partial fulfillment of
the requirements for the degree of Doctor of Philosophy
in the Department of Statistics
Chapel Hill
1990
Approved by:
Advisor
Reader
Reader
JIAN-JIAN REN.
On Hadamard Differentiability and M-estimation in Linear
Models. (Under the guidance of PRANAB KUMAR SEN)
Abstract:
Robust (M-) estimation in linear models generally involves statistical
functional processes. For drawing statistical conclusions (in large samples), some
(uniform) linear approximations are usually needed for such functionals. In this
context, the role of Hadamard differentiability is critically examined in this
dissertation. In particular, the concept of second-order Hadamard differentiability and some related theoretical results are established for the study of the convergence rate in probability of the uniform asymptotic linearity of the M-estimator of regression. Thereby, using Hadamard differentiability through the linear approximation of the estimator, the asymptotic normality, the weak consistency and an asymptotic representation are derived under weak conditions on the score function ψ, the underlying d.f. F, and the regression constants.
Some other approaches are also considered for the study of the asymptotic
normality and the weak consistency, and these approaches lead to two sets of
conditions for the strong consistency of the M-estimators in linear models.
TABLE OF CONTENTS

CHAPTER I. INTRODUCTION
1.1 Introduction ..... 1
1.2 Literature Review ..... 4
1.3 Summary of Results and Further Research ..... 12

CHAPTER II. ON HADAMARD DIFFERENTIABILITY OF STATISTICAL FUNCTIONAL PROCESSES
2.1 Introduction ..... 19
2.2 Preliminary Notions ..... 24
2.3 Main Results ..... 31
2.4 Proof of Theorem 2.3.1 ..... 35

CHAPTER III. ON M-ESTIMATION IN LINEAR MODELS
3.1 Introduction ..... 63
3.2 Preliminaries ..... 67
3.3 Asymptotic Normality ..... 87
3.4 Consistency ..... 102

CHAPTER IV. ON THE SECOND-ORDER HADAMARD DIFFERENTIABILITY AND ITS APPLICATION TO THE UNIFORM ASYMPTOTIC LINEARITY OF M-ESTIMATION IN LINEAR MODELS
4.1 Introduction ..... 118
4.2 Notations and Assumptions ..... 122
4.3 On the Second-Order Hadamard Differentiability ..... 126
4.4 Convergence Rate in Probability of the Uniform Asymptotic Linearity of the M-estimators of Regression ..... 137

NOTATIONS AND ABBREVIATIONS ..... 153
REFERENCES ..... 155
CHAPTER I
INTRODUCTION
1.1 Introduction
The linear regression model is one of the most widely used tools in statistical analysis. In this dissertation, we consider the following simple linear model:

X_i = β^T c_i + e_i,  i = 1, ..., n,    (1.1.1)

where the c_i are known p-vectors of regression constants, β = (β_1, ..., β_p)^T is the vector of unknown (regression) parameters, p ≥ 1, which is to be estimated from the n observations X_1, ..., X_n, and the e_i are independent and identically distributed random variables (i.i.d.r.v.) with distribution function (d.f.) F. One should notice that (1.1.1) reduces to the classical location model if all the c_i, with p = 1, are equal to 1.
Classically, the problem is solved by minimizing the sum of squares (with respect to β):

Σ_{i=1}^n (X_i − c_i^T β)² = min!    (1.1.2)

or, equivalently, by solving the system of p equations (with respect to β) obtained by differentiating (1.1.2):

Σ_{i=1}^n c_i (X_i − c_i^T β) = 0.    (1.1.3)
The computation of the Least Squares Estimator (LSE) defined by the estimating equations (1.1.3), which are linear equations, can be done easily, and the statistical properties of the LSE, such as consistency and asymptotic normality, have been studied in detail (Huber, 1981). But, as we know, in spite of its mathematical beauty and computational simplicity, the least squares estimator suffers a dramatic lack of robustness against plausible departures from the model-based assumptions. In fact, the Influence Function (introduced by Hampel, 1968, 1974), which describes the effect of an additional observation at any point x on a statistic, is not bounded for the residual of the LSE. This shows that one single outlier can have an arbitrarily large effect on the estimate (Huber, 1981). Moreover, when the underlying d.f. F has heavy tails (which is a departure from the assumption that the model is usually based on: the normality of the e_i), the least squares estimator is not efficient, and may not even be consistent.
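Because the estimating equations (1.1.3) are linear in β, the LSE has a closed form given by the normal equations. A minimal numerical sketch (hypothetical data; NumPy only, with the rows of C playing the role of the c_i^T):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: n observations, p = 2; rows of C are the c_i^T.
n = 200
C = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
beta_true = np.array([1.0, 2.0])
X = C @ beta_true + rng.normal(0.0, 0.5, n)   # X_i = beta^T c_i + e_i

# The estimating equations sum_i c_i (X_i - c_i^T b) = 0 are the
# normal equations (C^T C) b = C^T X, solved directly:
beta_lse = np.linalg.solve(C.T @ C, C.T @ X)
print(beta_lse)
```

The returned vector makes the residuals exactly orthogonal to the columns of C, which is the content of (1.1.3).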
Due to the lack of robustness of the LSE, other robust estimators of β have been proposed. In particular, Huber (1973) robustified the classical equations (1.1.2) and (1.1.3) in a straightforward way: instead of minimizing a sum of squares, we minimize a sum of a less rapidly increasing function ρ of the residuals, i.e.,

Σ_{i=1}^n ρ(X_i − c_i^T β) = min!    (1.1.4)

or, if ρ is convex with derivative ψ, (1.1.4) is equivalent to

Σ_{i=1}^n c_i ψ(X_i − c_i^T β) = 0.    (1.1.5)
The resulting estimator of (1.1.5), β̂_n, is called the M-estimator of regression, and ψ is called the score function of the M-estimator. The robustness properties of such M-estimators were investigated by Hampel, Ronchetti, Rousseeuw and Stahel (1986), where it was shown that, as far as local robustness is concerned, this M-estimator performs very well. In particular, the influence function of the residual of this M-estimator is bounded whenever the score function ψ is bounded. So, in the robustness sense, the M-estimator is an improvement over the LSE in linear models.
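As an illustration (a sketch, not the dissertation's method), the estimating equations (1.1.5) with the Huber score ψ(x) = min{|x|, k} sgn(x) can be solved by iteratively reweighted least squares; the data and tuning constant k below are hypothetical:

```python
import numpy as np

def huber_psi(x, k=1.345):
    """Huber score function psi(x) = min{|x|, k} sgn(x)."""
    return np.clip(x, -k, k)

def m_estimate(C, X, k=1.345, tol=1e-10, max_iter=200):
    """Solve sum_i c_i psi(X_i - c_i^T b) = 0 by iteratively reweighted
    least squares: the weight w_i = psi(r_i)/r_i = min(1, k/|r_i|) turns
    the score equations into a weighted least-squares problem."""
    b = np.linalg.solve(C.T @ C, C.T @ X)          # start from the LSE
    for _ in range(max_iter):
        r = X - C @ b
        w = np.minimum(1.0, k / np.maximum(np.abs(r), 1e-12))
        b_new = np.linalg.solve(C.T @ (w[:, None] * C), C.T @ (w * X))
        if np.max(np.abs(b_new - b)) < tol:
            return b_new
        b = b_new
    return b

rng = np.random.default_rng(1)
n = 300
C = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
beta_true = np.array([1.0, 2.0])
X = C @ beta_true + rng.standard_t(df=2, size=n)  # heavy-tailed errors
beta_m = m_estimate(C, X)
print(beta_m)
```

At the fixed point the weighted normal equations coincide with (1.1.5), so the returned estimate annihilates the score sum.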
However, the robustness properties of the M-estimator of regression are not the focus of our study. Our concern is the asymptotic properties of this M-estimator: the consistency and the asymptotic normality. As we shall see in Section 1.2, strong conditions on either ψ or F have been required for such a study. The motivation of our research is to study the asymptotic properties of the M-estimator of regression by incorporating Hadamard differentiability under weak or alternative conditions on ψ, F and {c_i}, 1 ≤ i ≤ n.

Since M-estimators are implicitly defined by the estimating equations, it is difficult to study their statistical properties directly. Various procedures have been considered for such a study. The principal difficulty is the non-linearity of the estimator. Often, a linear approximation provides a very workable tool for the study of the asymptotic properties. In this context, we shall find that Hadamard differentiability provides an easy way to study the asymptotic properties of the M-estimator of regression through the linear approximation of the estimator.
1.2 Literature Review
So far, a considerable amount of work on M-estimators in linear models has been done. Some of the work on the asymptotic properties of the (Huber) M-estimator of regression for fixed p ≥ 1 will be briefly reviewed in this section.
For convenience, denote the design matrix

D_n = (c_ij)_{n×p},    (1.2.1)

the projection matrix

H_n = D_n (D_n^T D_n)^{-1} D_n^T,    (1.2.2)

and let

ε = max_{1≤i≤n} h_{nii},    (1.2.3)

where the h_{nii} are the diagonal elements of H_n. Assume β̂_n = (β̂_n1, ..., β̂_np)^T is the M-estimator of regression, i.e., the solution of (1.1.5). Under the assumptions that the function ψ is continuous, nondecreasing and bounded with a bounded second derivative, Huber (1973) showed that, when εp² → 0, a_n = Σ_{j=1}^p a_j β̂_nj is asymptotically normal for arbitrary coefficients a_j satisfying Σ_{j=1}^p a_j² = 1. He also suggested the estimation of the covariance matrix of β̂_n. Yohai and Maronna (1979) improved Huber's result: they showed that a_n is asymptotically normal assuming that ψ has a bounded third derivative and that εp^{3/2} → 0.
Under weak conditions on ψ, Koul (1977) and Jurečková (1977, indirectly) also showed the asymptotic normality, but they both required that the underlying d.f. F have finite Fisher information, i.e.,

0 < I(f) = ∫ (f′/f)² dF < ∞,

where F′ = f is the density function of F.
As a corollary of Theorem 4.1 of Bickel (1975), Yohai and Maronna (1979) obtained the asymptotic normality under the following assumptions on ψ and F:

(i) ψ is nondecreasing.

(ii) There exist positive numbers b, c and d such that

ψ(x + z) − ψ(x) ≥ d  if |x| ≤ c, |z| ≥ b,

and c satisfies F(c) − F(−c) > 0.

(iii) ∫_{−∞}^{∞} [ψ(x+h) − ψ(x−h)]² dF(x) = o(1) as h → 0, and for some ε > 0,

sup_{|q|≤ε, |h|≤ε} { (1/|h|) ∫_{−∞}^{∞} [ψ(x+q+h) − ψ(x+q)] dF(x) } < ∞.

(iv) There exists A(ψ, F) such that

∫_{−∞}^{∞} [ψ(x+h) − ψ(x−h)] dF(x) = hA(ψ, F) + o(h).
Relles (1968) proved the consistency of this M-estimator of regression when ψ belongs to the Huber family

ψ(x, k) = min{|x|, k} sgn(x),

where k is a positive real number. Yohai and Maronna (1979) proved the weak consistency assuming that, in addition to (i) and (ii), λ_1(D_n^T D_n) → ∞, where λ_1(A) denotes the smallest eigenvalue of the matrix A.
For non-convex ρ, Jurečková (1989) derived the asymptotic normality and the weak consistency under the assumption of a certain existence of the second derivative of F.
Among the work on M-estimators, we notice an interesting approach which shall play an important role in our research later on. Treating the M-estimator as a statistical functional, its asymptotic properties were considered by Boos and Serfling (1980) and Huber (1981) with the Fréchet derivative; by Fernholz (1983, location model) with the Hadamard derivative; and by Kallianpur (1963, MLE) with the Gâteaux derivative. The general idea of this work was based on a form of the Taylor expansion involving the derivatives of the functional due to von Mises (1947). In nonparametric models, a parameter θ (= T(F)) is regarded as a functional T(·) on a space 𝔉 of distribution functions F. Thus, the same functional of the sample d.f. F_n (i.e., T(F_n)) is regarded as a natural estimator of θ. Von Mises (1947) expressed T(F_n) as

T(F_n) = T(F) + T_F(F_n − F) + Rem(F_n − F; T(·)),    (1.2.4)

where T_F is the derivative of the functional T at F and Rem(F_n − F; T(·)) is the
remainder term in this first-order expansion. Note that F_n(x) = (1/n) Σ_{i=1}^n I(X_i ≤ x) is based on n independent and identically distributed random variables X_1, ..., X_n, each having the d.f. F, and that T_F is a linear functional. Hence, T_F(F_n − F) is an average of n i.i.d.r.v.'s. For drawing statistical conclusions (in large samples), T_F plays the basic role, and in this context, it remains to show that Rem(F_n − F; T(·)) is asymptotically negligible to the desired extent. Appropriate differentiability conditions are usually incorporated towards this verification. The Fréchet derivative is generally too stringent, and the existence of the second-order derivative is assumed when the Gâteaux derivative is applied (von Mises, 1947; Kallianpur, 1963) to show that the remainder term is asymptotically negligible, which is a rather strong condition. Since, in a majority of statistical applications, Hadamard differentiability, which is a weaker condition than Fréchet differentiability, suffices without the assumption of the existence of a second-order derivative, we are currently interested in Hadamard differentiability for our research concern.
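As a toy illustration of the expansion (1.2.4) (an assumed example, not from the text): for T(F) = (∫x dF)², the derivative at F is the linear functional T_F(G) = 2μ ∫x dG with μ = ∫x dF, and the remainder T(F_n) − T(F) − T_F(F_n − F) equals (X̄ − μ)², which is O_p(1/n), so √n·Rem is negligible as required in (1.2.9) below.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = 0.5  # mean of F = Uniform(0, 1)

def remainder(n):
    """Rem(F_n - F; T) for T(F) = (int x dF)^2 and F = Uniform(0, 1):
    T(F_n) - T(F) - 2*mu*(xbar - mu), which equals (xbar - mu)^2."""
    xbar = rng.uniform(0.0, 1.0, n).mean()
    return xbar**2 - mu**2 - 2.0 * mu * (xbar - mu)

rem_small = np.mean([remainder(100) for _ in range(200)])
rem_large = np.mean([remainder(10000) for _ in range(200)])
print(rem_small, rem_large)   # shrinks roughly like 1/n
```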
In particular, Fernholz (1983) considered the M-estimator in the location model with the Hadamard derivative applied. More specifically, assuming that X_1, ..., X_n is a sample from a population with d.f. F, Huber (1964) defined the functional corresponding to the estimating equation of the M-estimator of location,

Σ_{i=1}^n ψ(X_i − θ) = 0,

to be a root T(F) = θ of the following equation:

∫ ψ(x − θ) dF(x) = 0,    (1.2.5)
or, equivalently, of

∫_0^1 ψ(F^{-1}(t) − θ) dt = 0,    (1.2.6)

where F is strictly increasing and continuous. Hence, the M-estimator defined by (1.2.5) is also equivalently defined as a root T_n = T(F_n) = θ of

∫_0^1 ψ(F_n^{-1}(t) − θ) dt = 0.
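Numerically, the root T(F_n) = θ of the sample analogue of (1.2.5) is easy to locate: for a nondecreasing ψ, the map θ ↦ (1/n) Σψ(X_i − θ) is nonincreasing, so bisection works. A sketch with the Huber score and hypothetical contaminated data:

```python
import numpy as np

def psi(x, k=1.345):
    return np.clip(x, -k, k)   # Huber score function

def location_m_estimate(x, k=1.345, tol=1e-10):
    """Root theta of (1/n) sum_i psi(x_i - theta) = 0, by bisection;
    lam(theta) = mean(psi(x - theta)) is nonincreasing in theta."""
    lam = lambda theta: psi(x - theta, k).mean()
    lo, hi = x.min(), x.max()          # lam(lo) >= 0 >= lam(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lam(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(3)
x = rng.normal(10.0, 1.0, 500)
x[:25] += 50.0                         # 5% gross outliers
theta_hat = location_m_estimate(x)
print(theta_hat, x.mean())             # M-estimate stays near 10
```

Unlike the sample mean, which is dragged toward the outliers, the root of the bounded score equation stays near the bulk of the data.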
We observe that a statistical functional T induces a functional τ on the space D[0,1] (of right-continuous functions having left-hand limits) by the relation τ(G) = T(G∘F) for G ∈ D[0,1]. Thus, (1.2.4) can be written equivalently as

τ(U_n) = τ(U) + τ_U(U_n − U) + Rem(U_n − U; τ(·)),    (1.2.7)

where U_n is the empirical d.f. of the F(X_i), 1 ≤ i ≤ n, and U is the uniform d.f. on [0,1], and we generally have that, for any G ∈ D[0,1], τ(G) = θ is a root of

∫_0^1 ψ(G^{-1}(t) − θ) dt = 0,    (1.2.8)

where G^{-1} ∈ D[0,1] is defined by G^{-1}(y) = inf({1} ∪ {x: G(x) ≥ y}). Using the Hadamard differentiability (along with some other regularity conditions), Reeds (1976) showed that, for any Hadamard-differentiable (at U) functional τ induced by a statistical functional T,
√n Rem(U_n − U; τ(·)) →_p 0.    (1.2.9)

Assuming that ψ is continuous and piecewise differentiable with a bounded derivative ψ′ which vanishes outside of some bounded interval, Fernholz showed that the functional τ defined by (1.2.8) is Hadamard differentiable at U. Therefore,

√n (T(F_n) − T(F)) →_d N(0, σ²),

where τ_U(U_n − U) = (1/n) Σ_{i=1}^n IC(X_i; F, T) (IC(x; F, T) is the influence function of T at F) and σ² = Var_F{IC(X; F, T)} < ∞.
We can easily see that Fernholz was dealing with i.i.d. random variables; but even if we just consider the simple linear regression model with p = 1,

X_i = c_i θ + e_i,  i = 1, ..., n,

X_1, ..., X_n are not i.i.d. random variables. Thus, anything similar to (1.2.8) does not apply to the regression model. Hence, the statistical functional corresponding to the M-estimator of regression cannot be defined in the same fashion as the one for the location model.
However, we notice that, for each θ ∈ R^p, the left-hand side of the estimating equation (1.1.5), denoted by

E_n(θ) = Σ_{i=1}^n c_i ψ(X_i − c_i^T θ),

is a linear functional of the following empirical function:

S_n*(t, θ) = Σ_{i=1}^n c_i I(X_i ≤ F^{-1}(t) + c_i^T θ).    (1.2.10)
By the definition of Hadamard differentiability, E_n(θ) could be the Hadamard derivative of a certain functional τ. As we pointed out earlier, the non-linearity of the estimator is the essential difficulty for the study of the asymptotic properties of the M-estimator in linear models. Also, the uniform linear approximation of the estimator by Jurečková (1977) provides an easy access to the study of these asymptotic properties: assuming that F has finite Fisher information, then, for any K > 0 and 1 ≤ j ≤ p,

sup_{||Δ−β||≤K} | E_j(Y, Δ) − E_j(Y, β) + w(Δ − β)^T q^{(j)} | →_p 0,    (1.2.11)

where the E_j(Y, θ) are the components of E_n(θ), q^{(j)} is the jth column of D_n^T D_n, and w = −∫ ψ(x) f′(x) dx. We may therefore develop a way of using Hadamard differentiability to establish something similar to (1.2.11) for the study of the asymptotic properties of β̂_n under weak conditions on ψ, F and D_n. More specifically, we may generalize the result (1.2.9) with respect to an empirical function which is similar to S_n*(·, θ) (Chapter II), and then, using (1.2.10) for a proper functional τ, we may establish the (Jurečková-) uniform linear approximation of the M-estimator of regression.
For the more detailed property, Jurečková and Sen (1984) considered the convergence rate in probability of (1.2.11). Assuming that F has finite Fisher information, that ψ is nondecreasing and continuous with a bounded derivative for x ∈ (a, b) and ψ′ = 0 outside of (a, b), and that the design matrix D_n satisfies certain conditions, they showed that, for any K > 0 and 1 ≤ j ≤ p,

sup_{|t|≤K} | N_nj(t) − w n^{-1/2} t^T q^{(j)} | = O_p(n^{-1/4}),    (1.2.12)

where

N_nj(t) = Σ_{i=1}^n c_ij [ ψ(X_i − β^T c_i + n^{-1/2} t^T c_i) − ψ(X_i − β^T c_i) ]

and t ∈ R^p. Furthermore, by (1.2.12), they gave an asymptotic representation of the M-estimator of regression:

√n (β̂_n − β) = γ^{-1} Q^{-1} n^{-1/2} Σ_{i=1}^n c_i ψ(X_i − c_i^T β) + O_p(n^{-1/4}),

where c_i^T is the ith row of D_n, γ = ∫ ψ′(x) dF(x), and

Q = lim_{n→∞} (1/n) D_n^T D_n

is a positive definite p×p matrix.
1.3 Summary of Results and Further Research
For the simple linear model (1.1.1), we consider a normalized version of the
estimating equations (1.1.5) as the following:
Mn(u)
= 't""'!1
c. ·'·(Y.1 _CT.
u)
.l..t 1=1-m Of'
-mwhere Yi=Xi -~T~i are i.i.d.r.v.'s with d.f. F, and for every
(1.3.1)
n~p,
(1.3.2)
gg = Diag( ~rnll' ... , ~rnpp)
(1.3.3)
(1.3.4)
(1.3.5)
Then, we see that (1.1.5) is equivalent to
(with respect to y),
(1.3.6)
i.e., Yn=gg(~n-~) is a solution of (1.3.6), and that, for any 1::;j::;p,
(1.3.7)
where 11·11 stands for the Euclidean norm and
(1.3.8)
13
Since M_n(y) involves the following empirical function,

S_n*(t, y) = Σ_{i=1}^n c_ni I(Y_i ≤ F^{-1}(t) + c_ni^T y),    (1.3.9)

particularly, for each y ∈ R^p and 1 ≤ k ≤ p, M_nk(y) (the k-th component of M_n(y)) is a linear functional of

S*_nk(t, y) = Σ_{i=1}^n c_nik I(Y_i ≤ F^{-1}(t) + c_ni^T y)    (1.3.10)

(the k-th component of S_n*(t, y)), viz.,

M_nk(y) = ∫_0^1 ψ(F^{-1}(t)) dS*_nk(t, y).    (1.3.11)

In Chapter II, we generalized the result (1.2.9) with respect to the following statistical functional process:

{ τ( S*_nk(·, y) / Σ_{i=1}^n c_nik );  y ∈ R^p, 1 ≤ k ≤ p },

where τ is any functional defined on the space D[0,1]. More specifically, for any fixed k = 1, ..., p and any Hadamard differentiable (at U) functional τ, we showed that, for any K > 0, as n → ∞,

sup_{|y|≤K} | Σ_{i=1}^n c_nik { τ( S*_nk(·, y)/Σ_{i=1}^n c_nik ) − τ(U(·)) } − τ_U( S*_nk(·, y) − U(·) Σ_{i=1}^n c_nik ) | →_p 0,    (1.3.12)
14
where 1~1=I(x1' ... ,xq)l=m~
Ix,l, under the following assumptions:
1::9~q 1
(A1) cnik ~ 0,
(A2)
i=l, 2, ... , n
lim
n-oo
(B) F is absolutely continuous with a derivative F' which is positive and
continuous with limits at ±oo.
In the general case of {c_nik}, by (1.3.12), we showed that, for any 1 ≤ k ≤ p,

sup_{|y|≤K} | Σ_{i=1}^n c_nik^+ τ( S_nk^{+*}(·, y)/Σ_{i=1}^n c_nik^+ ) − Σ_{i=1}^n c_nik^− τ( S_nk^{−*}(·, y)/Σ_{i=1}^n c_nik^− ) − τ(U) Σ_{i=1}^n c_nik − τ_U( S*_nk(·, y) − U(·) Σ_{i=1}^n c_nik ) | →_p 0,    (1.3.13)

under the assumption (B) and

(A3) lim_{n→∞} n max_{1≤i≤n} ||c̃_ni^+||² < ∞ and lim_{n→∞} n max_{1≤i≤n} ||c̃_ni^−||² < ∞,

where, for every 1 ≤ i ≤ n, 1 ≤ k ≤ p, t ∈ [0,1] and y ∈ R^p,

c_nik = c_nik^+ − c_nik^−,  c_nik^+ = max{0, c_nik},  c_nik^− = −min{0, c_nik};    (1.3.14)

c_ni = c_ni^+ − c_ni^−,  c_ni^+ = (c_ni1^+, ..., c_nip^+)^T,  c_ni^− = (c_ni1^−, ..., c_nip^−)^T;    (1.3.15)

c_n·k = c_n·k^+ − c_n·k^−;    (1.3.16)

d_nk^+ = ( Σ_{i=1}^n (c_nik^+)² )^{1/2},  d_nk^− = ( Σ_{i=1}^n (c_nik^−)² )^{1/2};    (1.3.17)

c̃_nik^+ = c_nik^+/d_nk^+ if d_nk^+ > 0, and c̃_nik^+ = 0 if d_nk^+ = 0;    (1.3.18)

c̃_nik^− = c_nik^−/d_nk^− if d_nk^− > 0, and c̃_nik^− = 0 if d_nk^− = 0;    (1.3.19)

c̃_ni^+ = (c̃_ni1^+, ..., c̃_nip^+)^T,  c̃_ni^− = (c̃_ni1^−, ..., c̃_nip^−)^T;    (1.3.20)

S_nk^{+*}(t, y) = Σ_{i=1}^n c_nik^+ I(Y_i ≤ F^{-1}(t) + c_ni^T y),  S_nk^{−*}(t, y) = Σ_{i=1}^n c_nik^− I(Y_i ≤ F^{-1}(t) + c_ni^T y),    (1.3.21)

so that

S*_nk(t, y) = S_nk^{+*}(t, y) − S_nk^{−*}(t, y)    (1.3.22)

and

Σ_{i=1}^n c_nik = Σ_{i=1}^n c_nik^+ − Σ_{i=1}^n c_nik^−.    (1.3.23)
Since, for every y ∈ R^p and 1 ≤ k ≤ p, S*_nk(·, y) is an element of D[0,1], using (1.3.11), (1.3.13) and the definition of (the first-order) Hadamard differentiability, we proved in Chapter III that, for a proper functional τ and any K > 0, as n → ∞,

sup_{|y|≤K} | ω̂_nk + [M_nk(y) − M_nk(0)] | →_p 0,  1 ≤ k ≤ p,

where

ω̂_nk = Σ_{i=1}^n c_nik^+ { τ( S_nk^{+*}(·, y)/Σ_{i=1}^n c_nik^+ ) − τ( S_nk^{+*}(·, 0)/Σ_{i=1}^n c_nik^+ ) } − Σ_{i=1}^n c_nik^− { τ( S_nk^{−*}(·, y)/Σ_{i=1}^n c_nik^− ) − τ( S_nk^{−*}(·, 0)/Σ_{i=1}^n c_nik^− ) }.

Furthermore, the uniform asymptotic linearity of the M-estimator of regression,

sup_{|y|≤K} | M_n(y) − M_n(0) + γ Q_n y | →_p 0,    (1.3.24)

was established from this, where Q_n = Σ_{i=1}^n c_ni c_ni^T = (D_n^0)^{-1} C_n (D_n^0)^{-1} and γ = ∫ F′(x) dψ(x) > 0. Immediately, along with some regularity conditions, the asymptotic normality followed from (1.3.24). Using a similar idea, the weak consistency was also shown by incorporating Hadamard differentiability through the uniform asymptotic linearity of the estimator.
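The content of the uniform asymptotic linearity statement can be probed by a small simulation (an illustration under assumed choices: p = 1, standard normal errors, Huber ψ; not part of the dissertation's argument): the supremum of |M_n(y) − M_n(0) + γQ_n y| over a grid |y| ≤ K should shrink as n grows.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)
k = 1.345
psi = lambda x: np.clip(x, -k, k)              # Huber score function
gamma = erf(k / sqrt(2.0))                     # int psi' dF = P(|e| <= k), e ~ N(0,1)

def sup_linearity_gap(n, K=1.0, grid=21):
    """sup over a grid |y| <= K of |M_n(y) - M_n(0) + gamma * Q_n * y|
    for a p = 1 design with normalized constants (sum c_ni^2 = 1)."""
    c = rng.uniform(0.5, 1.5, n)
    cn = c / np.sqrt(np.sum(c**2))             # normalized design constants
    Y = rng.normal(0.0, 1.0, n)                # Y_i i.i.d. with d.f. F = N(0,1)
    Qn = np.sum(cn**2)                         # = 1 under this normalization
    M = lambda y: np.sum(cn * psi(Y - cn * y))
    return max(abs(M(y) - M(0.0) + gamma * Qn * y)
               for y in np.linspace(-K, K, grid))

gap_small_n, gap_large_n = sup_linearity_gap(100), sup_linearity_gap(100000)
print(gap_small_n, gap_large_n)                # the gap shrinks as n grows
```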
However, the Hadamard differentiability approach may not provide a way for the study of the strong consistency, which naturally requires stronger conditions on ψ, F and {c_i}. In this context, some other approaches were considered in Chapter III to establish (1.3.24) for the asymptotic normality and the weak consistency, and these approaches led to two sets of conditions for the strong consistency of the M-estimator of regression.
In all of our approaches in Chapter III towards the asymptotic normality and the weak consistency, none of the assumptions, such as a bounded second derivative of ψ, finite Fisher information of F, condition (ii), or any existence of the second derivative of F, was required.
Since the uniform asymptotic linearity of the M-estimator of regression, (1.3.24), was established by incorporating the first-order Hadamard differentiability, we naturally consider using the second-order Hadamard differentiability, which provides a second-order expansion, i.e., a more precise approximation, to study a more detailed asymptotic property of (1.3.24): the convergence rate in probability, which leads to an asymptotic representation of the estimator. In Chapter IV, the general definition of the second-order Hadamard differentiability and some related theoretical results, such as the chain rule, certain sufficient conditions for the existence of the second-order Hadamard derivative in some special spaces, and the results corresponding to (1.3.12) and (1.3.13) with respect to the second-order Hadamard differentiability, were given. The results on the convergence rate in probability of (1.3.24), obtained by incorporating the second-order Hadamard differentiability, and an asymptotic representation of the M-estimator of regression were derived in the last section of Chapter IV, where we basically required weaker conditions on {c_i}, ψ and F than Jurečková and Sen (1984).
In further research, we believe that this Hadamard differentiability approach for the study of the asymptotic properties of the M-estimator can be applied to the same study for random effects models and mixed effects models. From the fact (Jurečková, 1977) that in linear models, any M-estimate corresponds to an R-estimate such that the estimates are asymptotically equivalent in the sense of convergence in probability, we also believe that the results on the asymptotic normality and the weak consistency of the M-estimator in this dissertation can be established under the same assumptions for the R-estimator by the same Hadamard differentiability approach.
CHAPTER II
ON HADAMARD DIFFERENTIABILITY
OF STATISTICAL FUNCTIONAL PROCESSES
2.1 Introduction
In nonparametric models, a parameter θ (= T(F)) is regarded as a functional T(·) on a space 𝔉 of distribution functions (d.f.) F. Thus, the same functional of the sample d.f. F_n (i.e., T(F_n)) is regarded as a natural estimator of θ.

Using a form of the Taylor expansion involving the derivatives of the functional, von Mises (1947) expressed T(F_n) as

T(F_n) = T(F) + T_F(F_n − F) + Rem(F_n − F; T(·)),    (2.1.1)

where T_F is the derivative of the functional at F and Rem(F_n − F; T(·)) is the remainder term in this first-order expansion. Note that F_n(x) = (1/n) Σ_{i=1}^n I(X_i ≤ x) is based on n independent and identically distributed random variables (i.i.d.r.v.) X_1, ..., X_n, each having the d.f. F, and that T_F is a linear functional. Hence, T_F(F_n − F) is an average of n i.i.d.r.v.'s. For drawing statistical conclusions (in large samples), T_F plays the basic role, and in this context, it remains to show that Rem(F_n − F; T(·)) is asymptotically negligible to the desired extent.
Appropriate differentiability conditions are usually incorporated towards this
verification.
We observe that a statistical functional induces a functional on the space D[0,1] (of right-continuous functions having left-hand limits) in the following way:

τ(G) = T(G∘F),  G ∈ D[0,1].    (2.1.2)

Thus, (2.1.1) can be written equivalently as

τ(U_n) = τ(U) + τ_U(U_n − U) + Rem(U_n − U; τ(·)),    (2.1.3)

where U_n is the empirical d.f. of the F(X_i), 1 ≤ i ≤ n, and U is the classical uniform d.f. on [0,1] (i.e., U(t) = t, 0 ≤ t ≤ 1). Since the expansion in (2.1.1), written in (2.1.3), is based on some kind of differentiation, it is quite natural to inquire about the right form of such a differentiation to suit the desired purpose. The current literature is based on an extensive use of the Fréchet derivative, which is generally too stringent. Less restrictive concepts involve the Gâteaux and Hadamard (or compact) derivatives (viz., Kallianpur (1963), Reeds (1976) and Fernholz (1983), among others). Using the Hadamard differentiability (along with some other regularity conditions), Reeds (1976) has shown that

√n Rem(U_n − U; τ(·)) →_p 0,  as n → ∞,    (2.1.4)

so that, noting that τ_U(U_n − U) = (1/n) Σ_{i=1}^n IC(X_i; F, T), where IC(x; F, T) is the influence function of T at F, and assuming that σ² = Var_F{IC(X_i; F, T)} < ∞, one obtains that

√n (T(F_n) − T(F)) →_d N(0, σ²).    (2.1.5)
In the context of the law of iterated logarithm or some almost sure (a.s.) representation for T(F_n), one may require a stronger mode of convergence in (2.1.4), and this, in turn, may require a more stringent differentiability condition. However, in a majority of statistical applications, Hadamard differentiability suffices, and we shall explore this concept in the context of functional processes arising in robust (M-) estimation in simple linear models.
Consider the simple linear model:

X_i = β^T c_i + e_i,  i = 1, ..., n,    (2.1.6)

where the c_i are known p-vectors of regression constants, β = (β_1, ..., β_p)^T is the vector of unknown (regression) parameters, p ≥ 1, and the e_i are i.i.d.r.v.'s with d.f. F (∈ 𝔉). Based on a suitable score function ψ: R → R, an M-estimator β̂_n of β is defined as a solution (with respect to β) of the following equations:

Σ_{i=1}^n c_i ψ(X_i − β^T c_i) = 0,    (2.1.7)

where "=" accommodates the possibility of the left-hand side being closest to 0 when equality in (2.1.7) is unattainable (such a case may arise when ψ is not continuous everywhere). Setting Y_i = X_i − β^T c_i (i.i.d.r.v.'s with d.f. F), we shall see that the following empirical function
S_n*(t, y) = Σ_{i=1}^n c_ni I(Y_i ≤ F^{-1}(t) + c_ni^T y),  t ∈ [0,1],    (2.1.8)

arises typically in the study of the asymptotic properties of β̂_n, where the c_ni are suitably normalized versions of the c_i. In particular, we set for every n ≥ p,

C_n = D_n^T D_n = (γ_njk)_{p×p},    (2.1.9)

D_n^0 = Diag(γ_n11^{1/2}, ..., γ_npp^{1/2}),    (2.1.10)

c_ni = (D_n^0)^{-1} c_i = (c_ni1, ..., c_nip)^T.    (2.1.11)

Thus, letting

y = D_n^0 (b − β),  b ∈ R^p,    (2.1.12)

and

M_n(y) = Σ_{i=1}^n c_ni ψ(Y_i − c_ni^T y),    (2.1.13)

we see that (2.1.7) is equivalent to

M_n(y) = 0  (with respect to y).    (2.1.14)

The solution of the implicit (set of) equations is greatly facilitated by the following type of (Jurečková-) linearity for M-processes: for every finite real number K > 0, as n → ∞,

sup_{|y|≤K} | M_n(y) − M_n(0) + γ Q_n y | →_p 0,    (2.1.15)

where |y| = |(v_1, ..., v_q)| = max_{1≤i≤q} |v_i|, y ∈ R^q (q is an arbitrary positive integer), γ = ∫ ψ′ dF > 0 and Q_n = Σ_{i=1}^n c_ni c_ni^T = (D_n^0)^{-1} C_n (D_n^0)^{-1}. Under various conditions on the {c_ni}, the score function ψ and the d.f. F, (2.1.15) has been established (Jurečková, 1984), and this provides an easy access to the study of the asymptotic properties of the M-estimator β̂_n.
We consider a different approach here. Consider the statistical functional process

{ τ( S*_nk(·, y) / Σ_{i=1}^n c_nik );  y ∈ R^p, 1 ≤ k ≤ p },

where S*_nk(t, y) = Σ_{i=1}^n c_nik I(Y_i ≤ F^{-1}(t) + c_ni^T y) are the components of S_n*(t, y) and τ is given by (2.1.2); for this statistical functional process, the results corresponding to (2.1.4) are shown in this chapter. Since, for each y ∈ R^p, M_nk(y) (the k-th component of M_n(y)) is a linear functional of S*_nk(t, y), viz.,

M_nk(y) = ∫_0^1 ψ(F^{-1}(t)) dS*_nk(t, y),  k = 1, ..., p,    (2.1.16)

M_nk(y) could be the Hadamard derivative of a certain functional τ. Thus, using the results of this chapter, for a proper functional τ, we have, for an arbitrary K > 0, as n → ∞,
sup_{|y|≤K} | ω̂_nk + [M_nk(y) − M_nk(0)] | →_p 0,    (2.1.17)

where

ω̂_nk = Σ_{i=1}^n c_nik { τ( S*_nk(·, y)/Σ_{i=1}^n c_nik ) − τ( S*_nk(·, 0)/Σ_{i=1}^n c_nik ) }.    (2.1.18)

Let

ω_nk = (γ Q_n y)_k,  1 ≤ k ≤ p;    (2.1.19)

then, (2.1.15) follows from showing

sup_{|y|≤K} | ω̂_nk − ω_nk | →_p 0,  1 ≤ k ≤ p.    (2.1.20)
Some notations along with basic assumptions are presented in Section 2.2.
In the same section, the notion of statistical functionals and the concept of
Hadamard differentiability are also introduced. The main results along with part
of their derivations are considered in Section 2.3. The proof of Theorem 2.3.1 is
given separately in Section 2.4.
2.2 Preliminary Notions
Consider the space D[0,1] (of right-continuous real-valued functions with left-hand limits) endowed with the uniform topology. The space C[0,1] of real-valued continuous functions, endowed with the uniform topology, is a subspace of D[0,1]. For every y ∈ R^p and 1 ≤ k ≤ p, denote

S*_nk(t, y) = Σ_{i=1}^n c_nik I(Y_i ≤ F^{-1}(t) + c_ni^T y),  t ∈ [0,1],    (2.2.1)

where the Y_i are i.i.d. random variables with d.f. F and c_ni ∈ R^p is given by

c_ni = (D_n^0)^{-1} c_i,  i = 1, ..., n.    (2.2.2)

It is easy to see that, for every y ∈ R^p and 1 ≤ k ≤ p, S*_nk(·, y) is an element of D[0,1]. The population counterparts of the I(Y_i ≤ F^{-1}(t) + c_ni^T y) are the F(F^{-1}(t) + c_ni^T y), and this leads us to consider the following:

S_nk(t, y) = Σ_{i=1}^n c_nik F(F^{-1}(t) + c_ni^T y).    (2.2.3)
Let ||·|| stand for the Euclidean norm. We also write, for every 1 ≤ i ≤ n and 1 ≤ k ≤ p,

c_n·k = (c_n1k, ..., c_nnk)^T;    (2.2.4)

c_nik = c_nik^+ − c_nik^−,  c_nik^+ = max{0, c_nik},  c_nik^− = −min{0, c_nik};    (2.2.5)

c_ni = c_ni^+ − c_ni^−,  c_ni^+ = (c_ni1^+, ..., c_nip^+)^T,  c_ni^− = (c_ni1^−, ..., c_nip^−)^T;    (2.2.6)

c_n·k = c_n·k^+ − c_n·k^−;    (2.2.7)

d_nk^+ = ( Σ_{i=1}^n (c_nik^+)² )^{1/2},  d_nk^− = ( Σ_{i=1}^n (c_nik^−)² )^{1/2};    (2.2.8)

c̃_nik^+ = c_nik^+/d_nk^+ if d_nk^+ > 0, and c̃_nik^+ = 0 if d_nk^+ = 0;    (2.2.9)

c̃_nik^− = c_nik^−/d_nk^− if d_nk^− > 0, and c̃_nik^− = 0 if d_nk^− = 0;    (2.2.10)

c̃_ni^+ = (c̃_ni1^+, ..., c̃_nip^+)^T,  c̃_ni^− = (c̃_ni1^−, ..., c̃_nip^−)^T;    (2.2.11)

S_nk^{+*}(t, y) = Σ_{i=1}^n c_nik^+ I(Y_i ≤ F^{-1}(t) + c_ni^T y),  S_nk^{−*}(t, y) = Σ_{i=1}^n c_nik^− I(Y_i ≤ F^{-1}(t) + c_ni^T y),    (2.2.12)

so that

S*_nk(t, y) = S_nk^{+*}(t, y) − S_nk^{−*}(t, y).    (2.2.13)
Let

1 = (1, ..., 1)^T,    (2.2.14)

and

E_q = [0, 1]^q,    (2.2.15)

where q is an arbitrary positive integer. Then, for any 0 < K < ∞, (t, u) ∈ E_{p+1}, (t, u, v) ∈ E_{2p+1} and 1 ≤ k ≤ p, we write

(2.2.16)

(2.2.17)

(2.2.18)

A*_nk(t, u, v) = Σ_{i=1}^n c_nik I( Y_i ≤ F^{-1}(t) + c_ni^{+T}(2u − 1)K − c_ni^{−T}(2v − 1)K ),    (2.2.19)

(2.2.20)

(2.2.21)

(2.2.22)

(2.2.23)

It is easy to see that

(2.2.24)

where

A*_nk(t, u, u) = S*_nk(t, (2u − 1)K),  A_nk(t, u, u) = E{A*_nk(t, u, u)} = S_nk(t, (2u − 1)K).    (2.2.25)

Also, for a function f defined on E_q, we denote

(2.2.26)

where |x| = |(x_1, ..., x_q)| = max_{1≤i≤q} |x_i| for any positive integer q.
Some assumptions, which may be required for our main results in this chapter, are given below:

(A1) c_nik ≥ 0, i = 1, 2, ..., n;

(A2) lim_{n→∞} n max_{1≤i≤n} ||c_ni||² < ∞;

(B) F is absolutely continuous with a derivative F′ which is positive and continuous with limits at ±∞.
In order to prove our results in Section 2.3, some basic concepts about statistical functionals and Hadamard differentiability are needed.

DEFINITION. Let X_1, ..., X_n be a sample from a population with d.f. F and let T_n = T_n(X_1, ..., X_n) be a statistic. If T_n can be written as a functional T of the empirical d.f. F_n corresponding to the sample X_1, ..., X_n, i.e., T_n = T(F_n), where T does not depend on n, then T will be called a Statistical Functional.

For our current study, we actually consider an extended statistical functional, or a statistical functional process. The domain of definition of T is assumed to contain S*_nk(F(·), y)/Σ_{i=1}^n c_nik for all n ≥ 1, 1 ≤ k ≤ p and y ∈ R^p, as well as the population d.f. F. Usually, the range of T will be the set of real numbers.
As we saw earlier, any statistical functional T induces a functional τ on D[0,1] by the relation given in (2.1.2). In Sections 2.3 and 2.4, we will always assume that the functional τ is induced by a statistical functional T.

DEFINITION. The Influence Function (IF) or Influence Curve (IC) of a statistic T at a d.f. F is usually defined by

IC(x; F, T) = (d/dt) T( F + t(δ_x − F) ) |_{t=0},

where δ_x is the d.f. of the point mass one at x.
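For instance (an illustration under an assumed F, not from the text), for the mean functional T(F) = ∫x dF one has T(F + t(δ_x − F)) = (1 − t)μ + tx, so IC(x; F, T) = x − μ; a finite-difference check:

```python
import numpy as np

def T(xs, ws):
    """Mean functional of a discrete d.f. with atoms xs and masses ws."""
    return float(np.sum(ws * xs))

# A stand-in for F: a discrete uniform d.f. on a grid (assumed, for illustration).
xs = np.linspace(-3.0, 3.0, 601)
ws = np.full(xs.size, 1.0 / xs.size)
mu = T(xs, ws)

def ic(x, t=1e-6):
    """Finite-difference IC(x; F, T): differentiate T((1-t)F + t*delta_x) at t = 0."""
    mixed = T(np.append(xs, x), np.append((1.0 - t) * ws, t))
    return (mixed - mu) / t

print(ic(2.0), 2.0 - mu)   # the two agree: IC(x; F, T) = x - mu
```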
Let V and W be topological vector spaces and let L(V, W) be the set of continuous linear transformations from V to W. Let 𝒜 be an open set of V.

DEFINITION. A functional T: 𝒜 → W is Hadamard Differentiable (or Compactly Differentiable) at F ∈ 𝒜 if there exists T_F ∈ L(V, W) such that, for any compact set Γ of V,

lim_{t→0} [ T(F + tH) − T(F) − T_F(tH) ] / t = 0    (2.2.27)

uniformly for any H ∈ Γ. The linear function T_F is called the Hadamard Derivative of T at F.

For convenience, in (2.2.27), we usually denote

Rem(tH) = T(F + tH) − T(F) − T_F(tH);    (2.2.28)

then, correspondingly, in Sections 2.3 and 2.4, we always use the notation

Rem(tH) = τ(U + tH) − τ(U) − τ_U(tH),    (2.2.29)

where H is an arbitrary element of D[0,1].

By the two previous definitions, we can easily see that the existence of the Hadamard derivative implies the existence of the influence curve, and we also have

T_F(δ_x − F) = IC(x; F, T).    (2.2.30)
2.3 Main Results

THEOREM 2.3.1. Suppose τ: D[0,1] → R is a functional and is Hadamard differentiable at U. For any fixed k = 1, 2, ..., p, assume (A1), (A2) and (B). Then, for any K > 0, as n → ∞,

sup_{|y|≤K} ( Σ_{i=1}^n c_nik ) | Rem( S*_nk(·, y)/Σ_{i=1}^n c_nik − U(·) ) | →_p 0.    (2.3.1)

Therefore, we have, as n → ∞,

sup_{|y|≤K} | Σ_{i=1}^n c_nik { τ( S*_nk(·, y)/Σ_{i=1}^n c_nik ) − τ(U(·)) } − τ_U( S*_nk(·, y) − U(·) Σ_{i=1}^n c_nik ) | →_p 0.    (2.3.2)
The proof of Theorem 2.3.1 will be given in Section 2.4. In the general case of {c_nik}, we have the following theorem.

THEOREM 2.3.2. Suppose τ: D[0,1] → R is a functional and is Hadamard differentiable at U. Assume (A3) and (B). Then, for any 1 ≤ k ≤ p and any K > 0, as n → ∞,

sup_{|y|≤K} | Σ_{i=1}^n c_nik^+ τ( S_nk^{+*}(·, y)/Σ_{i=1}^n c_nik^+ ) − Σ_{i=1}^n c_nik^− τ( S_nk^{−*}(·, y)/Σ_{i=1}^n c_nik^− ) − τ(U) Σ_{i=1}^n c_nik − τ_U( S*_nk(·, y) − U(·) Σ_{i=1}^n c_nik ) | →_p 0.    (2.3.3)

Proof.
Proof. Consider any 1 ≤ k ≤ p. If d+_nk = 0 or d−_nk = 0, then this simply is the case of Theorem 2.3.1, and (2.3.3) is just the same as (2.3.2), since in that case

c̄_ni = c̄+_ni  or  c̄_ni = −c̄−_ni,  1 ≤ i ≤ n.    (2.3.4)

Suppose then that d+_nk > 0 and d−_nk > 0. By the assumption (A3), and noting that

Σ_{i=1}^n (c̄+_nik)² = 1  and  Σ_{i=1}^n (c̄−_nik)² = 1,

for both {c̄+_nik, c̄+_ni} and {c̄−_nik, c̄−_ni}, (A1) and (A2) are satisfied, and the conclusions of Theorem 2.3.1 hold for each of them.
We also observe that, for |y| ≤ K,

S̄*+_nk(t,y) = Σ_{i=1}^n c̄+_nik I( Y_i ≤ F^{-1}(t) + c̄+_ni^T y − c̄−_ni^T y ),

so that S̄*+_nk(·,y), normalized by Σ_{i=1}^n c̄+_nik, is a weighted empirical process of exactly the form treated in Theorem 2.3.1, with {c̄+_nik} in place of {c_nik}.
Therefore, by Theorem 2.3.1, we have, for any K > 0, as n → ∞,

sup_{|y|≤K} | Σ_{i=1}^n c̄+_nik { τ( S̄*+_nk(·,y)/Σ_{i=1}^n c̄+_nik ) − τ(U) } − τ_U( S̄*+_nk(·,y) − U(·) Σ_{i=1}^n c̄+_nik ) | →_P 0.

Since 0 < d+_nk < 1, we have, as n → ∞,

sup_{|y|≤K} | Σ_{i=1}^n c+_nik { τ( S̄*+_nk(·,y)/Σ_{i=1}^n c̄+_nik ) − τ(U) } − τ_U( S*+_nk(·,y) − U(·) Σ_{i=1}^n c+_nik ) | →_P 0.    (2.3.5)
Similarly, we can show that, as n → ∞,

sup_{|y|≤K} | Σ_{i=1}^n c−_nik { τ( S̄*−_nk(·,y)/Σ_{i=1}^n c̄−_nik ) − τ(U) } − τ_U( S*−_nk(·,y) − U(·) Σ_{i=1}^n c−_nik ) | →_P 0.    (2.3.6)

Therefore, (2.3.3) follows from (2.3.5) and (2.3.6). □
COROLLARY 2.3.3. Suppose τ: D[0,1] → R is a functional and is Hadamard differentiable at U. Assume (A2) and (B), and assume

min_{1≤j≤p} { lim_{n→∞} d+_nj } > 0  and  min_{1≤j≤p} { lim_{n→∞} d−_nj } > 0.

Then, for any 1 ≤ k ≤ p and K > 0, we have (2.3.3).

Proof. By the assumptions on d+_nj and d−_nj, there exists a real number c > 0 such that d+_nj ≥ c and d−_nj ≥ c, for all n ≥ 1 and 1 ≤ j ≤ p. By (A2), we have

n max_{1≤i≤n} || c̄_ni ||² < ∞,

and the corresponding bounds for {c̄+_ni} and {c̄−_ni}. Hence, (A3) holds. Therefore, (2.3.3) follows from Theorem 2.3.2. □
2.4 Proof of Theorem 2.3.1

In this section, for any fixed k = 1, ..., p, (A1), (A2) and (B) are assumed. First, we notice that, for a fixed k = 1, ..., p, (A1) and (A2) imply three facts: there exists M > 0 such that, for n large enough and any 1 ≤ j ≤ p,

( Σ_{i=1}^n |c_nij| ) max_{1≤j≤p} max_{1≤i≤n} |c_nij| < M;    (2.4.1)

max_{1≤i≤n} |c_nik| → 0,  as n → ∞;    (2.4.2)

and further

Σ_{i=1}^n c_nik → ∞.    (2.4.3)

We also notice that (B) implies that F' is bounded and uniformly continuous. The result (2.3.1) will first be proved on C[0,1], and then be extended to D[0,1].
For any 0 < K < ∞, consider S*_nk(t,y) on [0,1]×[−K,K]^p. For each i,

I( Y_i ≤ F^{-1}(t) + c_ni^T y ) = 1 if F(Y_i − c_ni^T y) ≤ t, and = 0 otherwise;

the curve γ_ni: t = F(Y_i − c_ni^T y) is continuous in y, because F is continuous. Hence, for each n ≥ 1 and Y_1, ..., Y_n, [0,1]×[−K,K]^p is divided into finitely many pieces by the smooth curves γ_ni, 1 ≤ i ≤ n, and the value of S*_nk(t,y) is simply a constant in each different piece, or region. Also, if at most r curves γ_nij, j = 1, ..., r, intersect at one point, the largest jump of S*_nk(t,y) is Σ_{j=1}^r c_nijk. Let S̃*_nk(t,y) (obtained by smoothing S*_nk(t,y) through the above regions) be a continuous version (in (t,y)) of S*_nk(t,y); then S̃*_nk(t,(2y−1)K) is an element of C([0,1]^{p+1}), and we have the following lemma.
LEMMA 2.4.1. For any fixed k = 1, 2, ..., p, assume (A1). Then,

sup_{(t,y)∈E_1×[−K,K]^p} | S̃*_nk(t,y) − S*_nk(t,y) | ≤ (p+1) max_{1≤i≤n} c_nik,  a.s.    (2.4.4)
Proof. Without loss of generality, suppose that γ_ni, 1 ≤ i ≤ p+2, intersect at one point (t_0, y_0). Since F is strictly increasing, we have

(2.4.5)

which are (p+1) linear equations with respect to y_0 ∈ R^p. Then, there exist two p×p matrices, depending on c_ni, 1 ≤ i ≤ p+2, such that

(2.4.6)

Since (2.4.5) and (2.4.6) imply that Y_1, ..., Y_{p+2} are linearly dependent, and since F is continuous, this event has probability zero. For each n, the probability that more than (p+1) of the γ's intersect at one point depends only on {c_nij}. Hence, with probability one, no more than (p+1) of the γ's intersect at one point for each n ≥ 1. Therefore, with probability one, the largest jump of S*_nk(t,y) is no larger than (p+1) max_{1≤i≤n} c_nik. Since S̃*_nk(t,y) is a continuous smoothing of S*_nk(t,y), (2.4.4) follows. □
LEMMA 2.4.2. Assume (A2) and (B). Then, for any 1 ≤ k ≤ p, as n → ∞,

sup_{(t,u,v)∈E_{2p+1}} | V⁰_nk(t,u,v) − H_nk(t,u,v) | → 0,    (2.4.7)

and

ω_{H_nk}(δ) → 0,  as δ → 0,    (2.4.8)

uniformly with respect to n ≥ 1 and 1 ≤ k ≤ p.

Proof.
For any (t,u,v) ∈ E_{2p+1}, we have

| V⁰_nk(t,u,v) − H_nk(t,u,v) | ≤ K Σ_{i=1}^n Σ_{j=1}^p |c_nik| |c_nij| sup | F'(ξ) − F'(F^{-1}(t)) |,

where the sup is over ξ between F^{-1}(t) and F^{-1}(t) + c+_ni^T(2u−1)K − c−_ni^T(2v−1)K. Then, (2.4.7) follows from (2.4.2) and the uniform continuity of F'.

For any (t,u,v), (s,ū,v̄) ∈ E_{2p+1} with |(t,u,v) − (s,ū,v̄)| ≤ δ, a similar argument gives

| H_nk(t,u,v) − H_nk(s,ū,v̄) | ≤ pK sup_{|t−s|≤δ} | F'(F^{-1}(t)) − F'(F^{-1}(s)) | + 2pK M_1 δ,

where M_1 is a bound of F'. Therefore, (2.4.8) follows from the uniform continuity of F'. □
COROLLARY 2.4.3. Assume (A2) and (B). Then, for any 1 ≤ k ≤ p, we have, as n → ∞,

(2.4.9)

and, as δ → 0,

(2.4.10)

Proof. For any (t,y) ∈ E_{p+1}, the quantities in (2.4.9) and (2.4.10) are bounded by those treated in Lemma 2.4.2. Therefore, (2.4.9) and (2.4.10) follow from Lemma 2.4.2. □
Let

L_q(r) = { (l_1/r, ..., l_q/r) : l_i = 0, 1, ..., r;  i = 1, 2, ..., q },

where r is an arbitrary positive integer.
LEMMA 2.4.4 (Neuhaus). Let δ > 0, m(δ) = max{ r : δ 2^r ≤ 1 }, m > m(δ), and let a function f: E_q → R be given. Then, for points l_1, l_2 ∈ L_q(2^m) with |l_1 − l_2| ≤ 2^{−m(δ)}, the following inequality holds:

| f(l_1) − f(l_2) | ≤ 2 Σ_{r=m(δ)}^{m} Σ_{μ=1}^{q} sup | f(l + e_μ 2^{−r}) − f(l) |,

where the 'sup' is to be taken over all l ∈ L_q(2^r) with l + e_μ 2^{−r} ∈ E_q (e_μ denotes the μ-th unit vector of E_q).
For x = (x_1, ..., x_q), y = (y_1, ..., y_q) ∈ E_q, we denote

x ≤ y  iff  x_i ≤ y_i, for i = 1, 2, ..., q.    (2.4.11)

PROPOSITION 2.4.5. For any fixed k = 1, 2, ..., p, assume (A1), (A2) and (B). Then, for any ε > 0,

lim_{δ→0} lim sup_{n→∞} P( sup{ | V_nk(t,u,v) − V_nk(s,ū,v̄) | ; |(t,u,v) − (s,ū,v̄)| ≤ δ } > ε ) = 0.    (2.4.12)
Proof. For any (t,u), (s,ū) ∈ E_{p+1} with |(t,u) − (s,ū)| < δ, there exist (t',u'), (t'',u''), (s',ū'), (s'',ū'') ∈ L_{p+1}(2^m), where m is an arbitrary positive integer, such that

(t',u') ≤ (t,u) ≤ (t'',u''),  (s',ū') ≤ (s,ū) ≤ (s'',ū''),

and

|(t',u') − (t,u)| ≤ 2^{−m},  |(t'',u'') − (t,u)| ≤ 2^{−m};  |(s',ū') − (s,ū)| ≤ 2^{−m},  |(s'',ū'') − (s,ū)| ≤ 2^{−m}.

We write

W_nk(t,u) − W_nk(s,ū) = [W_nk(t,u) − W_nk(t',u')] + [W_nk(t',u') − W_nk(s'',ū'')] + [W_nk(s'',ū'') − W_nk(s,ū)],

and notice that

|(t',u') − (s'',ū'')| ≤ |(t',u') − (t,u)| + |(t,u) − (s,ū)| + |(s,ū) − (s'',ū'')| ≤ δ + 2^{−m+1}.

Consider [W_nk(t,u) − W_nk(t',u')]. Since A*_nk(t,u,v) is monotonic in each component of (t,u,v),

W_nk(t,u) − W_nk(t',u') ≤ [V_nk(t'',u'',u') − V_nk(t',u',u')] − [V⁰_nk(t,u,u) − V⁰_nk(t'',u'',u')] + (t''−t) Σ_{i=1}^n c_nik

and

W_nk(t,u) − W_nk(t',u') ≥ [V_nk(t',u',u'') − V_nk(t',u',u')] − [V⁰_nk(t,u,u) − V⁰_nk(t',u',u'')] − (t−t') Σ_{i=1}^n c_nik;

hence,

| W_nk(t,u) − W_nk(t',u') | ≤ 2 ω_{V⁰_nk}(2^{−m}) + 2^{−m} Σ_{i=1}^n c_nik + sup{ | V_nk(t,u,v) − V_nk(s,ū,v̄) | ; |(t,u,v) − (s,ū,v̄)| ≤ 2^{−m} }.

We can treat [W_nk(s'',ū'') − W_nk(s,ū)] in the similar way. Therefore, we have

(2.4.13)
Since, by Lemma 2.4.2, we have

ω_{V⁰_nk}(2^{−m_n}) → 0,  as n → ∞,    (2.4.14)

where

m_n = max{ m : a_n ≤ 2^{−cm} },    (2.4.15)

with

c = (4p+3)/(4p+4)  and  a_n = max_{1≤j≤p} max_{1≤i≤n} |c_nij|,    (2.4.16)

so that m_n → ∞, as n → ∞. By (2.4.1),

2^{−m_n} Σ_{i=1}^n c_nik ≤ M (2^{m_n} a_n)^{−1} ≤ M 2^{c − m_n/(4(p+1))} → 0,  as n → ∞.    (2.4.17)

Hence, by virtue of (2.4.13), (2.4.14) and (2.4.17), it suffices to show that
the corresponding statement holds over the lattice points of L_{2p+1}(2^{m_n}), noting that m(δ) → ∞, as δ → 0. By Lemma 2.4.4, we have the corresponding chaining bound. Choose a ∈ (0,1) such that a^{4(p+1)} √2 > 1. Since

Σ_{r=m(δ)}^{m_n} a^{r−m(δ)} ε(1−a)/(4(2p+1)²) = ε(1 − a^{m_n−m(δ)+1})/(4(2p+1)²) < ε/(4(2p+1)²),

we have

P( sup{ | V_nk(t,u,v) − V_nk(s,ū,v̄) | ; ... } > ε ) ≤ Σ_{r=m(δ)}^{m_n} Σ_{μ=1}^{2p+1} P( sup | V_nk(l + e_μ 2^{−r}) − V_nk(l) | > a^{r−m(δ)} ε(1−a)/(4(2p+1)²) ).    (2.4.18)
We notice that if a r.v. ξ takes only the values 0 and 1, then E{ξ^l} = Eξ for l ≥ 1. Let

α_1 = E[ V_nk(l) − V_nk(l + e_1 2^{−r}) ]^{4(p+1)} = E[ Σ_{i=1}^n c_nik ( ξ_ni − Eξ_ni ) ]^{4(p+1)},

where

ξ_ni = I( F^{-1}(t) + c+_ni^T(2u−1)K − c−_ni^T(2v−1)K < Y_i ≤ F^{-1}(t+2^{−r}) + c+_ni^T(2u−1)K − c−_ni^T(2v−1)K ).

For ξ_ni, we have

E(ξ_ni − Eξ_ni)^l = 1 for l = 0;  = 0 for l = 1;  and  | E(ξ_ni − Eξ_ni)^l | ≤ C Eξ_ni for 2 ≤ l ≤ 4(p+1),

where C is a constant. By the choice of m_n, (2.4.15) and (2.4.16), we have a_n ≤ 2^{−cr} for r ≤ m_n. So,

Eξ_ni = F( F^{-1}(t+2^{−r}) + c+_ni^T(2u−1)K − c−_ni^T(2v−1)K ) − F( F^{-1}(t) + c+_ni^T(2u−1)K − c−_ni^T(2v−1)K )
      ≤ 2^{−r} + M_1 a_n ≤ M_2 2^{−cr},

where M_1, M_2 are constants.
Let

(2.4.19)

where a∧b = min{a,b}, a∨b = max{a,b}, and M_3, M_4 are constants. Therefore,

α_1 ≤ M_5 ( 4(2p+1)² / (ε(1−a) a^{r−m(δ)}) )^{4(p+1)} 2^{r(2p+1)} 2^{−cr·2(p+1)} = M_5 ( 4(2p+1)² / (ε(1−a) a^{r−m(δ)}) )^{4(p+1)} 2^{−r/2},    (2.4.20)

where M_5 is a constant.
Similarly, let

α'_μ = E[ Σ_{i=1}^n c_nik ( η_ni − Eη_ni ) ]^{4(p+1)},  α''_μ = E[ Σ_{i=1}^n c_nik ( ζ_ni − Eζ_ni ) ]^{4(p+1)},

where η_ni and ζ_ni are indicators of the same form as ξ_ni, with the increment 2^{−r} placed on a coordinate of u or of v instead of t; e.g., events of the form

{ F^{-1}(t) + (c+_ni^T, −c−_ni^T) ((2u−1)^T, (2v−1)^T)^T K < Y_i ≤ ... }.

Then,

Eη_ni ≤ M_6 2^{−cr}  and  Eζ_ni ≤ M_6 2^{−cr},

where M_6 is a constant. Therefore, for 2 ≤ μ ≤ 2p+1,

α_μ = E[ V_nk(l) − V_nk(l + e_μ 2^{−r}) ]^{4(p+1)} ≤ M_7 2^{−cr·2(p+1)},

where M_7 is a constant.
In (2.4.18), we have

Σ_{l_1=0}^{2^r} ··· Σ_{l_μ=0}^{2^r−1} ··· Σ_{l_{2p+1}=0}^{2^r} ( 4(2p+1)² / (ε(1−a) a^{r−m(δ)}) )^{4(p+1)} E[ V_nk(l) − V_nk(l + e_μ 2^{−r}) ]^{4(p+1)}
  ≤ M_7 ( 4(2p+1)² / (ε(1−a) a^{r−m(δ)}) )^{4(p+1)} 2^{r(2p+1)} 2^{−cr·2(p+1)}
  ≤ M_8 ( 4(2p+1)² / (ε(1−a) a^{r−m(δ)}) )^{4(p+1)} 2^{−r/2},    (2.4.21)

where M_8 is a constant.
From (2.4.18), (2.4.20) and (2.4.21), we have, as δ → 0,

Σ_{r=m(δ)}^{m_n} M_9 ( 4(2p+1)² / (ε(1−a) a^{r−m(δ)}) )^{4(p+1)} 2^{−r/2} ≤ M_9 ( 4(2p+1)² / (ε(1−a)) )^{4(p+1)} 2^{−m(δ)/2} Σ_{r=0}^{∞} ρ^r → 0,

where M_9 is a constant, because ρ = ( a^{4(p+1)} √2 )^{−1} < 1 and m(δ) → ∞, as δ → 0. □
PROPOSITION 2.4.6. For any fixed k = 1, 2, ..., p, assume (A1), (A2) and (B). Let

T_nk(t,y) = S̃*_nk( t, (2y−1)K ) − U(t) Σ_{i=1}^n c_nik,  (t,y) ∈ E_{p+1},

and let {P_nk; n ≥ 1} be the sequence of probability measures corresponding to T_nk, n ≥ 1. Then, {P_nk} is relatively compact.
Proof. Note that T_nk(t,y) ∈ C([0,1]^{p+1}) and S*_nk(0,−K1) = 0, for n ≥ 1. By (2.4.4), we have

| T_nk(0,0) | = | S̃*_nk(0,−K1) | ≤ (p+1) max_{1≤i≤n} c_nik,  a.s.

Hence,

P⁰_nk = P_nk π_0^{−1} converges in distribution.    (2.4.22)

By virtue of Neuhaus' (1971) discussion on pages 1290-1291, it suffices to show that, for any ε > 0,

lim_{δ→0} lim_{n→∞} P( ω_{T_nk}(δ) > ε ) = 0,    (2.4.23)

because (2.4.22) and (2.4.23) imply that {P_nk} is relatively compact.
Since, for any (t,y), (s,ȳ) ∈ E_{p+1}, the increment of T_nk splits into the corresponding increments of S̃*_nk and of the drift term, by (2.4.4) we have

ω_{T_nk}(δ) ≤ 2(p+1) max_{1≤i≤n} c_nik + ω_{W_nk}(δ) + ω_{W⁰_nk}(δ),  a.s.

By Corollary 2.4.3, we have (2.4.14) for W⁰_nk from a similar proof. Therefore, (2.4.23) follows from (2.4.2), (2.4.14) for W⁰_nk, and Proposition 2.4.5. □
LEMMA 2.4.7. Suppose Γ = {T_λ; λ ∈ Λ} is a compact set in C([0,1]^{p+1}), and let Γ_1 = {T_λ(·,y); λ ∈ Λ, y ∈ E_p}; then Γ_1 is a compact set in C[0,1].
Proof. It suffices to show that any infinite sequence in Γ_1 contains a convergent subsequence in Γ_1.

Suppose {T_{λ_n}(·, y_n)} is a sequence in Γ_1; then {T_{λ_n}(·, ·)} is a sequence in Γ. Since Γ is compact, {T_{λ_n}(·, ·)} contains a convergent subsequence, also denoted by {T_{λ_n}(·, ·)}, such that

T_{λ_n} → T_{λ_0},  as n → ∞,    (2.4.24)

where λ_0 ∈ Λ. Since y_n ∈ E_p, there exists a convergent subsequence, also denoted by y_n, such that

y_n → y_0 ∈ E_p,  as n → ∞.    (2.4.25)

For any t ∈ [0,1], we have

| T_{λ_n}(t, y_n) − T_{λ_0}(t, y_0) | ≤ || T_{λ_n} − T_{λ_0} || + | T_{λ_0}(t, y_n) − T_{λ_0}(t, y_0) |.

Therefore, by (2.4.24), (2.4.25) and the uniform continuity of T_{λ_0} on E_{p+1}, we have

T_{λ_n}(·, y_n) → T_{λ_0}(·, y_0),  as n → ∞. □
PROPOSITION 2.4.8. For any fixed k = 1, 2, ..., p, assume (A1), (A2) and (B). If, for any compact set Γ_0 in C[0,1],

lim_{t→0} Rem(tH)/t = 0    (2.4.26)

uniformly for H ∈ Γ_0, then, for any K > 0, as n → ∞,

sup_{|y|≤K} | Σ_{i=1}^n c_nik Rem( S*_nk(·,y)/Σ_{i=1}^n c_nik − U(·) ) | →_P 0.    (2.4.27)
Proof. From Proposition 2.4.6, we know that {P_nk} is relatively compact in C([0,1]^{p+1}), where P_nk is the probability measure corresponding to T_nk. Since C([0,1]^{p+1}) is complete and separable, by Prohorov's theorem (Billingsley, 1968, Theorem 6.2), {P_nk} is tight, i.e., for any ε > 0, there exists a compact set Γ in C([0,1]^{p+1}) such that

P( T_nk ∈ Γ ) > 1 − ε,  for n ≥ 1.    (2.4.28)

Using the same definition of Γ_1 in Lemma 2.4.7 with respect to Γ, we know that Γ_1 is a compact set of C[0,1]. By (2.4.3) and (2.4.26), there exists a positive integer N such that

| Σ_{i=1}^n c_nik Rem( H / Σ_{i=1}^n c_nik ) | ≤ ε,  for n ≥ N, H ∈ Γ_1,    (2.4.29)

and hence, for n ≥ N and y ∈ E_p,

(2.4.30)

and, for n ≥ N,

(2.4.31)

Since (2.4.30) implies

sup_{y∈E_p} | Σ_{i=1}^n c_nik Rem( T_nk(·,y) / Σ_{i=1}^n c_nik ) | ≤ ε,  for n ≥ N,

and since (2.4.31) can be equivalently written as

(2.4.32)

therefore, from (2.4.28) and (2.4.32), we have, for n ≥ N,

P( sup_{|y|≤K} | Σ_{i=1}^n c_nik Rem( S*_nk(·,y)/Σ_{i=1}^n c_nik − U(·) ) | > ε ) < ε. □
Let Γ be a set in D[0,1] and H ∈ D[0,1]; define

dist(H, Γ) = inf_{G∈Γ} || H − G ||.    (2.4.33)

LEMMA 2.4.9. Let Q: D[0,1]×R → R, and suppose that, for any compact set Γ in D[0,1],

lim_{t→0} Q(H, t) = 0    (2.4.34)

uniformly for H ∈ Γ. Let ε > 0 and let α_n, β_n be sequences of real numbers such that α_n → 0, β_n → 0, as n → ∞. Then, for any compact set Γ in D[0,1], there exists a positive integer N such that, if dist(H, Γ) ≤ α_n, then

| Q(H, β_n) | < ε,  for n ≥ N.

Proof.
Suppose not. Then, for some real number ε > 0, there exist a compact set Γ in D[0,1] and a sequence {H_k} ⊂ D[0,1] with dist(H_k, Γ) ≤ α_{n_k} such that

| Q(H_k, β_{n_k}) | ≥ ε.    (2.4.35)

Since dist(H_k, Γ) ≤ α_{n_k}, we can choose H̃_k ∈ Γ such that || H_k − H̃_k || ≤ 2 α_{n_k}. Since {H̃_k} ⊂ Γ and Γ is a compact set, {H̃_k} has an accumulation point H* ∈ Γ. Therefore, we can choose a subsequence of {H_k}, also denoted by {H_k}, such that

H̃_k → H*,  as k → ∞.

Since α_{n_k} → 0, we also have H_k → H*, and the set

Γ_1 = { H_k; k ≥ 1 } ∪ { H* }

is compact. By (2.4.34), we have

Q(H_k, t) → 0,  as t → 0,

uniformly for H_k ∈ Γ_1. This contradicts (2.4.35). □
From the discussion by Fernholz (1983), we know that the [S*_nk(·,y) − U(·) Σ_{i=1}^n c_nik] are not random elements of D[0,1] with the uniform topology. Next, we will use the inner probability measure P*, corresponding to P, to deal with this problem.
LEMMA 2.4.10. Under the assumptions of Proposition 2.4.8, for ε > 0, there exists a compact set Γ in D[0,1] such that, for all n ≥ 1,

P*{ dist( [S*_nk(·,(2y−1)K) − U(·) Σ_{i=1}^n c_nik], Γ ) ≤ (p+1) max_{1≤i≤n} c_nik, ∀ y ∈ E_p } > 1 − ε.

Proof. From the proof of Proposition 2.4.8, there exists a compact set Γ̃ in C([0,1]^{p+1}) such that, for all n ≥ 1,

P( T_nk ∈ Γ̃ ) > 1 − ε.

By (2.4.4), we have, for n ≥ 1,

P{ T_nk ∈ Γ̃,  || S̃*_nk(·,·) − S*_nk(·,·) || ≤ (p+1) max_{1≤i≤n} c_nik } ≥ 1 − ε.    (2.4.36)

By Lemma 2.4.7, we know that Γ̃ induces a compact set Γ̃_1 in C[0,1], which is also a compact set of D[0,1], because C[0,1] is a subspace of D[0,1]. We also know that if T_nk ∈ Γ̃, then

T_nk(·,y) ∈ Γ̃_1,  for any y ∈ E_p.    (2.4.37)

By the definition, we know that || S̃*_nk(·,·) − S*_nk(·,·) || ≤ (p+1) max_{1≤i≤n} c_nik implies

|| S̃*_nk(·,(2y−1)K) − S*_nk(·,(2y−1)K) || ≤ (p+1) max_{1≤i≤n} c_nik,  for any y ∈ E_p.    (2.4.38)

Since (2.4.37) and (2.4.38) imply

dist( [S*_nk(·,(2y−1)K) − U(·) Σ_{i=1}^n c_nik], Γ̃_1 ) ≤ (p+1) max_{1≤i≤n} c_nik,

by (2.4.36), we have, for n ≥ 1,

P*{ dist( [S*_nk(·,(2y−1)K) − U(·) Σ_{i=1}^n c_nik], Γ̃_1 ) ≤ (p+1) max_{1≤i≤n} c_nik, ∀ y ∈ E_p } > 1 − ε. □
Proof of Theorem 2.3.1. Since τ: D[0,1] → R is Hadamard differentiable at U, by the definition of Hadamard differentiability (2.2.27), (2.4.26) holds.

By Lemma 2.4.10, for ε > 0, there exists a compact set Γ in D[0,1] such that, for n ≥ 1,

P*{ dist( [S*_nk(·,(2y−1)K) − U(·) Σ_{i=1}^n c_nik], Γ ) ≤ (p+1) max_{1≤i≤n} c_nik, ∀ y ∈ E_p } > 1 − ε.

Therefore, we can find measurable sets B_n, for all n, such that

B_n ⊂ { dist( [S*_nk(·,(2y−1)K) − U(·) Σ_{i=1}^n c_nik], Γ ) ≤ (p+1) max_{1≤i≤n} c_nik, ∀ y ∈ E_p }  and  P(B_n) > 1 − ε.    (2.4.39)

Using Lemma 2.4.9 with Q(H,t) = Rem(tH)/t, for the compact set Γ, there exists a positive integer N such that, if dist(H, Γ) ≤ (p+1) max_{1≤i≤n} c_nik for n ≥ N, then

| Σ_{i=1}^n c_nik Rem( H / Σ_{i=1}^n c_nik ) | < ε;

hence

{ dist( [S*_nk(·,(2y−1)K) − U(·) Σ_{i=1}^n c_nik], Γ ) ≤ (p+1) max_{1≤i≤n} c_nik, ∀ y ∈ E_p }

implies

sup_{y∈E_p} | Σ_{i=1}^n c_nik Rem( S*_nk(·,(2y−1)K)/Σ_{i=1}^n c_nik − U(·) ) | < ε.    (2.4.40)

Since (2.4.40) is equivalent to the same bound with the supremum taken over |y| ≤ K,    (2.4.41)

by (2.4.39), we have, for n ≥ N,

P( sup_{|y|≤K} | Σ_{i=1}^n c_nik Rem( S*_nk(·,y)/Σ_{i=1}^n c_nik − U(·) ) | > ε ) < ε. □
Remark (1). In the proof of Theorem 2.3.1, we used the fact that, for each y,

Rem( S*_nk(·,y)/Σ_{i=1}^n c_nik − U(·) )

is measurable, even though S*_nk(·,y) is not a random element of D[0,1], and therefore we used the probability measure P for events concerning this function. This is because

τ( S*_nk(·,y)/Σ_{i=1}^n c_nik ) − τ(U)

is measurable, and because, by Lemma 4.4.1 of Fernholz (1983), we know that

τ_U( S*_nk(·,y) − U(·) Σ_{i=1}^n c_nik ) = Σ_{i=1}^n c_nik τ_U( ( δ_{(Y_i − c_ni^T y)} − F ) ∘ F^{-1} )    (2.4.42)

is also measurable, where (2.4.42), without any assumptions on {c_nik} and the underlying d.f. F, is true as long as τ is Hadamard differentiable at U.
Remark (2). If, in Theorem 2.3.1 and Proposition 2.4.8, either supremum over |y| ≤ K is not measurable, we replace |y| ≤ K by y ∈ Q_K, where

Q_K = { all rational points in [−K, K]^p }.

Then both suprema are measurable. Our main results in Section 2.3 will be slightly different, but still good enough for the study of (2.1.15).
CHAPTER III
ON M-ESTIMATION IN LINEAR MODELS

3.1 Introduction

The linear regression model is one of the most widely used tools in statistical analysis. In this chapter, we consider the simple linear regression model

X_i = β^T c_i + e_i,  i = 1, ..., n,    (3.1.1)

where the c_i are known p-vectors of regression constants, β = (β_1, ..., β_p)^T is the vector of unknown (regression) parameters, p ≥ 1, which is to be estimated from the n observations X_1, ..., X_n, and the e_i are independent and identically distributed random variables (i.i.d.r.v.) with distribution function (d.f.) F. We should notice that (3.1.1) reduces to the classical location model if all the c_i, with p = 1, are equal to 1.
Classically, the problem is solved by minimizing the sum of squares (with respect to β):

Σ_{i=1}^n ( X_i − c_i^T β )² = min!    (3.1.2)

or, equivalently, by solving the system of p equations (with respect to β) obtained by differentiating (3.1.2):

Σ_{i=1}^n c_i ( X_i − c_i^T β ) = 0.    (3.1.3)
The computation of the least squares estimators (LSE), defined by the estimating equations (3.1.3), which are linear equations, can easily be done. Furthermore, the statistical properties of the LSE, such as consistency and asymptotic normality, have been studied in detail (Huber, 1981). But, as we know, in spite of its mathematical beauty and computational simplicity, the least squares estimator suffers from a dramatic lack of robustness. Indeed, one single outlier can have an arbitrarily large effect on the estimate. Hence, other 'robust' estimators of β have been proposed. In particular, Huber (1973) robustified the classical equations (3.1.2) and (3.1.3) in a straightforward way: instead of minimizing a sum of squares, we minimize a sum of a less rapidly increasing function ρ of the residuals, i.e.,

Σ_{i=1}^n ρ( X_i − c_i^T β ) = min!    (3.1.4)

or, if ρ is convex with derivative ψ, (3.1.4) is equivalent to

Σ_{i=1}^n c_i ψ( X_i − c_i^T β ) = 0.    (3.1.5)

The resulting estimators of (3.1.5), β̂_n, are called M-estimators of regression, and ψ is called the score function of the M-estimators.
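To make (3.1.4)-(3.1.5) concrete, here is a small numerical sketch (with simulated data, the scale fixed at 1, and Huber's score ψ(x) = min{|x|, k} sgn(x) — all choices of this illustration, not requirements of the text) comparing the LSE with an M-estimate computed by iteratively reweighted least squares:

```python
import numpy as np

# Simulated linear model X_i = beta^T c_i + e_i with one gross outlier.
rng = np.random.default_rng(0)
n = 50
beta_true = np.array([2.0, -1.0])
C = np.column_stack([np.ones(n), rng.normal(size=n)])  # regression constants c_i
X = C @ beta_true + rng.normal(size=n)
X[0] += 50.0                                           # a single outlier

def psi(x, k=1.345):
    # Huber score: min{|x|, k} sgn(x)
    return np.clip(x, -k, k)

def m_estimate(C, X, k=1.345, iters=100):
    # Solve sum_i c_i psi(X_i - c_i^T beta) = 0 by iteratively
    # reweighted least squares with weights w_i = psi(r_i)/r_i.
    beta = np.linalg.lstsq(C, X, rcond=None)[0]        # start from the LSE
    for _ in range(iters):
        r = X - C @ beta
        w = np.where(np.abs(r) < 1e-10, 1.0, psi(r, k) / r)
        beta = np.linalg.solve(C.T @ (w[:, None] * C), C.T @ (w * X))
    return beta

beta_lse = np.linalg.lstsq(C, X, rcond=None)[0]
beta_m = m_estimate(C, X)
# The outlier drags the LSE, while the M-estimate stays near beta_true.
```

The reweighting trick works because, at a root of (3.1.5), Σ c_i w_i (X_i − c_i^T β) = 0 with w_i = ψ(r_i)/r_i, which is a weighted least squares system.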
To study the asymptotic properties of the M-estimators of regression, various procedures have been considered by various people.

Huber (1973) proved the asymptotic normality of the M-estimators of regression, assuming that the score function ψ is continuous, nondecreasing and bounded with a bounded second derivative. Koul (1977) and Jurečková (1977, indirectly) also proved the asymptotic normality under weak conditions on ψ, but they both required that the underlying d.f. F have finite Fisher information, i.e.,

0 < I(f) = ∫ ( f'/f )² dF < ∞,

where F' = f is the density function of F. As a corollary of Theorem 4.1 of Bickel (1975), Yohai and Maronna (1979) obtained the asymptotic normality under certain assumptions on ψ and F, one of which is given below:

[ ψ(u+z) − ψ(u) ] / z > d,    (3.1.6)

where b, c, and d are positive real numbers satisfying q = F(c) − F(−c) > 0.

Relles (1968) proved the consistency of the M-estimators of regression when ψ belongs to the Huber family

ψ(x, k) = min{ |x|, k } sgn(x).

Yohai and Maronna (1979) proved the weak consistency for nondecreasing ψ when (3.1.6) is assumed.
For non-convex ρ, Jurečková (1989) derived the asymptotic normality and the weak consistency of the M-estimators in linear models, assuming certain existence of the second derivative of F.
The interest of our current research is to study the asymptotic properties of the M-estimators of regression under weak conditions on the score function ψ and the underlying d.f. F. Since M-estimators are implicitly defined by the estimating equations, it is difficult to study their statistical properties directly. The principal difficulty is the non-linearity of the estimators. Therefore, one of the approaches presented in this chapter is to use Hadamard differentiability through Jurečková's (1971, 1977) uniform asymptotic linearity of the M-estimators of regression. Other approaches are also considered in order to study the strong consistency, to which the Hadamard differentiability approach may not apply. The results show that Hadamard differentiability provides a good method for the study of the asymptotic properties through the linear approximation of the estimator.
Some notation, along with basic assumptions and some preliminary lemmas, is presented in Section 3.2. The asymptotic normality of the M-estimators of regression is considered in Section 3.3, where none of the assumptions such as a bounded second derivative of ψ, finite Fisher information of F, (3.1.6), or any existence of the second derivative of F is required. The weak and strong consistency of the M-estimators in linear models are investigated in Section 3.4, where neither (3.1.6) nor any existence of the second derivative of F is assumed.
3.2 Preliminaries

For the simple linear model (3.1.1), we consider the following normalized version of the estimating equations in (3.1.5):

M_n(y) = Σ_{i=1}^n c_ni ψ( Y_i − c_ni^T y ),    (3.2.1)

where Y_i = X_i − β^T c_i are i.i.d.r.v.'s with d.f. F, and, for every n ≥ p,

C_n = Σ_{i=1}^n c_i c_i^T,    (3.2.2)

D_n = Diag( C_n11^{1/2}, ..., C_npp^{1/2} ),    (3.2.3)

c_ni = ( c_ni1, ..., c_nip )^T = D_n^{−1} c_i,    (3.2.4)

(3.2.5)

(3.2.6)

Then, we see that (3.1.5) is equivalent to

M_n(y) = 0  (with respect to y),    (3.2.7)

i.e., Y_n = D_n( β̂_n − β ) is a solution of (3.2.7), and, for any 1 ≤ j ≤ p,

(3.2.8)

where ||·|| stands for the Euclidean norm and

(3.2.9)
Since M_n(y) involves the following empirical function

(3.2.10)

in particular, for each y ∈ R^p and 1 ≤ k ≤ p, M_nk(y) (the k-th component of M_n(y)) is a linear functional of

S*_nk(t,y) = Σ_{i=1}^n c_nik I( Y_i ≤ F^{-1}(t) + c_ni^T y )    (3.2.11)

(the k-th component of S*_n(t,y)), viz.,

(3.2.12)

we consider the expected value of S*_nk(t,y):

S_nk(t,y) = Σ_{i=1}^n c_nik F( F^{-1}(t) + c_ni^T y ).    (3.2.13)
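The weighted empirical function in (3.2.11) and its expectation in (3.2.13) are easy to simulate; the following sketch (logistic F and randomly generated normalized weights — choices of this illustration, not of the text) checks that S*_nk(t,y) fluctuates around S_nk(t,y):

```python
import numpy as np

# F is taken to be the logistic d.f. here, so F^{-1} has a closed form.
def F(x):
    return 1.0 / (1.0 + np.exp(-x))

def F_inv(t):
    return np.log(t / (1.0 - t))

rng = np.random.default_rng(1)
n, p = 400, 2
c = rng.normal(size=(n, p))
c /= np.sqrt((c ** 2).sum(axis=0))       # normalize: sum_i c_nik^2 = 1
Y = rng.logistic(size=n)                 # Y_i i.i.d. with d.f. F

def S_star(t, y, k=0):
    # S*_nk(t,y) = sum_i c_nik I(Y_i <= F^{-1}(t) + c_ni^T y)
    return float(np.sum(c[:, k] * (Y <= F_inv(t) + c @ y)))

def S_mean(t, y, k=0):
    # S_nk(t,y) = sum_i c_nik F(F^{-1}(t) + c_ni^T y)
    return float(np.sum(c[:, k] * F(F_inv(t) + c @ y)))
```

At t = 0.5 and y = 0, for instance, the difference S_star − S_mean has standard deviation ( Σ_i c²_nik t(1−t) )^{1/2} = 0.5, so it stays within a few multiples of 0.5.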
We also write, for every 1 ≤ i ≤ n, 1 ≤ k ≤ p, t ∈ [0,1] and y ∈ R^p,

c+_nik = max{0, c_nik};  c−_nik = −min{0, c_nik};  c_nik = c+_nik − c−_nik;    (3.2.14)

c+_ni = (c+_ni1, ..., c+_nip)^T;  c−_ni = (c−_ni1, ..., c−_nip)^T;  c_ni = c+_ni − c−_ni;    (3.2.15)

(3.2.16)

d+_nk = ( Σ_{i=1}^n (c+_nik)² )^{1/2};  d−_nk = ( Σ_{i=1}^n (c−_nik)² )^{1/2};    (3.2.17)

c̄+_nik = c+_nik / d+_nk if d+_nk > 0, and c̄+_nik = 0 if d+_nk = 0;    (3.2.18)

c̄−_nik = c−_nik / d−_nk if d−_nk > 0, and c̄−_nik = 0 if d−_nk = 0;    (3.2.19)

(3.2.20)

(3.2.21)

so that

(3.2.22)

and

(3.2.23)
Let

1 = (1, ..., 1)^T,    (3.2.24)

and

E_q = [0,1]^q,    (3.2.25)

where q is an arbitrary positive integer. Then, for any 0 < K < ∞, (t,u) ∈ E_{p+1}, (t,u,v) ∈ E_{2p+1}, and 1 ≤ k ≤ p, we write

(3.2.26)

(3.2.27)

(3.2.28)

A*_nk(t,u,v) = Σ_{i=1}^n c_nik I( Y_i ≤ F^{-1}(t) + c+_ni^T (2u−1)K − c−_ni^T (2v−1)K ),    (3.2.29)

(3.2.30)

(3.2.31)

(3.2.32)

(3.2.33)
It is easy to see that

(3.2.34)

where

A*_nk(t,y,y) = S*_nk( t, (2y−1)K ).    (3.2.35)

Also, for a function f defined on E_q, we denote

(3.2.36)

where |x| = |(x_1, ..., x_q)| = max_{1≤i≤q} |x_i|.
In this chapter, we will always consider the space D[0,1] (of right-continuous real-valued functions with left-hand limits) endowed with the uniform topology. Then, the space C[0,1] of real-valued continuous functions, endowed with the uniform topology, is a subspace of D[0,1]. It is easy to see that, for every y ∈ R^p and 1 ≤ k ≤ p, S*_nk(·, y) is an element of D[0,1]. A convention which we will follow throughout the whole chapter is: a function f: R → R is nondecreasing if f(x) ≤ f(y) for x ≤ y, and is increasing if f(x) < f(y) for x < y; analogously for nonincreasing and decreasing.
Some assumptions, which may be required for our results in this chapter, are given below:

(A1) c_nik ≥ 0, i = 1, 2, ..., n;

(A2) lim_{n→∞}

(A3)

(A4) There exists a positive definite p×p matrix Q such that lim_{n→∞} Q_n = Q;

(B) F is absolutely continuous with a derivative F' which is positive and continuous, with limits at ±∞;

(C1) ψ is a nondecreasing function with a range including positive and negative real numbers;

(C2) ∫ ψ dF = 0;

(C3) 0 < γ = ∫ ψ' dF < ∞;

(C4) 0 < σ² = ∫ ψ² dF < ∞;

(C5) ψ is right continuous and bounded.
We notice that, for any fixed k = 1, ..., p, (A1) and (A2) imply that

(3.2.37)

as n → ∞, and that (A2) implies two facts: there exists C > 0 such that, for n ≥ 1 and any 1 ≤ j ≤ p,

(3.2.38)

where

(3.2.39)

and

(3.2.40)

as n → ∞. We also notice that (B) implies that F' is bounded and uniformly continuous.

For x = (x_1, ..., x_q), y = (y_1, ..., y_q) ∈ E_q, we denote

x ≤ y  iff  x_i ≤ y_i, for i = 1, 2, ..., q.    (3.2.41)

LEMMA 3.2.1. For a fixed k = 1, ..., p, assume (A1), (A2) and (B). Then, for any K > 0, as n → ∞,

(3.2.42)
Proof. Let L_q(r) = { (l_1/r, ..., l_q/r) : l_i = 0, 1, ..., r; i = 1, 2, ..., q }, where r is an arbitrary positive integer. For any (t,u) ∈ E_{p+1}, there exist (t',u'), (t'',u'') ∈ L_{p+1}(2^m), where m is an arbitrary positive integer, such that

(t',u') ≤ (t,u) ≤ (t'',u'')

and

|(t',u') − (t,u)| ≤ 2^{−m},  |(t'',u'') − (t,u)| ≤ 2^{−m}.

We write the corresponding decomposition of W_nk, and consider [W_nk(t,u) − W_nk(t',u')]. Since A*_nk(t,u,v) is monotonic in each component of (t,u,v) ∈ E_{2p+1}, where u, v ∈ E_p, from the proof of Proposition 2.4.5 we obtain the analogous two-sided bounds. Hence, we have

sup_{(t,u)∈E_{p+1}} | W_nk(t,u) − W_nk(t',u') | ≤ sup_{(t,u)∈L_{p+1}(2^m)} | W_nk(t,u) − W_nk(t',u') | + 2 ω_{V⁰_nk}(2^{−m}) + 2^{−m+1} Σ_{i=1}^n c_nik.

Let

m_n = max{ m : a_n ≤ 2^{−cm} },  where c = (4p+3)/(4p+4);

then, by (2.4.14), (2.4.17) and the proof of Proposition 2.4.5, i.e., as n → ∞,

(3.2.44)

it suffices to show that, as n → ∞,

(3.2.45)
For any (t,u) ∈ E_{p+1}, we write the corresponding decomposition, in which the summands involve indicator variables ξ_ni. We notice that if a r.v. ξ satisfies

E{ |ξ|^j } = 1 for j = 0,  and  E{ |ξ|^j } = E|ξ| for j ≥ 1,

then its central moments of order l ≥ 2 are bounded by a constant multiple of E|ξ|. So, for ξ_ni, we have

E(ξ_ni − Eξ_ni)^l = 1 for l = 0;  = 0 for l = 1;  and  | E(ξ_ni − Eξ_ni)^l | ≤ M E|ξ_ni| for 2 ≤ l ≤ 4(p+1),

where M is a constant. Since E|ξ_ni| admits a bound involving F'(η), where η is between [F^{-1}(t) + c̄_ni^T(2u−1)K] and F^{-1}(t), and M_1 is a constant, for any ρ > 0, from the proof of Proposition 2.4.5, we have

P{ max_{(t,u)∈L_{p+1}(2^{m_n})} | W_nk(t,u) − W_nk(t',u') | > ρ }
  ≤ (2^{m_n}+1)^{p+1} P{ | W_nk(t,u) − W_nk(t',u') | > ρ }
  ≤ (2^{m_n}+1)^{p+1} ρ^{−4(p+1)} E[ W_nk(t,u) − W_nk(t',u') ]^{4(p+1)}
  = (2^{m_n}+1)^{p+1} ρ^{−4(p+1)} E[ Σ_{i=1}^n c_nik ( ξ_ni − Eξ_ni ) ]^{4(p+1)}
  ≤ M_2 2^{m_n(p+1)} a_n^{2(p+1)} ρ^{−4(p+1)} ≤ M_3 2^{m_n(p+1)} a_n^{2(p+1)} ρ^{−4(p+1)},

where M_2 and M_3 are constants. By the choice of m_n given by (3.2.44), (3.2.45) follows from

2^{m_n(p+1)} a_n^{2(p+1)} ≤ 2^{m_n(p+1)} 2^{−2c m_n(p+1)} = 2^{−m_n(2p+1)/2} → 0,

because m_n → ∞, as n → ∞. □
COROLLARY 3.2.2. For a fixed k = 1, ..., p, assume (A1), (A2) and (B). Then, for any K > 0, as n → ∞,

sup{ | [S*_nk(t,y) − S*_nk(t,0)] − Σ_{i=1}^n c_nik c_ni^T y F'(F^{-1}(t)) | ; t ∈ [0,1], |y| ≤ K } →_P 0.    (3.2.46)

Proof. Lemma 3.2.1 implies that, for any K > 0, as n → ∞,

sup{ | [S*_nk(t,y) − S*_nk(t,0)] − [S_nk(t,y) − S_nk(t,0)] | ; t ∈ [0,1], |y| ≤ K } →_P 0.    (3.2.47)

Therefore, (3.2.46) follows from Corollary 2.4.3 and (3.2.47). □
COROLLARY 3.2.3. Assume (A3) and (B). Then, for any K > 0 and 1 ≤ k ≤ p, as n → ∞,

sup{ | [S̄*+_nk(t,y) − S̄*+_nk(t,0)] − [S̄+_nk(t,y) − S̄+_nk(t,0)] | ; t ∈ [0,1], |y| ≤ K } →_P 0,    (3.2.48)

and

sup{ | [S̄*−_nk(t,y) − S̄*−_nk(t,0)] − [S̄−_nk(t,y) − S̄−_nk(t,0)] | ; t ∈ [0,1], |y| ≤ K } →_P 0.    (3.2.49)

Therefore,

sup{ | [S*+_nk(t,y) − S*+_nk(t,0)] − Σ_{i=1}^n c+_nik c_ni^T y F'(F^{-1}(t)) | ; t ∈ [0,1], |y| ≤ K } →_P 0,    (3.2.50)

and

sup{ | [S*−_nk(t,y) − S*−_nk(t,0)] − Σ_{i=1}^n c−_nik c_ni^T y F'(F^{-1}(t)) | ; t ∈ [0,1], |y| ≤ K } →_P 0.    (3.2.51)

Proof. Consider any 1 ≤ k ≤ p. If d+_nk = 0 or d−_nk = 0, it is the case of Corollary 3.2.2. If d+_nk > 0 and d−_nk > 0, by Lemma 3.2.1, we have, as n → ∞, the analogues of (3.2.47) for {c̄+_nik} and {c̄−_nik}. Hence, (3.2.48) and (3.2.49) follow from the fact that 0 < d+_nk, d−_nk < 1. Further, from the similar proof of Corollary 2.4.3, we have, as n → ∞,

sup{ | [S̄*+_nk(t,y) − S̄*+_nk(t,0)] − Σ_{i=1}^n c̄+_nik c̄_ni^T y F'(F^{-1}(t)) | ; t ∈ [0,1], |y| ≤ K } →_P 0,    (3.2.54)

and

sup{ | [S̄*−_nk(t,y) − S̄*−_nk(t,0)] − Σ_{i=1}^n c̄−_nik c̄_ni^T y F'(F^{-1}(t)) | ; t ∈ [0,1], |y| ≤ K } →_P 0.    (3.2.55)

Therefore, (3.2.50) and (3.2.51) follow from (3.2.48), (3.2.54) and (3.2.49), (3.2.55), respectively. □
LEMMA 3.2.4. For any 1 ≤ k ≤ p, as n → ∞,

sup_{t∈[0,1]} | S*_nk(t,0) − S_nk(t,0) | = O_p(1).    (3.2.56)

Proof. Consider the weighted empirical process given by

Z_nk(t) = Σ_{i=1}^n c_nik [ I{ e_ni ≤ G_ni(t) } − G_ni(t) ],

where the G_ni are arbitrary d.f.'s on [0,1] and the e_ni are independent r.v.'s with d.f. G_ni, respectively. Then, for G_ni = U and e_ni = F(Y_i), we have

Z_nk(t) = S*_nk(t,0) − S_nk(t,0),

and, for any 0 ≤ s, t ≤ 1,

Cov{ Z_nk(s), Z_nk(t) } = Σ_{i=1}^n c²_nik { s∧t − st } = { s∧t − st }.

Hence, by Theorem 1 of Shorack and Wellner (1986, page 109), we know that Z_nk converges weakly to some Z on (D[0,1], 𝒟, ||·||), where ||·|| is the uniform metric on D[0,1] and 𝒟 is the σ-field of subsets of D[0,1] generated by the open balls. Therefore, (3.2.56) follows. □
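A quick simulation sketch of Lemma 3.2.4 (the nonnegative weights and the grid are choices of this illustration): with Σ_i c²_nik = 1, the sup over t of |Z_nk(t)| stays bounded in probability as n grows:

```python
import numpy as np

rng = np.random.default_rng(2)
t_grid = np.linspace(0.0, 1.0, 201)

def sup_Z(n):
    # Z_nk(t) = sum_i c_nik [ I{e_ni <= t} - t ] with G_ni = U (uniform),
    # weights normalized so that sum_i c_nik^2 = 1.
    c = np.abs(rng.normal(size=n))
    c /= np.sqrt((c ** 2).sum())
    e = rng.uniform(size=n)              # e_ni = F(Y_i) ~ U(0,1)
    Z = (c[:, None] * ((e[:, None] <= t_grid) - t_grid)).sum(axis=0)
    return float(np.abs(Z).max())

sups = [sup_Z(n) for n in (100, 1000, 10000)]
# Each entry is O_p(1): Var(Z_nk(t)) = t(1-t) <= 1/4 for every n.
```

The limiting process Z here is a (weighted) Brownian bridge, which is why the sup does not grow with n.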
COROLLARY 3.2.5. For any fixed k = 1, ..., p, assume (A1), (A2) and (B). Then, for any K > 0, as n → ∞,

(3.2.57)

Therefore, as n → ∞,

sup{ | S*_nk(t,y)/Σ_{i=1}^n c_nik − t | ; t ∈ [0,1], |y| ≤ K } →_P 0.    (3.2.58)

Proof. Since, for each t ∈ [0,1] and |y| ≤ K, the quantity in (3.2.57) can be decomposed accordingly, (3.2.57) follows from (3.2.47), Lemma 3.2.4 and Corollary 2.4.3. Furthermore, (3.2.58) follows from (3.2.37). □

COROLLARY 3.2.6.
Assume (A3) and (B). Then, for any K > 0 and 1 ≤ k ≤ p, as n → ∞,

(3.2.59)

(3.2.60)

and furthermore, as n → ∞,

sup{ | S*+_nk(t,y)/Σ_{i=1}^n c+_nik − t | ; t ∈ [0,1], |y| ≤ K } →_P 0,    (3.2.61)

sup{ | S*−_nk(t,y)/Σ_{i=1}^n c−_nik − t | ; t ∈ [0,1], |y| ≤ K } →_P 0.    (3.2.62)

Proof. Consider any 1 ≤ k ≤ p. If d+_nk = 0 or d−_nk = 0, it is the case of Corollary 3.2.5. If d+_nk > 0 and d−_nk > 0, from Corollary 3.2.5 we have, as n → ∞, the analogous statements for {c̄+_nik} and {c̄−_nik}, and since (A3) implies that

Σ_{i=1}^n c+_nik / d+_nk → ∞,

(3.2.59) and (3.2.61) follow. The proof of (3.2.60) and (3.2.62) is similar. □
PROPOSITION 3.2.7. For a fixed k = 1, ..., p, assume (A1), (A2) and (B). Let m: [0,1] → R be a function such that m ∈ L¹[0,1] and m(t) = 0 outside of an interval [a_0, b_0], where 0 < a_0 < b_0 < 1, and let the function g on R be given by

g(x) = [ x − a + F^{-1}(a) F'(F^{-1}(a)) ] / F'(F^{-1}(a)),  −∞ < x < a;
g(x) = F^{-1}(x),  a ≤ x < b;    (3.2.63)
g(x) = [ x − b + F^{-1}(b) F'(F^{-1}(b)) ] / F'(F^{-1}(b)),  b ≤ x < ∞;

where 0 < a < a_0 < b_0 < b < 1. Then, for any K > 0, as n → ∞,

sup_{|y|≤K} | Σ_{i=1}^n c_nik ∫_0^1 { g( S*_nk(t,y)/Σ_{i=1}^n c_nik ) − g( S*_nk(t,0)/Σ_{i=1}^n c_nik ) } m(t) dt − Σ_{i=1}^n c_nik c_ni^T y ∫_0^1 m(t) dt | →_P 0.    (3.2.64)

Proof. First, we notice that the function g is continuous and differentiable with a bounded and continuous g', and that

Σ_{i=1}^n c_nik ∫_0^1 { g( S*_nk(t,y)/Σ_{i=1}^n c_nik ) − g( S*_nk(t,0)/Σ_{i=1}^n c_nik ) } m(t) dt
  = ∫_0^1 [ g'(ξ_t) − g'(t) ] [ S*_nk(t,y) − S*_nk(t,0) ] m(t) dt + ∫_0^1 g'(t) Σ_{i=1}^n c_nik c_ni^T y F'(F^{-1}(t)) m(t) dt + ···,    (3.2.65)

where ξ_t is between S*_nk(t,y)/Σ_{i=1}^n c_nik and S*_nk(t,0)/Σ_{i=1}^n c_nik. By Corollary 3.2.2, Corollary 3.2.5 and the uniform continuity of g' on [a,b],

for t ∈ [a,b],    (3.2.66)

we have, as n → ∞,

sup_{|y|≤K} | ∫_0^1 [ g'(ξ_t) − g'(t) ] [ S*_nk(t,y) − S*_nk(t,0) ] m(t) dt | →_P 0.    (3.2.67)

By the boundedness of g' on [a,b], m ∈ L¹[0,1] and (3.2.46), we have, as n → ∞,

(3.2.68)

Therefore, (3.2.64) follows from (3.2.65) through (3.2.68). □
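The function g in (3.2.63) is just F^{-1} on [a,b], extended linearly outside so that g and g' are continuous and g' is bounded. A sketch with the logistic d.f. (my choice of F, for illustration only):

```python
import numpy as np

def F(x):
    return 1.0 / (1.0 + np.exp(-x))

def F_inv(t):
    return np.log(t / (1.0 - t))

def F_prime(x):                           # F' for the logistic d.f.
    return F(x) * (1.0 - F(x))

def make_g(a, b):
    # g(x) = [x - a + F^{-1}(a) F'(F^{-1}(a))]/F'(F^{-1}(a))  for x < a,
    #      = F^{-1}(x)                                        for a <= x < b,
    #      = [x - b + F^{-1}(b) F'(F^{-1}(b))]/F'(F^{-1}(b))  for x >= b.
    fa, fb = F_prime(F_inv(a)), F_prime(F_inv(b))
    def g(x):
        x = np.asarray(x, dtype=float)
        mid = F_inv(np.clip(x, 1e-12, 1.0 - 1e-12))
        lo = (x - a) / fa + F_inv(a)      # slope 1/F'(F^{-1}(a)) matches g' at a
        hi = (x - b) / fb + F_inv(b)
        return np.where(x < a, lo, np.where(x < b, mid, hi))
    return g

g = make_g(0.1, 0.9)
# g is continuous at a and b, and g' is bounded by
# 1 / min{ F'(F^{-1}(a)), F'(F^{-1}(b)) }.
```

At x = a, the linear branch evaluates to F^{-1}(a) with slope 1/F'(F^{-1}(a)), matching the middle branch, so g is in fact C¹; this is exactly what the proof of Proposition 3.2.7 uses.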
In the general case of {c_nik}, we have the following proposition.

PROPOSITION 3.2.8. Under all the assumptions of Proposition 3.2.7, except replacing (A1) and (A2) by (A3), we have, for any K > 0 and 1 ≤ k ≤ p, as n → ∞,

(3.2.69)

and

sup_{|y|≤K} | Σ_{i=1}^n c−_nik ∫_0^1 { g( S*−_nk(t,y)/Σ_{i=1}^n c−_nik ) − g( S*−_nk(t,0)/Σ_{i=1}^n c−_nik ) } m(t) dt − Σ_{i=1}^n c−_nik c_ni^T y ∫_0^1 m(t) dt | →_P 0.    (3.2.70)

Therefore, as n → ∞,

sup_{|y|≤K} | Σ_{i=1}^n c+_nik ∫_0^1 { g( S*+_nk(t,y)/Σ_{i=1}^n c+_nik ) − g( S*+_nk(t,0)/Σ_{i=1}^n c+_nik ) } m(t) dt − Σ_{i=1}^n c−_nik ∫_0^1 { g( S*−_nk(t,y)/Σ_{i=1}^n c−_nik ) − g( S*−_nk(t,0)/Σ_{i=1}^n c−_nik ) } m(t) dt − Σ_{i=1}^n c_nik c_ni^T y ∫_0^1 m(t) dt | →_P 0.    (3.2.71)

Proof. Consider any 1 ≤ k ≤ p. If d+_nk = 0 or d−_nk = 0, this is just the case of Proposition 3.2.7, and (3.2.69) or (3.2.70) is equal to 0. If d+_nk > 0 and d−_nk > 0, the proof is the same as that of Proposition 3.2.7, using Corollary 3.2.3 and Corollary 3.2.6. □
3.3 Asymptotic Normality

LEMMA 3.3.1. Assume (A4), (C2) and (C4), and assume that a_n → 0, as n → ∞. Then,

(1) as n → ∞,

(3.3.1)

(2) In addition, assume (C1) and (C3), and assume that, for any K > 0, we have

(3.3.2)

Then, for n ≥ 1, Y_n exists as a solution of the following equations (with respect to y):

(3.3.3)

and

| Q_n Y_n γ − M_n(0) | →_P 0.    (3.3.4)

Therefore, as n → ∞,

(3.3.5)
Proof. (1) Recalling (3.2.1), we have

M_n(0) = Σ_{i=1}^n c_ni ψ(Y_i),

where the Y_i are i.i.d.r.v.'s with d.f. F. It suffices to show that, for an arbitrary vector a = (a_1, ..., a_p)^T ∈ R^p,

a^T M_n(0) →_D N( 0, σ² a^T Q a ),  as n → ∞.    (3.3.6)

Letting

X_ni = a^T c_ni ψ(Y_i) / ( σ ( a^T Q_n a )^{1/2} ),  i = 1, ..., n,

then X_n1, ..., X_nn are independent r.v.'s and satisfy E{X_ni} = 0 and

Σ_{i=1}^n Var{X_ni} = Σ_{i=1}^n ( a^T c_ni )² σ² / ( σ² a^T Q_n a ) = Σ_{i=1}^n ( a^T c_ni c_ni^T a ) / ( a^T Q_n a ) = 1.

Also, let F_ni be the d.f. of X_ni; then, for any η > 0,

Σ_{i=1}^n ∫_{|x|>η} x² dF_ni(x) ≤ (1/σ²) ∫ ψ²(y) I{ |ψ(y)| > η σ / (M a_n) } dF(y),

where M > 0 is a constant. Hence, from (C4) and the assumption a_n → 0 as n → ∞, we have

Σ_{i=1}^n ∫_{|x|>η} x² dF_ni(x) → 0.

Therefore, by Theorem 7.2.1 of Chung (1974), we have

Σ_{i=1}^n X_ni = a^T M_n(0) / ( σ ( a^T Q_n a )^{1/2} ) →_D N(0, 1),  as n → ∞.

Hence, (3.3.6) follows from (A4).
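Part (1) can be checked by simulation. In the sketch below (assumptions are mine: a bounded Huber score, logistic errors, nonnegative normalized weights), Σ_i c²_nik = 1 and Eψ(Y_i) = 0, so Var(M_nk(0)) = σ² = ∫ψ² dF exactly for every n, and the law of M_nk(0) = Σ_i c_nik ψ(Y_i) is close to N(0, σ²):

```python
import numpy as np

rng = np.random.default_rng(3)

def psi(x, k=1.345):                      # bounded, nondecreasing Huber score
    return np.clip(x, -k, k)

n, reps = 200, 4000
c = np.abs(rng.normal(size=n))
c /= np.sqrt((c ** 2).sum())              # sum_i c_nik^2 = 1
Y = rng.logistic(size=(reps, n))          # symmetric F, so int psi dF = 0
M = (psi(Y) * c).sum(axis=1)              # replications of M_nk(0)

sigma2 = np.mean(psi(rng.logistic(size=200000)) ** 2)   # Monte Carlo sigma^2
# Standardized by sqrt(sigma2), about 95% of the replications should fall
# within 1.96 standard deviations, as for a N(0,1) variable.
frac = float(np.mean(np.abs(M) <= 1.96 * np.sqrt(sigma2)))
```

Because ψ is bounded, the Lindeberg condition in the proof holds as soon as max_i |c_nik| → 0, which is what the simulation's equal-order weights mimic.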
(2) (C1) implies that ρ is convex and that there exist α and β such that

ψ(α) < 0 and ψ(β) > 0,  α < β.    (3.3.7)

So, for x ≤ α and y ≥ β, we have ρ(x) ≥ ρ(α) + ψ(α)(x − α) and ρ(y) ≥ ρ(β) + ψ(β)(y − β). Therefore,

lim_{x→±∞} ρ(x) = +∞,    (3.3.8)

and ρ has a finite lower bound. Let Λ_n be a p×p non-singular matrix such that Λ_n^T Q_n Λ_n is a diagonal matrix which (by (A4)) has only positive eigenvalues for large enough n. Consider the function Φ_n defined by

Φ_n(v) = Σ_{i=1}^n ρ( Y_i − c_ni^T Λ_n v ),  n ≥ 1, v ∈ R^p.    (3.3.9)

Then, for any v ∈ R^p, we have, as ||v|| → ∞,

Σ_{i=1}^n ( c_ni^T Λ_n v )² = Σ_{i=1}^n v^T Λ_n^T c_ni c_ni^T Λ_n v = v^T Λ_n^T ( Σ_{i=1}^n c_ni c_ni^T ) Λ_n v ≥ λ_1( Λ_n^T Q_n Λ_n ) ||v||² → ∞,

where λ_1(A) denotes the smallest eigenvalue of the matrix A. Hence, by (3.3.8), for any large enough n,

Φ_n(v) → ∞,  as ||v|| → ∞.    (3.3.10)

Since ρ is convex, Φ_n is also a convex function for n ≥ 1. Therefore, Φ_n has at least one global minimum, i.e., by the convexity of ρ, (3.1.4), equivalent to (3.1.5), has a finite solution. Therefore, from the equivalence between (3.1.5) and (3.2.7), (3.3.3) has a solution for large enough n.

From (3.3.1), for any ε > 0, there exist K_0 > 0 and N such that

P{ | M_n(0) | > K_0 } < ε,  for n ≥ N.    (3.3.11)

For arbitrary ρ_0 > 0 and K > 0, since (3.3.3) has a solution Y_n,

P{ | Q_n Y_n γ − M_n(0) | > ρ_0 } ≤ P{ | Q_n Y_n γ − M_n(0) | > ρ_0, ||Y_n|| ≤ K } + P{ ||Y_n|| > K }.    (3.3.12)

Consider the function given by

h(u, B) = Σ_{i=1}^n u^T c_ni ψ( Y_i − c_ni^T u B ),  u ∈ R^p, B ∈ (0, +∞).    (3.3.13)

Since ψ is nondecreasing, h(u, B) is nonincreasing in B for any u ∈ R^p. Hence, we claim: ||Y_n|| > K implies that

h( U_n, K ) ≥ 0,    (3.3.14)

where U_n = Y_n / ||Y_n|| ∈ R^p. Otherwise, h(U_n, K) < 0 would imply

h( U_n, ||Y_n|| ) ≤ h( U_n, K ) < 0,    (3.3.15)

which, by the definition of Y_n and (3.3.13), contradicts h( U_n, ||Y_n|| ) = U_n^T M_n(Y_n) = 0.

Therefore, by (3.3.11), for n ≥ N, P{ ||Y_n|| > K } can be bounded through the event { h(U_n, K) ≥ 0 }. Therefore, by (A4), when n is large enough, there exists a positive number λ such that 0 < λ < λ_1(Q) and the corresponding lower bound holds. Choose K such that K > ρ_0 K_0 / (λγ); then β_1 = −ρ_0 K_0 + λKγ > 0, and by (3.3.2), we have, as n → ∞,

(3.3.16)

Therefore, (3.3.4) follows from (3.3.12), (3.3.2) and (3.3.16). □
THEOREM 3.3.2.  Let ψ be a bounded function satisfying a Lipschitz condition of order α > 1/2, and let F be an absolutely continuous d.f. with a bounded and continuous derivative F'. Assume (C1) through (C3) and (A4), and
    a_n → 0,     as n → ∞.                                     (3.3.17)
Then, for any K > 0 and 1 ≤ k ≤ p, as n → ∞,
    sup_{|y|≤K} | E{M_nk(y)} + γ Σ_{i=1}^n c_nik c_ni^T y |  →  0,     (3.3.18)
and
    sup_{|y|≤K} | M_nk(y) − M_nk(0) − E{M_nk(y)} |  →_P  0.    (3.3.19)
Therefore, as n → ∞,
    sup_{|y|≤K} | M_nk(y) − M_nk(0) + γ Σ_{i=1}^n c_nik c_ni^T y |  →_P  0,     (3.3.20)
and furthermore,
    D_n( β̂_n − β )  →_D  N_p( 0, (σ^2/γ^2) Q^{-1} ).           (3.3.21)
Proof.  Since, by the Fubini theorem,
    ∫ ψ(x) dF(x + c_ni^T y) = ∫ ψ(z − c_ni^T y) dF(z)
        = ∫ ∫ I( x ≤ z − c_ni^T y ) dψ(x) dF(z) + ψ(−∞)
        = ∫ [ 1 − F(x + c_ni^T y) ] dψ(x) + ψ(−∞)
        = ψ(∞) − ∫ F(x + c_ni^T y) dψ(x),
and, by (C2), we have
    ψ(∞) = ∫ F(x) dψ(x),
so, we have
    ∫ ψ(x − c_ni^T y) dF(x) = ∫ [ F(x) − F(x + c_ni^T y) ] dψ(x).     (3.3.22)
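The identity (3.3.22) lends itself to a direct numeric check. The sketch below (an illustration, not part of the dissertation) assumes F = standard normal and ψ(x) = clip(x, −1, 1), so that dψ(x) = dx on (−1, 1) and 0 elsewhere, and ∫ψ dF = 0 by symmetry.

```python
import numpy as np
from math import erf, sqrt, pi

# Numeric check of (3.3.22):
#   int psi(x - c) dF(x)  =  int [F(x) - F(x + c)] dpsi(x).
# Illustrative assumptions: F = standard normal Phi, psi = clip(., -1, 1).
Phi = np.vectorize(lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0))))
phi = lambda x: np.exp(-x**2 / 2.0) / sqrt(2.0 * pi)

c = 0.7                                        # plays the role of c_ni^T y
x = np.arange(-10, 10, 1e-3) + 5e-4            # midpoint grid
lhs = np.sum(np.clip(x - c, -1, 1) * phi(x)) * 1e-3     # int psi(x-c) dF(x)
u = np.arange(-1, 1, 1e-4) + 5e-5
rhs = np.sum(Phi(u) - Phi(u + c)) * 1e-4                # int [F - F(.+c)] dpsi
print(lhs, rhs)                                # the two sides agree
```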
Therefore,
    | E{M_nk(y)} + γ Σ_{i=1}^n c_nik c_ni^T y |
        = | Σ_{i=1}^n c_nik c_ni^T y ∫ [ F'(ξ) − F'(x) ] dψ(x) |
        ≤ Σ_{i=1}^n Σ_{j=1}^p |c_nik| |c_nij| |y| ∫ | F'(ξ) − F'(x) | dψ(x)
        ≤ Σ_{j=1}^p ( Σ_{i=1}^n c_nik^2 )^{1/2} ( Σ_{i=1}^n c_nij^2 )^{1/2} |y| sup_{i,j} ∫ | F'(ξ) − F'(x) | dψ(x)
        ≤ p |y| sup_{i,j} ∫ | F'(ξ) − F'(x) | dψ(x),
where ξ is between x and (x + c_ni^T y). Hence, using the DCT, (3.3.18) follows from (3.3.17) and the boundedness and continuity of F'.
It is easy to see that, for each y ∈ R^p, E{M_nk(y)} = ∫ ψ(x) dS_nk(F(x), y) and that E{M_nk(0)} = 0. Then, (3.3.19) follows from showing: for any K > 0,
    sup_{|y|≤K} | D_nk(y) |  →_P  0,     as n → ∞,             (3.3.23)
where
    D_nk(y) = M_nk(y) − M_nk(0) − E{M_nk(y)}.
Let
    I_r = Diag( e_1, ..., e_p ),     1 ≤ r ≤ 2^p,              (3.3.24)
where (e_1, ..., e_p) takes on the 2^p possible realizations on the vertices of [−1,1]^p, i.e., e_j is −1 or 1, for 1 ≤ j ≤ p, and let, for y ∈ E_p = [0,1]^p,
    B_nkr(y) = D_nk( K I_r y ),                                (3.3.25)
then,
    sup_{|y|≤K} | D_nk(y) | ≤ max_{1≤r≤2^p} sup_{y∈E_p} | B_nkr(y) |.     (3.3.26)
Hence, it suffices to show that, for any 1 ≤ r ≤ 2^p, as n → ∞,
    sup_{y∈E_p} | B_nkr(y) |  →_P  0.                          (3.3.27)
We notice that E{B_nkr(y)} = 0 and B_nkr(0) = 0.
Let j_i ∈ E_p(m), 0 ≤ i < (m+1)^p, where m is an arbitrary positive integer and the j_i satisfy: (i) j_0 = 0; (ii) |j_i − j_{i−1}| = 1/m, for 1 ≤ i < (m+1)^p. Denote
    ξ_i = B_nkr(j_i) − B_nkr(j_{i−1}),                         (3.3.28)
    S_q = Σ_{i=1}^q ξ_i = B_nkr(j_q).                          (3.3.29)
Since ψ satisfies a Lipschitz condition of order α, then, for any u, v ∈ E_p,
    Var{ B_nkr(u) − B_nkr(v) }
        = Σ_{i=1}^n c_nik^2 Var{ ψ( Y_i − c_ni^T I_r u K ) − ψ( Y_i − c_ni^T I_r v K ) }
        ≤ Σ_{i=1}^n c_nik^2 M K | c_ni^T I_r (u − v) |^{2α} ≤ p M K a_n^{2α} | u − v |^{2α},
where M is a constant. So, for any λ > 0 and i ≤ j,
    P( |S_j − S_i| > λ ) = P( | B_nkr(j_j) − B_nkr(j_i) | > λ )
        ≤ (pMK/λ^2) { (j − i) a_n/m }^{2α} = (pMK/λ^2) { Σ_{i<l≤j} u_l }^{2α},
where u_l = a_n/m. Since 2α > 1, by Theorem 12.2 of Billingsley (1968, page 94), we have, for any ρ > 0,
    P( max_q |S_q| ≥ ρ ) ≤ ( M_1 / ρ^2 ) a_n^{2α},
where M_1 is a constant. Since B_nkr(y) is continuous in y, letting m → ∞, we have
    P( sup_{y∈E_p} | B_nkr(y) | ≥ ρ ) ≤ ( M_1 / ρ^2 ) a_n^{2α}.
Therefore, (3.3.27) follows from (3.3.17).
□
THEOREM 3.3.3.  For a fixed k = 1, ..., p, assume (A1), (A2), (B), (C1), (C2) and (C5). Then, for any K > 0, as n → ∞,
    sup_{|y|≤K} | M_nk(y) − M_nk(0) + γ_0 Σ_{i=1}^n c_nik c_ni^T y |  →_P  0,     (3.3.30)
where γ_0 = ∫ F' dψ > 0.
Proof.  Let M: [0,1] → R be a function defined by M(t) = ψ(F^{-1}(t)), 0 ≤ t ≤ 1. From the assumptions on ψ and F, we have that M is nondecreasing, right continuous and bounded. Consider a functional τ: D[0,1] → R defined by
    τ(G) = ∫_0^1 h(G(t)) e^t dM(t),     G ∈ D[0,1],            (3.3.31)
where h is defined by
    h(x) = { −e^{−x},   if x ≥ 0;
           { x − 1,     if x < 0.                              (3.3.32)
Then, τ can be expressed as a composition of the following Hadamard differentiable transformations:
    γ_1(S) = h∘S;     γ_2(S) = ∫_0^1 S(t) e^t dM(t);     and     τ = γ_2∘γ_1.
Note that γ_1 is Hadamard differentiable at U by Proposition 6.1.2 of Fernholz (1983) (which is also true for γ_1: D[0,1] → L_Q[0,1]), because h is differentiable everywhere with a bounded and continuous derivative, and that γ_2 is linear and continuous, thus Fréchet differentiable. Since the range of γ_1 is contained in D[0,1], from Proposition 3.1.2 of Fernholz (1983), τ is Hadamard differentiable at U with Hadamard derivative
    τ'_U(G) = ∫_0^1 h'(t) G(t) e^t dM(t),     G ∈ D[0,1].      (3.3.33)
By Theorem 2.3.1, we have, for any K > 0, as n → ∞,
    sup_{|y|≤K} | Σ_{i=1}^n c_nik { τ( S*_nk(·,y) / Σ_{i=1}^n c_nik ) − τ( U(·) ) }
                 − τ'_U( S*_nk(·,y) − U(·) Σ_{i=1}^n c_nik ) |  →_P  0.
Therefore, as n → ∞,
    sup_{|y|≤K} | Σ_{i=1}^n c_nik { τ( S*_nk(·,y) / Σ_{i=1}^n c_nik ) − τ( S*_nk(·,0) / Σ_{i=1}^n c_nik ) }
                 − τ'_U( S*_nk(·,y) − S*_nk(·,0) ) |  →_P  0.     (3.3.34)
Since h'(t) = e^{−t}, for 0 ≤ t ≤ 1, then, for each y,
    τ'_U( S*_nk(·,y) ) = ∫_0^1 h'(t) S*_nk(t,y) e^t dM(t) = ∫_0^1 S*_nk(t,y) dM(t)
        = Σ_{i=1}^n c_nik ∫_{F(Y_i − c_ni^T y)}^1 dM(t) = Σ_{i=1}^n c_nik ∫_{Y_i − c_ni^T y}^∞ dψ(x)
        = ψ(∞) Σ_{i=1}^n c_nik − M_nk(y).                      (3.3.35)
By the similar proof of Proposition 3.2.7, we have
                                                               (3.3.36)
Hence, (3.3.30) follows from (3.3.34) through (3.3.36).     □
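The key computation (3.3.35) — that the Stieltjes integral of the weighted empirical process against M(t) = ψ(F^{-1}(t)) reduces to ψ(∞)Σc_nik − M_nk(y) — can be verified numerically. The sketch below (an illustration, not part of the dissertation) assumes F = standard normal, ψ(x) = clip(x, −1, 1), and scalar regression constants.

```python
import numpy as np
from scipy.stats import norm

# Numeric check of the identity behind (3.3.35): with M(t) = psi(F^{-1}(t)),
#   int_0^1 S*_nk(t, y) dM(t) = psi(+inf) sum_i c_nik - M_nk(y),
# where S*_nk(t, y) = sum_i c_nik I(Y_i <= F^{-1}(t) + c_ni^T y) and
# M_nk(y) = sum_i c_nik psi(Y_i - c_ni^T y).  F, psi and the constants
# are illustrative assumptions.
rng = np.random.default_rng(2)
psi = lambda x: np.clip(x, -1.0, 1.0)

n = 50
c = rng.uniform(0.0, 1.0, n)                  # c_nik >= 0, as in (A1)
Y = rng.standard_normal(n)
shift = 0.3 * c                               # plays the role of c_ni^T y

t = np.linspace(1e-6, 1 - 1e-6, 20001)
M = psi(norm.ppf(t))                          # M(t) = psi(F^{-1}(t))
S = (c[:, None] * (Y[:, None] <= norm.ppf(t)[None, :] + shift[:, None])).sum(0)
stieltjes = np.sum(0.5 * (S[1:] + S[:-1]) * np.diff(M))       # int S* dM
rhs = psi(np.inf) * c.sum() - np.sum(c * psi(Y - shift))      # psi(inf) = 1 here
print(stieltjes, rhs)                         # the two sides agree
```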
In the general case of {c_nik}, 1 ≤ i ≤ n, 1 ≤ k ≤ p, we have the following theorem.
THEOREM 3.3.4.  Under all the assumptions of Theorem 3.3.3, except replacing (A1) and (A2) by (A3) and (A4), we have that, for any K > 0 and any 1 ≤ k ≤ p, as n → ∞,
    sup_{|y|≤K} | M_nk(y) − M_nk(0) + γ_0 Σ_{i=1}^n c_nik c_ni^T y |  →_P  0.     (3.3.37)
Therefore, as n → ∞,
                                                               (3.3.38)
and furthermore,
    D_n( β̂_n − β )  →_D  N_p( 0, (σ^2/γ_0^2) Q^{-1} ).         (3.3.39)
Proof.  Consider any 1 ≤ k ≤ p. If d+_nk = 0 or d−_nk = 0, (3.3.37) is the case of Theorem 3.3.3. If d+_nk > 0 and d−_nk > 0, using the functional τ given by (3.3.31) and Theorem 2.3.2, we have, for any K > 0 and 1 ≤ k ≤ p, as n → ∞,
    sup_{|y|≤K} | Σ_{i=1}^n c+_nik τ( S*+_nk(·,y) / Σ_{i=1}^n c+_nik ) − Σ_{i=1}^n c−_nik τ( S*−_nk(·,y) / Σ_{i=1}^n c−_nik )
                 − τ(U) Σ_{i=1}^n c_nik − τ'_U( S*_nk(·,y) − U(·) Σ_{i=1}^n c_nik ) |  →_P  0.     (3.3.40)
So, by (3.3.35), we have, as n → ∞,
    sup_{|y|≤K} | Σ_{i=1}^n c+_nik { τ( S*+_nk(·,y) / Σ_{i=1}^n c+_nik ) − τ( S*+_nk(·,0) / Σ_{i=1}^n c+_nik ) }
                 − Σ_{i=1}^n c−_nik { τ( S*−_nk(·,y) / Σ_{i=1}^n c−_nik ) − τ( S*−_nk(·,0) / Σ_{i=1}^n c−_nik ) }
                 + M_nk(y) − M_nk(0) |  →_P  0.                (3.3.41)
By the similar proof of Proposition 3.2.8, we have
                                                               (3.3.42)
Therefore, (3.3.37) follows from (3.3.41) and (3.3.42).
□
Remark.  Comparing Theorem 3.3.4 with Theorem 3.3.2, we can see that we require weaker conditions on ψ, but slightly stronger conditions on F, in Theorem 3.3.4, by incorporating Hadamard differentiability through the uniform asymptotic linearity of the M-estimators of regression.
3.4 Consistency
Using the Hadamard differentiability approach, which was used in the last section for the asymptotic normality, we have the following theorem for the weak consistency of the M-estimator of regression.
THEOREM 3.4.1.  In addition to (A3), (B) and (C1) through (C4), assume that inf_{n≥1} { λ_1(C_n) } > 0 and ψ' = 0 outside of a finite interval. Then, as n → ∞,
    | β̂_n − β |  →_P  0.                                       (3.4.1)
Proof.  Let m: [0,1] → R be a function defined by m(t) = ψ'(F^{-1}(t)), 0 ≤ t ≤ 1. From the assumptions on ψ and F, we have that m ∈ L_1[0,1] and m(t) = 0 outside of an interval [a_0, b_0], where 0 < a_0 < b_0 < 1. For 0 < a < a_0 < b_0 < b < 1, consider the function g given by (3.2.63) and consider a functional τ: D[0,1] → R defined by
    τ(G) = ∫_0^1 g(G(t)) m(t) dt,     G ∈ D[0,1].              (3.4.2)
Then, τ can be expressed as a composition of the following Hadamard differentiable transformations:
    γ_1(S) = g∘S;     γ_2(S) = ∫_0^1 S(t) m(t) dt;     and     τ = γ_2∘γ_1.
Note that γ_1 is Hadamard differentiable at U by Proposition 6.1.2 of Fernholz (1983) (which is also true for γ_1: D[0,1] → L_Q[0,1]), because g is differentiable everywhere with a bounded derivative, and that γ_2 is linear and continuous, thus Fréchet differentiable. Since the range of γ_1 is contained in D[0,1], from Proposition 3.1.2 of Fernholz (1983), τ is Hadamard differentiable at U with Hadamard derivative
    τ'_U(G) = ∫_0^1 g'(t) G(t) m(t) dt,     G ∈ D[0,1].        (3.4.3)
So, by Theorem 2.3.2, we have, for any K > 0, as n → ∞,
    sup_{|y|≤K} | Σ_{i=1}^n c+_nik τ( S*+_nk(·,y) / Σ_{i=1}^n c+_nik ) − Σ_{i=1}^n c−_nik τ( S*−_nk(·,y) / Σ_{i=1}^n c−_nik )
                 − τ(U) Σ_{i=1}^n c_nik − τ'_U( S*_nk(·,y) − U(·) Σ_{i=1}^n c_nik ) |  →_P  0.     (3.4.4)
Since g'(t) = 1/F'(F^{-1}(t)), for a ≤ t ≤ b, then, for each y,
    τ'_U( S*_nk(·,y) ) = ∫_0^1 g'(t) S*_nk(t,y) m(t) dt = Σ_{i=1}^n c_nik ∫_{F(Y_i − c_ni^T y)}^1 g'(t) m(t) dt.     (3.4.5)
Therefore, by (3.4.4) and (3.4.5), as n → ∞,
    sup_{|y|≤K} | Σ_{i=1}^n c+_nik { τ( S*+_nk(·,y) / Σ_{i=1}^n c+_nik ) − τ( S*+_nk(·,0) / Σ_{i=1}^n c+_nik ) }
                 − Σ_{i=1}^n c−_nik { τ( S*−_nk(·,y) / Σ_{i=1}^n c−_nik ) − τ( S*−_nk(·,0) / Σ_{i=1}^n c−_nik ) }
                 + Σ_{i=1}^n c_nik ∫_{Y_i − c_ni^T y}^{Y_i} ψ'(x) dx |  →_P  0.     (3.4.6)
Hence, by Proposition 3.2.8 and γ = ∫ ψ' dF = ∫_0^1 m(t) dt, we have that, for any 1 ≤ k ≤ p, as n → ∞,
    sup_{|y|≤K} | Σ_{i=1}^n c_nik c_ni^T y γ + Σ_{i=1}^n c_nik ∫_{Y_i − c_ni^T y}^{Y_i} ψ'(x) dx |  →_P  0.     (3.4.7)
Therefore, as n → ∞,
    sup_{|y|≤K} | C_n y γ + Σ_{i=1}^n c_ni ∫_{Y_i − c_ni^T y}^{Y_i} ψ'(x) dx |  →_P  0.     (3.4.8)
Denote
    Φ_n(λ, B) = (1/n) Σ_{i=1}^n λ^T c_ni ∫_{Y_i}^{Y_i − c_ni^T λ B} ψ'(x) dx,     (3.4.9)
where B ∈ (0, +∞) and λ ∈ R^p satisfies ||λ|| = 1. Since ψ is nondecreasing, Φ_n is nonincreasing in B and we have, for B > 0, A ∈ R,
    0 ≤ A ∫_{Y_i − AB}^{Y_i} ψ'(x) dx ≤ A [ ψ(Y_i) − ψ(Y_i − AB) ].     (3.4.10)
Hence, for B > 0,
    0 ≤ −Φ_n(λ, B) = (1/n) Σ_{i=1}^n λ^T c_ni ∫_{Y_i − c_ni^T λ B}^{Y_i} ψ'(x) dx.     (3.4.11)
Since, for any ρ > 0, we have
                                                               (3.4.12)
and since (A3) implies that, as n → ∞,
    ... → +∞,
it suffices to show that, for large enough K > 0,
    P{ ||Y_n|| > K } → 0,     as n → ∞.                        (3.4.13)
Since, for arbitrary ε > 0, 1 ≤ k ≤ p and n ≥ 1, there exists K_0 > 0 such that
    P{ |M_nk(0)| > K_0 } ≤ Var{M_nk(0)} / K_0^2 = σ^2 / K_0^2 < ε/p,     (3.4.14)
Using the similar argument referring to (3.3.16) in the proof of (2) of Lemma 3.3.1, by (3.4.11) and (3.4.14), we have, for large n,
    P{ ||Y_n|| > K } ≤ P{ | (1/n) Σ_{i=1}^n λ_n^T c_ni ∫_{Y_i − c_ni^T λ_n K}^{Y_i} ψ'(x) dx − λ_n^T M_n(0) + λ_n^T C_n λ_n K γ | ≥ β_1 } + ε,     (3.4.15)
where β_1 = −ρK_0 + Kγ ( inf_{n≥1} { λ_1(C_n) } ) > 0, for K large enough. Therefore, (3.4.13) follows from (3.4.8) and (3.4.15).     □
However, the Hadamard differentiability approach may not provide a way to study the strong consistency, which naturally requires stronger conditions on ψ, F and {c_ni}, 1 ≤ i ≤ n. In this context, a different approach to the weak consistency is also considered below, which will later lead to the conditions for the strong consistency in Theorem 3.4.4 and Theorem 3.4.5.
THEOREM 3.4.2.  Assume that, in addition to (C1) and (C2), ψ, F and C_n satisfy the following conditions:
(N1) ψ is bounded, right continuous and piecewise differentiable with a bounded derivative ψ';
(N2) F is a d.f. with a bounded and continuous derivative F';
(N3) γ_0 = ∫ F' dψ > 0;
(N4) There exists a constant M such that max_{1≤i≤n} ||c_i|| ≤ M < ∞;
(N5) Let λ_1(A) and λ_2(A) denote the smallest and largest eigenvalues of the matrix A, respectively. For C_n, n ≥ 1,
    λ_2(C_n) / [ λ_1(C_n) ]^2 = o(1),     as n → ∞,            (3.4.16)
    λ_2(C_n) / λ_1(C_n) = O(1),     as n → ∞.                  (3.4.17)
Then, as n → ∞,
    | β̂_n − β |  →_P  0.                                       (3.4.18)
[Note that (3.4.17) and lim_{n→∞} λ_1(C_n) = +∞ imply (3.4.16).]
Proof.  Denote, for λ ∈ R^p, δ ≥ 0,
    N_n(λ, δ) = [ λ_1(C_n) ]^{-1} Σ_{i=1}^n λ^T c_ni ψ( Y_i − c_ni^T λ δ ),     (3.4.19)
then
    N_n( λ_n, ||β̂_n − β|| ) = 0,
where λ_n = ( β̂_n − β ) / ||β̂_n − β||. First, we will show that, for any fixed real number 0 ≤ δ < ∞, as n → ∞,
    sup_{||λ||=1} | N_n(λ, δ) − E{N_n(λ, δ)} |  →_P  0.        (3.4.20)
For any fixed δ, let
    R_nr(λ, δ) = N_n( I_r λ, δ ) − E{ N_n( I_r λ, δ ) },       (3.4.21)
where λ ∈ E_p, 1 ≤ r ≤ 2^p and I_r is given by (3.3.24). Then, (3.4.20) follows from showing that, for any 1 ≤ r ≤ 2^p, as n → ∞,
    sup_{λ∈E_p} | R_nr(λ, δ) |  →_P  0.                        (3.4.22)
Note that R_nr(0, δ) = 0 and, for any λ ∈ E_p, E{R_nr(λ, δ)} = 0. Since, for any u ∈ E_p and v ∈ E_p,
    Var{ R_nr(u, δ) − R_nr(v, δ) }
        = [ λ_1(C_n) ]^{-2} Σ_{i=1}^n ∫ [ u^T I_r c_i ψ( t − c_i^T I_r u δ ) − v^T I_r c_i ψ( t − c_i^T I_r v δ ) ]^2 dF(t)
        ≤ ( M_1 / [ λ_1(C_n) ]^2 ) Σ_{i=1}^n [ (u − v)^T I_r c_i ]^2
        = ( M_1 / [ λ_1(C_n) ]^2 ) (u − v)^T I_r C_n I_r (u − v),
where M_1 is a constant, then, from the same argument used in the proof of (3.3.19) of Theorem 3.3.2, we have, for arbitrary ρ > 0,
    P( sup_{λ∈E_p} | R_nr(λ, δ) | ≥ ρ ) ≤ ( M_2 / ρ^2 ) λ_2(C_n) / [ λ_1(C_n) ]^2,     (3.4.23)
where M_2 is a constant. Therefore, (3.4.22) follows from (3.4.16) and (3.4.23).
Since, for an arbitrary ρ > 0, by (3.3.22), (3.4.17), (N2), (N4) and the DCT, there exists a real number δ such that 0 < δ < ρ and, for all n ≥ 1,
    δ [ λ_2(C_n)/λ_1(C_n) ] sup_{||λ||=1} ∫ | F'(ξ) − F'(y) | dψ(y) ≤ δ M_3 sup_{||λ||=1} ∫ | F'(ξ) − F'(y) | dψ(y) ≤ δ γ_0 / 2,
where ξ is between y and (y + c_i^T λ δ), and M_3 is a constant. Therefore, since N_n(λ, δ) is nonincreasing in δ, we have
                                                               (3.4.24)
Hence, (3.4.18) follows from (3.4.20) and (3.4.24).     □
An alternative set of conditions on ψ and F is given next for the weak consistency of the M-estimators of regression, where we require weaker conditions on F, but stronger conditions on ψ.
THEOREM 3.4.3.  Assume that, in addition to (N4), (N5), and (C1) through (C3), ψ and F satisfy the following conditions:
(N1') ψ is differentiable with a bounded ψ';
(N2') P{ Y_i ∈ { discontinuity points of ψ' } } = 0.
Then, as n → ∞,
    | β̂_n − β |  →_P  0.                                       (3.4.25)
Proof.  For an arbitrary ρ > 0, by (C2), (3.4.17), (N1'), (N2'), (N4) and the DCT, there exists a real number δ such that 0 < δ < ρ and
    δ [ λ_2(C_n)/λ_1(C_n) ] sup_{||λ||=1} ∫ | ψ'(ξ) − ψ'(y) | dF(y) ≤ δ M_3 sup_{||λ||=1} ∫ | ψ'(ξ) − ψ'(y) | dF(y) ≤ δ γ_0 / 2,
where ξ is between y and (y − c_i^T λ δ), and M_3 is a constant. Therefore, by the proof of Theorem 3.4.2, (3.4.25) follows from (3.4.16) and
    P{ || β̂_n − β || > ρ } ≤ P{ sup_{||λ||=1} | N_n(λ, δ) − E{N_n(λ, δ)} | ≥ δ γ_0 / 2 } ≤ M_4 λ_2(C_n) / [ λ_1(C_n) ]^2,     (3.4.26)
where M_4 is a constant.     □
In the following two theorems, we will study the strong consistency of the M-estimators of regression.
THEOREM 3.4.4.  Assume that, in addition to the assumptions of Theorem 3.4.2, { λ_1(C_n) } is nondecreasing in n and that there exists an increasing subsequence {n_k} of {n} such that
                                                               (3.4.27)
and
                                                               (3.4.28)
Then, as n → ∞,
    | β̂_n − β |  →  0,     a.s.                                (3.4.29)
Proof.  From the proof of Theorem 3.4.2, (3.4.23) and (3.4.24) imply that, for arbitrary ρ > 0, there exists δ > 0 such that
                                                               (3.4.30)
where M_3 is a constant. Hence, by (3.4.27), we have, for arbitrary ε > 0,
    Σ_k P( sup_{λ∈E_p} | R_{n_k r}(λ, δ) | ≥ ε ) < ∞.
Therefore, as k → ∞,
    sup_{λ∈E_p} | R_{n_k r}(λ, δ) |  →  0,     a.s.             (3.4.31)
Let
    η_q(λ) = [ λ_1(C_{n_k}) ]^{-1} Σ_{i=n_k+1}^{n_k+q} ξ_i(λ),     (3.4.32)
where ξ_i(λ) = λ^T c_i ψ( Y_i − c_i^T λ δ ) − λ^T c_i E{ ψ( Y_i − c_i^T λ δ ) }, and let
                                                               (3.4.33)
                                                               (3.4.34)
Then, E{η_q(λ)} = 0, η_q(0) = 0, and, for any u, v ∈ R^p,
    Var{ η_q(u) − η_q(v) } ≤ M_1 (u − v)^T ( C_{n_k+q} − C_{n_k} ) (u − v) / [ λ_1(C_{n_k}) ]^2,
where M_1 is a constant. Using the same argument referring to (3.4.23), we have, for any ε > 0 and 1 ≤ q ≤ (n_{k+1} − n_k),
    P( sup | η_q | ≥ ε ) ≤ M_4 { λ_2( C_{n_k+q} ) − λ_1( C_{n_k} ) } / [ λ_1( C_{n_k} ) ]^2,     (3.4.35)
where M_4 is a constant. Note that for symmetric matrices A and B (of the same order), λ_2(A + B) ≤ λ_2(A) + λ_2(B) [viz., Rao (1965, page 55)]. Hence, by (3.4.35) and (3.4.28), we have, for arbitrary ε > 0,
    Σ_k Σ_q P( sup | η_q | ≥ ε ) ≤ M_4 Σ_k Σ_q { λ_2( C_{n_k+q} ) − λ_1( C_{n_k} ) } / [ λ_1( C_{n_k} ) ]^2 < ∞.
Therefore, as k → ∞,
    max_{1≤q≤n_{k+1}−n_k} sup | η_q |  →  0,     a.s.           (3.4.36)
Therefore, by (3.4.31) and (3.4.36), we have, as n → ∞,
    sup_{||λ||=1} | S_n(λ) | / λ_1(C_n) = sup_{||λ||=1} | N_n(λ, δ) − E{N_n(λ, δ)} |  →  0,     a.s.     (3.4.37)
Therefore, by (3.4.30) and (3.4.37), (3.4.29) follows from
    P{ lim sup_{n→∞} { sup_{||λ||=1} | N_n(λ, δ) − E{N_n(λ, δ)} | ≥ δ γ_0 / 2 } } = 0.     □
The next theorem gives an alternative set of conditions on ψ and F for the strong consistency of the M-estimators of regression, where, corresponding to Theorem 3.4.3, we require weaker conditions on F, but stronger conditions on ψ.
THEOREM 3.4.5.  Assume that, in addition to the assumptions of Theorem 3.4.3, { λ_1(C_n) } is nondecreasing in n and that (3.4.27) and (3.4.28) hold for an increasing subsequence {n_k} of {n}. Then, as n → ∞,
    | β̂_n − β |  →  0,     a.s.                                (3.4.38)
Proof.  Using the proof of Theorem 3.4.3, the proof is similar to that of Theorem 3.4.4.     □
As we can see, we require stronger conditions on ψ and F for the strong consistency so that we have (3.4.23), which implies the weak consistency. Although it may not provide the bound of (3.4.23), the Hadamard differentiability approach allows weaker conditions for the weak consistency of the M-estimator of regression than those in the literature, because the bound of (3.4.23) is not necessary for the weak consistency.
CHAPTER IV
ON THE SECOND-ORDER HADAMARD DIFFERENTIABILITY
AND ITS APPLICATION TO THE UNIFORM ASYMPTOTIC LINEARITY
OF M-ESTIMATION IN LINEAR MODELS
4.1 Introduction
Consider the general linear model:
    X_i = β^T c_i + e_i,     i ≥ 1,                            (4.1.1)
where the c_i are known p-vectors of regression constants, β = (β_1, ..., β_p)^T is the vector of unknown (regression) parameters, p ≥ 1, and is to be estimated from n observations X_1, ..., X_n, and the e_i are independent and identically distributed random variables (i.i.d. r.v.'s) with distribution function (d.f.) F. Based on a suitable score function ψ: R → R, a (robust) M-estimator β̂_n of β is defined as a solution (with respect to β) of the following equations:
    Σ_{i=1}^n c_i ψ( X_i − c_i^T β ) = 0.                      (4.1.2)
                                                               (4.1.3)
    D_n = Diag( C_n,11^{1/2}, ..., C_n,pp^{1/2} ),             (4.1.4)
                                                               (4.1.5)
                                                               (4.1.6)
                                                               (4.1.7)
                                                               (4.1.8)
where ||·|| stands for the Euclidean norm and
                                                               (4.1.9)
and (4.1.2) is equivalent to (with respect to u)
    M_n(u) = 0,                                                (4.1.10)
where
    M_n(u) = Σ_{i=1}^n c_ni ψ( Y_i − c_ni^T u ),               (4.1.11)
i.e.,
    M_n( Y_n ) = 0,                                            (4.1.12)
for Y_n = D_n( β̂_n − β ). Under certain regularity conditions on {c_ni}, ψ and F, we have shown (in Chapter III), by incorporating the first-order Hadamard differentiability, the uniform asymptotic linearity of the M-estimators of regression, i.e., for an arbitrary K > 0, as n → ∞,
                                                               (4.1.13)
where |x| = |(x_1, ..., x_q)| = max_{1≤i≤q} |x_i|, and γ = ∫ ψ'(x) dF(x). The results of Chapter III show that (4.1.13) provides easy access to the study of the asymptotic properties of the M-estimators of regression β̂_n.
Since (4.1.13) is shown by incorporating the first-order Hadamard differentiability, we naturally consider using the second-order Hadamard differentiability, which provides a second-order expansion, i.e., a more precise approximation, to study a more detailed asymptotic property of (4.1.13): its convergence rate in probability, which will lead to an asymptotic representation of the M-estimators of regression.
Jurečková and Sen (1984) studied the convergence rate in probability of (4.1.13) and gave an asymptotic representation of M-estimators in linear models, assuming that F has finite Fisher information and that ψ is a nondecreasing and continuous function with a bounded derivative and ψ' = 0 outside of a finite interval.
The interest of our current study is to study the convergence rate in probability of (4.1.13) by incorporating the second-order Hadamard differentiability and to give an asymptotic representation of the M-estimator of regression under weak conditions on ψ and F.
Some notations along with basic assumptions are presented in Section 4.2. The general definition of the second-order Hadamard differentiability and some related theoretical results are given in Section 4.3. The results on the convergence rate in probability of (4.1.13), obtained by incorporating the second-order Hadamard differentiability, and an asymptotic representation of the M-estimator of regression, are derived in Section 4.4, where we basically require weaker conditions on {c_ni}, 1 ≤ i ≤ n, ψ and F than Jurečková and Sen (1984).
4.2 Notations and Assumptions
Since M_n(y) involves the following empirical function
                                                               (4.2.1)
particularly, for each y ∈ R^p and 1 ≤ k ≤ p, M_nk(y) (the k-th component of M_n(y)) is a linear functional of
    S*_nk(t, y) = Σ_{i=1}^n c_nik I( Y_i ≤ F^{-1}(t) + c_ni^T y )     (4.2.2)
(the k-th component of S*_n(t, y)), viz.,
                                                               (4.2.3)
we consider the expected value of S*_nk(t, y):
                                                               (4.2.4)
We also write, for every 1 ≤ i ≤ n, 1 ≤ k ≤ p, t ∈ [0,1] and y ∈ R^p,
    c+_nik = max{ 0, c_nik };     c−_nik = −min{ 0, c_nik };     c_nik = c+_nik − c−_nik,     (4.2.5)
    c_ni = c+_ni − c−_ni,                                       (4.2.6)
                                                               (4.2.7)
    c+_n·k = ( c+_n1k, ..., c+_nnk )^T;     c−_n·k = ( c−_n1k, ..., c−_nnk )^T,
    b+_nk = Σ_{i=1}^n c+_nik;     b−_nk = Σ_{i=1}^n c−_nik,
    b+_n = ( b+_n1, ..., b+_np )^T,                             (4.2.8)
                                                               (4.2.9)
                                                               (4.2.10)
    c̄+_nik = { c+_nik / d+_nk,  if d+_nk > 0;  0,  if d+_nk = 0;
    c̄−_nik = { c−_nik / d−_nk,  if d−_nk > 0;  0,  if d−_nk = 0,     (4.2.11)
    c̄+_ni = ( c̄+_ni1, ..., c̄+_nip )^T,                         (4.2.12)
                                                               (4.2.13)
                                                               (4.2.14)
so that
    Σ_{i=1}^n c̄+_nik = b+_nk / d+_nk,                          (4.2.15)
                                                               (4.2.16)
and
                                                               (4.2.17)
In this chapter, we will always consider the D[0,1] space (of right continuous real valued functions with left hand limits) endowed with the uniform topology. It is easy to see that, for every y ∈ R^p and 1 ≤ k ≤ p, S*_nk(·, y) is an element of D[0,1]. A convention which we will follow throughout the whole chapter is: a function f: R → R is nondecreasing if f(x) ≤ f(y) for x ≤ y, and is increasing if f(x) < f(y) for x < y; analogously for nonincreasing and decreasing.
Some assumptions, which may be required for our results in this chapter, are given below:
(A1) c_nik ≥ 0,     i = 1, 2, ..., n;
(A2) lim_{n→∞}
(A3)
(A4) inf_{n≥1} λ_1(C_n) > 0, where λ_1(C_n) denotes the smallest eigenvalue of the matrix C_n;
(B1) F is absolutely continuous with a derivative F' which is positive and continuous with limits at ±∞;
(B2) F'' exists and is bounded in any finite interval;
(C1) 0 < γ = ∫ ψ' dF < ∞;
(C2) ψ is bounded and absolutely continuous with |ψ'| ≤ M, a.e., where M is a constant, and ψ' = 0 outside of a finite interval;
(C3) ψ is a nondecreasing function with a range including positive and negative real numbers;
(C4) ∫ ψ dF = 0.
We notice that, for any fixed k = 1, ..., p, (A1) and (A2) imply that
    Σ_{i=1}^n c_nik → ∞,     as n → ∞,
and (A2) implies two facts: there exists C > 0 such that for n ≥ 1,
                                                               (4.2.18)
and, for any 1 ≤ j ≤ p,
                                                               (4.2.19)
where
                                                               (4.2.20)
and
                                                               (4.2.21)
We also notice that (B1) implies that F' is bounded and uniformly continuous.
4.3 On the Second-Order Hadamard Differentiability
The first-order Hadamard differentiability is usually defined as follows. Let V and W be topological vector spaces and L_1(V, W) be the set of continuous linear transformations from V to W. Let A be an open set of V.
DEFINITION.  A functional T: A → W is Hadamard differentiable (or compact differentiable) at F ∈ A if there exists T'_F ∈ L_1(V, W) such that for any compact set Γ of V,
    lim_{t→0} [ T(F + tH) − T(F) − T'_F(tH) ] / t = 0          (4.3.1)
uniformly for any H ∈ Γ. The linear function T'_F is called the Hadamard derivative of T at F.
For convenience's sake, in (4.3.1), we usually denote
    Rem_1(tH) = T(F + tH) − T(F) − T'_F(tH)                    (4.3.2)
as the remainder term of the first-order expansion.
Naturally, let C(V, W) be the set of continuous functionals from V to W, and let
    L_2(V, W) = { f : f ∈ C(V, W), f(tH) = t^2 f(H) for any H ∈ V, t ∈ R };     (4.3.3)
we define the second-order Hadamard differentiability as below.
DEFINITION.  A functional T: A → W is second-order Hadamard differentiable at F ∈ A if there exist T'_F ∈ L_1(V, W) and T''_F ∈ L_2(V, W) such that for any compact set Γ of V,
    lim_{t→0} [ T(F + tH) − T(F) − T'_F(tH) − (1/2) T''_F(tH) ] / t^2 = 0     (4.3.4)
uniformly for any H ∈ Γ. T'_F and T''_F are called the first- and second-order Hadamard derivatives of T at F, respectively.
We also denote the remainder term of the second-order expansion as follows:
    Rem_2(tH) = T(F + tH) − T(F) − T'_F(tH) − (1/2) T''_F(tH).     (4.3.5)
Remark (1).  Our definition of the second-order Hadamard differentiability is consistent with the one given by Sen (1988). Also, we have
                                                               (4.3.6)
and
                                                               (4.3.7)
where δ_x is the d.f. of the point mass one at x.
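The expansion (4.3.4)-(4.3.5) can be illustrated numerically for a simple integral functional. The sketch below (an illustration, not part of the dissertation) assumes T(G) = ∫_0^1 exp(G(t)) dt, expanded at the uniform d.f. U(t) = t, for which T'_U(H) = ∫ exp(t) H(t) dt and T''_U(H) = ∫ exp(t) H(t)^2 dt; the direction H is also an assumption.

```python
import numpy as np

# Numeric sketch of the second-order expansion (4.3.4)-(4.3.5) for
# T(G) = int_0^1 h(G(t)) dt with h = exp, so h' = h'' = exp and
#   T'_F(H)  = int h'(F(t)) H(t) dt,
#   T''_F(H) = int h''(F(t)) H(t)^2 dt.
t = np.linspace(0.0, 1.0, 100001)
F = t                                   # expansion point: U(t) = t
H = np.sin(2 * np.pi * t)               # a fixed direction in D[0,1]

T  = lambda G: np.mean(np.exp(G))       # grid approximation of T(G)
T1 = np.mean(np.exp(F) * H)             # T'_F(H)
T2 = np.mean(np.exp(F) * H**2)          # T''_F(H)

for s in (1e-1, 1e-2, 1e-3):
    rem2 = T(F + s * H) - T(F) - s * T1 - 0.5 * s**2 * T2
    print(s, rem2 / s**2)               # Rem_2(sH)/s^2 -> 0 as s -> 0
```

The printed ratios shrink roughly linearly in s, as the remainder is of third order here.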
Remark (2).  From our definition of the second-order Hadamard differentiability, it is obvious that the existence of the second-order Hadamard derivative implies the existence of the first-order Hadamard derivative.
It is known that the chain rule holds for the first-order Hadamard differentiability (Fernholz, 1983), which makes it useful. We will show that the chain rule also holds for the second-order Hadamard differentiability.
PROPOSITION 4.3.1.  Let V, W and Z be topological vector spaces with T: V → W and Q: W → Z. If T is second-order Hadamard differentiable at F ∈ V and if Q is second-order Hadamard differentiable at T(F) ∈ W, then τ = Q∘T is second-order Hadamard differentiable at F and
    τ'_F = (Q∘T)'_F = Q'_{T(F)} ∘ T'_F,                        (4.3.8)
    τ''_F = (Q∘T)''_F = Q''_{T(F)} ∘ T'_F + Q'_{T(F)} ∘ T''_F.     (4.3.9)
Proof.  Since T and Q are second-order Hadamard differentiable at F and T(F), respectively, for any compact set Γ_V of V and compact set Γ_W of W, we have
    lim_{t→0} [ T(F + tH) − T(F) − T'_F(tH) − (1/2) T''_F(tH) ] / t^2 = 0     (4.3.10)
uniformly for any H ∈ Γ_V, and
    lim_{t→0} [ Q(T(F) + tG) − Q(T(F)) − Q'_{T(F)}(tG) − (1/2) Q''_{T(F)}(tG) ] / t^2 = 0     (4.3.11)
uniformly for any G ∈ Γ_W.
By Proposition 3.1.2 of Fernholz (1983), we know
    τ'_F = Q'_{T(F)} ∘ T'_F,
and obviously τ'_F ∈ L_1(V, Z). It is also obvious that τ''_F given by (4.3.9) is an element of L_2(V, Z). From (4.3.10) we have
    T(F + tH) = T(F) + T'_F(tH) + (1/2) T''_F(tH) + o(1) t^2,
so, by the linearity of Q'_{T(F)},
    Q(T(F + tH)) − Q(T(F)) − Q'_{T(F)}( T'_F(tH) )
        = { Q( T(F) + t{ T'_F(H) + (t/2) T''_F(H) + o(1)t } ) − Q(T(F))
            − Q'_{T(F)}( t{ T'_F(H) + (t/2) T''_F(H) + o(1)t } ) }
          + Q'_{T(F)}( (t^2/2) T''_F(H) + o(1) t^2 ).          (4.3.12)
By (4.3.11) and the uniform continuity of Q''_{T(F)} on a compact set, we have, for any compact set Γ of V,
    lim_{t→0} (1/t^2) { Q( T(F) + t{ T'_F(H) + (t/2) T''_F(H) + o(1)t } ) − Q(T(F))
                        − Q'_{T(F)}( t{ T'_F(H) + (t/2) T''_F(H) + o(1)t } ) − (1/2) Q''_{T(F)}( T'_F(tH) ) }
        = lim_{t→0} (1/2) { Q''_{T(F)}( T'_F(H) + (t/2) T''_F(H) + o(1)t ) − Q''_{T(F)}( T'_F(H) ) } = 0     (4.3.13)
uniformly for any H ∈ Γ. Since Q'_{T(F)} is continuous, then
    lim_{t→0} (1/t^2) Q'_{T(F)}( o(1) t^2 ) = lim_{t→0} Q'_{T(F)}( o(1) ) = 0     (4.3.14)
uniformly for any H ∈ Γ. Therefore, (4.3.12) through (4.3.14) imply that
    lim_{t→0} (1/t^2) [ τ(F + tH) − τ(F) − τ'_F(tH) − (1/2){ Q''_{T(F)}( T'_F(tH) ) + Q'_{T(F)}( T''_F(tH) ) } ] = 0     (4.3.15)
uniformly for any H ∈ Γ.     □
PROPOSITION 4.3.2.  Let L: R → R be differentiable and L' be continuous, bounded and piecewise differentiable with a bounded derivative. Let γ: D[0,1] → L_p[0,1], p ≥ 1, be defined by
    γ(S) = L∘S,
and let A be the set of points in R where L' is not differentiable. If γ is defined in a neighborhood of Q ∈ D[0,1] and if μ{x : Q(x) ∈ A} = 0, where μ is Lebesgue measure, then γ is second-order Hadamard differentiable at Q with derivatives
    γ'_Q(H) = (L'∘Q) H     and     γ''_Q(H) = (L''∘Q) H^2.
Proof.  For any compact set Γ of D[0,1], we need to show that
    || Rem_2(tH) || / t^2 → 0                                  (4.3.16)
uniformly for H ∈ Γ, as t → 0, where
    Rem_2(tH) = L(Q + tH) − L(Q) − t (L'∘Q) H − (t^2/2) (L''∘Q) H^2.
Since Γ is compact, for any ε > 0, we can choose H_1, ..., H_n ∈ Γ such that
    inf_{1≤i≤n} || H − H_i || < ε                              (4.3.17)
for any H ∈ Γ.
Since, for a given H_i,
    [ L( Q(x) + tH_i(x) ) − L(Q(x)) − L'(Q(x)) tH_i(x) − (1/2) L''(Q(x)) t^2 H_i^2(x) ] / t^2
        = (1/2) [ L''(ξ) − L''(Q(x)) ] H_i^2(x),
where ξ is between Q(x) and {Q(x) + tH_i(x)}, by Lemma 5.4.3 of Fernholz (1983), we have
    | [ L''(ξ) − L''(Q(x)) ] H_i^2(x) | ≤ 2M H_i^2(x),
where M is a bound for L''. Therefore, for each i,
    || Rem_2(tH_i) || / t^2 ≤ M_1,
where M_1 is a constant. Moreover, for x such that Q(x) ∉ A,
    [ L''(ξ) − L''(Q(x)) ] H_i^2(x) → 0,     as t → 0.
So, by the Dominated Convergence Theorem, as t → 0,
    || Rem_2(tH_i) || / t^2 → 0.                               (4.3.18)
For any H ∈ Γ,
    || Rem_2(tH) || / t^2 ≤ || Rem_2(tH_i) || / t^2 + || Rem_2(tH) − Rem_2(tH_i) || / t^2,     (4.3.19)
and
    [ Rem_2(tH) − Rem_2(tH_i) ] / t^2
        = [ L( Q(x) + tH(x) ) − L( Q(x) + tH_i(x) ) ] / t^2 − L'(Q(x)) [ H(x) − H_i(x) ] / t
          − (1/2) L''(Q(x)) [ H^2(x) − H_i^2(x) ]
        = [ L'(η) − L'(Q(x)) ] [ H(x) − H_i(x) ] / t − (1/2) L''(Q(x)) [ H^2(x) − H_i^2(x) ],
where η is between {Q(x) + tH(x)} and {Q(x) + tH_i(x)}. By Lemma 5.4.3 of Fernholz (1983), we have
    | L'(η) − L'(Q(x)) | ≤ M ( |tH(x)| + |tH_i(x)| ).
So, by (4.3.17),
    || Rem_2(tH) − Rem_2(tH_i) || / t^2 ≤ M_Γ ε,               (4.3.20)
where M_Γ is a constant which depends on Γ. Therefore, (4.3.16) follows from (4.3.18) through (4.3.20).     □
THEOREM 4.3.3.  Suppose τ: D[0,1] → R is a functional and is second-order Hadamard differentiable at U. For any fixed k = 1, 2, ..., p, assume (A1), (A2) and (B1). Then, for any K > 0, as n → ∞,
    sup_{|y|≤K} | ( Σ_{i=1}^n c_nik )^2 { τ( S*_nk(·,y) / Σ_{i=1}^n c_nik ) − τ( U(·) ) }
                 − Σ_{i=1}^n c_nik τ'_U( S*_nk(·,y) − U(·) Σ_{i=1}^n c_nik )
                 − (1/2) τ''_U( S*_nk(·,y) − U(·) Σ_{i=1}^n c_nik ) |  →_P  0.     (4.3.21)
Therefore, we have, as n → ∞,
    sup_{|y|≤K} | ( Σ_{i=1}^n c_nik )^2 { τ( S*_nk(·,y) / Σ_{i=1}^n c_nik ) − τ( S*_nk(·,0) / Σ_{i=1}^n c_nik ) }
                 − Σ_{i=1}^n c_nik τ'_U( S*_nk(·,y) − S*_nk(·,0) )
                 − (1/2){ τ''_U( S*_nk(·,y) − U(·) Σ_{i=1}^n c_nik ) − τ''_U( S*_nk(·,0) − U(·) Σ_{i=1}^n c_nik ) } |  →_P  0.     (4.3.22)
Proof.  The proof follows from simply replacing Rem_1(tH)/t by Rem_2(tH)/t^2 in the proof of Theorem 2.3.1.     □
In the general case of {c_nik}, we have the following theorem.
THEOREM 4.3.4.  Suppose τ: D[0,1] → R is a functional and is second-order Hadamard differentiable at U. Assume (A3) and (B1). Then, for arbitrary 1 ≤ k ≤ p and K > 0, as n → ∞,
    sup_{|y|≤K} | ( Σ_{i=1}^n c+_nik )^2 { τ( S*+_nk(·,y) / Σ_{i=1}^n c+_nik ) − τ( S*+_nk(·,0) / Σ_{i=1}^n c+_nik ) }
                 − Σ_{i=1}^n c+_nik τ'_U( S*+_nk(·,y) − S*+_nk(·,0) )
                 − (1/2){ τ''_U( S*+_nk(·,y) − U(·) Σ_{i=1}^n c+_nik ) − τ''_U( S*+_nk(·,0) − U(·) Σ_{i=1}^n c+_nik ) } |  →_P  0,     (4.3.23)
and
    sup_{|y|≤K} | ( Σ_{i=1}^n c−_nik )^2 { τ( S*−_nk(·,y) / Σ_{i=1}^n c−_nik ) − τ( S*−_nk(·,0) / Σ_{i=1}^n c−_nik ) }
                 − Σ_{i=1}^n c−_nik τ'_U( S*−_nk(·,y) − S*−_nk(·,0) )
                 − (1/2){ τ''_U( S*−_nk(·,y) − U(·) Σ_{i=1}^n c−_nik ) − τ''_U( S*−_nk(·,0) − U(·) Σ_{i=1}^n c−_nik ) } |  →_P  0.     (4.3.24)
Proof.  Using Theorem 4.3.3, by the similar proof of Theorem 2.3.2, we have, for any 1 ≤ k ≤ p and K > 0, as n → ∞,
    sup_{|y|≤K} | ( Σ_{i=1}^n c+_nik )^2 { τ( S*+_nk(·,y) / Σ_{i=1}^n c+_nik ) − τ( U(·) ) }
                 − Σ_{i=1}^n c+_nik τ'_U( S*+_nk(·,y) − U(·) Σ_{i=1}^n c+_nik )
                 − (1/2) τ''_U( S*+_nk(·,y) − U(·) Σ_{i=1}^n c+_nik ) |  →_P  0,     (4.3.25)
and
    sup_{|y|≤K} | ( Σ_{i=1}^n c−_nik )^2 { τ( S*−_nk(·,y) / Σ_{i=1}^n c−_nik ) − τ( U(·) ) }
                 − Σ_{i=1}^n c−_nik τ'_U( S*−_nk(·,y) − U(·) Σ_{i=1}^n c−_nik )
                 − (1/2) τ''_U( S*−_nk(·,y) − U(·) Σ_{i=1}^n c−_nik ) |  →_P  0.     (4.3.26)
Therefore, (4.3.23) and (4.3.24) follow from (4.3.25) and (4.3.26), respectively.     □
4.4 Convergence Rate in Probability of the Uniform Asymptotic Linearity of the M-estimators of Regression
In this section, we define a function g: R → R given by
    g(x) = { [ x − a + F^{-1}(a) F'(F^{-1}(a)) ] / F'(F^{-1}(a)),     if x < a;
           { F^{-1}(x),                                               if a ≤ x ≤ b;
           { [ x − b + F^{-1}(b) F'(F^{-1}(b)) ] / F'(F^{-1}(b)),     if x > b,     (4.4.1)
where 0 < a < b < 1, and define a function m: [0,1] → R given by
    m(t) = ψ'( F^{-1}(t) ),     t ∈ [0,1].                     (4.4.2)
So, we have
    g'(x) = { 1/F'(F^{-1}(a)),     if x < a;
            { 1/F'(F^{-1}(x)),     if a ≤ x ≤ b;
            { 1/F'(F^{-1}(b)),     if x > b,                   (4.4.3)
and
    ∫_0^1 m(t) dt = ∫ ψ'(x) dF(x) = γ.                         (4.4.4)
Obviously, g is continuous and differentiable everywhere, and g' is bounded, continuous and piecewise differentiable with a bounded derivative, if we assume (B1) and (B2).
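The claimed smoothness of the truncated quantile transform g of (4.4.1) can be checked numerically. The sketch below (an illustration, not part of the dissertation) takes F = standard normal and verifies that g is continuous across x = a and that its numerical derivative matches (4.4.3).

```python
import numpy as np
from scipy.stats import norm

# Numeric sketch of g in (4.4.1) and g' in (4.4.3), with the illustrative
# assumption F = standard normal.
a, b = 0.1, 0.9
qa, qb = norm.ppf(a), norm.ppf(b)             # F^{-1}(a), F^{-1}(b)
fa, fb = norm.pdf(qa), norm.pdf(qb)           # F'(F^{-1}(a)), F'(F^{-1}(b))

def g(x):
    x = np.asarray(x, dtype=float)
    return np.where(x < a, (x - a + qa * fa) / fa,
           np.where(x > b, (x - b + qb * fb) / fb,
                    norm.ppf(np.clip(x, a, b))))

def g_prime(x):                                # the piecewise form of (4.4.3)
    x = np.asarray(x, dtype=float)
    return 1.0 / norm.pdf(norm.ppf(np.clip(x, a, b)))

x = np.linspace(-0.5, 1.5, 2001)
num = np.gradient(g(x), x)                     # numerical derivative of g
print(np.max(np.abs(num - g_prime(x))))        # small: g is C^1 with g' as in (4.4.3)
```

The linear tails are glued so that both the value and the slope of g match F^{-1} at a and b, which is exactly what makes g' continuous.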
LEMMA 4.4.1.  For a fixed k = 1, ..., p, assume (A1), (A2), (B1), (B2), (C1) and (C2). Then, for any K > 0, as n → ∞,
    sup_{|y|≤K} | ( Σ_{i=1}^n c_nik )^2 { τ( S*_nk(·,y) / Σ_{i=1}^n c_nik ) − τ( S*_nk(·,0) / Σ_{i=1}^n c_nik ) }
                 − ( Σ_{j=1}^n c_njk ) ( Σ_{i=1}^n c_nik c̄_ni^T y ) γ | = O_p(1).     (4.4.5)
Proof.  From the assumptions on ψ and F, we have that there exist a_0, b_0 ∈ (0,1) such that m(t) = 0 outside of [a_0, b_0]. Choose a and b for (4.4.1) such that 0 < a < a_0 < b_0 < b < 1. Then, from the properties of g, we have
    ( Σ_{i=1}^n c_nik )^2 { τ( S*_nk(·,y) / Σ_{i=1}^n c_nik ) − τ( S*_nk(·,0) / Σ_{i=1}^n c_nik ) }
        = ( Σ_{j=1}^n c_njk ) ∫_0^1 [ g'(ξ_t) − g'(t) ] [ S*_nk(t,y) − S*_nk(t,0) ] m(t) dt
          + ( Σ_{j=1}^n c_njk ) ∫_0^1 g'(t) { [ S*_nk(t,y) − S*_nk(t,0) ] − [ S_nk(t,y) − S_nk(t,0) ] } m(t) dt
          + ( Σ_{j=1}^n c_njk ) ∫_0^1 g'(t) [ S_nk(t,y) − S_nk(t,0) ] m(t) dt
        = I_y + II_y + III_y,                                   (4.4.6)
where ξ_t is between S*_nk(t,y)/Σ_{i=1}^n c_nik and S*_nk(t,0)/Σ_{i=1}^n c_nik.
By Corollary 3.2.5, for any ε > 0, there exists a positive integer N such that, for n ≥ N,
    P( sup{ | S*_nk(t,y) / Σ_{i=1}^n c_nik − t | ; t ∈ [0,1], |y| ≤ K } > δ ) ≤ ε,
where δ > 0 is such that a < (a_0 − δ) < (b_0 + δ) < b. Since g' is differentiable with a bounded derivative in [a_0 − δ, b_0 + δ], for n ≥ N, we have
    g'(ξ_t) − g'(t) = g''(η_t) ( ξ_t − t ),                    (4.4.7)
where η_t is between ξ_t and t, and, for any ρ > 0,
                                                               (4.4.8)
When
    sup_t | S*_nk(t,y) / Σ_{i=1}^n c_nik − t | ≤ δ,
we have ξ_t ∈ [a_0 − δ, b_0 + δ] for t ∈ [a_0, b_0], therefore η_t ∈ [a_0 − δ, b_0 + δ] for t ∈ [a_0, b_0], and, since ξ_t Σ_{i=1}^n c_nik is between S*_nk(t,y) and S*_nk(t,0), the quantity ( ξ_t − t ) Σ_{i=1}^n c_nik
is between [ S*_nk(t,y) − S_nk(t,0) ] and [ S*_nk(t,0) − S_nk(t,0) ]; therefore, by Corollary 3.2.5, we have
                                                               (4.4.9)
Therefore,
    sup_{|y|≤K} | I_y | = O_p(1)
follows from (4.4.7) through (4.4.9) and Corollary 3.2.2.
We notice that, for any |y| ≤ K, E{II_y} = 0 and II_0 = 0. Since
    II_y = ( Σ_{j=1}^n c_njk ) Σ_{i=1}^n c_nik { ∫_{F(Y_i − c̄_ni^T y)}^{F(Y_i)} g'(t) m(t) dt − E[ ∫_{F(Y_i − c̄_ni^T y)}^{F(Y_i)} g'(t) m(t) dt ] },
for any u and v,
    Var{ II_u − II_v } ≤ ( Σ_{j=1}^n c_njk )^2 Σ_{i=1}^n c_nik^2 E{ [ ∫_{F(Y_i − c̄_ni^T u)}^{F(Y_i − c̄_ni^T v)} g'(t) m(t) dt ]^2 }
        ≤ M_1 ( Σ_{j=1}^n c_njk )^2 Σ_{i=1}^n c_nik^2 E{ [ F( Y_i − c̄_ni^T u ) − F( Y_i − c̄_ni^T v ) ]^2 } ≤ M_2 | u − v |^2,     (4.4.10)
where M_1 and M_2 are constants. By the similar argument of the proof of Theorem 3.3.2, we have, as n → ∞,
    sup_{|y|≤K} | II_y |  →_P  0.                               (4.4.11)
Since, by (B1), (B2) and (4.2.19),
    III_y − ( Σ_{j=1}^n c_njk ) ( Σ_{i=1}^n c_nik c̄_ni^T y ) γ
        = ( Σ_{j=1}^n c_njk ) ∫_0^1 g'(t) { [ S_nk(t,y) − S_nk(t,0) ] − Σ_{i=1}^n c_nik c̄_ni^T y F'( F^{-1}(t) ) } m(t) dt
        = ( Σ_{j=1}^n c_njk ) Σ_{i=1}^n c_nik c̄_ni^T y ∫_0^1 g'(t) [ F'(ξ_t) − F'( F^{-1}(t) ) ] m(t) dt
        ≤ M_3 ( Σ_{j=1}^n c_njk ) Σ_{i=1}^n c_nik | c̄_ni^T y | ∫_0^1 | ξ_t − F^{-1}(t) | | g'(t) m(t) | dt
        ≤ M_4 ( Σ_{j=1}^n c_njk ) Σ_{i=1}^n c_nik ( c̄_ni^T y )^2
        ≤ M_4 C ( Σ_{j=1}^n c_njk ) Σ_{i=1}^n c_nik | c̄_ni |^2 | y |^2,
where ξ_t is between F^{-1}(t) and ( F^{-1}(t) + c̄_ni^T y ); therefore, as n → ∞,
    sup_{|y|≤K} | III_y − ( Σ_{j=1}^n c_njk ) ( Σ_{i=1}^n c_nik c̄_ni^T y ) γ | = o(1).     (4.4.12)
Therefore, (4.4.5) follows from (4.4.6) and (4.4.10) through (4.4.12).     □
In the general case of {c_nik}, we have the following corollary.
COROLLARY 4.4.2.  Under all the assumptions of Lemma 4.4.1, except replacing (A1) and (A2) by (A3), we have that, for any K > 0 and 1 ≤ k ≤ p, as n → ∞,
    sup_{|y|≤K} | ( Σ_{i=1}^n c+_nik )^2 { τ( S*+_nk(·,y) / Σ_{i=1}^n c+_nik ) − τ( S*+_nk(·,0) / Σ_{i=1}^n c+_nik ) }
                 − ( Σ_{j=1}^n c+_njk ) ( Σ_{i=1}^n c+_nik c̄_ni^T y ) γ | = O_p(1),     (4.4.13)
and
    sup_{|y|≤K} | ( Σ_{i=1}^n c−_nik )^2 { τ( S*−_nk(·,y) / Σ_{i=1}^n c−_nik ) − τ( S*−_nk(·,0) / Σ_{i=1}^n c−_nik ) }
                 − ( Σ_{j=1}^n c−_njk ) ( Σ_{i=1}^n c−_nik c̄_ni^T y ) γ | = O_p(1).     (4.4.14)
Proof.  Consider any 1 ≤ k ≤ p. If d+_nk = 0 or d−_nk = 0, this is just the case of Lemma 4.4.1, and the left hand side of (4.4.13) or (4.4.14) is equal to 0. If d+_nk > 0 and d−_nk > 0, the proof is the same as Lemma 4.4.1's, by using Corollary 3.2.3 and Corollary 3.2.6.     □
THEOREM 4.4.3.  For a fixed k = 1, ..., p, assume (A1), (A2), (B1), (B2), (C1) and (C2). Then, for any K > 0, as n → ∞,
    sup_{|y|≤K} ( Σ_{i=1}^n c_nik ) | M_nk(y) − M_nk(0) + γ Σ_{i=1}^n c_nik c̄_ni^T y | = O_p(1).     (4.4.15)
Proof.  Consider a functional τ: D[0,1] → R defined by
    τ(G) = ∫_0^1 g(G(t)) m(t) dt,     G ∈ D[0,1],              (4.4.16)
where g is given by (4.4.1). Then, τ can be expressed as a composition of the following second-order Hadamard differentiable transformations:
    γ_1(S) = g∘S;     γ_2(S) = ∫_0^1 S(t) m(t) dt;     and     τ = γ_2∘γ_1.
Note that γ_1, by Proposition 4.3.2, is second-order Hadamard differentiable at U with
    γ_1'_U(H) = (g'∘U) H,     γ_1''_U(H) = (g''∘U) H^2,     H ∈ D[0,1],
because g is differentiable everywhere and g' is bounded, continuous and piecewise differentiable with a bounded derivative, and that γ_2 is linear and continuous, thus is second-order Hadamard differentiable with γ_2''(G) = 0. From Proposition 4.3.1, τ is second-order Hadamard differentiable at U with
    τ'_U(G) = ∫_0^1 g'(t) G(t) m(t) dt,     G ∈ D[0,1],        (4.4.17)
and
    τ''_U(G) = ∫_0^1 g''(t) G^2(t) m(t) dt,     G ∈ D[0,1].    (4.4.18)
From Theorem 4.3.3, we have, for any K > 0, as n → ∞,
sup
n
)2{ (S~k("
y) ) -TU(,)
(
)} -.Lcn·kTUSnk"y-U·.Lcnikn
,( * ( )
() n
)
n
I( .Lcn·k
c
T
lyl~K J=l J
therefore, as n -
-
Li=l nik
J=l
J
1=1
00
~{Tu(S~kh Y)-U(')it1cnik)-Tu(S~kh Q)-U(')iE1cnik)}1 f.
O.
(4.4.19)
Since ψ is absolutely continuous,

  T'_U(S*_nk(·,y)) = ∫_0^1 g'(t) S*_nk(t,y) m(t) dt
    = Σ_{i=1}^n c_nik ∫_{F(Y_i − c̄_ni^T y)}^1 [ ψ'(F^{-1}(t)) / F'(F^{-1}(t)) ] dt
    = Σ_{i=1}^n c_nik ∫_{Y_i − c̄_ni^T y}^∞ ψ'(x) dx ;

then,

  (4.4.20)

Since, by Corollary 3.2.5 and the boundedness of g''(t) for t ∈ [a_0, b_0], as n → ∞,

  sup_{|y|≤K} | T''_U( S*_nk(·,y) − U(·) Σ_{i=1}^n c_nik ) | = Op(1).   (4.4.21)

Therefore, (4.4.15) follows from (4.4.19) through (4.4.21) and Lemma 4.4.1.
□
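The change of variables x = F^{-1}(t) used in the proof above can also be checked numerically. This is an illustration only; the logistic F and the Huber ψ (so that the right-hand side is c − a for a < c) are hypothetical choices, not prescribed by the text:

```python
# Numerical check (illustrative only) of the substitution x = F^{-1}(t):
# for absolutely continuous psi,
#   int_{F(a)}^{1} psi'(F^{-1}(t)) / F'(F^{-1}(t)) dt  =  int_a^inf psi'(x) dx.
# Hypothetical choices: logistic F, Huber psi with bend c (then RHS = c - a
# for a < c, since psi' = 1 on [-c, c] and 0 outside).
import numpy as np

c = 1.345
F = lambda x: 1.0 / (1.0 + np.exp(-x))                 # logistic d.f.
F_inv = lambda t: np.log(t / (1.0 - t))
f = lambda x: F(x) * (1.0 - F(x))                      # F' (logistic density)
psi_prime = lambda x: (np.abs(x) <= c).astype(float)   # Huber psi'

a = 0.5
t = np.linspace(F(a), 1.0 - 1e-9, 400001)
integrand = psi_prime(F_inv(t)) / f(F_inv(t))
lhs = float(np.sum((integrand[1:] + integrand[:-1]) * np.diff(t)) / 2.0)
rhs = c - a                                            # int_a^inf psi'(x) dx

print(abs(lhs - rhs) < 1e-3)                           # True
```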
In the general case of {c_nik}, 1≤i≤n, 1≤k≤p, we have the following theorem.
THEOREM 4.4.4.  Under all the assumptions of Theorem 4.4.3, except replacing (A1) and (A2) by (A3), assume, as n → ∞,

  (4.4.22)

where B⁺_n = Diag(b⁺_n1, ..., b⁺_np), and

  (4.4.23)

Then, for any K>0 and 1≤k≤p, as n → ∞,

  (4.4.24)

and

  (4.4.25)

where B⁻_n = Diag(b⁻_n1, ..., b⁻_np).
Proof.  Consider any 1≤k≤p. If d⁺_nk = 0 or d⁻_nk = 0, (4.4.24) is the case of Theorem 4.4.3. If d⁺_nk > 0 and d⁻_nk > 0, using the functional T given by (4.4.16), from Theorem 4.3.4 we have, for any K>0 and 1≤k≤p, as n → ∞,

  (4.4.26)

and

  sup_{|y|≤K} | (Σ_{i=1}^n c⁻_nik)^2 { T(S*⁻_nk(·,y)) − T(S*⁻_nk(·,0)) } − (Σ_{i=1}^n c⁻_nik) T'_U( S*⁻_nk(·,y) − S*⁻_nk(·,0) ) | →_P 0.   (4.4.27)
Since

  T''_U( S*⁺_nk(·,y) − U(·) Σ_{i=1}^n c⁺_nik ) = ∫_0^1 g''(t) [ S*⁺_nk(t,y) − t Σ_{i=1}^n c⁺_nik ]^2 m(t) dt

and

  T''_U( S*⁻_nk(·,y) − U(·) Σ_{i=1}^n c⁻_nik ) = ∫_0^1 g''(t) [ S*⁻_nk(t,y) − t Σ_{i=1}^n c⁻_nik ]^2 m(t) dt,
by Corollary 3.2.6 and the boundedness of g''(t) for t ∈ [a_0, b_0], we have, as n → ∞,

  sup_{|y|≤K} | T''_U( S*⁺_nk(·,y) − U(·) Σ_{i=1}^n c⁺_nik ) | = Op(1),   (4.4.28)

and

  sup_{|y|≤K} | T''_U( S*⁻_nk(·,y) − U(·) Σ_{i=1}^n c⁻_nik ) | = Op(1).   (4.4.29)
Since

  T'_U(S*⁺_nk(·,y)) = ∫_0^1 g'(t) S*⁺_nk(t,y) m(t) dt = Σ_{i=1}^n c⁺_nik ∫_{Y_i − c̄_ni^T y}^∞ ψ'(x) dx   (4.4.30)

and

  T'_U(S*⁻_nk(·,y)) = ∫_0^1 g'(t) S*⁻_nk(t,y) m(t) dt,   (4.4.31)

where

  M⁻_nk(y) = Σ_{i=1}^n c⁻_nik ψ(Y_i − c̄_ni^T y),

so that

  (4.4.32)
then, as n → ∞,

  sup_{|y|≤K} | b⁺_nk [ M⁺_nk(y) − M⁺_nk(0) + Σ_{i=1}^n c⁺_nik ξ_ni^T y ] | = Op(1)   (4.4.33)

and

  (4.4.34)

follow from (4.4.13), (4.4.26), (4.4.28), (4.4.30) and (4.4.14), (4.4.27), (4.4.29), (4.4.31), respectively. Therefore, as n → ∞,

  sup_{|y|≤K} | b⁺_nk [ M⁺_nk(y) − M⁺_nk(0) + Σ_{i=1}^n c⁺_nik ξ_ni^T y ] | + sup_{|y|≤K} | b⁻_nk [ M⁻_nk(y) − M⁻_nk(0) + Σ_{i=1}^n c⁻_nik ξ_ni^T y ] | = Op(1).   (4.4.35)

Hence, (4.4.24) follows from (4.4.22) and (4.4.33) through (4.4.35).
□

COROLLARY 4.4.5.  Under all the assumptions of Theorem 4.4.3, except replacing (A1) and (A2) by (A3), assume, as n → ∞,

  (4.4.36)

where B_n = Diag(b_n1, ..., b_np), and

  (4.4.37)

Then, for any K>0 and 1≤k≤p, as n → ∞,

  sup_{|y|≤K} | b_nk [ M_nk(y) − M_nk(0) + Σ_{i=1}^n c_nik ξ_ni^T y ] | = Op(1),   (4.4.38)

and

  (4.4.39)

Proof.  The proof is similar to that of Theorem 4.4.4.  □
The following theorem gives an asymptotic representation of M-estimators of regression.
THEOREM 4.4.6.  In addition to (A3), (A4), (B1), (B2) and (C1) through (C4), assume that (4.4.22) and (4.4.36) hold. Then, as n → ∞,

  (4.4.40)

where ⟨x⟩ = ⟨(x_1, ..., x_q)⟩ = min_{1≤i≤q} |x_i| and 1 = (1, ..., 1)^T.

Proof.  From the assumptions on ψ, we have that, for arbitrary ε>0, 1≤k≤p and n≥1, there exists K_0>0 such that

  |M_n(0)| = Op(1).   (4.4.41)

Since (A3), (4.4.22) and (4.4.36) imply, as n → ∞,

  ⟨1^T B_n⟩ → ∞,   (4.4.42)

by Theorem 4.4.4 we have, for any K>0, as n → ∞,

  sup_{|y|≤K} | M_n(y) − M_n(0) + Q_n y | →_P 0.   (4.4.43)

Using an argument similar to the one for (3.3.16) in the proof of part (2) of Lemma 3.3.1, by (4.4.41), (4.4.43), (A4) and (C3), we have that, as n → ∞,

  |Y_n| = Op(1).   (4.4.44)

Therefore, by Theorem 4.4.4,

  (4.4.45)

as n → ∞. Since (4.4.45) implies

  (4.4.46)

(4.4.40) follows from (4.4.46) and (A4).  □
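The representation in Theorem 4.4.6 says that the M-estimator of regression is driven by the normalized ψ-scores, so it concentrates around the true parameter even under heavy-tailed errors. A minimal Monte Carlo sketch of this behavior, using a hypothetical one-dimensional design and Huber ψ rather than the dissertation's general triangular-array setup:

```python
# Monte Carlo sketch (illustrative; hypothetical design, not the
# dissertation's general setup): a Huber M-estimator of the slope in
# y_i = beta * x_i + e_i, computed by iteratively reweighted least
# squares, concentrates around beta under heavy-tailed errors,
# consistent with the asymptotic representation and normality above.
import numpy as np

def huber_psi(r, c=1.345):
    return np.clip(r, -c, c)                     # Huber score function

def m_estimate_slope(x, y, tol=1e-10, max_iter=200):
    b = np.sum(x * y) / np.sum(x * x)            # start from the LSE
    for _ in range(max_iter):
        r = y - b * x
        r_safe = np.where(np.abs(r) < 1e-12, 1e-12, r)
        w = huber_psi(r_safe) / r_safe           # IRLS weights psi(r)/r
        b_new = np.sum(w * x * y) / np.sum(w * x * x)
        if abs(b_new - b) < tol:
            break
        b = b_new
    return b                                     # solves sum_i x_i psi(y_i - b x_i) = 0

rng = np.random.default_rng(0)
beta, n, reps = 2.0, 400, 200
estimates = []
for _ in range(reps):
    x = rng.uniform(-1.0, 1.0, n)
    y = beta * x + rng.standard_t(df=3, size=n)  # heavy-tailed errors
    estimates.append(m_estimate_slope(x, y))
estimates = np.array(estimates)

print(abs(estimates.mean() - beta) < 0.05)       # True: centered at beta
```

The replicate estimates cluster tightly around beta = 2 with a spread shrinking like 1/√n, the qualitative content of the asymptotic normality obtained from the representation.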
NOTATIONS AND ABBREVIATIONS

1: = (1, ..., 1)^T
x, y, X, θ, ... : column vectors or matrices
I(x≥y): = 1 if x≥y and = 0 if x<y
‖·‖: standard Euclidean norm for any vector x ∈ R^q, where q is a positive integer; uniform norm for any H ∈ D[0,1]
‖·‖_{L_p}: standard norm in L_p space
|·|: |x| = max_{1≤i≤q} |x_i| for x = (x_1, ..., x_q) ∈ R^q, where q is a positive integer
⟨·⟩: ⟨x⟩ = min_{1≤i≤q} |x_i| for x = (x_1, ..., x_q) ∈ R^q, where q is a positive integer
λ_1(A): smallest eigenvalue of matrix A
λ_2(A): largest eigenvalue of matrix A
E_q: space [0,1]^q, where q is a positive integer
L_q(m): = { x_h(l_1, ..., l_q) | l_i = 0, 1, ..., m; i = 1, ..., q }, where q and m are positive integers
lim: lim sup
lim: lim inf
→_D: convergence in distribution
→_P: convergence in probability
N_p(μ, Σ): p-variate normal distribution with mean μ and covariance matrix Σ
o(1): a term that tends to 0
O(1): a term that remains bounded
a.e.: almost everywhere
a.s.: almost surely
d.f.: distribution function
i.i.d.r.v.: independent and identically distributed random variables
sup: supremum
inf: infimum
LSE: least squares estimator
MLE: maximum likelihood estimator
DCT: dominated convergence theorem
REFERENCES
Bickel, P. J. (1975). One-step Huber estimates in the linear model. JASA 70, 428-433.

Billingsley, P. (1968). Convergence of Probability Measures. John Wiley & Sons, New York.

Boos, D. D. and Serfling, R. J. (1980). A note on differentials and the CLT and LIL for statistical functions with applications to M-estimates. Ann. Statist. 8, 618-624.

Chung, K. L. (1974). A Course in Probability Theory. Academic Press, Inc., New York.

Fernholz, L. T. (1983). Von Mises Calculus for Statistical Functionals. Lecture Notes in Statistics 19. Springer, New York.

Hampel, F. R. (1968). Contribution to the theory of robust estimation. Ph.D. dissertation, University of California, Berkeley.

Hampel, F. R. (1974). The influence curve and its role in robust estimation. JASA 69, 383-393.

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. John Wiley & Sons, New York.

Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35, 73-101.

Huber, P. J. (1973). Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Statist. 1, 799-821.

Huber, P. J. (1981). Robust Statistics. John Wiley & Sons, New York.

Jureckova, J. (1971). Nonparametric estimate of regression coefficients. Ann. Math. Statist. 42, 1328-1338.

Jureckova, J. (1977). Asymptotic relations of M-estimates and R-estimates in linear regression model. Ann. Statist. 5, 464-472.

Jureckova, J. (1984). M-, L- and R-estimators. In Handbook of Statistics, Volume 4: Nonparametric Methods (eds. P. R. Krishnaiah and P. K. Sen), North-Holland, Amsterdam, 463-485.

Jureckova, J. and Sen, P. K. (1984). On adaptive scale-equivariant M-estimators in linear models. Statistics & Decisions, Supplement Issue No. 1, 31-46.

Jureckova, J. (1989). Consistency of M-estimators of vector parameters. Proceedings of the Fourth Prague Symposium on Asymptotic Statistics (eds. P. Mandl and M. Huskova), Charles Univ., Prague, 305-312.

Kallianpur, G. (1963). Von Mises functionals and maximum likelihood estimation. Sankhya A 23, 149-158.

Koul, H. L. (1977). Behavior of robust estimators in the regression model with dependent errors. Ann. Statist. 5, 681-699.

Neuhaus, G. (1971). On weak convergence of stochastic processes with multidimensional time parameter. Ann. Math. Statist. 42, 1285-1295.

Rao, C. R. (1965). Linear Statistical Inference and Its Applications. John Wiley & Sons, New York.

Reeds, J. A. (1976). On the definition of von Mises functionals. Ph.D. dissertation, Harvard University, Cambridge, MA.

Relles, D. (1968). Robust regression by modified least squares. Ph.D. dissertation, Yale University.

Sen, P. K. (1988). Functional jackknifing: rationality and general asymptotics. Ann. Statist. 16, 450-469.

Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. John Wiley & Sons, New York.

Von Mises, R. (1947). On the asymptotic distribution of differentiable statistical functions. Ann. Math. Statist. 18, 309-348.

Yohai, V. J. and Maronna, R. A. (1979). Asymptotic behavior of M-estimators for the linear model. Ann. Statist. 7, 258-268.