Clarke, Brenton R. "Nonsmooth Analysis and Fréchet Differentiability."
NONSMOOTH ANALYSIS AND FRECHET DIFFERENTIABILITY
OF
M-FUNCTIONALS
by
Brenton R. Clarke
Murdoch University
Key Words:
M-estimators, Robustness, Gross error sensitivity,
Asymptotic expansions, Asymptotic normality, Weak continuity,
Selection functional, Local uniqueness, Empirical distribution
function.
Mathematics subject classification (Amer. Math. Soc.): primary 62E20, secondary 62G35
Abbreviated Title: Relaxing Conditions for Fréchet Differentiability.
1. INTRODUCTION
In a recent paper the author showed that, contrary to popular
opinion, strict Fréchet differentiability of the class of M-functionals
is frequently possible. A necessary requirement for existence of the
Fréchet derivative is that the defining psi function is uniformly
bounded, and this naturally excludes nonrobust estimators
such as the maximum likelihood estimator in normal parametric models.
On the other hand, in that paper, smoothness assumptions were imposed
on the defining psi function which are not appropriate for many common
robust proposals in M-estimation theory, such as Huber's (1964) minimax
solution and Hampel's (1974) three part redescender used in estimating
location. A host of robust solutions for more general parametric
families are obtained through Hampel's (1968) lemma 5, and generalizations
of it (cf. Hampel 1978), and these almost invariably are functions with
"sharp corners".
Indeed the problem that is presented by failure of
psi functions to have continuous partial derivatives has been the
focus of papers by Huber (1967) and Carroll (1978) with respect to proofs
of asymptotic normality. While Fréchet differentiability of the M-functional
a priori gives asymptotic normality of the M-estimator, at least for
real valued observation spaces, it also gives a direct expansion
by which the degree of robustness can be directly measured through the
gross error sensitivity. The latter quantity is the supremum of the
absolute value of the influence curve of Hampel (1968, 1974), and Huber (1977),
assuming existence of the Fréchet derivative, shows that the maximum
asymptotic bias in contaminated neighbourhoods of a parametric distribution
is proportional to the gross error sensitivity. Consequently Fréchet
differentiability of a statistical functional is an important tool
in the robust description of an estimator, and complements the definition
of a robust functional as one that is weakly continuous (cf. Hampel 1971).
In this paper the methods of nonsmooth analysis, described in the book
by F.H. Clarke (1983), are introduced to the theory of statistical expansions,
and are used here in the proofs of weak continuity and Fréchet
differentiability of M-functionals. Subsequently the conditions for Fréchet
differentiability given in Clarke (1983) can be relaxed to include most
popular M-functionals.
The M-estimator is a solution of equations
$$\int \psi(x,\tau)\,dF_n(x) = 0 \qquad (1.1)$$
where $F_n$ is that distribution which attributes atomic mass $1/n$ to each
of $n$ independent identically distributed observations $x_1,\ldots,x_n$, having
common distribution $F \in \mathcal{G}$, the space of probability distributions
defined on some separable metrizable observation space $R$. For the
applications in this paper it is only necessary to consider $R = E$,
the real line. The parameter $\tau \in \Theta$, an open subset of Euclidean
r-space $E^r$, and $\mathcal{F} = \{F_\tau : \tau \in \Theta\}$ is a parametric family where the
usual assumption is that $F = F_\theta$ for some $\theta \in \Theta$.
The function $\psi : R \times \Theta \to E^r$
can be defined through minimization of some loss function, or obtained
by some other optimal criteria. The theory of robustness makes use of
the M-functional $T$ defined on $\mathcal{G}$, so that more generally $T[G]$ is a
solution of equations
$$\int \psi(x,\tau)\,dG(x) = 0 \qquad (1.2)$$
if a solution exists, $T[G] = \infty$ otherwise. Thus the estimator is
given by the functional $T$ evaluated at $F_n$, and its asymptotic properties
follow from continuity and differentiability of $T$ at $F$ with respect to
suitable metrics defined on $\mathcal{G}$. This approach to asymptotic theory for
statistics was first considered by Von Mises (1947).
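As an illustration of equations (1.1), the following is a minimal sketch that solves for a location M-estimate with Huber's psi function by bisection. The tuning constant `k=1.345`, the helper names and the sample are assumptions made for the example only and do not come from the paper.

```python
def huber_psi(u, k=1.345):
    """Huber's psi: the identity on [-k, k], clipped to +/-k outside."""
    return max(-k, min(k, u))

def k_n(tau, xs, k=1.345):
    """Left-hand side of (1.1): integral of psi(x - tau) with respect to F_n."""
    return sum(huber_psi(x - tau, k) for x in xs) / len(xs)

def m_estimate(xs, k=1.345, tol=1e-10):
    """Solve k_n(tau) = 0 by bisection; k_n is continuous and nonincreasing in tau."""
    lo, hi = min(xs), max(xs)          # k_n(lo) >= 0 >= k_n(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if k_n(mid, xs, k) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.1, 25.0]   # the last value is a gross error
estimate = m_estimate(sample)
sample_mean = sum(sample) / len(sample)
```

Because the psi function is bounded, the single gross error moves the estimate far less than it moves the sample mean.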
To avoid ambiguity, and also for good statistical practice, the
concept of a selection functional $\rho$ was introduced by Clarke (1983),
in order to identify in the event of several solutions of the equations (1.2),
that root which is to be the estimator. That is, the M-functional
$T[\psi,\rho,\cdot]$ is defined so that
$$\inf_{\tau \in I(\psi,G)} \rho(G,\tau) = \rho(G, T[\psi,\rho,G]),$$
where
$$I(\psi,G) = \Big\{\tau \in \Theta : \int \psi(x,\tau)\,dG(x) = 0\Big\},$$
if a solution exists. Otherwise $T[\psi,\rho,G] = \infty$.
The functional $T$
is then Fréchet differentiable at $F$ with respect to the pair $(\mathcal{G},d_*)$,
for suitable metrics $d_*$ on $\mathcal{G}$, if $T$ can be approximated by a linear
functional $T'_F$ which is defined on the linear space spanned by the
differences $G - H$ of members of $\mathcal{G}$, so that
$$|T[G] - T[F] - T'_F(G - F)| = o(d_*(G,F)) \qquad (1.3)$$
as $d_*(G,F) \to 0$, $G \in \mathcal{G}$.
Essentially the expansion for Fréchet differentiability
is dependent on a local expansion of equations (1.2), and
a robust selection functional will automatically select the Fréchet
differentiable root, whenever one exists. To the latter end one uses
an auxiliary functional $\rho(G,\tau) = |\tau - \theta|$ to prove existence of a unique
Fréchet differentiable root in a local neighbourhood of the parameter $\theta$
when considering the derivative at $F_\theta$. Also it is sufficient to consider
the expansion (1.3) for $T$ defined on $\mathcal{G}$, and the usual mathematical
extension of the domain of $T$ to the linear space of signed measures is
of little importance here.
The Fréchet derivative may be considered strong in the sense
that existence of the Fréchet derivative for statistical functionals
implies existence of the weaker Hadamard or compact derivatives of
Reeds (1976), Fernholz (1983), and the Gâteaux derivative discussed by
Kallianpur (1963), a special case of which is the influence curve
$$IC(x,F,T) = \lim_{\varepsilon \to 0} \frac{T[(1-\varepsilon)F + \varepsilon\delta_x] - T[F]}{\varepsilon};$$
here $\delta_x$ is the distribution attributing mass 1 to the point $x$.
The Gâteaux derivative is given by
$$\int IC(x,F,T)\,d(G-F)(x),$$
which coincides with the Fréchet derivative when the latter exists.
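The influence curve can be checked numerically in a simple case. The sketch below assumes the location functional defined by Huber's psi function at the standard normal distribution, for which the influence curve has the closed form $\psi(x)/(2\Phi(k)-1)$; it compares that form with the difference quotient at a small contamination fraction. The constant `k`, the contamination fraction and all function names are choices made for the example.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal, playing the role of F

def psi(u, k):
    """Huber's psi function: u clipped to [-k, k]."""
    return max(-k, min(k, u))

def lam(tau, k):
    """Closed form of the integral of psi(y - tau) with respect to the standard normal."""
    mu = -tau                                   # y - tau is N(mu, 1)
    upper, lower = k - mu, -k - mu
    middle = PHI.pdf(lower) - PHI.pdf(upper) + mu * (PHI.cdf(upper) - PHI.cdf(lower))
    return middle + k * (1.0 - PHI.cdf(upper)) - k * PHI.cdf(lower)

def functional_at_contamination(x, eps, k):
    """Root in tau of (1 - eps) * lam(tau) + eps * psi(x - tau) = 0, by bisection."""
    def h(tau):
        return (1.0 - eps) * lam(tau, k) + eps * psi(x - tau, k)
    lo, hi = -10.0, 10.0                        # h is decreasing, h(lo) > 0 > h(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def influence_curve(x, k):
    """psi(x) / (2 * Phi(k) - 1): influence curve of the Huber location functional at Phi."""
    return psi(x, k) / (2.0 * PHI.cdf(k) - 1.0)

k, eps = 1.345, 1e-6
finite_difference = functional_at_contamination(0.5, eps, k) / eps
```

The supremum of the absolute value of `influence_curve` over `x` is `k / (2 * PHI.cdf(k) - 1)`, which is the gross error sensitivity for this example.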
Unfortunately comments by Kallianpur (1963) which were in specific
relation to the maximum likelihood estimator (mle) led other researchers
to believe the derivative too strong to obtain. Indeed Huber (1981)
states "Unfortunately the concept of Fréchet differentiability appears
too strong: in too many cases, the Fréchet derivative does not exist,
and even if it does, the fact is difficult to establish."
In Clarke (1983) simple conditions for Fréchet differentiability of
M-functionals were given together with a counterexample to the comments
of Kallianpur.
Boos and Serfling (1980) introduce the related notion of a
quasi-differential which assumes the same expansion (1.3), but
restricts $G = F_n$ and allows for small order errors in probability
with respect to the Kolmogorov distance between $F_n$ and $F$. This
expansion does not offer the same properties of robust description
of the estimating functional, and even the mean functional satisfies
this stochastic form of differentiability. Beran (1977) also adopts
a differential approach using the Hellinger metric, though this appears
to be for more specific application.
A weaker set of conditions than conditions A of Clarke (1983)
are introduced in section 2, though for smooth psi functions conditions
A of that paper are easier to apply. Theorem 2.1 of this paper is
necessary to show condition $A'_4$, introduced here, holds for the popular
nonsmooth psi functions. It can be considered as a variation
or a generalization of the Glivenko Cantelli result. Conditions $A'$
are used in sections 3 and 4 in the theorems that give existence of a
unique continuous and Fréchet differentiable root of equations (1.2).
In particular the arguments for weak continuity follow when either of
Lévy or Prokhorov metrics are used. Important examples of application
are given in section 5, together with the conclusion.
2. A DISCUSSION OF DEFINITIONS AND CONDITIONS
Suppose $f$ maps $E^r$ to itself and $\theta$ is a point near which $f$
is Lipschitz. Denote $\Omega_f$ to be the set of points at which $f$ fails
to be differentiable, which by Rademacher's theorem is known to be a
set of Lebesgue measure zero. Let $Jf(\tau)$ be the usual $r \times r$ Jacobian matrix
of partial derivatives whenever $\tau \notin \Omega_f$.
Definition 2.1: The generalized Jacobian of $f$ at $\theta$, denoted by
$\partial f(\theta)$, is the convex hull of all $r \times r$ matrices $Z$ obtained as
the limit of a sequence of the form $Jf(\tau_i)$, where $\tau_i \to \theta$ and
$\tau_i \notin \Omega_f$.
The generalized Jacobian $\partial f(\theta)$ is said to be of maximal rank
provided every matrix in $\partial f(\theta)$ is of maximal rank (i.e. nonsingular).
The following proposition is proved on page 71 of F.H. Clarke (1983).
Proposition 2.1: The generalized Jacobian $\partial f(\cdot)$ is upper semicontinuous,
which means, given $\varepsilon > 0$ there exists a $\delta > 0$ such that for
$\tau \in U_\delta(\theta)$, the open ball of radius $\delta$ centered at $\theta$,
$$\partial f(\tau) \subset \partial f(\theta) + \varepsilon B_{r \times r}.$$
Here $B_{r \times r}$ is the unit ball of $r \times r$ matrices for which
$B \in B_{r \times r}$ implies $\|B\| \le 1$.
Remark 2.1: Without loss of generality we can assume $\|B\|$ to be the
least upper bound of $|By|$ where $|y| \le 1$.
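For $r = 1$ the generalized Jacobian of Definition 2.1 is an interval, and it can be probed numerically. The sketch below is only a heuristic: it samples ordinary derivatives at a few points of differentiability on either side of $\theta$ and reports the interval they span, taking Huber's clipping function as $f$. The function names, offsets and step size are assumptions of the example, not part of the definition.

```python
def clip(u, k=1.0):
    """Huber-type clipping function: Lipschitz, with corners at -k and k."""
    return max(-k, min(k, u))

def derivative(f, t, h=1e-7):
    """Central difference; meaningful only where f is differentiable near t."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

def generalized_derivative(f, theta, offsets=(1e-3, 1e-4, 1e-5)):
    """Interval spanned by derivatives at nearby points on both sides of theta.

    In one dimension the convex hull of the limiting derivatives is an interval,
    returned here as (lowest slope, highest slope).
    """
    slopes = [derivative(f, theta + side * d) for d in offsets for side in (-1.0, 1.0)]
    return min(slopes), max(slopes)

def is_maximal_rank(interval, tol=1e-6):
    """For r = 1, maximal rank means the interval does not contain zero."""
    lo, hi = interval
    return lo > tol or hi < -tol

at_corner = generalized_derivative(clip, 1.0)   # approximately (0, 1)
at_centre = generalized_derivative(clip, 0.0)   # approximately (1, 1)
```

At the corner the interval contains zero, so the clipping function alone is not of maximal rank there; at the centre it reduces to the single ordinary derivative.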
Frequently several solutions of equations (1.1), (1.2) can exist
whereupon a robust selection of the functional root is obtained using the
idea of a selection functional $\rho$ introduced in Clarke (1983).
The M-functional is then defined as $T[\psi,\rho,\cdot]$. The
robust selection functional retains the continuity properties of the selected
root in small enough neighbourhoods $n(\varepsilon,F) \subset \mathcal{G}$ of a distribution $F$,
which can be considered here to be defined by metrics $d_*$.
Typical choices for $d_*$ include $d_k$, $d_L$ and $d_p$,
the Kolmogorov, Lévy and Prokhorov metrics respectively.
Conditions A':
$A'_0$: $T[\psi,\rho,F_\theta] = \theta$.
$A'_1$: $\psi(x,\tau)$ is an $r \times 1$ vector function on $R \times \Theta$ which
is continuous and bounded on $R \times D$, where $D \subset \Theta$ is some
nondegenerate compact interval containing $\theta$ in its interior,
and $R$ is some separable metrizable space.
$A'_2$: $\psi(x,\tau)$ is locally Lipschitz in $\tau$ about $\theta$ in the sense
that for some constant $\alpha$,
$$|\psi(x,\tau) - \psi(x,\theta)| < \alpha\,|\tau - \theta|$$
uniformly in $x \in R$ and for all $\tau$ in a neighbourhood of $\theta$.
$A'_3$: Letting differentiation be with respect to the argument in
parentheses, $\partial K_{F_\theta}(\tau)$ is of maximal rank at $\tau = \theta$,
where $K_G(\tau) = \int \psi(x,\tau)\,dG(x)$.
$A'_4$: Given $\delta > 0$ there exists an $\varepsilon > 0$ such that for all
$G \in n(\varepsilon,F_\theta)$
$$\sup_{\tau \in D} |K_G(\tau) - K_{F_\theta}(\tau)| < \delta$$
and
$$\partial K_G(\tau) \subset \partial K_{F_\theta}(\tau) + \delta B_{r \times r}$$
uniformly in $\tau \in D$.
Remark 2.2: $A'_0 \equiv A_0$.
Remark 2.3: For a function $\psi$ satisfying $A'_1$ it follows from remark
2.2 and theorem 6.1 in Clarke (1983) that given $\delta > 0$
there exists an $\varepsilon > 0$ such that for all $G \in n(\varepsilon,F_\theta)$
$$\sup_{\tau \in D} |K_G(\tau) - K_{F_\theta}(\tau)| < \delta$$
whenever $n(\varepsilon,F_\theta)$ is generated by metrics $d_k$, $d_L$, $d_p$.
This establishes the first part of condition $A'_4$.
Remark 2.4: If $K_{F_\theta}(\tau)$ is continuously differentiable in $\tau$ at $\theta$
then $A_3 \Leftrightarrow A'_3$, where condition $A_3$ is that of Clarke (1983).
Conditions $A'_0 - A'_3$ can be considered fairly straightforward, whereas
the condition $A'_4$ is not so obvious. When $R = E$, the real line, it can
be shown to be a consequence of the following theorem, a proof of which is
detailed in the appendix. It is sufficient here to establish the result
for the Kolmogorov distance $d_k$.
Theorem 2.1: Let $A$ be a class of continuous functions defined on $E$
with the following properties: (1) $A$ is uniformly bounded, that is,
there exists a constant $H$ such that $|f(x)| \le H < \infty$ for all
$f \in A$ and $x \in E$; and (2) $A$ is equicontinuous. Let $F_\theta \in \mathcal{G}$
be given. Then, for every $\delta > 0$ there is an $\varepsilon > 0$ such that
$d_k(F_\theta,G) \le \varepsilon$ implies
$$\sup_{f \in A}\ \sup_{x \in E \cup \{+\infty\}} \Big| \int_{I_x} f\,dG - \int_{I_x} f\,dF_\theta \Big| < \delta, \qquad (2.1)$$
where integration is performed over the intervals $I_x$ which can be
either open or closed of the form $(-\infty,x)$ or $(-\infty,x]$.
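The quantity bounded in (2.1) can be computed exactly when both distributions are discrete. The sketch below does so for a small family of clipped shifts, which is uniformly bounded and equicontinuous. The two distributions, the family, and the comparison constant are assumptions of the example: the comparison with three times the Kolmogorov distance uses summation by parts for functions of bounded variation (supremum 1, total variation 2), which is a separate and cruder argument than the theorem's.

```python
def clip(u, k=1.0):
    """Bounded, Lipschitz function with sup |clip| = k and total variation 2k."""
    return max(-k, min(k, u))

def kolmogorov_distance(mass_f, mass_g):
    """sup_x |F(x) - G(x)| for discrete distributions given as {point: mass} dicts."""
    cf = cg = gap = 0.0
    for y in sorted(set(mass_f) | set(mass_g)):
        cf += mass_f.get(y, 0.0)
        cg += mass_g.get(y, 0.0)
        gap = max(gap, abs(cf - cg))
    return gap

def sup_partial_integrals(functions, mass_f, mass_g):
    """sup over f and x of |int_{I_x} f dG - int_{I_x} f dF|, I_x open or closed.

    For discrete measures the supremum is attained at support points, and the
    open-interval values coincide with the running sums before each atom.
    """
    support = sorted(set(mass_f) | set(mass_g))
    worst = 0.0
    for f in functions:
        running = 0.0
        for y in support:
            running += f(y) * (mass_g.get(y, 0.0) - mass_f.get(y, 0.0))
            worst = max(worst, abs(running))
    return worst

F = {-2.0: 0.2, -1.0: 0.2, 0.0: 0.2, 1.0: 0.2, 2.0: 0.2}
G = {-2.0: 0.2, -1.0: 0.15, 0.0: 0.2, 1.0: 0.2, 2.0: 0.2, 6.0: 0.05}
family = [lambda y, t=t: clip(y - t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
distance = kolmogorov_distance(F, G)
worst = sup_partial_integrals(family, F, G)
```

Moving a small mass to a distant point changes the Kolmogorov distance only slightly, and the partial integrals of every member of the family move by a comparable amount.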
Remark 2.5: A similar proof yields the same result with $d_k$ replaced
by $d_L$. In some instances Fréchet differentiability with respect to
$d_k$ implies that with respect to $d_L$, $d_p$
following (6.2) of Clarke (1983).
Consider $\psi$ with continuous partial derivatives except on a finite
set of points $S(\tau)$.
From F.H. Clarke (1983 pp. 75-83) it follows that
$$\partial K_G(\tau) = \partial \int \psi(y,\tau)\,dG(y) \subset \int \partial\psi(y,\tau)\,dG(y). \qquad (2.2)$$
Since $\frac{\partial\psi}{\partial\tau}(y,\tau) = f_j(y,\tau)$ on the connected interval $I_j$,
the right hand side can be expanded to a finite summation
$$\sum_{j=1}^{m} \int_{I_j} f_j(y,\tau)\,dG(y) + \sum_{x \in S(\tau)} \partial\psi(x,\tau)\,G\{x\}.$$
Here $f_j \in A$ and $j = 1,\ldots,m$. For $\psi$ Lipschitz in $\tau$ and
$\partial\psi(x,\tau)$ bounded, theorem 2.1 implies condition $A'_4$ for $d_k$.
3. UNIQUENESS OF FUNCTIONAL SOLUTIONS TO EQUATIONS
For those psi functions which do not admit a unique root of
the equations, at least a unique root of the equations in a local region
of the parameter space about $\theta$ can be shown to exist for small enough
neighbourhoods of $F_\theta$. If conditions $A'$ are with respect to Lévy
or Prokhorov neighbourhoods, existence of a weakly continuous root is shown,
for which the global argument of Clarke (1983) can be used to select it if
more than one root exists. When the Kolmogorov distance is used only
consistency is directly established.
The following propositions are established on pp. 252-255 of F.H. Clarke
(1983), and obviate the condition of continuous derivatives in the argument
for the inverse function theorem.
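Proposition 3.1: Suppose $f$ satisfies properties described in
Section 2 and
$$4\lambda_f \le \inf_{\partial f(\theta)} \|M(\theta,f)\|,$$
where the infimum is taken over all matrices $M(\theta,f) \in \partial f(\theta)$, and for
some $\delta > 0$, $\tau \in U_\delta(\theta)$ implies
$$2\lambda_f \le \inf_{\partial f(\tau)} \|M(\tau,f)\|.$$
Then for arbitrary $\tau_1, \tau_2 \in \bar{U}_\delta(\theta)$, the closure of the ball $U_\delta(\theta)$,
$$|f(\tau_1) - f(\tau_2)| \ge 2\lambda_f\,|\tau_1 - \tau_2|.$$
Proposition 3.2: Under the conditions of Proposition 3.1
$f(U_\delta(\theta))$ contains $U_{\lambda_f \delta}(f(\theta))$.
Remark 3.1: For $v \in U_{\lambda_f \delta}(f(\theta))$ we can define $f^{-1}(v)$ to be the
unique $\tau \in U_\delta(\theta)$ such that $f(\tau) = v$ and Proposition 3.1 implies
$f^{-1}$ is Lipschitz with Lipschitz constant $1/(2\lambda_f)$.
Lemma 3.1: Let conditions $A'$ hold for some $\psi$, $\rho$. Then there is a
$\delta_1 > 0$ and an $\varepsilon_1 > 0$ such that for all $G \in n(\varepsilon_1,F_\theta)$,
$\tau \in U_{\delta_1}(\theta)$ implies every $M(\tau,G) \in \partial K_G(\tau)$ will satisfy
$$\|M(\tau,G)\| > 2\lambda,$$
where $\lambda$ is defined to be a value for which $\|M(\theta,F_\theta)\| > 4\lambda$,
$M(\theta,F_\theta) \in \partial K_{F_\theta}(\theta)$.
Remark 3.2: If $K_{F_\theta}(\tau)$ is continuously differentiable in $\tau$
then the choice of $\lambda = 1/(4\|M(\theta,F_\theta)^{-1}\|)$
satisfies the criterion of Lemma 3.1.
Proof of Lemma 3.1: Since $\partial K_{F_\theta}$ is upper semicontinuous, choose
$\delta_1 > 0$ such that $\tau \in U_{\delta_1}(\theta)$ implies
$$\partial K_{F_\theta}(\tau) \subset \partial K_{F_\theta}(\theta) + \lambda B_{r \times r}$$
by Proposition 2.1. By condition $A'_4$ there exists an $\varepsilon_1 > 0$ such that
$$\partial K_G(\tau) \subset \partial K_{F_\theta}(\tau) + \lambda B_{r \times r}$$
uniformly in $\tau \in D$ whenever $G \in n(\varepsilon_1,F_\theta)$.
Hence given $M(\tau,G) \in \partial K_G(\tau)$ for $\tau \in U_{\delta_1}(\theta)$
there exists $M(\theta,F_\theta) \in \partial K_{F_\theta}(\theta)$ such that
$$\|M(\tau,G) - M(\theta,F_\theta)\| < 2\lambda,$$
whence by Proposition 3.1 $\|M(\tau,G)\| > 2\lambda$.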
It is now possible to state and prove the uniqueness argument of Theorem
3.1 of Clarke (1983) using weakened conditions $A'$. The result also
implies existence of a weakly continuous root for either Lévy or
Prokhorov neighbourhoods. As usual the following selection functional is
only used as an auxiliary device.
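Theorem 3.1: Let $\rho(G,\tau) = |\tau - \theta|$ and suppose conditions $A'$
hold. Then given $\kappa > 0$ there exists an $\varepsilon > 0$ such that
$G \in n(\varepsilon,F_\theta)$ implies $T[\psi,\rho,G]$ exists and is an element of $U_\kappa(\theta)$.
Further for this $\varepsilon$ there is a $\kappa^* > 0$ such that $\partial K_G(\tau)$
is of maximal rank for $\tau \in U_{\kappa^*}(\theta)$ and
$$I(\psi,G) \cap U_{\kappa^*}(\theta) = T[\psi,\rho,G].$$
For any null sequence $\{\varepsilon_k\}$ of positive numbers let $\{G_k\}$
be an arbitrary sequence for which $G_k \in n(\varepsilon_k,F_\theta)$. Then
$$\lim_{k \to \infty} T[\psi,\rho,G_k] = T[\psi,\rho,F_\theta] = \theta.$$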
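Proof of Theorem 3.1: Since $\partial K_{F_\theta}(\cdot)$ is upper semicontinuous in
$\tau$ about $\theta$ there exists a $\kappa^* > 0$ such that
$$\inf_{\partial K_G(\tau)} \|M(\tau,G)\| > 2\lambda$$
for all $\tau \in U_{\kappa^*}(\theta)$, $G \in n(\varepsilon_1,F_\theta)$,
where the infimum is taken over all matrices $M(\tau,G) \in \partial K_G(\tau)$, and
$\delta_1$, $\varepsilon_1$ and $\lambda$ are defined in Lemma 3.1.
Choose $0 < \varepsilon^* \le \varepsilon_1$ so that the following relations hold
$$\partial K_G(\tau) \subset \partial K_{F_\theta}(\tau) + (\lambda/4) B_{r \times r} \quad \text{by } A'_4$$
$$\subset \partial K_{F_\theta}(\theta) + (\lambda/2) B_{r \times r} \quad \text{by Proposition 2.1}$$
$$\subset \partial K_G(\theta) + \lambda B_{r \times r} \quad \text{by } A'_4$$
whenever $G \in n(\varepsilon^*,F_\theta)$ and uniformly in $\tau \in U_{\kappa^*}(\theta)$.
Hence $M(\cdot,G) \in \partial K_G(\cdot)$ implies
$$\|M(\cdot,G) - M(\theta,G)\| < \lambda < 2\lambda(G)$$
for some $M(\theta,G) \in \partial K_G(\theta)$. Here
$$4\lambda(G) = \inf_{\partial K_G(\theta)} \|M(\theta,G)\|.$$
Then for every $G \in n(\varepsilon^*,F_\theta)$, by Proposition 3.1
$K_G(\cdot)$ is a one-to-one function from $U_{\kappa^*}(\theta)$ onto
$K_G(U_{\kappa^*}(\theta))$, and by Proposition 3.2 the
image set contains the open ball of radius $\lambda\kappa^*/2$ about $K_G(\theta)$.
The argument for uniqueness now proceeds as in Clarke (1983).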
4. FRECHET DIFFERENTIABILITY
It will be assumed in this section that $K_{F_\theta}(\tau)$ has at least
a continuous derivative $\partial K_{F_\theta}(\tau)/\partial\tau$ at $\tau = \theta$, which is denoted $M(\theta)$.
This is common with absolutely continuous parametric families. With this
restriction Fréchet differentiability follows.
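Theorem 4.1: Let $\rho(G,\tau) = |\tau - \theta|$ and assume conditions $A'$
hold with respect to this functional and neighbourhoods generated by
the metrics $d_*$ on $\mathcal{G}$. Suppose for all $G \in \mathcal{G}$
$$\Big| \int_R \psi(x,\theta)\,d(G - F_\theta)(x) \Big| = O(d_*(G,F_\theta)). \qquad (4.1)$$
Then $T[\psi,\rho,\cdot]$ is Fréchet differentiable at $F_\theta$ with respect to
$(\mathcal{G},d_*)$ and has derivative
$$T'_{F_\theta}(G - F_\theta) = -M(\theta)^{-1} \int_R \psi(x,\theta)\,d(G - F_\theta)(x).$$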
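To prove the theorem it is necessary to introduce the following
generalization of the mean value result described as Proposition 2.6.5
in F.H. Clarke (1983).
Proposition 4.1: Let $f$ be Lipschitz on an open convex set $U$ in
$E^r$, and let $\tau_1$ and $\tau_2$ be points in $U$. Then one has
$$f(\tau_2) - f(\tau_1) \in \mathrm{co}\,\partial f([\tau_1,\tau_2])\,(\tau_2 - \tau_1).$$
(The right hand side above denotes the convex hull of all points of the
form $Z(\tau_2 - \tau_1)$ where $Z \in \partial f(u)$ for some point $u$ in $[\tau_1,\tau_2]$.
Since $[\mathrm{co}\,\partial f([\tau_1,\tau_2])](\tau_2 - \tau_1) = \mathrm{co}[\partial f([\tau_1,\tau_2])(\tau_2 - \tau_1)]$, there is no
ambiguity.)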
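Proof of Theorem 4.1: Abbreviate $T[\psi,\rho,\cdot] = T[\cdot]$ and
let $\kappa^*$, $\varepsilon^*$ be given by Theorem 3.1. Let
$\{\varepsilon_k\}$ be so that $\varepsilon_k \downarrow 0+$ as $k \to \infty$
and let $\{G_k\}$ be any sequence such that $G_k \in n(\varepsilon_k,F_\theta)$. By
Theorem 3.1, $T[G_k]$ exists and is unique in $U_{\kappa^*}(\theta)$ for $k > k_0$ where
$\varepsilon_{k_0} \le \varepsilon^*$. By $A'_4$ see that for arbitrary $\delta > 0$
$$\partial K_{G_k}(\tau) \subset \partial K_{F_\theta}(\tau) + \delta B_{r \times r} \quad \text{uniformly in } \tau \in D \qquad (4.2)$$
for sufficiently large $k$. Consider the two term expansion,
$$0 = K_{G_k}(T[G_k]) = K_{G_k}(\theta) + M(\xi_k,G_k)(T[G_k] - \theta), \qquad (4.3)$$
where $|\xi_k - \theta| < |T[G_k] - \theta|$, which tends to zero as $k \to \infty$ by
theorem 3.1, and $\xi_k$ is evaluated at different points for each
component function expansion obtained as a consequence of Proposition 4.1
(i.e. $\xi_k$ takes different values in each row of matrix $M$). See from
(4.3), (4.1) and Lemma 3.1 that
$$|T[G_k] - \theta| = O(d_*(G_k,F_\theta)).$$
Also,
$$T[G_k] - \theta = -M(\theta)^{-1} K_{G_k}(\theta) - M(\theta)^{-1}\{M(\xi_k,G_k) - M(\theta)\}(T[G_k] - \theta).$$
By upper semicontinuity of $\partial K_{F_\theta}(\tau)$ in $\tau$ and (4.2)
$$\|M(\xi_k,G_k) - M(\theta)\| = o(1).$$
So
$$|T[G_k] - T[F_\theta] - T'_{F_\theta}(G_k - F_\theta)| = o(d_*(G_k,F_\theta)).$$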
5. EXAMPLES AND CONCLUSION
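Huber (1964, 1981) introduced a proposal for estimation of location
and scale of the normal distribution defined as a solution of
$$\int \psi\Big(\frac{x - \tau_1}{\tau_2}\Big)\,dF_n(x) = 0$$
where $\Theta = \{(\tau_1,\tau_2) : -\infty < \tau_1 < \infty,\ \tau_2 > 0\}$
and the vector function $\psi = (\psi_1,\psi_2)'$ where
$$\psi_1(x) = \max[-k, \min(k,x)]$$
and
$$\psi_2(x) = \psi_1^2(x) - \beta(k), \qquad \beta(k) = \int \min(k^2,x^2)\,d\Phi(x).$$
Here $\Phi$ denotes the normal distribution. The underscore
now distinguishes the vector parameter, setting $\underline{\theta} = (\theta_1,\theta_2)$, where
it follows that since $K_\Phi(\underline{\tau})$ is continuously differentiable, the Jacobian is
$$M(\underline{\theta}) = -\frac{1}{\theta_2} \begin{pmatrix} \int \psi_1'(y)\,d\Phi(y) & 0 \\ 0 & \int y\,\psi_2'(y)\,d\Phi(y) \end{pmatrix}. \qquad (5.1)$$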
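The constant $\beta(k)$ and the two diagonal integrals in (5.1) have elementary closed forms at the standard normal distribution, which the following sketch evaluates and cross-checks with a plain midpoint rule. The closed forms are derived here by integration by parts and are not quoted from the paper; the value `k = 1.5`, the function names and the quadrature settings are assumptions of the example.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def beta(k):
    """Integral of min(k^2, x^2) with respect to the standard normal."""
    inner = 2.0 * PHI.cdf(k) - 1.0 - 2.0 * k * PHI.pdf(k)   # integral of x^2 over [-k, k]
    return inner + 2.0 * k * k * (1.0 - PHI.cdf(k))          # plus k^2 times the tail mass

def jacobian_diagonal(k, theta2=1.0):
    """Diagonal entries of M(theta) in (5.1) at the normal model with scale theta2."""
    m11 = 2.0 * PHI.cdf(k) - 1.0                                 # integral of psi_1'(y)
    m22 = 2.0 * (2.0 * PHI.cdf(k) - 1.0 - 2.0 * k * PHI.pdf(k))  # integral of y * psi_2'(y)
    return (-m11 / theta2, -m22 / theta2)

def midpoint_integral(g, lo=-10.0, hi=10.0, steps=40000):
    """Midpoint rule for the integral of g(y) * phi(y); used only as a cross-check."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += g(y) * PHI.pdf(y)
    return total * h

k = 1.5
numeric_beta = midpoint_integral(lambda y: min(k * k, y * y))
numeric_m22 = midpoint_integral(lambda y: 2.0 * y * y if abs(y) < k else 0.0)
diagonal = jacobian_diagonal(k)
```

Both diagonal entries are strictly negative for any finite positive `k`, so the matrix in (5.1) is nonsingular.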
Condition $A'_0$ follows since $E_\Phi[\psi] = 0$, $A'_1$, $A'_2$ hold by
inspection, and $A'_3$ holds since $M(\underline{\theta})$ is nonsingular. Remark 2.2
suffices for the first part of $A'_4$.
To apply theorem 2.1 consider the
function
f(x,;E) ::; I (x)
(Tl-kT 2 ,Tl+k'r Z )
+ -
1
-1
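It is clear that $A = \{f(\cdot,\underline{\tau}) : \underline{\tau} \in D\}$ forms a bounded equicontinuous
class of functions on $E$. Also
$$\partial K_G(\underline{\tau}) \subset \int_{(\tau_1 - k\tau_2,\,\tau_1 + k\tau_2)} f(x,\underline{\tau})\,dG(x) + \partial\psi(\tau_1 - k\tau_2,\underline{\tau})\,G\{\tau_1 - k\tau_2\} + \partial\psi(\tau_1 + k\tau_2,\underline{\tau})\,G\{\tau_1 + k\tau_2\},$$
where differentiation of $\psi$ is with respect to the second argument, while
$F_{\underline{\theta}}(x) = \Phi\big(\frac{x - \theta_1}{\theta_2}\big)$.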
$A'_4$ follows by Theorem 2.1 and because $\partial\psi(\tau_1 + k\tau_2,\underline{\tau})$ and
$\partial\psi(\tau_1 - k\tau_2,\underline{\tau})$ are bounded. Assumption (4.1) holds for the
Kolmogorov distance through integration by parts and noting that $\psi$
is a function of total bounded variation. Thus by Theorems 3.1
and 4.1 there exists a root that is Fréchet differentiable at
$F_{\underline{\theta}} = \Phi\big(\frac{x - \theta_1}{\theta_2}\big)$ with respect to $d_k$. Since $\Phi$
has a bounded density $T$ is Fréchet differentiable with respect to
$d_L$, $d_p$ also. Consequently, the infinitesimal robustness of this M-estimator
at the normal parametric distribution is evident through Fréchet
differentiability. It is also Fréchet differentiable at the
distribution $F_0\big(\frac{x - \theta_1}{\theta_2}\big)$ for which the density function of $F_0$ is
$$f_0(x) = \frac{1-\varepsilon}{\sqrt{2\pi}}\,e^{-x^2/2} \quad \text{for } |x| \le k,$$
$$f_0(x) = \frac{1-\varepsilon}{\sqrt{2\pi}}\,e^{k^2/2 - k|x|} \quad \text{for } |x| > k,$$
with $k$ and $\varepsilon$ connected through
$$\frac{2\phi(k)}{k} - 2\Phi(-k) = \frac{\varepsilon}{1-\varepsilon}$$
($\phi = \Phi'$ being the standard normal density). Then the M-estimator
coincides with the mle, and provides another example of a robust
and asymptotically efficient estimator.
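The relation connecting $k$ and $\varepsilon$ can be verified numerically. The sketch below computes $\varepsilon$ from $k$, checks that the resulting density integrates to one, and checks that minus the derivative of its logarithm equals Huber's clipped identity, which is why the location M-estimator is the mle under this density. The value `k = 1.5`, the function names and the quadrature settings are assumptions of the example.

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def epsilon_for(k):
    """Solve 2*phi(k)/k - 2*Phi(-k) = eps / (1 - eps) for eps."""
    ratio = 2.0 * PHI.pdf(k) / k - 2.0 * PHI.cdf(-k)
    return ratio / (1.0 + ratio)

def f0(x, k):
    """Density with a normal centre and exponential tails beyond |x| = k."""
    scale = (1.0 - epsilon_for(k)) / sqrt(2.0 * pi)
    if abs(x) <= k:
        return scale * exp(-0.5 * x * x)
    return scale * exp(0.5 * k * k - k * abs(x))

def score(x, k, h=1e-5):
    """Minus the derivative of log f0 at x, by central differences."""
    return -(log(f0(x + h, k)) - log(f0(x - h, k))) / (2.0 * h)

def total_mass(k, lo=-40.0, hi=40.0, steps=80000):
    """Midpoint-rule integral of f0 over a range wide enough for the tails."""
    h = (hi - lo) / steps
    return sum(f0(lo + (i + 0.5) * h, k) for i in range(steps)) * h

k = 1.5
eps = epsilon_for(k)
mass = total_mass(k)
```

Inside `[-k, k]` the score is the identity and outside it is constant at `+/-k`, matching the first component of the psi function used above.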
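Examples where multiple roots of the equations exist include
Hampel's 3-part redescender M-estimator for location dependent on
three parameters a, b, c;
$$\psi_{a,b,c}(x) = x \quad \text{for } 0 \le |x| \le a,$$
$$\psi_{a,b,c}(x) = a\,\mathrm{sign}(x) \quad \text{for } a \le |x| \le b,$$
$$\psi_{a,b,c}(x) = a\,\frac{c - |x|}{c - b}\,\mathrm{sign}(x) \quad \text{for } b \le |x| \le c,$$
$$\psi_{a,b,c}(x) = 0 \quad \text{for } |x| \ge c.$$
With the choice of selection functional $\rho(G,\tau) = |\tau - G^{-1}(\tfrac{1}{2})|$,
whereby the root closest to the median is selected, the functional
$T[\psi_{a,b,c},\rho,\cdot]$ is Fréchet differentiable at $\Phi(x - \theta_1)$.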
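A redescending psi function typically gives several roots on a sample with a distant cluster, and the selection functional then picks one of them. The sketch below locates roots by strict sign changes on a grid, refines them by bisection, and selects the root closest to the sample median. The constants `a, b, c`, the grid search, the sample and all helper names are assumptions of the example; roots that fall exactly on a grid point, and flat regions where the estimating function vanishes identically, are deliberately ignored by this simple search.

```python
from statistics import median_low

def hampel_psi(u, a=1.7, b=3.4, c=8.5):
    """Hampel's three part redescender."""
    au = abs(u)
    sign = 1.0 if u >= 0 else -1.0
    if au <= a:
        return u
    if au <= b:
        return a * sign
    if au <= c:
        return a * (c - au) / (c - b) * sign
    return 0.0

def k_n(tau, xs):
    """Integral of psi(x - tau) with respect to the empirical distribution."""
    return sum(hampel_psi(x - tau) for x in xs) / len(xs)

def roots_by_sign_change(xs, grid=2000, iters=60):
    """Roots of k_n between min(xs) and max(xs) detected by strict sign changes."""
    lo, hi = min(xs), max(xs)
    step = (hi - lo) / grid
    roots = []
    for i in range(grid):
        t0, t1 = lo + i * step, lo + (i + 1) * step
        v0, v1 = k_n(t0, xs), k_n(t1, xs)
        if v0 * v1 < 0:
            for _ in range(iters):              # bisection inside the bracketing cell
                mid = 0.5 * (t0 + t1)
                if (k_n(mid, xs) > 0) == (v0 > 0):
                    t0 = mid
                else:
                    t1 = mid
            roots.append(0.5 * (t0 + t1))
    return roots

def select_root(xs):
    """Selection functional: the root closest to the lower sample median G^{-1}(1/2)."""
    target = median_low(xs)
    return min(roots_by_sign_change(xs), key=lambda t: abs(t - target))

sample = [-0.8, -0.3, 0.0, 0.2, 0.5, 0.9, 10.0, 10.4]   # main cluster plus two outliers
roots = roots_by_sign_change(sample)
selected = select_root(sample)
```

On this sample one root sits in the main cluster, one in the outlying pair, and at least one in between; the selection by the median returns the root in the main cluster.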
In a sense weak continuity and Fréchet differentiability of the
functional at the empirical distribution function are also important.
Weak continuity at $F_n$ indicates stability of the estimate in the
presence of rounding errors in the recording of observations,
and at least for sufficiently large $n$ the effects of gross errors
can be considered blunted. Fréchet differentiability at $F_n$ on the
other hand could be used to justify asymptotics involved in Edgeworth
type expansions and bootstrapping, for example as considered in
Hampel (1982), Beran (1982). When the psi function is smooth, the
only change to the arguments of Clarke (1983) for Fréchet differentiability
at $F_n$ is to replace $F_\theta$ by $F_n$ in conditions $A_0 - A_4$.
Similarly the same substitution of conditions can be made in
the results of this paper, however if it should occur that an observation
$X$ falls exactly at the point where $\psi(X,\cdot)$ does not have a continuous
partial derivative at $\tau = T[F_n]$ then the generalized gradient
$\partial K_{F_n}(T[F_n])$ does not reduce to a single matrix. Even though such
an event would occur with probability zero in most foreseeable examples
in which the underlying distribution was absolutely continuous, it can
be said nevertheless that the proof used in theorem 4.1 does not follow
through. In this instance the question of whether $T$ is Fréchet
differentiable at $F_n$ is then left open.
At least in the domain of M-functionals defined through (1.2),
it can be concluded that Huber's (1981)
remarks should not be interpreted in the sense that Fréchet differentiability
is too strong. This is only the case for nonrobust M-functionals,
and consequently we should consider Fréchet differentiability an advantage.
The problems induced by nonsmooth psi functions are not
unique to proofs of Fréchet differentiability, and are applicable
to many asymptotic proofs. More frequently it is the case that,
rather than consider the difficulties, the appropriate smoothness
assumptions are made in the proofs, but somehow the results are
expected to be applicable to those continuous but nonsmooth functions also.
Nonsmooth analysis can then be considered as one possible avenue
of justifying such an approach.
6. ACKNOWLEDGEMENT
The author wishes to express gratitude to Professor F.R. Hampel
and Professor R.J. Carroll for encouragement and the opportunity to
pursue this research. This research has received partial support
from a US Airforce Grant No. F 49620 82 C 009, while the author was
visiting the University of North Carolina at Chapel Hill.
6. APPENDIX
The proof of theorem 2.1 is preceded by some necessary lemmas.
The notational abbreviation $G(x-) = \lim_{h \downarrow 0} G(x - h)$ is used.
Lemma 6.1: Let $G(x)$ be any distribution function for which
$G(x) - G(x-) < \eta/4$ for $x \in (a,b)$, where $a < b$ real, and
$\eta > 0$ are given. If $G(b-) - G(a) > \eta$,
then there exists a finite partition
$a = x_0 < x_1 < \cdots < x_{k'} = b$, so that
$$G(x_j-) - G(x_{j-1}) < \eta, \qquad j = 1,\ldots,k'.$$
Proof: Define $G^{-1}(t) = \inf\{x \mid G(x) \ge t,\ x \in [a,b]\}$.
Since $G$ is right continuous $G(G^{-1}(t)) \ge t$. Choose
$$y_j = G^{-1}\Big\{G(a) + \frac{j}{k}\big(G(b-) - G(a)\big)\Big\}, \qquad j = 1,\ldots,k,$$
where $k \ge 1$ is chosen so that
$$\frac{G(b-) - G(a)}{\eta} < k \le \frac{2(G(b-) - G(a))}{\eta}.$$
Then
$$G(y_j) - G(y_{j-1}) \ge G(a) + \frac{j}{k}\big(G(b-) - G(a)\big) - G(y_{j-1})$$
$$\ge G(a) + \frac{j}{k}\big(G(b-) - G(a)\big) - \Big\{G(a) + \frac{j-1}{k}\big(G(b-) - G(a)\big) + \eta/4\Big\}$$
$$= \frac{1}{k}\big(G(b-) - G(a)\big) - \eta/4 \ge \eta/4, \qquad j = 1,\ldots,k.$$
The second inequality holds because if $y_{j-1} \in (a,b)$ and
$G(y_{j-1}) \ge G(a) + \frac{j-1}{k}(G(b-) - G(a)) + \eta/4$,
then $G(x) - G(x-) \ge \eta/4$ at $x = y_{j-1}$.
But this contradicts the initial assumption. Now since
$G(y_j-) \le G(a) + \frac{j}{k}(G(b-) - G(a))$, $j = 1,\ldots,k$, then
$$G(y_j-) - G(y_{j-1}) \le \frac{1}{k}\big(G(b-) - G(a)\big) < \eta.$$
Note that $y_0 = a$, $y_1 > a$, and if $y_k < b$, then $G(b-) - G(y_k) = 0$.
Let $a = x_0 < x_1 < \cdots < x_{k'} = b$
be the partition formed from $\{y_j\}_{j=1}^{k} \cup \{b\}$.
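The partition in Lemma 6.1 is built from quantiles. The sketch below carries out that construction under the simplifying assumption that the distribution function is continuous and strictly increasing on $[a,b]$, so that $G(x-) = G(x)$ and the generalized inverse can be found by bisection. The function name, the tolerance and the normal example are assumptions of the sketch.

```python
from math import floor
from statistics import NormalDist

def quantile_partition(cdf, a, b, eta, tol=1e-12):
    """Partition a = x_0 < ... < x_k = b with cdf increments below eta.

    Assumes cdf is continuous and strictly increasing on [a, b].
    """
    total = cdf(b) - cdf(a)
    if total <= eta:
        return [a, b]
    k = floor(total / eta) + 1           # total/eta < k <= 2*total/eta because total/eta > 1

    def inverse(t):
        """Smallest x in [a, b] with cdf(x) >= t, located by bisection."""
        lo, hi = a, b
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if cdf(mid) >= t:
                hi = mid
            else:
                lo = mid
        return hi

    interior = [inverse(cdf(a) + j * total / k) for j in range(1, k)]
    return [a] + interior + [b]

G = NormalDist().cdf
points = quantile_partition(G, -2.0, 2.0, 0.1)
increments = [G(points[j]) - G(points[j - 1]) for j in range(1, len(points))]
```

Each increment equals the total mass divided by `k`, up to the bisection tolerance, and is therefore below `eta`.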
Lemma 6.2: Let $F_\theta \in \mathcal{G}$ be given. Also for given $c > 0$ let
$C = (-c,c]$. Then $\forall\,\eta > 0$ $\exists\,\varepsilon' > 0$ such that
$d_k(G,F_\theta) \le \varepsilon'$ implies
$$\sup_{f \in A}\ \sup_{x \in C} \Big| \int_{C \cap I_x} f(y)\,dG(y) - \int_{C \cap I_x} f(y)\,dF_\theta(y) \Big| < \eta \qquad (6.1)$$
where intervals $I_x$ may represent either open or closed intervals from
$-\infty$ to $x$.
Proof: Given $\eta > 0$, let $\{d_i\}_{i=1}^{l}$ be the at most finite set of points
in $C$ such that $F_\theta(d_i) - F_\theta(d_i-) \ge \eta/(16H)$, if they exist. Since the
family $A$ is equicontinuous and $\bar{C}$, the closure of $C$, is compact, we
may choose a decomposition $-c = a_0 < a_1 < \cdots < a_m = c$
so that $a_{i-1} < x < y < a_i$ implies $|f(x) - f(y)| < \eta/4$,
for every $f \in A$, $i = 1,\ldots,m$.
Let $\{a_i^*\}_{i=0}^{k}$ be the further decomposition obtained
by combining the points $\{a_i\}_{i=0}^{m}$ and $\{d_i\}_{i=1}^{l}$.
From Lemma 6.1 whenever $F_\theta(a_i^*-) - F_\theta(a_{i-1}^*) \ge \eta/(4H)$,
there exists a finite decomposition
$a_{i-1}^* = x_{i0} < x_{i1} < \cdots < x_{in_i} = a_i^*$ for which
$$F_\theta(x_{ij}-) - F_\theta(x_{i(j-1)}) < \eta/(4H), \qquad j = 1,\ldots,n_i. \qquad (6.2)$$
If $F_\theta(a_i^*-) - F_\theta(a_{i-1}^*) < \eta/(4H)$, set $x_{i0} = a_{i-1}^*$ and
$x_{i1} = a_i^*$. That is, no further partitioning is necessary.
Let $\{b_i\}_{i=0}^{n'}$ be the set of points that partition $(-c,c]$
formed by combining $\{x_{ij}\}_{j=0}^{n_i}$, $i = 1,\ldots,k$.
Denote $F_\theta^*$ the possibly improper distribution that
attributes weight $F_\theta(b_i) - F_\theta(b_i-)$ to the points $b_i$,
$i = 1,\ldots,n'$, and weight $F_\theta(b_{i+1}-) - F_\theta(b_i)$ to the points
$p_i = \tfrac{1}{2}(b_i + b_{i+1})$, $i = 0,\ldots,n'-1$.
Suppose $x \in C$ is given. Then either: (a) there exists an
$0 \le i_x \le n'-1$ such that $b_{i_x} < x < b_{i_x+1}$;
or (b) there exists an $1 \le i_x \le n'$ for which $x = b_{i_x}$.
For case (a) and $f \in A$
$$\Big| \int_{C \cap I_x} f\,dF_\theta^* - \int_{C \cap I_x} f\,dF_\theta \Big| \le \sum_{j=1}^{i_x} \int_{(b_{j-1},b_j)} |f(p_{j-1}) - f(y)|\,dF_\theta(y) + 2H\{F_\theta(b_{i_x+1}-) - F_\theta(b_{i_x})\}$$
$$\le \frac{\eta}{4} F_\theta\{C\} + \frac{\eta}{2} \le \frac{\eta}{4} + \frac{\eta}{2} = \frac{3}{4}\eta.$$
For case (b) where $x = b_{i_x}$ for some $1 \le i_x \le n'$
$$\Big| \int_{C \cap I_x} f\,dF_\theta^* - \int_{C \cap I_x} f\,dF_\theta \Big| \le \sum_{j=1}^{i_x} \int_{(b_{j-1},b_j)} |f(p_{j-1}) - f(y)|\,dF_\theta(y) \le \frac{\eta}{4}.$$
Hence
$$\sup_{f \in A}\ \sup_{x \in C} \Big| \int_{C \cap I_x} f\,dF_\theta^* - \int_{C \cap I_x} f\,dF_\theta \Big| \le \frac{3}{4}\eta.$$
This is true for any distribution satisfying the inequalities (6.2).
In particular we can choose $\varepsilon^*$ such that $d_k(G,F_\theta) < \varepsilon^*$ implies
$$G(x_{ij}-) - G(x_{i(j-1)}) < \eta/(4H), \quad \text{for } j = 1,\ldots,n_i,\ i = 1,\ldots,k.$$
Let $G^*$ be the corresponding improper measure constructed from $G$. Then
$$\sup_{f \in A}\ \sup_{x \in C} \Big| \int_{C \cap I_x} f\,dG^* - \int_{C \cap I_x} f\,dG \Big| \le \frac{3}{4}\eta.$$
It is now convenient to consider case (b) first
$$\sup_{f \in A} \Big| \int_{C \cap I_x} f\,dG^* - \int_{C \cap I_x} f\,dF_\theta^* \Big| \qquad (+)$$
$$\le \sup_{f \in A} \Big| \sum_{j=1}^{i_x} f(p_{j-1}) \int_{(b_{j-1},b_j)} d(G - F_\theta) \Big| + \sup_{f \in A} \Big| \sum_{j=1}^{i_x} f(b_j)\big(G\{b_j\} - F_\theta\{b_j\}\big) \Big|$$
$$\le H \sum_{j=1}^{n'} \Big| \int_{(b_{j-1},b_j)} d(G - F_\theta) \Big| + H \sum_{j=1}^{n'} \big| G\{b_j\} - F_\theta\{b_j\} \big|. \qquad (*)$$
Choose $0 < \varepsilon' < \varepsilon^*$ such that $d_k(G,F_\theta) < \varepsilon'$ implies
$(*) < \eta/4$.
There are two possibilities for case (a). Either
$b_{i_x} < x < p_{i_x}$ for some $0 \le i_x \le n'-1$, whence
$$(+) \le (*) + \sup_{f \in A} \Big| \int_{(b_{i_x},c] \cap I_x} f\,d(G^* - F_\theta^*) \Big| = (*) < \eta/4,$$
or $p_{i_x} \le x \le b_{i_x+1}$, $0 \le i_x \le n'-1$, for which
$$(+) \le (*) + H\,\big| G(b_{i_x+1}-) - G(b_{i_x}) - F_\theta(b_{i_x+1}-) + F_\theta(b_{i_x}) \big|.$$
For either case if it happens that $d_k(G,F_\theta) < \varepsilon'$, then
$$\Big| \int_{C \cap I_x} f\,dG^* - \int_{C \cap I_x} f\,dF_\theta^* \Big| < \eta/2.$$
Then the lemma follows.
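Proof of Theorem 2.1: Given $\delta > 0$ choose $c > 0$ so that
$F_\theta\{E - C\} < \delta/(8H)$. Let $\varepsilon'$ be given by Lemma 6.2 for the choice of
$\eta = \delta/2$. Choose $\varepsilon = \min\{\varepsilon', \delta/(8H)\}$.
For any $x \in (-\infty,-c]$ and $G$ within Kolmogorov distance $\varepsilon$
from $F_\theta$
$$\Big| \int_{I_x} f\,dG - \int_{I_x} f\,dF_\theta \Big| \le H\big(G\{I_x\} + F_\theta\{I_x\}\big) \le \delta/2.$$
Then for arbitrary $x > -c$ and $G$ within
Kolmogorov distance $\varepsilon$ from $F_\theta$
$$\Big| \int_{I_x} f\,dG - \int_{I_x} f\,dF_\theta \Big| \le H\big(G\{E - C\} + F_\theta\{E - C\}\big) + \sup_{y \in C} \Big| \int_{C \cap I_y} f\,dG - \int_{C \cap I_y} f\,dF_\theta \Big|$$
$$< \delta/2 + \delta/2$$
by Lemma 6.2. Hence $d_k(G,F_\theta) < \varepsilon$ implies
$$\sup_{f \in A}\ \sup_{x \in E} \Big| \int_{I_x} f\,dG - \int_{I_x} f\,dF_\theta \Big| < \delta.$$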
REFERENCES
Beran, R.: Robust location estimates, Ann. Statist., 5, 431-444 (1977)
Beran, R.: Estimated sampling distributions: the bootstrap and
competitors, Ann. Statist., 10, 212-225 (1982)
Boos, D.D. and Serfling, R.J.: A note on differentials and the CLT
and LIL for statistical functions with application to M-estimates,
Ann. Statist., 8, 618-624 (1980)
Carroll, R.J.: On the asymptotic distribution of multivariate
M-estimates, J. of Mult. Analysis, 8, 361-371 (1978)
Clarke, B.R.: Uniqueness and Fréchet differentiability of functional
solutions to maximum likelihood type equations, Ann. Statist.,
1196-1205 (1983)
Clarke, F.H.: Optimization and Nonsmooth Analysis, Wiley, New York
(1983)
Fernholz, L.T.: Von Mises Calculus for Statistical Functionals,
Lecture notes in statistics, 19, Springer Verlag, New York, 1983.
Hampel, F.R.: Contributions to the Theory of Robust Estimation, Ph.D.
Thesis, University of California, Berkeley, 1968
Hampel, F.R.: A general qualitative definition of robustness, Ann. Math.
Statist., 42, 1887-1896 (1971)
Hampel, F.R.: The influence curve and its role in robust estimation,
J. Amer. Statist. Ass., 62, 1179-1186 (1974)
Hampel, F.R.: Optimally bounding the gross error-sensitivity and the
influence of position in factor space, Proc. Amer. Statist. Assoc.,
Statist. Computing Section, 59-64 (1978)
Hampel, F.R.: Small-sample asymptotic distributions of M-estimators
of location, Biometrika, 69, 29-46 (1982)
Huber, P.J.: Robust estimation of a location parameter, Ann. Math.
Statist., 35, 73-101 (1964)
Huber, P.J.: The behaviour of maximum likelihood estimates under
non standard conditions, in: Proc. Fifth Berkeley Symposium on
Mathematical Statistics and Probability Vol. 1, University of
California Press, Berkeley (1967)
Huber, P.J.: Robust Statistical Procedures, Regional Conference
Series in Applied Mathematics No. 27, Soc. Industr. App. Math.,
Philadelphia, Penn. 1977
Huber, P.J.: Robust Statistics, Wiley, New York, 1981
Kallianpur, G.: Von Mises functions and maximum likelihood
estimation, Sankhya, Ser. A., 23, 149-158 (1963)
Reeds, J.A.: On the Definition of von Mises Functions, Ph.D.
Thesis, Department of Statistics, Harvard University, Cambridge,
Mass. 1976.
Von Mises, R.: On the asymptotic distribution of differentiable
statistical functions, Ann. Math. Statist., 18, 309-348 (1947)