ABSTRACT

A TEST OF SYMMETRY ASSOCIATED WITH THE HODGES-LEHMANN ESTIMATOR

One version of the Hodges-Lehmann estimator, $\hat\theta = \text{median}\{(X_i+X_j)/2,\ 1 \le i,\,j \le n\}$, is also the minimum distance estimator derived from a particular Cramer-von Mises distance. This distance evaluated at $\hat\theta$, i.e., the minimized distance, provides a natural statistic for testing symmetry of the underlying distribution.
1. Introduction. Let $X_1,\ldots,X_n$ be a sample from a continuous distribution $F(x) = F_0((x-\theta)/\sigma)$. The Hodges-Lehmann (H-L) estimator for location, $\hat\theta_1 = \text{median}\{(X_i+X_j)/2,\ 1 \le i \le j \le n\}$, is well-known to have excellent robustness and efficiency properties under the usual assumption of symmetry $F_0(x) = 1 - F_0(-x)$. For example, the asymptotic relative efficiency (ARE) of $\hat\theta_1$ to the sample mean $\bar X$ is $\ge .864$ for all symmetric $F_0$ and $= .955$ for $F_0$ = normal. However, Bickel and Lehmann (1975) have shown that this ARE could in fact be as low as 0 for asymmetric $F_0$. Thus symmetry is important for efficiency considerations as well as for identifying the location parameter $\theta$.
In this paper we propose a natural test of the symmetry
hypothesis which is based on a minimum distance estimator characterization of
the Hodges-Lehmann estimator.
Let $F_n$ be the usual empirical distribution function and define

(1.1)    $d_n(\theta) = \int_{-\infty}^{\infty} [1 - F_n(\theta+x) - F_n(\theta-x)]^2\,dx .$
Simple calculations allow (1.1) to be reexpressed as
(1.2)    $d_n(\theta) = -\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}|X_i - X_j| + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}|X_i + X_j - 2\theta| .$

Therefore $d_n(\theta)$ is minimized by $\hat\theta_2 = \text{median}\{(X_i+X_j)/2,\ 1 \le i,\,j \le n\}$, which is another version of the H-L estimator. It appears that Knüsel (1969) first noticed this characterization. (We use the term "H-L estimator" generically for any asymptotically equivalent version.) As our test statistic we propose the minimized distance $nd_n(\hat\theta_2)$ scaled by Gini's mean difference $\hat\sigma = \binom{n}{2}^{-1}\sum_{i<j}|X_i - X_j|$:

(1.3)    $nd_n(\hat\theta_2)/\hat\sigma = (n-1)\left\{-1 + \frac{\sum_{i=1}^{n}\sum_{j=1}^{n}|X_i + X_j - 2\hat\theta_2|}{2\sum_{i<j}|X_i - X_j|}\right\} .$
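For readers who want to compute the statistic directly, the following minimal sketch (Python, with function names of my own choosing) evaluates $\hat\theta_2$, Gini's mean difference, and the scaled distance exactly as written in (1.2) and (1.3).

```python
import numpy as np

def hl_estimate(x):
    """Hodges-Lehmann estimate theta_2: median of all n^2 pairwise averages (X_i + X_j)/2."""
    return np.median((x[:, None] + x[None, :]) / 2.0)

def gini_mean_difference(x):
    """Gini's mean difference: (n choose 2)^{-1} * sum_{i<j} |X_i - X_j|."""
    n = len(x)
    diffs = np.abs(x[:, None] - x[None, :])
    return diffs[np.triu_indices(n, k=1)].sum() / (n * (n - 1) / 2.0)

def gs_statistic(x):
    """Test statistic (1.3): n * d_n(theta_2) / sigma_hat, with d_n computed from (1.2)."""
    n = len(x)
    theta2 = hl_estimate(x)
    d_n = (np.abs(x[:, None] + x[None, :] - 2.0 * theta2).sum()
           - np.abs(x[:, None] - x[None, :]).sum()) / n**2
    return n * d_n / gini_mean_difference(x)

rng = np.random.default_rng(0)
print(gs_statistic(rng.logistic(size=50)))   # small value expected for a symmetric sample
```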
Gini's mean difference arises naturally in (1.2) and its use in (1.3) produces
computational ease as well as an intuitive ratio of absolute differences.
The
idea of course is to compute (1.3) as an auxiliary statistic along with the
H-L estimator in order to give confidence in the symmetry assumption.
If symmetry rather than location is emphasized, (1.3) may be viewed as a modified form of the Cramer-von Mises statistic $nd_n(\theta)$; others proposed for the "$\theta$ known" case include Rothman and Woodroofe (1972), Gregory (1977), and Hill and Rao (1977).
Whatever the emphasis, Corollary 1 of Section 2 indicates the advantage of using the minimum distance estimator (or any asymptotically equivalent version) when $\theta$ is unknown: the limiting distribution of $nd_n(\hat\theta_2)$ is the difference of two terms, the first gives the limiting distribution of $nd_n(\theta)$ and the second reflects the estimation of $\theta$ by $\hat\theta_2$. Moreover, when $F_0$ is the logistic distribution, the first term reduces simply to a weighted sum of chi-square random variables and the second term is just a multiple of a single chi-square variable. Monte Carlo results indicate that this limiting distribution based on the logistic distribution is fairly close to that based on a variety of other symmetric distributions.
The paper is organized as follows. Section 2 gives a general theorem for finding the limiting distribution of $nd_n(\hat\theta)$, $\hat\theta$ arbitrary. Corollaries 1 and 2 then relate specifically to the H-L estimator. Section 3 presents the Monte Carlo results and Section 4 outlines the use of an alternative distance $d_n^*(\hat\theta)$. An appendix contains some technical lemmas and the proof of Corollary 1.
2. Asymptotic distributions. The limiting distribution of (1.3) is handled in several steps. First, Theorem 1 approximates $nd_n(\hat\theta)$ by a suitable functional of $\Delta_n(x) = F_n(x) - F(x)$. Corollary 1 then finds the limiting distribution of this latter functional when $\hat\theta$ is the H-L estimator. Corollary 2 restricts to the logistic distribution and characterizes the limiting distribution as a weighted sum of chi-square random variables.
THEOREM 1. Let $X_1,\ldots,X_n$ be independent random variables having distribution $F(x) = F_0((x-\theta)/\sigma)$. Suppose that $F_0$ has a $(2+\epsilon)$th moment and a bounded density $f_0$ symmetric about 0 and continuous a.e. Lebesgue. If $\hat\theta$ satisfies

(2.1)    $|\hat\theta - \theta| \le n^{-1/2}\log n$   w.p.1 for all n sufficiently large,

(2.2a)    $n^{1/2}\{\hat\theta - \theta - T(F;\Delta_n)\} \stackrel{p}{\longrightarrow} 0 ,$   and

(2.2b)    $n^{1/2}\{T(F;\Delta_n)\} = O_p(1) ,$

then

(2.3)    $nd_n(\hat\theta) - n\int_{-\infty}^{\infty}\{\Delta_n(\theta+x) + \Delta_n(\theta-x) + [f(\theta+x) + f(\theta-x)]\,T(F;\Delta_n)\}^2\,dx \;\stackrel{p}{\longrightarrow}\; 0 .$
REMARK. Theorem 1 is fairly general since most commonly used estimators satisfy (2.1) and (2.2), where $T(F;\Delta_n)$ is usually an average. The $(2+\epsilon)$th moment is surely not necessary; in fact only a first moment is required for $E\,d_n(\theta) < \infty$.
PROOF. The difference (2.3) may be written as

(2.4)    $n\int_{-\infty}^{\infty}\{f(\theta+x)\,T(F;\Delta_n) - [F_n(\hat\theta+x) - F_n(\theta+x)]\}\,Q_n(\hat\theta,\theta,x)\,dx$
    $\qquad +\; n\int_{-\infty}^{\infty}\{f(\theta-x)\,T(F;\Delta_n) - [F_n(\hat\theta-x) - F_n(\theta-x)]\}\,Q_n(\hat\theta,\theta,x)\,dx ,$

where

$Q_n(\hat\theta,\theta,x) = \{1 - F_n(\hat\theta+x) - F_n(\hat\theta-x) - [f(\theta+x) + f(\theta-x)]\,T(F;\Delta_n) - \Delta_n(\theta+x) - \Delta_n(\theta-x)\} .$

The first term of (2.4) is bounded in absolute value by

(2.5)    $n\int_{-\infty}^{\infty}|f(\theta+x)\,T(F;\Delta_n) - [F(\hat\theta+x) - F(\theta+x)]|\cdot|Q_n(\hat\theta,\theta,x)|\,dx$
    $\qquad +\; n\,\sup_{-\infty<x<\infty}|\Delta_n(\hat\theta+x) - \Delta_n(\theta+x)|\int_{-\infty}^{\infty}|Q_n(\hat\theta,\theta,x)|\,dx .$

We first show that $\int_{-\infty}^{\infty}|Q_n(\hat\theta,\theta,x)|\,dx = O_p(n^{-1/2})$:

$\int_{-\infty}^{\infty}|Q_n(\hat\theta,\theta,x)|\,dx \;\le\; 2|\hat\theta - \theta| + 2|T(F;\Delta_n)| + 4\int_{-\infty}^{\infty}|\Delta_n(x)|\,dx .$

The first two terms of this last expression are $O_p(n^{-1/2})$; taking expectations, the third term can be shown to be $O_p(n^{-1/2})$ if $\int_{-\infty}^{\infty}[F(x)(1-F(x))]^{1/2}\,dx < \infty$, which in turn is satisfied if F has a finite $(2+\epsilon)$th moment. Thus the second term of (2.5) $\stackrel{p}{\longrightarrow} 0$ using (2.2) and the convergence $n^{1/2}\sup_x|\Delta_n(\hat\theta+x) - \Delta_n(\theta+x)| \stackrel{p}{\longrightarrow} 0$, which follows from (2.1) and Theorem 4.2 of Sen and Ghosh (1971).
Using the mean value theorem, the first term of (2.5) is bounded by

$n\,\|Q_n(\hat\theta,\theta,\cdot)\|_\infty\left\{|\hat\theta - \theta - T(F;\Delta_n)|\int_{-\infty}^{\infty}f(\theta^*(x)+x)\,dx \;+\; |T(F;\Delta_n)|\int_{-\infty}^{\infty}|f(\theta+x) - f(\theta^*(x)+x)|\,dx\right\} ,$

where $\theta^*(x)$ lies between $\theta$ and $\hat\theta$, and $\|h\|_\infty$ means $\sup_x|h(x)|$. Note that $\|Q_n(\hat\theta,\theta,\cdot)\|_\infty = O_p(n^{-1/2})$ follows from well-known properties of the Kolmogorov-Smirnov statistic $\|\Delta_n\|_\infty$ and from (2.2) and the fact that f is bounded. Scheffé's convergence theorem for densities handles the last integral. The second term of (2.4) is of course bounded in the same manner as the first. □
Another asymptotically equivalent version of the Hodges-Lehmann estimator is $\hat\theta_3 = \text{median}\{(X_i+X_j)/2,\ 1 \le i < j \le n\}$. The following corollary applies to any of the three versions mentioned. Let $U(t)$ be the usual Brownian Bridge on $C[0,1]$.

COROLLARY 1. Let $X_1,\ldots,X_n$ be independent random variables having distribution $F(x) = F_0((x-\theta)/\sigma)$. Suppose that $F_0$ has a $(2+\epsilon)$th moment and a bounded density $f_0$ symmetric about 0 and continuous a.e. Lebesgue. Then for $i = 1, 2,$ or 3,

(2.6)    $nd_n(\hat\theta_i) \;\stackrel{d}{\longrightarrow}\; \sigma\left[\int_0^1 [U(t)+U(1-t)]^2\,dF_0^{-1}(t) \;-\; \frac{\left\{\int_0^1[U(t)+U(1-t)]\,dt\right\}^2}{\int_{-\infty}^{\infty} f_0^2(x)\,dx}\right] .$
The proof of Corollary 1 requires some technical lemmas to verify (2.1)
and (2.2) and is postponed until the Appendix.
Note that the first term on the right-hand side of (2.6) is just the limiting distribution of $nd_n(\theta)$, and the second term is the adjustment due to estimation of $\theta$. Similar decompositions are found in Pyke (1970) and Boos (1979).
When $F_0$ is logistic, the Hodges-Lehmann estimator is asymptotically efficient and the right-hand side of (2.6) reduces conveniently to a weighted sum of chi-square random variables. The maximum likelihood estimate of $\theta$ for this situation would also produce the following result.
COROLLARY 2. Let $X_1,\ldots,X_n$ be independent random variables having distribution $F(x) = [1 + \exp\{-(x-\theta)/\sigma\}]^{-1}$. If $\hat\sigma = \binom{n}{2}^{-1}\sum_{i<j}|X_i - X_j|$, then for $i = 1, 2,$ or 3,

(2.7)    $nd_n(\hat\theta_i)/\hat\sigma \;\stackrel{d}{\longrightarrow}\; \sum_{i=1}^{\infty}\frac{Z_i^2}{(2i+1)(i+1)} ,$

where $Z_1, Z_2,\ldots$
are independent standard normal random variables.
PROOF. Consider the representation for $U(t)$ found in Anderson and Darling (1952):

$U(t) = [t(1-t)]^{1/2}\sum_{j=1}^{\infty}\frac{Z_j\,p_j(t)}{[j(j+1)]^{1/2}} ,$

where the $p_j$ are the Ferrer associated Legendre polynomials. Then, since $p_j(t) = (-1)^{j+1}p_j(1-t)$ and $f_0(F_0^{-1}(t)) = t(1-t)$, we have

$\int_0^1 [U(t)+U(1-t)]^2\,dF_0^{-1}(t) = \int_0^1\frac{[U(t)+U(1-t)]^2}{t(1-t)}\,dt = \sum_{j=1}^{\infty}2Z_{2j-1}^2\,[j(2j-1)]^{-1} ,$

and, likewise, since $p_1(t) = [6t(1-t)]^{1/2}$, we have

$\left[\int_0^1[U(t)+U(1-t)]\,dt\right]^2\Big/\int_{-\infty}^{\infty}f_0^2(x)\,dx = 6\left[\int_0^1\frac{2\,Z_1\,p_1^2(t)}{6^{1/2}\,2^{1/2}}\,dt\right]^2 = 2Z_1^2 .$
Finally, combining the two terms of (2.6) (the $j = 1$ terms cancel) and $\hat\sigma \stackrel{p}{\longrightarrow} 2\sigma$ yields

$nd_n(\hat\theta_i)/\hat\sigma \;\stackrel{d}{\longrightarrow}\; \frac{\sigma}{2\sigma}\left[\sum_{j=2}^{\infty}\frac{2Z_{2j-1}^2}{j(2j-1)}\right] = \sum_{j=2}^{\infty}\frac{Z_{2j-1}^2}{j(2j-1)} \;\stackrel{d}{=}\; \sum_{i=1}^{\infty}\frac{Z_i^2}{(2i+1)(i+1)} . \qquad\square$
Approximate percentiles of this limiting distribution were calculated by inverting numerically the characteristic function of $Q = \sum_{i=1}^{\infty}Z_i^2/[(2i+1)(i+1)]$ using a version of the standard Imhoff procedure found in Koerts and Abrahamse (1969). Test runs with the Cramer-von Mises statistic $\int_0^1 U^2(t)\,dt$ indicated that 40 eigenvalues were sufficient to get within .002 of the percentiles listed in Anderson and Darling (1952). We expect similar accuracy for the percentiles listed in Table 1.
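The percentiles in Table 1 can also be checked approximately without the characteristic-function inversion; the sketch below simply truncates the series defining Q and uses plain Monte Carlo, which is a substitute for, not a reproduction of, the Imhoff procedure cited above.

```python
import numpy as np

def simulate_Q(n_terms=400, n_rep=50_000, seed=1):
    """Simulate Q = sum_{i>=1} Z_i^2 / [(2i+1)(i+1)], truncated at n_terms."""
    rng = np.random.default_rng(seed)
    i = np.arange(1, n_terms + 1)
    weights = 1.0 / ((2 * i + 1) * (i + 1))
    q = np.zeros(n_rep)
    for w in weights:                        # accumulate term by term to keep memory small
        q += w * rng.standard_normal(n_rep) ** 2
    return q

q = simulate_Q()
for p in (0.80, 0.85, 0.90, 0.95, 0.975, 0.99):
    print(p, round(np.quantile(q, p), 3))    # values should land near the Table 1 entries
```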
TABLE 1. Percentiles of $Q = \sum_{i=1}^{\infty} Z_i^2/[(2i+1)(i+1)]$

P(Q ≤ x)    .80     .85     .90     .95     .975    .99
x           .523    .595    .700    .887    1.081   1.348

3. Monte Carlo Results.
A small Monte Carlo study was carried out to
determine the power of the proposed test for various asymmetric alternatives
and also to check the stability of level over a variety of symmetric distributions.
Included in the study for comparison purposes was a version of a statistic suggested by David and Johnson (1956) and later studied by Resek (1974):

$J = \{X_{([.08n]+1)} + X_{(n-[.08n])} - \text{median}\{X_i\}\}/\hat\sigma ,$

where $X_{(1)} \le \cdots \le X_{(n)}$ are the order statistics, $[\,\cdot\,]$ is the greatest integer function, and $\hat\sigma$ is Gini's mean difference.
The calculations were
based on samples of size 10,000 generated by the McGill "Super-Duper" random
number generator and further analysed via a Monte Carlo package written by
David Dickey (1978).
The results are listed below in Table 2 where GS refers
to our statistic (1.3).
All cut-off points were estimated by Monte Carlo
samples from the logistic distribution except GS * which used instead the
asymptotic value .887 from Table 1.
The J ** row shows the improved perfor-
mance of J when testing for only right skewness.
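A minimal sketch of such a study, using the gs_statistic function from the earlier sketch and the asymptotic cut-off .887 from Table 1, estimates a rejection probability by simulation; the replication count and the chi-square alternative below are illustrative choices of mine, not the settings of the original study.

```python
import numpy as np

def rejection_rate(sampler, n=20, cutoff=0.887, n_rep=2000, seed=2):
    """Estimate P(GS > cutoff) for samples of size n drawn by `sampler`."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_rep):
        x = sampler(rng, n)
        if gs_statistic(x) > cutoff:        # gs_statistic defined in the earlier sketch
            rejections += 1
    return rejections / n_rep

# Level under a symmetric distribution and power under a right-skewed alternative.
print(rejection_rate(lambda rng, n: rng.logistic(size=n)))       # should be near .05
print(rejection_rate(lambda rng, n: rng.chisquare(2, size=n)))   # skewed alternative
```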
TABLE 2. Rejection Probabilities for α = .05

            Uniform  Normal  Logistic  DEXP   Cauchy   χ²₄    χ²₂   Lognormal   χ²₁
n=20  GS     .090     .047     .050     .079    .31     .38    .82     .65      .93
      GS*    .079     .039     .043     .068    .28     .35    .80     .57      .92
      J      .038     .037     .050     .090    .26     .31    .68     .68      .83
      J**    .040     .039     .050     .076    .16     .43    .68     .77      .89
n=50  GS     .107     .046     .050     .077    .33     .82    .99    1.00     1.00
      GS*    .105     .045     .049     .076    .33     .81    .99    1.00     1.00
      J      .064     .043     .050     .086    .12     .59    .88     .93      .99
      J**    .061     .044     .050     .075    .13     .71    .93     .965     .995

* Cut-off value: .887 from Table 1.
** Cut-off value: the estimated 95th percentile for logistic data.
Several conclusions:

1) The convergence of (1.3) is fast enough so that the asymptotic values are acceptable even for n = 20.

2) The .05 levels for the Cauchy are much too large, but the uniform and double exponential (DEXP) are only moderately above .05. The normal is expectedly close to .05.

3) The power of GS is better than J and certainly comparable to other tests in the literature (see, e.g., Table 5 of Doksum, Fenstad and Aaberge (1977)).
4. A Related Test. Consider the statistic

$d_n^*(\theta) = \int_{-\infty}^{\infty}[1 - F_n(\theta+x) - F_n(\theta-x)]^2\,dF_n(\theta+x) ,$

which is similar to ones proposed by Rothman and Woodroofe (1972) and Gregory (1977) for testing symmetry about $\theta$. Under regularity conditions on $F_0$ including symmetry, the estimator $\hat\theta$ obtained by minimizing $d_n^*(\theta)$ has asymptotic approximation $T(F;\Delta_n)$. This reduces to the same approximation obtained by Parr and Schucany (1978) for the $\hat\theta$ which minimizes the Cramer-von Mises distance when $X_1,\ldots,X_n$ have distribution $F_0(x-\theta)$. Similar to Corollary 1 we can show

$nd_n^*(\hat\theta) \;\stackrel{d}{\longrightarrow}\; \int_0^1[U(t)+U(1-t)]^2\,dt \;-\; \frac{\left\{\int_0^1[U(t)+U(1-t)]\,f_0(F_0^{-1}(t))\,dt\right\}^2}{\int_{-\infty}^{\infty}f_0^3(x)\,dx} .$
One advantage of this statistic is that it is automatically scale invariant.
Moreover, the impact of $f_0$ on the limiting distribution is restricted to the second term.
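Because $dF_n(\theta+x)$ places mass $1/n$ at each $x = X_k - \theta$, the statistic can be evaluated as $d_n^*(\theta) = n^{-1}\sum_k[1 - F_n(X_k) - F_n(2\theta - X_k)]^2$. The sketch below uses this form, plugging in the pairwise-average median for $\theta$ as a simple illustration (the test itself would use the minimizer of $d_n^*$); the helper names are mine.

```python
import numpy as np

def ecdf(x, pts):
    """Empirical distribution function F_n of the sample x, evaluated at pts."""
    xs = np.sort(x)
    return np.searchsorted(xs, pts, side="right") / len(x)

def dn_star(x, theta):
    """d_n*(theta) = (1/n) * sum_k [1 - F_n(X_k) - F_n(2*theta - X_k)]^2."""
    return np.mean((1.0 - ecdf(x, x) - ecdf(x, 2 * theta - x)) ** 2)

rng = np.random.default_rng(3)
x = rng.logistic(size=50)
theta_hat = np.median((x[:, None] + x[None, :]) / 2.0)   # H-L estimate over all pairs
print(len(x) * dn_star(x, theta_hat))                    # n * d_n*(theta_hat)
```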
Further analysis similar to Corollary 2 shows that if

$f_0(x) = \frac{2}{\pi}\cdot\frac{1}{e^{x}+e^{-x}} = \frac{1}{\pi}\,\text{sech}(x) ,$

then
$nd_n^*(\hat\theta) \;\stackrel{d}{\longrightarrow}\; \sum_{j=1}^{\infty}\frac{4\,Z_{2j+1}^2}{(2j+1)^2\pi^2} \;\stackrel{d}{=}\; \sum_{i=1}^{\infty}\frac{Z_i^2}{(i+\frac12)^2\pi^2} .$

Thus $f_0(x) = \pi^{-1}\,\text{sech}(x)$ could be used as a base distribution just as the logistic was used for the statistic of Section 2.
Appendix. In order to prove Corollary 1 we list several lemmas. The first two are general and straightforward to prove. The second two relate to the H-L estimator. Let $F^{-1}(p) = \inf\{x\colon F(x) \ge p\}$ and denote the convolution of F and G by $F{*}G$.

LEMMA 1. Let $F_1$ and F be distribution functions. If F has a positive derivative $F'(x) > a$ at $x = F^{-1}(\tfrac12)$, then there exists a constant $C_1$ depending only on a such that, for $\|F_1 - F\|_\infty$ sufficiently small, $|F_1^{-1}(\tfrac12) - F^{-1}(\tfrac12)| \le C_1\,\|F_1 - F\|_\infty$.
LEMMA 2. Let $F_1, F_2, G_1$, and $G_2$ be distribution functions. If $G_1$ has at most a finite number of discontinuities, then there exists a constant $C_2$ depending only on $G_1$ such that $\|F_1{*}G_1 - F_2{*}G_2\|_\infty$ is bounded in terms of $\|F_1 - F_2\|_\infty$ and $C_2\,\|G_1 - G_2\|_\infty$.
For the next two lemmas it is useful to write the three forms of the H-L estimator as functionals $\hat\theta_i = F_{in}^{-1}(\tfrac12)$, where

$F_{1n}(x) = \frac{2}{n(n+1)}\sum_{i\le j} I(X_i + X_j \le 2x) ,$

$F_{2n}(x) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} I(X_i + X_j \le 2x) = F_n{*}F_n(2x) ,$

and

$F_{3n}(x) = \frac{2}{n(n-1)}\sum_{i<j} I(X_i + X_j \le 2x) .$

LEMMA 3. Let $X_1,\ldots,X_n$ be independent random variables having distribution F. If $G(x) = F{*}F(2x)$ has a positive derivative $G'(\theta) > a$ at its median $\theta = G^{-1}(\tfrac12)$ and F is continuous, then for n sufficiently large,

$|\hat\theta_i - \theta| \le n^{-1/2}\log n$   w.p.1 ,  i = 1,2,3.

PROOF. By Lemmas 1 and 2, for n sufficiently large $|\hat\theta_2 - \theta|$ is bounded by a constant multiple of $\|F_n - F\|_\infty$. The result for $\hat\theta_2$ then follows from a theorem of Dvoretzky, Kiefer, and Wolfowitz (1956). The results for $\hat\theta_1$ and $\hat\theta_3$ follow likewise since $\|F_{1n} - F_{2n}\|_\infty \le 2(n+1)^{-1}$ and $\|F_{3n} - F_{2n}\|_\infty \le 2n^{-1}$. □
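As a concrete illustration of the three functionals above, a short sketch (the helper name is mine) computes $\hat\theta_1$, $\hat\theta_2$, and $\hat\theta_3$ as medians of the pairwise averages over the index sets $i \le j$, all $(i,j)$, and $i < j$; for moderate n the three values typically agree closely, as Lemma 3 suggests.

```python
import numpy as np

def hl_versions(x):
    """Return (theta_1, theta_2, theta_3): medians of (X_i + X_j)/2 over
    i <= j, all (i, j), and i < j respectively."""
    n = len(x)
    pairsums = (x[:, None] + x[None, :]) / 2.0
    theta1 = np.median(pairsums[np.triu_indices(n, k=0)])   # i <= j
    theta2 = np.median(pairsums)                             # all n^2 pairs
    theta3 = np.median(pairsums[np.triu_indices(n, k=1)])   # i <  j
    return theta1, theta2, theta3

rng = np.random.default_rng(4)
print(hl_versions(rng.logistic(size=25)))
```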
In the proof of Corollary 1 we also need the following approximation of $\hat\theta_i - \theta$ by a sum of independent random variables.

LEMMA 4. Under the assumptions of Lemma 3, we have

(1)    $\hat\theta_i = \theta + \dfrac{\tfrac12 - F_{in}(\theta)}{G'(\theta)} + R_{in}$ ,  where $n^{1/2}R_{in} \stackrel{p}{\longrightarrow} 0$ ,  i = 1,2,3.

In addition, we have

(2)    $\hat\theta_i = \theta + \dfrac{\frac{2}{n}\sum_{j=1}^{n}[\tfrac12 - F(2\theta - X_j)]}{G'(\theta)} + R_{in}^*$ ,  where $n^{1/2}R_{in}^* \stackrel{p}{\longrightarrow} 0$ ,  i = 1,2,3.

PROOF. The proof of (1) for $\hat\theta_3$ is almost exactly the same as the proof of Theorem 1 of Ghosh (1971) except that $F_{3n}$ replaces $F_n$. The crucial step is verified easily from known variance formula calculations for U-statistics. Representation (2) follows directly from (1) since the projection of $\tfrac12 - F_{3n}(\theta)$ is just $\frac{2}{n}\sum_{j=1}^{n}[\tfrac12 - F(2\theta - X_j)]$. The results for $\hat\theta_1$ and $\hat\theta_2$ follow since $F_{1n}$ and $F_{2n}$ are suitably close to $F_{3n}$. □

PROOF OF COROLLARY 1.
We need to verify that the second term of (2.3)
converges to the right-hand side of (2.6).
As in Pyke and Shorack (1968, Theorem 2.1) construct $U_n^*(t)$ to have the same distribution as the empirical process $U_n(t) = n^{1/2}\Delta_n(\sigma F_0^{-1}(t)+\theta)$ and such that for some $\delta > 0$
$d(U_n^*, U) = \sup_{0<t<1}\frac{|U_n^*(t) - U(t)|}{[t(1-t)]^{1/2-\delta}} \;\stackrel{p}{\longrightarrow}\; 0 .$

Note that the appropriate $T(F;\Delta_n)$ comes from (2) of Lemma 4:
$T(F;\Delta_n) \;=\; \frac{-\int_{-\infty}^{\infty}[\Delta_n(\theta+x) + \Delta_n(\theta-x)][f(\theta+x) + f(\theta-x)]\,dx}{\int_{-\infty}^{\infty}[f(\theta+x) + f(\theta-x)]^2\,dx} \;=\; \frac{-\int_0^1[U_n(t) + U_n(1-t)]\,dt}{n^{1/2}\,(2/\sigma)\int_{-\infty}^{\infty}f_0^2(x)\,dx} \;=\; \frac{2}{n}\sum_{i=1}^{n}\frac{\tfrac12 - F(2\theta - X_i)}{G'(\theta)} .$
Thus the second term of (2.3) may be expanded and written as $T(U_n)$. The last step is to show that $T(U_n^*) - T(U) \stackrel{p}{\longrightarrow} 0$. The first term of $T(U_n^*) - T(U)$ is bounded in absolute value by
$2\,d(U_n^*, U)\,\{d(U_n^*, U) + 2\,d(U, 0)\}\int_{-\infty}^{\infty}[F_0(x)(1 - F_0(x))]^{1-2\delta}\,dx \;\stackrel{p}{\longrightarrow}\; 0 .$
The moment condition insures that the latter integral is finite.
The second term of $T(U_n^*) - T(U)$ is handled similarly, and thus both $T(U_n)$ and $T(U_n^*)$ have limiting distribution $T(U)$. □
REFERENCES
Anderson, T. W. and Darling, D. A. (1952). Asymptotic theory of certain "goodness-of-fit" criteria based on stochastic processes. Ann. Math. Statist. 23 193-212.

Bickel, P. J. and Lehmann, E. L. (1975). Descriptive statistics for nonparametric models. II. Location. Ann. Statist. 3 1045-1069.

Boos, D. D. (1979). Asymptotically efficient minimum distance estimators for location. Institute of Statistics Mimeo Series #1216, North Carolina State University, Raleigh, N.C.

David, F. N. and Johnson, N. L. (1956). Some tests of significance with ordered variables. J. R. Statist. Soc. B 18 1-20.

Dickey, D. A. (1978). A program for generating and analyzing large Monte Carlo studies. Unpublished manuscript.

Doksum, K. A., Fenstad, G. and Aaberge, R. (1977). Plots and tests for symmetry. Biometrika 64 473-487.

Dvoretzky, A., Kiefer, J. and Wolfowitz, J. (1956). Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Statist. 27 642-669.

Ghosh, J. K. (1971). A new proof of the Bahadur representation of quantiles and an application. Ann. Math. Statist. 42 1957-1961.

Gregory, G. G. (1977). Cramer-von Mises type tests for symmetry. So. Afr. Statist. J. 11 49-62.

Hill, D. L. and Rao, P. V. (1977). Tests of symmetry based on Cramer-von Mises statistics. Biometrika 64 489-494.

Knüsel, L. F. (1969). Über Minimum-Distanz-Schätzungen. Published Ph.D. Thesis, Swiss Federal Institute of Technology, Zürich.

Koerts, J. and Abrahamse, A. P. J. (1969). On the Theory and Application of the General Linear Model. Rotterdam University Press.

Parr, W. C. and Schucany, W. R. (1978). Minimum distance and robust estimation. Preprint.

Pyke, R. (1970). Asymptotic results for rank statistics. In Nonparametric Techniques in Statistical Inference (M. L. Puri, ed.) 21-37. Cambridge University Press.

Pyke, R. and Shorack, G. (1968). Weak convergence of a two-sample empirical process and a new approach to Chernoff-Savage theorems. Ann. Math. Statist. 39 755-771.

Resek, R. W. (1974). Alternative tests of skewness: efficiency comparison under realistic alternative hypothesis. Proc. Bus. & Econ. Statist. Section, Am. Statist. Assoc. 546-551.

Rothman, E. D. and Woodroofe, M. (1972). A Cramer-von Mises type statistic for testing symmetry. Ann. Math. Statist. 43 2035-2038.

Sen, P. K. and Ghosh, M. (1971). On bounded length sequential confidence intervals based on one-sample rank order statistics. Ann. Math. Statist. 42 189-203.