Boos, Dennis D. (1978). "Gini's Mean Difference as a Nonparametric Measure of Scale."

BIOMATHEMATICS TRAINING PROGRAM

GINI'S MEAN DIFFERENCE AS A NONPARAMETRIC
MEASURE OF SCALE

by

Dennis D. Boos
Department of Statistics
North Carolina State University

Institute of Statistics Mimeo Series #1166
April, 1978
Gini's mean difference as a nonparametric measure of scale

By Dennis D. Boos
Department of Statistics, North Carolina State University, Raleigh
SUMMARY

We propose Gini's mean difference as a highly competitive alternative to the pth power deviations suggested by Bickel and Lehmann (1976). Comparisons are based on standardized asymptotic variances and Monte Carlo simulation.

Some key words: Gini's mean difference; pth power deviation; Standardized asymptotic variance; Test for scale; Jackknife.
1. INTRODUCTION
Bickel and Lehmann (1976) proposed the pth power deviations T_p = (E|X_1 - μ|^p)^{1/p} as a suitable family of scale measures for symmetric distributions. These measures satisfy certain "measures of scale" criteria and can be estimated with fairly high asymptotic efficiency over a wide range of distributions. Gini's mean difference Δ = E|X_1 - X_2| also satisfies the scale measure criteria, and we seek to compare Δ to several selected members of the pth power family.
The estimator of Δ, also called Gini's mean difference, can be defined for a sample X_1, ..., X_n as a U-statistic

    Δ̂ = [n(n-1)/2]^{-1} Σ_{i<j} |X_i - X_j|,

or as an L-statistic

    Δ̂ = [n(n-1)/2]^{-1} Σ_{i=1}^{n} (2i - n - 1) X_{in},

where X_{1n} ≤ ... ≤ X_{nn} is the ordered sample. David (1968) gives a brief history of Δ̂ and traces its origin back to von Andrae (1872).
Perhaps Δ̂ is best known as a highly efficient estimator of (2/√π)σ in normal samples; see, e.g., Nair (1936), Downton (1966) and D'Agostino (1970). More recently Δ̂ has been used by D'Agostino (1971) in a test for normality and by Wainer and Thissen (1976) in the construction of a robust estimator of correlation.
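The two forms of Δ̂ above are algebraically identical, which is easy to check numerically. The following is a small sketch (ours, not the paper's); the function names are our own:

```python
# The U-statistic and L-statistic forms of Gini's mean difference estimator.
from itertools import combinations

def gini_u(x):
    """U-statistic form: average of |Xi - Xj| over all pairs i < j."""
    n = len(x)
    return sum(abs(a - b) for a, b in combinations(x, 2)) / (n * (n - 1) / 2)

def gini_l(x):
    """L-statistic form: weighted sum of order statistics with weights 2i - n - 1."""
    n = len(x)
    return sum((2 * i - n - 1) * v
               for i, v in enumerate(sorted(x), start=1)) / (n * (n - 1) / 2)

sample = [0.8, 2.1, 1.3, 0.2, 1.7]
assert abs(gini_u(sample) - gini_l(sample)) < 1e-12   # both give 0.94 here
```

The L-statistic form follows from writing the pairwise sum over the sorted sample: each X_{(i)} enters positively in i-1 differences and negatively in n-i, giving the weight (i-1) - (n-i) = 2i - n - 1, and it reduces the cost from O(n²) to O(n log n).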
In Section 2 we show that Δ satisfies the Bickel and Lehmann criteria for scale measures, and then for selected distributions we compare the standardized asymptotic variance of Δ̂ to that of T̂_2 and T̂_1, the sample standard deviation and mean deviation. In Section 3 we construct two sample tests for scale based on log Δ̂ and log T̂_p à la Miller (1968). Monte Carlo simulations are used to make small sample comparisons.
2. SCALE MEASURES
For X having distribution F, let the notation σ(X) or σ(F) refer to a measure evaluated at F such as the standard deviation, T_2(X) = (E|X - μ|²)^{1/2}. Let X ≥_st Y mean that X is stochastically larger than Y, i.e., P(X > x) ≥ P(Y > x) for all x. The three basic criteria suggested by Bickel and Lehmann (1976) for a scale measure are (1) σ(aX) = |a|σ(X) for a > 0, (2) σ(X + b) = σ(X) for all b, and (3) for symmetric distributions

    |Y - μ_Y| ≥_st |X - μ_X|  implies  σ(Y) ≥ σ(X),

where μ_X and μ_Y are the centers of symmetry. Clearly

    Δ = E|X_1 - X_2|                                        (2.1)

satisfies (1) and (2). Under the assumption of symmetric unimodal densities g and f for independent Y_i, X_i, i = 1, 2, Theorem 1 of Bickel and Lehmann (1976) yields

    |Y_1 - μ_Y| ≥_st |X_1 - μ_X|  implies  |Y_1 - Y_2| ≥_st |X_1 - X_2|.

Then if g and f have finite means, criterion (3) follows for Δ from the integration by parts identity E|Z| = ∫_0^∞ P(|Z| > t) dt.
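Criteria (1) and (2) also hold exactly for the estimator Δ̂, since the pairwise differences absorb any shift and factor out any positive scale. A quick empirical check (ours, not the paper's):

```python
# Gini's mean difference estimator is scale-equivariant and shift-invariant.
from itertools import combinations

def gini(x):
    n = len(x)
    return sum(abs(a - b) for a, b in combinations(x, 2)) / (n * (n - 1) / 2)

x = [1.4, -0.2, 3.1, 0.7, 2.2]
a, b = 2.5, 7.0
assert abs(gini([a * v for v in x]) - a * gini(x)) < 1e-12   # criterion (1), a > 0
assert abs(gini([v + b for v in x]) - gini(x)) < 1e-12       # criterion (2)
```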
Note that the same proof applies to the family Δ_p = (E|X_1 - X_2|^p)^{1/p}; see Katti (1960) for Δ_p in discrete distributions. The Δ_p family does not involve a measure of location, except Δ_2 = √2 T_2 implicitly, and this motivates replacing the T_p family by the Δ_p family for general, possibly asymmetric distributions.
The family of measures σ_J(F) = ∫_0^1 F^{-1}(t) J(t) dt, where J(t) is skew-symmetric about ½, also contains Δ, with J(t) = 4(t - ½), and satisfies the three Bickel and Lehmann criteria for scale measures. This family has been extensively studied in particular parametric settings and allows for flexibility regarding trimming and censoring; see Johnson and Kotz (1970, Vol. 1, pp. 66-72). However, we want to restrict our attention to Δ and consider a wide range of distributions.
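The quantile representation Δ = ∫_0^1 F^{-1}(t)·4(t - ½) dt can be spot-checked numerically by a midpoint rule. This sketch is ours, not the paper's; the function names and the rule are our own choices:

```python
# Evaluate Delta = integral of F^{-1}(t) * (4t - 2) over (0, 1) numerically.
from math import pi, sqrt
from statistics import NormalDist

def gini_from_quantiles(inv_cdf, m=100_000):
    """Midpoint-rule approximation of the quantile integral for Delta."""
    h = 1.0 / m
    return sum(inv_cdf((k + 0.5) * h) * (4 * (k + 0.5) * h - 2)
               for k in range(m)) * h

# Standard normal: Delta should come out near 2/sqrt(pi) = 1.1284.
delta = gini_from_quantiles(NormalDist().inv_cdf)
assert abs(delta - 2 / sqrt(pi)) < 1e-3

# Uniform(0,1): F^{-1}(t) = t, and Delta = 1/3.
assert abs(gini_from_quantiles(lambda t: t) - 1 / 3) < 1e-6
```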
In Table 1 we list the actual values of Δ and the ratio Δ/T_2 for the following distributions: uniform (0, b), normal (0, σ²), logistic F(x) = (1 + e^{-x/σ})^{-1}, Laplace f(x) = (2σ)^{-1} e^{-|x|/σ}, exponential f(x) = σ^{-1} e^{-x/σ} I(0 ≤ x).

Table 1. Values of Gini's mean difference Δ and its ratio to the standard deviation, Δ/T_2.

           Uniform   Normal    Logistic   Laplace   Exponential
  Δ          b/3     2σ/√π        2σ       3σ/2         σ
  Δ/T_2     1.155    1.128      1.103     1.061         1
The variance of Δ̂ is easily calculated to be

    var(Δ̂) = [n(n-1)/2]^{-1} {2(n-2)(J - Δ²) + 2σ² - Δ²},

where σ² is the variance of X and J = E|X_1 - X_2||X_1 - X_3|. Small sample efficiencies have been tabulated for several distributions by Nair (1936) and Sarhan (1954, 1955). However, for space considerations and ease of computation we prefer to look at the standardized asymptotic variance, defined for estimators θ̂ which are asymptotically normal with mean θ and variance A²/n by

    sv(θ̂) = A²/θ².

Not only is this measure invariant with respect to scaling factors, but sv(θ̂)/n is the asymptotic variance of log θ̂. This latter fact provides a bridge to the test statistics found in Section 3.
We define the standardized asymptotic relative efficiency of θ̂_1 to θ̂_2 by

    eff(θ̂_1, θ̂_2) = sv(θ̂_2)/sv(θ̂_1).

In Table 2 we list the standardized asymptotic variances for the mean difference Δ̂, the sample standard deviation T̂_2, and for the sample mean deviation from the mean T̂_1. Also listed are the standardized asymptotic relative efficiencies of Δ̂ to T̂_2 and to T̂_1. A truly remarkable fact is that the efficiencies of Δ̂ with respect to the maximum likelihood estimators for the normal, logistic, and Laplace families are .978, .985, and .964 respectively. Even in the highly skewed exponential the efficiency of Δ̂ with respect to the mean is .75.
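The entry sv(Δ̂) = .511 for the normal in Table 2 can be spot-checked by simulation, using the fact that sv(θ̂) ≈ n·var(θ̂)/θ². This Monte Carlo sketch is ours, not the paper's, and the sample size, replication count, and acceptance band are our own choices:

```python
# Monte Carlo estimate of the standardized asymptotic variance of Gini's
# mean difference at the standard normal; Table 2 reports .511.
import random
from math import sqrt, pi

def gini(x):
    """L-statistic form of Gini's mean difference."""
    n = len(x)
    return sum((2 * i - n - 1) * v
               for i, v in enumerate(sorted(x), 1)) / (n * (n - 1) / 2)

random.seed(1)
n, reps = 200, 2000
estimates = [gini([random.gauss(0, 1) for _ in range(n)]) for _ in range(reps)]

mean = sum(estimates) / reps
var = sum((e - mean) ** 2 for e in estimates) / (reps - 1)
delta = 2 / sqrt(pi)              # true Delta for the standard normal (Table 1)
sv_hat = n * var / delta ** 2     # should land near .511
assert 0.40 < sv_hat < 0.65
```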
In Table 3 we list the efficiencies of Δ̂ with respect to the standard deviation T̂_2 for various members of the Tukey normal gross error distributions, F(x) = (1-ε)Φ(x) + εΦ(x/λ), where Φ is the standard normal. This table is to be compared with Table 5.3 of Bickel and Lehmann (1976) which lists eff(T̂_1, T̂_2) and eff(T̂_1.5, T̂_2) for the same (ε, λ) combinations. For these distributions Δ̂ appears to perform better than T̂_2 and T̂_1.5, but not quite as well as T̂_1.
Table 2. Standardized asymptotic variances and relative efficiencies for Gini's mean difference Δ̂, the standard deviation T̂_2, and for the mean deviation T̂_1.

                 Uniform   Normal   Logistic   Laplace   Exponential
  sv(Δ̂)           .200     .511      .710      1.037       1.333
  sv(T̂_2)         .200     .500      .800      1.250       2.000
  sv(T̂_1)         .333     .571      .712      1.000       1.437
  eff(Δ̂, T̂_2)    1.000     .978     1.127      1.205       1.500
  eff(Δ̂, T̂_1)    1.667    1.117     1.003       .964       1.077

Table 3. Asymptotic relative efficiency of Gini's mean difference with respect to the standard deviation for Tukey normal gross error distributions, F(x) = (1-ε)Φ(x) + εΦ(x/λ).

  ε\λ       2        4        6
  0        .978     .978     .978
  .025    1.122    2.581    3.746
  .075    1.222    1.975    1.957
  .10     1.231    1.742    1.624
  .20     1.190    1.262    1.108
  .40     1.077     .991     .900
  .50     1.039     .952     .884
Finally we note that Δ̂ is not robust in the strict Hampel (1971) sense. However, any member of the σ_J family can be suitably trimmed to provide such robustness.
3. TWO SAMPLE TESTS
In normal samples the distribution of Δ̂ can be closely approximated by a χ, Ramasubban (1956), or by a Pearson type III curve, Barnett, Mullen, and Saw (1967). For other parent distributions the distribution of Δ̂ is unknown and not easily obtained. Nevertheless, the asymptotic variance of Δ̂ is stabilized within scale families by the log transformation, and Miller (1968) has shown how to use the jackknife, coupled with the log transformation, to provide approximate t statistics. We shall take this approach.
For a sample X_1, ..., X_n and a scale statistic θ̂ we compute log θ̂ and log θ̂_{-i}, i = 1, ..., n, where θ̂_{-i} is the statistic calculated with the ith observation missing. We form the pseudo-observations

    θ̃_i = n log θ̂ - (n-1) log θ̂_{-i},

the jackknife estimate of log θ

    θ̃. = n^{-1} Σ_{i=1}^n θ̃_i,

and the jackknife estimate of variance of θ̃.

    σ̂²(θ̃.) = [n(n-1)]^{-1} Σ_{i=1}^n (θ̃_i - θ̃.)².

Then (θ̃. - log θ)/{σ̂²(θ̃.)}^{1/2} is approximately t distributed with n-1 degrees of freedom. For two samples of common size n we compute θ̃._1, θ̃._2, σ̂²(θ̃._1), σ̂²(θ̃._2), and form our test statistic

    T = (θ̃._1 - θ̃._2)/{σ̂²(θ̃._1) + σ̂²(θ̃._2)}^{1/2},

which should be approximately t distributed with 2(n-1) degrees of freedom under the null hypothesis of equal scale.
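The construction above is straightforward to sketch in code. The following is our own illustration, not the paper's program: the function names are ours, Gini's mean difference serves as the scale statistic, and the O(n²) pairwise form of Δ̂ is used for clarity.

```python
# Jackknifed log-scale two-sample t statistic with Gini's mean difference.
from itertools import combinations
from math import log, sqrt

def gini(x):
    n = len(x)
    return sum(abs(a - b) for a, b in combinations(x, 2)) / (n * (n - 1) / 2)

def jackknife_log(x, stat=gini):
    """Pseudo-values of log stat(x); returns (estimate, variance estimate)."""
    n = len(x)
    full = log(stat(x))
    pseudo = [n * full - (n - 1) * log(stat(x[:i] + x[i + 1:]))
              for i in range(n)]
    est = sum(pseudo) / n
    var = sum((p - est) ** 2 for p in pseudo) / (n * (n - 1))
    return est, var

def two_sample_t(x, y, stat=gini):
    """Approximate t statistic for equal scale, ~t with 2(n-1) df."""
    e1, v1 = jackknife_log(x, stat)
    e2, v2 = jackknife_log(y, stat)
    return (e1 - e2) / sqrt(v1 + v2)

x = [0.1, -0.4, 0.9, 1.3, -0.8, 0.2, -1.1, 0.5, 0.7, -0.3]
y = [3 * v for v in x]          # same shape, three times the scale
t = two_sample_t(x, y)          # negative: the second sample has larger scale
```

Because the statistic works on log θ̂, multiplying a sample by a constant shifts every pseudo-value by the log of that constant, which is exactly the scale-family variance stabilization the test relies on.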
Using the McGill "Super-Duper" random number generator, 1000 pairs of independent samples of size n_1 = n_2 = 10 and n_1 = n_2 = 25 were generated for selected families. The statistic T was calculated for the mean difference Δ̂, the usual sample variance s², the 1.5th power deviation n^{-1} Σ_{i=1}^n |X_i - X̄|^{1.5}, and for the sample mean deviation n^{-1} Σ_{i=1}^n |X_i - X̄|. Each was checked against the .05 and .01 percentage points of a t distribution, and the classical F test was also performed. Tables 4 and 5 list the empirical powers for ratios of variances σ_1²/σ_2² = 1, 2, 4, 6, 10. The tables are intended to resemble Tables I and II of Miller (1968) to facilitate comparisons.
The F test shows its usual sensitivity to nonnormality and we will not mention it further. In Table 4 we observe that all tests are conservative for the uniform, with powers in the approximate order s² > Δ̂ > 1.5th deviation > mean deviation. For the normal s² dominates, but little is lost by using Δ̂ or the 1.5th deviation. As somewhat of a surprise s² continues to do well for the logistic, with the 1.5th deviation slightly beating Δ̂ and all three dominating the mean deviation. For the Laplace s² tends to be too liberal, and a very slight ordering appears among the remaining three, 1.5th deviation > Δ̂ > mean deviation. Perhaps, if the median had been used rather than X̄, the mean deviation would have performed better. For the exponential all tests are too liberal with s² the worst of the four; Δ̂ and the 1.5th deviation are indistinguishable
Table 4. Empirical powers for the two sample tests of scale; n_1 = n_2 = 10.

                                  α = .05                          α = .01
  σ_1²/σ_2²:             1     2     4     6    10        1     2     4     6    10

  Uniform
    F                  .016  .186  .748  .922  .991     .003  .037  .347  .686  .928
    Sample variance    .023  .286  .783  .927  .985     .005  .080  .476  .717  .916
    Sample 1.5th power .021  .257  .726  .895  .977     .003  .060  .392  .659  .871
    Sample mean dev.   .031  .225  .621  .809  .929     .004  .052  .320  .537  .774
    Gini's mean diff.  .021  .261  .749  .914  .979     .005  .066  .423  .686  .896

  Normal
    F                  .053  .265  .672  .844  .955     .011  .079  .379  .625  .854
    Sample variance    .045  .194  .550  .734  .893     .006  .062  .232  .445  .676
    Sample 1.5th power .040  .195  .553  .738  .895     .008  .068  .223  .428  .680
    Sample mean dev.   .044  .197  .500  .709  .863     .006  .065    ?   .387  .634
    Gini's mean diff.  .038  .193  .539  .741  .895       ?   .063  .224  .426  .667

  Logistic
    F                  .090  .325  .650  .798  .935     .028  .135  .434  .614  .808
    Sample variance    .053  .217  .505  .664  .811     .009  .081  .246  .404  .593
    Sample 1.5th power .041  .203  .499  .660  .819     .009  .058  .231  .394  .587
    Sample mean dev.   .036  .194  .470  .641  .797     .009  .046  .210  .363  .562
    Gini's mean diff.  .040  .204  .493  .655  .811     .007  .060  .224  .382  .576

  Laplace
    F                  .146  .359  .643  .780  .897     .064  .201  .441    ?   .783
    Sample variance    .069  .201  .423  .549  .716     .025  .079  .204  .310  .460
    Sample 1.5th power .060  .198  .420  .567  .722     .018  .064  .190  .305  .468
    Sample mean dev.   .060  .188  .411  .554  .720       ?   .054  .195  .294  .452
    Gini's mean diff.  .059  .182  .400  .554  .723       ?   .066  .179  .297  .448

  Exponential
    F                  .195  .361  .623  .734  .860     .093  .245  .427  .582  .744
    Sample variance    .076  .208  .368  .482  .626     .021  .085  .193  .280  .395
    Sample 1.5th power .069  .195  .354  .462  .597     .017  .065  .171  .267  .365
    Sample mean dev.   .076  .196  .370  .489  .641     .019  .073  .198  .282  .399
    Gini's mean diff.  .094  .219  .388  .496  .630     .030  .102  .213  .294  .420

  (A ? marks an entry illegible in the source.)
Table 5. Empirical powers for the two sample tests of scale; n_1 = n_2 = 25.

                                  α = .05                          α = .01
  σ_1²/σ_2²:             1     2     4     6    10        1     2     4     6    10

  Normal
    F                  .066  .542  .962  .995 1.000     .012  .273  .860  .981  .999
    Sample variance    .056  .483  .938  .993 1.000     .013  .250  .804  .950  .997
    Sample 1.5th power .051  .479  .937  .994 1.000     .010  .234  .800  .955  .997
    Sample mean dev.   .055  .450  .920  .987  .999     .011  .203  .751  .937  .995
    Gini's mean diff.  .052  .477  .939  .993  .999     .010  .230  .798  .954  .998

  Logistic
    F                  .103  .493  .929  .990 1.000     .023  .276  .803  .959  .996
    Sample variance    .053  .381  .843  .956  .989     .010  .172  .626  .843  .961
    Sample 1.5th power .045  .375  .874  .972  .998     .008  .165  .626  .881  .983
    Sample mean dev.   .044  .375  .864  .971  .999     .006  .155  .609  .877  .985
    Gini's mean diff.  .041  .377  .873  .974  .998     .008  .164  .630  .880  .987

  Laplace
    F                  .137  .515  .891  .971  .997     .062  .328  .786  .921  .989
    Sample variance    .058  .312  .733  .881  .963     .011  .130  .465  .688  .877
    Sample 1.5th power .051  .311  .750  .903  .983     .006  .124  .474  .724  .906
    Sample mean dev.   .048  .307  .756  .914  .985     .009  .115  .475  .728  .927
    Gini's mean diff.  .046  .301  .762  .910  .984     .005  .116  .476  .728  .915
but both dominate the mean deviation. The same pattern emerges in Table 5 for n_1 = n_2 = 25, except that s² is no longer a winner for the logistic, and Δ̂ and the 1.5th deviation are virtually indistinguishable for the logistic and Laplace as well as for the normal.
4. CONCLUDING REMARKS
Gini's mean difference has high asymptotic efficiency for scale estimation in symmetric distributions ranging from uniform to Laplace. Indications are that it will perform adequately in skewed situations as well. Two sample tests for scale can easily be based on log Δ̂ and tend to perform as well as those based on the log 1.5th deviation.
REFERENCES

Barnett, F. C., Mullen, K. and Saw, J. G. (1967). Linear estimates of a population scale parameter. Biometrika 54, 551-4.

Bickel, P. J. and Lehmann, E. L. (1976). Descriptive statistics for nonparametric models. III. Dispersion. Ann. Statist. 4, 1139-58.

D'Agostino, R. B. (1970). Linear estimation of the normal distribution standard deviation. Am. Statist. 24, 14-5.

D'Agostino, R. B. (1971). An omnibus test of normality for moderate and large size samples. Biometrika 58, 341-8.

David, H. A. (1968). Gini's mean difference rediscovered. Biometrika 55, 573-5.

Downton, F. (1966). Linear estimates with polynomial coefficients. Biometrika 53, 129-41.

Hampel, F. R. (1971). A general qualitative definition of robustness. Ann. Math. Statist. 42, 1887-96.

Johnson, N. L. and Kotz, S. (1970). Continuous Univariate Distributions-1. Boston: Houghton Mifflin.

Katti, S. K. (1960). Moments of the absolute difference and absolute deviation of discrete distributions. Ann. Math. Statist. 31, 78-85.

Miller, R. G. (1968). Jackknifing variances. Ann. Math. Statist. 39, 567-82.

Nair, U. S. (1936). The standard error of Gini's mean difference. Biometrika 28, 428-36.

Ramasubban, T. A. (1956). A χ-approximation to Gini's mean difference. J. Indian Soc. Agric. Statist. 8, 116-21.

Sarhan, A. E. (1954). Estimation of the mean and standard deviation by order statistics. Ann. Math. Statist. 25, 317-28.

Sarhan, A. E. (1955). Estimation of the mean and standard deviation by order statistics. II. Ann. Math. Statist. 26, 505-11.

von Andrae (1872). Ueber die Bestimmung des wahrscheinlichen Fehlers durch die gegebenen Differenzen von gleich genauen Beobachtungen einer Unbekannten. Astron. Nachr. 79, 257-72.

Wainer, H. and Thissen, D. (1976). Three steps toward robust regression. Psychometrika 41, 9-34.