Boos, Dennis D. (1978), "A New Method for Constructing Approximate Confidence Intervals for M-Estimates."

A NEW METHOD FOR CONSTRUCTING APPROXIMATE
CONFIDENCE INTERVALS FOR M-ESTIMATES
by
Dennis D. Boos
Department of Statistics
North Carolina State University
Institute of Statistics Mimeo Series #1198
September, 1978
The empirical function used to define M-estimates of location is
similar to a distribution function when $\psi$ is nondecreasing. This
similarity allows approximate confidence intervals to be constructed
from the "percentiles" of the defining function.
KEY WORDS: M-estimates; Confidence intervals; Quantiles; t statistic.
1. INTRODUCTION
Let $X_1, \ldots, X_n$ be a sample from a distribution $F$ and define the
location "parameter" $\theta$ to be the solution of
$$\int_{-\infty}^{\infty} \psi(x-\theta)\,dF(x) = 0. \tag{1.1}$$
An M-estimate for $\theta$ is the solution $\hat\theta$ of the empirical
analogue to (1.1),
$$\frac{1}{n}\sum_{i=1}^{n} \psi(X_i-\hat\theta) = 0. \tag{1.2}$$
Asymptotic properties of $\hat\theta$ are well-known and the Princeton study
(Andrews, et al. 1972) suggests that $\sqrt{n}(\hat\theta-\theta)$ approaches
normality fairly quickly.
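As a rough illustration of (1.2), the following Python sketch solves the
empirical equation by bisection for Huber's $\psi$. The function names, the
tuning constant k = 1.5, and the data are assumptions made for this example
only; bisection is simply one convenient root finder for a monotone
estimating function, not a method taken from the paper.

```python
def huber_psi(t, k=1.5):
    """Huber's psi(t) = max(-k, min(k, t)); k = 1.5 is an assumed default."""
    return max(-k, min(k, t))


def m_estimate(xs, psi=huber_psi, tol=1e-10):
    """Solve sum(psi(x_i - c)) = 0 for c by bisection.

    Assumes psi is nondecreasing with psi(t) > 0 for t > 0 and
    psi(t) < 0 for t < 0, so the sum is nonincreasing in c and
    changes sign on [min(xs), max(xs)].
    """
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(psi(x - mid) for x in xs) > 0:
            lo = mid  # estimating function still positive: root is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical data with one gross outlier.
sample = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.1, 9.0]
theta_hat = m_estimate(sample)
```

For these made-up data the estimate stays near the bulk of the sample,
whereas the sample mean is pulled toward the outlier.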
Huber (1970), Gross (1976, 1977), and Shorack (1976) have constructed
approximate confidence intervals for $\theta$ based on studentization
of $\sqrt{n}(\hat\theta-\theta)$ by estimates of the asymptotic standard
deviation.
In this paper a new method of constructing approximate confidence
intervals for $\theta$ is proposed for the special class of monotone
nondecreasing, right continuous $\psi$ functions. The method exploits the
fact that
$$\lambda_{F_n}(c) = -n^{-1}\sum_{i=1}^{n}\psi(X_i-c)$$
is like a distribution function (i.e., nondecreasing and right continuous).
In particular, the endpoints of the proposed confidence interval are
"percentiles" of $\lambda_{F_n}$.
Section 2 gives a motivating example and Section 3 provides the basic
ideas and method.
In Section 4 Monte Carlo results and comparisons with
other results are mentioned. Section 5 shows how to extend to the
regression situation and Section 6 is a short summary.
2. MOTIVATION FROM QUANTILE ESTIMATION
Let $F^{-1}(p) = \inf\{x : F(x) \ge p\}$. Consider estimation of the pth
quantile $F^{-1}(p)$, $0<p<1$, from a sample having distribution $F$. If
$F_n$ is the usual empirical df, then
$$P\bigl(F_n^{-1}(a) \le F^{-1}(p) < F_n^{-1}(b)\bigr) = P\bigl(a \le F_n(F^{-1}(p)) < b\bigr),$$
using the fact that all df's $G$ satisfy $G^{-1}(t) \le x$ iff $t \le G(x)$.
For independent $X_i$ the statistic $nF_n(F^{-1}(p))$ is binomial
$(n, F(F^{-1}(p)))$. Thus the normal approximation to the binomial and the
assumption $F(F^{-1}(p)) \approx p$ lead us to choose
$$a = p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}, \qquad b = p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
for an approximate $(1-\alpha)$ confidence interval for $F^{-1}(p)$.
($z_\alpha$ is the $100(1-\alpha)$th percentile of the standard normal.)
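The quantile interval just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the helper
names are hypothetical, the value z = 1.959964 is the standard normal
97.5th percentile (so $\alpha = .05$), and levels falling outside (0, 1]
are clamped to the sample range, a convention chosen here for simplicity.

```python
import math


def empirical_quantile(sorted_xs, t):
    """F_n^{-1}(t) = inf{x : F_n(x) >= t}, clamped to the sample range."""
    n = len(sorted_xs)
    k = math.ceil(n * t - 1e-12)  # smallest k with k/n >= t (guards float noise)
    k = min(max(k, 1), n)         # assumed convention for t outside (0, 1]
    return sorted_xs[k - 1]


def quantile_interval(xs, p, z):
    """Approximate interval for F^{-1}(p) from percentiles of F_n."""
    sorted_xs = sorted(xs)
    n = len(sorted_xs)
    half = z * math.sqrt(p * (1.0 - p) / n)
    return (empirical_quantile(sorted_xs, p - half),
            empirical_quantile(sorted_xs, p + half))


# Hypothetical data: the integers 1..100; interval for the median.
data = list(range(1, 101))
low, high = quantile_interval(data, p=0.5, z=1.959964)  # gives (41, 60)
```

Only order statistics are needed, which is what makes the idea portable to
other nondecreasing, right continuous empirical functions.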
Although exact nonparametric procedures exist for iid samples from
continuous distributions, the above method generalizes to quantile
estimation in more complicated situations, e.g., stratified sampling from
finite populations. The important point for the present discussion is that
M-estimation can use the same idea with $F_n$ replaced by $\lambda_{F_n}$.
3. APPROXIMATE CONFIDENCE INTERVALS FOR $\theta$
Let $\psi(t)$ be nondecreasing, right continuous, and strictly positive
(negative) for large positive (negative) values of $t$. Two families of
such $\psi$ are "Hubers" $\psi(x) = \max(-k, \min(k,x))$ and "vth power"
$\psi(x) = |x|^{v}\,\mathrm{sgn}(x)$, $0 < v \le 1$. For df's $G$ define
$$\lambda_G(c) = -\int_{-\infty}^{\infty}\psi(x-c)\,dG(x), \qquad -\infty<c<\infty,$$
and
$$\lambda_G^{-1}(t) = \inf\{x : \lambda_G(x) \ge t\}, \qquad t \in \Bigl(\inf_{-\infty<x<\infty}\lambda_G(x),\ \sup_{-\infty<x<\infty}\lambda_G(x)\Bigr).$$
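The parameter and estimate are defined by $\theta = \lambda_F^{-1}(0)$ and
$\hat\theta = \lambda_{F_n}^{-1}(0)$. Similar to the case of df's it follows
that $\lambda_G^{-1}(t) \le x$ iff $t \le \lambda_G(x)$, and thus
$$P\bigl(\lambda_{F_n}^{-1}(a) \le \theta < \lambda_{F_n}^{-1}(b)\bigr) = P\bigl(a \le \lambda_{F_n}(\theta) < b\bigr).$$
The statistic of interest,
$$T_n = \frac{\sqrt{n}\,\lambda_{F_n}(\theta)}{\sigma_n}, \qquad\text{where}\qquad \sigma_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}\psi^2(X_i-\hat\theta)$$
is a reasonable estimate of $\int\psi^2(x-\theta)\,dF(x)$, has a form very
close to a t statistic based on the rv's $\psi(X_i-\theta)$. If the $X_i$
are symmetric about $\theta$ and $\psi(x) = -\psi(-x)$, then we expect $T_n$
to be close to a t distribution with $n-1$ degrees of freedom. Choosing
$-\sqrt{n}\,a/\sigma_n = t_{\alpha/2} = \sqrt{n}\,b/\sigma_n$, our proposed
approximate $(1-\alpha)$ confidence interval is
$$\Bigl[\lambda_{F_n}^{-1}\bigl(-t_{\alpha/2}\,\sigma_n/\sqrt{n}\bigr),\ \lambda_{F_n}^{-1}\bigl(t_{\alpha/2}\,\sigma_n/\sqrt{n}\bigr)\Bigr]. \tag{3.1}$$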
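To make the construction concrete, here is a minimal Python sketch of an
interval of the form (3.1) for Huber's $\psi$ without a scale estimate. It
is a sketch under assumptions, not the author's program: the function
names, the bisection inverse, the divisor n - 1 in the variance estimate,
the data, and the hard-coded critical value 2.262157 (the upper 2.5 percent
point of t with 9 degrees of freedom) are all choices made for this example.

```python
import math


def huber_psi(t, k=1.5):
    """Huber's psi(t) = max(-k, min(k, t)); k = 1.5 is an assumed default."""
    return max(-k, min(k, t))


def lam(xs, c, psi):
    """lambda_{F_n}(c) = -(1/n) * sum(psi(x_i - c)), nondecreasing in c."""
    return -sum(psi(x - c) for x in xs) / len(xs)


def lam_inverse(xs, t, psi, tol=1e-10):
    """Approximate inf{c : lambda_{F_n}(c) >= t} by bisection."""
    lo, hi, step = min(xs), max(xs), 1.0
    for _ in range(60):  # widen until lam(lo) < t <= lam(hi)
        if lam(xs, lo, psi) < t <= lam(xs, hi, psi):
            break
        lo, hi, step = lo - step, hi + step, 2.0 * step
    else:
        raise ValueError("t lies outside the range of lambda_{F_n}")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lam(xs, mid, psi) >= t:
            hi = mid
        else:
            lo = mid
    return hi


def m_interval(xs, t_crit, psi=huber_psi):
    """Return (lower, theta_hat, upper) with endpoints as 'percentiles'."""
    n = len(xs)
    theta_hat = lam_inverse(xs, 0.0, psi)
    sigma_n = math.sqrt(sum(psi(x - theta_hat) ** 2 for x in xs) / (n - 1))
    half = t_crit * sigma_n / math.sqrt(n)
    return lam_inverse(xs, -half, psi), theta_hat, lam_inverse(xs, half, psi)


# Hypothetical sample of size 10 with one gross outlier.
sample = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.1, 9.0, 0.5, -0.7, 0.2]
lower, theta_hat, upper = m_interval(sample, t_crit=2.262157)
```

Because a bounded $\psi$ bounds $\lambda_{F_n}$, the sketch raises an error
when the requested level cannot be attained rather than returning an
infinite endpoint.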
Under suitable regularity conditions, the asymptotic width of (3.1)
is comparable to methods based on studentizing
$\sqrt{n}(\hat\theta-\theta)$, i.e., [...]
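It is often desirable that location estimates satisfy
$\hat\theta(aX_1+b, \ldots, aX_n+b) = a\,\hat\theta(X_1, \ldots, X_n) + b$.
For M-estimates the usual procedure is to replace $\psi(x)$ by
$\psi(x/\hat\sigma)$, where $\hat\sigma$ is a suitable scale estimate, or
solve simultaneous equations as in Huber's Proposal 2.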
The above methods carry through exactly and the analogous statistic of
interest is
$$T_\sigma = \frac{n^{-1/2}\sum_{i=1}^{n}\psi\bigl((X_i-\theta)/\hat\sigma\bigr)}{\Bigl[\frac{1}{n-1}\sum_{i=1}^{n}\psi^2\bigl((X_i-\hat\theta)/\hat\sigma\bigr)\Bigr]^{1/2}}. \tag{3.2}$$
4. COMPARISONS AND MONTE CARLO RESULTS
For small samples the form (3.2) is more appealing than
$\sqrt{n}(\hat\theta-\theta)/S$, where $S$ is an estimate of the asymptotic
standard deviation $[\mathrm{Var}\,IC(X_1)]^{1/2}$, for the following reason.
Although $\hat\theta-\theta$ is approximated by $n^{-1}\sum IC(X_i)$, Boos
(1977) shows that this approximation is at best $O_p(n^{-1})$ since
$n\bigl[\hat\theta-\theta-n^{-1}\sum IC(X_i)\bigr]$ has a limit distribution.
Thus, proximity of $\sqrt{n}(\hat\theta-\theta)/S$ to a t distribution
depends on the t-like statistic $n^{-1/2}\sum IC(X_i)/S$ and the
approximation of $\hat\theta-\theta$ by $n^{-1}\sum IC(X_i)$. In fact
Gross (1976) prefers to avoid use of the t distribution. On the other hand
Shorack (1976) seems to get very good t approximations for certain Hampels.
In order to spot check the performance of the approximate confidence
intervals based on (3.2), a small Monte Carlo study was performed. In
Table 1 are found the empirical error probabilities and $\sqrt{n}$ times
the expected confidence interval lengths (ECIL) for 10,000 Monte Carlo
"samples" generated by the McGill "Super-Duper" random number generator.
A different set of 10,000 samples was used for each distribution (normal,
logistic, D-EXP = double exponential, T3 = t distribution with 3 degrees
of freedom, slash = a standard normal deviate divided by an independent
uniform (0,1) deviate) and for each sample size, n = 10 and n = 20. Only
crude Monte Carlo techniques were used, so considerable error may exist in
the 3rd decimal of the empirical probabilities and in the 2nd decimal of
the ECIL. This is exemplified by the mean whose exact error probability we
know to be .05 for the normal. SQRT is the M-estimator based on
$\psi(x) = |x|^{1/2}\,\mathrm{sgn}(x)$, and Hk = 1.0, 1.5 are Hubers with
k = 1.0, 1.5 using a normalized interquartile range as an estimate of
scale. Hk* = 1.5 is Huber's Proposal 2 with k = 1.5. For both n = 10 and
n = 20 the true levels are generally conservative, but Hk = 1.5 and
Hk* = 1.5 are fairly close to .05 except for the slash distribution and
each has reasonably short ECIL. It is mildly surprising that the mean is so
Table 1. Empirical Error Probabilities and Expected 95-Percent
Confidence Interval Lengths (multiplied by $\sqrt{n}$)

                            n = 10                                       n = 20
Estimator    Normal Logistic    D-Exp       T3    Slash   Normal Logistic    D-Exp       T3    Slash

a. Empirical Error Probabilities
Mean           .054     .048     .045     .039     .022     .055     .046     .049     .042     .020
SQRT           .039     .033     .030     .029     .017     .048     .043     .040     .040     .026
Hk=1.0         .046     .040     .035     .036     .031     .055     .048     .045     .048     .043
Hk=1.5         .059     .050     .046     .044     .033     .056     .050     .048     .049     .040
Hk*=1.5        .060     .053     .048     .046     .036     .058     .053     .050     .051     .039

b. Expected 95-Percent Confidence Interval Lengths (multiplied by $\sqrt{n}$)
Mean           4.39     4.34     4.25     6.88   193.79     4.13     4.09     4.06     6.62   128.90
SQRT           4.93     4.69     4.34     6.73    84.35     4.45     4.17     3.73     5.76    32.15
Hk=1.0         4.97     4.65     4.14     6.28    14.60     4.41     4.08     3.54     5.34    11.36
Hk=1.5         4.51     4.30     3.94     5.95    14.61     4.21     3.99     3.62     5.37    12.22
Hk*=1.5        4.45     4.22     3.88     5.84    14.81     4.16     3.93     3.60     5.32    12.62
close to .05 from normal to T3, though the ECIL are expectedly large for
heavy tails.
SQRT seems to perform worst over all.
Table 2 represents Monte Carlo estimates of the percentiles of $T_\sigma$.
The percentiles tend to be larger than those of a t distribution for the
normal and generally smaller for the heavier-tailed distributions. For
n = 20 and $\alpha = .05$ all estimates except for SQRT and the mean
evaluated at the slash distribution are very close to $t_{.05} = 1.73$
(we should note that the method of calculating the estimated percentiles
resulted in considerable error in the second decimal place).
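A small simulation in the same spirit can be sketched in Python. It is not
a reproduction of the study reported here: it uses the unscaled statistic
$T_n$ of Section 3 rather than (3.2), Python's built-in generator rather
than "Super-Duper", only 2,000 samples, Huber's $\psi$ with k = 1.5, and
the critical value 2.262157 for 9 degrees of freedom; all names are
hypothetical. Coverage is checked through the identity of Section 3, so
the interval endpoints never need to be computed.

```python
import math
import random


def huber_psi(t, k=1.5):
    """Huber's psi(t) = max(-k, min(k, t)); k = 1.5 is an assumed default."""
    return max(-k, min(k, t))


def m_estimate(xs, psi, tol=1e-8):
    """Bisection root of sum(psi(x_i - c)) = 0 on [min(xs), max(xs)]."""
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(psi(x - mid) for x in xs) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def slash(rng):
    """Standard normal deviate over an independent uniform deviate."""
    return rng.gauss(0.0, 1.0) / (1.0 - rng.random())  # 1 - U lies in (0, 1]


def error_rate(draw, n, reps, t_crit, psi=huber_psi, theta=0.0, seed=1978):
    """Fraction of samples whose interval misses the true theta."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(reps):
        xs = [draw(rng) for _ in range(n)]
        theta_hat = m_estimate(xs, psi)
        sigma_n = math.sqrt(sum(psi(x - theta_hat) ** 2 for x in xs) / (n - 1))
        t_stat = -sum(psi(x - theta) for x in xs) / (math.sqrt(n) * sigma_n)
        # theta is covered exactly when -t_crit <= T_n < t_crit
        if not (-t_crit <= t_stat < t_crit):
            misses += 1
    return misses / reps


normal_rate = error_rate(lambda rng: rng.gauss(0.0, 1.0), n=10, reps=2000,
                         t_crit=2.262157)
slash_rate = error_rate(slash, n=10, reps=2000, t_crit=2.262157)
```

With so few replications the third decimal of either rate is not reliable,
so the output should be read only as a rough check on the nominal .05 level.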
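5. REGRESSION
The Huber (1973) regression model is
$$X_i = \sum_{j=1}^{p} c_{ij}\theta_j + U_i,$$
where $E(U_i) = 0$ and the $c_{ij}$ are known coefficients. Let
$(\hat\theta_1, \ldots, \hat\theta_p, \hat\sigma)$ be solutions of
$$\sum_{i=1}^{n}\psi\Bigl(\frac{X_i-\sum_{k=1}^{p}c_{ik}\hat\theta_k}{\hat\sigma}\Bigr)c_{ij} = 0, \qquad j = 1, \ldots, p,$$
$$\frac{1}{n-p}\sum_{i=1}^{n}\psi^2\Bigl(\frac{X_i-\sum_{k=1}^{p}c_{ik}\hat\theta_k}{\hat\sigma}\Bigr) = \beta.$$
Define
$$Q_{n,r}(t) = -\sum_{i=1}^{n}\psi\Bigl(\frac{X_i-\sum_{k=1,\,k\ne r}^{p}c_{ik}\hat\theta_k - c_{ir}t}{\hat\sigma}\Bigr)c_{ir}, \qquad r = 1, \ldots, p.$$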
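Then $\hat\theta_r = Q_{n,r}^{-1}(0)$ and
$$P\bigl(Q_{n,r}^{-1}(a) \le \theta_r < Q_{n,r}^{-1}(b)\bigr) = P\bigl(a \le Q_{n,r}(\theta_r) < b\bigr).$$
By Taylor expansion in $\hat\theta_r$ we find
$$Q_{n,r}(\theta_r) = -\sum_{i=1}^{n}\psi\Bigl(\frac{X_i-\sum_{k=1}^{p}c_{ik}\hat\theta_k + c_{ir}(\hat\theta_r-\theta_r)}{\hat\sigma}\Bigr)c_{ir}$$
$$= -\sum_{i=1}^{n}\psi\Bigl(\frac{X_i-\sum_{k=1}^{p}c_{ik}\hat\theta_k}{\hat\sigma}\Bigr)c_{ir} + \frac{1}{\hat\sigma}\sum_{i=1}^{n}\psi'\Bigl(\frac{X_i-\sum_{k=1}^{p}c_{ik}\hat\theta_k + c_{ir}\theta^*}{\hat\sigma}\Bigr)c_{ir}^2\,(\theta_r-\hat\theta_r).$$
The first term in the above expression is 0, $\hat\theta_r \xrightarrow{p} \theta_r$,
and $\theta^* \xrightarrow{p} 0$. Thus, under suitable regularity conditions,
$n^{-1/2}Q_{n,r}(\theta_r)$ is asymptotically normal with mean 0 and
variance [...]. An approximate confidence interval for $\theta_r$ is
$$\Bigl[Q_{n,r}^{-1}\bigl(-t_{\alpha/2}\,\sigma_n\sqrt{n}\bigr),\ Q_{n,r}^{-1}\bigl(t_{\alpha/2}\,\sigma_n\sqrt{n}\bigr)\Bigr],$$
where [...]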
The advantage of this method over the usual methods is not clear (the
simplicity of the location model is gone!). Note though, that use of
$\psi'$ is not required and that for least absolute value regression the
above method circumvents an estimate of $f(F^{-1}(\tfrac{1}{2}))$.
6. SUMMARY AND CONCLUSIONS
A new procedure for constructing confidence intervals for a location
parameter has been proposed which exploits the monotonicity of a class of
$\psi$ functions. The distributional problem is reduced to consideration of
a t-like statistic and Monte Carlo results verify that "Hubers" perform
fairly well over a range of distributions and for samples of size n = 10
and n = 20.
REFERENCES
Andrews, D. F., et al. (1972), Robust Estimation of Location: Survey
and Advances, Princeton, N. J.: Princeton University Press.
Boos, Dennis D. (1977), "Limiting Second Order Distributions for First
Order Functionals, with Application to L- and M-Statistics,"
Institute of Statistics Mimeo Series #1152, North Carolina State
University, Raleigh, N. C.
Gross, Alan M. (1976), "Confidence Interval Robustness with Long-Tailed
Symmetric Distributions," Journal of the American Statistical
Association, 71, 409-416.
Gross, Alan M. (1977), "Confidence Intervals for Bisquare Regression
Estimates," Journal of the American Statistical Association, 72,
341-354.
Huber, Peter J. (1970), "Studentizing Robust Estimates," in Nonparametric
Techniques in Statistical Inference, ed. Madan L. Puri,
Cambridge: Cambridge University Press, 453-463.
Huber, Peter J. (1973), "Robust Regression: Asymptotics, Conjectures,
and Monte Carlo," Annals of Statistics, 1, 799-821.
Shorack, Galen R. (1976), "Robust Studentization of Location Estimates,"
Statistica Neerlandica, 30, 119-142.
