Yin, Yin (1988). "Edgeworth Expansion and Tests Concerning Heteroscedasticity."

EDGEWORTH EXPANSION
AND TESTS CONCERNING HETEROSCEDASTICITY
by
Yin Yin
A dissertation submitted to the faculty of the
University of North Carolina at Chapel Hill in
partial fulfillment of the requirements for the
degree of Doctor of Philosophy in the Department
of Statistics
Chapel Hill
1988
Approved by
Advisor
Reader
Reader
YIN YIN: Edgeworth Expansion and Tests Concerning Heteroscedasticity.
(Under the direction of RAYMOND J. CARROLL.)
Heteroscedastic regression models are used to analyze data in a variety of fields, including economics, engineering and the biological and physical sciences. This dissertation consists of two topics related to heteroscedastic regression models. The first topic is the Edgeworth expansion for studentized generalized least-squares estimators. The second topic is testing for heteroscedasticity in linear models.

Often, the heteroscedasticity is modeled as a function of the regression and other structural parameters. Edgeworth expansion is used to compare the distribution of the generalized least-squares estimator (GLSE), when the variance function is estimated by a reasonable estimator, with the normal distribution. When the errors in the model are normally distributed, the nonnormal term in the expansion arises at the order $n^{-2}$. This means that the normal distribution is a very good approximation for the distribution of the GLSE with estimated variance. In practice, we need to construct confidence intervals and to test hypotheses about the regression parameters, and therefore we need the distribution of the studentized GLSE. Our study shows that in the Edgeworth expansion of the studentized GLSE, the non-$t$ term arises at order $n^{-1}$. Therefore, when $n$ is not large enough, the distribution of the studentized GLSE is not very well approximated by the $t$-distribution.

In the second topic, we present expressions for the power and the limit distribution of the Spearman rank correlation coefficient test, for which there were previously only numerical results, and we show how symmetry of the errors affects the level and the power. Finally, we propose a test which is applicable when the errors are asymmetrically distributed.
ACKNOWLEDGEMENTS

I would like to express my deepest gratitude to my advisor, Raymond Carroll, for his guidance, patience and encouragement throughout the preparation of this thesis.

I acknowledge the members of my committee, Edward Carlstein, Indra M. Chakravarti, Norman L. Johnson, J. Stephen Marron and ex-member David Ruppert, for their careful reading and many valuable comments.

I also thank the members of the staff who have been helpful throughout these years, especially June Maxwell, who has answers for all tough questions.

I also want to thank my fellow graduate students, Stena and Marie, for their friendship and advice.

I also want to thank my father and my mother for their encouragement and help. I want to thank my husband for his love and understanding. Finally, I want to thank my little son, for the joy you bring to me.

This work was supported by the US Air Force Office of Scientific Research under Grant AFOSR F49620-85-C-0144.
TABLE OF CONTENTS

PART I. EDGEWORTH EXPANSION OF STUDENTIZED GENERALIZED LEAST-SQUARES ESTIMATES

CHAPTER I. INTRODUCTION

CHAPTER II. EDGEWORTH EXPANSION OF THE STUDENTIZED GLSE
    2.0 Introduction
    2.1 Edgeworth Expansion
    2.2 Edgeworth Expansion for GLSE
    2.3 Edgeworth Expansion for Studentized GLSE
    2.4 Summary

PART II. TESTING FOR HETEROSCEDASTICITY IN LINEAR MODELS

CHAPTER III. INTRODUCTION
    3.0 Introduction and Overview
    3.1 Model and Basic Assumptions
    3.2 Review of Testing for Heteroscedasticity
    3.3 Problems and Proposed Research

CHAPTER IV. THE SPEARMAN RANK CORRELATION COEFFICIENT TEST
    4.0 Introduction
    4.1 The Limit Distribution of the Test Statistic
    4.2 Power of the Spearman Rank Test
    4.3 The Robustness of the Spearman Rank Test
    4.4 Summary

CHAPTER V. TESTING HETEROSCEDASTICITY WHEN ERRORS ARE ASYMMETRIC
    5.0 Introduction
    5.1 Testing for Heteroscedasticity When Errors Are Symmetric
    5.2 Testing for Heteroscedasticity Without Knowing the Symmetry of the Errors
    5.3 Asymptotic Efficiency
    5.4 Summary

REFERENCES
CHAPTER I
INTRODUCTION

1.0 INTRODUCTION AND OVERVIEW.

In the standard linear statistical model, regression coefficients are most often estimated by the method of least squares. When the errors are normally distributed with a scalar covariance matrix $\sigma^2 I_{n\times n}$, these estimates are also normally distributed and have minimum variance among the class of unbiased estimates. If the error covariance matrix is nonscalar, $\Omega^{-1}$ say, a rotation of the coordinate system can transform the problem to standard form. This is equivalent to estimating the regression coefficients by generalized least squares. Usually the error covariance matrix is unknown and an estimated matrix is used to perform the rotation.

It is commonly assumed that the covariance matrix depends on an unknown parameter $\theta$, a vector of length $k$. To estimate the regression coefficients, we estimate $\theta$ first and then construct a GLS estimator with $\theta$ replaced by $\hat\theta$. It is well known, see Carroll and Ruppert (1988), that if $\theta$ is estimated consistently, the standardized GLS estimates are asymptotically normal and have the same limit distribution as the estimates using the true error covariance matrix. The problem of interest here is to investigate the difference between the distribution of the GLS estimate with $\theta$ replaced by $\hat\theta$ and the normal distribution. In a recent paper, Rothenberg (1984) investigated the difference by Edgeworth expansion techniques and found that the normal distribution is a very good approximation for the GLS with $\theta$ replaced by $\hat\theta$, since the nonnormal term arises at order $n^{-2}$. In order to construct confidence intervals and to test hypotheses about the regression coefficient, we need to know the distribution of the studentized GLS with $\theta$ replaced by $\hat\theta$, since the covariance matrix of the GLS is unknown, see the following sections, and to compare it with the $t$-distribution. Our work will concentrate on the Edgeworth expansion of studentized generalized least-squares estimates. We will compare this with the Edgeworth expansion of the $t$-distribution.
The model we consider here is

(1.0.1)    $y_i = x_i^t\beta + \sigma_i\varepsilon_i$,    $i = 1, \ldots, n$,

where the $y_i$ are responses at design points $x_i$ ($p\times 1$), $\beta$ ($p\times 1$) is the regression coefficient, and the $\varepsilon_i$ are independent and identically distributed (i.i.d.) with mean zero, variance 1 and distribution function $F$. The $\sigma_i$ are positive constants expressing the heteroscedasticity, and can be a function of explanatory and/or exogenous variables,

(1.0.2)    $\sigma_i = \sigma g(x_i, z_i, \beta, \theta)$,

for a function $g$, explanatory variables $x_i$ and possible exogenous variables $z_i$.
The usual procedure for estimating $\beta$ is to estimate the $\sigma_i$ first and then obtain $\hat\beta_{GLS}$, the weighted least-squares estimator with $\sigma_i$ replaced by $\hat\sigma_i$. One popular method, see Carroll and Ruppert (1988), is to obtain a preliminary estimate $\hat\beta_P$ (perhaps $\hat\beta_{LS}$, the least-squares estimator of $\beta$) and an estimate $\hat\theta$, and then use

(1.0.3)    $\hat\beta_{GLS} = \Big(\sum_i x_i x_i^t/\hat\sigma_i^2\Big)^{-1}\sum_i x_i y_i/\hat\sigma_i^2$.

Jobson and Fuller (1980) and Carroll and Ruppert (1982) proved the following result: if $n^{1/2}(\hat\theta - \theta) = O_p(1)$ and $n^{1/2}(\hat\beta_P - \beta) = O_p(1)$, then with (1.0.3), as $n \to \infty$,

(1.0.4)    $n^{1/2}(\hat\beta_{GLS} - \beta) \xrightarrow{L} N(0, \Lambda_{WLS})$,

where $\Lambda_{WLS}$ is the covariance matrix of the weighted least-squares estimator of $\beta$ when the $\sigma_i$ are known. This result is clearly optimistic, and as detailed in Carroll and Ruppert (1988), the actual covariance of $\hat\beta_{GLS}$ can be considerably larger than $\Lambda_{WLS}$. This has been considered in detail by Rothenberg (1984) and more generally by Carroll, Wu and Ruppert (1988). In the normal case, one can show that as $n \to \infty$ the covariance matrix of $n^{1/2}(\hat\beta_{GLS} - \beta)$ behaves like $\Lambda_{WLS} + n^{-1}V$ for a positive definite matrix $V$.
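To make the estimation procedure concrete, the following is a minimal simulation sketch (not part of the original text, assuming NumPy) of the feasible GLS procedure (1.0.3) under the variance model used in Chapter II, $\sigma_i = \sigma e^{h_i\theta}$. The estimator of $\theta$ used here, a regression of log absolute residuals on $h_i$, is one simple consistent choice for illustration, not the maximum likelihood estimator analyzed later.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 200, 2
    X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
    beta = np.array([1.0, 2.0])
    h = rng.uniform(-1, 1, n); h -= h.mean(); h /= np.sqrt((h**2).mean())
    sigma, theta = 0.5, 1.0
    y = X @ beta + sigma*np.exp(h*theta)*rng.standard_normal(n)

    # preliminary (unweighted) least-squares estimate beta_P
    beta_p = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta_p

    # estimate theta: regress log|residual| on h (illustrative choice)
    A = np.column_stack([np.ones(n), h])
    theta_hat = np.linalg.lstsq(A, np.log(np.abs(r) + 1e-12), rcond=None)[0][1]

    # generalized (weighted) least squares with weights 1/sigma_i^2
    w = np.exp(-2*h*theta_hat)
    XtWX = X.T @ (w[:, None]*X)
    beta_gls = np.linalg.solve(XtWX, X.T @ (w*y))
    print(beta_p, beta_gls, theta_hat)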
By Edgeworth expansion techniques, Rothenberg (1984) compared the distribution of $n^{1/2}(\hat\beta_{GLS} - \beta)$ with $N(0, \Lambda_{WLS} + n^{-1}V)$ and found that the difference between them is of order $n^{-2}$. In effect then, the distribution of

$n^{1/2}c^t(\hat\beta - \beta)\big/(c^t\Lambda c)^{1/2}$

(we dropped the subscript GLS on $\hat\beta$) is closely (to order $O(n^{-2})$) approximated by the standard normal distribution. However, this tells us less about inference than one might first imagine. In practice, $\Lambda$ is unknown and must be estimated, by $\hat\Lambda$ say, leading to the studentized statistic

(1.0.5)    $T_n = n^{1/2}c^t(\hat\beta - \beta)\big/(c^t\hat\Lambda c)^{1/2}$.

The question we address is whether the studentized statistic (1.0.5) has a distribution closely approximated by that of a $t$-statistic. By Edgeworth expansion techniques, we find that (1.0.5) differs from a $t$-distribution even up to order $n^{-1}$. Thus the studentized GLS is not approximated by the $t$-distribution very well.
CHAPTER II
EDGEWORTH EXPANSION OF THE DISTRIBUTION OF STUDENTIZED
GENERALIZED LEAST-SQUARES ESTIMATES

2.0 INTRODUCTION.

In this chapter we introduce the model we are interested in, outline Edgeworth expansions, state Rothenberg's result about the Edgeworth expansion for the distribution of the GLSE, and give our result on the Edgeworth expansion for the distribution of the studentized GLSE.
2.1 EDGEWORTH EXPANSION.

In its most basic and common form, an Edgeworth expansion is an expansion for the distribution $F_n$ of a sum of independent random variables $X_1, \ldots, X_n$. For instance, when the $X_i$ are i.i.d. with $EX_1 = 0$, $E|X_1|^3 = \mu_3 < \infty, \ldots, E|X_1|^r = \mu_r < \infty$ and distribution function $F$, then

$F_n(x) = \Phi(x) + \sum_{k=3}^{r} n^{-k/2+1}\,P_k(x)\,\phi(x) + o(n^{-r/2+1})$

uniformly in $x$ as $n \to \infty$. Here the $P_k(x)$ are polynomials depending on $\mu_3, \ldots, \mu_r$ but not on $n$ and $r$ (or otherwise on $F$), and $\Phi$ and $\phi$ are the standard normal distribution function and density function respectively.
In our context, the required distribution function is not for a sum of independent random variables, since $\hat\beta$ depends on $\hat\theta$ and therefore is not a linear function of $\varepsilon_1, \ldots, \varepsilon_n$. We ask the same question: how far is $\hat\beta$ from the normal distribution? Since Rothenberg (1984) only studied normally distributed errors, we also concentrate on normal errors, and therefore the assumptions for the general Edgeworth expansion are exactly satisfied. In addition, when $\theta$ is known, $\hat\beta$ is normally distributed. Any departure of $\hat\beta$ from normality is caused by the estimation of $\theta$ by $\hat\theta$.

Our model is (1.0.1) with $F = \Phi$, the standard normal distribution, and

$\sigma_i = \sigma e^{h_i\theta}$,

where, without loss of generality, we may assume that $\sum_{i=1}^n h_i^2/n = 1$; $\sigma$ and $\theta$ are unknown scalars. This model has been widely used by statisticians in both theoretical and applied areas, see Box (1987), Carroll and Ruppert (1988), McCullagh and Pregibon (1987), Geary (1966), Park (1966) and Kmenta (1971). Since both $\theta$ and $\sigma$ are one-dimensional, estimates in this model are easy to calculate. Also, the form $\sigma e^{h_i\theta}$ for the standard deviation is convenient, as it ensures positive estimated standard deviations.

2.2 EDGEWORTH EXPANSION FOR GLSE.
Rothenberg considered the model

$y = X\beta + \Omega^{-1/2}\varepsilon$,    $\varepsilon^t = (\varepsilon_1, \ldots, \varepsilon_n)$,    $X = [x_1, \ldots, x_n]^t$,

where $\Omega$ is positive definite and the elements of $\Omega$ are known functions of the $k$-dimensional parameter vector $\alpha$ (in our model, $\alpha^t = (\sigma, \theta)$). He also assumed that these functions are differentiable in $\alpha$ up to the sixth order. The parameter vectors $\beta$ and $\alpha$ are unrelated and can vary independently. For all $\alpha$, $\Omega(\theta)$ and $X^t\Omega(\theta)X$ are assumed to be positive definite. If $\theta$ is known to equal $\theta_0$, the best unbiased estimate of $\beta$ is given by the normally distributed vector

$\tilde\beta = [X^t\Omega(\theta_0)X]^{-1}X^t\Omega(\theta_0)y$.

If $\theta$ is unknown, but an estimate $\hat\theta$ is available, we might use instead

$\hat\beta = [X^t\Omega(\hat\theta)X]^{-1}X^t\Omega(\hat\theta)y$.

The distribution of $\hat\beta$ depends, of course, on the choice of the estimate of $\theta$. Rothenberg's result is based on two basic assumptions:

Assumption A: The estimate $\hat\theta$, when written as a function of $X\beta + \Omega^{-1/2}\varepsilon$, does not depend on $\beta$ and is an even function of $\varepsilon$.

This is satisfied by all common estimates of $\theta$ (including the MLE and those based on residuals from a preliminary regression), see Rothenberg (1984).
If $\hat\theta$ satisfies Assumption A, then $\hat\beta - \beta$ is distributed symmetrically about the origin, since

(2.2.3)    $\hat\beta - \beta = [X^t\Omega(\hat\theta)X]^{-1}X^t\Omega(\hat\theta)\,\Omega^{-1/2}\varepsilon$

is an odd function of $\varepsilon$.

From (2.2.3), $\hat\beta - \tilde\beta$ is independent of $\tilde\beta$. To see this, note that by Assumption A the distribution of $\hat\beta - \tilde\beta$ does not depend on $\beta$. Moreover, $\tilde\beta$ is a complete sufficient statistic for $\beta$, and by Basu's theorem (Lehmann, 1983) any statistic whose distribution does not depend on $\beta$ must be distributed independently of $\tilde\beta$.

We shall study linear combinations of the components of our estimates. Let $c$ be any constant $p$-dimensional vector, and let $\Omega$ be the true precision matrix. The standardized statistic $c^t(\tilde\beta - \beta)/[c^t(X^t\Omega X)^{-1}c]^{1/2}$ is normal with mean zero and unit variance. If

(2.2.4)    $c^t(\hat\beta - \tilde\beta)\big/[c^t(X^t\Omega X)^{-1}c]^{1/2}$

converges in probability to zero, then $c^t(\hat\beta - \beta)/[c^t(X^t\Omega X)^{-1}c]^{1/2}$ will be asymptotically normal with mean zero and unit variance. To obtain higher-order asymptotic approximations to the distribution, Rothenberg's second assumption is needed.

Assumption B: The standardized difference between $c^t\hat\beta$ and $c^t\tilde\beta$ can be written as

(2.2.5)    $A_n \equiv c^t(\hat\beta - \tilde\beta)\big/[c^t(X^t\Omega X)^{-1}c]^{1/2} = n^{-1/2}Z_n + n^{-3/2}R_n$,

where $Z_n$ possesses bounded moments as $n$ tends to infinity, and $R_n$ is stochastically bounded with

(2.2.6)    $P[|R_n| > (\log n)^q] = o(n^{-2})$

for some constant $q$. For sufficient regularity conditions on $\hat\theta$, $X$ and $\Omega$ to insure that Assumption B is satisfied, see Rothenberg (1984).
Now we have

(2.2.7)    $P\Big[\frac{c^t(\hat\beta - \beta)}{[c^t(X^t\Omega X)^{-1}c]^{1/2}} \le x\Big] = E\,\Phi(x - A_n)$,

since $\tilde\beta$ is normally distributed and independent of $A_n$, and the normal distribution function and all its derivatives are bounded. Under Assumption B the remainder term $R_n = n(n^{1/2}A_n - Z_n)$ has well-behaved tail probabilities. Hence, when calculating the expectation in (2.2.7), restriction of the integration to the region where $|R_n| \le (\log n)^q$ yields an error of order $o(n^{-2})$. Furthermore, by the mean value theorem, in that region we have the bound

(2.2.8)    $|\Phi(x - A_n) - \Phi(x - n^{-1/2}Z_n)| \le A\,n^{-3/2}(\log n)^q$

for some constant $A$. Thus, since $A_n$ is symmetrically distributed, the probability distribution for $c^t\hat\beta$ can be written as

(2.2.9)    $P\Big[\frac{c^t(\hat\beta - \beta)}{[c^t(X^t\Omega X)^{-1}c]^{1/2}} \le x\Big] = \Phi(x) - n^{-1}E(Z_n^2)\,x\phi(x)/2 - n^{-2}E(Z_n^4)(x^3 - 3x)\phi(x)/24 + o(n^{-2})$.

Inverting the Taylor series expansion, we have

$P\Big[\frac{c^t(\hat\beta - \beta)}{[c^t(X^t\Omega X)^{-1}c]^{1/2}} \le x\Big] = \Phi\big(x[1 - n^{-1}E(Z_n^2)/2 + n^{-2}E^2(Z_n^2)/8]\big) + o(n^{-2})$.

Setting $t = x(1 + EZ_n^2/n)^{-1/2}$, we obtain

$P\big[c^t(\hat\beta - \beta)/\sigma_n \le t\big] = \Phi(t) + o(n^{-2})$,

where

$\sigma_n^2 = c^t(X^t\Omega X)^{-1}c\,\big(1 + \mathrm{var}(Z_n)/n\big) + o(n^{-2})$.
2.3 EDGEWORTH EXPANSION FOR STUDENTIZED GLSE.

As we discussed before, the present knowledge about the distribution of the GLSE is not enough when we want to construct a confidence interval and when we want to test hypotheses about $\beta$, since the variance is unknown. We shall study the Edgeworth expansion of studentized generalized least-squares estimates.

In order to compare the Edgeworth expansion of the studentized GLSE with that of the $t$-distribution, we first need to know the Edgeworth expansion of a $t$-distribution.

THEOREM 2.3.1. Let $t_n$ be the Student's $t$-statistic with $n$ degrees of freedom. Then the distribution of $t_n$ has the Edgeworth expansion

(2.3.1)    $P(t_n \le t) = \Phi\{t + n^{-1}(-t/4 - t^3/4)\} + O(n^{-3/2})$.

Proof: Let $Y_1, \ldots, Y_n$ and $X$ be i.i.d. with standard normal distribution, and let

$t_n = X\big/\big(\textstyle\sum_i Y_i^2/n\big)^{1/2}$.

Then we know that $t_n$ has a $t$-distribution with $n$ d.f. The distribution of $t_n$ can be written as

$P(t_n \le t) = P\big[X \le t(\textstyle\sum_i Y_i^2/n)^{1/2}\big] = E\,\Phi\big(t(\textstyle\sum_i Y_i^2/n)^{1/2}\big)$,

since $X$ is independent of the $Y_i$. Moreover,

$\big(\textstyle\sum_i Y_i^2/n\big)^{1/2} = \big\{\big[\textstyle\sum_i(Y_i^2 - 1) + n\big]/n\big\}^{1/2} = \big[1 + n^{-1}\textstyle\sum_i(Y_i^2 - 1)\big]^{1/2}$;

thus, expanding the square root and $\Phi$ about 1 and $t$ respectively, and using $E[n^{-1}\sum_i(Y_i^2 - 1)] = 0$ and $E[n^{-1}\sum_i(Y_i^2 - 1)]^2 = 2/n$, we obtain

$P(t_n \le t) = \Phi(t) + n^{-1}\phi(t)(-t/4 - t^3/4) + O(n^{-3/2}) = \Phi\{t + n^{-1}(-t/4 - t^3/4)\} + O(n^{-3/2})$. □

REMARK. When $p$ is fixed, $t_{n-p}$ has the same expansion (2.3.1), since

$(n-p)^{-1} - n^{-1} = p/[n(n-p)] = O(n^{-3/2})$.
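A quick numerical check of (2.3.1) (a sketch added here, assuming SciPy's $t$ distribution): the error of the corrected approximation should shrink much faster than the $O(n^{-1})$ error of $\Phi(t)$ alone.

    import numpy as np
    from scipy.stats import t as t_dist, norm

    x = 1.5
    for n in (10, 20, 40, 80):
        exact = t_dist.cdf(x, df=n)
        plain = norm.cdf(x)                      # Phi(t), error O(1/n)
        edge = norm.cdf(x + (-x/4 - x**3/4)/n)   # expansion (2.3.1)
        print(n, exact - plain, exact - edge)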
Now let us introduce some notation:

$\Omega_{ij} = \sigma^{-2}e^{-2h_i\theta}$ if $i = j$, and $\Omega_{ij} = 0$ if $i \ne j$;

$\hat\theta$ = the MLE of $\theta$ when $\beta$ is replaced by $\hat\beta_P$;

$z_i = e^{-h_i\theta}x_i$ and $\hat z_i = e^{-h_i\hat\theta}x_i$, each of dimension $p\times 1$;

$C_{1n} = n^{-1}\sum_i z_i z_i^t$ and $\hat C_{1n} = n^{-1}\sum_i \hat z_i\hat z_i^t$;

$\Lambda_\beta = \Lambda_{WLS} = nE[(\tilde\beta - \beta)(\tilde\beta - \beta)^t] = n(X^t\Omega X)^{-1} = \sigma^2 C_{1n}^{-1}$  ($p\times p$);

$\Lambda = \Lambda_\beta + n^{-1}V$;

$\hat\sigma^2 - \sigma^2 = d_{\sigma 1}/n^{1/2} + d_{\sigma 2}/n + o_p(1/n)$  and  $\hat\sigma_*^2 - \sigma^2 = d_{\sigma 1}^*/n^{1/2} + d_{\sigma 2}^*/n + o_p(1/n)$.

The unusual normalizing factor for $\hat\sigma_*^2$ will be explained later, but it does make some intuitive sense, as it corrects for the $p+1$ parameters $\beta$ and $\theta$.
Before discussing the Edgeworth expansion of the studentized GLSE, we must decide how to studentize the generalized least-squares estimates to be close to $t_n$. The following example suggests that an unbiased estimator of $\Lambda_\beta + n^{-1}V$ should be used in the studentized GLSE.

EXAMPLE 2.3.1. Let $X_1, \ldots, X_n$ be i.i.d. with distribution $N(\mu, \sigma^2)$. We know that

(2.3.2)    $n^{1/2}(\bar X - \mu)/S_* \sim t_{n-1}$,

where $S_*^2 = \sum_i (X_i - \bar X)^2/(n-1)$, the unbiased estimator of $\sigma^2$. By Theorem 2.3.1 and the Remark following Theorem 2.3.1, the Edgeworth expansion for $n^{1/2}(\bar X - \mu)/S_*$ is (2.3.1),

$P(t_{n-1} \le t) = \Phi\{t + n^{-1}(-t/4 - t^3/4)\} + O(n^{-3/2})$.

If we define

(2.3.3)    $\hat\sigma^2 = \sum_i (X_i - \bar X)^2/n$,

the MLE of $\sigma^2$, then

(2.3.4)    $P[n^{1/2}(\bar X - \mu)/\hat\sigma \le t] = \Phi\{[1 - (2n)^{-1}]t + n^{-1}(-t/4 - t^3/4)\} + O(n^{-3/2}) = \Phi\{t + n^{-1}(-3t/4 - t^3/4)\} + O(n^{-3/2})$.

Comparing (2.3.4) with (2.3.1), we see that the coefficient of the $t$ term at order $n^{-1}$ changes from $-1/4$ to $-3/4$ when the unbiased estimator of $\sigma^2$ is replaced by the MLE. Therefore, we define our studentized generalized least-squares estimate as

$T_n = n^{1/2}c^t(\hat\beta - \beta)\big/\big[c^t(\hat\Lambda_\beta + n^{-1}\hat V)c\big]^{1/2}$,

where $\hat\Lambda_\beta = \hat\sigma_*^2\hat C_{1n*}^{-1}$ and $\hat V$ is an estimator of $V$ such that $\|\hat V - V\| = O_p(n^{-1/2})$.

The following theorem shows that there is a departure from $t_n$ for the studentized GLSE, and the non-$t$ term arises at $n^{-1}$.
THEOREM 2.3.2. Under regularity conditions stated in the following lemmas,

(2.3.5)    $P[T_n \le t] = \Phi\big\{t + n^{-1}(-t/4 - t^3/4) - 8^{-1}\big[(c^t\dot\Lambda_\beta c)/(c^t\Lambda_\beta c)\big]^2 E(\hat\theta - \theta)^2\,(t + t^3)\big\} + O(n^{-3/2})$.

REMARK. Comparing the Edgeworth expansion of $P[T_n \le t]$ in (2.3.5) with that of $P[t_n \le t]$ in (2.3.1), we get an extra term

(2.3.6)    $-\,8^{-1}\big[(c^t\dot\Lambda_\beta c)/(c^t\Lambda_\beta c)\big]^2 E(\hat\theta - \theta)^2\,(t + t^3)$.

The reason is the presence of $\hat\theta$, which causes some changes relative to the above example. $T_n$ is affected by $\hat\theta - \theta$ in the following ways:

1. The numerator of $T_n$, $\hat\beta - \beta$, can be written as $\hat\beta - \beta = (\tilde\beta - \beta) + (\hat\beta - \tilde\beta)$, where $\tilde\beta - \beta$ is (similar to $\bar X - \mu$ in $t_n$) a linear combination of the $\varepsilon_i$. The extra term $\hat\beta - \tilde\beta$ in the numerator is caused by $\hat\theta - \theta$.

2. $\hat\sigma_*^2$ and $\hat C_{1n}$ involve $\hat\theta$, and hence the denominator is not distributed as a $\chi^2$.

3. The numerator and the denominator are no longer independent, because of $\hat\theta$.

From (2.3.6) we may guess that this extra term is caused by $\hat\sigma_*^2$ and/or $\hat C_{1n*}^{-1}$, by the definition of $\hat\Lambda_\beta$. From the proof of the theorem we will see that the extra term comes from $\hat C_{1n*}^{-1} - C_{1n}^{-1}$, not from $\hat\sigma_*^2 - \sigma^2$, since in the Taylor expansions of $\hat C_{1n*}^{-1} - C_{1n}^{-1}$ and $\hat\sigma_*^2 - \sigma^2$, $\hat\theta - \theta$ arises at the order $n^{-1/2}$ for $\hat C_{1n*}^{-1} - C_{1n}^{-1}$, but only at $n^{-1}$ for $\hat\sigma_*^2 - \sigma^2$.
To prove this theorem, we need to introduce some lemmas first.

LEMMA 2.3.1. Let $\hat\theta$ be the maximum likelihood estimate of $\theta$ when $\beta$ is replaced by $\hat\beta_P$. Then

1)    $\hat\theta - \theta = n^{-1/2}A_{1n} + n^{-1}A_{2n} + o_p(1/n)$,

with $A_{1n} = 2^{-1}\big[n^{-1/2}\sum_i h_i(\varepsilon_i^2 - 1)\big]$, $EA_{1n} = 0$ and $EA_{1n}^2 = 1/2$; and

2)    $\hat\theta$ satisfies Assumption A of Rothenberg.

Proof: For 1), see Carroll, Ruppert and Wu (1988), Lemma 3. For 2), see Rothenberg (1984). □
LEMMA 2.3.2. Let $\hat C_{1n} = n^{-1}\sum_i \hat z_i\hat z_i^t$ and let $\hat C_{1n*}^{-1}$ be the corresponding estimator of $C_{1n}^{-1}$. Then

$\hat C_{1n*}^{-1} - C_{1n}^{-1} = n^{-1/2}D_{C1} + n^{-1}D_{C2} + o_p(1/n)$,

with

$E(c^t D_{C1}c)^2 = \sigma^{-4}n\,(c^t\dot\Lambda_\beta c)^2\,E(\hat\theta - \theta)^2$.

Proof: We have

$\hat C_{1n} = n^{-1}\sum_i \hat z_i\hat z_i^t = n^{-1}\sum_i x_i x_i^t e^{-2h_i\hat\theta}$,

so that

$\hat C_{1n} - C_{1n} = n^{-1}\sum_i x_i x_i^t e^{-2h_i\theta}\big(e^{-2h_i(\hat\theta - \theta)} - 1\big)$.

Expanding $e^{-2h_i(\hat\theta - \theta)} - 1 = -2h_i(\hat\theta - \theta) + 2h_i^2(\hat\theta - \theta)^2 + O(|\hat\theta - \theta|^3)$ and substituting the expansion of $\hat\theta - \theta$ from Lemma 2.3.1 gives the expansion

(2.3.7)    $\hat C_{1n*}^{-1} - C_{1n}^{-1} = -\,C_{1n}^{-1}(\hat C_{1n} - C_{1n})C_{1n}^{-1} + o_p(n^{-1/2})$,

whose leading term $D_{C1}$ is a linear function of $A_{1n}$, and hence of $\hat\theta - \theta$; the moment formula follows by direct calculation. □

REMARK. From (2.3.7) we may see that $\hat C_{1n*}^{-1} - C_{1n}^{-1}$ involves $A_{1n}$, i.e. $\hat\theta - \theta$ (see Lemma 2.3.1), already at the order $n^{-1/2}$.
LEMMA 2.3.3. Let

$\hat\sigma^2 = n^{-1}\sum_i (y_i - x_i^t\hat\beta)^2 e^{-2h_i\hat\theta}$

and write $\hat\sigma^2 - \sigma^2 = d_{\sigma 1}/n^{1/2} + d_{\sigma 2}/n + o_p(1/n)$. Then

(2.3.8)    $d_{\sigma 1} = \sigma^2 n^{-1/2}\sum_i (\varepsilon_i^2 - 1)$

and

$d_{\sigma 2} = -2\sigma^2\big[n^{-1/2}\textstyle\sum_i(\varepsilon_i^2 - 1)h_i\big]\big[n^{1/2}(\hat\theta - \theta)\big] + 2\sigma^2\big(n^{-1}\textstyle\sum_i \varepsilon_i^2 h_i^2\big)\,n(\hat\theta - \theta)^2 - 2\sigma^2\big[n^{-1}\textstyle\sum_{ij}\varepsilon_i\varepsilon_j z_i^t C_{1n}^{-1}z_j\big] + \sigma^2\textstyle\sum_i\big[z_i^t C_{1n}^{-1}n^{-1}\textstyle\sum_j z_j\varepsilon_j\big]^2$,

with $Ed_{\sigma 1} = 0$, $Ed_{\sigma 1}^2 = 2\sigma^4$, and

$Ed_{\sigma 2} = -2\sigma^2 + \sigma^2 - 2p\sigma^2 + p\sigma^2 = -(p+1)\sigma^2$.

Proof: Write

$\hat\sigma^2 = n^{-1}\sum_i\big[\sigma e^{h_i\theta}\varepsilon_i - x_i^t(\hat\beta - \beta)\big]^2 e^{-2h_i\hat\theta} = n^{-1}\sum_i\big[\sigma\varepsilon_i - z_i^t(\hat\beta - \beta)\big]^2 e^{-2h_i(\hat\theta - \theta)}$.

Since

$e^{-2h_i(\hat\theta - \theta)} = 1 - 2h_i(\hat\theta - \theta) + 2h_i^2(\hat\theta - \theta)^2 + O(|\hat\theta - \theta|^3)$

and

$\hat\beta - \beta = (\tilde\beta - \beta) + o_p(n^{-1/2}) = \sigma C_{1n}^{-1}n^{-1}\sum_i z_i\varepsilon_i + o_p(n^{-1/2})$,

substituting and collecting the terms of order $n^{-1/2}$ and $n^{-1}$ yields (2.3.8) and the expression for $d_{\sigma 2}$. The moments follow by direct computation, using $EA_{1n}^2 = 1/2$ and $n^{-1}\sum_i z_i z_i^t = C_{1n}$.

If we define

$\hat\sigma_*^2 = n(n-p-1)^{-1}\hat\sigma^2 = (n-p-1)^{-1}\sum_i (y_i - x_i^t\hat\beta)^2 e^{-2h_i\hat\theta}$,

then

$\hat\sigma_*^2 - \sigma^2 = n(n-p-1)^{-1}(\hat\sigma^2 - \sigma^2) + (p+1)(n-p-1)^{-1}\sigma^2$,

and if we take $d_{\sigma 1}^* = d_{\sigma 1}$ and $d_{\sigma 2}^* = d_{\sigma 2} + (p+1)\sigma^2$, then we have $Ed_{\sigma 2}^* = 0$ and $E(d_{\sigma 1}^*)^2 = 2\sigma^4$. □

REMARK. From (2.3.8) we may see that $\hat\sigma_*^2 - \sigma^2$ involves $\hat\theta - \theta$ only at the order $n^{-1}$.
LEMMA 2.3.4. Write $\hat\Lambda_\beta = \hat\sigma_*^2\hat C_{1n*}^{-1}$ and

$\hat\Lambda_\beta - \Lambda_\beta = n^{-1/2}D_{\Lambda 1} + n^{-1}D_{\Lambda 2} + o_p(1/n)$.

Then $D_{\Lambda 1} = d_{\sigma 1}^* C_{1n}^{-1} + \sigma^2 D_{C1}$ and

$E(c^t D_{\Lambda 1}c)^2 = 2(c^t\Lambda_\beta c)^2 + n(c^t\dot\Lambda_\beta c)^2 E(\hat\theta - \theta)^2$.

Proof: Since

$\hat\Lambda_\beta - \Lambda_\beta = (\hat\sigma_*^2 - \sigma^2)C_{1n}^{-1} + \sigma^2(\hat C_{1n*}^{-1} - C_{1n}^{-1}) + (\hat\sigma_*^2 - \sigma^2)(\hat C_{1n*}^{-1} - C_{1n}^{-1})$,

the form of $D_{\Lambda 1}$ follows from the proofs of Lemma 2.3.2 and Lemma 2.3.3. By the assumption that $n^{-1}\sum_i h_i = 0$, the cross moment $E[d_{\sigma 1}^*(c^t D_{C1}c)]$ vanishes. Therefore, by Lemma 2.3.2 and Lemma 2.3.3, we have

$E(c^t D_{\Lambda 1}c)^2 = 2\sigma^4(c^t C_{1n}^{-1}c)^2 + \sigma^4 E(c^t D_{C1}c)^2 = 2(c^t\Lambda_\beta c)^2 + n(c^t\dot\Lambda_\beta c)^2 E(\hat\theta - \theta)^2$. □

LEMMA 2.3.5. Let

$\big[(c^t\hat\Lambda_\beta c)/(c^t\Lambda_\beta c)\big]^{1/2} = 1 + n^{-1/2}R_1 + n^{-1}R_2 + o_p(1/n)$.

Then

$ER_1^2 = 2^{-1} + 4^{-1}n\big[(c^t\dot\Lambda_\beta c)/(c^t\Lambda_\beta c)\big]^2 E(\hat\theta - \theta)^2$

and

$ER_2 = -\,2^{-1}ER_1^2 = -4^{-1} - 8^{-1}n\big[(c^t\dot\Lambda_\beta c)/(c^t\Lambda_\beta c)\big]^2 E(\hat\theta - \theta)^2$.

Proof: Since

$\big[(c^t\hat\Lambda_\beta c)/(c^t\Lambda_\beta c)\big]^{1/2} = \big\{1 + [c^t(\hat\Lambda_\beta - \Lambda_\beta)c]/(c^t\Lambda_\beta c)\big\}^{1/2}$
$= 1 + 2^{-1}\big\{[c^t(n^{-1/2}D_{\Lambda 1} + n^{-1}D_{\Lambda 2})c]/(c^t\Lambda_\beta c)\big\} - 8^{-1}\big\{[c^t n^{-1/2}D_{\Lambda 1}c]/(c^t\Lambda_\beta c)\big\}^2 + o_p(n^{-1})$
$= 1 + 2^{-1}n^{-1/2}(c^t D_{\Lambda 1}c)/(c^t\Lambda_\beta c) + n^{-1}\big\{2^{-1}(c^t D_{\Lambda 2}c)/(c^t\Lambda_\beta c) - 8^{-1}[(c^t D_{\Lambda 1}c)/(c^t\Lambda_\beta c)]^2\big\} + o_p(n^{-3/2})$,

we have

$R_1 = 2^{-1}(c^t D_{\Lambda 1}c)/(c^t\Lambda_\beta c)$  and  $R_2 = 2^{-1}(c^t D_{\Lambda 2}c)/(c^t\Lambda_\beta c) - 8^{-1}\big[(c^t D_{\Lambda 1}c)/(c^t\Lambda_\beta c)\big]^2$.

By Lemma 2.3.4,

$ER_1^2 = 2^{-1} + 4^{-1}n\big[(c^t\dot\Lambda_\beta c)/(c^t\Lambda_\beta c)\big]^2 E(\hat\theta - \theta)^2$

and

$ER_2 = -2^{-1}ER_1^2 = -4^{-1} - 8^{-1}n\big[(c^t\dot\Lambda_\beta c)/(c^t\Lambda_\beta c)\big]^2 E(\hat\theta - \theta)^2$. □
LEMMA 2.3.6. Suppose that $Z$ is normally distributed with mean zero and standard deviation 1 and that $Z$ is independent of $X$ and $Y$. Assume also that $P[X = 0] = 0$. Then

$P(ZX + Y \le t) = E\,\Phi[(t - Y)/X]$.

(In our application $X > 0$, so no sign change occurs in the inequality below.)

Proof: Conditioning on $X$ and $Y$, we have

$P[ZX + Y \le t] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} P[Z \le (t-y)/x \mid X = x, Y = y]\,f_{X,Y}(x, y)\,dx\,dy$,

where $f_{X,Y}$ is the joint density of $X$ and $Y$. Since $Z$ is independent of $X$ and $Y$,

$P[Z \le (t-y)/x \mid X = x, Y = y] = \Phi[(t-y)/x]$,

and therefore

$P[ZX + Y \le t] = E_{X,Y}\big\{P[Z \le (t-Y)/X]\big\} = E\,\Phi[(t - Y)/X]$. □
LEMMA 2.3.7. Let

$Z = n^{1/2}c^t(\tilde\beta - \beta)/(c^t\Lambda_\beta c)^{1/2}$,
$X = \big[(c^t\hat\Lambda_\beta c)/(c^t\Lambda_\beta c)\big]^{1/2}$,
$Y = n^{1/2}c^t(\hat\beta - \tilde\beta)/(c^t\Lambda_\beta c)^{1/2}$.

Then $Z$ is independent of $X$ and $Y$, and $Z$ has a standard normal distribution.

Proof: Since $\hat\theta$ maximizes the concentrated log likelihood, and the matrix in brackets,

$\Omega(\hat\theta) - \Omega(\hat\theta)X[X^t\Omega(\hat\theta)X]^{-1}X^t\Omega(\hat\theta)$,

nullifies $X$, $y$ can be replaced by $\Omega(\theta)^{-1/2}\varepsilon$ there; thus the concentrated log likelihood function does not vary with $\beta$, and hence its maximizer $\hat\theta$ does not either. Since $\tilde\beta$ is a complete sufficient statistic for $\beta$, it follows that $\tilde\beta$ is independent of $\hat\theta$, and therefore $\tilde\beta$ is independent of $\hat C_{1n*}^{-1}$.

We can write

$\hat\sigma_*^2 = (n-p-1)^{-1}(y - X\hat\beta)^t\Omega(\hat\theta)(y - X\hat\beta)$,

and since $\Omega(\hat\theta) - \Omega(\hat\theta)X[X^t\Omega(\hat\theta)X]^{-1}X^t\Omega(\hat\theta)$ nullifies $X$, by the same reasoning as above, $\hat\sigma_*^2$ is independent of $\tilde\beta$. Therefore $\hat\Lambda_\beta$ is independent of $Z$, and so is $X$.

Now consider $Y$. Since $\hat\theta$ is independent of $\tilde\beta$, it follows that $\hat\beta - \tilde\beta$ is independent of $\tilde\beta$. Therefore $Y$ is independent of $Z$. It is obvious that $Z$ has a standard normal distribution. □
Proof of Theorem 2.3.2: First, notice that we may replace $\hat V$ by $V$. In fact,

$n^{1/2}c^t(\hat\beta - \beta)\big\{[c^t(\hat\Lambda_\beta + n^{-1}\hat V)c]^{-1/2} - [c^t(\hat\Lambda_\beta + n^{-1}V)c]^{-1/2}\big\}$
$= n^{1/2}c^t(\hat\beta - \beta)\,[c^t(\hat\Lambda_\beta + n^{-1}V)c]^{-1/2}\big\{-2^{-1}n^{-1}[c^t(\hat V - V)c]\,[c^t(\hat\Lambda_\beta + n^{-1}V)c]^{-1} + \cdots\big\}$,

and since $[c^t(\hat V - V)c]\,[c^t(\hat\Lambda_\beta + n^{-1}V)c]^{-1} = O_p(n^{-1/2})$,

$n^{1/2}c^t(\hat\beta - \beta)[c^t(\hat\Lambda_\beta + n^{-1}\hat V)c]^{-1/2} - n^{1/2}c^t(\hat\beta - \beta)[c^t(\hat\Lambda_\beta + n^{-1}V)c]^{-1/2} = o_p(n^{-1})$.

We therefore have

$P[T_n \le t] = P\big\{[n^{1/2}c^t(\hat\beta - \beta)/(c^t\hat\Lambda_\beta c)^{1/2}]\,[c^t\hat\Lambda_\beta c/(c^t(\hat\Lambda_\beta + n^{-1}V)c)]^{1/2} \le t\big\} + o(n^{-1})$

(2.3.9)    $= P\big[n^{1/2}c^t(\hat\beta - \beta)/(c^t\hat\Lambda_\beta c)^{1/2} \le t\,(1 + 2^{-1}n^{-1}c^tVc/(c^t\Lambda_\beta c))\big] + o(n^{-1})$.

Now let us look at the distribution of $n^{1/2}c^t(\hat\beta - \beta)(c^t\hat\Lambda_\beta c)^{-1/2}$. With $X$, $Y$ and $Z$ as in Lemma 2.3.7,

$P\big[n^{1/2}c^t(\hat\beta - \beta)/(c^t\hat\Lambda_\beta c)^{1/2} \le t\big] = P[Z + Y \le tX] = E\,\Phi(tX - Y)$

by Lemma 2.3.6 and Lemma 2.3.7. Expanding $X = 1 + n^{-1/2}R_1 + n^{-1}R_2 + o_p(1/n)$ from Lemma 2.3.5, and using the symmetry of $Y$ and the moment

(*)    $EY^2 = nE[c^t(\hat\beta - \tilde\beta)]^2/(c^t\Lambda_\beta c) = n^{-1}(c^tVc)/(c^t\Lambda_\beta c)$,

we obtain

(2.3.10)    $E\,\Phi(tX - Y) = \Phi(t) + n^{-1}\phi(t)\big\{t\big[ER_2 - (c^tVc)/(2c^t\Lambda_\beta c)\big] - t^3 ER_1^2/2\big\} + O(n^{-3/2})$.

By (2.3.9), replacing $t$ by $t[1 + n^{-1}c^tVc/(2c^t\Lambda_\beta c)]$ in (2.3.10), the $(c^tVc)$ terms cancel, and by Lemma 2.3.5 we have

$P[T_n \le t] = \Phi\big\{t + n^{-1}(-t/4 - t^3/4) - 8^{-1}\big[(c^t\dot\Lambda_\beta c)/(c^t\Lambda_\beta c)\big]^2 E(\hat\theta - \theta)^2(t + t^3)\big\} + O(n^{-3/2})$,

which is (2.3.5). □

The reason that $\hat\theta - \theta$ affects the expansion through $\hat C_{1n*}$ can be seen from the proofs of those lemmas and the theorem. Let us look at $\hat\beta - \tilde\beta$ first. Since $\hat\beta - \tilde\beta$ is symmetrically distributed about zero, we need to worry about $E[c^t(\hat\beta - \tilde\beta)]^2$ only, but it is cancelled with (*) in the proof of the theorem, so there is no effect on the final expansion caused by $\hat\beta - \tilde\beta$, and there is no effect caused by $\hat V$ in the denominator, see (2.3.9). Second, since $\hat\sigma_*^2 - \sigma^2$ and $\hat C_{1n*}^{-1} - C_{1n}^{-1}$ affect the expansion through $\hat\sigma_*^2\hat C_{1n*}^{-1} - \sigma^2 C_{1n}^{-1}$, only $d_{\sigma 1}$ and $D_{C1}$ need be considered. From Lemma 2.3.3 we may see that $\hat\theta - \theta$ does not affect $d_{\sigma 1}$. From Lemma 2.3.2 we see that it does affect $D_{C1}$, and this causes the extra term in the expansion.
2.4 SUMMARY.

From Section 2.3 we get the result that the additional noise introduced by using a random rotation matrix causes a thickening of the tail probabilities: $\hat\theta - \theta$ does affect the Edgeworth expansion of the studentized generalized least-squares estimator, so that the non-$t$ term arises at $n^{-1}$. Therefore, when we use the $t$-distribution to approximate the distribution of the studentized GLSE, we need to consider the effect of the non-$t$ term when $n$ is not large enough.
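The thickened tails can be seen in a small simulation, a sketch added here under the Chapter II model. For brevity the MLE of theta is replaced by the simpler log-absolute-residual estimator and the $n^{-1}\hat V$ correction is omitted, so the numbers only illustrate the qualitative effect.

    import numpy as np
    from scipy.stats import t as t_dist

    rng = np.random.default_rng(1)
    n, p, reps = 25, 2, 20000
    X = np.column_stack([np.ones(n), np.linspace(-1, 1, n)])
    h = np.linspace(-1, 1, n); h -= h.mean(); h /= np.sqrt((h**2).mean())
    beta, theta = np.zeros(p), 1.0
    c = np.array([0.0, 1.0])
    crit = t_dist.ppf(0.975, df=n - p - 1)
    rej = 0
    for _ in range(reps):
        y = X @ beta + np.exp(h*theta)*rng.standard_normal(n)
        bp = np.linalg.lstsq(X, y, rcond=None)[0]
        th = np.linalg.lstsq(np.column_stack([np.ones(n), h]),
                             np.log(np.abs(y - X @ bp) + 1e-12), rcond=None)[0][1]
        w = np.exp(-2*h*th)
        XtWX = X.T @ (w[:, None]*X)
        bg = np.linalg.solve(XtWX, X.T @ (w*y))
        s2 = np.sum(w*(y - X @ bg)**2)/(n - p - 1)      # sigma_*^2 analogue
        var_c = s2*np.linalg.solve(XtWX, c)[1]          # est. var of c^t beta_hat
        rej += abs(c @ (bg - beta))/np.sqrt(var_c) > crit
    print(rej/reps)   # typically above the nominal 0.05 for small n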
CHAPTER III
INTRODUCTION

3.0 INTRODUCTION AND OVERVIEW.

Heteroscedasticity and nonnormality are common in a variety of fields of application. It is clearly important to test for homogeneity of variance, since the result of such a test determines the type of analysis (weighted or unweighted) to be performed. Failure to correctly model heteroscedasticity can produce inefficient estimates and invalid inferences concerning their true underlying values, see Judge, Griffiths, Hill and Lee (1984, Ch. 11). A large spectrum of tests has been developed in the last few years. These range from general or non-constructive tests (for broad or undefined alternatives), see Goldfeld and Quandt (1965) and Horn (1981), to specific or constructive tests which require specification of the heteroscedastic form, see Anscombe (1961), Bickel (1978), Judge, Griffiths, Hill and Lee (1984, Ch. 11), Cook and Weisberg (1983) and Evans and King (1985).

Almost all the tests use residuals in the test statistics, so that the distribution of the test statistics can be affected by the distribution of the estimates of the regression parameter. When the errors are normally distributed, the effects are not too hard to handle, see Anscombe (1961), Judge, Griffiths, Hill and Lee (1984, Ch. 11), Cook and Weisberg (1983) and Evans and King (1985). When the errors are not normally distributed, the effects are more complicated. Bickel (1978) proposed a test robust against gross errors. Unfortunately, his test depends on the assumption of symmetry of the errors. There are also some nonparametric tests dealing with nonnormal errors, see Goldfeld and Quandt (1965) and Horn (1981). The problems are that some of them, see Goldfeld and Quandt (1965), ignore the difference between errors and residuals, and that others, see Horn (1981), study the level and the power of tests by simulation, so that theoretical results are not given. In what follows we will devote ourselves to those problems.
3.1 Model and Basic Assumptions.

Consider a general heteroscedastic regression model for observable data $y$ given by

(3.1.1)    $y_i = \mu_i + \sigma_i\varepsilon_i$,    $\mu_i = x_i^t\beta$,    $i = 1, \ldots, n$,

where the $x_i$ ($p\times 1$) are the design vectors; $\beta$ ($p\times 1$) is the regression parameter; the $\varepsilon_i$ are independent and identically distributed (i.i.d.) random variables with absolutely continuous distribution function $F$, density $f$, derivative $f'$, and $E\varepsilon_i = 0$, $E\varepsilon_i^2 = 1$. The constants $\sigma_i$ express the heteroscedasticity in the model. The total sample size is $n$.

We consider tests for heteroscedasticity in two situations. In the first, we assume that the observations can be ranked according to increasing variances, see Goldfeld and Quandt (1965):

(3.1.2)    $\sigma_i = g(h_i)$,

where $g$ is an unknown increasing function and the $h_i$ are constants such that $h_i \le h_j$ for $i < j$.

In the second situation, we consider tests in a general parametric model given by

(3.1.3)    $\sigma_i = \sigma g(\mu_i, \theta)$,

where $\sigma$ is an unknown scale parameter, $g$ is a known function and $\theta$ is an unknown scalar parameter representing heteroscedasticity when $\theta \ne 0$. Three examples of this kind of model are

(3.1.4)    $\sigma_i = \sigma e^{\theta h_i}$,

(3.1.5)    $\sigma_i = \sigma[1 + \theta a(\mu_i) + o(\theta)]$,

or, as $\theta \to 0$,

(3.1.6)    $\sigma_i = \sigma e^{\theta a(\mu_i)}$.

Model (3.1.6) has been studied frequently, see Anscombe (1961), Bickel (1978) and Carroll and Ruppert (1981).
3.2 Review of Testing for Heteroscedasticity.

i) Tests Generalized from Estimators.

When (3.1.1) and (3.1.3) hold, testing for the existence of heteroscedasticity is equivalent to testing $\theta \ne 0$. It is a natural idea to find an estimator of $\theta$ first and to construct a test based on its distribution. By the assumptions given above, the standardized variables $(y_i - \mu_i)/\sigma_i$ are i.i.d. with mean zero and variance 1. Thus

$E\big[\sigma_i^{-1}|y_i - \mu_i|\big] = c$

for some constant $c$, so that

$E|y_i - \mu_i| = c\sigma_i = c\sigma g(\mu_i, \theta)$.

When (3.1.5) holds,

(3.2.1)    $E|y_i - \hat\mu_i| \approx c\sigma[1 + \theta a(\hat\mu_i)]$,    $\hat\mu_i = x_i^t\hat\beta$,

where $\hat\beta$ is an estimator of $\beta$ such that $\|\hat\beta - \beta\| = O_p(n^{-1/2})$. Applying the least-squares method to (3.2.1) gives an estimator $\hat\theta$ of $\theta$ proportional to $\sum_i a(\hat\mu_i)|y_i - \hat\mu_i|$ (after centering). If $\sigma_{\hat\theta}^2$ is the variance of $\hat\theta$ and $\hat\sigma_{\hat\theta}^2$ the estimator of $\sigma_{\hat\theta}^2$, then $\hat\theta/\hat\sigma_{\hat\theta}$ will have an asymptotic $t$-distribution. For discussion of the problem of variance function estimation, see Carroll and Ruppert (1988, Ch. 3), Davidian and Carroll (1987) and Judge, Griffiths, Hill and Lee (1984, Ch. 11).
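A sketch of this estimator-based test follows (an added illustration, assuming NumPy/SciPy; the choice $a(\mu) = \mu$ is ours, corresponding to (3.1.5) with $a$ the identity):

    import numpy as np
    from scipy import stats

    def abs_resid_test(y, X):
        """t-test of theta = 0 from the regression of |residual| on a(mu_hat)."""
        bhat = np.linalg.lstsq(X, y, rcond=None)[0]
        mu = X @ bhat
        d = np.abs(y - mu)
        Z = np.column_stack([np.ones_like(mu), mu])   # a(mu) = mu
        g, *_ = np.linalg.lstsq(Z, d, rcond=None)
        resid = d - Z @ g
        s2 = resid @ resid/(len(y) - 2)
        cov = s2*np.linalg.inv(Z.T @ Z)
        t_stat = g[1]/np.sqrt(cov[1, 1])
        return t_stat, 2*stats.t.sf(abs(t_stat), df=len(y) - 2)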
ii) Score Test.

Suppose (3.1.3) holds. The locally most powerful test for $H_0: \theta = 0$ vs. $H_1: \theta > 0$ rejects $\theta = 0$ when

(3.2.2)    $\partial\log L(y; \beta, \theta)/\partial\theta\,\big|_{\theta=0} > k_\alpha$,

see Ferguson (1967), page 235.

If the density $f$ of the errors is absolutely continuous and

$\int\Big[1 + \varepsilon\,\frac{f'(\varepsilon)}{f(\varepsilon)}\Big]^2 f(\varepsilon)\,d\varepsilon < \infty$,

LeCam's third lemma (Hajek and Sidak, 1967) shows that the power is determined by the covariance of the test statistic and the loglikelihood ratio $\log[L(x_1, \ldots, x_n; \theta)/L(x_1, \ldots, x_n; 0)]$, assuming they have a joint normal distribution and the alternative is contiguous to the null. When (3.1.4) holds,

(3.2.3)    $\partial\log L/\partial\theta\,\big|_{\theta=0} = -\sum_i h_i\big[1 + \varepsilon_i f'(\varepsilon_i)/f(\varepsilon_i)\big]$,

and when (3.1.5) holds,

(3.2.4)    $\partial\log L/\partial\theta\,\big|_{\theta=0} = -\sum_i a(\mu_i)\big[1 + \varepsilon_i f'(\varepsilon_i)/f(\varepsilon_i)\big]$.

Under regularity conditions, the left-hand side of (3.2.2) is just the right-hand side of (3.2.3) when (3.1.4) holds, and the right-hand side of (3.2.4) when (3.1.5) holds, see Hajek and Sidak (1967). From LeCam's third lemma we can also see that the efficient score function (the left-hand side of (3.2.2)) has the maximum local power.

In our regression model, $\theta$ is not the only unknown parameter, since the likelihood function is of the form $L(y; \beta, \theta)$. Thus in the score function there is another unknown parameter, $\beta$. When the errors are normally distributed, the score test statistic is (Cox and Hinkley, 1979, page 323)

$W_u = \Big[\frac{\partial\log L(y; \beta, \theta)}{\partial\theta}\Big|_{\theta=0,\,\hat\beta_{ML}}\Big]^2\Big/ I_{\theta\theta}(\theta = 0, \hat\beta_{ML})$,

where $\hat\beta_{ML}$ is the maximum likelihood estimator of $\beta$ when $\theta = 0$ and $I_{\theta\theta}$ is the information for $\theta$. The limit distribution of $W_u$ is chi-square with one degree of freedom.

In linear regression, the score test was investigated by Cook and Weisberg (1983). They show that the test statistic is the statistic $S$ based on

$U_i = (y_i - x_i^t\hat\beta)^2\Big/\Big[n^{-1}\sum_j (y_j - x_j^t\hat\beta)^2\Big]$,    $\hat\beta = \hat\beta_{ML}$ when $\theta = 0$.

The test statistic $S$ is one-half of the sum of squares for the regression of $U$ on $D$ in the constructed model, and thus can be easily obtained by using standard regression software. If the errors are normal, $S$ is asymptotically chi-square distributed under the null hypothesis. As shown by Carroll and Ruppert (1988) and McCullagh and Pregibon (1987), in general $S$ is asymptotically $c\chi^2$ under the null hypothesis, where $c = 1 + \kappa_4/(2\kappa_{22})$, $\kappa_i$ is the $i$-th cumulant of the errors and $\kappa_{22} = \kappa_2^2$.
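For normal errors the statistic $S$ can be computed exactly as described: half the regression sum of squares of the squared standardized residuals $U$ on the variance covariates $D$. A sketch (added here; the names and the single-column choice of $D$ are ours):

    import numpy as np

    def score_test(y, X, D):
        """Cook-Weisberg score statistic: half the regression SS of U on D."""
        n = len(y)
        bhat = np.linalg.lstsq(X, y, rcond=None)[0]
        e2 = (y - X @ bhat)**2
        U = e2/e2.mean()                         # squared residuals / mean
        Dc = np.column_stack([np.ones(n), D])
        g, *_ = np.linalg.lstsq(Dc, U, rcond=None)
        S = 0.5*np.sum((Dc @ g - U.mean())**2)   # half regression sum of squares
        return S                                 # approx chi^2(q) under normal errors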
iii) Bickel's Tests.

Under the assumption of normality, Anscombe (1961) proposed the following test statistic based on models (3.1.1) and (3.1.5), see also Bickel (1978):

$A = \sum_i\big[a(\hat\mu_i) - \bar a(\hat\mu)\big]\,\hat\varepsilon_i^2\big/\hat\sigma_A$,

where $\bar a(\hat\mu) = \sum_i a(\hat\mu_i)/n$, $\hat\beta_{LS}$ is the least-squares estimator of $\beta$, the $\hat\varepsilon_i$ are the residuals, and $\hat\sigma_A$ is an estimator of the standard deviation of the numerator of $A$ when the errors have a normal distribution. Since $\sum_i(\varepsilon_i^2 - E\varepsilon_i^2)\mu_i$ is proportional to the locally most powerful test statistic for $H_0: \theta = 0$ vs. $H_1: \theta \ne 0$, we may expect that $A$ has good power, and in fact we have

$P_{\theta=0}(A \ge z) = 1 - \Phi\big\{z\,[2/\mathrm{var}(\varepsilon_1^2)]^{1/2}\big\} + o(1)$.

The problem is that the test is not powerful if the errors have a heavy-tailed distribution. If the kurtosis is large, this test does not have robustness of validity and, in particular, for heavy-tailed distributions it has a greater probability of type I error than under the normal distribution.

To solve this problem, Bickel (1978) proposed the test statistic

$T_b = \sum_i\big[a(\hat\mu_i) - \bar a(\hat\mu)\big]\,b(\hat\varepsilon_i)\big/\hat\sigma_b$,

where $b$ is a function chosen for robustness (see below) and $\hat\sigma_b$ is the corresponding studentizing scale estimate. Under regularity conditions, see Bickel (1978),

(3.2.5)    $P_\theta(T_b \ge z_\alpha) = 1 - \Phi(z_\alpha - A_b) + o(1)$,

where $A_b$ depends on the local alternative, the design and $b$. The advantage of this test is that the power is independent of the method of estimation of $\beta$ and depends only on the choice of $b$ when the errors are symmetrically distributed and $b$ is symmetric. Therefore least-squares estimates could be used with appropriate $b$'s to give better performance in such situations. Carroll and Ruppert (1981) show that (3.2.5) holds for many $b$'s. The disadvantage is that when the errors are not symmetrically distributed, the power does depend on the method of estimation of $\beta$.
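A sketch of a Bickel-type statistic with a bounded $b$ follows (added illustration; the Huber-style truncation and the MAD scale are our choices, not Bickel's specific recommendation):

    import numpy as np

    def bickel_test(y, X, k=1.5):
        """Robust heteroscedasticity test: correlate a(mu_hat) with b(residuals)."""
        bhat = np.linalg.lstsq(X, y, rcond=None)[0]
        mu = X @ bhat
        r = y - mu
        s = np.median(np.abs(r))/0.6745            # robust scale estimate
        b = np.minimum((r/s)**2, k**2)             # bounded 'squared residual'
        a = mu - mu.mean()                          # centered a(mu), a = identity
        num = np.sum(a*(b - b.mean()))
        den = np.sqrt(np.sum(a**2)*np.var(b))
        return num/den       # approx N(0,1) under H0 when errors are symmetric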
iv) Nonparametric Tests.

When (3.1.1) and (3.1.2) hold and $g$ is not known exactly, nonparametric tests have been used instead of parametric tests. Nonparametric tests include the Goldfeld and Quandt (1965) peak test and the Spearman rank correlation coefficient test, see Johnston (1972) and Horn (1981).

The "peak test" involves comparing the absolute value of each residual with the values of the preceding residuals. One that is greater than all the others constitutes a "peak", and the test is based on whether the number of peaks is significant.

The Spearman rank correlation coefficient is the empirical correlation coefficient of the ranks of the absolute values of the residuals and the ranks of the $h_i$ in (3.1.2). This test rejects for large absolute values of the rank correlation coefficient.
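Both tests are simple to compute; a sketch of each (added here, assuming NumPy/SciPy):

    import numpy as np
    from scipy import stats

    def spearman_het_test(y, X, h):
        """Spearman rank correlation between |residuals| and h (ordering variable)."""
        bhat = np.linalg.lstsq(X, y, rcond=None)[0]
        r = np.abs(y - X @ bhat)
        rho, pval = stats.spearmanr(r, h)
        return rho, pval     # two-sided; sqrt(n-1)*rho is approx N(0,1) under H0

    def peak_count(y, X, h):
        """Goldfeld-Quandt peak count: |residual| exceeding all earlier ones (h-order)."""
        bhat = np.linalg.lstsq(X, y, rcond=None)[0]
        r = np.abs(y - X @ bhat)[np.argsort(h)]
        peaks, running_max = 0, r[0]
        for v in r[1:]:
            if v > running_max:
                peaks += 1
                running_max = v
        return peaks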
3.3 PROBLEMS AND PROPOSED RESEARCH.

For models (3.1.1) and (3.1.2), people usually choose nonparametric tests instead of parametric tests, since those tests do not depend on the assumption of normality and do not need exact knowledge about $g$ in (3.1.2). The problem is that most nonparametric tests, see Goldfeld and Quandt (1965) and Horn (1981), did not study the difference between the residuals $\hat\varepsilon_i$ and the errors $\varepsilon_i$, i.e. the difference between $\hat\beta$ and $\beta$. This difference may cause a different probability of type I error than we expect. More details can be seen in Chapter IV, where we study the Spearman rank correlation coefficient test. By the end of Chapter IV, we will see that asymmetry of the distribution of the errors causes dependence of the limit distribution of the test statistic on $\hat\beta - \beta$.

For models (3.1.1) and (3.1.3), parametric tests are often used, see Judge, Griffiths, Hill and Lee (1984). Most parametric tests are based on the assumption of normal errors, see Judge, Griffiths, Hill and Lee (1984). Bickel's test (1978) does not need the assumption of normality, but it needs symmetry of the distribution of the errors. Without the assumption of symmetry, the limit distribution of the test statistic will depend on the distribution of $\hat\beta - \beta$, and therefore will depend on the method of estimating $\beta$. In Chapter V we propose two tests to deal with the case of asymmetric errors.
CHAPTER IV
THE SPEARMAN RANK CORRELATION COEFFICIENT TEST

4.0 INTRODUCTION.

If we are testing for the model (3.1.2), $g$ is unknown. Any reasonable test should be based only on the ordering of the $\{h_i\}$. The Spearman rank correlation coefficient test is studied here because of its simplicity.

The Spearman rank correlation coefficient between $h_i$ and $|y_i - \mu_i|$ is defined by

$S_n = \frac{12}{n(n^2 - 1)}\sum_i (r_i - \bar r)(t_i - \bar t)$,

where $r_i = \mathrm{rank}(|y_i - \mu_i|)$, $t_i = \mathrm{rank}(h_i)$ and $\bar r = \sum_i r_i/n = \bar t = (n+1)/2$. When $\sigma_i = \sigma$ for all $i$, $|y_i - \mu_i| = |\sigma\varepsilon_i|$ is independent of $h_i$ and, as $n \to \infty$,

$n^{1/2}S_n \xrightarrow{L} N(0, 1)$,

see Lehmann (1974), page 369.

Since $\mu_i$ depends on the unknown $\beta$, instead of $r_i$ we only observe

$\hat R_i = \mathrm{rank}\big(|y_i - x_i^t\hat\beta|\big)$,

and therefore we are concerned with the limit distribution of $n^{1/2}\hat S_n$, where $\hat S_n$ is $S_n$ with $r_i$ replaced by $\hat R_i$.

In Section 4.1 we study the limit distribution of $n^{1/2}\hat S_n$ and find that the limit distribution depends on the choice of $\hat\beta$ when the errors are asymmetrically distributed. In Section 4.2 we study the power of the test for some models. In Section 4.3 we investigate the robustness of this test.
4.1 THE LIMIT DISTRIBUTION OF THE TEST STATISTIC.

LEMMA 4.1.1. Suppose that there exists an $M > 0$ such that

i) the $\varepsilon_j$ have distribution function $F$ and density $f$ with

(4.1.1)    $|f(x)| \le M$ for all $x$;

ii) $\|x_j\| \le M$ for all $j$.

Suppose that $\hat\beta$ is an estimator of $\beta$ such that $n^{1/2}(\hat\beta - \beta) \xrightarrow{L} T$ for some random vector $T$. For fixed $i$ and $\zeta \in R^p$, define

$R_{n,i}(t, \zeta) = n^{-1/2}\sum_j\big\{I\big(\varepsilon_j \le |t - x_i^t\zeta| + x_j^t\zeta\big) - F\big(|t - x_i^t\zeta| + x_j^t\zeta\big) - I(\varepsilon_j \le |t|) + F(|t|)\big\}$,

and define $A_L = \{\|\hat\beta - \beta\| \le Ln^{-1/2}\}$. Then for any $\omega > 0$:

i) we may find $L(\omega) < \infty$ such that, for $n > n_1(\omega)$ say,

(4.1.2)    $P(A_{L(\omega)}^c) = P\big[\|\hat\beta - \beta\| > L(\omega)n^{-1/2}\big] < \omega/2$;

ii) there exists a $c(\omega) > 0$ depending on $\omega$ and $L(\omega)$, and therefore on $\omega$ only, such that

$P\Big[\sup_{\|\zeta\|\le L(\omega)n^{-1/2}}|R_{n,i}(t, \zeta)| > \omega,\ A_{L(\omega)}\Big] \le 4e^{-n^{1/2}c(\omega)}$.

Proof: Since $n^{1/2}(\hat\beta - \beta) \xrightarrow{L} T$, for any $\omega > 0$ we may find $L(\omega) < \infty$ such that $P[\|T\| > L(\omega)] < \omega/2$, and there exists an $n_1 > 0$ such that for $n > n_1$,

$P\big[\|\hat\beta - \beta\| > L(\omega)n^{-1/2}\big] < \omega/2$,

and i) is proved.

In order to show ii), subdivide the cube centered at 0 of size $2n^{-1/2}L$ into (approximately) $(2L/\varsigma)^p$ cubes of size $n^{-1/2}\varsigma$, where $\varsigma = \omega/(4M^2)$, labelling them arbitrarily; let the $k$-th such cube be $\Delta_k^*$, and let $(f_{kjn}^1, \zeta_{kjn}^1)$ and $(f_{kjn}^2, \zeta_{kjn}^2)$ be the points of $\Delta_k^*$ at which $F(|t - x_i^t\zeta| + x_j^t f)$ takes its maximum and minimum values, respectively. By the mean value theorem and (4.1.1), for any two points of the same cube,

$\big|F(|t - x_i^t\zeta_1| + x_j^t f_1) - F(|t - x_i^t\zeta_2| + x_j^t f_2)\big| \le M\big[|x_j^t(f_1 - f_2)| + |x_i^t(\zeta_1 - \zeta_2)|\big] \le 2M^2 n^{-1/2}\varsigma = n^{-1/2}\omega/2$.

Hence the supremum of $|R_{n,i}(t, \zeta)|$ over the cube is bounded, up to an $\omega/2$ term, by its value at the extreme points,

(4.1.4)    $n^{-1/2}\sum_j\big[I\big(\varepsilon_j \le |t - x_i^t\zeta_{kjn}| + x_j^t f_{kjn}\big) - I(\varepsilon_j \le |t|)\big] - n^{-1/2}\sum_j\big[F\big(|t - x_i^t\zeta_{kjn}| + x_j^t f_{kjn}\big) - F(|t|)\big]$,

and the first term, centered, can be written as

(4.1.5)    $n^{-1/2}\sum_j\big\{B_{kjn}^{(j)} - p_{kjn}^{(j)}\big\}\,\mathrm{sgn}(kjn) = \sum_j W_{kjn}^{(j)}$, say,

where

$p_{kjn}^{(j)} = \big|F\big(|t - x_i^t\zeta_{kjn}| + x_j^t f_{kjn}\big) - F(|t|)\big|$,

the $B_{kjn}^{(j)}$ are Bernoulli variables with expectation $p_{kjn}^{(j)}$, independent as $j$ varies, and $\mathrm{sgn}(kjn) = 1$ or $-1$ according to the sign of the difference inside the absolute value. We may write $\sum_j W_{kjn}^{(j)} = \sum_{j\in T_+}W_{kjn}^{(j)} + \sum_{j\in T_-}W_{kjn}^{(j)}$, where $T_+ = \{j: \mathrm{sgn}(kjn) = +1\}$ and $T_- = \{j: \mathrm{sgn}(kjn) = -1\}$. Thus it suffices to bound, for each subset,

(4.1.6)    $P\big[\sum_j B_j - \sum_j p_j > n^{1/2}u\big]$,  with $u = \omega/4$.

Hoeffding's inequality says that, when the $B_j \sim \mathrm{Bern}(p_j)$ are independent, for any $h > 0$,

$P\big[\sum_j(B_j - p_j) > n^{1/2}u\big] \le \exp\big\{-h\big(\sum_j p_j + n^{1/2}u\big) + \sum_j\log[1 + p_j(e^h - 1)]\big\}$.

We need to find an $h$, and fix it, such that

$n^{-1/2}\sum_j\log[1 + p_j(e^h - 1)] < n^{-1/2}h\big(\sum_j p_j + n^{1/2}u\big)$,

to get

$P\big[\sum_j B_j - \sum_j p_j > n^{1/2}u\big] \le \exp(-n^{1/2}C)$

for some $C > 0$. Since $\log[1 + p_j(e^h - 1)] < p_j(e^h - 1)$, it is sufficient to choose $h$ such that

(4.1.7)    $(e^h - 1)/h < 1 + u\big/\big(n^{-1/2}\sum_j p_j\big)$.

If for all $n$, $n^{-1/2}\sum_j p_j \le C_1$ for some $C_1 < \infty$, we may find an $h$ depending on $u$ and $C_1$ such that (4.1.7) holds, since $(e^h - 1)/h \to 1$ as $h \to 0$.

In our case, see (4.1.6), $u = \omega/4$, and for any subset $N$ of $\{1, 2, \ldots, n\}$ of size $m$,

(4.1.8)    $m^{-1/2}\sum_{j\in N}p_j = m^{-1/2}\sum_{j\in N}p_{kjn}^{(j)} \le 2M^2L$,

since each $p_j \le M(|x_i^t\zeta_{kjn}| + |x_j^t f_{kjn}|) \le 2M^2Ln^{-1/2}$. Now by (4.1.6), (4.1.8) and Hoeffding's inequality, we need to find an $h$ such that

(4.1.9)    $(e^h - 1)/h < 1 + (\omega/4)/(2M^2L)$.

Since $(e^h - 1)/h$ is monotonically increasing in $h$, $M$ and $p$ are constants, $L$ depends on $\omega$, and $\omega$ is fixed, we may find $h$ small enough such that (4.1.9) holds, and $h$ depends on $\omega$ only. Therefore we have

$P\big[\sum_{j\in T_+}W_{kjn} > \omega/2\big] + P\big[\sum_{j\in T_-}W_{kjn} > \omega/2\big] \le 2e^{-c(\omega)n^{1/2}}$,

where $c(\omega) = h[\omega/4 - c_1(\omega)(e^h - 1)/h + c_1(\omega)]$ for a constant $c_1(\omega)$ depending on $\omega$ only; thus $c(\omega)$ depends on $\omega$ only, since $h$ does. Combining this with (4.1.4), (4.1.5) and (4.1.6), we have, for each cube $k$ and each $t$,

$P\big[\sup_{\zeta\in\Delta_k^*}R_{n,i}(t, \zeta) > \omega\big] \le 2e^{-n^{1/2}c(\omega)}$.

Since this is true for all $k$ and all $t$, and since a similar argument applies to $-R_{n,i}$, ii) follows. □
LEMMA 4.1.2. Under the assumptions of Lemma 4.1.1, we may replace $t$ by $\varepsilon_i$, and we have

(4.1.10)    $P\big[|R_{n,i}(\varepsilon_i, \hat\beta - \beta)| > \omega,\ A_{L(\omega)}\big] \le 4e^{-n^{1/2}c(\omega)}$,

where $c(\omega)$ depends on $\omega$ only.

Proof: We may write

(4.1.11)    $R_{n,i}(\varepsilon_i, \hat\beta - \beta)$ as the corresponding sum with the two terms $j = i$ removed, plus a remainder which is bounded by $2n^{-1/2} \le \omega/2$ when $n$ is large enough.

Recalling the proof of Lemma 4.1.1, we never used the fact that $t$ is nonrandom: we only considered which of the small intervals the arguments fall into, and (4.1.8) holds for any subset $N$ of $\{1, 2, \ldots, n\}$. Hence, applying Lemma 4.1.1 conditionally on $\varepsilon_i$, we get (4.1.10) from (4.1.11) if we redefine the new $c(\omega)$ as $c(\omega/2)$ in Lemma 4.1.1. □

LEMMA 4.1.3. Under the assumptions of Lemma 4.1.1, if we replace $t$ in $R_{n,i}(t, \hat\beta - \beta)$ by $\varepsilon_1, \ldots, \varepsilon_n$, there is an $n_2 \ge n_1$, with $n_1$ as defined in Lemma 4.1.1, such that, when $n > n_2$,

$P\big\{\cup_{i=1}^n\big[|R_{n,i}(\varepsilon_i, \hat\beta - \beta)| > \omega\big]\big\} \le \omega$.

Proof: Since

$P\big\{\cup_{i=1}^n\big[|R_{n,i}(\varepsilon_i, \hat\beta - \beta)| > \omega\big]\big\} \le P(A_{L(\omega)}^c) + P\big\{\cup_{i=1}^n\big[|R_{n,i}(\varepsilon_i, \hat\beta - \beta)| > \omega\big],\ A_{L(\omega)}\big\}$,

and when $n > n_1$,

(4.1.12)    $P\big\{\cup_{i=1}^n\big[|R_{n,i}(\varepsilon_i, \hat\beta - \beta)| > \omega\big],\ A_{L(\omega)}\big\} \le 4ne^{-n^{1/2}c(\omega)}$,

and $P(A_{L(\omega)}^c) < \omega/2$, see (4.1.2), we may choose $n_2 \ge n_1$ such that $4ne^{-n^{1/2}c(\omega)} \le \omega/2$ for $n > n_2$. The result follows from (4.1.12). □
THEOREM 4.1.1. Suppose that there exists an $M > 0$ such that

i) the $\varepsilon_i$ have distribution function $F$ and uniformly continuous density $f$ with $|f(x)| \le M$ for all $x$;

ii) $\|x_j\| \le M$ for all $j$.

Suppose that $\hat\beta$ is an estimator of $\beta$ such that $n^{1/2}(\hat\beta - \beta) \xrightarrow{L} T$. Then

(4.1.13)    $n^{1/2}(\hat S_n - S_n) = -12\,n^{-1}\sum_i \bar f(\varepsilon_i)(t_i/n - 1/2)(x_i + \bar x)^t\,n^{1/2}(\hat\beta - \beta) + 12\,n^{-1}\sum_i f^*(\varepsilon_i)(t_i/n - 1/2)(\bar x - x_i)^t\,n^{1/2}(\hat\beta - \beta) + o_p(1)$,

where

$\bar f(x) = f(-x)$ for $x > 0$, $\bar f(x) = -f(-x)$ for $x \le 0$;  $f^*(x) = f(x)$ for $x > 0$, $f^*(x) = -f(x)$ for $x \le 0$.

Consequently, $n^{1/2}(\hat S_n - S_n)$ has the same limit distribution as $c^t T$, where

$c = 12\,Ef^*(\varepsilon_1)\,\lim_{n\to\infty} n^{-1}\sum_i (t_i/n - 1/2)(\bar x - x_i)$.

Proof: The difference $n^{1/2}(\hat S_n - S_n)$ can be written as

$\big[12n^{1/2}/(n^2 - 1)\big]\sum_i(\hat R_i - r_i)(t_i/n - 1/2) + o_p(1)$,

where

$\hat R_i = \sum_{j=1}^n I\big(|y_j - x_j^t\hat\beta| \le |y_i - x_i^t\hat\beta|\big) = \sum_{j=1}^n I\big(\varepsilon_j \le |\varepsilon_i - x_i^t(\hat\beta - \beta)| + x_j^t(\hat\beta - \beta)\big) - \sum_{j=1}^n I\big(\varepsilon_j < -|\varepsilon_i - x_i^t(\hat\beta - \beta)| + x_j^t(\hat\beta - \beta)\big)$,

and $r_i$ is the same expression with $\hat\beta - \beta = 0$. By Lemma 4.1.3, without changing the limit distribution of $n^{1/2}(\hat S_n - S_n)$, we may replace $n^{-1/2}(\hat R_i - r_i)$ by its conditional expectation given $\varepsilon_i$,

$n^{-1/2}\sum_{j=1}^n\big\{F\big[|\varepsilon_i - x_i^t(\hat\beta - \beta)| + x_j^t(\hat\beta - \beta)\big] - F\big[|\varepsilon_i|\big]\big\} - n^{-1/2}\sum_{j=1}^n\big\{F\big[-|\varepsilon_i - x_i^t(\hat\beta - \beta)| + x_j^t(\hat\beta - \beta)\big] - F\big[-|\varepsilon_i|\big]\big\}$.

Expanding $F$ about $\pm|\varepsilon_i|$, and using the uniform continuity of $f$ to replace $f\big(\pm|\varepsilon_i - x_i^t(\hat\beta - \beta)|\big)$ by $f(\pm|\varepsilon_i|)$ without changing the limit distribution, $n^{-1/2}(\hat R_i - r_i)$ can be replaced by a sum over the four cases determined by the sign of $\varepsilon_i$ and the sign of the argument, each linear in $(\hat\beta - \beta)$, with coefficients $f(\pm\varepsilon_i)(n\bar x \pm 2x_i)$. Collecting these cases in terms of the functions $\bar f$ and $f^*$ defined above gives (4.1.13).

The first term in (4.1.13) goes to zero in probability, since $E\bar f(\varepsilon_1) = 0$ (the integrand $f(-x)f(x)\,\mathrm{sgn}(x)$ is odd) and by the law of large numbers. Similarly, the second term has the same limit distribution as $n^{1/2}c^t(\hat\beta - \beta)$ with $c$ as displayed above. □
REMARK 1. When $f$ is symmetric, the limit distribution of $n^{1/2}\hat S_n$ is $N(0, 1)$ and is not affected by the distribution of $\hat\beta - \beta$, and therefore is not affected by the method of estimating $\beta$. If $f$ is not symmetric, $\hat\beta - \beta$ does affect the distribution of the test statistic. If we ignore the effect, we will get a different level than we expect.

REMARK 2. The result of Lemma 4.1.3 is not only useful in the proof of Theorem 4.1.1; it can also be used in many problems involving ranks. When the $\varepsilon_i$ are i.i.d., for the linear rank statistic $\sum_i c_i r_i/n$ with $\sum_i c_i = 0$, $\sum_i c_i^2 = 1$ and $r_i = \mathrm{rank}(|\varepsilon_i|)$, we may replace $r_i/n$ by $F^+(|\varepsilon_i|)$, where $F^+$ is the distribution function of $|\varepsilon_i|$, without changing the limit distribution of $\sum_i c_i r_i/n$, since

$E\big[\sum_i c_i r_i/n - \sum_i c_i F^+(|\varepsilon_i|)\big]^2 \to 0$,

see Theorem 4.2.1 in the next section. Notice that the conditional expectation of $r_i/n$ given $\varepsilon_i$ is $F^+(|\varepsilon_i|)$ up to $O(1/n)$; this means that we may replace $r_i/n$ by its conditional expectation given $\varepsilon_i$.

In our case we observe $|\varepsilon_i - x_i^t(\hat\beta - \beta)|$ $\big(= |y_i - x_i^t\hat\beta|/\sigma\big)$ instead of $|\varepsilon_i|$, and these are neither independent nor identically distributed. Lemma 4.1.3 says that, although we may not do exactly the same thing as in the i.i.d. case, we may, without changing the limit distribution of $n^{1/2}(\hat S_n - S_n)$, write

$\hat R_i = \sum_j\big\{F\big[|\varepsilon_i - x_i^t(\hat\beta - \beta)| + x_j^t(\hat\beta - \beta)\big] - F\big[-|\varepsilon_i - x_i^t(\hat\beta - \beta)| + x_j^t(\hat\beta - \beta)\big]\big\} + o_p(n^{1/2})$.

This problem is very common in regression, and therefore the use of the result should not be limited to our problem.

REMARK 3. If $\sigma \to 0$ as $n \to \infty$, we will have $n^{1/2}(\hat\beta - \beta) \xrightarrow{P} 0$ for many reasonable estimates of $\beta$, and therefore the test statistic $n^{1/2}\hat S_n$ will have an asymptotically standard normal distribution. It turns out that in some situations of practical importance these asymptotics are relevant. In particular, in assay data, values of $\sigma$ are often observed which are quite small relative to the means, see Davidian, Carroll and Smith (1987) and Carroll and Ruppert (1988). In these situations, we do not have to worry about the effect caused by $\hat\beta - \beta$.
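The effect described in Remark 1 can be checked by simulation (our added sketch): with centered exponential (asymmetric) errors the level of the test based on $N(0,1)$ critical values can drift away from nominal, while with normal errors it does not. The magnitude of the drift depends on the design and on $Ef^*(\varepsilon_1)$.

    import numpy as np
    from scipy import stats

    def level(err_sampler, reps=5000, n=100, seed=0):
        rng = np.random.default_rng(seed)
        x = np.linspace(-1, 1, n)
        X = np.column_stack([np.ones(n), x])
        h = x                                   # ordering variable
        rej = 0
        for _ in range(reps):
            y = X @ np.array([0.0, 5.0]) + err_sampler(rng, n)   # homoscedastic null
            r = np.abs(y - X @ np.linalg.lstsq(X, y, rcond=None)[0])
            s, _ = stats.spearmanr(r, h)
            rej += abs(np.sqrt(n - 1)*s) > stats.norm.ppf(0.975)
        return rej/reps

    print(level(lambda rng, n: rng.standard_normal(n)))          # close to 0.05
    print(level(lambda rng, n: rng.exponential(1.0, n) - 1.0))   # level may drift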
4.2 POWER OF THE SPEARMAN RANK TEST.

LeCam's third lemma can be used to study the power of the Spearman rank test for heteroscedasticity.

LECAM'S THIRD LEMMA (Hajek and Sidak, 1967). Let $p_n$ and $q_n$ be the null and alternative densities and define the likelihood ratio

$L_n(x) = q_n(x)/p_n(x)$ if $p_n(x) > 0$;  $L_n(x) = 1$ if $q_n(x) = p_n(x) = 0$.

Assume that the pair $(T_n, \log L_n)$ is under $P_n$ asymptotically jointly normal with parameters $(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \sigma_{12})$, where $\mu_2 = -\sigma_2^2/2$. Then $T_n$ is under $Q_n$ asymptotically normal $(\mu_1 + \sigma_{12}, \sigma_1^2)$.

From this lemma we see that we need to know the exact form of $\log L_n$, and therefore we need the model to be given more precisely than (3.1.2). We consider the model (3.1.1), where

(3.1.4)    $\sigma_i = \sigma e^{\theta h_i}$,    $\theta = \Delta/n^{1/2}$.

Note that $\sigma_i$ is an increasing function of $h_i$ when $\theta > 0$. Theorem VI.2.2 of Hajek and Sidak (1967) shows that

(4.2.1)    $\log L_n = -\frac{1}{2}b^2 - \theta\sum_i h_i\Big[1 + \varepsilon_i\frac{f'(\varepsilon_i)}{f(\varepsilon_i)}\Big] + o_p(1)$

if

i) $\max_i(\theta h_i)^2 \to 0$,  ii) $\theta = \Delta/n^{1/2}$,  iii) $\lim_{n\to\infty} n^{-1}\sum_i h_i^2$ exists,

and

$b^2 = I_2(f)\,\lim_{n\to\infty} n\theta^2\big[n^{-1}\sum_i h_i^2\big]$,

where $I_2(f) = \int[1 + \varepsilon f'(\varepsilon)/f(\varepsilon)]^2 f(\varepsilon)\,d\varepsilon$. If $f$ is absolutely continuous and $I_2(f) < \infty$, Lemma I.2.4.b of Hajek and Sidak (1967) says that the alternative is contiguous to the null, and therefore (4.2.1) applies.

In order to use LeCam's third lemma, we need $(n^{1/2}\hat S_n, \log L_n)$ jointly normal. The following theorems show this.
THEOREM 4.2.1. Let $\xi_1, \ldots, \xi_n$ be i.i.d. with continuous distribution function $F$, so that the $F(\xi_i)$ are i.i.d. with common $U[0, 1]$ distribution, let $R_i = \mathrm{rank}(\xi_i)$, and let

$W_n = (12)^{1/2}\sum_i c_{in}R_i/n$  and  $V_n = (12)^{1/2}\sum_i c_{in}F(\xi_i)$,

where $\sum_i c_{in} = 0$, $\sum_i c_{in}^2 \to 1$ and $\max_i c_{in}^2 \to 0$ as $n \to \infty$. Then

(4.2.2)    $E[W_n - V_n]^2 \to 0$  as $n \to \infty$,

and

$P[W_n \le x] = P[V_n \le x] + o(1) = \Phi(x) + o(1)$.

Proof: See Ruymgaart (1987). □

In our case, if we take $c_{in} = (12)^{1/2}n^{-1/2}(t_i/n - 1/2)$, then under $H_0$ we have

$n^{1/2}\hat S_n = (12)^{1/2}\sum_i c_{in}\big[F^+(|\varepsilon_i|) - 1/2\big] + o_p(1)$,

where $F^+$ is the distribution function of $|\varepsilon_1|$. Recall that

$\log L_n = -\frac{1}{2}b^2 - \theta\sum_i h_i\Big[1 + \varepsilon_i\frac{f'(\varepsilon_i)}{f(\varepsilon_i)}\Big]$.

To use LeCam's third lemma, we need to show that the joint distribution of $n^{1/2}\hat S_n$ and $\log L_n$ tends to normal as $n$ goes to infinity.
THEOREM 4.2.2. Suppose that the conditions for (4.2.1) hold. Then, as $n \to \infty$, $(n^{1/2}\hat S_n, \log L_n)$ is asymptotically jointly normal, with

(4.2.3)    $A(\theta, n) = \mathrm{cov}(n^{1/2}\hat S_n, \log L_n) = n^{1/2}\theta\big[n^{-1}\sum_i(t_i/n - 1/2)h_i\big]\,E\Big\{F^+(|\varepsilon_1|)\Big[-1 - \varepsilon_1\frac{f'(\varepsilon_1)}{f(\varepsilon_1)}\Big]\Big\}$,

so that, by LeCam's third lemma, we have

$P[n^{1/2}\hat S_n \ge z] = 1 - \Phi(z - A(\theta, n)) + o(1)$.

Proof: To show that $n^{1/2}\hat S_n$ and $\log L_n$ have a joint distribution tending to normal, the only thing we have to show is that any linear combination of $n^{1/2}\hat S_n$ and $\log L_n$ is asymptotically normally distributed. Let $a$ and $b$ be any two constants. Then

$a\,n^{1/2}\hat S_n + b\log L_n = \sum_i\Big\{a\,c_{in}\big[F^+(|\varepsilon_i|) - 1/2\big] + b\,\theta h_i\Big[1 + \varepsilon_i\frac{f'(\varepsilon_i)}{f(\varepsilon_i)}\Big]\Big\} + o_p(1)$.

Let $W_{in}$ denote the $i$-th summand. Then $EW_{in} = 0$ and

(4.2.4)    $W_{in}^2 \le 2\Big\{a^2c_{in}^2\big[F^+(|\varepsilon_i|) - 1/2\big]^2 + b^2\theta^2h_i^2\Big[1 + \varepsilon_i\frac{f'(\varepsilon_i)}{f(\varepsilon_i)}\Big]^2\Big\}$.

By the definition of $c_{in}$ and the conditions for (4.2.1), $\max_i c_{in}^2 \to 0$ and $\max_i(\theta h_i)^2 \to 0$, so the summands are uniformly small. By (4.2.4) and Lindeberg's condition, the joint normality is proved. The covariance (4.2.3) follows by direct computation, using the independence of the $\varepsilon_i$. □

REMARK. When $f$ is symmetric, the limit distributions of $n^{1/2}\hat S_n$ and $n^{1/2}S_n$ are the same, and thus

$P[n^{1/2}\hat S_n \ge z] = 1 - \Phi(z - A(\theta, n)) + o(1)$,

where

$A(\theta, n) = 2n^{1/2}\theta\big[n^{-1}\sum_i(t_i/n - 1/2)h_i\big]\,E[f(\varepsilon_1)|\varepsilon_1|]$.

If the $\varepsilon_i$ are normally distributed, then $E[f(\varepsilon_1)|\varepsilon_1|] = 1/(2\pi)$ and

$A(\theta, n) = n^{1/2}\theta\big[n^{-1}\sum_i(t_i/n - 1/2)h_i\big]\,\pi^{-1}$.

Since the $t_i$ are the ranks of the $h_i$, the $t_i$ are increasing with $h_i$, and therefore $n^{-1}\sum_i(t_i/n - 1/2)h_i$ is not zero when the $h_i$ are not constant. Thus the Spearman rank correlation coefficient test is unbiased. Also, we may use the result in (4.2.3) to compare with the power of other tests, instead of using empirical values, see Johnston (1984).
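The power expressions above can be checked by Monte Carlo (an added sketch): simulate model (3.1.1) with $\sigma_i = \sigma e^{\theta h_i}$ for a grid of $\theta$ and record how often $n^{1/2}\hat S_n$ (here $(n-1)^{1/2}\hat S_n$, a common finite-sample standardization) exceeds the one-sided normal critical value.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n, reps = 100, 2000
    x = np.linspace(-1, 1, n)
    X = np.column_stack([np.ones(n), x])
    h = x
    for theta in (0.0, 0.1, 0.2, 0.3):
        rej = 0
        for _ in range(reps):
            y = X @ np.array([1.0, 1.0]) + np.exp(theta*h)*rng.standard_normal(n)
            r = np.abs(y - X @ np.linalg.lstsq(X, y, rcond=None)[0])
            s, _ = stats.spearmanr(r, h)
            rej += np.sqrt(n - 1)*s > stats.norm.ppf(0.95)
        print(theta, rej/reps)   # empirical power, increasing in theta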
4.3 THE ROBUSTNESS OF THE SPEARMAN RANK CORRELATION COEFFICIENT TEST.

For our purpose, robustness signifies insensitivity against small deviations from the assumptions. Most tests are based on the assumption of a distribution for the errors. If the observations are contaminated by outliers, this influences the probability that the test rejects the null hypothesis. Therefore the robustness of a test should also be considered, besides its power. In this section we study the robustness of the Spearman rank correlation coefficient test, and compare it with the score test to see that the Spearman rank test is more robust than the score test.

We start our robustness investigation by introducing the influence function for tests. Let $W_1, \ldots, W_n$ be i.i.d. random variables. The parametric model is a family of probability distributions $H_\theta$ on the sample space, where the unknown parameter $\theta$ belongs to some parameter space $\Theta$ which is an open convex subset of $R$. One wants to test the null hypothesis $H_0: \theta = \theta_0$ by means of a test statistic $T_n(W_1, \ldots, W_n)$ when the alternative hypothesis is two-sided, $H_1: \theta \ne \theta_0$. One compares $T_n$ with critical values $h_n'(\alpha)$ and $h_n''(\alpha)$ depending on the level $\alpha$, and applies a rule like: reject $H_0$ if and only if $T_n < h_n'(\alpha)$ or $T_n > h_n''(\alpha)$.

We identify the sample $W_1, \ldots, W_n$ with its empirical distribution $G_n$, ignoring the sequence of the observations, so $T_n(W_1, \ldots, W_n) = T_n(G_n)$. We assume that there exists a functional $T$: domain$(T) \to R$ (where the domain of $T$ is the collection of all distributions on the sample space for which $T$ is defined) such that

$T_n(W_1, \ldots, W_n) \to T(G)$

in probability as $n \to \infty$ when the observations are i.i.d. according to the true distribution $G$ in domain$(T)$.

We say that a functional $T$ is Fisher consistent if $T(H_\theta) = \theta$ for all $\theta$ in $\Theta$. Since test statistics are usually not Fisher consistent, let $f_n: \Theta \to R$ be defined by $f_n(\theta) = E_\theta[T_n]$ and put $f(\theta) := T(H_\theta)$. We assume that:

i) $f_n(\theta)$ converges to $f(\theta)$ for all $\theta$;

ii) $f$ is strictly monotone with a nonvanishing derivative, so that $f^{-1}$ exists.

Define $U(G)$ as $f^{-1}(T(G))$: this functional gives the parameter value which the true underlying distribution $G$ would have if it belonged to the model. This $U$ is clearly Fisher consistent, since $U(H_\theta) = f^{-1}(T(H_\theta)) = \theta$
this new
functional U. Denote by A the probability measure which put mass 1 in
x
the point x. The test influence function of T at F is defined as
in those x where it exists. see Hampel. Ronchetti. Rousseeuw and Stahel
(1986) .
Generally. U is hard to write down explicitly. However.
the test
influence function can be constructed easily. since
In
(Ia.c.
our
I
11.
case.
WI'
W
n
are
two-dimensional
random
vector
h.) with joint distribution H(w 1 ,w2 ).
1
Tn (WI' .... Wn ) = Sn
-1 n
2
= 12n ~. l(R.- R)(t.- t)/n
1=
1
1
where H* and H** are marginal empirical distributions of la.c.
n
n
I
l
respectively. So we have
T(H)
= 12covH[H*(la.c.
I). H**(h.)].
I I I
I
and h.
1
70
where
H*
and
H**
are
marginal
distributions
of
and
h.
1
respectively.
If we denote the joint distribution of ehioc
i
and hi by H ' where c
O
i
are independent of hi and c i have distribution function F and density
function f as mentioned in previous sections. we will have
**
+
f'
= 120cov[h. H (h)]E[F (lcl)[-l - c-j-{c)]].
see Section 4.2. When
° = O.
T(H0
) = 01
and 1
ly.-x~~1 are independent of
hi and H*= F+ the distribution of Icil. We have
= 120cov[h.
**
+
f'
H (h)]E{F (lcl)[-l - c-j-{c)]}.
and
By the definition of influence function. we have
T[(I-t}H +
o
tA~
~]
w .w
1
2
- T(H )
0
= limtlO 12tcov[h.H**(h)]E{F+(lcl)[-I-cf'(c)/f(c)]}
[F+(I~II) - 1/2] [H**(~l) - 1/2]
=----~~--~---;;...-.-----
12tcov[h.H**(h)]E{F+(lcl)[-I-cf'(c)/f(c)]}
71
to show the second equality, notice that the marginal distributions for
(l-t)H +tA~ ~ are (l-t)H*+tA~ and (l-t)H**+tA~ . When 9=0, H* and H**
o
w ,w
0
wI
0
w
0
0
I 2
2
are independent,
and therefore
and
T[(I-t)H
o
+
tA~
~]
w ,w
1
2
the covariance is under the distribution function (1-t)H
o
+
is equal to
(l-t)3cOVH (H*(lel), H**(h»
o
+ t(l-t)2covH (A~ (lei), H**(h»
o WI
0
+ oCt).
Under H , lei and h are independent and therefore, the covariance of the
o
functions of lei and h are zeros. So we have
limdo
T[(I-t)H + tAw ,w ] - T(H0 )
o
I 2
--------.;;.---.;~---­
t
72
The influence function represents the influence of an outlier in the sample on the value of the test statistic, and hence on the decision (acceptance or rejection of $H_0$) which is based on this value. The gross-error sensitivity of $T$ at $H$ is defined by

$\gamma^* = \sup_x\,|\mathrm{IF}_{test}(x; T, H)|$,

the supremum being taken over all $x$ where $\mathrm{IF}(x; T, H)$ exists. The gross-error sensitivity measures the worst (approximate) influence which a small amount of contamination of fixed size can have on the value of the test statistic. It is a desirable feature that $\gamma^*(H, T)$ be finite, in which case we say that $T$ is B-robust at $H$. It is easy to see that the Spearman rank correlation coefficient test is B-robust: its influence function is a product of functions each bounded by 1/2, divided by a fixed constant.

Now let us look at the influence function of the score test. The score test statistic is based on the squared residuals $e_i^2$ and the $h_i$, see Cook and Weisberg (1983). It can be written as a functional of the empirical distribution, the standardized covariance between $e^2$ and $h$,

$T(H) = \int e^2(h - \mu_H)\,dH\Big/\Big[\int\big(e^2 - E_{H^*}e^2\big)^2 dH^*\cdot\int(h - \mu_H)^2 dH^{**}\Big]^{1/2}$,

where $W_1, \ldots, W_n$ are the two-dimensional random vectors $(|\sigma_i\varepsilon_i|, h_i)$ with joint distribution $H$, $H^*$ and $H^{**}$ are the marginals, and $\mu_H = \int h\,dH^{**}$. Substituting $(1-t)H_0 + t\Delta_{w_1,w_2}$ and expanding, we have

$\int e^2(h - \hat\mu)\,d[(1-t)H_0 + t\Delta_{w_1,w_2}] = (1-t)\int e^2(h - \mu)\,dH_0 + t\,w_1^2(w_2 - \mu) + o(t)$,

$\int\big(e^2 - Ee^2\big)^2 d[(1-t)H_0^* + t\Delta_{w_1}] = \int\big(e^2 - Ee^2\big)^2 dH_0^* - t\int e^4 dH_0^* + t\,w_1^4 + 2t\Big(\int e^2 dH_0^*\Big)^2 - 2t\,w_1^2\int e^2 dH_0^* + o(t)$,

and

$\int(h - \mu)^2 d[(1-t)H_0^{**} + t\Delta_{w_2}] = \int(h - \mu)^2 dH_0^{**} + t\Big[(w_2 - \mu)^2 - \int(h - \mu)^2 dH_0^{**}\Big] + o(t)$.

Then $\{T[(1-t)H_0 + t\Delta_{w_1,w_2}] - T(H_0)\}/t$ contains terms in $w_1^2$ and $w_1^4$ which are not bounded in $w_1$: when $w_1 \to \infty$, it goes to infinity. Thus the score test does not have a bounded influence function, and therefore it is not B-robust.

This shows another advantage of the Spearman rank correlation coefficient test.
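The boundedness contrast can be seen empirically (our added sketch): move one observation far out and watch the two statistics. The rank statistic stabilizes; the score-type statistic grows without bound.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    n = 100
    x = np.linspace(-1, 1, n)
    X = np.column_stack([np.ones(n), x])
    y = X @ np.array([1.0, 1.0]) + rng.standard_normal(n)

    for outlier in (0.0, 5.0, 20.0, 100.0):
        yy = y.copy(); yy[-1] += outlier
        r = yy - X @ np.linalg.lstsq(X, yy, rcond=None)[0]
        s, _ = stats.spearmanr(np.abs(r), x)        # rank statistic: bounded
        U = r**2/np.mean(r**2)
        Dc = np.column_stack([np.ones(n), x])
        g, *_ = np.linalg.lstsq(Dc, U, rcond=None)
        S = 0.5*np.sum((Dc @ g - U.mean())**2)      # score statistic: unbounded
        print(outlier, round(s, 3), round(S, 2))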
4.4 SUMMARY. From the discussion in the previous sections, we see that when we do not know the exact form of $\sigma_i$, and we only want to test whether $\sigma_i$ is an increasing or a decreasing function of an independent variable, we may use the Spearman rank correlation coefficient test instead of a parametric test. When the errors are symmetrically distributed, the limit distribution of the test statistic is not affected by the method of estimation of $\beta$. This is also true when $\sigma \to 0$ as $n \to \infty$. When the errors are not symmetric, we find how the estimation of $\beta$ and the asymmetry affect the limit distribution of the test statistic. Another advantage of using the Spearman rank correlation coefficient test is the robustness of the test. We have gained a clearer knowledge of the Spearman rank correlation coefficient test.
CHAPTER V
TESTING HETEROSCEDASTICITY
WHEN ERRORS ARE ASYMMETRIC

5.0 INTRODUCTION.

Symmetry of the errors is a doubtful assumption for many tests of heteroscedasticity in the linear model. Almost all test statistics for heteroscedasticity are functions of residuals, and without symmetry of the errors, the limit distribution of the test statistic and the power of the test would depend on the method of estimation of the regression parameter, see Bickel (1978), Sukhatme (1958) and also Chapter IV. One idea for eliminating asymmetry of the errors is to take differences of the responses to get a new model such that the errors of the new model, which are the differences of errors in the old model, are symmetrically distributed under the null hypothesis. Then the test statistic is a function of the differences of residuals instead of the residuals, and we may use the theory developed for symmetric error models to study the limit distribution and the power of the test statistics. One would imagine that the asymptotic efficiency of using differences of residuals relative to using residuals is only half of the original one. In this chapter, we show that the asymptotic efficiency can be much more than one half. Indeed, when the errors are distributed with mildly heavy tails, the efficiency can be over 90%.

This chapter is organized as follows. In Section 5.1 we introduce the model discussed by Anscombe (1961) and Bickel (1978), and the test proposed by Bickel. In Section 5.2 we propose our test for the asymmetric model and investigate the limit distribution of the test statistic under the null hypothesis and the power of the test. In Section 5.3 we discuss the asymptotic efficiency of the test, and Section 5.4 is a summary.
5.1 TESTING FOR HETEROSCEDASTICITY WHEN ERRORS ARE SYMMETRIC.

Our point of departure is the general linear model in the form

    y_i = \mu_i + \sigma(\mu_i, \theta)\,\varepsilon_i ,   i = 1, 2, \ldots, n,

where

    \sigma(\mu_i, \theta) = 1 + \theta a(\mu_i) + o(\theta).

Here the \varepsilon_i are independent and identically distributed (i.i.d.) random errors with common distribution function F and density f. If the second moment is finite, then without loss of generality we may assume the normalization (5.1.1). In this model heteroscedasticity is equivalent to H_1: \theta \ne 0. To test H_0: \theta = 0 vs. H_1: \theta \ne 0, Bickel proposed the test statistic (5.1.2), where \hat\beta is the least-squares estimate of the regression parameter and b is a function to be chosen (see below). He showed under regularity conditions that the local power of T_b is given by (5.1.3). To show (5.1.3), Bickel (1978) showed that the difference between the numerators of T_b and its population-standardized version converges in probability to zero. Informally, we have by a Taylor expansion that (5.1.4).
REMARK. It seems reasonable to ask what happens if, instead of observing

    y_i = \mu_i + [1 + \theta a(\mu_i)]\varepsilon_i ,   i = 1, \ldots, n,  with  E\varepsilon_i^2 = \sigma^2,

we observe

    y_i = \mu_i + \sigma[1 + \theta a(\mu_i)]\varepsilon_i ,   i = 1, \ldots, n.

Let \hat\sigma be an estimator of \sigma such that \hat\sigma = \sigma + o_p(n^{-1/2}). Under the conditions mentioned above, we may show that the local power is

    1 - \Phi(z - \Lambda_{b_\sigma}) + o(1),

where b_\sigma(x) = b(x/\sigma). In fact, when we replace b(r_i) by b(r_i/\hat\sigma), we get everything in Bickel's Theorem 3.1 (1978) without difficulty, as long as we operate with b_\sigma.
We need to establish the assertions

    \sum_{i,j} w_{ij}\, a(\mu_i)\, b(\varepsilon_j/\hat\sigma)
      - \sum_{i,j} w_{ij}\, a(\mu_i)\, b(\varepsilon_j/\sigma) = o_p(n^{1/2})

and

    \sum_{i,j} w_{ij}\, b(\varepsilon_i/\hat\sigma)\, b(\varepsilon_j/\hat\sigma)
      - \sum_{i,j} w_{ij}\, b(\varepsilon_i/\sigma)\, b(\varepsilon_j/\sigma) = o_p(n).

For the first one,

    \sum_{i,j} w_{ij}\, a(\mu_i)[b(\varepsilon_j/\hat\sigma) - b(\varepsilon_j/\sigma)]
      = -\sum_{i,j} w_{ij}\, a(\mu_i)\, \varepsilon_j\, b'(\varepsilon_j/\sigma)(\hat\sigma - \sigma)/\sigma^2 + o_p(n^{1/2})
      = o_p(n^{1/2}).

For the second one, we have

    \sum_{i,j} w_{ij}\, b(\varepsilon_i/\hat\sigma)\, b(\varepsilon_j/\hat\sigma)
      - \sum_{i,j} w_{ij}\, b(\varepsilon_i/\sigma)\, b(\varepsilon_j/\sigma)
      = \sum_{i,j} w_{ij} [b(\varepsilon_i/\hat\sigma) - b(\varepsilon_i/\sigma)]\, b(\varepsilon_j/\hat\sigma)
        + \sum_{i,j} w_{ij} [b(\varepsilon_j/\hat\sigma) - b(\varepsilon_j/\sigma)]\, b(\varepsilon_i/\sigma)
      = -2\sum_i b'(\varepsilon_i/\sigma)\, \varepsilon_i (\hat\sigma - \sigma) [b(\varepsilon_i/\sigma) - \bar b_\sigma]/\sigma^2 + o_p(n^{1/2})
      = o_p(n).
From (5.1.4) we see that if Eb'(\varepsilon_1) \ne 0, the power will depend on the method of estimating \beta. The symmetry of the errors and the symmetry of b, or Eb'(\varepsilon_1) = 0 at least, are necessary for the conclusion of Bickel's Theorem 3.1. If the errors are not symmetric, we need to choose b such that Eb'(\varepsilon) = 0 to get the same result. Usually F is unknown, and therefore it is hard to find such a b. When x_i^t = [1, z_i^t], we consider the model

    y_i = z_i^t \beta_1 + u_i ,

where the u_i are i.i.d. Let \psi be a bounded continuous function, and let \hat\beta be an M-estimator of \beta defined through \psi. Now we take

    b(x) = \int_0^x \psi(t)\, dt ,

and we have Eb'(\varepsilon) = E\psi(\varepsilon) = 0 at the M-estimation centering. Another way of dealing with asymmetric errors is to use a symmetric b on symmetrized data; for details see the next section.
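To make this construction concrete, here is a small sketch (my own illustration; the Huber-type psi, the exponential errors, and the bisection are all hypothetical choices) of centering a bounded psi through its M-estimating equation and then integrating it to obtain b with Eb'(eps) = 0 under skewed errors.

import numpy as np

def psi(t, c=1.345):
    # A bounded, continuous Huber-type psi.
    return np.clip(t, -c, c)

rng = np.random.default_rng(1)
u = rng.exponential(size=100_000)      # skewed raw errors

# Solve sum psi(u - m) = 0 for the centering m by bisection,
# mimicking the intercept part of the M-estimating equation.
lo, hi = -10.0, 10.0
for _ in range(80):
    m = 0.5 * (lo + hi)
    if psi(u - m).mean() > 0:
        lo = m
    else:
        hi = m

eps = u - m                            # centred, still skewed errors
print(psi(eps).mean())                 # E b'(eps) = E psi(eps), close to 0

def b(x, c=1.345):
    # b(x) = integral_0^x psi(t) dt, in closed form for the Huber psi.
    ax = np.abs(x)
    return np.where(ax <= c, 0.5 * x**2, c * ax - 0.5 * c**2)

print(b(eps).mean())                   # the induced b is well defined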
5.2 TESTING FOR HETEROSCEDASTICITY WITHOUT KNOWING THE SYMMETRY OF ERRORS.

Residuals are often used in tests for heteroscedasticity. From the previous section we see that if the errors are not symmetrically distributed, the difference between residuals and errors will cause the distribution of the test statistic to depend on the method of estimating \beta. Note that if \varepsilon_i and \varepsilon_j are i.i.d., then \varepsilon_i - \varepsilon_j is symmetrically distributed. This motivates us to use differences of residuals in place of residuals in our test statistic.
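A quick numerical check of this motivating fact (a sketch with hypothetical, strongly skewed errors):

import numpy as np

rng = np.random.default_rng(2)
eps = rng.exponential(size=200_000)    # strongly right-skewed errors

def skewness(x):
    x = x - x.mean()
    return (x**3).mean() / (x**2).mean() ** 1.5

d = eps[::2] - eps[1::2]               # differences of independent pairs
print(skewness(eps))                   # about 2 for the exponential
print(skewness(d))                     # about 0: the difference is symmetric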
The differences of the residuals that we will use in our test are not arbitrary differences of the residuals. We need to order the equations first to get higher efficiency.

We estimate \beta by \hat\beta as in the previous section. Order the equations such that \hat a_i \le \hat a_j for any i \le j. Denote e_i = \varepsilon_{i-1} - \varepsilon_i and \hat e_i = r_{i-1} - r_i. Our test statistic will be T_d, given by (5.2.1).
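Schematically, the data preparation just described looks as follows (a sketch only; the exact standardization appearing in (5.2.1) is not reproduced here):

import numpy as np

def ordered_differences(r, a_hat):
    # r: residuals; a_hat: estimated a-values used only to order the equations.
    order = np.argsort(a_hat)          # guarantees a_hat[i] <= a_hat[j] for i <= j
    rs = r[order]
    return rs[1:] - rs[:-1]            # adjacent differences, e-hat up to sign

rng = np.random.default_rng(3)
n = 200
a_hat = rng.uniform(size=n)
r = rng.exponential(size=n) - 1.0      # centred but skewed residual stand-ins
e_hat = ordered_differences(r, a_hat)  # symmetric inputs for a statistic like (5.2.1)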
For the test statistic T_d, we have
THEOREM 5.2.1. Assume the regularity conditions for Bickel's Theorem 3.1 (1978) (see below), except for the symmetry of the \varepsilon_i. Then

(5.2.2)    P_\theta[\hat T_d > z] = 1 - \Phi(z - \Lambda_d) + o(1),

where \Lambda_d is the noncentrality parameter identified in the proof; it is built from

    \Lambda_n^2 = \sum_{i=1}^n (a_i - \bar a)^2 / n

and its paired versions introduced in the lemmas below.

Before proving the theorem, let us introduce the regularity conditions used in Bickel (1978). We shall use M, with and without subscripts, throughout as a generic finite constant, with the understanding that it may vary from condition to condition; for instance, we may use M to represent 2M and so on.
1. max_i |h_i| \le M,
2. p n^{-1/2} \to 0,
3. n^{-1} \sum_{i=1}^n [a(\mu_i) - \bar a(\mu)]^2 \ge M^{-1} > 0,
4. |\theta n^{1/2}| \le M,
5. F is a distribution symmetric about 0,
6. M^{-1} \le J^2(f) \le M, where

       J^2(f) = \int_{-\infty}^{\infty} \Big[x \frac{f'}{f}(x) + 1\Big]^2 f(x)\, dx

   if F has an absolutely continuous density f with derivative f', and J^2(F) = \infty otherwise,
7. b(x) = b(-x) for all x,
8. var b(\varepsilon_1) \ge M^{-1},
9. a is twice continuously differentiable, with
   (i) |a(x)| \le M for all x,
   (ii) |a'(x)| \le M for all x,
   (iii) |a''(x)| \le M for all x,
10. b is twice continuously differentiable, with
   (i) |b(x)| \le M for all x,
   (ii) |b'(x)| \le M for all x,
   (iii) |b''(x)| \le M for all x,
11. \sum_{i=1}^n (\hat\mu_i - \mu_i)^2 = O_p(p).

Bickel did not mention condition 9(i) in his paper (1978), but 9(i) is necessary in his proof.

Condition 10 rules out many interesting b's, such as Huber's function squared. Carroll and Ruppert (1981) pointed out that if we replace Condition 2 by p n^{-1/4} \to 0, Condition 10 can be weakened to hold everywhere except possibly at a finite number of points.
Before proving the main theorem we need the following lemmas; without loss of generality we will assume that n is even.
LEMMA 5.2.1. Let

    \Lambda_{dn}^2 = \sum_{i=1}^{n/2} [(a_{2i-1} + a_{2i})/2 - \bar a]^2 / (n/2),

and let \hat\Lambda_{dn}^2 be the same quantity with \hat a_i = a(\hat\mu_i) in place of a_i = a(\mu_i). Then under Conditions 9 and 11,

(5.2.4)    |\hat\Lambda_{dn}^2 - \Lambda_{dn}^2|
              \le M \Big[\sum_{i=1}^{n} (\hat\mu_i - \mu_i)^2 / n\Big]^{1/2}
              = O_p(p^{1/2} n^{-1/2}).
Proof: Define

    w_{ij} = 1 - 1/(n/2)  if i = j,   and   w_{ij} = -1/(n/2)  if i \ne j,   1 \le i, j \le n/2.

Then W = [w_{ij}]_{(n/2)\times(n/2)} is idempotent, and for any n/2-dimensional vectors a and b we have

(5.2.5)    |a^t W b| \le (a^t a)^{1/2} (b^t b)^{1/2}.

To show (5.2.5), notice that by the Cauchy-Schwarz inequality the only thing left is to show

    a^t W a \le a^t a.

In fact,

    a^t (I - W) a = (T^t a)^t [T^t (I - W) T] (T^t a)

for any orthogonal matrix T. If T is such that

    T^t (I - W) T = D,

where D is a diagonal matrix with 1 and 0 as its elements (W is idempotent), then a^t(I - W)a \ge 0, which proves the claim.

Also, \hat\Lambda_{dn}^2 - \Lambda_{dn}^2 can be written as a sum of quadratic forms in W evaluated at the vectors with components (\hat a_{2i-1} + \hat a_{2i})/2 and (a_{2i-1} + a_{2i})/2. Recalling that we use M to represent any constant, we have by (5.2.5)

    |\hat\Lambda_{dn}^2 - \Lambda_{dn}^2|
      \le M \Big[\sum_{i=1}^{n} (\hat a_i - a_i)^2\Big]^{1/2} (M n)^{1/2} / n
          + M \sum_{i=1}^{n} (\hat a_i - a_i)^2 / n.

By Condition 9, |\hat a_i - a_i| \le M |\hat\mu_i - \mu_i|, so

    |\hat\Lambda_{dn}^2 - \Lambda_{dn}^2|
      \le M \Big[\sum_{i=1}^{n} (\hat\mu_i - \mu_i)^2\Big]^{1/2} / n^{1/2},

and we have (5.2.4) when n is large enough.
LEMMA 5.2.2. Let the quantities entering the numerator of T_d be computed once with the differenced residuals \hat e_i and once with the differenced errors e_i. Then under Conditions 10 and 11 the two versions differ by o_p(1) in the relevant normalization.

Proof: Notice that

    \hat e_i - e_i = [(y_{i-1} - \hat\mu_{i-1}) - (y_i - \hat\mu_i)]
                     - [(y_{i-1} - \mu_{i-1}) - (y_i - \mu_i)]
                   = (\hat\mu_i - \mu_i) - (\hat\mu_{i-1} - \mu_{i-1}),

so the difference is controlled by Condition 11. The proof is similar to that of Lemma 3.2.1.
LEMMA 5.2.3. Suppose Conditions 9, 10 and 11 hold. Then

    \hat S_{ab} - S_{ab} = o_p(1).

Proof: Following the proof of (A37) in Bickel (1978), we may obtain a decomposition into terms W_k and V_k. The only difference is that we have the ordering \hat a_i \le \hat a_j for i < j in our case, and therefore W_k depends on the order of the \hat a_i's. But this does not affect the proof, because

    E W_k^2 \le M   and   E V_k^2 \le M.

Therefore, both \sum_{k=1}^p W_k(\hat\beta_k - \beta_k)/(n/2)^{1/2} and \sum_{k=1}^p V_k(\hat\beta_k - \beta_k)/(n/2)^{1/2} converge to zero in probability, and \hat S_{ab} - S_{ab} = o_p(1) as n goes to infinity.
LEMMA 5.2.4. Let T_{d1} be the statistic built, as in (5.2.1), from the paired differences e_{2i}, so that its numerator involves

    \sum_{i=1}^{n/2} [b(e_{2i}) - \bar b],

where b is the same as in Bickel's test. Under the regularity conditions mentioned above,

    P_\theta[\hat T_{d1} > z] = P[T_{d1} > z] + o(1) = 1 - \Phi(z - \Lambda_{d1}) + o(1),

where \Lambda_{d1} = \Lambda_{d1}(\theta, n) is the associated noncentrality parameter, built from weights w_{ij} with i < j.
Proof: We need to show the following:

a)  \hat T_{d1} = T_{d1} + o_p(1);
b)  the noncentrality based on \hat a converges to the one based on a;
c)  the joint asymptotic normality of T_{d1} and the log-likelihood ratio under H_0.

Lemmas 5.2.1, 5.2.2 and 5.2.3 are sufficient to show a). To show b), note that by the definitions of \hat\Lambda_{dn} and \Lambda_{(dn)} it is easy to see that \hat\Lambda_{(dn)} - \Lambda_{(dn)} is bounded by terms of the form treated in Lemma 5.2.1; the result then follows from Lemma 5.2.1. To show c), by LeCam's third lemma in Hajek and Sidak (1967), we need to check the Lindeberg condition for bivariate random variables and show that, under H_0, the covariance of T_{d1} with the log-likelihood ratio equals the claimed noncentrality.
This is very easy and very similar to the proof of Bickel's if we notice that

    cov\Big(T_{d1},\ \log \prod_{i=1}^n \frac{dP_\theta Y_i^{-1}}{dP_0 Y_i^{-1}}(Y)\Big)
      = cov\Big(T_{d1},\ -\theta \sum_{i=1}^n a(\mu_i)\Big[1 + \varepsilon_i \frac{f'}{f}(\varepsilon_i)\Big]\Big) + o(1),

and that, after the (2n)^{1/2} standardization, this covariance reduces to

    \theta \Big\{\sum_{i=1}^{n/2} [(a_{2i-1} + a_{2i})/2 - \bar a]^2 / (n/2)\Big\}

by Lemma 5.2.1 and b).

Comparing \Lambda_b with \Lambda_{d1}, we would like to see the difference between \Lambda_n^2 and \Lambda_{(dn)}^2.
LEMMA 5.2.5. Let

    \Lambda_n^2 = \sum_{i=1}^n (a_i - \bar a)^2 / n

and

    \Lambda_{(dn)}^2 = \sum_{i=1}^{n/2} [(a_{(2i-1)} + a_{(2i)})/2 - \bar a]^2 / (n/2).

Then \Lambda_{(dn)}^2 \le \Lambda_n^2.

Proof: \Lambda_{(dn)}^2 can be written as

    \Lambda_{(dn)}^2 = \sum_{i=1}^{n/2} [(a_{(2i-1)} - \bar a)/2 + (a_{(2i)} - \bar a)/2]^2 / (n/2)
      = \sum_{i=1}^{n/2} \{[(a_{(2i-1)} - \bar a)/2]^2 + [(a_{(2i)} - \bar a)/2]^2
        + 2[(a_{(2i-1)} - \bar a)/2][(a_{(2i)} - \bar a)/2]\} / (n/2).

But

    \Lambda_n^2 - \Lambda_{(dn)}^2
      = 2 \sum_{i=1}^{n/2} [(a_{(2i-1)} - \bar a)/2 - (a_{(2i)} - \bar a)/2]^2 / n \ \ge\ 0.

Therefore, we have \Lambda_{(dn)}^2 \le \Lambda_n^2. If we define \Lambda = \lim_{n\to\infty} \Lambda_n and \Lambda_{(d)} = \lim_{n\to\infty} \Lambda_{(dn)}, we have \Lambda_{(d)} \le \Lambda.
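The algebra can be verified directly. Writing u_i = a_{(2i-1)} - \bar a and v_i = a_{(2i)} - \bar a (my shorthand), a worked version of the comparison is

    \Lambda_n^2 - \Lambda_{(dn)}^2
      = \sum_{i=1}^{n/2} \frac{u_i^2 + v_i^2}{n} - \sum_{i=1}^{n/2} \frac{(u_i + v_i)^2}{2n}
      = \sum_{i=1}^{n/2} \frac{(u_i - v_i)^2}{2n} \ \ge\ 0,

and the deficit vanishes exactly when each pair has a_{(2i-1)} = a_{(2i)}, which is what the ordering of the equations is designed to approximate.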
Proof of THEOREM 5.2.1. Consider the centered numerator of T_d, a sum of terms of the form (a_i - \bar a)[b(e_i) - Eb(e_i)]. Since e_i = \varepsilon_{i-1} - \varepsilon_i, the summands form a 1-dependent stationary sequence, and the variance of the standardized sum equals

    1 + 2r + o(1),   where   r = cov\{b(e_1), b(e_2)\}/var\, b(e_1),

so a central limit theorem for 1-dependent sequences gives convergence of the standardized numerator in distribution to N(0,1). By the definition of \hat T_d, its standardization involves 2\sum_i (\hat a_i - \bar{\hat a})^2 and \sum_{i=2}^n [b(\hat e_i) - \bar b]^2/n; notice that the b(e_i) are pairwise 1-dependent, the products b(e_i)b(e_{i+1}) are 3-dependent, and both of them are stationary sequences. By Corollary 3.7.1 in Stout (1974), these averages converge almost surely, and together with (3.2.2) we obtain

    P_\theta[\hat T_d > z] = 1 - \Phi(z - \Lambda_d) + o(1),

where \Lambda_d is as identified above. The detail of the proof for (5.2.2) is similar to that of Lemma 5.2.4. When r < 1/2, the variance factor 1 + 2r is smaller than 2, and the power can be improved in this case. Examples will be seen in the next section.
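In fact, for the quadratic choice b(x) = x^2 the condition r < 1/2 always holds; a worked check (using the value of 2r computed in Example 5.3.1 below, with k the kurtosis of \varepsilon_1) is

    1 + 2r = 1 + \frac{k-1}{k+1} = \frac{2k}{k+1} < 2,

so r < 1/2 whenever k is finite.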
5.3. ASYMPTOTIC EFFICIENCY.

The Pitman efficiency is defined by

    e(T_d, T_b) = \frac{4\, E^2[\varepsilon_1 b'(\varepsilon_1 - \varepsilon_2)]\, \mathrm{var}\, b(\varepsilon_1)}
                       {(1 + 2r)\, E^2[\varepsilon_1 b'(\varepsilon_1)]\, \mathrm{var}\, b(\varepsilon_1 - \varepsilon_2)}.

This depends on the choice of b. We may choose b properly to get high efficiency.
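A Monte Carlo sketch of this quantity (my construction; b, its derivative db, and the value of r are assumptions supplied by the caller) is:

import numpy as np

def pitman_efficiency(b, db, eps, r):
    # Estimate the displayed efficiency formula by sample moments.
    e1, e2 = eps[::2], eps[1::2]         # two independent copies
    d = e1 - e2
    num = 4.0 * (e1 * db(d)).mean() ** 2 * b(e1).var()
    den = (1.0 + 2.0 * r) * (e1 * db(e1)).mean() ** 2 * b(d).var()
    return num / den

rng = np.random.default_rng(4)
eps = rng.normal(size=400_000)
k = 3.0                                  # normal kurtosis
r = (k - 1.0) / (2.0 * (k + 1.0))        # value of r for b(x) = x**2
print(pitman_efficiency(lambda x: x**2, lambda x: 2.0 * x, eps, r))  # ~ 2/3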
EXAMPLE 5.3.1. One interesting choice of b is the power family

    b(x) = |x|^\alpha .

These b's correspond to the LMP tests for f(x) proportional to \exp(-|x|^\alpha). For instance, if our data are from a normal sample, we take

    b(x) = x^2 .

Then we have

    1 + 2r = 1 + \frac{E(\varepsilon_1^4) - E^2(\varepsilon_1^2)}{E(\varepsilon_1^4) + E^2(\varepsilon_1^2)}
           = 1 + \frac{1 - 1/k}{1 + 1/k},

where

    k = E(\varepsilon_1^4)/E^2(\varepsilon_1^2),

the kurtosis of \varepsilon_1. Therefore,

    e(T_d, T_b) = 2\Big(1 - \frac{1}{k}\Big)\Big/\Big\{\Big[1 + \frac{1 - 1/k}{1 + 1/k}\Big]\Big(1 + \frac{1}{k}\Big)\Big\}
                = 1 - \frac{1}{k}.

This indicates that the efficiency tends to be large when the kurtosis is large. If the distribution function of \varepsilon is

    F(x) = (1 - \delta)\Phi(x) + \delta\Phi(x/\sigma),

where \Phi(x) is the standard normal distribution function, the kurtosis k and e(T_d, T_b), the asymptotic efficiency of T_d with respect to T_b, are

    k = \frac{3[(1-\delta) + \delta\sigma^4]}{[(1-\delta) + \delta\sigma^2]^2}
    and    e(T_d, T_b) = 1 - \frac{1}{k}.

These are listed as follows for different values of \delta and \sigma.
                \sigma = 2                  \sigma = 4                  \sigma = 6
  \delta   .000  .025  .075  .200     .000  .025  .075  .200     .000  .025  .075  .200
  k        3.00  3.57  4.25  4.69     3.00  11.7  13.4  9.75     3.00  28.5  22.4  12.2
  e        .667  .720  .765  .787     .667  .915  .925  .897     .667  .965  .955  .918
From this table, we see that the asymptotic efficiency of our test relative to the score test can be as high as .965 for mildly heavy-tailed distributions.
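The table is easy to reproduce from the two displayed formulas; a short computation (using the formulas as reconstructed above) is:

for sigma in (2, 4, 6):
    for delta in (0.000, 0.025, 0.075, 0.200):
        k = 3 * ((1 - delta) + delta * sigma**4) / ((1 - delta) + delta * sigma**2) ** 2
        print(sigma, delta, round(k, 2), round(1 - 1 / k, 3))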
EXAMPLE 5.3.2. As shown in Carroll and Ruppert (1981), we may take b as Huber's function squared:

    b(x) = x^2  if |x| \le k,   and   b(x) = k^2  if |x| > k,

and (5.1.3) still holds. The test statistic uses r_i^{2*}, which is r_i^2 truncated at k^2. Since r_i^{2*} is also equal to r_i truncated at k and then squared, we perform a simulation with all \varepsilon_i truncated at k and then take the squares of the differences between them: define

    \varepsilon_i^* = \varepsilon_i  if |\varepsilon_i| \le k,
    \varepsilon_i^* = -k  if \varepsilon_i < -k,   and   \varepsilon_i^* = k  if \varepsilon_i > k,

and use these differences in (5.2.1); call the resulting test statistic T_{h^2}. Then we get:
                \sigma = 3                   \sigma = 5
  \delta    .025   .050   .075        .025   .050   .075
  k         2.761  2.442  2.430       8.939  6.075  4.794
  e         .650   .662   .683        .643   .668   .688
This suggests that if we use residuals robustly (as in Bickel's test) in testing for heteroscedasticity, symmetrizing the errors by using the differences of residuals loses about one third of the efficiency.
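The clipping step in this simulation is straightforward; a sketch (hypothetical data; the Huber constant 1.5 is illustrative) is:

import numpy as np

def clipped_squared_differences(r, k):
    c = np.clip(r, -k, k)              # epsilon*_i: residuals truncated at +/- k
    return (c[1:] - c[:-1]) ** 2       # squared differences feeding T_{h^2}

rng = np.random.default_rng(5)
r = rng.standard_t(df=5, size=1000)    # heavy-tailed residual stand-ins
print(clipped_squared_differences(r, 1.5)[:5])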
5.4. SUMMARY.

We have shown that we may eliminate asymmetry by taking differences of the equations in the linear model, such that the limit distribution of the test statistic and the power of the test are independent of the method of estimating \beta, the regression parameter. To improve the efficiency of the test, we need to order those equations first; the formulas for power and efficiency are given in Section 5.2 and Section 5.3 respectively. From the two examples we may see that the asymptotic relative efficiency is between .67 and .965 in many interesting cases.
REFERENCES

Anscombe, F.J. (1961). Examination of Residuals. Proc. Fourth Berkeley Symp. Math. Statist. Prob. 1, 1-36. Univ. of California Press.

Bickel, P.J. (1978). Using Residuals Robustly I: Tests for Heteroscedasticity, Nonlinearity. Ann. Statist. 6, 266-290.

Carroll, R.J. and D. Ruppert (1981). On Robust Tests for Heteroscedasticity. Ann. Statist. 9, 206-210.

Carroll, R.J. and D. Ruppert (1982). A Comparison Between Maximum Likelihood and Generalized Least Squares in a Heteroscedastic Linear Model. Journal of the American Statistical Association 77, 878-882.

Carroll, R.J. and D. Ruppert (1988). Transformations and Weighting in Regression. Chapman and Hall: New York and London.

Carroll, R.J., D. Ruppert and J. Wu (1988). On the Effect of Estimating Weights in Weighted Least Squares. Submitted.

Cook, R.D. and S. Weisberg (1983). Diagnostics for Heteroscedasticity in Regression. Biometrika 70, 1-10.

Cox, D.R. and D.V. Hinkley (1974). Theoretical Statistics. Chapman and Hall: New York and London.

Davidian, M. and R.J. Carroll (1987). Variance Function Estimation. Journal of the American Statistical Association 82, 1079-1091.

Davidian, M., R.J. Carroll and W. Smith (1988). Variance Function Estimation and the Minimum Detectable Concentration in Assays. To be published in Biometrika.

Evans, M.A. and M.L. King (1985). A Point Optimal Test for Heteroscedastic Disturbances. Journal of Econometrics 27, 163-178.

Geary, R.C. (1966). A Note on Residual Heterovariance and Estimation Efficiency in Regression. American Statistician 20(4), 30-31.

Goldfeld, S.M. and R.E. Quandt (1965). Some Tests for Homoscedasticity. Journal of the American Statistical Association 60, 539-547.

Hajek, J. and Z. Sidak (1967). Theory of Rank Tests. Academic Press, New York.

Hampel, F.R., E.M. Ronchetti, P.J. Rousseeuw and W.A. Stahel (1986). Robust Statistics. John Wiley & Sons, New York.

Hoeffding, W. (1963). Probability Inequalities for Sums of Bounded Random Variables. Journal of the American Statistical Association 58, 13-30.

Horn, P. (1981). Heteroscedasticity of Residuals: A Non-Parametric Alternative to the Goldfeld-Quandt Peak Test. Communications in Statistics A 10, 795-808.

Jobson, J.D. and W.A. Fuller (1980). Least Squares Estimation When the Covariance Matrix and Parameter Vector Are Functionally Related. Journal of the American Statistical Association 75, 855-861.

Johnston, J. (1972). Econometric Methods, 2nd Edition. McGraw-Hill Book Company, New York.

Judge, G.G., W.E. Griffiths, R.C. Hill and T.-C. Lee (1984). The Theory and Practice of Econometrics. Wiley, New York.

Kmenta, J. (1971). Elements of Econometrics. Macmillan, New York.

Lehmann, E.L. (1983). Theory of Point Estimation. Wiley-Interscience, New York.

McCullagh, P. and D. Pregibon (1987). k-statistics and Dispersion Effects in Regression. Ann. Statist. 15, 202-219.

Park, R.E. (1966). Estimation with Heteroscedastic Error Terms. Econometrica 34, 888.

Rothenberg, T.J. (1984). Approximate Normality of Generalized Least Squares Estimates. Econometrica 52, 811-825.

Ruymgaart, F.H. (1987). Class Notes for Empirical Processes and Applications, Univ. of North Carolina.

Stout, W.F. (1974). Almost Sure Convergence. Academic Press, New York.

Sukhatme, B.V. (1958). Testing the Hypothesis That Two Populations Differ Only in Location. Ann. Math. Statist. 29, 60-78.