
Econ 140, Lecture 19: Heteroskedasticity
Today’s plan
• How to test for heteroskedasticity: graphs, the Park and Glejser tests
• What we can do if we find heteroskedasticity
• How to estimate in the presence of heteroskedasticity
Palm Beach County revisited
• How much of an outlier is Palm Beach?
– Can the outlier be explained by heteroskedasticity?
– If so, what are the consequences?
• Heteroskedasticity will affect the variance of the regression
line
– It will consequently affect the variance of the estimated
coefficients
• L19.XLS provides an example of how to work through a
problem like this using Excel
Palm Beach County revisited (2)
• Palm Beach is a good example to use since there are scale
effects in the data
– The voting pattern shows that the voting behavior and
number of registered voters are related to the
population in each county
• As the county gets larger, voting patterns may diverge
from what would be assumed given the number of
registered voters
– Note from the graph: as we move away from the origin,
the difference between registered Reform voters and
Reform votes cast increases
– We’ll hypothesize that this pattern shows up as
heteroskedasticity
Notation
• Heteroskedasticity is observed as cross-section variability
in the data
– data across units at a point in time
• In our notation, heteroskedasticity is:
$E(e_i^2) \neq \sigma^2$
• We can also write:
$E(e_i^2) = \sigma_i^2$
– This means that we expect variable variance: the
variance changes with each unit of observation
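A minimal simulation sketch (my own illustration, not from the lecture) of what $E(e_i^2) = \sigma_i^2$ looks like in data; the variance form $\sigma_i = 0.5\sqrt{X_i}$ is purely an assumption for illustration:

import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(1, 100, n)            # hypothetical regressor (e.g., registered voters)
sigma_i = 0.5 * np.sqrt(X)            # assumed form: the sd of e_i grows with X_i
e = rng.normal(0.0, sigma_i)          # heteroskedastic errors: E(e_i^2) = sigma_i^2
Y = 50.0 + 2.0 * X + e                # hypothetical true relationship

small = X < np.median(X)
print(e[small].var(), e[~small].var())   # the error variance is larger where X is large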
Consequences
When heteroskedasticity is present:
1) OLS estimator is still linear
2) OLS estimator is still unbiased
3) OLS estimator is not efficient - the minimum
variance property no longer holds
4) Estimates of the variances are biased
5) $\hat{\sigma}_{YX}^2 = \frac{\sum \hat{e}^2}{n-k}$ is not an unbiased estimator of $\sigma_{YX}^2$
6) We can’t trust the confidence intervals or
hypothesis tests (t-tests & F-tests): we may draw the
wrong conclusions
Consequences (2)
• When BLUE holds and there is homoskedasticity, the first-order condition gives:
$V(\hat{b}) = \sum c_i^2 \sigma^2$ where $c_i = \frac{x_i}{\sum x_i^2}$
• With heteroskedasticity, we have:
$V(\hat{b}) = \sum c_i^2 \sigma_i^2$
• If we substitute the equation for $c_i$ into both expressions, we find:
$V(\hat{b}) = \frac{\sigma^2}{\sum x_i^2}$ and $V(\hat{b}) = \frac{\sum x_i^2 \sigma_i^2}{(\sum x_i^2)^2}$
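A small Monte Carlo sketch (again my own illustration, reusing the assumed data-generating process from the earlier sketch) confirms that $\sum c_i^2 \sigma_i^2$ matches the actual sampling variance of $\hat{b}$, while the constant-variance formula does not:

import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 5000
X = rng.uniform(1, 100, n)
x = X - X.mean()                      # deviations from the mean
sigma_i = 0.5 * np.sqrt(X)            # assumed sigma_i, as in the earlier sketch

b_hat = np.empty(reps)
for r in range(reps):
    e = rng.normal(0.0, sigma_i)
    Y = 50.0 + 2.0 * X + e
    b_hat[r] = np.sum(x * (Y - Y.mean())) / np.sum(x * x)   # OLS slope

print(b_hat.var())                                   # actual sampling variance of b-hat
print(np.sum(x**2 * sigma_i**2) / np.sum(x**2)**2)   # sum c_i^2 sigma_i^2 (correct)
print(np.mean(sigma_i**2) / np.sum(x**2))            # constant-variance formula (biased here)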
Cases
• With homoskedasticity: the variance around the regression
line is constant at every value of the independent variable
• With heteroskedasticity: the variance around the regression
line changes with the value of the independent variable
Detecting heteroskedasticity
• There are three ways of detecting heteroskedasticity:
1) Graphically
2) Park Test
3) Glejser Test
Graphical detection
• We can check whether the errors vary with the unit of observation
• With homoskedasticity, the errors are independent of the
independent variables: a plot of $e$ against $X$ shows no
systematic pattern around $E(e_i) = 0$
• With heteroskedasticity we can get a variety of patterns: the
errors show a systematic relationship with the independent variables
• Note: you can use either $e$ or $e^2$ on the y-axis
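A sketch of the graphical check (matplotlib assumed; X and Y carried over from the simulation sketch above):

import numpy as np
import matplotlib.pyplot as plt

# X and Y as in the simulation sketch above (or any data set)
x = X - X.mean()
b = np.sum(x * (Y - Y.mean())) / np.sum(x * x)   # OLS slope
a = Y.mean() - b * X.mean()                      # OLS intercept
resid = Y - (a + b * X)                          # residuals

plt.scatter(X, resid)      # a fan shape (spread growing with X) suggests heteroskedasticity
plt.axhline(0.0)
plt.xlabel("X")
plt.ylabel("residual")
plt.show()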
Graphical detection (3)
• Using the Palm Beach example (L19.xls), the estimated
regression equation was:
$\hat{Y} = 50.28 + 2.45X$
• The errors of this equation, $\hat{e} = Y - \hat{Y}$, can be graphed
against the number of registered Reform party voters (the
independent variable)
– The graph shows that the errors increase with the number
of registered Reform voters
• While the graphs may be convincing, we also want to use a
test to confirm this. We have two:
Park Test
• Here’s the procedure:
1) Run the regression $Y_i = a + bX_i + e_i$ despite the
heteroskedasticity problem (it can also be multivariate)
2) Obtain the residuals ($e_i$), square them ($e_i^2$), and take their
logs ($\ln e_i^2$)
3) Run a spurious regression: $\ln e_i^2 = g_0 + g_1 \ln X_i + v_i$
4) Do a hypothesis test on $\hat{g}_1$ with $H_0: g_1 = 0$
5) Look at the results of the hypothesis test:
• reject the null: you have heteroskedasticity
• fail to reject the null: homoskedasticity, or
$\ln e_i^2 = g_0$, which is a constant
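A minimal sketch of the Park Test in Python (statsmodels assumed; X and Y as in the earlier sketches):

import numpy as np
import statsmodels.api as sm

ols = sm.OLS(Y, sm.add_constant(X)).fit()               # step 1: original regression
ln_e2 = np.log(ols.resid ** 2)                          # step 2: log of squared residuals
park = sm.OLS(ln_e2, sm.add_constant(np.log(X))).fit()  # step 3: ln e^2 on ln X
print(park.params[1], park.tvalues[1])                  # steps 4-5: test H0: g1 = 0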
Glejser Test
• When we use the Glejser Test, we’re looking for a scaling effect
• The procedure:
1) Run the regression (it can also be multivariate)
2) Collect ei terms
3) Take the absolute value of the errors
4) Regress |ei| against independent variable(s)
• you can run different kinds of regressions:
$|e_i| = g_0 + g_1 X_i + u_i$
or $|e_i| = g_0 + g_1 \sqrt{X_i} + u_i$
or $|e_i| = g_0 + g_1 \frac{1}{X_i} + u_i$
Glejser Test (2)
4) [continued]
• If heteroskedasticity takes one of these forms, this
will suggest an appropriate transformation of the
model
• The null hypothesis is still H0: g1 = 0 since we’re
testing for a relationship between the errors and the
independent variables
• We reach the same conclusions as in the Park Test
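A minimal sketch of the Glejser Test in Python (statsmodels assumed; X and Y as before):

import numpy as np
import statsmodels.api as sm

ols = sm.OLS(Y, sm.add_constant(X)).fit()     # steps 1-2: regression and residuals
abs_e = np.abs(ols.resid)                     # step 3: absolute value of the errors
for Z in (X, np.sqrt(X), 1.0 / X):            # step 4: the three candidate forms
    g = sm.OLS(abs_e, sm.add_constant(Z)).fit()
    print(g.params[1], g.tvalues[1])          # rejecting H0: g1 = 0 indicates heteroskedasticity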
A cautionary note
• The errors in the Park Test (vi) and the Glejser Test (ui)
might also be heteroskedastic.
– If this is the case, we cannot trust the hypothesis test
H0: g1 = 0 or the t-test
• If we find heteroskedastic disturbances in the data, what
can we do?
– Estimate the model $Y_i = a + bX_i + e_i$ using weighted
least squares
– We’ll look at two examples of weighted least squares:
one where we know the true variance, and one where
we don’t
Correction with known $\sigma_i^2$
• Given that the true variance is known and our model is:
$Y_i = a + bX_i + e_i$
• Consider the following transformation of the model:
$\frac{Y_i}{\sigma_i} = a\frac{1}{\sigma_i} + b\frac{X_i}{\sigma_i} + \frac{e_i}{\sigma_i}$
– In the transformed model, let $u_i = \frac{e_i}{\sigma_i}$
– So the expected value of the error squared is:
$E(u_i^2) = \frac{E(e_i^2)}{\sigma_i^2}$
Correction with known $\sigma_i^2$ (2)
• Given that there is heteroskedasticity, $E(e_i^2) = \sigma_i^2$
– thus:
$E(u_i^2) = \frac{\sigma_i^2}{\sigma_i^2} = 1$
• In this simplistic example, we re-weighted the model by the
known value $\sigma_i$
• What this example shows: when the variance is known, we
must transform our model to obtain a homoskedastic error
term.
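A sketch of the known-variance correction in code (statsmodels assumed; sigma_i is the known array of standard deviations from the earlier sketch):

import numpy as np
import statsmodels.api as sm

# Divide every term by sigma_i; the 1/sigma_i column takes the place of the constant
Y_t = Y / sigma_i
X_t = np.column_stack((1.0 / sigma_i, X / sigma_i))
wls = sm.OLS(Y_t, X_t).fit()    # OLS on the transformed model is weighted least squares
print(wls.params)               # estimates of a and b on the original scale

Running OLS on the transformed variables is exactly weighted least squares; note that no separate constant is added, since the $1/\sigma_i$ column plays that role.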
Correction with unknown $\sigma_i^2$
• Given an unknown variance, we need to state ad hoc but
plausible assumptions about our variance $\sigma_i^2$ (how the
errors vary with the independent variable)
• For example: we can assert that $E(e_i^2) = \sigma^2 X_i$
• Remember: the Glejser Test allows us to choose a relationship
between the errors and the independent variable
Correction with unknown $\sigma_i^2$ (2)
• In this example you would transform the estimating
equation by dividing through by $\sqrt{X_i}$ to get:
$\frac{Y_i}{\sqrt{X_i}} = a\frac{1}{\sqrt{X_i}} + b\frac{X_i}{\sqrt{X_i}} + \frac{e_i}{\sqrt{X_i}}$
• Letting $u_i = \frac{e_i}{\sqrt{X_i}}$:
– The expected value of this error squared is:
$E(u_i^2) = \frac{E(e_i^2)}{X_i}$
Correction with unknown $\sigma_i^2$ (3)
• Recalling our earlier assumption that $E(e_i^2) = \sigma^2 X_i$, we find:
$E(u_i^2) = \frac{E(e_i^2)}{X_i} = \frac{\sigma^2 X_i}{X_i} = \sigma^2$
• When we don’t know the true variance we re-scale the
estimating equation by the independent variable
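A sketch of the unknown-variance correction in code, under the maintained assumption $E(e_i^2) = \sigma^2 X_i$ (statsmodels assumed; X and Y as before):

import numpy as np
import statsmodels.api as sm

w = np.sqrt(X)                  # re-scale by sqrt(X_i) under E(e_i^2) = sigma^2 * X_i
wls = sm.OLS(Y / w, np.column_stack((1.0 / w, X / w))).fit()
print(wls.params)               # estimates of a and b
# equivalently: sm.WLS(Y, sm.add_constant(X), weights=1.0 / X).fit()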
Returning to Palm Beach
• On L19.xls we have presidential election data by county in
Florida
– To get a correct estimating equation, we can run a
regression without Palm Beach if we think it’s an
outlier.
– Then we can see if we can obtain a prediction for the
number of Reform votes cast in Palm Beach
– We can perform a Glejser Test for the regression
excluding Palm Beach
– We run a regression of the absolute value of the errors
($|e_i|$) against registered Reform voters ($X_i$)
Returning to Palm Beach (2)
• The t-test rejects the null
– this indicates the presence of heteroskedasticity
• We can re-scale the model in different ways or introduce a
new independent variable (such as the total number of
registered voters by county)
• Keep transforming the model and re-running the Glejser Test
– When we fail to reject the null, there is no longer
evidence of heteroskedasticity in the model
Summary
• Even with re-weighted equations, we might still have
heteroskedastic errors
– so we have to rerun the Glejser Test until we cannot
reject the null
• If we keep rejecting the null, we may have to rethink our
model transformation
– if we suspect a scale effect, we may want to introduce
new scaling variables
• Coefficients from the re-scaled equation are comparable with
the coefficients from the original model