Lecture 5

Generalized Least Squares (GLS) Theory, Heteroscedasticity
& Autocorrelation
Ba Chu
E-mail: ba [email protected]
Web: http://www.carleton.ca/∼bchu
(Note: these are lecture notes. Please refer to the textbooks suggested in the course outline for
details. Examples will be given and explained in class.)
1 Objectives
Until now we have assumed that var(u) = σ²I, but it can happen that the errors have non-constant
variance, i.e., var(u_1) ≠ var(u_2) ≠ · · · ≠ var(u_T), or are correlated, i.e., E(u_t u_s) ≠ 0 for t ≠ s. Such
behaviour of the errors is common for many economic variables.
Suppose instead that Var(u) = σ²Ω, where the matrix Ω contains terms for heteroscedasticity
and autocorrelation. First, we study the GLS estimator and the feasible GLS estimator. Second,
we apply these techniques to do statistical inference on the regression model with heteroscedastic
errors and test for heteroscedasticity. Third, we learn to estimate regression models with autocorrelated disturbances and test for serial correlation. Fourth, we also learn to use the ML technique
to estimate regression models with autocorrelated disturbances.
2 GLS
Consider the model:

    y = Xβ + u,    (2.1)

where E[u] = 0 and E[uu'] = σ²Ω, with Ω any symmetric positive definite matrix, not necessarily
diagonal.
2.1 Problem with the OLS
If we estimate (2.1) by OLS, we obtain β̂_OLS = (X'X)^{-1}X'y = β + (X'X)^{-1}X'u. It is
easy to verify that β̂_OLS is still unbiased. However, since

    Var(β̂_OLS) = σ²(X'X)^{-1}X'ΩX(X'X)^{-1} ≠ σ²(X'X)^{-1},

inferences based on the usual estimator σ̂²(X'X)^{-1}, that we have seen, are misleading. Thus,
t and F tests are invalid. Moreover, β̂_OLS is not always consistent. This is the main reason for us
to learn the GLS.
2.2 Derivation of the GLS
Suppose Ω has the eigenvalues λ_1, . . . , λ_T. By the spectral (eigenvalue) decomposition, we obtain:

    Ω = SΛS',

where Λ is a diagonal matrix with diagonal elements (λ_1, . . . , λ_T), and S is an orthogonal
matrix, so that S^{-1} = S'. Thus,

    Ω^{-1} = SΛ^{-1}S' = SΛ^{-1/2}Λ^{-1/2}S' = PP',

where P = SΛ^{-1/2} and Λ^{-1/2} is a diagonal matrix with diagonal elements (1/√λ_1, . . . , 1/√λ_T).
It is straightforward to prove that P'ΩP = I_T. Now, multiplying both sides of (2.1) by P' yields

    P'y = P'Xβ + P'u.

Setting y* = P'y, X* = P'X and u* = P'u, we end up with the classical linear regression model:

    y* = X*β + u*,

where E[u*] = 0 and E[u*u*'] = E[P'uu'P] = σ²P'ΩP = σ²I_T. The OLS estimate of β in the
transformed model is given by

    β̂_GLS = (X*'X*)^{-1}X*'y*
           = (X'PP'X)^{-1}X'PP'y
           = (X'Ω^{-1}X)^{-1}X'Ω^{-1}y.

The last expression is defined as the GLS estimator, denoted by β̂_GLS. Note that the subscript GLS
is sometimes omitted for brevity.
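As a quick illustration (not part of the original notes), the GLS formula and the equivalent transformed-OLS computation can be sketched in NumPy. The data-generating process below is a hypothetical example with a known diagonal Ω, chosen only so the two routes can be compared.

```python
import numpy as np

# Hypothetical illustration: GLS with a known diagonal Omega.
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta = np.array([1.0, 2.0])
omega_diag = np.linspace(0.5, 3.0, T)          # assumed known error variances
u = rng.normal(size=T) * np.sqrt(omega_diag)
y = X @ beta + u

# beta_GLS = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
Omega_inv = np.diag(1.0 / omega_diag)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# Equivalent route through the transformation P'y = P'X beta + P'u,
# with P P' = Omega^{-1}; for diagonal Omega, P' just rescales each row.
Pt = np.diag(1.0 / np.sqrt(omega_diag))
beta_via_transform = np.linalg.lstsq(Pt @ X, Pt @ y, rcond=None)[0]
```

Both routes solve the same normal equations, so they agree up to floating-point error.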
The unbiased estimator of σ² is given by

    σ̂²_GLS = (1/(T−K)) (y* − X*β̂)'(y* − X*β̂) = (1/(T−K)) (y − Xβ̂)'Ω^{-1}(y − Xβ̂).

2.3 Properties of the GLS estimator
• The GLS estimator is unbiased.

• Var(β̂_GLS) = σ²(X'Ω^{-1}X)^{-1}.

• The GLS estimator is BLUE.

• If u ∼ N(0, σ²Ω), then β̂ ∼ N(β, σ²(X'Ω^{-1}X)^{-1}). The F-test statistic is given by

      F = (Rβ̂ − r)'[R(X'Ω^{-1}X)^{-1}R']^{-1}(Rβ̂ − r) / (q σ̂²_GLS),

  where R is q × K.

• If lim_{T→∞} X*'X*/T = Q* > 0, then β̂ →p β as T → ∞.

• √T(β̂ − β) →d N(0, σ²Q*^{-1}).
2.4 Feasible GLS
Since the matrix Ω is unknown, we need to use an estimator, say, Ω̂. We have

    β̂_FGLS = (X'Ω̂^{-1}X)^{-1}X'Ω̂^{-1}y.

If the following conditions hold:

    plim_{T→∞} (X'Ω̂^{-1}X/T) = plim_{T→∞} (X'Ω^{-1}X/T),
    plim_{T→∞} (X'Ω̂^{-1}u/√T) = plim_{T→∞} (X'Ω^{-1}u/√T),

then β̂_GLS and β̂_FGLS are asymptotically equivalent.
3 Heteroscedasticity
Let us consider the model (2.1) in which Ω is a diagonal matrix with diagonal elements
(ω_1², . . . , ω_T²). This is a form of heteroscedasticity, defined in terms of non-constant variances. We can
estimate this model by using either the FGLS or the modified OLS technique proposed by White
[1980]. The FGLS procedure is straightforward. We shall describe White's OLS procedure in
detail.
White [1980] proposes a consistent estimator Ω̂ with diagonal elements (ω̂_1², . . . , ω̂_T²)
given by

    HC-0: ω̂_t² = û_t², where û_t is the OLS residual.
    HC-1: ω̂_t² = û_t² T/(T−K).
    HC-2: ω̂_t² = û_t²/(1−h_t).
    HC-3: ω̂_t² = û_t²/(1−h_t)²,

where h_t is the t-th diagonal element of the matrix X(X'X)^{-1}X'.
The asymptotic variance-covariance matrix of β̂_OLS is given by

    Avar(β̂_OLS) = lim_{T→∞}(X'X/T)^{-1} · plim_{T→∞}(σ̂²_OLS X'Ω̂X/T) · lim_{T→∞}(X'X/T)^{-1}.
Based on this heteroscedasticity-consistent estimator, general linear hypotheses may be tested
by the Wald statistic:

    W = (Rβ̂_OLS − r)'[R Avar(β̂_OLS) R']^{-1}(Rβ̂_OLS − r) →d χ²(q).
The White procedure has large-sample validity. It may not work well in finite samples.
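The HC-0 to HC-3 variants above can be sketched in a few lines of NumPy. This is a minimal illustration, not from the notes; the simulated heteroscedastic data are an assumption made for the example.

```python
import numpy as np

# Sketch: White-type heteroscedasticity-consistent (HC) covariance estimators.
rng = np.random.default_rng(1)
T, K = 500, 2
X = np.column_stack([np.ones(T), rng.normal(size=T)])
u = rng.normal(size=T) * (0.5 + np.abs(X[:, 1]))   # heteroscedastic errors
y = X @ np.array([1.0, 2.0]) + u

XtX_inv = np.linalg.inv(X.T @ X)
b_ols = XtX_inv @ X.T @ y
resid = y - X @ b_ols
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)        # leverages h_t

def hc_cov(kind):
    # omega_t^2 for each variant of White's estimator
    w2 = {'HC0': resid**2,
          'HC1': resid**2 * T / (T - K),
          'HC2': resid**2 / (1 - h),
          'HC3': resid**2 / (1 - h)**2}[kind]
    meat = (X * w2[:, None]).T @ X                  # X' diag(w2) X
    return XtX_inv @ meat @ XtX_inv                 # sandwich form

V0, V3 = hc_cov('HC0'), hc_cov('HC3')
```

Since 1/(1−h_t)² ≥ 1, HC-3 never shrinks the estimated variances relative to HC-0, which is why it is often preferred in small samples.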
We can detect heteroscedasticity in the noise by using White's test. The null hypothesis
is homoscedasticity. To carry out White's test, we regress the squared OLS residuals on a constant,
the original regressors, the squares of the regressors, and the cross products of the regressors, and
obtain the R² value. The test statistic is TR² ∼ χ²(q), where q is the number of variables in the
auxiliary regression less one. We reject the null if TR² is sufficiently large.
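The auxiliary regression can be sketched as follows. This is an illustrative single-regressor example (so the auxiliary regression contains a constant, x and x², giving q = 2); the data-generating process is an assumption.

```python
import numpy as np

# Sketch: White's test with one regressor.
rng = np.random.default_rng(2)
T = 400
x = rng.normal(size=T)
u = rng.normal(size=T) * (0.5 + x**2)               # variance depends on x
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(T), x])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Auxiliary regression of squared residuals on constant, x, x^2.
Z = np.column_stack([np.ones(T), x, x**2])
e2 = resid**2
fit = Z @ np.linalg.lstsq(Z, e2, rcond=None)[0]
R2 = 1 - np.sum((e2 - fit)**2) / np.sum((e2 - e2.mean())**2)
white_stat = T * R2                                  # compare with chi2(2)
```

With the strongly heteroscedastic errors simulated here, the statistic comfortably exceeds the 5% critical value of χ²(2) (about 5.99), so the null of homoscedasticity is rejected.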
4 Serial Correlation
Note that serial correlation and autocorrelation mean the same thing. Consider the model:

    y_t = X_t'β + u_t, ∀ t = 1, . . . , T,    (4.1)

where E[u_t] = 0, E[u_t²] = σ², and E[u_t u_s] ≠ 0 for t ≠ s.
For example, we can model serial correlation by an AR(1) process, i.e.,

    u_t = ρu_{t−1} + ε_t,    (4.2)

where ε_t ∼ IID(0, σ_ε²) and |ρ| < 1. We can compute

    E[u_t²] = σ_ε²/(1 − ρ²),    E[u_t u_s] = ρ^{|t−s|} σ_ε²/(1 − ρ²).
More generally, we have

    E[uu'] = σ_ε²/(1 − ρ²) ×
        [ 1          ρ          ρ²         . . .   ρ^{T−1} ]
        [ ρ          1          ρ          . . .   ρ^{T−2} ]
        [ .          .          .                  .       ]
        [ ρ^{T−1}    ρ^{T−2}    ρ^{T−3}    . . .   1       ]
    = σ_u² Ω,    (4.3)

where σ_u² and Ω have the obvious meanings.
4.1 Estimate (4.1)-(4.2) by FGLS
The FGLS can be done in the following steps:

1. Regress y_t on X_t and obtain û_t = y_t − X_t'β̂_OLS.

2. Regress û_t on û_{t−1} and obtain ρ̂_OLS = Σ_{t=2}^T û_t û_{t−1} / Σ_{t=2}^T û_{t−1}².

3. Construct the FGLS estimator β̂_FGLS = (X'Ω̂^{-1}X)^{-1}X'Ω̂^{-1}y, where Ω̂ is obtained
   from (4.3) by substituting ρ̂_OLS for ρ.

If plim_{T→∞} (1/T) Σ_{t=1}^T X_t u_t = 0, then β̂_OLS →p β and ρ̂_OLS →p ρ. Therefore, β̂_FGLS is consistent
and asymptotically normal.
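The three steps above can be sketched directly in NumPy. This is an illustrative implementation under an assumed AR(1) data-generating process; in practice the procedure is often iterated.

```python
import numpy as np

# Sketch of the three FGLS steps for AR(1) errors (illustrative DGP).
rng = np.random.default_rng(3)
T, rho = 300, 0.6
X = np.column_stack([np.ones(T), rng.normal(size=T)])
eps = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + eps[t]
y = X @ np.array([1.0, 2.0]) + u

# Step 1: OLS residuals.
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ b_ols

# Step 2: rho_hat from regressing u_t on u_{t-1}.
rho_hat = (uhat[1:] @ uhat[:-1]) / (uhat[:-1] @ uhat[:-1])

# Step 3: build Omega from (4.3) with rho_hat and compute FGLS.
idx = np.arange(T)
Omega = rho_hat ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho_hat**2)
Oinv = np.linalg.inv(Omega)
b_fgls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)
```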
4.2 Estimate (4.1)-(4.2) by the ML
Applying the conditional probability rules, we have

    pdf(y_1, y_2) = pdf(y_2|y_1) pdf(y_1),
    pdf(y_1, y_2, y_3) = pdf(y_3|y_1, y_2) pdf(y_1, y_2) = pdf(y_3|y_1, y_2) pdf(y_2|y_1) pdf(y_1),
    . . .
    pdf(y_1, . . . , y_T) = pdf(y_1) Π_{t=2}^T pdf(y_t|y_{t−1}, . . . , y_1).

Since

    y_t = X_t'β + u_t = X_t'β + ρu_{t−1} + ε_t = X_t'β + ρ(y_{t−1} − X_{t−1}'β) + ε_t

and ε_t ∼ N(0, σ_ε²), we have

    y_t|y_{t−1} ∼ N(ρy_{t−1} + X_t'β − ρX_{t−1}'β, σ_ε²).

Moreover, since y_1 ∼ N(X_1'β, σ_ε²/(1 − ρ²)), we can derive the log-likelihood function:

    log L(β, ρ, σ_ε²) = log pdf(y_1, . . . , y_T) = log pdf(y_1) + Σ_{t=2}^T log pdf(y_t|y_{t−1}).

We then maximize this log-likelihood function to obtain the MLEs. The asymptotic variances of the
MLEs can be obtained from the Cramér-Rao lower bound.
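The log-likelihood above can be written out directly. The function below is a sketch of that decomposition; for brevity it is evaluated over a grid on ρ at assumed true values of β and σ_ε², rather than jointly maximized with a numerical optimizer.

```python
import numpy as np

# Sketch: exact log-likelihood of the AR(1)-error regression, combining
# log pdf(y1) with the conditional normal densities of y_t | y_{t-1}.
def ar1_loglik(beta, rho, sig2_eps, y, X):
    beta = np.asarray(beta, dtype=float)
    # y1 ~ N(X1'beta, sig2_eps / (1 - rho^2))
    v1 = sig2_eps / (1 - rho**2)
    ll = -0.5 * (np.log(2 * np.pi * v1) + (y[0] - X[0] @ beta) ** 2 / v1)
    # y_t | y_{t-1} ~ N(rho*y_{t-1} + X_t'beta - rho*X_{t-1}'beta, sig2_eps)
    m = rho * y[:-1] + X[1:] @ beta - rho * (X[:-1] @ beta)
    ll += np.sum(-0.5 * (np.log(2 * np.pi * sig2_eps)
                         + (y[1:] - m) ** 2 / sig2_eps))
    return ll

# Illustrative data; coarse grid search over rho at the true beta and sigma.
rng = np.random.default_rng(4)
T, rho_true = 400, 0.5
X = np.column_stack([np.ones(T), rng.normal(size=T)])
eps = rng.normal(size=T)
u = np.zeros(T)
u[0] = eps[0] / np.sqrt(1 - rho_true**2)
for t in range(1, T):
    u[t] = rho_true * u[t - 1] + eps[t]
y = X @ np.array([1.0, 2.0]) + u

grid = np.linspace(-0.9, 0.9, 181)
lls = [ar1_loglik([1.0, 2.0], r, 1.0, y, X) for r in grid]
rho_ml = grid[int(np.argmax(lls))]
```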
4.3 Estimate (4.1) by the modified OLS
In general, the model (4.1) with a general form of heteroscedasticity and autocorrelation can be
consistently estimated by the OLS, i.e., β̂_OLS = (X'X)^{-1}X'y. β̂_OLS is asymptotically normal (AN), i.e.,

    √T(β̂_OLS − β) →d N(0, σ²Q^{-1}MQ^{-1}),

where Q = lim_{T→∞} X'X/T and M = plim_{T→∞} X'ΩX/T under the assumption E[uu'] = σ²Ω.
Q can be estimated by Q̂ = X'X/T; σ²M can be consistently estimated by the Newey-West
heteroscedasticity and autocorrelation consistent covariance (HAC) matrix estimator:

    HAC = Γ̂_0 + Σ_{j=1}^m w(j, m)(Γ̂_j + Γ̂_j'),

where

    Γ̂_j = (1/T) Σ_{t=j+1}^T X_t û_t û_{t−j} X_{t−j}', ∀ j = 1, . . . , m,

w(j, m) is a lag window, and m is a lag truncation parameter. If w(j, m) = 1 − j/(m + 1), then
w(j, m) is called the Bartlett window.
Hence, the asymptotic covariance matrix of β̂_OLS can be estimated by Q̂^{-1} HAC Q̂^{-1}.
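The HAC construction can be sketched as follows. This is a minimal illustration with the Bartlett window and an assumed AR(1)-error data-generating process; the choice m = 4 is arbitrary.

```python
import numpy as np

# Sketch: Newey-West HAC estimator with the Bartlett window.
rng = np.random.default_rng(5)
T, m = 300, 4
X = np.column_stack([np.ones(T), rng.normal(size=T)])
eps = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + eps[t]
y = X @ np.array([1.0, 2.0]) + u

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ b_ols
Xu = X * uhat[:, None]                      # rows are X_t * u_t

def gamma(j):
    # Gamma_j = (1/T) * sum_{t=j+1}^T X_t u_t u_{t-j} X_{t-j}'
    return Xu[j:].T @ Xu[:T - j] / T

hac = gamma(0)
for j in range(1, m + 1):
    w = 1 - j / (m + 1)                     # Bartlett window
    g = gamma(j)
    hac += w * (g + g.T)

Qhat_inv = np.linalg.inv(X.T @ X / T)
avar = Qhat_inv @ hac @ Qhat_inv            # estimated Avar of sqrt(T)(b - beta)
```

The Bartlett weights guarantee that the HAC matrix, and hence the sandwich estimate, is positive semi-definite.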
4.4 Test for serial correlation
• The Durbin-Watson statistic

      DW = Σ_{t=2}^T (û_t − û_{t−1})² / Σ_{t=1}^T û_t²,

  where û_t is the OLS residual, is used to test for first-order serial correlation (i.e., AR(1)).

  – For H_0: ρ = 0 vs. H_1: ρ > 0, reject H_0 if DW ≤ DW_L, do not reject if DW > DW_U,
    and the test is inconclusive if DW_L < DW < DW_U, where DW_L is the lower critical value
    and DW_U is the upper critical value.
  – For H_0: ρ = 0 vs. H_1: ρ < 0, reject H_0 if DW ≥ 4 − DW_L, and do not reject if
    DW < 4 − DW_U.
• The Breusch-Godfrey LM test is used to test for higher-order serial correlation (i.e., the AR(p)
  model given by

      u_t = Σ_{i=1}^p φ_i u_{t−i} + ε_t).

  The test is carried out by running the OLS regression û_t = X_t'γ + Σ_{i=1}^p b_i û_{t−i} + error.
  Next, we use either the LM test TR² →d χ²(p) under the null hypothesis of no serial correlation
  or the F-test for b_1 = b_2 = · · · = b_p = 0.
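The auxiliary regression for the LM version can be sketched as follows. This is an illustration with p = 2 under an assumed AR(1)-error data-generating process; pre-sample lagged residuals are set to zero, one common convention.

```python
import numpy as np

# Sketch: Breusch-Godfrey LM test for AR(p) serial correlation, p = 2.
rng = np.random.default_rng(6)
T, p = 400, 2
X = np.column_stack([np.ones(T), rng.normal(size=T)])
eps = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + eps[t]
y = X @ np.array([1.0, 2.0]) + u

uhat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Auxiliary regression of u_t on X_t and p lags of u (pre-sample lags = 0).
lags = np.column_stack([np.concatenate([np.zeros(i), uhat[:T - i]])
                        for i in range(1, p + 1)])
Z = np.column_stack([X, lags])
fit = Z @ np.linalg.lstsq(Z, uhat, rcond=None)[0]
R2 = 1 - np.sum((uhat - fit)**2) / np.sum((uhat - uhat.mean())**2)
bg_stat = T * R2                            # compare with chi2(p) under the null
```

With ρ = 0.6 in the simulated errors, the statistic is far above the χ²(2) critical values, so the no-serial-correlation null is rejected.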
5 Autoregressive Conditional Heteroscedasticity (ARCH)
Engle [1982] suggests that heteroscedasticity may occur in a time-series context. In speculative markets such as exchange rates and stock market returns, one often observes the phenomenon called
volatility clustering: large changes tend to be followed by large changes, of either sign, and small
changes tend to be followed by small changes. Observations of this phenomenon in financial time
series have led to the use of ARCH models. Engle [1982] formulates the notion that the recent past
might give information about the conditional disturbance variance. He postulates the relation:

    y_t = X_t'β + σ_t ε_t, where ε_t ∼ IID(0, 1),
    σ_t² = α_0 + α_1 u_{t−1}² + · · · + α_p u_{t−p}², where u_t = y_t − X_t'β.
Testing for ARCH can be done in the following steps:

1. Fit y to X by OLS and obtain the residuals {e_t}.

2. Compute the OLS regression e_t² = α̂_0 + α̂_1 e_{t−1}² + · · · + α̂_p e_{t−p}² + error.

3. Test the joint significance of α̂_1, . . . , α̂_p.
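The three steps can be sketched for p = 1 as follows. The ARCH(1) data-generating process and its parameter values are assumptions made for the illustration.

```python
import numpy as np

# Sketch: LM test for ARCH(1) effects.
rng = np.random.default_rng(7)
T = 500
X = np.column_stack([np.ones(T), rng.normal(size=T)])
eps = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    sig2_t = 0.3 + 0.5 * u[t - 1] ** 2      # ARCH(1) conditional variance
    u[t] = np.sqrt(sig2_t) * eps[t]
y = X @ np.array([1.0, 2.0]) + u

# Step 1: OLS residuals.
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Step 2: regress e_t^2 on a constant and e_{t-1}^2.
Z = np.column_stack([np.ones(T - 1), e[:-1] ** 2])
e2 = e[1:] ** 2
fit = Z @ np.linalg.lstsq(Z, e2, rcond=None)[0]
R2 = 1 - np.sum((e2 - fit)**2) / np.sum((e2 - e2.mean())**2)

# Step 3: LM form of the joint-significance test, ~ chi2(1) under no ARCH.
arch_stat = (T - 1) * R2
```

With α_1 = 0.5 in the simulation, the statistic exceeds the 5% critical value of χ²(1) (about 3.84), so the no-ARCH null is rejected.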
6 Exercises
1. Problems 6.1, 6.2, 6.3 and 6.7 (pp. 202-203, JD97).
2. Consider the model

       y_t = βy_{t−1} + u_t, |β| < 1,
       u_t = ε_t + θε_{t−1}, |θ| < 1,

   where t = 1, . . . , T and ε_t ∼ IID(0, σ²).

   (a) Show that the OLS estimator of β is inconsistent when θ ≠ 0.

   (b) Explain how you would test H_0: θ = 0 vs. H_1: θ ≠ 0 using the Lagrange multiplier
       test principle. Is such a test likely to be superior to the Durbin-Watson test?
3. Consider the model

       y_t = βx_t + u_t,
       u_t = ρu_{t−1} + ε_t, |ρ| < 1,

   where x_t is non-stochastic and ε_t ∼ IID N(0, σ²).

   (a) Given a sample {x_t, y_t}_{t=1}^T, construct the log-likelihood function.

   (b) Estimate the parameters β, ρ and σ² by the ML method.

   (c) Derive the asymptotic variances of these MLEs.

   (d) Given x_{T+1}, what do you think is the best predictor of y_{T+1}?
4. Consider the linear regression model y = Xβ + u, where y is a T × 1 vector of observations on
   the dependent variable, X is a T × K matrix of observations on K non-stochastic explanatory
   variables with rank(X) = K < T, β is a K × 1 vector of unknown coefficients, and u is a
   T × 1 vector of unobserved disturbances such that E[u] = 0 and E[uu'] = σ²Ω, with Ω
   positive definite and known.

   (a) Derive the GLS estimator β̂_GLS of β.
   (b) Assuming that T^{−1/2} X'Ω^{-1}u →d N(0, σ² lim_{T→∞} X'Ω^{-1}X/T), derive the limiting
       distribution of β̂_GLS as T → ∞. State clearly any further assumptions you make and any
       statistical results you use.
5. In the generalized regression model, suppose that Ω is known.

   (a) What is the covariance matrix of the OLS and GLS estimators of β?

   (b) What is the covariance matrix of the OLS residual vector û_OLS = y − Xβ̂_OLS?

   (c) What is the covariance matrix of the GLS residual vector û_GLS = y − Xβ̂_GLS?

   (d) What is the covariance matrix of the OLS and GLS residual vectors?
6. Suppose that y_t has the pdf pdf(y_t|X_t) = (1/X_t'β) exp{−y_t/(X_t'β)}, y_t > 0, where X_t' is
   1 × K and β is K × 1. Then E[y_t|X_t] = X_t'β and Var[y_t|X_t] = (X_t'β)². For this model,
   prove that the GLS estimator and the MLE are the same, even though this distribution involves
   the same parameters in the conditional mean function and the disturbance variance.
7. Suppose that the regression model is y_t = µ + ε_t, where ε_t has a zero mean, constant variance,
   and equal correlation ρ across observations, so that Cov(ε_t, ε_s) = σ²ρ if t ≠ s. Prove that the
   OLS estimator of µ is inconsistent.
8. Suppose that the regression model is y_t = µ + ε_t, where

       E[ε_t|x_t] = 0, Cov(ε_t, ε_s|x_t, x_s) = 0 for t ≠ s, but Var[ε_t|x_t] = σ²x_t², x_t > 0.

   (a) Given a sample of observations on y_t and x_t, what is the most efficient estimator of µ?
       What is its variance?

   (b) What is the OLS estimator of µ, and what is its variance?

   (c) Prove that the estimator in part (a) is at least as efficient as the estimator in part (b).
References
Engle, R. F. [1982]. Autoregressive conditional heteroscedasticity with estimates of the variance of
United Kingdom inflation, Econometrica 50: 987-1008.
White, H. [1980]. A heteroskedasticity-consistent covariance matrix estimator and a direct test for
heteroskedasticity, Econometrica 48: 817-838.