
FINM Intro: Regression
Lecture 2
Robustness
Mark Hendricks
Autumn 2016
Outline
OLS Summary
Large Sample Properties
Robust Estimators
Assumption: Full-rank
Assumption 1: X'X is full rank.
Equivalently, assume that there is no exact linear relationship
among any of the regressors.
Hendricks,
- Clearly, the existence of the OLS estimator requires that this assumption be satisfied.
- Multicollinearity refers to the case where this assumption fails.
Assumption: Exogeneity
Assumption 2: The residual, ε, is exogenous to the regressors, x:

E[ε|x] = 0
The exogeneity assumption,
Hendricks,
- implies that ε is uncorrelated with x.
- implies that ε is uncorrelated with any function of x.
- does NOT imply that ε is independent of x.
Assumption: Homoscedastic and orthogonal residuals
Assumption 3: The residuals are uncorrelated across
observations, with identical variances,
Σ = E[εε'|x] = σ² I_n
Gauss-Markov Theorem
Given the assumptions above, the OLS estimator is the Best Linear
Unbiased Estimator (BLUE) of β.
b = (X'X)⁻¹ X'Y

var[b|x] = σ² (X'X)⁻¹
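As a concrete sketch, the estimator and its classical covariance above translate directly into numpy. The `ols` helper and the simulated data below are illustrative, not part of the lecture.

```python
import numpy as np

def ols(X, Y):
    """OLS estimate b = (X'X)^{-1} X'Y with classical covariance.

    X is an n x (k+1) design matrix (including an intercept column),
    Y is a length-n response. lstsq is used instead of an explicit
    inverse of X'X for numerical stability.
    """
    n, k1 = X.shape
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    e = Y - X @ b                         # sample residuals
    s2 = e @ e / (n - k1)                 # unbiased estimate of sigma^2
    cov_b = s2 * np.linalg.inv(X.T @ X)   # var[b|x] = sigma^2 (X'X)^{-1}
    return b, cov_b

# usage: simulated data where the true beta = (1, 2) is known
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=500)
b, cov_b = ols(X, Y)
```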
Assumption: Normality of residuals
Assumption 4: The residuals, ε, are normally distributed:

ε|x ∼ N(0, Σ)
Distribution of OLS estimator
Assumptions 1, 2, 3, 4 imply
b|x ∼ N(β, Ω)
where
Ω = σ² (X'X)⁻¹
Often, these 4 assumptions are referred to as the classical
regression model.
Is OLS robust?
How good is OLS if the assumptions do not hold?
- Financial data is usually non-normal—violating Assumption 4.
- Time-series models will almost always violate exogeneity—Assumption 2.
- Macro-economic data typically has correlated residuals, while asset prices show time-varying volatility—violations of Assumption 3.
OLS corrections
Two main ways to address these problems:
- Large sample properties. (Relax assumptions 2, 4.)
- Robust standard errors. (Relax assumption 3.)
Instrumental Variable Regression (IV) is also very important in dealing
with assumption 2, but will not be discussed here.
Outline
OLS Summary
Large Sample Properties
Robust Estimators
Non-normality
Applications often do not satisfy Assumption 4, upon which the
inference results relied.
- However, the asymptotic distribution of the OLS estimate is an application of the Central Limit Theorem.
- In practice, inference often relies on having large data sets and appealing to the asymptotic results.
Central Limit Theorem
Covered in other September Review Modules.
- Basically, it says that as sample size increases, the sample average statistic converges to a normal distribution.
- Slightly more complicated for non-iid data, but weaker versions hold.
- Note that the OLS estimator can be rewritten in terms of a sample average of x_i ε_i!
Example - Central Limit Theorem
[Figure: histogram of the exponential distribution, and histograms of sample averages for n = 5, 50, and 500. Skewness and kurtosis approach the normal values of 0 and 3: 0.9088 and 4.24 (n = 5), 0.2980 and 3.11 (n = 50), 0.0929 and 2.89 (n = 500).]
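The figure's experiment is easy to reproduce. The sketch below averages exponential draws and checks that the skewness and kurtosis of the averages approach the normal values of 0 and 3; the seed and the 10,000 replications are my choices, not the lecture's.

```python
import numpy as np

rng = np.random.default_rng(1)

def skew_kurt(a):
    """Sample skewness and kurtosis of a 1-D array."""
    z = (a - a.mean()) / a.std()
    return (z**3).mean(), (z**4).mean()

# For each n, draw 10,000 samples of size n from an exponential
# distribution and look at the distribution of the sample averages.
for n in (5, 50, 500):
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    skew, kurt = skew_kurt(means)
    print(f"n={n}: skewness={skew:.3f}, kurtosis={kurt:.3f}")
```

By n = 500 the averages are nearly indistinguishable from normal draws, matching the figure.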
Assumption: Orthogonality of population residuals
Assumption 5: The population residuals are uncorrelated with the regressors:

E[x'ε] = 0
- This assumption is much weaker than Assumption 2.
- This is a restriction on the population variables, not the fitted estimates, which have zero correlation by construction.
Consistency
A sample statistic is consistent if it converges to the true
population value in probability.
- Suppose that Assumptions 1, 5 hold.
- Then the OLS estimator, b, is consistent: plim b = β.
- In practice, more attention is paid to having a consistent estimator than an unbiased estimator, due to the weaker assumption.
Asymptotic distribution of OLS
Under Assumptions 1,3, 5, the OLS estimate is asymptotically
normal,
b|x ∼a N(β, Ω)
where
Ω = σ² (X'X)⁻¹
Heteroscedastic and autocorrelated inference
For many applications, particularly in time-series, Assumption 3 is
clearly false.
- For practical purposes, this is not a big problem for inference.
Under Assumptions 1, 5, the OLS estimate is asymptotically
normal,
b|x ∼a N(β, Ω)
where
Ω = (X'X)⁻¹ X'ΣX (X'X)⁻¹
OLS without iid errors
With non-iid errors, OLS is still unbiased (or consistent).
- Thus, it is appropriate to estimate with OLS, but one must use the larger variance given by

  var[b|x] = (X'X)⁻¹ X'ΣX (X'X)⁻¹

- Non-OLS estimators, such as GLS, may have lower variances which allow for more confident inference.
Outline
OLS Summary
Large Sample Properties
Robust Estimators
Checking for heteroscedasticity
Consider a regression of excess stock returns on the risk-free rate.
- To see if heteroscedasticity is a problem, try plotting the residuals against some conditioning variable.
- If the range of the sample residuals seems to change across the values of the conditioning variable, this may indicate heteroscedasticity.
Residuals: Excess return on risk-free rate
[Figure: scatter of sample residuals against the risk-free rate.]
Lagrange Multiplier test
The Lagrange Multiplier Test (Breusch-Pagan) tests the hypothesis that

σ_i² = σ² z_i'α

where z_i is a vector of conditioning variables for observation i.
- If the model is homoscedastic, then α = 0.
- One might try using a subset of x for the variables z.
- This tests a certain form of heteroscedasticity. In fact, it need not be linear; the test even covers

  σ_i² = σ² f(z_i'α)
Computing the LM test
Regress sample estimates of variances on x (or a subset of x),

e_i² = x_i'γ + ν_i
The LM test statistic is the R² from this regression multiplied by the sample size:

LM = nR²,  LM ∼a χ²(k)
- For the example above, the LM test rejects homoscedasticity at the 1% level.
- The LM test can perform poorly with nonnormal data, but simple adjustments are available.
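The LM computation above is a short numpy exercise. This is a minimal sketch assuming X already contains an intercept column; the function name `breusch_pagan_lm` and the simulated data are illustrative, and the 1% critical value of χ²(1), 6.63, is hard-coded for the comparison.

```python
import numpy as np

def breusch_pagan_lm(X, e):
    """Breusch-Pagan LM statistic: n * R^2 from regressing e^2 on X.

    X is an n x (k+1) design matrix including an intercept column;
    asymptotically LM ~ chi^2(k), k = number of non-constant regressors.
    """
    n = X.shape[0]
    y = e**2
    g, *_ = np.linalg.lstsq(X, y, rcond=None)   # regress e^2 on X
    resid = y - X @ g
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return n * r2

# usage: simulate errors whose variance grows with the regressor, then
# compare LM to the chi^2(1) critical value 6.63 (1% level)
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=1000)
X = np.column_stack([np.ones_like(x), x])
e = rng.normal(scale=x)          # heteroscedastic: sd proportional to x
lm = breusch_pagan_lm(X, e)
```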
Other tests of heteroscedasticity
The LM test targets a certain parametrization of heteroscedasticity. This gives the test power, but means it may be misspecified.
- White's test is quite general: it makes no assumption about the nature of the heteroscedasticity. It examines the R-squared from regressing the squared errors on X along with quadratic terms in X.
- The Goldfeld-Quandt test simply tests one subset of the data against another subset. It looks for a statistical difference in the variances of the subsets.
Correcting for heteroscedasticity
With heteroscedasticity,

var[b|x] = (X'X)⁻¹ X'ΣX (X'X)⁻¹
The key is how to estimate Σ. There are two approaches:
- Use nonparametric estimation of X'ΣX.
- Make parametric assumptions about the form of Σ, and estimate these.
Nonparametric estimation of Σ
Recall that Σ = E[εε'|x].
- This is an n × n matrix. There is no hope of estimating it using a sample of size n.
- This is one reason that a parametric assumption on Σ is useful.
- But using just the data, one can get an estimate of X'ΣX, a (k + 1) × (k + 1) matrix.
White estimator
Write out

X'ΣX = sum_{i=1}^{n} σ_i² x_i x_i'

noting that we are assuming Σ is diagonal (no autocorrelation).
Then the White estimator is

(X'X)⁻¹ ( sum_{i=1}^{n} e_i² x_i x_i' ) (X'X)⁻¹
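The White estimator translates directly into a few lines of numpy. The helper below (`white_cov` is a hypothetical name) implements the sandwich formula above; the simulated heteroscedastic data is illustrative.

```python
import numpy as np

def white_cov(X, e):
    """Heteroscedasticity-consistent (White) covariance of the OLS b.

    Implements (X'X)^{-1} (sum_i e_i^2 x_i x_i') (X'X)^{-1}, where e
    holds the sample residuals from the OLS fit.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * (e**2)[:, None]).T @ X   # sum_i e_i^2 x_i x_i'
    return XtX_inv @ meat @ XtX_inv

# usage on simulated heteroscedastic data
rng = np.random.default_rng(3)
x = rng.uniform(1, 10, size=2000)
X = np.column_stack([np.ones_like(x), x])
Y = 1.0 + 2.0 * x + rng.normal(scale=x)  # error sd grows with x
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = Y - X @ b
se_white = np.sqrt(np.diag(white_cov(X, e)))
```

With data like this, the White standard errors are noticeably larger than the classical ones for the slope, reflecting the extra noise at high x.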
Parametric Estimation
If we know the form of the serial correlation and heteroscedasticity,
we can form efficient estimators.
- Recall that heteroscedasticity means that some observations have more statistical noise (epsilon shocks) than others.
- Efficient estimation would simply put less weight on these observations.
- Similarly, if we know which observations have correlated errors, we can put relatively less weight on these observations given that they do not contain as much new information.
Generalized Least Squares
Suppose that we know the covariance matrix of ε, denoted Σ.
- Weight the observations by the inverted covariance matrix. (Pay more attention to the more precise data.)
- This yields the following efficient estimator:

  b = (X'Σ⁻¹X)⁻¹ X'Σ⁻¹Y

- The covariance of the GLS estimator is

  var(b) = Ω = (X'Σ⁻¹X)⁻¹
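A sketch of the GLS formulas above, assuming Σ is known exactly (in practice it must be estimated). With a diagonal Σ this reduces to weighted least squares, which the illustrative example below uses: errors whose known standard deviations grow with the regressor.

```python
import numpy as np

def gls(X, Y, Sigma):
    """GLS: b = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Y, with covariance
    (X' Sigma^{-1} X)^{-1}. Uses solve() rather than inverting Sigma."""
    Si_X = np.linalg.solve(Sigma, X)     # Sigma^{-1} X
    Si_Y = np.linalg.solve(Sigma, Y)     # Sigma^{-1} Y
    cov_b = np.linalg.inv(X.T @ Si_X)
    b = cov_b @ (X.T @ Si_Y)
    return b, cov_b

# usage: heteroscedastic but uncorrelated errors with known variances
rng = np.random.default_rng(4)
n = 500
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
sd = x                                   # known error standard deviations
Y = 1.0 + 2.0 * x + rng.normal(scale=sd)
b, cov_b = gls(X, Y, np.diag(sd**2))
```

Down-weighting the noisy observations gives a tighter slope estimate than plain OLS would on the same data.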
Non-parametric vs. parametric estimation
There is a tradeoff in model assumptions and estimation precision.
- The White estimator is impressive in that it makes no assumption about the form of heteroscedasticity.
- However, sample estimates of X'ΣX can perform quite poorly.
- Further, the White estimator reveals nothing about the underlying heteroscedasticity, which would be useful for forecasting or studying the variance process.
Serial correlation
As with heteroscedasticity, serial correlation changes the inference of the OLS estimate compared to the classic case where Σ = σ²I.
- With serial correlation, there are off-diagonal elements in Σ.
- As mentioned, OLS is still valid, given that one uses the more complicated equation for the variance of b.
Example of residual autocorrelation
In many time-series regressions, the errors exhibit autocorrelation.
- This is the idea that a shock to the variable at time t may still be affecting the value at time t + 1.
- Correlation of residuals invalidates the finite-sample inference results of OLS.
- Consider the previous example of regressing unemployment on U.S. Treasury yields.
Residuals: Unemployment on the yield spread
[Figure: time series of sample residuals from regressing unemployment on yields, 1960–2020.]
Autocorrelated series
The autocorrelation of the error series at a monthly frequency is
97%.
- This essentially says that the regression has much less data than the classic formulas understand.
- With highly correlated data, there is little true sample variation for OLS to use in estimation.
- Consider regressing one very persistent data series on another persistent data series.
- The levels of such persistent X and Y may track closely together just due to the persistence.
Model misspecification
Often, autocorrelated errors are a sign that the model is misspecified.
- This is commonly caused by having a time-trend in the data.
- This also may be a sign that the model should use the differenced data.
- Much of time-series statistics deals with examining whether the data has a time-trend, a random-walk, or cointegration.
- This is beyond the scope of the notes.
Non-parametric vs. parametric estimation
As with heteroscedasticity, one can use parametric assumptions to simplify the estimation.
- Time-series statistics often makes assumptions about a linear model having autoregressive (AR) or moving average (MA) components.
- Again, this will be discussed more later in the program.
AR(1) serial correlation
Consider the AR(1) model for ε,

ε_t = ρ ε_{t−1} + u_t

where u_t is homoscedastic and uncorrelated, with variance σ_u².
This implies that

Σ = σ_u² / (1 − ρ²) ·
    [ 1         ρ         ρ²        ρ³   ···  ρ^(T−1) ]
    [ ρ         1         ρ         ρ²   ···  ρ^(T−2) ]
    [ ⋮                                       ⋮       ]
    [ ρ^(T−1)   ρ^(T−2)   ···                 1       ]
This is a widely used model for time-series correlation.
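The AR(1) covariance matrix above can be built in one line, since entry (i, j) equals σ_u²/(1 − ρ²) · ρ^|i−j|. A sketch:

```python
import numpy as np

def ar1_cov(rho, sigma_u, T):
    """AR(1) error covariance: entry (i, j) is
    sigma_u^2 / (1 - rho^2) * rho^{|i - j|}."""
    idx = np.arange(T)
    return sigma_u**2 / (1 - rho**2) * rho**np.abs(idx[:, None] - idx[None, :])

# usage: a small, highly persistent example
Sigma = ar1_cov(rho=0.9, sigma_u=1.0, T=5)
```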
Non-parametric estimation
The goal is to estimate
var[b|x] = (X'X)⁻¹ X'ΣX (X'X)⁻¹

which depends on estimating X'ΣX:

X'ΣX = sum_{i=1}^{n} sum_{j=1}^{n} σ_{i,j} x_i x_j'

One might estimate X'ΣX by

sum_{i=1}^{n} sum_{j=1}^{n} e_i e_j x_i x_j'
Trouble in estimation
Unfortunately, this sample estimate is not guaranteed to be
positive definite.
- The common way to deal with this is to put less weight on observations further separated by time.
- Several different weighting schemes have been employed.
Newey-West estimator
The Newey-West estimator of X'ΣX is popular:

sum_{i=1}^{n} e_i² x_i x_i'  +  sum_{ℓ=1}^{L} sum_{t=ℓ+1}^{n} w_ℓ e_t e_{t−ℓ} (x_t x_{t−ℓ}' + x_{t−ℓ} x_t')

where

w_ℓ = 1 − ℓ/(L + 1)

for some number of lags L.
- The first term is the same as the heteroscedasticity-consistent estimator.
- The second term estimates autocorrelations of errors.
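A sketch of the Newey-West formula above in numpy. Production implementations (e.g. statsmodels' HAC covariance) add refinements such as degrees-of-freedom corrections, so treat this as illustrative; the simulated AR(1)-error data is my own example.

```python
import numpy as np

def newey_west_cov(X, e, L):
    """Newey-West covariance of b: (X'X)^{-1} S (X'X)^{-1}, where S
    estimates X' Sigma X with Bartlett weights w_l = 1 - l/(L+1)."""
    S = (X * (e**2)[:, None]).T @ X                # lag-0 (White) term
    for l in range(1, L + 1):
        w = 1 - l / (L + 1)
        # Gamma_l = sum_{t > l} e_t e_{t-l} x_t x_{t-l}'
        Gamma = (X[l:] * (e[l:] * e[:-l])[:, None]).T @ X[:-l]
        S += w * (Gamma + Gamma.T)                 # symmetrized lag term
    XtX_inv = np.linalg.inv(X.T @ X)
    return XtX_inv @ S @ XtX_inv

# usage: OLS on data with AR(1) errors, then HAC standard errors
rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                              # AR(1) errors, rho = 0.5
    e[t] = 0.5 * e[t - 1] + rng.normal()
X = np.column_stack([np.ones(n), x])
Y = X @ np.array([1.0, 2.0]) + e
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ b
cov_nw = newey_west_cov(X, resid, L=5)
se_nw = np.sqrt(np.diag(cov_nw))
```

The Bartlett weights keep the estimate of X'ΣX positive semi-definite, addressing the problem raised on the previous slide.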
References

- Cochrane, John. Asset Pricing. 2001.
- Greene, William. Econometric Analysis. 2011.
- Hamilton, James. Time Series Analysis. 1994.
- Wooldridge, Jeffrey. Econometric Analysis of Cross Section and Panel Data. 2011.