FINM Intro: Regression
Lecture 2: Robustness
Mark Hendricks, Autumn 2016

Outline
- OLS Summary
- Large Sample Properties
- Robust Estimators

OLS Summary

Assumption: Full rank

Assumption 1: $X'X$ is full rank. Equivalently, assume that there is no exact linear relationship among the regressors.
- Clearly, the existence of the OLS estimator requires that this assumption be satisfied.
- Multicollinearity refers to the case where this assumption fails.

Assumption: Exogeneity

Assumption 2: $\epsilon$ is exogenous to the regressors, $x$:
$$E[\epsilon \mid x] = 0$$
The exogeneity assumption
- implies that $\epsilon$ is uncorrelated with $x$.
- implies that $\epsilon$ is uncorrelated with any function of $x$.
- does NOT imply that $\epsilon$ is independent of $x$.

Assumption: Homoscedastic and orthogonal residuals

Assumption 3: The residuals are uncorrelated across observations, with identical variances:
$$\Sigma \equiv E[\epsilon \epsilon' \mid x] = \sigma^2 I_n$$

Gauss-Markov Theorem

Given the assumptions above, the OLS estimator is the Best Linear Unbiased Estimator (BLUE) of $\beta$:
$$b = (X'X)^{-1} X'Y, \qquad \mathrm{var}[b \mid x] = \sigma^2 (X'X)^{-1}$$

Assumption: Normality of residuals

Assumption 4: The residuals, $\epsilon$, are normally distributed:
$$\epsilon \mid x \sim N(0, \Sigma)$$

Distribution of OLS estimator

Assumptions 1-4 imply
$$b \mid x \sim N(\beta, \Omega), \qquad \Omega = \sigma^2 (X'X)^{-1}$$
Often, these four assumptions are referred to as the classical regression model.

Is OLS robust?

How good is OLS if the assumptions do not hold?
- Financial data is usually non-normal, violating Assumption 4.
- Time-series models will almost always violate exogeneity (Assumption 2).
- Macroeconomic data typically has correlated residuals, while asset prices show time-varying volatility; both violate Assumption 3.

OLS corrections

Two main ways to address these problems:
- Large-sample properties. (Relax Assumptions 2 and 4.)
- Robust standard errors. (Relax Assumption 3.)
Instrumental Variable (IV) regression is also very important in dealing with Assumption 2, but it will not be discussed here.

Large Sample Properties

Non-normality

Applications often do not satisfy Assumption 4, upon which the inference results relied.
- However, the asymptotic distribution of the OLS estimate is an application of the Central Limit Theorem.
- In practice, inference often relies on having large data sets and appealing to the asymptotic results.

Central Limit Theorem

Covered in other September Review modules.
- Basically, it says that as the sample size increases, the sample-average statistic converges to a normal distribution.
- The result is slightly more complicated for non-iid data, but weaker versions hold.
- Note that the OLS estimator can be rewritten in terms of a sample average (of $x_i \epsilon_i$), so the CLT applies to it as well. A simulation of this convergence appears below.
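To illustrate, here is a minimal simulation sketch (not from the original slides): it draws exponential samples, computes sample averages for n = 5, 50, and 500, and reports the skewness and kurtosis of those averages. The scale parameter and replication count are illustrative choices.

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
n_reps = 2000  # number of simulated sample averages per case

for n in (5, 50, 500):
    # Each row is one sample of size n from an exponential distribution;
    # its mean is one draw of the sample-average statistic.
    draws = rng.exponential(scale=0.5, size=(n_reps, n))
    avgs = draws.mean(axis=1)
    # As n grows, skewness -> 0 and kurtosis -> 3, the normal benchmarks.
    print(f"n={n:3d}  skewness={skew(avgs):+.3f}  "
          f"kurtosis={kurtosis(avgs, fisher=False):.3f}")
```

The figure on the next slide reports exactly these diagnostics for the exponential case.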
Example: Central Limit Theorem

[Figure: histogram of the exponential distribution, followed by histograms of sample averages for n = 5, 50, and 500. The skewness of the averages falls from 0.9088 (kurtosis 4.24) at n = 5, to 0.2980 (kurtosis 3.11) at n = 50, to 0.0929 (kurtosis 2.89) at n = 500, approaching the normal benchmarks of 0 and 3.]

Assumption: Orthogonality of population residuals

Assumption 5: The population residuals are uncorrelated with the regressors:
$$E[x' \epsilon] = 0$$
- This assumption is much weaker than Assumption 2.
- This is a restriction on the population variables, not the fitted estimates, which have zero correlation by construction.

Consistency

A sample statistic is consistent if it converges to the true population value in probability.
- Suppose that Assumptions 1 and 5 hold.
- Then the OLS estimator, b, is consistent:
$$\mathrm{plim}\ b = \beta$$
- In practice, more attention is paid to having a consistent estimator than an unbiased estimator, due to the weaker assumption required.

Asymptotic distribution of OLS

Under Assumptions 1, 3, and 5, the OLS estimate is asymptotically normal:
$$b \mid x \sim_a N(\beta, \Omega), \qquad \Omega = \sigma^2 (X'X)^{-1}$$

Heteroscedastic and autocorrelated inference

For many applications, particularly in time series, Assumption 3 is clearly false.
- For practical purposes, this is not a big problem for inference.
Under Assumptions 1 and 5, the OLS estimate is asymptotically normal:
$$b \mid x \sim_a N(\beta, \Omega), \qquad \Omega = (X'X)^{-1} X'\Sigma X (X'X)^{-1}$$

OLS without iid errors

With non-iid errors, OLS is still unbiased (or at least consistent).
- Thus, it is appropriate to estimate with OLS, but one must use the larger variance given by
$$\mathrm{var}[b \mid x] = (X'X)^{-1} X'\Sigma X (X'X)^{-1}$$
- Non-OLS estimators, such as GLS, may have lower variances, which allows for more confident inference.

Robust Estimators

Checking for heteroscedasticity

Consider a regression of excess stock returns on the risk-free rate.
- To see if heteroscedasticity is a problem, try plotting the residuals against some conditioning variable.
- If the range of the sample residuals seems to change across the values of the conditioning variable, this may indicate heteroscedasticity.

Residuals: Excess return on risk-free rate

[Figure: sample residuals of the excess-return regression plotted against the risk-free rate.]

Lagrange Multiplier test

The Lagrange Multiplier test (Breusch-Pagan) tests the hypothesis that
$$\sigma_i^2 = \sigma^2 z_i' \alpha$$
where $z_i$ is a vector of conditioning variables for observation i.
- If the model is homoscedastic, then $\alpha = 0$.
- One might try using a subset of x for the variables z.
- This tests a certain form of heteroscedasticity. In fact, the form need not be linear: the test even covers
$$\sigma_i^2 = \sigma^2 f(z_i' \alpha)$$

Computing the LM test

Regress sample estimates of the variances on x (or a subset of x):
$$e_i^2 = x_i' \gamma + \nu_i$$
The LM test statistic is the $R^2$ from this regression multiplied by the sample size:
$$LM = n R^2, \qquad LM \sim_a \chi^2(k)$$
- For the example above, the LM test rejects homoscedasticity at the 1% level.
- The LM test can perform poorly with non-normal data, but simple adjustments are available.
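The following is a minimal sketch of this computation on simulated data (the slides' stock-return example is not reproduced here; names like `breusch_pagan_lm` are illustrative):

```python
import numpy as np
from scipy import stats

def breusch_pagan_lm(X, e):
    """LM = n * R^2 from regressing squared OLS residuals on X.

    X is assumed to include a constant column; the statistic is asymptotically
    chi-squared with k = (number of non-constant regressors) degrees of freedom.
    """
    y = e**2
    gamma, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ gamma
    r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    lm = len(e) * r2
    return lm, stats.chi2.sf(lm, df=X.shape[1] - 1)

# Simulated regression that is heteroscedastic by construction:
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(0, 1, n)
y = 1.0 + 0.5 * x + rng.normal(0, 1 + 2 * x)   # error sd grows with x
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
print(breusch_pagan_lm(X, e))                  # large LM, p-value near zero
```

Packaged versions of this test exist (e.g. statsmodels' het_breuschpagan), which also implement the adjustments mentioned above.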
Other tests of heteroscedasticity

The LM test targets a particular parametrization of heteroscedasticity. This gives the test power, but it means the test may be misspecified.
- White's test is quite general: it makes no assumption about the nature of the heteroscedasticity. It examines the R-squared from regressing the squared errors on X along with quadratic terms in X.
- The Goldfeld-Quandt test simply tests one subset of the data against another subset. It looks for a statistical difference between the variances of the subsets.

Correcting for heteroscedasticity

With heteroscedasticity,
$$\mathrm{var}[b \mid x] = (X'X)^{-1} X'\Sigma X (X'X)^{-1}$$
The key is how to estimate $\Sigma$. There are two approaches:
- Use nonparametric estimation of $X'\Sigma X$.
- Make parametric assumptions about the form of $\Sigma$, and estimate those parameters.

Nonparametric estimation of Σ

Recall that $\Sigma = E[\epsilon \epsilon' \mid x]$.
- This is an n × n matrix. There is no hope of estimating it using a sample of size n.
- This is one reason that a parametric assumption on $\Sigma$ is useful.
- But using just the data, one can get an estimate of $X'\Sigma X$, a (k+1) × (k+1) matrix.

White estimator

Write out
$$X'\Sigma X = \sum_{i=1}^n \sigma_i^2 x_i x_i'$$
noting that we are assuming $\Sigma$ is diagonal (no autocorrelation). Then the White estimator of $\mathrm{var}[b \mid x]$ is
$$(X'X)^{-1} \left( \sum_{i=1}^n e_i^2 x_i x_i' \right) (X'X)^{-1}$$
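In code, this sandwich form is a few lines of numpy. A minimal sketch, reusing the X and e from the simulation above (the function name is illustrative):

```python
import numpy as np

def white_cov(X, e):
    """White heteroscedasticity-consistent covariance of the OLS estimator:
    (X'X)^{-1} (sum_i e_i^2 x_i x_i') (X'X)^{-1}.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * e[:, None]**2).T @ X   # equals sum_i e_i^2 x_i x_i'
    return XtX_inv @ meat @ XtX_inv

# Robust standard errors, e.g. for the simulated regression above:
# se_robust = np.sqrt(np.diag(white_cov(X, e)))
```

Some implementations multiply by a finite-sample correction such as n/(n - k); the asymptotics are the same either way.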
Parametric estimation

If we know the form of the serial correlation and heteroscedasticity, we can form efficient estimators.
- Recall that heteroscedasticity means that some observations have more statistical noise (larger $\epsilon$ shocks) than others.
- Efficient estimation would simply put less weight on those observations.
- Similarly, if we know which observations have correlated errors, we can put relatively less weight on them, given that they do not contain as much new information.

Generalized Least Squares

Suppose that we know the covariance matrix of $\epsilon$, denoted $\Sigma$.
- Weight the observations by the inverted covariance matrix. (Pay more attention to the more precise data.)
- This yields the following efficient estimator:
$$b_{GLS} = (X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1}Y$$
- The covariance of the GLS estimator is
$$\mathrm{var}(b_{GLS}) = \Omega = (X'\Sigma^{-1}X)^{-1}$$

Non-parametric vs. parametric estimation

There is a tradeoff between model assumptions and estimation precision.
- The White estimator is impressive in that it makes no assumption about the form of the heteroscedasticity.
- However, sample estimates of $X'\Sigma X$ can perform quite poorly.
- Further, the White estimator reveals nothing about the underlying heteroscedasticity, which would be useful for forecasting or for studying the variance process.

Serial correlation

As with heteroscedasticity, serial correlation changes the inference on the OLS estimate compared to the classic case where $\Sigma = \sigma^2 I$.
- With serial correlation, there are off-diagonal elements in $\Sigma$.
- As mentioned, OLS is still valid, given that one uses the more complicated equation for the variance of b.

Example of residual autocorrelation

In many time-series regressions, the errors exhibit autocorrelation.
- This is the idea that a shock to the variable at time t may still be affecting the value at time t+1.
- Correlation of residuals invalidates the finite-sample inference results of OLS.
- Consider the previous example of regressing unemployment on U.S. Treasury yields.

Residuals: Unemployment on the yield spread

[Figure: sample residuals of the unemployment-on-yields regression, plotted over 1960-2020.]

Autocorrelated series

The autocorrelation of the error series at a monthly frequency is 97%.
- This essentially says that the regression has much less data than the classic formulas assume.
- With highly correlated data, there is little true sample variation for OLS to use in estimation.
- Consider regressing one very persistent data series on another persistent data series.
- The levels of such persistent X and Y may track closely together just due to the persistence.

Model misspecification

Often, autocorrelated errors are a sign that the model is misspecified.
- This is commonly caused by having a time trend in the data.
- It may also be a sign that the model should use the differenced data.
- Much of time-series statistics deals with examining whether the data has a time trend, a random walk, or cointegration.
- This is beyond the scope of these notes.

Non-parametric vs. parametric estimation

As with heteroscedasticity, one can use parametric assumptions to simplify the estimation.
- Time-series statistics often assumes a linear model with autoregressive (AR) or moving-average (MA) components.
- Again, this will be discussed more later in the program.

AR(1) serial correlation

Consider the AR(1) model for $\epsilon$:
$$\epsilon_t = \rho \epsilon_{t-1} + u_t$$
where $u_t$ is homoscedastic and uncorrelated, with variance $\sigma_u^2$. This implies that
$$\Sigma = \frac{\sigma_u^2}{1 - \rho^2}
\begin{bmatrix}
1 & \rho & \rho^2 & \rho^3 & \cdots & \rho^{T-1} \\
\rho & 1 & \rho & \rho^2 & \cdots & \rho^{T-2} \\
\rho^2 & \rho & 1 & \rho & \cdots & \rho^{T-3} \\
\vdots & & & \ddots & & \vdots \\
\rho^{T-1} & \rho^{T-2} & \cdots & & \rho & 1
\end{bmatrix}$$
that is, $\Sigma_{ij} = \frac{\sigma_u^2}{1-\rho^2}\,\rho^{|i-j|}$. This is a widely used model for time-series correlation.
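A minimal sketch connecting this to the GLS estimator above: build the AR(1) covariance matrix and plug it into the GLS formulas (the function names and parameter values are illustrative; in practice $\rho$ and $\sigma_u$ would themselves be estimated):

```python
import numpy as np

def ar1_sigma(T, rho, sigma_u=1.0):
    """AR(1) error covariance: Sigma_ij = sigma_u^2 / (1 - rho^2) * rho^|i-j|."""
    idx = np.arange(T)
    return sigma_u**2 / (1 - rho**2) * rho ** np.abs(idx[:, None] - idx[None, :])

def gls(X, y, Sigma):
    """GLS estimator b = (X' Sigma^-1 X)^-1 X' Sigma^-1 y and its covariance."""
    Sinv = np.linalg.inv(Sigma)
    cov = np.linalg.inv(X.T @ Sinv @ X)
    b = cov @ (X.T @ Sinv @ y)
    return b, cov

# Illustrative use, taking rho = 0.97 from the residual autocorrelation above:
# Sigma = ar1_sigma(len(y), rho=0.97)
# b_gls, cov_gls = gls(X, y, Sigma)
```

(For long samples one would solve via a Cholesky factorization of $\Sigma$ rather than inverting the full T × T matrix, but the direct form matches the slides' formulas.)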
Non-parametric estimation

The goal is to estimate
$$\mathrm{var}[b \mid x] = (X'X)^{-1} X'\Sigma X (X'X)^{-1}$$
which depends on estimating $X'\Sigma X$:
$$X'\Sigma X = \sum_{i=1}^n \sum_{j=1}^n \sigma_{i,j}\, x_i x_j'$$
One might estimate this with
$$\widehat{X'\Sigma X} = \sum_{i=1}^n \sum_{j=1}^n e_i e_j\, x_i x_j'$$

Trouble in estimation

Unfortunately, this sample estimate is not guaranteed to be positive definite.
- The common way to deal with this is to put less weight on pairs of observations further separated in time.
- Several different weighting schemes have been employed.

Newey-West estimator

The Newey-West estimator of $X'\Sigma X$ is popular:
$$\sum_{i=1}^n e_i^2 x_i x_i' + \sum_{\ell=1}^{L} \sum_{t=\ell+1}^{n} w_\ell\, e_t e_{t-\ell} \left( x_t x_{t-\ell}' + x_{t-\ell} x_t' \right), \qquad w_\ell = 1 - \frac{\ell}{L+1}$$
for some number of lags L.
- The first term is the same as the heteroscedasticity-consistent (White) estimator.
- The second term estimates the autocorrelations of the errors, downweighted at longer lags.
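A minimal numpy sketch of this estimator, assuming the rows of X are in time order and e holds the OLS residuals (the function name and the 12-lag choice are illustrative):

```python
import numpy as np

def newey_west_cov(X, e, L):
    """Newey-West (HAC) covariance of the OLS estimator with L lags:
    (X'X)^{-1} S (X'X)^{-1}, where S adds Bartlett-weighted cross-lag
    terms to the White 'meat' matrix."""
    S = (X * e[:, None]**2).T @ X                     # sum_i e_i^2 x_i x_i'
    for lag in range(1, L + 1):
        w = 1 - lag / (L + 1)                         # Bartlett weight w_l
        # G = sum_{t = lag+1}^{n} e_t e_{t-lag} x_t x_{t-lag}'
        G = (X[lag:] * (e[lag:] * e[:-lag])[:, None]).T @ X[:-lag]
        S += w * (G + G.T)                            # x_t x_{t-l}' + x_{t-l} x_t'
    XtX_inv = np.linalg.inv(X.T @ X)
    return XtX_inv @ S @ XtX_inv

# e.g. 12 lags for monthly data:
# se_hac = np.sqrt(np.diag(newey_west_cov(X, e, L=12)))
```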