
Lecture 05
REGRESSION DIAGNOSTIC I:
MULTICOLLINEARITY
Source: Damodar Gujarati, Econometrics by Example
RELAXING THE ASSUMPTIONS OF
THE CLASSICAL MODEL
Assumption 1. The regression model is linear in the
parameters.
Assumption 2. The values of the regressors, the X's, are fixed
in repeated sampling.
Assumption 3. For given X's, the mean value of the
disturbance ui is zero.
Assumption 4. For given X's, the variance of ui is constant or
homoscedastic.
Assumption 5. For given X's, there is no autocorrelation in the
disturbances.
Assumption 6. If the X's are stochastic, the disturbance term and
the (stochastic) X's are independent or at least uncorrelated.
Assumption 7. The number of observations must be greater than the
number of regressors.
Assumption 8. There must be sufficient variability in the values
taken by the regressors.
Assumption 9. The regression model is correctly specified.
Assumption 10. There is no exact linear relationship (i.e.,
multicollinearity) in the regressors.
Assumption 11. The stochastic (disturbance) term ui is
normally distributed.
MULTICOLLINEARITY
• One of the assumptions of the classical linear regression
model (CLRM) is that there is no exact linear relationship
among the regressors.
• If there are one or more such relationships among the
regressors, we call it multicollinearity, or collinearity for
short.
Perfect collinearity: an exact linear relationship among the
regressors exists.
Imperfect collinearity: the regressors are highly (but not
perfectly) collinear.
MULTICOLLINEARITY
(Examples of perfect and imperfect collinearity were shown as
slide images.)
There are several sources of multicollinearity:
1. The data collection method employed.
2. Constraints on the model or in the population being
sampled.
3. Model specification, for example, adding polynomial
terms to a regression model, especially when the range of
the X variable is small.
4. An overdetermined model (more explanatory variables
than observations).
CONSEQUENCES
• If collinearity is not perfect, but high, several
consequences ensue:
• The OLS estimators are still BLUE, but one or more regression
coefficients have large standard errors relative to the values of
the coefficients, thereby making the t ratios small.
• Even though some regression coefficients are statistically
insignificant, the R² value may be very high.
• Therefore, one may conclude (misleadingly) that the true values
of these coefficients are not different from zero.
• Also, the regression coefficients may be very sensitive to small
changes in the data, especially if the sample is relatively small.
The Gauss-Markov Theorem and the
Properties of OLS Estimators
OLS estimators are BLUE, where BLUE stands for Best (meaning
minimum variance), Linear (they are linear functions of the
dependent variable Y), and Unbiased (in repeated applications of
the method, the estimators equal their true values on average).
In the class of linear unbiased estimators, the OLS estimators have
minimum variance. As a result, the true parameter values can be
estimated with the least possible uncertainty; an unbiased
estimator with the least variance is called an efficient
estimator.
CONSEQUENCES
Assume that X3i = λX2i, where λ is a nonzero constant. Recalling
the OLS estimator for b2 in the two-regressor model (in deviation
form),

$$b_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_i x_{3i}\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}$$

Equivalently, substituting $x_{3i} = \lambda x_{2i}$ makes both the
numerator and the denominator zero, so $b_2 = 0/0$ is indeterminate:
under perfect collinearity the individual coefficients cannot be
estimated.
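To see this indeterminacy numerically, here is a minimal sketch
(the simulated data and the value λ = 2 are illustrative, not from
the lecture) showing that the normal-equations matrix X'X becomes
singular under perfect collinearity:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x2 = rng.normal(size=n)
x3 = 2.0 * x2                        # perfect collinearity: X3 = lambda * X2, lambda = 2
X = np.column_stack([np.ones(n), x2, x3])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))    # prints 2, not 3: X'X is singular
# np.linalg.solve(XtX, X.T @ y) would raise LinAlgError("Singular matrix"):
# the normal equations have no unique solution, so b2 and b3 are indeterminate.
```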
VARIANCE INFLATION FACTOR
• For the following regression model:

$$Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + u_i$$

it can be shown that:

$$\operatorname{var}(b_2) = \frac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{23}^2)} = \frac{\sigma^2}{\sum x_{2i}^2}\,\text{VIF}$$

and

$$\operatorname{var}(b_3) = \frac{\sigma^2}{\sum x_{3i}^2\,(1 - r_{23}^2)} = \frac{\sigma^2}{\sum x_{3i}^2}\,\text{VIF}$$

where σ² is the variance of the error term ui, and r23 is the
coefficient of correlation between X2 and X3.
VARIANCE INFLATION FACTOR (CONT.)

$$\text{VIF} = \frac{1}{1 - r_{23}^2}$$

is the variance-inflating factor.
• VIF is a measure of the degree to which the variance of the
OLS estimator is inflated because of collinearity. For the
general case of regressor Xk:

$$\operatorname{var}(b_k) = \frac{\sigma^2}{\sum x_k^2\,(1 - R_k^2)} = \frac{\sigma^2}{\sum x_k^2}\,\text{VIF}$$

where $R_k^2$ is the R² from the auxiliary regression of Xk on the
remaining regressors.
An Example
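The slide's worked example is an image and is not reproduced here.
As a stand-in, here is a minimal sketch, assuming simulated data
(the variables x2, x3, y and their collinearity strength are
illustrative, not from the lecture), that computes VIFs with
statsmodels:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 100
x2 = rng.normal(size=n)
x3 = 0.95 * x2 + 0.30 * rng.normal(size=n)       # highly, but not perfectly, collinear
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x2, x3]))   # columns: const, X2, X3
print(sm.OLS(y, X).fit().summary())              # note the inflated standard errors

for k, name in enumerate(["const", "X2", "X3"]):
    print(name, variance_inflation_factor(X, k)) # VIF_k = 1 / (1 - R_k^2)
```

With x3 almost a multiple of x2, r23 is close to 1 and the VIFs for
X2 and X3 are large, matching the formula above.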
DETECTION OF MULTICOLLINEARITY
• 1. High R² but few significant t ratios
• 2. High pair-wise correlations among explanatory
variables or regressors
• 3. High partial correlation coefficients
• 4. Significant F test for auxiliary regressions
(regressions of each regressor on the remaining
regressors); see the sketch after this list
• 5. High Variance Inflation Factor (VIF) and low
Tolerance Factor (TOL, the inverse of VIF)
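A minimal sketch of check 4, assuming the same kind of simulated
collinear data as in the VIF example (the variable names are
illustrative):

```python
import numpy as np
import statsmodels.api as sm

def auxiliary_regressions(X, names):
    """Regress each regressor on the remaining ones; report R^2 and the F test."""
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        res = sm.OLS(X[:, j], sm.add_constant(others)).fit()
        print(f"{names[j]}: R^2 = {res.rsquared:.3f}, "
              f"F = {res.fvalue:.1f}, p = {res.f_pvalue:.3g}")

rng = np.random.default_rng(0)
x2 = rng.normal(size=100)
x3 = 0.95 * x2 + 0.30 * rng.normal(size=100)     # highly collinear with x2
auxiliary_regressions(np.column_stack([x2, x3]), ["X2", "X3"])
```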
REMEDIAL MEASURES
• What should we do if we detect multicollinearity?
• Nothing, for we often have no control over the data.
• Redefining the model by excluding variables may attenuate
the problem, provided we do not omit relevant variables.
• Principal components analysis: construct artificial
variables from the regressors such that they are orthogonal
to one another.
These principal components become the regressors in the
model.
Yet the interpretation of the coefficients on the principal
components is not as straightforward.
Lecture 04 (Continued)
REGRESSION DIAGNOSTIC II:
HETEROSCEDASTICITY
HETEROSCEDASTICITY
We seek answers to the following questions:
1. What is the nature of heteroscedasticity?
2. What are its consequences?
3. How does one detect it?
4. What are the remedial measures?
THE NATURE OF HETEROSCEDASTICITY
One of the important assumptions of the classical linear regression
model is that the variance of each disturbance term ui, conditional
on the chosen values of the explanatory variables, is some constant
number equal to σ². This is the assumption of homoscedasticity, or
equal (homo) spread (scedasticity), that is, equal variance.
Symbolically,

$$E(u_i^2) = \sigma^2, \quad i = 1, 2, \ldots, n \qquad (11.1.1)$$

Look at Figure 11.1. In contrast, in Figure 11.2 the variances of Yi
are not the same; hence, there is heteroscedasticity. Symbolically,

$$E(u_i^2) = \sigma_i^2 \qquad (11.1.2)$$

Notice the subscript on σ², which reminds us that the conditional
variances of ui (= conditional variances of Yi) are no longer
constant.
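Figures 11.1 and 11.2 are not reproduced here; the following
minimal simulation sketch contrasts homoscedastic and
heteroscedastic disturbances (the variance function σi = 0.5·Xi is
an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = np.linspace(1.0, 10.0, n)

u_homo = rng.normal(scale=1.0, size=n)        # homoscedastic: var(u_i) = sigma^2
u_hetero = rng.normal(scale=0.5 * x)          # heteroscedastic: sigma_i grows with X

# The residual spread widens with X only in the heteroscedastic case:
print(np.std(u_hetero[:50]), np.std(u_hetero[-50:]))
print(np.std(u_homo[:50]), np.std(u_homo[-50:]))
```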
HETEROSCEDASTICITY
• One of the assumptions of the classical linear regression
model (CLRM) is that the variance of ui, the error term, is
constant, or homoscedastic.
• Reasons for heteroscedasticity are many, including:
Following error-learning models, errors shrink over time
As incomes grow, people have more discretionary income
As data collection techniques improve, variance is likely
to decrease
The presence of outliers in the data
HETEROSCEDASTICITY
• Reasons are many, including:
Incorrect functional form of the regression model
Incorrect transformation of data
Skewness in the distribution of one or more regressors
Mixing observations with different measures of scale
(such as mixing high-income households with low-income
households)
CONSEQUENCES
• If heteroscedasticity exists, several consequences ensue:
• The OLS estimators are still unbiased and consistent, yet the
estimators are less efficient, making statistical inference less
reliable (i.e., the estimated t values may not be reliable).
• Thus, estimators are not best linear unbiased estimators
(BLUE); they are simply linear unbiased estimators (LUE).
• In the presence of heteroscedasticity, the BLUE estimators
are provided by the method of weighted least squares (WLS).
HETEROSCEDASTICITY
Unfortunately, the usual OLS method does not take the unequal error
variances into account, but a method of estimation known as
generalized least squares (GLS) takes such information into account
explicitly and is therefore capable of producing estimators that
are BLUE. To see how this is accomplished, let us continue with the
now-familiar two-variable model:

$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (11.3.1)$$

which for ease of algebraic manipulation we write as

$$Y_i = \beta_1 X_{0i} + \beta_2 X_i + u_i \qquad (11.3.2)$$

where X0i = 1 for each i. Now assume that the heteroscedastic
variances σi² are known. Divide through by σi to obtain:

$$\frac{Y_i}{\sigma_i} = \beta_1 \frac{X_{0i}}{\sigma_i} + \beta_2 \frac{X_i}{\sigma_i} + \frac{u_i}{\sigma_i} \qquad (11.3.3)$$

which for ease of exposition we write as

$$Y_i^* = \beta_1^* X_{0i}^* + \beta_2^* X_i^* + u_i^*$$

The transformed disturbance $u_i^* = u_i/\sigma_i$ has constant
(unit) variance, so running OLS on the transformed model, i.e.,
weighted least squares, yields BLUE estimators; a sketch follows
below.
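A minimal sketch of this transformation, assuming the σi are known
up to the illustrative function σi = 0.5·Xi (statsmodels' WLS takes
weights proportional to 1/σi²):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 200
x = np.linspace(1.0, 10.0, n)
sigma_i = 0.5 * x                                   # assumed-known sigma_i
y = 2.0 + 3.0 * x + rng.normal(scale=sigma_i)

X = sm.add_constant(x)                              # columns: X0 (=1), X

# Manual GLS transformation: divide Y and every column of X by sigma_i, run OLS.
manual = sm.OLS(y / sigma_i, X / sigma_i[:, None]).fit()

# Equivalent built-in WLS with weights 1 / sigma_i^2.
wls = sm.WLS(y, X, weights=1.0 / sigma_i**2).fit()

print(manual.params, wls.params)                    # identical estimates
```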
DETECTION OF HETEROSCEDASTICITY
• Graph histogram of squared residuals
• Graph squared residuals against predicted Y
• Breusch-Pagan (BP) test
• White's test of heteroscedasticity
• Other tests, such as the Park, Glejser, Spearman's rank
correlation, and Goldfeld-Quandt tests of
heteroscedasticity
BREUSCH-PAGAN (BP) TEST
• Estimate the OLS regression, and obtain the squared OLS residuals
from this regression.
• Regress the squared residuals on the k regressors included in the model.
• You can also choose other regressors that might have some
bearing on the error variance.
• The null hypothesis here is that the error variance is homoscedastic,
that is, all the slope coefficients are simultaneously equal to zero.
• Use the F statistic from this regression, with (k − 1) numerator and
(n − k) denominator degrees of freedom, to test this hypothesis.
• If the computed F statistic is statistically significant, we can reject
the hypothesis of homoscedasticity. If it is not, we may not reject
the null hypothesis. A sketch of this test follows below.
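A minimal sketch using the Breusch-Pagan test as implemented in
statsmodels (the simulated heteroscedastic data are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(7)
n = 200
x = np.linspace(1.0, 10.0, n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5 * x)      # error variance grows with X

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

# het_breuschpagan regresses the squared residuals on the given regressors.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print(f"F = {f_stat:.2f}, p = {f_pvalue:.4g}")     # small p: reject homoscedasticity
```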
WHITE'S TEST OF HETEROSCEDASTICITY
• Regress the squared residuals on the regressors, the squared
terms of these regressors, and the pair-wise cross-product
terms of the regressors.
• Obtain the R² value from this regression and multiply it by
the number of observations.
• Under the null hypothesis that there is homoscedasticity,
this product follows the chi-square distribution with df
equal to the number of coefficients estimated.
• The White test is more general and more flexible than the
BP test. A sketch follows below.
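A minimal sketch using statsmodels' implementation of White's test
(the simulated data are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(7)
n = 200
x2 = rng.uniform(1.0, 10.0, size=n)
x3 = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(scale=0.4 * x2)

X = sm.add_constant(np.column_stack([x2, x3]))
res = sm.OLS(y, X).fit()

# het_white builds the auxiliary regression with levels, squares, cross-products.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(res.resid, X)
print(f"n*R^2 = {lm_stat:.2f}, p = {lm_pvalue:.4g}")  # small p: heteroscedasticity
```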
REMEDIAL MEASURES
• What should we do if we detect heteroscedasticity?
• Use the method of weighted least squares (WLS)
• Divide each observation by the (heteroscedastic) σi and estimate the
transformed model by OLS (yet the true variance is rarely known)
• If the true error variance is proportional to the square of one of the
regressors, we can divide both sides of the equation by that variable
and run the transformed regression
• Take the natural log of the dependent variable
• Use White's heteroscedasticity-consistent standard errors, or
robust standard errors
• Valid in large samples; a sketch follows below
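A minimal sketch of White's robust standard errors via statsmodels'
cov_type option (HC1 is one of several heteroscedasticity-consistent
variants; the data are the same illustrative simulation as above):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 200
x = np.linspace(1.0, 10.0, n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5 * x)

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                   # classical (non-robust) standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")  # White heteroscedasticity-consistent SEs

print(ols.bse)      # likely misleading under heteroscedasticity
print(robust.bse)   # robust standard errors, valid in large samples
```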