30C00200 Econometrics
2) Linear regression model and the OLS estimator
Timo Kuosmanen, Professor, Ph.D.
http://nomepre.net/index.php/timokuosmanen

Today's topics
• Multiple linear regression – what changes?
  – Model and its interpretation
  – Deriving the OLS estimator
• Standard error
• Multicollinearity
• Goodness of fit: R² and Adj. R²
• Fitting nonlinear functions using linear regression

Hedonic model of housing market
Dependent variable (y)
• Market price of apartment (€)
Explanatory variables (x), with expected signs
• Size (m²) – positive
• Number of bedrooms (#) – negative
• Age (years) – negative

Model
Regression equation with K parameters:
$$y_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \dots + \beta_K x_{Ki} + \varepsilon_i$$
• β1 is the intercept term (constant)
• βk is the slope coefficient of variable xk (marginal effect)
• εi is the disturbance term for observation i

Interpretation
Interpretation of the slope β2 for the variable "size" (m²):
Single regression model
• The value of an additional m², on average, in apartments located in Tapiola
Multiple regression model
• The value of an additional m², on average, in apartments with the same number of bedrooms and age in Tapiola

Interpretation
Why is the intercept denoted β1?
• Consider a model without a constant term:
$$y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \dots + \beta_K x_{Ki} + \varepsilon_i$$
where $x_1 = (1, 1, 1, \dots, 1)$.
• This model is identical to
$$y_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \dots + \beta_K x_{Ki} + \varepsilon_i$$
• We can think of the intercept term as the slope coefficient of a regressor that is constant (1) for all observations.
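The last point – the intercept as the slope of a constant regressor – can be illustrated with a short numpy sketch. This is not from the lecture; the data are simulated, and the variable names (`size`, `price`) are chosen to mimic the hedonic example.

```python
# Sketch (simulated data): the intercept is the slope coefficient of a
# regressor that equals 1 for every observation.
import numpy as np

rng = np.random.default_rng(0)
n = 200
size = rng.uniform(30, 120, n)                          # apartment size in m^2
price = 50_000 + 5_000 * size + rng.normal(0, 20_000, n)

# Design matrix with an explicit column of ones as the constant regressor x1.
X = np.column_stack([np.ones(n), size])
b, *_ = np.linalg.lstsq(X, price, rcond=None)

intercept, slope = b
print(intercept, slope)   # close to the true values 50_000 and 5_000
```

Dropping the column of ones would force the fitted line through the origin; including it recovers the intercept as an ordinary coefficient.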
Excel output of the hedonic model – single regression with m²

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.808717
R Square            0.654023
Adjusted R Square   0.648701
Standard Error      110242
Observations        67

ANOVA            df    SS         MS         F         Significance F
Regression        1    1.49E+12   1.49E+12   122.874   1.26E-16
Residual         65    7.90E+11   1.22E+10
Total            66    2.28E+12

             Coefficients  Standard Error  t Stat     P-value    Lower 95%   Upper 95%
Intercept    -31430.8      33447.06        -0.93972   0.350841   -98229.2    35367.56
size m2       5460.683       492.6256      11.08486   1.26E-16     4476.842    6444.524

Excel output of the hedonic model – multiple regression

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.905971
R Square            0.820784
Adjusted R Square   0.81225
Standard Error      80593.26
Observations        67

ANOVA            df    SS         MS         F          Significance F
Regression        3    1.87E+12   6.25E+11   96.17703   1.76E-23
Residual         63    4.09E+11   6.50E+09
Total            66    2.28E+12

              Coefficients  Standard Error  t Stat     P-value    Lower 95%   Upper 95%
Intercept     141366.7      36401.58         3.883532  0.000249    68623.96   214109.5
size m2         6972.404      857.5745       8.130378  2.11E-11     5258.678    8686.13
nr. bedrooms  -65182.6      20401.6         -3.19498   0.002186  -105952     -24413.3
age            -2820.1        495.0578      -5.6965    3.46E-07    -3809.39    -1830.8

OLS estimator
OLS problem with K parameters:
$$\min \; RSS = \sum_{i=1}^{n} e_i^2$$
$$\text{s.t.} \quad y_i = b_1 + b_2 x_{2,i} + b_3 x_{3,i} + \dots + b_K x_{K,i} + e_i$$
Or equivalently
$$\min \; RSS = \sum_{i=1}^{n} \left( y_i - b_1 - b_2 x_{2,i} - b_3 x_{3,i} - \dots - b_K x_{K,i} \right)^2$$

OLS estimator
First-order conditions:
$$\frac{\partial RSS}{\partial b_1} = -2 \sum_{i=1}^{n} \left( y_i - b_1 - b_2 x_{2,i} - \dots - b_K x_{K,i} \right) = 0$$
$$\frac{\partial RSS}{\partial b_2} = -2 \sum_{i=1}^{n} x_{2,i} \left( y_i - b_1 - b_2 x_{2,i} - \dots - b_K x_{K,i} \right) = 0$$
$$\vdots$$
$$\frac{\partial RSS}{\partial b_K} = -2 \sum_{i=1}^{n} x_{K,i} \left( y_i - b_1 - b_2 x_{2,i} - \dots - b_K x_{K,i} \right) = 0$$
A system of K linear equations in K unknowns – best solved using matrix algebra.

OLS estimator
Closed-form solution in matrix form:
$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
Note: to study econometrics at a more advanced level, you have to master matrix algebra.

OLS estimator
Closed-form solutions: the intercept term
$$b_1 = \bar{y} - b_2 \bar{x}_2 - b_3 \bar{x}_3 - \dots - b_K \bar{x}_K$$

OLS estimator
Closed-form solutions: special case with two regressors x2 and x3. Slope b2:
$$b_2 = \frac{\mathrm{Est.Cov}(x_2, y)\,\mathrm{Est.Var}(x_3) - \mathrm{Est.Cov}(x_3, y)\,\mathrm{Est.Cov}(x_2, x_3)}{\mathrm{Est.Var}(x_2)\,\mathrm{Est.Var}(x_3) - \mathrm{Est.Cov}(x_2, x_3)^2}$$

OLS estimator
Important: the explanatory variable x3 influences the slope of regressor x2 through the sample covariances.
Note: if regressor x2 does not correlate with the other regressor x3, that is, if the sample covariance is
$$\mathrm{Est.Cov}(x_2, x_3) = 0,$$
then the slope b2 estimated from the multiple regression model is exactly the same as that of the single regression of y on x2, leaving the effects of x3 to the disturbance term:
$$b_2 = \frac{\mathrm{Est.Cov}(x_2, y)}{\mathrm{Est.Var}(x_2)}$$

OLS estimator as a random variable
OLS estimates are calculated from observed data using the formula
$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
Important: the OLS estimator b is itself a random variable. (Why?)
That is, the elements of vector b have a probability distribution with expected value E(b) and variance Var(b). The estimated standard deviation of b is called the standard error.

Excel output of the hedonic model – multiple regression
(The slide repeats the multiple-regression output shown above.)

Standard error
An unbiased estimator of $\sigma^2 = \mathrm{Var}(\varepsilon)$ is
$$s_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - K}$$
The standard error of the parameter estimate bk is
$$\mathrm{s.e.}(b_k) = \sqrt{s_e^2 \left[ (\mathbf{X}'\mathbf{X})^{-1} \right]_{kk}}$$
The general expression requires matrix algebra.

Standard error
Special case of two regressors (x2, x3):
$$\mathrm{s.e.}(b_2) = \sqrt{\frac{s_e^2}{(n-1)\,\mathrm{Est.Var}(x_2)\,(1 - r_{x_2,x_3}^2)}}$$
where $r_{x_2,x_3}$ is the sample correlation coefficient.
This can be decomposed as a product of four components:
$$\mathrm{s.e.}(b_2) = s_e \cdot \sqrt{\frac{1}{n-1}} \cdot \sqrt{\frac{1}{\mathrm{Est.Var}(x_2)}} \cdot \sqrt{\frac{1}{1 - r_{x_2,x_3}^2}}$$

Standard error
$$\mathrm{s.e.}(b_2) = s_e \cdot \sqrt{\frac{1}{n-1}} \cdot \sqrt{\frac{1}{\mathrm{Est.Var}(x_2)}} \cdot \sqrt{\frac{1}{1 - r_{x_2,x_3}^2}}$$
Four potential ways to improve precision:
1) Decrease se (improve the empirical fit; add explanatory variables?)
2) Increase the sample size n
3) Increase the sample variance of x2
4) Decrease the correlation of x2 and x3

Multicollinearity
• High correlation between explanatory variables x can cause loss of precision
• In practice, there is always some correlation
  – Example: the sample correlation between "size (m²)" and "nr. of bedrooms" is 0.887
• Multiple regression analysis takes the correlation explicitly into account – correlation as such is not a problem
  – It depends on circumstances, particularly the sample size n, the variance of the disturbance ε, and the sample variances of the regressors x.
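The loss of precision from correlated regressors shows up in the $1/\sqrt{1 - r^2}$ factor of the standard-error decomposition. The following sketch (simulated data, not from the lecture) checks the two-regressor decomposition against the general matrix formula $\mathrm{s.e.}(b_k) = \sqrt{s_e^2\,[(\mathbf{X}'\mathbf{X})^{-1}]_{kk}}$, using deliberately correlated regressors:

```python
# Sketch (simulated data): two-regressor standard-error decomposition
# vs. the general matrix formula; x2 and x3 are strongly correlated.
import numpy as np

rng = np.random.default_rng(1)
n = 120
x2 = rng.normal(size=n)
x3 = 0.9 * x2 + 0.3 * rng.normal(size=n)          # strongly correlated with x2
y = 1.0 + 2.0 * x2 + 1.0 * x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x2, x3])
K = X.shape[1]
b = np.linalg.solve(X.T @ X, X.T @ y)             # closed form b = (X'X)^-1 X'y
e = y - X @ b
s2 = e @ e / (n - K)                              # unbiased estimate of Var(eps)

# General matrix formula for s.e.(b2): k = 1 indexes the x2 column.
se_matrix = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

# Four-factor decomposition: s_e, sqrt(1/(n-1)), sqrt(1/Var(x2)), sqrt(1/(1-r^2))
r = np.corrcoef(x2, x3)[0, 1]
se_decomp = (np.sqrt(s2) * np.sqrt(1 / (n - 1))
             * np.sqrt(1 / np.var(x2, ddof=1))
             * np.sqrt(1 / (1 - r**2)))

print(se_matrix, se_decomp)   # the two expressions agree
```

Rerunning with `x3` independent of `x2` shrinks the last factor toward 1 and visibly tightens the standard error, which is the multicollinearity point in numbers.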
• When the high correlation is considered a problem, it is referred to as multicollinearity.

Multicollinearity
Typical symptoms of multicollinearity:
• The model has a high R² and is jointly significant in the F-test
• Slope coefficients are large (small) but still insignificant
• Slope coefficients have high standard errors
• Slope coefficients have unexpected signs
• Slope coefficients are much smaller or larger than expected
Note: if two variables are perfectly correlated, the OLS estimator is ill-defined and cannot be computed. This is the extreme form of multicollinearity.

Multicollinearity
Indirect methods to influence the other factors:
1) Improve the empirical fit to decrease se
  – Include additional explanatory variables
2) Increase the sample size n
3) Increase the sample variance of the problem variables
Direct treatments:
• Exclude one of the problem variables
  – problem: omitted variable bias
• Impose a theoretical restriction

Goodness of fit
ANOVA and the coefficient of determination R²:
$$TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = RSS + ESS$$
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$$

Excel output of the hedonic model – multiple regression
(Same output as shown above; this slide reports the ANOVA sums of squares at full precision.)

ANOVA            df    SS                   MS         F          Significance F
Regression        3    1 874 088 309 956    6.25E+11   96.17703   1.76E-23
Residual         63      409 202 213 242    6.50E+09
Total            66    2 283 290 523 198

Goodness of fit
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$$
Question: What happens to R² if we include an additional explanatory variable x(K+1) in the model?
Answer:
• TSS remains the same
• RSS tends to decrease and ESS to increase.
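The answer to the question above can be checked numerically. This sketch (simulated data, not from the lecture) computes R² from the ANOVA identity before and after adding an irrelevant regressor:

```python
# Sketch (simulated data): adding a regressor cannot decrease R^2,
# because TSS stays fixed while RSS cannot increase.
import numpy as np

rng = np.random.default_rng(2)
n = 80
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)           # an irrelevant variable: true coefficient 0
y = 1.0 + 2.0 * x2 + rng.normal(size=n)

def r_squared(X, y):
    """R^2 = 1 - RSS/TSS for the OLS fit of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - e @ e / tss

X_small = np.column_stack([np.ones(n), x2])
X_big = np.column_stack([np.ones(n), x2, x3])

print(r_squared(X_small, y), r_squared(X_big, y))  # the second is never smaller
```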
Thus, R² is likely to increase.
• ESS and R² can never decrease
  – In the worst case, b(K+1) = 0, and there is no effect.

Adjusted R²
$$\mathrm{Adj.}R^2 = R^2\,\frac{n-1}{n-K} - \frac{K-1}{n-K}$$
(equivalently, $\mathrm{Adj.}R^2 = 1 - (1 - R^2)\frac{n-1}{n-K}$)
• A simple (but rather arbitrary) degrees-of-freedom correction to the usual R²
• Adding a new variable x(K+1) will increase Adj. R² if and only if |t(K+1)| ≥ 1
• Dougherty concludes: "There is little to be gained by fine-tuning [R²] with a 'correction' of dubious value."

Linearity of the model
Regression equation:
$$y_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \dots + \beta_K x_{Ki} + \varepsilon_i$$
• β1 is the intercept term (constant)
• βk is the slope coefficient of variable xk (marginal effect)
• εi is the disturbance term for observation i
Important: the model must be linear in the parameters βk and εi. It does not need to be linear in the variables x.

Examples
• Polynomial functional form:
$$y_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{2i}^2 + \dots + \beta_K x_{2i}^{K-1} + \varepsilon_i$$
• Log-linear (Cobb–Douglas, 1928) functional form:
$$\ln y_i = \beta_1 + \beta_2 \ln x_{2i} + \beta_3 \ln x_{3i} + \dots + \beta_K \ln x_{Ki} + \varepsilon_i$$

Exercise
Exam question (Fall 2014), problem 1:
We are interested in estimating the parameters β1, β2, and β3. Which of the following 10 models can be stated as a multiple linear regression (MLR) model, possibly after some variable transformations or other mathematical operations, such that the parameters β1, β2, and β3 can be estimated by OLS? If some adjustments are required, briefly state the required operations and the resulting MLR equation that can be estimated by OLS. If the parameters cannot be estimated by OLS, briefly point out the problem.

a) $y_i = \beta_1 + \beta_2 x_i + \beta_3 x_i^2 + \varepsilon_i$
b) $y_i = \beta_1 x_{1i}^{\beta_2} x_{2i}^{\beta_3} + \varepsilon_i$
c) $y_i = \beta_1 x_{1i}^{\beta_2} x_{2i}^{\beta_3} \exp(\varepsilon_i)$
d) $y_i = \beta_1 + \beta_2 \ln x_i + \beta_3 x_i^{0.4} + \varepsilon_i$
g) $y_i = \ln(\beta_1 + \beta_2 x_{1i} + \beta_3 x_{2i}) + \varepsilon_i$
h) $y_i = \beta_1 x_{1i}^{\beta_2} x_{2i}^{\beta_3} \exp(\varepsilon_i), \quad \beta_1 > 0$
e), f), i), j): [equations not recoverable from the source]

Notation: y and x denote the dependent and independent variables, the coefficients β are the parameters to be estimated, and ε are the random disturbance terms, respectively.

Next time – Mon 14 Sept
Topic:
• Statistical properties of the OLS estimator
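To make the linearity-in-parameters point concrete, here is a sketch (simulated data, not from the lecture) that estimates the log-linear Cobb–Douglas form by ordinary OLS after a log transformation:

```python
# Sketch (simulated data): a model nonlinear in variables but linear in
# parameters.  ln y = b1 + b2 ln x2 + b3 ln x3 + eps is plain OLS after logs.
import numpy as np

rng = np.random.default_rng(4)
n = 500
x2 = rng.uniform(1, 10, n)
x3 = rng.uniform(1, 10, n)
# True model: y = exp(0.5) * x2^0.7 * x3^0.2 * exp(eps), eps ~ N(0, 0.1^2)
y = np.exp(0.5) * x2**0.7 * x3**0.2 * np.exp(rng.normal(0, 0.1, n))

# Transform, then apply ordinary OLS to the linear-in-parameters equation.
X = np.column_stack([np.ones(n), np.log(x2), np.log(x3)])
b = np.linalg.solve(X.T @ X, X.T @ np.log(y))
print(b)   # approximately [0.5, 0.7, 0.2]
```

The multiplicative error $\exp(\varepsilon_i)$ is what makes the log transformation clean; an additive error, as in model b) of the exercise, cannot be linearized this way.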