Topic 3. Multiple Regression Analysis: Estimation

Motivation for multiple regression analysis

1) Assumption SLR.4 is often violated in a two-variable regression model, and therefore the OLS estimators are biased.
2) A multiple linear regression model allows for a more general functional form, e.g. a quadratic function.
3) More variation in y can be explained (R² is higher). Thus we get a better model for predicting the dependent variable.

Consider the model

log(price) = β0 + β1 log(dist) + u,

where price is the housing price and dist is the distance to a recently built garbage incinerator. We suspect that the estimate of β1 is biased, since some of the omitted variables that affect price are likely to be correlated with dist.

For example, one such factor may be the age of the house, age. So we may include it in the model:

log(price) = β0 + β1 log(dist) + β2 age + u.

In this case we still need the assumption that the error term is uncorrelated with dist and age: E(u | dist, age) = 0. Is this assumption likely to hold? Do we need to include additional independent variables in the model?

In some cases we expect that the effect of x on y is not constant and depends on the level of x. In this case we may include a quadratic term in the regression model. For example, we may hypothesize that advertising has a diminishing marginal effect on the revenue of a firm. We may specify that as follows:

(1.1) revenue = β0 + β1 adv + β2 adv² + u,

where revenue is the annual revenue of a firm in thousands of dollars and adv is annual advertising expenditure in thousands of dollars.

Generally, the model with two independent variables can be written as follows:

(1.2) y = β0 + β1 x1 + β2 x2 + u,

where β0 is the intercept, β1 measures the change in y with respect to x1 holding other factors fixed, and β2 measures the change in y with respect to x2 holding other factors fixed.

Note that when the model includes quadratic terms, the interpretation of the parameters is different.

QUESTION 2: Consider the model (1.1).
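In the quadratic model (1.1) the marginal effect of advertising is the derivative d(revenue)/d(adv) = β1 + 2 β2 adv, so it changes with the level of adv. A minimal sketch, using hypothetical coefficient values (not estimates from any real data) chosen so that β2 < 0 gives a diminishing marginal effect:

```python
# Marginal effect of adv in the quadratic model (1.1):
# revenue = b0 + b1*adv + b2*adv**2 + u  =>  d(revenue)/d(adv) = b1 + 2*b2*adv.
# The coefficients below are hypothetical, chosen only for illustration.

def marginal_effect(b1, b2, adv):
    """Approximate change in revenue from one more unit of adv."""
    return b1 + 2 * b2 * adv

b1, b2 = 30.0, -0.5                   # hypothetical estimates, b2 < 0
print(marginal_effect(b1, b2, 10))    # 30 - 2*0.5*10 = 20.0
print(marginal_effect(b1, b2, 40))    # 30 - 2*0.5*40 = -10.0: the effect can turn negative
```

Note that with β2 < 0 the effect eventually becomes negative, which is worth checking against the range of adv in the sample.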
How would annual revenue change if annual advertising expenditure increases by 1 thousand dollars, holding other factors fixed? (See quadratic functions in Lecture 1.)

OLS estimates and interpretation

The multiple linear regression model in the population can be written as

(1.3) y = β0 + β1 x1 + β2 x2 + ... + βk xk + u,

where β0 is the intercept and β1, β2, ..., βk are slope parameters. Note that since there are k slope parameters and the intercept, the multiple regression model contains (k + 1) parameters. This is important to remember in order to calculate the degrees of freedom that we will need in hypothesis testing.

The sample regression function (SRF), also called the OLS regression line, is

(1.4) ŷ = β̂0 + β̂1 x1 + β̂2 x2 + ... + β̂k xk.

For each observation i, we can get the fitted value of y, denoted ŷi:

(1.5) ŷi = β̂0 + β̂1 xi1 + β̂2 xi2 + ... + β̂k xik.

Note: the independent variables have two subscripts: i is the observation number, i = 1, 2, ..., n; the second subscript is the way we distinguish between different independent variables. There are k independent variables.

The residual, i.e. the difference between the observed value yi in the sample and the fitted value ŷi, is

(1.6) ûi = yi − ŷi = yi − β̂0 − β̂1 xi1 − ... − β̂k xik.

The method of ordinary least squares (OLS) chooses β̂0, β̂1, β̂2, ..., β̂k such that the sum of squared residuals

(1.7) Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi1 − β̂2 xi2 − ... − β̂k xik)²

is as small as possible.

Having presented the general intuition of the OLS method, we will not discuss how β̂0, β̂1, β̂2, ..., β̂k are actually computed. You may use any econometric software to compute the estimates of the parameters in the multiple linear regression model. In Stata, use the "regress" command:

regress depvar [indepvars] [if] [in] [weight] [, options]

Interpretation of the sample regression function (SRF)

(1.8) ŷ = β̂0 + β̂1 x1 + β̂2 x2 + ... + β̂k xk

is the same as in the case of the simple linear regression function.
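To make the minimization in (1.7) concrete, here is a small sketch that computes the OLS estimates on simulated data (all numbers are made up for illustration; in the course you would use Stata's "regress"). numpy's least-squares solver finds exactly the β̂'s that minimize the sum of squared residuals:

```python
# Sketch of (1.4)-(1.7) on simulated data with k = 2 independent variables.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2
X = rng.normal(size=(n, k))                   # two independent variables
u = rng.normal(size=n)                        # error term
y = 1.0 + 0.5 * X[:, 0] - 2.0 * X[:, 1] + u   # true model: b0=1, b1=0.5, b2=-2

Xmat = np.column_stack([np.ones(n), X])       # add the intercept column
beta_hat, *_ = np.linalg.lstsq(Xmat, y, rcond=None)

y_fitted = Xmat @ beta_hat                    # (1.5) fitted values
residuals = y - y_fitted                      # (1.6) residuals
ssr = np.sum(residuals ** 2)                  # (1.7) minimized sum of squared residuals
print(beta_hat)                               # close to (1.0, 0.5, -2.0)
```

A useful by-product: at the minimum of (1.7), the residuals sum to zero and are uncorrelated with every regressor, which is what the printed estimates rely on.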
The SRF is an estimated version of the population regression function:

(1.9) E(y | x1, x2, ..., xk) = β0 + β1 x1 + β2 x2 + ... + βk xk.

ŷ is the predicted value of y, that is, an estimate of the expected value E(y | x1, x2, ..., xk). β̂0 is the predicted value of y when the variables x1, x2, ..., xk are all zero.

To interpret the slope coefficients in the SRF, take the difference of (1.8):

(1.10) Δŷ = β̂1 Δx1 + β̂2 Δx2 + ... + β̂k Δxk.

Set the changes in x2, ..., xk equal to zero (Δx2 = 0, Δx3 = 0, ..., Δxk = 0) and solve (1.10) with respect to β̂1:

(1.11) β̂1 = Δŷ / Δx1.

Thus β̂1 is the partial effect of x1 on ŷ holding x2, ..., xk fixed. Sometimes researchers say that we control for the variables x2, ..., xk when estimating the effect of x1 on ŷ. The other coefficients have a similar interpretation.

QUESTION 3: Suppose a researcher decided to examine the relationship between education and wages of individuals and obtained the following results (you may use the WAGE1.DTA file to replicate the results):

log(wage)^ = 0.583 + 0.083 educ,

where wage is average hourly earnings and educ is years of education. Then the researcher decided to control for years of experience, exper, and years with the current employer, tenure, and obtained the following results:

log(wage)^ = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure.

a) Interpret the estimate of every coefficient in both models.
b) Why do you think the researcher decided to control for exper and tenure?
c) Is 0.092 definitely closer to the true parameter than 0.083? Explain.

In the question above we may also want to calculate the effect of a simultaneous increase in experience and tenure by 1 year with no change in education.
In this case we take the difference of the model:

Δlog(wage)^ = 0.092 Δeduc + 0.0041 Δexper + 0.022 Δtenure,

then set Δeduc = 0, Δexper = 1, Δtenure = 1 and calculate the proportionate change in hourly earnings:

Δlog(wage)^ = 0.0041 + 0.022 = 0.0261.

To get the percentage increase, multiply the estimate by 100: 0.0261 × 100% = 2.61%.

As in the simple linear regression model, we may obtain R² as

(1.12) R² = SSE/SST = 1 − SSR/SST.

R² is the proportion of the sample variation in y that is explained by the independent variables x1, x2, ..., xk. Thus it shows how well the model, or OLS line, fits the data.

R² has a number of limitations:

1) R² never decreases and usually increases when another independent variable is added to the model. Regardless of whether a variable belongs in the true model or not, R² increases if you add the variable. Therefore, R² should not be used as a tool to decide whether to add a variable or not.
2) As we discussed in Lecture 4, R² does not show whether the estimates are reliable or not. Reliability of the estimates depends on whether the assumptions underlying the analysis hold or not.

Note that R² can be shown to equal the squared correlation coefficient between the actual yi and the fitted values ŷi.

We have already defined the sample variance and sample covariance in Lecture 3. An estimator of the correlation coefficient between two random variables X and Y is

(1.13) R_XY = S_XY / (S_X S_Y) = Σ_{i=1}^{n} (Xi − X̄)(Yi − Ȳ) / { [Σ_{i=1}^{n} (Xi − X̄)²]^{1/2} [Σ_{i=1}^{n} (Yi − Ȳ)²]^{1/2} },

where the 1/(n − 1) factors in S_XY, S_X and S_Y cancel. In Stata you may use the command "correlate" to get the sample correlation or sample covariance of the variables.

Unbiasedness of the OLS Estimators

Assumption MLR.1 (Linear in Parameters): The model in the population can be written as

(1.14) y = β0 + β1 x1 + β2 x2 + ... + βk xk + u,

where β0, β1, ..., βk are the unknown parameters (constants) of interest and u is an unobservable random error or disturbance term. The key idea of the assumption is that the population model (also called the true model) is linear in the parameters β0, β1, ..., βk.
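The correlation estimator (1.13) can be implemented directly from the sums, since the 1/(n − 1) factors cancel between the numerator and the denominator. A minimal sketch, checked against numpy's built-in version:

```python
# Direct implementation of the correlation estimator (1.13).
import numpy as np

def sample_corr(x, y):
    """R_XY = sum of cross deviations / product of root sums of squares."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / ((dx ** 2).sum() ** 0.5 * (dy ** 2).sum() ** 0.5)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]          # exact linear relation, so correlation is 1
print(sample_corr(x, y))           # 1.0
print(np.corrcoef(x, y)[0, 1])     # numpy's estimator agrees
```

This is also the quantity that, computed between yi and ŷi, squares to the R² of the regression.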
Assumption MLR.2 (Random Sampling): We have a random sample of n observations {(xi1, xi2, ..., xik, yi) : i = 1, 2, 3, ..., n} following the population model in Assumption MLR.1.

Assumption MLR.3 (No Perfect Collinearity): In the sample (and therefore in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables.

Note that this is the only one of Assumptions MLR.1-MLR.5 that involves not only population considerations but also sample considerations. All the other assumptions deal with the population.

If an independent variable is an exact linear combination of the other independent variables, then we say that the model suffers from perfect collinearity, and it cannot be estimated by OLS. For example, suppose x1 = a + b x2. In this case the correlation between x1 and x2 is perfect: |corr(x1, x2)| = 1.

There might be a number of reasons why this happens:

a) The two variables in the model sum to a constant by definition. For example, if we control both for the percentage of female employees, f, in a company and for the percentage of male employees, m, in the company: f = 100 − m.

b) One of the variables is a constant. A linear regression with an intercept can be thought of as including a regressor with all observations equal to 1. Therefore, if one of the variables is constant, it will be perfectly collinear with this "intercept" regressor.

c) Including a quadratic term while taking the log of it. For example, consider the model (1.1). If you would like to specify it as a log-log model and take the log of each variable, you get

log(revenue) = β0 + β1 log(adv) + β2 log(adv²) + u.

The independent variables are perfectly collinear, since log(adv²) = 2 log(adv). You need to specify the model as

log(revenue) = β0 + β1 log(adv) + β2 [log(adv)]² + u.

Mathematically, perfect collinearity is a problem since it results in division by zero in the OLS formulas. Intuitively, it is a problem since it poses an illogical question.
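Case c) can be verified numerically: a design matrix containing an intercept, log(adv) and log(adv²) is rank deficient, so X'X cannot be inverted, while the correct specification with [log(adv)]² has full rank. A small sketch (the adv values are arbitrary illustrative numbers):

```python
# Perfect collinearity in case c): log(adv**2) = 2*log(adv) exactly.
import numpy as np

adv = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
X_bad = np.column_stack([np.ones_like(adv), np.log(adv), np.log(adv ** 2)])
X_ok = np.column_stack([np.ones_like(adv), np.log(adv), np.log(adv) ** 2])

print(np.linalg.matrix_rank(X_bad))  # 2: the third column is redundant
print(np.linalg.matrix_rank(X_ok))   # 3: [log(adv)]**2 is not a linear combination
```

A rank of 2 with 3 columns is exactly the "division by zero in the OLS formulas" situation: Stata would silently drop one of the collinear regressors.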
For example, consider case a) above. The coefficient estimate on f would show the change in the dependent variable when the percentage of female employees increases by one percentage point, holding the percentage of male employees fixed. However, this is impossible. The solution to the perfect collinearity problem is to drop a regressor that is perfectly collinear with another regressor. We will discuss a related problem, called multicollinearity, later.

Another point is that there must be at least as many observations in the sample as there are parameters in the model: n ≥ k + 1. Otherwise it is impossible to estimate the parameters by OLS.

Assumption MLR.4 (Zero Conditional Mean): The error u has an expected value of zero given any values of the independent variables. In other words,

(1.15) E(u | x1, x2, ..., xk) = 0.

There are a number of reasons why the assumption may fail:

1) Functional form misspecification. For example, the true (population) model is a log-log model while we specify a level-level model. We discuss this problem later.

2) Omitted variable problem: a variable that affects y and is correlated with one of the independent variables is omitted from the regression. The reason for omitting the variable might be data limitations, or the variable may not be measurable.

3) Measurement error problem. It is discussed in more detail at the end of the course. The idea is that if a variable is measured with error, the mismeasured variable can correlate with the disturbance term, which violates the assumption.

4) Simultaneity. One of the independent variables is a function of the dependent variable y. For example, wages can be considered a function of prices, and prices are a function of wages. In this case we estimate simultaneous equations models. These models are not often used in practice, and we do not cover them in this course.

If an independent variable does not correlate with the error term, it is called an exogenous variable.
If an independent variable correlates with the error term, it is called an endogenous variable.

Theorem (Unbiasedness of OLS): Under Assumptions MLR.1-MLR.4,

(1.16) E(β̂j) = βj, j = 0, 1, ..., k,

for any values of the population parameter βj. In other words, the OLS estimators are unbiased estimators of the population parameters.

Remember that unbiasedness does not mean that an OLS estimate equals the true parameter. It only says that if we obtained OLS estimates from all possible samples, the expected value of the estimates would equal the true parameter. In practice, when we deal with a single sample, we get only one estimate of the parameter, and it can be far from the true parameter. Note that correlation between even a single independent variable and the error term makes the estimators of all parameters biased.

The endogeneity problem is one of the main concerns in practical applications. If we suspect that some of the variables are endogenous, then the OLS estimators are biased. One of the conventional ways to deal with endogeneity is to use the instrumental variables (IV) estimator. However, this estimator also suffers from a number of problems, especially the weak instruments problem. At the end of the course we will discuss the IV estimator, as well as a test for endogeneity. In most applications, researchers estimate the model with the OLS estimator and then check the robustness of the results by estimating with the IV estimator.

The Variances of the OLS Estimators

The variance of an OLS estimator shows how the estimates from all possible samples of the population are spread around their mean. If the estimator is unbiased, that mean is the value of the parameter. We are interested in making the variances as small as possible.

Assumption MLR.5 (Homoscedasticity): The error u has the same variance given any values of the explanatory variables. In other words, var(u | x1, x2, ..., xk) = σ². If the assumption fails, the model exhibits heteroscedasticity.
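The "all possible samples" idea behind the unbiasedness theorem (1.16) can be illustrated with a short Monte Carlo sketch: draw many samples from a model satisfying MLR.1-MLR.4, estimate by OLS each time, and average. The true parameter values below (β0 = 2, β1 = 0.5) are arbitrary choices for the simulation:

```python
# Monte Carlo sketch of unbiasedness: mean of OLS estimates over many samples
# is close to the true parameter, even though single estimates vary.
import numpy as np

rng = np.random.default_rng(1)
true_b0, true_b1 = 2.0, 0.5
n, reps = 100, 2000
estimates = np.empty(reps)

for r in range(reps):
    x = rng.normal(size=n)
    u = rng.normal(size=n)               # E(u | x) = 0 by construction (MLR.4)
    y = true_b0 + true_b1 * x + u
    X = np.column_stack([np.ones(n), x])
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(estimates.mean())   # close to 0.5
print(estimates.std())    # individual estimates spread around the truth
```

Note that any single draw of `estimates` can be well away from 0.5; only the average across samples recovers the parameter, which is exactly what unbiasedness claims.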
Note that var(y | x1, x2, ..., xk) = var(u | x1, x2, ..., xk) = σ². Assumptions MLR.1-MLR.5 are known as the Gauss-Markov assumptions.

Theorem (Sampling Variances of the OLS Slope Estimators): Under Assumptions MLR.1-MLR.5, conditional on the sample values of the independent variables,

(1.17) var(β̂j) = σ² / [SSTj (1 − R²j)]

for j = 1, 2, ..., k, where SSTj = Σ_{i=1}^{n} (xij − x̄j)² is the total sample variation in xj, and R²j is the R² from regressing xj on all the other independent variables (including an intercept).

For hypothesis testing we need the standard deviation of the OLS estimator β̂j, which is the square root of the variance:

(1.18) sd(β̂j) = σ / [SSTj (1 − R²j)]^{1/2}.

The only component of (1.17) that we do not know is the error variance σ². It can be estimated by

(1.19) σ̂² = Σ_{i=1}^{n} ûi² / (n − k − 1) = SSR/df.

Since the standard deviation of the error term is not known, we use its estimator

(1.20) σ̂ = (σ̂²)^{1/2},

which is called the standard error of the regression. Now we can get estimators of the standard deviations of the β̂j,

(1.21) se(β̂j) = σ̂ / [SSTj (1 − R²j)]^{1/2},

which are called standard errors.

An important thing to remember for hypothesis testing is that the standard errors given in formula (1.21) are not valid if the assumption of homoscedasticity is violated. Stata, like other econometric software, computes standard errors using formula (1.21) by default. Therefore, if we suspect heteroscedasticity, we need to correct for it. We will discuss this issue later.

Let's discuss the components of the variance of the OLS estimator given in formula (1.17).

1. The error variance, σ². The larger the error variance, the larger are the variances of the OLS estimators. The more variables we omit from the regression, the larger is the error variance, and the larger are the variances of the OLS estimators. By adding more explanatory variables to the model, we may reduce the error variance.

2. The total sample variation in the explanatory variable, SSTj.
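Formulas (1.19)-(1.21) can be sketched step by step on simulated data (all numbers below are simulated for illustration): estimate σ̂² from the residuals with df = n − k − 1, obtain R²j by regressing xj on the other regressors, and combine with SSTj. The result coincides with the usual matrix-based standard error:

```python
# Sketch of (1.19)-(1.21): build se(b_1) from sigma2_hat, SST_1 and R2_1.
import numpy as np

rng = np.random.default_rng(2)
n, k = 500, 2
X = rng.normal(size=(n, k))
y = 1.0 + X[:, 0] + X[:, 1] + rng.normal(size=n)
Xmat = np.column_stack([np.ones(n), X])

beta_hat = np.linalg.lstsq(Xmat, y, rcond=None)[0]
resid = y - Xmat @ beta_hat
sigma2_hat = (resid ** 2).sum() / (n - k - 1)        # (1.19), df = n - k - 1

# R2_1: regress x1 on an intercept and the other regressor(s)
others = np.column_stack([np.ones(n), X[:, 1]])
g = np.linalg.lstsq(others, X[:, 0], rcond=None)[0]
x1_resid = X[:, 0] - others @ g
sst_1 = ((X[:, 0] - X[:, 0].mean()) ** 2).sum()      # total variation in x1
r2_1 = 1.0 - (x1_resid ** 2).sum() / sst_1

se_1 = np.sqrt(sigma2_hat / (sst_1 * (1.0 - r2_1)))  # (1.21)
print(se_1)   # roughly 1/sqrt(n) here, since sigma2 and var(x1) are both about 1
```

As a cross-check, se_1 equals the square root of the (1,1) diagonal element of σ̂² (X'X)⁻¹, which is how regression software actually computes it.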
The larger SSTj is, the smaller are the variances of the OLS estimators. Typically, the larger the sample size, the larger is SSTj, and therefore the more precise are the OLS estimators.

3. Linear relationships among the independent variables, R²j. This is the R² from regressing xj on all the other independent variables (including an intercept). In general, R²j shows the proportion of the total variation in xj that can be explained by the other independent variables in the model. R²j = 0 is the best case, since in this situation var(β̂j) is at its minimum. This can happen only if xj has zero sample correlation with every other independent variable. The extreme case R²j = 1 is not allowed: it implies division by zero in (1.17) and violates Assumption MLR.3.

As R²j increases toward 1, var(β̂j) gets larger and larger. Thus a higher degree of linear relationship between the independent variables leads to a higher variance of the OLS estimator of a parameter. High correlation between independent variables, when R²j is close to 1, is called multicollinearity. This is not a violation of Assumption MLR.3. However, multicollinearity may be a problem, since it may lead to a high variance of the OLS estimator. This makes the estimator less precise, as we discuss later in the course. Multicollinearity can be mitigated by a lower σ² or a larger SSTj. Therefore, in general, there is no benchmark value of R²j that would indicate multicollinearity. However, in some cases researchers become concerned about multicollinearity if R²j > 0.9.

We may also discuss multicollinearity in terms of the variance inflation factor:

(1.22) VIFj = 1 / (1 − R²j).

The general conclusion is that we are interested in less correlation between the independent variables. Also, note that correlation among a group of independent variables does not affect the variances of the estimators of the coefficients of the other independent variables.
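The variance inflation factor (1.22) can be computed exactly as defined: regress xj on the other regressors, take that R²j, and form 1/(1 − R²j). A sketch on simulated data where two regressors are deliberately built from the same underlying variable, so their VIFs are large:

```python
# VIF_j = 1 / (1 - R2_j), with R2_j from regressing x_j on the other regressors.
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j of the regressor matrix X."""
    n = X.shape[0]
    xj = X[:, j]
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    g = np.linalg.lstsq(others, xj, rcond=None)[0]
    resid = xj - others @ g
    r2_j = 1.0 - (resid ** 2).sum() / ((xj - xj.mean()) ** 2).sum()
    return 1.0 / (1.0 - r2_j)

rng = np.random.default_rng(3)
z = rng.normal(size=300)
X = np.column_stack([z + 0.1 * rng.normal(size=300),   # x1 and x2 are nearly
                     z + 0.1 * rng.normal(size=300),   # the same variable
                     rng.normal(size=300)])            # x3 is independent

print(vif(X, 0))   # large: x1 is nearly collinear with x2
print(vif(X, 2))   # close to 1: x3's variance is not inflated
```

This also illustrates the last point above: the near-collinearity between x1 and x2 inflates their VIFs but leaves the VIF of the unrelated x3 near 1.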
In contrast, correlation between even a single independent variable and the error term makes the estimators of all parameters biased.

Solutions to multicollinearity:

1) Drop one of the independent variables that has a strong correlation with the variable of interest. However, in this case, although we reduce the variance of the estimator, we may introduce omitted variable bias.

2) Collect more data to increase the sample size and thus increase SSTj. This entails economic costs.

3) In some cases, the variables that are strongly correlated can be modified. For example, a number of variables representing subcategories (e.g. subcategories of expenditure) can be aggregated into one variable that represents a category (e.g. total expenditure). In other cases we may modify a variable to net out the part that correlates with another variable. For example, if firm output and the exchange rate correlate, we can subtract exports from total firm output and thus reduce its correlation with the exchange rate.

Overspecifying the Model and Underspecifying the Model

Overspecifying the model: we include an irrelevant variable in the model. An irrelevant variable is a variable that does not belong in the population model; in other words, it has no partial effect on y in the population. Overspecification does not lead to biased OLS estimators. However, it leads to higher variances of the OLS estimators if the irrelevant variable is correlated with the other independent variables. This is the cost of overspecification.

Underspecifying the model: we do not include a variable that actually belongs in the population model (it has a nonzero partial effect on y). If the omitted variable correlates with the independent variable(s), then this leads to a violation of Assumption MLR.4 and therefore to omitted variable bias. Note that in this case the OLS estimators of all parameters are biased.
Gauss-Markov Theorem

Gauss-Markov Theorem: Under Assumptions MLR.1-MLR.5, β̂0, β̂1, ..., β̂k are the best linear unbiased estimators (BLUEs) of β0, β1, ..., βk, respectively.

"Best" means that the OLS estimators have the smallest variances among all linear unbiased estimators. This theorem justifies the application of OLS estimators. It also shows why Assumptions MLR.1-MLR.5 are important: if these assumptions do not hold, then the OLS estimators are not BLUE. We need Assumptions MLR.1-MLR.4 for OLS to be unbiased, and we need the additional Assumption MLR.5 for OLS to have the smallest variance. If the MLR.5 homoscedasticity assumption is violated, then OLS does not have the smallest variance and we would use another estimator.

QUESTION 4 (Problem 3.13 in the textbook): The following equation represents the effects of the tax revenue mix on subsequent employment growth for the population of counties in the United States:

growth = β0 + β1 shareP + β2 shareI + β3 shareS + u,

where growth is the percentage change in employment from 1980 to 1990, shareP is the share of property taxes in total tax revenue, shareI is the share of income tax revenues, and shareS is the share of sales tax revenues. All of these variables are measured in 1980. The omitted share, shareF, includes fees and miscellaneous taxes. By definition, the four shares add up to one.

1) Why must we omit one of the tax share variables (shareF was omitted) from the equation?
2) Give a careful interpretation of β1.