Economics 345 Applied Econometrics
Problem Set 3--Solutions
Prof: Martin Farnham

Problem sets in this course are ungraded. An answer key will be posted on the course website within a few days of the release of each problem set. As noted in class, it is highly recommended that you make every effort to complete these problems before viewing the answer key.

Variance of Univariate OLS Estimator

1) Normally, we think it's important to include other RHS variables that are likely to be correlated with both x1 and y. This is for reasons of eliminating omitted variables bias (more on which below). However, looking at the expression for the variance of the univariate OLS estimator, can you see any reason why we might want to include on the RHS variables that affect y but don't affect x1? What would be the advantage of including such variables on the RHS?

Recall that the formula for the variance of the OLS estimate under univariate linear regression is

Var(β̂1) = σ² / Σ(xi − x̄)²

The error variance, σ², tells us how much variation there is in u, the error term. Recall that u picks up all other determinants of y. If we can pull some of those determinants of y out of u and explicitly include them as x variables on the RHS, the variation in u, and hence σ², will decline. This lowers the variance of the estimator β̂1.
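To see this numerically, here is a minimal Monte Carlo sketch in Python (not part of the original problem set; the model, coefficient values, and variable names are made up purely for illustration). A variable z affects y but is unrelated to x1, so leaving it out causes no bias, but including it shrinks the error variance and hence the sampling variance of the slope estimate on x1.

import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000
b1_without_z, b1_with_z = [], []

for _ in range(reps):
    x1 = rng.normal(size=n)              # regressor of interest
    z = rng.normal(size=n)               # affects y, unrelated to x1
    u = rng.normal(size=n)
    y = 1.0 + 0.5 * x1 + 2.0 * z + u     # hypothetical true model

    # Univariate regression: z stays in the error term, so the error variance is large
    X1 = np.column_stack([np.ones(n), x1])
    b1_without_z.append(np.linalg.lstsq(X1, y, rcond=None)[0][1])

    # Multiple regression: z is pulled out of the error term
    X2 = np.column_stack([np.ones(n), x1, z])
    b1_with_z.append(np.linalg.lstsq(X2, y, rcond=None)[0][1])

print("sampling SD of slope on x1, z omitted: ", np.std(b1_without_z))
print("sampling SD of slope on x1, z included:", np.std(b1_with_z))

Both estimators are centered on the true slope (z is uncorrelated with x1, so there is no omitted variables bias), but the sampling standard deviation is noticeably smaller when z is included on the RHS.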
Multiple Regression Basics

2) Book Problems 3.1, 3.2, 3.3, and 3.5.

Book Problem 3.1
i) A lower value of hsperc corresponds to a higher ranking in the high school. If we expect highly ranked high school students to get higher GPAs at university, then the negative coefficient on hsperc makes sense.
ii) 2.676 (just plug in the values for hsperc and sat and add up).
iii) The predicted GPA difference is 0.2072. One obtains this by multiplying 140 by the estimated coefficient on sat. This is not a very big difference in GPA, even on a 4-point scale.
iv) If we rewrite the estimated equation in terms of changes and assume everything else is held constant (so that their changes are zero), then we can rewrite the estimated equation as Δcolgpa = 0.00148Δsat. We want to find the case where the difference in colgpa is 0.5. So we set 0.5 = 0.00148Δsat, which yields a difference in sat of about 337.84 points. Given that the SAT has a standard deviation of 140, this is a big difference. In other words, the small coefficient on sat indicates that it explains only small differences in colgpa. (These plug-in calculations, and those in 3.2 and 3.3, are double-checked in the short sketch following Problem 3.3.)

3.2
i) In my opinion, it has the expected effect. If one has more siblings, the cost of education for the family will be higher. Ceteris paribus, this is likely to lead the parents to provide less education to each child than they would if they had fewer children. Following the logic in 3.1.iv, we want to find the solution to Δeduc = -0.094Δsibs where Δeduc = -1. Dividing -1 by -0.094 gives us about 10.64. This suggests that increasing the number of siblings by about 11 will lead to one less year of education on average. So while more siblings leads to less education on average, it is a small effect.
ii) The coefficient on meduc is estimated to be 0.131. This suggests that each year of extra education that a working man's mother has contributes (on average) an extra 0.13 of a year to the amount of education the working man gets. We might expect that more educated mothers tend to value education more highly, and hence encourage their sons to receive more education.
iii) There are two ways to do this. You can plug in values for Man A and Man B and compare, or you can remember that Man B's parents each have 4 more years of education than Man A's, and so calculate that Man B should (on average) receive 0.131(4) + 0.210(4) = 1.364 more years of education than Man A.

3.3
i) Negative.
ii) I would expect that more educated people are more likely to be driven in their activities outside of work, so that even controlling for their work hours, the coefficient on education would be negative, reflecting other tradeoffs with sleep. Another possible interpretation is that people who can get by on less sleep are more likely to be successful in school. The expected sign of the coefficient on age is less clear. Young adults, like you all, tend to sleep ridiculously small amounts. Also, adults with kids (who tend to be younger than other adults) get less sleep due to kids waking in the night. On the other hand, older people (say mid-50s on) tend to have more disturbed sleep for reasons associated with aging. Overall I'd guess that age would have a positive coefficient (that the first arguments outweigh the second), but this is a tougher call. It wouldn't surprise me if the relationship between age and sleep were nonlinear. This would suggest respecifying the model with both age and age-squared included as RHS variables.
iii) 5 hours more per week is 300 minutes of extra work. So one would expect sleep to fall by (0.148)(300) minutes per week, which is about 44 minutes. This strikes me as a small to moderate tradeoff.
iv) The coefficient on education suggests that going from a high school to a university education (adding 4 years of education) reduces weekly sleep by about 44 minutes. This strikes me as a pretty small effect. The sign of the coefficient is interpreted above.
v) No; the R-squared is quite low. If you think about different working people you know, you'll notice lots of variation in hours worked, driven by things like occupation. People with secretarial jobs tend to work 8-5, 5 days a week. Certain untenured professors with midterms to write work 12 hours a day, 7 days a week (ahem!). Corporate lawyers and investment bankers work very long hours, while piano teachers tend to work relatively few. Obviously, occupation is highly correlated with totwrk. If you think occupation affects total sleep (for reasons not captured in totwrk), then omission of occupation would be problematic and would cause the coefficient on totwrk to be biased. Another factor that might affect sleep is whether you have kids. This could also be correlated with totwrk (though I'm not sure which way this relationship would work). If men work harder when they have kids, to meet the income needs of the family, then these variables would be positively correlated. If they work less, to spend more time with their kids, having kids would be negatively correlated with totwrk. Clearly, though, having kids interrupts sleep and causes parents to get less than non-parents (ceteris paribus). So this could be another important omitted variable.
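The plug-in numbers above are easy to verify. Here is a quick Python check (not from the book or the original key; it simply redoes the arithmetic using the coefficients quoted in the solutions above):

# Arithmetic checks for Problems 3.1-3.3, using only figures quoted above
gpa_gap = 0.00148 * 140                   # 3.1 iii): predicted GPA gap for a 140-point SAT gap
sat_gap = 0.5 / 0.00148                   # 3.1 iv): SAT gap needed for a 0.5 GPA gap
extra_sibs = -1 / -0.094                  # 3.2 i): extra siblings per one less year of education
parent_educ_gap = 0.131 * 4 + 0.210 * 4   # 3.2 iii): Man B vs. Man A, in years of education
lost_sleep = 0.148 * 300                  # 3.3 iii): minutes of weekly sleep lost from 5 extra work hours

print(gpa_gap, sat_gap, extra_sibs, parent_educ_gap, lost_sleep)
# roughly 0.2072, 337.84, 10.64, 1.364, 44.4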
3.5
i) Remember that there are 168 hours in a week. If you hold sleep, work, and leisure fixed while changing study, you implicitly allow the number of hours in a week to vary. This doesn't make sense.
ii) It's important to realize that if there are 4 categories that students can lump their time into, and if the number of hours in a week is fixed (which it obviously is), then once we know how much time is spent on 3 of the categories, we already know how much is spent on the fourth. For example, if sleep = 56, work = 20, and leisure = 62, then we already have enough information to know what study is: study must be 30 (yes, I suspect I'm being optimistic). In other words, there is a perfect linear relationship, study = 168 - (sleep + work + leisure), between the first three x variables and the fourth. In fact, there's a perfect linear relationship between any three of the x variables and the fourth. If we include all four, then we violate the assumption of "no perfect collinearity" (MLR.3). (This is illustrated numerically in the sketch after part iii.)
iii) We could omit one of the four variables. For instance, we could omit leisure from the model, so that we restate the model as
GPA = β0 + β1study + β2sleep + β3work + u
Now when we think about one of these variables changing by one hour, we're implicitly letting leisure vary (also by one hour, but in the opposite direction) to preserve the 168 hours in a week. For instance, β1 tells us what effect substituting one more hour of study for one less hour of leisure has on GPA. This is arguably a useful interpretation, and the respecified model satisfies MLR.3.
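To see the collinearity in 3.5 ii) numerically, here is a minimal Python sketch (the weekly hours are made up for illustration). Because the four time-use categories always sum to 168, a design matrix containing a constant and all four variables is rank deficient; dropping leisure restores full column rank, so OLS can be computed.

import numpy as np

rng = np.random.default_rng(0)
n = 50
# Hypothetical weekly hours; by construction the four categories sum to 168
study = rng.uniform(10, 40, n)
sleep = rng.uniform(40, 60, n)
work = rng.uniform(0, 30, n)
leisure = 168 - study - sleep - work

X_all = np.column_stack([np.ones(n), study, sleep, work, leisure])
X_drop = np.column_stack([np.ones(n), study, sleep, work])

print(np.linalg.matrix_rank(X_all))   # 4, not 5: leisure is a linear combination of the other columns (MLR.3 fails)
print(np.linalg.matrix_rank(X_drop))  # 4: full column rank once leisure is dropped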
Unbiasedness of OLS estimators

3) What are the assumptions required for the OLS estimates to be unbiased?
i) The population model is linear in parameters.
ii) A random sample is used to obtain the estimates.
iii) None of the RHS variables is a perfect linear function of the others (no perfect collinearity).
iv) The error term has zero mean conditional on the RHS variables; in particular, none of the x's is correlated with the error term.

Omitted Variables bias

4) Suppose that the true model of earnings is
(1) Earnings = β0 + β1Educ + β2Ability + u.
If you had data to estimate this, you could obtain OLS estimates β̂0, β̂1, and β̂2 of the parameters. Suppose now that you have data on Earnings and Educ but no data on Ability (which is typically difficult for the econometrician to observe). Therefore, you choose to estimate, using OLS, the following specification:
(2) Earnings = β0 + β1Educ + v,
where v is an error term. You obtain estimates of β0 and β1 from this univariate regression.

a) State two conditions under which the estimate of β1 from (2) will be an unbiased estimator of β1 (in particular, if either of these 2 conditions holds, it will be unbiased). Note that I'm not interested in the Gauss-Markov assumptions. One condition should be about the relationship between Ability and Earnings. The other condition should be about the relationship between Ability and Education.
i) Ability actually has no effect on earnings (β2 = 0 in the population).
ii) Ability and Education are uncorrelated.

b) Do you think it's likely that either of those two conditions is met?
No. If higher-skilled workers get paid more, then ability must matter in the population relationship. And there's lots of evidence that smarter (higher ability) people are more likely to pursue high amounts of education.

c) Suppose now that Educ and Ability are positively correlated. This could be because "clever" (high ability) people find university easier to stomach than "slow" (low ability) people. Also, assume that β2 is positive (so that high ability people get paid more than low ability people). If you estimate specification (2), will the slope coefficient estimate be unbiased? If not, in which direction will the coefficient estimate be biased?
It will be positively biased.

d) Suppose now that Educ and Ability are positively correlated, but that β2 is negative (unlikely, but let's play with the assumption). In which direction will the slope coefficient (estimated under specification 2) be biased?
It will be negatively biased.

e) Suppose now that Educ and Ability are negatively correlated (also unlikely, but let's play with the assumption). Assuming that β2 is positive, in what direction will the slope coefficient (under specification 2) be biased?
It will be negatively biased.

f) Suppose, finally, that Educ and Ability are negatively correlated and that β2 is negative. In which direction will the slope coefficient (estimated under specification 2) be biased?
It will be positively biased.

g) Suppose we can obtain data on Ability. Explain why we could obtain an unbiased estimate of the slope coefficient on Educ with the following 2-step procedure: i) regress Educ on Ability; calculate the residuals and save them. ii) regress Earnings on these saved residuals.
This is the method of partialling out. What this does is basically create a new variable for education that is cleansed of any relationship with Ability. By regressing Educ on Ability and obtaining the residuals, we get the part of Educ that is unrelated to Ability. When we then regress Earnings on this cleansed version of Educ, we get the true ceteris paribus effect of Educ on Earnings.
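Here is a minimal Python simulation of the partialling-out procedure in g) (the data-generating process and all coefficient values are made up for illustration; this is a sketch, not part of the original key). The slope from regressing Earnings on the saved residuals matches the coefficient on Educ from the multiple regression that controls for Ability, while the short regression that ignores Ability is biased upward, as in part c).

import numpy as np

rng = np.random.default_rng(0)
n = 5000
ability = rng.normal(size=n)
educ = 12 + 2 * ability + rng.normal(size=n)                 # Educ positively correlated with Ability
earnings = 20 + 3 * educ + 5 * ability + rng.normal(size=n)  # hypothetical true values: beta1 = 3, beta2 = 5

# Step i): regress Educ on Ability and save the residuals
Xa = np.column_stack([np.ones(n), ability])
educ_resid = educ - Xa @ np.linalg.lstsq(Xa, educ, rcond=None)[0]

# Step ii): regress Earnings on the saved residuals
Xr = np.column_stack([np.ones(n), educ_resid])
b_partial = np.linalg.lstsq(Xr, earnings, rcond=None)[0][1]

# For comparison: the multiple regression that includes Ability directly
Xm = np.column_stack([np.ones(n), educ, ability])
b_multiple = np.linalg.lstsq(Xm, earnings, rcond=None)[0][1]

# And the short regression from specification (2), which omits Ability
Xs = np.column_stack([np.ones(n), educ])
b_short = np.linalg.lstsq(Xs, earnings, rcond=None)[0][1]

print(b_partial, b_multiple)  # identical, both near the true value 3
print(b_short)                # well above 3: positive omitted variables bias, as in c)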