NAME:

ECON 326 (Spring 2013): Final exam

Welcome to the final! Please read the following instructions while the exams are being passed out to the class.

1. Please do not turn the page until instructed to.
2. Please put away your calculators. You will not need them.
3. The exam is 120 minutes long, and is out of points plus 5 extra credit points.
4. My advice is to read through all the questions first, and then start by answering the ones that seem easiest to you.
5. Good luck! You should be proud of all that you’ve learned this semester!

Fill in the blank or circle your choice - 27 points total (3 pts each, no partial credit)

1. $R^2$ can take values between -1 and 1. (True / False)

2. The homoskedasticity assumption cannot be tested. (True / False)

3. $\hat{\beta}_1$ can be calculated by dividing the sample covariance between X and Y by the sample Var(Y). (True / False)

4. It is possible to include time fixed effects in a regression that also includes entity fixed effects. (True / False)

5. If all 5 SLR assumptions are true, then the OLS estimator $\hat{\beta}_1$ has (greater / less) variance than a Two Stage Least Squares estimator, assuming a valid instrument for $X_1$ is used.

6. In Two Stage Least Squares, the endogenous variable X appears as a regressor or regressand in:
   (a) the first stage regression only
   (b) the second stage regression only
   (c) neither of the two stages
   (d) both of the two stages

7. If we have two periods of panel data and the population model $Y_{it} = \beta_0 + \beta_1 X_{it} + a_i + u_{it}$, and we are concerned that the error components $a_i$ are correlated with $X_i$, then:
   (a) Both First-Differences and Fixed Effects Estimation will solve this problem, but produce different estimates of $\hat{\beta}_1$.
   (b) Both First-Differences and Fixed Effects Estimation will solve this problem, and produce the same estimate of $\hat{\beta}_1$.
   (c) Only First-Differences Estimation will solve this problem.
   (d) Only Fixed Effects Estimation will solve this problem.

8. Consider the instrumental variable regression model $Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_i + u_i$, which you plan to estimate by using $Z_i$ as an instrument for $X_i$ (let’s assume it is valid for this purpose). Unfortunately, you discover that data on $W_i$ are not available.
   (a) Suppose $Z_i$ and $W_i$ are negatively correlated. Is Z still a valid instrument? (Yes / No)
   (b) Suppose $Cov(Z_i, W_i) = 0$. Is Z still a valid instrument? (Yes / No)

Short Answer - 25 points total (5 pts each unless otherwise noted)

1. A researcher is interested in the effect of military service on earnings. He collects data from a random sample of 4000 workers aged 40 and runs the OLS regression $Y_i = \beta_0 + \beta_1 X_i + u_i$, where $Y_i$ is the worker’s annual earnings and $X_i$ is a dummy variable that is equal to 1 if the person served in the military and 0 otherwise.
   (a) Explain why the OLS estimates are likely to be biased.
   (b) During the Vietnam War there was a draft, where priority for the draft was determined by a national lottery. (Birth dates were drawn at random to determine the order in which men would be drafted.) Explain how the lottery might be used as an instrument to estimate the effect of military service on earnings. Make sure to discuss the requirements for instrument validity.
   (c) What does it mean to have a “weak instrument” problem, and is it likely in this example?

2. We want to use a fixed effects regression to estimate the panel data model $Y_{ij} = \beta_0 + \beta_1 X_{ij} + a_i + u_{ij}$ (j might be t for time periods, or it might indicate another form of clustering). Explain the three assumptions that relate to the error term $u_{ij}$ and what they are each necessary for.

3. The Currie et al. paper on the Head Start preschool program aims to study how attending Head Start affects children’s outcomes (test scores in Grade 3). Explain the type of fixed effects used in this paper, and why this approach is better than a simple cross-sectional comparison between all children who attended Head Start and all those who didn’t.

Longer problems - 57 points total

4. (19 points) Traffic Fatalities

A researcher is using panel data on traffic fatalities (in 30 states for 5 years) to estimate the following model:

$\text{Deaths}_{it} = \beta_0 + \beta_1 \text{BeerTax}_{it} + \beta_2 \text{UnemployedRt}_{it} + \text{state fixed effects} + \text{time fixed effects} + u_{it}$

where $\text{Deaths}_{it}$ is the number of traffic fatalities per 10,000 residents of state i in year t, $\text{BeerTax}_{it}$ is the tax rate on beer in state i in year t, and $\text{UnemployedRt}_{it}$ is the share of the population of state i that is unemployed in year t.

   (a) Calculate the degrees of freedom in this regression model.
   (b) The researcher believes that traffic fatalities increase when roads are icy, so that states with more snow will have more fatalities than other states. Is it a good idea to add AverageSnowfall for each state (calculated as the average over the five years) to the regression model? Why or why not? [Hint: Think about what the subscripts on this term would be.]
   (c) Is it a good idea to add a Snowfall variable, equal to snowfall in a certain year in state i, to the regression model? Why or why not? [Hint: Think about what the subscripts on this term would be.]
   (d) A research assistant conjectures that Snowfall might have a different effect on traffic fatalities in the southern vs. the northern states. How would you test this hypothesis? (Be specific about the way you would estimate the regression and the statistical test you would use. The idea is to answer this question with the results of one regression.)

5. (24 points) Heart Attacks

You are working with a team of medical researchers who have a large database of health characteristics from a random sample of adults in the U.S. during the 1960s. They want to estimate how the incidence of a heart attack is related to personal characteristics including age, smoking, and various diagnoses. Since Y is a dummy variable equal to 1 if the individual had a heart attack (within the ten years after the data was collected), this is a linear probability model. The goal is to use the results to predict the heart attack risk of other individuals who are not in this sample.

$\widehat{\text{HeartAttack}} = \hat{\beta}_0 + \hat{\beta}_1 \text{Age} + \hat{\beta}_2 \text{HighCholesterol} + \hat{\beta}_3 \text{HighBloodPressure} + \hat{\beta}_4 \text{Diabetes} + \hat{\beta}_5 \text{Smoker}$

where HighCholesterol, HighBloodPressure and Diabetes are dummy variables for being diagnosed with that condition, and Smoker is a dummy variable indicating current smokers.

   (a) (3 pts) Most people with diabetes also have a diagnosis of high cholesterol. What are the consequences of this fact for your estimation of $\hat{\beta}_4$? Be specific.
   (b) (3 pts) Should you be concerned if your estimate of $\hat{\beta}_0$ is negative? Why or why not?
   (c) (3 pts) Suppose you obtain an estimate of $\hat{\beta}_5 = 0.025$. How do you interpret this effect in words, precisely?
   (d) (4 pts) Explain how you could use the results of this regression to predict the probability that Joe Schmoe will have a heart attack: He is a randomly selected Clevelander of age 57, who smokes and has high blood pressure, but does not have high cholesterol or diabetes.
   (e) (3 pts) A medical expert comments, “It’s not reasonable to assume that every year of age increase has the same impact on the risk of a heart attack.” How could you improve the regression in response to this critique?
   (f) (8 pts) Medical experts tell you these results are not accurate for the female population, since their levels of heart attack risk differ from males. Part of your team thinks that this can be corrected by adding a Female dummy variable to the regression. Others say this may not be enough, because the way in which females’ probability of a heart attack is determined may be entirely different from the way in which males’ heart attack risk is determined. You are asked to solve this dispute.
   First, write down the unrestricted model that allows for male and female heart attack probabilities to have different relationships with all independent variables. Then, write down the null hypothesis you need to test (to resolve the debate among your team), and explain in detail all the steps you would take to test it.

6. (14 + 5 extra credit points) Supersize Me

In the paper “The Effect of Fast-Food Restaurants on Obesity and Weight Gain” the authors examine how locating a new fast-food restaurant close to a high school affects the body weight of 9th graders at the school. Let’s suppose that you gather some of the same type of data as the authors of this paper. First, you determine which Ohio high schools did not have a fast-food restaurant within 0.1 miles of the school in 2004. This will be your sample of high schools. Next, among these schools in your sample, you identify those schools where a new fast-food restaurant opened up within 0.1 miles of the school in 2005. Finally, you gather information on the percent of obese 9th graders (as measured in the late spring of 9th grade year) at each school in your sample for the years 2004 and 2006.

The average obesity rates for the schools that have a new nearby fast-food restaurant in 2006 are: 0.29 (in 2004) and 0.40 (in 2006).
The average obesity rates for the schools that do not have a new nearby fast-food restaurant in 2006 are: 0.30 (in 2004) and 0.33 (in 2006).

   i. (6 pts) You decide to estimate a difference-in-differences model to measure the effect of a new fast-food restaurant opening within 0.1 miles of the school on 9th grader obesity rates. Without even running a regression you can calculate this difference-in-differences estimate. Calculate this estimate and show your work. Explain by defining the treatment and control groups, as well as the pre- and post-treatment periods.
   ii. (3 pts) There are two main reasons why you would want to run an actual regression (instead of the simple calculation above) to obtain the difference-in-differences coefficient estimate. What are they?
   iii. (5 pts) Write down the regression model that allows you to directly estimate the difference-in-differences in one step. Be sure to define all of the variables. Indicate the coefficient that provides the difference-in-differences estimate.
   iv. (EXTRA CREDIT 5 pts) The difference-in-differences estimation approach is a powerful technique to argue that one independent variable has a causal effect on the dependent variable. However, there is still the potential concern that our estimate is biased. What would lead to a biased estimate when using difference-in-differences? Give an example in the context of this problem.
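
For readers checking their work on part i of the Supersize Me problem, the following is a minimal sketch of the two-group, two-period arithmetic. It uses only the four average obesity rates stated in that problem. The function name `diff_in_diff` and its argument names are illustrative assumptions, not part of the exam.

```python
def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Return (treated change, control change, DiD estimate) from four group means.

    Hypothetical helper for illustration: each argument is a group-average
    outcome in the pre- or post-treatment period.
    """
    treat_change = treat_post - treat_pre        # change over time, treated group
    control_change = control_post - control_pre  # change over time, control group
    return treat_change, control_change, treat_change - control_change


# Average obesity rates as stated in the problem:
# schools with a new nearby restaurant: 0.29 (2004), 0.40 (2006)
# schools without one:                  0.30 (2004), 0.33 (2006)
treat_change, control_change, did = diff_in_diff(0.29, 0.40, 0.30, 0.33)

print(f"Change in schools with a new restaurant:    {treat_change:.2f}")
print(f"Change in schools without a new restaurant: {control_change:.2f}")
print(f"Difference-in-differences estimate:         {did:.2f}")
```

The sketch only reproduces the subtraction of group means. It gives no standard error and cannot add control variables, which is part of what a regression version of the same comparison would provide.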