PMAP 8521: Evaluation Research
Prof. Jesse Lecy
MIDTERM EXAM
Spring 2014

NAME____________________________________________________________

Instructions: The exam should take about two hours, although you have the full class period available if you need it. The questions are not in any order of difficulty, but the bonus questions are more challenging than the exam questions. You can use a calculator and one page of notes. Remember to staple your page of notes to the exam when you turn it in, and please turn off your cell phones. Good luck!

Please give non-mathematical definitions of the following statistical concepts:

(1) Regression Coefficient:

(2) Standard Error:

(3) Standard Deviation:

(4) Multicollinearity:

(5) Measurement Error:

(6) Holding cov(x1,y), var(x1) and var(y) constant, which of the two cases below will have the smaller standard error? Which will have the larger slope b1?

    Y = b0 + b1 X1 + b2 X2 + e

Smaller standard error:

Larger slope b1:

(7) Name three things that will reduce the standard error of a regression slope.

(8) Name the two sins of the Seven Sins where the primary unwanted effect is to increase the standard error of a regression slope.

(9) Which variable, X1, X2 or X3, has the smallest variance, and how do we know? (b1 corresponds with X1, etc.)

[Figure labels: b1, b2, b3, b=0]

(10) Consider the following cases:

[Figure: three panels, Case 1, Case 2 and Case 3, each showing Y and X]

a. Holding cov(x,y) constant across all cases, which will have the smallest standard error?

b. Holding cov(x,y) constant across all cases, which will have the largest slope?

(11) Consider the model for the following three cases:

    Y = b0 + b1 X1 + b2 X2 + e

[Figure: three panels, Case 1, Case 2 and Case 3, each showing Y, X1 and X2]

Holding cov(X1,Y) and cov(X1,X2) constant, which case will have the smallest SEb1?

(12) Consider the three cases below. Each has an X2 with a different correlation structure represented by the Venn diagrams.
We want to compare the results from the naïve models and the full models:

    Naïve model:  Y = b0 + b1 X1 + e
    Full model:   Y = β0 + β1 X1 + β2 X2 + e

Match the cases on the left with the scenarios on the right by drawing a line between them.

[Figure: Venn diagrams of Y, X1 and X2 for Case 1, Case 2 and Case 3 (left); scenarios A, B and C, each showing b1 and β1 (right)]

(13) Consider the following regression:

    Y = β0 + β1 X1

Consider the case where B1 = 6 and SEB1 = 2.49.

(a) Using t=1.96, calculate the 95% confidence interval for B1. Is the slope statistically significant at this level? How do we know?

(b) Using t=2.58, calculate the 99% confidence interval for B1. Is the slope statistically significant at this level? How do we know?

(c) Yes or No: Does this program have an impact? Assume that a positive slope signifies a positive impact. Justify your answer.

(14) Calculate the slope and the intercept for a simple bivariate regression model ( Y = b0 + b1 X + e ) from the following information:

    mean of x:   -3
    mean of y:    2
    var(x):       7
    var(y):      21
    cov(x,y):    14

b1 =

b0 =

(15) Now, using the slope and intercept that you calculated above, calculate the predicted Y and the residual for the following three cases (you do not need to calculate the sum of squared errors, the SSE, for the model):

    X     Y     Ŷ     e
    -1    8
    -2    3
    -3    1

(16) Consider the policy problem of mandating small classrooms to improve test scores. Let us add another variable to our model in order to improve our estimates: extra state funding given to urban schools in an attempt to improve performance.
The correlation structure (positive, negative, or null) is as follows:

[Correlation matrix with rows and columns Test Score, Class Size, Socio-Economic Status, Teacher Quality and State Funding; cell entries as extracted: ─ + + + ─ + ─]

Question: Since state funding given through this special program is correlated with our policy variable of class size, our estimate will be biased if we run the following model:

    TS = b0 + b1 CS + b2 SES + b3 TQ + e

Will our policy slope b1 over- or under-estimate the true impact of classroom size on test scores? To get full credit, show your reasoning or your math.

(17) Study on link between acetaminophen used during pregnancy and ADHD

An interesting study in this week’s JAMA Pediatrics is sure to spark lots of conversation. In this study children of women who used the pain reliever acetaminophen during pregnancy appear to be at higher risk for attention-deficit/hyperactivity disorder (ADHD)-like behavioral problems.

As the authors point out in their summary, acetaminophen is the most commonly used medication for pain and fever during pregnancy. Some recent studies have suggested that acetaminophen has effects on sex hormones as well as other hormones, which can in turn affect neurodevelopment and cause behavioral dysfunction.

The authors studied 64,322 children and mothers in the Danish National Birth Cohort (1996-2002). Parents reported behavioral problems on a questionnaire, and HKD diagnoses and ADHD medication prescriptions were collected from Danish registries.

What was found was that more than half of the mothers reported using acetaminophen while pregnant. The use of acetaminophen during pregnancy appeared to be associated with a higher risk of HKD diagnosis, of using ADHD medications or of having ADHD-like behaviors at age 7 years. The risk increased when mothers used acetaminophen in more than one trimester during pregnancy.
Question: Offer an alternative explanation by describing a different scenario that could lead to the data observed by the study. In other words, what is a possible omitted variable that could better explain the results? Justify your choice by describing the scenario where an omitted variable will prove problematic, and explaining how your variable fits this case.

(18) What are the three criteria that must be true for a variable to qualify as a valid instrumental variable?

(19) In the homework on fixed-effect models we considered a model that examines the relationship between state spending on infrastructure and economic growth. The model uses panel data and a state fixed-effect. What is one state-specific variable that will be automatically controlled for using the state fixed-effect, and how do we know (how does it fit the criteria)? What is one state-specific variable that we would need to include in the model, even with fixed-effects?

BONUS (3 pts): Go back to the model that attempts to discern the effects of class size on test scores. Now think about another model:

    SES = π0 + π1 ClassSize + e

What is the exact slope for the regression of SES on Class Size, π1?

BONUS (3 pts): This question refers to the study on the health benefits of coffee. The policy variable in the study is how much coffee an individual consumes, and the study concludes that consuming more coffee improves health (in this case, fewer strokes). The article explains weaknesses in the experimental design as follows:

“However, one expert doesn't think this study convincingly shows a strong link. The problem with this type of study is that there are too many factors unaccounted for and association does not prove causality, said Dr. Larry B. Goldstein, director of the Duke Stroke Center at Duke University Medical Center. ‘Subjects were asked about their past coffee consumption in a questionnaire and then followed over time.
There is no way to know if they changed their behavior,’ Goldstein said.”

Explain why a model using individual-level fixed-effects would not be appropriate to use in this case.

BONUS (4 pts): Think back to the model that we have studied looking at the relationship between classroom size and test scores:

    TestScore = β0 + β1 ClassSize + β2 SES + β3 TeacherQuality + ε      (1)

Now think about a different way to run the regression model. What if we constructed it in the following way:

    ClassSize = π0 + π1 SES + e1                                        (2)

    TestScore = b0 + b1 e1 + b2 SES + b3 TeacherQuality + γ             (3)

In this case the e1 in model (3) is the residual term from model (2). How will this change the slopes in the model? Draw the Venn diagram for model (3).

Answer the following and be sure to explain why:

Does b1 = β1?

Does b2 = β2?

Does b3 = β3?

Does ε = γ?

Note that epsilon and gamma are both just symbols for the residual.

Scratch paper
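The calculations that questions (13) to (15) rely on can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the numbers are made up for the example rather than taken from the exam. It covers a confidence interval as slope ± t × SE, the bivariate slope as cov(x,y) / var(x), the intercept as mean(y) − b1 × mean(x), and a residual as observed Y minus predicted Y.

```python
def confidence_interval(slope, se, t_crit):
    """Return (lower, upper) bounds: slope +/- t_crit * se."""
    margin = t_crit * se
    return slope - margin, slope + margin


def is_significant(interval):
    """A slope is significant at the chosen level if the interval excludes zero."""
    low, high = interval
    return not (low <= 0 <= high)


def bivariate_ols(mean_x, mean_y, var_x, cov_xy):
    """Slope and intercept of Y = b0 + b1*X + e from summary statistics."""
    b1 = cov_xy / var_x          # slope: cov(x,y) / var(x)
    b0 = mean_y - b1 * mean_x    # the fitted line passes through the means
    return b0, b1


def predict_and_residual(b0, b1, x, y):
    """Predicted Y and residual (observed minus predicted) for one case."""
    y_hat = b0 + b1 * x
    return y_hat, y - y_hat


# Hypothetical inputs, chosen only for illustration (not the exam's values).
ci = confidence_interval(slope=4.0, se=1.5, t_crit=1.96)
significant = is_significant(ci)
b0, b1 = bivariate_ols(mean_x=2.0, mean_y=10.0, var_x=4.0, cov_xy=6.0)
y_hat, resid = predict_and_residual(b0, b1, x=3.0, y=12.0)

print("95% CI:", ci, "significant:", significant)
print("b0 =", b0, "b1 =", b1)
print("predicted Y =", y_hat, "residual =", resid)
```

With these made-up inputs the interval lies entirely above zero, so the hypothetical slope would count as statistically significant at that level. An interval that straddles zero would not.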