2014 - Jesse D. Lecy

PMAP 8521: Evaluation Research
Prof. Jesse Lecy
MIDTERM EXAM
Spring 2014
NAME____________________________________________________________
Instructions: The exam should take about two hours, although you have the full class
period available if you need it. The questions are not in any order of difficulty but the
bonus questions are more challenging than the exam questions. You can use a calculator
and one page of notes. Remember to staple your page of notes to the exam when you
turn it in and please turn off your cell phones. Good luck!
Please give non-mathematical definitions to the following statistical concepts:
(1) Regression Coefficient:
(2) Standard Error:
(3) Standard Deviation:
(4) Multicollinearity:
(5) Measurement Error:
(6) Holding cov(x1, y), var(x1), and var(y) constant, which of the two cases below will have the
smaller standard error? Which will have the larger slope b1? ($Y = b_0 + b_1 X_1 + b_2 X_2 + e$)
Smaller standard error:
Larger slope b1:
(7) Name three things that will reduce the standard error of a regression slope.
(8) Name the two sins of the Seven Sins where the primary unwanted effect is to increase the
standard error of a regression slope.
(9) Which variable, X1, X2 or X3 has the smallest variance and how do we know?
(b1 corresponds with X1, etc.)
[Figure: b1, b2, and b3, with b = 0 marked]
(10) Consider the following cases:
[Figure: three panels (Case 1, Case 2, Case 3), each relating Y to X]
a. Holding cov(x,y) constant across all cases, which will have the smallest standard error?
b. Holding cov(x,y) constant across all cases, which will have the largest slope?
(11) Consider the model for the following three cases:
$Y = b_0 + b_1 X_1 + b_2 X_2 + e$
[Figure: three panels (Case 1, Case 2, Case 3), each showing Y, X1, and X2]
Holding cov(X1,Y) and cov(X1,X2) constant, which case will have the smallest $SE_{b_1}$?
(12) Consider three cases below. Each has an X2 with a different correlation structure
represented by the Venn diagrams. We want to compare the results from the naïve models
and the full models:
$Y = b_0 + b_1 X_1 + e$
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$
Match the cases on the left with the scenarios on the right by drawing a line between them.
[Figure: on the left, Venn diagrams of Y, X1, and X2 for Case 1, Case 2, and Case 3; on the right, scenarios A, B, and C, each showing b1 and β1]
(13) Consider the following regression:
$Y = \beta_0 + \beta_1 X_1 + \varepsilon$
Consider the case where B1 = 6 and SE(B1) = 2.49.
(a) Using t=1.96, calculate the 95% confidence interval for B1. Is the slope statistically
significant at this level? How do we know?
(b) Using t=2.58, calculate the 99% confidence interval for B1. Is the slope statistically
significant at this level? How do we know?
(c) Yes or No: Does this program have an impact? Assume that a positive slope signifies a
positive impact. Justify your answer.
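For checking work after the exam, the interval arithmetic in question (13) can be sketched in Python. This is a minimal sketch rather than an answer key: it assumes the interval is the estimate plus or minus t times the standard error, and the function names `confidence_interval` and `excludes_zero` are hypothetical helpers introduced only for illustration.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


def excludes_zero(interval):
    """True when zero lies outside the interval."""
    lower, upper = interval
    return lower > 0 or upper < 0


# Values taken from question (13): B1 = 6, SE = 2.49.
ci_95 = confidence_interval(6, 2.49, 1.96)
ci_99 = confidence_interval(6, 2.49, 2.58)

print("95% interval:", ci_95, "excludes zero:", excludes_zero(ci_95))
print("99% interval:", ci_99, "excludes zero:", excludes_zero(ci_99))
```

Whether the interval contains zero is one common way to read significance at the matching level; the written justification in parts (a) through (c) is still yours to give.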
(14) Calculate the slope and the intercept for a simple bivariate regression model
($Y = b_0 + b_1 X + e$) from the following information:
x̄:          -3
ȳ:          2
var(x):     7
var(y):     21
cov(x,y):   14
b1 =
b0 =
(15) Now, using the slope and intercept that you calculated above, calculate the predicted Y and
the residual for the following three cases (you do not need to calculate the sum of squared
errors, the SSE, for the model):
X      Y      Ŷ      e
-1     8
-2     3
-3     1
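For checking work after the exam, the calculations in questions (14) and (15) can be sketched in Python. This is a minimal sketch under stated assumptions: it takes the entries labeled x and y in question (14) to be the sample means, uses the usual bivariate formulas (slope = cov(x,y) / var(x), intercept = mean of y minus slope times mean of x), and defines the residual as observed Y minus predicted Y. The names `predict` and `rows` are hypothetical.

```python
# Summary statistics from question (14); mean_x and mean_y are assumed
# to be the sample means of x and y.
mean_x, mean_y = -3, 2
var_x, cov_xy = 7, 14

b1 = cov_xy / var_x          # slope
b0 = mean_y - b1 * mean_x    # intercept


def predict(x):
    """Predicted Y from the fitted line."""
    return b0 + b1 * x


# (X, Y) pairs from question (15).
cases = [(-1, 8), (-2, 3), (-3, 1)]

# Each row: X, observed Y, predicted Y, residual (observed - predicted).
rows = [(x, y, predict(x), y - predict(x)) for x, y in cases]

print("b1 =", b1, "b0 =", b0)
for x, y, y_hat, e in rows:
    print(x, y, y_hat, e)
```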
(16) Consider the policy problem of mandating small classrooms to improve test scores. Let us
add another variable to our model in order to improve our estimations – this one related to
extra state funding given to urban schools as an attempt to improve performance. The
correlation structure (positive, negative, or null) is as follows:
[Table: correlation matrix of Test Score, Class Size, Socio-Economic Status, Teacher Quality, and State Funding. Cell entries in reading order: ─, +, +, +, ─, (blank), (blank), +, ─, (blank)]
Question: Because state funding given through this special program is correlated with our
policy variable of class size, our estimate will be biased if we run the following model:
𝑇𝑆 = 𝑏0 + 𝑏1 𝐶𝑆 + 𝑏2 𝑆𝐸𝑆 + 𝑏3 𝑇𝑄 + 𝑒
Will our policy slope b1 over- or under-estimate the true impact of classroom size on test
scores? To get full credit show your reasoning or your math.
(17) Study on link between acetaminophen used during pregnancy
and ADHD
An interesting study in this week’s JAMA Pediatrics is sure to spark lots of conversation. In this
study children of women who used the pain reliever acetaminophen during pregnancy appear
to be at higher risk for attention-deficit/hyperactivity disorder (ADHD)-like behavioral problems.
As the authors point out in their summary, acetaminophen is the most commonly used
medication for pain and fever during pregnancy. Some recent studies have suggested that
acetaminophen has effects on sex hormones as well as other hormones, which can in turn
affect neurodevelopment and cause behavioral dysfunction.
The authors studied 64,322 children and mothers in the Danish National Birth Cohort (1996-2002).
Parents reported behavioral problems on a questionnaire, and HKD diagnoses and
ADHD medication prescriptions were collected from Danish registries.
What was found was that more than half of the mothers reported using acetaminophen while
pregnant. The use of acetaminophen during pregnancy appeared to be associated with a
higher risk of HKD diagnosis, of using ADHD medications or of having ADHD-like behaviors at
age 7 years. The risk increased when mothers used acetaminophen in more than one trimester
during pregnancy.
Question: Offer an alternative explanation by describing a different scenario that could lead
to the data observed by the study – in other words, what is a possible omitted variable that
could better explain the results? Justify your choice by describing the scenario where an
omitted variable will prove problematic, and explaining how your variable fits this case.
(18) What are the three criteria that must be true for a variable to qualify as a valid instrumental
variable?
(19) In the homework on fixed-effect models we considered a model that examines the
relationship between state spending on infrastructure and economic growth. The model
uses panel data and a state fixed-effect.
What is one state-specific variable that will be automatically controlled for using the state
fixed-effect, and how do we know (how does it fit the criteria)?
What is one state-specific variable that we would need to include in the model, even with
fixed-effects?
BONUS (3 pts): Go back to the model that attempts to discern the effects of class size on
test scores:
Now think about another model:
$\text{SES} = \pi_0 + \pi_1\,\text{ClassSize} + e$
What is the exact slope for the regression of SES on Class Size, π1?
BONUS (3 pts): This question refers to the study on the health benefits of coffee. The
policy variable in the study is how much coffee an individual consumes and the study
concludes that consuming more coffee improves health (in this case fewer strokes). The
article explains weaknesses in the experimental design as follows:
“However, one expert doesn't think this study convincingly shows a strong link. The
problem with this type of study is that there are too many factors unaccounted for and
association does not prove causality, said Dr. Larry B. Goldstein, director of the Duke
Stroke Center at Duke University Medical Center. ‘Subjects were asked about their past
coffee consumption in a questionnaire and then followed over time. There is no way to know
if they changed their behavior,’ Goldstein said.”
Explain why a model using individual-level fixed-effects would not be appropriate to use in
this case.
BONUS (4 pts): Think back to the model that we have studied looking at the relationship
between classroom size and test scores:
$\text{TestScore} = \beta_0 + \beta_1\,\text{ClassSize} + \beta_2\,\text{SES} + \beta_3\,\text{TeacherQuality} + \varepsilon$    (1)
Now think about a different way to run the regression model. What if we constructed it in
the following way:
$\text{ClassSize} = \pi_0 + \pi_1\,\text{SES} + e_1$    (2)
$\text{TestScore} = b_0 + b_1 e_1 + b_2\,\text{SES} + b_3\,\text{TeacherQuality} + \gamma$    (3)
In this case the e1 in model (3) is the residual term from model (2). How will this change
the slopes in the model?
Draw the Venn diagram for model (3).
Answer the following and be sure to explain why:
Does $b_1 = \beta_1$?
Does $b_2 = \beta_2$?
Does $b_3 = \beta_3$?
Does $\varepsilon = \gamma$? Note that epsilon and gamma are both just symbols for the residual.
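For study after the exam, one way to explore this bonus question is to simulate data and fit models (1), (2), and (3) directly. The sketch below is not an answer key and not the course's method: the data are entirely simulated with made-up coefficients, the helper names `ols` and `residuals` are hypothetical, and the fit uses a small pure-Python solver for the normal equations so that no external library is assumed.

```python
import random


def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def residuals(columns, y, coefs):
    """Observed y minus fitted values for the given coefficients."""
    return [y[i] - coefs[0] - sum(c * col[i] for c, col in zip(coefs[1:], columns))
            for i in range(len(y))]


# Simulated data: every coefficient below is an arbitrary assumption.
random.seed(0)
n = 200
ses = [random.gauss(0, 1) for _ in range(n)]
tq = [random.gauss(0, 1) for _ in range(n)]
cs = [25 - 2 * s + random.gauss(0, 1) for s in ses]
ts = [70 - 0.5 * c + 3 * s + 2 * q + random.gauss(0, 1)
      for c, s, q in zip(cs, ses, tq)]

beta = ols([cs, ses, tq], ts)        # model (1)
pi = ols([ses], cs)                  # model (2)
e1 = residuals([ses], cs, pi)        # residual from model (2)
b = ols([e1, ses, tq], ts)           # model (3)

res1 = residuals([cs, ses, tq], ts, beta)
res3 = residuals([e1, ses, tq], ts, b)

print("model (1) slopes:", beta[1:])
print("model (3) slopes:", b[1:])
print("largest residual difference:", max(abs(u - v) for u, v in zip(res1, res3)))
```

Comparing the two printed slope lists and the residual difference gives a numerical check on each of the four questions above; the explanation of why each result holds still has to come from the Venn diagram reasoning.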
Scratch paper