non-participants - World Bank Group

Africa Impact Evaluation Program on AIDS
(AIM-AIDS)
Cape Town, South Africa
March 8 – 13, 2009
Quasi-Experimental
Methods
Jean-Louis Arcand
The Graduate Institute | Geneva
Objective
• Find a plausible counterfactual
Reality check
• Every method is associated with an assumption
• The stronger the assumption the more we need
to worry about the causal effect
» Question your assumptions
Program to evaluate
• Hopetown HIV/AIDS Program (2008-2012)
• Objectives
– Reduce HIV transmission
• Intervention: Peer education
• Target group: Youth 15-24
• Indicator: Pregnancy rate (proxy for unprotected sex)
I. Before-after identification strategy
(aka reflexive comparison)
Counterfactual:
Rate of pregnancy observed before
program started
EFFECT = After minus Before
Year         Number of areas   Teen pregnancy rate (per 1000)
2008         70                62.90
2012         70                66.37
Difference                     +3.47
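A minimal sketch of the before-after calculation, using the two rates reported for the program areas. The variable names are illustrative and not part of the original slides.

```python
# Before-after (reflexive) estimate: the pre-program rate stands in
# for the counterfactual, so the effect is simply after minus before.
rate_before_2008 = 62.90  # teen pregnancy per 1000 in program areas, 2008
rate_after_2012 = 66.37   # teen pregnancy per 1000 in program areas, 2012

before_after_effect = round(rate_after_2012 - rate_before_2008, 2)
print(f"Before-after effect: {before_after_effect:+.2f} per 1000")
```

The number is only a causal effect if nothing else moved the pregnancy rate between 2008 and 2012.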
Counterfactual assumption: no change over time
[Figure: teen pregnancy (per 1000) in 2008 and 2012, with the intervention in between. The rate rises from 62.9 to 66.37. Effect = +3.47]
Question: what else might have happened in 2008-2012 to affect teen pregnancy?
Examine assumption with prior data
Number of areas: 70
Year   Teen pregnancy (per 1000)
2004   54.96
2008   62.90
2012   66.37
Assumption of no change over time looks a bit shaky
II. Non-participant
identification strategy
Counterfactual:
Rate of pregnancy among non-participants
Teen pregnancy rate (per 1000) in 2012
Participants       66.37
Non-participants   57.50
Difference         +8.87
Counterfactual assumption:
Without intervention participants have
same pregnancy rate as non-participants
[Figure: teen pregnancy (per 1000), 2008 and 2012. Participants 66.4, non-participants 57.5. Effect = +8.87]
Question: how might participants differ from nonparticipants?
Test assumption
with pre-program data
[Figure: teen pregnancy (per 1000), 2008 and 2012. Participants: 62.9 in 2008, 66.4 in 2012. Non-participants: 46.37 in 2008, 57.5 in 2012.]
REJECT counterfactual hypothesis of same pregnancy rates
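A small sketch of this check, using the rates from the slides. The naive estimate compares the two groups in 2012, and the same comparison in 2008 shows whether the groups were alike before the program. The dictionary layout and names are assumptions made for illustration.

```python
# Teen pregnancy rates per 1000, taken from the slides.
participants = {2008: 62.90, 2012: 66.37}
non_participants = {2008: 46.37, 2012: 57.50}

# Naive estimate: participants minus non-participants after the program.
naive_effect = round(participants[2012] - non_participants[2012], 2)

# Assumption check: the same gap measured before the program started.
# If the two groups were comparable, this gap would be close to zero.
baseline_gap = round(participants[2008] - non_participants[2008], 2)

print(f"2012 gap (naive effect): {naive_effect:+.2f}")
print(f"2008 gap (pre-program):  {baseline_gap:+.2f}")
```

A pre-program gap larger than the estimated effect is what leads to rejecting the assumption.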
III. Difference-in-Difference
identification strategy
Counterfactual:
1. Nonparticipant rate of pregnancy, purging pre-program differences in participants/nonparticipants
2. “Before” rate of pregnancy, purging before-after change for nonparticipants
1 and 2 are equivalent
Average rate of teen pregnancy in   2008    2012    Difference (2008-2012)
Participants (P)                    62.90   66.37     3.47
Non-participants (NP)               46.37   57.50    11.13
Difference (P-NP)                   16.53    8.87    -7.66
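The equivalence of the two routes can be checked directly with the four group means. This is a sketch with illustrative names. It takes differences over time first, then across groups, and then the other way around.

```python
# Teen pregnancy rates per 1000 (P = participants, NP = non-participants).
rates = {
    "P": {2008: 62.90, 2012: 66.37},
    "NP": {2008: 46.37, 2012: 57.50},
}

# Route 1: change over time within each group, then difference across groups.
change_p = rates["P"][2012] - rates["P"][2008]
change_np = rates["NP"][2012] - rates["NP"][2008]
did_by_time = round(change_p - change_np, 2)

# Route 2: gap between groups in each year, then difference over time.
gap_2008 = rates["P"][2008] - rates["NP"][2008]
gap_2012 = rates["P"][2012] - rates["NP"][2012]
did_by_group = round(gap_2012 - gap_2008, 2)

print(f"DiD via changes over time: {did_by_time:+.2f}")
print(f"DiD via group gaps:        {did_by_group:+.2f}")
```

Both routes rearrange the same four numbers, so they always agree.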
Effect = 3.47 – 11.13 = -7.66
[Figure: teen pregnancy, 2008 and 2012. Participants: 66.37 – 62.90 = 3.47. Non-participants: 57.50 – 46.37 = 11.13]
Effect = 8.87 – 16.53 = -7.66
[Figure: teen pregnancy (per 1000), 2008 and 2012. Before: 62.90 – 46.37 = 16.53. After: 66.37 – 57.50 = 8.87]
Counterfactual assumption:
Without intervention, participants’ and nonparticipants’ pregnancy rates follow the same trends
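[Figure: teen pregnancy (per 1000), 2008 and 2012. Participants: 62.9 in 2008, 66.4 in 2012. Non-participants: 46.37 in 2008, 57.5 in 2012. Carrying the 2008 gap of 16.5 forward to 2012 gives a counterfactual rate of 74.0 for participants.]
[Figure: same axes. The distance from the counterfactual (74.0) to the observed participant rate (66.4) is the effect, -7.6]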
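Written out, the same-trends assumption puts the participants' counterfactual 2012 rate at their 2008 rate plus the nonparticipants' change. The notation below ($Y^{P}_{t}$ and $Y^{NP}_{t}$ for the group rates in year $t$) is introduced here for illustration only.

```latex
\begin{aligned}
\hat{Y}^{P,\mathrm{cf}}_{2012} &= Y^{P}_{2008} + \left(Y^{NP}_{2012} - Y^{NP}_{2008}\right) = 62.90 + 11.13 = 74.03 \\
\text{Effect} &= Y^{P}_{2012} - \hat{Y}^{P,\mathrm{cf}}_{2012} = 66.37 - 74.03 = -7.66
\end{aligned}
```

The chart labels 74.0 and -7.6 are the same quantities computed from rates rounded to one decimal (57.5 + 16.5 and 66.4 – 74.0).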
Questioning the assumption
• Why might participants’ trends differ from those of nonparticipants?
Examine assumption
with pre-program data
Average rate of teen pregnancy in   2004    2008    Difference (2004-2008)
Participants (P)                    54.96   62.90    7.94
Non-participants (NP)               39.96   46.37    6.41
Difference (P-NP)                   15.00   16.53   +1.53 ?
Counterfactual hypothesis of same trends doesn’t look so believable
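This check amounts to a placebo difference-in-difference run on the two pre-program years, when no intervention took place. A sketch with illustrative names:

```python
# Pre-program teen pregnancy rates per 1000, from the slides.
participants = {2004: 54.96, 2008: 62.90}
non_participants = {2004: 39.96, 2008: 46.37}

# Trend of each group before the program existed.
trend_p = round(participants[2008] - participants[2004], 2)
trend_np = round(non_participants[2008] - non_participants[2004], 2)

# With identical pre-program trends this placebo estimate would be zero.
placebo_did = round(trend_p - trend_np, 2)

print(f"Participants' 2004-2008 change:     {trend_p:+.2f}")
print(f"Non-participants' 2004-2008 change: {trend_np:+.2f}")
print(f"Placebo DiD:                        {placebo_did:+.2f}")
```

A non-zero placebo estimate does not prove that the trends would have diverged after 2008, but it is a reason to doubt the assumption.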
IV. Matching with Difference-in-Difference identification strategy
Counterfactual:
Comparison group is constructed by pairing each program participant with a “similar” nonparticipant drawn from a larger dataset, creating a control group from non-participants who are similar in observable ways
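One way to sketch the idea is nearest-neighbour matching on observables, followed by a difference-in-difference on the matched pairs. Everything below is hypothetical: the data are made up, `x` stands for a tuple of already standardized covariates, and matching with replacement on Euclidean distance is just one of several possible choices.

```python
import math

# Hypothetical units: standardized covariates and outcomes before/after.
participants = [
    {"x": (-0.5, 0.2), "before": 60.0, "after": 64.0},
    {"x": (0.8, -0.1), "before": 66.0, "after": 69.0},
]
non_participant_pool = [
    {"x": (-0.4, 0.3), "before": 58.0, "after": 67.0},
    {"x": (1.5, 1.2), "before": 40.0, "after": 45.0},
    {"x": (0.7, 0.0), "before": 65.0, "after": 75.0},
    {"x": (-1.6, -0.9), "before": 80.0, "after": 82.0},
]


def nearest_match(unit, candidates):
    """Return the candidate closest to `unit` in covariate space."""
    return min(candidates, key=lambda c: math.dist(unit["x"], c["x"]))


def matched_did(treated, candidates):
    """Average over participants of (own change) minus (match's change)."""
    pair_effects = []
    for unit in treated:
        match = nearest_match(unit, candidates)
        own_change = unit["after"] - unit["before"]
        match_change = match["after"] - match["before"]
        pair_effects.append(own_change - match_change)
    return sum(pair_effects) / len(pair_effects)


effect = matched_did(participants, non_participant_pool)
print(f"Matched DiD estimate: {effect:+.2f}")
```

The match only uses what is in `x`, so anything left out of the covariates is ignored by construction.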
Counterfactual assumption:
Unobserved characteristics do not affect
outcomes of interest
Unobserved = things we cannot measure
(e.g. ability) or things we left out of the
dataset
Question: how might participants differ from
matched nonparticipants?
[Figure: teen pregnancy rate (per 1000), 2008 and 2012. Matched nonparticipant: 73.36. Participant: 66.37. Effect = -7.01]
Can only test assumption
with experimental data
Studies that compare both methods (because they
have experimental data) find that:
unobservables often matter!
direction of bias is unpredictable!
Apply with care – think very hard about
unobservables
V. Regression discontinuity
identification strategy
Applicability:
When strict quantitative criteria determine
eligibility
Counterfactual:
Nonparticipants just below the eligibility cutoff are
the comparison for participants just above the
eligibility cutoff
Counterfactual assumption:
Nonparticipants just below the eligibility cutoff are the
same (in observable and unobservable ways) as
participants just above the eligibility cutoff
Question: Is the distribution around the cutoff smooth?
Then, assumption might be reasonable
Question: Are unobservables likely to be important (e.g. correlated
with cutoff criteria)?
Then, assumption might not be reasonable
However, can only estimate impact around the cutoff, not for the
whole program
Example: Effect of school inputs on test scores
• Target transfer to poorest schools
• Construct poverty index from 1 to 100
• Schools with a score <=50 are in
• Schools with a score >50 are out
• Inputs transfer to poor schools
• Measure outcomes (i.e. test scores) before and after transfer
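A rough sketch of how such a design could be estimated: fit a straight line on each side of the cutoff, using only schools within a bandwidth, and compare the two fitted values at the cutoff. The data are synthetic and noise-free, and `BANDWIDTH`, `TRUE_EFFECT` and the linear outcome model are assumptions made purely for illustration. Only the cutoff of 50 comes from the example.

```python
CUTOFF = 50         # schools with a poverty index <= 50 receive the transfer
BANDWIDTH = 10      # assumed window around the cutoff
TRUE_EFFECT = 5.0   # assumed effect used to build the synthetic data


def fit_line(points):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic schools: test score rises with the index, plus a jump if treated.
schools = [
    (index, 60.0 + 0.2 * index + (TRUE_EFFECT if index <= CUTOFF else 0.0))
    for index in range(1, 101)
]

# Keep only schools close to the cutoff, split by eligibility.
eligible = [(x, y) for x, y in schools if CUTOFF - BANDWIDTH <= x <= CUTOFF]
ineligible = [(x, y) for x, y in schools if CUTOFF < x <= CUTOFF + BANDWIDTH]

a_in, b_in = fit_line(eligible)
a_out, b_out = fit_line(ineligible)

# Treatment effect: gap between the two fitted lines evaluated at the cutoff.
rdd_effect = (a_in + b_in * CUTOFF) - (a_out + b_out * CUTOFF)
print(f"Estimated effect at the cutoff: {rdd_effect:.2f}")
```

The estimate describes schools near a score of 50 only. With real data the bandwidth choice trades bias against noise.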
[Figure: Regression Discontinuity Design - Baseline. Horizontal axis: Score]
[Figure: Regression Discontinuity Design - Baseline, with Poor and Non-Poor schools labeled. Horizontal axis: Score]
[Figure: Regression Discontinuity Design - Post Intervention. Horizontal axis: Score]
[Figure: Regression Discontinuity Design - Post Intervention, with the Treatment Effect labeled. Horizontal axis: Score]
Applying RDD in practice: Lessons from
an HIV-nutrition program
• Lesson 1: criteria not applied well
– Multiple criteria: hh size, income level, months on
ART
– Nutritionist helps her friends fill out the form with
the “right” answers
– Now – unobservables separate treatment from
control…
• Lesson 2: Watch out for criteria that can be
altered (e.g. land holding size)
Summary
• Gold standard is randomization – minimal
assumptions needed, intuitive estimates
• Nonexperimental requires assumptions – can
you defend them?
Different assumptions will give you
different results
• The program: ART treatment for adult patients
• Impact of interest: effect of ART on children of patients
(are there spillover & intergenerational effects of
treatment?)
– Child education (attendance)
– Child nutrition
• Data: 250 patient HHs, 500 random sample HHs
– Before & after treatment
• Can’t randomize ART, so what is the counterfactual?
Possible counterfactual candidates
• Random sample difference in difference
– Are they on the same trajectory?
• Orphans (parents died – what would have happened in
absence of treatment)
– But when did they die, which orphans do you observe, which
do you not observe?
• Parents self report moderate to high risk of HIV
– Self report!
• Propensity score matching
– Unobservables (so why do people get HIV?)
Estimates of treatment effects using alternative comparison groups

                             All kids (8-18 years)     All boys (8-18 years)     All girls (8-18 years)
                             (1)          (2)          (3)          (4)          (5)          (6)
Comparison group:            Orphans in   High/Mod.    Orphans in   High/Mod.    Orphans in   High/Mod.
                             random       HIV risk     random       HIV risk     random       HIV risk
                             sample       households   sample       households   sample       households
ARV hh (<100 days) * Rd. 2   10.675       10.787       15.686       14.561       10.805       10.397
                             (3.262)***   (2.720)***   (4.877)***   (3.832)***   (4.676)**    (3.979)**
ARV hh (>100 days) * Rd. 2   5.808        5.316        10.930       9.302        2.503        1.652
                             (3.133)*     (2.638)**    (4.467)**    (3.513)***   (4.566)      (4.036)
Constant                     14.723       15.836       13.073       8.307        17.526       23.553
                             (5.583)***   (4.753)***   (6.510)**    (5.693)      (10.406)*    (7.712)***
Observations                 334          424          164          210          170          214
R-squared                    0.86         0.85         0.84         0.87         0.90         0.86

• Compare to around 6.4 if we use the simple difference in difference using the random sample
Standard errors clustered at the household level in each round.
Includes child fixed effects, round 2 indicator and month-of-interview indicators.
Estimating ATT using
propensity score matching
• Allows us to define comparison group using more
than one characteristic of children and their
households
• Propensity scores defined at household level, with most significant variables being single-headed household and HIV risk
Probit regression results
• Dependent variable: household has adult ARV recipient

                                                                               Coefficient   z-value
Single-headed household                                                        0.8917932     3.06
Amt of land owned (acres)                                                      -0.0153242    -0.83
Household size                                                                 0.0060359     0.12
Value of livestock owned (shillings)                                           9.36E-07      0.4
Travel time to main road (mins.)                                               0.0034674     1.4
Value of durables owned (shillings)                                            -9.35E-08     -0.01
House with tin roof                                                            0.2535599     0.58
House with non-mud roof                                                        0.2180698     0.7
Household with respondent who reported high/moderate risk of having HIV/AIDS   2.76405       6.88
Constant                                                                       -3.250733     -4.87
Observations                                                                   225
Pseudo R-squared                                                               0.5151
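To see how these coefficients become a propensity score, the sketch below plugs a household's characteristics into the probit index and applies the standard normal CDF. The coefficients are the ones reported above. The dictionary keys and the example household are hypothetical and chosen only for illustration.

```python
import math

# Probit coefficients as reported in the results table.
COEFFICIENTS = {
    "single_headed": 0.8917932,
    "land_acres": -0.0153242,
    "household_size": 0.0060359,
    "livestock_value": 9.36e-07,
    "travel_time_mins": 0.0034674,
    "durables_value": -9.35e-08,
    "tin_roof": 0.2535599,
    "non_mud_roof": 0.2180698,
    "high_mod_hiv_risk": 2.76405,
}
CONSTANT = -3.250733


def standard_normal_cdf(z):
    """Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def propensity_score(household):
    """Predicted probability that the household has an adult ARV recipient."""
    index = CONSTANT + sum(
        coef * household[name] for name, coef in COEFFICIENTS.items()
    )
    return standard_normal_cdf(index)


# Hypothetical household; every value here is made up for illustration.
household = {
    "single_headed": 1, "land_acres": 2.0, "household_size": 5,
    "livestock_value": 0.0, "travel_time_mins": 30.0, "durables_value": 0.0,
    "tin_roof": 1, "non_mud_roof": 1, "high_mod_hiv_risk": 0,
}
score_low_risk = propensity_score(household)
score_high_risk = propensity_score({**household, "high_mod_hiv_risk": 1})

print(f"Propensity score, no reported risk:        {score_low_risk:.3f}")
print(f"Propensity score, high/mod. reported risk: {score_high_risk:.3f}")
```

Households are then matched on this single score rather than on each characteristic separately.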
ATT using propensity score matching
Mean change between rounds 1 and 2: hours of school attendance

                                          Random Sample   ARV households   Difference   T-stat
Nearest neighbor matching (neighbors=2)   -10.97          -3.69            7.28         1.94
Kernel matching (bandwidth=.06)           -7.82           -3.69            4.12         1.65
Nutritional impacts of ARV treatment
Sample: All children 0-5 in round 1

                                              (1)       (2)         (3)         (4)
Dependent variable:                           WHZ       WHZ         WHZ<=-2     WHZ<=-2
ARV household * Round 2                       0.315                 -0.098
                                              (0.202)               (0.043)**
ARV household (<100 days in rd 1) * Round 2             0.570                   -0.071
                                                        (0.277)**               (0.058)
ARV household (>100 days in rd 1) * Round 2             -0.003                  -0.111
                                                        (0.252)                 (0.053)**
Constant                                      -0.498    -0.481      0.076       0.077
                                              (0.386)   (0.386)     (0.082)     (0.082)
Observations                                  772       772         772         772
R-squared                                     0.87      0.87        0.70        0.70

Includes child fixed effects, age controls, round 2 indicator, interviewer fixed effects, and month-of-interview indicators.
Nutrition with alternative comparison groups
Dependent variable: WHZ

                                              (1)          (2)          (3)                (4)
Comparison group:                             RS Orphans   RS Orphans   RS Mod/High Risk   RS Mod/High Risk
ARV household * Round 2                       1.038                     0.521
                                              (0.733)                   (0.327)
ARV household (<100 days in rd 1) * Round 2                1.195                           0.768
                                                           (0.785)                         (0.392)*
ARV household (>100 days in rd 1) * Round 2                0.773                           0.220
                                                           (0.859)                         (0.419)
Constant                                      0.864        0.904        -0.339             -0.314
                                              (1.567)      (1.588)      (0.819)            (0.818)
Observations                                  96           96           250                250
R-squared                                     0.92         0.92         0.88               0.88

Includes child fixed effects, age controls, round 2 indicator, interviewer fixed effects, and month-of-interview indicators.
Summary: choosing among nonexperimental methods
• At the end of the day, they can give us quite
different estimates (or not, in some rare cases)
• Which assumption can we live with?
Thank You