Regression-Adjusted Impacts

Agenda:

Block Watch: Random Assignment, Outcomes, and Indicators

Issues in Impact and Random Assignment: Youth Transition Demonstration
– Who is randomized?
– Sample size, power, and effect size
– Who’s in the average?
Block Watch: Random Assignment, Outcomes, and Indicators

What random assignment protocol would you use to assess the impacts of Block Watch?

What are the strengths and weaknesses of your approach?

What are the key outcomes you want to assess? What are indicators for those?
Youth Transition Demonstration Evaluation Plan:

Background on the YTD evaluation plan

The basics of impact size and significance

Power and sample size

No-shows / Intent to Treat vs. Treatment on the Treated

Multiple comparisons

Regression-adjusted comparisons
Youth Transition Demonstration:

Targets youth receiving disability payments, helping them transition to adult life and employment

Goals: increase earnings, decrease costs, facilitate transition to self-sufficiency

Six program sites with variation in programs

Services:
– Waiver of benefit decrease with earnings
– Education, job training, work placements
– Case management, counseling, referral to services
YTD Evaluation:

Selected six sites for demonstration and evaluation

Intervention built on research from past programs and evaluations

Randomly assigned youth to treatment or control

Large sample sizes to allow identification of smaller effects and subgroup effects

Process and impact evaluation

Data collected from administrative files and from surveys before and after the program

Advisory group of experts
Sampling:

Why did they divide the list of potential participants (the sampling frame) into groups of 10 for contact?

Why did they randomize 55 percent to the treatment group?

Why collect pre-intervention characteristics if they are randomly assigning groups?
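To make the first two questions concrete, here is a minimal Python sketch of block random assignment that puts roughly 55 percent of each contact group of 10 into treatment. This is an illustration of the general technique, not the YTD evaluation's actual protocol; the function name and IDs are made up.

```python
import random

def assign_block(block_ids, p_treat=0.55, rng=None):
    """Randomly assign one contact block (e.g., 10 youth) so that
    roughly p_treat of each block lands in the treatment group.
    Blocking keeps the treatment/control split balanced over time
    as successive groups are contacted."""
    rng = rng or random.Random()
    ids = list(block_ids)
    rng.shuffle(ids)
    n_treat = round(p_treat * len(ids))
    return {pid: ("treatment" if i < n_treat else "control")
            for i, pid in enumerate(ids)}

# Example: one block of 10 drawn from the sampling frame.
print(assign_block(range(101, 111), rng=random.Random(42)))
```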
Comparisons may be:
– over time
– across intervention groups:
  with and without the program (the impact comparison is here!);
  levels of intervention (“dosage”)
Statistical significance:
How often will we (wrongly) rule out “no impact” IF there really is no impact?
Compare 2 means from independent samples:

Means:
$$ t = \frac{(\bar{x}_t - \bar{x}_c) - 0}{\sqrt{s_p^2 \left( \frac{1}{n_t} + \frac{1}{n_c} \right)}} $$

Proportions:
$$ z = \frac{(\hat{p}_t - \hat{p}_c) - 0}{\sqrt{\hat{p}(1 - \hat{p}) \left( \frac{1}{n_t} + \frac{1}{n_c} \right)}} $$

Pooled sample variance:
$$ s_p^2 = \frac{(n_t - 1) s_t^2 + (n_c - 1) s_c^2}{n_t + n_c - 2} $$

Pooled proportion ($x_t$, $x_c$ are counts of “successes”):
$$ \hat{p} = \frac{x_t + x_c}{n_t + n_c} $$
So, it’s easier to say the impact is “real” (not just randomness) if:
– The size of the impact is larger
– Variation in outcomes (S, the standard deviation) is smaller
– Sample sizes are larger

The same factors figure into deciding how big a sample we need to find the effect if it’s there! [Power, sample size, minimally detectable effects]
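Plugging hypothetical numbers into the pooled_two_sample_t sketch above shows these factors at work: the same 5-point impact with the same spread clears the roughly ±1.96 significance bar only when the samples are larger.

```python
# Same 5-point impact, same spread (s = 20); only the sample sizes change.
print(pooled_two_sample_t(55, 20, 100, 50, 20, 100))  # ~1.77: cannot rule out chance
print(pooled_two_sample_t(55, 20, 400, 50, 20, 400))  # ~3.54: significant at 5%
```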
Power and sample size:

Power: given randomness, what % of the time will you be able to rule out the null, IF it is NOT true (there IS an impact)?

Sample size: how big a sample do you need to rule out “no effect” if the program DOES have an impact? (Rossi et al., p. 312)
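A power library can answer the same question in code as the calculators below. A sketch using statsmodels, where the 0.2 standard-deviation effect size, alpha, and power targets are illustrative choices, not the YTD design values:

```python
from statsmodels.stats.power import TTestIndPower

# Per-arm sample size needed to detect a 0.2 standard-deviation effect
# with 80% power in a two-sided test at alpha = 0.05.
n_per_arm = TTestIndPower().solve_power(effect_size=0.2, alpha=0.05, power=0.80)
print(n_per_arm)  # roughly 393 per group
```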
Online Calculators for Sample Size and Power:

Sample size:
– http://www.dssresearch.com/toolkit/sscalc/size_a2.asp
– http://www.dssresearch.com/toolkit/sscalc/size_p2.asp

Power:
– http://www.dssresearch.com/toolkit/spcalc/power_a2.asp
– http://statpages.org/proppowr.html
Minimum Detectable Impacts:
What are the smallest effects you will be able to detect, given n and the predicted S?
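A minimal sketch of the standard normal-approximation formula for a minimum detectable impact, MDE ≈ (z_{1−α/2} + z_{power}) × SE; the default alpha and power are conventional choices, not the YTD evaluation's.

```python
import math
from scipy.stats import norm

def min_detectable_impact(s, n_t, n_c, alpha=0.05, power=0.80):
    """Smallest true impact detectable with the given power, assuming the
    outcome has standard deviation s in both groups (two-sided test)."""
    se = math.sqrt(s**2 * (1 / n_t + 1 / n_c))
    return (norm.ppf(1 - alpha / 2) + norm.ppf(power)) * se

# e.g., s = 20 with 400 per arm: impacts of about 4 points and up are detectable.
print(min_detectable_impact(20, 400, 400))
```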
Adjustments to impact assessment:

Regression-adjusted impacts decrease S and increase power by controlling for “noise” using baseline characteristics:
$$ Y = \alpha + \beta X_{\text{baseline}} + \beta_{\text{treatment}} \cdot \text{Treatment} + \varepsilon $$
$$ \hat{Y}_{\text{treatment}} = \hat{\alpha} + \hat{\beta} \bar{X}_{\text{all}} + \hat{\beta}_{\text{treatment}} $$
$$ \hat{Y}_{\text{control}} = \hat{\alpha} + \hat{\beta} \bar{X}_{\text{all}} $$
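A sketch of this regression adjustment on simulated data (all numbers and variable names are made up, not YTD data): controlling for a baseline covariate leaves the impact estimate centered in the same place but shrinks its standard error.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
baseline = rng.normal(50, 10, n)        # pre-program characteristic ("noise" source)
treatment = rng.binomial(1, 0.55, n)    # random assignment, 55% to treatment
outcome = 20 + 0.8 * baseline + 5 * treatment + rng.normal(0, 10, n)
df = pd.DataFrame({"outcome": outcome, "baseline": baseline, "treatment": treatment})

raw = smf.ols("outcome ~ treatment", data=df).fit()             # simple difference
adj = smf.ols("outcome ~ baseline + treatment", data=df).fit()  # regression adjusted
print(raw.params["treatment"], raw.bse["treatment"])  # impact ~5, larger SE
print(adj.params["treatment"], adj.bse["treatment"])  # impact ~5, smaller SE
```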
Multiple comparisons are a problem because randomness happens if you look long enough!
– MDRC picked “primary outcomes”
– Use adjustments to account for multiple comparisons (see the sketch below)

Showing estimated impacts over time in the program
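Referring back to the multiple-comparisons bullet above: one standard adjustment is Holm's step-down correction, sketched here with statsmodels (the p-values are hypothetical).

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from testing several outcomes in one evaluation.
pvals = [0.004, 0.030, 0.041, 0.200]
reject, p_adjusted, _, _ = multipletests(pvals, alpha=0.05, method="holm")
print(reject)      # which outcomes survive the multiple-comparison adjustment
print(p_adjusted)  # adjusted p-values
```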
Who’s in the average?

“No-shows” in the treatment group didn’t get any services
– Unlikely to be similar to “shows”
– If dropped, the estimate may overstate potential impacts

“Intent to Treat” (ITT) outcomes include outcomes for no-shows

“Treatment on the Treated” (TOT) outcomes do not include no-shows (see the sketch below)

Non-response to follow-up surveys could bias impact assessments
– Use administrative data, available for everyone, for key outcomes
– Put resources into follow-up to minimize non-response
– Construct weights to make survey-sample estimates comparable to the baseline sample
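To make the ITT/TOT distinction concrete, here is a sketch of the standard Bloom (1984) no-show adjustment, which recovers a treatment-on-the-treated estimate from the intent-to-treat impact; the numbers are hypothetical, and the source doesn't say whether the YTD evaluation used exactly this estimator.

```python
def bloom_tot(itt_impact, takeup_rate):
    """Treatment-on-the-treated estimate via the Bloom no-show adjustment:
    assumes no-shows experience zero impact and no controls get services."""
    return itt_impact / takeup_rate

# e.g., an ITT impact of $300 in annual earnings with 60% take-up
# implies roughly $500 for youth who actually received services.
print(bloom_tot(300, 0.60))
```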
Summary: Lessons

Randomization is hard

Need to use power analysis to choose target sample sizes

Even randomization may not give comparable baseline characteristics

Regression may increase comparability and precision

Worry about who we have outcome information for (both control and treatment)
(both control and treatment)