
Impact Evaluation
Sampling and Power
Slides by Jishnu Das
Sample Selection in Evaluation
 Population-based representative surveys:
• Sample is representative of the whole population
• Good for learning about the population
• Not always the most efficient for impact evaluation
 Sampling for impact evaluation:
• Balance between treatment and control groups
• Sufficient power for statistical inference on the groups of interest
• Concentrate the sample strategically
 Survey budget as a major consideration:
• In practice, sample size is often set by the budget
• Concentrate the sample on key populations to increase power
Purposive Sampling:
 Risk: We will systematically bias our sample, so
results don’t generalize to the rest of the
population or other sub-groups
 Trade-off between power within the population of interest and representativeness of the full population
 Results are internally valid, but not generalizable.
Type I and Type II errors
 Type I error: rejecting the null hypothesis when it is true
• Significance level: the probability of rejecting the null when it is true (the probability of a Type I error)
 Type II error: accepting (failing to reject) the null hypothesis when it is false
• Power: the probability of rejecting the null when a given alternative is true (1 minus the probability of a Type II error)
 We want to keep both types of error small
• Increasing the sample size raises power at a given significance level
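The trade-off can be seen in a small simulation. The sketch below is a rough illustration only; the effect size, sample sizes, and two-sample t-test are my assumptions, not part of the slides. It estimates both error rates and shows the Type II error falling as the sample grows, while the chosen significance level keeps the Type I error near 5%.

import numpy as np
from scipy import stats

def error_rates(n, true_effect, alpha=0.05, n_sims=2000, seed=0):
    # Simulated Type I and Type II error rates for a two-sample t-test.
    rng = np.random.default_rng(seed)
    type1 = type2 = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n)
        # Type I: the null (no effect) is true, but the test rejects
        if stats.ttest_ind(rng.normal(0.0, 1.0, n), control).pvalue < alpha:
            type1 += 1
        # Type II: a real effect exists, but the test fails to reject
        if stats.ttest_ind(rng.normal(true_effect, 1.0, n), control).pvalue >= alpha:
            type2 += 1
    return type1 / n_sims, type2 / n_sims

for n in (30, 100, 300):
    print(n, error_rates(n, true_effect=0.3))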
Survey - Sampling
 Population: all cases of interest
 Sampling frame: list of all potential cases
 Sample: cases selected for analysis
 Sampling method: technique for selecting cases from sampling frame
 Sampling fraction: proportion of cases from population selected for sample (n/N)
Sampling Frame
 Simple random sampling: almost never practical unless the universe of interest is geographically concentrated
 Cluster sampling: randomly choose clusters and then randomly choose units within each cluster. The effective sample size is less than the actual number of observations; this is the design, or cluster, effect
 The design effect implies that, for a given sample size, the variance increases by a factor of 1 + (E - 1)ρ, where E is the number of elements in each cluster and ρ is the intra-class correlation, a measure of how much the observations within a cluster resemble each other
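A minimal sketch of the design-effect arithmetic (Python; the cluster size of 20 and the intra-class correlation of 0.2 are assumed purely for illustration):

def design_effect(m, rho):
    # variance inflation from clustering: 1 + (m - 1) * rho
    return 1 + (m - 1) * rho

def effective_sample_size(n_total, m, rho):
    # number of independent observations the clustered sample is worth
    return n_total / design_effect(m, rho)

# Illustration: 60 clusters of 20 observations, rho assumed to be 0.2
print(design_effect(20, 0.2))                    # 4.8, so variance is ~4.8x larger
print(effective_sample_size(60 * 20, 20, 0.2))   # 1,200 observations act like ~250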
Using Power Calculations to
Estimate Sample Sizes
 What sample size is needed to detect a given difference in means at a given level of statistical significance?
 Need an idea of what difference is a plausible expectation for the intervention (the minimum detectable effect)
 Fixing the confidence level, we observe two things when increasing the sample size:
• the rejection region (in terms of the estimated difference) gets larger, and
• the power increases
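A minimal sketch of such a calculation, using the normal approximation for a two-sample difference in means (the effect size, standard deviation, and 80% power target below are assumptions for illustration, not values from the slides):

from scipy.stats import norm

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    # observations per arm needed to detect a mean difference `delta`
    # when the outcome has standard deviation `sigma`
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for a two-sided test
    z_beta = norm.ppf(power)
    return 2 * ((z_alpha + z_beta) * sigma / delta) ** 2

# e.g. a 0.3 standard-deviation effect at 5% significance with 80% power
print(n_per_arm(delta=0.3, sigma=1.0))   # roughly 175 per arm

With a clustered design, the result would then be scaled up by the design effect from the previous slide.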
In Practice - I
 Many sampling patterns are possible, especially when one can vary the number of clusters and the cluster sizes
 Simulations in Stata or a similar package may be used; they easily accommodate complicated designs
 Panel and difference-in-differences calculations need to be based on the ability to detect significant changes, not differences in levels; this requires an estimate of the correlation of the outcome across survey rounds (see the sketch below)
 The sample needed to detect a difference between alternative treatments is different from that needed to compare a treatment with the control
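A sketch of the panel point: if the outcome has the same standard deviation in both rounds and correlation rho across rounds, the variance of the change is 2 * sigma^2 * (1 - rho), so the required sample falls as rho rises. The effect size and other numbers below are purely illustrative.

from scipy.stats import norm

def n_per_arm_did(delta, sigma, rho, alpha=0.05, power=0.80):
    # units per arm to detect an effect `delta` on the change in the outcome,
    # when the outcome has SD `sigma` in each round and correlation `rho`
    # between rounds, so Var(change) = 2 * sigma^2 * (1 - rho)
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * z**2 * 2 * sigma**2 * (1 - rho) / delta**2

for rho in (0.0, 0.5, 0.8):
    print(rho, round(n_per_arm_did(delta=0.3, sigma=1.0, rho=rho)))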
In Practice - II
 Increasing the number of clusters improves precision and is especially important in randomized designs
 It is not strictly necessary that treatment and control be equal in size or in number of clusters, but the analysis is complicated if the probability of selection differs
 Transparency in the randomization process is important
 Many medical journals require registering trials prior to analysis (to avoid reporting only 'favorable' results)
An Example
 Does Information improve child performance in
schools? (Pakistan)
 Randomized Design
• Interested in villages where there are private
schooling options
 Which villages should we work in?
• Stratification: North, Central, South
• Random sample: villages chosen randomly from the list of all villages with a private school
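A minimal sketch of that selection step (Python; the frame sizes and the equal allocation of 20 villages per stratum are invented for illustration):

import random

random.seed(1)

# Hypothetical sampling frame: all villages with a private school, by stratum
frame = {
    "North":   [f"N-{i:03d}" for i in range(400)],
    "Central": [f"C-{i:03d}" for i in range(550)],
    "South":   [f"S-{i:03d}" for i in range(350)],
}

villages_per_stratum = 20   # assumed allocation; could also be proportional
sample = {stratum: random.sample(villages, villages_per_stratum)
          for stratum, villages in frame.items()}

for stratum, chosen in sample.items():
    print(stratum, chosen[:3], "...")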
In Practice: An Example
 How many villages should we choose?
 Depends on:
• How many children there are in each village
• How big we think the treatment effect will be
• How much overall variability there will be in the outcome variable
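One way to answer this is by simulation, in the spirit of the tables on the next two slides. The sketch below is a rough illustration only; the intra-class correlation, effect size, and village-level t-test are my assumptions, not the design actually used in the study.

import numpy as np
from scipy import stats

def simulated_power(n_villages, children_per_village, effect=0.2, icc=0.3,
                    alpha=0.05, n_sims=500, seed=0):
    # Share of simulated trials in which the treatment effect is significant.
    # Test scores get a village random effect (intra-class correlation `icc`);
    # half the villages are treated, and village means are compared across arms.
    rng = np.random.default_rng(seed)
    n_treat = n_villages // 2
    hits = 0
    for _ in range(n_sims):
        village_effect = rng.normal(0, np.sqrt(icc), n_villages)
        child_noise = rng.normal(0, np.sqrt(1 - icc),
                                 (n_villages, children_per_village))
        scores = village_effect[:, None] + child_noise
        scores[:n_treat] += effect                  # treated villages
        village_means = scores.mean(axis=1)
        p = stats.ttest_ind(village_means[:n_treat],
                            village_means[n_treat:]).pvalue
        hits += p < alpha
    return hits / n_sims

# e.g. 42 villages with 15 tested children each
print(simulated_power(42, 15))

Running this over a grid of village counts and children per school produces tables like the ones that follow.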
In Practice: An Example
 Simulation Tables
• Table 1 assumes very high variability in test scores.

                      Number of villages (assuming 3 schools per village)
  Children per school      21       27       36       42       60
          5               N,n      N,n      N,s      N,s      N,a
         10               N,s      N,m      N,a      N,a      S,a
         15               N,s      N,m      N,a      N,a      M,a
         20               N,m      S,m      M,a      M,a      M,a

 X,Y: X is the result for the intervention with the smaller effect size; Y for the larger effect size
• N: significant in < 1% of simulations
• S: significant in < 10% of simulations
• A: significant in > 99% of simulations
In Practice: An Example
 Simulation Tables
• Table 2 assumes lower variability in test scores.

                      Number of villages (assuming 3 schools per village)
  Children per school      21       27       36       42       60
          5               N,s      N,s      s,m      s,m      S,a
         10               N,m      S,m      m,a      M,a      M,a
         15               N,m      s,m      M,a      M,a      M,a
         20               S,a      S,a      M,a      M,a      M,a

 X,Y: X is the result for the intervention with the smaller effect size; Y for the larger effect size
• N: significant in < 1% of simulations
• S: significant in < 10% of simulations
• A: significant in > 99% of simulations
When do we really worry about
this?
 IF
• Very small samples at the unit of treatment
• Suppose treatment is assigned in 20 schools and control in 20 schools
• Even with 400 children in every school, this is still a small sample (see the sketch after this list)
 IF
• Interested in sub-groups (blocks)
• Sample size requirements increase sharply with the number of sub-groups
 IF
• Using Regression Discontinuity Designs
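A quick check of the first point, reusing the design-effect arithmetic from earlier (the intra-class correlation of 0.20 is assumed for illustration):

# 40 schools (20 treatment + 20 control) with 400 children each, rho assumed 0.20
deff = 1 + (400 - 1) * 0.20       # design effect for clusters of 400 children
print(40 * 400 / deff)            # 16,000 children behave like ~198 independent observations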