Present - World Bank Group

Designing Experiments to Measure
Spillover Effects
ABCDE, June 3, 2013
Sarah Baird (U. Otago and George Washington U.)
Aislinn Bohren (U. Pennsylvania)
Craig McIntosh (U. California, San Diego)
Berk Özler (U. Otago and the World Bank)
6/3/2014
Motivation for measuring spillover effects

Saturation effects: Exposure to infectious malaria bites declines
with increased communal coverage.
Threshold effects: Risk of certain diseases, like cholera, may
plummet above a certain percentage of vaccine coverage.
Local GE effects: The effect of transfer programs on prices will
differ with coverage intensity.
Displacement effects: Job training programs may have large effects
on the eligible but displace others, with the net effect being zero.
Interference as ‘nuisance’…

Interference between units can lead to real bias in RCTs.



Pollution of controls invalidates inference (Rosenbaum 2007).
If we knew that there was no interference between units,
we could conduct ‘blocked design’ experiments because
they’re ideal for power and they’re cheaper.
But, it is impossible to know that using baseline data…

Manski’s famous reflection problem (1993)
Interference as ‘of real interest’…

We can ask policy pertinent questions, such as:



Cost effectiveness: is it better to treat 50% of 100 villages or
100% of 50?
Optimal treatment saturation: at what level of saturation do
public health treatments become minimally/universally
effective?
We can also ask questions of theoretical relevance and identify:



Network effects
Information transmission mechanisms
Social norms, aspirations, etc.…
Experiments with randomized
saturation designs

However, so far, there have been only a few experiments
that intentionally create variation in treatment intensity…
But, this number is rapidly growing!
Crépon et al. (2013): counseling and job placement in France
Banerjee et al. (2012): police reform in Rajasthan
Sinclair, McConnell, and Green (2012): Get out the vote in Illinois.
McIntosh et al. (2013): infrastructure in Mexican municipalities
Callen et al. (2013): formal vs. informal savings institutions






Not for citation without explicit permission from the authors.
In this paper, we…
1. Formalize the design issues involved in ‘randomized saturation’
   (RS) designs;
2. Define a rich set of estimands that can be identified;
3. Provide explicit power calculation expressions that allow
   researchers to understand tradeoffs in design (with supplemental
   Matlab code for optimal design);
4. Consider some extensions made possible by RS designs; and
5. Present an empirical application.
…but, you won’t get all of that today.


Instead, a tour of some of the key issues that come up
when you want to design such experiments:
Especially, tricky stuff that you would not immediately
think about…
Basic RCT Designs…
Blocked Design: 50% of every cluster is treated.
Clustered Design: 50% of clusters are completely treated.
Partial Population Design: both 'pure' controls and 'within-cluster' controls, saturation fixed.
Randomized Saturation Design: treatment saturations directly randomized.
[Figure: each of the four designs illustrated on Cluster 1 through Cluster 4.]
How to allocate treatment to N units in C clusters
Blocked design treats half of the units in every cluster,
has the highest power in the face of intra-cluster
correlation (ICC), but is biased in the face of spillovers
within cluster.
Clustered design treats all the units in half of the
clusters, has the lowest power in the presence of ICC,
but is unbiased as long as there are no spillovers across
clusters.
Usual solution: “When faced with high ICC, go blocked for
power.”
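
The two allocation rules above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names, the list-of-cluster-sizes input, and the fixed seed are all assumptions made for the example.

```python
import random


def blocked_assignment(cluster_sizes, share=0.5, seed=0):
    """Treat a fixed share of the units inside every cluster."""
    rng = random.Random(seed)
    assignment = []
    for n in cluster_sizes:
        k = round(share * n)  # number of treated units in this cluster
        treated = set(rng.sample(range(n), k))
        assignment.append([1 if i in treated else 0 for i in range(n)])
    return assignment


def clustered_assignment(cluster_sizes, share=0.5, seed=0):
    """Treat every unit in a randomly chosen share of the clusters."""
    rng = random.Random(seed)
    c = len(cluster_sizes)
    treated_clusters = set(rng.sample(range(c), round(share * c)))
    return [
        [1 if j in treated_clusters else 0] * n
        for j, n in enumerate(cluster_sizes)
    ]


sizes = [4, 4, 4, 4]  # hypothetical: C = 4 clusters of 4 units each
print(blocked_assignment(sizes))
print(clustered_assignment(sizes))
```

In the blocked output every cluster contains both treated and untreated units, which is where within-cluster spillovers can contaminate the comparison; in the clustered output each cluster is all ones or all zeros.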

Why are observations correlated within a cluster?

Think of Manski’s (1993) Reflection Problem:



Correlated or contextual effects: blocked design highest power
& unbiased.
Endogenous effects: blocked design biased so should use
clustered.
Upshot: We end up on the horns of the Reflection
Problem, and typical baseline data do not give sufficient
information to disentangle these effects...
Randomized Saturation Design


Two-stage randomization: first stage randomizes the saturation
assigned to each cluster (c = 1,…, C), and the second stage
randomly assigns treatment to each individual (i = 1,…, n)
according to the realized saturation of the cluster.
An RS design introduces correlation between the treatment
statuses of individuals within the same cluster, which is:
  proportional to the variance of the cluster-level treatment saturations;
  zero in the blocked design;
  one in the clustered design.
This correlation affects the power of the study design in the
presence of intra-cluster correlation.
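
A minimal sketch of the two-stage procedure and of the implied correlation in treatment status follows. It assumes that the second stage treats each individual independently with probability equal to the cluster's saturation (a Bernoulli draw); under that assumption the within-cluster correlation of treatment status is Var(saturation) / (mu * (1 - mu)), where mu is the mean saturation. The function names and the dictionary describing the saturation distribution are hypothetical.

```python
import random


def rs_assignment(cluster_sizes, saturations, seed=0):
    """Two-stage randomization: saturation per cluster, then treatment per unit."""
    rng = random.Random(seed)
    design = []
    for n in cluster_sizes:
        pi = rng.choice(saturations)  # stage 1: draw the cluster saturation
        # stage 2 (assumed Bernoulli): each unit is treated with probability pi
        treated = [1 if rng.random() < pi else 0 for _ in range(n)]
        design.append((pi, treated))
    return design


def treatment_correlation(saturation_probs):
    """Correlation of treatment status between two units in the same cluster.

    saturation_probs maps each saturation to the share of clusters assigned to it.
    """
    mu = sum(p * w for p, w in saturation_probs.items())
    var = sum(w * (p - mu) ** 2 for p, w in saturation_probs.items())
    return var / (mu * (1 - mu))


print(treatment_correlation({0.5: 1.0}))            # blocked design
print(treatment_correlation({0.0: 0.5, 1.0: 0.5}))  # clustered design
print(treatment_correlation({0.0: 1 / 3, 1 / 3: 1 / 3, 2 / 3: 1 / 3}))  # an RS design
```

The blocked and clustered cases reproduce the two endpoints listed above (zero and one); any design with intermediate variation in saturations falls between them.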
Treatment and Spillover Effects in a RS Design

Assuming only SUTVA, we can formally define and
consistently estimate several treatment and spillover effects:
  Intention to Treat (ITT) Effect
  Spillovers on the Non-Treated (SNT) Effect
  Total Causal Effect (TCE)
ITT can be decomposed into two effects:
  Treatment on the Uniquely Treated (TUT) Effect
  Spillover on the Treated (ST) Effect
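
As a rough illustration, the first three effects can be estimated by differences in means against the pure control group. The definitions used here are assumptions for the sketch, since the slides do not spell them out: ITT and SNT at a given saturation compare treated and untreated units in clusters at that saturation with units in pure control clusters, and TCE is taken to be the saturation-weighted average of the two. The data layout and function name are hypothetical.

```python
from statistics import mean


def estimands(rows, saturation):
    """Difference-in-means estimates at one saturation.

    rows: iterable of (outcome, treated, cluster_saturation) tuples,
    where cluster_saturation == 0 marks pure control clusters.
    """
    rows = list(rows)
    pure_control = [y for y, t, p in rows if p == 0]
    treated = [y for y, t, p in rows if t == 1 and p == saturation]
    untreated = [y for y, t, p in rows if t == 0 and p == saturation]

    base = mean(pure_control)
    itt = mean(treated) - base    # effect on the treated at this saturation
    snt = mean(untreated) - base  # spillover on the non-treated
    # Assumed definition: overall effect on a cluster at this saturation.
    tce = saturation * itt + (1 - saturation) * snt
    return {"ITT": itt, "SNT": snt, "TCE": tce}


# Made-up outcomes, for illustration only.
data = [
    (1.0, 0, 0.0), (1.0, 0, 0.0),  # pure control
    (3.0, 1, 0.5), (3.0, 1, 0.5),  # treated, saturation 0.5
    (2.0, 0, 0.5), (2.0, 0, 0.5),  # within-cluster controls, saturation 0.5
]
print(estimands(data, 0.5))
```

The sketch ignores weighting and clustered standard errors, both of which matter in a real analysis.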
Inference and the power trade-off in RS designs


Minimum detectable effects (MDE) and Minimum
detectable slope effects (MDSE) depend on the
standard errors.
With two more assumptions, we can define the MDE
and the MDSE and compare the power of any RS
design to the more canonical designs (blocked,
clustered, PPE):


Stratified interference
Random Effects error structure
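
To make the link between standard errors and the MDE concrete, here is a sketch based on the standard textbook relationship MDE = (z for size + z for power) x SE under a normal approximation. The second function is the usual standard error for a cluster-randomized design with a random-effects error structure; it is not the RS-specific expression derived in the paper, and all parameter names are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def mde(se, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-sided test (normal approximation)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * se


def clustered_se(sigma, icc, n_clusters, cluster_size, share_treated=0.5):
    """Standard error of a treatment effect when whole clusters are randomized."""
    design_term = icc + (1 - icc) / cluster_size
    return sigma * sqrt(
        design_term / (share_treated * (1 - share_treated) * n_clusters)
    )


# Hypothetical study: 100 clusters of 10 units, outcome standard deviation 1.
for icc in (0.0, 0.1, 0.3):
    print(icc, mde(clustered_se(1.0, icc, 100, 10)))
```

Running the loop shows the MDE of a clustered design growing with the ICC, which is the power cost that the designs compared here trade off against bias from spillovers.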
Stratified Interference and Random Effects

Using Assumptions 1-3:
Calculating the Minimum Detectable Pooled Effect


Suppose that ICC>0 and you fixed the size of the pure control
group:
Corollary 1: A PPE minimizes the MDE


Choosing the constant treatment saturation P involves a tradeoff between
the power of the pooled ITT and that of the SNT. Assigning equal importance
to these two effects and trying to detect the same effect sizes implies:
Corollary 2: A PPE with P=0.5 minimizes the sum of MDEs
Calculating the Minimum Detectable Pooled Effect

Corollary 3: Choosing the size of the pure control
group:




It lies in a narrow range between 0.41 and 0.5.
It is always optimal to have more than a third of the clusters
devoted to pure control as they serve as counterfactuals for
both treatment and spillover groups
It is necessary to add more pure control clusters as ICC
increases…
Corollary 4: Fix the size of the treatment and control
groups. Increasing the variance in treatment saturations
linearly increases the MDE.
Minimum Detectable Pooled Effect (Summary)


In summary, if the researcher cares only about the
presence of treatment and spillover effects, a PPE
minimizes the sum of MDEs and the optimal size of
pure control is given by Corollary 3.
If the researcher has reasons to vary treatment
saturations, the implied power loss is given by
Corollary 4.
Calculating the Minimum Detectable Slope Effect


MDSE is the smallest change in the spillover effect due
to a change in treatment saturation that is
distinguishable from zero.
If we want to detect a slope, we need at least two
interior saturations


Actually don’t need a pure control…
Corollary 5:

The two saturations should be symmetric about 0.5 and the
distance between them should be between 0.71 and 1.
Calculating the Minimum Detectable Slope Effect



If you want to test concavity, you need at least three
interior saturation points.
The larger the distance between saturations, the greater
the ability to detect small slopes (but the lower the
ability to detect pooled effects).
Pure control should be smaller than the size of any
treatment saturation, but larger than the size of any
treatment (or within-cluster controls)
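
The role of the number of saturation points can be seen in a small sketch that computes the slope of mean outcomes between adjacent saturations. With two saturations there is a single slope; with three there are two slopes that can be compared. The function names and input layout are hypothetical, and the comparison is on point estimates only, not a formal test of concavity.

```python
from statistics import mean


def segment_slopes(outcomes_by_saturation):
    """Slope of the mean outcome between each pair of adjacent saturations.

    outcomes_by_saturation maps a saturation to the outcomes of the group
    of interest (for example, untreated units) in clusters at that saturation.
    """
    points = sorted((p, mean(ys)) for p, ys in outcomes_by_saturation.items())
    if len(points) < 2:
        raise ValueError("need at least two saturations to estimate a slope")
    return [
        (y1 - y0) / (p1 - p0)
        for (p0, y0), (p1, y1) in zip(points, points[1:])
    ]


def looks_concave(outcomes_by_saturation):
    """True if every successive slope estimate is smaller than the previous one."""
    slopes = segment_slopes(outcomes_by_saturation)
    if len(slopes) < 2:
        raise ValueError("need at least three saturations to compare slopes")
    return all(later < earlier for earlier, later in zip(slopes, slopes[1:]))


# Made-up outcomes at three saturations, for illustration only.
example = {0.25: [1.0, 1.0], 0.5: [2.0, 2.0], 0.75: [2.5, 2.5]}
print(segment_slopes(example))
print(looks_concave(example))
```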
Pause…

Why do people pick the designs that they do?




Round numbers
Powering each cell separately
Intuition…
Our paper, along with the Matlab code in the appendix,
provides some guidance as to optimal design of
RCTs…
Some extensions
1. Treatment on the Compliers (TOC) Effect
2. Using within-cluster controls as counterfactuals
3. Estimating the pure control outcome when it is not
   possible to have a pure control
4. Estimating spillovers in overlapping networks
Two empirical findings from our Malawi RS
1. Spillovers on psychological well-being…
2. Treatment affecting friends networks…
Table 8: Spillover effects on GHQ-12 binary measure of psychological distress of baseline
schoolgirls

                          Entire Spillover Group  Spillovers not in       Spillovers in
                                                  Treatment Household     Treatment Household
                          Round 2     Round 3     Round 2     Round 3     Round 2     Round 3
                          (During)    (After)     (During)    (After)     (During)    (After)
                          (1)         (2)         (3)         (4)         (5)         (6)
Spillover                 0.064**     0.007       0.099***    0.001       -0.086      0.015
                          (0.029)     (0.032)     (0.031)     (0.030)     (0.052)     (0.080)
Number of observations    1,916       1,916       1,847       1,847       1,421       1,421
Mean in control           0.375       0.309       0.375       0.309       0.375       0.309

Notes: *** p<0.01, ** p<0.05, * p<0.1. OLS regressions with standard errors (in parentheses) clustered at the
EA level. Observations are weighted to make results representative of the target population in the study EAs.
Included baseline controls are age dummies, geographical strata, and dummies for girl living in a household
with her father, having been ill over the past two weeks, and never having had sex. In addition, the
regressions control for the number of eligible baseline dropout siblings, the number of baseline schoolgirl
siblings, and sampling frame to correct for possible structural differences among households with different
numbers of eligible siblings.
Endogenous Network formation


The cash transfer treatment caused both an increase in
the number of treated friends and more churn of friends in
the treatment group (dropping more friends and
adding new ones).
The implication is that if you want to study network
effects, you need to measure the network at baseline, not
later…
Thank you!