Designing Experiments to Measure Spillover Effects
ABCDE, June 3, 2013

Sarah Baird (U. Otago and George Washington U.)
Aislinn Bohren (U. Pennsylvania)
Craig McIntosh (U. California, San Diego)
Berk Özler (U. Otago and the World Bank)

6/3/2014

Motivation for measuring spillover effects
- Saturation effects: Exposure to infectious malaria bites declines with increased communal coverage
- Threshold effects: Risk of certain diseases, like cholera, may plummet above a certain percentage of vaccine coverage
- Local GE effects: The effect of transfer programs on prices will differ with coverage intensity
- Displacement effects: Job training programs may have large effects on the eligible, but displace others, with the net effect being zero.

Interference as ‘nuisance’…
- Interference between units can lead to real bias in RCTs.
- Pollution of controls invalidates inference (Rosenbaum 2007).
- If we knew that there was no interference between units, we could conduct ‘blocked design’ experiments because they’re ideal for power and they’re cheaper.
- But, it is impossible to know that using baseline data…
- Manski’s famous reflection problem (1993)

Interference as ‘of real interest’…
We can ask policy pertinent questions, such as:
- Cost effectiveness: better to treat 50% of 100 villages or 100% of 50?
- Optimal treatment saturation: at what level of saturation do public health treatments become minimally/universally effective?
We can also ask questions of theoretical relevance and identify:
- Network effects
- Information transmission mechanisms
- Social norms, aspirations, etc.…

Experiments with randomized saturation designs
However, so far, there have been only a few experiments that intentionally create variation in treatment intensity… But, this number is rapidly growing!
- Crépon et al. (2013): counseling and job placement in France
- Banerjee et al. (2012): police reform in Rajasthan
- Sinclair, McConnell, and Green (2012): Get out the vote in Illinois
- McIntosh et al. (2013): infrastructure in Mexican municipalities
- Callen et al. (2013): formal vs. informal savings institutions

Not for citation without explicit permission from the authors.

In this paper, we…
1. Formalize the design issues involved in ‘randomized saturation’ (RS) designs;
2. Define a rich set of estimands that can be identified;
3. Provide explicit power calculation expressions that allow researchers to understand tradeoffs in design (with supplemental Matlab code for optimal design);
4. Consider some extensions made possible by RS designs; and
5. Present an empirical application.

…but, you won’t get all of that today.
Instead, a tour of some of the key issues that come up when you want to design such experiments:
- Especially, tricky stuff that you would not immediately think about…

Basic RCT Designs…
- Blocked Design: 50% of every cluster is treated.
- Clustered Design: 50% of clusters are completely treated.
- Partial Population Design: both 'pure' controls and 'within cluster' controls, saturation fixed.
- Randomized Saturation Design: treatment saturations directly randomized.
[Figure: each design illustrated with Cluster 1 to Cluster 4]

How to allocate treatment to N units in C clusters
- Blocked design treats half of the units in every cluster, has the highest power in the face of intra-cluster correlation (ICC), but is biased in the face of spillovers within cluster.
- Clustered design treats all the units in half of the clusters, has the lowest power in the presence of ICC, but is unbiased as long as there are no spillovers across clusters.
- Usual solution: “When faced with high ICC, go blocked for power.”

Why are observations correlated within a cluster?
Think of Manski’s (1993) Reflection Problem:
- Correlated or contextual effects: blocked design has the highest power and is unbiased.
- Endogenous effects: blocked design is biased, so should use clustered.
Upshot: We end up on the horns of the Reflection Problem, and typical baseline data do not give sufficient information to disentangle these effects...

Randomized Saturation Design
Two-stage randomization: the first stage randomizes the saturation assigned to each cluster (c = 1,…, C), and the second stage randomly assigns treatment to each individual (i = 1,…, n) according to the realized saturation of the cluster.
An RS design introduces correlation between the treatment statuses of individuals within the same cluster, which is:
- proportional to the variance of the cluster-level treatment saturations
- zero in the blocked design, one in the clustered design
This correlation affects the power of the study design in the presence of intra-cluster correlation.

Treatment and Spillover Effects in a RS Design
Assuming only SUTVA, we can formally define and consistently estimate several treatment and spillover effects:
- Intention to Treat (ITT) Effect
- Spillovers on the Non-Treated (SNT) Effect
- Total Causal Effect (TCE)
ITT can be decomposed into two effects:
- Treatment on the Uniquely Treated (TUT) Effect
- Spillover on the Treated (ST) Effect

Inference and the power trade-off in RS designs
Minimum detectable effects (MDE) and minimum detectable slope effects (MDSE) depend on the standard errors.
With two more assumptions, we can define the MDE and the MDSE and compare the power of any RS design to the more canonical designs (blocked, clustered, partial population experiment (PPE)):
- Stratified interference
- Random Effects error structure

Stratified Interference and Random Effects
Using Assumptions 1-3:

Calculating the Minimum Detectable Pooled Effect
Suppose that ICC>0 and you fixed the size of the pure control group:
- Corollary 1: A PPE minimizes the MDE.
Choosing the constant treatment saturation P involves a tradeoff between the power of pooled ITT vs. SNT.
Assigning equal importance to these two effects and trying to detect the same effect sizes implies:
- Corollary 2: A PPE with P=0.5 minimizes the sum of MDEs.

Calculating the Minimum Detectable Pooled Effect
Corollary 3: Choosing the size of the pure control group:
- It lies in a narrow range between 0.41 and 0.5.
- It is always optimal to have more than a third of the clusters devoted to pure control, as they serve as counterfactuals for both treatment and spillover groups.
- It is necessary to add more pure control clusters as ICC increases…
Corollary 4: Fix the size of the treatment and control groups. Increasing the variance in treatment saturations linearly increases the MDE.

Minimum Detectable Pooled Effect (Summary)
In summary, if the researcher cares only about the presence of treatment and spillover effects, a PPE minimizes the sum of MDEs and the optimal size of pure control is given by Corollary 3.
If the researcher has reasons to vary treatment saturations, the implied power loss is given by Corollary 4.

Calculating the Minimum Detectable Slope Effect
MDSE is the smallest change in the spillover effect due to a change in treatment saturation that is distinguishable from zero.
- If we want to detect a slope, we need at least two interior saturations.
- Actually don’t need a pure control…
- Corollary 5: The two saturations should be symmetric about 0.5 and the distance between them should be between 0.71 and 1.

Calculating the Minimum Detectable Slope Effect
- If you want to test concavity, you need at least three interior saturation points.
- The larger the distance between saturations, the higher the ability to detect smaller slopes (but the lower the ability to detect pooled effects).
- Pure control should be smaller than the size of any treatment saturation, but larger than the size of any treatment (or within-cluster controls).

Pause…
Why do people pick the designs that they do?
- Round numbers
- Powering each cell separately
- Intuition…
Our paper, along with the Matlab code in the appendix, provides some guidance as to optimal design of RCTs…

Some extensions
1. Treatment on the Compliers (TOC) Effect
2. Using within-cluster controls as counterfactuals
3. Estimating the pure control outcome when it is not possible to have a pure control
4. Estimating spillovers in overlapping networks

Two empirical findings from our Malawi RS
1. Spillovers on psychological well-being…
2. Treatment affecting friends networks…

Table 8: Spillover effects on GHQ-12 binary measure of psychological distress of baseline schoolgirls

                        Entire Spillover        Spillovers not in       Spillovers in
                        Group                   Treatment Household     Treatment Household
                        Round 2     Round 3     Round 2     Round 3     Round 2     Round 3
                        (During)    (After)     (During)    (After)     (During)    (After)
                        (1)         (2)         (3)         (4)         (5)         (6)
Spillover               0.064**     0.007       0.099***    0.001       -0.086      0.015
                        (0.029)     (0.032)     (0.031)     (0.030)     (0.052)     (0.080)
Number of observations  1,916       1,916       1,847       1,847       1,421       1,421
Mean in control         0.375       0.309       0.375       0.309       0.375       0.309

Notes: *** p<0.01, ** p<0.05, * p<0.1. OLS regressions with standard errors (in parentheses) clustered at the EA level. Observations are weighted to make results representative of the target population in the study EAs. Included baseline controls are age dummies, geographical strata, and dummies for girl living in a household with her father, having been ill over the past two weeks, and never having had sex. In addition, the regressions control for the number of eligible baseline dropout siblings, the number of baseline schoolgirl siblings, and sampling frame to correct for possible structural differences among households with different numbers of eligible siblings.

Endogenous Network formation
The cash transfer treatment caused both an increase in the number of friends being treated and more churn of friends in the treatment group (dropping more friends, and adding new ones).
The implication is that if you want to study network effects, you need to measure the network at baseline, not later…

Thank you!
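
The two-stage randomization on the Randomized Saturation Design slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Matlab code: the function names, the equal-shares rule for handing out saturations in stage 1, and the example saturations are all assumptions made here. The correlation formula Var(pi) / (mu (1 - mu)) is a standard calculation that holds when, given a cluster's saturation, individual treatment statuses are treated as independent draws (a large-cluster approximation to the fixed-count assignment used in stage 2). It reproduces the two limiting cases named on the slide.

```python
import random
from statistics import mean, pvariance


def assign_randomized_saturation(cluster_sizes, saturations, seed=0):
    """Hypothetical two-stage assignment for a randomized saturation design.

    Stage 1: hand out the saturations to clusters in (near-)equal shares,
             in random order (the equal-shares rule is an assumption).
    Stage 2: within each cluster of size n with saturation pi, treat
             round(pi * n) units chosen at random.
    Returns (cluster_saturation, treatment), where treatment[c] is a list
    of 0/1 flags for the units in cluster c.
    """
    rng = random.Random(seed)
    n_clusters = len(cluster_sizes)

    # Stage 1: cycle through the saturations, then shuffle across clusters.
    cluster_saturation = [saturations[c % len(saturations)]
                          for c in range(n_clusters)]
    rng.shuffle(cluster_saturation)

    # Stage 2: draw the treated units inside each cluster.
    treatment = []
    for n, pi in zip(cluster_sizes, cluster_saturation):
        n_treated = round(pi * n)  # Python's round(); exact halves go to even
        treated = set(rng.sample(range(n), n_treated))
        treatment.append([1 if i in treated else 0 for i in range(n)])
    return cluster_saturation, treatment


def treatment_correlation(saturations):
    """Within-cluster correlation of treatment status, Var(pi) / (mu (1 - mu)),
    for equally likely saturations, assuming independent draws given pi.
    Undefined (division by zero) if the mean saturation is 0 or 1.
    """
    mu = mean(saturations)
    return pvariance(saturations) / (mu * (1 - mu))


# Example: six clusters of 20 units, saturations 0 (pure control), 0.5 and 1.
sat, treat = assign_randomized_saturation([20] * 6, [0.0, 0.5, 1.0], seed=1)
for pi, flags in zip(sat, treat):
    print(pi, sum(flags))

print(treatment_correlation([0.5]))       # blocked design: 0.0
print(treatment_correlation([0.0, 1.0]))  # clustered design: 1.0
```

Spreading the saturations further apart raises this correlation, which is the channel through which, per Corollary 4, more variance in saturations costs power for pooled effects.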