Estimating k-th Order Spillovers

Connor Jerzak
Department of Government and Institute for Quantitative Social Science,
Harvard University, 1737 Cambridge Street, Cambridge MA 02138
e-mail: [email protected]
January 25, 2016
Abstract
Spillover effects are common in the social and biological sciences, but complicate the potential outcomes
framework as first articulated by Neyman in the 1920s and later formalized by Rubin in the 1970s.
This paper builds on Athey, Eckles, and Imbens’ (2015) recent work on obtaining exact p-values in network
inference, which allowed for first-order spillovers. I provide a generalization of their framework
by allowing spillover effects of an arbitrary order when analyzing experimental interventions on a single
undirected network. After presenting notation for analyzing k-th order spillover effects, I suggest conditional randomization methods to perform inference, developing a generic procedure for identifying the
deepest spillover level present in an experiment. I also propose a causal estimand—the k-th net spillover
effect—which captures the effect of an intervention as it ripples out through the treated unit’s peripheral
ties in a social network. Lastly, I use these tools to analyze an anti-bullying field experiment in 56 US
schools to show how these procedures not only can help correct for statistical biases, but also can reveal
new insights into the dynamics of norm formation. Software to analyze spillover effects in social network
experiments will be made publicly available on CRAN as socialSpillovers.
Contents
0 Introduction
1 Notation & exact inference with no interference
1.1 Notation with no interference
1.2 Exact inference with no interference
2 Notation & exact inference with interference
2.1 Notation with interference
2.2 Exact inference with interference
2.2.1 Breakdown of exact inference
2.2.2 Repairing exact inference
2.2.3 A unified testing framework
2.2.4 Observational data
2.2.5 Weakening the Constant Intensity Assumption
3 Real-world example - a conflict intervention
4 Discussion
∗ Preliminary.
0 Introduction
The potential outcomes framework was first formalized by Neyman and Rubin (see Rubin, 2005), but researchers began extending this framework to accommodate unit-level interference in the late 2000s (see, among
others, Oakes, 2004). In this period, social media experiments served as clear examples where the observed
outcome of unit i might depend on the treatment status of those connected to this unit. However, with
spillovers, the potential outcomes framework must be modified, and exact, randomization-based inference
breaks down.
In this paper, we extend existing claims that, in the presence of interference, we can still evaluate sharp
null hypotheses, but only if we focus on portions of the network that embody a randomized experiment
with no interference. However, to find these interference-free portions of a network, we are required to make
assumptions about how many layers of interference ($K$) are present in the data. If $K = 1$, only first-order
spillovers are present among units. In an undirected network of friends, this situation would imply that
the treatment status of unit $i$’s friends impacts unit $i$’s observed outcome, but the treatment status of
unit $i$’s friends’ friends does not. However, if $K = 2$, the treatment status of unit $i$’s friends’ friends
would also impact unit $i$’s observed outcome. In this fashion, $K$ governs the maximum extent to which unit $i$’s
treatment status affects other units, as only those within $K$ degrees of separation are impacted by unit $i$’s
treatment assignment and potentially vice versa.
A key issue, however, is that K is unknown a priori. In what follows, we build from Aronow (2012) and
Athey, Eckles, & Imbens (2015) as we argue for a procedure that performs a series of hypothesis tests to
infer K from the data. The procedure we develop takes into account multiple testing issues, and is crafted to
allow for the search for K to be fairly generic. Thus, we argue that spillover effects force us to rethink exact
inference: randomization inference is still possible, but requires assumptions that can be made in a principled
but data-driven fashion. Moreover, we also contend that spillover effects are substantively interesting, and
worthy of study independent from the statistical issues arising from their disregard.
The paper proceeds in four main parts. The first section discusses exact inference when interference is not
present. The second section closely examines the implications of interference, and develops a sequential test
for determining the spillover structure present in a social network experiment. This section also discusses a
series of extensions related to this test. The third section illustrates these procedures by replicating Paluck,
Shepherd & Aronow (2016), which presented data from a field experiment intended to reduce forms of school
conflict. The final section concludes.
1 Notation & exact inference with no interference
1.1 Notation with no interference
Following the notation in Morgan & Rubin (2010), we here assume that the experiment of interest is a $2^1$
factorial design, such that each of the $n$ units receives either treatment ($T_i = 1$) or control ($T_i = 0$),
with $i \in \{1, ..., n\}$. The $n \times 1$ vector $\mathbf{T}$ denotes the full treatment assignment vector.
Often, we abuse notation slightly, writing $\mathbf{T}$ when we mean $\mathbf{T}_{Obs}$. Now, the $n \times 1$
vector $\mathbf{Y}(0)$ denotes the potential outcomes for all units under control and the $n \times 1$ vector
$\mathbf{Y}(1)$ denotes the complete potential outcomes for all units under treatment. The entire potential
outcomes matrix can be written as $\mathbf{Y}_{n \times 2} = (\mathbf{Y}(0), \mathbf{Y}(1))$. The observed
outcomes can be written in this context as $Y_{Obs,i} = Y_i(1) T_i + Y_i(0)(1 - T_i)$. Then, the $n \times 1$
vector $\mathbf{Y}_{Obs}(\mathbf{T})$ collects the observed outcome values.
1.2 Exact inference with no interference
To perform exact inference without interference, we generally begin by assuming a sharp null hypothesis,
i.e. that $Y_i(1) - Y_i(0) = 0 \; \forall \; i$. In an experimental setting, we observe only $Y_i(1)$ or $Y_i(0)$
for each unit, but this null allows us to impute $Y_i^{Mis}$ as $Y_i^{Obs}$. Thus, conditional on this null, “we
can empirically create the distribution of any estimator,” $g(\mathbf{T}, \mathbf{Y}_{Obs}(\mathbf{T}))$
(Morgan & Rubin 2012). That is, we can generate the randomization distribution for any test statistic that is
a function of $\mathbf{T}$ and $\mathbf{Y}_{Obs}$ by permuting the treatment assignment vector, and using the
imputed value of $Y_i^{Mis}$ to recalculate the test statistic.
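The permutation logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `randomization_p_value`, the difference-in-means test statistic, and the Monte Carlo approximation with `n_draws` random permutations (rather than full enumeration) are choices made here, not details taken from the paper.

```python
import numpy as np

def randomization_p_value(t, y_obs, n_draws=5000, seed=0):
    """Two-sided Monte Carlo p-value for the sharp null Y_i(1) = Y_i(0) for all i.

    Hypothetical helper: the difference in means is an assumed test statistic.
    """
    rng = np.random.default_rng(seed)
    t = np.asarray(t)
    y_obs = np.asarray(y_obs, dtype=float)

    def diff_in_means(assign):
        return y_obs[assign == 1].mean() - y_obs[assign == 0].mean()

    observed = diff_in_means(t)
    # Under the sharp null the outcomes are fixed, so only the labels move.
    draws = np.array([diff_in_means(rng.permutation(t)) for _ in range(n_draws)])
    # The add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (1 + np.sum(np.abs(draws) >= abs(observed))) / (n_draws + 1)
    return observed, p_value
```

Permuting `t` keeps the number of treated units fixed, which matches a completely randomized design; a different assignment mechanism would call for a different resampling step.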
Randomization tests have several virtues. First, no distributional assumptions are required, only the
exchangeability of disturbance terms: the idea that, under the null, the observed outcomes would be similar
irrespective of the level of the treatment variable (Erikson, Pinto & Rader, 2010). Another benefit of
randomization inference is that we can assess causal claims without appealing to a model, even in small
samples. Moreover, although the sharp null hypothesis is quite strong, Ding (2014) shows that, in completely
randomized experiments, rejection of Fisher’s null can be more difficult than the rejection of Neyman’s null
(which only assumes an average treatment effect of 0). In this sense, randomization tests can be seen as
providing a conservative analysis.
Moreover, recall that we can exploit the duality between intervals and tests to form a fiducial interval
from a randomization test: we can produce an interval by finding the set of all null hypothesis values that
the observed data would fail to reject. Following Lock (2011), we can assume a constant, additive treatment
effect in the following calculations (i.e. $\mathbf{Y}(1) = \mathbf{Y}(0) + \tau$). The hypotheses are then
$$H_0 : \tau = \tau_0 \qquad H_1 : \tau \neq \tau_0, \tag{1}$$
and an α-level fiducial interval for $\tau$ consists of the set of $\tau_0$ such that the observed test statistic would not
lead to a rejection of the null hypothesis at significance level α. Here, when $\tau_0 \neq 0$, the randomization test
is conducted by first constructing
$$Y_i(0)^* = (Y_{Obs,i} - \tau_0) T_i + Y_{Obs,i} (1 - T_i).$$
Then, keeping $Y_i(0)^*$ fixed, we permute the treatment assignment vector and calculate $\mathbf{Y}^*_{Obs}$ by adding $\tau_0$ to
the treatment group outcomes under the permutation. This procedure generates a distribution of $\hat{\tau}$ under
the null $H_0 : \tau = \tau_0$. We can use this distribution to calculate a p-value for $\hat{\tau}_{Obs}$ under the null. We can also
form a $100(1 - \alpha)\%$ interval for $\tau$ by finding the values of $\tau_0$ which generate a p-value greater than or equal
to α. Garthwaite (1996) discusses an efficient algorithm for obtaining randomization-based fiducial intervals,
which searches for the interval endpoints using a procedure based on the Robbins-Monro search process.
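As a rough illustration of the test-inversion idea, the sketch below scans a grid of candidate values of τ0 and keeps those with a p-value of at least α. It is deliberately a plain grid search, not the Robbins-Monro procedure attributed to Garthwaite (1996); the function names, the difference-in-means statistic, and the grid are assumptions made for this example.

```python
import numpy as np

def p_value_for_tau0(t, y_obs, tau0, n_draws=2000, seed=0):
    """Monte Carlo p-value for H0: tau = tau0 under a constant additive effect."""
    rng = np.random.default_rng(seed)
    t = np.asarray(t)
    # Imputed control outcomes Y_i(0)* implied by the null value tau0.
    y0_star = np.asarray(y_obs, dtype=float) - tau0 * t

    def tau_hat(assign):
        y_star = y0_star + tau0 * assign          # outcomes implied by H0
        return y_star[assign == 1].mean() - y_star[assign == 0].mean()

    observed = tau_hat(t)
    draws = np.array([tau_hat(rng.permutation(t)) for _ in range(n_draws)])
    # Two-sided comparison of deviations from the hypothesised value.
    extreme = np.abs(draws - tau0) >= abs(observed - tau0)
    return (1 + extreme.sum()) / (n_draws + 1)

def fiducial_interval(t, y_obs, grid, alpha=0.05, **kwargs):
    """Smallest and largest grid values of tau0 that are not rejected at alpha."""
    kept = [tau0 for tau0 in grid
            if p_value_for_tau0(t, y_obs, tau0, **kwargs) >= alpha]
    return (min(kept), max(kept)) if kept else None
```

The resolution of the interval is limited by the grid spacing, which is the main cost of this simple approach relative to a stochastic search for the endpoints.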
2 Notation & exact inference with interference
2.1 Notation with interference
In the context of interference, we must expand our notation. Assume that there are $K$ levels of interference.
That is, the treatment status of unit $i$’s $\overbrace{\text{friends of friends of ... of friends}}^{\times K}$ may influence $i$’s outcome.
Informally, we can denote unit $i$’s $\overbrace{\text{friends of friends of ... of friends}}^{\times k}$ as this unit’s k-th order friends (or the unit’s
friends of depth $k$). Let $T_i^k$ denote the binary indicator for whether unit $i$ has at least one friend of depth
$k$ who receives treatment ($T_i^k = 1$) or none ($T_i^k = 0$). For example, $\mathbf{T}^0$ denotes the traditional treatment assignment vector, with $T_i^0 = 0$
when unit $i$ is in the control group, and $T_i^0 = 1$ when unit $i$ is in the treated group. In a similar vein, $T_i^1 = 1$
if unit $i$ has a treated friend, and $T_i^1 = 0$ if unit $i$ has no treated friends. In this fashion, $\mathbf{T}^k$ denotes the full
binary vector for the k-th spillover level.
The potential outcomes have to be rewritten to account for the influence of spillovers. In particular, the
potential outcome for unit $i$ is now a function of the treatment status of all units connected to $i$ up to depth
$K$. If $K = 1$, there are $2^{(1+1)} = 4$ treatment options to consider: $T_i^0 T_i^1 \in \{00, 01, 10, 11\}$ (see
Table 1). As a result, $\mathbf{Y}_{n \times 4} = (\mathbf{Y}(0, 0), \mathbf{Y}(0, 1), \mathbf{Y}(1, 0), \mathbf{Y}(1, 1))$. If $K = 2$, we have eight treatment options
(see Table 2). In general, if the intervention has 2 levels, we implicitly obtain a $2^{K+1}$ factorial experiment
while performing inference because there are $2^{K+1}$ distinct “treatment” combinations assuming $K$ levels of
spillover. Lastly, let $\mathbf{X}$ denote pre-treatment covariates. It is important to note that, if $\mathbf{T}^0$ is randomly assigned such that
$\mathbf{T}^0 \perp \mathbf{Y}, \mathbf{X}$, then it follows that $\mathbf{T}^k \perp \mathbf{Y}, \mathbf{X}$ for $k > 0$ as well. This result follows
from the fact that, given the undirected network, $\mathbf{T}^k$ is a function of $\mathbf{T}^0$ only, and is thus independent of
the potential outcomes.
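To make the indicators concrete, the following sketch computes T^k for every unit from an adjacency matrix and a treatment vector. It rests on one interpretive assumption that the text does not settle: a "friend of depth k" is taken to be a unit at shortest-path distance exactly k. The function name `depth_k_exposure` is hypothetical.

```python
from collections import deque
import numpy as np

def depth_k_exposure(adjacency, t0, max_depth):
    """Return an (n, max_depth + 1) 0/1 array whose column k holds T^k.

    Assumption: depth-k friends are units at shortest-path distance exactly k.
    """
    adjacency = np.asarray(adjacency)
    t0 = np.asarray(t0)
    n = len(t0)
    exposure = np.zeros((n, max_depth + 1), dtype=int)
    exposure[:, 0] = t0                      # column 0 is own treatment
    for i in range(n):
        # Breadth-first search: shortest-path distance from unit i.
        dist = {i: 0}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            if dist[u] == max_depth:
                continue                     # do not look beyond max_depth
            for v in np.flatnonzero(adjacency[u]):
                v = int(v)
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for j, d in dist.items():
            if d >= 1 and t0[j] == 1:
                exposure[i, d] = 1           # at least one treated depth-d friend
    return exposure
```

On a four-unit path network 0-1-2-3 where only unit 0 is treated, unit 1 has a treated friend and unit 2 has a treated friend of a friend, while unit 3 is unexposed at depths 1 and 2. Because every column is computed from T^0 and the fixed network alone, the sketch also mirrors the point that T^k is a function of T^0 only.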
Table 1: Treatment combinations assuming 1 level of spillover.
          Treatment indicator ($T_i^0$)   Spillover 1 indicator ($T_i^1$)
Comb. 1   0                               0
Comb. 2   0                               1
Comb. 3   1                               0
Comb. 4   1                               1
Table 2: Treatment combinations assuming 2 levels of spillover.
          Treatment indicator ($T_i^0$)   Spillover level 1 indicator ($T_i^1$)   Spillover level 2 indicator ($T_i^2$)
Comb. 1   0                               0                                       0
Comb. 2   0                               0                                       1
Comb. 3   0                               1                                       0
Comb. 4   0                               1                                       1
Comb. 5   1                               0                                       0
Comb. 6   1                               0                                       1
Comb. 7   1                               1                                       0
Comb. 8   1                               1                                       1
2.2 Exact inference with interference
2.2.1 Breakdown of exact inference
In the presence of interference, simple randomization tests are no longer valid. In a randomization test,
we randomly permute the treatment assignment vector assuming a sharp null hypothesis. This procedure
depends on the null hypothesis that the observed and unobserved entries in $\mathbf{Y}$ are the same. However,
with interference, if we randomly permute the treatment assignment vector, the sharp null approach is no
longer valid, since permuting whether unit $j$ is treated (i.e. $T_j^0$) may alter $T_i^k$ for other units $i$, and $Y_i^{Obs}$ is a function of
$T_i^0, ..., T_i^K$. In less formal terms, if we randomly permute the treatment assignment vector in the presence of
spillover, we are not only changing the treatment status of unit $i$, but also the treatment/spillover regime of
other units, since each unit’s outcome depends not only on its own $T_i^0$ but also on its $T_i^k$. Hence, Fisherian inference
breaks down because the sharp null (and exact tests) are invalidated by interference.
To see this difficulty in action, we can generate arbitrary scenarios where randomization tests will yield
faulty inferences in the presence of spillover. Consider, for example, the following data generating process
on a random (small world) network with $i \in \{1, 2, ..., 1000\}$:
$$Y_i^{Obs} \sim N\left(\mu = -0.5\,T_i^0 + 10\,T_i^1, \; \sigma^2 = 0.05\right). \tag{2}$$
Hence, there is a −0.5 treatment effect, and a 10 first-order spillover effect. However, if we ignore the spillover
effects and instead perform a naive analysis using a randomization test, we obtain misleading inferences. For
example, the estimated average treatment effect is −0.14 (not −0.5), and we fail to reject the 0 null. Thus,
in this context, we underestimate the overall average treatment effect, and fail to distinguish this effect from
0. In general, spillover effects can cause arbitrarily large levels of bias if researchers fail to account for them.
2.2.2 Repairing exact inference
As has been discussed in prior literature (see Aronow (2012) and Athey, Eckles, & Imbens (2015)), we can
still use exact inference if randomization tests are performed in a manner that preserves the sharp null. For
example, with 1 level of spillover, we can perform a randomization test comparing the outcomes of individuals
in the subclass with $T_i^0 = 0, T_i^1 = 0$ with those in the subclass with $T_j^0 = 0, T_j^1 = 1$. When comparing the two groups,
we can permute the reduced $\mathbf{T}^1$ vector because doing so would not change the $\mathbf{T}^0$ vector. In addition, we
could also compare the subclass with $T_i^0 = 1, T_i^1 = 0$ and the subclass with $T_j^0 = 0, T_j^1 = 0$, and could validly permute the
reduced $\mathbf{T}^0$ vector, since doing so would not alter the values of the reduced $\mathbf{T}^1$. In essence, we condition
on the status of all but one of the $K + 1$ indicator variables related to treatment and spillover status.

Figure 1: Ignoring spillover effects can lead to misleading estimates and faulty exact p-values. Here, the null hypothesis
is false and should be rejected. The true effect is −0.5.

With $K = 1$, we have four direct comparisons:
• Subclass $T_i^0 T_i^1 = 10$ vs. Subclass $T_i^0 T_i^1 = 00$ (immediate effect comparison);
• Subclass 11 vs. Subclass 01 (immediate effect comparison);
• Subclass 01 vs. Subclass 00 (spillover comparison, level 1);
• Subclass 11 vs. Subclass 10 (spillover comparison, level 1).
Hypotheses comparing these subgroups can be evaluated, since they maintain the sharp null within the
randomization test. For example, if $K = 2$, we can perform a randomization test to compare subclasses 101
and 001, but not subclasses 101 and 000. In the latter case, we could not distinguish the influence of $\mathbf{T}^0$ from
$\mathbf{T}^2$ while maintaining the sharp null. However, by comparing 101 and 001 in the former case, we can obtain
valid inferences about the treatment effect while maintaining a sharp null (since all units have no treated
friends and at least 1 treated friend of a friend). In sum, if we want to isolate the effect of each spillover
layer, we must compare groups that differ in exactly one element of the treatment combination.
The general formulation of this discussion is straightforward. With an arbitrary $K$, we can find the
number of legitimate pairwise comparisons that directly assess hypotheses relating to the K-th level of
spillover. In particular, we know that we can validly perform all tests comparing the outcomes of
$$\text{Subclass } T_i^0 T_i^1 T_i^2 \cdots T_i^{K-1} T_i^K = C^0 C^1 C^2 \cdots C^{K-1} 1$$
vs.
$$\text{Subclass } T_j^0 T_j^1 T_j^2 \cdots T_j^{K-1} T_j^K = C^0 C^1 C^2 \cdots C^{K-1} 0,$$
where the use of $C$ is intended to illustrate that units in both subgroups should have the same treatment/spillover regimes, except at the K-th level. These comparisons generate all the pairwise hypotheses
around the K-th spillover level. Notice that there are $2^K$ combinations of the form $T^0 T^1 T^2 \cdots T^{K-1}$. Thus,
if we evaluate every valid pairwise hypothesis around the K-th level of spillover, we must perform $2^K$ tests.
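The counting argument can be checked by enumerating the comparisons directly. In the sketch below, subclasses are written as strings of 0s and 1s ordered from T^0 to the level under study; the function name `valid_comparisons` and the string encoding are conventions chosen for this example.

```python
from itertools import product

def valid_comparisons(level):
    """Pairs of subclass labels that differ only in the indicator at `level`."""
    pairs = []
    # Each prefix fixes the values of T^0, ..., T^(level-1).
    for prefix in product("01", repeat=level):
        stem = "".join(prefix)
        pairs.append((stem + "1", stem + "0"))
    return pairs

# With one level of spillover this yields the two spillover comparisons
# listed earlier: ("01", "00") and ("11", "10").
print(valid_comparisons(1))
```

For a given level the list has 2 raised to that level entries, matching the count stated above.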
We can use these insights to estimate the orders of spillover present in a social network experiment. One
key point is the following: to perform valid randomization inference, we must in essence condition on
spillover indices up to the K-th level. If we fail to do so, we may obtain misleading inferences, as we saw
above. However, $K$ is unknown and must be selected, and this selection can be done in a principled fashion. For example, we
can sequentially evaluate all null hypotheses related to spillover level $a$, where $a = G, G - 1, ..., 1$.
Here, $G$ denotes the a priori maximum prospective level of spillover, and controls the depth of the deepest
spillover considered. In other words, we assume $G \geq K \geq 0$, where $K$ denotes the true maximum order of
non-zero spillover effects. To limit multiple testing issues, we can employ a simple Bonferroni correction,
and, for example, reject null hypotheses that yield p-values below 0.05 ÷ (total number of tests). This procedure
is one potential method of determining $K$ (although the interpretability of this method is decreasing in
$G$). Moreover, this approach also depends on the fact stated above that, if $\mathbf{T}^0$ is randomly assigned, then
$\mathbf{T}^k$ is also assigned effectively at random for $k > 0$. Hence, the randomization of $\mathbf{T}^0$ justifies the use of
randomization-based inference for the general $\mathbf{T}^k$.
To see this method in action, consider the simulated data from above, generated by the following data generating
process on a random network with $i \in \{1, 2, ..., 1000\}$:
$$Y_i^{Obs} \sim N\left(\mu = -0.5\,T_i^0 + 10\,T_i^1, \; \sigma^2 = 0.05\right). \tag{3}$$
Assuming $K = 1$, we now obtain 2 average treatment effects: one that compares 10 and 00, and another
that compares 11 and 01 (these values can easily be averaged to obtain a single “overall” treatment effect
measure). In both cases, the true effect is −0.5, and this value is recovered from the analysis. In addition, we
obtain 2 average spillover effects: one that compares 11 and 10 and another that compares 01 and 00. The
true spillover effects of 10 are correctly recovered, and the null correctly rejected. Figure 2 presents these results for
the 10 vs. 00 comparison. Analogous results hold for the comparison between 11 and 01. This example
shows how, by adjusting for spillovers, we can again obtain unbiased estimates and repair issues around
exact inference.
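A small simulation in the spirit of this example is sketched below. Several details are assumptions made purely for illustration and are not taken from the paper: the network is a ring lattice in which each unit is tied to its two nearest neighbours on each side (rather than a small-world graph), treatment is assigned independently with probability 0.2, and only point estimates are computed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Assumed network: ring lattice, two neighbours on each side of every unit.
offsets = np.array([-2, -1, 1, 2])
neighbours = (np.arange(n)[:, None] + offsets) % n

t0 = rng.binomial(1, 0.2, size=n)                     # own treatment (assumed rate)
t1 = (t0[neighbours].sum(axis=1) > 0).astype(int)     # any treated friend
y = rng.normal(-0.5 * t0 + 10 * t1, np.sqrt(0.05))    # mean structure of the DGP above

def subclass_mean(a, b):
    """Mean outcome among units with T^0 = a and T^1 = b."""
    return y[(t0 == a) & (t1 == b)].mean()

effects = {
    "10 vs 00": subclass_mean(1, 0) - subclass_mean(0, 0),   # treatment effect
    "11 vs 01": subclass_mean(1, 1) - subclass_mean(0, 1),   # treatment effect
    "01 vs 00": subclass_mean(0, 1) - subclass_mean(0, 0),   # spillover effect
    "11 vs 10": subclass_mean(1, 1) - subclass_mean(1, 0),   # spillover effect
}
# Naive contrast that ignores the spillover indicator, for comparison.
naive = y[t0 == 1].mean() - y[t0 == 0].mean()
```

Within subclasses the large spillover term is held fixed, so the subclass contrasts should sit close to −0.5 and 10, whereas the naive contrast mixes units with and without treated friends and is correspondingly noisier.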
2.2.3 A unified testing framework
From the above, we know that we have an exponentially increasing number of tests to evaluate the presence
of g-th order spillovers. With an arbitrary $G$, we must perform $2^1 + \cdots + 2^G = 2^{G+1} - 2$ total tests, giving
an α = 0.05 rejection threshold of $0.05 \div (2^{G+1} - 2)$. When $G = 10$, we have $2^{10+1} - 2 = 2046$ total tests,
and an α = 0.05 rejection threshold of 0.05 ÷ 2046 ≈ 0.000024. Clearly, this approach seems untenable in
many situations, especially when the sample size is small. We can instead employ a testing framework that
performs a single hypothesis test for each prospective level of spillover from $G$ to 1. With this alternative
approach, the α = 0.05 rejection threshold becomes 0.05/G, or 0.005 when $G = 10$. A single hypothesis test
for each prospective spillover level also enhances the interpretability of the results.
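The arithmetic above, and one way the per-level results might be turned into an estimate of K, can be sketched as follows. The decision rule in `deepest_rejected_level` (report the deepest level whose unified test is rejected at 0.05/G, or 0 if none is) is one reading of the sequential procedure and should be treated as an assumption; both function names are hypothetical.

```python
def pairwise_test_count(G):
    """Number of pairwise tests across levels 1..G: 2^1 + ... + 2^G."""
    return sum(2 ** g for g in range(1, G + 1))

def deepest_rejected_level(p_values_by_level, alpha=0.05):
    """Assumed rule: p_values_by_level[g] is the unified-test p-value for level g."""
    G = len(p_values_by_level)
    threshold = alpha / G                      # Bonferroni over G unified tests
    for level in sorted(p_values_by_level, reverse=True):   # G, G-1, ..., 1
        if p_values_by_level[level] < threshold:
            return level
    return 0                                   # no spillover level detected

print(pairwise_test_count(10))                 # 2046, i.e. 2**11 - 2
print(0.05 / pairwise_test_count(10))          # threshold with all pairwise tests
print(0.05 / 10)                               # threshold with one test per level
```

The gap between the two thresholds is the practical motivation for testing one hypothesis per level.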
Figure 2: By conditioning on each unit’s full treatment status (including spillovers), we correctly reject the null, and
obtain an unbiased estimate of the true effect. These results show the 10 vs. 00 comparison. Analogous results hold
for the comparison between subgroups 11 and 01.

The test statistic for this unified test can be constructed in the following fashion. First, assume we
are analyzing spillovers of level $g$. Then, we know from the above that we have $2^g$ subgroup comparisons, denoted generically by $\tau_1, \tau_2, ..., \tau_{2^g}$, where these $\tau$ values were formed by comparing units in subclass
$C^0 C^1 C^2 \cdots C^{g-1} 1$ and subclass $C^0 C^1 C^2 \cdots C^{g-1} 0$. We can form an estimate of the overall effect. If $n$
denotes the total sample size, and $n_s$ denotes the number of units used to estimate $\tau_s$, then
$$\overset{\bowtie}{\tau} = \frac{1}{n} \cdot \sum_{s=1}^{2^g} \tau_s \cdot n_s, \tag{4}$$
which we denote as the k-th net spillover effect (here with $k = g$), and which captures the overall effect of having at least one
friend of depth $k$ treated by integrating across all other treatment combinations. We can then perform a valid
randomization test on $\overset{\bowtie}{\tau}$ to evaluate the overall presence of g-th level spillovers. The global randomization
test is valid because each component of $\overset{\bowtie}{\tau}$ could itself be tested via a valid randomization test. In other
words, we can scramble the treatment/spillover regimes for units in two subclasses differing only in the last
spillover indicator, and iterate over all subclass comparisons to form a single test. This single test replaces
the multiple pairwise tests done on the valid subclass comparisons. In addition, because all comparisons evaluate
the potential outcomes of subclass $C^0 C^1 C^2 \cdots C^{g-1} 1$ vs. subclass $C^0 C^1 C^2 \cdots C^{g-1} 0$, no unit is “double
counted” in the randomization test. That is, each unit contributes to one and only one component of $\overset{\bowtie}{\tau}$ (i.e.
no unit contributes to both $\tau_i$ and $\tau_j$ for $i \neq j$).
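One way to compute the net spillover effect and a stratified permutation test is sketched below. The sketch makes simplifying assumptions that are not from the paper: `prefix` is a label for each unit's combination of lower-level indicators, `last` is the indicator at the level under study, the exposure labels are permuted directly within each prefix stratum, and strata lacking one of the two arms are skipped (so the weights are normalised by the number of units actually used, which equals n when every stratum has both arms).

```python
import numpy as np

def net_spillover(y, prefix, last):
    """Size-weighted average of within-stratum differences in means."""
    y, prefix, last = map(np.asarray, (y, prefix, last))
    total, n_used = 0.0, 0
    for s in np.unique(prefix):
        in_s = prefix == s
        exposed = in_s & (last == 1)
        unexposed = in_s & (last == 0)
        if exposed.any() and unexposed.any():      # skip strata with one arm only
            tau_s = y[exposed].mean() - y[unexposed].mean()
            total += tau_s * in_s.sum()            # tau_s weighted by n_s
            n_used += in_s.sum()
    return total / n_used

def net_spillover_test(y, prefix, last, n_draws=2000, seed=0):
    """Two-sided Monte Carlo p-value, permuting `last` within prefix strata."""
    rng = np.random.default_rng(seed)
    y, prefix, last = map(np.asarray, (y, prefix, last))
    observed = net_spillover(y, prefix, last)
    draws = np.empty(n_draws)
    for b in range(n_draws):
        shuffled = last.copy()
        for s in np.unique(prefix):
            idx = np.flatnonzero(prefix == s)
            shuffled[idx] = rng.permutation(last[idx])   # stay inside the stratum
        draws[b] = net_spillover(y, prefix, shuffled)
    p_value = (1 + np.sum(np.abs(draws) >= abs(observed))) / (n_draws + 1)
    return observed, p_value
```

Because each unit belongs to exactly one prefix stratum, no unit enters more than one within-stratum contrast, which mirrors the "no double counting" property described above.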
We can use this procedure in the simulated data to correctly determine that $K = 1$ after setting $G = 5$.
Recall the following data generating process on a random network:
$$Y_i^{Obs} \sim N\left(\mu = -0.5\,T_i^0 + 10\,T_i^1, \; \sigma^2 = 0.05\right). \tag{5}$$
This procedure correctly identifies $K = 1$, and recovers the first-order net spillover effect of 10.
Figure 3: This method correctly rejects the null of a 0 net spillover effect of order 1, and provides an unbiased estimate
of the true effect (10).
2.2.4 Observational data
This framework can be extended to observational data. In observational settings, we must now
assume that, conditional on confounding covariates, the observed outcomes would be similar irrespective
of the level of the treatment variable (Erikson, Pinto & Rader, 2010). In observational data, the potential
outcomes may not be independent of the treatment indicator. In the simplest formulation, $\mathbf{Y}(0), \mathbf{Y}(1) \not\perp \mathbf{T}^0$.
If we assume conditional ignorability, we can say $\mathbf{Y}(0), \mathbf{Y}(1) \perp \mathbf{T}^0 \mid \mathbf{X}$, where $\mathbf{X}$ denotes the set of
covariates that influence both $\mathbf{T}^0$ and the potential outcomes. Thus, if we assume that there is no unmeasured
confounding, we can perform a valid randomization test by estimating the $\tau$’s after conditioning on $\mathbf{X}$ at
each stage because, after conditioning, the outcomes will again be exchangeable under the null. In other
words, randomization inference breaks down if there are confounders, but we can account for this relationship
by conditioning on these variables in the randomization test. Thus, it is possible to employ this spillover
framework in an observational setting.
Simulations can again illustrate these phenomena. Adapting the earlier simulation framework, assume
$$Y_i^{Obs} \sim N\left(\mu = -0.5\,T_i^0 + 10\,T_i^1 + X_i^\top \beta, \; \sigma^2 = 0.05\right), \tag{6}$$
where the $3 \times 1$ vector $X_i$ denotes the pre-treatment confounders and the $3 \times 1$ vector $\beta$ denotes the coefficients relating these confounders to the outcome. In addition, $T_i^0$ is determined by a propensity model that is a function of $X_i$. We see
in Figure 4 that, after adapting the earlier unified procedure to analyze the k-th order net spillover effects by
first conditioning on confounders, the approach gains precision in the estimation of the net spillover effects.
Figure 4: After conditioning on confounders, our precision greatly improves in the estimation of the net spillover
effects.
2.2.5 Weakening the Constant Intensity Assumption
Earlier, we formed treatment/spillover regime subgroups while implicitly making the Constant Intensity
Assumption: the idea that, as long as at least one friend of depth $k$ is treated, the number of additional
treated depth-$k$ friends does not impact one’s potential outcomes. In some situations, this assumption may be
unrealistic, since, for example, it is plausible that norms may change more easily when 10 of one’s friends
receive treatment compared to when only 1 receives the intervention. In addition, the sharp null of the
randomization test made no distinction between units having 100 treated friends versus 1 treated friend.
We can loosen aspects of this assumption while remaining within the horizons of exact inference. For
example, we can partition the spillover space based on cut-points, and perform randomization inference on
valid combinations of these partitions, as we did earlier when defining $\overset{\bowtie}{\tau}$. Although this partitioning reduces
the interpretability of $\overset{\bowtie}{\tau}$, it can provide additional sensitivity to the inferential procedure outlined above,
and can also be used to identify non-linearities present in a social system.
To clarify this partitioning method, let $S_i^k$ denote the number of unit $i$’s k-th order friends who are
treated (and let $\mathbf{S}^k$ denote the concatenated vector). In this context, $\mathbf{T}^0$ still denotes the binary treatment
assignment vector, but when $k > 0$, $T_i^k$ now refers to a p-level factor variable, where $p$ denotes the number of
partitions made based on the cut-points of $S_i^k$. In this new framework, we still assume a sharp null between
comparison groups, but these groups differ. For example, if $K = 2$, $p = 4$, and $k > 0$, we let $T_i^k \in \{1, 2, 3, 4\}$
denote membership in subgroups with respect to $S_i^k$. In this context, we are now comparing subgroups such
as 113 and 112 instead of 111 and 110. Intuitively, the overall test for spillovers gains in sensitivity because
units in the tails of the $S_i^k$ distribution will receive greater weight than previously, when all
units received the same weight. Tail-end units receive greater relative weight because they are pooled with
fewer units in each subgroup. Hence, this procedure is more sensitive to non-linear spillover effects.
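The mapping from counts of treated depth-k friends to a p-level factor can be sketched as below. The text only says that the partition is based on cut-points; using equally spaced quantiles of the observed counts as those cut-points is an assumption made for this example, as is the function name `partition_exposure`.

```python
import numpy as np

def partition_exposure(s_k, n_levels=4):
    """Map counts of treated depth-k friends to factor levels 1..n_levels.

    Assumption: cut-points are equally spaced quantiles of the observed counts.
    """
    s_k = np.asarray(s_k)
    # Interior quantile cut-points; duplicates are dropped so bins stay distinct.
    probs = np.linspace(0, 1, n_levels + 1)[1:-1]
    cuts = np.unique(np.quantile(s_k, probs))
    # right=True places a count equal to a cut-point in the lower bin.
    return np.digitize(s_k, cuts, right=True) + 1

print(partition_exposure(np.arange(8)))   # [1 1 2 2 3 3 4 4]
```

When many units share the same count, some quantile cut-points coincide and fewer than `n_levels` distinct groups result, so the number of subclasses should be checked before running the comparisons.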
Table 3: Treatment combinations assuming 2 levels of spillover, and weakening the Constant Intensity Assumption.
$\mathbf{S}^1$ and $\mathbf{S}^2$ are partitioned into four groups based on cut-points.
Treat. Comb.   Treatment indicator ($T_i^0$)   Spillover level 1 level ($T_i^1$)   Spillover level 2 level ($T_i^2$)
Comb. 1        0                               1                                   1
Comb. 2        0                               1                                   2
Comb. 3        0                               1                                   3
Comb. 4        0                               1                                   4
Comb. 5        0                               2                                   1
Comb. 6        0                               2                                   2
Comb. 7        0                               2                                   3
Comb. 8        0                               2                                   4
Comb. 9        0                               3                                   1
Comb. 10       0                               3                                   2
Comb. 11       0                               3                                   3
Comb. 12       0                               3                                   4
Comb. 13       0                               4                                   1
Comb. 14       0                               4                                   2
Comb. 15       0                               4                                   3
Comb. 16       0                               4                                   4
Comb. 17       1                               1                                   1
Comb. 18       1                               1                                   2
Comb. 19       1                               1                                   3
Comb. 20       1                               1                                   4
Comb. 21       1                               2                                   1
Comb. 22       1                               2                                   2
Comb. 23       1                               2                                   3
Comb. 24       1                               2                                   4
Comb. 25       1                               3                                   1
Comb. 26       1                               3                                   2
Comb. 27       1                               3                                   3
Comb. 28       1                               3                                   4
Comb. 29       1                               4                                   1
Comb. 30       1                               4                                   2
Comb. 31       1                               4                                   3
Comb. 32       1                               4                                   4
To see this method in action, take an extreme case of non-linear spillovers in a social network:
$$Y_i^{Obs} \sim N\left(\mu = -0.5\,T_i^0 - 10 \cdot I\left[S_i^1 \geq \text{quantile}(\mathbf{S}^1, 0.99)\right], \; \sigma^2 = 0.05\right), \tag{7}$$
where $I[\cdot]$ denotes the indicator function. In this data-generating process, most treated units have an average
outcome of −0.5, and most control units have an average outcome of 0. However, individuals whose number
of treated friends falls at or above the 99th percentile receive a downward shock of −10 to their observed outcome. Because so few units are
affected by the spillover process, we sometimes fail to reject the null that there are no spillovers under the Constant
Intensity Assumption. However, if we loosen this assumption and partition $\mathbf{S}^1$ into groups based on its cut-points, we can more easily distinguish the spillover effect from 0. Figure 5 illustrates these results: with
the Constant Intensity Assumption, we obtain a false negative rate of 32 percent in the non-linear dataset;
with the weaker assumptions we obtain a 0 percent false negative rate. Overall, we can weaken the Constant
Intensity Assumption to account for non-linearities present in social network experiments. Essentially, we
do so by providing a more fine-grained partition of each unit’s spillover data.
Figure 5: If we weaken the Constant Intensity Assumption, we obtain greater sensitivity. Here, the null hypothesis is
false and should be rejected.
3 Real-world example - a conflict intervention
In this section, we replicate Paluck, Shepherd & Aronow (2016), which presented data from a field experiment
intended to reduce forms of school conflict. This study encouraged a small number of students to take a
public stance against bullying, and network characteristics of the student population were studied to analyze
the types of students most influential over social norms. As a result, this data is uniquely suited for examining
spillover effects, since such effects are substantively interesting because they relate to how norms are conveyed
throughout a social network. With this example, we show how our inferential procedure can identify the
structure of the “ripple effect” generated by the anti-bullying treatment.
This section will be completed after receiving the data. An application is currently on file
with the Princeton IRB.
4 Discussion
This paper has developed a unified framework for identifying spillover in social network experiments. We have
argued that spillover effects render unreliable standard techniques based on exact inference. To address these
difficulties, researchers must account for spillover effects, and must make assumptions about the maximum
level of spillover. We have discussed how this decision can be made in a principled, data-driven manner,
and hope to release our methods in a CRAN package called socialSpillovers. Lastly, by replicating Paluck,
Shepherd & Aronow (2016), we have illustrated how spillovers are substantively interesting.
Future work could address the following. First, how could randomization inference be further extended
to non-binary treatments? We attempted to address this issue by partitioning the non-binary spillover treatments based on cut-points, but further work should examine this issue in more detail. Second, Campanharo
et al. (2011), Marwan et al. (2009), Donner et al. (2010), and others have argued that time series datasets
can be mapped into complex networks, making it possible to translate time series into networks and vice
versa. Can the notion of spillover as understood in the network context be applied to the analysis of causality
in time series?
In the end, spillover effects seem to present a promising area for inquiry. On the one hand, these effects
are important for obtaining unbiased causal estimates. On the other hand, spillover effects bring insight
into how norms or information are conveyed through a network, and how social conventions might evolve in
response to an intervention. Thus, as interference brings new challenges to the study of causality, it might
also enable new discoveries.
References
[1] Aronow, P. M. A General Method for Detecting Interference Between Units in Randomized Experiments. Sociological Methods & Research 41, 1 (Feb. 2012), 3–16.
[2] Athey, S., Eckles, D., and Imbens, G. Exact P-values for Network Interference. arXiv: 1506.02084.
[3] Campanharo, A. S. L. O., Sirer, M. I., Malmgren, R. D., Ramos, F. M., and Amaral, L.
A. N. Duality between Time Series and Networks. PLoS ONE 6, 8 (Aug. 2011).
[4] Centola, D., and Baronchelli, A. The spontaneous emergence of conventions: An experimental
study of cultural evolution. Proceedings of the National Academy of Sciences 112, 7 (Feb. 2015), 1989–
1994.
[5] Ding, P. A paradox from randomization-based causal inference. ArXiv e-prints 1402 (Feb. 2014),
arXiv:1402.0142.
[6] Donner, R. V., Zou, Y., Donges, J. F., Marwan, N., and Kurths, J. Recurrence networks - a
novel paradigm for nonlinear time series analysis. New Journal of Physics 12, 3 (2010), 033025.
[7] Dwass, M. Modified Randomization Tests for Nonparametric Hypotheses. The Annals of Mathematical
Statistics 28, 1 (1957), 181–187.
[8] Edgington, E., and Onghena, P. Randomization Tests, Fourth Edition. CRC Press, Feb. 2007.
[9] Efron, B., and Tibshirani, R. J. An Introduction to the Bootstrap. CRC Press, May 1994.
[10] Erikson, R. S., Pinto, P. M., and Rader, K. T. Randomization Tests and Multi-Level Data in
U.S. State Politics. State Politics & Policy Quarterly 10, 2 (2010), 180–198.
[11] Fisher, R. A. Statistical methods for research workers, 4th ed. Oliver and Boyd, Edinburgh, 1932.
[12] Liu, J., and Li, Q. Planar Visibility Graph Network Algorithm For Two Dimensional Timeseries.
ArXiv e-prints 1411 (Nov. 2014), arXiv:1411.6438.
[13] Manly, B. F. J. Randomization, Bootstrap and Monte Carlo Methods in Biology, Third Edition. CRC
Press, Aug. 2006.
[14] Marwan, N., Donges, J. F., Zou, Y., Donner, R. V., and Kurths, J. Complex network approach
for recurrence analysis of time series. Physics Letters A 373, 46 (Nov. 2009), 4246–4254.
[15] Neyman, J. On the Two Different Aspects of the Representative Method: the Method of Stratified
Sampling and the Method of Purposive Selection. In Breakthroughs in Statistics, S. Kotz and N. L.
Johnson, Eds., Springer Series in Statistics. Springer New York, 1992, pp. 123–150. DOI: 10.1007/978-1-4612-4380-9_12.
[16] Ngamga, E. J., Nandi, A., Ramaswamy, R., Romano, M. C., Thiel, M., and Kurths, J.
Recurrence analysis of strange nonchaotic dynamics. Physical Review. E, Statistical, Nonlinear, and
Soft Matter Physics 75, 3 Pt 2 (Mar. 2007), 036222.
[17] Oakes, J. M. The (mis)estimation of neighborhood effects: causal inference for a practicable social
epidemiology. Social Science & Medicine 58, 10 (May 2004), 1929–1952.
[18] Paluck, E. L., Shepherd, H., and Aronow, P. M. Changing climates of conflict: A social network
experiment in 56 schools. Proceedings of the National Academy of Sciences 113, 3 (Jan. 2016), 566–571.
[19] Pesarin, F., and Salmaso, L. Permutation Tests for Complex Data: Theory, Applications and
Software. John Wiley & Sons, Feb. 2010.
[20] Rosenbaum, P. R. Conditional Permutation Tests and the Propensity Score in Observational Studies.
Journal of the American Statistical Association 79, 387 (1984), 565–574.
[21] Rubin, D. B. Causal Inference Using Potential Outcomes. Journal of the American Statistical Association 100, 469 (Mar. 2005), 322–331.
[22] Splawa-Neyman, J. On the Application of Probability Theory to Agricultural Experiments. Essay on
Principles. Section 9. Statistical Science 5, 4 (1990), 465–472.