Random? As if – Spatial Dependence and Instrumental Variables∗
Timm Betz †
Scott J. Cook‡
Florian M. Hollenbach§
Prepared for the 2017 Texas Methods Meeting
February 20, 2017
Abstract
Instrumental variable methods are widely used to address endogeneity concerns in research
using observational data. Yet, a specific kind of endogeneity – spatial interdependence among
outcomes – is usually ignored in these models. We show that ignoring spatial interdependence
in instrumental variable models results in biased and inconsistent estimates. We further show
that the resulting bias in the instrumental variable estimates depends on the relative spatial
structure of the dependent variable, the instrument, and the endogenous variable of interest.
We show the extent of these problems both analytically and via Monte Carlo simulation, detail
an estimation strategy that can be used to remedy these issues, and provide an application to
illustrate these differences in practice.
Key Words: Instrumental Variables, Spatial Econometrics, Two-Stage Least Squares
∗
Thanks to Patrick Brandt, Kosuke Imai, Piero Stanig, and Vera Troeger for their helpful comments. All remaining
errors are ours alone. Authors are listed in alphabetical order, equal authorship is implied.
†
Assistant Professor of Political Science, Department of Political Science, Texas A&M University, College Station,
TX 77843. Email: [email protected], URL: www.people.tamu.edu/˜timm.betz
‡
Assistant Professor of Political Science, Department of Political Science, Texas A&M University, College Station,
TX 77843. Email: [email protected], URL: scottjcook.net
§
Assistant Professor of Political Science, Department of Political Science, Texas A&M University, College Station,
TX 77843. Email: [email protected], URL: fhollenbach.org
1
Introduction
Researchers in political science are often interested in estimating the causal effect of a predictor on
an outcome. To do so, scholars increasingly exploit careful research design, using natural, field, or
laboratory experiments, in order to identify the relationship of interest. In many research settings,
however, scholars do not possess the random (or pseudorandom) assignment exploited by these
approaches. Instead, researchers rely on instrumental variable (IV) methods, which allow scholars
to obtain consistent estimates of causal effects.1 Yet, finding valid instrumental variables – those
correlated with the endogenous predictor and orthogonal to the outcome disturbances – can be
challenging. In this paper, we identify a frequent, but often ignored, potential source of violations
of these IV assumptions: spatial dependence in outcome disturbances. When disturbances are spatially clustered, even otherwise exogenous instrumental variables violate the exclusion restriction.
As a consequence, results obtained from standard IV models are asymptotically biased, producing
estimates that can be worse than those recovered by ordinary least squares (OLS).
Our results expand on the well-established finding that even mild violations of the exclusion
restriction – i.e., the use of ‘quasi-instruments’ – result in substantial bias (Bartels, 1991). We
show that given unmodeled spatial interdependence in the outcome, such violations occur with
certainty. In short, the spatial lag of the dependent variable is a function of all predictors – including own-unit values – therefore, when it is erroneously omitted all predictors are correlated
with the disturbance term, and all slope estimates are biased. This holds even when instruments
are randomly assigned, that is, even in the best case scenario for IV models. Given the ubiquity
of spatial interdependence in the social sciences (Ward and O’Loughlin, 2002; Franzese and Hays,
2007; Plümper and Neumayer, 2010), this raises substantial concerns about the use of standard IV
models with observational data.
These concerns are greater still when the instrument is not independently distributed. As in any
1
See Sovey and Green (2011) for a discussion on trends in the use of published IV applications.
omitted variables problem, the bias from unmodeled spatial interdependence (i.e., the ‘omitted’)
concentrates most acutely in predictors (i.e., the ‘included’) that parallel the spatial distribution of
the outcome. As such, instrumental variables that are spatially distributed in a similar manner to
the outcome will suffer from more severe biases. This is troubling given that many of our preferred
instruments – e.g., geographic, meteorologic, or economic variables (Ramsay, 2011; Hansford and
Gomez, 2010; Ahmed, 2012) – exhibit a clear spatial pattern. Moreover, any instrument measured
at a higher level of aggregation than the endogenous predictor ensures spatial clustering, as the
aggregate value is common to each of the lower-unit observations it nests. When instruments are
more spatially similar to the outcome than the original endogenous predictor the spatial bias will
increase in the IV model. If a large enough difference exists, this can result in estimates from IV
models with greater total bias than the original OLS estimates which motivated instrumentation in
the first place.
Fortunately, unlike with more general violations of the exclusion restriction, solutions are available to recover asymptotically unbiased IV estimates under spatial interdependence.
We detail how spatial-two stage least squares (or related GMM approaches) can be used to model
spatial-autoregressive (SAR) models with additional endogenous predictors. An attractive feature
of this model is that it nests both a standard SAR model and a standard IV model, allowing researchers to explicitly test restrictions rather than proceed by assumption. Furthermore, as it is
itself an instrumental variables approach, it should be relatively straightforward for those already
pursuing such strategies to understand and employ.2 Our simulations demonstrate that this approach consistently outperforms estimation strategies that neglect spatial interdependence – even
under conditions generally unfavorable to spatial models. As such, we strongly advocate that researchers consider spatial-two stage least squares when confronting endogenous predictors.
In the next section, we describe spatial and non-spatial sources of endogeneity. We then discuss
2
We also briefly discuss other consistent estimation methods such as eigenvector filtering with instrumental variables.
in detail how spatial interdependence creates bias in IV models, deriving conditions under which
this bias tends to be especially pronounced and comparing it to the bias in ordinary least squares.
Following that, we introduce our preferred spatial-two stage least squares approach and demonstrate its fitness in a series of Monte Carlo simulations. Finally, we show the consequences of
failing to account for spatial interdependence in IV models by replicating Ashraf and Galor (2011)
and Ramsay (2011) before concluding.
2
OLS and Multifarious Endogeneity
In order to better understand the problems that arise from neglecting spatial dependence in IV
estimation, it is useful to first clarify that unmodeled spatial interdependence is itself an omitted
variables problem. To fix concepts, consider a simple linear-additive model
y = βx + e,
(1)
where y is an N -length vector of outcomes, x the predictor, and e the disturbance. The OLS
estimator of β is obtained as the ratio of the sample covariance of x and y to the sample variance
of x,
$$\hat{\beta}_{ols} = \frac{\widehat{\mathrm{cov}}(x, y)}{\widehat{\mathrm{var}}(x)}. \tag{2}$$
Substituting the expression on the right-hand side of equation (1) for y yields the probability limit
$$\operatorname{plim}_{n \to \infty} \hat{\beta}_{ols} = \beta + \underbrace{\frac{\mathrm{cov}(x, e)}{\mathrm{var}(x)}}_{\text{endogeneity bias}}, \tag{3}$$
showing that β̂ols is asymptotically unbiased if cov(x, e) = 0, that is, if x is exogenous.3 This result
should be familiar to readers; it is presented in any introductory econometrics textbook along with
3
Here and in the following when using the term bias we refer to asymptotic bias, defined as plimn→∞ β̂ − β.
Therefore, an estimator is unbiased if plimn→∞ β̂ = β.
common sources of bias: confounding due to omitted variables, simultaneity or reverse causality,
and measurement error in the variable of interest.
We are concerned with a special case of confounding: unmodeled spatial interdependence.
Spatial, or cross-sectional, interdependence occurs when a unit’s outcome affects the choices,
actions, or decisions of other units (Kirby and Ward, 1987; Ward and O’Loughlin, 2002; Beck,
Gleditsch and Beardsley, 2006; Franzese and Hays, 2007; Plümper and Neumayer, 2010). Theories of interdependence are ubiquitous in political science: the contagion of conflict and crises,
the diffusion of policies, the spread of institutions and ideologies, deepening economic integration and resulting policy coordination all provide examples. Ignoring this spatial interdependence
when present induces both cross-sectional correlation in the residuals and, more problematically,
covariance between the predictors and the disturbances. As a consequence, effect estimates are both
inefficient and biased.
To distinguish confounding due to spatial interdependence from other sources of endogeneity
of x, we decompose the error term in equation (1) as
e = ρWy + u,
(4)
where ρ is the effect of outcomes y in surrounding units j on unit i, weighted by W – an N -by-N
connectivity matrix that identifies the relationship between units i and j.
Then, we can rewrite equation (3) as
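$$\operatorname{plim}_{n \to \infty} \hat{\beta}_{ols} = \beta + \underbrace{\rho \frac{\mathrm{cov}(x, Wy)}{\mathrm{var}(x)}}_{\text{Spatial endogeneity bias}} + \underbrace{\frac{\mathrm{cov}(x, u)}{\mathrm{var}(x)}}_{\text{Non-spatial endogeneity bias}}. \tag{5}$$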
Equation (5) separately identifies these as two potential sources of bias in the OLS estimator:
spatial and non-spatial endogeneity. First, bias can arise from spatial dependence in y. As indicated
by the second term on the right-hand side of equation (5), this bias drops out if ρ = 0; that is, when
there is no spatial dependence.4 Second, bias can result from traditional, non-spatial sources of
endogeneity of x, or correlation between x and u. This is represented by the third term in equation
(5), which drops out if cov(x, u) is zero. In what follows, we show that addressing one while
neglecting the other not only fails to recover unbiased estimates of the effect, but, in many cases,
can magnify the bias relative to ordinary least squares.
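The decomposition in equation (5) can be checked numerically. The following Python sketch is illustrative only and is not part of the original analysis: it assumes a ring-lattice W in which each unit has two neighbors (built by a hypothetical helper `ring_weights`), arbitrary parameter values (β = 2, ρ = 0.5), and a shared shock that makes x endogenous. Because y = ρWy + βx + u holds exactly in the simulated data, the OLS slope equals β plus the two sample bias terms.

```python
import numpy as np

def ring_weights(n):
    """Row-standardized ring lattice (assumed W): neighbors are the two adjacent units."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx + 1) % n] = 0.5
    W[idx, (idx - 1) % n] = 0.5
    return W

def cov(a, b):
    """Sample covariance with 1/n normalization."""
    return np.mean((a - a.mean()) * (b - b.mean()))

rng = np.random.default_rng(0)
n, beta, rho = 500, 2.0, 0.5          # illustrative values, not from the paper
W = ring_weights(n)

# Non-spatial endogeneity: x and u share a common shock.
common = rng.normal(size=n)
x = rng.normal(size=n) + common
u = rng.normal(size=n) + 0.5 * common

# Reduced form of y = rho * W y + beta * x + u.
y = np.linalg.solve(np.eye(n) - rho * W, beta * x + u)

beta_ols = cov(x, y) / cov(x, x)
spatial_term = rho * cov(x, W @ y) / cov(x, x)   # second term of equation (5)
nonspatial_term = cov(x, u) / cov(x, x)          # third term of equation (5)

print(beta_ols, beta + spatial_term + nonspatial_term)
```

The two printed numbers agree up to floating-point error, since the in-sample identity is exact; how large each term is depends on the assumed W and parameter values.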
3
Spatial Bias in 2SLS
Following Sovey and Green (2011), we introduce IV estimation using familiar notation from structural equation models, and assuming linear-additive relationships between the variables. Suppose
a suitable instrument z is available, resulting in the following system of equations:
y = βx + e,
(6)
x = γz + v.
(7)
As before, suppose that the disturbance can be decomposed as e = ρWy + u, and spatial dependence is ignored in the estimation. Then, non-spatial endogeneity arises if cov(u, v) ≠ 0 and
therefore cov(x, u) ≠ 0. We assume in the following that the variable z satisfies the usual assumptions for a valid instrument – cov(z, x) ≠ 0 and cov(z, u) = 0 – such that z is correlated with the
endogenous predictor x but uncorrelated with the error term u.
The IV estimator is obtained via two-stage least squares (2SLS): first regressing x on z, predicting x̂, and second regressing y on x̂.5 More directly, the 2SLS estimator is defined as
$$\hat{\beta}_{2sls} = \frac{\mathrm{cov}(y, z)}{\mathrm{cov}(x, z)}. \tag{8}$$
4
As we show below, cov(x, Wy) will always be non-zero, so it is only when ρ is zero that this term simplifies.
5
We focus on the 2SLS estimator. However, given the equivalence in the just-identified case, our results apply to
other instrumental variable methods as well.
Inserting the expression for y yields
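$$\begin{aligned}
\operatorname{plim}_{n \to \infty} \hat{\beta}_{2sls} &= \beta + \frac{\rho \times \mathrm{cov}(Wy, z)}{\mathrm{cov}(x, z)} + \frac{\mathrm{cov}(u, z)}{\mathrm{cov}(x, z)}, && \text{(9a)} \\
&= \beta + \underbrace{\rho \frac{\mathrm{cov}(Wy, z)}{\mathrm{cov}(x, z)}}_{\text{Spatial endogeneity bias}}, && \text{(9b)}
\end{aligned}$$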
which shows that, by assumption of using a valid instrument, 2SLS does not suffer from the non-spatial endogeneity bias of OLS (because cov(u, z) = 0 and cov(x, z) ≠ 0). This result, of course,
is well appreciated and motivates the use of 2SLS where x is suspected to be endogenous. The
instrument z, being uncorrelated with u, removes the non-spatial endogeneity bias.
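The logic of equations (9a) and (9b) can be illustrated with simulated data. This is a minimal sketch under assumptions that are not in the original text: a ring-lattice W with two neighbors per unit, a randomly assigned (i.i.d.) instrument, and illustrative values β = 2, γ = 1.5, ρ = 0.5. The in-sample 2SLS estimate equals β plus a spatial term and a sampling-noise term in cov(u, z); only the latter vanishes as the sample grows.

```python
import numpy as np

def cov(a, b):
    """Sample covariance with 1/n normalization."""
    return np.mean((a - a.mean()) * (b - b.mean()))

rng = np.random.default_rng(1)
n, beta, gamma, rho = 2000, 2.0, 1.5, 0.5   # illustrative values

# Assumed connectivity: row-standardized ring lattice.
W = np.zeros((n, n))
idx = np.arange(n)
W[idx, (idx + 1) % n] = 0.5
W[idx, (idx - 1) % n] = 0.5

z = rng.normal(size=n)                      # randomly assigned instrument
shock = rng.normal(size=n)
v = rng.normal(size=n) + shock              # first-stage error
u = rng.normal(size=n) + 0.5 * shock        # cov(u, v) != 0, so x is endogenous
x = gamma * z + v
y = np.linalg.solve(np.eye(n) - rho * W, beta * x + u)

beta_2sls = cov(y, z) / cov(x, z)
spatial_term = rho * cov(W @ y, z) / cov(x, z)   # bias term in equation (9b)
noise_term = cov(u, z) / cov(x, z)               # zero in the limit for a valid instrument

print(beta_2sls, beta + spatial_term + noise_term)
```

The two printed values coincide because the decomposition is an exact in-sample identity; the sketch does not by itself establish the sign or size of the spatial term.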
Less appreciated is that 2SLS is always biased in the presence of (ignored and hence unmodeled) spatial interdependence.6 In short, the instrument violates the exclusion restriction, because
it is related to the outcome disturbances via the omitted spatial lag. This is true even for otherwise
valid instruments, and notably even when the instrument is randomly assigned. To see why, note
that after substituting and rearranging terms, equation (6) is
y = (I − ρW)−1 [βγz + βv + u].
(10)
Pre-multiplying both sides of the expression by W obtains
Wy = W(I − ρW)−1 [βγz + βv + u],
(11)
that is, we can re-express the omitted spatial lag, Wy, in terms of weighted-z and the stochastic
terms. Finally, this can be re-expressed using an infinite series in place of the spatial multiplier
6
While the most obvious solution to address the bias from spatial interdependence may be including Wy as a
variable, this would not be sufficient, because Wy itself is endogenous in the outcome equation; see, e.g., Franzese
and Hays 2007.
(i.e., the inverse term) as
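$$\begin{aligned}
Wy &= \beta\gamma Wz + \rho\beta\gamma Wz + \rho^{2}\beta\gamma W^{2}z + \ldots + W(I - \rho W)^{-1}[\beta v + u], && \text{(12a)} \\
&= \beta\gamma Wz + \beta \sum_{k=1}^{\infty} \rho^{k+1} \frac{\mathrm{cov}(W^{k}z, z)}{\mathrm{var}(z)} + W(I - \rho W)^{-1}[\beta v + u]. && \text{(12b)}
\end{aligned}$$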
We can now substitute this expression into the definition for the spatial bias in 2SLS given in (9b)
and re-express it as
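$$\operatorname{plim}_{n \to \infty} \hat{\beta}_{2sls} - \beta = \beta\rho \frac{\mathrm{cov}(Wz, z)}{\mathrm{var}(z)} + \beta \sum_{k=1}^{\infty} \rho^{k+1} \frac{\mathrm{cov}(W^{k}z, z)}{\mathrm{var}(z)}. \tag{13}$$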
Recall that W is the connectivity matrix – e.g., contiguity, neighbors, inverse-distance, etc. – defining how y_i is related to all y_{j≠i}. As equation (13) indicates, the more z_i is related to neighboring
units z_{j≠i}, as defined by W, the greater the bias will be.
Moreover, equation (13) also shows that even when z is independently distributed, bias persists.
Under independence, the first term in equation (13) drops out, as independence implies that no
specification of W yields cov(Wz, z) ≠ 0. However, this is not true for the additional terms in
the expansion for common forms of spatial interdependence. While W is a hollow matrix – all
elements along the diagonal equal zero – this is generally not true for higher-order powers of
W, as units themselves are neighbors of their neighbors. Therefore, even under independence,
W^q z ⊥̸ z for all even values of q where W is non-triangular.7 That is, under these conditions,
cov(Wy, z) ≠ 0 by construction, because the ith element of vector W^q z is a function of z_i.
Therefore,
RESULT 1 With unmodeled spatial interdependence in the outcome, 2SLS is asymptotically biased.
7
If W is upper- or lower-triangular – e.g., spatial ties are unidirectional – the higher-order powers would
remain independent of z_i. However, socio-matrices are rarely unidirectional; instead, units affect each other.
That is, we show that any instrument that is randomly assigned is (only) first-order unbiased,
providing a lower bound on the spatial bias (under ρ ≠ 0).
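The claim that W is hollow while its even powers are not can be seen in a toy example. The sketch below is an illustration under assumed inputs, not the connectivity matrix used in the paper: four units arranged on a line with row-standardized contiguity weights, plus a one-directional (upper-triangular) variant.

```python
import numpy as np

# Assumed toy contiguity: units 0-1-2-3 on a line, ties in both directions.
A = np.zeros((4, 4))
for i in range(3):
    A[i, i + 1] = 1.0
    A[i + 1, i] = 1.0
W = A / A.sum(axis=1, keepdims=True)        # row-standardized, hollow

diag_W1 = np.diag(W)                        # all zeros: no unit is its own neighbor
diag_W2 = np.diag(W @ W)                    # positive: units are neighbors of their neighbors
diag_W3 = np.diag(W @ W @ W)                # zero again for this bipartite layout

# One-directional ties (upper-triangular W): no feedback at any order.
W_dir = np.triu(A)
diag_dir2 = np.diag(W_dir @ W_dir)

print(diag_W1, diag_W2, diag_W3, diag_dir2)
```

In this layout the i-th element of W²z loads on z_i, so W²z cannot be independent of z even when the elements of z are independent of one another, whereas the triangular variant generates no such feedback.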
However, the instruments often used in practice are not independently distributed, risking
greater bias still. Specifically, when values of z_i are correlated with z_{j≠i}, the first term in equation
(13) no longer drops out and all of the values in the second-term summation are of greater magnitude. Researchers often draw on geographic, meteorologic, or economic variables, such as natural
disasters (Ramsay, 2011), rainfall data (Hansford and Gomez, 2010), or commodity price shocks
(Ahmed, 2012), where this is likely. To illustrate, consider the use of meteorological variables (z) as
instruments for democratization (x) in models of economic development (y). Contiguous states (a
widely used W) are likely to have both similar levels of development (y) and common weather
patterns (z), where the former implies ρ > 0 and the latter implies cov(Wz, z) > 0. It is under
these conditions that the bias will be most severe; as can be seen in equation (13), the bias increases
in the strength of the interdependence in the outcome (ρ) and the strength of the spatial dependence
in the instrument (cov(Wz, z)).
RESULT 2 The more the instrument is spatially distributed like the outcome, the greater the bias.
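Results 1 and 2 can be illustrated without simulation noise. The sketch below is a finite-N approximation under assumptions not stated in the original: variables are mean zero, z = (I − ρz W)⁻¹v with i.i.d. unit-variance v, z is independent of u and v, and W is a ring lattice with two neighbors per unit. Replacing the covariances in equation (9b) by their expectations gives ρβ · tr(W(I − ρW)⁻¹Σz) / tr(Σz), where Σz is the covariance matrix of z; the function name is hypothetical.

```python
import numpy as np

def approx_2sls_spatial_bias(W, rho_y, rho_z, beta):
    """Approximate rho * cov(Wy, z) / cov(x, z) using expected cross-products."""
    n = W.shape[0]
    I = np.eye(n)
    A = np.linalg.inv(I - rho_z * W)          # z = A v
    sigma_z = A @ A.T                         # covariance of z for i.i.d. unit-variance v
    M = W @ np.linalg.inv(I - rho_y * W)      # maps beta * gamma * z into its part of Wy
    # gamma cancels between numerator and denominator.
    return rho_y * beta * np.trace(M @ sigma_z) / np.trace(sigma_z)

n = 200
W = np.zeros((n, n))                          # assumed ring-lattice connectivity
idx = np.arange(n)
W[idx, (idx + 1) % n] = 0.5
W[idx, (idx - 1) % n] = 0.5

biases = [approx_2sls_spatial_bias(W, rho_y=0.6, rho_z=r, beta=2.0)
          for r in (0.0, 0.3, 0.6)]
print(biases)
```

Under these assumptions the approximate bias is positive even when ρz = 0 (an independently distributed instrument) and rises as ρz increases, in line with the two results; other connectivity structures may give different magnitudes.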
In fact, the spatial bias induced from the instrument can exceed the spatial bias in ordinary least
squares. Consider the relative spatial bias of OLS (the left-hand side) and 2SLS (the right-hand
side):
$$\frac{\mathrm{cov}(Wy, x)}{\mathrm{var}(x)} \lessgtr \frac{\mathrm{cov}(Wy, z)}{\mathrm{cov}(x, z)}. \tag{14}$$
Re-expressing both as in equation (13) and concentrating on the first term of the expansion, this is
$$\beta(\rho + \rho^{2}) \frac{\mathrm{cov}(Wx, x)}{\mathrm{var}(x)} + \cdots \lessgtr \beta\gamma(\rho + \rho^{2}) \frac{\mathrm{cov}(Wz, z)}{\mathrm{cov}(x, z)} + \cdots, \tag{15}$$
which, since γ is the coefficient from the linear regression of x on z (so that cov(x, z) = γ var(z)), further simplifies as
$$\frac{\mathrm{cov}(Wx, x)}{\mathrm{var}(x)} + \cdots \lessgtr \frac{\mathrm{cov}(Wz, z)}{\mathrm{var}(z)} + \cdots. \tag{16}$$
Simply put, differences in the spatial distribution of the instrument and the endogenous variable
inform the relative degree of spatial bias. When the spatial distribution of the instrument is more
similar to the outcome than the endogenous variable, the spatial bias from 2SLS will be greater.8
RESULT 3 With unmodeled spatial interdependence in the outcome, the spatial bias in OLS and
2SLS diverges when x and z have different spatial distributions.
Figure 1: Spatial Distributions in IV Models (map panels: Z, X, V, Y, U)
8
Another way to see the relative bias is to note that the endogenous variable x has two components, the instrument
z and the error term v, since x = γz + v. Then, we can rewrite equation (14) as
$$\frac{\mathrm{cov}(Wy, \gamma v)}{\mathrm{var}(v)} \lessgtr 0. \tag{17}$$
If the right-hand side is negative (positive), the spatial bias from 2SLS will be greater (less) than OLS.
The intuition behind this is expressed visually in Figure 1, which displays simulated georeferenced data on the contiguous U.S. states. Readers should see this as a spatially-mapped
directed acyclic graph (DAG), where the inputs inform the spatial distributions of resulting outputs (right-to-left). Starting at the end, we see that y is positively spatially clustered; with higher
values concentrating in the Midwest and South, and lower values concentrating in the Northeast,
Mountain States, and West Coast. Itself, this is not an issue. It only becomes problematic – in
terms of efficiency or bias – when this spatial pattern is not entirely explained by the included
predictors. That is, there is no risk of spatial bias if y is conditionally independent. Here this is not
the case, as we see that u – the part of y not explained by x – is clearly spatially clustered as well;
that is, there is the risk of bias. As is apparent visually, the spatial clustering of z is greater than x
– in particular note the clustering of low values in the Mountain States and West Coast – producing
the conditions when IV estimation may increase the spatial bias. Calculating the respective biases
from equation (14) under these conditions, 2SLS (3.22) results in more than 3 times the bias of
OLS (1.027).9
This discussion highlights the importance of considering the respective spatial distributions
of the outcomes, the predictors, and the instrument. These considerations also become relevant
when choosing an instrumental variable, since different instrumental variables will have different
spatial distributions and therefore generate different amounts of bias – both absolute and relative
to OLS. The problem we identify resembles the problem of heterogeneous partial effects identified
by Dunning (2008) – except that here, the problem arises not because different components of
the endogenous variable have different partial effects on the outcome variable, but because the
endogenous variable has spatial and non-spatial components that become, relative to each other,
over- or under-weighted once the variable is instrumented with z. If the outcome variable exhibits
spatial interdependence, this mismatch in the spatial structure of x and x̂ changes the amount of
9
These simulations are intended to be illustrative, not systematic; a comprehensive set of simulated experiments is
undertaken in Section 5.
bias in OLS relative to 2SLS.
Ignoring the spatial attributes of the outcomes, predictors, and instrument not only risks greater
spatial bias in IV estimation, but unpredictable and possibly greater overall bias as well. Recall
that OLS suffers from spatial and non-spatial endogeneity bias, whereas 2SLS only suffers from
spatial endogeneity bias. 2SLS will be more biased if
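$$\left| \rho \frac{\mathrm{cov}(Wy, x)}{\mathrm{var}(x)} + \frac{\mathrm{cov}(u, x)}{\mathrm{var}(x)} \right| < \left| \rho \frac{\mathrm{cov}(Wy, z)}{\mathrm{cov}(x, z)} \right|, \tag{18}$$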
a sufficient condition for which is
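$$\left| \frac{\mathrm{cov}(x, u)}{\mathrm{var}(x)} \right| < |\rho| \left| \frac{\mathrm{cov}(Wy, z)}{\mathrm{cov}(x, z)} \right| - |\rho| \left| \frac{\mathrm{cov}(Wy, x)}{\mathrm{var}(x)} \right|. \tag{19}$$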
This expression provides a simple intuition for when 2SLS is more biased than OLS. Not surprisingly, OLS performs better when the non-spatial endogeneity of x, as indicated on the left-hand
side, is less severe – 2SLS removes this bias term, but it remains with OLS. However, OLS also
performs better when the endogenous variable, x, is spatially less clustered than the instrument, z
– even with substantial non-spatial endogeneity: The severity of the difference in the spatial biases
may be sufficiently large to surmount the gains from addressing the non-spatial endogeneity. More
problematically, since the spatial and non-spatial bias may have different directions, resolving one
of the biases may easily produce results further from the truth than resolving none.10
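The relationship between conditions (18) and (19) can be made concrete with a small helper. This is a hypothetical sketch: the function name and the numeric inputs are invented for illustration, and the three arguments stand for the bias terms ρ cov(Wy, x)/var(x), cov(u, x)/var(x), and ρ cov(Wy, z)/cov(x, z), which in practice are unknown.

```python
def compare_bias(spatial_ols, nonspatial_ols, spatial_2sls):
    """Return (condition 18, condition 19) for given bias components."""
    # Condition (18): total OLS bias is smaller in magnitude than the 2SLS spatial bias.
    cond_18 = abs(spatial_ols + nonspatial_ols) < abs(spatial_2sls)
    # Condition (19): sufficient (not necessary) version via the triangle inequality.
    cond_19 = abs(nonspatial_ols) < abs(spatial_2sls) - abs(spatial_ols)
    return cond_18, cond_19

# Offsetting biases: 2SLS is more biased although the sufficient condition fails.
offsetting = compare_bias(spatial_ols=0.5, nonspatial_ols=-0.4, spatial_2sls=0.6)
# Same-signed biases with a much more clustered instrument: both conditions hold.
clustered = compare_bias(spatial_ols=0.2, nonspatial_ols=0.1, spatial_2sls=0.6)
# Strong non-spatial endogeneity: OLS remains the more biased estimator.
strong_endog = compare_bias(spatial_ols=0.5, nonspatial_ols=0.4, spatial_2sls=0.6)
print(offsetting, clustered, strong_endog)
```

The first case mirrors the situation in which spatial and non-spatial biases have opposite signs: condition (18) holds even though condition (19) does not, which shows that (19) is sufficient but not necessary.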
These results are especially worrisome in light of the tendency to accept 2SLS estimates as
superior when they differ from the OLS estimates. As we show here, these differences can reflect
increases or decreases in the overall endogeneity bias. Absent specific knowledge on the sign and
relative size of these sources of bias, the OLS and 2SLS estimates will not even be sufficient to
10
For example, consider a special case where the endogenous variable is negatively correlated with the second-stage
error term. In this case, the (non-spatial) endogeneity of x creates downward bias in OLS. Further suppose that both x
and z are positively correlated with the spatial lag Wy, which is plausibly the case in most applications (Franzese and
Hays, 2007). It follows that the non-spatial endogeneity bias of OLS offsets the spatial endogeneity bias.
obtain bounds on the true parameter.11
In sum, with spatial interdependence in the outcomes, standard IV models yield asymptotically biased estimates. Even in the best case scenario of a randomly distributed instrument, 2SLS
is biased. When the instrument is itself spatially clustered, the bias in 2SLS increases further. In
many circumstances, 2SLS will yield more biased estimates than OLS, and there is no guarantee
that the two estimates provide bounds for the true parameter value.
4
Spatial Models with Additional Endogenous Regressors
While the literature on endogenous predictors and IV models is largely silent on the possibility
of and solutions for residual spatial autocorrelation, the spatial econometrics literature has occasionally considered contexts in which both problems are present. Early developments in spatial
modeling assumed exogenous predictors – with the exception of spatial lag of the outcome itself
– however, given that this is unlikely to be found in practice with social science data, researchers
have derived methods for estimating spatial models with additional endogenous predictors (Kelejian and Prucha, 2004; Anselin and Lozano-Gracia, 2008; Fingleton and Le Gallo, 2008). To date,
these models have neither received attention in political science, nor have they been understood as
a general solution to spatial violations of the exclusion restriction.
In short, spatial-lag models are simultaneous-equation models, as such estimation strategies
are similar to those generally found when confronting endogenous predictors (e.g., maximum likelihood, indirect least squares, 2SLS, etc.). This commonality readily permits extensions in which
both spatial simultaneity and predictor endogeneity are accounted for, as it is just a special instance
of multiple endogenous variables. That is, a spatial two-stage least squares (spatial-2SLS) model
can be estimated via 2SLS with additional instruments for the spatial lag.
The mechanics of estimation are similar to standard two-stage least squares models, so we
11
When spatial and non-spatial biases have offsetting effects – e.g., positive spatial bias and negative non-spatial
bias – OLS may be biased in one direction, 2SLS in the other, and the true parameter value lies somewhere in-between.
By contrast, if both the non-spatial and the spatial bias are in the same direction, the true parameter will be outside the
interval defined by the OLS and 2SLS estimates.
do not reintroduce those here. One important exception, however, is that while instruments for
the endogenous predictor require additional exogenous variables, instruments for the spatial lag
can typically be found from transformations to the existing data. Specifically, spatial lags of the
exogenous predictors (e.g., Wx) serve as instruments for the spatial lag of the outcome (e.g., Wy).
To see why, simply note that the reduced-form of the spatial-lag model discussed in section 2
y = (I − ρW)−1 [xβ + u],
(20)
can be re-expressed using an infinite series in place of the spatial multiplier
y = xβ + ρWxβ + ρ2 W2 xβ + . . . + (I − ρW)−1 u.
(21)
As such, spatial lags of x (and higher order powers of these spatial lags) effectively instrument for
the spatial lag of y – more simply, Wx is related to Wy just as x is related to y.
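A spatial-2SLS estimator of this kind can be sketched in a few lines. The code below is a minimal illustration, not the implementation used in the paper or in any R or Stata package: it assumes a ring-lattice W, unit-variance errors with correlation 0.5, and the instrument set [1, z, Wz, W²z] for the endogenous regressors [x, Wy]; the helper `tsls` is hypothetical.

```python
import numpy as np

def tsls(y, X, Z):
    """Generic 2SLS: project regressors X on instruments Z, then regress y on the fit."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

rng = np.random.default_rng(2)
n, beta, gamma, rho_y, rho_z, delta = 1000, 2.0, 1.5, 0.6, 0.6, 0.5  # illustrative

W = np.zeros((n, n))                       # assumed ring-lattice connectivity
idx = np.arange(n)
W[idx, (idx + 1) % n] = 0.5
W[idx, (idx - 1) % n] = 0.5
I = np.eye(n)

z = np.linalg.solve(I - rho_z * W, rng.normal(size=n))       # spatially clustered instrument
errs = rng.multivariate_normal([0.0, 0.0], [[1.0, delta], [delta, 1.0]], size=n)
u1, u2 = errs[:, 0], errs[:, 1]
x = gamma * z + u2                                           # endogenous predictor
y = np.linalg.solve(I - rho_y * W, beta * x + u1)            # spatially interdependent outcome

ones = np.ones(n)
Wz = W @ z
# Standard 2SLS ignores the spatial lag of y.
b_2sls = tsls(y, np.column_stack([ones, x]), np.column_stack([ones, z]))[1]
# Spatial 2SLS adds Wy as a regressor, instrumented by spatial lags of z.
X_s = np.column_stack([ones, x, W @ y])
Z_s = np.column_stack([ones, z, Wz, W @ Wz])
b_s2sls = tsls(y, X_s, Z_s)[1]

print(b_2sls, b_s2sls)
```

In this single draw the spatial estimator should land much closer to β = 2 than the standard 2SLS estimate; a proper assessment requires repeated draws, standard errors, and the connectivity structure relevant to the application.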
Limited and full information estimators allowing for both spatial and simultaneous systems
(e.g., non-spatial endogeneity) have been established, with Kelejian and Prucha (2004) the first to
derive formal large sample results and Fingleton and Le Gallo (2008) providing a comprehensive
set of small-sample experiments. In both, the spatial-2SLS performs well under fairly general conditions.12 Recently, Drukker, Egger and Prucha (2013) and Liu and Lee (2013) have expanded on
these to allow for both additional residual spatial error autocorrelation and/or heteroskedasticity.13
Moreover, code exists in both R and Stata to implement these methods, meaning researchers face
little constraint to utilizing them.
12
The situation is somewhat more complicated with binary outcomes, see Franzese, Hays and Cook (2016) for a
discussion on modeling spatial interdependence in discrete-choice models.
13
Note that these extensions are GMM-plus-IV. While we do not discuss this at length here, the first step is the
spatial-2SLS we present. In short, S-2SLS provides the initial, consistent estimates of the spatial interdependence in
the outcome that can then be used in the second step estimation of the error autocorrelation, with successive iteration
over both steps until convergence of the parameters is obtained.
5
Simulation
To assess the performance of commonly used methods and potential gains from our preferred
alternative, we undertake a range of Monte Carlo experiments with varying levels of spatial and
non-spatial endogeneity. In particular, the data for our simulations is generated as follows:
y = (I − ρy W)−1 [xβ + λ1 Q + u1 ]
(22a)
x = γz + λ2 Q + u2 ,
(22b)
z = (I − ρz W)−1 v, where v ∼ N (0, 1)
(22c)
where y is the outcome, x is the endogenous predictor, Q is a matrix of exogenous predictors,
W is a row-standardized connectivity matrix, and z is the instrument.14 The extent of spatial
dependence in the outcome and the instrument is given by parameters ρy and ρz , respectively,
with larger values of ρy,z resulting in greater spatial endogeneity in y and z, respectively (whereas
ρy = ρz = 0 produces the standard IV model). Non-spatial endogeneity is induced in draws of
(u1 , u2 )T ∼ N (0, Σ), where Σ is the covariance matrix of a bivariate normal random variable with
variance of 4 and correlation δ. We vary δ to induce the extent of correlation across the sources. If
δ = 0, x is exogenous and OLS (or standard spatial) models should be preferred. With non-zero δ
and non-zero ρy,z neither the assumptions of OLS nor those of 2SLS hold.
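The data-generating process in equations (22a) to (22c) can be sketched as follows. This is an illustrative reading rather than the authors' simulation code: the function names are hypothetical, the two exogenous predictors in Q are assumed to be standard normal, and the intercepts and coefficients described for the first and second stage are assumed to enter inside the brackets alongside Q.

```python
import numpy as np

def knn_weights(coords, k=5):
    """Row-standardized k-nearest-neighbor connectivity matrix with a zero diagonal."""
    n = coords.shape[0]
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a unit is never its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], nearest] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def simulate(n, rho_y, rho_z, gamma, delta, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    coords = rng.uniform(size=(n, 2))           # xy-coordinates from standard uniforms
    W = knn_weights(coords, k=5)
    I = np.eye(n)
    Q = rng.normal(size=(n, 2))                 # assumption: standard normal exogenous predictors
    z = np.linalg.solve(I - rho_z * W, rng.normal(size=n))           # equation (22c)
    sigma = 4.0 * np.array([[1.0, delta], [delta, 1.0]])             # variance 4, correlation delta
    u = rng.multivariate_normal(np.zeros(2), sigma, size=n)
    x = 2.0 + gamma * z + Q @ np.array([3.0, -2.5]) + u[:, 1]        # equation (22b)
    inner = -2.0 + beta * x + Q @ np.array([-3.0, 2.5]) + u[:, 0]
    y = np.linalg.solve(I - rho_y * W, inner)                        # equation (22a)
    return y, x, z, Q, W

y, x, z, Q, W = simulate(n=50, rho_y=0.3, rho_z=0.3, gamma=1.5, delta=0.5)
print(W.sum(axis=1)[:3], y[:3])
```

Setting `rho_y = rho_z = 0` in this sketch reduces it to a standard IV design, and `delta = 0` makes x exogenous.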
The remaining parameters {β, γ, λ1 , λ2 } are the effect of the predictors on x or y.15 We are
particularly interested in deriving an unbiased estimate of β – the effect of x on y – which we hold
constant across experiments at 2. As noted above, we are especially interested in how spatial and
14
Locations for the units are generated by twice taking N draws from a standard uniform, with the combined
results producing xy-coordinate points. Connections between the units are then generated using a k-Nearest Neighbor
algorithm with k = 5, returning a binary N -by-N matrix with each element in a row coded as 1 for the five closest
units or 0 for all others (including zeros along the diagonal).
15
In the first stage we specify the intercept as 2 and the two exogenous predictors are 3 and −2.5. In contrast, for
the second stage the intercept is −2 and the exogenous predictors are −3 and 2.5.
non-spatial endogeneity in the DGP for y affects the individual models’ estimates of β. In addition,
in the online Appendix, we consider how the estimates vary with sample sizes (N) and the strength
of the instrument (γ).
Table 1 shows the different parameter values which we use to create simulated data sets. There
are 216 different combinations of the parameters shown in Table 1. For each combination we
generate 1,000 data sets, which results in a total of 216,000 simulation runs. On each data set we
estimate the β parameter using OLS, two-stage least squares (2SLS), and our preferred method,
the spatial two-stage least squares estimator (S-2SLS).
Table 1: Varying Parameter Values for Simulation
N      ρy     ρz     γ      δ
50     0      0      0.25   -0.5
200    0.3    0.3    0.75   0
       0.6    0.6    1.5    0.5
                     2
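The size of the experimental design can be verified by enumerating the grid. The sketch below assumes that the fourth value, 2, belongs to the instrument-strength parameter γ; this assignment is inferred from the stated total of 216 combinations and is not confirmed by the table layout. The dictionary keys are hypothetical names.

```python
from itertools import product

# Parameter grid as read from Table 1 (assignment of the value 2 to gamma is an assumption).
grid = {
    "N": [50, 200],
    "rho_y": [0.0, 0.3, 0.6],
    "rho_z": [0.0, 0.3, 0.6],
    "gamma": [0.25, 0.75, 1.5, 2.0],
    "delta": [-0.5, 0.0, 0.5],
}

# One dictionary per parameter combination.
combos = [dict(zip(grid, values)) for values in product(*grid.values())]
n_runs = len(combos) * 1000          # 1,000 simulated data sets per combination

print(len(combos), n_runs)
```

Under this reading the grid contains 2 × 3 × 3 × 4 × 3 combinations, which matches the totals reported in the text.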
To evaluate the different estimation methods, we compare the performance of the three estimators based on median absolute error and their coverage probabilities across different combinations
of simulated parameters. First, let us consider the 36 different parameter combinations when δ = 0.5, i.e., when we should have reasonably strong
endogeneity bias. For 75% of those simulated parameter combinations, the spatial 2SLS method
has the smallest median absolute error. The 9 combinations in which S-2SLS does not have the smallest
median absolute error are situations where ρy = 0; nevertheless, the maximum difference in
median absolute error between 2SLS and S-2SLS for these cases is 0.013. Thus, when non-spatial
endogeneity is present and instrumental variable models may be warranted, the spatial IV model
performs better than or essentially as well as the standard 2SLS model.
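The two evaluation criteria can be computed from a set of simulated estimates as follows. This is a minimal sketch with hypothetical function names and made-up inputs; in particular, it assumes coverage is based on nominal 95% normal-approximation intervals, which the text does not specify.

```python
import numpy as np

def median_absolute_error(estimates, truth):
    """Median of |estimate - truth| across simulation draws."""
    return float(np.median(np.abs(np.asarray(estimates) - truth)))

def coverage(estimates, std_errors, truth, crit=1.96):
    """Share of draws whose interval estimate +/- crit * se contains the truth."""
    est = np.asarray(estimates)
    se = np.asarray(std_errors)
    lower, upper = est - crit * se, est + crit * se
    return float(np.mean((lower <= truth) & (truth <= upper)))

# Made-up estimates of beta = 2 from four hypothetical simulation draws.
est = [1.9, 2.1, 2.6, 2.0]
se = [0.1, 0.1, 0.2, 0.3]
medae = median_absolute_error(est, truth=2.0)
cov_rate = coverage(est, se, truth=2.0)
print(medae, cov_rate)
```

Median absolute error is less sensitive than mean squared error to the occasional extreme estimate that instrumental variable methods can produce when the instrument is weak.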
To get a more complete picture, Figure 2 shows the median absolute error for each of the three
estimators for 9 different combinations of parameters where the instrument is reasonably strong and N is
large. Specifically, we here analyze simulations when γ = 1.5 and N = 200. First, in Figure 2,
ρy increases from left to right across the x-axis in each individual plot. Second, δ (the non-spatial
endogeneity) increases across the three rows from −0.5 in the top, to 0 in the middle row, to 0.5
in the bottom row. Third, each column of plots shows the statistics for different values of ρz ,
increasing from left to right.
Figure 2: Median Absolute Error (γ = 1.5 & N = 200). Panel rows: correlation δ = −0.5, 0, 0.5; panel columns: ρz = 0, 0.3, 0.6; x-axis: ρy; y-axis: MedAE; estimation methods: s-2sls, ols, 2sls.
Several observations stand out from the plot. First, across all levels of non-spatial endogeneity
(δ), the median absolute error of the standard 2SLS model grows as ρz increases, but especially as
both ρz and ρy increase together. Second, the median absolute error of the spatial 2SLS model is
quite stable and does not vary much even with high spatial dependence in Y, Z, or both. Across all
combinations, the spatial 2SLS model performs the best in terms of the median absolute error or is
essentially on par with the best performer. Most importantly and not surprisingly, the performance
of the S-2SLS model is essentially unaffected by spatial dependence, but the model also performs
well when no spatial dependence is present.
As expected, the OLS model performs poorly when non-spatial endogeneity is present. As
we have shown analytically above, however, the bias induced by the spatial dependence in Z and
Y can be larger for 2SLS than for OLS, even in the case of strong non-spatial endogeneity. Specifically,
when ρz and ρy increase together, the bias in 2SLS can exceed that of OLS even
where strong reason exists to use instrumental variables (i.e., non-spatial endogeneity). In
addition, the top row of Figure 2 presents a very interesting situation. When the bias from
non-spatial endogeneity is negative and the bias from spatial simultaneity is positive, the performance
of OLS improves with higher spatial dependence, as the two biases work against each other. In
this case, OLS performs much better than standard 2SLS, because in some cases
removing only one type of bias is worse than ignoring both. Again, the spatial 2SLS model is unaffected
by both these problems and consistently performs better.
Figure 6 in the Appendix shows the same plots of median absolute error when the
instrument is much weaker (i.e., γ = 0.75). As is to be expected, all methods utilizing the
instrumental variable perform slightly worse in comparison to the OLS model. Yet, the overall
order in performance between the different methods does not change.
[Figure 3 plot: panel rows Correlation (δ) = −0.5, 0, 0.5; panel columns ρz = 0, 0.3, 0.6; x-axis ρy; y-axis Cov (ticks at 0.00, 0.50, 0.95); legend (Estimation Method): s-2sls, ols, 2sls.]
Figure 3: Coverage (γ = 1.5 & N = 200)
Figure 3 shows the coverage for each estimator for the same combinations of simulation parameters. Again, as above, γ = 1.5 and N = 200, so we have a strong instrument and a relatively
large number of observations. The coverage statistic measures the share of simulated data sets in which the true parameter value
falls within the 95% confidence interval of the estimator. If the estimator is perfectly calibrated, this should
be true in 95% of cases. When non-spatial endogeneity is strong (top or bottom row), the
coverage of the OLS estimator is quite bad, which is not surprising. Again, however, when spatial
and non-spatial bias go in opposite directions (top row), the coverage of OLS improves with higher
spatial dependence.
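The coverage statistic described above can be sketched as follows, assuming normal-approximation confidence intervals of the form estimate ± 1.96 × standard error. The function name, the critical value default, and the example numbers are illustrative assumptions rather than the authors' code.

```python
def coverage(estimates, std_errors, true_value, critical_value=1.96):
    """Share of replications whose confidence interval contains the true value."""
    covered = [
        b - critical_value * se <= true_value <= b + critical_value * se
        for b, se in zip(estimates, std_errors)
    ]
    return sum(covered) / len(covered)


# Hypothetical estimates and standard errors from four replications.
example_estimates = [2.1, 1.0, 2.5, 1.95]
example_std_errors = [0.1, 0.2, 0.3, 0.01]
print(coverage(example_estimates, example_std_errors, true_value=2.0))
```

An estimator can undercover either because it is biased or because its standard errors are too small, so coverage complements the median absolute error rather than duplicating it.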
The coverage of the 2SLS estimator is quite good in general; with increasing spatial dependence in Y and Z, however, the estimator undercovers. In contrast, the spatial-2SLS estimator has
very good coverage throughout and is not affected by the spatial dependence in Z or Y. In fact, the
coverage of the S-2SLS estimator for the simulation parameters shown here is consistently
close to 95%, ranging only between 92% and 96%.
We display the same plot for a weaker instrument (γ = 0.75) in the Appendix in Figure
7. Somewhat surprisingly, the coverage of the 2SLS estimator actually improves slightly with a
weaker instrument. We suspect that, while the bias declines with stronger instruments, the stronger
instrument also decreases the standard errors around the estimate, thus narrowing the confidence interval. Even with smaller error, the stronger instrument can therefore lead to worse coverage.
Again, the S-2SLS estimator generally outperforms all other methods in terms of coverage
and is not affected by spatial dependence.16
5.1 Robustness Checks - Wrong W
One potential criticism of the simulations is that we assume full knowledge of the correct spatial
network. We estimate the spatial 2SLS model based on the correct spatial connectivity matrix.
16 Whereas we analyze the simulations with relatively large numbers of observations here (N = 200), Figures 8 and 9
in the Appendix show the same plots (i.e. median absolute error and coverage) for the simulated data sets with small
samples, i.e. N = 50. Again, for those plots the instrument is rather strong (γ = 1.5). The overall conclusion about
the performance of the individual models does not change for smaller samples.
Recall that to create the spatial dependence in the data we randomly drew locations in space and
generated a k-nearest-neighbor matrix with k = 5. We then used this matrix both to generate the
data and to estimate the spatial models. Complete knowledge of the correct spatial network may,
however, be an unrealistic assumption in the real world. Here we therefore present the simulation
results assuming a completely wrong spatial network in the estimation of the spatial 2SLS model.
We do so in the following way. For the estimation of the spatial models in this simulation, we
randomly draw a second set of locations for each data point and use this second set of locations
to create our spatial weights matrix, again based on the 5 nearest neighbors. In this simulation,
the spatial network in the data generation and the assumed spatial network in the estimation are
completely independent of each other. This is likely a worst-case scenario for the spatial models,
as we hope researchers have at least some knowledge about the spatial process in the data they
analyze.
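The construction of the correct and the deliberately wrong weights matrix can be sketched as follows. This is a minimal illustration under assumptions that the text does not state: locations are drawn uniformly on the unit square, the matrix is row-standardized, and all function names are hypothetical.

```python
import math
import random


def draw_locations(n, rng):
    """Draw n random points on the unit square (an assumed location scheme)."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def knn_weights(coords, k=5):
    """k-nearest-neighbor weights matrix as a list of rows.

    Each unit is linked to its k closest other units; every link gets weight
    1/k so rows sum to one (row-standardization is an assumption here).
    """
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )[:k]
        for j in neighbors:
            W[i][j] = 1.0 / k
    return W


def shared_link_share(W_a, W_b):
    """Share of the neighbor links in W_a that also appear in W_b."""
    links_a = {(i, j) for i, row in enumerate(W_a) for j, w in enumerate(row) if w > 0}
    links_b = {(i, j) for i, row in enumerate(W_b) for j, w in enumerate(row) if w > 0}
    return len(links_a & links_b) / len(links_a)


rng = random.Random(2017)
n_units = 200
W_true = knn_weights(draw_locations(n_units, rng))   # used to generate the data
W_wrong = knn_weights(draw_locations(n_units, rng))  # used only in estimation
print(shared_link_share(W_true, W_wrong))
```

Because the second set of locations is drawn independently of the first, the two matrices share neighbor links only by chance, which is what makes this a worst-case check.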
Figures 10 and 11 in the Appendix show the same plots as above for the simulations with a
reasonably strong instrument (γ = 1.5) and a large number of observations (N = 200). We do
not discuss the results in detail here, but want to emphasize two things. First, the performance
of all methods decreases with increasing spatial dependence, especially as the spatial dependence
in both Z and Y increases together. Second, the spatial 2SLS method closely tracks the standard
2SLS model in performance. As the median absolute error of the standard 2SLS model increases,
so does the median absolute error of the S-2SLS model. Similarly, as the coverage of the 2SLS
model decreases, so does the coverage of the S-2SLS model. This shows that even if researchers
have absolutely no knowledge of the spatial network in their data and choose a spatial matrix at
random, our preferred method, the spatial-2SLS model, does not perform significantly worse than
the standard 2SLS model.
With respect to the simulations, the results indicate that if there is any risk of spatial dependence
and if we have at least some minimal knowledge of the network that defines it, spatial-2SLS
should be preferred or at least considered. Even if one is unsure whether spatial dependence
exists, the spatial-2SLS model might be the more conservative and better choice.
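To make the comparison between the two IV estimators concrete, the following sketch implements a standard 2SLS and one common construction of a spatial 2SLS, in which the spatial lag Wy is treated as a second endogenous regressor and instrumented with spatial lags of the instrument. Everything here is an assumption for illustration: the instrument set (z, Wz, W²z), the ring-shaped weights matrix, the data-generating process in `simulate`, and all function names; the authors' exact estimator and simulation design are not shown in this section.

```python
import numpy as np


def ring_weights(n, k=2):
    """Row-standardized weights linking each unit to k neighbors on either side of a ring."""
    W = np.zeros((n, n))
    for i in range(n):
        for step in range(1, k + 1):
            W[i, (i + step) % n] = 1.0
            W[i, (i - step) % n] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def two_stage_least_squares(y, X, Z):
    """2SLS coefficients for regressors X using instrument matrix Z."""
    # First stage: fitted values of every regressor given the instruments.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress y on the fitted regressors.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


def standard_2sls(y, x, z):
    """Non-spatial IV: instrument x with z. Returns [constant, beta]."""
    ones = np.ones(len(y))
    return two_stage_least_squares(y, np.column_stack([ones, x]), np.column_stack([ones, z]))


def spatial_2sls(y, x, z, W):
    """Spatial IV with Wy as an additional endogenous regressor.

    Returns [constant, beta, rho_y]. Instruments are z, Wz and W^2 z (assumed).
    """
    ones = np.ones(len(y))
    X = np.column_stack([ones, x, W @ y])
    Z = np.column_stack([ones, z, W @ z, W @ W @ z])
    return two_stage_least_squares(y, X, Z)


def simulate(W, beta=2.0, gamma=1.5, delta=0.5, rho_y=0.6, rho_z=0.6, sigma_e=1.0, seed=0):
    """Hypothetical data-generating process with spatial and non-spatial endogeneity."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    identity = np.eye(n)
    z = np.linalg.solve(identity - rho_z * W, rng.normal(size=n))  # spatially clustered instrument
    e = sigma_e * rng.normal(size=n)                               # structural error
    x = gamma * z + delta * e + rng.normal(size=n)                 # endogenous when delta != 0
    y = np.linalg.solve(identity - rho_y * W, beta * x + e)        # spatially lagged outcome
    return y, x, z


if __name__ == "__main__":
    W = ring_weights(200)
    y, x, z = simulate(W)
    print("standard 2SLS beta:", standard_2sls(y, x, z)[1])
    print("spatial 2SLS beta: ", spatial_2sls(y, x, z, W)[1])
```

Setting `rho_y=0` in `simulate` removes the spatial lag from the outcome, which gives a quick way to check that the spatial estimator does not do worse when no spatial dependence is present.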
6 Application: “Dynamics and Stagnation in the Malthusian Epoch”
In this section we illustrate how failing to account for spatial dependence when using IV models
can lead to biased results and thus an overestimation of the strength of the hypothesized relationship. In a 2011 article in the American Economic Review, Ashraf and Galor (2011) investigate
a “central hypothesis of the influential Malthusian theory”. According to Thomas R. Malthus
(1798), the main reason for stagnating incomes prior to the industrial revolution was that whenever
incomes increased, population size would rise as well, pushing living standards up against
the resource frontier and subsequently lowering them. Ergo, technological progress or the
discovery of new resources would only temporarily improve living standards but not improve lives
in the long run (Ashraf and Galor, 2011). As Ashraf and Galor (2011, 2004) outline in the introduction, their article “exploits exogenous sources of cross-country variation in land productivity and
technological levels to examine their hypothesized differential effects on population density versus income per capita during the time period 1–1500 CE”. As fundamental tests of the Malthusian
theory in pre-industrial societies, Ashraf and Galor (2011, 2009) set out to investigate two particular predictions: 1) a country’s improvement in productivity should lead to larger populations but
not to higher living standards (per capita income); and 2) countries with higher land productivity or better
technology ought to have higher population densities, but again, should not be much richer than
their less advanced counterparts.
In the empirical analysis, Ashraf and Galor (2011) use the timing of the onset of the neolithic
revolution to proxy for technological change. In line with their expectations, Ashraf and Galor
(2011) show that both the onset of the neolithic revolution and land productivity are positively
(and statistically significantly) associated with population density, yet not with income per capita.
In addition, and most importantly for our application, Ashraf and Galor (2011) provide further evidence in support of the Malthusian argument, using instrumental variables to estimate the
causal effect of technological progress on population density. The authors convincingly contend
that “prehistoric biogeographical endowments”, in particular the “availability of domesticable
species of plants and animals”, have had an important effect on technological progress and
are exogenous (Ashraf and Galor, 2011, 2029 & 2031). The authors motivate the use of the instrumental variables
primarily as a way to estimate the “causal impact of technology on population
density” (Ashraf and Galor, 2011, 2031).
As we argue here, however, the authors ignore possible spatial dependence in both the instrumental variables and the dependent variable of interest, population density.17 Both
population density and, even more so, plant and animal species are likely to be spatially clustered.
In other words, especially in prehistoric times, animals and plants are likely
to be similar in adjacent regions. Likewise, we posit that some areas of the world had
higher population density in 1000 CE than other areas, again leading to positive spatial correlation.
Figure 4 shows the spatial distribution of the dependent variable of interest (logged population
density), whereas Figure 5 shows that of the combined instrumental variables. Because there are
two instruments (prehistoric availability of plants and animals), we here plot the average of the
two. As is easily visible, both the dependent variable and the instruments are clearly spatially
clustered. As a first test of possible spatial dependence we also estimate Moran’s I based on the
residuals of the original OLS model with logged population density in 1000 CE as the dependent
variable (column 2, Table 9 in Ashraf and Galor (2011)). Based on Moran’s I we cannot
rule out spatial dependence in the residuals. For the spatial models estimated in this section we
create a spatial neighbor matrix where neighbors are defined as having contiguous borders.18
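For readers unfamiliar with the statistic, Moran's I for a vector of residuals and a weights matrix can be computed as in the following sketch. The function name and the toy contiguity matrix are hypothetical; the sketch computes only the statistic itself and not its sampling distribution, which for regression residuals requires adjusted moments.

```python
def morans_i(values, W):
    """Moran's I: (n / S0) * sum_ij w_ij d_i d_j / sum_i d_i^2, with d the deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in W)  # total weight of all links
    cross = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Toy example: four units in a row, each contiguous with its immediate neighbors.
line_contiguity = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1.0, 2.0, 3.0, 4.0], line_contiguity))    # similar neighbors: positive
print(morans_i([1.0, -1.0, 1.0, -1.0], line_contiguity))  # alternating neighbors: negative
```

Positive values indicate that neighboring units tend to have similar residuals, which is the pattern one would expect if spatial dependence in population density is left unmodeled.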
17 Because of the low number of observations when it comes to the per capita income regressions, we focus on the
models with population density as the dependent variable.
18 In our eyes this is the most conservative option. We have also replicated the results with a k=5 nearest neighbor
matrix or a row-standardized contiguous neighbor matrix.
[Figure 4 map legend: Log Pop Density, color scale from −2 to 2.]
Figure 4: This map shows the spatial distribution of logged population density, the dependent variable of interest. Gray coloring indicates no available data. As one can see, the simple visualization
of the dependent variable already indicates strong spatial clustering, which is not surprising when
it comes to population density in 1000 CE.
Table 2 shows the results of the replication analysis of models with population density in 1000
CE (Table 9 in Ashraf and Galor (2011)). Column 1 replicates the original OLS model on the restricted sample (column 2 in Table 9 in Ashraf and Galor (2011)). As a first step, column 2 in Table
2 shows the results when we estimate a spatial autoregressive model instead of the standard OLS
model. As one can see, the main coefficients of interest (technological index & land productivity)
have the same levels of significance as in the original OLS model. The point estimates, however,
are slightly smaller, indicating some upward bias due to spatial dependence.
Columns 3 & 4 replicate the instrumental variable model for population density in 1000 CE as
presented in Table 9 in Ashraf and Galor (2011). The differences in results between the original
2SLS model and the spatial 2SLS model are stark. The coefficient on technological progress (log
of technological index) in the original 2SLS model is 14.53, i.e. almost 3.5 times as large as the
coefficient estimated in the original OLS model. Ashraf and Galor (2011) argue that the difference
in estimated coefficients is “a pattern that is consistent with measurement error in the transition-timing variable and the resultant attenuation bias afflicting OLS coefficient estimates” (Ashraf and
Galor, 2011, 2031). Column 4, however, shows the results from the spatial 2SLS model. Here the
coefficient for technological progress is much smaller than in the standard 2SLS model. In
fact, the coefficient estimate of technological progress in the spatial 2SLS model is comparable to
that in the original OLS model. Recall that, as we show above, the non-spatial and spatial biases in
OLS can be somewhat offsetting. This may be the case here. If the non-spatial measurement bias
is attenuating and the spatial bias is upward, the OLS model ends up being less biased than the
2SLS model because the two biases push the coefficient estimate in opposite directions.

[Figure 5 map legend: Instruments, color scale from 5 to 20.]
Figure 5: This map shows how the instrumental variables vary across space. Since Ashraf and
Galor (2011) include two instruments (prehistoric availability of plants and animals) we here plot
the mean of both. The map clearly indicates spatial clustering, as would be expected when it comes
to plant and animal species. Again, gray coloring indicates missing data.
Table 3 in the Appendix shows the results when we replicate the models with population density
in 1 CE as the dependent variable. The overall results are the same. Again, the estimate of the
technological progress coefficient in the 2SLS model is almost three times as large as the OLS
estimate. And again, the coefficient estimate of the spatial 2SLS model is much smaller; in fact, it
is a bit smaller than the OLS estimate.

Table 2: Replication of Table 9 (1000 CE) in Ashraf and Galor (2011)

                      (1)            (2)         (3)             (4)
                      Original OLS   SAR         Original 2SLS   S-2SLS
ln CEtech1K           4.198∗∗∗       2.856∗∗∗    14.53∗∗∗        4.303∗∗∗
                      (1.164)        (0.953)     (4.437)         (1.328)
pc lnar lnas          0.498∗∗∗       0.397∗∗∗    0.572∗∗∗        0.397∗∗∗
                      (0.139)        (0.0963)    (0.148)         (0.0987)
ln abslat             -0.185         -0.0934     -0.209          -0.0861
                      (0.151)        (0.106)     (0.209)         (0.108)
distcr1000            -0.363         -0.341      -1.155∗         -0.462
                      (0.426)        (0.360)     (0.640)         (0.368)
land100cr             0.442          0.472       0.153           0.431
                      (0.422)        (0.341)     (0.606)         (0.344)
Constant              -1.820∗∗∗      -1.286∗∗    -5.507∗∗∗       -1.796∗∗∗
                      (0.641)        (0.531)     (1.702)         (0.630)
Spatial ρ                            0.151∗∗∗                    0.169∗∗∗
                                     (0.0246)                    (0.0334)
Observations          92             92          92              92
Continent dummies     Yes            Yes         Yes             Yes

Standard errors in parentheses
∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01
We want to emphasize that the overall conclusion of Ashraf and Galor (2011) still stands. The
Malthusian theory for pre-industrial times is clearly supported by these data. On the other hand,
the causal effect of technological progress on population density is in fact much smaller than the
standard 2SLS model indicates and is about the same size as the original estimates in the OLS models
in Ashraf and Galor (2011).
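The relative magnitudes discussed in this section can be checked directly from the reported coefficients on the technology index. The short sketch below uses only the point estimates from Tables 2 and 3; the dictionary layout and function name are illustrative.

```python
# Coefficients on the technology index, copied from Tables 2 and 3.
tech_coefficients = {
    "1000 CE": {"ols": 4.198, "sar": 2.856, "2sls": 14.53, "s2sls": 4.303},
    "1 CE": {"ols": 3.947, "sar": 3.369, "2sls": 10.80, "s2sls": 3.010},
}


def ratio_to_ols(coefs):
    """Each estimate expressed as a multiple of the OLS estimate."""
    return {name: value / coefs["ols"] for name, value in coefs.items()}


for period, coefs in tech_coefficients.items():
    ratios = ratio_to_ols(coefs)
    print(period, {name: round(r, 2) for name, r in ratios.items()})
```

For 1000 CE the standard 2SLS estimate is roughly 3.5 times the OLS estimate, while the S-2SLS estimate is within a few percent of it; for 1 CE the corresponding figures are roughly 2.7 times and about three quarters.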
7 Application: Revisiting the Resource Curse: Natural Disasters, the Price of Oil, and Democracy
** Coming soon **
8 Conclusion
IV models have seen increased use in political science over the last several years, with researchers
attempting to minimize the threat of endogeneity and increase the accuracy of estimates of causal
effects. Few, however, seem aware of or attempt to account for spatial dependence in addition to
these traditional endogeneity concerns. In this paper, we show that failing to account for spatial
interdependence in instrumental variable models not only results in inconsistent estimates, but
may also increase the bias compared to the simple OLS model. Moreover, this offsetting (or increased) bias is likely even for otherwise exogenous instruments such as climatic patterns. We therefore
suggest that researchers prefer a spatial two-stage least squares estimator, which guards
against both spatial and non-spatial endogeneity. Our simulated experiments provide evidence that
this estimator performs well across a variety of situations, including contexts where only spatial or
only non-spatial endogeneity is present, as it nests both models.
In addition, there are several other important implications for researchers. First, when the
biases have opposing effects – e.g., positive spatial bias and negative non-spatial bias – OLS may
be biased in one direction and 2SLS in the other, while the true parameter value lies somewhere in between.
By contrast, if both the non-spatial and the spatial bias are in the same direction, the true parameter
will be outside the interval defined by the OLS and 2SLS estimates. Thus, absent specific knowledge of the sign and relative size of these sources of bias, the OLS and 2SLS estimates will not be
sufficient to obtain bounds on the true parameter.
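The bounding argument can be written out in a stylized way. Suppose, purely as an illustrative assumption, that the probability limits of the two estimators decompose additively, with B_ns denoting the non-spatial endogeneity bias in OLS (removed by a valid instrument) and B_s^OLS and B_s^2SLS denoting the biases from the omitted spatial dependence. These symbols are introduced here for exposition and are not the notation of the analytical results earlier in the paper.

```latex
\begin{align*}
\operatorname{plim}\,\hat{\beta}_{OLS}  &= \beta + B_{ns} + B_{s}^{OLS}, \\
\operatorname{plim}\,\hat{\beta}_{2SLS} &= \beta + B_{s}^{2SLS}.
\end{align*}
% Opposing signs: the two estimates bracket the true value.
\[
B_{ns} + B_{s}^{OLS} < 0 < B_{s}^{2SLS}
\;\Longrightarrow\;
\operatorname{plim}\,\hat{\beta}_{OLS} < \beta < \operatorname{plim}\,\hat{\beta}_{2SLS}.
\]
% Same sign: both estimates lie on the same side of the true value.
\[
B_{ns} + B_{s}^{OLS} > 0 \ \text{and}\ B_{s}^{2SLS} > 0
\;\Longrightarrow\;
\beta < \min\bigl(\operatorname{plim}\,\hat{\beta}_{OLS},\ \operatorname{plim}\,\hat{\beta}_{2SLS}\bigr).
\]
```

Because the signs and magnitudes of these terms are generally unknown, observing the two estimates alone does not reveal which case applies.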
Second, when x and z differ in their spatial distribution it not only results in bias but fundamentally changes the estimand one is able to recover. This problem resembles the problem of
heterogeneous partial effects as identified by Dunning (2008) – except that here, the problem arises
not because different components of the endogenous variable have different partial effects on the
outcome variable, but because the endogenous variable has spatial and non-spatial components
that become, relatively, over- or under-weighted once the variable is instrumented with z. Recall
that Angrist and Imbens (1994) show that 2SLS recovers the local average treatment effect, i.e.
the estimated effect is based on those observations where the instrument has power. This may be
especially important where the instrument’s power is very geographically concentrated, e.g. using
oil price shocks as an instrument for economic growth has very specific geographic implications
for the local average treatment effect.19
Lastly, we believe the problem we have identified in this paper may be more frequent than
one might expect. The reason is a particular type of file drawer problem. As we have
shown above, when the bias caused by spatial dependence is positive and that from non-spatial
endogeneity is negative, using simple 2SLS models reduces one of these biases but not the
other, thus leading to more biased results and potentially larger estimated effect sizes. This seems
to be the case in both our applications. We believe researchers are likely to include
estimates from an IV model in their papers if the results from the IV model are stronger than, or at least
as strong as, the OLS results. Thus, due to this selection effect, IV models presented in papers may
exhibit these problems at a higher rate than one might expect.
19 For discussions on a similar point see Ratkovic and Shiraito (2014).
9 Appendix
[Figure 6 plot: panel rows Correlation (δ) = −0.5, 0, 0.5; panel columns ρz = 0, 0.3, 0.6; x-axis ρy; y-axis MedAE (ticks at 0.0, 0.3, 0.6, 0.9); legend (Estimation Method): s-2sls, ols, 2sls.]
Figure 6: Median Absolute Error over δ (γ = 0.75 & N = 200)
[Figure 7 plot: panel rows Correlation (δ) = −0.5, 0, 0.5; panel columns ρz = 0, 0.3, 0.6; x-axis ρy; y-axis Cov (ticks at 0.00, 0.50, 0.95); legend (Estimation Method): s-2sls, ols, 2sls.]
Figure 7: Coverage over δ (γ = 0.75 & N = 200)
[Figure 8 plot: panel rows Correlation (δ) = −0.5, 0, 0.5; panel columns ρz = 0, 0.3, 0.6; x-axis ρy; y-axis MedAE (ticks at 0.0, 0.3, 0.6, 0.9); legend (Estimation Method): s-2sls, ols, 2sls.]
Figure 8: Median Absolute Error over δ (γ = 1.5 & N = 50)
[Figure 9 plot: panel rows Correlation (δ) = −0.5, 0, 0.5; panel columns ρz = 0, 0.3, 0.6; x-axis ρy; y-axis Cov (ticks at 0.00, 0.50, 0.95); legend (Estimation Method): s-2sls, ols, 2sls.]
Figure 9: Coverage over δ (γ = 1.5 & N = 50)
[Figure 10 plot: panel rows Correlation (δ) = −0.5, 0, 0.5; panel columns ρz = 0, 0.3, 0.6; x-axis ρy; y-axis MedAE (ticks at 0.0, 0.3, 0.6, 0.9); legend (Estimation Method): s-2sls, ols, 2sls.]
Figure 10: Median Absolute Error over δ (γ = 1.5 & N = 200) - wrong W
[Figure 11 plot: panel rows Correlation (δ) = −0.5, 0, 0.5; panel columns ρz = 0, 0.3, 0.6; x-axis ρy; y-axis Cov (ticks at 0.00, 0.50, 0.95); legend (Estimation Method): s-2sls, ols, 2sls.]
Figure 11: Coverage over δ (γ = 1.5 & N = 200) - wrong W
Table 3: Replication of Table 9 (1 CE) in Ashraf and Galor (2011)

                      (1)            (2)         (3)             (4)
                      Original OLS   SAR         Original 2SLS   S-2SLS
ln CEtech0            3.947∗∗∗       3.369∗∗∗    10.80∗∗∗        3.010∗∗∗
                      (0.983)        (0.760)     (2.857)         (0.978)
pc lnar lnas          0.350∗∗        0.311∗∗∗    0.464∗∗         0.294∗∗∗
                      (0.172)        (0.106)     (0.182)         (0.105)
ln abslat             0.0834         -0.0152     -0.0521         -0.0505
                      (0.170)        (0.115)     (0.214)         (0.114)
distcr1000            -0.625         -0.300      -0.616          -0.175
                      (0.434)        (0.394)     (0.834)         (0.388)
land100cr             0.146          0.0986      -0.172          0.0867
                      (0.424)        (0.357)     (0.642)         (0.351)
Constant              -2.719∗∗∗      -1.749∗∗∗   -4.770∗∗∗       -1.334∗∗
                      (0.601)        (0.500)     (0.980)         (0.544)
Spatial λ                            0.182∗∗∗                    0.252∗∗∗
                                     (0.0275)                    (0.0358)
Observations          83             83          83              83
Continent dummies     Yes            Yes         Yes             Yes

Standard errors in parentheses
∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01
References
Ahmed, Faisal Z. 2012. “The Perils of Unearned Foreign Income: Aid, Remittances, and Government Corruption.” American Political Science Review 106(1):146–165.
Angrist, Joshua D. and Guido W. Imbens. 1994. “Identification and estimation of local average
treatment effects.” Econometrica 62(2):467–476.
Anselin, Luc and Nancy Lozano-Gracia. 2008. “Errors in variables and spatial effects in hedonic
house price models of ambient air quality.” Empirical Economics 34(1):5–34.
Ashraf, Quamrul and Oded Galor. 2011. “Dynamics and Stagnation in the Malthusian Epoch.”
American Economic Review 101(5):2003–2041.
Bartels, Larry M. 1991. “Instrumental and ‘Quasi-Instrumental’ Variables.” American Journal of
Political Science 35(3):777–800.
Beck, Nathaniel, Kristian Skrede Gleditsch and Kyle Beardsley. 2006. “Space is more than geography: Using spatial econometrics in the study of political economy.” International Studies
Quarterly 50(1):27–44.
Drukker, David M, Peter Egger and Ingmar R Prucha. 2013. “On two-step estimation of a spatial
autoregressive model with autoregressive disturbances and endogenous regressors.” Econometric Reviews 32(5-6):686–733.
Dunning, Thad. 2008. “Model Specification in Instrumental-Variables Regression.” Political Analysis 16(3):290–302.
Fingleton, Bernard and Julie Le Gallo. 2008. “Estimating spatial models with endogenous variables, a spatial lag and spatially dependent disturbances: Finite sample properties.” Papers in
Regional Science 87(3):319–339.
Franzese, Robert J. Jr. and Jude C. Hays. 2007. “Models of Cross-Sectional Interdependence in
Political Science Panel and Time-Series-Cross-Section Data.” Political Analysis 15(2):140–164.
Franzese, Robert J., Jude C. Hays and Scott J. Cook. 2016. “Spatial- and Spatiotemporal-Autoregressive
Probit Models of Interdependent Binary Outcomes.” Political Science Research
and Methods 4(1):151–173.
Hansford, Thomas G. and Brad T. Gomez. 2010. “Estimating the Electoral Effects of Voter
Turnout.” American Political Science Review 104(02):268–288.
Kelejian, Harry H and Ingmar R Prucha. 2004. “Estimation of simultaneous systems of spatially
interrelated cross sectional equations.” Journal of Econometrics 118(1):27–50.
Kirby, Andrew M. and Michael D. Ward. 1987. “The Spatial Analysis of Peace and War.” Comparative Political Studies 20(3):293–313.
Liu, Xiaodong and Lung-Fei Lee. 2013. “Two-stage least squares estimation of spatial autoregressive models with endogenous regressors and many instruments.” Econometric Reviews 32(5-6):734–753.
Plümper, Thomas and Eric Neumayer. 2010. “Model Specification in the Analysis of Spatial
Dependence.” European Journal of Political Research 49(3):418–442.
Ramsay, Kristopher W. 2011. “Cheap Talk Diplomacy, Voluntary Negotiations, and Variable Bargaining Power.” International Studies Quarterly 55(4):1003–1023.
Ratkovic, Marc and Yuki Shiraito. 2014. “Strengthening Weak Instruments by Modeling Compliance.” Working Paper.
Sovey, Allison J. and Donald P. Green. 2011. “Instrumental Variables Estimation in Political
Science: A Readers’ Guide.” American Journal of Political Science 55(1):188–200.
Ward, Michael D. and John O’Loughlin. 2002. “Spatial Processes and Political Methodology:
Introduction to the Special Issue.” Political Analysis 10(3):211–216.
URL: http://pan.oxfordjournals.org/content/10/3/211.short