Random? As if – Spatial Dependence and Instrumental Variables∗

Timm Betz†, Scott J. Cook‡, Florian M. Hollenbach§

Prepared for the 2017 Texas Methods Meeting
February 20, 2017

Abstract

Instrumental variable methods are widely used to address endogeneity concerns in research using observational data. Yet, a specific kind of endogeneity – spatial interdependence among outcomes – is usually ignored in these models. We show that ignoring spatial interdependence in instrumental variable models results in biased and inconsistent estimates. We further show that the resulting bias in the instrumental variable estimates depends on the relative spatial structure of the dependent variable, the instrument, and the endogenous variable of interest. We show the extent of these problems both analytically and via Monte Carlo simulation, detail an estimation strategy that can be used to remedy these issues, and provide an application to illustrate these differences in practice.

Key Words: Instrumental Variables, Spatial Econometrics, Two-Stage Least Squares

∗ Thanks to Patrick Brandt, Kosuke Imai, Piero Stanig, and Vera Troeger for their helpful comments. All remaining errors are ours alone. Authors are listed in alphabetical order; equal authorship is implied.
† Assistant Professor of Political Science, Department of Political Science, Texas A&M University, College Station, TX 77843. Email: [email protected], URL: www.people.tamu.edu/~timm.betz
‡ Assistant Professor of Political Science, Department of Political Science, Texas A&M University, College Station, TX 77843. Email: [email protected], URL: scottjcook.net
§ Assistant Professor of Political Science, Department of Political Science, Texas A&M University, College Station, TX 77843. Email: [email protected], URL: fhollenbach.org

1 Introduction

Researchers in political science are often interested in estimating the causal effect of a predictor on an outcome.
To do so, scholars increasingly exploit careful research design, using natural, field, or laboratory experiments, in order to identify the relationship of interest. In many research settings, however, scholars do not possess the random (or pseudorandom) assignment exploited by these approaches. Instead, researchers rely on instrumental variable (IV) methods, which allow scholars to obtain consistent estimates of causal effects.1 Yet, finding valid instrumental variables – those correlated with the endogenous predictor and orthogonal to the outcome disturbances – can be challenging.

In this paper, we identify a frequent, but often ignored, source of violations of these IV assumptions: spatial dependence in outcome disturbances. When disturbances are spatially clustered, even otherwise exogenous instrumental variables violate the exclusion restriction. As a consequence, results obtained from standard IV models are asymptotically biased, producing estimates that can be worse than those recovered by ordinary least squares (OLS).

Our results expand on the well-established finding that even mild violations of the exclusion restriction – i.e., the use of ‘quasi-instruments’ – result in substantial bias (Bartels, 1991). We show that given unmodeled spatial interdependence in the outcome, such violations occur with certainty. In short, the spatial lag of the dependent variable is a function of all predictors, including own-unit values. Therefore, when it is erroneously omitted, all predictors are correlated with the disturbance term, and all slope estimates are biased. This holds even when instruments are randomly assigned, that is, even in the best case scenario for IV models. Given the ubiquity of spatial interdependence in the social sciences (Ward and O’Loughlin, 2002; Franzese and Hays, 2007; Plümper and Neumayer, 2010), this raises substantial concerns about the use of standard IV models with observational data.
These concerns are greater still when the instrument is not independently distributed. As in any omitted variables problem, the bias from unmodeled spatial interdependence (i.e., the ‘omitted’) concentrates most acutely in predictors (i.e., the ‘included’) that parallel the spatial distribution of the outcome. As such, instrumental variables that are spatially distributed in a similar manner to the outcome will suffer from more severe biases. This is troubling given that many of our preferred instruments – e.g., geographic, meteorologic, or economic variables (Ramsay, 2011; Hansford and Gomez, 2010; Ahmed, 2012) – exhibit a clear spatial pattern. Moreover, any instrument measured at a higher level of aggregation than the endogenous predictor ensures spatial clustering, as the aggregate value is common to each of the lower-unit observations it nests.

1 See Sovey and Green (2011) for a discussion of trends in published IV applications.

When instruments are more spatially similar to the outcome than the original endogenous predictor, the spatial bias will increase in the IV model. If a large enough difference exists, this can result in estimates from IV models with greater total bias than the original OLS estimates which motivated instrumentation in the first place.

Fortunately, unlike with more general violations of the exclusion restriction, solutions are available to recover asymptotically unbiased IV estimates under spatial interdependence. We detail how spatial-two stage least squares (or related GMM approaches) can be used to estimate spatial-autoregressive (SAR) models with additional endogenous predictors. An attractive feature of this model is that it nests both a standard SAR model and a standard IV model, allowing researchers to explicitly test restrictions rather than proceed by assumption.
Furthermore, as it is itself an instrumental variables approach, it should be relatively straightforward for those already pursuing such strategies to understand and employ.2 Our simulations demonstrate that this approach consistently outperforms estimation strategies that neglect spatial interdependence – even under conditions generally unfavorable to spatial models. As such, we strongly advocate that researchers consider spatial-two stage least squares when confronting endogenous predictors.

In the next section, we describe spatial and non-spatial sources of endogeneity. We then discuss in detail how spatial interdependence creates bias in IV models, deriving conditions under which this bias tends to be especially pronounced and comparing it to the bias in ordinary least squares. Following that, we introduce our preferred spatial-two stage least squares approach and demonstrate its fitness in a series of Monte Carlo simulations. Finally, we show the consequences of failing to account for spatial interdependence in IV models by replicating Ashraf and Galor (2011) and Ramsay (2011) before concluding.

2 We also briefly discuss other consistent estimation methods such as eigenvector filtering with instrumental variables.

2 OLS and Multifarious Endogeneity

In order to better understand the problems that arise from neglecting spatial dependence in IV estimation, it is useful to first clarify that unmodeled spatial interdependence is itself an omitted variables problem. To fix concepts, consider a simple linear-additive model

$$ y = \beta x + e, \tag{1} $$

where $y$ is an $N$-length vector of outcomes, $x$ the predictor, and $e$ the disturbance. The OLS estimator of $\beta$ is obtained as the ratio of the sample covariance of $x$ and $y$ to the sample variance of $x$,

$$ \hat{\beta}_{ols} = \frac{\widehat{\operatorname{cov}}(x, y)}{\widehat{\operatorname{var}}(x)}. \tag{2} $$

Substituting the expression on the right-hand side of equation (1) for $y$ yields the probability limit

$$ \operatorname{plim}_{n \to \infty} \hat{\beta}_{ols} = \beta + \underbrace{\frac{\operatorname{cov}(x, e)}{\operatorname{var}(x)}}_{\text{endogeneity bias}}, \tag{3} $$

showing that $\hat{\beta}_{ols}$ is asymptotically unbiased if $\operatorname{cov}(x, e) = 0$, that is, if $x$ is exogenous.3 This result should be familiar to readers; it is presented in any introductory econometrics textbook along with common sources of bias: confounding due to omitted variables, simultaneity or reverse causality, and measurement error in the variable of interest.

3 Here and in the following, when using the term bias we refer to asymptotic bias, defined as $\operatorname{plim}_{n \to \infty} \hat{\beta} - \beta$. Therefore, an estimator is unbiased if $\operatorname{plim}_{n \to \infty} \hat{\beta} = \beta$.

We are concerned with a special case of confounding: unmodeled spatial interdependence. Spatial, or cross-sectional, interdependence occurs when a unit’s outcome affects the choices, actions, or decisions of other units (Kirby and Ward, 1987; Ward and O’Loughlin, 2002; Beck, Gleditsch and Beardsley, 2006; Franzese and Hays, 2007; Plümper and Neumayer, 2010). Theories of interdependence are ubiquitous in political science: the contagion of conflict and crises, the diffusion of policies, the spread of institutions and ideologies, deepening economic integration and resulting policy coordination all provide examples. Ignoring this spatial interdependence when present induces both cross-sectional correlation in the residuals and, more problematically, covariance between the predictors and the disturbances. As a consequence, effect estimates are both inefficient and biased.

To distinguish confounding due to spatial interdependence from other sources of endogeneity of $x$, we decompose the error term in equation (1) as

$$ e = \rho \mathbf{W} y + u, \tag{4} $$

where $\rho$ is the effect of outcomes $y$ in surrounding units $j$ on unit $i$, weighted by $\mathbf{W}$, an $N$-by-$N$ connectivity matrix that identifies the relationship between units $i$ and $j$. Then, we can rewrite equation (3) as

$$ \operatorname{plim}_{n \to \infty} \hat{\beta}_{ols} = \beta + \underbrace{\rho \, \frac{\operatorname{cov}(x, \mathbf{W} y)}{\operatorname{var}(x)}}_{\text{Spatial endogeneity bias}} + \underbrace{\frac{\operatorname{cov}(x, u)}{\operatorname{var}(x)}}_{\text{Non-spatial endogeneity bias}}. \tag{5} $$

Equation (5) separately identifies two potential sources of bias in the OLS estimator: spatial and non-spatial endogeneity. First, bias can arise from spatial dependence in $y$. As indicated by the second term on the right-hand side of equation (5), this bias drops out if $\rho = 0$; that is, when there is no spatial dependence.4 Second, bias can result from traditional, non-spatial sources of endogeneity of $x$, or correlation between $x$ and $u$. This is represented by the third term in equation (5), which drops out if $\operatorname{cov}(x, u)$ is zero. In what follows, we show that addressing one while neglecting the other not only fails to recover unbiased estimates of the effect, but, in many cases, can magnify the bias relative to ordinary least squares.

3 Spatial Bias in 2SLS

Following Sovey and Green (2011), we introduce IV estimation using familiar notation from structural equation models, and assuming linear-additive relationships between the variables. Suppose a suitable instrument $z$ is available, resulting in the following system of equations:

$$ y = \beta x + e, \tag{6} $$

$$ x = \gamma z + v. \tag{7} $$

As before, suppose that the disturbance can be decomposed as $e = \rho \mathbf{W} y + u$, and spatial dependence is ignored in the estimation. Then, non-spatial endogeneity arises if $\operatorname{cov}(u, v) \neq 0$ and therefore $\operatorname{cov}(x, u) \neq 0$. We assume in the following that the variable $z$ satisfies the usual assumptions for a valid instrument – $\operatorname{cov}(z, x) \neq 0$ and $\operatorname{cov}(z, u) = 0$ – such that $z$ is correlated with the endogenous predictor $x$ but uncorrelated with the error term $u$. The IV estimator is obtained via two-stage least squares (2SLS): first regressing $x$ on $z$, predicting $\hat{x}$, and second regressing $y$ on $\hat{x}$.5 More directly, the 2SLS estimator is defined as

$$ \hat{\beta}_{2sls} = \frac{\operatorname{cov}(y, z)}{\operatorname{cov}(x, z)}. \tag{8} $$

4 As we show below, $\operatorname{cov}(x, \mathbf{W} y)$ will always be non-zero, so it is only when $\rho$ is zero that this term simplifies.

5 We focus on the 2SLS estimator.
However, given the equivalence in the just-identified case, our results apply to other instrumental variable methods as well.

Inserting the expression for $y$ yields

\begin{align}
\operatorname{plim}_{n \to \infty} \hat{\beta}_{2sls} &= \beta + \rho \, \frac{\operatorname{cov}(\mathbf{W} y, z)}{\operatorname{cov}(x, z)} + \frac{\operatorname{cov}(u, z)}{\operatorname{cov}(x, z)}, \tag{9a} \\
&= \beta + \underbrace{\rho \, \frac{\operatorname{cov}(\mathbf{W} y, z)}{\operatorname{cov}(x, z)}}_{\text{Spatial endogeneity bias}}, \tag{9b}
\end{align}

which shows that, by assumption of using a valid instrument, 2SLS does not suffer from the non-spatial endogeneity bias of OLS (because $\operatorname{cov}(u, z) = 0$ and $\operatorname{cov}(x, z) \neq 0$). This result, of course, is well appreciated and motivates the use of 2SLS where $x$ is suspected to be endogenous. The instrument $z$, being uncorrelated with $u$, removes the non-spatial endogeneity bias.

Less appreciated is that 2SLS is always biased in the presence of (ignored and hence unmodeled) spatial interdependence.6 In short, the instrument violates the exclusion restriction, because it is related to the outcome disturbances via the omitted spatial lag. This is true even for otherwise valid instruments, and notably even when the instrument is randomly assigned. To see why, note that after substituting and rearranging terms, equation (6) is

$$ y = (\mathbf{I} - \rho \mathbf{W})^{-1} [\beta \gamma z + \beta v + u]. \tag{10} $$

Pre-multiplying both sides of the expression by $\mathbf{W}$ obtains

$$ \mathbf{W} y = \mathbf{W} (\mathbf{I} - \rho \mathbf{W})^{-1} [\beta \gamma z + \beta v + u], \tag{11} $$

that is, we can re-express the omitted spatial lag, $\mathbf{W} y$, in terms of weighted $z$ and the stochastic terms. Finally, this can be re-expressed using an infinite series in place of the spatial multiplier (i.e., the inverse term) as

\begin{align}
\mathbf{W} y &= \beta \gamma \mathbf{W} z + \rho \beta \gamma \mathbf{W}^{2} z + \rho^{2} \beta \gamma \mathbf{W}^{3} z + \ldots + \mathbf{W} (\mathbf{I} - \rho \mathbf{W})^{-1} [\beta v + u], \tag{12a} \\
&= \beta \gamma \mathbf{W} z + \beta \gamma \sum_{k=1}^{\infty} \rho^{k} \mathbf{W}^{k+1} z + \mathbf{W} (\mathbf{I} - \rho \mathbf{W})^{-1} [\beta v + u]. \tag{12b}
\end{align}

6 While the most obvious solution to address the bias from spatial interdependence may be including $\mathbf{W} y$ as a variable, this would not be sufficient, because $\mathbf{W} y$ itself is endogenous in the outcome equation; see, e.g., Franzese and Hays 2007.

We can now substitute this expression into the definition for the spatial bias in 2SLS given in (9b). Because $\operatorname{cov}(x, z) = \gamma \operatorname{var}(z)$, and taking the stochastic terms to be uncorrelated with $z$ across units, the bias can be re-expressed as

$$ \operatorname{plim}_{n \to \infty} \hat{\beta}_{2sls} - \beta = \beta \rho \, \frac{\operatorname{cov}(\mathbf{W} z, z)}{\operatorname{var}(z)} + \beta \sum_{k=1}^{\infty} \rho^{k+1} \, \frac{\operatorname{cov}(\mathbf{W}^{k+1} z, z)}{\operatorname{var}(z)}. \tag{13} $$

Recall that $\mathbf{W}$ is the connectivity matrix – e.g., contiguity, neighbors, inverse-distance, etc. – defining how $y_i$ is related to all $y_{j \neq i}$. As equation (13) indicates, the more $z_i$ is related to neighboring units $z_{j \neq i}$, as defined by $\mathbf{W}$, the greater the bias will be.

Moreover, equation (13) also shows that even when $z$ is independently distributed, bias persists. Under independence, the first term in equation (13) drops out, as independence implies that no specification of $\mathbf{W}$ yields $\operatorname{cov}(\mathbf{W} z, z) \neq 0$. However, this is not true for the additional terms in the expansion for common forms of spatial interdependence. While $\mathbf{W}$ is a hollow matrix – all elements along the diagonal equal zero – this is generally not true for higher-order powers of $\mathbf{W}$, as units themselves are neighbors of their neighbors. Therefore, even under independence, $\mathbf{W}^{q} z \not\perp z$ for all even values of $q$ where $\mathbf{W}$ is non-triangular.7 That is, under these conditions, $\operatorname{cov}(\mathbf{W} y, z) \neq 0$ by construction, because the $i$th element of the vector $\mathbf{W}^{q} z$ is a function of $z_i$. Therefore,

RESULT 1 With unmodeled spatial interdependence in the outcome, 2SLS is asymptotically biased.

7 If $\mathbf{W}$ is upper- or lower-triangular – e.g., spatial ties are unidirectional – the higher-order powers would remain independent of $z_i$. However, socio-matrices rarely are unidirectional, and instead units affect each other.

That is, we show that any instrument that is randomly assigned is (only) first-order unbiased, providing a lower bound on the spatial bias (under $\rho \neq 0$). However, the instruments often used in practice are not independently distributed, risking greater bias still.
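Result 1 can be checked numerically. The following is a minimal sketch, not the authors' simulation code: it assumes a hypothetical ring-shaped connectivity matrix (two neighbors on either side, row-standardized), an i.i.d. standard normal instrument, and illustrative parameter values. The function names `ring_weights` and `simulate_2sls` are made up for this example. For an i.i.d. instrument, the large-sample value of the just-identified 2SLS estimator works out to β·tr((I − ρW)⁻¹)/N, which exceeds β whenever units are neighbors of their neighbors.

```python
import numpy as np


def ring_weights(n, k=2):
    """Row-standardized ring lattice: each unit is tied to its k nearest units on both sides."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    for step in range(1, k + 1):
        W[idx, (idx + step) % n] = 1.0
        W[idx, (idx - step) % n] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def simulate_2sls(n=400, beta=2.0, gamma=1.0, rho=0.5, delta=0.5, reps=200, seed=0):
    """Average just-identified 2SLS estimate when z is i.i.d. but the spatial lag is omitted."""
    rng = np.random.default_rng(seed)
    W = ring_weights(n)
    M = np.linalg.inv(np.eye(n) - rho * W)      # spatial multiplier (I - rho W)^{-1}
    cov_uv = [[1.0, delta], [delta, 1.0]]       # non-spatial endogeneity: corr(u, v) = delta
    estimates = np.empty(reps)
    for r in range(reps):
        z = rng.standard_normal(n)              # randomly assigned instrument
        u, v = rng.multivariate_normal([0.0, 0.0], cov_uv, size=n).T
        x = gamma * z + v                       # first stage
        y = M @ (beta * x + u)                  # reduced form of the outcome
        estimates[r] = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]  # cov(y, z) / cov(x, z)
    large_sample_value = beta * np.trace(M) / n  # implied limit for an i.i.d. instrument
    return estimates.mean(), large_sample_value
```

Calling `simulate_2sls()` returns the average estimate across replications and the implied large-sample value; under these assumed settings both sit above the true β = 2 even though the instrument is randomly assigned.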
Specifically, when values of $z_i$ are correlated with $z_{j \neq i}$, the first term in equation (13) no longer drops out and all of the values in the second-term summation are of greater magnitude. Researchers often draw on geographic, meteorologic, or economic variables, such as natural disasters (Ramsay, 2011), rainfall data (Hansford and Gomez, 2010), or commodity price shocks (Ahmed, 2012), where this is likely.

To illustrate, consider the use of meteorological variables ($z$) as instruments for democratization ($x$) in models of economic development ($y$). Contiguous states (a widely used $\mathbf{W}$) are likely to have both similar levels of development ($y$) and common weather patterns ($z$), where the former implies $\rho > 0$ and the latter implies $\operatorname{cov}(\mathbf{W} z, z) > 0$. It is under these conditions that the bias will be most severe; as can be seen in equation (13), the bias increases in the strength of the interdependence in the outcome ($\rho$) and the strength of the spatial dependence in the instrument ($\operatorname{cov}(\mathbf{W} z, z)$).

RESULT 2 The more the instrument is spatially distributed like the outcome, the greater the bias.

In fact, the spatial bias induced from the instrument can exceed the spatial bias in ordinary least squares. Consider the relative spatial bias of OLS (the left-hand side) and 2SLS (the right-hand side):

$$ \frac{\operatorname{cov}(\mathbf{W} y, x)}{\operatorname{var}(x)} \lessgtr \frac{\operatorname{cov}(\mathbf{W} y, z)}{\operatorname{cov}(x, z)}. \tag{14} $$

Re-expressing both as in equation (13) and concentrating on the first term of the expansion, this is

$$ \beta \rho \, \frac{\operatorname{cov}(\mathbf{W} x, x)}{\operatorname{var}(x)} + \cdots \lessgtr \beta \gamma \rho \, \frac{\operatorname{cov}(\mathbf{W} z, z)}{\operatorname{cov}(x, z)} + \cdots, \tag{15} $$

which, since $\gamma$ is the coefficient from the linear regression of $x$ on $z$, further simplifies as

$$ \frac{\operatorname{cov}(\mathbf{W} x, x)}{\operatorname{var}(x)} + \cdots \lessgtr \frac{\operatorname{cov}(\mathbf{W} z, z)}{\operatorname{var}(z)} + \cdots. \tag{16} $$

Simply put, differences in the spatial distribution of the instrument and the endogenous variable inform the relative degree of spatial bias.
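The leading terms in (16) are easy to compute from data. The sketch below is illustrative only: the helper `first_order_ratio` and the `demo` setup (a ring-shaped W, a clustered instrument, and an endogenous variable that adds unclustered noise) are assumptions made for this example rather than anything taken from the paper.

```python
import numpy as np


def first_order_ratio(v, W):
    """Sample analogue of cov(Wv, v) / var(v), the leading term on each side of (16)."""
    v = np.asarray(v, dtype=float)
    return np.cov(W @ v, v)[0, 1] / np.var(v, ddof=1)


def demo(n=500, gamma=1.0, rho_z=0.6, seed=0):
    """Compare the first-order spatial terms of x and z in one simulated data set."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    W = np.zeros((n, n))
    for step in (1, 2):                        # hypothetical ring: two neighbors on each side
        W[idx, (idx + step) % n] = 0.25
        W[idx, (idx - step) % n] = 0.25
    # Spatially clustered instrument: z = (I - rho_z W)^{-1} e with i.i.d. e.
    z = np.linalg.solve(np.eye(n) - rho_z * W, rng.standard_normal(n))
    x = gamma * z + rng.standard_normal(n)     # x mixes z with unclustered noise v
    return first_order_ratio(x, W), first_order_ratio(z, W)
```

In this setup the instrument is more strongly clustered than the endogenous variable, so the second value returned by `demo()` is the larger one. This is the configuration in which the first-order comparison points to more spatial bias in 2SLS than in OLS.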
When the spatial distribution of the instrument is more similar to the outcome than that of the endogenous variable, the spatial bias from 2SLS will be greater.8

RESULT 3 With unmodeled spatial interdependence in the outcome, the spatial bias in OLS and 2SLS diverges when $x$ and $z$ have different spatial distributions.

8 Another way to see the relative bias is to note that the endogenous variable $x$ has two components, the instrument $z$ and the error term $v$, since $x = \gamma z + v$. Then, we can rewrite equation (14) as

$$ \frac{\operatorname{cov}(\mathbf{W} y, \gamma v)}{\operatorname{var}(v)} \lessgtr 0. \tag{17} $$

If this expression is negative (positive), the spatial bias from 2SLS will be greater (less) than that from OLS.

[Figure 1: Spatial Distributions in IV Models. Panels: Z, X, V, Y, U.]

The intuition behind this is expressed visually in Figure 1, which displays simulated georeferenced data on the contiguous U.S. states. Readers should see this as a spatially-mapped directed acyclic graph (DAG), where the inputs inform the spatial distributions of resulting outputs (right-to-left). Starting at the end, we see that $y$ is positively spatially clustered, with higher values concentrating in the Midwest and South, and lower values concentrating in the Northeast, Mountain States, and West Coast. In itself, this is not an issue. It only becomes problematic – in terms of efficiency or bias – when this spatial pattern is not entirely explained by the included predictors. That is, there is no risk of spatial bias if $y$ is conditionally independent. Here this is not the case, as we see that $u$ – the part of $y$ not explained by $x$ – is clearly spatially clustered as well; that is, there is the risk of bias. As is apparent visually, the spatial clustering of $z$ is greater than that of $x$ – in particular, note the clustering of low values in the Mountain States and West Coast – producing the conditions under which IV estimation may increase the spatial bias.
Calculating the respective biases from equation (14) under these conditions, 2SLS (3.22) results in more than 3 times the bias of OLS (1.027).9

9 These simulations are intended to be illustrative, not systematic; a comprehensive set of simulated experiments is undertaken in Section 5.

This discussion highlights the importance of considering the respective spatial distributions of the outcomes, the predictors, and the instrument. These considerations also become relevant when choosing an instrumental variable, since different instrumental variables will have different spatial distributions and therefore generate different amounts of bias – both absolute and relative to OLS.

The problem we identify resembles the problem of heterogeneous partial effects identified by Dunning (2008) – except that here, the problem arises not because different components of the endogenous variable have different partial effects on the outcome variable, but because the endogenous variable has spatial and non-spatial components that become, relative to each other, over- or under-weighted once the variable is instrumented with $z$. If the outcome variable exhibits spatial interdependence, this mismatch in the spatial structure of $x$ and $\hat{x}$ changes the amount of bias in OLS relative to 2SLS.

Ignoring the spatial attributes of the outcomes, predictors, and instrument not only risks greater spatial bias in IV estimation, but unpredictable and possibly greater overall bias as well. Recall that OLS suffers from spatial and non-spatial endogeneity bias, whereas 2SLS only suffers from spatial endogeneity bias. 2SLS will be more biased if

$$ \left| \rho \, \frac{\operatorname{cov}(\mathbf{W} y, x)}{\operatorname{var}(x)} + \frac{\operatorname{cov}(u, x)}{\operatorname{var}(x)} \right| < \left| \rho \, \frac{\operatorname{cov}(\mathbf{W} y, z)}{\operatorname{cov}(x, z)} \right|, \tag{18} $$

a sufficient condition for which is

$$ \left| \frac{\operatorname{cov}(x, u)}{\operatorname{var}(x)} \right| < |\rho| \left| \frac{\operatorname{cov}(\mathbf{W} y, z)}{\operatorname{cov}(x, z)} \right| - |\rho| \left| \frac{\operatorname{cov}(\mathbf{W} y, x)}{\operatorname{var}(x)} \right|. \tag{19} $$

This expression provides a simple intuition for when 2SLS is more biased than OLS.
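The step from (18) to (19) is not spelled out. One way to see it, reading (18) as a comparison of the absolute asymptotic biases of OLS and 2SLS, is through the triangle inequality. The shorthand $a$, $b$, and $c$ below is introduced here purely for illustration.

```latex
\begin{align*}
a &= \rho \, \frac{\operatorname{cov}(\mathbf{W} y, x)}{\operatorname{var}(x)}, \qquad
b = \frac{\operatorname{cov}(u, x)}{\operatorname{var}(x)}, \qquad
c = \rho \, \frac{\operatorname{cov}(\mathbf{W} y, z)}{\operatorname{cov}(x, z)}, \\
|a + b| &\le |a| + |b| < |c| \quad \text{whenever} \quad |b| < |c| - |a| .
\end{align*}
```

The condition is sufficient but not necessary: when $a$ and $b$ have opposite signs, $|a + b|$ can fall below $|c|$ even if $|b| \geq |c| - |a|$.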
Not surprisingly, OLS performs better when the non-spatial endogeneity of $x$, as indicated on the left-hand side, is less severe – 2SLS removes this bias term, but it remains with OLS. However, OLS also performs better when the endogenous variable, $x$, is spatially less clustered than the instrument, $z$ – even with substantial non-spatial endogeneity. The difference in the spatial biases may be sufficiently large to outweigh the gains from addressing the non-spatial endogeneity. More problematically, since the spatial and non-spatial bias may have different directions, resolving one of the biases may easily produce results further from the truth than resolving none.10

10 For example, consider a special case where the endogenous variable is negatively correlated with the second-stage error term. In this case, the (non-spatial) endogeneity of $x$ creates downward bias in OLS. Further suppose that both $x$ and $z$ are positively correlated with the spatial lag $\mathbf{W} y$, which is plausibly the case in most applications (Franzese and Hays, 2007). It follows that the non-spatial endogeneity bias of OLS offsets the spatial endogeneity bias.

These results are especially worrisome in light of the tendency to accept 2SLS estimates as superior when they differ from the OLS estimates. As we show here, these differences can reflect increases or decreases in the overall endogeneity bias. Absent specific knowledge on the sign and relative size of these sources of bias, the OLS and 2SLS estimates will not even be sufficient to obtain bounds on the true parameter.11

In sum, with spatial interdependence in the outcomes, standard IV models yield asymptotically biased estimates. Even in the best case scenario of a randomly distributed instrument, 2SLS is biased. When the instrument is itself spatially clustered, the bias in 2SLS increases further.
In many circumstances, 2SLS will yield more biased estimates than OLS, and there is no guarantee that the two estimates provide bounds for the true parameter value.

11 When spatial and non-spatial biases have offsetting effects – e.g., positive spatial bias and negative non-spatial bias – OLS may be biased in one direction, 2SLS in the other, and the true parameter value lies somewhere in-between. By contrast, if both the non-spatial and the spatial bias are in the same direction, the true parameter will be outside the interval defined by the OLS and 2SLS estimates.

4 Spatial Models with Additional Endogenous Regressors

While the literature on endogenous predictors and IV models is largely silent on the possibility of and solutions for residual spatial autocorrelation, the spatial econometrics literature has occasionally considered contexts in which both problems are present. Early developments in spatial modeling assumed exogenous predictors, with the exception of the spatial lag of the outcome itself. However, given that this is unlikely to hold in practice with social science data, researchers have derived methods for estimating spatial models with additional endogenous predictors (Kelejian and Prucha, 2004; Anselin and Lozano-Gracia, 2008; Fingleton and Le Gallo, 2008). To date, these models have neither received attention in political science, nor have they been understood as a general solution to spatial violations of the exclusion restriction.

In short, spatial-lag models are simultaneous-equation models; as such, estimation strategies are similar to those generally found when confronting endogenous predictors (e.g., maximum likelihood, indirect least squares, 2SLS, etc.). This commonality readily permits extensions in which both spatial simultaneity and predictor endogeneity are accounted for, as it is just a special instance of multiple endogenous variables. That is, a spatial two-stage least squares (spatial-2SLS) model can be estimated via 2SLS with additional instruments for the spatial lag.

The mechanics of estimation are similar to standard two-stage least squares models, so we do not reintroduce those here. One important exception, however, is that while instruments for the endogenous predictor require additional exogenous variables, instruments for the spatial lag can typically be found from transformations of the existing data. Specifically, spatial lags of the exogenous predictors (e.g., $\mathbf{W} x$) serve as instruments for the spatial lag of the outcome (e.g., $\mathbf{W} y$). To see why, simply note that the reduced form of the spatial-lag model discussed in section 2,

$$ y = (\mathbf{I} - \rho \mathbf{W})^{-1} [x \beta + u], \tag{20} $$

can be re-expressed using an infinite series in place of the spatial multiplier:

$$ y = x \beta + \rho \mathbf{W} x \beta + \rho^{2} \mathbf{W}^{2} x \beta + \ldots + (\mathbf{I} - \rho \mathbf{W})^{-1} u. \tag{21} $$

As such, spatial lags of $x$ (and higher order powers of these spatial lags) effectively instrument for the spatial lag of $y$ – more simply, $\mathbf{W} x$ is related to $\mathbf{W} y$ just as $x$ is related to $y$.

Limited and full information estimators allowing for both spatial and simultaneous systems (e.g., non-spatial endogeneity) have been established, with Kelejian and Prucha (2004) the first to derive formal large sample results and Fingleton and Le Gallo (2008) providing a comprehensive set of small-sample experiments. In both, the spatial-2SLS performs well under fairly general conditions.12 Recently, Drukker, Egger and Prucha (2013) and Liu and Lee (2013) have expanded on these to allow for additional residual spatial error autocorrelation and/or heteroskedasticity.13 Moreover, code exists in both R and Stata to implement these methods, meaning researchers face little constraint in using them.

12 The situation is somewhat more complicated with binary outcomes; see Franzese, Hays and Cook (2016) for a discussion on modeling spatial interdependence in discrete-choice models.
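The estimator described in this section can be sketched in a few lines. What follows is an illustration under stated assumptions, not the authors' implementation and not the R or Stata routines mentioned above. The function names (`tsls`, `spatial_2sls`, `demo`), the ring-shaped W, and the parameter values are hypothetical. The instrument set (the exogenous predictors, their first and second spatial lags, the excluded instrument z, and its spatial lag) is one reasonable choice among several.

```python
import numpy as np


def tsls(y, X, H):
    """Generic 2SLS: project the regressors X on the instruments H, then regress y on the fit."""
    X_hat = H @ np.linalg.lstsq(H, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


def spatial_2sls(y, x, z, Q, W):
    """S-2SLS sketch: W y and x are both treated as endogenous.

    Q holds an intercept in column 0 followed by exogenous predictors.
    Returns coefficients ordered as (rho, beta, coefficients on Q).
    """
    Q_nc = Q[:, 1:]                      # the lagged intercept is redundant if W is row-standardized
    WQ = W @ Q_nc
    X = np.column_stack([W @ y, x, Q])
    H = np.column_stack([Q, z, WQ, W @ WQ, W @ z])
    return tsls(y, X, H)


def demo(n=1000, rho=0.5, beta=2.0, gamma=1.0, delta=0.5, seed=1):
    """Simulate one data set with a spatial lag and an endogenous x, then estimate both ways."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    W = np.zeros((n, n))
    for step in (1, 2):                  # hypothetical ring: two neighbors on each side
        W[idx, (idx + step) % n] = 0.25
        W[idx, (idx - step) % n] = 0.25
    Q = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
    z = rng.standard_normal(n)
    u, v = rng.multivariate_normal([0.0, 0.0], [[1.0, delta], [delta, 1.0]], size=n).T
    x = gamma * z + Q @ np.array([2.0, 3.0, -2.5]) + v
    y = np.linalg.solve(np.eye(n) - rho * W, beta * x + Q @ np.array([-2.0, -3.0, 2.5]) + u)
    s2sls = spatial_2sls(y, x, z, Q, W)
    plain = tsls(y, np.column_stack([x, Q]), np.column_stack([z, Q]))  # ignores W y
    return s2sls[0], s2sls[1], plain[0]
```

Calling `demo()` returns the S-2SLS estimates of ρ and β followed by the standard 2SLS estimate of β that omits the spatial lag, so the two approaches can be compared on the same simulated data.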
13 Note that these extensions are GMM-plus-IV; while we do not discuss this at length here, the first step is the spatial-2SLS we present. In short, S-2SLS provides the initial, consistent estimates of the spatial interdependence in the outcome that can then be used in the second step estimation of the error autocorrelation, with successive iteration over both steps until convergence of the parameters is obtained.

5 Simulation

To assess the performance of commonly used methods and potential gains from our preferred alternative, we undertake a range of Monte Carlo experiments with varying levels of spatial and non-spatial endogeneity. In particular, the data for our simulations are generated as follows:

\begin{align}
y &= (\mathbf{I} - \rho_y \mathbf{W})^{-1} [x \beta + \lambda_1 Q + u_1], \tag{22a} \\
x &= \gamma z + \lambda_2 Q + u_2, \tag{22b} \\
z &= (\mathbf{I} - \rho_z \mathbf{W})^{-1} v, \quad \text{where } v \sim N(0, 1), \tag{22c}
\end{align}

where $y$ is the outcome, $x$ is the endogenous predictor, $Q$ is a matrix of exogenous predictors, $\mathbf{W}$ is a row-standardized connectivity matrix, and $z$ is the instrument.14 The extent of spatial dependence in the outcome and the instrument is given by parameters $\rho_y$ and $\rho_z$, respectively, with larger values of $\rho_y$ and $\rho_z$ resulting in greater spatial endogeneity in $y$ and $z$, respectively (whereas $\rho_y = \rho_z = 0$ produces the standard IV model). Non-spatial endogeneity is induced in draws of $(u_1, u_2)^{T} \sim N(0, \Sigma)$, where $\Sigma$ is the covariance matrix of a bivariate normal random variable with variance of 4 and correlation $\delta$. We vary $\delta$ to control the extent of correlation between the two disturbances. If $\delta = 0$, $x$ is exogenous and OLS (or standard spatial) models should be preferred. With non-zero $\delta$ and non-zero $\rho_y$ and $\rho_z$, the assumptions of neither OLS nor 2SLS hold. The remaining parameters $\{\beta, \gamma, \lambda_1, \lambda_2\}$ are the effects of the predictors on $x$ or $y$.15 We are particularly interested in deriving an unbiased estimate of $\beta$ – the effect of $x$ on $y$ – which we hold constant across experiments at 2.
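The data-generating process in (22a) to (22c) can be sketched as follows. This is one reading of the design rather than the authors' code. The connectivity matrix uses five nearest neighbors on uniformly drawn coordinates, and the intercepts and coefficients on the exogenous predictors follow the values reported in the accompanying footnotes. The distribution of the exogenous predictors in Q is not stated in the text, so standard normal draws are assumed here. The function names are hypothetical.

```python
import numpy as np


def knn_weights(coords, k=5):
    """Row-standardized k-nearest-neighbor connectivity matrix with a zero diagonal."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                      # a unit is never its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], nearest] = 1.0
    return W / k                                     # each row has exactly k ones


def simulate_dataset(n=200, beta=2.0, gamma=1.5, rho_y=0.3, rho_z=0.3, delta=0.5, seed=0):
    """Draw one data set following (22a)-(22c)."""
    rng = np.random.default_rng(seed)
    W = knn_weights(rng.uniform(size=(n, 2)))
    I = np.eye(n)
    # Assumption: the two exogenous predictors are standard normal; column 0 is the intercept.
    Q = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
    lam1 = np.array([-2.0, -3.0, 2.5])               # second-stage intercept and coefficients
    lam2 = np.array([2.0, 3.0, -2.5])                # first-stage intercept and coefficients
    Sigma = 4.0 * np.array([[1.0, delta], [delta, 1.0]])   # variance 4, correlation delta
    u1, u2 = rng.multivariate_normal([0.0, 0.0], Sigma, size=n).T
    z = np.linalg.solve(I - rho_z * W, rng.standard_normal(n))      # (22c)
    x = gamma * z + Q @ lam2 + u2                                   # (22b)
    y = np.linalg.solve(I - rho_y * W, beta * x + Q @ lam1 + u1)    # (22a)
    return y, x, z, Q, W
```

Setting `rho_y = rho_z = 0` in `simulate_dataset` reduces the draw to a standard IV design, which gives a convenient baseline when comparing estimators.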
As noted above, we are especially interested in how spatial and non-spatial endogeneity in the DGP for $y$ affects the individual models’ estimates of $\beta$. In addition, in the online Appendix, we consider how the estimates vary with sample sizes ($N$) and the strength of the instrument ($\gamma$).

14 Locations for the units are generated by twice taking $N$ draws from a standard uniform, with the combined results producing xy-coordinate points. Connections between the units are then generated using a k-Nearest Neighbor algorithm with $k = 5$, returning a binary $N$-by-$N$ matrix with each element in a row coded as 1 for the five closest units or 0 for all others (including zeros along the diagonal).

15 In the first stage we specify the intercept as 2 and the two exogenous predictors are 3 and −2.5. In contrast, for the second stage the intercept is −2 and the exogenous predictors are −3 and 2.5.

Table 1 shows the different parameter values which we use to create simulated data sets. There are 216 different combinations of the parameters shown in Table 1. For each combination we generate 1,000 data sets, which results in a total of 216,000 simulation runs. On each data set we estimate the $\beta$ parameter using OLS, two-stage least squares (2SLS), and our preferred method, the spatial two-stage least squares estimator (S-2SLS).

Table 1: Varying Parameter Values for Simulation

N      ρy     ρz     γ      δ
50     0      0      0.25   -0.5
200    0.3    0.3    0.75   0
       0.6    0.6    1.5    0.5
                     2

To evaluate the different estimation methods, we compare the performance of the three estimators based on median absolute error and their coverage probabilities across different combinations of simulated parameters. First, let us consider the 36 different parameter combinations when $\delta = 0.5$, i.e., when we should have reasonably strong endogeneity bias. For 75% of those simulated parameter combinations, the spatial 2SLS method has the smallest median absolute error.
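The experimental grid and the two evaluation metrics can be written down compactly. The sketch below transcribes the parameter values from Table 1. The function names are hypothetical, and the coverage helper assumes a normal-approximation interval with critical value 1.96, which the text does not specify.

```python
import itertools

import numpy as np

# Parameter values transcribed from Table 1.
GRID = {
    "N": [50, 200],
    "rho_y": [0.0, 0.3, 0.6],
    "rho_z": [0.0, 0.3, 0.6],
    "gamma": [0.25, 0.75, 1.5, 2.0],
    "delta": [-0.5, 0.0, 0.5],
}


def parameter_combinations(grid=GRID):
    """All combinations of the parameter values, one dict per simulated design."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in itertools.product(*grid.values())]


def median_absolute_error(estimates, truth):
    """Median of |estimate - truth| across simulated data sets."""
    return float(np.median(np.abs(np.asarray(estimates, dtype=float) - truth)))


def coverage(estimates, std_errors, truth, crit=1.96):
    """Share of intervals estimate +/- crit * se that contain the true value."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    inside = (est - crit * se <= truth) & (truth <= est + crit * se)
    return float(inside.mean())
```

The length of `parameter_combinations()` matches the 216 designs reported in the text, and multiplying by 1,000 replications per design gives the stated 216,000 runs.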
The 9 combinations for which S-2SLS does not have the smallest median absolute error are situations where $\rho_y = 0$; even so, the maximum difference in median absolute error between 2SLS and S-2SLS for these cases is 0.013. Thus, when non-spatial endogeneity is present and instrumental variable models may be warranted, the spatial IV model performs better than or essentially as well as the standard 2SLS model.

To get a more complete picture, Figure 2 shows the median absolute error for each of the three estimators for 9 different combinations of parameters when the instrument is reasonably strong and $N$ is large. Specifically, we here analyze simulations when $\gamma = 1.5$ and $N = 200$. First, in Figure 2, $\rho_y$ increases from left to right across the x-axis in each individual plot. Second, $\delta$ (the non-spatial endogeneity) increases across the three rows from −0.5 in the top, to 0 in the middle row, to 0.5 in the bottom row. Third, each column of plots shows the statistics for different values of $\rho_z$, increasing from left to right.

[Figure 2: Median Absolute Error (γ = 1.5 & N = 200). Panel rows: correlation δ = −0.5, 0, 0.5; panel columns: ρz = 0, 0.3, 0.6; x-axis: ρy; y-axis: MedAE; legend (Estimation Method): s-2sls, ols, 2sls.]

Several observations stand out from the plot. First, across all levels of non-spatial endogeneity ($\delta$), the median absolute error of the standard 2SLS model grows as $\rho_z$ increases, but especially as both $\rho_z$ and $\rho_y$ increase together. Second, the median absolute error of the spatial 2SLS model is quite stable and does not vary much even with high spatial dependence in Y, Z, or both. Across all combinations, the spatial 2SLS model performs the best in terms of the median absolute error or is essentially on par with the best performer.
Most importantly and not surprisingly, the performance of the S-2SLS model is essentially unaffected by spatial dependence, but the model also performs well when no spatial dependence is present. As expected, the OLS model performs poorly when non-spatial endogeneity is present. As we have shown analytically above, however, the bias induced by the spatial dependence in Z and Y can be larger for 2SLS than for OLS, even in the case of strong non-spatial endogeneity, that is, even where there is strong reason to use instrumental variables. This occurs in particular when ρz and ρy increase together. In addition, the top row of Figure 2 presents a very interesting situation. When the bias from non-spatial endogeneity is negative and the bias from spatial simultaneity is positive, the performance of OLS improves with higher spatial dependence, as the two biases work against each other. In this case, OLS performs much better than standard 2SLS: in some cases, addressing only one type of bias is worse than ignoring both. Again, the spatial 2SLS model is unaffected by both of these problems and consistently performs better. Figure 6 in the Appendix shows the same plots of median absolute error when the instrument is much weaker (i.e., γ = 0.75). As is to be expected, all methods utilizing the instrumental variable perform slightly worse in comparison to the OLS model. Yet, the overall ordering of the methods by performance does not change.

[Figure 3: Coverage (γ = 1.5 & N = 200). Rows (labelled "Correlation"): −0.5, 0, 0.5; columns: ρz = 0, 0.3, 0.6; x-axis: ρy; y-axis: Cov; estimation methods: s-2sls, ols, 2sls.]

Figure 3 shows the coverage for each estimator for the same combination of simulation parameters.
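The two performance measures used in these comparisons can be computed from the simulation output roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, coverage is taken to mean the share of replications whose 95% confidence interval contains the true β, and the interval is built with the normal critical value 1.96, which may differ from how the intervals were constructed in the original simulations. The numbers are toy values, not simulation results.

```python
import numpy as np


def median_absolute_error(estimates, beta_true):
    """Median of |beta_hat - beta| across simulation replications."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.median(np.abs(estimates - beta_true)))


def coverage(estimates, std_errors, beta_true, z=1.96):
    """Share of replications whose interval beta_hat +/- z * se contains beta.

    z = 1.96 is an assumed normal-approximation critical value.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    return float(np.mean((lower <= beta_true) & (beta_true <= upper)))


# Toy inputs for illustration only: four replications, true beta = 2.
beta_true = 2.0
est = [1.9, 2.1, 2.6, 2.0]
se = [0.1, 0.1, 0.1, 0.2]

medae = median_absolute_error(est, beta_true)
cov = coverage(est, se, beta_true)
print(medae, cov)
```

In the toy example, three of the four intervals contain the true value, so the coverage is 0.75, while the median absolute error is approximately 0.1. An estimator can therefore combine a small typical error with poor coverage if its standard errors are too small, which is the pattern discussed for 2SLS with strong instruments.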
Again, as above, γ = 1.5 and N = 200, so we have a strong instrument and a relatively large number of observations. The coverage statistic measures the share of simulations in which the true parameter value falls within the 95% confidence interval of the estimator. If the estimator is perfectly calibrated, this should be true in 95% of cases. When non-spatial endogeneity is high (top or bottom row), the coverage of the OLS estimator is quite poor, which is not surprising. Again, however, when spatial and non-spatial bias go in opposite directions (top row), the coverage of OLS improves with higher spatial dependence. The coverage of the 2SLS estimator is generally quite good; with increasing spatial dependence in Y and Z, however, the estimator undercovers. In contrast, the spatial-2SLS estimator has very good coverage throughout and is not affected by the spatial dependence in Z or Y. In fact, the coverage of the S-2SLS estimator for the simulation parameters shown here is consistently right around 95% and ranges only between 92% and 96%. We display the same plot for a weaker instrument (γ = 0.75) in Figure 7 in the Appendix. Somewhat surprisingly, the coverage of the 2SLS estimator actually improves slightly with a weaker instrument. We suspect that, while the bias declines with stronger instruments, the stronger instrument also decreases the standard errors around the estimate, thus shrinking the confidence interval. Even with smaller error, the stronger instrument therefore leads to worse coverage. Again, the S-2SLS estimator generally outperforms all other methods in terms of coverage and is not affected by spatial dependence.16

5.1 Robustness Checks - Wrong W

One potential criticism of the simulations is that we assume full knowledge of the correct spatial network. We estimate the spatial 2SLS model based on the correct spatial connectivity matrix.
Recall that to create the spatial dependence in the data we randomly drew locations in space and generated a k-Nearest-Neighbor matrix with k = 5. We then used this matrix both to generate the data and to estimate the spatial models. Complete knowledge of the correct spatial network may, however, be an unrealistic assumption in the real world. Here we therefore present the simulation results assuming a completely wrong spatial network in the estimation of the spatial 2SLS model. We do so in the following way. For the estimation of the spatial models in this simulation, we randomly draw a second set of locations for each data point and use this second set of locations to create our spatial weights matrix, again based on the 5 nearest neighbors. In this simulation, the spatial network in the data generation and the assumed spatial network in the estimation are completely independent of each other. This is likely a worst-case scenario for the spatial models, as we hope researchers have at least some knowledge about the spatial process in the data they analyze.

16 Whereas we analyze the simulations with a relatively large number of observations here (N = 200), Figures 8 and 9 in the Appendix show the same plots (i.e., median absolute error and coverage) for the simulated data sets with small samples, i.e., N = 50. Again, for those plots the instrument is rather strong (γ = 1.5). The overall conclusion about the performance of the individual models does not change for smaller samples.

Figures 10 and 11 in the Appendix show the same plots as above for the simulations with reasonably strong instruments (γ = 1.5) and a large number of observations (N = 200). We do not discuss the results in detail here, but want to emphasize two things. First, the performance of all methods decreases with increasing spatial dependence, especially as the spatial dependence in both Z and Y increases together. Second, the spatial 2SLS method closely tracks the standard 2SLS model in performance.
As the median absolute error in the standard 2SLS model increases, so does the median absolute error for the S-2SLS model. Similarly, as the coverage for the 2SLS model decreases, so does the coverage for the S-2SLS model. This shows that even if researchers have absolutely no knowledge of the spatial network in their data and choose a spatial weights matrix at random, our preferred method, the spatial-2SLS model, does not perform significantly worse than the standard 2SLS model. With respect to the simulations, the results indicate that if there is any risk of spatial dependence and if we assume we have some minimal knowledge of the network that defines it, spatial-2SLS should be preferred or at least considered. Even if one is unsure whether spatial dependence exists, the spatial-2SLS model might be the more conservative and better choice.

6 Application: “Dynamics and Stagnation in the Malthusian Epoch”

In this section we illustrate how failing to account for spatial dependence when using IV models can lead to biased results and thus an overestimation of the strength of the hypothesized relationship. In a 2011 article in the American Economic Review, Ashraf and Galor (2011) investigate a “central hypothesis of the influential Malthusian theory”. According to Thomas R. Malthus (1798), the main reason for stagnating incomes prior to the industrial revolution was that whenever incomes increased, population size would rise as well, pushing living standards up against the resource frontier and subsequently causing them to decline. Ergo, technological progress or the discovery of new resources would only temporarily improve living standards but not improve lives in the long run (Ashraf and Galor, 2011).
As Ashraf and Galor (2011, 2004) outline in the introduction, their article “exploits exogenous sources of cross-country variation in land productivity and technological levels to examine their hypothesized differential effects on population density versus income per capita during the time period 1–1500 CE”. As fundamental tests of the Malthusian theory in pre-industrial societies, Ashraf and Galor (2011, 2009) set out to investigate two particular predictions: 1) a country’s improvement in productivity should lead to larger populations but not to higher living standards (per capita income); and 2) countries with higher land productivity or better technology ought to have higher population densities, but again, should not be much richer than their less advanced counterparts. In the empirical analysis, Ashraf and Galor (2011) use the timing of the onset of the Neolithic revolution to proxy for technological change. In line with their expectations, Ashraf and Galor (2011) show that both the onset of the Neolithic revolution and land productivity are positively (and statistically significantly) associated with population density, yet not with income per capita.

In addition, and most importantly for our application, Ashraf and Galor (2011) provide additional evidence in support of the Malthusian argument using instrumental variables to estimate the causal effect of technological progress on population density. The authors convincingly contend that “prehistoric biogeographical endowments”, in particular the “availability of domesticable species of plants and animals”, have had an important effect on technological progress and that they are exogenous (Ashraf and Galor, 2011, 2029 & 2031). The authors motivate the use of the instrumental variables primarily as a way to estimate the “causal impact of technology on population density” (Ashraf and Galor, 2011, 2031).
As we argue here, however, the authors ignore possible spatial dependence in both the instrumental variables and the dependent variable of interest, population density.17 Both population density and, even more so, plant and animal species are likely to be spatially clustered. In other words, especially in prehistoric times, animals and plants are likely to be more similar in adjacent regions. Likewise, we posit that some areas of the world had higher population density in 1000 CE than other areas, again leading to positive spatial correlation. Figure 4 shows the spatial distribution of the dependent variable of interest (logged population density), whereas Figure 5 shows that of the combined instrumental variables. Because there are two instruments (prehistoric availability of plants and animals), we here plot the average of the two. As is easily visible, both the dependent variable and the instruments are clearly spatially clustered. As a first test of possible spatial dependence we also estimate Moran’s I based on the residuals of the original OLS model with logged population density in 1000 CE as the dependent variable (column 2, Table 9 in Ashraf and Galor (2011)). Based on Moran’s I, we cannot rule out spatial dependence in the residuals. For the spatial models estimated in this section we create a spatial neighbor matrix in which neighbors are defined as countries with contiguous borders.18

17 Because of the low number of observations in the per capita income regressions, we focus on the models with population density as the dependent variable.
18 In our eyes this is the most conservative option. We have also replicated the results with a k = 5 nearest neighbor matrix and with a row-standardized contiguous neighbor matrix.

[Map legend: Log Pop Density, scale from −2 to 2.]
Figure 4: This map shows the spatial distribution of logged population density, the dependent variable of interest. Gray coloring indicates no available data.
As one can see, the simple visualization of the dependent variable already indicates strong spatial clustering, which is not surprising when it comes to population density in 1000 CE.

[Map legend: Instruments, scale from 5 to 20.]
Figure 5: This map shows how the instrumental variables vary across space. Since Ashraf and Galor (2011) include two instruments (prehistoric availability of plants and animals), we here plot the mean of both. The map clearly indicates spatial clustering, as would be expected when it comes to plant and animal species. Again, gray coloring indicates missing data.

Table 2 shows the results of the replication analysis of the models with population density in 1000 CE (Table 9 in Ashraf and Galor (2011)). Column 1 replicates the original OLS model on the restricted sample (column 2 in Table 9 in Ashraf and Galor (2011)). As a first step, column 2 in Table 2 shows the results when we estimate a spatial autoregressive model instead of the standard OLS model. As one can see, the main coefficients of interest (technological index & land productivity) have the same levels of significance as in the original OLS model. The point estimates, however, are slightly smaller, indicating some upward bias due to spatial dependence. Columns 3 & 4 replicate the instrumental variable model for population density in 1000 CE as presented in Table 9 in Ashraf and Galor (2011). The differences in results between the original 2SLS model and the spatial 2SLS model are stark. The coefficient on technological progress (log of technological index) in the original 2SLS model is 14.53, i.e., almost 3.5 times as large as the coefficient estimated in the original OLS model. Ashraf and Galor (2011) argue that the difference in estimated coefficients is “a pattern that is consistent with measurement error in the transition-timing variable and the resultant attenuation bias afflicting OLS coefficient estimates” (Ashraf and Galor, 2011, 2031).
Column 4, however, shows the results from the spatial 2SLS model. Here the coefficient for technological progress is much smaller compared to the standard 2SLS model. In fact, the coefficient estimate of technological progress in the spatial 2SLS model is comparable to that in the original OLS model. Recall that, as we show above, the non-spatial and spatial bias in OLS can be somewhat offsetting. This may be the case here. If the non-spatial measurement bias is attenuating and the spatial bias is upward, the OLS model ends up being less biased than the 2SLS model due to the countervailing forces of both biases on the coefficient estimate.

Table 2: Replication of Table 9 (1000 CE) in Ashraf and Galor (2011)

                     (1)             (2)          (3)              (4)
                     Original OLS    SAR          Original 2SLS    S-2SLS
ln CEtech1K          4.198∗∗∗        2.856∗∗∗     14.53∗∗∗         4.303∗∗∗
                     (1.164)         (0.953)      (4.437)          (1.328)
pc lnar lnas         0.498∗∗∗        0.397∗∗∗     0.572∗∗∗         0.397∗∗∗
                     (0.139)         (0.0963)     (0.148)          (0.0987)
ln abslat            -0.185          -0.0934      -0.209           -0.0861
                     (0.151)         (0.106)      (0.209)          (0.108)
distcr1000           -0.363          -0.341       -1.155∗          -0.462
                     (0.426)         (0.360)      (0.640)          (0.368)
land100cr            0.442           0.472        0.153            0.431
                     (0.422)         (0.341)      (0.606)          (0.344)
Constant             -1.820∗∗∗       -1.286∗∗     -5.507∗∗∗        -1.796∗∗∗
                     (0.641)         (0.531)      (1.702)          (0.630)
Spatial ρ                            0.151∗∗∗                      0.169∗∗∗
                                     (0.0246)                      (0.0334)
Observations         92              92           92               92
Continent dummies    Yes             Yes          Yes              Yes

Standard errors in parentheses. ∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01

Table 3 in the Appendix shows the results when we replicate the models with population density in 1 CE as the dependent variable. The overall results are the same. Again, the estimate of the technological progress coefficient in the 2SLS model is almost three times as large as the OLS estimate. And again, the coefficient estimate of the spatial 2SLS model is much smaller; in fact, it is a bit smaller than the OLS estimate. We want to emphasize that the overall conclusion of Ashraf and Galor (2011) still stands.
The Malthusian theory for pre-industrial times is clearly supported by these data. On the other hand, the causal effect of technological progress on population density is in fact much smaller than the standard 2SLS model indicates and is about the same size as the original estimates in the OLS models in Ashraf and Galor (2011).

7 Application: Revisiting the Resource Curse: Natural Disasters, the Price of Oil, and Democracy

** Coming soon **

8 Conclusion

IV models have seen increased use in political science over the last several years, with researchers attempting to minimize the threat of endogeneity and increase the accuracy of estimates of causal effects. Few, however, seem aware of or attempt to account for spatial dependence in addition to these traditional endogeneity concerns. In this paper, we show that failing to account for spatial interdependence in instrumental variable models not only results in inconsistent estimates, but also may increase the bias compared to the simple OLS model. Moreover, this offsetting (or increased) bias is likely even for otherwise exogenous instruments such as climatic patterns. Instead, we suggest that researchers should prefer a spatial two-stage least squares estimator, which guards against both spatial and non-spatial endogeneity. Our simulated experiments provide evidence that this estimator performs well across a variety of situations, including contexts where only spatial or only non-spatial endogeneity is present, as it nests both models.

In addition, there are several other important implications for researchers. First, when the biases have opposing effects – e.g., positive spatial bias and negative non-spatial bias – OLS may be biased in one direction and 2SLS in the other, while the true parameter value lies somewhere in between. By contrast, if both the non-spatial and the spatial bias are in the same direction, the true parameter will be outside the interval defined by the OLS and 2SLS estimates.
Thus, absent specific knowledge of the sign and relative size of these sources of bias, the OLS and 2SLS estimates will not be sufficient to obtain bounds on the true parameter. Second, when x and z differ in their spatial distribution, this not only results in bias but fundamentally changes the estimand one is able to recover. This problem resembles the problem of heterogeneous partial effects as identified by Dunning (2008) – except that here, the problem arises not because different components of the endogenous variable have different partial effects on the outcome variable, but because the endogenous variable has spatial and non-spatial components that become, relatively, over- or under-weighted once the variable is instrumented with z. Recall that Angrist and Imbens (1994) show that 2SLS recovers the local average treatment effect, i.e., the estimated effect is based on those observations where the instrument has power. This may be especially important where the instrument’s power is geographically concentrated; e.g., using oil price shocks as an instrument for economic growth has very specific geographic implications for the local average treatment effect.19

Lastly, we believe the problem we have identified in this paper may be more frequent than one might expect. The reason is a particular type of file drawer problem. As we have shown above, when the bias caused by spatial dependence is positive and that from non-spatial endogeneity is negative, using simple 2SLS models will reduce one of these biases and not the other, thus leading to more biased results and potentially larger estimated effect sizes. This seems to be the case in both our applications. We believe researchers are likely to include estimates from an IV model in their papers if the results from the IV model are stronger than, or at least as strong as, the OLS results.
Thus, due to this selection effect, IV models presented in papers may exhibit these problems at a higher rate than one might expect.

19 For discussions on a similar point see Ratkovic and Shiraito (2014).

9 Appendix

[Figure 6: Median Absolute Error over δ (λ = 0.75 & N = 200). Rows (labelled "Correlation"): −0.5, 0, 0.5; columns: ρz = 0, 0.3, 0.6; x-axis: ρy; y-axis: MedAE; estimation methods: s-2sls, ols, 2sls.]

[Figure 7: Coverage over δ (λ = 0.75 & N = 200). Same panel layout; y-axis: Cov.]

[Figure 8: Median Absolute Error over δ (λ = 1.5 & N = 50). Same panel layout; y-axis: MedAE.]

[Figure 9: Coverage over δ (λ = 1.5 & N = 50). Same panel layout; y-axis: Cov.]

[Figure 10: Median Absolute Error over δ (λ = 1.5 & N = 200) - wrong W. Same panel layout; y-axis: MedAE.]

[Figure 11: Coverage over δ (λ = 1.5 & N = 200) - wrong W. Same panel layout; y-axis: Cov.]

Table 3: Replication of Table 9 (1 CE) in Ashraf and Galor (2011)

                     (1)             (2)          (3)              (4)
                     Original OLS    SAR          Original 2SLS    S-2SLS
ln CEtech0           3.947∗∗∗        3.369∗∗∗     10.80∗∗∗         3.010∗∗∗
                     (0.983)         (0.760)      (2.857)          (0.978)
pc lnar lnas         0.350∗∗         0.311∗∗∗     0.464∗∗          0.294∗∗∗
                     (0.172)         (0.106)      (0.182)          (0.105)
ln abslat            0.0834          -0.0152      -0.0521          -0.0505
                     (0.170)         (0.115)      (0.214)          (0.114)
distcr1000           -0.625          -0.300       -0.616           -0.175
                     (0.434)         (0.394)      (0.834)          (0.388)
land100cr            0.146           0.0986       -0.172           0.0867
                     (0.424)         (0.357)      (0.642)          (0.351)
Constant             -2.719∗∗∗       -1.749∗∗∗    -4.770∗∗∗        -1.334∗∗
                     (0.601)         (0.500)      (0.980)          (0.544)
Spatial λ                            0.182∗∗∗                      0.252∗∗∗
                                     (0.0275)                      (0.0358)
Observations         83              83           83               83
Continent dummies    Yes             Yes          Yes              Yes

Standard errors in parentheses. ∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01

References

Ahmed, Faisal Z. 2012. “The Perils of Unearned Foreign Income: Aid, Remittances, and Government Corruption.” American Political Science Review 106(1):146–165.
Angrist, Joshua D. and Guido W. Imbens. 1994. “Identification and estimation of local average treatment effects.” Econometrica 62(2):467–476.
Anselin, Luc and Nancy Lozano-Gracia. 2008. “Errors in variables and spatial effects in hedonic house price models of ambient air quality.” Empirical Economics 34(1):5–34.
Ashraf, Quamrul and Oded Galor. 2011. “Dynamics and Stagnation in the Malthusian Epoch.” American Economic Review 101(5):2003–2041.
Bartels, Larry M. 1991. “Instrumental and ‘Quasi-Instrumental’ Variables.” American Journal of Political Science 35(3):777–800.
Beck, Nathaniel, Kristian Skrede Gleditsch and Kyle Beardsley. 2006. “Space is more than geography: Using spatial econometrics in the study of political economy.” International Studies Quarterly 50(1):27–44.
Drukker, David M, Peter Egger and Ingmar R Prucha. 2013. “On two-step estimation of a spatial autoregressive model with autoregressive disturbances and endogenous regressors.” Econometric Reviews 32(5-6):686–733.
Dunning, Thad. 2008. “Model Specification in Instrumental-Variables Regression.” Political Analysis 16(3):290–302.
Fingleton, Bernard and Julie Le Gallo. 2008. “Estimating spatial models with endogenous variables, a spatial lag and spatially dependent disturbances: Finite sample properties*.” Papers in Regional Science 87(3):319–339.
Franzese, Robert J. Jr. and Jude C. Hays. 2007. “Models of Cross-Sectional Interdependence in Political Science Panel and Time-Series-Cross-Section Data.” Political Analysis 15(2):140–164.
Franzese, Robert J., Jude C. Hays and Scott J. Cook. 2016. “Spatial- and Spatiotemporal-Autoregressive Probit Models of Interdependent Binary Outcomes.” Political Science Research and Methods 4(1):151–173.
Hansford, Thomas G. and Brad T. Gomez. 2010. “Estimating the Electoral Effects of Voter Turnout.” American Political Science Review 104(02):268–288.
Kelejian, Harry H and Ingmar R Prucha. 2004. “Estimation of simultaneous systems of spatially interrelated cross sectional equations.” Journal of Econometrics 118(1):27–50.
Kirby, Andrew M. and Michael D. Ward. 1987. “The Spatial Analysis of Peace and War.” Comparative Political Studies 20(3):293–313.
Liu, Xiaodong and Lung-Fei Lee. 2013. “Two-stage least squares estimation of spatial autoregressive models with endogenous regressors and many instruments.” Econometric Reviews 32(5-6):734–753.
Plümper, Thomas and Eric Neumayer. 2010. “Model Specification in the Analysis of Spatial Dependence.” European Journal of Political Research 49(3):418–442.
Ramsay, Kristopher W. 2011. “Cheap Talk Diplomacy, Voluntary Negotiations, and Variable Bargaining Power.” International Studies Quarterly 55(4):1003–1023.
Ratkovic, Marc and Yuki Shiraito. 2014. “Strengthening Weak Instruments by Modeling Compliance.” Working Paper.
Sovey, Allison J. and Donald P. Green. 2011. “Instrumental Variables Estimation in Political Science: A Readers’ Guide.” American Journal of Political Science 55(1):188–200.
Ward, Michael D. and John O’Loughlin. 2002. “Spatial Processes and Political Methodology: Introduction to the Special Issue.” Political Analysis 10(3):211–216.
URL: http://pan.oxfordjournals.org/content/10/3/211.short