B.5 Spatial Filtering

Daniel A. Griffith

B.5.1 Introduction

In spatial statistics and spatial econometrics, spatial filtering is a general methodology supporting more robust findings in data analytic work, and is based upon a posited linkage structure that ties together georeferenced data observations. Constructed mathematical operators are applied to separate geographically structured noise from both trend and random noise in georeferenced data, enhancing analysis results with clearer visualization possibilities and sounder statistical inference. In doing so, nearby/adjacent values are manipulated to help analyze attribute values at a given location. Spatial filtering mathematically manipulates data in order to correct for potential distortions introduced by such factors as arbitrary scale, resolution and/or zonation (i.e., surface partitioning). The primary idea is that some spatial proxy variables extracted from a spatial relationship matrix are added as control variables to a model specification. The principal advantage of this methodology is that these control variables, which identify and isolate the stochastic spatial dependencies among georeferenced observations, allow model building to proceed as if these observations were independent.

Population count data from the 2005 census of Peru, by district, for the 108 districts forming the Cusco Department are presented here to empirically illustrate the various spatial filtering approaches; an ArcGIS shapefile furnishes area measures for these districts. Population density, which ranges from 0.8 to 11,512.8 per unit area here, tends to be skewed, with a natural lower bound of zero, and few areal units with relatively sizeable concentrations. Accordingly, analyses based upon the normal probability model require application of a Box-Cox power transformation to better align the empirical population density frequency distribution with a bell-shaped curve; here the transformation is

$$\frac{10}{\left(\text{population}/\text{area} + 13.7\right)^{0.56}}. \tag{B.5.1}$$

This population density forms an elongated mound map pattern with a single peak. The highest density is in the city of Cusco, which has existed for more than 500 years, with the next-highest densities stretching along an economic corridor formed by the Vilcanota River valley; the lowest densities are in the most rural areas of this Department. This population density tends to covary with elevation variability, measured by the standard deviation of elevation, selevation. Here the Box-Cox transformation is

$$\frac{1000}{\text{selevation} + 407.6}. \tag{B.5.2}$$

The bivariate correlation for these two transformed variables is –0.48345, which is statistically significant.

Fig. B.5.1. Geographic distributions across the Cusco Department of Peru; magnitude is directly related to gray tone darkness: (a) transformed population density; (b) transformed elevation standard deviation

The geographic distributions (see Table B.5.1 and Fig. B.5.1) of both transformed population density and elevation variability display moderate, positive, and statistically significant spatial autocorrelation.

Table B.5.1. Transformed population density and elevation variability: spatial autocorrelation in terms of MC and GR

Attribute                          MC        zMC    GR
Y: population density              0.51461   8.85   0.41358
X: elevation standard deviation    0.45545   7.98   0.46650

Notes: MC denotes the Moran Coefficient, and GR denotes the Geary Ratio

B.5.2 Types of spatial filtering

A limited number of implementations of this methodology currently exist for georeferenced data analysis purposes. They include autoregressive linear operators (à la Cochrane-Orcutt type of prewhitening), Getis’s Gi-based specification (Getis 1990, 1995), linear combinations of eigenvectors extracted from distance-based principal coordinates of neighbor matrices (PCNM; Borcard and Legendre 2002; Borcard et al. 2004; Dray et al. 2006), and topology-based spatial weights matrix eigenfunctions (Griffith 2000, 2002, 2003, 2004).
The first of these is written in terms of a variance component, whereas the other three are written in terms of a mean response component, allowing especially the last two to be incorporated into generalized linear model (GLM) specifications. One technical advantage of the latter three types of spatial filter is that probability density/mass function normalizing factors no longer are problematic. These constants ensure that the probability density/mass function integrates/sums to one. They are a function of the eigenvalues of matrix C for the normal probability model. They are intractable for the binomial and Poisson probability models, requiring Markov Chain Monte Carlo (MCMC) techniques to calculate parameter estimates for these models. Another advantage is that the basis for the control variables does not change unless the spatial relationship matrix is changed. In other words, any attribute variables geographically distributed across a landscape tagged to the same geocoding scheme can be treated with the same spatial filtering. One disadvantage is that, for example, eigenfunctions may need to be extracted numerically from perhaps very large n-by-n matrices. Fortunately, the asymptotic analytical eigenfunctions for a regular square tessellation forming a rectangular region (for example, a remotely sensed image) are known. Various studies (for example, Getis and Griffith 2002; Griffith and Peres-Neto 2006) report that results obtained with these different spatial filter approaches essentially are equivalent.

Autoregressive linear operators

Impulse-response function filtering of time series data predates a parallel approach for spatial filtering, and motivated the development of spatial autoregressive linear operators (Tobler 1975), whose error term is correlated with some response variable, Y.
Consider the simultaneous spatial autoregressive (SAR) model specification

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + (\mathbf{I} - \rho\,\mathbf{C})^{-1}\boldsymbol{\varepsilon} \tag{B.5.3}$$

where X is an n-by-(P+1) matrix of covariates, β is a (P+1)-by-1 vector of regression coefficients, ρ is a spatial autocorrelation parameter, n is the number of areal units, I is an n-by-n identity matrix, and C is a topology-based n-by-n geographic connectivity/weights matrix (for example, cij = 1 if areal units i and j are nearby/adjacent, and cij = 0 otherwise; cii = 0). Here these spatial filters take the matrix form (I – ρC). The parameter ρ is estimated for Y (denoted ρ̂), and then used in the two multiplications (I – ρ̂C)Y, for the n-by-1 vector of response values, and (I – ρ̂C)X, for the n-by-(P+1) matrix of P covariates and intercept term. This spatial filter is almost always coupled with the normal probability model and, if properly specified, yields independent and identically distributed random error terms. Smoothing occurs in that each dataset value is rewritten as the difference between the observed value and a linear combination of neighboring values.

The pure spatial autoregressive (SAR) maximum likelihood parameter estimates for the transformed population density (pd) and elevation standard deviation (selevation) attribute variables are, respectively, 0.79164 and 0.77455. According to their corresponding pseudo-R2 calculations, positive spatial autocorrelation latent in the transformed population density variable accounts for roughly 60 percent, whereas that in the transformed selevation accounts for roughly 55 percent, of its geographic variability. The bivariate correlation coefficient calculated for the spatially filtered variate pair, (I – 0.79164 W)Y and (I – 0.77455 W)X, where matrix W is the row-standardized version of matrix C, and both of which continue to conform closely to a normal distribution, decreases in absolute value to –0.42070.
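The filtering step (I – ρ̂W)Y described above can be sketched in a few lines. This is only an illustration under stated assumptions: the four-unit chain map, the attribute values, and the value 0.5 for ρ̂ are hypothetical, and the maximum likelihood estimation of ρ is not shown.

```python
import numpy as np

def row_standardize(C):
    """Divide each row of a binary connectivity matrix by its row sum."""
    return C / C.sum(axis=1, keepdims=True)

def sar_filter(values, W, rho):
    """Apply the linear operator (I - rho * W) to a vector or matrix of values."""
    n = W.shape[0]
    return (np.eye(n) - rho * W) @ values

# Hypothetical map: four areal units arranged in a chain 0-1-2-3.
C = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = row_standardize(C)

y = np.array([2.0, 4.0, 6.0, 8.0])  # hypothetical attribute values
rho_hat = 0.5                       # assumed value, for illustration only

# Each filtered value is the observed value minus rho_hat times
# the average of its neighbors' values.
y_filtered = sar_filter(y, W, rho_hat)
```

The same function can be applied to a covariate matrix, so that the filtered response and the filtered covariates enter an ordinary regression.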
Although both variables have roughly the same level of positive spatial autocorrelation, the decrease in the correlation coefficient from –0.48345 to –0.42070 is rather modest because their map patterns are noticeably different (see Fig. B.5.1).

Getis’s Gi specification

This specification involves a multistep procedure exploiting Ripley’s second-order statistic or the range of a geostatistical semivariogram model coupled with the Getis-Ord (1992) Gi statistic, and converts each spatially autocorrelated variable into a pair of synthetic variates, one capturing spatial dependencies and one capturing non-spatial systematic and random effects. Regressing a response variable on the set of constructed spatial and a-spatial variates allows geographically structured noise to be separated from trend and random noise in georeferenced data. However, the procedure is restricted to non-negative random variables having a natural origin.

The primary pair of equations is given by

$$y_i^{*} = y_i\,\frac{\sum_{j=1}^{n} c_{ij}(d)\,\big/\,(n-1)}{\sum_{j=1}^{n} c_{ij}(d)\,y_j \,\Big/ \left(\sum_{j=1}^{n} y_j - y_i\right)} \tag{B.5.4}$$

and

$$\mathbf{L}_Y = \mathbf{Y} - \mathbf{Y}^{*} \tag{B.5.5}$$

where d denotes distance separating location j from location i, the denominator of the ratio in Eq. (B.5.4) is Gi(d), the numerator is E[Gi(d)], Y* is the a-spatial variable realization, and LY is the spatial variable. Distance d is selected as the value at which Gi(d), which initially tends to increase with increasing distance, begins to decrease.

Figure B.5.2(a) displays the areal unit centroids for the Cusco region. Figure B.5.2(b) indicates that a 3-parameter gamma distribution (parameter estimates: shape = 0.7533, scale = 0.6984, and threshold = 0.0297) furnishes a good description of the set of distances. Figure B.5.2(c) illustrates the concavity of the Eq. (B.5.4) trajectories across the distance range of [0, 3.996]. Of note is that some trajectories encounter local peaks that are not global peaks.
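Equations (B.5.4) and (B.5.5) can be sketched as follows. The sketch makes two simplifying assumptions that are not part of the procedure described here: a single common distance d is applied to every location (rather than a location-specific di), and every location has at least one neighbor within d. The function name and the example data are hypothetical.

```python
import numpy as np

def getis_filter(y, dist, d):
    """Split a positive variable y into a-spatial (y_star) and spatial (L_y) parts."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # c_ij(d) = 1 when 0 < d_ij <= d, and 0 otherwise (so c_ii = 0).
    c = ((dist > 0) & (dist <= d)).astype(float)
    expected_g = c.sum(axis=1) / (n - 1)   # E[G_i(d)], numerator of Eq. (B.5.4)
    observed_g = (c @ y) / (y.sum() - y)   # G_i(d), denominator of Eq. (B.5.4)
    y_star = y * expected_g / observed_g   # Eq. (B.5.4)
    return y_star, y - y_star              # Eq. (B.5.5)

# Hypothetical example: four locations on a line, one unit apart.
x = np.array([0.0, 1.0, 2.0, 3.0])
dist = np.abs(x[:, None] - x[None, :])
y = np.array([1.0, 2.0, 3.0, 4.0])

y_star, L_y = getis_filter(y, dist, d=1.0)

# A constant map has no spatial structure, so its spatial component is zero.
flat_star, flat_L = getis_filter(np.full(4, 5.0), dist, d=1.0)
```

By construction the two synthetic variates sum to the original variable, which is what allows both to enter a regression such as Eq. (B.5.6) as separate terms.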
The number of geographic connections used for the transformed population density Gi(d) is 3,262, whereas that for the transformed elevation standard deviation is 4,787; in contrast, the number of connections in matrix C is 570. Figure B.5.3 portrays the maps of the synthetic spatial variates given by Eq. (B.5.5). The correlation between the two a-spatial synthetic variates is –0.20744, indicating that spatial autocorrelation dramatically inflates the observed coefficient.

The regression equation may be written as follows:

$$Y = a + b_1 L_Y + b_2 X^{*} + b_3 L_X + e. \tag{B.5.6}$$

The variance in Y, the transformed population density, is accounted for as follows: 11.41 percent by LY, the synthetic spatial variate; 15.39 percent by X*, the synthetic a-spatial covariate; and 6.46 percent by LX, the synthetic spatial covariate. Moderate multicollinearity is present in this model specification, but with virtually no impact on the regression coefficient variance inflation factors (VIFs).

Fig. B.5.2. The Cusco Department of Peru: (a) geographic distribution of areal unit centroids; (b) a three-parameter gamma distribution description of the di values for Gi(d), where the black line denotes the empirical, and the gray line denotes the theoretical, cumulative distribution function (CDF); (c) four selected areal unit trajectories for identifying the di values for transformed population density, where the solid black circle denotes the smallest di, the black asterisk denotes the largest di, and gray circles denote median di values

Fig. B.5.3. Geographic distributions across the Cusco Department of Peru of Gi(d)-based spatial variates; magnitude is directly related to gray tone darkness: (a) extracted from the transformed population density; (b) extracted from the transformed elevation standard deviation

Linear combinations of distance matrix-based eigenvectors

Dray et al.
(2006) specify the PCNM transformation procedure, which depends on mathematical expressions, known as eigenfunctions, of a truncated inter-location distance matrix, where the truncation value is the maximum distance that keeps all sampling units connected in a minimum spanning tree. The PCNM specification relates to semivariogram modeling. Distance-based eigenvector maps with large eigenvalues (that is, strong positive spatial autocorrelation) tend to have only a few large clusters of values on a map and represent global trends [for example, Fig. B.5.4(b)]. Eigenvectors with intermediate size eigenvalues tend to have a number of moderate-sized clusters of values on a map and represent regional trends [for example, Fig. B.5.4(c) and Fig. B.5.4(d)]. Eigenvectors with small eigenvalues tend to have numerous small clusters of values on a map and represent patchiness and hence more local trends across a landscape [for example, Fig. B.5.4(e)]. Moreover, distance-based eigenvector maps capture a range of geographic scales encapsulated in a given georeferenced dataset, portraying increasing fragmentation as the corresponding eigenvalues decrease in magnitude.

This specification utilizes eigenvectors extracted from the modified geographic weights matrix

$$(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{W}\,(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)$$

where 1 is an n-by-1 vector of ones, and T denotes the matrix transpose operation. The elements of the n-by-n geographic weights matrix W are defined as follows:

$$w_{ij} = \begin{cases} 0 & \text{if } i = j \\ 0 & \text{if } d_{ij} > t \\ 1 - \left[d_{ij}/(4t)\right]^{2} & \text{if } 0 < d_{ij} \le t \end{cases} \tag{B.5.7}$$

where t is the maximum distance for a minimum spanning tree connecting all n locations [for example, Fig. B.5.4(a)]. Here the great circle distance value for t is 16.022 km. The eigenvalues associated with the PCNM eigenvectors do not have a simple relationship with their affiliated MCs (see Table B.5.2); some non-zero eigenvalues even represent weak negative spatial autocorrelation.
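The construction in Eq. (B.5.7) can be sketched as follows. This is a minimal illustration, not the implementation used for the Cusco results: it assumes planar coordinates with Euclidean distances (the chapter uses great circle distances), finds t with a plain Prim's algorithm, and uses hypothetical function names and coordinates.

```python
import numpy as np

def mst_max_edge(dist):
    """Longest edge of a minimum spanning tree (Prim's algorithm) on a distance matrix."""
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()  # cheapest link from the current tree to each node
    longest = 0.0
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        longest = max(longest, float(candidates[j]))
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return longest

def pcnm_eigenvectors(coords):
    """Eigenfunctions of the doubly centered, truncated weights matrix of Eq. (B.5.7)."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))  # Euclidean distance, assumed here
    t = mst_max_edge(dist)
    W = np.where((dist > 0) & (dist <= t), 1.0 - (dist / (4.0 * t)) ** 2, 0.0)
    M = np.eye(n) - np.ones((n, n)) / n      # centering projector I - 11'/n
    A = M @ W @ M
    A = (A + A.T) / 2.0                      # guard against rounding asymmetry
    eigenvalues, eigenvectors = np.linalg.eigh(A)
    order = np.argsort(eigenvalues)[::-1]    # largest eigenvalue first
    return t, eigenvalues[order], eigenvectors[:, order]

# Hypothetical centroids; the spanning tree edges have lengths 1, 2 and 1.
coords = [[0, 0], [1, 0], [3, 0], [3, 1]]
t, eigenvalues, eigenvectors = pcnm_eigenvectors(coords)
```

Because of the double centering, every eigenvector with a non-zero eigenvalue sums to zero, which is why the intercept in a filter regression equals the mean of the response variable.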
Employing an adjusted value of MC/MCmax > 0.25, where MCmax denotes the maximum MC value, reduces the candidate set of eigenvectors for constructing PCNM spatial filters to 15 (that is, eigenvectors E1 to E12, E14, E16 and E17). The spatial autocorrelation contained in a response variable Y may be described with these eigenvectors as follows:

$$\mathbf{Y} = \mu_Y \mathbf{1} + \mathbf{E}_k \boldsymbol{\beta}_k + \boldsymbol{\varepsilon}_Y \tag{B.5.8}$$

where Ek is an n-by-K matrix of selected eigenvectors (using stepwise regression techniques), µY is the mean of variable Y (because all of the eigenvectors have a mean of zero), βk is a K-by-1 vector of regression coefficients, and εY is a random error term that is iid N(0, σε²). For transformed population density in the Cusco Department, Eq. (B.5.8) contains seven eigenvectors that account for 52.42 percent of its geographic variation. The zMC (z-score for the MC under a null hypothesis of zero spatial autocorrelation) value decreases from 8.79 to 2.83, and residuals continue to mimic a normal distribution, with MC = 0.76944 (GR = 0.25051) for the spatial filter.

Fig. B.5.4. The Cusco Department of Peru; magnitude in the choropleth maps is directly related to gray tone darkness: (a) the minimum spanning tree connecting the areal unit centroids; (b) E1, MC/MCmax = 1; (c) E3, MC/MCmax = 0.78; (d) E9, MC/MCmax = 0.52; (e) E14, MC/MCmax = 0.25

The correlation between the two sets of residuals for Eq. (B.5.8), after the respective spatial filters have been subtracted from transformed population density and transformed selevation, is –0.39203, indicating that spatial autocorrelation dramatically inflates the observed bivariate correlation coefficient. This inflation primarily is attributable to the three common eigenvectors, whose correlation is –0.91928; but it is suppressed by the presence of two sets of unique eigenvectors, whose correlations are exactly zero.

Table B.5.2.
Spatial autocorrelation contained in the 30 PCNM eigenvectors with non-zero eigenvalues

Eigenvector   Eigenvalue   MC          GR
 1            6.757698      0.879567   0.199937
 2            5.387761      0.830660   0.257939
 3            4.428140      0.683353   0.326650
 4            3.891251      0.687340   0.395729
 5            3.390504      0.586984   0.444859
 6            2.960842      0.611523   0.439971
 7            2.796129      0.523140   0.534097
 8            2.389961      0.350517   0.691055
 9            2.285282      0.458341   0.611829
10            2.176693      0.495837   0.470299
11            1.932853      0.407144   0.637691
12            1.467388      0.355678   0.725449
13            1.359360      0.196455   0.720122
14            1.345400      0.220722   0.719404
15            1.164052      0.209691   0.768738
16            1.115993      0.255954   0.644689
17            0.986066      0.246825   0.870757
18            0.968562      0.192463   0.462175
19            0.890959      0.168454   0.909110
20            0.779474      0.098354   1.126354
21            0.714815      0.151115   0.818756
22            0.664565     –0.034844   1.188295
23            0.578630     –0.109281   1.295845
24            0.540622      0.045193   1.032415
25            0.386237      0.003223   0.917556
26            0.291445     –0.127027   1.275769
27            0.228748     –0.025737   1.200260
28            0.213037     –0.036185   1.254641
29            0.158005      0.043036   1.033882
30            0.083016     –0.067713   1.261137

Linear combinations of topological matrix-based eigenvectors

This specification (see Tiefelsdorf and Griffith 2007) is a transformation procedure that also depends on eigenvectors, in this case extracted from the adjusted geographic weights matrix

$$(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{C}\,(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n),$$

a term appearing in the numerator of the MC spatial autocorrelation index. This decomposition also could be based upon the GR index, and rests on the following property: the first eigenvector, say E1, is the set of real values that has the largest MC achievable by any set for the spatial arrangement defined by the geographic connectivity matrix C; the second eigenvector is the set of real values that has the largest achievable MC by any set that is uncorrelated with E1; the third eigenvector is the third such set of real values; and so on through En, the set of real values that has the largest negative MC achievable by any set that is uncorrelated with the preceding (n–1) eigenvectors.
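The extremal property just described can be checked numerically. The sketch below assumes a hypothetical 3-by-3 rook-adjacency grid rather than the Cusco map, and uses the standard identity that the MC of an eigenvector with a non-zero eigenvalue λ equals (n / 1ᵀC1) λ; the function names are illustrative.

```python
import numpy as np

def rook_grid(rows, cols):
    """Binary rook-adjacency connectivity matrix for a rows-by-cols grid."""
    n = rows * cols
    C = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:            # neighbor to the right
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < rows:            # neighbor below
                C[i, i + cols] = C[i + cols, i] = 1.0
    return C

def moran_coefficient(x, C):
    """Moran Coefficient of x for a symmetric binary connectivity matrix C."""
    z = x - x.mean()
    return (x.size / C.sum()) * (z @ C @ z) / (z @ z)

def mcm_eigenfunctions(C):
    """Eigenvalues, eigenvectors and implied MC values of (I - 11'/n) C (I - 11'/n)."""
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    A = M @ C @ M
    values, vectors = np.linalg.eigh((A + A.T) / 2.0)
    order = np.argsort(values)[::-1]    # most positive spatial autocorrelation first
    values, vectors = values[order], vectors[:, order]
    mc = (n / C.sum()) * values         # eigenvalue-to-MC conversion
    return values, vectors, mc

C = rook_grid(3, 3)
values, vectors, mc = mcm_eigenfunctions(C)

mc_first = moran_coefficient(vectors[:, 0], C)   # largest achievable MC
mc_last = moran_coefficient(vectors[:, -1], C)   # most negative achievable MC
```

Computing the MC of the first and last eigenvectors directly reproduces the first and last converted eigenvalues, which bracket the MC of any other variable on the same map.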
As such, these eigenvectors furnish distinct map pattern descriptions of latent spatial autocorrelation in georeferenced variables, because they are both orthogonal and uncorrelated. Their corresponding eigenvalues, which can be easily converted to MC values, index the nature and degree of spatial autocorrelation portrayed by each eigenvector.

As with PCNM, the resulting spatial filter is constructed from some linear combination of a subset of these eigenvectors. The candidate set can begin with all eigenvectors portraying the same nature (that is, positive or negative) of spatial autocorrelation as is measured in a response variable. Next, those eigenvectors representing inconsequential levels of spatial autocorrelation (that is, with very small eigenvalues) should be removed from this candidate set. Finally, a stepwise regression procedure can be used to select those eigenvectors that account for the spatial autocorrelation in the response variable. This stepwise selection can be based upon, say, the conventional R2-maximization criterion, or a residual MC minimization criterion. In practice, this spatial filter specification replaces the autoregressive spatial filter with its eigenfunction counterpart, and its single autoregressive parameter with a set of parameter estimates, one for each eigenvector, removing from the model those eigenvectors whose estimates essentially are zero.

Table B.5.3.
Eigenvector spatial filter regression results using a 10 percent level of significance selection criterion: population density (Y) and elevation standard deviation (X), for the Cusco Department, Peru (n = 108)

Component                        Transformed Y            Transformed X
Common eigenvectors              R2 = 0.4645              R2 = 0.5189
Unique eigenvectors              R2 = 0.1543              R2 = 0.0565
All selected eigenvectors        R2 = 0.6188              R2 = 0.5753
Residual MC                      zMC ≈ –0.23              zMC ≈ –0.19
Shapiro-Wilk (S-W) statistic     0.987 (prob = 0.393)     0.986 (prob = 0.313)
MC for spatial filters           0.4719                   0.4019

Spatial filters were constructed for the two Cusco transformed attribute variables, where the candidate eigenvector set was restricted to those 24 vectors portraying positive spatial autocorrelation and having an MC/MCmax > 0.25; the maximum possible MC value for Cusco’s topological surface partitioning, MCmax, is 1.09315, the MC value for the principal eigenvector. The resulting spatial filters appear in Fig. B.5.5, each portraying strong positive spatial autocorrelation, and each closely reflecting its parent map (see Fig. B.5.1). Summary measures for them are reported in Table B.5.3. The bivariate correlation coefficient between (X – FX) and (Y – FY), where Fj denotes the spatial filter for variable j, and both of which continue to conform closely to a normal distribution, decreases in absolute value to –0.42688. Here spatial autocorrelation roughly accounts for, respectively, 62 percent and 58 percent of the geographic variability in these transformed attribute variables. The filtered residuals contain negligible spatial autocorrelation. Although both variables have roughly the same level of positive spatial autocorrelation, the correlation coefficient decrease is rather modest because their map patterns are noticeably different: their spatial filters have nine eigenvectors in common, and seven that are specific to one or the other of them.
The decompositions highlighted here may be written as

$$\mathbf{Y} = \mu_Y \mathbf{1} + \mathbf{E}_c \boldsymbol{\beta}_{cY} + \mathbf{E}_{uY} \boldsymbol{\beta}_{uY} + \boldsymbol{\varepsilon}_Y \tag{B.5.9}$$

$$\mathbf{X} = \mu_X \mathbf{1} + \mathbf{E}_c \boldsymbol{\beta}_{cX} + \mathbf{E}_{uX} \boldsymbol{\beta}_{uX} + \boldsymbol{\varepsilon}_X \tag{B.5.10}$$

where E is an n-by-H matrix for X and an n-by-K matrix for Y (with H and K not necessarily equal) of selected eigenvectors, subscripts c and u respectively denote common and unique sets of eigenvectors, β is a vector of regression coefficients, and εY and εX respectively are the iid N(0, σεj²), j = X or Y, a-spatial variates for variables X and Y. As with PCNM, the linear combinations of eigenvectors are the spatial filters.

Fig. B.5.5. Topology-based spatial filters for the Cusco Department of Peru; eigenvector values are directly related to gray tone darkness: (a) for transformed population density; (b) for transformed elevation standard deviation

Now the bivariate correlation coefficient can be rewritten as the following weighted combination of different correlation coefficients, where the weights are the square roots of relative variance term products (see Table B.5.3):

$$
\begin{aligned}
r_{X,Y} ={}& r_{\mathrm{resid}_X,\,\mathrm{resid}_Y}\sqrt{(1-R_X^2)(1-R_Y^2)}
 + r_{E_{cX},\,E_{cY}}\sqrt{R^2_{E_{cX}}\,R^2_{E_{cY}}} \\
&+ r_{\mathrm{resid}_X,\,E_{uY}}\sqrt{(1-R_X^2)\,R^2_{E_{uY}}}
 + r_{E_{uX},\,\mathrm{resid}_Y}\sqrt{R^2_{E_{uX}}\,(1-R_Y^2)}
 + 0\cdot\sqrt{R^2_{E_{uX}}\,R^2_{E_{uY}}}
\end{aligned}
\tag{B.5.11}
$$

where resid denotes the residuals, R2 is a linear regression squared multiple correlation coefficient, and the subscripts X and Y denote with which variable a term is associated. The zero correlation arises because the unique sets of eigenvectors are orthogonal and uncorrelated. Substituting the corresponding Cusco case study values into this equation (see Table B.5.3; some rounding error is present) yields

$$
\begin{aligned}
-0.48345 ={}& -0.43904\sqrt{(1-0.6188)(1-0.5753)} - 0.60486\sqrt{(0.4645)(0.5189)} \\
&- 0.10384\sqrt{(0.1543)(1-0.5753)} + 0.11396\sqrt{(1-0.6188)(0.0565)} + 0\cdot\sqrt{(0.1543)(0.0565)}.
\end{aligned}
$$
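The substitution above can be verified with a few lines of arithmetic. The R2 values and the component correlations are those reported in this section; the variable names are only illustrative.

```python
import math

# R-squared values for transformed Y and transformed X (Table B.5.3).
r2_y = {"common": 0.4645, "unique": 0.1543, "all": 0.6188}
r2_x = {"common": 0.5189, "unique": 0.0565, "all": 0.5753}

# Each entry pairs a component correlation with the product of relative
# variances whose square root serves as its weight in Eq. (B.5.11).
terms = [
    (-0.43904, (1 - r2_y["all"]) * (1 - r2_x["all"])),  # residuals of Y and of X
    (-0.60486, r2_y["common"] * r2_x["common"]),        # common eigenvector parts
    (-0.10384, r2_y["unique"] * (1 - r2_x["all"])),     # unique part of Y, residuals of X
    (0.11396, (1 - r2_y["all"]) * r2_x["unique"]),      # residuals of Y, unique part of X
    (0.0, r2_y["unique"] * r2_x["unique"]),             # unique parts are uncorrelated
]

r_xy = sum(r * math.sqrt(weight) for r, weight in terms)
common_share = terms[1][0] * math.sqrt(terms[1][1]) / r_xy
```

The sum reproduces the naive correlation of –0.48345 up to rounding, and `common_share` shows that the common-eigenvector term supplies more than half of it.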
This decomposition equation, like that for PCNM, emphasizes that common eigenvectors tend to increase the magnitude of a correlation coefficient, whereas unique eigenvectors tend to suppress it.

B.5.3 Eigenfunction spatial filtering and generalized linear models

A spatial filter can be constructed for GLM specifications, again using a stepwise selection technique. By doing so, MCMC techniques can be avoided when estimating model parameters in the presence of spatial autocorrelation; rather, standard GLM procedures can be used. Because population is a count variable, it can be treated as a Poisson random variable, and the area variable in the denominator of a population density can be converted to a GLM offset variable (that is, its coefficient is set to one and not estimated) by including its logarithm as a special covariate (that is, an offset) in a model specification.

For the Cusco Department data, the GLM estimation, including log(selevation) as a covariate, yields the spatial filter appearing in Fig. B.5.6, whose MC = 0.86030 (zMC = 14.94) and GR = 0.31022. This spatial filter has nine eigenvectors, six of which are contained in the set of eleven for the corresponding normal-approximation spatial filter. Including the previously specified transformed selevation as a covariate in the normal approximation specification increases its R2 to 0.6821. Switching to the correct probability function here results in a more parsimonious model whose predicted values better align with actual population density across the entire range of density values [see Fig. B.5.6(b) and Fig. B.5.6(c)].

Fig. B.5.6. Generalized linear model (GLM) results: (a) the population density GLM spatial filter; eigenvector values are directly related to gray tone darkness; (b) scatterplot of the predicted versus the observed pd; (c) scatterplot of the predicted versus the observed pd with the four largest values set aside.
The solid black line denotes observed pd, open circles denote GLM-predicted pd, and asterisks denote back-transformed normal approximation predicted pd

B.5.4 Eigenfunction spatial filtering and geographically weighted regression

Eigenfunction spatial filters allow geographically varying coefficient models to be specified, along the lines of geographically weighted regression (GWR). Interaction terms can be created by multiplying each variable in a set of covariates by each eigenvector in a candidate set. In other words, these interaction variates are cross-products of each synthetic spatial variate and each covariate. Again stepwise regression can be used to select the relevant variables. The stepwise procedures can be used to select from the candidate eigenvector set (which relates to the intercept term), the set of covariates, and the set of interaction terms. Once the subset has been identified, it can be grouped into sets having a common covariate so that this covariate can be factored from each set. What remains for each set is a linear combination of the synthetic spatial variates used to construct a cross-product, which, when added together, constitutes geographically varying coefficients. The affiliated equation may be written as follows:

$$\mathbf{Y} = f\!\left(\mu_Y \mathbf{1} + \mathbf{E}_1 \boldsymbol{\beta}_1 + \mathbf{X} \odot \mathbf{E}_X \boldsymbol{\beta}_X\right) \tag{B.5.12}$$

where f denotes some function (for example, the natural antilogarithm, exp, for the Poisson probability model), the subscript 1 denotes the eigenvectors and the regression coefficients associated with the intercept term, the subscript X denotes eigenvectors and their regression coefficients associated with the slope coefficient, and ⊙ denotes the Hadamard matrix product (that is, element-by-element matrix multiplication).

Fig. B.5.7.
Geographically varying coefficients for the GLM population density model; coefficient magnitudes are directly related to gray tone darkness: (a) spatially varying intercept term; (b) spatially varying slope coefficient

Consider the preceding GLM model describing population density across the Cusco Department. The geographically varying intercept can be rewritten as

8.8834 – 4.7838 E1 – 4.3226 E3 + 48.3641 E4 + 1.8258 E6 – 2.0448 E12 + 2.1773 E13 – 2.7006 E14 – 1.6251 E16 – 1.7334 E19 .

Meanwhile, the geographically varying slope coefficient can be rewritten as

–0.9446 – 8.7899 E4 – 0.3106 E10 – 0.4664 E11 + 0.7628 E15 .

This is the term that is factored from the set of cross-product terms [i.e., each eigenvector multiplied by log(selevation)]; each element of this term is multiplied by its corresponding log(selevation) value. The geographic distributions of the spatially varying coefficients appear in Fig. B.5.7. Because eigenvector E4 is common to both coefficient expressions, and it dominates the intercept term, the correlation between these two geographically varying coefficients is very high (–0.98036). Because each of the eigenvectors has a mean of zero, these two geographically varying coefficients are centered on their respective global values [that is, the intercept constant, and the slope coefficient for log(selevation), itself]. Furthermore, because the coefficient variability is a function of the eigenvectors, these geographically varying coefficients contain (as well as account for) spatial autocorrelation in the response variable Y.

Table B.5.4. Geographically varying coefficients: spatial autocorrelation in terms of MC and GR

Coefficient              MC        zMC     GR
Intercept                0.92345   16.02   0.22664
Log(selevation) slope    0.92090   15.98   0.23104

Notes: MC denotes the Moran Coefficient, and GR denotes the Geary Ratio

Each coefficient contains statistically significant, strong positive spatial autocorrelation.
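The way the two coefficient surfaces are assembled from the selected eigenvectors can be sketched as follows. The regression coefficients are those reported above; the eigenvector matrix is a hypothetical stand-in (random columns centered to mean zero), because the Cusco eigenvectors are not reproduced here, and the log(selevation) values are likewise invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 108, 19

# Hypothetical stand-in for eigenvectors E1..E19: any columns with mean zero
# suffice to show why the coefficients are centered on their global values.
E = rng.standard_normal((n, K))
E -= E.mean(axis=0)

# Reported coefficients, keyed by eigenvector number.
intercept_terms = {1: -4.7838, 3: -4.3226, 4: 48.3641, 6: 1.8258, 12: -2.0448,
                   13: 2.1773, 14: -2.7006, 16: -1.6251, 19: -1.7334}
slope_terms = {4: -8.7899, 10: -0.3106, 11: -0.4664, 15: 0.7628}

def varying_coefficient(global_value, terms, E):
    """Global value plus a linear combination of eigenvectors, one value per areal unit."""
    coefficient = np.full(E.shape[0], float(global_value))
    for k, b in terms.items():
        coefficient += b * E[:, k - 1]   # eigenvector E_k is stored in column k-1
    return coefficient

intercept = varying_coefficient(8.8834, intercept_terms, E)
slope = varying_coefficient(-0.9446, slope_terms, E)

# Hadamard (element-by-element) product of the slope surface with the covariate.
log_selevation = rng.uniform(4.0, 7.0, size=n)   # hypothetical covariate values
linear_predictor = intercept + slope * log_selevation
```

Because every column has mean zero, the averages of the two surfaces equal the global intercept and the global slope, respectively.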
B.5.5 Eigenfunction spatial filtering and geographical interpolation

Spatial interpolation is a problem frequently encountered in spatial analysis. Its solution exploits spatial autocorrelation in order to predict an unknown value at some location from known values at nearby locations. The redundant information interpretation of spatial autocorrelation, which relates to the amount of geographic variance it accounts for within an attribute variable, supports this interpolation.

The best imputation of a missing response value is its expected value given a set of available data. In other words, it equals the prediction equation estimated with a set of observed data. This value can be calculated by inserting a binary indicator variable into a regression equation, where this variable is assigned a value of minus one for the single observation with a missing response value, and a zero for all other observations. The regression coefficient calculated for this indicator variable is an imputation. For a Poisson model specification, this requires the missing response variable value to be replaced with a one:

$$\exp\!\left(\alpha + \beta_X X_i + \sum_{k=1}^{K} E_{ki}\,\beta_k - \beta_m \cdot 1\right) = 1 \tag{B.5.13}$$

when

$$\beta_m = \alpha + \beta_X X_i + \sum_{k=1}^{K} E_{ki}\,\beta_k. \tag{B.5.14}$$

Imputed values for population density across the Cusco Department were calculated and are portrayed in Fig. B.5.8. The expected values were computed with the covariate log(selevation) coupled with a spatial filter. Of note is that Fig. B.5.8(a) is very similar to Fig. B.5.6(b); more variability appears here because each density value is not used in the calculation of the GLM, increasing the uncertainty in its prediction. Nevertheless, given their alignment with the ideal line in Fig. B.5.8, the imputed values obtained here appear to be reasonable.

Fig. B.5.8.
Generalized linear model (GLM) imputation results: (a) scatterplot of the imputed versus the observed population densities (pd); (b) scatterplot of the imputed versus the observed population densities (pd) with the four largest values set aside. The solid black line denotes observed pd, and the open circle denotes GLM-imputed pd

B.5.6 Eigenfunction spatial filtering and spatial interaction data

Recent work has returned attention to the role spatial autocorrelation plays in the estimation of model parameters describing spatial interaction data. LeSage and Pace (2008) propose a formulation that is autoregressive-based, and relates to the autoregressive linear operator spatial filter. Fischer and Griffith (2008) compare this autoregressive linear operator specification with an eigenfunction spatial filter specification. One finding is that the spatial autocorrelation involved transcends that latent in attribute variables representing characteristics of origins/destinations. Rather, the spatial autocorrelation relates to flows leaving nearby origins and arriving in nearby destinations. This conceptualization is reminiscent of the hierarchical component affiliated with geographic diffusion. This topic is at the research frontiers of spatial filtering work.

B.5.7 Concluding remarks

Spatial filtering methodology seeks to account for spatial autocorrelation in georeferenced data in a way that enables conventional statistical estimation techniques to be exploited. It also allows impacts of spatial autocorrelation to be uncovered in a more data analytic manner. Two geographically distributed attribute variables for the Cusco Department of Peru (2005 population density and elevation variation) are used here to illustrate this contention, with special reference to their bivariate correlation coefficient. The naive correlation coefficient is –0.48345.
Adjusting this value for the presence of positive spatial autocorrelation results in a decrease in its absolute value; in other words, positive spatial autocorrelation tends to inflate correlation coefficients. But this reduction is a function of the spatial filter specification employed. The autoregressive linear operator, PCNM, and eigenfunction spatial filtering results are very comparable. They are, respectively, –0.42070, –0.39203, and –0.43904. This finding is not surprising, because all three of these methodologies share a common mathematical foundation. In contrast, the Gi(d)-based spatial filtering yields a value of –0.20744. Part of its deviation from the other three results may well be attributable to its more restrictive assumptions.

Spatial filtering can be employed not only with the normal probability model, but also with the entire family of probability models affiliated with generalized linear models. It also supports spatial interpolation, and offers a vehicle for addressing spatial autocorrelation in geographic flows data.

Acknowledgements. We are indebted to Marco Millones, Clark University, for providing us with the Cusco Department GIS files, and its 2005 Peru Census data numbers.

References

Borcard D, Legendre P (2002) All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecol Mod 153(1/2):51-68
Borcard D, Legendre P, Avois-Jacquet C, Tuomisto H (2004) Dissecting the spatial structure of ecological data at multiple scales. Ecology 85(7):1826-1832
Dray S, Legendre P, Peres-Neto P (2006) Spatial modeling: A comprehensive framework for principal coordinate analysis of neighbor matrices (PCNM). Ecol Mod 196(3/4):483-493
Fischer MM, Griffith D (2008) Modeling spatial autocorrelation in spatial interaction data: an application to patent citation data in the European Union. J Reg Sci 48(5):969-989
Getis A (1990) Screening for spatial dependence in regression analysis.
Papers in Reg Sci Assoc 69(1):69-81
Getis A (1995) Spatial filtering in a regression framework: experiments on regional inequality, government expenditures, and urban crime. In Anselin L, Florax R (eds) New directions in spatial econometrics. Springer, Berlin, Heidelberg and New York, pp.172-188
Getis A, Griffith D (2002) Comparative spatial filtering in regression analysis. Geogr Anal 34(2):130-140
Getis A, Ord JK (1992) The analysis of spatial association by use of distance statistics. Geogr Anal 24(3):189-206
Griffith D (2000) A linear regression solution to the spatial autocorrelation problem. J Geogr Syst 2(2):141-156
Griffith D (2002) A spatial filtering specification for the auto-Poisson model. Stat Prob Letters 58(3):245-251
Griffith D (2003) Spatial autocorrelation and spatial filtering: gaining understanding through theory and scientific visualization. Springer, Berlin, Heidelberg and New York
Griffith D (2004) A spatial filtering specification for the autologistic model. Environ Plann A 36(10):1791-1811
Griffith D, Peres-Neto P (2006) Spatial modeling in ecology: the flexibility of eigenfunction spatial analyses. Ecology 87(10):2603-2613
Haining R (1991) Bivariate correlation with spatial data. Geogr Anal 23(3):210-227
LeSage JP, Pace K (2008) Spatial econometric modeling of origin-destination flows. J Reg Sci 48(5):941-968
Tiefelsdorf M, Griffith D (2007) Semi-parametric filtering of spatial autocorrelation: the eigenvector approach. Environ Plann A 39(5):1193-1221
Tobler WR (1975) Linear operators applied to areal data. In Davis J, McCullagh M (eds) Display and analysis of spatial data. Wiley, London, pp.14-37