B5 Griffith - Center for Spatially Integrated Social Science

B.5 Spatial Filtering
Daniel A. Griffith
B.5.1 Introduction
In spatial statistics and spatial econometrics, spatial filtering is a general methodology that supports more robust findings in data analytic work. It is based upon a posited linkage structure that ties together georeferenced data observations. Constructed mathematical operators are applied to separate geographically structured noise from both trend and random noise in georeferenced data. This yields clearer visualization possibilities and sounder statistical inference. In doing so, nearby/adjacent values are manipulated to help analyze the attribute value at a given location. Spatial filtering mathematically manipulates data in order to correct for potential distortions introduced by such factors as arbitrary scale, resolution and/or zonation (i.e., surface partitioning).
The primary idea is that some spatial proxy variables extracted from a spatial
relationship matrix are added as control variables to a model specification. The
principal advantage of this methodology is that these control variables, which
identify and isolate the stochastic spatial dependencies among georeferenced observations, allow model building to proceed as if these observations were independent.
Population count data from the 2005 census of Peru, by district, for the 108 districts forming the Cusco Department are used here to illustrate the various spatial filtering approaches empirically; an ArcGIS shapefile furnishes area measures for these districts. Population density, which ranges here from 0.8 to 11,512.8 per unit area, tends to be positively skewed: it has a natural lower bound of zero, and only a few areal units have relatively sizeable concentrations. Accordingly, analyses based upon the normal probability model require a Box-Cox power transformation to better align the empirical population density frequency distribution with a bell-shaped curve; here the transformation is
$$\frac{10}{\left(\dfrac{\text{population}}{\text{area}} + 13.7\right)^{0.56}}. \qquad \text{(B.5.1)}$$
This population density forms an elongated mound map pattern with a single peak. The highest density is in the city of Cusco, which has existed for more than 500 years. The next-highest densities stretch along an economic corridor formed by the Vilcanota River valley, and the lowest densities are in the most rural areas of the Department. Population density tends to covary with elevation variability, measured by the elevation standard deviation, selevation. Here the Box-Cox transformation is
$$\frac{1000}{\text{selevation} + 407.6}. \qquad \text{(B.5.2)}$$
The bivariate correlation for these two transformed variables is –0.48345, which is
statistically significant.
Fig. B.5.1. Geographic distributions across the Cusco Department of Peru; magnitude is directly related to gray tone darkness: (a) transformed population density; (b) transformed elevation standard deviation
The geographic distributions (see Table B.5.1 and Fig. B.5.1) of both transformed
population density and elevation variability display moderate, positive, and statistically significant spatial autocorrelation.
Table B.5.1. Transformed population density and elevation variability: spatial autocorrelation in terms of MC and GR

Attribute                           MC        zMC    GR
Y: population density               0.51461   8.85   0.41358
X: elevation standard deviation     0.45545   7.98   0.46650

Notes: MC denotes the Moran Coefficient, and GR denotes the Geary Ratio
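The summary statistics in Table B.5.1 follow standard definitions. MC = (n/1ᵀC1) zᵀCz / zᵀz for the mean-centered attribute z. GR = [(n − 1)/(2·1ᵀC1)] Σi Σj cij(yi − yj)² / zᵀz. zMC standardizes MC using its expected value, −1/(n − 1), and its variance under a normality assumption.

The following Python/NumPy sketch computes all three statistics. The rook-adjacency lattice is a hypothetical stand-in for the Cusco district topology, which is not reproduced here. The function names are illustrative only.

```python
import numpy as np


def rook_lattice(p, q):
    """Binary rook-adjacency matrix C for a p-by-q regular square tessellation."""
    n = p * q
    C = np.zeros((n, n))
    for r in range(p):
        for c in range(q):
            i = r * q + c
            if c + 1 < q:                      # east neighbor
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < p:                      # south neighbor
                C[i, i + q] = C[i + q, i] = 1.0
    return C


def moran_geary(y, C):
    """Return (MC, zMC, GR) for attribute vector y and connectivity matrix C.

    zMC uses the mean and variance of MC under a normality assumption."""
    y = np.asarray(y, dtype=float)
    C = np.asarray(C, dtype=float)
    n = y.size
    z = y - y.mean()
    s0 = C.sum()                               # 1^T C 1, total connectivity
    ss = z @ z                                 # sum of squared deviations
    mc = (n / s0) * (z @ C @ z) / ss
    sq_diff = (y[:, None] - y[None, :]) ** 2
    gr = ((n - 1) / (2.0 * s0)) * np.sum(C * sq_diff) / ss
    s1 = 0.5 * np.sum((C + C.T) ** 2)
    s2 = np.sum((C.sum(axis=1) + C.sum(axis=0)) ** 2)
    e_mc = -1.0 / (n - 1)
    var_mc = (n**2 * s1 - n * s2 + 3.0 * s0**2) / ((n**2 - 1) * s0**2) - e_mc**2
    return mc, (mc - e_mc) / np.sqrt(var_mc), gr


C = rook_lattice(10, 10)
row, col = np.divmod(np.arange(100), 10)
print(moran_geary(row + col, C))               # smooth trend: large positive MC, GR near 0
print(moran_geary((-1.0) ** (row + col), C))   # checkerboard: MC = -1
```

The two toy patterns bracket the range. A linear trend yields strong positive spatial autocorrelation (here GR = 0.03). A checkerboard yields perfect negative spatial autocorrelation (MC = −1, GR = 1.98).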
B.5.2 Types of spatial filtering
A limited number of implementations of this methodology currently exist for georeferenced data analysis purposes. They include:

- autoregressive linear operators (à la Cochrane-Orcutt prewhitening);
- Getis's Gi-based specification (Getis 1990, 1995);
- linear combinations of eigenvectors extracted from distance-based principal coordinates of neighbor matrices (PCNM; Borcard and Legendre 2002; Borcard et al. 2004; Dray et al. 2006);
- topology-based spatial weights matrix eigenfunctions (Griffith 2000, 2002, 2003, 2004).

The first of these is written in terms of a variance component, whereas the other three are written in terms of a mean response component. This allows especially the last two to be incorporated into generalized linear model (GLM) specifications.
One technical advantage of the latter three types of spatial filter is that probability density/mass function normalizing factors no longer are problematic. These constants ensure that a probability density/mass function integrates/sums to one. For the normal probability model, they are a function of the eigenvalues of the geographic connectivity matrix C (defined below). For the binomial and Poisson probability models they are intractable, so Markov chain Monte Carlo (MCMC) techniques are required to calculate parameter estimates for these models.

Another advantage is that the basis for the control variables does not change unless the spatial relationship matrix is changed. In other words, any attribute variables geographically distributed across a landscape and tagged to the same geocoding scheme can be treated with the same spatial filtering basis.

One disadvantage is that eigenfunctions, for example, may need to be extracted numerically from very large n-by-n matrices. Fortunately, analytical asymptotic eigenfunctions are known for a regular square tessellation forming a rectangular region (for example, a remotely sensed image).
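Both points can be illustrated for the simplest case. The Python/NumPy sketch below assumes a binary rook connectivity matrix C on a P-by-Q lattice, as a hypothetical stand-in for an image.

- For this C, the eigenvalues have the closed form 2cos(πj/(P+1)) + 2cos(πk/(Q+1)), with j = 1, ..., P and k = 1, ..., Q.
- The log-determinant that enters the normal-model normalizing factor reduces to ln|I − ρC| = Σi ln(1 − ρλi), provided every 1 − ρλi is positive.

The function names are hypothetical. The sketch covers C itself, not the centered matrices used later for eigenvector filtering.

```python
import numpy as np


def rook_lattice(p, q):
    """Binary rook-adjacency matrix C for a p-by-q regular square tessellation."""
    n = p * q
    C = np.zeros((n, n))
    for r in range(p):
        for c in range(q):
            i = r * q + c
            if c + 1 < q:
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < p:
                C[i, i + q] = C[i + q, i] = 1.0
    return C


def analytic_rook_eigenvalues(p, q):
    """Closed-form eigenvalues (ascending) of the binary rook matrix C on a p-by-q lattice."""
    lam_rows = 2.0 * np.cos(np.pi * np.arange(1, p + 1) / (p + 1))
    lam_cols = 2.0 * np.cos(np.pi * np.arange(1, q + 1) / (q + 1))
    return np.sort((lam_rows[:, None] + lam_cols[None, :]).ravel())


def sar_log_jacobian(rho, eigenvalues):
    """ln det(I - rho*C) from the eigenvalues of C (requires 1 - rho*lambda > 0)."""
    return float(np.sum(np.log(1.0 - rho * np.asarray(eigenvalues))))


C = rook_lattice(4, 5)
lam = analytic_rook_eigenvalues(4, 5)
print(np.allclose(np.linalg.eigvalsh(C), lam))          # True: no numerical extraction needed
sign, logdet = np.linalg.slogdet(np.eye(20) - 0.2 * C)
print(np.isclose(logdet, sar_log_jacobian(0.2, lam)))   # True
```

Once the eigenvalues are known, the normalizing term costs O(n) per trial value of ρ, instead of a fresh n-by-n determinant.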
Various studies (for example, Getis and Griffith 2002; Griffith and Peres-Neto
2006) report that results obtained with these different spatial filter approaches essentially are equivalent.
Autoregressive linear operators
Impulse-response function filtering of time series data predates a parallel approach for spatial filtering, and motivated the development of spatial autoregressive linear operators (Tobler 1975), whose error term is correlated with some response variable, Y. Consider the simultaneous spatial autoregressive (SAR) model specification
$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + (\mathbf{I} - \rho\,\mathbf{C})^{-1}\boldsymbol{\varepsilon} \qquad \text{(B.5.3)}$$
where:

- X is an n-by-(P+1) matrix of covariates, including a column of ones for the intercept;
- β is a (P+1)-by-1 vector of regression coefficients;
- ρ is a spatial autocorrelation parameter;
- ε is an n-by-1 vector of random error terms;
- n is the number of areal units;
- I is an n-by-n identity matrix;
- C is a topology-based n-by-n geographic connectivity/weights matrix (for example, cij = 1 if areal units i and j are nearby/adjacent, and cij = 0 otherwise; cii = 0).

Here the spatial filters take the matrix form (I – ρC). The parameter ρ is estimated for Y (denoted ρ̂). It is then used in two multiplications: (I – ρ̂C)Y, for the n-by-1 vector of response values, and (I – ρ̂C)X, for the n-by-(P+1) matrix of P covariates and the intercept term.
This spatial filter is almost always coupled with the normal probability model,
and if properly specified, renders independent and identically distributed random
error terms. Smoothing occurs in that each dataset value is rewritten as the difference between the observed value and a linear combination of neighboring values.
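The two-step filter just described can be sketched in Python with NumPy. The sketch makes the following assumptions:

- It fits the pure SAR model Y = µ1 + (I − ρW)⁻¹ε, with a row-standardized W as in the Cusco example that follows.
- It estimates ρ by a simple grid search over the concentrated log-likelihood, which is our own simplification.

The function names and the grid resolution are hypothetical choices, not part of the original method.

```python
import numpy as np


def rook_lattice(p, q):
    """Binary rook-adjacency matrix C for a p-by-q regular square tessellation."""
    n = p * q
    C = np.zeros((n, n))
    for r in range(p):
        for c in range(q):
            i = r * q + c
            if c + 1 < q:
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < p:
                C[i, i + q] = C[i + q, i] = 1.0
    return C


def row_standardize(C):
    """Row-standardized weights W = diag(C1)^(-1) C."""
    C = np.asarray(C, dtype=float)
    return C / C.sum(axis=1, keepdims=True)


def sar_rho_ml(y, W, n_grid=4001):
    """Grid-search ML estimate of rho for y = mu*1 + (I - rho*W)^(-1) eps."""
    y = np.asarray(y, dtype=float)
    n = y.size
    lam = np.real(np.linalg.eigvals(W))        # real when W row-standardizes a symmetric C
    Wy, W1 = W @ y, W @ np.ones(n)
    # interior of the admissible interval (1/lambda_min, 1/lambda_max)
    grid = np.linspace(1.0 / lam.min(), 1.0 / lam.max(), n_grid)[1:-1]
    best_rho, best_ll = 0.0, -np.inf
    for rho in grid:
        ys = y - rho * Wy                      # (I - rho W) y
        xs = 1.0 - rho * W1                    # (I - rho W) 1: the intercept is filtered too
        mu = (xs @ ys) / (xs @ xs)             # OLS mean of the filtered data
        rss = np.sum((ys - mu * xs) ** 2)
        # concentrated log-likelihood: -n/2 ln(sigma^2) + ln|I - rho W|
        ll = -0.5 * n * np.log(rss / n) + np.sum(np.log(1.0 - rho * lam))
        if ll > best_ll:
            best_rho, best_ll = rho, ll
    return best_rho


def sar_filter(v, W, rho):
    """Apply the spatial filter (I - rho*W) to a vector or column-stacked matrix v."""
    return v - rho * (W @ v)
```

Suppose Y and X are arrays holding the two transformed attributes. The filtered correlation reported below would then correspond to:

```python
np.corrcoef(sar_filter(Y, W, sar_rho_ml(Y, W)), sar_filter(X, W, sar_rho_ml(X, W)))[0, 1]
```

Each variable is filtered with its own estimate of ρ.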
The pure spatial autoregressive (SAR) maximum likelihood parameter estimates for the transformed population density (pd) and elevation standard deviation (selevation) attribute variables are, respectively, 0.79164 and 0.77455. According to their corresponding pseudo-R² calculations, positive spatial autocorrelation latent in the transformed population density variable accounts for roughly 60 percent of its geographic variability. That latent in the transformed selevation accounts for roughly 55 percent.

The bivariate correlation coefficient is then calculated for the spatially filtered variate pair, (I – 0.79164W)Y and (I – 0.77455W)X. Here matrix W is the row-standardized version of matrix C. The coefficient decreases in absolute value to –0.42070, and both filtered variates continue to conform closely to a normal distribution. Although both variables have roughly the same level of positive spatial autocorrelation, this decrease is rather modest because their map patterns are noticeably different (see Fig. B.5.1).
Getis’s Gi specification
This specification involves a multistep procedure exploiting Ripley’s second-order
statistic or the range of a geostatistical semivariogram model coupled with the
Getis-Ord (1992) Gi statistic, and converts each spatially autocorrelated variable
into a pair of synthetic variates, one capturing spatial dependencies and one capturing non-spatial systematic and random effects. Regressing a response variable
on the set of constructed spatial and a-spatial variates allows geographically structured noise to be separated from trend and random noise in georeferenced data.
But it is restricted to non-negative random variables having a natural origin.
The primary pair of equations is given by
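$$y_i^{*} = y_i \,\frac{\sum_{j=1}^{n} c_{ij}(d) \,/\, (n-1)}{\sum_{j=1}^{n} c_{ij}(d)\, y_j \,/\, \left(\sum_{j=1}^{n} y_j - y_i\right)} \qquad \text{(B.5.4)}$$

and

$$\mathbf{L}_y = \mathbf{Y} - \mathbf{Y}^{*} \qquad \text{(B.5.5)}$$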
where d denotes distance separating location j from location i, the denominator is
Gi(d), the numerator is E[Gi(d)], Y* is the a-spatial variable realization, and Ly is
the spatial variable. Distance d is selected such that Gi(d), which initially tends to
increase with increasing distance, begins to decrease.
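Equations (B.5.4) and (B.5.5) translate almost line for line into Python/NumPy. The sketch below rests on these assumptions:

- Weights are binary: cij(d) = 1 when location j lies within distance d of location i (j ≠ i), and 0 otherwise.
- A distance matrix has been precomputed.
- Distances have already been chosen, either one d for all locations or one di per location. The distance-selection step itself is not shown.

`getis_filter` is a hypothetical helper name.

```python
import numpy as np


def getis_filter(y, D, d):
    """Split non-negative y into an a-spatial part y* (Eq. B.5.4) and a
    spatial part L_y = y - y* (Eq. B.5.5).

    D: n-by-n distance matrix; d: a scalar distance or one distance per location.
    Assumes binary weights c_ij(d) = 1 if D_ij <= d_i and j != i, else 0,
    and that every location has at least one neighbor with a positive value."""
    y = np.asarray(y, dtype=float)
    D = np.asarray(D, dtype=float)
    n = y.size
    if np.any(y < 0):
        raise ValueError("Gi-based filtering requires non-negative values")
    d = np.broadcast_to(np.asarray(d, dtype=float), (n,))
    C = ((D <= d[:, None]) & ~np.eye(n, dtype=bool)).astype(float)
    expected_gi = C.sum(axis=1) / (n - 1)          # E[G_i(d)] = W_i / (n - 1)
    gi = (C @ y) / (y.sum() - y)                   # G_i(d), sums taken over j != i
    y_star = y * expected_gi / gi                  # a-spatial variate, Eq. (B.5.4)
    return y_star, y - y_star                      # (Y*, L_y), Eq. (B.5.5)
```

Consider the limiting case in which d spans every pair of locations. Then Gi(d) equals its expectation (1) everywhere, so Y* = Y and Ly = 0: no spatial component is extracted. With Ly, X* and Lx in hand, Eq. (B.5.6) below is an ordinary least squares regression.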
Figure B.5.2(a) displays the areal unit centroids for the Cusco region. Figure
B.5.2(b) indicates that a 3-parameter gamma distribution (parameter estimates:
shape = 0.7533, scale = 0.6984, and threshold = 0.0297) furnishes a good description of the set of distances. Figure B.5.2(c) illustrates the concavity of the Eq.
(B.5.4) trajectories across the distance range of [0, 3.996]. Of note is that some
trajectories encounter local peaks that are not global peaks. The number of geographic connections used for the transformed population density Gi(d) is 3,262,
whereas that for the transformed elevation standard deviation is 4,787; in contrast,
the number of connections in matrix C is 570.
Figure B.5.3 portrays the maps of the synthetic spatial variates given by Eq. (B.5.5). The correlation between the two a-spatial synthetic variates is –0.20744, indicating that spatial autocorrelation dramatically inflates the observed coefficient. The regression equation may be written as follows:
$$Y = a + b_1 L_y + b_2 X^{*} + b_3 L_x + e. \qquad \text{(B.5.6)}$$
The variance in Y, the transformed population density, is accounted for as follows:

- 11.41 percent by Ly, the synthetic spatial variate;
- 15.39 percent by X*, the synthetic a-spatial covariate;
- 6.46 percent by Lx, the synthetic spatial covariate.

Moderate multicollinearity is present in this model specification, but it has virtually no impact on the regression coefficient variance inflation factors (VIFs).
Fig. B.5.2. The Cusco Department of Peru: (a) geographic distribution of areal unit centroids; (b) a three-parameter gamma distribution description of the di values for Gi(d) (the black line denotes the empirical, and the gray line the theoretical, cumulative distribution function (CDF)); (c) four selected areal unit trajectories for identifying the di values for transformed population density (the solid black circle denotes the smallest di, the black asterisk the largest di, and gray circles the median di values)
Fig. B.5.3. Geographic distributions across the Cusco Department of Peru of Gi(d)-based spatial variates; magnitude is directly related to gray tone darkness: (a) extracted from the transformed population density; (b) extracted from the transformed elevation standard deviation
Linear combinations of distance matrix-based eigenvectors
Dray et al. (2006) specify the PCNM transformation procedure. It depends on mathematical expressions, known as eigenfunctions, of a truncated inter-location distance matrix. The truncation value is the longest edge of a minimum spanning tree connecting all sampling units, that is, the smallest distance that keeps all sampling units connected. The PCNM specification relates to semivariogram modeling.

Distance-based eigenvector maps with large eigenvalues (that is, strong positive spatial autocorrelation) tend to have only a few large clusters of values on a map and represent global trends [for example, Fig. B.5.4(b)]. Eigenvectors with intermediate-size eigenvalues tend to have a number of moderate-sized clusters of values on a map and represent regional trends [for example, Fig. B.5.4(c) and Fig. B.5.4(d)]. Eigenvectors with small eigenvalues tend to have numerous small clusters of values on a map. These represent patchiness, and hence more local trends, across a landscape [for example, Fig. B.5.4(e)]. Taken together, distance-based eigenvector maps capture the range of geographic scales encapsulated in a given georeferenced dataset. They portray increasing fragmentation as the corresponding eigenvalues decrease in magnitude.
This specification utilizes eigenvectors extracted from the modified geographic weights matrix $(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{W}\,(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)$, where 1 is an n-by-1 vector of ones and T denotes the matrix transpose operation. The elements of the n-by-n geographic weights matrix W are defined as follows:
0

Wij = 0

1−[d ij / ( 4t )]2
if i = j
if d ij > t
(B.5.7)
if 0 < d ij ≤ t
where t is the longest edge of a minimum spanning tree connecting all n locations [see Fig. B.5.4(a)]. Here the great circle distance value for t is 16.022 km.
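A compact Python/NumPy sketch of the PCNM construction follows. It assumes planar (Euclidean) coordinates rather than the great circle distances used for Cusco. The steps are:

1. Find t as the longest edge of a minimum spanning tree, using Prim's algorithm.
2. Build W from Eq. (B.5.7).
3. Double-center W.
4. Keep the eigenvectors with positive eigenvalues.

The helper names and the 1e-8 cut-off for "positive" are illustrative assumptions.

```python
import numpy as np


def mst_threshold(D):
    """Longest edge of a minimum spanning tree (Prim's algorithm) on the
    complete graph whose edge lengths are the distances in D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()                     # cheapest link from each node to the tree
    longest = 0.0
    for _ in range(n - 1):
        cand = np.where(in_tree, np.inf, best)
        k = int(np.argmin(cand))
        longest = max(longest, float(cand[k]))
        in_tree[k] = True
        best = np.minimum(best, D[k])
    return longest


def pcnm_eigenvectors(D, eps=1e-8):
    """Return (t, eigenvalues, eigenvectors) of (I - 11^T/n) W (I - 11^T/n),
    with W from Eq. (B.5.7), sorted by decreasing eigenvalue, positive ones only."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    t = mst_threshold(D)
    W = np.where((D > 0) & (D <= t), 1.0 - (D / (4.0 * t)) ** 2, 0.0)
    M = np.eye(n) - np.ones((n, n)) / n    # centering projector I - 11^T/n
    vals, vecs = np.linalg.eigh(M @ W @ M)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > eps
    return t, vals[keep], vecs[:, keep]
```

The retained eigenvectors are orthonormal and have mean zero, because the vector of ones lies in the null space of the centered matrix. This is what later allows µY to be estimated separately in Eq. (B.5.8).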
The eigenvalues associated with the PCNM eigenvectors do not have a simple relationship with their affiliated MCs (see Table B.5.2); some non-zero eigenvalues even represent weak negative spatial autocorrelation. Employing the threshold MC/MCmax > 0.25, where MCmax denotes the maximum MC value, reduces the candidate set of eigenvectors for constructing PCNM spatial filters to 15 (that is, eigenvectors E1 to E12, E14, E16 and E17). The spatial autocorrelation contained in a response variable Y may be described with these eigenvectors as follows:
$$\mathbf{Y} = \mu_Y \mathbf{1} + \mathbf{E}_k \boldsymbol{\beta}_k + \boldsymbol{\varepsilon}_Y \qquad \text{(B.5.8)}$$
where:

- Ek is an n-by-K matrix of selected eigenvectors, chosen using stepwise regression techniques;
- µY is the mean of variable Y (because all of the eigenvectors have a mean of zero);
- βk is a K-by-1 vector of regression coefficients;
- εY is a random error term that is iid N(0, σε²).

For transformed population density in the Cusco Department, Eq. (B.5.8) contains seven eigenvectors that account for 52.42 percent of its geographic variation. The zMC value (the z-score for the MC under a null hypothesis of zero spatial autocorrelation) decreases from 8.79 to 2.83, and the residuals continue to mimic a normal distribution. The spatial filter itself has MC = 0.76944 (GR = 0.25051).
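The stepwise step in Eq. (B.5.8) can be sketched as forward selection over a candidate matrix of mean-zero, orthonormal eigenvectors, such as those kept by the MC/MCmax > 0.25 screen. To stay within NumPy, the sketch approximates each t-test p-value with the normal distribution. This is an assumption, reasonable for n near 100; scipy.stats.t.sf would give the exact t tail. The 0.10 entry threshold is a hypothetical default, and `forward_select` is an illustrative name.

```python
import math

import numpy as np


def forward_select(y, E, alpha=0.10):
    """Forward stepwise selection of eigenvectors (columns of E) for
    y = mu*1 + E_k beta_k + eps; returns (selected indices, coefficients, R^2)."""
    y = np.asarray(y, dtype=float)
    n, m = E.shape
    selected = []

    def fit(cols):
        X = np.column_stack([np.ones(n), E[:, cols]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return X, beta, y - X @ beta

    while len(selected) < m:
        best_k, best_p = None, 1.0
        for k in range(m):
            if k in selected:
                continue
            X, beta, resid = fit(selected + [k])
            s2 = resid @ resid / (n - X.shape[1])
            se = math.sqrt(s2 * np.linalg.inv(X.T @ X)[-1, -1])
            p = math.erfc(abs(beta[-1] / se) / math.sqrt(2.0))  # two-sided, normal approx.
            if p < best_p:
                best_k, best_p = k, p
        if best_k is None or best_p >= alpha:
            break                      # no remaining candidate is significant
        selected.append(best_k)
    _, beta, resid = fit(selected)
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return selected, beta, r2
```

At each step, the candidate with the smallest p-value is also the one that raises R² the most. This is why significance-based and R²-maximization selection usually agree on the order of entry.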
Fig. B.5.4. The Cusco Department of Peru; magnitude in the choropleth maps is directly related to gray tone darkness: (a) the minimum spanning tree connecting the areal unit centroids; (b) E1, MC/MCmax = 1; (c) E3, MC/MCmax = 0.78; (d) E9, MC/MCmax = 0.52; (e) E14, MC/MCmax = 0.25
The correlation between the two sets of residuals for Eq. (B.5.8), after the respective spatial filters have been subtracted from transformed population density and transformed selevation, is –0.39203. This indicates that spatial autocorrelation dramatically inflates the observed bivariate correlation coefficient. The inflation is primarily attributable to the three common eigenvectors, whose correlation is –0.91928. It is suppressed, however, by the presence of the two sets of unique eigenvectors, whose mutual correlations are exactly zero.
Table B.5.2. Spatial autocorrelation contained in the 30 PCNM eigenvectors with non-zero eigenvalues

Eigenvector   Eigenvalue   MC          GR
E1            6.757698     0.879567    0.199937
E2            5.387761     0.830660    0.257939
E3            4.428140     0.683353    0.326650
E4            3.891251     0.687340    0.395729
E5            3.390504     0.586984    0.444859
E6            2.960842     0.611523    0.439971
E7            2.796129     0.523140    0.534097
E8            2.389961     0.350517    0.691055
E9            2.285282     0.458341    0.611829
E10           2.176693     0.495837    0.470299
E11           1.932853     0.407144    0.637691
E12           1.467388     0.355678    0.725449
E13           1.359360     0.196455    0.720122
E14           1.345400     0.220722    0.719404
E15           1.164052     0.209691    0.768738
E16           1.115993     0.255954    0.644689
E17           0.986066     0.246825    0.870757
E18           0.968562     0.192463    0.462175
E19           0.890959     0.168454    0.909110
E20           0.779474     0.098354    1.126354
E21           0.714815     0.151115    0.818756
E22           0.664565     –0.034844   1.188295
E23           0.578630     –0.109281   1.295845
E24           0.540622     0.045193    1.032415
E25           0.386237     0.003223    0.917556
E26           0.291445     –0.127027   1.275769
E27           0.228748     –0.025737   1.200260
E28           0.213037     –0.036185   1.254641
E29           0.158005     0.043036    1.033882
E30           0.083016     –0.067713   1.261137
Linear combinations of topological matrix-based eigenvectors
This specification (see Tiefelsdorf and Griffith 2007) is a transformation procedure that also depends on eigenvectors. Here they are extracted from the adjusted geographic weights matrix $(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{C}\,(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)$, a term appearing in the numerator of the MC spatial autocorrelation index. This decomposition also could be based upon the GR index. It rests on the following property:

- The first eigenvector, say E1, is the set of real values that has the largest MC achievable by any set for the spatial arrangement defined by the geographic connectivity matrix C.
- The second eigenvector is the set of real values that has the largest achievable MC by any set that is uncorrelated with E1.
- The third eigenvector is the set of real values that has the largest achievable MC by any set that is uncorrelated with both E1 and E2.
- And so on through En, the set of real values that has the largest negative MC achievable by any set that is uncorrelated with the preceding (n–1) eigenvectors.

As such, these eigenvectors furnish distinct map pattern descriptions of latent spatial autocorrelation in georeferenced variables, because they are both orthogonal and uncorrelated. Their corresponding eigenvalues, λk, index the nature and degree of spatial autocorrelation portrayed by each eigenvector. They can be easily converted to MC values: $\mathrm{MC}_k = n\lambda_k / (\mathbf{1}^{T}\mathbf{C}\mathbf{1})$.
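The eigenvector properties just listed can be checked numerically. The Python/NumPy sketch below assumes a binary rook lattice in place of the Cusco district topology. It extracts the eigenvectors of (I − 11ᵀ/n)C(I − 11ᵀ/n) and converts each eigenvalue to an MC value. It then confirms that E1 attains that MC and that no other map pattern exceeds it. The helper names are illustrative.

```python
import numpy as np


def rook_lattice(p, q):
    """Binary rook-adjacency matrix C for a p-by-q regular square tessellation."""
    n = p * q
    C = np.zeros((n, n))
    for r in range(p):
        for c in range(q):
            i = r * q + c
            if c + 1 < q:
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < p:
                C[i, i + q] = C[i + q, i] = 1.0
    return C


def moran(y, C):
    """Moran Coefficient of y for connectivity matrix C."""
    z = np.asarray(y, dtype=float) - np.mean(y)
    return len(z) / C.sum() * (z @ C @ z) / (z @ z)


def mcm_eigenvectors(C):
    """Eigenvalues, eigenvectors and MC values of (I - 11^T/n) C (I - 11^T/n),
    ordered from the largest positive to the largest negative MC."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(M @ C @ M)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    return vals, vecs, n * vals / C.sum()      # MC_k = n * lambda_k / (1^T C 1)


C = rook_lattice(6, 7)
vals, vecs, mc = mcm_eigenvectors(C)
print(np.isclose(moran(vecs[:, 0], C), mc[0]))  # True: E1 attains MCmax
```

The bound follows from the Rayleigh quotient. For any mean-centered z, zᵀCz equals zᵀ(I − 11ᵀ/n)C(I − 11ᵀ/n)z, so no map pattern can have an MC above mc[0] or below mc[-1].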
As with PCNM, the resulting spatial filter is constructed from some linear combination of a subset of these eigenvectors. The selection proceeds in three steps:

1. The candidate set can begin with all eigenvectors portraying the same nature (that is, positive or negative) of spatial autocorrelation as is measured in a response variable.
2. Eigenvectors representing inconsequential levels of spatial autocorrelation (that is, with very small eigenvalues) should then be removed from this candidate set.
3. Finally, a stepwise regression procedure can be used to select those eigenvectors that account for the spatial autocorrelation in the response variable. This stepwise selection can be based upon, say, the conventional R²-maximization criterion, or a residual MC minimization criterion.
In practice, this spatial filter specification replaces the autoregressive spatial
filter with its eigenfunction counterpart, and its single autoregressive parameter
with a set of parameter estimates, one for each eigenvector, removing those from
the model whose estimates essentially are zero.
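The residual MC minimization criterion mentioned above can be sketched as a greedy forward search. At each step, the search adds the candidate eigenvector whose inclusion yields the smallest absolute residual MC. It stops when no candidate reduces that value further.

This Python/NumPy sketch makes the following assumptions:

- The candidate set keeps eigenvectors with MC/MCmax > 0.25, mirroring the screen described in the text.
- The stopping rule is our own choice.
- The residual statistic is the plain MC of OLS residuals, rather than a regression-adjusted expectation.

All names are hypothetical.

```python
import numpy as np


def rook_lattice(p, q):
    """Binary rook-adjacency matrix C for a p-by-q regular square tessellation."""
    n = p * q
    C = np.zeros((n, n))
    for r in range(p):
        for c in range(q):
            i = r * q + c
            if c + 1 < q:
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < p:
                C[i, i + q] = C[i + q, i] = 1.0
    return C


def moran(z, C):
    """Moran Coefficient of z for connectivity matrix C."""
    z = np.asarray(z, dtype=float) - np.mean(z)
    return len(z) / C.sum() * (z @ C @ z) / (z @ z)


def candidate_eigenvectors(C, ratio=0.25):
    """Eigenvectors of (I - 11^T/n) C (I - 11^T/n) with MC/MCmax > ratio, largest MC first."""
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(M @ C @ M)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    mc = n * vals / C.sum()
    keep = mc / mc[0] > ratio                  # positive and non-trivial autocorrelation
    return vecs[:, keep], mc[keep]


def select_by_residual_mc(y, E, C):
    """Greedy forward selection of columns of E that minimizes |MC| of OLS residuals."""
    y = np.asarray(y, dtype=float)
    n, m = E.shape

    def resid_mc(cols):
        X = np.column_stack([np.ones(n), E[:, cols]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return abs(moran(y - X @ beta, C))

    selected = []
    current = resid_mc(selected)
    while len(selected) < m:
        scores = {k: resid_mc(selected + [k]) for k in range(m) if k not in selected}
        k_best = min(scores, key=scores.get)
        if scores[k_best] >= current:
            break                              # no candidate lowers |residual MC|
        selected.append(k_best)
        current = scores[k_best]
    return selected, current
```

Compared with R²-maximization, this criterion targets the spatial autocorrelation in the residuals directly. It can therefore stop earlier, leaving out eigenvectors that raise R² without changing the residual MC.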
Table B.5.3. Eigenvector spatial filter regression results using a 10 percent level of significance selection criterion: population density (Y) and elevation standard deviation (X) for the Cusco Department, Peru (n = 108)

Component                          Transformed Y           Transformed X
Common eigenvectors                R² = 0.4645             R² = 0.5189
Unique eigenvectors                R² = 0.1543             R² = 0.0565
All selected eigenvectors          R² = 0.6188             R² = 0.5753
Residual MC                        zMC ≈ –0.23             zMC ≈ –0.19
Shapiro-Wilk (S-W) statistic       0.987 (prob = 0.393)    0.986 (prob = 0.313)
MC for spatial filters             0.4719                  0.4019
Spatial filters were constructed for the two Cusco transformed attribute variables. The candidate eigenvector set was restricted to those 24 vectors portraying positive spatial autocorrelation and having an MC/MCmax > 0.25. The maximum possible MC value for Cusco's topological surface partitioning, MCmax, is 1.09315, the MC value for the principal eigenvector. The resulting spatial filters appear in Fig. B.5.5, each portraying strong positive spatial autocorrelation and each closely reflecting its parent map (see Fig. B.5.1). Summary measures for them are reported in Table B.5.3.

The bivariate correlation coefficient between (X – FX) and (Y – FY), where Fj denotes the spatial filter for variable j, decreases in absolute value to –0.42688. Both differences continue to conform closely to a normal distribution. Here spatial autocorrelation accounts for roughly 62 percent and 58 percent of the geographic variability in the transformed population density and elevation variability, respectively. The filtered residuals contain negligible spatial autocorrelation. Although both variables have roughly the same level of positive spatial autocorrelation, the decrease in the correlation coefficient is rather modest because their map patterns are noticeably different: their spatial filters have nine eigenvectors in common, and seven that are specific to one or the other of them. The decompositions highlighted here may be written as
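$$\mathbf{Y} = \mu_Y \mathbf{1} + \mathbf{E}_c \boldsymbol{\beta}_{cY} + \mathbf{E}_{uY} \boldsymbol{\beta}_{uY} + \boldsymbol{\varepsilon}_Y \qquad \text{(B.5.9)}$$

$$\mathbf{X} = \mu_X \mathbf{1} + \mathbf{E}_c \boldsymbol{\beta}_{cX} + \mathbf{E}_{uX} \boldsymbol{\beta}_{uX} + \boldsymbol{\varepsilon}_X \qquad \text{(B.5.10)}$$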
where:

- E is an n-by-H matrix of selected eigenvectors for X and an n-by-K matrix for Y, with H and K not necessarily equal;
- subscripts c and u respectively denote the common and unique sets of eigenvectors;
- β is a vector of regression coefficients;
- εY and εX are the iid N(0, σ²εj), j = X or Y, a-spatial variates for variables Y and X, respectively.

As with PCNM, the linear combinations of eigenvectors are the spatial filters.
Fig. B.5.5. Topology-based spatial filters for the Cusco Department of Peru; eigenvector values are directly related to gray tone darkness: (a) for transformed population density; (b) for transformed elevation standard deviation
Now the bivariate correlation coefficient can be rewritten as the following weighted combination of different correlation coefficients, where the weights are the square roots of products of relative variance terms (see Table B.5.3):
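$$
\begin{aligned}
r_{X,Y} ={}& r_{\text{resid}_X,\text{resid}_Y}\sqrt{(1-R_X^2)(1-R_Y^2)} + r_{E_{cX},E_{cY}}\sqrt{R_{E_{cX}}^2 R_{E_{cY}}^2} \\
&+ r_{E_{uX},\text{resid}_Y}\sqrt{R_{E_{uX}}^2(1-R_Y^2)} + r_{\text{resid}_X,E_{uY}}\sqrt{(1-R_X^2)R_{E_{uY}}^2} + 0\,\sqrt{R_{E_{uX}}^2 R_{E_{uY}}^2}
\end{aligned}
\qquad \text{(B.5.11)}
$$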
where:

- resid denotes the residuals;
- R² is a linear regression coefficient of determination (squared multiple correlation coefficient);
- the subscripts X and Y denote with which variable a term is associated.

The zero correlation arises because the unique sets of eigenvectors are orthogonal and uncorrelated. Substituting the corresponding Cusco case study values into this equation (see Table B.5.3; some rounding error is present) yields
$$
\begin{aligned}
-0.48345 ={}& -0.43904\sqrt{(1-0.6188)(1-0.5753)} - 0.60486\sqrt{(0.4645)(0.5189)} \\
&- 0.10384\sqrt{(0.1543)(1-0.5753)} + 0.11396\sqrt{(1-0.6188)(0.0565)} \\
&+ 0\,\sqrt{(0.1543)(0.0565)}.
\end{aligned}
$$
This decomposition equation, like that for PCNM, emphasizes that common eigenvectors tend to increase the magnitude of a correlation coefficient, whereas unique eigenvectors tend to suppress it.
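Because the eigenvectors are mutually orthogonal and have mean zero, Eq. (B.5.11) is an exact identity rather than an approximation: every cross-product it omits vanishes. The Python/NumPy sketch below checks this on synthetic data. Random orthonormal, mean-zero columns stand in for the common and unique eigenvector sets, and all names are hypothetical. The sketch then re-evaluates the Cusco substitution using the square-root weights described above and the values reported in the text.

```python
import numpy as np


def correlation_decomposition(x, y, E_c, E_ux, E_uy):
    """Return the four non-zero terms of Eq. (B.5.11).

    E_c (common), E_ux (unique to X), E_uy (unique to Y) must have mean-zero,
    orthonormal, mutually orthogonal columns."""
    xc, yc = x - x.mean(), y - y.mean()
    ac, au = E_c @ (E_c.T @ xc), E_ux @ (E_ux.T @ xc)   # X's common and unique filter parts
    bc, bu = E_c @ (E_c.T @ yc), E_uy @ (E_uy.T @ yc)   # Y's common and unique filter parts
    ex, ey = xc - ac - au, yc - bc - bu                 # residuals after filtering

    def r(u, v):                                        # correlation of mean-zero vectors
        return (u @ v) / np.sqrt((u @ u) * (v @ v))

    def share(u, total):                                # relative variance term, e.g. R^2
        return (u @ u) / (total @ total)

    R2x, R2y = share(ac + au, xc), share(bc + bu, yc)
    return [
        r(ex, ey) * np.sqrt((1 - R2x) * (1 - R2y)),
        r(ac, bc) * np.sqrt(share(ac, xc) * share(bc, yc)),
        r(au, ey) * np.sqrt(share(au, xc) * (1 - R2y)),
        r(ex, bu) * np.sqrt((1 - R2x) * share(bu, yc)),
    ]  # the fifth term, r(au, bu), is zero because E_ux and E_uy are orthogonal


# Re-evaluating the Cusco substitution with the values reported in the text
cusco = (-0.43904 * np.sqrt((1 - 0.6188) * (1 - 0.5753))
         - 0.60486 * np.sqrt(0.4645 * 0.5189)
         - 0.10384 * np.sqrt(0.1543 * (1 - 0.5753))
         + 0.11396 * np.sqrt((1 - 0.6188) * 0.0565))
print(round(float(cusco), 4))  # -0.4835
```

The reported values reproduce the naive coefficient, –0.48345, to within rounding error. The second (common eigenvector) term, about –0.297, is the largest contributor.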
B.5.3 Eigenfunction spatial filtering and generalized linear models
A spatial filter can be constructed for GLM specifications again using a stepwise
selection technique. By doing so, MCMC techniques can be avoided when estimating model parameters in the presence of spatial autocorrelation; rather, standard GLM procedures can be used.
Because population is a count variable, it can be treated as a Poisson random variable. The area variable in the denominator of population density can then be converted to a GLM offset variable: its logarithm is included in the model specification as a special covariate whose coefficient is set to one rather than estimated.

For the Cusco Department data, the GLM estimation, including log(selevation) as a covariate, yields the spatial filter appearing in Fig. B.5.6(a). Its MC = 0.86030 (zMC = 14.94) and GR = 0.31022. This spatial filter has nine eigenvectors, six of which are contained in the set of eleven for the corresponding normal-approximation spatial filter. Including the previously specified transformed selevation as a covariate in the normal-approximation specification increases its R² to 0.6821. Switching to the appropriate probability model here results in a more parsimonious model. Its predicted values better align with actual population density across the entire range of density values [see Fig. B.5.6(b) and Fig. B.5.6(c)].
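A Poisson GLM with an area offset and a set of eigenvector spatial-filter covariates is an ordinary GLM. It can therefore be fitted by iteratively reweighted least squares (IRLS), which is the point of avoiding MCMC.

The Python/NumPy sketch below is a minimal, hypothetical implementation, with no step halving or convergence diagnostics. In practice a standard GLM routine would be used with the same design:

- an intercept;
- log(selevation);
- the selected eigenvectors;
- log(area) as the offset.

```python
import numpy as np


def poisson_irls(y, X, offset=None, max_iter=100, tol=1e-10):
    """Fit log E[y] = offset + X beta by IRLS; column 0 of X is assumed to be the intercept."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    offset = np.zeros(len(y)) if offset is None else np.asarray(offset, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.sum() / np.exp(offset).sum())    # intercept-only starting value
    for _ in range(max_iter):
        eta = offset + X @ beta
        mu = np.exp(eta)
        z = eta - offset + (y - mu) / mu                 # working response, offset removed
        XtW = X.T * mu                                   # Poisson/log-link weights equal mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta
```

For the Cusco setting, the design matrix X would be np.column_stack([np.ones(n), np.log(selevation), E_sel]) with offset=np.log(area). Here `selevation`, `area` and `E_sel` are hypothetical array names. The exponentiated linear predictor, without the offset, is then the fitted population density.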
Fig. B.5.6. Generalized linear model (GLM) results: (a) the population density GLM spatial filter; eigenvector values are directly related to gray tone darkness; (b) scatterplot of the predicted versus the observed pd; (c) scatterplot of the predicted versus the observed pd with the four largest values set aside. The solid black line denotes observed pd, open circles denote GLM-predicted pd, and asterisks denote back-transformed normal approximation predicted pd
B.5.4 Eigenfunction spatial filtering and geographically weighted regression
Eigenfunction spatial filters allow geographically varying coefficient models to be specified, along the lines of geographically weighted regression (GWR). Interaction terms can be created by multiplying each variable in a set of covariates by each eigenvector in a candidate set. In other words, these interaction variates are cross-products of each synthetic spatial variate and each covariate.

Again, stepwise regression can be used to select the relevant variables. It selects from the candidate eigenvector set (which relates to the intercept term), the set of covariates, and the set of interaction terms. Once the subset has been identified, it can be grouped into sets having a common covariate, so that this covariate can be factored from each set. What remains for each set is a linear combination of the synthetic spatial variates used to construct the cross-products. When added together, these linear combinations constitute the geographically varying coefficients. The affiliated equation may be written as follows:
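$$\mathbf{Y} = f\left(\mu_Y \mathbf{1} + \mathbf{E}_1 \boldsymbol{\beta}_1 + \mathbf{X} \circ \mathbf{E}_X \boldsymbol{\beta}_X\right) \qquad \text{(B.5.12)}$$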
where:

- f denotes some function, for example the natural antilogarithm, exp, for the Poisson probability model;
- the subscript 1 denotes the eigenvectors and regression coefficients associated with the intercept term;
- the subscript X denotes the eigenvectors and regression coefficients associated with the slope coefficient;
- ∘ denotes the Hadamard matrix product (that is, element-by-element matrix multiplication).
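The factoring step can be made concrete with a small Python/NumPy sketch; all names are hypothetical. Following the description above, the design matrix holds:

- the intercept;
- the intercept eigenvectors E1;
- the covariate;
- the covariate's cross-products with the slope eigenvectors.

After fitting, the coefficients regroup into a location-specific intercept and slope. Because every eigenvector has mean zero, their means equal the global values. The sketch keeps the global slope as its own column, which is assumed equivalent to including a vector of ones in EX.

```python
import numpy as np


def varying_coefficient_design(x, E_int, E_slope):
    """Columns: 1 | E_int | x | x∘E_slope (cross-products via the Hadamard product)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones(len(x)), E_int, x, x[:, None] * E_slope])


def local_coefficients(beta, E_int, E_slope):
    """Factor fitted coefficients into a per-location intercept and slope."""
    k1 = E_int.shape[1]
    b0, b_int = beta[0], beta[1:1 + k1]        # global intercept and its eigenvector terms
    b1, b_slope = beta[1 + k1], beta[2 + k1:]  # global slope and its eigenvector terms
    return b0 + E_int @ b_int, b1 + E_slope @ b_slope
```

The linear predictor can be written either as X @ beta or as a_i + b_i·x_i, where a_i and b_i are the location-specific intercept and slope. The two forms are algebraically identical. For the Poisson case, f = exp is applied to that predictor.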
Fig. B.5.7. Geographically varying coefficients for the GLM population density model; coefficient magnitudes are directly related to gray tone darkness: (a) spatially varying intercept term; (b) spatially varying slope coefficient
Consider the preceding GLM model describing population density across the
Cusco Department. The geographically varying intercept can be rewritten as
8.8834 – 4.7838 E1 – 4.3226 E3 + 48.3641 E4 + 1.8258 E6 – 2.0448 E12 +
2.1773 E13 – 2.7006 E14 – 1.6251 E16 – 1.7334 E19 .
Meanwhile, the geographically varying slope coefficient can be rewritten as
–0.9446 – 8.7899 E4 – 0.3106 E10 – 0.4664 E11 + 0.7628 E15 .
This is the term that is factored from the set of cross-product terms (i.e., each eigenvector multiplied by log(selevation)); each element of this term is multiplied by its corresponding log(selevation) value. The geographic distributions of the spatially varying coefficients appear in Fig. B.5.7.

Eigenvector E4 is common to both coefficient expressions, and it dominates the intercept term. As a result, the correlation between these two geographically varying coefficients is very strong (–0.98036). Because each of the eigenvectors has a mean of zero, these two geographically varying coefficients are centered on their respective global values [that is, the intercept constant and the slope coefficient for log(selevation) itself]. Furthermore, because the coefficient variability is a function of the eigenvectors, these geographically varying coefficients contain (as well as account for) spatial autocorrelation in the response variable Y.
Table B.5.4. Geographically varying coefficients: spatial autocorrelation in terms of MC and GR

Coefficient              MC        zMC     GR
Intercept                0.92345   16.02   0.22664
Log(selevation) slope    0.92090   15.98   0.23104

Notes: MC denotes the Moran Coefficient, and GR denotes the Geary Ratio
Each coefficient contains statistically significant, strong positive spatial autocorrelation.
B.5.5 Eigenfunction spatial filtering and geographical interpolation
Spatial interpolation is a problem frequently encountered in spatial analysis. Its solution exploits spatial autocorrelation in order to predict an unknown value at
some location from known values at nearby locations. The redundant information
interpretation of spatial autocorrelation, which relates to the amount of geographic
variance it accounts for within an attribute variable, supports this interpolation.
The best imputation of a missing response value is its expected value given the set of available data. In other words, it equals the value given by the prediction equation estimated with the observed data. This value can be calculated by inserting a binary indicator variable into a regression equation. The indicator is assigned a value of minus one for the single observation with a missing response value, and zero for all other observations. With the missing response value set to zero, the regression coefficient calculated for this indicator variable is the imputation. For a Poisson model specification, the missing response value instead must be replaced with a one:
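$$\exp\left(\alpha + \beta_X X_i + \sum_{k=1}^{K} E_{ki}\,\beta_k - \beta_m \cdot 1\right) = 1 \qquad \text{(B.5.13)}$$

when

$$\beta_m = \alpha + \beta_X X_i + \sum_{k=1}^{K} E_{ki}\,\beta_k. \qquad \text{(B.5.14)}$$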
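The indicator-variable device can be checked numerically. The Python/NumPy sketch below uses hypothetical names. Its Poisson fit uses a minimal IRLS without an offset, as in Eq. (B.5.13).

In the normal case, the indicator coefficient equals the prediction from the model fitted without the missing observation. In the Poisson case, it equals the corresponding linear predictor. The imputed Poisson mean is then exp(βm).

```python
import numpy as np


def poisson_irls(y, X, max_iter=200, tol=1e-12):
    """Minimal IRLS fit of log E[y] = X beta (column 0 assumed to be the intercept)."""
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        z = X @ beta + (y - mu) / mu          # working response
        XtW = X.T * mu                        # Poisson/log-link weights equal mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    return beta


def impute_by_indicator(y, X, m, family="normal"):
    """Impute y[m] via an indicator equal to -1 at m and 0 elsewhere.

    The missing value is replaced by 0 (normal) or 1 (Poisson) before fitting;
    the indicator's coefficient is the imputation (Poisson: on the log scale)."""
    y = np.array(y, dtype=float)              # copy, so the caller's array is untouched
    ind = np.zeros(len(y))
    ind[m] = -1.0
    Xa = np.column_stack([X, ind])
    if family == "normal":
        y[m] = 0.0
        beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    else:
        y[m] = 1.0
        beta = poisson_irls(y, Xa)
    return beta[-1]
```

The indicator lets the model fit observation m exactly. As a result, the estimating equations for all other coefficients are the same as those for the data set with observation m removed, which is why the device reproduces the leave-one-out prediction.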
Imputed values for population density across the Cusco Department were calculated and are portrayed in Fig. B.5.8. The expected values were computed with the covariate log(selevation) coupled with a spatial filter. Of note is that Fig. B.5.8(a) is very similar to Fig. B.5.6(b). More variability appears here because each imputed density value is excluded from the GLM calibration used to predict it, which increases the uncertainty in its prediction. Nevertheless, given their alignment with the ideal line in Fig. B.5.8, the imputed values obtained here appear to be reasonable.
Fig. B.5.8. Generalized linear model (GLM) imputation results: (a) scatterplot of the imputed versus the observed population densities (pd); (b) scatterplot of the imputed versus the observed population densities (pd) with the four largest values set aside. The solid black line denotes observed pd, and open circles denote GLM-imputed pd
B.5.6 Eigenfunction spatial filtering and spatial interaction data
Recent work has returned attention to the role spatial autocorrelation plays in the estimation of model parameters describing spatial interaction data. LeSage and Pace (2008) propose a formulation that is autoregressive-based and relates to the autoregressive linear operator spatial filter. Fischer and Griffith (2008) compare this autoregressive linear operator specification with an eigenfunction spatial filter specification. One finding is that the spatial autocorrelation involved transcends that latent in attribute variables representing characteristics of origins/destinations. Rather, the spatial autocorrelation relates to flows leaving nearby origins and arriving in nearby destinations. This conceptualization is reminiscent of the hierarchical component affiliated with geographic diffusion. This topic is at the research frontier of spatial filtering work.
B.5.7 Concluding remarks
Spatial filtering methodology seeks to account for spatial autocorrelation in georeferenced data in a way that enables conventional statistical estimation techniques
to be exploited. It also allows impacts of spatial autocorrelation to be uncovered in
a more data analytic manner. Two geographically distributed attribute variables
for the Cusco Department of Peru – 2005 population density and elevation variation – are used here to illustrate this contention, with special reference to their
bivariate correlation coefficient. The naive correlation coefficient is –0.48345. Adjusting this value for the presence of positive spatial autocorrelation results in a
decrease in its absolute value; in other words, positive spatial autocorrelation tends
to inflate correlation coefficients. But this reduction is a function of the spatial filter specification employed. The autoregressive linear operator, PCNM, and eigenfunction spatial filtering results are very comparable. They are, respectively,
–0.42070, –0.39203, and –0.43904. This finding is not surprising, because all
three of these methodologies share a common mathematical foundation. In contrast, the Gi(d)-based spatial filtering yields a value of –0.20744. Part of its deviation from the other three results may well be attributable to its more restrictive assumptions.
Spatial filtering can be employed not only with the normal probability model,
but also with the entire family of probability models affiliated with generalized
linear models. It also supports spatial interpolation, and offers a vehicle for addressing spatial autocorrelation in geographic flows data.
Acknowledgements. We are indebted to Marco Millones, Clark University, for providing
us with the Cusco Department GIS files, and its 2005 Peru Census data numbers.
References
Borcard D, Legendre P (2002) All-scale spatial analysis of ecological data by means of
principal coordinates of neighbour matrices. Ecol Mod 153(1/2):51-68
Borcard D, Legendre P, Avois-Jacquet C, Tuomisto H (2004) Dissecting the spatial structure of ecological data at multiple scales. Ecology 85(7):1826-1832
Dray S, Legendre P, Peres-Neto P (2006) Spatial modeling: A comprehensive framework
for principal coordinate analysis of neighbor matrices (PCNM). Ecol Mod
196(3/4):483-493
Fischer MM, Griffith D (2008) Modeling spatial autocorrelation in spatial interaction data:
an application to patent citation data in the European Union. J Reg Sci 48(5):969-989
Getis A (1990) Screening for spatial dependence in regression analysis. Papers in Reg Sci
Assoc 69(1):69-81
Getis A (1995) Spatial filtering in a regression framework: experiments on regional inequality, government expenditures, and urban crime. In Anselin L, Florax R (eds) New directions in spatial econometrics. Springer, Berlin, Heidelberg and New York, pp.172-188
Getis A, Griffith D (2002) Comparative spatial filtering in regression analysis. Geogr Anal
34(2):130-140
Getis A, Ord JK (1992) The analysis of spatial association by use of distance statistics.
Geogr Anal 24(3):189-206
Griffith D (2000) A linear regression solution to the spatial autocorrelation problem. J
Geogr Syst 2(2):141-156
Griffith D (2002) A spatial filtering specification for the auto-Poisson model. Stat Prob Letters 58(3):245-251
Griffith D (2003) Spatial autocorrelation and spatial filtering: gaining understanding
through theory and scientific visualization. Springer, Berlin, Heidelberg and New York
Griffith D (2004) A spatial filtering specification for the autologistic model. Environ Plann
A 36(10):1791-1811
Griffith D, Peres-Neto P (2006) Spatial modeling in ecology: the flexibility of eigenfunction spatial analyses. Ecology 87(10):2603-2613
Haining R (1991) Bivariate correlation with spatial data. Geogr Anal 23(3):210-227
LeSage JP, Pace RK (2008) Spatial econometric modeling of origin-destination flows. J Reg Sci 48(5):941-968
Tiefelsdorf M, Griffith D (2007) Semi-parametric filtering of spatial autocorrelation: the
eigenvector approach. Environ Plann A 39(5):1193-1221
Tobler WR (1975) Linear operators applied to areal data. In Davis J, McCullagh M (eds)
Display and analysis of spatial data. Wiley, London, pp.14-37