Using AMOEBA to Create a Spatial Weights Matrix and Identify Spatial Clusters, and a Comparison to Other Clustering Algorithms

Arthur Getis* and Jared Aldstadt**
*San Diego State University
**SDSU/UCSB Joint PhD Program

Paper presented at the Regional Research Institute, West Virginia University, Morgantown, West Virginia, December 8, 2005

AMOEBA
• A design for the construction of a spatial weights matrix using empirical data.
• Multidirectional: Searches for spatial association in all specified directions.
• Optimal: Optimum in the sense that the scale is local (the finest scale) and the analysis reveals all spatial association.
• Ecotope-Based: The ecotope is a specialized region (a particular habitat) within a larger region.
• Algorithm: The algorithm for finding the ecotope is based on an analytical system that often finds highly irregular (amoeba-like) sub-regions of spatial association.

The Issues
Question 1
• How does one create an appropriate spatial weights matrix?
Question 2
• Can we have confidence in the identification of spatial clusters?

Question 1
• How does one create an appropriate spatial weights matrix?

The Spatial Weights Matrix
In a regression context, W is the formal expression of spatial dependence between spatial units (the spatial effects). It is used in, for example:

$$y = \rho W y + X\beta + \varepsilon$$

The Typical W Matrix
Rows are indexed by i = 1, ..., n and columns by j = 1, ..., n:

$$
W = \begin{bmatrix}
w_{11} & w_{12} & w_{13} & \cdots & w_{1n} \\
w_{21} & w_{22} & & & \vdots \\
w_{31} & & \ddots & & \vdots \\
\vdots & & & & \vdots \\
w_{n1} & \cdots & \cdots & \cdots & w_{nn}
\end{bmatrix}
$$

Some Traditional W Schemes
• Contiguity
• Inverse Distances
• Lengths of Shared Borders, Perimeters
• nth Nearest Neighbor Distance
• All Centroids within d
• Ranked Distances
• Network Links

Commentators on W
• Anselin: outlined the problem
• Dacey: varying results given schemes
• Cliff and Ord: rook's and queen's cases
• Griffith: better under-specified
• Florax & Rey: over-specification reduces power
• Kooijman: maximize Moran's I
• Openshaw: computer search for best model
• Bartels: binary defensible
• Hammersley-Clifford: near neighbors in Markov
• Tiefelsdorf, Griffith, Boots: standardization
• Florax and Graff: bias due to matrix sparseness
• GEODA listserv

Some Recent W Schemes
• Fotheringham, Brunsdon, and Charlton's bandwidth distance decay (1996)
• LeSage's Gaussian distance decline (1999)
• McMillen's tri-cube distance decline (1998)
• Getis and Aldstadt's local statistics model (2001, 2002)
• Fotheringham, Charlton, and Brunsdon's optimized bandwidth (2002)
• LeSage's Bayesian approach (2003)
• Aldstadt and Getis' AMOEBA (2003)

W Theory or Reality?
• Exogenous versus endogenous
• Estimation versus prediction
• Model driven versus data driven
• The AMOEBA approach

AMOEBA: Critical Number of Links Identification
Local statistic values are computed around each observation as the number of links (d) increases. When the absolute values fail to rise, the cluster diameter is reached. The first peak equals $G_i^*(d_c)$.

[Figure: $|G_i^*|$ (vertical axis, 0 to 2.5) plotted against Distance Links (horizontal axis, 0 to 5).]

AMOEBA: Weight Calculation
When $d_c > 0$:

$$
w_{ij} = \frac{P(z \le Z_{d_c}) - P(z \le Z_{d_{ij}})}{P(z \le Z_{d_c}) - P(z \le Z_{0})} \quad \text{for all } j \text{ where } d_{ij} \le d_c, \qquad w_{ij} = 0 \ \text{otherwise.}
$$

When $d_c = 0$: $w_{ij} = 0$ for all j.

P(z) is the cumulative probability associated with the standard variate of the normal distribution. Weights vary between 0 and 1.
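The weight calculation above can be illustrated with a minimal Python sketch. It assumes the standardized local statistic for focal unit i has already been computed at each link count and stored in a list; the names `normal_cdf`, `amoeba_weight`, and `z_by_links`, and the example values, are hypothetical and not taken from the presentation. The sketch also assumes the formula applies to units with d_ij no greater than d_c.

```python
import math


def normal_cdf(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def amoeba_weight(z_by_links, d_ij, d_c):
    """Weight w_ij for a unit j that lies d_ij links from focal unit i.

    z_by_links[d] is the standardized local statistic around i at d links
    (an assumed input layout); d_c is the critical number of links.
    """
    # No dependence structure around i, or j lies beyond the critical links.
    if d_c == 0 or d_ij > d_c:
        return 0.0
    p_c = normal_cdf(z_by_links[d_c])
    p_0 = normal_cdf(z_by_links[0])
    p_ij = normal_cdf(z_by_links[d_ij])
    if p_c == p_0:
        # Degenerate case not covered by the slide; treated as no dependence.
        return 0.0
    return (p_c - p_ij) / (p_c - p_0)


# Hypothetical statistic values that rise until d_c = 3 links.
z_by_links = [0.5, 1.2, 1.9, 2.3]
weights = [amoeba_weight(z_by_links, d, 3) for d in range(5)]
print(weights)
```

In this sketch the weight is 1 at zero links, declines as the number of links grows, and reaches 0 at the critical number of links and beyond, which matches the statement that weights vary between 0 and 1.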
AMOEBA: Links Designations
$d_{ij}$ is the number of links from the focus spatial unit i to another spatial unit j.
$d_c$ is the critical number of links: the number of links from i beyond which no further autocorrelation exists.

AMOEBA as W and U in an Autoregressive Spatial Lag Model
It is conceivable for rows of the weights matrix to be completely filled with zeroes, indicating that there is no local spatial autocorrelation surrounding an observation. To compensate for the zero-row effect, we create a dummy variable, U, that assigns a 1 to all observations with no dependence structure and 0 otherwise.

$$y = \theta W y + \alpha U + X\beta + \varepsilon$$

AMOEBA as W and U in an Autoregressive Spatial Error Model

$$y = \alpha U + X\beta + (I - \kappa W)^{-1}\varepsilon$$

AMOEBA: The non-spatial and spatial matrices
Example with 14 observations:

$$U = (0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0)^{\mathsf T}$$

$$
W = \begin{bmatrix}
0 & w_{1,2} & w_{1,3} & \cdots & w_{1,14} \\
w_{2,1} & 0 & w_{2,3} & \cdots & w_{2,14} \\
w_{3,1} & w_{3,2} & 0 & \cdots & w_{3,14} \\
w_{4,1} & w_{4,2} & w_{4,3} & \cdots & w_{4,14} \\
0 & 0 & 0 & \cdots & 0 \\
0 & 0 & 0 & \cdots & 0 \\
w_{7,1} & w_{7,2} & w_{7,3} & \cdots & w_{7,14} \\
w_{8,1} & w_{8,2} & w_{8,3} & \cdots & w_{8,14} \\
w_{9,1} & w_{9,2} & w_{9,3} & \cdots & w_{9,14} \\
0 & 0 & 0 & \cdots & 0 \\
w_{11,1} & w_{11,2} & w_{11,3} & \cdots & w_{11,14} \\
w_{12,1} & w_{12,2} & w_{12,3} & \cdots & w_{12,14} \\
0 & 0 & 0 & \cdots & 0 \\
w_{14,1} & w_{14,2} & w_{14,3} & \cdots & 0
\end{bmatrix}
$$

The diagonal of W is zero, and rows 5, 6, 10, and 13 (the observations with U = 1) are entirely zero. All other entries are the weights $w_{i,j}$.

Generalized AMOEBA
[Equation: partitioned (block) form of the model, written in terms of $Y_c$, $Y_0$, $W_{cc}$, $W_{c0}$, $1_c$, $1_0$, $\varepsilon_c$, $\varepsilon_0$ and the coefficients $\alpha$, $\rho$, $\beta$.]

Total Fertility Rates: An Example
• Amman, Jordan
• 1994 (data by census units)

[Map: regional location map labeling Lebanon, Syria, Iraq, the Mediterranean Sea, the Palestinian Authority, Gaza, Israel, Egypt, and Saudi Arabia.]

Explanatory Variables
Regressor social variables
• 1. Percent of females with higher education (called "hi-ed")
• 2. Percent of females married (called "married")

Ordinary Least Squares (no W or U)

AIC            165.35
t values
  constant       6.266
  hi-ed        -14.344
  married        1.261

AMOEBA in Spatial Error Models

                 Contiguity   AMOEBA Gi   AMOEBA Ii   AMOEBA ci
AIC                 167.352      79.159     147.043     101.100
t values
  constant            6.499       6.499       7.201       6.342
  hi-ed             -13.040     -11.550     -13.316      -4.680
  married             1.164       1.978       1.227       1.154
  lambda              1.634      98.792       1.187      14.005
  non-spatial                    12.588      -4.048       7.089

Comparison of Spatial Contiguity and AMOEBA: Spatial Error Model
• Gi AMOEBA has an AIC much lower than contiguity (79.159 versus 167.352).
• All AMOEBA models are an improvement over contiguity.
• Gi AMOEBA has extremely high t values for lambda and the non-spatial vector: a good descriptor of spatial and non-spatial effects.
• Gi AMOEBA shows social variables to be significant in explaining TFR.

AMOEBA in Spatial Lag Models

                 Contiguity   AMOEBA Gi   AMOEBA Ii   AMOEBA ci
AIC                 160.625      108.27     148.481     123.881
t values
  constant            5.419       3.866       5.068       4.742
  hi-ed              -9.927      -0.087      -9.051      -8.642
  married             1.164       2.160       1.341       1.201
  rho                -0.005       7.435       1.819       5.443
  non-spatial                     7.594      -1.657       8.058

Comparison of Spatial Contiguity and AMOEBA: Spatial Lag Model
• Again, all AMOEBA models have a lower AIC than contiguity; Gi AMOEBA is best.
• All variables significant.

Question 2
• Can we have confidence in the identification of spatial clusters?

Problems with Spatial Clusters
• Not explicit (what is a cluster?)
• Are they statistically significant? (degree of confidence)
• What is the appropriate spatial scale?
• Often arbitrary, too general
• Over- and under-identification
• Appropriate shape (too circular, ellipsoidal)
• In general, the believability problem

AMOEBA Procedure I
For each observation i, local statistic values (e.g., Gi*, Z[Ii], Z[ci]) are obtained for all combinations of near neighbors j of i within distance d of i. The set of j observations that maximizes the local statistic becomes the ecotope, together with the ith observation.
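As a rough illustration of Procedure I, the following Python sketch evaluates every combination of a focal unit's neighbors and keeps the set with the largest statistic. It is a sketch under stated assumptions, not the authors' implementation: it uses the standardized Getis-Ord Gi* with binary weights as the local statistic, it searches only for high-value ecotopes, and it lets the focal unit stand alone if no neighbor raises the statistic. The function names, the 3 x 3 grid, and the values are all hypothetical.

```python
import itertools
import math


def g_star(values, members):
    """Standardized Getis-Ord Gi* with binary weights over `members`.

    `members` is a set of indices (including the focal unit) and must be a
    proper subset of all observations.
    """
    n = len(values)
    k = len(members)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    numerator = sum(values[j] for j in members) - mean * k
    denominator = s * math.sqrt((n * k - k * k) / (n - 1))
    return numerator / denominator


def first_step(values, i, neighbors):
    """Evaluate all combinations of neighbors of i; return the best set."""
    best_members = {i}
    best_g = g_star(values, best_members)
    for size in range(1, len(neighbors) + 1):
        for combo in itertools.combinations(neighbors, size):
            candidate = {i, *combo}
            g = g_star(values, candidate)
            if g > best_g:
                best_members, best_g = candidate, g
    return best_members, best_g


# Hypothetical 3 x 3 grid stored row by row; unit 4 is the center cell and
# units 1, 3, 5, 7 are its rook neighbors.
values = [1.0, 5.0, 1.0, 4.0, 6.0, 0.5, 1.0, 0.2, 1.0]
members, g = first_step(values, 4, [1, 3, 5, 7])
print(sorted(members), round(g, 3))
```

With these made-up values the high neighbors (units 1 and 3) join the ecotope and the low neighbors (units 5 and 7) are left out. The exhaustive search over combinations grows quickly with the number of neighbors, so this form is only practical for small neighbor sets.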
[Figure: the focal unit (0) and its neighbors at one link (1).]

AMOEBA Subsequent Procedures
The procedure is repeated at increasing distances from i. At each distance d from i, only the j observations that are contiguous to the already existing ecotope are evaluated. Again, using the local statistic, all combinations together with the already existing ecotope members are evaluated. The new set of j observations that maximizes the local statistic becomes part of the ecotope.

[Figures: grids of spatial units labeled by number of links from the focal unit (0), extending from 2 links out to as many as 6 links.]

Hypothetical Clusters
[Figure: hypothetical clusters, with panels annotated mean = 0, variance = 1.]

AMOEBA Examples 1 to 5
[Figures: each of the five examples shows four panels: LSM, AMOEBA Gi, AMOEBA Ii, AMOEBA ci.]

Heterogeneous Clusters
This is like the data used in the GA paper.

Homogeneous Clusters
These are the same six clusters, with radii 2, 4, and 6. The high clusters have a mean of 0.5 and the low clusters have a mean of -0.5. These means are added to random values from the Normal(0,1) distribution.

Peaked Clusters

Real World Example
• Clustering of dengue hemorrhagic fever in Thailand by province and by month.
• 14 years of data: 168 monthly observations

STARS: A GIS System
• Rey, Sergio. Space-Time Analysis of Regional Systems (STARS). Available as an open source program on the Internet.

Other Clustering Algorithms
• SaTScan by Kulldorff (1997; v4.0, 2004) (Communications in Statistics)
• FleXScan by Tango and Takahashi (2004, 2005) (International Journal of Health Geographics)

Bases of Clustering Methods
AMOEBA
Based on values of the local statistic as d increases in many directions from an index location.
SaTScan
Based on a moving circle of varying radii, searching for the circle that is the least likely to have occurred by chance.
FleXScan
Based on the spatial scan statistic used on irregularly shaped windows formed by connecting adjacent neighbors.
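The moving-circle idea described for SaTScan can be sketched in Python as follows. This is a simplified illustration, not SaTScan's code: it assumes a Poisson model, scores each circle with a Kulldorff-style log likelihood ratio for high-rate windows only, and omits the Monte Carlo significance step. The function names, coordinates, populations, and case counts are hypothetical.

```python
import math


def poisson_llr(cases_in, expected_in, total_cases):
    """Log likelihood ratio for a window with more cases than expected."""
    if expected_in <= 0 or cases_in <= expected_in:
        return 0.0
    inside = cases_in * math.log(cases_in / expected_in)
    cases_out = total_cases - cases_in
    if cases_out == 0:
        return inside
    outside = cases_out * math.log(cases_out / (total_cases - expected_in))
    return inside + outside


def circular_scan(points, cases, pops, radii):
    """Scan circles centered on each point; return the highest-scoring one.

    Returns (llr, center_index, radius, member_indices).
    """
    total_cases = sum(cases)
    total_pop = sum(pops)
    best = (0.0, None, None, frozenset())
    for center, (cx, cy) in enumerate(points):
        for radius in radii:
            inside = [k for k, (x, y) in enumerate(points)
                      if math.hypot(x - cx, y - cy) <= radius]
            c = sum(cases[k] for k in inside)
            # Expected cases if risk were proportional to population.
            e = total_cases * sum(pops[k] for k in inside) / total_pop
            llr = poisson_llr(c, e, total_cases)
            if llr > best[0]:
                best = (llr, center, radius, frozenset(inside))
    return best


# Five hypothetical units on a line with equal populations.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
cases = [1, 1, 1, 9, 8]
pops = [100, 100, 100, 100, 100]
llr, center, radius, members = circular_scan(points, cases, pops, [0.5, 1.0])
print(sorted(members), round(llr, 3))
```

Because every candidate window is a circle, elongated or irregular clusters can only be approximated. That restriction is what the adjacency-based windows of FleXScan and the ecotope growth of AMOEBA relax.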
Clustering Methods Tests

AMOEBA
• H0: The sum of the observed values within ecotopes is greater (lesser) than expected by chance.
• The p value is calculated based on the location of the local statistic values of the observed ecotope within Monte Carlo permutations.

SaTScan
• H0: The sum of observed cases within the circular search region is proportional to the population size.
• The p value is calculated based on Poisson realizations using the global rate.

FleXScan
• H0 and p: Same as SaTScan, but within the irregular search region.

Clustering Comparison

                          High Risk Provinces      Low Risk Provinces
                          Cluster   No Cluster     Cluster   No Cluster
Relative Risk Expected       38          0             0        178
AMOEBA                       34          4             0        178
SaTScan                      35          3            21        157
FleXScan                     35          3             3        175
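The comparison counts can be summarized with a little arithmetic. The Python sketch below reads the counts as four columns (high risk in a cluster, high risk not in a cluster, low risk in a cluster, low risk not in a cluster) and treats the Relative Risk Expected row (38, 0, 0, 178) as the reference classification. That reading, and the names `COUNTS` and `detection_rates`, are assumptions made for illustration.

```python
# Counts per method: (high risk in cluster, high risk not in cluster,
#                     low risk in cluster, low risk not in cluster).
COUNTS = {
    "AMOEBA": (34, 4, 0, 178),
    "SaTScan": (35, 3, 21, 157),
    "FleXScan": (35, 3, 3, 175),
}


def detection_rates(high_in, high_out, low_in, low_out):
    """Share of high-risk units placed in a cluster, and share of
    low-risk units placed in a cluster."""
    detected = high_in / (high_in + high_out)
    false_inclusion = low_in / (low_in + low_out)
    return detected, false_inclusion


RATES = {method: detection_rates(*row) for method, row in COUNTS.items()}

for method, (detected, false_inclusion) in RATES.items():
    print(f"{method:9s} high-risk detected: {detected:.3f}  "
          f"low-risk included: {false_inclusion:.3f}")
```

Under this reading, the three methods capture a similar share of the high-risk units (34 or 35 of 38), while they differ mainly in how many low-risk units they pull into clusters (0, 21, and 3 of 178).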