Spatial Statistics PM599, Fall 2013 Lecture 5: October 7, 2013 1 / 45 Areal Data Link between geostatistical/point referenced and areal data I For geostatistical/point referenced data, we use functions of distance to estimate the variogram/covariance that defines spatial relationships I Geostatistical prediction involves using fitted covariance functions (kriging), spatial interpolation, or basis spline smoothing I For areal data (lattices), we use neighbour information to define spatial relationships 2 / 45 Areal Data Link between geostatistical/point referenced and areal data I In general, areal units are irregular (e.g. zip code, county) but methods may also apply to regular grids I We care about how areal units connect to each other I We will see some analogies between geostatistical data and areal data. Sometimes geostatistical methods are used for areal data prediction, but autoregressive models employing neighbourhood information are more commonly used I We will use the R package spdep() for areal data analysis 3 / 45 Areal Data Inferential issues with areal data Is there a spatial pattern? I Spatial pattern suggest that areal observations close to each other have more similar values than those far from each other. I You might think that there is a pattern through visualization, but this is often subjective. I Independent measurements will have no pattern, and would look completely random, but there may actually be an underlying pattern. I If there is a spatial pattern, how strong is it? 4 / 45 Areal Data Misrepresentation with maps Crude birth rates by state based on equal-interval cut points Figure: Monomier, N. Lying with Maps. Statistical Science 2005, 20(3) 215222. 5 / 45 Areal Data Misrepresentation with maps Crude birth rates by state based on quantile cut points Figure: Monomier, N. Lying with Maps. Statistical Science 2005, 20(3) 215222. 6 / 45 Areal Data Is there a spatial pattern? I Response of interest Yi measured in block or areal unit Bi I The Bi are supplemented with neighbourhood information (distance between Bi and Bj , area of Bi , boundary/edge connections) Areal data analysis involves: I I I I I Representation of spatial proximity in areal data using weighted graphs Testing for spatial pattern: Global testing using Morans I or Gearys C statistic Testing for spatial pattern: Local testing using local Moran’s I or Getis-Ord G∗ statistic Modeling spatial pattern for prediction and inference: autoregressive models (Simultaneous Autoregressive (SAR) models and Conditional Autoregressive (CAR) models 7 / 45 Areal Data Areal Data Example: SIDS 37 38 Sudden Infant Deaths in North Carolina Ashe 36 Watauga Madison Swain 35 Graham Cherokee Clay Macon Avery Mitchell Yancey Alleghany Wilkes Caldwell Alexander Surry Stokes Yadkin Forsyth Rockingham Caswell Person Northampton Vance Warren Granville Bertie Nash Edgecombe Iredell Davidson Wake Burke Randolph Wilson Chatham McDowell Catawba Rowan Buncombe Haywood Johnston Greene Lincoln Lee Rutherford Cabarrus Harnett Wayne Henderson Cleveland Gaston StanlyMontgomery Moore Jackson Polk Mecklenburg Lenoir Transylvania Union Anson Richmond Hoke Cumberland Sampson Duplin Martin Pitt 34 Columbus Camden Currituck Pasquotank Perquimans Chowan Washington Tyrrell Beaufort Dare Hyde Pamlico Jones Onslow Bladen Gates Craven Scotland Robeson Hertford Halifax Franklin Guilford Alamance OrangeDurham Davie Carteret Pender New Hanover 33 Brunswick -84 -82 -80 -78 -76 8 / 45 Areal Data Areal Data Example: SIDS I Data for 100 counties in North Carolina I Includes counts of live births and sudden infant deaths for two periods: July 1974-June 1978 and July 1979 to June 1984. I SIDS is defined as sudden death of infant up to 12 months old. I Risk factors include race, SES, physiologic (respiratory, sleep rate, cardiac function) I The primary analysis here is not only to see how often SIDS occurs, but where and if there are clusters or spatial patterns. 9 / 45 Areal Data Example Areal Data: SIDS SIDS74 Counts 45 40 35 30 25 20 15 10 5 0 10 / 45 Areal Data Flowchart How we analyze areal data 11 / 45 Areal Data Proximity I We represent proximity between areal units (blocks, Bi ) using connected graphs I Adjacency matrix (proximity matrix) is denoted W I The entries of W are wij and are called weights I The wij connect different values of the process Y1 , ..., Yn , i = 1, ..., n in some fashion I Generally wii is set to zero 12 / 45 Areal Data Proximity Examples of weights 1. Border based (edge connections): areal units are neighbours if they share a border I wij = 1 if i and j share common boundary 2. Distance based: areal units are neighbours if they are within a distance of of each other I I I wij = 1 if the centroid of i is distance (ex. 25km) of the centroid of j wij = 1 if j is the nearest neighbour (smallest ) of i wij = 1 if j is one of the k nearest neighbours of i, e.g. the two and three closest areal units j to i are the k=2 and k=3 nearest neighbors of i. This will result in multiple neighbours for each i 13 / 45 Areal Data Proximity I Distance can be defined several ways: I I I Euclidean distance (or driving distance or driving time, etc) between centroids (straight line path) Mean driving distance, mean driving time, walking distance, etc. (transit path, not necessarily a straight line) The connections between blocks under proximity by k nearest neighbour or distance neighbour can be examined using a connected graph 14 / 45 Areal Data Distance Neighbours Counties connected if 30km or less apart ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 15 / 45 Areal Data Nearest Neighbours (1) Counties each connected to its nearest neighbour ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 16 / 45 Areal Data Nearest Neighbours (2) Counties each connected to its 2 nearest neighbours ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 17 / 45 Areal Data Proximity Border/Edge based, binary connectivity. Two areal units are neighbours if they share a border wij = 1 0 if i and j share a boundary otherwise Where wij = wji (symmetric) 18 / 45 Areal Data Border/Edge Connectivity Queen a single shared boundary point means they are neighbours. Rook requires more than a single shared point to constitute neighbours. 19 / 45 Areal Data Border/Edge Connectivity: Queen 20 / 45 Areal Data Border/Edge Connectivity: Rook 21 / 45 Areal Data Border/Edge Connectivity: Difference Queen-Rook ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● 22 / 45 Areal Data Proximity Fractional borders ( wij = lij li 0 if regions i and j share a border otherwise Where l is the length of the common border 23 / 45 Areal Data Proximity Neighbour Based wij = 1 0 if centroid of j is a k nearest neighbour of i otherwise Where wij and wji not necessarily symmetric 24 / 45 Areal Data Proximity k Nearest Neighbours (kNN) k=2 k=3 k=1 25 / 45 Areal Data Proximity Neighbour Based: 1NN ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 26 / 45 Areal Data Proximity Neighbour Based: 2NN ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 27 / 45 Areal Data Proximity Neighbour Based: Difference Between 1NN and 2NN ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 28 / 45 Areal Data Proximity Distance Based wij = 1 0 if dij < otherwise For some specified distance threshold Alternatively, −ρ dij if ρ > 0 wij = 0 otherwise For some power, ρ (recall idw) 29 / 45 Areal Data Proximity Distance based neighbours, dist=1*max_k1 dist=1.5*max_k1 30 / 45 Areal Data Proximity Distance based neighbours, between 1 and 1.5 times maximum kNN distance ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 31 / 45 Areal Data Proximity Distance based neighbours, between 10 and 30km ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 32 / 45 Areal Data Adjacency Creating the adjacency matrix from connectivity graphs 33 / 45 Areal Data Neighbours 1 2 3 4 5 6 7 2 1 2 3 1 5 6 5 3 4 5 4 7 5 6 7 34 / 45 Areal Data Weights: Adjacency Matrix I The adjacency matrix, W is a matrix of neighbours where elements are weights wij I Once our list of neighbours (fixed distance or kNN) has been created, we assign spatial weights to each relationship I Can be binary or variable I Even when the values are binary 0/1, there is the issue of what to do with no-neighbour observations arises I Binary weighting will assign a value of 1 to neighboring features and 0 to all other features 35 / 45 Areal Data Weights: Adjacency Matrix Binary weights I Binary weights vary the influence of observations I Those with many neighbours are up-weighted compared to those with few 0 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 36 / 45 Areal Data Weights: Adjacency Matrix Row standardization is used to create proportional weights in cases where features have an unequal number of neighbors I Row-standardized weights increase the influence of links from observations with few neighbours I Divide each neighbour weight for a feature by the sum of all neighbour weights I Obs i has 3 neighbours, each has a weight of 1/3 I Obs j has 2 neighbours, each has a weight of 1/2 I Use is you want comparable spatial parameters across different data sets with different connectivity structures 0 0 0.5 0 1 0 0.5 0.33 0 0.5 0 0.33 0 0.5 0 0.33 37 / 45 Areal Data Binary weight matrix 0 1 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 1 0 0 0 0 1 0 1 0 0 0 0 1 1 0 38 / 45 Areal Data Row standardized weight matrix 0 0.5 0 0 0.25 0 0 0.5 0 0.5 0 0 0 0 0 0.5 0 0.5 0 0 0 0 0 0.5 0 0.25 0 0 0.5 0 0 0.5 0 0.5 0.5 0 0 0 0 0.25 0 0.5 0 0 0 0 0.25 0.5 0 39 / 45 Areal Data Spatial Smoothers We can use the block values and weight matrices to obtain a smooth value for each region by taking locally weighted averages I If we have a measure of Yi , such as the SIDS rate in county i, we can get a rough estimate of what it could be predicted as from it’s j neighbours I Essentially, we replace Yi with Ŷi where Ŷi = P 1 X wij Yj j wij j I The ”new” Ŷi is a function of it’s spatial neighbours j I This smooths things out because the areal units look more like their neighbours 40 / 45 Areal Data Spatial Similarity I We want to summarize similarity between nearby areal units I Spatial autocorrelation is the the correlation of the same measurement taken at different areal units I The similarity of values at locations Bi and Bj are weighted by the proximity of i and j I The weight wij defines proximity 41 / 45 Areal Data Spatial Association Measuring strength of association I We want to measure how strong observations from nearby areal units are more or less alike than those that are farther apart I We also want to decide whether the similarity (or dissimilarity) is strong enough that it is not due to chance For example: I I I I I Let Yi be the response at the ith areal unit, Bi and Yj be the response at the jth areal unit, Bj Let simij be a measure of how similar (or dissimilar) the responses are at areal units Bi and Bj Let wij be a measure of the spatial proximity between areal units Bi and Bj We can define a general statistic by the cross product of the simij matrix and wij matrix 42 / 45 Areal Data Spatial Association Example con’t: Y= 0 0 1 1 0 1 0 1 0 define simij = (Yi − Yj )2 and 1 if i and j share a boundary wij = 0 otherwise 43 / 45 Areal Data Spatial Association Example con’t: W= 0 1 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 1 0 1 0 FindPpairwise similarity from Y and take the cross product to get P C= i j wij simij 44 / 45 Areal Data Spatial Association Example con’t I If C is small that means similarity between neighbours is high and we have positive spatial autocorrelation I If C is large that means there is little similarity between neighbours Measuring strength of association I In general the extent of similarity is represented by the weighted average of similarity between areal units: N P N P wij simij i=1 j=1 N P N P wij i=1 j=1 45 / 45
© Copyright 2026 Paperzz