
Chapter 12 – Correlation between two maps
Testing spatial correlation
(autocorrelation)
1. Moran’s I
2. Geary’s c
3. Variogram
4. Join counts
Cliff, A. D. & Ord, J. K. 1981. Spatial processes: models and applications. Pion
Testing correlation between two maps
(continuous variables)
[Figure: two example maps. x1: proportion of land area classified as hydric (phydric); x2: ln(elevation in feet)]
Gumpertz, M.L., Wu, C.-T. & Pye J.M. 2000. Logistic regression for southern pine
beetle outbreaks with spatial and temporal autocorrelation. Forest Science 95-107.
Assume the correlation coefficient between the two maps is r.
The null hypothesis:
H0: r = 0.
If y = (y1, y2, …, yN) is a random, independent sample, and x = (x1, x2, …, xN) is
also an independent sample, the test of H0 is straightforward. Under H0, r has
the distribution (N is sample size, e.g., the number of cells):
$$ f(r) = \frac{(1 - r^2)^{(N-4)/2}}{B\left(\frac{1}{2}, \frac{N-2}{2}\right)} \qquad (*) $$
Therefore, the p-value for observing a value as extreme as r_obs is:
$$ p = 1 - \int_{-|r_{obs}|}^{|r_{obs}|} f(r)\,dr $$
Equivalently, the test of H0 can be done using a t-test because
$$ t = \frac{r\,(N-2)^{1/2}}{(1 - r^2)^{1/2}} $$
has a t-distribution with N - 2 degrees of freedom. Note that these two tests are identical.
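The equivalence of the two tests can be checked numerically. The following is a minimal sketch in base R, not part of the course program; the function names f_r, p_from_density and p_from_t are hypothetical, and the p-value is taken as two-sided over |r_obs|.

```r
# Density (*) of r under H0 for an independent sample of size N
f_r <- function(r, N) (1 - r^2)^((N - 4) / 2) / beta(1 / 2, (N - 2) / 2)

# p-value by integrating the density between -|r_obs| and |r_obs|
p_from_density <- function(r_obs, N) {
  a <- abs(r_obs)
  1 - integrate(f_r, lower = -a, upper = a, N = N)$value
}

# p-value from the t statistic with N - 2 degrees of freedom
p_from_t <- function(r_obs, N) {
  t_stat <- r_obs * sqrt(N - 2) / sqrt(1 - r_obs^2)
  2 * pt(-abs(t_stat), df = N - 2)
}

# Illustrative values only
r_obs <- 0.4
N <- 30
p_from_density(r_obs, N)
p_from_t(r_obs, N)
```

The two calls should agree up to the numerical error of integrate().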
In reality, however, y = (y1, y2, …, yN) is rarely an independent sample, and
neither is x = (x1, x2, …, xN). This nuisance is caused by autocorrelation.
Autocorrelation inflates the type I error: two uncorrelated maps are more
likely to be mistakenly declared significantly correlated (rejecting a true
null hypothesis).
To make a correct inference, we need to penalize the sample size. Although
the sample size is N, the effective sample size is usually much smaller than N
because of autocorrelation.
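The inflation of the type I error can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: it uses one-dimensional AR(1) series as a stand-in for spatially autocorrelated maps, and the names n_sim, n_obs and reject_rate_ar1 are hypothetical. Each pair of series is generated independently, so H0 is true in every replicate.

```r
set.seed(1)
n_sim <- 500   # number of simulated pairs
n_obs <- 100   # length of each series

# Share of naive correlation tests rejecting H0 at the 0.05 level
# when both series are independent AR(1) processes with coefficient rho
reject_rate_ar1 <- function(rho) {
  p <- replicate(n_sim, {
    x <- as.numeric(arima.sim(model = list(ar = rho), n = n_obs))
    y <- as.numeric(arima.sim(model = list(ar = rho), n = n_obs))
    cor.test(x, y)$p.value
  })
  mean(p < 0.05)
}

# Same test applied to independent white-noise samples, for comparison
rate_independent <- mean(replicate(n_sim,
  cor.test(rnorm(n_obs), rnorm(n_obs))$p.value) < 0.05)

rate_autocorrelated <- reject_rate_ar1(0.9)
c(independent = rate_independent, autocorrelated = rate_autocorrelated)
```

The rejection rate for the white-noise samples should stay near the nominal 0.05, while the rate for the autocorrelated series should be far higher.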
The effective sample size can be calculated following the method of Clifford et
al. (1989), or Dutilleul’s method for small sample size.
Clifford, P., Richardson, S. and Hemon, D. 1989. Assessing the significance of the correlation
between two spatial processes. Biometrics 45:123-134.
Dutilleul, P. 1993. Modifying the t test for assessing the correlation between two spatial
processes. Biometrics 49:305-314.
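Following Clifford et al. (1989), the effective sample size is derived from the
estimated variance of r:
$$ \hat{\sigma}_r^2 = \frac{\mathrm{trace}(\hat{\Sigma}_x \hat{\Sigma}_y)}{N^2 \hat{\sigma}_x^2 \hat{\sigma}_y^2} $$
where $\hat{\Sigma}_x$ is the covariance matrix of x among the N locations (and $\hat{\Sigma}_y$ likewise for y),
and $\hat{\sigma}_x^2$, $\hat{\sigma}_y^2$ are the estimated variances of x and y. Each covariance matrix is an
N×N symmetric matrix and can be estimated from the variogram of geostatistics.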
Calculating the variogram is the most important step in testing H0. The major part of
the computation is to estimate the variogram and the covariance (covariogram) matrix.
The covariogram is a decreasing function of distance, i.e., two nearby locations have
higher covariance than locations far apart. Therefore, the covariance matrix $\hat{\Sigma}_x$
captures the spatial correlation structure of the data.
[Figure: covariogram, covariance plotted against distance]
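As an illustration of how a decreasing covariogram turns into a covariance matrix, the sketch below evaluates an exponential covariogram on the pairwise distances of a small grid. The exponential model, its parameters (sill, range_par) and the 4 × 3 grid are assumptions made for the example. They are not taken from the course program, which fits a theoretical model to the empirical variogram.

```r
# Cell-centre coordinates of an illustrative 4 x 3 grid with 10 m cells
coords <- expand.grid(x = seq(5, 35, by = 10), y = seq(5, 25, by = 10))

# N x N matrix of distances between all pairs of cells
D <- as.matrix(dist(coords))

# Assumed exponential covariogram: C(h) = sill * exp(-h / range_par)
exp_cov <- function(h, sill = 1, range_par = 20) sill * exp(-h / range_par)

# Covariance matrix: symmetric, variance (sill) on the diagonal,
# and smaller entries for cells that are farther apart
Sigma_x <- exp_cov(D)
round(Sigma_x[1:4, 1:4], 3)
```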
Once we have estimated the covariance matrix, the effective sample size is:
$$ M = 1 + \frac{1}{\hat{\sigma}_r^2} = 1 + \frac{N^2 \hat{\sigma}_x^2 \hat{\sigma}_y^2}{\mathrm{trace}(\hat{\Sigma}_x \hat{\Sigma}_y)} $$
The test of H0 then follows the same probability distribution as (*), with N
in (*) replaced by the effective sample size M. The p-value is calculated as:
$$ p = 1 - \int_{-|r_{obs}|}^{|r_{obs}|} f(r)\,dr $$
Note that the W-test described in Clifford et al. is very similar to the above test and is
therefore not included in my R program. Simply, $W = (M - 1)^{1/2}\, r$, and W ~ N(0,1), a
standard normal distribution.
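The effective sample size and the W statistic can be sketched in a few lines of base R. This is a rough illustration, not the code of the course program. The covariance matrices come from assumed exponential covariograms on a 10 × 10 grid, the variances are read from the diagonals of those matrices, and the names effective_n and w_test_p are hypothetical.

```r
# Illustrative 10 x 10 grid of cell centres (10 m cells) and pairwise distances
coords <- expand.grid(x = seq(5, 95, by = 10), y = seq(5, 95, by = 10))
D <- as.matrix(dist(coords))
N <- nrow(D)

# Assumed exponential covariograms with different sills and ranges for the two maps
Sigma_x <- 1.0 * exp(-D / 30)
Sigma_y <- 2.0 * exp(-D / 15)

# M = 1 + N^2 * var_x * var_y / trace(Sigma_x %*% Sigma_y)
effective_n <- function(Sigma_x, Sigma_y) {
  N <- nrow(Sigma_x)
  var_x <- mean(diag(Sigma_x))   # assumption: variance taken from the diagonal
  var_y <- mean(diag(Sigma_y))
  # for symmetric matrices, trace(A %*% B) equals the sum of elementwise products
  var_r <- sum(Sigma_x * Sigma_y) / (N^2 * var_x * var_y)
  1 + 1 / var_r
}

# Two-sided p-value of W = (M - 1)^(1/2) * r under N(0, 1)
w_test_p <- function(r_obs, M) {
  W <- sqrt(M - 1) * r_obs
  2 * pnorm(-abs(W))
}

M <- effective_n(Sigma_x, Sigma_y)
p_w <- w_test_p(0.2, M)   # 0.2 is an arbitrary illustrative correlation
c(N = N, M = M, p_w = p_w)
```

With identity matrices (no autocorrelation) the formula as written gives M = N + 1, so M stays close to N; positive autocorrelation in both maps pulls M well below N.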
Description of R program
The main program is called “association.main”. It has five functions.
boxcox.fn: apply a Box-Cox transformation to bring the data closer to normality.
generatexy.fn: generate a location matrix and plot the map (image).
variogram.fn: calculate the empirical variogram of a data set.
varcov.fn: estimate the covariance by fitting a theoretical model to the empirical
variogram.
test.association.fn: calculate the p-value for the test.
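The source of these functions is not shown here. To give a rough idea of what an empirical variogram computes, the sketch below bins all pairs of cells by distance and averages half the squared differences in each bin. It is a base-R illustration with a hypothetical function name (empirical_variogram) and made-up data, and it is not the implementation of variogram.fn.

```r
# Empirical semivariance: gamma(h) = mean((z_i - z_j)^2) / 2 over pairs in each distance bin
empirical_variogram <- function(coords, z, breaks) {
  d  <- as.vector(dist(coords))    # distances between all pairs of locations
  sq <- as.vector(dist(z))^2       # squared differences (z_i - z_j)^2, same pair order
  bin <- cut(d, breaks = breaks)   # distance classes; pairs outside the breaks are dropped
  data.frame(
    distance = as.vector(tapply(d, bin, mean)),
    gamma    = as.vector(tapply(sq, bin, mean)) / 2,
    n_pairs  = as.vector(table(bin))
  )
}

# Made-up example: 5 x 5 grid of 10 m cells with a linear west-east trend
coords <- expand.grid(x = seq(5, 45, by = 10), y = seq(5, 45, by = 10))
z <- coords$x
vg <- empirical_variogram(coords, z, breaks = c(0, 15, 25, 35))
vg
```

For this trend surface the semivariance grows with distance, which is the mirror image of a covariogram that decreases with distance.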
Example: BCI plot – correlation between number of recruits and number
of species. Cell size = 10×10 m. Total number of cells N = 5000
Data file name in R: bci.recruit.dat
A question of great ecological interest: does diversity (species richness)
promote recruitment and seedling survival?
> bci.recruit.dat[1:10,]
     abund nsp recruit   simpson
1       26  22       5 0.9037433
2       38  26      12 0.7307692
3       57  34       5 0.6086549
4       46  29      10 0.5884316
5       49  35      12 0.6929293
6       52  23      16 0.5067466
7       28  24      27 0.8596491
8       39  22      10 0.7768131
9       57  28       4 0.4071429
10      35  24       2 0.8101852
…        …   …       …         …
5000     …   …       …         …
[Figure: maps of the number of recruits and the number of species]
Wills, C. et al. 2006. Non-random processes contribute to the maintenance
of diversity in tropical forests. Science 311:527-531.
> association.main(bci.recruit.dat, map1=2, map2=3, cellsize=10, boxcox="no")
The results are:
Correlation coef. r = -0.05455
Original sample size = 5000, p-value = 1e-04
Effective sample size = 1512.2, p-value = 0.0339
map1=2 is “number of species”, map2=3 is “number of recruits”.
The correlation coefficient between the two maps is -0.05455. Without considering
autocorrelation, it is highly significant with p-value = 0.0001. After taking account
of spatial autocorrelation, it is only marginally different from 0, with
p-value = 0.0339. (It is significant at the 0.05 level, but not at the 0.001 level.)
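The two reported p-values can be checked from r, N and the effective sample size alone, using the t form of the test with the sample size set to N or M. This is a minimal sketch in base R; the helper name p_value_r is hypothetical and is not a function of the course program.

```r
# Values reported in the results above
r_obs <- -0.05455
N <- 5000      # original sample size
M <- 1512.2    # effective sample size

# Two-sided p-value of the correlation test for a (possibly non-integer) sample size n
p_value_r <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

p_naive    <- p_value_r(r_obs, N)   # ignores autocorrelation
p_adjusted <- p_value_r(r_obs, M)   # uses the effective sample size
signif(c(naive = p_naive, adjusted = p_adjusted), 3)
```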
Note: You need package geoR to run this program.