On the use of the simple and partial Mantel tests in presence of spatial auto-correlation

Gilles Guillot∗ and François Rousset†

December 3, 2011

Abstract

The Mantel test is routinely used in many areas of biology and environmental sciences to assess the significance of the association between two or more matrices of distances relative to the same pairs of individuals. This test is a valid statistical procedure to test the auto-correlation of a single (possibly multivariate) variable. This includes the widely used test of isolation-by-distance in population genetics. However, we show that, contrary to a widely shared belief, the simple and partial Mantel tests are not valid statistical procedures to assess the significance of the correlation between two variables structured in space. Under a fairly general model, simulations show that the Mantel tests produce an excess of Type I errors whose magnitude increases with the intensity of the spatial auto-correlation. The Mantel tests should not be used when auto-correlation is suspected in both variables compared under the null hypothesis.

Keywords

Bio-geography, landscape ecology, landscape genetics, isolation-by-distance, isolation-by-resistance, permutation test, re-sampling method, geostatistical data, random fields, auto-correlation, cross-correlation, simulation.

∗ Informatics and Mathematical Modelling Department, Technical University of Denmark, Copenhagen, Denmark.
† Institut des Sciences de l’Évolution (UM2-CNRS), Université Montpellier 2, Place Eugène Bataillon, CC 065, Montpellier cedex 5 34095, France.

Background

For the detection of clustering of cancer cases in space and time, Mantel (1967) introduced a test based on permutations.
He concluded his article by claiming that this method was general (a claim later relayed by Sokal, 1979) and could be used whenever one has to assess the significance of the correlation between the values of two square matrices containing “distances” relative to pairs of individuals. Dietz (1983) discussed the efficiency of various measures of correlation, and Smouse et al. (1986) proposed an extension of the test, referred to as the partial Mantel test, aimed at assessing the dependence between two matrices of distances while controlling for the effect of a third distance matrix. Since then, and despite the fact that (or perhaps because) none of these four original methodological papers stated the null hypothesis explicitly, the simple and partial Mantel tests have enjoyed tremendous popularity.

The simple Mantel test is, for example, used routinely to assess the significance of the association between a matrix of genetic measurements and a matrix of phenotypic measurements relative to the same individuals. It is also used intensively in ecology to assess how a matrix of genetic or phenotypic distances relates to a matrix of geographical distances. The latter may contain plain geographical (Euclidean) distances between pairs of sampling sites, but it may alternatively contain values that attempt to reflect the actual cost for an individual to move across the area (accounting e.g. for the presence of barriers or hostile areas). In this second case, the distance is known in ecology as a “cost distance”. It may not enjoy the properties of a mathematical distance, but it is in general correlated with the Euclidean distance. A classical analysis consists in assessing the significance of the dependence between genetic (or phenotypic) distances and cost distances while controlling for the “effect” of geographical distances through the partial Mantel test.

In view of the tasks above, the Mantel tests have a number of appealing features.
First, they allow one to synthesize the information contained in multivariate data in a single index and hence in a single test; second, they allow one to deal with the case outlined above where the “distance” between individuals cannot be expressed as a difference (or combination of differences) between one or several variables (e.g. the case of a cost distance); finally, they do not seem to rely on any parametric assumption.

Some concerns regarding the partial Mantel test have been raised, though. Raufaste and Rousset (2001) gave an example in population genetics where the partial Mantel test leads to the wrong conclusion. In a comment on this paper, Castellano and Balletto (2002) exhibited a linear statistical model under which the partial Mantel test was valid. The example of Castellano and Balletto (2002) is statistically well grounded; however, their analysis does not address potential issues with spatially auto-correlated data (Rousset, 2002). Despite this controversy, the partial Mantel test is still routinely used. In a recent review article, Legendre and Fortin (2010) discussed this controversy and, on the basis of simulations, concluded that the simple and partial Mantel tests were valid statistical methods in general.

The aim of the present note is to show that this conclusion does not hold as soon as the data display spatial auto-correlation. To do so, we report results from simulated geostatistical data, covering the different cases discussed in the literature, and to avoid any ambiguity, we distribute the R code for our computer simulations. We conclude by discussing the potential sources of discrepancy between our results and those of Legendre and Fortin (2010). We also outline some alternative strategies.

A simulation study

Design and results

In the following we consider tests of the correlation between variables X and Y that are independent under the null hypothesis, but that may be spatially auto-correlated.
We need a model that is simple enough to be simulated and analyzed easily but rich enough to encompass the main features we want to discuss. A widely used model for spatially auto-correlated variables taken at arbitrary positions is the Gaussian random field. Gaussian random fields are random processes whose realizations are functions of spatial coordinates, such that values at any set of locations are jointly normally distributed with a given covariance matrix describing the auto-correlation structure (Lantuéjoul, 2002; Wackernagel, 2003; Gelfand et al., 2010). While in many applications in evolutionary biology X and Y would represent multivariate measurements, we consider here without loss of generality the case where X(s) and Y(s) are both univariate.

We therefore consider two random fields X(s) and Y(s) assumed to be mutually independent and both Gaussian, stationary and isotropic. We impose the condition that X(s) and Y(s) have zero mean, unit variance and a spatial auto-covariance function with exponential decay, i.e. Cov[X(s), X(s′)] = exp(−d(s, s′)/a), where d(s, s′) is the (Euclidean) geographical distance between s and s′. The number a is known as the scale parameter and has the dimension of a distance. Examples of realizations of this model on a fine grid are shown in figure 1. For all pairs of sites (s, s′), Cov[X(s), X(s′)] and Cov[Y(s), Y(s′)] are non-zero. For this reason, the variations of X and Y both display a clear structure in space. Each variable could represent a genetic variable (e.g. the logit transform of an allele frequency for a population with continuous spread in space), a phenotypic variable or an environmental variable. If Y(s) represents an environmental variable such as the elevation or the temperature at site s, then a matrix of pair-wise differences D_Y could be interpreted as a matrix of cost distances.
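As an illustration of the model just described, the following Python sketch draws two independent Gaussian random fields with covariance exp(−d/a) at a set of sampling sites, using a Cholesky factor of the covariance matrix. It is not the R code distributed with this note. The function names (`exponential_cov`, `simulate_field`), the choice of sites drawn uniformly in the unit square, the random seed and the small numerical jitter are all assumptions made for the example; the case a = 0 is treated as the absence of spatial correlation (identity covariance).

```python
import numpy as np

def exponential_cov(sites, a):
    """Covariance matrix exp(-d/a) for all pairs of sites.

    a == 0 is interpreted as no spatial auto-correlation (identity matrix).
    """
    n = len(sites)
    if a == 0:
        return np.eye(n)
    # Euclidean distances between all pairs of sites
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    return np.exp(-d / a)

def simulate_field(sites, a, rng):
    """One realization of a zero-mean, unit-variance Gaussian field at the sites."""
    cov = exponential_cov(sites, a)
    # Tiny jitter on the diagonal (an assumption) for numerical stability
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(sites)))
    return chol @ rng.standard_normal(len(sites))

rng = np.random.default_rng(0)
# Assumed sampling design: 50 sites drawn uniformly in the unit square
sites = rng.uniform(0.0, 1.0, size=(50, 2))

# Two mutually independent fields with the same scale parameter
x = simulate_field(sites, 0.3, rng)
y = simulate_field(sites, 0.3, rng)

# Matrices of pair-wise absolute differences |x_i - x_j| and |y_i - y_j|
d_x = np.abs(x[:, None] - x[None, :])
d_y = np.abs(y[:, None] - y[None, :])
```

Because x and y are drawn from separate calls with independent noise, the null hypothesis of independence holds by construction, whatever the value of a.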
Figure 1: Top (a, b): two independent random fields with an exponential covariance function with scale parameter a = 0.3, and locations of the 50 sampling sites. Bottom: (c) scatter plot of individual X values versus Y values; (d) scatter plot of pairwise distances |x_i − x_j| versus |y_i − y_j|; (e) common theoretical covariance and variogram functions of X and Y.

In practice, biological or environmental variables are not observed in the continuum but at a limited number of irregularly spaced sites. We considered here the case where X and Y are observed at the same set of n = 50 sites. We considered the cases a = 0, 0.3 and 0.7. For each value of a, we simulated 200 data sets (x_1, ..., x_n), (y_1, ..., y_n), computed the Mantel statistic r and computed the associated “p-value” from 10000 permutations following Mantel’s algorithm. The distribution of these p-values is shown in figure 2 (left panels).

Figure 2: Quantile-quantile plots of p-values obtained on simulated data. Each point corresponds to a simulated data set. The y-axis gives the p-value returned by the Mantel test. The null hypothesis tested is the independence between X and Y. Left column: simple Mantel test, the matrices D_X and D_Y are obtained from independent random fields with zero mean (no deterministic spatial trend). Middle column: partial Mantel test, the matrices D_X and D_Y are obtained from independent random fields with zero mean (no deterministic spatial trend). Right column: partial Mantel test, the matrices D_X and D_Y are obtained from independent random fields with a deterministic linear spatial trend. For the partial Mantel test (middle and right columns), each data set was analyzed by each of the four permutation methods of Legendre and Fortin (2010). The p-values should be aligned along the diagonal. As soon as the data are spatially auto-correlated, the simple and partial Mantel tests produce an excess of Type I errors. The four permutation methods perform similarly, whatever the intensity of the auto-correlation.

It could be objected that when auto-correlation is suspected, a recommended strategy consists in entering the matrix D_s of geographical distances in a partial Mantel test between D_X and D_Y with the aim of “controlling the effect of distances”. We implemented this strategy with the same simulation design as above. The distributions of the p-values for these partial Mantel tests are represented in figure 2 (middle panels). We also carried out the same experiment with random fields displaying a deterministic spatial trend m(s) = β_1 s_1 + β_2 s_2. We sampled the coefficients β_1 and β_2 independently from a N(0, 1) distribution. The distributions of the p-values are shown in figure 2 (right panels).

For any statistical test, the p-values should be uniformly distributed under H_0. For a = 0, i.e. in the absence of spatial auto-correlation in both variables, the simple and partial Mantel tests perform well. However, in the presence of auto-correlation in both variables, they tend to produce a considerable excess of small p-values. The Mantel tests reject the null hypothesis of independence too often and produce a much higher number of false positives than they should. This occurs identically for all four test procedures discussed by Legendre and Fortin (2010), showing that the discrepancies between different simulation studies are not simply explained by the use of different test procedures among these four. The magnitude of this excess of small p-values increases with the amount of spatial dependence in the data. The inclusion of the matrix of geographical distances has no effect on “controlling” auto-correlation.

Analysis

Simple Mantel test

Let us define H_0 as “X and Y are uncorrelated”. The question is now: does the simple Mantel permutation procedure produce correlation coefficient values according to the right distribution?
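To make the permutation procedure under discussion concrete, here is a minimal Python sketch of a simple Mantel test. It assumes the statistic r is the Pearson correlation between the off-diagonal entries of the two matrices, that sampling units are permuted by relabelling the rows and columns of one matrix simultaneously, and that the p-value is one-sided (upper tail) with the usual +1 correction. The function name `mantel_test` and these conventions are choices made for the example and may differ in detail from the implementation used for the simulations reported here.

```python
import numpy as np

def mantel_test(d_x, d_y, n_perm=999, rng=None):
    """Simple Mantel test between two square symmetric distance matrices.

    Returns the observed correlation r and a one-sided permutation p-value.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = d_x.shape[0]
    iu = np.triu_indices(n, k=1)          # each pair of units counted once
    y_vec = d_y[iu]
    r_obs = np.corrcoef(d_x[iu], y_vec)[0, 1]

    n_extreme = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Relabel the sampling units: same permutation on rows and columns
        d_perm = d_x[np.ix_(perm, perm)]
        r_perm = np.corrcoef(d_perm[iu], y_vec)[0, 1]
        if r_perm >= r_obs:
            n_extreme += 1
    # The observed arrangement is counted among the permutations
    return r_obs, (n_extreme + 1) / (n_perm + 1)

# Illustrative data: two independent i.i.d. samples (no spatial structure)
rng = np.random.default_rng(1)
x = rng.standard_normal(30)
y = rng.standard_normal(30)
d_x = np.abs(x[:, None] - x[None, :])
d_y = np.abs(y[:, None] - y[None, :])
r, p = mantel_test(d_x, d_y, n_perm=199, rng=rng)
```

Note that the permutation acts on the labels of the sampling units, not on individual matrix entries; this is the step whose effect on spatially structured data is examined next.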
The distribution of the correlation coefficient involves not only the dependence structure between X and Y but also the joint distribution of (x_1, ..., x_n) and that of (y_1, ..., y_n). So the answer depends on what these joint distributions are. If (x_1, ..., x_n) and (y_1, ..., y_n) are both independent and identically distributed, then permuting the entries of (x_1, ..., x_n) breaks the potential dependence between x_i and y_i while leaving the joint distribution of (x_1, ..., x_n) unchanged.

In the case of correlated data, the consequences of the permutation are different. For small spatial lags |s_i − s_j|, and because of the spatial structure, x_i − x_j and y_i − y_j tend to have the same order of magnitude, even though the random fields X and Y are independent. But substituting index k for index j will not leave Var[X(s_i) − X(s_j)] unchanged in general. How Var[X(s_i) − X(s_j)] relates to |s_i − s_j| is described by the variogram function: with unit variances, Var[X(s_i) − X(s_j)] = 2(1 − Cov[x_i, x_j]), so the semi-variogram here is just 1 − Cov[x_i, x_j] (cf. Fig. 1-e). The entries of the D_X matrix are therefore strongly heteroscedastic. Consequently, the permutation not only breaks the potential dependence between X and Y but also breaks the spatial structure among the entries of (x_1, ..., x_n). The permutation procedure has no reason to produce values from the distribution of r under the null hypothesis. It actually produces values that display far less dispersion than auto-correlated data should under the null hypothesis. In the present case, the feature of the data that is implicitly rejected is not the independence of X and Y but rather the absence of spatial structure of X and Y.

Partial Mantel test

In the partial Mantel test, the test statistic is the partial correlation coefficient of D_X and D_Y given D_s. Computing this coefficient amounts to computing the usual correlation coefficient on the residuals of D_X and D_Y after a regression on D_s. The increments of a random field vary in a way that is far more complex than linearly (cf. Fig. 1-d). The regressions of D_X on D_s and of D_Y on D_s fail to capture most of the variability in the data. The residuals of these regressions still display some spatial structure, with an intensity that is close to that present in the site-wise values. The procedure consisting in permuting sampling units and computing the correlation of residuals is subject to the same issue as in the simple Mantel test. The correlation coefficient values obtained by permutation do not display enough variability.

Conclusion

Summary

Our simulation study clearly shows that the simple and partial Mantel tests are not appropriate to test the dependence between two variables when they display some form of auto-correlation in space. Spatial auto-correlation is widespread in ecology, and it is likely that many studies based on the simple and partial Mantel tests that concluded in favour of an association between two sets of variables were based on an erroneously small p-value. Spatial auto-correlation is perhaps the most common form of auto-correlation in ecology, but any form of auto-correlation would lead to the same issue.

Our conclusions are in stark contrast with those of Legendre and Fortin (2010). The main source of difference is that Legendre and Fortin (2010) did not consider auto-correlated data in their study of the partial Mantel test, and therefore do not address previous criticisms of the partial Mantel test. In their study of the simple Mantel test, they include the analysis of auto-correlated data. However, they did not fully describe the model they considered and did not report the values of any parameter characterizing the intensity of spatial auto-correlation.

When are the Mantel tests valid statistical methods?

Enumerating the situations where the tests are valid is not an easy task, as this requires checking that the permutation procedure produces correlation coefficient values from the distribution of r under the null hypothesis.
Let us at least recall here that the simple Mantel test is a valid statistical method in two important cases: (i) to test the dependence between D_X and D_Y where both matrices derive from i.i.d. random variables X and Y (but then there is no need to consider distance matrices and Mantel tests, as simpler standard procedures are available); (ii) to test the dependence between D_X and D_s where (x_1, ..., x_n) are spatial data collected at sites (s_1, ..., s_n). In the latter case (not investigated by our simulations but where the same reasoning applies), the null hypothesis tested is the absence of spatial auto-correlation of X. Outside these situations, the partial Mantel test could still be valid in some cases. However, all theoretical or simulation-based assessments of its performance for two spatially auto-correlated variables have found problems with the partial Mantel test.

Alternative strategies

Assessing the significance of the correlation between two random fields is a question that has been considered by Clifford et al. (1989), Richardson and Clifford (1991) and Dutilleul et al. (1993) for quantitative continuous variables, and by Cerioli (2002) for categorical variables. In agreement with our results, Cerioli shows that the standard chi-square test is valid unless both variables are auto-correlated. The methods proposed there can be readily used when the data are available as site-wise univariate values. They can also be adapted when the data are multivariate. The case where pair-wise distance matrices are not obtained as differences between site-wise values requires further work. We also recall that if at least one of the variables to be compared is observed at the nodes of a regular grid, shift permutations can be applied (Upton and Fingleton, 1995). Finally, we note that the original problem of Mantel amounts to detecting the dependence between the marks and locations of a marked point process and has been addressed by Schlather et al. (2004).
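The shift permutations mentioned above can be illustrated with a short Python sketch. It rests on several assumptions that go beyond the text: both variables are taken to be observed on the same rectangular grid, shifts are applied with wrap-around (a toroidal shift, which only approximately preserves the spatial structure near the edges), all non-trivial shifts are enumerated, and the p-value is two-sided. The function name `toroidal_shift_test` is hypothetical, and this is one common reading of the idea rather than the specific procedure of Upton and Fingleton (1995).

```python
import numpy as np

def toroidal_shift_test(grid_x, grid_y):
    """Correlation test between two gridded variables using toroidal shifts.

    Shifting grid_x as a whole keeps its internal spatial structure
    (up to wrap-around effects) while breaking its alignment with grid_y.
    """
    rows, cols = grid_x.shape

    def corr(a, b):
        return np.corrcoef(a.ravel(), b.ravel())[0, 1]

    r_obs = corr(grid_x, grid_y)
    r_shift = []
    for dr in range(rows):
        for dc in range(cols):
            if dr == 0 and dc == 0:
                continue  # skip the unshifted (observed) arrangement
            shifted = np.roll(grid_x, shift=(dr, dc), axis=(0, 1))
            r_shift.append(corr(shifted, grid_y))
    r_shift = np.array(r_shift)
    # Two-sided p-value; the observed arrangement counts as one shift
    p_value = (1 + np.sum(np.abs(r_shift) >= abs(r_obs))) / (1 + len(r_shift))
    return r_obs, p_value

# Illustrative data only: two independent 8 x 8 grids of white noise
rng = np.random.default_rng(3)
grid_x = rng.standard_normal((8, 8))
grid_y = rng.standard_normal((8, 8))
r, p = toroidal_shift_test(grid_x, grid_y)
```

Unlike the permutation of sampling units in the Mantel tests, a shift moves the whole field at once, so neighbouring values of the shifted variable remain neighbours. The number of distinct shifts is limited by the grid size, which bounds the smallest attainable p-value.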
Acknowledgments: We are grateful to Murat Kulahci for comments on an earlier draft. This work has been supported by the French National Research Agency (project EMILE grant ANR-09-BLAN-0145-01) and the Danish Centre for Scientific Computing.

References

Castellano, S. and E. Balletto. 2002. Is the partial Mantel test inadequate? Evolution 56:1871–1873.

Cerioli, A. 2002. Testing mutual independence between two discrete-valued spatial processes: A correction to Pearson chi-squared. Biometrics 58:888–897.

Clifford, P., S. Richardson, and D. Hémon. 1989. Assessing the significance of the correlation between two spatial processes. Biometrics 45:123–134.

Dietz, E. 1983. Permutation tests for association between two distance matrices. Systematic Zoology 32:21–26.

Dutilleul, P., P. Clifford, S. Richardson, and D. Hémon. 1993. Modifying the t test for assessing the correlation between two spatial processes. Biometrics 49:305–314.

Gelfand, A. E., P. Diggle, P. Guttorp, and M. Fuentes, eds. 2010. Handbook of Spatial Statistics. Handbooks of Modern Statistical Methods. Chapman & Hall/CRC.

Lantuéjoul, C. 2002. Geostatistical simulation. Springer.

Legendre, P. and M. Fortin. 2010. Comparison of the Mantel test and alternative approaches for detecting complex multivariate relationships in the spatial analysis of genetic data. Molecular Ecology Resources 10:831–844.

Mantel, N. 1967. The detection of disease clustering and a generalized regression approach. Cancer Research 27:209–220.

Raufaste, N. and F. Rousset. 2001. Are partial Mantel tests adequate? Evolution 55:1703–1705.

Richardson, S. and P. Clifford. 1991. Testing association between spatial processes. Lecture Notes-Monograph Series 20:295–308.

Rousset, F. 2002. Partial Mantel test: reply to Castellano and Balletto. Evolution 56:1874–1875.

Schlather, M., P. Ribeiro, and P. Diggle. 2004. Detecting dependence between marks and locations of marked point processes. Journal of the Royal Statistical Society, series B 66:79–93.

Smouse, P., J. Long, and R. Sokal. 1986. Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Systematic Zoology 35:627–632.

Sokal, R. 1979. Testing statistical significance of geographic variation patterns. Systematic Zoology 28:227–323.

Upton, G. and B. Fingleton. 1995. Spatial data analysis example. Series in probability and mathematical statistics. Wiley.

Wackernagel, H. 2003. Multivariate geostatistics: an introduction with applications. Third ed. Springer.