COMMUNITY ECOLOGY 14(1): 1-7, 2013 1585-8553/$20.00 © Akadémiai Kiadó, Budapest DOI: 10.1556/ComEc.14.2013.1.1 Evaluating functions to describe point patterns B. B. Hanberry1,3, Jian Yang2 and Hong S. He1 1 School of Natural Resources, 203 Natural Resources Building, University of Missouri, Columbia, MO, 65211 USA State Key Laboratory of Forest and Soil Ecology, Institute of Applied Ecology, Chinese Academy of Sciences, Shenyang, China 3 Corresponding author. [email protected]. Tel: 573-875-5341 ext. 230, Fax: 573-882-1977 2 Keywords: K2 functions; Pair correlation function; Point pattern; Ripley’s K; Simulation envelope. Abstract: Three functions that analyze point patterns are the L (a transformation of Ripley’s K function), pair correlation, and K2 functions. We tested these functions with Genton’s spatial index, which measures the distance between observed and random theoretical lines for a function over all lag distances. We simulated points from random, moderately regular, extremely regular, and clustered distributions at different density levels and a fixed plot size and for regular distributions at varying plot sizes. The spatial index clearly was able to identify clustered patterns. However, there was overlap in spatial index values between random and moderately regular distributions at lower densities. The spatial index for the L function also became more negative as plot size increased, indicating edge effects, whereas the pair correlation and K2 function values fluctuated with plot size. Using confidence envelopes rather than a spatial index, we also identified distributions other than the known distribution at low densities. These results indicate that ecologists should be cautious about 1) assigning random or regular distribution, particularly for the L function and the K2 function, 2) comparing point patterns of different densities and plot sizes, and 3) interpreting the pair correlation and K2 functions. Abbreviations: CSR–Complete Spatial Randomness, GSI - Genton's spatial index. Introduction Point patterns are realizations of spatial and temporal point processes, which are common in the fields of ecology, forestry, and natural resources. Locations of germination and establishment, disease, insects, exotic and rare species, ice and wind damage, and fire ignitions often occur as point patterns, which can provide information about sources, intensity, and spread. In addition, the relative position of trees and plants, whether random, clustered, or regular, may indicate important ecological or biological processes such as competition, dispersal, and self-thinning (Moravie and Robert 2003). Three common functions that describe and graph departure from spatial randomness over a range of spatial scales are Ripley’s K, L, pair correlation, and K2 functions (Pommerening 2002). These functions are based on inter-point distances, where in comparison to a constant intensity of points (i.e., average density of points: expected number of points per unit area), a greater distribution of short distances indicates clustering and conversely, a greater distribution of long distances indicates regularity (Genton et al. 2006). Ripley’s K function measures the mean number of events (e.g., stems or individuals), relative to the intensity of the point process, within varying lag distances of an arbitrary event (Ripley 1976). For complete spatial randomness (CSR; i.e., a process with uniform intensity and no spatial dependence), 2 the K function is simply K(h) = h , the area of a disc with the radius being the lag distance h. A transformation of the K function produces the L function, which is defined as L(h)= 1/2 (K(h)/) – h. Interpretation of the L function is more straightforward than the K function, because randomness is indicated by a straight line for this equation, and values above the line indicate clustering whereas values below the line indicate regularity (Stoyan and Penttinen 2000). The pair correlation function (also known as the g function) is an alternative option, defined as the derivative of the K function g(h)= (K/h)/2h. In contrast to the cumulative density functions of K or L functions, the pair correlation function represents a probability density function of lag distance h, which may produce stronger contrast between random point patterns (equaling one), and clustering (values greater than one) or regularity (values less than one; Stoyan and Penttinen 2000). The K2 function is approximately the first derivative of the g function, hence the second derivative of the K function. It was developed to quantify spatial autocorrelation of points after correcting bias caused by the heterogeneity of intensity (i.e., point density). Unlike the L and pair correlation functions, negative values represent clustering (Schiffers et al. 2008). These modern functions have replaced simple indices, such as the aggregation index (Clark and Evans 1954) and coefficient of segregation (Pielou 1961), which only use nearest neighbors and produce one value for the inter-point interaction (Pommerening 2002). Because modern functions depend on the lag distance, they can provide information about the spatial scales at which clustering or regularity occurs. However, when comparing the amount of clustering or regularity between various spatial point pattern data, visual interpretation of these graphs is subjective. Therefore in some cases, particularly when there are numerous plots, a summarized value that describes the overall spatial relationship of points is useful. 2 Genton et al. (2006) introduced a clustering index, defined as the area between the estimated L function (from observed data) and a completely spatial random process with the same point intensity as the data, to quantify the overall amount of clustering across all spatial scales. Genton et al. (2006) used the clustering index to examine how the amount of clustering of fire ignitions changes over various time periods. Yang et al. (2007) subsequently calculated the clustering index in their work that compared the degree of clustering between human-caused and lightning-caused fire ignitions. We renamed this clustering index as Genton’s spatial index (GSI) because it should be able to quantify the overall amount of not only clustering (when GSI > 0) but also regularity (when GSI <0). Although this index has been used to compare the amount of clustering between different spatial point pattern data sets, the effects of varying point densities and sampling plot sizes associated with spatial point pattern data sets have not been investigated for functions that describe point patterns and subsequently, the GSI. Differences among clustering index values obtained from various point data sets may be confounded by varying point densities or plot sizes, rather than due to the underlying inter-point spatial dependence. Our objectives were to use the GSI to 1) assess how well the L, pair correlation, and K2 functions characterize simulated random, regular, and clustered distributions over a range of densities and a constant plot size and 2) examine these three functions at a constant density and varying plot sizes for regular distributions. We then evaluated the usefulness of the spatial index compared to a graphical display with simulation envelopes and the three different functions to characterize spatial patterns. This practical assessment will inform ecologists about some of the limitations of current spatial statistics to describe spatial patterning. Functions on point patterns oped by Schiffers et al. (2008) to calculate the K2 function. Our standard plot was rectangular and we chose the isotropic method for edge correction. We used Genton’s spatial index (GSI, Genton et al. 2006) to measure the overall amount of clustering or regularity across various lag distances (maximum lag distance Rmax is 25% of the plot length). The GSI is defined as the difference between the total area under a spatial function curve for the observed data and that for CSR (completely spatially random) data with the same point density. For any function, with the L function as an example below, the equation is: (1) where h is the lag distance and Rmax is the maximum lag distance (25% of plot length). GSI approximately equals the difference between the sum of the observed L function value for each lag distance, or h, and the sum of L function values of the corresponding CSR process for each lag distance. Simulation design We repeatedly simulated point patterns of random, moderately and extremely regular, and clustered distributions using a range of reasonable densities (Figure 1a-d). To represent common survey plots, we randomly generated points in a square plot. We defined the plot size as the length of one side, determined by the range of x and y values (e.g., -50 to 49), and defined the plot area as the plot length×plot width. Although our units can represent any measurement length, for convenience we refer to our units as meters and thus, a plot size of 100 would represent 1 ha. Likewise, our points can represent any object, but we represent the points as trees. Methods Clustered distributions Overview For the first simulations, we generated points in a plot size of 100 (values of x and y ranging from and including -50 to 49) from clustered distributions for densities ranging from 20 to 1000 points per plot area (i.e., 20 to 1000 trees per ha) using Python (http://docs.python.org). We evaluated the clustered distribution by varying both the number of clusters (5, 10, 25, 50, and 100 clusters/plot area) and the mean number of trees per cluster (4, 6, 8, or 10), yielding densities of 20 to 1000 trees/unit area (e.g., 5 clusters x 4 trees per cluster = 20 trees/plot area) and 12 density levels (i.e., we did not use all possible combinations and duplicated densities). The number of clusters and trees per cluster were drawn from a Poisson distribution. Child trees were added by random bearing and distance. We set the maximum distance a “child” tree location could be placed relative to the original (“parent”) tree at 10 m in order to limit the points to the plot. We generated 1000 replicates per density level. First, we repeatedly simulated point patterns of random, moderately and extremely regular, and clustered distributions using densities generally ranging from 10 to 1000 points per plot. We then calculated the L, pair correlation, and K2 functions, followed by Genton’s spatial index to measure the overall amount of deviation from randomness (i.e., clustering or regularity). Next, we varied the plot size while maintaining a fixed density. We then calculated the L, pair correlation, and K2 functions, followed by Genton’s spatial index. Lastly, we developed 95% confidence interval envelopes based on 99 simulations of random, moderately regular, and mildly clustered distributions at low and high densities for the L function to compare the conventional graphical approach to Genton’s spatial index. Functions and spatial index Random and regular distributions We used the Spatstat package in R (Baddeley and Turner 2005, R Development Core team 2010) to calculate both the L and pair correlation functions. We used R coding devel- We then simulated points within a plot size of 100 for densities ranging from 10 to 1000 points/plot area from ran- Hanberry et al. 3 a b c d Figure 1. A realized point patterns. a: random distribution, with 2500 points and plot size of 50, b: moderately regular distribution, with 1580 points and plot size of 50, c: extremely regular distribution, with 2500 points and plot size of 50, d: clustered distribution, with 100 clusters of 25 trees and plot size of 50. dom, moderately regular, and extremely regular distributions. We used Spatstat for generating random and extremely regular distributions. A moderately regular distribution was formed in Python by generating random integer points only, effectively forming a grid, and randomly adding or subtracting a random offset value between 0 and 1 m, allowing the gridlines to move in relation to the center point. Although this produces a regular distribution, gaps between points are random, and would represent a natural forest with inhibition or a plantation with significant mortality rather than a perfect plantation. We generated 1000 replicates per density level. Varying plot sizes and fixed density We generated constant density and a regular distribution of points for varying plot sizes in Python. That is, there was one point at each integer intersection for plot sizes ranging from 10 (uniform point intensity of 100 trees/ha and 100 points in this plot area) to 54 (uniform point intensity of 100 trees/ha and 2916 points due to limit of 3000 points in Spatstat package). Because every realization would be identical, we produced only one replicate per plot size. We also ran a regression (Proc Reg; SAS software, Version 9.1, Cary, North Carolina, USA) to predict the GSI at plot sizes of 60, 80, and 100 for the L function of a regular distribution. Graphical identification of point patterns Rather than using a single spatial index value, we used 95% confidence interval envelopes based on 99 simulations of a completely spatially random point process (Spatstat package in R) to compare 10 point patterns each from random (algorithms from Python and R) and moderately regular (from Python) distributions at densities of 100, 200, and 300 points/plot size of 100 for the L function. Ten point patterns represent a more realistic effort for field sampling and the 4 Functions on point patterns densities would be the upper end of effort for number of measurements per plot. We also produced envelopes for point patterns of 2500 points/plot size of 100 from random and moderately regular distributions and for about as mildly clustered a distribution as we could produce of 300 points, from 30 clusters with 10 child points at a maximum distance of 20 and a plot size of 100. Results Although each function is on a different scale, values more distant from 0 indicate greater clustering. All spatial functions for the clustered process produced GSI values that were clearly separated from 0 (Appendix 1). As density increased, the GSI for the L function decreased from about 263 to 24, the GSI for the g function decreased from about 77 to 4, and the GSI for the K2 function increased from about -113 to -3 (note that like the K2 function, the GSI of the K2 function indicates clustering by negative values and regularity by positive values). Values were far enough away from 0 that even after accounting for standard deviation, values will indicate a clustered distribution. The GSI values for the random and regular processes also consistently moved closer to 0 as density increased, stabilizing to near 0 for the random process by around 300 points/plot size of 100 (Appendix 2 for the L function, Appendix 3 for the pair correlation function, and Appendix 4 for the K2 function; Figure 2). Mean values for random and regular distributions overlapped across densities, particularly when factoring in standard deviation. For the L function, densities of less than 90 trees per hectare, not uncommon in forests, for random distributions had GSI mean values that overlapped with GSI mean values from regular distributions and had GSI mean values that indicated regular distribution (i.e, greater than 1). The GSI values for the pair correlation function settled close to 0 by a density of 20 for random distributions, thus separating almost immediately from values of regular distributions, however standard deviations indicated that any one point pattern could show overlap between random and regular distributions at most densities (Appendix 3). The GSI values for the K2 function increased steadily from -4.45 to near 0 for random distributions, overlapping with values of both clustered and moderately regular distributions, which increased from -10.18 to settle at about 0.6. The K2 function for extremely regular distributions fluctuated to and from 0, with values ranging from 2.85 to 0, indicating regular or random distributions. At a fixed intensity, increases in plot size decreased the GSI (i.e., values became more negative) for the L function (Appendix 5). The GSI values ranged from -0.69 for a plot size of 10 to -1.34 for a plot of 54. Predicted GSI values for larger plots continued to decrease with plot size, likely due to 2 declining impact of edge effects. The R value for the regression line was 0.99. The equation for the line was y = –0.5007 –0.0189x + 0.0001x2, where y is the GSI, x is the plot size, and x2 is number of points (equivalent to plot area). Both the pair correlation and K2 functions fluctuated with plot size in Figure 2. The GSI values increase as density increases, and values stabilize by about 300 points/plot size of 100. This example is for the L function of a random distribution. a seemingly unpredictable manner. The GSI values illustrated by increasing plot size are important because they provide a threshold value. That is, if observed GSI values for both regular and random distributions are less than (more negative) these threshold values, the intensity, or density of points in the plot area, is too low and there will be overlap between random and regular distributions. We produced 95% confidence interval envelopes from 99 simulations around 10 point patterns each (as 10 plots represents a more realistic effort for field measurements) from two algorithms (Python and Spatstat) of random distributions and one algorithm (from Python) of moderately regular distributions at 100, 200, and 300 points/plot size of 100 (i.e., densities where GSI values stabilized according to Figure 2) for the L function (see Figures 3a and 3b for L function examples of a random and moderately regular distribution at 200 points/plot size of 100). At all densities, a random point pattern clearly crossed the envelope to indicate clustered (i.e., above the theoretical, completely spatially random line) or regular (i.e., below the theoretical, completely spatially random line) processes 1 to 3 of the 10 times. The moderately regular point pattern also could indicate other processes, as it appeared to be random 10 out of 10 times until reaching a density of 300 points/plot size, when it was either regular or regular in combination with clustering. Regardless of the location of the observed line, the magnitude of the oscillations was more extreme for the regular distribution than for the random distribution, which maintained a smoother line. By 2500 points/plot size of 100, functions of point patterns displayed corresponding random or moderately regular distributions (Figure 3c and 3d). Although the L function line for random point patterns still can break the envelope, the function was smooth. In contrast, the L function line for moderately regular point patterns had exaggerated oscillations that were larger than the envelope and even though the function line for moderately regular distributions as a whole Hanberry et al. 5 a b c d e Figure 3. a. An L function of a point pattern from a random distribution for a density of 200 points/plot size of 100 with a 95% confidence envelope. Even though the L function line as a whole can move above and below the theoretical line, note the smoothness of the line in terms of amplitude between peak and trough. b. An L function of point pattern from a moderately regular distribution for a density of 200 points/plot size of 100 with a 95% confidence envelope. Even though the L function line as a whole can move above and below the theoretical line, note the increased jaggedness of the line in terms of amplitude between peak and trough. c. An L function of a point pattern from a random distribution for a density of 2500 points/plot size of 100 with a 95% confidence envelope, showing the smoothness of the L function line. d. An L function of a point pattern from a moderately regular distribution for a density of 2500 points/plot size of 100 with a 95% confidence envelope, which shows that the magnitude of the L function line is more indicative of a regular distribution than simply the position below the theoretical line. e. An L function of a point pattern from a clustered distribution, for 300 points from 30 clusters and 10 child points with a maximum distance of 20 and a plot size of 100, which shows that the position above the theoretical line indicates clustering. 6 could cross both above and below the random line, most of the observed line was below the completely spatially random line. Conversely, functions of clustered point patterns at even relatively low densities consistently rose above the random line, clearly indicating clustered distributions (Figure 3e). Discussion It is difficult to classify the general point process for a forest based on graphical displays of functions for multiple plots. A spatial index can provide an objective value for functions that describe spatial points, by measuring the overall distance between the observed and theoretical, random lines. Testing of Genton’s spatial index confirmed that a clustered point pattern would differentiate from random and regular distributions with functions, making the spatial index a useful tool to detect clustering, as applied by Genton et al. (2006) and Yang et al. (2007). Of course, use of Genton’s spatial index to detect multiple patterns at different spatial scales would not be appropriate. Furthermore, the spatial index highlighted problems with functions that would be difficult to detect with graphs. The spatial index showed that values (of the distance between the theoretical line and the observed line) will decrease with increasing density and increase with increasing plot size. Therefore, comparisons among spatial patterns from different densities and plot sizes may be problematic. Also, plot size appears to be a factor that can affect point pattern analyses. Plot size effects potentially are associated with edge corrections, which are applied to any point pattern function. We selected a standard Ripleys’s isotropic correction. In addition, functions cannot differentiate reliably between known random and regular distributions (based on differences between the observed and random line) at low densities, due to inherent variability. The spatial index values for both random and regular distributions overlapped, albeit the values for the regular distribution remained more negative than the random distribution for each density. The L function is not able to distinguish between these two distributions as well as (i.e., at as low densities) the pair correlation function, and the K2 function exhibited overlap among all distributions. For 1000 replicates, low densities will be less than 90 points/plot size of 100 (or the equivalent intensity for a different plot size, e.g., 15 points/plot size of 50) as determined by the spatial index. Due to considerable variation, for smaller samples sizes (i.e., realistic field sample sizes), detection of the underlying process probably will require greater densities of around 100 to 300 points/plot size of 100, at least for the L function. Inability to differentiate between known random and regular distributions at low densities is true for graphical displays with confidence envelopes as well as the spatial index. Patterns from both random and regular distributions can move above and below the high and low lines of the confidence envelope at low densities when there are not enough points in the pattern to determine the distribution. Hui et al. (2007) also noted that it was easy to wrongly determine dis- Functions on point patterns tribution using the L function and Perry et al. (2006) also found that the K function misidentified a regular pattern for a clustered pattern. It appears that rather than differentiating between random and regular distributions based on the position of the observed line compared to the theoretical line and envelope lines, a better interpretation would be based on the magnitude of the peaks of the lines. This again is subjective, and another tool that can delineate between random and regular distributions is needed for standard use. For example, Loosmore and Ford (2006) suggested a goodness-of-fit test as a replacement for the Monte Carlo envelope. The L function was the most consistent and easiest to interpret of the three functions when taking into account varying plot size. Although the pair correlation function produced less overlap between functions of point patterns from random and regular distributions, and therefore is a better function, the fluctuation in GSI values of the pair correlation function with constant intensity and increasing plot size was beyond our ability to interpret. The K2 function, developed for heterogeneous intensity (Schiffers et al. 2008), which fluctuated for both fixed and variable intensity for extremely regular distributions and additionally produced overlap among all distributions, appeared to us to be the least interpretable. Conclusions A summarized value that describes the overall spatial relationship of points is useful. The GSI is a tool to 1) easily identify clustering, 2) save time by quickly measuring distance from randomness for large sample sizes or high densities, and 3) detect problems not possible with graphical displays. However, the GSI, along with simulation envelopes, cannot differentiate among point patterns of random and moderately regular processes when density is low and sample sizes are small. Our results also showed that there may be problems with edge correction methods. Ecological processes create changes in patterns over time. Standard measurements include composition, density, and biomass, but these measurements do not preclude the value of other statistics, including spatial structuring of communities. Although there are many potential applications for spatial statistics, current statistics may not be providing accurate information when density is low, sample sizes are small, or plot sizes vary. In these cases, ecologists should be cautious about the accuracy of results. Acknowledgements. B. Hanberry received support from the National Fire Plan of the USDA Forest Service, Northern Research Station and J. Yang received support from the Chinese Academy of Sciences (09YBR211SS). References Baddeley, A. and R. Turner. 2005. Spatstat, an R package for analyzing spatial point patterns. J. Stat. Softw. 12:1-42. Hui, G., L. Li, Z. Zhao and D. Puxing. 2007. Comparison of methods in analysis of the tree spatial distribution pattern. Acta Ecol. Sin. 27:4717-4728. Hanberry et al. 7 Clark, P. J. and F. C. Evans. 1954. Distance to the nearest neighbor as a measure of spatial relationships in populations. Ecology 35:445-453. Genton, M. G., D. T. Butry, M. L. Gumpertz and J. P Prestemon. 2006. Spatio-temporal analysis of wildfire ignitions in the St Johns River Management District, Florida. Int. J. Wildland Fire 15:87-97. Loosemore, N. B. and E. D. Ford. 2006. Statistical inference using the G or K point pattern spatial statistics. Ecology 87:1925-1931. Moravie, M-A. and A. Robert. 2003. A model to assess relationships between forest dynamics and spatial structure. J. Veg. Sci. 14:823-834. Perry, G. L. W., B. P. Miller and N. J. Enright. 2006. A comparison of methods for the statistical analysis of spatial point patterns in plant ecology. Plant Ecol. 187:59-82. Pielou, E. C. 1961. Segregation and symmetry in two-species populations as studied by neares-neighbor relationships. J. Ecol. 49:255-269. Pommerening, A. 2002. Approaches to quantifying forest structures. Forestry 75:305-324. Ripley, B. D. 1976. The second-order analysis of stationary point processes. J. Appl. Probab. 13:255-266. Schiffers, K., F. M. Schurr, K. Tielbörger, C. Urbach, K. Moloney and F. Jeltsch. 2008. Dealing with virtual aggregation – a new index for analyzing heterogeneous point patterns. Ecography 31:545-555. Stoyan, D. and A. Penttinen. 2000. Recent applications of point process methods in forestry statistics. Stat.Sci. 15:61-78. The R Development Core Team 2010. R. 2.11.1. A language and environment. www.r-project.org. Yang, J., H. S. Hong, S. R. Shifley and E. J. Gustafson. 2007. Spatial patterns of modern period human-caused fire occurrence in the Missouri Ozark highlands. Forest Sci. 53:1-15. Received Revised Accepted Electronic appendices Appendix 1. Genton’s spatial index, based on Ripley’s L (L), pair correlation (g), or K2 functions with a plot size of 100 (x and y values ranging from -50 to 49), for clustered distributions of point densities ranging from 20 to 1000, for 1000 replications. Appendix 2. Genton’s spatial index, based on Ripley’s L function for a plot size of 100 (x and y values ranging from -50 to 49), for random, moderately (Mod) regular, and extremely (Ext) regular distributions of point densities ranging from 10 to 1000 and 1000 replications. Appendix 3. Genton’s spatial index, based on the pair correlation function for a plot size of 100 (x and y values ranging from -50 to 49), for random, moderately (Mod) regular, and extremely (Ext) regular distributions of point densities ranging from 10 to 1000 and 1000 replications. Appendix 4. Genton’s spatial index, based on the K2 function for a plot size of 100 (x and y values ranging from -50 to 49), for random, moderately (Mod) regular, and extremely (Ext) regular distributions of point densities ranging from 10 to 1000 and 1000 replications. Appendix 5. Genton’s spatial index, based on Ripley’s L (L), pair correlation (g), or K2 functions for a constant intensity and increasing plot size from 10 to 54, with predicted values for the L function. The file may be downloaded from the web site of the publisher at www.akademiai.com.
© Copyright 2026 Paperzz