Evaluating functions to describe point patterns

COMMUNITY ECOLOGY 14(1): 1-7, 2013
1585-8553/$20.00 © Akadémiai Kiadó, Budapest
DOI: 10.1556/ComEc.14.2013.1.1
Evaluating functions to describe point patterns
B. B. Hanberry1,3, Jian Yang2 and Hong S. He1
1
School of Natural Resources, 203 Natural Resources Building, University of Missouri, Columbia, MO, 65211 USA
State Key Laboratory of Forest and Soil Ecology, Institute of Applied Ecology, Chinese Academy of Sciences,
Shenyang, China
3
Corresponding author. [email protected]. Tel: 573-875-5341 ext. 230, Fax: 573-882-1977
2
Keywords: K2 functions; Pair correlation function; Point pattern; Ripley’s K; Simulation envelope.
Abstract: Three functions that analyze point patterns are the L (a transformation of Ripley’s K function), pair correlation, and K2
functions. We tested these functions with Genton’s spatial index, which measures the distance between observed and random theoretical
lines for a function over all lag distances. We simulated points from random, moderately regular, extremely regular, and clustered
distributions at different density levels and a fixed plot size and for regular distributions at varying plot sizes. The spatial index clearly
was able to identify clustered patterns. However, there was overlap in spatial index values between random and moderately regular
distributions at lower densities. The spatial index for the L function also became more negative as plot size increased, indicating edge
effects, whereas the pair correlation and K2 function values fluctuated with plot size. Using confidence envelopes rather than a spatial
index, we also identified distributions other than the known distribution at low densities. These results indicate that ecologists should
be cautious about 1) assigning random or regular distribution, particularly for the L function and the K2 function, 2) comparing point
patterns of different densities and plot sizes, and 3) interpreting the pair correlation and K2 functions.
Abbreviations: CSR–Complete Spatial Randomness, GSI - Genton's spatial index.
Introduction
Point patterns are realizations of spatial and temporal
point processes, which are common in the fields of ecology,
forestry, and natural resources. Locations of germination and
establishment, disease, insects, exotic and rare species, ice
and wind damage, and fire ignitions often occur as point patterns, which can provide information about sources, intensity,
and spread. In addition, the relative position of trees and plants,
whether random, clustered, or regular, may indicate important
ecological or biological processes such as competition, dispersal, and self-thinning (Moravie and Robert 2003).
Three common functions that describe and graph departure from spatial randomness over a range of spatial scales
are Ripley’s K, L, pair correlation, and K2 functions (Pommerening 2002). These functions are based on inter-point
distances, where in comparison to a constant intensity of
points (i.e., average density of points: expected number of
points per unit area), a greater distribution of short distances
indicates clustering and conversely, a greater distribution of
long distances indicates regularity (Genton et al. 2006). Ripley’s K function measures the mean number of events (e.g.,
stems or individuals), relative to the intensity of the point
process, within varying lag distances of an arbitrary event
(Ripley 1976). For complete spatial randomness (CSR; i.e.,
a process with uniform intensity and no spatial dependence),
2
the K function is simply K(h) =  h , the area of a disc with
the radius being the lag distance h. A transformation of the K
function produces the L function, which is defined as L(h)=
1/2
(K(h)/) – h. Interpretation of the L function is more
straightforward than the K function, because randomness is
indicated by a straight line for this equation, and values above
the line indicate clustering whereas values below the line indicate regularity (Stoyan and Penttinen 2000). The pair correlation function (also known as the g function) is an alternative option, defined as the derivative of the K function g(h)=
(K/h)/2h. In contrast to the cumulative density functions
of K or L functions, the pair correlation function represents a
probability density function of lag distance h, which may
produce stronger contrast between random point patterns
(equaling one), and clustering (values greater than one) or regularity (values less than one; Stoyan and Penttinen 2000). The K2
function is approximately the first derivative of the g function,
hence the second derivative of the K function. It was developed
to quantify spatial autocorrelation of points after correcting
bias caused by the heterogeneity of intensity (i.e., point density). Unlike the L and pair correlation functions, negative
values represent clustering (Schiffers et al. 2008).
These modern functions have replaced simple indices,
such as the aggregation index (Clark and Evans 1954) and
coefficient of segregation (Pielou 1961), which only use
nearest neighbors and produce one value for the inter-point
interaction (Pommerening 2002). Because modern functions
depend on the lag distance, they can provide information
about the spatial scales at which clustering or regularity occurs. However, when comparing the amount of clustering or
regularity between various spatial point pattern data, visual
interpretation of these graphs is subjective. Therefore in
some cases, particularly when there are numerous plots, a
summarized value that describes the overall spatial relationship of points is useful.
2
Genton et al. (2006) introduced a clustering index, defined as the area between the estimated L function (from observed data) and a completely spatial random process with
the same point intensity as the data, to quantify the overall
amount of clustering across all spatial scales. Genton et al.
(2006) used the clustering index to examine how the amount
of clustering of fire ignitions changes over various time periods. Yang et al. (2007) subsequently calculated the clustering
index in their work that compared the degree of clustering
between human-caused and lightning-caused fire ignitions.
We renamed this clustering index as Genton’s spatial index (GSI) because it should be able to quantify the overall
amount of not only clustering (when GSI > 0) but also regularity (when GSI <0). Although this index has been used to
compare the amount of clustering between different spatial
point pattern data sets, the effects of varying point densities
and sampling plot sizes associated with spatial point pattern
data sets have not been investigated for functions that describe point patterns and subsequently, the GSI. Differences
among clustering index values obtained from various point
data sets may be confounded by varying point densities or
plot sizes, rather than due to the underlying inter-point spatial
dependence. Our objectives were to use the GSI to 1) assess
how well the L, pair correlation, and K2 functions characterize simulated random, regular, and clustered distributions
over a range of densities and a constant plot size and 2) examine these three functions at a constant density and varying
plot sizes for regular distributions. We then evaluated the
usefulness of the spatial index compared to a graphical display with simulation envelopes and the three different functions to characterize spatial patterns. This practical assessment will inform ecologists about some of the limitations of
current spatial statistics to describe spatial patterning.
Functions on point patterns
oped by Schiffers et al. (2008) to calculate the K2 function.
Our standard plot was rectangular and we chose the isotropic
method for edge correction. We used Genton’s spatial index
(GSI, Genton et al. 2006) to measure the overall amount of
clustering or regularity across various lag distances (maximum lag distance Rmax is 25% of the plot length). The GSI is
defined as the difference between the total area under a spatial function curve for the observed data and that for CSR
(completely spatially random) data with the same point density. For any function, with the L function as an example below, the equation is:
(1)
where h is the lag distance and Rmax is the maximum lag
distance (25% of plot length). GSI approximately equals the
difference between the sum of the observed L function value
for each lag distance, or h, and the sum of L function values
of the corresponding CSR process for each lag distance.
Simulation design
We repeatedly simulated point patterns of random, moderately and extremely regular, and clustered distributions using a range of reasonable densities (Figure 1a-d). To represent common survey plots, we randomly generated points in
a square plot. We defined the plot size as the length of one
side, determined by the range of x and y values (e.g., -50 to
49), and defined the plot area as the plot length×plot width.
Although our units can represent any measurement length,
for convenience we refer to our units as meters and thus, a
plot size of 100 would represent 1 ha. Likewise, our points
can represent any object, but we represent the points as trees.
Methods
Clustered distributions
Overview
For the first simulations, we generated points in a plot
size of 100 (values of x and y ranging from and including -50
to 49) from clustered distributions for densities ranging from
20 to 1000 points per plot area (i.e., 20 to 1000 trees per ha)
using Python (http://docs.python.org). We evaluated the
clustered distribution by varying both the number of clusters
(5, 10, 25, 50, and 100 clusters/plot area) and the mean
number of trees per cluster (4, 6, 8, or 10), yielding densities
of 20 to 1000 trees/unit area (e.g., 5 clusters x 4 trees per cluster = 20 trees/plot area) and 12 density levels (i.e., we did not
use all possible combinations and duplicated densities). The
number of clusters and trees per cluster were drawn from a
Poisson distribution. Child trees were added by random bearing and distance. We set the maximum distance a “child” tree
location could be placed relative to the original (“parent”)
tree at 10 m in order to limit the points to the plot. We generated 1000 replicates per density level.
First, we repeatedly simulated point patterns of random,
moderately and extremely regular, and clustered distributions using densities generally ranging from 10 to 1000
points per plot. We then calculated the L, pair correlation, and
K2 functions, followed by Genton’s spatial index to measure
the overall amount of deviation from randomness (i.e., clustering or regularity). Next, we varied the plot size while
maintaining a fixed density. We then calculated the L, pair
correlation, and K2 functions, followed by Genton’s spatial
index. Lastly, we developed 95% confidence interval envelopes based on 99 simulations of random, moderately regular, and mildly clustered distributions at low and high densities for the L function to compare the conventional graphical
approach to Genton’s spatial index.
Functions and spatial index
Random and regular distributions
We used the Spatstat package in R (Baddeley and Turner
2005, R Development Core team 2010) to calculate both the
L and pair correlation functions. We used R coding devel-
We then simulated points within a plot size of 100 for
densities ranging from 10 to 1000 points/plot area from ran-
Hanberry et al.
3
a
b
c
d
Figure 1. A realized point patterns. a: random distribution, with 2500 points and plot size of 50, b: moderately regular distribution,
with 1580 points and plot size of 50, c: extremely regular distribution, with 2500 points and plot size of 50, d: clustered distribution,
with 100 clusters of 25 trees and plot size of 50.
dom, moderately regular, and extremely regular distributions. We used Spatstat for generating random and extremely
regular distributions. A moderately regular distribution was
formed in Python by generating random integer points only,
effectively forming a grid, and randomly adding or subtracting a random offset value between 0 and 1 m, allowing the
gridlines to move in relation to the center point. Although this
produces a regular distribution, gaps between points are random, and would represent a natural forest with inhibition or
a plantation with significant mortality rather than a perfect
plantation. We generated 1000 replicates per density level.
Varying plot sizes and fixed density
We generated constant density and a regular distribution
of points for varying plot sizes in Python. That is, there was
one point at each integer intersection for plot sizes ranging
from 10 (uniform point intensity of 100 trees/ha and 100
points in this plot area) to 54 (uniform point intensity of 100
trees/ha and 2916 points due to limit of 3000 points in Spatstat package). Because every realization would be identical,
we produced only one replicate per plot size. We also ran a
regression (Proc Reg; SAS software, Version 9.1, Cary,
North Carolina, USA) to predict the GSI at plot sizes of 60,
80, and 100 for the L function of a regular distribution.
Graphical identification of point patterns
Rather than using a single spatial index value, we used
95% confidence interval envelopes based on 99 simulations
of a completely spatially random point process (Spatstat
package in R) to compare 10 point patterns each from random (algorithms from Python and R) and moderately regular
(from Python) distributions at densities of 100, 200, and 300
points/plot size of 100 for the L function. Ten point patterns
represent a more realistic effort for field sampling and the
4
Functions on point patterns
densities would be the upper end of effort for number of
measurements per plot. We also produced envelopes for
point patterns of 2500 points/plot size of 100 from random
and moderately regular distributions and for about as mildly
clustered a distribution as we could produce of 300 points,
from 30 clusters with 10 child points at a maximum distance
of 20 and a plot size of 100.
Results
Although each function is on a different scale, values
more distant from 0 indicate greater clustering. All spatial
functions for the clustered process produced GSI values that
were clearly separated from 0 (Appendix 1). As density increased, the GSI for the L function decreased from about 263
to 24, the GSI for the g function decreased from about 77 to
4, and the GSI for the K2 function increased from about -113
to -3 (note that like the K2 function, the GSI of the K2 function indicates clustering by negative values and regularity by
positive values). Values were far enough away from 0 that
even after accounting for standard deviation, values will indicate a clustered distribution.
The GSI values for the random and regular processes
also consistently moved closer to 0 as density increased, stabilizing to near 0 for the random process by around 300
points/plot size of 100 (Appendix 2 for the L function, Appendix 3 for the pair correlation function, and Appendix 4 for
the K2 function; Figure 2). Mean values for random and
regular distributions overlapped across densities, particularly
when factoring in standard deviation. For the L function, densities of less than 90 trees per hectare, not uncommon in forests, for random distributions had GSI mean values that overlapped with GSI mean values from regular distributions and
had GSI mean values that indicated regular distribution (i.e,
greater than 1). The GSI values for the pair correlation function settled close to 0 by a density of 20 for random distributions, thus separating almost immediately from values of
regular distributions, however standard deviations indicated
that any one point pattern could show overlap between random and regular distributions at most densities (Appendix 3).
The GSI values for the K2 function increased steadily from
-4.45 to near 0 for random distributions, overlapping with
values of both clustered and moderately regular distributions,
which increased from -10.18 to settle at about 0.6. The K2
function for extremely regular distributions fluctuated to and
from 0, with values ranging from 2.85 to 0, indicating regular
or random distributions.
At a fixed intensity, increases in plot size decreased the
GSI (i.e., values became more negative) for the L function
(Appendix 5). The GSI values ranged from -0.69 for a plot
size of 10 to -1.34 for a plot of 54. Predicted GSI values for
larger plots continued to decrease with plot size, likely due to
2
declining impact of edge effects. The R value for the regression line was 0.99. The equation for the line was y = –0.5007
–0.0189x + 0.0001x2, where y is the GSI, x is the plot size,
and x2 is number of points (equivalent to plot area). Both the
pair correlation and K2 functions fluctuated with plot size in
Figure 2. The GSI values increase as density increases, and
values stabilize by about 300 points/plot size of 100. This example is for the L function of a random distribution.
a seemingly unpredictable manner. The GSI values illustrated by increasing plot size are important because they provide a threshold value. That is, if observed GSI values for
both regular and random distributions are less than (more
negative) these threshold values, the intensity, or density of
points in the plot area, is too low and there will be overlap
between random and regular distributions.
We produced 95% confidence interval envelopes from
99 simulations around 10 point patterns each (as 10 plots represents a more realistic effort for field measurements) from
two algorithms (Python and Spatstat) of random distributions
and one algorithm (from Python) of moderately regular distributions at 100, 200, and 300 points/plot size of 100 (i.e.,
densities where GSI values stabilized according to Figure 2)
for the L function (see Figures 3a and 3b for L function examples of a random and moderately regular distribution at
200 points/plot size of 100). At all densities, a random point
pattern clearly crossed the envelope to indicate clustered
(i.e., above the theoretical, completely spatially random line)
or regular (i.e., below the theoretical, completely spatially
random line) processes 1 to 3 of the 10 times. The moderately
regular point pattern also could indicate other processes, as it
appeared to be random 10 out of 10 times until reaching a
density of 300 points/plot size, when it was either regular or
regular in combination with clustering. Regardless of the location of the observed line, the magnitude of the oscillations
was more extreme for the regular distribution than for the
random distribution, which maintained a smoother line.
By 2500 points/plot size of 100, functions of point patterns displayed corresponding random or moderately regular
distributions (Figure 3c and 3d). Although the L function line
for random point patterns still can break the envelope, the
function was smooth. In contrast, the L function line for moderately regular point patterns had exaggerated oscillations
that were larger than the envelope and even though the function line for moderately regular distributions as a whole
Hanberry et al.
5
a
b
c
d
e
Figure 3. a. An L function of a point pattern from a random distribution for a density of 200 points/plot size of 100 with a 95% confidence envelope. Even though the L function line as a whole can move above and below the theoretical line, note the smoothness of
the line in terms of amplitude between peak and trough. b. An L function of point pattern from a moderately regular distribution for a
density of 200 points/plot size of 100 with a 95% confidence envelope. Even though the L function line as a whole can move above
and below the theoretical line, note the increased jaggedness of the line in terms of amplitude between peak and trough. c. An L
function of a point pattern from a random distribution for a density of 2500 points/plot size of 100 with a 95% confidence envelope,
showing the smoothness of the L function line. d. An L function of a point pattern from a moderately regular distribution for a density of 2500 points/plot size of 100 with a 95% confidence envelope, which shows that the magnitude of the L function line is more
indicative of a regular distribution than simply the position below the theoretical line. e. An L function of a point pattern from a clustered distribution, for 300 points from 30 clusters and 10 child points with a maximum distance of 20 and a plot size of 100, which
shows that the position above the theoretical line indicates clustering.
6
could cross both above and below the random line, most of
the observed line was below the completely spatially random
line. Conversely, functions of clustered point patterns at even
relatively low densities consistently rose above the random
line, clearly indicating clustered distributions (Figure 3e).
Discussion
It is difficult to classify the general point process for a
forest based on graphical displays of functions for multiple
plots. A spatial index can provide an objective value for functions that describe spatial points, by measuring the overall
distance between the observed and theoretical, random lines.
Testing of Genton’s spatial index confirmed that a clustered
point pattern would differentiate from random and regular
distributions with functions, making the spatial index a useful tool to detect clustering, as applied by Genton et al. (2006)
and Yang et al. (2007). Of course, use of Genton’s spatial
index to detect multiple patterns at different spatial scales
would not be appropriate.
Furthermore, the spatial index highlighted problems with
functions that would be difficult to detect with graphs. The
spatial index showed that values (of the distance between the
theoretical line and the observed line) will decrease with increasing density and increase with increasing plot size.
Therefore, comparisons among spatial patterns from different densities and plot sizes may be problematic. Also, plot
size appears to be a factor that can affect point pattern analyses. Plot size effects potentially are associated with edge corrections, which are applied to any point pattern function. We
selected a standard Ripleys’s isotropic correction.
In addition, functions cannot differentiate reliably between known random and regular distributions (based on differences between the observed and random line) at low densities, due to inherent variability. The spatial index values for
both random and regular distributions overlapped, albeit the
values for the regular distribution remained more negative
than the random distribution for each density. The L function
is not able to distinguish between these two distributions as
well as (i.e., at as low densities) the pair correlation function,
and the K2 function exhibited overlap among all distributions. For 1000 replicates, low densities will be less than 90
points/plot size of 100 (or the equivalent intensity for a different plot size, e.g., 15 points/plot size of 50) as determined
by the spatial index. Due to considerable variation, for
smaller samples sizes (i.e., realistic field sample sizes), detection of the underlying process probably will require
greater densities of around 100 to 300 points/plot size of 100,
at least for the L function.
Inability to differentiate between known random and
regular distributions at low densities is true for graphical displays with confidence envelopes as well as the spatial index.
Patterns from both random and regular distributions can
move above and below the high and low lines of the confidence envelope at low densities when there are not enough
points in the pattern to determine the distribution. Hui et al.
(2007) also noted that it was easy to wrongly determine dis-
Functions on point patterns
tribution using the L function and Perry et al. (2006) also
found that the K function misidentified a regular pattern for
a clustered pattern. It appears that rather than differentiating
between random and regular distributions based on the position of the observed line compared to the theoretical line and
envelope lines, a better interpretation would be based on the
magnitude of the peaks of the lines. This again is subjective,
and another tool that can delineate between random and regular distributions is needed for standard use. For example,
Loosmore and Ford (2006) suggested a goodness-of-fit test
as a replacement for the Monte Carlo envelope.
The L function was the most consistent and easiest to interpret of the three functions when taking into account varying plot size. Although the pair correlation function produced
less overlap between functions of point patterns from random
and regular distributions, and therefore is a better function,
the fluctuation in GSI values of the pair correlation function
with constant intensity and increasing plot size was beyond
our ability to interpret. The K2 function, developed for heterogeneous intensity (Schiffers et al. 2008), which fluctuated
for both fixed and variable intensity for extremely regular
distributions and additionally produced overlap among all
distributions, appeared to us to be the least interpretable.
Conclusions
A summarized value that describes the overall spatial relationship of points is useful. The GSI is a tool to 1) easily
identify clustering, 2) save time by quickly measuring distance from randomness for large sample sizes or high densities, and 3) detect problems not possible with graphical displays. However, the GSI, along with simulation envelopes,
cannot differentiate among point patterns of random and
moderately regular processes when density is low and sample sizes are small. Our results also showed that there may be
problems with edge correction methods.
Ecological processes create changes in patterns over
time. Standard measurements include composition, density,
and biomass, but these measurements do not preclude the
value of other statistics, including spatial structuring of communities. Although there are many potential applications for
spatial statistics, current statistics may not be providing accurate information when density is low, sample sizes are
small, or plot sizes vary. In these cases, ecologists should be
cautious about the accuracy of results.
Acknowledgements. B. Hanberry received support from the
National Fire Plan of the USDA Forest Service, Northern Research Station and J. Yang received support from the Chinese
Academy of Sciences (09YBR211SS).
References
Baddeley, A. and R. Turner. 2005. Spatstat, an R package for analyzing spatial point patterns. J. Stat. Softw. 12:1-42.
Hui, G., L. Li, Z. Zhao and D. Puxing. 2007. Comparison of methods
in analysis of the tree spatial distribution pattern. Acta Ecol. Sin.
27:4717-4728.
Hanberry et al.
7
Clark, P. J. and F. C. Evans. 1954. Distance to the nearest neighbor
as a measure of spatial relationships in populations. Ecology
35:445-453.
Genton, M. G., D. T. Butry, M. L. Gumpertz and J. P Prestemon.
2006. Spatio-temporal analysis of wildfire ignitions in the St
Johns River Management District, Florida. Int. J. Wildland Fire
15:87-97.
Loosemore, N. B. and E. D. Ford. 2006. Statistical inference using
the G or K point pattern spatial statistics. Ecology 87:1925-1931.
Moravie, M-A. and A. Robert. 2003. A model to assess relationships
between forest dynamics and spatial structure. J. Veg. Sci.
14:823-834.
Perry, G. L. W., B. P. Miller and N. J. Enright. 2006. A comparison
of methods for the statistical analysis of spatial point patterns in
plant ecology. Plant Ecol. 187:59-82.
Pielou, E. C. 1961. Segregation and symmetry in two-species populations as studied by neares-neighbor relationships. J. Ecol.
49:255-269.
Pommerening, A. 2002. Approaches to quantifying forest structures.
Forestry 75:305-324.
Ripley, B. D. 1976. The second-order analysis of stationary point
processes. J. Appl. Probab. 13:255-266.
Schiffers, K., F. M. Schurr, K. Tielbörger, C. Urbach, K. Moloney
and F. Jeltsch. 2008. Dealing with virtual aggregation – a new
index for analyzing heterogeneous point patterns. Ecography
31:545-555.
Stoyan, D. and A. Penttinen. 2000. Recent applications of point process methods in forestry statistics. Stat.Sci. 15:61-78.
The R Development Core Team 2010. R. 2.11.1. A language and
environment. www.r-project.org.
Yang, J., H. S. Hong, S. R. Shifley and E. J. Gustafson. 2007. Spatial
patterns of modern period human-caused fire occurrence in the
Missouri Ozark highlands. Forest Sci. 53:1-15.
Received
Revised
Accepted
Electronic appendices
Appendix 1. Genton’s spatial index, based on Ripley’s L (L),
pair correlation (g), or K2 functions with a plot size of 100 (x
and y values ranging from -50 to 49), for clustered distributions of point densities ranging from 20 to 1000, for 1000
replications.
Appendix 2. Genton’s spatial index, based on Ripley’s L
function for a plot size of 100 (x and y values ranging from
-50 to 49), for random, moderately (Mod) regular, and extremely (Ext) regular distributions of point densities ranging
from 10 to 1000 and 1000 replications.
Appendix 3. Genton’s spatial index, based on the pair correlation function for a plot size of 100 (x and y values ranging
from -50 to 49), for random, moderately (Mod) regular, and
extremely (Ext) regular distributions of point densities ranging from 10 to 1000 and 1000 replications.
Appendix 4. Genton’s spatial index, based on the K2 function for a plot size of 100 (x and y values ranging from -50
to 49), for random, moderately (Mod) regular, and extremely
(Ext) regular distributions of point densities ranging from 10
to 1000 and 1000 replications.
Appendix 5. Genton’s spatial index, based on Ripley’s L (L),
pair correlation (g), or K2 functions for a constant intensity
and increasing plot size from 10 to 54, with predicted values
for the L function.
The file may be downloaded from the web site of the publisher at www.akademiai.com.