CHAPTER 9
Data Transformations
Most data sets benefit from one or more data transformations. The reasons for transforming data can be grouped into statistical and ecological reasons:
Statistical

•  improve assumptions of normality, linearity, homogeneity of variance, etc.
•  make units of attributes comparable when measured on different scales (for example, if you have elevation ranging from 100 to 2000 meters and slope from 0 to 30 degrees)
Ecological

•  make distance measures work better
•  reduce the effect of total quantity (sample unit totals) to put the focus on relative quantities
•  equalize (or otherwise alter) the relative importance of common and rare species

It is difficult to overemphasize the potential importance of transformations. They can make the difference between illusion and insight, fog and clarity. To use transformations effectively requires a good understanding of their effects and a clear vision of your goals.

Monotonic transformations are applied to each element of the data matrix, independent of the other elements. They are "monotonic" because they change the values of the data points without changing their rank. Relativizations adjust matrix elements by a row or column standard (e.g., maximum, sum, mean, etc.). One transformation described below, Beals smoothing, is unique in being a probabilistic transformation based on both row and column relationships. In this chapter, we also describe other adjustments to the data matrix, including deleting rare species, combining entities, and calculating first differences for time series data.

Notation.— In all of the transformations described below,

  x_ij = the original value in row i and column j of the data matrix
  b_ij = the adjusted value that replaces x_ij
Domains and ranges

Bear in mind that some transformations are unreasonable or even impossible for certain types of data. Table 9.1 lists the kinds of data that are potentially usable for each transformation.
Monotonic transformations

Power transformation

  b_ij = x_ij^p

Different parameters (exponents) for the transformation change the effect of the transformation: p = 0 gives presence/absence, p = 0.5 gives the square root, etc. The smaller the parameter, the more compression is applied to high values (Fig. 9.1).

The square root transformation is similar in effect to, but less drastic than, the log transform. Unlike the log transform, special treatment of zeros is not needed. The square root transformation is commonly used. Less frequent is a higher root, such as a cube root or fourth root (Fig. 9.1).
Figure 9.1. Effect of square root and higher root transformations, b = f(x). Note that roots higher than three are essentially presence-absence transformations, yielding values close to 1 for all nonzero values.
Table 9.1. Domain of input and range of output from transformations.

                                        Reasonable and acceptable
  Transformation                        domain of x                   Range of f(x)

  MONOTONIC TRANSFORMATIONS
  x^p (power, p > 0)                    nonnegative                   nonnegative
  x^0 (power)                           all                           0 or 1 only
  log(x)                                positive                      all
  (2/π)·arcsin(x)                       0 ≤ x ≤ 1                     0 to 1 inclusive
  (2/π)·arcsin(√x)                      0 ≤ x ≤ 1                     0 to 1 inclusive

  SMOOTHING
  Beals smoothing                       0 or 1 only                   0 to 1 inclusive

  ROW/COLUMN RELATIVIZATIONS
  general (power)                       nonnegative                   0 to 1 inclusive
  by maximum                            nonnegative                   0 to 1 inclusive
  by mean                               all                           all
  by standard deviates                  all                           generally between -10 and 10
  binary by mean                        all                           0 or 1 only
  rank                                  all                           positive integers
  binary by median                      all                           0 or 1 only
  ubiquity                              nonnegative                   nonnegative
  information function of ubiquity     nonnegative                   nonnegative
For example, Smith et al. (2001) applied a cube root to count data to emphasize informative species at the expense of uninformative species, a choice supported by an optimization procedure. Roots at a higher power than three nearly transform to presence-absence: nonzero values become close to one, while zeros remain at zero.
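For readers who want to experiment outside PC-ORD, here is a minimal sketch of the power transformation in Python with NumPy; the matrix X and the function name are hypothetical, not from the book:

    import numpy as np

    # Hypothetical community matrix: rows are sample units, columns are species.
    X = np.array([[0.0, 1.0, 25.0],
                  [4.0, 0.0, 100.0]])

    def power_transform(x, p):
        # b = x**p for nonnegative data; p = 0.5 is the square root, and as
        # p shrinks toward 0 the result approaches presence/absence.
        x = np.asarray(x, dtype=float)
        b = x ** p
        if p == 0:
            b[x == 0] = 0.0   # define 0**0 as 0 so absences stay absences
        return b

    print(power_transform(X, 0.5))    # square root
    print(power_transform(X, 0.25))   # fourth root: strong compression of high values
    print(power_transform(X, 0))      # presence/absence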
Logarithmic transformation
  b_ij = log(x_ij)

Log transformation compresses high values and spreads low values by expressing the values as orders of magnitude. Log transformation is often useful when there is a high degree of variation within variables or when there is a high degree of variation among attributes within a sample. These are commonly true of count data and biomass data.

Log transformations are extremely useful for many kinds of environmental and habitat variables, the lognormal distribution being one of the most common in nature. See Limpert et al. (2001) for a general introduction to lognormal distributions and applications in various sciences. They claim that the abundance of species follows a truncated lognormal distribution, citing Sugihara (1980) and Magurran (1988). While the nonzero values of community data sets often resemble a lognormal distribution, excluding zeros often amounts to ignoring half of a data set. The lognormal distribution is fundamentally flawed when applied to community data because a zero value is, more often than not, the most frequent abundance value for a species. Nevertheless, the log transformation is extremely useful in community analysis, provided that one carefully handles the problem of log(0) being undefined.

To log-transform data containing zeros, a small number must be added to all data points. If the lowest nonzero value in the data is one (as in count data), then it is best to add one before applying the transformation:

  b_ij = log(x_ij + 1)
If, however, the lowest nonzero value of x differs from one by more than an order of magnitude, then adding one will distort the relationship between zeros and other values in the data set. For example, biomass data often contain many small decimal fractions (values such as 0.00345 and 0.00332) ranging up to fairly large values (in the hundreds). Adding a one to the whole data set will tend to compress the resulting distribution at the low end of the scale. The order-of-magnitude difference between 0.003 and 0.03 is lost if you add a one to both values before log transformation: log(1.003) is about the same as log(1.03).

The following transformation is a generalized procedure that (a) tends to preserve the original order of magnitudes in the data and (b) results in values of zero when the initial value was zero. Given:

  Min(x) is the smallest nonzero value in the data
  Int(x) is a function that truncates x to an integer by dropping digits after the decimal point
  c = order-of-magnitude constant = Int(log(Min(x)))
  d = decimal constant = log^-1(c)

then the transformation is

  b_ij = log(x_ij + d) - c

Subtracting the constant c from each element of the data set after the log transformation shifts the values such that the lowest value in the data set will be a zero.

For example, if the smallest nonzero value in the data set is 0.00345, then

  log(Min(x)) = log(0.00345) = -2.46
  c = Int(log(Min(x))) = -2
  d = log^-1(c) = 0.01.

Applying the transformation to some example values:

  If x = 0, then b = log(0 + 0.01) - (-2), therefore b = 0.
  If x = 0.00345, then b = log(0.00345 + 0.01) - (-2), therefore b = 0.128.

Arcsine transformation

  b_ij = (2/π) · arcsin(x_ij)

The constant 2/π scales the result of arcsin(x) [in radians] to range from 0 to 1, assuming that 0 ≤ x ≤ 1. The function arcsin is the same as sin^-1, or inverse sine. Data must range between zero and one, inclusive. If they do not, you should relativize before selecting this transformation.

Unlike the arcsine-squareroot transformation, an arcsine transformation is usually counterproductive in community ecology, because it tends to spread the high values and compress the low values (Fig. 9.2). This might be useful for distributions with negative skew, but community data almost always have positive skew.

Arcsine squareroot transformation

  b_ij = (2/π) · arcsin(√x_ij)

The arcsine-squareroot transformation spreads the ends of the scale for proportion data, while compressing the middle (Fig. 9.2). This transformation is recommended by many statisticians for proportion data, often improving normality (Sokal and Rohlf 1995). The data must range between zero and one, inclusive. The arcsine-squareroot is multiplied by 2/π to rescale the result so that it ranges from 0 to 1.

Figure 9.2. Effect of several transformations on proportion data.

The logit transformation, b = ln(x / (1 - x)), is also sometimes used for proportion data (Sokal and Rohlf 1995). However, if x = 0 or x = 1, then the logit is undefined. Often a small constant is added to prevent ln(0) and division by zero. Alternatively, empirical logits may be used (see Sokal and Rohlf 1995, p. 762). Because zeros are so common in community data, it seems reasonable to use the arcsine squareroot or squareroot transformations to avoid this problem.
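The generalized zero-safe log procedure above is easy to script. Here is a minimal sketch in Python/NumPy; the function name and example values are ours, although the 0.00345 example matches the hand calculation in the text:

    import numpy as np

    def zero_safe_log10(x):
        # c = Int(log(Min(x))), the order-of-magnitude constant, where Min(x)
        # is the smallest nonzero value; d = 10**c, the decimal constant.
        x = np.asarray(x, dtype=float)
        min_nonzero = x[x > 0].min()
        c = np.trunc(np.log10(min_nonzero))   # Int() truncates, so -2.46 -> -2
        d = 10.0 ** c
        return np.log10(x + d) - c            # zeros map exactly to zero

    x = np.array([0.0, 0.00345, 0.03, 300.0])
    print(zero_safe_log10(x))   # [0.0, 0.128..., 0.602..., 4.477...]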
Beals smoothing

Beals smoothing replaces each cell in the community matrix with a probability of the target species occurring in that particular sample unit, based on the joint occurrences of the target species with the species that are actually in the sample unit. The purpose of this transformation (also known as the sociological favorability index; Beals 1984) is to relieve the "zero-truncation problem" (Beals 1984). This problem is nearly universal in community data sets and most severe in heterogeneous community data sets that contain a large number of zeros (i.e., most samples contain a fairly small proportion of the species). Beals smoothing replaces presence/absence or other binary data with quantitative values that represent the "favorability" of each sample for each species, regardless of whether the species was present in the sample. The index evaluates the favorability of sample unit i for species j, based on the whole data set, using the proportions of joint occurrences between species j and the species that do occur in the sample:

  b_ij = (1/S_i) Σ_k (M_jk / N_k),  summed over all species k with x_ik ≠ 0,

where S_i is the number of species in sample unit i, M_jk is the number of sample units with both species j and k, and N_k is the number of sample units with species k. This transformation is illustrated in Box 9.1.
This transformation is essentially a smoothing
operation designed for community data (McCune
1994). As with any numerical smoothing, it tends to
reduce the noise in the data by enhancing the strongest
patterns. In this case the signal that is smoothed is the
pattern of joint occurrences in the data. This is an
extremely powerful transformation that is particularly
effective on heterogeneous or noisy data. Caution is
warranted, however, because, as for any smoothing
function, this transformation can produce the appear­
ance of reliable, consistent trends even from a series of
random numbers.
This transformation should not be used on data sets with few zeros. It also should not be used if the data are quantitative and you do not want to lose that quantitative information.

Beals smoothing can be slow to compute. If you have a large data set and a slow computer, be sure to allocate plenty of time. This transformation is available in PC-ORD but apparently not in other packages for statistical analysis.
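As a sketch of the computation (not PC-ORD's implementation), the following Python/NumPy function reproduces the values in Box 9.1; it assumes there are no empty sample units or empty species, since S_i and N_k appear in denominators:

    import numpy as np

    def beals_smoothing(X):
        # Presence/absence version of the data: sample units x species.
        P = (np.asarray(X) > 0).astype(float)
        N = P.sum(axis=0)        # N_k = number of SUs containing species k
        M = P.T @ P              # M_jk = number of SUs with both j and k
        S = P.sum(axis=1)        # S_i = number of species in SU i
        # b_ij = (1/S_i) * sum over species k present in SU i of M_jk / N_k
        return (P @ (M / N).T) / S[:, None]

    X = np.array([[1, 0, 1, 1, 1],     # the example matrix of Box 9.1
                  [0, 0, 0, 1, 0],
                  [1, 2, 0, 0, 0]])
    print(beals_smoothing(X).round(2))  # matches matrix B in Box 9.1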
Relativizations

"To relativize or not to relativize, that focuses the question." (Shakespeare, ????)

Relativizations rescale individual rows (or columns) in relationship to some criterion based on the other rows (or columns). Any relativization can be applied to either rows or columns.

Relativization is an extremely important tool that all users of multivariate statistics in community ecology MUST understand. There is no right or wrong answer to the question of whether or not to relativize UNTIL one specifies the question and examines the properties of the data.

If the row totals are approximately equal, then relativization by rows will have little effect. Consistency of row totals can be evaluated by the coefficient of variation (CV) of the row totals (Table 9.2). The CV% is calculated as 100 × (standard deviation / mean). In this case, it is the standard deviation of the row totals divided by the mean of the row totals.
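In code, the check is one line per axis. A sketch in Python/NumPy (the matrix is hypothetical, and we assume the sample standard deviation, which the text does not specify):

    import numpy as np

    # Hypothetical community matrix: rows are sample units, columns are species.
    X = np.array([[5.0, 3.0, 2.0],
                  [1.0, 0.0, 1.0],
                  [9.0, 6.0, 5.0]])

    def cv_percent(totals):
        # CV% = 100 * standard deviation / mean (sample SD assumed here).
        totals = np.asarray(totals, dtype=float)
        return 100.0 * totals.std(ddof=1) / totals.mean()

    print(cv_percent(X.sum(axis=1)))   # CV% of row (sample unit) totals
    print(cv_percent(X.sum(axis=0)))   # CV% of column (species) totals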
Table 9.2. Evaluation of degree of variability in row or column totals, as measured with the coefficient of variation (CV) of row or column totals.

  CV, %      Variability among rows (or columns)
  < 50       Small. Relativization usually has a small effect on the qualitative outcome of the analysis.
  50-100     Moderate (with a correspondingly moderate effect on the outcome of further analysis).
  100-300    Large. Large effect on results.
  > 300      Very large.
Box 9.1. Example of Beals smoothing

Data matrix X before transformation (3 sample units × 5 species), with S_i = number of species in sample unit i and N_j = number of sample units with species j:

          sp1   sp2   sp3   sp4   sp5     S_i
   SU1     1     0     1     1     1       4
   SU2     0     0     0     1     0       1
   SU3     1     2     0     0     0       2

   N_j     2     1     1     2     1

Construct matrix M, where M_jk = number of sample units with both species j and k. (Note that where j = k, M_jk = N_j.)

                       Species k
                   1     2     3     4     5
   Species j  1    2     1     1     1     1
              2    1     1     0     0     0
              3    1     0     1     1     1
              4    1     0     1     2     1
              5    1     0     1     1     1

Construct a new matrix B containing values transformed with the Beals smoothing function:

  b_ij = (1/S_i) Σ_k (M_jk / N_k),  for all k with x_ik ≠ 0

Data after transformation (B):

          sp1    sp2    sp3    sp4    sp5
   SU1    0.88   0.13   0.75   0.88   0.75
   SU2    0.50   0.00   0.50   1.00   0.50
   SU3    1.00   0.75   0.25   0.25   0.25

Example for sample unit 1 and species 2:
  b_12 = 1/4 (1/2 + 0/1 + 0/2 + 0/1)
  b_12 = 0.25 (0.5)
  b_12 = 0.125 (rounded to 0.13 in the matrix above)

Example for sample unit 3 and species 2:
  b_32 = 1/2 (1/2 + 1/1)
  b_32 = 0.5 (1.5)
  b_32 = 0.75
Figure 9.3. Effect of various transformations on relative weighting of species (panels: raw data; log(x); sqrt(x); presence-absence; relativization by species totals; relativization by species maximum; log(x) then relativization by species maximum; relativization by SU totals; Beals smoothing). Species abundance was measured on a continuous, quantitative scale. "Rank" is the order of species ranked by their abundance.
If the row or column totals are unequal, one must
decide whether to retain this information as part of the
analysis or whether to remove it by relativizing. One
must justify this decision on biological grounds, not on
its effect on the CV of row or column totals. For
example, consider two quadrats with identical propor­
tions of three species, but one quadrat has a total cover
of 1% and the other has a total cover of 95%. If the
data are relativized, then the quadrats appear similar or
identical. If they are not relativized, then distance
measures will consider them to be very different.
Which choice is correct? The answer depends on the
question. Does the question refer to proportions of
different species or is the total amount also important?
If the latter is true, the data should not be relativized.
An example demonstrates how relativization can change the focus of the analysis. Menges et al. (1993) reported rates of vegetation change based on both relativized and nonrelativized tree species data, beginning with a matrix of basal area of each species in remeasured permanent plots. They used absolute rates to emphasize structural changes (e.g., increase in basal area of existing species) and relative rates to emphasize shifts in species composition (changes in the relative proportions of species).

Relativization is often used to put variables that were measured in different units on an equal footing. For example, a data set may contain counts for some species and cover for other species. In forest ecology, one may wish to combine basal area data for trees with cover data for herbs. If the species measured in different units are to be analyzed together, then one must relativize the data such that the quantity for each species is expressed as a proportion of some total or maximum abundance.
Relativizations can have a huge effect on the relative weighting of rare and abundant species. Raw quantitative data on a continuous scale tend to have a few abundant species and many rare species (Fig. 9.3). A multivariate analysis of these raw data might emphasize only a few species, ignoring most of the species. Log or square-root transformation of the data usually moderates the imbalance, while relativization by species totals can eliminate it completely (Fig. 9.3). This is, however, a drastic transformation. Rare species often occur haphazardly, so that giving them a lot of weight greatly increases the noise in the analysis.
General relativization

By rows:      b_ij = x_ij / (Σ_{j=1..q} x_ij^p)^(1/p)

By columns:   b_ij = x_ij / (Σ_{i=1..n} x_ij^p)^(1/p)

for a matrix of n rows and q columns.

The parameter p can be set to achieve different objectives. If p = 1, relativization is by row or column totals. This is appropriate when using analytical tools based on city-block distance measures, such as Bray-Curtis or Sorensen distance. If p = 2, you are "standardizing by the norm" (Greig-Smith 1983, p. 248). Using p = 2 is the Euclidean equivalent of relativization by row or column totals. It is appropriate when the analysis is based on a Euclidean distance measure. The same effect can be achieved by using "relative Euclidean distance" (see Chapter 6).
Relativization by maximum

  b_ij = x_ij / xmax_j

where rows (i) are samples, columns (j) are species, and xmax_j is the largest value in the matrix for species j. As for relativization by species totals, this adjustment tends to equalize common and uncommon species. Relativization by species maxima equalizes the heights of peaks along environmental gradients, while relativization by species totals equalizes the areas under the curves of species responses.

Many people have found this to be an effective transformation for community data. A couple of cautions should be heeded, however: (1) very rare species can cause considerable noise in subsequent analyses if not omitted; (2) this and any other statistic based on extreme values can accentuate sampling error.
Adjustment to mean

The row or column mean is subtracted from each value, producing positive and negative numbers. If relativized by rows, the means are row means; if by columns, the means are column means. The negative numbers obviate proportion-based distance measures, such as Sorensen and Jaccard. This unstandardized centering procedure can have detrimental effects on analysis of community data: it tends to emphasize values of zero more than does the raw data. Also, more variable species are reduced in importance relative to more constant species.

Adjustment to standard deviate

  b_ij = (x_ij - x̄_j) / s_j

where x̄_j and s_j are the mean and standard deviation within column j. Each transformed value represents the number of standard deviations that it differs from the mean; these values are often known as "z-scores." As for all of the relativizations, this transformation can be applied to either rows or columns. It is, however, usually applied to variables (columns). This transformation results in all variables having mean = 0 and variance = 1.

Because this transformation produces both positive and negative numbers, it is NOT compatible with proportion-based distance measures, such as Sorensen's. While this transformation is of limited utility for species data, it can be a very useful relativization for environmental variables, placing them on equal footing for a variety of purposes.

Binary with respect to mean

  b_ij = 1 if x_ij > x̄;  b_ij = 0 if x_ij ≤ x̄

An element is assigned a zero if its value is less than or equal to the row or column mean, x̄. The element is assigned a one if its value is above the mean. Applied to species (columns), this transformation can be used to contrast above-average conditions with below-average conditions. The transformation therefore emphasizes the optimal parts of a species distribution. It also tends to equalize the influence of common and rare species. Applied to sample units, it emphasizes dominant species and is likely to eliminate many species, particularly those that rarely, if ever, occur in high abundances.

Rank adjustment

Matrix elements are assigned ranks within rows or columns such that the row or column totals are constant. Ties are assigned the average rank of the tied elements. For example, the values 1, 3, 3, 9, 10 would receive ranks 1, 2.5, 2.5, 4, 5.

This transformation should be applied with caution. For example, most community data have many zeros. These zeros are counted as ties. Because the number of zeros in each row or column will vary, zeros will be transformed to different values, depending on the number of zeros in each row or column. For example, the values 0, 0, 0, 0, 6, 9 would receive the ranks 2.5, 2.5, 2.5, 2.5, 5, 6, while the values 0, 0, 6, 9 would receive the ranks 1.5, 1.5, 3, 4.

Binary with respect to median

  b_ij = 1 if x_ij > median;  b_ij = 0 if x_ij ≤ median

The transformed values are zeros or ones. An element is assigned a zero if its value is less than or equal to the row or column median. The element is assigned a one if its value is greater than the row or column median. This transformation can be used to emphasize the optimal parts of a species range, at the same time equalizing to some extent the weight given to rare and dominant species. The Rank adjustment caution also applies to this relativization, because it too is based on ranks.

Weighting by ubiquity

  b_ij = u_j · x_ij,  where u_j = N_j / N

If rows are samples, columns are species, and relativization is by columns, more ubiquitous species are given more weight. Under these conditions, N_j is the number of samples in which species j occurs and N is the total number of samples.

Information function of ubiquity

  b_ij = I_j · x_ij

where

  I_j = -p_j log(p_j) - (1 - p_j) log(1 - p_j)

and p_j = N_j / N, with N_j and N as defined above.
To illustrate the effect of this relativization, assume that rows are samples, columns are species, and relativization is by columns. Maximum weight is applied to species occurring in half of the samples, because those species have the maximum information content, according to information theory. Very common and rare species receive little weight. Note that if there are empty columns, the transformation will fail, because the log of zero is undefined.
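Several of these column relativizations are short enough to sketch together in Python (using NumPy and SciPy's rankdata for average ranks of ties; the matrix is hypothetical). The first column reproduces the rank example from the text, and the information function would fail for a species present in every sample unit (p_j = 1), as just noted:

    import numpy as np
    from scipy.stats import rankdata

    # Hypothetical matrix: 6 sample units (rows) x 2 species (columns).
    X = np.array([[ 0.0, 10.0],
                  [ 0.0,  6.0],
                  [ 0.0,  9.0],
                  [ 0.0,  0.0],
                  [ 6.0,  3.0],
                  [ 9.0,  3.0]])

    # Adjustment to standard deviates (z-scores), by columns.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Rank adjustment within columns; ties receive the average rank, so
    # column 1, the values (0,0,0,0,6,9), becomes (2.5,2.5,2.5,2.5,5,6).
    R = np.apply_along_axis(rankdata, 0, X)

    # Information function of ubiquity: I_j = -p_j log(p_j) - (1-p_j) log(1-p_j),
    # with p_j the proportion of sample units occupied. The log base only
    # rescales the weights; the function fails if any p_j is 0 or 1.
    p = (X > 0).mean(axis=0)
    B = X * (-p * np.log(p) - (1 - p) * np.log(1 - p))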
Double relativizations

The relativizations described above can be applied in various combinations to rows then columns, or vice versa. When applied in series, the last relativization necessarily mutes the effect of the preceding relativization.

The most common double relativization was first used by Bray and Curtis (1957). They first relativized by species maximum, equalizing the rare and abundant species; then they relativized by SU total. This and other double relativizations tend to equalize emphasis among SUs and among species. This comes at a cost of diminishing the intuitive meaning of individual data values.

Austin and Greig-Smith (1968) proposed a "contingency deviate" relativization. This measures the deviation from an expected abundance. The expected abundance is based on the assumption of independence of the species and the samples. Expected abundance is calculated from the marginal totals of the n × p data set, just as if it were a large contingency table:

  b_ij = x_ij - (Σ_{j=1..p} x_ij)(Σ_{i=1..n} x_ij) / (Σ_{i=1..n} Σ_{j=1..p} x_ij)
The resulting values include both negative and positive values and are centered on zero. The row and column totals become zero. Because this transformation produces negative numbers, it is incompatible with proportion-based distance measures.

One curious feature of this transformation is that zeros take on various values, depending on the marginal totals. The meaning of a zero is taken differently depending on whether the other elements of that row and column create large or small marginal totals. With sample unit × species data, a zero for an otherwise common species will be given more weight (i.e., a more negative value). This may be ecologically meaningful, but applied to rows the logic seems counterintuitive: a species that is absent from an otherwise densely packed sample unit will also be given high weight.
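A sketch of both double relativizations in Python/NumPy (function names are ours; the Bray-Curtis version assumes no empty rows or columns):

    import numpy as np

    def bray_curtis_double(X):
        # First relativize each species (column) by its maximum, then each
        # sample unit (row) by its total.
        X = np.asarray(X, dtype=float)
        B = X / X.max(axis=0)
        return B / B.sum(axis=1)[:, None]

    def contingency_deviate(X):
        # Deviation of each element from the abundance expected under
        # independence of species and samples (marginal totals / grand total).
        X = np.asarray(X, dtype=float)
        expected = np.outer(X.sum(axis=1), X.sum(axis=0)) / X.sum()
        return X - expected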
Deleting rare species
Deleting rare species is a useful way of reducing
the bulk and noise in your data set without losing much
information. In fact, it often enhances the detection of
relationships between community composition and
environmental factors. In PC-ORD, you select deletion of columns "with fewer than N nonzero numbers." For example, if N = 3, then all species with fewer than 3 occurrences are deleted. If N = 1, all empty species (columns) are deleted.
Deleting rare species is clearly inappropriate if you
wish to examine patterns in species diversity. Cao et
al. (1999) correctly pointed this out but confused the
issue by citing proponents of deletion of rare species
who were concerned with extracting patterns with
multivariate analysis, not with comparison of species
diversity. None of the authors they criticized suggested
deleting rare species prior to analysis of species
richness.
For multivariate analysis of correlation structure
(in the broad sense), it is often helpful to delete rare
species. As an approximate rule of thumb, consider deleting species that occur in fewer than 5% of the sample units. Depending on your purpose, however, you may wish to retain all species or eliminate an even higher percentage.
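A sketch of that rule of thumb in Python/NumPy (the helper name and the 5% default are ours, mirroring the suggestion above):

    import numpy as np

    def drop_rare_species(X, min_frac=0.05):
        # Keep species (columns) that occur in at least min_frac of the
        # sample units (rows).
        X = np.asarray(X, dtype=float)
        keep = (X > 0).mean(axis=0) >= min_frac
        return X[:, keep], keep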
Some analysts object to removal of rare species on the grounds that we are discarding good information. Empirically this can be shown true or false by using an external criterion of what is "good" information. You can try this yourself. Use a familiar data set that has at least a moderately strong relationship between communities and a measured environmental factor. Ordinate (Part 4) the full data set, rotate the solution to align it with that environmental variable (Ch. 15), and record the correlation coefficient between the environmental variable and the axis scores. Now delete all species occurring in just one sample unit. Repeat the ordination, rotation, and correlation procedure. Progressively delete more species (those occurring in only two sample units, etc.), until only the few most common species remain. Now plot the correlation coefficients against the number of species retained (Fig. 9.4).

In our experience the correlation coefficient usually peaks at some intermediate level of retention of species (Fig. 9.4). When including all species, the noise from the rare ones weakens the structure slightly. On the other hand, when including only a few dominant species, too little redundancy remains in the data for the environmental gradient to be clearly expressed.
A second example compared the effect of stand structures on small mammals, using a blocked design (D. Waldien 2002, unpublished). Fourteen species were enumerated in 24 stands, based on trapping data, then relativized by species maxima. The treatment effect size was measured with blocked MRPP (Ch. 24), using the A statistic (chance-corrected within-group agreement). Rare species were successively deleted, beginning with the rarest one, until only half of the species remained. In this case, removal of the four rarest species slightly increased the apparent effect size
Figure 9.4. Correlation between ordination axis scores and environmental variables can often be improved by removal of rare species. In this case, the strength of relationship between hydrologic variables (depth to water table, distance from stream, elevation above stream) and vegetation, as measured by r², is maximized with removal of species occurring in fewer than 5-15% of the sample units, depending on the hydrologic variable. The original data set contained 88 species; 59, 35, 16, and 9 species remained after removal of species occurring in fewer than 5, 15, 40, and 45% of the sample units, respectively. Data are courtesy of Nick Otting (1996, unpublished).
(Fig. 9.5). The fifth and sixth rarest species, however, were distinctly patterned with respect to the treatment, so their removal sharply diminished the apparent effect size.
Another objection to removal of rare species is that you cannot test hypotheses about whole-community structure if you exclude rare species. Certainly this is true for hypotheses about diversity. But it also applies to other measures of community structure. Statistical hypothesis tests are always, in some form or another, based on evaluating the relative strength of signal and noise. Because removal of rare species tends to reduce noise, the signal is more likely to be detected. This can be taken as an argument against removal of rare species, because it introduces a bias toward rejecting a null hypothesis. Alternatively, one can define beforehand the community of interest as excluding the rare species and proceed without bias.
Figure 9.5. Response of the A statistic (blocked MRPP) to removal of rare species from small mammal trapping data. A measures the effect size of the treatments, in this case different stand structures.
Mark Fulton (1998, unpublished) summarized the noise vs. signal problem well:

    Noise and information can only be defined in the context of a question of interest. An analogy: we are sitting in a noisy restaurant trying to have a conversation. From the point of view of our attempting to communicate, the ambient sound around us is "noise." Yet that noise carries all kinds of information — that clatter over to the left is the bus person clearing dishes at the next table; the laughter across the room is in response to the punchline of a fairly good joke; with a little attention you can hear what the two men in business suits two tables over are arguing about; and that rumble you just heard is a truck full of furniture turning the corner outside the restaurant. But none of this information is relevant to the conversation, and so we filter it out without thinking about the process much.

    Vegetation analysis is a process of noise filtering right from the very start. Data collection itself is a tremendous filtering process. We decide what NOT to measure. Any transformations we do on the data — whether weighting, rescaling, or deletion of rare species — is also a filtering process. Ordination itself is a further filter. The patterns in the whole n-dimensional mess are of less interest than a carefully selected reduction of those patterns. The point is, as scientists, we need to do this process of information selection and noise reduction carefully and with full knowledge of what we are doing. There is no single procedure which will always bring out the information of interest. Data selection, transformation, and analysis can only be judged on how well they work in relation to the questions at hand.
Combining entities

Aggregate sample units (SUs) can be created by averaging existing SUs. Each new entity is the "centroid" of the entities that you average. See Greig-Smith (1983, p. 286) for comments on ordinating groups of SUs. In general, community SUs should not be averaged unless they are very similar. If SUs are heterogeneous, then the average species composition tends to fall outside the variation of the SUs, the averages being unnaturally species-rich.

If your rows are SUs, and you also have an environmental matrix, you should also calculate centroids for environmental data. Be careful if you have categorical environmental variables. Depending on how the categories are structured, averaging the categories can be meaningless.
Difference between two dates

Before-and-after data on species abundance obtained by revisiting the same SUs can be analyzed as differences, rather than the original quantities. If a_ij1 and a_ij2 are the abundances of species j in sample unit i at times 1 and 2, then the difference between dates is

  b_ij = a_ij2 - a_ij1

The transformed data represent changes through time. Even with species abundance data, this transformation yields variables that are more or less normally distributed, with means near zero and with both positive and negative numbers. After this transformation, be sure not to use methods that demand nonnegative numbers: proportion coefficients (such as Sorensen) as distance measures, and techniques based on Correspondence Analysis (CA, RA, CCA, DCA, TWINSPAN). On the other hand, PCA and other techniques calling for multivariate normal data and linear relationships among variables will work far better on such a matrix than they would with either matrix alone.
First difference of time series

If your data form a time series (sample units are repeatedly evaluated at fixed locations), you may want to ordinate the differences in abundance between successive dates rather than the raw abundances:

  b_ij = a_ij,t+1 - a_ij,t

for a community sampled at times t and t+1. This is simply the extension through time of the idea described in the preceding section. This transformation can be called a "first difference" (Allen et al. 1977) because it is analogous to the first derivative of a time series curve. With community data, a matrix of first differences represents changes in species composition. If we visualize changes in species composition as vectors in species space, the matrix of differences represents the lengths and directions of those vectors. A matrix of "second" differences would represent the rates of acceleration (or deceleration) of sample units moving through species space.
The matrix of first differences takes into account the direction of compositional change. For example, assume that the plankton in a lake go through two particular compositional states in the fall, then go through the same compositional states in the spring, but in the opposite direction. The difference between the two fall samples is not, therefore, the same as the difference between the spring samples, even though the absolute values of the differences are equal. Analyzing the signed difference is logical, but other possibilities exist. Allen et al. (1977) analyzed the absolute differences, creating a matrix of species' contributions to community change, without regard to the direction of the change:

  b_ij = | a_ij,t+1 - a_ij,t |
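Both the signed and the absolute first differences are one-liners with NumPy's diff; a minimal sketch (the abundance matrix is hypothetical, with dates as rows):

    import numpy as np

    # Hypothetical abundances of 3 species (columns) at 4 dates (rows)
    # for one repeatedly measured sample unit.
    A = np.array([[5.0, 1.0, 0.0],
                  [3.0, 2.0, 1.0],
                  [2.0, 2.0, 4.0],
                  [4.0, 0.0, 6.0]])

    B_signed = np.diff(A, axis=0)        # first differences, a(t+1) - a(t)
    B_abs = np.abs(B_signed)             # Allen et al. (1977): unsigned change
    B_second = np.diff(A, n=2, axis=0)   # "second differences" (acceleration)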
If environmental variables are recorded at each date, you might analyze species change from time t to t+1 in relationship to the state of the environment at time t. Alternatively, you could apply the first-difference transformation to the environmental variables as well, to analyze the question of how community change is related to environmental change. On the other hand, variables that are constant through time for a given sample unit (e.g., location or treatment variables) could be retained without transformation.

Note that the statistical properties of these differences are radically different from those of the original data. For more information, see the preceding section on differences between two dates.
A general procedure for data adjustments

Species data

While one can easily grasp the logic of a particular data adjustment, the number of combinations and sequences can be bewildering. Although it is impossible to write a step-by-step cookbook that covers all possible data sets and goals, we suggest a general procedure for data adjustments that will be applicable to many community data sets (Table 9.3). For more details on steps 2, 3, and 4, consult the preceding pages. For more detail on step 5, consult the section on outliers in Chapter 7.

The sequence of actions is important. For example, we check for outliers last, because many apparent outliers will disappear, depending on the monotonic transformations or relativizations that are used.
Table 9.3. Suggested procedure for data adjustments of species data matrices.

  Action to be considered                                   Criteria

  1. Calculate descriptive statistics: beta diversity       Always
     (community data sets), average skewness of
     columns, coefficient of variation (CV, %) of row
     totals, and CV of column totals. Repeat this after
     each step below. (In PC-ORD, run Row & column
     summary.)

  2. Delete rare species (< 5% of sample units)             Usually applied to community data sets,
                                                            unless contrary to study goals

  3. Monotonic transformation (if applied to species,      A. Average skewness of columns (species)
     then usually applied uniformly to all of them, so     B. Data range over how many orders of
     that all are scaled the same)                             magnitude? (Count and biomass data
                                                               often are extreme.)
                                                            C. Beta diversity. (Consider presence/
                                                               absence transformation for community
                                                               data when beta diversity is high.)

  4. Row or column relativizations                          What is the question?
                                                            Are units for all variables the same?
                                                            Is relativization built into the
                                                            subsequent analysis?
                                                            CV of row totals; CV of column totals
                                                            What distance measure do you intend to
                                                            use?
                                                            Note: regardless of your decision to
                                                            relativize or not, you should state your
                                                            decision and justify it briefly on
                                                            biological grounds.

  5. Check for outliers based on the average distance       Standard deviations   Degree of problem
     of each point from all other points. Calculate the     < 2                   no problem
     standard deviation of these average distances.         2-2.3                 weak outlier
     Describe outliers and take steps to reduce their       2.3-3                 moderate outlier
     influence, if necessary.                               > 3                   strong outlier
Environmental data

Adjustments of environmental data depend greatly on their intended use, as indicated in Table 9.4. Categorical and binary variables in general need no adjustment, but one should always examine quantitative environmental variables.
Table 9.4. Suggested procedure for data adjustments of quantitative variables in environmental data matrices.

  Action to be considered                                   Criteria

  1. Calculate descriptive statistics for quantitative      Always: skewness and range for each
     variables. Repeat this after each step below.          variable (column)
     (In PC-ORD, run Row & column summary.)

  2. Monotonic transformation (applied to individual        Consider log or square root transformation
     variables, depending on need)                          for variables with skewness > 1 or ranging
                                                            over several orders of magnitude. Consider
                                                            arcsine squareroot transformation for
                                                            proportion data.

  3. Column relativizations                                 Consider column relativization (by norm or
                                                            standard deviates) if environmental
                                                            variables are to be used in a distance-
                                                            based analysis that does not automatically
                                                            relativize the variables (for example,
                                                            using MRPP to answer the question: do
                                                            groups of sample units defined by species
                                                            differ in environmental space?). Column
                                                            relativization is not necessary for
                                                            analyses that use the variables one at a
                                                            time (e.g., ordination overlays) or for
                                                            analyses with built-in standardization
                                                            (e.g., PCA of a correlation matrix).

  4. Check for univariate outliers and take corrective      Examine scatterplots or frequency
     steps if necessary.                                    distributions, or relativize by standard
                                                            deviates ("z-scores") and check for high
                                                            absolute values.