INVESTIGATION OF TEMPORAL
VARIATIONS IN RELATIVE ABUNDANCE
OF MACROINVERTEBRATES IN LAKE
224 OF THE EXPERIMENTAL LAKES
AREA BY A MULTIVARIATE METHOD
R.K. Misra, I.J. Davies, N.H.F. Watson, and J.F. Uthe
Physical and Chemical Sciences Branch
Scotia-Fundy Region
Department of Fisheries and Oceans
P.O. Box 550
Halifax, Nova Scotia
Canada B3J 2S7
1995
Canadian Technical Report of
Fisheries and Aquatic Sciences
2026
Canadian Technical Report of
Fisheries and Aquatic Sciences
Technical reports contain scientific and technical information that contributes to
existing knowledge but which is not normally appropriate for primary literature.
Technical reports are directed primarily toward a worldwide audience and have an
international distribution. No restriction is placed on subject matter and the series
reflects the broad interests and policies of the Department of Fisheries and Oceans,
namely, fisheries and aquatic sciences.
Technical reports may be cited as full publications. The correct citation appears
above the abstract of each report. Each report is abstracted in Aquatic Sciences and
Fisheries Abstracts and indexed in the Department's annual index to scientific and
technical publications.
Numbers 1-456 in this series were issued as Technical Reports of the Fisheries
Research Board of Canada. Numbers 457-714 were issued as Department of the
Environment, Fisheries and Marine Service, Research and Development Directorate
Technical Reports. Numbers 715-924 were issued as Department of Fisheries and the
Environment, Fisheries and Marine Service Technical Reports. The current series
name was changed with report number 925.
Technical reports are produced regionally but are numbered nationally. Requests
for individual reports will be filled by the issuing establishment listed on the front cover
and title page. Out-of-stock reports will be supplied for a fee by commercial agents.
Canadian Technical Report of
Fisheries and Aquatic Sciences 2026
1995
INVESTIGATION OF TEMPORAL VARIATIONS IN RELATIVE ABUNDANCE OF
MACROINVERTEBRATES IN LAKE 224 OF THE EXPERIMENTAL LAKES AREA
BY A MULTIVARIATE METHOD
by
R.K. Misra, I.J. Davies¹, N.H.F. Watson, and J.F. Uthe
Science Sector
Scotia-Fundy Region
Department of Fisheries and Oceans
P.O. Box 550
Halifax, Nova Scotia B3J 2S7
Canada
¹ Freshwater Institute, Department of Fisheries and Oceans, Winnipeg,
Manitoba R3T 2N6
© Minister of Supply and Services Canada 1995
Cat. No. Fs 97-6/1960E
ISSN 0706-6457
Correct citation for this publication:
R.K. Misra, I.J. Davies, N.H.F. Watson, and J.F. Uthe. 1995. Investigation of
temporal variations in relative abundance of macroinvertebrates in Lake 224
of the Experimental Lakes Area by a multivariate method. Can. Tech. Rep. Fish.
Aquat. Sci. 2026: vii + 26 p.
TABLE OF CONTENTS
ABSTRACT
INTRODUCTION
MATERIALS AND METHODS
STATISTICAL METHOD
ANALYSIS AND RESULTS
A. TIME TRENDS
B. PAIRED COMPARISONS OF SAMPLES (YEARS)
DISCUSSION
ACKNOWLEDGMENT
REFERENCES
TABLES
TABLE 1
TABLE 2
TABLE 3
TABLE 4
FIGURE
FIGURE 1
ABSTRACT
R.K. Misra, I.J. Davies, N.H.F. Watson, and J.F. Uthe. 1995. Investigation of
temporal variations in relative abundance of macroinvertebrates in Lake 224 of
the Experimental Lakes Area by a multivariate method. Can. Tech. Rep. Fish.
Aquat. Sci. 2026: vii + 26 p.
We investigate temporal variations in relative abundances of
macroinvertebrates in Lake 224 of the Experimental Lakes Area by a multivariate
method. The method analyses vectors of proportions of abundances of biological
populations (years) for paired comparisons of populations (useful for studying
similarities among populations) and linear comparisons among them (required for
investigating time trends and deviations from them), accounting for heterogeneity
of covariance matrices. Contributions of individual taxa to these
comparisons were examined. Data on annual proportions (relative abundance) of
the five most common invertebrate taxa in Lake 224 for the period, 1986-1990
were analysed. The findings of the statistical analyses were generally consistent
with the visual examination of data, while allowing an overall trend to be
postulated. Even with the high degree of annual variation in taxa proportions
and diverse trends in the component taxa, the method was robust enough to
detect an overall trend in the relatively few years of data available. A linear
trend explained a significant portion of the overall five-year trend, but not all
of it. Chronologically adjacent communities were most similar. As temporal
separation increased, the communities became less alike. It is expected that an
analysis of annual trends will yield more meaningful results than paired
comparisons of community structure, and that annual sampling is preferable to
less frequent sampling.
INTRODUCTION
In 1986, the Department of Fisheries and Oceans (DFO) developed the DFO
Long-Range Transport of Airborne Pollutants (LRTAP) Biomonitoring Programme
(Davies, 1991 & in preparation).
Its purpose is to assess long-term effects
of acidic deposition in several Canadian freshwater locations (stretching from
the Atlantic region to the Ontario-Manitoba border) based on characterization
of the biota of discrete portions of selected aquatic systems (Shaw et al.
1992).
Standardized methods were developed for sampling a wide community of
organisms, including those sensitive to acidification, e.g. fish and benthic
and littoral macroinvertebrates.
The methods were designed to yield
geographical and time series data comprising repeatable, quantitative annual
samples, e.g. captured macroinvertebrates, from the minimum number of stations
required to characterize "selected" biotic communities.
The DFO-LRTAP macroinvertebrate survey data can be expressed as
proportions (relative abundances) of each taxon.
Such proportions are a
measure of community structure (Misra et al. 1991) and comparisons between
proportions can be used to compare populations, i.e. populations are
considered similar if their proportions remain unchanged.
Many current ecological studies incorporate measures of multi-species
populations as part of their primary data.
Several approaches for analysing
benthic monitoring data are available (see, e.g. Norris and Georges [1993] for
a critical review of the options available for analysis).
One approach is to
consider each taxon to be a variable and the presence or absence of each taxon
to be an attribute of time (Norris and Georges 1993).
We expressed relative
abundance data by using variables to indicate the presence or absence of each
taxon.
Indicator variables are widely used to quantify attribute data (Li
1965; Balakrishnan and Sanghvi 1968; Neter et al. 1985; Johnson and Wichern
1988; Agresti 1990).
Here, we have designated each specimen within a sample
as to whether it belongs to a taxon (Y = 1) or not (Y = 0) (Li 1965; Johnson
and Wichern 1988).
DFO-LRTAP Biomonitoring data consist of several taxa and
each specimen was assigned to one taxon only (Balakrishnan and Sanghvi 1968;
Johnson and Wichern 1988).
Li (1965) notes that in proportions defined this
way (with $n_j$ 1's and $n - n_j$ 0's in a sample of n individuals), the sample mean
for the j-th taxon is $n_j/n$, i.e. its proportion in the sample.
Analysing
samples for differences in their taxonomic proportions is equivalent to
analysing sample means.
When a sample consists of individuals from several
taxa, a vector of proportions will represent the sample mean vector, its j-th
component showing the proportion for the j-th taxon.
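The equivalence between a taxon's proportion and the mean of its 0/1 indicator variable can be checked with a short sketch. The specimen labels below are hypothetical and purely illustrative; they are not data from Lake 224, and the function names are assumptions made for this example.

```python
from collections import Counter


def taxon_proportions(specimens):
    """Return {taxon: proportion} for a list of specimen taxon labels."""
    n = len(specimens)
    counts = Counter(specimens)
    return {taxon: count / n for taxon, count in counts.items()}


def indicator_mean(specimens, taxon):
    """Mean of the indicator Y (Y = 1 if the specimen belongs to `taxon`, else 0)."""
    indicators = [1 if s == taxon else 0 for s in specimens]
    return sum(indicators) / len(indicators)


# Hypothetical sample of n = 8 specimens (labels are illustrative only).
sample = ["A", "B", "A", "C", "A", "B", "A", "C"]
proportions = taxon_proportions(sample)

# The mean of the indicator for taxon "A" equals its proportion n_j / n.
print(proportions["A"], indicator_mean(sample, "A"))
```

Because each specimen is assigned to exactly one taxon, the proportions sum to one, which is the redundancy the method later removes by dropping one taxon.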
Univariate comparisons between population proportions and testing
regression of proportion on an independent variable have been used (e.g. see
Li 1965, Snedecor and Cochran 1980).
In analysing relative abundance data for
trends, the independent variable is time.
Analogous multivariate procedures
can be used to analyze vectors of proportions, e.g. Johnson and Wichern
(1988), Agresti (1990).
The multivariate procedure compares populations by
analysing all taxa jointly, in contrast to a series of univariate analyses
applied to each taxon separately, and may identify significant comparisons
that are not found if taxa are examined separately (Johnson and Wichern 1988;
Hair et al. 1992).
Information about relationships, interdependence, and
relative importance of taxa is retrievable.
Multivariate techniques show
greater promise than univariate comparisons for detecting and understanding
temporal trends in macroinvertebrates (Norris and Georges 1993).
We employed a multivariate method to analyse DFO-LRTAP Biomonitoring
Programme data. The purpose of this presentation is to investigate time trends
in relative abundances of the five most common invertebrate taxa sampled in
Lake 224 in the Experimental Lakes Area in northwestern Ontario for the
period, 1986-1990.
Although the multivariate method is subservient to the
biological objective of this study, its application to benthic data is the
novel component.
In large samples, the use of the normal distribution in analysing
proportions is well established (see, e.g. Li 1965; Snedecor and Cochran 1980 for the
univariate analysis and Johnson and Wichern 1988; Agresti 1990 for the
multivariate analysis).
Using the normal distribution appeals to a biologist
interested in making inferences about linear comparisons and probabilities in
that computations are simple and familiar.
A biologist feels "at home" with
analysing data that are normally distributed.
Balakrishnan and Sanghvi (1968)
and Kurczynski (1970) analysed proportions without transformations, e.g. logit
or arc-sine.
We analysed proportions without transformations because: 1.
There is no guarantee that analysing transformed proportions will be better
than analysing untransformed proportions (Draper and Smith 1981); 2. "In
general, when we make a transformation, it is impossible to relate the
parameters of the model used for the transformed data to the parameters in a
model initially intended for the untransformed data." (Draper and Smith 1981);
and 3. "... with an analysis in the original scale it is easier to think about
the meaning and practical importance of effects in this scale." (Snedecor and
Cochran 1980).
Neter et al. (1985) note that interpretation is not simple
when logit transformation is used.
Misra et al. (1991) noted the direct
relevance of proportionality to community structure.
Balakrishnan and Sanghvi (1968) and Kurczynski (1970) both developed
multivariate procedures for paired comparisons (K = 2) of vectors of
untransformed proportions.
While paired comparisons, i.e. between years, are
useful for studying similarities between two annual samplings (populations),
there is a greater need to analyse trends over a number of years (K > 2).
Here we present a multivariate procedure of analyzing linear comparisons of K
(≥ 2) vectors of proportions to do this.
In analysing linear comparisons (K > 2)
(or paired comparisons) of vectors of proportions, heterogeneity of covariance
matrices occurs.
We have taken this heterogeneity into account.
Information
needed to develop the multivariate method used here is available piecemeal in
the statistical literature (Balakrishnan and Sanghvi 1968; Kurczynski 1970;
Timm 1975; Morrison 1976; Johnson and Wichern 1988; Agresti 1990).
Although only 5 years of data were available, these were analyzed to
detect the nature and magnitude of trends at the earliest possible moment (so
that some estimate of the required length of the Biomonitoring Programme can
be given) and to detect anomalies and errors that may be corrected by
modifying the current sampling protocols.
MATERIALS AND METHODS
The Experimental Lakes Area contains many circum-neutral, acid-sensitive
shield lakes.
Annual deposition of wet sulfate is less than 10 kg·ha⁻¹·yr⁻¹.
Unperturbed Lake 224 was chosen as one of five baseline monitoring lakes for
the DFO LRTAP Biomonitoring Programme.
Besides frequent chemical and
hydrological sampling, the states of the fish, profundal benthos, and shallow
water benthos communities are monitored annually.
A subset of the data from a
"Kick and Sweep" technique of sampling shallow water macrobenthos was used for
our analysis.
Samples were collected by sweeping a mesh bag (800 µm mesh
opening) through the water column just above the lake bottom over an area that
had been roiled by the person taking the sample.
The net was supported on a
rigid frame attached to a wooden handle.
Dislodged specimens (including ones
still attached to small particles, sticks, or fine debris) were captured.
Coarse material settled quickly and was not captured.
A single sample
consists of material collected from 5-10 metres of shoreline length over a
depth range of 0-0.8 metres.
Early work, e.g. Stephens and Mierle (1993, in prep.), showed that the
cumulative number of taxa collected by a standardized "Kick and Sweep"
technique increased rapidly when 1-4 stations were sampled.
Beyond 5
stations, the incremental increase was small; accordingly 5 permanent sampling
sites were established in Lake 224 in 1986.
Studies by Stevenson (under
review) showed that a minimum of four to six stations was required to
adequately describe the littoral benthic community of a lake.
The sampling
programme was designed to provide a "lake-wide" picture of benthic communities
in selected habitats, and not to look at within-lake variability.
Therefore,
data from all five sites were pooled before analysis of proportions of taxa.
Time trials determined that catches of additional taxa decreased sharply
between 5 and 10 minutes of sampling effort.
Therefore, 10 minutes was set as
the standard sampling period.
This minimized the field work and shoreline
volume needed to characterize the fauna of each station.
All stations were sampled annually during the last 2 weeks of August.
Video tape records verified that sampling activities did not appear to affect
habitat condition over the years.
From 1986-1990 all stations were sampled by
the second author, except in 1986 when 3 of 5 stations were sampled by a
student.
In 1986 and 1987, live samples were presorted in the field.
Sorted
residue and specimens were preserved in Kahle's fluid (Wiggins 1977) and
transferred to 70% ethanol within 5 days.
Sample residues were resorted in
the laboratory for additional specimens.
Sorting in the field was inefficient
and was stopped in 1987 in favor of sorting in the laboratory.
Since 1989,
stains (Eosin B and Biebrich Scarlet) have been used as a presorting
treatment.
Such dyed samples are easier to pick.
Resorting of residues from
previous years assured that all samples had been sorted to a uniform degree.
All specimens were identified by Bohdan Bilyj, the taxonomic authority for the
national DFO-LRTAP Biomonitoring programme.
STATISTICAL METHOD
Here population is used synonymously with year.
Lake 224 data comprise
a number of populations (years), each characterized by a distribution of taxa
frequencies.
The notation employed in the presentation of the statistical
procedure is:
Q: Population (year).
K: Total number of populations.
S: Taxon.
r: Total number of taxa.
Contingency table: Classified by K rows and r columns, with Kr classes that
represent Kr possible outcomes (frequency counts).
$Q_i$: Population i, located in row i of the contingency table.
$S_j$: Taxon j, located in column j of the contingency table.
X, Y: Categorical variables, X having K and Y having r levels.
$\pi_{ij}$: The joint probability that an individual of $S_j$ comes from $Q_i$, i.e. the
probability that (X, Y) falls in the cell identified by row i (for $Q_i$)
and column j (for $S_j$) of the contingency table.
$\pi_{j|i}$: The conditional probability of Y when X is at level i, i.e. the
probability that an individual is of $S_j$ if the population is $Q_i$.
The contingency table is two dimensional.
The objective was to describe
the effect of the independent (explanatory) variable X on the dependent
(response) variable Y, e.g. in the investigation of temporal variations, each
$Q_i$ was assigned a fixed time value for its X variable.
Thus X is fixed rather
than random.
Vectors of Y-values for rows ($Q_i$) of the contingency table were
analyzed for temporal variations by employing X as the independent
(explanatory) variable.
Agresti (1990) noted that the probability
distribution $\{\pi_{ij}\}$ is the joint distribution of X and Y.
Here X is fixed,
therefore the notion of a joint distribution of X and Y lacks meaning.
Instead, we need to consider the conditional probability distribution of Y at
each fixed level $X_i$, i = 1, ..., K of X.
Observations at each setting $X_i$
have a probability distribution $(\pi_{1|i}, \ldots, \pi_{r|i})$.
The actual data did not deal with populations, but with samples drawn
from these populations.
Notation for the data structure is:
$n_{ij}$: The frequency of individuals of $S_j$ in the sample from $Q_i$.
$n_i$: The number of individuals in the sample from $Q_i$, i.e. total frequency
$\sum_{j=1}^{r} n_{ij}$ in row i.
$n_j$: The number of individuals of taxon j combined over all samples, i.e. total
frequency $\sum_{i=1}^{K} n_{ij}$ in column j.
n: Sum of all individuals of all taxa over all samples, therefore,
$$n = \sum_{i=1}^{K} n_i = \sum_{j=1}^{r} n_j = \sum_{i=1}^{K} \sum_{j=1}^{r} n_{ij}$$
The contingency table for the data is produced by simultaneously
cross-classifying individuals with respect to the categorical variables X and Y.
Because X is fixed, counts $n_{ij}$ were considered at each setting $X_i$ of X.
When
thus conditioned on $n_i$, these counts had a multinomial distribution with
response probabilities $(\pi_{1|i}, \ldots, \pi_{r|i})$.
A multinomial distribution consists
of observations that are classified into a finite number of categories.
The
Lake 224 data were collected by employing the same collection standard each
year.
Agresti (1990) notes that when X is an explanatory variable, it is
sensible to perform statistical inference conditional on the totals $n_i$, even
when their values are not fixed by the sampling design.
In presenting some features of a multinomial distribution of
proportions, a vector is denoted by an underscored letter and a matrix by a
bold letter.
A prime denotes a transposed vector or matrix.
An unprimed vector denotes a column vector.
We note (Kendall and Stuart 1958;
Balakrishnan and Sanghvi 1968; Kurczynski 1970; Johnson and Wichern 1988;
Agresti 1990): 1. Data from $Q_i$ yield a multinomial frequency distribution of r
classes (taxa) that are mutually exclusive and exhaustive. 2. Sample
proportions $p_{ij}$ (= $n_{ij}/n_i$) are maximum likelihood (ML) estimates of
$\pi_{j|i}$.
In an analysis of multinomial data, redundancies in the parameters due to
the constraint $n_i = \sum_{j=1}^{r} n_{ij}$ are eliminated by retaining only
q (= r - 1) of the r variables $Y_{ij}$, j = 1, ..., r. Any one of the r variables
is deleted, as
long as it is one involved in the perfect linear relationship (Harris 1975).
Sample and population proportions for $Q_i$ are denoted by vectors $\underline{p}_i$ with
components $p_{i1}, \ldots, p_{iq}$ and $\underline{\pi}_i$ with components
$\pi_{1|i}, \ldots, \pi_{q|i}$, respectively.
With large sample numbers, the multinomial distribution of
proportions is approximately multivariate normal (Johnson and Wichern 1988).
3. For large samples, the sampling distribution of $\underline{p}_i$ is such that
$$\sqrt{n_i}\,(\underline{p}_i - \underline{\pi}_i) \ \text{is approximately} \ N_q(\underline{0}, \boldsymbol{\Sigma}_i),$$
i.e. a q-variate normal distribution with parameters $\underline{0}$ and $\boldsymbol{\Sigma}_i$ for the mean
vector and the covariance matrix respectively. Also,
$$n_i\,(\underline{p}_i - \underline{\pi}_i)'\,\boldsymbol{\Sigma}_i^{-1}\,(\underline{p}_i - \underline{\pi}_i)$$
is approximately $\chi^2_q$, i.e. chi-square with degrees of freedom (df) = q. $\boldsymbol{\Sigma}_i^{-1}$ is
the inverse of $\boldsymbol{\Sigma}_i$.
The covariance matrix $\boldsymbol{\Sigma}_i$ with elements $\sigma_{ijt}$; j, t
= 1, ..., q is given by
$$\sigma_{ijt} = \begin{cases} \pi_{j|i}\,(1 - \pi_{j|i}), & j = t \\ -\pi_{j|i}\,\pi_{t|i}, & j \neq t \end{cases} \tag{1}$$
When $\pi_{j|i}$ values are unknown and $n_i$ is large, $\pi_{j|i}$ can be replaced by $p_{ij}$,
i.e. use
$$\hat{\sigma}_{ijt} = \begin{cases} p_{ij}\,(1 - p_{ij}), & j = t \\ -p_{ij}\,p_{it}, & j \neq t \end{cases} \tag{2}$$
instead of (1).
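As an illustration of expression (2), the estimated covariance matrix for one year can be built directly from that year's taxon counts. This is a minimal sketch under stated assumptions: the counts are hypothetical (not Lake 224 data), the function name `multinomial_cov` is invented for this example, and the last taxon is the one dropped to leave q = r - 1 variables.

```python
import numpy as np


def multinomial_cov(counts):
    """Estimate the q x q matrix of expression (2) from one year's taxon counts.

    Diagonal elements are p_j (1 - p_j); off-diagonal elements are -p_j p_t.
    The last taxon is dropped (an arbitrary choice) to remove the
    sum-to-one redundancy, so q = r - 1.
    """
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()      # sample proportions p_ij = n_ij / n_i
    p = p[:-1]                     # retain q = r - 1 taxa
    return np.diag(p) - np.outer(p, p)


# Hypothetical counts for r = 4 taxa in a single year (illustrative only).
hypothetical_counts = [40, 30, 20, 10]
sigma = multinomial_cov(hypothetical_counts)
print(sigma)
```

The matrix depends on the proportions themselves, which is why two years with different proportion vectors cannot share a covariance matrix.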
Comparisons of multivariate normal populations based on the analyses of
vectors of their sample means are commonly used.
Gower (1972) notes that
anthropometricians and taxonomists have compared populations of humans,
animals, and plants since the early part of the century by employing measures
of distance between them.
A widely used measure of the (squared) distance
between two populations employs the Mahalanobis $D^2$ statistic. With respect to
a set of continuous variables, $D^2$ for two populations with mean vectors $\underline{\mu}_i$, i =
1, 2 and a common covariance matrix $\boldsymbol{\Sigma}$ is estimated as
$$D^2 = (\hat{\underline{\mu}}_1 - \hat{\underline{\mu}}_2)'\,\hat{\boldsymbol{\Sigma}}^{-1}\,(\hat{\underline{\mu}}_1 - \hat{\underline{\mu}}_2),$$
where $\hat{\underline{\mu}}_i$ and $\hat{\boldsymbol{\Sigma}}$ are estimates of population parameters obtained from
samples of sizes $n_i$ drawn from the two populations.
This measure of
statistical distance accounts for differences in variation and the presence of
correlation between the q variables (Johnson and Wichern 1988).
Balakrishnan
and Sanghvi (1968) and Kurczynski (1970) used it to analyze attribute data by
developing their generalized distance indices.
In their data, $\underline{\mu}_i$ was replaced
by $\underline{\pi}_i$.
The corresponding statistical test when $\underline{\mu}_i$ is used is provided by
Hotelling's $T^2$, which is related to the squared Mahalanobis distance (Jobson
1992) as follows:
$$T^2 = \frac{n_1 n_2}{n_1 + n_2}\,D^2 \tag{3}$$
The primary use of the $T^2$-statistic has been to test the hypothesis
of equality of vector means from two populations (Jobson 1992). With $\underline{\mu}_i$
replaced by $\underline{\pi}_i$, the $T^2$ of equation (3) is approximated by $\chi^2_q$ (Kurczynski
1970).
For continuous variables also, Morrison (1976) notes this $\chi^2$
formulation for large samples.
Bowering and Misra (1982) and Dempson and
Misra (1984) employed the squared distance-based procedures of Balakrishnan
and Sanghvi (1968) and Kurczynski (1970) in analyzing fish meristic data.
These squared distance-based procedures are restricted in scope.
In
using the common covariance matrix to compute the squared distance between two
populations or to test $H_0$ of equality of their mean vectors by the $T^2$ of (3),
or its $\chi^2$ approximation, it is assumed that populations do not differ in their
covariance matrices, e.g. Morrison (1976), Johnson and Wichern (1988).
For
vectors of proportions, Mardia et al. (1979) noted that two populations that
differ in their vectors of class proportions must also differ in their
covariance matrices.
This is also obvious from expression (1).
The
generalized distance index of Balakrishnan and Sanghvi (1968) employed a
pooled covariance matrix.
Kurczynski (1970) used pooled proportions to
compute the covariance matrix for his generalized distance index.
Gower
(1972) considered such types of pooling as drawbacks, and noted that the more
two populations differ in their vectors of class proportions, the poorer the
measures of distance calculated with such procedures will be.
The use of a
pooled covariance matrix is not justified if populations differ significantly
in their vectors of proportions.
The covariance matrix computed from pooled
proportions may be used for testing $H_0$ of equality of vectors of class
proportions (explained later in the text), but not for computing distances
between populations or confidence intervals (CIs) for differences in their proportions.
Multivariate procedures that account for the inequality of covariance
matrices were employed.
Statisticians have developed procedures for testing
for equality of two population means based on independent samples in which
variances for the univariate analysis and covariance matrices for the
multivariate analysis are not equal.
A multivariate procedure for attribute
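data from large data sets (taken primarily from Timm 1975; Snedecor and
Cochran 1980; Johnson and Wichern 1988; Agresti 1990) is given here.
For a
paired comparison: 1. A linear combination of normal variables is normal
(Johnson and Wichern 1988).
Therefore, the distribution of $\underline{p}_1 - \underline{p}_2$ is
approximately normal,
$$N_q\!\left(\underline{\pi}_1 - \underline{\pi}_2,\ \frac{\boldsymbol{\Sigma}_1}{n_1} + \frac{\boldsymbol{\Sigma}_2}{n_2}\right).$$
The generalized squared distance from $\underline{p}_1 - \underline{p}_2$ to
$\underline{\pi}_1 - \underline{\pi}_2$ is given as
$$\left[(\underline{p}_1 - \underline{p}_2) - (\underline{\pi}_1 - \underline{\pi}_2)\right]' \left[\frac{\boldsymbol{\Sigma}_1}{n_1} + \frac{\boldsymbol{\Sigma}_2}{n_2}\right]^{-1} \left[(\underline{p}_1 - \underline{p}_2) - (\underline{\pi}_1 - \underline{\pi}_2)\right].$$
This squared distance has an approximate $\chi^2_q$ distribution.
The $\chi^2_q$ is
estimated by using the elements (2) of $\boldsymbol{\Sigma}_i$, i = 1, 2.
The hypothesis
$$H_0: \underline{\pi}_1 = \underline{\pi}_2 \tag{4}$$
would be rejected at the significance level α if
$$(\underline{p}_1 - \underline{p}_2)' \left[\frac{\hat{\boldsymbol{\Sigma}}_1}{n_1} + \frac{\hat{\boldsymbol{\Sigma}}_2}{n_2}\right]^{-1} (\underline{p}_1 - \underline{p}_2) > \chi^2_q(\alpha),$$
where $\chi^2_q(\alpha)$ is the upper (100α)th percentile of a chi-square distribution with
df = q.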
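The paired test of hypothesis (4) can be sketched in a few lines. This is an illustrative outline rather than the authors' own computation: the counts are hypothetical, the helper names are invented for this example, the last taxon is dropped by assumption, and the critical value is supplied by the reader from a chi-square table (7.815 for df = 3 at the 5% level).

```python
import numpy as np


def proportions_and_cov(counts):
    """Return (p, Sigma_hat, n) for one year, dropping the last taxon."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p = (counts / n)[:-1]
    return p, np.diag(p) - np.outer(p, p), n


def paired_chi_square(counts_1, counts_2):
    """Squared distance between two years allowing unequal covariance matrices."""
    p1, s1, n1 = proportions_and_cov(counts_1)
    p2, s2, n2 = proportions_and_cov(counts_2)
    d = p1 - p2
    cov_d = s1 / n1 + s2 / n2                  # covariance of p1 - p2
    stat = float(d @ np.linalg.solve(cov_d, d))
    return stat, d.size                        # statistic and df = q


# Hypothetical counts for r = 4 taxa in two years (illustrative only).
year_a = [40, 30, 20, 10]
year_b = [10, 20, 30, 40]

stat_same, df = paired_chi_square(year_a, year_a)
stat_diff, _ = paired_chi_square(year_a, year_b)
CHI2_3_05 = 7.815  # tabulated upper 5% point of chi-square with df = 3
print(stat_diff > CHI2_3_05)  # reject H0 when the statistic exceeds the critical value
```

Using each year's own covariance estimate, rather than a pooled one, is what distinguishes this statistic from the pooled-distance indices discussed earlier.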
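At the univariate level (q = 1), Snedecor and Cochran (1980) note that
when $H_0$ holds, each $p_i$, i = 1, 2 is distributed about the same mean.
Therefore to test $H_0$, a common $\pi$, which is estimated from the two combined
samples, can be used for each $\pi_i$.
However, to estimate the confidence limits
for the difference $\pi_1 - \pi_2$ between population proportions, $\pi_1$ and $\pi_2$ (and not
$\pi$) should be employed. To test $H_0$ stated in (4) by this procedure, a common
matrix $\boldsymbol{\Sigma}$ would be used for $\boldsymbol{\Sigma}_1$ and $\boldsymbol{\Sigma}_2$ each. Elements $\sigma_{jt}$; j, t = 1, ..., q
of the covariance matrix $\boldsymbol{\Sigma}_i$, i = 1, 2 are given by
$$\sigma_{jt} = \begin{cases} \pi_j\,(1 - \pi_j), & j = t \\ -\pi_j\,\pi_t, & j \neq t \end{cases} \tag{5}$$
where, for the unknown value of $\pi_j$, we use its estimate, $\sum_{i=1}^{2} n_{ij} \big/ \sum_{i=1}^{2} n_i$.
A 100(1 - α)% confidence region for $\underline{\pi}_1 - \underline{\pi}_2$ is given by all $\underline{\pi}_1 - \underline{\pi}_2$ values
such that
$$\left[(\underline{p}_1 - \underline{p}_2) - (\underline{\pi}_1 - \underline{\pi}_2)\right]' \left[\frac{\hat{\boldsymbol{\Sigma}}_1}{n_1} + \frac{\hat{\boldsymbol{\Sigma}}_2}{n_2}\right]^{-1} \left[(\underline{p}_1 - \underline{p}_2) - (\underline{\pi}_1 - \underline{\pi}_2)\right] \le \chi^2_q(\alpha).$$
To test if a specified $\underline{\pi}_{10} - \underline{\pi}_{20}$ is a plausible value for $\underline{\pi}_1 - \underline{\pi}_2$, the
statistical squared distance
$$\left[(\underline{p}_1 - \underline{p}_2) - (\underline{\pi}_{10} - \underline{\pi}_{20})\right]' \left[\frac{\hat{\boldsymbol{\Sigma}}_1}{n_1} + \frac{\hat{\boldsymbol{\Sigma}}_2}{n_2}\right]^{-1} \left[(\underline{p}_1 - \underline{p}_2) - (\underline{\pi}_{10} - \underline{\pi}_{20})\right]$$
is computed.
If the value is $\le \chi^2_q(\alpha)$, $\underline{\pi}_{10} - \underline{\pi}_{20}$ is accepted. 2. Simultaneous
CIs for the components of the vector $\underline{\pi}_1 - \underline{\pi}_2$ can also be derived. 100(1 -
α)% simultaneous CIs for all linear combinations $\underline{\ell}'(\underline{\pi}_1 - \underline{\pi}_2)$
are obtained by evaluating
$$\underline{\ell}'(\underline{p}_1 - \underline{p}_2) \pm \sqrt{\chi^2_q(\alpha)}\ \sqrt{\underline{\ell}' \left[\frac{\hat{\boldsymbol{\Sigma}}_1}{n_1} + \frac{\hat{\boldsymbol{\Sigma}}_2}{n_2}\right] \underline{\ell}} \tag{6}$$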
Numerous CIs can be generated by selecting values for the components of
vector $\underline{\ell}$.
The specified probability is guaranteed against any of the numerous
statements being incorrect, i.e. the CI of (6) will contain
$\underline{\ell}'(\underline{\pi}_1 - \underline{\pi}_2)$ with
probability 1 - α simultaneously for all $\underline{\ell}$.
Because the confidence
coefficient 1 - α does not change for any choice of $\underline{\ell}$, CIs of (6) can be used
for "data snooping", i.e. the user can first inspect the data and then choose
linear combinations $\underline{\ell}'(\underline{\pi}_1 - \underline{\pi}_2)$ of interest by assigning specific values to
the components of the vector $\underline{\ell}$ to evaluate the CI for it.
For example, to
examine the contribution of an individual taxon j, assign zero to all elements
of the vector $\underline{\ell}$ except the j-th element, which is assigned the value of
unity.
The contribution is significant when the CI does not contain zero.
A
second example: To examine the difference between two taxa, h and j, we choose
vector $\underline{\ell}$ by assigning 1 to its h-th element, -1 to the j-th element and zero
to all other elements.
Many useful applications can result from the use of CI
of (6).
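The two examples above (a single taxon, and the difference between two taxa) can be sketched as follows. The proportions and sample sizes are hypothetical, the function names are assumptions made for this illustration, and the chi-square critical value is taken from a standard table rather than computed.

```python
import math
import numpy as np


def cov_from_proportions(p):
    """Matrix with p_j (1 - p_j) on the diagonal and -p_j p_t off the diagonal."""
    p = np.asarray(p, dtype=float)
    return np.diag(p) - np.outer(p, p)


def simultaneous_ci(ell, p1, n1, p2, n2, chi2_crit):
    """Simultaneous CI for ell'(pi_1 - pi_2), following the form of (6)."""
    ell = np.asarray(ell, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    cov_diff = cov_from_proportions(p1) / n1 + cov_from_proportions(p2) / n2
    centre = float(ell @ (p1 - p2))
    half_width = math.sqrt(chi2_crit * float(ell @ cov_diff @ ell))
    return centre - half_width, centre + half_width


# Hypothetical proportions for q = 3 retained taxa in two years (illustrative only).
p_year1, n_year1 = [0.40, 0.30, 0.20], 100
p_year2, n_year2 = [0.10, 0.20, 0.30], 100
CHI2_3_05 = 7.815  # tabulated upper 5% point of chi-square with df = 3

# Contribution of taxon 1 alone: ell = (1, 0, 0).
ci_taxon1 = simultaneous_ci([1, 0, 0], p_year1, n_year1, p_year2, n_year2, CHI2_3_05)
# Difference between taxa 1 and 2: ell = (1, -1, 0).
ci_taxa_1_vs_2 = simultaneous_ci([1, -1, 0], p_year1, n_year1, p_year2, n_year2, CHI2_3_05)
print(ci_taxon1, ci_taxa_1_vs_2)
```

In this made-up example the first interval excludes zero while the second contains it, so only the first comparison would be judged significant.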
The above considers only comparison between two populations.
Paired
comparisons of K populations are useful for relative abundance data, e.g. they
can be used to study distances between populations or similarities among them,
because similarities among populations can always be constructed from
distances between them (Johnson and Wichern 1988).
There is also a need for
investigating linear comparison and regression extended over several (>2)
populations, e.g. analysis of Lake 224 data for trend over 5 years requires
linear comparison of 5 populations.
The general form of the linear comparison of K (>2) mean vectors $\underline{\pi}_i$,
i = 1, ..., K is defined by the column vector $\underline{L}$ (q × 1) with q elements
$$L_g = \sum_{i=1}^{K} m_i\,\pi_{g|i}, \qquad g = 1, \ldots, q,$$
where $m_i$, i = 1, ..., K are elements of a contrast vector $\underline{m}$ such
that $\sum_{i=1}^{K} m_i = 0$.
Here linear comparison $\underline{L}$ was used to investigate time trend in
Lake 224. When years are equally spaced, i.e. the time difference between two
consecutive years is constant, appropriate values of $m_i$ are given in a table
of orthogonal polynomials, e.g. Pearson and Hartley (1976).
When years are
unequally spaced, $m_i$ values must be calculated, e.g., Bliss (1970, Chapter
14).
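For equally spaced years, the linear and quadratic contrast coefficients can also be generated directly instead of being read from a table. The sketch below is one simple way to do this, assuming centred time values; the function name is hypothetical. Tabulated coefficients are usually scaled to integers, so values from this sketch may differ from a table by a constant factor for some K.

```python
def polynomial_contrasts(num_years):
    """Linear and quadratic contrast coefficients for equally spaced years.

    The linear contrast is the centred time index; the quadratic contrast is
    the centred square of that index. Both sum to zero and are mutually
    orthogonal because the centred time values are symmetric about zero.
    """
    mean_t = (num_years - 1) / 2
    linear = [t - mean_t for t in range(num_years)]
    mean_sq = sum(x * x for x in linear) / num_years
    quadratic = [x * x - mean_sq for x in linear]
    return linear, quadratic


linear, quadratic = polynomial_contrasts(5)
print(linear)     # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(quadratic)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

For five years these match the familiar tabulated coefficients (-2, -1, 0, 1, 2) and (2, -1, -2, -1, 2).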
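The following procedure would test the hypothesis

$$H_0: \mathbf{L} = \mathbf{0} \qquad (7)$$

Define matrix $\mathbf{B}$ (K×K) (i.e. matrix with K rows and K columns) as a
diagonal matrix with diagonal elements $\boldsymbol{\Sigma}_i/n_i$ where, for
large $n_i$, elements of $\boldsymbol{\Sigma}_i$ are given as in (2).
Construct the matrix $\mathbf{C}$ (q×q) as $\mathbf{C} = \mathbf{m}'\mathbf{B}\mathbf{m}$,
i.e. $\mathbf{C} = \sum_{i=1}^{K} m_i^2 \boldsymbol{\Sigma}_i/n_i$. Hypothesis (7) of no
time trend would be rejected at the significance level α if

$$\hat{\mathbf{L}}'\mathbf{C}^{-1}\hat{\mathbf{L}} > \chi^2_q(\alpha),$$

where $\hat{\mathbf{L}}$ is obtained by replacing elements $\boldsymbol{\pi}_i$ of
$\mathbf{L}$ by $\mathbf{p}_i$.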
A procedure to test $H_0$ of (4) for a paired comparison by using the same
(common) covariance matrix $\boldsymbol{\Sigma}$ (equation 5) for each
$\boldsymbol{\pi}_i$, i = 1, 2, was given earlier in the text. To test $H_0$ of
(7) this way, $\pi_j$, j = 1, ..., q (required for the elements $n_i\sigma_{jt}$
of the covariance matrix $n_i\boldsymbol{\Sigma}_i$, i = 1, ..., K) is estimated
the same way as in the case of equation (5), except that $n_{ij}$ and $n_i$ terms
are summed over i = 1 to K (instead of i = 1 to 2).
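The test of hypothesis (7) can be sketched numerically as follows, using the counts of Table 1. Two points are assumptions of this sketch rather than statements of the report: the covariance matrix of each year is taken to have the usual large-sample multinomial form diag(p) - pp' (equation (2) is not reproduced here), and each year keeps its own covariance matrix rather than the common one. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

# Counts n_ij read from Table 1 (rows: 1986-1990; columns: taxa 1-5).
counts = np.array([
    [151,   2,  55, 13,  0],
    [ 92,   5,  28, 20, 38],
    [220, 100,  44, 58, 54],
    [103, 122,  56, 32, 49],
    [ 61, 179, 121, 39, 14],
], dtype=float)
n = np.array([521, 442, 766, 592, 786], dtype=float)  # sample sizes n_i

p = counts / n[:, None]
p[p == 0] = 1e-4  # zero proportions replaced by a small constant

def multinomial_cov(p_i):
    """Assumed large-sample form of Sigma_i: diag(p) - p p'."""
    return np.diag(p_i) - np.outer(p_i, p_i)

def linear_comparison_test(p, n, m):
    """Chi-square test of H0: L = 0 for the contrast vector m."""
    m = np.asarray(m, dtype=float)
    L_hat = m @ p  # q-vector with elements sum_i m_i * p_ig
    C = sum(mi**2 * multinomial_cov(pi) / ni for mi, pi, ni in zip(m, p, n))
    stat = float(L_hat @ np.linalg.solve(C, L_hat))
    return L_hat, stat, chi2.sf(stat, df=p.shape[1])

# Linear and quadratic contrasts for five equally spaced years.
L1_hat, stat_lin, pval_lin = linear_comparison_test(p, n, [-2, -1, 0, 1, 2])
L2_hat, stat_quad, pval_quad = linear_comparison_test(p, n, [2, -1, -2, -1, 2])
print(stat_lin, pval_lin, stat_quad, pval_quad)
```

The hypothesis is rejected at level α when the returned statistic exceeds `chi2.ppf(1 - alpha, df=q)`, or equivalently when the returned tail probability is below α.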
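100(1 - α)% simultaneous CIs for all linear combinations $\mathbf{a}'\mathbf{L}$
are obtained by evaluating

$$\mathbf{a}'\hat{\mathbf{L}} \pm \sqrt{\chi^2_q(\alpha)\,\mathbf{a}'\mathbf{C}\mathbf{a}} \qquad (8)$$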
Hypothesis of (4) and CI of (6) are special cases of (7) and (8),
respectively, and occur when K = 2 and vector $\mathbf{m}'$ (1×2) = (1, -1). Again,
contributions of individual taxa or combinations of taxa to the time trend can
be examined by assigning specific values to the elements of the vector
$\mathbf{a}$ in (8).
Hypothesis (7) can be used to test other effects by providing
appropriate values to the elements of the vector m by the procedures presented
above.
Here, temporal variations in Lake 224 data were investigated by
testing $H_0$ of (7) for a linear trend ($L_1$) and for the second degree terms of
deviation from this linear trend ($L_2$). Higher degree deviations were not
examined because of the small number of years. All tests were carried out at
the 5% probability level of significance (P) unless stated otherwise.
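A hedged numerical sketch of the simultaneous CIs for individual taxa follows. It assumes the chi-square based interval (estimate plus or minus the square root of the critical value times a'Ca), the multinomial form diag(p) - pp' for each year's covariance matrix, and the counts of Table 1; the function name `simultaneous_ci` is hypothetical. Under these assumptions the limits come out close to, though not exactly equal to, those reported in Table 2.

```python
import numpy as np
from scipy.stats import chi2

# Counts n_ij read from Table 1 (rows: 1986-1990; columns: taxa 1-5).
counts = np.array([
    [151,   2,  55, 13,  0],
    [ 92,   5,  28, 20, 38],
    [220, 100,  44, 58, 54],
    [103, 122,  56, 32, 49],
    [ 61, 179, 121, 39, 14],
], dtype=float)
n = np.array([521, 442, 766, 592, 786], dtype=float)
p = counts / n[:, None]
p[p == 0] = 1e-4  # zero proportions replaced by a small constant

def simultaneous_ci(p, n, m, a, alpha=0.05):
    """Simultaneous CI for a'L (assumed chi-square form, see lead-in)."""
    m = np.asarray(m, dtype=float)
    a = np.asarray(a, dtype=float)
    q = p.shape[1]
    L_hat = m @ p
    # C = sum_i m_i^2 Sigma_i / n_i with Sigma_i assumed multinomial.
    C = sum(mi**2 * (np.diag(pi) - np.outer(pi, pi)) / ni
            for mi, pi, ni in zip(m, p, n))
    centre = a @ L_hat
    half = np.sqrt(chi2.ppf(1 - alpha, df=q) * (a @ C @ a))
    return centre - half, centre + half

contrasts = {"L1": [-2, -1, 0, 1, 2], "L2": [2, -1, -2, -1, 2]}
intervals = {
    (name, j + 1): simultaneous_ci(p, n, m, np.eye(5)[j])
    for name, m in contrasts.items()
    for j in range(5)
}
significant = {
    name: [j for j in range(1, 6)
           if intervals[(name, j)][0] > 0 or intervals[(name, j)][1] < 0]
    for name in contrasts
}
print(significant)
```

Replacing the unit vector `np.eye(5)[j]` by any other coefficient vector gives the interval for the corresponding combination of taxa.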
ANALYSIS AND RESULTS
The total number (n) of captured individuals was 3107.
The five most
abundant taxa accounted for about half, i.e. 1656 (Table 1).
Table 1 names the taxa and gives annual abundances of each ($n_{ij}$) and the
sample sizes $n_i$ for the five years.
Proportions $p_{ij}$ (= $n_{ij}/n_i$) expressed as percentages are in
parentheses.
The total frequency count $n_i$ for each sample was distributed
over six classes, i.e. the 5 taxa of Table 1 and one of "all other taxa";
therefore, q = 5.
Many researchers replace a zero count by a small constant
when analysing their data (Agresti 1990).
Zero $p_{ij}$ values were replaced by $10^{-4}$ (Kurczynski 1970) to avoid
computational difficulties.
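The bookkeeping described above can be illustrated with a short sketch based on the counts of Table 1. The variable names are hypothetical; the totals, the sixth "all other taxa" class and the replacement constant are taken from the text.

```python
import numpy as np

# Counts n_ij read from Table 1 (rows: 1986-1990; columns: taxa 1-5).
counts = np.array([
    [151,   2,  55, 13,  0],
    [ 92,   5,  28, 20, 38],
    [220, 100,  44, 58, 54],
    [103, 122,  56, 32, 49],
    [ 61, 179, 121, 39, 14],
], dtype=float)
n = np.array([521, 442, 766, 592, 786], dtype=float)  # sample sizes n_i

ZERO_REPLACEMENT = 1e-4  # small constant substituted for zero proportions

# Sixth class: individuals belonging to "all other taxa" in each year.
other = n - counts.sum(axis=1)

# Percentages as shown in parentheses in Table 1.
percent = 100 * counts / n[:, None]

# Proportions used in the analysis, with zeros replaced.
p = counts / n[:, None]
p[p == 0] = ZERO_REPLACEMENT

print(int(counts.sum()), int(n.sum()))  # five-taxon total and overall total
print(np.round(percent, 1))
```

Only the five named taxa enter the q = 5 proportion vector; the sixth class is fixed by the constraint that the six proportions of a year sum to one.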
A. Time Trends
Proportions (as percentages) were plotted against year (Figure 1).
$H_0$ for the linear time trend ($L_1$) and (quadratic) second degree
deviations from it ($L_2$) were both rejected (P<0.001), indicating that a
linear trend was present, although it did not explain the entire variation
between years. More
years of data are required to determine the mathematical form of the
underlying pattern.
However, even if the true pattern is nonlinear, a
significant part of it was captured by the linear trend.
Taxa that led to the rejection of the $H_0$ for $L_1$ and $L_2$ were determined
(Table 2).
There was a strong correspondence between the 95% simultaneous CIs
in Table 2 and Figure 1.
For taxon 1, CIs for $L_1$ and $L_2$ did not contain zero,
and the confidence limits were negative for both.
Figure 1 showed a
decreasing linear trend in percent values for taxon 1, although the deviations
from the linear trend were considerable.
Taxon 2 contributed significantly to $L_1$ but not to $L_2$. The confidence
limits for $L_1$ were both positive, which matched the clearly increasing
linear trend seen in Figure 1.
No linear trend was discerned for taxa 3 or 5 (Figure 1).
Both taxa contributed significantly to $L_2$, but not to $L_1$.
The confidence limits for $L_2$ for H. azteca were positive, while those for
the Chironomus group (5) were negative.
In Figure 1, the curve for H. azteca opened upward; that for
Chironomus spp. downward.
Taxon 4 was present in roughly the same relative annual abundance, i.e.
its contribution to $L_1$ and $L_2$ was not significant (Table 2).
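The taxon-by-taxon reading of Table 2 given above can be summarised mechanically. The sketch below copies the published confidence limits and classifies each taxon by whether its interval lies entirely above zero, entirely below zero, or contains zero; the dictionary layout and function name are illustrative choices.

```python
# 95% simultaneous confidence limits (lower, upper) copied from Table 2.
table2 = {
    "L1": {1: (-0.6265, -0.2906), 2: (0.5263, 0.7588), 3: (-0.0075, 0.2635),
           4: (-0.0239, 0.1402), 5: (-0.0338, 0.0983)},
    "L2": {1: (-0.4217, -0.0217), 2: (-0.1569, 0.1262), 3: (0.0996, 0.3927),
           4: (-0.2053, 0.0021), 5: (-0.3641, -0.1837)},
}

def direction(lower, upper):
    """Classify a CI: significant only when it does not contain zero."""
    if lower > 0:
        return "positive"
    if upper < 0:
        return "negative"
    return "not significant"

summary = {
    comparison: {taxon: direction(*limits) for taxon, limits in rows.items()}
    for comparison, rows in table2.items()
}

for comparison, rows in summary.items():
    print(comparison, rows)
```

For the linear comparison a negative interval corresponds to a decreasing trend and a positive one to an increasing trend; for the quadratic comparison the sign indicates whether the curve opens upward (positive) or downward (negative).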
When the time span is short, the effects of taxa considered individually
may not vary significantly with time, but rejection of $H_0$ of (7) implied that
their linear combination did.
B. Paired Comparisons of Samples (Years)
Comparisons of years were done by testing $H_0$ of (4) for each pair.
The ten pairs were then ranked based on their estimated $\chi^2$ values.
All $H_0$'s were rejected at the P ≤ 0.001 level. Proportions (Table 1) varied so
widely between years that one expected each year to differ from each other
year.
Table 3 gives taxa that led to the rejection of $H_0$. It was interesting
to note the generally good agreement between the statistical choice of which
taxa were responsible for differences and those that can be seen in the raw
data in Figure 1 and Table 1.
All pairings of years were ranked in decreasing order of similarity, i.e.
ascending values of $\chi^2$, to produce Table 4.
The results showed that the biological structure in Lake 224 became more
dissimilar as temporal spacing increased.
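The paired comparisons can be sketched as below for the Table 1 counts. This is an illustration under stated assumptions, not the report's exact computation: each year keeps its own covariance matrix of the assumed multinomial form diag(p) - pp', whereas the test of (4) may have used the common covariance matrix of equation (5), so the order in `ranking` need not reproduce Table 4. The function names are hypothetical. Under these assumptions the taxa flagged for each pair should agree with Table 3.

```python
import numpy as np
from itertools import combinations
from scipy.stats import chi2

years = [1986, 1987, 1988, 1989, 1990]
# Counts n_ij read from Table 1 (rows: years; columns: taxa 1-5).
counts = np.array([
    [151,   2,  55, 13,  0],
    [ 92,   5,  28, 20, 38],
    [220, 100,  44, 58, 54],
    [103, 122,  56, 32, 49],
    [ 61, 179, 121, 39, 14],
], dtype=float)
n = np.array([521, 442, 766, 592, 786], dtype=float)
p = counts / n[:, None]
p[p == 0] = 1e-4  # zero proportions replaced by a small constant

def cov_of_p(p_i, n_i):
    """Assumed covariance matrix of the sample proportions of one year."""
    return (np.diag(p_i) - np.outer(p_i, p_i)) / n_i

def compare_pair(i, k, alpha=0.05):
    """Chi-square statistic and significantly contributing taxa for a pair."""
    d = p[i] - p[k]
    V = cov_of_p(p[i], n[i]) + cov_of_p(p[k], n[k])
    stat = float(d @ np.linalg.solve(V, d))
    # Half-widths of the simultaneous CIs for the individual taxa.
    half = np.sqrt(chi2.ppf(1 - alpha, df=len(d)) * np.diag(V))
    taxa = [j + 1 for j in range(len(d)) if abs(d[j]) > half[j]]
    return stat, taxa

results = {
    (years[i], years[k]): compare_pair(i, k)
    for i, k in combinations(range(len(years)), 2)
}
# Most similar pair first (ascending chi-square).
ranking = sorted(results, key=lambda pair: results[pair][0])
for pair in ranking:
    print(pair, round(results[pair][0], 1), results[pair][1])
```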
DISCUSSION
We employed paired comparisons to investigate distances and similarities
among communities (populations of separate years) and extended the methodology
of paired comparisons to linear comparisons of more than two populations while
giving due account to the heterogeneity of covariance matrices of populations,
in analysing paired and linear comparisons.
In the LRTAP Biomonitoring
Programme, analysis required the use of linear comparisons as exemplified by
the investigation of the trend of five populations over 5 years of Lake 224.
The linear time trend was significant over this short period, showing
that the relative abundance of the five taxa varied over the period 1986-1990.
Although the linear time trend explained a significant portion of the observed
change, it did not explain it all.
Change, a significant proportion of which was captured by the linear trend,
occurred at a sufficient rate to warrant continuation of the study.
The findings of the statistical analyses were
generally consistent with the actual data as shown in Table 1 and Figure 1.
The question remains as to whether this change was a directed sequence or
merely a short-term linear pattern within a longer term random fluctuation.
The important feature is that the analytical results closely tracked the
obvious trends in the raw data.
However, specific inferences concerning the
1986-1990 trend in Lake 224 should be accepted with caution.
Opposing trends
in the relative abundances of two major chironomid groups (Tanytarsus/
Micropsectra spp. and Pseudochironomus spp.) were the central feature of this
analysis.
Because considerable emphasis was placed on keeping sampling
methodology, sorting efficiency, and taxonomy constant, it is likely that
these trends in community structure were real.
Their biological significance is probably minimal.
Taxa 1 and 2 both contain several species.
These broad
categories were used to enumerate specimens that could not be identified with
any greater degree of taxonomic precision, mostly because they were early
instar forms.
Later life stages could be identified to the genus or species
level and were enumerated separately.
The young larvae abundances in the
samples depend on the time of adult emergence, mating and egg laying success
(processes that are strongly weather dependent), and larval
development/mortality rate.
It would be false, therefore, to suggest from these results
that a directional change in community structure has occurred in Lake 224, or
even among chironomids.
Autocorrelation was not investigated in this study because: 1. The
number of time points (years) is too small (only 5) to determine
autocorrelation or to allow definitive judgments pertaining to autocorrelated
errors (Bliss 1967; Neter et al. 1985; Johnson and Wichern 1988); 2. It may be
safe to disregard autocorrelation when each sample contains different
individuals (Bliss 1967), the case in the Lake 224 data, given the short life
spans of these taxa.
Evidence of biological continuity from year to year can be seen in the
paired comparisons (Table 4).
Chronologically adjacent communities were the most similar.
As the temporal separation increased, the communities became less alike.
This suggested that previous structure is a stronger determinant
of the present state in unperturbed systems than is annual variability of
climate, chemistry, or hydrology.
The annual variation in relative abundance
was high at the taxon level, a feature that imparts a degree of longer-term
plasticity to community structure.
Even with the observed high degree of
annual variation in relative abundances among taxa and diverse trends in
component taxa, the method is robust enough to detect an overall trend in the
available data set.
Thus, in the long-term monitoring programme, it is
expected that an analysis of annual trends will yield more meaningful results
than paired comparisons of community structure, and that annual sampling is
preferable to less frequent monitoring.
ACKNOWLEDGMENT
The authors thank Mike Bewers, Susan Kasian, Mike Paterson, and David
Cook for their comments on the manuscript.
REFERENCES
Agresti, A. 1990. Categorical data analysis. John Wiley & Sons, New York, NY.
558 p.
Balakrishnan, V., and L.D. Sanghvi. 1968. Distance between populations on the
basis of attribute data. Biometrics 24: 859-866.
Bliss, C.I. 1967. Statistics in biology. Vol. 1, McGraw-Hill, New York, NY.
558 p.
Bliss, C.I. 1970. Statistics in biology. Vol. 2, McGraw-Hill, New York, NY.
639 p.
Bowering, W.R., and R.K. Misra. 1982. Comparisons of witch flounder
(Glyptocephalus cynoglossus) stocks of the Newfoundland-Labrador area based
upon a new multivariate analysis for meristic characters. Can. J. Fish.
Aquat. Sci. 39: 564-570.
Davies, I.J. 1991. Biomonitoring presnovodnykh ekosistem v Kanada: programma
Departamenta Rybolovstva i Okeanov. (Canadian freshwater biomonitoring: The
programme of the Department of Fisheries and Oceans). In: Problemy
ekologicheskogo monitoringa i modelirovaniya ekosistem. (Problems of
ecological monitoring and ecosystem modelling). 75-88. Leningrad:
Gidrometeoizdat. In Russian. English version issued as Canadian Translation
of Fisheries and Aquatic Sciences, #5551, 1992, 24 p. (FWI reprint #1078a).
Dempson, J.B., and R.K. Misra. 1984. Identification of anadromous Arctic char
(Salvelinus alpinus) stocks in the coastal areas of northern Labrador based
on a multivariate statistical analysis of meristic data. Can. J. Zool. 62:
632-636.
Draper, N.R., and H. Smith. 1981. Applied regression analysis. John Wiley &
Sons Inc. New York, NY. 709 p.
Gower, J.C. 1972. Measures of taxonomic distance and their analysis. p. 1-24.
In: J.S. Weiner and J. Huizinga [eds.]. The Assessment of Population
Affinities in Man. Clarendon Press, Oxford, UK. 224 p.
Hair, J.F., R.E. Anderson, R.L. Tatham, and W.C. Black. 1992. Multivariate
data analysis. Macmillan Publishing Co. Ltd., New York, NY. 544 p.
Harris, R.A. 1975. A primer of multivariate statistics. Academic Press, New
York, NY. 332 p.
Jobson, J.D. 1992. Applied multivariate data analysis. Springer-Verlag, New
York, NY. 731 p.
Johnson, R.A., and D.W. Wichern. 1988. Applied multivariate statistical
analysis. Prentice Hall, Englewood Cliffs, NJ. 607 p.
Kendall, M.G., and A. Stuart. 1958. The advanced theory of statistics. Vol 1.
Charles Griffin & Co. Ltd. London, UK. 433 p.
Kurczynski, T.W. 1970. Generalized distance and discrete variables. Biometrics
26: 525-534.
Li, J.C.R. 1965. Statistical inference. Vol. 1. Edwards Brothers, Ann Arbor,
MI. 658 p.
Mardia, K.V., J.T. Kent, and J.M. Bibby. 1979. Multivariate analysis. Academic
Press, New York, NY. 521 p.
Misra, R.K., N. Watson, J.F. Uthe, and I. Davies. 1991. A multivariate method
of analyzing relative abundance data, employing the weighted mean
proportions, p. 192-201. In: D.A. Scruton, V.P. Williams, L.L. Fancy, and
M.M. Raberge [eds.]. Proceedings of the 5th Annual Department of Fisheries
and Oceans LRTAP Workshop. Newfoundland Region, Science Branch, v + 254 p.
Morrison, D.F. 1976. Multivariate statistical analysis. McGraw-Hill Book Co.,
New York, NY. 415 p.
Neter, J., W. Wasserman, and M.H. Kutner. 1985. Applied linear statistical
models. Richard D. Irwin Inc., Homewood, IL. 1127 p.
Norris, R.H., and A. Georges. 1993. Analysis and interpretation of benthic
macroinvertebrate surveys, p. 234-285. In: D.M. Rosenberg and V.H. Resh
[eds.]. Freshwater biomonitoring and benthic macroinvertebrates. Chapman
and Hall, New York, NY. 488 p.
Pearson, E.S., and H.O. Hartley [eds.]. 1976. Biometrika tables for
statisticians, Vol. 1. Biometrika Trust, University College, London, UK.
270 p.
Shaw, M.A., S. Geiling, S. Barbour, I.J. Davies, E.A. Hamilton, A. Kemp, R.
Reid, P.M. Ryan, N. Watson, and W. White. 1992. The Department of Fisheries
and Oceans National LRTAP biomonitoring programme. Site locations, physical
and chemical characteristics. Can. Tech. Rep. Fish. Aquat. Sci. 1875: 87 p.
Snedecor, G.W., and W.G. Cochran. 1980. Statistical methods. The Iowa State
University Press, Ames, IA. 507 p.
Stephenson, M. and G. Mierle. 1993. Effects of experimental and cultural lake
acidification of littoral benthic macroinvertebrate assemblages. 1. Methods
development and testing. Can. J. Fish. Aquat. Sci. (in preparation).
Timm, N.H. 1975. Multivariate analysis. Brooks/Cole Publishing Co., Monterey,
CA. 689 p.
Wiggins, G.B. 1977. Larvae of the North American caddisfly genera
(Trichoptera). University of Toronto Press, Toronto, ON. 401 p.
Table 1. Frequency distributions of counts of five most abundant taxa in Lake
224 over five years, 1986-1990. Proportions ($p_{ij}$) expressed as
percent are given in parentheses.

Taxon No. (j)   Taxon
(1)             Tanytarsus/Micropsectra spp.
(2)             Pseudochironomus spp.
(3)             Hyalella azteca
(4)             Cladotanytarsus spp.
(5)             Chironomus (s.s.) spp.

Year (i)   (1)          (2)          (3)          (4)         (5)         Sample Size ($n_i$)
1986       151 (29.0)     2 (0.4)     55 (10.6)   13 (2.5)     0 (0)      521
1987        92 (20.8)     5 (1.1)     28 (6.3)    20 (4.5)    38 (8.6)    442
1988       220 (28.7)   100 (13.0)    44 (5.7)    58 (7.6)    54 (7.0)    766
1989       103 (17.4)   122 (20.6)    56 (9.5)    32 (5.4)    49 (8.3)    592
1990        61 (7.8)    179 (22.8)   121 (15.4)   39 (5.0)    14 (1.8)    786
Table 2. 95% Simultaneous confidence intervals for individual taxa.
The values were used to test the significance of linear ($L_1$) and deviations
from this trend ($L_2$) for each of the five most abundant taxa.

Linear        Taxa No.   Confidence Limits
Comparison               Lower       Upper
$L_1$         1          -0.6265     -0.2906
              2           0.5263      0.7588
              3          -0.0075      0.2635
              4          -0.0239      0.1402
              5          -0.0338      0.0983
$L_2$         1          -0.4217     -0.0217
              2          -0.1569      0.1262
              3           0.0996      0.3927
              4          -0.2053      0.0021
              5          -0.3641     -0.1837
Table 3. Taxa (indicated by their numbers) which contributed significantly
to differences in paired comparisons among years.

Year    1987    1988       1989       1990
1986    5       2, 4, 5    1, 2, 5    1, 2, 5
1987            2          2          1, 2, 3, 5
1988                       1, 2       1, 2, 3, 5
1989                                  1, 3, 5
Table 4. Pairs of years ranked in decreasing order of similarity (Yr represents
the time difference in years).

Rank   Pair        Yr      Rank   Pair        Yr
1      1988/1989   1       6      1988/1990   2
2      1986/1987   1       7      1986/1988   2
3      1989/1990   1       8      1986/1989   3
4      1987/1988   1       9      1987/1990   3
5      1987/1989   2       10     1986/1990   4
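The relationship between similarity rank and temporal separation in Table 4 can be checked with a few lines of code. The sketch below copies the table and averages the ranks for each time difference; the data structure and variable names are illustrative.

```python
from collections import defaultdict

# (rank, pair, time difference in years) copied from Table 4.
table4 = [
    (1, "1988/1989", 1), (2, "1986/1987", 1), (3, "1989/1990", 1),
    (4, "1987/1988", 1), (5, "1987/1989", 2), (6, "1988/1990", 2),
    (7, "1986/1988", 2), (8, "1986/1989", 3), (9, "1987/1990", 3),
    (10, "1986/1990", 4),
]

# Collect the similarity ranks observed at each time difference.
ranks_by_lag = defaultdict(list)
for rank, pair, lag in table4:
    ranks_by_lag[lag].append(rank)

# Mean rank per time difference (a lower rank means a more similar pair).
mean_rank = {lag: sum(r) / len(r) for lag, r in sorted(ranks_by_lag.items())}
print(mean_rank)
```

The mean rank rises steadily with the time difference, and every pair separated by one year ranks ahead of every pair separated by two or more years.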
[Figure 1: line plot of percent of total count (vertical axis, 0 to 30) against
year (horizontal axis, 1986 to 1990), with one curve for each taxon, j = 1 to
j = 5.]
Fig. 1. Percent taxa count ($n_{ij}$) / total count ($n_i$) for five most
frequently sampled invertebrate taxa.