
Palaeogeography, Palaeoclimatology, Palaeoecology 399 (2014) 202–213
Statistically assessing the correlation between salinity and morphology
in cysts produced by the dinoflagellate Protoceratium reticulatum
from surface sediments of the North Atlantic Ocean,
Mediterranean–Marmara–Black Sea region, and
Baltic–Kattegat–Skagerrak estuarine system
Ida-Maria Jansson a,⁎, Kenneth Neil Mertens b, Martin J. Head a,c
a Department of Earth Sciences, Earth Sciences Centre, University of Toronto, 22 Russell St., Toronto, Ontario M5S 3B1, Canada
b Research Unit for Palaeontology, Ghent University, Krijgslaan 281 S8, 9000 Ghent, Belgium
c Department of Earth Sciences, Brock University, 500 Glenridge Avenue, St. Catharines, Ontario L2S 3A1, Canada
With contributions from: Anne de Vernal a, Laurent Londeix b, Fabienne Marret c,
Jens Matthiessen d, Francesca Sangiorgi e
a GEOTOP, Université du Québec à Montréal, P.O. Box 8888, Montréal, Québec H3C 3P8, Canada
b Département de Géologie et Océanographie, UMR 5805 CNRS, Université Bordeaux 1, avenue de Facultés, 33405 Talence cedex, France
c School of Environmental Sciences, University of Liverpool, Liverpool L69 7ZT, UK
d Alfred Wegener Institute for Polar and Marine Research (AWI), 27515 Bremerhaven, Germany
e Department of Earth Science, Faculty of Geosciences, Laboratory of Palaeobotany and Palynology, Budapestlaan 4, 3584 CD Utrecht, The Netherlands
Article info
Article history:
Received 31 May 2013
Received in revised form 8 January 2014
Accepted 18 January 2014
Available online 25 January 2014
Keywords:
North Atlantic
Model development
Hierarchical partitioning
Collinearity
Operculodinium centrocarpum
Abstract
Recent studies have correlated dinoflagellate resting cyst morphology to salinity and density variations in the
water column, suggesting that morphology can be used for paleoceanographic reconstructions. However, the
univariate statistics used by these studies are appropriate only where morphology is related to a single variable.
Density is a function of salinity and temperature, so more advanced statistical methods are needed to understand
which parameters affect morphology. In this study based on surface (coretop) sediments, a set of environmental
variables (sea-surface salinity, temperature, nitrate, and phosphate) was simultaneously correlated to morphological variations seen in resting cysts produced by the dinoflagellate Protoceratium reticulatum (=Operculodinium
centrocarpum sensu Wall and Dale). Approximately 3200 measurements were obtained from the North Atlantic
Ocean and used to generate a working model based on the Akaike information criterion. Hierarchical partitioning
was then applied to establish the independent and joint effects for each predictor variable. Results from these analyses showed that while salinity constitutes the dominant variable affecting process length in the cysts of
P. reticulatum in the North Atlantic, it is not the sole explanatory variable and that multicollinearity exists. Temperature and nutrients also showed a significant relationship to the morphology, requiring multiple regression to construct a representative model. The applicability of the North Atlantic working model was finally evaluated by
comparing the results to data from the Mediterranean, Marmara, and Black seas, and Baltic–Kattegat–Skagerrak estuarine system. This comparison showed regional differences in morphological–environmental correlation.
While salinity constitutes the most important explanatory factor in both the North Atlantic and Baltic–Skagerrak
system, this is not so for the Mediterranean–Black Sea region where temperature is the dominant variable. It is concluded that a predictive salinity model based on P. reticulatum cyst morphology has at best a regional application.
© 2014 Elsevier B.V. All rights reserved.
⁎ Corresponding author.
E-mail address: [email protected] (I.-M. Jansson).
0031-0182/$ – see front matter © 2014 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.palaeo.2014.01.012
1. Introduction
Reliable reconstructions of past oceanographic conditions require
dependable proxies for seawater conditions, in particular salinity. In
pursuit of such a proxy, sea-surface salinity (SSS) has been related to
changes in the morphology of both coccolithophores (e.g. Fielding
et al., 2009) and dinoflagellate cysts (e.g. Mertens et al., 2009, 2011,
2012a,b; Verleye et al., 2012). The original idea of salinity affecting dinoflagellate cyst morphology (Wall et al., 1973) has recently come to focus
on process length variation observed in resting cysts produced by
Lingulodinium polyedrum (Stein, 1883) Dodge, 1989 and Protoceratium
reticulatum (Claparède and Lachmann, 1859) Bütschli, 1885. Culture
studies suggest that L. polyedrum is not suitable for predictive SSS
models as the morphology is also affected by temperature (Hallett,
1999). Culture studies performed on P. reticulatum suggest that salinity
influences cell size (Röder et al., 2012), but no conclusive results have
been obtained to relate cyst morphology with salinity.
The cysts of Protoceratium reticulatum, known also by the now
superfluous paleontological name Operculodinium centrocarpum
sensu Wall and Dale (Paez-Reyes and Head, 2013), have attracted
attention as potential paleosalinity indicators owing to their cosmopolitan distribution, generally high abundance, and broad salinity tolerance (Zonneveld et al., 2013). Until recently, morphological
studies have focused on semi-quantitative reconstructions using
cysts recovered from sediments (Brenner, 2005; Dale, 1996;
Ellegaard, 2000; Head, 2007). However, in a more recent study
focusing on the Baltic–Kattegat–Skagerrak estuarine system, linear
correlation was used to relate the morphology of P. reticulatum cysts to
seasonal variations in surface salinity (S), temperature (T), and density
(D) (Mertens et al., 2011). This resulted in a transfer function in which
process length was correlated to summer salinity (R² = 0.8), a model
subsequently used to reconstruct salinity for the early through late
Holocene of the Baltic Sea (Willumsen et al., 2013). Similar studies conducted outside the Baltic–Skagerrak system have since identified sea-surface density as responsible for process length variation in
this species (Mertens et al., 2012a; Verleye et al., 2012). Although
these studies support the original idea of a connection between salinity
and cyst morphology, a more in-depth understanding of the relationship is required before a reliable transfer function can be applied. To
develop a reliable proxy based on process length and salinity, a firm
evaluation of the relationship between cyst morphology and potentially
influential oceanographic variables is required (Birks, 1995). This requirement was not fulfilled in the previous studies as they had relied
on simple regressions, individually relating process length to each predictor variable. Such an approach is questionable because biological
variables are often affected by numerous environmental parameters.
Assuming that process length variation is influenced by more than
one variable, simple regression cannot provide a realistic view of the
problem at hand.
The present study is based on a regionally constrained calibration,
using North Atlantic core-top material to evaluate the individual
and combined effects of multiple environmental variables on process
length. Since process length constitutes the only dependent variable, a
combination of methods was used to assess its correlation to SSS, sea-surface temperature (SST), nitrate, and phosphate. First, the importance
of salinity was addressed by conducting a model selection using the
Akaike information criterion (AIC; Akaike, 1974). Second, as environmental variables are often correlated, the explanatory variables were
assessed using hierarchical partitioning (HP) to avoid naïve interpretations of the outcomes of the univariate and multiple regressions
(Chevan and Sutherland, 1991; Mac Nally, 2002). Results from the
North Atlantic data set (Fig. 1a) were finally compared to a data set
from the Mediterranean–Marmara–Black Sea region (Fig. 1b) and another from the Baltic–Kattegat–Skagerrak estuarine system (Mertens
et al., 2011; Fig. 1c).
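The first step of the workflow outlined above, AIC-based model selection, can be sketched as follows. This is a minimal Python illustration, not the procedure used in the study (which was carried out in R). It assumes an exhaustive search over predictor subsets and the least-squares form of AIC. The function names and the example data are hypothetical.

```python
import itertools
import numpy as np

def ols_aic(y, X):
    """AIC (up to an additive constant) of an ordinary least-squares fit."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])      # add an intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))  # residual sum of squares
    k = design.shape[1] + 1                        # coefficients + error variance
    return n * np.log(rss / n) + 2 * k

def best_subset_by_aic(y, predictors):
    """Return (aic, subset) with the lowest AIC over all non-empty subsets."""
    best = None
    names = list(predictors)
    for size in range(1, len(names) + 1):
        for subset in itertools.combinations(names, size):
            X = np.column_stack([predictors[name] for name in subset])
            aic = ols_aic(y, X)
            if best is None or aic < best[0]:
                best = (aic, subset)
    return best

# Hypothetical predictors and response (not data from the study).
sss = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
sst = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
noise = 0.1 * np.array([1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0])
process_length = 8.0 + 0.7 * sss + noise
best_aic, best_vars = best_subset_by_aic(process_length, {"SSS": sss, "SST": sst})
print(best_vars)
```

In this constructed example the second predictor carries no information about the response, so the penalty term keeps it out of the selected model. With real, inter-correlated predictors the ranking is less clear-cut, which is why a separate collinearity assessment is needed.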
2. Material and methods
2.1. Sample and data collection
In contrast to the Baltic–Kattegat–Skagerrak estuarine system,
where previously collected data (Mertens et al., 2011, 2012a) were
available for re-analysis, cyst measurements were required from the
North Atlantic Ocean and Mediterranean–Marmara–Black Sea region.
A total of 138 microscope slides (from Konieczny, 1983; Dale, 1985;
Gundersen, 1988; Matthiessen, 1995; Thorsen et al., 1995; Thorsen
and Dale, 1998; Dale et al., 1999; Çağatay et al., 2000; Persson et al.,
2000; Boessenkool et al., 2001; Grøsfjeld and Harland, 2001; Kunz-Pirrung, 2001; Leroy, 2001; Levac, 2002; Mangin, 2002; Marret and
Scourse, 2002; Marret et al., 2004; Sprangers et al., 2004; Sangiorgi
et al., 2005; Clarke et al., 2006; Ladouceur, 2007; Kirci-Elmas et al.,
2008; Kotthoff et al., 2008; Mertens et al., 2009, 2012a; Elshanawany
et al., 2010; Ribeiro et al., 2012) were obtained for this purpose. All samples were collected using box and multicorers and were retrieved during multiple cruises in the North Atlantic Ocean and the
Mediterranean–Marmara–Black Sea region. Samples were exclusively
limited to the top 2 cm of sediment, and were thus considered modern
and representative of present-day conditions. In that sense, it was also
assumed that the explanatory variables remained stable during the
time of cyst accumulation. Most cysts were extracted from the sediments using similar standard palynological methods, i.e. based on hydrochloric and hydrofluoric acids, sieving, and in some cases
ultrasound. Slides were prepared by mounting aliquots of residue in
glycerine jelly.
[Fig. 1 maps: a) North Atlantic Ocean (n = 64); b) Mediterranean–Marmara–Black seas (n = 26); c) Baltic–Kattegat–Skagerrak system (n = 82). Axes show latitude and longitude.]
Fig. 1. Distribution of samples analysed in the present study: a) the North Atlantic Ocean; b) the Mediterranean, Marmara, and Black seas; and c) the Baltic Sea, Kattegat, and Skagerrak. All
maps were generated using clim.pact (Benestad, 2009), a coastline drawing function developed for R. The sample size, n, is given for each region.
Measurements were performed by K.N.M. (n = 72) and I.-M.J.
(n = 18). All measurements by K.N.M. were conducted using an
Olympus BX51 microscope with a Nikon digital sight DS-1L 1 module,
a Nikon eclipse 80i microscope and coupled DS Camera Head (DSFi1)/DS Camera Control Unit DS-L2, and a Leica DM 5000 B microscope
equipped with a Leica DFC490 camera and Leica Application Suite 2.8.1
software. Measurements by I.-M.J. were conducted using a Zeiss Imager
Z1m microscope in combination with an Olympus XC10 Axio Imager
camera and the image analysis software analySIS. A 100× oil immersion
objective was used for all measurements. Following the approach previously developed by Mertens et al. (2009, 2011), the three longest visible
peripheral processes on each cyst of Protoceratium reticulatum were
measured (and then averaged) together with the cyst's maximum central body diameter. Measurements were obtained from no fewer than
20 cysts per sample, and up to 50 specimens when possible. Reproducibility was tested by the two observers by duplicating measurements on
12 of the samples; comparisons of results showed both sets of measurements to be within 0.5 μm of each other. This is the same reproducibility
as determined in another recent study of P. reticulatum process length
(Head, 2007). Cysts without processes were not included in the analysis
due to identification difficulties.
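The measurement protocol described above reduces each sample to a single mean process length. A minimal Python sketch of that aggregation follows. It assumes that each cyst is represented by a list of measured process lengths in μm and that samples with fewer than 20 cysts are rejected. The function names and example values are hypothetical.

```python
from statistics import mean

def cyst_process_length(process_lengths_um):
    """Average of the three longest processes measured on one cyst."""
    if len(process_lengths_um) < 3:
        raise ValueError("need at least three measured processes per cyst")
    longest_three = sorted(process_lengths_um, reverse=True)[:3]
    return mean(longest_three)

def sample_process_length(cysts, min_cysts=20):
    """Mean process length for a sample, or None if too few cysts were measured."""
    if len(cysts) < min_cysts:
        return None
    return mean(cyst_process_length(cyst) for cyst in cysts)

# Hypothetical sample: 20 cysts with identical measurements (not study data).
example_cysts = [[8.0, 7.5, 9.0, 6.0]] * 20
print(sample_process_length(example_cysts))
```

Returning None for undersized samples mirrors the exclusion rule rather than silently averaging over too few specimens.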
2.2. Test for spatial patterns
Parametric statistics assume that observations are independent. However, ecological
data often violate this assumption by showing spatial dependence in
terms of trends, gradients, patchiness, and random fluctuations. Spatial
dependence means that the dependent variable/error term of a certain
location is correlated with the same observations obtained at adjacent
locations. This is a concern, as spatial autocorrelation (SAC) modifies
the effective sample size, thereby introducing the risk of an inflated
correlation and type I errors (Legendre et al., 2002). In the present
study, two types of spatial dependence (Fortin and Dale, 2005) could affect the morphological differences seen in Protoceratium reticulatum
cysts. First, SAC could occur as a result of constraints on the organism
(mobility, dispersal) if there are genetically distinct populations. This
type of spatial dependence is referred to as inherent autocorrelation
and should not be confused with induced spatial dependence. Here,
the process length of P. reticulatum cysts would have a functional dependence on an underlying variable (e.g. salinity) that is autocorrelated
in itself. One type of SAC does not exclude the other, so in many cases
inherent as well as induced SAC will affect the data set.
In order to present results that are not overly optimistic, or directly
misleading, it is crucial that spatial autocorrelation is acknowledged
and addressed. Ideally, the problem should be faced and dealt with as
part of the sample design. In this scenario, sample locations should be
selected randomly so that the samples are distributed independently
from one another. As simple as this might sound, it is not always possible and the distribution of samples used for the present study was not
actively determined prior to the sampling. This is a sombre reality for
many researchers due to the high cost of collecting deep-ocean sediment
samples and/or limited access to material. However, this limitation makes
it even more vital to acknowledge SAC and to work towards a solution.
As the present study is based on a sample set that covers a large region, and genetic variability might be present, spatial clustering is most
likely a reality. If SAC is a factor, then a number of solutions can be implemented. The simplest is to acknowledge the SAC and then use a 1%
significance level instead of 5% (e.g. Dale and Zbigniewicz, 1997).
Areas can also be divided based on their degree of dissimilarity, or
adjacent locations with similar values can be grouped by generating
spatial clusters (Fortin and Dale, 2005). For the present study, samples
were grouped to generate a more realistic representation of the actual
sample size. As part of this process, the degree of dependency between
the sample sites was first evaluated by calculating the inverse distance
weights (IDW) for all samples using the free statistical language and environment R (also referred to as “GNU S”, R Development Core Team,
2011) and the package “ape”. In IDW, spatially close samples are assumed to be more similar than samples further away. On a continuous
surface, all samples will be related to some extent, but they will not
carry the same weight (for more on this subject and R, see Bivand
et al., 2008). An individual sample will exert a local influence manifested
as high weights in spatially close samples. As this influence decreases
with increasing distance, lower values are generated and the sample
can be considered as belonging to a more random distribution.
In this study, the IDW was calculated for the North Atlantic and Mediterranean–Marmara–Black Sea samples. In each case, all samples that
generated a distance weight above one were analysed further to determine whether they should be kept as independent samples or merged
with a close neighbour. The decision to merge samples was made conservatively, and only samples less than 15 km apart and showing a process length difference below 0.5 μm (i.e. below the limit of
reproducibility) were combined and counted as a single data point.
The SAC was then measured by conducting a Moran's I test to show
whether SAC was present after the merging of the adjacent samples.
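The weighting, merging, and testing steps above can be sketched in Python. This is an illustration only: the study used R and the package "ape", whereas the code below uses NumPy, assumes great-circle (haversine) distances, and computes only the Moran's I statistic, not its p-value. All function names and sample coordinates are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def pairwise_distance_km(lats, lons):
    """Great-circle (haversine) distances between all sample pairs, in km."""
    lat = np.radians(np.asarray(lats, dtype=float))
    lon = np.radians(np.asarray(lons, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def inverse_distance_weights(dist_km):
    """Weights 1/d between distinct samples, 0 on the diagonal.

    Assumes no two samples share exactly the same position.
    """
    w = np.zeros_like(dist_km)
    off_diag = ~np.eye(dist_km.shape[0], dtype=bool)
    w[off_diag] = 1.0 / dist_km[off_diag]
    return w

def morans_i(values, w):
    """Moran's I statistic for values observed at sites with weight matrix w."""
    z = np.asarray(values, dtype=float) - np.mean(values)
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)

def should_merge(dist_km, length_a_um, length_b_um):
    """Merge rule from the text: under 15 km apart and within 0.5 um."""
    return dist_km < 15.0 and abs(length_a_um - length_b_um) < 0.5

# Hypothetical samples along the equator (not study data).
lats = [0.0, 0.0, 0.0, 0.0]
lons = [0.0, 1.0, 2.0, 3.0]
lengths = [7.0, 7.5, 8.0, 8.5]
dist = pairwise_distance_km(lats, lons)
weights = inverse_distance_weights(dist)
print(morans_i(lengths, weights))
```

The statistic should be read against its expectation under no autocorrelation, -1/(n - 1), which is why a formal test such as the one provided by "ape" is needed before drawing conclusions.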
2.3. Environmental variables
The North Atlantic set of environmental variables consisted of four
predictors, all selected to test whether they affected process length on
a microclimatic scale or through nutrient availability. These predictors,
which included surface salinity and temperature as well as phosphate
and nitrate, were all considered direct variables (Austin, 1980). Unlike
previous studies (e.g. Mertens et al., 2011; Verleye et al., 2012), temperature and salinity functions, such as density and salinity/temperature
(S/T) ratios, were not included. The same four variables were also obtained for the Mediterranean and Baltic analyses. Annual, seasonal, as
well as monthly data were acquired from each region to provide a better
understanding of the observed process length variability. All environmental data were obtained from the World Ocean Atlas (Antonov
et al., 2006; Locarnini et al., 2006) and processed in Ocean Data View
(http://odv.awi.de).
Process length was also correlated to five additional variables: oxygen, silicate, and mixed layer depth (MLD) as well as photosynthetically
active radiation (PAR) and chlorophyll a. Whereas oxygen, silicate and
MLD were all obtained using the same methods as the main parameters,
PAR and chlorophyll a were obtained from SeaWiFS (using the NASA
GSFC website http://oceancolor.gsfc.nasa.gov/) and processed using
the software solution ENVI. These variables have not been linked previously with morphological variation in Protoceratium reticulatum cysts,
but the correlation was performed to determine whether they might
nonetheless be influential.
3. Numerical analysis
3.1. Exploratory data analysis and transformation
Prior to the model selection, an exploratory data analysis (EDA) was
conducted to gain information about the relationships between the response and predictor variables (Fox and Weisberg, 2011). For the first
part of the EDA, a Shapiro–Wilk test was conducted to establish whether process length data were normally distributed. The same was done
for the independent variables to establish whether the data needed to
be transformed before continuing the analysis. This was followed by
the construction of a quantile–quantile (Q–Q) plot to visually render
the process length distribution against the theoretical reference
distribution. All environmental data were standardised to facilitate
comparisons between sites and variables. Z-scores were generated for
each data point by first calculating the mean and standard deviation
for each variable. The mean was then subtracted from each individual
data point and the resulting difference was divided by the corresponding standard deviation (Cassie and Michael, 1968). The EDA, as well as
all following statistical analyses, was performed using R.
3.2. Model selection
Following the approach used in earlier studies of Protoceratium
reticulatum cysts, the direct relationship between process length and
salinity was examined first. This was done to determine the method's
accuracy by comparing the outcome to the results later obtained from
the multiple regression models.
All multivariate models were developed using model selection,
meaning that an optimal working model is identified by actively
selecting those parameters to include and exclude. Model selection is
necessary as it helps to identify irrelevant variables, which can then be
excluded. This is important because if the insignificant variables are
left within the model, their inclusion can result in overfitted models
with inflated coefficients, standard errors, and compromised R²-values
(Schroeder et al., 1986; Mac Nally, 2000). For the present study, the
model selection was based on AIC, which in turn is based on Kullback–
Leibler information (Kullback and Leibler, 1951), and provides a means
to select the best model from the full set of models generated by the
data. This helps to avoid type I errors as well as models with artificially
high R²-values.
3.3. Diagnostic analysis and collinearity assessment
While both univariate and multiple linear least-squares regressions
are often used for hypothesis testing in ecology, it is important to acknowledge that the methods frequently make unrealistic assumptions
about the data structure (Fox, 1991). Using the package “car” in R, a
diagnostic analysis was conducted to assess problems such as nonlinearity, nonconstant error variance, and collinearity (Fox and Weisberg,
2011).
Linearity was first assessed from Pearson residual plots to evaluate
whether the linear models were correctly specified. To detect nonconstant error variance, the studentized residuals were plotted against
the fitted values. Collinearity occurs when predictor variables are inter-correlated, which is often the case for environmental
variables. In practice, the apparent importance of each variable decreases where
multicollinearity exists, and variables with opposing effects can effectively cancel
each other (Mac Nally, 2000). Since AIC does not address this problem,
the impact of collinearity needed to be addressed in a separate analysis.
There are a few different ways to handle collinearity, of which model
respecification involves combining correlated regressors or simply removing one of two inter-correlated factors after some type of prior
screening. However, the problem with this approach is that one can inadvertently delete the variable that in fact has the highest effect on the
dependent variable (Mac Nally, 2000). In the present study, a Pearson
correlation matrix was constructed for the initial screening, but no
variables were removed based on these results. Collinearity was then
confirmed by calculating the variance inflation factor (VIF) for each
model predictor (Fox, 1991; Fox and Weisberg, 2011). Collinearity
was finally assessed more thoroughly by applying hierarchical
partitioning (HP; Chevan and Sutherland, 1991).
In HP analysis, all models that can be generated by the selected variables are evaluated to find those factors that influence the dependent
variable (Chevan and Sutherland, 1991). While the method does not
provide a predictive equation, it is capable of partitioning variance,
thereby making it possible to separate the variable's independent
(I) and joint (J) effects (Chevan and Sutherland, 1991). As long as the
data set consists of no more than nine explanatory variables, HP analysis
allows reliable ranking of the independent variables based on their importance (Olea et al., 2010). From the ranking and the I-values, one can
then establish what variable has the strongest independent effect on the
dependent variable. I-values above 0.15 have been suggested to indicate
an important relationship between the explanatory variable and the
dependent variable, whereas values below 0.05 have been categorised as unimportant (Fleishman et al., 2005). All I- and J-values were calculated
using a randomization routine to establish the statistical significance
of each variable (Mac Nally, 2002; Walsh and Mac Nally, 2008).
4. Results
All samples containing fewer than 20 measurable cysts were excluded from the analysis. In addition, two samples that did not follow the
normal preparation standards (Boessenkool et al., 2001) were also
removed. After evaluating the IDW results, eight of the North Atlantic
samples were merged into four. This resulted in a final data set of 64
samples for the North Atlantic (Fig. 1a), equal to approximately 3200
measurements. A Moran's I analysis was then conducted, showing a
p-value of 0 and thereby indicating a random spatial pattern.
The North Atlantic data set covered an annual salinity gradient
between 27.8 and 36.0, and the average process length varied between 6.3 and 9.7 μm (mean = 8.0 μm, stdv. = 0.7 μm). For the comparative analysis, 26 of the Mediterranean–Marmara–Black Sea samples
fulfilled the measurement standards (Fig. 1b; mean = 8.65 μm, stdv. =
1.11 μm). Regarding the Baltic–Kattegat–Skagerrak set, the complete
analysis was based on re-analysing the results from 82 samples previously obtained for similar studies (Mertens et al., 2011, 2012a; Fig. 1c;
mean = 6.28 μm, stdv. = 2.18 μm), with all samples fulfilling the
measurement standards. Sample information for the North Atlantic,
Mediterranean, and Baltic data sets is given in Supplementary Table 1.
All analyses were performed using annual sea-surface means before
seasonal and monthly data sets were considered. A full breakdown of
HP results calculated before model reduction is available as Supplementary Tables 2–4.
[Fig. 2 plot: "Normal Q–Q Plot"; x-axis: Theoretical Quantiles; y-axis: Sample Quantiles.]
Fig. 2. Normal quantile-comparison plot for Protoceratium reticulatum process lengths
(in μm) in the North Atlantic. The samples follow the fitted line, demonstrating that
process lengths follow a normal distribution.
[Fig. 3 plots: panels a) SSS, b) SST, c) Nitrate, d) Phosphate (all as Z-values) against Process length (in µm).]
Fig. 3. Simple regression for North Atlantic data set. The univariate relationship between Protoceratium reticulatum process length and annual sea-surface values for a) salinity, b) temperature, c) nitrate, and d) phosphate. All explanatory variables are given as transformed (Z) values.
4.1. North Atlantic analysis
As the first step of the EDA, the Shapiro–Wilk normality test established that the distribution of process lengths in the sample set was normal (W = 0.9941, p-value = 0.9905). This result was confirmed by
plotting the process length distribution against the reference distribution
in a Q–Q plot (Fig. 2). From the univariate regression of all variables
against process length (Fig. 3), the highest significant (p < 0.01) positive
relationship was established between annual surface salinity and process
length (Fig. 3a). From this regression, annual salinity accounted for 65.1%
of the observed process length variation (Table 1). August showed the
highest significant R²-value (0.643) of all univariate monthly models.
From the multiple regression, the AIC selection generated the following
annual model:
\[ \text{Process length}_{\text{Annual}} \sim \text{salinity} + \text{temperature} + \text{nitrate}. \]
Together, these variables accounted for 68.2% of the process length
variation, leaving 31.8% of the variation unexplained (F3,60 = 46.03, P < 0.001). Salinity displayed a positive relationship to process length, a
relationship that remained in each of the seasonal and monthly models.
From the multiple regression, the highest significant R²-values were
seen for the April and July models. From these results, the VIF analysis
of the April data showed that the confidence intervals for nitrate and
phosphate were somewhat higher than expected for uncorrelated predictors. In contrast, all VIF values for July were close to 1. However, as
the April results introduced the possibility of collinearity, an HP analysis
was conducted for all models.
Table 1
North Atlantic models from sea-surface data. Process length (P.l.) related to sea-surface temperature (SST), sea-surface salinity (SSS), nitrate (N), and phosphate (P) in all Akaike information criterion (AIC) models. The second column shows the result of the univariate (process length [P.l.] ~ SSS) analysis. The third and fourth give the AIC-selected model as well as process length
correlation with the remaining variables. Adjusted R² values are used to give a more representative account of the coefficient of determination. The percentage independent impact of the
predictors is given in the last four columns (gaps, shown as -, correspond to insignificant variables removed from the final model for that particular time interval). I-values are given in parentheses.
North Atlantic – SURFACE
Model       P.l. ~ SSS (R²adj)   AIC selected model          R²adj   SST         SSS         N           P
Annual      0.651                P.l. ~ SST + SSS + N        0.682   14 (0.10)   73 (0.51)   13 (0.09)   -
Jan–March   0.567                P.l. ~ SSS + N + P          0.611   -           56 (0.35)   32 (0.20)   13 (0.08)
April–June  0.600                P.l. ~ SST + SSS + N + P    0.668   16 (0.11)   58 (0.40)   19 (0.13)   7 (0.05)
July–Sept   0.620                P.l. ~ SST + SSS            0.654   4 (0.03)    96 (0.64)   -           -
Oct–Dec     0.574                P.l. ~ SST + SSS + N        0.629   17 (0.11)   69 (0.45)   14 (0.09)   -
January     0.583                P.l. ~ SST + SSS + N        0.629   23 (0.15)   57 (0.37)   20 (0.13)   -
February    0.550                P.l. ~ SSS + N + P          0.571   -           63 (0.37)   29 (0.17)   8 (0.05)
March       0.541                P.l. ~ SSS + N + P          0.585   -           55 (0.33)   33 (0.20)   12 (0.07)
April       0.598                P.l. ~ SSS + N + P          0.693   -           69 (0.49)   20 (0.14)   11 (0.08)
May         0.586                P.l. ~ SST + SSS + N + P    0.643   15 (0.10)   61 (0.41)   18 (0.12)   6 (0.04)
June        0.545                P.l. ~ SST + SSS + N        0.619   14 (0.09)   78 (0.50)   9 (0.05)    -
July        0.633                P.l. ~ SST + SSS + P        0.681   6 (0.04)    71 (0.50)   -           23 (0.16)
August      0.643                P.l. ~ SSS + N + P          0.677   -           70 (0.48)   10 (0.07)   20 (0.14)
September   0.548                P.l. ~ SST + SSS + N + P    0.609   5 (0.03)    59 (0.38)   5 (0.03)    31 (0.20)
October     0.539                P.l. ~ SST + SSS + P        0.596   10 (0.06)   76 (0.47)   -           14 (0.09)
November    0.588                P.l. ~ SST + SSS + N        0.631   17 (0.11)   65 (0.42)   18 (0.12)   -
December    0.583                P.l. ~ SST + SSS + N        0.644   17 (0.11)   73 (0.48)   10 (0.07)   -
Table 2
The independent and joint values for sea-surface salinity (SSS) following Akaike information criterion (AIC) selection. The percentage of process length explained by salinity is
given by the models' R²-value multiplied by the independent impact of salinity.
Model       Independent   Joint   % explained by SSS
Annual      0.51          0.15    49.78
January     0.37          0.22    34.82
February    0.37          0.18    35.94
March       0.33          0.22    32.19
April       0.49          0.12    47.82
May         0.41          0.18    39.23
June        0.50          0.05    48.27
July        0.50          0.14    48.35
August      0.48          0.17    47.39
September   0.38          0.18    35.93
October     0.47          0.08    45.27
November    0.42          0.17    40.99
December    0.48          0.17    46.98
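The arithmetic behind the percentages in Tables 1 and 2 can be checked with a short Python sketch. It assumes that each predictor's percentage independent impact is its I-value divided by the sum of I-values in the model, an assumption that reproduces the annual figures but is not stated explicitly in the text. The function names are hypothetical.

```python
def independent_shares(i_values):
    """Percentage share of each predictor's I-value in the model total."""
    total = sum(i_values.values())
    return {name: 100.0 * value / total for name, value in i_values.items()}

def percent_explained(r2_adj, share_percent):
    """Table 2 rule: model R² multiplied by the predictor's independent impact (%)."""
    return r2_adj * share_percent

# Annual North Atlantic model: I-values and adjusted R² as listed in Table 1.
annual_i = {"SST": 0.10, "SSS": 0.51, "N": 0.09}
shares = independent_shares(annual_i)
annual_sss_share = round(shares["SSS"])
annual_sss_explained = percent_explained(0.682, annual_sss_share)
print(annual_sss_share, annual_sss_explained)
```

The result for annual salinity is close to the 49.78% listed in Table 2. Small differences of this kind are expected because the tabulated I-values and shares are rounded.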
HP analysis was performed using the respective variables generated
from the AIC selection (Table 1). Following the criterion of importance,
the analysis showed that salinity constituted the variable with the
highest independent effect on process length in all model scenarios.
Although other predictors occasionally displayed I-values above 0.15,
salinity was the only variable with I-values consistently above this
threshold. The highest monthly I-values of salinity were seen during
June and July, during which salinity in both cases explained approximately 48% of the observed process length variation (Table 2, Fig. 4). A
monthly (summer), rather than annual, model was chosen because
the cysts are formed around this time of the year. Based on the higher
R²-value (0.681) observed during July, this model was selected as the
working model:
\[ \text{Process length}_{\text{July}}\ (\mu\text{m}) \sim 7.98836 + 0.69155\,(\text{SSS}) + 0.13316\,(\text{P}) + 0.15466\,(\text{SST}). \]
Together, these variables explained 68% of the process length variation (F3,60 = 45.82, P < 0.001). From the VIF analysis, salinity showed a
relatively low correlation to the other predictors (VIF < 3), and the diagnostic analysis confirmed that all variables had p-values supporting a
random distribution. This was supported by the Pearson residual plots,
which showed that nonconstant error variance did not constitute a
problem (Fig. 5).
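Applying the July working model can be sketched in Python as follows. The sketch assumes that the coefficients apply to standardised (Z-score) predictors, as implied by the standardisation of all environmental data. The regional means and standard deviations needed to compute those Z-scores are not given here, so the values in the example are hypothetical.

```python
# Coefficients of the July working model as reported in the text.
JULY_COEF = {"intercept": 7.98836, "SSS": 0.69155, "P": 0.13316, "SST": 0.15466}

def z_score(value, mean, sd):
    """Standardise a raw value using the variable's mean and standard deviation."""
    return (value - mean) / sd

def predict_process_length(z_sss, z_p, z_sst):
    """Predicted process length (um) from Z-scored July SSS, phosphate, and SST."""
    return (JULY_COEF["intercept"]
            + JULY_COEF["SSS"] * z_sss
            + JULY_COEF["P"] * z_p
            + JULY_COEF["SST"] * z_sst)

# Hypothetical site: salinity one standard deviation above the regional mean,
# phosphate and temperature at their means (mean and sd values are invented).
z_sss = z_score(36.0, 34.0, 2.0)
print(predict_process_length(z_sss, 0.0, 0.0))
```

With all predictors at their means the prediction equals the intercept, which is close to the mean North Atlantic process length of 8.0 μm and is consistent with the assumption of standardised predictors.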
As for the additional analysis, evaluating whether process length
was influenced by oxygen, silicate, MLD, PAR or chlorophyll a, these variables were often strongly correlated to the other independent variables
but showed little or no influence on process length. Based on these
results, they were considered insignificant in the North Atlantic analysis
and were not included in the Mediterranean–Marmara–Black Sea or
Baltic–Kattegat–Skagerrak analyses.
4.2. Mediterranean–Marmara–Black Sea analysis
Unlike the North Atlantic samples, process length deviates slightly
from a normal distribution. However, as no transformation satisfied
the normality criteria, the untransformed data were used for the analysis. Viewing the results from the univariate regression, annual salinity
explained 45.9% of the process length variation seen in the Mediterranean–Marmara–Black Sea data set (Table 3). However, unlike the
North Atlantic annual model, the Mediterranean–Marmara–Black Sea
AIC selection excluded salinity and included only temperature and
nitrate in the annual model:
$$\text{Process length}_{\text{Annual}} \sim \text{SST} + \text{nitrate}.$$
Combined, these variables explained 74% of the process length variation (F(2, 23) = 36.59, P < 0.001), leaving 26% unexplained. Temperature
Fig. 4. North Atlantic data set: the annual and monthly independent impact of each variable on the variation seen in process length. [Bar chart: variable contribution (percentage) of SSS, SST, nitrate and phosphate, shown for the year and for each month from January to December.]
I.-M. Jansson et al. / Palaeogeography, Palaeoclimatology, Palaeoecology 399 (2014) 202–213
was included in all models except for June and July, whereas salinity
was primarily included in the summer and fall models. From the generated models, November and December displayed the highest significant
R2-values. Although both models contained temperature, and explained
around 70% of the variation, the November model included both phosphate and nitrate whereas the December model only included nitrate
in addition to temperature. The HP analysis showed significant I-values for salinity during summer and fall, whereas the highest values
for temperature were seen during winter and spring (Table 3).
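The AIC-based selection behind these models can be sketched in a few lines. The example uses the least-squares form of the criterion, n·ln(RSS/n) + 2k, which equals the Gaussian AIC up to an additive constant. The residual sums of squares are made-up numbers, the parameter counts k follow one common convention (coefficients plus intercept plus error variance), and n = 26 is inferred from the reported F(2, 23) degrees of freedom; none of this reproduces the authors' actual computation.

```python
import math

def aic_least_squares(rss, n, k):
    """AIC of a least-squares fit, up to an additive constant.

    rss: residual sum of squares, n: number of samples,
    k: number of estimated parameters.
    """
    return n * math.log(rss / n) + 2 * k

def select_by_aic(candidates, n):
    """Return (best_model_name, aic_by_model) for a dict name -> (rss, k)."""
    scores = {name: aic_least_squares(rss, n, k) for name, (rss, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical candidate models with made-up RSS values.
candidates = {
    "SST": (12.0, 3),
    "SST + N": (7.5, 4),
    "SST + N + SSS": (7.4, 5),
}
best, scores = select_by_aic(candidates, n=26)
print(best)
```

In this made-up case the third predictor lowers the RSS only marginally, so the 2k penalty outweighs the gain and the two-predictor model is retained.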
4.3. Baltic–Kattegat–Skagerrak analysis
As with the Mediterranean–Marmara–Black Sea data set, process
length deviated from a normal distribution and untransformed data
were used for the analysis. Based on the univariate regression, annual
salinity explained 87.2% of the process length variation (Table 4).
From the model selection, all variables remained in the annual model:
$$\text{Process length}_{\text{Annual}} \sim \text{salinity} + \text{temperature} + \text{nitrate} - \text{phosphate}.$$
This model explained 90.2% of the observed variation, which was
similar to the R2-values also generated by the seasonal and monthly
models. Salinity was included in all models, seasonal as well as monthly.
Collinearity was confirmed from the Pearson analysis, the VIF output, as
well as the HP analysis. In all models, salinity displayed I-values above
0.15 (Table 4).
The extended North Atlantic multivariate analysis shows that while
the univariate relationship between process length and salinity produces relatively high R2-values, the explanatory power is influenced
by variables other than just salinity. These results demonstrate that
biological variation is more complex than can be explained simply by
relating one parameter of interest to another and computing the proportions of explained/unexplained variance. The relationship between
process length and salinity is simply not as straightforward as that
required for univariate analysis. Persistent use of simple regression
when not applicable, and especially when multicollinearity is demonstrated, will produce misleading R2-values. Univariate statistics will
not produce reliable results when explanatory variables are intercorrelated, and should not be used for these types of analyses. Instead,
the observed variation should be evaluated through multiple regression
analyses that also involve assessing the individual effects of the explanatory variables. Regarding the Mediterranean–Marmara–Black Sea and
Baltic–Kattegat–Skagerrak data, these results should be viewed with
caution, as the Q–Q analysis showed that process length did not follow
the assumption of a normal distribution.
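To make the idea of independent and joint effects concrete, the following minimal sketch applies hierarchical partitioning to the simplest possible case of two predictors, where every R2 can be written in closed form from the pairwise correlations. The correlations are made-up values, and the function names are hypothetical; the study itself used up to four predictors.

```python
def r2_two_predictors(r1y, r2y, r12):
    """R^2 of a regression of y on two predictors, from pairwise correlations."""
    return (r1y ** 2 + r2y ** 2 - 2.0 * r1y * r2y * r12) / (1.0 - r12 ** 2)

def hierarchical_partition_two(r1y, r2y, r12):
    """Independent (I) and joint (J) contributions for two predictors.

    I is the average gain in R^2 from adding a predictor, taken over both
    hierarchy levels: added to the empty model, and added to the model
    that already contains the other predictor.
    """
    r2_1 = r1y ** 2
    r2_2 = r2y ** 2
    r2_12 = r2_two_predictors(r1y, r2y, r12)
    i1 = 0.5 * (r2_1 + (r2_12 - r2_2))
    i2 = 0.5 * (r2_2 + (r2_12 - r2_1))
    return {
        "R2_full": r2_12,
        "I": (i1, i2),
        "J": (r2_1 - i1, r2_2 - i2),
    }

# Hypothetical correlations: predictor 1 vs y, predictor 2 vs y, predictor 1 vs 2.
result = hierarchical_partition_two(0.8, 0.5, 0.6)
print(result)
```

The two independent contributions always sum to the R2 of the full model, whereas the univariate R2 of a predictor also contains its joint contribution. This is why a univariate R2 can overstate the effect of a predictor that is correlated with others.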
5. Discussion
5.1. Salinity effect on process length in the North Atlantic
This study confirms a relationship between salinity and process
length in the North Atlantic Ocean, as salinity remains significant in all
models and through all HP analyses. Salinity also emerges as the dominant independent variable in each model. This indicates that salinity
positively influences the length of processes as they form. It would
therefore be ideal to construct a morphology-based model using data
from the precise time interval of cyst formation. However, Northern
Hemisphere blooms of Protoceratium reticulatum (cysts are most likely
to form during blooms) have been reported from April to October (see
Mertens et al., 2011 for a summary). This means that the timing of cyst formation is too imprecise to select a specific month, and that the full set of monthly models had to be evaluated. This analysis shows that the independent influence of salinity fluctuates through the year, but that a significant increase occurs during the boreal summer. In particular, salinity reaches its maximum independent influence during June, July and August, when it explains >70% of the observed variation. This is the closest that process length comes to being directly influenced by salinity, and a comparison of the difference between the simple and multiple July regressions reveals that the two models are very close to each other (<5%).
Even though these results support previous observations of a positive response pattern between salinity and process length (e.g. Verleye et al., 2012), the relationship only explains a limited portion of the observed variation. This means that other factors, such as transport mechanisms or genetic variability, are affecting process length. In addition, when relating the independent salinity effect during July to the R2-value generated by the multiple regression, no more than 48% of the observed process length variation is correlated to salinity. In comparison, the univariate analysis from the same time interval ascribes 63.3% of the variation to salinity (Table 1). This result highlights the importance of a HP analysis, as the method constitutes a mechanism to detect inflated R2-values that would otherwise be overlooked. Simply viewing models such as those generated from the annual or July data sets as true, and using them for predictions, is thus not justified (for more on this topic, see: Chatfield, 1995; Reichert and Omlin, 1997).

Fig. 5. Regression diagnostics for the North Atlantic data set. Shown are basic residual plots for regression based on the July model. [Four panels: Pearson residuals plotted against SST, SSS, P, and the fitted values.] Because linear models fitted by least squares can make unrealistic assumptions, it is important to analyse the models using Pearson residuals to make sure they are correctly specified. A lack-of-fit test was thus conducted to verify a random distribution for the July residuals. The Pearson residuals show no systematic features, i.e. they do not change with the fitted values or with predictors (SST test stat = 0.955, Pr(>|t|) = 0.344; SSS test stat = 1.19, Pr(>|t|) = 0.239; P test stat = −0.522, Pr(>|t|) = 0.604; Tukey test = 1.071, Pr(>|t|) = 0.284). This supports a correctly specified linear model. SST = sea-surface temperature, SSS = sea-surface salinity, P = phosphate.

Table 3
Mediterranean–Marmara–Black Sea models from sea-surface data. Process length (P.l.) related to temperature (SST), sea-surface salinity (SSS), nitrate (N) and phosphate (P) in all Akaike information criterion (AIC) models. The second column shows the result from the univariate (P.l. ~ SSS) analysis. The third and fourth give the AIC selected model as well as process length correlation with the remaining variables. Adjusted R2 values are used to give a more representative account of the coefficient of determination. The percentage independent impact of the predictors is given in the last four columns (gaps correspond to insignificant variables removed from the final model for that particular time interval). I-values are given in parentheses.
Mediterranean–Marmara–Black Sea – SURFACE

| Model | P.l. ~ SSS (R2adj) | AIC selected model | R2adj | SST | SSS | N | P |
|---|---|---|---|---|---|---|---|
| Annual | 0.459 | P.l. ~ SST + N | 0.740 | 82 (0.62) | | 18 (0.14) | |
| Jan–March | 0.435 | P.l. ~ SST | 0.589 | 100 | | | |
| April–June | 0.450 | P.l. ~ SST + N − P | 0.599 | 65 (0.42) | | 5 (0.04) | 30 (0.19) |
| July–Sept | 0.497 | P.l. ~ SST + SSS | 0.573 | 39 (0.24) | 61 (0.37) | | |
| Oct–Dec | 0.451 | P.l. ~ SST | 0.673 | 100 | | | |
| January | 0.439 | P.l. ~ SST | 0.599 | 100 | | | |
| February | 0.433 | P.l. ~ SST | 0.541 | 100 | | | |
| March | 0.431 | P.l. ~ SST | 0.619 | 100 | | | |
| April | 0.441 | P.l. ~ SST + N | 0.659 | 95 (0.65) | | 5 (0.04) | |
| May | 0.461 | P.l. ~ SST − P | 0.541 | 51 (0.30) | | | 49 (0.28) |
| June | 0.446 | P.l. ~ SSS + N | 0.497 | | 92 (0.49) | 8 (0.04) | |
| July | 0.494 | P.l. ~ −SSS | 0.494 | | 100 | | |
| August | 0.502 | P.l. ~ SST + SSS | 0.596 | 37 (0.23) | 63 (0.39) | | |
| September | 0.495 | P.l. ~ SST + SSS + N | 0.662 | 50 (0.35) | 45 (0.31) | 5 (0.04) | |
| October | 0.463 | P.l. ~ SST + SSS | 0.612 | 60 (0.38) | 40 (0.26) | | |
| November | 0.435 | P.l. ~ SST + N − P | 0.706 | 64 (0.48) | | 24 (0.18) | 12 (0.09) |
| December | 0.451 | P.l. ~ SST + N | 0.681 | 81 (0.57) | | 19 (0.14) | |
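The arithmetic linking an I-value to the percentage of explained variation quoted above can be sketched as follows. The sketch assumes the standard property of hierarchical partitioning that the independent contributions sum to the R2 of the full model, so that a predictor's share is its I-value divided by that R2. The function name is hypothetical; the July values (I of about 0.48 for salinity, R2 = 0.681, univariate 63.3%) are those quoted in the text.

```python
def independent_share(i_value, r2_total):
    """Percentage of the explained variation attributed to one predictor."""
    if not 0.0 < r2_total <= 1.0:
        raise ValueError("r2_total must lie in (0, 1]")
    return 100.0 * i_value / r2_total

# July, North Atlantic: salinity I-value and full-model R2 from the text.
july_share = independent_share(0.48, 0.681)
print(f"{july_share:.1f}%")  # about 70.5%, in line with the ">70%" quoted

# Gap between the univariate R2 (63.3%) and the independent effect (48%),
# in percentage points of the total process length variation.
inflation_points = 63.3 - 48.0
print(round(inflation_points, 1))
```

Read this way, salinity accounts for roughly 70% of what the July model explains but only about 48% of the total variation, some 15 percentage points less than the univariate regression suggests.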
5.2. Regional comparison
The predictive power of the North Atlantic models was evaluated by
comparing the resulting models to data sets obtained from the Mediterranean–Marmara–Black Sea region and the Baltic–Kattegat–Skagerrak
system. From the univariate regression of salinity and process length
alone, it is already clear that there are regional differences (Fig. 6).
When viewing the Mediterranean–Marmara–Black Sea models and HP
results, salinity is not included to the same extent as seen in the North
Atlantic. Nor does it show a similar high independence. Instead, temperature emerges as the variable most frequently included in the models
and the variable with the highest I-values (Supplementary Table 4). However, this does not automatically mean that temperature controls process length. Just as with the North Atlantic, it is important to consider the timing of cyst formation. Previous observations show that dinoflagellate motile cell concentrations for the Mediterranean–Marmara–Black Seas reach peak levels during late summer and fall (Psarra et al., 2000; Mercado et al., 2005). If temperature indeed controls process length, one might expect it to be important during this time interval. From the model selection, temperature is excluded during June and July but shows increasing I-values from August onwards. Temperature thus passes as important during the time interval when cysts are most likely to form. However, salinity also shows high I-values during this interval, meaning that temperature does not show the same dominance as seen for salinity in the North Atlantic model. As an example, while the highest R2-value of all the Mediterranean–Marmara–Black Sea models is seen during November (R2 = 0.70), temperature explains < 45% of the variation. The remainder is partly explained by nitrate (17%) and phosphate (8%). Therefore, it is difficult to identify one variable as having primary control over Mediterranean process length variation.

Table 4
Baltic–Kattegat–Skagerrak models from sea-surface data. Process length (P.l.) related to temperature (SST), sea-surface salinity (SSS), nitrate (N) and phosphate (P) in all AIC models. The second column shows the result from the univariate (P.l. ~ SSS) analysis. The third and fourth give the Akaike information criterion (AIC) selected model as well as process length correlation with the remaining variables. Adjusted R2 values are used to give a more representative account of the coefficient of determination. The percentage independent impact of the predictors is given in the last four columns (gaps correspond to insignificant variables removed from the final model for that particular time interval). I-values are given in parentheses.
Baltic–Kattegat–Skagerrak – SURFACE

| Model | P.l. ~ SSS (R2adj) | AIC selected model | R2adj | SST | SSS | N | P |
|---|---|---|---|---|---|---|---|
| Annual | 0.872 | P.l. ~ SST + SSS + N − P | 0.902 | 38 (0.34) | 42 (0.39) | 2 (0.02) | 18 (0.16) |
| Jan–March | 0.877 | P.l. ~ −SSS + SSS − N + P | 0.898 | 27 (0.24) | 47 (0.43) | 9 (0.08) | 17 (0.15) |
| April–June | 0.851 | P.l. ~ SST + SSS + N − P | 0.908 | 44 (0.40) | 46 (0.42) | 1 (0.01) | 9 (0.08) |
| July–Sept | 0.868 | P.l. ~ SST + SSS | 0.902 | 22 (0.20) | 78 (0.70) | | |
| Oct–Dec | 0.884 | P.l. ~ SST + SSS | 0.899 | 43 (0.39) | 57 (0.51) | | |
| January | 0.881 | P.l. ~ SSS + P | 0.902 | | 67 (0.60) | | 33 (0.30) |
| February | 0.879 | P.l. ~ SSS + P | 0.901 | | 69 (0.62) | | 31 (0.28) |
| March | 0.870 | P.l. ~ SSS − N + P | 0.893 | | 94 (0.84) | 1 (0.01) | 5 (0.04) |
| April | 0.853 | P.l. ~ SST + SSS | 0.907 | 52 (0.47) | 48 (0.44) | | |
| May | 0.846 | P.l. ~ SST + SSS + N | 0.908 | 48 (0.44) | 51 (0.46) | 1 (0.01) | |
| June | 0.853 | P.l. ~ SST + SSS + N − P | 0.908 | 39 (0.35) | 50 (0.45) | 9 (0.08) | 2 (0.02) |
| July | 0.866 | P.l. ~ SST + SSS | 0.901 | 16 (0.14) | 84 (0.76) | | |
| August | 0.861 | P.l. ~ SST + SSS | 0.902 | 14 (0.13) | 86 (0.78) | | |
| September | 0.874 | P.l. ~ SST + SSS + N | 0.902 | 35 (0.316) | 62 (0.56) | 3 (0.03) | |
| October | 0.880 | P.l. ~ SST + SSS | 0.902 | 40 (0.36) | 60 (0.54) | | |
| November | 0.882 | P.l. ~ SST + SSS | 0.902 | 41 (0.37) | 49 (0.53) | | |
| December | 0.889 | P.l. ~ SSS − N + P | 0.902 | | 55 (0.49) | 3 (0.03) | 42 (0.38) |

Fig. 6. Univariate regression: process length in relation to transformed annual salinity (TSSS, salinity given in Z-values) for the North Atlantic, the Mediterranean–Marmara–Black Sea region, and the Baltic–Kattegat–Skagerrak system. [Plot: process length (in µm) against TSSS, with regression lines labelled NA, Med and Baltic.]
The Baltic–Kattegat–Skagerrak models and HP output in contrast
show a higher resemblance to the North Atlantic results, with salinity
being included in all generated models. Based on the HP analysis, salinity displays the highest I-values through all months except for April and
May. Although a large portion of this impact is due to joint effects, the
independent effect of salinity does increase during the boreal summer.
These results are consistent with conclusions from Mertens et al.
(2011, 2012a) that highest confidence should be given to the relationship between process length and late summer salinity. However, the
Baltic–Kattegat–Skagerrak data set is characterised by serious caveats.
First, the samples fall into two distinct groups: one with shorter processes confined to the brackish Baltic Sea, and one with longer processes
found in the region around the Kattegat and Skagerrak. Apart from process length, the cysts appear morphologically similar, but that does not
exclude the possibility that the process length difference seen between
the two areas is due to the presence of two genetically distinct strains.
This idea of strain-specific response in Protoceratium reticulatum is not
a novel concept as previous research has shown regional differences
in molecular data (Mertens et al., 2012a). If two distinct strains of
P. reticulatum have indeed been compared, the correlation is thus
performed between non-homogenous groups. This could generate a
reasonable correlation, but without actually representing the same relationship between process length and environment. In fact, due to the
environmental differences between the Baltic Sea and Kattegat–
Skagerrak, anything but a high correlation would be surprising.
Fig. 7. The five steps involved in the model building process. In this study, model evaluation was performed by comparing the North Atlantic results to observations from the Mediterranean–Marmara–Black Sea region and the Baltic–Kattegat–Skagerrak system. [Flow diagram; the evaluation step reads: test the predictive ability by applicability to other regions, or by cross-validation.]
It is beyond the scope of this study to determine whether the short
process length of the Baltic Sea cysts reflects a physiological response
or is genetically controlled. It is possible that the model efficiency
could simply be increased if additional samples were added to bridge
these two groups and provide a better coverage of the salinity gradient.
However, in-depth morphological and DNA analyses, preferably combined with an effective in situ study, are needed to fully understand
the factors involved. Meanwhile, any conclusions regarding the results
from the Baltic–Kattegat–Skagerrak region remain speculative.
5.3. Outlook and guidelines for future studies
A successful culture study would be the most important step towards understanding morphological variation in cysts of Protoceratium
reticulatum. Not only would this allow identification of individual
strains or cryptic species, and assessment of how salinity affects process
length; it might also serve to establish the biological function of these
changes. Numerous explanations have been given to account for the
development of projections on resting cysts of various planktonic organisms (Belmonte et al., 1997). Mertens et al. (2009) suggested that
the positive correlation they had determined between sea-water density and process length in the cyst of Lingulodinium polyedrum is related to
accelerated sinking through clustering. Whether such a relationship
might explain process length variation in P. reticulatum will require
laboratory studies to assess process development under controlled
conditions. In addition, the study of sediment traps would provide useful information on the unknown but presumably significant effects of
cyst transport.
The present study has focussed on statistically evaluating the relationships between environmental factors and Protoceratium reticulatum
cyst morphology based on distributions in modern sediments, with the
aim of improving the reliability of these cysts for paleoceanographic
reconstructions. In advancing this approach, future studies would benefit from the following key steps (modified from the recommendations of
Guisan and Zimmermann, 2000; Fig. 7):
1. Formulating a possible model: Once a possible dependent parameter
has been identified, either from the literature or from field observations, a model can be sought to describe the observed variation.
This means that a set of possible variables must be identified and selected. Variable selection is a recurring problem in regression studies,
where a balance must be found between more or fewer variables. Including more variables might have the positive effect of identifying a
factor that influences the dependent variable. On the other hand, the
risk of type I and II errors increases if the added variables have low
reliability. Due to the experimental nature of the present study, we
took a restrictive approach and used a limited set of variables. Future
research could then build on our results and evaluate other potentially important variables.
2. Sample design: The spatial and temporal resolution of a study should
be addressed as part of the sample set-up. The sample size, the size of
the sample area, and the sample strategy will all affect the outcome
of the study (Fortin and Dale, 2005). For example, in terms of the
present study, the limited salinity range covered by the North Atlantic data set constitutes a problem. Ideally, a sample design should include samples that will allow the modelling of end members. As part
of the sample design, it is also important to decide whether samples
will be collected from one or more constrained regions. In terms of
the spatial component, the minimum distance allowed between
samples should be decided prior to sampling. Recognising these issues ahead of time will result in a better representation of the effective sample size and reduce the risk of induced type I errors. If this is
not possible, SAC must be accounted for during the analysis (see
Fortin and Dale, 2005, for various approaches). In terms of the temporal factor, future calibration studies should also consider the problems associated with developing a model using material obtained at
different temporal scales. Using surface sediments for the dependent
data set and direct sea-surface measurements for the explanatory set
is not ideal. While cysts might have accumulated during centuries if
not more, the environmental data derives from a more restricted
time interval.
3. Statistical analysis: Only after a solid framework has been established
can one begin the EDA and proceed towards a functional model. During this part of the analysis, the data are tested to determine whether
they are normally distributed or in need of transformation. Next, a
decision must be made whether to use simple or multiple regression.
Multiple regression must be used if the dependent variable shows a
correlation to more than one variable.
4. Calibration: All explanatory variables proven insignificant to the variation seen in the dependent variable should be removed from the
model. This removal constitutes a key step in the analysis and should
not be done manually. Instead, it should be based on AIC and techniques developed for the analyses of collinear data in order to avoid
type I errors. To further evaluate the effect of the explanatory variables, HP provides a mechanism to address the fact that environmental variables are often correlated.
5. Evaluation: When a model has been developed for an area, its predictive ability should be tested. How this is done depends on whether
the data set is restricted to one region or not. The present study
benefited from having access to more than one data set. This meant
that the North Atlantic set could be used for the calibration, and the
Mediterranean–Marmara–Black Sea and Baltic–Kattegat–Skagerrak
sets for evaluating the model. In cases where only one data set is
available, different methods such as cross-validation (CV) must be
used. In CV, the data set is divided into one set that is used for the
analysis (the calibration set) and one set (the testing set) that is
used to evaluate the accuracy of the first set. Multiple rounds of calibration and testing are needed before the evaluation is considered
complete. In the event of dependent observations, h-block cross
validation can be used for the evaluation (Burman et al., 1994).
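The h-block variant mentioned in step 5 can be sketched as an index generator. In this sketch, each observation is held out in turn and the h observations on either side of it are also dropped from the calibration set, so that dependent neighbours do not leak into the fit. The function name is hypothetical, and it is an assumption that the samples are ordered so that neighbouring indices are the dependent ones (for example along a transect).

```python
def h_block_splits(n_samples, h):
    """Yield (calibration_indices, test_index) pairs for h-block cross-validation.

    Assumes observations are ordered so that dependence decays with index
    distance. With h = 0 this reduces to ordinary leave-one-out.
    """
    if h < 0:
        raise ValueError("h must be non-negative")
    for test in range(n_samples):
        calibration = [j for j in range(n_samples) if abs(j - test) > h]
        yield calibration, test

# Six ordered samples, dropping one neighbour on each side of the test sample.
for calibration, test in h_block_splits(6, 1):
    print(test, calibration)
```

Each calibration set would then be used to refit the model, and the held-out sample to measure the prediction error; the errors are averaged over all rounds.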
6. Summary and conclusions
In order to evaluate whether cyst morphology of the dinoflagellate
Protoceratium reticulatum can serve as a salinity proxy, cysts obtained
from North Atlantic surface-sediment samples were analysed statistically to establish the individual and combined effects of a number of environmental variables on process length. This analysis was conducted
using the Akaike information criterion in combination with hierarchical
partitioning. In the North Atlantic region, the process length variation of
P. reticulatum did not show a significant relationship to oxygen, silicate,
MLD, PAR, or chlorophyll a. However, analysis of the annual data set did
reveal a relationship with sea-surface temperature, salinity, and nitrate. Viewing the monthly data, the correlation between process
length and the environmental variables increased during the boreal
summer. This resulted in the following working model for the region:
$\text{Process length}_{\text{July}}\ (\mu\text{m}) \sim 7.98836 + 0.69155\,(\text{SSS}) + 0.13316\,(\text{P}) + 0.15466\,(\text{SST})$
(R2 = 0.681). From this model, it is clear that variables
other than salinity also affect the observed process length variation. In
conclusion, process length and salinity do not show an independent relationship because temperature and/or nutrients are also included in all the
North Atlantic models. The North Atlantic analysis also shows that
multicollinearity is a reality that must be addressed when assessing
process length variation. A better understanding of the observed
variance also depends on whether the relationship between the included variables is clarified. We suggest that hierarchical partitioning be
used for this purpose. Conducting a HP analysis provides a straightforward way to identify the dependency level between variables and balance
the inflated R2-values produced by univariate and multiple regression
analyses.
The performance of the North Atlantic model, serving as the calibration set, was also evaluated by comparing the results to data sets from
the Mediterranean–Marmara–Black Sea region as well as the Baltic–
Kattegat–Skagerrak estuarine system. While the analyses show that
salinity constitutes the most important explanatory factor in the North
Atlantic, as well as in the Baltic–Kattegat–Skagerrak region, the same
cannot be said for the Mediterranean–Marmara–Black seas. In this region, temperature constitutes the dominant variable, and salinity is significant only during the summer and fall. This means that the North
Atlantic salinity model is region specific, and should not be applied to
areas beyond its geographical extent. By extension, these results show
that process length variations seen in cysts of Protoceratium reticulatum
cannot be used for global paleosalinity reconstructions using a single
algorithm.
Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.palaeo.2014.01.012.
Acknowledgements
This research was supported by a University of Toronto studentship
and scholarship and Ontario Graduate Scholarship to I.-M.J., and an
NSERC Discovery Grant to M.J.H. K.N.M. is a postdoctoral fellow of
FWO Belgium. Marie-Josée Fortin (University of Toronto) is warmly
thanked for her generous advice on numerical ecology. We are grateful
to Rehab Elshanawany, Kari Grøsfjeld, Rex Harland, Ulrich Kotthoff, Peta
Mudie, Speranta Popescu, Vera Pospelova, and Sofia Ribeiro, for the loan
of microscope slides. Sample material from the Malangen fjord was
provided by the National Lacustrine Core Repository (LacCore). Elisabeth
Levac provided samples from Nova Scotia, and Simon Troelstra provided
information on Greenland cores and samples from the Mediterranean.
Michal Kucera and an anonymous reviewer provided very helpful
comments on the manuscript.
References
Akaike, H., 1974. A new look at the statistical model identification. IEEE Trans. Autom.
Control 19 (6), 716–723.
Antonov, J.I., Locarnini, R.A., Boyer, T.P., Mishonov, A.V., Garcia, H.E., 2006. World ocean
atlas 2005, volume 2: salinity. In: Levitus, S. (Ed.), NOAA Atlas NESDIS 62. U.S.
Government Printing Office, Washington, D.C. (182 pp.).
Austin, M.P., 1980. Searching for a model for use in vegetation analysis. Vegetatio 42
(1–3), 11–21.
Belmonte, G., Miglietta, A., Rubino, F., Boero, F., 1997. Morphological convergence of
resting stages of planktonic organisms: a review. Hydrobiologia 355, 159–165.
Benestad, R., 2009. clim.pact: climate analysis and empirical–statistical downscaling
(ESD) package for monthly and daily set. R package version 2.3-10.
Birks, H.J.B., 1995. Quantitative palaeoenvironmental reconstructions. In: Maddy, D.,
Brew, J.S. (Eds.), Statistical Modelling of Quaternary Science Data. Technical Guide
5. Quaternary Research Association, Cambridge, U.K., pp. 116–254.
Bivand, R.S., Pebesma, E.J., Gómez-Rubio, V., 2008. Applied Spatial Data Analysis with R.
Springer, New York (376 pp.).
Boessenkool, K.P., Van Gelder, M.-J., Brinkhuis, H., Troelstra, S.R., 2001. Distribution of
organic-walled dinoflagellate cysts in surface sediments from transects across the
Polar Front offshore southeast Greenland. J. Quat. Sci. 16 (7), 661–666.
Brenner, W.W., 2005. Holocene environmental history of the Gotland Basin (Baltic Sea) —
a micropalaeontological model. Palaeogeogr. Palaeoclimatol. Palaeoecol. 220 (3–4),
227–241.
Burman, P., Chow, E., Nolan, D., 1994. A cross-validatory method for dependent data.
Biometrika 81 (2), 351–358.
Bütschli, O., 1885. Erster Band. Protozoa. Dr. H.G. Bronn's Klassen und Ordnungen des
Thier-Reiches, wissenschaftlich dargestellt in Wort und Bild. C.F. Winter'sche
Verlagshandlung, Leipzig and Heidelberg, pp. 865–1088.
Çağatay, M.N., Görür, N., Algan, O., Eastoe, C., Tchapalyga, A., Ongan, D., Kuhn, T., Kuşcu, I.,
2000. Late Glacial–Holocene palaeoceanography of the Sea of Marmara: timing of connections with the Mediterranean and the Black Seas. Mar. Geol. 167 (3–4), 191–206.
Cassie, R.M., Michael, A.D., 1968. Fauna and sediments of an intertidal mud flat: a multivariate analysis. J. Exp. Mar. Biol. Ecol. 2 (1), 1–23.
Chatfield, C., 1995. Model uncertainty, data mining and statistical inference. J. R. Stat. Soc.
Ser. A 158, 419–466.
Chevan, A., Sutherland, M., 1991. Hierarchical partitioning. Am. Stat. 45 (2), 90–96.
Claparède, É., Lachmann, J., 1859. Études sur les infusoires et les rhizopodes. Institut national génevois, Mémoires 6, 261–482 (pl. 14–24. [Imprinted 1858.]).
Clarke, A.L., Weckström, K., Conley, D.J., Anderson, N.J., Adser, F., Andrén, E., de Jonge, V.N.,
Ellegaard, M., Juggins, S., Kauppila, P., Korhola, A., Reuss, N., Telford, R.J., Vaalgamaa, S.,
2006. Long-term trends in eutrophication and nutrients in the coastal zone. Limnol.
Oceanogr. 51 (1, part 2), 385–397.
Dale, B., 1985. Dinoflagellate cyst analysis of Upper Quaternary sediments in core GIK
15530-4 from the Skagerrak. Nor. Geol. Tidsskr. 65 (1–2), 97–102.
Dale, B., 1996. Dinoflagellate cysts ecology: modelling and geological applications. In:
Jansonius, J., McGregor, D.C. (Eds.), Palynology: Principles and Applications, vol. 3.
American Association of Stratigraphic Palynologists Foundation, Dallas, TX,
pp. 1249–1275.
Dale, M.R.T., Zbigniewicz, M.W., 1997. Spatial pattern in boreal shrub communities: effects of a peak in herbivore density. Can. J. Botany 32, 1342–1348.
Dale, B., Thorsen, T.A., Fjellså, A., 1999. Dinoflagellate cysts as indicators of cultural eutrophication in the Oslofjord, Norway. Estuar. Coast. Shelf Sci. 48 (3), 371–382.
Dodge, J.D., 1989. Some revisions of the family Gonyaulacaceae (Dinophyceae) based on a
scanning electron microscope study. Bot. Mar. 32, 275–298.
Ellegaard, M., 2000. Variations in dinoflagellate cyst morphology under conditions of
changing salinity during the last 2000 years in the Limfjord, Denmark. Rev.
Palaeobot. Palynol. 109 (1), 65–81.
Elshanawany, R., Zonneveld, K., Ibrahim, M.I., Kholeif, S.E.A., 2010. Distribution patterns of
recent organic-walled dinoflagellate cysts in relation to environmental parameters in
the Mediterranean Sea. Palynology 34 (2), 233–260.
Fielding, S.R., Herrle, J.O., Bollmann, J., Worden, R.H., Montagnes, D.J.S., 2009. Assessing the
applicability of Emiliania huxleyi coccolith morphology as a sea-surface salinity proxy.
Limnol. Oceanogr. 54 (5), 1475–1480.
Fleishman, E., Mac Nally, R., Murphy, D.D., 2005. Relationships among non-native plants,
diversity of plants and butterflies, and adequacy of spatial sampling. Biol. J. Linn. Soc.
85 (2), 157–166.
Fortin, M.-J., Dale, M., 2005. Spatial Analysis: A Guide for Ecologists. Cambridge University
Press, Cambridge, U.K. (365 pp.).
Fox, J., 1991. Regression Diagnostics. Sage Publications Inc., Newbury Park, CA (Chapter 1).
Fox, J., Weisberg, S., 2011. An R Companion to Applied Regression. Second edition. Sage
Publications Inc., Thousand Oaks, CA (Chapter 3).
Grøsfjeld, K., Harland, R., 2001. Distribution of modern dinoflagellate cysts from inshore
areas along the coast of southern Norway. J. Quat. Sci. 16 (7), 651–659.
Guisan, A., Zimmermann, N.E., 2000. Predictive habitat distribution models in ecology.
Ecol. Model. 135, 147–186.
Gundersen, N., 1988. En palynologisk underslogisk av dinoflagellatcyster langs en
synkende salinitetsgradient i recente sedimenter fra Østersjn-området. (Cand. Scient.
thesis) University of Oslo (96 pp.).
Hallett, R.I., 1999. Consequences of Environmental Change on the Growth and Morphology of Lingulodinium polyedrum (Dinophyceae) in Culture. (Ph.D. thesis) University
of Westminster, U.K. (109 pp.).
Head, M.J., 2007. Last Interglacial (Eemian) hydrographic conditions in the southwestern
Baltic Sea based on dinoflagellate cysts from Ristinge Klint, Denmark. Geol. Mag. 144
(6), 987–1013.
Kirci-Elmas, E., Algan, O., Özgar-Öngen, I., Struck, U., Altenbach, A.V., Sagular, E.K., Nazik,
A., 2008. Palaeoenvironmental investigation of sapropelic sediments from the
Marmara Sea: a biostratigraphic approach to palaeoceanographic history during the
Last Glacial–Holocene. Turk. J. Earth Sci. 17, 129–168.
Konieczny, R., 1983. En miljørettet palynologisk analyse av dinoflagellat-cyster i
resente marine sedimenter fra Skagerrak. (Cand. Scient. thesis) University of
Oslo (106 pp.).
Kotthoff, U., Pross, J., Müller, U.C., Peyron, O., Schmiedl, G., Schulz, H., Bordon, A., 2008.
Climate dynamics in the borderlands of the Aegean Sea during formation of sapropel
S1 deduced from a marine pollen record. Quat. Sci. Rev. 27 (7–8), 832–845.
Kullback, S., Leibler, R.A., 1951. On information and sufficiency. Ann. Math. Stat. 22 (1),
79–86.
Kunz-Pirrung, M., 2001. Dinoflagellate cyst assemblages in surface sediments of the
Laptev Sea region (Arctic Ocean) and their relationship to hydrographic conditions.
J. Quat. Sci. 16 (7), 637–649.
Ladouceur, S., 2007. Évaluation des changements hydrographiques de la Baie d'Hudson et
du Bassin de Foxe au cours des derniers siècles à partir de traceurs palynologiques et
micropaleontologiques. (M.Sc. Thesis) Université du Québec à Montréal (79 pp.).
Legendre, P., Dale, M.R.T., Fortin, M.-J., Gurevitch, J., Hohn, M., Myers, D., 2002. The consequences of spatial structure for the design and analysis of ecological field surveys.
Ecography 25 (5), 601–615.
Leroy, V., 2001. Traceurs palynologiques des flux biogéniques et des conditions
hydrographiques en milieu marin côtier: exemple de l'étang de Berre. DEA, Ecole
doctorale Sciences de l'environnement d'Aix-Marseille (30 pp.).
Levac, E., 2002. High Resolution Palynological Records from Atlantic Canada: Regional
Holocene Paleoceanographic and Paleoclimatic History. (Ph.D. thesis) Dalhousie
University, Halifax, Nova Scotia (862 pp.).
Locarnini, R.A., Mishonov, A.V., Antonov, J.I., Boyer, T.P., Garcia, H.E., 2006. World ocean
atlas 2005, volume 1: temperature. In: Levitus, S. (Ed.), NOAA Atlas NESDIS 61. U.S.
Government Printing Office, Washington, D.C. (182 pp.).
Mac Nally, R., 2000. Regression and model-building in conservation biology, biogeography and ecology: the distinction between — and reconciliation of — ‘predictive’ and
‘explanatory’ models. Biodivers. Conserv. 9 (5), 655–671.
Mac Nally, R., 2002. Multiple regression and inference in ecology and conservation biology: further comments on identifying important predictor variables. Biodivers.
Conserv. 11 (8), 1397–1401.
Mangin, S., 2002. Distribution actuelle des kystes de dinoflagellés en Méditerranée
occidentale et application aux fonctions de transfert. Memoir of DEA. University of
Bordeaux 1 (34 pp.).
Marret, F., Scourse, J., 2002. Control of modern dinoflagellate cyst distribution in the Irish and
Celtic seas by seasonal stratification dynamics. Mar. Micropaleontol. 47 (1–2), 101–116.
Marret, F., Eiríksson, J., Knudsen, K.L., Turon, J.-L., Scourse, J., 2004. Distribution of dinoflagellate cyst assemblages in surface sediments from the northern and western shelf of
Iceland. Rev. Palaeobot. Palynol. 128 (1–2), 35–53.
Matthiessen, J., 1995. Distribution patterns of dinoflagellate cysts and other organic-walled microfossils in recent Norwegian–Greenland Sea sediments. Mar.
Micropaleontol. 24 (3–4), 307–334.
Mercado, J.M., Ramírez, T., Cortés, D., Sebastián, M., Vargas-Yáñez, M., 2005. Seasonal and
inter-annual variability of the phytoplankton communities in an upwelling area of
the Alboran Sea (SW Mediterranean Sea). Sci. Mar. 69 (4), 451–465.
Mertens, K.N., Ribeiro, S., Bouimetarhan, I., Caner, H., Combourieu Nebout, N., Dale, B., de
Vernal, A., Ellegaard, M., Filipova, M., Godhe, A., Goubert, E., Grøsfjeld, K., Holzwarth,
U., Kotthoff, U., Leroy, S.A.G., Londeix, L., Marret, F., Matsuoka, K., Mudie, P.J., Naudts,
L., Peña-Manjarrez, J.L., Persson, A., Popescu, S.-M., Pospelova, V., Sangiorgi, F., van der
Meer, M.T.J., Vink, A., Zonneveld, K.A.F., Vercauteren, D., Vlassenbroeck, J., Louwye, S.,
2009. Process length variation in cysts of a dinoflagellate, Lingulodinium
machaerophorum, in surface sediments: investigating its potential as salinity
proxy. Mar. Micropaleontol. 70 (1–2), 54–69.
Mertens, K.N., Dale, B., Ellegaard, M., Jansson, I.-M., Godhe, A., Kremp, A., Louwye, S., 2011.
Process length variation in cysts of the dinoflagellate Protoceratium reticulatum, from
surface sediments of the Baltic–Kattegat–Skagerrak estuarine system: a regional salinity proxy. Boreas 40 (2), 242–255.
Mertens, K.N., Bringué, M., van Nieuwenhove, N., Takano, Y., Pospelova, V., Rochon, A., de
Vernal, A., Radi, T., Dale, B., Patterson, R.T., Weckström, K., Andrén, E., Louwye, S.,
Matsuoka, K., 2012a. Process length variation of the cysts of the dinoflagellate
Protoceratium reticulatum in the North Pacific and Baltic–Skagerrak region: calibration
as an annual density proxy and first evidence of pseudo-cryptic speciation. J. Quat. Sci.
27 (7), 734–744.
Mertens, K.N., Bradley, L.R., Takano, Y., Mudie, P.J., Marret, F., Aksu, A.E., Hiscott, R.N.,
Verleye, T.J., Mousing, E.A., Smyrnova, L.L., Bagheri, S., Mansor, M., Pospelova, V.,
Matsuoka, K., 2012b. Quantitative estimation of Holocene surface salinity variation
in the Black Sea using dinoflagellate cyst process length. Quat. Sci. Rev. 39, 45–59.
Olea, P.P., Mateo-Tomás, P., de Frutos, Á., 2010. Estimating and modelling bias of the
hierarchical partitioning public-domain software: implications in environmental
management and conservation. PLoS ONE 5, e11698.
Paez-Reyes, M., Head, M.J., 2013. The Cenozoic gonyaulacacean dinoflagellate genera
Operculodinium Wall, 1967 and Protoceratium Bergh, 1881 and their phylogenetic
relationships. J. Paleontol. 87, 786–803.
Persson, A., Godhe, A., Karlson, B., 2000. Dinoflagellate cysts in recent sediments from the
west coast of Sweden. Bot. Mar. 43 (1), 69–79.
Psarra, S., Tselepides, A., Ignatiades, L., 2000. Primary productivity in the oligotrophic
Cretan Sea (NE Mediterranean): seasonal and interannual variability. Prog. Oceanogr.
46 (2–4), 187–204.
R Development Core Team, 2011. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0
(URL http://www.R-project.org/).
Reichert, P., Omlin, M., 1997. On the usefulness of overparameterized ecological models.
Ecol. Model. 95 (2–3), 289–299.
Ribeiro, S., Moros, M., Ellegaard, M., Kuijpers, A., 2012. Climate variability in West Greenland during the past 1500 years: evidence from a high-resolution marine palynological record from Disko Bay. Boreas 41 (1), 68–83.
Röder, K., Hantzsche, F.M., Gebühr, C., Miene, C., Helbig, T., Krock, B., Hoppenrath, M.,
Luckas, B., Gerdts, G., 2012. Effects of salinity, temperature and nutrients on growth,
cellular characteristics and yessotoxin production of Protoceratium reticulatum.
Harmful Algae 15, 59–70.
Sangiorgi, F., Fabbri, D., Comandini, M., Gabbianelli, G., Tagliavini, E., 2005. The
distribution of sterols and organic-walled dinoflagellate cysts in surface sediments of the Northwestern Adriatic Sea (Italy). Estuar. Coast. Shelf Sci. 64 (2–3),
395–406.
Schroeder, L.D., Sjoquist, D.L., Stephan, P.E., 1986. Understanding regression analysis:
an introductory guide. Sage University Paper Series on Quantitative Applications in the Social Sciences. Sage Publications Inc., Newbury Park, CA 07–057
(Chapter 5).
Sprangers, M., Dammers, N., Brinkhuis, H., van Weering, T.C.E., Lotter, A.F., 2004. Modern
organic-walled dinoflagellate cyst distribution offshore NW Iberia; tracing the upwelling system. Rev. Palaeobot. Palynol. 128 (1–2), 97–106.
Stein, F.R. von, 1883. Der Organismus der Infusionsthiere nach eigenen Forschungen in
systematischer Reihenfolge bearbeitet. III Abteilung. II. Hälfte. Die Naturgeschichte
der arthrodelen Flagellaten. Wilhelm Engelmann, Leipzig (30 pp., 25 pl).
Thorsen, T.A., Dale, B., 1998. Climatically influenced distribution of Gymnodinium
catenatum during the past 2000 years in coastal sediments of southern Norway.
Palaeogeogr. Palaeoclimatol. Palaeoecol. 143 (1–3), 159–177.
Thorsen, T.A., Dale, B., Nordberg, K., 1995. ‘Blooms’ of the toxic dinoflagellate Gymnodinium
catenatum as evidence of climatic fluctuations in the late Holocene of southwestern
Scandinavia. The Holocene 5 (4), 435–446.
Verleye, T.J., Mertens, K.N., Young, M.D., Dale, B., McMinn, A., Scott, L., Zonneveld, K.A.F.,
Louwye, S., 2012. Average process length variation of the marine dinoflagellate cyst
Operculodinium centrocarpum in the tropical and Southern Hemisphere Oceans:
assessing its potential as a palaeosalinity proxy. Mar. Micropaleontol. 86–87, 45–58.
Wall, D., Dale, B., Harada, K., 1973. Descriptions of new fossil dinoflagellates from the late
Quaternary of the Black Sea. Micropaleontology 19 (1), 18–31.
Walsh, C., Mac Nally, R., 2008. hier.part: hierarchical partitioning. R package version 1.0-3.
Willumsen, P.S., Filipsson, H.L., Reinholdsson, M., Lenz, C., 2013. Surface salinity and nutrient
variations during the Littorina Stage in the Fårö Deep, Baltic Sea. Boreas 42, 210–223.
Zonneveld, K.A.F., Marret, F., Versteegh, G.J.M., Bogus, K., Bonnet, S., Bouimetarhan, I.,
Crouch, E., de Vernal, A., Elshanawany, R., Edwards, L., Esper, O., Forke, S., Grøsfjeld,
K., Henry, M., Holzwarth, U., Kielt, J.-F., So-Young, K., Ladouceur, S., Ledu, D., Chen,
L., Limoges, A., Londeix, L., Lu, S.-H., Mahmoud, M.S., Marino, G., Matsouka[sic], K.,
Matthiessen, J., Mildenhal[sic], D.C., Mudie, P., Neil, H.L., Pospelova, V., Qi, Y., Radi,
T., Richerol, T., Rochon, A., Sangiorgi, F., Solignac, S., Turon, J.-L., Verleye, T., Wang,
Y., Wang, Z., Young, M., 2013. Atlas of modern dinoflagellate cyst distribution based
on 2405 datapoints. Rev. Palaeobot. Palynol. 191, 1–197.