Palaeogeography, Palaeoclimatology, Palaeoecology 399 (2014) 202–213

Journal homepage: www.elsevier.com/locate/palaeo

Statistically assessing the correlation between salinity and morphology in cysts produced by the dinoflagellate Protoceratium reticulatum from surface sediments of the North Atlantic Ocean, Mediterranean–Marmara–Black Sea region, and Baltic–Kattegat–Skagerrak estuarine system

Ida-Maria Jansson a,⁎, Kenneth Neil Mertens b, Martin J. Head a,c

a Department of Earth Sciences, Earth Sciences Centre, University of Toronto, 22 Russell St., Toronto, Ontario M5S 3B1, Canada
b Research Unit for Palaeontology, Ghent University, Krijgslaan 281 S8, 9000 Ghent, Belgium
c Department of Earth Sciences, Brock University, 500 Glenridge Avenue, St. Catharines, Ontario L2S 3A1, Canada

With contributions from: Anne de Vernal a, Laurent Londeix b, Fabienne Marret c, Jens Matthiessen d, Francesca Sangiorgi e

a GEOTOP, Université du Québec à Montréal, P.O. Box 8888, Montréal, Québec H3C 3P8, Canada
b Département de Géologie et Océanographie, UMR 5805 CNRS, Université Bordeaux 1, avenue de Facultés, 33405 Talence cedex, France
c School of Environmental Sciences, University of Liverpool, Liverpool L69 7ZT, UK
d Alfred Wegener Institute for Polar and Marine Research (AWI), 27515 Bremerhaven, Germany
e Department of Earth Science, Faculty of Geosciences, Laboratory of Palaeobotany and Palynology, Budapestlaan 4, 3584 CD Utrecht, The Netherlands

Article info

Article history: Received 31 May 2013; received in revised form 8 January 2014; accepted 18 January 2014; available online 25 January 2014

Keywords: North Atlantic; Model development; Hierarchical partitioning; Collinearity; Operculodinium centrocarpum

Abstract

Recent studies have correlated dinoflagellate resting cyst morphology to salinity and density variations in the water column, suggesting that morphology can be used for paleoceanographic reconstructions. However, the univariate statistics used by these studies are appropriate only where morphology is related to a single variable. Density is a function of salinity and temperature, so more advanced statistical methods are needed to understand which parameters affect morphology. In this study based on surface (coretop) sediments, a set of environmental variables (sea-surface salinity, temperature, nitrate, and phosphate) was simultaneously correlated to morphological variations seen in resting cysts produced by the dinoflagellate Protoceratium reticulatum (=Operculodinium centrocarpum sensu Wall and Dale). Approximately 3200 measurements were obtained from the North Atlantic Ocean and used to generate a working model based on the Akaike information criterion. Hierarchical partitioning was then applied to establish the independent and joint effects for each predictor variable.
Results from these analyses showed that while salinity constitutes the dominant variable affecting process length in the cysts of P. reticulatum in the North Atlantic, it is not the sole explanatory variable and that multicollinearity exists. Temperature and nutrients also showed a significant relationship to the morphology, requiring multiple regression to construct a representative model. The applicability of the North Atlantic working model was finally evaluated by comparing the results to data from the Mediterranean, Marmara, and Black seas, and Baltic–Kattegat–Skagerrak estuarine system. This comparison showed regional differences in morphological–environmental correlation. While salinity constitutes the most important explanatory factor in both the North Atlantic and Baltic–Skagerrak system, this is not so for the Mediterranean–Black Sea region where temperature is the dominant variable. It is concluded that a predictive salinity model based on P. reticulatum cyst morphology has at best a regional application.

© 2014 Elsevier B.V. All rights reserved.

⁎ Corresponding author. E-mail address: [email protected] (I.-M. Jansson).
0031-0182/$ – see front matter © 2014 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.palaeo.2014.01.012

1. Introduction

Reliable reconstructions of past oceanographic conditions require dependable proxies for seawater conditions, in particular salinity. In pursuit of such a proxy, sea-surface salinity (SSS) has been related to changes in the morphology of both coccolithophores (e.g. Fielding et al., 2009) and dinoflagellate cysts (e.g. Mertens et al., 2009, 2011, 2012a,b; Verleye et al., 2012). The original idea of salinity affecting dinoflagellate cyst morphology (Wall et al., 1973) has recently come to focus on process length variation observed in resting cysts produced by Lingulodinium polyedrum (Stein, 1883) Dodge, 1989 and Protoceratium reticulatum (Claparède and Lachmann, 1859) Bütschli, 1885. Culture studies suggest that L. polyedrum is not suitable for predictive SSS models as the morphology is also affected by temperature (Hallett, 1999). Culture studies performed on P. reticulatum suggest that salinity influences cell size (Röder et al., 2012), but no conclusive results have been obtained to relate cyst morphology with salinity.

The cysts of Protoceratium reticulatum, known also by the now superfluous paleontological name Operculodinium centrocarpum sensu Wall and Dale (Paez-Reyes and Head, 2013), have attracted attention as potential paleosalinity indicators owing to their cosmopolitan distribution, generally high abundance, and broad salinity tolerance (Zonneveld et al., 2013). Until recently, morphological studies have focused on semi-quantitative reconstructions using cysts recovered from sediments (Brenner, 2005; Dale, 1996; Ellegaard, 2000; Head, 2007). However, in a more recent study focusing on the Baltic–Kattegat–Skagerrak estuarine system, linear correlation was used to relate the morphology of P. reticulatum cysts to seasonal variations in surface salinity (S), temperature (T), and density (D) (Mertens et al., 2011). This resulted in a transfer function in which process length was correlated to summer salinity (R² = 0.8), a model subsequently used to reconstruct salinity for the early through late Holocene of the Baltic Sea (Willumsen et al., 2013). Subsequently, similar studies conducted outside the Baltic–Skagerrak system have identified sea-surface density as responsible for process length variation in this species (Mertens et al., 2012a; Verleye et al., 2012). Although these studies support the original idea of a connection between salinity and cyst morphology, a more in-depth understanding of the relationship is required before a reliable transfer function can be applied.
To develop a reliable proxy based on process length and salinity, a firm evaluation of the relationship between cyst morphology and potentially influential oceanographic variables is required (Birks, 1995). This requirement was not fulfilled in the previous studies as they had relied on simple regressions, individually relating process length to each predictor variable. Such an approach is questionable because biological variables are often affected by numerous environmental parameters. Assuming that process length variation is influenced by more than one variable, simple regression cannot provide a realistic view of the problem at hand.

The present study is based on a regionally constrained calibration, using North Atlantic core-top material to evaluate the individual and combined effects of multiple environmental variables on process length. Since process length constitutes the only dependent variable, a combination of methods was used to assess its correlation to SSS, sea-surface temperature (SST), nitrate, and phosphate. First, the importance of salinity was addressed by conducting a model selection using the Akaike information criterion (AIC; Akaike, 1974). Second, as environmental variables are often correlated, the explanatory variables were assessed using hierarchical partitioning (HP) to avoid naïve interpretations of the outcomes of the univariate and multiple regressions (Chevan and Sutherland, 1991; Mac Nally, 2002). Results from the North Atlantic data set (Fig. 1a) were finally compared to a data set from the Mediterranean–Marmara–Black Sea region (Fig. 1b) and another from the Baltic–Kattegat–Skagerrak estuarine system (Mertens et al., 2011; Fig. 1c).

2. Material and methods

2.1. Sample and data collection

In contrast to the Baltic–Kattegat–Skagerrak estuarine system, where previously collected data (Mertens et al., 2011, 2012a) were available for re-analysis, cyst measurements were required from the North Atlantic Ocean and Mediterranean–Marmara–Black Sea region. A total of 138 microscope slides (from Konieczny, 1983; Dale, 1985; Gundersen, 1988; Matthiessen, 1995; Thorsen et al., 1995; Thorsen and Dale, 1998; Dale et al., 1999; Çağatay et al., 2000; Persson et al., 2000; Boessenkool et al., 2001; Grøsfjeld and Harland, 2001; Kunz-Pirrung, 2001; Leroy, 2001; Levac, 2002; Mangin, 2002; Marret and Scourse, 2002; Marret et al., 2004; Sprangers et al., 2004; Sangiorgi et al., 2005; Clarke et al., 2006; Ladouceur, 2007; Kirci-Elmas et al., 2008; Kotthoff et al., 2008; Mertens et al., 2009, 2012a; Elshanawany et al., 2010; Ribeiro et al., 2012) were obtained for this purpose. All samples were collected using box and multicorers and were retrieved during multiple cruises in the North Atlantic Ocean and the Mediterranean–Marmara–Black Sea region. Samples were exclusively limited to the top 2 cm of sediment, and were thus considered modern and representative of present-day conditions. In that sense, it was also assumed that the explanatory variables remained stable during the time of cyst accumulation.

Fig. 1. Distribution of samples analysed in the present study: a) the North Atlantic Ocean; b) the Mediterranean, Marmara, and Black seas; and c) the Baltic Sea, Kattegat, and Skagerrak. All maps were generated using clim.pact (Benestad, 2009), a coastline drawing function developed for R. The sample size, n, is given for each region.
[Map panels: a) North Atlantic Ocean, n = 64; b) Mediterranean–Marmara–Black seas, n = 26; c) Baltic–Kattegat–Skagerrak system, n = 82.]

Most cysts were extracted from the sediments using similar standard palynological methods, i.e. based on hydrochloric and hydrofluoric acids, sieving, and in some cases ultrasound. Slides were prepared by mounting aliquots of residue in glycerine jelly. Measurements were performed by K.N.M. (n = 72) and I.-M.J. (n = 18). All measurements by K.N.M. were conducted using an Olympus BX51 microscope with a Nikon digital sight DS-1L 1 module, a Nikon eclipse 80i microscope and coupled DS Camera Head (DSFi1)/DS Camera Control Unit DS-L2, and a Leica DM 5000 B microscope equipped with a Leica DFC490 camera and Leica Application Suite 2.8.1 software. Measurements by I.-M.J. were conducted using a Zeiss Imager Z1m microscope in combination with an Olympus XC10 Axio Imager camera and the image analysis software analySIS. A 100× oil immersion objective was used for all measurements.

Following the approach previously developed by Mertens et al. (2009, 2011), the three longest visible peripheral processes on each cyst of Protoceratium reticulatum were measured (and then averaged) together with the cyst's maximum central body diameter. Measurements were obtained from no fewer than 20 cysts per sample, and up to 50 specimens when possible. Reproducibility was tested by the two observers by duplicating measurements on 12 of the samples; comparisons of results showed both sets of measurements to be within 0.5 μm of each other. This is the same reproducibility as determined in another recent study of P. reticulatum process length (Head, 2007). Cysts without processes were not included in the analysis due to identification difficulties.

2.2. Test for spatial patterns

Parametric statistics assume that observations are independent.
However, ecological data often violate this assumption by showing spatial dependence in terms of trends, gradients, patchiness, and random fluctuations. Spatial dependence means that the dependent variable/error term of a certain location is correlated with the same observations obtained at adjacent locations. This is a concern, as spatial autocorrelation (SAC) modifies the effective sample size, thereby introducing the risk of an inflated correlation and type I errors (Legendre et al., 2002). In the present study, two types of spatial dependence (Fortin and Dale, 2005) could affect the morphological differences seen in Protoceratium reticulatum cysts. First, SAC could occur as a result of constraints on the organism (mobility, dispersal) if there are genetically distinct populations. This type of spatial dependence is referred to as inherent autocorrelation and should not be confused with induced spatial dependence. Here, the process length of P. reticulatum cysts would have a functional dependence on an underlying variable (e.g. salinity) that is autocorrelated in itself. One type of SAC does not exclude the other, so in many cases inherent as well as induced SAC will affect the data set. In order to present results that are not overly optimistic, or directly misleading, it is crucial that spatial autocorrelation is acknowledged and addressed. Ideally, the problem should be faced and dealt with as part of the sample design. In this scenario, sample locations should be selected randomly so that the samples are distributed independently from one another. As simple as this might sound, it is not always possible and the distribution of samples used for the present study was not actively determined prior to the sampling. This is a sombre reality for many researchers due to the high cost of collecting deep-ocean sediment samples and/or limited access to material. However, this limitation makes it even more vital to acknowledge SAC and to work towards a solution. 
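Spatial autocorrelation of the kind described above is commonly summarised with Moran's I computed over a spatial weight matrix. The following is a minimal sketch only, not the authors' R workflow: it assumes planar coordinates and simple 1/d weights, and the function names `inverse_distance_weights` and `morans_i` as well as the toy coordinates are hypothetical. A real analysis of ocean samples would use great-circle distances and a significance test.

```python
import math


def inverse_distance_weights(coords):
    """Build an n x n matrix of 1/d weights with a zero diagonal.

    Assumption: coords are planar (x, y) pairs and no two sites coincide,
    since a zero distance would cause a division by zero.
    """
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] = 1.0 / math.dist(coords[i], coords[j])
    return weights


def morans_i(values, weights):
    """Moran's I: (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i ** 2)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * (cross / sum(d * d for d in dev))


# Hypothetical toy layout: two tight pairs of sites far apart on a line.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
w = inverse_distance_weights(coords)
clustered = morans_i([1.0, 1.0, 5.0, 5.0], w)    # neighbours alike
alternating = morans_i([1.0, 5.0, 1.0, 5.0], w)  # neighbours unlike
print(clustered, alternating)
```

Values of I above the null expectation of -1/(n - 1) point to clustering of similar values among nearby sites, and values below it point to dispersion.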
As the present study is based on a sample set that covers a large region, and genetic variability might be present, spatial clustering is most likely a reality. If SAC is a factor, then a number of solutions can be implemented. The simplest is to acknowledge the SAC and then use a 1% significance level instead of 5% (e.g. Dale and Zbigniewicz, 1997). Areas can also be divided based on their degree of dissimilarity, or adjacent locations with similar values can be grouped by generating spatial clusters (Fortin and Dale, 2005).

For the present study, samples were grouped to generate a more realistic representation of the actual sample size. As part of this process, the degree of dependency between the sample sites was first evaluated by calculating the inverse distance weights (IDW) for all samples using the free statistical language and environment R (also referred to as “GNU S”, R Development Core Team, 2011) and the package “ape”. In IDW, spatially close samples are assumed to be more similar than samples further away. On a continuous surface, all samples will be related to some extent, but they will not carry the same weight (for more on this subject and R, see Bivand et al., 2008). An individual sample will exert a local influence manifested as high weights in spatially close samples. As this influence decreases with increasing distance, lower values are generated and the sample can be considered as belonging to a more random distribution.

In this study, the IDW was calculated for the North Atlantic and Mediterranean–Marmara–Black Sea samples. In each case, all samples that generated a distance weight above one were analysed further to determine whether they should be kept as independent samples or merged with a close neighbour. The decision to merge samples was made conservatively, and only samples less than 15 km apart and showing a process length difference below 0.5 μm (i.e. below the limit of reproducibility) were combined and counted as a single data point. The SAC was then measured by conducting a Moran's I test to show whether SAC was present after the merging of the adjacent samples.

2.3. Environmental variables

The North Atlantic set of environmental variables consisted of four predictors, all selected to test whether they affected process length on a microclimatic scale or through nutrient availability. These predictors, which included surface salinity and temperature as well as phosphate and nitrate, were all considered direct variables (Austin, 1980). Unlike previous studies (e.g. Mertens et al., 2011; Verleye et al., 2012), temperature and salinity functions, such as density and salinity/temperature (S/T) ratios, were not included. The same four variables were also obtained for the Mediterranean and Baltic analyses. Annual, seasonal, as well as monthly data were acquired from each region to provide a better understanding of the observed process length variability. All environmental data were obtained from the World Ocean Atlas (Antonov et al., 2006; Locarnini et al., 2006) and processed in Ocean Data View (http://odv.awi.de).

Process length was also correlated to five additional variables: oxygen, silicate, and mixed layer depth (MLD) as well as photosynthetically active radiation (PAR) and chlorophyll a. Whereas oxygen, silicate and MLD were all obtained using the same methods as the main parameters, PAR and chlorophyll a were obtained from SeaWiFS (using the NASA GSFC website http://oceancolor.gsfc.nasa.gov/) and processed using the software solution ENVI. These variables have not been linked previously with morphological variation in Protoceratium reticulatum cysts, but the correlation was performed to determine whether they might nonetheless be influential.

3. Numerical analysis

3.1. 
Exploratory data analysis and transformation

Prior to the model selection, an exploratory data analysis (EDA) was conducted to gain information about the relationships between the response and predictor variables (Fox and Weisberg, 2011). For the first part of the EDA, a Shapiro–Wilk test was conducted to establish whether process length data were normally distributed. The same was done for the independent variables to establish whether the data needed to be transformed before continuing the analysis. This was followed by the construction of a quantile–quantile (Q–Q) plot to visually render the process length distribution against the theoretical reference distribution. All environmental data were standardised to facilitate comparisons between sites and variables. Z-scores were generated for each data point by first calculating the mean and standard deviation for each variable. The mean was then subtracted from each individual data point and the resulting difference was divided by the corresponding standard deviation (Cassie and Michael, 1968). The EDA, as well as all following statistical analyses, were performed using R.

3.2. Model selection

Following the approach used in earlier studies of Protoceratium reticulatum cysts, the direct relationship between process length and salinity was examined first. This was done to determine the method's accuracy by comparing the outcome to the results later obtained from the multiple regression models. All multivariate models were developed using model selection, meaning that an optimal working model is identified by actively selecting those parameters to include and exclude. Model selection is necessary as it helps to identify irrelevant variables, which can then be excluded. This is important because if the insignificant variables are left within the model, their inclusion can result in overfitted models with inflated coefficients, standard errors, and compromised R²-values (Schroeder et al., 1986; Mac Nally, 2000). For the present study, the model selection was based on AIC, which in turn is based on Kullback–Leibler information (Kullback and Leibler, 1951), and provides a means to select the best model from the full set of models generated by the data. This helps to avoid type I errors as well as models with artificially high R²-values.

3.3. Diagnostic analysis and collinearity assessment

While both univariate and multiple linear least-squares regressions are often used for hypothesis testing in ecology, it is important to acknowledge that the methods frequently make unrealistic assumptions about the data structure (Fox, 1991). Using the package “car” in R, a diagnostic analysis was conducted to assess problems such as nonlinearity, nonconstant error variance, and collinearity (Fox and Weisberg, 2011). Linearity was first assessed from Pearson residual plots to evaluate whether the linear models were correctly specified. To detect nonconstant error variance, the studentized residuals were plotted against the fitted values. Collinearity occurs when variables are inter-correlated, which is often the case for environmental variables. In practice, the apparent importance of each variable decreases where multicollinearity exists, and opposing variables can effectively cancel each other (Mac Nally, 2000). Since AIC does not address this problem, the impact of collinearity needed to be addressed in a separate analysis.

There are a few different ways to handle collinearity, of which model respecification involves combining correlated regressors or simply removing one of two inter-correlated factors after some type of prior screening. However, the problem with this approach is that one can inadvertently delete the variable that in fact has the highest effect on the dependent variable (Mac Nally, 2000). In the present study, a Pearson correlation matrix was constructed for the initial screening, but no variables were removed based on these results. Collinearity was then confirmed by calculating the variance inflation factor (VIF) for each model predictor (Fox, 1991; Fox and Weisberg, 2011).

Collinearity was finally assessed more thoroughly by applying hierarchical partitioning (HP; Chevan and Sutherland, 1991). In HP analysis, all models that can be generated by the selected variables are evaluated to find those factors that influence the dependent variable (Chevan and Sutherland, 1991). While the method does not provide a predictive equation, it is capable of partitioning variance, thereby making it possible to separate the variable's independent (I) and joint (J) effects (Chevan and Sutherland, 1991). As long as the data set consists of no more than nine explanatory variables, HP analysis allows reliable ranking of the independent variables based on their importance (Olea et al., 2010). From the ranking and the I-values, one can then establish what variable has the strongest independent effect on the dependent variable. I-values above 0.15 have been suggested to indicate an important relationship between the explanatory variable and the response variable, whereas values below 0.05 have been categorised as unimportant (Fleishman et al., 2005). All I- and J-values were calculated using a randomization routine to establish the statistical significance of each variable (Mac Nally, 2002; Walsh and Mac Nally, 2008).

4. Results

All samples containing fewer than 20 measurable cysts were excluded from the analysis. In addition, two samples that did not follow the normal preparation standards (Boessenkool et al., 2001) were also removed. After evaluating the IDW results, eight of the North Atlantic samples were merged into four. This resulted in a final data set of 64 samples for the North Atlantic (Fig. 1a), equal to approximately 3200 measurements. A Moran's I analysis was then conducted, showing a p-value of 0 and thereby indicating a random spatial pattern. The North Atlantic data set covered an annual salinity gradient between 27.8 and 36.0, and the average process length varied between 6.3 and 9.7 μm (mean = 8.0 μm, stdv. = 0.7 μm). For the comparative analysis, 26 of the Mediterranean–Marmara–Black Sea samples fulfilled the measurement standards (Fig. 1b; mean = 8.65 μm, stdv. = 1.11 μm). Regarding the Baltic–Kattegat–Skagerrak set, the complete analysis was based on re-analysing the results from 82 samples previously obtained for similar studies (Mertens et al., 2011, 2012a; Fig. 1c; mean = 6.28 μm, stdv. = 2.18 μm), with all samples fulfilling the measurement standards. Sample information for the North Atlantic, Mediterranean, and Baltic data sets is given in Supplementary Table 1. All analyses were performed using annual sea-surface means before seasonal and monthly data sets were considered. A full breakdown of HP results calculated before model reduction is available as Supplementary Tables 2–4.

4.1. North Atlantic analysis

As the first step of the EDA, the Shapiro–Wilk normality test established that the distribution of process lengths in the sample set was normal (W = 0.9941, p-value = 0.9905). This result was confirmed by plotting the process length distribution against the reference distribution in a Q–Q plot (Fig. 2).

Fig. 2. Normal quantile-comparison plot for Protoceratium reticulatum process lengths (in μm) in the North Atlantic. The samples follow the fitted line, demonstrating that process lengths follow a normal distribution.

From the univariate regression of all variables against process length (Fig. 3), the highest significant (p < 0.01) positive relationship was established between annual surface salinity and process length (Fig. 3a). From this regression, annual salinity accounted for 65.1% of the observed process length variation (Table 1). August showed the highest significant R²-value (0.643) of all univariate monthly models.

Fig. 3. Simple regression for North Atlantic data set. The univariate relationship between Protoceratium reticulatum process length and annual sea-surface values for a) salinity, b) temperature, c) nitrate, and d) phosphate. All explanatory variables are given as transformed (Z) values.

From the multiple regression, the AIC selection generated the following annual model:

$\text{Process length}_{\text{Annual}} \sim \text{salinity} + \text{temperature} + \text{nitrate}$

Together, these variables accounted for 68.2% of the process length variation, leaving 31.8% of the variation unexplained ($F_{3,60} = 46.03$, $P < 0.001$). Salinity displayed a positive relationship to process length, a relationship that remained in each of the seasonal and monthly models. From the multiple regression, the highest significant R²-values were seen for the April and July models. From these results, the VIF analysis of the April data showed that the confidence intervals for nitrate and phosphate were somewhat higher than expected for uncorrelated predictors. In contrast, all VIF values for July were close to 1. However, as the April results introduced the possibility of collinearity, a HP analysis was conducted for all models.

Table 1. North Atlantic models from sea-surface data. Process length (P.l.) related to sea-surface temperature (SST), sea-surface salinity (SSS), nitrate (N), and phosphate (P) in all Akaike information criterion (AIC) models. The second column shows the result of the univariate (process length [P.l.] ~ SSS) analysis. The third and fourth give the AIC-selected model as well as process length correlation with the remaining variables. Adjusted R² values are used to give a more representative account of the coefficient of determination. The percentage independent impact of the predictors is given in the last four columns (gaps correspond to insignificant variables removed from the final model for that particular time interval). I-values are given in parentheses.

North Atlantic – SURFACE

| Model | P.l. ~ SSS (R²adj) | AIC selected model | R²adj | SST | SSS | N | P |
|---|---|---|---|---|---|---|---|
| Annual | 0.651 | P.l. ~ SST + SSS + N | 0.682 | 14 (0.10) | 73 (0.51) | 13 (0.09) | |
| Jan–March | 0.567 | P.l. ~ SSS + N + P | 0.611 | | 56 (0.35) | 32 (0.20) | 13 (0.08) |
| April–June | 0.600 | P.l. ~ SST + SSS + N + P | 0.668 | 16 (0.11) | 58 (0.40) | 19 (0.13) | 7 (0.05) |
| July–Sept | 0.620 | P.l. ~ SST + SSS | 0.654 | 4 (0.03) | 96 (0.64) | | |
| Oct–Dec | 0.574 | P.l. ~ SST + SSS + N | 0.629 | 17 (0.11) | 69 (0.45) | 14 (0.09) | |
| January | 0.583 | P.l. ~ SST + SSS + N | 0.629 | 23 (0.15) | 57 (0.37) | 20 (0.13) | |
| February | 0.550 | P.l. ~ SSS + N + P | 0.571 | | 63 (0.37) | 29 (0.17) | 8 (0.05) |
| March | 0.541 | P.l. ~ SSS + N + P | 0.585 | | 55 (0.33) | 33 (0.20) | 12 (0.07) |
| April | 0.598 | P.l. ~ SSS + N + P | 0.693 | | 69 (0.49) | 20 (0.14) | 11 (0.08) |
| May | 0.586 | P.l. ~ SST + SSS + N + P | 0.643 | 15 (0.10) | 61 (0.41) | 18 (0.12) | 6 (0.04) |
| June | 0.545 | P.l. ~ SST + SSS + N | 0.619 | 14 (0.09) | 78 (0.50) | 9 (0.05) | |
| July | 0.633 | P.l. ~ SST + SSS + P | 0.681 | 6 (0.04) | 71 (0.50) | | 23 (0.16) |
| August | 0.643 | P.l. ~ SSS + N + P | 0.677 | | 70 (0.48) | 10 (0.07) | 20 (0.14) |
| September | 0.548 | P.l. ~ SST + SSS + N + P | 0.609 | 5 (0.03) | 59 (0.38) | 5 (0.03) | 31 (0.20) |
| October | 0.539 | P.l. ~ SST + SSS + P | 0.596 | 10 (0.06) | 76 (0.47) | | 14 (0.09) |
| November | 0.588 | P.l. ~ SST + SSS + N | 0.631 | 17 (0.11) | 65 (0.42) | 18 (0.12) | |
| December | 0.583 | P.l. ~ SST + SSS + N | 0.644 | 17 (0.11) | 73 (0.48) | 10 (0.07) | |

HP analysis was performed using the respective variables generated from the AIC selection (Table 1). Following the criterion of importance, the analysis showed that salinity constituted the variable with the highest independent effect on process length in all model scenarios. Although other predictors occasionally displayed I-values above 0.15, salinity was the only variable with I-values consistently above this threshold. The highest monthly I-values of salinity were seen during June and July, during which salinity in both cases explained approximately 48% of the observed process length variation (Table 2, Fig. 4).

Table 2. The independent and joint values for sea-surface salinity (SSS) following Akaike information criterion (AIC) selection. The percentage of process length explained by salinity is given by the models' R²-value multiplied by the independent impact of salinity.

| Model | Independent | Joint | % explained by SSS |
|---|---|---|---|
| Annual | 0.51 | 0.15 | 49.78 |
| January | 0.37 | 0.22 | 34.82 |
| February | 0.37 | 0.18 | 35.94 |
| March | 0.33 | 0.22 | 32.19 |
| April | 0.49 | 0.12 | 47.82 |
| May | 0.41 | 0.18 | 39.23 |
| June | 0.50 | 0.05 | 48.27 |
| July | 0.50 | 0.14 | 48.35 |
| August | 0.48 | 0.17 | 47.39 |
| September | 0.38 | 0.18 | 35.93 |
| October | 0.47 | 0.08 | 45.27 |
| November | 0.42 | 0.17 | 40.99 |
| December | 0.48 | 0.17 | 46.98 |

A monthly (summer), rather than annual, model was chosen because the cysts are formed around this time of the year. Based on the higher R²-value (0.681) observed during July, this model was selected as the working model:

$\text{Process length}_{\text{July}}\ (\mu\text{m}) \sim 7.98836 + 0.69155\,(\text{SSS}) + 0.13316\,(\text{P}) + 0.15466\,(\text{SST})$

Together, these variables explained 68% of the process length variation ($F_{3,60} = 45.82$, $P < 0.001$).
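The standardisation step, the July working model, and the "% explained by SSS" calculation of Table 2 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' R code: it assumes the predictors enter the model as Z-scores (which the standardisation step and the intercept close to the mean process length suggest), it assumes the sample standard deviation is used, and the function names are hypothetical. The coefficients and table values are those reported in the text.

```python
from statistics import mean, stdev

# Coefficients of the July working model as reported in the text.
INTERCEPT = 7.98836
COEF_SSS = 0.69155
COEF_P = 0.13316
COEF_SST = 0.15466


def z_scores(values):
    """Standardise a variable: subtract the mean, divide by the std. dev.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def predict_process_length(sss_z, p_z, sst_z):
    """Process length (in micrometres) from Z-scored July predictors."""
    return INTERCEPT + COEF_SSS * sss_z + COEF_P * p_z + COEF_SST * sst_z


def percent_explained(r2_adj, independent_percent):
    """Table 2 quantity: model R2 times the independent impact (in %)."""
    return r2_adj * independent_percent


# A site at the regional mean for all predictors returns the intercept.
at_mean = predict_process_length(0.0, 0.0, 0.0)
# One standard deviation saltier than the mean, other predictors at the mean.
saltier = predict_process_length(1.0, 0.0, 0.0)
# July row of Tables 1 and 2: R2adj = 0.681, independent impact of SSS = 71%.
july_sss = percent_explained(0.681, 71)
print(at_mean, saltier, july_sss)
```

Under these assumptions, a one standard deviation increase in July salinity raises the predicted process length by about 0.69 μm, several times the effect of an equal shift in phosphate or temperature.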
From the VIF analysis, salinity showed a relatively low correlation to the other predictors (VIF < 3), and the diagnostic analysis confirmed that all variables had p-values supporting a random distribution. This was supported by the Pearson residual plots, which showed that nonconstant error variance did not constitute a problem (Fig. 5). As for the additional analysis, evaluating whether process length was influenced by oxygen, silicate, MLD, PAR or chlorophyll a, these variables were often strongly correlated to the other independent variables but showed little or no influence on process length. Based on these results, they were considered insignificant in the North Atlantic analysis and were not included in the Mediterranean–Marmara–Black Sea or Baltic–Kattegat–Skagerrak analyses.

Fig. 4. North Atlantic data set: the annual and monthly independent impact of each variable on the variation seen in process length. Axes: variable contribution (percentage) against model (Year, Jan–Dec); legend: phosphate, nitrate, SST, SSS.

4.2. Mediterranean–Marmara–Black Sea analysis

Unlike the North Atlantic samples, process length deviates slightly from a normal distribution. However, as no transformation satisfied the normality criteria, the untransformed data were used for the analysis. Viewing the results from the univariate regression, annual salinity explained 45.9% of the process length variation seen in the Mediterranean–Marmara–Black Sea data set (Table 3). However, unlike the North Atlantic annual model, the Mediterranean–Marmara–Black Sea AIC selection excluded salinity and included only temperature and nitrate in the annual model:

Process length_Annual ~ SST + nitrate

Combined, these variables explained 74% of the process length variation (F2,23 = 36.59, P < 0.001), leaving 26% unexplained. Temperature was included in all models except for June and July, whereas salinity was primarily included in the summer and fall models. From the generated models, November and December displayed the highest significant R2-values. Although both models contained temperature, and explained around 70% of the variation, the November model included both phosphate and nitrate whereas the December model only included nitrate in addition to temperature. The HP analysis showed significant I-values for salinity during summer and fall, whereas the highest values for temperature were seen during winter and spring (Table 3).

4.3. Baltic–Kattegat–Skagerrak analysis

As with the Mediterranean–Marmara–Black Sea data set, process length deviated from a normal distribution and untransformed data were used for the analysis. Based on the univariate regression, annual salinity explained 87.2% of the process length variation (Table 4). From the model selection, all variables remained in the annual model:

Process length_Annual ~ salinity + temperature + nitrate − phosphate

This model explained 90.2% of the observed variation, which was similar to the R2-values also generated by the seasonal and monthly models. Salinity was included in all models, seasonal as well as monthly. Collinearity was confirmed from the Pearson analysis, the VIF output, as well as the HP analysis. In all models, salinity displayed I-values above 0.15 (Table 4).

The extended North Atlantic multivariate analysis shows that while the univariate relationship between process length and salinity produces relatively high R2-values, the explanatory power is influenced by variables other than just salinity.
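To make the idea of independent (I) and joint (J) effects concrete, the following is a minimal sketch of hierarchical partitioning of R2 in the sense of Chevan and Sutherland (1991): the independent contribution of a predictor is its average gain in R2 over all sub-models of the remaining predictors, averaged within and then across hierarchy levels. It is not the study's own software, and the data are synthetic and deliberately collinear; the variable names, coefficients and sample size are illustrative assumptions. The percentage share is computed here as each I-value divided by the sum of I-values, which is an assumption about how tabulated percentages of this kind are derived.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r_squared(X, y, cols):
    """R2 of a least-squares fit of y on the given predictor columns plus an intercept."""
    if len(cols) == 0:
        return 0.0  # intercept-only model explains none of the variance
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())


def hierarchical_partitioning(X, y):
    """Return (independent, joint) contributions to R2 for each predictor column."""
    p = X.shape[1]
    independent = np.zeros(p)
    joint = np.zeros(p)
    for k in range(p):
        others = [j for j in range(p) if j != k]
        for size in range(p):
            # Weight = average within a hierarchy level, then across the p levels.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                gain = r_squared(X, y, subset + (k,)) - r_squared(X, y, subset)
                independent[k] += weight * gain
        # Joint effect = univariate R2 minus the independent effect.
        joint[k] = r_squared(X, y, (k,)) - independent[k]
    return independent, joint


# Synthetic, deliberately collinear predictors (NOT the study's data).
rng = np.random.default_rng(0)
n = 64
sss = rng.normal(size=n)
sst = 0.6 * sss + 0.8 * rng.normal(size=n)  # correlated with salinity
phosphate = rng.normal(size=n)
X = np.column_stack([sss, sst, phosphate])
y = 8.0 + 0.7 * sss + 0.15 * sst + 0.13 * phosphate + 0.4 * rng.normal(size=n)

independent, joint = hierarchical_partitioning(X, y)
full_r2 = r_squared(X, y, (0, 1, 2))
share = 100.0 * independent / independent.sum()  # percentage independent impact
for name, i_val, j_val, pct in zip(["SSS", "SST", "P"], independent, joint, share):
    print(f"{name}: I = {i_val:.3f}, J = {j_val:.3f}, share = {pct:.1f}%")
```

By construction the I-values sum to the R2 of the full model, and I + J for each predictor equals the R2 of the corresponding univariate regression. This is why a high univariate R2 can coexist with a much smaller independent effect when predictors are correlated.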
These results demonstrate that biological variation is more complex than can be explained simply by relating one parameter of interest to another and computing the proportions of explained/unexplained variance. The relationship between process length and salinity is simply not as straightforward as that required for univariate analysis. Persistent use of simple regression when not applicable, and especially when multicollinearity is demonstrated, will produce misleading R2-values. Univariate statistics will not produce reliable results when explanatory variables are intercorrelated, and should not be used for these types of analyses. Instead, the observed variation should be evaluated through multiple regression analyses that also involve assessing the individual effects of the explanatory variables. Regarding the Mediterranean–Marmara–Black Sea and Baltic–Kattegat–Skagerrak data, these results should be viewed with caution, as the Q–Q analysis showed that process length did not follow the assumption of a normal distribution.

Fig. 5. Regression diagnostics for the North Atlantic data set. Shown are basic residual plots for regression based on the July model. Because linear models fitted by least squares can make unrealistic assumptions, it is important to analyse the models using Pearson residuals to make sure they are correctly specified. A lack-of-fit test was thus conducted to verify a random distribution for the July residuals. The Pearson residuals show no systematic features, i.e. they do not change with the fitted values or with predictors (SST test stat = 0.955, Pr(>|t|) = 0.344; SSS test stat = 1.19, Pr(>|t|) = 0.239; P test stat = −0.522, Pr(>|t|) = 0.604; Tukey test = 1.071, Pr(>|t|) = 0.284). This supports a correctly specified linear model. SST = sea-surface temperature, SSS = sea-surface salinity, P = phosphate. Panels: Pearson residuals plotted against SST, SSS, P and fitted values.

Table 3
Mediterranean–Marmara–Black Sea models from sea-surface data. Process length (P.l.) related to temperature (SST), sea-surface salinity (SSS), nitrate and phosphate in all Akaike information criterion (AIC) models. The second column shows the result from the univariate (P.l. ~ SSS) analysis. The third and fourth give the Akaike information criterion (AIC) selected model as well as process length correlation with the remaining variables. Adjusted R2 values are used to give a more representative account of the coefficient of determination. The percentage independent impact of the predictors is given in the last four columns (gaps correspond to insignificant variables removed from the final model for that particular time interval). I-values are given in parentheses.

Mediterranean–Marmara–Black Sea – SURFACE
Model       P.l. ~ SSS (R2adj)  AIC selected model      R2adj  SST        SSS        N          P
Annual      0.459               P.l. ~ SST + N          0.740  82 (0.62)             18 (0.14)
Jan–March   0.435               P.l. ~ SST              0.589  100
April–June  0.450               P.l. ~ SST + N − P      0.599  65 (0.42)             5 (0.04)   30 (0.19)
July–Sept   0.497               P.l. ~ SST + SSS        0.573  39 (0.24)  61 (0.37)
Oct–Dec     0.451               P.l. ~ SST              0.673  100
January     0.439               P.l. ~ SST              0.599  100
February    0.433               P.l. ~ SST              0.541  100
March       0.431               P.l. ~ SST              0.619  100
April       0.441               P.l. ~ SST + N          0.659  95 (0.65)             5 (0.04)
May         0.461               P.l. ~ SST − P          0.541  51 (0.30)                        49 (0.28)
June        0.446               P.l. ~ SSS + N          0.497             92 (0.49)  8 (0.04)
July        0.494               P.l. ~ −SSS             0.494             100
August      0.502               P.l. ~ SST + SSS        0.596  37 (0.23)  63 (0.39)
September   0.495               P.l. ~ SST + SSS + N    0.662  50 (0.35)  45 (0.31)  5 (0.04)
October     0.463               P.l. ~ SST + SSS        0.612  60 (0.38)  40 (0.26)
November    0.435               P.l. ~ SST + N − P      0.706  64 (0.48)             24 (0.18)  12 (0.09)
December    0.451               P.l. ~ SST + N          0.681  81 (0.57)             19 (0.14)

5. Discussion

5.1. Salinity effect on process length in the North Atlantic

This study confirms a relationship between salinity and process length in the North Atlantic Ocean, as salinity remains significant in all models and through all HP analyses. Salinity also emerges as the dominant independent variable in each model. This indicates that salinity positively influences the length of processes as they form. It would therefore be ideal to construct a morphology-based model using data from the precise time interval of cyst formation. However, Northern Hemisphere blooms of Protoceratium reticulatum (cysts are most likely to form during blooms) have been reported from April to October (see Mertens et al., 2011 for a summary). This means that the timing of cyst formation is too imprecise to select a specific month, and that the full set of monthly models had to be evaluated. This analysis shows that the independent influence of salinity fluctuates through the year, but that a significant increase occurs during the boreal summer. In particular, salinity reaches its maximum independent influence during June, July and August, when it explains >70% of the observed variation. This is the closest that process length comes to being directly influenced by salinity, and a comparison of the difference between the simple and multiple July regressions reveals that the two models are very close to each other (<5%).

Even though these results support previous observations of a positive response pattern between salinity and process length (e.g. Verleye et al., 2012), the relationship only explains a limited portion of the observed variation. This means that other factors, such as transport mechanisms or genetic variability, are affecting process length. In addition, when relating the independent salinity effect during July to the R2-value generated by the multiple regression, no more than 48% of the observed process length variation is correlated to salinity. In comparison, the univariate analysis from the same time interval ascribes 63.3% of the variation to salinity (Table 1). This result highlights the importance of a HP analysis, as the method constitutes a mechanism to detect inflated R2-values that would otherwise be overlooked. Simply viewing models such as those generated from the annual or July data sets as true, and using them for predictions, is thus not justified (for more on this topic, see: Chatfield, 1995; Reichert and Omlin, 1997).

5.2. Regional comparison

The predictive power of the North Atlantic models was evaluated by comparing the resulting models to data sets obtained from the Mediterranean–Marmara–Black Sea region and the Baltic–Kattegat–Skagerrak system. From the univariate regression of salinity and process length alone, it is already clear that there are regional differences (Fig. 6). When viewing the Mediterranean–Marmara–Black Sea models and HP results, salinity is not included to the same extent as seen in the North Atlantic. Nor does it show a similar high independence. Instead, temperature emerges as the variable most frequently included in the models and the variable with the highest I-values (Supplementary Table 4). However, this does not automatically mean that temperature controls process length. Just as with the North Atlantic, it is important to consider the timing of cyst formation. Previous observations show that dinoflagellate motile cell concentrations for the Mediterranean–Marmara–Black Seas reach peak levels during late summer and fall (Psarra et al., 2000; Mercado et al., 2005). If temperature indeed controls process length, one might expect it to be important during this time interval. From the model selection, temperature is excluded during June and July but shows increasing I-values from August onwards. Temperature thus passes as important during the time interval when cysts are most likely to form. However, salinity also shows high I-values during this interval, meaning that temperature does not show the same dominance as seen for salinity in the North Atlantic model. As an example, while the highest R2-value of all the Mediterranean–Marmara–Black Sea models is seen during November (R2 = 0.70), temperature explains <45% of the variation. The remainder is partly explained by nitrate (17%) and phosphate (8%). Therefore, it is difficult to identify one variable as having primary control over Mediterranean process length variation.

Table 4
Baltic–Kattegat–Skagerrak models from sea-surface data. Process length (P.l.) related to temperature (SST), sea-surface salinity (SSS), nitrate and phosphate in all AIC models. The second column shows the result from the univariate (P.l ~ SSS) analysis. The third and fourth give the Akaike information criterion (AIC) selected model as well as process length correlation with the remaining variables. Adjusted R2 values are used to give a more representative account of the coefficient of determination. The percentage independent impact of the predictors is given in the last four columns (gaps correspond to insignificant variables removed from the final model for that particular time interval). I-values are given in parentheses.

Baltic–Kattegat–Skagerrak – SURFACE
Model       P.l. ~ SSS (R2adj)  AIC selected model          R2adj
Annual      0.872               P.l. ~ SST + SSS + N − P    0.902
Jan–March   0.877               P.l. ~ −SSS + SSS − N + P   0.898
April–June  0.851               P.l. ~ SST + SSS + N − P    0.908
July–Sept   0.868               P.l. ~ SST + SSS            0.902
Oct–Dec     0.884               P.l. ~ SST + SSS            0.899
January     0.881               P.l. ~ SSS + P              0.902
February    0.879               P.l. ~ SSS + P              0.901
March       0.870               P.l. ~ SSS − N + P          0.893
April       0.853               P.l. ~ SST + SSS            0.907
May         0.846               P.l. ~ SST + SSS + N        0.908
June        0.853               P.l. ~ SST + SSS + N − P    0.908
July        0.866               P.l. ~ SST + SSS            0.901
August      0.861               P.l. ~ SST + SSS            0.902
September   0.874               P.l. ~ SST + SSS + N        0.902
October     0.880               P.l. ~ SST + SSS            0.902
November    0.882               P.l. ~ SST + SSS            0.902
December    0.889               P.l. ~ SSS − N + P          0.902

Percentage independent impact, I-values in parentheses (SST, SSS, N, P): 38 (0.34) 27 (0.24) 44 (0.40) 22 (0.20) 43 (0.39) 42 (0.39) 47 (0.43) 46 (0.42) 78 (0.70) 57 (0.51) 67 (0.60) 69 (0.62) 94 (0.84) 48 (0.44) 51 (0.46) 50 (0.45) 84 (0.76) 86 (0.78) 62 (0.56) 60 (0.54) 49 (0.53) 55 (0.49) 2 (0.02) 9 (0.08) 1 (0.01) 18 (0.16) 17 (0.15) 9 (0.08) 1 (0.01) 33 (0.30) 31 (0.28) 5 (0.04) 1 (0.01) 9 (0.08) 2 (0.02) 52 (0.47) 48 (0.44) 39 (0.35) 16 (0.14) 14 (0.13) 35 (0.316) 40 (0.36) 41 (0.37) 3 (0.03) 3 (0.03) 42 (0.38)

Fig. 6. Univariate regression: process length in relation to transformed annual salinity (TSSS, salinity given in Z-values) for the North Atlantic, the Mediterranean–Marmara–Black Sea region, and the Baltic–Kattegat–Skagerrak system. Axes: process length (in µm) against TSSS; regression lines labelled NA, Med and Baltic.

The Baltic–Kattegat–Skagerrak models and HP output in contrast show a higher resemblance to the North Atlantic results, with salinity being included in all generated models. Based on the HP analysis, salinity displays the highest I-values through all months except for April and May.
Although a large portion of this impact is due to joint effects, the independent effect of salinity does increase during the boreal summer. These results are consistent with conclusions from Mertens et al. (2011, 2012a) that highest confidence should be given to the relationship between process length and late summer salinity.

Fig. 7. The five steps involved in the model building process. In this study, model evaluation was performed by comparing the North Atlantic results to observations from the Mediterranean–Marmara–Black Sea region and the Baltic–Kattegat–Skagerrak system. Flowchart labels: Formulating the model (observation, literature, variable selection); Sample design (spatial, temporal, sample size, sample coverage, sample strategy); Statistical analysis (EDA, regression type); Calibration (model selection – AIC; address multicollinearity and its cause – HP); Evaluation (test the predictive ability by applicability to other regions, or by cross-validation).

However, the Baltic–Kattegat–Skagerrak data set is characterised by serious caveats. First, the samples fall into two distinct groups: one with shorter processes confined to the brackish Baltic Sea, and one with longer processes found in the region around the Kattegat and Skagerrak. Apart from process length, the cysts appear morphologically similar, but that does not exclude the possibility that the process length difference seen between the two areas is due to the presence of two genetically distinct strains. This idea of strain-specific response in Protoceratium reticulatum is not a novel concept as previous research has shown regional differences in molecular data (Mertens et al., 2012a). If two distinct strains of P. reticulatum have indeed been compared, the correlation is thus performed between non-homogenous groups. This could generate a reasonable correlation, but without actually representing the same relationship between process length and environment. In fact, due to the environmental differences between the Baltic Sea and Kattegat–Skagerrak, anything but a high correlation would be surprising. It is beyond the scope of this study to determine whether the short process length of the Baltic Sea cysts reflects a physiological response or is genetically controlled.
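The caveat about correlating across two distinct groups can be illustrated with a small simulation. The sketch below uses entirely hypothetical salinity and process-length values (not the Baltic–Kattegat–Skagerrak measurements): two groups differ in both mean salinity and mean process length, but within each group process length is generated independently of salinity. The group means, spreads and sample size are assumptions chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500  # hypothetical number of samples per group

# Hypothetical group 1: low salinity, short processes, no within-group trend.
sal_low = rng.normal(7.0, 1.0, n)
len_low = rng.normal(3.0, 0.5, n)
# Hypothetical group 2: high salinity, long processes, no within-group trend.
sal_high = rng.normal(30.0, 2.0, n)
len_high = rng.normal(8.0, 0.5, n)


def r2(x, y):
    """Squared Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)


r2_low = r2(sal_low, len_low)
r2_high = r2(sal_high, len_high)
r2_pooled = r2(np.concatenate([sal_low, sal_high]),
               np.concatenate([len_low, len_high]))
print(f"within groups: {r2_low:.3f}, {r2_high:.3f}; pooled: {r2_pooled:.3f}")
```

In this toy setting the pooled R2 is high even though neither group shows any internal relationship, so a strong pooled correlation alone cannot distinguish a continuous salinity response from a difference between two populations.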
It is possible that the model efficiency could simply be increased if additional samples were added to bridge these two groups and provide a better coverage of the salinity gradient. However, in-depth morphological and DNA analyses, preferably combined with an effective in situ study, are needed to fully understand the factors involved. Meanwhile, any conclusions regarding the results from the Baltic–Kattegat–Skagerrak region remain speculative.

5.3. Outlook and guidelines for future studies

A successful culture study would be the most important step towards understanding morphological variation in cysts of Protoceratium reticulatum. Not only would this allow identification of individual strains or cryptic species, and assessment of how salinity affects process length; it might also serve to establish the biological function of these changes. Numerous explanations have been given to account for the development of projections on resting cysts of various planktonic organisms (Belmonte et al., 1997). Mertens et al. (2009) suggested that the positive correlation they had determined between sea-water density and process length in the cyst of Lingulodinium polyedrum is related to accelerated sinking through clustering. Whether such a relationship might explain process length variation in P. reticulatum will require laboratory studies to assess process development under controlled conditions. In addition, the study of sediment traps would provide useful information on the unknown but presumably significant effects of cyst transport.

The present study has focussed on statistically evaluating the relationships between environmental factors and Protoceratium reticulatum cyst morphology based on distributions in modern sediments, with the aim of improving the reliability of these cysts for paleoceanographic reconstructions. In advancing this approach, future studies would benefit from the following key steps (modified from the recommendations of Guisan and Zimmermann, 2000; Fig. 7):

1. Formulating a possible model: Once a possible dependent parameter has been identified, either from the literature or from field observations, a model can be sought to describe the observed variation. This means that a set of possible variables must be identified and selected. Variable selection is a recurring problem in regression studies, where a balance must be found between more or fewer variables. Including more variables might have the positive effect of identifying a factor that influences the dependent variable. On the other hand, the risk of type I and II errors increases if the added variables have low reliability. Due to the experimental nature of the present study, we took a restrictive approach and used a limited set of variables. Future research could then build on our results and evaluate other potentially important variables.

2. Sample design: The spatial and temporal resolution of a study should be addressed as part of the sample set-up. The sample size, the size of the sample area, and the sample strategy will all affect the outcome of the study (Fortin and Dale, 2005). For example, in terms of the present study, the limited salinity range covered by the North Atlantic data set constitutes a problem. Ideally, a sample design should include samples that will allow the modelling of end members. As part of the sample design, it is also important to decide whether samples will be collected from one or more constrained regions. In terms of the spatial component, the minimum distance allowed between samples should be decided prior to sampling. Recognising these issues ahead of time will result in a better representation of the effective sample size and reduce the risk of induced type I errors.
If this is not possible, SAC must be accounted for during the analysis (see Fortin and Dale, 2005, for various approaches). In terms of the temporal factor, future calibration studies should also consider the problems associated with developing a model using material obtained at different temporal scales. Using surface sediments for the dependent data set and direct sea-surface measurements for the explanatory set is not ideal. While cysts might have accumulated during centuries if not more, the environmental data derive from a more restricted time interval.

3. Statistical analysis: Only after a solid framework has been established can one begin the EDA and proceed towards a functional model. During this part of the analysis, the data are tested to determine whether they are normally distributed or in need of transformation. Next, a decision must be made whether to use simple or multiple regression. Multiple regression must be used if the dependent variable shows a correlation to more than one variable.

4. Calibration: All explanatory variables proven insignificant to the variation seen in the dependent variable should be removed from the model. This removal constitutes a key step in the analysis and should not be done manually. Instead, it should be based on AIC and techniques developed for the analyses of collinear data in order to avoid type I errors. To further evaluate the effect of the explanatory variables, HP provides a mechanism to address the fact that environmental variables are often correlated.

5. Evaluation: When a model has been developed for an area, its predictive ability should be tested. How this is done depends on whether the data set is restricted to one region or not. The present study benefited from having access to more than one data set. This meant that the North Atlantic set could be used for the calibration, and the Mediterranean–Marmara–Black Sea and Baltic–Kattegat–Skagerrak sets for evaluating the model.
In cases where only one data set is available, different methods such as cross-validation (CV) must be used. In CV, the data set is divided into one set that is used for the analysis (the calibration set) and one set (the testing set) that is used to evaluate the accuracy of the first set. Multiple rounds of calibration and testing are needed before the evaluation is considered complete. In the event of dependent observations, h-block cross validation can be used for the evaluation (Burman et al., 1994).

6. Summary and conclusions

In order to evaluate whether cyst morphology of the dinoflagellate Protoceratium reticulatum can serve as a salinity proxy, cysts obtained from North Atlantic surface-sediment samples were analysed statistically to establish the individual and combined effects of a number of environmental variables on process length. This analysis was conducted using the Akaike information criterion in combination with hierarchical partitioning. In the North Atlantic region, the process length variation of P. reticulatum did not show a significant relationship to oxygen, silicate, MLD, PAR, or chlorophyll a. However, analysis of the annual data set did reveal a relationship with sea-surface temperature, salinity, and nitrate. Viewing the monthly data, the correlation between process length and the environmental variables increased during the boreal summer. This resulted in the following working model for the region:

Process length_July (μm) ~ 7.98836 + 0.69155 (SSS) + 0.13316 (P) + 0.15466 (SST) (R2 = 0.681)

From this model, it is clear that variables other than salinity also affect the observed process length variation. In conclusion, process length and salinity do not show an independent relationship because temperature and/or nutrients are also included in all the North Atlantic models. The North Atlantic analysis also shows that multicollinearity is a reality that must be addressed when assessing process length variation.
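One common way to screen for the multicollinearity mentioned here is the variance inflation factor (VIF), which the North Atlantic analysis reports for its predictors. The sketch below is a minimal NumPy illustration, not the study's own code: each predictor is regressed on the others and VIF = 1 / (1 − R2). The data are synthetic, and the nitrate column is deliberately constructed from the phosphate column so that the two are collinear; all names and numbers are illustrative assumptions.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X: 1 / (1 - R2_j)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on an intercept plus all other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2_j = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        out[j] = 1.0 / (1.0 - r2_j)
    return out


# Synthetic predictors (NOT the study's data).
rng = np.random.default_rng(2)
n = 200
sss = rng.normal(size=n)
sst = rng.normal(size=n)
phosphate = rng.normal(size=n)
nitrate = 0.9 * phosphate + 0.3 * rng.normal(size=n)  # built to be collinear

names = ["SSS", "SST", "phosphate", "nitrate"]
vifs = dict(zip(names, vif(np.column_stack([sss, sst, phosphate, nitrate]))))
for name, value in vifs.items():
    print(f"{name}: VIF = {value:.2f}")
```

Values near 1 indicate a predictor that is essentially uncorrelated with the others, whereas the constructed nitrate–phosphate pair yields clearly inflated values. A VIF only flags that collinearity is present; apportioning the shared variance among predictors is what hierarchical partitioning is used for.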
A better understanding of the observed variance also depends on whether the relationship between the included variables is clarified. We suggest that hierarchical partitioning be used for this purpose. Conducting a HP analysis provides a straightforward way to identify the dependency level between variables and balance the inflated R2-values produced by univariate and multiple regression analyses.

The performance of the North Atlantic model, serving as the calibration set, was also evaluated by comparing the results to data sets from the Mediterranean–Marmara–Black Sea region as well as the Baltic–Kattegat–Skagerrak estuarine system. While the analyses show that salinity constitutes the most important explanatory factor in the North Atlantic, as well as in the Baltic–Kattegat–Skagerrak region, the same cannot be said for the Mediterranean–Marmara–Black seas. In this region, temperature constitutes the dominant variable, and salinity is significant only during the summer and fall. This means that the North Atlantic salinity model is region specific, and should not be applied to areas beyond its geographical extent. By extension, these results show that process length variations seen in cysts of Protoceratium reticulatum cannot be used for global paleosalinity reconstructions using a single algorithm.

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.palaeo.2014.01.012.

Acknowledgements

This research was supported by a University of Toronto studentship and scholarship and Ontario Graduate Scholarship to I.-M.J., and an NSERC Discovery Grant to M.J.H. K.N.M. is a postdoctoral fellow of FWO Belgium. Marie-Josée Fortin (University of Toronto) is warmly thanked for her generous advice on numerical ecology. We are grateful to Rehab Elshanawany, Kari Grøsfjeld, Rex Harland, Ulrich Kotthoff, Peta Mudie, Speranta Popescu, Vera Pospelova, and Sofia Ribeiro, for the loan of microscope slides.
Sample material from the Malangen fjord was provided by the National Lacustrine Core Repository (LacCore). Elisabeth Levac provided samples from Nova Scotia, and Simon Troelstra provided information on Greenland cores and samples from the Mediterranean. Michal Kucera and an anonymous reviewer provided very helpful comments on the manuscript.

References

Akaike, H., 1974. A new look at the statistical model identification. IEEE Trans. Autom. Control 19 (6), 716–723. Antonov, J.I., Locarnini, R.A., Boyer, T.P., Mishonov, A.V., Garcia, H.E., 2006. World ocean atlas 2005, volume 2: salinity. In: Levitus, S. (Ed.), NOAA Atlas NESDIS 62. U.S. Government Printing Office, Washington, D.C. (182 pp.). Austin, M.P., 1980. Searching for a model for use in vegetation analysis. Vegetatio 42 (1–3), 11–21. Belmonte, G., Miglietta, A., Rubino, F., Boero, F., 1997. Morphological convergence of resting stages of planktonic organisms: a review. Hydrobiologia 355, 159–165. Benestad, R., 2009. clim.pact: climate analysis and empirical–statistical downscaling (ESD) package for monthly and daily set. R package version 2.3-10. Birks, H.J.B., 1995. Quantitative palaeoenvironmental reconstructions. In: Maddy, D., Brew, J.S. (Eds.), Statistical Modelling of Quaternary Science Data. Technical Guide 5. Quaternary Research Association, Cambridge, U.K., pp. 116–254. Bivand, R.S., Pebesma, E.J., Gómez-Rubio, V., 2008. Applied Spatial Data Analysis with R. Springer, New York (376 pp.). Boessenkool, K.P., Van Gelder, M.-J., Brinkhuis, H., Troelstra, S.R., 2001. Distribution of organic-walled dinoflagellate cysts in surface sediments from transects across the Polar Front offshore southeast Greenland. J. Quat. Sci. 16 (7), 661–666. Brenner, W.W., 2005. Holocene environmental history of the Gotland Basin (Baltic Sea) — a micropalaeontological model. Palaeogeogr. Palaeoclimatol. Palaeoecol. 220 (3–4), 227–241.
Burman, P., Chow, E., Nolan, D., 1994. A cross-validatory method for dependent data. Biometrika 81 (2), 351–358. Bütschli, O., 1885. Erster Band. Protozoa. Dr. H.G. Bronn's Klassen und Ordnungen des Thier-Reiches, wissenschaftlich dargestellt in Wort und Bild. C.F. Winter'sche Verlagshandlung, Leipzig and Heidelberg, pp. 865–1088. Çağatay, M.N., Görür, N., Algan, O., Eastoe, C., Tchapalyga, A., Ongan, D., Kuhn, T., Kuşcu, I., 2000. Late Glacial–Holocene palaeoceanography of the Sea of Marmara: timing of connections with the Mediterranean and the Black Seas. Mar. Geol. 167 (3–4), 191–206. Cassie, R.M., Michael, A.D., 1968. Fauna and sediments of an intertidal mud flat: a multivariate analysis. J. Exp. Mar. Biol. Ecol. 2 (1), 1–23. Chatfield, C., 1995. Model uncertainty, data mining and statistical inference. J. R. Stat. Soc. Ser. A 158, 419–466. Chevan, A., Sutherland, M., 1991. Hierarchical partitioning. Am. Stat. 45 (2), 90–96. Claparède, É., Lachmann, J., 1859. Études sur les infusoires et les rhizopodes. Institut national génevois, Mémoires 6, 261–482 (pl. 14–24. [Imprinted 1858.]). Clarke, A.L., Weckström, K., Conley, D.J., Anderson, N.J., Adser, F., Andrén, E., de Jonge, V.N., Ellegaard, M., Juggins, S., Kauppila, P., Korhola, A., Reuss, N., Telford, R.J., Vaalgamaa, S., 2006. Long-term trends in eutrophication and nutrients in the coastal zone. Limnol. Oceanogr. 51 (1, part 2), 385–397. Dale, B., 1985. Dinoflagellate cyst analysis of Upper Quaternary sediments in core GIK 15530-4 from the Skagerrak. Nor. Geol. Tidsskr. 65 (1–2), 97–102. Dale, B., 1996. Dinoflagellate cysts ecology: modelling and geological applications. In: Jansonius, J., McGregor, D.C. (Eds.), Palynology: Principles and Applications, vol. 3. American Association of Stratigraphic Palynologists Foundation, Dallas, TX, pp. 1249–1275. Dale, M.R.T., Zbigniewicz, M.W., 1997. Spatial pattern in boreal shrub communities: effects of a peak in herbivore density. Can. J. Botany 32, 1342–1348. 
Dale, B., Thorsen, T.A., Fjellså, A., 1999. Dinoflagellate cysts as indicators of cultural eutrophication in the Oslofjord, Norway. Estuar. Coast. Shelf Sci. 48 (3), 371–382.
Dodge, J.D., 1989. Some revisions of the family Gonyaulacaceae (Dinophyceae) based on a scanning electron microscope study. Bot. Mar. 32, 275–298.
Ellegaard, M., 2000. Variations in dinoflagellate cyst morphology under conditions of changing salinity during the last 2000 years in the Limfjord, Denmark. Rev. Palaeobot. Palynol. 109 (1), 65–81.
Elshanawany, R., Zonneveld, K., Ibrahim, M.I., Kholeif, S.E.A., 2010. Distribution patterns of recent organic-walled dinoflagellate cysts in relation to environmental parameters in the Mediterranean Sea. Palynology 34 (2), 233–260.
Fielding, S.R., Herrle, J.O., Bollmann, J., Worden, R.H., Montagnes, D.J.S., 2009. Assessing the applicability of Emiliania huxleyi coccolith morphology as a sea-surface salinity proxy. Limnol. Oceanogr. 54 (5), 1475–1480.
Fleishman, E., Mac Nally, R., Murphy, D.D., 2005. Relationships among non-native plants, diversity of plants and butterflies, and adequacy of spatial sampling. Biol. J. Linn. Soc. 85 (2), 157–166.
Fortin, M.-J., Dale, M., 2005. Spatial Analysis: A Guide for Ecologists. Cambridge University Press, Cambridge, U.K. (365 pp.).
Fox, J., 1991. Regression Diagnostics. Sage Publications Inc., Newbury Park, CA (Chapter 1).
Fox, J., Weisberg, S., 2011. An R Companion to Applied Regression. Second edition. Sage Publications Inc., Thousand Oaks, CA (Chapter 3).
Grøsfjeld, K., Harland, R., 2001. Distribution of modern dinoflagellate cysts from inshore areas along the coast of southern Norway. J. Quat. Sci. 16 (7), 651–659.
Guisan, A., Zimmermann, N.E., 2000. Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–186.
Gundersen, N., 1988. En palynologisk undersøkelse av dinoflagellatcyster langs en synkende salinitetsgradient i recente sedimenter fra Østersjø-området. (Cand. Scient. thesis) University of Oslo (96 pp.).
Hallett, R.I., 1999. Consequences of Environmental Change on the Growth and Morphology of Lingulodinium polyedrum (Dinophyceae) in Culture. (Ph.D. thesis) University of Westminster, U.K. (109 pp.).
Head, M.J., 2007. Last Interglacial (Eemian) hydrographic conditions in the southwestern Baltic Sea based on dinoflagellate cysts from Ristinge Klint, Denmark. Geol. Mag. 144 (6), 987–1013.
Kirci-Elmas, E., Algan, O., Özgar-Öngen, I., Struck, U., Altenbach, A.V., Sagular, E.K., Nazik, A., 2008. Palaeoenvironmental investigation of sapropelic sediments from the Marmara Sea: a biostratigraphic approach to palaeoceanographic history during the Last Glacial–Holocene. Turk. J. Earth Sci. 17, 129–168.
Konieczny, R., 1983. En miljørettet palynologisk analyse av dinoflagellat-cyster i resente marine sedimenter fra Skagerrak. (Cand. Scient. thesis) University of Oslo (106 pp.).
Kotthoff, U., Pross, J., Müller, U.C., Peyron, O., Schmiedl, G., Schulz, H., Bordon, A., 2008. Climate dynamics in the borderlands of the Aegean Sea during formation of sapropel S1 deduced from a marine pollen record. Quat. Sci. Rev. 27 (7–8), 832–845.
Kullback, S., Leibler, R.A., 1951. On information and sufficiency. Ann. Math. Stat. 22 (1), 79–86.
Kunz-Pirrung, M., 2001. Dinoflagellate cyst assemblages in surface sediments of the Laptev Sea region (Arctic Ocean) and their relationship to hydrographic conditions. J. Quat. Sci. 16 (7), 637–649.
Ladouceur, S., 2007. Évaluation des changements hydrographiques de la Baie d'Hudson et du Bassin de Foxe au cours des derniers siècles à partir de traceurs palynologiques et micropaleontologiques. (M.Sc. Thesis) Université du Québec à Montréal (79 pp.).
Legendre, P., Dale, M.R.T., Fortin, M.-J., Gurevitch, J., Hohn, M., Myers, D., 2002. The consequences of spatial structure for the design and analysis of ecological field surveys. Ecography 25 (5), 601–615.
Leroy, V., 2001. Traceurs palynologiques des flux biogéniques et des conditions hydrographiques en milieu marin cotier: exemple de l'étang de Berre. DEA, Ecole doctorale Sciences de l'environment d'Aix-Marseille (30 pp.).
Levac, E., 2002. High Resolution Palynological Records from Atlantic Canada: Regional Holocene Paleoceanographic and Paleoclimatic History. (Ph.D. thesis) Dalhousie University, Halifax, Nova Scotia (862 pp.).
Locarnini, R.A., Mishonov, A.V., Antonov, J.I., Boyer, T.P., Garcia, H.E., 2006. World ocean atlas 2005, volume 1: temperature. In: Levitus, S. (Ed.), NOAA Atlas NESDIS 61. U.S. Government Printing Office, Washington, D.C. (182 pp.).
Mac Nally, R., 2000. Regression and model-building in conservation biology, biogeography and ecology: the distinction between – and reconciliation of – ‘predictive’ and ‘explanatory’ models. Biodivers. Conserv. 9 (5), 655–671.
Mac Nally, R., 2002. Multiple regression and inference in ecology and conservation biology: further comments on identifying important predictor variables. Biodivers. Conserv. 11 (8), 1397–1401.
Mangin, S., 2002. Distribution actuelle des kystes de dinoflagellés en Méditerranée occidentale et application aux fonctions de transfert. Memoir of DEA. University of Bordeaux 1 (34 pp.).
Marret, F., Scourse, J., 2002. Control of modern dinoflagellate cyst distribution in the Irish and Celtic seas by seasonal stratification dynamics. Mar. Micropaleontol. 47 (1–2), 101–116.
Marret, F., Eiríksson, J., Knudsen, K.L., Turon, J.-L., Scourse, J., 2004. Distribution of dinoflagellate cyst assemblages in surface sediments from the northern and western shelf of Iceland. Rev. Palaeobot. Palynol. 128 (1–2), 35–53.
Matthiessen, J., 1995. Distribution patterns of dinoflagellate cysts and other organic-walled microfossils in recent Norwegian–Greenland Sea sediments. Mar. Micropaleontol. 24 (3–4), 307–334.
Mercado, J.M., Ramírez, T., Cortés, D., Sebastián, M., Vargas-Yáñez, M., 2005. Seasonal and inter-annual variability of the phytoplankton communities in an upwelling area of the Alboran Sea (SW Mediterranean Sea). Sci. Mar. 69 (4), 451–465.
Mertens, K.N., Ribeiro, S., Bouimetarhan, I., Caner, H., Combourieu Nebout, N., Dale, B., de Vernal, A., Ellegaard, M., Filipova, M., Godhe, A., Goubert, E., Grøsfjeld, K., Holzwarth, U., Kotthoff, U., Leroy, S.A.G., Londeix, L., Marret, F., Matsuoka, K., Mudie, P.J., Naudts, L., Peña-Manjarrez, J.L., Persson, A., Popescu, S.-M., Pospelova, V., Sangiorgi, F., van der Meer, M.T.J., Vink, A., Zonneveld, K.A.F., Vercauteren, D., Vlassenbroeck, J., Louwye, S., 2009. Process length variation in cysts of a dinoflagellate, Lingulodinium machaerophorum, in surface sediments: investigating its potential as salinity proxy. Mar. Micropaleontol. 70 (1–2), 54–69.
Mertens, K.N., Dale, B., Ellegaard, M., Jansson, I.-M., Godhe, A., Kremp, A., Louwye, S., 2011. Process length variation in cysts of the dinoflagellate Protoceratium reticulatum, from surface sediments of the Baltic–Kattegat–Skagerrak estuarine system: a regional salinity proxy. Boreas 40 (2), 242–255.
Mertens, K.N., Bringué, M., van Nieuwenhove, N., Takano, Y., Pospelova, V., Rochon, A., de Vernal, A., Radi, T., Dale, B., Patterson, R.T., Weckström, K., Andrén, E., Louwye, S., Matsuoka, K., 2012a. Process length variation of the cysts of the dinoflagellate Protoceratium reticulatum in the North Pacific and Baltic–Skagerrak region: calibration as an annual density proxy and first evidence of pseudo-cryptic speciation. J. Quat. Sci. 27 (7), 734–744.
Mertens, K.N., Bradley, L.R., Takano, Y., Mudie, P.J., Marret, F., Aksu, A.E., Hiscott, R.N., Verleye, T.J., Mousing, E.A., Smyrnova, L.L., Bagheri, S., Mansor, M., Pospelova, V., Matsuoka, K., 2012b. Quantitative estimation of Holocene surface salinity variation in the Black Sea using dinoflagellate cyst process length. Quat. Sci. Rev. 39, 45–59.
Olea, P.P., Mateo-Tomás, P., de Frutos, Á., 2010. Estimating and modelling bias of the hierarchical partitioning public-domain software: implications in environmental management and conservation. PLoS ONE 5, e11698.
Paez-Reyes, M., Head, M.J., 2013. The Cenozoic gonyaulacacean dinoflagellate genera Operculodinium Wall, 1967 and Protoceratium Bergh, 1881 and their phylogenetic relationships. J. Paleontol. 87, 786–803.
Persson, A., Godhe, A., Karlson, B., 2000. Dinoflagellate cysts in recent sediments from the west coast of Sweden. Bot. Mar. 43 (1), 69–79.
Psarra, S., Tselepides, A., Ignatiades, L., 2000. Primary productivity in the oligotrophic Cretan Sea (NE Mediterranean): seasonal and interannual variability. Prog. Oceanogr. 46 (2–4), 187–204.
R Development Core Team, 2011. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0 (URL http://www.R-project.org/).
Reichert, P., Omlin, M., 1997. On the usefulness of overparameterized ecological models. Ecol. Model. 95 (2–3), 289–299.
Ribeiro, S., Moros, M., Ellegaard, M., Kuijpers, A., 2012. Climate variability in West Greenland during the past 1500 years: evidence from a high-resolution marine palynological record from Disko Bay. Boreas 41 (1), 68–83.
Röder, K., Hantzsche, F.M., Gebühr, C., Miene, C., Helbig, T., Krock, B., Hoppenrath, M., Luckas, B., Gerdts, G., 2012. Effects of salinity, temperature and nutrients on growth, cellular characteristics and yessotoxin production of Protoceratium reticulatum. Harmful Algae 15, 59–70.
Sangiorgi, F., Fabbri, D., Comandini, M., Gabbianelli, G., Tagliavini, E., 2005. The distribution of sterols and organic-walled dinoflagellate cysts in surface sediments of the Northwestern Adriatic Sea (Italy). Estuar. Coast. Shelf Sci. 64 (2–3), 395–406.
Schroeder, L.D., Sjoquist, D.L., Stephan, P.E., 1986. Understanding regression analysis: an introductory guide. Sage University Paper Series on Quantitative Applications in the Social Sciences. Sage Publications Inc., Newbury Park, CA 07–057 (Chapter 5).
Sprangers, M., Dammers, N., Brinkhuis, H., van Weering, T.C.E., Lotter, A.F., 2004. Modern organic-walled dinoflagellate cyst distribution offshore NW Iberia; tracing the upwelling system. Rev. Palaeobot. Palynol. 128 (1–2), 97–106.
Stein, F.R. von, 1883. Der Organismus der Infusionsthiere nach eigenen Forschungen in systematischer Reihenfolge bearbeitet. III Abteilung. II. Hälfte. Die Naturgeschichte der arthrodelen Flagellaten. Wilhelm Engelmann, Leipzig (30 pp., 25 pl).
Thorsen, T.A., Dale, B., 1998. Climatically influenced distribution of Gymnodinium catenatum during the past 2000 years in coastal sediments of southern Norway. Palaeogeogr. Palaeoclimatol. Palaeoecol. 143 (1–3), 159–177.
Thorsen, T.A., Dale, B., Nordberg, K., 1995. ‘Blooms’ of the toxic dinoflagellate Gymnodinium catenatum as evidence of climatic fluctuations in the late Holocene of southwestern Scandinavia. The Holocene 5 (4), 435–446.
Verleye, T.J., Mertens, K.N., Young, M.D., Dale, B., McMinn, A., Scott, L., Zonneveld, K.A.F., Louwye, S., 2012. Average process length variation of the marine dinoflagellate cyst Operculodinium centrocarpum in the tropical and Southern Hemisphere Oceans: assessing its potential as a palaeosalinity proxy. Mar. Micropaleontol. 86–87, 45–58.
Wall, D., Dale, B., Harada, K., 1973. Descriptions of new fossil dinoflagellates from the late Quaternary of the Black Sea. Micropaleontology 19 (1), 18–31.
Walsh, C., Mac Nally, R., 2008. hier.part: hierarchical partitioning. R package version 1.0-3.
Willumsen, P.S., Filipsson, H.L., Reinholdsson, M., Lenz, C., 2013. Surface salinity and nutrient variations during the Littorina Stage in the Fårö Deep, Baltic Sea. Boreas 42, 210–223.
Zonneveld, K.A.F., Marret, F., Versteegh, G.J.M., Bogus, K., Bonnet, S., Bouimetarhan, I., Crouch, E., de Vernal, A., Elshanawany, R., Edwards, L., Esper, O., Forke, S., Grøsfjeld, K., Henry, M., Holzwarth, U., Kielt, J.-F., So-Young, K., Ladouceur, S., Ledu, D., Chen, L., Limoges, A., Londeix, L., Lu, S.-H., Mahmoud, M.S., Marino, G., Matsouka[sic], K., Matthiessen, J., Mildenhal[sic], D.C., Mudie, P., Neil, H.L., Pospelova, V., Qi, Y., Radi, T., Richerol, T., Rochon, A., Sangiorgi, F., Solignac, S., Turon, J.-L., Verleye, T., Wang, Y., Wang, Z., Young, M., 2013. Atlas of modern dinoflagellate cyst distribution based on 2405 datapoints. Rev. Palaeobot. Palynol. 191, 1–197.