Appendix S1
We used the Spatio-Temporal Exploratory Model (STEM) to estimate species’ distributions because of its ability to adapt to non-stationarity in predictor-response relationships modeled from large sets of irregularly distributed observational data (Fink et al., 2010, 2014). STEM is an ensemble of local regression models generated by repeatedly partitioning the study extent into grids of spatiotemporal blocks, called stixels, and then fitting independent regression models, called base models, within all stixels. Together, the base models form an ensemble of local occurrence estimates and local land cover association estimates uniformly distributed across the study extent. In the methods sections of the paper, we describe how we used the ensemble to estimate 1) fine-scale occurrence and 2) regional-scale trajectories of several land cover association statistics. In this appendix, we discuss how we specified the stixel size, how the partitions were created and randomized, and the base models we used for this analysis.
S1.1 Specifying Stixel Size
Specifying the size of the stixels that define the local neighborhoods is an important part of STEM. Because of the non-uniform distribution of eBird observations in space and through time, stixel size controls a bias-variance tradeoff (Fink et al., 2010, 2014). The larger the stixel, the larger the number of observations in the stixel used to train the base model (reducing the variance associated with sample size). The smaller the stixel, the weaker the assumption of spatial-temporal stationarity (reducing the bias associated with less flexible models). To capture seasonal variation in distributions, the time dimension needed to be short enough to adapt to seasonal changes. In our experience, a grid with a 40-day window can adapt to a wide variety of complex avian migration patterns across a diverse set of terrestrial species using eBird data (NABCI, 2011, 2013).
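As an illustration of the partitioning idea, the following Python sketch assigns observations to stixels under one randomly offset grid. It is a minimal sketch, not the STEM implementation: the function names, the uniform offset scheme, the synthetic checklist locations, and the treatment of day of year as a non-wrapping coordinate are all assumptions made for illustration. The stixel dimensions are those adopted in this analysis (10 degrees longitude by 7 degrees latitude by 40 days).

```python
import math
import random
from collections import Counter

# Stixel dimensions (longitude deg, latitude deg, days) adopted in this analysis.
STIXEL_DIMS = (10.0, 7.0, 40.0)


def random_offsets(rng, dims=STIXEL_DIMS):
    """Draw one grid shift per dimension (assumed: uniform within one stixel width)."""
    return tuple(rng.uniform(0.0, d) for d in dims)


def stixel_id(obs, offsets, dims=STIXEL_DIMS):
    """Return the integer grid index of the stixel containing obs = (lon, lat, day)."""
    return tuple(math.floor((x - o) / d) for x, o, d in zip(obs, offsets, dims))


def count_by_stixel(observations, offsets, dims=STIXEL_DIMS):
    """Count the observations falling in each stixel of one partition."""
    return Counter(stixel_id(obs, offsets, dims) for obs in observations)


# Hypothetical checklist locations: (longitude, latitude, day of year).
rng = random.Random(42)
observations = [
    (rng.uniform(-125.0, -67.0), rng.uniform(25.0, 49.0), rng.uniform(1.0, 365.0))
    for _ in range(5000)
]

partition = random_offsets(rng)
counts = count_by_stixel(observations, partition)
print(len(counts), "stixels contain at least one observation")
```

Enlarging the stixel dimensions raises the per-stixel counts (lower variance) while strengthening the stationarity assumption within each stixel (higher bias), which is the tradeoff described above.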
The spatial dimensions were selected to be the smallest size possible (lowest bias) with the goal of capturing non-stationary patterns, but large enough to meet the minimum sample size requirements (variance control) for the boosted regression tree (BRT) base models to be fit throughout the study area. The number of stixels supporting a STEM estimate at a given location and time is called the ensemble support. Given the 40-day window, we estimated the smallest latitude-longitude stixel dimensions necessary to achieve at least 50% support throughout the study area, based on the eBird training data locations during the month with the smallest number of observations. To do this, we generated a random sample of 10 uniformly distributed partitions at this time and recorded ensemble support across a fine grid of locations within the study area. To be included in the ensemble support, each stixel was required to meet the base model minimum sample size requirement (see S1.3 Boosted Regression Tree Base Models). For this analysis we used a regular spatiotemporal grid with dimensions of 10 degrees longitude by 7 degrees latitude by 40 days and a minimum base model sample size of 30. Figure 1 shows a partition with stixels of 7 degrees latitude by 10 degrees longitude (left panel) and the associated map of ensemble support (right panel). The support map shows that most of the continental US is covered by at least 9 of the 10 possible base models. The combined effects of regionally sparse data and boundary effects are evident in Montana and North Dakota, where ensemble support drops to 6 out of 10 partitions.
Figure 1. Ensemble support diagnostics. The left panel shows a typical stixel partition of 7 degrees latitude by 10 degrees longitude. The right panel shows the associated map of ensemble support. The support map shows that most of the continental US is covered by at least 9 of the 10 possible base models.
The combined effects of regionally sparse eBird data and boundary effects are evident in Montana and North Dakota, where ensemble support drops to 6 out of 10 partitions.
S1.2 Partitions: Creating and Randomizing
Each individual partition divides the study extent into a regular grid of spatiotemporal neighborhoods (stixels), and regression base models are fit independently to the data in each stixel. The STEM ensemble is created from a sample of partitions to generate a uniformly distributed ensemble of model estimates across the study extent while facilitating bootstrap estimation. First, to capture sampling variation, we generated 50 subsamples, each consisting of 70% of the data. Second, we generated four randomly located partitions for each subsample (200 partitions in total) so that “edge” effects associated with individual partitions could be averaged out. STEM uses bootstrap smoothing, also known as bagging, to combine estimates across the ensemble while controlling inter-model variability (Efron, 2014). Each bootstrap smooth was based on a random subset of 100 partitions. We generated 100 bootstrap replicates in this way, each equivalent to a grid-based block subsample (Lahiri & Zhu, 2006).
S1.3 Boosted Regression Tree Base Models
Within each stixel, species’ occurrence was assumed to be stationary, and we fit Boosted Regression Trees (BRTs) with a stixel minimum sample size requirement of 30 observations. BRTs are a flexible, highly automated nonparametric regression technique that can accommodate a wide range of potential covariates with non-linear effects and interactions (Hastie et al., 2009). BRTs have been found to perform well for species distribution modeling (Elith et al., 2008). BRT occurrence models were fit using the presence or absence of a species on a checklist as the binomial response variable. Effort and time covariates were included to account for variation in detectability and in the availability of birds for detection.
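How the resampling scheme of S1.2 interacts with the stixel minimum sample size can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the STEM implementation: the function names are hypothetical, subsamples are assumed to be drawn without replacement, partitions are assumed to be grids shifted by uniform random offsets, and the example data are synthetic. The stixel dimensions and the minimum sample size are those reported in S1.1.

```python
import math
import random
from collections import Counter

STIXEL_DIMS = (10.0, 7.0, 40.0)  # longitude deg, latitude deg, days
MIN_SAMPLE_SIZE = 30             # stixel minimum sample size


def stixel_id(obs, offsets, dims=STIXEL_DIMS):
    """Return the integer grid index of the stixel containing obs = (lon, lat, day)."""
    return tuple(math.floor((x - o) / d) for x, o, d in zip(obs, offsets, dims))


def build_partitions(observations, n_subsamples=50, fraction=0.70,
                     partitions_per_subsample=4, seed=0):
    """Return a list of (offsets, per-stixel counts), one entry per partition."""
    rng = random.Random(seed)
    n_keep = round(fraction * len(observations))
    partitions = []
    for _ in range(n_subsamples):
        # Assumption: each subsample is drawn without replacement.
        subsample = rng.sample(observations, n_keep)
        for _ in range(partitions_per_subsample):
            # Assumption: each partition is a grid shifted by a uniform random offset.
            offsets = tuple(rng.uniform(0.0, d) for d in STIXEL_DIMS)
            counts = Counter(stixel_id(obs, offsets) for obs in subsample)
            partitions.append((offsets, counts))
    return partitions


def ensemble_support(point, partitions, min_n=MIN_SAMPLE_SIZE):
    """Fraction of partitions whose stixel at `point` meets the minimum sample size."""
    supported = sum(
        1 for offsets, counts in partitions
        if counts[stixel_id(point, offsets)] >= min_n
    )
    return supported / len(partitions)


# Hypothetical data: 100 checklists at a single location and date.
observations = [(-95.0, 40.0, 120.0)] * 100
partitions = build_partitions(observations)
print(len(partitions))                                     # 200
print(ensemble_support((-95.0, 40.0, 120.0), partitions))  # 1.0
print(ensemble_support((-70.0, 40.0, 120.0), partitions))  # 0.0
```

In this degenerate example every subsample places 70 checklists in a single stixel, so support is complete at the sampled location and zero at a location 25 degrees of longitude away, where no stixel reaches 30 observations.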
We selected BRT parameters to facilitate a bagging strategy when combining information across base models. The strategy is to aim for base models with low bias, erring on the side of overfitting rather than underfitting, and then to rely on the variance-reducing properties of combining estimates across base models to control overfitting. All BRTs were run with an interaction depth of three, a shrinkage parameter of 0.01, a bag fraction of 0.80, and 500 trees in the ensemble. The interaction depth was set to three to ensure that we would capture two-way interactions (which we have good reason to expect a priori, e.g., interactions of elevation and land cover) while also giving the base models extra flexibility to produce low-bias estimates. We selected the other parameters because they tended to produce overfit base models when tested over different regions, seasons, and species. The trees were fit in R (R Core Team, 2015) with the gbm package (Ridgeway, 2015).
References
Allouche, O., Tsoar, A. & Kadmon, R. (2006) Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). Journal of Applied Ecology, 43, 1223-1232.
Efron, B. (2014) Estimation and accuracy after model selection. Journal of the American Statistical Association, 109, 991-1007.
Elith, J., Leathwick, J. R. & Hastie, T. (2008) A working guide to boosted regression trees. Journal of Animal Ecology, 77, 802-813.
Fink, D., Damoulas, T., Bruns, N. E., La Sorte, F. A., Hochachka, W. M., Gomes, C. P. & Kelling, S. (2014) Crowdsourcing meets ecology: hemisphere-wide spatiotemporal species distribution models. AI Magazine, 35, 19-30.
Fink, D., Hochachka, W. M., Zuckerberg, B., Winkler, D. W., Shaby, B., Munson, M. A., Hooker, G. J., Riedewald, M., Sheldon, D. & Kelling, S. (2010) Spatiotemporal exploratory models for large-scale survey data. Ecological Applications, 20, 2131-2147.
Hastie, T., Tibshirani, R. & Friedman, J. (2009) The elements of statistical learning: data mining, inference, and prediction. Second edition. Springer-Verlag, New York, USA.
Lahiri, S. N. & Zhu, J. (2006) Resampling methods for spatial regression models under a class of stochastic designs. The Annals of Statistics, 34, 1774-1813.
NABCI (2011) The State of the Birds 2011 Report on Public Lands and Waters. U.S. Department of Interior, Washington, DC.
NABCI (2013) The State of the Birds 2013 Report on Private Lands and Waters. U.S. Department of Interior, Washington, DC.
R Core Team (2015) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Ridgeway, G. (2015) Generalized Boosted Regression Models. R package version 2.1.1. http://CRAN.R-project.org/package=gbm.
Wood, S. N. (2006) Generalized additive models: an introduction with R. Chapman & Hall/CRC, Boca Raton, FL.