Soil organic carbon across Mexico and the conterminous United States
Guevara M., Vargas R., et al. (co-authorship likely in alphabetical order)

There is a need for an interoperable effort to efficiently characterize soil functionality across the Mexico and United States continuum
• To facilitate capacity building (technical and institutional) and avoid interoperability barriers in carbon cycle science and reporting (e.g. Vargas et al., 2016).
– To better represent variability across geopolitical borders,
– To periodically provide estimates of soil organic carbon and its associated uncertainty based on the best information available,
– To better inform policy-relevant decisions about carbon cycle science across both countries.
Vargas et al., 2016. Enhancing interoperability to facilitate implementation of REDD+: case study of Mexico. Carbon Management. In press.

Introduction
• Why soil organic carbon mapping across Mexico and the United States?
– Reporting purposes for climate change adaptation guidelines,
– There are only a few digital soil organic carbon mapping efforts for the specific case of the Mexico and United States continuum, and the available information has high uncertainty
• Why 30 cm depth?
– A harmonized database representing soil organic carbon at this depth is the first result of institutional (USA and Mexican) data sharing and curation
– It is also a priority standard soil depth interval (0-30 cm) required by the IPCC

Why build analytical and institutional capacities for soil carbon mapping?
To highlight the benefits of digital soil mapping in the federal agenda and facilitate policy-relevant research related to the carbon cycle
• We work with principles:
– Reproducible research,
– data sharing,
– transparent methods and
– (ideally) open source platforms
Image from: A Safe Operating Space for Humanity. Rockström et al. 2009. Nature 461:24

Knowledge gaps:
• Spatial variability
• Spatial and temporal detail
• Sources or sinks?
• Reducing model uncertainty
Scharlemann, J.P.W., Hiederer, R., Kapos, V. (2009). Global map of terrestrial soil organic carbon stocks. UNEP-WCMC & EU-JRC, Cambridge, UK.

Global efforts: Polygon based approaches
Hiederer R, Kochy M (2011) Global Soil Organic Carbon Estimates and the Harmonized World Soil Database. EUR 25225 EN. Publications Office of the EU, Luxembourg.

Global efforts: ISRIC SoilGrids1km
~30% explained variance
Unexpected pattern across the Yucatan Peninsula (legend: permille)
SoilGrids1km: Global Soil Information Based on Automated Mapping, Hengl et al., 2014 http://dx.doi.org/10.1371/journal.pone.0105992

National efforts: Cloud based soil mapping
~30% explained variance
Using Google's cloud-based platform for digital soil mapping, Padarian et al. 2015 doi:10.1016/j.cageo.2015.06.023

National efforts: Linear geo-statistics
Organic carbon
Interpolation of Mexican soil properties at a scale of 1:1,000,000, Cruz-Cardenas et al., 2014. 213, 29–35.

This is a collective effort
• To build technical and institutional capacities as well as key alliances for digital soil mapping across Mexico and the United States
• Data -> Information -> Knowledge -> Wisdom?
– Spatial detail and depth relationships
– Precision and accuracy
– Spatial and temporal modeling

Scientific questions of this study
• How much soil organic carbon is stored in the first 30 centimeters of soil across Mexico and the United States, and what is the associated uncertainty?
• How much of the spatial variability of soil organic carbon can we explain across Mexico and the United States?
Objectives
• To generate an interpretable and predictive model to describe organic carbon variability
• To quantify the soil organic carbon stock stored in the first 30 cm of soil across Mexico and the United States
– To design a spatial soil organic carbon inference system
• Parametric, machine learning and Bayesian statistics
• Quantify uncertainty estimates

Methods
• Conceptual model
• The data set (point data and SCORPAN factors)
• Statistical design
– Interpretability (e.g. linear terms)
– Prediction capacity (e.g. non-parametric modeling)
– Mexico and United States (5 km and 250 m)
– Downscaling model <100 m (State of Delaware in the US and La Encrucijada in Mexico)
• Computational resources available at the University of Delaware thanks to the Soil Plant Atmosphere Continuum working group and the UD IT team.

A pedometric mapping approach
Courtesy of Tomislav Hengl, ISRIC Summer school 2012, Wageningen NL

The conceptual model
Available data, available soil covariates, static 2D (baseline estimates)
Grunwald et al., 2011 SSSAJ

SOC 30cm depth = f (SCORPAN) + e
Available data representative for the first 30 cm depth (n = 38218; units: g cm-2)

SOC 30cm depth = f (SCORPAN) + e
Soil organic carbon prediction factors, up to 250 m pixel size: digital terrain analysis, remote sensing, climate surfaces
Hengl et al., in preparation

SOC 30cm depth = f (SCORPAN) + e
• Linear models (e.g. lm)
• Machine learning (e.g. random forest, kknn)
• Bayesian methods (e.g. Hamiltonian Monte Carlo simulations)
• Bootstrapping methods (i.e. independent uncertainty estimates)
There is no best method (Wolpert's no free lunch theorem).
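Of the methods listed above, kernel-weighted nearest neighbors is perhaps the easiest to illustrate in a few lines. The following is a minimal sketch, not the kknn package itself: the function name `kernel_knn_predict`, the triangular kernel, the bandwidth taken from the (k+1)-th neighbour, and the toy data are all assumptions made for illustration and are not necessarily the settings used in this study.

```python
import math


def kernel_knn_predict(train_x, train_y, query, k=3):
    """Predict a value at `query` as a kernel-weighted mean of the k nearest
    training observations (hypothetical helper, for illustration only)."""
    # Pair every training observation with its Euclidean distance to the query.
    pairs = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = pairs[:k]
    # Assumption: the distance to the (k+1)-th neighbour acts as the bandwidth.
    bandwidth = pairs[k][0] if len(pairs) > k else nearest[-1][0]
    plain_mean = sum(y for _, y in nearest) / len(nearest)
    if bandwidth == 0:
        return plain_mean
    # Triangular kernel: weight falls linearly from 1 (distance 0) to 0 (bandwidth).
    weights = [max(1.0 - d / bandwidth, 0.0) for d, _ in nearest]
    total = sum(weights)
    if total == 0:
        return plain_mean
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / total


# Toy one-dimensional covariate space with made-up values.
train_x = [(0.0,), (1.0,), (2.0,), (3.0,)]
train_y = [0.0, 1.0, 2.0, 3.0]

pred_at_observation = kernel_knn_predict(train_x, train_y, (1.0,), k=2)
pred_between = kernel_knn_predict(train_x, train_y, (0.5,), k=2)
print(pred_at_observation, pred_between)
```

Closer neighbours dominate the prediction, so a query that coincides with an observation returns (almost) that observation, while a query midway between two observations returns their average.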
We assume that different prediction algorithms will capture different portions of soil organic carbon variability

SOC 30cm depth = f (SCORPAN) + e
Example of residual mapping of a multivariate linear model predicting SOC to new data. See Hengl et al., A generic framework for spatial prediction of soil variables based on regression-kriging, Geoderma 120 1-2, doi:10.1016/j.geoderma.2003.08.018

Results
• Descriptive statistics and spatial autocorrelation
• Linear models (variable selection)
• Machine learning models (kknn and random forest)
• Bayesian statistics
• Predictive maps (5 km – 250 m)
• Soil organic carbon stocks (and their uncertainties) across land uses of Mexico and the United States.

Histogram of available data
The available data are most densely concentrated between 0 and 1. Both data sets (Mexican and US) show a similar distribution and spatial autocorrelation structure

Linear model coefficients (R2 0.31)
We use stepwise linear models to analyze variable importance

The problem of multicollinearity: variance inflation factor (VIF) plot
Statistical redundancy among predictors affects both model interpretability and model predictability.
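As a rough illustration of the VIF idea, each predictor can be regressed on all the others, and VIF = 1 / (1 - R2) of that regression. The sketch below uses only the Python standard library. The helper names (`solve`, `r_squared`, `vif`), the synthetic data, and the redundant predictor `elev` are assumptions for illustration; they are not the covariates or the code used in this study.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(X, y):
    """R2 of an ordinary least squares fit of y on the columns of X plus an intercept."""
    rows = [[1.0, *r] for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """Map each predictor name to its variance inflation factor."""
    result = {}
    for name, target in columns.items():
        others = [vals for key, vals in columns.items() if key != name]
        result[name] = 1.0 / (1.0 - r_squared(list(zip(*others)), target))
    return result


# Synthetic, made-up predictors: `elev` is almost a linear function of `xTemp`.
rng = random.Random(42)
n = 300
x_temp = [rng.gauss(0, 1) for _ in range(n)]
elev = [-t + 0.1 * rng.gauss(0, 1) for t in x_temp]
twi = [rng.gauss(0, 1) for _ in range(n)]

vifs = vif({"xTemp": x_temp, "elev": elev, "twi": twi})
print(vifs)
```

In this toy case the two redundant predictors get very large VIF values while the independent one stays close to 1, which is the kind of pattern a VIF plot is meant to expose before dropping variables.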
More predictors often give a higher R2, but not necessarily a better model…

SOC decreases as topographic wetness index (twi) and temperature (xTemp) increase; SOC increases with vegetation (xEVI). R2 0.28
After hundreds of runs and intuition-guided variable removal, we found an interpretable (linear) model with only 3 predictors that does not sacrifice too much prediction capacity (we use 10-fold cross-validation to avoid overfitting in performance evaluation, R2 0.28)

Linear models (median of 500 realizations)
R2 0.28 (0.281, 0.283)
Each realization was generated with an independent random split of the available data into training and test sets (70 – 30 %)

Linear models (SD of 500 realizations), ~1 hr
We use the standard deviation (SD) of all model realizations as a measure of uncertainty; we also benchmark run times to compare performance

Bayesian multivariate model (median and SD), ~30 hrs
R2 0.28 (0.27, 0.30)
Bayesian analysis aims to describe the moments of a given distribution of data and simulate them (on a probabilistic basis) across the predictor domain

Kernel weighted nearest neighbors (500 realizations; median and SD), ~3 hrs
R2 0.39 (0.37, 0.41)
Kknn is a pattern recognition technique that is very fast and generates reasonable results; it uses a kernel function to convert distances into weights and computes a weighted average in regression

Random forests (500 realizations; median and SD), ~11 hrs
R2 0.49 (0.47, 0.51)
Random forests is a non-parametric ensemble of multiple decision trees generated by means of ‘bagging’

Regression kriging of random forest (ISRIC - Global Soil Information Facilities, GSIF)
Better captures the tail of highest observed values (>1 g cm-2)
Our best method was validated using 10-fold cross-validation in a geostatistical framework thanks to ISRIC facilities for automated mapping https://cran.r-project.org/web/packages/GSIF/index.html
ISRIC-GSIF provides a flexible platform for transparent digital soil mapping capacity building (top-down and bottom-up)

Median of lm, Bayes lm, kknn, and rf (2000 realizations)
Median of lm, Bayes lm, kknn, and rfGSIF (2000 realizations)
SD of all models (2000 realizations)
The median of all models balances the uncertainty and prediction capacity to new data

This is a multiscale inference system (Dover DE, USA, SOC 10 m pixel size)

Pending miscellaneous and work in progress:
• Updating stocks per country and land use (using the North American Land Cover and Classification System)
• Manuscript writing (finishing first draft, committed to August 2016)

Ideas, comments, and interest are very welcome.
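The realization-based uncertainty used in the results (many independent random 70/30 splits, summarized by the median and SD of the predictions) can be sketched as follows. This is a simplified stand-in, not the study's code: it uses a one-predictor straight-line fit in place of the actual models, and the function names `fit_line` and `realizations`, the synthetic data, and the query value are assumptions for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def realizations(xs, ys, new_x, n_real=500, train_frac=0.7, seed=0):
    """Refit the model on n_real random train/test splits.

    Returns the median and SD of the predictions at new_x, plus the median
    R2 measured on the held-out test portions.
    """
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_train = int(train_frac * len(idx))
    preds, scores = [], []
    for _ in range(n_real):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        observed = [ys[i] for i in test]
        fitted = [intercept + slope * xs[i] for i in test]
        mean_obs = statistics.fmean(observed)
        ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
        ss_tot = sum((o - mean_obs) ** 2 for o in observed)
        scores.append(1.0 - ss_res / ss_tot)
        preds.append(intercept + slope * new_x)
    return statistics.median(preds), statistics.stdev(preds), statistics.median(scores)


# Synthetic, made-up data: the response decreases with the predictor, plus noise.
rng = random.Random(1)
xs = [rng.uniform(0, 1) for _ in range(200)]
ys = [1.0 - 0.5 * x + rng.gauss(0, 0.2) for x in xs]

median_pred, sd_pred, median_r2 = realizations(xs, ys, new_x=0.5)
print(median_pred, sd_pred, median_r2)
```

For a map, the same summary would be computed per pixel. Pooling the realizations of several algorithms before taking the median and SD would be one way to obtain an all-model summary like the 2000-realization maps above.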