Spatial analysis of change in organic carbon and pH

DEPARTMENT for ENVIRONMENT, FOOD and RURAL AFFAIRS
Research and Development
CSG 15
Final Project Report
(Not to be used for LINK projects)
Two hard copies of this form should be returned to:
Research Policy and International Division, Final Reports Unit
DEFRA, Area 301
Cromwell House, Dean Stanley Street, London, SW1P 3JH.
An electronic version should be e-mailed to [email protected]
Project title
Spatial analysis of change in organic carbon and pH using re-sampled
National Soil Inventory data across the whole of England and Wales
DEFRA project code
SP0545
Contractor organisation
and location
National Soils Resources Institute
Cranfield University
Silsoe Bedfordshire MK45 4DT
Total DEFRA project costs: £ 37520.00
Project start date: 30/04/04
Project end date: 31/07/04
Executive summary (maximum 2 sides A4)
Characterising the variation of soil properties in space and time is a key factor in the design of
efficient monitoring systems and as a source of insight into soil changes. This project analysed data
collected from the National Soil Inventory (NSI) and its re-sampling to quantify changes in organic
carbon and pH in the soils of England and Wales during the 1980s and 90s.
The re-sampling of the original 5km grid was carried out over a range of time intervals (12 to 25
years) so normalizing the change in soil carbon content to an annual rate was investigated. It was
concluded that this was a reasonable procedure, but the rates will not be comparable if soil carbon is
being rapidly lost, for example if a major change in land use occurred. The statistics of annual change
in organic carbon and pH were investigated for all the re-sampled sites and were found to contain
extreme values. Robust geostatistical methods were used to analyse the data but were not effective
at removing the influence of these outliers. The extreme values were investigated by returning to the
original data but no artefacts were found which would have led us to believe the outliers were not true
representations of the change in values in the soil. Therefore, Hermite polynomials were used to
normalise the data and disjunctive kriging (DK) carried out to give maps of annual change in organic
carbon and pH across England and Wales.
The spatial structure was weak so other properties of the soil were investigated to relate to the
change in organic carbon and pH. These analyses suggested that the baseline carbon content may
be a good predictor of carbon loss, and a regression analysis was carried out with spatially structured
residuals. The map shows the same general pattern as the DK results, but is much less smooth. The
prediction variances for the regression analysis are smaller than the disjunctive kriging variances. This
reflects the uncertainty in the DK back-transformation, and also the different basis on which the
uncertainty is determined. If the spatial structure were stronger then the advantages of DK would be
seen. No similar relationship was found for pH.
The weak spatial structure of the residual process in the regression analysis and the fact that the
disjunctive kriging map of annual change is clearly dominated by the distribution of organic soils
supports the idea that at national scale, the rate of change of organic carbon where declining is
dominated by the effect of the absolute amount of carbon present. This suggests that it is most
efficient to map the annual rate of change at national scale by the regression on baseline carbon
content.
From these conclusions we may infer that a monitoring scheme for organic carbon content of the
soil should use design-based sampling on a stratification where soil type or baseline organic carbon
defines the strata. The map of annual change in pH shows that the topsoil pH is increasing slightly
over the majority of the country (i.e. becoming more alkaline) except in the industrial areas of
northeast England and south Wales where there is no change or a very slight decrease. The weak
spatial structure also suggests that pH could be monitored using a design-based sampling scheme.
Scientific report (maximum 20 sides A4)
INTRODUCTION
Characterising the variation of soil properties in space and time is a key factor in the design of efficient monitoring systems and as a
source of insight into soil changes. This project analysed data collected from the National Soil Inventory (NSI) and its re-sampling to
quantify changes in organic carbon and pH in the soils of England and Wales during the 1980s and 90s. A preliminary study showed
large changes in topsoil organic carbon in certain areas of the country and these changes, as well as changes in pH, were investigated
in detail to assess potential artefacts and relations with other NSI data.
METHODS AND RESULTS
Data sources
Between 1979 and 1983 the soils of the whole of England and Wales were sampled on a 5 km grid. The resulting dataset, the NSI, provides a unique baseline against which change in soil can be monitored (Loveland, 1990). Between 1995 and 1996 re-sampling was
carried out for a proportion of the arable sites in the original survey (Loveland 1996), in 1998/9 a proportion of the grassland sites
were re-sampled (Loveland 1999), and in 2003 some of the non-agricultural sites were re-sampled (Bradley 2004). This means that
the interval between baseline sampling and re-sampling varies from 12 to 25 years (ignoring some much older data kept with the
baseline observations). Topsoil organic carbon and pH were the only properties analysed for all the samples collected.
Exploratory analysis of changes over time
In order to account for the variation in the time interval between the baseline sampling and the re-sampling, the change in any soil
property was normalized to an average rate per year. This assumes that the process of change is approximately linear, so that the
notional rate of change at a site would be independent of the sample interval.
To test this assumption, predictions of the organic carbon content of the soil generated by the RothC model for the Hoosfield long-term barley trial were examined. This trial covers a period of some 150 years and three contrasting treatments: no organic inputs,
yearly input of farmyard manure (FYM), and annual FYM until 1871 with no manure inputs thereafter. The RothC model fits the data from
Hoosfield well (Coleman & Jenkinson, 1999). The model output was smoothed by fitting an exponential model to the end-of-year
carbon contents for the manured and unmanured treatments,
\[ C = a + b r^{t} , \]
where $C$ is carbon content and $t$ is time in years from the start of the experiment. In the case of the treatment with manure up to 1871 only, a double exponential model was fitted,
\[ C = a + b r^{t} + d s^{t} , \]
where $t$ is time in years after 1871.
These models all fitted the data very well (R² = 0.97, 0.99 and 0.99 for unmanured, manured and manured pre-1871 plots,
respectively). The carbon values, initially in kg/ha, were transformed to % by weight (as in the NSI database) assuming that the
carbon occurs in the top 25 cm of the soil which has a bulk density of 1.33 g cm⁻³.
The model output was then used to compute rates of change over times $13 \le t \le T - 12$, where $T$ is the total length in years of the
model run. The annual rates of change were estimated over two time intervals:
\[ \Delta OC_{25} = \frac{C(t+13) - C(t-12)}{25} \quad \text{and} \quad \Delta OC_{12} = \frac{C(t+6) - C(t-6)}{12} . \]
These two measures of the rate of change encompass the range of time intervals in the re-sampling of the NSI database.
The results are shown in Figure 1. Note that for the top three treatments the estimates of rates of change over the two time intervals
are indistinguishable. However, during the initial sharp fall in carbon content for the first 30 years after the cessation of FYM
applications in 1871, the measurement of the rate is sensitive to the time interval. The effect of log-transforming the carbon
measurements was examined, but was found to be very small.
In summary, normalizing the change in soil carbon content to an annual rate is a reasonable procedure, but the rates will not be
comparable if soil carbon is being rapidly lost (as in the change from FYM to zero inputs under continuous barley on Hoosfield).
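The sensitivity to the sampling interval can be illustrated with a minimal sketch. It assumes a single exponential decline with made-up parameters (`a`, `b` and `r` below are hypothetical, not the values fitted to the RothC output), and the helper names are illustrative only. It compares the mean annual change measured over a 25-year and a 12-year window, early in a decline and again once the curve has nearly levelled off.

```python
# Illustrative only: a, b and r are made-up parameters, not the values
# fitted to the RothC output for Hoosfield.
def carbon(t, a=1.0, b=2.0, r=0.95):
    """Exponential model C(t) = a + b * r**t, with t in years."""
    return a + b * r ** t


def annual_rate(c, t, back, forward):
    """Mean annual change of c over the window [t - back, t + forward]."""
    return (c(t + forward) - c(t - back)) / (back + forward)


def window_gap(c, t):
    """Absolute difference between the 25-year and 12-year rates at year t."""
    rate25 = annual_rate(c, t, back=12, forward=13)
    rate12 = annual_rate(c, t, back=6, forward=6)
    return abs(rate25 - rate12)


early_gap = window_gap(carbon, 15)   # steep part of the decline
late_gap = window_gap(carbon, 120)   # curve has nearly levelled off
print(f"gap at t=15: {early_gap:.5f}; gap at t=120: {late_gap:.7f}")
```

Under these assumptions the two windows disagree noticeably while carbon is falling quickly, and the absolute disagreement becomes negligible once the decline has slowed, which is the pattern described for the Hoosfield treatments.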
Exploratory analysis of annual rates of change
Summary statistics were computed for the normalized change in pH, the corresponding change in concentration of H+ ([H+]), the
normalized change in organic carbon content, and the corresponding change in log of carbon content. These statistics are shown in
Table 1. The cumulative distributions of these variables are shown in Figure 2.
It was decided to work with pH rather than [H+] because of the strong skew of the latter. The data on change in log-transformed
organic carbon appear more normally distributed than the untransformed. Further analyses were conducted on both variables.
Robust geostatistical analysis
Variograms of the variables were estimated using the standard estimator due to Matheron,
\[ 2\hat{\gamma}_{M}(\mathbf{h}) = \frac{1}{N(\mathbf{h})} \sum_{j=1}^{N(\mathbf{h})} \left\{ z(\mathbf{x}_j) - z(\mathbf{x}_j + \mathbf{h}) \right\}^{2} . \]
This estimator is not robust to the effect of outlying observations (Lark, 2000, 2002). For this reason robust alternatives were
considered. These are discussed in detail by Lark (2000). They are Cressie and Hawkins's (1980) location estimator,
\[ 2\hat{\gamma}_{CH}(\mathbf{h}) = \frac{ \left\{ \dfrac{1}{N(\mathbf{h})} \sum_{i=1}^{N(\mathbf{h})} \left| z(\mathbf{x}_i) - z(\mathbf{x}_i + \mathbf{h}) \right|^{1/2} \right\}^{4} }{ 0.457 + \dfrac{0.494}{N(\mathbf{h})} + \dfrac{0.045}{N^{2}(\mathbf{h})} } , \]
Dowd's (1984) estimator,
\[ 2\hat{\gamma}_{D}(\mathbf{h}) = 2.198 \left\{ \mathrm{median}\left( \left| y_i(\mathbf{h}) \right| \right) \right\}^{2} , \]
where $y_i(\mathbf{h}) = z(\mathbf{x}_i) - z(\mathbf{x}_i + \mathbf{h})$, $i = 1, 2, \ldots, N(\mathbf{h})$, and Genton's (1998) estimator, which uses the scale estimator $Q_n$ of Rousseeuw and Croux (1992, 1993). For a set of values of a
variable $X_i$, $i = 1, 2, \ldots, n$,
\[ Q_n = 2.219 \left\{ \left| X_i - X_j \right| ;\; i < j \right\}_{\binom{H}{2}} , \]
where $H$ is the integer part of $(n/2) + 1$ and the term $\{\cdot\}_{\binom{H}{2}}$ denotes the value of the $\binom{H}{2}$th ordered term in the braces.
From this Genton's (1998) variogram estimator is obtained as
\[ 2\hat{\gamma}_{G}(\mathbf{h}) = \left( 2.219 \left\{ \left| y_i(\mathbf{h}) - y_j(\mathbf{h}) \right| ;\; i < j \right\}_{\binom{H_h}{2}} \right)^{2} , \]
where $H_h$ is the integer part of $(N(\mathbf{h})/2) + 1$.
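The four estimators above can be sketched for a single lag as follows. This is a minimal illustration, not the software used in the project: each function takes the list of paired differences $y_i(\mathbf{h})$ for one lag (forming the pairs is omitted), the function names are hypothetical, and the differences are invented values with one gross outlier appended.

```python
import math
from statistics import median


def matheron(diffs):
    """2*gamma_M(h): mean of the squared differences."""
    return sum(d * d for d in diffs) / len(diffs)


def cressie_hawkins(diffs):
    """2*gamma_CH(h): fourth power of the mean square-root absolute difference,
    divided by the bias correction 0.457 + 0.494/N + 0.045/N**2."""
    n = len(diffs)
    mean_root = sum(math.sqrt(abs(d)) for d in diffs) / n
    return mean_root ** 4 / (0.457 + 0.494 / n + 0.045 / n ** 2)


def dowd(diffs):
    """2*gamma_D(h): 2.198 times the squared median absolute difference."""
    return 2.198 * median(abs(d) for d in diffs) ** 2


def genton(diffs):
    """2*gamma_G(h): squared Q_n scale estimate of the differences."""
    n = len(diffs)
    h = n // 2 + 1                 # integer part of n/2, plus 1
    k = h * (h - 1) // 2           # "H choose 2"
    gaps = sorted(
        abs(diffs[i] - diffs[j]) for i in range(n) for j in range(i + 1, n)
    )
    return (2.219 * gaps[k - 1]) ** 2   # k-th ordered pairwise gap


# Invented differences for one lag, with and without a gross outlier.
clean = [0.1, -0.2, 0.15, -0.05, 0.3, -0.25, 0.2, -0.1, 0.05, -0.15]
contaminated = clean + [5.0]

for name, estimator in [("Matheron", matheron), ("Cressie-Hawkins", cressie_hawkins),
                        ("Dowd", dowd), ("Genton", genton)]:
    print(name, round(estimator(clean), 4), round(estimator(contaminated), 4))
```

With these invented values the single outlier inflates Matheron's estimate by well over an order of magnitude, while Dowd's median-based estimate is unchanged.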
The performance of different variogram models for a particular variable may be compared by kriging from a prediction set of data
(from which the variogram has been obtained) to estimate the random function at locations in a set of $m$ validation data, $z(\mathbf{x}_1), \ldots, z(\mathbf{x}_m)$.
At each location an estimate $\hat{Z}(\mathbf{x}_i)$ is obtained and an associated estimate of the variance, the kriging variance $\sigma^{2}_{K,i}$. Lark
(2000) proposed that the statistic $\theta(\mathbf{x})$ be used for evaluation of the variogram used in the kriging, where
\[ \theta(\mathbf{x}_i) = \frac{ \left\{ \hat{Z}(\mathbf{x}_i) - z(\mathbf{x}_i) \right\}^{2} }{ \sigma^{2}_{K,i} } . \]
This statistic may also be obtained by cross-validation. If $z(\mathbf{x})$ can be regarded as a realization of a random function that meets the
assumptions of the intrinsic hypothesis, the kriging is done with the variogram of that random function and the kriging error
$\hat{Z}(\mathbf{x}_i) - z(\mathbf{x}_i)$ can be treated as a Gaussian variable, then $\theta(\mathbf{x})$ will be distributed as $\chi^{2}_{1}$ with $E[\theta(\mathbf{x})] = 1.0$. Lark (2000)
showed that the median of $\theta(\mathbf{x})$ over all validation sites, denoted $\tilde{\theta}$, with $E[\tilde{\theta}] = 0.455$, is preferable to the mean for assessing a
variogram model because of its robustness to outliers that may be present. This statistic was used here. Its sample distribution for the
median of a set of independent data was computed after Lark (2002) to test the null hypothesis that $\tilde{\theta} = 0.455$. The covariance of the
kriging error at two locations $\mathbf{x}_i$ and $\mathbf{x}_j$ is
\[ C(\mathbf{x}_i, \mathbf{x}_j) - \sum_{l \in M} \lambda_l C(\mathbf{x}_l, \mathbf{x}_i) - \sum_{k \in N} \lambda_k C(\mathbf{x}_k, \mathbf{x}_j) + \sum_{l \in M} \sum_{k \in N} \lambda_l \lambda_k C(\mathbf{x}_k, \mathbf{x}_l) , \]
where $C(\mathbf{x}_i, \mathbf{x}_j)$ denotes the covariance function for the variable being kriged over the interval between locations $\mathbf{x}_i$ and $\mathbf{x}_j$, the kriged
estimate at location $\mathbf{x}_i$ is obtained from $\sum_{k \in N} \lambda_k z(\mathbf{x}_k)$, where the $\lambda_k$ are kriging weights and $N$ is a set of neighbouring locations with
$i \notin N$, and the kriged estimate at location $\mathbf{x}_j$ is obtained from $\sum_{l \in M} \lambda_l z(\mathbf{x}_l)$. Using this equation the covariance matrix of a set of
kriging errors can be written and used for bootstrapping an estimate of the confidence interval for $\tilde{\theta}$. A large number of realizations
of the set of kriging errors are simulated and the corresponding $\theta(\mathbf{x}_i)$ calculated. The median $\tilde{\theta}$ is obtained for each realization and
the quantiles of the resulting set of $\tilde{\theta}$ provide estimates of the confidence limit for our actual estimate of $\tilde{\theta}$.
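The expected values quoted for the statistic can be checked with a small simulation. The sketch below is an illustration under simplifying assumptions: the kriging errors are drawn as independent Gaussian values with a single hypothetical kriging variance, so it ignores the covariance between errors that the bootstrap described above accounts for. The function name `theta` and the numbers are illustrative only.

```python
import random
from statistics import mean, median


def theta(prediction, observation, kriging_variance):
    """Squared kriging error standardized by the kriging variance."""
    return (prediction - observation) ** 2 / kriging_variance


random.seed(1)
kriging_variance = 0.25   # hypothetical value, the same at every site

# Independent Gaussian errors whose variance equals the kriging variance,
# i.e. the situation in which the variogram model is correct.
errors = [random.gauss(0.0, kriging_variance ** 0.5) for _ in range(20000)]
thetas = [theta(e, 0.0, kriging_variance) for e in errors]

print(f"mean theta   = {mean(thetas):.3f}")    # expected to be close to 1.0
print(f"median theta = {median(thetas):.3f}")  # expected to be close to 0.455
```

Multiplying a few of the simulated errors by a large factor moves the mean far more than the median, which is the robustness property that motivates using the median.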
By this criterion, estimates obtained by Dowd's (1984) estimator were used for all three variables. Figure 3 shows the models fitted to
these estimates (note that the data were multiplied by 10 before estimation of the variogram). The O(x) statistic of Lark (2002) was
then computed. This is a standard normal variate which is formed as the first principal component of the joint distribution of the
standardized kriging error and the standardized data value. It therefore combines information on the distribution of each datum
relative to both its local neighbours and the global distribution of the data set. Data with large values of O(x) may be regarded as
outliers. Note that computing O(x) requires an assumption of second-order stationarity, so it could not be done for pH, which has an
unbounded variogram. In this case we focus on the standardized kriging error. The values of these statistics are presented in Figure 4.
Examination of outliers
The outliers from the log OC change were examined with respect to the original surveyor and the year of sampling. There should be
less variability among the re-samplers, both in the method of taking samples and in the aim of visiting the exact node points, since
fewer people were involved and the re-sampling was their only task. The original hand-written data were examined to see if the outliers had
any common features, for example whether the sampling points had been moved because of roads or buildings.
No artefacts were found which would have led us to believe the outliers were not true representations of the change in values in the
soil. The one aspect that could not be quantified was whether land use or management (e.g. manure applications) had changed
between the baseline observation and re-sampling of any site.
Table 2 suggests that in a number of cases there may be a relationship between the number of outliers and the observer
(those highlighted in red). However, because most surveyors worked within a fairly coherent region of the country, this could simply
reflect the clustering of outliers in some places.
Relationships between change and other properties of the soil
Other properties of the soil measured at the time of original sampling and annual rainfall were collated and examined with regard to
the outliers of the log change in OC data (Table 3).
When the numbers of outliers are expressed as a percentage of all samples in that land use, there was no significant difference
between the land uses (χ² = 27.67, d.f. = 21, p > 0.05). Similarly there was no significant difference between the soil types (χ² = 14.77,
d.f. = 11, p > 0.05). When the numbers are expressed as a percentage of all samples in that organic carbon group there was a significant
difference between the groups (χ² = 36.78, d.f. = 11, p < 0.05). The groups that contributed most to the χ² were the extreme groups
(<1.3% and >20%), which suggested that the relationship between original soil organic carbon and change in organic carbon should
be looked at. When the numbers are expressed as a percentage of all samples in that rainfall class there was no significant difference
between the rainfall classes (χ² = 0.5067, d.f. = 5, p > 0.05).
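The significance statements above can be checked from the reported χ² values and degrees of freedom alone. The sketch below is a minimal standard-library illustration (the function name `chi2_sf` is hypothetical); it evaluates the upper-tail probability with the usual recurrence for the incomplete gamma function at integer and half-integer arguments, and it does not reproduce the underlying contingency tables, which are not given here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half_x = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half_x)              # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half_x))   # exact tail for df = 1
    # Step the shape parameter up by 1 (df up by 2) until it reaches df / 2.
    while a < df / 2.0:
        q += math.exp(a * math.log(half_x) - half_x - math.lgamma(a + 1.0))
        a += 1.0
    return q


reported = {
    "land use": (27.67, 21),
    "soil type": (14.77, 11),
    "organic carbon group": (36.78, 11),
    "rainfall class": (0.5067, 5),
}
for label, (statistic, df) in reported.items():
    p = chi2_sf(statistic, df)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: p = {p:.4f} ({verdict} at the 5% level)")
```

Only the organic carbon grouping falls below the 5% level, in agreement with the text.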
The range of soil properties and series for the concentration of sites in Essex/Suffolk does not appear to explain the occurrence of
outliers. From a simple review, there seem to be no differences from the surrounding areas.
The values of the same variables were then plotted for all the re-sampled data against the annual change in OC to see if this was
related to any variable which could then be used to explain the spatial variation (Figure 5). It can be seen that the only variable that is
related to the annual change in organic carbon is the original value of organic carbon.
The distribution of annual change in organic carbon was divided into two groups in two ways according to organic carbon content.
The first was by the amount of carbon in the baseline topsoil sample (more or less than 7%, the approximate cut-off between mineral topsoils and
peaty/humose topsoils; Avery 1980). The second grouped profiles according to whether their subgroup is defined as having a mineral or a
peaty/humose topsoil (Avery 1980).
Summary statistics on these two classifications are shown in Table 4 and Figure 6 shows Box-and-Whisker plots highlighting possible
and probable outliers by the criteria of Tukey (1977).
These results show clearly that the rate of loss is larger in soils with a larger initial organic carbon content. This is not surprising as
those soils with a higher organic carbon content have a greater potential to lose organic carbon than those where the organic carbon
content is smaller. However, both groups still contain outlying values so these classifications are not sufficient to account for the
outliers.
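The box-and-whisker screening can be sketched as follows, assuming the usual reading of Tukey's fences: values more than 1.5 interquartile ranges beyond a quartile are flagged as possible outliers and those more than 3 interquartile ranges beyond are flagged as probable outliers. The quartile convention (`method="inclusive"`), the function name and the data values are assumptions made for illustration; they are not taken from the NSI data.

```python
from statistics import quantiles


def tukey_flags(values):
    """Label each value 'ok', 'possible' or 'probable' using Tukey's fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    flags = []
    for v in values:
        # Distance outside the box; zero when v lies between the quartiles.
        distance = max(q1 - v, v - q3, 0.0)
        if distance > 3.0 * iqr:
            flags.append("probable")
        elif distance > 1.5 * iqr:
            flags.append("possible")
        else:
            flags.append("ok")
    return flags


# Invented annual changes in %OC for one group of sites.
changes = [-0.02, -0.01, 0.0, 0.01, 0.02, -0.03, 0.03, -0.12, -0.4]
for change, flag in zip(changes, tukey_flags(changes)):
    print(f"{change:+.2f}  {flag}")
```

Because the fences are computed within each group, a value that is unremarkable among peaty topsoils may still be flagged among mineral topsoils, which is why the plots are drawn separately for each class.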
Disjunctive kriging
Disjunctive kriging (DK) is a non-linear kriging technique. It is based on an initial transformation of the data using Hermite
polynomials, to obtain a normally distributed variable. The output of the DK includes local estimates of the original variable and
estimates of the conditional probability (conditioned on the data) that the true value at a particular site falls below some threshold.
Details of the method are provided by Webster & Oliver (1989) and Lark & Ferguson (2004).
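The role of the Hermite polynomials can be illustrated with a minimal sketch. It assumes the probabilists' polynomials, generated by the recurrence He_{k+1}(y) = y He_k(y) - k He_{k-1}(y) and scaled by 1/sqrt(k!); sign and normalisation conventions differ between texts, so this may not match the cited papers exactly. The coefficients are invented for illustration and are not the fitted transformation shown in Figure 7.

```python
import math


def hermite(y, order):
    """Return [He_0(y), ..., He_order(y)] for the probabilists' Hermite polynomials."""
    values = [1.0, y]
    for k in range(1, order):
        values.append(y * values[k] - k * values[k - 1])
    return values[: order + 1]


def back_transform(y, coeffs):
    """Evaluate z = sum_k coeffs[k] * He_k(y) / sqrt(k!) for a standard normal score y."""
    polys = hermite(y, len(coeffs) - 1)
    return sum(
        c * p / math.sqrt(math.factorial(k))
        for k, (c, p) in enumerate(zip(coeffs, polys))
    )


# Hypothetical expansion coefficients (not the values fitted in the project).
coeffs = [-0.02, 0.05, 0.01]
for score in (-2.0, 0.0, 2.0):
    print(score, round(back_transform(score, coeffs), 4))
```

In disjunctive kriging each polynomial of the transformed variable is kriged separately and the estimates are combined through an expansion of this form to return to the original scale.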
Since exploratory analysis of these data showed no reason to believe that extreme values reflect artefacts, it was decided to analyse
them using DK. Figure 7 shows the transformation for normalized change in (untransformed) OC and pH.
The variogram of the transformed data was estimated using all estimators. Cross-validation showed that Matheron's estimator was
suitable. The estimates and fitted model are shown in Figure 8.
Figure 9 shows DK estimates of (a) the normalized rate of change of soil OC content; (b) the estimation variance of this; and (c-d) the
conditional probability that the actual change is smaller than some specified threshold. Figure 10 shows DK estimates of (a) the
normalized rate of change of soil pH; (b) the estimation variance of this; (c) the conditional probability that the actual change is
smaller than 0, i.e. that the soil is becoming more acid.
Regression analysis
The results in Figure 5 suggest that the baseline carbon content may be a good predictor of carbon loss. A random subsample of the
data (1000 observations) was selected and a linear regression fitted using the ASReml software (Gilmour et al., 2002) and specifying
an exponential variogram for the residuals. Figure 11 shows the data and fitted regression.
The fitted model is:
Mean annual change in %OC = 0.06 – 0.0187 × (%OC in baseline survey)
The standard error of the intercept is 0.0148 and of the slope, 0.00081, and the t-ratio for the slope is -23.21, showing strong evidence
for a relationship. The variogram of the error term has a nugget of 0.0617 and a spatially structured variance of 0.0054, with an
exponential distance parameter of 27.6 km (i.e. an approximate range of 83 km). This is very weak spatial dependence. Figure 12
shows the residuals from this regression for the whole data set, not just the data used to fit the model. This shows some regional
features, notably a tendency to overestimate the rate of loss of carbon in the Pennines, Borders and Dartmoor. Figure 13 shows the
predicted rate of loss at all NSI sites. This shows the same general pattern as the DK results, but of course is much less smooth. The
prediction variances are shown in Figure 14. These are computed using the formula
\[ \hat{\sigma}^{2}_{\varepsilon} \left( 1 + \frac{1}{n} \right) + \hat{\sigma}^{2}_{b} \left( x - \bar{x} \right)^{2} , \]
where $\hat{\sigma}^{2}_{\varepsilon}$ and $\hat{\sigma}^{2}_{b}$ are respectively the ASReml estimates of the a-priori residual variance and the estimation variance of the slope
of the regression, $n$ is the number of observations used to fit the regression, $x$ is the baseline carbon content at the prediction site and the overbar denotes the mean of this over all sites. This
OLS formula is used given the very weak spatial structure of the error. Given that the spatially structured variance is so small (about
8% of the total) the benefits of kriging the residuals and adding them to the regression predictions would be very small.
The prediction variances are smaller than the DK variances. This reflects the uncertainty in the DK back-transformation, and also the
different basis on which the uncertainty is determined. If the spatial structure were stronger then the advantages of DK would be seen.
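The arithmetic behind the regression results can be retraced with a short sketch using only the values quoted above. Two points are assumptions rather than statements from the analysis: the a-priori residual variance is taken as the sum of the nugget and the spatially structured variance, and the mean baseline carbon content is not reported, so it is passed in as a hypothetical argument. The function and constant names are illustrative.

```python
INTERCEPT = 0.06            # % OC per year
SLOPE = -0.0187             # change per unit of baseline %OC
SE_SLOPE = 0.00081
NUGGET = 0.0617
STRUCTURED = 0.0054         # spatially structured variance
DISTANCE_PARAM_KM = 27.6    # exponential distance parameter
N_FIT = 1000                # observations used to fit the regression


def predicted_change(baseline_oc):
    """Predicted mean annual change in %OC for a given baseline %OC."""
    return INTERCEPT + SLOPE * baseline_oc


def prediction_variance(baseline_oc, mean_baseline_oc):
    """OLS-style prediction variance; residual variance assumed = nugget + structured."""
    residual_variance = NUGGET + STRUCTURED
    return (residual_variance * (1 + 1 / N_FIT)
            + SE_SLOPE ** 2 * (baseline_oc - mean_baseline_oc) ** 2)


break_even_oc = -INTERCEPT / SLOPE                          # baseline with zero predicted change
structured_fraction = STRUCTURED / (NUGGET + STRUCTURED)    # share of variance that is spatial
effective_range_km = 3 * DISTANCE_PARAM_KM                  # usual rule for an exponential model

print(f"break-even baseline: {break_even_oc:.2f} %OC")
print(f"spatially structured share: {structured_fraction:.1%}")
print(f"approximate range: {effective_range_km:.0f} km")
print(f"predicted change at 10 %OC: {predicted_change(10.0):+.3f} %OC per year")
```

Under these assumptions the fitted line predicts a loss for any baseline above roughly 3.2% OC, the spatially structured share is about 8% of the total variance, and three times the distance parameter gives the quoted range of about 83 km.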
The regression was then used to predict the change in OC at the 1144 sites not used to form the model. The 1000 prediction data were
then used to estimate the change at the same sites by DK (specifying the variogram of the Hermite-transformed variable that was
estimated from all the data). The residuals from the estimates were computed, and their summary statistics are tabulated below.
Again, the regression gives predictions with a smaller variance and (as measured by the robust median error) a smaller bias.
DISCUSSION
Estimates of the annual change in topsoil organic carbon content and pH have been made for the whole of England and Wales using
data from the National Soil Inventory. No artefacts were found which would have led us to believe the outliers were not true
representations of the change in values in the soil.
Hermite transformations were used to normalise the annual rates of change of OC and pH. The spatial structure of the transformed
data is relatively weak, so the kriged results are smoothed and do not differ much from the overall mean change.
Organic carbon
There is a strong relationship between the mean annual rate of change in organic carbon and the amount of carbon present. This is
expected when carbon is declining since this is an exponential process, as can be seen in Figure 1 (particularly the bottom graph). The
weak spatial structure of the residual process in the regression analysis and the fact that the disjunctive kriging map of annual change
is clearly dominated by the distribution of organic soils supports the idea that at national scale, the rate of change of OC is dominated
by the effect of the absolute amount of carbon present. This suggests that it is most efficient to map the annual rate of change at
national scale by the regression on baseline carbon content.
From these conclusions we may infer that a monitoring scheme for organic carbon content of the soil should use design-based
sampling on a stratification where soil type or baseline organic carbon defines the strata. If NSI sites, selected at random
within the strata, are resampled, then the baseline organic carbon may be a useful covariate to improve predictions following Brus (2000).
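As a minimal sketch of what design-based estimation on such a stratification involves, the example below computes the standard stratified estimate of a mean and its sampling variance from simple random samples within each stratum. The strata, area weights and values are invented, the finite population correction is ignored, and the regression adjustment with baseline carbon as a covariate is not shown.

```python
from statistics import mean, variance


def stratified_mean(samples, weights):
    """Stratified estimate of a mean and its sampling variance.

    samples: stratum name -> values from simple random sampling in that stratum
    weights: stratum name -> relative area of the stratum (weights sum to 1)
    """
    estimate = 0.0
    sampling_variance = 0.0
    for name, values in samples.items():
        w = weights[name]
        estimate += w * mean(values)
        sampling_variance += w ** 2 * variance(values) / len(values)
    return estimate, sampling_variance


# Invented annual changes in %OC for two strata defined by baseline carbon.
samples = {
    "mineral": [-0.01, 0.0, 0.01, -0.02, 0.02],
    "organic": [-0.3, -0.1, -0.2],
}
weights = {"mineral": 0.9, "organic": 0.1}

estimate, sampling_variance = stratified_mean(samples, weights)
print(f"estimated mean change: {estimate:+.3f} %OC per year")
print(f"standard error: {sampling_variance ** 0.5:.3f}")
```

The uncertainty here comes from the random selection of sites rather than from a model of spatial dependence, which is why this approach suits a variable with weak spatial structure.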
Soil pH
The map of annual change in pH shows that the topsoil pH is increasing slightly over the majority of the country (i.e. becoming more
alkaline) except in the industrial areas of northeast England and south Wales where there is no change or a very slight decrease. The
spatial structure of the transformed change in pH is weak. This weak spatial structure supports the use of design-based sampling to
monitor the change in pH, that is, a sampling strategy based on classical sampling theory rather than a model-based design, which
assumes that the samples are taken from a field of spatially dependent random variables.
REFERENCES
Avery, B.W. 1980. Soil classification for England and Wales (Higher categories). Soil Survey Technical Monograph No. 14.
Harpenden.
Bradley, R.I. 2004. Changes in organic carbon content of non-agricultural soils. Defra project report SP 0521, by Cranfield
University. (http://www.defra.gov.uk/science/project_data/DocumentLibrary/SP0521/SP0521_1372_FRP.doc)
Brus, D.J. 2000. Using regression models in design-based estimation of spatial means of soil properties. European Journal of Soil
Science, 51, 159–172.
Coleman, K. & Jenkinson, D.S. 1999. A model for the turnover of carbon in soil. Model description and windows guide. IACR-Rothamsted, Harpenden, Herts AL 2JQ.
Cressie, N., Hawkins, D. 1980. Robust estimation of the variogram. Mathematical Geology, 12, 115–125.
Dowd, P.A. 1984. The variogram and kriging: robust and resistant estimators. In: Geostatistics for Natural Resources
Characterization (eds G. Verly, M. David, A.G. Journel, A. Marechal), Part 1. pp. 91–106. D. Reidel, Dordrecht.
Genton, M.G. 1998. Highly robust variogram estimation. Mathematical Geology, 30, 213–221.
Gilmour, A.R., Gogel, B.J., Cullis, B.R., Welham, S.J. and Thompson, R. 2002. ASReml User Guide, Release
1.0. VSN International, Hemel Hempstead, UK.
Hodgson, J.M. (ed). 1997. Soil Survey field handbook. Soil Survey Technical Monograph No.5. Silsoe.
Lark, R.M. 2000. A comparison of some robust estimators of the variogram for use in soil survey. European
Journal of Soil Science 51, 137–157.
Lark, R.M. 2002. Modelling complex soil properties as contaminated regionalized variables. Geoderma, 106, 171–188.
Lark, R.M. and Ferguson, R.B. 2004. Mapping the conditional probability of deficiency or excess of soil phosphorous, a comparison
of ordinary indicator kriging and disjunctive kriging. Geoderma, 118, 39–53.
Loveland, P.J. 1990. The National Soil Inventory: Survey design and sampling strategies. pp. 73–80. In: Element
Concentration Cadasters in Ecosystems (eds H. Lieth and B. Markert). VCH Verlagsgesellschaft, Weinheim, Germany.
Loveland, P.J. 1996. Re-sampling of selected soils. Defra project report SP0115 by SSLRC.
Loveland, P.J. 1999. Chemical analysis of re-sampled permanent grassland sites from the National Soil Inventory. Defra project
report SP0123 by SSLRC.
Loveland, P.J. & Webb, J. 2000. Critical levels of soil organic matter. Defra project report SP 0306.
(http://www.defra.gov.uk/science/project_data/DocumentLibrary/SP0306/SP0306_216_FRP.doc)
Rousseeuw, P.J., Croux, C. 1992. Explicit scale estimators with high breakdown point. In: L1 Statistical Analysis and Related
Methods (ed. Y. Dodge), pp. 77–92. North Holland, Amsterdam.
Rousseeuw, P.J., Croux, C. 1993. Alternatives to the median absolute deviation. Journal of the American Statistical Association,
88, 1273–1283.
Tukey, J.W. 1977. Exploratory data analysis. Reading, MA: Addison-Wesley Publishing Company.
Verheijen, F., Bellamy, P.H., Kibblewhite, M., and Gaunt, J. 2004. Organic carbon ranges in arable soils of England and Wales. Soil
Use and Management (in press).
Webster, R. and Oliver, M.A. 1989. Optimal interpolation and isarithmic mapping of soil properties. VI. Disjunctive kriging and
mapping the conditional probability. J. Soil Sci. 40, 497–512.