A robust statistically based approach to estimating the probability of contamination occurring between sampling locations
Peter Beck | Principal Environmental Scientist
Current Site Assessment Approach
• Site History: Collect data on site history to identify sources of impact. Collect targeted samples at locations of concern.
• Select Target Size and Design Pattern: Select the target shape and size of concern and design an unbiased sampling pattern to establish absence at 95% confidence.
• Statistical Evaluation of Concentration Data: Assess the unbiased concentration data using univariate statistical tools.
• Judgment Based Decision: Interpret the results from the two separate approaches to assess contaminant distribution and site condition.
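The link between target size, sampling pattern and detection confidence can be explored numerically. The sketch below is a minimal illustration only: it assumes a circular target and a square sampling grid (the slide states neither), and the function name `detection_probability` is hypothetical. It estimates the chance that at least one grid node lands inside a randomly placed target.

```python
import numpy as np

def detection_probability(radius, spacing, n_trials=200_000, seed=0):
    """Estimate P(at least one node of a square grid hits a circular target).

    Assumption: the target centre is uniformly random relative to the grid,
    so simulating a single grid cell is sufficient.
    """
    rng = np.random.default_rng(seed)
    # Random target centre within one grid cell [0, spacing) x [0, spacing).
    cx = rng.uniform(0.0, spacing, n_trials)
    cy = rng.uniform(0.0, spacing, n_trials)
    # Distance from the centre to the nearest grid node (a cell corner).
    dx = np.minimum(cx, spacing - cx)
    dy = np.minimum(cy, spacing - cy)
    hit = np.hypot(dx, dy) <= radius
    return hit.mean()

if __name__ == "__main__":
    spacing = 10.0  # hypothetical grid spacing in metres
    for ratio in (0.25, 0.50, 0.59):
        p = detection_probability(radius=ratio * spacing, spacing=spacing)
        print(f"radius/spacing = {ratio:.2f}: detection probability ~ {p:.2f}")
```

Running the loop for several radius-to-spacing ratios shows how quickly the detection probability falls as the target becomes small relative to the grid.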
The Trouble with the Hot Spot
• Group trial in data interpretation
• Participants selected sample locations
• Dotted line represents the actual hot spot
• Solid line represents the linear interpolation based interpretation
• Dashed line represents the nearest clean sample interpretation
• Note the high degree of variability and uncertainty
What Does the Hot Spot Mean
Interpreting Concentration Data
[Figure: Lead Histogram Consistent Scale. Lead, consistent bin range. Frequency and Cumulative % plotted against Bin, with bins from 10 to 7930 in steps of 180.]
Interpreting Concentration Data
[Figure: Lead Histogram Variable Scale. Lead, variable bin range. Frequency and Cumulative % plotted against Bin, with bins of width 6 from 0 to 48, then bins of width 300 from 200 to 8000.]
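The two histograms differ only in how the bins are chosen. A minimal sketch of that difference follows, using synthetic right-skewed values (hypothetical, not the site's lead results) and bin edges loosely modelled on the figure axes; the helper `cumulative_percent` is an illustrative name.

```python
import numpy as np

# Hypothetical, synthetic right-skewed "concentrations" (NOT the site's lead data).
rng = np.random.default_rng(42)
conc = np.clip(rng.lognormal(mean=4.0, sigma=1.5, size=300), 0.0, 7999.0)

# Consistent bin range: equal-width bins across the full data range.
equal_edges = np.arange(0.0, 8000.0 + 180.0, 180.0)
equal_counts, _ = np.histogram(conc, bins=equal_edges)

# Variable bin range: narrow bins at low concentrations, wide bins above.
variable_edges = np.concatenate([
    np.arange(0.0, 48.0 + 6.0, 6.0),        # 0, 6, ..., 48
    np.arange(200.0, 8000.0 + 300.0, 300.0)  # 200, 500, ..., 8000
])
variable_counts, _ = np.histogram(conc, bins=variable_edges)

def cumulative_percent(counts):
    """Cumulative percentage of samples up to and including each bin."""
    return 100.0 * np.cumsum(counts) / counts.sum()

if __name__ == "__main__":
    print("Largest equal-width bin holds", equal_counts.max(), "of", conc.size, "samples")
    print("Largest variable-width bin holds", variable_counts.max(), "of", conc.size, "samples")
```

With skewed data, equal-width bins tend to pile most samples into the first bin, while variable bins spread the low-concentration results out so their shape becomes visible.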
Limitations of Univariate Statistics
• Assumes samples were collected in an unbiased manner
• Dissociates location and concentration (i.e. assumes no relationship between the two)
• Ignores sample location as a factor
• Normal and log-normal distributions are not applicable in many situations
• Log-normal distribution can be unstable
• Non-parametric methods overcome distribution issues but still do not consider location
• Cannot provide confidence in spatial data interpretation
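The point that univariate statistics ignore location can be shown with a small sketch. The values below are hypothetical and the helper `semivariance` is an illustrative name; the same ten results are placed at different positions along a transect, which leaves the mean and standard deviation unchanged but alters a location-aware measure.

```python
import numpy as np

def semivariance(values, lag):
    """Experimental semivariance for equally spaced samples at a given lag."""
    values = np.asarray(values, dtype=float)
    diffs = values[:-lag] - values[lag:]
    return np.sum(diffs ** 2) / (2 * len(diffs))

# Hypothetical concentrations along a transect (illustrative values only).
clustered = np.array([5, 5, 5, 5, 100, 100, 100, 5, 5, 5], dtype=float)
# The same ten values placed at different locations.
scattered = np.array([5, 100, 5, 5, 100, 5, 5, 100, 5, 5], dtype=float)

# Univariate summaries are identical because they ignore location.
same_mean = np.isclose(clustered.mean(), scattered.mean())
same_std = np.isclose(clustered.std(ddof=1), scattered.std(ddof=1))

# The lag-1 semivariance differs because it depends on the arrangement.
gamma_clustered = semivariance(clustered, 1)
gamma_scattered = semivariance(scattered, 1)

if __name__ == "__main__":
    print("Same mean:", same_mean, "| same standard deviation:", same_std)
    print("Lag-1 semivariance, clustered:", round(gamma_clustered, 1))
    print("Lag-1 semivariance, scattered:", round(gamma_scattered, 1))
```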
Why is Spatial Relationship Important
Spatial Geostatistics
• Based on the approach used in the mining industry
• Allows for the spatial relationship between the samples
• Unaffected by sample bias
• The VARIOGRAM or SEMIVARIOGRAM is the fundamental assessment tool
• The variogram presents the random variance, the spatial variance and the range of influence of samples
• Data evaluation is done by Kriging
• Probability plots are developed by Indicator Kriging
The Variogram
$$\gamma(h) = \frac{1}{2n} \sum_{i=1}^{n} \left[ X_i - X_{i+h} \right]^2$$
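One Dimensional Example
$$\gamma(h) = \frac{1}{2n} \sum_{i=1}^{n} \left[ X_i - X_{i+h} \right]^2$$
Sample values X_i at equal spacing: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
h=1
(X_i - X_{i+h})^2 = (1-2)^2 = 1
(X_i - X_{i+h})^2 = (2-3)^2 = 1
...
(X_i - X_{i+h})^2 = (9-10)^2 = 1
n=9, so \gamma(1) = 9 / (2 x 9) = 0.5
h=2
(X_i - X_{i+h})^2 = (1-3)^2 = 4
(X_i - X_{i+h})^2 = (2-4)^2 = 4
...
(X_i - X_{i+h})^2 = (8-10)^2 = 4
n=8, so \gamma(2) = 32 / (2 x 8) = 2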
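The one dimensional example can be reproduced in a few lines. This is a minimal sketch assuming equally spaced samples, with `semivariogram` as an illustrative function name; n is taken as the number of pairs available at each lag, as on the slide.

```python
import numpy as np

def semivariogram(values, lag):
    """gamma(h) = sum((X_i - X_{i+h})^2) / (2 n) for equally spaced samples,
    where n is the number of pairs separated by the lag h."""
    x = np.asarray(values, dtype=float)
    diffs = x[:-lag] - x[lag:]
    n = len(diffs)
    return np.sum(diffs ** 2) / (2 * n), n

x = np.arange(1, 11)  # the sample values 1, 2, ..., 10 from the slide

for h in (1, 2):
    gamma, n = semivariogram(x, h)
    print(f"h={h}: n={n}, gamma={gamma}")
# h=1: n=9, gamma=0.5
# h=2: n=8, gamma=2.0
```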
Variogram Types and Examples
Kriging and Indicator Kriging
Kriging
• Mathematical technique for assigning the best linear moving average concentration over a defined area
• Considered the best method of estimating concentration distribution because it:
  • Avoids systematic bias
  • Minimises the error of estimation (kriging error)
• Requires development of, and data input from, the variogram
Indicator Kriging
• Assign a value of 1 to “clean” samples and a value of 0 to “dirty” samples
• Results in a probability plot of the presence and absence of contamination
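The two steps above (indicator coding, then kriging with a variogram model) can be sketched with NumPy alone. This is a minimal ordinary kriging illustration, not the software used in the presentation: the sample coordinates, concentrations, criterion and variogram parameters are all hypothetical, and `spherical` and `ordinary_krige` are illustrative names.

```python
import numpy as np

def spherical(h, nugget, sill, vrange):
    """Spherical variogram model; `sill` is taken as the total sill (assumption)."""
    h = np.asarray(h, dtype=float)
    rising = nugget + (sill - nugget) * (1.5 * h / vrange - 0.5 * (h / vrange) ** 3)
    gamma = np.where(h < vrange, rising, sill)
    return np.where(h == 0.0, 0.0, gamma)  # gamma(0) = 0 by definition

def ordinary_krige(coords, values, target, nugget, sill, vrange):
    """Ordinary kriging estimate, kriging variance and weights at one target point."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Semivariances between all pairs of samples.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = spherical(d, nugget, sill, vrange)
    a[n, n] = 0.0  # Lagrange multiplier row/column forces the weights to sum to 1
    # Semivariances between each sample and the target.
    b = np.ones(n + 1)
    b[:n] = spherical(np.linalg.norm(coords - target, axis=1), nugget, sill, vrange)
    solution = np.linalg.solve(a, b)
    weights, mu = solution[:n], solution[n]
    estimate = weights @ values
    variance = weights @ b[:n] + mu
    return estimate, variance, weights

# Hypothetical samples: coordinates in metres, concentrations in mg/kg.
coords = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0], [5.0, 5.0]])
conc = np.array([20.0, 35.0, 400.0, 650.0, 120.0])
criterion = 300.0  # hypothetical assessment criterion

# Indicator coding from the slide: 1 = "clean", 0 = "dirty".
indicators = (conc < criterion).astype(float)

# Kriging the indicators gives an estimated probability of "clean" at the target.
p_clean, krige_var, weights = ordinary_krige(
    coords, indicators, target=np.array([5.0, 2.0]),
    nugget=0.0, sill=0.25, vrange=20.0,  # hypothetical indicator variogram
)

if __name__ == "__main__":
    print(f"Estimated probability of clean at (5, 2): {p_clean:.2f}")
    print(f"Kriging variance: {krige_var:.3f}")
```

Repeating the estimate over a grid of target points would give the kind of probability plot the slide refers to.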
So Why Consider Spatial Geostatistics
• Provides a linkage between concentration, variance (micro + macro) and location
• Separates random and spatial components of variance
• Micro-scale (sample scale) variance is always in the Nugget Effect (random variance)
• Macro-scale (spatial) variance is either spatially related, random (Nugget Effect) or a combination of the two
• Assists in establishing when sufficient samples have been collected to characterise a site
• Provides a robust method for predicting uncertainty in impact distribution, remediation volumes and cost
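The split between random and spatial variance is usually read off a fitted variogram model through its nugget, sill and range. The sketch below is a minimal illustration, assuming a spherical model in which the sill is the total sill (conventions differ between software packages); the example numbers are one of the cadmium parameter sets listed under Case Study 2, and the function names are illustrative.

```python
import numpy as np

def spherical_model(h, nugget, sill, range_):
    """Spherical model: rises from the nugget to the sill at the range.

    Assumption: `sill` is the total sill, i.e. it already includes the nugget.
    """
    h = np.asarray(h, dtype=float)
    rising = nugget + (sill - nugget) * (1.5 * h / range_ - 0.5 * (h / range_) ** 3)
    return np.where(h >= range_, sill, np.where(h > 0.0, rising, 0.0))

def variance_components(nugget, sill):
    """Split the total variance into random (nugget) and spatial parts."""
    return {
        "random": nugget,
        "spatial": sill - nugget,
        "random_fraction": nugget / sill,
    }

if __name__ == "__main__":
    # Sill 700, nugget 50, range 35 (cadmium, Case Study 2 key variogram data).
    parts = variance_components(nugget=50.0, sill=700.0)
    print(f"Random share of variance: {100 * parts['random_fraction']:.1f}%")
    lags = np.array([0.0, 10.0, 20.0, 35.0, 60.0])
    print(spherical_model(lags, nugget=50.0, sill=700.0, range_=35.0))
```

Beyond the range the model stays at the sill, which is why samples further apart than the range carry no information about each other under this model.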
Case Study 1: Application to 19 ha Parkland
• Parkland and sporting ovals in Armidale, NSW. Impact by fill from a gasworks site was suspected.
• Investigations commenced in 2000 with a limited sampling program; the initial results were used to inform a staged geostatistical assessment process
Initial Assessment Stage Results
Variogram for PAH concentration. The results were used to develop confidence regions for the initial assessment area using indicator Kriging and then to select additional sampling locations.
Second Assessment Stage
Variogram for PAH concentration. The results were used to revise confidence regions for the second stage assessment area using indicator Kriging and then to select additional sampling locations.
Third Assessment Stage
Note there was little change in the variogram between stage 2 and stage 3 sampling. Thus further sampling would offer limited benefit.
Final Results
More Recent Advances
• The effect of different approaches to, and skill in, variogram development on the reliability of the spatial interpretation is an important consideration
• Using different variograms and assessing the effects on interpretation can assist in clarifying the reliability of the spatial interpretation
• Utilising only primary sample results can lead to overestimation of statistical confidence and, in the case of spatial geostatistics, overestimation of the confidence in the spatial interpretation
• The QA/QC data collected can be utilised to factor in sample scale variance, which is often caused by heterogeneity
• Inclusion of the blind duplicate and split samples allows the sample scale variance to be incorporated into the variogram development and accounted for in the spatial interpretation
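One way to see how duplicate and split samples carry sample scale variance is to treat each primary/duplicate pair as two results at effectively zero separation. The sketch below is a minimal illustration under that assumption; the paired values are hypothetical, `duplicate_semivariance` is an illustrative name, and the presentation does not state how its software combined the QA/QC data.

```python
import numpy as np

def duplicate_semivariance(primary, duplicate):
    """Semivariance at (near) zero separation from primary/duplicate pairs:
    half the mean squared difference between the paired results."""
    primary = np.asarray(primary, dtype=float)
    duplicate = np.asarray(duplicate, dtype=float)
    return 0.5 * np.mean((primary - duplicate) ** 2)

# Hypothetical paired lead results in mg/kg (illustrative only).
primary = [120.0, 850.0, 40.0, 2300.0]
duplicate = [150.0, 700.0, 55.0, 2900.0]

nugget_estimate = duplicate_semivariance(primary, duplicate)

if __name__ == "__main__":
    print(f"Sample scale (nugget) variance estimate: {nugget_estimate:.1f}")
```

Comparing this value with the sill of the primary-data variogram gives a rough indication of how much of the total variance is random at the sample scale.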
Case Study 2: Effects of Variogram and Variance
• A 2 ha site was intensively assessed to facilitate in-situ waste classification
• A total of 303 primary samples were analysed, of which 108 (~36%) exceeded the adopted criteria for one or more contaminants
• The effect of different variogram interpretations was assessed by having different assessors, with different time budgets, develop the variograms
• Lead QA/QC data was used to assess the effect of random variance on the variogram and confidence mapping
[Figure: sample location plan. Easting 303780 to 304000, northing 5815340 to 5815540.]
Variograms – Cd Data
Cadmium data was assessed for spatial relationship and variance. Effects of variations in various aspects of the variogram on the data interpretation were examined, including:
• Increasing variance
• Reducing random variance
• Decreasing the lag distance and range of influence
• Using a different model
Key Variogram Data (Column D: Cadmium, Direction: 0.0, Tolerance: 40.0)
• Transformed: None; Sill: 800; Nugget: 1; Range: 35; Model: Spherical
• Transformed: None; Sill: 700; Nugget: 50; Range: 35; Model: Spherical
• Transformed: None; Sill: 770; Nugget: 1; Range: 15; Model: Gaussian
[Figure: three cadmium variograms plotted against Lag Distance (0 to 110), each with the corresponding map. Easting 303780 to 304000, northing 5815340 to 5815540.]
Variograms – Pb Data
Lead data was assessed for spatial relationship and variance. Effects of variations in various aspects of the variogram on the data interpretation were examined, including:
• Reducing random variance
• Decreasing the lag distance and range of influence
• Log transforming the data before variogram development
Key Variogram Data
• Column G: Lead (Direction: 0.0, Tolerance: 40.0). Transformed: None; Sill: 1100000; Nugget: 1; Range: 17; Model: Rational Quadratic
• Column D: Lead Log (Direction: -15.0, Tolerance: 20.0). Transformed: Log; Sill: 2.5; Nugget: 0.25; Range: 25; Model: Rational Quadratic
• Column D: Lead Log (Direction: -15.0, Tolerance: 20.0). Transformed: Log; Sill: 2.5; Nugget: 0.25; Range: 20; Model: Rational Quadratic
[Figure: three lead variograms plotted against Lag Distance (0 to 110), each with the corresponding map. Easting 303780 to 304000, northing 5815340 to 5815540.]
Comparison of Results
Cd
Pb
Inclusion of QA/QC Samples – Lead Data
The variogram shows that the overall random variance component increased
from <1% to about 10%
[Figure: two lead variograms (Direction: 0.0, Tolerance: 40.0) plotted against Lag Distance (0 to 110), with variogram values from 0 to 1000000. Panels: "Primary Data Only" and "Primary, Blind Duplicate and Split Data".]
Confidence Mapping
Adjusting Indicator Krige
• Assigning 0 or 1 to samples is deterministic and does not take measurement uncertainty into account
• Using a probabilistic approach to assigning the indicator krige value can take this measurement uncertainty into account
[Figure: Decision Criteria chart with thresholds C-U, C and C+U and indicator values 0, 0.25, 0.5, 0.75 and 1.
Deterministic Approach: Uncontaminated, Uncontaminated, Contaminated, Contaminated, Contaminated.
Probabilistic Approach: Uncontaminated, Possibly Contaminated, Maybe Contaminated, Probably Contaminated, Contaminated.]
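The difference between the deterministic and probabilistic coding can be sketched as follows. This is one possible reading rather than the author's exact scheme: it assumes C is the decision criterion, U is the measurement uncertainty, 1 means "clean" (as on the Indicator Kriging slide), and that the indicator falls linearly from 1 at C-U to 0 at C+U, which reproduces the values 1, 0.75, 0.5, 0.25 and 0 at evenly spaced concentrations. Both function names are illustrative.

```python
def deterministic_indicator(conc, criterion):
    """Hard 0/1 coding: 1 = clean (below the criterion), 0 = contaminated."""
    return 1.0 if conc < criterion else 0.0

def probabilistic_indicator(conc, criterion, uncertainty):
    """Soft coding between criterion - uncertainty and criterion + uncertainty.

    Assumption: a linear ramp from 1 (clean) at C - U to 0 (contaminated) at C + U.
    """
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    value = (criterion + uncertainty - conc) / (2.0 * uncertainty)
    return min(1.0, max(0.0, value))

if __name__ == "__main__":
    criterion, uncertainty = 100.0, 20.0  # hypothetical values in mg/kg
    for conc in (50.0, 90.0, 100.0, 110.0, 150.0):
        hard = deterministic_indicator(conc, criterion)
        soft = probabilistic_indicator(conc, criterion, uncertainty)
        print(f"conc={conc:6.1f}  deterministic={hard:.2f}  probabilistic={soft:.2f}")
```

Results well away from the criterion are coded identically by both functions; only results within the uncertainty band receive intermediate values.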
Results of Robust Statistical Assessment
[Figure: zinc results in three panels.
Concentration: Column C: Zinc (mg/kg), legend classes 5 to 50, 50 to 100, 100 to 200, 200 to 2000, 2000 to 7000, 7000 to 17500, 17500 to 500000.
Variogram: Direction: 5.0, Tolerance: 50.0, plotted against Lag Distance (0 to 140).
Probability Distribution: Indicator Krig Value, legend classes 0 to 0.25, 0.25 to 0.5, 0.5 to 0.75, 0.75 to 1.001.
Map extent: easting 351300 to 351550, northing 6274400 to 6274800.]
Conclusions
• Current site characterisation practice collects spatially distributed data, but interpretation of the extent of impact and uncertainty is generally based on judgment
• Univariate and bivariate statistical tools are available to assess uncertainty
• Univariate approaches generally rely on the assumption of random distribution and unbiased sample collection, which is rarely met
• Bivariate approaches link concentration, variance and location, allowing estimation of the random and spatial components of variance and development of probability distributions for the presence or absence of contamination
• Results are generally robust even when the variograms utilised are diverse
• Inclusion of QA/QC samples in the variogram development results in more realistic probability distributions
www.ghd.com