Potential Modeling

“Potential Modeling”
Mathematical Methods, Model Assumptions,
and Practical Applications
Helmut Schaeben
Mathematical Geology and Geoinformatics
Institute of Geophysics and Geoinformatics
TU Bergakademie Freiberg
10. Sächsisches GIS-Forum des GDI Sachsen e.V.
Dresden, Jan. 29, 2013
Potential modeling
Contents
Introduction
Objective
Prerequisites
Methods
Weights of evidence
Logistic regression
Comparison
Case study
Potential modeling
Introduction
Objective
Prerequisites
Potential Modeling
Objective
The ultimate goal of “potential modeling” or “targeting” is to recognize locations for which the probability of a “target” event, like a landslide or a specific mineralization, is a relative maximum.
The event must be sufficiently well understood in terms of cause and effect to collect data corresponding to spatially referenced factors (“evidences”) $B_\ell$, $\ell = 0, \dots, m$, in favor of or against the occurrence of the event $T$.
Then spatially referenced “posterior” probabilities given the pieces of evidence can be estimated by several approaches, including weights of evidence, logistic regression, fuzzy logic, artificial neural nets, statistical learning, support vector machines, and others.
These methods require a training area to estimate the parameters of the model $M(\theta_0, \dots, \theta_m \mid (b_{0,i}, \dots, b_{m,i}, t_i)_{i=1,\dots,n})$.
Potential Modeling – Höffigkeitsprognose (prospectivity prediction)
Prerequisites: no mineral deposits without mineralogy
Cox, D.P., Singer, D.A., eds., 1986, Mineral deposit models: U.S. Geological Survey Bulletin 1693.
Singer, D.A., Menzie, W.D., 2010, Quantitative Mineral Resource Assessments, an Integrated Approach: Oxford University Press.
McCuaig, T.C., Beresford, S., Hronsky, J., 2010, Translating the mineral systems approach into an effective exploration targeting system: Ore Geology Reviews 38, 128–138.
Fabricated data: Training area
[Figure: four 10 × 10 map panels (x, y coordinates) showing the binary target T and the binary evidences B1, B2, B3; occupied grid cells are marked.]
Fabricated data: Training area
[Figure: four 10 × 10 map panels showing the evidence sums B1 + B2, B1 + B3, B2 + B3, and B1 + B2 + B3; cell values range from 1 to 3.]
Fabricated data: Posterior prob ∼ single evidence
[Figure: map panels of the binary target T and of the posterior probabilities T ~ B1, T ~ B2, T ~ B3 on the 10 × 10 grid, with probability levels from 0.0 to 1.0.]
Fabricated data: Posterior prob ∼ combined evidence
[Figure: map panels of the posterior probabilities T ~ B1 + B2, T ~ B1 + B3, T ~ B2 + B3, and T ~ B1 + B2 + B3 on the 10 × 10 grid, with probability levels from 0.0 to 1.0.]
Potential modeling
Methods
Weights of evidence
Logistic regression
Comparison
Terms
Odds
For probabilities $P(A) \neq 1$, $A \in \mathcal{A}$, odds are defined as the ratio
$$O(A) = \frac{P(A)}{P(A^c)} = \frac{P(A)}{1 - P(A)}, \qquad P(A) \in [0, 1).$$
Logits
Logits are defined as
$$\operatorname{logit}(A) = \ln O(A) = \ln \frac{P(A)}{1 - P(A)}, \qquad P(A) \in [0, 1).$$
Logistic function
The logistic function is defined as
$$\Lambda(z) = \frac{1}{1 + \exp(-z)}, \qquad z \in \mathbb{R}.$$
Terms
Logistic function
[Figure: two plots of the logistic function for z ∈ [−10, 10].]
Graphs of the sigmoidal function Λ(z) (left) and Λ(32z) (right).
Remark
The logit transform and the logistic function are mutually inverse, i.e.,
$$\operatorname{logit} \Lambda(z) = \ln \frac{\Lambda(z)}{1 - \Lambda(z)} = z.$$
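This inverse relation is easy to check numerically; the following minimal R sketch (function and variable names are illustrative, not from the talk) evaluates both directions on a grid:

# logistic function and logit transform as defined above
Lambda <- function(z) 1 / (1 + exp(-z))
logit  <- function(p) log(p / (1 - p))

z <- seq(-10, 10, by = 0.5)
all.equal(logit(Lambda(z)), z)   # TRUE: the logit undoes Lambda up to rounding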
Terms
Conditional independence of random events
Two random events $A_1$ and $A_2$ are conditionally stochastically independent given the conditioning random event $C$, written $A_1 \perp\!\!\!\perp A_2 \mid C$, if, given the occurrence of event $C$, knowledge of the occurrence of event $A_2$ does not affect the probability of event $A_1$, i.e.,
$$P(A_1 \mid A_2 C) = P(A_1 \mid C).$$
Conditional independence of random variables
Two random variables $Z_1, Z_2$ are conditionally stochastically independent given the conditioning random variable $C$ if knowing $C$ renders $Z_2$ irrelevant for predicting $Z_1$.
In terms of densities,
$$f_{Z_1 \mid Z_2, C} = f_{Z_1 \mid C}.$$
Weights of evidence
Assuming conditional independence of the binary evidential random variables $B_\ell$, $\ell = 1, \dots, m$, given the binary random target variable $T$ yields the
weights of evidence
$$W_\ell^{+} := \ln \frac{P(B_\ell = 1 \mid T = 1)}{P(B_\ell = 1 \mid T = 0)}, \qquad
W_\ell^{-} := \ln \frac{P(B_\ell = 0 \mid T = 1)}{P(B_\ell = 0 \mid T = 0)},$$
and the conditional “posterior” probabilities,
in terms of a logit (“log–linear form of Bayes’ formula”):
$$\operatorname{logit} P\bigl(T = 1 \mid (B_\ell = b_\ell)_{\ell = 1, \dots, m}\bigr)
= \operatorname{logit} P(T = 1) + \sum_{\ell : b_\ell = 1} W_\ell^{+} + \sum_{\ell : b_\ell = 0} W_\ell^{-},$$
in terms of a probability:
$$P\bigl(T = 1 \mid (B_\ell = b_\ell)_{\ell = 1, \dots, m}\bigr)
= \Bigl(1 + O(T)^{-1} \exp\Bigl(-\sum_{\ell : b_\ell = 1} W_\ell^{+} - \sum_{\ell : b_\ell = 0} W_\ell^{-}\Bigr)\Bigr)^{-1}.$$
Weights of evidence
Estimation of the weights W
Given the sample $(B_{\ell,i}, T_i)$, $i = 1, \dots, n$, $\ell = 1, \dots, m$, the weights are estimated by counting, i.e.,
$$\widehat W_\ell^{+} = \ln \frac{\sum_{i=1}^{n} B_{\ell,i} T_i \,\big/\, \sum_{i=1}^{n} T_i}
{\sum_{i=1}^{n} B_{\ell,i} (1 - T_i) \,\big/\, \sum_{i=1}^{n} (1 - T_i)}, \qquad
\widehat W_\ell^{-} = \ln \frac{\sum_{i=1}^{n} (1 - B_{\ell,i}) T_i \,\big/\, \sum_{i=1}^{n} T_i}
{\sum_{i=1}^{n} (1 - B_{\ell,i}) (1 - T_i) \,\big/\, \sum_{i=1}^{n} (1 - T_i)},$$
where $\sum_{i=1}^{n} B_{\ell,i} (1 - T_i) = \sum_{i=1}^{n} B_{\ell,i} - \sum_{i=1}^{n} B_{\ell,i} T_i$ and $\sum_{i=1}^{n} (1 - T_i) = n - \sum_{i=1}^{n} T_i$.
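As a hedged illustration, the counting estimators and the resulting posterior probability fit in a few lines of R; the function names, and the assumption that the evidences form an n × m 0/1 matrix B with a 0/1 target vector t, are mine, not the talk’s:

# weights of evidence for one binary evidence column b given target t
woe <- function(b, t) {
  c(Wplus  = log((sum(b * t) / sum(t)) /
                 (sum(b * (1 - t)) / sum(1 - t))),
    Wminus = log((sum((1 - b) * t) / sum(t)) /
                 (sum((1 - b) * (1 - t)) / sum(1 - t))))
}

# posterior P(T = 1 | B = bnew) via the log-linear form of Bayes' formula
posterior <- function(B, t, bnew) {
  W <- apply(B, 2, woe, t = t)           # 2 x m matrix: row 1 = W+, row 2 = W-
  s <- sum(ifelse(bnew == 1, W[1, ], W[2, ]))
  priorodds <- mean(t) / (1 - mean(t))   # O(T)
  1 / (1 + exp(-s) / priorodds)          # = Lambda(logit P(T = 1) + s)
}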
Logistic regression
The conditional expectation of a binary random variable $T$ given the realisation $b = (b_0, b_1, \dots, b_m)^{\mathsf T} \in \mathbb{R}^{m+1}$ of an $(m+1)$–variate random predictor variable $B = (B_0, B_1, \dots, B_m)^{\mathsf T}$ with $B_0 \equiv 1$ is
$$E(T \mid B = b) = P(T = 1 \mid B = b) =: \pi(b).$$
Then the logistic regression model without interaction terms is,
in terms of a logit,
$$\operatorname{logit} P\bigl(T = 1 \mid (B_\ell = b_\ell)_{\ell = 0, \dots, m}\bigr) = \sum_{\ell = 0}^{m} \beta_\ell b_\ell,$$
in terms of a probability,
$$\pi(b) = P\bigl(T = 1 \mid (B_\ell = b_\ell)_{\ell = 0, \dots, m}\bigr)
= \Lambda\Bigl(\sum_{\ell = 0}^{m} \beta_\ell b_\ell\Bigr)
= \Bigl(1 + \exp\Bigl(-\sum_{\ell = 0}^{m} \beta_\ell b_\ell\Bigr)\Bigr)^{-1}.$$
Logistic regression
Estimation of the parameters β
Given the sample $(B_{\ell,i}, T_i)$, $i = 1, \dots, n$, $\ell = 1, \dots, m$, the parameters of the logistic regression model are estimated with well-established, well-understood methods grounded in probability theory and implemented in every major statistical software package:
Method: maximum likelihood estimation
Numerics: Fisher scoring algorithm (a form of Newton–Raphson, and a special case of the iteratively reweighted least squares algorithm),
ensuring “nice” statistical properties of the estimates.
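For instance, in R the maximum likelihood fit is a one-liner; this sketch assumes the fabricated training data are held in the data frame Q used in the case study below:

fit <- glm(T ~ B1 + B2 + B3, family = binomial("logit"), data = Q)
summary(fit)   # estimated coefficients beta with standard errors and z tests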
Artificial neural nets
Single-layer feedforward artificial neural net
With respect to artificial neural nets,
$$\pi(b_i) = \Lambda\Bigl(\beta_0 + \sum_{\ell = 1}^{m} \beta_\ell b_{\ell,i}\Bigr), \quad i = 1, \dots, n,$$
is called a single-layer perceptron or single-layer ANN,
[Figure: cartoon of a single hidden layer ANN from http://en.wikipedia.org/wiki/Artificial_neural_network]
minimization of the sum of squared residuals is referred to as training,
gradient methods to solve for the model parameters are referred to as the linear perceptron training rule,
the stepsize along the negative gradient is called the learning rate,
etc.
Comparison: Modeling assumptions, appropriate models
Conditional independence
If the pieces of evidence are conditionally independent given the
target, then
the method of weights of evidence applies,
logistic regression without interaction terms yields the proper
complete model.
If individual pieces of evidence are not conditionally independent
given the target, then
the method of weights of evidence does not apply,
logistic regression still applies, but interaction terms may be
needed to yield the proper complete model.
If (some) pieces of evidence are not binary, then
the method of weights of evidence does not apply,
logistic regression still applies, and interaction terms may be
needed, but multi-linear interaction terms may yield only
approximations to the proper complete model.
Potential modeling
Case study with fabricated data
Fabricated data: Training area
[Figure: as before — four 10 × 10 map panels showing the binary target T and the binary evidences B1, B2, B3.]
Case study with fabricated data
Correlation matrix of fabricated data

       B1    B2    B3     T
B1   1.00  0.11  0.00  0.36
B2   0.11  1.00  0.38  0.42
B3   0.00  0.38  1.00  0.17
T    0.36  0.42  0.17  1.00
Case study with fabricated data
Log–linear model test for the fabricated dataset
Null hypothesis: the joint distribution can be represented in terms of distributions assuming conditional independence given “deposit” T.
Alternative: the joint distribution could be any, i.e., it cannot be restricted.

loglm(formula = ~ B1*T + B2*T + B3*T + T, data = xtabs(~ ., Q))

Statistics:
                      χ²        df   P(> χ²)
Likelihood Ratio    12.83973     8   0.1174842
Pearson             13.46839     8   0.0967179

The null hypothesis is rejected for any α > P(> χ²).
Case study with fabricated data
Log–linear model test for the fabricated dataset
Null hypothesis: the joint distribution can be represented in such terms that the conditional distribution of “deposit” T given all evidences B1, B2, B3 is given by the logistic model including interaction terms.
Alternative: the joint distribution could be any, i.e., it cannot be restricted.

loglm(formula = ~ B1*B2*B3 + B1*T + B2*T + B3*T + T, data = xtabs(~ ., Q))

Statistics:
                      χ²        df   P(> χ²)
Likelihood Ratio    2.501984     4   0.6442805
Pearson             1.801089     4   0.7722831

The null hypothesis is not rejected.
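Both tests can be reproduced with a few lines of R; this sketch assumes, as above, that Q is a data frame with the binary columns B1, B2, B3, T:

library(MASS)                   # provides loglm()
tab <- xtabs(~ ., data = Q)     # 2 x 2 x 2 x 2 contingency table
loglm(~ B1*T + B2*T + B3*T + T, data = tab)              # conditional independence
loglm(~ B1*B2*B3 + B1*T + B2*T + B3*T + T, data = tab)   # incl. evidence interactions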
Case study with fabricated data
Relative frequency by counting
h(T = 1) = 0.1
Conditional relative frequencies by counting
h(T = 1 | B1 = 1) = 0.2666667
h(T = 1 | B2 = 1) = 0.35
h(T = 1 | B3 = 1) = 0.2
h(T = 1 | B1 = 1, B2 = 1) = 0.625
h(T = 1 | B1 = 1, B3 = 1) = 0.5
h(T = 1 | B2 = 1, B3 = 1) = 0.4
h(T = 1 | B1 = 1, B2 = 1, B3 = 1) = 0.75
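In R, these counting estimates reduce to conditional means; again Q is assumed to be the data frame of the fabricated training set:

with(Q, mean(T))                               # h(T = 1)
with(Q, mean(T[B1 == 1]))                      # h(T = 1 | B1 = 1)
with(Q, mean(T[B1 == 1 & B2 == 1]))            # h(T = 1 | B1 = 1, B2 = 1)
with(Q, mean(T[B1 == 1 & B2 == 1 & B3 == 1]))  # h(T = 1 | B1 = B2 = B3 = 1)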
Case study with fabricated data
Logistic regression model without interaction terms
$$P(T = 1 \mid B_1, B_2, B_3) = \Lambda(\beta_0 + \beta_1 B_1 + \beta_2 B_2 + \beta_3 B_3),$$
in terms of R:
glm(T ~ B1 + B2 + B3, family = binomial("logit"), data = Q)
$$\widehat P(T = 1 \mid B_1 = 1, B_2 = 1, B_3 = 1)
= \Lambda(-4.7957 + 2.7265 + 2.7532 + 0.2008)
= \Lambda(0.884889) = 0.7078343 \pm 0.1797243.$$
Case study with fabricated data
Logistic regression model with interaction terms
$$P(T = 1 \mid B_1, B_2, B_3) = \Lambda(\beta_0 + \beta_1 B_1 + \beta_2 B_2 + \beta_3 B_3 + \beta_4 B_1 B_3 + \beta_5 B_2 B_3),$$
in terms of R:
glm(T ~ B1*B3 + B2*B3 + B1 + B2 + B3, family = binomial("logit"), data = Q)
$$\widehat P(T = 1 \mid B_1 = 1, B_2 = 1, B_3 = 1)
= \Lambda(-4.6910 + 2.7657 + 2.3881 - 14.9968 - 0.0576 + 15.6903)
= \Lambda(1.098612) = 0.75 \pm 0.2165063.$$
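A sketch of how such estimates and the quoted se.fit values are obtained: predict() on the fitted glm object returns both (names as before; the numbers in the comments are those reported above):

fit2 <- glm(T ~ B1*B3 + B2*B3 + B1 + B2 + B3,
            family = binomial("logit"), data = Q)
pred <- predict(fit2, newdata = data.frame(B1 = 1, B2 = 1, B3 = 1),
                type = "response", se.fit = TRUE)
pred$fit      # estimated posterior probability, 0.75 here
pred$se.fit   # its standard error, 0.2165063 here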
Case study with fabricated data
Relative frequency h(T = 1) = 0.1

Method                                               P̂(T = 1 | B1 = B2 = B3 = 1)
counting                                             0.75      ± 0.1767767 (se)
WofE                                                 0.7985915 ± 0.7155018 (se.fit)
logistic regression (R), without interaction terms   0.7078343 ± 0.1797243 (se.fit)
logistic regression (R), with interaction terms      0.75      ± 0.2165063 (se.fit)
Potential modeling
Conclusions
Conclusions
In general:
if weights of evidence apply, then they usually yield smaller variances/errors than logistic regression, due to the additional modeling assumption;
logistic regression is more general than weights of evidence, i.e., it is unrestricted with respect to
modeling assumptions (conditional independence), and
the type of random variables (binary).
For the fabricated dataset:
the numerical results confirm the deficiency of WofE;
the assumption of conditional independence is rejected, while the logistic model including interaction terms is not;
the logistic model including interaction terms seems sufficiently rich to closely approximate the saturated model.
Potential modeling
Acknowledgments
Acknowledgments
Potential modeling is a contribution of the Geomathematics and
Geoinformatics group at TU Bergakademie Freiberg, Germany, to
the “ProMine” project funded by the European Community’s
Seventh Framework Programme under grant agreement no.
228559. This publication reflects only the author’s view, exempting
the Community from any liability.
It is my special pleasure to acknowledge emphatic discussions with
Don Singer, USGS.
Case study with fabricated data

Correlation matrix of dataset “balanced” by a factor of 0.5:

       B1    B2    B3     T
B1   1.00  0.11  0.00  0.36
B2   0.11  1.00  0.38  0.42
B3   0.00  0.38  1.00  0.17
T    0.36  0.42  0.17  1.00

Correlation matrix of initial dataset:

       B01   B02   B03    T0
B01   1.00  0.20  0.07  0.54
B02   0.20  1.00  0.47  0.56
B03   0.07  0.47  1.00  0.24
T0    0.54  0.56  0.24  1.00
Case study with fabricated data

Call:
loglm(formula = ~ B1*T + B2*T + B3*T + T, data = xtabs(~ ., artdat36))

Statistics:
                      X²        df   P(> X²)
Likelihood Ratio    12.83973     8   0.1174842
Pearson             13.46839     8   0.0967179

Call:
loglm(formula = ~ B01*T0 + B02*T0 + B03*T0 + T0, data = xtabs(~ ., artdat1rep))

Statistics:
                      X²        df   P(> X²)
Likelihood Ratio    35.19894     8   2.459697e-05
Pearson             31.20648     8   1.290963e-04

Call:
loglm(formula = ~ B1*B2*B3 + B1*T + B2*T + B3*T + T, data = xtabs(~ ., artdat36))

Statistics:
                      X²        df   P(> X²)
Likelihood Ratio    2.501984     4   0.6442805
Pearson             1.801089     4   0.7722831

Call:
loglm(formula = ~ B01*B02*B03 + B01*T0 + B02*T0 + B03*T0 + T0, data = xtabs(~ ., artdat1rep))

Statistics:
                      X²        df   P(> X²)
Likelihood Ratio    8.257430     4   0.08259057
Pearson             7.266882     4   0.12243913