"Potential Modeling": Mathematical Methods, Model Assumptions, and Practical Applications

Helmut Schaeben
Mathematical Geology and Geoinformatics
Institut für Geophysik und Geoinformatik, TU Bergakademie Freiberg

10th Saxon GIS Forum of GDI Sachsen e.V., Dresden, 29 Jan. 2013

Contents
- Introduction: objective, prerequisites
- Methods: weights of evidence, logistic regression, comparison
- Case study

Introduction

Objective

The ultimate goal of "potential modeling" or "targeting" is to recognize locations for which the probability of a "target" event, such as a landslide or a specific mineralization, is a relative maximum.

The event must be sufficiently well understood in terms of cause and effect to collect data corresponding to spatially referenced factors ("evidences") $B_\ell$, $\ell = 0, \dots, m$, in favor of or against the occurrence of the event $T$.

Then spatially referenced "posterior" probabilities given the pieces of evidence can be estimated by several approaches, including weights of evidence, logistic regression, fuzzy logic, artificial neural nets, statistical learning, support vector machines, and others.

These methods require a training area to estimate the parameters of the model
$$M(\theta_0, \dots, \theta_m \mid (b_{0,i}, \dots, b_{m,i}, t_i)_{i=1,\dots,n}).$$

Potential Modeling – Prospectivity Prediction (Höffigkeitsprognose)

Prerequisites: no mineral deposits without mineralogy.
- Cox, D.P., Singer, D.A., eds., 1986. Mineral deposit models: U.S. Geological Survey Bulletin 1693.
- Singer, D.A., Menzie, W.D., 2010. Quantitative Mineral Resource Assessments, an Integrated Approach: Oxford University Press.
- McCuaig, T.C., Beresford, S., Hronsky, J., 2010. Translating the mineral systems approach into an effective exploration targeting system: Ore Geology Reviews 38, 128–138.
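The case study below rests on a fabricated training area: a 10 × 10 grid of cells carrying a binary target T and three binary evidence layers B1, B2, B3, assembled into a training table Q with one row per cell. As a purely hypothetical sketch of that layout in R (random placeholder values, not the actual dataset behind the figures and results that follow):

```r
## Hypothetical stand-in for the fabricated training table Q: one row per
## grid cell of a 10 x 10 raster, binary target T, binary evidences B1-B3.
## The values are random placeholders, not the data used in the slides.
set.seed(1)
Q <- data.frame(
  B1 = rbinom(100, 1, 0.3),   # evidence layers as 0/1 indicators
  B2 = rbinom(100, 1, 0.2),
  B3 = rbinom(100, 1, 0.25)
)
## a target that depends on the evidences, so later fits are not pure noise
Q$T <- rbinom(100, 1, plogis(-3 + 2 * Q$B1 + 2 * Q$B2 + 0.5 * Q$B3))
```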
Fabricated data: Training area

[Figure: 10 × 10 raster maps (x, y = 1, ..., 10) of the binary target T and the binary evidence layers B1, B2, B3; marked cells carry the value 1.]

[Figure: raster maps of the evidence overlays B1 + B2, B1 + B3, B2 + B3, and B1 + B2 + B3; cell values 1–3 count how many of the overlaid evidences are present.]

Fabricated data: Posterior probability given the evidence

[Figure: level plots (scale 0–1) of the posterior probability of T given single evidences, T ~ B1, T ~ B2, T ~ B3, together with the map of the binary target T.]

[Figure: level plots (scale 0–1) of the posterior probability of T given combined evidences, T ~ B1 + B2, T ~ B1 + B3, T ~ B2 + B3, and T ~ B1 + B2 + B3.]

Methods

Terms

Odds: for probabilities $P(A) \neq 1$, $A \in \mathcal{A}$, odds are defined as the ratio
$$O(A) = \frac{P(A)}{P(A^c)} = \frac{P(A)}{1 - P(A)}, \quad P(A) \in [0, 1).$$

Logits: logits are defined as
$$\operatorname{logit}(A) = \ln O(A) = \ln \frac{P(A)}{1 - P(A)}, \quad P(A) \in (0, 1).$$

Logistic function: the logistic function is defined as
$$\Lambda(z) = \frac{1}{1 + \exp(-z)}, \quad z \in \mathbb{R}.$$

[Figure: graphs of the sigmoidal functions Λ(z) (left) and Λ(32z) (right).]

Remark: the logit transform and the logistic function are mutually inverse, i.e.
$$\operatorname{logit} \Lambda(z) = \ln \frac{\Lambda(z)}{1 - \Lambda(z)} = z.$$
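In R, Λ and the logit are available as the base functions plogis and qlogis, so the mutual-inverse relationship is easy to verify numerically:

```r
## base R: plogis(z) = 1 / (1 + exp(-z)) is the logistic function Lambda,
## qlogis(p) = log(p / (1 - p)) is the logit, its inverse
z <- seq(-10, 10, by = 0.1)
all.equal(qlogis(plogis(z)), z)         # TRUE: logit(Lambda(z)) = z
curve(plogis(x), from = -10, to = 10)   # the sigmoid of the left panel above
```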
Conditional independence of random events: two random events $A_1$ and $A_2$ are conditionally stochastically independent given the conditioning random event $C$, written $A_1 \perp\!\!\!\perp A_2 \mid C$, if, given the occurrence of event $C$, knowledge of the occurrence of event $A_2$ does not affect the probability of event $A_1$, i.e.
$$P(A_1 \mid A_2 \cap C) = P(A_1 \mid C).$$

Conditional independence of random variables: two random variables $Z_1$, $Z_2$ are conditionally stochastically independent given the conditioning random variable $C$, written $Z_1 \perp\!\!\!\perp Z_2 \mid C$, if knowing $C$ renders $Z_2$ irrelevant for predicting $Z_1$. In terms of densities, $f_{Z_1 \mid Z_2, C} = f_{Z_1 \mid C}$.

Weights of evidence

Assuming conditional independence of the binary evidential random variables $B_\ell$, $\ell = 1, \dots, m$, given the binary random target variable $T$ yields the weights of evidence
$$W_\ell^+ := \ln \frac{P(B_\ell = 1 \mid T = 1)}{P(B_\ell = 1 \mid T = 0)}, \qquad W_\ell^- := \ln \frac{P(B_\ell = 0 \mid T = 1)}{P(B_\ell = 0 \mid T = 0)},$$
and conditional "posterior" probabilities, in terms of a logit (the "log-linear form of Bayes' formula"),
$$\operatorname{logit} P(T = 1 \mid (B_\ell = b_\ell)_{\ell=1,\dots,m}) = \operatorname{logit} P(T = 1) + \sum_{\ell : b_\ell = 1} W_\ell^+ + \sum_{\ell : b_\ell = 0} W_\ell^-,$$
and in terms of a probability,
$$P(T = 1 \mid (B_\ell = b_\ell)_{\ell=1,\dots,m}) = \Bigl(1 + O(T)^{-1} \exp\Bigl(-\sum_{\ell : b_\ell = 1} W_\ell^+ - \sum_{\ell : b_\ell = 0} W_\ell^-\Bigr)\Bigr)^{-1}.$$

Estimation of weights W: given the sample $(B_{\ell,i}, T_i)$, $i = 1, \dots, n$, $\ell = 1, \dots, m$, the weights are estimated by counting, i.e.
$$\widehat{W}_\ell^+ = \ln \frac{\sum_{i=1}^n B_{\ell,i} T_i \big/ \sum_{i=1}^n T_i}{\sum_{i=1}^n B_{\ell,i} (1 - T_i) \big/ \sum_{i=1}^n (1 - T_i)}, \qquad \widehat{W}_\ell^- = \ln \frac{\sum_{i=1}^n (1 - B_{\ell,i}) T_i \big/ \sum_{i=1}^n T_i}{\sum_{i=1}^n (1 - B_{\ell,i})(1 - T_i) \big/ \sum_{i=1}^n (1 - T_i)}.$$
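A minimal sketch of this counting estimation, assuming 0/1 vectors such as the columns of a training table Q as above (the helper names woe and woe.posterior are hypothetical):

```r
## weights of evidence for one binary evidence layer b and binary target t,
## estimated by counting as in the formulas above
woe <- function(b, t) {
  w.plus  <- log((sum(b * t) / sum(t)) /
                 (sum(b * (1 - t)) / sum(1 - t)))
  w.minus <- log((sum((1 - b) * t) / sum(t)) /
                 (sum((1 - b) * (1 - t)) / sum(1 - t)))
  c(W.plus = w.plus, W.minus = w.minus)
}

## posterior probability for an observed evidence pattern, assuming
## conditional independence of the evidences given T (the WofE assumption)
woe.posterior <- function(B, t, pattern) {
  W <- sapply(B, woe, t = t)            # 2 x m matrix of estimated weights
  s <- sum(ifelse(pattern == 1, W["W.plus", ], W["W.minus", ]))
  plogis(qlogis(mean(t)) + s)           # prior logit plus summed weights
}

## e.g. woe.posterior(Q[c("B1", "B2", "B3")], Q$T, pattern = c(1, 1, 1))
```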
Logistic regression

Consider the conditional expectation of a binary random variable $T$ given the realisation $b = (b_0, b_1, \dots, b_m)^T \in \mathbb{R}^{m+1}$ of an $(m+1)$-variate random predictor variable $B = (B_0, B_1, \dots, B_m)^T$ with $B_0 \equiv 1$:
$$E(T \mid B = b) = P(T = 1 \mid B = b) =: \pi(b).$$

Then the logistic regression model without interaction terms is, in terms of a logit,
$$\operatorname{logit} P(T = 1 \mid (B_\ell = b_\ell)_{\ell=0,\dots,m}) = \sum_{\ell=0}^{m} \beta_\ell b_\ell,$$
and in terms of a probability,
$$\pi(b) = P(T = 1 \mid (B_\ell = b_\ell)_{\ell=0,\dots,m}) = \Lambda\Bigl(\sum_{\ell=0}^{m} \beta_\ell b_\ell\Bigr) = \Bigl(1 + \exp\Bigl(-\sum_{\ell=0}^{m} \beta_\ell b_\ell\Bigr)\Bigr)^{-1}.$$

Estimation of parameters β: given the sample $(B_{\ell,i}, T_i)$, $i = 1, \dots, n$, $\ell = 1, \dots, m$, the parameters of the logistic regression model are estimated with well established, well understood methods based on probability theory and encoded in any major statistical software package:
- Method: maximum likelihood estimation.
- Numerics: Fisher scoring algorithm (a form of Newton–Raphson, a special case of the iteratively reweighted least squares algorithm), ensuring "nice" statistical properties of the estimates.

Artificial neural nets

With respect to artificial neural nets,
$$\pi(b_i) = \Lambda\Bigl(\beta_0 + \sum_{\ell=1}^{m} \beta_\ell b_{\ell,i}\Bigr), \quad i = 1, \dots, n,$$
is called a single-layer perceptron or single-layer feedforward ANN. [Cartoon of a single-hidden-layer ANN from http://en.wikipedia.org/wiki/Artificial_neural_network] Minimization of the sum of squared residuals is referred to as training, gradient methods to solve for the model parameters are referred to as the linear perceptron training rule, the step size along the negative gradient is called the learning rate, etc.

Comparison: modeling assumptions, appropriate models

Conditional independence:
- If the pieces of evidence are conditionally independent given the target, then the method of weights of evidence applies, and logistic regression without interaction terms yields the proper complete model.
- If individual pieces of evidence are not conditionally independent given the target, then the method of weights of evidence does not apply; logistic regression still applies, but interaction terms may be needed to yield the proper complete model.
- If (some) pieces of evidence are not binary, then the method of weights of evidence does not apply; logistic regression still applies and interaction terms may be needed, but multi-linear interaction terms may yield only approximations to the proper complete model.

Case study with fabricated data

Fabricated data: training area. [Figure: the 10 × 10 raster maps of the binary target T and the binary evidences B1, B2, B3, as shown above.]

Correlation matrix of the fabricated data:

      B1    B2    B3     T
B1  1.00  0.11  0.00  0.36
B2  0.11  1.00  0.38  0.42
B3  0.00  0.38  1.00  0.17
T   0.36  0.42  0.17  1.00

Log-linear model test for the fabricated dataset.
Null hypothesis: the joint distribution can be represented in terms of distributions assuming conditional independence given "deposit" T.
Alternative: the joint distribution could be any, i.e., it cannot be restricted.

loglm(formula = ~ B1*T + B2*T + B3*T + T, data = xtabs(~., Q))

Statistics:
                       X^2  df   P(> X^2)
Likelihood Ratio  12.83973   8  0.1174842
Pearson           13.46839   8  0.0967179

The null hypothesis is rejected for any α > P(> X^2).

Log-linear model test for the fabricated dataset.
Null hypothesis: the joint distribution can be represented in such terms that the conditional distribution of "deposit" T given all evidences B1, B2, B3 is given by the logistic model including interaction terms.
Alternative: the joint distribution could be any, i.e., it cannot be restricted.

loglm(formula = ~ B1*B2*B3 + B1*T + B2*T + B3*T + T, data = xtabs(~., Q))

Statistics:
                       X^2  df   P(> X^2)
Likelihood Ratio  2.501984   4  0.6442805
Pearson           1.801089   4  0.7722831

The null hypothesis is not rejected.
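The tests above use loglm from the MASS package on the 2 × 2 × 2 × 2 contingency table of (B1, B2, B3, T). A minimal sketch of how they might be reproduced, assuming the training table Q from above:

```r
library(MASS)   # provides loglm for hierarchical log-linear models

## cross-classify the four binary variables into a 2 x 2 x 2 x 2 table
tab <- xtabs(~ B1 + B2 + B3 + T, data = Q)

## model 1: conditional independence of B1, B2, B3 given T
ci <- loglm(~ B1*T + B2*T + B3*T, data = tab)

## model 2: the evidences may additionally interact freely among themselves
lgm <- loglm(~ B1*B2*B3 + B1*T + B2*T + B3*T, data = tab)

ci    # likelihood-ratio and Pearson X^2 with df and p-values
lgm
```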
Case study with fabricated data

Relative frequency by counting: h(T = 1) = 0.1.

Conditional relative frequencies by counting:

h(T = 1 | B1 = 1) = 0.2666667
h(T = 1 | B2 = 1) = 0.35
h(T = 1 | B3 = 1) = 0.2
h(T = 1 | B1 = 1, B2 = 1) = 0.625
h(T = 1 | B1 = 1, B3 = 1) = 0.5
h(T = 1 | B2 = 1, B3 = 1) = 0.4
h(T = 1 | B1 = 1, B2 = 1, B3 = 1) = 0.75

Logistic regression model without interaction terms:
$$P(T = 1 \mid B_1, B_2, B_3) = \Lambda(\beta_0 + \beta_1 B_1 + \beta_2 B_2 + \beta_3 B_3),$$
in terms of R: glm(T ~ B1 + B2 + B3, family = binomial("logit"), data = Q), giving
$$\widehat{P}(T = 1 \mid B_1 = 1, B_2 = 1, B_3 = 1) = \Lambda(-4.7957 + 2.7265 + 2.7532 + 0.2008) = \Lambda(0.884889) = 0.7078343 \pm 0.1797243.$$

Logistic regression model with interaction terms:
$$P(T = 1 \mid B_1, B_2, B_3) = \Lambda(\beta_0 + \beta_1 B_1 + \beta_2 B_2 + \beta_3 B_3 + \beta_4 B_1 B_3 + \beta_5 B_2 B_3),$$
in terms of R: glm(T ~ B1*B3 + B2*B3 + B1 + B2 + B3, family = binomial("logit"), data = Q), giving
$$\widehat{P}(T = 1 \mid B_1 = 1, B_2 = 1, B_3 = 1) = \Lambda(-4.6910 + 2.7657 + 2.3881 - 14.9968 - 0.0576 + 15.6903) = \Lambda(1.098612) = 0.75 \pm 0.2165063.$$

Comparison of the estimates, with relative frequency h(T = 1) = 0.1 (a sketch of the R calls follows below):

Method                                       est. P(T = 1 | B1 = B2 = B3 = 1)
counting                                     0.75      ± 0.1767767  (se)
WofE                                         0.7985915 ± 0.7155018  (se.fit)
logistic regression (R), no interactions     0.7078343 ± 0.1797243  (se.fit)
logistic regression (R), with interactions   0.75      ± 0.2165063  (se.fit)
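A minimal sketch of both fits and of how the quoted estimates with their se.fit values might be obtained, assuming the training table Q from above (coefficients will differ for any data other than the original):

```r
## logistic regression without and with interaction terms
fit0 <- glm(T ~ B1 + B2 + B3, family = binomial("logit"), data = Q)
fit1 <- glm(T ~ B1*B3 + B2*B3, family = binomial("logit"), data = Q)

## posterior probability at B1 = B2 = B3 = 1, with its standard error on
## the response scale (the "+/- se.fit" of the comparison table)
newcell <- data.frame(B1 = 1, B2 = 1, B3 = 1)
p0 <- predict(fit0, newdata = newcell, type = "response", se.fit = TRUE)
p1 <- predict(fit1, newdata = newcell, type = "response", se.fit = TRUE)
c(fit = p0$fit, se = p0$se.fit)
c(fit = p1$fit, se = p1$se.fit)
```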
Conclusions

In general:
- If weights of evidence apply, then they usually yield smaller variances/errors than logistic regression, owing to the additional modeling assumption.
- Logistic regression is more general than weights of evidence, i.e., it is unrestricted with respect to the modeling assumption (conditional independence) and the type of random variables (binary).

For the fabricated dataset:
- The numerical results confirm the deficiency of WofE.
- The assumption of conditional independence is rejected; the logistic model including interaction terms is not.
- Logistic regression including interaction terms seems sufficiently large to closely approximate the saturated model.

Acknowledgments

Potential modeling is a contribution of the Geomathematics and Geoinformatics group at TU Bergakademie Freiberg, Germany, to the "ProMine" project funded by the European Community's Seventh Framework Programme under grant agreement no. 228559. This publication reflects only the authors' view, exempting the Community from any liability. It is my special pleasure to acknowledge emphatic discussions with Don Singer, USGS.

Backup: case study with fabricated data

Correlation matrix of the initial dataset:

      B1    B2    B3     T
B1  1.00  0.11  0.00  0.36
B2  0.11  1.00  0.38  0.42
B3  0.00  0.38  1.00  0.17
T   0.36  0.42  0.17  1.00

Correlation matrix of the dataset "balanced" by a factor of 0.5:

       B01   B02   B03    T0
B01   1.00  0.20  0.07  0.54
B02   0.20  1.00  0.47  0.56
B03   0.07  0.47  1.00  0.24
T0    0.54  0.56  0.24  1.00

Log-linear model tests for both datasets.

Initial dataset (artdat36):

Call: loglm(formula = ~ B1*T + B2*T + B3*T + T, data = xtabs(~., artdat36))
Statistics:
                       X^2  df   P(> X^2)
Likelihood Ratio  12.83973   8  0.1174842
Pearson           13.46839   8  0.0967179

Call: loglm(formula = ~ B1*B2*B3 + B1*T + B2*T + B3*T + T, data = xtabs(~., artdat36))
Statistics:
                       X^2  df   P(> X^2)
Likelihood Ratio  2.501984   4  0.6442805
Pearson           1.801089   4  0.7722831

"Balanced" dataset (artdat1rep):

Call: loglm(formula = ~ B01*T0 + B02*T0 + B03*T0 + T0, data = xtabs(~., artdat1rep))
Statistics:
                       X^2  df      P(> X^2)
Likelihood Ratio  35.19894   8  2.459697e-05
Pearson           31.20648   8  1.290963e-04

Call: loglm(formula = ~ B01*B02*B03 + B01*T0 + B02*T0 + B03*T0 + T0, data = xtabs(~., artdat1rep))
Statistics:
                       X^2  df    P(> X^2)
Likelihood Ratio  8.257430   4  0.08259057
Pearson           7.266882   4  0.12243913

For the "balanced" dataset, conditional independence given T0 is clearly rejected (p-values of the order 10^-5 to 10^-4), while the model including the evidence interactions is not rejected at the 5% level.