SPECIAL TOPICS IN SCIENTIFIC
COMPUTING
Pattern Recognition & Data Mining
Lecture 2: Bayesian Decision Theory
Ref:
Bishop: 1.5
Duda: 2.1-2.2
Decision Theory
Consider, for example, a medical diagnosis problem in which we have taken an
X-ray image of a patient, and we wish to determine whether the patient has cancer or
not
The input vector x is the set of pixel intensities in the image.
The output variable t represents the presence of cancer, which we denote by the class C1, or the absence of cancer, which we denote by the class C2.
Class C1 (cancer present): t = 1
Class C2 (cancer absent): t = 0
The joint distribution p(x, t) gives us the most complete probabilistic description of the situation.
Minimizing the misclassification rate
Example: consider two classes, C1 and C2.
R1 and R2 are the decision regions assigned to classes C1 and C2, respectively.
Probability of misclassification:
P(mistake) = P(x ∈ R1, C2) + P(x ∈ R2, C1) = ∫_{R1} p(x, C2) dx + ∫_{R2} p(x, C1) dx
A good decision rule minimizes P(mistake): we should assign x to C1 if
P(x, C1) > P(x, C2)
P(x,C1)=P(C1|x)P(x)
Optimal Decision: Assign x to C1 if:
P(C1|x)>P(C2|x)
General Form:
For the more general case of K classes, it is slightly easier to maximize the probability of being correct, which is given by:
P(correct) = Σ_k P(x ∈ Rk, Ck) = Σ_k ∫_{Rk} p(x, Ck) dx, k = 1, …, K
Optimal: assign x to class Ci where
i = argmax_k P(x, Ck), k = 1, …, K
or, equivalently,
i = argmax_k P(Ck | x), k = 1, …, K
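Below is a minimal NumPy sketch of this argmax decision rule; the posterior values are invented for illustration and are not from the lecture.

```python
import numpy as np

# Hypothetical posteriors P(Ck | x) for K = 3 classes at a single point x.
# These numbers are illustrative only; in practice they come from a model.
posteriors = np.array([0.2, 0.7, 0.1])

# Minimizing the misclassification rate: assign x to the class with the
# largest posterior probability P(Ck | x).
i = int(np.argmax(posteriors))
print(f"Assign x to class C{i + 1}")  # -> Assign x to class C2
```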
Minimizing the expected loss
For many applications, our objective will be more complex than simply minimizing the
number of misclassifications.
Consider Medical diagnosis problem:
We note that, if a patient who does not have cancer is incorrectly diagnosed
as having cancer, the consequences may be some patient distress plus the need for
further investigations.
Conversely, if a patient with cancer is diagnosed as healthy,
the result may be premature death due to lack of treatment.
Thus the consequences of these two types of mistake can be dramatically different. It
would clearly be better to make fewer mistakes of the second kind, even if this was at the
expense of making more mistakes of the first kind.
Minimizing the expected loss: loss function
The loss function λ(Ci | Ck) specifies the loss incurred for deciding class Ci when the true class is Ck.
Expected loss:
E[L] = Σ_k Σ_i ∫_{Ri} λ(Ci | Ck) p(x, Ck) dx
Optimal decision: minimize E[L]. This is achieved by minimizing, for every x, the conditional risk
R(Ci | x) = Σ_{k=1}^{K} λ(Ci | Ck) P(Ck | x)
Format in the Duda book:
R(αi | x) = Σ_{j=1}^{c} λ(αi | ωj) P(ωj | x)
Minimizing E[L]
Minimize R(Ci | x) for i = 1, …, K
Optimal decision: assign x to Ck where
k = argmin_i {R(Ci | x)}, i = 1, …, K
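A small sketch of this rule, assuming an illustrative asymmetric loss matrix for the cancer example (the specific costs 1 and 100 are assumptions, not from the lecture): the conditional risk is a matrix-vector product, and the decision is its argmin.

```python
import numpy as np

# Loss matrix: loss[i, k] = lambda(Ci | Ck), the loss incurred for deciding Ci
# when the true class is Ck. Values are illustrative: a missed cancer (decide
# C2 = healthy when the truth is C1 = cancer) costs far more than a false alarm.
loss = np.array([[0.0,   1.0],
                 [100.0, 0.0]])

# Hypothetical posteriors P(C1 | x), P(C2 | x) at a point x.
posteriors = np.array([0.3, 0.7])

# Conditional risk R(Ci | x) = sum_k lambda(Ci | Ck) P(Ck | x)
risk = loss @ posteriors

# Minimizing E[L]: choose the class with the smallest conditional risk.
k = int(np.argmin(risk))
print("R(Ci | x) =", risk, "-> decide class C%d" % (k + 1))
# Note: plain argmax of the posteriors would pick C2, but the asymmetric
# loss makes deciding C1 (cancer) the lower-risk action here.
```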
Two-category classification
α1: deciding ω1
α2: deciding ω2
λik = λ(αi | ωk): loss incurred for deciding ωi when the true state of nature is ωk
Conditional risk:
R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x)
R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)
Example
Bayes decision rule is stated as:
if R(α1 | x) < R(α2 | x), take action α1: “decide ω1”
This results in the equivalent rule: decide ω1 if
(λ21 − λ11) P(x | ω1) P(ω1) > (λ12 − λ22) P(x | ω2) P(ω2)
and decide ω2 otherwise.
In likelihood-ratio form: decide ω1 if
P(x | ω1) / P(x | ω2) > [(λ12 − λ22) / (λ21 − λ11)] · [P(ω2) / P(ω1)]
Example
Let θλ = [(λ12 − λ22) / (λ21 − λ11)] · [P(ω2) / P(ω1)]; then decide ω1 if
P(x | ω1) / P(x | ω2) > θλ
If the loss matrix is
λ = | 0  1 |
    | 1  0 |
then the threshold is θa = P(ω2) / P(ω1).
If the loss matrix is
λ = | 0  2 |
    | 1  0 |
then the threshold is θb = 2 P(ω2) / P(ω1).
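The sketch below computes the two thresholds θa and θb from the loss matrices above and applies the likelihood-ratio rule; the priors and the likelihood-ratio value are assumed numbers used only for illustration.

```python
import numpy as np

def threshold(loss, prior_w1, prior_w2):
    """theta = (lam12 - lam22) / (lam21 - lam11) * P(w2) / P(w1)."""
    (lam11, lam12), (lam21, lam22) = loss
    return (lam12 - lam22) / (lam21 - lam11) * prior_w2 / prior_w1

# Illustrative priors (assumed, not from the lecture).
p_w1, p_w2 = 0.6, 0.4

theta_a = threshold(np.array([[0.0, 1.0], [1.0, 0.0]]), p_w1, p_w2)  # zero-one loss
theta_b = threshold(np.array([[0.0, 2.0], [1.0, 0.0]]), p_w1, p_w2)  # costlier lambda12

# Decide w1 when the likelihood ratio p(x|w1)/p(x|w2) exceeds the threshold.
likelihood_ratio = 1.5  # assumed value at some observed x
print("theta_a =", theta_a, " theta_b =", theta_b)
print("zero-one loss decision:", "w1" if likelihood_ratio > theta_a else "w2")
print("asymmetric loss decision:", "w1" if likelihood_ratio > theta_b else "w2")
```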
Reject option:
Reject x if max_k P(Ck | x) < θ
θ: rejection threshold (classify only when the largest posterior is confident enough)
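A minimal sketch of the reject option; the posterior vectors and the threshold value 0.9 are illustrative assumptions.

```python
import numpy as np

def decide_with_reject(posteriors, theta=0.9):
    """Return the index of the argmax class, or None (reject) if max P(Ck|x) < theta."""
    k = int(np.argmax(posteriors))
    return k if posteriors[k] >= theta else None

# Illustrative posteriors: the first case is confident enough to classify,
# the second is too ambiguous and is rejected (e.g., deferred to a human expert).
print(decide_with_reject(np.array([0.95, 0.05])))  # -> 0 (assign to C1)
print(decide_with_reject(np.array([0.55, 0.45])))  # -> None (reject)
```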
Decision Approaches:
Generative models: model the class-conditional densities P(x | Ck) and the priors P(Ck), then use Bayes' theorem to obtain the posteriors P(Ck | x).
Discriminative models: model the posterior P(Ck | x) directly, e.g., logistic regression.
Discriminant functions: map each input x directly to a class label, without computing posterior probabilities.
Optimal decision: assign x to C1 if:
P(C1 | x) > P(C2 | x)
P(Ck): prior probability
P(x | Ck): likelihood (class-conditional density)
P(Ck | x): posterior probability
Example:
From sea bass vs. salmon example to “abstract” decision making problem
State of nature; a priori (prior) probability
State of nature (which type of fish will be observed next) is unpredictable, so it is a
random variable
The catch of salmon and sea bass is equiprobable
P(ω1) = P(ω2) (uniform priors)
P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)
Prior prob. reflects our prior knowledge about how likely we are to observe a sea
bass or salmon; these probabilities may depend on time of the year or the fishing
area!
Example
Bayes decision rule with only the prior information:
Decide ω1 if P(ω1) > P(ω2), otherwise decide ω2
Error rate = min{P(ω1), P(ω2)}
Suppose now we have a measurement or
feature on the state of nature - say the fish
lightness value
Use of the class-conditional probability
density
P(x | ω1) and P(x | ω2) describe the difference in the lightness feature between the populations of sea bass and salmon.
Maximum likelihood decision rule:
Assign input pattern x to class ω1 if P(x | ω1) > P(x | ω2), otherwise to ω2
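As a concrete (assumed) illustration of this rule, the sketch below models the lightness feature with Gaussian class-conditional densities; the means and standard deviations are invented for the example, not estimated from any real fish data.

```python
from scipy.stats import norm

# Assumed Gaussian class-conditional densities for the lightness feature x:
# p(x | w1) for sea bass and p(x | w2) for salmon. Parameters are illustrative.
p_x_given_w1 = norm(loc=7.0, scale=1.0)
p_x_given_w2 = norm(loc=5.0, scale=1.2)

x = 6.2  # observed lightness of the next fish

# Maximum likelihood decision rule: pick the class whose density is larger at x.
if p_x_given_w1.pdf(x) > p_x_given_w2.pdf(x):
    print("decide w1 (sea bass)")
else:
    print("decide w2 (salmon)")
```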
How does the feature x influence our attitude (prior) concerning the true
state of nature?
Bayes decision rule
Posteriori probability
Posteriori probability, likelihood, evidence
P(ωj, x) = P(ωj | x) p(x) = p(x | ωj) P(ωj)
Bayes formula:
P(ωj | x) = p(x | ωj) P(ωj) / p(x)
where
p(x) = Σ_{j=1}^{2} p(x | ωj) P(ωj)
Posterior = (Likelihood × Prior) / Evidence
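A short sketch of Bayes' formula in code: the likelihood and prior values below are assumptions chosen for illustration, and the evidence p(x) is the normaliser that makes the posteriors sum to one.

```python
import numpy as np

# Illustrative class-conditional densities p(x | wj) evaluated at one x,
# and priors P(wj); both sets of numbers are assumed, not from the lecture.
likelihoods = np.array([0.30, 0.10])  # p(x | w1), p(x | w2)
priors      = np.array([0.40, 0.60])  # P(w1),    P(w2)

# Evidence: p(x) = sum_j p(x | wj) P(wj)
evidence = np.sum(likelihoods * priors)

# Bayes formula: P(wj | x) = p(x | wj) P(wj) / p(x)
posteriors = likelihoods * priors / evidence
print(posteriors, posteriors.sum())  # posteriors: [0.667, 0.333], sum to 1
```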
Optimal Bayes decision rule
Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2
Special cases:
(i) P(ω1) = P(ω2): decide ω1 if p(x | ω1) > p(x | ω2), otherwise ω2
(ii) p(x | ω1) = p(x | ω2): decide ω1 if P(ω1) > P(ω2), otherwise ω2