Lecture Slides for INTRODUCTION TO MACHINE LEARNING, 3RD EDITION
ETHEM ALPAYDIN © The MIT Press, 2014
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e

CHAPTER 3: BAYESIAN DECISION THEORY

Probability and Inference (slide 3)
- The result of tossing a coin is in {Heads, Tails}.
- Random variable $X \in \{1, 0\}$
- Bernoulli: $P\{X = x\} = p_o^{x}(1 - p_o)^{1 - x}$
- Sample: $\mathcal{X} = \{x^t\}_{t=1}^{N}$
- Estimation: $\hat{p}_o = \#\{\text{Heads}\} / \#\{\text{Tosses}\} = \sum_t x^t / N$
- Prediction of next toss: Heads if $\hat{p}_o > 1/2$, Tails otherwise.

Classification (slide 4)
- Credit scoring: inputs are income and savings; output is low-risk vs. high-risk.
- Input: $\mathbf{x} = [x_1, x_2]^T$, Output: $C \in \{0, 1\}$
- Prediction: choose $C = 1$ if $P(C = 1 \mid x_1, x_2) > 0.5$, and $C = 0$ otherwise;
  equivalently, choose $C = 1$ if $P(C = 1 \mid x_1, x_2) > P(C = 0 \mid x_1, x_2)$, and $C = 0$ otherwise.

Bayes' Rule (slide 5)
$$P(C \mid \mathbf{x}) = \frac{P(C)\, p(\mathbf{x} \mid C)}{p(\mathbf{x})} \qquad \text{(posterior = prior} \times \text{likelihood / evidence)}$$
- $P(C = 0) + P(C = 1) = 1$
- $p(\mathbf{x}) = p(\mathbf{x} \mid C = 1)\, P(C = 1) + p(\mathbf{x} \mid C = 0)\, P(C = 0)$
- $P(C = 0 \mid \mathbf{x}) + P(C = 1 \mid \mathbf{x}) = 1$

Bayes' Rule: K > 2 Classes (slide 6)
$$P(C_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_i)\, P(C_i)}{p(\mathbf{x})} = \frac{p(\mathbf{x} \mid C_i)\, P(C_i)}{\sum_{k=1}^{K} p(\mathbf{x} \mid C_k)\, P(C_k)}$$
- $P(C_i) \ge 0$ and $\sum_{i=1}^{K} P(C_i) = 1$
- Choose $C_i$ if $P(C_i \mid \mathbf{x}) = \max_k P(C_k \mid \mathbf{x})$

Losses and Risks (slide 7)
- Actions: $\alpha_i$
- Loss of $\alpha_i$ when the state is $C_k$: $\lambda_{ik}$
- Expected risk (Duda and Hart, 1973):
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid \mathbf{x})$$
- Choose $\alpha_i$ if $R(\alpha_i \mid \mathbf{x}) = \min_k R(\alpha_k \mid \mathbf{x})$

Losses and Risks: 0/1 Loss (slide 8)
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ 1 & \text{if } i \ne k \end{cases}$$
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid \mathbf{x}) = \sum_{k \ne i} P(C_k \mid \mathbf{x}) = 1 - P(C_i \mid \mathbf{x})$$
- For minimum risk, choose the most probable class.

Losses and Risks: Reject (slide 9)
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ \lambda & \text{if } i = K + 1 \\ 1 & \text{otherwise} \end{cases}, \qquad 0 < \lambda < 1$$
$$R(\alpha_{K+1} \mid \mathbf{x}) = \lambda \sum_{k=1}^{K} P(C_k \mid \mathbf{x}) = \lambda$$
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k \ne i} P(C_k \mid \mathbf{x}) = 1 - P(C_i \mid \mathbf{x})$$
- Choose $C_i$ if $P(C_i \mid \mathbf{x}) > P(C_k \mid \mathbf{x})$ for all $k \ne i$ and $P(C_i \mid \mathbf{x}) > 1 - \lambda$; reject otherwise.

Different Losses and Reject (slide 10)
[Figure panels: equal losses; unequal losses; with reject.]

Discriminant Functions (slide 11)
- Choose $C_i$ if $g_i(\mathbf{x}) = \max_k g_k(\mathbf{x})$, with discriminants $g_i(\mathbf{x})$, $i = 1, \ldots, K$.
- Possible choices: $g_i(\mathbf{x}) = -R(\alpha_i \mid \mathbf{x})$, or $P(C_i \mid \mathbf{x})$, or $p(\mathbf{x} \mid C_i)\, P(C_i)$
- The discriminants define $K$ decision regions $\mathcal{R}_1, \ldots, \mathcal{R}_K$, where $\mathcal{R}_i = \{\mathbf{x} \mid g_i(\mathbf{x}) = \max_k g_k(\mathbf{x})\}$

K = 2 Classes (slide 12)
- Dichotomizer ($K = 2$) vs. polychotomizer ($K > 2$)
- $g(\mathbf{x}) = g_1(\mathbf{x}) - g_2(\mathbf{x})$
- Choose $C_1$ if $g(\mathbf{x}) > 0$, and $C_2$ otherwise.
- Log odds: $\log \dfrac{P(C_1 \mid \mathbf{x})}{P(C_2 \mid \mathbf{x})}$

Utility Theory (slide 13)
- Probability of state $k$ given evidence $\mathbf{x}$: $P(S_k \mid \mathbf{x})$
- Utility of $\alpha_i$ when the state is $k$: $U_{ik}$
- Expected utility: $EU(\alpha_i \mid \mathbf{x}) = \sum_k U_{ik}\, P(S_k \mid \mathbf{x})$
- Choose $\alpha_i$ if $EU(\alpha_i \mid \mathbf{x}) = \max_j EU(\alpha_j \mid \mathbf{x})$

Association Rules (slide 14)
- Association rule: X → Y
- People who buy/click/visit/enjoy X are also likely to buy/click/visit/enjoy Y.
- A rule implies association, not necessarily causation.

Association Measures (slide 15)
- Support(X → Y): $P(X, Y) = \dfrac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers}\}}$
- Confidence(X → Y): $P(Y \mid X) = \dfrac{P(X, Y)}{P(X)} = \dfrac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers who bought } X\}}$
- Lift(X → Y): $\dfrac{P(X, Y)}{P(X)\, P(Y)} = \dfrac{P(Y \mid X)}{P(Y)}$

Example (slide 16)
[Table with a worked example; not preserved in the extraction.]

Apriori Algorithm (Agrawal et al., 1996) (slide 17)
- For (X, Y, Z), a 3-item set, to be frequent (have enough support), (X, Y), (X, Z), and (Y, Z) must all be frequent.
- If (X, Y) is not frequent, none of its supersets can be frequent.
- Once we find the frequent k-item sets, we convert them to rules: X, Y → Z, ... and X → Y, Z, ...
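
The estimator on the Probability and Inference slide (slide 3) is easy to check numerically. A minimal sketch, assuming a made-up sample of ten tosses:

```python
# Coin-toss estimate from slide 3; the sample `tosses` is invented.
tosses = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # 1 = Heads, 0 = Tails

# Maximum-likelihood estimate: p_hat = #{Heads} / #{Tosses}
p_hat = sum(tosses) / len(tosses)

# Predict the next toss: Heads if p_hat > 1/2, Tails otherwise
prediction = "Heads" if p_hat > 0.5 else "Tails"
print(p_hat, prediction)  # 0.7 Heads
```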
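Bayes' rule for K > 2 classes (slide 6) can be traced with a small sketch; the priors and likelihood values below are invented for illustration:

```python
# Bayes' rule for K classes (slide 6) with invented numbers.
priors = [0.5, 0.3, 0.2]          # P(C_i), must sum to 1
likelihoods = [0.10, 0.40, 0.05]  # p(x | C_i) for one observed x

# Evidence: p(x) = sum_k p(x | C_k) P(C_k)
evidence = sum(l * p for l, p in zip(likelihoods, priors))

# Posteriors: P(C_i | x) = p(x | C_i) P(C_i) / p(x)
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]

# Choose the class with the largest posterior
i_star = max(range(len(posteriors)), key=posteriors.__getitem__)
print(posteriors, i_star)
```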
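The expected-risk rule with a reject action (slides 7 through 9) can be sketched the same way. The loss matrix, posteriors, and the value λ = 0.3 below are invented; the reject action appears as an extra row of constant loss λ:

```python
# Expected risk with a reject action (slides 7-9); all numbers invented.
lam = 0.3                          # reject loss, 0 < lam < 1
posteriors = [0.5, 0.3, 0.2]       # P(C_k | x)
losses = [
    [0, 1, 1],                     # action alpha_1 (choose C_1), 0/1 loss
    [1, 0, 1],                     # action alpha_2 (choose C_2)
    [1, 1, 0],                     # action alpha_3 (choose C_3)
    [lam, lam, lam],               # action alpha_{K+1}: reject
]

# R(alpha_i | x) = sum_k lambda_ik P(C_k | x)
risks = [sum(l * p for l, p in zip(row, posteriors)) for row in losses]

# Choose the action with minimum expected risk
i_star = min(range(len(risks)), key=risks.__getitem__)
print(risks, "reject" if i_star == len(risks) - 1 else f"C_{i_star + 1}")
```

With these numbers the best class posterior is 0.5 < 1 - λ = 0.7, so the minimum-risk action is to reject, matching the rule on slide 9.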
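The two-class log-odds discriminant (slide 12), as a tiny sketch with an invented posterior:

```python
# Dichotomizer via log odds (slide 12); the posterior is invented.
import math

p_c1 = 0.8                  # P(C_1 | x)
p_c2 = 1.0 - p_c1           # P(C_2 | x)

g = math.log(p_c1 / p_c2)   # log odds: log P(C_1|x) / P(C_2|x)
choice = "C_1" if g > 0 else "C_2"
print(g, choice)
```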
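Support, confidence, and lift (slide 15) computed on an invented transaction list:

```python
# Association measures for a rule X -> Y (slide 15); data invented.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"milk", "bread"},
]
X, Y = {"milk"}, {"bread"}

n = len(transactions)
n_x = sum(X <= t for t in transactions)       # customers who bought X
n_y = sum(Y <= t for t in transactions)       # customers who bought Y
n_xy = sum(X | Y <= t for t in transactions)  # customers who bought X and Y

support = n_xy / n             # P(X, Y)
confidence = n_xy / n_x        # P(Y | X) = P(X, Y) / P(X)
lift = confidence / (n_y / n)  # P(Y | X) / P(Y) = P(X, Y) / (P(X) P(Y))
print(support, confidence, lift)
```

A lift below 1 (as here, 0.9375) means buying X makes Y slightly less likely than its base rate, even though the confidence alone looks high.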
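Finally, a rough sketch of the Apriori pruning idea on slide 17: grow candidate (k+1)-item sets from frequent k-item sets and discard any candidate with an infrequent subset, since no superset of an infrequent set can be frequent. The transactions and minimum support count are invented:

```python
# Apriori-style frequent-itemset mining (slide 17); data invented.
from itertools import combinations

transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"milk", "butter"},
]
min_support = 3  # minimum number of supporting transactions

def support(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

# Frequent 1-item sets
items = sorted({i for t in transactions for i in t})
frequent = [{frozenset([i]) for i in items if support({i}) >= min_support}]

# Grow k-item candidates from frequent (k-1)-item sets, pruning any
# candidate that has an infrequent k-item subset (the Apriori property)
k = 1
while frequent[-1]:
    candidates = {a | b for a in frequent[-1] for b in frequent[-1]
                  if len(a | b) == k + 1}
    candidates = {c for c in candidates
                  if all(frozenset(s) in frequent[-1]
                         for s in combinations(c, k))}
    frequent.append({c for c in candidates if support(c) >= min_support})
    k += 1

for level in frequent:
    print(sorted(sorted(s) for s in level))
```

Each frequent k-item set can then be split into rules such as X, Y → Z and X → Y, Z, as the slide describes, keeping those whose confidence is high enough.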