Decision Theory
CS534

Three Main Approaches to Classifier Learning

• Approach 1: Directly learn a mapping from x to the target output
  – E.g., Perceptron, SVM, and many others
  – Does not provide a probabilistic assessment
  – May not be appropriate when we want to explicitly consider uncertainty in our decision, e.g., medical diagnosis
  – Referred to as discriminative approaches

• Approach 2: Learn the joint probability distribution p(x, y)
  – p(x, y) captures all of the uncertainty about x and y
  – Often achieved by learning p(x | y) and p(y), since p(x, y) = p(y) p(x | y)
  – Examples: LDA, Naïve Bayes, etc.
  – Referred to as generative approaches: they try to learn the generative model that is assumed to generate the data:
    • Sample y ~ p(y)
    • Sample x ~ p(x | y)

• Approach 3: Learn a conditional distribution p(y | x)
  – p(y | x) = p(x, y) / p(x), so this avoids modeling the distribution of x and focuses only on how the random variable y behaves given x
  – Example: logistic regression
  – Also referred to as discriminative approaches

Decision Theory

• Given p(y | x) or p(x, y), we can use decision theory to make the optimal decision under the uncertainty about y, such that
  – the misclassification rate is minimized, or
  – the expected loss is minimized

Goal 1: Minimizing the Misclassification Rate

• We need a rule to assign each x to one of the classes
• This rule defines a decision region R_i for each class c_i, such that all points in R_i are assigned to class c_i
• Let's assume we have only two classes, c_1 and c_2
• A mistake occurs when a point belonging to c_1 is assigned to c_2, and vice versa
• The probability of this happening is
    p(mistake) = p(x ∈ R_1, c_2) + p(x ∈ R_2, c_1)
               = ∫_{R_1} p(x, c_2) dx + ∫_{R_2} p(x, c_1) dx
• This is minimized when we assign every point x to the class that has the highest p(x, c_i), or equivalently the highest p(c_i | x)
• (Describes a figure not reproduced here:) For the current decision boundary, the probability of mistake equals the red + green + purple regions; if we move the decision boundary to the point where p(x, c_1) = p(x, c_2), the red region vanishes and the probability of mistake is minimized

Decision Rule for Minimizing the Misclassification Rate

• The decision rule for minimizing p(mistake) is:
    ŷ(x) = argmax_{c_i} p(x, c_i)
• Note that, since p(c_i | x) = p(x, c_i) / p(x) and p(x) does not depend on the class, this is equivalent to:
    ŷ(x) = argmax_{c_i} p(c_i | x)

Goal 2: Minimizing the Expected Loss

• We often have a more complicated loss function
  – E.g., for the spam-filter problem, the loss matrix L(true y, predicted ŷ) is:

                   ŷ = S (1)   ŷ = NS (2)
    y = S (1)          0           10
    y = NS (2)         1            0

• Our goal under such scenarios is to minimize the expected loss
• For a given x, if its true class is c_i and we assign it to c_j, the loss is L(c_i, c_j)
• The total expected loss is given by:
    E[L] = Σ_i Σ_j ∫_{R_j} L(c_i, c_j) p(x, c_i) dx
• This is minimized when we assign every point x to the class c_j with the lowest Σ_i L(c_i, c_j) p(x, c_i), or equivalently the lowest Σ_i L(c_i, c_j) p(c_i | x)

Decision Rule for Minimizing Expected Loss

• Using the loss matrix above, suppose for a given x the posterior is p(S | x) = 0.6 and p(NS | x) = 0.4
• Expected loss for predicting S:  0 × 0.6 + 1 × 0.4 = 0.4
• Expected loss for predicting NS: 10 × 0.6 + 0 × 0.4 = 6
• The expected loss of predicting S is lower, so we predict S

Reject Option

• One could further include an option to abstain from making a prediction, which incurs a different loss
• We can easily extend the expected-loss formula to consider the reject option
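To make the generative story of Approach 2 concrete, here is a minimal Python sketch of sampling from p(x, y): first draw a class from p(y), then draw a feature from p(x | y). The class priors and Gaussian class-conditionals (PRIORS, COND) are illustrative assumptions, not values from the lecture.

```python
import random

# Hypothetical two-class generative model (numbers are illustrative).
PRIORS = [("c1", 0.7), ("c2", 0.3)]                  # p(y)
COND = {"c1": (0.0, 1.0), "c2": (2.0, 1.0)}          # (mean, std) of p(x | y)

def sample_point(rng):
    """One draw from p(x, y): y ~ p(y), then x ~ p(x | y)."""
    labels, weights = zip(*PRIORS)
    y = rng.choices(labels, weights=weights, k=1)[0]  # y ~ p(y)
    mean, std = COND[y]
    x = rng.gauss(mean, std)                          # x ~ p(x | y)
    return x, y

rng = random.Random(0)
print([sample_point(rng) for _ in range(3)])
```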
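For Goal 1, the following sketch checks numerically that maximizing the joint p(x, c_i) and maximizing the posterior p(c_i | x) pick the same class, since the two differ only by the factor p(x), which does not depend on the class. The two-Gaussian model is again an assumed illustration, not part of the lecture.

```python
import math

# Hypothetical two-class model: priors p(y) and Gaussian p(x | y).
PRIORS = {"c1": 0.7, "c2": 0.3}
PARAMS = {"c1": (0.0, 1.0), "c2": (2.0, 1.0)}  # (mean, std) per class

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def joint(x, c):
    """p(x, c) = p(x | c) * p(c)."""
    mean, std = PARAMS[c]
    return gaussian_pdf(x, mean, std) * PRIORS[c]

def posterior(x, c):
    """p(c | x) = p(x, c) / p(x), with p(x) = sum over classes of p(x, c')."""
    evidence = sum(joint(x, cp) for cp in PRIORS)
    return joint(x, c) / evidence

def decide(x):
    """Minimum-misclassification-rate rule: argmax_c p(c | x).
    Maximizing the joint p(x, c) gives the same answer."""
    by_posterior = max(PRIORS, key=lambda c: posterior(x, c))
    by_joint = max(PRIORS, key=lambda c: joint(x, c))
    assert by_posterior == by_joint
    return by_posterior

print(decide(0.5))   # near class c1's mean -> "c1"
print(decide(2.5))   # near class c2's mean -> "c2"
```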
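For Goal 2 and the reject option, here is a sketch of the minimum-expected-loss rule using the spam loss matrix and the posterior p(S | x) = 0.6, p(NS | x) = 0.4 from the worked example; REJECT_COST is an assumed illustrative value, since the slides do not specify one.

```python
# Minimum-expected-loss rule with a reject option (sketch).
LOSS = {  # LOSS[true_class][predicted_class], from the slides' matrix
    "S":  {"S": 0.0, "NS": 10.0},
    "NS": {"S": 1.0, "NS": 0.0},
}
REJECT_COST = 0.5  # assumed flat cost of abstaining (not from the slides)

def expected_loss(prediction, posterior):
    """E[loss | predict] = sum_i L(c_i, prediction) * p(c_i | x)."""
    return sum(LOSS[true][prediction] * p for true, p in posterior.items())

def decide(posterior):
    """Choose the action (a class label or 'reject') with lowest expected loss."""
    losses = {pred: expected_loss(pred, posterior) for pred in LOSS}
    losses["reject"] = REJECT_COST
    return min(losses, key=losses.get), losses

# Worked example from the slides: predict S with expected loss 0.4 (vs 6 for NS).
print(decide({"S": 0.6, "NS": 0.4}))  # ('S', {'S': 0.4, 'NS': 6.0, 'reject': 0.5})

# A more uncertain posterior makes abstaining the cheapest action.
print(decide({"S": 0.3, "NS": 0.7}))  # ('reject', {'S': 0.7, 'NS': 3.0, 'reject': 0.5})
```

The reject action wins only when every class's expected loss exceeds the reject cost, i.e., when the posterior is too uncertain for any prediction to be cheap.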