Naive Bayes Classifiers (NBC) J.-S. Roger Jang (張智星) [email protected] http://mirlab.org/jang MIR Lab, CSIE Dept. National Taiwan University Assumptions & Characteristics Assumptions: Statistical independency between features Statistical independency between samples Each feature governed by a feature-wise parameterized PDF (usually a 1D Gaussian) Characteristics Simple and easy (That’s why it’s named “naive”.) Highly successful in real-world applications regardless of the strong assumptions 2/9 Training and Test Stages of NBC Quiz! Training stage Identify class PDF, as follows. Identify feature PDF by MLE for 1D Gaussians Class PDF is the product of all the corresponding feature PDFs pdf C (x) pdf1,C ( x1 ) * pdf 2,C ( x2 ) * * pdf d ,C ( xd ) Test stage Assign a sample to the class by taking class prior into consideration: C arg min Pr C * pdf C (x) C 3/9 NBC for Gender Dataset (1/2) Scatter plot of Gender dataset PDF on each features and each class 4/9 NBC for Gender Dataset (2/2) PDF for each class Decision boundary 5/9 NBC for Iris Dataset (1/3) iris 25 PDF on each features and each class 0.4 0.2 0 10 5 10 20 30 40 sepal length 50 60 class 2 (Setosa) 15 class 3 (Setosa) sepal width 20 20 40 60 dim 1 (sepal length) 0.1 0.05 0 20 40 60 dim 1 (sepal length) 0.1 0.05 0 20 40 60 dim 1 (sepal length) class 3 (Versicolour) class 2 (Versicolour) class 1 (Versicolour) Scatter plot of Iris dataset (with only the last two dim.) class 1 (Setosa) 0.4 0.2 0 5 10 15 20 dim 2 (sepal width) 25 5 10 15 20 dim 2 (sepal width) 25 5 10 15 20 dim 2 (sepal width) 25 0.4 0.2 0 0.2 0.1 0 6/9 NBC for Iris Dataset (2/3) PDF for each class 7/9 NBC for Iris Dataset (3/3) Decision boundaries (using the last 2 features) 8/9 Strength and Weakness of NBC Quiz! Strength Fast computation during training and evaluation Robust than QC Weakness No able to deal with bi-modal data correctly Class boundary not as complex as QC 9/9
© Copyright 2024 Paperzz