NBC

Naive Bayes Classifiers
(NBC)
J.-S. Roger Jang (張智星)
[email protected]
http://mirlab.org/jang
MIR Lab, CSIE Dept.
National Taiwan University
Assumptions & Characteristics

Assumptions:
Statistical independency between features
 Statistical independency between samples
 Each feature governed by a feature-wise parameterized PDF
(usually a 1D Gaussian)


Characteristics
Simple and easy (That’s why it’s named “naive”.)
 Highly successful in real-world applications regardless of the
strong assumptions

2/9
Training and Test Stages of NBC

Quiz!
Training stage

Identify class PDF, as follows.
Identify feature PDF by MLE for 1D Gaussians
 Class PDF is the product of all the corresponding feature PDFs

pdf C (x)  pdf1,C ( x1 ) * pdf 2,C ( x2 ) * * pdf d ,C ( xd )

Test stage

Assign a sample to the class by taking class prior into
consideration:

C  arg min Pr C * pdf C (x)
C
3/9
NBC for Gender Dataset (1/2)

Scatter plot of
Gender dataset

PDF on each features
and each class
4/9
NBC for Gender Dataset (2/2)

PDF for each class

Decision boundary
5/9
NBC for Iris Dataset (1/3)
iris
25
PDF on each features
and each class
0.4
0.2
0
10
5
10
20
30
40
sepal length
50
60
class 2 (Setosa)
15
class 3 (Setosa)
sepal width
20

20
40
60
dim 1 (sepal length)
0.1
0.05
0
20
40
60
dim 1 (sepal length)
0.1
0.05
0
20
40
60
dim 1 (sepal length)
class 3 (Versicolour) class 2 (Versicolour) class 1 (Versicolour)
Scatter plot of Iris
dataset (with only
the last two dim.)
class 1 (Setosa)

0.4
0.2
0
5
10
15
20
dim 2 (sepal width)
25
5
10
15
20
dim 2 (sepal width)
25
5
10
15
20
dim 2 (sepal width)
25
0.4
0.2
0
0.2
0.1
0
6/9
NBC for Iris Dataset (2/3)

PDF for each class
7/9
NBC for Iris Dataset (3/3)

Decision boundaries (using the last 2 features)
8/9
Strength and Weakness of NBC

Quiz!
Strength
Fast computation during training and evaluation
 Robust than QC


Weakness
No able to deal with bi-modal data correctly
 Class boundary not as complex as QC

9/9