Binary Classification - Nanang Susyanto

Two-Class Classification (Binary
Classification): Part 2
Selected Topics in Applied Mathematics B (Kapita Selekta Matematika Terapan B)
Nanang Susyanto
Department of Mathematics, FMIPA UGM
8 February 2017
NS (Dep.Mat)
Binary Classification
08/02/2017
1/9
Definition
Given a random pattern/feature/vector $x$ drawn from a domain $X$,
determine which value an associated binary random variable
$y \in Y = \{p, n\}$ (positive or negative) takes.
A classification function is a map $f : X \to Y$.
A quantification function is a map $q : X \to \mathbb{R}$.
Classification by quantification:
$$q : X \to \mathbb{R} \underbrace{\to}_{\text{threshold}} Y.$$
The function f or q is trained using training data and
tested/evaluated using testing data. Training and testing data have
to be disjoint.
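Classification by quantification can be sketched in a few lines of Python. This is only an illustration: the score function `score`, the threshold value, and the convention that a score equal to the threshold counts as positive are all assumptions, not part of the slides. The helper `split_train_test` is likewise hypothetical and simply shows one way to keep training and testing data disjoint.

```python
from typing import Callable, List, Sequence, Tuple


def classify_by_threshold(q: Callable[[float], float], x: float, threshold: float) -> str:
    """Map the score q(x) to a label: 'p' if it reaches the threshold, else 'n'."""
    return "p" if q(x) >= threshold else "n"


def split_train_test(data: Sequence, n_train: int) -> Tuple[List, List]:
    """Split by position, so no item lands in both parts."""
    return list(data[:n_train]), list(data[n_train:])


# Hypothetical quantification function: larger x means "more positive".
def score(x: float) -> float:
    return 2.0 * x - 1.0


labels = [classify_by_threshold(score, x, threshold=0.0) for x in [0.1, 0.5, 0.9]]
train, test = split_train_test([1, 2, 3, 4, 5], n_train=3)
```

Moving the threshold up or down trades false positives against false negatives without retraining `score`.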
Accuracy and error
Let T be the cardinality of the testing data.
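False Positive (FP): the probability of recognizing a negative as
positive. How do we compute it for binary data?
False Negative (FN): the probability of recognizing a positive as
negative. How do we compute it for binary data?
In the formulas below, FP, FN, TP, and TN denote the numbers of false positives, false negatives, true positives, and true negatives in the testing data, so FP + FN + TP + TN = T.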
Error
$$e = \frac{FP + FN}{T}.$$
Accuracy
$$\mathrm{Acc} = \frac{TP + TN}{T} = 1 - e.$$
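A minimal Python sketch of these quantities, assuming labels are stored as the strings "p" and "n" and treating FP, FN, TP, TN as counts over the testing data. The lists `y_true` and `y_pred` are made-up illustration data. The last two lines show one common way to estimate the false-positive and false-negative probabilities from counts; that estimator is an assumption, since the slides leave it as a question.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for labels 'p' (positive) and 'n' (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, h in pairs if t == "p" and h == "p")
    tn = sum(1 for t, h in pairs if t == "n" and h == "n")
    fp = sum(1 for t, h in pairs if t == "n" and h == "p")
    fn = sum(1 for t, h in pairs if t == "p" and h == "n")
    return tp, tn, fp, fn


# Hypothetical test set of size T = 8.
y_true = ["p", "p", "n", "n", "p", "n", "n", "p"]
y_pred = ["p", "n", "n", "p", "p", "n", "n", "p"]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
T = len(y_true)

error = (fp + fn) / T        # e = (FP + FN) / T
accuracy = (tp + tn) / T     # Acc = (TP + TN) / T = 1 - e

# Assumed estimators of the error probabilities (relative frequencies per class).
fp_rate = fp / (fp + tn)     # share of negatives recognized as positive
fn_rate = fn / (fn + tp)     # share of positives recognized as negative
```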
Prior and Posterior
$$\frac{P(p \mid x)}{P(n \mid x)} = \frac{P(x \mid p)}{P(x \mid n)} \times \frac{P(p)}{P(n)}. \qquad (1)$$
$\frac{P(p)}{P(n)}$: prior odds.
$\frac{P(p \mid x)}{P(n \mid x)}$: posterior odds.
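Equation (1) can be checked numerically with a small Python sketch. The likelihood and prior values below are invented for illustration, and the function names are hypothetical.

```python
def posterior_odds(lik_p: float, lik_n: float, prior_p: float, prior_n: float) -> float:
    """Posterior odds = likelihood ratio x prior odds, as in equation (1)."""
    likelihood_ratio = lik_p / lik_n
    prior_odds = prior_p / prior_n
    return likelihood_ratio * prior_odds


def odds_to_probability(odds: float) -> float:
    """Convert odds a : 1 into the probability a / (1 + a)."""
    return odds / (1.0 + odds)


# Hypothetical numbers: P(x|p) = 0.75, P(x|n) = 0.125, P(p) = 0.2, P(n) = 0.8.
odds = posterior_odds(lik_p=0.75, lik_n=0.125, prior_p=0.2, prior_n=0.8)
prob_p_given_x = odds_to_probability(odds)
```

Here the prior odds are 1 : 4 against the positive class, the likelihood ratio is 6, and the posterior odds come out at 1.5, that is P(p | x) = 0.6: the evidence outweighs the unfavourable prior.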
Neyman-Pearson Lemma
Let $f_{X \mid p}$ and $f_{X \mid n}$ be the densities of the positive and negative classes,
respectively. The optimal quantification function is the likelihood
ratio function, which is defined as
$$LR(x) = \frac{f_{X \mid p}(x)}{f_{X \mid n}(x)} \qquad (2)$$
for every point $x$.
Optimal: maximum TP at fixed FP, both understood as rates (probabilities).
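As an illustration of equation (2), the sketch below evaluates the likelihood ratio when both class densities are univariate normal. The normality and the parameter values are assumptions made for the example only; the lemma itself does not require them.

```python
import math


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def likelihood_ratio(x: float, mean_p: float, std_p: float,
                     mean_n: float, std_n: float) -> float:
    """LR(x) = f_{X|p}(x) / f_{X|n}(x) for normal class densities."""
    return normal_pdf(x, mean_p, std_p) / normal_pdf(x, mean_n, std_n)


# Hypothetical classes: positive ~ N(1, 1), negative ~ N(-1, 1).
lr_at_0 = likelihood_ratio(0.0, mean_p=1.0, std_p=1.0, mean_n=-1.0, std_n=1.0)
lr_at_1 = likelihood_ratio(1.0, mean_p=1.0, std_p=1.0, mean_n=-1.0, std_n=1.0)
```

For these two densities the ratio simplifies to LR(x) = exp(2x), so it equals 1 at the midpoint x = 0 and grows with x. Thresholding LR at different values traces out the (FP, TP) trade-off the lemma refers to.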
Bayes Classifier
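Assumption: $P(p) = P(n)$.
Classification rule:
$$y = p \iff LR(x) \geq 1 \quad \text{and} \quad y = n \iff LR(x) < 1.$$
With equal priors, (1) makes the posterior odds equal to $LR(x)$, so the rule picks the class with the larger posterior probability.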
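The rule can be sketched in Python for arbitrary class densities passed in as functions. The exponential densities used here are a hypothetical example, and the code assumes the negative density is strictly positive at x, so that comparing the two densities is the same as comparing LR with 1.

```python
import math
from typing import Callable


def bayes_classify(x: float,
                   density_p: Callable[[float], float],
                   density_n: Callable[[float], float]) -> str:
    """Equal-prior rule: 'p' if LR(x) >= 1, else 'n'.

    Comparing the densities directly avoids a division and is
    equivalent to LR(x) >= 1 whenever density_n(x) > 0.
    """
    return "p" if density_p(x) >= density_n(x) else "n"


# Hypothetical class densities on x >= 0: exponential with rate 1 and rate 3.
def f_p(x: float) -> float:
    return math.exp(-x)


def f_n(x: float) -> float:
    return 3.0 * math.exp(-3.0 * x)


label_small = bayes_classify(0.2, f_p, f_n)
label_large = bayes_classify(1.0, f_p, f_n)
```

In this example LR(x) = exp(2x) / 3, so the decision boundary sits at x = ln(3) / 2, roughly 0.55: the point 0.2 is labelled negative and the point 1.0 positive.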
Linear Discriminant Analysis
It is assumed that $X \mid p$ and $X \mid n$ are normally distributed with a common covariance matrix, which makes the log-likelihood ratio linear in $x$.
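A one-dimensional sketch of the idea, assuming (as is standard for linear discriminant analysis) that both classes are normal with a shared variance, and assuming equal priors so that the decision threshold on the log-likelihood ratio is 0. The function names and the sample values are hypothetical.

```python
def fit_lda_1d(xs_p, xs_n):
    """Estimate w, b such that log LR(x) = w * x + b under a shared variance."""
    mean_p = sum(xs_p) / len(xs_p)
    mean_n = sum(xs_n) / len(xs_n)
    # Pooled variance estimate from both classes.
    ss = sum((x - mean_p) ** 2 for x in xs_p) + sum((x - mean_n) ** 2 for x in xs_n)
    var = ss / (len(xs_p) + len(xs_n) - 2)
    w = (mean_p - mean_n) / var
    b = -(mean_p ** 2 - mean_n ** 2) / (2.0 * var)
    return w, b


def lda_score(x, w, b):
    """Linear discriminant: the estimated log-likelihood ratio at x."""
    return w * x + b


# Hypothetical training samples for the two classes.
w, b = fit_lda_1d(xs_p=[1.0, 2.0, 3.0], xs_n=[-1.0, 0.0, 1.0])
label = "p" if lda_score(2.5, w, b) >= 0.0 else "n"
```

With these samples the class means are 2 and 0 and the pooled variance is 1, so the score is 2x - 2 and the boundary lies at the midpoint x = 1.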
Logistic Regression
Assumption: $F_{p \mid X}$, read here as the posterior probability $P(p \mid x)$, is a logistic function of a linear function of $x$.
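A minimal sketch of this model in one dimension, assuming P(p | x) = 1 / (1 + exp(-(w x + b))) and fitting w and b by plain gradient ascent on the log-likelihood. The label encoding (1 for p, 0 for n), the step size, the number of steps, and the data are all assumptions chosen for illustration; practical implementations use more robust optimizers.

```python
import math


def logistic(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_1d(xs, ys, step=0.1, n_steps=2000):
    """Fit P(p | x) = logistic(w * x + b) by gradient ascent on the log-likelihood."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            residual = y - logistic(w * x + b)  # observed label minus predicted probability
            grad_w += residual * x
            grad_b += residual
        w += step * grad_w / n
        b += step * grad_b / n
    return w, b


# Hypothetical, non-separable training data (1 = positive, 0 = negative).
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

w, b = fit_logistic_1d(xs, ys)
prob_at_2 = logistic(w * 2.0 + b)
prob_at_minus_2 = logistic(w * -2.0 + b)
```

The fitted `logistic(w * x + b)` plays the role of a quantification function: thresholding it at 0.5 gives a classifier.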