
ECE 471/571 – Lecture 10
Nonparametric Density Estimation – k-nearest neighbor (kNN)
02/20/17
Different Approaches - More Detail

Pattern Classification
  Statistical Approach
    Supervised
      Basic concepts: Bayesian decision rule (MPP, LR, discriminant functions)
      Parametric learning (ML, BL)
      Non-parametric learning (kNN)
      NN (Perceptron, BP)
    Unsupervised
      Basic concepts: distance, agglomerative method
      k-means
      Winner-take-all
      Kohonen maps
  Syntactic Approach

Dimensionality Reduction
  Fisher's linear discriminant
  K-L transform (PCA)
Performance Evaluation
  ROC curve
  TP, TN, FN, FP
Stochastic Methods
  Local optimization (GD)
  Global optimization (SA, GA)
Bayes Decision Rule (Recap)
Maximum Posterior Probability:
P(\omega_j | x) = \frac{p(x | \omega_j) P(\omega_j)}{p(x)}
For a given x, if P(\omega_1 | x) > P(\omega_2 | x), then x belongs to class 1; otherwise, class 2.

Likelihood Ratio:
R(\alpha_i | x) = \sum_{j=1}^{c} \lambda(\alpha_i | \omega_j) P(\omega_j | x)
If R(\alpha_1 | x) < R(\alpha_2 | x), then decide \omega_1, that is,
\frac{p(x | \omega_1)}{p(x | \omega_2)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}

Discriminant Function:
g_i(x) = \ln p(x | \omega_i) + \ln P(\omega_i)
The classifier assigns a feature vector x to class \omega_i if g_i(x) > g_j(x) for all j \neq i.
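As a concrete illustration of the maximum posterior probability / discriminant rule above, here is a minimal Python sketch for two 1-D Gaussian classes; the class means, standard deviations, and priors below are made-up example values, not from the lecture.

import numpy as np

# Hypothetical 1-D Gaussian class models: class -> (mean, std, prior); values are made up
params = {1: (0.0, 1.0, 0.6), 2: (2.0, 1.0, 0.4)}

def g(x, mu, sigma, prior):
    # Discriminant g_i(x) = ln p(x|w_i) + ln P(w_i) with a Gaussian likelihood
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi)) + np.log(prior)

def classify(x):
    # MPP: pick the class whose discriminant is largest
    return max(params, key=lambda c: g(x, *params[c]))

print(classify(0.3), classify(1.9))   # expected: 1 2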
Estimating Gaussian and bimodal Gaussian densities
Three cases
Dimensionality reduction
Performance evaluation and ROC curve
Motivation
Estimate the density functions without assuming that the pdf has a particular parametric form.

P(\omega_j | x) = \frac{p(x | \omega_j) P(\omega_j)}{p(x)}
*Problem with Parzen Windows
The window function is discontinuous → use a Gaussian window
The choice of the window width h
Still another one: the cell volume V_n is fixed in advance rather than adapted to the local density of the data

p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n} \varphi\!\left(\frac{x - x_i}{h_n}\right)
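For reference, a rough Python sketch of the 1-D Parzen estimate above with a Gaussian window; the sample data and window width below are arbitrary assumptions.

import numpy as np

def parzen_gaussian(x, samples, h):
    # 1-D Parzen estimate with a Gaussian window:
    # p_n(x) = (1/n) * sum_i (1/h) * phi((x - x_i) / h)
    u = (x - samples) / h
    return np.mean(np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * h))

samples = np.random.default_rng(0).normal(0.0, 1.0, 500)   # made-up training samples
print(parzen_gaussian(0.0, samples, h=0.3))                # roughly 0.4 for N(0, 1)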
kNN (k-Nearest Neighbor)
To estimate p(x) from n samples, we can center a cell at x and let it grow until it contains k_n samples, where k_n can be some function of n.
Normally, we let k_n = \sqrt{n}.
kNN in Classification
p_n(x) = \frac{k_n / n}{V_n}

Given c training sets from c classes, the total number of samples is
n = \sum_{m=1}^{c} n_m
*Given a point x at which we wish to determine the statistics, we find the hypersphere of volume V which just encloses k points from the combined set. If, within that volume, k_m of those points belong to class m, then we estimate the density for class m by

p(x | \omega_m) = \frac{k_m}{n_m V}, \quad P(\omega_m) = \frac{n_m}{n}, \quad p(x) = \frac{k}{n V}
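A minimal 1-D Python sketch of these estimates, where the "hypersphere" is simply the interval around x that just encloses the k nearest pooled samples; the class data and the value of k are made-up assumptions.

import numpy as np

def knn_density_estimates(x, data_by_class, k):
    # Pool the classes, take the k nearest samples to x, and use the interval
    # that just encloses them as the volume V (1-D case).
    pooled = np.concatenate(list(data_by_class.values()))
    labels = np.concatenate([np.full(len(v), m) for m, v in data_by_class.items()])
    dist = np.abs(pooled - x)
    idx = np.argsort(dist)[:k]
    V = 2.0 * dist[idx].max()                    # length of the enclosing interval
    n = len(pooled)
    out = {"p(x)": k / (n * V)}
    for m, v in data_by_class.items():
        k_m = int(np.sum(labels[idx] == m))      # neighbors of class m inside the volume
        out[m] = {"p(x|w)": k_m / (len(v) * V), "P(w)": len(v) / n}
    return out

rng = np.random.default_rng(1)
data = {1: rng.normal(0, 1, 200), 2: rng.normal(3, 1, 300)}   # made-up class samples
print(knn_density_estimates(1.0, data, k=20))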
kNN Classification Rule
P(\omega_m | x) = \frac{p(x | \omega_m) P(\omega_m)}{p(x)} = \frac{\frac{k_m}{n_m V} \cdot \frac{n_m}{n}}{\frac{k}{n V}} = \frac{k_m}{k}
The decision rule tells us to look in a neighborhood of the unknown feature vector for its k nearest samples. If, within that neighborhood, more samples lie in class i than in any other class, we assign the unknown to class i.
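A minimal Python sketch of this majority-vote rule using Euclidean distance; the toy 2-D data below are an assumption for illustration.

import numpy as np

def knn_classify(x, X_train, y_train, k=5):
    # Majority vote among the k nearest training samples (Euclidean distance)
    d = np.linalg.norm(X_train - x, axis=1)
    votes = y_train[np.argsort(d)[:k]]
    classes, counts = np.unique(votes, return_counts=True)
    return classes[np.argmax(counts)]

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])   # made-up 2-D data
y = np.array([0] * 50 + [1] * 50)
print(knn_classify(np.array([2.5, 2.5]), X, y, k=7))   # expected: 1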
Problems
What kind of distance should be used to measure "nearest"?
  - The Euclidean metric is a reasonable measurement
Computation burden
  - Massive storage burden
  - Need to compute the distance from the unknown to all the stored samples
Computational Complexity of kNN
In both space (storage space) and time (search time)
Algorithms for reducing the computational burden:
  - Computing partial distances
  - Prestructuring
  - Editing the stored prototypes
Method 1 - Partial Distance
Calculate the distance using only some subset r of the full d dimensions. If this partial distance is already too large, we do not need to compute further.
The assumption is that what we know about the subspace is indicative of the full space.

D_r(a, b) = \left( \sum_{k=1}^{r} (a_k - b_k)^2 \right)^{1/2}
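One possible Python sketch of the partial-distance idea for nearest-neighbor search; the chunk size r_step and the prune-against-best-so-far strategy are assumptions about how the method would typically be applied, not the slide's exact procedure.

import numpy as np

def partial_distance_nn(x, X_train, r_step=4):
    # Accumulate the squared distance a few dimensions at a time and abandon a
    # candidate as soon as its partial distance exceeds the best full distance so far.
    best_i, best_d2 = -1, np.inf
    d = X_train.shape[1]
    for i, xi in enumerate(X_train):
        d2 = 0.0
        for start in range(0, d, r_step):
            diff = x[start:start + r_step] - xi[start:start + r_step]
            d2 += float(diff @ diff)
            if d2 > best_d2:          # partial distance already too large: prune
                break
        else:                         # no break: full distance computed and smaller
            best_i, best_d2 = i, d2
    return best_i, np.sqrt(best_d2)

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 16))                     # made-up 16-D training set
print(partial_distance_nn(rng.normal(size=16), X))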
Method 2 - Prestructuring
Prestructure the samples in the training set into a search tree where samples are selectively linked.
Distances are calculated between the test sample and a few stored "root" samples; then only the samples linked to the closest root are considered.
Finding the closest sample is no longer guaranteed.
[Figure: scatter of training samples from two classes (x and o), with root samples marked X]
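A rough Python sketch of the prestructuring idea; choosing root samples at random and searching only the closest root's bucket are assumptions, not the slide's specific scheme, and the result is only an approximate nearest neighbor.

import numpy as np

def build_tree(X_train, n_roots=5, seed=0):
    # Pick a few training samples as "roots" and link every sample to its nearest root
    rng = np.random.default_rng(seed)
    roots = X_train[rng.choice(len(X_train), n_roots, replace=False)]
    d = np.linalg.norm(X_train[:, None, :] - roots[None, :, :], axis=2)
    buckets = [np.where(np.argmin(d, axis=1) == r)[0] for r in range(n_roots)]
    return roots, buckets

def search(x, X_train, roots, buckets):
    # Compare x only against the samples linked to the closest root (approximate NN)
    r = int(np.argmin(np.linalg.norm(roots - x, axis=1)))
    idx = buckets[r]
    return idx[np.argmin(np.linalg.norm(X_train[idx] - x, axis=1))]

X = np.random.default_rng(4).normal(size=(500, 2))   # made-up 2-D training data
roots, buckets = build_tree(X)
print(search(np.array([0.1, -0.2]), X, roots, buckets))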
Method 3 – Editing/Pruning/Condensing

Eliminate "useless" samples during training
  - For example, eliminate samples that are surrounded by training points of the same category label
Leave the decision boundaries unchanged
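A minimal Python sketch of one such editing criterion: drop every sample whose k nearest neighbors all share its label, keeping only points near a decision boundary; the value of k and the toy data are assumptions.

import numpy as np

def edit_training_set(X, y, k=5):
    # Keep only samples that have at least one differently-labeled point among their
    # k nearest neighbors, i.e. samples that lie near a decision boundary.
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]          # skip the sample itself
        if np.any(y[nbrs] != y[i]):
            keep.append(i)
    return X[keep], y[keep]

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2.5, 1, (100, 2))])   # made-up data
y = np.array([0] * 100 + [1] * 100)
Xe, ye = edit_training_set(X, y)
print(len(Xe), "of", len(X), "samples kept")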
Discussion
Combining the three methods
Other concerns:
  - The choice of k
  - The choice of metrics
Distance (Metrics) Used by kNN
Properties of metrics:
  - Nonnegativity: D(a, b) >= 0
  - Reflexivity: D(a, b) = 0 iff a = b
  - Symmetry: D(a, b) = D(b, a)
  - Triangle inequality: D(a, b) + D(b, c) >= D(a, c)

Different metrics:
  - Minkowski metric: L_k(a, b) = \left( \sum_{i=1}^{d} |a_i - b_i|^k \right)^{1/k}
    - Manhattan distance (city block distance): L_1(a, b) = \sum_{i=1}^{d} |a_i - b_i|
    - Euclidean distance: L_2(a, b) = \left( \sum_{i=1}^{d} |a_i - b_i|^2 \right)^{1/2}
    - When k → ∞, L_∞(a, b) is the maximum of the projected distances onto each of the d coordinate axes
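A small Python sketch computing these metrics for two arbitrary example vectors.

import numpy as np

def minkowski(a, b, k):
    # L_k(a, b) = (sum_i |a_i - b_i|^k)^(1/k)
    return np.sum(np.abs(a - b) ** k) ** (1.0 / k)

a, b = np.array([1.0, 4.0, 2.0]), np.array([3.0, 1.0, 2.0])
print(minkowski(a, b, 1))            # Manhattan (L1): 5.0
print(minkowski(a, b, 2))            # Euclidean (L2): sqrt(13) ≈ 3.606
print(np.max(np.abs(a - b)))         # L_inf: max projected distance = 3.0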
Visualizing the Distances