
Classification

- Classification – basic aim
- Simple allocation rules
- Extensions
  - Not the same costs of mis-classifications
  - Prior knowledge available
  - NOT the same (co)variances
  - Multivariate feature observations
- Other methods:
  - K-nearest neighbour
  - PLS regression
- Multiclass problems
© Per Bruun Brockhoff, [email protected], and Michael Edberg Hansen IMM, DTU.
Classification – basic aim

Simple example
[Figure: two classes on the weight axis x – class 1, Females, around 73 kg, and class 2, Males, around 100 kg.]
- Observe the weight of a new person.
- Task: predict the sex of this new person ("classify").
Simple example

"Obvious" method:
- IF the weight is close to the female level: c_new = 1 (female)
- IF the weight is close to the male level: c_new = 2 (male)
[Figure: the class means, Females at 73 kg and Males at 100 kg, on the weight axis x, with the midpoint (μ1 + μ2)/2 = 86.5 kg marked between them.]
Allocation rule

$$ c_{new} = \begin{cases} 1, & \text{if } x_{new} < (\mu_1 + \mu_2)/2 \\ 2, & \text{if } x_{new} > (\mu_1 + \mu_2)/2 \end{cases} $$

[Figure: the same weight axis with the cut-off (μ1 + μ2)/2 = 86.5 kg separating the female (73 kg) and male (100 kg) means.]
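A minimal Python sketch of this cut-off rule (not part of the slides; the means 73 and 100 kg are taken from the example above):

mu1, mu2 = 73.0, 100.0     # class means: females (1) and males (2)
cutoff = (mu1 + mu2) / 2   # = 86.5 kg

def classify(x_new):
    """Allocate to class 1 (female) below the midpoint, otherwise class 2 (male)."""
    return 1 if x_new < cutoff else 2

print(classify(80), classify(95))   # -> 1 2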
Allocation rule
- Considering population distributions:
[Figure: the class densities f1 (Females) and f2 (Males) along the weight axis.]
- Allocation rule: the most "probable"!

$$ c(x_{new}) = \begin{cases} 1, & \text{if } f_1(x_{new})/f_2(x_{new}) > 1 \\ 2, & \text{if } f_1(x_{new})/f_2(x_{new}) < 1 \end{cases} $$
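A Python sketch of the "most probable" rule using two normal densities; the means 73 and 100 kg and the common standard deviation of 10 kg are illustrative assumptions, not values from the slides:

from scipy.stats import norm

f1 = norm(loc=73.0, scale=10.0)    # assumed density of female weights
f2 = norm(loc=100.0, scale=10.0)   # assumed density of male weights

def classify(x_new):
    """Allocate to the class whose density is largest at x_new."""
    return 1 if f1.pdf(x_new) / f2.pdf(x_new) > 1 else 2

print(classify(85), classify(90))   # -> 1 2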
Allocation rule
- IF both distributions are normal with the same variance:

$$ f_1(x) = f_2(x) \;\Leftrightarrow\; x = (\mu_1 + \mu_2)/2 $$

[Figure: the two normal densities f1 (Females) and f2 (Males) on the weight axis cross exactly at the midpoint.]

Classification
- Complications:
  - Not the same cost of mis-classifying in class 1 or 2
  - Prior knowledge available about the class sizes (NOT fifty-fifty)
  - NOT the same (co)variances
- Extensions:
  - Multivariate feature observations
  - More than two classes
Loss function for wrong classification

                      Predicted class
                      1           2
True class   1        0           L(2|1)
             2        L(1|2)      0

(L(i|j) denotes the loss of allocating to class i an observation that really comes from class j.)

In this course we work with equal losses: L(1|2) = L(2|1).
Different priors
- Take knowledge of the class sizes into account:
  - Prior probabilities P(1) and P(2)
  - Usually estimated by n1/(n1+n2) and n2/(n1+n2)
- Allocation rule:

$$ c(x) = \begin{cases} 1, & \text{if } f_1(x)/f_2(x) > P(2)/P(1) \\ 2, & \text{if } f_1(x)/f_2(x) < P(2)/P(1) \end{cases} $$
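A Python sketch extending the rule with unequal priors; the class sizes n1 = 30 and n2 = 70 (and the densities) are made up for illustration:

from scipy.stats import norm

f1, f2 = norm(73.0, 10.0), norm(100.0, 10.0)   # assumed class densities
n1, n2 = 30, 70                                # assumed training class sizes
p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)        # prior probabilities P(1) and P(2)

def classify(x_new):
    """Allocate to class 1 when the density ratio exceeds the prior ratio P(2)/P(1)."""
    return 1 if f1.pdf(x_new) / f2.pdf(x_new) > p2 / p1 else 2

print(classify(85))   # -> 2: the larger male prior pulls this borderline case to class 2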
Two class bivariate example
[Figure: two bivariate classes with densities f1 and f2 in the (Fat weight, Weight) plane.]
- To classify we must study the ratio f1(x)/f2(x).

Discriminant Analysis
- Normally distributed data:

$$ f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2} |\boldsymbol{\Sigma}|^{1/2}} \, e^{-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})} $$
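A Python sketch of the bivariate case, evaluating the two multivariate normal densities and their ratio; the means and the common covariance matrix below are invented numbers, not the slide's data:

import numpy as np
from scipy.stats import multivariate_normal

mu1 = np.array([20.0, 60.0])        # assumed class-1 mean (fat weight, weight)
mu2 = np.array([35.0, 90.0])        # assumed class-2 mean
Sigma = np.array([[25.0, 15.0],
                  [15.0, 100.0]])   # assumed common covariance matrix

f1 = multivariate_normal(mean=mu1, cov=Sigma)
f2 = multivariate_normal(mean=mu2, cov=Sigma)

x_new = np.array([26.0, 70.0])
ratio = f1.pdf(x_new) / f2.pdf(x_new)
print(1 if ratio > 1 else 2)        # allocate to the class with the larger density -> 1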
Discriminant Analysis
- Σ1 = Σ2: linear discriminant function (LDA)
- Σ1 ≠ Σ2: quadratic discriminant function (QDA)

Bivariate normal with homogeneous variance
- Same x1 and x2 variance and no correlation (Exercise 2):
[Figure: two circular normal classes in the (x1, x2) plane; the boundary f1(x)/f2(x) = 1 is a straight line between them.]
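A Python sketch fitting both boundary types with scikit-learn on simulated bivariate data; the simulation parameters are arbitrary and only mimic the situations sketched above:

import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(0)
X1 = rng.multivariate_normal([2.0, 3.0], [[1.0, 0.0], [0.0, 1.0]], size=100)
X2 = rng.multivariate_normal([6.0, 5.0], [[1.0, 0.0], [0.0, 1.0]], size=100)
X = np.vstack([X1, X2])
y = np.array([1] * 100 + [2] * 100)

lda = LinearDiscriminantAnalysis().fit(X, y)     # linear boundary (Sigma1 = Sigma2)
qda = QuadraticDiscriminantAnalysis().fit(X, y)  # quadratic boundary (Sigma1 != Sigma2)
print(lda.predict([[4.0, 4.0]]), qda.predict([[4.0, 4.0]]))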
Bivariate normal with homogeneous variance
- WITH correlation (Exercise 1):
[Figure: two correlated bivariate normal classes with densities f1 and f2 in the (x1, x2) plane; the boundary f1(x)/f2(x) = 1 is still a straight line.]

Fisher Discriminant Analysis (CVA)
- Optimal ratio: the discriminant direction is chosen to maximize the ratio of between-group to within-group variance of the projected scores.
[Figure: the observations are projected onto this direction, giving univariate class distributions separated by a cut-off tc.]
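A Python sketch of Fisher's idea: estimate the discriminant direction from the class means and a pooled covariance, project onto it, and compare the univariate score with a cut-off tc; the simulated data are only an illustration:

import numpy as np

rng = np.random.default_rng(1)
X1 = rng.multivariate_normal([0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]], size=50)
X2 = rng.multivariate_normal([3.0, 1.0], [[2.0, 1.0], [1.0, 2.0]], size=50)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_pooled = (49 * np.cov(X1, rowvar=False) + 49 * np.cov(X2, rowvar=False)) / 98
w = np.linalg.solve(S_pooled, m1 - m2)   # Fisher direction: S^-1 (m1 - m2)
tc = w @ (m1 + m2) / 2                   # cut-off on the univariate score w'x

score = X1[0] @ w                        # project one observation onto w
print(1 if score > tc else 2)            # class-1 scores tend to lie above tc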
Bivariate normal with heterogeneous covariance
- Exercise 3:
[Figure: with Σ1 ≠ Σ2 the boundary f1(x)/f2(x) = 1 is a curve in the (x1, x2) plane.]

Use data to estimate distributions:
- With homogeneous covariance (LDA), the boundary between the two classes is

$$ \mathbf{x}^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) - \tfrac{1}{2} \boldsymbol{\mu}_1^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_1 + \tfrac{1}{2} \boldsymbol{\mu}_2^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_2 - \log c = 0, $$

  with the means and covariance replaced by their estimates from the training data; the constant c reflects the priors and/or mis-classification costs.
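A Python sketch plugging estimated means and a pooled covariance into this boundary expression; the data are simulated, and c is set from the estimated priors assuming equal mis-classification costs:

import numpy as np

rng = np.random.default_rng(2)
X1 = rng.multivariate_normal([1.0, 1.0], [[1.0, 0.3], [0.3, 1.0]], size=40)
X2 = rng.multivariate_normal([4.0, 3.0], [[1.0, 0.3], [0.3, 1.0]], size=60)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S = (39 * np.cov(X1, rowvar=False) + 59 * np.cov(X2, rowvar=False)) / 98
Sinv = np.linalg.inv(S)
c = (60 / 100) / (40 / 100)   # estimated P(2)/P(1), equal costs

def lda_score(x):
    """Left-hand side of the boundary equation; > 0 means 'allocate to class 1'."""
    return x @ Sinv @ (m1 - m2) - 0.5 * m1 @ Sinv @ m1 + 0.5 * m2 @ Sinv @ m2 - np.log(c)

print(1 if lda_score(np.array([1.2, 0.8])) > 0 else 2)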
Use data to estimate distributions:
- With heterogeneous covariances (QDA), the boundary is

$$ (\mathbf{x} - \boldsymbol{\mu}_1)^T \boldsymbol{\Sigma}_1^{-1} (\mathbf{x} - \boldsymbol{\mu}_1) - (\mathbf{x} - \boldsymbol{\mu}_2)^T \boldsymbol{\Sigma}_2^{-1} (\mathbf{x} - \boldsymbol{\mu}_2) + 2 \log c = 0. $$

Relations
- Mahalanobis: finds points of "equidistance"
- Fisher: finds a univariate discrimination score
- "General Bayes": finds points of "equi-probability"
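A Python sketch of the Mahalanobis "equidistance" view of this boundary: compare the two squared Mahalanobis distances, shifted by 2 log c; all numbers are illustrative assumptions:

import numpy as np

mu1, Sigma1 = np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]])
mu2, Sigma2 = np.array([3.0, 2.0]), np.array([[2.0, 0.8], [0.8, 2.0]])
c = 1.0   # equal priors and costs

def d2(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)' Sigma^-1 (x - mu)."""
    diff = x - mu
    return diff @ np.linalg.solve(Sigma, diff)

def classify(x):
    # allocate to class 1 when d1^2 - d2^2 + 2 log c < 0, i.e. x is "closer" to class 1
    return 1 if d2(x, mu1, Sigma1) - d2(x, mu2, Sigma2) + 2 * np.log(c) < 0 else 2

print(classify(np.array([0.5, 0.5])), classify(np.array([2.5, 2.0])))   # -> 1 2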
Relations
- For 2 classes, normal distribution and variance homogeneity:
  - All three methods are the same! (= LDA)
- For 2 classes, normal distribution and variance heterogeneity:
  - Bayes = Mahalanobis = QDA
  - (Can be mimicked by an LDA with squared and cross-product terms)
k-Nearest Neighbour method
- The k-NN classifier is a very intuitive method:
  - Examples are classified based on their similarity with the training data.
- The k-NN only requires:
  - An integer k.
  - A set of labeled examples.
  - A measure of "closeness".
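A minimal Python/scikit-learn sketch of k-NN using exactly the three ingredients listed above; the tiny data set is made up:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],    # class 1 examples
              [3.0, 3.0], [3.2, 2.9], [2.8, 3.1]])   # class 2 examples
y = np.array([1, 1, 1, 2, 2, 2])

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")  # k and the "closeness" measure
knn.fit(X, y)                                                  # the labeled examples
print(knn.predict([[1.1, 1.0]]))   # majority vote among the 3 nearest neighbours -> [1]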
k-Nearest Neighbour method
[Figure: 3-NN illustration in the (x1, x2) plane with two labeled classes, c = 1 and c = 2; a new point "?" is to be classified from its 3 nearest neighbours.]
k-Nearest Neighbour method
[Figure: 3-NN, continued – the query point is assigned to c = 1, the majority class among its 3 nearest neighbours.]

Characteristics of the NN classifier
- Advantages
  - Analytically tractable, simple implementation
  - Nearly optimal in the large-sample limit (N → ∞)
- Disadvantages
  - Large storage requirements
  - Computationally intensive recall
  - Highly susceptible to the curse of dimensionality
- 1-NN versus k-NN
  - Determine k by cross-validation (see the sketch below)
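A Python sketch of choosing k by cross-validation with scikit-learn; the data set is simulated only for illustration:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))])
y = np.array([1] * 50 + [2] * 50)

for k in (1, 3, 5, 7, 9):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(k, round(acc, 3))   # pick the k with the best cross-validated accuracy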
Multiclass problems
Mahalanobis/Fisher approach (LDA)
[Figure: seven populations π1, …, π7 and their allocation regions in the (x1, x2) plane.]
- The discriminant directions maximize the between-group variance relative to the within-group variance.
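A Python sketch of the two quantities behind the multiclass Fisher/CVA criterion, the within-group and between-group scatter matrices, computed for simulated data with three arbitrarily chosen group means:

import numpy as np

rng = np.random.default_rng(4)
groups = [rng.multivariate_normal(m, np.eye(2), size=30)
          for m in ([0.0, 0.0], [4.0, 1.0], [2.0, 4.0])]

grand_mean = np.vstack(groups).mean(axis=0)
S_within = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
S_between = sum(len(g) * np.outer(g.mean(axis=0) - grand_mean,
                                  g.mean(axis=0) - grand_mean) for g in groups)

# canonical discriminant directions: eigenvectors of S_within^-1 S_between,
# ordered by decreasing eigenvalue (= decreasing between/within variance ratio)
evals, evecs = np.linalg.eig(np.linalg.solve(S_within, S_between))
print(evecs[:, np.argsort(evals.real)[::-1]].real)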
Example
- 742 obs: 7 mutants and 600 ions.
- Nobs = 742, Nmetabolites = 135
- PCA:
[Figure: PCA score plot, 1st vs 2nd principal component, of the 742 observations, coloured by mutant: WT, CAT, OPI, INO2, HAP4, INO4, INO24.]
- CVA:
[Figure: canonical discriminant plot, 1st vs 2nd canonical discriminant function, of the same observations and classes.]

k-Nearest Neighbour method
- Data is generated for a 2-dim 3-class problem, where the likelihoods are unimodal and are distributed in rings around a common mean.
- These classes are also non-linearly separable.
- k-NN with (see the sketch below):
  - k = 5
  - Metric = Euclidean distance
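A Python sketch reproducing the spirit of this example: three ring-shaped classes around a common centre, classified with 5-NN and Euclidean distance; the radii and noise level are guesses, not the slide's actual simulation:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)

def ring(radius, n=200):
    """Points scattered around a circle of the given radius, centred at the origin."""
    angles = rng.uniform(0.0, 2 * np.pi, size=n)
    r = radius + rng.normal(0.0, 0.2, size=n)
    return np.column_stack([r * np.cos(angles), r * np.sin(angles)])

X = np.vstack([ring(1.0), ring(3.0), ring(5.0)])
y = np.repeat([1, 2, 3], 200)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean").fit(X, y)
print(knn.predict([[0.0, 1.1], [3.1, 0.0], [0.0, -4.9]]))   # -> [1 2 3]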
5-Nearest Neighbour
- Solution:
[Figure: the classification of the ring data produced by the 5-NN classifier.]

Link to regression and PLS
- Two-group LDA can be (exactly) obtained by simple regression (see the sketch below)
  - Just use the binary y's in an MLR
- A version of QDA (not 100% equivalent) can be obtained by including product and squared variables in the MLR
- PLSR and/or PCR is also a good idea!
- Multi-class discrimination can be handled by PLS2
  - Make K dummy variables (easy/inbuilt in Unscrambler)
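A Python sketch of the regression link: for two groups, regressing a binary 0/1 y on X by ordinary least squares gives a coefficient vector proportional to the LDA direction, so the componentwise ratios printed below should be (nearly) identical; the data are simulated:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
X = np.vstack([rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=50),
               rng.multivariate_normal([2.0, 2.0], [[1.0, 0.5], [0.5, 1.0]], size=50)])
y = np.array([0] * 50 + [1] * 50)   # binary dummy coding of the two groups

mlr = LinearRegression().fit(X, y)            # MLR on the binary y
lda = LinearDiscriminantAnalysis().fit(X, y)
print(mlr.coef_ / lda.coef_)                  # same direction, different scaling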
Classification in Unscrambler X
- Linear Discriminant Analysis
  - LDA: multiclass, multivariate X (assuming n_i > p)
  - QDA (Mahalanobis)
  - All may be combined with PCA (PCA-LDA option)
- PLS1: 2-group LDA, PLS-DA (a "PLS" version of PCA-LDA)
- PLS2: multi-group LDA, PLS-DA
- The PLS methods can be made "QDA-like" by inclusion of squared/crossed variables.
Classification
- Simple allocation rules
- Extensions
  - Not the same costs of mis-classifications
  - Prior knowledge available
  - NOT the same (co)variances
  - Multivariate feature observations
- Other methods:
  - K-nearest neighbour
  - PLS regression
- Multiclass problems