※ Discrimination and classification
1. To describe the differential features of objects from several known collections.
2. To assign new objects to two or more labeled populations by using a classification rule.
Example:
We want to distinguish two species of chickweed by sepal and petal length, petal cleft
depth, bract length, scarious tip length, and pollen diameter.
Let $f_1(x)$ and $f_2(x)$ be the density functions associated with the $p \times 1$ vector random variable $X$ for the
populations $\pi_1$ and $\pi_2$. Let $\Omega$ be the sample space of all possible observations $x$.
Let $R_1$ be the set of $x$ values for which we classify objects as $\pi_1$, and let $R_2 = \Omega - R_1$ be the remaining
$x$ values, for which we classify objects as $\pi_2$. The conditional probability of classifying an object as $\pi_2$
when it is from $\pi_1$ is
$$P(2 \mid 1) = P(X \in R_2 \mid \pi_1) = \int_{R_2} f_1(x)\,dx;$$
similarly, the conditional probability of classifying an object as $\pi_1$ when it is really from $\pi_2$ is
$$P(1 \mid 2) = P(X \in R_1 \mid \pi_2) = \int_{R_1} f_2(x)\,dx.$$
Let $p_1$ be the prior probability of $\pi_1$ and $p_2$ be the prior probability of $\pi_2$, where $p_1 + p_2 = 1$. Then
$$P(\text{observation is misclassified as } \pi_1) = P(X \in R_1 \mid \pi_2)\,P(\pi_2) = P(1 \mid 2)\,p_2,$$
$$P(\text{observation is misclassified as } \pi_2) = P(X \in R_2 \mid \pi_1)\,P(\pi_1) = P(2 \mid 1)\,p_1.$$
Let $c(1 \mid 2)$ be the cost incurred when an observation from $\pi_2$ is incorrectly classified as $\pi_1$, and let $c(2 \mid 1)$ be the cost
incurred when a $\pi_1$ observation is incorrectly classified as $\pi_2$.
1. For two populations (minimize ECM):
Allocate $x$ to $\pi_1$ if
$$\frac{f_1(x)}{f_2(x)} \ge \frac{c(1 \mid 2)}{c(2 \mid 1)} \cdot \frac{p_2}{p_1}.$$
In particular, if the two populations are normal:
a. If $\Sigma_1 = \Sigma_2 = \Sigma$:
Allocate $x$ to $\pi_1$ if
$$(\mu_1 - \mu_2)'\Sigma^{-1}x - \frac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2) \ge \ln\!\left(\frac{c(1 \mid 2)\,p_2}{c(2 \mid 1)\,p_1}\right).$$
PS: The above inequality is implemented by substituting the sample quantities $\bar{x}_1$, $\bar{x}_2$, $S_{\mathrm{pooled}}$ for $\mu_1$, $\mu_2$, $\Sigma$,
respectively.
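As a concrete illustration, the sample version of rule (a) can be sketched in NumPy as follows. The function name, the synthetic data, and the default costs and priors are illustrative assumptions, not part of the notes:

```python
import numpy as np

def linear_ecm_rule(X1, X2, x, cost_12=1.0, cost_21=1.0, p1=0.5, p2=0.5):
    """Minimum-ECM rule for two normal populations with a common covariance.

    Allocate x to population 1 if
      (xbar1 - xbar2)' S_pooled^{-1} x
        - 0.5 (xbar1 - xbar2)' S_pooled^{-1} (xbar1 + xbar2)
        >= ln( c(1|2) p2 / (c(2|1) p1) ).
    X1, X2 are (n_i, p) samples from populations 1 and 2; returns 1 or 2.
    """
    n1, n2 = len(X1), len(X2)
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    S1, S2 = np.cov(X1, rowvar=False), np.cov(X2, rowvar=False)
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    d = np.linalg.solve(S_pooled, xbar1 - xbar2)  # S_pooled^{-1} (xbar1 - xbar2)
    lhs = d @ x - 0.5 * d @ (xbar1 + xbar2)
    rhs = np.log((cost_12 * p2) / (cost_21 * p1))
    return 1 if lhs >= rhs else 2

# Two well-separated synthetic samples (illustrative only).
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))
X2 = rng.normal([5.0, 5.0], 1.0, size=(50, 2))
print(linear_ecm_rule(X1, X2, np.array([0.2, -0.1])))  # 1 (near population 1)
print(linear_ecm_rule(X1, X2, np.array([4.8, 5.3])))   # 2 (near population 2)
```

Raising the cost $c(1 \mid 2)$ or the prior $p_2$ raises the threshold, making the rule less willing to allocate to $\pi_1$.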
b. If $\Sigma_1 \ne \Sigma_2$:
Allocate $x$ to $\pi_1$ if
$$-\frac{1}{2}x'(\Sigma_1^{-1} - \Sigma_2^{-1})x + (\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1})x - K \ge \ln\!\left(\frac{c(1 \mid 2)\,p_2}{c(2 \mid 1)\,p_1}\right),$$
where
$$K = \frac{1}{2}\ln\!\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right) + \frac{1}{2}\left(\mu_1'\Sigma_1^{-1}\mu_1 - \mu_2'\Sigma_2^{-1}\mu_2\right).$$
PS: The above inequality is implemented by substituting the sample quantities $\bar{x}_1$, $\bar{x}_2$, $S_1$, $S_2$ for $\mu_1$, $\mu_2$, $\Sigma_1$, $\Sigma_2$,
respectively.
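The quadratic rule (b) can be sketched the same way; again the function name and the data are illustrative assumptions:

```python
import numpy as np

def quadratic_ecm_rule(X1, X2, x, cost_12=1.0, cost_21=1.0, p1=0.5, p2=0.5):
    """Minimum-ECM rule for two normal populations with unequal covariances.

    Allocate x to population 1 if
      -0.5 x'(S1^{-1} - S2^{-1})x + (xbar1'S1^{-1} - xbar2'S2^{-1})x - K
        >= ln( c(1|2) p2 / (c(2|1) p1) ),
    with K = 0.5 ln(|S1|/|S2|) + 0.5 (xbar1'S1^{-1}xbar1 - xbar2'S2^{-1}xbar2).
    """
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    S1, S2 = np.cov(X1, rowvar=False), np.cov(X2, rowvar=False)
    S1inv, S2inv = np.linalg.inv(S1), np.linalg.inv(S2)
    _, logdet1 = np.linalg.slogdet(S1)  # log|S1|, stable for small determinants
    _, logdet2 = np.linalg.slogdet(S2)
    K = 0.5 * (logdet1 - logdet2) + 0.5 * (xbar1 @ S1inv @ xbar1 - xbar2 @ S2inv @ xbar2)
    lhs = -0.5 * x @ (S1inv - S2inv) @ x + (xbar1 @ S1inv - xbar2 @ S2inv) @ x - K
    rhs = np.log((cost_12 * p2) / (cost_21 * p1))
    return 1 if lhs >= rhs else 2

# Synthetic samples with different spreads (illustrative only).
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(60, 2))
X2 = rng.normal([6.0, 0.0], 2.0, size=(60, 2))
print(quadratic_ecm_rule(X1, X2, np.array([0.0, 0.0])))  # 1
print(quadratic_ecm_rule(X1, X2, np.array([6.0, 0.0])))  # 2
```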
2. For two populations using Fisher’s discrimination (maximum separation):
It does not assume that the populations are normal, but it requires equal population covariance matrices.
Choose a linear combination $y = a'x$ such that the separation
$$\text{separation} = \frac{|\bar{y}_1 - \bar{y}_2|}{s_y}$$
is maximized, where
$$s_y^2 = \frac{\sum_{j=1}^{n_1}(y_{1j} - \bar{y}_1)^2 + \sum_{j=1}^{n_2}(y_{2j} - \bar{y}_2)^2}{n_1 + n_2 - 2}$$
is the pooled estimate of the variance. Then the linear combination
$$\hat{y} = \hat{a}'x = (\bar{x}_1 - \bar{x}_2)'S_{\mathrm{pooled}}^{-1}x$$
maximizes this ratio (the separation).
So, allocate $x$ to $\pi_1$ if
$$(\mu_1 - \mu_2)'\Sigma^{-1}x - \frac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2) \ge \ln\!\left(\frac{c(1 \mid 2)\,p_2}{c(2 \mid 1)\,p_1}\right).$$
PS: The above inequality is implemented by substituting the sample quantities $\bar{x}_1$, $\bar{x}_2$, $S_{\mathrm{pooled}}$ for $\mu_1$, $\mu_2$, $\Sigma$,
respectively.
Remark: Fisher’s linear discrimination rule is equivalent to the minimum ECM rule with equal prior
probabilities and equal costs of misclassification.
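Fisher's two-population rule above (with equal priors and costs, so the threshold is the midpoint of the projected means) can be sketched as follows; the function name and synthetic data are illustrative assumptions:

```python
import numpy as np

def fisher_two_group(X1, X2):
    """Fisher's sample linear discriminant for two groups.

    Returns (a_hat, m_hat): the coefficient vector
    a_hat = S_pooled^{-1}(xbar1 - xbar2) and the midpoint
    m_hat = 0.5*(ybar1 + ybar2). Allocate x to group 1 if a_hat @ x >= m_hat
    (equal priors and equal misclassification costs).
    """
    n1, n2 = len(X1), len(X2)
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
                (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    a_hat = np.linalg.solve(S_pooled, xbar1 - xbar2)
    m_hat = 0.5 * (a_hat @ xbar1 + a_hat @ xbar2)  # midpoint of projected means
    return a_hat, m_hat

rng = np.random.default_rng(1)
X1 = rng.normal([0.0, 0.0], 1.0, size=(40, 2))
X2 = rng.normal([4.0, 4.0], 1.0, size=(40, 2))
a_hat, m_hat = fisher_two_group(X1, X2)
print(1 if a_hat @ np.array([0.1, 0.0]) >= m_hat else 2)  # 1
```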
3. For several populations (minimum TPM):
Allocate $x$ to $\pi_k$ if $\ln p_k f_k(x) > \ln p_i f_i(x)$ for all $i \ne k$. In particular, if all populations are normal:
a. Unequal $\Sigma_i$:
Allocate $x$ to $\pi_k$ if $d_k^{Q}(x) = \max_{1 \le i \le g} d_i^{Q}(x)$, where
$$d_i^{Q}(x) = -\frac{1}{2}\ln|\Sigma_i| - \frac{1}{2}(x - \mu_i)'\Sigma_i^{-1}(x - \mu_i) + \ln p_i.$$
PS: The above rule is implemented by substituting the sample quantities $\bar{x}_i$, $S_i$ for $\mu_i$, $\Sigma_i$, respectively.
b. Equal $\Sigma_i = \Sigma$, $i = 1, 2, \dots, g$:
Allocate $x$ to $\pi_k$ if $d_k(x) = \max_{1 \le i \le g} d_i(x)$, where
$$d_i(x) = \mu_i'\Sigma^{-1}x - \frac{1}{2}\mu_i'\Sigma^{-1}\mu_i + \ln p_i.$$
PS: The above rule is implemented by substituting the sample quantities $\bar{x}_i$, $S_{\mathrm{pooled}}$ for $\mu_i$, $\Sigma$,
respectively.
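The sample version of the equal-covariance scores $d_i(x)$ can be sketched as below; the function name, data, and equal default priors are illustrative assumptions:

```python
import numpy as np

def linear_scores(Xs, x, priors=None):
    """Linear discriminant scores d_i(x) for g groups with a common covariance.

    d_i(x) = xbar_i' S_pooled^{-1} x - 0.5 xbar_i' S_pooled^{-1} xbar_i + ln p_i.
    Xs : list of (n_i, p) arrays. Returns the 0-based index of the best group.
    """
    g = len(Xs)
    if priors is None:
        priors = [1.0 / g] * g
    means = [X.mean(axis=0) for X in Xs]
    dof = sum(len(X) - 1 for X in Xs)
    S_pooled = sum((len(X) - 1) * np.cov(X, rowvar=False) for X in Xs) / dof
    Sinv = np.linalg.inv(S_pooled)
    scores = [m @ Sinv @ x - 0.5 * m @ Sinv @ m + np.log(p)
              for m, p in zip(means, priors)]
    return int(np.argmax(scores))

# Three well-separated synthetic groups (illustrative only).
rng = np.random.default_rng(2)
Xs = [rng.normal(mu, 1.0, size=(30, 2)) for mu in ([0, 0], [6, 0], [0, 6])]
print(linear_scores(Xs, np.array([0.1, 5.8])))  # 2 (0-based index of 3rd group)
```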
4. For several populations using Fisher’s discrimination (maximum separation):
This method does not assume that the populations are normal, but it requires equal population covariance matrices.
We consider the linear combination $Y = a'X$. Then
$$\frac{\text{sum of squared distances from the } g \text{ populations to the overall mean of } Y}{\text{variance of } Y}
= \frac{\sum_{i=1}^{g}(\mu_{iY} - \bar{\mu}_Y)^2}{\sigma_Y^2}
= \frac{\sum_{i=1}^{g}(a'\mu_i - a'\bar{\mu})^2}{a'\Sigma a}
= \frac{a'\left(\sum_{i=1}^{g}(\mu_i - \bar{\mu})(\mu_i - \bar{\mu})'\right)a}{a'\Sigma a}
= \frac{a'B_\mu a}{a'\Sigma a},$$
where $B_\mu = \sum_{i=1}^{g}(\mu_i - \bar{\mu})(\mu_i - \bar{\mu})'$.
Ordinarily, $\bar{\mu}$ and $\mu_i$ are unavailable. Suppose we have a random sample of size $n_i$ from population
$\pi_i$, $i = 1, 2, \dots, g$. Denote the $n_i \times p$ data set from population $\pi_i$ by $X_i$ and its $j$th row by $x_{ij}'$.
We define
$$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij} \quad\text{and}\quad \bar{x} = \frac{\sum_{i=1}^{g}\sum_{j=1}^{n_i} x_{ij}}{\sum_{i=1}^{g} n_i},$$
the sample between-groups matrix
$$B = \sum_{i=1}^{g} n_i(\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})',$$
and the within-groups matrix
$$W = \sum_{i=1}^{g}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)'.$$
The sample covariance matrix of population $\pi_i$ is $S_i$, so $W = \sum_{i=1}^{g}(n_i - 1)S_i$ and
$$S_{\mathrm{pooled}} = \frac{W}{\sum_{i=1}^{g}(n_i - 1)}.$$
Consequently, we want to choose an $\hat{a}$ maximizing $\dfrac{\hat{a}'B\hat{a}}{\hat{a}'S_{\mathrm{pooled}}\hat{a}}$; equivalently, we choose an $\hat{a}$
maximizing $\dfrac{\hat{a}'B\hat{a}}{\hat{a}'W\hat{a}}$.
Then $\hat{a}_1 = \hat{e}_1$, $\hat{a}_2 = \hat{e}_2$, $\dots$, $\hat{a}_s = \hat{e}_s$, where $\hat{e}_1, \hat{e}_2, \dots, \hat{e}_s$ are the eigenvectors of $W^{-1}B$ (ordered by decreasing eigenvalue) and scaled so that
$\hat{e}'S_{\mathrm{pooled}}\hat{e} = 1$, where $s = \min\{(g-1),\, p\}$.
The linear combination $\hat{a}_1'x$ is called the sample first discriminant, and $\hat{a}_k'x$ is called the sample $k$th
discriminant.
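Computing the sample discriminants amounts to an eigen-decomposition of $W^{-1}B$ followed by the $\hat{a}'S_{\mathrm{pooled}}\hat{a} = 1$ rescaling. A minimal NumPy sketch, with an illustrative function name and synthetic data:

```python
import numpy as np

def fisher_discriminants(Xs):
    """Sample Fisher discriminants: eigenvectors of W^{-1}B,
    rescaled so that a' S_pooled a = 1.

    Xs : list of (n_i, p) arrays. Returns (A, eigvals) where the s = min(g-1, p)
    leading discriminant vectors are the columns of A.
    """
    g, p = len(Xs), Xs[0].shape[1]
    ns = np.array([len(X) for X in Xs])
    means = np.array([X.mean(axis=0) for X in Xs])
    xbar = np.concatenate(Xs).mean(axis=0)  # overall mean, weighted by n_i
    B = sum(n * np.outer(m - xbar, m - xbar) for n, m in zip(ns, means))
    W = sum((X - m).T @ (X - m) for X, m in zip(Xs, means))
    S_pooled = W / (ns.sum() - g)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(eigvals.real)[::-1]       # decreasing eigenvalue
    s = min(g - 1, p)
    A = eigvecs.real[:, order[:s]]
    for k in range(s):                           # scale so a' S_pooled a = 1
        A[:, k] /= np.sqrt(A[:, k] @ S_pooled @ A[:, k])
    return A, eigvals.real[order[:s]]

rng = np.random.default_rng(4)
Xs = [rng.normal(mu, 1.0, size=(30, 2)) for mu in ([0, 0], [4, 0], [0, 4])]
A, vals = fisher_discriminants(Xs)
print(A.shape)  # (2, 2): s = min(g-1, p) = 2 discriminants in 2 dimensions
```

Since $W^{-1}B$ is generally not symmetric, `np.linalg.eig` may return tiny imaginary parts; taking the real part is safe here because the eigenvalues of $W^{-1}B$ are real and nonnegative when $W$ is positive definite.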
Remark:
Let $e_1, e_2, \dots, e_s$ be the eigenvectors of $\Sigma^{-1/2}B_\mu\Sigma^{-1/2}$; then $\Sigma^{-1/2}e_1, \Sigma^{-1/2}e_2, \dots, \Sigma^{-1/2}e_s$ are eigenvectors of
$\Sigma^{-1}B_\mu$, with the same eigenvalues. Similarly, if $\hat{e}_1, \hat{e}_2, \dots, \hat{e}_s$ are the eigenvectors of $W^{-1}B$, then $\hat{e}_1, \hat{e}_2, \dots, \hat{e}_s$ are also the eigenvectors of
$S_{\mathrm{pooled}}^{-1}B$, since $S_{\mathrm{pooled}}$ is a scalar multiple of $W$.
Moreover,
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_s \end{pmatrix} = \begin{pmatrix} a_1'x \\ a_2'x \\ \vdots \\ a_s'x \end{pmatrix}$$
has mean vector
$$\mu_{iY} = \begin{pmatrix} a_1'\mu_i \\ a_2'\mu_i \\ \vdots \\ a_s'\mu_i \end{pmatrix}$$
under population $\pi_i$, and covariance matrix $I$.
Then the appropriate measure of squared distance from $Y = y$ to $\mu_{iY}$ is
$$(y - \mu_{iY})'(y - \mu_{iY}) = \sum_{j=1}^{s}(y_j - \mu_{iYj})^2.$$
Allocate $x$ to $\pi_k$ if
$$\sum_{j=1}^{s}(y_j - \mu_{kYj})^2 = \sum_{j=1}^{s}\left(a_j'(x - \mu_k)\right)^2 \le \sum_{j=1}^{s}\left(a_j'(x - \mu_i)\right)^2 \quad\text{for all } i \ne k.$$
PS: The above inequality is implemented by substituting the sample quantities $\bar{x}_i$, $\hat{a}_j$ for $\mu_i$, $a_j$, where $\hat{a}_j$
is defined as above.
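The minimum-distance allocation in discriminant space is mechanically simple. In the sketch below the matrix `A` of discriminant vectors is a deliberate placeholder (the identity, which reduces the rule to plain Euclidean distance to the group means); in practice its columns would be the sample discriminants $\hat{a}_j$:

```python
import numpy as np

def classify_fisher(A, means, x):
    """Allocate x to the group k minimizing sum_j (a_j'(x - xbar_k))^2.

    A : (p, s) matrix whose columns are discriminant vectors a_j.
    means : (g, p) array of group mean vectors. Returns a 0-based group index.
    """
    y = A.T @ x                                   # project x into discriminant space
    dists = [np.sum((y - A.T @ m) ** 2) for m in means]
    return int(np.argmin(dists))

means = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])  # illustrative group means
A = np.eye(2)  # PLACEHOLDER discriminants: with A = I this is Euclidean distance
print(classify_fisher(A, means, np.array([5.8, 0.2])))  # 1
```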
In fact, Fisher's discrimination among several populations is a special case of the "normal theory" discriminant
score $d_i(x)$: when all $p$ discriminants are used (i.e. $s = p$),
$$\sum_{j=1}^{p}(y_j - \mu_{iYj})^2 = \sum_{j=1}^{p}\left(a_j'(x - \mu_i)\right)^2 = (x - \mu_i)'\Sigma^{-1}(x - \mu_i) = -2d_i(x) + x'\Sigma^{-1}x + 2\ln p_i,$$
where $y_j = a_j'x$, $a_j = \Sigma^{-1/2}e_j$, and $e_j$ is an eigenvector of $\Sigma^{-1/2}B_\mu\Sigma^{-1/2}$. Hence, when the prior probabilities are equal, minimizing the squared distance over $i$ is equivalent to maximizing $d_i(x)$, since $x'\Sigma^{-1}x$ does not depend on $i$.
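The middle equality above (the sum of squared discriminant coordinates equals the Mahalanobis distance) can be verified numerically. All matrices and vectors below are arbitrary illustrative values:

```python
import numpy as np

# Check: sum_j (a_j'(x - mu_i))^2 = (x - mu_i)' Sigma^{-1} (x - mu_i)
# when a_j = Sigma^{-1/2} e_j for ALL p eigenvectors e_j of Sigma^{-1/2} B_mu Sigma^{-1/2}.
rng = np.random.default_rng(3)
p = 3
M = rng.normal(size=(p, p))
Sigma = M @ M.T + p * np.eye(p)              # a positive definite covariance
mus = rng.normal(size=(4, p))                # g = 4 population mean vectors
mubar = mus.mean(axis=0)
B_mu = sum(np.outer(m - mubar, m - mubar) for m in mus)

# Symmetric inverse square root of Sigma via its spectral decomposition.
w, V = np.linalg.eigh(Sigma)
Sig_half_inv = V @ np.diag(w ** -0.5) @ V.T

_, E = np.linalg.eigh(Sig_half_inv @ B_mu @ Sig_half_inv)
A = Sig_half_inv @ E                         # columns are a_j = Sigma^{-1/2} e_j

x, mu = rng.normal(size=p), mus[0]
lhs = np.sum((A.T @ (x - mu)) ** 2)
rhs = (x - mu) @ np.linalg.solve(Sigma, x - mu)
print(np.isclose(lhs, rhs))  # True
```

The identity holds because $E$ is orthogonal, so $A A' = \Sigma^{-1/2} E E' \Sigma^{-1/2} = \Sigma^{-1}$.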