
Pattern Recognition 38 (2005) 2192 – 2194
Rapid and brief communication
Inverse Fisher discriminate criteria for small sample size problem
and its application to face recognition
Xiao-Sheng Zhuang, Dao-Qing Dai∗
Department of Mathematics, Faculty of Mathematics and Computing, Sun Yat-Sen (Zhongshan) University, Guangzhou 510275, China
∗ Corresponding author. Tel.: +86 20 8411 3190; fax: +86 20 8403 7978. E-mail address: [email protected] (D.-Q. Dai).
Received 1 February 2005; accepted 11 February 2005
Abstract
This paper addresses the small sample size problem in linear discriminant analysis, which occurs in face recognition applications. Belhumeur et al. [IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 711–720] proposed the FisherFace method. We find that the FisherFace method might fail because, even after the PCA transform, the corresponding within-class covariance matrix can still be singular; this phenomenon is verified with the Yale face database. Hence we propose to use an inverse Fisher criterion. Our method works even when the number of training images per class is one. Experimental results suggest that the new approach performs well.
© 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Keywords: Linear discriminant analysis; Small sample size problem; Inverse Fisher discriminate criteria; Face recognition
1. Introduction
Face recognition is now a very active field of research [1]. Linear (Fisher) discriminant analysis (LDA) is a well-known and popular statistical method in pattern recognition and classification. In the last decade, the LDA approach has been successfully applied to face recognition, and several LDA-based face recognition systems have been developed with encouraging results. However, the LDA approach suffers from the small sample size problem. A well-known approach to avoid the small sample size problem in face recognition, called FisherFace, was proposed by Belhumeur et al. [2]. This method consists of two steps. The first step is the use of principal component analysis (PCA) for dimension reduction. The second step is the application of LDA to the transformed data. The basic idea is that after the PCA step the covariance matrix of the transformed data is no longer singular. We find that this PCA step cannot guarantee the successful application of the subsequent LDA: the transformed within-class covariance matrix might still be singular. This phenomenon occurs in the YALE database; see the experiment section for details. In view of this limitation we propose a new criterion to deal with the small sample size problem.
2. The inverse Fisher criteria
The basic idea of Fisher discriminant analysis is to maximize the Fisher quotient
$$\max_{W \in \mathbb{R}^d} \frac{W^T C_b W}{W^T C_w W}. \qquad (1)$$
The rank of the within-class covariance matrix $C_w \in \mathbb{R}^{d \times d}$ satisfies $\mathrm{rank}(C_w) \le \min\{d, \# - c\}$, where $\#$ is the total number of training samples and $c$ is the number of classes. When the small sample size problem occurs, i.e. $\# < d + c$, the matrix $C_w$ is singular, hence the optimization problem (1) is not solvable.
Instead we propose the new inverse Fisher criterion, minimization of the following quotient:
$$\min_{W \in \mathbb{R}^d} \frac{W^T C_w W}{W^T C_b W}. \qquad (2)$$

Table 1
Success rates of the FisherFace method with the YALE database

# of training images per class     2       5       8
Success rate (%)                   99.4    73.6    21.4
We notice that the rank of the between-class covariance matrix $C_b \in \mathbb{R}^{d \times d}$ satisfies $\mathrm{rank}(C_b) \le c - 1$. Thus it can be singular. To solve the optimization problem (2), dimension reduction is needed. We shall not justify the use of (2) in this brief communication and leave it to the full version. Our scheme is as follows.
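To make the rank bounds concrete, here is a minimal NumPy sketch (our own illustration, not from the paper) that builds the scatter matrices for synthetic data with $\# < d + c$ and checks that $C_w$ is singular while $\mathrm{rank}(C_b) \le c - 1$:

import numpy as np

rng = np.random.default_rng(0)
d, c, per_class = 50, 5, 3                      # feature dimension, classes, samples per class
X = rng.normal(size=(c * per_class, d))         # synthetic "training images"; here # = 15 < d + c
labels = np.repeat(np.arange(c), per_class)

mean = X.mean(axis=0)
Cb = np.zeros((d, d))                           # between-class covariance (scatter) matrix
Cw = np.zeros((d, d))                           # within-class covariance (scatter) matrix
for k in range(c):
    Xk = X[labels == k]
    mk = Xk.mean(axis=0)
    Cb += len(Xk) * np.outer(mk - mean, mk - mean)
    Cw += (Xk - mk).T @ (Xk - mk)

n = len(X)                                      # "#" in the paper's notation
print(np.linalg.matrix_rank(Cw), "<=", min(d, n - c))   # prints 10 <= 10: Cw has rank < d, i.e. it is singular
print(np.linalg.matrix_rank(Cb), "<=", c - 1)           # prints 4 <= 4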
Inverse Fisher Algorithm
(1) PCA step: From the equation $C_t = C_b + C_w$, where $C_t$ is the total covariance matrix, we have the singular value decomposition $C_t = U \Lambda U^T$, where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_g, 0, \ldots, 0)$ is the diagonal matrix whose diagonal entries are the eigenvalues of $C_t$ arranged as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_g > 0$, with $g$ the rank of $C_t$, and $U = (u_1, u_2, \ldots, u_d)$ is a unitary matrix ($U^T U = I_d$) such that $u_i$ is the eigenvector corresponding to $\lambda_i$ for $i = 1, 2, \ldots, g$.
(2) Eigenvector selection: Among $\{u_1, u_2, \ldots, u_g\}$ we use the selection rule: for $i = 1, 2, \ldots, g$, if $u_i^T C_b u_i > u_i^T C_w u_i$ then $u_i$ is selected. We get $p$ ($p \le \min\{g, c - 1\}$) eigenvectors $\{u_{i_1}, u_{i_2}, \ldots, u_{i_p}\}$.
(3) First projection: $\mathbb{R}^d \to \mathbb{R}^p$ with $T_1 = (u_{i_1}, u_{i_2}, \ldots, u_{i_p}) \in \mathbb{R}^{d \times p}$, $x' = T_1^T x$. The between-class covariance matrix $C_b' \in \mathbb{R}^{p \times p}$ of the dimension-reduced samples is of full rank $p$. This projection is different from that used in the FisherFace method [2].
(4) Inverse Fisher criteria:
$$\min_{v \in \mathbb{R}^p} \frac{v^T C_w' v}{v^T C_b' v}, \qquad (3)$$
where $C_w' \in \mathbb{R}^{p \times p}$ is the within-class covariance matrix for $x'$.
(5) Eigenvalue problem: the optimization problem (3) is solved by the eigenvalue problem $(C_b')^{-1} C_w' y = \mu y$, with eigenvalues $0 \le \mu_1 \le \cdots \le \mu_p$ and corresponding normalized eigenvectors $y_1, y_2, \ldots, y_p$.
(6) Second projection: $\mathbb{R}^p \to \mathbb{R}^q$ with $T_2 = (y_1, y_2, \ldots, y_q) \in \mathbb{R}^{p \times q}$, where $q \le p$, and $x'' = T_2^T x'$.
(7) The inverse Fisher transform: $T \in \mathbb{R}^{d \times q}$ is given by $T = T_1 T_2$.
We call the columns of the transform T the inverse Fisher
face (IFFace).
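As a minimal sketch of steps (1)–(7) above, the following NumPy code computes the inverse Fisher transform. It is our own illustration under stated assumptions (training data stored as rows of an array X with integer labels, a small tolerance for the numerical rank, and explicit d-by-d scatter matrices, which is only practical for moderate d), not the authors' implementation:

import numpy as np

def scatter_matrices(X, labels):
    """Total, between-class and within-class covariance (scatter) matrices."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Cb = np.zeros((d, d))
    Cw = np.zeros((d, d))
    for k in np.unique(labels):
        Xk = X[labels == k]
        mk = Xk.mean(axis=0)
        Cb += len(Xk) * np.outer(mk - mean, mk - mean)
        Cw += (Xk - mk).T @ (Xk - mk)
    return Cb + Cw, Cb, Cw

def inverse_fisher_transform(X, labels, q):
    """Sketch of the Inverse Fisher Algorithm: returns T = T1 T2 of shape (d, q)."""
    Ct, Cb, Cw = scatter_matrices(X, labels)

    # (1) PCA step: eigendecomposition of the total covariance matrix Ct.
    lam, U = np.linalg.eigh(Ct)
    order = np.argsort(lam)[::-1]                 # eigenvalues in descending order
    lam, U = lam[order], U[:, order]
    g = int(np.sum(lam > 1e-10 * lam.max()))      # numerical rank of Ct

    # (2) Eigenvector selection: keep u_i with u_i^T Cb u_i > u_i^T Cw u_i.
    keep = [i for i in range(g)
            if U[:, i] @ Cb @ U[:, i] > U[:, i] @ Cw @ U[:, i]]
    T1 = U[:, keep]                               # (3) first projection, shape (d, p)

    # Scatter matrices of the projected samples x' = T1^T x.
    Cb_p = T1.T @ Cb @ T1
    Cw_p = T1.T @ Cw @ T1

    # (4)-(5) Inverse Fisher criterion as the eigenvalue problem (Cb')^{-1} Cw' y = mu y,
    # with eigenvalues sorted in ascending order.
    mu, Y = np.linalg.eig(np.linalg.inv(Cb_p) @ Cw_p)
    order = np.argsort(mu.real)
    T2 = Y[:, order].real[:, :q]                  # (6) second projection, shape (p, q)

    # (7) The inverse Fisher transform.
    return T1 @ T2

Given a training set, T = inverse_fisher_transform(X_train, labels_train, q) would be computed once; any image x is then represented by T.T @ x (equivalently X @ T for a batch) before classification.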
3. Experiment results
This section reports our experimental results. Three standard databases, from Yale University, the Olivetti Research Laboratory (ORL) and FERET, are selected for evaluation. These databases can be used to test moderate variations in pose, illumination and facial expression. The Yale set contains 15 persons, each with 11 images of size 243 × 320, showing different facial expressions and illuminations. The Olivetti set contains 400 images of 40 persons of size 92 × 112, with variations in pose, illumination and facial expression. For the FERET set we use 432 images of 72 persons; the image resolution is 92 × 112. Since the variations of the ORL database and the FERET database are different, we combine the two to get a new, larger set, the ORLFERET, which has 832 images of 112 persons.
Failure rate of FisherFace for the YALE face database: As we pointed out in the introduction, the FisherFace method might not work; that is, the within-class covariance matrix of the PCA-reduced data can still be singular. We carry out 500 experiments. In Table 1 we list the success rates. In the dimension reduction step with PCA, the image dimension is reduced to # − c. When the number of training images for each class is 2, the success rate of the FisherFace method is about 99.4%. When it becomes 5, the success rate is 73.6%. It becomes worse when the number of training images is 8, and the success rate decreases to 21.4%. This suggests that for the Yale database there is strong correlation between images, which is in agreement with the observation in [2] that the illumination variation of the images of one person lies in a cone.
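For concreteness, one of the 500 trials amounts to the following check (a hedged sketch under our own assumptions, shown for arbitrary arrays rather than the Yale images; the helper name and the rank test are ours):

import numpy as np

def fisherface_succeeds(X, labels):
    """One trial: PCA-reduce the training data to # - c dimensions, as in FisherFace,
    and test whether the reduced within-class covariance matrix is nonsingular."""
    n, d = X.shape
    c = len(np.unique(labels))
    # PCA step: project onto the leading n - c principal components of the centered data.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[: n - c].T                                   # projection matrix, shape (d, n - c)
    Z = X @ P
    # Within-class covariance of the reduced data.
    Cw = np.zeros((n - c, n - c))
    for k in np.unique(labels):
        Zk = Z[labels == k]
        Cw += (Zk - Zk.mean(axis=0)).T @ (Zk - Zk.mean(axis=0))
    return np.linalg.matrix_rank(Cw) == n - c           # True means FisherFace is applicable

The success rates in Table 1 correspond to the fraction of such trials, over random choices of the training images, for which this test returns True.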
Recognition performance of the IFFace method: We test the proposed IFFace method with three experiments. We use the L2 metric, and for the classifier we use the nearest-neighbor rule applied to the mean of each class (nearest class mean). The recognition rate is calculated as the ratio of the number of successful recognitions to the total number of test samples. The experiments are repeated 50 times and average recognition rates are reported.
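A sketch of this evaluation protocol (nearest class mean under the L2 metric in the projected space; the function name and the projection matrix T are our assumptions, e.g. the output of the inverse_fisher_transform sketch above):

import numpy as np

def recognition_rate(X_train, y_train, X_test, y_test, T):
    """Nearest class-mean classification in the projected (IFFace) space."""
    Z_train, Z_test = X_train @ T, X_test @ T
    classes = np.unique(y_train)
    means = np.stack([Z_train[y_train == k].mean(axis=0) for k in classes])
    # L2 distance from every test sample to every class mean.
    dist = np.linalg.norm(Z_test[:, None, :] - means[None, :, :], axis=2)
    pred = classes[np.argmin(dist, axis=1)]
    return float(np.mean(pred == y_test))    # successful recognitions / number of test samples

Averaging this quantity over 50 random training/test splits of each database matches the protocol described above.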
The first two experiments concern the performance of the IFFace method on the ORL and FERET databases. The results are shown in Fig. 1; the left part is for ORL and the right part for FERET. The average recognition rate on the ORL database increases from 74% to 96% as the number of training images per class increases from 2 to 9. For the more challenging FERET database the IFFace method performs even better: the rate increases from 85% to 94% as the number of training images per class goes from 2 to 5.
In the third experiment, we test the combined ORLFERET database. When the number of training images per class increases from 1 to 5, the recognition rate rises from 66.7% to 92.5%, as shown in Fig. 2. We notice that the IFFace method works even when the number of training images per class is one.
Fig. 1. Recognition rates for the ORL database (left) and FERET database (right).
For the ORLFERET database the FisherFace method works, and we compare it with IFFace in Fig. 2; IFFace has better performance. When the number of training images for each class is 5, the average recognition rates over the 50 experiments are 92.5% for IFFace and 87.6% for FisherFace.

Fig. 2. Performance comparison of IFFace and FisherFace on the combined ORLFERET database (recognition rate versus number of training images per person).

4. Conclusion
In this paper we presented the IFFace method to solve the small sample size problem in face recognition. Compared with the two steps of the FisherFace method, we used a different strategy. Experiments show that the method works well, especially when the database has uneven variations within each class.

Acknowledgements
This project is supported in part by the NSF of China (Grant Nos. 60175031 and 10231040), grants from the Ministry of Education of China and grants from Sun Yat-Sen University.

References
[1] A.K. Jain, A. Ross, S. Prabhakar, An introduction to biometric
recognition, IEEE Trans. Circuits Syst. Video Technol. 14 (1)
(2004) 4–20.
[2] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs.
Fisherfaces: recognition using class specific linear projection,
IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997)
711–720.
About the Author—XIAO-SHENG ZHUANG received his Bachelor's degree in Mathematics from Sun Yat-Sen (Zhongshan) University, China, in 2003. He is now working toward his Master's degree in Applied Mathematics. His current research interests include face recognition and detection.
About the Author—DAO-QING DAI received his Ph.D. in Mathematics from Wuhan University, China, in 1990. He is currently a Professor and Associate Dean of the Faculty of Mathematics and Computing, Sun Yat-Sen (Zhongshan) University. He visited the Free University, Berlin, as an Alexander von Humboldt research fellow from 1998 to 1999. He received the Outstanding Research Achievements in Mathematics award from ISAAC (International Society for Analysis, Applications and Computation) in 1999 at Fukuoka, Japan. He served as Program Chair of Sinobiometrics 2004. His current research interests include image processing, wavelet analysis and human face recognition.