
1999 IEEE TENCON
A TRANSFORM DOMAIN FACE RECOGNITION APPROACH
Vinayadatt V. Kohir
U. B. Desai
P. D. A. College of Engineering
GULBARGA 585 101 INDIA
E-mail:[email protected]
Electrical Engineering Department
IIT, MUMBAI 400 076 INDIA
E-mail: [email protected]
Abstract
The paper combines the DCT (Discrete Cosine Transform) and HMM (Hidden Markov Model) to realise a face recognition technique. The face images are subsampled to obtain a sequence of face sub-images. Each sub-image is DCT transformed and the coefficients are scanned (similar to the JPEG image compression method) to form a vector. These vectors are applied to HMMs for recognition. Two different face databases, ORL and SPANN, are used. The recognition rate for ORL is 99.5% with 5 training and 5 testing images per subject. The SPANN database has a recognition rate of 98.75% with 3 training images and 4 test images per subject. Further, the dependence of the recognition rate on the number of training images and the number of HMM states is investigated.
I. INTRODUCTION
In the last couple of decades many researchers [1, 2, 3, 4, 6, 7, 8, 9] have investigated the problem of face recognition. Though a lot of effort has gone into this field, the problem still offers substantial scope for investigation. A detailed survey can be found in [1].
The paper presents a transform domain approach for face recognition. The method is based on a combination of the DCT (Discrete Cosine Transform) and HMMs (Hidden Markov Models), bringing the features of both together. The experiments are performed on two databases: the public domain ORL (Olivetti Research Laboratory) face database, one of the most commonly used test beds, and the in-house SPANN (Signal Processing and Artificial Neural Networks laboratory) database. Recognition rates of 99.5% (for 5 training images and 5 test images per subject) and 98.75% (3 training images and 4 test images per subject) are achieved for the ORL and SPANN databases respectively.
Space limitations prohibit the inclusion of the relevant mathematical formulae.
II. HIDDEN MARKOV MODELS
Hidden Markov Models (HMMs) are known to classify data based on the statistical properties of the input. We are not sure how a person recognizes a face, and so HMMs are expected to do the job of recognition by extracting fuzzy features of the face in question and comparing them with the known (stored) ones. To use HMMs for classification of unknown input data (here an unknown face image), the HMMs are first trained to classify (recognize) known faces.
An HMM is associated with interconnected non-observable (hidden) states which are manifested by observable vector sequences. An HMM, $\lambda$, is characterized by the three parameters $(A, B, \Pi)$.
Let $O = (o_1, o_2, \ldots, o_t, \ldots, o_T)$ be the observation sequence, where each $o_t$ is a $D$-dimensional vector observed at the $t$-th instant. Let $Q = (q_1, q_2, \ldots, q_t, \ldots, q_T)$ be the corresponding state sequence, where $q_t \in \{1, 2, \ldots, N\}$, $N$ being the number of states in the model. Then the HMM parameters $\lambda = (A, B, \Pi)$ are defined as follows:
A: transition probability matrix whose elements are
$a_{ij} = \mathrm{Prob}[q_{t+1} = j \mid q_t = i]$
B: emission probability matrix determining the output observation given that the HMM is in a particular state. Every element of this matrix,
$b_j(o_t)$, $1 \le j \le N$ and $1 \le t \le T$,
is the probability density of observation $o_t$ at time $t$ given that the HMM is in state $q_t = j$.
$\Pi$: the initial state distribution with the $j$-th entry
$\pi_j = \mathrm{Prob}[q_1 = j]$
being the probability of being in state $j$ at the start of the observation.
For further details on HMMs, the interested reader is referred to [5].
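To make the $(A, B, \Pi)$ parameterisation concrete, the following sketch (not from the paper; the toy parameter values, variable names, and the diagonal-Gaussian emission model are assumptions) evaluates $\mathrm{Prob}(O \mid \lambda)$ for an observation sequence with the scaled forward algorithm in NumPy.

```python
import numpy as np

def forward_log_likelihood(obs, A, means, covs, pi):
    """Log-likelihood Prob(O | lambda) of a sequence of D-dim observations
    under an HMM with diagonal-Gaussian emissions (scaled forward algorithm)."""
    T, D = obs.shape
    N = A.shape[0]
    # b[t, j] = emission density of obs[t] in state j
    b = np.empty((T, N))
    for j in range(N):
        diff = obs - means[j]
        b[:, j] = np.exp(-0.5 * np.sum(diff**2 / covs[j], axis=1)) / \
                  np.sqrt((2 * np.pi) ** D * np.prod(covs[j]))
    log_lik = 0.0
    alpha = pi * b[0]                   # initialisation
    for t in range(1, T):
        c = alpha.sum()                 # scale to avoid numerical underflow
        log_lik += np.log(c)
        alpha = (alpha / c) @ A * b[t]  # induction step
    log_lik += np.log(alpha.sum())
    return log_lik

# toy example: N = 3 ergodic states, D = 2 dimensional observations (all values assumed)
N, D, T = 3, 2, 50
rng = np.random.default_rng(0)
A = np.full((N, N), 1.0 / N)            # ergodic: every state reachable from every other
pi = np.full(N, 1.0 / N)
means = rng.normal(size=(N, D))
covs = np.ones((N, D))
obs = rng.normal(size=(T, D))
print(forward_log_likelihood(obs, A, means, covs, pi))
```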
III. DISCRETE COSINE TRANSFORM
The Discrete Cosine Transform (DCT) is a real transform with the properties of energy compaction and data decorrelation. This has made it suitable for most image compression techniques, such as JPEG and MPEG.
The DCT of a 1D signal $f(x)$, where $x \in \{0, 1, 2, \ldots, N-1\}$, is defined as:
$$C(u) = \alpha(u) \sum_{x=0}^{N-1} f(x) \cos\left(\frac{(2x+1)u\pi}{2N}\right)$$
where
$$\alpha(u) = \begin{cases} \sqrt{1/N} & \text{for } u = 0 \\ \sqrt{2/N} & \text{for } u = 1, 2, \ldots, N-1 \end{cases}$$
The above equation can be applied in 2D and a
2D DCT can be realized. The 2D DCT transforms spatial
information to decoupled frequency information in the
form of DCT coefficients.
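As a rough illustration, a separable 2D DCT of a 16 by 16 sub-image can be computed with SciPy; this is only a sketch, and it assumes SciPy's orthonormal DCT-II corresponds to the definition above up to the $\alpha(u)$ scaling.

```python
import numpy as np
from scipy.fft import dctn, idctn

# a 16x16 sub-image (random values stand in for pixel intensities)
block = np.random.default_rng(1).random((16, 16))

# separable 2D DCT-II with orthonormal scaling, i.e. the 1D transform
# above applied first along the rows and then along the columns
coeffs = dctn(block, type=2, norm='ortho')

# low-frequency coefficients (top-left corner) carry most of the energy
print(coeffs[:3, :3])

# the transform is invertible, so no spatial information is lost
assert np.allclose(idctn(coeffs, type=2, norm='ortho'), block)
```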
Since images are usually stored in JPEG compressed format, our approach could facilitate the recognition of JPEG images without full decompression (some more investigation in this regard is needed).
IV. DCT-HMM BASED FACE RECOGNITION
The two components of the method, i.e. the DCT and the HMM, have altogether different properties. The present approach tries to combine the best of the two to achieve face recognition.
HMMs are basically 1D models. Hence, to use them for face recognition, the face images, which are 2D signals, must be represented in a 1D format without losing any vital information. So, the approach adopted has two parts: the first, input conditioning, takes care of converting 2D face images to a 1D data stream, and the second part does the recognition task.
IVa. INPUT CONDITIONING
The 2D face images are converted to 1D to enable the use of HMMs for face recognition. Input conditioning first samples the input face image, generating a sequence of sub-images. The sub-image sequence is generated by sliding a square window (say, of size 16 by 16) with some overlap (say 75%). The sampling method is illustrated in Fig. 1.
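A minimal sketch of this sampling step is given below; the raster scanning order and the handling of the image borders are assumptions, since the text only specifies the window size and overlap.

```python
import numpy as np

def subimage_sequence(image, win=16, overlap=0.75):
    """Slide a win x win window over the image with the given fractional
    overlap and return the resulting sequence of sub-images
    (raster order, left-to-right and top-to-bottom, is assumed)."""
    step = max(1, int(round(win * (1.0 - overlap))))   # e.g. 16 * 0.25 = 4 pixels
    H, W = image.shape
    blocks = []
    for top in range(0, H - win + 1, step):
        for left in range(0, W - win + 1, step):
            blocks.append(image[top:top + win, left:left + win])
    return blocks

# e.g. a 112 x 92 ORL-sized image yields a sequence of 16x16 sub-images
face = np.zeros((112, 92))
print(len(subimage_sequence(face)))   # number of observation instances per face
```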
Feature Extraction: Each sub-image obtained earlier is passed through the DCT transformation. The DCT coefficients so obtained are scanned in a zig-zag fashion and arranged as a column vector. This vector corresponds to one observation instance of the HMM. For every sub-image sequence a DCT transformed vector sequence is obtained. The DCT transformation decorrelates the sub-images. In this work, the 10 most significant DCT coefficients are used to form the observation vector from a sub-image. Every face image thus gives one 1D sequence.
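The feature-extraction step could be sketched as follows; the particular zig-zag indexing routine and the helper names (zigzag_indices, observation_vector) are illustrative assumptions, with the 10-coefficient truncation taken from the text.

```python
import numpy as np
from scipy.fft import dctn

def zigzag_indices(n):
    """Row/column indices of an n x n block in JPEG-style zig-zag order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def observation_vector(block, n_coeff=10):
    """DCT-transform a sub-image and keep the first n_coeff zig-zag
    scanned coefficients as one HMM observation vector."""
    coeffs = dctn(block, type=2, norm='ortho')
    zz = zigzag_indices(block.shape[0])[:n_coeff]
    return np.array([coeffs[r, c] for r, c in zz])

# one face image -> one 1D observation sequence (T sub-images x 10 coefficients):
# sequence = np.stack([observation_vector(b) for b in subimage_sequence(face)])
```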
IVb. HMM TRAINING
For every subject to be recognized, an HMM is trained using the 1D sequence generated in the preceding subsection. In order to incorporate variations in facial expressions (like smiling or frowning) and accessories (like stubble or spectacles), a set of face images per subject, instead of one typical face, is used for training. The HMMs used are ergodic and are trained using the segmental k-means algorithm [9, 10]. The probability distribution of the DCT coefficients is assumed to be Gaussian. The initial segmentation of the segmental k-means algorithm determines, to some extent, the recognition rate.
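For illustration only, one HMM per subject could be trained along the following lines with the hmmlearn package; note that hmmlearn fits Gaussian-emission HMMs with Baum-Welch (EM) rather than the segmental k-means algorithm used here, so this is a stand-in sketch, not the authors' procedure.

```python
import numpy as np
from hmmlearn import hmm

def train_subject_hmm(train_sequences, n_states=5, seed=0):
    """Train one ergodic Gaussian-emission HMM for a single subject from a
    list of (T_i x 10) observation sequences, one per training face image."""
    X = np.vstack(train_sequences)                  # all observation vectors stacked
    lengths = [len(s) for s in train_sequences]     # one length per face image
    model = hmm.GaussianHMM(n_components=n_states,  # ergodic: full transition matrix
                            covariance_type='diag',
                            n_iter=30, random_state=seed)
    model.fit(X, lengths)
    return model

# e.g. (train_data is a hypothetical dict of subject -> list of sequences):
# models = {subject: train_subject_hmm(seqs) for subject, seqs in train_data.items()}
```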
Why choose an ergodic HMM? There are no restrictions imposed on state transitions in an ergodic HMM, so the HMM can enter any state from any other. This takes care of rotational variations of the input face image. To be precise, one need not impose the condition that the input face image should be upright, i.e. head up, which is a must for the top-down or pseudo 2D HMMs proposed in [7].
IVc. RECOGNITION
Trained HMMs are used for recognizing an input face image. As earlier, the input face image is converted to a 1D sequence by generating a sub-image sequence followed by DCT transformation and zig-zag scan ordering of the DCT coefficients. This sequence is applied to the HMMs and the Viterbi score (state optimized probability function) is computed. The HMM with the highest score gives the recognition result. The state optimized probability function is the likelihood $\mathrm{Prob}(O, Q^* \mid \lambda)$. The recognized face corresponds to the subject $j^*$ for which this likelihood is maximum, i.e. $j^* = \arg\max_j \mathrm{Prob}(O, Q^*_j \mid \lambda_j)$.
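A hedged sketch of this decision rule, reusing the per-subject models from the training sketch above (the function and variable names are assumptions):

```python
import numpy as np

def recognise(sequence, models):
    """Score an unknown face's observation sequence (T x 10 array) against
    every subject's trained HMM and return the subject whose Viterbi
    (state-optimised) log-likelihood is highest."""
    best_subject, best_score = None, -np.inf
    for subject, model in models.items():
        # decode() returns the Viterbi log-probability, log Prob(O, Q* | lambda_j)
        score, _ = model.decode(sequence, algorithm="viterbi")
        if score > best_score:
            best_subject, best_score = subject, score
    return best_subject

# e.g. identity = recognise(test_sequence, models)
```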
V. EXPERIMENTATION AND RESULTS
The above approach is experimented with on two different databases: the ORL database and the SPANN (in-house) database.
The ORL database has 40 subjects, each having 10 different poses. All the images are grey scale of size 92 by 112. Some of the ORL subjects and their respective 10 different poses are shown in Fig. 2 and Fig. 3.
The SPANN database has 20 subjects with 7 different poses, each of size 128 by 128 with 256 grey levels. Some of the subjects are shown in Fig. 4 and Fig. 5.
Experiments are carried out with sub-images of size 16 by 16 and 75% sub-image overlap. The 10 most significant DCT coefficients are chosen to form the observation vector. The training set and test set are disjoint.
To evaluate the recognition performance of the
proposed method, two different experiments are
performed.
In the first experiment, the proposed method is tested with an increasing number of training face images per subject on the ORL face database. The results agree with intuition: the more training face images, the better the recognition. The number of training face images is increased from 1 to 6, and the remaining face images of each subject are treated as test face images. The results are tabulated in Table 1.
In the second experiment, the recognition rate with an increasing number of HMM states is obtained. The first five poses of each subject (Fig. 2) are used to train the HMM and the remaining 5 poses (Fig. 3) are used for testing the accuracy of the recognition result. The number of states is increased from 2 to 17. The recognition rate improves as the number of states increases and then stabilizes, but the recognition timings do not show a definite pattern. The results are shown as graphical plots in Fig. 6 and Fig. 7.
For the SPANN database, 3 images of a subject (Fig. 4) are used for training, and the remaining 4 (Fig. 5) are used for testing. A recognition rate of 98.75% is achieved (i.e. only one test face image is misrecognised).
References:
[1] R. Chellappa, C. L. Wilson and S. A. Sirohey, "Human and Machine Recognition of Faces: A Survey", Proc. IEEE, vol. 83, no. 5, pp. 705-740, May 1995.
[2] S. Lawrence, C. L. Giles, A. C. Tsoi and A. D. Back, "Face Recognition: A Convolutional Neural-Network Approach", IEEE Trans. on Neural Networks, vol. 8, no. 1, pp. 98-113, Jan. 1997.
[3] S. H. Lin, S. Y. Kung and L. J. Lin, "Face Recognition/Detection by Probabilistic Decision-Based Neural Networks", IEEE Trans. on Neural Networks, vol. 8, no. 1, pp. 114-132, Jan. 1997.
[4] S. M. Lucas, "Face Recognition with the Continuous n-tuple Classifier", BMVC '98.
[5] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Proc. IEEE, vol. 77, no. 2, pp. 257-285, 1989.
[6] A. N. Rajagopalan et al., "Finding Faces in Photographs", ICCV '98, pp. 640-645, Jan. 1998, New Delhi, India.
[7] F. S. Samaria, Face Recognition Using Hidden Markov Models, Ph.D. Dissertation, University of Cambridge, UK, 1994.
[8] M. Turk and A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, 1991.
[9] Vinayadatt Kohir and U. B. Desai, "Face Recognition Using DCT-HMM Approach", WACV '98, Oct. 1998, Princeton, USA.
[10] Yang He and Amalan Kundu, "Unsupervised Texture Segmentation Using Multichannel Decomposition and Hidden Markov Models", IEEE Trans. Image Processing, vol. 4, no. 5, pp. 603-619, May 1995.
Table 1. Effect of the number of training face images per subject on recognition performance for the ORL face database.

Test images            Misclassified images   % Recognition
9 (2.pgm - 10.pgm)     78                     78.33
8 (3.pgm - 10.pgm)     45                     85.93
7 (4.pgm - 10.pgm)     33                     88.21
6 (5.pgm - 10.pgm)     -                      95.42
5 (6.pgm - 10.pgm)     -                      99.50 (best)
4 (7.pgm - 10.pgm)     -                      99.50

Figure 1. Sampling of the face image. (X, Y) is the size of the sampling window and (x, y) is the overlap allowed.

Figure 2. Sample ORL training face images.

Figure 3. Sample ORL test face images. The face image with an 'X' mark is the only face misrecognised for the best case.

Figure 4. Sample SPANN training face images.

Figure 5. Sample SPANN test face images. The face image with an 'X' mark is the only misrecognised face for the best case.
Figure 6. Plot of the number of HMM states vs. recognition timings on the ORL face database with 5 training samples and 5 test samples. The timings were obtained on a 200 MHz Pentium in a multi-user environment.

Figure 7. Plot depicting the dependence of % recognition on the number of HMM states chosen. The results are obtained on the ORL face database with 5 training images and 5 test images.