1999 IEEE TENCON

A TRANSFORM DOMAIN FACE RECOGNITION APPROACH

Vinayadatt V. Kohir
P. D. A. College of Engineering, Gulbarga 585 101, India
E-mail: [email protected]

U. B. Desai
Electrical Engineering Department, IIT Bombay, Mumbai 400 076, India
E-mail: [email protected]

Abstract

The paper combines the DCT (Discrete Cosine Transform) and HMMs (Hidden Markov Models) to realise a face recognition technique. The face images are subsampled to obtain a sequence of face sub-images. Each sub-image is DCT transformed, and the coefficients are scanned (as in the JPEG image compression method) to form a vector. These vectors are applied to HMMs for recognition. Two different face databases, ORL and SPANN, are used. The recognition rate for ORL is 99.5% with 5 training and 5 test images per subject; the SPANN database yields a recognition rate of 98.75% with 3 training and 4 test images per subject. Further, the dependence of the recognition rate on the number of training images and on the number of HMM states is investigated.

I. INTRODUCTION

In the last couple of decades many researchers [1, 2, 3, 4, 6, 7, 8, 9] have investigated the problem of face recognition. Though a lot of effort has gone into this field, the problem still leaves substantial scope for investigation. A detailed survey can be found in [1]. This paper presents a transform domain approach to face recognition. The method is based on a combination of the DCT (Discrete Cosine Transform) and HMMs (Hidden Markov Models), bringing the features of the two together. Experiments are performed on two databases: the public domain ORL (Olivetti Research Laboratory) face database, one of the most commonly used test beds, and the in-house SPANN (Signal Processing and Neural Network laboratory) database. Recognition rates of 99.5% (5 training and 5 test images per subject) and 98.75% (3 training and 4 test images per subject) are achieved for the ORL and SPANN databases respectively.
Space limitations prohibit the inclusion of the relevant mathematical formulae.

II. HIDDEN MARKOV MODELS

Hidden Markov Models (HMMs) are known to classify data based on the statistical properties of the input. It is not known exactly how a person recognizes a face, so the HMMs are expected to do the job of recognition by extracting fuzzy features of the face in question and comparing them with the known (stored) ones. To use HMMs for classification of unknown input data (here, an unknown face image), the HMMs are first trained to classify (recognize) known faces. An HMM is associated with interconnected non-observable (hidden) states which are manifested through observable vector sequences. An HMM λ is characterized by the three parameters (A, B, Π). Let O = (o_1, o_2, ..., o_t, ..., o_T) be the observation sequence, where each o_t is a D-dimensional vector observed at the t-th instant. Let Q = (q_1, q_2, ..., q_t, ..., q_T) be the corresponding state sequence, where q_t ∈ {1, 2, ..., N}, N being the number of states in the model. Then the HMM parameters λ = (A, B, Π) are defined as follows:

A: the transition probability matrix, whose elements are a_ij = Prob[q_{t+1} = j | q_t = i].

B: the emission probability matrix determining the output observation given that the HMM is in a particular state. Every element b_j(o_t), 1 ≤ j ≤ N and 1 ≤ t ≤ T, is the density of observation o_t at time t given that the HMM is in state q_t = j.

Π: the initial state distribution, with j-th entry π_j = Prob[q_1 = j] being the probability of being in state j at the start of the observation.

For further details on HMMs the interested reader is referred to [5].

III. DISCRETE COSINE TRANSFORM

The Discrete Cosine Transform (DCT) is a real transform with the properties of energy compaction and data decorrelation. This has made it suitable for most image compression techniques, such as JPEG and MPEG.
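As an illustration of the parameter triple λ = (A, B, Π) defined above, the following minimal sketch (our own, not from the paper) sets up a small ergodic HMM with Gaussian emission densities and evaluates the joint log-likelihood log Prob(O, Q | λ) for a given observation/state sequence pair. All sizes and parameter values here are illustrative assumptions, not trained models.

```python
import numpy as np

# Toy ergodic HMM, lambda = (A, B, Pi): N states, D-dimensional observations.
# The Gaussian emission parameters below are illustrative, not trained values.
N, D = 3, 10

A = np.full((N, N), 1.0 / N)       # a_ij = Prob[q_{t+1} = j | q_t = i]
pi0 = np.full(N, 1.0 / N)          # pi_j = Prob[q_1 = j]

rng = np.random.default_rng(0)
means = rng.normal(size=(N, D))    # per-state Gaussian means
stds = np.ones((N, D))             # per-state (diagonal) standard deviations

def log_emission(j, o):
    """log b_j(o_t): log Gaussian density of observation o in state j."""
    z = (o - means[j]) / stds[j]
    return -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi * stds[j] ** 2))

def log_joint(obs, states):
    """log Prob(O, Q | lambda) for observation sequence O and state path Q."""
    lp = np.log(pi0[states[0]]) + log_emission(states[0], obs[0])
    for t in range(1, len(states)):
        lp += np.log(A[states[t - 1], states[t]]) + log_emission(states[t], obs[t])
    return lp
```

In the paper the emission densities are Gaussians over zig-zag-scanned DCT coefficient vectors; here they are placeholders to keep the sketch self-contained.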
The DCT of a 1D signal f(x), x ∈ {0, 1, 2, ..., N−1}, is defined as

    C(u) = α(u) Σ_{x=0}^{N−1} f(x) cos[(2x+1)uπ / 2N],  u = 0, 1, ..., N−1,

where

    α(u) = √(1/N) for u = 0, and α(u) = √(2/N) for u = 1, 2, ..., N−1.

The above equation can be extended to 2D, yielding the 2D DCT. The 2D DCT transforms spatial information into decoupled frequency information in the form of DCT coefficients. Since images are usually stored in JPEG compressed format, our approach should facilitate the recognition of JPEG images without full decompression (some more investigation in this regard is needed).

IV. DCT-HMM BASED FACE RECOGNITION

The two components of the method, the DCT and the HMM, have altogether different properties. The present approach tries to combine the best of the two to achieve face recognition.

-104- Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on December 3, 2008 at 03:50 from IEEE Xplore. Restrictions apply.

HMMs are basically 1D models. Hence, to use them for face recognition, the face images, which are 2D signals, must be represented in a 1D format without losing any vital information. The approach adopted therefore has two parts: the first, input conditioning, converts the 2D face image to a 1D data stream, and the second performs the recognition task.

IVa. INPUT CONDITIONING

The 2D face images are converted to 1D to enable the use of HMMs for face recognition. Input conditioning first samples the input face image, generating a sequence of sub-images. The sub-image sequence is generated by sliding a square window (say, of size 16 by 16) with some overlap (say 75%). The sampling method is shown in Fig. 1.

Feature Extraction: Each sub-image obtained above is passed through the DCT. The resulting DCT coefficients are scanned in zig-zag fashion and arranged as a column vector. This vector corresponds to one observation instance of the HMM. For every sub-image sequence a DCT transformed vector sequence is thus obtained.
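The input-conditioning pipeline just described (16 by 16 sliding window with 75% overlap, 2D DCT per sub-image, zig-zag scan, first few coefficients kept) can be sketched as follows. The window size, overlap and coefficient count follow the paper; the helper names and the matrix form of the DCT are our own assumptions.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix built from the C(u) formula above."""
    u = np.arange(N)[:, None]
    x = np.arange(N)[None, :]
    T = np.sqrt(2.0 / N) * np.cos((2 * x + 1) * u * np.pi / (2 * N))
    T[0, :] = np.sqrt(1.0 / N)     # alpha(0) = sqrt(1/N)
    return T

def zigzag_indices(n):
    """(row, col) pairs of an n x n block in JPEG zig-zag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def observation_sequence(img, win=16, overlap=0.75, n_coef=10):
    """Turn a 2D face image into the 1D sequence of DCT observation vectors."""
    T = dct_matrix(win)
    step = int(win * (1 - overlap))          # 16 * 0.25 = 4-pixel stride
    zz = zigzag_indices(win)[:n_coef]
    rows, cols = img.shape
    seq = []
    for r in range(0, rows - win + 1, step):
        for c in range(0, cols - win + 1, step):
            coef = T @ img[r:r + win, c:c + win] @ T.T   # 2D DCT of sub-image
            seq.append(np.array([coef[i, j] for i, j in zz]))
    return np.stack(seq)
```

Because the DCT matrix is orthonormal, the 2D transform of a sub-image S is simply T S Tᵀ, which keeps the sketch dependency-free.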
The DCT transformation carried out decorrelates the sub-images. In this work the 10 most significant DCT coefficients are used to form the observation vector from a sub-image. Every face image thus yields one 1D observation sequence.

IVb. HMM TRAINING

For every subject to be recognized, an HMM is trained using the 1D sequences generated as in the preceding subsection. In order to incorporate variations in facial expression (smiling, frowning) and accessories (stubble, spectacles), a set of face images per subject, instead of one typical face, is used for training. The HMMs used are ergodic and are trained with the segmental k-means algorithm [9, 10]. The probability distribution of the DCT coefficients is assumed to be Gaussian. The initial segmentation of the segmental k-means algorithm determines, to some extent, the recognition rate.

Why choose an ergodic HMM? In an ergodic HMM no restrictions are imposed on the state transitions, so the HMM can enter any state from any other. This takes care of rotational variations of the input face image: one need not impose the condition that the input face image be upright (head up), which is a must for the top-down or pseudo 2D HMMs proposed in [7].

IVc. RECOGNITION

The trained HMMs are used for recognizing an input face image. As before, the input face image is converted to a 1D sequence by generating a sub-image sequence followed by the DCT and zig-zag scan ordering of the DCT coefficients. This sequence is applied to each HMM, and the Viterbi score (the state-optimized probability function) is computed. The HMM with the highest score gives the recognition result. The state-optimized probability function is the likelihood Prob(O, Q*_j | λ_j), where Q*_j is the optimal state sequence under model λ_j. The recognized face corresponds to the j that yields the maximum, i.e. j* = argmax_j Prob(O, Q*_j | λ_j).

V. EXPERIMENTATION AND RESULTS

The above approach is tested on two different databases: the ORL database and the SPANN (in-house) database.
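The recognition rule above, computing the Viterbi score of the test sequence under each subject's HMM and picking the highest, can be sketched as below. The toy log-parameter tables stand in for trained models, and `log_B[t, j]` holds the log emission density of the t-th observation in state j; the function names are ours.

```python
import numpy as np

def viterbi_log_score(log_pi, log_A, log_B):
    """max over state paths Q of log Prob(O, Q | lambda).

    log_B[t, j] = log emission density of the t-th observation in state j."""
    delta = log_pi + log_B[0]                          # initialisation
    for t in range(1, log_B.shape[0]):
        delta = np.max(delta[:, None] + log_A, axis=0) + log_B[t]
    return float(delta.max())

def recognise(models, log_B_per_model):
    """Return j* = argmax_j of the Viterbi score under model lambda_j."""
    scores = [viterbi_log_score(log_pi, log_A, log_B)
              for (log_pi, log_A), log_B in zip(models, log_B_per_model)]
    return int(np.argmax(scores))
```

With the Gaussian emission assumption of the paper, `log_B` would be filled by evaluating each state's Gaussian density at every DCT observation vector of the test image.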
The ORL database has 40 subjects, each with 10 different poses. All images are in grey scale, of size 92 by 112. Some ORL subjects and their respective 10 poses are shown in Fig. 2 and Fig. 3. The SPANN database has 20 subjects with 7 poses each, of size 128 by 128 with 256 grey levels. Some of the subjects are shown in Fig. 4 and Fig. 5. Experiments are carried out with sub-images of size 16 by 16 and 75% sub-image overlap. The 10 most significant DCT coefficients are chosen to form the observation vector. The training set and test set are disjoint.

To evaluate the recognition performance of the proposed method, two different experiments are performed. In the first experiment, the method is tested on the ORL face database with an increasing number of training face images per subject. The results agree with intuition: the more training face images, the better the recognition. The number of training face images is increased from 1 to 6, with the remaining face images of each subject treated as test images. The results are tabulated in Table 1.

In the second experiment the recognition rate is obtained for an increasing number of HMM states. The first five poses of each subject (Fig. 2) are used to train the HMM, and the remaining 5 poses (Fig. 3) are used to test the recognition accuracy. The number of states is increased from 2 to 17. The recognition rate improves as the number of states increases and stabilizes thereafter, but the recognition timings show no definite pattern. The results are plotted in Fig. 6 and Fig. 7.

For the SPANN database, 3 images of each subject (Fig. 4) are used for training and the remaining 4 (Fig. 5) for testing. A recognition rate of 98.75% is obtained, i.e. only one test face image is misrecognised.

References:

[1] R. Chellappa, C.L. Wilson and S.A. Sirohey, "Human and Machine Recognition of Faces: A Survey", Proc. IEEE, vol. 83, no. 3, pp. 704-740, May 1995.

[2] S.
Lawrence, C.L. Giles, A.C. Tsoi and A.D. Back, "Face Recognition: A Convolutional Neural-Network Approach", IEEE Trans. on Neural Networks, vol. 8, no. 1, pp. 98-113, Jan. 1997.

[3] S.H. Lin, S.Y. Kung and L.J. Lin, "Face Recognition/Detection by Probabilistic Decision-Based Neural Networks", IEEE Trans. on Neural Networks, vol. 8, no. 1, pp. 114-132, Jan. 1997.

[4] S.M. Lucas, "Face Recognition with the Continuous n-tuple Classifier", BMVC'98.

[5] L.R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Proc. IEEE, vol. 77, no. 2, pp. 257-285, 1989.

[6] A.N. Rajagopalan et al., "Finding Faces in Photographs", ICCV'98, pp. 640-645, Jan. 1998, New Delhi, India.

[7] F.S. Samaria, Face Recognition using Hidden Markov Models, Ph.D. Dissertation, University of Cambridge, UK, 1994.

[8] M. Turk and A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, 1991.

[9] Vinayadatt Kohir and U.B. Desai, "Face Recognition using DCT-HMM Approach", WACV'98, Oct. 1998, Princeton, USA.

[10] Yang He and Amlan Kundu, "Unsupervised Texture Segmentation using Multichannel Decomposition and Hidden Markov Models", IEEE Trans. Image Processing, vol. 4, no. 5, pp. 603-619, May 1995.

Table 1. Effect of the number of training face images on recognition performance for the ORL face database.

Training images | Test images        | Misclassified | % Recognition
1               | 9 (2.pgm - 10.pgm) | 78            | 78.33
2               | 8 (3.pgm - 10.pgm) | 45            | 85.93
3               | 7 (4.pgm - 10.pgm) | 33            | 88.21
4               | 6 (5.pgm - 10.pgm) | -             | 95.42
5 (BEST)        | 5 (6.pgm - 10.pgm) | 1             | 99.50
6               | 4 (7.pgm - 10.pgm) | -             | 99.50

Figure 2. Sample ORL training face images.

Figure 3. Sample ORL test face images. The face image with an 'X' mark is the only face misrecognised for the best case.

Figure 4. Sample SPANN training face images.

Figure 1. Sampling of the face image.
(X,Y) is the size of the sampling window and (x,y) is the overlap allowed.

Figure 5. Sample SPANN test face images. The face image with the 'X' mark is the only misrecognised face for the best case.

Figure 6. Plot of the number of HMM states vs. recognition timings on the ORL face database with 5 training and 5 test samples per subject. The timings are obtained on a 200 MHz Pentium in a multi-user environment.

Figure 7. Plot depicting the dependence of the % recognition on the number of HMM states chosen. The results are obtained on the ORL face database with 5 training and 5 test images per subject.