ISSN: 2319-7080
International Journal of Computer Science and Communication Engineering, Volume 2, Issue 3 (August 2013)

Uniform Local Active Forces: A Novel Gray-Scale Invariant Local Feature Representation for Facial Expression Recognition

Mohammad Shahidul Islam
Assistant Professor, Department of Computer Science and Engineering, Atish Dipankar University of Science & Technology, Dhaka, Bangladesh.
suva93@gmail.com

Abstract—Facial expression recognition is a dynamic topic in multiple disciplines: psychology, behavioral science, sociology, medical science, computer science and others. Local feature representations such as the Local Binary Pattern (LBP) are widely adopted for facial expression recognition because of their simplicity and high efficiency, but the long histogram produced by LBP is not suitable for large-scale face databases. Considering this problem, a simple gray-scale invariant local feature descriptor is proposed for facial expression recognition. The local feature of a pixel is computed from the magnitudes of the gray-level differences between the pixel and its neighboring pixels. A facial image is divided into 9x9 = 81 blocks, and the histograms of local features of all 81 blocks are concatenated to serve as a distinctive local descriptor of the facial image. The Extended Cohn-Kanade (CK+) dataset is used to evaluate the efficiency of the proposed method, which is compared with other appearance-based feature representations for static images in terms of classification accuracy using a multiclass SVM. Experimental results show that the proposed feature representation, combined with a Support Vector Machine, is effective for facial expression recognition.

Keywords—Facial Expression Recognition, Facial Image Representation, Feature Descriptor, Pattern Recognition, CK+, JAFFE, Image Processing, Computer Vision

I. INTRODUCTION

Facial expression is very important for daily activities, allowing people to express feelings beyond the verbal world and to understand each other in various situations [15]. Some expressions instigate human actions, and others enrich the meaning of human communication. Mehrabian [15] observed that the verbal part of human communication contributes only 7% to the meaning of a message, the vocal part contributes 38%, and facial movement and expression contribute 55%. This means that the face makes the major contribution in human communication. Therefore, automatic facial expression recognition has become a challenging problem in computer vision, and due to its potentially important applications in man-machine interaction it has attracted much attention from researchers in the past few years [29].

A. Motivation

A survey of prior work reveals that most facial expression recognition systems (FERS) are based on the Facial Action Coding System (FACS) [24], [25], [21], which involves considerable complexity in facial feature detection and extraction. Geometric shape-based models have problems with in-plane face transformation [26], [20]. The Gabor wavelet [31] is widely used in this field, but its feature vector is huge and computationally expensive. Another appearance-based [23] method, the Local Binary Pattern (LBP) [17], which has been adopted by many researchers, also has disadvantages.
(1) It produces long histograms, which slows down recognition; and (2) under certain circumstances it misses the local feature, because it does not consider the effect of the center pixel [10]. LBPRIU2 addresses the first disadvantage by making the histogram small, but as a penalty it drops 197 of the 256 patterns, which may contain valuable local structure. Aiming at these problems, a new methodology named LAFU (Uniform Local Active Forces) is proposed. LAFU is a local feature descriptor for static images that not only overcomes the above disadvantages but also has the following advantages:
- simpler computation than existing methods;
- robust feature extraction;
- easy to understand and a good basis for further research;
- low memory cost due to its tiny feature vector, so integration with devices such as CCTV and still cameras is easy and cost effective.

B. Related Works

There are seven basic types of facial expressions: contempt, fear, sadness, disgust, anger, surprise and happiness [21]. Most research on facial expression recognition (FER) has been done on still images. The psychological experiments by Bassili [5] concluded that facial expressions are recognized more precisely from video than from a single still image. Yang et al. [34] used the local binary pattern and local phase quantization together to build a facial expression recognition system. Kabir et al. [35] proposed LDPv (Local Directional Pattern - Variance), which weights the feature vector using local variance, and found it to be effective for facial expression recognition; a support vector machine was used as the classifier. Xiaohua et al. [36] used LBP-TOP (LBP on three orthogonal planes) on the eyes, nose and mouth, and developed a new method that can learn weights for multiple feature sets in facial components. Tian et al. [24] proposed a multi-state face component model of AUs with a neural network for classification; they developed an automatic system to analyze subtle changes in facial expressions based on both permanent facial features (brows, eyes, mouth) and transient facial features (deepening of facial furrows) in nearly frontal image sequences. Cohen et al. [9] employed Naive-Bayes classifiers and hidden Markov models (HMMs) together to recognize human facial expressions from video sequences; they introduced and tested different Bayesian network classifiers for classifying expressions from video, focusing on changes in distribution assumptions and feature dependency structures. Zhang et al. [30] used an IR-illumination camera for facial feature detection and tracking and recognized facial expressions using Dynamic Bayesian Networks (DBNs), characterizing expressions by detecting 26 facial features around the regions of the eyes, nose and mouth. Anderson and McOwan [2] proposed a fully automated, multistage system for real-time recognition of facial expression; they used a multichannel gradient model (MCGM) to determine facial optical flow, and the resulting motion signatures are classified using Support Vector Machines. Yeasin et al. [28] presented a spatio-temporal approach for recognizing the six universal facial expressions from visual data and using them to compute levels of interest, employing discrete hidden Markov models (DHMMs) to recognize the expressions.
Pantic and Patras [20] presented a method that can handle a large range of human facial behavior by recognizing the facial muscle actions that produce expressions. They applied face-profile contour tracking and rule-based reasoning to recognize 20 AUs occurring alone or in combination in nearly left-profile-view face image sequences, and they achieved an 84.9% recognition rate. Kotsia et al. [13] manually placed Candide grid nodes on face landmarks to create a facial wireframe model for facial expressions and used a Support Vector Machine (SVM) for classification.

Besides facial representations using AUs, some local feature representations have also been proposed; local features are much easier to extract than AUs. Ahonen et al. [1] proposed a facial representation strategy for still images based on the Local Binary Pattern (LBP). In this method, the LBP value at the referenced center pixel of a 3x3-pixel area is computed from the gray-level intensities of the pixel and its neighbors as follows:

LBP = \sum_{i=1}^{P} 2^{i-1} f(g(i) - c)    (1)

f(x) = 1 if x >= 0; 0 if x < 0    (2)

where c denotes the gray-level intensity of the center pixel, g(i) is the gray-level intensity of its i-th neighbor, and P is the number of neighbors, i.e. 8. Figure 1 shows an example of obtaining the LBP value of the referenced center pixel for a given 3x3-pixel area. An extension of the original LBP operator, called LBPRIU2, was proposed in [18]. It reduces the length of the feature vector and implements a simple rotation-invariant descriptor. An LBP pattern is called uniform if the binary pattern contains at most two bitwise transitions from 0 to 1 or vice versa. For example, the patterns 00000000 (0 transitions), 01110000 (2 transitions) and 11001111 (2 transitions) are uniform, whereas the patterns 11001001 (4 transitions) and 01010010 (6 transitions) are not. Uniform patterns occur more commonly in image textures than non-uniform ones; therefore, the latter are neglected. The uniform patterns yield only 59 distinct codes. To create a rotation-invariant LBP descriptor, a uniform pattern is rotated clockwise up to P-1 times (P = number of bits in the pattern); if the result matches another pattern, the two are taken as a single pattern. Hence, instead of 59 bins, only 8 bins are needed to construct the histogram representing the local feature of a given block. Once the LBP local features of all blocks of a face are extracted, they are concatenated into an enhanced feature vector. This method has proven very successful; it has been adopted by many researchers and used for facial expression recognition [32], [33]. Although LBP features achieve high accuracy rates for facial expression recognition, LBP extraction can be time consuming (a minimal code sketch of the LBP computation is given at the end of this section for reference).

C. Contributions in This Paper

The proposed feature representation method, LAFU, can capture more texture information from a local 3x3-pixel area. The computed feature holds information about the local differences (the differences in gray-level intensity between the referenced center pixel and its surrounding pixels) along with four possible accumulative gradient directions, while keeping the feature vector length at only 8 for each of the 9x9 = 81 blocks. It is robust to monotonic gray-scale changes caused, for example, by illumination variations. Due to its tiny feature vector length, it is appropriate for large-scale facial datasets. Unlike LBP, LAFU never neglects the center pixel during feature extraction.
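For reference, the following is a minimal sketch (not taken from the original paper) of the basic LBP code of Eqs. (1)-(2) and of the bitwise-transition test that defines uniform patterns. The neighbor ordering and the helper names are illustrative assumptions.

```python
import numpy as np

def lbp_value(block):
    """Basic LBP code for the center pixel of a 3x3 gray-scale block (Eqs. 1-2).

    The clockwise neighbor ordering used here is an illustrative assumption;
    any fixed ordering yields a valid LBP code.
    """
    c = block[1, 1]
    neighbors = [block[0, 0], block[0, 1], block[0, 2], block[1, 2],
                 block[2, 2], block[2, 1], block[2, 0], block[1, 0]]
    return sum(2 ** i for i, g in enumerate(neighbors) if g >= c)

def transitions(code, bits=8):
    """Number of 0/1 transitions in the circular binary pattern (uniformity test)."""
    pattern = [(code >> i) & 1 for i in range(bits)]
    return sum(pattern[i] != pattern[(i + 1) % bits] for i in range(bits))

block = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
code = lbp_value(block)
print(code, "uniform" if transitions(code) <= 2 else "non-uniform")
```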
In comparison with LBP [17] and LBPRIU2 [18], the proposed method therefore performs better in both time and classification accuracy; see Section IV, Experimental Results and Analysis.

The rest of the paper is organized as follows: the proposed local feature representation method and the system framework are explained in Section II, data collection and the experimental setup in Section III, results and analysis in Section IV, and the conclusion in Section V.

Figure 1: Example of obtaining LBP for a 3x3 block.

II. PROPOSED FEATURE REPRESENTATION METHOD

The proposed feature representation method is based on the facial texture information of a local 3x3-pixel area and operates on gray-scale facial images. It is designed to capture, for each pixel, the local pattern of gray-level intensities of its neighboring pixels with respect to the gray-level intensity of the pixel itself, along with four gradient directions. These patterns represent distinctive features for a particular pixel. A histogram of all possible patterns is constructed for each of the 9x9 = 81 blocks, and these histograms are then concatenated to form the local feature representation of the input image. The representation method is as follows.

D. Gray-Scale Invariant Local Active Forces (LAF)

The local pattern of a pixel is computed from its 3x3-pixel area; see Figure 2.

Figure 2: A 3x3-pixel area with pixels labeled A-I and center pixel E.

The pattern is derived as follows:

p = (f - e) - (d - e)    (3)
q = (c - e) - (g - e)    (4)
r = (b - e) - (h - e)    (5)
s = (a - e) - (i - e)    (6)

where a, b, c, d, e, f, g, h and i are the gray-level intensities of pixels A, B, C, D, E, F, G, H and I, respectively. The value p represents the winning force between two opposite forces, i.e. the local differences from E to F and from E to D, and P corresponds to the direction of the winning force:

P = 1 if p >= 0; 0 if p < 0    (7)

Q, R and S are defined and derived from q, r and s in the same way as P. Then

'E' = (P)(Q)(R)(S)

where 'E' is the local pattern at the central pixel E. Thus 'E' contains 4 bits and can therefore take only 2^4 = 16 different patterns. A detailed example of local pattern extraction at a pixel E is given in Figure 4. The feature vector length of each block is therefore 16 for LAF, and the feature dimension for all 81 blocks is 81 x 16 = 1296.

Figure 4: Example of local pattern extraction at a pixel E using LAFU: (a) gray-scale image, (b) local 3x3-pixel area, (c)-(d) local differences, (e)-(h) four possible forces through the center, (i)-(l) directions of the forces, (P), (Q), (R), (S) directions of the winning forces, (m) final pattern for the referenced pixel of gray-level intensity 90.

The standard Local Binary Pattern [17] (LBP) encodes the relationship between the referenced pixel and its surrounding neighbors by computing gray-level intensity differences. The proposed method instead encodes the relationship between the referenced pixel and its neighbors based on the four combined gradient directions through the center, which are calculated from the local magnitudes of the gray-level intensity differences between the referenced pixel and its surrounding pixels in the vertical, horizontal and two diagonal directions; see (7). Thus, LAFU holds more information than LBP and LBPRIU2 in the sense that (1) LAFU preserves all four possible accumulative gradient directions through the center, (2) unlike LBP or LBPRIU2, LAFU does not neglect the gray-level intensity of the center pixel, and (3) LAFU does not neglect the magnitude of the intensity difference between two pixels as other methods do.

LAFU is gray-scale invariant because monotonic gray-scale changes caused, for example, by illumination variations do not affect the local pattern; see Figure 3. This is because the local patterns are obtained from the magnitudes of local differences, as shown in Figure 4, and these local differences are independent of the absolute gray-level intensities within a 3x3-pixel area. Figure 3 illustrates an example: under all three lighting conditions, LAFU sustains the same pattern, 1010.

Figure 3: Example of LAFU being gray-scale invariant; the same 3x3-pixel area in low, normal and high light produces the same local pattern.
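To make the derivation above concrete, here is a minimal sketch (not taken from the paper's implementation) of the 4-bit LAF pattern of Eqs. (3)-(7). The grid layout assumed for the labels A-I (A, B, C across the top row) is an illustrative assumption consistent with the pairs of opposite neighbors used in Eqs. (3)-(6), and the example pixel values are arbitrary.

```python
import numpy as np

def laf_pattern(block):
    """4-bit LAF pattern for the center pixel E of a 3x3 block (Eqs. 3-7).

    Assumed block layout:
        A B C
        D E F
        G H I
    so that (F, D), (C, G), (B, H) and (A, I) are pairs of opposite
    neighbors through the center E.
    """
    a, b, c = block[0]
    d, e, f = block[1]
    g, h, i = block[2]
    # Winning forces between opposite local differences (Eqs. 3-6).
    p = (f - e) - (d - e)
    q = (c - e) - (g - e)
    r = (b - e) - (h - e)
    s = (a - e) - (i - e)
    # Eq. (7): each direction contributes one bit of the pattern 'E' = (P)(Q)(R)(S).
    return ''.join('1' if v >= 0 else '0' for v in (p, q, r, s))

block = np.array([[62, 55, 70],
                  [58, 60, 61],
                  [57, 66, 54]], dtype=int)
print(laf_pattern(block))        # pattern for the original block
print(laf_pattern(block + 40))   # unchanged when the same offset is added to every pixel
```

Because every term in Eqs. (3)-(6) is a difference of pixel intensities, adding the same offset to all nine pixels leaves the pattern unchanged, which is the kind of invariance illustrated in Figure 3.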
E. Uniform LAF (LAFU)

A feature vector should contain all the essential information needed by the classifier. Unnecessary information increases classification time and can reduce accuracy by a significant amount; on the other hand, with inadequate features even a good classifier may fail. Hence, dimensionality reduction (DR) methods are suggested as a preprocessing step to address the curse of dimensionality [37]. DR techniques try to project the data into a low-dimensional feature space: for a given p-dimensional random vector X = (x1, x2, ..., xp), DR tries to represent it with a q-dimensional vector Z = (z1, z2, ..., zq) such that q < p, while maintaining the characteristics of the original data as much as possible. DR can be done in two ways: (1) by transforming the existing features into a new, reduced set of features, or (2) by selecting a subset of the existing features.

To overcome this dimensionality problem, a new technique is introduced that selects a subset of the existing features based on variance, term frequency and the bitwise transitions of the patterns. After feature extraction using LAF, a feature matrix of dimension m-by-n is obtained, where m is the number of subjects and n is the feature dimension (for LAF, n = 1296). The column-wise variance of that matrix is computed, which results in a 1-by-n array named VM:

VM = [VAR(1) VAR(2) ... VAR(n)]

VAR(n) = (1/m) \sum_{i=1}^{m} (P_i^n - \mu(n))^2,  where  \mu(n) = (1/m) \sum_{i=1}^{m} P_i^n    (8)

where P_i^n is the n-th feature of the i-th image and m is the number of samples in the feature matrix. Clearly, the smaller the variance of a pattern, the smaller its contribution. VM is then sorted in descending order so that the most informative patterns appear first, and the index of the sorted matrix is used to select the actual variables from the feature matrix as SVM input; see Figure 5. Experiments are conducted using the 100, 200, ..., up to 1200 features with the highest variances.

Figure 5: Classification accuracy vs. number of features selected using variance (top-variance features, from 100 to 1200); the red marked line is the best selection.
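A minimal sketch of this variance-based ranking (Eq. 8) is given below, assuming the LAF features are held in an m-by-n NumPy matrix; the function and variable names are illustrative, not from the original implementation.

```python
import numpy as np

def rank_features_by_variance(features):
    """Rank feature columns by variance (Eq. 8), highest variance first.

    features: m-by-n matrix, one row per image, one column per pattern bin.
    Returns the column indices sorted by descending variance, i.e. the
    index used to pick the top-k columns as SVM input.
    """
    vm = features.var(axis=0)      # VM = [VAR(1) ... VAR(n)], population variance as in Eq. (8)
    return np.argsort(vm)[::-1]    # descending order

# Example: keep the 648 highest-variance columns of a 326 x 1296 LAF matrix.
rng = np.random.default_rng(0)
features = rng.integers(0, 50, size=(326, 1296)).astype(float)
order = rank_features_by_variance(features)
selected = features[:, order[:648]]
print(selected.shape)              # (326, 648)
```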
In a similar way, the average of each column of the feature matrix is calculated, which results in a 1-by-n matrix named the TF-matrix (Term Frequency matrix):

TFM = [AVG(1) AVG(2) ... AVG(n)]

AVG(n) = (1/m) \sum_{i=1}^{m} P_i^n    (9)

where m is the number of samples in the feature matrix. TFM is then sorted in descending order, so that it starts with the patterns that appear most frequently and therefore contribute most to the classification. The index of this matrix is used to select the actual patterns from the feature matrix. Experiments are conducted using the topmost 100, 200, ..., up to 1200 most frequently occurring patterns; see Figure 6.

Figure 6: Classification accuracy vs. number of features selected using frequency of occurrence (from 100 to 1200); the red marked line is the best selection.

In both cases, the highest accuracy is obtained when the number of selected features lies between 600 and 700 out of 1296. A deeper investigation of the patterns selected by both the variance matrix (VM) and the term frequency matrix (TFM) shows that the patterns having at most one bitwise transition (from 1 to 0 or from 0 to 1) are chiefly responsible for expression differentiation. Based on this extensive experimental evaluation, feature selection can therefore be simplified by keeping only the low-transition patterns (here, those with at most one transition); the resulting descriptor is named Uniform Local Active Forces (LAFU), see Table 1.

Table 1: Patterns finally selected by LAFU.
Patterns selected by LAFU: 0000, 0001, 0011, 0111, 1000, 1100, 1110 and 1111
Patterns dropped by LAFU: 0010, 0100, 0101, 0110, 1001, 1010, 1011 and 1101
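The transition rule that defines LAFU can be checked with a short sketch: counting adjacent bit transitions along each of the sixteen 4-bit LAF patterns (non-circular counting, which is the reading consistent with Table 1) and keeping the patterns with at most one transition reproduces both columns of Table 1. The helper names below are illustrative.

```python
def bit_transitions(pattern):
    """Number of adjacent 0/1 transitions in a 4-bit pattern string, e.g. '0110' -> 2."""
    return sum(pattern[i] != pattern[i + 1] for i in range(len(pattern) - 1))

all_patterns = [format(v, '04b') for v in range(16)]
selected = [p for p in all_patterns if bit_transitions(p) <= 1]   # LAFU bins
dropped = [p for p in all_patterns if bit_transitions(p) > 1]

print(sorted(selected))   # ['0000', '0001', '0011', '0111', '1000', '1100', '1110', '1111']
print(sorted(dropped))    # ['0010', '0100', '0101', '0110', '1001', '1010', '1011', '1101']
```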
F. Proposed Framework for Facial Expression Recognition

Figure 7 shows the proposed framework for the facial expression recognition system. The framework consists of the following steps:

a. For each training image, convert it to gray scale if it is in a different format.
b. Detect the face in the image, resize it to 180x180 pixels using bilinear interpolation, and divide it into 81 equally sized blocks.
c. Compute the LAFU value for each pixel of the image.
d. Construct a histogram for each of the 9x9 = 81 blocks.
e. Concatenate the histograms to obtain the 648-dimensional feature vector of the image.
f. Build a multiclass Support Vector Machine for facial expression recognition using the feature vectors of the training images.
g. Perform steps a-e for each testing image and use the multiclass Support Vector Machine from step f to identify the facial expression of the given testing image.

Figure 7: Proposed system framework (training and testing images pass through face detection, face preprocessing and feature extraction; a multiclass LIBSVM performs the expression recognition).

III. EXPERIMENTS

The Extended Cohn-Kanade dataset (CK+) [14] is used in the experiments to evaluate the effectiveness of the proposed method. It contains 326 peak facial expressions of 123 subjects in seven emotion categories: 'Anger', 'Contempt', 'Disgust', 'Fear', 'Happy', 'Sadness' and 'Surprise'. Figure 10 shows examples of a posed and an unposed facial image, and Figure 8 shows the number of instances of each expression in the dataset: Angry (45), Contempt (18), Disgust (59), Fear (25), Happy (69), Sad (28) and Surprise (82). No subject is collected more than once with the same emotion, and all the facial images in the dataset are posed. Figure 9 shows some example facial images from the dataset.

Figure 8: CK+ dataset: the seven expressions and the number of instances of each.

Figure 9: Cohn-Kanade (CK+) sample images.

The steps of face detection, preprocessing and feature extraction are illustrated in Figure 11. Face detection is done using the fdlibmex library for Matlab, which consists of a single mex file with a single function that takes an image as input and returns the frontal face at 180x180 resolution. The face is then masked with an elliptical shape, see Figure 11. According to past research on facial expression recognition, higher accuracy can be achieved if the input face is divided into several blocks [17], [18]. In the experiments, the 180x180 face is divided equally into 9x9 = 81 blocks, see Figure 12. Features are extracted from each block using the proposed method as well as the LBPRIU2 method, and gathering the feature histograms of all the blocks produces a unique 648-dimensional feature vector for a given image.

Figure 10: (a) Unposed image, (b) posed image.

Figure 11: Facial feature extraction: face detection, elliptical masking and block division.

Figure 12: Dividing the facial image into sub-images (blocks).
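The feature extraction pipeline just described (block division, an 8-bin histogram per block, concatenation into a 648-dimensional vector) can be sketched as follows. This is a hedged illustration rather than the author's code: it reuses the grid-layout assumption from the earlier sketch, skips the border pixels of each block, and simply does not count the eight dropped patterns; none of these details is spelled out explicitly in the paper.

```python
import numpy as np

# The eight 4-bit patterns kept by LAFU (Table 1); the remaining patterns are
# not counted here, which is one plausible reading of the selection above.
LAFU_BINS = {p: i for i, p in enumerate(
    ['0000', '0001', '0011', '0111', '1000', '1100', '1110', '1111'])}

def lafu_pattern(block):
    """4-bit LAF pattern for the center pixel of a 3x3 block (same layout assumption as before)."""
    a, b, c = block[0]
    d, e, f = block[1]
    g, h, i = block[2]
    forces = [(f - e) - (d - e), (c - e) - (g - e), (b - e) - (h - e), (a - e) - (i - e)]
    return ''.join('1' if v >= 0 else '0' for v in forces)

def lafu_descriptor(face, grid=(9, 9)):
    """648-dimensional LAFU descriptor: an 8-bin histogram per block, concatenated."""
    rows, cols = grid
    bh, bw = face.shape[0] // rows, face.shape[1] // cols
    histograms = []
    for r in range(rows):
        for c in range(cols):
            block = face[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            hist = np.zeros(len(LAFU_BINS))
            for y in range(1, block.shape[0] - 1):
                for x in range(1, block.shape[1] - 1):
                    pattern = lafu_pattern(block[y - 1:y + 2, x - 1:x + 2])
                    if pattern in LAFU_BINS:
                        hist[LAFU_BINS[pattern]] += 1
            histograms.append(hist)
    return np.concatenate(histograms)

face = np.random.default_rng(0).integers(0, 256, size=(180, 180)).astype(int)
print(lafu_descriptor(face).shape)   # (648,)
```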
A ten-fold non-overlapping cross validation is used in the experiments. Using LIBSVM [7], a multiclass support vector machine is constructed with a randomly chosen 90% of the 326 images (90% from each expression) as training images; the remaining 10% of the images are used for testing. There is no overlap between the folds, and the evaluation is person-dependent. Ten rounds of training and testing are conducted, and the average confusion matrix for the proposed method is reported and compared against the other methods. The kernel parameters for the classifier are set to s = 0 (SVM type C-SVC), t = 1 (polynomial kernel function), c = 1 (SVM cost), g = 0.0025 (the value of 1 / feature vector dimension) and b = 1 (probability estimation). Other kernels and parameter settings were also tried, but this setting is found to be the most suitable for the CK+ dataset with its seven classes of facial expressions.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

Several experiments are conducted to find a suitable face dimension. Table 3 shows the classification accuracy obtained with different face dimensions; 180x180 pixels is found to be the best, giving a classification accuracy of 92.1%. The face area is then masked to remove non-face regions (hair, ears, the sides of the neck, etc.) as much as possible and divided into blocks of different sizes.

Table 2: Number of blocks vs. classification accuracy vs. feature vector length (feature vector length = number of blocks x number of bins, here 8); the face dimension is 180x180 pixels in all cases.
Blocks | Block dimension (pixels) | Classification accuracy (%) | Feature vector length
6x6 | 30x30 | 88.70 | 288
9x9 | 20x20 | 92.10 | 648
10x10 | 18x18 | 89.26 | 800
12x12 | 15x15 | 89.21 | 1152
15x15 | 12x12 | 86.75 | 1800
18x18 | 10x10 | 86.51 | 2592
36x36 | 5x5 | 86.07 | 10368

Table 3: Face dimension vs. classification accuracy (block dimension fixed at 20x20 pixels).
Face dimension (pixels) | Blocks | Block dimension (pixels) | Classification accuracy (%)
240x240 | 12x12 | 20x20 | 88.23
220x220 | 11x11 | 20x20 | 88.51
200x200 | 10x10 | 20x20 | 88.62
180x180 | 9x9 | 20x20 | 92.10
160x160 | 8x8 | 20x20 | 88.90
140x140 | 7x7 | 20x20 | 84.41

Several experiments are also conducted to find a suitable number of blocks. According to Table 2, the number of blocks chosen for further experiments is 9x9 = 81; 9x8, 8x9, 6x9, 9x6, 8x10, 10x8 and many other configurations were also tried, but 9x9 remained the best in terms of accuracy. The proposed LAFU method has 8 possible bins (see Table 1), so each of the 81 blocks has a feature vector of length eight, and the LAFU feature vector in use is therefore of dimension 8 x 9 x 9 = 648.

To verify that LAFU is gray-scale invariant, the gray-level intensities of random instances from the 326 images are manually changed; some samples are shown in Figure 13. The result remains consistent even after changing the illumination, which demonstrates that the proposed method is gray-scale invariant.

Figure 13: Samples from the CK+ dataset under different lighting conditions (different gray-scale intensities): (a) high light, (b) natural light, (c) low light.

Table 4 compares the feature vector length of LAFU with some popular methods; the third column shows the feature vector dimension for the whole facial image.

Table 4: Comparison of the feature vector dimension of LAFU with other methods.
Method | Feature vector length | Feature vector dimension used in our experiment
LPQ (Local Phase Quantization) [19] | 256 | 256x9x9 = 20736
LBP (general Local Binary Pattern) [17] | 256 | 256x9x9 = 20736
LBPu2 (uniform Local Binary Pattern) [18] | 59 | 59x9x9 = 4779
LBPRI (rotation-invariant Local Binary Pattern) [18] | 36 | 36x9x9 = 2916
LBPRIU2 (rotation-invariant uniform Local Binary Pattern) [18] | 8 | 8x9x9 = 648
LAFU (proposed method) | 8 | 8x9x9 = 648

All the methods in Table 4 are run with the same experimental setup on the same machine, and their results are compared with the proposed method in Figure 14. From the figure it is clear that LAFU performs better in terms of both time and classification accuracy. The facial expression recognition results obtained with the proposed feature representation are shown as a confusion matrix in Table 5.

Table 5: Confusion matrix for LAFU prediction (10-fold validation; feature extraction time for 326 images = 38 seconds; average classification accuracy = 92.1%; LIBSVM kernel parameters: -s 0 -t 1 -c 100 -g 0.0015 -b 1). Rows are actual classes, columns are predicted classes (%).
Actual \ Predicted | Angry | Contempt | Disgust | Fear | Happy | Sad | Surprise
Angry | 86.67 | 2.222 | 4.444 | 0 | 0 | 6.667 | 0
Contempt | 11.11 | 77.78 | 0 | 5.556 | 0 | 0 | 5.556
Disgust | 3.39 | 0 | 96.61 | 0 | 0 | 0 | 0
Fear | 4 | 4 | 4 | 76 | 8 | 0 | 4
Happy | 0 | 0 | 0 | 0 | 100 | 0 | 0
Sad | 10.71 | 0 | 3.571 | 3.571 | 0 | 75 | 7.143
Surprise | 0 | 1.22 | 0 | 1.22 | 0 | 0 | 97.56

Figure 14: Comparison of the proposed method LAFU with some other popular methods in terms of (a) classification accuracy, (b) feature extraction time for the 326 CK+ images in seconds, (c) 10-fold cross validation time in seconds (person-dependent; folds are non-overlapping and each fold uses 90% of each expression class for training the SVM and 10% for testing), and (d) total experiment time for each method in seconds.
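The classification stage described at the start of this section (ten-fold cross validation with a polynomial-kernel SVM) can be sketched with scikit-learn as a stand-in for LIBSVM. The random arrays below are placeholders for the real LAFU descriptors and labels, and the polynomial-kernel degree and coef0 are assumed to be LIBSVM's defaults.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Illustrative stand-ins for the real data: 326 LAFU descriptors (648-D) and
# seven expression labels.  Replace with the descriptors extracted above.
rng = np.random.default_rng(0)
X = rng.random((326, 648))
y = rng.integers(0, 7, size=326)

# Rough scikit-learn equivalent of the reported LIBSVM settings
# (-s 0: C-SVC, -t 1: polynomial kernel, -c: cost, -g: gamma, -b 1: probabilities).
clf = SVC(kernel='poly', C=1, gamma=0.0025, degree=3, coef0=0, probability=True)

# Stratified folds approximate "90% from each expression for training, 10% for testing".
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)
print('mean 10-fold accuracy: %.3f' % scores.mean())
```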
It should be noted that the results are not directly comparable, owing to differing experimental setups, different versions of the CK dataset with different emotion labels, different preprocessing methods, different numbers of sequences used, and so on; they nevertheless indicate the discriminative power of each approach. Table 6 compares the proposed method with other static-analysis methods in terms of the number of subjects, the number of sequences, the classifier and the evaluation measure, giving the overall classification accuracy obtained on the CK [12] or CK+ [14] facial expression database.

Table 6: Comparison with different approaches; number of expression classes = 7, non-dynamic. (PDM: Point Distribution Model; EAR-LBP: Extended Asymmetric Region Local Binary Pattern; LPQ: Local Phase Quantization; LBPu2: uniform Local Binary Pattern; Poly.: polynomial; RBF: radial basis function.)
Authors | Method | No. of subjects | No. of sequences | Classifier | Measure | Classification accuracy (%)
[8] | PDM | 123 | 327 | SVM (RBF) | 2-fold cross | 80+
[16] | EAR-LBP | 123 | 327 | Multiclass SVM (RBF) | 10-fold cross | 82
[27] | LBP + LQP | 123 | 316 | Linear SVM | leave-one-subject-out cross-validation | 83
[4] | Gabor representation | 90 | 313 | SVM + AdaBoost | 10-fold cross | 87
[11] | 2D landmarks (manifold [6]) | 123 | 593 | SVM | - | 87
[22] | Robust LBPu2 | 96 | 320 | SVM (polynomial) | 10-fold cross | 88
Proposed | LAFU | 123 | 326 | Multiclass SVM (poly.) | 10-fold cross | 92

Jeni et al. [11] used 593 CLM-tracked posed sequences from the Extended CK+, and Shan et al. [22] used LBPu2 (uniform Local Binary Pattern) with an SVM, but LAFU clearly outperforms all of them in almost all cases, see Figure 11. LAFU takes 0.024 seconds to extract features from a face of 180x180 resolution; this cannot be compared directly with the other approaches, since feature extraction time is not reported in any of their papers. The authors of [14], [8], [16], [4] and [11] all used some type of face localization. According to those authors, if faces are not aligned properly, some crucial features at the border area may be missed during feature extraction from a block or from the whole face. It is clearly mentioned in [11] and [23] that aligned faces give an extra 5-10% increase in classification accuracy, that leave-one-out evaluation increases the accuracy by 1-2%, and that AdaBoost increases it by a further 1-2% on the CK+ dataset [4]. Overall, then, an extra 7-12% accuracy can be obtained by using proper alignment, increasing the training data and adding a boosting algorithm to the classifier. LAFU is better than the best available AAM result that uses texture plus shape information [14], the CLM result that uses only texture information [8], and the best CLM result that uses shape information [11]; see Table 7. This also shows that the proposed texture-based expression recognition is better than previous texture-based methods and compares favorably with state-of-the-art procedures that use shape, or shape and texture combined. Note that the dataset has only one peak expression per subject, and some subjects do not contain all seven expressions.

Table 7: Comparison of the results of different methods on the seven main facial expressions of the CK+ dataset. S: shape-based method; T: texture-based method; T + S: both shape- and texture-based. (CLM: Constrained Local Model; AAM: Active Appearance Model; PMS: personal mean shape; An. = Anger, Co. = Contempt, Di. = Disgust, Fe. = Fear, Ha. = Happy, Sa. = Sad, Su. = Surprise, Avg. = Average.)
Authors | Method | T/S | An. | Co. | Di. | Fe. | Ha. | Sa. | Su. | Avg.
[14] | AAM + SVM | S | 35 | 25 | 68 | 21 | 98 | 4 | 100 | 50
[14] | AAM + SVM | T | 70.0 | 21.9 | 94.7 | 21.7 | 100.0 | 60.0 | 98.7 | 66.7
[14] | AAM + SVM | T + S | 75.0 | 84.4 | 94.7 | 65.2 | 100.0 | 68.0 | 96.0 | 83.3
[8] | CLM + SVM | T | 70.1 | 52.4 | 92.5 | 72.1 | 94.2 | 45.9 | 93.6 | 74.4
[11] | CLM + SVM (AU0 norm.) | S | 73.3 | 72.2 | 89.8 | 68.0 | 95.7 | 50.0 | 94.0 | 77.6
[11] | CLM + SVM (PMS) | S | 77.8 | 94.4 | 91.5 | 80.0 | 98.6 | 67.9 | 97.6 | 86.8
Proposed | No face alignment + SVM (LAFU) | T | 86.67 | 77.78 | 96.61 | 76 | 100 | 75 | 97.56 | 87.1

Chew et al. [8] experimented on the Extended Cohn-Kanade (CK+) dataset using CLM-tracked CAPP (canonical appearance) and SAPP (similarity shape) features; although the CLM (Constrained Local Model) is a very powerful face alignment algorithm, they achieved just above 80% accuracy. In [16], Naika et al. proposed an Extended Asymmetric Region Local Binary Pattern (EAR-LBP) operator together with automatic face localization heuristics to mitigate localization errors in automatic facial expression recognition; even with automatic face localization they achieved 83% accuracy on the CK+ dataset. Yang et al. [27] used SIFT-tracked CK+ faces, and Bartlett et al. [4] used both manually and automatically tracked faces along with AdaSVM (AdaBoost + SVM).

Figure 15 compares the number of training instances and the achieved accuracy for each expression type using LAFU. The numbers of 'Sad', 'Contempt' and 'Fear' instances are small compared with the other expression classes; for this reason the classification accuracy is lower for these classes, which affects the overall accuracy. Hence, it can be concluded that the number of training instances of an expression does affect the accuracy achieved for that expression. Another reason for the lower classification accuracy could be the dataset itself: some facial expressions are confusing even for human eyes. The accuracy could be improved by collecting more peak expression instances from the same subject.

Figure 15: Number of expression instances in the CK+ dataset vs. classification accuracy (%) for each facial expression when LAFU is used.

V. CONCLUSION

A novel gray-scale invariant facial feature extraction method is proposed for facial expression recognition. For each pixel in a gray-scale image, the method extracts the local pattern from the magnitudes of the gray-level differences between the referenced pixel and its neighbors along the four possible gradient directions through the center of a local 3x3-pixel area. The method is simple and effective for facial expression recognition, and it outperformed LPQ, LBP, LBPu2, LBPRI and LBPRIU2 in both classification accuracy and feature extraction time; the facial expression recognition experiments for all of these methods were conducted with the same experimental setup and on the same machine. Future work may include incorporating an AdaBoost or SimpleBoost algorithm into the facial expression classifiers, which may increase the accuracy rates substantially.

VI. REFERENCES

[1] Ahonen, T., Hadid, A., and Pietikainen, M. Face Description with Local Binary Patterns: Application to Face Recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, 28, 12 (2006), 2037-2041.
[2] Anderson, K. and McOwan, P. W. A Real-Time Automated System for the Recognition of Human Facial Expressions.
IEEE Transactions on Systems, Man, and Cybernetics, 36, 1 (2006), 96-105.
[3] Asthana, A., Saragih, J., Wagner, M., and Goecke, R. Evaluating AAM fitting methods for facial expression recognition. In 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (2009), 1-8.
[4] Bartlett, M.S., Littlewort, G., Fasel, I., and Movellan, J. R. Real Time Face Detection and Facial Expression Recognition: Development and Application to Human-Computer Interaction. In Proc. CVPR Workshop on Computer Vision and Pattern Recognition for Human-Computer Interaction (2003).
[5] Bassili, J.N. Emotion Recognition: The Role of Facial Movement and the Relative Importance of Upper and Lower Areas of the Face. J. Personality and Social Psychology, 37 (1979), 2049-2059.
[6] Chang, Y., Hu, C., Feris, R., and Turk, M. Manifold-based analysis of facial expression. Image and Vision Computing, 24, 6 (2006), 605-614.
[7] Chang, C.C. and Lin, C.J. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (2011).
[8] Chew, S.W., Lucey, P., Lucey, S., Saragih, J., Cohn, J.F., and Sridharan, S. Person-independent facial expression detection using Constrained Local Models. In 2011 IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011) (2011), 915-920.
[9] Cohen, I., Sebe, N., Garg, S., Chen, L. S., and Huang, T. S. Facial expression recognition from video sequences: temporal and static modelling. Computer Vision and Image Understanding, 91 (2003), 160-187.
[10] Fu, X. and Wei, W. Centralized Binary Patterns Embedded with Image Euclidean Distance for Facial Expression Recognition. In Fourth International Conference on Natural Computation (ICNC '08) (2008), 115-119.
[11] Jeni, L.A., Lőrincz, A., Nagy, T., Palotai, Z., Sebők, J., Szabó, Z., and Takács, D. 3D shape estimation in video sequences provides high precision evaluation of facial expressions. Image and Vision Computing, 30, 10 (October 2012), 785-795.
[12] Kanade, T., Cohn, J. F., and Tian, Y. Comprehensive database for facial expression analysis. In Fourth IEEE International Conference on Automatic Face and Gesture Recognition (2000).
[13] Kotsia, I. and Pitas, I. Facial Expression Recognition in Image Sequences Using Geometric Deformation Features and Support Vector Machines. IEEE Transactions on Image Processing, 16, 1 (Jan 2007).
[14] Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., and Matthews, I. The Extended Cohn-Kanade Dataset (CK+): A complete facial expression dataset for action unit and emotion-specified expression. In Third IEEE Workshop on CVPR for Human Communicative Behavior Analysis (CVPR4HB 2010) (2010).
[15] Mehrabian, A. Communication without words. Psychology Today, 2, 4 (1968), 53-56.
[16] Naika, C.L.S., Jha, S. Shekhar, Das, P. K., and Nair, S. B. Automatic Facial Expression Recognition Using Extended AR-LBP. Wireless Networks and Computational Intelligence, 292, 3 (2012), 244-252.
[17] Ojala, T., Pietikäinen, M., and Harwood, D. A Comparative Study of Texture Measures with Classification Based on Feature Distributions. Pattern Recognition, 19, 3 (1996), 51-59.
[18] Ojala, T., Pietikäinen, M., and Mäenpää, T. Multiresolution Gray-scale and Rotation Invariant Texture Classification with Local Binary Patterns. IEEE Trans.
Pattern Analysis and Machine Intelligence, 24, 7 (2002), 971-987.
[19] Ojansivu, V. and Heikkilä, J. Blur Insensitive Texture Classification Using Local Phase Quantization. In Proc. ICISP, Berlin, Germany (2008), 236-243.
[20] Pantic, M. and Patras, I. Dynamics of Facial Expression: Recognition of Facial Actions and Their Temporal Segments From Face Profile Image Sequences. IEEE Transactions on Systems, Man, and Cybernetics, 36, 2 (2006), 433-449.
[21] Ekman, P. Basic Emotions. In Handbook of Cognition and Emotion. John Wiley & Sons, 1999.
[22] Shan, C., Gong, S., and McOwan, P.W. Robust Facial Expression Recognition Using Local Binary Patterns. In Proc. IEEE Int'l Conf. Image Processing (2005), 370-373.
[23] Shan, C., Gong, S., and McOwan, P.W. Facial expression recognition based on Local Binary Patterns: A comprehensive study. Image and Vision Computing, 27, 6 (2009), 803-816.
[24] Tian, Y. L., Kanade, T., and Cohn, J. F. Recognizing action units for facial expression analysis. IEEE Trans. Pattern Analysis and Machine Intelligence, 23, 2 (2001), 97-115.
[25] Tong, Y., Liao, W., and Ji, Q. Facial Action Unit Recognition by Exploiting Their Dynamic and Semantic Relationships. IEEE Trans. Pattern Analysis and Machine Intelligence, 29, 10 (2007), 1-17.
[26] Valstar, M. F. and Pantic, M. Combined support vector machines and hidden Markov models for modeling facial action temporal dynamics. In Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition Workshop on Human-Computer Interaction (2007), 118-127.
[27] Yang, S. and Bhanu, B. Understanding Discrete Facial Expressions in Video Using an Emotion Avatar Image. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics (2012), 980-992.
[28] Yeasin, M., Bullot, B., and Sharma, R. Recognition of Facial Expressions and Measurement of Levels of Interest From Video. IEEE Trans. Multimedia, 8, 3 (2006), 500-508.
[29] Zeng, Z., Pantic, M., Roisman, G., and Huang, T. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 1 (2009), 39-58.
[30] Zhang, Y. and Ji, Q. Active and dynamic information fusion for facial expression understanding from image sequences. IEEE Trans. Pattern Analysis and Machine Intelligence, 27, 5 (2005), 699-714.
[31] Zhang, Z., Lyons, M., Schuster, M., and Akamatsu, S. Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron. In Third IEEE International Conference on Automatic Face and Gesture Recognition (1998), 454-459.
[32] Zhao, G. and Pietikainen, M. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 6 (2007), 915-928.
[33] Zhao, G. and Pietikainen, M. Facial Expression Recognition Using Constructive Feedforward Neural Networks. IEEE Transactions on Systems, Man, and Cybernetics, 34, 3 (2004), 1588-1595.
[34] Yang, S. and Bhanu, B. Understanding Discrete Facial Expressions in Video Using an Emotion Avatar Image. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 42, 4 (Aug 2012), 980-992.
[35] Kabir, H., Jabid, T., and Chae, O. Local Directional Pattern Variance (LDPv): A Robust Feature Descriptor for Facial Expression Recognition. The International Arab Journal of Information Technology, 9, 4 (2012).
[36] Xiaohua, H., Zhao, G., Pietikäinen, M., and Zheng, W. Expression Recognition in Videos Using a Weighted Component-Based Feature Descriptor. In Proc. SCIA 2011 (2011), 569-578.
[37] Kumar, A.
Analysis of Unsupervised Dimensionality Reduction Techniques. Computer Science and Information Systems, 6, 2 (2009), 217-227.

Mohammad Shahidul Islam (PhD) received his B.Tech. degree in Computer Science and Technology from the Indian Institute of Technology Roorkee (IIT Roorkee), Uttar Pradesh, India, in 2002, his M.Sc. degree in Computer Science from American World University, London Campus, U.K., in 2005, and his M.Sc. in Mobile Computing and Communication from the University of Greenwich, London, U.K., in 2008. He completed his Ph.D. in Computer Science & Information Systems at the National Institute of Development Administration (NIDA), Bangkok, Thailand, in 2013. His research interests include image processing, pattern recognition, wireless and mobile communication, satellite communication and computer networking.