Uniform Local Active Forces: A Novel Gray-Scale
Invariant Local Feature Representation for Facial
Expression Recognition
Mohammad Shahidul Islam
Assistant Professor, Department of Computer Science and Engineering, Atish Dipankar University of Science &
Technology, Dhaka, Bangladesh.
suva93@gmail.com
Abstract—Facial expression recognition is an active topic in multiple disciplines: psychology, behavioral science, sociology, medical science, computer science, and others. Local feature representations such as the Local Binary Pattern are widely adopted for facial expression recognition due to their simplicity and high efficiency. However, the long histogram produced by the Local Binary Pattern is not suitable for large-scale face databases. Considering this problem, a simple gray-scale invariant local feature descriptor is proposed for facial expression recognition. The local feature of a pixel is computed from the magnitudes of the gray color intensity differences between the pixel and its neighboring pixels. A facial image is divided into 9x9=81 blocks, and the histograms of local features for all 81 blocks are concatenated to serve as a distinctive local descriptor of the facial image. The Extended Cohn-Kanade dataset is used to evaluate the efficiency of the proposed method, and it is compared with other appearance-based feature representations for static images in terms of classification accuracy using a multiclass SVM. Experimental results show that the proposed feature representation along with a Support Vector Machine is effective for facial expression recognition.
Keywords—Facial Expression recognition, Facial Image
Representation, Feature Descriptor, Pattern Recognition,
CK+, JAFFE, Image Processing, Computer Vision
I. INTRODUCTION
Facial expression is very important in daily activities, allowing someone to express feelings beyond the verbal world and to understand each other in various situations [15].
Some expressions instigate human actions, and others enrich the
meaning of human communication. Mehrabian [15] observed that the verbal part of human communication contributes only 7% to the meaning of a message, the vocal part contributes 38%, and facial movement and expression contribute 55%. This means that the face makes the major contribution to human communication. Therefore, automatic facial expression recognition has become a challenging problem in computer vision. Due to its potentially important applications in man-machine interaction, it has attracted much attention from researchers in the past few years [29].
A. Motivation
A survey of the literature reveals that most facial expression recognition systems (FERS) are based on the Facial Action Coding System (FACS) [24], [25], [21], which involves additional complexity due to facial feature detection and extraction procedures. Geometric shape-based models have problems with in-plane face transformations [26], [20]. Gabor wavelets [31] are widely used in this field, but the feature vector is huge and computationally expensive. Another appearance-based [23] method, the Local Binary Pattern (LBP) [17], which is adopted by many researchers, also has disadvantages: (1) it produces long histograms, which slows down recognition; and (2) under certain circumstances it misses local features, as it does not consider the effect of the center pixel [10]. LBPRIU2 solves the first disadvantage by making the histogram small, but as a penalty it drops 197 of the 256 patterns, which may contain valuable local structure. Aiming at these problems, a new methodology named LAFU (Uniform Local Active Forces) is proposed. LAFU is a local feature descriptor for static images that not only overcomes the above disadvantages but also offers the following advantages:
• simpler computation than existing methods,
• robust feature extraction,
• easy to understand and a good basis for further research,
• low memory cost due to its tiny feature vector, so integration with devices such as CCTV and still cameras is easy and cost-effective.
B. Related Works
There are seven basic types of facial expressions. They are
contempt, fear, sadness, disgust, anger, surprise and happiness
[21]. Most of the research works on facial expression
recognition (FER) are done on still images. The psychological
experiments by Bassili [5] concluded that facial expressions are recognized more precisely from video than from a single still image. Yang et al. [34] used the local binary pattern and local phase quantization together to build a facial expression recognition
system. Kabir et al. [35] proposed LDPv (Local Directional Pattern Variance) by applying weights to the feature vector using local variance, and found it to be effective for facial expression recognition with a support vector machine as the classifier. Xiaohua et al. [36] used LBP-TOP (LBP on three orthogonal planes) on the eyes, nose and mouth; they developed a new method that can learn weights for multiple feature sets in facial components. Tian et al. [24] proposed a multi-state face component model of AUs and a neural network for classification. They
developed an automatic system to analyze subtle changes in
facial expressions based on both permanent facial features
(brows, eyes, mouth) and transient facial features (deepening of
facial furrows) in a nearly frontal image sequence. Cohen et al.
[9] employed Naive–Bayes classifiers and hidden Markov
models (HMMs) together to recognize human facial expressions
from video sequences. They introduced and tested different
Bayesian network classifiers for classifying expressions from
video, focusing on changes in distribution assumptions, and
feature dependency structures. Zhang et al. [30] proposed an IR-illumination camera for facial feature detection and tracking, and recognized facial expressions using Dynamic Bayesian Networks (DBNs), characterizing expressions by detecting 26 facial features around the regions of the eyes, nose, and mouth. Anderson and McOwan [2] proposed a fully automated, multistage system for real-time recognition of facial expressions. They used a multichannel gradient model (MCGM) to determine facial optical flow; the resulting motion signatures are then classified using Support Vector Machines. Yeasin et al. [28] presented a spatio-temporal approach for recognizing the six universal facial expressions from visual data and using them to compute levels of interest, employing discrete hidden Markov models (DHMMs) to recognize the expressions. Pantic and Patras [20] presented a method that can handle a large range of human facial behavior by recognizing the facial muscle actions that produce expressions. They applied face-profile-contour tracking and rule-based reasoning to recognize 20 AUs occurring alone or in combination in nearly left-profile-view face image sequences, and achieved an 84.9% accuracy rate.
Kotsia et al. [13] manually placed some of the Candide grid nodes on face landmarks to create a facial wireframe model for facial expressions, and used a Support Vector Machine (SVM) for classification. Besides facial representations using AUs, some local feature representations have also been proposed; local features are much easier to extract than AUs. Ahonen et al. [1] proposed a new facial representation strategy for still images based on the Local Binary Pattern (LBP). In this method, the LBP value at the referenced center pixel of a 3x3 pixels area is computed using the gray-scale color intensities of the pixel and its neighboring pixels as follows:
LBP = \sum_{i=1}^{P} 2^{i-1} f(g(i) - c)    (1)

f(x) = \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases}    (2)
where c denotes the gray color intensity of the center pixel, g(i) is the gray color intensity of its i-th neighbor, and P stands for the number of neighbors, i.e. 8. Figure 1 shows an example of obtaining the LBP value of the referenced center pixel for a given 3x3 pixels area. An extension to the original LBP operator, called LBPRIU2, was proposed in [18]. It reduces the length of the feature vector and implements a simple rotation-invariant descriptor. An LBP pattern is called uniform if the binary pattern contains at most two bitwise transitions from 0 to 1 or vice versa. For example, the patterns 00000000 (0 transitions), 01110000 (2 transitions) and 11001111 (2 transitions) are uniform, whereas the patterns 11001001 (4 transitions) and 01010010 (6 transitions) are not. The uniform patterns occur more commonly in image textures than the non-uniform ones; therefore, the latter are neglected. The uniform patterns yield only 59 different codes. To create a rotation-invariant LBP descriptor, a uniform pattern is rotated clockwise up to P-1 times (P = number of bits in the pattern); if the result matches another pattern, the two are treated as a single pattern. Hence, instead of 59 bins, only 8 bins are needed to construct a histogram representing the local feature for a given local block. Once the LBP local features for all blocks of a face are extracted, they are concatenated into an enhanced feature vector. This method has proven increasingly successful; it is adopted by many researchers and has been successfully used for facial expression recognition [32], [33]. Although LBP features achieve high accuracy rates for facial expression recognition, LBP extraction can be time consuming.
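To make Eqs. (1) and (2) concrete, the following minimal Python sketch (not part of the original paper) computes the LBP code of the center pixel of a 3x3 block and checks whether the resulting pattern is uniform; the clockwise neighbor ordering and bit weighting are assumptions, since any fixed convention gives an equivalent descriptor.

```python
import numpy as np

def lbp_code(block):
    """LBP code of the center pixel of a 3x3 gray-level block (Eqs. 1-2)."""
    c = block[1, 1]
    # Eight neighbors, here taken clockwise starting at the top-left corner.
    neighbors = [block[0, 0], block[0, 1], block[0, 2], block[1, 2],
                 block[2, 2], block[2, 1], block[2, 0], block[1, 0]]
    bits = [1 if g >= c else 0 for g in neighbors]      # f(g(i) - c)
    return sum(b << i for i, b in enumerate(bits))      # weighted sum of 2^(i-1) terms

def is_uniform(code, p=8):
    """True if the circular bit string has at most two 0/1 transitions."""
    bits = [(code >> i) & 1 for i in range(p)]
    return sum(bits[i] != bits[(i + 1) % p] for i in range(p)) <= 2

block = np.array([[56, 110, 45], [45, 90, 77], [54, 15, 90]])  # 3x3 area from Figure 3
print(lbp_code(block), is_uniform(lbp_code(block)))
```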
C. Contributions in This Paper
The proposed feature representation method, LAFU, can capture more texture information from a local 3x3 pixels area. The computed feature holds information about the local differences (the differences in gray color intensity between the referenced center pixel and its surrounding pixels) along with four possible accumulative gradient directions, while keeping the feature vector length at only 8 for each of the 9x9=81 blocks. It is robust to monotonic gray-scale changes caused, for example, by illumination variations. Due to its tiny feature vector length, it is appropriate for large-scale facial datasets. Unlike LBP, LAFU never neglects the center pixel during feature extraction. Compared with LBP [17] or LBPRIU [18], the proposed method therefore performs better in both time and classification accuracy; see the Experimental Results and Analysis section.
The rest of the paper is organized as follows: Section II explains the proposed local feature representation method and the system framework, Section III describes data collection and the experimental setup, Section IV presents results and analysis, and Section V concludes the paper.
Figure 1: Example of obtaining LBP for a 3x3 block
II. PROPOSED FEATURE REPRESENTATION METHOD
The proposed feature representation method is based on the facial texture information of a 3x3 pixels local area and operates on gray-scale facial images. It is designed to capture, for each pixel, the local pattern of the gray color intensities of its neighboring pixels with respect to the intensity of the pixel itself, along with four gradient directions. These patterns represent distinctive features for a particular pixel. A histogram of all possible patterns is constructed for each of the 9x9=81 blocks, and these histograms are then concatenated to form the local feature representation of the input image. The representation method is as follows:
D. Gray-scale Invariant Local Active Forces (LAF)
The local pattern for a pixel is computed from its 3x3 pixels area, see Figure 2.

A  B  C
D  E  F
G  H  I

Figure 2: 3x3 pixels area.

The pattern can be derived as follows:

p = (f - e) - (d - e)    (3)
q = (c - e) - (g - e)    (4)
r = (b - e) - (h - e)    (5)
s = (a - e) - (i - e)    (6)

where a, b, c, d, e, f, g, h and i are the gray color intensities of pixels A, B, C, D, E, F, G, H and I, respectively.

P = \begin{cases} 0, & \text{if } p < 0 \\ 1, & \text{if } p \geq 0 \end{cases}    (7)

Here p represents the winning force between two opposite forces, or local differences (i.e. E to F and E to D), and P corresponds to the direction of the winning force. Q, R and S can be defined and derived in the same way as P. Then,

'E' = (P)(Q)(R)(S)

where 'E' is the local pattern at the central pixel E. Thus, 'E' contains 4 bits and can therefore take only 2^4 = 16 different patterns. A detailed example of local pattern extraction at a pixel E is given in Figure 4. The feature vector length for each block is therefore 16 for LAF, and the feature dimension for all 81 blocks is 81x16 = 1296.

LAFU is gray-scale invariant because monotonic gray-scale changes caused, for example, by illumination variations do not affect the local pattern, see Figure 3. This is because the local patterns are obtained from the magnitudes of local differences, as shown in Figure 4, and these magnitudes are independent of the absolute gray color intensities within a 3x3 pixels local area. In all three lighting conditions of Figure 3, LAFU sustains the same pattern, 1010.

Figure 3: Example of the descriptor being gray-scale invariant. The same 3x3 pixels area (rows 56, 110, 45; 45, 90, 77; 54, 15, 90) is shown in normal light, in low light (all intensities reduced by 15) and in high light (all intensities increased by 120); the local differences with respect to the center, and hence the local pattern 1010, are identical in all three cases.

The standard local binary pattern (LBP) [17] encodes the relationship between the referenced pixel and its surrounding neighbors by thresholding the gray color intensity differences. The proposed method instead encodes the relationship between the referenced pixel and its neighbors based on the four combined gradient directions through the center, which are calculated from the local magnitudes of the gray color intensity differences of the referenced pixel with its surrounding pixels in the vertical, horizontal and two diagonal directions, see Figure 4. Thus, LAFU holds more information than LBP and LBPRIU2 in the sense that:
• LAFU preserves all four possible accumulative gradient directions through the center,
• unlike LBP or LBPRIU2, LAFU does not neglect the gray color intensity of the center pixel, and
• unlike other methods, LAFU does not neglect the magnitude of the color intensity differences between two pixels.

Figure 4: Example of local pattern extraction at a pixel E using LAF: (a) gray-scale image, (b) local 3x3 pixels area, (c) and (d) local differences, (e)-(h) the four possible forces through the center, (i)-(l) the directions of the forces, (P), (Q), (R) and (S) the directions of the winning forces, and (m) the final pattern for the referenced pixel of gray color intensity 90.
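The following short Python sketch (a reading of Eqs. (3)-(7), not code from the paper) computes the 4-bit LAF pattern at the center pixel E of a 3x3 block and reproduces the gray-scale invariance illustrated in Figure 3; the bit order (P)(Q)(R)(S) follows the text.

```python
import numpy as np

def laf_pattern(block):
    """4-bit LAF pattern (P)(Q)(R)(S) at the center pixel E of a 3x3 block."""
    a, b, c = block[0]
    d, e, f = block[1]
    g, h, i = block[2]
    p = (f - e) - (d - e)    # Eq. (3): horizontal pair of opposite forces
    q = (c - e) - (g - e)    # Eq. (4): one diagonal pair
    r = (b - e) - (h - e)    # Eq. (5): vertical pair
    s = (a - e) - (i - e)    # Eq. (6): the other diagonal pair
    return ''.join('0' if v < 0 else '1' for v in (p, q, r, s))   # Eq. (7)

# The 3x3 area of Figure 3 under three monotonic gray-level shifts.
normal = np.array([[56, 110, 45], [45, 90, 77], [54, 15, 90]], dtype=int)
low, high = normal - 15, normal + 120
print(laf_pattern(normal), laf_pattern(low), laf_pattern(high))   # '1010' each time
```

Because each bit depends only on a difference of two local differences, any constant shift of all nine intensities cancels out, which is exactly the invariance claimed above.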
E. Uniform LAF (LAFU)
A feature vector should contain all the essential information needed by the classifier. Unnecessary information increases classification time and can sometimes cause a significant drop in accuracy; on the other hand, with inadequate features even a good classifier may fail. Hence, Dimensionality Reduction (DR) methods are suggested as a preprocessing step to address the curse of dimensionality [37]. DR techniques try to project the data onto a low-dimensional feature space. For a given p-dimensional random vector X = (x1, x2, ..., xp), DR tries to represent it with a q-dimensional vector Z = (z1, z2, ..., zq), q < p, that maintains the characteristics of the original data as much as possible. DR can be done in two ways:
1. by transforming the existing features into a new, reduced set of features, or
2. by selecting a subset of the existing features.
To overcome this dimensionality problem, a new technique is introduced based on the variance, the term frequency and the bitwise transitions of the patterns for selecting a subset of the existing features. After feature extraction using LAF, a feature matrix of dimension m-by-n is obtained, where m is the number of subjects and n is the feature dimension (n = 1296 for LAF). The column-wise variance of that matrix is computed, which results in a 1-by-n array named VM:

VM = [VAR(1) VAR(2) ... VAR(n)]

VAR(n) = \frac{1}{m} \sum_{i=1}^{m} (P_i^n - \mu(n))^2, \quad \text{where } \mu(n) = \frac{1}{m} \sum_{i=1}^{m} P_i^n    (8)

where P_i^n is the n-th feature of the i-th image and m is the number of samples in the feature matrix. Clearly, the smaller the variance of a pattern, the smaller the contribution of that pattern. VM is then sorted in descending order so that the most contributing patterns come first, and the indices of the sorted matrix are used to select the actual variables from the feature matrix as SVM input, see Figure 5.

In a similar way, the average of each column of the feature matrix is calculated, which results in a 1-by-n matrix named the TF-matrix (Term Frequency matrix):

TFM = [AVG(1) AVG(2) ... AVG(n)]

AVG(n) = \frac{1}{m} \sum_{i=1}^{m} P_i^n    (9)

where m is the number of samples in the feature matrix. TFM is then sorted in descending order, so that it starts with the patterns that appear most frequently and therefore contribute most to classification. The indices of this matrix are used to select the actual patterns from the feature matrix. Experiments are conducted using the topmost 100, 200, ..., up to 1200 most frequently occurring patterns, see Figure 6.

Figure 5: Classification accuracy (%) vs. number of features selected with the top-most variances. The red marked line is the best selection.

Figure 6: Classification accuracy (%) vs. number of features selected with the top-most frequency of occurrence. The red marked line is the best selection.

In both cases, the highest accuracy is obtained when the number of selected features is between 600 and 700 out of 1296. A deeper investigation of the patterns selected by both the Variance Matrix (VM) and the Term-Frequency Matrix (TFM) shows that the patterns with at most one bitwise transition (from 1 to 0 or from 0 to 1) are chiefly responsible for expression differentiation. Based on this extensive experimental evaluation, feature selection can therefore be simplified by selecting only the low-transition patterns (here, those with at most one transition); the result is named Uniform Local Active Forces (LAFU), see Table 1.
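As a hedged illustration of this selection step (not the authors' code), the sketch below ranks the 1296 LAF histogram bins by column variance (Eq. 8) and by mean term frequency (Eq. 9), and then derives the eight low-transition LAFU patterns of Table 1 from the at-most-one-transition rule.

```python
import numpy as np

def rank_bins(F):
    """F: m-by-n LAF feature matrix (m samples, n = 81 x 16 = 1296 bins).
    Returns bin indices sorted by variance (VM) and by mean frequency (TFM),
    most contributing first."""
    vm = F.var(axis=0)       # Eq. (8), column-wise variance
    tfm = F.mean(axis=0)     # Eq. (9), column-wise average
    return np.argsort(vm)[::-1], np.argsort(tfm)[::-1]

def transitions(code, bits=4):
    """Number of 0/1 changes when reading the bit pattern left to right."""
    s = format(code, f'0{bits}b')
    return sum(s[i] != s[i + 1] for i in range(bits - 1))

# LAFU keeps only the 4-bit patterns with at most one transition (Table 1).
lafu = [format(p, '04b') for p in range(16) if transitions(p) <= 1]
print(lafu)   # ['0000', '0001', '0011', '0111', '1000', '1100', '1110', '1111']
```

Whether transitions should be counted circularly is not stated in the paper; the linear count used here reproduces exactly the eight patterns listed in Table 1.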
III. EXPERIMENTS
Table 1: Patterns finally selected by LAFU
Patterns selected by LAFU: 0000, 0001, 0011, 0111, 1000, 1100, 1110 and 1111
Patterns dropped by LAFU: 0010, 0100, 0101, 0110, 1001, 1010, 1011 and 1101
F. Proposed Framework for Facial Expression Recognition
The Extended Cohn-Kanade Dataset (CK+) [14] is used in the experiments to evaluate the effectiveness of the proposed method. It contains 326 peak facial expressions of 123 subjects, covering seven emotion categories: 'Anger', 'Contempt', 'Disgust', 'Fear', 'Happy', 'Sadness' and 'Surprise'. Figure 10 shows examples of a posed facial image and an unposed facial image, and Figure 8 shows the number of instances of each expression in the dataset.

Figure 8: CK+ dataset, seven expressions and the number of instances of each: Angry (45), Contempt (18), Disgust (59), Fear (25), Happy (69), Sad (28) and Surprise (82).

No subject with the same emotion is collected more than once, and all the facial images in the dataset are posed. Figure 9 shows some examples of facial images in the dataset.

Figure 7: Proposed system framework. Training and testing images each pass through face detection, face preprocessing and feature extraction; a multiclass LIBSVM then performs expression recognition.

Figure 7 shows the proposed framework for the facial expression recognition system. The framework consists of the following steps:
a. For each training image, convert it to gray scale if it is in a different format.
b. Detect the face in the image, resize it to 180x180 pixels using bilinear interpolation, and divide it into 81 equally sized blocks.
c. Compute the LAFU value for each pixel of the image.
d. Construct the histogram for each of the 9x9=81 blocks.
e. Concatenate the histograms to obtain the 648-dimensional feature vector of the image.
f. Build a multiclass Support Vector Machine for facial expression recognition using the feature vectors of the training images.
g. Perform steps a to e for each testing image and use the multiclass Support Vector Machine from step f to identify the facial expression of the given testing image.
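A compact Python sketch of steps b-e is given below. It is not the authors' implementation; the interior-only scanning inside each block and the choice to simply not count dropped patterns are assumptions, since the paper only states that non-selected patterns are discarded.

```python
import numpy as np

LAFU_BINS = ['0000', '0001', '0011', '0111', '1000', '1100', '1110', '1111']
BIN_INDEX = {p: i for i, p in enumerate(LAFU_BINS)}       # the 8 patterns of Table 1

def laf_pattern(block):
    """4-bit LAF pattern at the center pixel of a 3x3 block (Eqs. 3-7)."""
    a, b, c = block[0]; d, e, f = block[1]; g, h, i = block[2]
    forces = [(f - e) - (d - e), (c - e) - (g - e),
              (b - e) - (h - e), (a - e) - (i - e)]        # p, q, r, s
    return ''.join('0' if v < 0 else '1' for v in forces)

def lafu_descriptor(face, grid=9):
    """Concatenated per-block LAFU histograms for a 180x180 gray-scale face."""
    face = face.astype(int)
    bh, bw = face.shape[0] // grid, face.shape[1] // grid  # 20x20 pixel blocks
    feats = []
    for by in range(grid):
        for bx in range(grid):
            hist = np.zeros(len(LAFU_BINS))
            for y in range(by * bh + 1, (by + 1) * bh - 1):
                for x in range(bx * bw + 1, (bx + 1) * bw - 1):
                    code = laf_pattern(face[y - 1:y + 2, x - 1:x + 2])
                    if code in BIN_INDEX:                  # dropped patterns ignored
                        hist[BIN_INDEX[code]] += 1
            feats.append(hist)
    return np.concatenate(feats)                           # 81 x 8 = 648 dimensions

# face = ...  # 180x180 gray-scale face from the detector; descriptor = lafu_descriptor(face)
```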
Figure 9: Cohn-Kanade (CK+) sample dataset
The steps of face detection, preprocessing and feature extraction are illustrated in Figure 11. Face detection is done using the fdlibmex library for Matlab; the library consists of a single MEX file with a single function that takes an image as input and returns the frontal face at a resolution of 180x180 pixels. The face is then masked using an elliptical shape, see Figure 11.
IV. EXPERIMENTAL RESULTS AND ANALYSIS
Several experiments are conducted to find a suitable face dimension for better performance. Table 3 shows the classification accuracy obtained for different face dimensions; among them, 180x180 pixels is found to be the best, giving a classification accuracy of 92.1%. The face area is then masked to remove non-face regions (e.g. hair, ears and the sides of the neck) as much as possible, and divided into blocks of different sizes.
Figure 10: (a) Unposed Image, (b) Posed Image
Figure 11: Facial Feature Extraction
According to past research on facial expression recognition, higher accuracy can be achieved if the input face is divided into several blocks [17], [18]. In the experiments, the 180x180 face is equally divided into 9x9=81 blocks, see Figure 12.
Figure 12: Dividing the facial image into sub-images (blocks).
Features are extracted from each block using the proposed method as well as the LBPRIU2 method. Gathering the feature histograms of all the blocks produces a unique 648-dimensional feature vector for a given image. A ten-fold non-overlapping cross-validation is used in the experiments. Using LIBSVM [7], a multiclass support vector machine is constructed with a randomly chosen 90% of the 326 images (90% from each expression) as training images; the remaining 10% of the images are used as testing images. There is no overlap between the folds, and the split is user-dependent. Ten rounds of training and testing are conducted, and the average confusion matrix for the proposed method is reported and compared against the others. The kernel parameters for the classifier are set as follows: s=0 for SVM type C-SVC, t=1 for the polynomial kernel function, c=1 for the SVM cost, g=0.0025 as the value of 1/(feature vector dimension), and b=1 for probability estimation. Other kernels and parameter settings are also tried, but this setting of LIBSVM is found to be the most suitable for the CK+ dataset with seven classes of facial expressions.
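The classifier setup can be sketched as follows. This is not the authors' script but an approximation using scikit-learn's SVC (which wraps LIBSVM), with placeholder data standing in for the 326x648 LAFU feature matrix and its labels.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix

# Placeholders: X would be the 326x648 LAFU feature matrix, y the expression labels.
X = np.random.rand(326, 648)
y = np.random.randint(0, 7, size=326)

# Polynomial-kernel C-SVC, roughly mirroring the stated LIBSVM flags
# (-s 0 -t 1 -c <cost> -g <gamma> -b 1).
clf = SVC(kernel='poly', C=1.0, gamma=0.0025, probability=True)

folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
y_true, y_pred = [], []
for train_idx, test_idx in folds.split(X, y):
    clf.fit(X[train_idx], y[train_idx])
    y_pred.extend(clf.predict(X[test_idx]))
    y_true.extend(y[test_idx])

print(np.mean(np.array(y_true) == np.array(y_pred)))   # average accuracy
print(confusion_matrix(y_true, y_pred))                # cf. Table 5
```

StratifiedKFold keeps roughly 90% of each expression class in training and 10% in testing per fold, matching the split described above.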
Table 2: Number of blocks vs. classification accuracy vs. feature vector length (feature vector length = number of blocks x number of bins, here 8).

Face Dimension (Pixels) | Blocks | Block Dimension (Pixels) | Classification Accuracy (%) | Feature Vector Length
180x180 | 6x6   | 30x30 | 88.70 | 288
180x180 | 9x9   | 20x20 | 92.10 | 648
180x180 | 10x10 | 18x18 | 89.26 | 800
180x180 | 12x12 | 15x15 | 89.21 | 1152
180x180 | 15x15 | 12x12 | 86.75 | 1800
180x180 | 18x18 | 10x10 | 86.51 | 2592
180x180 | 36x36 | 5x5   | 86.07 | 10368
Table 3: Face dimension vs. classification accuracy (block dimension fixed at 20x20 pixels).

Face Dimension (Pixels) | Blocks | Block Dimension (Pixels) | Classification Accuracy (%)
240x240 | 12x12 | 20x20 | 88.23
220x220 | 11x11 | 20x20 | 88.51
200x200 | 10x10 | 20x20 | 88.62
180x180 | 9x9   | 20x20 | 92.10
160x160 | 8x8   | 20x20 | 88.90
140x140 | 7x7   | 20x20 | 84.41
Several experiments are also conducted to find a suitable number of blocks. According to Table 2, the number of blocks chosen for further experiments is 9x9=81; configurations such as 9x8, 8x9, 6x9, 9x6, 8x10 and 10x8 are also tried, but 9x9 remains the best in terms of accuracy. The proposed LAFU method has 8 possible bins, see Table 1, so each of the 81 blocks has a feature vector of length eight and the LAFU feature vector in use has dimension 8x9x9=648. To demonstrate that the LAFU method is gray-scale invariant, the gray color intensities of random instances from the 326 images are manually changed; some samples are shown in Figure 13. The result is found to be consistent even after changing the illumination, which shows the proposed method to be gray-scale invariant. Table 4 compares the feature vector length of LAFU with some popular methods.
Figure 13: Samples from the CK+ dataset in different lighting conditions (different gray-scale intensities): (a) photos in high light, (b) photos in natural light, (c) photos in low light.
Table 4: Comparison of the feature vector dimension of LAFU with other methods.

Method | Feature vector length | Feature vector dimension used in our experiment
LPQ (Local Phase Quantization) [19] | 256 | 256x9x9 = 20736
LBP (general Local Binary Pattern) [17] | 256 | 256x9x9 = 20736
LBPu2 (Uniform Local Binary Pattern) [18] | 59 | 59x9x9 = 4779
LBPRI (Rotation Invariant Local Binary Pattern) [18] | 36 | 36x9x9 = 2916
LBPRIU2 (Rotation Invariant and Uniform Local Binary Pattern) [18] | 8 | 8x9x9 = 648
LAFU (proposed method) | 8 | 8x9x9 = 648
The third column shows the feature vector dimension for the whole facial image. All the methods from Table 4 are run with the same experimental setup on the same machine, and the results are compared with the proposed method, see Figure 14. From the figure, it is clear that LAFU performs better in terms of both time and classification accuracy. The facial expression recognition result obtained with the proposed feature representation method is shown as a confusion matrix in Table 5.
Table 5: Confusion matrix for LAFU (10-fold validation; feature extraction time for 326 images = 38 seconds; average classification accuracy = 92.1%; LIBSVM kernel parameters: -s 0 -t 1 -c 100 -g 0.0015 -b 1). Rows are actual classes, columns are predictions (%).

Actual \ Predicted | Angry | Contempt | Disgust | Fear | Happy | Sad | Surprise
Angry    | 86.67 | 2.222 | 4.444 | 0     | 0   | 6.667 | 0
Contempt | 11.11 | 77.78 | 0     | 5.556 | 0   | 0     | 5.556
Disgust  | 3.39  | 0     | 96.61 | 0     | 0   | 0     | 0
Fear     | 4     | 4     | 4     | 76    | 8   | 0     | 4
Happy    | 0     | 0     | 0     | 0     | 100 | 0     | 0
Sad      | 10.71 | 0     | 3.571 | 3.571 | 0   | 75    | 7.143
Surprise | 0     | 1.22  | 0     | 1.22  | 0   | 0     | 97.56
Figure 14: Comparison of the proposed method LAFU with some other popular methods in terms of (a) classification accuracy, (b) feature extraction time for the 326 images from the CK+ dataset in seconds, (c) 10-fold cross-validation time in seconds (it is person dependent; folds are non-overlapping and each fold contains 90% of each expression class for training the SVM and 10% for
testing.) and (d) Total experiment time for each of the methods
in seconds.
It should be noted that the results are not directly comparable due to different experimental setups, version differences of the CK dataset with different emotion labels, preprocessing methods, the number of sequences used, and so on, but they still indicate the discriminative power of each approach. Table 2 compares the proposed method with other static analysis methods in terms of the number of subjects, the number of sequences, the classifier and the evaluation measure, giving the overall classification accuracy obtained on the CK [12] or CK+ [14] facial expression database.
Table 2: Comparison with different approaches (number of expression classes = 7, non-dynamic). PDM: Point Distribution Model; EAR-LBP: Extended Asymmetric Region Local Binary Pattern; LPQ: Local Phase Quantization; LBPu2: Uniform Local Binary Pattern; Poly.: polynomial; RBF: radial basis function.

Ref | Method | No. of subjects | No. of sequences | Classifier | Measure | Classification accuracy (%)
[8] | PDM | 123 | 327 | SVM (RBF) | 2-fold cross | 80+
[16] | EAR-LBP | 123 | 327 | Multiclass SVM (RBF) | 10-fold cross | 82
[27] | LBP + LPQ | 123 | 316 | Linear SVM | leave-one-subject-out cross-validation | 83
[4] | Gabor representation | 90 | 313 | SVM + AdaBoost | 10-fold cross | 87
[11] | 2D landmarks (manifold [6]) | 123 | 593 | SVM | - | 87
[22] | Robust LBPu2 | 96 | 320 | SVM (polynomial) | 10-fold cross | 88
Proposed | LAFU | 123 | 326 | Multiclass SVM (poly.) | 10-fold cross | 92
Chew et al. [8] experimented on the Extended Cohn-Kanade (CK+) dataset using CLM-tracked CAPP (canonical appearance) and SAPP (similarity shape) features; although the CLM (Constrained Local Model) is a very powerful face alignment algorithm, they achieved just above 80% accuracy. In [16], Naika et al. proposed an Extended Asymmetric Region Local Binary Pattern (EAR-LBP) operator and automatic face localization heuristics to mitigate localization errors in automatic facial expression recognition; even with automatic face localization, they achieved 83% accuracy on the CK+ dataset. Yang et al. [27] used a SIFT (Scale-Invariant Feature Transform)-tracked CK+, Bartlett et al. [4] used both manually and automatically tracked faces along with AdaSVM (AdaBoost+SVM), Jeni et al. [11] used 593 CLM-tracked posed sequences from the Extended CK+, and Shan et al. [22] used LBPu2 (Uniform Local Binary Pattern) along with an SVM. But LAFU clearly outperforms all of them in almost all cases, see Figure 11. It takes 0.024 seconds to extract features from a face at 180x180 resolution; this cannot be compared with the other methods, as feature extraction time is not reported in any of their papers. The authors of [14], [8], [16], [4] and [11] all used some type of face localization method, and according to those authors, if faces are not aligned properly, crucial features at the border area may be missed during feature extraction from a block or from the whole face. It is clearly mentioned in [11] and [23] that aligned faces give an extra 5-10% increase in classification accuracy, leave-one-out validation increases the accuracy by 1-2%, and AdaBoost increases it by another 1-2% on the CK+ dataset [4]. So, overall, an extra 7-12% accuracy can be obtained by proper alignment, increased training data and a boosting algorithm added to the classifier. Even so, LAFU is better than the best available AAM result that uses texture plus shape information [14], the CLM result that uses only textural information [8], and the best CLM result that uses shape information [11], see Table 3. This also shows that the proposed texture-based expression recognition is better than previous texture-based methods and compares favorably with state-of-the-art procedures that use shape, or shape and texture combined. Note that the dataset has only one peak expression per subject for a given emotion, and some subjects do not contain all seven expressions.

Table 3: Comparison of the results of different methods on the seven main facial expressions of the CK+ dataset. S: shape-based method; T: texture-based method; T+S: both shape and texture based. (CLM: Constrained Local Model; AAM: Active Appearance Model; PMS: personal mean shape; An. = Anger, Co. = Contempt, Di. = Disgust, Fe. = Fear, Ha. = Happy, Sa. = Sad, Su. = Surprise, Avg. = Average)

Authors | Method | T/S | An. | Co. | Di. | Fe. | Ha. | Sa. | Su. | Avg.
[14] | AAM + SVM | S | 35 | 25 | 68 | 21 | 98 | 4 | 100 | 50
[14] | AAM + SVM | T | 70.0 | 21.9 | 94.7 | 21.7 | 100.0 | 60.0 | 98.7 | 66.7
[14] | AAM + SVM | T+S | 75.0 | 84.4 | 94.7 | 65.2 | 100.0 | 68.0 | 96.0 | 83.3
[8] | CLM + SVM | T | 70.1 | 52.4 | 92.5 | 72.1 | 94.2 | 45.9 | 93.6 | 74.4
[8] | CLM + SVM (AU0 norm.) | S | 73.3 | 72.2 | 89.8 | 68.0 | 95.7 | 50.0 | 94.0 | 77.6
[11] | CLM + SVM (PMS) | S | 77.8 | 94.4 | 91.5 | 80.0 | 98.6 | 67.9 | 97.6 | 86.8
LAFU | No face aligning + SVM | T | 86.67 | 77.78 | 96.6 | 76 | 100 | 75 | 97.56 | 87.1
Figure 15 shows the comparison between the number of training instances and the achieved accuracy for each expression type using LAFU. The numbers of 'Sad', 'Contempt' and 'Fear' instances are low compared with the other expression classes, and for this reason the classification accuracy is lower for these classes, which affects the overall accuracy. Hence, it can be concluded that the number of training instances for an expression does affect the accuracy rate achieved for the
expression. Another reason for the lower classification accuracy could be the dataset itself.
Figure 15: Number of expression instances in the CK+ dataset vs. the classification accuracy (%) achieved for each facial expression when LAFU is used.
Some facial expressions are confusing for human eyes as well. The accuracy can be improved by collecting more peak expression instances from the same subject.
V. CONCLUSION
A novel gray-scale invariant facial feature extraction method is proposed for facial expression recognition. For each pixel in a gray-scale image, the method extracts a local pattern using the magnitudes of the gray color intensity differences between the referenced pixel and its neighbors along the four possible gradient directions through the center of a local 3x3 pixels area. The method is very simple and effective for facial expression recognition and outperformed LPQ, LBP, LBPU2, LBPRI and LBPRIU2 in terms of both classification accuracy and feature extraction time; facial expression recognition with all of the above methods was conducted with the same experimental setup and on the same machine. Future research may include incorporating an AdaBoost or SimpleBoost algorithm into the facial expression classifiers, which may increase the accuracy rates substantially.
VI. REFERENCES
[1] Ahonen, T., Hadid, A., and Pietikäinen, M. Face Description with Local Binary Patterns: Application to Face Recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, 28(12), 2006, 2037-2041.
[2] Anderson, K. and McOwan, P. W. A Real-Time Automated System for the Recognition of Human Facial Expressions. IEEE Transactions on Systems, Man, and Cybernetics, 36(1), 2006, 96-105.
[3] Asthana, A., Saragih, J., Wagner, M., and Goecke, R. Evaluating AAM fitting methods for facial expression recognition. In 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, 2009, 1-8.
[4] Bartlett, M. S., Littlewort, G., Fasel, I., and Movellan, J. R. Real Time Face Detection and Facial Expression Recognition: Development and Application to Human Computer Interaction. In Proc. CVPR Workshop on Computer Vision and Pattern Recognition for Human-Computer Interaction, 2003.
[5] Bassili, J. N. Emotion Recognition: The Role of Facial Movement and the Relative Importance of Upper and Lower Areas of the Face. J. Personality and Social Psychology, 37, 1979, 2049-2059.
[6] Chang, Y., Hu, C., Feris, R., and Turk, M. Manifold-based analysis of facial expression. Image and Vision Computing, 24(6), 2006, 605-614.
[7] Chang, C.-C. and Lin, C.-J. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2011.
[8] Chew, S. W., Lucey, P., Lucey, S., Saragih, J., Cohn, J. F., and Sridharan, S. Person-independent facial expression detection using Constrained Local Models. In IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011), 2011, 915-920.
[9] Cohen, I., Sebe, N., Garg, A., Chen, L. S., and Huang, T. S. Facial expression recognition from video sequences: temporal and static modelling. Computer Vision and Image Understanding, 91, 2003, 160-187.
[10] Fu, X. and Wei, W. Centralized Binary Patterns Embedded with Image Euclidean Distance for Facial Expression Recognition. In Fourth International Conference on Natural Computation (ICNC '08), 2008, 115-119.
[11] Jeni, L. A., Lőrincz, A., Nagy, T., Palotai, Z., Sebők, J., Szabó, Z., and Takács, D. 3D shape estimation in video sequences provides high precision evaluation of facial expressions. Image and Vision Computing, 30(10), October 2012, 785-795.
[12] Kanade, T., Cohn, J. F., and Tian, Y. Comprehensive database for facial expression analysis. In Fourth IEEE International Conference on Automatic Face and Gesture Recognition, 2000.
[13] Kotsia, I. and Pitas, I. Facial Expression Recognition in Image Sequences Using Geometric Deformation Features and Support Vector Machines. IEEE Transactions on Image Processing, 16(1), January 2007.
[14] Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., and Matthews, I. The Extended Cohn-Kanade Dataset (CK+): A complete facial expression dataset for action unit and emotion-specified expression. In Third IEEE Workshop on CVPR for Human Communicative Behavior Analysis (CVPR4HB), 2010.
[15] Mehrabian, A. Communication without words. Psychology Today, 2(4), 1968, 53-56.
[16] Naika, C. L. S., Jha, S. S., Das, P. K., and Nair, S. B. Automatic Facial Expression Recognition Using Extended AR-LBP. Wireless Networks and Computational Intelligence, 292(3), 2012, 244-252.
[17] Ojala, T., Pietikäinen, M., and Harwood, D. A Comparative Study of Texture Measures with Classification Based on Feature Distributions. Pattern Recognition, 29(1), 1996, 51-59.
[18] Ojala, T., Pietikäinen, M., and Mäenpää, T. Multiresolution Gray-scale and Rotation Invariant Texture Classification with Local Binary Patterns. IEEE Trans. Pattern Analysis and Machine Intelligence, 24(7), 2002, 971-987.
[19] Ojansivu, V. and Heikkilä, J. Blur Insensitive Texture Classification Using Local Phase Quantization. In Proc. ICISP, Berlin, Germany, 2008, 236-243.
[20] Pantic, M. and Patras, I. Dynamics of Facial Expression: Recognition of Facial Actions and Their Temporal Segments From Face Profile Image Sequences. IEEE Transactions on Systems, Man, and Cybernetics, 36(2), 2006, 433-449.
[21] Ekman, P. Basic Emotions. In Handbook of Cognition and Emotion. John Wiley & Sons, 1999.
[22] Shan, C., Gong, S., and McOwan, P. W. Robust Facial Expression Recognition Using Local Binary Patterns. In Proc. IEEE Int'l Conf. Image Processing, 2005, 370-373.
[23] Shan, C., Gong, S., and McOwan, P. W. Facial expression recognition based on Local Binary Patterns: A comprehensive study. Image and Vision Computing, 27(6), 2009, 803-816.
[24] Tian, Y. L., Kanade, T., and Cohn, J. F. Recognizing action units for facial expression analysis. IEEE Trans. Pattern Anal. Mach. Intell., 23(2), 2001, 97-115.
[25] Tong, Y., Liao, W., and Ji, Q. Facial Action Unit Recognition by Exploiting Their Dynamic and Semantic Relationships. IEEE Trans. Pattern Anal. Mach. Intell., 29(10), 2007, 1-17.
[26] Valstar, M. F. and Pantic, M. Combined support vector machines and hidden Markov models for modeling facial action temporal dynamics. In Proc. IEEE Int. Conf. Comput. Vis. Pattern Recog. Workshop on Human-Computer Interaction, 2007, 118-127.
[27] Yang, S. and Bhanu, B. Understanding Discrete Facial Expressions in Video Using an Emotion Avatar Image. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2012, 980-992.
[28] Yeasin, M., Bullot, B., and Sharma, R. Recognition of Facial Expressions and Measurement of Levels of Interest From Video. IEEE Trans. Multimedia, 8(3), 2006, 500-508.
[29] Zeng, Z., Pantic, M., Roisman, G., and Huang, T. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1), 2009, 39-58.
[30] Zhang, Y. and Ji, Q. Active and dynamic information fusion for facial expression understanding from image sequences. IEEE Trans. Pattern Anal. Mach. Intell., 27(5), 2005, 699-714.
[31] Zhang, Z., Lyons, M., Schuster, M., and Akamatsu, S. Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron. In Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998, 454-459.
[32] Zhao, G. and Pietikäinen, M. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6), 2007, 915-928.
[33] Zhao, G. and Pietikäinen, M. Facial Expression Recognition Using Constructive Feedforward Neural Networks. IEEE Transactions on Systems, Man, and Cybernetics, 34(3), 2004, 1588-1595.
[34] Yang, S. and Bhanu, B. Understanding Discrete Facial Expressions in Video Using an Emotion Avatar Image. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 42(4), August 2012, 980-992.
[35] Kabir, H., Jabid, T., and Chae, O. Local Directional Pattern Variance (LDPv): A Robust Feature Descriptor for Facial Expression Recognition. The International Arab Journal of Information Technology, 9(4), 2012.
[36] Huang, X., Zhao, G., Pietikäinen, M., and Zheng, W. Expression Recognition in Videos Using a Weighted Component-Based Feature Descriptor. In Proc. SCIA 2011, 2011, 569-578.
[37] Kumar, A. Analysis of Unsupervised Dimensionality Reduction Techniques. Computer Science and Information Systems, 6(2), 2009, 217-227.
Mohammad Shahidul Islam (PhD) received his B.Tech. degree in Computer Science and Technology from the Indian Institute of Technology Roorkee (IIT Roorkee), Uttar Pradesh, India in 2002, an M.Sc. degree in Computer Science from American World University, London Campus, U.K. in 2005, and an M.Sc. in Mobile Computing and Communication from the University of Greenwich, London, U.K. in 2008. He completed his Ph.D. in Computer Science & Information Systems at the National Institute of Development Administration (NIDA), Bangkok, Thailand in 2013. His research interests include image processing, pattern recognition, wireless and mobile communication, satellite communication and computer networking.