Polymorphous Facial Trait Code
Ping-Han Lee (1), Gee-Sern Hsu (2), and Yi-Ping Hung (3)
(1) Graduate Institute of Computer Science and Information Engineering, National Taiwan University
(2) Department of Mechanical Engineering, National Taiwan University of Science and Technology
(3) Graduate Institute of Networking and Multimedia, National Taiwan University
Abstract. The recently proposed Facial Trait Code (FTC) formulates
the component-based face recognition problem as a coding problem using
Error-Correcting Code. The development of FTC is based on the extraction of local feature patterns from a large set of faces without significant
variations in expression and illumination. This paper reports a new type of FTC that encompasses faces with large expression variations taken under various illumination conditions. We assume that if the patches of
a local feature on two different faces look similar in appearance, this pair
of patches will also show similar visual patterns when both faces change
expressions and are under different illumination conditions. With this
assumption, we propose the Polymorphous Facial Trait Code for face
recognition under illumination and expression variations. The proposed
method outperforms the original Facial Trait Code substantially on a strict face verification problem in which only one facial image per individual is available for enrollment in the gallery set, while the probe set consists of facial images with strong illumination and expression variations.
1 Introduction
Local features are commonly considered effective for face recognition. Different
feature extraction methods result in different descriptors of local features. Liao
and Li [1] extracted 17 local features using the Elastic Graph Matching (EGM),
and each of these 17 features had its own specific spot on a face, for example,
the corners of eyes, the ends of eyebrows, and the centers of lips. Deformable
graphs and dynamic programming were used in [2] to determine eyes, nose,
mouth, and chin. A two-level hierarchical component classifier was proposed in
[3] to locate 14 feature patches in a face, and [4] showed that face recognition
using these 14 feature patches outperformed the same recognition method but
using the whole face as the feature. Ivanov et al. [5] extended this study by
experimenting with a few different recognition schemes using the same set of 14
feature patches. Recently, an iterative growing process was proposed to further
improve the localization of these 14 feature patches, leading to a two-layered
identification approach proven to perform well in identifying faces with large
pose and illumination variations [6]. Few researchers disagree on the definition of such local features, since these features are perceptually intuitive, and many feature-based approaches built on them have yielded promising performance.
Some unperceivable features, however, have emerged in recent years and have added a new dimension to the computer analysis of human faces. In [7], a few hundred local features are selected from a large set of local
facial patches with different sizes, locations, and orientations to form a face
similarity function able to distinguish between intra-personal and extra-personal
variations. Most of the selected features are unperceivable and cover only part
of the perceivable features, but the combination of these features gives a strong
classifier to recognize faces. This paper extends the study on these unperceivable
facial features.
It has been shown in [8] that patterns exist in many of the unperceivable features and can be extracted, and the extracted patterns can be used to decompose
and encode a face. The unperceivable features with patterns good for discriminating faces are called facial traits, and the associated face coding scheme is
called the Facial Trait Code. Through empirical observations, we found that the
variations across human faces can be categorized into two types: the inherent variation and the external variation. The former is caused by the inherent differences between people, while the latter is caused by the conditions, such as illumination or facial expression, under which the facial images are taken. In [8], the facial patterns are extracted from a large collection of facial images called the FTC face set, which essentially contains faces exhibiting the inherent variation only. Hence the patterns extracted
can be regarded as the inherent patterns that best discriminate between different
people. However, if we take faces under both inherent and external variations
into account, [8] will extract a mixture of inherent and external patterns. The
external patterns, which capture the variation in external conditions such as illumination and facial expressions, are useless for discriminating different people.
Without a proper mechanism, these external patterns will cripple the FTC on the face recognition problem.
In this paper, we propose a novel Facial Trait Code, called Polymorphous
Facial Trait Code, or POLYFTC for short, that handles the inherent and
external patterns in a systematic way. The construction of the POLYFTC involves a two-stage pattern extraction scheme that extracts inherent patterns
and their associating external patterns hierarchically. The corresponding elaborated encoding and decoding schemes are also proposed, which jointly recognize
human faces under variations in illumination conditions and facial expressions
robustly. This paper will begin with an introduction to the Facial Trait Code in
Section 2, followed by the development of the Polymorphous Facial Trait Code
in Section 3. A comparative study on the face recognition performance using the
POLYFTC and other algorithms will be reported in Section 4. The conclusion
and contribution of our study will be summarized in Section 5.
2 Facial Trait Code
The original version of the Facial Trait Code (FTC) is reported in [8], and is
summarized in this section.
2.1 Facial Trait Extraction and Associated Codewords
One can specify a local patch on a face by a rectangular bounding box {x, y, w, h},
where x and y are the 2-D pixel coordinates of the bounding box’s upper-left
corner, and w and h are the width and height of this bounding box, respectively.
If the bounding box is moved from left to right and top to bottom in the face
with various sizes of steps, denoted by ∆x and ∆y pixels in each direction, and if
w and h can change from some small values to large values, we will end up with
an exhaustive set of local patches across the face. The number of the patches
grows with the size range of the patches and the reduction in the step size. With
an extensive experimental study on the size range and step sizes, [8] ended up with slightly more than two thousand patches for a face of 80×100 pixels. In the following, we assume that M patches in total are obtained from a face.
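The exhaustive patch enumeration can be sketched as follows, assuming illustrative step sizes and width/height ranges; the actual values in [8] result from their experimental study.

# Sketch of exhaustive patch enumeration over an aligned 80x100 face.
# Step sizes (dx, dy) and width/height ranges are illustrative only.
def enumerate_patches(face_w=80, face_h=100, dx=8, dy=8,
                      widths=(16, 24, 32, 48), heights=(16, 24, 32, 48)):
    """Return the list of candidate patches {x, y, w, h}."""
    patches = []
    for w in widths:
        for h in heights:
            for y in range(0, face_h - h + 1, dy):
                for x in range(0, face_w - w + 1, dx):
                    patches.append({"x": x, "y": y, "w": w, "h": h})
    return patches

patches = enumerate_patches()
print(len(patches))  # M, the number of candidate patches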
A large collection of facial images, called the FTC face set, is needed for
FTC construction. Assuming K faces are available in the FTC face set and all faces are aligned by the centers of both eyes, the above running-box scheme gives a stack of K patch samples at each patch location. To cluster the K patch samples in each patch stack, Principal Component Analysis (PCA) is first applied to extract features. Considering that the K facial images can come from L individuals (L ≤ K, i.e., one individual may have multiple facial samples), Linear Discriminant Analysis (LDA) is then applied to each patch stack to determine the L most discriminant low-dimensional patch features for the L individuals. It is assumed that the L low-dimensional patch features in each patch stack can be modeled by a Mixture of Gaussians (MoG), so the unsupervised clustering algorithm proposed by Figueiredo and Jain [9] can be applied to identify the MoG patterns in each patch stack. Given the M patch stacks, this algorithm clusters the L low-dimensional patch features of the i-th patch stack into k_i clusters, i = 1, 2, ..., M. The k_i clusters in the i-th patch stack are considered the patterns existing in this patch stack, and they are called the patch patterns.
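A minimal sketch of this patch-pattern extraction for a single patch stack is given below, assuming scikit-learn. The Figueiredo-Jain algorithm [9] selects the number of mixture components automatically; it is approximated here by BIC model selection over standard Gaussian mixtures, and the PCA/LDA settings are illustrative rather than those of [8].

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.mixture import GaussianMixture

def extract_patch_patterns(patch_stack, person_ids, max_components=10):
    """patch_stack: (K, w*h) raw patch pixels; person_ids: (K,) identity labels."""
    feats = PCA(n_components=0.95).fit_transform(patch_stack)
    feats = LinearDiscriminantAnalysis().fit_transform(feats, person_ids)

    # One representative low-dimensional feature per individual (L of them).
    labels = np.unique(person_ids)
    reps = np.stack([feats[person_ids == p].mean(axis=0) for p in labels])

    # Stand-in for the Figueiredo-Jain clustering: pick the MoG with the best BIC.
    best_gmm, best_bic = None, np.inf
    for k in range(1, min(max_components, len(reps)) + 1):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(reps)
        if gmm.bic(reps) < best_bic:
            best_gmm, best_bic = gmm, gmm.bic(reps)

    # k_i patch patterns; each individual is assigned to one of them.
    return best_gmm, dict(zip(labels, best_gmm.predict(reps)))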
A scheme is proposed in [8] that selects the combination of patches whose patch patterns best discriminate the individuals in the FTC face set by their faces. This scheme first defines a matrix, called the Patch Pattern Map (PPM), for each patch. The PPM shows which individuals' faces reveal the same pattern at that specific patch. Let PPM_i denote the PPM for the i-th patch, i = 1, 2, ..., M. PPM_i is L × L in dimension in the case of L individuals, and its entry at (p, q), denoted PPM_i(p, q), is defined as follows:

    PPM_i(p, q) = 0 if the patches on the faces of the p-th and the q-th individuals
                  are clustered into the same patch pattern, and 1 otherwise.
Fig. 1. An example of the selected facial traits; the brighter a region, the more traits are located in it.
Given N patches with their associated PPM_i's stacked to form an L × L × N array, there are L(L − 1)/2 N-dimensional binary vectors along the depth of this array, because each PPM_i is a symmetric matrix and only its lower triangular part needs to be considered. Let v_{p,q} (1 ≤ q < p ≤ L) denote one of these N-dimensional binary vectors; v_{p,q} reveals the local similarity between the p-th and the q-th individuals in terms of the N local patches. More ones in v_{p,q} indicate more differences between this pair of individuals, while more zeros indicate greater similarity between them.
The binary vectors v_{p,q} motivated the authors of [8] to apply the Error-Correcting Output Code (ECOC) [10] to their study. If each individual's face is encoded using the most discriminant patches, defined as the facial traits, then the induced set {v_{p,q}, 1 ≤ q < p ≤ L} can be used to define the minimum and maximum Hamming distances among all encoded faces in the corresponding code space. The v_{p,q} with the fewest (most) ones gives the minimum (maximum) Hamming distance. To maximize the robustness against possible recognition errors in the decoding phase, the authors of [8] proposed an AdaBoost-based algorithm that maximizes d_min, the minimum Hamming distance, when determining the facial traits from the overall set of patches.
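The relation among the patch-pattern clusters, PPM_i, v_{p,q}, and d_min can be sketched as follows, assuming an (M, L) array `assignments` in which assignments[i, p] is the patch-pattern tag of the p-th individual at the i-th patch; all names are illustrative, and the actual trait selection in [8] is AdaBoost-based rather than this direct computation.

import numpy as np

def patch_pattern_maps(assignments):
    """Return an (M, L, L) binary array holding all PPM_i matrices."""
    M, L = assignments.shape
    ppm = np.zeros((M, L, L), dtype=np.uint8)
    for i in range(M):
        same = assignments[i][:, None] == assignments[i][None, :]
        ppm[i] = (~same).astype(np.uint8)       # 0 if same pattern, 1 otherwise
    return ppm

def pairwise_vectors(ppm):
    """Return v_{p,q} for all 1 <= q < p <= L as an (L(L-1)/2, M) array."""
    M, L, _ = ppm.shape
    rows, cols = np.tril_indices(L, k=-1)       # lower triangular part only
    return ppm[:, rows, cols].T                 # one binary vector per identity pair

def minimum_hamming_distance(vectors, selected_patches):
    """d_min over all identity pairs, restricted to a candidate set of traits."""
    return int(vectors[:, selected_patches].sum(axis=1).min())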
Assuming N facial traits are selected from the overall M patches, each with trait patterns symbolized by 1, 2, ..., k_i, i = 1, 2, ..., N, one can now define the codewords in the FTC. Each codeword is of length N and n-ary, where n is the largest number of trait patterns found in a single trait. In summary, given a large collection of faces as the FTC face set, one can define N facial traits, Σ_{i=1}^{N} k_i trait patterns, and Π_{i=1}^{N} k_i possible faces (or FTC codewords). For instance, with N = 3 traits having k_1 = 4, k_2 = 3, and k_3 = 5 patterns, there are 12 trait patterns and 60 possible codewords. An example of the selected facial traits is given in Figure 1; the brighter a region, the more traits are located in it.
2.2 FTC Encoding and Decoding
With a pre-selected length of the FTC codeword, N , the FTC face set defines N
facial traits of different sizes, orientations, and locations, and also the patterns
in each facial trait. Each facial trait pattern is tagged with a number, which
will be used as a symbol in the FTC codeword. In the FTC encoding, a given
face is firstly decomposed into N patches according to the specifications given
by the N facial traits, and each patch is then classified into a specific facial trait
pattern and numbered as the pattern tag. An ordinary classifier can be used for
the patch classification. The authors of [8] applied a nearest-neighbor classifier based on feature vectors resulting from LDA. The given face is therefore encoded into an n-ary FTC codeword of length N.
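A minimal sketch of this encoding step is given below, assuming per-trait LDA models and per-pattern centroids are available from training; the structures `traits`, `lda_models`, and `pattern_centroids` are illustrative, not the authors' implementation.

import numpy as np

def crop(face, trait):
    x, y, w, h = trait["x"], trait["y"], trait["w"], trait["h"]
    return face[y:y + h, x:x + w].reshape(1, -1)

def ftc_encode(face, traits, lda_models, pattern_centroids):
    """Return an N-digit codeword: one trait-pattern tag per facial trait."""
    codeword = []
    for i, trait in enumerate(traits):
        feat = lda_models[i].transform(crop(face, trait))           # LDA feature
        dists = np.linalg.norm(pattern_centroids[i] - feat, axis=1)
        codeword.append(int(np.argmin(dists)))                      # nearest pattern tag
    return np.array(codeword)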
In practice, the images in the gallery set are first encoded into gallery codes. Given a probe, i.e., an image from the probe set, it is likewise encoded into a probe code. The FTC decoding matches this probe code against all gallery codes and finds the 'closest' one, using the Hamming distance as the measure.
Given two codewords g_c = [g_1 g_2 ... g_N] and p_c = [p_1 p_2 ... p_N], the Hamming distance can be interpreted using the code difference d_c = [d_1 d_2 ... d_N], where

    d_i = 0 if p_i = g_i, and d_i = 1 otherwise.

Then the Hamming distance between g_c and p_c is given by

    D(g_c, p_c) = Σ_{i=1}^{N} d_i.                                    (1)
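A sketch of the decoding step, which returns the gallery identity whose codeword has the smallest Hamming distance (Eq. 1) to the probe codeword:

import numpy as np

def hamming_distance(code_a, code_b):
    return int(np.sum(code_a != code_b))            # number of differing digits

def ftc_decode(probe_code, gallery_codes, gallery_ids):
    """gallery_codes: (G, N) array of codewords; gallery_ids: (G,) identity labels."""
    dists = np.array([hamming_distance(probe_code, g) for g in gallery_codes])
    best = int(np.argmin(dists))
    return gallery_ids[best], int(dists[best])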
3 Polymorphous Facial Trait Code
As stated in the Introduction, human facial images are taken under inherent and
external variations. The original FTC [8] considered mainly facial images taken
under only the inherent variation. Although it was reported to be effective in identifying faces under inherent variations, it is expected to degrade when faces taken under external variations are involved, since no mechanism was proposed to handle such a situation.
In this paper we propose a novel Facial Trait Code that handles the inherent
and external patterns in a systematic way, and robustly recognizes faces taken
under variations in illumination conditions and facial expressions. We begin by dividing the FTC face set into two disjoint subsets, the Trait Extraction Set and the Trait Variation Set. The Trait Extraction Set consists
of a large number of frontal facial images taken under the inherent variation
only (i.e., taken with a neutral expression under evenly distributed illumination). The
Trait Variation Set consists of facial images taken under both inherent and
external variations. Assume the Trait Extraction Set and the Trait Variation Set contain n_E and n_V facial images, respectively; the following sections describe the construction of the proposed POLYFTC.
3.1 The First Stage of Clustering: Extraction of Inherent Patterns
For each of the M patches defined in Section 2.1, we extract its trait patterns
following the procedures described in [8]. Instead of using the whole FTC Face
Set, as is the case in [8], we use facial images in the Trait Extraction Set only.
Assuming that the inherent variation across faces in the Trait Extraction Set
follows a Gaussian Mixture model, the first stage of clustering extracts the
corresponding inherent patterns. Then, based on the extracted patterns, we select
the N most discriminative facial traits out of the M patches using the AdaBoost-based algorithm proposed in [8]. Assume each facial trait has k_i patterns, i = 1 ∼ N. For the i-th facial trait, this step clusters the n_E patch samples in the Trait Extraction Set into k_i disjoint subsets, denoted E_{i,j}, j = 1 ∼ k_i, where E_{i,j} is the collection of patch samples cropped from the faces belonging to the j-th inherent pattern in the Trait Extraction Set.
3.2 The Second Stage of Clustering: Extraction of External Patterns
E_{i,j} is defined on the Trait Extraction Set, which contains faces with only the inherent variation. We denote by V_{i,j} the counterpart of E_{i,j} in the Trait Variation Set; V_{i,j} contains the patch samples whose identities also appear in E_{i,j}. We define P_{i,j} as the union of E_{i,j} and V_{i,j}; it contains patch samples that belong to the same inherent pattern but are taken under various external conditions.
In our study, we found that when the patches of a local feature on two different faces are clustered into the same inherent pattern, this pair of patches also shows similar visual patterns when both faces are taken under the same external variation (e.g., when both faces change expressions, or when they are taken under another illumination condition). Based on this observation, we assume that the external variation across patch samples belonging to the same inherent pattern also follows a Gaussian Mixture model, and the second stage of clustering on these patch samples extracts the corresponding external patterns. Fig. 2 (a) illustrates an example of the clustering result: the patch samples in this figure belong to the same inherent pattern, and those in the same row are clustered into the same external pattern. The figure shows that when different people share the same inherent mouth pattern, their mouths look similar when taken under the same illumination condition (in Figure 2 (a), the second row shows left-lit samples and the third row shows right-lit samples) or when showing the same expression (the fourth row for smiling and the fifth row for shouting). For the same local feature, Fig. 2 (b) illustrates patch samples belonging to another inherent pattern.
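The two-stage clustering for one facial trait can be sketched as follows, assuming scikit-learn; `fit_mog` stands in for the Figueiredo-Jain algorithm [9] (approximated by BIC model selection), the inputs are assumed to be trait-specific low-dimensional features, and for brevity the first stage operates on per-sample features rather than per-identity representatives.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_mog(features, max_components=8):
    best, best_bic = None, np.inf
    for k in range(1, min(max_components, len(features)) + 1):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(features)
        if gmm.bic(features) < best_bic:
            best, best_bic = gmm, gmm.bic(features)
    return best

def two_stage_clustering(extraction_feats, extraction_ids,
                         variation_feats, variation_ids):
    # Stage 1: inherent patterns from the Trait Extraction Set only.
    stage1 = fit_mog(extraction_feats)
    e_labels = stage1.predict(extraction_feats)
    inherent_of = dict(zip(extraction_ids, e_labels))

    external_patterns = {}        # j -> MoG of external patterns within P_{i,j}
    for j in range(stage1.n_components):
        # P_{i,j} = E_{i,j} union V_{i,j} (variation samples whose identities lie in E_{i,j}).
        v_mask = np.array([inherent_of.get(pid) == j for pid in variation_ids], dtype=bool)
        p_ij = np.vstack([extraction_feats[e_labels == j], variation_feats[v_mask]])
        # Stage 2: external patterns within this inherent pattern.
        external_patterns[j] = fit_mog(p_ij)
    return stage1, external_patterns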
3.3 Polymorphous Patterns for Illumination and Expression Robust Encoding
The proposed POLYFTC aims to encode human faces with their inherent variations maximized, while the encoding process itself is made invariant to external variations. We define a Polymorphous Pattern as the set of external patterns belonging to the same inherent pattern. The Facial Trait Code using the polymorphous patterns is thus called the Polymorphous Facial Trait Code. Recall
that the FTC encoding transforms a facial image into a codeword. Each digit
location in the FTC codeword is a pattern tag of the associated facial trait, and
this pattern tag is the classification result of a trait-specific classifier. When we
apply polymorphous patterns, this encoding process needs to be modified. The
following gives the elaborated POLYFTC encoding scheme.
Fig. 2. Patch samples of two different inherent patterns, (a) and (b); samples in the same row belong to the same external pattern (see text).
1. For the i-th facial trait, assume the above construction results in k_i polymorphous patterns, the j-th of which consists of k_{i,j} external patterns, j = 1 ∼ k_i. The total number of external patterns in the i-th trait is thus K_i = Σ_{j=1}^{k_i} k_{i,j}. We relabel all the n (n = n_E + n_V) patch samples in the whole FTC face set using their external pattern tags, which gives K_i classes.
2. Perform LDA on these n patch samples based on their external pattern labels.
3. Train a K_i-class Support Vector Machine (SVM) using the resulting LDA feature vectors.
4. Repeat steps 1 through 3 for all N facial traits to complete the POLYFTC training stage.
5. In the POLYFTC encoding stage, a facial image is spatially decomposed into N local patch samples. For each patch sample, the corresponding trait-specific SVM classifies it into one of the K_i external patterns, and the resulting external pattern tag determines the polymorphous pattern tag it belongs to.
6. The resulting N polymorphous pattern tags are concatenated to form an N-digit codeword, which is the POLYFTC encoding result (see the sketch after this list).
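A sketch of the per-trait training (steps 1-4) and encoding (steps 5-6) is given below, assuming scikit-learn; `external_labels[i]` holds the external-pattern tag of every patch sample of trait i, and `poly_of_external[i]` maps an external tag to its polymorphous tag, both obtained from the two-stage clustering above. All names are illustrative.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def train_trait_classifiers(patch_samples, external_labels):
    """patch_samples[i]: (n, w_i*h_i) samples of trait i; one K_i-class SVM per trait."""
    classifiers = []
    for X, y in zip(patch_samples, external_labels):
        clf = make_pipeline(LinearDiscriminantAnalysis(), SVC(kernel="linear"))
        clf.fit(X, y)                        # LDA features, then K_i-class SVM
        classifiers.append(clf)
    return classifiers

def polyftc_encode(face, traits, classifiers, poly_of_external):
    codeword = []
    for i, trait in enumerate(traits):
        x, y, w, h = trait["x"], trait["y"], trait["w"], trait["h"]
        patch = face[y:y + h, x:x + w].reshape(1, -1)
        ext_tag = int(classifiers[i].predict(patch)[0])     # external pattern tag
        codeword.append(poly_of_external[i][ext_tag])       # its polymorphous tag
    return np.array(codeword)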
The training and encoding of the proposed POLYFTC are illustrated in Fig.
3. The POLYFTC decoding, required for the face recognition application, is the same as that of the FTC. Using the polymorphous patterns has the following
advantages:
1. The proposed two-stage pattern extraction scheme excludes faces with significant external variations from the extraction of the inherent patterns, based on which the facial traits are selected and the code space is constructed. With these inherent patterns, which capture the inter-person variations, the resulting POLYFTC gives the maximum separation of different identities in its code space and hence the maximum error-correcting capability, since the corresponding d_min, the minimum codeword distance between identities in the FTC face set, is maximized (please refer to [8] for more details).
Fig. 3. The flow chart of the polymorphous pattern extraction and encoding.
2. A polymorphous pattern encompasses patch samples of the same inherent
pattern taken under various external conditions. If patch samples on two
faces actually belong to the same polymorphous pattern, no matter under
what external conditions the two faces are taken, their encoding results will
still be the same. This makes the proposed POLYFTC encoding robust to
faces taken under variations in illumination conditions or facial expressions.
3. The trait-specific classifier for the original FTC treats each pattern as a class.
If the FTC face set includes facial images taken under significant external
variations, there will be a large intra-pattern variation, which makes the
pattern classification difficult. The proposed POLYFTC sidesteps this difficulty by treating the external patterns, whose intra-pattern variations are relatively small, as classes. The result is a superior recognition rate, as will be reported
in the next section.
4. The introduction of external patterns makes the proposed POLYFTC applicable to the facial expression recognition problem, since the POLYFTC encoding actually recognizes the external patterns in addition to the polymorphous patterns. This is one of our ongoing research topics.
4 Experimental Results
To demonstrate the effectiveness of the proposed algorithm, we conducted several
experiments on the AR face database [11]. There are 126 different people (70 men
and 56 women) in the AR database. Each person participated in two sessions,
separated by two weeks (14 days). We include one neutral face (Fig. 4 (a)), three faces taken under three different facial expressions (Fig. 4 (b), (c), and (d)), and three faces taken under three different illumination conditions (Fig. 4 (e), (f), and (g)). These faces are aligned by the centers of the two eyes, converted to grayscale images, and resized to 80-by-100 pixels.
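The preprocessing described above can be sketched as follows, assuming OpenCV, a color input image, and known eye coordinates (e.g., from annotations); the target geometry values are illustrative, not those used in our experiments.

import cv2
import numpy as np

def align_face(bgr_img, left_eye, right_eye, out_w=80, out_h=100,
               eye_y=40, eye_dist=40):
    """Rotate/scale so the eyes are horizontal, crop to out_w x out_h, grayscale."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))            # in-plane rotation
    scale = eye_dist / max(np.hypot(rx - lx, ry - ly), 1e-6)
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, scale)
    M[0, 2] += out_w / 2.0 - center[0]                          # move the eye midpoint
    M[1, 2] += eye_y - center[1]
    aligned = cv2.warpAffine(bgr_img, M, (out_w, out_h))
    return cv2.cvtColor(aligned, cv2.COLOR_BGR2GRAY)            # 80x100 grayscale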
Fig. 4. Sample faces from the AR database: (a) neutral; (b)-(d) three facial expressions; (e)-(g) three illumination conditions.
We compared the performance of the proposed algorithm with two baseline
algorithms, Eigenface [12] and Fisherface [13], the LBP approach [14], and the
original FTC [8]. Note that instead of using the nearest-neighbor classifier for trait pattern classification [8], in this work we applied the Support Vector Machine (SVM) for both the FTC and the POLYFTC implemented in this paper, which gives about a 10% boost in identification accuracy. For both the FTC and the POLYFTC, 64 facial traits are selected to form 64-digit codewords.
Both the identification and verification results are reported.
4.1 Test Protocol and Performance Comparison
We randomly select 63 of the 126 identities, together with their samples, to form the training set. The facial samples belonging to the remaining 63 identities taken during the first session form the gallery set, and those taken during the second session form the probe set. Face recognition algorithms are trained using samples in the training set and are not allowed to alter their models afterward. The facial images in the gallery set are enrolled, and the images in the probe set are identified or verified against the gallery identities. The number of samples per person (SPP for short) in the gallery set is considered as a factor; it ranges from 1 to 7. SPP = 1 represents a very strict protocol in which only one image is enrolled, and it may be taken under strong illumination variation or with a slight facial expression (we do not use Fig. 4 (d) for enrollment, since in practice this is rarely the case). Meanwhile, the probe set includes images under all kinds of variations.
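The protocol can be sketched as follows; the session/image bookkeeping is simplified and the data structures are illustrative.

import random

def split_protocol(all_ids, images_by_id_session, spp, seed=0):
    """images_by_id_session[(pid, session)] -> list of images of that person/session."""
    rng = random.Random(seed)
    ids = list(all_ids)
    rng.shuffle(ids)
    train_ids, eval_ids = ids[:63], ids[63:]     # 63 training / 63 evaluation identities

    train = [img for pid in train_ids
             for s in (1, 2) for img in images_by_id_session[(pid, s)]]
    # Gallery: SPP first-session images per enrolled identity; probe: second-session images.
    gallery = {pid: images_by_id_session[(pid, 1)][:spp] for pid in eval_ids}
    probe = {pid: images_by_id_session[(pid, 2)] for pid in eval_ids}
    return train, gallery, probe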
Fig. 5 (a) shows the identification results, (b) shows the Equal Error Rates for the verification problem, and (c) shows the Hit Rates when the False Alarm Rate equals 0.01. Each data point in these figures is the result averaged over 20 rounds of random identity selection. Note that the performance of the Eigenface and LBP approaches decreases dramatically as SPP decreases, as expected, since the two algorithms do not learn any feature that is invariant to within-person variation in appearance. The proposed POLYFTC clearly outperforms all the other algorithms consistently. Note that POLYFTC outperforms FTC substantially
for the verification problem in Fig. 5 (c). The reason is that the minimum pairwise codeword distance, d_min, among the gallery codewords is typically around 50 for POLYFTC, while it is around 14 for FTC in our experiment, which means that codewords of different identities are far better separated in the POLYFTC code space than in the FTC code space. Table 1 summarizes the performance of all the algorithms for SPP equal to 1 and 7.

Fig. 5. Performance of the different algorithms: (a) identification, (b) verification Equal Error Rate, and (c) verification Hit Rate. In each plot the y-axis is the accuracy measure and the x-axis is the SPP.
algorithm       | SPP = 7: Ident  HIT   EER   | SPP = 1: Ident  HIT   EER
EIGEN [12]      |          0.80   0.67  0.102 |          0.42   0.22  0.298
FISHER [13]     |          0.78   0.64  0.118 |          0.62   0.46  0.149
LBP [14]        |          0.88   0.75  0.106 |          0.53   0.31  0.291
FTC [8]         |          0.85   0.69  0.056 |          0.66   0.49  0.106
POLYFTC         |          0.90   0.90  0.046 |          0.70   0.60  0.099
Table 1. Summary of the performance of all algorithms for SPP = 7 and SPP = 1 (Ident: identification rate; HIT: hit rate at a false alarm rate of 0.01; EER: equal error rate).
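For reference, the verification measures reported above can be computed from the Hamming distances of genuine (same identity) and impostor (different identity) probe-gallery pairs; the sketch below uses the standard definitions and is not the authors' evaluation code.

import numpy as np

def hit_rate_and_eer(genuine_dists, impostor_dists, far_target=0.01):
    genuine = np.asarray(genuine_dists, dtype=float)
    impostor = np.asarray(impostor_dists, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    hit_at_far, eer = 0.0, 1.0
    for t in thresholds:
        hit = np.mean(genuine <= t)              # accepted genuine pairs
        far = np.mean(impostor <= t)             # falsely accepted impostor pairs
        if far <= far_target:
            hit_at_far = max(hit_at_far, hit)    # Hit Rate at False Alarm Rate 0.01
        eer = min(eer, max(1.0 - hit, far))      # approximate Equal Error Rate
    return hit_at_far, eer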
5 Conclusion and Future Work
In this paper we propose a new type of the Facial Trait Code [8]. The proposed
algorithm applies a more sophisticated two-stage clustering scheme to extract
inherent and external patterns of human facial parts. The inherent patterns
capture the genuine variation in human facial appearance, while the external
patterns capture the variations caused by illumination or facial expressions. The proposed algorithm yields promising recognition results for faces under illumination and expression variations, and it achieves significantly better verification performance than the original Facial Trait Code.
The introduction of external patterns makes the proposed POLYFTC applicable to the facial expression recognition problem, since the POLYFTC encoding actually recognizes the external patterns in addition to the polymorphous patterns. Our
future work will develop an algorithm that simultaneously recognizes the facial
expression, illumination condition and identity of a given face.
References
1. Liao, R., Li, S.Z.: Face recognition based on multiple facial features. In: Proc. of the 4th IEEE Int. Conf. on Automatic Face and Gesture Recognition, Dekker Inc (2000) 239–244
2. Ahlberg, J.: Facial feature extraction using deformable graphs and statistical pattern matching. In: Swedish Symposium on Image Analysis, SSAB (1999)
3. Heisele, B., Serre, T., Prentice, S., Poggio, T.: Hierarchical classification and feature
reduction for fast face detection with support vector machines. Pattern Recognition
36(9) (2003) 2007–2017
4. Heisele, B., Ho, P., Wu, J., Poggio, T.: Face recognition: component-based versus
global approaches. CVIU 91(1) (2003) 6–12
5. Ivanov, Y., Heisele, B., Serre, T.: Using component features for face recognition.
In: FGR’04. (2004) 421
6. Heisele, B., Serre, T., Poggio, T.: A component-based framework for face detection
and identification. IJCV 74(2) (2007) 167–181
7. Jones, M.J., Viola, P.: Face recognition using boosted local features. Technical
report (2003)
8. Lee, P.H., Hsu, G.S., Chen, T., Hung, Y.P.: Facial trait code and its application to
face recognition. In: 4th International Symposium on Visual Computing. Volume
5359. (2008) 317–328
9. Figueiredo, M., Jain, A.: Unsupervised learning of finite mixture models. PAMI
24 (2002) 381–396
10. Dietterich, T.G., Bakiri, G.: Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research 2 (1995) 263–
286
11. Martinez, A., Benavente, R.: The AR face database. Technical Report 24, CVC
(1998)
12. Turk, M., Pentland, A.: Eigenfaces for recognition. Journal of Cognitive Neuroscience 3(1) (1991) 71–86
13. Belhumeur, P., Hespanha, J., Kriegman, D.: Eigenfaces vs. fisherfaces: Recognition
using class specific linear projection. PAMI 19(7) (1997) 711–720
14. Ahonen, T., Hadid, A., Pietikainen, M.: Face description with local binary patterns: Application to face recognition. PAMI 28(12) (2006) 2037–2041