A Multimodal Approach for Face
Modeling and Recognition
Advisor: Prof. 萬書言
Student: 何炳杰
1
Outline
Abstract
Introduction
3-D Face Recognition Based On Ridge Images And
Iterative Closest Points
2-D Face Recognition Based On Attributed Graphs
Fusing The Information From 2-D And 3-D
Experiments And Results
2
Abstract 1/3
In this paper, we present a fully automated multimodal
(3-D and 2-D) face recognition system.
For the 3-D modality, we model the facial image as a
3-D binary ridge image that contains the ridge lines
on the face.
We use the principal curvature k_max to extract the
locations of the ridge lines around the important facial
regions on the range image (i.e., the eyes, the nose,
and the mouth).
3
Abstract 2/3
For the 2-D modality, we model the face by an
attributed relational graph (ARG).
Each node of the graph corresponds to a facial
feature point. At each facial feature point, a set
of attributes is extracted by applying Gabor
wavelets to the 2-D image and assigned to the
node of the graph.
4
Abstract 3/3
Finally, we fuse the matching results of the
3-D and the 2-D modalities at the score
level to improve the overall performance of
the system.
5
Introduction 1/5
In this paper, we present a multimodal face
recognition system that fuses results from both 3-D
and 2-D face recognition.
The 2-D and 3-D modeling data in our system are
independent of each other, so the system can be
employed in different face recognition scenarios:
2-D or 3-D face recognition individually, or
multimodal face recognition.
6
Introduction 2/5
Fig. 1 illustrates a general block diagram of our system, which combines the 3-D binary ridge image model and the ARG model.
7
Introduction 3/5
For the 3-D modality:
(i) We use the principal curvature k_max to extract the locations of the
ridge lines around the important facial regions in the range image
(i.e., the eyes, nose, and mouth).
(ii) We represent the face image as a 3-D binary ridge image that
contains the ridge lines on the face.
(iii) In the matching phase, instead of using the entire surface of the
face, we only match the ridge lines.
(Step (iii) reduces the computation during the matching process.)
8
Introduction 4/5
For the 2-D modality, we build an attributed relational graph
using nodes at certain labeled facial points.
In order to automatically extract the locations of these facial points, we
use an improved version of the active shape model (ASM).
At each node of the graph, we compute the response of 40 Gabor
filters in eight orientations and five wavelengths.
The similarity between the ARG models is employed for 2-D
face recognition.
9
Introduction 5/5
In summary, the main contributions of this paper are:
presenting a fully automated algorithm for 3-D face recognition based on the
ridge lines of the face;
developing a fully automated algorithm for 2-D face recognition based on
attributed relational graph models;
presenting and comparing two methods for fusing the 2-D and 3-D face
recognition results, based on the Dempster–Shafer (DS) theory of evidence and
the weighted sum of scores technique;
evaluating the performance of the system using the FRGC2.0 database.
10
3-D Face Recognition Based On Ridge
Images And Iterative Closest Points 1/3
A. Ridge Images
Our goal is to extract and use the points lying on ridge
lines as the feature points on the surface.
For facial range images, these are points on the lines
around the eyes, the nose, and the mouth.
In the literature [13], ridges are defined as the points at
which the principal curvature of the surface attains a local
positive maximum.
Intuitively, valleys are the points that illustrate the drainage
patterns of the surface and are referred to as ridges when viewed from
the opposite side.
11
Fig. 2 shows an example of a ridge image obtained by thresholding k_max. It is a 3-D binary image
showing the locations of the ridge lines on the facial surface.
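As a rough sketch of this ridge extraction (assuming the range image is available as a 2-D depth array; the function name and threshold are illustrative, not the paper's implementation), k_max can be computed from the derivatives of the depth map and then thresholded:

import numpy as np

def ridge_image(z, threshold):
    # Minimal sketch: threshold the maximum principal curvature k_max of a
    # range image z(x, y) to obtain a 3-D binary ridge image.
    zy, zx = np.gradient(z)            # first derivatives of the depth map
    zxy, zxx = np.gradient(zx)         # second derivatives
    zyy, _ = np.gradient(zy)
    denom = 1.0 + zx**2 + zy**2
    # Gaussian (K) and mean (H) curvature of the surface z = f(x, y)
    K = (zxx * zyy - zxy**2) / denom**2
    H = ((1 + zx**2) * zyy - 2 * zx * zy * zxy + (1 + zy**2) * zxx) / (2 * denom**1.5)
    # Maximum principal curvature: k_max = H + sqrt(H^2 - K)
    k_max = H + np.sqrt(np.maximum(H**2 - K, 0.0))
    # Keep only the points whose k_max exceeds the threshold
    return (k_max > threshold).astype(np.uint8)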
12
3-D Face Recognition Based On Ridge
Images And Iterative Closest Points 2/3
B. Ridge Image Matching
In this work, we use a fast ICP variant [33].
The difference between the ICP used in this paper and the ICP in
[33] is in the feature point selection phase.
We do not rely on random sampling of the points and we use all of
the feature points in the 3-D ridge image during the matching
process.
Although random sampling of the points speeds up the
matching process, it has a major effect on the accuracy of the
final results (this is the authors' observation).
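A minimal sketch of this matching step, assuming the probe and gallery ridge images are given as N x 3 arrays of ridge points (function and parameter names are hypothetical, not the paper's code); all ridge points are used, with no random sampling:

import numpy as np
from scipy.spatial import cKDTree

def icp_ridge(probe, gallery, iterations=30):
    # Point-to-point ICP between two 3-D ridge point sets.
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(gallery)
    src = probe.copy()
    for _ in range(iterations):
        _, idx = tree.query(src)               # closest-point correspondences
        dst = gallery[idx]
        mu_s, mu_d = src.mean(0), dst.mean(0)
        # Best rigid transform for these correspondences (SVD / Kabsch)
        U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R_step = Vt.T @ D @ U.T
        t_step = mu_d - R_step @ mu_s
        src = src @ R_step.T + t_step           # apply the incremental transform
        R, t = R_step @ R, R_step @ t + t_step  # accumulate rotation and translation
    err = np.linalg.norm(src - gallery[tree.query(src)[1]], axis=1).mean()
    return R, t, err                            # err can serve as the 3-D match score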
13
3-D Face Recognition Based On Ridge
Images And Iterative Closest Points 3/3
Before matching the ridge images, we initially align the ridge
images using three extracted facial landmarks (i.e., the two inner
corners of the eyes and the tip of the nose).
We use a fully automated technique to extract these facial
landmarks, based on Gaussian curvature.
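A sketch of this pre-alignment, assuming the three landmarks of the probe and gallery faces are available as corresponding 3 x 3 arrays (the helper name is hypothetical):

import numpy as np

def align_from_landmarks(probe_lms, gallery_lms):
    # Rigid transform that maps the probe landmarks onto the gallery landmarks
    mu_p, mu_g = probe_lms.mean(0), gallery_lms.mean(0)
    U, _, Vt = np.linalg.svd((probe_lms - mu_p).T @ (gallery_lms - mu_g))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_g - R @ mu_p
    return R, t    # apply as x -> R @ x + t to the probe ridge image before ICP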
14
As shown in Fig. 3, a surface that has either a peak or a pit shape
has a positive Gaussian curvature value.
15
As shown in Fig. 4, a sample range image is displayed with the three extracted facial landmarks (the inner corners of the eyes and the tip of the nose).
16
2-D Face Recognition Based On Attributed
Graphs 1/14
Elastic bunch graph matching (EBGM) represents a facial image by
a labeled graph called a bunch graph, where edges are labeled with
distance information and nodes are labeled with wavelet responses
bundled in jets.
In addition, bunch graphs are treated as combinatorial entities in
which, for each fiducial point, a set of jets from different sample faces
is combined, thus creating a highly adaptable model.
17
2-D Face Recognition Based On Attributed
Graphs 2/14
In mathematics, a geometric graph is a graph in which the vertices
or edges are associated with geometric objects or configurations.
A triangulation is one technique for building a geometric graph.
A Delaunay triangulation is a graph defined from a set of points in
the plane by connecting two points with an edge whenever there exists a
circle containing only those two points.
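For illustration, such a Delaunay triangulation of a set of 2-D feature points can be obtained with scipy (the coordinates below are made up):

import numpy as np
from scipy.spatial import Delaunay

points = np.array([[120, 85], [180, 85], [150, 130], [135, 170], [165, 170]])
tri = Delaunay(points)
print(tri.simplices)   # each row lists the vertex indices of one triangle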
18
Delaunay triangulation
19
2-D Face Recognition Based On Attributed
Graphs 3/14
In this paper, the goal is to model 2-D facial images by
attributed relational graphs.
20
2-D Face Recognition Based On Attributed
Graphs 4/14
A. Building the Attributed Graph
An ARG [26] consists of a set of nodes, edges, and mutual
relations between them.
Let us denote the ARG by g = (V, E, R),
where V = {v_1, v_2, ..., v_N} is the set of N nodes of the graph
and E = {e_1, e_2, ..., e_M} is the set of M edges.
The nodes of the graph represent the extracted facial
features.
R is a set of mutual relations between the three edges of
each triangle in the Delaunay triangulation.
21
2-D Face Recognition Based On Attributed
Graphs 5/14
Mathematically, we write R = {r_ijk | e_i, e_j, e_k ∈ D_t},
where D_t is the set of triangles in the Delaunay
triangulation.
Recall that a Delaunay triangulation D_t(P) for a
set P of points satisfies the condition that no
point in P is inside the circumcircle of any
triangle in D_t(P).
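A minimal sketch of assembling such an ARG from the feature point coordinates, their Gabor jets, and the Delaunay triangulation; the per-triangle relation stored here (the three edge lengths) is only a placeholder, since the paper defines its mutual relations via Fig. 5:

import numpy as np
from scipy.spatial import Delaunay

def build_arg(points, jets):
    # g = (V, E, R): nodes carry the jets, edges come from D_t(P),
    # and one relation r_ijk is stored per Delaunay triangle.
    points = np.asarray(points, dtype=float)
    tri = Delaunay(points)
    V = list(zip(points, jets))                      # node attributes
    E, R = set(), []
    for (i, j, k) in tri.simplices:
        for a, b in [(i, j), (j, k), (k, i)]:
            E.add(tuple(sorted((int(a), int(b)))))   # triangle edges
        R.append([np.linalg.norm(points[a] - points[b])
                  for a, b in [(i, j), (j, k), (k, i)]])  # placeholder relation
    return V, sorted(E), R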
22
2-D Face Recognition Based On Attributed
Graphs 6/14
In the Gabor kernel, θ specifies the orientation of the wavelet, λ is the
wavelength of the sine wave, σ is the radius of the Gaussian, φ is
the phase of the sine wave, and γ specifies the aspect ratio of the
Gaussian.
The kernels of the Gabor filters are selected at eight orientations
(i.e., θ ∈ {0, π/8, 2π/8, 3π/8, 4π/8, 5π/8, 6π/8, 7π/8}) and five wavelengths
(i.e., λ ∈ {1, √2, 2, 2√2, 4}).
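A sketch of computing the 40 Gabor magnitudes at one feature point with OpenCV; the kernel size, sigma, and aspect ratio below are illustrative choices, not values taken from the paper:

import cv2
import numpy as np

orientations = [o * np.pi / 8 for o in range(8)]            # 8 orientations
wavelengths = [1, np.sqrt(2), 2, 2 * np.sqrt(2), 4]         # 5 wavelengths

def jet_at(img, x, y, sigma=2.0, ksize=21):
    # Magnitudes of the 40 complex Gabor responses at pixel (x, y)
    jet = []
    for lam in wavelengths:
        for theta in orientations:
            real = cv2.getGaborKernel((ksize, ksize), sigma, theta, lam, 1.0, 0)
            imag = cv2.getGaborKernel((ksize, ksize), sigma, theta, lam, 1.0, np.pi / 2)
            re = cv2.filter2D(img, cv2.CV_32F, real)[y, x]
            im = cv2.filter2D(img, cv2.CV_32F, imag)[y, x]
            jet.append(np.hypot(re, im))                    # complex magnitude
    return np.asarray(jet)                                  # 40 values per node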
23
2-D Face Recognition Based On Attributed
Graphs 7/14
Specifically, referring to Fig. 5, the mutual relations used in this
work are defined to be:
24
2-D Face Recognition Based On Attributed
Graphs 8/14
B. Facial Feature Extraction
In this paper, we transform the color image into HSV
space and assume that the three channels (i.e., hue,
saturation, and value) are statistically independent
and that the normalized first derivative for each channel
along a profile line follows a multivariate Gaussian
distribution.
25
2-D Face Recognition Based On Attributed
Graphs 9/14
The best match for a probe sample in HSV color space to a
reference model is found by minimizing the distance

f = Σ_{i ∈ {h, s, v}} w_i (g_i − ḡ_i)^T Σ_i^{-1} (g_i − ḡ_i)

where g_i is the sample profile; ḡ_i and Σ_i are the mean and the covariance
of the profile line of the i-th component of the Gaussian model, respectively;
and w_i is the weighting factor for that component of the model, with
the constraint that w_h + w_s + w_v = 1.
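A sketch of this matching distance under the stated assumptions (independent HSV channels, Gaussian profile models); the data layout and function name are hypothetical:

import numpy as np

def profile_distance(profiles, models, weights=(1/3, 1/3, 1/3)):
    # profiles[c]: normalized derivative profile sampled from channel c
    # models[c]:   (mean profile, covariance) learned for channel c
    # weights:     (w_h, w_s, w_v), constrained to sum to one
    f = 0.0
    for w, c in zip(weights, ("h", "s", "v")):
        d = profiles[c] - models[c][0]
        f += w * d @ np.linalg.inv(models[c][1]) @ d   # Mahalanobis term per channel
    return f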
26
2-D Face Recognition Based On Attributed
Graphs 10/14
C. Feature Selection
The number of feature points affects the
performance of the graph representation for face
recognition.
In this work, we initially extracted 75 feature
points and we then used a standard template to
add more features at certain positions on the face,
such as the cheek and the points on the ridge of
the nose.
27
2-D Face Recognition Based On Attributed
Graphs 11/14
By using the standard template (Fig. 6), the total number of
the feature point candidates represented by the nodes of the
ARG model was increased to 111 points.
28
2-D Face Recognition Based On Attributed
Graphs 12/14
Fig. 7 shows a sample face in the gallery along with the
candidate points for building the ARG model.
29
2-D Face Recognition Based On Attributed
Graphs 13/14
D. Recognition
Assume that the ARG models g_1 and g_2 of two faces are given. The
dissimilarity between these two ARGs is defined by

D(g_1, g_2) = w_v (1 − S_v(g_1, g_2)) + w_r D_r(g_1, g_2)

where S_v(·) and D_r(·) are functions that measure the differences
between the nodes of the graphs and between the mutual relations of the
corresponding triangles from the Delaunay triangulation,
respectively, and w_v and w_r are weighting factors.
30
2-D Face Recognition Based On Attributed
Graphs 14/14
The similarity measure S_v(·) is defined in terms of
a_j, the magnitudes of the set of 40 complex
coefficients of the Gabor filter responses, obtained
at the j-th node of the graph.
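The exact form of S_v is given in the paper; a plausible sketch, assuming the standard EBGM magnitude-jet similarity (normalized dot product, averaged over corresponding nodes), is:

import numpy as np

def jet_similarity(a, a_prime):
    # Normalized dot product of two 40-D Gabor magnitude jets
    return float(a @ a_prime / (np.linalg.norm(a) * np.linalg.norm(a_prime)))

def S_v(jets1, jets2):
    # Average node similarity over corresponding nodes of the two ARGs
    return float(np.mean([jet_similarity(a, b) for a, b in zip(jets1, jets2)]))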
31
Fusing The Information From 2-D And 3-D
1/4
The tanh-estimators score normalization is efficient and
robust and is defined as

s_j^n = (1/2) { tanh( 0.01 (s_j − μ_GH) / σ_GH ) + 1 }

where s_j and s_j^n are the scores before and after normalization,
and μ_GH and σ_GH are the mean and standard deviation
estimates, respectively, given by the Hampel estimators.
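A short sketch of the normalization step (the constant 0.01 follows the usual tanh-estimator definition):

import numpy as np

def tanh_normalize(scores, mu_GH, sigma_GH):
    # Map raw match scores into [0, 1] using the Hampel-based estimates
    scores = np.asarray(scores, dtype=float)
    return 0.5 * (np.tanh(0.01 * (scores - mu_GH) / sigma_GH) + 1.0)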
32
Fusing The Information From 2-D And 3-D
2/4
Hampel estimators are based on the following influence
function:

ψ(u) = u                                   for 0 ≤ |u| < a
ψ(u) = a · sign(u)                         for a ≤ |u| < b
ψ(u) = a · sign(u) · (c − |u|) / (c − b)   for b ≤ |u| < c
ψ(u) = 0                                   for |u| ≥ c

where sign(u) = +1 if u ≥ 0; otherwise, sign(u) = −1.
The Hampel influence function reduces the influence of
the scores at the tails of the distribution (identified by a, b,
and c).
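The same influence function written as a short sketch:

import numpy as np

def hampel_psi(u, a, b, c):
    # Identity near zero, clipped between a and b, decaying to zero between
    # b and c, and zero beyond c
    u = np.asarray(u, dtype=float)
    absu, s = np.abs(u), np.sign(u)
    out = np.where(absu < a, u, a * s)                            # |u| < b
    out = np.where(absu >= b, a * s * (c - absu) / (c - b), out)  # b <= |u| < c
    return np.where(absu >= c, 0.0, out)                          # |u| >= c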
33
Fusing The Information From 2-D And 3-D
3/4
B. Fusion Techniques
The weighted sum score fusion technique is defined as

s = Σ_{j=1}^{R} w_j s_j^n

where R is the number of modalities, w_j is the weight of the j-th modality
with the condition Σ_{j=1}^{R} w_j = 1, and s_j^n is the normalized score of
the j-th modality.
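A sketch of the weighted sum fusion (the scores in the usage line are illustrative):

import numpy as np

def weighted_sum_fusion(normalized_scores, weights):
    # s = sum_j w_j * s_j^n, with the weights summing to one
    w = np.asarray(weights, dtype=float)
    assert np.isclose(w.sum(), 1.0)
    return float(w @ np.asarray(normalized_scores, dtype=float))

fused = weighted_sum_fusion([0.91, 0.78], [0.7, 0.3])   # 3-D score, 2-D score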
34
Fusing The Information From 2-D And 3-D
4/4
In our case, w_1 and w_2 are the weights for the 3-D
and 2-D modalities, respectively.
Another fusion algorithm that we applied to combine the
results of the 2-D and 3-D face recognition is the DS
theory.
Based on the Dempster rule of combination, the match
scores obtained from two different techniques (i.e., the two
modalities in our work) can be fused by combining their mass functions:

m(A) = ( Σ_{B ∩ C = A} m_1(B) m_2(C) ) / (1 − K),  where K = Σ_{B ∩ C = ∅} m_1(B) m_2(C)

is the conflicting mass.
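A sketch of the combination step in the DS framework, assuming the two match scores have already been converted into basic probability assignments (the mass values below are illustrative; the paper's exact mapping from scores to masses is not shown here):

def dempster_combine(m1, m2):
    # Dempster's rule for two mass functions given as dicts over frozenset hypotheses
    combined, conflict = {}, 0.0
    for A, mA in m1.items():
        for B, mB in m2.items():
            inter = A & B
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mA * mB
            else:
                conflict += mA * mB              # mass falling on the empty set
    return {h: v / (1.0 - conflict) for h, v in combined.items()}  # normalize by 1 - K

frame = frozenset({"genuine", "impostor"})
m_3d = {frozenset({"genuine"}): 0.8, frame: 0.2}                                # from the 3-D score
m_2d = {frozenset({"genuine"}): 0.6, frozenset({"impostor"}): 0.1, frame: 0.3}  # from the 2-D score
print(dempster_combine(m_3d, m_2d))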
35
Experiments And Results 1/5
Fig. 8 shows the results of the
verification experiment for
neutral versus neutral facial
images.
As the ROC curve shows (also
the second row of Table II),
the 3-D modality has better
performance than the 2-D
modality (88.5% versus
79.80% verification at 0.1%
FAR) and the best verification
rate of multimodal (3-D + 2-D)
fusion belongs to the DS
combination rule (94.49% at
0.1% FAR).
36
Table II
37
Experiments And Results 2/5
Fig. 9 shows the verification
rate of the multimodal (3-D +
2-D) fusion, at 0.1% FAR,
with respect to different
weights for each modality.
Since there are only two
modalities, w_1 + w_2 = 1,
and the x-axis of Fig. 9 is w_1.
38
Experiments And Results 3/5
As the figure shows, the optimum weights that produce the
maximum fusion performance are 0.7 and 0.3, respectively,
for w1 and w2 .
39
Experiments And Results 4/5
Fig. 10 shows the average rank-one identification rate for
various numbers of subjects enrolled in the database.
40
Experiments And Results 5/5
Fig. 11 shows the cumulative match characteristic (CMC)
curve for the recognition, based on ridge images, of faces
with expressions using the FRGC v2.0 database.
41
Thank you!
42