Methodologies to Build Automatic Point Distribution Models for
Faces Represented in Images
Maria João M. Vasconcelos
João Manuel R. S. Tavares
Faculdade de Engenharia da Universidade do Porto
Instituto de Engenharia Mecânica e Gestão Industrial
Laboratório de Óptica e Mecânica Experimental
Rua Drº Roberto Frias s/n, 4200-465 Porto, PORTUGAL
ABSTRACT: This paper presents new methods to automatically build Point Distribution Models for faces
represented in images. These models consider significant points of faces from several images and study them
in order to obtain the mean shape of the object and its main modes of variation. Active Shape Models and
Active Appearance Models use the Point Distribution Model to segment the modelled object in new images.
In this paper, these models, their fully automatic construction and some application examples with objects
like faces represented in images are described.
1 INTRODUCTION
One of the most recent areas of interest in
Computational Vision is image analysis based on
flexible models. In this field, the use of statistical
methods for object modelling has proved to be
suitable to deal with problems in which the objects
have variable shapes.
This work is mainly concerned with the
employment of Point Distribution Models (PDMs) in
the modelling of objects represented in images
(Cootes et al. 1992). These models are obtained by
analysing the statistics of the co-ordinates of the
landmarks that represent the deformable object
under study: after aligning the object shapes, a
Principal Component Analysis is performed and the
mean shape of the object and the main modes of its
variation are obtained. The grey levels of the objects can also be
modelled and used to build Active Shape Models
(ASMs) and Active Appearance Models (AAMs), in
order to segment (identify) the modelled object in
new images.
These statistical models have been very useful for
image analysis in different applications of
Computational Vision. For instance, they can be
used on areas like: medicine, for locating bones and
organs in medical images; industry, for industrial
inspection; and security, for face recognition.
Usually, because it is performed manually, the
determination of the landmark points of the objects
to be modelled is the most time-consuming step of
the construction of PDMs, and so of ASMs and
AAMs as well. Consequently, some authors, like
(Hill & Taylor 1994, Baker & Matthews 2002, Hicks
et al. 2002, Angelopoulou & Psarrou 2004, Carvalho
& Tavares 2005, Vasconcelos 2005), have been
developing methodologies to fully automate this
stage. In this work, we present three methodologies
to automatically extract significant points from faces
represented in images.
The main goals of the present work are: to
introduce Point Distribution Models and their
variants, namely ASMs and AAMs; to build these
models for faces represented in images using fully
automatic procedures; and to apply them, namely to
the automatic segmentation of faces in new images.
This paper is organized as follows: in the next
section, the models considered are presented; in
section 3, our methods to automatically extract
landmark points of the faces to be modelled, using
the models previously presented, are described; in
section 4, some experimental results are presented;
finally, in the last section, some conclusions and
perspectives of future work are addressed.
2 POINT DISTRIBUTION MODEL
(Cootes et al. 1992) describe how to build flexible
shape models for objects called Point Distribution
Models. These models are generated from examples
of shapes of the object to be modelled, where each
shape is represented by a set of labelled landmark
points. The landmarks can represent the boundary or
significant internal locations of the object (Fig. 1).
Figure 1. Training image, landmarks and an image labelled
with the landmark points (from left to right).
In this modelling method, all the training
examples are aligned into a standard co-ordinate
frame and a Principal Component Analysis is
applied to the co-ordinates of the landmark points.
This produces the mean position for each landmark,
and a description of the main ways in which these
points tend to move together. The equation below
represents the Point Distribution Model or Shape
Model and can be used to generate new shapes:
x = x̄ + Ps bs ,   (1)

where x = ( x0, y0, x1, y1, ..., xn−1, yn−1 )^T represents
the n points of the shape, ( xk, yk ) the position of
point k, x̄ the mean position of the points,
Ps = ( ps1 ps2 ... pst ) the matrix of the first t modes of
variation, psi, corresponding to the most significant
eigenvectors in a Principal Component Analysis of
the position variables, and bs = ( bs1 bs2 ... bst )^T a
vector of weights for each mode.
If the shape parameters bs are chosen within
suitable limits (derived from the training set), then
the shapes generated by equation (1) will be similar
to those given in the original training set.
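Assuming the training shapes are already aligned into the common co-ordinate frame, the construction of the model and the generation of new shapes through equation (1) can be sketched as follows. This is a minimal NumPy illustration; names such as `build_pdm` and `generate_shape` are ours, not taken from any of the software referred to in this paper.

```python
import numpy as np

def build_pdm(shapes, t):
    """Build a Point Distribution Model from aligned training shapes.

    shapes: (m, 2n) array; each row is (x0, y0, ..., xn-1, yn-1).
    t: number of modes of variation to retain.
    Returns the mean shape and the matrix Ps of the first t modes.
    """
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes, rowvar=False)        # covariance of the co-ordinates
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # most significant modes first
    Ps = eigvecs[:, order[:t]]                # (2n, t) matrix of modes
    return mean, Ps

def generate_shape(mean, Ps, bs):
    """Equation (1): x = mean + Ps @ bs."""
    return mean + Ps @ bs

# Toy example: 10 slightly perturbed squares
rng = np.random.default_rng(0)
base = np.array([0, 0, 1, 0, 1, 1, 0, 1], dtype=float)
shapes = base + 0.01 * rng.standard_normal((10, 8))
mean, Ps = build_pdm(shapes, t=2)
x = generate_shape(mean, Ps, np.zeros(2))     # bs = 0 reproduces the mean shape
```

Setting each component of bs inside limits derived from the corresponding eigenvalue (e.g. ±3 standard deviations) keeps the generated shapes plausible.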
The local grey-level environment about each
landmark point can also be considered when
modelling an object represented in images. Thus,
statistical information is obtained about the mean
and covariance of the grey values of the pixels
around each landmark point. This information is
used in the PDM variants: to evaluate the match
between landmark points in Active Shape Models
and to construct the appearance models in Active
Appearance Models, as we explain next.
2.1 Active Shape Model
After building the PDM and the grey-level profiles
for each landmark point of an object, we can
segment that object in new images using Active
Shape Models, an iterative technique for fitting
flexible models to objects represented in images
(Cootes & Taylor 1992a).
This technique is an iterative optimisation
scheme for PDMs that allows initial estimates of the
pose, scale and shape of an object to be refined in a
new image. The approach can be summarized in the
following steps: 1) at each landmark point of the
model, the movement necessary to displace that
point to a better position is calculated; 2) the changes
in the overall position, orientation and scale of the
model which best satisfy those displacements are
calculated; 3) finally, any residual differences are
used to deform the shape of the model by calculating
the required adjustments to the shape parameters.
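Steps 2 and 3 above can be sketched as follows. This is a simplified illustration that assumes the better positions suggested by step 1 are already available; the least-squares similarity fit and the limiting of each shape parameter to ±3 standard deviations of its mode are common choices, not necessarily the exact ones of the original formulation.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform (scale, rotation, translation)
    mapping the (n, 2) point set src onto dst."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    s, d = src - mu_s, dst - mu_d
    a = (s * d).sum() / (s ** 2).sum()
    b = (s[:, 0] * d[:, 1] - s[:, 1] * d[:, 0]).sum() / (s ** 2).sum()
    A = np.array([[a, -b], [b, a]])            # scaled rotation matrix
    return A, mu_d - A @ mu_s

def asm_update(mean, Ps, eigvals, bs, targets):
    """One ASM iteration: fit the pose to the suggested points (step 2),
    then absorb the residual into the shape parameters (step 3),
    limited to +/- 3 sd per mode."""
    x = (mean + Ps @ bs).reshape(-1, 2)        # current model points
    A, tvec = fit_similarity(x, targets)
    # Map the targets back into the model frame and update the parameters
    inv = np.linalg.inv(A)
    local = (targets - tvec) @ inv.T
    bs_new = Ps.T @ (local.reshape(-1) - mean)
    limit = 3.0 * np.sqrt(eigvals)
    return np.clip(bs_new, -limit, limit), A, tvec
```

Iterating this update until the parameters stabilize refines the initial estimate of pose, scale and shape.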
In (Cootes et al. 1994), an improvement to the
active shape models that uses multiresolution is
presented. Initially, this method constructs a
multiresolution pyramid of the images to be
considered, by applying a Gaussian mask, and then
studies the grey-level profiles at the various levels of
the pyramid built, in this way making the active
models faster and more reliable.
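The pyramid construction can be sketched as follows; a minimal illustration in which the particular 5-tap binomial approximation to the Gaussian mask is our assumption, since the exact mask is not specified in the text.

```python
import numpy as np

def gaussian_pyramid(image, levels):
    """Build a multiresolution pyramid by smoothing each level with a
    small Gaussian-like mask and subsampling by a factor of 2."""
    kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # binomial mask
    pyramid = [image]
    for _ in range(levels - 1):
        img = pyramid[-1]
        # Separable smoothing: convolve the rows, then the columns
        img = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
        img = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, img)
        pyramid.append(img[::2, ::2])                     # subsample
    return pyramid

pyr = gaussian_pyramid(np.ones((64, 64)), levels=3)
```

The search then starts at the coarsest level and the result is propagated down to the finer levels.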
2.2 Active Appearance Model
This approach was presented in (Cootes et al. 1998)
and allows the building of texture and appearance
models. These models are generated by combining a
model of shape variation (a geometric model) with a
model of the appearance variations in a shape-normalized
frame. The statistical model of the shape
used is also described by equation (1). To build a
statistical model of the grey-level appearance, we
deform each example image so that its landmark
points match the mean shape of the object, using
a triangulation algorithm. We then sample the grey-level
information, gim, from the shape-normalized
image over the region covered by the mean shape.
To minimize the effect of global light variation, we
normalize this vector, obtaining g . By applying a
Principal Component Analysis to this data, we
obtain a linear model, the texture model:
g = ḡ + Pg bg ,   (2)

where ḡ is the mean normalised grey-level vector,
Pg is a set of orthogonal modes of grey-level
variation and bg is a set of grey-level model
parameters.
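The normalization and the PCA that yield the texture model of equation (2) can be sketched as follows; a minimal illustration in which the particular normalization shown (removing the mean grey level and scaling to unit norm) is one common choice, not necessarily the authors' exact one.

```python
import numpy as np

def normalize_texture(g_im):
    """Normalize a sampled grey-level vector to reduce the effect of
    global light variation."""
    g = g_im - g_im.mean()
    norm = np.linalg.norm(g)
    return g / norm if norm > 0 else g

def build_texture_model(samples, t):
    """PCA on normalized grey-level vectors: g = g_mean + Pg @ bg (eq. 2)."""
    G = np.array([normalize_texture(s) for s in samples])
    g_mean = G.mean(axis=0)
    cov = np.cov(G, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]         # most significant modes first
    return g_mean, eigvecs[:, order[:t]]
```

Each training image contributes one sample vector, taken over the region covered by the mean shape.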
Therefore, the shape and appearance of any
example of the object modelled can be defined by
vectors bs and bg .
Since there may be some correlation between the
shape and grey-level variations, we apply a further
Principal Component Analysis to the data of the
models. Thus, for each training example we generate
the concatenated vector:
b = [ Ws bs ; bg ] = [ Ws Ps^T ( x − x̄ ) ; Pg^T ( g − ḡ ) ] ,   (3)
where Ws is a diagonal matrix of weights for each
shape parameter, allowing the adequate balance
between the shape and the grey models. Then, we
apply a Principal Component Analysis on these
vectors, giving a further model:
b = Q c ,   (4)

where Q is the matrix of eigenvectors of b and c is
the vector of appearance parameters controlling both
the shape and the grey levels of the model. Thus, an
example object can be synthesized for a given c by
generating the shape-free grey-level object, from the
vector g, and deforming it using the landmark points
described by x.
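The further PCA of equations (3) and (4) can be sketched as follows; a minimal illustration in which choosing Ws as a scalar multiple of the identity (using the ratio of the total grey-level variance to the total shape variance) is one common option, not necessarily the one used by the authors.

```python
import numpy as np

def build_appearance_model(Bs, Bg, t):
    """Combine shape parameters Bs (m, ts) and texture parameters Bg (m, tg)
    into appearance parameters c via a further PCA (equations 3 and 4)."""
    # Weight the shape parameters so that shape and grey-level units
    # become commensurate (Ws = r * I)
    r = np.sqrt(Bg.var(axis=0).sum() / Bs.var(axis=0).sum())
    B = np.hstack([r * Bs, Bg])               # concatenated vectors b
    B -= B.mean(axis=0)                       # b is zero-mean by construction
    cov = np.cov(B, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    Q = eigvecs[:, order[:t]]                 # b = Q @ c
    C = B @ Q                                 # appearance parameters per example
    return Q, C, r
```

Each row of C parameterizes one training example; varying c along the columns of Q synthesizes new combined shape-and-texture instances.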
3 AUTOMATIC EXTRACTION OF LANDMARK POINTS
Figure 2 presents some results obtained in a
training image example using our first method to
automatically extract landmark points of faces
represented in images.
3.1 Face Contour Extraction
This method extracts significant points of faces
represented in images, namely on the chin, eyes,
eyebrows and mouth.
The first step of our method uses a skin detection
algorithm to localize the face region. This algorithm
uses a representative skin model, built with skin
samples of the individual under study. Studies like
(Jones & Rehg 1999, Tien et al. 2004, Zheng et al.
2004, Carvalho & Tavares 2005) show that skin
colour usually has the same luminance range and
that, by studying the skin chromatic colours, it is
possible to build a probability function for skin
regions.
Studies like (Campadelli et al. 2003) show that
the use of chrominance maps is useful for eyebrow
and eye localization in images. Chromatic colours
can be obtained from the RGB colour space using
the transformation:
Cr = R / ( R + G + B ) ,
Cb = B / ( R + G + B ) .   (5)
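A per-pixel implementation of transformation (5) can be sketched as follows (a minimal illustration; the function name and the array layout are ours):

```python
import numpy as np

def chromatic_coordinates(rgb):
    """Equation (5): intensity-normalized chromatic colours from RGB.
    rgb: (..., 3) float array with channels R, G, B."""
    total = rgb.sum(axis=-1)
    total = np.where(total == 0, 1.0, total)  # avoid division by zero
    cr = rgb[..., 0] / total                  # R / (R + G + B)
    cb = rgb[..., 2] / total                  # B / (R + G + B)
    return cr, cb

cr, cb = chromatic_coordinates(np.array([[100.0, 50.0, 50.0]]))
```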
Usually, eyes are characterized in the CbCr plane by
low values in the red component, Cr, and high
values in the blue component, Cb, so the
chrominance map for eyes can be defined by the
following equation:

EyeMap = (1/3) ( Cb² + ( Ĉr )² + ( Cb / Cr ) ) ,   (6)

where Cb², ( Ĉr )² and Cb / Cr are normalized to the
range [0, 255] and Ĉr is the negative of Cr (i.e.,
Ĉr = 255 − Cr). In our work, the EyeMap is also used
to identify the eyebrows region, with good results.
On the other hand, in our method the mouth region
is identified using the HSV colour space, where H,
S and V represent hue, saturation and value,
respectively; the mouth is habitually characterized
by high values in the saturation component.
By congregating the contours of the face,
eyebrows, eyes and mouth, it is possible to extract
landmark points from each of these zones.
Considering that the chin is the most important
segment of the face contour, we only use its inferior
part, between the ears.
Figure 2. a) Training image, b) segmentation result using the
skin algorithm, c) face contour extracted, d) eyebrows
and eyes found, e) mouth identified, and
f) final contours obtained.
3.2 Face Regular Mesh
The second method developed for the automatic
extraction of the landmark points of faces
represented in images is based on the work
presented in (Baker & Matthews 2004) which, to
construct active appearance models, considers the
landmark points as the nodes of a mesh defined on
the object to be modelled.
Our method starts by identifying the face and eye
regions as described in the last section, and then
adjusts a regular rectangular mesh to the detected
face region, rotating it according to the angle given
by the eyes' centroids. The nodes of the obtained
mesh are then considered as landmark points of the
object and used to build active appearance models
for it.
Figure 3 shows the face mesh obtained in a
training image example using this method to
automatically extract landmark points of faces
represented in images.
Figure 3. a) Training image, b) face regular mesh
(red points) adapted to the face region (face contour in blue)
and rotated according to the eyes direction (yellow).
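The eye chrominance map of equation (6) can be sketched as follows; a minimal illustration in which Cb and Cr are assumed to be 8-bit YCbCr chrominance channels, with the three terms rescaled to [0, 255] as the text indicates.

```python
import numpy as np

def rescale(x):
    """Linearly rescale an array to the range [0, 255]."""
    lo, hi = x.min(), x.max()
    return (x - lo) * 255.0 / (hi - lo) if hi > lo else np.zeros_like(x)

def eye_map(cb, cr):
    """Equation (6): EyeMap = (Cb^2 + (255 - Cr)^2 + Cb/Cr) / 3."""
    cr_neg = 255.0 - cr                        # the negative of Cr
    term1 = rescale(cb ** 2)
    term2 = rescale(cr_neg ** 2)
    term3 = rescale(cb / np.maximum(cr, 1.0))  # avoid division by zero
    return (term1 + term2 + term3) / 3.0

emap = eye_map(np.array([[100.0, 200.0]]), np.array([[150.0, 80.0]]))
```

High values of the map indicate pixels likely to belong to eye (and, in our use, eyebrow) regions.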
3.3 Face Adaptive Multiresolution Mesh
Finally, our third method combines the philosophy
of the first method described, using face, eye and
mouth localization, with that of the second method,
considering the landmark points as the nodes of the
defined meshes. Thus, this new method builds a
multiresolution mesh considering the face, eye and
mouth positions.
After localizing the face, eye and mouth regions
in the input image as described before, this new
method constructs adaptive meshes in the detected
eye and mouth regions, according to their
localization, and then adds additional nodes to the
large mesh (which contains the face region), defined
by its external edges and by the bounds of the
sub-meshes used in the eye and mouth regions.
One example of the final mesh obtained with our
third method for faces represented in images is
presented in Figure 4.
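The idea shared by the second and third methods, taking the nodes of meshes placed over the detected regions as landmark points, can be sketched as follows. This is a minimal illustration; `mesh_nodes`, the region co-ordinates and the mesh resolutions are ours, and the actual adaptive construction details are not reproduced.

```python
import numpy as np

def mesh_nodes(box, rows, cols, angle=0.0):
    """Generate the nodes of a regular (rows x cols) mesh over a bounding
    box (x0, y0, x1, y1), rotated by `angle` radians about the box centre."""
    x0, y0, x1, y1 = box
    xs = np.linspace(x0, x1, cols)
    ys = np.linspace(y0, y1, rows)
    nodes = np.array([(x, y) for y in ys for x in xs])
    centre = np.array([(x0 + x1) / 2.0, (y0 + y1) / 2.0])
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return (nodes - centre) @ R.T + centre     # rotate about the centre

# Coarse face mesh plus a finer sub-mesh over a hypothetical mouth region
face = mesh_nodes((0, 0, 100, 140), rows=5, cols=5)
mouth = mesh_nodes((30, 100, 70, 120), rows=3, cols=5)
landmarks = np.vstack([face, mouth])
```

The `angle` parameter plays the role of the rotation given by the eyes' centroids in the second method.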
Figure 4. a) Training image and b) example of an adaptive
multiresolution mesh obtained for a face.
In all the implementations developed for our
methods to automatically extract landmark points
of faces represented in images, we can choose the
parameters that define the resulting contour or mesh;
that is, the number of landmark points defined in
each interesting zone of the object to be modelled.
4 RESULTS
The methods described in this paper were used to
automatically build active shape and active
appearance models for objects like faces represented
in images.
During this work we developed an application in
MATLAB to build shape models, using the Active
Shape Models software (Hamarneh 1999). For the
appearance models, we used the Modelling and
Search Software available in (Cootes 2004). The
images used in this paper are available in (Cootes
2004a).
For modelling faces represented in images, we
used a training set of 22 images; another 4 images
were used just for testing purposes. The active shape
model was built using the first method presented in
this paper for the automatic extraction of landmark
points of faces represented in images, and the other
two methods presented for the same purpose were
used to build active appearance models.
We present results for active models using the
three approaches proposed for extracting landmark
points: the face contour method extracted 44
landmark points, 49 landmark points were extracted
with the regular mesh approach, and the third
method extracted 54 and 75 landmark points,
respectively.
For the active shape model built, using 44
landmark points, the first 10 modes of variation
could explain 90% of all the shape variance of the
object modelled.
For the first face shape model trained (face
contour), it was found that 95% of the shape
variance could be explained by only the first 13
modes of variation. On the other hand, for the
texture model, it was found that 95% of the variance
could be explained by the first 15 modes of
variation. Finally, the appearance model needs only
12 modes of variation to explain 95% of the
observed variance. The first four modes of
appearance variation are shown in Figure 5.
Figure 5. First four modes of appearance variation for
the face contour model built ( ±2 sd ).
For the model trained using the adaptive
multiresolution face mesh, it was found that 95% of
the variance of the object modelled could be
explained by only the first 3 modes of variation. On
the other hand, for the texture model, it was found
that 95% of the variance of the same object could be
explained by the first 14 modes of variation. Finally,
the appearance model needs only 8 modes of
variation to explain 95% of the observed variance of
the object modelled.
The first four modes of variation of the texture
and appearance models built are shown in Figure 6.
Figure 6. First four modes of appearance variation for the
adaptive multiresolution face mesh model
considered ( ±2 sd ).
Figures 7, 8 and 9 present some segmentation
results obtained in a test image using the active
appearance models built with the face contour
model, the regular face mesh model and the adaptive
face mesh model, respectively.
Figure 7. Test image with the initial position of the mean model
overlapped, and after the 1st, 7th, 12th, 17th and 21st iterations
of the search with the active appearance model built
for the face contour model.
Figure 8. Test image with the initial position of the mean model
overlapped, and after the 1st, 10th, 15th, 19th and 24th iterations
of the search with the active appearance model built
for the regular face mesh model.
Figure 9. Test image with the initial position of the mean model
overlapped, and after the 1st, 10th, 15th, 20th and 23rd iterations
of the search with the active appearance model built
for the adaptive face mesh model.
In the active appearance search process, 5 levels
of resolution were used and a maximum of 5
iterations was allowed per level. The active shape
models built using the alignment process that
considers the variance of the landmark points,
retaining 95% of the variance of the object modelled
and using grey-level profiles 7 or 15 pixels long,
were the ones that obtained the best segmentation
results. For the active appearance models, the ones
that obtained the best segmentation results
considered 99% of the variance and 50000 pixels for
the texture model.
For the face models, the mean segmentation error
was between 6.2 and 15.5 pixels for the active shape
model; for the active appearance models, it was
between 4.1 and 6.1 pixels using the face contour
extraction method, between 1.3 and 4.9 pixels using
the face regular mesh method, and between 1.5 and
3.7 pixels using the face adaptive multiresolution
mesh method. The mean error calculated for each
test image is the Euclidean distance between the
landmark points obtained by the model used and
those of the object to be segmented.
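The error measure just described can be sketched as follows, assuming the model landmarks and the ground-truth landmarks are in one-to-one correspondence (the function name is ours):

```python
import numpy as np

def mean_landmark_error(found, truth):
    """Mean Euclidean distance between corresponding (n, 2) landmark sets."""
    return np.linalg.norm(found - truth, axis=1).mean()

# Two landmarks, at distances 0 and 5 from their targets: mean error 2.5
err = mean_landmark_error(np.array([[0.0, 0.0], [3.0, 4.0]]),
                          np.array([[0.0, 0.0], [0.0, 0.0]]))
```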
5 CONCLUSIONS AND FUTURE WORK
A methodology to automatically build flexible
models for deformable objects represented in
images, namely faces, was presented, using a
statistical approach.
The methods developed to automatically extract
landmark points from faces represented in images
showed themselves to be reliable and allow the
building of active shape models and active
appearance models in a fully automatic way.
The segmentation results obtained in this work
showed that the active appearance models built with
the regular face mesh model and the adaptive face
mesh model present better results than the one built
with the face contour model.
In general, active appearance models allow the
construction of a robust model using relatively few
landmark points compared to active shape models;
so the former are preferred in problems in which the
extraction of landmark points is not an easy process.
For future work, the use of prior knowledge
about the physical properties of the objects to be
modelled can be considered in the building of their
statistical models. Another interesting line of work
is the study of the influence of the number of
training images used on the models built.
6 ACKNOWLEDGMENTS
This work was partially done in the scope of the
project “Segmentation, Tracking and Motion
Analysis of Deformable (2D/3D) Objects using
Physical Principles”, with reference
POSC/EEA-SRI/55386/2004, financially supported
by FCT – Fundação para a Ciência e a Tecnologia
from Portugal.
REFERENCES
Angelopoulou, A. N. and A. Psarrou (2004). Evaluating Statistical Shape Models for Automatic Landmark Generation on a Class of Human Hands. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Istanbul.
Baker, S. and I. Matthews 2002. Automatic Construction of Active Appearance Models as an Image Coding Problem. IEEE Transactions on Pattern Analysis and Machine Intelligence 26: 1380-1384.
Baker, S. and I. Matthews 2004. Automatic Construction of Active Appearance Models as an Image Coding Problem. IEEE Transactions on Pattern Analysis and Machine Intelligence 26: 1380-1384.
Campadelli, P., et al. (2003). A color based method for face detection. International Symposium on Telecommunications, Isfahan, Iran.
Carvalho, F. J. S. and J. M. R. S. Tavares (2005). Metodologias para identificação de faces em imagens: Introdução e exemplos de resultados. Congresso de Métodos Numéricos en Ingeniería 2005, Granada, Espanha.
Cootes, T. F. (2004). Build_aam. http://www.wiau.man.ac.uk/~bim/software/am_tools_doc/download_win.html.
Cootes, T. F. (2004a). Talking Face. http://www.isbe.man.ac.uk/~bim/data/talking_face/talking_face.html.
Cootes, T. F., et al. (1998). Active Appearance
Models. Proceedings of European Conference on Computer Vision, Springer.
Cootes, T. F. and C. J. Taylor (1992a). Active Shape
Models - 'Smart Snakes'. Proceedings of the
British Machine Vision Conference, Leeds.
Cootes, T. F., et al. (1992). Training Models of
Shape from Sets of Examples. Proceedings
of the British Machine Vision Conference,
Leeds.
Cootes, T. F., et al. (1994). Active Shape Models:
Evaluation of a Multi-Resolution Method for
Improving Image Search. British Machine
Vision Conference, BMVA.
Hamarneh, G. (1999). ASM (MATLAB).
http://www.cs.sfu.ca/~hamarneh/software/co
de/asm.zip.
Hicks, Y., et al. 2002. Automatic Landmarking for Building Biological Shape Models. International Conference of Image Processing, Rochester, USA 2: 801-804.
Hill, A. and C. J. Taylor (1994). Automatic Landmark Generation for Point Distribution Models. Fifth British Machine Vision Conference, England, York, BMVA Press.
Jones, M. J. and J. M. Rehg (1999). Statistical Color
Models with application to skin detection.
IEEE Conference on Computer Vision and
Pattern Recognition, Ft. Collins, CO, USA.
Tien, F.-C., et al. 2004. Automated visual inspection for microdrills in printed circuit board production. International Journal of Production Research 42, nº 12: 2477-2495.
Vasconcelos, M. J. 2005. MSc Thesis: Modelos
Pontuais de Distribuição em Visão
Computacional: Estudo, Desenvolvimento e
Aplicação. Estatística Aplicada e Modelação,
Universidade do Porto.
Zheng, H., et al. 2004. Blocking Adult Images Based on Statistical Skin Detection. Electronic Letters on Computer Vision and Image Analysis 4: 1-14.