Development of an Algorithm for Eye Gaze Detection
Based on Geometry Features
Ho Tan Thuan
Huynh Thai Hoang
Faculty of Electrical and Electronic Engineering
Ho Chi Minh City University of Technology
Ho Chi Minh City, Vietnam
Email: [email protected]
Faculty of Electrical and Electronic Engineering
Ho Chi Minh City University of Technology
Ho Chi Minh City, Vietnam
Email: [email protected]
Abstract—This paper describes a simple but effective algorithm for gaze detection based on eye geometry features. First, the human face and eyes are detected using Haar-like features and the Adaboost algorithm. Then a Canny filter is employed to extract edge features in the eye image. Next, the iris is estimated by a circle such that the average of the nearest distances between points on the circle and white pixels in the edge image is minimized. After that, the eyelids are detected based on the color saturation of the image and approximated by two second degree polynomials. The intersections between the two polynomials are the locations of the eye corners. Using the relative position between the center of the iris and the eye corners, the eye gaze can be recognized. The system requires only a low cost web camera and a personal computer. The accuracy of this gaze detection system is over 80% in good lighting conditions.
Keywords: image processing, face detection, Adaboost, gaze tracking, eye tracking, eye detection, gaze detection.

I. INTRODUCTION

Nowadays there is much research on eye gaze detection systems because of their usefulness for a wide range of applications, including medical equipment, music players or cars controlled by the eyes, and computer interfaces for handicapped people. However, gaze detection techniques still have many limitations.

One method often used to detect eye gaze is to illuminate the eyes with infrared light sources. One or more digital video cameras then capture images of the eyes and a computer estimates the gaze [1], [3], [7]. First, the glints of the IR light sources in the image of the eye are identified thanks to the brightness of the reflection on the dark pupil. Then the center of the pupil and the cornea are estimated using, for instance, circle estimation, template matching or a simplified model of the eye. From this information the eye gaze is estimated. This type of gaze tracking system is inexpensive and often has high accuracy. However, using infrared light sources to illuminate the eyes may damage them and is uncomfortable for users.

Based on computer vision techniques, some systems are truly non-intrusive because they use only the information already present in the images, without infrared light sources [5], [6], [8]. Several steps are used when gaze detection is based on image data. First, the face of the subject is detected. Then the location of the eye within the face region is recognized. Finally, the eye gaze is identified. There are two ways to estimate the gaze. The first is based on a set of sample images which can be trained by many different techniques (e.g. PCA, neural networks, etc.). Using the PCA technique [8], the detection system is simple and fast if the training set contains a small number of subjects, but it often confuses "looking straight", "looking up" and "looking down". To achieve high accuracy, the size of the training set needs to be increased; however, a larger training set slows the computation down. The second way is based on the geometry of the eye. This technique uses only information that already exists in the image, without a training set. The simple algorithm using geometry features developed in this article belongs to this second category.

This article is organized as follows: Section II reviews the theories of Haar-like features, the Adaboost algorithm and the Canny edge detector, which are the fundamentals of this study. In Section III, the development of the eye gaze detection algorithm based on geometry features is described in detail. Experimental results are presented in Section IV. Finally, conclusions are given in Section V.

II. REVIEW OF RELATED THEORIES

A. Haar-like Features

There are many reasons for using features rather than the pixels directly. The most important reason is that a feature-based system operates faster and is more effective at learning new, difficult knowledge than a pixel-based system.

Viola and Jones used three kinds of Haar-like features, illustrated in Fig. 1.

Figure 1. The Viola and Jones features

The value of every feature is the sum of the pixels within the white rectangular region(s) subtracted from the sum of the pixels within the black rectangular region(s).

Other Haar-like features are also used by this system; all of them are shown in Fig. 2.

Figure 2. Haar-like features (edge features, line features and center-surround features)
Rectangle features can be computed very efficiently using the integral image. The integral image at location (x, y) is defined as the sum of all the pixels within the rectangle ranging from the top left corner at location (0, 0) to the bottom right corner at location (x, y):

I(x, y) = \sum_{x' \le x, \; y' \le y} i(x', y')                (1)

where i(x, y) is the gray level of the input image.

After calculating the integral image, rectangle features can be computed quickly. For example, to calculate the sum of the pixels within region D in Fig. 3, the integral image values at P1, P2, P3 and P4 are used. We have:

sum(D) = (A + B + C + D) - (A + B) - (A + C) + A                (2)

or

sum(D) = I(P4) - I(P2) - I(P3) + I(P1)                (3)

Figure 3. Using the integral image to compute the sum of the pixels within rectangle D (P1, P2, P3 and P4 are the bottom right corners of regions A, A+B, A+C and A+B+C+D)
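For illustration, a minimal NumPy sketch of equations (1)-(3) is given below; it is not part of the original paper and the function names are ours.

```python
import numpy as np

def integral_image(gray):
    """I(x, y): sum of all pixels from (0, 0) to (x, y) inclusive, as in Eq. (1)."""
    return gray.cumsum(axis=0).cumsum(axis=1)

def rect_sum(I, top, left, bottom, right):
    """Sum of pixels inside a rectangle using four lookups, as in Eq. (3)."""
    total = I[bottom, right]
    if top > 0:
        total -= I[top - 1, right]
    if left > 0:
        total -= I[bottom, left - 1]
    if top > 0 and left > 0:
        total += I[top - 1, left - 1]
    return total

gray = np.arange(25, dtype=np.int64).reshape(5, 5)   # toy 5x5 "image"
I = integral_image(gray)
assert rect_sum(I, 1, 1, 3, 3) == gray[1:4, 1:4].sum()
```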
B. Adaboost Algorithm

AdaBoost, short for Adaptive Boosting, is a machine learning algorithm proposed by Yoav Freund and Robert Schapire. It can be used in conjunction with many other learning algorithms.

Viola and Jones introduced a face detection system capable of detecting faces in real time with a high success rate (about 90%). They combined efficient fast feature computation, the Adaboost algorithm and the cascade technique.

In order to detect an object in an image, all possible sub-windows need to be examined and classified as containing the object or not. This is done for different positions and different scales. To run in real time, Viola and Jones suggested using a cascade of classifiers for the face detection system, see Fig. 4.

Figure 4. A cascade of classifiers with n stages (each sub-window passes through stages 1, 2, ..., n; rejection at any stage means "not object", passing all stages means "object")

Each stage of the cascade, which is trained by Adaboost, accepts almost 100% of the positive samples and rejects 20-50% of the false samples. Every sub-window rejected at stage k is concluded not to contain a face and is ignored in the later stages. By linking n stages, the object in the image can be detected with a high rate.
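As an aside, this cascade-based detection is available in OpenCV. A minimal sketch (our own illustration, not the authors' code) assuming the pretrained Haar cascade files shipped with OpenCV and a hypothetical input file name:

```python
import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

img = cv2.imread("frame.jpg")                          # hypothetical input frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detectMultiScale slides the cascade over sub-windows at several positions and scales.
for (x, y, w, h) in face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
    face_roi = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(face_roi)
    print("face at", (x, y, w, h), "with", len(eyes), "eye candidate(s)")
```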
C. Canny Edge Detector

The Canny edge detection operator was developed by John F. Canny in 1986. He used a multi-stage algorithm to detect edges in images.

First, the Canny edge detector smoothes the image to eliminate noise before trying to locate any edges. A Gaussian filter is used in the Canny algorithm because of the simplicity of its filter mask. The equation of the Gaussian function in two dimensions is:

G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}                (4)

where x is the distance from the origin along the horizontal axis, y is the distance from the origin along the vertical axis, and σ is the standard deviation.

An example of a Gaussian mask with σ = 1.4 is shown in Fig. 5:

2    4    5    4    2
4    9   11    9    4
5   11   15   11    5
4    9   11    9    4
2    4    5    4    2

Figure 5. Discrete approximation to the Gaussian function with σ = 1.4
The larger the width of the Gaussian mask, the lower is the
detector's noise sensitivity.
After smoothing the image, the next step is to use the Sobel operator to calculate the gradients of the image. The operator uses a pair of 3x3 convolution kernels. If A is the source image, and Gx and Gy are two images which at each point contain the horizontal and vertical derivative approximations, we have:

G_x = A * \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}                (5)

and

G_y = A * \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}                (6)

where "*" is the convolution operator.
The magnitude of the gradient can be computed using the formula:

|G| = \sqrt{G_x^2 + G_y^2} \approx |G_x| + |G_y|                (7)

and the gradient's direction:

\theta = \arctan\left(\frac{G_y}{G_x}\right)                (8)

Finally, two thresholds are used, a high threshold T1 and a low threshold T2. Any pixel whose gradient value is greater than T1 is presumed to be an edge pixel. Then, any pixels that are connected to this edge pixel and have a gradient value greater than T2 are also presumed to be edge pixels.
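The smoothing, gradient and hysteresis steps above are available as single calls in OpenCV; a small sketch (ours, with a hypothetical file name and threshold values):

```python
import cv2

gray = cv2.imread("eye.png", cv2.IMREAD_GRAYSCALE)          # hypothetical eye image
blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)               # Gaussian smoothing, sigma = 1.4
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)   # low/high hysteresis thresholds (the paper's T2 and T1)
cv2.imwrite("eye_edges.png", edges)
```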
Examples of the Canny edge detector are illustrated in Fig. 6.

Figure 6. Examples of the Canny edge detector (source images and result images)

III. DEVELOPMENT OF THE ALGORITHM FOR EYE GAZE DETECTION

A. Structure of the Algorithm

The eye gaze is estimated by classifying it into four directions: (1) looking straight, (2) looking up, (3) looking left, and (4) looking right. The structure of the algorithm is illustrated in Fig. 7.

Figure 7. Structure of the algorithm (source image; face detection using the Adaboost algorithm; estimation of the eye locations; eye detection using the Adaboost algorithm; iris detection using a geometry model; eyelid detection based on color saturation and a geometry model; gaze estimation using the relative position between the canthi and the pupil)

B. Face and Eye Detection

Using the Haar-like features, the Adaboost algorithm and the cascade technique, the system is capable of detecting faces in real time with both a high detection rate and a very low false positive rate.

To determine the location of the eyes within the face region, two rectangles surrounding the left eye and the right eye are roughly estimated relative to the face. This speeds up the detection and reduces mistaken recognition of other parts of the face such as the nose, mouth, etc.
If W and H are the width and height of the rectangle surrounding the face, a Cartesian coordinate system as illustrated in Fig. 8 is used, with the origin (0, 0) at the top left corner of the face rectangle, the x axis along the width W and the y axis along the height H. The two rectangles surrounding the left eye and the right eye in the figure satisfy the following conditions:

- Location of point A: xA = W/6, yA = H/5
- Location of point A': xA' = W/2, yA' = H/5
- Width of the rectangles: w = W/3
- Height of the rectangles: h = H/3

Figure 8. Estimating the two rectangles surrounding the left eye and the right eye
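A direct transcription of these conditions into a small helper (our own naming; a sketch, not the authors' code):

```python
def eye_rectangles(face_x, face_y, W, H):
    """Two (x, y, w, h) eye search boxes inside a W x H face box (Fig. 8)."""
    w, h = W // 3, H // 3
    box_a = (face_x + W // 6, face_y + H // 5, w, h)        # rectangle anchored at A
    box_a_prime = (face_x + W // 2, face_y + H // 5, w, h)  # rectangle anchored at A'
    return box_a, box_a_prime
```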
After estimating the positions of the left and right eyes, their exact locations are detected by the Adaboost algorithm in the same way as for face detection.

C. Iris Detection

To detect the eye gaze, five parts of the human eye are focused on: the iris, pupil, sclera, eyelids and canthus, which are illustrated in Fig. 9.

Figure 9. Five main parts of the eye

Based on the relative position between the pupil (the center of the iris) and the canthi, where the upper and lower eyelids meet, the eye gaze can be estimated. Therefore, the iris and the two eyelids must be detected.

To recognize the iris within the eye region detected by the Adaboost algorithm, the Canny edge detector is used. Some example images are shown in Fig. 10. Because the gray level changes suddenly from the iris to the sclera, part of a circle surrounding the iris appears in the Canny edge image. To detect the iris, circles with center position (x, y) varying between centerx_min < x < centerx_max and centery_min < y < centery_max and radius varying between r_min < r < r_max are used to scan the Canny edge image. For each point (x_i, y_j) on these circles, we find the Canny edge pixel (x_i + δ_i, y_j + δ_j) nearest to (x_i, y_j) and compute the error distance d_ij between them. After considering all points on each circle, we calculate the average error distance:
\bar{d} = \frac{\sum_{i,j} d_{ij}}{\text{number of points on the circle}}                (9)

Figure 10. Results of the Canny edge detector on eye images (source images and result images)

The circle with the smallest average error distance is chosen as the iris.
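A minimal NumPy sketch of this brute-force circle search (our own naming; the nearest-edge-pixel search is done exhaustively here for clarity):

```python
import numpy as np

def fit_iris_circle(edge_img, centers_x, centers_y, radii, n_points=64):
    """Search for the circle whose points lie closest, on average, to the
    Canny edge pixels (the criterion of Eq. (9))."""
    ys, xs = np.nonzero(edge_img)                     # edge pixel coordinates
    edge_pts = np.stack([xs, ys], axis=1).astype(float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    best = (None, None, np.inf)                       # (center, radius, min average distance)
    for cx in centers_x:
        for cy in centers_y:
            for r in radii:
                circle = np.stack([cx + r * np.cos(angles),
                                   cy + r * np.sin(angles)], axis=1)
                # distance from each circle point to its nearest edge pixel
                d = np.sqrt(((circle[:, None, :] - edge_pts[None, :, :]) ** 2).sum(-1)).min(axis=1)
                d_avr = d.mean()
                if d_avr < best[2]:
                    best = ((cx, cy), r, d_avr)
    return best
```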
The flow chart of the iris detection algorithm is shown in Fig. 11.

Figure 11. Flow chart of the iris detection algorithm (initialize the circle center at (centerx_min, centery_min), the radius at r_min and the minimum average distance min_d_avr at infinity; for each pixel (i, j) on the current circle, find the nearest edge pixel (i + eps_i, j + eps_j) and compute d_ij = sqrt(eps_i^2 + eps_j^2); sum these distances and divide by the number of pixels on the circle to obtain d_avr; if d_avr < min_d_avr, update min_d_avr; then increase the center coordinates or the radius and repeat while x < centerx_max, y < centery_max and r < r_max)

Examples of iris detection are shown in Fig. 12.

Figure 12. Examples of iris detection (source images and result images)

D. Eyelid Detection

The upper eyelid and the lower eyelid meet at the canthus, so to recognize the position of the canthus the two eyelids are detected first. The eyelid detection algorithm is based on the color saturation and geometry features. The shapes of the two eyelids are modeled as two parabolas.

To detect the eyelids, the algorithm follows these steps:

- Convert the color system of the source image from RGB to HSV.
- Based on the saturation and gray level of the image pixels, detect the sclera. Using the sclera edge pixels and the iris edge pixels, recognize the eyelids.
- Interpolate the pixels on the eyelids with second degree polynomials.

While the RGB color space is based on the three basic colors Red, Green and Blue in a Cartesian coordinate system, the HSV color space is based on the three components Hue, Saturation and Value. The saturation component of a color pixel relates to the purity of the color: monochromatic light is pure, so its saturation is high. Because the sclera is white, it has a lower saturation value and a higher gray value than the neighboring regions.

The sclera boundary pixels, i.e. the pixels where the color saturation changes and whose gray level is greater than a threshold, are detected. Combining the edge pixels of the sclera and the iris, the eyelid edges are recognized. Examples of eyelid detection are shown in Fig. 13.

Figure 13. Examples of eyelid detection (source images and result images)

To locate the canthus, the upper eyelid pixels are first interpolated by a second degree polynomial y = a_1 x^2 + b_1 x + c_1.

If \varepsilon_i is the error of the pixel located at (x_i, y_i), we have:

\varepsilon_i = y_i - (a_1 x_i^2 + b_1 x_i + c_1)                (10)

Then the total squared error is:

S = \sum_{i=1}^{n} \varepsilon_i^2                (11)

S is minimized when a_1, b_1, c_1 are the solutions of the following system of equations:

\frac{\partial S}{\partial a_1} = 0, \quad \frac{\partial S}{\partial b_1} = 0, \quad \frac{\partial S}{\partial c_1} = 0                (12)

or

a_1 \sum_{i=1}^{n} x_i^2 + b_1 \sum_{i=1}^{n} x_i + c_1 n = \sum_{i=1}^{n} y_i
a_1 \sum_{i=1}^{n} x_i^3 + b_1 \sum_{i=1}^{n} x_i^2 + c_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i
a_1 \sum_{i=1}^{n} x_i^4 + b_1 \sum_{i=1}^{n} x_i^3 + c_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i^2 y_i                (13)

Similarly, we interpolate the lower eyelid pixels with another second degree polynomial y = a_2 x^2 + b_2 x + c_2.

The location of the canthus is the solution of the system:

y = a_1 x^2 + b_1 x + c_1
y = a_2 x^2 + b_2 x + c_2                (14)

Examples of interpolating the pixels on the eyelids by second degree polynomials are shown in Fig. 14.

Figure 14. Interpolating the pixels on the eyelids by second degree polynomials
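A short NumPy sketch of the eyelid fit and canthus localization described by (10)-(14); this is our own illustration (np.polyfit solves the same least-squares problem as the normal equations (13)).

```python
import numpy as np

def fit_eyelid(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c to eyelid boundary pixels
    (equivalent to solving the normal equations (13))."""
    a, b, c = np.polyfit(xs, ys, deg=2)
    return a, b, c

def canthus_points(upper, lower):
    """Intersections of the two parabolas, i.e. the solutions of system (14)."""
    a1, b1, c1 = upper
    a2, b2, c2 = lower
    roots = np.roots([a1 - a2, b1 - b2, c1 - c2])   # (a1-a2)x^2 + (b1-b2)x + (c1-c2) = 0
    xs = roots[np.isreal(roots)].real
    return [(x, a1 * x**2 + b1 * x + c1) for x in xs]
```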
E. Eye Gaze Estimation

To estimate the eye gaze, the relative position between the pupil and the canthi is used:

- If the distance between the iris center and the left canthus is less than a threshold T1, the eye is looking left.
- If the distance between the pupil and the right canthus is less than a threshold T2, the eye is looking right.
- If the distance between the pupil and the midpoint of the segment connecting the left canthus and the right canthus is less than a threshold T3, the gaze is looking straight. If this distance is greater than T3 and less than a threshold T4, the gaze is looking up.
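A direct transcription of these rules into Python (our own sketch; T1-T4 are tuning thresholds in pixels and are not specified in the paper):

```python
import math

def classify_gaze(pupil, left_canthus, right_canthus, T1, T2, T3, T4):
    """Apply the distance rules above to a pupil center and the two canthi."""
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    mid = ((left_canthus[0] + right_canthus[0]) / 2.0,
           (left_canthus[1] + right_canthus[1]) / 2.0)
    if dist(pupil, left_canthus) < T1:
        return "looking left"
    if dist(pupil, right_canthus) < T2:
        return "looking right"
    d_mid = dist(pupil, mid)
    if d_mid < T3:
        return "looking straight"
    if d_mid < T4:
        return "looking up"
    return "undetermined"
```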
IV. EXPERIMENTAL RESULTS

This section presents the experimental results to illustrate the performance of the proposed eye gaze detection algorithm. In these experiments 80 images were captured, of which 40 are in good lighting conditions and 40 are in high-brightness lighting conditions. Fig. 15 shows some test images in good lighting conditions. Tables 1 and 2 show the statistics of the number of successfully detected images. Although the proposed algorithm is able to detect the eye gaze in images captured by a low resolution web camera, it is still very sensitive to the illumination conditions. The success rate is approximately 80% in good lighting conditions. Because the canthus detection is based on color saturation, the accuracy is low in high-brightness lighting conditions. To improve the performance of the method, the eyelid detection algorithm needs to be improved.

TABLE 1. Number of successfully detected images in good lighting conditions

                                         Looking Straight  Looking Left  Looking Right  Looking Up
Number of source images                         10              10             10            10
Number of successfully detected images           8               9              8             8

TABLE 2. Number of successfully detected images in high-brightness lighting conditions

                                         Looking Straight  Looking Left  Looking Right  Looking Up
Number of source images                         10              10             10            10
Number of successfully detected images           3               3              4             2

Figure 15. Result images of the gaze detection system in good lighting conditions

V. CONCLUSIONS

This paper described our first effort in the development of an eye gaze detection algorithm for non-intrusive systems. The algorithm consists of the following steps: face and eye detection, iris estimation, eye corner localization and eye gaze determination. The algorithm is able to detect the eye gaze with a satisfactory success rate in good lighting conditions. Despite its simplicity and easy implementation, the algorithm still has some limitations. Because the canthus detection is based on color saturation, the results are not as good as expected in high-brightness lighting conditions. To improve the results to an acceptable level, the edge detector used in the iris and eyelid detection algorithms will be improved in the future to make the gaze detection system robust to different lighting conditions.
REFERENCES

[1] K. Talmi, J. Liu, "Eye and Gaze Tracking for Visually Controlled Interactive Stereoscopic Displays", Signal Processing: Image Communication 14, 1999.
[2] A. Jorgensen, "Adaboost and Histograms for Fast Face Detection", KTH Computer Science and Communication, 2006.
[3] T. Takegami, T. Gotoh, S. Kagei, and R. Minamikawa-Tachino, "A Hough Based Eye Direction Detection Algorithm without On-site Calibration", Proc. VIIth Digital Image Computing: Techniques and Applications, Sun C., Talbot H., Ourselin S. and Adriaansen T. (Eds.), 10-12 Dec. 2003, Sydney, pp. 459-468.
[4] H. Kashima, H. Hongo, K. Kato, K. Yamamoto, "A Robust Iris Detection Method of Facial and Eye Movement", VI 2001 Vision Interface Annual Conference, Ottawa, Canada, 7-9 June 2001.
[5] M.-C. Su, K.-C. Wang, G.-D. Chen, "An Eye Tracking System and Its Application in Aids for People with Severe Disabilities", Department of Computer Science and Information Engineering, National Central University, Chung Li, Taiwan, 2006, pp. 44-52.
[6] K.-N. Kim and R. S. Ramakrishna, "Vision-based Eye-gaze Tracking for Human Computer Interface", IEEE Trans. on Systems, Man, and Cybernetics, vol. 2, pp. 324-329, 1999.
[7] A. Pérez, M. L. Córdoba, A. García, R. Méndez, M. L. Munoz, J. L. Pedraze, F. Sánchez, "A Precise Eye-Gaze Detection and Tracking System", Facultad de Informática, Universidad Politécnica de Madrid, 2003.
[8] G. Bebis and K. Fujimura, "An Eigenspace Approach to Eye-Gaze Estimation", ISCA 13th International Conference on Parallel and Distributed Computing, pp. 604-609, Las Vegas, 2000.
Detecting four main objects: building, trees, sky and
ground surface for outdoor Mobile Robot
My-Ha Le
Graduate School of Electrical Engineering
University of Ulsan, UoU
Ulsan, Korea
[email protected]

Hoang-Hon Trinh
Graduate School of Electrical Engineering
University of Ulsan, UoU
Ulsan, Korea
[email protected]

Kang-Hyun Jo
Graduate School of Electrical Engineering
University of Ulsan, UoU
Ulsan, Korea
[email protected]
Abstract—This paper proposes a method based on context information and color to detect the main objects in complicated outdoor scenes for an outdoor mobile robot or intelligent system. An outdoor scene usually consists of four main objects (building, trees, sky and ground surface) plus some non-objects. In this method, we detect the building by using color, straight lines, PCs, edges and vanishing points. The trees are detected by using color features. The sky and the ground surface are detected by using context information and color.

Keywords: ground surface detection, building detection, trees detection, context information

I. INTRODUCTION
When a robot navigates in an outdoor scene, it needs a great deal of information about it. In order to collect that information, the basic function is detecting the main objects in the outdoor scene. Firstly, the building is detected by using line segments and their dominant vanishing points [7, 8]. The trees are then detected by using the RGB color space. The context information of the four main static objects (sky, trees, buildings and ground surface) is used to verify the tree regions: from top to bottom, the sky usually appears at the highest position, the positions of trees and buildings are close to each other, and the ground surface is usually located at the bottom of the image. The area of each tree candidate is also considered.

There has also been much research on detecting the road [1-7, 9]. All of it seems to be aimed at transportation systems, because only roads, including structured and unstructured models, are considered. For other intelligent systems, detecting only the road is not sufficient in real applications; for example, when a mobile robot is working in an urban environment it needs to analyze information about both the road and the court. Furthermore, detecting only the ground surface is not sufficient to support all the functions of an intelligent system; for example, to answer the question "where am I in the city?", the system should detect and recognize buildings, from which it can obtain more information than from the roads. Therefore, the detection of the four main static objects is necessary for an intelligent system.
A candidate of ground surface is the remained image. The
line segments and the remained candidate of tree in previous
steps are used to coarsely verify the ground surface. Then the
ground surface is identified with other objects such as car,
human and the non-object regions by multi-filters. Here, the
non-object region includes trees, buildings, etc. which appears
so far from camera and it’s information there is no meaning for
effecting transport system or robot. Therefore, we cannot detect
them in the previous steps. Filters are also used to classify the
ground surface with the car and human based on the color
information. The second one is based on the frequency of pixel
intensity to separate the ground surface and the non-object
region. Because the intensities of pixels in the road or court are
usually approximate to each other, so that it is considered as a
low frequency region. For the images do not contain building,
the non-object region is very important for classify the ground
surface and the sky.
II. BUILDING DETECTION
A. Method
We use line segments and belongings in the appearance of
building as geometrical and physical properties respectively.
The geometrical properties are represented as principal
component parts (PCPs) as a set of door, window, wall and so
on. As the physical properties, color, intensity, contrast and
texture of regions are used. Analysis process is started by
detecting straight line segments. We use MSAC to group such
parallel line segments which have a common vanishing point.
We calculate one dominant vanishing point for vertical
direction and five dominant vanishing points in maximum for
horizontal direction. A mesh of basic parallelograms is created
by one of horizontal groups and vertical group. Each mesh
represents one face of building. The PCPs are formed by
merging neighborhood of basic parallelograms which have
similar colors. The PCPs are classified into doors, windows
and walls. Finally, the structure of building is described as a
system of hierarchical features. The building is represented by
number of faces. Each face is regarded by a color histogram
vector. The color histogram vector just is computed by wall
region of face.
Figure 1. Flow chart of the proposed algorithm (building image; detection of line segments; reduction of noise; MSAC-based detection of dominant vanishing points; separation of planes as the faces of the building; creation of a mesh of basic parallelograms; detection of PCPs; construction of color histogram vectors; number of building faces and the corresponding color histograms)

Line segment detection

The first step of the line segment detection is the edge detection of the image. We used the edge detection function with the Canny edge detector algorithm; the function is run with an automatically chosen threshold. The second step is line segment detection following the definition: "A straight line segment is a part of an edge consisting of a set of pixels whose number is larger than a given threshold (T1) and in which all pixels are aligned. That means that, if we draw a line through the two ends, the distance from any pixel to this line is less than another given threshold (T2)."

Reducing the low contrast lines

The low contrast lines usually come from the scene, such as electrical lines or the branches of trees. Most of them do not lie on the edges of PCPs, because the edge of a PCP separates the image into two regions of highly contrasting color. We use the intensity of the two regions beside each line to discard the low contrast lines.

MSAC-based detection of dominant vanishing points

The line segments are coarsely separated into two groups. The vertical group contains line segments which make an angle of at most 20° with the vertical axis. The remaining lines are treated as horizontal groups. For the fine separation stage, we use MSAC (m-estimator sample consensus) [11, 12] to robustly estimate the vanishing points.

Horizontal vanishing point detection

Horizontal vanishing point detection is performed similarly to the previous section. In reality, a building is a prototypical structure in which many faces and various colors appear in images; therefore, it is necessary to separate the faces. We calculate at most five dominant vanishing points for the horizontal direction.

Separation of the planes as the faces of building

The vertical segments are extended using their middle points and the vertical vanishing point. We use the number of intersections of the vertical lines with the horizontal segments to detect and separate the planes as the faces of the building. The coarse stage of face separation is performed by the following rules:
1. If the same region contains two or more horizontal groups, then priority is given to the group with the larger number of segment lines.
2. If two or more horizontal groups are distributed along the vertical direction, then priority is given to the group whose dominant vanishing point has the lower order (the order follows the red, green, blue, yellow, magenta colors of the horizontal groups).
The second stage is the recovery stage. Some horizontal segments located close to the vanishing line of two groups are usually mis-grouped: instead of belonging to lower order groups, they fall in higher order groups, so they must be recovered. The recovery stage is performed from low to high order. The third stage is finding the boundaries of the faces.

Furthermore, the PCPs are detected by merging neighboring parallelograms which have similar physical properties. We use the RGB color space to represent the characteristics of the basic parallelograms. In reality, the light energy reaching the camera from different regions of a large object such as a building is different, so the intensities of different regions are not the same even though they lie on the same PCP, for example the same wall.

Descriptor representation

The descriptor vector of the wall region is constructed in the RGB color space. The histograms of the R, G and B components are computed and quantized into 32 bins. In order to avoid boundary effects, which cause the histogram to change abruptly when some values shift smoothly from one bin to another, we use linear interpolation to assign weights to adjacent histogram bins according to the distance between the value and the central value of the bin. The histogram vectors are modified to reduce the effects of scale change: the vectors are divided by the total number of pixels and then normalized to unit length. Finally, the three component vectors HR, HG and HB are concatenated into an indexing vector H to represent each wall region. Each face of a building is represented by three vectors: two of them are formed from two regions of the face, and the union of these two face regions creates the other indexing vector.
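A minimal NumPy sketch of this wall-region descriptor (our own simplification under the stated 32-bin, soft-binning, unit-length assumptions; not the authors' code):

```python
import numpy as np

def wall_descriptor(region_bgr):
    """96-D indexing vector: 32-bin histograms of the three color channels with
    linear interpolation between adjacent bins, normalized to unit length."""
    hist = []
    for ch in range(3):
        values = region_bgr[..., ch].astype(float).ravel()
        pos = values / 255.0 * 31.0               # continuous bin position in [0, 31]
        lo = np.floor(pos).astype(int)
        hi = np.minimum(lo + 1, 31)
        w_hi = pos - lo                           # weight shared with the next bin
        h = np.bincount(lo, weights=1.0 - w_hi, minlength=32) \
            + np.bincount(hi, weights=w_hi, minlength=32)
        hist.append(h / values.size)              # divide by the total number of pixels
    H = np.concatenate(hist)
    return H / (np.linalg.norm(H) + 1e-12)        # normalize to unit length
```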
B. Result of building detection

After detecting the building faces, we eliminate the building from the image.

Figure 2. Building detection and elimination

III. TREES DETECTION
A. Method
To extract region of tree, we use cues of color. Given a set
of simple color point representative of color of tree, we obtain
an estimate of “average” or “mean”. Let this average color be
denoted by the RGB column vector t. the next objective is to
classify each RGB pixel in an image as having a color in the
specified range or not. To perform this comparison, we need a
measure of similarity. One of the simplest measures is the
Euclidean distance. Let x denote an arbitrary point in RGB
space. We say that x is similar to t if the distance between
them is less than a specified threshold, T. The Euclidean
distance between x and t is given by
D(x, t) = \|x - t\| = [(x - t)^T (x - t)]^{1/2} = [(x_R - t_R)^2 + (x_G - t_G)^2 + (x_B - t_B)^2]^{1/2}

where \|\cdot\| is the norm of the argument and the subscripts R, G and B denote the RGB components of the vectors t and x. The locus of points such that D(x, t) ≤ T is a solid sphere of radius T. The points contained within or on the surface of the sphere satisfy the specified color criterion; points outside the sphere do not. Coding these two sets of points in the image with, say, black and white produces a binary, segmented image.
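A small sketch of this color-distance segmentation (our own naming; t_rgb and T would come from the training samples and a chosen threshold):

```python
import numpy as np

def tree_mask(img_rgb, t_rgb, T):
    """Binary segmentation by Euclidean distance to the mean tree color t:
    pixels with D(x, t) <= T are marked as tree (value 255)."""
    diff = img_rgb.astype(float) - np.asarray(t_rgb, dtype=float)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist <= T).astype(np.uint8) * 255
```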
B. Result of trees detection

After detecting the trees, we also eliminate them from the image. In the next step we detect the sky and the ground surface using context information.

Figure 3. Trees detection and elimination

IV. DETECTION OF SKY AND GROUND SURFACE

A. Method

We use absolute context to refer to the location of objects in the image. The sky is always at the top of the image, and based on this characteristic we can detect the sky with context information. Clouds always exist inside the sky region, and since they are difficult to extract separately, the region extraction uses a feature common to sky and cloud and merges them into one region. The intensity of a cloud is generally higher than that of the sky on a fine day. Region segmentation extracts the cloud region after the sky region has been extracted, and the result merges the regions of the sky and the clouds. The sky appears at the top of the image and the clouds give context information because they always exist in the sky. Several color spaces, including RGB, HSI, CIE, YIQ, YCbCr, etc., are widely used; in this paper, we use the RGB color space.

A candidate for the ground surface is the remaining image, which contains some objects that could not be detected in the previous steps. The ground surface is distinguished from these objects, such as cars, humans and the non-object regions, by color detection and filters. Here, the non-object region is what appears very far away or carries no information for the navigation of the robot or intelligent system. We use two kinds of filters for ground surface detection. Based on color information, we can separate the ground surface from cars, humans and other small objects. To separate the ground surface from the non-object region, we use the frequency characteristic of the ground surface region: because the intensities of pixels on a road or court are usually close to each other, it is considered a low frequency region.

B. Result of sky and ground surface detection

The results for sky and ground surface detection are shown below.
Figure 4. Sky and ground surface detection

V. CONCLUSION

This paper proposed a method to detect four main objects in an outdoor environment using multiple cues: color, straight lines, context information, PCs, edges and vanishing points. Combining those features, we can segment the image into several regions such as building, sky, trees and ground surface. We detect the building using color, straight lines, PCs, edges and vanishing points. The tree regions are extracted by using color features. The sky can also be detected by using color and context information. The remaining image is the candidate for the ground surface; color and filters are then used to identify the ground surface. Because of the complexity of outdoor scenes, the simulation results are not perfect. In the future, we will keep studying outdoor object detection for outdoor mobile robots with other approaches; using image sequences, an omni-directional camera or multiple filters are the next choices.

ACKNOWLEDGMENT

The authors would like to thank Ulsan Metropolitan City, MKE and MEST, which have supported this research in part through the NARC, the Human Resource Training project for regional innovation through KIAT, the post BK21 project at the University of Ulsan, and the Ministry of Knowledge Economy under the Human Resources Development Program for Convergence Robot Specialists.

REFERENCES

[1] R. Arnay, L. Acosta, M. Sigut and J. Toledo, "Ant Colony Optimization Algorithm for Detection and Tracking of Non-structured Roads", Electronics Letters, Vol. 44, No. 12, pp. 725-727, 5th June 2008.
[2] Q. Gao, Q. Luo and S. Moli, "Rough Set based Unstructured Road Detection through Feature Learning", Proceedings of the IEEE International Conference on Automation and Logistics, pp. 101-106, August 18-21, Jinan, China, 2007.
[3] Y. Guo, V. Gerasimov and G. Poulton, "Vision-Based Drivable Surface Detection in Autonomous Ground Vehicles", Proceedings of the 2006 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 3273-3278, October 9-15, 2006.
[4] Y. He, H. Wang and B. Zhang, "Color-Based Road Detection in Urban Traffic Scenes", IEEE Transactions on Intelligent Transportation Systems, vol. 5, no. 4, pp. 309-318, December 2004.
[5] J. Huang, B. Kong, B. Li and F. Zheng, "A New Method of Unstructured Road Detection Based on HSV Color Space and Road Features", Proceedings of the 2007 International Conference on Information Acquisition, pp. 596-601, July 9-11, 2007.
[6] D. Song, H. N. Lee, J. Yi, A. Levandowski, "Vision-Based Motion Planning for an Autonomous Motorcycle on Ill-Structured Roads", Autonomous Robots, Vol. 23, No. 3, pp. 197-212, 2007.
[7] H. H. Trinh and K. H. Jo, "Image-based Structural Analysis of Building Using Line Segments and Their Geometrical Vanishing Points", SICE-ICASE, Oct. 18-21, 2006.
[8] H. H. Trinh, D. N. Kim and K. H. Jo, "Facet-based Multiple Building Analysis for Robot Intelligence", Journal of Applied Mathematics and Computation (AMC), Vol. 205(2), pp. 537-549, 2008.
[9] Y. Wang, D. Chen and C. Shi, "Vision-Based Road Detection by Adaptive Region Segmentation and Edge Constraint", Second International Symposium on Intelligent Information Technology Application, Vol. 1, pp. 342-346, 2008.
[10] M. Lievin and F. Luthon, "Nonlinear Color Space and Spatiotemporal MRF for Hierarchical Segmentation of Face Features in Video", IEEE Trans. on Image Processing, vol. 13, pp. 63-71, 2004.
[11] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with application to image analysis and automated cartography", Communications of the ACM, Vol. 24, Issue 6, pp. 381-395, 1981.
[12] R. Hartley and A. Zisserman, "Multiple View Geometry in Computer Vision", Cambridge University Press, 2004.
Entrance Detection using Multiple Cues
Suk-Ju Kang
Hoang-Hon Trinh
Dae-Nyeon Kim
Kang-Hyun Jo
Graduate School of
Electrical Engineering
University of Ulsan, UOU
Ulsan, Korea
[email protected]
Graduate School of
Electrical Engineering
University of Ulsan, UOU
Ulsan, Korea
[email protected]
Graduate School of
Electrical Engineering
University of Ulsan, UOU
Ulsan, Korea
[email protected]
Graduate School of
Electrical Engineering
University of Ulsan, UOU
Ulsan, Korea
[email protected]
Abstract—This paper describes an approach to detecting the entrance of a building for an outdoor robot. The entrance is an important component which connects the internal and external environments of a building for the navigation of the robot. First, the building surfaces are detected, and then the wall region and the windows are detected. The remaining regions, excluding the wall region and the windows, are candidates for the entrance. To detect the entrance, we use the information of the windows in rectangular shape. Geometrical characteristics such as the height of the windows and the height of the floors are used for extracting the entrance, and we adopt a probabilistic approach for entrance detection by defining the likelihood of various features. The proposed approach captures both shape and color.

Keywords: entrance detection, probabilistic model, geometrical characteristics
I. INTRODUCTION

It is important to find the entrance of a building in the external environment. The robot has to recognize the entrance for navigation, because the entrance of the building connects the external environment to the internal environment. The features of the entrance are similar to those of doors and windows, such as vertical lines and corners [1, 3-7]. Indoor door detection has been studied numerous times in the past, and the entrance is a part of the building. In [2], the authors detect the entrance and the windows in order to recognize the building; they use laser scanners and a CCD camera. Research approaches are based on a variety of sensors. For example, doors are detected by sonar sensors and vision sensors with range data and visual information respectively [3]. In [4-6], the authors use a CCD camera to obtain geometrical information and color information. The authors in [4] use a fuzzy system to analyze the existence of doors and a genetic algorithm to improve the door regions. In [5], the authors detect the doors with a probabilistic method using shape and color information. In addition, research using laser sensors has been reported [7]. Our laboratory has studied building recognition [8-11]. We use the algorithm developed in our laboratory for building recognition and look for the entrance in connection with the window detection [8-11]. Fig. 1 shows an overview of the proposed method, where wall region, surface detection and window detection have been done in our previous works. The building has three kinds of principal components, namely wall, windows and entrance, so the regions which are not matched to the wall and the windows are considered as candidates for the entrance.
Figure 1. An overview of proposed method
II. SURFACE, WALL REGION AND WINDOW DETECTION
The processes for detecting building surfaces, estimating wall regions and detecting windows were explained in detail in our previous works [8-12]. First, we detected line segments and then roughly rejected the segments which come from the scene, such as trees, bushes and so on. The MSAC algorithm is used for clustering segments into the common dominant vanishing points, comprising one vertical and several horizontal vanishing points. The number of intersections between the vertical lines and the horizontal segments is counted to separate the building pattern into independent surfaces, and then we find the boundaries of the surfaces, shown as the green frames in the second row of Fig. 2. To extract the wall region, we use the color information of all pixels in the detected surface [10]. First, a hue histogram of the surface is calculated and smoothed several times by a 1D Gaussian filter. The peaks in the smoothed histogram are detected. The continuous bins that are larger than 40% of the highest peak are clustered into separate groups, and the pixels indexed by each continuous bin group are clustered together. The pixels of each group are segmented again with the hue value replaced by the gray intensity information, and the biggest group of pixels is then chosen as the wall region. Finally, the candidates for windows are the remaining regions that do not belong to the wall region. To detect the windows, the rectangular image is considered; we use geometrical characteristics to obtain the window regions and then perform alignment. Fig. 2 shows several examples: the first row contains the original images, the second row shows the wall region results, and the third row illustrates the windows.
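A minimal OpenCV/NumPy sketch of this hue-histogram wall step (our own simplification under the stated 40% rule; the paper additionally re-segments by gray intensity):

```python
import cv2
import numpy as np

def wall_pixels(surface_bgr, smooth_iters=5):
    """Boolean mask of candidate wall pixels: smooth the hue histogram and keep
    the pixels whose hue bins exceed 40% of the highest peak."""
    hsv = cv2.cvtColor(surface_bgr, cv2.COLOR_BGR2HSV)
    hue = hsv[..., 0]                                   # OpenCV hue range is 0..179
    hist = np.bincount(hue.ravel(), minlength=180).astype(float)
    kernel = np.array([0.25, 0.5, 0.25])
    for _ in range(smooth_iters):                       # repeated 1-D Gaussian smoothing
        hist = np.convolve(hist, kernel, mode="same")
    strong_bins = hist >= 0.4 * hist.max()              # bins above 40% of the highest peak
    return np.isin(hue, np.nonzero(strong_bins)[0])
```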
Figure 2. Detection of building surfaces, wall regions and windows

III. ENTRANCE DETECTION

We acquire the height between floors, the floors of the building and so on from the binarized rectangular image. The vertical and horizontal lines are detected by the Hough transform. Finally, a probabilistic model using the lines and the green channel of the RGB color space is used to decide the entrance region. Fig. 3 presents the entrance detection process.

Figure 3. Entrance detection algorithm

A. Noise Reduction by Geometrical Information

Geometrical information is acquired from the window image and from the candidate image of the entrance. We obtain the window positions and the positions of the entrance candidates in the image, and the necessary quantities are then computed from this information. The necessary information from the window image is the height of the windows h_w, the height of the floors h_f and the position of the second floor h_wp. The necessary information from the candidate image is the height of the regions h_nw and the position of the regions h_nwp, as defined in equation (1). The height of a floor is defined from a window to an adjacent window. Normally, the entrance is near the bottom and is taller than the windows; it also does not lie above the second floor, as expressed in equation (2).

h_w = x1w_max - x1w_min
h_f = x1w_min - x2w_min
h_wp = x3w_min
h_nw = x1nw_max - x1nw_min
h_nwp = x1nw_min                (1)

h_nw > h_w
h_nwp \le h_wp                (2)
Figure 4. window image and the candidate image of entrance
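A one-function sketch of the constraint check in (2) (our own naming, following the variable names of equation (1)):

```python
def entrance_candidate(h_nw, h_nwp, h_w, h_wp):
    """Constraint (2): a candidate must be taller than the windows and must not
    start above the second floor."""
    return (h_nw > h_w) and (h_nwp <= h_wp)
```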
B. Line segments

First, the vertical and horizontal lines are extracted by the Hough transform in the rectangular image. The Hough transform converts the lines into points in Hough space [14]; these accumulated points become lines in image space again. We compute the distances between the x-coordinates and between the y-coordinates of the starting point and the end point of each line l_i.
When the distance D_i is 5 pixels or less, the line is taken as a vertical line, as in equation (3). Horizontal lines are handled in the same way, considering the other axis. Fig. 5(a) shows the method and Fig. 5(b) shows the results of applying the equation below.

D_i = |x_a - x_b|   (for vertical lines)
D_j = |y_a - y_b|   (for horizontal lines)                (3)
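A direct transcription of (3) into a small helper (ours; the 5-pixel tolerance follows the text):

```python
def classify_line(x_a, y_a, x_b, y_b, tol=5):
    """Label a Hough line by its endpoints: nearly constant x means vertical,
    nearly constant y means horizontal."""
    if abs(x_a - x_b) <= tol:
        return "vertical"
    if abs(y_a - y_b) <= tol:
        return "horizontal"
    return "other"
```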
Figure 5. Extracting vertical and horizontal lines

After extracting the lines we have to segment them, because the edges are divided into pieces when the original images are converted to rectangular shape and multiple lines are extracted. Fig. 6(a) shows how the lines are segmented: the vertical lines are extracted by searching from left to right for the presence of a line. The horizontal lines are handled with a similar method; the only difference is the direction of the search. Fig. 6(b) shows the results of the line segmentation.

Figure 6. Line segments

C. Probabilistic Approach

After the line segmentation we extract the intersection points between the vertical lines and the horizontal lines. We assume that a vertical line has up to 2 intersection points and that the lines of the entrance are longer than the others. In practice, a vertical line can have from 0 to 2 or more intersection points. The average height h_avg, shown in equation (7), is the average length of the vertical lines with two intersection points, measured from the bottom point to the top point. Three features are used to decide the entrance. Two of them are the number of intersection points and the length of the vertical lines. The other feature is the color density from 0 to 40 in the green channel of the RGB color space; we use color values normalized from 0 to 255.

- The geometrical features: total length of a vertical line, number of intersection points.
- The color feature: density of color.

We assume that the probabilistic model is well described by a small set of parameters θ. We use simple data consisting of the line lengths X_l, the intersection points X_i and the color information X_c to compute P(Entrance | X_c, X_l, X_i) in the rectangular image, defined as:

P(Entrance | X_c, X_l, X_i) \propto p(X_c, X_l, X_i | Entrance) P(Entrance)

This posterior probability can be decomposed as in equation (4):

P(\theta | X_c, X_l, X_i) \propto p(X_c, X_l, X_i | \theta) P(\theta)
  = P(X_c, X_l, X_i | \theta_c, \theta_l, \theta_i) P(\theta_c, \theta_l, \theta_i)
  = P(X_c, X_l, X_i | \theta_c, \theta_l, \theta_i) P(\theta_c) P(\theta_l) P(\theta_i)                (4)

We consider the parameters θ_c, θ_l and θ_i to be independent. We do not consider the prior information P(θ_c) of the color or the prior knowledge P(θ_l) and P(θ_i) of the geometrical parameters. We use only the likelihood term P(X_c, X_l, X_i | θ_c, θ_l, θ_i) of the individual measurements and consider maximum likelihood values of the parameters, given a particular instantiation of the model parameters. The likelihood term can be factored as in equation (5):

P(X_c, X_l, X_i | \theta_c, \theta_l, \theta_i) = P(X_c | \theta_c, X_l) P(X_l | \theta_l, X_i) P(X_i | \theta_i)                (5)

Figure 7. Components of lines and intersection points, and the model of an entrance

First, we consider P(X_i | θ_i), parameterized by θ_i, for the number of intersection points. This can be thought of as a weight for the second and first terms.
P(X_i | \theta_i) = \begin{cases} 1 & \text{when } i = 2 \\ 0.8 & \text{when } i = 1 \\ 0.6 & \text{when } i = 0 \end{cases}                (6)

The second term P(X_l | \theta_l, X_i) is a ratio based on the length of the vertical lines. h_a and h_b are the missing portions of the line; the longer this missing part is, the smaller the possibility of an entrance.

h_{avg} = \frac{1}{N} \sum_{i=1}^{N} h_i                (7)

P(X_l | \theta_l, X_i) = e^{-\frac{h_a + h_b}{h_{avg}}}

Finally, the first term takes the color density of the green channel of the RGB color space, normalized from 0 to 255. The entrance has a low value, from 0 to 40, because most entrances are composed of transparent glass, and the transparent glass does not reflect the light. We take the region between two lines, including the weighted line. Equation (8) shows how the density is computed: P(x | θ_c) is the density of the color value in the green channel, and t(g) is the total number of pixels between two adjacent lines.

P(x | \theta_c) = \frac{T(g)}{t(g)}, \quad 0 < T(g) \le 40                (8)

P(X_c | \theta_c, X_l) = \frac{\sum_{X_l} P(x | \theta_c)}{c(X_l)}

where c(X_l) is the number of pixels in the rectangular region X_l delimited by two adjacent lines.
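A minimal sketch of the resulting score, i.e. the product of the three likelihood terms (6)-(8); this is our own illustration and the argument names are ours.

```python
import math

def entrance_score(n_intersections, h_a, h_b, h_avg, green_density):
    """Score of a candidate region: product of the color, length and intersection
    terms. green_density is the fraction of pixels between the two lines whose
    green value lies in [0, 40]."""
    p_i = {2: 1.0, 1: 0.8}.get(min(n_intersections, 2), 0.6)   # Eq. (6)
    p_l = math.exp(-(h_a + h_b) / h_avg)                       # penalize missing line portions
    p_c = green_density                                        # Eq. (8)
    return p_c * p_l * p_i
```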
IV. EXPERIMENTAL RESULT

The proposed method has been tested on a variety of entrances. Fig. 8(a) shows the results for an entrance with the transparent-glass characteristic. The entrance is not detected exactly; the reason is that the edge between the boundary of the wall and the entrance is not extracted. Table I lists the values of Fig. 8(a) obtained by the proposed algorithm; the value of the entrance region is higher than the others. The entrance of Fig. 8(b), which does not have this characteristic, is not detected because its color density is low. The blue box in Fig. 8 is the detected entrance and the yellow box is the real entrance.

TABLE I. THE VALUE OF REGIONS

Region                                     1     2     3     4     5     6     7     8     9     10    11
Value obtained through the algorithm       0.16  0.09  0.04  0.02  0     0     0     0     0.33  0.18  0.16

Figure 8. The detected entrance

V. FUTURE WORK

The proposed method performs well for entrances with transparent glass. Entrances without this characteristic are not detected, or are detected inaccurately. In the future we are going to detect all types of entrances and research how to detect the entrance exactly.

ACKNOWLEDGMENT

The authors would like to thank Ulsan Metropolitan City, MKE and MEST, which have supported this research in part through the NARC, the Human Resource Training project for regional innovation through KIAT, the post BK21 project at the University of Ulsan, and the Ministry of Knowledge Economy under the Human Resources Development Program for Convergence Robot Specialists.

REFERENCES
[1] H. Ali, C. Seifert, N. Jindal, L. Paletta and G. Paar, "Window Detection in Facades", 14th International Conf. on Image Analysis and Processing, 2007.
[2] K. Schindler and J. Bauer, "A Model-Based Method For Building Reconstruction", Proc. of the First IEEE International Workshop on Higher-Level Knowledge in 3D Modeling and Motion Analysis, 2003.
[3] S. A. Stoeter, F. Le Mauff and N. P. Papanikolopoulos, "Real-Time Door Detection in Cluttered Environments", 2000 Int. Symposium on Intelligent Control, 2000, pp. 187-192.
[4] R. Munoz-Salinas, E. Aguirre and M. Garcia-Silvente, "Detection of doors using a generic visual fuzzy system for mobile robots", Autonomous Robots, vol. 21, Springer, 2006, pp. 123-141.
[5] A. C. Murillo, J. Kosecka, J. J. Guerrero and C. Sagues, "Visual door detection integrating appearance and shape cues", Robotics and Autonomous Systems, 2008, pp. 512-521.
[6] J.-S. Lee, N. L. Doh, W. K. Chung, B.-J. You and Y. I. Youm, "Door Detection Algorithm of Mobile Robot in Hallway Using PC-Camera", Proc. of International Conference on Automation and Robotics in Construction, 2004.
[7] D. Anguelov, D. Koller, E. Parker and S. Thrun, "Detecting and Modeling Doors with Mobile Robots", Proc. of the IEEE International Conf. on Robotics and Automation, 2004, pp. 3777-3784.
[8] H. H. Trinh, D. N. Kim and K. H. Jo, "Structure Analysis of Multiple Building for Mobile Robot Intelligence", Proc. SICE, 2007.
[9] H. H. Trinh, D. N. Kim and K. H. Jo, "Urban Building Detection and Analysis by Visual and Geometrical Features", ICCAS, 2007.
[10] H. H. Trinh, D. N. Kim and K. H. Jo, "Supervised Training Database by Using SVD-based Method for Building Recognition", ICCAS, 2008.
[11] H. H. Trinh, D. N. Kim and K. H. Jo, "Facet-based multiple building analysis for robot intelligence", Journal of Applied Mathematics and Computation (AMC), vol. 205(2), 2008, pp. 537-549.
[12] H. H. Trinh, D. N. Kim and K. H. Jo, "Geometrical Characteristics based Extracting Windows of Building Surface", unpublished.
[13] R. O. Duda, P. E. Hart and D. G. Stork, "Pattern Classification", John Wiley & Sons, Inc., in press.
[14] L. G. Shapiro and G. C. Stockman, "Computer Vision", Prentice Hall, in press.
FACIAL EXPRESSION RECOGNITION USING AAM ALGORITHM
Thanh Nguyen Duc, Tan Nguyen Huu, Luy Nguyen Tan*
Division of Automatic Control, Ho Chi Minh University of Technology, Vietnam
*National key lab for Digital Control & System Engineering, Vietnam
ABSTRACT
Facial expression recognition is especially important in the interaction between humans and intelligent robots. Since the introduction of the AAM model, there has been a great improvement in detection accuracy. The main concern of this paper is facial expression recognition. The recognition task is based on two methods: one is AAM combined with a neural network, which gives better accuracy but lower speed, while the other is AAM combined with point correlation, which is especially fast and can therefore be integrated into mobile robot platforms.
Keywords: Facial expression recognition, Active Appearance Models (AAM), Digital image processing
1. Introduction

Digital image processing is the task of capturing images from a camera and analyzing them to extract the necessary information. Facial expression recognition is no exception to this rule; in this case, the information to be extracted concerns special features of the face that relate to human feelings such as angry, normal, happy and surprise.

W. Y. Zhao [1] and B. Fasel [2] gave surveys of facial expression recognition. Philip Michel and Rana El Kaliouby [3] used SVMs (support vector machines), with an accuracy over 60%. Ashish Kapoor [4] considered the movement of the eyebrows. M. S. Bartlett [5] combined Adaboost and SVM with the DFAT-504 database of Cohn and Kanade.

In this article, the AAM model [7] is used and combined with two other methods for recognition. The first is AAM combined with a neural network, which gives better accuracy, whereas the second is AAM combined with point correlation, which gives acceptable accuracy at a better speed.
2. Background theories on AAM

2.1. AAM Model

2.1.1. Building the AAM model
Since AAM model was first introduced in [7], the
use of this model has been increasing rapidly. An
AAM model consists of two parts. AAM shape
and AAM texture
2.1.1.1. AAM shape
According to [7], the AAM shape is composed of the coordinates of the v vertices that make up the mesh:

s = (x_1, y_1, x_2, y_2, ..., x_v, y_v)^T                (1.1)

Furthermore, the AAM shape is linear, so a shape vector s can be represented as:

s = s_0 + \sum_{i=1}^{N} p_i s_i = s_0 + P_s b_s                (1.2)

where s_0 is the base shape, s_1, s_2, ..., s_N are the orthogonal eigenvectors obtained from the training shapes, P_s = (s_1, s_2, ..., s_N) and b_s = (p_1, p_2, ..., p_N)^T.
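A small NumPy sketch of equation (1.2), synthesizing a shape from the base shape and the eigenvectors (our own naming and storage convention, not the authors' code):

```python
import numpy as np

def synthesize_shape(s0, shape_eigvecs, p):
    """Eq. (1.2): s = s0 + sum_i p_i * s_i, with s0 and each eigenvector s_i
    stored as 2v-dimensional vectors (x1, y1, ..., xv, yv)."""
    s = s0 + shape_eigvecs.T @ p      # shape_eigvecs has one eigenvector per row
    return s.reshape(-1, 2)           # back to v landmark points
```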
2.1.1.2. AAM texture
The AAM texture A(x) is a vector of pixel intensities defined over the pixels x ∈ s_0, where x = (x, y)^T. The texture of the AAM is also linear; thus it can be represented as:

A(x) = A_0(x) + \sum_{i=1}^{M} \lambda_i A_i(x) = A_0(x) + P_t b_t    (1.3)

where b_t = (\lambda_1, \lambda_2, \ldots, \lambda_M)^T and P_t = [A_1, A_2, \ldots, A_M]; A_1, \ldots, A_M are orthogonal eigenvectors obtained from the training textures.
2.1.2. Fitting the AAM model to an object
The goal of fitting the AAM model is to find the best alignment, i.e. to minimize the difference between the constant template T(x) and an input image I(x) with respect to the warp parameters p. Let x = (x, y)^T be the pixel coordinates and let W(x, p) denote the set of parameterized allowed warps, where p = (p_1, p_2, \ldots, p_N)^T is a vector of N parameters. The warp W(x, p) takes the pixel x in the coordinate frame of the template T and maps it to the sub-pixel location W(x, p) in the coordinate frame of the input image I. The Lucas-Kanade image alignment algorithm in [10] minimizes:

\sum_x [I(W(x, p)) - T(x)]^2    (1.4)

According to [10], to solve this we assume that an estimate of p is known and then iteratively solve for the increment parameter \Delta p so that the following expression is minimized:

\sum_x [I(W(x, p)) - T(W(x, \Delta p))]^2    (1.5)

Taking the Taylor expansion of T(W(x, \Delta p)) in terms of \Delta p at \Delta p = 0, this expression can be rewritten as:

\sum_x [I(W(x, p)) - T(x) - \nabla T \, (\partial W(x; p)/\partial p)|_{p=0} \, \Delta p]^2    (1.6)

Please note that W(x, 0) = x (because p = 0 means no change at all).
The solution that minimizes this expression is easily found as follows:

\Delta p = H^{-1} \sum_x [\nabla T \, \partial W/\partial p]^T [I(W(x; p)) - T(x)]    (1.7)

where H is the Hessian matrix:

H = \sum_x [\nabla T \, \partial W/\partial p]^T [\nabla T \, \partial W/\partial p]    (1.8)

Notice that the Jacobian matrix \partial W/\partial p is calculated at p = 0, so the steepest descent matrix \nabla T \, \partial W/\partial p can be pre-computed. Thus, H can also be pre-computed before the iterations begin. From these statements, according to [10], the fitting algorithm for the AAM model can be summarized in the following steps.

Table 1 Steps for fitting the AAM model
Pre-computation:
• For every pixel x in the convex hull of the AAM shape, obtain the intensity T(x) in the template image.
• Calculate the gradient of T(x), which is \nabla T.
• Calculate the Jacobian matrix \partial W/\partial p at (x, 0).
• Calculate the steepest descent image \nabla T \, \partial W/\partial p.
• Calculate the Hessian matrix using (1.8).
Iteration:
• Start the iteration at p = 0.
• For each pixel x in the convex hull of the reference AAM shape, warp it to the coordinate W(x, p); then obtain the intensity I(W(x, p)) by interpolation.
• Compute the error image I(W(x, p)) - T(x).
• Compute \Delta p using the pre-computed H and the formula (1.7).
• If \Delta p is small enough, end the iterations; otherwise, update p ← p + \Delta p.
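The sketch below illustrates the iteration column of Table 1 under simplifying assumptions: the warp is applied by a user-supplied function warp_image (hypothetical, not part of the paper's implementation), the steepest descent images and the Hessian of (1.8) are pre-computed, and the additive update p ← p + Δp of equation (1.7) is used.

import numpy as np

def fit_aam(I, T_vec, grad_T, dW_dp, warp_image, p0, max_iter=50, tol=1e-6):
    """Gauss-Newton style fitting loop following Table 1 and eqs (1.7)-(1.8).

    I          : input image (2-D float array)
    T_vec      : (n_pix,) template intensities T(x) over the template pixels
    grad_T     : (n_pix, 2) image gradient of T at those pixels
    dW_dp      : (n_pix, 2, N) Jacobian dW/dp evaluated at p = 0
    warp_image : callable (I, p) -> (n_pix,) intensities I(W(x, p)),
                 sampled by interpolation (assumed to be provided)
    p0         : initial warp parameters, shape (N,)
    """
    # Pre-computation: steepest descent images and the Hessian of eq. (1.8).
    sd = np.einsum('ij,ijk->ik', grad_T, dW_dp)    # (n_pix, N)
    H = sd.T @ sd                                  # Hessian matrix, eq. (1.8)
    H_inv = np.linalg.inv(H)

    p = np.asarray(p0, dtype=float).copy()
    for _ in range(max_iter):
        error = warp_image(I, p) - T_vec           # error image I(W(x, p)) - T(x)
        dp = H_inv @ (sd.T @ error)                # parameter increment, eq. (1.7)
        p += dp                                    # update p <- p + dp
        if np.linalg.norm(dp) < tol:               # stop when dp is small enough
            break
    return p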
3. Experimentation
3.1. Building and testing the AAM model
In the training phase, we used more than 200 images of the model's face for 4 basic facial expressions, namely normal, happy, surprise and angry, taken under different lighting conditions. Each input image is marked with 66 points. The figure below shows some of the images that were used in the training phase.
Fig. 1 Some input images
After performing Procrustes analysis [10] to eliminate the effect of similarity transforms on the input images (such as rotation, scaling, translation, etc.), we get normalized input shape vectors. Next, performing PCA analysis on these vectors, we get the base shape s_0 and the other orthogonal shape eigenvectors.
Fig. 2 Shape s0
Finally, performing PCA analysis on the input image textures, after having warped them into the base shape s_0, we get the base texture A_0(x) and the other orthogonal texture eigenvectors.
Fig. 3 Base texture A0(x)
Figure 4 shows some examples of fitting the face model using the AAM model. These images were not previously used in the training phase. The average number of iterations for each image is 5.
Fig. 4 Result of AAM face fitting
3.2. Facial Expression Recognition using AAM combined with a neural network
3.2.1. Training the neural network
In our experiment, we used about 30 images for each emotion: happy, normal, surprise and angry. For each image, we used the following procedure to extract the feature vector (a sketch of this procedure is given after these steps).
1. Load the built AAM model.
2. Load the image whose feature vector is to be extracted, together with its corresponding emotion vector E. The emotion vector E is a 4x1 vector, which is one of the 4 following vectors: [0, 0, 0, 1]^T (normal); [0, 0, 1, 0]^T (happy); [0, 1, 0, 0]^T (surprise); [1, 0, 0, 0]^T (angry).
3. Apply the AAM model to track the original face (F) in the image and normalize it to another face (F'), which is a texture defined on the base shape s_0.
4. Perform PCA analysis on F' using the textures A_0(x), A_1(x), \ldots, A_M(x). We get

F' = A_0 + v_1 A_1 + v_2 A_2 + \ldots + v_M A_M    (1.9)

5. The feature vector for this image is

v = [v_1, v_2, \ldots, v_M]^T    (1.10)
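As a companion to steps 3-5, the following sketch (our illustration, not the paper's code) projects the normalized face F' onto the eigen-textures to obtain the feature vector v of equation (1.10). The helper fit_and_warp_to_base_shape is hypothetical and stands for the AAM tracking and warping step; the aam object with attributes A0 and A is our own naming.

import numpy as np

def extract_feature_vector(image, aam, fit_and_warp_to_base_shape):
    """Steps 3-5: track the face, warp it to the base shape, project it.

    aam : object holding the base texture A0 (shape (n_pix,)) and the
          eigen-textures A (shape (n_pix, M)) from the training phase.
    fit_and_warp_to_base_shape : callable (image, aam) -> F' as an
          (n_pix,) texture defined on the base shape s0 (assumed given).
    """
    F_prime = fit_and_warp_to_base_shape(image, aam)   # normalized face F'
    residual = F_prime - aam.A0                        # remove the base texture
    # Because the eigen-textures are orthonormal, the coefficients v_i of
    # eq. (1.9) are simple dot products with the residual texture.
    v = aam.A.T @ residual                             # feature vector, eq. (1.10)
    return v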
Using the set of such (v, E) vectors, we trained a three-layer neural network with the following structure. Input layer: 62 neurons, which is also the number of eigen-textures; hidden layer: 50 neurons; output layer: 4 neurons, which is also the number of emotions.
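The sketch below shows one way such a 62-50-4 network could be trained and queried; it is only an illustration under the assumption that scikit-learn is available (the paper's implementation used Visual C++ and OpenCV), and it assumes every emotion appears in the training set so that the class indices match the one-hot order of E.

import numpy as np
from sklearn.neural_network import MLPClassifier

# Index order follows the one-hot vectors E: argmax([1,0,0,0]) = 0 -> angry, etc.
EMOTIONS = ["angry", "surprise", "happy", "normal"]

def train_expression_mlp(features, emotion_vectors):
    """features : (K, 62) array of feature vectors v from eq. (1.10)
    emotion_vectors : (K, 4) array of one-hot emotion vectors E."""
    labels = np.argmax(emotion_vectors, axis=1)          # one-hot -> class index
    # 62 inputs -> 50 hidden neurons -> 4 outputs, as described in the text.
    clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=2000, random_state=0)
    clf.fit(features, labels)
    return clf

def predict_emotion(clf, v):
    """Step 5 of the testing procedure: pick the maximum network output."""
    probs = clf.predict_proba(v.reshape(1, -1))[0]
    return EMOTIONS[int(np.argmax(probs))]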
3.2.2. Testing the neural network
For a testing input image, the following procedure is conducted.
1. Load the AAM model.
2. Load the neural network.
3. Apply the AAM model to track the original face (F) in the input image and normalize it to another face (F'), defined on the base shape s_0. This helps eliminate the effects caused by similarity transforms and face rotation.
4. Perform PCA analysis on F' using the base textures A_0, A_1, \ldots, A_M obtained previously in the training phase. We get

F' = A_0 + v_1 A_1 + \ldots + v_M A_M    (1.11)

5. Using the feature vector v = [v_1, v_2, \ldots, v_M]^T, calculate the outputs of the neural network and choose the maximum one. Then, infer the corresponding emotion.
3.2.3. Result
The experiment is implemented on a Compaq nx6120 laptop running at 1.73 GHz with 256 MB of RAM, using Visual C++ and OpenCV. The average processing time for each image is about 750 milliseconds. The testing images came from a Genius Slim 1322AF webcam. In total, 75 images were tested for each emotion.

Table 2 Result of detection using AAM and MLP
Emotion     % of correctness (true images out of 75)
Normal      82.66% (62)
Happy       96.00% (72)
Surprise    85.33% (64)
Angry       84.00% (43)

3.3. Fast Facial Expression Recognition using Point Correlation
3.3.1. Background knowledge
Facial expression recognition based on the MLP proved to be effective, but it is fairly slow because of the PCA analysis. In order to improve the speed of recognition, we suggest using point correlation in combination with the AAM model. Let us consider the 66 points on the face after the AAM model has been fitted to the face.
Fig. 5 66 face featuring points
Let d(m, n) be the Euclidean distance (in pixels) between points m and n. Then, calculate the following ratios:
R_mouth = d(22, 28) / (0.5 d(17, 19) + 0.5 d(12, 15))    (1.12)

R_eyebrow = d(13, 16) / (0.5 d(17, 19) + 0.5 d(12, 15))    (1.13)

R_eye = d(37, 34) / d(4, 5)    (1.14)

On one hand, experimentation shows that the ratio R_mouth is relatively large when a person is happy or angry; on the other hand, it is small when he or she is normal or surprised. Besides, R_eye can be used to differentiate between the normal and surprise emotions, because when a person is surprised, his eyes tend to open wider. Furthermore, R_eyebrow can be used to differentiate between angry and happy; this results from the fact that, when a person is angry, the distance between the eyebrows is enlarged.
The flowchart presented in Figure 6 summarizes our algorithm. In this flowchart, the mouth threshold, eye threshold and eyebrow (eb) threshold are three tunable parameters, adjusted to suit a specific person.
Fig. 6 Flowchart of Point Correlation method
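Since Figure 6 itself cannot be reproduced here, the sketch below gives one plausible reading of the decision rule it describes, based on the behaviour of the three ratios discussed above. The helper functions, the points container and the threshold arguments are our assumptions, not the paper's code.

import math

def dist(points, m, n):
    """Euclidean distance (in pixels) between landmarks m and n.
    points maps a landmark index to an (x, y) coordinate."""
    (x1, y1), (x2, y2) = points[m], points[n]
    return math.hypot(x1 - x2, y1 - y2)

def classify_expression(points, mouth_thr, eye_thr, eb_thr):
    """Threshold the ratios (1.12)-(1.14) as suggested by the flowchart."""
    denom = 0.5 * dist(points, 17, 19) + 0.5 * dist(points, 12, 15)
    r_mouth = dist(points, 22, 28) / denom             # eq. (1.12)
    r_eyebrow = dist(points, 13, 16) / denom           # eq. (1.13)
    r_eye = dist(points, 37, 34) / dist(points, 4, 5)  # eq. (1.14)

    if r_mouth > mouth_thr:
        # Large mouth ratio: happy or angry; the eyebrow ratio separates them.
        return "angry" if r_eyebrow > eb_thr else "happy"
    # Small mouth ratio: normal or surprise; the eye ratio separates them.
    return "surprise" if r_eye > eye_thr else "normal"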
3.3.2. Result
Using the same test images as in the previous recognition method, we obtained the following results.

Table 3 Result of detection using AAM and Point Correlation
Emotion     % of correctness (true images out of 75)(*)
Normal      85.33% (64)
Happy       90.66% (68)
Surprise    81.33% (61)
Angry       82.66% (62)

(*) Conducted with mouth threshold = 7.2, eye threshold = 0.32, eb threshold = 0.90.
The average processing time for a single image is about 250 ms, which is far faster than the previous method. This is because most of the time is spent fitting the AAM model, while the distance calculations and comparisons take very little time.
4. Conclusion
On the whole, facial expression recognition using AAM combined with a neural network gives higher accuracy than recognition using the point correlation method, but point correlation gives a faster recognition time. Thus, recognition using point correlation offers a good opportunity to integrate the task into mobile platforms, such as robots. However, that is beyond the scope of this article.
In order to improve the accuracy of the recognition task, a better minimization technique should be used, such as the second-order minimization in [11], but it will take longer for the algorithm to converge.
The experiment was conducted with a person-specific AAM model. In order to expand the system to recognize various people, many more training textures and shapes should be used; in this case, the algorithm is exactly the same. Further research will be carried out to obtain better accuracy and faster recognition time.
Many thanks to the science foundation fund of VNUHCM for its sponsorship.
REFERENCES
1. W. Y. Zhao et al., "Face Recognition: A Literature Survey", UMD CfAR Technical Report CAR-TR-948, 2000.
2. B. Fasel and J. Luettin, "Automatic Facial Expression Analysis: A Survey", Pattern Recognition, Vol. 36(1), 2003, pp. 259-275.
3. P. Michel and R. El Kaliouby, "Real Time Facial Expression Recognition in Video using Support Vector Machines", University of Cambridge.
4. A. Kapoor, Y. Qi and R. W. Picard, "Fully Automatic Upper Facial Action Recognition", IEEE International Workshop on Analysis and Modeling of Faces and Gestures, Oct. 2003.
5. M. S. Bartlett, "Real Time Face Detection and Facial Expression Recognition: Development and Applications to Human Computer Interaction".
6. CVPR Workshop on Computer Vision and Pattern Recognition for Human-Computer Interaction.
7. M. S. Bartlett, "Towards Social Robots: Automatic Evaluation of Human-Robot Interaction by Face Detection and Expression Classification", Advances in Neural Information Processing Systems, 2003.
8. G. J. Edwards, C. J. Taylor and T. F. Cootes, "Interpreting Face Images using Active Appearance Models", pp. 300-305, in Proc. International Conference on Automatic Face and Gesture Recognition, June 1998.
9. T. Cootes and P. Kittipanya-ngam, "Comparing Variations on the Active Appearance Model Algorithm", in Proc. British Machine Vision Conference (BMVC 2002), pp. 837-846, Cardiff University, September 2002.
10. I. Matthews and S. Baker, "Active Appearance Models Revisited", International Journal of Computer Vision, 60(2):135-164, November 2004.
11. E. Malis, "Improving Vision-Based Control using Efficient Second-Order Minimization Techniques", Proceedings of the IEEE International Conference on Robotics and Automation, 2004.
12. S. Benhimane and E. Malis, "Real-Time Image-Based Tracking of Planes using Efficient Second-Order Minimization", pp. 943-948, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2004.