Development of an Algorithm for Eye Gaze Detection Based on Geometry Features

Ho Tan Thuan and Huynh Thai Hoang
Faculty of Electrical and Electronic Engineering, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Email: [email protected], [email protected]

Abstract—This paper describes a simple but effective algorithm for gaze detection based on eye geometry features. First, the human face and eyes are detected using Haar-like features and the Adaboost algorithm. A Canny filter is then employed to extract edge features in the eye image. Next, the iris is approximated by the circle that minimizes the average distance between points on the circle and their nearest white pixels in the edge image. After that, the eyelids are detected based on the color saturation of the image and approximated by two second-degree polynomials; the intersections of the two polynomials give the locations of the eye corners. Using the relative position of the iris center and the eye corners, the eye gaze can be recognized. The system requires only a low-cost web camera and a personal computer. The accuracy of the gaze detection system is over 80% in good lighting conditions.

Keywords: image processing, face detection, Adaboost, gaze tracking, eye tracking, eye detection, gaze detection.

I. INTRODUCTION

Nowadays, there is a great deal of research on eye gaze detection systems because of their usefulness for a wide range of applications, including medical equipment, music players or cars controlled by the eyes, and computer interfaces for handicapped people. However, gaze detection techniques still have many limitations.

One method often used to detect eye gaze is to illuminate the eyes with infrared light sources. One or more digital video cameras then capture images of the eyes and a computer estimates the gaze [1], [3], [7]. First, the glints of the IR light sources in the eye image are identified, thanks to the brightness of the reflections on the dark pupil. Then the centers of the pupil and cornea are estimated using, for instance, circle estimation, template matching or a simplified model of the eye. From this information, the eye gaze is estimated. This type of gaze tracking system is inexpensive and often highly accurate. However, illuminating the eyes with infrared light sources may harm them and is uncomfortable for users.

Based on computer vision techniques, some systems are truly non-intrusive because they use only the information already present in the images, without infrared light sources [5], [6], [8]. Several steps are used when gaze detection is based on image data. First, the face of the subject is detected. Then the location of the eye within the face region is found. Finally, the eye gaze is identified.

There are two ways to estimate the gaze. The first is based on a set of sample images that can be trained with many different techniques (e.g. PCA, neural networks). Using the PCA technique [8], the detection system is simple and fast if the training set contains a small number of subjects, but it often confuses "looking straight", "looking up" and "looking down". To achieve high accuracy, the size of the training set must be increased; however, a larger training set slows down the computation. The second way is based on the geometry of the eye. This technique uses only the information already present in the image, without a training set. The simple algorithm using geometry features developed in this article belongs to this second category.

This article is organized as follows. Section II reviews the theory of Haar-like features, the Adaboost algorithm and the Canny edge detector, which are the fundamentals of this study. In Section III, the development of the eye gaze detection algorithm based on geometry features is described in detail. Experimental results are presented in Section IV. Finally, conclusions are given in Section V.
II. REVIEW OF RELATED THEORIES

A. Haar-like Features

There are many reasons for using features rather than raw pixels. The most important one is that a feature-based system operates faster and learns difficult domain knowledge more effectively than a pixel-based system.

Viola and Jones used the three kinds of Haar-like features illustrated in Fig. 1.

Figure 1. The Viola and Jones features

The value of a feature is the sum of the pixels within the white rectangular region(s) subtracted from the sum of the pixels within the black rectangular region(s). Additional Haar-like features are also used by this system; all of them are shown in Fig. 2.

Figure 2. Haar-like features (edge features, line features and center-surround features)

Rectangle features can be computed very efficiently using the integral image. The integral image at location (x, y) is defined as the sum of all pixels within the rectangle ranging from the top-left corner (0, 0) to the bottom-right corner (x, y):

I(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')    (1)

where i(x, y) is the gray level of the input image.

Once the integral image has been calculated, rectangle features can be computed quickly. For example, to calculate the sum of the pixels within region D in Fig. 3, the integral image values at P1, P2, P3 and P4 are used:

D = (A + B + C + D) - (A + B) - (A + C) + A    (2)

or

D = I(P4) - I(P2) - I(P3) + I(P1)    (3)

Figure 3. Using the integral image to compute the sum of the pixels within rectangle D
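To make Eqs. (1)–(3) concrete, the following minimal NumPy sketch builds an integral image and evaluates a rectangle sum with four look-ups; the function names and the two-rectangle feature at the end are illustrative choices rather than code from the paper.

```python
import numpy as np

def integral_image(gray):
    """Integral image I(x, y): sum of all pixels from (0, 0) to (x, y), as in Eq. (1)."""
    return gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x0, y0, x1, y1):
    """Sum of pixels inside the rectangle with inclusive corners (x0, y0) and (x1, y1),
    using four look-ups as in Eq. (3): D = I(P4) - I(P2) - I(P3) + I(P1)."""
    total = ii[y1, x1]
    if x0 > 0:
        total -= ii[y1, x0 - 1]
    if y0 > 0:
        total -= ii[y0 - 1, x1]
    if x0 > 0 and y0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total

def two_rect_haar_feature(ii, x, y, w, h):
    """A two-rectangle (edge) Haar-like feature: black-region sum minus white-region sum,
    matching the feature definition given above (left half white, right half black)."""
    white = rect_sum(ii, x, y, x + w // 2 - 1, y + h - 1)
    black = rect_sum(ii, x + w // 2, y, x + w - 1, y + h - 1)
    return black - white
```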
B. Adaboost Algorithm

AdaBoost, short for Adaptive Boosting, is a machine learning algorithm proposed by Yoav Freund and Robert Schapire. It can be used in conjunction with many other learning algorithms.

Viola and Jones introduced a face detection system capable of detecting faces in real time with a high success rate (about 90%). They combined efficient feature computation based on the integral image, the Adaboost algorithm and the cascade technique. To detect an object in an image, all possible sub-windows must be examined and classified as containing the object or not, for different positions and different scales. To run in real time, Viola and Jones suggested using a cascade of classifiers for the face detection system; see Fig. 4.

Figure 4. A cascade of classifiers with n stages: each stage either rejects a sub-window ("not object") or passes it on; sub-windows surviving all n stages are accepted as the object.

Each stage of the cascade, trained by Adaboost, accepts almost 100% of the positive samples and rejects 20–50% of the false samples. Any sub-window rejected at stage k is concluded not to contain a face and is ignored by the later stages. By linking n stages, the object in the image can be detected with a high detection rate.

C. Canny Edge Detector

The Canny edge detection operator was developed by John F. Canny in 1986. It uses a multi-stage algorithm to detect edges in images.

First, the Canny edge detector smooths the image to eliminate noise before trying to locate any edges. A Gaussian filter is used because of the simplicity of its filter mask. The Gaussian function in two dimensions is

G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}    (4)

where x is the distance from the origin along the horizontal axis, y is the distance from the origin along the vertical axis, and σ is the standard deviation. An example of a discrete Gaussian mask with σ = 1.4 is shown in Fig. 5:

    2  4  5  4  2
    4  9 11  9  4
    5 11 15 11  5
    4  9 11  9  4
    2  4  5  4  2

Figure 5. Discrete approximation to the Gaussian function with σ = 1.4

The larger the width of the Gaussian mask, the lower the detector's sensitivity to noise.

After smoothing the image, the next step is to use the Sobel operator to calculate the image gradients. The operator uses a pair of 3x3 convolution kernels. If A is the source image, and Gx and Gy are the images containing, at each point, the horizontal and vertical derivative approximations, we have

G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * A    (5)

and

G_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * A    (6)

where "*" is the convolution operator.

The gradient magnitude can be computed as

G = \sqrt{G_x^2 + G_y^2} \approx |G_x| + |G_y|    (7)

and the gradient direction as

\Theta = \arctan\left(\frac{G_y}{G_x}\right)    (8)

Finally, two thresholds are used, a high threshold T1 and a low threshold T2. Any pixel whose gradient value is greater than T1 is presumed to be an edge pixel. Then, any pixel connected to such an edge pixel whose gradient value is greater than T2 is also presumed to be an edge pixel. Examples of the Canny edge detector are illustrated in Fig. 6.

Figure 6. Examples of the Canny edge detector (source images and result images)

III. DEVELOPMENT OF THE ALGORITHM FOR EYE GAZE DETECTION

A. Structure of the Algorithm

The eye gaze is classified into four directions: (1) looking straight, (2) looking up, (3) looking left, and (4) looking right. The structure of the algorithm is illustrated in Fig. 7.

Figure 7. Structure of the algorithm: detect the face with Adaboost, estimate the eye locations, detect the eyes with Adaboost, detect the iris with the geometric model, detect the eyelids based on color saturation and the geometric model, and estimate the gaze from the relative position of the canthi and the pupil.

B. Face and Eye Detection

Using Haar-like features, the Adaboost algorithm and the cascade technique, the system is capable of detecting faces in real time with both a high detection rate and a very low false positive rate.

To determine the location of the eyes within the face region, two rectangles surrounding the left and right eyes are first estimated relative to the face. This speeds up the detection and reduces false detections on other parts of the face such as the nose and mouth. A code sketch of this face-and-eye detection step is given below.
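As a hedged illustration of the face-and-eye detection step, the sketch below uses OpenCV's bundled pretrained Haar cascades (haarcascade_frontalface_default.xml, haarcascade_eye.xml) as stand-ins for the authors' own trained classifiers, restricts the eye search to the coarse eye regions whose geometry is given just below, and applies the Canny detector to each detected eye patch; the Canny thresholds and cascade parameters are illustrative values.

```python
import cv2

# OpenCV's bundled cascades stand in for the detectors trained by the authors.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def detect_eye_edges(bgr_image):
    """Detect the face, restrict the search to the coarse eye regions of Section III-B,
    refine with the eye cascade, and return Canny edge maps of the eye patches."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    edge_maps = []
    for (fx, fy, fw, fh) in face_cascade.detectMultiScale(gray, 1.1, 5):
        # Coarse eye regions: A at (W/6, H/5) and A' at (W/2, H/5), size W/3 x H/3.
        for x0 in (fx + fw // 6, fx + fw // 2):
            roi = gray[fy + fh // 5: fy + fh // 5 + fh // 3, x0: x0 + fw // 3]
            for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(roi, 1.1, 3):
                patch = roi[ey: ey + eh, ex: ex + ew]
                # Canny with a high and a low threshold, as in Section II-C.
                edge_maps.append(cv2.Canny(patch, 50, 150))
    return edge_maps
```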
If W and H are the width and height of the rectangle surrounding the face, a Cartesian coordinate system with its origin at the top-left corner of the face rectangle is used, as illustrated in Fig. 8. The two rectangles surrounding the left eye and the right eye satisfy the following conditions:

- Location of point A (top-left corner of the left-eye rectangle): xA = W/6, yA = H/5
- Location of point A' (top-left corner of the right-eye rectangle): xA' = W/2, yA' = H/5
- Width of the rectangles: w = W/3
- Height of the rectangles: h = H/3

Figure 8. Estimating the two rectangles surrounding the left eye and the right eye

After estimating the positions of the left and right eyes, their exact locations are detected by the Adaboost algorithm, in the same way as for face detection.

C. Iris Detection

To detect the eye gaze, five parts of the human eye are considered: the iris, pupil, sclera, eyelids and canthi (eye corners), illustrated in Fig. 9.

Figure 9. The five main parts of the eye

Based on the relative position between the pupil (the center of the iris) and the canthi, where the upper and lower eyelids meet, the eye gaze can be estimated. Therefore, the iris and the two eyelids must be detected.

To recognize the iris within the eye region detected by the Adaboost algorithm, the Canny edge detector is used; some example images are shown in Fig. 10. Because the gray level changes abruptly from the iris to the sclera, part of a circle surrounding the iris appears in the Canny edge image.

Figure 10. Results of the Canny edge detector on eye images

To detect the iris, candidate circles whose centers (x, y) vary within centerxmin < x < centerxmax and centerymin < y < centerymax, and whose radii vary within rmin < r < rmax, are used to scan the Canny edge image. For each point (xi, yj) on a candidate circle, the nearest Canny edge pixel (xi + δi, yj + δj) is found and the error distance dij = sqrt(δi^2 + δj^2) between them is computed. After considering all points on the circle, the average error distance is calculated:

d_{avr} = \frac{\sum d_{ij}}{\text{number of points on the circle}}    (9)

The circle with the smallest average error distance is chosen as the iris.

Figure 11. Flow chart of the iris detection algorithm: initialize (x, y) = (centerxmin, centerymin), r = rmin and mindavr = infinity; for each candidate circle compute davr and, if davr < mindavr, set mindavr = davr; then increase the center (x, y) or the radius r and repeat until the whole parameter range has been scanned.

Figure 12. Examples of iris detection

A code sketch of this circle search is given below.
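A minimal sketch of the circle search, assuming a binary Canny edge map as input. Instead of searching for the nearest edge pixel explicitly, it reads the nearest-edge distance of Eq. (9) off a distance transform of the inverted edge map; the parameter ranges, the number of sampled circle points and the function name are illustrative.

```python
import numpy as np
import cv2

def fit_iris_circle(edge_map, r_range, cx_range, cy_range, n_points=64):
    """Exhaustively scan candidate circles (center and radius ranges given as
    (min, max) tuples) and keep the one whose points lie, on average, closest to a
    Canny edge pixel, i.e. the circle minimizing the average distance of Eq. (9)."""
    # Distance transform of the inverted edge map: value at a pixel = distance
    # to the nearest edge pixel, so averaging it along a circle gives d_avr.
    dist_to_edge = cv2.distanceTransform(
        (edge_map == 0).astype(np.uint8), cv2.DIST_L2, 3)
    angles = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    h, w = edge_map.shape
    best_circle, min_davr = None, np.inf
    for r in range(*r_range):
        dx = (r * np.cos(angles)).astype(int)
        dy = (r * np.sin(angles)).astype(int)
        for cy in range(*cy_range):
            for cx in range(*cx_range):
                xs = np.clip(cx + dx, 0, w - 1)
                ys = np.clip(cy + dy, 0, h - 1)
                davr = dist_to_edge[ys, xs].mean()
                if davr < min_davr:
                    best_circle, min_davr = (cx, cy, r), davr
    return best_circle, min_davr
```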
D. Eyelid Detection

The upper and lower eyelids meet at the canthi, so to locate the canthi the two eyelids are detected first. The eyelid detection algorithm is based on color saturation and geometry features; the shapes of the two eyelids are modeled as two parabolas.

The eyelid detection proceeds in the following steps:

- Convert the color space of the source image from RGB to HSV.
- Detect the sclera based on the saturation and gray level of the image pixels. Using the sclera's edge pixels together with the iris's edge pixels, recognize the eyelids.
- Interpolate the eyelid pixels with second-degree polynomials.

While the RGB color space is based on the three primary colors Red, Green and Blue in a Cartesian coordinate system, the HSV color space is based on the three components Hue, Saturation and Value. The saturation of a color pixel relates to the purity of the color; monochromatic light is pure, so its saturation is high. Because the sclera is white, it has a lower saturation value and a higher gray value than the neighboring regions.

The sclera boundary pixels, i.e. the pixels at which the color saturation changes and the gray level exceeds given thresholds, are detected. Combining the edge pixels of the sclera and of the iris, the eyelid edges are recognized. Examples of eyelid detection are shown in Fig. 13.

Figure 13. Examples of eyelid detection

To locate the canthi, the upper eyelid pixels are first interpolated by a second-degree polynomial y = a1 x^2 + b1 x + c1. If εi is the error of the pixel located at (xi, yi), then

\varepsilon_i = y_i - (a_1 x_i^2 + b_1 x_i + c_1)    (10)

and the total squared error is

S = \sum_{i=1}^{n} \varepsilon_i^2    (11)

S is minimal when a1, b1, c1 are the solution of the system

\frac{\partial S}{\partial a_1} = 0, \quad \frac{\partial S}{\partial b_1} = 0, \quad \frac{\partial S}{\partial c_1} = 0    (12)

or, equivalently, of the normal equations

a_1 \sum_{i=1}^{n} x_i^2 + b_1 \sum_{i=1}^{n} x_i + n\, c_1 = \sum_{i=1}^{n} y_i
a_1 \sum_{i=1}^{n} x_i^3 + b_1 \sum_{i=1}^{n} x_i^2 + c_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i
a_1 \sum_{i=1}^{n} x_i^4 + b_1 \sum_{i=1}^{n} x_i^3 + c_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i^2 y_i    (13)

Similarly, the lower eyelid pixels are interpolated by another second-degree polynomial y = a2 x^2 + b2 x + c2. The locations of the canthi are the solutions of the system

y = a_1 x^2 + b_1 x + c_1
y = a_2 x^2 + b_2 x + c_2    (14)

Examples of interpolating the eyelid pixels by second-degree polynomials are shown in Fig. 14.

Figure 14. Interpolating the eyelid pixels by second-degree polynomials

E. Eye Gaze Estimation

To estimate the eye gaze, the relative position between the pupil and the canthi is used:

- If the distance between the iris center and the left canthus is less than a threshold T1, the eye is looking left.
- If the distance between the pupil and the right canthus is less than a threshold T2, the eye is looking right.
- If the distance between the pupil and the midpoint of the segment connecting the left and right canthi is less than a threshold T3, the gaze is looking straight; if this distance is greater than T3 but less than a threshold T4, the gaze is looking up.

A code sketch of the eyelid fitting and this decision rule is given below.
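A hedged NumPy sketch of the eyelid fitting and the gaze decision rule above. np.polyfit solves the same least-squares problem as the normal equations (13); the helper names, the fall-through to "unknown" and the argument conventions are illustrative, and t1–t4 stand for the thresholds T1–T4, whose values are not reported in the paper.

```python
import numpy as np

def fit_eyelids_and_corners(upper_pts, lower_pts):
    """Fit the parabolas y = a x^2 + b x + c of Eqs. (10)-(13) to the upper and lower
    eyelid pixels and return their intersections (Eq. (14)), i.e. the canthi.
    upper_pts and lower_pts are (N, 2) arrays of (x, y) pixel coordinates."""
    a1, b1, c1 = np.polyfit(upper_pts[:, 0], upper_pts[:, 1], 2)
    a2, b2, c2 = np.polyfit(lower_pts[:, 0], lower_pts[:, 1], 2)
    roots = np.roots([a1 - a2, b1 - b2, c1 - c2])   # a1 x^2 + b1 x + c1 = a2 x^2 + b2 x + c2
    xs = np.sort(roots[np.isreal(roots)].real)
    return np.array([[x, a1 * x ** 2 + b1 * x + c1] for x in xs])

def classify_gaze(pupil, left_corner, right_corner, t1, t2, t3, t4):
    """Gaze decision of Section III-E from the pupil and eye-corner positions;
    t1..t4 play the roles of the thresholds T1..T4."""
    pupil = np.asarray(pupil, dtype=float)
    left_corner = np.asarray(left_corner, dtype=float)
    right_corner = np.asarray(right_corner, dtype=float)
    mid = (left_corner + right_corner) / 2.0
    if np.linalg.norm(pupil - left_corner) < t1:
        return "looking left"
    if np.linalg.norm(pupil - right_corner) < t2:
        return "looking right"
    if np.linalg.norm(pupil - mid) < t3:
        return "looking straight"
    if np.linalg.norm(pupil - mid) < t4:
        return "looking up"
    return "unknown"
```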
IV. EXPERIMENTAL RESULTS

This section presents experimental results illustrating the performance of the proposed eye gaze detection algorithm. In these experiments 80 images were captured, 40 in good lighting conditions and 40 in high-brightness lighting conditions. Fig. 15 shows some test images in good lighting conditions. Tables 1 and 2 give the numbers of successfully detected images.

TABLE 1. Number of successfully detected images in good lighting conditions

                                          Looking Straight  Looking Left  Looking Right  Looking Up
  Number of source images                        10              10             10            10
  Number of successfully detected images          8               9              8             8

TABLE 2. Number of successfully detected images in high-brightness lighting conditions

                                          Looking Straight  Looking Left  Looking Right  Looking Up
  Number of source images                        10              10             10            10
  Number of successfully detected images          3               3              4             2

Figure 15. Result images of the gaze detection system in good lighting conditions

Although the proposed algorithm is able to detect the eye gaze in images captured by a low-resolution web camera, it is still very sensitive to the illumination conditions. The success rate is approximately 80% in good lighting conditions. Because the canthus detection is based on color saturation, its accuracy is low in high-brightness lighting conditions. To improve the performance of the method, the eyelid detection algorithm needs to be improved.

V. CONCLUSIONS

This paper described our first effort in developing an eye gaze detection algorithm for non-intrusive systems. The algorithm consists of the following steps: face and eye detection, iris estimation, eye corner localization and eye gaze determination. The algorithm detects the eye gaze with a satisfactory success rate in good lighting conditions. Despite its simplicity and easy implementation, the algorithm still has some limitations. Because the canthus detection is based on color saturation, the results are not as good as expected in high-brightness lighting conditions. To bring the results to an acceptable level, the edge detector used in the iris and eyelid detection will be improved in future work, so that the gaze detection system becomes robust to different lighting conditions.

REFERENCES

[1] K. Talmi and J. Liu, "Eye and Gaze Tracking for Visually Controlled Interactive Stereoscopic Displays", Signal Processing: Image Communication, vol. 14, 1999.
[2] A. Jorgensen, "AdaBoost and Histograms for Fast Face Detection", KTH Computer Science and Communication, 2006.
[3] T. Takegami, T. Gotoh, S. Kagei, and R. Minamikawa-Tachino, "A Hough Based Eye Direction Detection Algorithm without On-site Calibration", Proc. VIIth Digital Image Computing: Techniques and Applications, Sydney, 10–12 Dec. 2003, pp. 459–468.
[4] H. Kashima, H. Hongo, K. Kato, and K. Yamamoto, "A Robust Iris Detection Method of Facial and Eye Movement", Vision Interface Annual Conference (VI 2001), Ottawa, Canada, 7–9 June 2001.
[5] M.-C. Su, K.-C. Wang, and G.-D. Chen, "An Eye Tracking System and Its Application in Aids for People with Severe Disabilities", Department of Computer Science and Information Engineering, National Central University, Chung Li, Taiwan, 2006, pp. 44–52.
[6] K.-N. Kim and R. S. Ramakrishna, "Vision-based Eye-Gaze Tracking for Human Computer Interface", IEEE Int. Conf. on Systems, Man, and Cybernetics, vol. 2, pp. 324–329, 1999.
[7] A. Pérez, M. L. Córdoba, A. García, R. Méndez, M. L. Munoz, J. L. Pedraza, and F. Sánchez, "A Precise Eye-Gaze Detection and Tracking System", Facultad de Informática, Universidad Politécnica de Madrid, 2003.
[8] G. Bebis and K. Fujimura, "An Eigenspace Approach to Eye-Gaze Estimation", ISCA 13th Int. Conf. on Parallel and Distributed Computing, pp. 604–609, Las Vegas, 2000.


Detecting Four Main Objects: Building, Trees, Sky and Ground Surface for an Outdoor Mobile Robot

My-Ha Le and Hoang-Hon Trinh
Graduate School of Electrical Engineering, University of Ulsan (UoU), Ulsan, Korea
Email: [email protected], [email protected]

Abstract—This paper proposes a method based on context information and color to detect the main objects in complicated outdoor scenes for an outdoor mobile robot or intelligent system. An outdoor scene usually consists of four main objects, namely buildings, trees, sky and ground surface, plus some non-objects. According to this method, we detect the building by using color, straight lines, principal component parts (PCPs), edges and vanishing points. The trees can be detected by using features of color.
The sky and ground surface can be detected by using context information and color. INTRODUCTION When a Robot navigates in outdoor scene, it needs much information about it. In order to collect that information, the basic function is detecting some main objects in outdoor scene. Firstly, the building is detected by using line segments and their dominant vanishing points [7, 8]. The trees are then detection by using RGB color space. The context information of four main static of objects such as sky, trees, buildings and ground surface is used to verify the tree regions. From top to down, the sky is usually appeared at the highest position; the positions of trees and buildings are approximate to each other; the ground surface is usually located at the bottom of images. Here, the area of each candidate of tree is also considered. There are also many researches for detecting the road [17,9]. All of them seem to be applied for the transportation system, because just only the roads including structured and unstructured models are mentioned. For the other intelligent systems, just only detecting road is not sufficient information in real application; for example, when a mobile robot is working in urban environment he needs to analyze information of both road and court. Furthermore, only detecting the ground surface is not sufficient information to construct all the functions for an intelligent system; for example, to answer the question: where am I in the city? The system should detect and recognize the building where he can get more information than the roads. Therefore, the detection of four main static objects is necessary for an intelligent system. Ho Chi Minh city University of Technology [email protected] A candidate of ground surface is the remained image. The line segments and the remained candidate of tree in previous steps are used to coarsely verify the ground surface. Then the ground surface is identified with other objects such as car, human and the non-object regions by multi-filters. Here, the non-object region includes trees, buildings, etc. which appears so far from camera and it’s information there is no meaning for effecting transport system or robot. Therefore, we cannot detect them in the previous steps. Filters are also used to classify the ground surface with the car and human based on the color information. The second one is based on the frequency of pixel intensity to separate the ground surface and the non-object region. Because the intensities of pixels in the road or court are usually approximate to each other, so that it is considered as a low frequency region. For the images do not contain building, the non-object region is very important for classify the ground surface and the sky. Keywords: Ground surface detection, building detection, trees detection, context information I. Kang-Hyun Jo Graduate School of Electrical Engineering University of Ulsan, UoU Ulsan, Korea II. BUILDING DETECTION A. Method We use line segments and belongings in the appearance of building as geometrical and physical properties respectively. The geometrical properties are represented as principal component parts (PCPs) as a set of door, window, wall and so on. As the physical properties, color, intensity, contrast and texture of regions are used. Analysis process is started by detecting straight line segments. We use MSAC to group such parallel line segments which have a common vanishing point. 
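As a sketch of the line-segment detection and coarse grouping used by this building-detection stage, the code below detects segments on a Canny edge map with cv2.HoughLinesP (a stand-in for the segment-growing definition with thresholds T1 and T2 given later) and splits them into a vertical group (within 20 degrees of the vertical axis) and a horizontal group, the coarse separation applied before the MSAC-based vanishing-point estimation; all numeric parameters are illustrative.

```python
import numpy as np
import cv2

def detect_and_group_segments(bgr_image, min_len=30, vertical_tol_deg=20.0):
    """Detect straight line segments on a Canny edge map and coarsely split them
    into vertical and horizontal groups before vanishing-point clustering."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                            minLineLength=min_len, maxLineGap=3)
    vertical, horizontal = [], []
    if lines is None:
        return vertical, horizontal
    for x1, y1, x2, y2 in lines[:, 0]:
        # Angle measured from the vertical axis of the image.
        angle = abs(np.degrees(np.arctan2(x2 - x1, y2 - y1)))
        angle = min(angle, 180.0 - angle)
        (vertical if angle <= vertical_tol_deg else horizontal).append((x1, y1, x2, y2))
    return vertical, horizontal
```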
We calculate one dominant vanishing point for vertical direction and five dominant vanishing points in maximum for horizontal direction. A mesh of basic parallelograms is created by one of horizontal groups and vertical group. Each mesh represents one face of building. The PCPs are formed by merging neighborhood of basic parallelograms which have similar colors. The PCPs are classified into doors, windows and walls. Finally, the structure of building is described as a system of hierarchical features. The building is represented by number of faces. Each face is regarded by a color histogram vector. The color histogram vector just is computed by wall region of face. 179 Oct. 21 – Oct. 23, 2009 IFOST 2009 Horizontal vanishing point detection is performed similarly to previous section. In reality, building is a prototypical structure where many faces and various color appear in images. Therefore, it is necessary to separate faces. We calculate five dominant vanishing points in maximum for horizontal direction. Line segment detection The first step of the line segment detection is the edge detection of image. We used the edge detection function with Canny edge detector algorithm. The function is run in automatically chosen threshold. The second step is line segment detection following the definition: “A straight line segment is a part of edge including a set of pixels which have number of pixels larger than the given threshold (T1 ) and all pixels are alignment. That means, if we draw a line through the ends, the distance from any pixel to this line is less than another given threshold (T2)”. Building image Detection of line segments MSAC-based detection of dominant vanishing points Reduction of noise Separation of planes as the faces of building Creation of mesh of basic parallelograms Number of building faces Detection of PCPs Separation of the planes as the faces of building The vertical segments are extended by their middle points and vertical vanishing point. We based on the number of intersection of vertical lines and horizontal segments to detect and separate the planes as the faces of building. The coarse stage of face separation is performed by the rule as following: 1. If the same region contains two or more horizontal groups then the priority is given to a group with larger number of segment lines 2. If two or more horizontal groups distribute along the vertical direction then the priority is given to a group with lower order (the order follows red, green, blue, yellow, magenta color of horizontal groups) of dominant vanishing point. The second stage is the recovery stage. Some horizontal segments which located on close the vanishing line of two groups are usually mis-grouped. Some segments instead of belonging to lower order groups, they are in higher order groups. So they must be recovered. The recovery stage is performed from the low to high. The third stage is finding boundaries of faces. Furthermore, the PCPs are detected by merging the neighbor Parallelograms which have the similar physical properties. We based on the RGB color space to present the characters of basic parallelogram. In reality, the light energy coming from different region of the large objects including building to the camera is different. So their intensities with different region are not the same although they located on the same PCPs, for example the same wall. 
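A hedged sketch of the PCP-forming step just described: neighbouring basic parallelograms of the mesh are merged when their mean RGB colours are similar. The mesh is assumed to be given as a grid of per-cell mean colours, and the Euclidean colour threshold is an illustrative choice, since the paper does not specify the exact similarity measure.

```python
import numpy as np
from collections import deque

def merge_parallelograms(cell_colors, similarity_thresh=30.0):
    """Group neighbouring basic parallelograms with similar mean RGB colours,
    as a stand-in for the PCP detection step. cell_colors is an (R, C, 3) array
    of per-cell mean colours; cells sharing a returned label form one PCP."""
    rows, cols, _ = cell_colors.shape
    labels = -np.ones((rows, cols), dtype=int)
    current = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r, c] != -1:
                continue
            labels[r, c] = current
            queue = deque([(r, c)])
            while queue:                            # flood fill over similar neighbours
                rr, cc = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = rr + dr, cc + dc
                    if 0 <= nr < rows and 0 <= nc < cols and labels[nr, nc] == -1:
                        diff = cell_colors[nr, nc].astype(float) - cell_colors[rr, cc].astype(float)
                        if np.linalg.norm(diff) < similarity_thresh:
                            labels[nr, nc] = current
                            queue.append((nr, nc))
            current += 1
    return labels
```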
Construction of color histogram vectors Number of faces of building and corresponding color histogram Descriptor representation The descriptor vector of the wall region is constructed by RGB color space. The histograms of R, G or B components are computed and quantized into 32 bins. In order to avoid the boundary effects, which cause the histogram to change abruptly when some values shift smoothly from one bin to another, we use linear interpolation to assign weights to adjacent histogram bins according to the distance between the value and the central value of bin. The histogram vectors are modified to reduce the effects of scale change. The vectors are divided by total pixels and then normalized to unit length. Finally, the three component vectors HR, HG, HB are concatenated into indexing vector H to present each wall region. Each face of building is represented by three vectors. Two of them are formed from two regions of this face. The union of these two face regions creates the other indexing vector. Figure 1. Flow chart of proposed algorithm. Reducing the low contrast lines The low contrast lines usually come from the scene such as the electrical line, the branch of tree. Most of them usually do not locate on the edge of PCPs because the edge of PCPs distinguishes the image into two regions which have high contrast color. We based on the intensity of two regions beside the line to discard the low contrast lines. MSAC-Based detection of dominant vanishing points The line segments are coarsely separated into two groups. The vertical group contains line segments which create an actual angle 20° in maximum with the vertical axis. The remanent lines are treated as horizontal groups. For the fine separation stage, we used MSAC (m-estimator sample consensus) [11, 12] robustly to estimate the vanishing point. B. Result of building detection After detect building face, we eliminate building from image. Horizontal vanishing point detection This work was supported (in part) by Ministry of Knowledge Economy under Human Resources Development Program for Convergence Robot Specialists. Ho Chi Minh city University of Technology 180 Oct. 21 – Oct. 23, 2009 IFOST 2009 Figure 3. Trees detection and elimination IV. Figure 2. Building detection and elimination III. TREES DETECTION A. Method To extract region of tree, we use cues of color. Given a set of simple color point representative of color of tree, we obtain an estimate of “average” or “mean”. Let this average color be denoted by the RGB column vector t. the next objective is to classify each RGB pixel in an image as having a color in the specified range or not. To perform this comparison, we need a measure of similarity. One of the simplest measures is the Euclidean distance. Let x denote an arbitrary point in RGB space. We say that x is similar to t if the distance between them is less than a specified threshold, T. The Euclidean distance between x and t is given by D(x, t) = Œx - tŒ = [(x - t) T (x - t)]1/2 = [(xR - tR)2 + (xG - tG)2 + (xB – tB)2]1/2 Where Œ·ŒLVWKHQRUPRIWKHDUJXPHQWDQGWKHVXEVFULSWV R, G and B, denote the RGB components of vectors t and x. the locus of points such that D(x, t) 7 LV D VROLG VSKHUH RI radius T. The points contained within or on the surface of the sphere satisfy the specified color criterion; points outside the sphere do not. Coding these two sets of points in the image with, say, black and white, produces a binary, segmented image. DETECT SKY AND GROUND SURFACE A. 
Method We use absolute context for referee to the location of objects in the image. The sky is always on the top of image, base on this characteristic we can detect sky with this context information. The cloud exists always inside sky. It becomes the merger as general feature to merge to one region. The cloud is included in sky and has the difficulty of extraction. Therefore region extraction uses general feature of sky and cloud. The intensity of cloud is generally higher than sky of fine day. In the sky gives the condition that the cloud always exists. Region segmentation extracts region of the cloud after we extract the region of sky. Therefore, the result merges the region of the sky and cloud. The sky exists in the image at the top and the cloud gives context information which exists always in the sky. Several color spaces including RGB, HSI, CIE, YIQ, YCbCr, etc are widely used. In this paper, we use RGB color space. A candidate of ground surface is the remained image. The remained object consist some objects that cannot be detected in previous steps. The ground surface is identified with these objects such as car, human and the non-object regions by color detection and filters. Here, the non-object region is what appearance very far or there is no information for navigation of Robot/ intelligent system. Here we use two kinds of filters for ground surface detection. Base on the color information, we can classify the ground surface with cars and human or the other small objects. To separate the ground surface and the non-object region, we use the characteristic of frequency of ground surface region. Because the intensities of pixels in the road or court are usually approximate to each other, so that it is considered as a low frequency region. B. Result of sky and ground surface detection The result for sky and ground surface detection as bellow B. Result of trees detection After detect trees, we also eliminate trees from image, in the next step we detect sky and ground surface using context information. Ho Chi Minh city University of Technology 181 Oct. 21 – Oct. 23, 2009 IFOST 2009 regional innovation through KIAT and post BK21 project at University of Ulsan and Ministry of Knowledge Economy under Human Resources Development Program for Convergence Robot Specialists. REFERENCES [1] R. Arnay, L . Acosta, M. Sigut and J. Toledo, “Ant Colony Optimization Algorithm for Detection and Tracking of Non-structured Roads”, Electronics Letters, Vol. 44 No. 12, pp. 725-727, 5th June 2008. [2] Q. Gao, Q. Luo and S.Moli, “Rough Set based Unstructured Road Detection through Feature Learning”, Proceedings of the IEEE International Conference on Automation and Logistics, pp.101-106, August 18 - 21, Jinan, China, 2007. [3] Y. Guo, V. Gerasimov and G. Poulton, “Vision -Based Drivable Surface Detection in Autonomous Ground Vehicles” Proceedings of the 2006IEEE/RSJ Int’ Conf. on Intelligent Robots and Systems, pp. 32733278, October 9 - 15, 2006. [4] Y. He, H. Wang and B. Zhang, “Color-Based Road Detection in Urban Traf¿F6FHQHV´,(((WUDQVDFWions on intelligent transportation systems, vol. 5, no. 4, pp. 309-318, December 2004. [5] J. Huang, B. Kong, B. Li and F. Zheng, “A New Method of Unstructured Road Detection Based on HSV Color Space and Road Features” Proceedings of the 2007 International Conference on Information Acquisition, pp.596-601, July 9-11,2007. [6] D. Song, H. N. Lee, J. Yi, A. 
Levandowski, “Vision- Based Motion Planning for an Autonomous Motorcycle on Ill-Structured Roads”, Autonomous Robots, Vol. 23, No.3, pp. 197-212, 2007. [7] H. H. Trinh and K. H. Jo, “Image-based Structural Analysis of Building Using Line Segments and Their Geometrical Vanishing Points”, SICEICASE, Oct. 18-21, 2006. [8] H. H. Trinh, D. N. Kim and K. H. Jo, “Facet-based Multiple Building Analysis for Robot Intelligence”, Journal of Applied Mathematics and Computation (AMC), Vol. 205(2), pp. 537-549, 2008. [9] Y. Wang, D. Chen and C. Shi, “Vision-Based Road Detection by Adaptive Region Segmentation and Edge Constraint” Second International Symposium on Intelligent Information Technology Application, Vol. 1, pp. 342-346, 2008. [10] M. Lievin and F. Luthon, “Nonlinear Color Space and Spatiotemporal MRF for Hierarchical Segmentation of Face Features in Video”, IEEE Trans. on Image Processing, vol. 13, pp. 63-71, 2004. [11] M. A. Fischler and R.C. Bolles, “Random sample consensus: a paradigm for model fitting with application to image analysis and automated cartography,” Communications of the ACM, Vol. 24, Issue 6, pp. 381~395, 1981. [12] R. Hartley and A. Zisserman, “Multiple view geometry in computer vision”, Cambridge Uni. Press, 2004. Figure 4. Sky and ground surface detection V. CONCULUSION This paper proposed a method to detect four main objects in outdoor environment using multiple cues. Multiple cues are a color, straight line, context information, PCs, edge, vanishing point. Combining those features, we can segment image to several regions such as building, sky, trees and ground surface. We detect building using color, straight line, PCs, edge and vanishing point. The tree region extract by using features of color. The sky also can be detected by using color and context information. The remained image is candidate of ground surface. Then color and filter is used to identify the ground surface. Because of the complicated of our door scene the result of simulation is not completely perfectibility. In the future, we will keep studying outdoor object detection applying for outdoor mobile robot with another approach and using sequence image or omni-directional camera or using multifilter are the next choices. ACKNOWLEDGMENT The authors would like to thank to Ulsan Metropolitan City. MKE and MEST which has supported this research in part through the NARC, the Human Resource Training project for Ho Chi Minh city University of Technology 182 Oct. 21 – Oct. 23, 2009 IFOST 2009 Entrance Detection using Mutiple Cues Suk-Ju Kang Hoang-Hon Trinh Dae-Nyeon Kim Kang-Hyun Jo Graduate School of Electrical Engineering University of Ulsan, UOU Ulsan, Korea [email protected] Graduate School of Electrical Engineering University of Ulsan, UOU Ulsan, Korea [email protected] Graduate School of Electrical Engineering University of Ulsan, UOU Ulsan, Korea [email protected] Graduate School of Electrical Engineering University of Ulsan, UOU Ulsan, Korea [email protected] entrance. So that the regions which is not matched to wall and windows are considered as the candidates of entrance. Abstract—This paper describes an approach to detect the entrance of building for outdoor robot. The entrance is important component which connects internal and external environments of building for the navigation of robot. Firstly, building surfaces are detected and then wall region and windows are detected. The remaining regions except the wall region and windows are candidates of entrance. 
To detect the entrance, We use the information of windows in rectangular shape. The geometrical characteristics are used for extracting the entrance such as the height of window, the height of floor. And We adopt a probabilistic approach for entrance detection by defining the likelihood of various features. The proposed approach captures both the shape and color. Keywords-component; Entrance detection, probabilistic model, geometrical characteristics I. INTRODUCTION It is important to find the entrance of the building in the external environment. The robot have to recognize the entrance for navigation. Because the entrance of the building connects the external environment to the internal environment. The features of the entrance are similar to doors and windows such as vertical lines and corners[1, 3-7]. The door detection of interior has been studied numerous times in the past. The entrance is a part of the building. In [2], authors detect the entrance and windows in order to recognize building. They use laser scanners and a CCD camera. The research approach is based on a variety of sensors. For example, doors are detected by sonar sensors and vision sensors with range data and visual information respectively [3]. In [4-6], authors use a CCD camera getting geometrical information and color information. Authors in [4] use the fuzzy system to analyze the existence of doors and genetic algorithm to improve the region of doors. In [5], authors using the shape and the color information detect the doors in probabilistic method. In addition, The research using laser sensors has been studied [7]. Our laboratory has studied the building recognition [8-11]. We use the algorithm which is researched in our laboratory for building recognition and looking for the entrance in connection with the window detection [8-11]. Fig.1 shows an overview of proposed method where wall region, surface detection and window detection have been done by our previous works. The building has three kinds of principal components such as wall, windows and Ho Chi Minh city University of Technology Figure 1. An overview of proposed method II. SURFACE, WALL REGION AND WINDOW DETEION The processes for detecting building surface, estimating wall regions and detecting windows were explained in detail in our previous works [8-12]. First, We detected line segments and then roughly rejected the segments which come from the scene as tree, bush and so on. MSAC algorithm is used for clustering segments into the common dominant vanishing points comprising one vertical and several horizontal vanishing points. The number of intersections between the vertical line and horizontal segments is counted to separate the building pattern into the independent surfaces. And then we found the boundaries of surface as the green frame in the second row of Fig.2. To extract wall region, we used color information of all pixels in the detected surface [10]. At first, a hue histogram of surface’s is calculated, then it is smoothed several times by 1D Gaussian filter. The peaks in the smoothed histogram are detected. The continuous bins that are larger than 40(%) of the highest peak are clustered into separate groups. The pixels indexed by each continuous bin group are clustered together. The pixels of each group are segmented again where the hue 183 Oct. 21 – Oct. 23, 2009 IFOST 2009 value is replaced by gray intensity information. And then The biggest group of pixels is chosen as wall region. 
Finally, The candidates of windows is the remaining regions not to be the wall region. To detect window the rectangular image was considered. We use the geometrical characteristics to obtain window regions and then do alignment. Fig.2 shows several examples; the first row is the original images; the second row shows the results of wall region, the third row illustrate windows. A. Noise Reduction by Geometrical Information Geometrical information is acquired from window image and the candidate image of entrance. We obtain the window positions and the positions of the candidate of entrance in the image respectively. And then the information is computed by acquired information. The necessary information is the height of windows hw, height of floors hf and the position of the second floor hwp in the window image. The necessary information of candidate image is the height of regions hnw and the position of regions hnwp as defined equation (1). The height of floors is defined from a window to a adjacent window. Normally, The entrance is near to bottom and has higher than windows in height. Also, That is not over the second floor as defined equation (2). Figure 2. Detection of building surface, wall region and windows III. ENTRANCE DETECTION hw x1wmax x1wmin hf x1wmin x2wmin hwp x3wmin hnw x1nw x1nw max min hnwp We acquire the height between floors, floors of the building and so on from the rectangular image binarized. And The vertical lines and horizontal lines are detected by hough transform. Finally, Probabilistic model with the lines and the green color of RGB channel is used to decide the entrance region. Fig.3 present the process detecting the entrance. x1nw min hnw ! hw hnwp hwp Figure 4. window image and the candidate image of entrance B. Line segments First, The vertical lines and horizontal lines are extracted by hough transform in a rectangular image. Hough transform convert the lines into the points in hough space [14]. This accumulated points are become to lines in image space again. We can compute the distance of x-coordinate and y-coordinate between a starting point and a endpoint in a line li respectively. Figure 3. Entrance Detection Algorithm Ho Chi Minh city University of Technology 184 Oct. 21 – Oct. 23, 2009 IFOST 2009 When the distance Di is 5 or less this line become the vertical line as equation (3). The horizontal line is the same as the vertical line except considering the axis. Fig.5(a) show the method and Fig.5(b) is the results through below equation. Di | xa xb | : For Vertical lines D j | ya yb | : For Horizontal lines channel of RGB color space. We use the normalized color from 0 to 255. x The geometrical features : total length of a vertical line, number of intersection points x The color feature : density of color. We will assume that method of probabilistic model is well described by a small set of parameter ș. We use a restricted simple data with line length Xl, intersection points Xi and color information Xc to compute P(Entrance|Xc,Xl,Xi) in the rectangular image as defined form : P ( Entrance | X c , X l , X i ) | p ( X c , X l , X i | Entrance) P ( Entrance) This posterior probability can be decomposed to equation (4). P (T | X c , X l , X i ) v p ( X c , X l , X i | T ) P(T ) Figure 5. Extracting vertical and horizontal lines = P( X c , X l , X i | T c ,T l ,T i ) P(T c ,Tl ,Ti ) = P( X c , X l , X i | T c , Tl , Ti ) P(T c ) P(T l ) P(Ti ) After extracting lines we have to segment lines. 
Because Edges are divided into pieces that original images are converted to rectangular shape and extracted to multiple lines. Fig.6(a) show the method how the lines is segmented. The vertical lines are extracted by searching from the left to the rignt whether the line is or not. The horizontal lines have the method which is similar to the vertical lines. The different point is the direction of searching. Fig.6(b) is the results through line segment. We consider the parameters șc, șl and și to be independent. We do not consider P(șc) of the prior information of the color and P(șl) and P(și) of the prior knowledge of geometrical parameters. We use only P(Xc,Xl,Xi | șc,șl,și) of likelihood term of individual measurements and consider maximum likelihood values of parameters, given a particular instantiation of the model parameters. The likelihood term can be factored as equation (5) P( X c , X l , X i | Tc ,Tl ,Ti ) P( X c | T c , X l ) P( X l | Tl , X i ) P( X i | Ti ) Figure 6. Line Segments C. Probabilistic Approach After line segments we extract the intersection points between the vertical lines and the horizontal lines. We assume that a vertical line has up to 2 points and the lines of the entrance are longer than the others. Really, a vertical line has that the number of points is from 0 to 2 or more. The average height, havg, shown in equation (7) is average length of vertical lines with two points from bottom point to top point. Three features are used to decide the entrance. Two features are the number of intersection points and the length of vertical lines. The other feature is color density from 0 to 40 in green Ho Chi Minh city University of Technology Figure 7. Components of lines and intersection point and Model of a entrance At first, we consider P(Xi|și) of parameters și of the number of intersection points. We can think that this is weight of the second term and the first term. 185 Oct. 21 – Oct. 23, 2009 IFOST 2009 1 when i=2 ° ®0.8 when i=1 °0.6 when i=0 ¯ P ( X i | Ti ) Region 7 8 9 10 11 The second term P(Xl|și,Xi) is ratio of the length of vertical lines. ha and hb takes the missing portion of the line. The more this part is long, the more the possibility of entrance is small. havg 1 N The obtained value through algorithm 0 0 0.33 0.18 0.16 n ¦h i i 1 P ( X i | Ti , X i ) e § h h ¨ a b ¨ havg © · ¸ ¸ ¹ Finally, the fist term take the color density of green channel of RGB color which is normalized from 0 to 255. The entrance has the low value from 0 to 40 because Most of the entrance is composed of apparent glass. The apparent glass do not reflect the shine. We take the region between two lines including weighted line. Equation(8) show how to the density. P(x|șc) take the density of color value in green channel. t(g) is the total number of pixels between two adjacent lines. P( x | Tc ) T (g) , 0<T ( g ) 40 t(g) P( X c | Tc , X l ) ¦ Xl P( x | Tc ) Figure 8. The detected entrance V. The proposed method performs in the entrance with apparent glass. The entrance with inapparent characteristic does not detected and the entrance is detected inaccurately. In the future we are going to detect all of the entrance and research how to detect the entrance exactly. c( X l ) c(Xl) is the number of pixels in the rectangular region Xl delimiting two adjacent lines.. IV. EXPERIMENTAL RESULT ACKNOWLEDGMENT The proposed method has been experimented for a variety of entrance. Fig.8(a) show the results of the entrance with the apparent characteristic. The entrance is not detected exactly. 
The reason is why the edge is not extracted between the boundary of wall and entrance. Table 1 is the results of Fig.8(a) from the proposed algorithm. The value of the entrance region is higher than the others. Fig.8(b) of the entrance of inapparent characteristic is not detected because the density of color is low. The blue box of Fig.8 is detected to the entrance and the yellow box is real entrance. TABLE I. The authors would like to thank to Ulsan Metropolitan City. MKE and MEST which has supported this research in part through the NARC, the Human Resource Training project for regional innovation through KIAT and post BK21 project at University of Ulsan and Ministry of Knowledge Economy under Human Resources Development Program for Convergence Robot Specialists. REFERENCES [1] THE VALUE OF REGIONS [2] Region 1 2 3 4 5 6 The obtained value through algorithm 0.16 0.09 0.04 0.02 0 0 Ho Chi Minh city University of Technology FUTURE WORK [3] 186 Haider Ali, Christin Seifert, Nitin Jindal, Lucas Paletta and Gerhard Paar, “Window Detection in Facades,” 14th International Conf. on Image Analysis and Processing, 2007 Konrad Schindler and Joachim Bauer, “A model-Based Method For Building Reconstruction,” In Proc. of the First IEEE International Workshop on Higher-Level Knowledge in 3D Modeling and Motion Analysis, 2003 S.A Stoeter, F. Le Mauff and N.P. Papanikopoulos, “Real-Time Door Detection In Cluttered Environments,” In 2000 Int. Symposium on Intelligent Control, 2000, pp. 187-192 Oct. 21 – Oct. 23, 2009 IFOST 2009 [4] [5] [6] [7] [8] R. Munoz-Salinas, E. Aguirre and M. Garcia-SilventeJ, “Detection of doors using a generic visual fuzzy system for mobile robots,” vol 21 Auton Robot, Springer, 2006, pp.123–141. A.C Murillo, J. Kosecka and J.J. Guerrero C. Sagues, “Visual Door detection integrating appearance and shape cues”, Robotics and Autonomous Systems, 2008, pp. 512-521 Jung-Suk Lee, Nakju Lett Doh, Wan Kyun Chung, Bum-Jae You and Young Il Youm, “Door Detection Algorithm Of Mobile Robot In Hallway Using PC-Camera,” Proc. of International Conference on Automation and Robotics in Construction, 2004 D. Anguelov, D. Koller, E. Parker and S. Thrun, “Detecting and Modeling Doors with Mobile Robots,” Proc. of the IEEE International Conf. on Robotics and Automation, 2004, pp. 3777-3784 H.H. Trinh, D.N. Kim and K.H. Jo, “Structure Analysis of Multiple Building for Mobile Robot Intelligence,” Proc. SICE, 2007 Ho Chi Minh city University of Technology [9] [10] [11] [12] [13] [14] H.H. Trinh, D.N. Kim and K.H. Jo, “Urban Building Detection and Analysis by Visual and Geometrical Features,” ICCAS, 2007 H.H. Trinh, D.N. Kim and K.H. Jo, “Supervised Training Database by Using SVD-based Method for Building Recognition,” ICCAS, 2008 H.H. Trinh, D.N. Kim and K.H. Jo, “Facet-based multiple building analysis for robot intelligence,” Journal of Applied Mathmatics and Computation(AMC), vol. 205(2), 2008, pp. 537-549 H.H. Trinh, D.N. Kim and K.H. Jo, “Geometrical Characteristics based Extracting Windows of Building Surface,” unpublished Richard O. Duda, Peter E. Hart and David G. Stork, “Pattern Classification,” John wiley & Son, Inc, in press Linda G. Shapiro, George C. Stockman, “Computer Vision,” Prentice Hall, in press . 187 Oct. 21 – Oct. 
23, 2009 IFOST 2009 FACIAL EXPRESSION RECOGNITION USING AAM ALGORITHM Thanh Nguyen Duc, Tan Nguyen Huu, Luy Nguyen Tan* Division of Automatic Control, Ho Chi Minh University of Technology, Vietnam *National key lab for Digital Control & System Engineering, Vietnam ABSTRACT Facial expression recognition is especially important in interaction between human and intelligent robots. Since the introduction of AAM model, there has been great change in detection accuracy. The main concern of this material is on facial expression recognition. In this paper, the recognition task is based on two methods, one of them is AAM combined with neural network, which gives better accuracy but lower speed, while the other is AAM combined with point correlation which is especially fast. Thus, it can be integrated to mobile robot platforms. Keywords: Facial expression recognition, Active Appearance Models (AAM), Digital image processing 2.1. AAM Model 1. Introduction Digital image processing is a task involved in capturing images from camera and analyzing the images, to extract the necessary information from the images. Facial expression recognition is not an exception of this rule. In this case, the information to be extracted is on special features of the face that relates to human feelings such as angry, normal happy, surprise. W. Y. Zhao [1], B. Fasel [2] gave survey about facial expression recognition. Philip Michel & Rana El Kaliouby [3] used SVM (Support vector machines), the accuracy is over 60%. Ashish Kapoor [4] considered the movement of eyebrows. M.S. Barlett [5] combined Adaboost and SVM with the database DFAT-504 of Cohn & Kanade In this article, AAM model [7] is used and combined with two other methods for recognitions. The first one is AAM combined with neural network, which would give us better accuracy, where as the second one is AAM combined with point correlation, which would give us an acceptable accuracy but a better speed. 2.1.1.Building AAM model Since AAM model was first introduced in [7], the use of this model has been increasing rapidly. An AAM model consists of two parts. AAM shape and AAM texture 2.1.1.1. AAM shape According to [7], AAM shape is composed of the coordinates of v vertices make of the mesh: s T > x1 , y1 , x2 , y2 ,..., xv , yv @ (1.1) Furthermore, AAM has linear shape, thus a shape vector s can be represented as: N s s0 ¦ pi si s0 Ps bs (1.2) i 1 Where s0 is the base shape and s0 , s1 ,..., sN are orthogonal eigenvectors obtained from training 2. Background theories on AAM Ho Chi Minh city University of Technology 188 Oct. 21 – Oct. 23, 2009 IFOST 2009 bs s1 , s2 ,..., sN and Ps shapes, ( p1 , p2 ,..., pN ) A Taking Taylor expansion of T(W(x, 'p )) in terms of 'p at 'p 0 . This expression can be rewritten as The AAM texture A( x ) is a vector defined as pixel intensities across objects x ɽ s0, x, y where x (1.5) x T 2.1.1.2. AAM texture T 2 ¦ > I (W ( x, p)) T (W ( x, 'p))@ wW ( x; p ) A ¦ [ I (W ( x, 'p)) T ( x) T wp x . Texture of AAM is also linear. Thus, it can be presented as: 2 p 0 'p ] (1.6) M A( x) A0 ( x ) ¦ Oi Ai ( x) Please note that, W ( x, 0) x (because p 0 does mean no changes at all) The solution to minimize that expression can be easily found as followed: A0 ( x) Pb t t (1.3) i 1 x s 0 T O1 , O2 ,..., OM Where bt , Pt (t1 , t2 ,..., tM ) are orthogonal eigenvectors obtained from the training textures. 'p ª wW º H ¦ «T wp »¼ x ¬ 1 T > I (W ( x; p)) T ( x)@ (1.7) 2.1.2.Fitting AAM model to an object Where H is the Hessian matrix. 
The goal of fitting AAM model is to find the best alignment to minimize the difference between the constant template T ( x ) and an input image I ( x ) with respect to warp parameters p . Let x T x, y T H ª wW º ª wW º ¦x «T wp » «T wp » ¬ ¼ ¬ ¼ be the pixel coordinates and W ( x, p ) Notice that the Jacobian matrix denotes the set of parameterized allowed warps, where p ( p1 , p2 ,..., p N )T is a vector of N 2 Table 1 Steps for fitting AAM model Pre-computation Iteration •For every pixel x in •Start iteration at p = 0. the convex hull of •For each pixel x in the AAM shape, obtain convex hull of the the intensity T(x) in reference AAM shape, the template image. warp it to coordinate •Calculate the gradient W ( x, p ) . Then, obtain of T(x), which is T . the intensity Calculate the Jacobian I (W ( x, p )) by (1.4) x According to [10], to solve this, we assume that an estimation of p is known and then iteratively solve for the increment parameter 'p so that the following expression is minimized Ho Chi Minh city University of Technology wW is calculated wp at p 0 and the matrix T can be pre-computed. Thus, H can be pre-computed before every iteration. From these statements, according to [10], the fitting algorithm for AAM model can be summarized in the following steps parameters. The warp ( x, p ) takes the pixel x in the coordinate frame of a template T and map it to the sub-pixel location W ( x, p ) in the coordinate of the input image- I . The LucasKanade image alignment algorithm in [10] is to minimize: ¦ > I (W ( x, p)) T ( x)@ (1.8) 189 Oct. 21 – Oct. 23, 2009 IFOST 2009 matrix wW at ( x, 0) . wp •Calculate the steepest decent image T wW wp •Calculate the Hessian matrix versus (1.8) interpolation •Compute error image I(W(x, p)) – T(x). •Compute 'p using the pre-computed H and the formula (1.7) •If 'p is small enough, ends the iterations. Otherwise, update Fig. 2 Shape s0 Fig. 3 Base texture A0(x) Figure 4 show us some examples of fitting the model face using AAM model. These images have not been previously used in training phase. The average number of iterations for each image is 5 'p m ' p p 3. Experimentation 3.1. Building and testing AAM model In the training phase, we used more than 200 images of the model’s face for 4 basic facial expressions which are normal, happy, surprise and angry taken in different lighting conditions. Each input image is marked with 66 points. The figure below shows some of the images that have been used in the training phase. Fig. 4 Result of AAM face fitting 3.2. Facial Expression Recognition using AAM combined with neural network 3.2.1.Training neural network In our experiment, we used about 30 images for each emotion. They are happy, normal, surprise, angry. For each image, we used the following procedure to extract the featuring vector. 1. Load the built AAM model. 2. Load the image need to extract the featuring vector together with its corresponding emotion vector E. The emotion vector E is a 4*1 vector, which belongs to one of the 4 T following vectors > 0, 0, 0,1@ (normal); Fig. 1 Some input images T After performing Procustes’ analysis [10] to eliminate the effect of similarity transforms on the input images (such as rotation, scaling, translation etc), we get normalized input shape vectors. Next, performing PCA analysis on these vectors, we get the base shape s0 and other orthogonal shape eigenvectors. 
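Before the texture PCA described in the next sentence, here is a hedged NumPy sketch of the shape PCA just mentioned, i.e. how the base shape s0 and the orthogonal shape eigenvectors of Eq. (1.2) can be obtained from the Procrustes-aligned 66-point training shapes; the variance-retention criterion and the function names are illustrative, not the authors' implementation.

```python
import numpy as np

def build_shape_model(aligned_shapes, variance_kept=0.95):
    """PCA shape model of Eq. (1.2): s ~= s0 + sum_i p_i s_i.
    aligned_shapes is an (M, 2v) array of Procrustes-aligned landmark vectors
    (v = 66 points per face here); returns the base shape s0 and the matrix Ps
    whose orthonormal columns are the retained shape eigenvectors."""
    s0 = aligned_shapes.mean(axis=0)
    centered = aligned_shapes - s0
    # SVD of the centered data gives the PCA eigenvectors directly.
    _, sing_vals, vt = np.linalg.svd(centered, full_matrices=False)
    var = sing_vals ** 2
    keep = np.searchsorted(np.cumsum(var) / var.sum(), variance_kept) + 1
    Ps = vt[:keep].T
    return s0, Ps

def project_shape(shape, s0, Ps):
    """Shape parameters p = Ps^T (s - s0) and the reconstruction s0 + Ps p."""
    p = Ps.T @ (shape - s0)
    return p, s0 + Ps @ p
```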
3. Experimentation
3.1. Building and testing the AAM model
In the training phase, we used more than 200 images of the model's face for the 4 basic facial expressions (normal, happy, surprise, and angry), taken under different lighting conditions. Each input image is marked with 66 points. Figure 1 shows some of the images used in the training phase.

Fig. 1. Some input images.

After performing Procrustes analysis [10] to eliminate the effect of similarity transforms on the input images (rotation, scaling, translation, etc.), we obtain normalized input shape vectors. Next, performing PCA analysis on these vectors, we obtain the base shape s_0 and the other orthogonal shape eigenvectors. Finally, performing PCA analysis on the input image textures, after warping them onto the base shape s_0, we obtain the base texture A_0(x) and the other orthogonal texture eigenvectors.

Fig. 2. Base shape s_0.
Fig. 3. Base texture A_0(x).

Figure 4 shows some examples of fitting the face model using the AAM. These images were not used in the training phase. The average number of iterations per image is 5.

Fig. 4. Results of AAM face fitting.

3.2. Facial Expression Recognition using AAM combined with Neural Network
3.2.1. Training the neural network
In our experiment, we used about 30 images for each emotion (happy, normal, surprise, angry). For each image, the following procedure is used to extract the featuring vector:
1. Load the built AAM model.
2. Load the image whose featuring vector is to be extracted, together with its corresponding emotion vector E. The emotion vector E is a 4x1 vector, equal to one of the 4 following vectors: [0, 0, 0, 1]^T (normal); [0, 0, 1, 0]^T (happy); [0, 1, 0, 0]^T (surprise); [1, 0, 0, 0]^T (angry).
3. Apply the AAM model to track the original face (F) in the image and normalize it to another face (F'), which is a texture defined on the base shape s_0.
4. Perform PCA analysis on F' using the textures A_0(x), A_1(x), \ldots, A_M(x). We get

F' = A_0 + v_1 A_1 + v_2 A_2 + \ldots + v_M A_M    (1.9)

5. The featuring vector for this image is

v = [v_1, v_2, \ldots, v_M]^T    (1.10)

Using the set of such (v, E) pairs, we trained a three-layer neural network with the following structure: input layer, 62 neurons (the number of eigen-textures); hidden layer, 50 neurons; output layer, 4 neurons (the number of emotions).

3.2.2. Testing the neural network
For a testing input image, the following procedure is conducted:
1. Load the AAM model.
2. Load the neural network.
3. Apply the AAM model to track the original face (F) in the input image and normalize it to another face (F'), defined on the base shape s_0. This helps eliminate the effects caused by similarity transforms and face rotation.
4. Perform PCA analysis on F' using the textures A_0, A_1, \ldots, A_M obtained in the training phase. We get

F' = A_0 + v_1 A_1 + \ldots + v_M A_M    (1.11)

5. Using the featuring vector v = [v_1, v_2, \ldots, v_M]^T, calculate the outputs of the neural network and choose the maximum one; then infer the corresponding emotion.

3.2.3. Result
The experiment was implemented on a Compaq nx6120 laptop running at 1.73 GHz with 256 MB of RAM, using Visual C++ and OpenCV. The testing images came from a Genius Slim 1322AF webcam. In total, 75 images were tested for each emotion. The average processing time for each image is about 750 milliseconds.

Table 2. Results of detection using AAM and MLP

Emotion    Correct rate (correctly recognized images out of 75)
Normal     82.66% (62)
Happy      96.00% (72)
Surprise   85.33% (64)
Angry      84.00% (43)
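Steps 4 and 5 of the testing procedure amount to projecting the shape-normalized face onto the texture eigenvectors and running one forward pass of the 62-50-4 network. A minimal sketch follows: the orthonormal eigen-texture matrix A, the trained weights W1, b1, W2, b2, and the tanh hidden activation are assumptions for illustration (the paper does not state the activation function), and the label order follows the one-hot emotion vectors defined in Section 3.2.1.

```python
import numpy as np

# One-hot emotion vectors E from Section 3.2.1:
# index 0 -> angry, 1 -> surprise, 2 -> happy, 3 -> normal.
EMOTIONS = ("angry", "surprise", "happy", "normal")

def extract_feature_vector(F_prime, A0, A):
    """Steps 4-5: project the shape-normalized face F' onto the texture
    eigenvectors, eqs. (1.9)-(1.11).  With orthonormal rows A_i,
    v_i = A_i . (F' - A0)."""
    return A @ (F_prime - A0)              # featuring vector v, length M

def classify_emotion(v, W1, b1, W2, b2):
    """Forward pass of the 62-50-4 MLP; the predicted emotion is the
    index of the maximum output (tanh hidden activation is assumed)."""
    hidden = np.tanh(W1 @ v + b1)          # 50 hidden neurons
    output = W2 @ hidden + b2              # 4 output neurons
    return EMOTIONS[int(np.argmax(output))]
```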
3.3. Fast Facial Expression Recognition using Point Correlation
3.3.1. Background knowledge
Facial expression recognition based on the MLP proved effective, but it is fairly slow because of the PCA analysis it performs. In order to improve the recognition speed, we suggest using point correlation in combination with the AAM model. Let us consider the 66 points on the face after the AAM model has been fitted to it.

Fig. 5. The 66 facial feature points.

Let d(m, n) be the Euclidean distance (in pixels) between points m and n. Then, calculate the following ratios:

R_{mouth} = \frac{d(22, 28)}{0.5\, d(17, 19) + 0.5\, d(12, 15)}    (1.12)

R_{eye} = \frac{d(37, 34)}{0.5\, d(17, 19) + 0.5\, d(12, 15)}    (1.13)

R_{eyebrow} = \frac{d(13, 16)}{d(4, 5)}    (1.14)

On one hand, experimentation shows that the ratio R_mouth is relatively large when a person is happy or angry; on the other hand, it is small when he or she is normal or surprised. Besides, R_eye can be used to differentiate between the normal and surprise emotions, because when a person is surprised, the eyes tend to open wider. Furthermore, R_eyebrow can be used to differentiate between angry and happy; this results from the fact that, when a person is angry, the distance between the eyebrows is enlarged.

Fig. 6. Flowchart of the point-correlation method.

The flowchart in Figure 6 summarizes our algorithm. In this flowchart, the mouth threshold, eye threshold, and eyebrow threshold are three tunable parameters that should be adjusted for a specific person.
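Under the stated thresholds, the point-correlation decision can be sketched as below. The ratios follow (1.12)-(1.14) and the threshold values are those reported with Table 3, but the order of the tests is inferred from the qualitative description above, since the flowchart of Fig. 6 is not reproduced here; the branch structure should therefore be read as an assumption rather than the authors' exact rule.

```python
import numpy as np

def d(points, m, n):
    """Euclidean distance d(m, n) between landmarks m and n
    (1-based indices into the 66 AAM points of Fig. 5)."""
    return float(np.linalg.norm(points[m - 1] - points[n - 1]))

def classify_by_point_correlation(points,
                                  mouth_thr=7.2, eye_thr=0.32, eb_thr=0.90):
    """Threshold rule built on the ratios (1.12)-(1.14); `points` is a
    (66, 2) array of fitted landmark coordinates."""
    norm = 0.5 * d(points, 17, 19) + 0.5 * d(points, 12, 15)
    r_mouth = d(points, 22, 28) / norm                 # eq. (1.12)
    r_eye = d(points, 37, 34) / norm                   # eq. (1.13)
    r_eyebrow = d(points, 13, 16) / d(points, 4, 5)    # eq. (1.14)

    if r_mouth > mouth_thr:
        # Large R_mouth: happy or angry; R_eyebrow separates the two.
        return "angry" if r_eyebrow > eb_thr else "happy"
    # Small R_mouth: normal or surprise; R_eye separates the two.
    return "surprise" if r_eye > eye_thr else "normal"
```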
3.3.2. Result
Using the same test images as in the previous recognition method, we obtain the following result.

Table 3. Results of detection using AAM and point correlation (*)

Emotion    Correct rate (correctly recognized images out of 75)
Normal     85.33% (64)
Happy      90.66% (68)
Surprise   81.33% (61)
Angry      82.66% (62)
(*) Conducted with mouth threshold = 7.2, eye threshold = 0.32, eyebrow threshold = 0.90.

The average processing time for a single image is about 250 ms, which is far faster than the previous method. This is because most of the time is spent fitting the AAM model, while the distance calculations and comparisons take very little time.

4. Conclusion
On the whole, facial expression recognition using AAM combined with a neural network gives higher accuracy than the point-correlation method, but point correlation gives a faster recognition time. Thus, recognition using point correlation offers a good opportunity to integrate the task into mobile platforms such as robots; however, that is beyond the scope of this article. In order to improve the accuracy of the recognition task, a better minimization technique could be used, such as the second-order minimization in [11], but it would take longer for the algorithm to converge. The experiment was conducted with a person-specific AAM model. In order to expand the system to recognize various people, many more training textures and shapes should be used; in that case, the algorithm is exactly the same. Further research will be carried out to obtain better accuracy and faster recognition time.

The authors thank the science foundation fund of VNUHCM for its sponsorship.

REFERENCES
1. W. Y. Zhao et al., "Face Recognition: A Literature Survey," UMD CfAR Technical Report CAR-TR-948, 2000.
2. B. Fasel and J. Luettin, "Automatic Facial Expression Analysis: A Survey," Pattern Recognition, vol. 36, no. 1, pp. 259-275, 2003.
3. P. Michel and R. El Kaliouby, "Real Time Facial Expression Recognition in Video using Support Vector Machines," University of Cambridge.
4. A. Kapoor, Y. Qi, and R. W. Picard, "Fully Automatic Upper Facial Action Recognition," IEEE International Workshop on Analysis and Modeling of Faces and Gestures, Oct. 2003.
5. M. S. Bartlett et al., "Real Time Face Detection and Facial Expression Recognition: Development and Applications to Human Computer Interaction."
6. CVPR Workshop on Computer Vision and Pattern Recognition for Human-Computer Interaction.
7. M. S. Bartlett et al., "Towards social robots: Automatic evaluation of human-robot interaction by face detection and expression classification," Advances in Neural Information Processing Systems, 2003.
8. G. J. Edwards, C. J. Taylor, and T. F. Cootes, "Interpreting face images using active appearance models," Proc. International Conference on Automatic Face and Gesture Recognition, pp. 300-305, June 1998.
9. T. Cootes and P. Kittipanya-ngam, "Comparing variations on the active appearance model algorithm," 10th British Machine Vision Conference (BMVC 2002), pp. 837-846, Cardiff University, September 2002.
10. I. Matthews and S. Baker, "Active appearance models revisited," International Journal of Computer Vision, vol. 60, no. 2, pp. 135-164, November 2004.
11. E. Malis, "Improving vision-based control using efficient second-order minimization techniques," Proceedings of the IEEE International Conference on Robotics and Automation, 2004.
12. S. Benhimane and E. Malis, "Real-time image-based tracking of planes using Efficient Second-order Minimization," Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 943-948, 2004.