http://dx.doi.org/10.5573/JSTS.2012.12.2.150
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.12, NO.2, JUNE, 2012

An FPGA-based Parallel Hardware Architecture for Real-time Eye Detection

Dongkyun Kim*, Junhee Jung*, Thuy Tuong Nguyen*, Daijin Kim**, Munsang Kim***, Key Ho Kwon*, and Jae Wook Jeon*

Abstract—Eye detection is widely used in applications such as face recognition, driver behavior analysis, and human-computer interaction. However, it is difficult to achieve real-time performance with software-based eye detection in an embedded environment. In this paper, we propose a parallel hardware architecture for real-time eye detection. We use the AdaBoost algorithm with the modified census transform (MCT) to detect eyes in a face image. We parallelize the part of the algorithm that is iterated to speed up processing. Several downscaled pyramid images of the eye candidate regions are generated in parallel from the input face image, and the left and right eyes are detected simultaneously using these downscaled images. The sequential data processing bottleneck caused by repetitive operations is removed by employing a pipelined parallel architecture. The proposed architecture is designed using Verilog HDL and implemented on a Virtex-5 FPGA for prototyping and evaluation. The proposed system can detect eyes within 0.15 ms in a VGA image.

Index Terms—Eye detection, hardware architecture, FPGA, image processing, HDL

Manuscript received Aug. 23, 2011; revised Dec. 19, 2011.
* Dep. ECE., Sungkyunkwan University, Suwon, Korea
** Dep. SCE., POSTECH, Pohang, Korea
*** Adv. Robotics Research Center, KIST, Seoul, Korea
E-mail: [email protected]

I. INTRODUCTION

Eye detection is an image processing technique to locate the eye positions in an input facial image. Eye detection is widely used in applications including face recognition, driver behavior analysis, and human-computer interaction [1-4]. The extraction of local facial features, such as the nose, eyes, and mouth, is important in face recognition for analyzing the face image [1-2]. The eye is the most important and most commonly adopted feature for human recognition [2]. In driver behavior analysis, eye detection is used to track the eye and gaze to monitor driver vigilance [3]. Eye-gaze tracking systems can be used for pointing and selection in human-computer interaction [4].

Z. Zhu and Q. Ji classified traditional eye detection methods into three categories: template-based methods, appearance-based methods, and feature-based methods [5]. In the template-based methods, a generic eye model is first designed and template matching is used to detect the eyes in the image. The deformable template method is a commonly used template-based method [6-8]. In this method, an eye model is first designed and the eye position is then obtained via a recursive process. This method can detect the eye accurately, but it is computationally expensive and good image contrast is required for the method to converge correctly. The appearance-based methods detect eyes based on their photometric appearance [9-11]. These methods usually need a large amount of training data representing the eyes of different subjects, under different face orientations, and under different illumination conditions. These data are used to train a classifier; detection is then achieved via classification.
The feature-based methods use characteristics such as the edges, the iris intensity, the color distribution, and the flesh of the eyes to identify features around the eyes [12].

There are several approaches to reducing eye detection processing time. These approaches can be broadly classified into two groups: software-based and hardware-based. The focus in the software-based approach is on designing and optimizing an eye detection algorithm for real-time performance. Conversely, in the hardware-based approach, the eye detection algorithm is parallelized and implemented on a DSP or an FPGA to achieve real-time performance.

T. D'Orazio et al. proposed a real-time eye detection algorithm [13]. They use iris geometrical information to determine the eye region candidate and symmetry to select the pair of eyes. Their algorithm consists of two steps. First, the candidate region that contains one eye is detected in the whole image by matching the edge directions with an edge template of the iris. Then, in the opposite region, a search is carried out for the second eye, whose distance and orientation are compatible with the range of possible eye positions. The software was implemented using Visual C++ on a 1 GHz Pentium III with a camera running at 7.5 fps. The search for the eyes in a 640×480 image takes about 0.3 s.

K. Lin et al. proposed a fast eye detection scheme for use in video streams rather than still images [14]. The temporal coherence of sequential frames was used to improve the detection speed. First, an eye detector trained with the AdaBoost algorithm is used to roughly obtain the eye positions. Then, these candidate positions are filtered according to geometrical patterns of human eyes. The detected eye regions are taken as the initial detecting window, and the detecting window is updated after each frame is processed. The experimental system was developed with Visual C++ 6.0 on a 2.8 GHz Pentium with 1 GB of memory. In their experiments, the mean detection rate was 92.73% for 320×240 test videos, with a speed of 24.98 ms per frame.

However, CPU-based implementations have two major bottlenecks that limit performance: data transfer bandwidth and the sequential data processing rate [15]. First, video frames are captured from the camera and transferred to the CPU main memory via a transfer channel, such as USB or IEEE 1394. These channels have limited bandwidth; hence, they impose a data flow bottleneck. After a frame is grabbed and moved to memory, the CPU processes the pixels sequentially. The CPU adds a large overhead to the actual computation: its time is split between moving data between memory and registers, applying arithmetic and logic operations to the data, maintaining the algorithm flow, handling branches, and fetching code. Additionally, a relatively long latency is imposed by the need to wait for an entire frame to be captured before it is processed. For these reasons, deployment and real-time processing of an eye detection algorithm in an embedded environment are difficult.

Therefore, A. Amir et al. implemented an embedded system for eye detection using a CMOS sensor and an FPGA to overcome the shortcomings of CPU-based eye detection [15]. Their hardware-based embedded system for eye detection was implemented using simple logic gates, with no CPU and no addressable frame buffers.
The image-processing algorithm was redesigned to enable a highly parallel, single-pass image-processing implementation. Their system processes 640×480 progressive scan frames at a rate of 60 fps and outputs a compact list of eye coordinates via USB communication.

It is difficult to achieve real-time performance with CPU-based eye detection in an embedded environment. A. Amir et al. reduced the processing time by using an FPGA [15]. However, the eye detection algorithm they used is relatively simple; hence, the eye miss rate is relatively high. B. Jun and D. Kim proposed a face detection algorithm using MCT and AdaBoost [16]. I. Choi and D. Kim presented an eye detection method using AdaBoost training with MCT-based eye features [17]. The MCT-based AdaBoost algorithm is very robust, even when the illumination varies, and is very fast. They chose the MCT-based AdaBoost training method to detect the face and the eye due to its simplicity of learning and high speed of detection.

In this paper, we propose a dedicated hardware architecture for eye detection. We use the MCT-based AdaBoost eye detection algorithm proposed by I. Choi and D. Kim [17], revised slightly for parallelization. We parallelize the part of the algorithm that is iterated in order to speed up processing. Several downscaled images of the eye candidate regions are generated in parallel from the input face image in the pyramid downscaling module. These images are passed to the MCT and cascade classifier modules via a pipeline and processed in parallel. Thus, we can detect the left and right eyes simultaneously. The sequential data processing bottleneck caused by repetitive feature evaluation is removed by employing a pipelined parallel architecture.

The proposed architecture is described using Verilog HDL (Hardware Description Language) and implemented on an FPGA. Our system uses CameraLink to interface with the camera. The CameraLink interface transfers the control signals and data signals synchronously, and the entire system is synchronized with the pixel clock signal from the camera. Our system does not need an interface such as USB; hence, the bottleneck caused by interface bandwidth is removed. The frame rate of the proposed system can be changed flexibly in proportion to the frequency of the camera clock. The proposed system can detect the left and right eyes within 0.15 ms using a VGA (640×480) image and a 25 MHz clock frequency.

This paper consists of six sections. The remainder of the paper is organized as follows. In Section 2, we describe the details of the MCT and AdaBoost algorithms and the overall flow of the eye detection algorithm that we use. Section 3 describes the proposed hardware architecture and the simulation results. Section 4 presents the configuration of the system and the implementation results. Section 5 presents the experimental results. Finally, our conclusions are presented in Section 6.

II. EYE DETECTION ALGORITHM

1. Modified Census Transform

The modified census transform is a non-parametric local transform introduced by B. Fröba and A. Ernst as a modification of the census transform [18]. It is an ordered set of comparisons of pixel intensities in a local neighborhood that finds out which pixels have lower intensity than the mean pixel intensity. In general, the size of the local neighborhood is not restricted, but in this work we always assume a 3×3 neighborhood.

The modified census transform is defined as follows. We define N(x) as the local spatial neighborhood of the pixel x, so that x ∉ N(x), and N'(x) = N(x) ∪ {x}. Let I(y) and \bar{I}(x) be the intensity of a pixel y and the mean intensity over the neighborhood N'(x), respectively. The modified census transform can then be written as

\Gamma(x) = \bigotimes_{y \in N'(x)} \zeta\big(\bar{I}(x), I(y)\big),

where \zeta(\bar{I}(x), I(y)) is a comparison function that yields 1 if I(y) < \bar{I}(x) and 0 otherwise, and \bigotimes is a concatenation operation that forms a bit vector (see Fig. 1). If we consider a 3×3 neighborhood, we are able to determine 511 distinct structure kernels. Fig. 1 shows an example of the modified census transform. In this example, \bar{I}(x) is 95.11. Each pixel in N'(x) is transformed to 0 or 1 by applying \zeta(\bar{I}(x), I(y)), and the bits are concatenated in order, as shown in Fig. 1.

Fig. 1. Example modified census transform.
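For concreteness, the short Python sketch below (ours, not part of the original paper) computes the 9-bit MCT value of one pixel following the description above: a neighbor contributes a 1 when its intensity is below the mean of the 3×3 neighborhood N'(x). The raster-scan bit order and the function name are illustrative assumptions; the paper fixes the ordering only through the example in Fig. 1.

import numpy as np

def mct_3x3(image, y, x):
    # Modified census transform for the pixel at (y, x) of an 8-bit image.
    # The 3x3 neighborhood N'(x) (the pixel plus its 8 neighbors) is compared
    # with its own mean; a pixel darker than the mean contributes a 1, which is
    # the convention used in this paper.  The raster-scan bit order (top-left
    # bit first) is an assumption; the paper fixes it only by the example of Fig. 1.
    window = image[y - 1:y + 2, x - 1:x + 2].astype(np.int32)
    mean = window.sum() / 9.0                    # mean intensity over N'(x)
    bits = (window < mean).ravel()
    value = 0
    for b in bits:                               # concatenate the nine comparison bits
        value = (value << 1) | int(b)
    return value                                 # 9-bit MCT value, 0 .. 511

if __name__ == "__main__":
    patch = np.array([[90, 100, 110],
                      [80, 120, 130],
                      [70,  60, 140]], dtype=np.uint8)
    print(format(mct_3x3(patch, 1, 1), "09b"))   # prints 100100110 (the mean is 100)

For the patch in the listing the neighborhood mean is 100, so only the pixels with intensities 90, 80, 70, and 60 set their bits.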
2. AdaBoost Algorithm

AdaBoost, short for Adaptive Boosting, is a machine learning algorithm formulated by Y. Freund and R. Schapire [19], who also gave a short introduction to the algorithm and showed that it has good generalization capability [20]. It is a meta-algorithm, and it can be used in conjunction with many other learning algorithms to improve their performance. AdaBoost is adaptive in the sense that subsequently built classifiers are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers; however, in some situations it can be less susceptible to overfitting than most learning algorithms [21].

AdaBoost calls a weak classifier repeatedly in a series of rounds t = 1, 2, ..., T from a total of T classifiers. For each call, a distribution of weights D_t is updated that indicates the importance of the examples in the data set for the classification. In each round, the weights of incorrectly classified examples are increased (or, alternatively, the weights of correctly classified examples are decreased), so that the new classifier focuses more on those examples [21].

3. MCT-based AdaBoost Algorithm

In this paper, we use the AdaBoost training method with MCT-based eye features proposed by I. Choi and D. Kim [17]. In AdaBoost training, they constructed weak classifiers that classify eye and non-eye patterns and then constructed a strong classifier that is a linear combination of the weak classifiers [17]. In detection, they scanned the eye candidate region by moving a 12×8 scanning window over it and obtained a confidence value corresponding to the current window location using the strong classifier. They then determined the window location whose confidence value was maximal as the location of the detected eye. Their MCT-based AdaBoost training was performed using only left-eye and non-eye training images; therefore, when they tried to detect the right eye, they needed to flip the right subregion of the face image [17].

Fig. 2 shows the flowchart of the eye detection algorithm. The eye detection algorithm requires a face image and the coordinates of the face region as inputs. If detection completes successfully, the algorithm outputs the coordinates of the left and right eyes as results.

Fig. 2. Eye detection algorithm flowchart.

In the first step, eye candidate regions are cropped for pyramid downscaling. The face region is split in half horizontally and vertically. The top-left subregion is set as the left eye candidate region and the top-right subregion is set as the right eye candidate region. Then, the right eye candidate region is flipped, because the classifier is trained only on left-eye images. These candidate images are downscaled to specific sizes to obtain pyramid images. There are eight pyramid images (four for the left eye and four for the right eye), and their sizes are 33×36, 31×34, 29×32, and 27×29. The four downscaled images are enough for an indoor environment where the supported distance is 0.5-8 m. The sizes of the downscaled images were determined empirically.

In the second step, the MCT is applied to the downscaled pyramid images. The window size for the transform is 3×3, and the MCT images are obtained by sliding the window across each pyramid image. The MCT-transformed pyramid images are then passed to the cascade classifier to detect eyes.

In the third step, the MCT images are scanned by the classifier and candidate eye locations are extracted. There are three classifier stages, and the window size for the classifier is 15×15. The classifier slides the window across the MCT image and extracts the confidence value at the current window position. If the confidence value is lower than the stage threshold, the confidence value of the next-stage classifier is calculated. If the confidence value of the last stage is also lower than its threshold, an eye candidate is detected at the current window position, and the eye coordinates can be calculated from that position. If no eye candidate is detected after sliding the window over the whole image, detection has failed. The third step completes eye detection for one pyramid image. The algorithm repeats these steps for all the pyramid images; after the left eye detection finishes, the same process is applied to the right eye pyramid images. After this repetitive operation, the lists of left and right eye candidates are complete. In the post-processing step, if there are several eye candidates in a list, the candidate with the maximum confidence value is selected as the result.
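The following Python sketch (ours, for illustration only) models the flow described above for one eye: every pyramid image is scanned with a 15×15 window, each of the three stages must accept the window (confidence below its threshold, as stated above), and the surviving candidate with the maximum confidence is returned. The stage_confidence stub, the thresholds, and the random pyramid contents are placeholders, since the trained classifier data are not published in the paper.

import numpy as np

WINDOW = 15                          # classifier window size over the MCT image
THRESHOLDS = [700.0, 650.0, 600.0]   # placeholder per-stage thresholds (training artifacts)

def stage_confidence(patch, stage):
    # Placeholder for the trained stage classifier.  In the real system the
    # confidence comes from AdaBoost-trained look-up tables, which are not
    # published in the paper, so this stub only returns a dummy value.
    return float(patch.sum() % 997)

def detect_eye(mct_pyramid):
    # Scan every MCT pyramid image with a 15x15 window and apply the 3-stage
    # cascade; a window is kept only if its confidence stays below the stage
    # threshold at all three stages, as described in the text.
    candidates = []
    for level, mct in enumerate(mct_pyramid):
        h, w = mct.shape
        for yy in range(h - WINDOW + 1):
            for xx in range(w - WINDOW + 1):
                patch = mct[yy:yy + WINDOW, xx:xx + WINDOW]
                conf = None
                for s, thr in enumerate(THRESHOLDS):
                    conf = stage_confidence(patch, s)
                    if conf >= thr:              # stage rejects this window
                        break
                else:                            # all stages accepted the window
                    candidates.append((conf, level, yy, xx))
    if not candidates:
        return None                              # detection failed
    return max(candidates)                       # post-processing: maximum confidence wins

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sizes = [(36, 33), (34, 31), (32, 29), (29, 27)]   # (height, width) orientation assumed
    pyramid = [rng.integers(0, 512, size=s) for s in sizes]
    print(detect_eye(pyramid))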
III. PROPOSED HARDWARE ARCHITECTURE

1. Overall Architecture

Fig. 3 shows the overall hardware architecture of the proposed system. The proposed system consists of four sub-modules: a pyramid downscaling module, an MCT module, a parallel classifier module, and a post-processing module. The face detector and the SRAM are not part of the proposed system. The face detector receives the image from the camera and detects faces in the image. It outputs the top-left and bottom-right coordinates of the face region and passes these coordinates to the eye detection module after detecting the faces. The SRAM interface module writes every frame from the camera into SRAM. The stored eye region image has a non-uniform size; it is used by the pyramid downscaling module to generate fixed-size pyramid images.

The pyramid downscaling module receives as input the left and right eye region images from SRAM. SRAM cannot be accessed in parallel, so the eye regions have to be cropped serially. First, the downscaled images of the left and right eye regions are generated using the SRAM data and stored in registers.
After generating the first downscaled images, we can generate the other downscaled images in parallel, because the registers can be accessed in parallel. This differs from the software algorithm, in which the pyramid images are generated one by one. All generated pyramid images are passed to the MCT module in parallel through pipelines. There are eight MCT modules for the eight pyramid images. In the MCT modules, a 3×3 window slides across each pyramid image to perform the census transform. The MCT images are passed to the classifier module in parallel through pipelines.

The classifier module extracts features from the MCT images. In this module, a window with a pre-defined size is used to scan each MCT image from top-left to bottom-right. At every position of the scanning window, the confidence value of each stage is compared with a trained threshold; a window whose confidence value stays below the threshold at every stage is considered an eye candidate. The threshold values are obtained in advance by training, one for each classifier stage. The coordinates of the eye candidate can be calculated from the current position of the scanning window. In contrast to the software implementation, in this paper we apply the 3-stage classifier in parallel to the eight MCT images; hence, we need 24 classifiers (3 stages × 8 images) for detection. Moreover, our hardware implementation can detect the left eye and the right eye simultaneously. As a result, several eye candidates can be detected within one left or right eye region. If there is more than one eye candidate, the candidate with the maximum confidence value is selected as the final detection result.

Fig. 3. Overall hardware architecture of the proposed system.

2. Flip / Pyramid Downscaling Module

Fig. 4 shows the pyramid downscaling images of the eye candidate regions. The pyramid downscaling module needs a face image and the face region coordinates to generate the pyramid images. The face image is stored in SRAM and the face region coordinates are received from the face detector. First, the pyramid module splits off the eye candidate regions. As shown in Fig. 4, the top-left region is set as the left eye candidate region and the top-right region is set as the right eye candidate region. Image pixels in SRAM can be accessed only one per pixel clock period; thus, we cannot generate all eight pyramid images simultaneously. Therefore, we generate the first pyramid images of the left and right eye regions serially. In Fig. 4, image (1) and image (2) are generated serially, and their data are stored in registers. We use nearest-neighbor interpolation to generate the downscaled images. The other downscaled images can then be generated in parallel using the images stored in registers. In the software algorithm, the right eye region is flipped after cropping the image. As shown in Fig. 4, the right eye candidate region is instead scanned from right to left; thus, an explicit flip is unnecessary in our architecture. The sizes of the pyramid images are the same as in the software algorithm (33×36, 31×34, 29×32, and 27×29).

Fig. 4. Pyramid downscaling images of the left (1) and right (2) eye regions.
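As a software reference for the module just described, the Python sketch below (ours, not the authors' Verilog) crops the two eye candidate regions, downscales them with a simple nearest-neighbor mapping, and mirrors the right region by reading it from right to left, so that no separate flip pass is needed. The pyramid sizes follow the paper; the (height, width) orientation, the rectangle convention, and all names are our assumptions.

import numpy as np

PYRAMID_SIZES = [(36, 33), (34, 31), (32, 29), (29, 27)]   # (height, width) assumed

def downscale_nearest(region, out_h, out_w, flip_horizontal=False):
    # Nearest-neighbor downscaling of one eye candidate region.  With
    # flip_horizontal=True the source is read from right to left, which models
    # how the hardware scans the right eye region instead of flipping it.
    in_h, in_w = region.shape
    out = np.empty((out_h, out_w), dtype=region.dtype)
    for y in range(out_h):
        src_y = y * in_h // out_h                 # nearest source row
        for x in range(out_w):
            src_x = x * in_w // out_w             # nearest source column
            if flip_horizontal:
                src_x = in_w - 1 - src_x          # right-to-left scan
            out[y, x] = region[src_y, src_x]
    return out

def build_pyramids(face, top, left, bottom, right):
    # Split the face region in half horizontally and vertically; the top-left
    # quarter is the left eye candidate region, the top-right quarter the right one.
    mid_y = top + (bottom - top) // 2
    mid_x = left + (right - left) // 2
    left_region = face[top:mid_y, left:mid_x]
    right_region = face[top:mid_y, mid_x:right]
    left_pyr = [downscale_nearest(left_region, h, w) for h, w in PYRAMID_SIZES]
    right_pyr = [downscale_nearest(right_region, h, w, flip_horizontal=True)
                 for h, w in PYRAMID_SIZES]
    return left_pyr, right_pyr

if __name__ == "__main__":
    face_img = np.random.default_rng(1).integers(0, 256, size=(480, 640), dtype=np.uint8)
    lp, rp = build_pyramids(face_img, 100, 200, 300, 400)
    print([p.shape for p in lp], [p.shape for p in rp])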
3. MCT Module

Fig. 5 shows the realization of the MCT bit computation. The average intensity of the pixels in the 3×3 window is needed to calculate the MCT bit vector. A divider module could be used to calculate this average; however, the computation latency of a divider is very long compared to that of an adder. Thus, we use an adder and a shifter instead of a divider and do not calculate the average intensity directly. Instead, we calculate SUM_I and I(x,y)×9, as shown in Fig. 5. SUM_I is the sum of the pixels in the window, computed with an adder tree. The I(x,y)×9 value is obtained by multiplying the intensity of each pixel by nine, implemented as a combination of a shifter and an adder, as shown in Fig. 5. By comparing SUM_I with I(x,y)×9, we obtain the MCT bit for each pixel in the window.

Fig. 5. Realization of MCT bit computation.

4. Parallel Classifier Module

In the classifier module, a window of pre-defined size (15×15) slides from top-left to bottom-right over the MCT images, as shown in Fig. 6. In contrast to the software algorithm, we apply the 3-stage classifier in parallel to the eight MCT images. Thus, we can calculate the confidence values of all stage classifiers simultaneously and detect both eyes simultaneously. We need 24 classifiers (8 images × 3 stages).

Fig. 6. Parallel classifier cascade for left eye detection.

We use a revised version of the parallel classifier cascade architecture proposed by Jin et al. [22] for eye detection, as shown in Fig. 7. They designed their classifiers using look-up tables (LUTs), since each classifier consists of a set of feature locations and the confidence values corresponding to the LBP (local binary pattern). The LUTs contain training data generated by AdaBoost training. We revise these LUTs for the MCT rather than the LBP. The three stages of the cascade have an optimized number of allowed positions: 22, 36, and 100, respectively. In the software algorithm, the classifier stages are cascaded serially. In the proposed hardware, however, all the cascade stages are pipelined and operate in parallel. The feature extraction result is generated at every pixel clock after a fixed pipeline latency.

Fig. 7. Parallel classifier cascade and feature evaluation.
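The two hardware details above can be modeled in a few lines of Python (a behavioral sketch, not the Verilog implementation). The MCT bit for a pixel is obtained without a divider by comparing SUM_I with 9·I, where 9·I is formed as (I<<3)+I, since 9·I(x,y) < SUM_I is equivalent to I(x,y) < mean. A stage confidence is the sum of the LUT entries selected by the MCT values found at the stage's trained feature positions (22, 36, and 100 positions for the three stages); the positions and LUT contents below are random placeholders because the training data are not given in the paper.

import numpy as np

def mct_window_bits(window):
    # Divider-free MCT for one 3x3 window of pixel intensities: the hardware
    # compares SUM_I (the adder-tree output) with 9*I(x,y), built from a shift
    # and an add, because 9*I < SUM_I is equivalent to I < mean(window).
    w = np.asarray(window, dtype=np.int64)
    sum_i = int(w.sum())                          # SUM_I
    bits = 0
    for value in w.ravel():                       # raster order, top-left first
        nine_i = (int(value) << 3) + int(value)   # I*9 = (I << 3) + I
        bits = (bits << 1) | (1 if nine_i < sum_i else 0)
    return bits                                   # 9-bit MCT value

# Per-stage feature positions inside the 15x15 window and a 512-entry
# confidence LUT per position.  The stage sizes (22, 36, 100) follow the
# paper; the positions and LUT values themselves are random placeholders.
STAGE_NUM_FEATURES = [22, 36, 100]
_rng = np.random.default_rng(0)
STAGE_POSITIONS = [_rng.integers(0, 15, size=(n, 2)) for n in STAGE_NUM_FEATURES]
STAGE_LUTS = [_rng.random((n, 512)) for n in STAGE_NUM_FEATURES]

def stage_confidence(mct_patch, stage):
    # Stage confidence: each allowed position contributes the LUT entry
    # selected by the MCT value found there; the confidence is their sum.
    total = 0.0
    for i, (y, x) in enumerate(STAGE_POSITIONS[stage]):
        total += STAGE_LUTS[stage][i, mct_patch[y, x]]
    return total

if __name__ == "__main__":
    print(format(mct_window_bits([[90, 100, 110], [80, 120, 130], [70, 60, 140]]), "09b"))
    patch = _rng.integers(0, 512, size=(15, 15))
    print(stage_confidence(patch, 0))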
IV. SYSTEM IMPLEMENTATION

1. Synthesis Result

All hardware modules are implemented in Verilog HDL and simulated with the ModelSim 6.5 simulator for functional verification. We use a camera with a 24.5454 MHz clock frequency for the implementation, so for consistency we use a 25 MHz clock frequency for the simulation. The hardware modules are synthesized with the Synplify logic synthesis tool from Synplicity. The target device for the implementation is a Virtex-5 LX330 FPGA from Xilinx. We use the Xilinx ISE 10.1.3 integration tool for front-end and back-end implementation, and the place-and-route phase is performed by Xilinx PAR (Place and Route Program).

Table 1 shows the device utilization summary reported by Xilinx ISE. The number of occupied slices and the total used memory of the proposed system are 39,914 (about 76% of the device) and 288 KB, respectively, as shown in Table 1. The equivalent gate count is about 200 K gates.

Table 1. Device utilization and equivalent gate count
  Logic utilization           Used      Available   Utilization
  Number of slice registers   45,716    207,360     22%
  Number of slice LUTs        133,916   207,360     64%
  Number of occupied slices   39,914    51,840      76%
  Memory (KB)                 288       10,368      2%
  Equivalent gate count: about 200 K gates

Table 2 shows the timing summary reported by the Synplify synthesis tool. The requested frequency is 25.0 MHz and the maximum estimated frequency of the implemented system is 106.3 MHz. We use a progressive-scan camera with a 24.5454 MHz pixel clock, which captures VGA images at 60 fps. Because the frame rate increases linearly with the pixel clock, we can expect a rate of around 240 fps for the processing of standard VGA images at the reported maximum estimated frequency of the system.

Table 2. Timing summary
  Requested frequency   25.0 MHz
  Estimated frequency   106.3 MHz
  Requested period      40.000 ns
  Estimated period      9.409 ns
  Slack                 30.591 ns
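As a quick sanity check on the figures above (our arithmetic, with the camera's blanking overhead folded into the clocks-per-frame figure), linear scaling of the pixel clock from 24.5454 MHz to the 106.3 MHz estimate gives roughly 260 fps, of the same order as the approximately 240 fps quoted in the text, and the 0.15 ms detection time corresponds to about 3,750 cycles of the 25 MHz clock.

# Back-of-the-envelope check of the timing figures quoted above (our arithmetic;
# the camera's exact blanking format is an assumption folded into clocks_per_frame).
PIXEL_CLOCK_HZ = 24.5454e6        # camera pixel clock
CAMERA_FPS = 60.0                 # VGA frame rate delivered by the camera
MAX_EST_CLOCK_HZ = 106.3e6        # maximum estimated frequency after synthesis
SIM_CLOCK_HZ = 25.0e6             # clock used for simulation and the 0.15 ms figure

clocks_per_frame = PIXEL_CLOCK_HZ / CAMERA_FPS        # ~409,000 clocks per frame, incl. blanking
scaled_fps = MAX_EST_CLOCK_HZ / clocks_per_frame      # ~260 fps by linear scaling
detection_cycles = 0.15e-3 * SIM_CLOCK_HZ             # ~3,750 cycles for one eye detection

print(f"clocks per frame : {clocks_per_frame:,.0f}")
print(f"scaled frame rate: {scaled_fps:.0f} fps")
print(f"0.15 ms at 25 MHz: {detection_cycles:,.0f} cycles")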
2. System Configuration

Fig. 8 shows the system configuration for testing and verifying the eye detection system. The overall system consists of four PCBs (printed circuit boards): a CameraLink interface board, two FPGA boards for face and eye detection, and an SRAM/DVI interface board. The CameraLink interface board receives the sync signals and image data signals from the camera over the CameraLink interface. These signals include the pixel clock, VSYNC (vertical sync, frame valid), HSYNC (horizontal sync, line valid), DVAL (data valid), and the streaming image data. The FPGA boards grab these signals for use in face and eye detection. We use two FPGA boards: one for face detection and the other for eye detection. We use the face detector proposed by Jin et al. [22] to detect the coordinates of the face regions. A single image frame is stored and updated in SRAM to generate the pyramid downscaling images. The eye detection result is displayed on a monitor through a DVI interface.

Fig. 8. System configuration for eye detection.

V. EXPERIMENTAL RESULT

The robustness of our system was evaluated using both a face image database and real people's faces captured by our camera system. The database images were downloaded from the FDDB set [23]. We selected the first directory in the database, named 2002-07, for the experiment. This directory contains 934 images with 1394 faces, i.e., left-right eye pairs. The evaluation was made using four criteria. First, precision and recall were analyzed with respect to all left-right eye pairs. These values were calculated from the number of correct detections (true positives), the number of missed detections (false negatives), and the number of wrong detections (false positives). Second, the results were analyzed with out-of-focus images excluded. Similarly, in the two remaining criteria, rotation-in-plane (RIP) faces and faces with glasses were excluded, respectively. Fig. 9 shows the experimental results for the four cases. Note that our implementation does not support rotation-out-of-plane (ROP); thus, faces rotated out of the plane (i.e., faces in which only one eye is visible) were not considered in the evaluation. The experimental results yielded 93% detection accuracy.

Fig. 9. Evaluation of 4 different cases: all left-right eye pairs, all except out-of-focus eyes, all except RIP faces or eyes, all except eyes with glasses.

Fig. 10 shows sample results in which faces and eyes were successfully detected under different conditions.

Fig. 10. Sample results yielded from the face image database.

Fig. 11 and Table 3 show the detection results for real people's faces captured by our camera system, with and without glasses. The detection rate is 87% over 1200 frames. As shown in Table 3, the detection rate for people wearing glasses drops significantly. Light spots or blooming can appear near the eyes due to reflections caused by the glasses, and these effects can lower the detection rate. In general, the detection rate for people with glasses is lower than for people without glasses; therefore, additional processing is needed to reduce the effects caused by glasses.

Fig. 11. Online detection results of a person with glasses: (a) correctly detected results, (b) falsely detected results.

Table 3. Detection results for persons with and without glasses
                       Person 1        Person 2        Total
  Glasses              Yes     No      Yes     No      -
  Frames               300     300     300     300     1200
  Falsely detected     43      10      85      17      155
  Detection rate       85.67%  96.67%  71.67%  94.33%  87.08%

Considering the detection results for the image database (software implementation) and for the real people's faces captured by our camera system (hardware implementation), the detection rate of the hardware implementation is lower than that of the software implementation in the with-glasses case. However, the processing time of the hardware system is much better than that of the software: 0.15 ms/frame versus 20 ms/frame. The software implementation was evaluated on a personal computer with an AMD Athlon 64 X2 Dual Core 3800+ processor (2.00 GHz) and 2 GB of RAM.

Table 4 compares our method with other reported eye detection methods. Our method handles the largest image size with the fastest processing speed, at the cost of a small decrease in detection accuracy, and it is well suited to hardware implementation.

Table 4. Performance summary of reported software-based eye detection systems
  Eye detection implementation   Image size   Detection rate   FPS      Method
  K. Peng 2005 [24]              92×112       95.2 %           1 fps    Feature localization, template matching
  P. Wang 2005 [25]              320×240      94.5 %           3 fps    Fisher discriminant analysis, AdaBoost, recursive nonparametric discriminant analysis features
  Q. Wang 2006 [26]              384×286      95.6 %           -        Binary template matching, support vector machines
  T. Akashi 2007 [27]            320×240      97.9 %           35 fps   Template matching, genetic algorithm
  H.-J. Kim 2008 [28]            352×240      94.6 %           5 fps    Zernike moments, support vector machines
  M. Shafi 2009 [29]             480×540      87 %             1 fps    Geometry, shape, density features
  H. Liu 2010 [30]               640×480      94.16 %          30 fps   Zernike moments, support vector machines, template matching
  Ours                           640×480      93 %             50 fps   AdaBoost, modified census transform, multi-scale processing

VI. CONCLUSIONS

In this paper, we proposed a dedicated hardware architecture for eye detection and implemented it on an FPGA. The implemented system processes the pyramid images of the left and right eye candidate regions in parallel. The processing time to detect the left and right eyes of one face is less than 0.15 ms for a VGA image. The maximum estimated frequency of the implemented system is 106.3 MHz; because the frame rate increases linearly with the pixel clock, a rate of around 240 fps can be expected for standard VGA images at this frequency. The detection rate of our system is 93% on the FDDB face database and 87% for real people's faces. However, the detection rate for people with glasses drops significantly, which is a weakness of the proposed system. In the future, we will add a pre-processing step to reduce the effects caused by glasses.

ACKNOWLEDGMENTS

This work was supported by the Ministry of Knowledge Economy and the Korea Institute for Advancement of Technology through the Workforce Development Program in Strategic Technology.
REFERENCES

[1] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and Machine Recognition of Faces: A Survey," Proceedings of the IEEE, Vol.83, No.5, May, 1995.
[2] G. C. Feng and P. C. Yuen, "Variance projection function and its application to eye detection for human face recognition," Pattern Recognition Letters, Vol.19, No.9, pp.899-906, Jul., 1998.
[3] Q. Ji and X. Yang, "Real-Time Eye, Gaze, and Face Pose Tracking for Monitoring Driver Vigilance," Real-Time Imaging, Vol.8, No.5, pp.357-377, Oct., 2002.
[4] R. J. K. Jacob, "The Use of Eye Movements in Human-Computer Interaction Techniques: What You Look At is What You Get," ACM Transactions on Information Systems (TOIS), Vol.9, No.2, Apr., 1991.
[5] Z. Zhu and Q. Ji, "Robust real-time eye detection and tracking under variable lighting conditions and various face orientations," Computer Vision and Image Understanding, Vol.98, No.1, pp.124-154, Apr., 2005.
[6] A. L. Yuille, P. W. Hallinan, and D. S. Cohen, "Feature Extraction from Faces Using Deformable Templates," International Journal of Computer Vision, Vol.8, No.2, pp.99-111, 1992.
[7] X. Xie, R. Sudhakar, and H. Zhuang, "On improving eye feature extraction using deformable templates," Pattern Recognition, Vol.27, No.6, pp.791-799, Jun., 1994.
[8] K. Lam and H. Yan, "Locating and extracting the eye in human face images," Pattern Recognition, Vol.29, No.5, pp.771-779, May, 1996.
[9] A. Pentland, B. Moghaddam, and T. Starner, "View-Based and Modular Eigenspaces for Face Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'94), pp.84-91, 1994.
[10] W. Huang and R. Mariani, "Face Detection and Precise Eyes Location," in Proceedings of the International Conference on Pattern Recognition (ICPR'00), Vol.4, pp.722-727, 2000.
[11] J. Huang and H. Wechsler, "Eye Detection Using Optimal Wavelet Packets and Radial Basis Functions (RBFs)," International Journal of Pattern Recognition and Artificial Intelligence, Vol.13, No.7, pp.1009-1025, 1999.
[12] S. A. Sirohey and A. Rosenfeld, "Eye detection in a face image using linear and nonlinear filters," Pattern Recognition, Vol.34, No.7, pp.1367-1391, 2001.
[13] T. D'Orazio, M. Leo, G. Cicirelli, and A. Distante, "An Algorithm for real time eye detection in face images," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Vol.3, 2004.
[14] K. Lin, J. Huang, J. Chen, and C. Zhou, "Real-time Eye Detection in Video Streams," Fourth International Conference on Natural Computation (ICNC'08), pp.193-197, Oct., 2008.
[15] A. Amir, L. Zimet, A. Sangiovanni-Vincentelli, and S. Kao, "An embedded system for an eye-detection sensor," Computer Vision and Image Understanding, Vol.98, No.1, pp.104-123, Apr., 2005.
[16] B. Jun and D. Kim, "Robust Real-Time Face Detection Using Face Certainty Map," Lecture Notes in Computer Science, Vol.4642, pp.29-38, 2007.
[17] I. Choi and D. Kim, "Eye Correction Using Correlation Information," Lecture Notes in Computer Science, Vol.4843, pp.698-707, 2007.
[18] B. Fröba and A. Ernst, "Face Detection with the Modified Census Transform," in Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition (FGR'04), pp.91-96, May, 2004.
[19] Y. Freund and R. E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," Journal of Computer and System Sciences, Vol.55, pp.119-139, Aug., 1997.
[20] Y. Freund and R. E. Schapire, "A Short Introduction to Boosting," Journal of the Japanese Society for Artificial Intelligence, Vol.14, No.5, pp.771-780, 1999.
[21] P. Viola and M. Jones, "Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade," Advances in Neural Information Processing Systems, Vol.2, pp.1311-1318, 2002.
[22] S. Jin, D. Kim, T. T. Nguyen, D. Kim, M. Kim, and J. W. Jeon, "Design and Implementation of a Pipelined Datapath for High-Speed Face Detection Using FPGA," IEEE Transactions on Industrial Informatics, Vol.8, No.1, pp.158-167, Feb., 2012.
[23] V. Jain and E. Learned-Miller, "FDDB: A Benchmark for Face Detection in Unconstrained Settings," Technical Report UM-CS-2010-009, Dept. of Computer Science, University of Massachusetts, Amherst, 2010. http://vis-www.cs.umass.edu/fddb
[24] K. Peng, L. Chen, S. Ruan, and G. Kukharev, "A Robust Algorithm for Eye Detection on Gray Intensity Face without Spectacles," Journal of Computer Science and Technology, Vol.5, No.3, pp.127-132, 2005.
[25] P. Wang and Q. Ji, "Learning Discriminant Features for Multi-View Face and Eye Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005.
[26] Q. Wang and J. Yang, "Eye Detection in Facial Images with Unconstrained Background," Journal of Pattern Recognition Research, Vol.1, pp.55-62, 2006.
[27] T. Akashi, Y. Wakasa, K. Tanaka, S. Karungaru, and M. Fukumi, "Using Genetic Algorithm for Eye Detection and Tracking in Video Sequence," Journal of Systemics, Cybernetics, and Informatics, Vol.5, No.2, pp.72-78, 2007.
[28] H.-J. Kim and W.-Y. Kim, "Eye Detection in Facial Images using Zernike Moments with SVM," ETRI Journal, Vol.30, No.2, 2008.
[29] M. Shafi and P. W. H. Chung, "A Hybrid Method for Eyes Detection in Facial Images," International Journal of Electrical, Computer, and Systems Engineering, Vol.42, pp.231-236, 2009.
[30] H. Liu and Q. Liu, "Robust Real-Time Eye Detection and Tracking for Rotated Facial Images under Complex Conditions," in Proceedings of the International Conference on Natural Computation, 2010.

Dongkyun Kim received B.S. and M.S. degrees in electrical and computer engineering from Sungkyunkwan University, Suwon, Korea, in 2007 and 2009, respectively. He is currently working toward a Ph.D. degree with the School of Information and Communication Engineering, Sungkyunkwan University. His research interests include image/speech signal processing, embedded systems, and real-time applications.

Junhee Jung received a B.S. degree from the School of Information and Communication Engineering, Sungkyunkwan University, Korea, in 2009 and an M.S. degree from the Department of Electrical and Computer Engineering, Sungkyunkwan University, Korea, in 2011. His interests include image processing, computer vision, and SoC.
Thuy Tuong Nguyen received a B.S. (magna cum laude) degree in computer science from Nong Lam University, Ho Chi Minh City, Vietnam, in 2006, and M.S. and Ph.D. degrees in electrical and computer engineering from Sungkyunkwan University, Suwon, Korea, in 2009 and 2012, respectively. His research interests include computer vision, image processing, and graphics processing unit computing.

Daijin Kim received a B.S. degree in electronic engineering from Yonsei University, Seoul, Korea, in 1981, an M.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Taejon, in 1984, and a Ph.D. degree in electrical and computer engineering from Syracuse University, Syracuse, NY, in 1991. He is currently a Professor in the Department of Computer Science and Engineering at POSTECH. His research interests include intelligent fuzzy systems, soft computing, genetic algorithms, and the hybridization of evolution and learning.

Munsang Kim received his B.S. and M.S. degrees in mechanical engineering from Seoul National University in 1980 and 1982, respectively, and the Dr.-Ing. degree in robotics from the Technical University of Berlin, Germany, in 1987. Since 1987, he has been working as a research scientist at the Korea Institute of Science and Technology (KIST), Korea. He has led the Advanced Robotics Research Center since 2000 and became the director of the Intelligent Robot-The Frontier 21 Program in 2003. His current research interests are the design and control of novel mobile manipulation systems, haptic device design and control, and sensor applications in intelligent robots.

Key Ho Kwon received a B.S. degree from the Department of Electronics Engineering, Seoul National University, Korea, in 1975, an M.S. degree from the same department in 1978, and a Ph.D. degree from the same department in 1989. He is currently a Professor at the School of Information and Communication Engineering, Sungkyunkwan University. His research interests include artificial intelligence, fuzzy theory, neural networks, and genetic and evolutionary algorithms.

Jae Wook Jeon received B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1984 and 1986, respectively, and a Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, in 1990. From 1990 to 1994, he was a Senior Researcher with Samsung Electronics, Suwon, Korea. Since 1994, he has been with the School of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, first as an Assistant Professor and currently as a Professor. His research interests include robotics, embedded systems, and factory automation.