ENHANCED FEATURE SELECTIONS OF ADABOOST TRAINING
FOR FACE DETECTION USING GENETIC ALGORITHM
ZALHAN BIN MOHD ZIN
A thesis submitted in fulfillment of the
requirements for the award of the degree of
Master of Engineering (Electrical)
Faculty of Electrical Engineering
Universiti Teknologi Malaysia
AUGUST 2007
To my beloved mother and father
ACKNOWLEDGEMENT
In preparing this thesis, I was in contact with many academicians and
researchers who contributed a great deal to my understanding and thinking. I wish to
express my deep and sincere appreciation to my supervisor, Prof. Dr. Marzuki bin
Khalid, for his encouragement, guidance and criticism. I am also very thankful to my
co-supervisor, Prof. Dr. Rubiyah binti Yusof, for her advice, guidance and
motivation. Without their continuous support, this thesis would not have been
completed and presented.
I would also like to extend my appreciation to my fellow postgraduate students,
researchers and staff of the Center for Artificial Intelligence and Robotics (CAIRO),
Universiti Teknologi Malaysia (UTM) Kuala Lumpur, for their support and advice,
especially Mr. Yap Wooi Hen and Mrs. Nenny Ruthfalydia binti Rosli. Their
views and tips were indeed useful.
I am also grateful to the Universiti Kuala Lumpur Malaysia France Institute
(UniKL-MFI) for its support and financial assistance. My sincere appreciation also
extends to all my colleagues and others who have provided various forms of
assistance.
Last but not least, I would like to extend my greatest appreciation to my
beloved wife, Mardewee binti Endut, who has been very supportive, cooperative and
understanding of my commitment to this research and the preparation of this thesis. I
am also very grateful to all my family members.
ABSTRACT
A wide variety of face detection techniques has been proposed over the past
decades. Generally, a large number of features must be selected for training
purposes. Often some of these features are irrelevant and do not contribute directly to
the face detection technique, which results in unnecessary computation and the use of
large memory space. In this thesis, the feature search space is enlarged by enriching it
with seven new feature types. With these new feature types and the larger search
space, a Genetic Algorithm (GA) is used within the Adaboost framework to find sets
of features which can provide a better cascade of boosted classifiers with a shorter
training time. This technique, referred to as GABoost, is applied to the training part
of a face detection system. The GA carries out an evolutionary search over the
features, which results in a higher number of feature types and sets being selected in
less time. Experiments on a set of images from the BioID face database show that, by
using a GA to search over a large number of feature types and sets, GABoost is able
to obtain cascades of boosted classifiers for the face detection system that give higher
detection rates (94.25%), lower false positive rates (55.94%) and less training time
(6.68 hours).
ABSTRAK
A variety of face detection techniques has been introduced over the past few
decades. In general, a large number of features is required to be selected for training
purposes. Usually, some of these features are irrelevant and do not contribute directly
to the face detection technique. This situation leads to unnecessary machine
computation and the use of large machine memory space. In this thesis, the feature
search space has been widened by enriching it with seven new feature types. With
these new feature types and the wider search space, a Genetic Algorithm (GA) has
been used within the Adaboost framework to find sets of features that can give a
cascade of boosted classifiers with a shorter training time. This technique is known
as GABoost for the training part of a face detection system. The GA carries out an
evolutionary search to select features, which leads to a higher number of features
being included and the selection being made in a shorter time. Experiments on a set
of images from the BioID face database have proven that, by using the GA to search
over a large number of feature types and sets, the technique known as GABoost is
able to produce cascades of boosted classifiers for a face detection system that give a
higher face detection rate (94.25%), a lower false positive rate (55.94%) and less
total training time (6.68 hours).
TABLE OF CONTENTS

CHAPTER    TITLE    PAGE

           DECLARATION    ii
           DEDICATION    iii
           ACKNOWLEDGEMENTS    iv
           ABSTRACT    v
           ABSTRAK    vi
           TABLE OF CONTENTS    vii
           LIST OF TABLES    xi
           LIST OF FIGURES    xiii
           LIST OF ABBREVIATIONS    xx
           LIST OF APPENDICES    xxii

1          INTRODUCTION    1
           1.1  Introduction    1
           1.2  Objectives of the Thesis    4
           1.3  Scope of the Thesis    5
           1.4  Thesis Contributions    5
           1.5  Thesis Outline    6

2          LITERATURE REVIEW    9
           2.1  Introduction    9
           2.2  Applications of Face Recognition and Face Detection    12
                2.2.1  Physical Access Control    13
                2.2.2  Video Surveillance and Watch-list Identification    15
                2.2.3  Image Database Search    18
                2.2.4  Entertainment and Leisure    19
           2.3  Issues in Face Detection    20
                2.3.1  Scale    21
                2.3.2  Pose    22
                2.3.3  Illumination    23
                2.3.4  Facial Expression    24
                2.3.5  Occlusion    25
           2.4  Face Detection Methods and Techniques    26
                2.4.1  Knowledge-based Methods    26
                2.4.2  Feature Invariant Approaches    28
                       2.4.2.1  Facial Features    28
                       2.4.2.2  Skin Color    29
                       2.4.2.3  Multiple Features    30
                2.4.3  Template Matching Methods    31
                2.4.4  Appearance-based Methods    32
                       2.4.4.1  Eigenfaces    32
                       2.4.4.2  Distribution-based Methods    33
                       2.4.4.3  Neural Network    35
                       2.4.4.4  Support Vector Machines (SVM)    36
                       2.4.4.5  Adaboost    37
           2.5  Evolutionary Algorithm in Face Detection Techniques    40
           2.6  Genetic Algorithm    43
           2.7  Summary    44

3          FEATURE SELECTIONS OF ADABOOST TRAINING USING GENETIC ALGORITHM    47
           3.1  Introduction    47
           3.2  Method and Technique Used    48
           3.3  Haar-based Features and Integral Images    50
           3.4  Adaboost Learning Algorithm    52
           3.5  Cascade of Boosted Classifiers    61
           3.6  Genetic Algorithm for Feature Selections    63
           3.7  Face Databases for Training and Testing    79
           3.8  Summary    80

4          EXPERIMENTAL RESULTS AND ANALYSIS    82
           4.1  Introduction    82
           4.2  Experiment on Evolutionary Algorithm with the characteristics of Genetic Algorithm for feature selection, Haar-based Features and Integral Images    83
           4.3  Experimental results in terms of computational training time of ExBoost_5F, GABoost_15F_Ranking and GABoost_15F_Roulette    86
           4.4  Experimental results in terms of number of weak classifiers or features selected in ExBoost_5F, GABoost_15F_Ranking and GABoost_15F_Roulette    89
           4.5  Experimental results in terms of the performance of hit detection rates and false positive detection rates in ExBoost_5F, GABoost_15F_Ranking and GABoost_15F_Roulette    94
           4.6  Experimental results of the seven new feature types in GABoost_15F_Ranking and GABoost_15F_Roulette    104
           4.7  Analysis of the Experimental Results    110
           4.8  Summary    112

5          CONCLUSION AND FUTURE WORKS    114
           5.1  Conclusions    114
           5.2  Future Works    116

           REFERENCES    120
           Appendices A – B    126 - 140
LIST OF TABLES

TABLE NO.    TITLE    PAGE

3.1    Example of fitness values, normalized fitness values and accumulative normalized fitness values    71
4.1    Comparison of ExBoost_5F, GABoost_15F_Roulette and GABoost_15F_Ranking in terms of the training time taken to build a 15-stage cascade of classifiers    86
4.2    Computational training time of ten experiments of GABoost_15F_Ranking in building a 15-stage cascade of boosted classifiers    87
4.3    Computational training time of ten experiments of GABoost_15F_Roulette in building a 15-stage cascade of boosted classifiers    87
4.4    Comparison of ExBoost_5F, GABoost_15F_Ranking and GABoost_15F_Roulette in terms of the total number of features selected and the average time taken to select a single feature in the cascade of boosted classifiers    90
4.5    Number of features selected and time taken to select a single feature in GABoost_15F_Ranking in building the cascade of boosted classifiers    91
4.6    Number of features selected and time taken to select a single feature in GABoost_15F_Roulette in building the cascade of boosted classifiers    92
4.7    Comparison of the hit rates and false positive rates of the cascades of boosted classifiers built by ExBoost_5F, GABoost_15F_Ranking and GABoost_15F_Roulette    96
4.8    Details of the hit rates and false positive rates achieved in ten experiments using GABoost_15F_Ranking in building the cascade of boosted classifiers    97
4.9    Details of the hit rates and false positive rates achieved in ten experiments using GABoost_15F_Roulette in building the cascade of boosted classifiers    98
4.10   Details of the hit rates and false positive rates achieved in ten experiments using GABoost_Init in building the cascade of boosted classifiers    100
4.11   Details of the average numbers of the seven new feature types selected by GABoost_15F_Ranking    106
4.12   Details of the average numbers of the seven new feature types selected by GABoost_15F_Roulette    106
4.13   Details of the seven new feature types selected by GABoost_15F_Ranking in the ten experiments    107
4.14   Details of the seven new feature types selected by GABoost_15F_Roulette in the ten experiments    109
LIST OF FIGURES

FIGURE NO.    TITLE    PAGE

1.1    Structure of Closed-Circuit Television (CCTV) network    2
2.1    A generic framework of a face recognition system. The first step in this framework is the detection of faces in the image. The detection process is done by a face detection system    10
2.2    Example of the process of an automated face recognition system: (a) the face is detected; (b) pose tracking and estimation; (c) alignment process; and (d) the person is recognized    11
2.3    Co-operation is required of the subject, who puts his face in front of the camera    14
2.4    The process of face recognition by FacePASS, where a grid is placed on the subject's face image and the face is verified against the database. Access is granted only for a good match, whereas access is denied to people unknown to the system    14
2.5    Two different results of FacePASS. On the left, access is granted, while on the right the subject's access is rejected    15
2.6    Face detection and recognition by FaceFINDER. On the top left of the screenshot is the input video image, which shows many uncooperative subjects walking. FaceFINDER detects the faces and compares them to a database of face images, as shown on the right side of the screenshot    17
2.7    FaceSnap detects faces in the images and can also capture and store the facial images    17
2.8    The AcSys Watchlist main interface. The system performs face recognition using its database of face images    18
2.9    Examples of digital cameras that use face detection technology. From left: Canon Powershot G7, Fuji Finepix S6000FD and Canon Ixus 850IS    19
2.10   Example of face detection applications used in mobile phones and cameras    19
2.11   Example of a single image which contains faces of different scales or sizes    21
2.12   Different effects of illumination on a face    23
2.13   Different facial expressions    24
2.14   Different occlusions on a face    25
2.15   Example of the horizontal and vertical signatures used in [38] to detect faces    27
2.16   Example of the distance measures used in Sung and Poggio's method presented in [3]: (a) computation of distances between a test pattern and the clusters; (b) each distance measure is a two-value metric, where D1 is a Mahalanobis distance and D2 is the Euclidean distance    34
2.17   System diagram of the Rowley-Kanade neural network method [4]    36
2.18   Face detection cascade of classifiers shown in [23], whereby rejection can happen at any stage    38
2.19   Evolutionary Algorithm used in [14] to build single-stage classifiers    41
2.20   Crossover and mutation process used in [17]. In (a), each parent is converted into a sequence of observation vectors for crossover; (b) shows the crossover process and (c) shows the mutation process    42
2.21   The procedure of Evolutionary Pruning used in [57] to reduce the number of weak classifiers trained by Adaboost during the training of the cascade of boosted classifiers    43
3.1    Example rectangle features shown relative to the enclosing detection window. The sums of the pixels which lie within the white rectangles are subtracted from the sum of pixels in the black rectangles. Two-rectangle features are shown in (a) and (b); (c) shows a three-rectangle feature and (d) a four-rectangle feature    50
3.2    Five basic types of rectangle features within their sub-window of 24x24 pixels. These five basic feature types are the initial features used to train the cascade of classifiers exhaustively in OpenCV    51
3.3    The sum of pixels within rectangle D can be computed with four array references. The value of the integral image at location 1 is the sum of the pixels of rectangle A. The value at location 2 is A+B, at location 3 is A+C, and at location 4 is A+B+C+D. The sum within D can be computed as 4+1-(2+3)    52
3.4    Two classes of data points represent two classes of images, such as face and non-face. The weight associated with each point is equal to 1    53
3.5    The first weak learner is chosen at random, which in this case divides the set of data points into two. A rather high number of green points is included in its selection    54
3.6    The weak learner (bold line) appears to be the best among all weak learners at classifying face images and is therefore selected as the weak classifier    55
3.7    The weights of all data points or training examples misclassified by the first weak classifier are updated and increased by Adaboost    56
3.8    The second weak classifier is selected. However, some data points are still misclassified by these weak classifiers    56
3.9    The selection of the third weak classifier; misclassified data points still exist    57
3.10   The four selected weak classifiers, which are good enough to differentiate between face and non-face images    57
3.11   Adaboost learning algorithm as proposed in [23][25][26]. This algorithm is used to select sets of weak classifiers to form strong classifiers from all possible feature types. The search for a good feature ht is done exhaustively, as stated in step 3b    59
3.12   The determination of thresholds by the weak learner    60
3.13   Cascade from simple to complex classifiers with N layers    61
3.14   A simple feature is used to reject simple background patterns. The left sub-window image passes through to the next strong classifier while the right one is simply discarded    62
3.15   The cascade learning process, in which new false positive images from the previous stages are added into the set of negative sample images in the next stages    63
3.16   Pseudo-code of an Evolutionary Algorithm with the characteristics of a Genetic Algorithm    64
3.17   Structure of an individual or chromosome which represents the specific type and location of a single feature. Its last gene contains the fitness value from the chosen fitness function: 1 − εi    65
3.18   Representation of an individual or chromosome as the type and location of a feature in the sub-window of 24x24 pixels    66
3.19   Structure of the population of the Genetic Algorithm with a population of size N. The fitness value is equal to 1−error and lies between 0 and 1. A higher fitness value means a lower error value for a particular feature or weak classifier. This figure shows the chromosomes already sorted and ranked by their fitness values    67
3.20   The existing three types of features within their sub-window of 24x24 pixels. These feature sets are added in the training of the cascade of classifiers with the Genetic Algorithm search    68
3.21   The newly proposed seven types of features within their sub-window of 24x24 pixels. These feature sets are proposed and added in the training of the cascade of classifiers with the Genetic Algorithm search. These additional feature types increase the size of the search space, and the computational time taken for cascade training is higher    69
3.22   Example of each of the ten chromosomes' fitness values represented in the form of a roulette wheel. Chromosome number 1 occupies the biggest portion of the wheel while chromosome number 10 occupies the smallest    71
3.23   Example of how the accumulative normalized fitness values are assigned to each chromosome. When a probability p_roulette between 0 and 1 is chosen, the comparison between p_roulette and the accumulative normalized fitness values is made from left to right. The first chromosome found with an accumulative normalized fitness value higher than p_roulette is selected    72
3.24   The crossover process with two chosen parents and two randomly chosen genes m and n. The values at positions m and n are crossed over to produce new children, which are then evaluated to obtain their new fitness values    73
3.25   The mutation process with a single chosen parent chromosome. The selected gene between the second and fifth genes is mutated by adding an integer value between -2 and 2, while for the first gene, containing the type of feature, a random type number between 1 and 15 is chosen. The new child is then evaluated to determine its new fitness value    75
3.26   Some examples of face images used in the training set    79
3.27   Some examples of non-face images used in the training set    79
3.28   Some examples of images containing faces under various conditions used in the BioID test set    80
4.1    Snapshot of the log file generated by the program, indicating the end of the training of the 15-stage cascade of classifiers. The total training time taken is highlighted in the log file    85
4.2    Snapshot of the log file generated by the program performance.exe, which gives the hit rates, missed rates and false positive (false alarm) rates of the cascade of boosted classifiers    85
4.3    The different performance in terms of computational training time of GABoost_15F_Ranking and GABoost_15F_Roulette. They are compared to each other and to the computational training time of ExBoost_5F. The x-axis represents the experiment number and the y-axis the computational time in hours    88
4.4    The number of features selected in GABoost_15F_Ranking and GABoost_15F_Roulette in ten different experiments, compared to ExBoost_5F    93
4.5    The computational time taken to select a single feature in GABoost_15F_Ranking and GABoost_15F_Roulette in ten different experiments, compared to ExBoost_5F    93
4.6    Four different results from the cascade of boosted classifiers. Image (a) shows a face correctly detected and counted as a hit; (b) shows a face that is not detected (missed); in (c), the face is not detected but a false positive detection occurs when a non-face sub-window is classified as a face; in (d), both a hit and a false positive detection occur    94
4.7    Performance in terms of hit rates and false positive rates between the ten experiments of GABoost_15F_Ranking and ExBoost_5F    98
4.8    Performance in terms of hit rates and false positive rates between the ten experiments of GABoost_15F_Roulette and ExBoost_5F    99
4.9    Performance in terms of hit rates and false positive rates between the ten experiments of GABoost_Init and ExBoost_5F    100
4.10   Some examples of the test images. The top three images show faces that are not detected, with only false positive detections; the middle three images show faces detected together with false positive detections; in the bottom three images, faces are detected perfectly without any false positive detection    103
4.11   The number of all seven new feature types selected during the training of ten cascades of boosted classifiers using GABoost_15F_Ranking    105
4.12   The number of all seven new feature types selected during the training of ten cascades of boosted classifiers using GABoost_15F_Roulette    105
4.13   The distributions of the seven new feature types selected during the training of ten cascades of boosted classifiers using GABoost_15F_Ranking    108
4.14   The distributions of the seven new feature types selected during the training of ten cascades of boosted classifiers using GABoost_15F_Roulette    109
5.1    A generic Memetic Algorithm as used in [62]    116
5.2    The dynamic rates of crossover and mutation for 200 generations    117
5.3    The procedure of Evolutionary Pruning used in [57] to reduce the number of weak classifiers trained by Adaboost during the training of the cascade of boosted classifiers    118
LIST OF ABBREVIATIONS

ACTS      -  Advanced Communications Technologies and Services
BioID     -  Biometric Identification
CCTV      -  Closed-Circuit Television
CSI       -  Crime Scene Investigation
D1        -  Mahalanobis distance
D2        -  Euclidean distance
EA        -  Evolutionary Algorithm
ES        -  Evolutionary Search
FLD       -  Fisher's Linear Discriminant
GA        -  Genetic Algorithm
HCI       -  Human Computer Interaction
HSV       -  Hue Saturation Value
M2VTS     -  Multi Modal Verification for Teleservices and Security Applications
MA        -  Memetic Algorithm
NN        -  Neural Networks
OpenCV    -  Open Source Computer Vision
PCA       -  Principal Component Analysis
PDM       -  Point Distribution Model
RBF       -  Radial Basis Function
SNoW      -  Sparse Network of Winnows
SVM       -  Support Vector Machines
Fk        -  Strong classifier of stage k
Gen       -  Generation
H         -  Strong classifier
II        -  Integral Image
N         -  Size of population
T         -  Total iterations
cr        -  Chromosome
dx        -  Width
dy        -  Height
e         -  Training sample
f         -  Feature
h         -  Weak classifier
i         -  Index of weak classifier
l         -  Number of positive samples
m         -  Number of negative samples
m         -  Horizontal displacement
n         -  Vertical displacement
p         -  Probability rate
p         -  Parity
x         -  Horizontal displacement
y         -  Vertical displacement
α         -  Weight
β         -  Weight update coefficient
ϑ         -  Threshold
ω         -  Weight
ε         -  Error
LIST OF APPENDICES

APPENDIX    TITLE    PAGE

A    Acceptance letter and published paper in the 3rd International Colloquium on Signal Processing and Its Applications (CSPA2007), Malacca    127
B    Acceptance letter and published paper in the 3rd IASTED International Conference on Computational Intelligence (CI07), Banff, Alberta, Canada    134
CHAPTER 1
INTRODUCTION
1.1    Introduction
Since the dawn of modern times, humans have been interested in how nature
functions, including how they themselves function. This understanding has allowed
mankind to reproduce certain natural functions and to extend human limitations. An
impressive example is escaping gravity, in other words, flying. Now the human race
is increasingly interested in reproducing one of the most impressive features of
nature: intelligence. Researchers are trying to build intelligent machines with
different capabilities. Building machines or robots with the faculty of vision is
probably one of the most challenging problems humans are trying to solve. The
computer vision community started to pay attention to face processing about three
decades ago, and it has been widely investigated in recent years [1-16], a list which
is far from exhaustive.
Over the past decades, many projects have been started with the purpose of
teaching machines to recognize human faces and facial expressions. Computer vision
has become one of the most challenging fields of study nowadays, and the need to
extract information from images is enormous. Face detection and extraction, as
computer-vision tasks, have many applications and are directly relevant to the face
recognition and facial expression recognition problems. Face detection is the first
stage towards automatic face recognition. Potential applications of face detection
and extraction include human-computer interfaces, surveillance systems, census
systems and many more. The importance of face detection is underscored by public
security issues such as the 9/11 World Trade Center attack and the London and Bali
bombings. In major cities like London or Paris, for example, people are monitored,
especially in public places, by closed-circuit television (CCTV) cameras, which are
linked via cables and other devices (see Figure 1.1). Specific software and
applications are also integrated into these CCTV systems. Such systems can also be
found in highly monitored locations such as casinos, banks and high-security
laboratories or buildings.
Figure 1.1: Structure of Closed-Circuit Television (CCTV) network
The set-up of CCTV is very simple. Some cameras capture images, including
the faces of people as they pass through critical locations; other cameras are able to
detect a threat. Usually, the software and applications in the CCTV system play their
role in detecting any kind of threat. When the authorities would like to monitor the
presence of a suspected individual, the CCTV system, through its applications, acts
on a principle similar to that of a face detection and recognition system. First, a face
is detected. Then, it can be tracked so that important features can be extracted for
analysis. The type of features extracted depends strongly on what the system wants
to achieve. Features can be obtained for either the recognition of a face
(identification) or the recognition of an emotion or expression. Face identification is
relevant to retrieving a person's identity, while emotion recognition contributes, for
instance, to the prevention of crime and calamities, such as the detection of
aggression or of unusual or nervous behavior. That is also why the extraction and
recognition of facial expressions have been a hot topic in the last decade. It is
important to note that face detection and facial expression recognition are distinct
subjects: in face detection the different expressions are considered as noise, whereas
in facial expression recognition the identity is considered as noise. The latter implies
that different persons have different neutral faces with different feature shapes
(big/small eyes, big/small mouth, etc.).
This research is mainly interested in the face detection problem, that is, how
to find, based on visual information, all the occurrences of faces regardless of who
the person is. Face detection is one of the most challenging problems in computer
vision, and no solution has yet been achieved with performance comparable to that
of humans in both precision and speed. High precision is now technically achieved
by building systems which learn from large amounts of training data in order to
minimize errors on the test sets. In most cases, the increase in precision is achieved
at the expense of a degradation in run-time performance (computational time). Since
high precision is demanded in major applications, reducing the computation needed
to shorten processing time is now a problem with hard constraints.
Finally, the problem of detecting a face is handled well by human intelligence
without our realizing it. This research, which is dedicated to exploring this
remarkable human capability, is both interesting and useful as a basis for further
research in this country, which is now building towards a more knowledgeable
society.
1.2    Objectives of the Thesis
The main objective of this research is to enhance and improve the selection of
features from a large set of candidate features in the training of a cascade of boosted
classifiers for a face detection system, by using an Evolutionary Algorithm (EA)
with the characteristics of a Genetic Algorithm (GA). The more specific objectives
are described in the following:
1. To investigate various techniques that are able to detect and recognize
human faces in images.
2. To investigate and review different techniques such as Haar-based
features, the Adaboost algorithm, Neural Networks, Support Vector
Machines (SVM), Eigenfaces and GA in face detection and face
recognition applications.
3. To investigate and explore the existing face detection system using
Haar-based features and the Adaboost algorithm, specifically in the Intel
OpenCV software.
4. To implement the GA inside the Adaboost framework to select features
in building the cascade of boosted classifiers.
5. To add seven new feature types in order to increase the quality of
candidate features, thus enlarging the feature search space.
6. To modify the C/C++ source code of the Intel OpenCV software to
implement the GA.
7. To prepare the databases for training and testing the cascades of
boosted classifiers.
8. To analyze and compare the performance of the cascades of boosted
classifiers built using the GA with that of the cascade of classifiers built
exhaustively.
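As a rough illustration of objective 4, the sketch below shows how one candidate weak classifier can be encoded as a GA chromosome, a feature type and its location within the 24x24 sub-window, with the last gene holding the fitness 1 − ε (as described later for Figures 3.17 to 3.19). The helper names and the weighted-error argument are illustrative assumptions, not the thesis's C/C++ implementation.

```python
import random

# A chromosome encodes one Haar-like weak classifier candidate:
# (feature type 1..15, x, y, width dx, height dy) inside a 24x24 sub-window,
# plus a final slot holding its fitness value 1 - error.
WINDOW = 24

def random_chromosome():
    """Sample a random feature type and location (illustrative encoding)."""
    ftype = random.randint(1, 15)
    dx = random.randint(1, WINDOW)
    dy = random.randint(1, WINDOW)
    x = random.randint(0, WINDOW - dx)   # keep the feature inside the window
    y = random.randint(0, WINDOW - dy)
    return [ftype, x, y, dx, dy, None]   # last gene: fitness, filled on evaluation

def evaluate(chromosome, weighted_error):
    """Fitness = 1 - weighted classification error, so it lies in [0, 1]
    and a higher fitness means a lower error for that weak classifier."""
    chromosome[5] = 1.0 - weighted_error
    return chromosome[5]

chrom = random_chromosome()
fitness = evaluate(chrom, weighted_error=0.32)  # hypothetical error value
assert 0.0 <= fitness <= 1.0
```

In the actual training loop, `weighted_error` would come from testing the decoded feature against the Adaboost-weighted training samples; here it is passed in directly to keep the sketch self-contained.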
1.3    Scope of the Thesis
The scope of this research is described as follows:
1. The system is developed for human face detection and tracking, based
on the technique of Haar-feature classifiers and the Adaboost
algorithm.
2. The system's primary concern is to train a cascade of boosted classifiers
using the GA technique in the training part. For the detection part, the
system uses the cascade of boosted classifiers created previously.
3. The research also concentrates on writing and modifying the program's
source code to implement the GA in the training part of the face
detection system.
4. The research focuses on improving the selection of features, or weak
classifiers, which later form the cascade of boosted classifiers using the
GA.
5. The research also compares and analyzes the performance of cascades
of boosted classifiers trained with two different techniques:
evolutionary search with the GA and exhaustive search.
6. The research also analyzes the performance of the seven new feature
types proposed in the training of the cascade of boosted classifiers.
1.4    Thesis Contributions
The contributions of this thesis can be categorized as follows:
1. The main contribution of this thesis is the implementation of the GA
inside the Adaboost framework to select features from a larger search
space in order to build the cascade of boosted classifiers. The module can
be implemented in the training part of a face detection system. The
feature selection is done by the GA over a large search space with low
computational time, as a replacement for the exhaustive feature search
over a small search space with high computational time. Face detection
experiments on single images are conducted to assess the performance, in
terms of hit rates, missed rates, false positive rates and training time, of
the different cascades of boosted classifiers built using the GA and the
exhaustive techniques. The results are compared and analyzed.
2. The second contribution is the seven newly proposed feature types,
which enrich the set of candidate features with more high-quality features
or weak classifiers. The contributions of these seven new feature types to
the trained cascades of boosted classifiers are compared and analyzed.
3. Another contribution is a comprehensive review of existing face
detection techniques for grayscale image applications. This is done first
by describing the different challenges, and then by presenting the most
significant work after dividing the field into four categories.
4. The final contribution relates to the GA itself, through the proposal and
development of programs related to its structure, operators and
parameters.
1.5    Thesis Outline
This thesis is divided into five chapters. Chapter 1 provides the Introduction.
Chapter 2 presents some examples of real world applications of face detection and
face recognition systems in four different application categories. These four
categories illustrate the different functions that face detection and face
recognition systems serve under various requirements, situations and
environments. This chapter also contains a full review of the various issues in
face detection, together with the four existing categories of face detection
techniques, as well as a review of research involving the use of
Evolutionary Algorithms in face detection. The four categories are as follows: 1) Knowledge-based
methods are presented first, and they include rule-based methods which encode
human knowledge on what should constitute a typical face. Usually, the rules capture
the relationships between facial features. 2) Feature-invariant approaches are
algorithms that aim to find structural features that exist even when the pose,
viewpoint or lighting conditions vary, and then use these to locate faces. 3) Then,
template-matching methods will be described. These usually consist of several
standard facial patterns, which are stored to describe the face as a whole or as
separate facial features. The correlation between an input image and the stored
patterns is computed for detection. 4) The fourth and last category consists of
appearance-based methods. In contrast to template matching, the models (or
templates) used here are learned from a set of training images that are meant to
capture the representative variability of facial appearance. Then, these learned
models are used for image detection. The use of Evolutionary Algorithms in face
detection, especially those involving the appearance-based methods, is also
described.
Chapter 3 presents a thorough description of GA to select features in building
cascade of boosted classifiers. The description includes the structure of population
and chromosomes, initial parameters, selection schemes, crossover and mutations
rates, termination criteria and the number of generations of the GA. Two types of
selection schemes, the Ranking Scheme and the Roulette Wheel Scheme, are explained in
detail, as both are used in this research. A review of the selection of weak
classifiers or features to form a set of strong classifiers in various training stages or
layers by Adaboost is also presented. Furthermore, the proposed seven new feature
types to enrich the quality of feature solutions are also presented in this chapter.
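The two selection schemes named above differ mainly in how selection pressure is applied. The sketch below contrasts them on an illustrative fitness vector; the values and trial counts are assumptions for demonstration, not taken from the thesis:

```python
import random

random.seed(1)

def roulette_wheel(fitnesses):
    """Pick an index with probability proportional to raw fitness."""
    total = sum(fitnesses)
    r, acc = random.uniform(0, total), 0.0
    for i, f in enumerate(fitnesses):
        acc += f
        if acc >= r:
            return i
    return len(fitnesses) - 1

def ranking(fitnesses):
    """Pick an index with probability proportional to its rank (worst=1 ... best=n).
    Rank-based pressure is independent of the fitness scale."""
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i])
    ranks = [0] * len(fitnesses)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return roulette_wheel(ranks)  # spin the wheel over ranks, not raw fitness

fits = [0.1, 0.2, 0.3, 10.0]  # one individual dominates the raw fitness
counts_rw = [0] * 4
counts_rk = [0] * 4
for _ in range(10000):
    counts_rw[roulette_wheel(fits)] += 1
    counts_rk[ranking(fits)] += 1
print(counts_rw, counts_rk)
```

With raw-fitness roulette wheel, the dominant individual is chosen almost always (premature convergence risk); with ranking, its advantage is capped by its rank, preserving diversity.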
Chapter 4 is dedicated to the experiments done to assess the performance of
the trained cascade of boosted classifiers. The main focus of this chapter is to
compare and analyze the performance of cascades of boosted classifiers built
using two different GA selection schemes, the Ranking Scheme and the Roulette
Wheel Scheme, over a large feature solution set, against a cascade of boosted
classifiers built exhaustively from a small feature solution set. The results of these three different
techniques used are shown and analyzed.
Finally, Chapter 5 concludes the thesis with a summary of the work that has
been accomplished, a review of the objectives, their fulfillment, and a glimpse at
future work to improve the proposed techniques.
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
The face is a primary focus of human attention in social interaction, playing
a major role in conveying identity and emotion. A human being’s ability to identify
and recognize human faces is remarkable. We can recognize thousands of faces
throughout our lifetime and identify familiar faces, such as old friends or
neighbors, even after years of separation. This extraordinary perceptual skill is
part of human intelligence, and it is robust despite large changes in visual
appearance due to viewing conditions, expression, ageing and distractions such as
glasses, hats or changes in hairstyle or facial hair.
The interest in research and development in the field of face recognition and
detection comes from many potential real-world applications, especially in security,
law enforcement, video surveillance, networking and human-machine interaction. A
few examples are criminal face detectors installed at airports, the
reconstruction of criminal or suspect faces for the police, predicting how an
individual’s face will look in a few years, and the development of intelligent
robots or machines through their computer vision systems. Developing a
computational model of face recognition and face detection is quite difficult because
faces are complex, multidimensional and have meaningful visual changes. Usually in
developing a fully automated face recognition system, many steps and processes
need to be taken. The first step is to detect the presence of human faces in the image.
Once the face detection system has detected faces, these face images are used to
recognize or verify individuals by a face recognition system. An overview of the
generic face recognition system framework is shown in Figures 2.1 and 2.2. These
figures show that a face is first detected and the detected face image is then
analyzed in order to recognize or verify the individual. The initial problem that
arises in face detection research is to build an efficient face detection system, in
which the machine must determine whether or not a face is present in a picture or
video image.
Figure 2.1: A generic framework of face recognition system. The first step in this
framework is the detection of faces in the image. The detection process is done by a
face detection system.
Figure 2.2: Example of a process of an automated face recognition system. (a) The
face is detected; (b) pose tracking and estimation; (c) alignment process; and (d)
the person is recognized.
Building a fully-automated system to analyze the information contained in
face images requires a robust and efficient face detection algorithm. Given a single
image, the goal of face detection is to identify all image regions that contain a face,
regardless of its three-dimensional position, orientation or lighting conditions. Such a
problem is challenging, because faces are non-rigid objects that have a high degree
of variability in size, shape, color and texture. Many techniques have been developed
to detect faces in a single image, and the purpose of this chapter is to categorize and
review these face detection techniques. The motivation behind face detection is that
many research efforts, such as face recognition systems, assume that all faces in the
image have already been identified and localized (for example, by identifying the
image position of each
single face). Therefore, to obtain robust automated face recognition systems, we
must be able to detect faces within images in an efficient manner before recognizing
them. In [10], the author defines face detection as follows:
“Given an arbitrary image, the goal of face detection is to determine
whether or not there are any faces in the image and, if present, return
the image location and extent of each face”.
In general, the issues attributed to this face detection problem are as follows:
1. pose
2. facial expression
3. occlusion
4. image orientation
5. imaging conditions
The first part of this chapter will review some of the commercialized face detection
and recognition applications. The second part will examine some of the issues
attributed to face detection, while the third part will enumerate existing face
detection methods that deal with these five issues when given a single image. The
last part of the chapter will review some improvements made to the face detection
system by an Evolutionary Algorithm such as the Genetic Algorithm. Many
methods for face detection are discussed in the literature, including artificial neural
networks face detection [3], [4], support vector machines (SVM) [19], gravity-centered
templates [20], graph matching [21], skin color learning [22], and also
coarse-to-fine processing [23], [24], [27], to name a few.
2.2 Applications of Face Recognition and Face Detection
Security concerns at important and major places such as airports, ports, customs,
prisons, border posts, government and private buildings, clubs, banks, high-security
laboratories, nuclear power plants, military facilities, strategic sites and public
areas have created a high demand for systems capable of handling threats from any
kind of source. Various types of systems that recognize, identify and verify a person
have been used in these places. Most of these systems use biometric identification
such as fingerprints, iris or voice, or non-biometric means such as smart cards,
x-ray machines and metal detectors. Face detection and recognition systems are an
alternative that can be used in this field. Many applications of face
detection and face recognition systems have been developed and commercialized
nowadays. Although much research can still be done to improve their quality and
performance, commercial face detection and recognition applications with good
performance already exist in the market. Face
detection and recognition systems are usually used in four different categories of
applications:
1. Physical Access Control
2. Video Surveillance and Watch-list Identification
3. Image Database Search
4. Entertainment and Leisure
2.2.1 Physical Access Control
Physical Access Control requires a cooperative subject to be identified or verified.
This type of application is usually used to grant access to a restricted location. It
uses a one-to-one verification technique, which means that the subject’s face is
verified against the subject’s claim of who he or she is. The subject must place his
or her face in front of the camera in order to be granted access. In this category,
face detection becomes much easier since the system can easily estimate the
location of the face in the input image. Face recognition is then performed on the
captured face image. An example of a
commercialized Physical Access Control using face recognition technology is
“FacePASS 4.1” [65]. This application was developed by Viisage Technology
Incorporation, one of the world leading companies in face recognition technology.
The example of Physical Access Control and FacePASS 4.1 applications are shown
in Figures 2.3, 2.4 and 2.5.
Figure 2.3: The subject cooperates by placing his face in front of the camera.
Figure 2.4: The process of face recognition by FacePASS where a grid is placed on
the subject’s face image and this face is verified with the database. Access is granted
only in case of good match whereas access is denied to people unknown to the
system.
Figure 2.5: Two different results of FacePASS. On the left, access is granted while
on the right the subject’s access is rejected.
2.2.2 Video Surveillance and Watch-list Identification
In this category, a face recognition system requires an important first step
known as face detection. Here, subjects are uncooperative, which means that no
face will be placed directly in front of the camera. Face detection and recognition
systems in this video surveillance and watch-list identification category are
integrated with Closed Circuit Television (CCTV)
and are normally implemented at places which involve public access or area like
airports, ports, banks and also in the city streets. The systems must be able to detect
the presence of faces first before recognizing them. After a face is recognized, the
authorities will know whether any particular individual is a threat or not and then
further action will be taken.
The need for this type of face detection and recognition system has increased in
recent years due to security-related issues, especially after the terrorist attacks
on New York’s World Trade Center, commonly referred to as “9/11”.
Although the capability of a fully automated face recognition system is far from
perfect, many of these applications with good performance in face detection and
recognition have been developed and commercialized. Examples of commercialized face
recognition systems for video surveillance and watch-list identification are
FaceFINDER 2.9, developed by an American company, Viisage Technology Incorporation;
FaceSnap and FaceCheck, developed by a German company, C-Vis GmbH; and FaceIntellect,
developed by a Russian company, ITVSec. Examples of these applications are shown in
Figures 2.6 and 2.7.
Figure 2.6: Face detection and recognition by FaceFINDER. On the top left of the
screenshot is the input video image, which shows many uncooperative subjects
walking. FaceFINDER will detect the faces and compare them to database of face
images as shown on the right side of the screenshot.
Figure 2.7: FaceSnap will detect faces in the images and also can capture and store
the facial images.
2.2.3 Image Database Search
A face recognition system based on image database search is usually used by
authorities such as the police, Crime Scene Investigation (CSI) units, customs,
immigration and local authorities. In this category, the input images provided by
the authorities are face images, and these images are analyzed and compared against
face databases. It is quite similar to a face recognition system in the Physical
Access Control category, but it does not need the presence of the subject to perform
the face recognition. The input images are pictures of the kind used in driving
licenses, passports and identity cards, which already contain the subject’s face.
Face recognition is then performed to identify that individual. An example of this face
recognition system is AcSys Watchlist, which is developed by a Canadian company,
AcSys Biometrics Corporation. The main interface of this system is shown in Figure
2.8.
Figure 2.8: The AcSys Watchlist Main Interface. The system performs face
recognition by using its face images database.
2.2.4 Entertainment and Leisure
Besides the important role of face detection and recognition systems in public and
private security, such systems can also be used for entertainment and leisure. In
this category, however, face detection is preferred over face recognition. Face
detection technology has been implemented in mobile phones and digital cameras such
as the Canon Ixus 850IS, Canon Powershot G7 and Fuji Finepix S6000FD (see Figure
2.9). In 2006, an American company,
FotoNation introduced FaceTracker, face detection software that can be integrated
into digital cameras and mobile phones to detect and track faces.
Examples of face detection technology used in mobile phones and cameras are
shown in Figures 2.10 and 2.13.
Figure 2.9: Examples of digital cameras that use face detection technology. From
left: Canon Powershot G7, Fuji Finepix S6000FD and Canon Ixus 850IS.
Figure 2.10: Examples of face detection applications used in mobile phones and
cameras.
2.3 Issues in Face Detection
Face detection provides interesting challenges due to the requirements for
pattern classification, learning and recognition techniques. When an image is
considered as an input to a pattern classifier, the dimension of the feature space is
extremely large (i.e., the number of pixels in normalized training images). These
classes of face and non-face images are decidedly characterized by multimodal
distribution functions. To be effective, a classifier must either be able to extrapolate
from a modest number of training samples, or it must be efficient when dealing with
a very large number of these high-dimensional training samples. The factors that
affect a face detection system’s performance, particularly its hit rate, miss rate
and false positive rate, are scale, pose, illumination, facial expression and
occlusion.
2.3.1 Scale
In a single given image, a group of faces may appear in different scales or
sizes, as illustrated in Figure 2.11. The scale or size of a face may be dealt with by a
simple resizing process or by warping, based on what is termed a stretching
transform [30]. This technique requires localization of several feature points such as
the eyes, nose, or mouth, in order to warp the face in a way that corresponds to the
face’s biometrics. These feature points may be located manually [30], or by
automatic low-level processing based on facial geometry. However, in general, the
latter is not sufficiently robust, and because the location of the feature points is
unknown in a face detection problem, a resizing process is usually preferred.
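The resizing approach can be sketched as an image pyramid: the image is repeatedly downscaled and a fixed-size window is slid over each level, so that a single window size covers faces of many sizes. The window size, step and scale factor below are typical illustrative values, not the settings used in this thesis:

```python
def pyramid_scales(img_w, img_h, window=24, factor=1.25):
    """Scales at which a fixed window still fits inside the downscaled image."""
    scales = []
    s = 1.0
    while img_w / s >= window and img_h / s >= window:
        scales.append(s)
        s *= factor
    return scales

def sliding_windows(img_w, img_h, window=24, step=4, factor=1.25):
    """Yield (x, y, size) candidate regions in original-image coordinates."""
    for s in pyramid_scales(img_w, img_h, window, factor):
        w, h = int(img_w / s), int(img_h / s)  # downscaled image size
        for y in range(0, h - window + 1, step):
            for x in range(0, w - window + 1, step):
                # map the fixed window back to original coordinates
                yield int(x * s), int(y * s), int(window * s)

regions = list(sliding_windows(96, 96))
print(len(regions), regions[0], regions[-1])
```

Every candidate region would then be passed to the face/non-face classifier; larger scale factors mean fewer windows but coarser coverage of face sizes.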
Figure 2.11: Example of a single image which contains different scales or sizes of
faces.
2.3.2 Pose
The performance of face detection systems drops significantly when pose
variations are present. Varying poses occur with a change in viewpoint or when the
head is rotated about its three-dimensional axes. This may lead to large variations
in facial appearance due to self-occlusion and self-shading.
Most previous work on face detection is limited to frontal views. Some
researchers have attempted to adapt their methods to different views [4][19], yet
dealing with rotation in depth and detecting faces across multiple views remain
difficult. With respect to the feature-based approach, Cootes et al. [31] proposed a
3D active appearance model to compute face pose variations explicitly. In [30], a
template-matching approach was used for face recognition with varying poses,
whereby faces are represented by a large number of facial feature templates
associated with different poses and different individuals. A hierarchical
coarse-to-fine search strategy was proposed to reduce computational demand, with a
new face represented as a multi-level pyramid of face images for recognition.
However, these methods require a large set of templates that may not be available in
some applications, and they are computationally demanding.
A similar strategy which would impose fewer computational demands is to
use an appearance-based method to learn the representation of a face under varying
poses. Li et al. developed a Support Vector Machines (SVM)-based multi-view face
detection and pose estimation model in [32], and improved their system in [33] by
using Support Vector Regression to solve the problem of pose estimation. They
apply a combined Eigenfaces and SVM method, which improves the overall detection
performance in terms of speed and accuracy. However, this method performs weakly
when other factors, such as illumination or occlusion, occur.
2.3.3 Illumination
The problems that illumination creates are illustrated in Figure 2.12, in which
the same face, with the same facial expression and seen from the same viewpoint,
appears different due to changes in lighting. In fact, changes brought about by
differences in illumination are often larger than the differences between
individuals themselves, causing systems based on image comparison to misclassify
input images.
Figure 2.12: Different effects of illumination on a face.
While there has been much research in computer vision detailing methods to
handle image variation produced by changes in pose, few efforts have been devoted
to image variations produced by changes in illumination. For the most part, object
detection algorithms have either ignored variations in illumination or have dealt
with them by measuring some property or feature of the image. A traditional
way of solving this problem is by intensity normalization, which computes the ratio
of local intensity to the average brightness in a suitable neighborhood as done by
Brunelli and Poggio [28]. This method works well only when there is a slight change
in the light source direction with no shadowing. Another technique which uses
illumination subspace approach has been proposed to deal with scene lighting
conditions by Georghiades, Kriegman and Belhumeur [34]. This method constructs
an illumination cone of a face from a set of images taken under unknown lighting
conditions. This approach is reported to perform significantly better, especially when
evaluating images with extreme illumination, but has yet to be tested for face
detection under a combination of other factors.
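The intensity normalization mentioned above — the ratio of local intensity to average brightness in a neighborhood, as in Brunelli and Poggio [28] — can be sketched as follows. The tiny images and neighborhood radius are illustrative assumptions:

```python
def local_brightness_normalize(img, radius=1):
    """Divide each pixel by the mean brightness of its (2r+1)x(2r+1) neighborhood,
    clipped at the image borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - radius), min(h, y + radius + 1))
                    for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            mean = sum(vals) / len(vals)
            out[y][x] = img[y][x] / mean if mean > 0 else 0.0
    return out

# A flat patch under dim and bright lighting normalizes to the same ratios,
# illustrating why this cancels uniform brightness changes (but not shadows).
dark = [[10, 10], [10, 10]]
light = [[200, 200], [200, 200]]
print(local_brightness_normalize(dark), local_brightness_normalize(light))
```

As the text notes, this only compensates for slight changes in light source direction without shadowing; cast shadows alter the local ratios themselves.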
2.3.4 Facial Expression
As opposed to the effect of scale, pose and illumination, facial expression can
modify significantly the geometrical effect of a face, as shown in Figure 2.13.
Figure 2.13: Different facial expressions
It was shown that facial expressions are particularly important in affecting the
automated detection of facial features [35]. Even today, the problem of effective
and expression-invariant face detection and segmentation remains unsolved.
Experimental findings have revealed that detecting facial features, using a
knowledge-based approach, is especially affected by expressions of happiness and
disgust [36]. In particular, detection of the nose and mouth is affected by facial
images with expressions of disgust and happiness, with detection accuracies of 75%
and 62%, respectively. Such deteriorated detection results show that a face detector
must take these conditions into account to detect faces with various expressions.
2.3.5 Occlusion
Occlusion is another issue confronted by face detection in practice. Glasses,
scarves and beards could change the appearance of a face. Figure 2.14 shows
examples of the same face under different occlusions such as scarf and glasses. Most
research only addressed the problem of glasses [11]. The problem of hairstyle is
rarely treated in face detection, because as long as it does not obstruct any facial
features, it can be ignored easily using a proper cropping method.
Figure 2.14: Different occlusions on a face
In [37], Support Vector Machines (SVM) method with local kernels has been
proposed to realize a robust face detector under partial occlusion. The robustness of
their proposed method under partial occlusion is demonstrated by evaluating its
performance in detecting an occluded face; and it is discovered that it could detect
faces wearing sunglasses or a scarf. However, the authors propose a polynomial
kernel under constrained conditions, which may not be suitable for detecting
occluded faces under a large variety of conditions.
2.4 Face Detection Methods and Techniques
Face detection can be viewed as a two-class classification problem in which
an image region is classified as being either a “face” or a “non-face”. With over 170
reported approaches to face detection [10], this research has broad implications
for face recognition. The various approaches to face detection can be
classified into four categories:
1. Knowledge-based methods use rule-based methods, which encode human
knowledge of what a face is.
2. Feature invariant approaches are regrouping methods that aim to find
robust structural features, invariant to pose, lighting, etc.
3. Template matching methods compute the correlation between standard
patterns of a face and an input image for detection.
4.
Appearance-based methods, in contrast to template matching, use models
learned from training sets to represent the variability of facial appearance.
2.4.1 Knowledge-based methods
With this approach, face detection methods are based upon the rules derived
from the researcher’s knowledge of the geometry and anthropometry of a human
face. Following a set of simple rules (for example, the symmetric properties of the
eyes and the relative distance between features) face candidates are identified. A
verification step is often added to reduce false detections. However,
the difficulty with this method is that, if the rules are either too strict or too general
(not strict enough), the algorithm will not perform well.
Kotropoulos and Pitas [38] propose the use of hierarchical knowledge-based
method, by which a rule-based localization procedure is utilized. Their technique
locates the facial boundary using the horizontal and vertical projections of
I(x, y), defined as HI(x) = Σ_{y=1}^{n} I(x, y) and VI(y) = Σ_{x=1}^{m} I(x, y).
While HI determines the left and right side locations of the head, VI locates the
mouth, lips, nose tip and eyes.
Figure 2.15 displays an example of this procedure. This technique has been tested on
images of faces against uniform backgrounds from the European ACTS M2VTS
database [64]. Experiments show a detection rate of 86.5%. However, this algorithm
is not able to detect multiple faces in an image and fails to perform well when
dealing with non-uniform backgrounds.
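The projection signatures used by Kotropoulos and Pitas [38] take only a few lines to compute; the toy image below is an illustrative stand-in for a real grayscale face image:

```python
def projections(I):
    """Horizontal projection HI(x) sums each column; vertical projection VI(y)
    sums each row, following the rule-based localization idea of [38]."""
    n, m = len(I), len(I[0])  # n rows, m columns
    HI = [sum(I[y][x] for y in range(n)) for x in range(m)]
    VI = [sum(I[y][x] for x in range(m)) for y in range(n)]
    return HI, VI

# Toy 4x6 "image": a dark head region (low intensity) on a bright background.
I = [[9, 9, 9, 9, 9, 9],
     [9, 2, 2, 2, 2, 9],
     [9, 2, 2, 2, 2, 9],
     [9, 9, 9, 9, 9, 9]]
HI, VI = projections(I)
print(HI, VI)
# Sharp drops in HI mark the left/right head boundary; drops in VI mark the
# horizontal bands that would contain the eyes, nose tip and mouth.
```

This illustrates why the method fails on non-uniform backgrounds: background clutter contaminates the column and row sums, hiding the boundary drops.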
A primary advantage of the knowledge-based approach is the ease of coming up
with simple rules to describe the features of a face and their relationships. It
is founded on coded rules: facial features in an input image are extracted first, and
face candidates are identified. It works well for face localization in uncluttered
backgrounds. Unfortunately, it is difficult to translate human knowledge into rules:
detailed rules fail to detect faces, and general rules often identify many false
positives. It is also extremely difficult to extend this approach to detect faces in
different poses.
Figure 2.15: Example of horizontal and vertical signature used in [38] to detect face.
2.4.2 Feature invariant approaches
Using the assumption that humans can detect faces and objects in different
poses and lighting conditions effortlessly, researchers have been trying to identify
features that are constant, or invariant, over a large range of imaging conditions. The
idea behind the feature invariant approach lies in the extraction of specific facial
features such as eyes, nose, etc. In contrast to knowledge-based approach, these
features are used to build statistical models describing their spatial relationships and
verifying the existence of a face. However, one of the major disadvantages of this
approach is its susceptibility to corruption by factors such as noise and
occlusion.
2.4.2.1 Facial Features
While Sirohey [39] develops a method consisting of a boundary-fitting ellipse
enclosing the head region using the edge image, Chetverikov [40] goes a step further
by using blobs and streaks instead of edges. His face model consists of different
blobs representing the eyes, nose and cheekbones. Triangular configurations that
connect these blobs are used to encode their spatial relationships. Leung et al. [21]
discuss a probabilistic approach to locate a face in a cluttered scene based upon local
feature detectors and random graph matching. The problem is viewed as a search
problem in which the goal is to identify an arrangement of facial features that is
likely to be a face. Candidate facial features are obtained by matching a filter
response at each pixel of a test image. Given the top two features, a search for other
features is performed using the expected locations from a statistical model of mutual
distances. This system achieves an 86% detection rate.
Another feature-based method that uses a large amount of evidence from the
visual image was proposed by Yow and Cipolla [41]. First, they detect interest
points in a multi-stage process using a filter that indicates possible facial features.
Then, edge examination is performed in the neighborhood of these points by
grouping and verifying edge length, strength, etc. Mahalanobis distances are
subsequently used to validate facial features with respect to a certain threshold,
and a Bayesian network is used to verify the presence of a face. One interesting
feature is that this method can detect faces in different orientations and poses. This
method subsequently was enhanced by employing active contours.
2.4.2.2 Skin Color
Color provides a computationally efficient method which is robust under
rotations in depth and partial occlusions. Color properties are modeled using
invariant color spaces, for example the components which emphasize skin properties
regardless of strong lighting effects. This invariant color space is called chrominance
color space and is used to model and learn skin color efficiently from a training set
using a standard classifier [22].
Extensive research on skin color has proven that this method is a viable way
of detecting faces, proving also that the difference in people’s skin color lies mostly
in intensity rather than in chrominance [42]. A simple method is to define a region
of skin tone pixels from skin samples and to classify skin tones depending on the
range they fall into. Crowley et al. [43] used a histogram of normalized
Red-Green-Blue (RGB) values to classify pixels according to a threshold τ. However,
color appearance is often unstable due to changes in both background and foreground
lighting. Several methods have addressed this problem, such as that proposed by
McKenna et al. [44], in which, instead of relying on color constancy, stochastic
models are exploited to accommodate view and lighting changes. In [45], a
specularity detection and removal technique is proposed, where a wave-front is
generated outwards, starting from the peak of the specularity onto its boundary.
Upon attaining the specularity boundary, the wavefront contracts inwards while
coloring in the specularity, until the latter no longer exists. Still, the skin
tone approach cannot be applied reliably for multiple face detection, as it
produces noisy detection results because of other body parts and skin-tone-like
regions.
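The chrominance idea above — discarding intensity so that skin color becomes stable across lighting levels — can be sketched with normalized r-g chromaticities. The classification box below is an illustrative stand-in for a model learned from skin samples, not values from the cited work:

```python
def to_chromaticity(r, g, b):
    """Normalized r-g chromaticity: discards overall intensity, keeps 'color' only."""
    s = r + g + b
    return (r / s, g / s) if s else (0.0, 0.0)

def is_skin(r, g, b, r_range=(0.36, 0.55), g_range=(0.26, 0.37)):
    """Classify a pixel as skin if its chromaticity falls inside a learned box.
    The ranges here are illustrative assumptions, not a trained model."""
    rn, gn = to_chromaticity(r, g, b)
    return r_range[0] <= rn <= r_range[1] and g_range[0] <= gn <= g_range[1]

# Intensity invariance: the same tone under dim and bright light has identical
# chromaticity, so both pixels classify the same way; a bluish pixel does not.
print(is_skin(120, 80, 60), is_skin(240, 160, 120), is_skin(60, 120, 200))
```

This also shows the method's weakness noted in the text: any non-face surface whose chromaticity falls inside the box (wood, some clothing, other body parts) is accepted as skin.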
2.4.2.3 Multiple features
Global features (size, shape and skin color) and local features (eyes, nose and
hair, described above) have recently been combined in numerous methods to
locate or detect faces. A typical approach detects skin-like regions as described
previously and the skin pixels are grouped using clustering algorithms. If the shape
of a resulting region is elliptic, it is considered to be a face candidate. Local features
then are used for verification.
Sobottka and Pitas [45] also used shape and color by performing color
segmentation in the Hue-Saturation Value (HSV) color space to extract skin regions.
Connected components are determined, and if a best-fit ellipse fits well, it is selected
as a facial candidate. After determining skin/non-skin regions using class-conditional
density functions, an elliptical face template is used to establish the similarity of skin
regions based upon a Hausdorff distance. Inherent symmetry properties are used to
locate the eye centers, and to deduce the position of the nose tip and mouth on the
face. One drawback of this method is that it is only effective with frontal views.
Other reported methods have applied structure, color and geometry instead of pixel
information [46].
Features are invariant to pose and orientation change. However, it is difficult
to locate facial features due to the inability of these methods to cope with
external factors such as illumination, noise and occlusion. It is also difficult to
detect features in complex backgrounds.
2.4.3 Template Matching Methods
Given an input image, the correlation values in predetermined standard
regions, such as the face contour, eyes, nose and mouth, are calculated
independently. Although this approach is simple, it has proven insufficient for
face detection since it cannot handle variations in scale, rotation, pose and
shape. Multiresolution, multiscale, subtemplate and deformable template methods
have been proposed to achieve scale- and shape-invariant template matching [53].
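The correlation at the heart of template matching is typically a normalized cross-correlation, sketched below on flattened patches. The tiny vectors are illustrative; real inputs would be cropped grayscale face regions:

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation between a patch and a template (flat lists).
    Returns a score in [-1, 1]; 1 means a perfect linear match."""
    n = len(template)
    mp = sum(patch) / n
    mt = sum(template) / n
    num = sum((p - mp) * (t - mt) for p, t in zip(patch, template))
    den = math.sqrt(sum((p - mp) ** 2 for p in patch) *
                    sum((t - mt) ** 2 for t in template))
    return num / den if den else 0.0

template = [1, 2, 3, 4, 5, 6]
same = [10, 20, 30, 40, 50, 60]      # same pattern at a different brightness scale
different = [6, 1, 5, 2, 4, 3]       # shuffled pattern
print(ncc(same, template), ncc(different, template))
```

The mean subtraction and variance normalization make the score invariant to global brightness and contrast, which is exactly why plain correlation handles lighting better than raw differencing, yet still fails under the scale, rotation and shape variations the text describes.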
In [20], Miao et al. proposed a hierarchical template matching method for
face detection. Initially, the input image is rotated from −20° to 20° to handle
rotation. Then, each rotated image forms a mosaic at different scales, in which
edges are extracted using the Laplacian operator. The face template consists of six
facial
components of two eyebrows, two eyes, nose, and mouth. Face candidates are
located by matching templates of face models represented in edges. In the final step,
some heuristics are used to determine the existence of a face. Experiments show
better detection performance for images containing a single face, rather than
multiple. Kwon et al. [48] proposed a detection method based on snakes and
templates. In this approach, an image is first convolved with a blurring filter and then
with a morphological operator to enhance edges. A modified n-pixel snake is used to
find and eliminate small curve segments. Each candidate is approximated using an
ellipse and for each of these candidates, a deformable template method is used to find
the detailed features. If a sufficient number of facial features are found, and their
ratio satisfies the ratio tests based on the template, a face is considered to be detected.
Lanitis et al. [48] established a detection method utilizing both shape and intensity
information. In this approach, contours in the training images are manually labeled
with sampled points, and the vectors of sample points are used as the shape feature
vectors to be detected. They use a Point Distribution Model (PDM) together with
Principal Component Analysis (PCA) [13] to characterize the shape vectors over an
ensemble of individuals. A face shape PDM can then be used to detect faces in test
images, using an active shape model search to estimate the face location and shape
parameters. The shape patch is then deformed to the average shape, and intensity
parameters are extracted. The shape and intensity parameters are used together
for measuring the Euclidean distance from the face.
2.4.4 Appearance Based Methods
With appearance-based methods, the “templates” that are used for face
detection are learned from images, rather than predefined by experts, as in template
matching methods. They rely on techniques from statistical analysis and machine
learning to discover characteristics of face and non-face images. Dimensionality
reduction is also an important aspect in these methods. Many of these methods can
be viewed in a probabilistic framework, using the Bayesian classification method to
classify a candidate image for the density functions p(x|face) and p(x|non-face),
where x is a random variable viewed as the feature vector derived from the image.
However, due to the high dimensionality of x, or because of the multimodal behavior
of both these density functions, classification is not so straightforward. Discriminant
functions between the face and non-face classes have also been used in appearance-based
methods, traditionally by projecting image patterns onto a lower-dimensional
space, or by using multi-layer neural networks to form non-linear decision
surfaces. Additionally, Support Vector Machines (SVM) and other kernel methods
have been proposed to implicitly project patterns onto a higher dimensional space in
order to separate these classes.
2.4.4.1 EigenFaces
Kirby and Sirovich [49] first demonstrated that images of faces can be
encoded linearly using a modest number of basis images. This demonstration is
based upon the Karhunen-Loève transform, also known as Principal Component Analysis
[13]. Given a set of n by m pixel training images, each represented as a vector of size
m × n, basis vectors spanning an optimal subspace are determined such that the mean
squared error between the projection of the training images onto the subspace and the
original images is minimized. This set of optimal basis vectors is denoted the
Eigenfaces, since these are simply the eigenvectors of the covariance matrix
computed from the vectorized face images in the training set. Similarly to [49],
Turk and Pentland [2] applied PCA to face recognition and detection, using a
training set of face images to generate Eigenfaces that span a subspace
(called the face space) of the image space. To detect the presence of a face in a
scene, the distance between an image region and the face space is computed for all
locations in the image. However, this method often cannot retrieve faces which
exhibit variations in head rotation.
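The distance-from-face-space test can be sketched as follows in Python. This is an illustrative toy, not the implementation of [2]: the single power-iteration "eigenface", the four-dimensional vectors standing in for images, and all function names are assumptions made for brevity.

```python
def mean_and_cov(samples):
    # Sample mean and covariance matrix of vectorised face images.
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / n
            for j in range(d)] for i in range(d)]
    return mean, cov

def dominant_eigenvector(cov, iters=100):
    # Power iteration: repeatedly applying the covariance matrix to a
    # vector converges to its top eigenvector, the first "eigenface".
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def distance_from_face_space(x, mean, basis):
    # Norm of the residual after projecting the mean-centred pattern
    # onto the eigenface subspace; a small distance suggests a face.
    c = [xi - mi for xi, mi in zip(x, mean)]
    recon = [0.0] * len(c)
    for v in basis:
        coeff = sum(ci * vi for ci, vi in zip(c, v))
        recon = [r + coeff * vi for r, vi in zip(recon, v)]
    return sum((ci - ri) ** 2 for ci, ri in zip(c, recon)) ** 0.5
```

In practice several leading eigenvectors would be kept, and the distance would be thresholded at every image location and scale.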
2.4.4.2 Distribution-Based Methods
Sung and Poggio developed a distribution-based system for face detection [3]
which demonstrated how the distribution of image patterns from one object class can
be learned from positive and negative image examples of that class. Their system
consists of two components: a distribution-based model for face/non-face
patterns and a multilayer perceptron classifier. The patterns are grouped into six face
and six non-face clusters using a modified k-means algorithm, as shown in Figure
2.16.
Figure 2.16: Example of the distance measures used in Sung and Poggio’s method
presented in [3]. (a) Computation of distance between test pattern and clusters. (b)
Each distance measure is a two-value metric. D1 is a Mahalanobis distance, while D2
is the Euclidean distance.
Each of the six clusters is represented as a Gaussian function, but set at
different scales, hence possessing a different mean image and covariance for each
cluster. Distance metrics are used to classify face window patterns from non-face
patterns, using twelve pairs of distances to each face and non-face cluster. As it is
difficult to generate a representative sample of non-face patterns, the problem was
alleviated by means of a bootstrap method that selectively adds images to the training
set as training progresses: non-face patterns that are wrongly detected as faces are
incrementally added into the training set. This significantly improved the
performance of the system.
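To illustrate the two distance measures of Figure 2.16, the following minimal Python sketch (my own; the 2-D toy covariance, the closed-form 2x2 inverse and the function names are not from [3]) contrasts a Euclidean distance with a Mahalanobis distance to a cluster centroid:

```python
def euclidean(x, mu):
    # Standard straight-line distance to the cluster centroid.
    return sum((a - b) ** 2 for a, b in zip(x, mu)) ** 0.5

def mahalanobis(x, mu, cov_inv):
    # Distance normalised by the cluster's covariance: directions of
    # high variance count for less, so the metric follows the cluster
    # shape instead of treating all directions equally.
    d = [a - b for a, b in zip(x, mu)]
    tmp = [sum(cov_inv[i][j] * d[j] for j in range(len(d)))
           for i in range(len(d))]
    return sum(di * ti for di, ti in zip(d, tmp)) ** 0.5

def inverse_2x2(m):
    # Closed-form inverse for a 2x2 covariance matrix.
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]
```

For a cluster elongated along one axis, two points at equal Euclidean distance can have very different Mahalanobis distances, which is why the method pairs the two measures.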
Other probabilistic visual learning methods followed. Moghaddam and
Pentland [50] developed a technique rooted in density estimation in a high-dimensional
space, using Eigenspace decomposition. Principal Component Analysis
(PCA) was used to define the subspace which best represents a set of face patterns.
However, PCA does not define a proper density model for the data, since the cost of
coding a data point is equal anywhere along a principal component. Furthermore, it
is not immune to independent noise in the features of the data. Two methods, one
based on factor analysis (FA) and another based on Fisher’s Linear Discriminant
(FLD) were proposed by [51] in order to project samples from the high-dimensional
image space to a lower-dimensional feature space.
2.4.4.3 Neural Networks
Neural Networks (NN) have been applied successfully in many pattern
recognition problems, such as optical character recognition, object recognition, and
autonomous robot driving. Since face detection can be treated as a two-class pattern
recognition problem, various neural network architectures have been proposed. The
advantage of using neural networks for face detection is the feasibility of training a
system to capture the complex class conditional density of face patterns.
There have been many attempts to apply NNs to face detection, but the most
significant work in this field was done by Rowley et al. [52][4], whose system
showed significant improvements over other NN-based methods.
A multilayer neural network is used to learn the face and non-face patterns from
face/non-face images (i.e., the intensities and spatial relationships of pixels). They
also used multiple neural networks and several arbitration methods to improve the
performance of the system. There are two major components: multiple neural
networks (to detect face patterns) and a decision-making module (to render the final
decision from multiple detection results). Figure 2.17 presents a diagram
summarizing Rowley’s method in which, given a test pattern, the output of the
trained neural network indicates evidence for a non-face (close to -1) or face pattern
(close to +1). Nearly 1000 face samples of various sizes, orientations, positions and
intensities were used to train the network. One limitation of the methods by Rowley
[4] and Sung [3] is that they can only detect upright, frontal faces. Furthermore, one
major drawback is that the network architecture must be tuned extensively (number
of layers, number of nodes, learning rates, etc.) to achieve exceptional performance.
Figure 2.17: System diagram of the Rowley-Kanade neural network method [4]
2.4.4.4 Support Vector Machines (SVM)
Support Vector Machines (SVMs) can be considered a new paradigm for
training classifiers such as polynomial functions, neural networks and Radial Basis
Function (RBF) classifiers. While most methods for training a classifier (e.g.,
Bayesian, neural network, and RBF methods) are founded on minimizing the training error
(i.e., the empirical risk), SVMs operate on another principle, called structural risk
minimization, which aims to minimize an upper bound on the expected
generalization error. SVMs were first applied to face detection by Osuna et al.
[19], who developed an efficient method for training an SVM on large-scale problems
and applied it to face detection.
Since then, a good number of face detection methods based upon Support
Vector Machines have been proposed. In reference [53], a subspace approach was
presented, in which a linear SVM classifier is trained as a filter to produce a
subspace, which is then used by a non-linear SVM classifier with an RBF kernel for
face detection. The authors of [54] experimented with a similar method, whereby an
SVM was trained in the Eigenspace instead of in the original linear space. Even though
experimental results demonstrated promising performance, these approaches were
unable to detect faces in various poses. The head pose problem remained unresolved.
Striving to solve the head pose problem, Li et al. [32] developed an SVM-based
multi-view face detection and pose estimation model and improved their system in
[33] by using Support Vector Regression to first solve the pose estimation
problem. They subsequently used a combined Eigenfaces and Support Vector
Machines (SVM) method, which improves the overall performance in terms of speed
and accuracy. A similar method was proposed by Wang and Ji [55], whereby a
combination of SVMs, using both cascading and bagging methods, was developed to
detect faces in seven different views, under complex backgrounds. However, as
stated before, these methods perform poorly when other factors such as illumination
or occlusion come into play.
2.4.4.5 Adaboost
The Adaboost method was initially presented by Freund and Schapire [56]
[25], whereby a set of weak classifiers (obtained from a series of observed
distributions) is used for learning. Viola and Jones [23][27] presented one of the
first face detection methods that used this statistical approach. The principal concept
behind this technique is to select important features, using a focus-of-attention
method, and then to rely on integral images for fast feature evaluation.
Fundamental to the whole approach are Haar-like features (so called because
they are computed similarly to the coefficients in Haar wavelets), whereupon each
feature is described by a template (the shape of the feature). Very simple decision tree
classifiers that usually have just two terminal nodes are built to yield a face/non-face
response. Not every classifier is able to detect a face; some, called weak classifiers,
merely react to some simple feature in the image that may relate to the face. A
complex and robust classifier is then constructed out of multiple weak classifiers to
filter out the regions that most likely do not contain faces, by means of the Adaboost
method.
Viola and Jones [23] suggested building several boosted classifiers, Fk, with
constantly increasing complexity, and then chaining them into a cascade with simpler
classifiers going first. During the detection stage, each current search window is
analyzed by each of the Fk classifiers, which may reject it or let it go through, as shown
in Figure 2.18.
Figure 2.18: Face detection cascade of classifiers shown in [23], whereby rejection
can happen at any stage.
The technique proposed in [23] is a real-time and accurate face detection
algorithm which can process an image of 384x288 pixels in 0.067 seconds. The
approach is based on efficient feature computation and selection using a new
image representation, referred to as the integral image (discussed in the following
chapter), and the AdaBoost training technique [25], [26].
AdaBoost is a boosting technique which is used in [23] both for training and
for selecting a subset of critical features among all the possible features, with each
weak classifier restricted to a single feature. The approach is based on combining a
set of weak classifiers in order to build a stronger one. At each step of training,
among all possible weak classifiers, the one whose error rate on the training set is
minimal is selected. The parameters (threshold, weight, etc.) of this weak classifier
are then estimated, and the global (strong) classifier is given as a linear combination
of the selected weak classifiers. In this repetitive process, a weak classifier consisting
of only a single feature with minimum error rate is selected in the earlier iterations,
while weak classifiers selected in later iterations make higher errors as the
discrimination task becomes harder.
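The per-round selection just described can be sketched as a minimal discrete AdaBoost over single-feature decision stumps. This is an illustrative Python toy, not the implementation of [23]: the dataset, the stump parameterisation and all names are my own assumptions.

```python
import math

def best_stump(features, labels, weights):
    # Each weak classifier is one feature compared against a threshold
    # with a polarity; return the stump with minimum weighted error.
    best = None
    for f in range(len(features[0])):
        values = sorted(set(x[f] for x in features))
        for th in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for polarity in (1, -1):
                err = sum(w for x, y, w in zip(features, labels, weights)
                          if (polarity if x[f] < th else -polarity) != y)
                if best is None or err < best[0]:
                    best = (err, f, th, polarity)
    return best

def adaboost(features, labels, rounds):
    # Reweight examples after each round so later stumps focus on the
    # harder ones; the strong classifier is a weighted vote of stumps.
    n = len(features)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, th, pol = best_stump(features, labels, weights)
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, th, pol))
        for i, (x, y) in enumerate(zip(features, labels)):
            h = pol if x[f] < th else -pol
            weights[i] *= math.exp(-alpha * y * h)
        s = sum(weights)
        weights = [w / s for w in weights]
    return ensemble

def classify(ensemble, x):
    vote = sum(a * (p if x[f] < th else -p) for a, f, th, p in ensemble)
    return 1 if vote >= 0 else -1
```

In the Viola-Jones setting the "features" are Haar-feature responses, so selecting the minimum-error stump simultaneously selects the most discriminative feature for that round.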
Face processing in this approach is achieved by combining successively
stronger classifiers in a cascade (see Figure 2.18). At the beginning of the cascade,
a simple two-feature strong classifier (a linear combination of two weak classifiers)
is used to filter out many background structures (60% in experiments) while preserving
almost 100% of faces. To be evaluated, this two-feature strong classifier requires
between six and nine array references to estimate each feature (see the following
sections), one threshold operation, and one multiply and one addition per feature,
resulting in approximately 60 microprocessor instructions. The number of features in
the later steps of the cascade increases, which makes the underlying strong classifiers
more and more discriminating; even though they are more expensive than the strong
classifiers in the early steps, the overall mean run-time of the global cascade is
extremely small. This is mainly because only faces and rare face-like structures reach
these steps and require further processing. Notice also that the thresholds of the strong
classifiers close to the beginning of the cascade are adjusted to have no missed detections.
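The early-rejection behaviour of the cascade can be sketched structurally as follows (a hedged Python illustration of the control flow only; the stage construction and names are mine, not from [23]):

```python
def make_stage(threshold, stumps):
    # A strong classifier: a weighted vote of weak classifiers compared
    # against a stage threshold tuned for a very high detection rate.
    def stage(window):
        score = sum(alpha * h(window) for alpha, h in stumps)
        return score >= threshold
    return stage

def cascade_detect(window, stages):
    # A window must pass every stage; most non-face windows are
    # rejected by the first, cheapest stages and never reach the rest.
    for stage in stages:
        if not stage(window):
            return False
    return True
```

The mean cost per window is therefore dominated by the first one or two stages, since later, more expensive stages see only the few windows that survive.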
Experiments in [23][27] showed that a large majority of the sub-windows are
rejected during the first and second layers, which consist of only a small number of
features (an average of about eight weak classifiers each); this greatly speeds up the
detection process. Furthermore, the individual stages do not need to be optimal; in
fact, the stages are trained to achieve high detection rates rather than low false
positive rates. By choosing the desired hit rate and false positive rate at every stage,
and by choosing the number of stages appropriately, it is possible to achieve very
good detection performance. By stacking 20 stages into a cascade, a detection rate of
98% and a false-detection rate of 0.0001% are achieved.
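These overall figures follow from the per-stage rates multiplying through the cascade: a window is finally detected only if all 20 stages accept it. The short calculation below back-computes the implied per-stage targets from the quoted overall rates (the per-stage numbers are derived here, not taken from [23]):

```python
# A window is detected only if every one of the K stages accepts it,
# so overall rates are products of the per-stage rates.
K = 20

# Per-stage detection rate needed for a 98% overall detection rate.
d = 0.98 ** (1.0 / K)            # roughly 0.999 per stage

# Per-stage false positive rate giving 0.0001% (1e-6) overall.
f = 1e-6 ** (1.0 / K)            # roughly 0.5 per stage

overall_detection = d ** K
overall_false_positive = f ** K
```

So each stage only has to be a very permissive filter (keeping about half of the non-faces is enough), provided it almost never rejects a true face.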
The cascade thus achieves efficient processing by combining successively more
complex strong classifiers: simple classifiers rapidly determine where in an image
faces might occur, while more complex processing is applied only to those regions.
Lienhart [30] subsequently improved this method with two extensions:
(1) a novel set of rotated Haar-features and (2) an enhanced
boosting algorithm, which reduced the number of false detections by 10%. A review
of the literature shows that this rapid face detection technique, initially proposed by
Viola and Jones [23] and improved by Lienhart [29], is the approach that offers the
best results with the fewest constraints on the data in the field of face detection,
because of its high accuracy and great computational speed.
2.5 Evolutionary Algorithm in Face Detection Techniques
The use of Evolutionary Algorithms in the field of image processing,
especially for the automatic learning of features for object detection, such as face
detection, has received growing interest. Treptow and Zell [14] showed that an
Evolutionary Algorithm can be used within the Adaboost framework for single-stage
classifiers to select features which provide better classifiers for the detection of
objects such as faces and balls. They extended the approach of Viola and Jones [23]
by using six different feature types instead of the four original types used in [23].
Exhaustive search over the whole set of possible features was replaced by an
evolutionary search to select the features. The training selected classifiers of about
200 features. The results showed that the classifiers built using the Evolutionary
Algorithm performed better in terms of hit rate and training time. However, the
classifiers trained with the Evolutionary Algorithm in [14] are only single-stage
classifiers; the training of a cascade of multiple strong classifiers was not addressed.
An example of the Evolutionary Algorithm used in [14] is shown in Figure 2.19.
Step 3b) EA_Search()
Begin
    t2 := 0;
    Initialize_feature_population(P(0));
    Repeat
        P' := select(P(t2));
        Crossover(P');
        Mutate(P');
        Train_classifiers(P');
        Evaluate_classification_error(P');
        P(t2+1) := replace(P(t2), P');
        t2 := t2 + 1;
    Until terminated
End
Figure 2.19: Evolutionary Algorithm used in [14] to build single stage classifiers.
Chen and Gao [17] introduced a Genetic Algorithm (GA) based method to
enlarge a face database through re-sampling from the existing faces. In other words,
they expanded the set of face samples by using a Genetic Algorithm: new face
samples were generated by the crossover and mutation operators. An example of this
process is shown in Figure 2.20. In their technique, the face samples are divided into
three sub-sets: training, validation and testing. The training set is then used to train a
Sparse Network of Winnows (SNoW) [58]. Chen and Gao also used this
expanded database to train an Adaboost face detector and showed that better
classifier performance was achieved.
Figure 2.20: Crossover and mutation process used in [17]. In (a), each parent is
converted into a sequence of observation vectors for crossover; (b) shows the
crossover process and (c) shows the mutation process.
Recently, in 2006, J. S. Jang and J. H. Kim [57] introduced Evolutionary
Pruning in a cascaded structure of classifiers, with the purpose of reducing the
number of weak classifiers found by Adaboost at each stage of cascade training.
The evolutionary pruning is applied just after Adaboost has found a strong classifier
consisting of a linear combination of weak classifiers. This approach aims to
increase face detection speed by reducing the number of weak classifiers in an
existing cascade, and the speed of the face detector was indeed improved. However,
the training time of the cascade of classifiers was not addressed in their paper. The
procedure of this Evolutionary Pruning technique is shown in Figure 2.21.
1. Input
2. Initialization
3. Adaboost
4. Evolutionary Pruning
   a) Evolutionary arrangement: find a set of α̂_t which satisfies the probability conditions
   b) Select h' which has the minimum α̂_t
   c) H' = H_t − {h'}
   d) Evolutionary search:
      If a solution is found, then t ← t − 1, H_t = H', go to 4(b);
      Else go to 5
5. Final strong classifier:
   H(x) = 1 if Σ_{h_t ∈ H_t} α̂_t h_t(x) ≥ δ, and 0 otherwise
Figure 2.21: The procedure of Evolutionary Pruning used in [57] to reduce the
number of weak classifiers trained by Adaboost during the training of the cascade of
boosted classifiers.
2.6 Genetic Algorithm
Genetic algorithms (GAs) [59][60][61] are random search algorithms that
imitate natural evolution with the Darwinian "survival of the fittest" approach. GAs
operate on a coding of the parameters and not on the parameters themselves; therefore,
they do not depend on the continuity of a parameter, nor on the existence of
derivatives of the functions, as needed in some conventional optimization algorithms.
The coding method allows GAs to easily handle multiple-parameter or multi-modal
optimization problems that are rather difficult or impossible to treat
using classical optimization approaches.
The population strategy enables a GA to search for near-optimal solutions
from various directions within a search space simultaneously; therefore, it can avoid
convergence to local minima or maxima. A GA processes each chromosome
independently, making it highly adaptable to parallel processing. It needs no more
information than the relative fitness of the chromosomes; thus, it is suitable for
application to ill-defined systems. GAs can also work well for non-deterministic
systems, or systems that can only be partially modeled. GAs use random choice and
probabilistic decisions to guide their searches, where the population improves
towards near-optimal points from one generation to another.
The fundamental activity of a GA consists of three basic operations:
reproduction, crossover, and mutation. Reproduction is the process where members
of the population are reproduced according to the relative fitness of the individuals,
where the chromosomes with higher fitness have higher probabilities of having more
copies in the coming generation. There are a few selection schemes available for
reproduction, e.g. the “roulette wheel”, the “tournament scheme”, the “ranking
scheme”, etc. [59]. Crossover in a GA occurs when the selected chromosomes
partially exchange the information in the genes, i.e., part of a string is interchanged
between two selected candidates. Mutation is the occasional alteration of the state at
a particular string position. Mutation is essential in cases where reproduction and
crossover alone are unable to offer a globally optimal solution; it serves as an
insurance policy that can recover the loss of a particular piece of genetic
information.
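The three operations above can be sketched together in a minimal Python GA (an illustrative toy maximising the number of ones in a bit string; the elitism step, rates and all names are my own assumptions, not part of any cited method):

```python
import random

def roulette_select(population, fitnesses):
    # "Roulette wheel" reproduction: selection probability is
    # proportional to relative fitness.
    total = sum(fitnesses)
    r = random.uniform(0, total)
    acc = 0.0
    for chrom, fit in zip(population, fitnesses):
        acc += fit
        if acc >= r:
            return chrom
    return population[-1]

def crossover(a, b):
    # One-point crossover: the two parent strings exchange tails.
    point = random.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chrom, rate=0.02):
    # Occasional alteration of the state at a string position.
    return [bit ^ 1 if random.random() < rate else bit for bit in chrom]

def evolve(fitness, length=20, pop_size=30, generations=60):
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(c) for c in pop]
        new_pop = [max(pop, key=fitness)]     # keep the best (elitism)
        while len(new_pop) < pop_size:
            c1, c2 = crossover(roulette_select(pop, fits),
                               roulette_select(pop, fits))
            new_pop += [mutate(c1), mutate(c2)]
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)
```

In this research the chromosome would instead encode a candidate Haar-feature configuration, and the fitness would be derived from its weighted classification error.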
2.7 Summary
This chapter has given a general overview of face detection and recognition
systems, including their principal issues such as pose, facial expression, occlusion,
image orientation and imaging conditions. Some examples of real-world applications
that use face detection and face recognition technologies were reviewed. These
applications were divided into four categories: Physical Access Control, Video
Surveillance and Watch-list Identification, Image Database Search, and
Entertainment and Leisure. The first and third categories, Physical Access Control
and Image Database Search, do not require a robust face detection algorithm, as the
input images already contain faces. Since the subjects are required to be cooperative
in these categories, the input images resemble driving licence, passport or identity
card photographs, and the face recognition system plays the bigger role in
identifying or verifying an individual. However, in the other two categories, Video
Surveillance and Watch-list Identification and Entertainment and Leisure, face
detection is the most important and critical first step: the applications must be able to
detect faces before they can be recognized or verified by a face recognition system.
This chapter also presented a literature review of the four general methods
currently used to detect faces in images. The advantage of the first, the
knowledge-based approach, is that it is easy to generate simple rules to describe the
features of a face and their relationships. Unfortunately, it is difficult to translate
human knowledge into precise rules: very detailed rules may fail to detect faces,
while quite general rules often produce many false positives. It is also extremely
difficult to extend this approach to detect faces in different poses, since enumerating
all possible cases is impractical. The second, the feature-based methods, can be
invariant to pose and orientation change. However, it may be difficult to locate all
facial features because these methods cope poorly with external factors such as
illumination, noise or occlusion, and it is difficult to detect features in complex
backgrounds such as streets or public areas. The third, the template-matching
approach, is quite simple and straightforward, but the templates must be initialized
near the face images. As with the knowledge-based approach, it is difficult to
enumerate templates for different poses, and the approach has proven inadequate for
face detection since it does not deal efficiently with variations in scale, pose and
shape. The fourth, the appearance-based methods, use powerful machine learning
algorithms. They have demonstrated accurate, fast and fairly robust results and can
be extended to detect faces in different pose orientations. On the other hand, they
usually need to search over position and scale, and require many positive and
negative examples during training.
Towards the end of this chapter, the use of Evolutionary Algorithms in the
field of face detection was explained. These algorithms have been used to reduce the
search space of feature solutions, thus reducing the computational training time; to
re-sample the training images, thereby expanding the sample sets; and to tune a
trained cascade of boosted classifiers by reducing its number of weak classifiers. In
general, the purpose of using an Evolutionary Algorithm is to explore and find good
solutions to these problems in less, or at least reasonable, time. The Evolutionary
Algorithm replaces an exhaustive search for the solutions, in which the entire, very
large solution set is explored at a very high computational cost. The Genetic
Algorithm, one of the techniques used in this research, was also described at the end
of this chapter.
Finally, the current state of the field shows that the technique introduced by
Viola and Jones [23] is considered a fast and robust method that can be used for
real-time applications; it stands out as the benchmark technique in the field of face
detection. However, the training stage of this technique is very time-consuming,
sometimes requiring days or weeks. Furthermore, it requires a large number of
training examples and does not attempt to train for, nor detect, faces under a wide
variety of conditions. The research presented in this thesis is based on the technique
used by Viola and Jones [23] and proposes an Evolutionary Algorithm with the
characteristics of a Genetic Algorithm to select good features from a very large
feature search space, with additional new feature types, when building a cascade of
boosted classifiers. The training time of this cascade of boosted classifiers using the
Genetic Algorithm should be reduced, while the trained cascades should have similar
or better performance.
CHAPTER 3
FEATURE SELECTIONS OF ADABOOST TRAINING USING GENETIC ALGORITHM
3.1 Introduction
In the previous chapter, many face detection techniques were reviewed and
explained. The most efficient face detection technique, which can also be used in
real-time applications, is the technique introduced by Viola and Jones [23]. This
technique uses the Adaboost algorithm to select Haar-based features from the entire
set of possible feature solutions to form strong classifiers and build a cascade of
boosted classifiers for face detection. However, the technique requires a very high
computational training time, which is one of its weaknesses, due to the exhaustive
search for features over the whole feature search space. In this research, the focus is
on the implementation of an Evolutionary Algorithm with the characteristics of a
Genetic Algorithm to select the features when building the cascade of boosted
classifiers. This chapter also defines some of the terms used in face detection, the
techniques for selecting features using the Adaboost algorithm and the Genetic
Algorithm, and the databases used for training and testing the cascades of boosted
classifiers.
3.2 Method and Techniques Used
This research focuses on the face detection technique proposed by Viola and
Jones [23], which is considered 15 times faster than the method described by
Rowley, Baluja and Kanade [4]. It consists of simple Haar-based features, as shown
in Figure 3.1 below, integral images (see Figure 3.3) and the Adaboost algorithm
used to train a 15-stage cascade of boosted classifiers. This training process produces
a cascade of classifiers that is used in the detection part of the system.
Recapitulating the research based on the publication of Viola and Jones [24],
there are mainly two problems to be dealt with: (1) extending the feature sets and
being able to search over these very large sets of features in reasonable
computational time, and (2) the best features are not known in advance. Adding the
seven new feature types should enrich the feature search space with more varied and
useful features but, on the other hand, increases the computational time taken to train
the cascade of classifiers. The approach chosen in this research focuses on reducing
the computational training time needed to build a 15-stage cascade of boosted
classifiers of similar or better performance while using a very large set of feature
types. To overcome these problems, this research uses an Evolutionary Algorithm
based on the characteristics of a Genetic Algorithm (GA), in combination with the
Adaboost algorithm, to search over the large number of possible features.
The goal of this research is to find better classifiers that use more significant
features, in less time, and that achieve comparable or even better classification
results than classifiers trained with Adaboost in combination with exhaustive search
over a small set of features. The GA is implemented inside the Adaboost framework
in order to find good feature sets, which represent the strong classifiers for each
stage of cascade training. By using the GA search for feature selection, the cascade
of classifiers can be trained faster than with exhaustive search. The trained cascade
uses the combination of the eight basic features and the seven new feature types. In
general, the performance of face detection systems is measured
using their hit rates, missed rates and false positive rates. In this research, the
training time of the cascade of boosted classifiers is also measured, because reducing
it is the main reason why the Genetic Algorithm is implemented. The following
definitions are used:
a) Hit rate – the number of faces that are correctly detected by the cascade of
classifiers divided by the total number of faces.
b) Missed rate – the number of faces that are not detected by the cascade of
classifiers divided by the total number of faces.
c) False positive rate – the number of false detections (non-face images
considered as faces by the system) divided by the total number of all
detections.
d) Training time – the computational time taken to train a cascade of boosted
classifiers.
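The first three measurements reduce to simple ratios over raw counts; a minimal Python sketch (the function and argument names are mine; training time is measured with a wall clock and is not shown):

```python
def detection_metrics(total_faces, detected_faces, false_detections,
                      total_detections):
    # Compute the first three measurements used in this research from
    # raw counts produced by running a cascade over a test set.
    hit_rate = detected_faces / total_faces
    missed_rate = (total_faces - detected_faces) / total_faces
    false_positive_rate = false_detections / total_detections
    return hit_rate, missed_rate, false_positive_rate
```

For example, 95 detected faces out of 100, with 10 false detections among 105 total detections, gives a 95% hit rate, a 5% missed rate, and a false positive rate of about 9.5%.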
The application is implemented using Microsoft Visual C++ and Intel OpenCV, an
open-source computer vision library. The Genetic Algorithm is implemented in the
training part's source code, namely cvhaartrainingGA.cpp.
3.3 Haar-based Features and Integral Images
Our detection procedure classifies images based on the values of simple
features. There are many motivations for using features rather than pixels directly.
The most common reason is that features can encode ad-hoc domain knowledge that
is difficult to learn from a finite quantity of training data. For this system, there is
also a second critical motivation: a feature-based system operates much faster than a
pixel-based system, as described in Viola and Jones [23].
Figure 3.1: Examples of rectangle features shown relative to the enclosing detection
window. The sums of the pixels which lie within the white rectangles are subtracted
from the sum of pixels in the black rectangles. Two-rectangle features are shown in
(a) and (b). Figure (c) shows a three-rectangle feature and (d) a four-rectangle
feature.
Three kinds of features are used. The value of a two-rectangle feature is the
difference between the sums of the pixels within two rectangular regions. The
regions have the same size and shape and are horizontally or vertically adjacent (see
Figure 3.1). A three-rectangle feature computes the sum within two outside
rectangles subtracted from the sum in a center rectangle. Finally a four-rectangle
feature computes the difference between diagonal pairs of rectangles. Given that the
base resolution of the detector is 24x24 pixels, the total exhaustive set of rectangle
features is quite large, at 160,000. Note that, unlike the Haar basis, the set of
rectangle features is overcomplete, as described by Viola and Jones [23].
Viola and Jones [23] developed a reliable method to detect objects such as
faces in images in real-time. An object that has to be detected is described by a
combination of a set of simple Haar-wavelet like features shown in Figure 3.2. These
feature types are classified as the 5 basic feature types used in Intel OpenCV.
Figure 3.2: Five different basic types of rectangle features within their sub window
of 24x24 pixels. These five basic types of features are the initial features used to train
the cascade of classifiers exhaustively in OpenCV.
The sums of pixels in the white boxes are subtracted from the sum of pixels
in the black areas. The advantage of using these simple features is that they can be
calculated very quickly by using “integral image”. An integral image II over an
image I is defined as follows:
II(x, y) = ∑(x'≤x, y'≤y) I(x', y')          (1)
In [23] it is shown that every rectangular sum within an image can be
computed with four array references using an integral image, as shown in
Figure 3.3.
Figure 3.3: The sum of pixels within rectangle D can be computed with four array
references. The value of integral image at location 1 is the sum of the pixels of
rectangle A. The value at location 2 is A+B, at location 3 is A+C, and at location 4 is
A+B+C+D. The sum within D can be computed as 4+1-(2+3).
3.4 Adaboost Learning Algorithm
A classifier has to be trained from a number of available discriminating
features within a specific sub window in order to detect an object. The possible
positions and scales of the five different feature types as shown in Figure 3.2 above
produce about 90,000 possible alternative features within a sub window size of
24x24 pixels. This number largely exceeds the number of pixels itself, which is only
576. Therefore, a small set of features which best describes the object to be
detected, has to be selected. Adaboost Algorithm as introduced by Freund and
Schapire [25] is a mechanism to select good classification functions, called “weak
classifiers”, to form a final “strong classifier”, which is a linear combination of all
weak classifiers. This strong classifier is defined as:
H(x) = α1h1(x) + α2h2(x) + α3h3(x) + ... + αnhn(x)          (2)
where H(x) is the strong classifier, αi is the weight associated with the respective weak
classifier hi, and n is the total number of weak classifiers that form the strong
classifier H(x). The boosting process in Adaboost is a sequential procedure to select a
linear combination of weak classifiers to form a final strong classifier. In Figure 3.4
– Figure 3.11, it is shown how a linear combination of weak classifiers is built.
Firstly, in Figure 3.4, each data point has a class of +1 for the red points and -1 for
the green points. These two data classes {+1,-1} are the same as the two classes
used for face detection, because the training images for face detection will only
consist of two classes: face or non-face. For this example, the red points represent
faces and the green points represent non-faces. In this first round or
iteration, the weights ωi for all the points are set equal to 1. The same is
applied for all training examples of face and non-face images.
Figure 3.4: Two classes of data points represent two classes of images such as face
or non-face. The weight associated to each point is equal to 1.
Then, the first weak learner to detect our object of interest, a face in our case, is
chosen at random to differentiate between these two classes of data points (face or
non-face), as shown in Figure 3.5. We can see that this first weak learner is situated at
the middle of the data points and it also includes a quite large number of green points,
which represent non-face images. It seems that this weak learner is not good
enough to classify the data points as face or non-face.
Figure 3.5: The first weak learner is chosen at random, which in this case divides
the set of data points into two. A quite high number of green points are included in
its selection.
Therefore, all possible weak learners are trained and evaluated exhaustively to
differentiate between these two classes of face and non-face images. It is shown in
Figure 3.6 that, at the end, the best weak learner found is selected as the weak
classifier h1(x) and is assigned the weight α1 in this iteration. This weak classifier
is actually the best weak learner that differentiates between face and non-face
images.
Figure 3.6: The weak learner (bold line) seems to be the best among all the weak
learners to classify face images and therefore is selected as the weak classifier.
From the same figure, we can see that the selected weak classifier is able to classify
some of the face data points (red points) on its left side. However, this
weak classifier is not able to classify the large part of the face data points located on
its right side. All these face data points are now considered as non-face
images by this first weak classifier, which is clearly wrong, and this means that one
single weak classifier alone is unable to give a good and acceptable face
classification.
So, in the next round, the weights associated with each of all the data points
that have been misclassified by weak classifier h1 ( x) are increased as shown in
Figure 3.7.
Figure 3.7: The weight of all misclassified data points or training examples
performed by the first weak classifier is updated and increased by Adaboost.
The process then restarts, with the weak learners again starting at random as
shown in Figure 3.8. It continues until the best weak learner is found and selected as
the weak classifier h2 ( x) for this second round. Then, as previously, the weights of
all the misclassified data points performed by the previous weak classifiers are
updated and increased in the next round in order to find a new weak classifier.
Figure 3.8: The second weak classifier is selected. However, there are still some
misclassified data points performed by these weak classifiers.
The process continues with the selection of the third weak classifier as shown in
Figure 3.9 and Figure 3.10.
Figure 3.9: The selection of the third weak classifier; some misclassified data
points still exist.
Figure 3.10: The four weak classifiers selected are good enough to differentiate
between the face and non-face images.
Finally, the strong classifier is built and it actually consists of a linear combination of
all weak classifiers as in equation (2). In the example from Figure 3.4 until Figure
3.10, the strong classifier that has been built will have 4 weak classifiers as shown in
Figure 3.10. In the Adaboost algorithm, the selection of these weak classifiers is done
in steps 3 and 4 of Figure 3.11, and the search is done exhaustively.
In the general context of learning features, each weak classifier hj(x) consists of
one single feature fj:

hj(x) = 1 if pj·fj(x) < pj·ϑj, and 0 otherwise          (3)
where ϑ j is a threshold and p j a parity to indicate the direction of the inequality. The
description of Adaboost algorithm to select a predefined number of good features
given a training set of positive and negative example images is shown in Figure 3.11.
1) Input: Training examples (xi, yi), i = 1..N, with positive (yi = 1) and
   negative (yi = 0) examples.
2) Initialization: weights ω1,i = 1/(2m) for the m negative examples and
   ω1,i = 1/(2l) for the l positive examples.
3) For t = 1,...,T:
   a) Normalize all weights
   b) For each feature j, train classifier hj with error εj = ∑i ωi |hj(xi) − yi|
   c) Choose ht with the lowest error εt
   d) Update weights: ωt+1,i = ωt,i · βt^(1−ei), where ei = 0 if xi is
      correctly classified, ei = 1 otherwise, and βt = εt / (1 − εt)
4) Final strong classifier:
   h(x) = 1 if ∑(t=1..T) αt·ht(x) ≥ 0.5·∑(t=1..T) αt, and 0 otherwise,
   with αt = log(1/βt)

Figure 3.11: Adaboost learning algorithm as proposed in [23][25][26]. This
algorithm is used to select the sets of weak classifiers that form strong classifiers from
all possible feature types. The search for a good feature ht is done exhaustively as
stated in step 3b above.
As described previously, the Adaboost algorithm iterates over T
rounds. As shown in Figure 3.11, in Step 3b, in each iteration the space of all
possible features is searched exhaustively to train weak classifiers consisting of one
single feature. To train a single weak classifier, the threshold ϑj must be determined
for the feature value to discriminate between positive and negative examples.
Therefore, for each possible feature and given training set, the weak learner
determines two optimal values (thresholds), to ensure no training sample is
misclassified as shown in Figure 3.12.
Figure 3.12: The determination of the small and big thresholds by the weak learner
on the face values extracted from a feature.

As a result, a weak classifier hj(x) consists of a feature fj, a big threshold, a
small threshold and a weight:

hj(x) = 1 if smallthreshold ≤ fj(x) ≤ bigthreshold, and 0 otherwise          (4)
As stated in Figure 3.11, in Step 3b, for each weak classifier h j (x) , the error
value ε j will be calculated using the misclassification rate of all positive and
negative training images. This gives each possible feature hj(x) its respective error
value εj, which is between 0 and 1. The best feature ht(x) found, with the lowest error
rate εt, will be selected as the weak classifier for this iteration. After choosing the
best weak classifier concerning the weighted classification error on the training set,
all training examples are reweighted and normalized, to concentrate in the next round
on those examples that were not correctly classified. At the end, the resulting strong
classifier is a weighted linear combination of all T weak classifiers.
3.5 Cascade of Boosted Classifiers
This section describes an algorithm for constructing a cascade of classifiers
which drastically reduces the computational time. The number of sub-windows to be
classified by the detector is enormous and requires a lot of computational time. The
main idea is that a small and more efficient cascade of boosted classifiers can be built
which rejects most of the negative sub-windows while detecting almost all positive
instances. Simpler classifiers are used to reject the majority of sub-windows before
more complex classifiers are called upon to achieve low false positive rates. Only
input sub-windows that have passed through all cascade layers are classified as faces
as shown in Figure 3.13. With this flexible structure, easily distinguished non-face
patterns like homogeneous texture can be simply rejected by a simple one-feature
classifier as shown in Figure 3.14. Lienhart, Kuranov and Pisarevsky [29]
stated in their paper that a cascade of classifiers is a degenerate decision tree where at
each stage, a classifier is trained to detect almost all objects of interest (faces in our
example) while rejecting a certain fraction of the non-object patterns. Input for the
cascade is the collection of all sub-windows also called scanning windows. They are
first passed through the first layer, in which all sub-windows will be classified as
faces or non-faces. The negative results will be discarded. The remaining positive
sub-windows will trigger the evaluation of the next classifier. The same process is
performed in every layer. The sub-windows that reach and pass the last layer are
classified as faces.
Figure 3.13: Cascade from simple to complex classifiers with N layers.
Figure 3.14: A simple feature is used to reject simple background pattern. The left
sub-window image will pass through to the next strong classifiers while the right one
will simply be discarded.
The structure of the cascade reflects the fact that within any single image an
overwhelming majority of sub-windows are negative. As such, the cascade attempts
to reject as many negatives as possible at the earliest stage possible. Every layer
consists of only a small number of features. In the early stages, with only a small
number of best features, it is possible to determine the existence of a non-face
(negative sub-window). Determining the presence of a face usually needs more
features. Therefore, the cascade has an increasing number of features in each
consecutive layer. While a positive instance will trigger the evaluation of every
classifier in the cascade, this is an exceedingly rare event.
During the training of the cascade, the number of layers and the number of
features per layer or stage were determined through a trial and error process. In this
process, the number of features was increased until a significant reduction in the
false positive rate could be achieved. For instance, in our case each stage was trained
to eliminate half, or 50%, of the non-face patterns while falsely eliminating only
0.5% of the frontal face patterns; 15 stages were trained. Assuming that our
training set is representative of the learning task, the expected false alarm rate is about
0.5^15 ≈ 3x10^-5 and the hit rate about 0.995^15 ≈ 0.93. More features were added until the
false positive rate on each stage achieved the desired rate while maintaining a high
detection rate. At each training stage, the false positive images from previous stages
are added to the set of negative or non-face images, and this set is used as the
negative images in the next stage's training (see Figure 3.15).
Training cascaded classifiers that can achieve both a good detection rate and
low computational time is quite complex, because a higher detection rate requires more
features, but more features mean more time to evaluate. The detection rate goal and
the false positive rate goal for each layer are set before the training starts.
Cascade Learning
• In the first stage, a number of P face images and a number of N non-face
  images are used for training.
• For the remaining stages:
  – face images are the same set
  – non-face images consist of:
    ⇒ the false positives of previous stages
    ⇒ some false positives generated from a negative training set of images
  For example, if the false positives of previous stages number 1000
  and N is 3000, 2000 false positives are generated from the negative
  training samples.

Figure 3.15: The cascade learning process, in which new false positive images from
the previous stages are added into the set of negative sample images of the next
stages.
3.6 Genetic Algorithm for Feature Selections
Genetic Algorithms (GAs) [59][60][61] are a family of computational models
inspired by natural evolution. Computational studies of Darwinian evolution and
natural selection have led to numerous models for computer optimization. GAs
comprise a subset of these evolution-based optimization techniques focusing on the
application of selection, mutation, and recombination or crossover to a population of
competing problem solutions.
In this research, GA is used to replace the exhaustive search over all possible
features shown in Figure 3.11. The evolutionary search with GA characteristics to
select features in building a cascade of classifiers is shown in Figure 3.16. It will
perform as the search mechanism to select good features to build a cascade of
classifiers.
Therefore, a single GA individual or chromosome, as in Figures 3.17 and 3.18,
represents the specific type and location of one single feature in its sub-window
of 24x24 pixels. Each individual has a length of 6 genes or elements.
Step 3b) Evolutionary Search with GA
Begin
  gen = 1;
  Initialize random population;
  Train and Evaluate population;
  Rank chromosomes;
  Do
    Select chromosomes for Mating Pool;
    Crossover;
    Mutation;
    Train and Evaluate new chromosomes;
    Rank chromosomes in population;
    gen = gen + 1;
  Until termination condition satisfied
End
Figure 3.16: Pseudo-code of an Evolutionary Algorithm with the characteristics
of Genetic Algorithm
The first five genes or elements are integer types which consist of:
• type : type of feature
• x : coordinate x in sub-window
• y : coordinate y in sub-window
• dx : width of feature
• dy : height of feature
The sixth gene is a decimal number type which stores the fitness value of the
respective feature.
type   x    y   dx   dy   fitness
 2     7   11    3    6   0.8158

Figure 3.17: Structure of an individual or a chromosome which represents the
specific type and location of one single feature. Its last gene contains the fitness
value from the chosen fitness function, 1 − εi.
type   x    y   dx   dy   fitness
 2     7   11    3    6   0.8158
(here dx = 3 and dy = 6 give the feature's width and height)

Figure 3.18: Representation of an individual or chromosome as the type and
location of a feature in the sub-window of 24x24 pixels.
Previously in Figure 3.11, every feature or weak classifier trained and
evaluated produces an error value εj, and the feature with the lowest εj is
selected in each iteration. So, the fitness function chosen is 1 − εi, where i is a
number between 1 and the population size N, and the fitness value is a decimal number
between 0 and 1. With this fitness function, a higher fitness value indicates a
lower error value εj. From this, it can be said that the chromosomes that
interest us are those with higher fitness values, which automatically means that we
are interested in the features with the lowest error values εj. The structure of a GA
population of size N is shown in Figure 3.19.
No.    type   x    y   dx   dy   fitness
0       3     8    6    4    6   0.8452
1       1     7    2   10    4   0.8201
2       5     2   12    4    9   0.8136
3       2     9    3    6    8   0.7927
...
N-3     3     2   14   15   16   0.7355
N-2     1     7    2    4    2   0.7342
N-1     6     7   10    5    8   0.7298

Fitness = (1 − error) for each individual

Figure 3.19: Structure of the Genetic Algorithm population with N-size population.
The fitness value is equal to 1 − error and lies between 0 and 1. A higher fitness value
means a lower error value for a particular feature or weak classifier. This figure
shows the chromosomes already sorted and ranked based on their fitness values.
Selecting features from a larger search space requires an increased number of
feature types. Three existing feature types are added to the total feature set
(see Figure 3.20). These three features are part of the upright rectangle features that
already exist in Intel OpenCV along with the 5 basic feature types
described in Figure 3.2. In addition to that, we add our proposed seven new types of
features (see Figure 3.21) to increase the search space and the possibility of getting a
better cascade of classifiers with a higher performance.
Figure 3.20: The existing three types of features within their sub window of 24x24
pixels. These feature sets are added to the training of cascade of classifiers with
Genetic Algorithm search
In Figure 3.21, the four top-left features (a, b, c and d), with L-shape and
inverse L-shape styles, are intended to distinguish face from non-face images based
on the pattern of the sides of the face, specifically the left and right foreheads,
temples and jaw lines. The top-right feature (e) is the inverse of the middle feature
shown in Figure 3.20. In the same Figure 3.21, the rightmost feature on the second
line (f) is intended to distinguish the pattern of the eye regions from the forehead,
and the last feature, situated at the bottom (g), should differentiate face images
from non-face images based on the pattern of the eye locations.
With these three additional existing feature types and seven newly proposed
types, the total number of feature types equals 15. Compared to only about 90,000
possible alternative features within a sub-window size of 24x24 pixels produced by
the 5 basic feature types in Figure 3.2, these 15 different feature types produce about
302,000 possible alternative features. As a result, while more valuable and better
types of features might be created as potential sets of good feature solutions, the
search space of all of these feature types increases dramatically. In order to avoid an
exhaustive search through all possible feature types and sets, and to avoid the high
computational time needed to train the cascade of classifiers, Genetic Algorithm is
implemented as an evolutionary search or selection mechanism to select good
features in building a similar or better cascade of classifiers with less training time.
Figure 3.21: The newly proposed seven types of features within their sub window
of 24x24 pixels. These feature sets are proposed and added to the training of the cascade
of classifiers with Genetic Algorithm search. These additional feature types increase the
size of the search space, so the computational time taken for cascade training is higher.
The search of features by GA is done by using selection, crossover and
mutation throughout a definite number of generations. Initially, in the first
generation, all chromosomes in the entire population are generated randomly and are
evaluated to determine their respective fitness values. Two selection schemes were
used in this research in order to investigate whether the output of one selection
scheme is better than that of the other. The two selection schemes are the “Ranking
Scheme” and the “Roulette Wheel”, which are commonly used in
GA.
In the Ranking Scheme selection, all chromosomes are sorted in
descending order of their fitness values. This selection scheme is chosen because we
are looking for the chromosome with the highest possible fitness value. So, all the
chromosomes inside the GA population are sorted and ranked based on their fitness
values. The first or topmost chromosome in the population will have the highest
fitness value while the last chromosome will have the lowest one (see Figure 3.19).
At the end of each generation, the best 50% of the total population will be selected
and put into a mating pool. In other words, the Ranking Scheme selects chromosomes
based on their fitness-value ranks in the population.
On the other hand, Roulette Wheel is the classical selection operator for
generational GA as described by Goldberg [59]. Initially after the evaluations, each
chromosome in the population will have its own fitness value. This fitness value will
be multiplied by a value which is equal to 1/sum of all fitness values in the
population as shown in equation (5).
normalized_fitness_value_i = fitness_value_i / ∑(i=1..N) fitness_value_i          (5)

where i represents the i-th chromosome in the Genetic Algorithm population.
This new value for each chromosome is known as normalized fitness value
and the sum of all these normalized fitness values in the population will be equal to
one. Each of these chromosomes is also assigned with the accumulative normalized
fitness value. Therefore, each chromosome of the population is assigned a space on
a roulette wheel that is proportional to its fitness. For the selection of the
chromosomes to be put in the mating pool, a random real number p roulette between 0
and 1 is chosen and it is compared with the accumulative normalized fitness value of
each chromosome. The chromosome which has accumulative normalized fitness
value that is bigger than p roulette will be selected. To understand this selection
scheme further, the examples of Roulette Wheel selection with ten chromosomes are
shown in Table 3.1, Figure 3.22 and Figure 3.23.
Table 3.1: Example of fitness values, normalized fitness values and accumulative
normalized fitness values. In this example, only ten chromosomes are shown,
ranked based on their fitness values. norm_fit is the normalized fitness value and
acc_norm_fit is the accumulative normalized fitness value. Note that the sum of all
norm_fit values equals 1.

No. of chromosome   fitness   norm_fit   acc_norm_fit
1                    0.87      0.151      0.15
2                    0.81      0.140      0.29
3                    0.72      0.125      0.42
4                    0.66      0.114      0.53
5                    0.64      0.111      0.64
6                    0.55      0.095      0.74
7                    0.44      0.076      0.81
8                    0.38      0.066      0.88
9                    0.37      0.064      0.94
10                   0.33      0.057      1.00
Total                5.77      1.00
Figure 3.22: The fitness values of each of the ten chromosomes propagated in the
form of a roulette wheel (the wheel segments carry the norm_fit values from Table
3.1). Chromosome number 1 represents the biggest portion of the roulette while
chromosome number 10 represents the smallest one.
Figure 3.23: An example of how the accumulative normalized fitness values (0.15,
0.29, 0.42, 0.53, 0.64, 0.74, 0.81, 0.88, 0.94, 1.00) are assigned to each
chromosome. When the probability p_roulette between 0 and 1 is chosen, the
comparison between p_roulette and the accumulative normalized fitness values is
made from left to right. The first chromosome found with an accumulative
normalized fitness value higher than p_roulette will be selected.
As in the Ranking Scheme, the number of chromosomes to be selected and
put into the mating pool is equal to half of the size of the population. In the Ranking
Scheme, the selection of the chromosomes to become parents is done based on
their rank in the population. On the other hand, in the Roulette Wheel Scheme, the
chromosomes to be put into the mating pool are selected based on the
probability rate p_roulette.
Each of the chromosomes inside the mating pool will then be used in the
crossover or mutation processes, if these processes occur, depending on the
probabilities of crossover and mutation. The two-point uniform crossover is applied
here. Therefore, two genes from two chosen individuals will exchange their values,
as shown in Figure 3.24.
Chromosome 1:   1   2   4   7   10   (fitness 0.8264)
Chromosome 2:   4   8   8   6    6   (fitness 0.7233)
(the genes at positions m = 2 and n = 4 are exchanged to produce new children
whose fitness values are still to be evaluated)

Figure 3.24: The crossover process with two chosen parents and two randomly
chosen genes m and n. The values at positions m and n are crossed over to produce
new children, which will be evaluated to get their new fitness values.
The crossover probability p_co is a random decimal number between 0 and 1.
Crossover will take place if p_co is below or equal to the fixed rate given, here 0.9,
which gives a crossover probability of 90%. The crossover and mutation rates chosen
are explained later. So, based on this crossover probability rate p_co, two different
chromosomes, cr1 and cr2,
are chosen from the mating pool to become the new parents. In this crossover, the
first chromosome cr1 is selected orderly one by one from all available chromosomes
in the mating pool. Each time cr1 is selected, a second chromosome cr2 is then
randomly selected from the rest of the chromosomes in the mating pool, with cr1 ≠ cr2.
These two chromosomes cr1 and cr2 are now the parents for potential new children.
After two parents chromosomes cr1 and cr2 are selected, two random
positions of these chromosomes’ genes are chosen which are m and n such that
m, n ∈ {1..5} and m ≠ n. In Figure 3.24, the example shows the values of m and n
equal to 2 and 4. Then, the genes at positions m and n are crossed over between these
two parents to produce new children. This process of crossover widely explores the
search space of feature sets to select features with high fitness value. Hence, for the
feature type and location found with a good fitness value, the exploitation of its
surrounding neighboring locations is done using the mutation process.
In the mutation process, a mutation probability rate p_mt is used. The rate p_mt is
also a random decimal number between 0 and 1. The one-gene or one-point mutation
is used here. With the same approach as crossover, the mutation process will take
place on the chromosome chosen from the mating pool based on this probability rate
p_mt. However, the fixed rate given in this case of mutation is 0.1. Mutation will
only occur if p_mt is below 0.1; in other words, the probability or chance of a
mutation process occurring is only 10%. Once a chromosome is chosen for mutation,
a random number g is selected. Like the m and n gene positions used in crossover,
g ∈ {1..5} and it represents the position of a gene in the chromosome. Once
g is selected, the gene at position g will be mutated. If g is equal to 1, then it
concerns the first gene, the type, in the chromosome, so a new and different type of
feature will be selected randomly between 1 and 15, since there are 15 different
feature types in total. In the other cases, where g is equal to 2, 3, 4 or 5, an integer
value randomly chosen between -2 and 2 is added to the gene x, y, dx or dy. Figure
3.25 shows an example where g is equal to 4; as a result, the fourth gene, which is
dx, is mutated by adding 2 to its current value. The value of dx has been changed
from 10 to 12. This chromosome is a new child chromosome and it will be evaluated
later in order to determine its fitness value.
Parent: 1   4   7   10   8   (fitness 0.8269)
Child:  1   4   7   12   8   (fitness to be evaluated)

Figure 3.25: The mutation process with a single chosen parent chromosome. A
selected gene between the second and fifth genes is mutated by adding an integer
value between -2 and 2, while for the first gene, containing the type of feature, a
random type between 1 and 15 is chosen. The new child will then be evaluated to
determine its new fitness value.
All parent chromosomes that have undergone the crossover and mutation
processes will produce new children chromosomes, which will be stored in a
temporary population structure. Then, they are re-evaluated in the same way as their
parents previously to determine their fitness values, 1 − εi. The new chromosomes
with high fitness values will have a better chance of being selected and inserted at
their appropriate ranks in the main GA population. This population is then sorted and
ranked based on each of its chromosomes' fitness values and is used to select the
next mating pool in the next generation. New chromosomes that have become
invalid or are no longer feasible, for example because the sum of their x-coordinate
and their width exceeds the allowed width of the 24x24 sub-window, will have their
fitness values assigned the value of zero. In other words, these features are not good
features and they shall be discarded. The chromosomes with modest or average
fitness values will not have the opportunity to be selected and inserted either in the
main population or in the next mating pool for the next generation.
In the implementation of cascade of classifiers training using GA, a
generation consists of 200 individuals or chromosomes which represent features.
This GA population size is the result of a trial and error process in which a trade-off
is made between speed and error rate. The size of the GA population would affect the
training time and the quality of the solutions found. If the population is too small,
the training time will be shorter, but there is a chance that not enough good features
will be generated and, as a result, average or bad features will be
selected as weak classifiers. On the other hand, if the population is too large, it will
take a longer time to train the cascade of classifiers and this does not meet the
purpose of using GA in the first place which is to reduce the training time. A
generation should be sufficiently large to create sufficient diversity covering the
possible solution space.
Other parameters of the GA are the probabilities of the crossover and mutation
operators. In the literature, crossover usually happens with a probability of 75-95%
and mutation with a probability of 0.5-5%. In our situation, crossover and mutation
are two processes of randomly selecting another feature from the total set of feature
types and positions.
The difference between them is that crossover truly explores the search space and
selects a feature at random, while mutation selects a feature located near the original
feature from which it is mutated. Treptow and Zell [14] preferred a crossover rate of
20% and a mutation rate of 80%. Since they were using an Evolutionary Algorithm
to select about 200 features in only a single stage of classifiers, that ratio seems
reasonable. When training only a single stage of classifiers, the exploitation rate
must be higher than the exploration rate so that the feature locations found during
exploration can be well exploited.
In this research, we propose to use a crossover rate of 90% and a mutation
rate of 10%. This is because training a cascade of classifiers differs slightly from
training a single-stage classifier, as described previously, even though both use the
Adaboost algorithm. In the training of a single stage of classifiers, exploitation
appears to be more important than exploration because the number of features found
in that single stage is far higher than the number of features found in any one stage
of a cascade, and the training sample sets also remain the same. In single-stage
training, the evolutionary algorithm faces the total search space once, when it is
initially started, and tries to find some good features; after those potential features
are found, the algorithm simply focuses on finding better features near the locations
of the features already found.
On the other hand, in cascade training, the evolutionary algorithm faces the
total search space with slightly different training samples, especially negative
samples, at each stage. This is because newly generated image samples that were
incorrectly classified (false positives) in the previous stages are added to the training
sets [57]. To avoid early convergence to local optima and to avoid losing good
feature solutions, the exploration of the feature search space should have a higher
priority. Thus, for training a cascade of classifiers involving 15 stages, applying a
9:1 crossover-to-mutation ratio is logical and reasonable.
Evolutionary search (ES) with the Genetic Algorithm ends when it converges
or satisfies the termination condition. The Genetic Algorithm is considered
converged and stops if no better single feature is found within 50 consecutive
generations. If no convergence occurs, the Genetic Algorithm stops when it reaches
the maximum number of generations, which is set to 200 in this case. Note that ES,
as a simulation of a genetic process, is a non-deterministic search: it cannot be
determined whether a found solution is optimal or suboptimal. This can be
considered one of the disadvantages of ES, and of the Genetic Algorithm in
particular. On the other hand, ES can quickly scan a huge solution set in a
reasonable time. Besides that, bad solutions in the population do not affect the end
solution negatively, as they are simply discarded. ES is also very useful for complex
or loosely defined problems: once a problem is successfully translated into an ES
problem, no rules of the original problem need to be identified. The evolution
mechanism does the work.
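The two stopping rules described above can be sketched as a single test applied once per generation. The function and variable names are illustrative assumptions:

```python
MAX_GENERATIONS = 200  # hard limit on the number of generations
STALL_LIMIT = 50       # stop after this many generations without improvement

def ga_should_stop(generation, best_fitness, prev_best, stall_count):
    """Return (stop, new_stall_count) for the GA termination test."""
    stall_count = 0 if best_fitness > prev_best else stall_count + 1
    stop = stall_count >= STALL_LIMIT or generation >= MAX_GENERATIONS
    return stop, stall_count

stop, stall = ga_should_stop(10, 0.91, 0.90, 49)  # an improvement resets the counter
```

Resetting the stall counter only on strict improvement is what makes the "no better feature in 50 consecutive generations" rule precise.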
3.7
Face Database for Training and Testing
The face database chosen consists of an initial training set of 7000 positive
(face) images and 3000 negative (non-face) images gathered from various sources
(see Figures 3.26 and 3.27). These grey-value images are 24x24 pixels in size, and
each image is different from the others. The training set contains faces of different
genders and races, with various rotations, occlusions such as beards or glasses, and
different types of illumination. The training set is used to build a 15-stage cascade of
classifiers using the standard Adaboost algorithm with exhaustive feature search,
and also the standard Adaboost algorithm with a Genetic Algorithm to select
features. The performance of both types of cascade is then compared in Chapter IV
using another face database as the test set.
Figure 3.26: Some examples of face images used in the training set
Figure 3.27: Some examples of non-face images used in the training set
The test set consists of face images from the BioID database [63]. This dataset
contains 1,200 images of faces of different people, with varying gender, expression,
size, location, level of illumination and appearance of non-face objects. Examples of
these images are shown in Figure 3.28. Each image in the training and testing sets is
different from the others.
Figure 3.28: Some examples of images containing faces with various conditions
used in the BioID test set
Using this dataset, the performance of the cascades built either exhaustively or with
the Genetic Algorithm is compared in terms of hit rates, missed rates, false detection
rates and computational time. The results of both cascade training methods are
captured and redirected into log files (.txt) created during the training and testing
process. Using these log files, the two different techniques can be compared.
3.8
Summary
This chapter started by explaining the technique of Viola and Jones [23] for
selecting features for face detection. The details of integral images, weak classifier
selection and the Adaboost algorithm were also described. In addition, our
approach, which implements a Genetic Algorithm inside the Adaboost framework to
select features, was explained. The explanation covered the parameters used in the
Genetic Algorithm, such as the population size, the maximum number of
generations, the chromosome structure, the fitness function, the two selection
schemes used (the Ranking Scheme and the Roulette Wheel Scheme), the crossover
and mutation probability rates and the termination criteria of the Genetic Algorithm.
Besides that, the seven newly proposed feature types were also described, as they
are added to the feature solution set to increase the quality of possible feature
solutions.
As described previously, the purpose of implementing the Genetic Algorithm to
select features in this research is to reduce the computational training time. At the
same time, the feature search space is enriched and enlarged with the seven new
feature types, and it is expected that the trained cascades of boosted classifiers
should have similar or better performance compared to cascades of boosted
classifiers trained without the Genetic Algorithm. At the end of this chapter, the
databases used in this research for training and testing the cascades of boosted
classifiers were also described.
CHAPTER 4
EXPERIMENTAL RESULTS AND ANALYSIS
4.1
Introduction
This chapter describes the proposed technique of implementing a Genetic
Algorithm (GA) inside the Adaboost framework to develop a 15-stage cascade of
boosted classifiers for face detection. A series of experiments was conducted to
determine the proposed technique’s performance in terms of hit rates, missed rates,
false positive rates and training time. The experiments in developing a 15-stage
cascade of boosted classifiers were done in three different categories of feature
selection, divided as follows:
1) Exhaustive search over the 5 basic features, referred to as ExBoost_5F.
2) Evolutionary search over 15 different feature types, including the seven
newly proposed ones, using a Genetic Algorithm with the “Ranking
Scheme” selection, referred to as GABoost_15F_Ranking.
3) Evolutionary search over 15 different feature types, including the seven
newly proposed ones, using a Genetic Algorithm with the “Roulette
Wheel Scheme” selection, referred to as GABoost_15F_Roulette.
The GA population size is fixed at 200, the crossover and mutation rates are
set to 90% and 10%, respectively, and the maximum number of generations is
limited to 200. Furthermore, the GA stops if no better chromosomes are found after
50 consecutive generations, or once it reaches the maximum number of generations.
The experiments in each of these categories were done ten times each and the
average results were taken. The performance of the cascades of boosted classifiers
was evaluated using the performance.cpp file in Intel OpenCV. The application
built from this file, performance.exe, gives the output in terms of hit rates, missed
rates and false positive rates. The experiments were done using the BioID face
database test set [63] as described in Chapter 3.
4.2
Experiments on evolutionary search with the characteristics of a Genetic
Algorithm for feature selection
In this chapter, we compare the performance of the 15-stage cascades of
boosted classifiers built using the standard Adaboost algorithm with exhaustive
feature search, which we refer to as ExBoost_5F, and the standard Adaboost
algorithm with evolutionary search using a Genetic Algorithm to select features,
using either the Ranking Scheme selection, which we refer to as
GABoost_15F_Ranking, or the Roulette Wheel Scheme selection, which we refer to
as GABoost_15F_Roulette.
ExBoost_5F searches over the initial limited set of five basic feature types
shown in Figure 3.2 in Chapter 3, while GABoost_15F_Roulette and
GABoost_15F_Ranking search over a larger set of 15 feature types, including the
seven newly proposed feature types shown in Figures 3.19 and 3.20. The training
was done using a training set of 7000 positive (face) images and 3000 negative
(non-face) images gathered from various sources, as described in Chapter 3. These
grey-value images are 24x24 pixels in size, and each image is different from the
others. The training and testing sets were described at the end of Chapter 3.
Using the same training set, the cascades of boosted classifiers were trained
in the three categories mentioned previously: ExBoost_5F, GABoost_15F_Roulette
and GABoost_15F_Ranking. The training was stopped after the 15 stages of the
cascade of classifiers were built, with the parameters initially set as described in
Chapter 3. The population size is 200, all individuals or chromosomes are initialized
randomly and the crossover and mutation rates of the two GA operators are set to
90% and 10%, respectively. The GA converges and stops if no better single feature
is found within 50 consecutive generations. If no convergence occurs, the GA also
stops when it reaches the maximum number of generations, which is set to 200 in
our case. Experiments were carried out on an Intel Pentium IV 3.0 GHz processor.
ExBoost_5F was executed once, as it searches exhaustively and does not depend on
any feature selection parameters. However, GABoost_15F_Roulette and
GABoost_15F_Ranking were executed ten times each and the average results were
taken.
To measure the performance of the trained cascades of boosted classifiers,
log files were generated by the programs. Figure 4.1 shows a log file indicating the
training time taken to build the 15-stage cascades of boosted classifiers. The
performance of these cascades in terms of their hit rates, missed rates and false
positive rates is shown in Figure 4.2.
Figure 4.1: A snapshot of the log file generated by the program, indicating the end
of the training of the 15-stage cascade of classifiers. The total training time taken is
highlighted in the log file.
Figure 4.2: A snapshot of the log file generated by the program performance.exe,
which gives the hit rates, missed rates and false positive (false alarm) rates of the
cascade of boosted classifiers.
4.3
Experimental results in terms of computational training time of
ExBoost_5F, GABoost_15F_Ranking and GABoost_15F_Roulette
The computational training times of ExBoost_5F, GABoost_15F_Ranking
and GABoost_15F_Roulette were taken and compared. These times were recorded
in seconds and converted into hours. Table 4.1 shows that ExBoost_5F consumed
134225.07 seconds, or 37.28 hours, to build the 15-stage cascade of boosted
classifiers. Meanwhile, Tables 4.2 and 4.3 show in more detail the ten training
times taken from the ten experiments done with GABoost_15F_Ranking and
GABoost_15F_Roulette, respectively. They also show that GABoost_15F_Ranking
took an average of 35634.40 seconds (9.90 hours) and GABoost_15F_Roulette an
average of 50636.50 seconds (14.07 hours) to build the 15-stage cascade of boosted
classifiers. In terms of computational time, GABoost_15F_Ranking and
GABoost_15F_Roulette were thus about 3.76 and 2.65 times faster, respectively,
than ExBoost_5F in building the 15-stage cascades of boosted classifiers. All these
times include the time taken by the OpenCV program to prepare the negative image
samples, which include new negative images such as non-face images incorrectly
classified as faces (false positives) in the previous stages.
Table 4.1: The comparison of ExBoost_5F, GABoost_15F_Roulette and
GABoost_15F_Ranking in terms of the training time taken to build the 15-stage
cascades of classifiers.

Algorithm               Average training time (sec)   Average training time (hour)
ExBoost_5F              134225.07                     37.28
GABoost_15F_Ranking     35634.40                      9.90
GABoost_15F_Roulette    50636.50                      14.07
Table 4.2: The computational training time of ten experiments of
GABoost_15F_Ranking in building the 15-stage cascade of boosted classifiers.

Exp No                      Time taken (h)   Time taken (s)
GABoost_15F_Ranking_1       7.72             27799
GABoost_15F_Ranking_2       9.74             35048
GABoost_15F_Ranking_3       6.82             24544
GABoost_15F_Ranking_4       6.68             24040
GABoost_15F_Ranking_5       11.51            41444
GABoost_15F_Ranking_6       8.45             30423
GABoost_15F_Ranking_7       13.02            46864
GABoost_15F_Ranking_8       12.70            45737
GABoost_15F_Ranking_9       10.21            36741
GABoost_15F_Ranking_10      12.14            43704
Average training time       9.90             35634.40
Table 4.3: The computational training time of ten experiments of
GABoost_15F_Roulette in building the 15-stage cascade of boosted classifiers.

Exp No                      Time taken (h)   Time taken (s)
GABoost_15F_Roulette_1      12.19            43890
GABoost_15F_Roulette_2      10.90            39230
GABoost_15F_Roulette_3      15.16            54571
GABoost_15F_Roulette_4      14.31            51517
GABoost_15F_Roulette_5      16.37            58915
GABoost_15F_Roulette_6      16.01            57620
GABoost_15F_Roulette_7      14.36            51700
GABoost_15F_Roulette_8      16.13            58073
GABoost_15F_Roulette_9      12.26            44123
GABoost_15F_Roulette_10     12.98            46726
Average training time       14.07            50636.50
In terms of computational training time, the exhaustive search used to select
the features or weak classifiers in ExBoost_5F consumed a lot of time. This is
because the features were selected by searching exhaustively through the entire set
of possible feature solutions, i.e. the feature search space. In
GABoost_15F_Ranking and GABoost_15F_Roulette, although the set of possible
feature solutions was enlarged with the seven additional new feature types, the time
taken to train the cascades of boosted classifiers was reduced drastically thanks to
the evolutionary search with GA characteristics. We can also note that the
computational training time of GABoost_15F_Ranking is about 1.42 times shorter
than that of GABoost_15F_Roulette. The overall comparison of computational
training times is illustrated in Figure 4.3.
Figure 4.3: The performance in terms of computational training time of
GABoost_15F_Ranking and GABoost_15F_Roulette, compared to each other and
to ExBoost_5F. The x-axis represents the experiment number and the y-axis the
computational time in hours.
4.4
Experimental results in terms of the number of weak classifiers or features
selected in ExBoost_5F, GABoost_15F_Ranking and GABoost_15F_Roulette
Table 4.4 shows the total number of features or weak classifiers selected in
the cascade of boosted classifiers. These features were selected as shown in Step 3b
of Figure 3.11 in Chapter 3. In ExBoost_5F, the features were selected
exhaustively, meaning the entire search space was searched in order to select one
single feature in each iteration. In GABoost_15F_Ranking and
GABoost_15F_Roulette, however, the features were selected using the GA, so the
feature solution set was not entirely searched: in each iteration, the feature selected
was the best feature found by the GA after some number of generations. Table 4.4
shows that more features were selected during the training of the cascades of
boosted classifiers by GABoost_15F_Ranking and GABoost_15F_Roulette.
ExBoost_5F selected 400 features, while GABoost_15F_Ranking and
GABoost_15F_Roulette selected an average of 630.40 and 654.40 features,
respectively.
With less training time, the selection of a single feature or weak classifier in
GABoost_15F_Ranking and GABoost_15F_Roulette became rather fast.
GABoost_15F_Ranking spent an average of only 56.59 seconds to select a single
feature and GABoost_15F_Roulette an average of 77.37 seconds, compared to the
exhaustive search in ExBoost_5F, which consumed 335.56 seconds for the same
task. The high per-feature time of ExBoost_5F is due to its high computational
training time and smaller total number of features, compared to
GABoost_15F_Ranking and GABoost_15F_Roulette, which selected more features
while consuming much less computational training time.
Table 4.4: The comparison of ExBoost_5F, GABoost_15F_Ranking and
GABoost_15F_Roulette in terms of the total number of features selected and the
average time taken to select a single feature in the cascade of boosted classifiers.

Algorithm               Average number of features   Average time to select a
                        in cascade                   single feature (sec)
ExBoost_5F              400                          335.56
GABoost_15F_Ranking     630.40                       56.59
GABoost_15F_Roulette    654.40                       77.37
From the information in Table 4.4, the selection of a single feature in
GABoost_15F_Ranking and GABoost_15F_Roulette was about 5.93 and 4.33 times
faster, respectively, than in ExBoost_5F. It should also be noted that the total
training times of GABoost_15F_Ranking and GABoost_15F_Roulette are much
shorter due to the reduced number of iterations and also because the GA does not
search the features over the total search space. Tables 4.5 and 4.6 show the details
of the number of features selected and the average time taken to select a single
feature in the ten experiments of GABoost_15F_Ranking and
GABoost_15F_Roulette.
Table 4.5: The number of features selected and the time taken to select a single
feature in GABoost_15F_Ranking in building the cascade of boosted classifiers.

Exp No                      Number of features       Time to select a single
                            selected in cascade      feature (sec)
GABoost_15F_Ranking_1       598                      46.49
GABoost_15F_Ranking_2       684                      51.24
GABoost_15F_Ranking_3       633                      38.77
GABoost_15F_Ranking_4       614                      39.15
GABoost_15F_Ranking_5       667                      62.13
GABoost_15F_Ranking_6       605                      50.29
GABoost_15F_Ranking_7       676                      69.33
GABoost_15F_Ranking_8       680                      67.26
GABoost_15F_Ranking_9       603                      60.93
GABoost_15F_Ranking_10      544                      80.34
Average                     630.40                   56.59
Table 4.5, which shows the ten experiments of GABoost_15F_Ranking,
indicates that the average number of features selected is 630.40 and the average
time taken to select a single feature is 56.59 seconds. The fastest selection of a
single feature was achieved by GABoost_15F_Ranking_3, which spent only 38.77
seconds.
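The tabulated 56.59-second average is the mean of the ten per-experiment ratios (total time divided by features selected), which can be checked from the two data columns of Table 4.5:

```python
# Per-experiment totals from Table 4.5: training time (s) and features selected
times = [27799, 35048, 24544, 24040, 41444, 30423, 46864, 45737, 36741, 43704]
features = [598, 684, 633, 614, 667, 605, 676, 680, 603, 544]

# Mean of the per-experiment ratios, as reported in the table's last row
per_feature = sum(t / f for t, f in zip(times, features)) / len(times)
print(round(per_feature, 2))  # 56.59
```

Note that this differs slightly from dividing the average total time by the average feature count (35634.40 / 630.40 ≈ 56.53), because a mean of ratios is not the ratio of means.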
Table 4.6: The number of features selected and the time taken to select a single
feature in GABoost_15F_Roulette in building the cascade of boosted classifiers.

Exp No                      Number of features       Time to select a single
                            selected in cascade      feature (sec)
GABoost_15F_Roulette_1      593                      74.01
GABoost_15F_Roulette_2      621                      63.17
GABoost_15F_Roulette_3      756                      72.18
GABoost_15F_Roulette_4      596                      86.44
GABoost_15F_Roulette_5      688                      85.63
GABoost_15F_Roulette_6      686                      83.99
GABoost_15F_Roulette_7      616                      83.93
GABoost_15F_Roulette_8      723                      80.32
GABoost_15F_Roulette_9      656                      67.26
GABoost_15F_Roulette_10     609                      76.73
Average                     654.40                   77.37
Table 4.6 shows the ten experiments of GABoost_15F_Roulette.
GABoost_15F_Roulette selected an average of 654.40 features during the training
of the cascade of boosted classifiers, and spent an average of only 77.37 seconds on
a single feature selection. The overall comparison in terms of the number of
features selected and the time taken to select a single feature is shown in Figures 4.4
and 4.5.
Figure 4.4: The number of features selected in GABoost_15F_Ranking and
GABoost_15F_Roulette in ten different experiments, compared to ExBoost_5F.
Figure 4.5: The computational time taken to select a single feature in
GABoost_15F_Ranking and GABoost_15F_Roulette in ten different experiments,
compared to ExBoost_5F.
4.5
Experimental results in terms of hit detection rates and false positive
detection rates in ExBoost_5F, GABoost_15F_Ranking and
GABoost_15F_Roulette
As described at the beginning of Chapter 3, the performance of cascades of
boosted classifiers is usually measured in terms of their hit rates, missed rates and
false positive rates. All cascades of boosted classifiers were tested using the BioID
face database [63] as described in Chapter 3. Four example outputs of a cascade of
boosted classifiers applied to single images from this test dataset are shown in
Figure 4.6. When a region in a sub-window of the image is classified as a face, the
face detector draws a rectangle (in red) representing the face area.
Figure 4.6: Four different results from the cascade of boosted classifiers. Image (a)
shows a face correctly detected and counted as a hit; in (b) the face in the image is
not detected (missed); in (c) the face is not detected but a false positive detection
occurs, where a non-face sub-window is classified as a face; finally, in (d) both a hit
and a false positive detection occur in the same image.
In this research, the percentage of faces detected out of the total number of
faces in the testing set is taken as the hit rate, while the percentage of undetected
faces out of the total number of faces is taken as the missed rate. Therefore, the hit
rate and the missed rate together sum to 100%. For false positive detection, the total
number of false positive detection rectangles is divided by the total number of
detection rectangles. In the four example images in Figure 4.6, there are four faces
to be detected, and the face detector uses the trained cascade of boosted classifiers
to detect them all. In this case, only two faces are detected (images (a) and (d)),
giving a hit rate of 2/4 or 50% and a missed rate of likewise 2/4 or 50%. At the
same time, two false positive detections occur within these four example images, as
shown in images (c) and (d). The face detector thus made two false positive
detections out of a total of four detections, giving a false positive rate of 2/4 or
50%. The higher number of features selected in GABoost_15F_Ranking and
GABoost_15F_Roulette, as shown in Figure 4.4, should increase the performance
of the trained cascades of boosted classifiers. In Table 4.7, the hit rates and false
positive rates of these cascades are compared to the performance of the cascade
built exhaustively with the five basic feature types.
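These rate definitions and the worked example above can be sketched as follows (function and parameter names are illustrative):

```python
def detection_rates(hits, total_faces, false_positives, total_detections):
    """Compute hit, missed and false positive rates as percentages."""
    hit_rate = 100.0 * hits / total_faces
    missed_rate = 100.0 - hit_rate  # hit rate + missed rate = 100%
    fp_rate = 100.0 * false_positives / total_detections
    return hit_rate, missed_rate, fp_rate

# Figure 4.6 example: 2 of 4 faces found, 2 false positives among 4 detections
print(detection_rates(2, 4, 2, 4))  # (50.0, 50.0, 50.0)
```

Note that the false positive rate is normalised by the number of detections made, not by the number of faces, so the two ratios have different denominators in general.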
Table 4.7: The comparison of hit rates and false positive rates performed by the
cascades of boosted classifiers built by ExBoost_5F, GABoost_15F_Ranking and
GABoost_15F_Roulette.

Algorithm/Technique     Average hit rate (%)   False positive detections /
                                               total detections (%)
ExBoost_5F              90.01                  62.97
GABoost_15F_Ranking     90.13                  61.56
GABoost_15F_Roulette    89.87                  67.44
Table 4.7 shows that although GABoost_15F_Ranking and
GABoost_15F_Roulette were run many times (ten times each in this research), the
average hit rates achieved by these two techniques are similar to that of
ExBoost_5F. GABoost_15F_Ranking achieved an average hit rate of 90.13%,
slightly better than ExBoost_5F, which managed 90.01%. On the other hand,
GABoost_15F_Roulette showed slightly lower face detection accuracy, achieving
an average hit rate of 89.87%.
In terms of false positive rates, only GABoost_15F_Ranking achieved a
lower average rate, 61.56%, compared to the false positive rate of ExBoost_5F,
which stood at 62.97%; GABoost_15F_Roulette achieved an average false positive
rate of 67.44%. The detailed hit rates and false positive rates for the ten experiments
of each of the two proposed techniques, GABoost_15F_Ranking and
GABoost_15F_Roulette, are shown in Tables 4.8 and 4.9.
Table 4.8: The detailed hit rates and false positive rates achieved in ten
experiments using GABoost_15F_Ranking in building the cascade of boosted
classifiers.

Exp No                      Hit Rate   False Positive Rate
GABoost_15F_Ranking_1       94.25      62.32
GABoost_15F_Ranking_2       89.84      64.17
GABoost_15F_Ranking_3       89.84      68.55
GABoost_15F_Ranking_4       92.26      64.78
GABoost_15F_Ranking_5       85.68      49.83
GABoost_15F_Ranking_6       93.01      55.94
GABoost_15F_Ranking_7       88.68      37.78
GABoost_15F_Ranking_8       85.76      65.60
GABoost_15F_Ranking_9       91.51      74.10
GABoost_15F_Ranking_10      90.51      72.57
Average                     90.13      61.56

Table 4.8 also shows that some of the cascades trained using
GABoost_15F_Ranking achieved higher hit rates, such as the 94.25% and 93.01%
achieved in the first and sixth experiments, GABoost_15F_Ranking_1 and
GABoost_15F_Ranking_6. At the same time, the false positive rates achieved in
these two experiments (62.32% and 55.94%) are better than the 62.97% achieved
by ExBoost_5F. The worst hit rate achieved by GABoost_15F_Ranking is in the
fifth experiment, GABoost_15F_Ranking_5, at 85.68%. However,
GABoost_15F_Ranking_5 has the best false positive rate, 49.83%. It could be said
that the cascade of classifiers trained in GABoost_15F_Ranking_5 is the hardest
cascade of boosted classifiers for a specific sub-window to pass through to the last
layer and be classified as a face. The performance of the ten cascades of boosted
classifiers built by GABoost_15F_Ranking, compared with ExBoost_5F, is graphed
in Figure 4.7.
Figure 4.7: The performance in terms of hit rates and false positive rates of the ten
experiments of GABoost_15F_Ranking, compared to ExBoost_5F.
Table 4.9: The detailed hit rates and false positive rates achieved in ten
experiments using GABoost_15F_Roulette in building the cascade of boosted
classifiers.

Exp No                      Hit Rate   False Positive Rate
GABoost_15F_Roulette_1      96.59      78.47
GABoost_15F_Roulette_2      91.67      73.40
GABoost_15F_Roulette_3      85.01      60.94
GABoost_15F_Roulette_4      93.51      73.51
GABoost_15F_Roulette_5      82.01      42.87
GABoost_15F_Roulette_6      89.51      70.30
GABoost_15F_Roulette_7      93.51      65.48
GABoost_15F_Roulette_8      84.01      68.98
GABoost_15F_Roulette_9      89.84      64.79
GABoost_15F_Roulette_10     93.01      75.69
Average                     89.87      67.44
Table 4.9 shows the detailed performance of the cascades of boosted
classifiers built by GABoost_15F_Roulette. From these experiments, the cascade in
GABoost_15F_Roulette_1 achieved the highest hit rate, 96.59%, but at the same
time also the worst false positive rate, 78.47%. We can also see that the cascades
trained in GABoost_15F_Roulette_2, GABoost_15F_Roulette_4,
GABoost_15F_Roulette_7 and GABoost_15F_Roulette_10 achieved higher hit
rates, 91.67%, 93.51%, 93.51% and 93.01%, respectively, compared to the cascade
trained in ExBoost_5F (90.01%). However, all of them also have higher false
positive rates than the cascade of boosted classifiers in ExBoost_5F. From these ten
experiments, only two cascades of boosted classifiers have lower false positive
rates than the ExBoost_5F cascade (62.97%): 60.94% and 42.87%, achieved by
GABoost_15F_Roulette_3 and GABoost_15F_Roulette_5, respectively. However,
they performed worse in terms of hit rates, achieving only 85.01% and 82.01%. The
performance of the ten cascades of boosted classifiers in GABoost_15F_Roulette is
graphed in Figure 4.8.
Figure 4.8: The performance in terms of hit rates and false positive rates of the ten
experiments of GABoost_15F_Roulette, compared to ExBoost_5F.
In addition to these experiments with GABoost_15F_Ranking and
GABoost_15F_Roulette, another five experiments were conducted. This type of
experiment is referred to as GABoost_Init. The purpose is to investigate how the
GABoost technique performs when the best chromosomes found in the main GA
population are reused as the initial population in the next round of the GA. Apart
from this initialization process, the algorithm used is the same as in
GABoost_15F_Ranking. Table 4.10 and Figure 4.9 show the hit rates and false
positive rates of these five GABoost_Init experiments.
Table 4.10: The detailed hit rates and false positive rates achieved in five
experiments using GABoost_Init in building the cascade of boosted classifiers.

Exp No            Hit Rate   False Positive Rate
GABoost_Init_1    94.67      77.15
GABoost_Init_2    85.10      73.59
GABoost_Init_3    85.26      62.18
GABoost_Init_4    88.54      68.78
GABoost_Init_5    89.77      71.12
Average           89.67      70.56
Figure 4.9: The performance in terms of hit rates and false positive rates of the five
experiments of GABoost_Init, compared to ExBoost_5F.
From Table 4.10 and Figure 4.9, we can see that GABoost_Init has slightly
lower performance in terms of hit rates (89.67%) and higher false detection rates
(70.56%) in comparison to GABoost_15F_Ranking and GABoost_15F_Roulette.
However, it has a similar performance compared to ExBoost_5F. The usage of the
best chromosomes found in the main GA population that are reused as the initial
population in the next round of GA does not improve the performance of the trained
cascade of boosted classifiers. The random initialization of GA population has
produced better trained cascade of boosted classifiers as shown in the results of
GABoost_15F_Ranking and GABoost_15F_Roulette. Random initialization offer
more variety of feature types and locations compared to the technique of
GABoost_Init which only has less variety of feature types and locations in the
initialization process. This is due to the fact that the best chromosomes found in the
main GA population in the previous round are in facts consist of feature types which
are situated in the location close to them.
In this research, GABoost_15F_Ranking performed slightly better than GABoost_15F_Roulette and ExBoost_5F in terms of hit rate and false positive rate. From the graphs in Figure 4.7 and Figure 4.8, we can observe that the hit rates are usually in direct proportion to the false positive rates for the cascades of boosted classifiers trained by GABoost_15F_Ranking and GABoost_15F_Roulette: a cascade with a higher hit rate usually also has a higher false positive rate. Such a cascade makes it easier for the face detector to classify a sub-window of the image as a face, but for the same reason it also produces more false positives. An example of this kind of cascade is GABoost_15F_Roulette_1, which has a hit rate of 96.59% and a false positive rate of 78.47%.
On the other hand, a cascade of boosted classifiers with a low hit rate usually also has a low false positive rate. This kind of cascade can be categorized as a "hard" cascade, because it is difficult for a face detector using it to classify a sub-window of the image as a face; it therefore produces very few false positives but also misses more faces. Examples of such cascades are GABoost_15F_Roulette_5, GABoost_15F_Ranking_5 and GABoost_15F_Ranking_7, which achieved hit rates of only 82.01%, 85.68% and 88.68% respectively, with false positive rates of 42.87%, 49.83% and 37.78% respectively.
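The "easy" versus "hard" trade-off can be illustrated with a minimal cascade sketch. The scores and thresholds below are hypothetical, not values from the trained detectors:

```python
def cascade_classify(stage_scores, thresholds):
    """Pass one sub-window through an N-stage cascade of boosted classifiers.

    stage_scores -- strong-classifier score of the window at each stage
    thresholds   -- per-stage acceptance thresholds; lower thresholds give an
                    "easy" cascade (higher hit rate AND higher false positive
                    rate), higher thresholds a "hard" one (both rates drop)
    The window is labeled a face only if it passes every stage.
    """
    return all(score >= t for score, t in zip(stage_scores, thresholds))

scores = [0.9, 0.7, 0.8]                             # hypothetical window scores
easy = cascade_classify(scores, [0.5, 0.5, 0.5])     # True: passes all stages
hard = cascade_classify(scores, [0.85, 0.85, 0.85])  # False: rejected at stage 2
```

The same window is accepted by the easy cascade and rejected by the hard one, which is why hit rate and false positive rate move together.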
Finally, from these experiments, the best cascades of boosted classifiers in terms of high hit rate and low false positive rate are GABoost_15F_Ranking_1, with a hit rate of 94.25% and a false positive rate of 62.32%, and GABoost_15F_Ranking_6, with a hit rate of 93.01% and a false positive rate of 55.94%. Examples of the outputs of the different cascades of boosted classifiers are shown in Figure 4.10.
Figure 4.10: Some examples of the test images. In the top three images, no faces are detected and only false positive detections occur. In the middle three images, faces are detected together with false positive detections, while in the bottom three images the faces are detected perfectly, without any false positives.
4.6
Experimental Results of the Seven New Feature Types in GABoost_15F_Ranking and GABoost_15F_Roulette
The number of times the seven new feature types described in Figure 3.20 in Chapter 3 were selected during the training of the cascades of boosted classifiers was also analyzed. The selection of features, or weak classifiers, to form the strong classifiers of a cascade was done by the Adaboost algorithm [25] [26], which was introduced in Chapter 2 and described in Chapter 3. The selections were made dynamically, based on the initial parameters set for the training of the cascades as described in Chapter 3. As shown in the Adaboost algorithm in Figure 3.11 in Chapter 3, the best weak classifier, which consists of only a single feature, is selected in each iteration. Most of the features selected were usually of the simple types, such as the 5 basic feature types in Figure 3.2 in Chapter 3; ExBoost_5F uses only these five feature types to train a cascade of boosted classifiers with exhaustive search. The three additional feature types already available in OpenCV, shown in Figure 3.19, and the seven new feature types in Figure 3.20 enrich the quality of the feature solution sets, but also enlarge the search space and increase the training time. For this reason, GA is used as an evolutionary technique to select the features. In this research, the interesting question is how the seven new feature types affect the training of the cascade of boosted classifiers, since the other eight feature types (five basic and three existing) are the ones commonly used in OpenCV. These new features were evaluated based on the number of times they were selected during the training of the cascades of boosted classifiers by GABoost_15F_Ranking and GABoost_15F_Roulette. Figure 4.11 and Figure 4.12 show the performance of all seven new feature types in GABoost_15F_Ranking and GABoost_15F_Roulette.
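The per-iteration selection described above can be sketched as one round of discrete AdaBoost. This is a generic sketch, not the thesis implementation; the candidate set is the only place where ExBoost and GABoost differ:

```python
import math

def adaboost_round(candidates, samples, weights):
    """One AdaBoost iteration: pick the candidate weak classifier (a single
    feature) with the lowest weighted error, then re-weight the samples so
    the next iteration focuses on the examples this classifier got wrong.

    candidates -- weak classifiers h(x) returning +1 (face) or -1 (non-face);
                  in ExBoost_5F this list is every possible feature, in
                  GABoost it is the best features found by the GA search
    samples    -- list of (x, y) pairs with label y in {+1, -1}
    weights    -- current sample weights, summing to 1
    """
    def weighted_error(h):
        return sum(w for (x, y), w in zip(samples, weights) if h(x) != y)

    best = min(candidates, key=weighted_error)             # best weak classifier
    err = weighted_error(best)
    alpha = 0.5 * math.log((1.0 - err) / max(err, 1e-10))  # classifier weight
    new_w = [w * math.exp(-alpha * y * best(x))
             for (x, y), w in zip(samples, weights)]
    total = sum(new_w)
    return best, alpha, [w / total for w in new_w]
```

Replacing the exhaustive `candidates` list with the GA's best-found features is precisely what reduces the time to select a single feature.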
Figure 4.11: The number of all seven new feature types selected during the training
of ten cascades of boosted classifiers using GABoost_15F_Ranking.
Figure 4.12: The number of all seven new feature types selected during the training
of ten cascades of boosted classifiers using GABoost_15F_Roulette.
In Tables 4.11 and 4.12, the results of the selections of each of the seven new feature types are shown. For each feature type a, b, c, d, e, f and g in Figure 3.20, the average number of times it was selected in the training of cascades of boosted classifiers using GABoost_15F_Ranking and GABoost_15F_Roulette is shown.
Table 4.11: Details of the average numbers of the seven new feature types selected by GABoost_15F_Ranking.

Feature Type                     a       b       c      d      e       f      g
Average number of selections   27.90   15.40   11.50   2.20   37.30   30.90   4.50
Total, seven new features:    129.70
% of total features:           20.60
Table 4.12: Details of the average numbers of the seven new feature types selected by GABoost_15F_Roulette.

Feature Type                     a       b       c      d      e       f      g
Average number of selections   26.10   14.40   10.20   2.10   35.20   30.40   5.70
Total, seven new features:    124.10
% of total features:           18.99
These seven feature types contributed an average of 20.60% and 18.99% of the total features selected by GABoost_15F_Ranking and GABoost_15F_Roulette respectively. In GABoost_15F_Ranking, the seven feature types a, b, c, d, e, f and g showed different levels of importance in the cascades of boosted classifiers. Among them, feature types a, e and f had the most significant influence, as they were selected 27.9, 37.3 and 30.9 times respectively. These three feature types contributed about 76.17% of the seven newly proposed feature types and about 15.67% of the total features selected in the cascades built by GABoost_15F_Ranking. Feature types b and c had little impact on the cascades, as they were selected only 15.4 and 11.5 times, representing only 4.26% of the total features selected. Meanwhile, as also shown in Table 4.11, feature types d and g are not considered good feature types, since they were rarely selected: only 2.2 and 4.5 times respectively, equal to 1.06% of the total features selected. The detailed distributions of the seven new feature types are shown in Table 4.13, and the graph representing these numbers of selections by GABoost_15F_Ranking is shown in Figure 4.13.
Table 4.13: Details of the seven new feature types selected by GABoost_15F_Ranking in the ten experiments.

Exp No                      a      b      c     d     e      f     g    % of total features
GABoost_15F_Ranking_1      27     10      9     4    40     26     2    19.73
GABoost_15F_Ranking_2      28     13     11     2    42     32     4    19.30
GABoost_15F_Ranking_3      34     13     13     2    33     32     6    21.01
GABoost_15F_Ranking_4      25     23     11     0    45     44     4    24.76
GABoost_15F_Ranking_5      28     18     13     3    46     34     4    21.89
GABoost_15F_Ranking_6      37     10     17     3    39     25     3    22.15
GABoost_15F_Ranking_7      31     14     13     2    37     24     5    18.64
GABoost_15F_Ranking_8      24     25      9     3    35     35     2    19.56
GABoost_15F_Ranking_9      24     15     11     2    31     23     9    19.07
GABoost_15F_Ranking_10     21     13      8     1    25     34     6    19.85
Average                    27.90  15.40  11.50  2.20  37.30 30.90  4.50 20.60
Figure 4.13: The distributions of the seven new feature types selected during the
training of ten cascades of boosted classifiers using GABoost_15F_Ranking.
In the case of the cascades of boosted classifiers trained by GABoost_15F_Roulette, the results in Table 4.12 show that the seven feature types performed similarly to those in GABoost_15F_Ranking. As in GABoost_15F_Ranking, the three feature types a, e and f played significant roles in the cascades of boosted classifiers, being selected 26.1, 35.2 and 30.4 times respectively during training. These three feature types contributed about 73.89% of the seven newly proposed feature types and about 14.01% of the total features selected in the cascades built by GABoost_15F_Roulette. Feature types b and c were selected 14.4 and 10.2 times, only about 3.76% of the total features selected; they appear to be roughly half as important as feature types a, e and f. Meanwhile, as shown in Table 4.12, feature types d and g are the least significant feature types, since they were rarely selected during training: only 2.1 and 5.7 times respectively, equal to just 1.19% of the total features selected. The detailed distributions of the seven new feature types are shown in Table 4.14, and the graph representing these numbers of selections by GABoost_15F_Roulette is shown in Figure 4.14.
Table 4.14: Details of the seven new feature types selected by GABoost_15F_Roulette in the ten experiments.

Exp No                       a      b      c     d     e      f     g    % of total features
GABoost_15F_Roulette_1      23     14     10     1    34     25     9    19.56
GABoost_15F_Roulette_2      27     14     14     5    39     31     7    22.06
GABoost_15F_Roulette_3      24     17     14     4    37     42     8    19.31
GABoost_15F_Roulette_4      23     10      6     2    28     35     7    18.62
GABoost_15F_Roulette_5      30     13     11     5    30     30     4    17.88
GABoost_15F_Roulette_6      23     14     14     0    35     27     4    17.06
GABoost_15F_Roulette_7      28     13      9     1    37     35     5    20.78
GABoost_15F_Roulette_8      32     18     10     0    48     27     4    19.23
GABoost_15F_Roulette_9      32     14      8     0    34     24     5    17.84
GABoost_15F_Roulette_10     19     17      6     3    30     28     4    17.57
Average                     26.10  14.40  10.20  2.10  35.20 30.40  5.70 18.99
Figure 4.14: The distributions of the seven new feature types selected during the
training of ten cascades of boosted classifiers using GABoost_15F_Roulette.
From the results of the selections of the seven new feature types by GABoost_15F_Ranking and GABoost_15F_Roulette, the patterns of selection are very similar. In both cases, feature types a, e and f are the most significant, feature types b and c are moderately selected, and feature types d and g are the least important.
4.7
Analysis of the Experimental Results
In these experiments, the cascades of boosted classifiers built using
ExBoost_5F, GABoost_15F_Ranking and GABoost_15F_Roulette have been
analyzed in terms of their computational training times, hit rates, false positive rates,
number of features selected and the time to select a single feature. In addition to that,
the results of the seven newly proposed feature types used in GABoost_15F_Ranking
and GABoost_15F_Roulette in this research have also been analyzed.
In terms of computational training time, GABoost_15F_Ranking and GABoost_15F_Roulette, despite their larger search space (15 different feature types), largely outperformed ExBoost_5F, which has a much smaller search space (5 different feature types). GABoost_15F_Ranking and GABoost_15F_Roulette were about 3.76 and 2.65 times faster respectively than ExBoost_5F in building 15-stage cascades of boosted classifiers. This performance follows directly from the evolutionary search with GA proposed in this research, in which the huge search space of feature solutions is not exhaustively explored in order to select a single feature. The reduction in computational training time, together with the increased number of features selected in GABoost_15F_Ranking and GABoost_15F_Roulette, also directly reduced the time taken to select a single feature during training.
Meanwhile, in each iteration, GA selected a single best-found feature, or weak classifier, after a number of generations. This selected feature is not necessarily the best possible solution, because an evolutionary search does not exhaustively examine every possible feature solution; it can only be considered the best feature found by the evolutionary search of GA. Based on the performance of this selected feature, new training sample sets, especially the negative samples, were generated for the selection of the next feature or weak classifier in the following iteration. This is a characteristic of the Adaboost algorithm and of the training of a cascade of boosted classifiers, as explained in Chapter 3. As a consequence, the selections of weak classifiers in different stages could differ in terms of the negative training samples, since those samples were generated based on the performance of the previously selected weak classifiers. However, all the cascades of boosted classifiers were still trained with the same criteria and parameters, such as the number of stages, the desired hit rate and the false alarm rate.
GABoost_15F_Ranking and GABoost_15F_Roulette were able to produce cascades of boosted classifiers in a much shorter time than ExBoost_5F. In addition, the cascades produced by GABoost_15F_Ranking achieved better hit rates (90.13%) and false positive rates (61.56%), while the cascades produced by GABoost_15F_Roulette achieved slightly inferior hit rates (89.87%) and false positive rates (67.44%). These differences result from the different selection schemes used in the Genetic Algorithm in this research. The selection scheme is used to select a finite number of chromosomes from the main population to become parent chromosomes, which are placed in a mating pool and then used for crossover and mutation in GA. From the experiments, we can see that GABoost_15F_Ranking achieved better performance than GABoost_15F_Roulette. In terms of computational training time, the Ranking Scheme appears to be faster than the Roulette Wheel Scheme: in GABoost_15F_Roulette, additional computation is required to calculate the normalized fitness values and the accumulative fitness values, to generate the probability rates, and to compare these rates with the accumulative normalized fitness values before a single chromosome is selected. In GABoost_15F_Ranking, the focus is on the best-ranked chromosomes in the sorted population, as these become the parents for crossover and mutation, whereas in GABoost_15F_Roulette there is still a small possibility that a chromosome at the bottom of the sorted population is chosen as one of the parents. Overall, the results of the experiments show that GABoost_15F_Ranking performed better than GABoost_15F_Roulette in terms of computational training time, hit rates and false positive rates.
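The difference between the two schemes can be sketched as follows; this is a generic illustration of ranking and roulette-wheel selection, not the thesis code:

```python
import random

def ranking_selection(population, fitness, n_parents):
    """Ranking Scheme: sort the population by fitness and take the n best
    chromosomes straight into the mating pool -- no extra arithmetic."""
    return sorted(population, key=fitness, reverse=True)[:n_parents]

def roulette_selection(population, fitness, n_parents):
    """Roulette Wheel Scheme: normalized fitness values become selection
    probabilities, so even a bottom-ranked chromosome keeps a small chance
    of becoming a parent, at the cost of the extra computation described
    above. Assumes non-negative fitness values."""
    fits = [fitness(c) for c in population]
    total = sum(fits)
    probs = [f / total for f in fits]            # normalized fitness
    parents = []
    for _ in range(n_parents):
        r, acc = random.random(), 0.0            # spin the wheel once
        for chrom, p in zip(population, probs):  # accumulative fitness
            acc += p
            if r <= acc:
                parents.append(chrom)
                break
        else:                                    # guard against float rounding
            parents.append(population[-1])
    return parents
```

Ranking is deterministic and cheap; roulette trades extra computation for a small chance of picking weaker parents, matching the behaviour observed in the experiments.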
In the case of the seven new feature types, only feature types a, e and f showed significant roles in the cascades of boosted classifiers. Together, the seven feature types contribute about 20% of the total features selected in the training of the cascades, with similar results for GABoost_15F_Ranking and GABoost_15F_Roulette. It is clear that the first eight feature types are the most significant features, since they contribute about 80% of the total. This is due to the simpler shapes and forms of these eight feature types, especially the five basic ones; the shapes and forms of the seven new feature types are more complex. It might nevertheless be interesting to see how these seven newly proposed feature types perform with other kinds of training samples.
With these results from all of the experiments, the main purpose of this research, which is to use GA to select features and thereby reduce training time, has been met. In addition, the trained cascades of boosted classifiers in GABoost_15F_Ranking and GABoost_15F_Roulette have similar and sometimes better hit rates and false positive rates. The performance achieved by these cascades is well worth their very low computational training time.
4.8
Summary
This chapter presented the experiments conducted in this research. The experiments evaluated and analyzed two different techniques of feature selection in the Adaboost algorithm for building cascades of boosted classifiers for face detection: (1) feature selection with an exhaustive search over the entire space of possible feature solutions, and (2) feature selection with an evolutionary search using GA, together with an increased number of feature types. The latter is further divided into two cases according to the selection scheme used, the Ranking Scheme and the Roulette Wheel Scheme. As a result, the experiments fall into three categories: exhaustive search (ExBoost_5F), GA with the Ranking Scheme (GABoost_15F_Ranking) and GA with the Roulette Wheel Scheme (GABoost_15F_Roulette).
Results from the experiments showed that feature selection for Adaboost training using the Genetic Algorithm performed better than feature selection with exhaustive search. GA with both the Ranking Scheme and the Roulette Wheel Scheme was able to select features in a much shorter time when building cascades of boosted classifiers. The overall hit rates and false positive rates of the three categories of experiments are quite similar; however, in some experiments, GABoost_15F_Ranking and GABoost_15F_Roulette produced better cascades of boosted classifiers, with good combinations of high hit rates and low false positive rates. The contributions of the seven newly proposed feature types were also analyzed, based on the number of times they were selected during cascade training. These seven features represent about 20% of the total number of features, with the three new feature types a, e and f playing significant roles in the cascades of boosted classifiers.
With these results, the main purpose of this research, which is to use the Genetic Algorithm to select features and thereby reduce training time, has been met. In addition, the trained cascades of boosted classifiers in GABoost_15F_Ranking and GABoost_15F_Roulette have similar and sometimes better hit rates and false positive rates. Given their very low computational training time, the performance achieved by these cascades is acceptable.
CHAPTER 5
CONCLUSIONS AND FUTURE WORKS
5.1
Conclusions
As stated in the beginning of this thesis, the problem of human face detection
is challenging, due to the fact that faces are non-rigid objects that have a high degree
of variability in size, shape, color, and texture. There are several methods in the
literature that have been proposed for face detection under the effects of different
conditions such as head rotation, illumination, facial expression and occlusion. One of the most efficient techniques for detecting faces in a single image was proposed by Viola and Jones: Adaboost face detection using a cascade of boosted classifiers with Haar-based upright rectangle features [23], later improved with additional feature types by Lienhart [29]. Their methods require the system to learn the object to be detected, such as the human face, using simple Haar-based rectangle features. However, this learning process requires a large computational time. To reduce it, Treptow and Zell [14] combined an Evolutionary Algorithm with Adaboost in the training of a single-stage classifier.
This research extends the work of Treptow and Zell [14], who implemented an evolutionary search with the characteristics of GA inside the AdaBoost framework to build a single-stage classifier. In their paper [14], six feature types, including one new feature type introduced by them, and two different datasets, face and ball, are used in training. Their experimental results showed that GA was able to reduce the computational time of training this single-stage classifier. However, Treptow and Zell also mention in [14] that the use of a cascade of boosted classifiers consisting of N cascaded stage classifiers, as introduced by Viola and Jones [23], was not addressed in their paper. In this research, the training of the full cascade of boosted classifiers is addressed, in addition to the use of GA to select the features and the seven new feature types proposed.
Two different selection schemes, the Ranking Scheme and the Roulette Wheel Scheme, were used in the GA approach. The population size was set to 200, the crossover and mutation rates were set to 90% and 10% respectively, and the maximum number of generations was set to 200.
Seven new feature types were proposed in order to increase the quality of feature selection. As a consequence, the feature solution sets and the search space become larger, and GA is used to overcome the problem of selecting features in this huge search space. Computational training time was drastically reduced, by 60% to 70%. The cascades of boosted classifiers trained using GA had a better hit rate and a much lower false positive rate than the cascade built exhaustively; the lower false positive rate is due to the higher number and greater variety of feature types selected by GABoost. It was also observed that the best cascade built by GA outperformed the cascades built exhaustively without GA in terms of training time, hit rate and false positive rate. Furthermore, the seven new feature types showed different levels of importance in this training, although it is believed that their performance could differ if another kind of training sample set were used.

This research has shown that the computational training time of a cascade of boosted classifiers is drastically reduced by using GA to select its features, while the performance of the resulting cascade remains good.
5.2
Future Works
There are many directions for further work in this field. Other search algorithms, such as the Memetic Algorithm [61] [62], could be used in place of GA to select the features for cascade training. A Memetic Algorithm is essentially a GA incorporating a local search technique; in [62] it was shown to perform better as a search algorithm than a generic GA. The generic Memetic Algorithm used in [62] is shown in Figure 5.1.
Figure 5.1: A Generic Memetic Algorithm as used by Areibi et al. in [62].
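The idea of a generic Memetic Algorithm, GA operators followed by a local refinement of each individual, can be sketched as below. This follows the general structure only, not the exact procedure of Areibi et al. in [62]; the bit-flip hill climb and the function names are illustrative assumptions:

```python
import random

def local_search(chromosome, fitness, n_tries=10):
    """Hill-climbing refinement: flip single bits of the chromosome and keep
    any change that improves fitness. This local step is what distinguishes
    a Memetic Algorithm from a generic GA."""
    best = list(chromosome)
    for _ in range(n_tries):
        neighbour = list(best)
        i = random.randrange(len(neighbour))
        neighbour[i] ^= 1                       # flip one bit
        if fitness(neighbour) > fitness(best):
            best = neighbour
    return best

def memetic_generation(population, fitness):
    """One generation of a generic Memetic Algorithm: after the usual GA
    operators (selection, crossover, mutation) have produced `population`,
    every individual is additionally refined by local search."""
    return [local_search(c, fitness) for c in population]
```

Applied to feature selection, the local search would probe feature chromosomes with slightly perturbed positions or sizes around each GA candidate.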
In the GA itself, many parameters could be changed in order to optimize the performance of the feature selections. These include the size of the GA population, the size of the mating pool that holds the parent chromosomes selected by the GA selection schemes, the type of selection scheme used, the termination criteria such as the maximum number of generations, and the crossover and mutation probabilities. A fixed crossover rate and a fixed mutation rate are usually used when implementing GA, as was done in this thesis. However, dynamic crossover and mutation rates could be used instead, as suggested in [60], and could be implemented and analyzed in future research in order to better exploit the neighbouring solutions of the best features found. Such dynamic rates were used as a tuning tool in adaptive control by T. L. Seng, M. Khalid and R. Yusof [60]. An example of dynamic crossover and mutation rates is shown in Figure 5.2, with the crossover rate equal to e^(-N/M) and the mutation rate equal to e^(0.6*N/M) - 1, where M is the maximum number of generations and N is the current generation.
Figure 5.2: The dynamic rate of crossover and mutation for 200 generations.
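Taking the crossover rate as e^(-N/M) and the mutation rate as e^(0.6*N/M) - 1 (the functional forms as reconstructed from the description of [60] above, and worth verifying against that source), the schedule of Figure 5.2 can be computed as:

```python
import math

M = 200  # maximum number of generations, as used in this research

def crossover_rate(n, m=M):
    """Dynamic crossover rate e^(-N/M): starts at 1.0 and decays as the
    current generation N approaches the maximum M."""
    return math.exp(-n / m)

def mutation_rate(n, m=M):
    """Dynamic mutation rate e^(0.6*N/M) - 1: starts at 0.0 and grows as
    the search matures, pushing exploration of neighbouring solutions."""
    return math.exp(0.6 * n / m) - 1.0

crossover_rate(0), mutation_rate(0)      # (1.0, 0.0) at the first generation
crossover_rate(200), mutation_rate(200)  # roughly (0.37, 0.82) at the last one
```

Early generations thus favour crossover (exploration by recombination) while later generations favour mutation, matching the crossing curves in Figure 5.2.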
On the other hand, since detection speed was not addressed in this research, and given the number of features selected, the resulting face detector is likely to be slightly slower: GA selected more features, or weak classifiers, during the training of the cascade of boosted classifiers than the exhaustive search did in the same training. The cascade built by GA with the large set of feature types could therefore be improved using Evolutionary Pruning, an approach introduced in 2006 by J. S. Jang and J. H. Kim [57], whose purpose is to reduce the number of weak classifiers in each stage of the cascade while still satisfying the probability conditions, detection rate and false positive rate achieved by the cascade during its training. The Evolutionary Pruning method prunes the structure of the cascade of boosted classifiers; the technique as used in [57] is shown in Figure 5.3.
Figure 5.3: The procedure of Evolutionary Pruning used in [57] to reduce the
number of weak classifiers trained by Adaboost during the training of cascade of
boosted classifiers.
In a face detection system, the face detector uses the trained cascade of boosted classifiers to detect the presence of faces in an image. The computational time of the face detector is proportional to the number of weak classifiers in the cascade, so reducing the number of weak classifiers with the Evolutionary Pruning technique directly increases the detection speed.
Finally, the major contributions of this thesis are the implementation of the Genetic Algorithm inside the Adaboost framework to select features, and the seven new feature types proposed to increase the quality of the feature solution sets. The results of this research confirm that the Genetic Algorithm is very useful as an evolutionary search algorithm in a very large search space. The seven new feature types also influenced the selection of features by the Adaboost algorithm, showing different levels of significance in the trained cascades of boosted classifiers.
Papers related to this research have been submitted to several conferences locally and abroad. In early 2007, a paper from this research was reviewed, accepted for oral presentation and published in the proceedings of the 3rd International Colloquium on Signal Processing and Its Application 2007 (CSPA2007), held in Malacca in March 2007 (see Appendix A). More recently, this research was also reviewed and accepted for oral presentation and publication at the 3rd IASTED International Conference on Computational Intelligence 2007 (CI07), held in Banff, Alberta, Canada in July 2007 (see Appendix B). Furthermore, research papers related to this thesis have been accepted at the 1st Malaysian Japan International Symposium on Advanced Technology 2007 (MJISAT2007) and at the International Conference on Robotics, Vision, Information and Signal Processing 2007 (ROVISP2007), to be held in Kuala Lumpur and Penang respectively towards the end of 2007.
REFERENCES
1.
Takeo Kanade, Computer recognition of human faces. Journal of
Interdisciplinary System Research (ISR), Vol 47, Germany, 1977.
2.
Matthew A. Turk and Alex P. Pentland, Face Recognition using Eigenfaces,
Proceedings of IEEE on Computer Vision and Pattern Recognition (CVPR),
June 1991. Hawaii, USA. 586-591.
3.
Sung K. K. and Poggio T., Example-based learning for view-based human face
detection, IEEE Transactions on Pattern Analysis and Machine Intelligence,
1998, 20(1): 39-51
4.
Rowley H., Baluja S. and Kanade T., Neural network-based face detection,
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(1):
23-38
5.
Schneiderman H. and Kanade, T., A statistical method for 3d object detection
applied to faces and cars, Proceedings of the Computer Vision and Pattern
Recognition Conference, June 13-15, 2000. South Carolina, USA. 746-752.
6.
Thomas Heseltine, Nick Pears and Jim Austin, Evaluation of Image PreProcessing Techniques for Eigenbase based Face Recognition, Proceedings of
the Second International Conference on Image and Graphics, SPIE vol. 4875,
July 2002. San Jose, USA. 677-685
7.
Ilker Atalay, Face Recognition Using Eigenfaces, M.Sc Thesis, Istanbul
Technical University, January 1996
8.
Chengjun Liu and Harry Wechsler, Evolutionary Pursuit and Its Application to
Face Recognition, IEEE Transaction on Pattern Analysis and Machine
Intelligence, 22 (6), June 2000. 570-582.
9.
Zhao W., Chellappa R., Rosenfeld A. and Phillips P. J., Face recognition: A
Literature Survey, Journal of ACM Computing Survey, 35(4), 2003. 399-458.
10.
Ming-Hsuan Yang, David J. Kriegman and Narendra Ahuja, Detecting Faces in
Images: A Survey, IEEE Transaction on Pattern Analysis And Machine
Intelligence, 24(1), January 2002
11.
Peter N. Belhumeur, Joao P. Hespanha and David J. Kriegman, Eigenfaces vs.
Fisherfaces: Recognition Using Class Specific Linear Projection, IEEE
Transaction on Pattern Analysis and Machine Intelligence, 19(7), July 1997.
711-720
121
12.
Betul Karaomeruglu and Reza Hassanpour, A Comparative Study of Human
Face Identification in Presence of Illumination, Occlusion and Expression, IJCI
Proceedings of Intl. XII. Turkish Symposium on Artificial Intelligence and
Neural Networks,1(1), Turkey, July 2003
13.
Lindsay I Smith, A Tutorial on Principal Components Analysis, February 26,
2002.
14.
A. Treptow and A. Zell, Combining Adaboost Learning and Evolutionary
Search to select Features for Real-Time Object Detection, Proceedings Of the
Congress on Evolutionary Computational CEC 2004, Vol. 2, 2107-2113, San
Diego, USA, 2004.
15.
W. Konen, E. S. Krüger, “ZN-Face: A system for access control using
automated face recognition,” Proceedings of International Workshop on
Automatic Face- and Gesture-Recognition (IWAFGR), Zurich, Switzerland,
1995
16.
Bahadir K. Gunturk, Aziz U. Batur, Yucel Altunbasak, Monson H. Hayes and
Russell M. Mercereau, Eigenface-Domain Super-Resolution for Face
Recognition, IEEE Transactions on Image Processing, 12(5), May 2003
17.
Ji Chen, Xilin Chen and Wen Gao, Expand Training Set For Face Detection by
Genetic Algorithm Resampling, Proceeding of the Sixth IEEE International
Conference on Automatic Face and Gesture Recognition (FGR’04), 2004.
18.
Zehang Sun, George Bebis and Ronald Miller, Object Detection using Feature
Subset Selection, Journal of Pattern Recognition Society, Vol. 27, March 2004,
2165-2176.
19.
Edgar Osuna, Robert Freund and Federico Girosi, Training Support Vector
Machines: Application to Face Detection, Proceeding of Computer Vision and
Pattern Recognition, June 17-19, Puerto Rico, 1997.
20.
Jun Miao, Wen Gao and Jie Liu, Gravity Center Template Based Human Face
Feature Detection, The 3rd International Conference on Multimodal Interfaces
(ICMI), October 14-16, Beijing, China, 2000
21.
T.K. Leung, M.C. Burl and P. Perona, Finding Faces in Cluttered Scenes using
Random Labeled Graph Matching, in The 5th International Conference on
Computer Vision, Cambridge, USA, June 1995.
22.
Rien Lien Hsu, Mohamed Abdel Mottaleb and Anil K. Jain, Face Detection in
Colour Images, Proceedings of the IEEE International Conference on Image
Processing, Greece, October 2001, 1046-1049.
23.
Viola, P. and Jones, M., Rapid object detection using a boosted cascade of
simple features, IEEE Proceedings of the Computer Vision and Pattern
Recognition Conference (CVPR), December 11-13, Hawaii, USA, 2001.
24.
Fleuret, F. and Geman, D., Coarse-to-fine visual selection, International
Journal of Computer Vision, 41(2), 2001, 85-107.
25.
Freund Y. and Schapire R. E., A Short Introduction to Boosting, Journal of
Japanese Society for Artificial Intelligence, 14(5), September 1999. 771-780
26.
Tieu, K. and Viola, P, Boosting image retrieval, Proceedings of the Computer
Vision and Pattern Recognition Conference, 2000, 228-235.
27.
Viola, P. and Jones, M., Robust real time object detection, 2nd International
Workshop On Statistical And Computational Theories Of Vision – Modeling,
Learning, Computing and Sampling, July 13, Vancouver, Canada, 2001.
28.
Brunelli, R. and Poggio, T., Face recognition: Features versus templates, IEEE
Transaction on Pattern Analysis and Machine Intelligence, 15(10), 1993, 1042-1052.
29.
Lienhart R., Kuranov A. and Pisarevsky V., Empirical analysis of detection
cascades of boosted classifiers for rapid object detection. DAGM'03, 25th
Pattern Recognition Symposium, pages 297-304, Germany, 2003.
30.
Beymer D. J., Face recognition under varying poses, Proceedings of IEEE
Conference on Computer Vision and Pattern Recognition, 1994, 756-761.
31.
Cootes T., Walker K., and Taylor C., View-based active appearance models,
Proceedings of International Workshop on Automatic Face and Gesture
Recognition, 2000.
32.
Li Y., Gong S., and Liddell H., Support vector regression and classification
based multi-view face detection and recognition, Proceedings of International
Workshop on Automatic Face and Gesture Recognition, 2000.
33.
Y. Li, S. Gong, J. Sherrah, and H. Liddell, “Multi-view Face Detection Using
Support Vector Machines and Eigenspace Modelling”, Proc. International
Conference on Knowledge-based Intelligent Engineering System and Allied
Tech., 2000, 241-245.
34.
Georghiades A.S., Kriegman D.J., and Belhumeur P.N., Illumination cones for
recognition under variable lighting: Faces, Proceedings of IEEE International
Conference on Computer Vision and Pattern Recognition, 1998, 52-59.
35.
Yacoob Y., Lam H-M., and Davis L., Recognizing faces showing expressions,
Proceedings of International Workshop on Automatic Face and Gesture
Recognition, 1995, 278-283.
36.
Guizatdinova I. and Surakka V., Detection of Facial Landmarks from Neutral,
Happy, and Disgust Facial Images, International Conference in Central Europe
on Computer Graphics, 2005, 55-62.
37.
Hotta K., A Robust Face Detection under Partial Occlusion, Proceedings of
International Conference on Image Processing, 2004, 597-600.
38.
Kotropoulos C. and Pitas I., Rule-Based Face Detection in Frontal Views,
Proceedings of International Conference on Acoustics, Speech and Signal
Processing, vol. 4, 1997, 2537-2540.
39.
Sirohey S.A., Human Face Segmentation and Identification, Technical Report
CS-TR-3176, Center for Automation Research, University of Maryland, USA,
November 1993.
40.
Chetverikov D. and Lerch A., Multiresolution Face Detection, Theoretical
Foundations of Computer Vision, vol. 69, 1993, 131-140.
41.
Yow K.C. and Cipolla R., A Probabilistic Framework for Perceptual Grouping
of Features for Human Face Detection, Proceedings of the 2nd International
Conference on Automatic Face and Gesture Recognition, 1996, 16-21.
42.
Yang J. and Waibel A., A Real-Time Face Tracker, Proceedings of the 3rd
Workshop on Applications of Computer Vision, 1996, 142-147.
43.
Crowley J.L. and Bedrune J.M., Integration and Control of Reactive Visual
Processes, Proceedings of the 3rd European Conference on Computer Vision,
vol. 2, 1994, 47-58.
44.
McKenna S., Raja Y., and Gong S., Tracking Color Objects Using Adaptive
Mixture Models, International Conference on Image and Vision Computing
(ICVNZ99), 17(3), August 30-31, New Zealand, 1998. 223-229.
45.
Sobottka K. and Pittas I., Face Localization and Feature Extraction Based on
Shape and Color Information, International Conference on Image Processing
(ICIP), Switzerland, 1996.
46.
Yang M. H. and Ahuja N., Detecting Human Faces in Color Images,
Proceedings of IEEE International Conference on Image Processing, vol.
1,1998,127-130.
47.
Kwon Y. H. and N. da Vitoria Lobo. Face detection using templates,
International Conference on Pattern Recognition, pages 764–767, 1994.
48.
Lanitis A., Taylor C. J., and Cootes T. F. An automatic face identification
system using flexible appearance models, International Conference on Image
and Vision Computing (ICVNZ95), 13:393–401, 1995.
49.
Kirby M. and Sirovich L., Application of the Karhunen-Loeve Procedure for
the Characterization of Human Faces, IEEE Transaction on Pattern Analysis
and Machine Intelligence, 12(1):103-108, 1990.
50.
Moghaddam B. and Pentland A., Probabilistic Visual Learning for Object
Representation, IEEE Trans. on Pattern Analysis and Machine Intelligence,
19(7):696-710, 1997.
51.
Yang M. H., Ahuja N., and Kriegman D., Mixtures of Linear Subspaces for
Face Detection, Proceedings of the 4th International Conference on Automatic
Face and Gesture Recognition, 2000
52.
Rowley H., Baluja S., and Kanade T., Human Face Detection in Visual Scenes,
Advances in Neural Information Processing Systems 8 (NIPS 8), 1996, 875-881.
53.
Ai H., Ying L., and Xu G., A Subspace Approach to Face Detection with
Support Vector machines, Proceedings of IEEE International Conference on
Pattern Recognition, 2002.
54.
Popovici V. and Thiran J. P., Face Detection Using an SVM Trained in
Eigenfaces space, Proceedings of the 4th International Conference on Audio
and Video Based Biometric Person Authentication, 2003.
55.
Wang P. and Ji Q., Multi-View Face Detection under Complex Scene based on
Combined SVMs, Proceedings of IEEE International Conference on Pattern
Recognition, 2004.
56.
Freund Y. and Schapire R., A decision-theoretic generalization of on-line
learning and an application to boosting, Computational Learning Theory:
Eurocolt ’95, Springer-Verlag, 1995, 23–37.
57.
Jang J. S. and Kim J. H., Evolutionary Pruning for Fast and Robust Face
Detection, IEEE Congress on Evolutionary Computation CEC 2006, pages
1293-1299, Vancouver, Canada, July 2006.
58.
Roth D., Yang M. and Ahuja N., A Snowbased Face Detector, Advances in
Neural Information Processing Systems 12 (NIPS 12), volume 12, 2000.
59.
Goldberg, David E. Genetic Algorithms in Search, Optimization and Machine
Learning, US: Addison-Wesley Publication Co. 1989. ISBN: 0201157675
60.
Seng T. L., Khalid M. and Yusof R., Tuning of A Neuro-Fuzzy Controller by
Genetic Algorithm With An Application to A Coupled-Tank Liquid-Level
Control System, International Journal of Engineering Applications on
Artificial Intelligence, Vol. 11, pages 517-529, 1998.
61.
Moscato P., On evolution, search, optimization, genetic algorithms and martial
arts: Towards memetic algorithms, Technical Report 826, California Institute
of Technology, USA 1989.
62.
Areibi, S., Moussa, M., and Abdullah, H., A Comparison of Genetic/Memetic
Algorithms and Other Heuristic Search Techniques, International Conference
on Artificial Intelligence, pages 660-666, Las Vegas, Nevada, 2001
63.
BioID Face Database: http://www.bioid.com/downloads/facedb/index.php
64.
ACTS M2VTS Database:
http://www.tele.ucl.ac.be/projects/M2VTS/index.html
65.
Viisage Technology Incorporation, 2004. Technical Specification FacePASS
version 4.1 Product Description
Feature Selections Using Genetic Algorithm for
Face Detection
Zalhan Mohd Zin¹, Marzuki Khalid² and Rubiyah Yusof²
¹ Section of Industrial Automation, Universiti Kuala Lumpur-MFI
² Center for Artificial Intelligence and Robotics, Universiti Teknologi Malaysia
[email protected], [email protected]
Abstract—A variety of face detection techniques has been proposed over the past decade. Generally, a large number of features must be selected for training purposes. Often some of these features are irrelevant and do not contribute directly to the face detection algorithm, creating unnecessary computation and large memory usage. In this paper we propose to use a Genetic Algorithm (GA) inside the Adaboost framework to select features which provide a better cascade of classifiers for face detection with less training time. Eight different feature types are used in the GA search, compared to only five basic feature types in the exhaustive search. The three additional feature types enrich the quality of the feature solutions, but training on them requires more computational time because of the larger search space. To reduce the training time, we use GA to select features during the training of the cascade of classifiers. This technique is implemented using the Intel OpenCV software. Experiments on a set of images from the BioID face database show that by using GA to search over the larger number of feature types and sets, we obtain a cascade of classifiers for a face detection system with less training time and higher detection rates.
Keywords—Genetic Algorithm, cascade of classifiers, Adaboost,
rectangle features.
I. INTRODUCTION
To detect human faces in images in real time is a challenging problem. Viola and Jones [1] were the first to develop a real-time frontal face detector, introducing a boosted cascade of simple features that achieves detection and false positive rates comparable to state-of-the-art systems [4][5][6][7]. Many researchers have proposed enhancements to the idea of boosting simple weak classifiers. Li et al. [8] describe a variant of Adaboost called Floatboost for learning better classifiers. Lienhart et al. [2] showed that extending the basic feature types and sets produces detectors with lower error rates. However, extending the feature types automatically leads to a much higher number of feature sets and expands the search space, thus increasing the training time. Recapitulating the research based on the publication of Viola and Jones [1], there are mainly two problems to deal with: (1) extending the feature sets while remaining able to search over these very large sets in reasonable time; (2) the best feature solution sets are not known in advance. To overcome these problems, we use GA in combination with Adaboost to search over a large number of possible features. Our goal is to find a better cascade of classifiers, using larger feature sets in less time, that achieves comparable or even better classification results than a cascade of classifiers trained exhaustively over a small feature set. The use of Evolutionary Algorithms (EA) in image processing, especially the automatic learning of features for object detection, has received growing interest. Treptow and Zell [3] showed that an Evolutionary Algorithm can be used within the Adaboost framework, in a single-stage classifier, to find features which provide better classifiers for detecting objects such as faces and balls. Recently, in 2006, J. S. Jang and J. H. Kim [9] introduced Evolutionary Pruning for cascaded classifier structures, whose purpose is to reduce the number of weak classifiers found by Adaboost at each stage of cascade training. However, that approach focused on increasing the detection speed by reducing the number of weak classifiers in an already built cascade, while ignoring the cascade training time.
Our approach focuses instead on reducing the computational training time needed to build a 15-stage cascade of boosted classifiers while achieving similar or better cascade performance. To achieve that goal, three additional feature types are added to improve the quality of the possible feature solutions. These additional features increase the size of the search space and hence the cascade training time. GA is then used inside the Adaboost framework to select good feature sets, which represent sets of strong classifiers, for each stage of cascade training.
The paper is organized as follows. In the next Section, the Adaboost learning procedure is introduced. Section III describes the cascade of boosted classifiers. In Section IV, the application of GA within the Adaboost framework to build a cascade of classifiers using a larger number of feature types
International Colloquium on Signal Processing and its Applications, March 9-11, 2007, Melaka, Malaysia.
© Faculty of Electrical Engineering, UiTM Shah Alam, Malaysia.
ISBN: 978-983-42747-7-7
is introduced. Section V presents the performance of the cascade trained using GA compared with the cascade trained exhaustively, in terms of hit rate, miss rate, false positive rate and training time. The experiments were done using the open-source Intel OpenCV software, as in [2]. Section VI summarizes and concludes our work and points out perspectives for future research.
II. ADABOOST LEARNING OF OBJECT DETECTORS
Viola and Jones [1] developed a reliable method to detect objects such as faces in images in real time. An object to be detected is described by a combination of a set of simple Haar-wavelet-like features, shown in Fig. 1. The sums of the pixels in the white boxes are subtracted from the sum of the pixels in the black areas. The advantage of using these simple features is that they can be calculated very quickly using the "integral image". An integral image II over an image I is defined as follows:

II(x, y) = Σ_{x' ≤ x, y' ≤ y} I(x', y')        (1)

In Eq. (2) below, ϑ_j is a threshold and p_j a parity indicating the direction of the inequality. The Adaboost algorithm for selecting a predefined number of good features, given a training set of positive and negative example images, is shown in Fig. 2. The Adaboost algorithm iterates over T rounds. In each iteration, the space of all possible features is searched exhaustively to train weak classifiers that each consist of one single feature. During this training, the threshold ϑ_j must be determined so that the feature value discriminates between positive and negative examples. Therefore, for each possible feature and the given training set, the weak learner determines two optimal values (thresholds), such that no training sample is misclassified. For each weak classifier h_j(x), the error value ε_j is calculated from the misclassification rate over all positive and negative training images. This gives each trained feature h_j(x) a respective error value ε_j between 0 and 1.
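As a rough illustration of the selection loop just described, the sketch below trains T single-feature weak classifiers (decision stumps) with the Adaboost reweighting rule. This is our own minimal reconstruction, not the thesis code: the brute-force stump search, the small error floor and all names are our assumptions.

```python
import math

def adaboost_select(feature_values, labels, T):
    """Select T single-feature weak classifiers (decision stumps).

    feature_values[j][i] is the response f_j of feature j on sample i;
    labels[i] is 0 (negative) or 1 (positive).
    Returns a list of (feature_index, threshold, parity, alpha)."""
    n_samp = len(labels)
    w = [1.0 / n_samp] * n_samp                # uniform initial weights
    strong = []
    for _ in range(T):
        total = sum(w)
        w = [wi / total for wi in w]           # normalize the weights
        best = None
        for j, fv in enumerate(feature_values):
            for parity in (1, -1):
                for theta in set(fv):          # candidate thresholds
                    pred = [1 if parity * v < parity * theta else 0 for v in fv]
                    err = sum(wi for wi, p, l in zip(w, pred, labels) if p != l)
                    if best is None or err < best[0]:
                        best = (err, j, theta, parity, pred)
        err, j, theta, parity, pred = best
        err = max(err, 1e-10)                  # floor: avoid division by zero
        beta = err / (1.0 - err)
        alpha = math.log(1.0 / beta)           # weight of this weak classifier
        # down-weight correctly classified examples for the next round
        w = [wi * (beta if p == l else 1.0) for wi, p, l in zip(w, pred, labels)]
        strong.append((j, theta, parity, alpha))
    return strong
```

The final strong classifier is then the alpha-weighted vote of the returned stumps, as in the paper.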
In [1] it is also shown that every rectangular sum within an image can be computed from the integral image with four array references. To detect an object, a classifier has to be trained from a number of available discriminating features within a specific sub-window. The possible positions and scales of the five feature types shown in Fig. 1 produce about 90,000 alternative features within a sub-window of 24x24 pixels, a number that largely exceeds the number of pixels itself.
Fig. 1: The five basic types of rectangle features within their 24x24-pixel sub-window. These five feature types are the initial features used to train the cascade of classifiers exhaustively.
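The integral image of Eq. (1), and the four-array-reference rectangle sum built on it, can be sketched as follows. This is a minimal illustration in our own notation: the row/column indexing, the leading zero row and column, and the helper names are our assumptions, and the two-rectangle feature shown is one hypothetical example.

```python
def integral_image(img):
    """Build the integral image with a leading zero row and column, so that
    ii[r][c] holds the sum of img[r'][c'] for r' < r and c' < c (Eq. 1)."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ii[r + 1][c + 1] = (img[r][c] + ii[r][c + 1]
                                + ii[r + 1][c] - ii[r][c])
    return ii

def rect_sum(ii, r, c, h, w):
    """Sum of the h x w rectangle with top-left corner (r, c),
    computed with four references into the integral image."""
    return ii[r + h][c + w] - ii[r][c + w] - ii[r + h][c] + ii[r][c]

def two_rect_feature(ii, r, c, h, w):
    """Example horizontal two-rectangle feature: the sum over the white
    left half is subtracted from the sum over the black right half."""
    white = rect_sum(ii, r, c, h, w // 2)
    black = rect_sum(ii, r, c + w // 2, h, w - w // 2)
    return black - white
```

Each feature evaluation thus costs a constant number of array lookups, regardless of the rectangle size, which is what makes the detector fast.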
Therefore, a small set of features which best describe the object to be detected has to be selected. Adaboost [10] is a technique for selecting good classification functions such that a final "strong classifier" is formed, which is in fact a linear combination of all the weak classifiers. In the general context of learning features, each weak classifier h_j(x) consists of one single feature f_j:

h_j(x) = 1 if p_j f_j(x) < p_j ϑ_j, and 0 otherwise        (2)

Fig. 2: The Adaboost learning algorithm as proposed in [1]. The search for a good feature h_t is done exhaustively, as stated in step 3b.

The best feature h_t(x) found, with the lowest error rate ε_t, is selected as the weak classifier of the current iteration (one of the T rounds). After the best weak classifier is selected, all training examples are reweighted and normalized so that the next round concentrates particularly on those examples that were not correctly classified. At the end, the resulting strong classifier is a weighted linear combination of all T weak classifiers.

III. CASCADE OF BOOSTED CLASSIFIERS
This section describes an algorithm for constructing a
cascade of classifiers which drastically reduces the
computational time. The main idea is to build a set of
boosted classifiers which are smaller, but more efficient,
that reject most of the negative sub-windows while detecting almost all positive instances. The input to the cascade is the collection of all sub-windows, also called scanning windows. They are first passed through the first layer, or stage, in which all sub-windows are classified as faces or non-faces. The negative results are discarded, while the remaining positive sub-windows trigger the evaluation of the next stage classifier. The sub-windows that reach and pass the last layer are classified as faces (see Fig. 3). Every stage actually consists of only a small number of features. In the early stages, a small number of selected features is enough to determine that a sub-window is a non-face; determining the presence of a face, on the other hand, usually needs more features. Therefore, the trained cascade of classifiers usually has an increasing number of features in each consecutive stage, becoming increasingly more complex towards the last layer.
During the training of the cascade of classifiers, the number of features per stage was determined through a trial-and-error process, in which the number of features was increased until a significant reduction in the false positive rate was achieved. In our case, each stage was trained to eliminate 50% of the non-face patterns while falsely eliminating only 0.5% of the frontal face patterns; 15 stages were trained, so a false alarm rate of about 0.5^15 ≈ 3x10^-5 and a hit rate of about 0.995^15 ≈ 0.93 were expected. More features were added until the false positive rate of each stage reached the desired rate while still maintaining a high detection rate. At each training stage, the false positive images from the previous stages are added to the set of negative (non-face) images, and this set is used as the negative images in the training of the next stage [9].
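The stage-by-stage rejection logic, together with the expected-rate arithmetic above, can be sketched as follows. The representation of a stage as a (weak classifiers, threshold) pair is our assumption for illustration, not the OpenCV data structure.

```python
def cascade_classify(window, stages):
    """Pass one scanning sub-window through the cascade: each stage is a
    boosted classifier, given here as (weak_classifiers, stage_threshold)
    with weak_classifiers a list of (weak, alpha) pairs and weak mapping a
    window to 0/1. The window is rejected at the first stage whose weighted
    vote falls below the threshold; later stages are then never evaluated."""
    for weak_classifiers, stage_threshold in stages:
        score = sum(alpha * weak(window) for weak, alpha in weak_classifiers)
        if score < stage_threshold:
            return 0      # rejected: non-face
    return 1              # passed every stage: face

# Expected overall rates for 15 stages, each trained to a 50% per-stage
# false-alarm rate and a 99.5% per-stage hit rate:
overall_false_alarm = 0.5 ** 15     # about 3e-5
overall_hit_rate = 0.995 ** 15      # about 0.93
```

Because most sub-windows are rejected by the cheap early stages, the average cost per window stays far below the cost of running the full strong classifier.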
Fig. 3: Cascade from simple to complex classifiers with N stages.

IV. GENETIC ALGORITHM FOR FEATURE SELECTIONS

Genetic Algorithms (GAs) [11] are a family of computational models inspired by natural evolution. GAs comprise a subset of evolution-based optimization techniques focusing on the application of selection, mutation, and recombination (crossover) to a population of competing problem solutions. In our paper, a GA chromosome represents the specific type and location of one single feature. Each chromosome has a length of 6 genes. The first five genes are integers:

• type : type of feature
• x : x-coordinate in the sub-window
• y : y-coordinate in the sub-window
• dx : width of the feature
• dy : height of the feature

The sixth gene is a decimal number which stores the fitness value of the respective feature. As described previously for Adaboost, every trained feature (weak classifier) produces an error value ε_j, and the feature with the lowest ε_j is selected. The fitness function chosen here is therefore 1 - ε_i, where i is an index between 1 and the population size N; with this fitness function, a higher fitness value indicates a lower error value ε_j. For good feature selection, we propose to enlarge the number of feature types: the three feature types in Fig. 4 are added to the total feature set. These three features are part of the upright rectangle features which already exist in Intel OpenCV, along with the 5 basic feature types shown in Fig. 1; these 8 feature types are known as Core Features in OpenCV.

With all these, the total number of feature types equals 8. As a result, while more valuable and better types of features may be created as potential sets of good feature solutions, the search space over all these feature types also increases. Therefore, GA is used to select the features, avoiding the exhaustive search and its high computational time.

Fig. 4: The three additional feature types a, b and c within their 24x24-pixel sub-window. These feature sets are added in the training of the cascade of classifiers with GA.
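A minimal sketch of this 6-gene chromosome and its fitness assignment might look as follows. The names (`Chromosome`, `is_valid`, `weak_error`) are ours, and the Adaboost training of the weak classifier itself is abstracted behind a callable.

```python
import random
from dataclasses import dataclass

WINDOW = 24    # sub-window size in pixels
N_TYPES = 8    # 5 basic + 3 additional feature types

@dataclass
class Chromosome:
    ftype: int            # gene 1: feature type, 1..8
    x: int                # gene 2: x-coordinate in the sub-window
    y: int                # gene 3: y-coordinate in the sub-window
    dx: int               # gene 4: feature width
    dy: int               # gene 5: feature height
    fitness: float = 0.0  # gene 6: 1 - eps of the trained weak classifier

def random_chromosome():
    """Random initialization for the first GA generation."""
    x, y = random.randrange(WINDOW), random.randrange(WINDOW)
    return Chromosome(ftype=random.randint(1, N_TYPES), x=x, y=y,
                      dx=random.randint(1, WINDOW - x),
                      dy=random.randint(1, WINDOW - y))

def is_valid(c):
    """A feature is infeasible if it extends beyond the 24x24 sub-window."""
    return c.x >= 0 and c.y >= 0 and c.x + c.dx <= WINDOW and c.y + c.dy <= WINDOW

def evaluate(c, weak_error):
    """Assign the sixth gene: 1 - eps for valid features, 0 for invalid ones.
    weak_error is a callable returning the Adaboost error eps of the feature."""
    c.fitness = 1.0 - weak_error(c) if is_valid(c) else 0.0
    return c.fitness
```

Assigning zero fitness to infeasible chromosomes, as described later in this section, removes them from the ranking without a separate repair step.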
Initially, in the first generation of the GA, all chromosomes are randomly generated and evaluated to determine their fitness values. A ranking selection scheme is used: all chromosomes are ranked by their fitness values, so the first chromosome has the highest fitness value and the last one the lowest. In each generation, the chromosomes from half of the total population are selected, put into a mating pool, and become parents. Two-point crossover is applied here. Based on the crossover probability rate p_co, where p_co ∈ [0, 1], two different chromosomes are chosen from the mating pool. At the same time, two random gene positions m and n are chosen, with m, n ∈ {1, ..., 5} and m ≠ n. The genes between positions m and n are then exchanged between these two parents
to produce new children. In the mutation process, a mutation probability rate p_mt, where p_mt ∈ [0, 1], is used. Based on this rate, mutation takes place on a chromosome chosen from the mating pool. Once mutation occurs, another random number between 1 and 5 is selected; this number gives the position of the gene to be mutated in the chromosome. If the first gene (the feature type) is chosen, a new, different feature type is selected randomly between 1 and 8. For the other genes x, y, dx and dy, an integer value randomly chosen between -2 and 2 is added to the current value.
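The crossover and mutation operators just described can be sketched as follows, a minimal reconstruction under our own naming, with gene lists ordered [type, x, y, dx, dy] as in this section.

```python
import random

def two_point_crossover(parent1, parent2, p_co=0.9):
    """With probability p_co, exchange the genes between two random
    positions m != n of the five integer genes [type, x, y, dx, dy]."""
    child1, child2 = list(parent1), list(parent2)
    if random.random() < p_co:
        m, n = sorted(random.sample(range(5), 2))
        child1[m:n + 1], child2[m:n + 1] = child2[m:n + 1], child1[m:n + 1]
    return child1, child2

def mutate(genes, p_mt=0.1):
    """With probability p_mt, mutate one randomly chosen gene: a new,
    different feature type (1..8) if gene 1 is picked, otherwise add a
    random integer offset in [-2, 2] to x, y, dx or dy."""
    genes = list(genes)
    if random.random() < p_mt:
        pos = random.randrange(5)
        if pos == 0:
            genes[0] = random.choice([t for t in range(1, 9) if t != genes[0]])
        else:
            genes[pos] += random.randint(-2, 2)
    return genes
```

Children produced this way may of course become infeasible (e.g. x + dx exceeding the sub-window), which is exactly the case handled by the zero-fitness rule described next.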
All parent chromosomes that have undergone the crossover and mutation processes produce new child chromosomes, which are re-evaluated to determine their fitness values 1 - ε_i. New chromosomes with high fitness values have a better chance of being selected and inserted at their appropriate rank in the main GA population. New chromosomes that become invalid or infeasible, for example when the sum of the x-coordinate and the feature width exceeds the allowed sub-window width (here 24x24), are assigned a fitness value of zero; in other words, such features are not good features and are discarded. Chromosomes with modest or average fitness values are selected neither into the main population nor into the mating pool for the next generation. The GA population size of 300 was chosen
through a trial-and-error process trading off speed against error rate. The population should be large enough to create sufficient diversity to cover the possible solution space. The other GA parameters are the probabilities of the crossover and mutation operators. Crossover explores the whole search space while mutation exploits it. Treptow and Zell [3] preferred a 20% crossover rate and an 80% mutation rate; since they used an Evolutionary Algorithm to select about 200 features in a single-stage classifier, that ratio seems reasonable.
In our case, we propose a 90% crossover rate and a 10% mutation rate. Training a cascade of classifiers differs slightly from training a single-stage classifier, as described in Section III, even though both use the Adaboost algorithm. At each stage of cascade training, the GA selects features from a slightly different search space, because the image samples incorrectly classified (false positives) in the previous stages are added to the training set. To avoid early local optima and the loss of good feature solutions, exploration of the feature search space should have higher priority. Thus, for training a cascade of 15 stages as in our case, the 9:1 crossover-to-mutation ratio seems logical and reasonable.
V. EXPERIMENTS OF EVOLUTIONARY SEARCH WITH GENETIC ALGORITHM

In this Section, we compare the performance of the 15-stage cascades of classifiers built by Adaboost with exhaustive search, which we refer to as 5FBasicsEx, and by Adaboost with GA feature selection, which we refer to as 8FCoreGA. 5FBasicsEx searches over only the five basic feature types, while 8FCoreGA searches over the larger set of 8 feature types. The training set consists of 7000 positive images and 3000 negative images gathered from various sources (see Fig. 5). These gray-value images are 24x24 pixels in size and all differ from one another.

Fig. 5: Examples of face and non-face images in the training set

The test set consists of face images from the BioID database [12]. This dataset contains 1200 face images varying in person, gender, facial expression, face size and location, level of illumination, and the presence of non-face objects. Examples are shown in Fig. 6.

Fig. 6: Examples of images containing faces under various conditions, used in the BioID test set [12]

Training was stopped after the 15-stage cascade was built. The population size is 300, all chromosomes are initialized randomly, and the crossover and mutation rates are set to 90% and 10% respectively. The GA is considered converged, and stops, if no better single feature is found within 50 consecutive generations; if no convergence occurs, the GA also stops on reaching the maximum number of generations, set to 200. Experiments were carried out on an Intel Pentium IV 3.0 GHz processor. 8FCoreGA was run five times and the average results were taken. In Table 1, we can see that 8FCoreGA performs about 3.3 times faster than 5FBasicsEx.

Algorithm      Average training time (sec)   Average time to select a single feature (sec)   Average number of features in cascade
5FBasicsEx     134225                        336.4                                           400
8FCoreGA       40254                         60.1                                            670

TABLE 1: Training time taken to build the 15-stage cascades of classifiers and the number of features selected.
Table 1 also shows the total number of features found by both algorithms. We can see that 8FCoreGA selects more
features than 5FBasicsEx. However, with less training time, selecting a single feature requires only 60.1 seconds in 8FCoreGA, while 5FBasicsEx needs 336.4 seconds per feature. This shows that feature selection is about 5.6 times faster in 8FCoreGA than in 5FBasicsEx.
Algorithm          Average hit rate (%)   False positives / total detections (%)
5FBasicsEx         90.01                  62.97
8FCoreGA (5Exp)    93.46                  68.37
8FCoreGA (best)    94.09                  66.33
8FCoreGA (worst)   92.17                  69.84

TABLE 2: Comparison of hit rates and false positive rates of the cascades of classifiers built by 5FBasicsEx and 8FCoreGA.
Table 2 shows that the average hit rate of 8FCoreGA (93.46%) is about 3.5% higher than that of 5FBasicsEx (90.01%); the best cascade trained with 8FCoreGA achieved a 94.09% hit rate and the worst 92.17%. However, 8FCoreGA performs worse in terms of false positive rate: it achieved an average of 68.37%, against 62.97% for 5FBasicsEx. This rate is calculated as the sum of all false positive rectangles detected, divided by the total number of detection rectangles over the whole test set. Examples are shown in Fig. 7.
Fig. 7: Examples of detections on test images. The top three images contain false positive detections, while in the bottom three images the faces are detected perfectly.
In Table 3, the results for the three additional feature types are examined: for each feature type a, b and c in Fig. 4, the average number of times it was selected during cascade training with 8FCoreGA is shown.

Feature type                      a       b       c       Total
Average number of selections      41.6    63.4    8.2     113.2
% of total features in cascade    -       -       -       16.94

TABLE 3: The average number of times the three additional feature types were selected by 8FCoreGA.

These three feature types contribute about 17% of the features selected by 8FCoreGA across the various stages. Among them, feature types a and b play quite important roles in the cascade training, as they are selected many times, while feature type c is rarely selected and is therefore not considered a good feature type.

VI. CONCLUSIONS AND FUTURE WORK

In this paper, we have extended the work of Treptow and Zell [3] by implementing GA inside the AdaBoost framework to select features while building a 15-stage cascade of classifiers, rather than a single-stage classifier only. The approach of Viola and Jones [1] is also followed in implementing the cascade training. Three additional feature types were added in order to increase the quality of the feature solutions. As a consequence, the set of feature solutions becomes larger, and we have shown how GA can be used to overcome the problem of feature selection in this huge search space. The training time of the cascade of classifiers using GA was drastically reduced. The trained cascades have a higher hit rate, but perform worse in terms of false positive detections, than the cascade trained exhaustively with the 5 basic feature types. Furthermore, the three additional feature types have shown different levels of importance in this training. Nevertheless, there are many directions for further research in this field. Other search algorithms may be used in place of GA to select features in cascade training. Within GA itself, dynamic crossover and mutation rates could also be implemented and analyzed; such a dynamic rate was used as a tuning tool in one part of the research by T. L. Seng, M. Khalid and R. Yusof [11].

On the other hand, since detection speed was not addressed in this paper, and judging by the number of features selected, the resulting face detector is likely to be slower. The cascaded classifiers built using GA with a very large set of feature types could therefore be improved with Evolutionary Pruning, an approach introduced recently, in 2006, by J. S. Jang and J. H. Kim [9], whose purpose is to reduce the number of weak classifiers while maintaining the performance achieved by the cascade of classifiers. Our GA-trained cascade could then be improved using this approach.

REFERENCES
[1] P. Viola, M. Jones. Robust Real-time Object
Detection. Second International Workshop on Statistical
and Computational Theories of Vision-Modeling,
Learning, Computing, and Sampling, 2001.
[2] R. Lienhart, A. Kuranov and V. Pisarevsky. Empirical
analysis of detection cascades of boosted classifiers for
rapid object detection. In DAGM'03, 25th Pattern
Recognition Symposium, pages 297-304, 2003.
International Colloquium on Signal Processing and its Applications, March 9-11, 2007, Melaka, Malaysia.
© Faculty of Electrical Engineering, UiTM Shah Alam, Malaysia.
ISBN: 978-983-42747-7-7
[3] A. Treptow and A. Zell. Combining Adaboost Learning and Evolutionary
Search to select Features for Real-Time Object Detection, Proceedings of
the Congress on Evolutionary Computation CEC 2004, Vol. 2, pages
2107-2113, San Diego, USA, 2004.
[4] H. Rowley, S. Baluja and T. Kanade. Neural Network-based Face
Detector, IEEE Transactions on Pattern Analysis and Machine
Intelligence, 20(1), pages 23-28, 2000.
[5] K. Sung and T. Poggio. Example-based Learning for View-based Face
Detection, IEEE Transactions on Pattern Analysis and Machine
Intelligence, 20, pages 39-51, 1998.
[6] H. Schneiderman and T. Kanade. A Statistical Method for Object
Detection Applied to Faces and Cars, International Conference on
Computer Vision and Pattern Recognition, pages 1746-1759, 2000.
[7] D. Roth, M. Yang and N. Ahuja. A SNoW-based Face Detector, Advances
in Neural Information Processing Systems 12 (NIPS 12), volume 12, 2000.
[8] S. Z. Li, Z. Q. Zhang, H. Shum and H. J. Zhang. Floatboost Learning
for Classification, 16th Annual Conference on Neural Information
Processing Systems (NIPS), 2002.
[9] J. S. Jang and J. H. Kim. Evolutionary Pruning for Fast and Robust
Face Detection, IEEE Congress on Evolutionary Computation CEC 2006,
pages 1293-1299, Vancouver, Canada, July 2006.
[10] Y. Freund and R. E. Schapire. A Short Introduction to Boosting,
Journal of Japanese Society for Artificial Intelligence, Vol. 14(5),
pages 771-780, September 1999.
[11] T. L. Seng, M. Khalid and R. Yusof. Tuning of a Neuro-Fuzzy
Controller by Genetic Algorithm with an Application to a Coupled-Tank
Liquid-Level Control System, International Journal of Engineering
Applications of Artificial Intelligence, Vol. 11, pages 517-529, 1998.
[12] BioID Face Database:
http://www.bioid.com/downloads/facedb/index.php
ENHANCED FEATURE SELECTIONS OF ADABOOST TRAINING FOR
FACE DETECTION USING GENETIC ALGORITHM (GABOOST)
Zalhan Mohd Zin(1), Marzuki Khalid(2) and Rubiyah Yusof(2)
(1) Section of Industrial Automation – Universiti Kuala Lumpur-MFI
(2) Center for Artificial Intelligence and Robotics (CAIRO) – Universiti Teknologi Malaysia (UTM)
[email protected], [email protected]
ABSTRACT
Various face detection techniques have been proposed over the past
decade. Generally, a large number of features are required to be
selected for training a face detection system. Often some of these
features are irrelevant and do not contribute directly to the face
detection algorithm. This creates unnecessary computation and usage of
large memory space. In this paper we propose to enlarge the feature
search space by enriching it with more types of features. With an
additional seven new feature types, we show how a Genetic Algorithm
(GA) can be used, within the Adaboost framework, to find sets of
features which can provide better classifiers with a shorter training
time. The technique, referred to as GABoost, is used for our face
detection system. The GA carries out an evolutionary search over the
possible feature search space, which results in a higher number of
feature types and sets selected in less time. Experiments on a set of
images from the BioID database showed that by using GA to search over a
large number of feature types and sets, GABoost is able to obtain
cascades of boosted classifiers for a face detection system that give
higher detection rates, lower false positive rates and less training
time.
KEY WORDS
Genetic Algorithm, cascade of classifiers, Adaboost,
rectangle features.
1. Introduction
Detecting human faces in images in real-time is a challenging problem.
Viola and Jones [1] were the first to develop a real-time frontal face
detector, by introducing a boosted cascade of simple features that
achieves detection and false positive rates comparable to
state-of-the-art systems [4][5][6][7]. Many researchers have proposed
enhancements to the idea of boosting simple weak classifiers. Li et al.
[8] describe a variant of Adaboost called Floatboost for learning
better classifiers. Lienhart et al. [2] showed that by extending the
basic feature types and sets, detectors with lower error rates are
produced. However, extending the feature types automatically leads to a
much higher number of feature sets and expands the search space, thus
increasing the training time. Recapitulating the research that is based
on the publication of Viola and Jones [1], we can see that there are
mainly two problems to deal with: (1) extending the feature sets and
being able to search over these very large sets in reasonable time;
(2) the best feature solution sets are not known in advance. To
overcome these problems, we use GA in combination with Adaboost to
search over a large number of possible features. Our goal is to find a
better cascade of classifiers in less time by using a very large
feature set, one that achieves comparable or even better classification
results than a cascade of classifiers trained exhaustively over a small
feature set. The use of Evolutionary Algorithms in the field of image
processing, especially for automatic learning of features for object
detection, has received growing interest. Treptow and Zell [3] showed
that an Evolutionary Algorithm can be used within the Adaboost
framework on single-stage classifiers to find features which provide
better classifiers for object detection tasks such as faces and balls.
In 2006, J. S. Jang and J. H. Kim [9] introduced Evolutionary Pruning
in a cascaded structure of classifiers, whose purpose is to reduce the
number of weak classifiers found by Adaboost for each stage of cascade
training. However, that approach focused on increasing face detection
speed by reducing the number of weak classifiers in an already built
cascade, while ignoring the cascade training time.
Our approach focuses instead on reducing the computational training
time needed to build a 15-stage cascade of boosted classifiers while
obtaining similar or better cascade performance. In order to achieve
that goal, we enrich the possible feature solutions by adding seven new
types of features. These additional features increase the size of the
search space and thus the cascade training time. GA is then implemented
in the Adaboost framework to select good feature sets, which represent
sets of strong classifiers for each stage of cascade training. The
cascade built consists of the combination of eight basic features and
seven new feature types.
The paper is organized as follows. In the next Section, the Adaboost
learning procedure is introduced. Section III describes the cascade of
boosted classifiers. In Section IV, the application of GA within the
Adaboost framework to build a cascade of classifiers using a large
number of feature types is introduced, including the characteristics of
the seven new feature types. Section V shows the performance of the
cascade trained using GA compared with the cascade trained exhaustively,
in terms of hit rates, miss rates, false positive rates and training
time. The experiments were done using the open source software Intel
OpenCV, as in [2]. Section VI summarizes and concludes our work and
points out perspectives for further research.
Proceedings of the Third IASTED International Conference on Computational Intelligence
July 2-4, 2007, Banff, Alberta, Canada
ISBN Hardcopy: 978-0-88986-671-3 / CD: 978-0-88986-672-0
2. Adaboost Learning of Object Detectors
Viola and Jones [1] developed a reliable method to detect objects such
as faces in images in real-time. An object that has to be detected is
described by a combination of a set of simple Haar-wavelet-like
features shown in Fig. 1.

Fig. 1: Five different basic types of rectangle features within their
sub-window of 24x24 pixels. These five types of features are the
initial features used to train the cascade of classifiers exhaustively.

The sums of pixels in the white boxes are subtracted from the sum of
pixels in the black areas. The advantage of using these simple features
is that they can be calculated very quickly by using the "integral
image". An integral image II over an image I is defined as follows:

    II(x, y) = Σ_{x' ≤ x, y' ≤ y} I(x', y')                     (1)

In [1] it is also shown that every rectangular sum within an image can
be computed from the integral image with four array references. A
classifier has to be trained from a number of available discriminating
features within a specific sub-window in order to detect an object. The
possible positions and scales of the five different feature types shown
in Fig. 1 produce about 90,000 possible alternative features within a
sub-window of 24x24 pixels. This number largely exceeds the number of
pixels itself. Therefore, a small set of features which best describes
the object to be detected has to be selected. Adaboost [10] is a
technique that iteratively selects good classification functions such
that a final "strong classifier" is formed, which is in fact a linear
combination of all weak classifiers. In the general context of learning
features, each weak classifier h_j(x) consists of one single feature
f_j:

    h_j(x) = 1 if p_j f_j(x) < p_j ϑ_j, and 0 otherwise         (2)

where ϑ_j is a threshold and p_j a parity indicating the direction of
the inequality. The description of the Adaboost algorithm used to
select a predefined number of good features, given a training set of
positive and negative example images, is shown in [1]. The Adaboost
algorithm iterates over a number of T rounds. In each iteration, the
space of all possible features is searched exhaustively to train weak
classifiers that consist of one single feature. During this training,
the threshold ϑ_j must be determined for the feature value to
discriminate between positive and negative examples. Therefore, for
each possible feature and the given training set, the weak learner
determines the optimal threshold and parity, such that as few training
examples as possible are misclassified. For each weak classifier
h_j(x), the error value ε_j is calculated from the misclassification
rate over all positive and negative training images. This gives each
trained feature h_j(x) a respective error value ε_j between 0 and 1.
The best feature h_t(x), the one with the lowest error rate ε_t, is
selected as the weak classifier for iteration t. After the best weak
classifier is selected, all training examples are re-weighted and
normalized so that the next round concentrates particularly on those
examples that were not correctly classified. At the end, the resulting
strong classifier is a weighted linear combination of all T weak
classifiers.
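The integral image of Eq. (1) and the per-round weak-classifier
selection of Eq. (2) can be sketched as follows. This is a minimal
illustration under our own naming, not the authors' implementation; a
real trainer would find thresholds by sorting feature values rather
than this brute-force scan.

```python
import numpy as np

def integral_image(img):
    """Eq. (1): II(x, y) = sum of I(x', y') for all x' <= x, y' <= y."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in a w x h rectangle at (x, y), four array references."""
    s = ii[y + h - 1, x + w - 1]
    if x > 0:
        s -= ii[y + h - 1, x - 1]
    if y > 0:
        s -= ii[y - 1, x + w - 1]
    if x > 0 and y > 0:
        s += ii[y - 1, x - 1]
    return s

def best_weak_classifier(feature_values, labels, weights):
    """One AdaBoost round: for every candidate feature, try each threshold
    and parity (Eq. (2)) and keep the single feature with the lowest
    weighted error over the training set."""
    best = None
    for j, values in enumerate(feature_values):
        for theta in np.unique(values):
            for parity in (1, -1):
                preds = (parity * values < parity * theta).astype(int)
                err = float(np.sum(weights * (preds != labels)))
                if best is None or err < best[0]:
                    best = (err, j, theta, parity)
    return best  # (error e_t, feature index, threshold ϑ, parity p)
```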
3. Cascade of Boosted Classifiers
This section describes an algorithm for constructing a cascade of
classifiers, which drastically reduces the computational time. The main
idea is to build a set of cascaded boosted classifiers which are
smaller, but more efficient, and which reject most of the negative
sub-windows while detecting almost all positive instances. Input for
the cascade is the collection of all sub-windows, also called scanning
windows. They are first passed through the first stage, in which all
sub-windows are classified as faces or non-faces. The negative results
are discarded, while the remaining positive sub-windows trigger the
evaluation of the next stage classifier. The sub-windows that reach and
pass the last layer are classified as faces (see Fig. 3). Every layer
actually consists of only a small number of features. In the early
stages, only a small number of selected features is needed to rule out
a non-face. On the other hand, determining the presence of a face
usually needs more features. The trained cascade of boosted classifiers
therefore has an increasing number of features in each consecutive
layer up to the last layer, and so becomes increasingly more complex.
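The early-rejection evaluation described above can be sketched as
follows; the stage structure and names are our own assumptions, not the
authors' code.

```python
def cascade_classify(window, stages):
    """Pass a sub-window through the cascade. Each stage is a boosted
    classifier (weak_classifiers, alphas, stage_threshold); a window is
    rejected as a non-face as soon as one stage's weighted vote falls
    below that stage's threshold, and only windows that pass every
    stage are reported as faces."""
    for weaks, alphas, threshold in stages:
        score = sum(a * h(window) for h, a in zip(weaks, alphas))
        if score < threshold:
            return False  # rejected early, later stages never evaluated
    return True


# Toy usage: a 2-stage cascade over scalar "windows".
stages = [
    ([lambda w: 1 if w > 0 else 0], [1.0], 0.5),
    ([lambda w: 1 if w > 5 else 0], [1.0], 0.5),
]
print(cascade_classify(10, stages), cascade_classify(-3, stages))  # -> True False
```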
Fig. 3: Cascade from simple to complex classifiers with N layers.

During the training of the cascade of boosted classifiers, the number
of features per layer or stage was driven by a "trial and error"
process. In this process, the number of features was increased until a
significant reduction in the false positive rate could be achieved. In
our case, each stage was trained to eliminate 50% of the non-face
patterns while falsely eliminating only 0.5% of the frontal face
patterns; 15 stages were trained, so a false alarm rate of about
0.5^15 ≈ 3x10^-5 and a hit rate of about 0.995^15 ≈ 0.93 were
expected. More features were added until the false positive rate of
each stage achieved the desired rate while still maintaining a high
detection rate. At each training stage, the false positive images from
the previous stages are added to the set of negative (non-face) images,
and this set is used as the negative images in the next stage's
training [9].

4. Genetic Algorithm for Feature Selections

Genetic Algorithms (GAs) [11] are a family of computational models
inspired by natural evolution. GAs comprise a subset of evolution-based
optimization techniques focusing on the application of selection,
mutation, and recombination or crossover to a population of competing
problem solutions. In our paper, a GA chromosome represents the
specific type and location of one single feature in the sub-window of
24x24 pixels. Each chromosome has six genes. The first five genes are
integer types which consist of:

• type : type of feature
• x : x-coordinate in the sub-window
• y : y-coordinate in the sub-window
• dx : width of the feature
• dy : height of the feature

The sixth gene contains a fitness value between 0 and 1. As described
previously for Adaboost, every trained feature or weak classifier
produces an error value ε_j, and the feature with the lowest ε_j will
be selected. Therefore, the fitness function chosen here is 1 - ε_i,
where i is a number between 1 and the population size N. With this
fitness function, a higher fitness value indicates a lower error value
ε_j. To increase the possibility of selecting good features, we
increase the number of feature types. The three feature types existing
in OpenCV, shown in Fig. 4, are added to the total feature set. In
addition, we add our seven new feature types, which should increase the
search space and the possibility of getting a better cascade of
classifiers with higher performance.

Fig. 4: The existing three types of features within their sub-window of
24x24 pixels. These feature sets are added in the training of the
cascade of classifiers with GA search.

In Fig. 5, features a, b, c and d, with the style of an L-shape and an
inverse L-shape, should have the ability to distinguish face and
non-face images based on the pattern of the side of the face,
specifically the left and right foreheads, temples and jaw lines.
Feature e is the inverse of the middle feature shown in Fig. 4. Lastly,
feature f should distinguish the pattern of the region of both eyes
together with the forehead, and feature g should separate face images
from non-face images based on the pattern of the eye locations. With
all these, the total number of feature types equals 15. As a result,
while more valuable and better types of features might be created as
potential sets of good feature solutions, the search space over all of
these feature types increases dramatically. Therefore GA is used to
select the features and to avoid the exhaustive search and its high
computational time.

Fig. 5: The newly proposed seven types of features (a-g) within their
sub-window of 24x24 pixels. These feature sets are proposed and added
in the training of the cascade of classifiers with GA search.
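Under the six-gene encoding above, a chromosome and its fitness
evaluation might look like the following sketch. The names are ours,
and `weak_error` stands in for training the encoded single-feature
classifier with Adaboost.

```python
import random

SUBWIN = 24     # sub-window side length in pixels
N_TYPES = 15    # 5 basic + 3 OpenCV + 7 newly proposed feature types

def random_chromosome():
    """Genes: [type, x, y, dx, dy, fitness]; always generated feasible."""
    x = random.randint(0, SUBWIN - 1)
    y = random.randint(0, SUBWIN - 1)
    return [random.randint(1, N_TYPES), x, y,
            random.randint(1, SUBWIN - x), random.randint(1, SUBWIN - y),
            0.0]

def evaluate(chrom, weak_error):
    """Sixth gene = fitness = 1 - error of the encoded weak classifier;
    features falling outside the 24x24 sub-window get fitness 0."""
    t, x, y, dx, dy, _ = chrom
    if x + dx > SUBWIN or y + dy > SUBWIN:
        chrom[5] = 0.0
    else:
        chrom[5] = 1.0 - weak_error(chrom)
    return chrom[5]
```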
The parameters of the Genetic Algorithm are shown in Table 1. In the
first generation of GA, all chromosomes are randomly generated and
evaluated to determine their fitness values. A ranking scheme selection
is used, in which all chromosomes are ranked by their fitness values:
the first chromosome has the highest fitness value while the last one
has the lowest. In each generation, half of the total population is
selected, put into a mating pool, and becomes the parent chromosomes.
The two-point standard uniform crossover is applied here. Based on the
crossover probability rate p_co, where p_co ∈ [0, 1], two different
chromosomes are chosen from the mating pool. Two gene positions m and
n, with m, n ∈ {1, ..., 5} and m ≠ n, are chosen randomly. Then, the
genes at positions m and n are exchanged between these two parents to
produce new children. In the mutation process, a mutation probability
rate p_mt, where p_mt ∈ [0, 1], is used. Based on this rate, mutation
takes place on a chromosome chosen from the mating pool. Once mutation
occurs, a single gene position is selected randomly. If the first gene
(type) is selected, a new and different type of feature is chosen
randomly between 1 and 15. In the other cases, i.e. genes x, y, dx or
dy, an integer value randomly chosen between -2 and 2 is added to the
gene.
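The crossover and mutation operators just described can be sketched as
follows (an illustration with our own function names):

```python
import random

def crossover(p1, p2):
    """Two-point uniform crossover: pick two distinct gene positions
    among the first five genes and swap them between the parents."""
    m, n = random.sample(range(5), 2)   # indices 0..4 stand for genes 1..5
    c1, c2 = p1[:], p2[:]
    c1[m], c2[m] = c2[m], c1[m]
    c1[n], c2[n] = c2[n], c1[n]
    return c1, c2

def mutate(chrom, n_types=15):
    """Mutate one randomly chosen gene: a fresh, different feature type,
    or a shift in [-2, 2] on x, y, dx or dy."""
    c = chrom[:]
    g = random.randrange(5)
    if g == 0:
        c[0] = random.choice([t for t in range(1, n_types + 1) if t != c[0]])
    else:
        c[g] += random.randint(-2, 2)
    return c
```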
All parent chromosomes that have undergone the crossover and mutation
processes produce new children chromosomes, which are re-evaluated to
determine their fitness values, 1 - ε_i. New chromosomes with high
fitness values have a better chance of being selected and inserted into
the GA population. New chromosomes that become invalid or infeasible,
for example when the sum of their x-coordinate and their width exceeds
the allowed width of the sub-window (here 24x24), have their fitness
value set to zero; in other words, these are not good features and they
shall be discarded. Chromosomes with modest or average fitness values
will not have a good opportunity to be selected for the next
generation.
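One full generation, with the variation operators and fitness
evaluation passed in as parameters, could then be arranged as in this
sketch of the steps described above (our own arrangement, not the
authors' code):

```python
import random

def step_generation(population, evaluate, crossover, mutate,
                    p_co=0.9, p_mt=0.1):
    """Rank chromosomes by the fitness gene (last slot), keep the top
    half as the mating pool, breed children with crossover/mutation,
    re-evaluate them, and retain only the N fittest. Infeasible children
    are expected to receive fitness 0 from `evaluate`, so they die out."""
    n = len(population)
    ranked = sorted(population, key=lambda c: c[-1], reverse=True)
    pool = ranked[:n // 2]
    children = []
    while len(children) < n // 2:
        child, other = (c[:] for c in random.sample(pool, 2))
        if random.random() < p_co:
            child, other = crossover(child, other)
        if random.random() < p_mt:
            child = mutate(child)
        children.append(child)
    for c in children:
        evaluate(c)
    return sorted(pool + children, key=lambda c: c[-1], reverse=True)[:n]
```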
The GA population size of 200 was chosen as the result of a trial and
error process in which a trade-off is made between speed and error
rate. The population should be sufficiently large to create enough
diversity to cover the possible solution space. The other GA parameters
are the probabilities of the crossover and mutation operators. In their
paper, Treptow and Zell [3] preferred a crossover rate of 20% and a
mutation rate of 80%. Since they used an EA to select about 200
features in only a single-stage classifier, the ratio they used seems
reasonable. In our case, the crossover rate was 90% and the mutation
rate was 10%. This is because the training of a 15-stage cascade of
classifiers is slightly different from the training of a single-stage
classifier, as described in Section 3, even though both use the
Adaboost algorithm. At each stage of the cascade training, GABoost
selects features from a slightly different search space, because the
image samples incorrectly classified (false positives) in the previous
stages are added to the training set. In order to avoid premature
convergence to local optima and not to lose good feature solutions, the
exploration of the feature search space should have higher priority.
So, to train a cascade of classifiers involving 15 stages, as in our
case, the 9:1 crossover-mutation ratio seems logical and reasonable.
5. Experiments of Evolutionary Search
with Genetic Algorithm
In this Section, we compare the performance of 15-stage cascades of
classifiers built using two different techniques: (1) Adaboost with
exhaustive feature selection (ExBoost) and (2) Adaboost with GA to
select features (GABoost). ExBoost searches over only the five basic
feature types, while GABoost searches over the larger set of 15 feature
types. The training set used consists of 7,000 positive images and
3,000 negative images from various sources (see Fig. 6). These
gray-value images are 24x24 pixels in size and all differ from one
another.
Fig. 6: Examples of face and non-face images in the training set.
The test set consists of face images from the BioID dataset [13]. This
dataset contains 1,200 face images varying in person, gender, facial
expression, size and location, level of illumination, and appearance of
non-face objects. Examples are shown in Fig. 7. All images in the
training and testing datasets are different from one another.

Fig. 7: Examples of images containing faces under various conditions,
used in the BioID test set [13].

TABLE 1: Parameters used in the Genetic Algorithm.

  Population Size : 200
  Crossover Type  : Two-point Standard Uniform
  Crossover Rate  : 0.9
  Mutation Type   : Single Gene
  Mutation Rate   : 0.1
The training was stopped after a set of 15 cascaded stages had been
built. The population size is 200, all chromosomes are initialized
randomly, the crossover rate is 0.9 and the mutation rate is set to
0.1. GA is considered converged, and stops, if no better single
feature is found within 50 consecutive generations. In case of no
convergence, GA also stops when it reaches the maximum number of
generations, which is set to 200. Experiments were carried out on an
Intel Pentium IV 3.0 GHz processor. GABoost was run 10 times and the
average results are reported. In Table 2, we can see that GABoost
performs 3.7 times faster than ExBoost. The comparison of computational
training time between ExBoost and GABoost over the ten experiments is
shown in Fig. 8.
TABLE 2: Training time taken to build the 15-stage cascade of
classifiers and the number of features selected.

  Algorithm   | Average training | Average time to select | Average number of
              | time (sec)       | single feature (sec)   | features in cascade
  ExBoost 5F  | 134225           | 336.4                  | 400
  GABoost 15F | 35634            | 56.6                   | 630
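The speed-up figures quoted from Table 2 can be checked directly from
the tabulated numbers:

```python
# (average training time s, average time per feature s, features in cascade)
exboost = (134225, 336.4, 400)
gaboost = (35634, 56.6, 630)

cascade_speedup = exboost[0] / gaboost[0]   # 134225 / 35634 ≈ 3.77
feature_speedup = exboost[1] / gaboost[1]   # 336.4 / 56.6 ≈ 5.94

# Internal consistency: total time / feature count matches time per feature.
assert abs(exboost[0] / exboost[2] - exboost[1]) < 1.0
assert abs(gaboost[0] / gaboost[2] - gaboost[1]) < 1.0
```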
Fig. 8: Comparison of computational training time (hours) between
ExBoost and GABoost in the ten experiments.

Table 2 also shows the total number of features found by both
algorithms. We can see that GABoost selected more features than
ExBoost, yet in less training time: selecting a single feature requires
only 56.6 seconds in GABoost, while ExBoost needs 336.4 seconds to find
a single feature. Selection of a single feature is thus 5.9 times
faster in GABoost than in ExBoost.

TABLE 3: Comparison of hit rates and false positive rates of the
cascades of classifiers built by ExBoost and GABoost.

  Algorithm                | Average hit | False positive detections /
                           | rate (%)    | total detections (%)
  ExBoost 5F               | 90.01       | 62.97
  GABoost 15F (10 Exp)     | 90.13       | 61.56
  GABoost 15F (best-hit)   | 94.25       | 62.32
  GABoost 15F (worst-hit)  | 85.68       | 49.83

Table 3 shows that the average hit rate of GABoost (90.13%) is slightly
superior to the hit rate of ExBoost (90.01%). Across the experiments,
the best hit rate achieved by GABoost is 94.25% while the worst is
85.68%. GABoost also performs better on false positive rates. The false
positive rate is calculated as the sum of all false positive rectangles
detected divided by the total number of detection rectangles over the
whole test set. Examples are shown in Fig. 9. In our experiments, the
average false positive rate achieved by GABoost is 61.56%, which is
lower than the false positive rate of ExBoost (62.97%). The hit rates
and false positive rates of the ten GABoost experiments are shown in
Fig. 10.

Fig. 9: Examples of test images with detected faces. The top three
images show false positive detections within a single image, while in
the bottom three images the faces are detected perfectly.

Fig. 10: Hit rates and false positive rates (%) of GABoost and ExBoost
over the ten experiments.

In Table 4, the seven newly proposed feature types are examined. For
each feature type a, b, c, d, e, f and g in Fig. 5, the average number
of times it was selected during cascade training using GABoost is
shown.
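The false positive measure used above is simply the ratio of false
rectangles to all reported rectangles; a trivial helper, with
hypothetical counts, makes the definition concrete:

```python
def false_positive_rate(false_rects, total_rects):
    """False positive rectangles divided by all detection rectangles
    over the whole test set, in percent (the measure in Table 3)."""
    return 100.0 * false_rects / total_rects

# Hypothetical counts, for illustration only:
print(false_positive_rate(63, 100))  # -> 63.0
```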
TABLE 4: Details of the average number of times the seven new feature
types were selected by GABoost.

  Feature type                 |     a |     b |     c |    d |     e |     f |    g
  Average number of selections | 27.90 | 15.40 | 11.50 | 2.20 | 37.30 | 30.90 | 4.50

  Total, seven new features: 129.70 (20.60% of total features)
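The totals in Table 4 are internally consistent and agree with the
average cascade size of 630 features from Table 2:

```python
# Average number of times each new feature type was selected (Table 4).
counts = {"a": 27.90, "b": 15.40, "c": 11.50, "d": 2.20,
          "e": 37.30, "f": 30.90, "g": 4.50}

total_new = sum(counts.values())             # 129.70
share = 100.0 * total_new / 630              # share of all selected features
print(round(total_new, 2), round(share, 1))  # -> 129.7 20.6
```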
These seven feature types contribute about 21% of the total features
selected by GABoost across the various stages. Among them, feature
types a, e and f play important roles in the cascade training, as they
are selected many times. These three feature types contribute about
76.17% of the total of the seven new feature types proposed, and about
15.67% of the total features selected in the cascades of boosted
classifiers built by GABoost. Feature types b and c have little impact
on the cascades, as they were selected only 15.4 and 11.5 times on
average; these numbers represent only 4.26% of the total features
selected. Table 4 also shows that feature types d and g cannot be
considered good feature types, since they are rarely selected in the
cascade training. Fig. 11 shows the distribution of the number of
selections among these seven new feature types.
6. Conclusion and Future Works

In this paper, we have extended the work of Treptow and Zell [3] by
implementing GA inside the AdaBoost framework to select features.
Fifteen stages of cascaded classifiers are built rather than just a
single-stage classifier. Seven new feature types were proposed in order
to increase the quality of the feature solutions. As a consequence, the
sets of feature solutions become bigger, and we have shown how GA can
be used to overcome the problem of feature selection in this huge
search space. Training time was drastically reduced, and the cascade of
boosted classifiers trained using GABoost achieved a better hit rate
and a lower false positive rate than the original exhaustive technique.
This lower false positive rate may be due to the higher number and
greater variety of feature types selected by GABoost. We have also
shown that the best cascade of boosted classifiers obtained by GABoost
outperformed the original exhaustively built cascade in computational
training time, hit rate and false positive rate. On the other hand, the
seven new feature types have shown different levels of importance in
this training. Three of these new feature types have shown very
significant roles in the training of the cascade of boosted
classifiers. However, we believe that the performance of these seven
feature types could be different if another set of training samples
were used. Nevertheless, there are many directions for further research
in this field. Other search algorithms, such as Memetic Algorithms
[12], may be used in place of GA to select features in cascade
training. In GA itself, dynamic rates of crossover and mutation could
also be implemented and analyzed. Such dynamic rates were used as a
tuning tool in one part of the research done by T. L. Seng, M. Khalid
and R. Yusof [11]. On the other hand, since detection speed was not
addressed in this paper, and judging by the number of features
selected, the resulting face detector is likely to be slightly slower.
Thus, the cascaded classifiers built by GABoost with a very large set
of feature types could be improved by using Evolutionary Pruning, an
approach introduced recently, in 2006, by J. S. Jang and J. H. Kim [9].
Its purpose is to reduce the number of weak classifiers while
maintaining the performance achieved by the cascade of classifiers.
GABoost can then be further improved using this approach.

References

[1] P. Viola, M. Jones. Rapid object detection using a boosted cascade
of simple features, IEEE Proceedings of the Computer Vision and Pattern
Recognition Conference (CVPR), December 11-13, Hawaii, USA, 2001.
[2] R. Lienhart, A. Kuranov and V. Pisarevsky. Empirical analysis of
detection cascades of boosted classifiers for rapid object detection,
In DAGM'03, 25th Pattern Recognition Symposium, pages 297-304, Germany,
2003.
[3] A. Treptow and A. Zell. Combining Adaboost Learning and
Evolutionary Search to select Features for Real-Time Object Detection,
Proceedings of the Congress on Evolutionary Computation CEC 2004,
Vol. 2, pages 2107-2113, San Diego, USA, 2004.
[4] H. Rowley, S. Baluja and T. Kanade. Neural Network-based Face
Detector, IEEE Transactions on Pattern Analysis and Machine
Intelligence, 20(1), pages 23-28, 2000.
[5] K. Sung and T. Poggio. Example-based Learning
For View-based Face Detection, IEEE Transaction on
Pattern Analysis and Machine Intelligence, 20, page
39-51, 1998.
[6] H. Schneiderman and T. Kanade. A Statistical Method for Object
Detection Applied to Faces and Cars, International Conference on
Computer Vision and Pattern Recognition, pages 1746-1759, 2000.
[7] D. Roth, M. Yang and N. Ahuja. A SNoW-based Face Detector, Advances
in Neural Information Processing Systems 12 (NIPS 12), volume 12, 2000.
[8] S. Z. Li, Z. Q. Zhang, H. Shum and H. J. Zhang. Floatboost Learning
for Classification, 16th Annual Conference on Neural Information
Processing Systems (NIPS), 2002.
[9] J. S. Jang and J. H. Kim. Evolutionary Pruning for Fast and Robust
Face Detection, IEEE Congress on Evolutionary Computation CEC 2006,
pages 1293-1299, Vancouver, Canada, July 2006.
[10] Y. Freund and R. E. Schapire. A Short
Introduction to Boosting, Journal of Japanese Society
for Artificial Intelligence, Vol. 14(5), pages 771-780,
September 1999.
[11] T. L. Seng, M. Khalid and R. Yusof. Tuning of a Neuro-Fuzzy
Controller by Genetic Algorithm with an Application to a Coupled-Tank
Liquid-Level Control System, International Journal of Engineering
Applications of Artificial Intelligence, Vol. 11, pages 517-529, 1998.
[12] S. Areibi, M. Moussa and H. Abdullah. A Comparison of
Genetic/Memetic Algorithms and Other Heuristic Search Techniques,
International Conference on Artificial Intelligence, pages 660-666,
Las Vegas, Nevada, 2001.
[13] BioID Face Database:
http://www.bioid.com/downloads/facedb/index.php