AdaBoost

AdaBoost & Its Applications
Lecturer: 虞台文
Outline

• Overview
• The AdaBoost Algorithm
• How and why AdaBoost works
• AdaBoost for Face Detection
AdaBoost & Its Applications
Overview
Introduction

AdaBoost = Adaptive Boosting
• A learning algorithm
• Builds a strong classifier out of a lot of weaker ones
AdaBoost Concept

Weak classifiers (each slightly better than random):
$h_1(x) \in \{-1, +1\},\; h_2(x) \in \{-1, +1\},\; \ldots,\; h_T(x) \in \{-1, +1\}$

Strong classifier:
$H_T(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
The Weak Classifiers

Weak classifiers (each slightly better than random):
$h_1(x) \in \{-1, +1\},\; h_2(x) \in \{-1, +1\},\; \ldots,\; h_T(x) \in \{-1, +1\}$

• Each weak classifier learns by considering one simple feature.
• The T most beneficial features for classification should be selected.

Strong classifier:
$H_T(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

How to
– define features?
– select beneficial features?
– train weak classifiers?
– manage (weight) training samples?
– associate a weight with each weak classifier?
The Strong Classifiers

Weak classifiers (each slightly better than random):
$h_1(x) \in \{-1, +1\},\; h_2(x) \in \{-1, +1\},\; \ldots,\; h_T(x) \in \{-1, +1\}$

Strong classifier:
$H_T(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
AdaBoost & Its Applications
The AdaBoost Algorithm
The AdaBoost Algorithm

Given: $(x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$, $y_i \in \{-1, +1\}$
Initialization: $D_1(i) = \frac{1}{m}$, $i = 1, \ldots, m$   ($D_t(i)$: probability distribution over the $x_i$'s at time $t$)
For $t = 1, \ldots, T$:
• Find the classifier $h_t : X \to \{-1, +1\}$ which minimizes the error w.r.t. $D_t$, i.e.,
  $h_t = \arg\min_{h_j} \epsilon_j$ where $\epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \ne h_j(x_i)]$   (minimize the weighted error)
• Weight the classifier: $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$   (chosen to minimize the exponential loss)
• Update the distribution: $D_{t+1}(i) = \frac{D_t(i)\, \exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, where $Z_t$ is a normalization factor   (gives misclassified patterns more chance to be learned)

Output the final classifier: $\mathrm{sign}(H(x))$ where $H(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$
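To make the loop above concrete, here is a minimal sketch in Python (my own illustration, not from the slides), using single-feature decision stumps as the weak classifiers; all function names and the stump form are hypothetical.

```python
import numpy as np

def train_stump(X, y, D):
    """Pick the single-feature threshold/parity stump minimizing the weighted error."""
    best = None
    for f in range(X.shape[1]):
        for thresh in np.unique(X[:, f]):
            for parity in (+1, -1):
                pred = np.where(parity * X[:, f] < parity * thresh, 1, -1)
                err = np.sum(D[pred != y])
                if best is None or err < best[0]:
                    best = (err, f, thresh, parity)
    return best  # (epsilon_t, feature, threshold, parity)

def adaboost(X, y, T):
    m = X.shape[0]
    D = np.full(m, 1.0 / m)                              # D_1(i) = 1/m
    ensemble = []
    for t in range(T):
        eps, f, thresh, parity = train_stump(X, y, D)    # minimize weighted error
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))  # alpha_t = 1/2 ln((1-eps)/eps)
        pred = np.where(parity * X[:, f] < parity * thresh, 1, -1)
        D = D * np.exp(-alpha * y * pred)                # up-weight misclassified samples
        D /= D.sum()                                     # Z_t normalization
        ensemble.append((alpha, f, thresh, parity))
    return ensemble

def predict(ensemble, X):
    H = np.zeros(X.shape[0])
    for alpha, f, thresh, parity in ensemble:
        H += alpha * np.where(parity * X[:, f] < parity * thresh, 1, -1)
    return np.sign(H)                                    # final classifier sign(H(x))
```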
Boosting Illustration

• Weak classifier 1
• Weights increased
• Weak classifier 2
• Weights increased
• Weak classifier 3
• Final classifier is a combination of weak classifiers
AdaBoost & Its Applications
How and Why AdaBoost Works
What goal does AdaBoost want to reach?

Look again at the algorithm: the particular choices of the classifier weight $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ and of the distribution update $D_{t+1}(i) = \frac{D_t(i)\exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$ are goal dependent.
Goal

Final classifier: $\mathrm{sign}(H(x))$ with $H(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$

Minimize the exponential loss
$\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\!\left[e^{-y H(x)}\right]$,
i.e., maximize the margin $y H(x)$.
Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\!\left[e^{-y H(x)}\right]$: $\alpha_t = ?$

Final classifier: $\mathrm{sign}(H(x))$ with $H(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$

Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$. Then $H(x) = H_T(x)$.

$E_{x,y}\!\left[e^{-y H_t(x)}\right] = E_x\!\left[E_y\!\left[e^{-y H_t(x)} \mid x\right]\right]$
$= E_x\!\left[E_y\!\left[e^{-y[H_{t-1}(x) + \alpha_t h_t(x)]} \mid x\right]\right]$
$= E_x\!\left[E_y\!\left[e^{-y H_{t-1}(x)}\, e^{-y \alpha_t h_t(x)} \mid x\right]\right]$
$= E_x\!\left[e^{-y H_{t-1}(x)}\left(e^{-\alpha_t} P(y = h_t(x)) + e^{\alpha_t} P(y \ne h_t(x))\right)\right]$
Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\!\left[e^{-y H(x)}\right]$: $\alpha_t = ?$

$E_{x,y}\!\left[e^{-y H_t(x)}\right] = E_x\!\left[e^{-y H_{t-1}(x)}\left(e^{-\alpha_t} P(y = h_t(x)) + e^{\alpha_t} P(y \ne h_t(x))\right)\right]$

Set $\dfrac{\partial\, E_{x,y}\!\left[e^{-y H_t(x)}\right]}{\partial \alpha_t} = E_x\!\left[e^{-y H_{t-1}(x)}\left(-e^{-\alpha_t} P(y = h_t(x)) + e^{\alpha_t} P(y \ne h_t(x))\right)\right] = 0$
Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\!\left[e^{-y H(x)}\right]$: $\alpha_t = ?$

$E_x\!\left[e^{-y H_{t-1}(x)}\left(-e^{-\alpha_t} P(y = h_t(x)) + e^{\alpha_t} P(y \ne h_t(x))\right)\right] = 0$

$\Rightarrow \alpha_t = \frac{1}{2} \ln \frac{P(y = h_t(x))}{P(y \ne h_t(x))}$

With $P(x_i, y_i) \propto D_t(i)$ and $\epsilon_t = P(\mathrm{error}) = \sum_{i=1}^{m} D_t(i)\,[y_i \ne h_t(x_i)]$, this becomes

$\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$
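As a quick numeric sanity check (an addition of mine, not from the slides), the per-round weighted loss $e^{-\alpha}(1-\epsilon_t) + e^{\alpha}\epsilon_t$ is indeed minimized at the closed-form $\alpha_t$:

```python
import numpy as np

eps = 0.3                                     # weighted error of the weak classifier
alpha_closed = 0.5 * np.log((1 - eps) / eps)  # 1/2 ln((1-eps)/eps) ~ 0.4236

# Brute-force minimization of e^{-a}(1-eps) + e^{a} eps over a grid of alphas.
alphas = np.linspace(-2, 2, 100001)
loss = np.exp(-alphas) * (1 - eps) + np.exp(alphas) * eps
alpha_grid = alphas[np.argmin(loss)]

print(alpha_closed, alpha_grid)               # both ~ 0.4236
```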
(Matched against the algorithm: this is exactly the classifier-weight step $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ used there.)
Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x \sim D,\, y}\!\left[e^{-y H(x)}\right]$, now with the expectation taken over the training distribution $D$: $D_{t+1} = ?$ That is, which $h_t$ should be selected, and how should the sample distribution be updated?
Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x \sim D,\, y}\!\left[e^{-y H(x)}\right]$: $D_{t+1} = ?$

Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$. Then $H(x) = H_T(x)$.

$E_{x,y}\!\left[e^{-y H_t}\right] = E_{x,y}\!\left[e^{-y H_{t-1}}\, e^{-y \alpha_t h_t}\right] \approx E_{x,y}\!\left[e^{-y H_{t-1}}\left(1 - y \alpha_t h_t + \tfrac{1}{2}\alpha_t^2 y^2 h_t^2\right)\right]$

$\Rightarrow h_t = \arg\min_h E_{x,y}\!\left[e^{-y H_{t-1}}\left(1 - y \alpha_t h + \tfrac{1}{2}\alpha_t^2 y^2 h^2\right)\right]$

$\Rightarrow h_t = \arg\min_h E_{x,y}\!\left[e^{-y H_{t-1}}\left(1 - y \alpha_t h + \tfrac{1}{2}\alpha_t^2\right)\right]$   (since $y^2 h^2 = 1$)

$\Rightarrow h_t = \arg\min_h E_x\!\left[E_y\!\left[e^{-y H_{t-1}}\left(1 - y \alpha_t h + \tfrac{1}{2}\alpha_t^2\right) \,\middle|\, x\right]\right]$
Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x \sim D,\, y}\!\left[e^{-y H(x)}\right]$: $D_{t+1} = ?$

$\Rightarrow h_t = \arg\min_h E_x\!\left[E_y\!\left[e^{-y H_{t-1}}\,(- y \alpha_t h) \,\middle|\, x\right]\right]$

$\Rightarrow h_t = \arg\max_h E_x\!\left[E_y\!\left[e^{-y H_{t-1}}\, y h \,\middle|\, x\right]\right]$

$\Rightarrow h_t = \arg\max_h E_x\!\left[1 \cdot h(x)\, e^{-H_{t-1}(x)}\, P(y = 1 \mid x) + (-1) \cdot h(x)\, e^{H_{t-1}(x)}\, P(y = -1 \mid x)\right]$
Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x \sim D,\, y}\!\left[e^{-y H(x)}\right]$: $D_{t+1} = ?$

$\Rightarrow h_t = \arg\max_h E_{x,\, y \sim e^{-y H_{t-1}(x)} P(y \mid x)}\!\left[y\, h(x)\right]$, which is maximized when $h(x) = y$ for every $x$

$\Rightarrow h_t(x) = \mathrm{sign}\!\left(E_{x,\, y \sim e^{-y H_{t-1}(x)} P(y \mid x)}\!\left[y \mid x\right]\right) = \mathrm{sign}\!\left(P_{x,\, y \sim e^{-y H_{t-1}(x)} P(y \mid x)}(y = 1 \mid x) - P_{x,\, y \sim e^{-y H_{t-1}(x)} P(y \mid x)}(y = -1 \mid x)\right)$
Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x \sim D,\, y}\!\left[e^{-y H(x)}\right]$: $D_{t+1} = ?$

At time $t$ the samples are effectively drawn as $x, y \sim e^{-y H_{t-1}(x)} P(y \mid x)$, and

$h_t(x) = \mathrm{sign}\!\left(P_{x,\, y \sim e^{-y H_{t-1}(x)} P(y \mid x)}(y = 1 \mid x) - P_{x,\, y \sim e^{-y H_{t-1}(x)} P(y \mid x)}(y = -1 \mid x)\right)$
At time $t$: $x, y \sim e^{-y H_{t-1}(x)} P(y \mid x)$

At time $1$: $H_0(x) = 0$, so $x, y \sim P(y \mid x)$, which gives $D_1(i) = \frac{1}{m}$

At time $t+1$: $x, y \sim e^{-y H_t(x)} P(y \mid x) = e^{-y H_{t-1}(x)}\, e^{-\alpha_t y h_t(x)}\, P(y \mid x) \propto D_t\, e^{-\alpha_t y h_t(x)}\, P(y \mid x)$

$\Rightarrow D_{t+1}(i) = \frac{D_t(i)\, \exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, where $Z_t$ is a normalization factor.

This is exactly the distribution update used in the AdaBoost algorithm.
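A small check (again my own, with stand-in values) that iterating the multiplicative update reproduces the closed form $D_{T+1}(i) \propto e^{-y_i H_T(x_i)}$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, T = 8, 5
y = rng.choice([-1, 1], size=m)             # labels
preds = rng.choice([-1, 1], size=(T, m))    # h_t(x_i) for each round (stand-ins)
alphas = rng.uniform(0.1, 1.0, size=T)      # alpha_t for each round (stand-ins)

# Iterative AdaBoost update: D_{t+1} ∝ D_t * exp(-alpha_t * y * h_t(x))
D = np.full(m, 1.0 / m)
for a, h in zip(alphas, preds):
    D = D * np.exp(-a * y * h)
    D /= D.sum()

# Closed form: D_{T+1}(i) ∝ exp(-y_i H_T(x_i)) with H_T = sum_t alpha_t h_t
H = (alphas[:, None] * preds).sum(axis=0)
D_closed = np.exp(-y * H)
D_closed /= D_closed.sum()

print(np.allclose(D, D_closed))             # True
```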
AdaBoost & Its Applications
AdaBoost for Face Detection

The Task of Face Detection
Many slides adapted from P. Viola
Basic Idea

• Slide a window across the image and evaluate a face model at every location.
Challenges

• Slide a window across the image and evaluate a face model at every location.
• A sliding-window detector must evaluate tens of thousands of location/scale combinations.
• Faces are rare: 0–10 per image.
  – For computational efficiency, we should try to spend as little time as possible on the non-face windows.
  – A megapixel image has $\sim 10^6$ pixels and a comparable number of candidate face locations.
  – To avoid having a false positive in every image, our false positive rate has to be less than $10^{-6}$.
The Viola/Jones Face Detector

• A seminal approach to real-time object detection.
• Training is slow, but detection is very fast.
• Key ideas:
  – Integral images for fast feature evaluation
  – Boosting for feature selection
  – Attentional cascade for fast rejection of non-face windows

P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.
Image Features

Rectangle filters:
Feature value = $\sum$ (pixels in white area) $-$ $\sum$ (pixels in black area)
Size of the Feature Space

How many possible rectangle features are there for a 24x24 detection region?

Two-rectangle features (A, B): $2 \sum_{w=1}^{12} \sum_{h=1}^{24} (24 - 2w + 1)(24 - h + 1)$
Three-rectangle features (C): $2 \sum_{w=1}^{8} \sum_{h=1}^{24} (24 - 3w + 1)(24 - h + 1)$
Four-rectangle features (D): $\sum_{w=1}^{12} \sum_{h=1}^{12} (24 - 2w + 1)(24 - 2h + 1)$

Total $\approx$ 160,000
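These sums can be evaluated directly; the following small script (an illustration of mine, not part of the slides) reproduces the quoted figure:

```python
# Count Viola-Jones rectangle features inside a 24x24 window.
def count_features(window=24):
    two = 2 * sum((window - 2 * w + 1) * (window - h + 1)
                  for w in range(1, window // 2 + 1)
                  for h in range(1, window + 1))
    three = 2 * sum((window - 3 * w + 1) * (window - h + 1)
                    for w in range(1, window // 3 + 1)
                    for h in range(1, window + 1))
    four = sum((window - 2 * w + 1) * (window - 2 * h + 1)
               for w in range(1, window // 2 + 1)
               for h in range(1, window // 2 + 1))
    return two + three + four

print(count_features())   # 162336, i.e. roughly 160,000
```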
Feature Selection

With roughly 160,000 possible rectangle features for a 24x24 detection region:
• What features are good for face detection?
• Can we create a good classifier using just a small subset of all possible features?
• How do we select such a subset?
Integral Images

• The integral image computes a value at each pixel (x, y) that is the sum of the pixel values above and to the left of (x, y), inclusive:

$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$
Computing the Integral Image

• The integral image can be computed in one pass through the image using the recurrences

$s(x, y) = s(x - 1, y) + i(x, y)$
$ii(x, y) = ii(x, y - 1) + s(x, y)$

where $s(x, y)$ is the cumulative sum along the row, with $s(-1, y) = 0$ and $ii(x, -1) = 0$.
Computing the Sum within a Rectangle

With corner points D (top-left), B (top-right), C (bottom-left), and A (bottom-right) of a rectangle,

$\mathrm{sum} = ii_A - ii_B - ii_C + ii_D$

so any rectangular sum costs only four array references (see the sketch below).
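A minimal sketch of the integral image and the four-lookup rectangle sum, assuming a NumPy array image; the padding convention and function names are my own:

```python
import numpy as np

def integral_image(img):
    """ii(x, y): sum of pixels above and to the left of (x, y), inclusive.
    A leading row/column of zeros makes the rectangle-sum formula branch-free."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum over img[top:top+height, left:left+width] via four lookups:
    sum = iiA - iiB - iiC + iiD."""
    A = ii[top + height, left + width]   # bottom-right
    B = ii[top,          left + width]   # top-right
    C = ii[top + height, left]           # bottom-left
    D = ii[top,          left]           # top-left
    return A - B - C + D

def two_rect_feature(ii, top, left, height, width):
    """Value of a horizontal two-rectangle feature: white (left half) minus black (right half)."""
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    black = rect_sum(ii, top, left + half, height, half)
    return white - black

img = np.arange(24 * 24).reshape(24, 24)
ii = integral_image(img)
print(rect_sum(ii, 3, 4, 5, 6) == img[3:8, 4:10].sum())   # True
```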
Scaling

• The integral image enables us to evaluate all rectangle sizes in constant time.
• Therefore, no image scaling is necessary.
• Scale the rectangular features instead!
Boosting

• Boosting is a classification scheme that works by combining weak learners into a more accurate ensemble classifier.
  – A weak learner need only do better than chance.
• Training consists of multiple boosting rounds.
  – During each boosting round, we select a weak learner that does well on examples that were hard for the previous weak learners.
  – "Hardness" is captured by weights attached to training examples.

Y. Freund and R. Schapire. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September 1999.
The AdaBoost Algorithm (recap)

(Exactly as given earlier: initialize $D_1(i) = \frac{1}{m}$; at each round pick the weak classifier $h_t$ with minimal weighted error $\epsilon_t$, weight it by $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$, re-weight the samples, and output $\mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$.)
Weak Learners for Face Detection

What base learner is proper for face detection?
Weak Learners for Face Detection

$h_t(x) = \begin{cases} 1 & \text{if } p_t f_t(x) < p_t \theta_t \\ 0 & \text{otherwise} \end{cases}$

where $x$ is a 24x24-pixel sub-window, $f_t$ is the value of a rectangle feature, $\theta_t$ is a threshold, and $p_t \in \{+1, -1\}$ is a parity indicating the direction of the inequality.
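A sketch of such a weak learner in Python, assuming precomputed rectangle-feature values per training sub-window; `train_feature_stump` and the toy data are hypothetical:

```python
import numpy as np

def train_feature_stump(feature_values, labels, weights):
    """For one rectangle feature, find the (threshold, parity) minimizing the weighted
    classification error, where h(x) = 1 if parity*f(x) < parity*theta else 0.
    Labels are 1 (face) / 0 (non-face). A simple O(N^2) sketch for clarity."""
    best = (np.inf, None, None)                       # (error, threshold, parity)
    for theta in np.unique(feature_values):
        for parity in (+1, -1):
            pred = (parity * feature_values < parity * theta).astype(int)
            err = np.sum(weights[pred != labels])
            if err < best[0]:
                best = (err, theta, parity)
    return best

# Hypothetical toy data: one feature value per training sub-window.
f = np.array([2.0, 5.0, 3.5, 8.0, 1.0, 7.0])
y = np.array([1,   0,   1,   0,   1,   0])            # faces tend to have small f here
w = np.full(6, 1 / 6)
print(train_feature_stump(f, y, w))                    # e.g. (0.0, 5.0, 1)
```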
Boosting

• The training set contains face and non-face examples.
  – Initially, all with equal weight.
• For each round of boosting (see the sketch after this list):
  – Evaluate each rectangle filter on each example.
  – Select the best threshold for each filter.
  – Select the best filter/threshold combination.
  – Reweight the examples.
• Computational complexity of learning: O(MNK)
  – M rounds, N examples, K features
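Building on the hypothetical `train_feature_stump` above, one boosting round would scan every feature and keep the best filter/threshold/parity combination; this is only a sketch of the selection step:

```python
import numpy as np

def boosting_round(feature_matrix, labels, weights):
    """One round of feature selection: feature_matrix[k, i] holds rectangle-feature k
    evaluated on example i. Returns the best (error, feature index, threshold, parity).
    With features pre-sorted, each stump fit can be made O(N), giving the O(MNK) total
    quoted above; the stump sketch used here is simpler but slower."""
    best = (np.inf, None, None, None)
    for k, fvals in enumerate(feature_matrix):
        err, theta, parity = train_feature_stump(fvals, labels, weights)  # from the sketch above
        if err < best[0]:
            best = (err, k, theta, parity)
    return best
```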
Features Selected by Boosting

First two features selected by boosting:
• This feature combination can yield a 100% detection rate and a 50% false positive rate.
• A 200-feature classifier can yield a 95% detection rate and a false positive rate of 1 in 14084.
ROC Curve for the 200-Feature Classifier

To be practical for real applications, the false positive rate must be closer to 1 in 1,000,000.
Attentional Cascade

• We start with simple classifiers which reject many of the negative sub-windows while detecting almost all positive sub-windows.
• A positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on.
• A negative outcome at any point leads to the immediate rejection of the sub-window (a code sketch follows).

IMAGE SUB-WINDOW → Classifier 1 →(T) Classifier 2 →(T) Classifier 3 →(T) ... → FACE
(an F output at any classifier → NON-FACE)
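A minimal sketch of cascade evaluation with early rejection; the stage functions here are stand-ins, not the actual Viola-Jones stages:

```python
def cascade_classify(sub_window, stages):
    """Evaluate an attentional cascade: each stage is a boosted classifier returning
    True (pass, looks like a face so far) or False (reject). The first False rejects
    the sub-window immediately, so most non-faces exit after one or two cheap stages."""
    for stage in stages:
        if not stage(sub_window):
            return False          # NON-FACE: rejected early
    return True                   # survived every stage: FACE

# Hypothetical usage with stand-in stages of increasing cost/complexity:
stages = [
    lambda w: sum(w) > 0,         # stage 1: 2-feature classifier (stand-in)
    lambda w: max(w) > 0.5,       # stage 2: 10-feature classifier (stand-in)
]
print(cascade_classify([0.1, 0.7, -0.2], stages))   # True
```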
Attentional Cascade: ROC Curve

• Chain classifiers that are progressively more complex and have lower false positive rates.

(ROC curve figure: % Detection vs. % False Pos, with the cascade diagram as above.)
Detection Rate and False Positive Rate for Chained Classifiers

$F = \prod_{i=1}^{K} f_i, \qquad D = \prod_{i=1}^{K} d_i$

• The detection rate and the false positive rate of the cascade are found by multiplying the respective rates of the individual stages.
• A detection rate of 0.9 and a false positive rate on the order of $10^{-6}$ can be achieved by a 10-stage cascade if each stage has a detection rate of 0.99 ($0.99^{10} \approx 0.9$) and a false positive rate of about 0.30 ($0.3^{10} \approx 6 \times 10^{-6}$).

(Cascade diagram: stage $i$ has rates $f_i, d_i$; the whole cascade has rates $F, D$.)
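The arithmetic behind that example can be checked in a few lines:

```python
# Quick check of the cascade-rate arithmetic quoted above.
stages = 10
d_stage, f_stage = 0.99, 0.30
print(d_stage ** stages)   # ~0.904   -> overall detection rate ~0.9
print(f_stage ** stages)   # ~5.9e-6  -> overall false positive rate on the order of 1e-6
```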
Training the Cascade

• Set target detection and false positive rates for each stage.
• Keep adding features to the current stage until its target rates have been met (a sketch of this loop follows).
  – Need to lower the AdaBoost threshold to maximize detection (as opposed to minimizing total classification error).
  – Test on a validation set.
• If the overall false positive rate is not low enough, then add another stage.
• Use false positives from the current stage as the negative training examples for the next stage.
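A hedged sketch of this training loop; `train_boosted_stage` and `evaluate` are hypothetical placeholders for the AdaBoost stage trainer and the cascade evaluator described above:

```python
def train_cascade(faces, non_faces, validation,
                  stage_fpr_target=0.30, stage_det_target=0.99, overall_fpr_target=1e-6,
                  train_boosted_stage=None, evaluate=None):
    """Hypothetical sketch of the cascade-training recipe above. `train_boosted_stage`
    would run AdaBoost, adding features (and lowering its threshold) until the stage
    meets the per-stage detection/false-positive targets on `validation`; `evaluate`
    would return (detection_rate, fpr, surviving_negatives) for the cascade so far."""
    stages = []
    overall_fpr = 1.0
    negatives = non_faces
    while overall_fpr > overall_fpr_target and negatives:
        stage = train_boosted_stage(faces, negatives, validation,
                                    stage_det_target, stage_fpr_target)
        stages.append(stage)
        # False positives of the cascade so far become the negatives for the next stage.
        _, overall_fpr, negatives = evaluate(stages, faces, non_faces, validation)
    return stages
```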
ROC Curves: Cascaded Classifier vs. Monolithic Classifier

• There is little difference between the two in terms of accuracy.
• There is a big difference in terms of speed.
  – The cascaded classifier is nearly 10 times faster, since its first stage throws out most non-faces so that they are never evaluated by subsequent stages.
The Implemented System

• Training data
  – 5000 faces: all frontal, rescaled to 24x24 pixels
  – 300 million non-faces, taken from 9500 non-face images
  – Faces are normalized for scale and translation
• Many variations
  – Across individuals
  – Illumination
  – Pose
Structure of the Detector Cascade

• Combines successively more complex classifiers in a cascade
  – 38 stages
  – 6060 features in total

All Sub-Windows → stage 1 →(T) stage 2 →(T) ... →(T) stage 38 →(T) Face
(an F output at any stage → Reject Sub-Window)
Structure of the Detector Cascade

• Stage 1: 2 features, rejects 50% of non-faces, detects 100% of faces
• Stage 2: 10 features, rejects 80% of non-faces, detects 100% of faces
• Subsequent stages: 25 features, 50 features, ... (sizes chosen by the algorithm)
Speed of the Final Detector

• On a 700 MHz Pentium III processor, the face detector can process a 384x288 pixel image in about 0.067 seconds.
  – 15 Hz
  – 15 times faster than the previous detector of comparable accuracy (Rowley et al., 1998)
• An average of 8 features is evaluated per window on the test set.
Image Processing

• Training: all example sub-windows were variance normalized to minimize the effect of different lighting conditions.
• Detection: sub-windows are variance normalized as well (a sketch follows).

$\sigma^2 = \frac{1}{N}\sum x^2 - m^2$

where the window mean $m$ can be computed using the integral image, and $\sum x^2$ can be computed using the integral image of the squared image.
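Reusing the hypothetical `integral_image` and `rect_sum` from the earlier sketch, the per-window mean and variance come out of two integral images:

```python
import numpy as np

def window_mean_var(ii, ii_sq, top, left, size):
    """Mean and variance of a size x size sub-window using the integral image `ii`
    and the integral image of the squared pixels `ii_sq` (both padded as in the
    earlier integral_image sketch). Uses sigma^2 = (1/N) sum(x^2) - m^2."""
    N = size * size
    s = rect_sum(ii, top, left, size, size)        # sum of pixels
    sq = rect_sum(ii_sq, top, left, size, size)    # sum of squared pixels
    m = s / N
    var = sq / N - m * m
    return m, var

img = np.random.default_rng(1).integers(0, 256, size=(100, 100)).astype(np.int64)
ii, ii_sq = integral_image(img), integral_image(img * img)
m, var = window_mean_var(ii, ii_sq, 10, 20, 24)
print(np.isclose(var, img[10:34, 20:44].var()))    # True
```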
Scanning the Detector

• Scaling is achieved by scaling the detector itself, rather than scaling the image.
• Good detection results are obtained for a scaling factor of 1.25.
• The detector is scanned across locations (a sketch follows).
  – Subsequent locations are obtained by shifting the window $[s\Delta]$ pixels, where $s$ is the current scale.
  – Results for $\Delta = 1.0$ and $\Delta = 1.5$ were reported.
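A sketch of the multi-scale scan described above; `classify_window` is a hypothetical stand-in for the cascade, and the rounding of the step size is my own choice:

```python
def scan_detector(image_width, image_height, classify_window,
                  base_size=24, scale_factor=1.25, delta=1.0):
    """Multi-scale sliding-window scan: the window (and step) grows by `scale_factor`
    each pass instead of resizing the image; the step is [s * delta] pixels.
    `classify_window(x, y, size)` stands in for the cascaded classifier."""
    detections = []
    scale = 1.0
    while base_size * scale <= min(image_width, image_height):
        size = int(base_size * scale)
        step = max(1, int(scale * delta))
        for y in range(0, image_height - size + 1, step):
            for x in range(0, image_width - size + 1, step):
                if classify_window(x, y, size):
                    detections.append((x, y, size))
        scale *= scale_factor
    return detections
```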
Merging Multiple Detections
ROC Curves for Face Detection
Output of Face Detector on Test Images
Other Detection Tasks

• Facial Feature Localization
• Male vs. Female
• Profile Detection
Conclusions

• How AdaBoost works
• Why AdaBoost works
• AdaBoost for face detection
  – Rectangle features
  – Integral images for fast computation
  – Boosting for feature selection
  – Attentional cascade for fast rejection of negative windows