
Learning with AdaBoost
Fall 2007
Outline
• Introduction and background of Boosting and AdaBoost
• AdaBoost Algorithm example
• AdaBoost Algorithm in current project
• Experiment results
• Discussion and conclusion
Boosting Algorithm
• Definition of Boosting [1]:
Boosting refers to a general method of producing a very accurate prediction rule by combining rough and moderately inaccurate rules-of-thumb.
• Boosting procedure [2]
– Given a set of labeled training examples $(x_i, y_i)$, $i = 1, \dots, N$, where $y_i$ is the label associated with instance $x_i$
– On each round $t = 1, \dots, T$:
• The booster devises a distribution (importance) $D_t$ over the example set
• The booster requests a weak hypothesis (rule-of-thumb) $h_t$ with low error $\epsilon_t$
– After $T$ rounds, the booster combines the weak hypotheses into a single prediction rule.
Boosting Algorithm (cont'd)
• The intuitive idea
Alter the distribution over the domain in a way that increases the probability of the "harder" parts of the space, thus forcing the weak learner to generate new hypotheses that make fewer mistakes on these parts.
• Disadvantages
– Requires prior knowledge of the accuracies of the weak hypotheses
– The performance bound depends only on the accuracy of the least accurate weak hypothesis
Background of AdaBoost [2]
AdaBoost Algorithm [2]
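The algorithm figure from [2] does not survive in this text version. The following is a minimal MATLAB sketch of the binary AdaBoost loop on toy data similar to the 10-point example shown later in the deck; the 1-D decision-stump weak learner and all variable names are illustrative assumptions, not the project's actual WeakLearn.

% Minimal sketch of binary AdaBoost (after [2],[3]) with an assumed
% 1-D decision-stump weak learner on toy data.
x = (1:10)';                               % toy 1-D training points
y = [1 1 1 -1 -1 -1 1 1 1 -1]';            % labels in {-1,+1}
N = numel(x);  T = 5;
D = ones(N,1)/N;                           % uniform initial distribution
alpha = zeros(T,1);  thr = zeros(T,1);  pol = zeros(T,1);
cand = [x' - 0.5, max(x) + 0.5];           % candidate stump thresholds
for t = 1:T
    best = inf;                            % weak learner: lowest weighted error
    for c = cand
        for p = [1 -1]
            h = p*sign(x - c);  h(h == 0) = p;
            e = sum(D .* (h ~= y));
            if e < best, best = e; thr(t) = c; pol(t) = p; end
        end
    end
    h = pol(t)*sign(x - thr(t));  h(h == 0) = pol(t);
    eps_t = sum(D .* (h ~= y));            % weighted error of h_t
    alpha(t) = 0.5*log((1 - eps_t)/eps_t); % weight of h_t in the final vote
    D = D .* exp(-alpha(t)*y.*h);          % raise weights of mistakes,
    D = D / sum(D);                        % lower weights of correct points
end
H = @(q) sign(sum(alpha .* (pol .* sign(q - thr))));   % final combined rule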
Advantages of AdaBoost
• AdaBoost adjusts adaptively to the errors of the weak hypotheses returned by WeakLearn.
• Unlike the conventional boosting algorithm, the accuracy of the weak hypotheses need not be known ahead of time.
• The update rule reduces the probability assigned to those examples on which the hypothesis makes a good prediction and increases the probability of the examples on which the prediction is poor (written out below).
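For reference, the update rule described above, written in the notation of [3] (labels $y_i \in \{0,1\}$, hypotheses $h_t(x_i) \in [0,1]$):
$$\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}, \qquad D_{t+1}(i) = \frac{D_t(i)}{Z_t}\,\beta_t^{\,1 - |h_t(x_i) - y_i|},$$
where $Z_t$ normalizes $D_{t+1}$ to a distribution. Since $\beta_t < 1$ whenever $\epsilon_t < 1/2$, a correct prediction multiplies an example's weight by roughly $\beta_t$ (shrinking it), while a wrong prediction leaves it essentially unchanged before normalization, so its relative weight grows.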
The error bound [3]
• Suppose the weak learning algorithm WeakLearn, when called by AdaBoost, generates hypotheses with errors $\epsilon_1, \dots, \epsilon_T$. Then the error $\epsilon = \Pr_{i \sim D}\left[h_f(x_i) \neq y_i\right]$ of the final hypothesis $h_f$ output by AdaBoost is bounded above by
$$\epsilon \le 2^T \prod_{t=1}^{T} \sqrt{\epsilon_t (1 - \epsilon_t)}.$$
Note that the errors generated by WeakLearn are not uniform, and the final error depends on the errors of all of the weak hypotheses. Recall that the errors of the previous boosting algorithms depend only on the maximal error of the weakest hypothesis and ignore the advantage that can be gained from hypotheses whose errors are smaller.
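As a quick numerical illustration of the bound (numbers chosen for illustration, not taken from [3]): if every weak hypothesis achieves $\epsilon_t = 0.2$, each round contributes a factor $2\sqrt{0.2 \cdot 0.8} = 0.8$, so
$$\epsilon \le \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)} = 0.8^{\,T}, \qquad 0.8^{10} \approx 0.107,$$
i.e. the bound shrinks geometrically as long as every $\epsilon_t$ stays bounded away from $1/2$.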
Outline
• Introduction and background of Boosting and AdaBoost
• AdaBoost Algorithm example
• AdaBoost Algorithm in current project
• Experiment results
• Discussion and conclusion
A toy example [2]
Training set: 10 points (represented by plus or minus).
Original status: equal weights for all training samples.
A toy example (cont'd)
Round 1: Three "plus" points are not correctly classified; they are given higher weights.
A toy example (cont'd)
Round 2: Three "minus" points are not correctly classified; they are given higher weights.
A toy example (cont'd)
Round 3: One "minus" and two "plus" points are not correctly classified; they are given higher weights.
A toy example (cont'd)
Final classifier: the three "weak" classifiers are combined into a single strong classifier (the voting rule is written out below).
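In the binary formulation of [2] used for this toy example, the final strong classifier is a weighted majority vote of the three weak classifiers:
$$H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{3} \alpha_t h_t(x)\right), \qquad \alpha_t = \frac{1}{2}\ln\frac{1 - \epsilon_t}{\epsilon_t},$$
so rounds with lower error $\epsilon_t$ get a larger say in the vote.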
Outline
• Introduction and background of Boosting and AdaBoost
• AdaBoost Algorithm example
• AdaBoost Algorithm in current project
• Experiment results
• Discussion and conclusion
Look at AdaBoost [3] Again
AdaBoost (cont'd): Multi-class Extensions
• The previous discussion is restricted to binary classification problems. The label set Y can contain any number of labels, giving a multi-class problem.
• The multi-class case (AdaBoost.M1) requires the accuracy of each weak hypothesis to be greater than 1/2. In the multi-class setting this condition is stronger than in the binary case, since random guessing over k classes achieves accuracy only 1/k rather than 1/2.
AdaBoost.M1
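The AdaBoost.M1 pseudocode figure is not reproduced here; its distinguishing pieces, as given in [3], are:
• the weak hypotheses are multi-class, $h_t : X \to Y$, with weighted error $\epsilon_t = \sum_{i:\,h_t(x_i) \neq y_i} D_t(i)$, and the loop aborts if $\epsilon_t \ge 1/2$;
• with $\beta_t = \epsilon_t / (1 - \epsilon_t)$, the weights are updated as $D_{t+1}(i) = D_t(i)\,\beta_t / Z_t$ if $h_t(x_i) = y_i$, and $D_{t+1}(i) = D_t(i) / Z_t$ otherwise;
• the final hypothesis is $h_f(x) = \arg\max_{y \in Y} \sum_{t:\,h_t(x) = y} \log(1/\beta_t)$.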
Error Upper Bound of AdaBoost.M1 [3]
• As in the binary classification case, the error of the final hypothesis is bounded:
$$\epsilon \le 2^T \prod_{t=1}^{T} \sqrt{\epsilon_t (1 - \epsilon_t)}.$$
How does AdaBoost.M1 work? [4]
Adaboost in our project
AdaBoost in our project
• 1) The initialization sets the total weight of the target class equal to the total weight of all other examples (see the sketch below):
bird[1,…,10] = ½ * 1/10;
otherstaff[1,…,690] = ½ * 1/690;
• 2) The history record is preserved to strengthen the updating process of the weights.
• 3) The unified model obtained from CPM alignment is used for the training process.
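A minimal MATLAB sketch of this asymmetric initialization (n_bird, n_other and D are illustrative names, not the project's actual code):

n_bird  = 10;                        % target-class examples
n_other = 690;                       % all other examples
D = zeros(n_bird + n_other, 1);
D(1:n_bird)       = 0.5 / n_bird;    % half of the total weight on the target class
D(n_bird + 1:end) = 0.5 / n_other;   % the other half spread over everything else
% sum(D) == 1, so D is a valid initial distribution for the boosting rounds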
AdaBoost in our project
• 2) The history record
[Figure: weight histogram with History Record vs. weight histogram without History Record]
AdaBoost in our project
• 3) The unified model obtained from CPM alignment is used for the training process. This has reduced the overfitting problem.
3.1) Overfitting problem
3.2) CPM model
AdaBoost in our project
3.1) Overfitting problem
Why does the trained AdaBoost not work for birds 11–20? I have compared:
I) the rank of the alpha value for each of the 60 classifiers
II) how each classifier actually detected birds in the training process
III) how each classifier actually detected birds in the test process
The covariances were also computed for comparison:
cov(c(:,1),c(:,2))
ans = 305.0000   6.4746
        6.4746 305.0000
K>> cov(c(:,1),c(:,3))
Overfitted!
ans = 305.0000  92.8644
       92.8644 305.0000
K>> cov(c(:,2),c(:,3))
ans = 305.0000 -46.1186
      -46.1186 305.0000
The training data is different from the test data; this is very common.
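For reference, MATLAB's two-argument cov(a,b) returns the 2-by-2 covariance matrix of [a b]: var(a) and var(b) on the diagonal and cov(a,b) off the diagonal, so the off-diagonal entries above (6.47, 92.86, -46.12) are the covariances quoted on the following slides. A tiny self-contained example:

a = (1:5)';  b = [2 1 4 3 5]';
cov(a, b)    % ans = [2.5 2.0; 2.0 2.5]: variances 2.5 on the diagonal,
             % covariance 2.0 off the diagonal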
AdaBoost in our project
Train result (covariance: 6.4746)
AdaBoost in our project
Comparison: train & test result (covariance: 92.8644)
AdaBoost in our project
3.2) CPM: the continuous profile model, proposed by Jennifer Listgarten. It is very useful for data alignment.
AdaBoost in our project
• The alignment results from the CPM model:
[Figure: "Aligned and Scaled Data" plotted against latent time, and "Unaligned and Unscaled Data" plotted against experimental time]
AdaBoost in our project
• The unified model from CPM alignment:
[Figure: unified model without resampling (left) and after upsampling and downsampling (right)]
AdaBoost in our project
• The influence of CPM on the history record:
[Figure: History Record (without CPM Alignment) vs. History Record (using CPM Alignment)]
Outline
• Introduction and background of Boosting and AdaBoost
• AdaBoost Algorithm example
• AdaBoost Algorithm in current project
• Experiment results
• Discussion and conclusion
Browse all birds
Curvature Descriptor
Distance Descriptor
Adaboost without CPM
AdaBoost without CPM (cont'd)
Good_Part_Selected (AdaBoost without CPM, cont'd)
AdaBoost without CPM (cont'd)
• The alpha values (60 classifiers, listed 10 per row):
0.075527  0  0.080877  0.168358  0  0  0  0  0.146951  0.007721
0.218146  0  0.081063  0  0  0.060681  0  0  0.197824  0
0.08873  0  0.080742  0.015646  0  0.080659  0.269843  0  0.028159  0
0  0.19772  0.086019  0.217678  0  0.21836  0  0.080554  0  0
0  0.190074  0  0.21237  0  0  0  0  0  0.060744
0  0  0  0  0.179449  0.338801  0.080667  0.080895  0  0.267993
• Other statistics: zero rate: 0.5333; covariance: 0.0074; median: 0.0874
Adaboost with CPM
AdaBoost with CPM (cont'd)
AdaBoost with CPM (cont'd)
Good_Part_Selected (AdaBoost with CPM, cont'd)
AdaBoost with CPM (cont'd)
• The alpha values (60 classifiers, listed 10 per row):
2.521895  0  2.510827  0.714297  0  0  1.646754  0  0  0
0  0  2.134926  0  2.167948  0  2.526712  0  0.279277  0
0  0  0.0635  2.322823  0  0  2.516785  0  0  0
0  0.04174  0  0.207436  0  0  0  0  1.30396  0
0  0.951666  0  2.513161  2.530245  0  0  0  0  0
0  0.041627  2.522551  0  0.72565  0  2.506505  1.303823  0  1.611553
• Other statistics: zero rate: 0.6167; covariance: 0.9488; median: 1.6468
Outline
• Introduction and background of Boosting and AdaBoost
• AdaBoost Algorithm example
• AdaBoost Algorithm in current project
• Experiment results
• Discussion and conclusion
Conclusion and discussion
1) AdaBoost works with the CPM unified model; this model has smoothed the training data set and decreased the influence of overfitting.
2) The influence of the history record is very interesting: it suppresses the noise and strengthens the WeakLearn boosting direction.
3) The step length of the KNN selected by AdaBoost is not discussed here; it is also useful for suppressing noise.
Conclusion and discussion (cont'd)
4) AdaBoost does not depend on the training order: the obtained alpha values have very similar distributions for all the classifiers. Two examples follow.
Example 1: four different training orders give the following alpha values:
1) 6 birds:
Alpha_All1 = 0.4480 0.1387 0.2074 0.5949 0.5868 0.3947 0.3874 0.5634 0.6694 0.7447
2) 6 birds:
Alpha_All2 = 0.3998 0.0635 0.2479 0.6873 0.5868 0.2998 0.4320 0.5581 0.6946 0.7652
3) 6 birds:
Alpha_All3 = 0.4191 0.1301 0.2513 0.5988 0.5868 0.2920 0.4286 0.5503 0.6968 0.7134
4) 6 birds:
Alpha_All4 = 0.4506 0.0618 0.2750 0.5777 0.5701 0.3289 0.5948 0.5857 0.7016 0.6212
Conclusion and discussion (cont'd)
Conclusion and discussion (cont'd)
Example 2: 60 parts from the Curvature Descriptor and 60 from the Distance Descriptor;
1) they are first trained independently;
2) then they are combined and trained together.
The results are as follows:
Conclusion and discussion (cont'd)
Conclusion and discussion (cont'd)
5) How to combine the curvature and distance descriptors will be another important problem. Currently I can obtain nice results by combining them: all 10 birds are found.
Are they stable for all other classes? How can the improved AdaBoost be integrated to combine the two descriptors? Maybe AdaBoost will improve even further (for more general objects, for example elephant or camel).
Conclusion and discussion (cont'd)
Current results without AdaBoost:
Conclusion and discussion (cont'd)
6) What is the influence of the search order? Could we try reversing the search order? My current result improved by one more bird, but not by much.
7) How many models could we obtain from the CPM model? Currently I am using only one unified model.
8) Why does the rescaled model not work? (I do not think curvature is that sensitive to rescaling.)
9) Could we try boosting a neural network?
Conclusion and discussion (cont'd)
10) Could we try changing the boosting function? Currently I am using the logistic regression projection function to transform the error information into the alpha value; there are many other methods for this, for example C4.5, decision stump, decision table, naïve Bayes, voted perceptron, ZeroR, etc.
11) Could a decision tree replace AdaBoost? I think this would slow down the search, but I am not sure about the quality.
Conclusion and discussion (cont'd)
12) How about using a fuzzy SVM or ν-SVM to address this good-parts selection problem?
13) How can we understand the difference between good parts selected by the computer and by a human? (Do the parts from the computer program have similar semantic meaning?)
14) How stable are the Curvature and Distance Descriptors?
Thanks!
Reference
• [1] Yoav Freund and Robert Schapire. A Short Introduction to Boosting.
• [2] Robert Schapire. The Boosting Approach to Machine Learning. Princeton University.
• [3] Yoav Freund and Robert Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting.
• [4] R. Polikar. Ensemble Based Systems in Decision Making. IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.