Learning with AdaBoost
Fall 2007

Outline
• Introduction and background of boosting and AdaBoost
• AdaBoost algorithm example
• AdaBoost algorithm in the current project
• Experiment results
• Discussion and conclusion

Boosting Algorithm
Definition of boosting [1]: boosting refers to a general method of producing a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb.
Boosting procedure [2]: given a set of labeled training examples $\{(x_i, y_i)\}_{i=1}^{N}$, where $y_i$ is the label associated with instance $x_i$, on each round $t = 1, \dots, T$:
• the booster devises a distribution (importance) $D_t$ over the example set;
• the booster requests a weak hypothesis (rule of thumb) $h_t$ with low error $\epsilon_t$.
After $T$ rounds, the booster combines the weak hypotheses into a single prediction rule.

Boosting Algorithm (cont'd)
The intuitive idea: alter the distribution over the domain in a way that increases the probability of the "harder" parts of the space, thus forcing the weak learner to generate new hypotheses that make fewer mistakes on these parts.
Disadvantages of the earlier boosting procedures:
• they need prior knowledge of the accuracies of the weak hypotheses;
• their performance bounds depend only on the accuracy of the least accurate weak hypothesis.

Background of AdaBoost [2]

AdaBoost Algorithm [2]

Advantages of AdaBoost
AdaBoost adapts to the errors of the weak hypotheses returned by WeakLearn. Unlike the conventional boosting algorithms, the accuracies of the weak hypotheses need not be known ahead of time.
The update rule reduces the probability assigned to those examples on which the hypothesis makes a good prediction and increases the probability of the examples on which the prediction is poor.

The Error Bound [3]
Suppose the weak learning algorithm WeakLearn, when called by AdaBoost, generates hypotheses with errors $\epsilon_1, \dots, \epsilon_T$. Then the error $\epsilon = \Pr_{i \sim D}[h_f(x_i) \neq y_i]$ of the final hypothesis $h_f$ output by AdaBoost is bounded above by
$$\epsilon \le 2^T \prod_{t=1}^{T} \sqrt{\epsilon_t (1 - \epsilon_t)}.$$
Note that the errors generated by WeakLearn need not be uniform, and the final bound depends on the errors of all of the weak hypotheses. Recall that the bounds for the previous boosting algorithms depend only on the maximal error of the weakest hypothesis and ignore the advantage that can be gained from hypotheses whose errors are smaller.
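The algorithm slides above are reproduced as figures; as a concrete reference, the following is a minimal Python/NumPy sketch of the binary AdaBoost loop described in [3] (beta_t = eps_t / (1 - eps_t), correctly classified examples down-weighted by beta_t, and voting weights log(1/beta_t)). The decision-stump weak learner and all names here are illustrative assumptions, not the project's code.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Decision stump: predict 1 on one side of the threshold, 0 on the other."""
    if polarity == 1:
        return (X[:, feature] >= threshold).astype(float)
    return (X[:, feature] < threshold).astype(float)

def train_stump(X, y, D):
    """Hypothetical WeakLearn: exhaustively pick the stump with lowest weighted error."""
    best, best_err = None, np.inf
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                err = np.sum(D * (pred != y))
                if err < best_err:
                    best_err, best = err, (feature, threshold, polarity)
    return best, best_err

def adaboost(X, y, T):
    """Binary AdaBoost as in [3]; y must be 0/1. Returns stumps and log(1/beta) weights."""
    N = len(y)
    D = np.full(N, 1.0 / N)              # initial distribution over the examples
    hypotheses, alphas = [], []
    for t in range(T):
        (f, thr, pol), eps = train_stump(X, y, D)
        eps = max(eps, 1e-10)            # guard against a perfect stump
        if eps >= 0.5:                   # weak learner no better than chance: stop
            break
        beta = eps / (1.0 - eps)
        pred = stump_predict(X, f, thr, pol)
        # correctly classified examples are down-weighted by beta, then renormalized
        D = D * np.where(pred == y, beta, 1.0)
        D = D / D.sum()
        hypotheses.append((f, thr, pol))
        alphas.append(np.log(1.0 / beta))
    return hypotheses, alphas

def predict(hypotheses, alphas, X):
    """Final hypothesis: weighted vote, thresholded at half the total weight."""
    votes = np.zeros(len(X))
    for (f, thr, pol), a in zip(hypotheses, alphas):
        votes += a * stump_predict(X, f, thr, pol)
    return (votes >= 0.5 * sum(alphas)).astype(int)
```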
A Toy Example [2]
Training set: 10 points (represented by plus or minus).
Initial status: equal weights for all training samples.

A Toy Example (cont'd)
Round 1: three "plus" points are not correctly classified; they are given higher weights.

A Toy Example (cont'd)
Round 2: three "minus" points are not correctly classified; they are given higher weights.

A Toy Example (cont'd)
Round 3: one "minus" and two "plus" points are not correctly classified; they are given higher weights.

A Toy Example (cont'd)
Final classifier: integrate the three "weak" classifiers to obtain the final strong classifier.

Look at AdaBoost [3] Again

AdaBoost (cont'd): Multi-class Extensions
The previous discussion is restricted to binary classification problems. The label set Y could have any number of labels, which gives a multi-class problem.
The multi-class case (AdaBoost.M1) requires each weak hypothesis to have accuracy greater than 1/2 (error less than 1/2). This condition is stronger in the multi-class setting than in the binary case, where an error of 1/2 can already be achieved by random guessing.

AdaBoost.M1

Error Upper Bound of AdaBoost.M1 [3]
As in the binary classification case, the error of the final hypothesis is bounded:
$$\epsilon \le 2^T \prod_{t=1}^{T} \sqrt{\epsilon_t (1 - \epsilon_t)}.$$

How Does AdaBoost.M1 Work [4]?

AdaBoost in Our Project
1) The initialization sets the total weight of the target class (birds) equal to the total weight of all the other samples:
   bird[1, ..., 10] = 1/2 * 1/10;   otherstaff[1, ..., 690] = 1/2 * 1/690;
2) A history record is preserved to strengthen the updating process of the weights.
3) The unified model obtained from CPM alignment is used for the training process.

AdaBoost in Our Project
2) The history record:
[Figure: weight_histogram with history record vs. weight_histogram without history record.]

AdaBoost in Our Project
3) The unified model obtained from CPM alignment is used for the training process. This has reduced the overfitting problem.
3.1) Overfitting problem.
3.2) CPM model.

AdaBoost in Our Project
3.1) Overfitting problem: why does the trained AdaBoost not work for birds 11-20?
I have compared:
I) the rank of the alpha value for each of the 60 classifiers;
II) how each classifier actually detected birds during training;
III) how each classifier actually detected birds during testing.
The covariances are also computed for comparison (the training data is different from the test data, which is very common):

cov(c(:,1), c(:,2))
ans =
  305.0000    6.4746
    6.4746  305.0000

cov(c(:,1), c(:,3))        % overfitted!
ans =
  305.0000   92.8644
   92.8644  305.0000

cov(c(:,2), c(:,3))
ans =
  305.0000  -46.1186
  -46.1186  305.0000

AdaBoost in Our Project
Train result (covariance: 6.4746).

AdaBoost in Our Project
Comparison of train and test results (covariance: 92.8644).
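For reference, the same diagnostic can be reproduced in Python/NumPy. The 60 x 3 matrix below is a placeholder standing in for the project's comparison data (alpha rank and per-classifier detections on the training and test sets); only the MATLAB commands above come from the experiments.

```python
import numpy as np

# Placeholder data standing in for the project's 60 x 3 comparison matrix:
# column 0 = rank of each classifier's alpha value,
# column 1 = bird detections by that classifier on the training set,
# column 2 = bird detections by that classifier on the test set.
rng = np.random.default_rng(0)
c = rng.integers(0, 10, size=(60, 3)).astype(float)

# Pairwise 2x2 covariance matrices, analogous to MATLAB's cov(c(:,1), c(:,2)) etc.
print(np.cov(c[:, 0], c[:, 1]))  # alpha rank vs. train detections
print(np.cov(c[:, 0], c[:, 2]))  # alpha rank vs. test detections (overfitting check)
print(np.cov(c[:, 1], c[:, 2]))  # train detections vs. test detections
```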
AdaBoost in Our Project
3.2) CPM: the continuous profile model, put forward by Jennifer Listgarten. It is very useful for data alignment.

AdaBoost in Our Project
The alignment results from the CPM model:
[Figure: "Aligned and Scaled Data" (latent time) vs. "Unaligned and Unscaled Data" (experimental time).]

AdaBoost in Our Project
The unified model from CPM alignment:
[Figure: the unified model without resampling vs. after upsampling and downsampling.]

AdaBoost in Our Project
The influence of CPM on the history record:
[Figure: history record without CPM alignment vs. history record using CPM alignment.]

Experiment Results

Browse All Birds

Curvature Descriptor

Distance Descriptor

AdaBoost without CPM

AdaBoost without CPM (cont'd)

Good_Part_Selected (AdaBoost without CPM, cont'd)

AdaBoost without CPM (cont'd)
The alpha values:
0.075527  0         0.080877  0.168358  0         0         0         0         0.146951  0.007721
0.218146  0         0.081063  0         0         0.060681  0         0         0.197824  0
0.08873   0         0.080742  0.015646  0         0.080659  0.269843  0         0.028159  0
0         0.19772   0.086019  0.217678  0         0.21836   0         0.080554  0         0
0         0.190074  0         0.21237   0         0         0         0         0         0.060744
0         0         0         0         0.179449  0.338801  0.080667  0.080895  0         0.267993
Other statistical data: zero rate 0.5333; covariance 0.0074; median 0.0874.

AdaBoost with CPM

AdaBoost with CPM (cont'd)

AdaBoost with CPM (cont'd)

Good_Part_Selected (AdaBoost with CPM, cont'd)

AdaBoost with CPM (cont'd)
The alpha values:
2.521895  0         2.510827  0.714297  0         0         1.646754  0         0         0
0         0         2.134926  0         2.167948  0         2.526712  0         0.279277  0
0         0         0.0635    2.322823  0         0         2.516785  0         0         0
0         0.04174   0         0.207436  0         0         0         0         1.30396   0
0         0.951666  0         2.513161  2.530245  0         0         0         0         0
0         0.041627  2.522551  0         0.72565   0         2.506505  1.303823  0         1.611553
Other statistical data: zero rate 0.6167; covariance 0.9488; median 1.6468.
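A note on the summary statistics quoted above: the zero rate is taken over all 60 alpha values, while the "covariance" and median appear to be MATLAB's cov and median applied to the non-zero alpha values only (this reproduces the reported numbers for both tables). The helper below is an illustrative sketch, not the original code; only the numbers in the comments come from the experiments.

```python
import numpy as np

def alpha_stats(alpha):
    """Summary statistics as reported for a vector of per-part alpha values:
    zero rate over all entries; variance (what MATLAB's cov returns for a
    single vector) and median over the non-zero entries only."""
    alpha = np.asarray(alpha, dtype=float)
    nonzero = alpha[alpha != 0]
    zero_rate = np.mean(alpha == 0)
    covariance = np.var(nonzero, ddof=1)   # same as MATLAB cov(nonzero)
    median = np.median(nonzero)
    return zero_rate, covariance, median

# With the 60 "with CPM" alpha values listed above this returns approximately
# (0.6167, 0.9488, 1.6468); with the "without CPM" values, approximately
# (0.5333, 0.0074, 0.0874).
```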
Conclusion and Discussion
1) AdaBoost works with the CPM unified model; this model has smoothed the training data set and reduced the influence of overfitting.
2) The influence of the history record is very interesting: it suppresses noise and strengthens the boosting direction of WeakLearn.
3) The step length of the KNN selected by AdaBoost is not discussed here; it is also useful for suppressing noise.

Conclusion and Discussion (cont'd)
4) AdaBoost does not rely on the training order: the obtained alpha values have very similar distributions for all the classifiers. There are two examples.
Example 1: four different training orders produced the following alpha values:
1) 6 birds: Alpha_All1 = 0.4480 0.1387 0.2074 0.5949 0.5868 0.3947 0.3874 0.5634 0.6694 0.7447
2) 6 birds: Alpha_All2 = 0.3998 0.0635 0.2479 0.6873 0.5868 0.2998 0.4320 0.5581 0.6946 0.7652
3) 6 birds: Alpha_All3 = 0.4191 0.1301 0.2513 0.5988 0.5868 0.2920 0.4286 0.5503 0.6968 0.7134
4) 6 birds: Alpha_All4 = 0.4506 0.0618 0.2750 0.5777 0.5701 0.3289 0.5948 0.5857 0.7016 0.6212

Conclusion and Discussion (cont'd)

Conclusion and Discussion (cont'd)
Example 2: 60 parts from the curvature descriptor and 60 from the distance descriptor:
1) they are first trained independently;
2) then they are combined and trained together.
The results are as follows:

Conclusion and Discussion (cont'd)

Conclusion and Discussion (cont'd)
5) How to combine the curvature and distance descriptors is another important problem. Currently I can obtain nice results by combining them: all 10 birds are found. Are the results stable for all other classes? How can the improved AdaBoost be used to combine the two descriptors? Maybe AdaBoost will improve the results even further (for other classes in general, for example elephant or camel).

Conclusion and Discussion (cont'd)
Current results without AdaBoost:

Conclusion and Discussion (cont'd)
6) What is the influence of the search order? Could we try to reverse the search order? My current result has improved by one more bird, but not by much.
7) How many models could we obtain from the CPM model? Currently I am using only one unified model.
8) Why does the rescaled model not work? (I do not think curvature is that sensitive to rescaling.)
9) Could we try to boost a neural network?

Conclusion and Discussion (cont'd)
10) Could we try to change the boosting function? Currently I am using the logistic-regression projection function to map the error information to the alpha value; there are many other methods to do this, for example C4.5, decision stump, decision table, naïve Bayes, voted perceptron, ZeroR, etc.
11) How could a decision tree replace AdaBoost? I think this would slow down the search, but I am not sure about the quality.

Conclusion and Discussion (cont'd)
12) How about fuzzy SVM or SVM for this good-parts selection problem?
13) How should we understand the difference between good parts selected by the computer and by a human? (Do the parts from the computer program have a similar semantic meaning?)
14) How stable are the curvature and distance descriptors?

Thanks!

References
[1] Y. Freund and R. E. Schapire, "A Short Introduction to Boosting."
[2] R. E. Schapire, "The Boosting Approach to Machine Learning: An Overview," Princeton University.
[3] Y. Freund and R. E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting."
[4] R. Polikar, "Ensemble Based Systems in Decision Making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.