Machine learning: Boosting

1. Basic definition

Assume again we have a classification task (e.g. cancer classification) with data $D = \{\mathbf{x}_i, y_i\}_i$ and $y_i = \pm 1$.

Boosting

Assume we have a classifier $p(\mathbf{x})$ which takes a feature vector $\mathbf{x}$ and classifies

$$y = \begin{cases} 1 & \text{if } p(\mathbf{x}) \ge 0 \\ -1 & \text{if } p(\mathbf{x}) < 0. \end{cases}$$

Assume $p(\mathbf{x})$ is 'weak', i.e., its predictions are not always correct. Now assume a family $\{p_j(\mathbf{x})\}_{j=1}^m$ of different weak classifiers.

Boosting

A boosting classifier takes a linear combination of these classifiers to form a better one, for example

$$f(\mathbf{x}) = \sum_{i=1}^m a_i\, p_i(\mathbf{x}),$$

where, e.g., $f(\mathbf{x}) \ge 0$ selects the class $+1$, and otherwise the class $-1$.

2. AdaBoost (adaptive boosting) algorithm (Freund & Schapire, 1997)

Consider data $D = \{\mathbf{x}_i, y_i\}_{i=1}^n$. Weight each data point $(\mathbf{x}_i, y_i)$ equally with weight $W_1(i) = 1/n$. Assume $\mathbf{x}_i \in F =$ feature space.

Goal: build a classifier $f(\mathbf{x})$ which generalizes the data set $D$ to predict the class $y$ of $\mathbf{x}$.

Let $\mathcal{H} =$ the space of allowed (weak) classifier functions $h$.

AdaBoost: introduction

Idea: we will take random samples from the data set $D$ (with repetition) according to the weight distribution $W_1$, and later according to distributions $W_2, W_3, \ldots$ which emphasize the examples $\mathbf{x}_i$ we have misclassified previously.

Specifically: train an initial classifier $h_1(\mathbf{x}) : \mathbf{x} \to \{\pm 1\}$ which minimizes the error with respect to the weight distribution $W_1$. That is,

$$h_1 = \text{the function } h \text{ minimizing the weighted number of errors} = \operatorname*{argmin}_{h \in \mathcal{H}} \sum_{i=1}^n W_1(i)\, I(y_i \ne h(\mathbf{x}_i)), \qquad (1)$$

where

$$I(y_i \ne h(\mathbf{x}_i)) = \begin{cases} 1 & \text{if } y_i \ne h(\mathbf{x}_i) \\ 0 & \text{otherwise.} \end{cases}$$

3. The algorithm

1. Define $\varepsilon_1 =$ the minimal value of the error in (1), i.e. the smallest value of the sum in (1). Require $\varepsilon_1 \le 0.5$; otherwise stop (we then have a bad family of classifiers $h$; we need at least 50% accuracy).

AdaBoost algorithm

2. Choose $\alpha_1$; typically

$$\alpha_1 = \frac{1}{2} \ln \frac{1 - \varepsilon_1}{\varepsilon_1}. \qquad (2)$$

3. Update the weights $W$:

$$W_2(i) = \frac{W_1(i)\, e^{-\alpha_1 y_i h_1(\mathbf{x}_i)}}{Z_1(\alpha_1)},$$

where $Z_1 =$ the normalizing constant which makes $\{W_2(i)\}_i$ a probability distribution (over $i$) that adds up to 1.

AdaBoost algorithm

Now $\{W_2(i)\}_i$ forms a family of weights which we can use to 're-sample' the data set $D$ and form a new classifier $h_2$. Specifically,

$$h_2 = \operatorname*{argmin}_{h \in \mathcal{H}} \sum_{i=1}^n W_2(i)\, I(y_i \ne h(\mathbf{x}_i))$$

(note that if the weight $W_2(i) > 1/n$ this is equivalent to 'oversampling' the data point $\mathbf{x}_i$; otherwise to 'undersampling' $\mathbf{x}_i$).

AdaBoost algorithm

4. Generally, define

$$h_t = \operatorname*{argmin}_{h \in \mathcal{H}} \sum_{i=1}^n W_t(i)\, I(y_i \ne h(\mathbf{x}_i)). \qquad (2a)$$

Again let $\varepsilon_t =$ the smallest value of the sum in (2a), and

$$\alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t}. \qquad (3)$$

AdaBoost algorithm

Then define

$$W_{t+1}(i) = \frac{W_t(i)\, e^{-\alpha_t y_i h_t(\mathbf{x}_i)}}{Z_t}, \qquad (3a)$$

with $Z_t =$ the normalizing constant as before.

AdaBoost algorithm

After $T$ steps, form the final classifier

$$f(\mathbf{x}) = \operatorname{sgn} \sum_{t=1}^T \alpha_t h_t(\mathbf{x}).$$

4. Observations about AdaBoost

Note that the updating equation (3a) has the property

$$e^{-\alpha_{t-1} y_i h_{t-1}(\mathbf{x}_i)} \;\begin{cases} < 1 & \text{if } y_i = h_{t-1}(\mathbf{x}_i) \\ > 1 & \text{if } y_i \ne h_{t-1}(\mathbf{x}_i). \end{cases}$$

Thus, after we find the best classifier $h_{t-1}$ for the sample distribution $W_{t-1}$, the examples $\mathbf{x}_i$ which $h_{t-1}$ identified incorrectly are weighted more heavily for the selection of $h_t$. The classifier based on the distribution $W_t$ will therefore better identify the examples that the previous weak classifier missed.
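The steps above translate directly into code. Below is a minimal from-scratch sketch, assuming NumPy and using decision stumps (single-feature threshold tests) as the weak-classifier family $\mathcal{H}$; the function and variable names (best_stump, adaboost_train, etc.) are illustrative choices, not part of the notes.

```python
# Minimal sketch of discrete AdaBoost with decision stumps as weak classifiers.
import numpy as np

def best_stump(X, y, w):
    """Return the stump (feature j, threshold, sign) minimizing the weighted
    error  sum_i w_i * I(y_i != h(x_i)),  as in equations (1) and (2a)."""
    n, d = X.shape
    best = (0, 0.0, 1, np.inf)                # (feature, threshold, sign, error)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = np.where(X[:, j] >= thr, sign, -sign)
                err = np.sum(w * (pred != y))
                if err < best[3]:
                    best = (j, thr, sign, err)
    return best

def adaboost_train(X, y, T=50):
    """Run T boosting rounds; y must take values +1/-1."""
    n = len(y)
    w = np.full(n, 1.0 / n)                   # W_1(i) = 1/n
    stumps, alphas = [], []
    for t in range(T):
        j, thr, sign, eps = best_stump(X, y, w)
        if eps >= 0.5:                        # weak learner no better than chance: stop
            break
        eps = max(eps, 1e-10)                 # guard against a perfect stump
        alpha = 0.5 * np.log((1 - eps) / eps)     # equation (3)
        pred = np.where(X[:, j] >= thr, sign, -sign)
        w = w * np.exp(-alpha * y * pred)         # equation (3a), unnormalized
        w /= w.sum()                              # divide by Z_t
        stumps.append((j, thr, sign))
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """Final classifier f(x) = sgn( sum_t alpha_t h_t(x) )."""
    F = np.zeros(len(X))
    for (j, thr, sign), alpha in zip(stumps, alphas):
        F += alpha * np.where(X[:, j] >= thr, sign, -sign)
    return np.where(F >= 0, 1, -1)
```

Each round re-weights the training points exactly as in (3a), so the next stump is selected with the previously misclassified points emphasized.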
AdaBoost: additional observations

Recall the normalization constant

$$Z_t(\alpha_t) = \sum_{i=1}^n W_t(i)\, e^{-\alpha_t y_i h_t(\mathbf{x}_i)}. \qquad (4)$$

This is a weighted measure of the total error at step $t$, since

$$y_i h_t(\mathbf{x}_i) = \begin{cases} 1 & \text{if } h_t(\mathbf{x}_i) = y_i \text{ (correct prediction)} \\ -1 & \text{if } h_t(\mathbf{x}_i) \ne y_i \text{ (incorrect prediction),} \end{cases}$$

which means $e^{-\alpha_t y_i h_t(\mathbf{x}_i)}$ is large if $h_t(\mathbf{x}_i) \ne y_i$.

AdaBoost: additional observations

One can show that our definition (3) of $\alpha_t$ minimizes $Z_t(\alpha_t)$ at each step (just use calculus to minimize (4) above with respect to the variable $\alpha_t$).

Note: one can also show that $Z_t = 2\sqrt{\varepsilon_t (1 - \varepsilon_t)}$ for the above choice of $\alpha_t$.

AdaBoost: additional observations

Recall the reweighting formula; applying it recursively gives

$$W_{t+1}(i) = \frac{W_t(i)\, e^{-\alpha_t y_i h_t(\mathbf{x}_i)}}{Z_t} = \frac{e^{-y_i \sum_{q=1}^t \alpha_q h_q(\mathbf{x}_i)}}{n \prod_{q=1}^t Z_q}$$

(recall the initial value $W_1(i) = 1/n$).

AdaBoost: additional observations

One can also show that choosing the $\alpha_t$ to minimize $Z_t$ implies

$$\sum_{i:\, h_t(\mathbf{x}_i) = y_i} W_{t+1}(i) = \sum_{i:\, h_t(\mathbf{x}_i) \ne y_i} W_{t+1}(i);$$

this shows that half of the new weight is concentrated on the examples misclassified by the previous classifier $h_t$ (those with $h_t(\mathbf{x}_i) \ne y_i$).

AdaBoost: examples

Example 1. Training set:
Blue: $N(0, 1)$.
Red: radial density proportional to $\frac{1}{r}\, e^{-\frac{1}{2}(r - 4)^2}$ (the distribution is radial, with uniform angular distribution).

Select the weak (always linear) classifier with the smallest weighted error (all points weighted equally at this stage, $W_1(i) = 1/n$). Then apply the re-weighting formula to find $W_2(i)$, giving more weight to the misclassified examples; the new classifier focuses more on the highly weighted points. Reweigh again based on the previously misclassified examples, and so on.

AdaBoost: examples

Fig. 1: Decision boundary and error rate of the boosted classifier at $t = 1, 2, 3, 4, 5, 6, 7$ and $t = 40$. Here and below the shaded part is currently classified red. (Figures due to J. Matas, J. Šochman.)

Note that the misclassified examples closest to the boundary are the ones which in the end receive the highest weight.

AdaBoost: examples

Baseline: AdaBoost with RBF-network weak classifiers from (Rätsch et al., Machine Learning, 2000).

Blue: discrete AdaBoost; Green: continuous AdaBoost; Red: AdaBoost with a totally corrective step (TCA); Cyan: continuous AdaBoost with a totally corrective step. Dashed lines: training (leave-one-out) error; solid lines: test error.

Figures shown for: the dual-distribution example above; serum antibodies as predictors of thyroiditis; breast cancer classification; diabetes from blood markers; heart disease from blood markers. (All figures: J. Matas and J. Šochman, 2005.)

AdaBoost: examples

Long and Vega (2003): AdaBoost vs. other classifiers in microarray cancer diagnosis (cross-validation error rates).
HCC = hepatocellular carcinoma; ER = estrogen response in breast cancer; LN = lymph node spread of cancer.
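To get a feel for experiments like those above, here is a small sketch using scikit-learn's AdaBoostClassifier with decision stumps on synthetic data resembling the radial toy example. The dataset construction, sample sizes, and numbers of rounds are illustrative assumptions, not the exact setup behind the figures.

```python
# Sketch: discrete AdaBoost (stumps) on a Gaussian-vs-ring synthetic dataset.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Class -1 ("blue"): standard 2-D Gaussian N(0, I).
X_blue = rng.normal(size=(n, 2))

# Class +1 ("red"): radius ~ N(4, 1), angle uniform on [0, 2*pi).
r = rng.normal(4.0, 1.0, size=n)
theta = rng.uniform(0.0, 2 * np.pi, size=n)
X_red = np.column_stack([r * np.cos(theta), r * np.sin(theta)])

X = np.vstack([X_blue, X_red])
y = np.concatenate([-np.ones(n), np.ones(n)])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for T in (1, 5, 10, 40):
    clf = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),  # decision stumps
        n_estimators=T,                                 # number of boosting rounds T
    )
    clf.fit(X_tr, y_tr)
    print(T, "rounds: test error =", round(1 - clf.score(X_te, y_te), 3))
```

Note the `estimator` keyword is the name used in recent scikit-learn releases (older versions call it `base_estimator`); the test error should drop as the number of rounds $T$ grows, mirroring the behaviour in Fig. 1.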