Optimal and Adaptive Algorithms for Online Boosting

Alina Beygelzimer¹   Satyen Kale¹   Haipeng Luo²
¹ Yahoo Labs, New York
² Computer Science Department, Princeton University

1. Motivation and Main Results

Boosting:
▸ a well-known ensemble learning method that combines rules of thumb from a weak learning algorithm.
▸ extensively studied in the batch setting.
However, the online learning setting is gaining more and more popularity, since
▸ it usually does not require storing all the data.
▸ it works even on adversarial data and is able to adapt to changing environments.

This work: a theoretical study of online boosting algorithms, complemented with an extensive experimental evaluation, improving previous work in several aspects:
▸ relies on a weaker and more natural assumption.
▸ works for both example sampling and weighting models.
▸ gives one optimal algorithm and one adaptive, parameter-free algorithm.
2. Setup and Assumptions

On each time step t = 1, . . . , T:
▸ an adversary chooses an example (x_t, y_t) ∈ X × {−1, 1} and reveals x_t to the online learner.
▸ the learner predicts its label ŷ_t ∈ {−1, 1} and suffers the 0-1 loss 1{ŷ_t ≠ y_t}.
Weak online learner (with edge γ and excess loss S):
    Σ_{t=1}^T 1{ŷ_t ≠ y_t} ≤ (1/2 − γ) T + S.
Strong online learner (for any target error rate ε and excess loss S):
    Σ_{t=1}^T 1{ŷ_t ≠ y_t} ≤ ε T + S.
If the weak learner is derived from an online learning algorithm with O(√T) regret, then the weak learning condition holds with excess loss S = O(√T).
High level results:
batch weak learning assumption
+ online agnostic learnability
⇓
existence of weak online learners
⇓ via online boosting algorithms
existence of strong online learners
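The reduction behind this chain can be pictured as follows. This is a generic sketch, not the paper's specific algorithms: the weak-learner interface (predict, and update with an importance weight) and the placeholder weighting rule are assumptions made here for illustration.

class OnlineBooster:
    def __init__(self, weak_learners, votes=None):
        self.wls = weak_learners                       # N weak online learners
        self.votes = votes or [1.0] * len(weak_learners)

    def predict(self, x):
        # Strong prediction: sign of a weighted vote over the weak predictions.
        score = sum(a * wl.predict(x) for a, wl in zip(self.votes, self.wls))
        return 1 if score >= 0 else -1

    def update(self, x, y):
        # After the true label y is revealed, each weak learner receives the
        # example back with an importance weight chosen by the booster; this
        # weighting rule is where concrete algorithms differ.
        preds = [wl.predict(x) for wl in self.wls]
        s = 0.0                                        # running weighted margin
        for a, wl, h in zip(self.votes, self.wls, preds):
            wl.update(x, y, weight=self.example_weight(s))
            s += a * y * h

    def example_weight(self, margin):
        # Placeholder: a constant weight; a concrete booster would use, e.g.,
        # a BBM-potential-based or logistic-loss-based rule here.
        return 1.0

With example_weight returning a constant, this is just an unweighted vote; different choices of the weighting rule and of the votes give different online boosting algorithms.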
3. An Optimal Algorithm: Online Boost-By-Majority

Generalizing the batch boost-by-majority algorithm [Freund, 1995], we propose an online version of BBM (Online BBM).
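For intuition, here is a rough sketch of a boost-by-majority style weighting rule that such an algorithm can build on. The potential recursion below is the standard one from the batch BBM analysis and is an assumption of this sketch; the paper's Online BBM differs in details (e.g., whether weights are used directly or converted into sampling probabilities).

from functools import lru_cache

@lru_cache(maxsize=None)
def potential(k, s, gamma):
    # Probability that a biased ±1 random walk of length k (step +1 with
    # probability 1/2 + gamma) started at position s ends non-positive,
    # i.e. the final majority vote over the remaining k weak learners fails.
    if k == 0:
        return 1.0 if s <= 0 else 0.0
    return ((0.5 + gamma) * potential(k - 1, s + 1, gamma)
            + (0.5 - gamma) * potential(k - 1, s - 1, gamma))

def bbm_weight(i, s, N, gamma):
    # Importance weight for weak learner i (0-indexed) when the margin of the
    # first i weak learners on the current example is s: the drop in potential
    # between this learner being wrong (s - 1) and being right (s + 1).
    remaining = N - i - 1
    return potential(remaining, s - 1, gamma) - potential(remaining, s + 1, gamma)

Note that the recursion needs γ and the number of weak learners up front, which reflects why this style of algorithm must know the edge parameter γ in advance.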
4. An Adaptive Algorithm: AdaBoost.OL

Using the theory of online loss minimization, we propose a second algorithm, AdaBoost.OL, which is adaptive and parameter-free.
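To make the online loss minimization idea concrete, here is a schematic sketch only: it assumes the vote weights are tuned by a simple online gradient step on the logistic loss of the running margin, and that the example weight handed to each weak learner is the logistic derivative at that margin. The step size lr and the exact update rule are assumptions of this sketch and differ from the paper's algorithm.

import math

def adaptive_boost_update(weak_learners, alphas, x, y, lr=0.1):
    # One online round: y in {-1, +1}; alphas are per-learner vote weights.
    preds = [wl.predict(x) for wl in weak_learners]
    s = 0.0                                   # running weighted margin
    for i, (wl, h) in enumerate(zip(weak_learners, preds)):
        # Example weight for weak learner i: magnitude of the derivative of
        # the logistic loss log(1 + exp(-y * s)) at the current margin.
        w = 1.0 / (1.0 + math.exp(y * s))
        wl.update(x, y, weight=w)
        # Online gradient step on log(1 + exp(-y * (s + alpha_i * h))).
        grad = -y * h / (1.0 + math.exp(y * (s + alphas[i] * h)))
        alphas[i] -= lr * grad
        s += alphas[i] * y * h

Nothing in this update depends on γ, which reflects why an algorithm of this flavor can be adaptive and parameter-free (the lr here is only an artifact of the simplified sketch).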
5. Theoretical Comparisons

To achieve error rate ε with weak learners of edge γ:

Algorithm                      # Weak Learners       # Examples
Online BBM                     O((1/γ²) ln(1/ε))     Õ(1/(εγ²))
AdaBoost.OL                    O(1/(εγ²))            Õ(1/(ε²γ⁴))
OSBoost [Chen et al., 2012]    O(1/(εγ²))            Õ(1/(εγ²))

Online BBM
▸ is optimal (i.e., no algorithm can achieve the same error rate with fewer weak learners or fewer examples),
▸ but requires knowing the edge parameter γ.

AdaBoost.OL
▸ is suboptimal,
▸ but is adaptive and parameter-free.

6. Experiments
Implemented in Vowpal Wabbit:
vw --boosting arg (use arg weak learners)
Options:
--alg arg (BBM, logistic, etc)
--gamma arg (γ, used only by BBM)
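For example, a hypothetical invocation (assuming a VW-format training file named train.vw, and that the algorithm name is passed verbatim as listed above):
vw --boosting 10 --alg BBM --gamma 0.1 train.vw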
Experimental setup:
▸ default Vowpal Wabbit learner used as the "weak" learner
▸ learning rate, N, and γ tuned using the online loss on the training set; test loss reported
▸ average improvement over the weak learner:
  Online BBM: 5.14%, AdaBoost.OL: 2.67%, OSBoost: 1.98%
Dataset        VW baseline   Online BBM   AdaBoost.OL   OSBoost
20news         0.0812        0.0775       0.0777        0.0791
a9a            0.1509        0.1495       0.1497        0.1509
activity       0.0133        0.0114       0.0128        0.0130
adult          0.1543        0.1526       0.1536        0.1539
bio            0.0035        0.0031       0.0032        0.0033
census         0.0471        0.0469       0.0469        0.0469
covtype        0.2563        0.2347       0.2495        0.2470
letter         0.2295        0.1923       0.2078        0.2148
maptaskcoref   0.1091        0.1077       0.1083        0.1093
nomao          0.0641        0.0627       0.0635        0.0627
poker          0.4555        0.4312       0.4555        0.4555
rcv1           0.0487        0.0485       0.0484        0.0488
vehv2binary    0.0292        0.0286       0.0291        0.0284