Online Learning
Rong Jin

Batch Learning
• Given a collection of training examples D
• Learn a classification model from D
• What if training examples are received one at a time?

Online Learning
For t = 1, 2, …, T:
• Receive an instance x_t
• Predict its class label ŷ_t
• Receive the true class label y_t
• Incur a loss ℓ(ŷ_t, y_t)
• Update the classification model

Objective
• Minimize the total loss Σ_{t=1}^{T} ℓ(ŷ_t, y_t)
• Loss functions:
  – Zero-one loss: ℓ(ŷ, y) = 1 if ŷ ≠ y, and 0 otherwise
  – Hinge loss: ℓ(f(x), y) = max(0, 1 − y f(x))

Loss Functions
(Figure: the hinge loss and the zero-one loss plotted against the margin y f(x); the hinge loss upper-bounds the zero-one loss.)

Linear Classifiers
• Restrict our discussion to linear classifiers f(x) = w · x
• Prediction: ŷ = sign(w · x)
• Confidence: |w · x|

Separable Set
(Figure: a data set that a linear classifier separates with a positive margin.)

Inseparable Sets
(Figure: data sets that no linear classifier can separate.)

Why Online Learning?
• Fast
• Memory efficient – processes one example at a time
• Simple to implement
• Formal guarantees – regret/mistake bounds
• Online-to-batch conversions
• No statistical assumptions
• Adaptive
• But not as good as a well-designed batch algorithm

Update Rules
• Online algorithms are based on an update rule that defines w_{t+1} from w_t (and possibly other information)
• Linear classifiers: find w_{t+1} from w_t based on the input (x_t, y_t)
• Some update rules:
  – Perceptron (Rosenblatt)
  – ALMA (Gentile)
  – ROMMA (Li & Long)
  – NORMA (Kivinen et al.)
  – MIRA (Crammer & Singer)
  – EG (Littlestone & Warmuth)
  – Bregman-based (Warmuth)

Perceptron
Initialize w_1 = 0
For t = 1, 2, …, T:
• Receive an instance x_t
• Predict its class label ŷ_t = sign(w_t · x_t)
• Receive the true class label y_t
• If ŷ_t ≠ y_t, then w_{t+1} = w_t + y_t x_t

Geometrical Interpretation
(Figure: on a mistake, the update w_{t+1} = w_t + y_t x_t rotates w toward the misclassified example.)

Mistake Bound: Separable Case
• Assume the data set D is linearly separable with margin γ, i.e., there is a unit vector u with y_t (u · x_t) ≥ γ for all t
• Assume ‖x_t‖ ≤ R for all t
• Then the maximum number of mistakes made by the Perceptron algorithm is bounded by R²/γ²
• Proof sketch: each mistake increases u · w_t by at least γ while increasing ‖w_t‖² by at most R²; after M mistakes, Mγ ≤ u · w_{T+1} ≤ ‖w_{T+1}‖ ≤ R√M, which gives M ≤ R²/γ²

Mistake Bound: Inseparable Case
• Let u be the best linear classifier in hindsight
• We measure our progress by how w_t evolves relative to u
• Consider a round t on which we make a mistake: the update moves w_t toward u whenever u classifies (x_t, y_t) well, and the damage is controlled by the hinge loss of u on that example
• Result 1 and Result 2 bound the number of mistakes in terms of the cumulative hinge loss of u; when u has zero hinge loss, they reduce to the separable-case bound R²/γ²

Perceptron with Projection
Initialize w_1 = 0
For t = 1, 2, …, T:
• Receive an instance x_t
• Predict its class label ŷ_t = sign(w_t · x_t)
• Receive the true class label y_t
• If ŷ_t ≠ y_t, then w_{t+1} = w_t + y_t x_t
• If ‖w_{t+1}‖ exceeds a prescribed radius, then project w_{t+1} back onto the ball of that radius

Remarks
• The mistake bound is measured for a sequence of classifiers, not a single one
• The bound does not depend on the dimension of the feature vector
• The bound holds for all sequences (no i.i.d. assumption)
• It is not tight for most real-world data, but it cannot be further improved in general

Perceptron Is Conservative
• The Perceptron updates the classifier only when it misclassifies an example; rounds with correct predictions leave w_t unchanged

Aggressive Perceptron
Initialize w_1 = 0
For t = 1, 2, …, T:
• Receive an instance x_t
• Predict its class label ŷ_t = sign(w_t · x_t)
• Receive the true class label y_t
• If the margin y_t (w_t · x_t) is too small (i.e., the hinge loss is positive), then update w_{t+1} = w_t + y_t x_t, even when the prediction is correct

Regret Bound
• Regret compares the learner's cumulative loss to that of the best fixed classifier in hindsight:
  Regret(T) = Σ_t ℓ(w_t; x_t, y_t) − min_w Σ_t ℓ(w; x_t, y_t)

Learning a Classifier
• The evaluation (mistake bound or regret bound) concerns a sequence of classifiers w_1, …, w_T
• But, by the end of the day, which classifier should be used? The last one? One chosen by cross-validation?

Learning with Expert Advice
• Learn to combine the predictions from multiple experts
• An ensemble of d experts: e_1, …, e_d, each predicting a label in {−1, +1}
• Combination weights: w = (w_1, …, w_d), nonnegative
• Combined classifier: a weighted majority vote, sign(Σ_i w_i e_i(x))

Hedge: Simple Case
• Suppose there exists one expert, denoted i*, who can perfectly classify all the training examples
• What is your learning strategy? (E.g., discard every expert that makes a mistake; the perfect expert always survives)

Hedge: Difficult Case
• What if we don't have such a perfect expert?

Hedge Algorithm
(Figure: example of combining ±1 expert predictions by a weighted vote.)

Initialize w_{1,i} = 1 for i = 1, …, d
For t = 1, 2, …, T:
• Receive a training example (x_t, y_t)
• Prediction: weighted majority vote, ŷ_t = sign(Σ_i w_{t,i} e_i(x_t))
• If ŷ_t ≠ y_t, then for i = 1, 2, …, d:
  – If e_i(x_t) ≠ y_t, then w_{t+1,i} = β w_{t,i} for some β ∈ (0, 1)

Mistake Bound
• Measure the progress by the total weight W_t = Σ_i w_{t,i}
• Lower bound: if the best expert i* makes M* mistakes, its weight after T rounds is at least β^{M*}, so W_T ≥ β^{M*}
• Upper bound: whenever the combined classifier errs, experts holding at least half of the total weight are wrong, so W_{t+1} ≤ W_t (1 + β)/2; after M combined mistakes, W_T ≤ d ((1 + β)/2)^M
• Combining the bounds: β^{M*} ≤ d ((1 + β)/2)^M, which gives
  M ≤ (M* ln(1/β) + ln d) / ln(2/(1 + β))
• The weighted majority vote therefore makes at most a constant factor more mistakes than the best expert, plus an O(ln d) overhead
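To make the Perceptron loop concrete, here is a minimal NumPy sketch of the algorithm as described on the slides. The function name, the toy data stream, and the margin filter are illustrative choices of mine, not part of the original lecture.

```python
import numpy as np

def perceptron(stream):
    """Online Perceptron over a stream of (x, y) pairs, y in {-1, +1}.

    Conservative update: the weight vector changes only on mistakes.
    Returns the final weight vector and the mistake count.
    """
    w = None
    mistakes = 0
    for x, y in stream:
        if w is None:
            w = np.zeros_like(x, dtype=float)  # initialize w_1 = 0
        y_hat = 1.0 if w @ x >= 0 else -1.0    # predict sign(w . x)
        if y_hat != y:                         # mistake: w_{t+1} = w_t + y_t x_t
            w = w + y * x
            mistakes += 1
    return w, mistakes

# Toy usage: a linearly separable stream labeled by the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
X = X[np.abs(X[:, 0]) > 0.2]   # enforce a positive margin
Y = np.sign(X[:, 0])
w, m = perceptron(zip(X, Y))
print(f"mistakes = {m}, final w = {w}")
```

Because this toy stream has margin γ > 0 and bounded norm R, the printed mistake count stays below R²/γ², as the separable-case bound guarantees.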
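The Hedge-style weighted-majority update can be sketched just as briefly. Following the slides, this conservative variant reweights experts only on rounds where the combined vote errs; the function name, the default β, and the toy experts are my own illustrative choices.

```python
import numpy as np

def weighted_majority(expert_preds, labels, beta=0.5):
    """Weighted-majority vote with multiplicative penalties.

    expert_preds: (T, d) array of +/-1 expert predictions.
    labels:       (T,)   array of +/-1 true labels.
    beta in (0, 1): penalty factor for experts that err on a mistake round.
    Returns the number of mistakes of the combined classifier.
    """
    T, d = expert_preds.shape
    w = np.ones(d)                  # initialize w_{1,i} = 1
    mistakes = 0
    for t in range(T):
        y_hat = 1 if w @ expert_preds[t] >= 0 else -1  # weighted vote
        if y_hat != labels[t]:
            mistakes += 1
            wrong = expert_preds[t] != labels[t]
            w[wrong] *= beta        # downweight every expert that erred
    return mistakes

# Toy usage: d = 3 experts -- one perfect, one random, one adversarial.
rng = np.random.default_rng(1)
labels = rng.choice([-1, 1], size=200)
preds = np.stack([labels,                         # perfect expert
                  rng.choice([-1, 1], size=200),  # random expert
                  -labels], axis=1)               # adversarial expert
print("combined mistakes:", weighted_majority(preds, labels))
```

With a perfect expert present (M* = 0), the mistake bound above gives M ≤ ln d / ln(2/(1 + β)); for β = 1/2 and d = 3 that is fewer than four combined mistakes, regardless of T.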