Online Learning
Rong Jin

Batch Learning
• Given a collection of training examples D
• Learn a classification model from D
• What if training examples are received one at a time?

Online Learning
For t = 1, 2, …, T:
• Receive an instance x_t
• Predict its class label ŷ_t
• Receive the true class label y_t
• Incur a loss ℓ(ŷ_t, y_t)
• Update the classification model

Objective
• Minimize the total loss Σ_{t=1}^{T} ℓ(ŷ_t, y_t)
• Loss functions:
  – Zero-one loss: ℓ(ŷ, y) = 1 if ŷ ≠ y, and 0 otherwise
  – Hinge loss: ℓ(f(x), y) = max(0, 1 − y f(x))

Loss Functions
(Figure: the hinge loss and the zero-one loss plotted against the margin y f(x); the hinge loss upper-bounds the zero-one loss.)

Linear Classifiers
• Restrict our discussion to linear classifiers f(x) = w · x
• Prediction: ŷ = sign(w · x)
• Confidence: |w · x|

Separable Set
(Figure: a data set that a linear classifier separates with a positive margin.)

Inseparable Sets
(Figure: data sets that no linear classifier can separate.)

Why Online Learning?
• Fast
• Memory efficient – processes one example at a time
• Simple to implement
• Formal guarantees – regret/mistake bounds
• Online-to-batch conversions
• No statistical assumptions
• Adaptive
• But not as good as a well-designed batch algorithm

Update Rules
• Online algorithms are based on an update rule that defines w_{t+1} from w_t (and possibly other information)
• Linear classifiers: find w_{t+1} from w_t based on the input (x_t, y_t)
• Some update rules:
  – Perceptron (Rosenblatt)
  – ALMA (Gentile)
  – ROMMA (Li & Long)
  – NORMA (Kivinen et al.)
  – MIRA (Crammer & Singer)
  – EG (Littlestone & Warmuth)
  – Bregman-based (Warmuth)

Perceptron
Initialize w_1 = 0
For t = 1, 2, …, T:
• Receive an instance x_t
• Predict its class label ŷ_t = sign(w_t · x_t)
• Receive the true class label y_t
• If ŷ_t ≠ y_t, then w_{t+1} = w_t + y_t x_t

Geometrical Interpretation
(Figure: on a mistake, the update w_{t+1} = w_t + y_t x_t rotates w toward the misclassified example.)

Mistake Bound: Separable Case
• Assume the data set D is linearly separable with margin γ, i.e., there is a unit vector u with y_t (u · x_t) ≥ γ for all t
• Assume ‖x_t‖ ≤ R for all t
• Then the maximum number of mistakes made by the Perceptron algorithm is bounded by R²/γ²
• Proof sketch: each mistake increases u · w_t by at least γ while increasing ‖w_t‖² by at most R²; after M mistakes, Mγ ≤ u · w_{T+1} ≤ ‖w_{T+1}‖ ≤ R√M, which gives M ≤ R²/γ²

Mistake Bound: Inseparable Case
• Let u be the best linear classifier in hindsight
• We measure our progress by how w_t evolves relative to u
• Consider a round t on which we make a mistake: the update moves w_t toward u whenever u classifies (x_t, y_t) well, and the damage is controlled by the hinge loss of u on that example
• Result 1 and Result 2 bound the number of mistakes in terms of the cumulative hinge loss of u; when u has zero hinge loss, they reduce to the separable-case bound R²/γ²

Perceptron with Projection
Initialize w_1 = 0
For t = 1, 2, …, T:
• Receive an instance x_t
• Predict its class label ŷ_t = sign(w_t · x_t)
• Receive the true class label y_t
• If ŷ_t ≠ y_t, then w_{t+1} = w_t + y_t x_t
• If ‖w_{t+1}‖ exceeds a prescribed radius, then project w_{t+1} back onto the ball of that radius

Remarks
• The mistake bound is measured for a sequence of classifiers, not a single one
• The bound does not depend on the dimension of the feature vector
• The bound holds for all sequences (no i.i.d. assumption)
• It is not tight for most real-world data, but it cannot be further improved in general

Perceptron Is Conservative
• The Perceptron updates the classifier only when it misclassifies an example; rounds with correct predictions leave w_t unchanged

Aggressive Perceptron
Initialize w_1 = 0
For t = 1, 2, …, T:
• Receive an instance x_t
• Predict its class label ŷ_t = sign(w_t · x_t)
• Receive the true class label y_t
• If the margin y_t (w_t · x_t) is too small (i.e., the hinge loss is positive), then update w_{t+1} = w_t + y_t x_t, even when the prediction is correct

Regret Bound
• Regret compares the learner's cumulative loss to that of the best fixed classifier in hindsight:
  Regret(T) = Σ_t ℓ(w_t; x_t, y_t) − min_w Σ_t ℓ(w; x_t, y_t)

Learning a Classifier
• The evaluation (mistake bound or regret bound) concerns a sequence of classifiers w_1, …, w_T
• But, by the end of the day, which classifier should be used? The last one? One chosen by cross-validation?

Learning with Expert Advice
• Learn to combine the predictions from multiple experts
• An ensemble of d experts: e_1, …, e_d, each predicting a label in {−1, +1}
• Combination weights: w = (w_1, …, w_d), nonnegative
• Combined classifier: a weighted majority vote, sign(Σ_i w_i e_i(x))

Hedge: Simple Case
• Suppose there exists one expert, denoted i*, who can perfectly classify all the training examples
• What is your learning strategy? (E.g., discard every expert that makes a mistake; the perfect expert always survives)

Hedge: Difficult Case
• What if we don't have such a perfect expert?

Hedge Algorithm
(Figure: example of combining ±1 expert predictions by a weighted vote.)

Initialize w_{1,i} = 1 for i = 1, …, d
For t = 1, 2, …, T:
• Receive a training example (x_t, y_t)
• Prediction: weighted majority vote, ŷ_t = sign(Σ_i w_{t,i} e_i(x_t))
• If ŷ_t ≠ y_t, then for i = 1, 2, …, d:
  – If e_i(x_t) ≠ y_t, then w_{t+1,i} = β w_{t,i} for some β ∈ (0, 1)

Mistake Bound
• Measure the progress by the total weight W_t = Σ_i w_{t,i}
• Lower bound: if the best expert i* makes M* mistakes, its weight after T rounds is at least β^{M*}, so W_T ≥ β^{M*}
• Upper bound: whenever the combined classifier errs, experts holding at least half of the total weight are wrong, so W_{t+1} ≤ W_t (1 + β)/2; after M combined mistakes, W_T ≤ d ((1 + β)/2)^M
• Combining the bounds: β^{M*} ≤ d ((1 + β)/2)^M, which gives
  M ≤ (M* ln(1/β) + ln d) / ln(2/(1 + β))
• The weighted majority vote therefore makes at most a constant factor more mistakes than the best expert, plus an O(ln d) overhead
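To make the Perceptron loop concrete, here is a minimal NumPy sketch of the algorithm as described on the slides. The function name, the toy data stream, and the margin filter are illustrative choices of mine, not part of the original lecture.

```python
import numpy as np

def perceptron(stream):
    """Online Perceptron over a stream of (x, y) pairs, y in {-1, +1}.

    Conservative update: the weight vector changes only on mistakes.
    Returns the final weight vector and the mistake count.
    """
    w = None
    mistakes = 0
    for x, y in stream:
        if w is None:
            w = np.zeros_like(x, dtype=float)  # initialize w_1 = 0
        y_hat = 1.0 if w @ x >= 0 else -1.0    # predict sign(w . x)
        if y_hat != y:                         # mistake: w_{t+1} = w_t + y_t x_t
            w = w + y * x
            mistakes += 1
    return w, mistakes

# Toy usage: a linearly separable stream labeled by the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
X = X[np.abs(X[:, 0]) > 0.2]   # enforce a positive margin
Y = np.sign(X[:, 0])
w, m = perceptron(zip(X, Y))
print(f"mistakes = {m}, final w = {w}")
```

Because this toy stream has margin γ > 0 and bounded norm R, the printed mistake count stays below R²/γ², as the separable-case bound guarantees.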
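The Hedge-style weighted-majority update can be sketched just as briefly. Following the slides, this conservative variant reweights experts only on rounds where the combined vote errs; the function name, the default β, and the toy experts are my own illustrative choices.

```python
import numpy as np

def weighted_majority(expert_preds, labels, beta=0.5):
    """Weighted-majority vote with multiplicative penalties.

    expert_preds: (T, d) array of +/-1 expert predictions.
    labels:       (T,)   array of +/-1 true labels.
    beta in (0, 1): penalty factor for experts that err on a mistake round.
    Returns the number of mistakes of the combined classifier.
    """
    T, d = expert_preds.shape
    w = np.ones(d)                  # initialize w_{1,i} = 1
    mistakes = 0
    for t in range(T):
        y_hat = 1 if w @ expert_preds[t] >= 0 else -1  # weighted vote
        if y_hat != labels[t]:
            mistakes += 1
            wrong = expert_preds[t] != labels[t]
            w[wrong] *= beta        # downweight every expert that erred
    return mistakes

# Toy usage: d = 3 experts -- one perfect, one random, one adversarial.
rng = np.random.default_rng(1)
labels = rng.choice([-1, 1], size=200)
preds = np.stack([labels,                         # perfect expert
                  rng.choice([-1, 1], size=200),  # random expert
                  -labels], axis=1)               # adversarial expert
print("combined mistakes:", weighted_majority(preds, labels))
```

With a perfect expert present (M* = 0), the mistake bound above gives M ≤ ln d / ln(2/(1 + β)); for β = 1/2 and d = 3 that is fewer than four combined mistakes, regardless of T.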