Slide 1: The Power of Selective Memory
Shai Shalev-Shwartz
Joint work with Ofer Dekel and Yoram Singer
Hebrew University, Jerusalem

Slide 2: Outline
• Online learning, loss bounds, etc.
• Hypothesis space – prediction suffix trees (PSTs)
• Margin of prediction and hinge loss
• An online learning algorithm
• Trading margin for depth of the PST
• Automatic calibration
• A self-bounded online algorithm for learning PSTs

Slide 3: Online Learning
• For t = 1, 2, ...
  • Get an instance x_t
  • Predict a target ŷ_t based on x_t
  • Get the true target y_t and suffer a loss
  • Update the prediction mechanism

Slide 4: Analysis of Online Algorithms
• Relative loss bounds (external regret): for any fixed hypothesis h, the cumulative loss of the online algorithm is bounded in terms of the cumulative loss of h on the same sequence.

Slide 5: Prediction Suffix Tree (PST)
• Each hypothesis is parameterized by a triplet that includes a context function g, which assigns a weight to each context (node of the tree).

Slide 6: PST Example
[Figure: an example PST; each node is labeled with its weight.]

Slide 7: Margin of Prediction
• Margin of prediction: y_t ŷ_t
• Hinge loss: ℓ_t = max{0, 1 − y_t ŷ_t}
[Plot: the hinge loss as a function of the margin, upper-bounding the 0-1 loss.]

Slide 8: Complexity of a Hypothesis
• Define the complexity of a hypothesis as the squared norm of its context function, ||g||² = Σ_s g(s)².
• We can also extend g to all strings (setting it to zero outside the tree) and get the same predictions with the same complexity.

Slide 9: Algorithm I – Learning an Unbounded-Depth PST
• Init: start from the zero context function (an empty tree).
• For t = 1, 2, ...
  • Get the instance and predict ŷ_t
  • Get y_t and suffer the hinge loss ℓ_t
  • Set the update step size from ℓ_t
  • Update the weight vector
  • Update the tree (add the suffixes of the current context as new nodes)
(A hedged code sketch of this kind of update appears after the Geometric Intuition slides below.)

Slides 10–19: Example
[Figures: a running example on the sequence y = +, −, +, −, ...; each round with positive loss grows the PST by one path and updates the node weights.]

Slide 20: Analysis
• Let (x_1, y_1), ..., (x_T, y_T) be a sequence of examples with binary targets y_t ∈ {+1, −1}.
• Let h be an arbitrary fixed hypothesis with context function g.
• Let L(h) be the loss of h on the sequence of examples. Then the cumulative loss of Algorithm I is bounded in terms of L(h) and the complexity ||g||².

Slide 21: Proof Sketch
• Define the per-round progress Δ_t = ||g_t − g*||² − ||g_{t+1} − g*||², where g* is the competing context function.
• Upper bound: the Δ_t telescope, so Σ_t Δ_t ≤ ||g_1 − g*||² = ||g*||².
• Lower bound: each Δ_t is bounded from below in terms of the loss of the algorithm and the loss of g* at round t.
• The upper and lower bounds together give the bound in the theorem.

Slide 22: Proof Sketch (Cont.)
Where does the lower bound come from?
• For simplicity, consider a round on which the algorithm makes an update.
• Define a Hilbert space of context functions with the inner product ⟨f, g⟩ = Σ_s f(s) g(s).
• The context function g_{t+1} is the projection of g_t onto the half-space {g : y_t ⟨g, f⟩ ≥ 1}, where f is the function representing the current context (it is supported on the suffixes of the observed sequence).

Slide 23: Example Revisited
y = + − + − + − + −
• The following hypothesis has a cumulative loss of 2 and a complexity of 2. Therefore, the number of mistakes is bounded above by 12.
[Figure: the corresponding PST with its node weights.]

Slide 24: Example Revisited
y = + − + − + − + −
• The following hypothesis has a cumulative loss of 1 and a complexity of 4. Therefore, the number of mistakes is bounded above by 18. But this tree is very shallow.
[Figure: a depth-1 PST with root weight 0 and child weights +1.41 and −1.41.]
• Problem: the tree we learned is much deeper!

Slide 25: Geometric Intuition
[Figure: geometric view of the update step as a projection.]

Slide 26: Geometric Intuition (Cont.)
• Let's force g_{t+1} to be sparse by "canceling" the new coordinate.

Slide 27: Geometric Intuition (Cont.)
• Now we can show that the canceled (sparse) update still makes sufficient progress toward g*.
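Before moving on to the self-bounded variant, the following is a minimal Python sketch of an online PST learner in the spirit of Algorithm I. It is not the talk's exact update rule: the class name UnboundedDepthPST, the margin threshold of 1, the 2^(−i/2) scaling of a depth-i context, and the passive-aggressive-style step size are all assumptions made for illustration; the talk's own update is the one outlined on Slide 9 and analyzed on Slides 20–22.

```python
class UnboundedDepthPST:
    """Online learner over a prediction suffix tree, stored as a dictionary
    mapping a context (a tuple of past symbols) to its weight g(context)."""

    def __init__(self):
        self.g = {}  # context function: suffix tuple -> weight

    def predict_score(self, history):
        """Sum the weights of all suffixes of `history` present in the tree,
        scaling a suffix of length i by 2**(-i/2) (an assumed decay)."""
        score = self.g.get((), 0.0)  # empty context (root of the tree)
        for i in range(1, len(history) + 1):
            suffix = tuple(history[-i:])
            score += (2.0 ** (-i / 2)) * self.g.get(suffix, 0.0)
        return score

    def update(self, history, y):
        """One online round: predict, suffer the hinge loss, and if the loss
        is positive grow the tree along the current context and shift its
        weights toward the correct label. Returns 1 if the sign of the
        prediction was correct, else 0."""
        score = self.predict_score(history)
        loss = max(0.0, 1.0 - y * score)  # hinge loss with margin 1 (assumed)
        if loss > 0.0:
            # Assumed passive-aggressive style step size: the loss divided by
            # the squared norm of the context feature vector.
            sq_norm = 1.0 + sum(2.0 ** (-i) for i in range(1, len(history) + 1))
            tau = loss / sq_norm
            self.g[()] = self.g.get((), 0.0) + tau * y
            for i in range(1, len(history) + 1):
                suffix = tuple(history[-i:])
                self.g[suffix] = self.g.get(suffix, 0.0) + tau * y * (2.0 ** (-i / 2))
        return 1 if y * score > 0 else 0
```

Keeping g in a dictionary keyed by suffix tuples means the tree only materializes nodes that an update actually touches, which mirrors how the PST on Slides 10–19 grows one path per update.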
Slide 28: Trading Margin for Sparsity
• We obtained an inequality comparing the sparse (canceled) update to the exact projection.
• If the contribution of the canceled coordinate is much smaller than the attained margin, we can still get a loss bound!
• Problem: what happens if the margin itself is very small, so that this condition fails? Solution: tolerate small margin errors!
• Conclusion: if we tolerate small margin errors, we can get a sparser tree.

Slide 29: Automatic Calibration
• Problem: the value of the required threshold is unknown in advance.
• Solution: use the data itself to estimate it! More specifically:
• Maintain a data-dependent estimate of this quantity as the sequence is processed.
• If the estimate is kept consistent with the data seen so far, then we still get a mistake bound.

Slide 30: Algorithm II – Learning a Self-Bounded-Depth PST
• Init: start from the zero context function (an empty tree).
• For t = 1, 2, ...
  • Get the instance and predict ŷ_t
  • Get y_t and suffer the loss ℓ_t
  • If the margin is sufficient, do nothing! Otherwise:
  • Set the step size
  • Set the calibration estimate
  • Set the depth bound d_t
  • Update w and the tree as in Algorithm I, but only up to depth d_t
(A hedged code sketch of the self-bounded idea appears after the Conclusions slide.)

Slide 31: Analysis – Loss Bound
• Let (x_1, y_1), ..., (x_T, y_T) be a sequence of examples with binary targets y_t ∈ {+1, −1}.
• Let h be an arbitrary fixed hypothesis.
• Let L(h) be the loss of h on the sequence of examples. Then the cumulative loss of Algorithm II is bounded in terms of L(h) and the complexity of h.

Slide 32: Analysis – Bounded Depth
• Under the previous conditions, the depth of all the trees learned by the algorithm is bounded from above.

Slide 33: Example Revisited – Performance of Algorithm II
• y = + − + − + − + − ...
• Only 3 mistakes
• The last PST is of depth 5
• The margin is 0.61 (after normalization)
• The margin of the max-margin tree (of infinite depth) is 0.7071
[Figure: the final PST of depth 5 with its node weights.]

Slide 34: Conclusions
• Discriminative online learning of PSTs
• Loss bound
• Trading margin for sparsity
• Automatic calibration
Future work
• Experiments
• Feature selection and extraction
• Support vector selection
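To make the depth-limiting idea concrete, here is a hedged sketch of a self-bounded variant built on the UnboundedDepthPST sketch above. The depth rule used here, capping the update depth at a logarithmic function of the number of margin errors seen so far, is purely an illustrative assumption; Algorithm II derives its depth bound d_t from the automatic-calibration estimate and proves a loss bound and a depth bound for that specific rule.

```python
import math

class SelfBoundedPST(UnboundedDepthPST):
    """Like the sketch above, but each update only touches contexts up to a
    data-dependent depth cap, so the learned tree stays shallow."""

    def __init__(self):
        super().__init__()
        self.margin_errors = 0  # rounds with positive hinge loss so far

    def update(self, history, y):
        score = self.predict_score(history)
        loss = max(0.0, 1.0 - y * score)
        if loss > 0.0:
            self.margin_errors += 1
            # Assumed depth cap d_t: grows only logarithmically in the number
            # of margin errors; this is an illustrative stand-in for the
            # calibrated depth rule of Algorithm II.
            d_t = max(1, math.ceil(math.log2(self.margin_errors + 1)))
            depth = min(d_t, len(history))
            capped = history[len(history) - depth:]
            # Same additive update as before, restricted to depth <= d_t.
            sq_norm = 1.0 + sum(2.0 ** (-i) for i in range(1, depth + 1))
            tau = loss / sq_norm
            self.g[()] = self.g.get((), 0.0) + tau * y
            for i in range(1, depth + 1):
                suffix = tuple(capped[-i:])
                self.g[suffix] = self.g.get(suffix, 0.0) + tau * y * (2.0 ** (-i / 2))
        return 1 if y * score > 0 else 0
```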
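As a quick end-to-end usage example (assuming both sketch classes above are in scope), the two learners can be run on the alternating sequence from the talk's running example. Because the step-size and depth rules above are stand-ins, the mistake counts and tree depths printed here are not expected to match the numbers reported on Slide 33.

```python
if __name__ == "__main__":
    # Alternating labels + - + - ..., with the instance at round t being the
    # sequence of labels observed so far (as in the talk's running example).
    seq = [+1 if t % 2 == 0 else -1 for t in range(100)]
    for learner in (UnboundedDepthPST(), SelfBoundedPST()):
        correct = 0
        for t, y in enumerate(seq):
            correct += learner.update(seq[:t], y)
        max_depth = max((len(s) for s in learner.g), default=0)
        print(type(learner).__name__,
              "mistakes:", len(seq) - correct,
              "max context depth:", max_depth)
```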