Slide 1: The Power of Selective Memory
Shai Shalev-Shwartz
Joint work with Ofer Dekel and Yoram Singer
Hebrew University, Jerusalem

Slide 2: Outline
• Online learning, loss bounds, etc.
• Hypothesis space – prediction suffix trees (PSTs)
• Margin of prediction and hinge loss
• An online learning algorithm
• Trading margin for depth of the PST
• Automatic calibration
• A self-bounded online algorithm for learning PSTs

Slide 3: Online Learning
• For t = 1, 2, ...
  • Get an instance x_t
  • Predict a target ŷ_t based on x_t
  • Get the true target y_t and suffer a loss
  • Update the prediction mechanism

Slide 4: Analysis of Online Algorithms
• Relative loss bounds (external regret): for any fixed hypothesis h, the cumulative loss of the online algorithm is bounded in terms of the cumulative loss of h on the same sequence.

Slide 5: Prediction Suffix Tree (PST)
• Each hypothesis is parameterized by a triplet that includes a context function g, which assigns a weight to each context (node of the tree).

Slide 6: PST Example
[Figure: an example PST; each node is labeled with its weight.]

Slide 7: Margin of Prediction
• Margin of prediction: y_t ŷ_t
• Hinge loss: ℓ_t = max{0, 1 − y_t ŷ_t}
[Plot: the hinge loss as a function of the margin, upper-bounding the 0-1 loss.]

Slide 8: Complexity of a Hypothesis
• Define the complexity of a hypothesis as the squared norm of its context function, ||g||² = Σ_s g(s)².
• We can also extend g to all strings (setting it to zero outside the tree) and get the same predictions with the same complexity.

Slide 9: Algorithm I – Learning an Unbounded-Depth PST
• Init: start from the zero context function (an empty tree).
• For t = 1, 2, ...
  • Get the instance and predict ŷ_t
  • Get y_t and suffer the hinge loss ℓ_t
  • Set the update step size from ℓ_t
  • Update the weight vector
  • Update the tree (add the suffixes of the current context as new nodes)
(A hedged code sketch of this kind of update appears after the Geometric Intuition slides below.)

Slides 10–19: Example
[Figures: a running example on the sequence y = +, −, +, −, ...; each round with positive loss grows the PST by one path and updates the node weights.]

Slide 20: Analysis
• Let (x_1, y_1), ..., (x_T, y_T) be a sequence of examples with binary targets y_t ∈ {+1, −1}.
• Let h be an arbitrary fixed hypothesis with context function g.
• Let L(h) be the loss of h on the sequence of examples. Then the cumulative loss of Algorithm I is bounded in terms of L(h) and the complexity ||g||².

Slide 21: Proof Sketch
• Define the per-round progress Δ_t = ||g_t − g*||² − ||g_{t+1} − g*||², where g* is the competing context function.
• Upper bound: the Δ_t telescope, so Σ_t Δ_t ≤ ||g_1 − g*||² = ||g*||².
• Lower bound: each Δ_t is bounded from below in terms of the loss of the algorithm and the loss of g* at round t.
• The upper and lower bounds together give the bound in the theorem.

Slide 22: Proof Sketch (Cont.)
Where does the lower bound come from?
• For simplicity, consider a round on which the algorithm makes an update.
• Define a Hilbert space of context functions with the inner product ⟨f, g⟩ = Σ_s f(s) g(s).
• The context function g_{t+1} is the projection of g_t onto the half-space {g : y_t ⟨g, f⟩ ≥ 1}, where f is the function representing the current context (it is supported on the suffixes of the observed sequence).

Slide 23: Example Revisited
y = + − + − + − + −
• The following hypothesis has a cumulative loss of 2 and a complexity of 2. Therefore, the number of mistakes is bounded above by 12.
[Figure: the corresponding PST with its node weights.]

Slide 24: Example Revisited
y = + − + − + − + −
• The following hypothesis has a cumulative loss of 1 and a complexity of 4. Therefore, the number of mistakes is bounded above by 18. But this tree is very shallow.
[Figure: a depth-1 PST with root weight 0 and child weights +1.41 and −1.41.]
• Problem: the tree we learned is much deeper!

Slide 25: Geometric Intuition
[Figure: geometric view of the update step as a projection.]

Slide 26: Geometric Intuition (Cont.)
• Let's force g_{t+1} to be sparse by "canceling" the new coordinate.

Slide 27: Geometric Intuition (Cont.)
• Now we can show that the canceled (sparse) update still makes sufficient progress toward g*.
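Before moving on to the self-bounded variant, the following is a minimal Python sketch of an online PST learner in the spirit of Algorithm I. It is not the talk's exact update rule: the class name UnboundedDepthPST, the margin threshold of 1, the 2^(−i/2) scaling of a depth-i context, and the passive-aggressive-style step size are all assumptions made for illustration; the talk's own update is the one outlined on Slide 9 and analyzed on Slides 20–22.

```python
class UnboundedDepthPST:
    """Online learner over a prediction suffix tree, stored as a dictionary
    mapping a context (a tuple of past symbols) to its weight g(context)."""

    def __init__(self):
        self.g = {}  # context function: suffix tuple -> weight

    def predict_score(self, history):
        """Sum the weights of all suffixes of `history` present in the tree,
        scaling a suffix of length i by 2**(-i/2) (an assumed decay)."""
        score = self.g.get((), 0.0)  # empty context (root of the tree)
        for i in range(1, len(history) + 1):
            suffix = tuple(history[-i:])
            score += (2.0 ** (-i / 2)) * self.g.get(suffix, 0.0)
        return score

    def update(self, history, y):
        """One online round: predict, suffer the hinge loss, and if the loss
        is positive grow the tree along the current context and shift its
        weights toward the correct label. Returns 1 if the sign of the
        prediction was correct, else 0."""
        score = self.predict_score(history)
        loss = max(0.0, 1.0 - y * score)  # hinge loss with margin 1 (assumed)
        if loss > 0.0:
            # Assumed passive-aggressive style step size: the loss divided by
            # the squared norm of the context feature vector.
            sq_norm = 1.0 + sum(2.0 ** (-i) for i in range(1, len(history) + 1))
            tau = loss / sq_norm
            self.g[()] = self.g.get((), 0.0) + tau * y
            for i in range(1, len(history) + 1):
                suffix = tuple(history[-i:])
                self.g[suffix] = self.g.get(suffix, 0.0) + tau * y * (2.0 ** (-i / 2))
        return 1 if y * score > 0 else 0
```

Keeping g in a dictionary keyed by suffix tuples means the tree only materializes nodes that an update actually touches, which mirrors how the PST on Slides 10–19 grows one path per update.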
Slide 28: Trading Margin for Sparsity
• We obtained an inequality comparing the sparse (canceled) update to the exact projection.
• If the contribution of the canceled coordinate is much smaller than the attained margin, we can still get a loss bound!
• Problem: what happens if the margin itself is very small, so that this condition fails? Solution: tolerate small margin errors!
• Conclusion: if we tolerate small margin errors, we can get a sparser tree.

Slide 29: Automatic Calibration
• Problem: the value of the required threshold is unknown in advance.
• Solution: use the data itself to estimate it! More specifically:
• Maintain a data-dependent estimate of this quantity as the sequence is processed.
• If the estimate is kept consistent with the data seen so far, then we still get a mistake bound.

Slide 30: Algorithm II – Learning a Self-Bounded-Depth PST
• Init: start from the zero context function (an empty tree).
• For t = 1, 2, ...
  • Get the instance and predict ŷ_t
  • Get y_t and suffer the loss ℓ_t
  • If the margin is sufficient, do nothing! Otherwise:
  • Set the step size
  • Set the calibration estimate
  • Set the depth bound d_t
  • Update w and the tree as in Algorithm I, but only up to depth d_t
(A hedged code sketch of the self-bounded idea appears after the Conclusions slide.)

Slide 31: Analysis – Loss Bound
• Let (x_1, y_1), ..., (x_T, y_T) be a sequence of examples with binary targets y_t ∈ {+1, −1}.
• Let h be an arbitrary fixed hypothesis.
• Let L(h) be the loss of h on the sequence of examples. Then the cumulative loss of Algorithm II is bounded in terms of L(h) and the complexity of h.

Slide 32: Analysis – Bounded Depth
• Under the previous conditions, the depth of all the trees learned by the algorithm is bounded from above.

Slide 33: Example Revisited – Performance of Algorithm II
• y = + − + − + − + − ...
• Only 3 mistakes
• The last PST is of depth 5
• The margin is 0.61 (after normalization)
• The margin of the max-margin tree (of infinite depth) is 0.7071
[Figure: the final PST of depth 5 with its node weights.]

Slide 34: Conclusions
• Discriminative online learning of PSTs
• Loss bound
• Trading margin for sparsity
• Automatic calibration
Future work
• Experiments
• Feature selection and extraction
• Support vector selection
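To make the depth-limiting idea concrete, here is a hedged sketch of a self-bounded variant built on the UnboundedDepthPST sketch above. The depth rule used here, capping the update depth at a logarithmic function of the number of margin errors seen so far, is purely an illustrative assumption; Algorithm II derives its depth bound d_t from the automatic-calibration estimate and proves a loss bound and a depth bound for that specific rule.

```python
import math

class SelfBoundedPST(UnboundedDepthPST):
    """Like the sketch above, but each update only touches contexts up to a
    data-dependent depth cap, so the learned tree stays shallow."""

    def __init__(self):
        super().__init__()
        self.margin_errors = 0  # rounds with positive hinge loss so far

    def update(self, history, y):
        score = self.predict_score(history)
        loss = max(0.0, 1.0 - y * score)
        if loss > 0.0:
            self.margin_errors += 1
            # Assumed depth cap d_t: grows only logarithmically in the number
            # of margin errors; this is an illustrative stand-in for the
            # calibrated depth rule of Algorithm II.
            d_t = max(1, math.ceil(math.log2(self.margin_errors + 1)))
            depth = min(d_t, len(history))
            capped = history[len(history) - depth:]
            # Same additive update as before, restricted to depth <= d_t.
            sq_norm = 1.0 + sum(2.0 ** (-i) for i in range(1, depth + 1))
            tau = loss / sq_norm
            self.g[()] = self.g.get((), 0.0) + tau * y
            for i in range(1, depth + 1):
                suffix = tuple(capped[-i:])
                self.g[suffix] = self.g.get(suffix, 0.0) + tau * y * (2.0 ** (-i / 2))
        return 1 if y * score > 0 else 0
```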
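As a quick end-to-end usage example (assuming both sketch classes above are in scope), the two learners can be run on the alternating sequence from the talk's running example. Because the step-size and depth rules above are stand-ins, the mistake counts and tree depths printed here are not expected to match the numbers reported on Slide 33.

```python
if __name__ == "__main__":
    # Alternating labels + - + - ..., with the instance at round t being the
    # sequence of labels observed so far (as in the talk's running example).
    seq = [+1 if t % 2 == 0 else -1 for t in range(100)]
    for learner in (UnboundedDepthPST(), SelfBoundedPST()):
        correct = 0
        for t, y in enumerate(seq):
            correct += learner.update(seq[:t], y)
        max_depth = max((len(s) for s in learner.g), default=0)
        print(type(learner).__name__,
              "mistakes:", len(seq) - correct,
              "max context depth:", max_depth)
```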