The Duality of Strong Convexity and Strong Smoothness
Applications to Machine Learning

Shai Shalev-Shwartz
Toyota Technological Institute at Chicago
Talk at AFOSR workshop, January 2009
Outline

Lemma
$f$ is strongly convex w.r.t. $\|\cdot\|$ ⇐⇒ $f^\star$ is strongly smooth w.r.t. $\|\cdot\|_\star$

Applications:
- Rademacher bounds (⇒ generalization bounds)
- Low-regret online algorithms (⇒ runtime of SGD/SMD)
- Boosting
- Sparsity and the $\ell_1$ norm
- Concentration inequalities
- Matrix regularization (multi-task, group Lasso, dynamic bounds)
Motivating Problem – Generalization Bounds

- A linear predictor is a mapping $x \mapsto \phi(\langle w, x\rangle)$, e.g. $x \mapsto \langle w, x\rangle$ or $x \mapsto \mathrm{sgn}(\langle w, x\rangle)$
- The loss of $w$ on $(x, y)$ is assessed by $\ell(\langle w, x\rangle, y)$
- Goal: minimize the expected loss $L(w) = \mathbb{E}_{(x,y)}[\ell(\langle w, x\rangle, y)]$
- Instead, minimize the empirical loss $\hat{L}(w) = \frac{1}{n}\sum_{i=1}^n \ell(\langle w, x_i\rangle, y_i)$
- Bartlett and Mendelson [2002]: if $\ell$ is Lipschitz and bounded, then with probability at least $1-\delta$,
  $$\forall w \in S, \quad L(w) \le \hat{L}(w) + \frac{2}{n} R_n(S) + \sqrt{\frac{\log(1/\delta)}{2n}}$$
  where
  $$R_n(S) \stackrel{\text{def}}{=} \mathbb{E}_{\epsilon \overset{\text{iid}}{\sim} \{\pm 1\}^n}\left[\sup_{u \in S} \sum_{i=1}^n \epsilon_i \langle u, x_i\rangle\right]$$
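To make $R_n(S)$ concrete, here is a minimal Monte Carlo sketch that estimates it for an $\ell_2$ ball, where the supremum has a closed form. The data `X` and all parameter values are synthetic, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, W = 200, 5, 1.0
X = rng.normal(size=(n, d))                   # synthetic data points x_1, ..., x_n

def rademacher_l2_ball(X, W, trials=2000):
    """Monte Carlo estimate of R_n(S) for S = {u : ||u||_2 <= W}.
    Here sup_{u in S} sum_i eps_i <u, x_i> = W * ||sum_i eps_i x_i||_2."""
    eps = rng.choice([-1.0, 1.0], size=(trials, X.shape[0]))
    return np.mean(W * np.linalg.norm(eps @ X, axis=1))

print("estimated R_n(S):", rademacher_l2_ball(X, W))
```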
Background – Fenchel Conjugate

Two equivalent representations of a convex function: as its set of points $(w, f(w))$, or as its set of tangents, each with slope $\theta$ and intercept $-f^\star(\theta)$.

[Figure: the graph of $f$ with a tangent line of slope $\theta$ touching it at $(w, f(w))$; the tangent crosses the vertical axis at $-f^\star(\theta)$.]

$$f^\star(\theta) = \max_w \; \langle w, \theta\rangle - f(w)$$
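The definition can be probed numerically by brute force. A minimal one-dimensional sketch, maximizing over a grid (the function and grid range are arbitrary choices for illustration); it uses the fact that $\frac{1}{2}w^2$ is its own conjugate.

```python
import numpy as np

# One-dimensional sketch: f*(theta) = max_w (w*theta - f(w)), maximized on a grid.
w = np.linspace(-10.0, 10.0, 100001)

def conjugate(f, theta):
    return np.max(w * theta - f(w))

f = lambda v: 0.5 * v**2                      # (1/2)w^2 is its own conjugate
for theta in (-2.0, 0.5, 3.0):
    print(theta, conjugate(f, theta), 0.5 * theta**2)   # columns 2 and 3 agree
```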
Background – Fenchel Conjugate

The definition immediately implies the Fenchel–Young inequality:
$$\forall u, \quad f^\star(\theta) = \max_w \langle w, \theta\rangle - f(w) \ge \langle u, \theta\rangle - f(u)$$

If $f$ is closed and convex then $f^{\star\star} = f$.

By the way, this implies Jensen's inequality:
$$f(\mathbb{E}[w]) = \max_\theta \langle\theta, \mathbb{E}[w]\rangle - f^\star(\theta) = \max_\theta \mathbb{E}\big[\langle\theta, w\rangle - f^\star(\theta)\big] \le \mathbb{E}\big[\max_\theta \langle\theta, w\rangle - f^\star(\theta)\big] = \mathbb{E}[f(w)]$$
Background – Fenchel Conjugate

Examples:

| $f(w)$ | $f^\star(\theta)$ | |
| $\frac{1}{2}\|w\|^2$ | $\frac{1}{2}\|\theta\|_\star^2$ | |
| $\|w\|$ | indicator of the unit $\|\cdot\|_\star$ ball | |
| $\sum_i w_i \log(w_i)$ | $\log \sum_i e^{\theta_i}$ | (used for boosting) |
| indicator of the prob. simplex | $\max_i \theta_i$ | |
| $c\, g(w)$ for $c > 0$ | $c\, g^\star(\theta/c)$ | |
| $\inf_x\; g_1(x) + g_2(w - x)$ | $g_1^\star(\theta) + g_2^\star(\theta)$ | (infimal convolution theorem) |
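A quick numeric sanity check of the entropy / log-sum-exp pair from the table, by grid search over the two-dimensional simplex; the test point `theta` is arbitrary.

```python
import numpy as np

# Entropy on the simplex vs log-sum-exp, checked on the 2-d simplex by grid search.
p = np.linspace(1e-9, 1 - 1e-9, 200001)
W = np.stack([p, 1 - p], axis=1)              # grid over the probability simplex
f = np.sum(W * np.log(W), axis=1)             # f(w) = sum_i w_i log(w_i)

theta = np.array([0.3, -1.2])                 # arbitrary test point
lhs = np.max(W @ theta - f)                   # f*(theta) by the definition
rhs = np.log(np.exp(theta).sum())             # log sum_i e^{theta_i} from the table
print(lhs, rhs)                               # should agree to ~1e-6
```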
f is strongly convex ⇐⇒ f* is strongly smooth

The following properties are equivalent:

- $f$ is $\sigma$-strongly convex w.r.t. $\|\cdot\|$, that is,
  $$\forall w, u, \; \forall \theta \in \partial f(u): \quad f(w) - f(u) - \langle\theta, w - u\rangle \ge \frac{\sigma}{2}\|u - w\|^2$$
- $f^\star$ is $\frac{1}{\sigma}$-strongly smooth w.r.t. $\|\cdot\|_\star$, that is,
  $$\forall w, u: \quad f^\star(w) - f^\star(u) - \langle\nabla f^\star(u), w - u\rangle \le \frac{1}{2\sigma}\|u - w\|_\star^2$$

[Figure: $f$ lies above its tangent at $u$ by at least $\frac{\sigma}{2}\|u - w\|^2$, while $f^\star$ lies above its tangent at $u$ by at most $\frac{1}{2\sigma}\|u - w\|_\star^2$.]
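The smoothness half of the equivalence is easy to probe numerically. A sketch that samples random pairs and checks the 1-smoothness inequality for $f^\star(\theta) = \log\sum_i e^{\theta_i}$ w.r.t. $\|\cdot\|_\infty$, the conjugate of negative entropy (which is 1-strongly convex w.r.t. $\|\cdot\|_1$); dimensions and trial counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
d, trials = 5, 10000

def lse(t):                                  # f*(theta) = log sum_i e^{theta_i}
    m = t.max()
    return m + np.log(np.exp(t - m).sum())

def grad_lse(t):                             # its gradient: the softmax map
    e = np.exp(t - t.max())
    return e / e.sum()

# Check: f*(w) - f*(u) - <grad f*(u), w - u> <= (1/2)||w - u||_inf^2
worst = -np.inf
for _ in range(trials):
    u, w = rng.normal(size=d), rng.normal(size=d)
    gap = lse(w) - lse(u) - grad_lse(u) @ (w - u)
    worst = max(worst, gap - 0.5 * np.linalg.norm(w - u, np.inf) ** 2)
print("worst violation (should be <= 0):", worst)
```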
f is strongly convex ⇐⇒ f* is strongly smooth

Examples:

| $f(w)$ | $f^\star(\theta)$ | w.r.t. norm | $\sigma$ |
| $\frac{1}{2}\|w\|_2^2$ | $\frac{1}{2}\|\theta\|_2^2$ | $\|\cdot\|_2$ | $1$ |
| $\frac{1}{2}\|w\|_q^2$ | $\frac{1}{2}\|\theta\|_p^2$ | $\|\cdot\|_q$ | $q - 1$ |
| $\sum_i w_i \log(w_i)$ | $\log \sum_i e^{\theta_i}$ | $\|\cdot\|_1$ | $1$ |

(where $q \in (1, 2]$ and $\frac{1}{q} + \frac{1}{p} = 1$)
Importance

Theorem (1)
Let
- $f$ be $\sigma$-strongly convex w.r.t. $\|\cdot\|$
- $f^\star(0) = 0$ (assumed for simplicity)
- $v_1, \ldots, v_n$ be an arbitrary sequence of vectors
- $w_t = \nabla f^\star\big(\sum_{j<t} v_j\big)$

Then, for any $u$ we have
$$\sum_t \langle u, v_t\rangle - f(u) \;\le\; f^\star\Big(\sum_t v_t\Big) \;\le\; \sum_t \Big(\langle w_t, v_t\rangle + \tfrac{1}{2\sigma}\|v_t\|_\star^2\Big).$$

Proof.
The first inequality is Fenchel–Young; the second follows from the $\frac{1}{\sigma}$-smoothness of $f^\star$ by induction.
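Theorem 1 can be verified numerically. A sketch for the Euclidean case $f(w) = \frac{1}{2}\|w\|_2^2$, where $f^\star = f$ and $\nabla f^\star(\theta) = \theta$, so $w_t$ is just the running sum of the past $v_j$; in this special case the second inequality in fact holds with equality (a telescoping sum). All vectors are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, sigma = 4, 50, 1.0

# f(w) = ||w||^2/2 is 1-strongly convex w.r.t. ||.||_2, f* = f, grad f* = identity.
V = rng.normal(size=(n, d))                   # arbitrary vectors v_1, ..., v_n
theta = np.cumsum(V, axis=0) - V              # theta_t = sum_{j<t} v_j
W = theta                                     # w_t = grad f*(theta_t)

u = rng.normal(size=d)                        # any comparator u
lhs = (V @ u).sum() - 0.5 * u @ u             # sum_t <u, v_t> - f(u)
mid = 0.5 * np.sum(V.sum(axis=0) ** 2)        # f*(sum_t v_t) = ||sum_t v_t||^2 / 2
rhs = np.sum(W * V) + np.sum(V ** 2) / (2 * sigma)
print(lhs, "<=", mid, "<=", rhs)              # the two inequalities of Theorem 1
```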
Back to Rademacher Complexities

Theorem 1:
$$\sum_t \langle u, v_t\rangle - f(u) \le \sum_t \Big(\langle w_t, v_t\rangle + \tfrac{1}{2\sigma}\|v_t\|_\star^2\Big).$$

Therefore, for all $S$:
$$\sup_{u\in S} \sum_t \langle u, v_t\rangle \le \frac{1}{2\sigma}\sum_t \|v_t\|_\star^2 + \sup_{u\in S} f(u) + \sum_t \langle w_t, v_t\rangle$$

Applying this with $v_t = \epsilon_t x_t$ and taking expectation (the last term vanishes since $w_t$ depends only on $\epsilon_1, \ldots, \epsilon_{t-1}$), we obtain:
$$R_n(S) \le \frac{1}{2\sigma}\sum_t \mathbb{E}[\epsilon_t^2]\,\|x_t\|_\star^2 + \sup_{u\in S} f(u) + \underbrace{\mathbb{E}\Big[\sum_t \langle w_t, \epsilon_t x_t\rangle\Big]}_{=0}$$

Based on Kakade, Sridharan, Tewari [2008].
Rademacher Bounds – Examples

| $S$ | $f(w)$ | $X^2$ | $R_n(S)$ |
| $\{w : \|w\|_2 \le W\}$ | $\frac{\sigma}{2}\|w\|_2^2$ | $\frac{1}{n}\sum_i \|x_i\|_2^2$ | $X W \sqrt{n}$ |
| $\{w : \|w\|_q \le W\}$ | $\frac{\sigma}{2}\|w\|_q^2$ | $\frac{1}{n}\sum_i \|x_i\|_p^2$ | $X W \sqrt{(p-1)\, n}$ |
| prob. simplex | $\sigma \sum_i w_i \log(d\, w_i)$ | $\frac{1}{n}\sum_i \|x_i\|_\infty^2$ | $X \sqrt{\log(d)\, n}$ |
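A Monte Carlo check of the first and last rows on synthetic data. The $\ell_2$ bound is exact; for the simplex row the slides drop multiplicative constants, so the estimate should only match the printed bound up to such constants.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, W, trials = 400, 20, 1.0, 2000
X = rng.uniform(-1, 1, size=(n, d))                 # synthetic data
S = rng.choice([-1.0, 1.0], size=(trials, n)) @ X   # rows: sum_i eps_i x_i

# Row 1: l2 ball. The sup is W ||sum_i eps_i x_i||_2; the bound is X W sqrt(n).
est = W * np.linalg.norm(S, axis=1).mean()
Xl2 = np.sqrt((np.linalg.norm(X, axis=1) ** 2).mean())
print("l2 ball : estimate", est, " bound", Xl2 * W * np.sqrt(n))

# Row 3: simplex. The sup is the max coordinate; bound X sqrt(n log d), constants dropped.
est = S.max(axis=1).mean()
Xinf = np.sqrt((np.abs(X).max(axis=1) ** 2).mean())
print("simplex : estimate", est, " bound", Xinf * np.sqrt(n * np.log(d)))
```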
Intermediate Summary

[Diagram: "f strongly convex ⇐⇒ f* smooth" leads via Fenchel–Young to Theorem 1; Theorem 1 plus the zero-mean argument gives the Rademacher bound; from there, Bartlett–Mendelson gives generalization bounds, and S., Sridharan, Srebro give fast rates for strongly convex objectives.]

Coming Next ...

[Same diagram, with Theorem 1 now also leading to online regret bounds.]
Online Learning – Brief Background

Studied in game theory, information theory, and machine learning. Examples:
- Repeated two-player games (Hannan [57], Blackwell [56])
- Predicting with side information (Rosenblatt's Perceptron [58], Weighted Majority of Littlestone and Warmuth [88,94])
- Prediction of individual sequences (Cover [78], Feder, Merhav and Gutman [92])
- Online convex optimization – a general abstract prediction model (Gordon [99], Zinkevich [03])

Using our lemma, we can easily derive optimal low-regret algorithms.
Online Learning

Prediction Game – Online Optimization
For $t = 1, \ldots, n$:
- Learner chooses a decision $w_t \in S$
- Environment chooses a loss function $\ell_t : S \to \mathbb{R}$
- Learner pays loss $\ell_t(w_t)$

Regret of the learner for not always following the best decision in $S$:
$$\sum_{t=1}^n \ell_t(w_t) - \min_{u\in S} \sum_{t=1}^n \ell_t(u)$$

Goal: conditions on $S$ and the loss functions that guarantee a low-regret learning strategy.
Online Learning

- Assume $f$ is $\sigma$-strongly convex on $S$ w.r.t. $\|\cdot\|$
- Recall Theorem 1: for $w_t = \nabla f^\star\big(\sum_{j<t} v_j\big)$ we have
  $$\sum_t \langle u - w_t, v_t\rangle \le f(u) + \frac{1}{2\sigma}\sum_t \|v_t\|_\star^2$$
- Assume $\ell_t$ is convex and apply this with $v_t \in \partial\ell_t(w_t)$, so that
  $$\ell_t(w_t) - \ell_t(u) \le \langle u - w_t, v_t\rangle$$
- Assume $\ell_t$ is Lipschitz w.r.t. the dual norm, so that $\|v_t\|_\star \le V$
- We obtain the regret bound (S. and Singer [06]):
  $$\sum_{t=1}^n \ell_t(w_t) - \min_{u\in S}\sum_{t=1}^n \ell_t(u) \le \max_{u\in S} f(u) + \frac{nV^2}{2\sigma}.$$
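A minimal sketch of this scheme for linear losses on the unit $\ell_2$ ball, taking $f(w) = \frac{\sigma}{2}\|w\|_2^2$ restricted to the ball, so that $\nabla f^\star(\theta)$ is a scaled projection. The update feeds $v_t = -g_t$, the negated subgradient, which is the sign convention that makes $w_t$ a descent step; losses are synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, V = 3, 1000, 1.0
G = rng.normal(size=(n, d))
G *= V / np.linalg.norm(G, axis=1, keepdims=True)   # subgradients with ||g_t||_2 = V

sigma = V * np.sqrt(n)            # f(w) = (sigma/2)||w||^2 on the unit ball
def grad_fstar(theta):            # grad f*(theta): scaled projection onto the ball
    w = theta / sigma
    return w / max(np.linalg.norm(w), 1.0)

theta, loss = np.zeros(d), 0.0
for g in G:                       # linear losses l_t(w) = <g_t, w>
    w_t = grad_fstar(theta)
    loss += g @ w_t
    theta -= g                    # feed v_t = -g_t into Theorem 1's update

best = -np.linalg.norm(G.sum(axis=0))       # min over the unit ball of <sum_t g_t, u>
print("regret:", loss - best, " bound:", sigma / 2 + n * V**2 / (2 * sigma))
```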
Online Learning – Example I

Predicting the next bit of a sequence. For $t = 1, \ldots, n$:
- Learner predicts $\hat{y}_t \in \{0, 1\}$
- Environment responds with $y_t \in \{0, 1\}$
- Learner pays 1 if $\hat{y}_t \ne y_t$

Modeling:
- $S = [0, 1]$, $f(w) = \frac{\sigma}{2} w^2$, $\sigma = \sqrt{n}$
- Predict $\hat{y}_t = 1$ with probability $w_t \in S$
- Then the probability of $\hat{y}_t \ne y_t$ is $\ell_t(w_t) = |y_t - w_t|$, which is convex
- The expected regret is thus bounded by $\sqrt{n}$
Online Learning – Example II

Predicting with expert advice. For $t = 1, \ldots, n$:
- Learner receives a vector $x_t \in [0, 1]^d$ of experts' advice
- Learner needs to predict $\hat{y}_t \in \{0, 1\}$
- Environment responds with $y_t \in \{0, 1\}$
- Learner pays 1 if $\hat{y}_t \ne y_t$

Modeling:
- $S$ is the $d$-dimensional probability simplex, $f(w) = \sigma \sum_i w_i \log(w_i)$, $\sigma = \sqrt{n/\log(d)}$
- Predict $\hat{y}_t = 1$ with probability $\langle w_t, x_t\rangle$
- Then the probability of $\hat{y}_t \ne y_t$ is $\ell_t(w_t) = |y_t - \langle w_t, x_t\rangle|$, which is convex
- The expected regret is thus bounded by $\sqrt{\log(d)\, n}$
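A sketch of this expert setting with the entropic regularizer, where $\nabla f^\star(\theta)$ is a softmax, i.e. multiplicative-weights-style updates. The data is a synthetic toy stream in which one expert happens to be informative, the comparator is the best single expert, and the printed bound drops the slide's constants.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 2000, 10
X = rng.uniform(0, 1, size=(n, d))           # experts' advice in [0, 1]^d
y = (X[:, 0] > 0.5).astype(float)            # toy stream: expert 0 is informative

sigma = np.sqrt(n / np.log(d))               # f(w) = sigma * sum_i w_i log(w_i)
theta, loss = np.zeros(d), 0.0
for x_t, y_t in zip(X, y):
    z = theta / sigma
    w_t = np.exp(z - z.max()); w_t /= w_t.sum()    # grad f*(theta): a softmax
    p = w_t @ x_t                                  # probability of predicting 1
    loss += abs(y_t - p)                           # expected 0-1 loss this round
    g = np.sign(p - y_t) * x_t                     # subgradient of |y_t - <w, x_t>|
    theta -= g                                     # v_t = -g_t, descent sign convention

best = min(np.abs(y - X[:, j]).sum() for j in range(d))   # best single expert
print("regret vs best expert:", loss - best, " ~bound:", np.sqrt(n * np.log(d)))
```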
Optimization from a Machine Learning Perspective

Assume we'd like to solve a regularized loss minimization problem:
$$\min_w\; f(w) + \frac{1}{n}\sum_{i=1}^n \ell(w, z_i)$$

Stochastic Mirror Descent:
- At each step, sample $i$ uniformly at random and feed the online learner the loss $\ell_t(w) = \ell(w, z_i)$
- Return the averaged $w_t$ of the online learner
- The number of iterations required to achieve a given accuracy is of the order of the sample complexity!
- Optimality: S. and Srebro [08]
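A minimal stochastic mirror descent sketch in the Euclidean case for a strongly convex objective: $\ell_2$ regularization plus hinge loss, with the classical $1/(\lambda t)$ step size and an averaged iterate. Data and parameters are synthetic choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, lam, T = 500, 5, 0.1, 5000
Z = rng.normal(size=(n, d))                   # synthetic examples z_i
y = np.sign(Z @ rng.normal(size=d))           # synthetic labels

# Euclidean stochastic mirror descent (i.e. SGD) on
#   f(w) + (1/n) sum_i l(w, z_i)  with  f(w) = (lam/2)||w||^2, hinge loss l.
w, w_avg = np.zeros(d), np.zeros(d)
for t in range(1, T + 1):
    i = rng.integers(n)                       # sample one example per step
    g = lam * w
    if y[i] * (w @ Z[i]) < 1:
        g -= y[i] * Z[i]                      # subgradient of the hinge term
    w -= g / (lam * t)                        # step for a lam-strongly convex objective
    w_avg += (w - w_avg) / t                  # running average of the iterates

obj = lambda v: lam / 2 * v @ v + np.mean(np.maximum(0, 1 - y * (Z @ v)))
print("objective at averaged iterate:", obj(w_avg))
```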
Intermediate Summary

[Diagram: "f strongly convex ⇐⇒ f* smooth" leads via Fenchel–Young to Theorem 1, which gives online regret bounds and, via the zero-mean argument, the Rademacher bound.]

Coming Next ...

[Same diagram, with Theorem 1 now also leading to boosting.]
Boosting

Input:
- Training set of examples $(x_1, y_1), \ldots, (x_m, y_m)$
- $d$ weak hypotheses $h_1, \ldots, h_d$

Output:
- Strong hypothesis: $H_w(\cdot) = \sum_{i=1}^d w_i h_i(\cdot)$

Weak Learnability Assumption
For any probability $p \in S_m$ over the examples, there exists $h_j$ with edge at least $\gamma$:
$$\sum_i p_i\, y_i\, h_j(x_i) = \Pr_p[h_j = y] - \Pr_p[h_j \ne y] \ge \gamma$$
Deriving a Boosting Algorithm from Theorem 1

- Goal: find $w$ s.t. $\min_i y_i H_w(x_i) > 0$
- Equivalently: find $w, \mu$ s.t. $\mu_i = y_i H_w(x_i)$ and $\min_i \mu_i > 0$
- Define: $L(\mu) = \log\big(\frac{1}{m}\sum_i \exp(-\mu_i)\big)$ (1-smooth w.r.t. $\|\cdot\|_\infty$)
- Observe: $L(\mu) \le -\log(m) \Rightarrow \min_i \mu_i > 0$
- Recall from Theorem 1: $L(\mu_n) \le \sum_t \big(\langle\nabla L(\mu_t), v_t\rangle + \frac{1}{2}\|v_t\|_\infty^2\big)$
- Observe: $p \stackrel{\text{def}}{=} \nabla L(\mu_t) \in S_m$; weak learnability $\Rightarrow$ there exists $r_t$ s.t. $\sum_i p_i\, y_i\, h_{r_t}(x_i) \ge \gamma$
- Applying Theorem 1 with $v_{t,i} = -\gamma\, y_i\, h_{r_t}(x_i)$ gives $L(\mu_n) \le -\frac{n\gamma^2}{2}$
- Therefore, $n \ge \frac{2\log(m)}{\gamma^2} \Rightarrow \min_i \mu_{n,i} > 0$
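A sketch of the resulting procedure on synthetic data: the potential $L(\mu)$ induces a distribution $p \propto e^{-\mu}$ over examples, the weak learner returns the hypothesis with the best edge, and $\mu$ moves by $\gamma\, y_i h_j(x_i)$ per round (the slide's $v_t$ up to sign conventions). The data construction guarantees an $\ell_1$-margin comparator, so the $2\log(m)/\gamma^2$ round count applies; every name here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
m, d = 200, 50
H = rng.choice([-1.0, 1.0], size=(m, d))     # H[i, j] = h_j(x_i): toy weak hypotheses
w_star = np.zeros(d); w_star[:2] = [0.6, 0.4]
y = np.sign(H @ w_star)                      # labels realizable with an l1-margin
A = y[:, None] * H                           # A[i, j] = y_i h_j(x_i)
gamma = (A @ w_star).min()                   # edge certified by the comparator: 0.2

mu, w = np.zeros(m), np.zeros(d)
rounds = int(np.ceil(2 * np.log(m) / gamma**2)) + 1
for t in range(rounds):
    p = np.exp(-mu - (-mu).max()); p /= p.sum()   # p propto exp(-mu): hard examples
    j = int(np.argmax(p @ A))                     # weak learner: best-edge hypothesis
    mu += gamma * A[:, j]                         # margins grow by gamma * y_i h_j(x_i)
    w[j] += gamma
print("min margin:", (A @ w).min(), "after", rounds, "rounds")  # should be > 0
```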
Boosting – Brief History

Is weak learnability equivalent to strong learnability?
- Yes!
- You can use AdaBoost
- Boosting is related to margin
- Of course, it is a corollary of the minimax theorem
Weak Learnability = Separability with ℓ1 Margin

$$A = \begin{pmatrix} y_1 h_1(x_1) & \cdots & y_1 h_d(x_1) \\ \vdots & \ddots & \vdots \\ y_m h_1(x_m) & \cdots & y_m h_d(x_m) \end{pmatrix}$$

Minimax theorem:
$$\underbrace{\max_{w\in S_d}\, \min_i\, (A w)_i}_{\text{margin}} \;=\; \gamma \;=\; \underbrace{\min_{p\in S_m}\, \max_j\, (p^T A)_j}_{\gamma\text{-weak learnability}}$$
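Both sides of the minimax identity are linear programs over simplices, so they can be checked directly. A sketch using `scipy.optimize.linprog` on a random toy matrix; the two printed values should coincide, though $\gamma$ may be negative when the data is not separable.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(8)
m, d = 30, 8
A = rng.choice([-1.0, 1.0], size=(m, d))     # toy matrix A[i, j] = y_i h_j(x_i)

def simplex_maximin(M):
    """max_{w in simplex} min_i (M w)_i, solved as a small LP."""
    r, n = M.shape
    c = np.zeros(n + 1); c[-1] = -1.0                   # maximize the slack gamma
    A_ub = np.hstack([-M, np.ones((r, 1))])             # gamma <= (M w)_i for all i
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(r), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)], method="highs")
    return -res.fun

print("margin:", simplex_maximin(A))          # max_w min_i (A w)_i
print("edge:  ", -simplex_maximin(-A.T))      # min_p max_j (p^T A)_j, via duality
```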
Reinterpreting the Boosting Result

| | Assumption | #iterations | runtime |
| AdaBoost | $\ell_1$ margin $\gamma$ | $\frac{\log(m)}{\gamma^2}$ | $\frac{m \log(m)\, d}{\gamma^2}$ |
| Perceptron | $\ell_2$ margin $\gamma$ | $\frac{d}{\gamma^2}$ | $\frac{d^2}{\gamma^2}$ |
| Winnow | $\ell_1$ margin $\gamma$ | $\frac{\log(d)}{\gamma^2}$ | $\frac{d \log(d)}{\gamma^2}$ |
Summary

[Diagram: "f strongly convex ⇐⇒ f* smooth" leads via Fenchel–Young to Theorem 1, which gives boosting, online regret bounds, and, via the zero-mean argument, the Rademacher bound.]
More Applications

Sparsification
- Theorem: for smooth loss functions, any linear predictor with low $\ell_1$ norm can be converted into a sparse linear predictor
- Proof idea: definition of smoothness + a probabilistic construction
- Theorem: also true for non-smooth but Lipschitz loss functions
- Proof idea: infimal convolution + our main lemma ⇒ any Lipschitz function can be approximated by a smooth function

Concentration Inequalities
- Pinelis-like concentration results for martingales in Banach spaces
More Applications – Matrix Regularization

Lemma: the matrix function $F(A) = f(\sigma(A))$, where $\sigma(A)$ is the vector of singular values of $A$ and $f$ is strongly convex w.r.t. $\|w\|$, is strongly convex w.r.t. $\|\sigma(A)\|$.
Corollaries:
- Generalization bounds for multi-task learning
- Regret bounds for multi-task learning

Lemma: the matrix function $F(A) = \big\|\big(\|A_{1,\cdot}\|_2, \ldots, \|A_{m,\cdot}\|_2\big)\big\|_q^2$ is strongly convex w.r.t. the matrix norm $\big\|\big(\|A_{1,\cdot}\|_2, \ldots, \|A_{m,\cdot}\|_2\big)\big\|_q$.
Corollaries:
- Generalization bounds for group Lasso, kernel learning, multi-task learning
- Regret bounds for the above, and also shifting regret bounds
Summary

Lemma
$f$ is strongly convex w.r.t. $\|\cdot\|$ ⇐⇒ $f^\star$ is strongly smooth w.r.t. $\|\cdot\|_\star$

- Isolating a single useful property of regularization functions
- Deriving many known results easily based on this property
- Good theory should also predict new results – we derived new algorithms and bounds from the generalized theory