Convergence of Heterogeneous Distributed Learning
in Stochastic Routing Games
Syrine Krichene
Walid Krichene
Roy Dong
Alexandre Bayen
September 30, 2015
Outline
1. Introduction
2. Heterogeneous Learning with Stochastic Mirror Descent
3. Simulations
Routing game
Used to model congestion in
Transportation networks
Communication networks
[Figure: Example network]

Directed graph $(V, E)$
Population $k$: paths $\mathcal{P}_k$
Population distribution over paths $x_{\mathcal{P}_k} \in \Delta^{\mathcal{P}_k}$
Loss on path $p$: $\ell_p(x) = \sum_{e \in p} c_e(\phi_e)$, where $\phi_e$ is the flow on edge $e$
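To make the loss structure concrete, here is a minimal Python sketch on a hypothetical two-path, three-edge network with made-up latency functions (the names `M`, `edge_costs`, `path_losses` are ours): edge flows are aggregated through the edge-path incidence matrix, and each path loss sums its edge costs.

```python
import numpy as np

# Hypothetical instance: one population, two paths over three edges.
# M is the edge-path incidence matrix: M[e, p] = 1 if edge e lies on path p.
M = np.array([[1, 0],
              [1, 1],
              [0, 1]])

# Made-up increasing edge latency functions c_e(phi_e).
edge_costs = [lambda phi: 1.0 + phi,
              lambda phi: 0.5 + 2.0 * phi,
              lambda phi: 2.0 + phi]

def path_losses(x):
    """ell_p(x) = sum_{e in p} c_e(phi_e), with edge flows phi = M x."""
    phi = M @ x
    c = np.array([edge_costs[e](phi[e]) for e in range(len(edge_costs))])
    return M.T @ c  # each path's loss is the sum of its edges' costs

print(path_losses(np.array([0.6, 0.4])))
```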
Online learning model

Online Learning Model
1: for $t \in \mathbb{N}$ do
2:     Play $p \sim x^{(t)}_{\mathcal{P}_k}$
3:     Discover $\ell^{(t)}_{\mathcal{P}_k}$
4:     Update $x^{(t+1)}_{\mathcal{P}_k}$
5: end for

Illustration for population 1: start from $x^{(t)}_{\mathcal{P}_1} \in \Delta^{\mathcal{P}_1}$; sample $p \sim x^{(t)}_{\mathcal{P}_1}$; discover $\ell^{(t)}_{\mathcal{P}_1}$; update $x^{(t+1)}_{\mathcal{P}_1}$.
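A minimal loop skeleton for this model (a sketch: `path_losses` is the hypothetical function above, and `update` stands for any learning rule, e.g. the Hedge update defined later):

```python
import numpy as np

def online_learning(x0, path_losses, update, T, seed=0):
    """Play a path, discover the loss vector, update the distribution."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for t in range(1, T + 1):
        p = rng.choice(len(x), p=x)   # play p ~ x^(t)
        losses = path_losses(x)       # discover ell^(t)
        x = update(x, losses, t)      # update x^(t+1)
    return x
```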
Convergence to Nash equilibria

Nash equilibrium
$x^\star$ is a Nash equilibrium if for all $x$:
$$\langle \ell(x^\star), x - x^\star \rangle = \sum_k \left\langle \ell_{\mathcal{P}_k}(x^\star),\; x_{\mathcal{P}_k} - x^\star_{\mathcal{P}_k} \right\rangle \ge 0$$
I.e., for each population, every path in the support of $x^\star_{\mathcal{P}_k}$ has minimal loss.

Rosenthal potential $f$
$$f(x) = \sum_{e \in E} \int_0^{\phi_e} c_e(u)\,du, \qquad \phi = Mx$$
$$\nabla f(x) = \ell(x), \qquad \mathcal{N} = \arg\min_{x \in \Delta^{\mathcal{P}_1} \times \cdots \times \Delta^{\mathcal{P}_K}} f(x)$$
$$x^{(t)} \to \mathcal{N} \quad \Longleftrightarrow \quad f(x^{(t)}) - f^\star \to 0$$
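A sketch of the Rosenthal potential for the toy instance above (numerical quadrature via `scipy.integrate.quad`; `M`, `edge_costs`, `path_losses` are the hypothetical objects from the earlier sketch), together with a finite-difference check that $\nabla f = \ell$:

```python
import numpy as np
from scipy.integrate import quad

def rosenthal_potential(x, M, edge_costs):
    """f(x) = sum_e int_0^{phi_e} c_e(u) du, with phi = M x."""
    phi = M @ x
    return sum(quad(edge_costs[e], 0.0, phi[e])[0] for e in range(len(edge_costs)))

x, eps = np.array([0.6, 0.4]), 1e-6
grad = np.array([(rosenthal_potential(x + eps * np.eye(2)[i], M, edge_costs)
                  - rosenthal_potential(x, M, edge_costs)) / eps
                 for i in range(2)])
print(grad, path_losses(x))  # the two vectors should agree up to O(eps)
```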
Previous Results

Average regret of population $k$
$$R_k^{(t)}(y_{\mathcal{P}_k}) = \frac{1}{t} \sum_{\tau=1}^{t} \left\langle \ell_{\mathcal{P}_k}(x^{(\tau)}),\; x^{(\tau)}_{\mathcal{P}_k} - y_{\mathcal{P}_k} \right\rangle$$

Convergence of no-regret dynamics [3]
If every population has vanishing average regret, then $\bar{x}^{(t)} = \frac{1}{t} \sum_{\tau=1}^{t} x^{(\tau)} \to \mathcal{N}$.

Convergence of multiplicative weights [7]
Under multiplicative weights learning with $\eta_t \downarrow 0$, $x^{(t)} \to \mathcal{N}$.
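For reference, the average regret along a realized trajectory can be computed as follows (a sketch; the expectation is replaced by the sample path):

```python
import numpy as np

def average_regret(loss_history, x_history, y):
    """(1/t) sum_tau <ell(x^(tau)), x^(tau) - y> along one trajectory."""
    t = len(loss_history)
    return sum(np.dot(l, x - y) for l, x in zip(loss_history, x_history)) / t
```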
Our Results

Generalize the model:
Observations are stochastic, losses are non-Lipschitz.
Learning is heterogeneous.

More precisely,
Observe $\hat\ell^{(t)}$, such that $\mathbb{E}[\hat\ell^{(t)} \mid \mathcal{F}_{t-1}] = \ell(x^{(t)})$ a.s., and $\mathbb{E}[\|\hat\ell^{(t)}\|_*^2] \le G^2$ uniformly.
Observation noise, or learning model with bandit feedback (form an unbiased estimator of the loss vector; see the sketch after this slide).
Populations can apply different learning algorithms, in particular different learning rates $\eta_t^k = \theta_k t^{-\alpha_k}$.

Convergence of Distributed Stochastic Mirror Descent
For $\eta_t^k = \theta_k / t^{\alpha_k}$, $\alpha_k \in (0, 1)$,
$$\mathbb{E}\left[f(x^{(t)})\right] - f^\star = O\left(\sum_k \frac{\log t}{t^{\min(\alpha_k, 1-\alpha_k)}}\right)$$
In the strongly convex, homogeneous case,
$$\mathbb{E}\left[D_\psi(x^\star, x^{(t)})\right] = O\left(t^{-\alpha}\right)$$
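In the bandit setting only the played path's loss is revealed; one standard way to form the unbiased estimator $\hat\ell^{(t)}$ assumed above is importance weighting. A sketch (function names are ours):

```python
import numpy as np

def bandit_loss_estimate(x, path_losses, rng):
    """Importance-weighted loss estimate: E[l_hat | x] equals the true loss vector.

    Only the sampled path's loss is used; dividing by its probability makes
    each coordinate unbiased. Note E[||l_hat||^2] blows up as x_p -> 0, which
    is why the analysis only assumes the uniform second-moment bound G^2
    (and why the simulations use a smoothed variant).
    """
    p = rng.choice(len(x), p=x)        # play p ~ x
    losses = path_losses(x)            # the environment's (hidden) loss vector
    l_hat = np.zeros(len(x))
    l_hat[p] = losses[p] / x[p]        # observed loss, importance-weighted
    return l_hat
```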
Stochastic Mirror Descent

minimize $f(x)$ (convex function)
subject to $x \in X \subset \mathbb{R}^d$ (convex, compact set)

Algorithm 2: SMD Method with learning rates $(\eta_t)$
1: for $t \in \mathbb{N}$ do
2:     observe $\hat\ell^{(t)}_{\mathcal{P}_k}$ with $\mathbb{E}[\hat\ell^{(t)}_{\mathcal{P}_k} \mid \mathcal{F}_{t-1}] \in \partial_{\mathcal{P}_k} f(x^{(t)})$
3:     $x^{(t+1)}_{\mathcal{P}_k} = \arg\min_{x \in X_{\mathcal{P}_k}} \left\langle \hat\ell^{(t)}_{\mathcal{P}_k}, x \right\rangle + \frac{1}{\eta_t^k} D_{\psi_k}(x, x^{(t)}_{\mathcal{P}_k})$
4: end for

$\eta_t$: learning rate
$D_\psi$: Bregman divergence generated by a strongly convex function $\psi$

[Figure: each update minimizes the linearization $f(x^{(t)}) + \langle \ell^{(t)}, x - x^{(t)} \rangle$ plus the proximal term $\frac{1}{\eta_t} D_\psi(x, x^{(t)})$.]

See [9], [8].
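For $\psi(x) = \frac{1}{2}\|x\|_2^2$, step 3 reduces to projected stochastic gradient descent; a sketch using the standard sort-based Euclidean projection onto the simplex:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def smd_step_euclidean(x, l_hat, eta):
    """One SMD step with D_psi(x, y) = (1/2)||x - y||_2^2: projected SGD."""
    return project_simplex(x - eta * l_hat)
```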
Bregman Divergence

Strongly convex function $\psi$:
$$D_\psi(x, y) = \psi(x) - \psi(y) - \langle \nabla\psi(y), x - y \rangle$$
$\psi(x) = \frac{1}{2}\|x\|_2^2$, $D_\psi(x, y) = \frac{1}{2}\|x - y\|_2^2$ (SGD)
$\psi(x) = -H(x) = \sum_{i=1}^d x_i \ln x_i$, $D_\psi(x, y) = D_{KL}(x, y) = \sum_{i=1}^d x_i \ln \frac{x_i}{y_i}$.

[Figure: KL divergence]
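The two examples as code (a minimal sketch; on the simplex the negative-entropy divergence is exactly the KL divergence):

```python
import numpy as np

def bregman_sq(x, y):
    """D_psi for psi = (1/2)||.||_2^2: squared Euclidean distance."""
    return 0.5 * np.sum((x - y) ** 2)

def bregman_kl(x, y):
    """D_psi for psi = negative entropy: sum_i x_i ln(x_i / y_i).

    Convention 0 ln 0 = 0; assumes y_i > 0 wherever x_i > 0.
    """
    mask = x > 0
    return np.sum(x[mask] * np.log(x[mask] / y[mask]))
```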
Example: the Hedge algorithm

$$x^{(t+1)}_{\mathcal{P}_k} = \arg\min_{x \in X_k} \left\langle \ell^{(t)}_{\mathcal{P}_k}, x \right\rangle + \frac{1}{\eta_t^k} D_{KL}(x, x^{(t)}_{\mathcal{P}_k}).$$

Hedge algorithm
Update the distribution according to the observed loss:
$$x_p^{(t+1)} \propto x_p^{(t)} e^{-\eta_t^k \ell_p^{(t)}}$$

Also known as
Exponentially weighted average forecaster [5].
Multiplicative weight updates [1].
Exponentiated gradient descent [6].
Entropic descent [2].
Log-linear learning.
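The Hedge update as code, in a form pluggable into the loop skeleton above (a sketch; `theta` and `alpha` parameterize the learning rate $\eta_t^k = \theta_k t^{-\alpha_k}$):

```python
import numpy as np

def hedge_update(x, losses, t, theta=1.0, alpha=0.5):
    """Entropic mirror descent step: x_p^(t+1) proportional to x_p^(t) exp(-eta_t l_p)."""
    eta = theta * t ** (-alpha)
    w = x * np.exp(-eta * losses)
    return w / w.sum()
```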
Main tool

A regret bound:
$$\sum_{\tau=t_1}^{t_2} \mathbb{E}\left[\left\langle \ell^{(\tau)}_m,\; x^{(\tau)}_m - x_m \right\rangle\right] \le \frac{\mathbb{E}\left[D_{\psi_m}(x_m, x^{(t_1)}_m)\right]}{\eta^m_{t_1}} + D\left(\frac{1}{\eta^m_{t_2}} - \frac{1}{\eta^m_{t_1}}\right) + \frac{G^2}{2\mu_m} \sum_{\tau=t_1}^{t_2} \eta^m_\tau$$
($\mu_m$: strong convexity constant of $\psi_m$; $D$: a uniform bound on the divergence.)

From here,
Can easily show $\mathbb{E}\left[f(\bar{x}^{(t)})\right] \to f^\star$, where $\bar{x}^{(t)} = \frac{1}{t}\sum_{\tau=1}^t x^{(\tau)}$.
Can show a.s. convergence $x^{(t)} \to X^\star$ if $\sum_t \eta_t = \infty$ and $\sum_t \eta_t^2 < \infty$:
$$\mathbb{E}\left[D_\psi(X^\star, x^{(\tau+1)}) \mid \mathcal{F}_{\tau-1}\right] \le D_\psi(X^\star, x^{(\tau)}) - \eta_\tau \left(f(x^{(\tau)}) - f^\star\right) + \frac{\eta_\tau^2}{2\mu} \mathbb{E}\left[\|\hat\ell^{(\tau)}\|_*^2 \mid \mathcal{F}_{\tau-1}\right]$$
$D_\psi(X^\star, x^{(\tau)})$ is an almost supermartingale [10], so $D_\psi(X^\star, x^{(\tau)})$ converges a.s. and $\sum_\tau \eta_\tau \left(f(x^{(\tau)}) - f^\star\right) < \infty$ a.s.
Generalizes a known result in stochastic approximation, e.g. [4] (for SGD, for strictly convex functions).
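An illustrative check of the step-size conditions (not a proof): for $\eta_t = t^{-\alpha}$, $\sum_t \eta_t = \infty$ holds iff $\alpha \le 1$ and $\sum_t \eta_t^2 < \infty$ holds iff $\alpha > \frac{1}{2}$, so both hold exactly for $\alpha \in (\frac{1}{2}, 1]$. Partial sums show the qualitative behavior:

```python
import numpy as np

t = np.arange(1, 10**6 + 1)
for alpha in (0.3, 0.75, 1.0):
    eta = t ** (-alpha)
    # sum eta_t keeps growing for all three choices; sum eta_t^2 levels
    # off only for alpha > 1/2 (here 0.75 and 1.0).
    print(f"alpha={alpha}: S1={eta.sum():.1f}  S2={(eta ** 2).sum():.3f}")
```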
Main tools and results

To show convergence $\mathbb{E}\left[f(x^{(t)})\right] \to f^\star$, generalize the technique of Shamir and Zhang [11] (for SGD, $\alpha = \frac{1}{2}$).

Convergence of Distributed Stochastic Mirror Descent
For $\eta_t^k = \theta_k / t^{\alpha_k}$, $\alpha_k \in (0, 1)$,
$$\mathbb{E}\left[f(x^{(t)})\right] - f^\star = O\left(\sum_k \frac{\log t}{t^{\min(\alpha_k, 1-\alpha_k)}}\right)$$
Non-smooth, non-strongly convex.
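As a direct instantiation of the theorem: if all $K$ populations use $\alpha_k = \frac{1}{2}$, each exponent $\min(\alpha_k, 1 - \alpha_k)$ attains its largest value $\frac{1}{2}$, and the bound specializes to
$$\mathbb{E}\left[f(x^{(t)})\right] - f^\star = O\left(K \, \frac{\log t}{\sqrt{t}}\right),$$
recovering the $O(\log t / \sqrt{t})$ rate known for SGD [11].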
Example: routing game with non-strongly-convex potential

[Figure: A non-strongly-convex example network.]

Learning model: (smoothed) entropic mirror descent, with $\eta_t^k = \theta_k t^{-\alpha_k}$; a simulation sketch follows.
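A sketch of the whole experiment under the stated assumptions, reusing the hypothetical pieces from the earlier sketches (`M`, `edge_costs`, `path_losses`, `rosenthal_potential`, `bandit_loss_estimate`, `hedge_update`); the smoothing step is omitted for brevity, and each population would carry its own $(\theta_k, \alpha_k)$:

```python
import numpy as np

def run_smd(T, theta=1.0, alpha=0.4, seed=0):
    """Stochastic mirror descent with bandit feedback on the toy network."""
    rng = np.random.default_rng(seed)
    x = np.full(2, 0.5)                          # uniform initial distribution
    potentials = []
    for t in range(1, T + 1):
        l_hat = bandit_loss_estimate(x, path_losses, rng)
        x = hedge_update(x, l_hat, t, theta=theta, alpha=alpha)
        potentials.append(rosenthal_potential(x, M, edge_costs))
    return x, np.array(potentials)

x_final, pot = run_smd(T=1000)
print("final distribution:", x_final, " f(x^(T)):", pot[-1])
# To reproduce the figure below, plot f(x^(t)) - f* on a log-log scale
# (f* obtainable by minimizing rosenthal_potential over the simplex) and
# average over several seeds, since the bound is on the expectation.
```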
[Figure: Potential values $\mathbb{E}\left[f(x^{(\tau)})\right] - f^\star$ versus $\tau$ (log-log scale), for $\eta_t^1 = t^{-.3}, \eta_t^2 = t^{-.4}$ and for $\eta_t^1 = t^{-.5}, \eta_t^2 = t^{-.5}$.]

For $\eta_t^k = \theta_k / t^{\alpha_k}$, $\alpha_k \in (0, 1)$, $\mathbb{E}\left[f(x^{(t)})\right] - f^\star = O\left(\sum_k \frac{\log t}{t^{\min(\alpha_k, 1-\alpha_k)}}\right)$.
Example: strongly convex potential

[Figure: A strongly convex example network.]

Learning model: (smoothed) entropic mirror descent, with $\eta_t = t^{-1}$
[Figure: $\mathbb{E}\left[D_{KL}(x^\star, x^{(\tau)})\right]$ versus $\tau$ (log-log scale), for $\eta_t^1 = t^{-1}, \eta_t^2 = t^{-1}$.]

$$\mathbb{E}\left[D_\psi(x^\star, x^{(t)})\right] = O\left(t^{-1}\right)$$
Conclusion

Summary
A more realistic model: stochastic observations, non-Lipschitz losses, heterogeneous learning.
Convergence bounds for Stochastic Mirror Descent, with heterogeneous learning rates.
Convergence of $x^{(t)}$ instead of $\bar{x}^{(t)}$.

Current and future work
Model of learning at the player level.
Estimation of model parameters (e.g. learning rate).
Optimal control on top of this behavioral model.
Thank you.
eecs.berkeley.edu/~walid
References
[1] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights
update method: a meta-algorithm and applications. Theory of
Computing, 8(1):121–164, 2012.
[2] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected
subgradient methods for convex optimization. Oper. Res. Lett., 31(3):
167–175, May 2003.
[3] Avrim Blum, Eyal Even-Dar, and Katrina Ligett. Routing without regret:
on convergence to nash equilibria of regret-minimizing algorithms in
routing games. In Proceedings of the twenty-fifth annual ACM symposium
on Principles of distributed computing, PODC ’06, pages 45–52, New
York, NY, USA, 2006. ACM.
[4] Léon Bottou. Online algorithms and stochastic approximations. In David
Saad, editor, Online Learning and Neural Networks. Cambridge University
Press, Cambridge, UK, 1998. Revised October 2012.
[5] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games.
Cambridge University Press, 2006.
[6] Jyrki Kivinen and Manfred K. Warmuth. Exponentiated gradient versus
gradient descent for linear predictors. Information and Computation,
132(1):1–63, 1997.
[7] Robert Kleinberg, Georgios Piliouras, and Eva Tardos. Multiplicative
updates outperform generic no-regret learning in congestion games. In
Proceedings of the 41st annual ACM symposium on Theory of computing,
pages 533–542. ACM, 2009.
[8] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic
approximation approach to stochastic programming. SIAM Journal on
Optimization, 19(4):1574–1609, 2009.
[9] A. S. Nemirovsky and D. B. Yudin. Problem complexity and method
efficiency in optimization. Wiley-Interscience series in discrete
mathematics. Wiley, 1983.
[10] H. Robbins and D. Siegmund. A convergence theorem for non negative
almost supermartingales and some applications. Optimizing Methods in
Statistics, 1971.
[11] Ohad Shamir and Tong Zhang. Stochastic gradient descent for
non-smooth optimization: Convergence results and optimal averaging
schemes. In ICML, pages 71–79, 2013.