Lecture Slides

Online Learning
Yiling Chen
February 22, 2016
Outline
▶ Randomized Weighted Majority
▶ Follow the Leader and Follow the Regularized Leader
▶ Online Convex Optimization
Learning from Expert Advice
At each round $t \in \{1, 2, \dots, T\}$:
▶ The algorithm chooses a distribution $\vec{p}_t$.
▶ Each expert $i \in \{1, 2, \dots, n\}$ suffers loss $l_{i,t} \in [0, 1]$.
  ▶ Can be generalized to any bounded loss.
▶ The algorithm suffers expected loss $\vec{p}_t \cdot \vec{l}_t$.
The algorithm’s goal is to minimize regret:
\[
\sum_{t=1}^{T} \vec{p}_t \cdot \vec{l}_t - \min_{i \in \{1,\dots,n\}} \sum_{t=1}^{T} l_{i,t}
\]
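To make the protocol concrete, here is a minimal Python sketch (my own illustration, not from the slides; the function names and the uniform strategy are placeholders) of the expert-advice loop and the regret it incurs.

```python
import numpy as np

def run_expert_advice(losses, choose_distribution):
    """Run the expert-advice protocol on a T x n loss matrix.

    losses[t, i] is the loss of expert i at round t (assumed in [0, 1]).
    choose_distribution(past_losses) returns a probability vector over the
    n experts and may only look at losses from earlier rounds.
    Returns (algorithm loss, regret).
    """
    T, n = losses.shape
    alg_loss = 0.0
    for t in range(T):
        p_t = choose_distribution(losses[:t])
        alg_loss += p_t @ losses[t]              # expected loss p_t . l_t
    best_expert_loss = losses.sum(axis=0).min()  # min_i sum_t l_{i,t}
    return alg_loss, alg_loss - best_expert_loss

# Illustration only: the uniform strategy on random losses.
rng = np.random.default_rng(0)
losses = rng.uniform(size=(1000, 5))
print(run_expert_advice(losses, lambda past: np.full(5, 1 / 5)))
```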
Regret Guarantee
The type of guarantee that algorithms for learning from expert
advice can achieve is
\[
\sum_{t=1}^{T} \vec{p}_t \cdot \vec{l}_t - \min_{i \in \{1,\dots,n\}} \sum_{t=1}^{T} l_{i,t} = O\!\left(\sqrt{T \log n}\right).
\]
This means that
\[
\frac{\sum_{t=1}^{T} \vec{p}_t \cdot \vec{l}_t - \min_{i \in \{1,\dots,n\}} \sum_{t=1}^{T} l_{i,t}}{T} \to 0 \quad \text{as } T \to \infty.
\]
Hence the class of algorithms is called no-regret learning.
Randomized Weighted Majority (RWM)
The Hedge algorithm.
▶ Each round, each expert $i$ has a weight
  \[
  w_{i,t} = e^{-\eta L_{i,t-1}} = w_{i,t-1} \, e^{-\eta l_{i,t-1}},
  \]
  where $L_{i,t-1} = \sum_{s=1}^{t-1} l_{i,s}$ is expert $i$'s cumulative loss.
▶ The algorithm selects an expert according to the distribution
  \[
  p_{i,t} = \frac{w_{i,t}}{\sum_{j=1}^{n} w_{j,t}}
  \]
We’ll come back to the regret guarantee of RWM later.
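A minimal sketch of one RWM/Hedge round in Python (the function name hedge_distribution and the numbers are mine); shifting by the smallest cumulative loss is a standard numerical-stability trick and does not change the distribution.

```python
import numpy as np

def hedge_distribution(cumulative_losses, eta):
    """RWM/Hedge: p_{i,t} proportional to exp(-eta * L_{i,t-1}).

    Subtracting the smallest cumulative loss keeps every exponential in
    (0, 1] without changing the normalized distribution.
    """
    w = np.exp(-eta * (cumulative_losses - cumulative_losses.min()))
    return w / w.sum()

# One round of play, illustration only: 4 experts, eta = 0.1.
L_prev = np.array([3.0, 2.5, 4.1, 2.9])   # cumulative losses L_{i,t-1}
p_t = hedge_distribution(L_prev, eta=0.1)
print(p_t, p_t.sum())
```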
Follow the Leader
Let’s think about the problem differently. What if the learning
algorithm chooses the distribution $\vec{p}$ with the best performance on
observed data so far?
Follow the Leader (FTL) algorithm
▶ $\vec{p}_t \in \arg\min_{\vec{p} \in \Delta_n} \sum_{s=1}^{t-1} \vec{l}_s \cdot \vec{p} = \arg\min_{\vec{p} \in \Delta_n} \vec{L}_{t-1} \cdot \vec{p}$
▶ $\vec{p}_t$ assigns probability 1 to the best expert so far.
What about the performance of FTL?
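A sketch of FTL and of the classic two-expert sequence that makes it fail (my own illustration, not from the slides): after a small first loss, the losses alternate so that FTL always follows the expert about to suffer, paying roughly 1 per round while the best expert pays roughly T/2.

```python
import numpy as np

def ftl_distribution(cumulative_losses):
    """Follow the Leader: put probability 1 on the best expert so far.

    Minimizing L_{t-1} . p over the simplex is attained at a vertex,
    i.e. the expert with the smallest cumulative loss (ties -> lowest index).
    """
    p = np.zeros(len(cumulative_losses))
    p[int(np.argmin(cumulative_losses))] = 1.0
    return p

# Classic instability example with two experts: after a small first loss,
# the remaining losses alternate so FTL always follows the expert about to lose.
T = 10
losses = [np.array([0.5, 0.0])]
losses += [np.array([0.0, 1.0]) if t % 2 == 1 else np.array([1.0, 0.0])
           for t in range(1, T)]

L = np.zeros(2)
ftl_loss = 0.0
for l_t in losses:
    ftl_loss += ftl_distribution(L) @ l_t
    L += l_t
print("FTL loss:", ftl_loss, "best expert loss:", L.min())  # ~T vs ~T/2
```

This instability is exactly what the regularization on the next slide addresses.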
Follow the Regularized Leader
FTL is inherently unstable. We introduce a regularization term
$R(\vec{p})$ to add some stability.
Follow the Regularized Leader (FTRL) algorithm
▶ $\vec{p}_t \in \arg\min_{\vec{p} \in \Delta_n} \left( \eta \sum_{s=1}^{t-1} \vec{l}_s \cdot \vec{p} + R(\vec{p}) \right)$
▶ $R(\vec{p})$ is a convex function and $\eta > 0$.
RWM as an FTRL algorithm
▶ RWM:
  \[
  p^{\mathrm{RWM}}_{i,t} = \frac{e^{-\eta L_{i,t-1}}}{\sum_{j=1}^{n} e^{-\eta L_{j,t-1}}}
  \]
▶ FTRL:
  \[
  \vec{p}_t \in \arg\min_{\vec{p} \in \Delta_n} \left( \eta \sum_{s=1}^{t-1} \vec{l}_s \cdot \vec{p} + R(\vec{p}) \right)
  \]
RWM as an FTRL
\[
\vec{p}^{\,\mathrm{RWM}}_t = \arg\min_{\vec{p} \in \Delta_n} \left( \eta \sum_{s=1}^{t-1} \vec{l}_s \cdot \vec{p} - H(\vec{p}) \right),
\]
where $H$ is the entropy function,
\[
H(\vec{p}) = \sum_{i=1}^{n} p_i \log \frac{1}{p_i}.
\]
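As a numerical sanity check (a sketch of mine, not from the slides), we can evaluate the entropic FTRL objective at the RWM distribution and at random points of the simplex; the RWM point should attain the minimum.

```python
import numpy as np

def ftrl_entropy_objective(p, L, eta):
    """Entropic FTRL objective: eta * <L, p> - H(p), with 0 log 0 = 0."""
    p = np.clip(p, 1e-12, None)
    return eta * (L @ p) + (p * np.log(p)).sum()   # -H(p) = sum_i p_i log p_i

def rwm_distribution(L, eta):
    """Closed-form minimizer claimed on the slide: softmax of -eta * L."""
    w = np.exp(-eta * (L - L.min()))
    return w / w.sum()

rng = np.random.default_rng(0)
L = rng.uniform(0, 5, size=6)        # cumulative losses of 6 experts
eta = 0.5
p_star = rwm_distribution(L, eta)
f_star = ftrl_entropy_objective(p_star, L, eta)

# Compare with random points on the simplex.
samples = rng.dirichlet(np.ones(6), size=10_000)
f_samples = np.array([ftrl_entropy_objective(p, L, eta) for p in samples])
print(f_star <= f_samples.min() + 1e-9)   # True: the RWM point attains the minimum
```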
Be the Regularized Leader
▶ Be-the-Regularized-Leader algorithm (an impossible algorithm: it needs $\vec{l}_t$ before choosing $\vec{p}_t$):
  \[
  \vec{p}_t = \arg\min_{\vec{p} \in \Delta_n} \left( \eta \sum_{s=1}^{t} \vec{l}_s \cdot \vec{p} + R(\vec{p}) \right)
  \]
▶ This gives us a super useful lemma (try to prove it yourself using induction): for all $\vec{p} \in \Delta_n$,
  \[
  \sum_{t=1}^{T} \vec{l}_t \cdot \vec{p}_{t+1} - \sum_{t=1}^{T} \vec{l}_t \cdot \vec{p} \le \frac{1}{\eta} \left( R(\vec{p}) - R(\vec{p}_1) \right)
  \]
Regret Guarantee of FTRL
▶ Be-the-Regularized-Leader Lemma:
  \[
  \sum_{t=1}^{T} \vec{l}_t \cdot \vec{p}_{t+1} - \sum_{t=1}^{T} \vec{l}_t \cdot \vec{p} \le \frac{1}{\eta} \left( R(\vec{p}) - R(\vec{p}_1) \right)
  \]
▶ Generic regret for FTRL:
  \[
  \sum_{t=1}^{T} \vec{p}_t \cdot \vec{l}_t - \min_{i \in \{1,\dots,n\}} \sum_{t=1}^{T} l_{i,t}
  \le \sum_{t=1}^{T} \left( \vec{p}_t \cdot \vec{l}_t - \vec{p}_{t+1} \cdot \vec{l}_t \right)
  + \frac{1}{\eta} \left( \max_{\vec{p} \in \Delta_n} R(\vec{p}) - \min_{\vec{p} \in \Delta_n} R(\vec{p}) \right)
  \]
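A brief sketch of how the generic bound follows from the lemma (my filling-in of the intermediate step, not on the slides): split off the stability terms and apply the lemma with $\vec{p}$ set to the point mass on the best expert.
\begin{align*}
\sum_{t=1}^{T} \vec{p}_t \cdot \vec{l}_t - \sum_{t=1}^{T} \vec{p} \cdot \vec{l}_t
&= \sum_{t=1}^{T} \left( \vec{p}_t \cdot \vec{l}_t - \vec{p}_{t+1} \cdot \vec{l}_t \right)
 + \left( \sum_{t=1}^{T} \vec{p}_{t+1} \cdot \vec{l}_t - \sum_{t=1}^{T} \vec{p} \cdot \vec{l}_t \right) \\
&\le \sum_{t=1}^{T} \left( \vec{p}_t \cdot \vec{l}_t - \vec{p}_{t+1} \cdot \vec{l}_t \right)
 + \frac{1}{\eta} \left( R(\vec{p}) - R(\vec{p}_1) \right)
 \le \sum_{t=1}^{T} \left( \vec{p}_t \cdot \vec{l}_t - \vec{p}_{t+1} \cdot \vec{l}_t \right)
 + \frac{1}{\eta} \left( \max_{\vec{p} \in \Delta_n} R(\vec{p}) - \min_{\vec{p} \in \Delta_n} R(\vec{p}) \right).
\end{align*}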
Regret for RWM
\[
\sum_{t=1}^{T} \vec{p}_t \cdot \vec{l}_t - \min_{i \in \{1,\dots,n\}} \sum_{t=1}^{T} l_{i,t} \le \eta T + \frac{1}{\eta} \log n
\]
Setting $\eta = \sqrt{\frac{\log n}{T}}$ gives $O(\sqrt{T \log n})$.
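A small empirical sketch (my own; the loss distribution is arbitrary): run Hedge with the tuned $\eta = \sqrt{\log n / T}$ on random losses and compare the realized regret to $\eta T + \frac{1}{\eta}\log n = 2\sqrt{T \log n}$.

```python
import numpy as np

def hedge_regret(losses, eta):
    """Run Hedge on losses[t, i] and return its realized regret."""
    T, n = losses.shape
    L = np.zeros(n)           # cumulative expert losses L_{i,t-1}
    alg_loss = 0.0
    for t in range(T):
        w = np.exp(-eta * (L - L.min()))
        p = w / w.sum()       # RWM distribution for round t
        alg_loss += p @ losses[t]
        L += losses[t]
    return alg_loss - L.min()

rng = np.random.default_rng(1)
T, n = 20_000, 10
losses = rng.uniform(size=(T, n))
eta = np.sqrt(np.log(n) / T)
print("realized regret:", hedge_regret(losses, eta))
print("bound eta*T + log(n)/eta = 2*sqrt(T*log n):", eta * T + np.log(n) / eta)
```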
Online Linear Convex Optimization
▶ $K \subset \mathbb{R}^N$ is a convex compact decision set.
▶ $\vec{w}_t \in \arg\min_{\vec{w} \in K} \left( \eta \sum_{s=1}^{t-1} \vec{l}_s \cdot \vec{w} + R(\vec{w}) \right)$
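To illustrate the general form, here is a minimal sketch (my choices of regularizer and decision set, not the slides'): with $R(\vec{w}) = \frac{1}{2}\|\vec{w}\|^2$ and $K$ a Euclidean ball, the FTRL arg min is just the projection of $-\eta \vec{L}_{t-1}$ onto $K$.

```python
import numpy as np

def project_onto_ball(w, radius=1.0):
    """Euclidean projection onto { w : ||w|| <= radius }."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def ftrl_quadratic(loss_vectors, eta, radius=1.0):
    """FTRL with R(w) = 0.5 * ||w||^2 over a Euclidean ball of given radius.

    Minimizing eta * <L_{t-1}, w> + 0.5 * ||w||^2 over the ball is the same
    as projecting the unconstrained minimizer -eta * L_{t-1} onto the ball.
    """
    L = np.zeros(loss_vectors.shape[1])   # cumulative loss vector L_{t-1}
    plays = []
    for l_t in loss_vectors:
        plays.append(project_onto_ball(-eta * L, radius))
        L += l_t
    return np.array(plays)

# Illustration on random linear losses in R^3.
rng = np.random.default_rng(2)
losses = rng.normal(size=(1000, 3))
plays = ftrl_quadratic(losses, eta=0.05)
alg_loss = np.sum(plays * losses)
best_fixed = -np.linalg.norm(losses.sum(axis=0))   # best fixed point in the unit ball
print("regret:", alg_loss - best_fixed)
```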