Multiplicative Update Algorithm
For Learning a Zero-Sum Game
Lecture 14, part 1, CMPS 272, W12
Manfred Warmuth
Slides by: Corrie Scalisi
University of California, Santa Cruz
Feb 23, 2012
Manfred Warmuth (UCSC)
Multiplicative Update Algorithm
Feb 23, 2012
1 / 17
Outline
1. Introduction and notation
2. Hedge / Weighted Majority Algorithm
3. MinMax Theorem

Introduction and notation
Goal
Algorithm for finding optimal mixture strategy for zero-sum game
Iterative play
Game matrix partially known and may change over time
Opponent may not play optimally
Want algorithm with total loss provably not much worse than best
possible fixed strategy chosen in hindsight
Bounds must hold against any opponent
Nature can be adversarial
No probabilistic assumptions needed
Game Matrix
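Game given as matrix M
[Figure: example 3 x 3 game matrix M, with the row player choosing a row and the column player choosing a column. Entries in extraction order: 1, .1, .5, 0, .5, .7, 1, .2, 1]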
Rows in M are pure strategies of row player
Columns are pure strategies of column player
M(i, j) = loss of row player if row player chooses pure strategy i
and column player chooses pure strategy j
Mixed Strategies
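[Figure: matrix M with its rows labelled by the probabilities P(1), ..., P(m) and its columns labelled by the probabilities Q(1), ..., Q(m)]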
Row player chooses row i with probability P(i)
Column player chooses column j with probability Q(j)
P, Q are non-negative and each sums to 1 - share vectors
Following notation of Freund and Schapire papers
Expected Loss
Expected loss of row player
\[ M(P, Q) := P^T M Q \]
(all vectors are column vectors)
Assumption
\[ M^{COL} = -(M^{ROW})^T \]
i.e. loss of row player is gain of the column player (zero-sum)
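As a minimal sketch of the expected loss and the zero-sum assumption, the following Python computes $P^T M Q$ for both players. The 2 x 2 matrix, the mixtures, and the helper name `expected_loss` are illustrative assumptions, not taken from the slides.

```python
def expected_loss(M, P, Q):
    """Return P^T M Q for a loss matrix M given as a list of rows."""
    return sum(P[i] * M[i][j] * Q[j]
               for i in range(len(P))
               for j in range(len(Q)))

# Hypothetical 2 x 2 loss matrix of the row player (not the matrix on the slides).
M_ROW = [[0.0, 1.0],
         [1.0, 0.0]]
# Zero-sum assumption: the column player's loss matrix is -(M_ROW)^T.
M_COL = [[-M_ROW[i][j] for i in range(2)] for j in range(2)]

P = [0.25, 0.75]   # row player's mixed strategy
Q = [0.5, 0.5]     # column player's mixed strategy

row_loss = expected_loss(M_ROW, P, Q)
# From the column player's point of view, Q indexes the rows of M_COL.
col_loss = expected_loss(M_COL, Q, P)
```

Under these assumptions the two expected losses sum to zero, which is what the zero-sum condition states.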
Deep underlying question:
What part of this generalizes to non-zero-sum games?
Value of Game
\[
\underbrace{v}_{\text{game value}}
= \underbrace{\min_P}_{\text{row minimizes}} \;
  \underbrace{\max_Q}_{\text{col maximizes}} \;
  \underbrace{P^T M Q}_{\text{expected loss/payoff}}
\]
If $v = M(P^*, Q^*)$, then $P^*$ and $Q^*$ are optimal strategies
Lecture mw-lect8-part1:
$v$, $P^*$ and $Q^*$ can be found with Linear Programming
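The min-max value can be illustrated without an LP solver for very small games. The sketch below is not the Linear Programming method from the earlier lecture: it swaps in a plain grid search over $P = (p, 1 - p)$ for a game with two rows, using the fact that the inner maximum over $Q$ is attained at a pure column because $P^T M Q$ is linear in $Q$. The matrix and the function names are assumptions made for illustration.

```python
def worst_case_loss(M, P):
    """max_Q P^T M Q; a linear function of Q is maximized at a pure column."""
    n_cols = len(M[0])
    return max(sum(P[i] * M[i][j] for i in range(len(P)))
               for j in range(n_cols))

def approx_value_two_rows(M, steps=1000):
    """Approximate min_P max_Q P^T M Q for a 2-row game by scanning p in [0, 1]."""
    best_p, best_val = 0.0, float("inf")
    for k in range(steps + 1):
        p = k / steps
        val = worst_case_loss(M, [p, 1.0 - p])
        if val < best_val:
            best_p, best_val = p, val
    return best_val, [best_p, 1.0 - best_p]

# Hypothetical matching-pennies style loss matrix with entries in [0, 1].
M = [[0.0, 1.0],
     [1.0, 0.0]]
v, P_star = approx_value_two_rows(M)
```

For this hypothetical matrix the scan settles on the uniform mixture. A grid search does not scale beyond a handful of rows, which is one reason an LP solver or the iterative algorithm of this lecture is preferable.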
Learning a Good Mixed Strategy for Row Player
M fixed but unknown (except for its size)
For trial # t:
1. Row player (learner) chooses mixed strategy $P_t$
2. Column player (environment) chooses mixed strategy $Q_t$
3. Learner is told loss of pure strategies: $M(i, Q_t) = E_i^T M Q_t$, where $E_i$ is the $i$th unit vector
4. Learner suffers loss $M(P_t, Q_t) = P_t^T M Q_t$
5. Use (3) to update $P_t$
Goal: Minimize regret against best strategy $P$ chosen in hindsight:
\[ \sum_{t=1}^{T} M(P_t, Q_t) - \min_P \sum_{t=1}^{T} M(P, Q_t) \]
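The feedback of step 3 and the regret can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the matrix, the played sequences, and the names `pure_losses` and `regret` are hypothetical. The learner here never adapts, so its regret grows linearly with the number of trials.

```python
def pure_losses(M, Q):
    """Feedback of step 3: M(i, Q) = E_i^T M Q for every row i."""
    return [sum(M[i][j] * Q[j] for j in range(len(Q))) for i in range(len(M))]

def regret(M, Ps, Qs):
    """Total loss of the played P_t minus the total loss of the best fixed P."""
    n = len(M)
    alg_loss = 0.0
    row_totals = [0.0] * n
    for P, Q in zip(Ps, Qs):
        losses = pure_losses(M, Q)
        alg_loss += sum(P[i] * losses[i] for i in range(n))
        for i in range(n):
            row_totals[i] += losses[i]
    # The comparator loss is linear in P, so the minimum is attained at a pure row.
    return alg_loss - min(row_totals)

M = [[0.0, 1.0],
     [1.0, 0.0]]               # hypothetical loss matrix
Qs = [[1.0, 0.0]] * 4          # environment always plays the first column
Ps = [[0.5, 0.5]] * 4          # learner keeps the uniform mixture
r = regret(M, Ps, Qs)
```

With these assumed sequences the learner loses 0.5 per trial while the first row loses nothing, so the regret after four trials is 2.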
Hedge / Weighted Majority Algorithm
Hedge Algorithm
$P_1$ = initial mixture used by learner
\[ P_{t+1}(i) = \frac{P_t(i)\, \beta^{M(i, Q_t)}}{Z_t}, \]
where $Z_t$ is the normalization and $\beta \in [0, 1)$ is the discount factor used by the algorithm
$\beta = e^{-\eta}$, where $\eta$ is a positive learning rate
[Figure: population divided into shares $P_t(1), P_t(2), \dots, P_t(n)$]
$P_t(i)$ is the fraction of the population playing row $i$
The update is an example where Replicator Dynamics leads to learning
Recall that the algorithm only knows M via the feedback $M(i, Q_t)$
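A single Hedge step can be sketched in a few lines of Python. The function name `hedge_update`, the learning rate, and the feedback vector are assumptions chosen for illustration. The function only receives the per-row losses $M(i, Q_t)$, never the matrix itself.

```python
import math

def hedge_update(P, losses, beta):
    """One Hedge step: P_{t+1}(i) = P_t(i) * beta**losses[i] / Z_t."""
    unnormalized = [p * beta ** loss for p, loss in zip(P, losses)]
    Z = sum(unnormalized)  # normalization Z_t
    return [w / Z for w in unnormalized]

eta = math.log(2.0)      # hypothetical learning rate
beta = math.exp(-eta)    # beta = e^{-eta}, here about 0.5
P1 = [0.5, 0.5]          # uniform initial mixture
# Hypothetical feedback: row 1 had loss 0, row 2 had loss 1.
P2 = hedge_update(P1, [0.0, 1.0], beta)
```

In this example the weight of the row with loss 1 is multiplied by $\beta$ before renormalizing, so the mixture shifts toward the row with loss 0.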
Variants of the “multiplicative updates”
The update
\[ P_{t+1}(i) = \frac{P_t(i)\, F_t(i)}{Z_t} \]
with any factors $F_t(i) \in \left[ \beta^{M(i, Q_t)},\; 1 - (1 - \beta) M(i, Q_t) \right]$
will produce the same bounds
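The two endpoints of the admissible range of factors can be compared numerically. The sketch below assumes losses in [0, 1] and uses hypothetical helper names. It checks on a grid that the exponential factor never exceeds the linear one, which follows from convexity of $\beta^x$ since the two agree at $x = 0$ and $x = 1$, and then applies the update with the linear factors.

```python
def exponential_factor(loss, beta):
    """Lower endpoint of the range: beta**loss."""
    return beta ** loss

def linear_factor(loss, beta):
    """Upper endpoint of the range: 1 - (1 - beta) * loss."""
    return 1.0 - (1.0 - beta) * loss

def update_with_factors(P, factors):
    """P_{t+1}(i) = P_t(i) * F_t(i) / Z_t for the given factors."""
    unnormalized = [p * f for p, f in zip(P, factors)]
    Z = sum(unnormalized)
    return [w / Z for w in unnormalized]

beta = 0.5                                  # hypothetical discount factor
grid = [k / 100 for k in range(101)]        # losses in [0, 1]
interval_is_nonempty = all(
    exponential_factor(x, beta) <= linear_factor(x, beta) + 1e-12 for x in grid
)
# Update a uniform mixture with the linear factors for losses 0 and 0.5.
P_next = update_with_factors([0.5, 0.5],
                             [linear_factor(x, beta) for x in (0.0, 0.5)])
```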
Which update factors are the most biologically plausible?
Multiplicative Updates
Blessing: Fast
Curse: Some species get wiped out while others dominate
Thesis: All updates in nature are multiplicative
Only species that survive are those that avoid the curse via some
mechanism
mutations
cycling
predator
...
Multiplicative updates are motivated as relative entropy
minimization problems (later)
Theorem
For any matrix M with entries in [0, 1] and any $Q_1, \dots, Q_T$ played by environment (column player)
\[
\underbrace{\sum_{t=1}^{T} M(P_t, Q_t)}_{\text{total loss of alg}}
\le \min_P \left[
\frac{\ln(\frac{1}{\beta})}{1 - \beta}
\underbrace{\sum_{t=1}^{T} M(P, Q_t)}_{\text{total loss of comparator } P}
+ \frac{1}{1 - \beta}
\underbrace{RE(P \| P_1)}_{\text{``distance'' of } P \text{ to } P_1}
\right]
\]
where relative entropy $RE(P \| P_1)$ is defined as
\[ RE(P \| P_1) = \sum_i P(i) \ln \frac{P(i)}{P_1(i)} \]
If initial mixture $P_1 = (\frac{1}{n}, \cdots, \frac{1}{n})$, then $RE(P \| P_1) \in [0, \ln(n)]$.
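The bound can be sanity-checked numerically. The sketch below runs Hedge from the uniform mixture on a hypothetical sequence of feedback vectors and compares its total loss with the right-hand side evaluated at one comparator, the first pure row. Evaluating the bracket at a single $P$ gives a value at least as large as the minimum over all $P$, so this is a weaker check than the theorem itself. All names and numbers are assumptions.

```python
import math

def relative_entropy(P, P1):
    """RE(P || P1) = sum_i P(i) ln(P(i) / P1(i)), with the convention 0 ln 0 = 0."""
    return sum(p * math.log(p / q) for p, q in zip(P, P1) if p > 0.0)

def hedge_total_loss(loss_vectors, beta):
    """Run Hedge from the uniform mixture and return its total expected loss."""
    n = len(loss_vectors[0])
    P = [1.0 / n] * n
    total = 0.0
    for losses in loss_vectors:
        total += sum(p * l for p, l in zip(P, losses))
        W = [p * beta ** l for p, l in zip(P, losses)]
        Z = sum(W)
        P = [w / Z for w in W]
    return total

def bound_for_comparator(P, P1, loss_vectors, beta):
    """Right-hand side of the theorem evaluated at a fixed comparator P."""
    comparator_loss = sum(sum(p * l for p, l in zip(P, losses))
                          for losses in loss_vectors)
    return (math.log(1.0 / beta) * comparator_loss
            + relative_entropy(P, P1)) / (1.0 - beta)

# Hypothetical feedback vectors M(., Q_t) with entries in [0, 1].
loss_vectors = [[0.0, 1.0, 0.5], [0.2, 0.9, 1.0], [0.1, 0.7, 0.3]] * 10
beta = 0.8
uniform = [1.0 / 3] * 3
alg = hedge_total_loss(loss_vectors, beta)
bound = bound_for_comparator([1.0, 0.0, 0.0], uniform, loss_vectors, beta)
```

For a pure comparator and a uniform $P_1$ the relative entropy equals $\ln(n)$, the top of the range stated above.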
Theorem continued
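M may change over time (i.e. $M_t$)
Holds as long as loss $M(i, Q_t) = E_i^T M Q_t$ of row $i$ lies in [0, 1]
With $\beta = \dfrac{1}{1 + \sqrt{\frac{2 \ln(n)}{T}}}$, bound of theorem becomes
\[
\frac{1}{T} \sum_{t=1}^{T} M(P_t, Q_t)
\le \min_P \left[ \frac{1}{T} \sum_{t=1}^{T} M(P, Q_t)
+ \underbrace{\sqrt{\frac{2 \ln(n)}{T}} + \frac{\ln(n)}{T}}_{\Delta_{T,n}} \right]
\]
\[ \Delta_{T,n} = O\!\left( \sqrt{\frac{\ln(n)}{T}} \right) \quad \text{and} \quad \lim_{T \to \infty} \Delta_{T,n} = 0. \]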
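To get a feel for the rate, the tuned discount factor and the additive term $\sqrt{2\ln(n)/T} + \ln(n)/T$ of the averaged bound can be tabulated for growing $T$. The function names and the choice $n = 10$ below are assumptions for illustration.

```python
import math

def tuned_beta(n, T):
    """Discount factor 1 / (1 + sqrt(2 ln(n) / T))."""
    return 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / T))

def additive_term(n, T):
    """Additive term of the averaged bound: sqrt(2 ln(n) / T) + ln(n) / T."""
    return math.sqrt(2.0 * math.log(n) / T) + math.log(n) / T

n = 10  # hypothetical number of rows
values = [(T, tuned_beta(n, T), additive_term(n, T))
          for T in (100, 10_000, 1_000_000)]
for T, beta, term in values:
    print(f"T={T:>9}  beta={beta:.4f}  additive term={term:.5f}")
```

As $T$ grows, the tuned $\beta$ approaches 1 (smaller updates per trial) while the additive term shrinks toward 0.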
MinMax Theorem
[von Neumann]
\[ \min_P \max_Q M(P, Q) = \max_Q \min_P M(P, Q) \]
The easy ≥ direction:
For any optimum strategy $P^*$ of the row player
\[ \min_P \max_Q M(P, Q) = \max_Q M(P^*, Q) \ge \max_Q \min_P M(P, Q) \]
The hard ≤ direction:
Follows from duality of Linear Programming
Let’s prove it with our algorithm!
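The ≤ direction
[Freund & Schapire]
Assume that in each round $t$, environment chooses $Q_t = \arg\max_Q M(P_t, Q)$
Define the average probability distributions as $\bar{P} = \frac{1}{T} \sum_{t=1}^{T} P_t$, $\bar{Q} = \frac{1}{T} \sum_{t=1}^{T} Q_t$
\begin{align*}
\min_P \max_Q P^T M Q
&\le \max_Q \bar{P}^T M Q \\
&= \max_Q \frac{1}{T} \sum_{t=1}^{T} P_t^T M Q && \text{by definition of } \bar{P} \\
&\le \frac{1}{T} \sum_{t=1}^{T} \max_Q P_t^T M Q \\
&= \frac{1}{T} \sum_{t=1}^{T} P_t^T M Q_t && \text{by definition of } Q_t \\
&\le \min_P \frac{1}{T} \sum_{t=1}^{T} P^T M Q_t + \Delta_{T,n} && \text{by Theorem} \\
&= \min_P P^T M \bar{Q} + \Delta_{T,n} && \text{by definition of } \bar{Q} \\
&\le \max_Q \min_P P^T M Q + \Delta_{T,n}
\end{align*}
The claim follows since $\lim_{T \to \infty} \Delta_{T,n} = 0$.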
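The argument is constructive: running Hedge against a best-responding environment and averaging the played strategies yields approximate min-max strategies. The Python sketch below illustrates this under stated assumptions. The rock-paper-scissors style matrix, the horizon, and the function name `solve_by_self_play` are hypothetical, and ties in the best response are broken by taking the first maximizing column.

```python
import math

def solve_by_self_play(M, T):
    """Hedge for the row player against a best-responding column player.

    Returns (lower, upper) with
    lower = min_P P^T M Q_bar and upper = max_Q P_bar^T M Q.
    """
    n, m = len(M), len(M[0])
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / T))
    P = [1.0 / n] * n
    P_bar, Q_bar = [0.0] * n, [0.0] * m
    for _ in range(T):
        # Q_t = argmax_Q M(P_t, Q); a pure column suffices.
        col_losses = [sum(P[i] * M[i][j] for i in range(n)) for j in range(m)]
        j_star = max(range(m), key=lambda j: col_losses[j])
        for i in range(n):
            P_bar[i] += P[i] / T
        Q_bar[j_star] += 1.0 / T
        # Hedge update with feedback M(i, Q_t) = M[i][j_star].
        W = [P[i] * beta ** M[i][j_star] for i in range(n)]
        Z = sum(W)
        P = [w / Z for w in W]
    upper = max(sum(P_bar[i] * M[i][j] for i in range(n)) for j in range(m))
    lower = min(sum(M[i][j] * Q_bar[j] for j in range(m)) for i in range(n))
    return lower, upper

# Hypothetical rock-paper-scissors style losses in [0, 1]; its value is 0.5.
M = [[0.5, 1.0, 0.0],
     [0.0, 0.5, 1.0],
     [1.0, 0.0, 0.5]]
T = 2000
lower, upper = solve_by_self_play(M, T)
gap_bound = math.sqrt(2.0 * math.log(3) / T) + math.log(3) / T
```

The game value is sandwiched between `lower` and `upper`, and by the chain of inequalities their gap is at most the additive term of the tuned bound, stored here as `gap_bound`.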