Computing Nash Equilibrium
Presenter: Yishay Mansour
Outline
• Problem Definition
• Notation
• Today: Zero-Sum games
• Next week: General Sum Games
– Multiple players
Model
• Multiple players $N = \{1, \dots, n\}$
• Strategy set
– Player i has m actions $S_i = \{s_{i1}, \dots, s_{im}\}$
– $S_i$ are the pure actions of player i
– $S = S_1 \times \cdots \times S_n$
• Payoff functions
– Player i: $u_i : S \to \mathbb{R}$
Strategies
• Pure strategies: actions
• Mixed strategy
– Player i: $p_i$, a distribution over $S_i$
– Game: $P = \prod_i p_i$
• Product distribution
• Modified distribution
– $P_{-i}$ = the distribution P with player i left out
– $(q, P_{-i})$ = player i plays q and every other player j plays $p_j$
Notations
• Average Payoff
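– Player i: $u_i(P) = E_{s \sim P}[u_i(s)] = \sum_{s \in S} P(s)\, u_i(s)$
– $P(s) = \prod_i p_i(s_i)$
• Nash Equilibrium
– $P^*$ is a Nash equilibrium if for every player i
– and for any distribution $q_i$ over $S_i$:
– $u_i(q_i, P^*_{-i}) \le u_i(P^*)$
• Best Response: each $p^*_i$ maximizes $u_i(\cdot, P^*_{-i})$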
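The average payoff and the Nash condition can be checked numerically on a small game. The sketch below is illustrative only: the dictionary layout for the payoffs, the function names, and the matching-pennies data are assumptions, not part of the lecture. It uses the fact that $u_i(q_i, P_{-i})$ is linear in $q_i$, so testing pure deviations is enough.

```python
from itertools import product

def expected_payoff(u, P):
    """Return [u_1(P), ..., u_n(P)] for a product distribution P = (p_1, ..., p_n).

    u maps a pure profile s (tuple of action indices) to a tuple of payoffs
    (hypothetical layout chosen for this sketch).
    """
    n = len(P)
    totals = [0.0] * n
    for s in product(*(range(len(p)) for p in P)):
        prob = 1.0
        for i, a in enumerate(s):
            prob *= P[i][a]              # P(s) = prod_i p_i(s_i)
        for i in range(n):
            totals[i] += prob * u[s][i]  # accumulate P(s) * u_i(s)
    return totals

def is_nash(u, P, tol=1e-9):
    """Check u_i(q_i, P_-i) <= u_i(P) for every player i and every pure q_i."""
    base = expected_payoff(u, P)
    for i, p in enumerate(P):
        for j in range(len(p)):
            deviated = list(P)
            deviated[i] = [1.0 if k == j else 0.0 for k in range(len(p))]
            if expected_payoff(u, deviated)[i] > base[i] + tol:
                return False
    return True

# Matching pennies (illustrative data): player 1 wins on a match.
pennies = {(0, 0): (1, -1), (0, 1): (-1, 1), (1, 0): (-1, 1), (1, 1): (1, -1)}
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(expected_payoff(pennies, uniform))            # [0.0, 0.0]
print(is_nash(pennies, uniform))                    # True
print(is_nash(pennies, [[1.0, 0.0], [1.0, 0.0]]))   # False
```

The enumeration over all pure profiles is exponential in the number of players, so this is only meant for toy examples.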
Notations
• Alternative payoff
– $x_{ij}(P) = u_i(s_{ij}, P_{-i}) = E_{s \sim P}[u_i(s) \mid s_i = s_{ij}]$
• Difference in payoff
– $z_{ij}(P) = x_{ij}(P) - u_i(P)$
• Improvement in payoff
– $g_{ij}(P) = \max\{z_{ij}(P), 0\}$
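A small numeric instance of the three quantities may help. The sketch below computes them for player 1 of a two-player game; the matrix layout `A[j][k]`, the function name, and the example strategies are assumptions made for illustration.

```python
def gains_player1(A, p, q):
    """Return (x, z, g) for player 1 in a two-player game.

    A[j][k] is player 1's payoff when player 1 plays j and player 2 plays k
    (hypothetical layout); p and q are the two mixed strategies.
    """
    x = [sum(A[j][k] * q[k] for k in range(len(q))) for j in range(len(p))]  # alternative payoffs
    u = sum(p[j] * x[j] for j in range(len(p)))                              # average payoff u_1(P)
    z = [xj - u for xj in x]                                                 # differences in payoff
    g = [max(zj, 0.0) for zj in z]                                           # improvements in payoff
    return x, z, g

A = [[1, -1], [-1, 1]]   # matching pennies, player 1's payoffs (illustrative)
x, z, g = gains_player1(A, p=[1.0, 0.0], q=[0.25, 0.75])
print(x)  # [-0.5, 0.5]
print(z)  # [0.0, 1.0]
print(g)  # [0.0, 1.0]
```

Here player 1 puts all weight on the first action although the second one is better against q, so the second action shows a positive improvement.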
Fixed point Theorems
• Intermediate Value Theorem
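– domain [a,b]
– function f continuous
– f(a) f(b) < 0
– there exists z in [a,b] such that f(z) = 0
– Proof: let $M^+ = \{x \mid f(x) \ge 0\}$ and $M^- = \{x \mid f(x) \le 0\}$
– both sets are closed, nonempty, and together cover [a,b]; since [a,b] is connected they intersect, and f(z) = 0 at any point z of the intersection.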
Brouwer’s Fixed point theorem
• $f : S \to S$ continuous, S compact and convex
• There exists z in S with z = f(z)
– For S = [0,1] this follows from the previous theorem applied to g(x) = f(x) - x
Kakutani's Fixed Point Theorem
• $L : S \to 2^S$, a correspondence (set-valued map)
– L(x) is a nonempty convex set
– L is upper semi-continuous (closed graph)
– S compact and convex
• There exists z with z in L(z)
Nash Equilibrium I
• Best response correspondence
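– $L(P) = \{ Q = \prod_i q_i : q_i \in \arg\max_{q} u_i(q, P_{-i}) \text{ for every player } i \}$
– L is an upper semi-continuous correspondence with nonempty convex values
– Nash is a fixed point of L
• $P^* \in L(P^*)$
– Existence follows from Kakutani's fixed point theorem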
Nash Equilibrium II
• Fixed point
– K(P) has mn coordinates $K_{ij}(P)$, one per player i and action j
– $K_{ij}(P) = \frac{p_{ij} + g_{ij}(P)}{1 + \sum_{k} g_{ik}(P)}$
– Nash is a fixed point of K
• $P^* = K(P^*)$
– Original proof of Nash
– Continuous function on a compact convex space
• Brouwer's fixed point theorem
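One application of the map K can be written out for a two-player game. This is a sketch under assumptions: the matrices `A` and `B` (payoffs of players 1 and 2, rows indexed by player 1's action), the function name, and the test game are illustrative choices, not taken from the slides.

```python
def nash_map(A, B, p, q):
    """Apply K once: K_ij(P) = (p_ij + g_ij(P)) / (1 + sum_k g_ik(P))."""
    m, n = len(p), len(q)
    x1 = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]   # alternatives of player 1
    x2 = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]   # alternatives of player 2
    u1 = sum(p[i] * x1[i] for i in range(m))                         # u_1(P)
    u2 = sum(q[j] * x2[j] for j in range(n))                         # u_2(P)
    g1 = [max(x - u1, 0.0) for x in x1]                              # improvements, player 1
    g2 = [max(x - u2, 0.0) for x in x2]                              # improvements, player 2
    new_p = [(p[i] + g1[i]) / (1.0 + sum(g1)) for i in range(m)]
    new_q = [(q[j] + g2[j]) / (1.0 + sum(g2)) for j in range(n)]
    return new_p, new_q

A = [[1, -1], [-1, 1]]    # matching pennies (illustrative)
B = [[-1, 1], [1, -1]]
print(nash_map(A, B, [0.5, 0.5], [0.5, 0.5]))   # unchanged: the equilibrium is a fixed point
print(nash_map(A, B, [1.0, 0.0], [1.0, 0.0]))   # q moves toward player 2's better action
```

Brouwer's theorem only guarantees that a fixed point exists; repeatedly applying `nash_map` is not claimed to converge to it.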
Nash Equilibrium III
• Non-linear complementarity problem (NCP)
– Recall $z_{ij}(P) = x_{ij}(P) - u_i(P)$
– For every player i and action $s_{ij}$:
• $z_{ij}(P) \cdot p_{ij} = 0$
• $z_i(P)$ is orthogonal to $p_i$
– Nash: $z(P^*) \le 0$
• $z_{ij}(P^*) \le 0$
Nash Equilibrium IV
• Stationary point problem
– Recall: x = alternative payoff
– Nash: $P^*$
– For every P:
– $(P - P^*) \cdot x(P^*) \le 0$
• $\sum_j (p_{ij} - p^*_{ij})\, x_{ij}(P^*) \le 0$ for every player i
Nash Equilibrium V
• Minimizing a function
– Objective function:
– $V(P) = \sum_i \sum_j [g_{ij}(P)]^2$
– V(P) is a continuous, differentiable, nonnegative function
– Nash: $V(P^*) = 0$
• Local minima: a local minimum with V(P) > 0 is not a Nash equilibrium
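The objective can be evaluated directly on a toy game. The sketch below is an assumption-laden illustration: a two-player matching-pennies game, hypothetical names, and a crude grid search in place of a real minimizer. It evaluates V on a grid of mixed strategies and reports the smallest value found.

```python
def V(A, B, p, q):
    """V(P) = sum_i sum_j g_ij(P)^2 for a two-player game with payoff matrices A, B."""
    m, n = len(p), len(q)
    x1 = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]   # alternatives of player 1
    x2 = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]   # alternatives of player 2
    u1 = sum(p[i] * x1[i] for i in range(m))
    u2 = sum(q[j] * x2[j] for j in range(n))
    return (sum(max(x - u1, 0.0) ** 2 for x in x1)
            + sum(max(x - u2, 0.0) ** 2 for x in x2))

A = [[1, -1], [-1, 1]]    # matching pennies (illustrative)
B = [[-1, 1], [1, -1]]
grid = [k / 20 for k in range(21)]
# Each candidate is (V value, weight on player 1's first action, weight on player 2's first action).
best = min((V(A, B, [a, 1 - a], [b, 1 - b]), a, b) for a in grid for b in grid)
print(best)   # (0.0, 0.5, 0.5)
```

The grid search is only workable for tiny games; it is here to show that the zero of V coincides with the equilibrium, not to suggest a practical algorithm.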
Nash Equilibrium VI
• Semi-Algebraic set
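– distribution P: $\sum_j p_{ij} = 1$ and $p_{ij} \ge 0$
– difference in payoff:
• $z_{ij}(P) \le 0$
• $z_{ij}(P) = x_{ij}(P) - u_i(P) \le 0$
• Explicitly, with $s_{-i}$ ranging over the action profiles of the players other than i:
$z_{ij}(P) = \sum_{s_{-i}} u_i(s_{ij}, s_{-i}) \prod_{k \ne i} p_k(s_k) - \sum_{s \in S} u_i(s) \prod_{k} p_k(s_k)$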
Two player games
• Payoff matrices (A,B)
– m rows and n columns
– player 1 has m actions, player 2 has n actions
• strategies p and q
• Payoffs: $u_1(p,q) = p A q^T$ and $u_2(p,q) = p B q^T$
• Zero sum game
– A = -B
Linear Programming
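• Primal LP: $\mathrm{SET}_{primal} = \{ x \in \mathbb{R}^n : \sum_j a_{ij} x_j \le b_i \text{ for every } i,\ x_j \ge 0 \text{ for every } j \}$
• x in $\mathrm{SET}_{primal}$ is feasible
• maximize $\langle c, x \rangle$ subject to x in $\mathrm{SET}_{primal}$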
Linear Programming
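• Dual LP: $\mathrm{SET}_{dual} = \{ y \in \mathbb{R}^m : \sum_i y_i a_{ij} \ge c_j \text{ for every } j,\ y_i \ge 0 \text{ for every } i \}$
• y in $\mathrm{SET}_{dual}$ is feasible
• minimize $\langle b, y \rangle$ subject to y in $\mathrm{SET}_{dual}$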
Duality Theorem
• Weak duality: $\langle c, x \rangle \le \langle b, y \rangle$
– for any feasible x and y
– proof!
• Strong Duality
– If both the primal and the dual have feasible solutions then
– $\langle c, x \rangle = \langle b, y \rangle$ for some feasible x and y
– sketch of proof.
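The weak duality proof is a one-line chain of inequalities. The derivation below assumes the symmetric form of the two programs: the primal maximizes $\langle c, x \rangle$ subject to $\sum_j a_{ij} x_j \le b_i$ and $x \ge 0$, and the dual minimizes $\langle b, y \rangle$ subject to $\sum_i y_i a_{ij} \ge c_j$ and $y \ge 0$.

```latex
\langle c, x \rangle
  = \sum_j c_j x_j
  \le \sum_j \Big( \sum_i y_i a_{ij} \Big) x_j
  = \sum_i y_i \Big( \sum_j a_{ij} x_j \Big)
  \le \sum_i y_i b_i
  = \langle b, y \rangle
```

The first inequality uses dual feasibility together with $x_j \ge 0$; the second uses primal feasibility together with $y_i \ge 0$.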
Two players zero sum
• Fix strategy q of player 2
• player 1 best response:
– maximize $p (A q^T)$ such that $\sum_i p_i = 1$ and $p_i \ge 0$
– dual LP: minimize u such that $u \ge (A q^T)_i$ for every row i
• Player 2: select strategy q:
– minimize u such that $u \ge (A q^T)_i$ for every row i, $\sum_j q_j = 1$ and $q_j \ge 0$
– dual (strategy for player 1):
– maximize v such that $v \le (pA)_j$ for every column j, $\sum_i p_i = 1$ and $p_i \ge 0$
• There exists a unique value v.
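The two linear programs give a simple optimality certificate: any p guarantees player 1 at least $\min_j (pA)_j$, any q holds player 1 to at most $\max_i (Aq^T)_i$, and when the two numbers agree they equal the value. The sketch below checks this without an LP solver; the function name and the rock-paper-scissors data are assumptions for illustration.

```python
def security_levels(A, p, q):
    """Return (v_low, v_high) for payoff matrix A of the maximizing row player.

    v_low  = min_j (pA)_j    is what p guarantees player 1.
    v_high = max_i (Aq^T)_i  is the most player 1 can get against q.
    By weak duality v_low <= value <= v_high.
    """
    m, n = len(A), len(A[0])
    v_low = min(sum(p[i] * A[i][j] for i in range(m)) for j in range(n))
    v_high = max(sum(A[i][j] * q[j] for j in range(n)) for i in range(m))
    return v_low, v_high

# Rock-paper-scissors payoffs for the row player (illustrative data).
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
third = [1 / 3, 1 / 3, 1 / 3]
print(security_levels(rps, third, third))      # both levels coincide, so the value is 0
print(security_levels(rps, [1, 0, 0], third))  # a pure strategy guarantees only -1
```

Finding the optimal p and q in general requires solving the LPs above; this sketch only verifies a candidate pair.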
Example
Summary
• Two players zero sum
– linear programming
– polynomial time
– can have multiple Nash equilibria
– unique value!
– If (p,q) and (p',q') are Nash then (p,q') and (p',q) are Nash
Online learning
• Playing with unknown payoff matrix
• Online algorithm:
– at each step selects an action.
• can be stochastic or fractional
– Observes all possible payoffs
– Updates its parameters
• Goal: Achieve the value of the game
– The payoff matrix of the “game” is defined only at the end
Online learning - Algorithm
• Notations:
– Opponent distribution $Q_t$
– Our distribution $P_t$
– Observed cost: $M(i, Q_t)$ for every action i, i.e. the vector $M Q_t$
– Goal: minimize cost
• Algorithm: Exponential weights
– Action i has weight proportional to $b^{L(i,t)}$
– L(i,t) = cumulative loss of action i up to time t
Online algorithm: Notations
• Formally:
– parameter: b, 0 < b < 1
– $w_{t+1}(i) = w_t(i)\, b^{M(i,Q_t)}$
– $Z_t = \sum_i w_t(i)$
– $P_t(i) = w_t(i) / Z_t$, so $P_{t+1}(i) = w_{t+1}(i) / Z_{t+1}$
– Number of total steps T is known
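The update rule can be sketched in a few lines. Everything beyond the update itself is an assumption made for illustration: the uniform starting weights, the `opponent` callback, the 2x2 loss matrix, the adversary that best-responds to our current distribution, and the arbitrary choice b = 0.9.

```python
import math

def exponential_weights(M, opponent, T, b):
    """Run the multiplicative update for T rounds on loss matrix M (entries in [0, 1]).

    opponent(P) returns the opponent's mixed strategy Q_t given our P_t
    (hypothetical callback). Returns (total loss, list of all Q_t).
    """
    n = len(M)
    w = [1.0] * n                          # assumption: uniform start, w_1(i) = 1
    total, history = 0.0, []
    for _ in range(T):
        Z = sum(w)
        P = [wi / Z for wi in w]           # our distribution: weights normalized by their sum
        Q = opponent(P)
        cost = [sum(M[i][j] * Q[j] for j in range(len(Q))) for i in range(n)]  # M(i, Q_t)
        total += sum(P[i] * cost[i] for i in range(n))                         # M(P_t, Q_t)
        w = [w[i] * b ** cost[i] for i in range(n)]                            # multiplicative update
        history.append(Q)
    return total, history

M = [[0.0, 1.0], [1.0, 0.0]]               # illustrative loss matrix, value 1/2

def best_response(P):
    """Adversary: the pure column that maximizes our expected loss."""
    losses = [sum(P[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]
    j = losses.index(max(losses))
    return [1.0 if k == j else 0.0 for k in range(len(M[0]))]

T, b = 1000, 0.9
total, history = exponential_weights(M, best_response, T, b)
print(total / T)                           # average loss, slightly above the value 1/2
```

The raw weights shrink geometrically, so for very long runs one would renormalize them each round; that does not change the distributions.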
Online algorithm: Theorem
• Theorem
– For any matrix M with entries in [0,1]
– and any sequence of distributions $Q_1, \dots, Q_T$
– the algorithm generates $P_1, \dots, P_T$ such that
$\sum_{t=1}^{T} M(P_t, Q_t) \le \min_P \left[ \frac{\ln(1/b)}{1-b} \sum_{t=1}^{T} M(P, Q_t) + \frac{1}{1-b}\, RE(P \,\|\, P_1) \right]$
– $RE(A \,\|\, B) = E_{x \sim A}[\ln(A(x) / B(x))]$
Online algorithm: Analysis
• Lemma
– For any mixed strategy P
$RE(P \,\|\, P_{t+1}) - RE(P \,\|\, P_t) \le \ln(1/b)\, M(P, Q_t) + \ln(1 - (1-b)\, M(P_t, Q_t))$
• Corollary
– With $P_1$ uniform over the n actions:
$\sum_{t=1}^{T} M(P_t, Q_t) \le \frac{\ln(1/b)}{1-b} \min_P \sum_{t=1}^{T} M(P, Q_t) + \frac{\ln n}{1-b}$
Online Algorithm: Optimization
• $b = 1 / (1 + \sqrt{2 (\ln n) / T})$
• Average loss: $v + O(\sqrt{(\ln n) / T})$, where v is the value of the game
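Plugging the tuned b into the corollary shows where the $\sqrt{(\ln n)/T}$ rate comes from. The sketch below is an illustration with hypothetical function names; it bounds the comparator's total loss by T, which is valid because the entries of M lie in [0,1], and prints the resulting excess next to $\sqrt{2 (\ln n)/T}$.

```python
import math

def tuned_b(n, T):
    """b = 1 / (1 + sqrt(2 ln n / T))."""
    return 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / T))

def excess_loss_bound(n, T):
    """Bound on (our average loss) - (best fixed strategy's average loss).

    From the corollary, average loss <= a * L/T + ln(n) / ((1 - b) T) with
    a = ln(1/b) / (1 - b) >= 1 and comparator loss L <= T, so the excess
    is at most (a - 1) + ln(n) / ((1 - b) T).
    """
    b = tuned_b(n, T)
    a = math.log(1.0 / b) / (1.0 - b)
    return (a - 1.0) + math.log(n) / ((1.0 - b) * T)

for T in (100, 10000):
    eps = math.sqrt(2.0 * math.log(10) / T)
    print(T, round(excess_loss_bound(10, T), 4), round(eps, 4))
```

For both horizons the bound stays close to $\sqrt{2 (\ln n)/T}$, so multiplying T by 100 shrinks it by roughly a factor of 10.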
Two players General sum games
• Input matrices (A,B)
• No unique value
• Computational issues: find some Nash equilibrium, or all of them
• player 1 best response:
– Like for zero sum:
– Fix strategy q of player 2
– maximize $p (A q^T)$ such that $\sum_i p_i = 1$ and $p_i \ge 0$
– dual LP: minimize u such that $u \ge (A q^T)_i$ for every row i
Two players General sum games
• Assume the support of strategies known.
– p has support Sp and q has support Sq
– Can formulate the Nash as LP:
$\sum_j a_{ij} q_j = v$ for $i \in S_p$
$\sum_j a_{ij} q_j \le v$ for $i \notin S_p$
$\sum_i p_i b_{ij} = u$ for $j \in S_q$
$\sum_i p_i b_{ij} \le u$ for $j \notin S_q$
$p_i \ge 0$ for $i \in S_p$
$p_i = 0$ for $i \notin S_p$
$q_j \ge 0$ for $j \in S_q$
$q_j = 0$ for $j \notin S_q$
$\sum_i p_i = 1$
$\sum_j q_j = 1$
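For a 2x2 game with both supports equal to the full action set, the support equations reduce to two indifference conditions that can be solved in closed form. The sketch below is a minimal illustration assuming integer payoffs; the function name and the example game are hypothetical, not from the lecture.

```python
from fractions import Fraction

def full_support_2x2(A, B):
    """Solve the support equations of a 2x2 game when both supports are {0, 1}.

    A and B hold integer payoffs of players 1 and 2. Returns (p, q, v, u),
    or None when no fully mixed solution exists.
    """
    # Player 1 indifferent between rows: a_00 q_0 + a_01 q_1 = a_10 q_0 + a_11 q_1 = v.
    den_q = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    # Player 2 indifferent between columns: p_0 b_00 + p_1 b_10 = p_0 b_01 + p_1 b_11 = u.
    den_p = B[0][0] - B[0][1] - B[1][0] + B[1][1]
    if den_q == 0 or den_p == 0:
        return None
    q0 = Fraction(A[1][1] - A[0][1], den_q)
    p0 = Fraction(B[1][1] - B[1][0], den_p)
    if not (0 < p0 < 1 and 0 < q0 < 1):
        return None                         # the solution is not a fully mixed distribution
    p, q = (p0, 1 - p0), (q0, 1 - q0)
    v = A[0][0] * q[0] + A[0][1] * q[1]     # player 1's payoff on the support
    u = p[0] * B[0][0] + p[1] * B[1][0]     # player 2's payoff on the support
    return p, q, v, u

# A coordination game with different preferences (illustrative data).
A = [[2, 0], [0, 1]]
B = [[1, 0], [0, 2]]
p, q, v, u = full_support_2x2(A, B)
print(p, q, v, u)   # p = (2/3, 1/3), q = (1/3, 2/3), v = u = 2/3
```

Trying every pair of supports this way finds all equilibria of a nondegenerate game, at the cost of exponentially many support pairs in larger games.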
Approximate Nash
Lemke & Howson
Example