Finding Nash Equilibria in Bimatrix Games
John Fearnley
University of Liverpool
Outline

Part 1 - The complexity of finding Nash equilibria
  - What is a bimatrix game?
  - The complexity of finding Nash equilibria
  - The complexity of finding approximate equilibria
Part 2 - Algorithms for finding Nash equilibria
Part 3 - Lemke's algorithm for discounted games
Bimatrix games

Example (rock-paper-scissors), payoffs written as (row, column):

           r        p        s
      r   0, 0    -1, 1     1, -1
      p   1, -1    0, 0    -1, 1
      s  -1, 1     1, -1    0, 0

- Two matrices R and C
- Mixed strategies: row vector x and column vector y
- Row player payoff: x · R · y
- Column player payoff: x · C · y
- Column player best responses: max(x · C)
- Regret: payoff of best response - payoff
- Column player regret: max(x · C) − x · C · y
- Row player regret: max(R · y) − x · R · y

For example, with x = (1/2, 1/2, 0) and y = (1/3, 1/3, 1/3) we have
R · y = (0, 0, 0) and x · C = (−0.5, 0.5, 0): the row player has regret 0,
while the column player has regret 0.5 − 0 = 0.5.

(x, y) is a mixed Nash equilibrium if and only if
both players have regret 0,
or equivalently, if and only if both players only play best responses.

Nash's Theorem (1951)
Every bimatrix game has a mixed Nash equilibrium

What is the complexity of finding a Nash equilibrium?
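The regret definitions above can be checked numerically. The following helper code is a sketch of mine, not part of the original slides; it evaluates the regrets of the rock-paper-scissors strategies x = (1/2, 1/2, 0) and y = (1/3, 1/3, 1/3).

```python
# Regrets in rock-paper-scissors (a sketch, not from the slides).
R = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]    # row player's payoff matrix
C = [[-v for v in row] for row in R]        # zero-sum game, so C = -R
x = [0.5, 0.5, 0.0]                         # row player's mixed strategy
y = [1/3, 1/3, 1/3]                         # column player's mixed strategy

def mat_vec(M, v):                          # M · v
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def vec_mat(v, M):                          # v · M
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

Ry, xC = mat_vec(R, y), vec_mat(x, C)       # Ry = (0, 0, 0), xC = (-0.5, 0.5, 0)
row_regret = max(Ry) - dot(x, Ry)           # 0: every row is a best response to y
col_regret = max(xC) - dot(xC, y)           # 0.5: column p would be strictly better
```

Since the column player's regret is 0.5 rather than 0, this pair (x, y) is not a Nash equilibrium.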
The complexity of finding a Nash equilibrium

Nash's theorem implies that the problem is in TFNP.
Actually, the problem lies in PPAD ⊆ TFNP.

PPAD is the class of problems reducible to End-of-the-Line:
- Given an implicitly represented directed graph where
  every vertex has in-degree and out-degree at most 1,
- and a vertex with in-degree 0,
- find another vertex with either in-degree 0 or out-degree 0.

Side note: End-of-the-Line asks for the end of any line.
Other-End-of-the-Line is PSPACE-complete!
Theorem (DGP 2006)
Finding a Nash equilibrium in 3-player games is PPAD-complete

- EOTL → Brouwer → graphical games → 3-player games

Theorem (CDT 2006)
Finding a Nash equilibrium in 2-player games is PPAD-complete

Summary
No efficient algorithms for finding Nash equilibria unless PPAD = P
Approximate Equilibria

Recall: the row player's regret is max(R · y) − x · R · y

(x, y) is a Nash equilibrium if and only if
both players have regret 0.

(x, y) is an ε-Nash equilibrium if and only if
both players have regret at most ε.

Note that this is an additive approximation.
Payoff Normalisation

We normalise payoffs to the range [0, 1] (here −1 → 0, 0 → 1/2, 1 → 1):

    Original                        Normalised
     0, 0    -1, 1     1, -1        1/2, 1/2     0, 1       1, 0
     1, -1    0, 0    -1, 1           1, 0     1/2, 1/2     0, 1
    -1, 1     1, -1    0, 0           0, 1       1, 0     1/2, 1/2
Example: How to Find a 0.5-Nash equilibrium

Daskalakis, Mehta, & Papadimitriou [2006]:

- Choose an arbitrary row i
- Take a best response j for the column player
- Take a best response k for the row player
- The row player plays i and k with probability 1/2 each, and the
  column player plays j: we have found a 0.5-Nash equilibrium

In the normalised game above: choose row r, so the column player's best
response is p, and the row player's best response to p is s. With
x = (1/2, 0, 1/2) and y = (0, 1, 0), both payoffs are 1/2; the row
player's regret is 1 − 1/2 = 1/2 and the column player's regret is
3/4 − 1/2 = 1/4.
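The three steps above can be sketched directly. This helper (the function name is mine) assumes the payoffs are already normalised to [0, 1]:

```python
def dmp_half_ne(R, C, i=0):
    """DMP 2006 sketch: arbitrary row i, column best response j, row best response k."""
    n, m = len(R), len(R[0])
    j = max(range(m), key=lambda c: C[i][c])    # best response for the column player
    k = max(range(n), key=lambda r: R[r][j])    # best response for the row player
    x = [0.0] * n
    x[i] += 0.5                                 # row player mixes i and k half-half
    x[k] += 0.5
    y = [0.0] * m
    y[j] = 1.0                                  # column player plays j purely
    return x, y

# The normalised rock-paper-scissors game from the earlier slide:
R = [[0.5, 0.0, 1.0], [1.0, 0.5, 0.0], [0.0, 1.0, 0.5]]
C = [[0.5, 1.0, 0.0], [0.0, 0.5, 1.0], [1.0, 0.0, 0.5]]
x, y = dmp_half_ne(R, C)                        # x = (1/2, 0, 1/2), y = (0, 1, 0)
```

The regret bound holds because j is a best response to half of x's probability mass, and k, played with probability 1/2, is a best response to y; with payoffs in [0, 1] both regrets are therefore at most 1/2.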
Best Polynomial-Time Approximations

History
- 0.75 (Kontogiannis, Panagopoulou, and Spirakis, 2006)
- 0.5 (Daskalakis, Mehta, and Papadimitriou, 2006)
- 0.38 (Kontogiannis, Panagopoulou, and Spirakis, 2007)
- 0.36 (Bosse, Byrka, and Markakis, 2007)
- 0.3393 (Tsaknakis and Spirakis, 2007)

Can we do better?
- No FPTAS unless PPAD = P (Chen, Deng, and Teng, 2009)
- Open problem: is there a PTAS?
A quasi-PTAS for ε-Nash

There is a quasi-polynomial approximation scheme
(Lipton, Markakis, Mehta, 2003)

The support of a strategy x is |{i : xi > 0}|

Claim: Every bimatrix game has an ε-NE with support O(log n / ε²)

- Take any Nash equilibrium (x, y)
- Take O(log n / ε²) samples from x and y
- Let (x′, y′) play uniformly over these samples
- W.h.p. the payoffs of (x′, y′) are within ε/2 of (x, y)

We can find an ε-NE by searching all n^O(log n / ε²) candidates
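The sampling step in the claim can be sketched as follows. This is an illustration of mine, not the authors' code; the constant hidden in the O(·) bound is omitted, so `k` below should be read as "on the order of log n / ε²":

```python
import math
import random

def sampled_strategy(x, k, rng):
    """Empirical distribution of k samples drawn from the mixed strategy x."""
    counts = [0] * len(x)
    for i in rng.choices(range(len(x)), weights=x, k=k):
        counts[i] += 1
    return [c / k for c in counts]

eps, n = 0.25, 3
k = math.ceil(math.log(n) / eps ** 2)     # O(log n / eps^2) samples, constant omitted
rng = random.Random(0)
x_small = sampled_strategy([1/3, 1/3, 1/3], k, rng)   # small-support approximation
```

The result is a strategy whose support has size at most k and whose probabilities are multiples of 1/k, which is exactly why exhaustive search over all n^O(log n / ε²) such candidates is possible.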
Well-supported approximate Nash equilibria

Problem: an ε-NE can require you to play a bad strategy.

In the 0.5-Nash equilibrium found above, the row player places
probability 1/2 on row r, whose payoff against p is 0 while the best
response pays 1 - so r is not even a 0.5-best response.

Well-supported approximate Nash equilibria do not allow this.
(x, y) is a Nash equilibrium if and only if
both players have regret 0,
or both players only play best responses.

(x, y) is an ε-Nash equilibrium if and only if
both players have regret at most ε.

(x, y) is an ε-WSNE if and only if
both players only play ε-best responses.
ε-WSNE results

Every ε-WSNE is an ε-Nash equilibrium

Every ε²/4-NE can be turned into an ε-WSNE

- No FPTAS unless PPAD = P
- Open problem: is there a PTAS?

Constant approximations:
- Maybe 5/6 (Daskalakis, Mehta, and Papadimitriou, 2006)
- 2/3 (Kontogiannis and Spirakis, 2007)
- 2/3 − 0.004735 (F., Goldberg, Savani, Sørensen, 2012)
Outline

Part 1 - The complexity of finding Nash equilibria
  - What is a bimatrix game?
  - The complexity of finding Nash equilibria
  - The complexity of finding approximate equilibria
Part 2 - Algorithms for finding Nash equilibria
  - The Lemke-Howson algorithm
  - Lemke's algorithm for the linear complementarity problem
Part 3 - Lemke's algorithm for discounted games
Symmetric Equilibria of a Symmetric Game

A bimatrix game is symmetric if it is of the form (M, M^T).

A game (R, C) can be turned into a symmetric game (M, M^T):

    M = ( 0    R )        M^T = ( 0    C )
        ( C^T  0 )              ( R^T  0 )

An equilibrium is symmetric if it is of the form (x, x^T).

Symmetric equilibrium of (M, M^T) =⇒ equilibrium of (R, C)

It is sufficient to find a symmetric equilibrium of a symmetric game
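The block construction can be written out explicitly. A sketch (the helper name is mine, not from the slides):

```python
def symmetrize(R, C):
    """Build M = [[0, R], [C^T, 0]] so that (M, M^T) is a symmetric game."""
    n, m = len(R), len(R[0])
    M = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(m):
            M[i][n + j] = R[i][j]      # top-right block: R
            M[n + j][i] = C[i][j]      # bottom-left block: C^T
    return M

R = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
C = [[-v for v in row] for row in R]
M = symmetrize(R, C)                   # a 6 x 6 symmetric-game matrix
```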
Symmetric Equilibria of a Symmetric Game

Suppose that (x, x^T) is an equilibrium of (M, M^T).

Best response condition:

    xi > 0 ⇐⇒ i is a best response
          ⇐⇒ (x · M)i = max(x · M)
          ⇐⇒ max(x · M) − (x · M)i = 0

This is a complementarity condition:

    x · (max(x · M) − (x · M))^T = 0
The Linear Complementarity Problem

We can formulate this as a linear complementarity problem.

Given M ∈ R^(n×n) and q ∈ R^n, find w, z ∈ R^n that satisfy:

    w = Mz + q
    w, z ≥ 0
    wi · zi = 0    ∀i ∈ {1 . . . n}

Game (M, M^T) =⇒ LCP(M, −1)
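The three conditions can be checked mechanically. A sketch of a verifier (mine, not from the slides):

```python
def is_lcp_solution(M, q, w, z, tol=1e-9):
    """Check w = Mz + q, w >= 0, z >= 0, and w_i * z_i = 0 for every i."""
    n = len(q)
    Mz = [sum(M[i][j] * z[j] for j in range(n)) for i in range(n)]
    feasible = all(v >= -tol for v in w) and all(v >= -tol for v in z)
    equation = all(abs(w[i] - (Mz[i] + q[i])) <= tol for i in range(n))
    complementary = all(abs(w[i] * z[i]) <= tol for i in range(n))
    return feasible and equation and complementary
```

For instance, with M = [[2, 1], [1, 3]] and q = (−1, −1) (the example used on the next slides), w = (0, 0) and z = (2/5, 1/5) is a solution, since Mz = (1, 1) = −q.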
Geometric Interpretation

    q = Iw − Mz
    w, z ≥ 0
    zi wi = 0    ∀i ∈ {1 . . . n}

- wi > 0 =⇒ we use column i of I
- zi > 0 =⇒ we use column i of −M
- Each choice defines a cone

Example:

    I = ( 1 0 )    M = ( 2 1 )    q = ( −1 )
        ( 0 1 )        ( 1 3 )        ( −1 )

We must find a cone that contains q
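For a small instance, the cone containing q can be found by brute force over all 2^n complementary column choices. This is a sketch of mine for the 2 × 2 example above (in general one would solve an n × n system per choice); it is illustrative only, not an efficient method:

```python
from itertools import product

def solve2(A, b):
    """Solve the 2x2 system A u = b by Cramer's rule; None if singular."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if abs(det) < 1e-12:
        return None
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def cone_search(M, q):
    """For each i, pick column i of I (meaning w_i > 0) or of -M (z_i > 0)."""
    n = len(q)
    for choice in product((0, 1), repeat=n):     # 1 means: use column i of -M
        B = [[-M[r][i] if choice[i] else (1.0 if r == i else 0.0)
              for i in range(n)] for r in range(n)]
        u = solve2(B, q)
        if u is not None and all(v >= -1e-12 for v in u):
            w = [u[i] if choice[i] == 0 else 0.0 for i in range(n)]
            z = [u[i] if choice[i] == 1 else 0.0 for i in range(n)]
            return w, z
    return None

w, z = cone_search([[2, 1], [1, 3]], [-1, -1])   # q lies in the cone spanned by -M
```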
Geometric Interpretation

In general, it is NP-hard to decide whether there is a solution.

For bimatrix games we get a Q-matrix,
and a Q-matrix LCP always has a solution.
Lemke's algorithm

Pivoting algorithm given by Lemke in 1965.

Modify the LCP with a scalar z0 and a positive covering vector d:

    w = Mz + q + dz0
    w, z ≥ 0
    zi wi = 0    ∀i ∈ {1 . . . n}

When z0 is large the modified LCP has a trivial solution:

    w = q + dz0
    z = 0
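The trivial solution can be computed directly: take z0 just large enough that q + d·z0 is nonnegative. A sketch (function name mine; the covering vector defaults to d = (1, ..., 1)):

```python
def trivial_start(q, d=None):
    """Starting point of Lemke's algorithm: z = 0 and w = q + d * z0 >= 0."""
    n = len(q)
    d = d if d is not None else [1.0] * n
    z0 = max([0.0] + [-q[i] / d[i] for i in range(n)])   # smallest feasible z0
    w = [q[i] + d[i] * z0 for i in range(n)]
    return w, [0.0] * n, z0

w, z, z0 = trivial_start([-1, -1])    # z0 = 1, w = (0, 0), z = (0, 0)
```

Since z = 0, the complementarity condition holds trivially, and the algorithm then pivots to drive z0 back down to 0.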
Geometric Interpretation of Lemke's Algorithm

    q + dz0 = Iw − Mz
    w, z ≥ 0
    zi wi = 0    ∀i ∈ {1 . . . n}

Lemke's algorithm follows the line {q + dz0 : z0 ≥ 0}.

The Lemke-Howson algorithm can be obtained using a covering vector.
How good is the Lemke-Howson algorithm?

In practice - it seems to be good.

In theory:
- Exponential worst case (Savani 2004)
- It is PSPACE-complete to compute the solution found by LH!
  (Goldberg, Papadimitriou, Savani, 2013)
Outline

Part 1 - The complexity of finding Nash equilibria
  - What is a bimatrix game?
  - The complexity of finding Nash equilibria
  - The complexity of finding approximate equilibria
Part 2 - Algorithms for finding Nash equilibria
  - The Lemke-Howson algorithm
  - Lemke's algorithm for the linear complementarity problem
Part 3 - Lemke's algorithm for discounted games
  - Discounted games and parity games
  - How to turn a discounted game into an LCP
  - Lemke's algorithm applied to the LCP

Joint work with Marcin Jurdziński and Rahul Savani
Motivation

Parity games ⇔ Modal µ-calculus model-checking
- In NP ∩ co-NP and PPAD ∩ PLS
- So far, not in P

Parity games can be reduced to discounted games.

Discounted games can be reduced to P-matrix LCPs
(Jurdziński & Savani, 2008).

Goal: how does Lemke's algorithm solve a discounted game?
Discounted Games

Discount factor β = 0.5

Following a play with rewards 2, 11, 3, 14, 1, . . . gives:

    Payoff = 2 + (β × 11) + (β² × 3) + (β³ × 14) + (β⁴ × 1) + · · ·

           = ∑_{i=0}^{∞} β^i · reward(i) = 10.125
Positional Strategies
Determinacy

Theorem (Shapley 1953)
For every initial vertex v in a discounted game there is a constant c such that:
- Max has a positional strategy against which the payoff is ≥ c
- Min has a positional strategy against which the payoff is ≤ c

Computational problem:
- Find c
- Find the two optimal strategies
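Shapley's theorem also suggests an algorithm: iterating the one-step max/min operator converges to the values c. A sketch on a toy game (the game encoding and all names here are mine, not from the talk):

```python
def value_iteration(game, beta, iters=1000):
    """game maps vertex -> (owner, [(successor, reward), ...])."""
    V = {v: 0.0 for v in game}
    for _ in range(iters):
        # One Shapley iteration: each owner optimises reward + beta * V(successor).
        V = {v: (max if owner == 'Max' else min)(r + beta * V[u] for u, r in succ)
             for v, (owner, succ) in game.items()}
    return V

# Toy game: Max at 'a' chooses a self-loop (reward 1) or a move to 'b'
# (reward 0); 'b' is a Min vertex with only a self-loop of reward 3.
game = {'a': ('Max', [('a', 1.0), ('b', 0.0)]),
        'b': ('Min', [('b', 3.0)])}
V = value_iteration(game, beta=0.5)
```

Here V('b') converges to 3 / (1 − β) = 6, and Max's optimal positional choice at 'a' is to move to 'b', giving V('a') = 0 + β · 6 = 3 rather than 1 + β · 3 = 2.5 from the self-loop.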
Geometric Interpretation of Lemke's Algorithm

    q + dz0 = Iw − Mz
    w, z ≥ 0
    zi wi = 0    ∀i ∈ {1 . . . n}

Discounted games can be formulated as a P-matrix LCP
(Jurdziński & Savani, 2008)

- M is determined by the graph
- Cones are positional strategies
- q is determined by the graph and weights

We will describe Lemke in terms of the discounted game.
Computing the Value of a Vertex

A pair of positional strategies sends v along a path with rewards
r0 , . . . , r(k−1) into a cycle with rewards c0 , . . . , c(l−1), so

    Value(v) = ∑_{i=0}^{k−1} β^i · r_i  +  (1 / (1 − β^l)) · ∑_{i=0}^{l−1} β^(k+i) · c_i
The Balance of a Vertex

For a Max vertex v with other successor u:

    Balance(v) = Value(v) − (Reward(v, u) + β × Value(u))

For a Min vertex:

    Balance(v) = (Reward(v, u) + β × Value(u)) − Value(v)

The balance compares the two successors:
- Positive when the current successor is better
- Negative when the other successor is better

Theorem (Shapley 1953)
A pair of strategies is optimal if and only if no vertex has a negative balance
Lemke's Algorithm

The idea:
- Choose a pair of arbitrary strategies
- Modify the game to make the pair optimal (the trivial solution)
- Transform the modified game back to the original while ensuring
  that the strategies remain optimal
Lemke's Algorithm

(These slides step through a worked example on a game graph; the figures
are omitted here.) The balance computations along the run:

    Balance(v) = Value(v) − (Reward(v, u) + β × Value(u))
               = (−1 − β × 4 − β² × 16) − (−2 + β × 8 + β² × 32)
               = −48.68

With the modification z applied to the reward −2:

    Balance(v) = (−1 − β × 4 − β² × 16) − (−2 − z + β × 8 + β² × 32)

Later pivots change which edges carry z:

    Balance(v) = (−2 − z + β × 8 + β² × 32) − (−1 − β × 4 − β² × 16)

    Balance(v) = (−2 − z + β × (6 + z) − β² × 16) − (−1 − β × 4 − β² × 16)

    Balance(v) = (−2 − z + β × (6 + z) − β² × 16) − (−1 + β × (5 − z) + β² × 32)
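The slides do not state the discount factor used in this running example; the quoted value −48.68 is consistent with β = 0.9 (an inference of mine, not stated in the source), which the arithmetic below confirms:

```python
beta = 0.9                                        # assumed: matches the quoted -48.68
value_current = -1 - beta * 4 - beta ** 2 * 16    # value via the current successor
value_other = -2 + beta * 8 + beta ** 2 * 32      # value via the other successor u
balance = value_current - value_other             # -17.56 - 31.12 = -48.68
```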
An Exponential-Time Example

It has been used before to show lower bounds for:
- Strategy improvement for one-player simple stochastic games
  (Melekopoglou and Condon 1994)
- Strategy improvement for mean-payoff games
  (Björklund and Vorobyov 2007)

It works because the underlying combinatorial structure is a
Klee-Minty cube.

Theorem
When β is close to 1, Lemke's algorithm will take an exponential
number of steps

Corollary
Lemke's algorithm is exponential for parity games
Outline

Part 1 - The complexity of finding Nash equilibria
  - What is a bimatrix game?
  - The complexity of finding Nash equilibria
  - The complexity of finding approximate equilibria
Part 2 - Algorithms for finding Nash equilibria
  - The Lemke-Howson algorithm
  - Lemke's algorithm for the linear complementarity problem
Part 3 - Lemke's algorithm for discounted games
  - Discounted games and parity games
  - How to turn a discounted game into an LCP
  - Lemke's algorithm applied to the LCP
The Complexity of the Simplex Method
Dantzig’s rule was the original pivot rule for the simplex method
Computing the solution found by Dantzig’s rule is
PSPACE-complete
J. Fearnley and R. Savani. The complexity of the simplex method.
arXiv:1404.0605