Computing equilibria in extensive form games

Andrew Gilpin
Mathematical Games - March 29, 2005
This talk
• Extensive form games
– Representation
– Computing equilibrium
• Poker AI
– History of poker research
– Current research
Extensive form representation
1. I = {0, 1, …, n} – players
2. (V,E), terminals Z – tree
3. P: V \ Z → H – controlling player
4. H = {H0, …, Hn} – information sets
5. A = {A0, …, An} – actions
6. u: Z → R^n – payoffs
7. p – chance probabilities
Perfect recall assumption: Players never forget information
Game from: Bernhard von Stengel. Efficient Computation of Behavior
Strategies. In Games and Economic Behavior 14:220-246, 1996.
Computing equilibria via normal form
• The normal form is exponential in the size of the game tree, both in the worst case and in practice (e.g. poker)
Sequence form
• Instead of a move for every information set,
consider choices necessary for each leaf
• These choices are sequences and constitute
the pure strategies in the sequence form
S1 = {{}, l, r, L, R}
S2 = {{}, c, d}
Realization plans
• Players’ strategies are specified as realization plans over sequences: each sequence is assigned the probability that the player’s own moves along it are all played
• Prop. Realization plans are equivalent to behavior strategies.
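The step from a behavior strategy to a realization plan can be sketched in a few lines. This is a minimal illustration, not the talk's own code: the dictionary encoding of information sets (parent sequence plus moves), the function name `realization_plan`, and the probabilities are all assumptions made for the example; only the sequences l, r, L, R come from the slides.

```python
from fractions import Fraction

# Hypothetical encoding: information set -> (parent sequence, moves).
# A sequence is a tuple of the player's own moves; () is the empty sequence {}.
INFO_SETS_P1 = {
    "v": ((), ["l", "r"]),
    "v'": ((), ["L", "R"]),
}

def realization_plan(info_sets, behavior):
    """Turn a behavior strategy into a realization plan.

    behavior[h][move] is the probability of choosing `move` at information
    set h. The realization probability of a sequence is the product of the
    behavior probabilities of the moves it contains.
    """
    plan = {(): Fraction(1)}
    pending = dict(info_sets)
    # Process an information set once its parent sequence has a value.
    while pending:
        ready = [h for h, (parent, _) in pending.items() if parent in plan]
        if not ready:
            raise ValueError("some parent sequence is never defined")
        for h in ready:
            parent, moves = pending.pop(h)
            for move in moves:
                plan[parent + (move,)] = plan[parent] * behavior[h][move]
    return plan

# Illustrative (made-up) behavior strategy for player 1.
behavior = {
    "v": {"l": Fraction(1, 4), "r": Fraction(3, 4)},
    "v'": {"L": Fraction(2, 3), "R": Fraction(1, 3)},
}
plan = realization_plan(INFO_SETS_P1, behavior)
```

Going the other way, the behavior probability of a move is the realization probability of the extended sequence divided by that of its parent sequence, wherever the parent has positive probability.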
Computing equilibria via sequence form
• Players 1 and 2 have realization plans x and y
• Realization constraint matrices E and F
specify constraints on realizations
[Matrices E and F: E has rows {}, v, v’ and columns {}, l, r, L, R; F has rows {}, u and columns {}, c, d]
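One way to build such constraint matrices is sketched below. It assumes the usual sequence-form convention: one row fixing the empty sequence at 1, and one row per information set stating that the moves at that set sum to the realization probability of the parent sequence. The helper names and the right-hand-side vectors `e` and `f` are assumptions. Sequences are named by their last move, which only works because every move label in this example is unique.

```python
# Sequences and information sets of the example.
# Each information set is (name, parent sequence, moves).
SEQUENCES_P1 = ["{}", "l", "r", "L", "R"]
INFO_SETS_P1 = [("v", "{}", ["l", "r"]), ("v'", "{}", ["L", "R"])]
SEQUENCES_P2 = ["{}", "c", "d"]
INFO_SETS_P2 = [("u", "{}", ["c", "d"])]

def constraint_matrix(sequences, info_sets):
    """Build (M, rhs) such that a realization plan z satisfies M z = rhs."""
    col = {s: j for j, s in enumerate(sequences)}
    # First row: the empty sequence has realization probability 1.
    first = [0] * len(sequences)
    first[col["{}"]] = 1
    rows = [first]
    # One row per information set: -z(parent) + sum of z(child) = 0.
    for _name, parent, moves in info_sets:
        row = [0] * len(sequences)
        row[col[parent]] = -1
        for move in moves:
            row[col[move]] = 1
        rows.append(row)
    rhs = [1] + [0] * len(info_sets)
    return rows, rhs

def matvec(m, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

E, e = constraint_matrix(SEQUENCES_P1, INFO_SETS_P1)
F, f = constraint_matrix(SEQUENCES_P2, INFO_SETS_P2)

# A made-up realization plan for player 1, checked against E x = e.
x = [1, 0.25, 0.75, 0.5, 0.5]
assert matvec(E, x) == e
```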
Computing equilibria via sequence form
• Payoffs for players 1 and 2 are x^T A y and x^T B y, for suitable matrices A and B
• Creating the payoff matrix:
– Initialize each entry to 0
– For each leaf, there is a (unique) pair of sequences corresponding to an entry in the payoff matrix
– Add the leaf’s payoff to that entry, weighted by the product of chance probabilities along the path from the root to the leaf
[Payoff matrix: rows {}, l, r, L, R; columns {}, c, d]
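The construction can be sketched as follows. The leaf list is hypothetical: the chance probability of 0.5 and the payoffs are invented for illustration, picked so that the resulting entries agree with the coefficients in the talk's example LP. Only the sequence labels come from the slides.

```python
SEQ1 = ["{}", "l", "r", "L", "R"]   # player 1 sequences (rows)
SEQ2 = ["{}", "c", "d"]             # player 2 sequences (columns)

# Hypothetical leaves: (player 1 sequence, player 2 sequence,
# product of chance probabilities on the path, payoff to player 1).
LEAVES = [
    ("l", "{}", 0.5, 0.0),
    ("r", "c", 0.5, 2.0),
    ("r", "d", 0.5, -2.0),
    ("L", "c", 0.5, -4.0),
    ("L", "d", 0.5, 8.0),
    ("R", "{}", 0.5, 2.0),
]

def payoff_matrix(seq1, seq2, leaves):
    """Accumulate chance-weighted leaf payoffs into the sequence-form matrix."""
    row = {s: i for i, s in enumerate(seq1)}
    col = {s: j for j, s in enumerate(seq2)}
    # Every entry starts at 0; most pairs of sequences reach no leaf.
    matrix = [[0.0] * len(seq2) for _ in seq1]
    for s1, s2, chance, payoff in leaves:
        # Several leaves may share a sequence pair, so add rather than assign.
        matrix[row[s1]][col[s2]] += chance * payoff
    return matrix

A = payoff_matrix(SEQ1, SEQ2, LEAVES)
```

The matrix is sparse: it has at most one nonzero entry per leaf, which is what keeps the sequence form linear in the size of the game tree.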
Computing equilibria via sequence form
• Holding x fixed, compute a best response for player 2
• Holding y fixed, compute a best response for player 1
• Each best-response problem is an LP with a primal and a dual form
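The primal and dual programs referred to here can be written out as follows. This is a sketch of the standard sequence-form LPs for the zero-sum case (B = -A), not a transcription of the slide. The right-hand-side vectors e and f (with Ex = e and Fy = f) and the dual variable p are assumed names, chosen to agree with the p1, p2, p3 of the example LP in this talk.

```latex
\begin{align*}
\text{Best response to fixed } y \text{ (primal):}\quad
  & \max_{x}\; x^{\top}(Ay) \quad \text{s.t.}\quad Ex = e,\; x \ge 0 \\
\text{Its dual:}\quad
  & \min_{p}\; e^{\top}p \quad \text{s.t.}\quad E^{\top}p \ge Ay \\
\text{Equilibrium LP (player 2 also picks } y\text{):}\quad
  & \min_{y,\,p}\; e^{\top}p \quad \text{s.t.}\quad
    -Ay + E^{\top}p \ge 0,\;\; -Fy = -f,\;\; y \ge 0
\end{align*}
```

By LP duality the first two programs have the same optimal value, so minimizing the dual objective over y as well yields a strategy for player 2 that holds player 1 to the value of the game.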
Computing equilibria via sequence form: An example
min p1
subject to
x1: p1 - p2 - p3 >= 0
x2: 0 y1 + p2 >= 0
x3: -y2 + y3 + p2 >= 0
x4: 2 y2 - 4 y3 + p3 >= 0
x5: -y1 + p3 >= 0
q1: -y1 = -1
q2: y1 - y2 - y3 = 0
bounds
y1 >= 0
y2 >= 0
y3 >= 0
p1 free
p2 free
p3 free
end
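The example LP is small enough to check by hand or with a few lines of code. The sketch below is not an LP solver: it uses the equality rows q1 and q2 to reduce the problem to a single free variable y2, takes the tightest p values allowed by rows x1 to x5, and scans a coarse grid. The function names are assumptions.

```python
def lp_rows(y, p):
    """Left-hand sides of rows x1..x5 of the example LP (each must be >= 0)."""
    y1, y2, y3 = y
    p1, p2, p3 = p
    return [
        p1 - p2 - p3,          # x1
        0 * y1 + p2,           # x2
        -y2 + y3 + p2,         # x3
        2 * y2 - 4 * y3 + p3,  # x4
        -y1 + p3,              # x5
    ]

def best_p1(y2):
    """Smallest feasible objective p1 for a given y2."""
    y1, y3 = 1.0, 1.0 - y2          # rows q1 and q2
    p2 = max(0.0, y2 - y3)          # tightest p2 from rows x2 and x3
    p3 = max(y1, 4 * y3 - 2 * y2)   # tightest p3 from rows x4 and x5
    return p2 + p3                  # tightest p1 from row x1

# Coarse grid search over y2 in [0, 1]; an LP solver would do this exactly.
grid = [i / 100 for i in range(101)]
best_y2 = min(grid, key=best_p1)
value = best_p1(best_y2)
```

On this grid the minimum is at y2 = y3 = 1/2 with objective value 1, attained by p = (1, 0, 1), where all five inequality rows hold with equality.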
Sequence form summary
• Poly-time algorithm for computing Nash equilibria in 2-player zero-sum games
• Poly-size linear complementarity problem (LCP) for computing Nash equilibria in 2-player general-sum games
• Major shortcomings:
– Not well understood for more than two players
– Polynomial time can still be too slow in practice (e.g. poker)
Poker
• Poker is a wildly popular card game
– This year’s World Series of Poker is expected to have
prizes totaling almost $50 million
• Challenges
– Incomplete information
– Risk assessment
– Deception and counter-deception
• Sequence form does not directly apply
– Two-player Texas Hold’em has ~10^18 nodes
Hold’em Poker
• Every player receives hole cards
• Some cards are placed on the table (flop,
turn, river)
• Betting rounds after each deal of cards
– Players can bet, raise, check, fold, call
• At the end of the game, the player with the best hand takes the pot
Previous work in poker research
• Rule-based
• Simulation/Learning
• Game-theoretic
– Manual abstraction
• “Approximating Game-Theoretic Optimal Strategies
for Full-scale Poker”, Billings, Burch, Davidson,
Holte, Schaeffer, Schauenberg, Szafron, IJCAI-03.
Distinguished Paper Award.
– Automated abstraction
Finding equilibria in large sequential games of incomplete information
(Joint with Tuomas Sandholm)
• Outline:
– Extensive game isomorphism
– Restricted game isomorphic abstraction transformation
– GameShrink: automatically shrinking games
– Application to poker
– Approximation methods
Extensive game isomorphism: example
Extensive game isomorphism: definition
• Let G = (I,V,E,P,H,A,u,p) and G’ = (I’,V’,E’,P’,H’,A’,u’,p’) be given. A bijection f: V → V’ is an extensive game isomorphism if:
1. f induces a graph isomorphism between (V,E) and (V’,E’)
2. For each information set h in G, f induces a bijection between the nodes of h and some h’ in G’
3. P(x) = P’(f(x)) for all x in V \ Z
4. u(x) = u’(f(x)) for all x in Z
5. p(h,a) = p’(f(h), f(a)) for all h in H0
Restricted game isomorphic abstraction transformation
• The restricted game Gx is obtained from G by removing all nodes except x and its descendants.
• (Gx,Gy) is contractible within G if
1. x and y are in the same information set
2. Every node in that information set has the same parent, and the parent is either in a singleton information set or a chance node
3. Gx and Gy are extensive game isomorphic
• For (Gx,Gy) contractible, the restricted game isomorphic abstraction transformation is the game where Gx and Gy are “merged”
Restricted game isomorphic abstraction transformation: example
Main equilibrium result
• Thm. Let G be a sequential game with
observable actions, let G’ be obtained by
one application of the restricted game
isomorphic abstraction transformation, and
let s’ be a Nash equilibrium for G’. Then the
corresponding s for G is a Nash
equilibrium.
Computing ExtensiveGameIsomorphic?(x,y)
1. If x and y are both leaves, return u(x) == u(y)
2. If x and y have different numbers of children, or if a different player controls them, return false
3. Construct bipartite graph Gx,y (see next slide).
4. Return true if Gx,y has a perfect matching; otherwise return false.
Constructing Gx,y
• Each vertex corresponds to an information set containing a
child node.
• Edges connect information sets where there exists a
bijection between extensive game isomorphic vertices
(extensive game isomorphic information sets)
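The recursive test and the matching step can be sketched together. This is a deliberately simplified version, not the talk's algorithm in full: it assumes every child sits in its own information set, so the bipartite graph has one vertex per child, and it ignores chance probabilities. The `Node` class and function names are hypothetical, and the matching routine is a textbook augmenting-path search swapped in for whatever the authors used.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    player: Optional[int] = None          # None marks a leaf
    payoff: Optional[float] = None        # set only on leaves
    children: List["Node"] = field(default_factory=list)

def has_perfect_matching(adj, n):
    """Augmenting-path matching on an n-by-n bipartite graph."""
    match_right = [-1] * n

    def augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    # Every left vertex must end up matched.
    return all(augment(u, set()) for u in range(n))

def isomorphic(x, y):
    # Step 1: two leaves are isomorphic exactly when their payoffs agree.
    if not x.children and not y.children:
        return x.payoff == y.payoff
    # Step 2: differing child counts or controlling players rule it out.
    if len(x.children) != len(y.children) or x.player != y.player:
        return False
    # Step 3: bipartite graph with an edge for each isomorphic child pair.
    n = len(x.children)
    adj = [[j for j in range(n) if isomorphic(x.children[i], y.children[j])]
           for i in range(n)]
    # Step 4: isomorphic iff the children can be paired off completely.
    return has_perfect_matching(adj, n)

a = Node(player=1, children=[Node(payoff=1), Node(payoff=-1)])
b = Node(player=1, children=[Node(payoff=-1), Node(payoff=1)])
c = Node(player=1, children=[Node(payoff=1), Node(payoff=1)])
```

Here `isomorphic(a, b)` holds because the children can be paired after swapping, while `isomorphic(a, c)` fails because the child with payoff -1 has no partner. This naive recursion recomputes sub-results; the bottom-up pass of GameShrink avoids that by tabulating the relation level by level.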
GameShrink: Efficiently computing restricted game isomorphic abstraction transformations
1. Bottom-up pass: Compute the ExtensiveGameIsomorphic relation for each pair of equal depth nodes.
2. Top-down pass: For i from 0 to height(G):
• For each information set h at level i whose nodes share a common parent:
– Apply the restricted game isomorphic abstraction transformation to each applicable x and y in h
Enhancements
• Disjoint-set data structure for storing
isomorphisms
• Implicit enumeration of game tree nodes
• Necessary conditions for extensive game
isomorphism
• Payoff histogram database
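A disjoint-set (union-find) structure fits the first enhancement because extensive game isomorphism is an equivalence relation: once two nodes are known to be isomorphic, their classes can be merged and later queries answered without re-running the test. The following is a generic sketch of that data structure, not the authors' implementation; the class name and the card labels used as keys are assumptions.

```python
class DisjointSet:
    """Union-find with path compression over arbitrary hashable items."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        """Return the representative of the class containing item."""
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        """Record that a and b are isomorphic by merging their classes."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same(self, a, b):
        """True if a and b have been placed in the same class."""
        return self.find(a) == self.find(b)

classes = DisjointSet()
classes.union("J1", "J2")   # e.g. two jacks found to be interchangeable
merged = classes.same("J1", "J2")
separate = classes.same("J1", "K")
```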
Application to poker
• Theorem. In poker, isomorphisms can be computed by considering only the card tree.
[Figure: card tree over the cards J1, J2, K with payoffs -1, 0, 1]
Rhode Island Hold’em
• Invented as a testbed for AI research [Shi &
Littman 2001]
• More than 3.1 billion game tree nodes
• Applying sequence form:
– LP has 91 million rows and columns
• Applying GameShrink:
– LP has 1.2 million rows and columns
– Solvable in about 1 week
– GameShrink itself takes less than 1 second; the LP solve still dominates
Future poker research
• More difficult games
– Multi-player
– Tournament
• Maximally vs. Optimally
