Extensive Form Games: Backward Induction and
Imperfect Information Games
CPSC 532A Lecture 10
Lecture Overview

1. Recap
2. Backward Induction
3. Imperfect-Information Extensive-Form Games
4. Perfect Recall
I promised to revisit this

Question: is there a problem having ∀ai, a′i ∈ Ai in the constraint? That is, are we requiring that the constraint hold in both directions?

    ∑_{a∈A | ai∈a} p(a) ui(a)  ≥  ∑_{a∈A | ai∈a} p(a) ui(a′i, a−i)    ∀i ∈ N, ∀ai, a′i ∈ Ai
    p(a) ≥ 0    ∀a ∈ A
    ∑_{a∈A} p(a) = 1

Answer: yes, it was wrong. The version above fixes the problem, changing the second sum (which previously ranged over {a ∈ A | a′i ∈ a}) so that it's identical to the first.

Note that the constraint can equivalently be written as

    ∑_{a∈A | ai∈a} [ui(a) − ui(a′i, a−i)] p(a) ≥ 0.
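To make the corrected constraint concrete, here is a minimal sketch (not from the lecture) that builds exactly these inequalities for a two-player game and asks an LP solver for a feasible joint distribution p; the payoff-matrix encoding and the use of scipy.optimize.linprog are assumptions of mine.

```python
import numpy as np
from scipy.optimize import linprog

def feasible_distribution(u1, u2):
    """Find p over joint actions satisfying the (corrected) constraints above."""
    u = [np.asarray(u1, dtype=float), np.asarray(u2, dtype=float)]
    n1, n2 = u[0].shape
    n_vars = n1 * n2                         # one variable p(a) per joint action a
    idx = lambda i, j: i * n2 + j            # flatten a joint action into a variable index

    A_ub, b_ub = [], []
    # Player 1: for each a1 and deviation a1',  sum_{a2} p(a1, a2)·[u1(a1', a2) − u1(a1, a2)] ≤ 0
    for a1 in range(n1):
        for a1p in range(n1):
            row = np.zeros(n_vars)
            for a2 in range(n2):
                row[idx(a1, a2)] = u[0][a1p, a2] - u[0][a1, a2]
            A_ub.append(row); b_ub.append(0.0)
    # Player 2: the symmetric constraints
    for a2 in range(n2):
        for a2p in range(n2):
            row = np.zeros(n_vars)
            for a1 in range(n1):
                row[idx(a1, a2)] = u[1][a1, a2p] - u[1][a1, a2]
            A_ub.append(row); b_ub.append(0.0)

    # p(a) ≥ 0 via the bounds, and sum_a p(a) = 1 via an equality constraint.
    res = linprog(c=np.zeros(n_vars), A_ub=np.array(A_ub), b_ub=b_ub,
                  A_eq=np.ones((1, n_vars)), b_eq=[1.0], bounds=(0, None))
    return res.x.reshape(n1, n2) if res.success else None

print(feasible_distribution([[2, 0], [0, 1]], [[1, 0], [0, 2]]))  # some feasible p
```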
Introduction
The normal form game representation does not incorporate
any notion of sequence, or time, of the actions of the players
The extensive form is an alternative representation that makes
the temporal structure explicit.
Two variants:
perfect information extensive-form games
a “game tree” consisting of choice nodes and terminal nodes
choice nodes labeled with players, and each outgoing edge
labeled with an action for that player
terminal nodes labeled with utilities
imperfect-information extensive-form games
we’ll get to this today
Pure Strategies
Overall, a pure strategy for a player in a perfect-information
game is a complete specification of which deterministic action
to take at every node belonging to that player.
Definition
Let G = (N, A, H, Z, χ, ρ, σ, u) be a perfect-information
extensive-form game. Then the pure strategies of player i consist
of the cross product
    ×_{h∈H : ρ(h)=i} χ(h)
Using this definition, we recover the old definitions of mixed
strategies, best response, Nash equilibrium, . . .
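A small illustration of this definition (a sketch, not from the lecture): with each choice node's action set stored in a dict, the pure strategies are just the cross product over the player's own choice nodes. The node names below are hypothetical.

```python
from itertools import product

def pure_strategies(player, rho, chi):
    """rho: node -> acting player; chi: node -> available actions (plain dicts)."""
    nodes = [h for h in chi if rho[h] == player]                  # the player's choice nodes
    return [dict(zip(nodes, choice)) for choice in product(*(chi[h] for h in nodes))]

# Hypothetical player with two choice nodes, one offering {A, B} and one offering {G, H}:
rho = {"h1": 1, "h2": 1}
chi = {"h1": ["A", "B"], "h2": ["G", "H"]}
print(pure_strategies(1, rho, chi))
# [{'h1': 'A', 'h2': 'G'}, {'h1': 'A', 'h2': 'H'}, {'h1': 'B', 'h2': 'G'}, {'h1': 'B', 'h2': 'H'}]
```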
Induced Normal Form

We can "convert" an extensive-form game into normal form.

[Figure 5.1: The Sharing game.]

Notice that the definition contains a subtlety. An agent's strategy requires a decision at each choice node, regardless of whether or not it is possible to reach that node given the other choice nodes. In the Sharing game above the situation is straightforward: player 1 has three pure strategies, and player 2 has eight (why?). But now consider the game shown in Figure 5.2.

[Figure 5.2: A perfect-information game in extensive form. Player 1 chooses A or B; after A, player 2 chooses C or D; after B, player 2 chooses E or F; after F, player 1 chooses G or H. Payoffs: A,C → (3,8); A,D → (8,3); B,E → (5,5); B,F,G → (2,10); B,F,H → (1,0).]

To define a complete strategy for this game, each of the players must choose an action at each of his two choice nodes. Thus we can enumerate the pure strategies of the players as follows:
    S1 = {(A,G), (A,H), (B,G), (B,H)}
    S2 = {(C,E), (C,F), (D,E), (D,F)}

The induced normal form (payoffs listed as player 1, player 2):

           AG      AH      BG      BH
    CE     3,8     3,8     5,5     5,5
    CF     3,8     3,8     2,10    1,0
    DE     8,3     8,3     5,5     5,5
    DF     8,3     8,3     2,10    1,0
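As a sketch of the "conversion" (not from the lecture), the induced normal form above can be recomputed by following every pure-strategy profile down the tree; the tuple-based encoding of Figure 5.2 below is an assumption of mine.

```python
from itertools import product

# Figure 5.2 encoded as nested tuples: a decision node is (player, {action: child}),
# and a leaf is a payoff tuple (u1, u2).
tree = ("P1", {"A": ("P2", {"C": (3, 8), "D": (8, 3)}),
               "B": ("P2", {"E": (5, 5),
                            "F": ("P1", {"G": (2, 10), "H": (1, 0)})})})

def play(node, s1, s2):
    """Follow pure strategies s1, s2 (sets of chosen actions) from `node` to a leaf."""
    while isinstance(node[1], dict):
        player, children = node
        chosen = s1 if player == "P1" else s2
        action = next(a for a in children if a in chosen)   # the action this strategy picks here
        node = children[action]
    return node

S1 = [set(s) for s in product("AB", "GH")]    # (A,G), (A,H), (B,G), (B,H)
S2 = [set(s) for s in product("CD", "EF")]    # (C,E), (C,F), (D,E), (D,F)
for s2 in S2:
    print([play(tree, s1, s2) for s1 in S1])  # one row of the induced normal form
```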
Subgame Perfection
Define the subgame of G rooted at h:
    the restriction of G to the descendants of h.
Define the set of subgames of G:
    subgames of G rooted at nodes in G.
s is a subgame-perfect equilibrium of G iff for any subgame G′ of G, the restriction of s to G′ is a Nash equilibrium of G′.
Notes:
since G is its own subgame, every SPE is a NE.
this definition rules out “non-credible threats”
Lecture Overview

1. Recap
2. Backward Induction
3. Imperfect-Information Extensive-Form Games
4. Perfect Recall
Centipede Game

[Figure 5.9: The centipede game. Players 1 and 2 alternate choosing between A (across) and D (down). Going down at the successive decision nodes ends the game with payoffs (1,0), (0,2), (3,1), (2,4) and (4,3); if both players go across at every node, the game ends with payoff (3,5).]

Play this as a fun game...
Computing Subgame Perfect Equilibria

Idea: identify the equilibria in the bottom-most subtrees, and adopt these as one moves up the tree.

function BackwardInduction(node h) returns u(h)
    if h ∈ Z then return u(h)                        // h is a terminal node
    best_util ← −∞
    forall a ∈ χ(h) do
        util_at_child ← BackwardInduction(σ(h, a))
        if util_at_child[ρ(h)] > best_util[ρ(h)] then
            best_util ← util_at_child
    return best_util

Figure 5.6: Procedure for finding the value of a sample (subgame-perfect) Nash equilibrium of a perfect-information extensive-form game.

util_at_child is a vector denoting the utility for each player at the child node; util_at_child[ρ(h)] denotes the element of this vector corresponding to the utility for player ρ(h), the player who gets to move at node h. Similarly, best_util is a vector giving utilities for each player.

The procedure doesn't return an equilibrium strategy, but rather labels each node with a vector of real numbers.
    This labeling can be seen as an extension of the game's utility function to the non-terminal nodes.
    The equilibrium strategies: take the best action at each node.

Good news: not only are we guaranteed to find a subgame-perfect equilibrium (rather than possibly finding a Nash equilibrium that involves non-credible threats), but this procedure is also computationally simple. It can be implemented as a single depth-first traversal of the game tree, and thus requires time linear in the size of the game representation. Recall in contrast that the best known methods for finding Nash equilibria of general games require time exponential in the size of the normal form; remember as well that the induced normal form of an extensive-form game is exponentially larger than the original representation.
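Here is a minimal Python sketch of the procedure above (not from the lecture); the tuple-based node encoding and zero-based player indices are my own choices. It also labels each visited node with the action chosen there, so an equilibrium strategy can be read off, and it reproduces the Centipede result discussed on a later slide.

```python
def backward_induction(node, strategy):
    """Return the utility vector of a subgame-perfect equilibrium of the subgame at `node`."""
    if not isinstance(node[1], dict):        # terminal node: `node` is the payoff vector u(h)
        return node
    player, children = node
    best_util = None
    for action, child in children.items():
        util = backward_induction(child, strategy)
        if best_util is None or util[player] > best_util[player]:
            best_util = util
            strategy[id(node)] = action      # label this node with the action chosen at it
    return best_util

# The centipede game of Figure 5.9, with players indexed 0 and 1:
centipede = (0, {"D": (1, 0), "A":
            (1, {"D": (0, 2), "A":
            (0, {"D": (3, 1), "A":
            (1, {"D": (2, 4), "A":
            (0, {"D": (4, 3), "A": (3, 5)})})})})})

strategy = {}
print(backward_induction(centipede, strategy))   # (1, 0): player 1 goes down immediately
```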
For zero-sum games, BackwardInduction has another name: the minimax algorithm.
    Here it's enough to store one number per node.
    It's possible to speed things up by pruning nodes that will never be reached in play: "alpha-beta pruning".
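A sketch of what that looks like in code (not from the lecture), using the same node encoding as the previous snippet: only player 0's payoff is stored, player 0 maximizes it, player 1 minimizes it, and branches that cannot affect the value are pruned.

```python
def alphabeta(node, alpha=float("-inf"), beta=float("inf")):
    """Minimax value (player 0's payoff) of a zero-sum game tree, with alpha-beta pruning."""
    if not isinstance(node[1], dict):            # terminal node: return player 0's payoff
        return node[0]
    player, children = node
    if player == 0:                              # maximizing player
        value = float("-inf")
        for child in children.values():
            value = max(value, alphabeta(child, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:                    # the minimizer will never let play get here
                break
        return value
    else:                                        # minimizing player
        value = float("inf")
        for child in children.values():
            value = min(value, alphabeta(child, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value
```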
Backward Induction

[Figure 5.9: The centipede game.]

What happens when we use this procedure on the Centipede game?

In the only equilibrium, player 1 goes down in the first move. However, this outcome is Pareto-dominated by all but one other outcome.

Two considerations:
    practical: human subjects don't go down right away.
    theoretical: what should you do as player 2 if player 1 doesn't go down?
        SPE analysis says to go down. However, that same analysis says that player 1 would already have gone down. How do you update your beliefs upon observing a measure-zero event?
        But if player 1 knows that you'll do something else, it is rational for him not to go down anymore... a paradox.
        There's a whole literature on this question.

You have reached a state to which your analysis has given a probability of zero. How should you amend your beliefs and course of action based on this measure-zero event? This seemingly small inconvenience actually raises a fundamental problem in game theory. There exist different accounts of the situation, and they depend on the probabilistic assumptions made, on what is common knowledge (in particular, whether there is common knowledge of rationality), and on exactly how one revises one's beliefs in the face of measure-zero events.
Lecture Overview

1. Recap
2. Backward Induction
3. Imperfect-Information Extensive-Form Games
4. Perfect Recall
Intro
Up to this point, in our discussion of extensive-form games we
have allowed players to specify the action that they would
take at every choice node of the game.
This implies that players know the node they are in and all the
prior choices, including those of other agents.
We may want to model agents who must act with partial or no knowledge of the actions taken by others, or even of their own past actions.
This is possible using imperfect information extensive-form
games.
each player’s choice nodes are partitioned into information sets
if two choice nodes are in the same information set then the
agent cannot distinguish between them.
Formal definition
Definition
An imperfect-information game (in extensive form) is a tuple (N, A, H, Z, χ, ρ, σ, u, I), where
    (N, A, H, Z, χ, ρ, σ, u) is a perfect-information extensive-form game, and
    I = (I1, . . . , In), where Ii = (Ii,1, . . . , Ii,ki) is an equivalence relation on (that is, a partition of) {h ∈ H : ρ(h) = i} with the property that χ(h) = χ(h′) and ρ(h) = ρ(h′) whenever there exists a j for which h ∈ Ii,j and h′ ∈ Ii,j.
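As a small illustration (not from the lecture) of the closing condition: every node in an information set must have the same acting player and the same available actions, so that χ(I) and ρ(I) are well defined. The dict-based encoding below is my own assumption.

```python
def check_information_sets(info_sets, rho, chi):
    """info_sets: iterable of sets of nodes; rho: node -> player; chi: node -> action list."""
    for I in info_sets:
        nodes = list(I)
        first = nodes[0]
        for h in nodes[1:]:
            assert rho[h] == rho[first], f"{h} and {first} have different acting players"
            assert set(chi[h]) == set(chi[first]), f"{h} and {first} offer different actions"
```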
Example

Since χ(h) = χ(h′) whenever h and h′ are in the same information set (otherwise the player would be able to distinguish the nodes), if I ∈ Ii is an equivalence class we can unambiguously use the notation χ(I) to denote the set of actions available to player i at any node in information set I.

[Figure 5.10: An imperfect-information game. Player 1 moves first, choosing L or R; one of these choices ends the game immediately with payoff (1,1), while the other leads to player 2's choice between A and B. Player 1 then chooses between ℓ and r without knowing whether A or B was played (her two lower choice nodes form a single information set), with payoffs (0,0), (2,4), (2,4) and (0,0).]

What are the equivalence classes for each player?
What are the pure strategies for each player?
    A pure strategy is a choice of an action in each equivalence class. Formally, the pure strategies of player i consist of the cross product ×_{Ii,j ∈ Ii} χ(Ii,j).

In this game, player 1 has two information sets: the set including the top choice node, and the set including the bottom choice nodes. Note that the two bottom choice nodes in the second information set have the same set of possible actions. We can regard player 1 as not knowing whether player 2 chose A or B when she makes her choice between ℓ and r.
Normal-form games

We can represent any normal-form game.

[Figure 5.11: The Prisoner's Dilemma game in extensive form. Player 1 chooses C or D; player 2 then chooses c or d without observing player 1's choice (her two nodes are in one information set). Payoffs: C,c → (−1,−1); C,d → (−4,0); D,c → (0,−4); D,d → (−3,−3).]

Note that it would also be the same if we put player 2 at the root node.

Recall that perfect-information games were not expressive enough to capture the Prisoner's Dilemma game and many other ones.
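For concreteness, here is one hypothetical way to write Figure 5.11 down as a data structure (a sketch, not the lecture's notation); the point is that player 2's two nodes share a single information set, which is exactly what removes her ability to condition on player 1's move.

```python
pd_extensive = {
    "players": [1, 2],
    "root": "h0",
    "rho": {"h0": 1, "h1": 2, "h2": 2},                        # who moves at each node
    "chi": {"h0": ["C", "D"], "h1": ["c", "d"], "h2": ["c", "d"]},
    "sigma": {("h0", "C"): "h1", ("h0", "D"): "h2",            # successors; leaves are payoffs
              ("h1", "c"): (-1, -1), ("h1", "d"): (-4, 0),
              ("h2", "c"): (0, -4), ("h2", "d"): (-3, -3)},
    "info_sets": {1: [{"h0"}], 2: [{"h1", "h2"}]},             # h1, h2 indistinguishable to player 2
}
```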
Induced Normal Form
Same as before: enumerate pure strategies for all agents
Mixed strategies are just mixtures over the pure strategies as
before.
Nash equilibria are also preserved.
Note that we’ve now defined both mapping from NF games to
IIEF and a mapping from IIEF to NF.
what happens if we apply each mapping in turn?
we might not end up with the same game, but we do get one
with the same strategy spaces and equilibria.
Randomized Strategies
It turns out there are two meaningfully different kinds of
randomized strategies in imperfect information extensive form
games
mixed strategies
behavioral strategies
Mixed strategy: randomize over pure strategies
Behavioral strategy: independent coin toss every time an
information set is encountered
Randomized strategies example

[Figure 5.2: The perfect-information game in extensive form from earlier.]

Give an example of a behavioral strategy:
    A with probability .5 and G with probability .3.
Give an example of a mixed strategy that is not a behavioral strategy:
    (.6 (A,G), .4 (B,H))   (why not?)
In this game every behavioral strategy corresponds to a mixed strategy...
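A quick sketch (not from the lecture) of how a behavioral strategy for player 1 in this game induces a mixed strategy: multiply the per-decision-point probabilities. It also shows why (.6 (A,G), .4 (B,H)) has no such product form.

```python
from itertools import product

behavioral = [                   # hypothetical encoding: one distribution per decision point
    {"A": 0.5, "B": 0.5},        # at the root
    {"G": 0.3, "H": 0.7},        # at the node reached via B, F
]

mixed = {}
for pure in product(*behavioral):               # pure strategies (A,G), (A,H), (B,G), (B,H)
    prob = 1.0
    for dist, action in zip(behavioral, pure):
        prob *= dist[action]
    mixed[pure] = prob

print(mixed)   # {('A','G'): 0.15, ('A','H'): 0.35, ('B','G'): 0.15, ('B','H'): 0.35}
# (.6 (A,G), .4 (B,H)) cannot arise this way: it would need Pr(A)·Pr(G) = .6 and
# Pr(A)·Pr(H) = 0, which no pair of independent randomizations can produce.
```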
Games of imperfect recall

Imagine that player 1 sends two proxies to the game with the same strategies. When one arrives, he doesn't know if the other has arrived before him, or if he's the first one.

[Figure 5.12: A game with imperfect recall. Player 1 chooses L or R at the root. L leads to a second player 1 node that lies in the same information set as the root; there, L gives payoff (1,0) and R gives (100,100). R at the root leads to player 2, for whom U gives (5,1) and D gives (2,2).]

What is the space of pure strategies in this game?
    player 1: {L, R}; player 2: {U, D}
What is the mixed-strategy equilibrium?
    Observe that D is dominant for 2, and (R, D) is better for 1 than (L, D), so (R, D) is an equilibrium.
    Note that in a mixed strategy, agent 1 decides probabilistically whether to play L or R in his information set, but once he decides he plays that pure strategy consistently; thus the payoff of 100 is irrelevant in the context of mixed strategies.
Games of imperfect recall

[Figure 5.12: A game with imperfect recall (as on the previous slide).]

What is an equilibrium in behavioral strategies?
    Again, D is dominant for 2 (weakly: it is the unique best response to every strategy of agent 1 other than the pure strategy L).
    If 1 uses the behavioral strategy (p, 1 − p), that is, choosing L with probability p each time he finds himself in the information set, his expected utility is
        1·p² + 100·p·(1 − p) + 2·(1 − p),
    which simplifies to −99p² + 98p + 2, with maximum at p = 98/198.
    Thus (R, D) = ((0, 1), (0, 1)) is no longer an equilibrium in behavioral strategies; instead we get the equilibrium ((98/198, 100/198), (0, 1)).

Thus, we can have behavioral strategies that are different from mixed strategies.

There is, however, a broad class of imperfect-information games in which the expressive power of mixed and behavioral strategies coincides: the class of games of perfect recall. Intuitively speaking, in these games no player forgets any information he knew about moves made so far; in particular, he remembers precisely all his own moves.
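A small numeric check of the computation above (not from the lecture), assuming numpy:

```python
import numpy as np

p = np.linspace(0.0, 1.0, 100001)
u1 = 1 * p**2 + 100 * p * (1 - p) + 2 * (1 - p)     # = -99·p² + 98·p + 2
print(p[np.argmax(u1)], 98 / 198)                    # both ≈ 0.4949..., i.e. p = 98/198 = 49/99
```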
Lecture Overview

1. Recap
2. Backward Induction
3. Imperfect-Information Extensive-Form Games
4. Perfect Recall
Perfect Recall: mixed and behavioral strategies coincide
No player forgets anything he knew about moves made so far.
Definition
Player i has perfect recall in an imperfect-information game G if for any two nodes h, h′ that are in the same information set for player i, for any path h0, a0, h1, a1, h2, . . . , hn, an, h from the root of the game to h (where the hj are decision nodes and the aj are actions) and any path h0, a′0, h′1, a′1, h′2, . . . , h′m, a′m, h′ from the root to h′, it must be the case that:
    1. n = m
    2. For all 0 ≤ j ≤ n, hj and h′j are in the same equivalence class for player i.
    3. For all 0 ≤ j ≤ n, if ρ(hj) = i (that is, hj is a decision node of player i), then aj = a′j.
G is a game of perfect recall if every player has perfect recall in it.
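A sketch (not from the lecture) of checking these three conditions for one player. The helpers path(h), returning the root-to-h list of (decision node, action) pairs, and info_set_of, mapping player i's decision nodes to their equivalence classes, are hypothetical; condition 2 is read here as requiring that either both nodes belong to player i and lie in the same class, or neither belongs to player i.

```python
def has_perfect_recall(i, info_sets_of_i, rho, info_set_of, path):
    for I in info_sets_of_i:
        for h in I:
            for h_prime in I:
                p, q = path(h), path(h_prime)
                if len(p) != len(q):                                   # condition 1: n = m
                    return False
                for (hj, aj), (hk, ak) in zip(p, q):
                    if info_set_of.get(hj) != info_set_of.get(hk):     # condition 2
                        return False
                    if rho[hj] == i and aj != ak:                      # condition 3
                        return False
    return True
```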
Perfect Recall
Clearly, every perfect-information game is a game of perfect recall.
Theorem (Kuhn, 1953)
In a game of perfect recall, any mixed strategy of a given agent
can be replaced by an equivalent behavioral strategy, and any
behavioral strategy can be replaced by an equivalent mixed
strategy. Here two strategies are equivalent in the sense that they
induce the same probabilities on outcomes, for any fixed strategy
profile (mixed or behavioral) of the remaining agents.
Corollary
In games of perfect recall the set of Nash equilibria does not
change if we restrict ourselves to behavioral strategies.
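One way to see the mixed-to-behavioral direction concretely (a sketch under my own encoding, not the lecture's): at each information set, condition the mixture on the pure strategies that do not preclude reaching that set, which is well defined under perfect recall.

```python
from collections import defaultdict

def to_behavioral(mixed, own_history):
    """mixed: list of (pure strategy, probability) pairs, a pure strategy being a dict
    {information set: action}; own_history[I]: player i's own (information set, action)
    pairs needed to reach I (hypothetical helper, well defined under perfect recall)."""
    behavioral = {}
    for I, required in own_history.items():
        total, weight = 0.0, defaultdict(float)
        for pure, prob in mixed:
            if all(pure[J] == a for J, a in required):   # this pure strategy can reach I
                total += prob
                weight[pure[I]] += prob
        if total > 0:
            behavioral[I] = {a: w / total for a, w in weight.items()}
    return behavioral

# Player 1's mixed strategy (.6 (A,G), .4 (B,H)) from the earlier example, with his two
# decision points named "root" and "afterBF" here:
mixed = [({"root": "A", "afterBF": "G"}, 0.6), ({"root": "B", "afterBF": "H"}, 0.4)]
own_history = {"root": [], "afterBF": [("root", "B")]}
print(to_behavioral(mixed, own_history))
# {'root': {'A': 0.6, 'B': 0.4}, 'afterBF': {'H': 1.0}}
```

That mixed strategy is not itself a behavioral strategy, but by the theorem it has an outcome-equivalent one, shown in the printed output.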
Computing Equilibria of Games of Perfect Recall
How can we find an equilibrium of an imperfect information
extensive form game?
One idea: convert to normal form, and use techniques
described earlier.
Problem: exponential blowup in game size.
Alternative (at least for perfect recall): sequence form
for zero-sum games, computing equilibrium is polynomial in
the size of the extensive form game
exponentially faster than the LP formulation we saw before
for general-sum games, can compute equilibrium in time
exponential in the size of the extensive form game
again, exponentially faster than converting to normal form