Marcin Kadluczka

Equilibrium Selection in
Stochastic Games

By
Marcin Kadluczka
Dec 2nd 2002
CS 594 – Piotr Gmytrasiewicz
Agenda

Definition of finite discounted stochastic games
Stationary equilibrium
Linear tracing procedure
Stochastic tracing procedures
Examples of different equilibria depending on the type of stochastic tracing
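Finite discounted stochastic games
$\Gamma = \langle N, \Omega, \{S^i_\omega\}_{(i,\omega) \in N \times \Omega}, \{u^i\}_{i \in N}, \pi, \delta \rangle$
where
$N$ is the finite set of players, $N = \{1, 2, \dots, n\}$
$\Omega$ is the state space, with a finite number of states $\omega$
$S^i_\omega$ is the finite action set of player $i \in N$ in state $\omega \in \Omega$
$u^i : H \to \mathbb{R}$ is the payoff function of player $i$, where $H \overset{\text{def}}{=} \{(\omega, s) \mid \omega \in \Omega,\ s \in S_\omega = \times_{i \in N} S^i_\omega\}$
$\pi : H \to \Delta(\Omega)$ is the transition function, written $\pi(\bar\omega \mid \omega, s)$
$\delta \in [0, 1)$ is the discount factor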
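The ingredients of the definition map onto a small container. This is only a minimal sketch, assuming two players and the same number of actions in every state (a simplification the slide does not make); the class name `StochasticGame` and the array layout are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class StochasticGame:
    """Hypothetical container for a two-player finite discounted stochastic game."""

    u: np.ndarray   # u[i, w, a, b]: payoff of player i in state w for actions (a, b)
    pi: np.ndarray  # pi[w, a, b, v]: probability of moving from state w to state v
    delta: float    # discount factor, must lie in [0, 1)

    def __post_init__(self):
        if not 0 <= self.delta < 1:
            raise ValueError("delta must lie in [0, 1)")
        # Every pi[w, a, b, :] has to be a probability distribution over states.
        if not np.allclose(self.pi.sum(axis=-1), 1.0):
            raise ValueError("each pi[w, a, b, :] must sum to 1")


# Two states, two actions per player, uniform transitions, zero payoffs.
game = StochasticGame(
    u=np.zeros((2, 2, 2, 2)),
    pi=np.full((2, 2, 2, 2), 0.5),
    delta=0.9,
)
```

The validation in `__post_init__` mirrors the two constraints stated in the definition: transitions are probability distributions and the discount factor is strictly below 1.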
Rules of the game
[Diagram: one stage of play]
Time t, current state $\omega^t$: each player $i = 1, \dots, n$ chooses an action $s^{i,t} \in S^i_{\omega^t}$
Rewards: player $i$ receives $u^i(\omega^t, s^t)$
Transition: with probability $\pi(\omega^{t+1} \mid \omega^t, s^t)$ the game moves to the next state $\omega^{t+1}$
Time t+1: play continues from state $\omega^{t+1}$
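One stage of play can be sketched in code as follows. This assumes two players, per-state mixed actions stored as NumPy arrays, and the hypothetical array layout `u[i, w, a, b]` and `pi[w, a, b, v]`; the function name `play_stage` is illustrative only.

```python
import numpy as np


def play_stage(state, strategies, u, pi, rng):
    """Play one stage: sample both actions, collect rewards, draw the next state."""
    a = rng.choice(len(strategies[0][state]), p=strategies[0][state])
    b = rng.choice(len(strategies[1][state]), p=strategies[1][state])
    rewards = u[:, state, a, b]                            # one reward per player
    next_state = rng.choice(pi.shape[-1], p=pi[state, a, b])
    return (a, b), rewards, next_state


rng = np.random.default_rng(0)
u = np.zeros((2, 2, 2, 2))
u[0, 0, 0, 0] = 1.0   # player 1 earns 1 in state 0 under actions (0, 0)
u[1, 0, 0, 0] = 2.0   # player 2 earns 2 in state 0 under actions (0, 0)
pi = np.zeros((2, 2, 2, 2))
pi[:, :, :, 1] = 1.0  # every transition leads to state 1
# Both players use their first action with certainty in both states.
strategies = [np.array([[1.0, 0.0], [1.0, 0.0]]),
              np.array([[1.0, 0.0], [1.0, 0.0]])]
actions, rewards, next_state = play_stage(0, strategies, u, pi, rng)
```

Because all distributions in this toy setup are degenerate, the outcome does not depend on the random seed.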
Other assumptions

Perfect recall
At each stage every player remembers all past actions
chosen by all players and all past states that occurred

Difference from normal-form games
The game does not consist of a single play: it jumps
to the next state according to the probability measure $\pi$
and continues dynamically

Rewards take future states into account, not only
immediate payoffs
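Pure & Mixed strategy

Pure strategy
$s_\omega \in S_\omega$

Mixed strategy
$\sigma_\omega \in \Sigma_\omega = \times_{i \in N} \Sigma^i_\omega$ where $\Sigma^i_\omega = \Delta(S^i_\omega)$
If a mixed strategy is played, the instantaneous expected payoff of player $i$
is denoted by $u^i(\omega, \sigma_\omega)$
and the transition probability by $\pi(\bar\omega \mid \omega, \sigma_\omega)$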
Stationary strategy payoffs

History
The set of possible histories up to stage k: $H^k = \times_{j=0}^{k-1} H_j$ where $H_j = (\omega^j, s^j)$
It consists of all sequences $h^k = (\omega^0, s^0, \omega^1, s^1, \dots, \omega^{k-1}, s^{k-1})$

Behavior strategy
$\rho^i = (\rho^{i0}, \rho^{i1}, \dots, \rho^{ik}, \dots)$ where $\rho^{i0} \in \Sigma^i$ and $\rho^{ik} : H^k \to \Sigma^i$ for $k \ge 1$

Stationary strategy
$\rho^{ik}(h^k, \omega^k) = \rho^{ik}(\omega^k)$

Payoffs
$U^i(\omega, \rho) = \sum_{k=0}^{\infty} \delta^k U^{ik}(\omega, \rho)$

Equilibrium

General equilibrium
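A strategy-tuple $\rho$ is an equilibrium if and
only if $\rho^i$ is a best response to $\rho^{-i}$ for all $i$

Stationary equilibrium (Nash Eq.)
Payoff for a stationary equilibrium $\sigma$:
$U^i(\omega, \sigma) = u^i(\omega, \sigma_\omega) + \delta \sum_{\bar\omega \in \Omega} \pi(\bar\omega \mid \omega, \sigma_\omega)\, U^i(\bar\omega, \sigma)$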
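For a fixed stationary profile the payoff recursion is a linear system in the vector of state values, so it can be solved directly. A minimal sketch, assuming the expected stage payoffs `u_bar` and the induced state-transition matrix `P` have already been computed from the profile; both names and the toy numbers are illustrative.

```python
import numpy as np


def discounted_payoff(u_bar, P, delta):
    """Solve U = u_bar + delta * P @ U, i.e. (I - delta * P) U = u_bar."""
    n = len(u_bar)
    return np.linalg.solve(np.eye(n) - delta * P, u_bar)


# Toy data: state 0 pays 1 and stays with probability 0.5;
# state 1 pays 0 and is absorbing.
u_bar = np.array([1.0, 0.0])
P = np.array([[0.5, 0.5],
              [0.0, 1.0]])
U = discounted_payoff(u_bar, P, delta=0.9)
# U[1] = 0 and U[0] = 1 / (1 - 0.9 * 0.5) = 1 / 0.55
```

Since the discount factor is below 1 and `P` is a stochastic matrix, `I - delta * P` is invertible, so the solve always succeeds.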
Comparison with other games

Comparison to normal-form games
$\Gamma = \langle N, \{S^i\}_{i \in N}, \{u^i\}_{i \in N} \rangle$

Comparison to MDPs
More than one agent
If the opponents' strategies are stationary, each player faces an MDP

Comparison to Bayesian games
No discounting in Bayesian games
Types -> States
Beliefs are contained in the prior
Linear tracing procedure

Corresponding normal-form game
We fix the state $\omega$: $\Gamma_\omega = \langle N, \{S^i_\omega\}_{i \in N}, \{u^i\}_{i \in N} \rangle$

Prior probability distributions = prior $p$
The expectation of each player about the other players'
strategy choices over the pure strategies
Each player has the same assumption about the
others (an important assumption)
Linear tracing procedure (cont'd)

Family of one-parameter games
$\{\Gamma^t\}$ where $t \in [0, 1]$ and $\Gamma^1 = \Gamma$

Payoff function
$v^i(t; \omega, \sigma_\omega) = (1 - t)\, u^i(\omega, (p^{-i}_\omega, \sigma^i_\omega)) + t\, u^i(\omega, \sigma_\omega)$
Clearly we have $v^i(1; \omega, \sigma_\omega) = u^i(\omega, \sigma_\omega)$
and $v^i(0; \omega, \sigma_\omega) = u^i(\omega, (p^{-i}_\omega, \sigma^i_\omega))$
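The tracing payoff is a convex combination of the payoff against the prior and the payoff against the actual opponent strategy. A minimal sketch for the row player of a two-player game in one fixed state; the payoff matrix `A` and all strategy vectors are made-up illustration data.

```python
import numpy as np


def tracing_payoff(t, A, own, opp, prior_opp):
    """v(t) = (1 - t) * u(prior_opp, own) + t * u(opp, own) for the row player."""
    against_prior = own @ A @ prior_opp
    against_sigma = own @ A @ opp
    return (1 - t) * against_prior + t * against_sigma


A = np.array([[2.0, 0.0],
              [0.0, 1.0]])           # row player's payoffs (illustrative)
own = np.array([1.0, 0.0])           # row player uses the first action
opp = np.array([0.0, 1.0])           # opponent actually uses the second action
prior_opp = np.array([0.5, 0.5])     # prior belief about the opponent

v0 = tracing_payoff(0.0, A, own, opp, prior_opp)   # payoff against the prior only
v1 = tracing_payoff(1.0, A, own, opp, prior_opp)   # payoff in the original game
```

At t = 0 only the prior matters, and at t = 1 the original game is recovered, matching the two boundary identities on the slide.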
Linear tracing procedure (cont'd)

$E(\Gamma^t)$: the set of equilibrium points of $\Gamma^t$
Let $L = L(\Gamma, p)$ be the graph of the correspondence $t \mapsto E(\Gamma^t)$ for $t \in [0, 1]$
It can be a collection of pieces of one-dimensional curves, though in degenerate
cases it may contain isolated points and/or higher-dimensional sets

Feasible path $\varphi$
A path in $L$ connecting $x^0 = (0, \sigma^0)$ with $x^1 = (1, \sigma^*)$
$\sigma^*$ is called the outcome selected by the path $\varphi$

Linear tracing procedure
Follow a feasible path from $t = 0$ to $t = 1$
Well-defined l.t.p.: $L$ contains exactly one feasible path
Stochastic tracing procedure


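Assumption: a stochastic game $\Gamma$ and a prior $p$ are given
Stochastic game
$\Gamma = \langle N, \Omega, \{S^i_\omega\}_{(i,\omega) \in N \times \Omega}, \{u^i\}_{i \in N}, \pi, \delta \rangle$
Total expected discounted payoffs
$V^i(0; \omega, \sigma) = U^i(\omega, (p^{-i}, \sigma^i))$ and $V^i(1; \omega, \sigma) = U^i(\omega, \sigma)$

Stochastic tracing procedure $T(\Gamma, p)$
$T(\Gamma, p) = \{(t, \sigma) \in [0, 1] \times \Sigma \mid \sigma^i \text{ is a best stationary response to } \sigma^{-i} \text{ in } \Gamma^t\}$
It is feasible $\iff$ there exists a continuous function $\varphi : [0, 1] \to T(\Gamma, p)$ with
$\varphi(0) \in T(\Gamma, p) \cap (\{0\} \times \Sigma)$ and $\varphi(1) \in T(\Gamma, p) \cap (\{1\} \times \Sigma)$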
Alternative ways of extending the payoff
function to stochastic games

There are 4 ways to define a player's beliefs:
Correlation within states – C(S)
All opponents play the same strategy

Absence of correlation within states – I(S)
Each opponent can play a different strategy

Correlation across time – C(T)
Each player plays the same strategy across time

Absence of correlation across time – I(T)
Over time each player can change its strategy
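Alternatives (cont'd)

Alternative 1: C(S), I(T)
$V^i_{C(S),I(T)}(t; \omega, \sigma) = (1 - t)\, u^i(\omega, (p^{-i}_\omega, \sigma^i_\omega)) + t\, u^i(\omega, \sigma_\omega) + \delta \sum_{\bar\omega \in \Omega} [(1 - t)\, \pi(\bar\omega \mid \omega, (p^{-i}_\omega, \sigma^i_\omega)) + t\, \pi(\bar\omega \mid \omega, \sigma_\omega)]\, V^i_{C(S),I(T)}(t; \bar\omega, \sigma)$

Alternative 2: C(S), C(T)
$V^i_{C(S),C(T)}(t; \omega, \sigma) = (1 - t)\, U^i(\omega, (p^{-i}, \sigma^i)) + t\, U^i(\omega, \sigma)$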
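Alternatives (cont'd)

Alternative 3: I(S), I(T)
$V^i_{I(S),I(T)}(t; \omega, \sigma) = U^i(\omega, ((1 - t) p^{-i} + t \sigma^{-i}, \sigma^i))$
$V^i_{I(S),I(T)}(t; \omega, \sigma) = u^i(\omega, ((1 - t) p^{-i}_\omega + t \sigma^{-i}_\omega, \sigma^i_\omega)) + \delta \sum_{\bar\omega \in \Omega} \pi(\bar\omega \mid \omega, ((1 - t) p^{-i}_\omega + t \sigma^{-i}_\omega, \sigma^i_\omega))\, V^i_{I(S),I(T)}(t; \bar\omega, \sigma)$

Alternative 4: I(S), C(T)
$V^i_{I(S),C(T)}(t; \omega, \sigma) = \sum_{S \subseteq N \setminus \{i\}} (1 - t)^{|S|}\, t^{|N| - |S| - 1}\, U^i(\omega, (p^S, \sigma^{N \setminus S}))$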
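The difference between mixing at every stage, I(T), and mixing once over whole discounted payoffs, C(T), can be checked numerically. The sketch below compares Alternative 1 with Alternative 2 for player 1 in a made-up two-player, two-state game; the array layout `u_i[w, a, b]`, `pi[w, a, b, v]` and all numbers are assumptions for illustration. With only one opponent, C(S) and I(S) coincide, so only the time dimension matters here.

```python
import numpy as np


def stage_data(u_i, pi, own, opp):
    """Expected stage payoff per state and induced state-transition matrix."""
    u_bar = np.einsum("wa,wb,wab->w", own, opp, u_i)
    P = np.einsum("wa,wb,wabv->wv", own, opp, pi)
    return u_bar, P


def solve_value(u_bar, P, delta):
    """Solve V = u_bar + delta * P @ V."""
    return np.linalg.solve(np.eye(len(u_bar)) - delta * P, u_bar)


def value_alt1(t, u_i, pi, own, opp, prior_opp, delta):
    """C(S), I(T): stage payoffs and transitions are mixed at every stage."""
    u_p, P_p = stage_data(u_i, pi, own, prior_opp)
    u_s, P_s = stage_data(u_i, pi, own, opp)
    return solve_value((1 - t) * u_p + t * u_s, (1 - t) * P_p + t * P_s, delta)


def value_alt2(t, u_i, pi, own, opp, prior_opp, delta):
    """C(S), C(T): the two total discounted payoffs are mixed once."""
    U_p = solve_value(*stage_data(u_i, pi, own, prior_opp), delta)
    U_s = solve_value(*stage_data(u_i, pi, own, opp), delta)
    return (1 - t) * U_p + t * U_s


# Illustrative game: state 0 pays 2 under actions (0, 0); state 1 is absorbing
# and pays nothing. The opponent's action in state 0 decides the next state.
u_i = np.zeros((2, 2, 2))
u_i[0, 0, 0] = 2.0
pi = np.zeros((2, 2, 2, 2))
pi[0, :, 0, 0] = 1.0   # opponent action 0 keeps the game in state 0
pi[0, :, 1, 1] = 1.0   # opponent action 1 moves it to state 1
pi[1, :, :, 1] = 1.0   # state 1 is absorbing
own = np.array([[1.0, 0.0], [1.0, 0.0]])
opp = np.array([[1.0, 0.0], [1.0, 0.0]])         # sigma of the opponent
prior_opp = np.array([[0.0, 1.0], [0.0, 1.0]])   # prior about the opponent

V1 = value_alt1(0.5, u_i, pi, own, opp, prior_opp, delta=0.5)
V2 = value_alt2(0.5, u_i, pi, own, opp, prior_opp, delta=0.5)
```

In this toy game the two extensions give different values in state 0 at t = 0.5 (4/3 versus 2), while both agree with the ordinary discounted payoff at t = 0 and t = 1.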
Example 1 – C(S) versus I(S)

Prior = ((1/6, 5/6), (1/2, 1/2), (2/3, 1/3))
Equilibria: pure: (s1; s2; s3), (s1'; s2'; s3') and mixed $(\sqrt{2} - 1, 2 - \sqrt{2})$
Starting point: (0, (s1; s2'; s3'))
Ex1: C(S) solution
Ex1: C(S) calculations

(s1,s2,s3;1):
Player 1 expects player 2 to play: (1/2(1-t)+t, 1/2(1-t))
Player 1 expects player 3 to play: (2/3(1-t)+t, 1/3(1-t))
Expected payoff: (1/2(1-t)+t)(2/3(1-t)+t)*2 = 1/3(1+t)(2+t)

(s1,s2,s3;2):
Player 2 expects player 1 to play: (1/6(1-t)+t, 5/6(1-t))
Player 2 expects player 3 to play: (2/3(1-t)+t, 1/3(1-t))
Expected payoff: (1/6(1-t)+t)(2/3(1-t)+t)*2 = 1/9(1+5t)(2+t)

(s1,s2',s3;1):
Player 1 expects player 2 to play: (1/2(1-t), 1/2(1-t)+t)
Player 1 expects player 3 to play: (2/3(1-t)+t, 1/3(1-t))
Expected payoff: (1/2(1-t))(2/3(1-t)+t)*2 = 1/3(1-t)(2+t)
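These products can be checked with exact arithmetic. The sketch below assumes, as the expressions above suggest, that the payoff is 2 when both opponents use their unprimed action and 0 otherwise; the payoff table itself is not reproduced in these notes, and the function names are illustrative.

```python
from fractions import Fraction as F


def belief(prior_prob, t, plays_unprimed):
    """Belief that one opponent uses its unprimed action: (1 - t) * prior + t * indicator."""
    return (1 - t) * prior_prob + (t if plays_unprimed else 0)


def payoff_product_beliefs(t, prior_a, plays_a, prior_b, plays_b, reward=2):
    """Expected payoff when the beliefs about the two opponents are multiplied."""
    return belief(prior_a, t, plays_a) * belief(prior_b, t, plays_b) * reward


t = F(1, 2)
# (s1,s2,s3;1): player 1 against players 2 and 3
case_1 = payoff_product_beliefs(t, F(1, 2), True, F(2, 3), True)    # 5/4
# (s1,s2,s3;2): player 2 against players 1 and 3
case_2 = payoff_product_beliefs(t, F(1, 6), True, F(2, 3), True)    # 35/36
# (s1,s2',s3;1): player 2 uses its primed action
case_3 = payoff_product_beliefs(t, F(1, 2), False, F(2, 3), True)   # 5/12
```

Using `Fraction` keeps the results exact, so they can be compared directly with the closed forms at any rational t.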
Ex1: C(S) trajectory
[Figure: trajectory with labelled points (s1; s2', s3'), (s1; s2, s3), (s1; s2, s3')]
Ex1: I(S) solution
Ex1: I(S) calculations

(s1,s2,s3;1):
Player 1 expects players 2 & 3 to play s2 & s3 with probability t
Player 1 expects players 2 & 3 to follow the prior with probability (1-t)
Expected payoff: ((1-t)(1/2)(2/3)+t)*2 = 2/3(1-t)+2t

(s1,s2,s3;2):
Player 2 expects players 1 & 3 to play s1 & s3 with probability t
Player 2 expects players 1 & 3 to follow the prior with probability (1-t)
Expected payoff: ((1-t)(1/6)(2/3)+t)*2 = 2/9(1-t)+2t

(s1,s2',s3;1):
Player 1 expects players 2 & 3 to play s2' & s3 with probability t (but the payoff is 0)
Player 1 expects players 2 & 3 to follow the prior with probability (1-t)
Expected payoff: ((1-t)(1/2)(2/3))*2 = 2/3(1-t)
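Here the belief is a mixture over joint opponent behaviour: with probability t both opponents play the candidate strategy, with probability 1 - t both follow the prior. A minimal sketch with exact arithmetic, again assuming a payoff of 2 only when both opponents use their unprimed action (an assumption read off the expressions above); names are illustrative.

```python
from fractions import Fraction as F


def payoff_joint_beliefs(t, prior_joint, sigma_pays, reward=2):
    """((1 - t) * P_prior(both unprimed) + t * [sigma pays the reward]) * reward."""
    return ((1 - t) * prior_joint + (t if sigma_pays else 0)) * reward


t = F(1, 2)
# (s1,s2,s3;1): prior probability of (s2, s3) is 1/2 * 2/3
case_1 = payoff_joint_beliefs(t, F(1, 2) * F(2, 3), True)    # 4/3
# (s1,s2,s3;2): prior probability of (s1, s3) is 1/6 * 2/3
case_2 = payoff_joint_beliefs(t, F(1, 6) * F(2, 3), True)    # 10/9
# (s1,s2',s3;1): the candidate profile itself pays 0
case_3 = payoff_joint_beliefs(t, F(1, 2) * F(2, 3), False)   # 1/3
```

Unlike the product form, these expressions are linear in t, which is why the two belief models can trace different paths from the same starting point.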
Ex1: I(S) trajectory
[Figure: trajectory with labelled points (s1'; s2', s3'), (s1; s2', s3')]
Example 2 – C(T) versus I(T)

Prior: ((1/2, 1/2), (2/3, 1/3))
Equilibria: pure: (s1; s2), (s1'; s2') and mixed $(4 - 2\sqrt{3}, 2\sqrt{3} - 3)$
Starting point: (0, (s1; s2'))
[Tables: payoffs and transition probabilities]
Ex2: C(T) solution
[Tables: transition probabilities for player 1 and for player 2]
Ex2: C(T) trajectory
[Figure: trajectory with labelled points (s1'; s2'), (s1; s2')]
Ex2: I(T) trajectory
[Figure: trajectory with labelled points (s1; s2), (s1; s2')]
Summary

Stochastic games were defined
The linear tracing procedure was presented
Some extensions were shown with examples
C(S),I(T) is probably the best extension
for calculating strategies
Reference

“Equilibrium Selection in Stochastic Games”
by P. Jean-Jacques Herings and Ronald J.A.P. Peeters
Questions
?