Equilibrium Selection in
Stochastic Games
By
Marcin Kadluczka
Dec 2nd 2002
CS 594 – Piotr Gmytrasiewicz
Agenda
Definition of finite discounted stochastic games
Stationary equilibrium
Linear tracing procedure
Stochastic tracing procedures
Examples of different equilibria depending on the type of stochastic tracing
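Finite discounted stochastic games
$\Gamma = \langle N, \Omega, \{S^i_\omega\}_{(i,\omega) \in N \times \Omega}, \{u^i\}_{i \in N}, \pi, \delta \rangle$
Where
$N$ – the finite set of players ($N = \{1, 2, \dots, n\}$)
$\Omega$ – the state space, with a finite number of states
$S^i_\omega$ – the finite action set of player $i \in N$ in state $\omega \in \Omega$
$u^i : H \to \mathbb{R}$ – the payoff function of player $i$, where $H = \{(\omega, s) \mid \omega \in \Omega,\ s \in S_\omega = \prod_{i \in N} S^i_\omega\}$
$\pi : H \to \Delta(\Omega)$ – the transition function (denoted $\pi(\cdot \mid \omega, s)$)
$\delta \in [0,1)$ – the discount factor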
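To make the tuple concrete, here is a minimal Python sketch of one possible encoding. The class name `StochasticGame`, its field names and the dictionary layout are hypothetical choices for illustration, not taken from the source; `validate` only checks the conditions stated above: δ ∈ [0,1), a payoff for every player at every (ω, s) ∈ H, and π(·|ω,s) a probability distribution over Ω.

```python
from dataclasses import dataclass
from itertools import product
from math import isclose


@dataclass
class StochasticGame:
    """Hypothetical encoding of <N, Omega, {S^i_w}, {u^i}, pi, delta>."""
    players: list  # N
    states: list   # Omega
    actions: dict  # actions[w][i] is the finite action set S^i_w
    payoff: dict   # payoff[(w, s)][i] = u^i(w, s), s a tuple with one action per player
    trans: dict    # trans[(w, s)][w2] = pi(w2 | w, s)
    delta: float   # discount factor

    def profiles(self, w):
        """All pure action profiles s in S_w = prod_{i in N} S^i_w."""
        return list(product(*(self.actions[w][i] for i in self.players)))

    def validate(self):
        """Raise ValueError if the data do not describe a game of the form above."""
        if not 0 <= self.delta < 1:
            raise ValueError("discount factor must lie in [0, 1)")
        for w in self.states:
            for s in self.profiles(w):
                h = (w, s)  # an element of H
                if set(self.payoff.get(h, {})) != set(self.players):
                    raise ValueError(f"missing payoffs at {h}")
                dist = self.trans.get(h, {})
                if any(w2 not in self.states or q < 0 for w2, q in dist.items()):
                    raise ValueError(f"invalid transition entry at {h}")
                if not isclose(sum(dist.values()), 1.0):
                    raise ValueError(f"pi(. | {h}) does not sum to 1")
```

Keying payoffs and transitions by the pair (ω, s) mirrors the definition of H, so every entry of `payoff` and `trans` corresponds to exactly one element of H.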
Rules of the game
Time $t$: current state $\omega_t$
Each player $i \in N$ (players 1, 2, …, n) chooses an action $s^i_t \in S^i_{\omega_t}$; write $s_t = (s^1_t, \dots, s^n_t)$
Rewards: player $i$ receives $u^i(\omega_t, s_t)$
Transition: the next state $\omega_{t+1}$ is reached with probability $\pi(\omega_{t+1} \mid \omega_t, s_t)$
Time $t+1$: play continues from the next state $\omega_{t+1}$
Other assumptions
Perfect recall
At each stage each player remembers all past actions chosen by all players and all past states that occurred
Difference from normal-form games
The game does not consist of a single play: it jumps, according to the probability measure $\pi$, to the next state and continues dynamically
Rewards count future states, not only immediate payoffs
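Pure & mixed strategies
Pure strategy profile in state $\omega$: $s \in S_\omega = \prod_{i \in N} S^i_\omega$
Mixed strategy profile in state $\omega$: $\sigma_\omega \in \Sigma_\omega = \prod_{i \in N} \Sigma^i_\omega$, where $\Sigma^i_\omega = \Delta(S^i_\omega)$
If a mixed strategy profile is played, the instantaneous expected payoff of player $i$ is denoted by $u^i(\omega, \sigma_\omega)$
and the transition probability by $\pi(\cdot \mid \omega, \sigma_\omega)$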
Stationary strategy payoffs
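History
The set of possible histories up to stage $k$: $H^k = \prod_{\ell=0}^{k-1} H$
It consists of all sequences $h^k = (\omega_0, s_0, \omega_1, s_1, \dots, \omega_{k-1}, s_{k-1})$
Behavior strategy
$\sigma^i = (\sigma^i_0, \sigma^i_1, \dots, \sigma^i_k, \dots)$, where $\sigma^i_0 \in \Sigma^i = \prod_{\omega \in \Omega} \Sigma^i_\omega$ and, for $k \geq 1$, $\sigma^i_k$ assigns to each history $h^k \in H^k$ and current state $\omega_k$ a mixed action $\sigma^i_k(h^k, \omega_k) \in \Sigma^i_{\omega_k}$
Stationary strategy
$\sigma^i_k(h^k, \omega_k) = \sigma^i_{\omega_k}$ for all $k$ and $h^k$: the mixed action depends only on the current state
Payoffs
$U^i(\omega, \sigma) = \sum_{k=0}^{\infty} \delta^k\, U^i_k(\omega, \sigma)$, where $U^i_k(\omega, \sigma)$ is player $i$'s expected payoff at stage $k$ when play starts in state $\omega$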
Equilibrium
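General equilibrium
A strategy tuple $\sigma$ is an equilibrium if and only if $\sigma^i$ is a best response to $\sigma^{-i}$ for all $i \in N$
Stationary equilibrium (Nash eq.): an equilibrium in which every player uses a stationary strategy
Payoff for stationary strategies
$U^i(\omega, \sigma) = u^i(\omega, \sigma_\omega) + \delta \sum_{\omega' \in \Omega} \pi(\omega' \mid \omega, \sigma_\omega)\, U^i(\omega', \sigma)$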
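Because δ < 1, the payoff equation for stationary strategies is a contraction in U^i, so for a fixed stationary profile σ it can be solved by plain fixed-point iteration. A minimal Python sketch, assuming player i's stage payoffs r[ω] = u^i(ω, σ_ω) and transitions P[ω][ω'] = π(ω' | ω, σ_ω) have already been computed and states are indexed 0, …, |Ω|−1; the function name `stationary_payoffs` is hypothetical:

```python
def stationary_payoffs(r, P, delta, tol=1e-12, max_iter=100_000):
    """Solve U(w) = r[w] + delta * sum_w2 P[w][w2] * U(w2) by fixed-point iteration.

    r[w]: u^i(w, sigma_w), player i's instantaneous expected payoff in state w
    P[w][w2]: pi(w2 | w, sigma_w), transition probabilities under sigma
    The right-hand side is a delta-contraction, so the iteration converges for delta < 1.
    """
    n = len(r)
    U = [0.0] * n
    for _ in range(max_iter):
        new = [r[w] + delta * sum(P[w][w2] * U[w2] for w2 in range(n)) for w in range(n)]
        if max(abs(a - b) for a, b in zip(new, U)) < tol:
            return new
        U = new
    raise RuntimeError("fixed-point iteration did not converge")
```

For two states that alternate deterministically, with payoff 1 in state 0, payoff 0 in state 1 and δ = 0.5, the fixed point is U = (4/3, 2/3). Solving the linear system (I − δP)U = r directly would give the same answer; iteration is used here only to keep the sketch dependency-free.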
Comparison with other games
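Comparison to normal-form games
Fixing a state $\omega$ gives a normal-form game $\langle N, \{S^i_\omega\}_{i \in N}, \{u^i(\omega, \cdot)\}_{i \in N} \rangle$
Comparison to MDPs
There is more than one agent
If the other players' strategies are stationary and fixed, each player faces an MDP
Comparison to Bayesian games
No discounting in Bayesian games
Types correspond to states
Beliefs are contained in the prior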
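The MDP connection can be made concrete: once the opponents' stationary strategies are fixed, player i faces a finite discounted MDP, and standard value iteration yields a best stationary response (a finite discounted MDP always has an optimal policy that is stationary and deterministic). A minimal Python sketch; the function name and the input layout (r[ω][a] and P[ω][a][ω'] already averaged over the opponents' fixed mixed actions) are assumptions for illustration:

```python
def best_stationary_response(r, P, delta, tol=1e-12, max_iter=100_000):
    """Value iteration for player i's MDP when the opponents' stationary strategies are fixed.

    r[w][a]: player i's expected stage payoff in state w for pure action a
    P[w][a][w2]: probability of moving from w to w2 when i plays a
    Returns (V, policy): V[w] approximates the optimal total discounted payoff and
    policy[w] is the index of a maximizing action, i.e. a pure stationary best response.
    """
    n = len(r)

    def q(w, a, V):
        # One-step lookahead value of action a in state w.
        return r[w][a] + delta * sum(P[w][a][w2] * V[w2] for w2 in range(n))

    V = [0.0] * n
    for _ in range(max_iter):
        new = [max(q(w, a, V) for a in range(len(r[w]))) for w in range(n)]
        converged = max(abs(x - y) for x, y in zip(new, V)) < tol
        V = new
        if converged:
            break
    policy = [max(range(len(r[w])), key=lambda a: q(w, a, V)) for w in range(n)]
    return V, policy
```

With a single player (or opponents held fixed) this is exactly MDP policy optimization; in the stochastic game the interesting part is that σ^{-i}, and hence r and P, change as the other players respond.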
Linear tracing procedure
Corresponding normal-form game
We fix the state $\omega$: $\Gamma_\omega = \langle N, \{S^i_\omega\}_{i \in N}, \{u^i(\omega, \cdot)\}_{i \in N} \rangle$
Prior probability distributions $p$ (the prior)
Each player's expectation about the other players' strategy choices, expressed as probabilities over their pure strategies
Each player has the same assumption about the others (an important assumption)
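Linear tracing procedure (cont.)
Family of one-parameter games
$\{\Gamma^t\}_{t \in [0,1]}$, where $\Gamma^1$ is the original game
Payoff function
$v^i(t; \omega, \sigma) = (1-t)\, u^i(\omega, (p^{-i}, \sigma^i)) + t\, u^i(\omega, \sigma)$
Clearly, $v^i(1; \omega, \sigma) = u^i(\omega, \sigma)$
and $v^i(0; \omega, \sigma) = u^i(\omega, (p^{-i}, \sigma^i))$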
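For two players at a fixed state ω, the tracing payoff v^1(t; ω, σ) is a convex combination of two bilinear forms: player 1's payoff against the prior p^2 and against σ^2. A minimal Python sketch for player 1; the names `u1`, `sigma1`, `sigma2` and `prior2` are hypothetical, with u1[a][b] player 1's payoff when player 1 plays a and player 2 plays b:

```python
def expected_payoff(u1, x, y):
    """Player 1's expected payoff when players 1 and 2 use mixed actions x and y."""
    return sum(x[a] * y[b] * u1[a][b] for a in range(len(x)) for b in range(len(y)))


def tracing_payoff(t, u1, sigma1, sigma2, prior2):
    """v^1(t; w, sigma) = (1 - t) u^1(w, (p^2, sigma^1)) + t u^1(w, sigma)."""
    against_prior = expected_payoff(u1, sigma1, prior2)
    against_sigma = expected_payoff(u1, sigma1, sigma2)
    return (1 - t) * against_prior + t * against_sigma
```

For example, `tracing_payoff(0.25, [[2, 0], [0, 1]], [1, 0], [0, 1], [0.5, 0.5])` returns 0.75: the payoff is 1 against the prior and 0 against σ^2, weighted 0.75 and 0.25.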
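Linear tracing procedure (cont.)
$E(\Gamma^t)$ – the set of equilibrium points of $\Gamma^t$
Let $L = L(\Gamma, p)$ be the graph of the correspondence $t \mapsto E(\Gamma^t)$ for $t \in [0,1]$
It can be a collection of pieces of one-dimensional curves, though in degenerate cases it may contain isolated points and/or higher-dimensional pieces
Feasible path
A path in $L$ from $x^0 = (0, \sigma^0)$ to $x^1 = (1, \sigma^*)$
$\sigma^*$ is called the outcome selected by the path
Linear tracing procedure: follow the feasible path from $t = 0$ to $t = 1$
Well-defined l.t.p.: $L$ contains exactly one feasible path, so the selected outcome $\sigma^*$ is unique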
Stochastic tracing procedure
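Assumption: $\Gamma$ and the prior $p$ are given
Stochastic game
$\Gamma = \langle N, \Omega, \{S^i_\omega\}_{(i,\omega) \in N \times \Omega}, \{u^i\}_{i \in N}, \pi, \delta \rangle$
Total expected discounted payoffs
$V^i(0; \omega, \sigma) = U^i(\omega, (p^{-i}, \sigma^i))$ and $V^i(1; \omega, \sigma) = U^i(\omega, \sigma)$
Stochastic tracing procedure $T(\Gamma, p)$
$T(\Gamma, p) = \{(t, \sigma) \in [0,1] \times \Sigma \mid \sigma^i \text{ is a best stationary response to } \sigma^{-i} \text{ in } \Gamma^t \text{ for all } i \in N\}$, where $\Sigma$ is the set of stationary strategy profiles
A feasible path is a continuous function $\gamma : [0,1] \to T(\Gamma, p)$ with $\gamma(0) \in T(\Gamma, p) \cap (\{0\} \times \Sigma)$ and $\gamma(1) \in T(\Gamma, p) \cap (\{1\} \times \Sigma)$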
Alternative ways of extending the payoff function to stochastic games
There are 4 ways of defining a player's beliefs:
Correlation within states – C(S)
All opponents play the same kind of strategy: either all follow the prior or all follow σ
Absence of correlation within states – I(S)
Each opponent independently follows either the prior or σ
Correlation across time – C(T)
Each player plays the same strategy (prior or σ) across time
Absence of correlation across time – I(T)
Over time, each player can change its strategy (between prior and σ)
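Alternatives (cont.)
Alternative 1: C(S), I(T)
$V^i_{C(S),I(T)}(t; \omega, \sigma) = (1-t)\, u^i(\omega, (p^{-i}, \sigma^i)) + t\, u^i(\omega, \sigma) + \delta \sum_{\omega' \in \Omega} \big[(1-t)\, \pi(\omega' \mid \omega, (p^{-i}, \sigma^i)) + t\, \pi(\omega' \mid \omega, \sigma)\big]\, V^i_{C(S),I(T)}(t; \omega', \sigma)$
Alternative 2: C(S), C(T)
$V^i_{C(S),C(T)}(t; \omega, \sigma) = (1-t)\, U^i(\omega, (p^{-i}, \sigma^i)) + t\, U^i(\omega, \sigma)$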
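Alternatives 1 and 2 differ in where the interpolation happens: Alternative 1 mixes stage payoffs and transitions in every period, while Alternative 2 mixes the two total discounted payoffs once. A minimal Python sketch for player i and a fixed σ, assuming the inputs are precomputed: r_p[ω] = u^i(ω, (p^{-i}, σ^i)), P_p[ω][ω'] = π(ω' | ω, (p^{-i}, σ^i)), and r_s, P_s the same quantities under σ (all names hypothetical):

```python
def evaluate(r, P, delta, tol=1e-12):
    """Fixed point of U = r + delta * P U, i.e. total expected discounted payoff."""
    n = len(r)
    U = [0.0] * n
    while True:
        new = [r[w] + delta * sum(P[w][v] * U[v] for v in range(n)) for w in range(n)]
        if max(abs(a - b) for a, b in zip(new, U)) < tol:
            return new
        U = new


def mix(t, a, b):
    """(1 - t) * a + t * b, applied elementwise to vectors or row by row to matrices."""
    if isinstance(a[0], list):
        return [mix(t, ra, rb) for ra, rb in zip(a, b)]
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def v_alt1(t, r_p, P_p, r_s, P_s, delta):
    """C(S),I(T): mix stage payoffs and transitions, then evaluate."""
    return evaluate(mix(t, r_p, r_s), mix(t, P_p, P_s), delta)


def v_alt2(t, r_p, P_p, r_s, P_s, delta):
    """C(S),C(T): evaluate both profiles, then mix the total payoffs."""
    return mix(t, evaluate(r_p, P_p, delta), evaluate(r_s, P_s, delta))
```

In a two-state example where the prior profile keeps state 0 forever at payoff 0, σ moves to an absorbing state 1 paying 1 per period, and δ = 0.5, the two alternatives give 2/3 and 1/2 in state 0 at t = 0.5, while they coincide at t = 0 and t = 1.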
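Alternatives (cont.)
Alternative 3: I(S), I(T)
$V^i_{I(S),I(T)}(t; \omega, \sigma) = U^i(\omega, ((1-t)\, p^{-i} + t\, \sigma^{-i}, \sigma^i))$, where each opponent $j \neq i$ plays $(1-t)\, p^j + t\, \sigma^j$; recursively,
$V^i_{I(S),I(T)}(t; \omega, \sigma) = u^i(\omega, ((1-t)\, p^{-i} + t\, \sigma^{-i}, \sigma^i)) + \delta \sum_{\omega' \in \Omega} \pi(\omega' \mid \omega, ((1-t)\, p^{-i} + t\, \sigma^{-i}, \sigma^i))\, V^i_{I(S),I(T)}(t; \omega', \sigma)$
Alternative 4: I(S), C(T)
$V^i_{I(S),C(T)}(t; \omega, \sigma) = \sum_{S \subseteq N \setminus \{i\}} (1-t)^{|S|}\, t^{|N|-|S|-1}\, U^i(\omega, (p^S, \sigma^{N \setminus S}))$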
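The weights in Alternative 4 are binomial: each opponent independently sticks to its prior with probability 1 − t for the whole game, so the weights over all subsets S sum to ((1 − t) + t)^{|N|−1} = 1. A minimal Python sketch; `U_of` is a hypothetical callable returning U^i(ω, (p^S, σ^{N∖S})) for a given frozenset S of opponents, however those totals were computed:

```python
from itertools import combinations


def v_alt4(t, i, players, U_of):
    """I(S),C(T): sum over subsets S of the opponents of player i of
    (1 - t)^|S| * t^(|N| - |S| - 1) * U_of(S),
    where the opponents in S play their prior and all other players play sigma."""
    opponents = [j for j in players if j != i]
    n = len(players)
    total = 0.0
    for k in range(len(opponents) + 1):
        for S in combinations(opponents, k):
            total += (1 - t) ** k * t ** (n - k - 1) * U_of(frozenset(S))
    return total
```

With three players and t = 0.3, the empty set (everyone plays σ) gets weight 0.09 and the full set of opponents (both play the prior) gets 0.49; enumerating subsets is exponential in |N|, which is fine for the small examples considered here.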
Example 1 – C(S) versus I(S)
Equilibria: pure: $(s_1; s_2; s_3)$ and $(s_1'; s_2'; s_3')$, and mixed: $(\sqrt{2} - 1,\ 2 - \sqrt{2})$
Prior: $p = ((\tfrac{1}{6}, \tfrac{5}{6}), (\tfrac{1}{2}, \tfrac{1}{2}), (\tfrac{2}{3}, \tfrac{1}{3}))$
Starting point: $(0, (s_1; s_2'; s_3'))$
Ex1: C(S) solution
Ex1: C(S) calculations
(s1,s2,s3;1):
Player 1 expects player 2 to play: (1/2(1-t)+t, 1/2(1-t))
Player 1 expects player 3 to play: (2/3(1-t)+t, 1/3(1-t))
Expected payoff: (1/2(1-t)+t)(2/3(1-t)+t)*2 = 1/3(1+t)(2+t)
(s1,s2,s3;2):
Player 2 expects player 1 to play: (1/6(1-t)+t, 5/6(1-t))
Player 2 expects player 3 to play: (2/3(1-t)+t, 1/3(1-t))
Expected payoff: (1/6(1-t)+t)(2/3(1-t)+t)*2 = 1/9(1+5t)(2+t)
(s1,s2’,s3;1):
Player 1 expects player 2 to play: (1/2(1-t), 1/2(1-t)+t)
Player 1 expects player 3 to play: (2/3(1-t)+t, 1/3(1-t))
Expected payoff: (1/2(1-t))(2/3(1-t)+t)*2 = 1/3(1-t)(2+t)
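The arithmetic on this slide can be checked with exact fractions. A minimal Python sketch, assuming (as the slide's formulas imply) that player i, having chosen s_i, receives 2 exactly when both opponents play s_j and 0 otherwise, and that each opponent j independently plays s_j with probability (1 − t)·prior_j + t·[σ^j = s_j]. The full payoff table of Example 1 is not reproduced in the text, and the names below are hypothetical:

```python
from fractions import Fraction as F

# Prior probability that player j plays s_j (the prior is ((1/6, 5/6), (1/2, 1/2), (2/3, 1/3))).
PRIOR = {1: F(1, 6), 2: F(1, 2), 3: F(2, 3)}


def prob_s(j, plays_s, t):
    """Opponent j's probability of playing s_j: (1 - t) * prior + t * [sigma^j = s_j]."""
    return (1 - t) * PRIOR[j] + (t if plays_s else 0)


def payoff_product(i, sigma, t, reward=2):
    """reward * product over opponents j of P(j plays s_j), as on the slide.
    sigma[j] is True when sigma^j = s_j."""
    result = F(reward)
    for j in (1, 2, 3):
        if j != i:
            result *= prob_s(j, sigma[j], t)
    return result
```

At t = 1/4, for instance, player 1's payoff at (s1,s2,s3) is 2 · 5/8 · 3/4 = 15/16, which matches 1/3(1+t)(2+t).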
Ex1: C(S) trajectory
Profiles labelled in the figure: $(s_1; s_2'; s_3')$, $(s_1; s_2; s_3)$, $(s_1; s_2; s_3')$
Ex1: I(S) solution
Ex1: I(S) calculations
(s1,s2,s3;1):
Player 1 expects players 2&3 to play s2&s3: t
Player 1 expects players 2&3 to play prior(s2&s3): (1-t)
Expected payoff: ((1-t)(1/2)(2/3)+t)*2 = 2/3(1-t)+2t
(s1,s2,s3;2):
Player 2 expects players 1&3 to play s1&s3: t
Player 2 expects players 1&3 to play prior(s1&s3): (1-t)
Expected payoff: ((1-t)(1/6)(2/3)+t)*2 = 2/9(1-t)+2t
(s1,s2’,s3;1):
Player 1 expects players 2&3 to play s2’&s3: t (but the payoff is 0)
Player 1 expects players 2&3 to play prior(s2&s3): (1-t)
Expected payoff: ((1-t)(1/2)(2/3))*2 = 2/3(1-t)
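This slide's arithmetic can be checked with exact fractions. Here the expectation interpolates between two whole profiles: with probability 1 − t the opponents jointly follow the prior, with probability t they jointly follow σ. A minimal Python sketch, assuming (as the slide's formulas imply) that player i, having chosen s_i, gets 2 exactly when every opponent plays s_j and 0 otherwise; the full payoff table is not reproduced in the text, and the names below are hypothetical:

```python
from fractions import Fraction as F

# Prior probability that player j plays s_j (the prior is ((1/6, 5/6), (1/2, 1/2), (2/3, 1/3))).
PRIOR = {1: F(1, 6), 2: F(1, 2), 3: F(2, 3)}


def payoff_interpolated(i, sigma, t, reward=2):
    """(1 - t) * payoff against the opponents' prior + t * payoff against sigma^{-i}.
    sigma[j] is True when sigma^j = s_j."""
    opponents = [j for j in (1, 2, 3) if j != i]
    against_prior = F(reward)
    for j in opponents:
        against_prior *= PRIOR[j]
    against_sigma = reward if all(sigma[j] for j in opponents) else 0
    return (1 - t) * against_prior + t * against_sigma
```

The payoff is linear in t here, whereas a formula that mixes each opponent's probabilities separately is quadratic in t for three players.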
Ex1: I(S) trajectory
Profiles labelled in the figure: $(s_1'; s_2'; s_3')$, $(s_1; s_2'; s_3')$
Example 2 – C(T) versus I(T)
Equilibria: pure: $(s_1; s_2)$ and $(s_1'; s_2')$, and mixed: $(4 - 2\sqrt{3},\ 2\sqrt{3} - 3)$
Prior: $p = ((\tfrac{1}{2}, \tfrac{1}{2}), (\tfrac{2}{3}, \tfrac{1}{3}))$
Starting point: $(0, (s_1; s_2'))$
[Figures: payoffs and transition probabilities]
Ex2: C(T) solution
[Figures: transition probabilities for player 1 and for player 2]
Ex2: C(T) trajectory
Profiles labelled in the figure: $(s_1'; s_2')$, $(s_1; s_2')$
Ex2: I(T) trajectory
Profiles labelled in the figure: $(s_1; s_2)$, $(s_1; s_2')$
Summary
Definition of stochastic games
The linear tracing procedure was presented
Some extensions were shown with examples
C(S),I(T) is probably the best extension for calculating strategies
Reference
“Equilibrium Selection in Stochastic Games” by P. Jean-Jacques Herings and Ronald J.A.P. Peeters
Questions
?