
Game Theory
School on Systems and Control, IIT Kanpur
Ankur A. Kulkarni
Systems and Control Engineering
Indian Institute of Technology Bombay
[email protected]
Aug 7, 2015
Ankur A. Kulkarni (IIT Bombay)
Game Theory
Aug 7, 2015
1 / 42
Game theory: examples
Prisoner’s dilemma
Consider two prisoners A and B, each confined in a solitary room, who are given a choice
to either testify or maintain their silence, with the following consequences:
If A and B both testify, each of them serves 2 years in prison
If one of them opts to remain silent but the other testifies, then the prisoner who
testified will be set free and the one who opted to remain silent will serve 3 years in
prison
If both of them remain silent, then each of them will serve 1 year in prison
                 silent    testify
      silent     (1,1)     (3,0)
      testify    (0,3)     (2,2)

(row, column); both minimizing
Question: What will each player do? What must each player do?
Main observations
Each player is faced with an optimization problem
Decisions have to be made without knowledge of the other's decision
But the outcome depends not only on what one player does, but also on what the other
does
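These observations can be made concrete with a short sketch (Python; the arrays and function names are illustrative, not from the lecture). The dilemma is visible in the fact that testifying is the better reply to either choice of the other prisoner:

```python
# Cost matrices for the prisoner's dilemma (years in prison; both minimizing).
# cost_row[i][j] = row player's cost when row plays i and column plays j;
# action 0 = silent, action 1 = testify.
cost_row = [[1, 3],
            [0, 2]]
cost_col = [[1, 0],
            [3, 2]]

def best_reply_row(j):
    """Row player's optimal action against the column player's action j."""
    costs = [cost_row[i][j] for i in range(2)]
    return costs.index(min(costs))

# Whatever the other prisoner does, testifying is the better reply:
print(best_reply_row(0), best_reply_row(1))   # → 1 1 (testify either way)
```

Since testify is a best reply to both of the other player's actions, it is a dominant strategy here.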
Game theory: examples
Stag hunt/Hunter’s dilemma
The game consists of two hunters who can choose to hunt either deer or rabbits, with the
following rules:
If A and B both choose to hunt deer, each of them gets 2
If one of them opts to hunt deer but the other chooses to hunt rabbits, then the
hunter who decides to hunt rabbits will get 1, while the one who went for deer will
get nothing!
If both of them choose rabbits, then each of them will receive half a rabbit

                 deer      rabbit
      deer       (2,2)     (0,1)
      rabbit     (1,0)     (0.5,0.5)

(row, column); both maximizing
Question: What will each player do? What must each player do?
What is Game Theory?
Game theory
When two or more rational decision makers (agents)
interact, one would like to have a mathematical
framework for reasoning about such situations, under
certain assumptions. Game theory is the study of such
interactions involving strategic decision making. A game
comprises
A set of players – {1, 2, . . . , N}
For each player i, a set of strategies Si
An objective function πi : S −→ R that he is trying
to minimize, or a payoff/utility function that he is trying
to maximize, where S = S1 × . . . × SN
What are the players, strategies and utility functions in the above games?
What is Game Theory and what it is not
What it is
Game theory develops a framework for logically reasoning about strategic
interactions
It attempts to answer what one can say would be the logical outcomes of a game
Game theory develops and studies solution concepts, which are concepts that can
logically be regarded as outcomes of games
The point of view taken is that of an observer of the game
It can be used for predicting outcomes. In many applications it has provided
surprisingly good predictions
It can be used for tactical/strategic advice/prescriptions.
It can be used to explain, alter or induce behaviour
What it is not
Theory of human behaviour, psychology, emotion, trust etc
Secret code to win games
Classification of games
The first main axis along which games are classified is based on communication
requirements. Two broad categories:
Cooperative Game: Any amount of communication allowed between players involved
in the game. Players can have binding agreements between them.
Noncooperative Game: No communication between the players involved. No binding
agreement between the players.
Further classifications. Nature of payoff:
Zero sum games
Nonzero sum games
Aspect of time and information:
Static games: Decisions are made in one shot, without the knowledge of decisions of
other players
Dynamic games: Decisions are made sequentially, with some knowledge of decisions
of other players
Quality of information:
Imperfect information, incomplete information, asymmetric information
Game theory in control theory
Game theory covers competitive scenarios as well as cooperative scenarios
Team problems: Games where all players have identical objectives
Competitive
Security: design of engineering systems, e.g., the power grid, now has to consider the
additional aspect of security in addition to usual considerations such as stability,
reliability etc. Game theory provides a natural framework.
Engineering and economics: design of engineering systems, e.g., the internet, is now
intertwined with their economics. Again, game theory applies.
Cooperative
Decentralized control problems can be thought of as team problems
Multiagent, distributed systems are becoming common. These can be modeled as
teams where individual components are players
How does one apply game theory
Game theory applies whenever a strategic decision is involved, i.e., when there are
multiple agents and payoffs of the agents are affected not just by their own decisions
but also by those of others
Game theory, like optimization, is an omnibus theory that has specialized tools that
have been sharpened for various situations
Data requirements
Game theory presumes players know the rules of the game, their options, which
players are there in the game and their options, and the payoffs for all players.
Game theory assumes each player is rational: i.e., consistent in seeking the highest
payoff
Often the conclusions drawn from game theory are based only on the ordinal
relationships between payoffs, not the exact values
Game theory requires knowledge of the time evolution of the information – i.e., who
would know what in various situations
How does one apply game theory
Process of applying game theory
Identify players, identify decisions, identify payoffs, identify rules
Can agreements on cooperation be enforced? yes: cooperative, no: non-cooperative
Can the rules be altered? If so, one is really playing a pregame before
the actual game, in which the rules of the actual game are the strategies
Are decisions sequential or simultaneous? Simultaneous: e.g., prisoner’s dilemma
=⇒ static game theory. In sequential games one has additional information before
making a move and one has to look ahead in order to act now =⇒ dynamic
games
Are objectives in conflict or is there some commonality of interests? Diametrically
opposite interests =⇒ zero-sum games; some commonality =⇒ non-zero-sum games
Is the game played once, or repeatedly (repeated games)?
Do players have full or partial information? static or dynamic games
Add as much contextual information as possible
Select the appropriate kind and analyze using tools available for it
Analyzing noncooperative games: Nash equilibrium
Noncooperative games
Players have no scope for communication or discussion
There is no mechanism available for letting the players get into
binding agreements
Decisions are made based on the existing incentives only
Notation
xi ∈ Si = strategy of player i
x = (x1 , . . . , xN ) = strategy profile
x−i = (x1 , . . . , xi−1 , xi+1 , . . . , xN ) = strategies of all players except i
(x̄i , x−i ) = (x1 , . . . , xi−1 , x̄i , xi+1 , . . . , xN )
Nash equilibrium
Profile of strategies x ∗ = (x1∗ , . . . , xN∗ ) ∈ S such that no player has an incentive to
deviate, i.e., for all i = 1, . . . , N

πi (x ∗ ) ≤ πi (xi , x−i∗ )      ∀ xi ∈ Si
Nash equilibrium
It is a profile of strategies x ∗ where if any player i unilaterally shifts from it, he is
worse off
Note that one does not take into account the effect of this deviation on other
players’ strategies. Strategies of other players are held fixed
Nash equilibrium for Prisoner’s dilemma
(testify, testify)
Why does the Nash equilibrium make sense?
Noncooperative games mean no communication is allowed
If prisoners could discuss and get into binding agreements, they would agree to both
stay silent
In the absence of a binding agreement, each player has an incentive to deviate from
(silent, silent) or any other point
Nash equilibria for Hunter’s dilemma
(deer, deer) and (rabbit, rabbit)
Nash equilibrium
Justifications and interpretations
Stability: A point that is not stable against unilateral deviation cannot be
regarded as an outcome
Self-fulfilling agreement: if the players could communicate and decide to play Nash,
the decision will hold, since none of the players will have an incentive to deviate
Settling point of an adjustment process
Nature: Nash equilibrium is seen in nature, e.g., evolutionary biology
Can be deduced, in some cases
Caveats
Nash equilibrium cannot be derived. It is a concept, it can only be defined
Nash equilibrium as described applies to static games also. There is really no
adjustment process involved
A Nash equilibrium may not always exist and if it exists, may not be unique
Finding a Nash equilibrium
Systematically check all the possible outcomes
Computational schemes – more later
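For a finite game the first approach can be sketched directly (Python; the array and function names are illustrative). The payoffs below encode the prisoner's dilemma from earlier, both players minimizing:

```python
from itertools import product

# Sketch: exhaustively check every pure strategy profile of a finite
# two-player game for the Nash property (no profitable unilateral deviation).
# Action 0 = silent, action 1 = testify.
R = [[1, 3], [0, 2]]   # row player's cost, R[i][j]
C = [[1, 0], [3, 2]]   # column player's cost, C[i][j]

def pure_nash(R, C):
    """All profiles (i, j) from which no unilateral deviation helps."""
    eqa = []
    for i, j in product(range(len(R)), range(len(R[0]))):
        row_ok = R[i][j] <= min(R[k][j] for k in range(len(R)))     # row cannot improve against j
        col_ok = C[i][j] <= min(C[i][l] for l in range(len(C[0])))  # column cannot improve against i
        if row_ok and col_ok:
            eqa.append((i, j))
    return eqa

print(pure_nash(R, C))   # → [(1, 1)], i.e., (testify, testify)
```

Exhaustive checking works for small games; the number of profiles grows as the product of the strategy set sizes, which is why computational schemes matter.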
Example: Nash-Cournot equilibrium
N firms want to decide production levels q1 , . . . , qN of the same
good, qi ∈ [0, ∞). Cost is ci (qi )
The market clearing price p(q) is not exogenous but depends on
q = (q1 , . . . , qN )
The firms want to maximize profit
ui (q1 , . . . , qN ) = p(q)qi − ci (qi ), i.e., each firm solves an optimization problem
max         p(q)qi − ci (qi )
subject to  qi ≥ 0

Suppose p(q) = p(Σi qi ). Then KKT =⇒ for all i

−p′(q ∗ )qi∗ − p(q ∗ ) + ci′(qi∗ ) − µ∗i = 0,
µ∗i ≥ 0,   qi∗ ≥ 0,   µ∗i qi∗ = 0.

A NE requires a simultaneous solution of the N KKT conditions
If p(q) = 1 − Σi qi and ci (qi ) = cqi (0 ≤ c ≤ 1), and N = 2, then q1∗ = q2∗ = (1 − c)/3
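The closed form above can be checked numerically (a Python sketch under the stated linear model; function names are illustrative). Each firm's best response follows from the first-order condition, and iterating best responses converges to the equilibrium:

```python
# Sketch: Nash-Cournot equilibrium of the linear duopoly
# p(q) = 1 - q1 - q2, ci(qi) = c*qi, by iterating best responses.
# From the first-order condition, the best response to the rival's output is
#   argmax_{q >= 0} (1 - q - q_other)*q - c*q = max(0, (1 - c - q_other)/2).

def best_response(q_other, c):
    return max(0.0, (1.0 - c - q_other) / 2.0)

def cournot_equilibrium(c, iters=100):
    q1 = q2 = 0.0
    for _ in range(iters):
        # Jacobi-style update: both firms respond to the rival's last output.
        q1, q2 = best_response(q2, c), best_response(q1, c)
    return q1, q2

c = 0.1
q1, q2 = cournot_equilibrium(c)
print(q1, q2, (1 - c) / 3)   # all three ≈ 0.3
```

Best-response iteration is one simple adjustment process; here it converges because the best-response map contracts by a factor 1/2 in this model.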
Reaction curves
U i = set of pure strategies of Pi ; J i (u i , u −i ) = objective of Pi when Pi plays u i and the
other players play u −i (minimizing players)
Best response or reaction curve of player i against u −i :   Ri (u −i ) = argmin u i ∈U i J i (u i , u −i )
For two-player games, the Nash equilibria are the intersections of the reaction curves
Refinements of the Nash equilibrium
Nash equilibrium is often non-unique and depending on the situation some NE can
be more meaningful than others
Security Dilemma
Consider the game in which the USA and USSR have to decide whether or not to have
nuclear weapons. The payoff matrix is (both maximizing):

                       USSR
                  Yes        No
  USA    Yes     (2,2)     (3,-1)
         No     (-1,3)     (4,4)

Two Nash equilibria: (Yes, Yes) = (2, 2) and (No, No) = (4, 4). In terms of payoff,
playing (No, No) is preferable for both players. Such a NE is Pareto dominant.
But (No, No) is also risky, since if the other player changes strategy to Yes, the player
playing No gets -1. Thus (Yes, Yes) is a “safer” equilibrium; it is a risk dominant
Nash equilibrium
Refinements of the Nash equilibrium
Refinements of the NE
A refinement of the NE is a particular NE with additional properties
Many other refinements of the NE are known, e.g., trembling hand perfect
equilibrium, proper equilibrium, subgame perfect equilibrium, etc
Applying the Nash equilibrium
In order to get good predictions/prescriptions, it may not be enough to only consider the
NE. To choose out of the many possible NEs, one must apply the right contextual
refinement
Some Nash equilibria may be mathematical quirks of the data and have no meaning per se
It may be worth perturbing the data to see which equilibria survive
Zero-sum games
Zero-sum games
A two-player game with objectives π1 , π2 is a zero-sum game if π1 = −π2 .
Note: only two players! Objectives polar opposites of each other
For finitely many strategies, we represent using a single matrix, say, A: rows are
strategies of one player, columns of the other player
      (a, −a)   (b, −b)                 a   b
      (c, −c)   (d, −d)       =⇒       c   d
Row player is trying to minimize the value in the matrix, column player trying to
maximize
Security strategy
Row i ∗ is a security strategy for the row player if V̄(A) ≜ maxj ai ∗ j ≤ maxj aij for all i
Column j ∗ is a security strategy for the column player if V(A) ≜ mini aij ∗ ≥ mini aij for all j
Upper value V̄(A) = mini maxj aij ; lower value V(A) = maxj mini aij
Saddle point
For any game A, V̄(A) ≥ V(A), but the inequality is often strict:

                  P2
          1    3    2   −2
   P1     0    3    2    1
         −2   −1    0    1

V̄(A) = ?    V(A) = ?
Pair of strategies (i ∗ , j ∗ ) is a saddle point if ai ∗ ,j ∗ = V (A) = V (A)
At a saddle point:
ai ∗ ,j ≤ ai ∗ ,j ∗ ≤ ai,j ∗
In short, a saddle point is a Nash equilibrium of a zero-sum game
A zero-sum game need not have a saddle point
If there is a saddle point, then all saddle points have the same value. If (i1 , j1 ) and
(i2 , j2 ) are saddle points, (i1 , j2 ) and (i2 , j1 ) are also saddle points
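The upper and lower values of the example can be computed mechanically (Python; the matrix literal below is an assumed reconstruction of the slide's example, so treat the specific entries as illustrative):

```python
# Upper and lower values of a matrix game (row player minimizing,
# column player maximizing). The matrix is the slide's example as
# reconstructed here; the specific entries are illustrative.
A = [[ 1,  3,  2, -2],
     [ 0,  3,  2,  1],
     [-2, -1,  0,  1]]

V_upper = min(max(row) for row in A)         # V̄(A) = min_i max_j a_ij
V_lower = max(min(col) for col in zip(*A))   # V(A) = max_j min_i a_ij

print(V_upper, V_lower)   # → 1 0: strict inequality, so no pure saddle point
```

The inequality V̄(A) ≥ V(A) always holds; equality is exactly the saddle-point case.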
Because saddle points are interchangeable, players of a zero-sum game need not
coordinate on a particular equilibrium. Not so in general games:

                 Husband                              Wife
            cricket   movies                    cricket   movies
  cricket      2         0           cricket       1         0
  movies       0         1           movies        0         2
Mixed strategies
Rather than let players pick specific rows/columns, we now let them randomly
choose them
Strategies now are the probabilities, in other words the distributions
Mixed strategies
A mixed strategy for the row player is a probability distribution y over the set of rows of
A ∈ Rm×n , i.e., y ∈ Rm such that y ≥ 0 and 1⊤y = 1. Similarly, a mixed strategy for the
column player is a probability distribution over the set of columns of A ∈ Rm×n , i.e.,
z ∈ Rn such that z ≥ 0 and 1⊤z = 1.
Column player chooses z to maximize Σi,j aij yi zj = y⊤Az. Row player chooses y to
minimize y⊤Az. Rows/columns are called pure strategies.
Interpretation
Deliberate randomization
Division of resources
Diversification of portfolio
Minimax theorem
Although there may be no saddle point in pure strategies, there is
always a saddle point if one allows mixed strategies
There exist y ∗ , z ∗ such that

(y ∗ )⊤Az ≤ (y ∗ )⊤Az ∗ ≤ y ⊤Az ∗       ∀ y , z

(y ∗ )⊤Az ∗ = miny maxz y ⊤Az = maxz miny y ⊤Az
All saddle points have same value
Saddle points are comprised of security strategies
If (y 1 , z 1 ) and (y 2 , z 2 ) are saddle points, then so are (y 1 , z 2 ), (y 2 , z 1 )
Minimax theorem is a major milestone in the theory of games
It was later observed that it is equivalent to linear programming duality, which came
many years later
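For a 2×2 zero-sum game without a pure saddle point, the mixed saddle point has a closed form, obtained by making each player indifferent between the opponent's pure strategies. A minimal sketch (Python; the matching-pennies matrix is an assumed illustration, not from the slides):

```python
# Mixed saddle point of a 2x2 zero-sum game A = [[a, b], [c, d]] with no
# pure saddle point (row player minimizing, column player maximizing).
# Equalizing the column player's payoffs across columns gives the row
# player's mixture; equalizing the row player's payoffs gives the column's.
def mixed_saddle_2x2(a, b, c, d):
    D = a - b - c + d            # nonzero in the no-pure-saddle case assumed here
    p = (d - c) / D              # probability the row player puts on row 1
    q = (d - b) / D              # probability the column player puts on column 1
    v = (a * d - b * c) / D      # value of the game
    return p, q, v

# Matching pennies: A = [[1, -1], [-1, 1]]
p, q, v = mixed_saddle_2x2(1, -1, -1, 1)
print(p, q, v)   # → 0.5 0.5 0.0
```

The uniform mixture and zero value for matching pennies match the intuition that neither player can exploit a fully random opponent.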
Non-zero sum games
Let N = {1, 2, . . . , N} be the set of players
Mi is the finite set of pure strategies for each i ∈ N
a^i_{x1 ,x2 ,...,xN} = payoff for player i, denoted as Pi , when the strategies
chosen by P1 , P2 , · · · , PN are x1 , x2 , · · · , xN respectively
Yi : set of probability distributions on Mi for Pi , equivalently
mixed strategies
Player i chooses y i to minimize

πi (y 1 , . . . , y N ) = Σ_{j1 ∈M1 ,j2 ∈M2 ,...,jN ∈MN} a^i_{j1 ,...,jN} y^1_{j1} · · · y^N_{jN}
Although there may not be a Nash equilibrium in pure strategies, there always is a
Nash equilibrium in mixed strategies
This is a significant generalization of von Neumann’s minimax theorem
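The indifference idea used for zero-sum games also computes mixed equilibria of small non-zero-sum games. A sketch (Python) for a 2×2 coordination game of the cricket/movies type, written here with both players maximizing (the payoff matrices are illustrative):

```python
# Sketch: mixed Nash equilibrium of a 2x2 bimatrix game via the indifference
# principle - each player mixes so that the *other* player is indifferent
# between his two pure strategies. Payoffs are illustrative coordination-game
# values (both players maximizing).
W = [[2, 0], [0, 1]]   # row player's payoffs, W[i][j]
H = [[1, 0], [0, 2]]   # column player's payoffs, H[i][j]

# Row mixes p on action 0 so that the column player is indifferent:
#   p*H[0][0] + (1-p)*H[1][0] = p*H[0][1] + (1-p)*H[1][1]
p = (H[1][1] - H[1][0]) / (H[0][0] - H[1][0] - H[0][1] + H[1][1])
# Column mixes q on action 0 so that the row player is indifferent:
q = (W[1][1] - W[0][1]) / (W[0][0] - W[0][1] - W[1][0] + W[1][1])

print(p, q)   # → 2/3 and 1/3
```

This interior mixed equilibrium exists alongside the two pure coordination equilibria, as the existence theorem guarantees.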
Continuous kernel games
Results on existence of Nash equilibria can be extended beyond matrix games and
mixed strategies.
In continuous kernel games, players have a continuum of strategies. e.g., velocity of
a robot, price of a commodity etc.
A mixed strategy in such games is a probability measure on the space of pure
strategies
Theorem (Existence of equilibria in pure strategies)
Let Si ⊆ Rmi be convex and compact for each i ∈ N . For each i ∈ N , let
ui : ∏j∈N Sj → R be continuous and such that ui (xi , x−i ) is convex in xi for each fixed
x−i . Then there exists a Nash equilibrium (in pure strategies).
Theorem (Existence of equilibria in mixed strategies)
Let Si ⊆ Rmi be compact (not necessarily convex) for each i ∈ N and let ui be
continuous. Then there exists an equilibrium in mixed strategies.
Further extensions: Si are compact Hausdorff spaces (Glicksberg 1952)
Dynamic games
All the above games were simultaneous move games
Example
P1 makes the first move, starting at node x. If P1 plays L1 , the game ends at node z with
payoffs (0, 2). If P1 plays R1 , P2 has to move next at node y: L2 leads to payoffs (−1, −1)
and R2 to (1, 1). (Payoffs are (P1 , P2 ); both players maximizing.)
What are the strategies?
What are the Nash equilibria?
For P1 : L1 or R1
For P2 : “do nothing” if P1 has played L1 ; L2 or R2 if P1 has played R1
Strategies for P2 are a function of what has happened previously
This game has two Nash equilibria: (L1 , L2 ) and (R1 , R2 )
(L1 , L2 ) is a threat equilibrium
(R1 , R2 ) is subgame perfect (more later)
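The subgame perfect outcome of this example can be found by folding the tree back, as in this minimal sketch (Python; payoffs are read as (P1, P2) with both players maximizing, consistent with the equilibria listed above):

```python
# Sketch of backward induction on the two-stage example above
# (both players maximizing; payoffs are (P1, P2)).
# P1 moves at x: L1 ends the game at (0, 2); R1 hands the move to P2 at y,
# where L2 yields (-1, -1) and R2 yields (1, 1).

def backward_induction():
    # Step 1: at node y, P2 picks the action maximizing his own payoff.
    y_children = {"L2": (-1, -1), "R2": (1, 1)}
    p2_action = max(y_children, key=lambda a: y_children[a][1])
    y_value = y_children[p2_action]
    # Step 2: at node x, P1 anticipates P2's choice and maximizes his payoff.
    x_children = {"L1": (0, 2), "R1": y_value}
    p1_action = max(x_children, key=lambda a: x_children[a][0])
    return p1_action, p2_action, x_children[p1_action]

print(backward_induction())   # → ('R1', 'R2', (1, 1))
```

Note that this procedure finds only the subgame perfect equilibrium; the threat equilibrium (L1, L2) never appears, since P2's threat is not optimal within the subgame at y.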
Dynamic games
Suppose that P2 knows whether P1 has played L or not, but
cannot distinguish between M and R.
The set of possible strategies for P1 is {L, M, R}. Strategies of P2 are a function of L and
Lc . Since P2 cannot distinguish between M and R, his strategies are the same for both
cases:

γ12 = ( L if P1 plays L;  L if P1 plays M or R )     γ22 = ( R if P1 plays L;  R if P1 plays M or R )
γ32 = ( L if P1 plays L;  R if P1 plays M or R )     γ42 = ( R if P1 plays L;  L if P1 plays M or R )
Extensive form dynamic games
An extensive-form dynamic game for N players is a tree with the following properties:
A specific vertex indicating the starting point
A payoff for each player at each terminal node: J 1 (node), . . . , J N (node)
A partition of the nodes of the tree into N player sets
A subpartition of each player set into information sets {ηji }, i ∈ N such that the
same number of branches emanate from each node belonging to the same
information set and no node follows another node in the same information set
[Figure: a game tree with nodes partitioned among players P1 –P4 , branches labeled
a, b, c, d, e, f′ , g′ , h′ , dashed information sets, and terminal nodes m, n, o]
Strategies in extensive form dynamic games
In a dynamic game, strategies are not merely “actions”; they are a complete plan of
actions for each possible scenario
Thus they are functions of the information
Denote by I i the set of all information sets of Pi . For any η i ∈ I i , let Uηi i be the set
of actions available to Pi at η i . A strategy for Pi is a function γ i : I i → U i , where
U i = ∪η i ∈I i Uηi i , such that

γ i (η i ) ∈ Uηi i      ∀ η i ∈ I i
Optimization v/s Optimal control
This is exactly the distinction between optimization and optimal control. In optimization
the problem is to find a vector. In optimal control, the problem is to find a function or a
control law.
The sets I i , i ∈ N determine the information structure of the game
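The "strategy as a function of the information" idea can be sketched in a few lines (Python; the information-set and action names are illustrative): a strategy assigns to every information set one of the actions available there.

```python
# Sketch: a strategy is a function of the information set, not a single
# action. Information-set and action names below are illustrative.
U = {"eta1": ["L", "R"], "eta2": ["L", "M", "R"]}   # actions available at each info set

def is_valid_strategy(gamma, U):
    """gamma maps every information set to one of its available actions."""
    return set(gamma) == set(U) and all(gamma[e] in U[e] for e in U)

gamma = {"eta1": "R", "eta2": "M"}
print(is_valid_strategy(gamma, U))   # → True
```

This is exactly the control-law viewpoint: the object being chosen is the mapping itself, not a single move.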
Nash equilibrium and the normal form
Each choice of strategies (functions γ 1 , . . . , γ N ) results in a specific path on the tree
and hence a specific payoff for each player (Why?). Denote this as J i (γ 1 , . . . , γ N )
A dynamic game can be considered to be a static game, but in the space of γ’s
Nash equilibrium
γ 1∗ , . . . , γ N∗ such that for all i ∈ N ,

J i (γ 1∗ , . . . , γ N∗ ) ≤ J i (γ 1∗ , . . . , γ (i−1)∗ , γ i , γ (i+1)∗ , . . . , γ N∗ )      ∀ γ i
In an extensive form game one can write an equivalent finite strategy game where
strategies are the γ’s (why?). This is called the normal form
The Nash equilibrium above is that of this normal form
Classification of games based on information structures
A Nash equilibrium of a dynamic game can be found using the normal form; but in
practice this is difficult
The tree structure suggests that one should be able to use a recursive argument –
something like dynamic programming. This is not always possible since information sets
may stretch across many branches. Thus a player cannot argue recursively since he does
know which branch he is at. The ease of recursively solving games depends on its
information structure.
In a simultaneous move or static game: each player has only one information set
A game of perfect information: each node is in a different information set, i.e., each
information set is a singleton (in other words, each player knows the sequence of
actions played up to any point)
In a single act game, each player can play at most once. Therefore, each path
starting from the root node intersects the player set of each player at most once.
Perfect information and backward induction
We can find an equilibrium of a game with perfect information by backward induction
(a multiplayer version of dynamic programming)
(assume players are seeking to minimize)
At node 4 the optimal action for P3 is to pick L3 , at node 5 R3 , at node 6 R3 and at node 7
L3 .
We are then left with a reduced game. At node 2, P2 will pick L2 , and at node 3, P2 will
pick R2 .
At the root node P1 will choose L1 . We are thus left with the strategy profile {L1 , L2 , L3 },
corresponding to which the payoff is (1, 2, 3) for P1 , P2 and P3 ; this profile is a Nash
equilibrium.
Subgame perfection
Does the backward induction process capture all equilibria? No!
It captures only those that are “recursively rational” – i.e., equilibria that are also
equilibria for every subgame
These equilibria are called subgame perfect
Example
P1 makes the first move, starting at node x. If P1 plays L1 , the game ends at node z with
payoffs (0, 2). If P1 plays R1 , P2 has to move next at node y: L2 leads to payoffs (−1, −1)
and R2 to (1, 1).
The only reliable way to find all equilibria is to analyze the normal form
Finding equilibria for general games involves decomposing the tree into subgames
that are either games of perfect information or static games
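To see this concretely, the example's normal form can be enumerated by brute force. The pairing of terminal payoffs with P2's actions is an assumption made here (L2 giving (−1, −1) and R2 giving (1, 1); L1 ends the game at (0, 2) regardless of P2's plan). With players minimizing, backward induction selects only one of the two pure equilibria below.

```python
# Pure-strategy Nash equilibria of the example's normal form, with both
# players minimizing. Payoff-to-action pairing is an assumption.

payoff = {
    ("L1", "L2"): (0, 2), ("L1", "R2"): (0, 2),
    ("R1", "L2"): (-1, -1), ("R1", "R2"): (1, 1),
}
S1, S2 = ("L1", "R1"), ("L2", "R2")

def is_nash(s1, s2):
    j1, j2 = payoff[(s1, s2)]
    ok1 = all(payoff[(t, s2)][0] >= j1 for t in S1)   # no profitable
    ok2 = all(payoff[(s1, t)][1] >= j2 for t in S2)   # unilateral deviation
    return ok1 and ok2

equilibria = [(s1, s2) for s1 in S1 for s2 in S2 if is_nash(s1, s2)]
print(equilibria)   # [('L1', 'R2'), ('R1', 'L2')]
```

Here (R1, L2) is the backward induction outcome, while (L1, R2) is sustained by P2's off-path choice R2 and is not subgame perfect.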
Existence of Nash equilibria in extensive form games
Perfect information
A game with perfect information always admits a Nash equilibrium in pure strategies
More generally, very little can be said
Mixed strategy equilibrium
Any finite game in extensive form always admits a Nash equilibrium in mixed strategies (Why?)
A mixed strategy is a randomization over pure strategies
Ideally we would like something more intuitive: at each information set, we would
like to pick an action randomly. Such a strategy is called a behavioural strategy.
Kuhn’s theorem
If each player in the game has perfect recall, there exists an equilibrium in behavioural
strategies
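Kuhn's theorem rests on the equivalence, under perfect recall, between behavioural and mixed strategies. A minimal sketch of one direction: a behavioural strategy, an independent lottery at each information set, induces a mixed strategy, namely a product distribution over pure strategies. The two information sets and probabilities below are made up for illustration.

```python
# A behavioural strategy assigns an action lottery to each information
# set; the induced mixed strategy multiplies these lotteries across
# information sets to get a distribution over pure strategies.
from itertools import product

behavioural = {                      # info set -> action probabilities
    "h1": {"L": 0.5, "R": 0.5},
    "h2": {"l": 0.25, "r": 0.75},
}

info_sets = sorted(behavioural)
mixed = {}
for actions in product(*(behavioural[h] for h in info_sets)):
    prob = 1.0
    for h, a in zip(info_sets, actions):
        prob *= behavioural[h][a]    # independence across info sets
    mixed[actions] = prob

print(mixed)   # product distribution over the four pure strategies
```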
Dynamic games in state space form
Control theory allows two kinds of models of systems: "input/output" models and
"state space" models. The extensive form is an "input/output" representation;
the following is a state space representation.
State space model of a game
Players N = {1, . . . , N}
Discrete time k ∈ K = {1, . . . , K}
Action u_k^i ∈ U_k^i at time k
State x_k ∈ X_k at time k, given initial state x_0 and state dynamics x_{k+1} = f_k(x_k, u_k^1, . . . , u_k^N)
Observations y_k^i = h_k^i(x_k) for player i ∈ N at time k ∈ K
Information η_k^i ⊆ {y_t^j | j ∈ N, t ≤ k} ∪ {u_t^j | j ∈ N, t ≤ k − 1}; let I_k^i be the ambient space of η_k^i
Strategies γ_k^i : I_k^i → U_k^i for each i ∈ N, k ∈ K; γ^i = (γ_k^i)_{k∈K}; the space of such mappings is Γ^i
Cost function J^i(γ^1, . . . , γ^N) for each i ∈ N
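The model lends itself to direct simulation: fix strategies, then roll the state forward. The dynamics and feedback strategies below are invented for illustration (N = 2, scalar state, feedback information η_k^i = {x_k}).

```python
# A minimal rollout of the state-space game model: two players,
# scalar state, feedback strategies gamma_k^i(x_k). All numbers
# here are illustrative assumptions, not from the slides.

K = 3
def f(k, x, u1, u2):             # dynamics x_{k+1} = f_k(x_k, u_k^1, u_k^2)
    return x + u1 - u2

gamma1 = lambda k, x: -0.5 * x   # feedback strategies: information = {x_k}
gamma2 = lambda k, x: 0.25 * x

x = 1.0                          # initial state x_0
trajectory = [x]
for k in range(K):
    u1, u2 = gamma1(k, x), gamma2(k, x)
    x = f(k, x, u1, u2)
    trajectory.append(x)
print(trajectory)   # [1.0, 0.25, 0.0625, 0.015625]
```

As the slides note, the strategy profile uniquely determines this trajectory, the state-space analogue of a path through the game tree.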
Dynamic games in state space: Nash equilibrium and information structures
Once again, the Nash equilibrium is given by γ^{1*}, . . . , γ^{N*} such that
J^i(γ^{1*}, . . . , γ^{N*}) ≤ J^i(γ^i; γ^{-i*})   ∀ γ^i ∈ Γ^i, for each i ∈ N.
As before, a choice of γ^1, . . . , γ^N uniquely determines a state trajectory (akin to a
path through the tree)
Usually one also makes the assumption that the cost is "stage additive":
J^i(γ^1, . . . , γ^N) ≡ Σ_{k∈K} g_k^i(x_k, u_k^1, . . . , u_k^N)
Again, whether we can apply dynamic programming or must instead solve the
problem in function space is determined by the information structure
One can also make the model more general by including a “terminal state”
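When strategy spaces are finite, the equilibrium inequality can be checked by brute force. Below, a one-stage scalar game (so the stage-additive cost has a single term) over an action grid; the dynamics, costs, and all numbers are assumptions made for the sketch.

```python
# Brute-force check of the Nash inequality J^i(u*) <= J^i(v; u^{-i*})
# over a finite action grid, for an invented one-stage game with
# stage-additive quadratic costs (both players minimizing).

grid = [i / 10 for i in range(-20, 21)]          # actions in [-2, 2]
x0 = 1.0

def cost(i, u1, u2):
    x1 = x0 + u1 + u2                            # state dynamics
    u = (u1, u2)[i]
    return x1 ** 2 + 3 * u ** 2                  # g^i = x_1^2 + 3 (u^i)^2

def is_nash(u1, u2):
    return (all(cost(0, u1, u2) <= cost(0, v, u2) for v in grid) and
            all(cost(1, u1, u2) <= cost(1, u1, v) for v in grid))

equilibria = [(u1, u2) for u1 in grid for u2 in grid if is_nash(u1, u2)]
print(equilibria)   # [(-0.2, -0.2)]
```

The grid equilibrium matches the analytic one: the first-order conditions x_1 + 3u^i = 0 give u^1 = u^2 = −0.2.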
Examples of information structures and relation to control theory
Open loop: η_k^i = {x_0} for all i, k
Closed loop, perfect state: η_k^i = {x_0, . . . , x_k}
Closed loop, imperfect state: η_k^i = {y_1^i, . . . , y_k^i}
Feedback, perfect state: η_k^i = {x_k}
Feedback, imperfect state: η_k^i = {y_k^i}
Optimal control
Optimal control is a particular example of the above model, with N = {1} and a feedback
or closed loop information structure. When one seeks optimal feedback "control
laws" or "output feedback", one is implicitly assuming the feedback or closed loop
information structure.
Information structures can be much more complex than those above; whether one
can do dynamic programming depends on the information structure
Informational nonuniqueness of Nash equilibria
An open loop equilibrium strategy is a constant strategy. Thus open loop equilibria
can be found by considering a static game (in the actions) obtained by
back-substituting the state equation: e.g.,
u_k^i = γ_k^i(x_k) = γ_k^i(f_{k-1}(x_{k-1}, u_{k-1}^1, . . . , u_{k-1}^N)) = · · ·
In a deterministic game, since the future state is a deterministic function of past states and
actions, any closed loop strategy has an "equivalent" open loop strategy that generates
the same state trajectory. Consequently, in optimal control, strategies are generically
informationally nonunique
Thus one speaks not of an optimal strategy but of an equivalence class of optimal
strategies that generate the same trajectory.
This issue also percolates to Nash equilibria. Generally, we have
Informational nonuniqueness
G1 is informationally inferior to G2 if, for every player and at every stage, whatever
the player knows in G1 he also knows in G2 at the same stage. Then any equilibrium of
G1 is also an equilibrium of G2.
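The equivalence is easy to demonstrate: roll out a feedback strategy on a deterministic system, record the realized actions, and replay them open loop. The scalar system and strategy below are assumptions for the sketch.

```python
# In a deterministic system, a closed-loop (feedback) strategy and the
# open-loop strategy that replays its recorded actions generate the
# same state trajectory: they are informationally distinct but
# trajectory-equivalent.

def f(x, u):                      # deterministic dynamics (assumed)
    return 0.5 * x + u

K, x0 = 4, 2.0
feedback = lambda x: -0.25 * x    # closed-loop strategy gamma(x_k)

# Roll out the closed-loop strategy and record the actions.
x, cl_traj, actions = x0, [x0], []
for _ in range(K):
    u = feedback(x)
    actions.append(u)
    x = f(x, u)
    cl_traj.append(x)

# Replay the recorded actions open loop (no state measurements used).
x, ol_traj = x0, [x0]
for u in actions:
    x = f(x, u)
    ol_traj.append(x)

print(cl_traj == ol_traj)   # True
```

With noise in the dynamics the two rollouts would diverge, which is why the nonuniqueness is specific to deterministic problems.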
Team problems and optimal control
A game is called a team problem if J^1 = J^2 = · · · = J^N =: J
A team is faced only with optimization and no competition. γ^{1*}, . . . , γ^{N*} is called
team optimal if
J(γ^{1*}, . . . , γ^{N*}) ≤ J(γ^1, . . . , γ^N)   ∀ γ^1, . . . , γ^N.
The team optimal solution concept applies if the game is cooperative, i.e., players
can agree on strategies beforehand
γ^{1*}, . . . , γ^{N*} is called person by person optimal if
J(γ^{1*}, . . . , γ^{N*}) ≤ J(γ^i, γ^{-i*})   ∀ γ^i, ∀ i ∈ N.
This solution concept applies if the game is noncooperative.
Optimal control problems can be considered as team problems in two ways
Trivial team with N = {1}
Each stage or time instant is a separate player that acts only once, i.e., N = K and U_k^i is a
singleton (a trivial action) if i ≠ k
More generally, teams are distinct from optimal control. Team optimization cannot
be reduced to ordinary optimization since the information of each player is different
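A tiny discrete team (the common cost table below is invented) makes the gap between the two solution concepts concrete: a person-by-person optimal profile need not be team optimal.

```python
# Two team members, two actions each, common cost J (minimizing).
# (1, 1) is person-by-person optimal: neither member can improve by a
# unilateral change. But the team optimum is (0, 0), which requires a
# coordinated joint change.

J = {(0, 0): 0, (0, 1): 5, (1, 0): 5, (1, 1): 1}

def person_by_person_optimal(u1, u2):
    return (J[(u1, u2)] <= min(J[(v, u2)] for v in (0, 1)) and
            J[(u1, u2)] <= min(J[(u1, v)] for v in (0, 1)))

team_optimum = min(J, key=J.get)
print(team_optimum)                        # (0, 0), cost 0
print(person_by_person_optimal(1, 1))      # True
print(J[(1, 1)] == J[team_optimum])        # False: pbp but not team optimal
```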
Concrete results: open loop and closed loop equilibria of LQ games
Open loop Nash equilibria can in principle be found by considering a static game
(obtained by back-substitution). Generally this game becomes too complex.
The other option is to use Pontryagin's minimum principle, which is also complex¹
However, if the game is linear-quadratic, i.e.,
g_k^i(u_k, x_k) = ½ x_k^T Q_k^i x_k + ½ Σ_{j∈N} (u_k^j)^T R_k^{ij} u_k^j,    x_{k+1} = A_k x_k + Σ_{j∈N} B_k^j u_k^j,
with Q_k^i ⪰ 0 and R_k^{ii} ≻ 0, one can solve the conditions given by the minimum principle
to show equilibria exist and to find equilibria
A similar method yields equilibria for closed loop information structure and feedback
information structure
Structure of the closed loop equilibrium is closely related to the Riccati equation
Similar results exist for LQ games with infinite horizon
¹ My student and I have developed a new way of attacking this question [AK15]
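A one-stage scalar sketch of this procedure (all parameter values below are assumptions): since r_i > 0 makes each J^i strictly convex in the player's own action, the first-order conditions from the minimum principle are necessary and sufficient, and here they reduce to a linear system.

```python
# Open-loop Nash equilibrium of a one-stage scalar LQ game from its
# first-order conditions. Model (assumed):
#   x1 = a*x0 + b1*u1 + b2*u2,  J^i = 0.5*q_i*x1^2 + 0.5*r_i*u_i^2.

a, b1, b2 = 1.0, 1.0, 0.5
q1, q2, r1, r2 = 1.0, 2.0, 1.0, 1.0
x0 = 1.0

# FOC of player i: r_i*u_i + q_i*b_i*(a*x0 + b1*u1 + b2*u2) = 0,
# written as the 2x2 linear system A u = c and solved by Cramer's rule.
A = [[r1 + q1 * b1 ** 2, q1 * b1 * b2],
     [q2 * b2 * b1, r2 + q2 * b2 ** 2]]
c = [-q1 * b1 * a * x0, -q2 * b2 * a * x0]

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
u1 = (c[0] * A[1][1] - A[0][1] * c[1]) / det
u2 = (A[0][0] * c[1] - c[0] * A[1][0]) / det

# Verify both first-order conditions hold at (u1, u2).
x1 = a * x0 + b1 * u1 + b2 * u2
print(u1, u2)   # both -0.4 for these parameters
print(abs(r1 * u1 + q1 * b1 * x1) < 1e-12,
      abs(r2 * u2 + q2 * b2 * x1) < 1e-12)
```

Over K stages the same conditions couple across time and lead to the Riccati-type recursions mentioned above; this one-stage case keeps the linear-system structure visible.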
Stochastic teams
We now focus only on team problems; things are harder for games
Things get significantly more complex if there is noise involved
x_{k+1} = f(x_k, u_k^1, . . . , u_k^N, w_k)
The recursive back-substitution no longer works
Dynamic programming needs cascading conditional expectations, which work only if
the information structure is nested. The only case we know how to solve reliably is when
I_k ⊆ I_{k+1}
Outside this setting, even simple team problems remain unsolved
Witsenhausen's counterexample [Wit68]
x_0, w independent Gaussian; x_1 = u_0 + w
J(γ_0, γ_1) = E[(u_0 − x_0)^2 + (u_0 − u_1)^2],    u_0 = γ_0(x_0),  u_1 = γ_1(u_0 + w).
Optimal γ_0, γ_1 are unknown.
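Although the optimum is unknown, the cost of any fixed strategy pair is easy to estimate by Monte Carlo. The affine pair below (identity first stage plus the best affine second stage for it) is an assumption for illustration, not the unknown optimum; for this pair the cost is analytically (1 − c)²σ² + c² = 0.5.

```python
# Monte Carlo evaluation of the Witsenhausen cost, in the slides'
# formulation: u0 = g0(x0), u1 = g1(u0 + w),
# J = E[(u0 - x0)^2 + (u0 - u1)^2]. Strategies are an assumed affine pair.
import random

random.seed(0)
sigma = 1.0                         # std of x0; w ~ N(0, 1)
g0 = lambda x0: x0                  # identity first stage
c = sigma ** 2 / (sigma ** 2 + 1)   # best affine estimate of u0 from u0 + w
g1 = lambda y: c * y

n, total = 100_000, 0.0
for _ in range(n):
    x0 = random.gauss(0, sigma)
    w = random.gauss(0, 1)
    u0 = g0(x0)
    u1 = g1(u0 + w)
    total += (u0 - x0) ** 2 + (u0 - u1) ** 2
print(total / n)    # approx 0.5 for these strategies
```

Witsenhausen's point was precisely that nonlinear, quantizer-like strategies can beat the best affine pair, so estimates like this one only benchmark particular candidates.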
Interpretation of the Witsenhausen problem
As a two-stage decision problem with finite memory
Zero recall of past state and actions
As a team problem
Two players, cooperative game, sequentially played
Imperfect communication between them
As an engineering system
γ0 is a “sensor”
γ1 is a “controller”, not co-located with the sensor, to which the sensor’s signals
are sent
As an organization
γ0 is a field agent
γ1 is the supervisor to whom the field agent reports
As a communication system
γ0 and γ1 are an encoder and a decoder, respectively; w is the channel noise
Static information structure
A case we can solve
Static team
Environmental randomness is ξ.
Observations z_i = h_i(ξ) for each i ∈ N.
Find u_i = γ_i(z_i) to minimize E[L(u_1, . . . , u_N, ξ)]
The information structure is called static because the actions of a player do not
affect the information of other players
Closely related to the broadcast channel in communications
Key structural result
If L is convex in u_1, . . . , u_N and continuously differentiable, then γ* is team optimal if
and only if it is person by person optimal
LQG games (where h_i is linear, L is convex quadratic in u and linear in ξ, and ξ is
Gaussian) can be solved easily². First shown by Radner [Rad62].
² I have recently extended this to show that it is easy to find near-optimal strategies if ξ has a log-concave
density [Kul15]
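A deterministic stand-in (ξ suppressed) illustrates the structural result: for a convex, differentiable common cost, person-by-person optimization finds the team optimum. The quadratic cost below is an invented example, with minimum at (1/3, 1/3).

```python
# Alternating person-by-person optimization (each member minimizes over
# her own decision with the other's fixed) on the convex quadratic
# J(u1, u2) = u1^2 + u2^2 + u1*u2 - u1 - u2, whose team optimum is
# (1/3, 1/3). Convexity makes pbp optimality equal team optimality.

u1, u2 = 0.0, 0.0
for _ in range(50):
    u1 = (1 - u2) / 2      # argmin over u1 with u2 fixed
    u2 = (1 - u1) / 2      # argmin over u2 with u1 fixed
print(u1, u2)              # converges to (1/3, 1/3)
```

Without convexity this can stall at a person-by-person optimal point that is not team optimal, as in the discrete table on the earlier slide.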
Dynamic information structure and information theory
Dynamic information structure
The information structure is dynamic if the action of a player affects the information of
another player: z_i = h_i(ξ, u_j).
If u_j cannot be inferred from z_i, we have the situation where Pj affects what Pi can
know, but Pi does not know what Pj knows
Closely related to communication/information theory. Information theory does not
get us too far, though.
In general, there is little clarity about what to do in these settings
More generally, complex networked settings such as cyber-physical systems, smart grids,
etc., all face this issue
Additional reading on game theory
Game theory by Fudenberg and Tirole
Game theory by Myerson
Game theory by Maschler, Solan and Zamir
Dynamic noncooperative game theory by Başar and Olsder
Stochastic networked control by Yüksel and Başar
Works of H. Witsenhausen and Y. C. Ho
[AK15] Mathew P. Abraham and Ankur A. Kulkarni. New results on existence of open loop Nash equilibria in discrete time dynamic games. To be submitted to IEEE Transactions on Automatic Control, 2015.
[Kul15] Ankur A. Kulkarni. Approximately optimal linear strategies for static teams with big non-Gaussian noise. Under review for the IEEE Conference on Decision and Control, 2015.
[Rad62] Roy Radner. Team decision problems. The Annals of Mathematical Statistics, pages 857–881, 1962.
[Wit68] H. S. Witsenhausen. A counterexample in stochastic optimum control. SIAM Journal on Control, 6:131, 1968.