Math 464: Linear Optimization and Game Theory
Haijun Li
Department of Mathematics
Washington State University
Spring 2013
Game Theory
Game theory (GT) is a theory of rational behavior of people
with nonidentical (self-) interests.
Common Features:
1. There is a set of at least two players (or entities);
2. all players follow the same set of rules;
3. the interests of different players differ, and each player pursues his or her own interest.
Game Theory
• Game theory can be defined as the theory of mathematical
models of conflict and cooperation between intelligent,
rational decision-makers.
• Game theory is applicable whenever at least two
individuals - people, species, companies, political parties,
or nations - confront situations where the outcome for each
depends on the behavior of all.
• Game theory proposes solution concepts, defining rational
outcomes of games. Solution concepts may be hard to
compute ...
Early History ...
• Modern game theory began with the work of Ernst Zermelo
(1913, Well-Ordering Theorem, Axiom of Choice), Émile
Borel (1921, symmetric two-player zero-sum matrix game),
John von Neumann (1928, two-player matrix game).
• The early results are summarized in the seminal book
“Theory of Games and Economic Behavior” by von
Neumann and Oskar Morgenstern (1944).
In all GT models the basic entity is a player. Once we have defined the set of players, we may distinguish between two types of models.
Types of Games
• Non-cooperative Game: primitives are the sets of possible actions of individual players.
• Cooperative Game: primitives are the sets of possible joint actions of groups of players.
Game Theory
• Noncooperative GT (models of type I)
  – Games in Strategic Form
  – Games in Extensive Form
    · EFG with Perfect Information
    · EFG with Imperfect Information
• Cooperative GT (models of type II)
Strategic-Form Games or Games in
Normal Form
Basic ingredients:
• N = {1, . . . , n}, n ≥ 2, is a set of players.
• Si is a nonempty set of possible strategies (or pure
strategies) of player i. Each player i must choose some
si ∈ Si .
• S = {(s1 , . . . , sn ) : si ∈ Si }, the set of all possible outcomes
(or pure strategy profiles).
• ui : S → R, a utility function of player i; that is, ui (s) = payoff
of player i if the outcome is s ∈ S.
Definition
A strategic-form game is Γ = (N, {Si }, {ui }).
John Nash Equilibrium (1950)
• Observe that a player’s utility depends not just on his/her
action, but on actions of other players.
• For player i, finding the best action involves deliberating
about what others would do.
Definition
1. All players in N would be happy to find an outcome s∗ ∈ S such that
   ui(s) ≤ ui(s∗), ∀ i ∈ N, s ∈ S.
2. An outcome s∗ = (s∗1, . . . , s∗n) ∈ S is a Nash equilibrium if for all i ∈ N,
   ui(s∗1, . . . , s∗i−1, ti, s∗i+1, . . . , s∗n) ≤ ui(s∗), ∀ ti ∈ Si.
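The Nash condition above can be checked mechanically for finite games. Below is a minimal sketch (not from the lecture; names such as pure_nash_equilibria are illustrative) that enumerates all pure strategy profiles and keeps those from which no player can profitably deviate.

```python
from itertools import product

def pure_nash_equilibria(strategy_sets, utilities):
    """Return all pure-strategy Nash equilibria of a finite strategic-form game.

    strategy_sets : list of lists, strategy_sets[i] = S_i for player i
    utilities     : list of functions, utilities[i](profile) = u_i(profile),
                    where profile is a tuple (s_1, ..., s_n)
    """
    equilibria = []
    for profile in product(*strategy_sets):
        is_ne = True
        for i, u_i in enumerate(utilities):
            for t_i in strategy_sets[i]:
                deviation = profile[:i] + (t_i,) + profile[i + 1:]
                # Player i must not gain by unilaterally deviating to t_i.
                if u_i(deviation) > u_i(profile):
                    is_ne = False
                    break
            if not is_ne:
                break
        if is_ne:
            equilibria.append(profile)
    return equilibria
```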
Example: Prisoners’ Dilemma (RAND,
1950; Albert Tucker, 1950)
• Two suspects (A and B) committed a crime.
• Court does not have enough evidence to convict them of
the crime, but can convict them of a minor offense (1 year
in prison each).
• If one suspect confesses (acts as an informer), he walks
free, and the other suspect gets 20 years.
• If both confess, each gets 5 years.
• Suspects have no way of communicating or making
binding agreements.
Prisoners’ Dilemma: A Matrix Game

              B: quiet     B: confess
A: quiet      (1, 1)       (20, 0)
A: confess    (0, 20)      (5, 5)

Years in prison for (A, B). Rationality =⇒ an outcome that is ≠ the best solution: the rational outcome (confess, confess) is worse for both than (quiet, quiet).
Suspect A’s reasoning:
• If B stays quiet, I should confess;
• if B confesses, I should confess too.
Suspect B does a similar thing.
Unique Nash Equilibrium at (5, 5):
Both confess and each gets 5 years in prison.
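As a quick check (a sketch reusing the illustrative pure_nash_equilibria helper above, with utilities taken as negative years in prison so that larger is better):

```python
years = {  # (A's action, B's action) -> (years in prison for A, for B)
    ("quiet", "quiet"): (1, 1),
    ("quiet", "confess"): (20, 0),
    ("confess", "quiet"): (0, 20),
    ("confess", "confess"): (5, 5),
}
S = [["quiet", "confess"], ["quiet", "confess"]]
u = [lambda p, i=i: -years[p][i] for i in range(2)]  # utility = -years
print(pure_nash_equilibria(S, u))  # expected: [('confess', 'confess')]
```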
Two-player Matrix Games
In a matrix game, each player has a finite set of strategies.
• N = {1, 2}.
• S1 = X = {x1, . . . , xn}, S2 = Y = {y1, . . . , ym}.
• aij = u1(xi, yj), bij = u2(xi, yj).

          y1            . . .    ym
x1        (a11, b11)    . . .    (a1m, b1m)
...       ...           . . .    ...
xn        (an1, bn1)    . . .    (anm, bnm)

Figure: Row Player = player 1, Column Player = player 2.
Example: Hawk-Dove
• Two animals are fighting over some prey. Each can behave like a dove or like a hawk.
• The reasonable outcome for each animal is that in which it acts like a hawk while the other acts like a dove.
• The worst outcome is that in which both animals act like hawks.
• Each animal prefers to be hawkish if its opponent is dovish and dovish if its opponent is hawkish.
• The game has two Nash equilibria, (d, h) and (h, d), corresponding to two different conventions about the player who yields.

         dove    hawk
dove     3,3     1,4
hawk     4,1     0,0
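Running the same illustrative helper on the Hawk-Dove bimatrix above recovers exactly the two equilibria:

```python
payoff = {  # (row action, column action) -> (row payoff, column payoff)
    ("dove", "dove"): (3, 3), ("dove", "hawk"): (1, 4),
    ("hawk", "dove"): (4, 1), ("hawk", "hawk"): (0, 0),
}
S = [["dove", "hawk"], ["dove", "hawk"]]
u = [lambda p, i=i: payoff[p][i] for i in range(2)]
print(pure_nash_equilibria(S, u))  # [('dove', 'hawk'), ('hawk', 'dove')]
```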
Example: Matching Pennies
• Each of two people chooses either Head or Tail.
• If the choices differ, person 1 pays person 2 $1; if they are the same, person 2 pays person 1 $1.
• Each person cares only about the amount of money that he receives.
• The game has no Nash equilibrium (in pure strategies).
          head     tail
head      1,-1     -1,1
tail      -1,1     1,-1

Figure: No Nash equilibrium.
Strictly Competitive Games
Definition
A strategic game Γ = ({1, 2}, {S1 , S2 }, {u1 , u2 }) is strictly
competitive if for any outcome (s1 , s2 ) ∈ S, we have
u2 (s1 , s2 ) = −u1 (s1 , s2 ) (Zero-Sum).
Remark
1. If u1(s1, s2) = gain for player 1, then u1(s1, s2) = loss for player 2.
2. If an outcome (s∗1, s∗2) is a Nash equilibrium, then
   u1(s1, s∗2) ≤ u1(s∗1, s∗2) ≤ u1(s∗1, s2), ∀ s1 ∈ S1, s2 ∈ S2.
   That is, a Nash equilibrium is a saddle point.
1. Player 1 maximizes gain, whereas player 2 minimizes loss.
   min_{y∈S2} u1(s1, y) ≤ max_{x∈S1} u1(x, s2), ∀ s1 ∈ S1, s2 ∈ S2.
2. In other words, player 1 maximizes player 2’s loss, whereas player 2 minimizes player 1’s gain.
   max_{x∈S1} min_{y∈S2} u1(x, y) ≤ min_{y∈S2} max_{x∈S1} u1(x, y).
3. A best guaranteed outcome for player 1 would be x∗ with
   min_{y∈S2} u1(x∗, y) ≥ min_{y∈S2} u1(x, y), ∀ x ∈ S1.
4. A best guaranteed outcome for player 2 would be y∗ with
   max_{x∈S1} u1(x, y∗) ≤ max_{x∈S1} u1(x, y), ∀ y ∈ S2.
Combining these,
max_{x∈S1} min_{y∈S2} u1(x, y) ≤ min_{y∈S2} u1(x∗, y) ≤ max_{x∈S1} u1(x, y∗) ≤ min_{y∈S2} max_{x∈S1} u1(x, y).
MiniMax Theorem (Borel, 1921; von Neumann, 1928)
An outcome (s∗1 , s∗2 ) is a Nash equilibrium in a strictly
competitive game Γ = ({1, 2}, {S1 , S2 }, {u1 , −u1 }) if and only if
max_{x∈S1} min_{y∈S2} u1(x, y) = u1(s∗1, s∗2) = min_{y∈S2} max_{x∈S1} u1(x, y) =: game value,
where s∗1 is a best outcome for player 1 while s∗2 is a best
outcome for player 2.
A strictly competitive strategic game admits a simple and convenient representation in matrix form.
Two-player Zero-Sum Matrix Games
• N = {1, 2}.
• S1 = X = {x1, . . . , xn}, S2 = Y = {y1, . . . , ym}.
• aij = u1(xi, yj), u2(xi, yj) = −u1(xi, yj) = −aij.

The game is described by the n × m payoff matrix A = (aij):
• collapsing each row i to its minimum min_j aij and taking the maximum of these row minima gives m (the row player’s guaranteed gain, maximin);
• collapsing each column j to its maximum max_i aij and taking the minimum of these column maxima gives M (the column player’s guaranteed loss ceiling, minimax);
• m ≤ M, with a saddle point (pure Nash equilibrium) exactly when m = M.
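This row/column procedure is easy to carry out numerically. The sketch below (the function name and the second example matrix are my own illustrations, not from the lecture) computes m and M and lets one check whether a saddle point exists:

```python
import numpy as np

def maximin_minimax(A):
    """A[i][j] = payoff to the row player; returns (m, M)."""
    A = np.asarray(A, dtype=float)
    row_minima = A.min(axis=1)    # worst case for each row strategy
    col_maxima = A.max(axis=0)    # worst case for each column strategy
    m = row_minima.max()          # maximin: row player's guaranteed gain
    M = col_maxima.min()          # minimax: column player's guaranteed loss ceiling
    return m, M

print(maximin_minimax([[1, -1], [-1, 1]]))      # Matching Pennies: (-1.0, 1.0), no saddle point
print(maximin_minimax([[4, 2, 5], [3, 1, 6]]))  # (2.0, 2.0): saddle point at a12 = 2
```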
Two-Player Constant-Sum Games
• There are two players: player 1 is called the row player and
player 2 is called the column player.
• The row player must choose 1 of n strategies, and the
column player must choose 1 of m strategies.
• If the row player chooses the i-th strategy and the column
player chooses the j-th strategy, then the row player
receives a reward of aij and the column player receives a
reward of c − aij .
• If c = 0, then we have a two-player zero-sum game.
Example: Competing Networks
• Network 1 and Network 2 are competing for an audience of
100 million viewers at a certain time slot.
• The networks must simultaneously announce the type of
show they will air in that time slot: Western, soap opera, or
comedy.
• If network 1 has aij million viewers, then network 2 will
have 100 − aij million viewers.
Game of Odds and Evens (or Matching
Pennies, again)
• Two players (Odd and Even) simultaneously choose the
number of fingers (1 or 2) to put out.
• If the sum of the fingers is odd, then Odd wins $1 from
Even.
• If the sum of the fingers is even, then Even wins $1 from
Odd.
• This game has no saddle point.
We Need More Strategies!
To analyze the games without saddle point, we introduce
randomized strategies by choosing a strategy according to a
probability distribution.
• x1 = probability that Odd puts out one finger
• x2 = probability that Odd puts out two fingers
• y1 = probability that Even puts out one finger
• y2 = probability that Even puts out two fingers
where x1 + x2 = 1 and y1 + y2 = 1, x1 ≥ 0, x2 ≥ 0, y1 ≥ 0, y2 ≥ 0.
Odd tosses a loaded coin (with P(Head) = x1, P(Tail) = x2) to
choose a strategy. Even does a similar thing.
If x1 = 1 or x2 = 1 (y1 = 1 or y2 = 1), then Odd (Even) chooses a
pure strategy.
Randomized Strategies
Let (x1 , . . . , xm ) and (y1 , . . . , yn ) be two probability vectors (i.e.,
entries are all non-negative and add up to 1 for each vector).
• There are two players: player 1 is called the row player and
player 2 is called the column player.
• The row player must choose 1 of m strategies, and the
column player must choose 1 of n strategies.
• If the row player chooses the i-th strategy with probability xi
and the column player chooses the j-th strategy with
probability yj , then the row player receives a reward of aij
and the column player receives a reward of −aij .
Given that one player chooses a strategy, how do we calculate
the other player’s average reward?
Odd’s Optimal Strategy
Odd needs to minimize his loss (or find a loss floor).
• If Even puts out one finger, then Odd’s average reward is
Odd’s expected reward = (−1)x1 + (+1)(1 − x1 ) = 1 − 2x1 .
• If Even puts out two fingers, then Odd’s average reward is
Odd’s expected reward = (+1)x1 + (−1)(1 − x1 ) = 2x1 − 1.
Figure: Odd’s Reward
Even’s Optimal Strategy
Even needs to maximize his reward (or find a reward ceiling).
• If Odd puts out one finger, then Even’s average reward is
Even’s expected reward = (+1)y1 + (−1)(1 − y1 ) = 2y1 − 1.
• If Odd puts out two fingers, then Even’s average reward is
Even’s expected reward = (−1)y1 + (+1)(1 − y1 ) = 1 − 2y1 .
Figure: Even’s Reward
Analysis
Figure: Value of Game with Randomized Strategies
Value of Game with Randomized Strategies
• In the game of Odds and Evens, Odd’s loss floor equals
Even’s reward ceiling when they both use the randomized
strategy (1/2, 1/2).
• The common value of the floor and ceiling is called the value of
the game.
• The strategy that corresponds to the value of the game is
called an optimal strategy.
• This optimal randomized strategy (1/2, 1/2) can be obtained
via the duality theorem.
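A quick numerical sanity check of this claim (a sketch, not part of the lecture): sweep x1 over [0, 1] and note that Odd’s guaranteed expected reward min(1 − 2x1, 2x1 − 1) is maximized at x1 = 1/2, where it equals 0.

```python
import numpy as np

x1 = np.linspace(0.0, 1.0, 101)
# Odd's expected reward against Even's two pure strategies (from the slides above).
vs_one_finger = 1 - 2 * x1
vs_two_fingers = 2 * x1 - 1
floor = np.minimum(vs_one_finger, vs_two_fingers)   # Odd's guaranteed reward
best = x1[np.argmax(floor)]
print(best, floor.max())    # 0.5  0.0 : value of the game is 0 at x1 = 1/2
```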
Randomized Strategies
• Γ = (N, {Si }, {ui }) is a strategic game.
• A randomized strategy of player i is a probability
distribution Pi over the set Si of its pure strategies.
• Pi (si ) = probability that player i chooses strategy si ∈ Si .
• We assume that randomized strategies of different players
are independent.
Definition
For any i ∈ N, the expected utility of player i, given that each player j,
j ≠ i, chooses strategy sj ∈ Sj, is
E(Pi) := Σ_{si∈Si} ui(s1, . . . , si−1, si, si+1, . . . , sn) Pi(si).
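For a two-player matrix game this sum is just a dot product: if player 1 mixes over the rows with probabilities x = (x1, . . . , xn) and player 2 plays column j, then the expected payoff to player 1 is Σ_i xi aij. A one-line illustration (the matrix is Matching Pennies from above):

```python
import numpy as np

A = np.array([[1, -1], [-1, 1]])   # Matching Pennies payoffs to player 1 (row player)
x = np.array([0.5, 0.5])           # player 1's randomized strategy P1
print(x @ A)                        # expected payoff against each pure column strategy: [0. 0.]
```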
Randomized Strategy Nash Equilibrium
• Γ = (N, {Si }, {ui }) is a strategic game.
• A randomized strategy of player i is a probability
distribution Pi over the set Si of its pure strategies.
• Pi (si ) = probability that player i chooses strategy si ∈ Si .
• We assume that randomized strategies of different players
are independent.
Theorem (Nash, 1950)
Every finite strategic game has a randomized strategy Nash
equilibrium.
Remark
For two-player zero-sum matrix games this result was obtained by von
Neumann in 1928.
Example: Stone, Paper, and Scissors
• The two players (row and column players) must each choose
one of three strategies: Stone, Paper, or Scissors.
• If both players use the same strategy, the game is a draw.
• Otherwise, one player wins $1 from the other according to
the following rule:
scissors cut paper, paper covers stone, stone breaks scissors.
Randomized Strategies
• x1 = probability that row player chooses stone
• x2 = probability that row player chooses paper
• x3 = probability that row player chooses scissors
• y1 = probability that column player chooses stone
• y2 = probability that column player chooses paper
• y3 = probability that column player chooses scissors
where x1 + x2 + x3 = 1 and y1 + y2 + y3 = 1, x1 , x2 , x3 , y1 , y2 , y3
are all non-negative.
The row player chooses a randomized strategy (x1 , x2 , x3 ).
The column player chooses a randomized strategy (y1 , y2 , y3 ).
Row Player’s LP for Max. Reward v
max z = v
v ≤ x2 − x3
v ≤ −x1 + x3
v ≤ x1 − x2
x1 + x2 + x3 = 1
x1 , x2 , x3 ≥ 0,
v urs.
Column Player’s LP for Min. Loss w
min z = w
w ≥ −y2 + y3
w ≥ y1 − y3
w ≥ −y1 + y2
y1 + y2 + y3 = 1
y1 , y2 , y3 ≥ 0,
w urs.
Dual of Row’s LP = Column LP
The optimal strategy for both players is (1/3, 1/3, 1/3).
Figure: Dual of Row’s LP = Column LP
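The row player’s LP above can be handed directly to an LP solver. A minimal sketch using scipy.optimize.linprog (one possible solver choice, not prescribed by the lecture), with variables (x1, x2, x3, v) and "maximize v" rewritten as "minimize −v":

```python
import numpy as np
from scipy.optimize import linprog

# Variables: x1, x2, x3, v.  Maximize v  <=>  minimize -v.
c = np.array([0, 0, 0, -1])

# v <= x2 - x3,  v <= -x1 + x3,  v <= x1 - x2   rewritten as  A_ub @ z <= 0.
A_ub = np.array([
    [0, -1,  1, 1],
    [1,  0, -1, 1],
    [-1, 1,  0, 1],
])
b_ub = np.zeros(3)

A_eq = np.array([[1, 1, 1, 0]])             # x1 + x2 + x3 = 1
b_eq = np.array([1.0])

bounds = [(0, None)] * 3 + [(None, None)]   # x_i >= 0, v unrestricted
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:3], res.x[3])   # optimal strategy ~ [1/3, 1/3, 1/3], game value ~ 0
```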
Proof Idea of Nash’s Theorem via
Duality
• Given that the column player chooses his strategy,
maximize the row player’s expected reward under
randomized strategy (x1 , . . . , xm ).
• Given that the row player chooses his strategy, minimize
the column player’s expected loss under randomized
strategy (y1 , . . . , yn ).
Figure: Dual of Row’s LP = Column LP
• Γ = ({1, 2}, {S1 , S2 }, {u1 , u2 }) is a strategic game.
• A randomized strategy of player i is a probability
distribution Pi over the set Si of its pure strategies.
• Es2 (P1 ) = expected utility of player 1 given that player 2
chooses strategy s2 ∈ S2 .
• Es1 (P2 ) = expected utility of player 2 given that player 1
chooses strategy s1 ∈ S1 .
Primal LP
max z = v
v ≤ min_{s2∈S2} Es2(P1), v urs.

Dual LP
min z = w
w ≥ max_{s1∈S1} Es1(P2), w urs.
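For a general payoff matrix A = (aij) of rewards to the row player, the primal LP reads: maximize v subject to v ≤ Σ_i xi aij for every column j, Σ_i xi = 1, x ≥ 0. A hedged sketch of such a solver (function name and structure are my own, not the lecture’s):

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Value and optimal row strategy for payoff matrix A (row player's rewards)."""
    A = np.asarray(A, dtype=float)
    n, m = A.shape
    # Variables: x_1, ..., x_n, v.  Maximize v  <=>  minimize -v.
    c = np.r_[np.zeros(n), -1.0]
    # For each column j:  v - sum_i x_i a_ij <= 0.
    A_ub = np.c_[-A.T, np.ones(m)]
    b_ub = np.zeros(m)
    A_eq = np.c_[np.ones((1, n)), np.zeros((1, 1))]   # sum_i x_i = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]          # x >= 0, v unrestricted
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[n], res.x[:n]

# Odds and Evens from Odd's point of view: value 0, optimal strategy (1/2, 1/2).
print(solve_zero_sum([[-1, 1], [1, -1]]))
```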
Duality, Again
• An optimal solution exists such that
  max_{all P1} min_{s2∈S2} Es2(P1) = min_{all P2} max_{s1∈S1} Es1(P2).
• The common value is known as the value of the game.
• Nash’s original proof (in his thesis) used Brouwer’s fixed
point theorem.
• When Nash made this point to John von Neumann in 1949,
von Neumann famously dismissed it with the words,
“That’s trivial, you know. That’s just a fixed point theorem.”
(Nasar, 1998)
Significance of Probabilistic Methods
• Probabilistic methods are often used to incorporate
uncertainty. In contrast, the probabilistic method is used
here to enlarge the solution set so that a Nash equilibrium
can be achieved using randomized strategies.
• Probabilistic methods are increasingly used to prove the
existence of certain rare objects in mathematical
constructs.