
July, 2013
Tutorial:
Introduction to Game Theory
Jesus Rios
IBM T.J. Watson Research Center, USA
[email protected]
© 2013 IBM Corporation
Approaches to decision analysis
  Descriptive
–  Understanding of how decisions are made
  Normative
–  Models of how decisions should be made
  Prescriptive
–  Helping DM make smart decisions
–  Use of normative theory to support DM
–  Elicit inputs of normative models
•  DM preferences and beliefs (psycho-analysis)
•  use of experts
–  Role of descriptive theories of DM behavior
Game theory arena
  Non-cooperative games
–  More than one intelligent player
–  Individual action spaces
–  Interdependent consequences
•  Players' consequences depend on their own and the other players' actions
  Cooperative game theory
–  Normative bargaining models
•  Joint decision making
-  Binding agreements on what to play
•  Given players' preferences and the solution space,
find a fair, jointly satisfying and Pareto-optimal agreement/solution
–  Group decision making on a common action space (Social choice)
•  Preference aggregation
•  Voting rules
-  Arrow’s theorem
–  Coalition games
Cooperative game theory: Bargaining solution concepts
  Example: How to distribute the profits of the cooperation?
–  Working alone, Juan earns $10 and Maria earns $20
–  Working together, they earn $100
–  Juan gets x and Maria gets y, with x + y = 100
[Figure: the feasible set x + y = 100 in the (Juan, Maria) payoff plane, with the disagreement point (10, 20), the bliss point, and the fair solution x = 45, y = 55]
•  Disagreement point: BATNA, status quo
•  Feasible solutions: ZOPA
•  Pareto-efficiency
•  Aspiration levels
•  Fairness: K-S, Nash, maxmin solutions
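The Nash solution listed above has a closed form for this split; a minimal sketch (assuming the standard Nash product formulation, which is not spelled out on the slide):

```python
# Nash bargaining solution for splitting a total of 100 between Juan
# (disagreement payoff 10) and Maria (disagreement payoff 20):
# maximize (x - d1) * (y - d2) subject to x + y = total.

def nash_bargaining(total, d1, d2):
    # With y = total - x, the Nash product (x - d1) * (total - x - d2)
    # is maximized at x = (total + d1 - d2) / 2: each player receives
    # their disagreement payoff plus half of the cooperative surplus.
    x = (total + d1 - d2) / 2
    return x, total - x

x, y = nash_bargaining(100, 10, 20)
print(f"Juan: ${x:.0f}, Maria: ${y:.0f}")  # Juan: $45, Maria: $55
```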
Normative models of decision making under uncertainty
  Models for a unitary DM
–  vN-M expected utility
•  Objective probability distributions
–  Subjective expected utility (SEU)
•  Subjective probability distributions
  Example: investment decision problem
–  One decision variable with two alternatives
•  In what to invest?
-  Treasury bonds
-  IBM shares
–  One uncertainty with two possible states
•  IBM share price at the end of the year
-  High
-  Low
–  One evaluation criterion for consequences
•  Profit from investment
  The simplest decision problem under uncertainty
Decision Table
  DM chooses a row without knowing which column will occur
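The payoff table itself was a figure; it is reconstructed below from the decision tree on the next slide (the equal-expected-return remark further down assumes Pr(High) = 1/2):

              High        Low
IBM Shares    $2,000      -$1,000
Bonds         $500        $500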
  Choice depends on the relative likelihood of High and Low
–  If DM is sure that IBM share price will be High,
best choice is to buy Shares
–  If DM is sure that IBM share price will be Low,
best choice is to buy Bonds
Elicit the DM’s beliefs about which column will occur
  Choice depends on the value of money
–  Expected return is not a good measure of decision preferences
•  The two alternatives give the same expected return, but most DMs would not feel indifferent between them
Elicit risk attitude of the DM
Decision tree representation
[Decision tree: "What to buy" → IBM Shares → chance node "price" (High: $2,000; Low: -$1,000) under uncertainty; Bonds → $500 with certainty]
  What does the choice depend upon?
–  relative likelihood of H vs L
–  strength of preferences for money
Subjective expected utility solution
  If the DM's decision behavior is consistent with some set of "rational" desiderata (axioms),
the DM decides as if he has
–  probabilities to represent his beliefs about the future price of the IBM share
–  "utilities" to represent his preferences and risk attitude towards money
and chooses the alternative of maximum expected utility
  The subjective expected utility model balances, in a "rational" manner,
–  the DM's beliefs and risk attitudes
  Application requires
–  knowing the DM's beliefs and "utilities"
•  Different elicitation methods
–  computing the expected utility of each decision strategy
•  This may require approximation in complex problems
A constructive definition of “utility”
  The Basic Canonical Reference Lottery ticket: p-BCRL
[Lottery p-BCRL: $2,000 with canonical probability p; -$1,000 with probability 1-p]
Preferences over BCRLs:
p-BCRL > q-BCRL iff p > q,
where p and q are canonical probabilities
Elicit prob. of the price of IBM shares
  Event H: IBM price High
  Event L: IBM price Low
  Pr( H ) + Pr( L ) = 1
[Comparison: "IBM shares" lottery (price H → $2,000; price L → -$1,000) vs. p-BCRL ($2,000 with probability p; -$1,000 with probability 1-p)]
  Move p from 1 to 0
  Which alternative is preferred by the DM?
–  IBM shares
–  p-BCRL
  There exists a breakeven canonical probability pH such that the DM is indifferent
–  pH-BCRL ~ IBM shares
–  The judgmental probability of H is pH
Elicit the utility of $500
  U( $500 )?
[Comparison: p-BCRL ($2,000 with probability p; -$1,000 with probability 1-p) vs. Bonds ($500 for sure)]
  Move p from 1 to 0
  Which alternative is preferred by the DM? p-BCRL vs. Bonds
  There exists a breakeven canonical probability u such that the DM is indifferent
–  u-BCRL ~ Bonds
–  This scales the value of $500 between the values of $2,000 and -$1,000: U($500) = u
  What is then U($500)?
–  The probability of a BCRL between $2,000 and -$1,000 that is indifferent (for the DM) to getting $500 with certainty
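A minimal sketch of this breakeven search as a bisection; the prefers_lottery callback is hypothetical and stands for putting each indifference question to the DM:

```python
# Bisect on the canonical probability p until the DM is indifferent
# between the p-BCRL and the sure alternative (here Bonds at $500).

def elicit_breakeven(prefers_lottery, tol=0.01):
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_lottery(p):  # DM prefers the p-BCRL at this p
            hi = p              # the indifference point lies below p
        else:
            lo = p              # the indifference point lies above p
    return (lo + hi) / 2

# Illustrative DM whose true utility for $500 is 0.6 on the [-1000, 2000] scale
u_500 = elicit_breakeven(lambda p: p > 0.6)
print(f"U($500) = {u_500:.2f}")
```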
Comparison of alternatives
[Figure: IBM shares (price H → $2,000; L → -$1,000) ~ pH-BCRL ($2,000 with probability pH; -$1,000 otherwise);
Bonds ($500 for sure) ~ U($500)-BCRL ($2,000 with probability U($500); -$1,000 otherwise)]
  The DM prefers to invest in "IBM Shares" iff pH > U($500)
Solving the tree: backward induction
  Utility scaling
0 = U( -$1,000 ) < U( $500 ) = u < U( $2,000 ) = 1
[Decision tree with utilities: "What to buy" → IBM Shares → price (High, probability pH: $2,000, utility 1; Low, probability 1-pH: -$1,000, utility 0); Bonds → $500, utility u]
  Rolling back: EU( IBM Shares ) = pH and EU( Bonds ) = u, so Shares are preferred iff pH > u
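A minimal sketch of the roll-back, with illustrative values for pH and u (not from the slides):

```python
# Backward induction on the investment tree, with utilities scaled so that
# U(-$1,000) = 0 and U($2,000) = 1.

p_high = 0.4  # illustrative judgmental probability of a High price
u_500 = 0.6   # illustrative utility of the sure $500 from Bonds

# Roll back the chance node: expected utility of each alternative
eu = {
    "IBM Shares": p_high * 1.0 + (1 - p_high) * 0.0,  # = p_high
    "Bonds": u_500,
}
best = max(eu, key=eu.get)
print(eu, "->", best)  # Shares are preferred iff p_high > u_500
```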
Preferences: value vs. utility
  Value function
–  measures the desirability (intensity of preferences) of money gained,
–  but does not measure risk attitude
  Utility function
–  measures risk attitude,
–  but not the intensity of preferences over sure consequences
  Many methods to elicit a utility function
–  Qualitative analysis of risk attitude leads to parametric utility functions
–  Ask quantitative indifference questions between deals (one of which must be an uncertain lottery) to assess the parameters of the utility function
–  Consistency checks and sensitivity analysis
The Bayesian process of inference and evaluation with several stakeholders and decision makers (Group decision making)
Disagreements in group decision making
  Group decision making assumes
–  Group value/utility function
–  Group probabilities on the uncertainties
  If our experts disagree on the science (Expert problem)
–  How to draw together and learn from conflicting probabilistic judgements
–  Mathematical aggregation
•  Bayesian approach
•  Opinion pools
-  There is no opinion pool satisfying a minimal consensus set of "good" probabilistic properties
•  Issues
-  How do we model knowledge overlap/correlation
-  Expertise evaluation
–  Behavioural aggregation
–  The textbook problem
•  If we do not have access to experts, we need to develop meta-analytical methodologies for drawing together expert judgment studies
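A minimal sketch of mathematical aggregation via opinion pools (the numbers and weights are illustrative, not from the slides):

```python
import numpy as np

p = np.array([0.7, 0.5, 0.9])  # three experts' probabilities for an event
w = np.array([0.5, 0.3, 0.2])  # weights, e.g. from expertise evaluation

# Linear pool: weighted arithmetic mean of the experts' probabilities
linear_pool = float(w @ p)

# Logarithmic pool: normalized weighted geometric mean
num = np.prod(p ** w)
log_pool = float(num / (num + np.prod((1 - p) ** w)))

print(f"linear: {linear_pool:.3f}, logarithmic: {log_pool:.3f}")
```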
Disagreements in group decision making
  If group members disagree on the values
–  How to combine different individuals’ rankings of options into a group ranking?
–  Arbitration/voting
•  Ordinal rankings
-  Arrow impossibility results.
•  Cardinal ranking (values and not utilities -- Decisions without uncertainty)
-  Interpersonal comparison of preferences’ strengths
-  Supra decision maker approach (MAUT)
•  Issues: manipulation and truthful reporting of rankings
  Disagreement on the values and the science
–  Combining
•  individual probabilities and utilities
•  into group probabilities and utilities, respectively,
•  to form the corresponding group expected utilities and choosing accordingly
–  Impossibility of being Bayesian and Paretian at the same time
•  No aggregation method (of probabilities and utilities) exists that is compatible with the Pareto order
–  Behavioral approaches
•  Consensus on group probabilities and utilities via sensitivity analysis.
•  Agreement on what to do via negotiation
Decision analysis in the presence of intelligent others
  Matrix games against nature
–  One player: R (Row)
•  Two choices: U (Up) and D (Down)
–  Payoff matrix
              Nature
              L     R
   R    U     0     5
        D    10     3
If you were R, what would you do?
D > U against L
U > D against R
Games against nature
  Do we know which column Nature will choose?
–  We know our best responses to Nature moves, but not what move Nature will choose
  Do we know the (objective) probabilities of Nature’s possible moves?
–  YES
–  YES
                 Nature
                 L (p)   R (1-p)   Expected payoff
   R    U        0       5         0 p + 5 (1-p)
        D       10       3         10 p + 3 (1-p)
U > D iff p < 1/6
Payoffs = vNM utils
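A minimal sketch of the expected-payoff comparison above:

```python
# U yields (0, 5) and D yields (10, 3) against Nature's moves (L, R),
# with Pr(L) = p; payoffs are vNM utilities.

def expected_payoffs(p):
    eu_U = 0 * p + 5 * (1 - p)
    eu_D = 10 * p + 3 * (1 - p)
    return eu_U, eu_D

# U > D  iff  5 - 5p > 3 + 7p  iff  p < 1/6
for p in (0.1, 1 / 6, 0.5):
    eu_U, eu_D = expected_payoffs(p)
    print(f"p = {p:.3f}: EU(U) = {eu_U:.2f}, EU(D) = {eu_D:.2f}")
```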
Games against nature and the SEU criterion
  Do we know the (objective) probabilities of Nature’s possible moves?
–  No
•  Variety of decision criteria
-  Maximin (pessimistic), maximax (optimistic), Hurwicz, minimax regret, …
              Nature
              L     R     Min    Max    Max Regret
   R    U     0     5     0      5      10
        D    10     3     3      10     2

Maximin: D      Maximax: D      Minimax Regret: D
SEU criterion
Elicit the DM's subjective probabilistic beliefs about Nature's move (p)
Compute the SEU of each alternative: D > U iff p > 1/6
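A minimal sketch computing the criteria in the table above:

```python
# Rows U, D against Nature's columns L, R
payoffs = {"U": [0, 5], "D": [10, 3]}

maximin = max(payoffs, key=lambda a: min(payoffs[a]))  # pessimistic
maximax = max(payoffs, key=lambda a: max(payoffs[a]))  # optimistic

# Regret: shortfall from the best payoff in each column
col_best = [max(row[j] for row in payoffs.values()) for j in range(2)]
regret = {a: max(col_best[j] - payoffs[a][j] for j in range(2)) for a in payoffs}
minimax_regret = min(regret, key=regret.get)

print(maximin, maximax, minimax_regret)  # D D D, as in the table
```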
Games against other intelligent players
  Bimatrix (simultaneous) games
–  Second intelligent player: C (Column)
•  Two choices: L (Left) and R (Right)
–  Payoff bimatrix
•  we know C's payoffs and that he will try to maximize them
–  As R, what would you do?
                       C
                  L           R
   R    U       0, 2        5, 4 *
        D      10, 3        3, 8

–  Knowing C's payoffs and his rationality allows us to predict C's move (R) with certitude
One shot simultaneous bi-matrix games
  Two players
–  Trying to maximize their payoffs
  Players must choose one out of two fixed alternatives
–  Row player chooses a row
–  Column player chooses a column
  Payoffs depend on both players' moves
  Simultaneous move game
–  Players must act without knowing what the other player does
–  Play once
  No other uncertainties involved
  Players have full and common knowledge of
–  choice spaces
–  bi-matrix payoffs
  No cooperation allowed

                           C
                    L                     R
   R    U    uR(U,L), uC(U,L)     uR(U,R), uC(U,R)
        D    uR(D,L), uC(D,L)     uR(D,R), uC(D,R)
Dominant alternatives and social dilemmas
  Prisoner's dilemma
–  (NC,NC) is mutually dominant
•  Players' choices are independent of information regarding the other player's move
–  (NC,NC) is socially dominated by (C,C)
  Airport network security

                       C
                  C          NC
   R    C       5, 5      -5, 10
        NC     10, -5     -2, -2 *
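A minimal sketch of the dominance check on this bimatrix:

```python
# Row payoffs uR and column payoffs uC for the prisoner's dilemma above
uR = {("C", "C"): 5, ("C", "NC"): -5, ("NC", "C"): 10, ("NC", "NC"): -2}
uC = {("C", "C"): 5, ("C", "NC"): 10, ("NC", "C"): -5, ("NC", "NC"): -2}

def dominant_row_action(u, actions=("C", "NC")):
    # an action strictly better than every other action against each column
    for a in actions:
        if all(u[(a, c)] > u[(b, c)] for b in actions if b != a for c in actions):
            return a
    return None

print(dominant_row_action(uR))  # NC: not cooperating dominates for the row player
# The column player's check is symmetric (swap the roles of the indices)
```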
Iterative dominance
  No dominant strategy for either player, however
–  There are iterative dominated strategies
•  L > R
•  Now M is dominant in the restricted game
-  M > U and M > D
•  Now L > C in the restricted game
-  20 > - 10
–  (M,L) is the solution by iterative elimination of (strictly) dominated strategies
•  Common knowledge and rationality assumptions
  Exercise
–  Find if there is a solution by iteratively eliminating dominated strategies
Solution: (D,C)
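The 3x3 matrices for this slide and the exercise are in figures that did not survive extraction; below is a minimal generic sketch of the elimination procedure, run on the earlier 2x2 bimatrix example, where it also collapses to a single cell:

```python
# payoff[(r, c)] = (row player's utility, column player's utility)

def strictly_dominated(actions, opponents, util):
    """Return an action strictly dominated by another action, if any."""
    for a in actions:
        for b in actions:
            if b != a and all(util(b, o) > util(a, o) for o in opponents):
                return a
    return None

def iterated_elimination(payoff, rows, cols):
    while True:
        r = strictly_dominated(rows, cols, lambda a, c: payoff[(a, c)][0])
        if r is not None:
            rows = [x for x in rows if x != r]
            continue
        c = strictly_dominated(cols, rows, lambda a, r: payoff[(r, a)][1])
        if c is not None:
            cols = [x for x in cols if x != c]
            continue
        return rows, cols

# The earlier 2x2 bimatrix: L is dominated for C, then D for R, leaving (U, R)
payoff = {("U", "L"): (0, 2), ("U", "R"): (5, 4),
          ("D", "L"): (10, 3), ("D", "R"): (3, 8)}
print(iterated_elimination(payoff, ["U", "D"], ["L", "R"]))  # (['U'], ['R'])
```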
Nash equilibrium
  Games without
–  Dominant solution
–  Solution by iterative elimination of dominated alternatives
Battle of the sexes

                  Ballet      Concert
   Ballet         1, 2 *      0, 0
   Concert        0, 0        2, 1 *

Matching pennies

                  Head        Tails
   Head           1, -1      -1, 1
   Tails         -1, 1        1, -1
Existence of Nash equilibrium (Nash)
  Every finite game has a NE in mixed strategies
–  Requires extending the original set of alternatives of each player
  Consider the matching pennies game
–  Mixed strategies
•  Choosing a lottery with given probabilities over Head and Tails
–  Players’ choice sets defined by the lottery’s probability
•  Row: p in [0,1]
•  Column: q in [0,1]
–  Payoff associated with a pair of strategies (p,q) is
•  (p, 1-p) P (q, 1-q)^T
where P is the payoff matrix for the original game in pure strategies
•  Payoffs need to be vNM utilities
–  Nash equilibrium (p*, q*)
•  Intersection of players' best-response correspondences:
uR(p*, q*) ≥ uR(p, q*) for all p
uC(p*, q*) ≥ uC(p*, q) for all q
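A minimal sketch of the payoff formula for matching pennies; the equilibrium (p*, q*) = (1/2, 1/2) is the standard result and illustrates the indifference property:

```python
import numpy as np

P = np.array([[1, -1],
              [-1, 1]])  # row player's payoffs in matching pennies; column gets -P

def u_row(p, q):
    # expected payoff (p, 1-p) P (q, 1-q)^T of the mixed strategy pair
    return float(np.array([p, 1 - p]) @ P @ np.array([q, 1 - q]))

# At (p*, q*) = (1/2, 1/2) the row player is indifferent among all p,
# so no unilateral deviation improves on the equilibrium payoff of 0:
print(u_row(0.5, 0.5))                            # 0.0
print([u_row(p, 0.5) for p in (0.0, 0.25, 1.0)])  # all 0.0
```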
Nash equilibria concept as predictive tool
  Supporting the row player against the column player
  Games with multiple NEs

                       C
                  L              R
   R    U      4, -100        10, 6 *
        D      12, 8 *         5, 4

  Two NEs: (D,L) and (U,R)
  (D,L) > (U,R), since 12 > 10 and 8 > 6
  C may prefer to play R
–  to protect himself against -100
  Knowing this, R would prefer to play U
–  ending up at the inferior NE (U,R)
  How can we model C's behavior?
–  Bayesian K-level thinking
K-level thinking
  Row is not sure about Column's move
–  p: Row's belief that C moves L
–  Row's SEU
•  U: 4 p + 10 (1-p)
•  D: 12 p + 5 (1-p)
–  U > D iff p < 5/13 ≈ 0.38
  How to elicit p?
–  Row's analysis of Column's decision
•  Assuming C behaves as a SEU maximizer
•  q: C's belief that Row is smart enough to choose D (best NE)
•  SEU of L: -100 (1-q) + 8 q
•  SEU of R: 6 (1-q) + 4 q
•  L > R iff q > 53/55 ≈ 0.96
•  Since Row does not know q, his beliefs about q are represented by a CDF F
•  p = Pr( q > 53/55 ) = 1 - F( 53/55 )
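A minimal sketch of these calculations; the uniform distribution for F is an illustrative choice, not from the slides:

```python
from fractions import Fraction

# Row: U > D iff 4p + 10(1-p) > 12p + 5(1-p), i.e. p < 5/13
p_threshold = Fraction(5, 13)

# Column: L > R iff -100(1-q) + 8q > 6(1-q) + 4q, i.e. q > 53/55
q_threshold = Fraction(53, 55)

# If Row's beliefs about q are uniform on [0, 1], then
# p = Pr(q > 53/55) = 1 - F(53/55) = 2/55
p = 1 - q_threshold
print(float(p_threshold), float(q_threshold), float(p))  # 0.385 0.964 0.036
# Since p = 0.036 < 5/13, Row plays U, as argued on the previous slide
```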
Simultaneous vs sequential games
  First mover advantage
–  Both players want to move first
•  Credible commitment/threat
Game of Chicken
  Second mover advantage
–  Players want to observe their opponent’s move
before acting
–  Both players try not to disclose their moves
Matching pennies game
Dynamic games: backward induction
  Sequential Defend-Attack games
–  Two intelligent players
•  Defender and Attacker
–  Sequential moves
•  First Defender, afterwards Attacker knowing Defender’s decision
Standard Game Theoretic Analysis
Expected utilities at node S
Attacker's best decision at node A
Assuming Defender knows Attacker’s analysis
Defender’s best decision at node D
Solution:
Supporting a SEU maximizer Defender
Defender’s problem
Defender’s solution of maximum SEU
Modeling input:
??
Example: Banks-Anderson (2006)
  Exploring how to defend US against a possible smallpox attack
–  Random costs (payoffs)
–  Conditional probabilities of each kind of smallpox attack,
given terrorists know what defence has been adopted
•  This is the problematic step of the analysis
–  Compute expected cost of each defence strategy
  Solution: defence of minimum expected cost
Predicting the Attacker's decision
[Influence diagrams: the Defender's problem and the Defender's view of the Attacker's problem]
Solving the assessment problem
  Defender's view of the Attacker's problem
–  A is an EU maximizer
–  Elicitation of D's beliefs about the Attacker's probabilities and utilities
–  MC simulation
Bayesian decision solution for the sequential Defend-Attack model
Standard Game Theory vs. Bayesian Decision Analysis
  Decision Analysis (unitary DM)
–  Use of decision trees
–  Opponent's actions treated as random variables
•  How to elicit probs on opponents' decisions??
•  Sensitivity analysis on (problematic) probabilities
  Game theory (multiple DMs)
–  Use of game trees
–  Opponent's actions treated as decision variables
–  All players are EU maximizers
•  Do we really know the utilities our opponents try to maximize?
Bayesian decision analysis approach to games
  One-sided prescriptive support
–  Use a prescriptive model (SEU) for supporting one of the DMs
–  Treat opponent's decisions as uncertainties
–  Assess probs over opponent's possible actions
–  Compute action of maximum expected utility
  The 'real' Bayesian approach to games (Kadane & Larkey 1982)
–  Weaken common (prior) knowledge assumption
 How to assess a prob distribution over actions of intelligent others??
–  “Adversarial Risk Analysis” (DRI, DB and JR)
–  Development of new methods for the elicitation of probs on adversary’s actions
•  by modeling the adversary’s decision reasoning
-  Descriptive decision models
Relevance to counterbioterrorism
  Biological Threat Risk Assessment for DHS (Battelle, 2006)
–  Based on Probability Event Trees (PET)
•  Government & Terrorists’ decisions treated as random events
  Methodological improvements study (NRC committee)
–  PET appropriate for risk assessment of
•  random failures in engineering systems,
but not for adversarial risk assessment
•  Terrorists are intelligent adversaries
trying to achieve their own objectives
•  Their decisions (if rational) can to some extent be anticipated
–  PET cannot be used for a full risk management analysis
•  Government is a decision maker, not a random variable
Methodological improvement recommendations
  Distinction between risks from
–  Nature/Accidents vs.
–  Actions of intelligent adversaries
  Need for models to predict Terrorists' behavior
–  Red team role playing (simulations of adversaries thinking)
–  Attack-preference models
•  Examine decision from Attacker viewpoint (T as DM)
–  Decision analytic approaches
•  Transform the PET into a decision tree (G as DM)
-  How to elicit probs on terrorist decisions??
-  Sensitivity analysis on (problematic) probabilities
-  Von Winterfeldt and O'Sullivan (2006)
–  Game theoretic approaches
•  Transform the PET into a game tree (G & T as DMs)
Models to predict opponents’ behavior
  Role playing (simulations of adversaries thinking)
  Opponent-preference models
–  Examine the decision from the opponent's viewpoint
•  Elicit the opponent's probs and utilities from our viewpoint (point estimates)
–  Treat the opponent as an EU maximizer (= rationality?)
•  Solve the opponent's decision problem by finding his action of max. EU
–  Assuming we know the opponent's true probs and utilities,
•  we can anticipate with certitude what the opponent will do
  Probabilistic prediction models
–  Acknowledge our uncertainty about the opponent's thinking
Opponent-preference models
  Von Winterfeldt and O’Sullivan (2006)
–  Should We Protect Commercial Airplanes Against Surface-to-Air Missile Attacks by Terrorists?
–  Decision tree + sensitivity analysis on probs
Parnell (2007)
  Elicit Terrorist’s probs and utilities from our viewpoint
–  Point estimates
  Solve Terrorist’s decision problem
–  Finding Terrorist’s action that gives him max. expected utility
  Assuming we know the Terrorist’s true probs and utilities
–  We can anticipate with certitude what the terrorist will do
Parnell (2007)
  Terrorist decision tree
Paté-Cornell & Guikema (2002)
[Influence diagram: the Attacker's and the Defender's decision problems]
Paté-Cornell & Guikema (2002)
  Assessing probabilities of terrorist’s actions
–  From the Defender viewpoint
•  Model the Attacker’s decision problem
•  Estimate Attacker’s probs and utilities (point estimates)
•  Calculate expected utilities of attacker’s actions
–  Prob of attacker’s actions proportional to their perceived EU
  Feed these probs into the Defender’s decision problem
–  Uncertainty of Attacker’s decisions has been quantified
–  Choose defense of maximum expected utility
  Shortcoming
–  If the (idealized) adversary is an EU maximizer, he would certainly choose the attack of max expected utility
How to assess probabilities over the actions of an intelligent adversary??
  Raiffa (2002): Asymmetric prescriptive/descriptive approach
–  Prescriptive advice to one party conditional on
a (probabilistic) description of how others will behave
–  Assess probability distribution from experimental data
•  Lab role simulation experiments
  Rios Insua, Rios & Banks (2009)
–  Assessment based on an analysis of the adversary's rational behavior
•  Assuming the opponent is a SEU maximizer
-  Model his decision problem
-  Assess his probabilities and utilities
-  Find his action of maximum expected utility
–  Uncertainty in the Attacker's decision stems from
•  our uncertainty about his probabilities and utilities
–  Sources of information
•  Available past statistical data on the Attacker's decision behavior
•  Expert knowledge / Intelligence
The Defend–Attack–Defend model
  Two intelligent players
–  Defender and Attacker
  Sequential moves
–  First, Defender moves
–  Afterwards, Attacker knowing Defender’s move
–  Afterwards, Defender again responding to attack
  Infinite regress
Standard Game Theory Analysis
  Under common knowledge of utilities and probs
  At node
  Expected utilities at node S
  Attacker's best decision at node A
  Defender's best decision at node
  Nash Solution:
Supporting the Defender against the Attacker
  At node
  Expected utilities at node S
  At node A
  Defender's best decision at node
??
Predicting
  Attacker’s problem as seen by the Defender
Assessing
Given
Monte-Carlo approximation of
 Drawn
 Generate
by
 Approximate
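The formulas on this slide did not survive extraction; below is a minimal sketch of the Monte Carlo idea, with purely illustrative distributions for the Defender's beliefs about the Attacker's probabilities and utilities:

```python
import random
from collections import Counter

attacks = ["attack_1", "attack_2", "no_attack"]

def draw_attacker_expected_utilities():
    # One draw from the Defender's beliefs about the Attacker's
    # success probabilities and utilities (illustrative distributions)
    eu = {}
    for a in attacks[:2]:
        p_success = random.betavariate(2, 3)
        u_success = random.uniform(0, 1)
        eu[a] = p_success * u_success
    eu["no_attack"] = 0.2  # illustrative sure payoff of not attacking
    return eu

# Generate the Attacker's optimal action for each draw and approximate
# the Defender's predictive distribution by the empirical frequencies
N = 10_000
counts = Counter()
for _ in range(N):
    eu = draw_attacker_expected_utilities()
    counts[max(eu, key=eu.get)] += 1

print({a: counts[a] / N for a in attacks})
```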
The assessment of
  The Defender may want to exploit information about how the Attacker analyzes her problem
  Hierarchy of recursive analysis
–  Infinite regress
–  Stop when there is no more information to elicit
Games with private information
  Example:
–  Consider the following two-person simultaneous game with asymmetric information
•  Player 1 (row) knows whether he is stronger than player 2 (column),
but player 2 does not know this
•  A player's type is used to represent information privately known by that player
Bayes Nash Equilibrium
  Assumption
–  common prior over the row player's type:
•  Column's beliefs about the row player's type are common knowledge
•  Why would Column disclose this information?
•  Why would Row believe that Column is disclosing her true beliefs about his type?
  Row’s strategy function
Is the common knowledge assumption realistic?
–  Column is better off reporting that
Modeling opponents' learning of private information
  Simultaneous decisions
–  Bayes Nash Equilibrium
–  No opportunity to learn about this information
  Sequential decisions
•  Perfect Bayesian equilibrium/Sequential rationality
•  Opportunity to learn from the observed decision behavior
-  Signaling games
  Models of adversaries' thinking to anticipate their decision behavior
–  need to model opponents' learning of private information we want to keep secret
–  how would this lead to a predictive probability distribution?
Sequential Defend-Attack model with Defender’s private information
  Two intelligent players
–  Defender and Attacker
  Sequential moves
–  First Defender, afterwards Attacker knowing Defender’s decision
  Defender’s decision takes into account her private information
–  The vulnerabilities and importance of sites she wants to protect
–  The position of ground soldiers in the data ferry control problem (ITA)
  Attacker observes Defender’s decision
–  Attacker can infer/learn about information she wants to keep secret
  How to model the Attacker's learning?
Influence diagram vs. game tree representation
A game theoretic analysis
A game theoretic solution
Supporting the Defender
  We weaken the common knowledge assumption
  The Defender’s decision problem
[Influence diagram: Defender's decision D, chance node S, Attacker's decision A treated as uncertain (??), value node V]
Defender’s solution
Predicting the Attacker’s move:
Attacker's action of maximum expected utility (MEU)
Assessing
How to stop this hierarchy of recursive analysis?
  Potentially infinite analysis of nested decision models
–  where to stop?
•  Accommodate as much information as we can
•  Stop when the Defender has no more information
•  Non-informative or reference model
•  Sensitivity analysis test