
Making Simple Decisions
Chapter 16
Some material borrowed from Jean-Claude Latombe and Daphne Koller by way of Marie desJardins
Topics
• Decision making under uncertainty
– Utility theory and rationality
– Expected utility
– Utility functions
– Multiattribute utility functions
– Preference structures
– Decision networks
– Value of information
Uncertain Outcomes of Actions
• Some actions may have uncertain outcomes
– Action: spend $10 to buy a lottery ticket that pays $1000 to the winner
– Outcome: {win, not-win}
• Each outcome is associated with some merit (utility)
– Win: gain $990
– Not-win: lose $10
• There is a probability distribution over the outcomes of this action: P(win) = 0.0001, P(not-win) = 0.9999
• Should I take this action?
Expected Utility
• Random variable X with n values x1,…,xn and distribution
(p1,…,pn)
– X is the outcome of performing action A (i.e., the state reached
after A is taken)
• Function U of X
– U is a mapping from states to numerical utilities (values)
• The expected utility of performing action A is
EU[A] = Σi=1,…,n P(xi | A) U(xi)
where P(xi | A) is the probability of each outcome and U(xi) is the utility of each outcome
• Expected utility of the lottery: 0.0001*990 – 0.9999*10 = –9.9
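As a quick check of that arithmetic, a minimal Python sketch (the expected_utility helper is illustrative, not from the slides):

```python
# Expected utility: EU[A] = sum over outcomes of P(outcome | A) * U(outcome)
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

# Lottery action: winning nets $990 with probability 0.0001; otherwise we lose the $10 ticket price
lottery = [(0.0001, 990), (0.9999, -10)]
print(expected_utility(lottery))  # ≈ -9.9: a risk-neutral agent should not buy the ticket
```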
One State/One Action Example
• From state s0, action A1 leads to states s1, s2, s3 with probabilities 0.2, 0.7, 0.1 and utilities 100, 50, 70
• U(S0|A1) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1
= 20 + 35 + 7
= 62
One State/Two Actions Example
• From state s0:
– Action A1 leads to s1, s2, s3 with probabilities 0.2, 0.7, 0.1 and utilities 100, 50, 70
– Action A2 leads to s2, s4 with probabilities 0.2, 0.8 and utilities 50, 80
• U1(S0|A1) = 62
• U2(S0|A2) = 50 x 0.2 + 80 x 0.8 = 74
• U(S0) = max{U1(S0|A1), U2(S0|A2)} = 74
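A minimal Python sketch of the same comparison (the eu helper is illustrative; the action labels mirror the example):

```python
# Expected utilities for the two actions available in state s0
actions = {
    "A1": [(0.2, 100), (0.7, 50), (0.1, 70)],  # (probability, utility) per resulting state
    "A2": [(0.2, 50), (0.8, 80)],
}

def eu(outcomes):
    return sum(p * u for p, u in outcomes)

print({a: eu(o) for a, o in actions.items()})      # ≈ {'A1': 62, 'A2': 74}
print(max(actions, key=lambda a: eu(actions[a])))  # A2, since U(S0) = max over actions
```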
MEU Principle
• Decision theory: A rational agent should choose the action
that maximizes the agent’s expected utility
• Maximizing expected utility (MEU) is a normative criterion
for rational choices of actions
• Must have complete model of:
– Actions
– States
– Utilities
• Even with a complete model, computing expected utilities exactly may be computationally intractable
Comparing outcomes
• Which is better:
A = Being rich and sunbathing where it’s warm
B = Being rich and sunbathing where it’s cool
C = Being poor and sunbathing where it’s warm
D = Being poor and sunbathing where it’s cool
• Multiattribute utility theory
– Some comparisons are clear: A > B, A > C, A > D, C > D. What about B vs. C?
– Simplest case: additive value function (just add the individual attribute utilities; see the sketch below)
– Others use a weighted utility based on the relative importance of the attributes
– Learning the combined utility function (similar to a joint probability table)
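A minimal sketch of additive and weighted multiattribute value functions; the attribute scores and weights below are illustrative assumptions, not values from the slides:

```python
# Each outcome is scored on several attributes; here: wealth and warmth (illustrative scales)
outcomes = {
    "A": {"wealth": 1.0, "warmth": 1.0},  # rich, warm
    "B": {"wealth": 1.0, "warmth": 0.0},  # rich, cool
    "C": {"wealth": 0.0, "warmth": 1.0},  # poor, warm
    "D": {"wealth": 0.0, "warmth": 0.0},  # poor, cool
}

def additive_value(attrs):
    # Simplest case: just add the individual attribute utilities
    return sum(attrs.values())

def weighted_value(attrs, weights):
    # Weight each attribute by its (assumed) relative importance
    return sum(weights[a] * v for a, v in attrs.items())

weights = {"wealth": 0.8, "warmth": 0.2}  # illustrative weights
for name, attrs in outcomes.items():
    print(name, additive_value(attrs), weighted_value(attrs, weights))
# Under these weights B (rich, cool) scores higher than C (poor, warm)
```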
Multiattribute Utility Theory
• A given state may have multiple utilities
– ...because of multiple evaluation criteria
– ...because of multiple agents (interested parties) with
different utility functions
Decision networks
• Extend Bayesian nets to handle actions and utilities
– a.k.a. influence diagrams
• Make use of Bayesian net inference
• Useful application: Value of Information
R&N example
Decision network representation
• Chance nodes: random variables, as in Bayesian nets
• Decision nodes: actions that decision maker can take
• Utility/value nodes: the utility of the outcome state.
Evaluating decision networks
• Set the evidence variables for the current state.
• For each possible value of the decision node (assume there is just one decision node):
– Set the decision node to that value.
– Calculate the posterior probabilities for the parent nodes of the utility node, using BN inference.
– Calculate the resulting expected utility for the action.
• Return the action with the highest expected utility (see the sketch below).
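A minimal sketch of this evaluation loop in Python, using the umbrella network that appears on the following slides (the encodings and helper names are illustrative; no evidence is observed here, so the posterior over Weather is just its prior):

```python
# Evaluate a simple decision network: for each value of the (single) decision node,
# compute the expected utility of the outcome and return the best action.
p_rain = 0.4                                   # chance node Weather (posterior = prior, no evidence)
utility = {("lug", "rain"): -25, ("lug", "no rain"): 0,       # utility node Happiness
           ("no lug", "rain"): -100, ("no lug", "no rain"): 100}

def expected_utility(decision):
    # "Lug umbrella" is a deterministic function of the decision
    lug = "lug" if decision == "take" else "no lug"
    return utility[(lug, "rain")] * p_rain + utility[(lug, "no rain")] * (1 - p_rain)

eus = {d: expected_utility(d) for d in ("take", "don't take")}
print(eus)                      # {'take': -10.0, "don't take": 20.0}
print(max(eus, key=eus.get))    # don't take
```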
Exercise: Umbrella network
• Decision node Umbrella: take / don't take
• Chance node Weather: P(rain) = 0.4
• Deterministic node Lug umbrella: P(lug|take) = 1.0, P(~lug|~take) = 1.0
• Chance node Forecast, with CPT p(f|w):
– p(sunny|rain) = 0.3, p(rainy|rain) = 0.7
– p(sunny|no rain) = 0.8, p(rainy|no rain) = 0.2
• Utility node Happiness:
– U(lug, rain) = -25, U(lug, ~rain) = 0
– U(~lug, rain) = -100, U(~lug, ~rain) = 100
• EU(take)
= U(lug, rain)*P(lug)*p(rain) + U(lug, ~rain)*P(lug)*p(~rain)
= -25*0.4 + 0*0.6
= -10
• EU(~take)
= U(~lug, rain)*P(~lug)*p(rain) + U(~lug, ~rain)*P(~lug)*p(~rain)
= -100*0.4 + 100*0.6
= 20
Umbrella network
• The decision may be helped by the forecast (additional information)
• The decision can now depend on the observed forecast; the optimal policy (derived in the exercise below) is D(F=Sunny) = Not_Take, D(F=Rainy) = Take
• The rest of the network is unchanged:
– P(rain) = 0.4
– P(lug|take) = 1.0, P(~lug|~take) = 1.0
– U(lug, rain) = -25, U(lug, ~rain) = 0, U(~lug, rain) = -100, U(~lug, ~rain) = 100
– Forecast CPT: p(sunny|rain) = 0.3, p(rainy|rain) = 0.7, p(sunny|no rain) = 0.8, p(rainy|no rain) = 0.2
Value of Perfect Information (VPI)
• How much is it worth to observe (with certainty) a random
variable X?
• Suppose the agent's current knowledge is E. The value of the current best action α is:
EU(α | E) = maxA ∑i U(Resulti(A)) P(Resulti(A) | E, Do(A))
• The value of the new best action α′ after observing the value of X is:
EU(α′ | E, X) = maxA ∑i U(Resulti(A)) P(Resulti(A) | E, X, Do(A))
• …But we don't know the value of X yet, so we have to sum over its possible values
• The value of perfect information for X is therefore:
VPI(X) = ( ∑k P(xk | E) EU(αxk | xk, E) ) – EU(α | E)
– P(xk | E): probability of each value of X
– EU(αxk | xk, E): expected utility of the best action given that value of X
– EU(α | E): expected utility of the best action if we don't know X (i.e., currently)
Exercise: Umbrella network
p(W|F) = α*p(F|W)*P(W)
p(sunny|rain)*p(rain) = 0.3*0.4 = 0.12
p(sunny|~rain)*p(~rain) = 0.8*0.6 = 0.48
α = 1/(0.12+0.48) = 5/3
p(rain|sunny) = 0.12*5/3 = 0.2
p(~rain|sunny) = 0.48*5/3 = 0.8
Similarly, p(rainy|rain)*p(rain) = 0.7*0.4 = 0.28 and p(rainy|~rain)*p(~rain) = 0.2*0.6 = 0.12, so α = 1/(0.28+0.12) = 2.5 and
p(rain|rainy) = 0.28*2.5 = 0.7
p(~rain|rainy) = 0.12*2.5 = 0.3
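A quick Python check of these posteriors (the dictionary layout and names are illustrative):

```python
# Posterior over Weather given the Forecast, via Bayes' rule with normalization
p_rain = 0.4
p_forecast_given_weather = {               # p(f | w)
    ("sunny", "rain"): 0.3, ("rainy", "rain"): 0.7,
    ("sunny", "no rain"): 0.8, ("rainy", "no rain"): 0.2,
}

def posterior(forecast):
    joint = {
        "rain": p_forecast_given_weather[(forecast, "rain")] * p_rain,
        "no rain": p_forecast_given_weather[(forecast, "no rain")] * (1 - p_rain),
    }
    z = sum(joint.values())                # normalizing constant (1/alpha)
    return {w: pr / z for w, pr in joint.items()}

print(posterior("sunny"))   # ≈ {'rain': 0.2, 'no rain': 0.8}
print(posterior("rainy"))   # ≈ {'rain': 0.7, 'no rain': 0.3}
```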
EU(take|f=sunny)
= -25*P(rain|sunny) + 0*P(~rain|sunny)
= -25*0.2 = -5
EU(~take|f=sunny)
= -100*0.2 + 100*0.8
= 60
Best action given a sunny forecast: αsunny = ~take
EU(take|f=rainy)
= -25*P(rain|rainy) + 0*P(~rain|rainy)
= -25*0.7 = -17.5
EU(~take|f=rainy)
= -100*0.7 + 100*0.3
= -40
Best action given a rainy forecast: αrainy = take
P(f=sunny) = 0.3*0.4 + 0.8*0.6 = 0.6, P(f=rainy) = 0.4, and without the forecast EU(α) = EU(~take) = 20, so
VPI(F) = 60*P(f=sunny) + (–17.5)*P(f=rainy) – 20
= 60*0.6 – 17.5*0.4 – 20
= 9
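Putting the pieces together, a minimal end-to-end check of VPI(Forecast) for this network (the boolean encodings and helper names are illustrative assumptions):

```python
# Value of perfect information for the Forecast node in the umbrella network
p_rain = 0.4
p_f_given_w = {("sunny", True): 0.3, ("rainy", True): 0.7,      # p(forecast | rain?)
               ("sunny", False): 0.8, ("rainy", False): 0.2}
utility = {(True, True): -25, (True, False): 0,                  # U(lug?, rain?)
           (False, True): -100, (False, False): 100}

def eu(lug, p_rain_now):
    """Expected utility of lugging / not lugging the umbrella given P(rain)."""
    return utility[(lug, True)] * p_rain_now + utility[(lug, False)] * (1 - p_rain_now)

# Best expected utility with no forecast observed: max(-10, 20) = 20
eu_no_info = max(eu(True, p_rain), eu(False, p_rain))

# Sum over forecast values: probability of the forecast times EU of the best action given it
vpi = -eu_no_info
for f in ("sunny", "rainy"):
    p_f = p_f_given_w[(f, True)] * p_rain + p_f_given_w[(f, False)] * (1 - p_rain)
    p_rain_given_f = p_f_given_w[(f, True)] * p_rain / p_f       # Bayes' rule
    vpi += p_f * max(eu(True, p_rain_given_f), eu(False, p_rain_given_f))

print(vpi)   # ≈ 9.0
```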