Rational decisions

Rational preferences
Idea: preferences of a rational agent must obey constraints.
Rational preferences ⇒
behavior describable as maximization of expected utility
Constraints:

Orderability
(A ≻ B) ∨ (B ≻ A) ∨ (A ∼ B)

Transitivity
(A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)

Continuity
A ≻ B ≻ C ⇒ ∃ p [p, A; 1 − p, C] ∼ B

Substitutability
A ∼ B ⇒ [p, A; 1 − p, C] ∼ [p, B; 1 − p, C]

Monotonicity
A ≻ B ⇒ (p ≥ q ⇔ [p, A; 1 − p, B] ≿ [q, A; 1 − q, B])
Chapter 16
Outline

♦ Rational preferences
♦ Utilities
♦ Money
♦ Multiattribute utilities
♦ Decision networks
♦ Value of information

Rational preferences contd.

Violating the constraints leads to self-evident irrationality

For example: an agent with intransitive preferences can be induced to give
away all its money:

If B ≻ C, then an agent who has C would pay (say) 1 cent to get B
If A ≻ B, then an agent who has B would pay (say) 1 cent to get A
If C ≻ A, then an agent who has A would pay (say) 1 cent to get C

[Figure: money-pump cycle A → B → C → A, 1 cent paid per trade]
Preferences

An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations
with uncertain prizes

Lottery L = [p, A; (1 − p), B]

[Figure: lottery L drawn as a chance node with branches p → A and (1 − p) → B]

Notation:
A ≻ B    A preferred to B
A ∼ B    indifference between A and B
A ≿ B    B not preferred to A

Maximizing expected utility

Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944):
Given preferences satisfying the constraints
there exists a real-valued function U such that
U(A) ≥ U(B) ⇔ A ≿ B
U([p1, S1; . . . ; pn, Sn]) = Σi pi U(Si)

MEU principle:
Choose the action that maximizes expected utility

Note: an agent can be entirely rational (consistent with MEU)
without ever representing or manipulating utilities and probabilities

E.g., a lookup table for perfect tic-tac-toe
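The MEU computation itself is mechanical. A minimal sketch (not from the slides; the outcomes, utilities, and candidate actions below are invented for illustration):

```python
def expected_utility(lottery, U):
    """lottery: list of (probability, outcome) pairs; U: outcome -> utility."""
    assert abs(sum(p for p, _ in lottery) - 1.0) < 1e-9
    return sum(p * U(s) for p, s in lottery)

def meu_action(actions, U):
    """actions: dict mapping action name -> lottery. Returns the MEU action."""
    return max(actions, key=lambda a: expected_utility(actions[a], U))

# Hypothetical utilities and actions:
U = {"win": 1.0, "draw": 0.6, "lose": 0.0}.__getitem__
actions = {
    "safe":  [(1.0, "draw")],                 # sure draw, EU = 0.6
    "risky": [(0.5, "win"), (0.5, "lose")],   # even-odds gamble, EU = 0.5
}
print(meu_action(actions, U))  # "safe"
```

Here the sure draw beats the even-odds gamble because 0.6 > 0.5.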
Utilities

Utilities map states to real numbers. Which numbers?

Standard approach to assessment of human utilities:
compare a given state A to a standard lottery Lp that has
“best possible prize” u⊤ with probability p
“worst possible catastrophe” u⊥ with probability (1 − p)
adjust lottery probability p until A ∼ Lp

E.g., “pay $30” ∼ L, where L yields “continue as before” with probability
0.999999 and “instant death” with probability 0.000001

Student group utility

For each x, adjust p until half the class votes for lottery (M = 10,000)

[Figure: empirical class utility curve, indifference probability p (0.0–1.0)
plotted against prize $x from 0 to 10,000]

Utility scales
Normalized utilities: u⊤ = 1.0, u⊥ = 0.0

Micromorts: one-millionth chance of death
useful for Russian roulette, paying to reduce product risks, etc.

QALYs: quality-adjusted life years
useful for medical decisions involving substantial risk

Note: behavior is invariant w.r.t. positive linear transformation
U′(x) = k1 U(x) + k2   where k1 > 0

With deterministic prizes only (no lottery choices), only
ordinal utility can be determined, i.e., total order on prizes

Decision networks

Add action nodes and utility nodes to belief networks
to enable rational decision making

[Figure: airport-siting decision network, with action node Airport Site,
chance nodes Air Traffic, Litigation, Construction, Deaths, Noise, Cost,
and a utility node U]

Algorithm:
For each value of action node
compute expected value of utility node given action, evidence
Return MEU action
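The evaluation algorithm above can be sketched on a toy network; the sites, conditional probabilities, and utilities below are hypothetical:

```python
# Toy decision network: action node Site, one chance node Cost whose
# distribution depends on the chosen site, and a utility node U(Cost).
P_cost = {  # P(Cost | Site)
    "S1": {"low": 0.6, "high": 0.4},
    "S2": {"low": 0.3, "high": 0.7},
}
U = {"low": 10.0, "high": 2.0}  # utility node

def site_eu(site):
    """Expected value of the utility node given this action value."""
    return sum(p * U[c] for c, p in P_cost[site].items())

def meu_site():
    # For each value of the action node, compute expected utility; return best.
    return max(P_cost, key=site_eu)

print(meu_site())  # "S1": EU 6.8 vs 4.4 for S2
```

With evidence nodes, `site_eu` would condition each chance-node distribution on the observed evidence as well; the loop over action values is unchanged.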
Money

Money does not behave as a utility function

Given a lottery L with expected monetary value EMV(L),
usually U(L) < U(EMV(L)), i.e., people are risk-averse

Utility curve: for what probability p am I indifferent between a prize x and
a lottery [p, $M; (1 − p), $0] for large M?

Typical empirical data, extrapolated with risk-prone behavior:

[Figure: empirical utility-of-money curve, U plotted against $ from
−$150,000 to +$800,000, concave over gains]

Multiattribute utility

How can we handle utility functions of many variables X1 . . . Xn?
E.g., what is U(Deaths, Noise, Cost)?

How can complex utility functions be assessed from
preference behaviour?

Idea 1: identify conditions under which decisions can be made without
complete identification of U(x1, . . . , xn)

Idea 2: identify various types of independence in preferences
and derive consequent canonical forms for U(x1, . . . , xn)
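The risk-aversion inequality U(L) < U(EMV(L)) is easy to check numerically for any concave utility; the square-root utility below is a stand-in for illustration, not from the slides:

```python
import math

def U(x):          # a concave (risk-averse) utility of money
    return math.sqrt(x)

lottery = [(0.5, 0.0), (0.5, 100.0)]           # fair coin: $0 or $100
emv = sum(p * x for p, x in lottery)           # 50.0
u_lottery = sum(p * U(x) for p, x in lottery)  # 0.5*0 + 0.5*10 = 5.0
print(u_lottery < U(emv))  # True: 5.0 < sqrt(50) ~ 7.07
```

The gap U(EMV(L)) − U(L) is what makes insurance viable: a risk-averse agent will pay more than the expected loss to avoid the lottery.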
Strict dominance

Typically define attributes such that U is monotonic in each

Strict dominance: choice B strictly dominates choice A iff
∀ i Xi(B) ≥ Xi(A) (and hence U(B) ≥ U(A))

[Figure: attribute space (X1, X2); with deterministic attributes, the region
above and to the right of A dominates A; with uncertain attributes, choices
B, C, D compared against A]

Strict dominance seldom holds in practice

Exercise: label the arcs + or −

[Figure: car insurance belief network with nodes SocioEcon, Age, GoodStudent,
ExtraCar, Mileage, RiskAversion, SeniorTrain, VehicleYear, MakeModel,
DrivingSkill, DrivingHist, Antilock, DrivQuality, CarValue, HomeBase,
AntiTheft, Airbag, Accident, Ruggedness, Theft, OwnDamage, Cushioning,
OwnCost, OtherCost, MedicalCost, LiabilityCost, PropertyCost]
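The ∀i condition above translates directly into code; the sites and attribute scores below are hypothetical, with attributes scaled so that larger is better:

```python
def dominates(B, A):
    """True iff every attribute of B is at least as good as in A
    (the slide's condition, which implies U(B) >= U(A))."""
    return all(xb >= xa for xb, xa in zip(B, A))

# Hypothetical airport sites scored on (negative cost, safety, quiet):
S1 = (-4.0, 0.9, 0.7)
S2 = (-5.0, 0.8, 0.7)
print(dominates(S1, S2), dominates(S2, S1))  # True False
```

Costs are negated so that "bigger is better" holds uniformly, matching the monotonicity convention above.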
Stochastic dominance

[Figure: cost distributions for sites S1 and S2 plotted against negative cost
from −6 to −2: probability densities on the left, cumulative distributions
on the right]

Distribution p1 stochastically dominates distribution p2 iff
∀ t  ∫_{−∞}^{t} p1(x) dx ≤ ∫_{−∞}^{t} p2(x) dx

If U is monotonic in x, then A1 with outcome distribution p1
stochastically dominates A2 with outcome distribution p2:
∫_{−∞}^{∞} p1(x) U(x) dx ≥ ∫_{−∞}^{∞} p2(x) U(x) dx

Multiattribute case: stochastic dominance on all attributes ⇒ optimal
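For discrete distributions on a common support, the CDF condition above becomes a running-sum comparison; the two cost distributions below are invented for illustration:

```python
from itertools import accumulate

def stochastically_dominates(p1, p2):
    """p1, p2: probability lists over the same ascending outcome values.
    p1 dominates p2 iff the CDF of p1 never exceeds that of p2."""
    F1, F2 = list(accumulate(p1)), list(accumulate(p2))
    return all(f1 <= f2 + 1e-12 for f1, f2 in zip(F1, F2))

# Hypothetical site-cost distributions over negative cost {-6, -5, -4, -3}:
pS1 = [0.1, 0.2, 0.3, 0.4]   # mass shifted toward better (less negative) values
pS2 = [0.4, 0.3, 0.2, 0.1]
print(stochastically_dominates(pS1, pS2))  # True
```

Because the relation only compares CDFs, it never needs the utility function itself, which is exactly why it is useful when U is monotonic but unknown.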
Stochastic dominance contd.

Stochastic dominance can often be determined without
exact distributions using qualitative reasoning

E.g., construction cost increases with distance from city
S1 is closer to the city than S2
⇒ S1 stochastically dominates S2 on cost

E.g., injury increases with collision speed

Can annotate belief networks with stochastic dominance information:
X −→+ Y (X positively influences Y) means that
for every value z of Y’s other parents Z
∀ x1, x2  x1 ≥ x2 ⇒ P(Y |x1, z) stochastically dominates P(Y |x2, z)
Preference structure: Deterministic

X1 and X2 preferentially independent of X3 iff
preference between ⟨x1, x2, x3⟩ and ⟨x1′, x2′, x3⟩
does not depend on x3

E.g., ⟨Noise, Cost, Safety⟩:
⟨20,000 suffer, $4.6 billion, 0.06 deaths/mpm⟩ vs.
⟨70,000 suffer, $4.2 billion, 0.06 deaths/mpm⟩

Theorem (Leontief, 1947): if every pair of attributes is P.I. of its
complement, then every subset of attributes is P.I. of its complement:
mutual P.I.

Theorem (Debreu, 1960): mutual P.I. ⇒ ∃ additive value function:
V(S) = Σi Vi(Xi(S))

Hence assess n single-attribute functions; often a good approximation
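The additive form V(S) = Σi Vi(Xi(S)) can be sketched for the airport example; the single-attribute value functions below are invented stand-ins, not assessed values:

```python
# Hypothetical single-attribute value functions (larger is better):
V_noise  = lambda people:   -people / 10000.0   # people suffering noise
V_cost   = lambda billions: -billions           # construction cost
V_safety = lambda deaths:   -100.0 * deaths     # deaths per mpm

def V(noise, cost, safety):
    """Additive value function: sum of single-attribute values."""
    return V_noise(noise) + V_cost(cost) + V_safety(safety)

# The two candidate sites from the example above:
print(V(20000, 4.6, 0.06))  # -2 - 4.6 - 6 = -12.6
print(V(70000, 4.2, 0.06))  # -7 - 4.2 - 6 = -17.2
```

Under these (assumed) weights the first site wins: the $0.4 billion saving does not compensate for 50,000 more people suffering noise.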
Preference structure: Stochastic

Need to consider preferences over lotteries:
X is utility-independent of Y iff
preferences over lotteries in X do not depend on y

Mutual U.I.: each subset is U.I. of its complement
⇒ ∃ multiplicative utility function:
U = k1 U1 + k2 U2 + k3 U3
  + k1 k2 U1 U2 + k2 k3 U2 U3 + k3 k1 U3 U1
  + k1 k2 k3 U1 U2 U3

Routine procedures and software packages for generating preference tests to
identify various canonical families of utility functions
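The three-attribute multiplicative form above is a direct sum of products; the scaling constants and single-attribute utilities below are hypothetical:

```python
def multiplicative_U(u1, u2, u3, k1, k2, k3):
    """Three-attribute multiplicative utility, exactly the form above."""
    return (k1*u1 + k2*u2 + k3*u3
            + k1*k2*u1*u2 + k2*k3*u2*u3 + k3*k1*u3*u1
            + k1*k2*k3*u1*u2*u3)

# With all single-attribute utilities at their best value (1.0):
print(multiplicative_U(1.0, 1.0, 1.0, 0.5, 0.3, 0.2))
# 0.5+0.3+0.2 + 0.15+0.06+0.10 + 0.03 = 1.34 (up to rounding)
```

Only the three constants k1, k2, k3 and the three single-attribute utilities need to be assessed, rather than a full joint utility over all attribute combinations.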
Value of information

Idea: compute value of acquiring each possible piece of evidence
Can be done directly from decision network

Example: buying oil drilling rights
Two blocks A and B, exactly one has oil, worth k
Prior probabilities 0.5 each, mutually exclusive
Current price of each block is k/2
“Consultant” offers accurate survey of A. Fair price?
Solution: compute expected value of information
= expected value of best action given the information
minus expected value of best action without information

Survey may say “oil in A” or “no oil in A”, prob. 0.5 each (given!)
= [0.5 × value of “buy A” given “oil in A”
 + 0.5 × value of “buy B” given “no oil in A”]
− 0
= (0.5 × k/2) + (0.5 × k/2) − 0 = k/2
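The same calculation done numerically, with the block's worth normalized to k = 1:

```python
k = 1.0
price = k / 2          # current price of either block

# Without the survey: buying either block has expected value 0.5*k - k/2 = 0,
# and not buying is also worth 0, so the best action is worth 0.
eu_without = max(0.5 * k - price, 0.0)

# With a perfect survey of A (each report has probability 0.5):
#   "oil in A"    -> buy A for k/2, gain k - k/2
#   "no oil in A" -> buy B for k/2, gain k - k/2
eu_with = 0.5 * (k - price) + 0.5 * (k - price)

vpi = eu_with - eu_without
print(vpi)  # 0.5, i.e., k/2: the fair price of the survey
```

Note the survey is worth k/2 even though it reveals nothing about B directly: knowing A's status also determines B's, because exactly one block has oil.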
General formula

Current evidence E, current best action α
Possible action outcomes Si, potential new evidence Ej

EU(α|E) = max_a Σi U(Si) P(Si|E, a)

Suppose we knew Ej = ejk, then we would choose α_{ejk} s.t.
EU(α_{ejk}|E, Ej = ejk) = max_a Σi U(Si) P(Si|E, a, Ej = ejk)

Ej is a random variable whose value is currently unknown
⇒ must compute expected gain over all possible values:

VPI_E(Ej) = (Σk P(Ej = ejk|E) EU(α_{ejk}|E, Ej = ejk)) − EU(α|E)

(VPI = value of perfect information)
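The general formula is a few lines of code for a discrete model; the actions, outcome distributions, and utilities below are all hypothetical:

```python
P_ej = {"good": 0.5, "bad": 0.5}                # P(Ej | E)
P_s = {                                          # P(Si | E, a, Ej = ej)
    ("a1", "good"): {"s1": 0.9, "s2": 0.1},
    ("a1", "bad"):  {"s1": 0.2, "s2": 0.8},
    ("a2", "good"): {"s1": 0.5, "s2": 0.5},
    ("a2", "bad"):  {"s1": 0.5, "s2": 0.5},
}
U = {"s1": 10.0, "s2": 0.0}
actions = ["a1", "a2"]

def eu(a, ej=None):
    """EU(a|E) if ej is None, else EU(a|E, Ej=ej)."""
    if ej is None:  # marginalize Ej out: P(Si|E,a) = sum_ej P(ej) P(Si|E,a,ej)
        return sum(P_ej[e] * eu(a, e) for e in P_ej)
    return sum(p * U[s] for s, p in P_s[(a, ej)].items())

eu_now = max(eu(a) for a in actions)            # EU(alpha|E)
vpi = sum(P_ej[e] * max(eu(a, e) for a in actions) for e in P_ej) - eu_now
print(vpi)  # 1.5: knowing Ej lets the agent switch to a2 when Ej is bad
```

Observing Ej has value here precisely because the best action changes with the observation (a1 when "good", a2 when "bad"); if one action were best under every report, the VPI would be zero.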
Properties of VPI

Nonnegative (in expectation, not post hoc)
∀ j, E  VPI_E(Ej) ≥ 0

Nonadditive (consider, e.g., obtaining Ej twice)
VPI_E(Ej, Ek) ≠ VPI_E(Ej) + VPI_E(Ek)

Order-independent
VPI_E(Ej, Ek) = VPI_E(Ej) + VPI_{E,Ej}(Ek) = VPI_E(Ek) + VPI_{E,Ek}(Ej)

Note: when more than one piece of evidence can be gathered,
maximizing VPI for each to select one is not always optimal
⇒ evidence-gathering becomes a sequential decision problem
Qualitative behaviors

(a) Choice is obvious, information worth little
(b) Choice is nonobvious, information worth a lot
(c) Choice is nonobvious, information worth little

[Figure: three sketches of P(U|Ej) over the utility values U1, U2 of the
competing actions, for cases (a), (b), and (c)]