Chapter 16

Outline

♦ Rational preferences
♦ Utilities
♦ Money
♦ Multiattribute utilities
♦ Decision networks
♦ Value of information

Preferences

An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations with uncertain prizes

Lottery L = [p, A; (1 − p), B]

Notation:
    A ≻ B    A preferred to B
    A ∼ B    indifference between A and B
    A ≿ B    B not preferred to A

Rational preferences

Idea: the preferences of a rational agent must obey constraints.
Rational preferences ⇒ behavior describable as maximization of expected utility

Constraints:
    Orderability      (A ≻ B) ∨ (B ≻ A) ∨ (A ∼ B)
    Transitivity      (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
    Continuity        A ≻ B ≻ C ⇒ ∃ p  [p, A; 1 − p, C] ∼ B
    Substitutability  A ∼ B ⇒ [p, A; 1 − p, C] ∼ [p, B; 1 − p, C]
    Monotonicity      A ≻ B ⇒ (p ≥ q ⇔ [p, A; 1 − p, B] ≿ [q, A; 1 − q, B])

Rational preferences contd.

Violating the constraints leads to self-evident irrationality.

For example, an agent with intransitive preferences can be induced to give away all its money:
    If B ≻ C, then an agent who has C would pay (say) 1 cent to get B
    If A ≻ B, then an agent who has B would pay (say) 1 cent to get A
    If C ≻ A, then an agent who has A would pay (say) 1 cent to get C
[Figure: the money-pump cycle A → B → C → A, 1 cent per trade]

Maximizing expected utility

Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944):
Given preferences satisfying the constraints, there exists a real-valued function U such that
    U(A) ≥ U(B)  ⇔  A ≿ B
    U([p1, S1; . . . ; pn, Sn]) = Σi pi U(Si)

MEU principle: choose the action that maximizes expected utility

Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities; e.g., a lookup table for perfect tic-tac-toe play

Utilities

Utilities map states to real numbers. Which numbers?
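The vNM theorem reduces choice among lotteries to arithmetic over utilities. The following minimal sketch computes lottery utilities and applies the MEU principle; the prizes and utility numbers are invented for illustration.

```python
# Expected utility of a lottery [p1, S1; ...; pn, Sn] and the MEU choice
# rule. Prizes and utility values below are illustrative, not from the slides.

def lottery_utility(lottery, U):
    """U([p1, S1; ...; pn, Sn]) = sum_i pi * U(Si)."""
    return sum(p * U[prize] for p, prize in lottery)

def meu_action(actions, U):
    """Pick the action (a lottery over prizes) with maximal expected utility."""
    return max(actions, key=lambda a: lottery_utility(actions[a], U))

U = {"A": 1.0, "B": 0.6, "C": 0.0}          # assumed utilities
actions = {
    "safe":   [(1.0, "B")],                  # prize B for certain
    "gamble": [(0.5, "A"), (0.5, "C")],      # lottery [0.5, A; 0.5, C]
}
# EU(safe) = 0.6 > EU(gamble) = 0.5, so MEU picks "safe"
```

With these numbers, the continuity axiom places B at indifference with the lottery [0.6, A; 0.4, C].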
Standard approach to assessment of human utilities: compare a given state A to a standard lottery Lp that has
    “best possible prize” u⊤ with probability p
    “worst possible catastrophe” u⊥ with probability (1 − p)
and adjust the lottery probability p until A ∼ Lp.

Student group utility

For each x, adjust p until half the class votes for the lottery (M = 10,000)
[Figure: assessed utility curve; lottery probability p (0.0 to 1.0) on the vertical axis, prize $x (0 to 10,000) on the horizontal axis]

Utility scales

Normalized utilities: u⊤ = 1.0, u⊥ = 0.0

Micromorts: one-millionth chance of death; useful for Russian roulette, paying to reduce product risks, etc.
E.g., pay $30 ∼ [0.999999, continue as before; 0.000001, instant death]

QALYs: quality-adjusted life years; useful for medical decisions involving substantial risk

Note: behavior is invariant w.r.t. positive linear transformation
    U′(x) = k1 U(x) + k2    where k1 > 0

With deterministic prizes only (no lottery choices), only ordinal utility can be determined, i.e., a total order on prizes

Decision networks

Add action nodes and utility nodes to belief networks to enable rational decision making
[Figure: airport-siting decision network; action node Airport Site, chance nodes Air Traffic, Litigation, Construction, Deaths, Noise, Cost, and a utility node]

Algorithm:
    For each value of the action node,
        compute the expected value of the utility node given the action and evidence
    Return the MEU action

Multiattribute utility

How can we handle utility functions of many variables X1 . . . Xn?
E.g., what is U(Deaths, Noise, Cost)?

How can complex utility functions be assessed from preference behaviour?
Idea 1: identify conditions under which decisions can be made without complete identification of U(x1, . . . , xn)
Idea 2: identify various types of independence in preferences and derive consequent canonical forms for U(x1, . . . , xn)

Money

Utility curve: for what probability p am I indifferent between a prize x and a lottery [p, $M; (1 − p), $0] for large M?

Typical empirical data, extrapolated with risk-prone behavior:
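The decision-network algorithm above is a single maximization over the action node. The toy sketch below follows that recipe; the two siting actions, their outcome distributions, and the utilities are hypothetical numbers, not the airport example's actual data.

```python
# For each value of the action node, compute the expected value of the
# utility node given that action (and evidence); return the MEU action.
# All numbers below are made up for illustration.

def meu(actions, outcome_dist, utility):
    """actions: candidate values of the action node;
    outcome_dist(a): dict mapping outcome -> P(outcome | a, evidence);
    utility(o): utility of outcome o."""
    def eu(a):
        return sum(p * utility(o) for o, p in outcome_dist(a).items())
    best = max(actions, key=eu)
    return best, eu(best)

dists = {"site1": {"high_noise": 0.7, "low_noise": 0.3},
         "site2": {"high_noise": 0.2, "low_noise": 0.8}}
util = {"high_noise": -10.0, "low_noise": 5.0}
best, value = meu(["site1", "site2"], dists.__getitem__, util.__getitem__)
# best == "site2": EU(site1) = -5.5, EU(site2) = 2.0
```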
[Figure: empirical utility of money, roughly concave from −$150,000 to +$800,000]

Given a lottery L with expected monetary value EMV(L), usually U(L) < U(EMV(L)), i.e., people are risk-averse: money does not behave as a utility function.

Strict dominance

Typically define attributes such that U is monotonic in each.

Strict dominance: choice B strictly dominates choice A iff
    ∀ i  Xi(B) ≥ Xi(A)   (and hence U(B) ≥ U(A))
[Figure: in the (X1, X2) plane, the region above and to the right of A dominates A; shown for deterministic attributes and for uncertain attributes]

Strict dominance seldom holds in practice.

Stochastic dominance

Distribution p1 stochastically dominates distribution p2 iff
    ∀ t  ∫_{−∞}^{t} p1(x) dx ≤ ∫_{−∞}^{t} p2(x) dx

If U is monotonic in x, then A1 with outcome distribution p1 stochastically dominating A2 with outcome distribution p2 implies
    ∫_{−∞}^{∞} p1(x) U(x) dx ≥ ∫_{−∞}^{∞} p2(x) U(x) dx

Multiattribute case: stochastic dominance on all attributes ⇒ optimal
[Figure: two cost distributions S1, S2 plotted as densities and as cumulative distributions over negative cost from −6 to −2; S1 dominates S2]

Exercise (repeated on several slides): label the arcs + or − in the car-insurance belief network (nodes include SocioEcon, Age, GoodStudent, RiskAversion, SeniorTrain, VehicleYear, ExtraCar, Mileage, DrivingSkill, DrivingHist, MakeModel, Antilock, DrivQuality, Airbag, CarValue, HomeBase, AntiTheft, Ruggedness, Accident, Theft, OwnDamage, Cushioning, OwnCost, MedicalCost, LiabilityCost, PropertyCost, OtherCost)

Stochastic dominance contd.
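For discrete distributions on a common ordered support, the dominance condition above becomes a pointwise comparison of cumulative sums. A minimal sketch; the two cost distributions are invented, not the S1/S2 data from the figure.

```python
from itertools import accumulate

def stochastically_dominates(p1, p2):
    """First-order stochastic dominance: p1's CDF lies at or below p2's
    at every threshold, i.e., p1 puts no more mass on low outcomes."""
    return all(c1 <= c2 + 1e-12
               for c1, c2 in zip(accumulate(p1), accumulate(p2)))

# Probabilities over increasing outcomes (say negative cost -6 .. -3):
s1 = [0.1, 0.2, 0.3, 0.4]   # mass shifted toward better outcomes
s2 = [0.4, 0.3, 0.2, 0.1]
# s1 dominates s2; the converse fails at the first threshold (0.4 > 0.1)
```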
Stochastic dominance can often be determined without exact distributions, using qualitative reasoning:

E.g., construction cost increases with distance from the city;
S1 is closer to the city than S2 ⇒ S1 stochastically dominates S2 on cost.
E.g., injury increases with collision speed.

Can annotate belief networks with stochastic dominance information:
an arc X → Y labeled + (X positively influences Y) means that for every value z of Y’s other parents Z,
    ∀ x1, x2  x1 ≥ x2 ⇒ P(Y | x1, z) stochastically dominates P(Y | x2, z)

Preference structure: Deterministic

X1 and X2 are preferentially independent of X3 iff the preference between ⟨x1, x2, x3⟩ and ⟨x1′, x2′, x3⟩ does not depend on x3

E.g., ⟨Noise, Cost, Safety⟩:
    ⟨20,000 suffer, $4.6 billion, 0.06 deaths/mpm⟩ vs. ⟨70,000 suffer, $4.2 billion, 0.06 deaths/mpm⟩

Theorem (Leontief, 1947): if every pair of attributes is P.I. of its complement, then every subset of attributes is P.I. of its complement: mutual P.I.
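For finite attribute domains, preferential independence can be checked mechanically: the ranking over (x1, x2) pairs must not change as x3 varies. A sketch under made-up value functions (both functions below are hypothetical examples, not from the slides):

```python
from itertools import product

def preferentially_independent(V, x12_pairs, x3_values):
    """{X1, X2} is P.I. of X3 iff the ranking over (x1, x2) pairs
    induced by V is identical for every fixed value of x3."""
    def ranking(x3):
        return sorted(x12_pairs, key=lambda pair: V(*pair, x3))
    first = ranking(x3_values[0])
    return all(ranking(x3) == first for x3 in x3_values[1:])

pairs = list(product([0, 1, 2], repeat=2))
V_add = lambda x1, x2, x3: 2 * x1 + x2 + 5 * x3      # additive: P.I. holds
V_dep = lambda x1, x2, x3: (x1 + x2) * (1 - 2 * x3)  # x3 flips the ranking
# preferentially_independent(V_add, pairs, [0, 1, 2]) is True
# preferentially_independent(V_dep, pairs, [0, 1]) is False
```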
Theorem (Debreu, 1960): mutual P.I. ⇒ ∃ additive value function:
    V(S) = Σi Vi(Xi(S))
Hence assess n single-attribute value functions; often a good approximation.

Preference structure: Stochastic

Need to consider preferences over lotteries:
X is utility-independent of Y iff preferences over lotteries in X do not depend on y

Mutual U.I.: each subset is U.I. of its complement
⇒ ∃ multiplicative utility function; e.g., for three attributes:
    U = k1U1 + k2U2 + k3U3 + k1k2U1U2 + k2k3U2U3 + k3k1U3U1 + k1k2k3U1U2U3

Routine procedures and software packages exist for generating preference tests to identify various canonical families of utility functions.

Value of information

Idea: compute the value of acquiring each possible piece of evidence.
Can be done directly from the decision network.

Example: buying oil drilling rights
    Two blocks A and B, exactly one has oil, worth k
    Prior probabilities 0.5 each, mutually exclusive
    Current price of each block is k/2
    A “consultant” offers an accurate survey of A. What is a fair price?

Solution: compute the expected value of information
    = expected value of the best action given the information
      minus expected value of the best action without the information

The survey may say “oil in A” or “no oil in A”, with probability 0.5 each (given!)
Value of the survey
    = [0.5 × value of “buy A” given “oil in A” + 0.5 × value of “buy B” given “no oil in A”] − 0
    = (0.5 × k/2) + (0.5 × k/2) − 0 = k/2

General formula

Current evidence E, current best action α
Possible action outcomes Si, potential new evidence Ej

    EU(α | E) = max_a Σi U(Si) P(Si | E, a)

Suppose we knew Ej = ejk; then we would choose αejk such that

    EU(αejk | E, Ej = ejk) = max_a Σi U(Si) P(Si | E, a, Ej = ejk)

Ej is a random variable whose value is currently unknown
⇒ must compute the expected gain over all its possible values:

    VPIE(Ej) = (Σk P(Ej = ejk | E) EU(αejk | E, Ej = ejk)) − EU(α | E)

(VPI = value of perfect information)

Properties of VPI

Nonnegative (in expectation, not post hoc):
    ∀ j, E  VPIE(Ej) ≥ 0

Nonadditive; consider, e.g., obtaining Ej twice:
    VPIE(Ej, Ek) ≠ VPIE(Ej) + VPIE(Ek)

Order-independent:
    VPIE(Ej, Ek) = VPIE(Ej) + VPIE,Ej(Ek) = VPIE(Ek) + VPIE,Ek(Ej)

Note: when more than one piece of evidence can be gathered, maximizing VPI for each to select one is not always optimal
⇒ evidence gathering becomes a sequential decision problem

Qualitative behaviors

(a) Choice is obvious, information worth little
(b) Choice is nonobvious, information worth a lot
(c) Choice is nonobvious, information worth little
[Figure: three sketches of the posterior utility distributions P(U | Ej) for two actions U1, U2, illustrating cases (a)–(c)]
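As a numeric check on the oil-drilling example above, the survey's VPI can be computed directly from the two expected utilities; the block value k is a parameter.

```python
def vpi_oil_survey(k):
    """Value of a perfect survey of block A when exactly one of two
    blocks holds oil worth k and each block sells for k/2."""
    # Without information: buying either block wins k with probability
    # 0.5 at price k/2, so the best action's expected profit is 0.
    eu_without = 0.5 * k - k / 2
    # With the survey ("oil in A" / "no oil in A", probability 0.5 each):
    # buy the indicated block and net k - k/2 = k/2 in either case.
    eu_with = 0.5 * (k - k / 2) + 0.5 * (k - k / 2)
    return eu_with - eu_without

# vpi_oil_survey(k) returns k/2, the fair price derived on the slide
```

This also illustrates the nonnegativity property: the agent can always ignore the information, so VPI is never negative.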