Utility functions

Artificial Intelligence
Roman Barták
Department of Theoretical Computer Science and Mathematical Logic
Rational decisions
We are designing rational agents that maximize expected utility.
Probability theory is a tool for dealing with degrees of belief (about world states, action effects, etc.).
Now, we explore utility theory to represent and reason with preferences.
Finally, we combine preferences (as expressed by utilities) with probabilities in the general theory of rational decisions – decision theory.
Utility theory
The agent's preferences are captured by a utility function, U(s), which assigns a single number to express the desirability of a state.
The expected utility of an action given the evidence is just the average value of the outcomes, weighted by their probabilities:
EU(a|e) = ∑s P(Result(a)=s | a, e) U(s)
A rational agent should choose the action that maximizes the agent's expected utility (MEU):
action = argmaxa EU(a|e)
The MEU principle formalizes the general notion that the agent should "do the right thing", but we need to make it operational.
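As a minimal sketch of the MEU principle in Python: the outcome model P(Result(a)=s | a, e) and the utilities below are invented for illustration, not part of the lecture.

    def expected_utility(action, outcome_model, utility):
        # EU(a|e) = sum over s of P(Result(a)=s | a, e) * U(s)
        return sum(p * utility[s] for s, p in outcome_model[action].items())

    def meu_action(outcome_model, utility):
        # action = argmax_a EU(a|e)
        return max(outcome_model,
                   key=lambda a: expected_utility(a, outcome_model, utility))

    # hypothetical outcome model and utilities:
    outcome_model = {
        "go_left":  {"safe": 0.9, "crash": 0.1},
        "go_right": {"safe": 0.6, "crash": 0.4},
    }
    utility = {"safe": 10.0, "crash": -100.0}
    print(meu_action(outcome_model, utility))   # -> go_left (EU -1 vs -34)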
Rational preferences
Frequently, it is easier for an agent to express preferences between states:
– A > B: the agent prefers A over B
– A < B: the agent prefers B over A
– A ~ B: the agent is indifferent between A and B
What sort of things are A and B?
– They could be states of the world, but more often than not there is uncertainty about what is really being offered.
– We can think of the set of outcomes for each action as a lottery (possible outcomes S1, …, Sn that occur with probabilities p1, …, pn):
• [p1, S1; …; pn, Sn]
An example of a lottery (food in airplanes): chicken or pasta?
– [0.8, juicy chicken; 0.2, overcooked chicken]
– [0.7, delicious pasta; 0.3, congealed pasta]
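Written out as a small Python sketch, a lottery is just a list of (probability, outcome) pairs; the sanity check confirms the probabilities sum to 1.

    import math

    chicken = [(0.8, "juicy chicken"), (0.2, "overcooked chicken")]
    pasta   = [(0.7, "delicious pasta"), (0.3, "congealed pasta")]

    # probabilities in a lottery must sum to 1
    for lottery in (chicken, pasta):
        assert math.isclose(sum(p for p, _ in lottery), 1.0)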
Properties of rational preferences
Rational preferences should lead to maximizing expected utility (if the agent violates them, it will exhibit patently irrational behavior in some situations).
We require several constraints (the axioms of utility theory) that rational preferences should obey:
– orderability:
exactly one of (A > B), (A < B), or (A ~ B) holds
– transitivity:
(A < B) ∧ (B < C) ⇒ (A < C)
– continuity:
(A > B > C) ⇒ ∃p [p, A; 1-p, C] ~ B
– substitutability:
A ~ B ⇒ [p, A; 1-p, C] ~ [p, B; 1-p, C]
– monotonicity:
A > B ⇒ (p > q ⇔ [p, A; 1-p, B] > [q, A; 1-q, B])
– decomposability:
[p, A; 1-p, [q, B; 1-q, C]] ~ [p, A; (1-p)q, B; (1-p)(1-q), C]
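A quick numeric check of the decomposability axiom: flattening the compound lottery produces exactly the probabilities stated on the right-hand side. The values of p and q are arbitrary.

    p, q = 0.3, 0.6   # arbitrary probabilities

    # [p, A; 1-p, [q, B; 1-q, C]] flattened into a simple lottery:
    flat = {"A": p, "B": (1 - p) * q, "C": (1 - p) * (1 - q)}
    for outcome, prob in flat.items():
        print(outcome, round(prob, 6))    # A 0.3, B 0.42, C 0.28
    print(round(sum(flat.values()), 6))   # 1.0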
Preferencesleadtoutility
Teh axiomsofutilitytheoryareaxiomsaboutpreferencesbut
wecanderivethefollowingconsequencesformthem..
Existenceofutilityfunctionsuchthat:
U(A)<U(B)⇔ A<B
U(A)=U(B)⇔ A~B
Expectedutilityofalottery:
U([p1,S1;…;pn,Sn] )=∑i pi U(Si)
Autilityfunctionexistsforanyrationalagentbutitisnot
unique:
U‘(S)=aU(S)+b
Existenceofautilityfunctiondoesnotnecessarilymeanthat
theagentisexplicitlymaximizingthatutilityfunction.By
observingitspreferencesanobservercanconstructtheutility
function(eveniftheagentdoesnotknowit).
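Both consequences in a short Python sketch with made-up states and utilities: the expected utility of a lottery, and the fact that the transformation U'(S) = aU(S) + b (a > 0) leaves the preference ordering unchanged.

    def lottery_eu(lottery, U):
        # U([p1,S1; ...; pn,Sn]) = sum_i pi * U(Si)
        return sum(p * U[s] for p, s in lottery)

    U = {"S1": 2.0, "S2": 5.0, "S3": 9.0}   # hypothetical utilities
    lot_a = [(0.5, "S1"), (0.5, "S3")]      # EU = 5.5
    lot_b = [(1.0, "S2")]                   # EU = 5.0

    a, b = 3.0, 7.0                              # any a > 0 works
    U2 = {s: a * u + b for s, u in U.items()}    # U'(S) = aU(S) + b

    print(lottery_eu(lot_a, U) > lottery_eu(lot_b, U))    # True
    print(lottery_eu(lot_a, U2) > lottery_eu(lot_b, U2))  # True: same ordering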
Utility functions
Utility is a function that maps from lotteries to real numbers.
We must first work out what the agent's utility function is (preference elicitation).
– We will be looking for a normalized utility function.
– We fix the utility of a "best possible prize" Smax to 1: U(Smax) = 1.
– Similarly, a "worst possible catastrophe" Smin is mapped to 0: U(Smin) = 0.
– Now, to assess the utility of any particular prize S, we ask the agent to choose between S and a standard lottery
[p, Smax; 1-p, Smin].
– The probability p is adjusted until the agent is indifferent between S and the standard lottery.
– Then the utility of S is given by U(S) = p.
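The adjustment of p can be done by bisection. In the Python sketch below the agent's answers are simulated by a hidden value true_u; in a real elicitation each comparison would be a question put to the agent.

    def elicit_utility(prefers_lottery, tol=1e-6):
        # adjust p in [p, Smax; 1-p, Smin] until the agent is indifferent
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            p = (lo + hi) / 2
            if prefers_lottery(p):   # lottery preferred -> p is too high
                hi = p
            else:                    # prize S preferred -> p is too low
                lo = p
        return (lo + hi) / 2         # U(S) = p at indifference

    # with U(Smax)=1 and U(Smin)=0, the standard lottery has EU = p, so a
    # simulated agent with hidden utility true_u prefers it iff p > true_u:
    true_u = 0.73
    print(round(elicit_utility(lambda p: p > true_u), 4))   # ~0.73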
The utility of money
The universal exchangeability of money for all kinds of goods and services suggests that money plays a significant role in human utility functions.
– An agent prefers more money to less, all other things being equal.
But this does not mean that money behaves as a utility function (because it says nothing about preferences between lotteries involving money).
Assume that you won a competition and the host offers you a choice: either you can take the 1,000,000 USD prize, or you can gamble it on the flip of a coin. If the coin comes up heads, you end up with nothing, but if it comes up tails, you get 2,500,000 USD.
What is your choice?
– The expected monetary value of the gamble is ½ · 0 + ½ · 2,500,000 = 1,250,000 USD.
– Most people decline the gamble and pocket the million. Are they being irrational?
The utility of money
The decision in the previous game does not depend on the prize only, but also on the wealth of the player!
Let Sn denote the state of possessing total wealth n USD, and let the current wealth be k USD.
Then the expected utilities of the two actions are:
– EU(Accept) = ½ U(Sk) + ½ U(Sk+2,500,000)
– EU(Decline) = U(Sk+1,000,000)
Suppose we assign U(Sk) = 5, U(Sk+1,000,000) = 8, and U(Sk+2,500,000) = 9.
Then EU(Accept) = ½·5 + ½·9 = 7 < 8 = EU(Decline), so the rational decision is to decline!
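The same computation in a few lines of Python, using the illustrative utilities from the slide:

    U = {"Sk": 5, "Sk+1M": 8, "Sk+2.5M": 9}   # utilities from the slide

    eu_accept  = 0.5 * U["Sk"] + 0.5 * U["Sk+2.5M"]   # 7.0
    eu_decline = U["Sk+1M"]                           # 8
    print("decline" if eu_decline > eu_accept else "accept")   # -> decline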
[Figure: the utility of money – the curve has a risk-seeking region and a risk-averse region (where the agent prefers a sure thing with a payoff less than the expected monetary value of a gamble); an agent with a linear utility curve is said to be risk-neutral.]
Human judgment (certainty effect)
The evidence suggests that humans are "predictably irrational".
Allais paradox
– A: 80% chance of 4,000 USD
– B: 100% chance of 3,000 USD
What is your choice?
• Most people consistently prefer B over A (taking the sure thing!)
– C: 20% chance of 4,000 USD
– D: 25% chance of 3,000 USD
What is your choice?
• Most people prefer C over D (the higher expected monetary value)
Certainty effect – people are strongly attracted to gains that are certain. Taken together, the two choices are inconsistent with maximizing expected utility: with U(0) = 0, preferring B over A means U(3000) > 0.8·U(4000), while preferring C over D means the opposite.
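As a quick check (not from the slides), the expected monetary values of the four lotteries:

    print(0.80 * 4000)   # A: 3200.0 - yet most people choose B (= 3000)
    print(0.20 * 4000)   # C:  800.0
    print(0.25 * 3000)   # D:  750.0 - here most people choose C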
Human judgment (ambiguity aversion)
Ellsberg paradox
An urn contains 1/3 red balls and 2/3 either black or yellow balls.
– A: 100 USD for a red ball
– B: 100 USD for a black ball
What is your choice?
• Most people prefer A over B (A gives a 1/3 chance of winning, while B could be anywhere between 0 and 2/3)
– C: 100 USD for a red or yellow ball
– D: 100 USD for a black or yellow ball
What is your choice?
• Most people prefer D over C (D gives a 2/3 chance, while C could be anywhere between 1/3 and 3/3)
However, if you think there are more red than black balls, then you should prefer A over B and C over D; no fixed belief about the urn makes both A and D the better bets.
Ambiguity aversion – most people elect the known probability rather than an unknown one.
Human judgment
Framing effect – the exact wording of a decision problem can have a big impact on the agent's choices:
– medical procedure A has a 90% survival rate
– medical procedure B has a 10% death rate
What is your choice?
• Most people prefer A over B, though both choices are identical.
Anchoring effect – people feel more comfortable making relative utility judgments rather than absolute ones:
– A restaurant takes advantage of this by offering a $200 bottle that it knows nobody will buy, but which serves to skew upward the customers' estimate of the value of all wines and makes the $55 bottle seem like a bargain.
Multi-attribute utility theory
In real life, outcomes are characterized by two or more attributes, such as cost and safety – the subject of multi-attribute utility theory.
We will assume that higher values of an attribute correspond to higher utilities.
The question is how to handle preferences over several attributes:
– without combining the attribute values into a single utility value – dominance
– by combining the attribute values into a single utility value
Dominance
If an option is of lower value on all attributes than some other option, it need not be considered further – strict dominance.
Strict dominance can be defined for uncertain outcomes too:
– B strictly dominates A if all possible outcomes of B strictly dominate all possible outcomes of A.
Strict dominance will probably occur less often than in the deterministic case.
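Returning to the deterministic case, a strict-dominance check is a one-liner. The options below are hypothetical, scored so that higher is better on every attribute (matching the earlier assumption):

    def strictly_dominates(a, b):
        # a strictly dominates b if a is better on every attribute
        return all(x > y for x, y in zip(a, b))

    site1 = (8, 7)   # hypothetical (cost score, safety score)
    site2 = (5, 6)
    if strictly_dominates(site1, site2):
        print("site2 need not be considered further")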
Stochastic dominance
Stochastic dominance occurs more frequently in real problems. It is easiest to understand in the context of a single variable.
Stochastic dominance is best seen by examining the cumulative distribution, which measures the probability that the cost is less than or equal to any given amount (it integrates the original distribution).
[Figure: a probability distribution and the corresponding cumulative distribution.]
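A sketch of a first-order stochastic-dominance test on one attribute where higher values are better: A dominates B if A's cumulative distribution lies at or below B's everywhere. The two distributions are made up.

    import numpy as np

    values = [0, 1, 2, 3]                  # possible attribute values
    pA = np.array([0.1, 0.2, 0.3, 0.4])    # P(A = v)
    pB = np.array([0.3, 0.3, 0.2, 0.2])    # P(B = v)

    cdfA, cdfB = np.cumsum(pA), np.cumsum(pB)
    dominates = bool(np.all(cdfA <= cdfB + 1e-12))   # tolerance for float error
    print(dominates)   # True: A stochastically dominates B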
Preference structure
To specify the complete utility function for n attributes, each having d values, we need d^n values in the worst case.
– This corresponds to a situation in which the agent's preferences have no regularity at all.
Preferences of typical agents have much more structure, so the utility function can be expressed as:
U(x1, …, xn) = F[f1(x1), …, fn(xn)]
Preference structure (without uncertainty)
The basic regularity is called preference independence.
Two attributes X1 and X2 are preferentially independent of a third attribute X3 if the preference between outcomes
⟨x1, x2, x3⟩ and ⟨x'1, x'2, x3⟩
does not depend on the particular value x3.
If each pair of attributes is preferentially independent of any other attribute, we talk about mutual preferential independence (MPI).
If the attributes are mutually preferentially independent, then the agent's preference behavior can be described as maximizing the function:
U(x1, …, xn) = ∑i Ui(xi)
A value function of this type is called an additive value function.
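A minimal additive value function in Python; the two single-attribute value functions are hypothetical, with inputs already oriented so that higher is better:

    def additive_value(x, subvalues):
        # U(x1,...,xn) = sum_i Ui(xi)
        return sum(U_i(x_i) for U_i, x_i in zip(subvalues, x))

    # hypothetical single-attribute value functions:
    subvalues = [lambda cost_score: 2.0 * cost_score,
                 lambda safety_score: 5.0 * safety_score]
    print(additive_value((8, 7), subvalues))   # 2*8 + 5*7 = 51.0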
Preference structure (with uncertainty)
When uncertainty is present, we need to consider the structure of preferences between lotteries.
For mutually utility-independent (MUI) attributes we can use a multiplicative utility function; for three attributes:
U = k1U1 + k2U2 + k3U3
  + k1k2U1U2 + k2k3U2U3 + k1k3U1U3
  + k1k2k3U1U2U3
For n attributes exhibiting MUI, we can represent the utility function using n constants and n single-attribute utilities.
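The three-attribute form written out in Python. The factorization into ∏i(1 + kiUi) − 1 is plain algebra; the constants and single-attribute utilities below are hypothetical.

    import math

    def multiplicative_utility(k, u):
        # U = k1U1 + k2U2 + k3U3 + k1k2U1U2 + k2k3U2U3 + k1k3U1U3
        #     + k1k2k3U1U2U3, which factors as prod_i(1 + ki*Ui) - 1
        return math.prod(1 + ki * ui for ki, ui in zip(k, u)) - 1

    k = (0.5, 0.3, 0.2)   # hypothetical scaling constants
    u = (0.8, 0.6, 0.9)   # hypothetical single-attribute utilities
    print(round(multiplicative_utility(k, u), 4))   # 0.9494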
The value of information
So far we have assumed that all relevant information is provided to the agent before it makes its decision. In practice, this is hardly ever the case. For example, a doctor cannot expect to be provided with the results of all possible diagnostic tests.
One of the most important parts of decision making is knowing what questions to ask.
We will now look at information value theory, which enables an agent to choose which information to acquire.
The value of information (example)
Suppose an oil company is hoping to buy one of n indistinguishable blocks of ocean-drilling rights.
Let us assume further that exactly one of the blocks contains oil worth C dollars, while the others are worthless. The asking price of each block is C/n.
The expected monetary value of buying one block is C/n – C/n = 0.
Now suppose that a seismologist offers the company the result of a survey of one specific block, which indicates definitively whether the block contains oil.
How much should the company pay for that information?
– With probability 1/n, the survey will indicate oil in the given block, and the company will buy it and make a profit of C – C/n.
– With probability (n-1)/n, the survey will show that the block contains no oil, in which case the company will buy another block. Now the probability of finding oil in that other block is 1/(n-1), so the expected profit is C/(n-1) – C/n.
– Together, the expected profit given the survey information is:
1/n (C – C/n) + (n-1)/n (C/(n-1) – C/n) = C/n
Therefore the company should be willing to pay the seismologist up to C/n dollars for the information: the information is worth as much as the block itself.
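A quick numeric check of the derivation for arbitrary illustrative values of n and C:

    n, C = 10, 1_000_000.0   # arbitrary illustrative values

    profit_if_oil    = C - C/n         # survey says oil: buy that block
    profit_if_no_oil = C/(n-1) - C/n   # otherwise buy one of the others
    expected = (1/n) * profit_if_oil + ((n-1)/n) * profit_if_no_oil
    print(round(expected, 6), C/n)     # 100000.0 100000.0, i.e. exactly C/n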
The value of information (a general formula)
We assume that exact evidence can be obtained about the value of some random variable Ej – this is called the value of perfect information (VPI).
The value of the current best action α (with the initial evidence e) is defined by:
EU(α|e) = maxa ∑s' P(Result(a)=s' | a, e) U(s')
The value of the best action αjk after the new evidence Ej = ejk is obtained is defined by:
EU(αjk | e, Ej=ejk) = maxa ∑s' P(Result(a)=s' | a, e, Ej=ejk) U(s')
But the value of Ej is currently unknown, so we must average over all possible values ejk that we might discover for Ej:
VPIe(Ej) = (∑k P(Ej=ejk | e) EU(αjk | e, Ej=ejk)) – EU(α|e)
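A toy instance of the formula in Python, where the evidence Ej perfectly reveals a hidden state, so the best action can be re-chosen for each possible answer. Every number is made up.

    prior = {"oil": 0.3, "dry": 0.7}                 # P(state | e)
    U = {("buy", "oil"): 9.0, ("buy", "dry"): -3.0,  # U(s') per action/state
         ("pass", "oil"): 0.0, ("pass", "dry"): 0.0}
    actions = ("buy", "pass")

    def best_eu(belief):
        # max_a sum_s' P(s' | a, evidence) * U(s')
        return max(sum(p * U[(a, h)] for h, p in belief.items())
                   for a in actions)

    eu_now = best_eu(prior)   # EU(alpha|e) ~ 0.6 (buy)
    # average the post-evidence best EU over the possible answers Ej = ejk:
    eu_informed = sum(prior[h] * best_eu({h: 1.0}) for h in prior)   # ~ 2.7
    print(round(eu_informed - eu_now, 6))   # VPI = 2.1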
The value of information (qualitatively)
When is it beneficial to obtain new information?
[Figure: three cases – (1) one choice is clearly better, so the information is not needed; (2) the choice is unclear and the information is crucial; (3) the choice is unclear, but the information is less valuable.]
Information has value to the extent that
– it is likely to cause a change of plan, and
– the new plan will be significantly better than the old plan.
Properties of the value of information
Is it possible for information to be deleterious? No – the expected value of information is nonnegative:
∀e, Ej: VPIe(Ej) ≥ 0
The value of information is not additive (in general):
VPIe(Ej, Ek) ≠ VPIe(Ej) + VPIe(Ek)
The expected value of information is order independent:
VPIe(Ej, Ek) = VPIe(Ej) + VPIe,ej(Ek) = VPIe(Ek) + VPIe,ek(Ej)
Information gathering
A sensible agent should
– ask questions in a reasonable order,
– avoid asking questions that are irrelevant,
– take into account the importance of each piece of information in relation to its cost, and
– stop asking questions when that is appropriate.
We assume that with each observable evidence variable Ej there is an associated cost, Cost(Ej).
An information-gathering agent can select (greedily) the most efficient observation until no observation is worth its cost – a myopic approach, sketched below.
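A sketch of the myopic selection step with made-up VPI and cost numbers; a full agent would recompute the VPIs after each observation:

    def next_observation(vpi, cost):
        # pick the evidence variable with the best VPI-cost margin;
        # return None when no observation is worth its cost
        best = max(vpi, key=lambda e: vpi[e] - cost[e], default=None)
        if best is None or vpi[best] <= cost[best]:
            return None
        return best

    vpi  = {"E1": 2.1, "E2": 0.4, "E3": 1.0}   # hypothetical values
    cost = {"E1": 0.5, "E2": 0.6, "E3": 1.5}
    print(next_observation(vpi, cost))   # -> E1 (margin 1.6)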
© 2016 Roman Barták
Department of Theoretical Computer Science and Mathematical Logic
[email protected]