Artificial Intelligence
Roman Barták
Department of Theoretical Computer Science and Mathematical Logic

Rational decisions

We are designing rational agents that maximize expected utility.
Probability theory is a tool for dealing with degrees of belief (about world states, action effects etc.).
Now, we explore utility theory to represent and reason with preferences.
Finally, we combine preferences (as expressed by utilities) with probabilities in the general theory of rational decisions – decision theory.

Utility theory

The agent's preferences are captured by a utility function, U(s), which assigns a single number to express the desirability of a state.
The expected utility of an action given the evidence is just the average value of the outcomes, weighted by their probabilities:
    EU(a|e) = ∑_s P(Result(a)=s | a, e) U(s)
A rational agent should choose the action that maximizes the agent's expected utility (MEU):
    action = argmax_a EU(a|e)
The MEU principle formalizes the general notion that the agent should "do the right thing", but we need to make it operational.

Rational preferences

Frequently, it is easier for an agent to express preferences between states:
– A > B: the agent prefers A over B
– A < B: the agent prefers B over A
– A ~ B: the agent is indifferent between A and B
What sort of things are A and B?
– They could be states of the world, but more often than not there is uncertainty about what is really being offered.
– We can think of the set of outcomes for each action as a lottery (possible outcomes S1, …, Sn that occur with probabilities p1, …, pn):
    [p1, S1; …; pn, Sn]
An example of a lottery (food in airplanes) – chicken or pasta?
– [0.8, juicy chicken; 0.2, overcooked chicken]
– [0.7, delicious pasta; 0.3, congealed pasta]

Properties of rational preferences

Rational preferences should lead to maximizing expected utility (if the agent violates them, it will exhibit patently irrational behavior in some situations).
We require several constraints (the axioms of utility theory) that rational preferences should obey:
– orderability: exactly one of (A > B), (A < B), (A ~ B) holds
– transitivity: (A < B) ∧ (B < C) ⇒ (A < C)
– continuity: (A > B > C) ⇒ ∃p [p, A; 1−p, C] ~ B
– substitutability: A ~ B ⇒ [p, A; 1−p, C] ~ [p, B; 1−p, C]
– monotonicity: A > B ⇒ (p > q ⇔ [p, A; 1−p, B] > [q, A; 1−q, B])
– decomposability: [p, A; 1−p, [q, B; 1−q, C]] ~ [p, A; (1−p)q, B; (1−p)(1−q), C]

Preferences lead to utility

The axioms of utility theory are axioms about preferences, but we can derive the following consequences from them.
Existence of a utility function such that:
    U(A) < U(B) ⇔ A < B
    U(A) = U(B) ⇔ A ~ B
Expected utility of a lottery:
    U([p1, S1; …; pn, Sn]) = ∑_i pi U(Si)
A utility function exists for any rational agent, but it is not unique:
    U'(S) = aU(S) + b   (for any a > 0)
Existence of a utility function does not necessarily mean that the agent is explicitly maximizing that utility function. By observing its preferences, an observer can construct the utility function (even if the agent does not know it).

Utility functions

Utility is a function that maps from lotteries to real numbers.
We must first work out what the agent's utility function is (preference elicitation).
– We will be looking for a normalized utility function.
– We fix the utility of a "best possible prize" Smax to 1: U(Smax) = 1.
– Similarly, a "worst possible catastrophe" Smin is mapped to 0: U(Smin) = 0.
– Now, to assess the utility of any particular prize S, we ask the agent to choose between S and a standard lottery [p, Smax; 1−p, Smin].
– The probability p is adjusted until the agent is indifferent between S and the standard lottery.
– Then the utility of S is given by U(S) = p.
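To tie the pieces together – the expected utility of a lottery and the MEU principle – here is a minimal Python sketch using normalized utilities as described above. The numeric utilities assigned to the four outcomes are invented for illustration only.

```python
# A minimal sketch of lottery evaluation and the MEU principle.
# The utility values below are assumptions made for illustration.

def expected_utility(lottery, utility):
    """U([p1,S1; ...; pn,Sn]) = sum_i pi * U(Si)."""
    return sum(p * utility[s] for p, s in lottery)

# The chicken-or-pasta lotteries from the airplane-food example.
actions = {
    "chicken": [(0.8, "juicy chicken"), (0.2, "overcooked chicken")],
    "pasta":   [(0.7, "delicious pasta"), (0.3, "congealed pasta")],
}

# Hypothetical normalized utilities (best outcome 1, worst outcome 0).
U = {"delicious pasta": 1.0, "juicy chicken": 0.9,
     "overcooked chicken": 0.1, "congealed pasta": 0.0}

for a, lottery in actions.items():
    print(a, expected_utility(lottery, U))   # chicken ~0.74, pasta 0.70

# MEU: choose the action with maximal expected utility.
best = max(actions, key=lambda a: expected_utility(actions[a], U))
print("MEU choice:", best)                   # -> chicken
```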
The utility of money

The universal exchangeability of money for all kinds of goods and services suggests that money plays a significant role in human utility functions.
– An agent prefers more money to less, all other things being equal.
But this does not mean that money behaves as a utility function (because it says nothing about preferences between lotteries involving money).

Assume that you won a competition and the host offers you a choice: either you can take the 1,000,000 USD prize or you can gamble it on the flip of a coin. If the coin comes up heads, you end up with nothing, but if it comes up tails, you get 2,500,000 USD. What is your choice?
– The expected monetary value of the gamble is 1,250,000 USD.
– Most people decline the gamble and pocket the million. Are they being irrational?

The utility of money

The decision in the previous game does not depend on the prize only but also on the wealth of the player!
Let S_n denote a state of possessing total wealth n USD, and let the current wealth be k USD. Then the expected utilities of the two actions are:
– EU(Accept) = ½ U(S_k) + ½ U(S_{k+2,500,000})
– EU(Decline) = U(S_{k+1,000,000})
Suppose we assign U(S_k) = 5, U(S_{k+1,000,000}) = 8, U(S_{k+2,500,000}) = 9.
Then the rational decision would be to decline!

[Figure: the utility of money – a risk-averse region, where the agent prefers a sure thing with a payoff less than the expected monetary value of a gamble, and a region of risk-seeking behavior; an agent with a linear curve is said to be risk-neutral.]

Human judgment (certainty effect)

The evidence suggests that humans are "predictably irrational".
Allais paradox:
– A: 80% chance of 4000 USD
– B: 100% chance of 3000 USD
What is your choice?
• Most people consistently prefer B over A (taking the sure thing!)
– C: 20% chance of 4000 USD
– D: 25% chance of 3000 USD
What is your choice?
• Most people prefer C over D (higher expected monetary value)
Certainty effect – people are strongly attracted to gains that are certain.

Human judgment (ambiguity aversion)

Ellsberg paradox: The urn contains 1/3 red balls, and 2/3 either black or yellow balls.
– A: 100 USD for a red ball
– B: 100 USD for a black ball
What is your choice?
• Most people prefer A over B (A gives a 1/3 chance of winning, while B could be anywhere between 0 and 2/3)
– C: 100 USD for a red or yellow ball
– D: 100 USD for a black or yellow ball
What is your choice?
• Most people prefer D over C (D gives you a 2/3 chance, while C could be anywhere between 1/3 and 3/3)
However, if you think there are more red than black balls, then you should prefer A over B and C over D.
Ambiguity aversion – most people elect the known probability rather than the unknown one.

Human judgment

Framing effect – the exact wording of a decision problem can have a big impact on the agent's choices:
– medical procedure A has a 90% survival rate
– medical procedure B has a 10% death rate
What is your choice?
• Most people prefer A over B, though both choices are identical.
Anchoring effect – people feel more comfortable making relative utility judgments rather than absolute ones.
– A restaurant takes advantage of this by offering a $200 bottle that it knows nobody will buy, but which serves to skew upward the customers' estimate of the value of all wines and make the $55 bottle seem like a bargain.

Multi-attribute utility theory

In real life the outcomes are characterized by two or more attributes such as cost and safety issues – multi-attribute utility theory.
We will assume that higher values of an attribute correspond to higher utilities.
The question is how to get preferences for more attributes:
– without combining the attribute values into a single utility value – dominance
– by combining the attribute values into a single utility value

Dominance

If an option is of lower value on all attributes than some other option, it need not be considered further – strict dominance.
Strict dominance can be defined for uncertain outcomes too:
– if all possible outcomes of B strictly dominate all possible outcomes of A
Strict dominance will probably occur less often than in the deterministic case.

Stochastic dominance

Stochastic dominance occurs more frequently in real problems. It is easier to understand in the context of a single variable.
Stochastic dominance is best seen by examining the cumulative distribution, which measures the probability that the cost is less than or equal to any given amount (it integrates the original distribution). A small code sketch follows below.

[Figure: a probability distribution and the corresponding cumulative distribution.]
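The following sketch shows how such a test can be implemented for discrete distributions over a cost attribute (lower is better); the two example distributions are hypothetical.

```python
# A sketch of checking stochastic dominance between two discrete cost
# distributions; the example numbers are invented for illustration.

def cdf(dist, x):
    """P(cost <= x) for a discrete distribution {value: probability}."""
    return sum(p for v, p in dist.items() if v <= x)

def stochastically_dominates(a, b):
    """For costs (lower is better), A dominates B iff A's cumulative
    distribution lies everywhere at or above B's."""
    support = sorted(set(a) | set(b))
    return all(cdf(a, x) >= cdf(b, x) for x in support)

# Hypothetical construction costs (in millions USD) of two sites.
site1 = {3: 0.25, 4: 0.50, 5: 0.25}
site2 = {4: 0.25, 5: 0.50, 6: 0.25}

print(stochastically_dominates(site1, site2))  # True: prefer site1
print(stochastically_dominates(site2, site1))  # False
```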
Preference structure

To specify the complete utility function for n attributes, each having d values, we need d^n values in the worst case.
– This corresponds to a situation in which the agent's preferences have no regularity at all.
Preferences of typical agents have much more structure, so the utility function can be expressed as:
    U(x1, …, xn) = F[f1(x1), …, fn(xn)]

Preference structure (without uncertainty)

The basic regularity is called preference independence.
Two attributes X1 and X2 are preferentially independent of a third attribute X3 if the preference between outcomes ⟨x1, x2, x3⟩ and ⟨x'1, x'2, x3⟩ does not depend on the particular value x3.
If each pair of attributes is preferentially independent of any other attribute, we talk about mutual preferential independence (MPI).
If attributes are mutually preferentially independent, then the agent's preference behavior can be described as maximizing the function:
    U(x1, …, xn) = ∑_i Ui(xi)
A value function of this type is called an additive value function.

Preference structure (with uncertainty)

When uncertainty is present, we need to consider the structure of preferences between lotteries.
For mutually utility independent (MUI) attributes we can use a multiplicative utility function; for three attributes:
    U = k1U1 + k2U2 + k3U3 + k1k2U1U2 + k2k3U2U3 + k1k3U1U3 + k1k2k3U1U2U3
For n attributes exhibiting MUI we can represent the utility function using n constants and n single-attribute utilities.

The value of information

So far we have assumed that all relevant information is provided to the agent before it makes its decision. In practice, this is hardly ever the case. For example, a doctor cannot expect to be provided with the results of all possible diagnostic tests.
One of the most important parts of decision making is knowing what questions to ask.
We will now look at information value theory, which enables an agent to choose which information to acquire.

The value of information (example)

Suppose an oil company is hoping to buy one of n indistinguishable blocks of ocean-drilling rights.
Let us assume further that exactly one of the blocks contains oil worth C dollars, while the others are worthless. The asking price of each block is C/n.
The expected monetary value of buying one block is C/n − C/n = 0.
Now suppose that a seismologist offers the company the result of a survey of one specific block, which indicates definitively whether the block contains oil. How much should the company pay for that information?
– With probability 1/n, the survey will indicate oil in the given block, and the company will buy it and make a profit of C − C/n.
– With probability (n−1)/n, the survey will show that the block contains no oil, in which case the company will buy another block. Now the probability of finding oil in that other block is 1/(n−1), so the expected profit is C/(n−1) − C/n.
– Together, the expected profit given the survey information is:
    1/n (C − C/n) + (n−1)/n (C/(n−1) − C/n) = C/n
Therefore the company should be willing to pay the seismologist up to C/n dollars for the information: the information is worth as much as the block itself.

The value of information (a general formula)

We assume that exact evidence can be obtained about the value of some random variable Ej – this is called the value of perfect information (VPI).
The value of the current best action α (with the initial evidence e) is defined by:
    EU(α|e) = max_a ∑_s' P(Result(a)=s' | a, e) U(s')
The value of the new best action α_jk after the new evidence Ej = ejk is obtained is defined by:
    EU(α_jk | e, Ej=ejk) = max_a ∑_s' P(Result(a)=s' | a, e, Ej=ejk) U(s')
But the value of Ej is currently unknown, so we must average over all possible values ejk that we might discover for Ej:
    VPI_e(Ej) = (∑_k P(Ej=ejk | e) EU(α_jk | e, Ej=ejk)) − EU(α|e)
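The oil-block example can be checked numerically. The sketch below instantiates the derivation above with assumed values n = 10 and C = 10,000,000 USD; any other values give the same conclusion.

```python
# A numerical check of the oil-block example; n and C are assumptions.
n, C = 10, 10_000_000
price = C / n                 # the asking price of each block

# Without the survey: a random block contains oil with probability 1/n.
eu_no_info = (1 / n) * C - price                          # = 0

# With the survey of one specific block:
#  - prob 1/n: it shows oil -> buy that block, profit C - C/n
#  - prob (n-1)/n: no oil -> buy another block, which contains oil
#    with probability 1/(n-1)
eu_info = (1 / n) * (C - price) \
        + ((n - 1) / n) * ((1 / (n - 1)) * C - price)

vpi = eu_info - eu_no_info
print(vpi, C / n)             # both 1,000,000: the survey is worth C/n
```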
The value of information (qualitatively)

When is it beneficial to obtain new information?

[Figure: three cases – a clear choice, where the information is not needed; an unclear choice, where the information is crucial; and an unclear choice, where the information is less valuable.]

Information has value to the extent that
– it is likely to cause a change of plan, and
– the new plan will be significantly better than the old plan.

Properties of the value of information

Is it possible for information to be deleterious?
The expected value of information is nonnegative:
    ∀e, Ej: VPI_e(Ej) ≥ 0
The value of information is not additive:
    VPI_e(Ej, Ek) ≠ VPI_e(Ej) + VPI_e(Ek)   (in general)
The expected value of information is order independent:
    VPI_e(Ej, Ek) = VPI_e(Ej) + VPI_{e,ej}(Ek) = VPI_e(Ek) + VPI_{e,ek}(Ej)

Information gathering

A sensible agent should
– ask questions in a reasonable order,
– avoid asking questions that are irrelevant,
– take into account the importance of each piece of information in relation to its cost, and
– stop asking questions when that is appropriate.
We assume that with each observable evidence variable Ej there is an associated cost, Cost(Ej).
An information-gathering agent can select (greedily) the most efficient observation until no observation is worth its cost (the myopic approach) – see the sketch below.
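A minimal sketch of such a myopic agent follows, assuming the application supplies functions vpi(evidence, var), cost(var) and observe(var); these names are placeholders for illustration, not a fixed API.

```python
# A sketch of myopic (greedy) information gathering. The functions
# vpi, cost and observe are assumed to be provided by the application.

def gather_information(observable, vpi, cost, observe):
    """Greedily request the observation with the best net benefit
    until no remaining observation is worth its cost."""
    evidence = {}
    candidates = set(observable)
    while candidates:
        # Most promising next question under the myopic approximation.
        best = max(candidates, key=lambda v: vpi(evidence, v) - cost(v))
        if vpi(evidence, best) <= cost(best):
            break                       # nothing left is worth its cost
        evidence[best] = observe(best)  # ask the question, record answer
        candidates.remove(best)
    return evidence
```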
© 2016 Roman Barták
Department of Theoretical Computer Science and Mathematical Logic
[email protected]