Utilities and Decision Theory. Lirong Xia

Checking conditional independence from BN graph
- Given random variables \( Z_1, \ldots, Z_p \), we are asked whether \( X \perp Y \mid Z_1, \ldots, Z_p \)
  - dependent if there exists a path where all triples are active
  - independent if for each path, there exists an inactive triple

General method for variable elimination
- Compute a marginal probability \( p(x_1, \ldots, x_p) \) in a Bayesian network
  - Let \( Y_1, \ldots, Y_k \) denote the remaining variables
  - Step 1: fix an order over the Y's (wlog \( Y_1 > \cdots > Y_k \))
  - Step 2: rewrite the summation as
    \[ (\text{something only involving } X\text{'s}) \sum_{y_1} (\text{something only involving } Y_1 \text{ and } X\text{'s}) \sum_{y_2} (\text{something only involving } Y_1, Y_2 \text{ and } X\text{'s}) \cdots \sum_{y_{k-1}} (\text{something only involving } Y_1, \ldots, Y_{k-1} \text{ and } X\text{'s}) \sum_{y_k} (\text{anything}) \]
  - Step 3: variable elimination from right to left
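
The three steps can be sketched on a small example. The chain A -> B -> C, its probability tables, and all function names below are hypothetical and chosen only for illustration; the query variable is C, and the remaining variables are ordered B > A, so A is eliminated first (right to left).

```python
from itertools import product

# Hypothetical CPTs for a binary chain A -> B -> C (illustrative numbers only).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (b, a)
p_c_given_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # key: (c, b)


def marginal_c_brute_force(c):
    """Sum the full joint over every assignment of A and B."""
    return sum(
        p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]
        for a, b in product((0, 1), repeat=2)
    )


def marginal_c_eliminated(c):
    """Compute p(c) = sum_b p(c|b) * [sum_a p(b|a) p(a)], innermost sum first."""
    # Eliminate A (the rightmost sum), leaving a factor that depends only on B.
    factor_b = {
        b: sum(p_b_given_a[(b, a)] * p_a[a] for a in (0, 1)) for b in (0, 1)
    }
    # Eliminate B, leaving a number that depends only on the query value c.
    return sum(p_c_given_b[(c, b)] * factor_b[b] for b in (0, 1))
```

Both functions return the same marginal; the eliminated version never builds the full joint table, which is the point of pushing sums inward.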

Today
- Utility theory
  - expected utility: preferences over lotteries
  - maximum expected utility (MEU) principle

Expectimax Search Trees
- Expectimax search
  - Max nodes (our moves), as in minimax search
  - Chance nodes
    - Need to compute chance node values as expected utilities
- Next class we will formalize the underlying problem as a Markov decision process

Expectimax Pseudocode
- def value(s):
      if s is a max node: return maxvalue(s)
      if s is a chance node: return expvalue(s)
      if s is a terminal node: return evaluation(s)
- def maxvalue(s):
      values = [value(s') for s' in successors(s)]
      return max(values)
- def expvalue(s):
      values = [value(s') for s' in successors(s)]
      weights = [probability(s, s') for s' in successors(s)]
      return expectation(values, weights)
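
A runnable version of the pseudocode might look like the sketch below. The tuple-based tree encoding ("terminal", "max", "chance") and the example tree are assumptions made for illustration; the slides do not prescribe a data structure.

```python
def value(node):
    """Return the expectimax value of a node.

    Assumed encoding (hypothetical, not from the slides):
      ("terminal", utility)
      ("max", [child, ...])
      ("chance", [(probability, child), ...])
    """
    kind, payload = node
    if kind == "terminal":
        return payload
    if kind == "max":
        # Max node: we pick the successor with the highest value.
        return max(value(child) for child in payload)
    if kind == "chance":
        # Chance node: probability-weighted average of successor values.
        return sum(p * value(child) for p, child in payload)
    raise ValueError(f"unknown node kind: {kind!r}")


# Illustrative tree: one max node choosing between two chance nodes.
example_tree = (
    "max",
    [
        ("chance", [(0.5, ("terminal", 8)), (0.5, ("terminal", 24))]),
        ("chance", [(0.25, ("terminal", 0)), (0.75, ("terminal", 12))]),
    ],
)
```

On this tree the two chance nodes evaluate to 16 and 9, so the max node takes the first branch.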

Expectimax Quantities [figure]

Maximum expected utility
- Principle of maximum expected utility: a rational agent should choose the action that maximizes its expected utility, given its (probabilistic) knowledge
- Questions:
  - Where do utilities come from?
  - How do we know such utilities even exist?
  - What if our behavior can't be described by utilities?

Inference with Bayes Rule
- Example: diagnostic probability from causal probability:
  \[ p(\text{Cause} \mid \text{Effect}) = \frac{p(\text{Effect} \mid \text{Cause})\, p(\text{Cause})}{p(\text{Effect})} \]
- Example:
  - F is fire, \( \{f, \neg f\} \)
  - A is the alarm, \( \{a, \neg a\} \)
  - \( p(f) = 0.001 \), \( p(a) = 0.1 \), \( p(a \mid f) = 0.9 \)
  \[ p(f \mid a) = \frac{p(a \mid f)\, p(f)}{p(a)} = \frac{0.9 \times 0.001}{0.1} = 0.009 \]
- Note: the posterior probability of fire is still very small
- Note: you should still run when hearing an alarm! Why?
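
The fire/alarm numbers can be checked with a few lines of code. The sketch below also derives the false-alarm rate p(a | not f) implied by the three given probabilities, using the law of total probability; that quantity is not stated on the slide, and the function names are hypothetical.

```python
def posterior(prior, likelihood, evidence):
    """Bayes rule: p(cause | effect) = p(effect | cause) * p(cause) / p(effect)."""
    return likelihood * prior / evidence


def implied_false_alarm_rate(p_f, p_a, p_a_given_f):
    """Solve p(a) = p(a|f) p(f) + p(a|not f) (1 - p(f)) for p(a|not f)."""
    return (p_a - p_a_given_f * p_f) / (1.0 - p_f)


p_f, p_a, p_a_given_f = 0.001, 0.1, 0.9  # values from the slide

p_f_given_a = posterior(p_f, p_a_given_f, p_a)
p_a_given_not_f = implied_false_alarm_rate(p_f, p_a, p_a_given_f)
```

Under these numbers the alarm goes off in roughly one out of ten fire-free situations, which is one way to see why the posterior probability of fire stays small even after hearing the alarm.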

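[Figure: decision tree with two actions. Stay: two outcomes with probabilities 0.009 and 0.991. Run out: a single outcome with probability 100%.]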

Utilities
- Utilities are functions from outcomes (states of the world, sample space) to real numbers that represent an agent's preferences
- Where do utilities come from?
  - In a game, may be simple (+1/-1)
  - Utilities summarize the agent's goals
[Figure: decision tree whose outcomes carry the utilities -10100, 100, and -100.]

Preferences over lotteries
- An agent chooses among:
  - Prizes: A, B, etc.
  - Lotteries: situations with uncertain prizes
- Notation:
  - \( L = [p, A;\ (1-p), B] \): lottery L yields A with probability p and B with probability 1-p
  - \( A \succ B \): A is strictly preferred to B
  - \( A \sim B \): indifference between A and B
  - \( A \succeq B \): A is strictly preferred to or indifferent with B

Utility theory in Economics
- State of the world: money you earn
  - Money does not behave as a utility function, but we can talk about the utility of having money (or being in debt)
- Which would you prefer?
  - A lottery ticket that pays out $10 with probability 0.5 and $0 otherwise, or
  - A lottery ticket that pays out $1 with probability 1
- How about:
  - A lottery ticket that pays out $100,000,000 with probability 0.5 and $0 otherwise, or
  - A lottery ticket that pays out $10,000,000 with probability 1
- Usually, people do not simply go by expected value

- Which one would you prefer?
  - Lottery A: $1M@100%
  - Lottery B: $1M@89% + $5M@10% + $0@1%
- How about
  - Lottery A: $1M@11% + $0@89%
  - Lottery B: $0@90% + $5M@10%
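
These two pairs of lotteries are the choice pattern commonly known as the Allais paradox. The sketch below compares the two choices under expected utility. The lotteries are copied from the slide; the three utility functions are illustrative assumptions. For any utility function, the expected-utility gap between A and B is the same in both questions, so an expected-utility maximizer must pick A both times or B both times.

```python
import math

# Lotteries as lists of (probability, prize in dollars), copied from the slide.
first_a = [(1.00, 1_000_000)]
first_b = [(0.89, 1_000_000), (0.10, 5_000_000), (0.01, 0)]
second_a = [(0.11, 1_000_000), (0.89, 0)]
second_b = [(0.90, 0), (0.10, 5_000_000)]


def expected_utility(lottery, utility):
    """Probability-weighted sum of the utilities of the prizes."""
    return sum(p * utility(prize) for p, prize in lottery)


# Illustrative utility functions (assumptions, not from the slides).
utilities = {
    "linear": lambda x: float(x),
    "sqrt": math.sqrt,
    "log1p": math.log1p,
}


def preference_gaps(utility):
    """Return (EU(A) - EU(B)) for the first and for the second question."""
    gap_first = expected_utility(first_a, utility) - expected_utility(first_b, utility)
    gap_second = expected_utility(second_a, utility) - expected_utility(second_b, utility)
    return gap_first, gap_second
```

Both gaps reduce algebraically to 0.11 U($1M) - 0.10 U($5M) - 0.01 U($0), which is why they agree for every utility function tried here. A person who picks A in the first question and B in the second cannot be described by any single utility function.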

Encoding preferences over lotteries
- How many lotteries? Infinitely many!
- Need to find a compact representation
  - Maximum expected utility (MEU) principle
  - Which types of preferences (rankings over lotteries) can be represented by MEU?

Rational Preferences
- We want some constraints on preferences before we call them rational
  \[ (A \succ B) \wedge (B \succ C) \Rightarrow (A \succ C) \]
- For example: an agent with intransitive preferences can be induced to give away all of its money
  - If \( B \succ C \), then an agent with C would pay (say) 1 cent to get B
  - If \( A \succ B \), then an agent with B would pay (say) 1 cent to get A
  - If \( C \succ A \), then an agent with A would pay (say) 1 cent to get C
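
The money-pump argument can be simulated directly. This is a minimal sketch: the set-of-pairs encoding of preferences and the function name are assumptions, and the 1-cent fee follows the slide.

```python
# Cyclic strict preferences from the slide: A over B, B over C, C over A.
# Each pair (x, y) means "x is strictly preferred to y".
prefers = {("A", "B"), ("B", "C"), ("C", "A")}


def run_money_pump(start_item, cents, rounds):
    """Offer the agent a strictly preferred item for 1 cent, up to `rounds` times.

    Returns the item held and the cents remaining at the end.
    """
    item = start_item
    for _ in range(rounds):
        if cents < 1:
            break  # the agent has nothing left to pay with
        # Exactly one item is strictly preferred to the current one in this cycle.
        better = next(x for (x, y) in prefers if y == item)
        item = better
        cents -= 1
    return item, cents
```

After three trades the agent holds the item it started with but is 3 cents poorer, and nothing stops the cycle from repeating until its money is gone.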

Rational Preferences
- Preferences of a rational agent must obey constraints
- The axioms of rationality: for all lotteries A, B, C
  - Orderability: \( (A \succ B) \vee (B \succ A) \vee (A \sim B) \)
  - Transitivity: \( (A \succ B) \wedge (B \succ C) \Rightarrow (A \succ C) \)
  - Continuity: \( A \succ B \succ C \Rightarrow \exists p\ [p, A;\ 1-p, C] \sim B \)
  - Substitutability: \( A \sim B \Rightarrow [p, A;\ 1-p, C] \sim [p, B;\ 1-p, C] \)
  - Monotonicity: \( A \succ B \Rightarrow \big( p \ge q \Leftrightarrow [p, A;\ 1-p, B] \succeq [q, A;\ 1-q, B] \big) \)
- Theorem: rational preferences imply behavior describable as maximization of expected utility

MEU Principle
- Theorem: [Ramsey, 1931; von Neumann & Morgenstern, 1944]
  - Given any preference satisfying these axioms, there exists a real-valued function U such that:
    \[ U(A) > U(B) \Leftrightarrow A \succ B \]
    \[ U([p_1, S_1; \ldots; p_n, S_n]) = \sum_i p_i\, U(S_i) \]
- Maximum expected utility (MEU) principle:
  - Choose the action that maximizes expected utility
  - Utilities are just a representation! An agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities
  - Utilities are NOT money

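What would you do?
[Figure: decision tree with two actions. Stay: two outcomes with probabilities 0.009 and 0.991. Run out: a single outcome with probability 100%. Outcome utilities shown: -10100, 100, and -100.]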

Common types of utilities

Risk attitudes
- An agent is risk-neutral if she only cares about the expected value of the lottery ticket
- An agent is risk-averse if she always prefers the expected value of the lottery ticket to the lottery ticket
  - Most people are like this
- An agent is risk-seeking if she always prefers the lottery ticket to the expected value of the lottery ticket

Decreasing marginal utility
- Typically, at some point, having an extra dollar does not make people much happier (decreasing marginal utility)
[Figure: utility vs. money. $200: buy a bike (utility = 1); $1500: buy a car (utility = 2); $5000: buy a nicer car (utility = 3).]

Maximizing expected utility
[Figure: utility vs. money. $200: buy a bike (utility = 1); $1500: buy a car (utility = 2); $5000: buy a nicer car (utility = 3).]
- Lottery 1: get $1500 with probability 1
  - gives expected utility 2
- Lottery 2: get $5000 with probability 0.4, $200 otherwise
  - gives expected utility 0.4*3 + 0.6*1 = 1.8
  - (expected amount of money = 0.4*$5000 + 0.6*$200 = $2120 > $1500)
- So: maximizing expected utility is consistent with risk aversion
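
The comparison between the two lotteries can be reproduced in code. The utility values and lotteries come from the slide; the dictionary encoding and function names are assumptions made for this sketch.

```python
# Utility of each amount of money, read off the slide's figure.
utility_of_money = {200: 1, 1500: 2, 5000: 3}

# Lotteries as lists of (probability, dollars).
lotteries = {
    "lottery_1": [(1.0, 1500)],
    "lottery_2": [(0.4, 5000), (0.6, 200)],
}


def expected_utility(lottery):
    """Probability-weighted utility of the prizes."""
    return sum(p * utility_of_money[money] for p, money in lottery)


def expected_money(lottery):
    """Probability-weighted dollar amount of the prizes."""
    return sum(p * money for p, money in lottery)


# MEU principle: pick the lottery with the highest expected utility.
best_by_utility = max(lotteries, key=lambda name: expected_utility(lotteries[name]))
# For contrast: the lottery an expected-money maximizer would pick.
best_by_money = max(lotteries, key=lambda name: expected_money(lotteries[name]))
```

The two criteria disagree here: lottery 2 has more expected money, but lottery 1 has higher expected utility, so an MEU agent with this utility function takes the sure $1500.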

Different possible risk attitudes under expected utility maximization
[Figure: four utility-vs-money curves (green, blue, red, grey).]
- Green has decreasing marginal utility: risk-averse
- Blue has constant marginal utility: risk-neutral
- Red has increasing marginal utility: risk-seeking
- Grey's marginal utility is sometimes increasing, sometimes decreasing: neither risk-averse (everywhere) nor risk-seeking (everywhere)
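
The link between curve shape and risk attitude can be checked numerically by comparing the utility of a lottery's expected value with the lottery's expected utility. The three utility functions below (square root, identity, square) are illustrative stand-ins for the green, blue, and red curves, not the curves from the figure, and the check covers a single lottery, whereas the definitions require the preference to hold for every lottery.

```python
import math


def certainty_gap(utility, lottery):
    """Return U(E[money]) - E[U(money)] for a lottery of (probability, dollars)."""
    expected_money = sum(p * x for p, x in lottery)
    expected_utility = sum(p * utility(x) for p, x in lottery)
    return utility(expected_money) - expected_utility


def attitude_on(utility, lottery, tol=1e-9):
    """Classify the attitude toward this one lottery (not a proof for all lotteries)."""
    gap = certainty_gap(utility, lottery)
    if gap > tol:
        return "risk-averse"   # prefers the sure expected value
    if gap < -tol:
        return "risk-seeking"  # prefers the lottery
    return "risk-neutral"


coin_flip = [(0.5, 0.0), (0.5, 100.0)]  # hypothetical lottery

concave = math.sqrt            # decreasing marginal utility
linear = lambda x: x           # constant marginal utility
convex = lambda x: x * x       # increasing marginal utility
```

On the coin flip, the concave function prefers the sure $50, the linear one is indifferent, and the convex one prefers the gamble.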

Example: Insurance
- Because people ascribe different utilities to different amounts of money, insurance agreements can increase both parties' expected utility
- You own a car. Your lottery: \( L_Y = [0.8, \$0;\ 0.2, -\$200] \), i.e., 20% chance of crashing
- You do not want -$200! Insurance is $50
  - \( U_Y(L_Y) = 0.2 \times U_Y(-\$200) = -200 \) (the $0 outcome contributes 0)
  - \( U_Y(-\$50) = -150 \)

  Amount   Your utility U_Y
  $0       0
  -$50     -150
  -$200    -1000

Example: Insurance (continued)
- Insurance company buys risk: \( L_I = [0.8, \$50;\ 0.2, -\$150] \), i.e., $50 revenue + your \( L_Y \)
- Insurer is risk-neutral: \( U(L) = U(\mathrm{EMV}(L)) \), where EMV is the expected monetary value
- You are better off insured: \( U_Y(L_Y) = 0.2 \times U_Y(-\$200) = -200 < U_Y(-\$50) = -150 \)
- The insurer is better off too: \( U_I(L_I) = U(0.8 \times 50 + 0.2 \times (-150)) = U(\$10) > U(\$0) \)
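
The insurance example can be worked through in code. The numbers are taken from the slides; the variable names and the dictionary encoding of your utility table are assumptions made for this sketch.

```python
# Your utility for each dollar outcome (table from the slide).
your_utility = {0: 0, -50: -150, -200: -1000}

# Lotteries as lists of (probability, dollars).
your_lottery = [(0.8, 0), (0.2, -200)]        # uninsured: 20% chance of crashing
insurer_lottery = [(0.8, 50), (0.2, -150)]    # insurer: $50 premium plus your lottery
premium = 50

# Your expected utility without insurance, and your utility when paying the premium.
eu_uninsured = sum(p * your_utility[x] for p, x in your_lottery)
eu_insured = your_utility[-premium]

# The insurer is risk-neutral, so only the expected monetary value matters to it.
insurer_emv = sum(p * x for p, x in insurer_lottery)

# The deal helps both sides if you prefer paying the premium and the insurer's EMV is positive.
both_better_off = eu_insured > eu_uninsured and insurer_emv > 0
```

You move from an expected utility of -200 to -150, and the insurer expects to earn $10, so both parties gain from the agreement.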

Acting optimally over time
- Finite number of periods: overall utility = sum of rewards in individual periods
- Infinite number of periods: are we just going to add up the rewards over infinitely many periods?
  - The sum can be infinite!
- (Limit of) average payoff: \( \lim_{n \to \infty} \sum_{1 \le t \le n} r(t)/n \)
  - Limit may not exist
- Discounted payoff: \( \sum_t \gamma^t r(t) \) for some \( \gamma < 1 \)
  - Interpretations of discounting:
    - Interest rate r: \( \gamma = 1/(1+r) \)
    - World ends with some probability \( 1 - \gamma \)
- Discounting is mathematically convenient
  - We will see more in the next class
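
The payoff criteria can be compared on a finite prefix of a reward stream. This is a sketch under stated assumptions: the function names are hypothetical, time is indexed from t = 0 in the discounted sum (the slide does not fix the starting index), and a long finite list stands in for the infinite stream.

```python
def total_payoff(rewards):
    """Undiscounted sum of rewards; grows without bound for a constant positive stream."""
    return sum(rewards)


def average_payoff(rewards):
    """Average reward over the first n periods (n = len(rewards))."""
    return sum(rewards) / len(rewards)


def discounted_payoff(rewards, gamma):
    """Sum of gamma**t * r(t), with t starting at 0 (an assumption)."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def gamma_from_interest_rate(rate):
    """Discount factor implied by a per-period interest rate r: gamma = 1 / (1 + r)."""
    return 1.0 / (1.0 + rate)


constant_stream = [1.0] * 1000  # reward 1 in every period (illustrative)
```

For a constant reward of 1 the undiscounted total keeps growing with the horizon, while the discounted payoff with gamma = 0.9 approaches 1 / (1 - 0.9) = 10, which is the sense in which discounting is mathematically convenient.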