Decision making in the presence of uncertainty

CS 271 Foundations of AI
Lecture 21: Decision making in the presence of uncertainty
Milos Hauskrecht
milos@cs.pitt.edu
5329 Sennott Square

Decision-making in the presence of uncertainty

Many real-world problems require choosing future actions in the presence of uncertainty.
Examples: patient management, investment decisions.
Main issues:
- How to model the decision process in the computer?
- How to make decisions about actions in the presence of uncertainty?

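(Stochastic) Decision tree

A decision tree has three kinds of nodes: decision nodes, chance nodes, and outcome (value) nodes.

Investment example (outcome values at the leaves):
  Stock 1: 110 or 90
  Stock 2: 140 or 80
  Bank:    101
  Home:    100

Decision tree: solution (expectimax)

Expected values at the chance nodes:
  Stock 1: 102
  Stock 2: 104
  Bank:    101
  Home:    100
The decision node selects the option with the highest expected value: Stock 2 (104).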
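The one-step solution can be sketched in a few lines of Python. This is only an illustration: the outcome values are read as 110/90, 140/80, 101, and 100, the label `Bank` for the 101 option is assumed, and the branch probabilities (0.6/0.4 for Stock 1, 0.4/0.6 for Stock 2) are assumptions chosen because they reproduce the expected values 102 and 104. The names `options` and `expected_value` are illustrative.

```python
# Each option maps to a list of (probability, outcome) pairs.
# ASSUMPTION: the probabilities are inferred so that the expected values
# come out as 102 (Stock 1) and 104 (Stock 2).
options = {
    "Stock 1": [(0.6, 110), (0.4, 90)],
    "Stock 2": [(0.4, 140), (0.6, 80)],
    "Bank": [(1.0, 101)],
    "Home": [(1.0, 100)],
}


def expected_value(lottery):
    """Expectation at a chance node: sum of probability * outcome."""
    return sum(p * v for p, v in lottery)


# Chance nodes: compute the expectation for every option.
values = {name: expected_value(lottery) for name, lottery in options.items()}

# Decision node: pick the option with the largest expected value.
best = max(values, key=values.get)

for name, value in values.items():
    print(f"{name}: {value:.1f}")
print("best option:", best)
```

Under these assumed probabilities the maximization picks Stock 2.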

Sequential (multi-step) problems

The decision tree can be built to capture multi-step decision problems:
- Choose an action
- Observe a stochastic outcome
- And repeat

How to make decisions for multi-step problems?
- Start from the leaves of the decision tree (outcome nodes)
- Compute expectations at chance nodes
- Maximize at the decision nodes
The algorithm is sometimes called expectimax.

Multi-step problem example

Assume:
- Two investment periods
- Two actions: stock and bank

[Figure: two-period decision tree over the actions stock and bank, with probability 0.5 on each chance branch]
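The bottom-up procedure (expectation at chance nodes, maximization at decision nodes) can be sketched as a short recursive function. The tuple-based tree encoding, the helper names, and all numbers in the two-period tree below are hypothetical and chosen only for illustration; they are not the values from the lecture figure.

```python
def expectimax(node):
    """Evaluate a decision tree from the leaves up.

    A node is a (kind, payload) pair:
      ("outcome", value)
      ("chance", [(probability, child), ...])
      ("decision", {action_name: child, ...})
    """
    kind, payload = node
    if kind == "outcome":
        return payload
    if kind == "chance":
        return sum(p * expectimax(child) for p, child in payload)
    if kind == "decision":
        return max(expectimax(child) for child in payload.values())
    raise ValueError(f"unknown node kind: {kind}")


def best_action(decision_node):
    """Return the action with the highest expectimax value at a decision node."""
    kind, actions = decision_node
    if kind != "decision":
        raise ValueError("best_action expects a decision node")
    return max(actions, key=lambda a: expectimax(actions[a]))


def outcome(value):
    return ("outcome", value)


# HYPOTHETICAL two-period tree: first choose stock or bank, observe the
# result, then choose again. All numbers are made up for this sketch.
after_up = ("decision", {
    "stock": ("chance", [(0.5, outcome(130)), (0.5, outcome(100))]),
    "bank": outcome(112),
})
after_down = ("decision", {
    "stock": ("chance", [(0.5, outcome(110)), (0.5, outcome(80))]),
    "bank": outcome(97),
})
after_bank = ("decision", {
    "stock": ("chance", [(0.5, outcome(120)), (0.5, outcome(90))]),
    "bank": outcome(104),
})
root = ("decision", {
    "stock": ("chance", [(0.5, after_up), (0.5, after_down)]),
    "bank": after_bank,
})

print(expectimax(root), best_action(root))
```

Because the second-period decision nodes are nested inside the first-period branches, the same function handles any number of periods.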

Conditioning in the decision tree

The previous example used the same outcome probabilities in both periods, but this may not hold in general. In decision trees, later outcomes can be conditioned on the earlier stochastic outcomes and actions.

Example: stock movement probabilities. The model specifies P(1st = up), P(2nd = up | 1st = up), and P(2nd = up | 1st = down) = 0.5.

[Figure: two-period stock tree with branches (1st up), (1st down), (2nd up), (2nd down)]

Trajectory payoffs

Outcome values at leaf nodes (e.g. monetary values) reflect the rewards and costs accumulated along the path trajectory.

Example: stock fees and gains. Assume:
- Initial investment: 1000
- Fee per period: $5, paid at the beginning of the period
- Gain for up: 15%, loss for down: 10%

Values along the path:
  After the first fee:          1000 - 5
  After 1st up:                 (1000 - 5) * 1.15
  After the second fee:         (1000 - 5) * 1.15 - 5
  After 1st up, then 2nd up:    [(1000 - 5) * 1.15 - 5] * 1.15 = 1310.14
  After 1st up, then 2nd down:  [(1000 - 5) * 1.15 - 5] * 0.9 = 1025.33
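The leaf values in the fee-and-gain example can be reproduced by walking a path and applying the fee and the gain or loss period by period. This sketch assumes an initial investment of 1000, a fee of 5 per period, a 15% gain on an up move, and a 10% loss on a down move (the factor 0.9 in the formula); the function name `trajectory_value` is illustrative.

```python
def trajectory_value(initial, moves, fee=5.0, up=1.15, down=0.9):
    """Value at the end of a path of 'up'/'down' moves.

    ASSUMPTIONS: the fee is paid at the beginning of each period, then the
    remaining amount is multiplied by the up or down factor.
    """
    value = initial
    for move in moves:
        value -= fee
        value *= up if move == "up" else down
    return value


up_up = trajectory_value(1000.0, ["up", "up"])
up_down = trajectory_value(1000.0, ["up", "down"])
print(f"up, up:   {up_up:.4f}")
print(f"up, down: {up_down:.4f}")
```

The same function gives the leaf value for any other path, for example `trajectory_value(1000.0, ["down", "up"])`.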

Information-gathering actions

Many actions and their outcomes irreversibly change the world.
Information-gathering (exploratory) actions:
- make an inquiry about the world
- key benefit: reduction in the uncertainty

Example: medicine
- Assume a patient is admitted to the hospital with some set of initial complaints.
- We are uncertain about the underlying problem and consider surgery or medication to treat it.
- But there are often lab tests or observations that can help us determine more precisely which disease the patient suffers from.
- Goal of lab tests: reduce the uncertainty about the outcomes of treatments so that a better treatment option can be chosen.

Decision-making with exploratory actions

In decision trees, exploratory actions can be represented and reasoned about in the same way as other actions.

How do we capture the effect of exploratory actions in the decision tree model?
- Information obtained through exploratory actions may affect the probabilities of later outcomes.
- Recall that the probabilities of later outcomes can be conditioned on past observed outcomes and past actions.
- The sequence of past actions and outcomes is remembered within the decision tree branch.

Oil wildcatter problem

An oil wildcatter has to decide whether or not to drill at a specific site.

Chance of hitting an oil deposit:
  Oil:    40%   P(Oil = T) = 0.4
  No-oil: 60%   P(Oil = F) = 0.6
Cost of drilling: 70K
Payoffs:
  Oil:    220K
  No-oil: 0K

Outcomes of drilling: 220 - 70 = 150 (oil), -70 (no oil).
Expected value of drilling: 0.4 * 150 + 0.6 * (-70) = 18.
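The expected value of drilling without any test can be checked with a short calculation. The sketch reads the figures as a 40% chance of oil, a 220K payoff, and a 70K drilling cost, which is the reading that reproduces the expected value of 18; the value 0 for not drilling and all variable names are assumptions of this sketch.

```python
# All amounts are in thousands (K).
p_oil = 0.4            # assumed reading: 40% chance of oil
payoff_oil = 220.0     # assumed reading: 220K payoff when oil is found
payoff_no_oil = 0.0
drill_cost = 70.0      # assumed reading: 70K cost of drilling

# Chance node after "drill": expectation over oil / no oil.
ev_drill = (p_oil * (payoff_oil - drill_cost)
            + (1 - p_oil) * (payoff_no_oil - drill_cost))

# ASSUMPTION: not drilling costs nothing and earns nothing.
ev_no_drill = 0.0

best = "drill" if ev_drill > ev_no_drill else "do not drill"
print(f"EV(drill) = {ev_drill:.1f}, best action: {best}")
```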

Oil wildcatter problem

Assume that in addition to the drill/no-drill choices we have an option to run a seismic resonance test.

Seismic resonance test results:
- Closed pattern (more likely when the hole holds oil)
- Diffuse pattern (more likely when it is empty)

Test cost: 10K

P(Seismic resonance test pattern | Oil):
  Seismic resonance test pattern | Oil = True | Oil = False
  closed                         | 0.8        | 0.3
  diffuse                        | 0.2        | 0.7

[Figure: decision tree extended with the test option and its closed and diffuse branches]

Compute outcomes

Payoff for oil: +220. Cost of drilling: -70. Cost of the test: -10.

Without the test:
  Drill, oil:     220 - 70 = 150
  Drill, no oil:  -70
With the test (closed or diffuse result):
  Drill, oil:     220 - 70 - 10 = 140
  Drill, no oil:  -70 - 10 = -80
  No drill:       -10

Compute probabilities

The chance nodes that follow the test need the probability of each test result and the probability of oil given that result. Both follow from the prior P(Oil) and the test model P(pattern | Oil) by Bayes rule.

Decision tree probabilities

Closed pattern:
$P(\text{closed}) = P(\text{closed} \mid \text{Oil}=T)\,P(\text{Oil}=T) + P(\text{closed} \mid \text{Oil}=F)\,P(\text{Oil}=F) = 0.8 \cdot 0.4 + 0.3 \cdot 0.6 = 0.5$
$P(\text{Oil}=T \mid \text{closed}) = \frac{P(\text{closed} \mid \text{Oil}=T)\,P(\text{Oil}=T)}{P(\text{closed})} = \frac{0.8 \cdot 0.4}{0.5} = 0.64$
$P(\text{Oil}=F \mid \text{closed}) = \frac{P(\text{closed} \mid \text{Oil}=F)\,P(\text{Oil}=F)}{P(\text{closed})} = \frac{0.3 \cdot 0.6}{0.5} = 0.36$

Diffuse pattern:
$P(\text{diffuse}) = P(\text{diffuse} \mid \text{Oil}=T)\,P(\text{Oil}=T) + P(\text{diffuse} \mid \text{Oil}=F)\,P(\text{Oil}=F) = 0.2 \cdot 0.4 + 0.7 \cdot 0.6 = 0.5$
$P(\text{Oil}=T \mid \text{diffuse}) = \frac{P(\text{diffuse} \mid \text{Oil}=T)\,P(\text{Oil}=T)}{P(\text{diffuse})} = \frac{0.2 \cdot 0.4}{0.5} = 0.16$
$P(\text{Oil}=F \mid \text{diffuse}) = \frac{P(\text{diffuse} \mid \text{Oil}=F)\,P(\text{Oil}=F)}{P(\text{diffuse})} = \frac{0.7 \cdot 0.6}{0.5} = 0.84$

Decision tree with probabilities and outcomes:
  P(closed) = 0.5, P(diffuse) = 0.5
  After closed:  P(oil) = 0.64, P(no oil) = 0.36
  After diffuse: P(oil) = 0.16, P(no oil) = 0.84
  Outcomes without the test: 220 - 70 = 150 (oil), -70 (no oil)
  Outcomes with the test: 140 (drill, oil), -70 - 10 = -80 (drill, no oil), -10 (no drill)
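The probabilities on the test branches come from applying Bayes rule to the prior and the test model. A minimal sketch, assuming the prior P(Oil = T) = 0.4 and the likelihoods 0.8/0.3 (closed) and 0.2/0.7 (diffuse); the function name `posterior` and the dictionary keys are illustrative.

```python
def posterior(prior, likelihood):
    """Bayes rule for a discrete state.

    prior:      {state: P(state)}
    likelihood: {state: P(evidence | state)}
    Returns (P(evidence), {state: P(state | evidence)}).
    """
    evidence = sum(prior[s] * likelihood[s] for s in prior)
    post = {s: prior[s] * likelihood[s] / evidence for s in prior}
    return evidence, post


# ASSUMED reading of the problem data: 40% prior chance of oil.
prior = {"oil": 0.4, "no_oil": 0.6}

p_closed, post_closed = posterior(prior, {"oil": 0.8, "no_oil": 0.3})
p_diffuse, post_diffuse = posterior(prior, {"oil": 0.2, "no_oil": 0.7})

print(f"P(closed) = {p_closed:.2f}, P(oil | closed) = {post_closed['oil']:.2f}")
print(f"P(diffuse) = {p_diffuse:.2f}, P(oil | diffuse) = {post_diffuse['oil']:.2f}")
```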

Alternative model

[Figure: alternative decision tree for the same problem, using the probabilities 0.5/0.5, 0.64/0.36, 0.16/0.84 and the outcome values 150, -70, 140, -80, -10]

Decision tree: solution

Expected values:
  No test, drill:  18
  Test = closed (probability 0.5):
    drill:     0.64 * 140 + 0.36 * (-80) = 60.8
    no drill:  -10
    best:      drill (60.8)
  Test = diffuse (probability 0.5):
    drill:     0.16 * 140 + 0.84 * (-80) = -44.8
    no drill:  -10
    best:      no drill (-10)
  Test:  0.5 * 60.8 + 0.5 * (-10) = 25.4
Running the test (25.4) is better than drilling without it (18).

The presence of the test and its result affected our decision:
- if test = closed then drill
- if test = diffuse then do not drill

Value of information

When does the test make sense? Only when its result can make the decision maker change his mind, that is, when he decides not to drill.

Value of information:
- Measure of the goodness of the information from the test
- Difference between the expected value with and without the test information

Oil wildcatter example:
  Expected value without the test = 18
  Expected value with the test = 25.4
  Value of information for the seismic test = 7.4
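The value of information can be recomputed from the branch probabilities and outcomes. This sketch assumes the outcomes 140 (drill, oil), -80 (drill, no oil), and -10 (no drill) on the test branches, P(closed) = P(diffuse) = 0.5, and P(oil | closed) = 0.64, P(oil | diffuse) = 0.16; the variable and function names are illustrative.

```python
# Amounts in thousands (K); all values are the assumed reading described above.
p_result = {"closed": 0.5, "diffuse": 0.5}
p_oil_given = {"closed": 0.64, "diffuse": 0.16}

DRILL_OIL, DRILL_NO_OIL, NO_DRILL = 140.0, -80.0, -10.0


def best_after(result):
    """Best action and its expected value once the test result is known."""
    p = p_oil_given[result]
    ev_drill = p * DRILL_OIL + (1 - p) * DRILL_NO_OIL
    candidates = [("drill", ev_drill), ("no drill", NO_DRILL)]
    return max(candidates, key=lambda pair: pair[1])


policy = {result: best_after(result) for result in p_result}

# Chance node over test results, using the best action on each branch.
ev_with_test = sum(p_result[r] * policy[r][1] for r in p_result)

# Without the test: drilling has expected value 18; not drilling is assumed to be 0.
ev_without_test = max(18.0, 0.0)

value_of_information = ev_with_test - ev_without_test

for result, (action, value) in policy.items():
    print(f"{result}: {action} ({value:.1f})")
print(f"EV with test = {ev_with_test:.1f}, VOI = {value_of_information:.1f}")
```

Note that the expected value with the test already includes the test cost, so the difference is the net gain from testing.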

Using utility to measure the outcomes

Selection based on expected values

Until now: the optimal action choice was the option that maximized the expected monetary value.
But is the expected monetary value always the quantity we want to optimize?

[Figure: investment decision tree with expected values Stock 1: 102, Stock 2: 104, Bank: 101, Home: 100 and outcomes 110/90, 140/80, 101, 100]

Selection based on expected values

Is the expected monetary value always the quantity we want to optimize?
Answer: Yes, but only if we are risk-neutral.
But what if we do not like the risk (we are risk-averse)?
In that case we may want a premium for undertaking the risk (of losing the money).
Example: we may prefer to get $101 for sure over $102 in expectation with the risk of losing money.

Problem: How to model decisions and account for the risk?
Solution: use a utility function and utility theory.

Utility function (denoted U)
- Quantifies how we value outcomes, i.e., it reflects our preferences
- Can also be applied to value outcomes other than money and gains (e.g. utility of a patient being healthy, or ill)

Decision making uses expected utilities (denoted EU):

$EU(X) = \sum_{x} P(X = x)\, U(X = x)$

where $U(X = x)$ is the utility of outcome $x$ and the sum runs over the possible outcomes $x$ of $X$.

Important: under some conditions on preferences we can always design a utility function that fits our preferences.

Utility theory

Defines axioms on preferences that involve uncertainty and ways to manipulate them.
Uncertainty is modeled through lotteries.
Lottery: $[p : A;\ (1-p) : C]$
- Outcome A with probability p
- Outcome C with probability (1-p)

The following six constraints are known as the axioms of utility theory. The axioms are the most obvious semantic constraints on preferences with lotteries.
Notation:
  $A \succ B$: A is preferable to B
  $A \sim B$: indifferent (A and B are equally preferable)

Axioms of the utility theory

Orderability: Given any two states, a rational agent either prefers one of them or rates the two as equally preferable.
$(A \succ B) \lor (B \succ A) \lor (A \sim B)$

Transitivity: Given any three states, if an agent prefers A to B and prefers B to C, the agent must prefer A to C.
$(A \succ B) \land (B \succ C) \Rightarrow (A \succ C)$

Continuity: If some state B is between A and C in preference, then there is a p for which the rational agent will be indifferent between state B and the lottery in which A comes with probability p and C with probability (1-p).
$(A \succ B \succ C) \Rightarrow \exists p\ [p : A;\ (1-p) : C] \sim B$

Axioms of the utility theory

Substitutability: If an agent is indifferent between two lotteries, A and B, then the agent is indifferent between two more complex lotteries that are the same except that B is substituted for A in one of them.
$(A \sim B) \Rightarrow [p : A;\ (1-p) : C] \sim [p : B;\ (1-p) : C]$

Monotonicity: If an agent prefers A to B, then the agent must prefer the lottery in which A occurs with a higher probability.
$(A \succ B) \Rightarrow \left(p > q \Leftrightarrow [p : A;\ (1-p) : B] \succ [q : A;\ (1-q) : B]\right)$

Decomposability: Compound lotteries can be reduced to simpler lotteries using the laws of probability.
$[p : A;\ (1-p) : [q : B;\ (1-q) : C]] \sim [p : A;\ (1-p)q : B;\ (1-p)(1-q) : C]$

Utility theory

If the agent obeys the axioms of the utility theory, then
1. There exists a real-valued function U such that:
   $U(A) > U(B) \Leftrightarrow A \succ B$
   $U(A) = U(B) \Leftrightarrow A \sim B$
2. The utility of the lottery is the expected utility, that is, the sum of the utilities of outcomes weighted by their probability:
   $U[p : A;\ (1-p) : B] = p\,U(A) + (1-p)\,U(B)$
3. A rational agent makes decisions in the presence of uncertainty by maximizing its expected utility.
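Decomposability and the expected-utility rule can be illustrated together: a compound lottery is first reduced to a simple one, and its utility is then the probability-weighted sum of outcome utilities. The list-based lottery encoding, the probabilities p = 0.3 and q = 0.5, and the utility values below are hypothetical and serve only as an example.

```python
def flatten(lottery):
    """Reduce a (possibly compound) lottery to {outcome: probability}.

    A lottery is a list of (probability, item) pairs, where item is either
    an outcome label or another lottery (a nested list).
    """
    simple = {}
    for p, item in lottery:
        if isinstance(item, list):
            # Nested lottery: multiply probabilities along the path.
            for outcome, q in flatten(item).items():
                simple[outcome] = simple.get(outcome, 0.0) + p * q
        else:
            simple[item] = simple.get(item, 0.0) + p
    return simple


def expected_utility(lottery, utility):
    """Utility of a lottery: sum of outcome utilities weighted by probability."""
    return sum(p * utility[outcome] for outcome, p in flatten(lottery).items())


# HYPOTHETICAL numbers for illustration.
p, q = 0.3, 0.5
compound = [(p, "A"), (1 - p, [(q, "B"), (1 - q, "C")])]
utility = {"A": 1.0, "B": 0.6, "C": 0.0}

flat = flatten(compound)   # [p : A; (1-p)q : B; (1-p)(1-q) : C]
eu = expected_utility(compound, utility)
print(flat, eu)
```

The flattened probabilities are p, (1-p)q, and (1-p)(1-q), matching the right-hand side of the decomposability axiom.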

Utility functions

We can design a utility function that fits our preferences if they satisfy the axioms of utility theory.
But how do we design the utility function for monetary values so that it incorporates the risk?

What is the relation between the utility function and monetary values?
Assume we lose or gain a fixed amount of money. Typically this difference is more significant at lower monetary values than at higher ones (around $1,000,000).

What is the relation between utilities and monetary value for a typical person?
A concave function that flattens at higher monetary values.

[Figure: concave utility curve; vertical axis: utility, horizontal axis: monetary value]

Utility functions

Expected utility of a sure outcome of 750: EU(sure 750) = U(750).

[Figure: concave utility curve U(x) over monetary value with 500, 750, and 1000 marked on the horizontal axis and EU(sure 750) marked on the utility axis]

Assume a lottery L = [0.5 : 500; 0.5 : 1000].
Expected value of the lottery = 750
Expected utility of the lottery EU(L) is different:
  EU(L) = 0.5 * U(500) + 0.5 * U(1000)

[Figure: the same curve with the EU line for lotteries with outcomes 500 and 1000 and the point EU(lottery L) marked at monetary value 750]

Utility functions

Expected utility of the lottery: EU(lottery L) < EU(sure 750)

[Figure: concave utility curve U(x) with EU(sure 750) lying above EU(lottery L) for the lottery L = [0.5 : 500; 0.5 : 1000]]

Risk aversion: a bonus is required for undertaking the risk.

Decision making with utility function

Original problem with monetary outcomes:

[Figure: investment decision tree with expected values Stock 1: 102, Stock 2: 104, Bank: 101, Home: 100 and outcomes 110/90, 140/80, 101, 100]
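The inequality EU(lottery L) < EU(sure outcome) holds for any strictly concave utility function. As a small numerical check, the sketch below assumes the lottery outcomes are 500 and 1000 (so the expected value is 750) and picks the base-10 logarithm as one example of a concave utility; both choices are assumptions of this sketch.

```python
import math


def U(x):
    """ASSUMED concave utility function: base-10 logarithm of the amount."""
    return math.log10(x)


# ASSUMED reading of the lottery: 500 or 1000 with probability 0.5 each.
lottery = [(0.5, 500.0), (0.5, 1000.0)]

expected_value = sum(p * x for p, x in lottery)
eu_lottery = sum(p * U(x) for p, x in lottery)   # 0.5*U(500) + 0.5*U(1000)
eu_sure = U(expected_value)                      # utility of the sure amount

print(f"expected value = {expected_value:.0f}")
print(f"EU(lottery) = {eu_lottery:.4f}, EU(sure) = {eu_sure:.4f}")
print("risk-averse preference for the sure amount:", eu_sure > eu_lottery)
```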

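Decision making with the utility function

Utility function: log(x) (base-10 logarithm)

Utilities of the outcomes:
  Stock 1: log(110) = 2.0413, log(90) = 1.9542
  Stock 2: log(140) = 2.1461, log(80) = 1.903
  Bank:    log(101) = 2.0043
  Home:    log(100) = 2.0

Expected utilities:
  Stock 1: 2.0065
  Stock 2: 2.0003
  Bank:    2.004
  Home:    2

Under this utility function Stock 1 has the highest expected utility, so the choice changes from Stock 2 (the best option by expected monetary value) to Stock 1.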
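The comparison between expected monetary value and expected utility can be sketched as follows. The outcome values are read as 110/90, 140/80, 101, and 100, the utility is the base-10 logarithm, and the branch probabilities (0.6/0.4 and 0.4/0.6) and the label `Bank` are assumptions chosen to be consistent with the expected values 102 and 104; all names are illustrative.

```python
import math

# ASSUMED probabilities (consistent with expected values 102 and 104).
options = {
    "Stock 1": [(0.6, 110), (0.4, 90)],
    "Stock 2": [(0.4, 140), (0.6, 80)],
    "Bank": [(1.0, 101)],
    "Home": [(1.0, 100)],
}


def expectation(lottery, f=lambda v: v):
    """Expected value of f(outcome); f defaults to the monetary value itself."""
    return sum(p * f(v) for p, v in lottery)


money = {name: expectation(lot) for name, lot in options.items()}
utility = {name: expectation(lot, math.log10) for name, lot in options.items()}

best_by_money = max(money, key=money.get)
best_by_utility = max(utility, key=utility.get)

for name in options:
    print(f"{name}: EV = {money[name]:.1f}, EU = {utility[name]:.4f}")
print("best by expected value:  ", best_by_money)
print("best by expected utility:", best_by_utility)
```

The two criteria disagree here because the logarithm penalizes the low outcome of Stock 2 (80) more than it rewards the high one (140).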