Decision making in the presence of uncertainty

CS 2750 Foundations of AI, Lecture 20
Decision making in the presence of uncertainty
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Decision-making in the presence of uncertainty

Computing the probability of some event may not be our ultimate goal. Instead, we are often interested in making decisions about our future actions so that we satisfy some goals.

Example: medicine. Diagnosis is typically only the first step; the ultimate goal is to manage the patient in the best possible way. There are typically many options available: surgery, medication, or collecting new information (a lab test). There is uncertainty in the outcomes of these procedures: the patient may improve, get worse, or even die as a result of different management choices.

Decision-making in the presence of uncertainty

Main issues:
- How do we model a decision process with uncertain outcomes in the computer?
- How do we make decisions about actions in the presence of uncertainty?
The field of decision making studies ways of making decisions in the presence of uncertainty.

Decision making example

Assume we want to invest $100 for 6 months. We have 4 choices:
1. Invest in Stock 1
2. Invest in Stock 2
3. Put the money in the bank
4. Keep the money at home
A stock's value can go up or down: up with some probability p and down with probability 1 - p, with different probabilities for Stock 1 and Stock 2.

Decision making example (continued)

(The slide shows a table of monetary outcomes for each of the four choices under the up and down states.)

Decision making example

We need to choose whether to invest in Stock 1 or Stock 2, put the money into the bank, or keep it at home. But how?

Decision making example

Assume a simplified problem with the bank and keep-at-home choices only. The result is guaranteed: the outcome is deterministic. What is the rational choice, assuming our goal is to make money?

Decision making. Deterministic outcomes

Assume a simplified problem with the bank and keep-at-home choices only. These choices are deterministic, and our goal is to make money. What is the rational choice? Answer: put the money into the bank. The bank choice is always strictly better in terms of the outcome. But what should we do if we have uncertain outcomes?

Decision making. Stochastic outcomes

How do we quantify the goodness of a stochastic outcome? We want to compare it to deterministic outcomes and to other stochastic outcomes.

Decision making. Stochastic outcomes

Idea: use the expected value of the outcome.

Expected value

Let X be a random variable representing the monetary outcome, taking values in a discrete set Ω. The expected value of X is:

    E(X) = Σ_{x ∈ Ω} x · P(X = x)

Intuition: the expected value summarizes all stochastic outcomes into a single quantity.

Example: what is the expected value of the outcome of the Stock 1 option?

Expected value

Applying the definition, the expected value of the outcome of the Stock 1 option is the probability-weighted sum of its two monetary outcomes:

    E(X) = 66 + 36 = 102

Expected values

Investing $100 for 6 months: Stock 1 has an expected outcome of 102. What are the expected values of the other choices?
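A minimal sketch of this computation in Python. The concrete outcome values and probabilities below (an up outcome of $110 with probability 0.6 and a down outcome of $90 with probability 0.4) are assumptions chosen to reproduce the 66 + 36 = 102 expectation above; the original slide numbers were not preserved.

```python
def expected_value(outcomes):
    """E(X) = sum of x * P(X = x) over a discrete outcome distribution,
    given as a list of (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)

# Assumed numbers for Stock 1: 0.6 * 110 = 66 and 0.4 * 90 = 36,
# matching the 66 + 36 = 102 expectation on the slide.
stock1 = [(110, 0.6), (90, 0.4)]
print(expected_value(stock1))  # ~102
```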

Expected values

Investing $100 for 6 months:
- Stock 1: 66 + 36 = 102
- Stock 2: 56 + 48 = 104

Selection based on expected values

The optimal action is the option that maximizes the expected outcome. Here that is Stock 2, with an expected outcome of 104.
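The selection rule itself is a one-line argmax over expected values. A sketch reusing expected_value from above; the Stock 2 numbers (140 with probability 0.4, 80 with probability 0.6) are assumptions that reproduce 56 + 48 = 104, and the deterministic bank (101) and home (100) outcomes are purely illustrative placeholders.

```python
# Action -> outcome-distribution map. The stock distributions are assumed
# reconstructions; the bank and home values are illustrative placeholders.
actions = {
    "stock1": [(110, 0.6), (90, 0.4)],   # E = 66 + 36 = 102
    "stock2": [(140, 0.4), (80, 0.6)],   # E = 56 + 48 = 104
    "bank":   [(101, 1.0)],              # deterministic outcome
    "home":   [(100, 1.0)],              # deterministic outcome
}

best = max(actions, key=lambda a: expected_value(actions[a]))
print(best)  # stock2, the maximum-expected-outcome action
```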

Relation to game search

Game search: the minimax algorithm assumes a rational opponent and considers its best move. Decision making: we maximize the expectation; we play against nature, a stochastic, non-malicious opponent.

(Stochastic) decision tree

A decision tree has three kinds of nodes: decision nodes, where we choose an action; chance nodes, where nature chooses a stochastic outcome; and outcome (value) nodes at the leaves.
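The three node types map directly onto a small set of data structures. A minimal sketch in Python; the class and field names are my own choices, not identifiers from the lecture.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    """Leaf node: a terminal monetary value."""
    value: float

@dataclass
class Chance:
    """Chance node: nature picks a branch; (probability, subtree) pairs."""
    branches: list

@dataclass
class Decision:
    """Decision node: the agent picks the best action; name -> subtree."""
    actions: dict
```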

Sequential (multi-step) problems

The decision tree can be built to capture multi-step decision problems: choose an action, observe the stochastic outcome, and repeat. How do we make decisions for multi-step problems?
- Start from the leaves of the decision tree (the outcome nodes).
- Compute expectations at the chance nodes.
- Maximize at the decision nodes.
This algorithm is sometimes called expectimax; a sketch is given below.

Multi-step problem example

Assume two investment periods and two actions: stock and bank.
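A recursive implementation of this leaves-to-root rule over the node types sketched above; this is my own rendering of the expectimax idea described on the slide, not code from the lecture.

```python
def expectimax(node):
    """Value of a (sub)tree: expectation at chance nodes, max at decision nodes."""
    if isinstance(node, Outcome):
        return node.value                      # leaf: just the outcome value
    if isinstance(node, Chance):
        return sum(p * expectimax(child)       # expectation over branches
                   for p, child in node.branches)
    if isinstance(node, Decision):
        return max(expectimax(child)           # value of the best action
                   for child in node.actions.values())
    raise TypeError(f"unknown node type: {type(node).__name__}")
```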

Multi-step problem example

(The slide builds the two-period decision tree step by step; leaf outcome values such as 150 and 95 appear at the ends of the stock branches.)

Multi-step problem example

Backing up from the leaves: a chance node whose branches lead to the outcomes 150 and 95 evaluates to the expectation 117. This is consistent with up/down probabilities of 0.4 and 0.6, since 0.4 · 150 + 0.6 · 95 = 60 + 57 = 117.
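The same computation with the code above. The 0.4/0.6 probabilities are implied by the 117 expectation; the bank outcome of 110 at the decision node is an assumed placeholder, since the bank branch's value was not preserved.

```python
# Chance node reconstructed from the slide: 0.4 * 150 + 0.6 * 95 = 117.
stock = Chance(branches=[(0.4, Outcome(150)), (0.6, Outcome(95))])
print(expectimax(stock))  # ~117, matching the slide

# The decision node compares this against the bank branch and takes the max;
# the bank value of 110 here is an assumed placeholder.
root = Decision(actions={"stock": stock, "bank": Outcome(110)})
print(expectimax(root))  # ~117 -> the stock branch is chosen
```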

Multi-step problems. Conditioning

Notice that in the example above, the probability of the stock going up or down in the 2nd step is the same regardless of the 1st step's outcome: the steps are independent.

Conditioning in the decision tree

This independence may not hold in general. In decision trees, later outcomes can be conditioned on the earlier stochastic outcomes and actions. Example: stock movement probabilities specified as P(1st = up), P(2nd = up | 1st = up), and P(2nd = up | 1st = down).

Multi-step problems. Conditioning

Tree structure: every observed stochastic outcome corresponds to one branch. The 2nd-step branches below the 1st-step "up" branch carry the probabilities P(2nd = up | 1st = up) and P(2nd = down | 1st = up), and similarly below the 1st-step "down" branch. (The slide draws the two-period tree with each branch labeled by the matching conditional probability.)

Trajectory payoffs

Outcome values sit at the leaf nodes (e.g., monetary values); rewards and costs accumulate along the path trajectory. Example: stock fees and gains. Assume an initial investment of $1000, a fee of $5 per period paid at the beginning of the period, a 15% gain for up, and a 10% loss for down. Tracing the trajectories:

    start:                1000
    pay the fee:          1000 - 5
    1st period up:        (1000 - 5) * 1.15
    pay the fee again:    (1000 - 5) * 1.15 - 5
    2nd period up:        [(1000 - 5) * 1.15 - 5] * 1.15 = 1310.14
    2nd period down:      [(1000 - 5) * 1.15 - 5] * 0.9 = 1025.33

Constructing a decision tree

The decision tree is rarely given to you directly; part of the problem is to construct the tree. Example: stocks, bonds, and the bank for k periods.
Stocks:
- Probability of the stock going up in the first period: 0.3
- Probability of the stock going up in subsequent periods is conditioned on the previous step: P(kth step = up | (k-1)th step = up) and P(kth step = up | (k-1)th step = down)
- Return if the stock goes up: 15%; if down: a 10% loss
- Fixed fee per investment period: $5
Bonds:
- Probability of the bond's value going up or down in each period: as given
- Return if the bond value goes up: 7%; if down: 3%
- Fee per investment period: $2
Bank:
- Guaranteed return of 3% per period, no fee
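A sketch of building and solving such a tree with the node types and the expectimax function above. The first-period up probability (0.3), the returns, and the fees follow the list above; the conditional up probabilities for later stock periods and the bond's up probability were not preserved, so the 0.4, 0.2, and 0.5 values below are placeholders, and resetting the stock's conditioning after a bond or bank period is also a modeling assumption.

```python
def build_tree(wealth, period, k, prev_stock_up=None):
    """Decision node for one period: invest in the stock, bonds, or the bank."""
    if period == k:
        return Outcome(wealth)            # leaf: final wealth after k periods

    # Stock: $5 fee up front, then +15% (up) or -10% (down). The first period
    # goes up with probability 0.3; later periods condition on the previous
    # step (0.4 after up, 0.2 after down are placeholder values).
    p_up = 0.3 if prev_stock_up is None else (0.4 if prev_stock_up else 0.2)
    ws = wealth - 5
    stock = Chance([
        (p_up,     build_tree(ws * 1.15, period + 1, k, prev_stock_up=True)),
        (1 - p_up, build_tree(ws * 0.90, period + 1, k, prev_stock_up=False)),
    ])

    # Bonds: $2 fee, then +7% (up) or +3% (down); 0.5 is a placeholder.
    wb = wealth - 2
    bonds = Chance([
        (0.5, build_tree(wb * 1.07, period + 1, k)),
        (0.5, build_tree(wb * 1.03, period + 1, k)),
    ])

    # Bank: guaranteed +3% per period, no fee.
    bank = build_tree(wealth * 1.03, period + 1, k)

    return Decision(actions={"stock": stock, "bonds": bonds, "bank": bank})

root = build_tree(wealth=1000, period=0, k=2)
print(expectimax(root))  # expected final wealth under the optimal policy
```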