Announcements. CS 188: Artificial Intelligence Fall 2007. Preferences. Rational Preferences. MEU Principle. Project 2 (due 10/1)


CS 188: Artificial Intelligence, Fall 2007
Lecture 9: Utilities, 9/25/2007
Dan Klein, UC Berkeley

Announcements
- Project 2 (due 10/1)
- SVN groups available, email us to request one
- Midterm 10/16 in class
  - One side of a page cheat sheet allowed (provided you write it yourself)
  - Tell us NOW about conflicts!

Preferences
- An agent chooses among:
  - Prizes: A, B, etc.
  - Lotteries: situations with uncertain prizes
- Notation: [preference and lottery notation shown on slide]

Rational Preferences
- We want some constraints on preferences before we call them rational
- For example: an agent with intransitive preferences can be induced to give away all its money
  - If B > C, then an agent with C would pay (say) 1 cent to get B
  - If A > B, then an agent with B would pay (say) 1 cent to get A
  - If C > A, then an agent with A would pay (say) 1 cent to get C

Rational Preferences
- Preferences of a rational agent must obey constraints
- These constraints are the axioms of rationality

MEU Principle
- Theorem [Ramsey, 1931; von Neumann & Morgenstern, 1944]: given any preferences satisfying these constraints, there exists a real-valued function U such that:
  - U(A) >= U(B) exactly when A is preferred to B
  - U([p_1, S_1; ...; p_n, S_n]) = sum_i p_i U(S_i)
- Theorem: rational preferences imply behavior describable as maximization of expected utility
- Maximum expected utility (MEU) principle: choose the action that maximizes expected utility
- Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities
  - E.g., a lookup table for perfect tic-tac-toe, a reflex vacuum cleaner
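To make the intransitivity argument concrete, here is a minimal Python sketch of the classic "money pump" (not from the slides; the prize names and one-cent charge are illustrative): an agent whose preferences cycle can be traded around the cycle indefinitely, paying at every step.

```python
# Illustrative only (not from the slides): an agent with intransitive
# preferences A > B, B > C, C > A can be "money pumped": it repeatedly
# pays a cent to trade up and ends each cycle holding the same prize, poorer.

prefers = [("B", "C"), ("A", "B"), ("C", "A")]  # (better, worse) pairs; note the cycle

def money_pump(prize, cash, cycles=3):
    for _ in range(cycles):
        for better, worse in prefers:
            if prize == worse:      # agent holds the dispreferred prize...
                cash -= 0.01        # ...so it pays 1 cent to trade up
                prize = better
    return prize, cash

prize, cash = money_pump("C", cash=1.00)
print(prize, round(cash, 2))        # -> C 0.91: same prize, 9 cents poorer after 3 cycles
```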

Human Utilities
- Utilities map states to real numbers. Which numbers?
- Standard approach to assessment of human utilities:
  - Compare a state A to a standard lottery L_p between
    - "best possible prize" u+ with probability p
    - "worst possible catastrophe" u- with probability 1-p
  - Adjust the lottery probability p until A ~ L_p
  - The resulting p is a utility in [0, 1]

Utility Scales
- Normalized utilities: u+ = 1.0, u- = 0.0
- Micromorts: one-millionth chance of death, useful for paying to reduce product risks, etc.
- QALYs: quality-adjusted life years, useful for medical decisions involving substantial risk
- Note: behavior is invariant under positive linear transformation
- With deterministic prizes only (no lottery choices), only ordinal utility can be determined, i.e., a total order on prizes

Example: Insurance
- Consider the lottery [0.5, $1000; 0.5, $0]
  - What is its expected monetary value? ($500)
  - What is its certainty equivalent? (Monetary value acceptable in lieu of the lottery; $400 for most people)
  - The difference of $100 is the insurance premium
    - There's an insurance industry because people will pay to reduce their risk
    - If everyone were risk-prone, no insurance would be needed!

Money
- Money does not behave as a utility function
- Given a lottery L:
  - Define its expected monetary value EMV(L)
  - Usually U(L) < U(EMV(L)), i.e., people are risk-averse
- Utility curve: for what probability p am I indifferent between:
  - A prize x
  - A lottery [p, $M; (1-p), $0] for large M?
- Typical empirical data, extrapolated with risk-prone behavior: [utility curve figure]

Example: Human Rationality?
- Famous example of Allais (1953)
  - A: [0.8, $4k; 0.2, $0]
  - B: [1.0, $3k; 0.0, $0]
  - C: [0.2, $4k; 0.8, $0]
  - D: [0.25, $3k; 0.75, $0]
- Most people prefer B > A, C > D
- But if U($0) = 0, then
  - B > A implies U($3k) > 0.8 U($4k)
  - C > D implies 0.8 U($4k) > U($3k)
- [DEMOS]

Reinforcement Learning
- Basic idea:
  - Receive feedback in the form of rewards
  - The agent's utility is defined by the reward function
  - Must learn to act so as to maximize expected rewards
  - Change the rewards, change the learned behavior
- Examples:
  - Playing a game: reward at the end for winning / losing
  - Vacuuming a house: reward for each piece of dirt picked up
  - Automated taxi: reward for each passenger delivered
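The certainty-equivalent and insurance-premium reasoning above can be checked numerically. The sketch below is illustrative only: it assumes a square-root utility function (one arbitrary risk-averse choice, not anything from the slides) and recovers the certainty equivalent by bisection.

```python
# Illustrative only: EMV vs. certainty equivalent under an assumed
# risk-averse (concave) utility. The sqrt utility is arbitrary, not from the slides.
import math

def emv(lottery):
    """Expected monetary value of a lottery given as (probability, dollars) pairs."""
    return sum(p * x for p, x in lottery)

def expected_utility(lottery, U):
    return sum(p * U(x) for p, x in lottery)

def certainty_equivalent(lottery, U, lo=0.0, hi=1e6):
    """Cash amount c with U(c) equal to the lottery's expected utility (bisection)."""
    target = expected_utility(lottery, U)
    for _ in range(100):
        mid = (lo + hi) / 2
        if U(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

U = math.sqrt                                # one assumed risk-averse utility
L = [(0.5, 1000.0), (0.5, 0.0)]              # the lottery from the slide
print(emv(L))                                # 500.0
print(round(certainty_equivalent(L, U)))     # 250 under sqrt; the slide quotes ~$400 for most people
# The gap between the EMV and the certainty equivalent is the insurance premium.
```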

Markov Decision Processes
- An MDP is defined by:
  - A set of states s in S
  - A set of actions a in A
  - A transition function T(s, a, s')
    - The probability that a from s leads to s', i.e., P(s' | s, a)
    - Also called the model
  - A reward function R(s, a, s')
    - Sometimes just R(s) or R(s')
  - A start state (or distribution)
  - Maybe a terminal state
- MDPs are a family of nondeterministic search problems
  - Reinforcement learning: MDPs where we don't know the transition or reward functions

Solving MDPs
- In deterministic single-agent search problems, we want an optimal plan, or sequence of actions, from the start to a goal
- In an MDP, we want an optimal policy π(s)
  - A policy gives an action for each state
  - An optimal policy maximizes expected utility if followed
  - Defines a reflex agent
- Optimal policy when R(s, a, s') = -0.03 for all non-terminals s [grid world figure]

Example Optimal Policies
[Figure: grid world policies shown for several different per-step rewards R(s)]

Example: High-Low
- Three card types: 2, 3, 4
- Infinite deck, twice as many 2's
- Start with 3 showing
- After each card, you say "high" or "low"
- A new card is flipped
  - If you're right, you win the points shown on the new card
  - Ties are no-ops
  - If you're wrong, the game ends
- Differences from expectimax:
  - #1: you get rewards as you go
  - #2: you might play forever!

High-Low as an MDP
- States: 2, 3, 4, done
- Actions: High, Low
- Model T(s, a, s'), e.g., with a 4 showing:
  - P(s'=done | 4, High) = 3/4
  - P(s'=2 | 4, High) = 0
  - P(s'=3 | 4, High) = 0
  - P(s'=4 | 4, High) = 1/4
  - P(s'=done | 4, Low) = 0
  - P(s'=2 | 4, Low) = 1/2
  - P(s'=3 | 4, Low) = 1/4
  - P(s'=4 | 4, Low) = 1/4
- Rewards R(s, a, s'): the number shown on s' if the guess was correct; 0 otherwise
- Start: 3
- Note: we could choose actions with search. How?

Example: High-Low
[Figure: search tree for High-Low showing High/Low branches with transition probabilities T and rewards R]
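For concreteness, the High-Low model above (as reconstructed for the state with a 4 showing) can be written down directly as tables. The sketch below is illustrative only and encodes just that one state.

```python
# Illustrative only: the High-Low model for the state where a 4 is showing,
# as reconstructed from the slide. Other states are analogous and omitted.

# T[(s, a)] maps next states s' to P(s' | s, a)
T = {
    (4, "High"): {"done": 3/4, 4: 1/4},        # a 2 or 3 comes up -> wrong -> game over; a 4 ties
    (4, "Low"):  {2: 1/2, 3: 1/4, 4: 1/4},     # every card is <= 4, so the game continues
}

def R(s, a, s_next):
    """Number shown on the new card if the guess was right; 0 on ties and losses."""
    if s_next == "done" or s_next == s:
        return 0
    return s_next

# Expected one-step reward of saying "Low" with a 4 showing:
q_low = sum(p * R(4, "Low", s2) for s2, p in T[(4, "Low")].items())
print(q_low)   # 0.5*2 + 0.25*3 + 0 (tie) = 1.75
```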

MDP Search Trees
- Each MDP state gives an expectimax-like search tree
  - s is a state
  - (s, a) is a q-state
  - (s, a, s') is called a transition, with T(s, a, s') = P(s' | s, a) and reward R(s, a, s')

Utilities of Sequences
- In order to formalize the optimality of a policy, we need to understand utilities of sequences of rewards
- Typically consider stationary preferences (assuming rewards depend only on state for these slides!)
- Theorem: there are only two ways to define stationary utilities
  - Additive utility: U([r_0, r_1, r_2, ...]) = r_0 + r_1 + r_2 + ...
  - Discounted utility: U([r_0, r_1, r_2, ...]) = r_0 + γ r_1 + γ² r_2 + ...

Infinite Utilities?!
- Problem: infinite sequences with infinite rewards
- Solutions:
  - Finite horizon: terminate after a fixed T steps; gives nonstationary policies (π depends on the time left)
  - Absorbing state(s): guarantee that for every policy the agent will eventually "die" (like "done" for High-Low)
  - Discounting: for 0 < γ < 1
    - Smaller γ means a smaller horizon: shorter-term focus

Discounting
- Typically discount rewards by γ < 1 each time step
  - Sooner rewards have higher utility than later rewards
  - Also helps the algorithms converge

Utilities of States / Policy Evaluation
- Fundamental operation: compute the utility of a state
- Define the utility of a state s under a fixed policy π:
  - V^π(s) = expected total discounted reward (return) starting in s and following π
- How do we calculate the V's for a fixed policy?
  - Idea one: turn the recursive equations into updates
    - Recursive relation (one-step lookahead): V^π(s) = sum over s' of T(s, π(s), s') [ R(s, π(s), s') + γ V^π(s') ]
  - Idea two: it's just a linear system, solve with Matlab (or whatever)
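"Idea one" is just repeated application of the one-step lookahead update. The sketch below is a minimal illustration on a made-up two-state MDP, not course code: it applies the update until the values settle.

```python
# Illustrative only: iterative policy evaluation on a made-up two-state MDP.
# Repeatedly applies V(s) <- sum_{s'} T(s, pi(s), s') * (R(s, pi(s), s') + gamma * V(s'))

gamma = 0.9
pi = {"a": "go", "b": "go"}                          # the fixed policy being evaluated

# Hypothetical model: T[(s, a)] -> {s': prob}, R[(s, a, s')] -> reward
T = {("a", "go"): {"b": 1.0}, ("b", "go"): {"a": 0.5, "end": 0.5}}
R = {("a", "go", "b"): 0.0, ("b", "go", "a"): 1.0, ("b", "go", "end"): 10.0}

V = {"a": 0.0, "b": 0.0, "end": 0.0}
for _ in range(100):                                 # enough sweeps for gamma = 0.9 to settle
    new_V = {"end": 0.0}                             # terminal state has value 0
    for s in ("a", "b"):
        new_V[s] = sum(p * (R[(s, pi[s], s2)] + gamma * V[s2])
                       for s2, p in T[(s, pi[s])].items())
    V = new_V

print({s: round(v, 2) for s, v in V.items()})        # fixed point of the update above
```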

Example: High-Low
- Policy: always say "high"
- Iterative updates: [DEMO]

Example: GridWorld
- Equivalent to doing a fixed-depth search and plugging in zeros at the leaves

Q-Functions
- To simplify things, introduce a q-value Q^π(s, a), for a state and action (a q-state) under a policy
- It is the utility of starting in state s, taking action a, then following π thereafter

Optimal Utilities
- Goal: calculate the optimal utility of each state
  - V*(s) = expected (discounted) rewards with optimal actions
- Why? Given the optimal utilities, MEU lets us compute the optimal policy

Practice: Computing Actions
- Which action should we choose from state s:
  - Given optimal q-values Q*?
  - Given optimal values V*?
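Both "computing actions" questions have short answers: with Q* we take an argmax directly, and with only V* we need a one-step lookahead through the model. The sketch below (with a made-up Q table) is illustrative only.

```python
# Illustrative only: computing actions from optimal values.

def action_from_q(Q, s, actions):
    """With optimal q-values Q*, the best action is just the argmax."""
    return max(actions, key=lambda a: Q[(s, a)])

def action_from_v(V, s, actions, T, R, gamma):
    """With only V*, do a one-step expectimax lookahead through the model."""
    def q(a):
        return sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T[(s, a)].items())
    return max(actions, key=q)

# Tiny made-up example for the Q* case:
Q = {("s0", "left"): 1.0, ("s0", "right"): 2.5}
print(action_from_q(Q, "s0", ["left", "right"]))     # -> right
```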
