CS 188: Artificial Intelligence, Fall 2007: Markov Decision Processes


Lecture 10: MDPs, 9/27/2007
Dan Klein, UC Berkeley

Markov Decision Processes

An MDP is defined by:
  A set of states s ∈ S
  A set of actions a ∈ A
  A transition function T(s, a, s')
    Probability that a from s leads to s', i.e., P(s' | s, a)
    Also called the model
  A reward function R(s, a, s')
    Sometimes just R(s) or R(s')
  A start state (or distribution)
  Maybe a terminal state

MDPs are a family of nondeterministic search problems
  Reinforcement learning: MDPs where we don't know the transition or reward functions

Example: High-Low

Three card types: 2, 3, 4
Infinite deck, twice as many 2's
Start with 3 showing
After each card, you say "high" or "low"
New card is flipped
  If you're right, you win the points shown on the new card
  Ties are no-ops
  If you're wrong, game ends
Differences from expectimax:
  #1: get rewards as you go
  #2: you might play forever!

High-Low

States: 2, 3, 4, done
Actions: High, Low
Model: T(s, a, s'):
  P(s'=done | 4, High) = 3/4
  P(s'=2 | 4, High) = 0
  P(s'=3 | 4, High) = 0
  P(s'=4 | 4, High) = 1/4
  P(s'=done | 4, Low) = 0
  P(s'=2 | 4, Low) = 1/2
  P(s'=3 | 4, Low) = 1/4
  P(s'=4 | 4, Low) = 1/4
Rewards: R(s, a, s'):
  Number shown on s' if s ≠ s'
  0 otherwise
Start: 3

Note: could choose actions with search. How?
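The High-Low model above can be written down as a small transition function. This is only a sketch: the names `CARD_PROBS`, `ACTIONS` and `transitions` are hypothetical, and it assumes that ties and losing guesses both carry reward 0.

```python
from fractions import Fraction

# Card frequencies for an infinite deck with twice as many 2s as 3s or 4s.
CARD_PROBS = {2: Fraction(1, 2), 3: Fraction(1, 4), 4: Fraction(1, 4)}
ACTIONS = ("High", "Low")


def transitions(s, a):
    """Return {(next_state, reward): probability} when card s shows and we say a."""
    outcomes = {}
    for card, p in CARD_PROBS.items():
        if card == s:
            key = (s, 0)          # tie: no-op, no reward (assumed)
        elif (card > s) == (a == "High"):
            key = (card, card)    # correct guess: win the points on the new card
        else:
            key = ("done", 0)     # wrong guess: the game ends
        outcomes[key] = outcomes.get(key, 0) + p
    return outcomes


# Print the rows of the model for state 4, as listed on the slide.
for action in ACTIONS:
    for (s_next, reward), prob in transitions(4, action).items():
        print(f"P(s'={s_next} | 4, {action}) = {prob}, R = {reward}")
```

Outcomes that lead to the same successor are merged, so the two losing cards for (4, High) combine into the single entry with probability 3/4.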

Example: High-Low

[Figure: search tree rooted at state 3 with q-states (3, High) and (3, Low); transition labels T = 0.5, R = 2; T = 0.25, R = 3; T = 0, R = 4; T = 0.25, R = (value missing); each successor branches again on High and Low]

MDP Search Trees

Each MDP state gives an expectimax-like search tree:
  s is a state
  (s, a) is a q-state
  (s, a, s') is called a transition
    T(s, a, s') = P(s' | s, a)
    R(s, a, s')

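Utilities of Sequences

In order to formalize optimality of a policy, need to understand utilities of sequences of rewards
Typically consider stationary preferences:
  $[s_0, s_1, s_2, \ldots] \succ [s_0, s_1', s_2', \ldots] \iff [s_1, s_2, \ldots] \succ [s_1', s_2', \ldots]$
Theorem: only two ways to define stationary utilities
  Additive utility:
    $U([s_0, s_1, s_2, \ldots]) = R(s_0) + R(s_1) + R(s_2) + \cdots$
  Discounted utility:
    $U([s_0, s_1, s_2, \ldots]) = R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots$
Assuming that reward depends only on state for these slides!

Infinite Utilities?!

Problem: infinite sequences with infinite rewards
Solutions:
  Finite horizon:
    Terminate after a fixed T steps
    Gives nonstationary policy (π depends on time left)
  Absorbing state(s): guarantee that for every policy, agent will eventually "die" (like "done" for High-Low)
  Discounting: for 0 < γ < 1
    $U([s_0, \ldots, s_\infty]) = \sum_{t=0}^{\infty} \gamma^t R(s_t) \le R_{\max} / (1 - \gamma)$
    Smaller γ means smaller "horizon": shorter-term focus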

Discounting

Typically discount rewards by γ < 1 each time step
  Sooner rewards have higher utility than later rewards
  Also helps the algorithms converge

Episodes and Returns

An episode is a run of an MDP
  Sequence of transitions (s, a, s')
  Starts at start state
  Ends at terminal state (if it ends)
  Stochastic!
The utility, or return, of an episode
  The discounted sum of the rewards
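The return of an episode, as described above, is the discounted sum of its rewards. A minimal sketch, in which the function name `discounted_return` and the reward list are made up for illustration:

```python
def discounted_return(rewards, gamma):
    """Sum rewards r_0, r_1, ... with r_t weighted by gamma**t."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


episode_rewards = [4, 2, 3]  # hypothetical rewards collected along one episode
print(discounted_return(episode_rewards, gamma=0.5))  # 4 + 0.5*2 + 0.25*3 = 5.75
print(discounted_return(episode_rewards, gamma=1.0))  # undiscounted sum = 9.0
```

With a smaller gamma the later rewards contribute less, which is the "shorter-term focus" effect of discounting.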

Utilities under Policies

Fundamental operation: compute the utility of a state s
Define the value (utility) of a state s, under a fixed policy π:
  V^π(s) = expected return starting in s and following π
Recursive relation (one-step look-ahead):
  $V^{\pi}(s) = \sum_{s'} T(s, \pi(s), s') \left[ R(s, \pi(s), s') + \gamma V^{\pi}(s') \right]$

Policy Evaluation

How do we calculate values for a fixed policy?
Idea one: it's just a linear system, solve with Matlab (or whatever)
Idea two: turn recursive equations into updates
  V_i^π(s) = expected returns over the next i transitions while following π
  $V_{i+1}^{\pi}(s) \leftarrow \sum_{s'} T(s, \pi(s), s') \left[ R(s, \pi(s), s') + \gamma V_i^{\pi}(s') \right]$
Equivalent to doing depth i search and plugging in zero at leaves

Example: High-Low

Policy: always say "high"
Iterative updates: [DEMO]

Q-Functions

Also, define a q-value, for a state and action (q-state):
  Q^π(s, a) = expected return starting in s, taking action a and following π thereafter
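The "always say high" example can be worked through with the iterative update from the policy evaluation slide. The sketch below makes several assumptions that are not in the slides: no discounting (γ = 1, relying on the absorbing "done" state), reward 0 for ties and wrong guesses, and a fixed number of sweeps. All names are hypothetical.

```python
CARD_PROBS = {2: 0.5, 3: 0.25, 4: 0.25}
STATES = (2, 3, 4)


def transitions(s, a):
    """Return a list of (probability, next_state, reward) triples for High-Low."""
    result = []
    for card, p in CARD_PROBS.items():
        if card == s:
            result.append((p, s, 0.0))             # tie: no-op
        elif (card > s) == (a == "High"):
            result.append((p, card, float(card)))  # correct guess
        else:
            result.append((p, "done", 0.0))        # wrong guess: game ends
    return result


def evaluate_policy(policy, gamma=1.0, sweeps=100):
    """Repeat V_{i+1}(s) <- sum over s' of T * (R + gamma * V_i(s')) for a fixed policy."""
    values = {s: 0.0 for s in STATES}
    values["done"] = 0.0  # the terminal state earns nothing further
    for _ in range(sweeps):
        new_values = {"done": 0.0}
        for s in STATES:
            new_values[s] = sum(
                p * (r + gamma * values[s_next])
                for p, s_next, r in transitions(s, policy[s])
            )
        values = new_values
    return values


always_high = {s: "High" for s in STATES}
values = evaluate_policy(always_high)
print({s: round(values[s], 4) for s in STATES})
```

Under these assumptions the linear system can also be solved by hand, giving V(2) = 25/6, V(3) = 4/3 and V(4) = 0, which the iteration approaches.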

Recap: MDP Quantities

Return = sum of future discounted rewards in one episode (stochastic)
V: expected return from a state under a policy
Q: expected return from a q-state under a policy

Optimal Utilities

Fundamental operation: compute the optimal utilities of states s
Define the utility of a state s:
  V*(s) = expected return starting in s and acting optimally
Define the utility of a q-state (s, a):
  Q*(s, a) = expected return starting in s, taking action a and thereafter acting optimally
Define the optimal policy:
  π*(s) = optimal action from state s

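The Bellman Equations

Definition of utility leads to a simple relationship amongst optimal utility values:
  Optimal rewards = maximize over first action and then follow optimal policy
Formally:
  $V^*(s) = \max_a Q^*(s, a)$
  $Q^*(s, a) = \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V^*(s') \right]$
  $V^*(s) = \max_a \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V^*(s') \right]$

Solving MDPs

We want to find the optimal policy π*
Proposal 1: modified expectimax search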

MDP Search Trees?

Problems:
  This tree is usually infinite (why?)
  The same states appear over and over (why?)
  There's actually one tree per state (why?)
Ideas:
  Compute to a finite depth (like expectimax)
  Consider returns from sequences of increasing length
  Cache values so we don't repeat work

Value Estimates

Calculate estimates V_k*(s)
  Not the optimal value of s!
  The optimal value considering only next k time steps (k rewards)
  As k → ∞, it approaches the optimal value
Why:
  If discounting, distant rewards become negligible
  If terminal states reachable from everywhere, fraction of episodes not ending becomes negligible
  Otherwise, can get infinite expected utility and then this approach actually won't work

Memoized Recursion?

Recurrences:
  $V_0^*(s) = 0$
  $V_i^*(s) = \max_a \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V_{i-1}^*(s') \right]$
Cache all function call results so you never repeat work
What happened to the evaluation functions?

Value Iteration

Problems with the recursive computation:
  Have to keep all the V_k*(s) around all the time
  Don't know which depth π_k(s) to ask for when planning
Solution: value iteration
  Calculate values for all states, bottom-up
  Keep increasing k until convergence
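The memoized recursion described above can be sketched with Python's `functools.lru_cache`, which caches every (state, depth) call so no subtree is recomputed. The High-Low model, the choice γ = 1, and the names `value` and `q_value` are assumptions made for this illustration.

```python
from functools import lru_cache

CARD_PROBS = {2: 0.5, 3: 0.25, 4: 0.25}
ACTIONS = ("High", "Low")
GAMMA = 1.0  # assumed: no discounting, the "done" state ends every episode


def transitions(s, a):
    """Return a list of (probability, next_state, reward) triples for High-Low."""
    result = []
    for card, p in CARD_PROBS.items():
        if card == s:
            result.append((p, s, 0.0))             # tie: no-op
        elif (card > s) == (a == "High"):
            result.append((p, card, float(card)))  # correct guess
        else:
            result.append((p, "done", 0.0))        # wrong guess: game ends
    return result


@lru_cache(maxsize=None)
def value(s, k):
    """V_k*(s): best expected return over the next k steps (zero at the leaves)."""
    if s == "done" or k == 0:
        return 0.0
    return max(q_value(s, a, k) for a in ACTIONS)


@lru_cache(maxsize=None)
def q_value(s, a, k):
    """Expected return of taking a in s, then acting optimally for k - 1 more steps."""
    return sum(p * (r + GAMMA * value(s_next, k - 1)) for p, s_next, r in transitions(s, a))


for k in (0, 1, 2, 20):
    print(k, [round(value(s, k), 3) for s in (2, 3, 4)])
```

The cache has to hold one entry per state and depth, which is the first drawback the value iteration slide points out.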

Value Iteration

Idea:
  Start with V_0*(s) = 0, which we know is right (why?)
  Given V_i*, calculate the values for all states for depth i+1:
    $V_{i+1}^*(s) \leftarrow \max_a \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V_i^*(s') \right]$
  This is called a value update or Bellman update
  Repeat until convergence
Theorem: will converge to unique optimal values
  Basic idea: approximations get refined towards optimal values
  Policy may converge long before values do

Example: Bellman Updates

[Figure: worked Bellman update example]
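Value iteration as outlined above can be sketched on the High-Low game. Assumptions beyond the slides: γ = 1 (every policy in this game eventually reaches "done"), reward 0 for ties and wrong guesses, and a stopping threshold `tol` on the largest change between sweeps. The function and variable names are hypothetical.

```python
CARD_PROBS = {2: 0.5, 3: 0.25, 4: 0.25}
STATES = (2, 3, 4)
ACTIONS = ("High", "Low")


def transitions(s, a):
    """Return a list of (probability, next_state, reward) triples for High-Low."""
    result = []
    for card, p in CARD_PROBS.items():
        if card == s:
            result.append((p, s, 0.0))             # tie: no-op
        elif (card > s) == (a == "High"):
            result.append((p, card, float(card)))  # correct guess
        else:
            result.append((p, "done", 0.0))        # wrong guess: game ends
    return result


def q_value(s, a, values, gamma):
    """One-step look-ahead value of action a in state s."""
    return sum(p * (r + gamma * values[s_next]) for p, s_next, r in transitions(s, a))


def value_iteration(gamma=1.0, tol=1e-10, max_sweeps=100_000):
    """Start from V_0 = 0 and apply Bellman updates to all states until they settle."""
    values = {s: 0.0 for s in STATES}
    values["done"] = 0.0
    for _ in range(max_sweeps):
        new_values = {"done": 0.0}
        for s in STATES:
            new_values[s] = max(q_value(s, a, values, gamma) for a in ACTIONS)
        change = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if change < tol:
            break
    return values


optimal_values = value_iteration()
greedy_policy = {
    s: max(ACTIONS, key=lambda a: q_value(s, a, optimal_values, 1.0)) for s in STATES
}
print({s: round(optimal_values[s], 4) for s in STATES}, greedy_policy)
```

Solving the Bellman equations by hand under the same assumptions gives V*(2) = 25, V*(3) = 18 and V*(4) = 25, with the greedy policy High at 2 and Low at 3 and 4.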

Example: Value Iteration

[Figure: value estimates V_2 and V_3]
Information propagates outward from terminal states and eventually all states have correct value estimates [DEMO]

Convergence*

Define the max-norm:
  $\|U\| = \max_s |U(s)|$
Theorem: for any two approximations U and V
  $\|U^{t+1} - V^{t+1}\| \le \gamma \|U^t - V^t\|$
  I.e. any distinct approximations must get closer to each other, so, in particular, any approximation must get closer to the true U and value iteration converges to a unique, stable, optimal solution
Theorem:
  $\|U^{t+1} - U^t\| < \epsilon \implies \|U^{t+1} - U\| < 2\epsilon\gamma / (1 - \gamma)$
  I.e. once the change in our approximation is small, it must also be close to correct

Policy Iteration

Alternative approach:
  Step 1: Policy evaluation: calculate utilities for a fixed policy (not optimal utilities!) until convergence
  Step 2: Policy improvement: update policy based on resulting converged (but not optimal!) utilities
  Repeat steps until policy converges
This is policy iteration
  Can converge faster under some conditions

Policy Iteration

Policy evaluation: with fixed current policy π_k, find values with simplified Bellman updates:
  $V_{i+1}^{\pi_k}(s) \leftarrow \sum_{s'} T(s, \pi_k(s), s') \left[ R(s, \pi_k(s), s') + \gamma V_i^{\pi_k}(s') \right]$
  Iterate until values converge
Policy improvement: with fixed utilities, find the best action according to one-step look-ahead:
  $\pi_{k+1}(s) = \arg\max_a \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V^{\pi_k}(s') \right]$
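The two alternating steps above can be sketched on the High-Low game. This is an illustration under assumptions not stated in the slides: γ = 1, reward 0 for ties and wrong guesses, an initial policy of always saying "high", and evaluation by repeated sweeps until the largest change drops below `tol`. All names are hypothetical.

```python
CARD_PROBS = {2: 0.5, 3: 0.25, 4: 0.25}
STATES = (2, 3, 4)
ACTIONS = ("High", "Low")


def transitions(s, a):
    """Return a list of (probability, next_state, reward) triples for High-Low."""
    result = []
    for card, p in CARD_PROBS.items():
        if card == s:
            result.append((p, s, 0.0))             # tie: no-op
        elif (card > s) == (a == "High"):
            result.append((p, card, float(card)))  # correct guess
        else:
            result.append((p, "done", 0.0))        # wrong guess: game ends
    return result


def q_value(s, a, values, gamma):
    """One-step look-ahead value of action a in state s."""
    return sum(p * (r + gamma * values[s_next]) for p, s_next, r in transitions(s, a))


def evaluate(policy, gamma, tol=1e-10, max_sweeps=100_000):
    """Step 1: simplified Bellman updates (no max) for the frozen policy."""
    values = {s: 0.0 for s in STATES + ("done",)}
    for _ in range(max_sweeps):
        new_values = {"done": 0.0}
        for s in STATES:
            new_values[s] = q_value(s, policy[s], values, gamma)
        change = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if change < tol:
            break
    return values


def policy_iteration(gamma=1.0):
    """Alternate evaluation and one-step look-ahead improvement until the policy is stable."""
    policy = {s: "High" for s in STATES}
    while True:
        values = evaluate(policy, gamma)
        improved = {
            s: max(ACTIONS, key=lambda a: q_value(s, a, values, gamma)) for s in STATES
        }
        if improved == policy:
            return policy, values
        policy = improved


final_policy, final_values = policy_iteration()
print(final_policy, {s: round(final_values[s], 4) for s in STATES})
```

In this small game the policy stabilizes after a single improvement step, while each evaluation needs many sweeps, which matches the trade-off between the two kinds of passes.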

Comparison

In value iteration:
  Every pass (or "backup") updates both utilities (explicitly, based on current utilities) and policy (possibly implicitly, based on current policy)
In policy iteration:
  Several passes to update utilities with frozen policy
  Occasional passes to update policies
Hybrid approaches (asynchronous policy iteration):
  Any sequences of partial updates to either policy entries or utilities will converge if every state is visited infinitely often


More information

CS 234 Winter 2019 Assignment 1 Due: January 23 at 11:59 pm

CS 234 Winter 2019 Assignment 1 Due: January 23 at 11:59 pm CS 234 Winter 2019 Assignment 1 Due: January 23 at 11:59 pm For submission instructions please refer to website 1 Optimal Policy for Simple MDP [20 pts] Consider the simple n-state MDP shown in Figure

More information

4/30/2012. Overview. MDPs. Planning Agent. Grid World. Review: Expectimax. Introduction & Agents Search, Heuristics & CSPs Adversarial Search

4/30/2012. Overview. MDPs. Planning Agent. Grid World. Review: Expectimax. Introduction & Agents Search, Heuristics & CSPs Adversarial Search Overview CSE 473 Mrkov Deciion Procee Dn Weld Mny lide from Chri Bihop, Mum, Dn Klein, Sturt Ruell, Andrew Moore & Luke Zettlemoyer Introduction & Agent Serch, Heuritic & CSP Adverril Serch Logicl Knowledge

More information

FISCAL AND MONETARY INTERACTIONS JUNE 15, 2011 MONETARY POLICY AND FISCAL POLICY. Introduction

FISCAL AND MONETARY INTERACTIONS JUNE 15, 2011 MONETARY POLICY AND FISCAL POLICY. Introduction FISCAL AND MONETARY INTERACTIONS JUNE 15, 2011 Introduction MONETARY POLICY AND FISCAL POLICY Chapter 7: tudied fical policy in iolation from monetary policy Illutrated ome core iue of fical policy (i.e.,

More information

Introductory Microeconomics (ES10001)

Introductory Microeconomics (ES10001) Introductory Microeconomic (ES10001) Exercie 2: Suggeted Solution 1. Match each lettered concept with the appropriate numbered phrae: (a) Cro price elaticity of demand; (b) inelatic demand; (c) long run;

More information

Announcements. Maximizing Expected Utility. Preferences. Rational Preferences. Rational Preferences. Introduction to Artificial Intelligence

Announcements. Maximizing Expected Utility. Preferences. Rational Preferences. Rational Preferences. Introduction to Artificial Intelligence Introduction to Artificil Intelligence V22.0472-001 Fll 2009 Lecture 8: Utilitie Announcement Will hve Aignment 1 grded by Wed. Aignment 2 i up on webpge Due on Mon 19 th October (2 week) Rob Fergu Dept

More information

Introduction to Fall 2007 Artificial Intelligence Final Exam

Introduction to Fall 2007 Artificial Intelligence Final Exam NAME: SID#: Login: Sec: 1 CS 188 Introduction to Fall 2007 Artificial Intelligence Final Exam You have 180 minutes. The exam is closed book, closed notes except a two-page crib sheet, basic calculators

More information

Optimizing revenue for bandwidth auctions over networks with time reservations

Optimizing revenue for bandwidth auctions over networks with time reservations Optimizing revenue for bandwidth auction over network with time reervation Pablo Belzarena,a, Fernando Paganini b, André Ferragut b a Facultad de Ingeniería, Univeridad de la República, Montevideo, Uruguay

More information

Chapter eration i calculated along each path. The reulting price are then averaged to yield an unbiaed price etimate. However, for intrument that have

Chapter eration i calculated along each path. The reulting price are then averaged to yield an unbiaed price etimate. However, for intrument that have IMPORTANCE SAMPLING IN LATTICE PRICING MODELS Soren S. Nielen Management Science and Information Sytem Univerity of Texa at Autin, Autin, T. ABSTRACT nielen@guldfaxe.bu.utexa.edu Binomial lattice model

More information

MDPs and Value Iteration 2/20/17

MDPs and Value Iteration 2/20/17 MDPs and Value Iteration 2/20/17 Recall: State Space Search Problems A set of discrete states A distinguished start state A set of actions available to the agent in each state An action function that,

More information

Confidence Intervals for One Variance with Tolerance Probability

Confidence Intervals for One Variance with Tolerance Probability Chapter 65 Confidence Interval for One Variance with Tolerance Probability Introduction Thi procedure calculate the ample ize neceary to achieve a pecified width (or in the cae of one-ided interval, the

More information

- International Scientific Journal about Logistics Volume: Issue: 4 Pages: 7-15 ISSN

- International Scientific Journal about Logistics Volume: Issue: 4 Pages: 7-15 ISSN DOI: 10.22306/al.v3i4.72 Received: 03 Dec. 2016 Accepted: 11 Dec. 2016 THE ANALYSIS OF THE COMMODITY PRICE FORECASTING SUCCESS CONSIDERING DIFFERENT NUMERICAL MODELS SENSITIVITY TO PROGNOSIS ERROR Technical

More information

Markov Decision Processes. CS 486/686: Introduction to Artificial Intelligence

Markov Decision Processes. CS 486/686: Introduction to Artificial Intelligence Markov Decision Processes CS 486/686: Introduction to Artificial Intelligence 1 Outline Markov Chains Discounted Rewards Markov Decision Processes (MDP) - Value Iteration - Policy Iteration 2 Markov Chains

More information

Non-Deterministic Search. CS 188: Artificial Intelligence Markov Decision Processes. Grid World Actions. Example: Grid World

Non-Deterministic Search. CS 188: Artificial Intelligence Markov Decision Processes. Grid World Actions. Example: Grid World CS 188: Artificil Intelligence Mrkov Deciion Procee Non-Determinitic Serch Dn Klein, Pieter Abbeel Univerity of Cliforni, Berkeley Exmple: Grid World Grid World Action A mze-like problem The gent live

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning MDP March May, 2013 MDP MDP: S, A, P, R, γ, µ State can be partially observable: Partially Observable MDPs () Actions can be temporally extended: Semi MDPs (SMDPs) and Hierarchical

More information

CS 343: Artificial Intelligence

CS 343: Artificial Intelligence CS 343: Artificial Intelligence Uncertainty and Utilities Instructors: Dan Klein and Pieter Abbeel University of California, Berkeley [These slides are based on those of Dan Klein and Pieter Abbeel for

More information

Intro to Reinforcement Learning. Part 3: Core Theory

Intro to Reinforcement Learning. Part 3: Core Theory Intro to Reinforcement Learning Part 3: Core Theory Interactive Example: You are the algorithm! Finite Markov decision processes (finite MDPs) dynamics p p p Experience: S 0 A 0 R 1 S 1 A 1 R 2 S 2 A 2

More information

4 Reinforcement Learning Basic Algorithms

4 Reinforcement Learning Basic Algorithms Learning in Complex Systems Spring 2011 Lecture Notes Nahum Shimkin 4 Reinforcement Learning Basic Algorithms 4.1 Introduction RL methods essentially deal with the solution of (optimal) control problems

More information

CS885 Reinforcement Learning Lecture 3b: May 9, 2018

CS885 Reinforcement Learning Lecture 3b: May 9, 2018 CS885 Reinforcement Learning Lecture 3b: May 9, 2018 Intro to Reinforcement Learning [SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3, CS885 Spring

More information

PROBLEM SET 3, MACROECONOMICS: POLICY, 31E23000, SPRING 2017

PROBLEM SET 3, MACROECONOMICS: POLICY, 31E23000, SPRING 2017 PROBLEM SET 3, MACROECONOMICS: POLICY, 31E23000, SPRING 2017 1. Ue the Solow growth model to tudy what happen in an economy in which the labor force increae uddenly, there i a dicrete increae in L! Aume

More information

BLACK SCHOLES THE MARTINGALE APPROACH

BLACK SCHOLES THE MARTINGALE APPROACH BLACK SCHOLES HE MARINGALE APPROACH JOHN HICKSUN. Introduction hi paper etablihe the Black Schole formula in the martingale, rik-neutral valuation framework. he intent i two-fold. One, to erve a an introduction

More information

16 MAKING SIMPLE DECISIONS

16 MAKING SIMPLE DECISIONS 247 16 MAKING SIMPLE DECISIONS Let us associate each state S with a numeric utility U(S), which expresses the desirability of the state A nondeterministic action A will have possible outcome states Result

More information

Markov Decision Processes (MDPs) CS 486/686 Introduction to AI University of Waterloo

Markov Decision Processes (MDPs) CS 486/686 Introduction to AI University of Waterloo Markov Decision Processes (MDPs) CS 486/686 Introduction to AI University of Waterloo Outline Sequential Decision Processes Markov chains Highlight Markov property Discounted rewards Value iteration Markov

More information

Lecture 1: Lucas Model and Asset Pricing

Lecture 1: Lucas Model and Asset Pricing Lecture 1: Lucas Model and Asset Pricing Economics 714, Spring 2018 1 Asset Pricing 1.1 Lucas (1978) Asset Pricing Model We assume that there are a large number of identical agents, modeled as a representative

More information

Do profit maximizers take cold showers?

Do profit maximizers take cold showers? Bond Univerity epublication@bond Bond Buine School Publication Bond Buine School 3-1-2001 Do profit maximizer take cold hower? Neil Campbell neil_campbell@bond.edu.au Jeffrey J. Kline Bond Univerity, jeffrey_kline@bond.edu.au

More information

von Thunen s Model Industrial Land Use the von Thunen Model Moving Forward von Thunen s Model Results

von Thunen s Model Industrial Land Use the von Thunen Model Moving Forward von Thunen s Model Results von Thunen Model Indutrial Land Ue the von Thunen Model Philip A. Viton September 17, 2014 In 1826, Johann von Thunen, in Der iolierte Stadt (The iolated city) conidered the location of agricultural activitie

More information