CS 188: Artificial Intelligence. Markov Decision Processes II. Example: Grid World. Recap: MDPs. Optimal Quantities


1 CS 188: Artificial Intelligence. Markov Decision Processes II. Instructors: Dan Klein and Pieter Abbeel, University of California, Berkeley. [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at the course site.]

Example: Grid World. A maze-like problem: the agent lives in a grid, and walls block the agent's path. Noisy movement: actions do not always go as planned. 80% of the time, the action North takes the agent North; 10% of the time, North takes the agent West; 10% East. If there is a wall in the direction the agent would have been taken, the agent stays put. The agent receives rewards each time step: a small "living" reward each step (which can be negative), and big rewards at the end (good or bad). Goal: maximize the sum of (discounted) rewards.

Recap: MDPs. Markov decision processes are defined by: states S, actions A, transitions P(s' | s, a) (or T(s, a, s')), rewards R(s, a, s') (and a discount γ), and a start state s0. Quantities: a policy is a map of states to actions; utility is the sum of discounted rewards; values are the expected future utility from a state (max node); Q-values are the expected future utility from a q-state (chance node).

Optimal Quantities. The value (utility) of a state s: V*(s) = expected utility starting in s and acting optimally. The value (utility) of a q-state (s, a): Q*(s, a) = expected utility starting out having taken action a from state s and (thereafter) acting optimally. The optimal policy: π*(s) = optimal action from state s. Here s is a state, (s, a) is a q-state, and (s, a, s') is a transition. [Demo: gridworld values (L9D1)]
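To make the setup concrete, here is a minimal sketch of how a grid world like this could be represented in code. The 80/10/10 noise model and the stay-put-at-walls rule come from the slide; the class name, method names, and the default living reward are illustrative assumptions, not the course's actual codebase.

```python
# Minimal sketch of a grid-world MDP (illustrative names, not course code).
# Noise model from the slide: 80% intended direction, 10% each perpendicular;
# bumping into a wall (or the grid edge) leaves the agent where it is.

class GridWorldMDP:
    ACTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
    PERPENDICULAR = {"N": ("E", "W"), "S": ("E", "W"),
                     "E": ("N", "S"), "W": ("N", "S")}

    def __init__(self, width, height, walls, living_reward=-0.04, noise=0.2):
        self.width, self.height = width, height
        self.walls = set(walls)
        self.living_reward = living_reward
        self.noise = noise
        self.states = [(x, y) for x in range(width) for y in range(height)
                       if (x, y) not in self.walls]

    def actions(self, s):
        return list(self.ACTIONS)

    def _move(self, s, direction):
        # Deterministic move; stay put if blocked by a wall or the grid edge.
        x, y = s
        dx, dy = self.ACTIONS[direction]
        nxt = (x + dx, y + dy)
        if nxt in self.walls or not (0 <= nxt[0] < self.width and 0 <= nxt[1] < self.height):
            return s
        return nxt

    def transitions(self, s, a):
        # T(s, a, s'): list of (next_state, probability) pairs.
        probs = {}
        outcomes = [(a, 1 - self.noise)] + \
                   [(p, self.noise / 2) for p in self.PERPENDICULAR[a]]
        for direction, p in outcomes:
            nxt = self._move(s, direction)
            probs[nxt] = probs.get(nxt, 0.0) + p
        return list(probs.items())

    def reward(self, s, a, s_next):
        # R(s, a, s'): only the small living reward in this sketch;
        # the big terminal rewards would be added for the exit squares.
        return self.living_reward
```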

2 Gridworld Values V*. Gridworld: Q*. [Demo slides showing the optimal values and q-values in the grid world.]

The Bellman Equations. How to be optimal: Step 1: take the correct first action. Step 2: keep being optimal. The definition of optimal utility via the expectimax recurrence gives a simple one-step lookahead relationship amongst optimal utility values. These are the Bellman equations, and they characterize optimal values in a way we'll use over and over.
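Written out explicitly, the recurrence the slide describes is the standard pair of Bellman optimality equations, in the T, R, γ notation from the recap:

```latex
\begin{align*}
V^*(s)   &= \max_{a} Q^*(s,a) \\
Q^*(s,a) &= \sum_{s'} T(s,a,s') \bigl[\, R(s,a,s') + \gamma\, V^*(s') \,\bigr] \\
V^*(s)   &= \max_{a} \sum_{s'} T(s,a,s') \bigl[\, R(s,a,s') + \gamma\, V^*(s') \,\bigr]
\end{align*}
```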

3 Value Iteration. The Bellman equations characterize the optimal values; value iteration computes them by repeatedly applying the Bellman update: V_{k+1}(s) ← max_a Σ_{s'} T(s, a, s') [R(s, a, s') + γ V_k(s')]. Value iteration is just a fixed-point solution method, though the V_k vectors are also interpretable as time-limited values.

Convergence*. How do we know the V_k vectors are going to converge? Case 1: if the tree has maximum depth M, then V_M holds the actual untruncated values. Case 2: if the discount is less than 1. Sketch: for any state, V_k and V_{k+1} can be viewed as depth k+1 expectimax results in nearly identical search trees. The difference is that on the bottom layer, V_{k+1} has actual rewards while V_k has zeros. That last layer is at best all R_max and at worst all R_min, but everything is discounted by γ^k that far out. So V_k and V_{k+1} differ by at most γ^k max|R|, and as k increases, the values converge.

Policy Methods. Policy Evaluation. [Section header slides.]
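A compact sketch of the value-iteration update just described, written against the hypothetical GridWorldMDP interface from the earlier sketch (the method names are assumptions carried over from that sketch, not the course's implementation):

```python
def value_iteration(mdp, gamma=0.9, iterations=100):
    """Compute time-limited values V_k by repeatedly applying the Bellman update."""
    V = {s: 0.0 for s in mdp.states}              # start from all-zero values
    for _ in range(iterations):
        V_next = {}
        for s in mdp.states:
            # V_{k+1}(s) = max_a sum_{s'} T(s,a,s') [ R(s,a,s') + gamma * V_k(s') ]
            V_next[s] = max(
                sum(p * (mdp.reward(s, a, s2) + gamma * V[s2])
                    for s2, p in mdp.transitions(s, a))
                for a in mdp.actions(s)
            )
        V = V_next
    return V
```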

4 Fixed Policies. Expectimax trees max over all actions to compute the optimal values: do the optimal action at each max node. If we fix some policy π(s), i.e. do what π says to do, the tree becomes simpler: only one action per state, though the tree's values would depend on which policy we fixed.

Utilities for a Fixed Policy. Another basic operation: compute the utility of a state s under a fixed (generally non-optimal) policy. Define the utility of a state s under a fixed policy π: V^π(s) = expected total discounted rewards starting in s and following π. Recursive relation (one-step look-ahead / Bellman equation): V^π(s) = Σ_{s'} T(s, π(s), s') [R(s, π(s), s') + γ V^π(s')] (see the code sketch below).

Example: Policy Evaluation. Always Go Right vs. Always Go Forward. [Demo slides comparing the values under these two fixed policies.]
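The recursive relation for V^π turns directly into an update rule, exactly like value iteration but with the max removed. A minimal sketch, again against the assumed interface from the earlier code:

```python
def policy_evaluation(mdp, policy, gamma=0.9, iterations=100):
    """Compute V^pi by iterating the fixed-policy Bellman update."""
    V = {s: 0.0 for s in mdp.states}
    for _ in range(iterations):
        V_next = {}
        for s in mdp.states:
            a = policy[s]                          # only one action per state: pi(s)
            # V^pi_{k+1}(s) = sum_{s'} T(s,pi(s),s') [ R(s,pi(s),s') + gamma * V^pi_k(s') ]
            V_next[s] = sum(p * (mdp.reward(s, a, s2) + gamma * V[s2])
                            for s2, p in mdp.transitions(s, a))
        V = V_next
    return V
```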

5 Policy Evaluation. How do we calculate the V's for a fixed policy π? Idea 1: turn the recursive Bellman equations into updates (like value iteration): V^π_{k+1}(s) ← Σ_{s'} T(s, π(s), s') [R(s, π(s), s') + γ V^π_k(s')]; this is the update sketched in the code above. Efficiency: O(S²) per iteration. Idea 2: without the maxes, the Bellman equations are just a linear system; solve with Matlab (or your favorite linear-system solver). (A sketch of both the linear solve and policy extraction follows below.)

Policy Extraction. Computing actions from values: let's imagine we have the optimal values V*(s). How should we act? It's not obvious! We need to do a mini-expectimax (one step): π*(s) = argmax_a Σ_{s'} T(s, a, s') [R(s, a, s') + γ V*(s')]. This is called policy extraction, since it gets the policy implied by the values.

Computing actions from Q-values: let's imagine we have the optimal q-values Q*(s, a). How should we act? Completely trivial to decide: π*(s) = argmax_a Q*(s, a). Important lesson: actions are easier to select from q-values than from values!
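Idea 2 (solving the linear system directly) and policy extraction from values can both be written in a few lines. A sketch using numpy; the state indexing and helper names are illustrative assumptions, not part of the slides:

```python
import numpy as np

def policy_evaluation_exact(mdp, policy, gamma=0.9):
    """Solve V^pi = R^pi + gamma * T^pi V^pi as the linear system (I - gamma T^pi) V = R^pi."""
    states = list(mdp.states)
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    T = np.zeros((n, n))          # T^pi: transition matrix under the fixed policy
    R = np.zeros(n)               # R^pi: expected immediate reward under the fixed policy
    for s in states:
        a = policy[s]
        for s2, p in mdp.transitions(s, a):
            T[idx[s], idx[s2]] += p
            R[idx[s]] += p * mdp.reward(s, a, s2)
    V = np.linalg.solve(np.eye(n) - gamma * T, R)
    return {s: V[idx[s]] for s in states}

def extract_policy(mdp, V, gamma=0.9):
    """One-step look-ahead (mini-expectimax): pick the action with the best expected value."""
    policy = {}
    for s in mdp.states:
        policy[s] = max(
            mdp.actions(s),
            key=lambda a: sum(p * (mdp.reward(s, a, s2) + gamma * V[s2])
                              for s2, p in mdp.transitions(s, a)),
        )
    return policy
```

Extraction from q-values is even shorter, as the slide notes: with Q in hand, the best action is just the argmax over a of Q(s, a), with no transition model needed.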

6 Policy Iteration. [Section header slide.]

Problems with Value Iteration. Value iteration repeats the Bellman updates: V_{k+1}(s) ← max_a Σ_{s'} T(s, a, s') [R(s, a, s') + γ V_k(s')]. Problem 1: it's slow, O(S² A) per iteration. Problem 2: the max at each state rarely changes. Problem 3: the policy often converges long before the values. [Demo: value iteration (L9D2)] k=0, k=1. [Demo frames showing the gridworld values after 0 and 1 iterations.]

7 k=2, k=3, k=4, k=5. [Demo frames showing the gridworld values after 2 through 5 iterations.]

8 k=6, k=7, k=8, k=9. [Demo frames showing the values after 6 through 9 iterations.]

9 k=10, k=11, k=12, k=100. [Demo frames showing the values after 10, 11, 12, and 100 iterations.]

10 Policy Iteration. An alternative approach for computing optimal values: Step 1: policy evaluation: calculate utilities for some fixed policy (not optimal utilities!) until convergence. Step 2: policy improvement: update the policy using one-step look-ahead with the resulting converged (but not optimal!) utilities as future values. Repeat these steps until the policy converges. This is policy iteration. It's still optimal! It can converge (much) faster under some conditions.

Policy Iteration. Evaluation: for the fixed current policy π_i, find values with policy evaluation; iterate until the values converge: V^{π_i}_{k+1}(s) ← Σ_{s'} T(s, π_i(s), s') [R(s, π_i(s), s') + γ V^{π_i}_k(s')]. Improvement: for fixed values, get a better policy using policy extraction; one-step look-ahead: π_{i+1}(s) = argmax_a Σ_{s'} T(s, a, s') [R(s, a, s') + γ V^{π_i}(s')]. (A short code sketch follows below.)

Comparison. Both value iteration and policy iteration compute the same thing (all optimal values), and both are dynamic programs for solving MDPs. In value iteration: every iteration updates both the values and (implicitly) the policy; we don't track the policy, but taking the max over actions implicitly recomputes it. In policy iteration: we do several passes that update utilities with a fixed policy (each pass is fast because we consider only one action, not all of them); after the policy is evaluated, a new policy is chosen (slow like a value iteration pass); the new policy will be better (or we're done).

Summary: MDP Algorithms. So you want to... Compute optimal values: use value iteration or policy iteration. Compute values for a particular policy: use policy evaluation. Turn your values into a policy: use policy extraction (one-step lookahead). These all look the same! They basically are: they are all variations of Bellman updates, and they all use one-step lookahead expectimax fragments. They differ only in whether we plug in a fixed policy or max over actions.
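A short sketch of the two alternating steps, reusing the policy_evaluation and extract_policy helpers from the sketches above (those names are illustrative, not from the course code):

```python
def policy_iteration(mdp, gamma=0.9):
    """Alternate policy evaluation and policy improvement until the policy stops changing."""
    # Start from an arbitrary policy: the first listed action in every state.
    policy = {s: mdp.actions(s)[0] for s in mdp.states}
    while True:
        V = policy_evaluation(mdp, policy, gamma)     # Step 1: evaluate the current fixed policy
        new_policy = extract_policy(mdp, V, gamma)    # Step 2: improve with one-step look-ahead
        if new_policy == policy:                      # policy converged, so we are done
            return policy, V
        policy = new_policy
```

Each pass through the loop is one evaluate-then-improve cycle, matching Steps 1 and 2 on the slide; the evaluation passes are cheap because they consider only the one action π(s) per state.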

11 Double Bandits. Double-Bandit MDP: actions Blue and Red; states Win and Lose. [Diagram: from either state, Blue pays $1 with probability 1.0; Red pays $2 with probability 0.75 (Win) and $0 with probability 0.25 (Lose).] No discount, 100 time steps, and both states have the same value.

Offline Planning. Solving MDPs is offline planning: you determine all quantities through computation, you need to know the details of the MDP, and you do not actually play the game! Play Red vs. Play Blue: which has the higher value? (A worked computation follows below.)

Let's Play! [Sample plays from the class demo:] $2 $2 $0 $2 $2 $2 $2 $0 $0 $0.
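As a worked check, assuming the payoffs shown in the diagram (Blue always pays $1; Red pays $2 with probability 0.75 and $0 otherwise), the 100-step undiscounted values of the two fixed strategies are:

```latex
\begin{align*}
V(\text{play Blue}) &= 100 \times (1.0 \cdot \$1) = \$100 \\
V(\text{play Red})  &= 100 \times (0.75 \cdot \$2 + 0.25 \cdot \$0) = \$150
\end{align*}
```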

12 Online Planning. Let's Play! Rules changed! Red's win chance is different, and the probabilities are now unknown. [Diagram: the same double-bandit MDP, but with Red's win and lose probabilities replaced by question marks.] [Sample plays from the class demo:] $0 $0 $0 $2 $0 $2 $0 $0 $0 $0.

What Just Happened? That wasn't planning, it was learning! Specifically, reinforcement learning. There was an MDP, but you couldn't solve it with just computation; you needed to actually act to figure it out. Important ideas in reinforcement learning that came up: Exploration: you have to try unknown actions to get information. Exploitation: eventually, you have to use what you know. Regret: even if you learn intelligently, you make mistakes. Sampling: because of chance, you have to try things repeatedly. Difficulty: learning can be much harder than solving a known MDP.

Next Time: Reinforcement Learning!
