10/12/2012. Logistics. Planning Agent. MDPs. Review: Expectimax. PS 2 due Thursday 10/18 (was Tuesday). PS 3 due Thursday 10/25.


Logistics. PS 2 due Thursday 10/18 (was Tuesday). PS 3 due Thursday 10/25.

CSE 473: Markov Decision Processes. Dan Weld. Many slides from Chris Bishop, Mausam, Dan Klein, Stuart Russell, Andrew Moore & Luke Zettlemoyer.

MDPs (outline). Markov Decision Processes: Planning Under Uncertainty; Mathematical Framework; Bellman Equations; Value Iteration; Real-Time Dynamic Programming; Policy Iteration; Reinforcement Learning. [Portrait: Andrey Markov (1856-1922).]

Planning Agent. Environment dimensions: Static vs. Dynamic; Fully vs. Partially Observable; Deterministic vs. Stochastic; Instantaneous vs. Durative actions; Perfect vs. Noisy percepts. The agent loop: percepts in, actions out, and the question "What action next?"

Objective of an MDP. Find a policy π which optimizes one of: minimize discounted expected cost to reach a goal; maximize undiscounted expected reward; or maximize expected (reward - cost); given a horizon that is finite, infinite, or indefinite.

Review: Expectimax. What if we don't know what the result of an action will be? E.g., in solitaire the next card is unknown; in Pacman the ghosts act randomly. We can do expectimax search: max nodes as in minimax search; chance nodes, like min nodes except that the outcome is uncertain, take the average (expectation) of their children. Calculate expected utilities. Today we formalize this as a Markov Decision Process, which handles intermediate rewards & infinite plans and allows more efficient processing. [Figure: a small tree with a max node over chance nodes.]
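To make the expectimax recursion above concrete, here is a minimal Python sketch (not from the lecture), assuming a hypothetical `game` object that exposes `is_terminal`, `utility`, `is_max_node`, `children`, and `children_with_probs`:

```python
def expectimax(state, game):
    """Expectimax value of a state: max at the agent's choice nodes,
    expectation (probability-weighted average) at chance nodes."""
    if game.is_terminal(state):
        return game.utility(state)
    if game.is_max_node(state):
        return max(expectimax(child, game) for child in game.children(state))
    # Chance node: average the children's values, weighted by their probabilities.
    return sum(p * expectimax(child, game)
               for child, p in game.children_with_probs(state))
```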

Grid World. Walls block the agent's path. The agent's actions may go astray: 80% of the time, the North action takes the agent North (assuming no wall); 10% of the time it actually goes West; 10% it actually goes East. If there is a wall in the chosen direction, the agent stays put. There is a small "living" reward each step; big rewards come at the end. Goal: maximize the sum of rewards.

Markov Decision Processes. An MDP is defined by: a set of states s ∈ S; a set of actions a ∈ A; a transition function T(s, a, s'), the probability that a from s leads to s', i.e., P(s' | s, a), also called the model; a reward function R(s, a, s'), sometimes just R(s) or R(s'); a start state (or distribution); and maybe a terminal state. MDPs are non-deterministic search problems. Reinforcement learning: MDPs where we don't know the transition or reward functions.

What is Markov about MDPs? [Portrait: Andrey Markov (1856-1922).] "Markov" generally means that, conditioned on the present state, the future is independent of the past. For Markov decision processes, "Markov" means the transition probabilities depend only on the current state and action: P(S_{t+1} = s' | S_t = s, A_t = a, S_{t-1}, A_{t-1}, ..., S_0) = P(S_{t+1} = s' | S_t = s, A_t = a).

Solving MDPs. In deterministic single-agent search problems, we want an optimal plan, or sequence of actions, from the start to a goal. In an MDP, we want an optimal policy π*: S → A. A policy π gives an action for each state; an optimal policy maximizes expected utility if followed. It defines a reflex agent. Example: the optimal policy when R(s, a, s') = -0.03 for all non-terminal states.

Example Optimal Policies. [Figure: optimal grid-world policies for living rewards R(s) = -0.01, -0.03, -0.4, and -2.0.]
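As a rough illustration of the 80/10/10 grid-world dynamics described above (a sketch, not the course's actual code), the noisy transition model might be written as follows, assuming states are (x, y) cells and `walls` is a set of blocked cells that includes the outer boundary:

```python
# The four compass moves and their 90-degree neighbours.
DELTAS = {'N': (0, 1), 'S': (0, -1), 'E': (1, 0), 'W': (-1, 0)}
LEFT_OF = {'N': 'W', 'W': 'S', 'S': 'E', 'E': 'N'}
RIGHT_OF = {left: d for d, left in LEFT_OF.items()}

def grid_transition(state, action, walls):
    """Return [(next_state, prob), ...]: 80% the intended direction,
    10% to each side; bumping into a wall leaves the agent where it is."""
    def move(s, direction):
        dx, dy = DELTAS[direction]
        nxt = (s[0] + dx, s[1] + dy)
        return s if nxt in walls else nxt
    return [(move(state, action), 0.8),
            (move(state, LEFT_OF[action]), 0.1),
            (move(state, RIGHT_OF[action]), 0.1)]
```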

Example: High-Low. Three card types: 2, 3, 4. Infinite deck, twice as many 2's. Start with a 3 showing. After each card, you say "high" or "low" and a new card is flipped. If you're right, you win the points shown on the new card. Ties are no-ops (no reward). If you're wrong, the game ends. Differences from an expectimax problem: #1, you get rewards as you go; #2, you might play forever!

High-Low as an MDP. States: 2, 3, 4, done. Actions: High, Low. Model T(s, a, s'), e.g. from state 4: P(s'=4 | 4, Low) = 1/4; P(s'=3 | 4, Low) = 1/4; P(s'=2 | 4, Low) = 1/2; P(s'=done | 4, Low) = 0; P(s'=4 | 4, High) = 1/4; P(s'=3 | 4, High) = 0; P(s'=2 | 4, High) = 0; P(s'=done | 4, High) = 3/4. Rewards R(s, a, s'): the number shown on s' if the guess was correct (e.g., s < s' when a = high); 0 otherwise. Start state: 3.

Search Tree: High-Low / MDP Search Trees. Each MDP state gives an expectimax-like search tree. [Figure: the High-Low tree from state 3, with High/Low branches and outcomes such as T = 0.5, R = 2; T = 0.25, R = 3; T = 0, R = 4; T = 0.25, R = 0.] (s, a) is a q-state; s is a state; (s, a, s') is called a transition, with T(s, a, s') = P(s' | s, a) and reward R(s, a, s').
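A small sketch of the High-Low model just described (a hypothetical helper, not from the slides); it reproduces the transition and reward entries listed above for state 4:

```python
# Infinite deck, twice as many 2's: P(2) = 1/2, P(3) = 1/4, P(4) = 1/4.
CARD_PROBS = {2: 0.5, 3: 0.25, 4: 0.25}

def high_low_outcomes(shown, guess):
    """Return a list of (next_state, prob, reward) for saying `guess`
    ('high' or 'low') with card `shown` face up."""
    outcomes = []
    for card, p in CARD_PROBS.items():
        if card == shown:                          # tie: no reward, keep playing
            outcomes.append((card, p, 0))
        elif (card > shown) == (guess == 'high'):  # correct: win the new card's points
            outcomes.append((card, p, card))
        else:                                      # wrong: the game ends
            outcomes.append(('done', p, 0))
    return outcomes

# high_low_outcomes(4, 'low') -> [(2, 0.5, 2), (3, 0.25, 3), (4, 0.25, 0)]
```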

Utilities of Sequences. In order to formalize the optimality of a policy, we need to understand utilities of sequences of rewards. We typically consider stationary preferences. Theorem: there are only two ways to define stationary utilities: additive utility, U([r_0, r_1, r_2, ...]) = r_0 + r_1 + r_2 + ..., and discounted utility, U([r_0, r_1, r_2, ...]) = r_0 + γ r_1 + γ² r_2 + ...

Infinite Utilities?! Problem: infinite state sequences have infinite rewards. Solutions: (1) Finite horizon: terminate episodes after a fixed T steps (e.g., life); this gives nonstationary policies (π depends on the time left). (2) Absorbing states: guarantee that for every policy a terminal state will eventually be reached (like "done" for High-Low). (3) Discounting: for 0 < γ < 1, use the discounted utility U = Σ_t γ^t r_t; a smaller γ means a smaller "horizon", i.e., a shorter-term focus.

Discounting. Typically we discount rewards by γ < 1 each time step. Sooner rewards have higher utility than later rewards. Discounting also helps the algorithms converge.

Recap: Defining MDPs. Markov decision processes: states S, start state s_0, actions A, transitions P(s' | s, a) (aka T(s, a, s')), rewards R(s, a, s') (and discount γ). MDP quantities so far: a policy π is a function that chooses an action for each state; utility (aka "return") is the sum of discounted rewards.

Optimal Utilities. Define the value of a state s: V*(s) = expected utility starting in s and acting optimally. Define the value of a q-state (s, a): Q*(s, a) = expected utility starting in s, taking action a, and thereafter acting optimally. Define the optimal policy: π*(s) = the optimal action from state s.

Why Not Search Trees? Why not solve with expectimax? Problems: this tree is usually infinite (why?); the same states appear over and over (why?); we would search once per state (why?). Idea: value iteration. Compute optimal values for all states all at once using successive approximation; it is a bottom-up dynamic program, similar in cost to memoization; all planning is done offline, so no replanning is needed!
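As a tiny worked example of the discounted utility ("return") defined above, illustrative only:

```python
def discounted_return(rewards, gamma):
    """U([r0, r1, r2, ...]) = r0 + gamma*r1 + gamma^2*r2 + ..."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Sooner rewards are worth more: the same reward, delayed, has lower utility.
print(discounted_return([1, 0, 0], 0.9))   # 1.0
print(discounted_return([0, 0, 1], 0.9))   # about 0.81
```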

The Bellman Equations. The definition of optimal utility leads to a simple one-step look-ahead relationship between optimal utility values. Bellman equations for MDPs:
V*(s) = max_a Q*(s, a)
Q*(s, a) = Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V*(s') ]

Bellman Backup (MDP). Given an estimate of the V* function (say V_n), back it up at state s to calculate a new estimate V_{n+1}:
Q_{n+1}(s, a) = Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V_n(s') ]
V_{n+1}(s) = max_a Q_{n+1}(s, a)
Q_{n+1}(s, a) is the value/cost of the strategy "execute action a in s, then execute π_n subsequently," where π_n(s) = argmax_{a ∈ Ap(s)} Q_n(s, a).

Bellman Backup (example). [Figure: a state s with actions a_1, a_2, a_3 leading to successors with V_0 = 0, 1, 2; the backup gives Q_1(s, a_1) ≈ 2, Q_1(s, a_2) ≈ 6.1, Q_1(s, a_3) ≈ 6.5, so V_1(s) = 6.5.]

Value Iteration [Bellman '57]. Assign an arbitrary assignment of V_0 to each state. Repeat: for all states s, compute V_{n+1}(s) by a Bellman backup at s; until max_s |V_{n+1}(s) - V_n(s)| < ε (ε-convergence; the left-hand quantity is the residual). Theorem: this converges to the unique optimal values. Basic idea: the approximations get refined toward the optimal values. The policy may converge long before the values do.

Value Iteration (alternative statement). Idea: start with V_0*(s) = 0, which we know is right (why?). Given V_i*, calculate the values for all states for depth i+1: V_{i+1}(s) = max_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V_i(s') ]. This is called a value update or Bellman update. Repeat until convergence. Theorem: it converges to the unique optimal values; the approximations get refined toward the optimal values, and the policy may converge long before the values do.
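A minimal value-iteration sketch following the pseudocode above. It assumes a hypothetical dictionary representation T[s][a] = [(s', prob), ...] and R[(s, a, s')] = reward (not the course's actual code); terminal states simply have no actions:

```python
def value_iteration(states, T, R, gamma, epsilon=1e-6):
    V = {s: 0.0 for s in states}            # V_0 = 0 everywhere
    while True:
        V_new, residual = {}, 0.0
        for s in states:
            if not T.get(s):                # terminal state: no actions, value stays 0
                V_new[s] = 0.0
                continue
            # Bellman backup: V_{n+1}(s) = max_a sum_{s'} T(s,a,s') [R(s,a,s') + gamma V_n(s')]
            V_new[s] = max(sum(p * (R[(s, a, s2)] + gamma * V[s2])
                               for s2, p in T[s][a])
                           for a in T[s])
            residual = max(residual, abs(V_new[s] - V[s]))
        V = V_new
        if residual < epsilon:              # epsilon-convergence
            return V
```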

Value Estimates. Calculate estimates V_k*(s): the optimal value considering only the next k time steps (k rewards). As k → ∞, V_k approaches the optimal values. Why: if discounting, distant rewards become negligible; if terminal states are reachable from everywhere, the fraction of episodes that never end becomes negligible; otherwise, we can get infinite expected utility, and then this approach actually won't work.

Example: Bellman Updates / Example: Value Iteration. Example with γ = 0.9, living reward = 0, noise = 0.2. [Figure/animation: grid-world value estimates V_1, V_2, ...; information propagates outward from the terminal states, and eventually all states have correct value estimates.]

Practice: Computing Actions. Which action should we choose from state s? Given optimal Q-values: π*(s) = argmax_a Q*(s, a). Given optimal values V: π*(s) = argmax_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V*(s') ]. Lesson: actions are easier to select from Q's!

Comments. Value iteration is a decision-theoretic algorithm: dynamic programming, a fixed-point computation, a probabilistic version of the Bellman-Ford algorithm for shortest-path computation (MDPs posed as stochastic shortest path problems). Time complexity: one iteration is O(|A| |S|²); the number of iterations is polynomial in |S|, |A|, and 1/(1-γ). Space complexity: O(|S|). Factored MDPs = planning under uncertainty: exponential space, exponential time.
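The "actions are easier to select from Q's" point can be seen directly in code; a sketch using the same hypothetical T/R representation as the value-iteration sketch above:

```python
def greedy_policy_from_V(V, T, R, gamma):
    """With only V*, a one-step lookahead through the model is needed."""
    return {s: max(T[s], key=lambda a: sum(p * (R[(s, a, s2)] + gamma * V[s2])
                                           for s2, p in T[s][a]))
            for s in T if T[s]}

def greedy_policy_from_Q(Q):
    """With Q* (a dict of dicts Q[s][a]), the best action is just an argmax."""
    return {s: max(Q[s], key=Q[s].get) for s in Q}
```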

Convergence Properties. V_n → V* in the limit as n → ∞. ε-convergence: the V_n function is within ε of V*. Optimality: the current greedy policy is within 2ε of optimal. Monotonicity: if V_0 ≤_p V*, then V_n ≤_p V* (V_n is monotonic from below); if V_0 ≥_p V*, then V_n ≥_p V* (V_n is monotonic from above); otherwise V_n is non-monotonic.

Convergence. Define the max norm: ||V|| = max_s |V(s)|. Theorem: for any two approximations U^t and V^t, ||U^{t+1} - V^{t+1}|| ≤ γ ||U^t - V^t||. I.e., any two distinct approximations must get closer to each other; so, in particular, any approximation must get closer to the true V* (aka U), and value iteration converges to a unique, stable, optimal solution. Theorem: if ||V^{t+1} - V^t|| < ε, then V^{t+1} is close to V* (within a bound depending on ε and γ). I.e., once the change in our approximation is small, it must also be close to correct.

Value Iteration Complexity. Problem size: |A| actions and |S| states. Each iteration: computation O(|A| |S|²), space O(|S|). Number of iterations: can be exponential in the discount factor γ.

Markov Decision Processes (outline recap). Planning under uncertainty: mathematical framework, Bellman equations, value iteration, and now real-time dynamic programming; then policy iteration and reinforcement learning.

Asynchronous Value Iteration. States may be backed up in any order, instead of systematically, iteration by iteration. Theorem: as long as every state is backed up infinitely often, asynchronous value iteration converges to the optimal values.

Asynchronous Value Iteration: Prioritized Sweeping. Why back up a state if the values of its successors are the same as before? Prefer backing up a state whose successors had the most change. Keep a priority queue of (state, expected change in value); back up states in order of priority; after backing up a state, update the priority queue for all of its predecessors.
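A rough prioritized-sweeping sketch along the lines described above (same hypothetical T/R dictionaries as earlier; `predecessors` is an assumed dict mapping each state to the states that can reach it):

```python
import heapq
import itertools

def bellman_backup(s, V, T, R, gamma):
    """Max over actions of expected one-step reward plus discounted next value."""
    if not T.get(s):
        return 0.0
    return max(sum(p * (R[(s, a, s2)] + gamma * V[s2]) for s2, p in T[s][a])
               for a in T[s])

def prioritized_sweeping(states, T, R, gamma, predecessors, n_backups=10000, theta=1e-6):
    """Back up states in order of expected value change; re-queue their predecessors."""
    V = {s: 0.0 for s in states}
    counter = itertools.count()                      # tie-breaker so states never get compared
    pq = [(-1.0, next(counter), s) for s in states]  # max-heap via negated priorities
    heapq.heapify(pq)
    for _ in range(n_backups):
        if not pq:
            break
        _, _, s = heapq.heappop(pq)
        new_v = bellman_backup(s, V, T, R, gamma)
        change = abs(new_v - V[s])
        V[s] = new_v
        if change > theta:                           # this state's value moved, so its
            for pred in predecessors.get(s, ()):     # predecessors' backups are now stale
                heapq.heappush(pq, (-change, next(counter), pred))
    return V
```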

Asynchronous Value Iteration: Real-Time Dynamic Programming [Barto, Bradtke, Singh '95]. Trial: simulate the greedy policy starting from the start state, performing a Bellman backup on each visited state. RTDP: repeat trials until the value function converges. (Why does the next slide say "min"? Because there the problem is phrased as minimizing expected cost.)

RTDP Trial. [Figure: from start state s_0 with actions a_1, a_2, a_3, compute Q_{n+1}(s_0, a) for each action, update V_{n+1}(s_0), take the greedy action a_greedy = a_2 (the min, since this is a cost formulation), sample a successor, and continue toward the goal.] Properties: if all states are visited infinitely often, then V_n → V*. Advantage: anytime behavior; the more probable states are explored quickly. Disadvantage: complete convergence can be slow!

Labeled RTDP [Bonet & Geffner, ICAPS '03]. Setting: stochastic shortest path problems, i.e., find the policy with minimum expected cost to reach the goal. Initialize V_0(s) with an admissible heuristic, which underestimates the remaining cost. Theorem: if the residual of V_k(s) is less than ε, and the same holds for V_k(s') for all successors s' of s in the greedy graph, then V_k(s) is ε-consistent and will remain so. A labeling algorithm detects convergence.

Markov Decision Processes (outline recap). Planning under uncertainty: mathematical framework, Bellman equations, value iteration, real-time dynamic programming, and now policy iteration; then reinforcement learning.
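Before moving on to policy iteration, here is a rough sketch of a single RTDP trial as described above, written in the reward-maximization convention used elsewhere in these notes (the slides' stochastic-shortest-path version takes a min over expected costs instead); same hypothetical T/R dictionaries:

```python
import random

def rtdp_trial(start, goal, V, T, R, gamma, max_steps=1000):
    """Simulate the greedy policy from `start`, doing a Bellman backup at each visited state.
    V is a dict of value estimates and is updated in place."""
    s = start
    for _ in range(max_steps):
        if s == goal or not T.get(s):
            break
        # Back up the visited state, then act greedily on the new estimate.
        q = {a: sum(p * (R[(s, a, s2)] + gamma * V[s2]) for s2, p in T[s][a])
             for a in T[s]}
        a_greedy = max(q, key=q.get)
        V[s] = q[a_greedy]
        # Sample the next state from T(s, a_greedy, .).
        successors, probs = zip(*T[s][a_greedy])
        s = random.choices(successors, weights=probs, k=1)[0]
    return V
```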

Changing the Search Space. Value iteration: search in value space; compute the resulting policy. Policy iteration: search in policy space; compute the resulting values.

Utilities for Fixed Policies. Another basic operation: compute the utility of a state s under a fixed (generally non-optimal) policy. Define the utility of a state s under a fixed policy π: V^π(s) = the expected total discounted reward (return) starting in s and following π. Recursive relation (one-step lookahead / Bellman equation):
V^π(s) = Σ_{s'} T(s, π(s), s') [ R(s, π(s), s') + γ V^π(s') ]

Policy Evaluation. How do we calculate the V's for a fixed policy? Idea one: modify the Bellman updates, i.e., iterate V^π_{i+1}(s) = Σ_{s'} T(s, π(s), s') [ R(s, π(s), s') + γ V^π_i(s') ]. Idea two: it's just a linear system; solve it with Matlab (or whatever).

Policy Iteration. Problems with value iteration: considering all actions each iteration is slow, taking |A| times longer than policy evaluation; but the policy doesn't change each iteration, so time is wasted. Alternative to value iteration: Step 1, policy evaluation: calculate utilities for a fixed policy (not optimal utilities!) until convergence (fast). Step 2, policy improvement: update the policy using one-step lookahead with the resulting converged (but not optimal!) utilities (slow but infrequent). Repeat these steps until the policy converges.

Policy Iteration (the two operations). Policy evaluation: with the current policy fixed, find the values using simplified Bellman updates; iterate until the values converge. Policy improvement: with the utilities fixed, find the best action according to a one-step look-ahead.

Policy Iteration [Howard '60]. Assign an arbitrary assignment of π_0 to each state. Repeat: Policy evaluation: compute V_{n+1}, the evaluation of π_n (costly: O(n³) if solved as a linear system). Policy improvement: for all states s, compute π_{n+1}(s) = argmax_{a ∈ Ap(s)} Q_{n+1}(s, a). Until π_{n+1} = π_n. Advantages: searching in a finite (policy) space, as opposed to an uncountably infinite (value) space, gives faster convergence; all other properties follow!

Modified Policy Iteration (preview): approximate the evaluation step by value iteration with the policy held fixed.
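A policy-iteration sketch following the pseudocode above (same hypothetical T/R dictionaries; "idea one" iterative policy evaluation is used here rather than solving the linear system):

```python
def policy_evaluation(pi, V, T, R, gamma, iters=100):
    """Repeated simplified Bellman updates for the fixed policy pi (idea one above)."""
    for _ in range(iters):
        V = {s: sum(p * (R[(s, pi[s], s2)] + gamma * V[s2]) for s2, p in T[s][pi[s]])
               if T.get(s) else 0.0
             for s in V}
    return V

def policy_iteration(states, T, R, gamma):
    pi = {s: next(iter(T[s])) for s in states if T.get(s)}   # arbitrary initial policy
    V = {s: 0.0 for s in states}
    while True:
        V = policy_evaluation(pi, V, T, R, gamma)            # Step 1: evaluate the fixed policy
        pi_new = {s: max(T[s], key=lambda a: sum(p * (R[(s, a, s2)] + gamma * V[s2])
                                                 for s2, p in T[s][a]))
                  for s in pi}                               # Step 2: one-step-lookahead improvement
        if pi_new == pi:                                     # policy has converged
            return pi, V
        pi = pi_new
```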

Modified Policy Iteration. Assign an arbitrary assignment of π_0 to each state. Repeat: Policy evaluation: compute V_{n+1}, an approximate evaluation of π_n (a few Bellman updates rather than solving to convergence). Policy improvement: for all states s, compute π_{n+1}(s) = argmax_{a ∈ Ap(s)} Q_{n+1}(s, a). Until π_{n+1} = π_n. Advantages: probably the most competitive synchronous dynamic programming algorithm.

Policy Iteration Complexity. Problem size: |A| actions and |S| states. Each iteration: computation O(|S|³ + |A| |S|²), space O(|S|). Number of iterations: unknown, but can be faster in practice; convergence is guaranteed.

Comparison. In value iteration, every pass (or "backup") updates both the utilities (explicitly, based on the current utilities) and the policy (possibly implicitly, based on the current policy). In policy iteration, several passes update the utilities with a frozen policy, and occasional passes update the policy. Hybrid approaches (asynchronous policy iteration): any sequence of partial updates to either policy entries or utilities will converge if every state is visited infinitely often.

Recap: MDPs. Markov decision processes: states S, actions A, transitions P(s' | s, a) (or T(s, a, s')), rewards R(s, a, s') (and discount γ), start state s_0. Quantities: return = sum of discounted rewards; values = expected future return from a state (optimal, or for a fixed policy); Q-values = expected future return from a q-state (optimal, or for a fixed policy).
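Returning to modified policy iteration: a sketch under the same assumptions as the policy-iteration code above, differing only in that the evaluation step runs a fixed small number k of Bellman updates instead of iterating to convergence:

```python
def modified_policy_iteration(states, T, R, gamma, k=5):
    """Policy iteration with an approximate (k-sweep) evaluation of each pi_n."""
    pi = {s: next(iter(T[s])) for s in states if T.get(s)}   # arbitrary initial policy
    V = {s: 0.0 for s in states}
    while True:
        for _ in range(k):                                   # approximate policy evaluation
            V = {s: sum(p * (R[(s, pi[s], s2)] + gamma * V[s2]) for s2, p in T[s][pi[s]])
                   if T.get(s) else 0.0
                 for s in V}
        pi_new = {s: max(T[s], key=lambda a: sum(p * (R[(s, a, s2)] + gamma * V[s2])
                                                 for s2, p in T[s][a]))
                  for s in pi}                               # greedy improvement
        if pi_new == pi:
            return pi, V
        pi = pi_new
```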
