Reinforcement Learning


1 Reinforcement Learning: n-step bootstrapping
Daniel Hennes, University of Stuttgart, IPVS, Machine Learning & Robotics

2 n-step bootstrapping
Unifying Monte Carlo and TD
n-step TD
n-step Sarsa
Tree backup

3 n-step TD prediction

4 n-step returns
Monte Carlo: $G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots + \gamma^{T-t-1} R_T$
TD (1-step return): $G_{t:t+1} = R_{t+1} + \gamma V_t(S_{t+1})$
2-step return: $G_{t:t+2} = R_{t+1} + \gamma R_{t+2} + \gamma^2 V_{t+1}(S_{t+2})$
n-step return: $G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^n V_{t+n-1}(S_{t+n})$
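To make the definition concrete, here is a minimal Python sketch (not from the slides; the function name and data layout are illustrative assumptions) that computes $G_{t:t+n}$ from a list of observed rewards and a bootstrap value:

```python
def n_step_return(rewards, v_bootstrap, n, gamma):
    """Compute G_{t:t+n} = R_{t+1} + ... + gamma^{n-1} R_{t+n} + gamma^n V(S_{t+n}).

    rewards: [R_{t+1}, ..., R_{t+n}] (shorter if the episode terminated earlier)
    v_bootstrap: V_{t+n-1}(S_{t+n}); only used if the episode did not terminate
    """
    G = sum(gamma ** k * r for k, r in enumerate(rewards))
    if len(rewards) == n:           # S_{t+n} is not terminal: bootstrap from V
        G += gamma ** n * v_bootstrap
    return G
```

For example, n_step_return([0.0, 0.0, 1.0], v_bootstrap=0.0, n=3, gamma=0.9) gives 0.9^2 * 1.0 = 0.81; with only two rewards the episode has ended and no bootstrap term is added.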

5 Error-reduction property
The n-step return satisfies the error-reduction property:
$\max_s \big| \mathbb{E}_\pi[G_{t:t+n} \mid S_t = s] - v_\pi(s) \big| \le \gamma^n \max_s \big| V_{t+n-1}(s) - v_\pi(s) \big|$
(left side: maximum error using the n-step return; right side: maximum error using $V_{t+n-1}$),
where $G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^n V_{t+n-1}(S_{t+n})$.
Using this property, one can show that n-step methods converge.
It generalizes the 1-step case:
$\max_s \big| \mathbb{E}_\pi[R_{t+1} + \gamma V(S_{t+1}) \mid S_t = s] - v_\pi(s) \big| \le \gamma \max_s \big| V(s) - v_\pi(s) \big|$
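Why the bound holds (a one-line sketch, not spelled out on the slide): expanding $v_\pi(s)$ for $n$ steps with the Bellman equation makes the reward terms cancel, leaving only the discounted value error at $S_{t+n}$:

```latex
\begin{aligned}
\mathbb{E}_\pi[G_{t:t+n} \mid S_t = s] - v_\pi(s)
  &= \gamma^n \,\mathbb{E}_\pi\!\left[ V_{t+n-1}(S_{t+n}) - v_\pi(S_{t+n}) \mid S_t = s \right],\\
\left| \mathbb{E}_\pi[G_{t:t+n} \mid S_t = s] - v_\pi(s) \right|
  &\le \gamma^n \max_{s'} \left| V_{t+n-1}(s') - v_\pi(s') \right|.
\end{aligned}
```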

6 n-step TD
n-step return: $G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^n V_{t+n-1}(S_{t+n})$
It is not available until time $t + n$, so the natural algorithm is to wait until then.
n-step TD update:
$V_{t+n}(S_t) = V_{t+n-1}(S_t) + \alpha \left[ G_{t:t+n} - V_{t+n-1}(S_t) \right]$

7 n-step TD algorithm:
Initialize $V(s)$ arbitrarily, for all $s \in S$
for each episode do
    Initialize and store $S_0 \neq$ terminal; $T \leftarrow \infty$
    repeat for $t = 0, 1, 2, \dots$
        if $t < T$ then
            Take an action according to $\pi(\cdot \mid S_t)$
            Observe and store next reward $R_{t+1}$ and state $S_{t+1}$
            if $S_{t+1}$ is terminal then $T \leftarrow t + 1$
        end if
        $\tau \leftarrow t - n + 1$   ($\tau$ is the time whose state's estimate is updated)
        if $\tau \ge 0$ then
            $G \leftarrow \sum_{i=\tau+1}^{\min(\tau+n,\,T)} \gamma^{i-\tau-1} R_i$
            if $\tau + n < T$ then $G \leftarrow G + \gamma^n V(S_{\tau+n})$
            $V(S_\tau) \leftarrow V(S_\tau) + \alpha \left[ G - V(S_\tau) \right]$
        end if
    until $\tau = T - 1$
end for
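The pseudocode translates almost line for line into Python. Below is a runnable sketch under assumed interfaces, none of which are prescribed by the slides: discrete states are integer indices, env.reset() returns the start state, env.step(a) returns (next_state, reward, done), and pi(s) samples an action.

```python
import numpy as np

def n_step_td(env, pi, n, alpha, gamma, num_episodes, num_states):
    """n-step TD prediction of v_pi (env and pi are assumed interfaces)."""
    V = np.zeros(num_states)
    for _ in range(num_episodes):
        states, rewards = [env.reset()], [0.0]   # rewards[i] holds R_i
        T, t = float('inf'), 0
        while True:
            if t < T:
                s_next, r, done = env.step(pi(states[t]))
                states.append(s_next)
                rewards.append(r)
                if done:
                    T = t + 1
            tau = t - n + 1                      # time whose estimate is updated
            if tau >= 0:
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, min(tau + n, T) + 1))
                if tau + n < T:                  # S_{tau+n} not terminal: bootstrap
                    G += gamma ** n * V[states[tau + n]]
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return V
```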

8 Random walk example
[Figure: five-state random walk with states A, B, C, D, E; episodes start in C; all rewards are 0 except a reward of 1 for exiting right past E.]
Suppose the first episode progressed directly from C to the right, through D and E.
How does 2-step TD work here? How about 3-step TD?
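To experiment with these questions, one can run the n_step_td sketch above on a minimal version of this walk (again a hypothetical environment, with actions -1 for left and +1 for right):

```python
import random

class RandomWalk:
    """States 0..4 (A..E); start in C=2; reward 1 on exiting right, else 0."""
    def reset(self):
        self.s = 2
        return self.s
    def step(self, a):                 # a = -1 (left) or +1 (right)
        self.s += a
        if self.s < 0:
            return 0, 0.0, True        # left exit; terminal index never bootstrapped
        if self.s > 4:
            return 4, 1.0, True        # right exit with reward 1
        return self.s, 0.0, False

pi = lambda s: random.choice([-1, 1]) # equiprobable random policy
V = n_step_td(RandomWalk(), pi, n=2, alpha=0.1, gamma=1.0,
              num_episodes=100, num_states=5)
```

With values initialized to zero, the first all-right episode leaves 1-step TD changing only V(E); 2-step TD also moves V(D), and 3-step TD additionally moves V(C) toward the reward.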

9 19-state random walk
[Figure: average RMS error over 19 states and the first 10 episodes, plotted against the step size α, with one curve per n = 1, 2, 4, 8, 16, 32, 64.]
An intermediate α is best.
An intermediate n is best.
Is there an optimal n? For every task?
For larger n, a smaller α seems best. Why?

10 n-step Sarsa
n-step return with action values:
$G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^n Q_{t+n-1}(S_{t+n}, A_{t+n})$
n-step Sarsa update:
$Q_{t+n}(S_t, A_t) = Q_{t+n-1}(S_t, A_t) + \alpha \left[ G_{t:t+n} - Q_{t+n-1}(S_t, A_t) \right]$
n-step Expected Sarsa: same update, slightly different n-step return:
$G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^n \sum_a \pi(a \mid S_{t+n}) Q_{t+n-1}(S_{t+n}, a)$
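A sketch of the corresponding return computation (assumed layout, not from the slides: Q is a 2-D array indexed [state, action], rewards holds R_{t+1}..R_{t+n}); the update itself is then Q[S_t, A_t] += alpha * (G - Q[S_t, A_t]):

```python
import numpy as np

def n_step_sarsa_return(rewards, Q, s_n, a_n, n, gamma, pi_probs=None):
    """G_{t:t+n} with an action-value bootstrap.

    Pass pi_probs (action probabilities at S_{t+n}) to get the
    Expected-Sarsa variant instead of sampling A_{t+n}."""
    G = sum(gamma ** k * r for k, r in enumerate(rewards))
    if len(rewards) == n:                        # episode did not end: bootstrap
        if pi_probs is None:
            G += gamma ** n * Q[s_n, a_n]        # Sarsa: sampled A_{t+n}
        else:
            G += gamma ** n * np.dot(pi_probs, Q[s_n])   # Expected Sarsa
    return G
```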

11 n-step Sarsa

12 n-step Sarsa example

13 n-step off-policy learning
Recall the importance sampling ratio:
$\rho_{t:h} = \prod_{k=t}^{\min(h,\,T-1)} \frac{\pi(A_k \mid S_k)}{\mu(A_k \mid S_k)}$
Off-policy methods weight updates by this ratio.
Off-policy n-step TD:
$V_{t+n}(S_t) = V_{t+n-1}(S_t) + \alpha \rho_{t:t+n-1} \left[ G_{t:t+n} - V_{t+n-1}(S_t) \right]$
Off-policy n-step Sarsa:
$Q_{t+n}(S_t, A_t) = Q_{t+n-1}(S_t, A_t) + \alpha \rho_{t+1:t+n-1} \left[ G_{t:t+n} - Q_{t+n-1}(S_t, A_t) \right]$
Off-policy n-step Expected Sarsa:
$Q_{t+n}(S_t, A_t) = Q_{t+n-1}(S_t, A_t) + \alpha \rho_{t+1:t+n-2} \left[ G_{t:t+n} - Q_{t+n-1}(S_t, A_t) \right]$,
with $G_{t:t+n} = R_{t+1} + \dots + \gamma^{n-1} R_{t+n} + \gamma^n \sum_a \pi(a \mid S_{t+n}) Q_{t+n-1}(S_{t+n}, a)$
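A sketch of the ratio under assumed interfaces (pi and mu are callables returning action probabilities; trajectory arrays are indexed by absolute time); the off-policy update then just scales the TD error:

```python
def importance_ratio(pi, mu, states, actions, t, h, T):
    """rho_{t:h} = prod_{k=t}^{min(h, T-1)} pi(A_k|S_k) / mu(A_k|S_k)."""
    rho = 1.0
    for k in range(t, min(h, T - 1) + 1):
        rho *= pi(actions[k], states[k]) / mu(actions[k], states[k])
    return rho

# Off-policy n-step TD update for state S_tau, with G computed as before:
# V[S_tau] += alpha * importance_ratio(pi, mu, states, actions,
#                                      tau, tau + n - 1, T) * (G - V[S_tau])
```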

14 Tree Backup
Off-policy learning without importance sampling.
Q-learning and Expected Sarsa already do this in the one-step case.
Tree Backup: update from the estimated action values of the leaf nodes.

15 n-step Tree Backup algorithm
1-step tree backup (this is Expected Sarsa):
$G_{t:t+1} = R_{t+1} + \gamma \sum_a \pi(a \mid S_{t+1}) Q_t(S_{t+1}, a)$
2-step tree backup:
$G_{t:t+2} = R_{t+1} + \gamma \sum_{a \neq A_{t+1}} \pi(a \mid S_{t+1}) Q_{t+1}(S_{t+1}, a) + \gamma \pi(A_{t+1} \mid S_{t+1}) \big( R_{t+2} + \gamma \sum_a \pi(a \mid S_{t+2}) Q_{t+1}(S_{t+2}, a) \big)$
$\phantom{G_{t:t+2}} = R_{t+1} + \gamma \sum_{a \neq A_{t+1}} \pi(a \mid S_{t+1}) Q_{t+1}(S_{t+1}, a) + \gamma \pi(A_{t+1} \mid S_{t+1}) G_{t+1:t+2}$
n-step tree backup (recursive):
$G_{t:t+n} = R_{t+1} + \gamma \sum_{a \neq A_{t+1}} \pi(a \mid S_{t+1}) Q_{t+n-1}(S_{t+1}, a) + \gamma \pi(A_{t+1} \mid S_{t+1}) G_{t+1:t+n}$
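The recursive form maps directly to code. A sketch under the same assumed layout as before (trajectory arrays indexed by absolute time, Q a 2-D array, pi(s) returning the action-probability vector):

```python
def tree_backup_return(rewards, states, actions, Q, pi, t, n, T, gamma):
    """Recursive G_{t:t+n} for n-step tree backup.

    rewards[i] = R_i, states[i] = S_i, actions[i] = A_i (assumed layout)."""
    if t + 1 == T:                           # backup ends at the terminal step
        return rewards[t + 1]
    s1, a1 = states[t + 1], actions[t + 1]
    probs = pi(s1)
    expected = sum(probs[a] * Q[s1, a]       # leaves: actions not taken
                   for a in range(len(probs)) if a != a1)
    if n == 1:                               # 1-step case: full expectation
        expected += probs[a1] * Q[s1, a1]
        return rewards[t + 1] + gamma * expected
    return rewards[t + 1] + gamma * expected \
           + gamma * probs[a1] * tree_backup_return(rewards, states, actions,
                                                    Q, pi, t + 1, n - 1, T, gamma)
```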

16 A unifying algorithm: n-step Q(σ)
Choose, on a state-by-state basis, whether to sample ($\sigma_t = 1$) or take the expectation ($\sigma_t = 0$).
On the sampled transitions (those marked with $\rho$), importance sampling is required in the off-policy case.

17 Summary
n-step bootstrapping generalizes TD and MC learning methods:
  n = 1 is TD
  n = ∞ is MC
  an intermediate n is often better than either extreme
  it applies to both continuing and episodic domains
Additional cost in computation and memory:
  we need to remember the last n states
  learning is delayed by n steps
  per-step computation is small (like TD)
Everything generalizes nicely:
  error-reduction theory
  Sarsa, off-policy learning by importance sampling, Expected Sarsa, Tree Backup
  the very general n-step Q(σ) algorithm includes everything
