Chapter 6: Temporal Difference Learning
|
|
- Roderick Shields
- 6 years ago
- Views:
Transcription
1 Chapter 6: emporal Difference Learning Objectives of this chapter: Introduce emporal Difference (D) learning Focus first on policy evaluation, or prediction, methods hen extend to control methods by following the idea of Generalized Policy Iteration (GPI) R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1
2 wo Pieces of Motivation: I he neurotransmitter dopamine seems to implement something very similar to the so-called temporal difference error that gives D methods their name. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 2
3 wo pieces of motivation: II LEER RESEARCH Nature, February 2015 Extended Data Figure 1 wo-dimensional t-sne embedding of the3 R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction
4 6.1 D Prediction Policy Evaluation (the prediction problem): for a given policy π, compute the state-value function V π Recall: Simple every-visit Monte Carlo method: [ ] V(S t ) V(S t )+α G t V(S t ) target: the actual return after time t he simplest D method, D(0): [ ] V(S t ) V(S t )+α R t+1 +γ V(S t+1 ) V(S t ) target: an estimate of the return R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 4
5 Simple Monte Carlo V(S t ) V(S t )+α [ G t V(S t )] where R t is the actual return following state S t. S t R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 5
6 Simplest D Method V(S t ) V(S t )+α R t+1 +γ V(S t+1 ) V(S t ) S t [ ] R t+1 S t+1 R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 6
7 cf. Dynamic Programming V(S t ) E π { R t+1 +γ V(S t )} S t R t+1 S t+1 R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 7
8 D methods bootstrap and sample Bootstrapping: update involves an estimate n MC does not bootstrap n DP bootstraps n D bootstraps Sampling: update does not involve an expected value n MC samples n DP does not sample n D samples R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 8
9 v (s) = E [G t S t =s] " 1 # X = E k R t+k+1 S t =s k=0 = E "R t+1 + # 1X k R t+k+2 S t =s k=0 = E [R t+1 + v (S t+1 ) S t =s]. Monte Carlo uses an estimate of this Dynamic Programming uses an estimate of this R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 9
10 abular D(0) for estimating state values abular D(0) for estimating v Input: the policy to be evaluated Initialize V (s) arbitrarily (e.g., V (s) =0, 8s 2 S + ) Repeat (for each episode): Initialize S Repeat (for each step of episode): A action given by for S ake action A, observe R, S 0 V (S) V (S)+ R + V (S 0 ) V (S) S S 0 until S is terminal backup diagram R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 10
11 Example: Driving Home Elapsed ime Predicted Predicted State (minutes) ime to Go otal ime leaving o ce, friday at reach car, raining exiting highway ndary road, behind truck entering home street arrive home R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 11
12 Driving Home Changes recommended by Monte Carlo methods (α=1) Changes recommended by D methods (α=1) 45 actual outcome Predicted total travel time leaving office reach exiting 2ndary home arrive car highway road street home Situation R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 12
13 6.2 Advantages of D Prediction Methods D methods do not require a model of the environment, only experience D, but not MC, methods can be fully incremental n n You can learn before knowing the final outcome Less memory Less peak computation You can learn without the final outcome From incomplete sequences Both MC and D converge (under certain assumptions to be detailed later), but which is faster? R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 13
14 Random Walk Example A B C D E start 1 Values learned by D(0) after various numbers of episodes R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 14
15 D and MC on the Random Walk Data averaged over 100 sequences of episodes R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 15
10703 Deep Reinforcement Learning and Control
10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Temporal Difference Learning Used Materials Disclaimer: Much of the material and slides
More informationLecture 4: Model-Free Prediction
Lecture 4: Model-Free Prediction David Silver Outline 1 Introduction 2 Monte-Carlo Learning 3 Temporal-Difference Learning 4 TD(λ) Introduction Model-Free Reinforcement Learning Last lecture: Planning
More informationMotivation: disadvantages of MC methods MC does not work for scenarios without termination It updates only at the end of the episode (sometimes - it i
Temporal-Di erence Learning Taras Kucherenko, Joonatan Manttari KTH tarask@kth.se manttari@kth.se March 7, 2017 Taras Kucherenko, Joonatan Manttari (KTH) TD-Learning March 7, 2017 1 / 68 Motivation: disadvantages
More informationReinforcement Learning 04 - Monte Carlo. Elena, Xi
Reinforcement Learning 04 - Monte Carlo Elena, Xi Previous lecture 2 Markov Decision Processes Markov decision processes formally describe an environment for reinforcement learning where the environment
More informationReinforcement Learning. Monte Carlo and Temporal Difference Learning
Reinforcement Learning Monte Carlo and Temporal Difference Learning Manfred Huber 2014 1 Monte Carlo Methods Dynamic Programming Requires complete knowledge of the MDP Spends equal time on each part of
More informationThe Agent-Environment Interface Goals, Rewards, Returns The Markov Property The Markov Decision Process Value Functions Optimal Value Functions
The Agent-Environment Interface Goals, Rewards, Returns The Markov Property The Markov Decision Process Value Functions Optimal Value Functions Optimality and Approximation Finite MDP: {S, A, R, p, γ}
More informationLecture 17: More on Markov Decision Processes. Reinforcement learning
Lecture 17: More on Markov Decision Processes. Reinforcement learning Learning a model: maximum likelihood Learning a value function directly Monte Carlo Temporal-difference (TD) learning COMP-424, Lecture
More informationIntroduction to Reinforcement Learning. MAL Seminar
Introduction to Reinforcement Learning MAL Seminar 2014-2015 RL Background Learning by interacting with the environment Reward good behavior, punish bad behavior Trial & Error Combines ideas from psychology
More informationReinforcement Learning
Reinforcement Learning n-step bootstrapping Daniel Hennes 12.06.2017 University Stuttgart - IPVS - Machine Learning & Robotics 1 n-step bootstrapping Unifying Monte Carlo and TD n-step TD n-step Sarsa
More informationMonte Carlo Methods (Estimators, On-policy/Off-policy Learning)
1 / 24 Monte Carlo Methods (Estimators, On-policy/Off-policy Learning) Julie Nutini MLRG - Winter Term 2 January 24 th, 2017 2 / 24 Monte Carlo Methods Monte Carlo (MC) methods are learning methods, used
More informationIntro to Reinforcement Learning. Part 3: Core Theory
Intro to Reinforcement Learning Part 3: Core Theory Interactive Example: You are the algorithm! Finite Markov decision processes (finite MDPs) dynamics p p p Experience: S 0 A 0 R 1 S 1 A 1 R 2 S 2 A 2
More informationReinforcement Learning
Reinforcement Learning Monte Carlo Methods Heiko Zimmermann 15.05.2017 1 Monte Carlo Monte Carlo policy evaluation First visit policy evaluation Estimating q values On policy methods Off policy methods
More informationMDPs: Bellman Equations, Value Iteration
MDPs: Bellman Equations, Value Iteration Sutton & Barto Ch 4 (Cf. AIMA Ch 17, Section 2-3) Adapted from slides kindly shared by Stuart Russell Sutton & Barto Ch 4 (Cf. AIMA Ch 17, Section 2-3) 1 Appreciations
More informationComplex Decisions. Sequential Decision Making
Sequential Decision Making Outline Sequential decision problems Value iteration Policy iteration POMDPs (basic concepts) Slides partially based on the Book "Reinforcement Learning: an introduction" by
More informationReinforcement Learning and Simulation-Based Search
Reinforcement Learning and Simulation-Based Search David Silver Outline 1 Reinforcement Learning 2 3 Planning Under Uncertainty Reinforcement Learning Markov Decision Process Definition A Markov Decision
More informationReinforcement Learning. Slides based on those used in Berkeley's AI class taught by Dan Klein
Reinforcement Learning Slides based on those used in Berkeley's AI class taught by Dan Klein Reinforcement Learning Basic idea: Receive feedback in the form of rewards Agent s utility is defined by the
More informationCS 360: Advanced Artificial Intelligence Class #16: Reinforcement Learning
CS 360: Advanced Artificial Intelligence Class #16: Reinforcement Learning Daniel M. Gaines Note: content for slides adapted from Sutton and Barto [1998] Introduction Animals learn through interaction
More informationReinforcement Learning Lectures 4 and 5
Reinforcement Learning Lectures 4 and 5 Gillian Hayes 18th January 2007 Reinforcement Learning 1 Framework Rewards, Returns Environment Dynamics Components of a Problem Values and Action Values, V and
More informationMulti-step Bootstrapping
Multi-step Bootstrapping Jennifer She Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto February 7, 2017 J February 7, 2017 1 / 29 Multi-step Bootstrapping Generalization
More informationCS 188: Artificial Intelligence Spring Announcements
CS 188: Artificial Intelligence Spring 2011 Lecture 9: MDPs 2/16/2011 Pieter Abbeel UC Berkeley Many slides over the course adapted from either Dan Klein, Stuart Russell or Andrew Moore 1 Announcements
More information4 Reinforcement Learning Basic Algorithms
Learning in Complex Systems Spring 2011 Lecture Notes Nahum Shimkin 4 Reinforcement Learning Basic Algorithms 4.1 Introduction RL methods essentially deal with the solution of (optimal) control problems
More information2D5362 Machine Learning
2D5362 Machine Learning Reinforcement Learning MIT GALib Available at http://lancet.mit.edu/ga/ download galib245.tar.gz gunzip galib245.tar.gz tar xvf galib245.tar cd galib245 make or access my files
More informationMarkov Decision Processes
Markov Decision Processes Robert Platt Northeastern University Some images and slides are used from: 1. CS188 UC Berkeley 2. AIMA 3. Chris Amato Stochastic domains So far, we have studied search Can use
More informationCS 461: Machine Learning Lecture 8
CS 461: Machine Learning Lecture 8 Dr. Kiri Wagstaff kiri.wagstaff@calstatela.edu 2/23/08 CS 461, Winter 2008 1 Plan for Today Review Clustering Reinforcement Learning How different from supervised, unsupervised?
More informationSequential Decision Making
Sequential Decision Making Dynamic programming Christos Dimitrakakis Intelligent Autonomous Systems, IvI, University of Amsterdam, The Netherlands March 18, 2008 Introduction Some examples Dynamic programming
More informationReinforcement Learning
Reinforcement Learning Basic idea: Receive feedback in the form of rewards Agent s utility is defined by the reward function Must (learn to) act so as to maximize expected rewards Grid World The agent
More informationTemporal Abstraction in RL
Temporal Abstraction in RL How can an agent represent stochastic, closed-loop, temporally-extended courses of action? How can it act, learn, and plan using such representations? HAMs (Parr & Russell 1998;
More informationDecision Theory: Value Iteration
Decision Theory: Value Iteration CPSC 322 Decision Theory 4 Textbook 9.5 Decision Theory: Value Iteration CPSC 322 Decision Theory 4, Slide 1 Lecture Overview 1 Recap 2 Policies 3 Value Iteration Decision
More informationLecture 2: Making Good Sequences of Decisions Given a Model of World. CS234: RL Emma Brunskill Winter 2018
Lecture 2: Making Good Sequences of Decisions Given a Model of World CS234: RL Emma Brunskill Winter 218 Human in the loop exoskeleton work from Steve Collins lab Class Structure Last Time: Introduction
More informationFinancial Engineering and Structured Products
550.448 Financial Engineering and Structured Products Week of March 31, 014 Structured Securitization Liability-Side Cash Flow Analysis & Structured ransactions Assignment Reading (this week, March 31
More informationCOMP417 Introduction to Robotics and Intelligent Systems. Reinforcement Learning - 2
COMP417 Introduction to Robotics and Intelligent Systems Reinforcement Learning - 2 Speaker: Sandeep Manjanna Acklowledgement: These slides use material from Pieter Abbeel s, Dan Klein s and John Schulman
More informationReinforcement Learning
Reinforcement Learning Hierarchical Reinforcement Learning Action hierarchy, hierarchical RL, semi-mdp Vien Ngo Marc Toussaint University of Stuttgart Outline Hierarchical reinforcement learning Learning
More informationImportance sampling and Monte Carlo-based calibration for time-changed Lévy processes
Importance sampling and Monte Carlo-based calibration for time-changed Lévy processes Stefan Kassberger Thomas Liebmann BFS 2010 1 Motivation 2 Time-changed Lévy-models and Esscher transforms 3 Applications
More informationReinforcement Learning
Reinforcement Learning Model-based RL and Integrated Learning-Planning Planning and Search, Model Learning, Dyna Architecture, Exploration-Exploitation (many slides from lectures of Marc Toussaint & David
More informationReinforcement Learning
Reinforcement Learning MDP March May, 2013 MDP MDP: S, A, P, R, γ, µ State can be partially observable: Partially Observable MDPs () Actions can be temporally extended: Semi MDPs (SMDPs) and Hierarchical
More informationPractical example of an Economic Scenario Generator
Practical example of an Economic Scenario Generator Martin Schenk Actuarial & Insurance Solutions SAV 7 March 2014 Agenda Introduction Deterministic vs. stochastic approach Mathematical model Application
More informationHomework 1 posted, due Friday, September 30, 2 PM. Independence of random variables: We say that a collection of random variables
Generating Functions Tuesday, September 20, 2011 2:00 PM Homework 1 posted, due Friday, September 30, 2 PM. Independence of random variables: We say that a collection of random variables Is independent
More informationCS 188: Artificial Intelligence Fall 2011
CS 188: Artificial Intelligence Fall 2011 Lecture 9: MDPs 9/22/2011 Dan Klein UC Berkeley Many slides over the course adapted from either Stuart Russell or Andrew Moore 2 Grid World The agent lives in
More informationNon-Deterministic Search
Non-Deterministic Search MDP s 1 Non-Deterministic Search How do you plan (search) when your actions might fail? In general case, how do you plan, when the actions have multiple possible outcomes? 2 Example:
More informationBasic Framework. About this class. Rewards Over Time. [This lecture adapted from Sutton & Barto and Russell & Norvig]
Basic Framework [This lecture adapted from Sutton & Barto and Russell & Norvig] About this class Markov Decision Processes The Bellman Equation Dynamic Programming for finding value functions and optimal
More informationDeep RL and Controls Homework 1 Spring 2017
10-703 Deep RL and Controls Homework 1 Spring 2017 February 1, 2017 Due February 17, 2017 Instructions You have 15 days from the release of the assignment until it is due. Refer to gradescope for the exact
More informationOverview. Transformation method Rejection method. Monte Carlo vs ordinary methods. 1 Random numbers. 2 Monte Carlo integration.
Overview 1 Random numbers Transformation method Rejection method 2 Monte Carlo integration Monte Carlo vs ordinary methods 3 Summary Transformation method Suppose X has probability distribution p X (x),
More informationCS885 Reinforcement Learning Lecture 3b: May 9, 2018
CS885 Reinforcement Learning Lecture 3b: May 9, 2018 Intro to Reinforcement Learning [SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3, CS885 Spring
More informationReinforcement Learning. n-armed bandit. n-armed bandit. n-armed bandit estimate. Kevin Spiteri April 21, 2015
Reinforement Learning n-armed andit Kevin Spiteri April 21, 2015 n-armed andit n-armed andit 0.9 0.5 0.1 0.9 0.5 0.1 0.0 0.0 0.0 estimate n-armed andit n-armed andit 0.9 0.5 0.1 0.9 0.5 0.1 0 0.0 0.0 0.0
More informationTemporal Abstraction in RL. Outline. Example. Markov Decision Processes (MDPs) ! Options
Temporal Abstraction in RL Outline How can an agent represent stochastic, closed-loop, temporally-extended courses of action? How can it act, learn, and plan using such representations?! HAMs (Parr & Russell
More informationDynamic Snow Plow Fleet Management under Uncertain Demand and Service DisrupLon
Dynamic Snow Plow Fleet Management under Uncertain Demand and Service DisrupLon Leila Hajibabai 1 and Yanfeng Ouyang 2 1 Washington State Universy 2 Universy of Illinois at Urbana- Champaign January 12,
More informationLecture 12: MDP1. Victor R. Lesser. CMPSCI 683 Fall 2010
Lecture 12: MDP1 Victor R. Lesser CMPSCI 683 Fall 2010 Biased Random GSAT - WalkSat Notice no random restart 2 Today s lecture Search where there is Uncertainty in Operator Outcome --Sequential Decision
More informationCSEP 573: Artificial Intelligence
CSEP 573: Artificial Intelligence Markov Decision Processes (MDP)! Ali Farhadi Many slides over the course adapted from Luke Zettlemoyer, Dan Klein, Pieter Abbeel, Stuart Russell or Andrew Moore 1 Outline
More informationCS 188: Artificial Intelligence
CS 188: Artificial Intelligence Markov Decision Processes Dan Klein, Pieter Abbeel University of California, Berkeley Non-Deterministic Search 1 Example: Grid World A maze-like problem The agent lives
More informationEE266 Homework 5 Solutions
EE, Spring 15-1 Professor S. Lall EE Homework 5 Solutions 1. A refined inventory model. In this problem we consider an inventory model that is more refined than the one you ve seen in the lectures. The
More informationThe Problem of Temporal Abstraction
The Problem of Temporal Abstraction How do we connect the high level to the low-level? " the human level to the physical level? " the decide level to the action level? MDPs are great, search is great,
More informationPatrolling in A Stochastic Environment
Patrolling in A Stochastic Environment Student Paper Submission (Suggested Track: Modeling and Simulation) Sui Ruan 1 (Student) E-mail: sruan@engr.uconn.edu Candra Meirina 1 (Student) E-mail: meirina@engr.uconn.edu
More informationGMM for Discrete Choice Models: A Capital Accumulation Application
GMM for Discrete Choice Models: A Capital Accumulation Application Russell Cooper, John Haltiwanger and Jonathan Willis January 2005 Abstract This paper studies capital adjustment costs. Our goal here
More informationMONTE CARLO EXTENSIONS
MONTE CARLO EXTENSIONS School of Mathematics 2013 OUTLINE 1 REVIEW OUTLINE 1 REVIEW 2 EXTENSION TO MONTE CARLO OUTLINE 1 REVIEW 2 EXTENSION TO MONTE CARLO 3 SUMMARY MONTE CARLO SO FAR... Simple to program
More information91.420/543: Artificial Intelligence UMass Lowell CS Fall 2010
91.420/543: Artificial Intelligence UMass Lowell CS Fall 2010 Lecture 17 & 18: Markov Decision Processes Oct 12 13, 2010 A subset of Lecture 9 slides from Dan Klein UC Berkeley Many slides over the course
More informationCS 774 Project: Fall 2009 Version: November 27, 2009
CS 774 Project: Fall 2009 Version: November 27, 2009 Instructors: Peter Forsyth, paforsyt@uwaterloo.ca Office Hours: Tues: 4:00-5:00; Thurs: 11:00-12:00 Lectures:MWF 3:30-4:20 MC2036 Office: DC3631 CS
More informationTo earn the extra credit, one of the following has to hold true. Please circle and sign.
CS 188 Fall 2018 Introduction to Artificial Intelligence Practice Midterm 1 To earn the extra credit, one of the following has to hold true. Please circle and sign. A I spent 2 or more hours on the practice
More informationAssicurazioni Generali: An Option Pricing Case with NAGARCH
Assicurazioni Generali: An Option Pricing Case with NAGARCH Assicurazioni Generali: Business Snapshot Find our latest analyses and trade ideas on bsic.it Assicurazioni Generali SpA is an Italy-based insurance
More informationCS 188: Artificial Intelligence
CS 188: Artificial Intelligence Markov Decision Processes Dan Klein, Pieter Abbeel University of California, Berkeley Non Deterministic Search Example: Grid World A maze like problem The agent lives in
More informationEnsemble Methods for Reinforcement Learning with Function Approximation
Ensemble Methods for Reinforcement Learning with Function Approximation Stefan Faußer and Friedhelm Schwenker Institute of Neural Information Processing, University of Ulm, 89069 Ulm, Germany {stefan.fausser,friedhelm.schwenker}@uni-ulm.de
More informationReview: Population, sample, and sampling distributions
Review: Population, sample, and sampling distributions A population with mean µ and standard deviation σ For instance, µ = 0, σ = 1 0 1 Sample 1, N=30 Sample 2, N=30 Sample 100000000000 InterquartileRange
More informationRisk Measurement in Credit Portfolio Models
9 th DGVFM Scientific Day 30 April 2010 1 Risk Measurement in Credit Portfolio Models 9 th DGVFM Scientific Day 30 April 2010 9 th DGVFM Scientific Day 30 April 2010 2 Quantitative Risk Management Profit
More informationEconomics 2010c: Lecture 4 Precautionary Savings and Liquidity Constraints
Economics 2010c: Lecture 4 Precautionary Savings and Liquidity Constraints David Laibson 9/11/2014 Outline: 1. Precautionary savings motives 2. Liquidity constraints 3. Application: Numerical solution
More informationIntroduction Dickey-Fuller Test Option Pricing Bootstrapping. Simulation Methods. Chapter 13 of Chris Brook s Book.
Simulation Methods Chapter 13 of Chris Brook s Book Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036 April 26, 2017 Christopher
More informationLecture 7: Growth Theory III: Why Doesn t Capital Flow From Rich Countries to Poor Countries?
Lecture 7: Growth Theory III: Why Doesn t Capital Flow From Rich Countries to Poor Countries? See Lucas 1990 Trevor Gallen Spring, 2015 1 / 20 Motivation-I We ve seen the Solow Growth model We ve seen
More informationhp calculators HP 12C Platinum Net Present Value Cash flow and NPV calculations Cash flow diagrams The HP12C Platinum cash flow approach
HP 12C Platinum Net Present Value Cash flow and NPV calculations Cash flow diagrams The HP12C Platinum cash flow approach Practice solving NPV problems How to modify cash flow entries Cash Flow and NPV
More informationAdaptive Experiments for Policy Choice. March 8, 2019
Adaptive Experiments for Policy Choice Maximilian Kasy Anja Sautmann March 8, 2019 Introduction The goal of many experiments is to inform policy choices: 1. Job search assistance for refugees: Treatments:
More informationAn Electronic Market-Maker
massachusetts institute of technology artificial intelligence laboratory An Electronic Market-Maker Nicholas Tung Chan and Christian Shelton AI Memo 21-5 April 17, 21 CBCL Memo 195 21 massachusetts institute
More informationAnnouncements. CS 188: Artificial Intelligence Spring Outline. Reinforcement Learning. Grid Futures. Grid World. Lecture 9: MDPs 2/16/2011
CS 188: Artificial Intelligence Spring 2011 Lecture 9: MDP 2/16/2011 Announcement Midterm: Tueday March 15, 5-8pm P2: Due Friday 4:59pm W3: Minimax, expectimax and MDP---out tonight, due Monday February
More informationCSE 473: Artificial Intelligence
CSE 473: Artificial Intelligence Markov Decision Processes (MDPs) Luke Zettlemoyer Many slides over the course adapted from Dan Klein, Stuart Russell or Andrew Moore 1 Announcements PS2 online now Due
More informationRATIONAL BUBBLES AND LEARNING
RATIONAL BUBBLES AND LEARNING Rational bubbles arise because of the indeterminate aspect of solutions to rational expectations models, where the process governing stock prices is encapsulated in the Euler
More informationMonte Carlo Methods in Structuring and Derivatives Pricing
Monte Carlo Methods in Structuring and Derivatives Pricing Prof. Manuela Pedio (guest) 20263 Advanced Tools for Risk Management and Pricing Spring 2017 Outline and objectives The basic Monte Carlo algorithm
More informationChapter 2 Uncertainty Analysis and Sampling Techniques
Chapter 2 Uncertainty Analysis and Sampling Techniques The probabilistic or stochastic modeling (Fig. 2.) iterative loop in the stochastic optimization procedure (Fig..4 in Chap. ) involves:. Specifying
More informationRough Heston models: Pricing, hedging and microstructural foundations
Rough Heston models: Pricing, hedging and microstructural foundations Omar El Euch 1, Jim Gatheral 2 and Mathieu Rosenbaum 1 1 École Polytechnique, 2 City University of New York 7 November 2017 O. El Euch,
More informationBusiness 33001: Microeconomics
Business 33001: Microeconomics Owen Zidar University of Chicago Booth School of Business Week 6 Owen Zidar (Chicago Booth) Microeconomics Week 6: Capital & Investment 1 / 80 Today s Class 1 Preliminaries
More informationFinancial Mathematics and Supercomputing
GPU acceleration in early-exercise option valuation Álvaro Leitao and Cornelis W. Oosterlee Financial Mathematics and Supercomputing A Coruña - September 26, 2018 Á. Leitao & Kees Oosterlee SGBM on GPU
More informationDepartment of Mathematics. Mathematics of Financial Derivatives
Department of Mathematics MA408 Mathematics of Financial Derivatives Thursday 15th January, 2009 2pm 4pm Duration: 2 hours Attempt THREE questions MA408 Page 1 of 5 1. (a) Suppose 0 < E 1 < E 3 and E 2
More informationReinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration
Reinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration Piyush Rai CS5350/6350: Machine Learning November 29, 2011 Reinforcement Learning Supervised Learning: Uses explicit supervision
More informationReinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration
Reinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration Piyush Rai CS5350/6350: Machine Learning November 29, 2011 Reinforcement Learning Supervised Learning: Uses explicit supervision
More informationCOS 513: Gibbs Sampling
COS 513: Gibbs Sampling Matthew Salesi December 6, 2010 1 Overview Concluding the coverage of Markov chain Monte Carlo (MCMC) sampling methods, we look today at Gibbs sampling. Gibbs sampling is a simple
More informationStochastic Dual Dynamic Programming
1 / 43 Stochastic Dual Dynamic Programming Operations Research Anthony Papavasiliou 2 / 43 Contents [ 10.4 of BL], [Pereira, 1991] 1 Recalling the Nested L-Shaped Decomposition 2 Drawbacks of Nested Decomposition
More informationa 13 Notes on Hidden Markov Models Michael I. Jordan University of California at Berkeley Hidden Markov Models The model
Notes on Hidden Markov Models Michael I. Jordan University of California at Berkeley Hidden Markov Models This is a lightly edited version of a chapter in a book being written by Jordan. Since this is
More informationMDPs and Value Iteration 2/20/17
MDPs and Value Iteration 2/20/17 Recall: State Space Search Problems A set of discrete states A distinguished start state A set of actions available to the agent in each state An action function that,
More informationFinal exam solutions
EE365 Stochastic Control / MS&E251 Stochastic Decision Models Profs. S. Lall, S. Boyd June 5 6 or June 6 7, 2013 Final exam solutions This is a 24 hour take-home final. Please turn it in to one of the
More informationMarkov Decision Processes. Lirong Xia
Markov Decision Processes Lirong Xia Today ØMarkov decision processes search with uncertain moves and infinite space ØComputing optimal policy value iteration policy iteration 2 Grid World Ø The agent
More informationImplementing Personalized Medicine: Estimating Optimal Treatment Regimes
Implementing Personalized Medicine: Estimating Optimal Treatment Regimes Baqun Zhang, Phillip Schulte, Anastasios Tsiatis, Eric Laber, and Marie Davidian Department of Statistics North Carolina State University
More informationPrepared by Pamela Peterson Drake, James Madison University
Prepared by Pamela Peterson Drake, James Madison University Contents Step 1: Calculate the spot rates corresponding to the yields 2 Step 2: Calculate the one-year forward rates for each relevant year ahead
More informationFinancial Risk Modeling on Low-power Accelerators: Experimental Performance Evaluation of TK1 with FPGA
Financial Risk Modeling on Low-power Accelerators: Experimental Performance Evaluation of TK1 with FPGA Rajesh Bordawekar and Daniel Beece IBM T. J. Watson Research Center 3/17/2015 2014 IBM Corporation
More informationComputational Finance Improving Monte Carlo
Computational Finance Improving Monte Carlo School of Mathematics 2018 Monte Carlo so far... Simple to program and to understand Convergence is slow, extrapolation impossible. Forward looking method ideal
More informationComparing Direct and Indirect Temporal-Difference Methods for Estimating the Variance of the Return
Comparing Direct and Indirect Temporal-Difference Methods for Estimating the Variance of the Return Craig Sherstan 1, Dylan R. Ashley 2, Brendan Bennett 2, Kenny Young, Adam White, Martha White, Richard
More informationADVANCED OPERATIONAL RISK MODELLING IN BANKS AND INSURANCE COMPANIES
Small business banking and financing: a global perspective Cagliari, 25-26 May 2007 ADVANCED OPERATIONAL RISK MODELLING IN BANKS AND INSURANCE COMPANIES C. Angela, R. Bisignani, G. Masala, M. Micocci 1
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Monte Carlo Methods Mark Schmidt University of British Columbia Winter 2019 Last Time: Markov Chains We can use Markov chains for density estimation, d p(x) = p(x 1 ) p(x }{{}
More informationINFORMATION AND WAR PSC/IR 265: CIVIL WAR AND INTERNATIONAL SYSTEMS WILLIAM SPANIEL WJSPANIEL.WORDPRESS.COM/PSCIR-265
INFORMATION AND WAR PSC/IR 265: CIVIL WAR AND INTERNATIONAL SYSTEMS WILLIAM SPANIEL WJSPANIEL.WORDPRESS.COM/PSCIR-265 AGENDA 1. ULTIMATUM GAME 2. EXPERIMENT #2 3. RISK-RETURN TRADEOFF 4. MEDIATION, PREDICTION,
More informationStochastic Grid Bundling Method
Stochastic Grid Bundling Method GPU Acceleration Delft University of Technology - Centrum Wiskunde & Informatica Álvaro Leitao Rodríguez and Cornelis W. Oosterlee London - December 17, 2015 A. Leitao &
More informationFeb. 4 Math 2335 sec 001 Spring 2014
Feb. 4 Math 2335 sec 001 Spring 2014 Propagated Error in Function Evaluation Let f (x) be some differentiable function. Suppose x A is an approximation to x T, and we wish to determine the function value
More informationBrooks, Introductory Econometrics for Finance, 3rd Edition
P1.T2. Quantitative Analysis Brooks, Introductory Econometrics for Finance, 3rd Edition Bionic Turtle FRM Study Notes Sample By David Harper, CFA FRM CIPM and Deepa Raju www.bionicturtle.com Chris Brooks,
More informationFrequency of Price Adjustment and Pass-through
Frequency of Price Adjustment and Pass-through Gita Gopinath Harvard and NBER Oleg Itskhoki Harvard CEFIR/NES March 11, 2009 1 / 39 Motivation Micro-level studies document significant heterogeneity in
More informationOptimal Dam Management
Optimal Dam Management Michel De Lara et Vincent Leclère July 3, 2012 Contents 1 Problem statement 1 1.1 Dam dynamics.................................. 2 1.2 Intertemporal payoff criterion..........................
More informationECN101: Intermediate Macroeconomic Theory TA Section
ECN101: Intermediate Macroeconomic Theory TA Section (jwjung@ucdavis.edu) Department of Economics, UC Davis November 4, 2014 Slides revised: November 4, 2014 Outline 1 2 Fall 2012 Winter 2012 Midterm:
More informationUnobserved Heterogeneity Revisited
Unobserved Heterogeneity Revisited Robert A. Miller Dynamic Discrete Choice March 2018 Miller (Dynamic Discrete Choice) cemmap 7 March 2018 1 / 24 Distributional Assumptions about the Unobserved Variables
More information