Neuro-Dynamic Programming for Fractionated Radiotherapy Planning

1 Neuro-Dynamic Programming for Fractionated Radiotherapy Planning
Geng Deng, Michael C. Ferris
University of Wisconsin at Madison
Conference on Optimization and Health Care, Feb 2006

2 Background
Optimal delivery plan: deliver the ideal dose to the target while avoiding the critical organs and normal tissues.

3 Fractionated radiotherapy (Dynamic problem)
Treatments usually last several weeks
  Limits burning
  Allows healthy tissue to recover
Types of day-to-day error: registration error, internal organ motion, tumor shrinkage, and non-rigid transformation.
Current approach: constant policy.
New option: the true dose delivered can be measured during individual treatments.
  Update the treatment plan day-to-day (online policy)
  Compensate for errors

4 Problem overview
State and state transition:
\[ x_{k+1}(i) = x_k(i) + u_k(i + \omega_k), \quad i \in T. \tag{1} \]
Consider simple shifts in each direction
Known error distributions
Accumulation of errors
Determine the dose ($u_k$) to apply to minimize the final error
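The transition in (1) can be illustrated on a one-dimensional grid. The following is a minimal Python sketch, not the authors' implementation: the function name `apply_fraction`, the list representation of doses, the integer shift, and the rule that off-grid positions contribute no dose are all assumptions made for illustration.

```python
def apply_fraction(x, u, shift):
    """Return x_{k+1}, where x_{k+1}(i) = x_k(i) + u_k(i + shift).

    x     : accumulated dose per voxel (list of floats)
    u     : planned dose profile for this fraction (same length as x)
    shift : integer positioning error omega_k
    Assumption of this sketch: positions off the grid deliver no dose.
    """
    n = len(x)
    return [
        x[i] + (u[i + shift] if 0 <= i + shift < n else 0.0)
        for i in range(n)
    ]


x0 = [0.0] * 5
u0 = [0.0, 1.0, 1.0, 1.0, 0.0]  # planned dose aimed at voxels 1..3
x1 = apply_fraction(x0, u0, shift=1)
print(x1)  # [1.0, 1.0, 1.0, 0.0, 0.0]
```

With a shift of one position, voxel 0 receives dose that was planned for voxel 1, and voxel 3 is missed. Errors of this kind accumulate over the fractions.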

5 Dynamic programming formulation
Minimize the cost-to-go function starting at $x_0$:
\[
\begin{aligned}
J_0(x_0) = \min\ & E\left[ \sum_{k=0}^{N-1} g(x_k, x_{k+1}, u_k) + J_N(x_N) \right] \\
\text{s.t. } & x_{k+1}(i) = x_k(i) + u_k(i + \omega_k), \\
& u_k \in U(x_k), \quad k = 0, 1, \ldots, N-1.
\end{aligned}
\tag{2}
\]
$J_N(x_N)$ is the final cost function:
\[ J_N(x_N) = \sum_{i \in T} c(i)\, |x_N(i) - T(i)| \]
$g(x_k, x_{k+1}, u_k)$ is the immediate cost of dose delivered outside the target:
\[ g(x_k, x_{k+1}, u_k) = \sum_{i + \omega_k \notin T} c(i + \omega_k)\, u_k(i + \omega_k) \]
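The two cost terms can be written out for a one-dimensional grid. This is a hedged Python sketch under stated assumptions: the final cost is taken as a weighted absolute deviation from the prescribed dose, the index i in the immediate cost is assumed to range over all grid positions whose shifted position i + omega_k is still on the grid, and the names `final_cost`, `immediate_cost`, `weight`, and `target` are hypothetical.

```python
def final_cost(x_final, target_dose, weight, target):
    """J_N: weighted deviation from the prescribed dose over target voxels.

    The absolute deviation is an assumption of this sketch.
    """
    return sum(weight[i] * abs(x_final[i] - target_dose[i]) for i in target)


def immediate_cost(u, shift, weight, target):
    """g: weighted dose at shifted positions i + shift that lie outside the target.

    Assumption: i ranges over the whole grid and off-grid positions are ignored.
    """
    n = len(u)
    total = 0.0
    for i in range(n):
        j = i + shift
        if 0 <= j < n and j not in target:
            total += weight[j] * u[j]
    return total


target = {1, 2, 3}                     # indices of target voxels
weight = [1.0] * 5                     # c(i)
target_dose = [0.0, 1.0, 1.0, 1.0, 0.0]

print(final_cost([1.0, 1.0, 1.0, 0.0, 0.0], target_dose, weight, target))  # 1.0
print(immediate_cost([0.5, 1.0, 1.0, 1.0, 0.5], 1, weight, target))        # 0.5
```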

6 An iterative formulation
The cost-to-go function at stage $k$ can be formulated as:
\[ J_k(x_k) = \min_{u_k \in U(x_k)} E\left[ g(x_k, x_{k+1}, u_k) + J_{k+1}(x_{k+1}) \right] \]
Bellman's equation!
This is a finite-horizon dynamic programming problem.
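To make the backward recursion concrete, here is a small Python sketch on a toy scalar problem. It is not the radiotherapy model from the slides: the single-voxel state, the integer dose levels, and the hit-or-miss error model with probability `P_HIT` are hypothetical simplifications, and the immediate cost g is taken to be zero.

```python
from functools import lru_cache

N = 3              # number of fractions (stages)
TARGET = 3         # prescribed total dose, in integer units
DOSES = (0, 1, 2)  # admissible doses, standing in for U(x_k)
P_HIT = 0.8        # hypothetical probability that the planned dose is delivered


@lru_cache(maxsize=None)
def cost_to_go(k, x):
    """J_k(x) = min over u of E[J_{k+1}(x_{k+1})], with g = 0 in this toy model."""
    if k == N:
        return abs(x - TARGET)  # final cost J_N
    return min(expected_cost(k, x, u) for u in DOSES)


def expected_cost(k, x, u):
    """Expectation over the two outcomes: dose delivered, or dose missed."""
    return P_HIT * cost_to_go(k + 1, x + u) + (1 - P_HIT) * cost_to_go(k + 1, x)


def best_dose(k, x):
    """Dose attaining the minimum in Bellman's equation at stage k, state x."""
    return min(DOSES, key=lambda u: expected_cost(k, x, u))


print(best_dose(2, 2))  # 1
```

The recursion visits every reachable state at every stage, which is workable for a scalar state but not when the state is a full dose distribution.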

7 Existing policies
We will compare the following policies:
Constant policy: $u_k = T/N$
Reactive policy (online policy): $u_k = \max(0, T - x_k)/(N - k)$
Modified reactive policy (online policy): $u_k = a \cdot \max(0, T - x_k)/(N - k)$
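The three policies can be sketched for a single voxel as follows. This is an illustrative Python sketch: the function names and the scalar (per-voxel) treatment of the target dose and accumulated dose are assumptions, and the numbers in the example are made up.

```python
def constant_policy(target, n_stages):
    """u_k = T / N: the same dose every fraction, ignoring delivered dose."""
    return target / n_stages


def reactive_policy(target, x_k, k, n_stages):
    """u_k = max(0, T - x_k) / (N - k): spread the remaining dose evenly."""
    return max(0.0, target - x_k) / (n_stages - k)


def modified_reactive_policy(target, x_k, k, n_stages, a):
    """u_k = a * max(0, T - x_k) / (N - k): reactive policy scaled by a."""
    return a * max(0.0, target - x_k) / (n_stages - k)


# Hypothetical numbers: prescribed dose 60, 30 fractions, 18 delivered after 10.
print(constant_policy(60.0, 30))                        # 2.0
print(reactive_policy(60.0, 18.0, 10, 30))              # 2.1
print(modified_reactive_policy(60.0, 18.0, 10, 30, 2))  # 4.2
```

In the example the voxel is behind schedule (18 delivered instead of 20), so the reactive policy raises the per-fraction dose above the constant 2.0.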

8 Why do we use NDP?
Bellman's equation:
\[
\begin{aligned}
u_k(x_k) = \arg\min_{u_k \in U(x_k)}\ & E\left[ g(x_k, x_{k+1}, u_k) + J_{k+1}(x_{k+1}) \right] \\
\text{s.t. } & x_{k+1}(i) = x_k(i) + u_k(i + \omega_k)
\end{aligned}
\tag{3}
\]
The dynamic programming method has difficulty handling more than 4 stages because of dimensionality.
NDP approximates the cost-to-go function $J_k(x_k)$ with a simple-structure function $\tilde{J}_k(x_k, r_k)$.
NDP solves the problem fast.
NDP obtains sub-optimal solutions.

9 Approximation architectures for $\tilde{J}(x, r)$
Neural network (input information is based on feature extraction $f_i(x)$)
Heuristic mapping:
\[ \tilde{J}(x, r) = r_0 + \sum_{i=1}^{I} r_i H_{u_i}(x). \]
$H_{u_i}(x)$ is the heuristic cost-to-go when applying policy $u_i$.
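The heuristic mapping is a linear combination of heuristic cost-to-go values, which can be sketched in a few lines of Python. The function name `heuristic_mapping` and the two placeholder heuristics in the example are hypothetical. In practice each heuristic cost-to-go would be estimated by simulating the corresponding policy, which this sketch does not do.

```python
def heuristic_mapping(x, r, heuristic_costs):
    """Approximate cost-to-go: r[0] + sum_i r[i] * H_i(x).

    x               : state (any type the heuristics accept)
    r               : parameters [r_0, r_1, ..., r_I]
    heuristic_costs : list of I callables, H_i(x) -> float
    """
    if len(r) != len(heuristic_costs) + 1:
        raise ValueError("need one parameter per heuristic plus the offset r_0")
    return r[0] + sum(r_i * h(x) for r_i, h in zip(r[1:], heuristic_costs))


# Placeholder heuristics for a scalar state (hypothetical, for illustration only).
heuristics = [
    lambda x: abs(60.0 - x),
    lambda x: 0.5 * abs(60.0 - x),
]
print(heuristic_mapping(50.0, [1.0, 0.5, 2.0], heuristics))  # 16.0
```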

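10 Approximate policy iteration
Estimate parameters $r_k$. Simulation and evaluation steps alternate:
Simulation: given $x_k$ and $\tilde{J}(\cdot, r_k)$, Bellman's equation yields the policy $\hat{u}_k$; generate sample trajectories $\{x_{0i}, x_{1i}, \ldots, x_{Ni}\}$, $i = 1, \ldots, M$.
Evaluation: evaluate costs $c(x_{ki})$ and solve the least squares problem in $r_k$:
\[ \min_{r_k} \sum_{i=1}^{M} \left| \tilde{J}_k(x_{ki}, r_k) - c(x_{ki}) \right|^2 \]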
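The least squares step can be illustrated in its simplest form. The Python sketch below assumes an approximation with a single scalar feature, so that the fitted function is r0 + r1 * feature and the minimizer has a closed form. The name `fit_affine` and the sample data are hypothetical. An architecture with several features or a neural network would need a general solver instead.

```python
def fit_affine(features, costs):
    """Least-squares fit of costs ~ r0 + r1 * feature over M samples.

    features : feature value of each sampled state x_{ki}
    costs    : simulated cost c(x_{ki}) of each sampled state
    Returns (r0, r1) minimizing the sum of squared residuals.
    """
    m = len(features)
    mean_f = sum(features) / m
    mean_c = sum(costs) / m
    sxx = sum((f - mean_f) ** 2 for f in features)
    if sxx == 0:
        raise ValueError("features must not all be equal")
    sxy = sum((f - mean_f) * (c - mean_c) for f, c in zip(features, costs))
    r1 = sxy / sxx
    r0 = mean_c - r1 * mean_f
    return r0, r1


# Made-up samples lying exactly on the line cost = 1 + 2 * feature.
print(fit_affine([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # (1.0, 2.0)
```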

11 Computational experiments
Test a simple one-dimensional case and a real problem: head and neck
Use 5 candidate policies at each stage
Test in high and low volatility scenarios
Use two approximation architectures:
  Neural network: features ($f_i(x_k)$) used are the average dose, the standard deviation of the dose, and the curvature of the dose distribution
  Heuristic mapping: heuristic policies used are the constant policy, the reactive policy, and the modified reactive policy with $a = 2$.
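The three neural network features can be sketched for a one-dimensional dose profile. The slides do not say how curvature is measured, so the mean absolute second difference used below is purely an assumption of this Python sketch, as are the function name `dose_features` and the use of the population standard deviation.

```python
import statistics


def dose_features(dose):
    """Return (average dose, standard deviation, curvature) of a 1-D dose profile.

    Curvature is taken as the mean absolute second difference, which is an
    assumption of this sketch rather than a definition from the slides.
    """
    if len(dose) < 3:
        raise ValueError("need at least three voxels for a second difference")
    average = sum(dose) / len(dose)
    spread = statistics.pstdev(dose)
    second_diffs = [
        dose[i - 1] - 2.0 * dose[i] + dose[i + 1]
        for i in range(1, len(dose) - 1)
    ]
    curvature = sum(abs(d) for d in second_diffs) / len(second_diffs)
    return average, spread, curvature


average, spread, curvature = dose_features([1.0, 2.0, 4.0, 2.0, 1.0])
print(average, curvature)  # 2.0 2.0
```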

12 Performance of approximate policy iteration
[Figure: final error versus policy iteration number]

13 Comparison results in the head and neck problem
The figures show results for different policies in the high volatility case:
[Figure: expected error versus time period for the constant, reactive, and NDP policies; neural network (NN) architecture (left) and heuristic mapping (HE) architecture (right)]
NDP > Reactive > Constant
Results of NN and HE are comparable, but HE takes much longer computation time
Online policies require more computational effort

14 Conclusions
Online policies with extra information outperform offline policies
The DP method is inapplicable in practice.
NDP reduces computation time and produces approximately optimal policies
Implemented on real patient data
Future work:
  Explore more policies
  Consider different types of error
  Fast computation


More information

Simple Robust Hedging with Nearby Contracts

Simple Robust Hedging with Nearby Contracts Simple Robust Hedging with Nearby Contracts Liuren Wu and Jingyi Zhu Baruch College and University of Utah October 22, 2 at Worcester Polytechnic Institute Wu & Zhu (Baruch & Utah) Robust Hedging with

More information

6.231 DYNAMIC PROGRAMMING LECTURE 5 LECTURE OUTLINE

6.231 DYNAMIC PROGRAMMING LECTURE 5 LECTURE OUTLINE 6.231 DYNAMIC PROGRAMMING LECTURE 5 LECTURE OUTLINE Stopping problems Scheduling problems Minimax Control 1 PURE STOPPING PROBLEMS Two possible controls: Stop (incur a one-time stopping cost, and move

More information

Trading Financial Markets with Online Algorithms

Trading Financial Markets with Online Algorithms Trading Financial Markets with Online Algorithms Esther Mohr and Günter Schmidt Abstract. Investors which trade in financial markets are interested in buying at low and selling at high prices. We suggest

More information

Optimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models

Optimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models Optimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models José E. Figueroa-López 1 1 Department of Statistics Purdue University University of Missouri-Kansas City Department of Mathematics

More information

ON MAXIMIZING DIVIDENDS WITH INVESTMENT AND REINSURANCE

ON MAXIMIZING DIVIDENDS WITH INVESTMENT AND REINSURANCE ON MAXIMIZING DIVIDENDS WITH INVESTMENT AND REINSURANCE George S. Ongkeko, Jr. a, Ricardo C.H. Del Rosario b, Maritina T. Castillo c a Insular Life of the Philippines, Makati City 0725, Philippines b Department

More information

Estimating a Life Cycle Model with Unemployment and Human Capital Depreciation

Estimating a Life Cycle Model with Unemployment and Human Capital Depreciation Estimating a Life Cycle Model with Unemployment and Human Capital Depreciation Andreas Pollak 26 2 min presentation for Sargent s RG // Estimating a Life Cycle Model with Unemployment and Human Capital

More information

Introduction to Dynamic Programming

Introduction to Dynamic Programming Introduction to Dynamic Programming http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html Acknowledgement: this slides is based on Prof. Mengdi Wang s and Prof. Dimitri Bertsekas lecture notes Outline 2/65 1

More information

PORTFOLIO OPTIMIZATION AND EXPECTED SHORTFALL MINIMIZATION FROM HISTORICAL DATA

PORTFOLIO OPTIMIZATION AND EXPECTED SHORTFALL MINIMIZATION FROM HISTORICAL DATA PORTFOLIO OPTIMIZATION AND EXPECTED SHORTFALL MINIMIZATION FROM HISTORICAL DATA We begin by describing the problem at hand which motivates our results. Suppose that we have n financial instruments at hand,

More information

Contract Theory in Continuous- Time Models

Contract Theory in Continuous- Time Models Jaksa Cvitanic Jianfeng Zhang Contract Theory in Continuous- Time Models fyj Springer Table of Contents Part I Introduction 1 Principal-Agent Problem 3 1.1 Problem Formulation 3 1.2 Further Reading 6 References

More information

Optimal Allocation of Policy Limits and Deductibles

Optimal Allocation of Policy Limits and Deductibles Optimal Allocation of Policy Limits and Deductibles Ka Chun Cheung Email: kccheung@math.ucalgary.ca Tel: +1-403-2108697 Fax: +1-403-2825150 Department of Mathematics and Statistics, University of Calgary,

More information

Optimal routing and placement of orders in limit order markets

Optimal routing and placement of orders in limit order markets Optimal routing and placement of orders in limit order markets Rama CONT Arseniy KUKANOV Imperial College London Columbia University New York CFEM-GARP Joint Event and Seminar 05/01/13, New York Choices,

More information

FX Smile Modelling. 9 September September 9, 2008

FX Smile Modelling. 9 September September 9, 2008 FX Smile Modelling 9 September 008 September 9, 008 Contents 1 FX Implied Volatility 1 Interpolation.1 Parametrisation............................. Pure Interpolation.......................... Abstract

More information

Appendix for "Financial Markets Views about the. Euro-Swiss Franc Floor"

Appendix for Financial Markets Views about the. Euro-Swiss Franc Floor Appendix for "Financial Markets Views about the Euro-Swiss Franc Floor" Urban J. Jermann January 21, 2017 Contents 1 Empirical approach in detail 2 2 Robustness to alternative weighting functions 4 3 Model

More information