Markov Decision Processes II

Size: px
Start display at page:

Download "Markov Decision Processes II"

Transcription

1 Markov Decision Processes II Daisuke Oyama Topics in Economic Theory December 17, 2014

2 Review Finite state space S, finite action space A. The value of a policy σ A S : v σ = β t Q t σr σ, t=0 which satisfies v σ = r σ + βq σ v σ. The value function v R S : v (s) = sup π Π M v π (s), where Π M is the set of Markov plans. In the end, for a v -greedy policy σ we have v = v σ. 1 / 20

3 Review: Operators T σ : R S R S, σ A S : T σ v = r σ + βq σ v. v σ is the unique fixed point of T σ. T : R S R S : T v = max σ A S r σ + βq σ v. By definition, T σ v T v for any σ and v. σ is v-greedy if T σ v = T v. 2 / 20

4 T σ and T are monotone. T σ (v + c1) = T σ v + βc1 and T (v + c1) = T v + βc1. T σ and T are β-contractions. The unique fixed point of T σ is v σ. The unique fixed point of T is v. A v -greedy policy (which exists) is an optimal policy. (T σ v = T v = v v = v σ.) For any v, T n σ v v σ and T n v v as n. 3 / 20

5 Policy Iteration 1. Set n = 0. Choose any σ 0 ; or choose any v 0 and let σ 0 be a v 0 -greedy policy. 2. [Policy evaluation] Solve (I βq σn )x = r σn for x and let v n+1 = x. 3. [Policy improvement] Compute a v n+1 -greedy policy σ n+1, i.e., a σ n+1 such that T σn+1 v n+1 = T v n If σ n+1 = σ n, then return ˆσ = σ n and ˆv = v n+1. Otherwise, let n = n + 1 and go to Step 2. 4 / 20

6 Proposition 1 The policy iteration algorithm terminates in finitely many steps, and ˆσ is an optimal policy and ˆv is the optimal value. 5 / 20

7 ε-optimality Let v be the value function. v is a δ-approximation of v if v v < δ. σ is an ε-optimal policy if v σ is an ε-approximation of v. 6 / 20

8 Error Bounds 1 Lemma 2 For any v R S, v T v β T v v. 1 β Proof v T v v T m v + T m v T v, where m 1 Second term T k+1 v T k v k=1 m 1 k=1 β k T v v = β βm T v v. 1 β Let m. 7 / 20

9 Lemma 3 For any v R S and any T v-greedy policy σ, v σ T v β 1 β T v v. Proof Denote u = T v. Recall that v σ = T σ v σ and T σ u = T u. Then, v σ u = T σ v σ u T σ v σ T u + T u u = T σ v σ T σ u + T u T v β v σ u + β u v. Rearranging terms yields the desired inequality. 8 / 20

10 Proposition 4 For any v R S and any T v-greedy policy σ, v σ v 2β 1 β T v v. Proof By the previous two lemmas, v σ v v σ T v + T v v β 1 β T v v + β 1 β T v v. 9 / 20

11 Error Bounds 2 For x R S, write m(x) = min i x i and M(x) = max i x i. Lemma 5 For any v R S and any v-greedy policy σ, v + 1 β m(t v v)1 T v + m(t v v)1 1 β 1 β v σ v T v + β 1 M(T v v)1 v + M(T v v)1. 1 β 1 β 10 / 20

12 For x R S, write span(x) = M(x) m(x) (= max x i min x i ). i i Proposition 6 For any v R S and any v-greedy policy σ, v v σ β span(t v v), 1 β and v ( T v + β 1 β 1 β span(t v v). 2 1 β m(t v v) + M(T v v) 1) 2 11 / 20

13 Proof of Lemma 5 Take any v R n, and let σ be a v-greedy policy: T σ v = T v. (Recall m(x) = min i x i and M(x) = max i x i.) Clearly, T σ v = T v v + m(t v v)1. By the properties of T σ, Tσ 2 v T σ (v + m(t v v)1) = T σ v + βm(t v v)1 v + (1 + β)m(t v v)1, Tσ 3 v T σ (v + (1 + β)m(t v v)1) = T σ v + β(1 + β)m(t v v)1 v + (1 + β + β 2 )m(t v v)1,. 12 / 20

14 We thus have T n σ v T σ v + (β + + β n 1 )m(t v v)1 v + (1 + β + + β n 1 )m(t v v)1. Letting n, we have v σ T σ v + Note that T σ v = T v. By a similar procedure, we have v T v + Note finally that v v σ. β 1 m(t v v)1 v + m(t v v)1. 1 β 1 β β 1 M(T v v)1 v + M(T v v)1. 1 β 1 β 13 / 20

15 Remarks Similar estimates with T v v and T v v in place of m(t v v) and M(T v v) hold. (Start with T v v 1 T v v T v v 1.) Since m(x) x and M(x) x, we have span(t v v) 2 T v v. 14 / 20

16 Error Bounds and Termination Conditions Bound 1 Bound 2 Value iteration Modified policy iteration 15 / 20

17 Value Iteration with Norm Bounds Specify ε > Set n = 0. Choose any v Let v n+1 = T v n. 3. If v n+1 v n < 1 β 2β ε, then return ˆv = vn+1 and a ˆv-greedy policy ˆσ. Otherwise, let n = n + 1 and go to Step / 20

18 Proposition 7 Given an ε > 0, the value iteration algorithm as described terminates in finitely many steps, and ˆσ is an ε-optimal policy and ˆv is an ε 2 -approximation of v. 17 / 20

19 Modified Policy Iteration with Span Seminorm Bounds Specify ε > 0 and k Set n = 0. Choose any v [Policy improvement] Compute a v n -greedy policy σ n+1, i.e., a σ n+1 such that T σn+1 v n = T v n. Compute also u n = T v n (= T σn+1 v n ). 3. If span(u n v n ) < 1 β ε, then return ˆσ = σ n+1 and ˆv = u n + β 1 β β m(u n v n )+M(u n v n ) 2 1. Otherwise, go to the next step. 4. [Partial policy evaluation] Let v n+1 = (T σn+1 ) k v n = (T σn+1 ) k 1 u n. Let n = n + 1 and go to Step / 20

20 Fact 1 For modified policy iteration, as n, v n v and hence span(t v n v n ) 0. Proposition 8 Given an ε > 0, the modified policy iteration algorithm as described terminates in finitely many steps, and ˆσ is an ε-optimal policy and ˆv is an ε 2 -approximation of v. 19 / 20

21 References D. P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models, Prentice Hall, M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley-Interscience, / 20

CS 234 Winter 2019 Assignment 1 Due: January 23 at 11:59 pm

CS 234 Winter 2019 Assignment 1 Due: January 23 at 11:59 pm CS 234 Winter 2019 Assignment 1 Due: January 23 at 11:59 pm For submission instructions please refer to website 1 Optimal Policy for Simple MDP [20 pts] Consider the simple n-state MDP shown in Figure

More information

Forecast Horizons for Production Planning with Stochastic Demand

Forecast Horizons for Production Planning with Stochastic Demand Forecast Horizons for Production Planning with Stochastic Demand Alfredo Garcia and Robert L. Smith Department of Industrial and Operations Engineering Universityof Michigan, Ann Arbor MI 48109 December

More information

Intro to Reinforcement Learning. Part 3: Core Theory

Intro to Reinforcement Learning. Part 3: Core Theory Intro to Reinforcement Learning Part 3: Core Theory Interactive Example: You are the algorithm! Finite Markov decision processes (finite MDPs) dynamics p p p Experience: S 0 A 0 R 1 S 1 A 1 R 2 S 2 A 2

More information

Sequential Decision Making

Sequential Decision Making Sequential Decision Making Dynamic programming Christos Dimitrakakis Intelligent Autonomous Systems, IvI, University of Amsterdam, The Netherlands March 18, 2008 Introduction Some examples Dynamic programming

More information

Lecture Quantitative Finance Spring Term 2015

Lecture Quantitative Finance Spring Term 2015 implied Lecture Quantitative Finance Spring Term 2015 : May 7, 2015 1 / 28 implied 1 implied 2 / 28 Motivation and setup implied the goal of this chapter is to treat the implied which requires an algorithm

More information

Dynamic Admission and Service Rate Control of a Queue

Dynamic Admission and Service Rate Control of a Queue Dynamic Admission and Service Rate Control of a Queue Kranthi Mitra Adusumilli and John J. Hasenbein 1 Graduate Program in Operations Research and Industrial Engineering Department of Mechanical Engineering

More information

Handout 4: Deterministic Systems and the Shortest Path Problem

Handout 4: Deterministic Systems and the Shortest Path Problem SEEM 3470: Dynamic Optimization and Applications 2013 14 Second Term Handout 4: Deterministic Systems and the Shortest Path Problem Instructor: Shiqian Ma January 27, 2014 Suggested Reading: Bertsekas

More information

B. Online Appendix. where ɛ may be arbitrarily chosen to satisfy 0 < ɛ < s 1 and s 1 is defined in (B1). This can be rewritten as

B. Online Appendix. where ɛ may be arbitrarily chosen to satisfy 0 < ɛ < s 1 and s 1 is defined in (B1). This can be rewritten as B Online Appendix B1 Constructing examples with nonmonotonic adoption policies Assume c > 0 and the utility function u(w) is increasing and approaches as w approaches 0 Suppose we have a prior distribution

More information

Problem Set 3. Thomas Philippon. April 19, Human Wealth, Financial Wealth and Consumption

Problem Set 3. Thomas Philippon. April 19, Human Wealth, Financial Wealth and Consumption Problem Set 3 Thomas Philippon April 19, 2002 1 Human Wealth, Financial Wealth and Consumption The goal of the question is to derive the formulas on p13 of Topic 2. This is a partial equilibrium analysis

More information

Online Appendix for Debt Contracts with Partial Commitment by Natalia Kovrijnykh

Online Appendix for Debt Contracts with Partial Commitment by Natalia Kovrijnykh Online Appendix for Debt Contracts with Partial Commitment by Natalia Kovrijnykh Omitted Proofs LEMMA 5: Function ˆV is concave with slope between 1 and 0. PROOF: The fact that ˆV (w) is decreasing in

More information

QI SHANG: General Equilibrium Analysis of Portfolio Benchmarking

QI SHANG: General Equilibrium Analysis of Portfolio Benchmarking General Equilibrium Analysis of Portfolio Benchmarking QI SHANG 23/10/2008 Introduction The Model Equilibrium Discussion of Results Conclusion Introduction This paper studies the equilibrium effect of

More information

4 Reinforcement Learning Basic Algorithms

4 Reinforcement Learning Basic Algorithms Learning in Complex Systems Spring 2011 Lecture Notes Nahum Shimkin 4 Reinforcement Learning Basic Algorithms 4.1 Introduction RL methods essentially deal with the solution of (optimal) control problems

More information

Lecture 8: Asset pricing

Lecture 8: Asset pricing BURNABY SIMON FRASER UNIVERSITY BRITISH COLUMBIA Paul Klein Office: WMC 3635 Phone: (778) 782-9391 Email: paul klein 2@sfu.ca URL: http://paulklein.ca/newsite/teaching/483.php Economics 483 Advanced Topics

More information

Lecture 2: Making Good Sequences of Decisions Given a Model of World. CS234: RL Emma Brunskill Winter 2018

Lecture 2: Making Good Sequences of Decisions Given a Model of World. CS234: RL Emma Brunskill Winter 2018 Lecture 2: Making Good Sequences of Decisions Given a Model of World CS234: RL Emma Brunskill Winter 218 Human in the loop exoskeleton work from Steve Collins lab Class Structure Last Time: Introduction

More information

Lecture 7: Bayesian approach to MAB - Gittins index

Lecture 7: Bayesian approach to MAB - Gittins index Advanced Topics in Machine Learning and Algorithmic Game Theory Lecture 7: Bayesian approach to MAB - Gittins index Lecturer: Yishay Mansour Scribe: Mariano Schain 7.1 Introduction In the Bayesian approach

More information

A Stochastic Levenberg-Marquardt Method Using Random Models with Application to Data Assimilation

A Stochastic Levenberg-Marquardt Method Using Random Models with Application to Data Assimilation A Stochastic Levenberg-Marquardt Method Using Random Models with Application to Data Assimilation E Bergou Y Diouane V Kungurtsev C W Royer July 5, 08 Abstract Globally convergent variants of the Gauss-Newton

More information

Asymptotic results discrete time martingales and stochastic algorithms

Asymptotic results discrete time martingales and stochastic algorithms Asymptotic results discrete time martingales and stochastic algorithms Bernard Bercu Bordeaux University, France IFCAM Summer School Bangalore, India, July 2015 Bernard Bercu Asymptotic results for discrete

More information

arxiv: v1 [math.pr] 6 Apr 2015

arxiv: v1 [math.pr] 6 Apr 2015 Analysis of the Optimal Resource Allocation for a Tandem Queueing System arxiv:1504.01248v1 [math.pr] 6 Apr 2015 Liu Zaiming, Chen Gang, Wu Jinbiao School of Mathematics and Statistics, Central South University,

More information

Log-linear Dynamics and Local Potential

Log-linear Dynamics and Local Potential Log-linear Dynamics and Local Potential Daijiro Okada and Olivier Tercieux [This version: November 28, 2008] Abstract We show that local potential maximizer ([15]) with constant weights is stochastically

More information

Dynamic Programming: An overview. 1 Preliminaries: The basic principle underlying dynamic programming

Dynamic Programming: An overview. 1 Preliminaries: The basic principle underlying dynamic programming Dynamic Programming: An overview These notes summarize some key properties of the Dynamic Programming principle to optimize a function or cost that depends on an interval or stages. This plays a key role

More information

Making Complex Decisions

Making Complex Decisions Ch. 17 p.1/29 Making Complex Decisions Chapter 17 Ch. 17 p.2/29 Outline Sequential decision problems Value iteration algorithm Policy iteration algorithm Ch. 17 p.3/29 A simple environment 3 +1 p=0.8 2

More information

1 Consumption and saving under uncertainty

1 Consumption and saving under uncertainty 1 Consumption and saving under uncertainty 1.1 Modelling uncertainty As in the deterministic case, we keep assuming that agents live for two periods. The novelty here is that their earnings in the second

More information

17 MAKING COMPLEX DECISIONS

17 MAKING COMPLEX DECISIONS 267 17 MAKING COMPLEX DECISIONS The agent s utility now depends on a sequence of decisions In the following 4 3grid environment the agent makes a decision to move (U, R, D, L) at each time step When the

More information

Elif Özge Özdamar T Reinforcement Learning - Theory and Applications February 14, 2006

Elif Özge Özdamar T Reinforcement Learning - Theory and Applications February 14, 2006 On the convergence of Q-learning Elif Özge Özdamar elif.ozdamar@helsinki.fi T-61.6020 Reinforcement Learning - Theory and Applications February 14, 2006 the covergence of stochastic iterative algorithms

More information

Long Term Values in MDPs Second Workshop on Open Games

Long Term Values in MDPs Second Workshop on Open Games A (Co)Algebraic Perspective on Long Term Values in MDPs Second Workshop on Open Games Helle Hvid Hansen Delft University of Technology Helle Hvid Hansen (TU Delft) 2nd WS Open Games Oxford 4-6 July 2018

More information

1 No-arbitrage pricing

1 No-arbitrage pricing BURNABY SIMON FRASER UNIVERSITY BRITISH COLUMBIA Paul Klein Office: WMC 3635 Phone: TBA Email: paul klein 2@sfu.ca URL: http://paulklein.ca/newsite/teaching/809.php Economics 809 Advanced macroeconomic

More information

Local vs Non-local Forward Equations for Option Pricing

Local vs Non-local Forward Equations for Option Pricing Local vs Non-local Forward Equations for Option Pricing Rama Cont Yu Gu Abstract When the underlying asset is a continuous martingale, call option prices solve the Dupire equation, a forward parabolic

More information

Stochastic Differential Equations in Finance and Monte Carlo Simulations

Stochastic Differential Equations in Finance and Monte Carlo Simulations Stochastic Differential Equations in Finance and Department of Statistics and Modelling Science University of Strathclyde Glasgow, G1 1XH China 2009 Outline Stochastic Modelling in Asset Prices 1 Stochastic

More information

CPS 270: Artificial Intelligence Markov decision processes, POMDPs

CPS 270: Artificial Intelligence  Markov decision processes, POMDPs CPS 270: Artificial Intelligence http://www.cs.duke.edu/courses/fall08/cps270/ Markov decision processes, POMDPs Instructor: Vincent Conitzer Warmup: a Markov process with rewards We derive some reward

More information

ADVANCED MACROECONOMIC TECHNIQUES NOTE 7b

ADVANCED MACROECONOMIC TECHNIQUES NOTE 7b 316-406 ADVANCED MACROECONOMIC TECHNIQUES NOTE 7b Chris Edmond hcpedmond@unimelb.edu.aui Aiyagari s model Arguably the most popular example of a simple incomplete markets model is due to Rao Aiyagari (1994,

More information

Approximate Value Iteration with Temporally Extended Actions (Extended Abstract)

Approximate Value Iteration with Temporally Extended Actions (Extended Abstract) Approximate Value Iteration with Temporally Extended Actions (Extended Abstract) Timothy A. Mann DeepMind, London, UK timothymann@google.com Shie Mannor The Technion, Haifa, Israel shie@ee.technion.ac.il

More information

Handout 8: Introduction to Stochastic Dynamic Programming. 2 Examples of Stochastic Dynamic Programming Problems

Handout 8: Introduction to Stochastic Dynamic Programming. 2 Examples of Stochastic Dynamic Programming Problems SEEM 3470: Dynamic Optimization and Applications 2013 14 Second Term Handout 8: Introduction to Stochastic Dynamic Programming Instructor: Shiqian Ma March 10, 2014 Suggested Reading: Chapter 1 of Bertsekas,

More information

Valuing volatility and variance swaps for a non-gaussian Ornstein-Uhlenbeck stochastic volatility model

Valuing volatility and variance swaps for a non-gaussian Ornstein-Uhlenbeck stochastic volatility model Valuing volatility and variance swaps for a non-gaussian Ornstein-Uhlenbeck stochastic volatility model 1(23) Valuing volatility and variance swaps for a non-gaussian Ornstein-Uhlenbeck stochastic volatility

More information

Identification and Estimation of Dynamic Games when Players Belief Are Not in Equilibrium

Identification and Estimation of Dynamic Games when Players Belief Are Not in Equilibrium Identification and Estimation of Dynamic Games when Players Belief Are Not in Equilibrium A Short Review of Aguirregabiria and Magesan (2010) January 25, 2012 1 / 18 Dynamics of the game Two players, {i,

More information

Price of Anarchy Smoothness Price of Stability. Price of Anarchy. Algorithmic Game Theory

Price of Anarchy Smoothness Price of Stability. Price of Anarchy. Algorithmic Game Theory Smoothness Price of Stability Algorithmic Game Theory Smoothness Price of Stability Recall Recall for Nash equilibria: Strategic game Γ, social cost cost(s) for every state s of Γ Consider Σ PNE as the

More information

An optimal policy for joint dynamic price and lead-time quotation

An optimal policy for joint dynamic price and lead-time quotation Lingnan University From the SelectedWorks of Prof. LIU Liming November, 2011 An optimal policy for joint dynamic price and lead-time quotation Jiejian FENG Liming LIU, Lingnan University, Hong Kong Xianming

More information

ONLINE LEARNING IN LIMIT ORDER BOOK TRADE EXECUTION

ONLINE LEARNING IN LIMIT ORDER BOOK TRADE EXECUTION ONLINE LEARNING IN LIMIT ORDER BOOK TRADE EXECUTION Nima Akbarzadeh, Cem Tekin Bilkent University Electrical and Electronics Engineering Department Ankara, Turkey Mihaela van der Schaar Oxford Man Institute

More information

Regression estimation in continuous time with a view towards pricing Bermudan options

Regression estimation in continuous time with a view towards pricing Bermudan options with a view towards pricing Bermudan options Tagung des SFB 649 Ökonomisches Risiko in Motzen 04.-06.06.2009 Financial engineering in times of financial crisis Derivate... süßes Gift für die Spekulanten

More information

The Agent-Environment Interface Goals, Rewards, Returns The Markov Property The Markov Decision Process Value Functions Optimal Value Functions

The Agent-Environment Interface Goals, Rewards, Returns The Markov Property The Markov Decision Process Value Functions Optimal Value Functions The Agent-Environment Interface Goals, Rewards, Returns The Markov Property The Markov Decision Process Value Functions Optimal Value Functions Optimality and Approximation Finite MDP: {S, A, R, p, γ}

More information

6.231 DYNAMIC PROGRAMMING LECTURE 3 LECTURE OUTLINE

6.231 DYNAMIC PROGRAMMING LECTURE 3 LECTURE OUTLINE 6.21 DYNAMIC PROGRAMMING LECTURE LECTURE OUTLINE Deterministic finite-state DP problems Backward shortest path algorithm Forward shortest path algorithm Shortest path examples Alternative shortest path

More information

Lecture 6. 1 Polynomial-time algorithms for the global min-cut problem

Lecture 6. 1 Polynomial-time algorithms for the global min-cut problem ORIE 633 Network Flows September 20, 2007 Lecturer: David P. Williamson Lecture 6 Scribe: Animashree Anandkumar 1 Polynomial-time algorithms for the global min-cut problem 1.1 The global min-cut problem

More information

Relaxations of Approximate Linear Programs for the Real Option Management of Commodity Storage

Relaxations of Approximate Linear Programs for the Real Option Management of Commodity Storage Relaxations of Approximate Linear Programs for the Real Option Management of Commodity Storage Selvaprabu Nadarajah, François Margot, Nicola Secomandi Tepper School of Business, Carnegie Mellon University,

More information

Stochastic Optimization Methods in Scheduling. Rolf H. Möhring Technische Universität Berlin Combinatorial Optimization and Graph Algorithms

Stochastic Optimization Methods in Scheduling. Rolf H. Möhring Technische Universität Berlin Combinatorial Optimization and Graph Algorithms Stochastic Optimization Methods in Scheduling Rolf H. Möhring Technische Universität Berlin Combinatorial Optimization and Graph Algorithms More expensive and longer... Eurotunnel Unexpected loss of 400,000,000

More information

STP Problem Set 3 Solutions

STP Problem Set 3 Solutions STP 425 - Problem Set 3 Solutions 4.4) Consider the separable sequential allocation problem introduced in Sections 3.3.3 and 4.6.3, where the goal is to maximize the sum subject to the constraints f(x

More information

The Irrevocable Multi-Armed Bandit Problem

The Irrevocable Multi-Armed Bandit Problem The Irrevocable Multi-Armed Bandit Problem Ritesh Madan Qualcomm-Flarion Technologies May 27, 2009 Joint work with Vivek Farias (MIT) 2 Multi-Armed Bandit Problem n arms, where each arm i is a Markov Decision

More information

Lecture 17: More on Markov Decision Processes. Reinforcement learning

Lecture 17: More on Markov Decision Processes. Reinforcement learning Lecture 17: More on Markov Decision Processes. Reinforcement learning Learning a model: maximum likelihood Learning a value function directly Monte Carlo Temporal-difference (TD) learning COMP-424, Lecture

More information

Lecture 5: Iterative Combinatorial Auctions

Lecture 5: Iterative Combinatorial Auctions COMS 6998-3: Algorithmic Game Theory October 6, 2008 Lecture 5: Iterative Combinatorial Auctions Lecturer: Sébastien Lahaie Scribe: Sébastien Lahaie In this lecture we examine a procedure that generalizes

More information

Blackwell Optimality in Markov Decision Processes with Partial Observation

Blackwell Optimality in Markov Decision Processes with Partial Observation Blackwell Optimality in Markov Decision Processes with Partial Observation Dinah Rosenberg and Eilon Solan and Nicolas Vieille April 6, 2000 Abstract We prove the existence of Blackwell ε-optimal strategies

More information

Markov Decision Processes

Markov Decision Processes Markov Decision Processes Robert Platt Northeastern University Some images and slides are used from: 1. CS188 UC Berkeley 2. AIMA 3. Chris Amato Stochastic domains So far, we have studied search Can use

More information

Socially-Optimal Design of Crowdsourcing Platforms with Reputation Update Errors

Socially-Optimal Design of Crowdsourcing Platforms with Reputation Update Errors Socially-Optimal Design of Crowdsourcing Platforms with Reputation Update Errors 1 Yuanzhang Xiao, Yu Zhang, and Mihaela van der Schaar Abstract Crowdsourcing systems (e.g. Yahoo! Answers and Amazon Mechanical

More information

Analysis of pricing American options on the maximum (minimum) of two risk assets

Analysis of pricing American options on the maximum (minimum) of two risk assets Interfaces Free Boundaries 4, (00) 7 46 Analysis of pricing American options on the maximum (minimum) of two risk assets LISHANG JIANG Institute of Mathematics, Tongji University, People s Republic of

More information

Strategies and Nash Equilibrium. A Whirlwind Tour of Game Theory

Strategies and Nash Equilibrium. A Whirlwind Tour of Game Theory Strategies and Nash Equilibrium A Whirlwind Tour of Game Theory (Mostly from Fudenberg & Tirole) Players choose actions, receive rewards based on their own actions and those of the other players. Example,

More information

Admissioncontrolwithbatcharrivals

Admissioncontrolwithbatcharrivals Admissioncontrolwithbatcharrivals E. Lerzan Örmeci Department of Industrial Engineering Koç University Sarıyer 34450 İstanbul-Turkey Apostolos Burnetas Department of Operations Weatherhead School of Management

More information

Accelerated Stochastic Gradient Descent Praneeth Netrapalli MSR India

Accelerated Stochastic Gradient Descent Praneeth Netrapalli MSR India Accelerated Stochastic Gradient Descent Praneeth Netrapalli MSR India Presented at OSL workshop, Les Houches, France. Joint work with Prateek Jain, Sham M. Kakade, Rahul Kidambi and Aaron Sidford Linear

More information

Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete)

Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete) Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete) Ying Chen Hülya Eraslan March 25, 2016 Abstract We analyze a dynamic model of judicial decision

More information

Single Machine Inserted Idle Time Scheduling with Release Times and Due Dates

Single Machine Inserted Idle Time Scheduling with Release Times and Due Dates Single Machine Inserted Idle Time Scheduling with Release Times and Due Dates Natalia Grigoreva Department of Mathematics and Mechanics, St.Petersburg State University, Russia n.s.grig@gmail.com Abstract.

More information

A class of coherent risk measures based on one-sided moments

A class of coherent risk measures based on one-sided moments A class of coherent risk measures based on one-sided moments T. Fischer Darmstadt University of Technology November 11, 2003 Abstract This brief paper explains how to obtain upper boundaries of shortfall

More information

Yao s Minimax Principle

Yao s Minimax Principle Complexity of algorithms The complexity of an algorithm is usually measured with respect to the size of the input, where size may for example refer to the length of a binary word describing the input,

More information

SWITCHING COSTS WITH A CONTINUUM OF CONSUMERS

SWITCHING COSTS WITH A CONTINUUM OF CONSUMERS SWITCHING COSTS WITH A CONTINUUM OF CONSUMERS GUY ARIE AND PAUL GRIECO Abstract. We study a switching cost model that uses a continuum of consumers. Using discrete choice demand, the model becomes entirely

More information

Optimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models

Optimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models Optimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models José E. Figueroa-López 1 1 Department of Statistics Purdue University University of Missouri-Kansas City Department of Mathematics

More information

Optimal Stopping Rules of Discrete-Time Callable Financial Commodities with Two Stopping Boundaries

Optimal Stopping Rules of Discrete-Time Callable Financial Commodities with Two Stopping Boundaries The Ninth International Symposium on Operations Research Its Applications (ISORA 10) Chengdu-Jiuzhaigou, China, August 19 23, 2010 Copyright 2010 ORSC & APORC, pp. 215 224 Optimal Stopping Rules of Discrete-Time

More information

Lecture 8: Introduction to asset pricing

Lecture 8: Introduction to asset pricing THE UNIVERSITY OF SOUTHAMPTON Paul Klein Office: Murray Building, 3005 Email: p.klein@soton.ac.uk URL: http://paulklein.se Economics 3010 Topics in Macroeconomics 3 Autumn 2010 Lecture 8: Introduction

More information

Competitive Market Model

Competitive Market Model 57 Chapter 5 Competitive Market Model The competitive market model serves as the basis for the two different multi-user allocation methods presented in this thesis. This market model prices resources based

More information

Long-Term Values in MDPs, Corecursively

Long-Term Values in MDPs, Corecursively Long-Term Values in MDPs, Corecursively Applied Category Theory, 15-16 March 2018, NIST Helle Hvid Hansen Delft University of Technology Helle Hvid Hansen (TU Delft) MDPs, Corecursively NIST, 15/Mar/2018

More information

Comprehensive Exam. August 19, 2013

Comprehensive Exam. August 19, 2013 Comprehensive Exam August 19, 2013 You have a total of 180 minutes to complete the exam. If a question seems ambiguous, state why, sharpen it up and answer the sharpened-up question. Good luck! 1 1 Menu

More information

A simple wealth model

A simple wealth model Quantitative Macroeconomics Raül Santaeulàlia-Llopis, MOVE-UAB and Barcelona GSE Homework 5, due Thu Nov 1 I A simple wealth model Consider the sequential problem of a household that maximizes over streams

More information

Introduction to Reinforcement Learning. MAL Seminar

Introduction to Reinforcement Learning. MAL Seminar Introduction to Reinforcement Learning MAL Seminar 2014-2015 RL Background Learning by interacting with the environment Reward good behavior, punish bad behavior Trial & Error Combines ideas from psychology

More information

Stochastic Dual Dynamic Programming Algorithm for Multistage Stochastic Programming

Stochastic Dual Dynamic Programming Algorithm for Multistage Stochastic Programming Stochastic Dual Dynamic Programg Algorithm for Multistage Stochastic Programg Final presentation ISyE 8813 Fall 2011 Guido Lagos Wajdi Tekaya Georgia Institute of Technology November 30, 2011 Multistage

More information

Econ 582 Nonlinear Regression

Econ 582 Nonlinear Regression Econ 582 Nonlinear Regression Eric Zivot June 3, 2013 Nonlinear Regression In linear regression models = x 0 β (1 )( 1) + [ x ]=0 [ x = x] =x 0 β = [ x = x] [ x = x] x = β it is assumed that the regression

More information

Endogenous employment and incomplete markets

Endogenous employment and incomplete markets Endogenous employment and incomplete markets Andres Zambrano Universidad de los Andes June 2, 2014 Motivation Self-insurance models with incomplete markets generate negatively skewed wealth distributions

More information

On the Lower Arbitrage Bound of American Contingent Claims

On the Lower Arbitrage Bound of American Contingent Claims On the Lower Arbitrage Bound of American Contingent Claims Beatrice Acciaio Gregor Svindland December 2011 Abstract We prove that in a discrete-time market model the lower arbitrage bound of an American

More information

Best-Reply Sets. Jonathan Weinstein Washington University in St. Louis. This version: May 2015

Best-Reply Sets. Jonathan Weinstein Washington University in St. Louis. This version: May 2015 Best-Reply Sets Jonathan Weinstein Washington University in St. Louis This version: May 2015 Introduction The best-reply correspondence of a game the mapping from beliefs over one s opponents actions to

More information

Essays on Some Combinatorial Optimization Problems with Interval Data

Essays on Some Combinatorial Optimization Problems with Interval Data Essays on Some Combinatorial Optimization Problems with Interval Data a thesis submitted to the department of industrial engineering and the institute of engineering and sciences of bilkent university

More information

FUNCTION-APPROXIMATION-BASED PERFECT CONTROL VARIATES FOR PRICING AMERICAN OPTIONS. Nomesh Bolia Sandeep Juneja

FUNCTION-APPROXIMATION-BASED PERFECT CONTROL VARIATES FOR PRICING AMERICAN OPTIONS. Nomesh Bolia Sandeep Juneja Proceedings of the 2005 Winter Simulation Conference M. E. Kuhl, N. M. Steiger, F. B. Armstrong, and J. A. Joines, eds. FUNCTION-APPROXIMATION-BASED PERFECT CONTROL VARIATES FOR PRICING AMERICAN OPTIONS

More information

Equivalence between Semimartingales and Itô Processes

Equivalence between Semimartingales and Itô Processes International Journal of Mathematical Analysis Vol. 9, 215, no. 16, 787-791 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/1.12988/ijma.215.411358 Equivalence between Semimartingales and Itô Processes

More information

DASC: A DECOMPOSITION ALGORITHM FOR MULTISTAGE STOCHASTIC PROGRAMS WITH STRONGLY CONVEX COST FUNCTIONS

DASC: A DECOMPOSITION ALGORITHM FOR MULTISTAGE STOCHASTIC PROGRAMS WITH STRONGLY CONVEX COST FUNCTIONS DASC: A DECOMPOSITION ALGORITHM FOR MULTISTAGE STOCHASTIC PROGRAMS WITH STRONGLY CONVEX COST FUNCTIONS Vincent Guigues School of Applied Mathematics, FGV Praia de Botafogo, Rio de Janeiro, Brazil vguigues@fgv.br

More information

Final exam solutions

Final exam solutions EE365 Stochastic Control / MS&E251 Stochastic Decision Models Profs. S. Lall, S. Boyd June 5 6 or June 6 7, 2013 Final exam solutions This is a 24 hour take-home final. Please turn it in to one of the

More information

Making Decisions. CS 3793 Artificial Intelligence Making Decisions 1

Making Decisions. CS 3793 Artificial Intelligence Making Decisions 1 Making Decisions CS 3793 Artificial Intelligence Making Decisions 1 Planning under uncertainty should address: The world is nondeterministic. Actions are not certain to succeed. Many events are outside

More information

Discounted Stochastic Games

Discounted Stochastic Games Discounted Stochastic Games Eilon Solan October 26, 1998 Abstract We give an alternative proof to a result of Mertens and Parthasarathy, stating that every n-player discounted stochastic game with general

More information

A reinforcement learning process in extensive form games

A reinforcement learning process in extensive form games A reinforcement learning process in extensive form games Jean-François Laslier CNRS and Laboratoire d Econométrie de l Ecole Polytechnique, Paris. Bernard Walliser CERAS, Ecole Nationale des Ponts et Chaussées,

More information

ONLY AVAILABLE IN ELECTRONIC FORM

ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1080.0610ec pp. ec1 ec42 e-companion ONLY AVAILABLE IN ELECTRONIC FORM informs 2009 INFORMS Electronic Companion Dynamic Capacity Management with Substitution by Robert

More information

Optimal Dynamic Asset Allocation: A Stochastic Invariance Approach

Optimal Dynamic Asset Allocation: A Stochastic Invariance Approach Proceedings of the 45th IEEE Conference on Decision & Control Manchester Grand Hyatt Hotel San Diego, CA, USA, December 13-15, 26 ThA7.4 Optimal Dynamic Asset Allocation: A Stochastic Invariance Approach

More information

Information aggregation for timing decision making.

Information aggregation for timing decision making. MPRA Munich Personal RePEc Archive Information aggregation for timing decision making. Esteban Colla De-Robertis Universidad Panamericana - Campus México, Escuela de Ciencias Económicas y Empresariales

More information

Bounding Optimal Expected Revenues for Assortment Optimization under Mixtures of Multinomial Logits

Bounding Optimal Expected Revenues for Assortment Optimization under Mixtures of Multinomial Logits Bounding Optimal Expected Revenues for Assortment Optimization under Mixtures of Multinomial Logits Jacob Feldman School of Operations Research and Information Engineering, Cornell University, Ithaca,

More information

TOPICS IN MACROECONOMICS: MODELLING INFORMATION, LEARNING AND EXPECTATIONS LECTURE NOTES. Lucas Island Model

TOPICS IN MACROECONOMICS: MODELLING INFORMATION, LEARNING AND EXPECTATIONS LECTURE NOTES. Lucas Island Model TOPICS IN MACROECONOMICS: MODELLING INFORMATION, LEARNING AND EXPECTATIONS LECTURE NOTES KRISTOFFER P. NIMARK Lucas Island Model The Lucas Island model appeared in a series of papers in the early 970s

More information

AM 121: Intro to Optimization Models and Methods

AM 121: Intro to Optimization Models and Methods AM 121: Intro to Optimization Models and Methods Lecture 18: Markov Decision Processes Yiling Chen and David Parkes Lesson Plan Markov decision processes Policies and Value functions Solving: average reward,

More information

Copyright (C) 2001 David K. Levine This document is an open textbook; you can redistribute it and/or modify it under the terms of version 1 of the

Copyright (C) 2001 David K. Levine This document is an open textbook; you can redistribute it and/or modify it under the terms of version 1 of the Copyright (C) 2001 David K. Levine This document is an open textbook; you can redistribute it and/or modify it under the terms of version 1 of the open text license amendment to version 2 of the GNU General

More information

On worst-case investment with applications in finance and insurance mathematics

On worst-case investment with applications in finance and insurance mathematics On worst-case investment with applications in finance and insurance mathematics Ralf Korn and Olaf Menkens Fachbereich Mathematik, Universität Kaiserslautern, 67653 Kaiserslautern Summary. We review recent

More information

4 Martingales in Discrete-Time

4 Martingales in Discrete-Time 4 Martingales in Discrete-Time Suppose that (Ω, F, P is a probability space. Definition 4.1. A sequence F = {F n, n = 0, 1,...} is called a filtration if each F n is a sub-σ-algebra of F, and F n F n+1

More information

The value of foresight

The value of foresight Philip Ernst Department of Statistics, Rice University Support from NSF-DMS-1811936 (co-pi F. Viens) and ONR-N00014-18-1-2192 gratefully acknowledged. IMA Financial and Economic Applications June 11, 2018

More information

Identification and Estimation of Dynamic Games when Players Beliefs are not in Equilibrium

Identification and Estimation of Dynamic Games when Players Beliefs are not in Equilibrium and of Dynamic Games when Players Beliefs are not in Equilibrium Victor Aguirregabiria and Arvind Magesan Presented by Hanqing Institute, Renmin University of China Outline General Views 1 General Views

More information

Topics in Contract Theory Lecture 5. Property Rights Theory. The key question we are staring from is: What are ownership/property rights?

Topics in Contract Theory Lecture 5. Property Rights Theory. The key question we are staring from is: What are ownership/property rights? Leonardo Felli 15 January, 2002 Topics in Contract Theory Lecture 5 Property Rights Theory The key question we are staring from is: What are ownership/property rights? For an answer we need to distinguish

More information

Dynamic pricing and scheduling in a multi-class single-server queueing system

Dynamic pricing and scheduling in a multi-class single-server queueing system DOI 10.1007/s11134-011-9214-5 Dynamic pricing and scheduling in a multi-class single-server queueing system Eren Başar Çil Fikri Karaesmen E. Lerzan Örmeci Received: 3 April 2009 / Revised: 21 January

More information

Optimal Stopping in Infinite Horizon: an Eigenfunction Expansion Approach

Optimal Stopping in Infinite Horizon: an Eigenfunction Expansion Approach Optimal Stopping in Infinite Horizon: an Eigenfunction Expansion Approach Lingfei Li a, Vadim Linetsky b a Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong

More information

Disaster risk and its implications for asset pricing Online appendix

Disaster risk and its implications for asset pricing Online appendix Disaster risk and its implications for asset pricing Online appendix Jerry Tsai University of Oxford Jessica A. Wachter University of Pennsylvania December 12, 2014 and NBER A The iid model This section

More information

Overview: Representation Techniques

Overview: Representation Techniques 1 Overview: Representation Techniques Week 6 Representations for classical planning problems deterministic environment; complete information Week 7 Logic programs for problem representations including

More information

University of Groningen. Inventory Control for Multi-location Rental Systems van der Heide, Gerlach

University of Groningen. Inventory Control for Multi-location Rental Systems van der Heide, Gerlach University of Groningen Inventory Control for Multi-location Rental Systems van der Heide, Gerlach IMPORTANT NOTE: You are advised to consult the publisher's version publisher's PDF) if you wish to cite

More information

Consumption and Asset Pricing

Consumption and Asset Pricing Consumption and Asset Pricing Yin-Chi Wang The Chinese University of Hong Kong November, 2012 References: Williamson s lecture notes (2006) ch5 and ch 6 Further references: Stochastic dynamic programming:

More information

Revenue Management Under the Markov Chain Choice Model

Revenue Management Under the Markov Chain Choice Model Revenue Management Under the Markov Chain Choice Model Jacob B. Feldman School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853, USA jbf232@cornell.edu Huseyin

More information

The ruin probabilities of a multidimensional perturbed risk model

The ruin probabilities of a multidimensional perturbed risk model MATHEMATICAL COMMUNICATIONS 231 Math. Commun. 18(2013, 231 239 The ruin probabilities of a multidimensional perturbed risk model Tatjana Slijepčević-Manger 1, 1 Faculty of Civil Engineering, University

More information