Markov Decision Processes II
1 Markov Decision Processes II Daisuke Oyama Topics in Economic Theory December 17, 2014
2 Review
Finite state space $S$, finite action space $A$.
The value of a policy $\sigma \in A^S$: $v_\sigma = \sum_{t=0}^{\infty} \beta^t Q_\sigma^t r_\sigma$, which satisfies $v_\sigma = r_\sigma + \beta Q_\sigma v_\sigma$.
The value function $v^* \in \mathbb{R}^S$: $v^*(s) = \sup_{\pi \in \Pi_M} v_\pi(s)$, where $\Pi_M$ is the set of Markov plans.
In the end, for a $v^*$-greedy policy $\sigma^*$ we have $v^* = v_{\sigma^*}$.
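3 Review: Operators
$T_\sigma \colon \mathbb{R}^S \to \mathbb{R}^S$, $\sigma \in A^S$: $T_\sigma v = r_\sigma + \beta Q_\sigma v$. $v_\sigma$ is the unique fixed point of $T_\sigma$.
$T \colon \mathbb{R}^S \to \mathbb{R}^S$: $T v = \max_{\sigma \in A^S} (r_\sigma + \beta Q_\sigma v)$.
By definition, $T_\sigma v \le T v$ for any $\sigma$ and $v$.
$\sigma$ is $v$-greedy if $T_\sigma v = T v$.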
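The two operators can be sketched in a few lines of plain Python. This is a minimal illustration, not part of the slides: the 2-state, 2-action MDP (the arrays `R`, `Q` and the discount `BETA`) and the function names are assumptions chosen for the example. Because the maximum over $\sigma \in A^S$ can be taken state by state, `T` maximizes over actions separately in each state.

```python
# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
BETA = 0.9
R = [[1.0, 0.0], [0.0, 2.0]]            # R[s][a]: one-period reward
Q = [[[0.5, 0.5], [1.0, 0.0]],          # Q[s][a][s2]: transition probability
     [[0.0, 1.0], [0.3, 0.7]]]


def action_value(v, s, a):
    """r(s, a) + beta * sum over s2 of q(s2 | s, a) * v(s2)."""
    return R[s][a] + BETA * sum(p * v[s2] for s2, p in enumerate(Q[s][a]))


def T_sigma(sigma, v):
    """Policy operator: state s uses the action sigma[s]."""
    return [action_value(v, s, sigma[s]) for s in range(len(v))]


def T(v):
    """Bellman operator: maximize over actions state by state."""
    return [max(action_value(v, s, a) for a in range(len(R[s])))
            for s in range(len(v))]


def greedy(v):
    """A v-greedy policy: a maximizing action in each state (first one on ties)."""
    return [max(range(len(R[s])), key=lambda a: action_value(v, s, a))
            for s in range(len(v))]


v = [0.0, 0.0]
sigma = greedy(v)
print(sigma, T_sigma(sigma, v), T(v))   # T_sigma v equals T v for a v-greedy sigma
```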
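4 $T_\sigma$ and $T$ are monotone.
$T_\sigma (v + c\mathbf{1}) = T_\sigma v + \beta c \mathbf{1}$ and $T (v + c\mathbf{1}) = T v + \beta c \mathbf{1}$.
$T_\sigma$ and $T$ are $\beta$-contractions.
The unique fixed point of $T_\sigma$ is $v_\sigma$. The unique fixed point of $T$ is $v^*$.
A $v^*$-greedy policy (which exists) is an optimal policy. ($T_\sigma v^* = T v^* = v^* \Rightarrow v^* = v_\sigma$.)
For any $v$, $T_\sigma^n v \to v_\sigma$ and $T^n v \to v^*$ as $n \to \infty$.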
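The contraction property can be checked numerically. The sketch below is only an illustration under assumed data: the small MDP (`R`, `Q`, `BETA`) and the helper names are hypothetical. It applies $T$ to two different starting vectors and prints the sup-norm distance, which should shrink by at least the factor $\beta$ at every step.

```python
# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
BETA = 0.9
R = [[1.0, 0.0], [0.0, 2.0]]            # R[s][a]
Q = [[[0.5, 0.5], [1.0, 0.0]],          # Q[s][a][s2]
     [[0.0, 1.0], [0.3, 0.7]]]


def T(v):
    """Bellman operator: (T v)(s) = max_a [ r(s,a) + beta * sum_s2 q(s2|s,a) v(s2) ]."""
    return [max(R[s][a] + BETA * sum(p * v[s2] for s2, p in enumerate(Q[s][a]))
                for a in range(len(R[s])))
            for s in range(len(v))]


def sup_norm(x, y):
    """Sup-norm distance between two vectors."""
    return max(abs(a - b) for a, b in zip(x, y))


v, w = [0.0, 0.0], [10.0, -5.0]
for n in range(5):
    print(n, sup_norm(v, w))            # distance after n applications of T
    v, w = T(v), T(w)
```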
5 Policy Iteration
1. Set $n = 0$. Choose any $\sigma_0$; or choose any $v_0$ and let $\sigma_0$ be a $v_0$-greedy policy.
2. [Policy evaluation] Solve $(I - \beta Q_{\sigma_n}) x = r_{\sigma_n}$ for $x$ and let $v_{n+1} = x$.
3. [Policy improvement] Compute a $v_{n+1}$-greedy policy $\sigma_{n+1}$, i.e., a $\sigma_{n+1}$ such that $T_{\sigma_{n+1}} v_{n+1} = T v_{n+1}$.
4. If $\sigma_{n+1} = \sigma_n$, then return $\hat\sigma = \sigma_n$ and $\hat v = v_{n+1}$. Otherwise, let $n = n + 1$ and go to Step 2.
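The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the MDP data (`R`, `Q`, `BETA`) and all function names are hypothetical, the linear system is solved with a small hand-written Gaussian elimination to stay within the standard library, and ties in the greedy step are broken by taking the first maximizing action so that the test $\sigma_{n+1} = \sigma_n$ is well defined.

```python
# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
BETA = 0.9
R = [[1.0, 0.0], [0.0, 2.0]]            # R[s][a]
Q = [[[0.5, 0.5], [1.0, 0.0]],          # Q[s][a][s2]
     [[0.0, 1.0], [0.3, 0.7]]]
N = len(R)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def action_value(v, s, a):
    return R[s][a] + BETA * sum(p * v[s2] for s2, p in enumerate(Q[s][a]))


def evaluate(sigma):
    """Step 2: solve (I - beta Q_sigma) x = r_sigma."""
    a = [[(1.0 if s == s2 else 0.0) - BETA * Q[s][sigma[s]][s2] for s2 in range(N)]
         for s in range(N)]
    b = [R[s][sigma[s]] for s in range(N)]
    return solve(a, b)


def greedy(v):
    """Step 3: a v-greedy policy (first maximizing action on ties)."""
    return [max(range(len(R[s])), key=lambda a: action_value(v, s, a))
            for s in range(N)]


def policy_iteration(sigma):
    while True:
        v = evaluate(sigma)             # policy evaluation
        new_sigma = greedy(v)           # policy improvement
        if new_sigma == sigma:          # Step 4: stop when the policy repeats
            return sigma, v
        sigma = new_sigma


sigma_hat, v_hat = policy_iteration([0, 0])
print(sigma_hat, v_hat)
```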
6 Proposition 1 The policy iteration algorithm terminates in finitely many steps, and $\hat\sigma$ is an optimal policy and $\hat v$ is the optimal value.
7 ε-optimality
Let $v^*$ be the value function.
$v$ is a $\delta$-approximation of $v^*$ if $\|v - v^*\| < \delta$.
$\sigma$ is an $\varepsilon$-optimal policy if $v_\sigma$ is an $\varepsilon$-approximation of $v^*$.
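8 Error Bounds 1
Lemma 2 For any $v \in \mathbb{R}^S$, $\|v^* - T v\| \le \frac{\beta}{1-\beta} \|T v - v\|$.
Proof $\|v^* - T v\| \le \|v^* - T^m v\| + \|T^m v - T v\|$, where
second term $\le \sum_{k=1}^{m-1} \|T^{k+1} v - T^k v\| \le \sum_{k=1}^{m-1} \beta^k \|T v - v\| = \frac{\beta - \beta^m}{1-\beta} \|T v - v\|$.
Let $m \to \infty$.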
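9 Lemma 3 For any $v \in \mathbb{R}^S$ and any $Tv$-greedy policy $\sigma$, $\|v_\sigma - T v\| \le \frac{\beta}{1-\beta} \|T v - v\|$.
Proof Denote $u = T v$. Recall that $v_\sigma = T_\sigma v_\sigma$ and $T_\sigma u = T u$. Then,
$\|v_\sigma - u\| = \|T_\sigma v_\sigma - u\| \le \|T_\sigma v_\sigma - T u\| + \|T u - u\| = \|T_\sigma v_\sigma - T_\sigma u\| + \|T u - T v\| \le \beta \|v_\sigma - u\| + \beta \|u - v\|$.
Rearranging terms yields the desired inequality.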
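10 Proposition 4 For any $v \in \mathbb{R}^S$ and any $Tv$-greedy policy $\sigma$, $\|v_\sigma - v^*\| \le \frac{2\beta}{1-\beta} \|T v - v\|$.
Proof By the previous two lemmas,
$\|v_\sigma - v^*\| \le \|v_\sigma - T v\| + \|T v - v^*\| \le \frac{\beta}{1-\beta} \|T v - v\| + \frac{\beta}{1-\beta} \|T v - v\|$.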
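11 Error Bounds 2
For $x \in \mathbb{R}^S$, write $m(x) = \min_i x_i$ and $M(x) = \max_i x_i$.
Lemma 5 For any $v \in \mathbb{R}^S$ and any $v$-greedy policy $\sigma$,
$v + \frac{1}{1-\beta} m(Tv - v)\mathbf{1} \le T v + \frac{\beta}{1-\beta} m(Tv - v)\mathbf{1} \le v_\sigma \le v^* \le T v + \frac{\beta}{1-\beta} M(Tv - v)\mathbf{1} \le v + \frac{1}{1-\beta} M(Tv - v)\mathbf{1}$.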
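12 For $x \in \mathbb{R}^S$, write $\mathrm{span}(x) = M(x) - m(x)$ $(= \max_i x_i - \min_i x_i)$.
Proposition 6 For any $v \in \mathbb{R}^S$ and any $v$-greedy policy $\sigma$,
$\|v^* - v_\sigma\| \le \frac{\beta}{1-\beta}\, \mathrm{span}(Tv - v)$,
and
$\left\| v^* - \left( T v + \frac{\beta}{1-\beta} \cdot \frac{m(Tv - v) + M(Tv - v)}{2} \mathbf{1} \right) \right\| \le \frac{\beta}{2(1-\beta)}\, \mathrm{span}(Tv - v)$.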
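The bracket on $v^*$ given by Lemma 5 and the midpoint estimate of Proposition 6 can be computed from a single application of $T$. The sketch below is an illustration under assumed data: the MDP (`R`, `Q`, `BETA`) and the function names are hypothetical, and the reference value `v_star` is obtained simply by applying $T$ many times.

```python
# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
BETA = 0.9
R = [[1.0, 0.0], [0.0, 2.0]]            # R[s][a]
Q = [[[0.5, 0.5], [1.0, 0.0]],          # Q[s][a][s2]
     [[0.0, 1.0], [0.3, 0.7]]]


def T(v):
    """Bellman operator."""
    return [max(R[s][a] + BETA * sum(p * v[s2] for s2, p in enumerate(Q[s][a]))
                for a in range(len(R[s])))
            for s in range(len(v))]


def span_bounds(v):
    """Return (lower, midpoint, upper): lower <= v* <= upper by Lemma 5,
    and midpoint is the estimate of v* used in Proposition 6."""
    tv = T(v)
    d = [x - y for x, y in zip(tv, v)]
    c = BETA / (1.0 - BETA)
    lower = [x + c * min(d) for x in tv]
    upper = [x + c * max(d) for x in tv]
    mid = [x + c * (min(d) + max(d)) / 2.0 for x in tv]
    return lower, mid, upper


# Reference value: apply T many times (converges since T is a beta-contraction).
v_star = [0.0, 0.0]
for _ in range(3000):
    v_star = T(v_star)

lower, mid, upper = span_bounds([0.0, 0.0])
print(lower, v_star, upper)
```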
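13 Proof of Lemma 5
Take any $v \in \mathbb{R}^S$, and let $\sigma$ be a $v$-greedy policy: $T_\sigma v = T v$. (Recall $m(x) = \min_i x_i$ and $M(x) = \max_i x_i$.)
Clearly, $T_\sigma v = T v \ge v + m(Tv - v)\mathbf{1}$.
By the properties of $T_\sigma$,
$T_\sigma^2 v \ge T_\sigma (v + m(Tv - v)\mathbf{1}) = T_\sigma v + \beta m(Tv - v)\mathbf{1} \ge v + (1 + \beta) m(Tv - v)\mathbf{1}$,
$T_\sigma^3 v \ge T_\sigma (v + (1 + \beta) m(Tv - v)\mathbf{1}) = T_\sigma v + \beta (1 + \beta) m(Tv - v)\mathbf{1} \ge v + (1 + \beta + \beta^2) m(Tv - v)\mathbf{1}$,
and so on.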
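14 We thus have
$T_\sigma^n v \ge T_\sigma v + (\beta + \cdots + \beta^{n-1}) m(Tv - v)\mathbf{1} \ge v + (1 + \beta + \cdots + \beta^{n-1}) m(Tv - v)\mathbf{1}$.
Letting $n \to \infty$, we have
$v_\sigma \ge T_\sigma v + \frac{\beta}{1-\beta} m(Tv - v)\mathbf{1} \ge v + \frac{1}{1-\beta} m(Tv - v)\mathbf{1}$.
Note that $T_\sigma v = T v$.
By a similar procedure, we have
$v^* \le T v + \frac{\beta}{1-\beta} M(Tv - v)\mathbf{1} \le v + \frac{1}{1-\beta} M(Tv - v)\mathbf{1}$.
Note finally that $v_\sigma \le v^*$.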
15 Remarks
Similar estimates with $-\|Tv - v\|$ and $\|Tv - v\|$ in place of $m(Tv - v)$ and $M(Tv - v)$ hold. (Start with $-\|Tv - v\|\mathbf{1} \le Tv - v \le \|Tv - v\|\mathbf{1}$.)
Since $m(x) \ge -\|x\|$ and $M(x) \le \|x\|$, we have $\mathrm{span}(Tv - v) \le 2\|Tv - v\|$.
16 Error Bounds and Termination Conditions
Columns: Bound 1, Bound 2. Rows: Value iteration, Modified policy iteration.
17 Value Iteration with Norm Bounds
Specify $\varepsilon > 0$.
1. Set $n = 0$. Choose any $v_0$.
2. Let $v_{n+1} = T v_n$.
3. If $\|v_{n+1} - v_n\| < \frac{1-\beta}{2\beta} \varepsilon$, then return $\hat v = v_{n+1}$ and a $\hat v$-greedy policy $\hat\sigma$. Otherwise, let $n = n + 1$ and go to Step 2.
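The steps above can be sketched as follows. This is a minimal illustration, not the author's code: the MDP data (`R`, `Q`, `BETA`), the tolerance, and the function names are assumptions made for the example.

```python
# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
BETA = 0.9
R = [[1.0, 0.0], [0.0, 2.0]]            # R[s][a]
Q = [[[0.5, 0.5], [1.0, 0.0]],          # Q[s][a][s2]
     [[0.0, 1.0], [0.3, 0.7]]]


def action_value(v, s, a):
    return R[s][a] + BETA * sum(p * v[s2] for s2, p in enumerate(Q[s][a]))


def T(v):
    """Bellman operator."""
    return [max(action_value(v, s, a) for a in range(len(R[s])))
            for s in range(len(v))]


def greedy(v):
    """A v-greedy policy (first maximizing action on ties)."""
    return [max(range(len(R[s])), key=lambda a: action_value(v, s, a))
            for s in range(len(v))]


def value_iteration(v, eps):
    """Iterate v <- T v until the sup-norm step is below (1 - beta) / (2 beta) * eps."""
    threshold = (1.0 - BETA) / (2.0 * BETA) * eps
    while True:
        v_next = T(v)                                   # Step 2
        step = max(abs(a - b) for a, b in zip(v_next, v))
        if step < threshold:                            # Step 3
            return v_next, greedy(v_next)
        v = v_next


v_hat, sigma_hat = value_iteration([0.0, 0.0], eps=1e-2)
print(v_hat, sigma_hat)
```

With the threshold $\frac{1-\beta}{2\beta}\varepsilon$, Lemma 2 gives $\|v^* - \hat v\| < \varepsilon/2$ and Proposition 4 gives $\|v_{\hat\sigma} - v^*\| < \varepsilon$.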
18 Proposition 7 Given an $\varepsilon > 0$, the value iteration algorithm as described terminates in finitely many steps, and $\hat\sigma$ is an $\varepsilon$-optimal policy and $\hat v$ is an $\frac{\varepsilon}{2}$-approximation of $v^*$.
19 Modified Policy Iteration with Span Seminorm Bounds
Specify $\varepsilon > 0$ and an integer $k \ge 1$.
1. Set $n = 0$. Choose any $v_0$.
2. [Policy improvement] Compute a $v_n$-greedy policy $\sigma_{n+1}$, i.e., a $\sigma_{n+1}$ such that $T_{\sigma_{n+1}} v_n = T v_n$. Compute also $u_n = T v_n$ $(= T_{\sigma_{n+1}} v_n)$.
3. If $\mathrm{span}(u_n - v_n) < \frac{1-\beta}{\beta} \varepsilon$, then return $\hat\sigma = \sigma_{n+1}$ and $\hat v = u_n + \frac{\beta}{1-\beta} \cdot \frac{m(u_n - v_n) + M(u_n - v_n)}{2} \mathbf{1}$. Otherwise, go to the next step.
4. [Partial policy evaluation] Let $v_{n+1} = (T_{\sigma_{n+1}})^k v_n = (T_{\sigma_{n+1}})^{k-1} u_n$. Let $n = n + 1$ and go to Step 2.
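The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the MDP data (`R`, `Q`, `BETA`), the choices of `eps` and `k`, and the function names are hypothetical, and the starting point $v_0 = 0$ is just one convenient choice.

```python
# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
BETA = 0.9
R = [[1.0, 0.0], [0.0, 2.0]]            # R[s][a]
Q = [[[0.5, 0.5], [1.0, 0.0]],          # Q[s][a][s2]
     [[0.0, 1.0], [0.3, 0.7]]]


def action_value(v, s, a):
    return R[s][a] + BETA * sum(p * v[s2] for s2, p in enumerate(Q[s][a]))


def T_sigma(sigma, v):
    """Policy operator for the policy sigma."""
    return [action_value(v, s, sigma[s]) for s in range(len(v))]


def greedy(v):
    """A v-greedy policy (first maximizing action on ties)."""
    return [max(range(len(R[s])), key=lambda a: action_value(v, s, a))
            for s in range(len(v))]


def modified_policy_iteration(v, eps, k):
    threshold = (1.0 - BETA) / BETA * eps
    while True:
        sigma = greedy(v)                       # Step 2: policy improvement
        u = T_sigma(sigma, v)                   # u_n = T v_n since sigma is v-greedy
        d = [a - b for a, b in zip(u, v)]
        if max(d) - min(d) < threshold:         # Step 3: span stopping rule
            shift = BETA / (1.0 - BETA) * (min(d) + max(d)) / 2.0
            return sigma, [x + shift for x in u]
        v = u                                   # Step 4: k applications in total,
        for _ in range(k - 1):                  # one already done, k - 1 more
            v = T_sigma(sigma, v)


sigma_hat, v_hat = modified_policy_iteration([0.0, 0.0], eps=1e-2, k=5)
print(sigma_hat, v_hat)
```

Setting `k = 1` makes Step 4 a single application of $T$, so the iteration reduces to value iteration with the span stopping rule.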
20 Fact 1 For modified policy iteration, as $n \to \infty$, $v_n \to v^*$ and hence $\mathrm{span}(T v_n - v_n) \to 0$.
Proposition 8 Given an $\varepsilon > 0$, the modified policy iteration algorithm as described terminates in finitely many steps, and $\hat\sigma$ is an $\varepsilon$-optimal policy and $\hat v$ is an $\frac{\varepsilon}{2}$-approximation of $v^*$.