
15.093 Optimization Methods Lecture 16: Dynamic Programming

1 Outline

1. The knapsack problem
2. The traveling salesman problem
3. The general DP framework
4. Bellman equation
5. Optimal inventory control
6. Optimal trading
7. Multiplying matrices

2 The Knapsack problem

$$\text{maximize} \sum_{j=1}^{n} c_j x_j \quad \text{subject to} \sum_{j=1}^{n} w_j x_j \le K, \quad x_j \in \{0, 1\}$$

Define

$$C_i(w) = \text{maximize} \sum_{j=1}^{i} c_j x_j \quad \text{subject to} \sum_{j=1}^{i} w_j x_j \le w, \quad x_j \in \{0, 1\}$$

2.1 A DP Algorithm

$C_i(w)$: the maximum value that can be accumulated using some of the first $i$ items, subject to the constraint that the total accumulated weight is at most $w$.

Recursion:

$$C_{i+1}(w) = \max\left\{ C_i(w),\; C_i(w - w_{i+1}) + c_{i+1} \right\}$$

By considering all states of the form $(i, w)$ with $w \le K$, the algorithm has complexity $O(nK)$.
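The recursion translates directly into a short table-filling routine. Below is a minimal sketch; the item values, weights, and capacity are made-up illustration data, and the $(i, w)$ table is compressed to one dimension by sweeping the weight budget downward.

```python
# Sketch of the knapsack recursion C_{i+1}(w) = max(C_i(w), C_i(w - w_{i+1}) + c_{i+1}).
# Item data and capacity are hypothetical illustration values.

def knapsack(c, w, K):
    """Maximum value achievable with total weight at most K (0/1 knapsack)."""
    C = [0] * (K + 1)  # C[v] = best value using items seen so far, budget v
    for ci, wi in zip(c, w):
        # Downward sweep over budgets ensures each item is used at most once.
        for v in range(K, wi - 1, -1):
            C[v] = max(C[v], C[v - wi] + ci)
    return C[K]

print(knapsack(c=[60, 100, 120], w=[1, 2, 3], K=4))  # -> 180 (items 1 and 3)
```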

3 The TSP

$G = (V, A)$ directed graph with $n$ nodes; $c_{ij}$ is the cost of arc $(i, j)$.

Approach: view the choice of a tour as a sequence of choices. We start at node 1; then, at each stage, we choose which node to visit next. After a number of stages, we have visited a subset $S$ of $V$ and we are at a current node $k \in S$.

3.1 A DP algorithm

Let $C(S, k)$ be the minimum cost over all paths that start at node 1, visit all nodes in the set $S$ exactly once, and end up at node $k$. The pair $(S, k)$ is a state; this state can be reached from any state of the form $(S \setminus \{k\}, m)$, with $m \in S \setminus \{k\}$, at a transition cost of $c_{mk}$.

Recursion:

$$C(S, k) = \min_{m \in S \setminus \{k\}} \left[ C(S \setminus \{k\}, m) + c_{mk} \right], \quad k \in S, \qquad C(\{1\}, 1) = 0.$$

The length of an optimal tour is

$$\min_{k} \left[ C(\{1, \dots, n\}, k) + c_{k1} \right].$$

Complexity: $O(2^n n^2)$ operations. (A code sketch of this recursion follows the guidelines below.)

4 Guidelines for constructing DP Algorithms

- View the choice of a feasible solution as a sequence of decisions occurring in stages, so that the total cost is the sum of the costs of individual decisions.
- Define the state as a summary of all relevant past decisions.
- Determine which state transitions are possible. Let the cost of each state transition be the cost of the corresponding decision.
- Write a recursion on the optimal cost from the origin state to a destination state.

The most crucial step is usually the definition of a suitable state.
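The promised sketch of the $C(S, k)$ recursion (Held-Karp) follows, with subsets encoded as bitmasks. Node 1 of the notes is node 0 here, and the 4-node cost matrix is made-up illustration data.

```python
# Held-Karp sketch: C(S, k) = min_{m in S\{k}} [C(S\{k}, m) + cost[m][k]].
from itertools import combinations

def tsp(cost):
    """Minimum tour length starting and ending at node 0, in O(2^n n^2)."""
    n = len(cost)
    C = {}
    for k in range(1, n):                       # base case: paths 0 -> k
        C[(1 | (1 << k), k)] = cost[0][k]
    for size in range(3, n + 1):                # grow the visited set S
        for rest in combinations(range(1, n), size - 1):
            S = 1 | sum(1 << m for m in rest)
            for k in rest:
                # minimize over predecessors m in S \ {0, k}
                C[(S, k)] = min(C[(S ^ (1 << k), m)] + cost[m][k]
                                for m in rest if m != k)
    full = (1 << n) - 1
    return min(C[(full, k)] + cost[k][0] for k in range(1, n))

cost = [[0, 2, 9, 10],
        [1, 0, 6, 4],
        [15, 7, 0, 8],
        [6, 3, 12, 0]]
print(tsp(cost))  # -> 21 (tour 0 -> 2 -> 3 -> 1 -> 0)
```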

5 The general DP framework

Discrete-time dynamic system described by state $x_k$; $k$ indexes time.

$u_k$: control to be selected at time $k$, with $u_k \in U_k(x_k)$.
$w_k$: randomness at time $k$.
$N$: time horizon.

Dynamics: $x_{k+1} = f_k(x_k, u_k, w_k)$.

Cost function, additive over time:

$$E\left[ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) \right]$$

5.1 Inventory Control

$x_k$: stock available at the beginning of the $k$th period.
$u_k$: stock ordered at the beginning of the $k$th period.
$w_k$: demand during the $k$th period, with given probability distribution.

Excess demand is backlogged and filled as soon as additional inventory is available.

Dynamics: $x_{k+1} = x_k + u_k - w_k$.

Cost:

$$E\left[ R(x_N) + \sum_{k=0}^{N-1} \left( r(x_k) + c\, u_k \right) \right]$$

6 The DP Algorithm

Define $J_k(x_k)$ to be the expected optimal cost starting from stage $k$ at state $x_k$.

Bellman's principle of optimality:

$$J_N(x_N) = g_N(x_N)$$
$$J_k(x_k) = \min_{u_k \in U_k(x_k)} E_{w_k}\left[ g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k)) \right]$$

The optimal expected cost for the overall problem is $J_0(x_0)$.
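The Bellman recursion can be run backward numerically once states, controls, and randomness are discretized. Below is a minimal backward-induction sketch on the inventory problem of Section 5.1; the cost functions, demand distribution, and state grid are all made-up illustration choices, not values from the lecture.

```python
# Backward induction J_k(x) = min_u E_w[ r(x) + c*u + J_{k+1}(x + u - w) ].
# All numeric choices below are hypothetical illustration data.

N = 3                                   # horizon
STATES = range(-5, 11)                  # stock levels (negative = backlog)
ORDERS = range(0, 6)                    # admissible order quantities U_k(x)
DEMAND = {0: 0.2, 1: 0.5, 2: 0.3}       # P(w_k = d)
c = 1.0                                 # per-unit ordering cost
r = lambda x: 2.0 * max(0, -x) + 0.5 * max(0, x)  # shortage + holding cost
R = lambda x: r(x)                      # terminal cost g_N
clamp = lambda x: max(min(x, 10), -5)   # keep the discretized state on the grid

J = {x: R(x) for x in STATES}           # J_N
policy = []
for k in reversed(range(N)):
    Jk, mu = {}, {}
    for x in STATES:
        expected = lambda u: sum(p * (r(x) + c * u + J[clamp(x + u - w)])
                                 for w, p in DEMAND.items())
        mu[x] = min(ORDERS, key=expected)   # minimizing control u_k(x)
        Jk[x] = expected(mu[x])
    J, policy = Jk, [mu] + policy

print(J[0], policy[0][0])  # optimal expected cost and first order at x_0 = 0
```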

7 Inventory Control

If $r(x_k) = a x_k^2$ and $w_k \sim N(\mu_k, \sigma_k^2)$, then

$$u_k^* = c_k x_k + d_k, \qquad J_k(x_k) = b_k x_k^2 + f_k x_k + e_k.$$

If $r(x_k) = p \max(0, -x_k) + h \max(0, x_k)$, then there exist thresholds $S_k$ such that

$$u_k^* = \begin{cases} S_k - x_k & \text{if } x_k < S_k \\ 0 & \text{if } x_k \ge S_k \end{cases}$$

8 Optimal trading

$\bar{S}$ shares of a stock are to be bought within a horizon $T$; $t = 1, 2, \dots, T$ are discrete trading periods.

Control: $S_t$, the number of shares acquired in period $t$ at price $P_t$, $t = 1, 2, \dots, T$.

Objective:

$$\min E\left[ \sum_{t=1}^{T} P_t S_t \right] \quad \text{s.t.} \quad \sum_{t=1}^{T} S_t = \bar{S}$$

Dynamics:

$$P_t = P_{t-1} + \theta S_t + \epsilon_t,$$

where $\theta > 0$ and $\epsilon_t \sim N(0, 1)$.

8.1 DP ingredients

State: $(P_{t-1}, W_t)$, where $P_{t-1}$ is the price realized in the previous period and $W_t$ is the number of shares remaining to be purchased.

Control: $S_t$, the number of shares purchased at time $t$.

Randomness: $\epsilon_t$.

Objective: $\min E\left[ \sum_{t=1}^{T} P_t S_t \right]$.

Dynamics:

$$P_t = P_{t-1} + \theta S_t + \epsilon_t, \qquad W_t = W_{t-1} - S_{t-1}, \quad W_1 = \bar{S}, \quad W_{T+1} = 0.$$

Note that $W_{T+1} = 0$ is equivalent to the constraint that $\bar{S}$ must be executed by period $T$.

8.2 The Bellman Equation

$$J_t(P_{t-1}, W_t) = \min_{S_t} E_t\left[ P_t S_t + J_{t+1}(P_t, W_{t+1}) \right]$$

Since $W_{T+1} = 0 \Rightarrow S_T = W_T$:

$$J_T(P_{T-1}, W_T) = \min_{S_T} E_T\left[ P_T W_T \right] = (P_{T-1} + \theta W_T) W_T$$

8.3 Solution

$$J_{T-1}(P_{T-2}, W_{T-1}) = \min_{S_{T-1}} E_{T-1}\left[ P_{T-1} S_{T-1} + J_T(P_{T-1}, W_T) \right]$$
$$= \min_{S_{T-1}} E_{T-1}\left[ (P_{T-2} + \theta S_{T-1} + \epsilon_{T-1}) S_{T-1} + J_T(P_{T-2} + \theta S_{T-1} + \epsilon_{T-1},\; W_{T-1} - S_{T-1}) \right]$$

$$S_{T-1}^* = \frac{W_{T-1}}{2}, \qquad J_{T-1}(P_{T-2}, W_{T-1}) = W_{T-1}\left( P_{T-2} + \frac{3}{4} \theta W_{T-1} \right)$$

Continuing in this fashion,

$$S_{T-k}^* = \frac{W_{T-k}}{k+1}, \qquad J_{T-k}(P_{T-k-1}, W_{T-k}) = W_{T-k}\left( P_{T-k-1} + \frac{k+2}{2(k+1)}\, \theta W_{T-k} \right)$$

$$S_1^* = \frac{\bar{S}}{T}, \qquad J_1(P_0, W_1) = P_0 \bar{S} + \frac{\theta \bar{S}^2}{2}\left( 1 + \frac{1}{T} \right)$$

$$S_1^* = S_2^* = \dots = S_T^* = \frac{\bar{S}}{T}$$
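A quick numerical sanity check of this closed form: rolling the rule $S_{T-k} = W_{T-k}/(k+1)$ forward produces equal slices $\bar{S}/T$, and the accumulated expected cost matches $J_1$. The values of $\theta$, $\bar{S}$, $P_0$, and $T$ below are hypothetical illustration data.

```python
# Forward simulation of the optimal schedule (noise has mean zero, so we
# track the expected price path only). Parameter values are made up.
theta, S_bar, P_0, T = 0.1, 1000.0, 50.0, 5

W, P, expected_cost = S_bar, P_0, 0.0
for t in range(1, T + 1):
    k = T - t                      # periods remaining after t
    S = W / (k + 1)                # optimal trade S_{T-k} = W_{T-k}/(k+1)
    P += theta * S                 # E[P_t] = P_{t-1} + theta * S_t
    expected_cost += P * S
    W -= S
    print(f"t={t}: trade {S:.1f} shares")  # every trade is S_bar/T = 200

print(expected_cost)                                      # 110000.0
print(P_0 * S_bar + theta * S_bar**2 / 2 * (1 + 1 / T))   # 110000.0
```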

8.4 Different Dynamics

$$P_t = P_{t-1} + \theta S_t + \gamma X_t + \epsilon_t, \quad \theta > 0,$$
$$X_t = \rho X_{t-1} + \eta_t, \quad X_1 = \eta_1, \quad \rho \in (-1, 1),$$

where $\epsilon_t \sim N(0, \sigma_\epsilon^2)$ and $\eta_t \sim N(0, \sigma_\eta^2)$.

8.5 Solution

$$S_{T-k}^* = \frac{W_{T-k}}{k+1} + \frac{\gamma \rho\, b_{k-1}}{2 \theta\, a_{k-1}}\, X_{T-k}$$

$$J_{T-k}(P_{T-k-1}, X_{T-k}, W_{T-k}) = P_{T-k-1} W_{T-k} + \theta a_k W_{T-k}^2 + \gamma b_k X_{T-k} W_{T-k} + c_k X_{T-k}^2 + d_k$$

for $k = 0, 1, \dots, T-1$, where

$$a_k = \frac{1}{2}\left( 1 + \frac{1}{k+1} \right), \quad a_0 = 1;$$
$$b_k = 1 + \frac{\rho\, b_{k-1}}{2 a_{k-1}}, \quad b_0 = 1;$$
$$c_k = \rho^2 \left( c_{k-1} - \frac{\gamma^2 b_{k-1}^2}{4 \theta\, a_{k-1}} \right), \quad c_0 = 0;$$
$$d_k = d_{k-1} + \sigma_\eta^2\, c_{k-1}, \quad d_0 = 0.$$
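To see how the serially correlated information variable $X_t$ tilts the schedule, the sketch below evaluates the coefficient recursions and the resulting trading rule; the values of $\theta$, $\gamma$, $\rho$, $\sigma_\eta$, and $T$ are hypothetical illustration data. With $\rho = 0$ the information term vanishes and the policy reduces to $S_{T-k} = W_{T-k}/(k+1)$ from Section 8.3.

```python
# Evaluate the Section 8.5 coefficient recursions (parameter values made up).
theta, gamma, rho, sigma_eta, T = 0.1, 0.05, 0.5, 1.0, 5

a, b, c, d = [1.0], [1.0], [0.0], [0.0]
for k in range(1, T):
    a.append(0.5 * (1 + 1 / (k + 1)))
    b.append(1 + rho * b[k - 1] / (2 * a[k - 1]))
    c.append(rho**2 * (c[k - 1] - gamma**2 * b[k - 1]**2 / (4 * theta * a[k - 1])))
    d.append(d[k - 1] + sigma_eta**2 * c[k - 1])

def trade(k, W, X):
    """Optimal S_{T-k} = W/(k+1) + gamma*rho*b_{k-1}/(2*theta*a_{k-1}) * X."""
    return W / (k + 1) + gamma * rho * b[k - 1] / (2 * theta * a[k - 1]) * X

# First-period trade (k = T-1) with W_1 = 1000 remaining and signal X_1 = 2:
print(trade(k=T - 1, W=1000.0, X=2.0))
```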

9 Matrix multiplication

Matrices: $M_k$ of dimension $n_k \times n_{k+1}$.

Objective: compute $M_1 M_2 \cdots M_N$ with the fewest scalar multiplications.

Example: $M_1 M_2 M_3$ with $M_1: 1 \times 10$, $M_2: 10 \times 1$, $M_3: 1 \times 10$. Computing $M_1 (M_2 M_3)$ takes 200 multiplications; $(M_1 M_2) M_3$ takes 20 multiplications. What is the optimal order for performing the multiplication?

Let $m(i, j)$ be the optimal number of scalar multiplications for multiplying $M_i \cdots M_j$; $m(i, i) = 0$. For $i < j$:

$$m(i, j) = \min_{i \le k < j} \left( m(i, k) + m(k+1, j) + n_i\, n_{k+1}\, n_{j+1} \right)$$
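A minimal sketch of this recursion, filling the table by increasing chain length and kept 1-indexed to match the notes; the dimension list encodes the lecture's example $(n_1, n_2, n_3, n_4) = (1, 10, 1, 10)$.

```python
# Matrix-chain ordering: m(i,j) = min_k (m(i,k) + m(k+1,j) + n_i*n_{k+1}*n_{j+1}).

def matrix_chain(n):
    """n[i-1] x n[i] is the shape of M_i; returns m(1, N) for N = len(n) - 1."""
    N = len(n) - 1
    m = [[0] * (N + 1) for _ in range(N + 1)]
    for length in range(2, N + 1):           # chain length j - i + 1
        for i in range(1, N - length + 2):
            j = i + length - 1
            m[i][j] = min(m[i][k] + m[k + 1][j] + n[i - 1] * n[k] * n[j]
                          for k in range(i, j))
    return m[1][N]

print(matrix_chain([1, 10, 1, 10]))  # -> 20, achieved by (M_1 M_2) M_3
```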

MIT OpenCourseWare
http://ocw.mit.edu

15.093J / 6.255J Optimization Methods
Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.