Optimization Methods. Lecture 16: Dynamic Programming


Robert Potter
5 years ago


1 Outline

1. The knapsack problem
2. The traveling salesman problem
3. The general DP framework
4. Bellman equation
5. Optimal inventory control
6. Optimal trading
7. Multiplying matrices

2 The Knapsack problem

\[
\begin{aligned}
\text{maximize} \quad & \sum_{j=1}^{n} c_j x_j \\
\text{subject to} \quad & \sum_{j=1}^{n} w_j x_j \le K, \\
& x_j \in \{0, 1\}.
\end{aligned}
\]

Define

\[
\begin{aligned}
C_i(w) = \text{maximize} \quad & \sum_{j=1}^{i} c_j x_j \\
\text{subject to} \quad & \sum_{j=1}^{i} w_j x_j \le w, \\
& x_j \in \{0, 1\}.
\end{aligned}
\]

2.1 A DP Algorithm

$C_i(w)$: the maximum value that can be accumulated using some of the first $i$ items, subject to the constraint that the total accumulated weight is at most $w$.

Recursion:

\[
C_{i+1}(w) = \max\bigl\{\, C_i(w),\; C_i(w - w_{i+1}) + c_{i+1} \,\bigr\},
\]

where the second term is dropped when $w_{i+1} > w$, and $C_0(w) = 0$.

By considering all states of the form $(i, w)$ with $w \le K$ (integer weights), the algorithm has complexity $O(nK)$.
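The knapsack recursion can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: it assumes non-negative integer weights, reads $C_i(w)$ as the best value with total weight at most $w$, and the function name `knapsack` and the traceback step are additions made here for the example.

```python
def knapsack(values, weights, capacity):
    """Return (best value, chosen item indices) for the 0/1 knapsack problem.

    Assumes integer weights and an integer capacity (hypothetical helper).
    """
    n = len(values)
    # C[i][w]: best value using some of the first i items with total weight <= w
    C = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(n):
        for w in range(capacity + 1):
            C[i + 1][w] = C[i][w]                 # item i+1 is left out
            if weights[i] <= w:                   # item i+1 is taken, if it fits
                C[i + 1][w] = max(C[i + 1][w],
                                  C[i][w - weights[i]] + values[i])
    # Trace back through the table to recover one optimal selection
    chosen, w = [], capacity
    for i in range(n, 0, -1):
        if C[i][w] != C[i - 1][w]:
            chosen.append(i - 1)
            w -= weights[i - 1]
    return C[n][capacity], sorted(chosen)


best, items = knapsack([60, 100, 120], [10, 20, 30], 50)
print(best, items)
```

The two nested loops visit every state $(i, w)$ with $w \le K$ once, which is where the $O(nK)$ bound comes from.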

3 The TSP

$G = (V, A)$: directed graph with $n$ nodes.
$c_{ij}$: cost of arc $(i, j)$.

Approach: view the choice of a tour as a sequence of choices.
We start at node 1; then, at each stage, we choose which node to visit next.
After a number of stages, we have visited a subset $S$ of $V$ and we are at a current node $k \in S$.

3.1 A DP algorithm

Let $C(S, k)$ be the minimum cost over all paths that start at node 1, visit all nodes in the set $S$ exactly once, and end up at node $k$.

$(S, k)$ is a state; this state can be reached from any state of the form $(S \setminus \{k\}, m)$, with $m \in S \setminus \{k\}$, at a transition cost of $c_{mk}$.

Recursion:

\[
C(S, k) = \min_{m \in S \setminus \{k\}} \bigl( C(S \setminus \{k\}, m) + c_{mk} \bigr), \quad k \in S,
\qquad C(\{1\}, 1) = 0.
\]

Length of an optimal tour:

\[
\min_{k} \bigl( C(\{1, \ldots, n\}, k) + c_{k1} \bigr).
\]

Complexity: $O(n^2 2^n)$ operations.

4 Guidelines for constructing DP Algorithms

View the choice of a feasible solution as a sequence of decisions occurring in stages, so that the total cost is the sum of the costs of the individual decisions.
Define the state as a summary of all relevant past decisions.
Determine which state transitions are possible. Let the cost of each state transition be the cost of the corresponding decision.
Write a recursion on the optimal cost from the origin state to a destination state.

The most crucial step is usually the definition of a suitable state.
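As an illustration of how the guidelines play out for the TSP, the recursion of Section 3.1 can be sketched as follows. This is a sketch under stated assumptions rather than the lecture's own code: nodes are numbered from 0 (so node 0 plays the role of node 1), subsets $S$ are stored as bitmasks, and the name `held_karp` is chosen here for the example.

```python
from itertools import combinations


def held_karp(c):
    """Length of an optimal tour; c[i][j] is the cost of arc (i, j).

    Node 0 is the start node (node 1 in the notes).
    """
    n = len(c)
    if n == 1:
        return 0
    # C[(S, k)]: minimum cost of a path that starts at node 0, visits every
    # node of S exactly once, and ends at k. S is a bitmask containing node 0.
    C = {(1, 0): 0}
    for size in range(2, n + 1):
        for rest in combinations(range(1, n), size - 1):
            S = 1
            for v in rest:
                S |= 1 << v
            for k in rest:
                prev = S & ~(1 << k)              # the set S \ {k}
                C[(S, k)] = min(C[(prev, m)] + c[m][k]
                                for m in range(n) if (prev, m) in C)
    full = (1 << n) - 1
    # Close the tour by returning from the last node k to the start node
    return min(C[(full, k)] + c[k][0] for k in range(1, n))


costs = [[0, 10, 15, 20],
         [10, 0, 35, 25],
         [15, 35, 0, 30],
         [20, 25, 30, 0]]
print(held_karp(costs))
```

There are at most $n 2^n$ states and each is evaluated with a minimum over at most $n$ predecessors, which matches the $O(n^2 2^n)$ operation count.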

5 The general DP framework

Discrete-time dynamic system described by state $x_k$, where $k$ indexes time.
$u_k$: control to be selected at time $k$, with $u_k \in U_k(x_k)$.
$w_k$: randomness at time $k$.
$N$: time horizon.

Dynamics:

\[
x_{k+1} = f_k(x_k, u_k, w_k).
\]

Cost function, additive over time:

\[
E\Bigl[\, g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) \Bigr].
\]

5.1 Inventory Control

$x_k$: stock available at the beginning of the $k$th period.
$u_k$: stock ordered at the beginning of the $k$th period.
$w_k$: demand during the $k$th period, with given probability distribution.

Excess demand is backlogged and filled as soon as additional inventory is available.

Dynamics:

\[
x_{k+1} = x_k + u_k - w_k.
\]

Cost:

\[
E\Bigl[\, R(x_N) + \sum_{k=0}^{N-1} \bigl( r(x_k) + c\,u_k \bigr) \Bigr].
\]

6 The DP Algorithm

Define $J_k(x_k)$ to be the expected optimal cost starting from stage $k$ at state $x_k$.

Bellman's principle of optimality:

\[
\begin{aligned}
J_N(x_N) &= g_N(x_N), \\
J_k(x_k) &= \min_{u_k \in U_k(x_k)} E_{w_k}\bigl[\, g_k(x_k, u_k, w_k) + J_{k+1}\bigl(f_k(x_k, u_k, w_k)\bigr) \bigr].
\end{aligned}
\]

Optimal expected cost for the overall problem: $J_0(x_0)$.
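The backward recursion can be made concrete on the inventory model. The sketch below is illustrative only and rests on assumptions not in the notes: stock levels and orders are integers, demand has a finite distribution given as a dictionary, and the state is clipped to a finite grid `[x_min, x_max]` so that the tables stay finite. The function name `inventory_dp` and the grid parameters are hypothetical.

```python
def inventory_dp(N, demand, c, r, R, x_min, x_max, u_max):
    """Backward induction for the inventory problem.

    demand: dict {demand value: probability}; r, R: stage and terminal costs.
    Returns (J_0 as a dict over states, list of policies mu_0, ..., mu_{N-1}).
    Assumption: next states are clipped to [x_min, x_max].
    """
    states = range(x_min, x_max + 1)
    J = {x: R(x) for x in states}                 # J_N(x) = R(x)
    policy = []
    for k in range(N - 1, -1, -1):
        J_k, mu_k = {}, {}
        for x in states:
            best_cost, best_u = float("inf"), None
            for u in range(u_max + 1):
                cost = r(x) + c * u               # stage cost r(x_k) + c u_k
                for w, p in demand.items():       # expectation over demand w_k
                    nxt = min(max(x + u - w, x_min), x_max)
                    cost += p * J[nxt]
                if cost < best_cost:
                    best_cost, best_u = cost, u
            J_k[x], mu_k[x] = best_cost, best_u
        J = J_k
        policy.insert(0, mu_k)
    return J, policy


# Example with an assumed shortage cost 3 and holding cost 1 per unit
cost = lambda x: 3 * max(0, -x) + 1 * max(0, x)
J0, mu = inventory_dp(N=1, demand={1: 1.0}, c=1, r=cost, R=cost,
                      x_min=-5, x_max=5, u_max=5)
print(J0[0], mu[0][0])
```

Each stage costs one pass over all (state, control, demand) triples, so the work grows linearly in the horizon $N$.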

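7 Inventory Control

If $r(x_k) = a x_k^2$ and $w_k \sim \mathcal{N}(\mu_k, \sigma_k^2)$, then the optimal order is linear in the stock and the cost-to-go is quadratic:

\[
u_k^* = c_k x_k + d_k, \qquad J_k(x_k) = b_k x_k^2 + f_k x_k + e_k.
\]

If $r(x_k) = p \max(0, -x_k) + h \max(0, x_k)$, then there exist thresholds $S_k$ such that

\[
u_k^* =
\begin{cases}
S_k - x_k & \text{if } x_k < S_k, \\
0 & \text{if } x_k \ge S_k,
\end{cases}
\]

that is, it is optimal to order up to the level $S_k$ whenever the stock is below it.

8 Optimal trading

$S$ shares of a stock are to be bought within a horizon $T$.
$t = 1, 2, \ldots, T$: discrete trading periods.
Control: $S_t$, the number of shares acquired in period $t$ at price $P_t$, $t = 1, 2, \ldots, T$.

Objective:

\[
\min \; E\Bigl[ \sum_{t=1}^{T} P_t S_t \Bigr]
\quad \text{s.t.} \quad \sum_{t=1}^{T} S_t = S.
\]

Dynamics:

\[
P_t = P_{t-1} + \theta S_t + \varepsilon_t,
\]

where $\theta > 0$ and $\varepsilon_t \sim \mathcal{N}(0, 1)$.

8.1 DP ingredients

State: $(P_{t-1}, W_t)$, where
$P_{t-1}$ is the price realized at the previous period and
$W_t$ is the number of shares remaining to be purchased.

Control: $S_t$, the number of shares purchased at time $t$.

Randomness: $\varepsilon_t$.

Objective: $\min E\bigl[ \sum_{t=1}^{T} P_t S_t \bigr]$.

Dynamics:

\[
\begin{aligned}
P_t &= P_{t-1} + \theta S_t + \varepsilon_t, \\
W_t &= W_{t-1} - S_{t-1}, \qquad W_1 = S, \quad W_{T+1} = 0.
\end{aligned}
\]

Note that $W_{T+1} = 0$ is equivalent to the constraint that $S$ must be executed by period $T$.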

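8.2 The Bellman Equation

\[
J_t(P_{t-1}, W_t) = \min_{S_t} E_t\bigl[\, P_t S_t + J_{t+1}(P_t, W_{t+1}) \bigr].
\]

Since $W_{T+1} = 0 \Rightarrow S_T^* = W_T$,

\[
J_T(P_{T-1}, W_T) = \min_{S_T} E_T[P_T W_T] = (P_{T-1} + \theta W_T)\, W_T.
\]

8.3 Solution

\[
\begin{aligned}
J_{T-1}(P_{T-2}, W_{T-1})
&= \min_{S_{T-1}} E_{T-1}\bigl[\, P_{T-1} S_{T-1} + J_T(P_{T-1}, W_T) \bigr] \\
&= \min_{S_{T-1}} E_{T-1}\bigl[\, (P_{T-2} + \theta S_{T-1} + \varepsilon_{T-1})\, S_{T-1} \\
&\qquad\qquad + J_T\bigl(P_{T-2} + \theta S_{T-1} + \varepsilon_{T-1},\; W_{T-1} - S_{T-1}\bigr) \bigr].
\end{aligned}
\]

\[
S_{T-1}^* = \frac{W_{T-1}}{2}, \qquad
J_{T-1}(P_{T-2}, W_{T-1}) = W_{T-1}\Bigl(P_{T-2} + \tfrac{3}{4}\,\theta\, W_{T-1}\Bigr).
\]

Continuing in this fashion,

\[
S_{T-k}^* = \frac{W_{T-k}}{k+1}, \qquad
J_{T-k}(P_{T-k-1}, W_{T-k}) = W_{T-k}\Bigl(P_{T-k-1} + \frac{k+2}{2(k+1)}\,\theta\, W_{T-k}\Bigr).
\]

In particular,

\[
S_1^* = \frac{S}{T}, \qquad
J_1(P_0, W_1) = P_0 S + \frac{\theta S^2}{2}\Bigl(1 + \frac{1}{T}\Bigr),
\]

\[
S_1^* = S_2^* = \cdots = S_T^* = \frac{S}{T}.
\]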
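The closed-form solution can be checked numerically by running the backward recursion on the quadratic coefficient alone. The sketch below is an illustration under assumptions: the linear price-impact coefficient is called `theta` here, the cost-to-go is written as $J_{T-k}(P, W) = W (P + a_k W)$, and the update for $a_k$ comes from minimising $(P + \theta s) W + a_{k-1} (W - s)^2$ over $s$. The function name `trading_schedule` is hypothetical.

```python
from fractions import Fraction


def trading_schedule(S, T, theta):
    """Backward recursion for J_{T-k}(P, W) = W * (P + a_k * W).

    Returns (list of optimal purchases S_1..S_T, coefficient a_{T-1}).
    Exact rational arithmetic is used so the result can be compared exactly.
    """
    theta = Fraction(theta)
    a = theta                      # a_0, since J_T = W (P + theta W)
    share = [Fraction(1)]          # in period T, buy everything that remains
    for k in range(1, T):
        # First-order condition: W - s = theta W / (2 a_{k-1})
        share.append(1 - theta / (2 * a))
        a = theta - theta ** 2 / (4 * a)
    share.reverse()                # share[t-1] is the fraction bought in period t
    W, schedule = Fraction(S), []
    for f in share:                # roll forward from W_1 = S
        s = f * W
        schedule.append(s)
        W -= s
    return schedule, a


schedule, a = trading_schedule(S=100, T=4, theta=2)
print([int(s) for s in schedule], a)
```

With these inputs every period buys the same amount $S/T$, and the final coefficient equals $\theta (T+1) / (2T)$, in agreement with the expression for $J_1(P_0, W_1)$.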

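8.4 Different Dynamics

\[
\begin{aligned}
P_t &= P_{t-1} + \theta S_t + \gamma X_t + \varepsilon_t, \qquad \theta > 0, \\
X_t &= \rho X_{t-1} + \eta_t, \qquad X_1 = 1, \quad \rho \in (-1, 1),
\end{aligned}
\]

where $\varepsilon_t \sim \mathcal{N}(0, \sigma_\varepsilon^2)$ and $\eta_t \sim \mathcal{N}(0, \sigma_\eta^2)$.

8.5 Solution

\[
S_{T-k}^* = \frac{W_{T-k}}{k+1} + \frac{\rho\, b_{k-1}}{2 a_{k-1}}\, X_{T-k},
\]

\[
J_{T-k}(P_{T-k-1}, X_{T-k}, W_{T-k}) = P_{T-k-1} W_{T-k} + a_k W_{T-k}^2 + b_k X_{T-k} W_{T-k} + c_k X_{T-k}^2 + d_k,
\]

for $k = 0, 1, \ldots, T-1$, where:

\[
\begin{aligned}
a_k &= \frac{\theta}{2}\Bigl(1 + \frac{1}{k+1}\Bigr), & a_0 &= \theta, \\
b_k &= \gamma + \frac{\theta \rho\, b_{k-1}}{2 a_{k-1}}, & b_0 &= \gamma, \\
c_k &= \rho^2 c_{k-1} - \frac{\rho^2 b_{k-1}^2}{4 a_{k-1}}, & c_0 &= 0, \\
d_k &= d_{k-1} + c_{k-1} \sigma_\eta^2, & d_0 &= 0.
\end{aligned}
\]

9 Matrix multiplication

Matrices: $M_k$ of dimension $n_k \times n_{k+1}$.

Objective: find $M_1 \cdot M_2 \cdots M_N$.

Example: $M_1 M_2 M_3$, with $M_1$: $1 \times 10$, $M_2$: $10 \times 1$, $M_3$: $1 \times 10$.
$M_1 (M_2 M_3)$ takes 200 multiplications; $(M_1 M_2) M_3$ takes 20 multiplications.

What is the optimal order for performing the multiplication?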

$m(i, j)$: optimal number of scalar multiplications for multiplying $M_i \cdots M_j$.

\[
m(i, i) = 0.
\]

For $i < j$:

\[
m(i, j) = \min_{i \le k < j} \bigl( m(i, k) + m(k+1, j) + n_i\, n_{k+1}\, n_{j+1} \bigr).
\]
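The recursion for $m(i, j)$ translates directly into a table filled in order of increasing chain length. The following is a minimal sketch, not the lecture's code: the dimensions are passed as a list `dims` with `dims[k-1]` standing for $n_k$, and the function name `matrix_chain` and the parenthesization string are additions made for the example.

```python
def matrix_chain(dims):
    """Optimal multiplication order for M_1 ... M_N.

    M_k has shape dims[k-1] x dims[k], i.e. dims[k-1] plays the role of n_k.
    Returns (minimum number of scalar multiplications, parenthesization).
    """
    N = len(dims) - 1
    m = [[0] * (N + 1) for _ in range(N + 1)]      # m[i][i] = 0
    split = [[0] * (N + 1) for _ in range(N + 1)]
    for length in range(2, N + 1):                 # chain length j - i + 1
        for i in range(1, N - length + 2):
            j = i + length - 1
            # n_i * n_{k+1} * n_{j+1} is dims[i-1] * dims[k] * dims[j]
            m[i][j], split[i][j] = min(
                (m[i][k] + m[k + 1][j] + dims[i - 1] * dims[k] * dims[j], k)
                for k in range(i, j)
            )

    def paren(i, j):
        if i == j:
            return f"M{i}"
        k = split[i][j]
        return f"({paren(i, k)} {paren(k + 1, j)})"

    return m[1][N], paren(1, N)


print(matrix_chain([1, 10, 1, 10]))
```

For the three-matrix example with shapes $1 \times 10$, $10 \times 1$, $1 \times 10$, the table recovers the order $(M_1 M_2) M_3$ with 20 multiplications. There are $O(N^2)$ entries and each takes $O(N)$ work, so the table costs $O(N^3)$ operations.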

MIT OpenCourseWare
15.093J / 6.255J Optimization Methods, Fall 2009
