Dynamic Programming and Optimal Control Volume 1


Dynamic Programming and Optimal Control, Volume 1, SECOND EDITION

Dimitri P. Bertsekas, Massachusetts Institute of Technology

Selected Theoretical Problem Solutions

Athena Scientific, Belmont, MA, 2000

WWW site for book information and orders

NOTE

This solution set is meant to be a significant extension of the scope and coverage of the book. Solutions to all of the book's exercises marked with the symbol www have been included. The solutions are continuously updated and improved, and additional material, including new problems and their solutions, is being added. The solutions may be reproduced and distributed for personal or educational uses. Please send comments and suggestions for additions and improvements to the author at bertsekas@lids.mit.edu or dimitrib@mit.edu.

Vol. I, Chapter 1

www

We first show the result given in the hint. For all $\mu \in M$ we have
$$\max_{w \in W}\bigl\{G_0(w) + G_1\bigl(f(w), \mu(f(w))\bigr)\bigr\} \ge \max_{w \in W}\Bigl\{G_0(w) + \min_{u \in U} G_1\bigl(f(w), u\bigr)\Bigr\},$$
and, taking the minimum over $\mu \in M$, we obtain
$$\min_{\mu \in M}\max_{w \in W}\bigl\{G_0(w) + G_1\bigl(f(w), \mu(f(w))\bigr)\bigr\} \ge \max_{w \in W}\Bigl\{G_0(w) + \min_{u \in U} G_1\bigl(f(w), u\bigr)\Bigr\}. \qquad (*)$$
We must therefore show the reverse inequality. For any $\epsilon > 0$, let $\mu_\epsilon \in M$ be such that
$$G_1\bigl(f(w), \mu_\epsilon(f(w))\bigr) \le \min_{u \in U} G_1\bigl(f(w), u\bigr) + \epsilon, \qquad \forall\, w \in W.$$
(Such a $\mu_\epsilon$ exists because of the assumption that $\min_{u \in U} G_1(f(w), u) > -\infty$.) Then
$$\min_{\mu \in M}\max_{w \in W}\bigl\{G_0(w) + G_1\bigl(f(w), \mu(f(w))\bigr)\bigr\} \le \max_{w \in W}\bigl\{G_0(w) + G_1\bigl(f(w), \mu_\epsilon(f(w))\bigr)\bigr\} \le \max_{w \in W}\Bigl\{G_0(w) + \min_{u \in U} G_1\bigl(f(w), u\bigr)\Bigr\} + \epsilon.$$
Since $\epsilon > 0$ can be taken arbitrarily small, we obtain the reverse inequality to $(*)$, and thus the desired result.

To see how this result can fail without the condition $\min_{u \in U} G_1(f(w), u) > -\infty$ for all $w$, let $\Re$ be the real line, let $u$ be a real number, let $w = (w_1, w_2)$ be a two-dimensional vector, let there be no constraints on $u$ and $w$ ($U = \Re$, $W = \Re \times \Re$), and let $G_0(w) = w_1$, $f(w) = w_2$, and $G_1(f(w), u) = f(w) + u$. Then for every $\mu \in M$,
$$\max_{w \in W}\bigl\{G_0(w) + G_1\bigl(f(w), \mu(f(w))\bigr)\bigr\} = \max_{w_1 \in \Re,\, w_2 \in \Re}\bigl\{w_1 + w_2 + \mu(w_2)\bigr\} = \infty,$$
so that
$$\min_{\mu \in M}\max_{w \in W}\bigl\{G_0(w) + G_1\bigl(f(w), \mu(f(w))\bigr)\bigr\} = \infty.$$
On the other hand,
$$\max_{w \in W}\Bigl\{G_0(w) + \min_{u \in U} G_1\bigl(f(w), u\bigr)\Bigr\} = \max_{w_1 \in \Re,\, w_2 \in \Re}\Bigl\{w_1 + \min_{u \in \Re}\{w_2 + u\}\Bigr\} = -\infty,$$
since $\min_{u \in \Re}\{w_2 + u\} = -\infty$ for all $w_2$.

We now turn to showing the DP algorithm. We have
$$J^*(x_0) = \min_{\mu_0, \ldots, \mu_{N-1}}\ \max_{\substack{w_k \in W(x_k, \mu_k(x_k)) \\ k = 0, \ldots, N-1}}\Biggl[\sum_{k=0}^{N-1} g_k\bigl(x_k, \mu_k(x_k), w_k\bigr) + g_N(x_N)\Biggr]$$
$$= \min_{\mu_0, \ldots, \mu_{N-2}}\ \max_{\substack{w_k \in W(x_k, \mu_k(x_k)) \\ k = 0, \ldots, N-2}}\Biggl[\sum_{k=0}^{N-2} g_k\bigl(x_k, \mu_k(x_k), w_k\bigr) + \min_{\mu_{N-1}}\ \max_{w_{N-1} \in W(x_{N-1}, \mu_{N-1}(x_{N-1}))}\bigl\{g_{N-1}\bigl(x_{N-1}, \mu_{N-1}(x_{N-1}), w_{N-1}\bigr) + J_N(x_N)\bigr\}\Biggr].$$

Applying the result of the hint with the identifications
$$w = (w_0, w_1, \ldots, w_{N-2}), \qquad u = u_{N-1}, \qquad f(w) = x_{N-1},$$
$$G_0(w) = \begin{cases} \sum_{k=0}^{N-2} g_k\bigl(x_k, \mu_k(x_k), w_k\bigr) & \text{if } w_k \in W_k\bigl(x_k, \mu_k(x_k)\bigr) \text{ for all } k, \\ -\infty & \text{otherwise}, \end{cases}$$
$$G_1\bigl(f(w), u\bigr) = \begin{cases} \hat G_1\bigl(f(w), u\bigr) & \text{if } u \in U_{N-1}\bigl(f(w)\bigr), \\ \infty & \text{otherwise}, \end{cases}$$
where
$$\hat G_1\bigl(f(w), u\bigr) = \max_{w_{N-1} \in W_{N-1}(f(w), u)}\bigl\{g_{N-1}\bigl(f(w), u, w_{N-1}\bigr) + J_N\bigl(f_{N-1}(f(w), u, w_{N-1})\bigr)\bigr\},$$
we have
$$J^*(x_0) = \min_{\mu_0, \ldots, \mu_{N-2}}\ \max_{\substack{w_k \in W(x_k, \mu_k(x_k)) \\ k = 0, \ldots, N-2}}\Biggl[\sum_{k=0}^{N-2} g_k\bigl(x_k, \mu_k(x_k), w_k\bigr) + J_{N-1}(x_{N-1})\Biggr].$$
The required condition $\min_{u \in U} G_1(f(w), u) > -\infty$ for all $w$ is implied by the assumption $J_{N-1}(x_{N-1}) > -\infty$ for all $x_{N-1}$. Without this assumption, mathematical anomalies of the type demonstrated in the earlier example may arise. By working with the preceding expression for $J^*(x_0)$ and by similarly continuing backwards, with $N-1$ in place of $N$, etc., after $N$ steps we obtain $J^*(x_0) = J_0(x_0)$.

1.13 www

The DP algorithm is
$$J_N(x_N) = c_N' x_N,$$
$$J_k(x_k) = \min_{u_k} \mathop{E}_{w_k, A_k}\bigl\{c_k' x_k + g_k(u_k) + J_{k+1}\bigl(A_k x_k + f_k(u_k) + w_k\bigr)\bigr\}.$$
We will show by induction that $J_k(x_k)$ is affine. Clearly $J_N(x_N)$ is affine. Assume that $J_{k+1}(x_{k+1})$ is affine, that is,
$$J_{k+1}(x_{k+1}) = b_{k+1}' x_{k+1} + d_{k+1}.$$
Then
$$J_k(x_k) = \min_{u_k} \mathop{E}_{w_k, A_k}\bigl\{c_k' x_k + g_k(u_k) + b_{k+1}' A_k x_k + b_{k+1}' f_k(u_k) + b_{k+1}' w_k + d_{k+1}\bigr\}$$
$$= \bigl(c_k + E\{A_k'\}\, b_{k+1}\bigr)' x_k + b_{k+1}' E\{w_k\} + \min_{u_k}\bigl\{g_k(u_k) + b_{k+1}' f_k(u_k)\bigr\} + d_{k+1}.$$
Note that $E\{A_k\}$ and $E\{w_k\}$ do not depend on $x_k$ or $u_k$. If the optimal value is finite, then $\min_{u_k}\{g_k(u_k) + b_{k+1}' f_k(u_k)\}$ is a real number and $J_k(x_k)$ is affine. Furthermore, the optimal control at each stage solves this minimization, which is independent of $x_k$. Thus, the optimal policy consists of constant functions $\mu_k^*$.
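The induction above can be illustrated numerically. The sketch below uses a scalar system with hypothetical data (the coefficients, control set, and the functions $g_k$, $f_k$ are all made up for the illustration); since the value functions are affine, the expectations enter only through $E\{A_k\}$ and $E\{w_k\}$, and the slope obeys $b_k = c_k + E\{A_k\}\, b_{k+1}$.

```python
# Numeric sketch of Exercise 1.13 for a scalar system (all data hypothetical):
# x_{k+1} = A_k x_k + f_k(u_k) + w_k, stage cost c_k x_k + g_k(u_k).
# J_k should be affine: J_k(x) = b_k x + d_k with b_k = c_k + E[A_k] b_{k+1}.

N = 3
c = [1.0, 0.5, 2.0]      # stage cost coefficients c_k (hypothetical)
cN = 1.5                 # terminal cost: J_N(x) = cN * x
EA, Ew = 0.9, 0.1        # E[A_k], E[w_k], taken stage-independent here
U = [-1.0, 0.0, 1.0]     # finite control set

def g(u):                # control cost g_k(u), hypothetical
    return u * u

def f(u):                # control drift f_k(u), hypothetical
    return 2.0 * u

def J(k, x):
    # DP recursion; since J_{k+1} is affine, E{J_{k+1}(A x + f(u) + w)}
    # equals J_{k+1}(E[A] x + f(u) + E[w]).
    if k == N:
        return cN * x
    return min(c[k] * x + g(u) + J(k + 1, EA * x + f(u) + Ew) for u in U)

# Closed-form slope recursion b_k = c_k + E[A_k] b_{k+1}.
b = [0.0] * (N + 1)
b[N] = cN
for k in reversed(range(N)):
    b[k] = c[k] + EA * b[k + 1]

# J_0 should be affine in x with slope b[0]: check the slope numerically.
slope = (J(0, 5.0) - J(0, 1.0)) / 4.0
print(abs(slope - b[0]) < 1e-9)  # True: J_0 is affine with slope b_0
```

The minimizing $u$ is the same at every $x$, in line with the constant optimal policy derived above.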

1.16 www

(a) Given a sequence of matrix multiplications $M_1 M_2 \cdots M_k M_{k+1} \cdots M_N$, we represent it by a sequence of numbers $\{n_1, \ldots, n_{N+1}\}$, where $n_k \times n_{k+1}$ is the dimension of $M_k$. Let the initial state be $x_0 = \{n_1, \ldots, n_{N+1}\}$. Then choosing the first multiplication to be carried out corresponds to choosing an element from the set $x_0 \setminus \{n_1, n_{N+1}\}$. For instance, choosing $n_2$ corresponds to multiplying $M_1$ and $M_2$, which results in a matrix of dimension $n_1 \times n_3$, and the initial state must be updated to discard $n_2$, the control applied at that stage. Hence at each stage the state represents the dimensions of the matrices resulting from the multiplications done so far. The allowable controls at stage $k$ are $u_k \in x_k \setminus \{n_1, n_{N+1}\}$. The system equation evolves according to
$$x_{k+1} = x_k \setminus \{u_k\}.$$
Note that the control will be applied $N-1$ times, so the horizon of this problem is $N-1$. The terminal state is $x_{N-1} = \{n_1, n_{N+1}\}$ and the terminal cost is 0. The cost at stage $k$ is given by the number of multiplications,
$$g_k(x_k, u_k) = n_a n_{u_k} n_b, \qquad \text{where } n_{u_k} = u_k \text{ and}$$
$$a = \max\bigl\{i \in \{1, \ldots, N+1\} \mid i < u_k,\ n_i \in x_k\bigr\}, \qquad b = \min\bigl\{i \in \{1, \ldots, N+1\} \mid i > u_k,\ n_i \in x_k\bigr\},$$
with $u_k$ identified with the index of the dimension it removes. The DP algorithm for this problem is
$$J_{N-1}(x_{N-1}) = 0,$$
$$J_k(x_k) = \min_{u_k \in x_k \setminus \{n_1, n_{N+1}\}}\bigl\{n_a n_{u_k} n_b + J_{k+1}\bigl(x_k \setminus \{u_k\}\bigr)\bigr\}, \qquad k = 0, \ldots, N-2.$$

Now consider the given problem, where $N = 3$ and $M_1$ is $2 \times 10$, $M_2$ is $10 \times 5$, $M_3$ is $5 \times 1$. The optimal order is $M_1 (M_2 M_3)$, requiring $10 \cdot 5 \cdot 1 + 2 \cdot 10 \cdot 1 = 70$ multiplications.

(b) In this part we can choose a much simpler state space. Let the state at stage $k$ be $\{a, b\}$, where $a, b \in \{1, \ldots, N\}$ are the indices of the first and the last matrix in the current partial product. There are two possible controls at each stage, which we denote by $L$ and $R$. Note that $L$ can be applied only when $a \ne 1$ and $R$ can be applied only when $b \ne N$. The system equation evolves according to
$$x_{k+1} = \begin{cases} \{a-1,\, b\} & \text{if } u_k = L, \\ \{a,\, b+1\} & \text{if } u_k = R, \end{cases} \qquad k = 1, \ldots, N-1.$$
The terminal state is $x_N = \{1, N\}$, with terminal cost 0. The cost at stage $k$ is given by
$$g_k(x_k, u_k) = \begin{cases} n_{a-1}\, n_a\, n_{b+1} & \text{if } u_k = L, \\ n_a\, n_{b+1}\, n_{b+2} & \text{if } u_k = R, \end{cases} \qquad k = 1, \ldots, N-1.$$
To handle the initial stage, we can take $x_0$ to be the empty set and $u_0 \in \{1, \ldots, N\}$. The next state will be $x_1 = \{u_0, u_0\}$ and the cost incurred at the initial stage will be 0 for all possible controls.
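The DP of part (b) can be sketched in code. This is a minimal illustration, not the book's own program; the function name and graph of states are choices made here. The state is the 0-based index pair of the current contiguous partial product, and the two controls multiply on the left or on the right.

```python
from functools import lru_cache

# Sketch of the DP formulation in 1.16(b): the state (a, b) gives the first
# and last matrix of the current partial product (0-based here), and the
# controls L/R multiply by the next matrix on the left or on the right.
# dims[i] x dims[i+1] is the dimension of the (i+1)-st matrix.

def matrix_chain_cost(dims):
    n = len(dims) - 1  # number of matrices

    @lru_cache(maxsize=None)
    def best(a, b):
        # minimal remaining cost once the 0-based block [a, b] is formed
        if a == 0 and b == n - 1:
            return 0
        left = right = float("inf")
        if a > 0:        # control L
            left = dims[a - 1] * dims[a] * dims[b + 1] + best(a - 1, b)
        if b < n - 1:    # control R
            right = dims[a] * dims[b + 1] * dims[b + 2] + best(a, b + 1)
        return min(left, right)

    # initial stage: choose the starting matrix at zero cost
    return min(best(i, i) for i in range(n))

# The book's example: M1 is 2x10, M2 is 10x5, M3 is 5x1.
print(matrix_chain_cost([2, 10, 5, 1]))  # 70, corresponding to M1 (M2 M3)
```

The memoized recursion visits each of the $O(N^2)$ index pairs once, reflecting the smaller state space of part (b) compared with the subsets used in part (a).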

1.19 www

Let $t_1 < t_2 < \cdots < t_{N-1}$ denote the times where $g_1(t) = g_2(t)$. Clearly, it is never optimal to switch functions at any other times. We can therefore divide the problem into $N-1$ stages, where we want to determine for each stage $k$ whether or not to switch activities at time $t_k$. Define
$$x_k = \begin{cases} 0 & \text{if on activity } g_1 \text{ just before time } t_k, \\ 1 & \text{if on activity } g_2 \text{ just before time } t_k, \end{cases} \qquad u_k = \begin{cases} 0 & \text{to continue the current activity}, \\ 1 & \text{to switch between activities}. \end{cases}$$
Then the state at time $t_{k+1}$ is simply $x_{k+1} = (x_k + u_k) \bmod 2$, and the profit for stage $k$ is
$$g_k(x_k, u_k) = \int_{t_k}^{t_{k+1}} g_{1 + x_{k+1}}(t)\, dt - u_k c.$$
The DP algorithm is then
$$J_N(x_N) = 0, \qquad J_k(x_k) = \max_{u_k}\bigl\{g_k(x_k, u_k) + J_{k+1}\bigl((x_k + u_k) \bmod 2\bigr)\bigr\}.$$

www

(a) Consider the problem with the state equal to the number of free rooms. At state $x \ge 1$ with $y$ customers remaining, if the innkeeper quotes a rate $r_i$, the transition probability is $p_i$ to state $x - 1$ (with a reward of $r_i$) and $1 - p_i$ to state $x$ (with a reward of 0). The DP algorithm for this problem starts with the terminal conditions
$$J(x, 0) = J(0, y) = 0, \qquad \forall\, x \ge 0,\ y \ge 0,$$
and is given by
$$J(x, y) = \max_{i = 1, \ldots, m}\bigl[p_i\bigl(r_i + J(x-1,\, y-1)\bigr) + (1 - p_i)\, J(x,\, y-1)\bigr], \qquad x \ge 1.$$
From the above equation and the terminal conditions, we can compute sequentially $J(1,1), J(1,2), \ldots, J(1,y)$ up to any desired integer $y$. Then we can calculate $J(2,1), J(2,2), \ldots, J(2,y)$, etc.

We first prove by induction on $y$ that for all $y$ we have
$$J(x, y) \ge J(x-1, y), \qquad \forall\, x \ge 1.$$
Indeed this is true for $y = 0$. Assuming it is true for a given $y$, we will prove that $J(x, y+1) \ge J(x-1, y+1)$ for all $x \ge 1$. This relation holds for $x = 1$ since $r_i > 0$. For $x \ge 2$, by using the DP recursion, this relation is written as
$$\max_{i = 1, \ldots, m}\bigl[p_i\bigl(r_i + J(x-1, y)\bigr) + (1 - p_i)\, J(x, y)\bigr] \ge \max_{i = 1, \ldots, m}\bigl[p_i\bigl(r_i + J(x-2, y)\bigr) + (1 - p_i)\, J(x-1, y)\bigr].$$

By the induction hypothesis, each of the terms on the left-hand side is no less than the corresponding term on the right-hand side, so the above relation holds.

The optimal rate is the one that maximizes in the DP algorithm, or equivalently, the one that maximizes
$$p_i r_i + p_i\bigl(J(x-1,\, y-1) - J(x,\, y-1)\bigr).$$
The highest rate $r_m$ simultaneously maximizes $p_i r_i$ and minimizes $p_i$. Since $J(x-1,\, y-1) - J(x,\, y-1) \le 0$, as proved above, we see that the highest rate simultaneously maximizes $p_i r_i$ and $p_i\bigl(J(x-1,\, y-1) - J(x,\, y-1)\bigr)$, and so it maximizes their sum.

(b) The algorithm given is the algorithm of Exercise 1.22 applied to the problem of part (a). Clearly, it is optimal to accept an offer of $r_i$ if $r_i$ is larger than the threshold
$$\bar r(x, y) = J(x,\, y-1) - J(x-1,\, y-1).$$

www

(a) The total net expected profit from the (buy/sell) investment decisions, after transaction costs are deducted, is
$$E\Biggl\{\sum_{k=0}^{N-1}\bigl(u_k P_k(x_k) - c\,|u_k|\bigr)\Biggr\},$$
where
$$u_k = \begin{cases} 1 & \text{if a unit of stock is bought at the $k$th period}, \\ -1 & \text{if a unit of stock is sold at the $k$th period}, \\ 0 & \text{otherwise}. \end{cases}$$
With a policy that maximizes this expression, we simultaneously maximize the expected total worth of the stock held at time $N$ minus the investment costs (including sale revenues). The DP algorithm is given by
$$J_N(x_N) = 0,$$
$$J_k(x_k) = \max_{u_k = -1, 0, 1}\bigl[u_k P_k(x_k) - c\,|u_k| + E\bigl\{J_{k+1}(x_{k+1}) \mid x_k\bigr\}\bigr],$$
where $J_{k+1}(x_{k+1})$ is the optimal expected profit when the stock price is $x_{k+1}$ at time $k+1$. Since $u_k$ does not influence $x_{k+1}$ or $E\{J_{k+1}(x_{k+1}) \mid x_k\}$, a decision $u_k \in \{-1, 0, 1\}$ that maximizes $u_k P_k(x_k) - c\,|u_k|$ at time $k$ is optimal. Since $P_k(x_k)$ is monotonically nonincreasing in $x_k$, it follows that it is optimal to set
$$u_k = \begin{cases} 1 & \text{if } x_k \le \underline{x}_k, \\ -1 & \text{if } x_k \ge \bar{x}_k, \\ 0 & \text{otherwise}, \end{cases}$$
where $\underline{x}_k$ and $\bar{x}_k$ are as in the problem statement. Note that the optimal expected profit $J_k(x_k)$ is given by
$$J_k(x_k) = E\Biggl\{\sum_{i=k}^{N-1}\ \max_{u_i = -1, 0, 1}\bigl(u_i P_i(x_i) - c\,|u_i|\bigr)\Biggr\}.$$

(b) Let $n_k$ be the number of units of stock held at time $k$. If $n_k$ is less than $N - k$ (the number of remaining decisions), then the value of $n_k$ should influence the decision at time $k$. We thus take as state the pair $(x_k, n_k)$, and the corresponding DP algorithm takes the form
$$V_k(x_k, n_k) = \begin{cases} \max_{u_k \in \{-1, 0, 1\}}\bigl[u_k P_k(x_k) - c\,|u_k| + E\bigl\{V_{k+1}(x_{k+1},\, n_k + u_k) \mid x_k\bigr\}\bigr] & \text{if } n_k \ge 1, \\[4pt] \max_{u_k \in \{0, 1\}}\bigl[u_k P_k(x_k) - c\,|u_k| + E\bigl\{V_{k+1}(x_{k+1},\, n_k + u_k) \mid x_k\bigr\}\bigr] & \text{if } n_k = 0, \end{cases}$$
with
$$V_N(x_N, n_N) = 0.$$
Note that we have
$$V_k(x_k, n_k) = J_k(x_k) \qquad \text{if } n_k \ge N - k,$$
where $J_k(x_k)$ is given by the formula derived in part (a). Using the above DP algorithm, we can calculate $V_{N-1}(x_{N-1}, n_{N-1})$ for all values of $n_{N-1}$, then calculate $V_{N-2}(x_{N-2}, n_{N-2})$ for all values of $n_{N-2}$, etc.

To show the stated property of the optimal policy, we note that $V_k(x_k, n_k)$ is monotonically nondecreasing in $n_k$, since as $n_k$ decreases, the remaining decisions become more constrained. An optimal policy at time $k$ is to buy if
$$P_k(x_k) - c + E\bigl\{V_{k+1}(x_{k+1},\, n_k + 1) - V_{k+1}(x_{k+1},\, n_k) \mid x_k\bigr\} \ge 0, \tag{1}$$
and to sell if
$$-P_k(x_k) - c + E\bigl\{V_{k+1}(x_{k+1},\, n_k - 1) - V_{k+1}(x_{k+1},\, n_k) \mid x_k\bigr\} \ge 0. \tag{2}$$
The expected value in Eq. (1) is nonnegative, which implies that if $x_k \le \underline{x}_k$, implying that $P_k(x_k) - c \ge 0$, then the buying decision is optimal. Similarly, the expected value in Eq. (2) is nonpositive, which implies that if $x_k < \bar{x}_k$, implying that $-P_k(x_k) - c < 0$, then the selling decision cannot be optimal. It is possible that buying at a price greater than $\underline{x}_k$ is optimal, depending on the size of the expected value term in Eq. (1).

(c) Let $m_k$ be the number of allowed purchase decisions at time $k$, i.e., $m$ plus the number of sale decisions up to $k$, minus the number of purchase decisions up to $k$. If $m_k$ is less than $N - k$ (the number of remaining decisions), then the value of $m_k$ should influence the decision at time $k$. We thus take as state the pair $(x_k, m_k)$, and the corresponding DP algorithm takes the form
$$W_k(x_k, m_k) = \begin{cases} \max_{u_k \in \{-1, 0, 1\}}\bigl[u_k P_k(x_k) - c\,|u_k| + E\bigl\{W_{k+1}(x_{k+1},\, m_k - u_k) \mid x_k\bigr\}\bigr] & \text{if } m_k \ge 1, \\[4pt] \max_{u_k \in \{-1, 0\}}\bigl[u_k P_k(x_k) - c\,|u_k| + E\bigl\{W_{k+1}(x_{k+1},\, m_k - u_k) \mid x_k\bigr\}\bigr] & \text{if } m_k = 0, \end{cases}$$
with
$$W_N(x_N, m_N) = 0.$$
From this point the analysis is similar to that of part (b).

(d) The DP algorithm takes the form
$$H_k(x_k, m_k, n_k) = \max_{u_k \in \{-1, 0, 1\}}\bigl[u_k P_k(x_k) - c\,|u_k| + E\bigl\{H_{k+1}(x_{k+1},\, m_k - u_k,\, n_k + u_k) \mid x_k\bigr\}\bigr]$$
if $m_k \ge 1$ and $n_k \ge 1$; similar formulas apply for the cases where $m_k = 0$ and/or $n_k = 0$ (compare with the DP algorithms of parts (b) and (c)).

(e) Let $r$ be the interest rate, so that $x$ invested dollars at time $k$ will become $(1+r)^{N-k} x$ dollars at time $N$. Once we redefine the expected profit $P_k(x_k)$ to be
$$P_k(x) = E\{x_N \mid x_k = x\} - (1+r)^{N-k} x,$$
the preceding analysis applies.

1.27 www

We consider part (b), since part (a) is essentially a special case. We will consider the problem of placing $N-2$ points between the endpoints $A$ and $B$ of the given subarc. We will show that the polygon of maximal area is obtained when the $N-2$ points are equally spaced on the subarc between $A$ and $B$.

Based on geometric considerations, we impose the restriction that the angle between any two successive points is no more than $\pi$. As the subarc is traversed in the clockwise direction, we number sequentially the encountered points as $x_1, x_2, \ldots, x_N$, where $x_1$ and $x_N$ are the two endpoints $A$ and $B$ of the arc, respectively.

For any point $x$ on the subarc, we denote by $\phi$ the angle between $x$ and $x_N$ (measured clockwise), and we denote by $A_k(\phi)$ the maximal area of a polygon with vertices the center of the circle, the points $x$ and $x_N$, and $N - k - 1$ additional points on the subarc that lie between $x$ and $x_N$. Without loss of generality, we assume that the radius of the circle is 1, so that the area of the triangle that has as vertices two points on the circle and the center of the circle is $(1/2)\sin u$, where $u$ is the angle corresponding to the center. By viewing as state the angle $\phi_k$ between $x_k$ and $x_N$, and as control the angle $u_k$ between $x_k$ and $x_{k+1}$, we obtain the following DP algorithm:
$$A_k(\phi_k) = \max_{0 \le u_k \le \min\{\phi_k, \pi\}}\Bigl[\tfrac{1}{2}\sin u_k + A_{k+1}(\phi_k - u_k)\Bigr], \qquad k = 1, \ldots, N-2. \tag{1}$$
Once $x_{N-1}$ is chosen, there is no issue of further choice of a point lying between $x_{N-1}$ and $x_N$, so we have
$$A_{N-1}(\phi) = \tfrac{1}{2}\sin\phi, \tag{2}$$
using the formula for the area of the triangle formed by $x_{N-1}$, $x_N$, and the center of the circle.

It can be verified by induction that the above algorithm admits the closed-form solution
$$A_k(\phi_k) = \tfrac{1}{2}(N - k)\sin\Bigl(\frac{\phi_k}{N - k}\Bigr), \qquad k = 1, \ldots, N-1, \tag{3}$$
and that the optimal choice for $u_k$ is given by
$$u_k^* = \frac{\phi_k}{N - k}.$$
Indeed, the formula (3) holds for $k = N-1$, by Eq. (2). Assuming that Eq. (3) holds for $k+1$, we have from the DP algorithm (1)
$$A_k(\phi_k) = \max_{0 \le u_k \le \min\{\phi_k, \pi\}} H_k(u_k, \phi_k), \tag{4}$$
where
$$H_k(u_k, \phi_k) = \tfrac{1}{2}\sin u_k + \tfrac{1}{2}(N - k - 1)\sin\Bigl(\frac{\phi_k - u_k}{N - k - 1}\Bigr). \tag{5}$$
It can be verified that for a fixed $\phi_k$, and in the range $0 \le u_k \le \min\{\phi_k, \pi\}$, the function $H_k(\cdot, \phi_k)$ is concave (its second derivative is negative) and its derivative is 0 only at the point $u_k = \phi_k/(N - k)$, which must therefore be its unique maximum. Substituting this value of $u_k$ in Eqs. (4) and (5), we obtain
$$A_k(\phi_k) = \tfrac{1}{2}\sin\Bigl(\frac{\phi_k}{N - k}\Bigr) + \tfrac{1}{2}(N - k - 1)\sin\Bigl(\frac{\phi_k - \phi_k/(N - k)}{N - k - 1}\Bigr) = \tfrac{1}{2}(N - k)\sin\Bigl(\frac{\phi_k}{N - k}\Bigr),$$
and the induction is complete.

Thus, given an optimally placed point $x_k$ on the subarc with corresponding angle $\phi_k$, the next point $x_{k+1}$ is obtained by advancing clockwise by $\phi_k/(N - k)$. This process, when started at $x_1$ with $\phi_1$ equal to the angle between $x_1$ and $x_N$, yields as the optimal solution an equally spaced placement of the points on the subarc.
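The closed form can be checked numerically against a brute-force version of the DP recursion; the grid maximization below is a stand-in for the analytic concavity argument, with hypothetical values of $N$ and $\phi$.

```python
import math

# Numeric check of the closed form in 1.27: the DP recursion
#   A_k(phi) = max_{0 <= u <= min(phi, pi)} [ (1/2) sin u + A_{k+1}(phi - u) ],
#   A_{N-1}(phi) = (1/2) sin phi,
# should agree with A_k(phi) = (1/2)(N - k) sin(phi / (N - k)).

def A(k, phi, N, grid=200):
    if k == N - 1:
        return 0.5 * math.sin(phi)
    hi = min(phi, math.pi)
    # brute-force maximization over a grid of candidate angles u
    return max(0.5 * math.sin(hi * j / grid) +
               A(k + 1, phi - hi * j / grid, N, grid)
               for j in range(grid + 1))

N, phi = 4, 2.0          # hypothetical instance
dp = A(1, phi, N)
closed = 0.5 * (N - 1) * math.sin(phi / (N - 1))
print(abs(dp - closed) < 1e-3)  # True: grid DP matches the closed form
```

The grid value slightly undershoots the closed form (it only visits grid points), which is why a small tolerance is used.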

Vol. I, Chapter 2

www

(a) We denote by $P_k$ the OPEN list after $k$ nodes have been removed from OPEN (i.e., after $k$ iterations of the algorithm). We also denote by $d_j^k$ the value of $d_j$ at this time. Let $b_k = \min_{j \in P_k}\{d_j^k\}$.

First, we show by induction that $b_0 \le b_1 \le \cdots \le b_k$. Indeed, $b_0 = 0$ and $b_1 = \min_j\{a_{sj}\} \ge 0$, which implies that $b_0 \le b_1$. Next, we assume that $b_0 \le \cdots \le b_k$ for some $k \ge 1$; we shall prove that $b_k \le b_{k+1}$. Let $j_{k+1}$ be the node removed from OPEN during the $(k+1)$st iteration. By assumption, $d_{j_{k+1}}^k = \min_{j \in P_k}\{d_j^k\} = b_k$, and we also have
$$d_i^{k+1} = \min\bigl\{d_i^k,\ d_{j_{k+1}}^k + a_{j_{k+1} i}\bigr\}.$$
We have $P_{k+1} = \bigl(P_k \setminus \{j_{k+1}\}\bigr) \cup N_{k+1}$, where $N_{k+1}$ is the set of nodes $i \notin P_k$ satisfying $d_i^{k+1} = d_{j_{k+1}}^k + a_{j_{k+1} i}$. Therefore,
$$\min_{i \in P_{k+1}}\{d_i^{k+1}\} = \min\Bigl\{\min_{i \in P_k \setminus \{j_{k+1}\}}\{d_i^{k+1}\},\ \min_{i \in N_{k+1}}\{d_i^{k+1}\}\Bigr\}.$$
Clearly,
$$\min_{i \in N_{k+1}}\{d_i^{k+1}\} = \min_{i \in N_{k+1}}\bigl\{d_{j_{k+1}}^k + a_{j_{k+1} i}\bigr\} \ge d_{j_{k+1}}^k.$$
Moreover,
$$\min_{i \in P_k \setminus \{j_{k+1}\}}\{d_i^{k+1}\} = \min_{i \in P_k \setminus \{j_{k+1}\}}\min\bigl\{d_i^k,\ d_{j_{k+1}}^k + a_{j_{k+1} i}\bigr\} \ge \min\Bigl\{\min_{i \in P_k}\{d_i^k\},\ d_{j_{k+1}}^k\Bigr\} = d_{j_{k+1}}^k,$$
because we remove from OPEN the node with the minimum $d_i^k$. It follows that $b_{k+1} = \min_{i \in P_{k+1}}\{d_i^{k+1}\} \ge d_{j_{k+1}}^k = b_k$.

Now we may prove that once a node exits OPEN, it never re-enters. Indeed, suppose that some node $i$ exits OPEN after the $k^*$th iteration of the algorithm; then $d_i^{k^*-1} = b_{k^*-1}$. If node $i$ re-enters OPEN after the $l$th iteration (with $l > k^*$), then we have
$$d_i^{l-1} > d_i^l = d_{j_l}^{l-1} + a_{j_l i} \ge d_{j_l}^{l-1} = b_{l-1}.$$
On the other hand, since $d_i$ is non-increasing, we have $b_{k^*-1} = d_i^{k^*-1} \ge d_i^{l-1}$. Thus we obtain $b_{k^*-1} > b_{l-1}$, which contradicts the fact that $b_k$ is non-decreasing.

Next, we claim the following: after the $k$th iteration, $d_i^k$ equals the length of the shortest possible path from $s$ to node $i \in P_k$ under the restriction that all intermediate nodes belong to $C_k$ (the set of nodes that have exited OPEN). The proof will be done by induction on $k$. For $k = 1$, we have $C_1 = \{s\}$ and $d_i^1 = a_{si}$, and the claim is obviously true. Next, we assume that the claim is true after iterations $1, \ldots, k$; we shall show that it is also true after iteration $k+1$. The node $j_{k+1}$ removed from OPEN at the $(k+1)$st iteration satisfies $\min_{i \in P_k}\{d_i^k\} = d_{j_{k+1}}^k$. Notice now that all neighbors of the nodes in $C_k$ belong either to $C_k$ or to $P_k$. It follows that the shortest path from $s$ to $j_{k+1}$ either goes through $C_k$, or it exits $C_k$, then passes through a node $j \in P_k$, and eventually reaches $j_{k+1}$. If the latter case applies, then the length of this path is at least the length of the shortest path from $s$ to $j$ through $C_k$; by the induction hypothesis, this equals $d_j^k$, which is at least $d_{j_{k+1}}^k$. It follows that, for node $j_{k+1}$ exiting the OPEN list, $d_{j_{k+1}}^k$ equals the length of the shortest path from $s$ to $j_{k+1}$. Similarly, all nodes that have exited previously have their current estimate

of $d_i$ equal to the corresponding shortest distance from $s$.* Notice now that
$$d_i^{k+1} = \min\bigl\{d_i^k,\ d_{j_{k+1}}^k + a_{j_{k+1} i}\bigr\}.$$
For $i \notin P_k$ and $i \in P_{k+1}$, it follows that the only neighbor of $i$ in $C_{k+1} = C_k \cup \{j_{k+1}\}$ is node $j_{k+1}$; for such a node $i$, $d_i^k = \infty$, which leads to $d_i^{k+1} = d_{j_{k+1}}^k + a_{j_{k+1} i}$. For $i \ne j_{k+1}$ and $i \in P_k$, the augmentation of $C_k$ by including $j_{k+1}$ offers one more path from $s$ to $i$ through $C_{k+1}$, namely that through $j_{k+1}$. Recall that the shortest path from $s$ to $i$ through $C_k$ has length $d_i^k$ (by the induction hypothesis). Thus
$$d_i^{k+1} = \min\bigl\{d_i^k,\ d_{j_{k+1}}^k + a_{j_{k+1} i}\bigr\}$$
is the length of the shortest path from $s$ to $i$ through $C_{k+1}$. The fact that each node exits OPEN with its current estimate of $d_i$ equal to its shortest distance from $s$ has been proved in the course of the previous inductive argument.

(b) Since each node enters the OPEN list at most once, the algorithm will terminate in at most $N - 1$ iterations. Updating the $d_i$'s during an iteration and selecting the node to exit OPEN requires $O(N)$ arithmetic operations (i.e., a constant number of operations per node). Thus, the total number of operations is $O(N^2)$.

2.6 www

Proposition: If there exists a path from the origin to each node in $T$, the modified version of the label correcting algorithm terminates with UPPER $< \infty$ and yields a shortest path from the origin to each node in $T$. Otherwise the algorithm terminates with UPPER $= \infty$.

Proof: The proof is analogous to the proof of Proposition 3.1. To show that this algorithm terminates, we can use the identical argument in the proof of Proposition 3.1.

Now suppose that for some node $t \in T$, there is no path from $s$ to $t$. Then a node $i$ such that $(i, t)$ is an arc cannot enter the OPEN list, because this would establish that there is a path from $s$ to $i$, and therefore also a path from $s$ to $t$. Thus, $d_t$ is never changed and UPPER is never reduced from its initial value of $\infty$.

Suppose now that there is a path from $s$ to each node $t \in T$. Then, since there is a finite number of distinct lengths of paths from $s$ to each $t \in T$ that do not contain any cycles, and each cycle has nonnegative length, there is also a shortest path. For an arbitrary $t \in T$, let $(s, j_1, j_2, \ldots, j_k, t)$ be a shortest path and let $d_t^*$ be the corresponding shortest distance. We will show that the value of UPPER upon termination must be equal to $d^* = \max_{t \in T} d_t^*$. Indeed, each subpath $(s, j_1, \ldots, j_m)$, $m = 1, \ldots, k$, of the shortest path $(s, j_1, \ldots, j_k, t)$ must be a shortest path from $s$ to $j_m$. If the value of UPPER is larger than $d^*$ at termination, the same must be true throughout the algorithm, and therefore UPPER will also be larger than the length of all the paths $(s, j_1, \ldots, j_m)$, $m = 1, \ldots, k$, throughout the algorithm, in view of the nonnegative arc length assumption. If, for each $t \in T$, the parent node $j_k$ enters the OPEN list with $d_{j_k}$ equal to the shortest distance from $s$ to $j_k$, UPPER will be set to $d^*$ in step 2 immediately following the next time the last of the nodes $j_k$ is examined by the algorithm. It follows that, for some $t \in T$, the associated parent node $j_k$ will never enter the OPEN list with $d_{j_k}$ equal to the shortest distance from $s$ to $j_k$. Similarly, and using also the nonnegative length assumption, this means that node $j_{k-1}$ will never enter the OPEN list with $d_{j_{k-1}}$ equal to the shortest distance from $s$ to $j_{k-1}$. Proceeding backwards, we conclude that $j_1$ never enters the OPEN list with $d_{j_1}$ equal to the shortest distance from $s$ to $j_1$, which is equal to the length of the arc $(s, j_1)$. This happens, however, at the first iteration of the algorithm, and we obtain a contradiction. It follows that at termination, UPPER will be equal to $d^*$.

Finally, it can be seen that, upon termination of the algorithm, the path constructed by tracing the parent nodes backward from $t$ to $s$ has length equal to $d_t^*$ for each $t \in T$. Thus the path is a shortest path from $s$ to $t$.
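The modified label correcting method just analyzed can be sketched for a single destination $t$. This is a minimal illustration with a hypothetical graph, assuming $t$ is reachable from $s$: a node $j$ enters OPEN only when its new label beats both $d_j$ and UPPER, and UPPER is tightened whenever the label of $t$ improves.

```python
from collections import deque

# Sketch of the modified label correcting method (single destination t).
# succ: dict node -> list of (successor, nonnegative arc length).

def label_correcting(succ, s, t):
    d = {s: 0}
    parent = {}
    upper = float("inf")
    open_q = deque([s])
    while open_q:
        i = open_q.popleft()
        for j, a_ij in succ.get(i, []):
            if d[i] + a_ij < min(d.get(j, float("inf")), upper):
                d[j] = d[i] + a_ij
                parent[j] = i
                if j == t:
                    upper = d[j]        # step 2: UPPER := d_t
                else:
                    open_q.append(j)
    path, node = [t], t                 # trace parent nodes backward from t
    while node != s:
        node = parent[node]
        path.append(node)
    return upper, path[::-1]

g = {"s": [("1", 3), ("2", 1)], "2": [("1", 1)], "1": [("t", 1)]}
print(label_correcting(g, "s", "t"))  # (3, ['s', '2', '1', 't'])
```

If $t$ were unreachable, UPPER would remain $\infty$, in line with the proposition.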
* Strictly speaking, this is the shortest distance from $s$ to these nodes, because paths are directed from $s$ to the nodes.
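The OPEN-list method analyzed in part (a) above (Dijkstra's algorithm) can be sketched as follows; the example graph is hypothetical, and the dictionary-based representation is a choice made here.

```python
# Minimal sketch of the OPEN-list shortest path method of part (a)
# (Dijkstra's algorithm): at each iteration the node with minimal label
# exits OPEN and, with nonnegative arc lengths, never re-enters.

def dijkstra(succ, s):
    """succ: dict node -> list of (successor, arc length >= 0)."""
    d = {s: 0}
    open_list = {s}
    closed = set()                              # the set C_k of exited nodes
    while open_list:
        j = min(open_list, key=lambda i: d[i])  # node with minimal d_i exits
        open_list.remove(j)
        closed.add(j)
        for i, a_ji in succ.get(j, []):
            if d[j] + a_ji < d.get(i, float("inf")):
                d[i] = d[j] + a_ji
                if i not in closed:
                    open_list.add(i)
    return d

g = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 6)], "b": [("t", 1)]}
print(dijkstra(g, "s"))  # {'s': 0, 'a': 1, 'b': 3, 't': 4}
```

Selecting the minimum-label node by a linear scan, as here, gives the $O(N)$ per-iteration cost and $O(N^2)$ total cost noted in part (b).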

2.9 www

(a) It was shown in the proof of Proposition 3.2 that $(P, p)$ satisfies CS throughout the original algorithm. Note that deleting arcs does not cause the CS conditions to no longer hold. Therefore $(P, p)$ satisfies CS throughout this algorithm as well.

It was also shown in the text that if a pair $(P, p)$ satisfies the CS conditions, then the portion of the path $P$ between node $s$ and any node $i \in P$ is a shortest path from $s$ to $i$. Now consider any node $j$ that becomes the terminal node of $P$ through an extension using $(i, j)$. If there is a shortest path from $s$ to $t$ that does not include node $j$, then removing the arcs $(k, j)$ with $k \ne i$ yields a graph including the same shortest path. If the only shortest path from $s$ to $t$ does include node $j$, then, since $P$ is a shortest path from $s$ to $j$, there is a shortest path from $s$ to $t$ that has $P$ as its portion up to node $j$. Thus removing the arcs $(k, j)$ with $k \ne i$ yields a graph including a shortest path from the original graph. If node $j$ has no outgoing arcs, then no path from $s$ to $t$ can include $j$; thus removing $j$ yields a graph including the same paths from $s$ to $t$ as in the original graph. Therefore, both types of deletions leave the shortest distance from $s$ to $t$ unaffected.

We can view the auction algorithm with graph reduction as follows: an iteration of the original auction algorithm is applied, followed by arc deletions that do not affect the CS conditions and that leave the shortest distance from $s$ to $t$ unchanged; another iteration of the original auction algorithm is applied to the new graph, followed by arc deletions; and so on. If an iteration of the original auction algorithm yields a path $P$ with $t$ as the terminal node, $P$ is a shortest path from $s$ to $t$ in the latest modified graph. Since we have shown that each new graph has the same shortest distance from $s$ to $t$ as the original graph, $P$ must also be a shortest path from $s$ to $t$ in the original graph.

Now assume that no iteration of the original auction algorithm ever yields a path $P$ with $t$ as the terminal node; i.e., there is no path from $s$ to $t$. Assume that the modified algorithm never terminates. Since there are a finite number of arcs and nodes, there can be only a finite number of arc and node deletions. Consider the algorithm after the last deletion. Since there are no more deletions, there must be an outgoing arc $(i_1, i_2)$ from any terminal node $i_1$ of $P$. Since the algorithm never terminates, $i_2$ must eventually be added to $P$. There must also be an outgoing arc $(i_2, i_3)$ from $i_2$, and so on. However, there is only a finite number of nodes, so some node must be repeated, which implies there is a cycle in $P$. Since there are no arcs incident to $s$, there must be some arc not part of the cycle that is incident to a node $k$ in the cycle. But then this node has two incoming arcs. When $k$ became the terminal node of $P$, one of these arcs should have been deleted, yielding a contradiction. Thus the algorithm must terminate. Since there is no path from $s$ to $t$, the algorithm can only terminate by deleting $s$.

(b) Consider any cycle of zero length. Let $j$ be the first node of the cycle to become a terminal node of the path $P$. Let $i$ be the node preceding $j$ in the path $P$, and let $l$ be the node preceding $j$ in the cycle. All incoming arcs $(k, j)$ of $j$ with $k \ne i$, including the arc $(l, j)$, are then deleted. Therefore, our problem reduces to one in which there are no cycles of zero length.

(c) The iterations of the modified algorithm applied to the problem of Exercise 2.8 are given below. The first 13 iterations are the same as in the original algorithm, with the exception that at iteration 2, where the path is extended to include node 2, arc (4,2) is also deleted. As a result of this deletion, after the contraction in iteration 13, the price of node 4 is changed to $L$, resulting in faster convergence of the algorithm.

Iteration   Path           Price vector p            Action
1           (1)            (0,0,0,0,0)               contraction at 1
2           (1)            (1,0,0,0,0)               extension to 2
3           (1,2)          (1,0,0,0,0)               contraction at 2
4           (1)            (1,1,0,0,0)               contraction at 1
5           (1)            (2,1,0,0,0)               extension to 2
6           (1,2)          (2,1,0,0,0)               extension to 3
7           (1,2,3)        (2,1,0,0,0)               contraction at 3
8           (1,2)          (2,1,1,0,0)               contraction at 2
9           (1)            (2,2,1,0,0)               contraction at 1
10          (1)            (3,2,1,0,0)               extension to 2
11          (1,2)          (3,2,1,0,0)               extension to 3
12          (1,2,3)        (3,2,1,0,0)               extension to 4
13          (1,2,3,4)      (3,2,1,0,0)               contraction at 4
14          (1,2,3)        (3,2,1,L,0)               contraction at 3
15          (1,2)          (3,2,L+1,L,0)             contraction at 2
16          (1)            (3,L+2,L+1,L,0)           contraction at 1
17          (1)            (L+3,L+2,L+1,L,0)         extension to 2
18          (1,2)          (L+3,L+2,L+1,L,0)         extension to 3
19          (1,2,3)        (L+3,L+2,L+1,L,0)         extension to 4
20          (1,2,3,4)      (L+3,L+2,L+1,L,0)         extension to 5
21          (1,2,3,4,5)    (L+3,L+2,L+1,L,0)         done

2.13 www

(a) We first need to show that $d_i^k$ is the length of the shortest $k$-arc path originating at $i$, for $i \ne t$. For $k = 1$,
$$d_i^1 = \min_j c_{ij},$$
which is the length of the shortest arc out of $i$. Assume that $d_i^{k-1}$ is the length of the shortest $(k-1)$-arc path out of $i$. Then
$$d_i^k = \min_j\bigl\{c_{ij} + d_j^{k-1}\bigr\}.$$
If $d_i^k$ were not the length of the shortest $k$-arc path, the initial arc of that shortest path would have to pass through a node other than the minimizing $j$ above. This is true since $d_j^{k-1}$ is no greater than the length of any $(k-1)$-arc path out of $j$. Let $l$ be the alternative node. From the optimality principle, the distance of the path through $l$ is $c_{il} + d_l^{k-1}$, which by the DP recursion is at least $d_i^k$. But this contradicts the choice of $l$. Thus $d_i^k$ is the length of the shortest $k$-arc path out of $i$.

Since $d_t^k = 0$ for all $k$, once a $k$-arc path out of $i$ reaches $t$, we have $d_i^\kappa = d_i^k$ for all $\kappa \ge k$. But with all arc lengths positive, $d_i^k$ is then just the shortest distance from $i$ to $t$. Clearly, there is some finite $k$ such that the shortest $k$-arc path out of $i$ reaches $t$. If this were not true, the assumption of positive arc lengths would imply that the distance from $i$ to $t$ is infinite.
Thus, the algorithm will yield the shortest distances in a finite number of steps. We can estimate the number of steps $N_i$ as
$$N_i \le \frac{d_{it}}{\min_{j,k} c_{jk}},$$
since every arc on the shortest path from $i$ to $t$ has length at least $\min_{j,k} c_{jk}$.
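The $k$-arc DP of part (a) can be sketched directly; the arc lengths below are hypothetical and strictly positive, and the initial condition $d_i^0 = 0$ is the one used in part (a).

```python
# Sketch of the DP in 2.13(a): with d_t^k = 0 for all k and d_i^0 = 0,
#   d_i^k = min_j { c_ij + d_j^{k-1} }
# is the length of the shortest k-arc path out of i, and it settles at the
# shortest distance from i to t once k is large enough.

def k_arc_dp(c, t, K):
    """c: dict i -> dict j -> arc length c_ij; returns d^K for all nodes."""
    nodes = set(c) | {j for i in c for j in c[i]}
    d = {i: 0 for i in nodes}            # initial condition d_i^0 = 0
    for _ in range(K):
        d = {i: 0 if i == t else
                min((c_ij + d[j] for j, c_ij in c.get(i, {}).items()),
                    default=float("inf"))
             for i in nodes}
    return d

c = {1: {2: 1, 3: 5}, 2: {3: 1, 4: 4}, 3: {4: 1}}
print(sorted(k_arc_dp(c, 4, 3).items()))  # [(1, 3), (2, 2), (3, 1), (4, 0)]
```

After $K = 3$ iterations every shortest path in this small graph has reached $t = 4$, so the estimates have converged.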

(b) Let $\bar d_i^k$ be the distance estimate generated using the initial condition $d_i^0 = \infty$, and let $\underline d_i^k$ be the estimate generated using the initial condition $d_i^0 = 0$. In addition, let $d_i$ be the shortest distance from $i$ to $t$.

Lemma:
$$\underline d_i^k \le \underline d_i^{k+1} \le d_i \le \bar d_i^{k+1} \le \bar d_i^k, \tag{1}$$
$$\underline d_i^k = d_i = \bar d_i^k \quad \text{for } k \text{ sufficiently large}. \tag{2}$$

Proof: Relation (1) follows from the monotonicity property of DP; note that $\bar d_i^1 \le \bar d_i^0$ and that $\underline d_i^1 \ge \underline d_i^0$. Relation (2) follows immediately from the convergence of DP (given $d_i^0 = \infty$) and from part (a).

Proposition: For every $k$ there exists a time $T_k$ such that for all $T \ge T_k$,
$$\underline d_i^k \le d_i^T \le \bar d_i^k, \qquad i = 1, 2, \ldots, N,$$
where $d_i^T$ is the estimate held at node $i$ at time $T$.

Proof: The proof follows by induction. For $k = 0$ the proposition is true, given the positive arc length assumption. Assume it is true for a given $k$. Let $N(i)$ be the set of nodes adjacent to $i$. For every $j \in N(i)$ there exists a time $T_k^j$ such that
$$\underline d_j^k \le d_j^T \le \bar d_j^k, \qquad \forall\, T \ge T_k^j.$$
Let $T'$ be the first time $i$ updates its distance estimate after all of the estimates $d_j^{T_k^j}$, $j \in N(i)$, have arrived. Let $d_{ij}^{T'}$ be the estimate of $d_j$ that $i$ has at time $T'$. Note that this may differ from $d_j^{T_k^j}$, since later estimates from $j$ may have arrived before $T'$. From the Lemma,
$$\underline d_j^k \le d_{ij}^{T'} \le \bar d_j^k,$$
which, coupled with the monotonicity of DP, implies
$$\underline d_i^{k+1} \le d_i^T \le \bar d_i^{k+1}, \qquad \forall\, T \ge T'.$$
Since each node never stops transmitting, $T'$ is finite, and the proposition is proved.

Using the Lemma, we see that there is a finite $k$ such that $\underline d_i^\kappa = d_i = \bar d_i^\kappa$ for all $\kappa \ge k$. Thus, from the Proposition, there exists a finite time $T$ such that $d_i^{T'} = d_i$ for all $T' \ge T$ and all $i$.

Vol. I, Chapter 3

www

This problem is similar to the Brachistochrone Problem (Example 4.2) described in the text. As in that problem, we introduce the system $\dot x = u$, and we have a fixed terminal state problem with $x(0) = a$ and $x(T) = b$. Letting
$$g(x, u) = \frac{\sqrt{1 + u^2}}{C x},$$
the Hamiltonian is
$$H(x, u, p) = g(x, u) + p u.$$
Minimization of the Hamiltonian with respect to $u$ yields
$$p(t) = -\nabla_u g\bigl(x(t), u(t)\bigr).$$
Since the Hamiltonian is constant along an optimal trajectory, we have
$$g\bigl(x(t), u(t)\bigr) - \nabla_u g\bigl(x(t), u(t)\bigr)\, u(t) = \text{constant}.$$
Substituting the expression for $g$, we have
$$\frac{\sqrt{1 + u^2}}{C x} - \frac{u^2}{C x \sqrt{1 + u^2}} = \frac{1}{C x \sqrt{1 + u^2}} = \text{constant},$$
which simplifies to
$$\bigl(x(t)\bigr)^2\bigl(1 + (\dot x(t))^2\bigr) = \text{constant}.$$
Thus an optimal trajectory satisfies the differential equation
$$\dot x(t) = \frac{\sqrt{D - \bigl(x(t)\bigr)^2}}{x(t)}.$$
It can be seen through a straightforward calculation that the curve
$$\bigl(x(t)\bigr)^2 + (t - d)^2 = D$$
satisfies this differential equation, and thus the curve of minimum travel time from $A$ to $B$ is an arc of a circle.
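The "straightforward calculation" can be made explicit; using the same symbols as above:

```latex
% Implicit differentiation of (x(t))^2 + (t - d)^2 = D gives
\[
2\, x(t)\, \dot x(t) + 2\,(t - d) = 0
\quad\Longrightarrow\quad
\dot x(t) = -\frac{t - d}{x(t)}
= \mp\,\frac{\sqrt{D - (x(t))^2}}{x(t)},
\]
% since (t - d)^2 = D - (x(t))^2. Hence
\[
\bigl(x(t)\bigr)^2\bigl(1 + (\dot x(t))^2\bigr)
= x^2 + \bigl(D - x^2\bigr) = D,
\]
% which is exactly the constancy condition obtained from the Hamiltonian.
```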

3.9 www

We have the system $\dot x(t) = A x(t) + B u(t)$, for which we want to minimize the quadratic cost
$$x(T)' Q_T\, x(T) + \int_0^T\bigl(x(t)' Q\, x(t) + u(t)' R\, u(t)\bigr)\, dt.$$
The Hamiltonian here is
$$H(x, u, p) = x' Q x + u' R u + p'(A x + B u),$$
and the adjoint equation is
$$\dot p(t) = -A' p(t) - 2 Q x(t),$$
with the terminal condition
$$p(T) = 2 Q_T\, x(T).$$
Minimizing the Hamiltonian with respect to $u$ yields the optimal control
$$u^*(t) = \arg\min_u\bigl[x^*(t)' Q x^*(t) + u' R u + p(t)'\bigl(A x^*(t) + B u\bigr)\bigr] = -\tfrac{1}{2} R^{-1} B' p(t).$$
We now hypothesize a linear relation between $x^*(t)$ and $p(t)$:
$$2 K(t)\, x^*(t) = p(t), \qquad \forall\, t \in [0, T],$$
and show that $K(t)$ can be obtained by solving the Riccati equation. Substituting this value of $p(t)$ into the previous equation, we have
$$u^*(t) = -R^{-1} B' K(t)\, x^*(t).$$
By combining this result with the system equation, we have
$$\dot x^*(t) = \bigl(A - B R^{-1} B' K(t)\bigr) x^*(t). \tag{$*$}$$
Differentiating $2 K(t)\, x^*(t) = p(t)$ and using the adjoint equation yields
$$2 \dot K(t)\, x^*(t) + 2 K(t)\, \dot x^*(t) = -2 A' K(t)\, x^*(t) - 2 Q x^*(t).$$
Combining with $(*)$, we have
$$\dot K(t)\, x^*(t) + K(t)\bigl(A - B R^{-1} B' K(t)\bigr) x^*(t) = -A' K(t)\, x^*(t) - Q x^*(t),$$
and we thus see that $K(t)$ should satisfy the Riccati equation
$$\dot K(t) = -K(t) A - A' K(t) + K(t) B R^{-1} B' K(t) - Q.$$
From the terminal condition $p(T) = 2 Q_T\, x(T)$, we have $K(T) = Q_T$, from which we can solve for $K(t)$ using the Riccati equation. Once we have $K(t)$, we have the optimal control $u^*(t) = -R^{-1} B' K(t)\, x^*(t)$. By reversing the previous arguments, this control can then be shown to satisfy all the conditions of the Pontryagin Minimum Principle.
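The Riccati equation can be sanity-checked numerically in the scalar case. The data below are hypothetical: with $A = 0$, $B = 1$, $Q = 1$, $R = 1$ the equation reduces to $\dot K = K^2 - 1$ with $K(T) = Q_T$, and in reversed time $\tau = T - t$ it becomes $dK/d\tau = 1 - K^2$, whose solution for $|Q_T| < 1$ is $K = \tanh(\tau + \operatorname{atanh} Q_T)$.

```python
import math

# Euler-integrate Kdot = -KA - A'K + K B R^-1 B' K - Q backward from t = T,
# scalar case, and compare with the known tanh solution (A=0, B=Q=R=1).

def riccati_backward(A, B, Q, R, QT, T, steps=100000):
    dt = T / steps
    K = QT
    for _ in range(steps):
        Kdot = -2.0 * A * K + K * B * (1.0 / R) * B * K - Q
        K -= Kdot * dt   # step backward in t: K(t - dt) ~ K(t) - Kdot(t) dt
    return K

K0 = riccati_backward(A=0.0, B=1.0, Q=1.0, R=1.0, QT=0.5, T=1.0)
exact = math.tanh(1.0 + math.atanh(0.5))
print(abs(K0 - exact) < 1e-3)  # True
```

The same backward integration from $K(T) = Q_T$ is how $K(t)$ would be computed in the matrix case, with matrix products in place of the scalar ones.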

Vol. I, Chapter 4

(www)

(a) Clearly, $J_N(x)$ is continuous. Assume that $J_{k+1}(x)$ is continuous. We have

$$J_k(x) = \min_{u \in \{0,1,\ldots\}} \big\{ cu + L(x+u) + G(x+u) \big\},$$

where

$$G(y) = E_{w_k}\big\{ J_{k+1}(y - w_k) \big\}, \qquad L(y) = E_{w_k}\big\{ p\max(0, w_k - y) + h\max(0, y - w_k) \big\}.$$

Thus, $L$ is continuous. Since $J_{k+1}$ is continuous, $G$ is continuous for bounded $w_k$. Assume that $J_k$ is not continuous. Then there exists an $\hat x$ such that, as $y \to \hat x$, $J_k(y)$ does not approach $J_k(\hat x)$. Let

$$u_y = \arg\min_{u \in \{0,1,\ldots\}} \big\{ cu + L(y+u) + G(y+u) \big\}.$$

Since $L$ and $G$ are continuous, the discontinuity of $J_k$ at $\hat x$ implies

$$\lim_{y \to \hat x} u_y \ne u_{\hat x}.$$

But since $u_y$ is optimal for $y$,

$$\lim_{y \to \hat x} \big\{ cu_y + L(y+u_y) + G(y+u_y) \big\} < \lim_{y \to \hat x} \big\{ cu_{\hat x} + L(y+u_{\hat x}) + G(y+u_{\hat x}) \big\} = J_k(\hat x).$$

This contradicts the optimality of $u_{\hat x}$ for $\hat x$. Thus, $J_k$ is continuous.

(b) Let

$$Y_k(x) = J_k(x+1) - J_k(x).$$

Clearly $Y_N(x)$ is a nondecreasing function. Assume that $Y_{k+1}(x)$ is nondecreasing. Then

$$Y_k(x+\delta) - Y_k(x) = c(u_{x+\delta+1} - u_{x+\delta}) - c(u_{x+1} - u_x)$$
$$\qquad + L(x+\delta+1+u_{x+\delta+1}) - L(x+\delta+u_{x+\delta}) - L(x+1+u_{x+1}) + L(x+u_x)$$
$$\qquad + G(x+\delta+1+u_{x+\delta+1}) - G(x+\delta+u_{x+\delta}) - G(x+1+u_{x+1}) + G(x+u_x).$$

Since $J_k$ is continuous, $u_{y+\delta} = u_y$ for $\delta$ sufficiently small. Thus, with $\delta$ small,

$$Y_k(x+\delta) - Y_k(x) = L(x+\delta+1+u_{x+1}) - L(x+\delta+u_x) - L(x+1+u_{x+1}) + L(x+u_x)$$
$$\qquad + G(x+\delta+1+u_{x+1}) - G(x+\delta+u_x) - G(x+1+u_{x+1}) + G(x+u_x).$$

Now, since the control and penalty costs are linear, the optimal order given a stock of $x$ is at most the optimal order given a stock of $x+1$ plus one unit. Thus

$$u_{x+1} \le u_x \le u_{x+1} + 1.$$

If $u_x = u_{x+1} + 1$, then $Y_k(x+\delta) - Y_k(x) = 0$ and we have the desired result. Assume that $u_x = u_{x+1}$. Since $L$ is convex, $L(x+1) - L(x)$ is nondecreasing. Using the assumption that $Y_{k+1}(x)$ is nondecreasing, we have

$$Y_k(x+\delta) - Y_k(x) = \underbrace{L(x+\delta+1+u_x) - L(x+\delta+u_x) - \big[ L(x+1+u_x) - L(x+u_x) \big]}_{\ge\, 0}$$
$$\qquad + \underbrace{E_{w_k}\Big\{ J_{k+1}(x+\delta+1+u_x-w_k) - J_{k+1}(x+\delta+u_x-w_k) - \big[ J_{k+1}(x+1+u_x-w_k) - J_{k+1}(x+u_x-w_k) \big] \Big\}}_{\ge\, 0}$$
$$\ge 0.$$

Thus, $Y_k(x)$ is a nondecreasing function of $x$.

(c) From their definition and a straightforward induction it can be shown that $J_k(x)$ and $J_k(x,u)$ are bounded below. Furthermore, since $\lim_{x \to \infty} L(x) = \infty$, we obtain $\lim_{x \to \infty} J_k(x,0) = \infty$. From the definition of $J_k(x,u)$, we have

$$J_k(x,u) = J_k(x+1, u-1) + c, \qquad u \in \{1,2,\ldots\}. \tag{2}$$

Let $S_k$ be the smallest real number satisfying

$$J_k(S_k, 0) = J_k(S_k+1, 0) + c. \tag{1}$$

We show that $S_k$ is well defined. If no $S_k$ satisfying (1) exists, we must have, because $J_k$ is continuous, either

$$J_k(x,0) - J_k(x+1,0) > c \quad \text{for all } x \in \Re, \qquad \text{or} \qquad J_k(x,0) - J_k(x+1,0) < c \quad \text{for all } x \in \Re.$$

The first possibility implies that $J_k(x,0)$ is strictly decreasing, which contradicts the fact that $\lim_{x\to\infty} J_k(x,0) = \infty$. The second possibility implies that $J_k(x,0) + cx$ is increasing in $x$, so that $\lim_{x\to-\infty}\big(J_k(x,0) + cx\big) < \infty$. However, using the boundedness of $J_{k+1}(x)$ from below, we obtain $\lim_{x\to-\infty}\big(J_k(x,0) + cx\big) = \infty$. The contradiction shows that $S_k$ is well defined.

We now derive the form of an optimal policy $\mu_k^*(x)$. Fix some $x$ and consider first the case $x \ge S_k$. Using the fact that $J_k(x+1,u) - J_k(x,u)$ is a nondecreasing function of $x$, we have for any $u \in \{0,1,\ldots\}$

$$J_k(x+1,u) - J_k(x,u) \ge J_k(S_k+1,u) - J_k(S_k,u) = J_k(S_k+1,0) - J_k(S_k,0) = -c.$$

Therefore,

$$J_k(x,u+1) = J_k(x+1,u) + c \ge J_k(x,u), \qquad u \in \{0,1,\ldots\},\ x \ge S_k.$$

This shows that $u = 0$ minimizes $J_k(x,u)$ for all $x \ge S_k$. Now let $x \in [S_k - n,\, S_k - n + 1)$, $n \in \{1,2,\ldots\}$. Using (2), we have

$$J_k(x, n+m) - J_k(x,n) = J_k(x+n, m) - J_k(x+n, 0) \ge 0, \qquad m \in \{0,1,\ldots\}, \tag{3}$$

since $x + n \ge S_k$. However, if $u < n$, then $x + u < S_k$ and

$$J_k(x+u+1, 0) - J_k(x+u, 0) < J_k(S_k+1, 0) - J_k(S_k, 0) = -c.$$

Therefore,

$$J_k(x,u+1) = J_k(x+u+1, 0) + (u+1)c < J_k(x+u, 0) + uc = J_k(x,u), \qquad u < n. \tag{4}$$

Inequalities (3) and (4) show that $u = n$ minimizes $J_k(x,u)$ whenever $x \in [S_k - n,\, S_k - n + 1)$.
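The monotone-order structure established in parts (b) and (c) can be observed on a small numeric instance. The sketch below (a two-stage horizon with illustrative costs and a uniform demand, none of it from the exercise) computes the optimal integer order at each stock level and checks that it is nonincreasing in the stock:

```python
# A small numeric instance of the DP above: integer order quantities,
# linear order cost c, shortage cost p, holding cost h, uniform demand.
# All numbers are illustrative, not from the exercise.
c, p, h = 1.0, 4.0, 1.0
demand = [0, 1, 2]                       # equally likely values of w_k
xs = range(-5, 11)                       # stock levels examined
U = 10                                   # largest order considered

def L(y):
    # expected shortage + holding cost for post-order stock y
    return sum(p * max(0, w - y) + h * max(0, y - w) for w in demand) / len(demand)

# two-stage horizon: J1(x) = min_u [c u + L(x+u)], then G(y) = E[J1(y - w)]
J1 = {x: min(c * u + L(x + u) for u in range(U + 1)) for x in range(-20, 25)}

def G(y):
    return sum(J1[y - w] for w in demand) / len(demand)

order = {x: min(range(U + 1), key=lambda u: c * u + L(x + u) + G(x + u)) for x in xs}
# The optimal order is nonincreasing in the stock level, as part (b) implies.
assert all(order[x] >= order[x + 1] for x in list(xs)[:-1])
```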

4.18 www

Let the state $x_k$ be defined as

$$x_k = \begin{cases} T, & \text{if the selection has already terminated}, \\ 1, & \text{if the $k$th object observed has rank 1 among those observed so far}, \\ 0, & \text{otherwise}. \end{cases}$$

The system evolves according to

$$x_{k+1} = \begin{cases} T, & \text{if } u_k = \text{stop or } x_k = T, \\ w_k, & \text{if } u_k = \text{continue}. \end{cases}$$

The reward function is given by

$$g_k(x_k, u_k, w_k) = \begin{cases} k/N, & \text{if } x_k = 1 \text{ and } u_k = \text{stop}, \\ 0, & \text{otherwise}, \end{cases} \qquad g_N(x_N) = \begin{cases} 1, & \text{if } x_N = 1, \\ 0, & \text{otherwise}. \end{cases}$$

Note that if termination is selected at stage $k$ and $x_k \ne 1$, then the probability of success is 0. Thus, if $x_k = 0$ it is always optimal to continue. To complete the model we have to determine $P(w_k \mid x_k, u_k) = P(w_k)$ when the control is $u_k = \text{continue}$. At stage $k$, we have already observed $k$ objects, ranked among themselves. Since we know nothing else about these objects, the next object can, with equal probability, be in any of the $k+1$ possible rank positions relative to the objects already observed:

$$a_j < a_{i_1} < \cdots < a_{i_2} < \cdots < a_{i_k} \qquad (k+1 \text{ possible positions for the $(k+1)$st object}).$$

Thus,

$$P(w_k = 1) = \frac{1}{k+1}, \qquad P(w_k = 0) = \frac{k}{k+1}.$$

Proposition 1: If $k \in S_N = \big\{ i \mid \sum_{j=i}^{N-1} \tfrac{1}{j} \le 1 \big\}$, then

$$J_k(0) = \frac{k}{N}\sum_{i=k}^{N-1}\frac{1}{i}, \qquad J_k(1) = \frac{k}{N}.$$

Proof: For $k = N-1$,

$$J_{N-1}(0) = \max\Big\{ \underbrace{0}_{\text{stop}},\ \underbrace{E\{J_N(w_{N-1})\}}_{\text{continue}} \Big\} = \frac{1}{N}, \qquad J_{N-1}(1) = \max\Big\{ \underbrace{\frac{N-1}{N}}_{\text{stop}},\ \underbrace{\frac{1}{N}}_{\text{continue}} \Big\} = \frac{N-1}{N},$$

with $\mu_{N-1}^*(0) = \text{continue}$ and $\mu_{N-1}^*(1) = \text{stop}$. Note that $N-1 \in S_N$ for every $N$. Now assume the proposition is true for $J_{k+1}$. Then

$$J_k(0) = \max\Big\{ \underbrace{0}_{\text{stop}},\ \underbrace{E\{J_{k+1}(w_k)\}}_{\text{continue}} \Big\} = \frac{1}{k+1}\cdot\frac{k+1}{N} + \frac{k}{k+1}\cdot\frac{k+1}{N}\sum_{i=k+1}^{N-1}\frac{1}{i} = \frac{k}{N}\sum_{i=k}^{N-1}\frac{1}{i},$$

with $\mu_k^*(0) = \text{continue}$. If $k \in S_N$, then $\sum_{i=k}^{N-1} 1/i \le 1$, so

$$J_k(1) = \max\Big\{ \underbrace{\frac{k}{N}}_{\text{stop}},\ \underbrace{\frac{k}{N}\sum_{i=k}^{N-1}\frac{1}{i}}_{\text{continue}} \Big\} = \frac{k}{N},$$

with $\mu_k^*(1) = \text{stop}$. Q.E.D.

Proposition 2: If $k \notin S_N$, then

$$J_k(0) = J_k(1) = \frac{\delta-1}{N}\sum_{i=\delta-1}^{N-1}\frac{1}{i},$$

where $\delta$ is the minimum element of $S_N$.

Proof: For $k = \delta-1$,

$$J_{\delta-1}(0) = \frac{1}{\delta}J_\delta(1) + \frac{\delta-1}{\delta}J_\delta(0) = \frac{1}{\delta}\cdot\frac{\delta}{N} + \frac{\delta-1}{\delta}\cdot\frac{\delta}{N}\sum_{i=\delta}^{N-1}\frac{1}{i} = \frac{\delta-1}{N}\sum_{i=\delta-1}^{N-1}\frac{1}{i},$$

and, since $\sum_{i=\delta-1}^{N-1} 1/i > 1$ (because $\delta-1 \notin S_N$),

$$J_{\delta-1}(1) = \max\Big\{ \frac{\delta-1}{N},\ \frac{\delta-1}{N}\sum_{i=\delta-1}^{N-1}\frac{1}{i} \Big\} = \frac{\delta-1}{N}\sum_{i=\delta-1}^{N-1}\frac{1}{i},$$

with $\mu_{\delta-1}^*(0) = \mu_{\delta-1}^*(1) = \text{continue}$. Assume the proposition is true for $J_k$. Then

$$J_{k-1}(0) = \frac{1}{k}J_k(1) + \frac{k-1}{k}J_k(0) = J_k(0),$$

with $\mu_{k-1}^*(0) = \text{continue}$, and

$$J_{k-1}(1) = \max\Big\{ \frac{k-1}{N},\ \frac{1}{k}J_k(1) + \frac{k-1}{k}J_k(0) \Big\} = \max\Big\{ \frac{k-1}{N},\ \frac{\delta-1}{N}\sum_{i=\delta-1}^{N-1}\frac{1}{i} \Big\} = J_k(0),$$

with $\mu_{k-1}^*(1) = \text{continue}$. Q.E.D.

Thus the optimal policy is to continue until the $\delta$th object, where $\delta$ is the minimum integer such that $\sum_{i=\delta}^{N-1} \tfrac{1}{i} \le 1$, and then stop the first time an object is observed that has rank 1 among those observed so far.

(www)

(a) In order that $A_kx + B_ku + w \in X$ for all $w \in W_k$, it is sufficient that $A_kx + B_ku$ belong to some ellipsoid $\tilde X$ such that the vector sum of $\tilde X$ and $W_k$ is contained in $X$. The ellipsoid

$$\tilde X = \{ z \mid z'Fz \le 1 \}, \qquad \text{where } F^{-1} = (1-\beta)\big(\Psi^{-1} - \beta^{-1}D_k^{-1}\big)$$

for some scalar $\beta \in (0,1)$, has this property (based on the hint, and assuming that $F^{-1}$ is well defined as a positive definite matrix). Thus, it is sufficient that $x$ and $u$ are such that

$$(A_kx + B_ku)'F(A_kx + B_ku) \le 1. \tag{1}$$

In order that, for a given $x$, there exists $u$ with $u'R_ku \le 1$ such that Eq. (1) is satisfied, as well as $x'\Xi x \le 1$, it is sufficient that $x$ is such that

$$\min_{u \in \Re^m}\big[ x'\Xi x + u'R_ku + (A_kx + B_ku)'F(A_kx + B_ku) \big] \le 1, \tag{2}$$

or, by carrying out explicitly the quadratic minimization above,

$$x'Kx \le 1, \qquad \text{where } K = A_k'\big(F^{-1} + B_kR_k^{-1}B_k'\big)^{-1}A_k + \Xi.$$

The control law

$$\mu(x) = -\big(R_k + B_k'FB_k\big)^{-1}B_k'FA_kx$$

attains the minimum in Eq. (2) for all $x$, so it achieves reachability.

(b) Follows by iterative application of the result of part (a), starting with $k = N-1$ and proceeding backwards.

(c) Follows from the arguments of part (a).
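The threshold characterization of $\delta$ in the secretary problem above can be cross-checked by running the backward recursion directly. The sketch below (with an illustrative horizon $N$) uses exact rational arithmetic and compares the first stage at which stopping at a relative best becomes optimal against the harmonic-sum criterion:

```python
from fractions import Fraction

# Backward recursion for the secretary problem above: J_k(1), J_k(0),
# with P(w_k = 1) = 1/(k+1). N is an illustrative horizon.
N = 10
J1 = {N: Fraction(1)}                # J_N(1) = g_N(1) = 1
J0 = {N: Fraction(0)}
for k in range(N - 1, 0, -1):
    cont = Fraction(1, k + 1) * J1[k + 1] + Fraction(k, k + 1) * J0[k + 1]
    J0[k] = cont                                  # x_k = 0: always continue
    J1[k] = max(Fraction(k, N), cont)             # stopping pays k/N

# delta = first index at which stopping at a relative best is optimal
delta = min(k for k in range(1, N) if Fraction(k, N) >= J0[k])
# It agrees with the characterization via sum_{i=delta}^{N-1} 1/i <= 1.
assert delta == min(k for k in range(1, N)
                    if sum(Fraction(1, i) for i in range(k, N)) <= 1)
```

For $N = 10$ this gives $\delta = 4$, consistent with the familiar "observe roughly $N/e$ objects, then stop at the first relative best" rule.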

Vol. I, Chapter 5

(www)

Define

$$y_k = x_k + A_k^{-1}w_k + A_k^{-1}A_{k+1}^{-1}w_{k+1} + \cdots + A_k^{-1}\cdots A_{N-1}^{-1}w_{N-1}, \qquad y_N = x_N.$$

Then

$$y_k = x_k + A_k^{-1}(w_k - x_{k+1}) + A_k^{-1}y_{k+1} = x_k + A_k^{-1}(-A_kx_k - B_ku_k) + A_k^{-1}y_{k+1} = -A_k^{-1}B_ku_k + A_k^{-1}y_{k+1},$$

and

$$y_{k+1} = A_ky_k + B_ku_k.$$

Now, the cost function is the expected value of

$$x_N'Qx_N + \sum_{k=0}^{N-1}u_k'R_ku_k = y_0'K_0y_0 + \sum_{k=0}^{N-1}\big( y_{k+1}'K_{k+1}y_{k+1} - y_k'K_ky_k + u_k'R_ku_k \big),$$

where the matrices $K_k$ are given by the Riccati-type recursion

$$K_N = Q, \qquad K_k = A_k'\big( K_{k+1} - K_{k+1}B_k(B_k'K_{k+1}B_k + R_k)^{-1}B_k'K_{k+1} \big)A_k.$$

We have

$$y_{k+1}'K_{k+1}y_{k+1} - y_k'K_ky_k + u_k'R_ku_k$$
$$= (A_ky_k + B_ku_k)'K_{k+1}(A_ky_k + B_ku_k) + u_k'R_ku_k - y_k'A_k'\big( K_{k+1} - K_{k+1}B_k(B_k'K_{k+1}B_k + R_k)^{-1}B_k'K_{k+1} \big)A_ky_k$$
$$= y_k'A_k'K_{k+1}A_ky_k + 2y_k'A_k'K_{k+1}B_ku_k + u_k'B_k'K_{k+1}B_ku_k + u_k'R_ku_k - y_k'A_k'K_{k+1}A_ky_k + y_k'A_k'K_{k+1}B_kP_k^{-1}B_k'K_{k+1}A_ky_k$$
$$= -2y_k'L_k'P_ku_k + u_k'P_ku_k + y_k'L_k'P_kL_ky_k$$
$$= (u_k - L_ky_k)'P_k(u_k - L_ky_k),$$

where

$$P_k = B_k'K_{k+1}B_k + R_k, \qquad L_k = -P_k^{-1}B_k'K_{k+1}A_k.$$

Thus, the cost function can be written as

$$E\Big\{ y_0'K_0y_0 + \sum_{k=0}^{N-1}(u_k - L_ky_k)'P_k(u_k - L_ky_k) \Big\}.$$

The problem now is to find $\mu_k^*(I_k)$, $k = 0,1,\ldots,N-1$, that minimize over the admissible control laws $\mu_k(I_k)$, $k = 0,1,\ldots,N-1$, the cost function

$$E\Big\{ y_0'K_0y_0 + \sum_{k=0}^{N-1}\big( \mu_k(I_k) - L_ky_k \big)'P_k\big( \mu_k(I_k) - L_ky_k \big) \Big\}.$$

We do this minimization by first minimizing over $\mu_{N-1}$, then over $\mu_{N-2}$, etc. The minimization over $\mu_{N-1}$ involves just the last term in the sum and can be written as

$$\min_{\mu_{N-1}} E\Big\{ \big( \mu_{N-1}(I_{N-1}) - L_{N-1}y_{N-1} \big)'P_{N-1}\big( \mu_{N-1}(I_{N-1}) - L_{N-1}y_{N-1} \big) \Big\}$$
$$= E\Big\{ \min_{u_{N-1}} E\big\{ (u_{N-1} - L_{N-1}y_{N-1})'P_{N-1}(u_{N-1} - L_{N-1}y_{N-1}) \mid I_{N-1} \big\} \Big\}.$$

Thus this minimization yields the optimal control law for the last stage,

$$\mu_{N-1}^*(I_{N-1}) = L_{N-1}E\{ y_{N-1} \mid I_{N-1} \}.$$

(Recall here that, generically, $E\{z \mid I\}$ minimizes over $u$ the expression $E\{(u-z)'P(u-z) \mid I\}$ for any random variable $z$, any conditioning variable $I$, and any positive semidefinite matrix $P$.) The minimization over $\mu_{N-2}$ involves

$$E\Big\{ \big( \mu_{N-2}(I_{N-2}) - L_{N-2}y_{N-2} \big)'P_{N-2}\big( \mu_{N-2}(I_{N-2}) - L_{N-2}y_{N-2} \big) \Big\}$$
$$+ E\Big\{ \big( E\{y_{N-1} \mid I_{N-1}\} - y_{N-1} \big)'L_{N-1}'P_{N-1}L_{N-1}\big( E\{y_{N-1} \mid I_{N-1}\} - y_{N-1} \big) \Big\}.$$

However, as in the lemma of p. 104, the term $E\{y_{N-1} \mid I_{N-1}\} - y_{N-1}$ does not depend on any of the controls (it is a function of $x_0, w_0, \ldots, w_{N-2}, v_0, \ldots, v_{N-1}$). Thus the minimization over $\mu_{N-2}$ involves just the first term above, and yields, similarly as before,

$$\mu_{N-2}^*(I_{N-2}) = L_{N-2}E\{ y_{N-2} \mid I_{N-2} \}.$$

Proceeding similarly, we prove that for all $k$,

$$\mu_k^*(I_k) = L_kE\{ y_k \mid I_k \}.$$

Note: The preceding proof can be used to provide a quick proof of the separation theorem for linear-quadratic problems in the case where $x_0, w_0, \ldots, w_{N-1}, v_0, \ldots, v_{N-1}$ are independent. If the cost function is

$$E\Big\{ x_N'Q_Nx_N + \sum_{k=0}^{N-1}\big( x_k'Q_kx_k + u_k'R_ku_k \big) \Big\},$$

the preceding calculation can be modified to show that the cost function can be written as

$$E\Big\{ x_0'K_0x_0 + \sum_{k=0}^{N-1}\Big( (u_k - L_kx_k)'P_k(u_k - L_kx_k) + w_k'K_{k+1}w_k \Big) \Big\}.$$

By repeating the preceding proof, we then obtain the optimal control law as

$$\mu_k^*(I_k) = L_kE\{ x_k \mid I_k \}.$$
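The matrices $K_k$, $P_k$, and the gains $L_k$ used above are computed offline by a backward recursion. A scalar sketch (the values of $A$, $B$, $R$, the terminal weight $Q$, and the horizon $N$ are illustrative) matching this problem's cost, which has no running state penalty:

```python
# Scalar sketch of the quantities used above: the Riccati recursion for K_k,
# P_k = B'K_{k+1}B + R, and the gains L_k = -P_k^{-1} B'K_{k+1} A.
# A, B, R, Qterm (terminal weight) and N are illustrative values.
A, B, R, Qterm, N = 1.0, 1.0, 1.0, 1.0, 5

K = [0.0] * (N + 1)
K[N] = Qterm                              # K_N = Q (terminal cost x_N' Q x_N)
L, P = [0.0] * N, [0.0] * N
for k in range(N - 1, -1, -1):
    P[k] = B * K[k + 1] * B + R
    L[k] = -(B * K[k + 1] * A) / P[k]
    # no running state cost term, matching the cost x_N'Qx_N + sum u_k'R u_k
    K[k] = A * (K[k + 1] - K[k + 1] * B / P[k] * B * K[k + 1]) * A

# sanity check: the K_k stay positive; the certainty-equivalent control
# is then u_k = L_k * E[y_k | I_k]
assert all(Kk > 0 for Kk in K)
```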

5.3 www

The control at time $k$ is $(u_k, \alpha_k)$, where $\alpha_k$ is a variable taking value 1 (if the next measurement, at time $k+1$, is of type 1) or 2 (if the next measurement is of type 2). The cost function is

$$E\Big\{ x_N'Qx_N + \sum_{k=0}^{N-1}\big( x_k'Qx_k + u_k'Ru_k \big) + \sum_{k=0}^{N-1} g_{\alpha_k} \Big\}.$$

We apply the DP algorithm for $N = 2$. Using the Riccati equation, we have

$$J_1(I_1) = J_1(z_0, z_1, u_0, \alpha_0) = E_{x_1}\big\{ x_1'(A'QA + Q)x_1 \mid I_1 \big\} + E_{w_1}\{ w_1'Qw_1 \}$$
$$+ \min_{u_1}\big\{ u_1'(B'QB + R)u_1 + 2E\{x_1 \mid I_1\}'A'QBu_1 \big\} + \min\{ g_1, g_2 \}.$$

So

$$\mu_1^*(I_1) = -(B'QB + R)^{-1}B'QA\,E\{x_1 \mid I_1\}, \qquad \alpha_1^*(I_1) = \begin{cases} 1, & \text{if } g_1 \le g_2, \\ 2, & \text{otherwise}. \end{cases}$$

Note that the measurement selected at $k = 1$ does not depend on $I_1$. This is intuitively clear, since the measurement $z_2$ will not be used by the controller, so its selection should be based on the measurement cost alone and not on the basis of the quality of the estimate. The situation is different once more than one stage is considered. Using a simple modification of the analysis in Section 5.2 of the text, we have

$$J_0(I_0) = J_0(z_0) = \min_{u_0} E_{x_0, w_0}\big\{ x_0'Qx_0 + u_0'Ru_0 + (Ax_0 + Bu_0 + w_0)'K_0(Ax_0 + Bu_0 + w_0) \mid z_0 \big\}$$
$$+ \min_{\alpha_0}\Big[ E_{z_1}\Big\{ E_{x_1}\big\{ \big(x_1 - E\{x_1 \mid I_1\}\big)'P_1\big(x_1 - E\{x_1 \mid I_1\}\big) \mid I_1 \big\} \,\Big|\, z_0, u_0, \alpha_0 \Big\} + g_{\alpha_0} \Big]$$
$$+ E_{w_1}\{ w_1'Qw_1 \} + \min\{ g_1, g_2 \}.$$

The quantity in the second bracket is the covariance of the estimation error (weighted by $P_1$) and, as shown in the text, it does not depend on $u_0$. Thus the minimization in the second bracket is indicated only with respect to $\alpha_0$ and not $u_0$. Because all stochastic variables are Gaussian, the quantity in the second bracket also does not depend on $z_0$. (The weighted error covariance produced by the Kalman filter is precomputable and depends only on the system and measurement matrices and the noise covariances, but not on the measurements received.) In fact,

$$E_{z_1}\Big\{ E_{x_1}\big\{ \big(x_1 - E\{x_1 \mid I_1\}\big)'P_1\big(x_1 - E\{x_1 \mid I_1\}\big) \mid I_1 \big\} \,\Big|\, z_0, u_0, \alpha_0 \Big\} = \begin{cases} \mathrm{Tr}\big( P_1\Sigma^1 \big), & \text{if } \alpha_0 = 1, \\ \mathrm{Tr}\big( P_1\Sigma^2 \big), & \text{if } \alpha_0 = 2, \end{cases}$$

where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix, and $\Sigma^1$ ($\Sigma^2$) denotes the error covariance of the Kalman filter estimate if a measurement of type 1 (type 2) is taken at $k = 0$. Thus at time $k = 0$ the optimal measurement choice does not depend on $z_0$: it is of type 1 if

$$\mathrm{Tr}\big( P_1\Sigma^1 \big) + g_1 \le \mathrm{Tr}\big( P_1\Sigma^2 \big) + g_2,$$

and of type 2 otherwise.
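Because the error covariances are precomputable, the sensor choice at $k = 0$ reduces to a deterministic comparison. A scalar sketch (all numbers — system gain, noise variances, measurement costs, weighting, prior — are illustrative):

```python
# Scalar sketch of the comparison above: the one-step error variance of the
# Kalman filter under each measurement type is precomputable, so the sensor
# choice reduces to comparing Tr(P1 * Sigma) + g for the two types.
# All numbers are illustrative.
v1, v2 = 0.2, 2.0              # measurement noise variances of types 1 and 2
g1, g2 = 1.5, 0.1              # measurement costs
P1 = 1.0                       # weighting from the DP recursion
prior = 1.0                    # variance of x_1 before the measurement

def posterior_var(sigma, v):
    # scalar Kalman measurement update: 1/post = 1/prior + 1/v
    return 1.0 / (1.0 / sigma + 1.0 / v)

cost1 = P1 * posterior_var(prior, v1) + g1
cost2 = P1 * posterior_var(prior, v2) + g2
choice = 1 if cost1 <= cost2 else 2
# Here the cheap, noisy sensor wins despite its worse estimate.
assert choice == 2
```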

5.7 www

(a) We have

$$p_{k+1}^j = P(x_{k+1} = j \mid z_0, \ldots, z_{k+1}, u_0, \ldots, u_k) = P(x_{k+1} = j \mid I_{k+1}) = \frac{P(x_{k+1} = j, z_{k+1} \mid I_k, u_k)}{P(z_{k+1} \mid I_k, u_k)} \quad \text{(from Bayes' rule)}$$

$$= \frac{\sum_{i=1}^n P(x_k = i)\,p_{ij}(u_k)\,r_j(u_k, z_{k+1})}{\sum_{s=1}^n \sum_{i=1}^n P(x_k = i)\,p_{is}(u_k)\,r_s(u_k, z_{k+1})} = \frac{\sum_{i=1}^n p_k^i\,p_{ij}(u_k)\,r_j(u_k, z_{k+1})}{\sum_{s=1}^n \sum_{i=1}^n p_k^i\,p_{is}(u_k)\,r_s(u_k, z_{k+1})}.$$

Rewriting $p_{k+1}^j$ in vector form, we have

$$p_{k+1}^j = \frac{r_j(u_k, z_{k+1})\,\big[P(u_k)'P_k\big]_j}{\sum_{s=1}^n r_s(u_k, z_{k+1})\,\big[P(u_k)'P_k\big]_s}, \qquad j = 1, \ldots, n.$$

Therefore, denoting by $\odot$ the componentwise product of two vectors,

$$P_{k+1} = \frac{r(u_k, z_{k+1}) \odot \big[P(u_k)'P_k\big]}{r(u_k, z_{k+1})'\big[P(u_k)'P_k\big]}.$$

(b) The DP algorithm for this system is

$$J_{N-1}(P_{N-1}) = \min_u\Big[ \sum_{i=1}^n \sum_{j=1}^n p_{N-1}^i\,p_{ij}(u)\,g_{N-1}(i, u, j) \Big] = \min_u\Big[ \sum_{i=1}^n p_{N-1}^i\,\big[G_{N-1}(u)\big]_i \Big] = \min_u\big\{ P_{N-1}'G_{N-1}(u) \big\},$$

$$J_k(P_k) = \min_u\Big[ \sum_{i=1}^n p_k^i \sum_{j=1}^n p_{ij}(u)\,g_k(i, u, j) + \sum_{\theta=1}^q P(z_{k+1} = \theta \mid I_k, u)\,J_{k+1}(P_{k+1} \mid P_k, u, \theta) \Big]$$
$$= \min_u\Big\{ P_k'G_k(u) + \sum_{\theta=1}^q r(u, \theta)'\big[P(u)'P_k\big]\,J_{k+1}\Big( \frac{r(u, \theta) \odot \big[P(u)'P_k\big]}{r(u, \theta)'\big[P(u)'P_k\big]} \Big) \Big\}.$$

(c) For $k = N-1$ and any $\lambda > 0$,

$$J_{N-1}(\lambda P_{N-1}) = \min_u\big\{ \lambda P_{N-1}'G_{N-1}(u) \big\} = \lambda\min_u\big\{ P_{N-1}'G_{N-1}(u) \big\} = \lambda J_{N-1}(P_{N-1}).$$

Now assume that $J_k(\lambda P_k) = \lambda J_k(P_k)$. Then

$$J_{k-1}(\lambda P_{k-1}) = \min_u\Big\{ \lambda P_{k-1}'G_{k-1}(u) + \sum_{\theta=1}^q r(u, \theta)'\big[P(u)'\lambda P_{k-1}\big]\,J_k(P_k \mid P_{k-1}, u, \theta) \Big\}$$
$$= \lambda\min_u\Big\{ P_{k-1}'G_{k-1}(u) + \sum_{\theta=1}^q r(u, \theta)'\big[P(u)'P_{k-1}\big]\,J_k(P_k \mid P_{k-1}, u, \theta) \Big\} = \lambda J_{k-1}(P_{k-1}),$$

where we used the fact that the conditional belief $P_k$ given $(P_{k-1}, u, \theta)$ is unchanged when $P_{k-1}$ is scaled by $\lambda$. Q.E.D.

Note that induction is not necessary to show that $J_k(\lambda P_k) = \lambda J_k(P_k)$. For any $u$ and $\theta$, $r(u, \theta)'\big[P(u)'P_k\big]$ is a scalar. Therefore, letting $\lambda = r(u, \theta)'\big[P(u)'P_k\big]$, we have

$$J_k(P_k) = \min_u\Big\{ P_k'G_k(u) + \sum_{\theta=1}^q r(u, \theta)'\big[P(u)'P_k\big]\,J_{k+1}\Big( \frac{r(u, \theta) \odot \big[P(u)'P_k\big]}{r(u, \theta)'\big[P(u)'P_k\big]} \Big) \Big\}$$
$$= \min_u\Big\{ P_k'G_k(u) + \sum_{\theta=1}^q J_{k+1}\Big( r(u, \theta) \odot \big[P(u)'P_k\big] \Big) \Big\}.$$

(d) For $k = N-1$, we have $J_{N-1}(P_{N-1}) = \min_u\{P_{N-1}'G_{N-1}(u)\}$, and so $J_{N-1}$ has the desired form

$$J_{N-1}(P_{N-1}) = \min\big[ P_{N-1}'\alpha_{N-1}^1, \ldots, P_{N-1}'\alpha_{N-1}^{m_{N-1}} \big],$$

where $\alpha_{N-1}^j = G_{N-1}(u^j)$ and $u^1, \ldots, u^{m_{N-1}}$ are the possible values of the control. Assume that

$$J_{k+1}(P_{k+1}) = \min\big[ P_{k+1}'\alpha_{k+1}^1, \ldots, P_{k+1}'\alpha_{k+1}^{m_{k+1}} \big].$$

Then, using the expression from part (c) for $J_k(P_k)$,

$$J_k(P_k) = \min_u\Big\{ P_k'G_k(u) + \sum_{\theta=1}^q J_{k+1}\Big( r(u, \theta) \odot \big[P(u)'P_k\big] \Big) \Big\}$$
$$= \min_u\Big\{ P_k'G_k(u) + \sum_{\theta=1}^q \min_j\Big[ \big( r(u, \theta) \odot \big[P(u)'P_k\big] \big)'\alpha_{k+1}^j \Big] \Big\}$$
$$= \min_u\Big\{ P_k'G_k(u) + \sum_{\theta=1}^q \min_j\Big[ P_k'P(u)\big( r(u, \theta) \odot \alpha_{k+1}^j \big) \Big] \Big\}$$
$$= \min\big[ P_k'\alpha_k^1, \ldots, P_k'\alpha_k^{m_k} \big],$$

where the vectors $\alpha_k^i$ are all vectors of the form

$$G_k(u) + \sum_{\theta=1}^q P(u)\big( r(u, \theta) \odot \alpha_{k+1}^{j_\theta} \big),$$

for some control $u$ and some choice of $j_\theta \in \{1, \ldots, m_{k+1}\}$ for each $\theta$. The induction is thus complete.

Vol. I, Chapter 6

(www)

First, we note that $\alpha$-$\beta$ pruning is applicable only for arcs that point to right children, so that at least one sequence of moves (starting from the current position and ending at a terminal position, that is, one with no children) has already been considered. Furthermore, due to the depth-first search, the score at the ancestor positions has been derived without taking into account the positions that can be reached from the current position. Suppose now that $\alpha$-pruning applies at a position with Black to play. Then, if the current position is reached (due to a move by White), Black can respond in such a way that the final position will be worse (for White) than it would have been if the current position had not been reached. What $\alpha$-pruning saves is the search over even worse positions (emanating from the current position). The reason is that White will never play so that Black reaches the current position, because White certainly has a better alternative. A similar argument applies for $\beta$-pruning.

A second approach: Suppose that it is WHITE's turn to move. We shall prove that a $\beta$-cutoff occurring at the $n$th position will not affect the backed-up score. We have, from the definition of $\beta$,

$$\beta = \min\big\{ \text{TBS of all ancestors of } n \text{ where BLACK has the move} \big\}.$$

For a cutoff to occur: $\mathrm{TBS}(n) > \beta$. Observe first of all that $\beta = \mathrm{TBS}(n_1)$ for some ancestor $n_1$ of $n$ where BLACK has the move. Then there exists a path $n_1, n_2, \ldots, n_k, n$. Since it is WHITE's move at $n$, the temporary backed-up score $\mathrm{TBS}(n) = \max_i \mathrm{TBS}(n_i')$, where the $n_i'$ are the descendants of $n$, can only increase as further descendants are explored, and so remains greater than $\beta$. Consider now the position $n_k$. Then $\mathrm{TBS}(n_k)$ will either remain unchanged or will increase to a value greater than $\beta$ as a result of the exploration of node $n$. Proceeding similarly, we conclude that $\mathrm{TBS}(n_2)$ will either remain the same or change to a value greater than $\beta$. Finally, at node $n_1$, $\mathrm{TBS}(n_1)$ will not change, since it is BLACK's turn to move there and BLACK will choose the move with minimum score. Thus the backed-up score and the choice of the next move are unaffected by the $\beta$-cutoff. A similar argument holds for $\alpha$-pruning.

(www)

(a) We have for all $i$

$$F(i) \ge \hat F(i) = a_{ij(i)} + F\big(j(i)\big). \tag{1}$$

Assume, in order to come to a contradiction, that the graph of the $N-1$ arcs $\big(i, j(i)\big)$, $i = 1, \ldots, N-1$, contains a cycle $(i_1, i_2, \ldots, i_k, i_1)$. Using Eq. (1), we have

$$F(i_1) \ge a_{i_1i_2} + F(i_2),$$
$$F(i_2) \ge a_{i_2i_3} + F(i_3),$$
$$\vdots$$
$$F(i_k) \ge a_{i_ki_1} + F(i_1).$$

By adding the above inequalities, we obtain

$$0 \ge a_{i_1i_2} + a_{i_2i_3} + \cdots + a_{i_ki_1}.$$

Thus the length of the cycle $(i_1, i_2, \ldots, i_k, i_1)$ is nonpositive, a contradiction. Hence, the graph of the $N-1$ arcs $\big(i, j(i)\big)$, $i = 1, \ldots, N-1$, contains no cycle. Given any node $i \ne N$, we can start with arc $\big(i, j(i)\big)$,
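The pruning argument above says that cutoffs never change the backed-up score. A minimal self-contained check on a hypothetical depth-2 game tree (the tree and its leaf scores are made up for illustration):

```python
# A minimal alpha-beta search on a hypothetical game tree, illustrating the
# argument above: pruned subtrees cannot change the backed-up score.
def alphabeta(node, alpha, beta, maximizing):
    if not isinstance(node, list):            # leaf: static score
        return node
    if maximizing:
        v = float('-inf')
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, v)
            if v >= beta:                     # beta cutoff: MIN avoids this line
                break
        return v
    v = float('inf')
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True))
        beta = min(beta, v)
        if v <= alpha:                        # alpha cutoff: MAX avoids this line
            break
    return v

def minimax(node, maximizing):
    if not isinstance(node, list):
        return node
    f = max if maximizing else min
    return f(minimax(c, not maximizing) for c in node)

tree = [[3, 5], [2, 9], [0, 7]]               # depth-2 tree, MAX to move
assert alphabeta(tree, float('-inf'), float('inf'), True) == minimax(tree, True)
```

Here the leaves 9 and 7 are never visited by `alphabeta`, yet the returned score equals the full minimax value, as the argument predicts.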


More information

Online Appendix Optimal Time-Consistent Government Debt Maturity D. Debortoli, R. Nunes, P. Yared. A. Proofs

Online Appendix Optimal Time-Consistent Government Debt Maturity D. Debortoli, R. Nunes, P. Yared. A. Proofs Online Appendi Optimal Time-Consistent Government Debt Maturity D. Debortoli, R. Nunes, P. Yared A. Proofs Proof of Proposition 1 The necessity of these conditions is proved in the tet. To prove sufficiency,

More information

Optimal Long-Term Supply Contracts with Asymmetric Demand Information. Appendix

Optimal Long-Term Supply Contracts with Asymmetric Demand Information. Appendix Optimal Long-Term Supply Contracts with Asymmetric Demand Information Ilan Lobel Appendix Wenqiang iao {ilobel, wxiao}@stern.nyu.edu Stern School of Business, New York University Appendix A: Proofs Proof

More information

Web Appendix: Proofs and extensions.

Web Appendix: Proofs and extensions. B eb Appendix: Proofs and extensions. B.1 Proofs of results about block correlated markets. This subsection provides proofs for Propositions A1, A2, A3 and A4, and the proof of Lemma A1. Proof of Proposition

More information

An Introduction to Econometrics. Wei Zhu Department of Mathematics First Year Graduate Student Oct22, 2003

An Introduction to Econometrics. Wei Zhu Department of Mathematics First Year Graduate Student Oct22, 2003 An Introduction to Econometrics Wei Zhu Department of Mathematics First Year Graduate Student Oct22, 2003 1 Chapter 1. What is econometrics? It is the application of statistical theories to economic ones

More information

Dynamic Programming and Reinforcement Learning

Dynamic Programming and Reinforcement Learning Dynamic Programming and Reinforcement Learning Daniel Russo Columbia Business School Decision Risk and Operations Division Fall, 2017 Daniel Russo (Columbia) Fall 2017 1 / 34 Supervised Machine Learning

More information

EC316a: Advanced Scientific Computation, Fall Discrete time, continuous state dynamic models: solution methods

EC316a: Advanced Scientific Computation, Fall Discrete time, continuous state dynamic models: solution methods EC316a: Advanced Scientific Computation, Fall 2003 Notes Section 4 Discrete time, continuous state dynamic models: solution methods We consider now solution methods for discrete time models in which decisions

More information

1 Answers to the Sept 08 macro prelim - Long Questions

1 Answers to the Sept 08 macro prelim - Long Questions Answers to the Sept 08 macro prelim - Long Questions. Suppose that a representative consumer receives an endowment of a non-storable consumption good. The endowment evolves exogenously according to ln

More information

Q1. [?? pts] Search Traces

Q1. [?? pts] Search Traces CS 188 Spring 2010 Introduction to Artificial Intelligence Midterm Exam Solutions Q1. [?? pts] Search Traces Each of the trees (G1 through G5) was generated by searching the graph (below, left) with a

More information

MATH3075/3975 FINANCIAL MATHEMATICS TUTORIAL PROBLEMS

MATH3075/3975 FINANCIAL MATHEMATICS TUTORIAL PROBLEMS MATH307/37 FINANCIAL MATHEMATICS TUTORIAL PROBLEMS School of Mathematics and Statistics Semester, 04 Tutorial problems should be used to test your mathematical skills and understanding of the lecture material.

More information

16 MAKING SIMPLE DECISIONS

16 MAKING SIMPLE DECISIONS 253 16 MAKING SIMPLE DECISIONS Let us associate each state S with a numeric utility U(S), which expresses the desirability of the state A nondeterministic action a will have possible outcome states Result(a)

More information

Macroeconomics and finance

Macroeconomics and finance Macroeconomics and finance 1 1. Temporary equilibrium and the price level [Lectures 11 and 12] 2. Overlapping generations and learning [Lectures 13 and 14] 2.1 The overlapping generations model 2.2 Expectations

More information

1 Dynamic programming

1 Dynamic programming 1 Dynamic programming A country has just discovered a natural resource which yields an income per period R measured in terms of traded goods. The cost of exploitation is negligible. The government wants

More information

Part 1: q Theory and Irreversible Investment

Part 1: q Theory and Irreversible Investment Part 1: q Theory and Irreversible Investment Goal: Endogenize firm characteristics and risk. Value/growth Size Leverage New issues,... This lecture: q theory of investment Irreversible investment and real

More information

Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete)

Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete) Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete) Ying Chen Hülya Eraslan March 25, 2016 Abstract We analyze a dynamic model of judicial decision

More information

Finite Memory and Imperfect Monitoring

Finite Memory and Imperfect Monitoring Federal Reserve Bank of Minneapolis Research Department Finite Memory and Imperfect Monitoring Harold L. Cole and Narayana Kocherlakota Working Paper 604 September 2000 Cole: U.C.L.A. and Federal Reserve

More information

CS 3331 Numerical Methods Lecture 2: Functions of One Variable. Cherung Lee

CS 3331 Numerical Methods Lecture 2: Functions of One Variable. Cherung Lee CS 3331 Numerical Methods Lecture 2: Functions of One Variable Cherung Lee Outline Introduction Solving nonlinear equations: find x such that f(x ) = 0. Binary search methods: (Bisection, regula falsi)

More information

3.2 No-arbitrage theory and risk neutral probability measure

3.2 No-arbitrage theory and risk neutral probability measure Mathematical Models in Economics and Finance Topic 3 Fundamental theorem of asset pricing 3.1 Law of one price and Arrow securities 3.2 No-arbitrage theory and risk neutral probability measure 3.3 Valuation

More information

Lecture 5: Iterative Combinatorial Auctions

Lecture 5: Iterative Combinatorial Auctions COMS 6998-3: Algorithmic Game Theory October 6, 2008 Lecture 5: Iterative Combinatorial Auctions Lecturer: Sébastien Lahaie Scribe: Sébastien Lahaie In this lecture we examine a procedure that generalizes

More information

DRAFT. 1 exercise in state (S, t), π(s, t) = 0 do not exercise in state (S, t) Review of the Risk Neutral Stock Dynamics

DRAFT. 1 exercise in state (S, t), π(s, t) = 0 do not exercise in state (S, t) Review of the Risk Neutral Stock Dynamics Chapter 12 American Put Option Recall that the American option has strike K and maturity T and gives the holder the right to exercise at any time in [0, T ]. The American option is not straightforward

More information

MATH 5510 Mathematical Models of Financial Derivatives. Topic 1 Risk neutral pricing principles under single-period securities models

MATH 5510 Mathematical Models of Financial Derivatives. Topic 1 Risk neutral pricing principles under single-period securities models MATH 5510 Mathematical Models of Financial Derivatives Topic 1 Risk neutral pricing principles under single-period securities models 1.1 Law of one price and Arrow securities 1.2 No-arbitrage theory and

More information

Lecture 23: April 10

Lecture 23: April 10 CS271 Randomness & Computation Spring 2018 Instructor: Alistair Sinclair Lecture 23: April 10 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They

More information

What can we do with numerical optimization?

What can we do with numerical optimization? Optimization motivation and background Eddie Wadbro Introduction to PDE Constrained Optimization, 2016 February 15 16, 2016 Eddie Wadbro, Introduction to PDE Constrained Optimization, February 15 16, 2016

More information

Dynamic Programming (DP) Massimo Paolucci University of Genova

Dynamic Programming (DP) Massimo Paolucci University of Genova Dynamic Programming (DP) Massimo Paolucci University of Genova DP cannot be applied to each kind of problem In particular, it is a solution method for problems defined over stages For each stage a subproblem

More information

Laurence Boxer and Ismet KARACA

Laurence Boxer and Ismet KARACA SOME PROPERTIES OF DIGITAL COVERING SPACES Laurence Boxer and Ismet KARACA Abstract. In this paper we study digital versions of some properties of covering spaces from algebraic topology. We correct and

More information

Stochastic Dual Dynamic Programming

Stochastic Dual Dynamic Programming 1 / 43 Stochastic Dual Dynamic Programming Operations Research Anthony Papavasiliou 2 / 43 Contents [ 10.4 of BL], [Pereira, 1991] 1 Recalling the Nested L-Shaped Decomposition 2 Drawbacks of Nested Decomposition

More information

1 Appendix A: Definition of equilibrium

1 Appendix A: Definition of equilibrium Online Appendix to Partnerships versus Corporations: Moral Hazard, Sorting and Ownership Structure Ayca Kaya and Galina Vereshchagina Appendix A formally defines an equilibrium in our model, Appendix B

More information

UNIT 2. Greedy Method GENERAL METHOD

UNIT 2. Greedy Method GENERAL METHOD UNIT 2 GENERAL METHOD Greedy Method Greedy is the most straight forward design technique. Most of the problems have n inputs and require us to obtain a subset that satisfies some constraints. Any subset

More information

Notes on Intertemporal Optimization

Notes on Intertemporal Optimization Notes on Intertemporal Optimization Econ 204A - Henning Bohn * Most of modern macroeconomics involves models of agents that optimize over time. he basic ideas and tools are the same as in microeconomics,

More information

Dynamic Portfolio Choice II

Dynamic Portfolio Choice II Dynamic Portfolio Choice II Dynamic Programming Leonid Kogan MIT, Sloan 15.450, Fall 2010 c Leonid Kogan ( MIT, Sloan ) Dynamic Portfolio Choice II 15.450, Fall 2010 1 / 35 Outline 1 Introduction to Dynamic

More information

Lecture 2 Dynamic Equilibrium Models: Three and More (Finite) Periods

Lecture 2 Dynamic Equilibrium Models: Three and More (Finite) Periods Lecture 2 Dynamic Equilibrium Models: Three and More (Finite) Periods. Introduction In ECON 50, we discussed the structure of two-period dynamic general equilibrium models, some solution methods, and their

More information

Sequential Decision Making

Sequential Decision Making Sequential Decision Making Dynamic programming Christos Dimitrakakis Intelligent Autonomous Systems, IvI, University of Amsterdam, The Netherlands March 18, 2008 Introduction Some examples Dynamic programming

More information

3 Arbitrage pricing theory in discrete time.

3 Arbitrage pricing theory in discrete time. 3 Arbitrage pricing theory in discrete time. Orientation. In the examples studied in Chapter 1, we worked with a single period model and Gaussian returns; in this Chapter, we shall drop these assumptions

More information

Two-Dimensional Bayesian Persuasion

Two-Dimensional Bayesian Persuasion Two-Dimensional Bayesian Persuasion Davit Khantadze September 30, 017 Abstract We are interested in optimal signals for the sender when the decision maker (receiver) has to make two separate decisions.

More information

Dynamic Admission and Service Rate Control of a Queue

Dynamic Admission and Service Rate Control of a Queue Dynamic Admission and Service Rate Control of a Queue Kranthi Mitra Adusumilli and John J. Hasenbein 1 Graduate Program in Operations Research and Industrial Engineering Department of Mechanical Engineering

More information

Characterization of the Optimum

Characterization of the Optimum ECO 317 Economics of Uncertainty Fall Term 2009 Notes for lectures 5. Portfolio Allocation with One Riskless, One Risky Asset Characterization of the Optimum Consider a risk-averse, expected-utility-maximizing

More information

Techniques for Calculating the Efficient Frontier

Techniques for Calculating the Efficient Frontier Techniques for Calculating the Efficient Frontier Weerachart Kilenthong RIPED, UTCC c Kilenthong 2017 Tee (Riped) Introduction 1 / 43 Two Fund Theorem The Two-Fund Theorem states that we can reach any

More information

Maximizing the Spread of Influence through a Social Network Problem/Motivation: Suppose we want to market a product or promote an idea or behavior in

Maximizing the Spread of Influence through a Social Network Problem/Motivation: Suppose we want to market a product or promote an idea or behavior in Maximizing the Spread of Influence through a Social Network Problem/Motivation: Suppose we want to market a product or promote an idea or behavior in a society. In order to do so, we can target individuals,

More information

Competitive Market Model

Competitive Market Model 57 Chapter 5 Competitive Market Model The competitive market model serves as the basis for the two different multi-user allocation methods presented in this thesis. This market model prices resources based

More information

Extraction capacity and the optimal order of extraction. By: Stephen P. Holland

Extraction capacity and the optimal order of extraction. By: Stephen P. Holland Extraction capacity and the optimal order of extraction By: Stephen P. Holland Holland, Stephen P. (2003) Extraction Capacity and the Optimal Order of Extraction, Journal of Environmental Economics and

More information

Department of Mathematics. Mathematics of Financial Derivatives

Department of Mathematics. Mathematics of Financial Derivatives Department of Mathematics MA408 Mathematics of Financial Derivatives Thursday 15th January, 2009 2pm 4pm Duration: 2 hours Attempt THREE questions MA408 Page 1 of 5 1. (a) Suppose 0 < E 1 < E 3 and E 2

More information

AMH4 - ADVANCED OPTION PRICING. Contents

AMH4 - ADVANCED OPTION PRICING. Contents AMH4 - ADVANCED OPTION PRICING ANDREW TULLOCH Contents 1. Theory of Option Pricing 2 2. Black-Scholes PDE Method 4 3. Martingale method 4 4. Monte Carlo methods 5 4.1. Method of antithetic variances 5

More information

ECE 586GT: Problem Set 1: Problems and Solutions Analysis of static games

ECE 586GT: Problem Set 1: Problems and Solutions Analysis of static games University of Illinois Fall 2018 ECE 586GT: Problem Set 1: Problems and Solutions Analysis of static games Due: Tuesday, Sept. 11, at beginning of class Reading: Course notes, Sections 1.1-1.4 1. [A random

More information

Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017

Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017 Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017 The time limit for this exam is four hours. The exam has four sections. Each section includes two questions.

More information

13.3 A Stochastic Production Planning Model

13.3 A Stochastic Production Planning Model 13.3. A Stochastic Production Planning Model 347 From (13.9), we can formally write (dx t ) = f (dt) + G (dz t ) + fgdz t dt, (13.3) dx t dt = f(dt) + Gdz t dt. (13.33) The exact meaning of these expressions

More information

Decomposition Methods

Decomposition Methods Decomposition Methods separable problems, complicating variables primal decomposition dual decomposition complicating constraints general decomposition structures Prof. S. Boyd, EE364b, Stanford University

More information

The Correlation Smile Recovery

The Correlation Smile Recovery Fortis Bank Equity & Credit Derivatives Quantitative Research The Correlation Smile Recovery E. Vandenbrande, A. Vandendorpe, Y. Nesterov, P. Van Dooren draft version : March 2, 2009 1 Introduction Pricing

More information

6 -AL- ONE MACHINE SEQUENCING TO MINIMIZE MEAN FLOW TIME WITH MINIMUM NUMBER TARDY. Hamilton Emmons \,«* Technical Memorandum No. 2.

6 -AL- ONE MACHINE SEQUENCING TO MINIMIZE MEAN FLOW TIME WITH MINIMUM NUMBER TARDY. Hamilton Emmons \,«* Technical Memorandum No. 2. li. 1. 6 -AL- ONE MACHINE SEQUENCING TO MINIMIZE MEAN FLOW TIME WITH MINIMUM NUMBER TARDY f \,«* Hamilton Emmons Technical Memorandum No. 2 May, 1973 1 il 1 Abstract The problem of sequencing n jobs on

More information

Mengdi Wang. July 3rd, Laboratory for Information and Decision Systems, M.I.T.

Mengdi Wang. July 3rd, Laboratory for Information and Decision Systems, M.I.T. Practice July 3rd, 2012 Laboratory for Information and Decision Systems, M.I.T. 1 2 Infinite-Horizon DP Minimize over policies the objective cost function J π (x 0 ) = lim N E w k,k=0,1,... DP π = {µ 0,µ

More information

Lecture Notes 1

Lecture Notes 1 4.45 Lecture Notes Guido Lorenzoni Fall 2009 A portfolio problem To set the stage, consider a simple nite horizon problem. A risk averse agent can invest in two assets: riskless asset (bond) pays gross

More information

Bargaining and Competition Revisited Takashi Kunimoto and Roberto Serrano

Bargaining and Competition Revisited Takashi Kunimoto and Roberto Serrano Bargaining and Competition Revisited Takashi Kunimoto and Roberto Serrano Department of Economics Brown University Providence, RI 02912, U.S.A. Working Paper No. 2002-14 May 2002 www.econ.brown.edu/faculty/serrano/pdfs/wp2002-14.pdf

More information

ELEMENTS OF MATRIX MATHEMATICS

ELEMENTS OF MATRIX MATHEMATICS QRMC07 9/7/0 4:45 PM Page 5 CHAPTER SEVEN ELEMENTS OF MATRIX MATHEMATICS 7. AN INTRODUCTION TO MATRICES Investors frequently encounter situations involving numerous potential outcomes, many discrete periods

More information