STP Problem Set 3 Solutions
4.4) Consider the separable sequential allocation problem introduced in Sections 3.3.3 and 4.6.3, where the goal is to maximize the sum
\[ f(x_1, \ldots, x_N) = \sum_{t=1}^{N} g(x_t) \]
subject to the constraints
\[ \sum_{i=1}^{N} x_i = M, \qquad x_1, \ldots, x_N \geq 0. \]
As explained in Section 3.3.3, this optimization problem can be formulated as a deterministic dynamic program by letting $S = [0, M]$, $A_s = [0, s]$, $r_t(s, a) = g(a)$ for $t = 1, \ldots, N - 1$ and $r_N(s) = g(s)$, with dynamical equation $s_{t+1} = s_t - a_t$. For this exercise, we will assume that the function $g(a)$ is strictly concave on the set $[0, M]$, meaning that for all points $x, y \in [0, M]$ and all $p \in [0, 1]$ we have
\[ p\,g(x) + (1 - p)\,g(y) \leq g\big(p x + (1 - p) y\big), \]
with strict inequality whenever $p \in (0, 1)$ and $x \neq y$.

If we let $u_t^*(s)$, $t = 1, \ldots, N$, denote the optimal value functions for this problem, then these satisfy the Bellman equations
\[ u_t^*(s) = \sup_{0 \leq a \leq s} \big\{ g(a) + u_{t+1}^*(s - a) \big\}, \quad t = 1, \ldots, N - 1, \qquad u_N^*(s) = g(s), \]
which can be solved using backwards recursion. For example, when $t = N - 1$, we have
\[ u_{N-1}^*(s) = \sup_{0 \leq a \leq s} \big\{ g(a) + g(s - a) \big\} = 2\,g(s/2), \]
since the concavity of $g$ implies that
\[ \tfrac{1}{2} g(a) + \tfrac{1}{2} g(s - a) \leq g\big( \tfrac{1}{2} a + \tfrac{1}{2} (s - a) \big) = g(s/2), \]
with equality if and only if $a = d_{N-1}^*(s) = s/2$. Similarly, when $t = N - 2$, then
\[ u_{N-2}^*(s) = \sup_{0 \leq a \leq s} \Big\{ g(a) + 2\,g\Big( \frac{s - a}{2} \Big) \Big\} = 3\,g(s/3), \]
since the concavity of $g$ implies that
\[ \tfrac{1}{3} g(a) + \tfrac{2}{3}\, g\Big( \frac{s - a}{2} \Big) \leq g\Big( \tfrac{1}{3} a + \tfrac{2}{3} \cdot \frac{s - a}{2} \Big) = g(s/3), \]
with equality if and only if $a = d_{N-2}^*(s) = s/3$.

These calculations suggest that the following general formula holds:
\[ u_{N-t+1}^*(s) = t\,g(s/t) \tag{1} \]
for $t = 1, \ldots, N$, with the unique optimal decision rule being $d_{N-t+1}^*(s) = s/t$. This can be verified by backwards induction. We first note that (1) holds for $t = 1, 2, 3$ as above. Suppose, then, that (1) is true for $t = 1, \ldots, n$. Then
\[ u_{N-n}^*(s) = \sup_{0 \leq a \leq s} \big\{ g(a) + u_{N-n+1}^*(s - a) \big\} = \sup_{0 \leq a \leq s} \Big\{ g(a) + n\,g\Big( \frac{s - a}{n} \Big) \Big\} = (n + 1)\,g\Big( \frac{s}{n + 1} \Big). \]
Indeed, the third equality follows from the concavity of $g$, since
\[ \frac{1}{n + 1}\, g(a) + \frac{n}{n + 1}\, g\Big( \frac{s - a}{n} \Big) \leq g\Big( \frac{1}{n + 1}\, a + \frac{n}{n + 1} \cdot \frac{s - a}{n} \Big) = g\Big( \frac{s}{n + 1} \Big), \]
with equality if and only if $a = d_{N-n}^*(s) = s/(n + 1)$.

The optimal allocation pattern can now be determined by forward recursion using the optimal decision rules given above. When $t = 1$, we have $s_1 = M$, $a_1 = d_1^*(M) = M/N$ and $r(s_1, a_1) = g(M/N)$. Likewise, when $t = 2$, we have $s_2 = M(N - 1)/N$, $a_2 = d_2^*(M(N - 1)/N) = M/N$ and $r(s_2, a_2) = g(M/N)$. Indeed, by forward induction, we can establish that the optimal sequence of states, actions and rewards is
\[ s_t = \frac{M(N - t + 1)}{N}, \qquad a_t = \frac{M}{N}, \qquad r(s_t, a_t) = g(M/N) \]
for $t = 1, \ldots, N$.

Note: In Section 4.6.3, the problem is reformulated as a minimization of the sum $f(x_1, \ldots, x_N) = \sum_{t=1}^{N} g(x_t)$, and then we need to assume that $g$ is convex.
4.5) Consider the separable sequential allocation problem as formulated in the preceding exercise, but now suppose that $g$ is convex, i.e.,
\[ p\,g(x) + (1 - p)\,g(y) \geq g\big(p x + (1 - p) y\big). \]
As before, we can solve the Bellman equations using backwards recursion. When $t = N - 1$, we have
\[ u_{N-1}^*(s) = \sup_{0 \leq a \leq s} \big\{ g(a) + g(s - a) \big\} = g(0) + g(s), \]
with maxima at $a = 0$ or $a = s$. Indeed, the convexity of $g$ implies that for any $a \in [0, s]$ we have
\[ g(a) \leq \Big( 1 - \frac{a}{s} \Big) g(0) + \frac{a}{s}\, g(s) \qquad \text{and} \qquad g(s - a) \leq \frac{a}{s}\, g(0) + \Big( 1 - \frac{a}{s} \Big) g(s), \]
and summing these inequalities gives
\[ g(a) + g(s - a) \leq g(0) + g(s) \]
for all such $a$, with equality when $a = 0$ or $a = s$. If we next let $t = N - 2$, then
\[ u_{N-2}^*(s) = \sup_{0 \leq a \leq s} \big\{ g(a) + g(0) + g(s - a) \big\} = 2\,g(0) + g(s) \]
by the argument given above, with maxima again at $a = 0$ or $a = s$. Thus, by backwards induction, we can establish that for all $t = 1, \ldots, N$
\[ u_t^*(s) = (N - t)\,g(0) + g(s) \]
and that $d_t^*(s) \in \{0, s\}$. It follows that there are at least $N$ different optimal allocation patterns, $\pi_1^*, \ldots, \pi_N^*$, where $\pi_i^*$ is the pattern obtained by allocating all $M$ of the resources during the $i$th decision epoch and 0 during all of the others.
4.20) Consider the equipment replacement model with two decision epochs ($N = 3$), where the condition of the equipment decays from epoch to epoch according to the equation $s_{t+1} = s_t + X_t$, where $X_t$ is geometrically distributed with parameter $\pi = 0.4$, i.e., $P(X_t = k) = (1 - \pi)\pi^k$ for $k = 0, 1, 2, \ldots$, and the costs are determined by the parameters $R = 0$, $K = 5$, $h(s) = 2s$, and $r_3(s) = (5 - s)^+$.

By Theorem 4.7.5, we know that this problem is monotone, which implies that for each decision epoch $t < N$ there exists a threshold value $s_t^*$ such that the optimal decision rule takes the form $d_t^*(s) = 0$ if $s < s_t^*$ and $d_t^*(s) = 1$ otherwise. In other words, the optimal policy is to order a replacement in epoch $t$ if and only if the condition of the equipment is greater than or equal to level $s_t^*$ (higher values indicating poorer condition).

This policy can be found by carrying out the monotone backward induction algorithm. We begin by observing that $u_3^*(s) = (5 - s)^+$ for all $s \geq 0$. Upon substituting this expression into the optimal value equations, we then have
\[ u_2^*(s) = \max\big\{ -2s + E[(5 - s - X)^+],\; -5 + E[(5 - X)^+] \big\}, \]
where $X \sim \text{Geometric}(\pi)$. A simple calculation shows that the values of the expression $E[(5 - s - X)^+]$ are 4.34016, 3.3504, 2.376, 1.44 and 0.6 for $s = 0, 1, 2, 3$ and 4, respectively, and 0 for $s \geq 5$. Accordingly, we obtain the following values for $u_2^*(s)$:
\[ u_2^*(s) = \begin{cases} 4.34016 & \text{if } s = 0 \\ 1.3504 & \text{if } s = 1 \\ -0.65984 & \text{if } s \geq 2, \end{cases} \]
and we see that $s_2^* = 2$. Using the optimal value equations one more time gives
\[ u_1^*(s) = \max\big\{ -2s + E[u_2^*(s + X)],\; -5 + E[u_2^*(X)] \big\}, \]
where the expectations $E[u_2^*(s + X)]$ are equal to 2.8226176, 0.546304 and $-0.65984$ for $s = 0, 1$ or 2. Substituting these into the optimality equation shows that
\[ u_1^*(s) = \begin{cases} 2.8226176 & \text{if } s = 0 \\ -1.453696 & \text{if } s = 1 \\ -2.1773824 & \text{if } s \geq 2, \end{cases} \]
and so $s_1^* = 2$.
4.21) We consider a two-armed bandit model for a project management problem with two projects that are available for selection in each of three periods ($N = 4$). Project 1 yields a reward of one unit and always occupies state $s$, while project 2 occupies either state $t$ or state $u$. If project 2 is selected when it occupies state $t$, then it yields a reward of 0 and moves to state $u$ at the next decision epoch; if selected when it occupies state $u$, then it yields a reward of 2 units and moves to state $t$ with probability 0.5 and otherwise remains in state $u$. Assume a terminal reward of 0 and that project 2 does not change state when it is not selected.

The policy that maximizes the total expected reward over the three decision epochs can be found with the Bellman equations. Since there is no terminal reward, we have $u_4^*(\cdot) = 0$ on the state space $S = \{(s, t), (s, u)\}$, and so
\[ u_3^*((s, t)) = \max\{1, 0\} = 1, \qquad u_3^*((s, u)) = \max\{1, 2\} = 2, \]
with $d_3^*((s, t)) = 1$ and $d_3^*((s, u)) = 2$. Similarly,
\[ u_2^*((s, t)) = \max\big\{ 1 + u_3^*((s, t)),\; 0 + u_3^*((s, u)) \big\} = 2, \]
\[ u_2^*((s, u)) = \max\big\{ 1 + u_3^*((s, u)),\; 2 + 0.5\big( u_3^*((s, t)) + u_3^*((s, u)) \big) \big\} = 3.5, \]
with $d_2^*((s, t)) \in \{1, 2\}$ (both choices are optimal) and $d_2^*((s, u)) = 2$. Lastly,
\[ u_1^*((s, t)) = \max\big\{ 1 + u_2^*((s, t)),\; 0 + u_2^*((s, u)) \big\} = 3.5, \]
\[ u_1^*((s, u)) = \max\big\{ 1 + u_2^*((s, u)),\; 2 + 0.5\big( u_2^*((s, t)) + u_2^*((s, u)) \big) \big\} = 4.75, \]
with $d_1^*((s, t)) = d_1^*((s, u)) = 2$.

4.22) Our aim is to find a foraging strategy that maximizes the probability of survival over 30 days, using the model of lion foraging described in the problem statement. Recall that the reward function for this problem is
\[ r_N(s) = \begin{cases} 1 & \text{if } s > 0 \\ 0 & \text{if } s = 0, \end{cases} \]
with $r_t(s, a) = 0$ for all $s$ and $a$ when $t = 1, \ldots, 29$. Here $s \in S = [0, 30]$ specifies the energy reserves of an individual lion, with $s = 0$ corresponding to death of that lion, while the action $a \in A_s = \{0, \ldots, 6\}$ is either $a = 0$ if the lion chooses not to hunt on that day or $a \geq 1$ if the lion chooses to hunt in a group of size $a$.
I will assume that the lion can choose to hunt whenever its energy reserves are positive, but we could also reasonably require $s > 0.5$ since each hunt uses this amount of energy. Since $u_{30}^*(s) = r_{30}(s)$, the optimal value equations for $t = 29$ take the form
\[ u_{29}^*(s) = \max_a E[u_{30}^*(X_{30}) \mid X_{29} = s, Y_{29} = a] \]
\[ = \max\Big\{ u_{30}^*\big((s - 6)^+\big),\; \max_{1 \leq a \leq 6} \big\{ p_a\, u_{30}^*(\Phi(s, a)) + (1 - p_a)\, u_{30}^*\big((s - 6.5)^+\big) \big\} \Big\} \]
\[ = \begin{cases} 0 & \text{if } s = 0 \\ 0.43 & \text{if } 0 < s \leq 6 \\ 1 & \text{if } 6 < s \leq 30, \end{cases} \]
where $p_a$ is the probability of a successful hunt by a group containing $a$ animals and the function
\[ \Phi(s, a) \triangleq \min\big( 30,\; s + 164/a - 6.5 \big) \]
specifies the lion's energy reserves following a successful hunt by such a group. When $s > 6.5$, all actions are optimal on day 29 (with respect to this criterion), and every optimal decision rule is consistent with the following specification:
\[ d_{29}^*(s) = \begin{cases} 6 & \text{if } 0 < s \leq 6 \\ 0 & \text{if } 6 < s \leq 6.5 \\ 0, \ldots, 6 & \text{if } 6.5 < s \leq 30. \end{cases} \]
In general, the optimal value equations for this problem take the form
\[ u_t^*(s) = \begin{cases} 0 & \text{if } s = 0 \\ \max\Big\{ u_{t+1}^*\big((s - 6)^+\big),\; \max\limits_{1 \leq a \leq 6} \big\{ p_a\, u_{t+1}^*(\Phi(s, a)) + (1 - p_a)\, u_{t+1}^*\big((s - 6.5)^+\big) \big\} \Big\} & \text{otherwise.} \end{cases} \]
These are most easily solved with the help of a computer; a sample C program that does this is attached. When this program is executed, one finds that there is a unique optimal decision rule for days 1 to 20, while different rules are optimal on days 21 to 29; moreover, multiple rules are optimal on days 25 to 29. These are indicated below:
\[ d_t^*(s) = \begin{cases} 0 & \text{if } s \in (6, 6.5] \cup (12.5, 13] \cup (19, 19.5] \cup (25.5, 26] \\ 5 & \text{if } s \in (0, 31/6) \\ 6 & \text{if } s \in (31/6, 6] \cup (6.5, 12.5] \cup (13, 19] \cup (19.5, 25.5] \cup (26, 30] \end{cases} \]
for $t = 1, \ldots, 20$;
\[ d_t^*(s) = \begin{cases} 0 & \text{if } s \in (6, 6.5] \cup (12.5, 13] \cup (19, 19.5] \cup (25.5, 26] \\ 5 & \text{if } s \in (0, 14/3] \\ 6 & \text{if } s \in (14/3, 6] \cup (6.5, 12.5] \cup (13, 19] \cup (19.5, 25.5] \cup (26, 30] \end{cases} \]
for $t = 21, \ldots, 23$;
\[ d_t^*(s) = \begin{cases} 0 & \text{if } s \in (6, 6.5] \cup (12.5, 13] \cup (19, 19.5] \cup (25.5, 26] \\ 5 & \text{if } s \in (0, 31/6] \\ 6 & \text{if } s \in (31/6, 6] \cup (6.5, 12.5] \cup (13, 19] \cup (19.5, 25.5] \cup (26, 30] \end{cases} \]
for $t = 24$; and
\[ d_t^*(s) = \begin{cases} 5 & \text{if } s \in (0, 19/6] \\ 6 & \text{if } s \in (19/6, 6] \cup (6.5, 12.5] \cup (13, 19] \cup (19.5, 25.5] \cup (26, 30] \\ 0, 6 & \text{if } s \in (6, 6.5] \cup (12.5, 13] \cup (19, 19.5] \cup (25.5, 26] \end{cases} \]
for $t = 25$;
\[ d_t^*(s) = \begin{cases} 0 & \text{if } s \in (24, 24.5] \\ 6 & \text{if } s \in (0, 6] \cup (6.5, 12.5] \cup (13, 19] \cup (19.5, 24] \\ 0, 6 & \text{if } s \in (6, 6.5] \cup (12.5, 13] \cup (19, 19.5] \\ 0, \ldots, 6 & \text{if } s \in (24.5, 30] \end{cases} \]
for $t = 26$;
\[ d_t^*(s) = \begin{cases} 0 & \text{if } s \in (18, 18.5] \\ 6 & \text{if } s \in (0, 6] \cup (6.5, 12.5] \cup (13, 18] \\ 0, 6 & \text{if } s \in (6, 6.5] \cup (12.5, 13] \\ 0, \ldots, 6 & \text{if } s \in (18.5, 30] \end{cases} \]
for $t = 27$;
\[ d_t^*(s) = \begin{cases} 0 & \text{if } s \in (12, 12.5] \\ 6 & \text{if } s \in (0, 6] \cup (6.5, 12] \\ 0, 6 & \text{if } s \in (6, 6.5] \\ 0, \ldots, 6 & \text{if } s \in (12.5, 30] \end{cases} \]
for $t = 28$; and
\[ d_t^*(s) = \begin{cases} 0 & \text{if } s \in (6, 6.5] \\ 6 & \text{if } s \in (0, 6] \\ 0, \ldots, 6 & \text{if } s \in (6.5, 30] \end{cases} \]
for $t = 29$.

The near-periodicity of the optimal decision rules for days 1 to 23 seems odd and is probably not robust to natural variation in daily energy budgets or storage capacity. Furthermore, the optimality of foraging in a group of size 5 when an individual's energy reserves are low is at least partly an artifact of the assumption that energy stores exceeding the maximum storage capacity can be used to satisfy daily energy costs. For example, if the model is modified by setting
\[ \Phi(s, a) \triangleq \big( \min(30,\; s + 164/a) - 6.5 \big)^+, \]
then the optimal policy no longer has this feature. Arguably, the most robust prediction of the model is that lions should hunt in large groups when possible, as this increases the likelihood that the hunt is successful and only marginally reduces the amount by which an individual's energy reserves increase following a successful hunt.
lion_mdp.c:

```c
/* Solution of the Bellman equations for the lion foraging model in Puterman */
#include <stdio.h>

int main(void)
{
    int t, i, gain, loss, a, flag;
    int dec[9001][7];           /* dec[i][a] = 1 if action a is optimal in state i */
    double p[7];
    double pmax, psurv[7];
    static double u[31][9001];  /* states: s = i/300 for i = 0,...,9000 */

    /* capture probabilities */
    p[1] = 0.15; p[2] = 0.33; p[3] = 0.37;
    p[4] = 0.4;  p[5] = 0.42; p[6] = 0.43;

    /* boundary conditions for the policy evaluation algorithm */
    u[30][0] = 0;
    for (i = 1; i <= 9000; i++) u[30][i] = 1;

    /* dead state: decision rule is printed as "0" */
    for (a = 0; a <= 6; a++) dec[0][a] = 0;
    dec[0][0] = 1;

    /* backwards induction */
    for (t = 29; t >= 1; t--) {
        printf("\nt = %d\n", t);
        u[t][0] = 0.0;
        for (i = 1; i <= 9000; i++) {
            /* a = 0: lion doesn't hunt (daily cost 6 kg = 1800/300) */
            if (i > 1800) psurv[0] = u[t+1][i - 1800];
            else psurv[0] = 0;
            pmax = psurv[0];

            /* a > 0: lion hunts in a group of size a (cost 6.5 kg = 1950/300;
               a successful hunt yields 164/a kg per lion, i.e. 49200/(300a) --
               the kill size was lost in transcription and is reconstructed here
               from the thresholds reported in the solution) */
            for (a = 1; a <= 6; a++) {
                gain = i - 1950 + 49200 / a;   /* successful hunt */
                if (gain > 9000) gain = 9000;
                if (gain < 0) gain = 0;
                loss = i - 1950;               /* unsuccessful hunt */
                if (loss < 0) loss = 0;
                psurv[a] = p[a] * u[t+1][gain] + (1 - p[a]) * u[t+1][loss];
                if (psurv[a] > pmax) pmax = psurv[a];
            }

            for (a = 0; a <= 6; a++) {
                dec[i][a] = 0;
                if (psurv[a] == pmax)
                    dec[i][a] = 1;  /* mark the optimal actions */
            }
            u[t][i] = pmax;         /* survival probability under the optimal action */
        }

        printf("i = 0 (s = %lf): 0\n", 0.0);
        for (i = 1; i <= 9000; i++) {
            flag = 0;
            for (a = 0; a <= 6; a++)
                if (dec[i][a] != dec[i-1][a]) flag = 1;
            /* print optimal decision rules at break points */
            if (flag == 1) {
                printf("i = %d (s = %lf): ", i, ((double) i) / 300);
                for (a = 0; a <= 6; a++)
                    if (dec[i][a] == 1) printf("%d ", a);
                printf("\n");
            }
        }
    }
    return 0;
}
```
4.30) We need to find an optimal policy for a call option to purchase 100 shares of stock at a cost of \$31 per share over a 30 day period, when the initial price is \$30 per share and the daily price increases by \$0.10 with probability 0.6, remains the same with probability 0.1, and decreases by \$0.10 with probability 0.3. We assume that the transaction cost is \$50 and that the option expires at the end of the 30 day period. This is an example of an American call option, since the buyer has the right to purchase the stock at any time within those 30 days. In contrast, a European call option would allow the buyer to purchase the stock only on the expiration date. In either case, the buyer is not obligated to purchase the stock.

As explained in Section 3.4.4, we can formulate this model as an optimal stopping problem with no holding cost and reward
\[ r_t(s, Q) = 100(s - 31) - 50 = 100s - 3150 \]
when $s \in [0, \infty)$, and $r_t(s, C) = r_t(\Delta, Q) = 0$ for all $s \in S$ and all $t = 1, \ldots, 30$, where $\Delta$ denotes the cemetery state entered once the option has been exercised. Because the cemetery state is absorbing, it is clear that $u_t^*(\Delta) = 0$ for all $t = 1, \ldots, 31$. Furthermore, because the option has no value after its expiration date, we also know that $u_{31}^*(s) = r_{31}(s) = 0$ for all $s \in S$. Accordingly, the optimal value equation for $t = 30$ has the form
\[ u_{30}^*(s) = \max\{ 0,\; 100s - 3150 \} = (100s - 3150)^+, \]
which shows that the option to purchase the shares should be exercised on the 30th day if and only if the price of the stock is greater than or equal to \$31.50, i.e., an optimal decision rule for $t = 30$ is
\[ d_{30}^*(s) = \begin{cases} Q & \text{if } s \geq 31.5 \\ C & \text{otherwise.} \end{cases} \]
Similarly, the optimal value equations for $t \leq 29$ can be written as
\[ u_t^*(s) = \max\big\{ 0.6\, u_{t+1}^*(s + 0.1) + 0.1\, u_{t+1}^*(s) + 0.3\, u_{t+1}^*\big((s - 0.1)^+\big),\; 100s - 3150 \big\} \tag{2} \]
and I claim that the maximum is achieved when $d_t(s) = C$, so that
\[ u_t^*(s) = 0.6\, u_{t+1}^*(s + 0.1) + 0.1\, u_{t+1}^*(s) + 0.3\, u_{t+1}^*\big((s - 0.1)^+\big). \tag{3} \]
To verify this claim, first note that because the functions $u_{30}^*(s)$ and $100s - 3150$ are both convex, it follows that $u_{29}^*(s)$ is also convex: the expectation in (2) is a non-negative combination of convex functions of $s$, and the maximum of two convex functions is again a convex function. This in turn implies that $u_{28}^*(s)$ is convex, and then backwards induction allows us to conclude that all of the optimal value functions $u_t^*(s)$, $t = 1, \ldots, 30$, are convex. Similarly, either by induction or by an appeal to Proposition 4.7.3, we can conclude that all of the optimal value functions are non-decreasing. Taken together, these two properties imply that
\[ u_t^*(s) \geq 0.6\, u_{t+1}^*(s + 0.1) + 0.1\, u_{t+1}^*(s) + 0.3\, u_{t+1}^*\big((s - 0.1)^+\big) \]
\[ \geq 0.6\, u_{t+1}^*(s + 0.1) + 0.1\, u_{t+1}^*(s) + 0.3\, u_{t+1}^*(s - 0.1) \]
\[ \geq u_{t+1}^*\big( 0.6(s + 0.1) + 0.1 s + 0.3(s - 0.1) \big) = u_{t+1}^*(s + 0.03) \geq u_{t+1}^*(s). \]
However, since $u_{30}^*(s) \geq 100s - 3150$ for all $s \in \mathbb{R}$, a third induction argument shows that $u_t^*(s) \geq 100s - 3150$ for all $t = 1, \ldots, 30$, which then leads to (3).
These arguments have shown that the optimal policy for this problem is to wait until the expiration date and then purchase the shares if their price is at least \$31.50 per share. Thus, to calculate the value of the option when the initial price is \$30 per share, we merely need to evaluate $u_1^*(30)$, which we can do with the help of the policy evaluation algorithm and a computer (see the attached C code). This shows that the value of the option is very small. Indeed, since on average the share price increases by only three cents a day, the expected price of the stock at the end of the 30 day period is only \$30.90 per share, which is below the strike price.
call_option_mdp.c:

```c
/* Solution of the Bellman equations for the call option model in Puterman */
#include <stdio.h>

int main(void)
{
    int t;
    int i;                     /* state s = i * $0.10 */
    int dec[32][601];          /* optimal decision rules */
    static double u[32][601];  /* optimal value functions */
    double valc, valq;

    /* policy evaluation algorithm */
    for (i = 0; i <= 600; i++)
        u[31][i] = 0;          /* boundary conditions: no scrap value */

    for (t = 30; t >= 1; t--) {
        for (i = 0; i <= 600 - (31 - t); i++) {
            /* valc = expected value if the shares are not purchased */
            if (i > 0)
                valc = 0.6 * u[t+1][i+1] + 0.1 * u[t+1][i] + 0.3 * u[t+1][i-1];
            else
                valc = 0.6 * u[t+1][i+1] + 0.4 * u[t+1][i];

            /* valq = expected value if the shares are purchased */
            valq = 10 * ((double) (i - 310)) - 50;

            if (valc > valq) {
                u[t][i] = valc;
                dec[t][i] = 0;  /* C is optimal */
            } else {
                u[t][i] = valq;
                dec[t][i] = 1;  /* Q is optimal */
            }
        }
        printf("t = %d, u_t(30) = %lf, d_t(30) = %d\n", t, u[t][300], dec[t][300]);
    }

    printf("value of option = %lf\n", u[1][300]);
    return 0;
}
```
More information13.3 A Stochastic Production Planning Model
13.3. A Stochastic Production Planning Model 347 From (13.9), we can formally write (dx t ) = f (dt) + G (dz t ) + fgdz t dt, (13.3) dx t dt = f(dt) + Gdz t dt. (13.33) The exact meaning of these expressions
More informationAM 121: Intro to Optimization Models and Methods
AM 121: Intro to Optimization Models and Methods Lecture 18: Markov Decision Processes Yiling Chen and David Parkes Lesson Plan Markov decision processes Policies and Value functions Solving: average reward,
More informationOn solving multistage stochastic programs with coherent risk measures
On solving multistage stochastic programs with coherent risk measures Andy Philpott Vitor de Matos y Erlon Finardi z August 13, 2012 Abstract We consider a class of multistage stochastic linear programs
More informationDynamic Programming (DP) Massimo Paolucci University of Genova
Dynamic Programming (DP) Massimo Paolucci University of Genova DP cannot be applied to each kind of problem In particular, it is a solution method for problems defined over stages For each stage a subproblem
More informationUtility Indifference Pricing and Dynamic Programming Algorithm
Chapter 8 Utility Indifference ricing and Dynamic rogramming Algorithm In the Black-Scholes framework, we can perfectly replicate an option s payoff. However, it may not be true beyond the Black-Scholes
More informationSequential Investment, Hold-up, and Strategic Delay
Sequential Investment, Hold-up, and Strategic Delay Juyan Zhang and Yi Zhang February 20, 2011 Abstract We investigate hold-up in the case of both simultaneous and sequential investment. We show that if
More informationUncertainty in Equilibrium
Uncertainty in Equilibrium Larry Blume May 1, 2007 1 Introduction The state-preference approach to uncertainty of Kenneth J. Arrow (1953) and Gérard Debreu (1959) lends itself rather easily to Walrasian
More informationB. Online Appendix. where ɛ may be arbitrarily chosen to satisfy 0 < ɛ < s 1 and s 1 is defined in (B1). This can be rewritten as
B Online Appendix B1 Constructing examples with nonmonotonic adoption policies Assume c > 0 and the utility function u(w) is increasing and approaches as w approaches 0 Suppose we have a prior distribution
More informationProblem 1: Random variables, common distributions and the monopoly price
Problem 1: Random variables, common distributions and the monopoly price In this problem, we will revise some basic concepts in probability, and use these to better understand the monopoly price (alternatively
More informationTDT4171 Artificial Intelligence Methods
TDT47 Artificial Intelligence Methods Lecture 7 Making Complex Decisions Norwegian University of Science and Technology Helge Langseth IT-VEST 0 helgel@idi.ntnu.no TDT47 Artificial Intelligence Methods
More informationFundamental Theorems of Welfare Economics
Fundamental Theorems of Welfare Economics Ram Singh October 4, 015 This Write-up is available at photocopy shop. Not for circulation. In this write-up we provide intuition behind the two fundamental theorems
More informationAnswers to Problem Set 4
Answers to Problem Set 4 Economics 703 Spring 016 1. a) The monopolist facing no threat of entry will pick the first cost function. To see this, calculate profits with each one. With the first cost function,
More informationHomework 2: Solutions Sid Banerjee Problem 1: Practice with Dynamic Programming Formulation
Problem 1: Practice with Dynamic Programming Formulation A product manager has to order stock daily. Each unit cost is c, there is a fixed cost of K for placing an order. If you order on day t, the items
More informationNon-Deterministic Search
Non-Deterministic Search MDP s 1 Non-Deterministic Search How do you plan (search) when your actions might fail? In general case, how do you plan, when the actions have multiple possible outcomes? 2 Example:
More informationMathematics of Finance Final Preparation December 19. To be thoroughly prepared for the final exam, you should
Mathematics of Finance Final Preparation December 19 To be thoroughly prepared for the final exam, you should 1. know how to do the homework problems. 2. be able to provide (correct and complete!) definitions
More informationKIER DISCUSSION PAPER SERIES
KIER DISCUSSION PAPER SERIES KYOTO INSTITUTE OF ECONOMIC RESEARCH http://www.kier.kyoto-u.ac.jp/index.html Discussion Paper No. 657 The Buy Price in Auctions with Discrete Type Distributions Yusuke Inami
More informationThe Uncertain Volatility Model
The Uncertain Volatility Model Claude Martini, Antoine Jacquier July 14, 008 1 Black-Scholes and realised volatility What happens when a trader uses the Black-Scholes (BS in the sequel) formula to sell
More informationAn optimal policy for joint dynamic price and lead-time quotation
Lingnan University From the SelectedWorks of Prof. LIU Liming November, 2011 An optimal policy for joint dynamic price and lead-time quotation Jiejian FENG Liming LIU, Lingnan University, Hong Kong Xianming
More informationOptimal Dam Management
Optimal Dam Management Michel De Lara et Vincent Leclère July 3, 2012 Contents 1 Problem statement 1 1.1 Dam dynamics.................................. 2 1.2 Intertemporal payoff criterion..........................
More informationThe investment game in incomplete markets.
The investment game in incomplete markets. M. R. Grasselli Mathematics and Statistics McMaster University RIO 27 Buzios, October 24, 27 Successes and imitations of Real Options Real options accurately
More informationDepartment of Economics The Ohio State University Final Exam Answers Econ 8712
Department of Economics The Ohio State University Final Exam Answers Econ 872 Prof. Peck Fall 207. (35 points) The following economy has three consumers, one firm, and four goods. Good is the labor/leisure
More information16 MAKING SIMPLE DECISIONS
253 16 MAKING SIMPLE DECISIONS Let us associate each state S with a numeric utility U(S), which expresses the desirability of the state A nondeterministic action a will have possible outcome states Result(a)
More informationDepartment of Economics The Ohio State University Final Exam Questions and Answers Econ 8712
Prof. Peck Fall 016 Department of Economics The Ohio State University Final Exam Questions and Answers Econ 871 1. (35 points) The following economy has one consumer, two firms, and four goods. Goods 1
More informationMicroeconomic Foundations I Choice and Competitive Markets. David M. Kreps
Microeconomic Foundations I Choice and Competitive Markets David M. Kreps PRINCETON UNIVERSITY PRESS PRINCETON AND OXFORD Contents Preface xiii Chapter One. Choice, Preference, and Utility 1 1.1. Consumer
More informationTerm Structure Lattice Models
IEOR E4706: Foundations of Financial Engineering c 2016 by Martin Haugh Term Structure Lattice Models These lecture notes introduce fixed income derivative securities and the modeling philosophy used to
More informationCapacity Expansion Games with Application to Competition in Power May 19, Generation 2017 Investmen 1 / 24
Capacity Expansion Games with Application to Competition in Power Generation Investments joint with René Aïd and Mike Ludkovski CFMAR 10th Anniversary Conference May 19, 017 Capacity Expansion Games with
More informationIntroduction to Dynamic Programming
Introduction to Dynamic Programming http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html Acknowledgement: this slides is based on Prof. Mengdi Wang s and Prof. Dimitri Bertsekas lecture notes Outline 2/65 1
More informationRobust Pricing and Hedging of Options on Variance
Robust Pricing and Hedging of Options on Variance Alexander Cox Jiajie Wang University of Bath Bachelier 21, Toronto Financial Setting Option priced on an underlying asset S t Dynamics of S t unspecified,
More informationStochastic Optimal Control
Stochastic Optimal Control Lecturer: Eilyan Bitar, Cornell ECE Scribe: Kevin Kircher, Cornell MAE These notes summarize some of the material from ECE 5555 (Stochastic Systems) at Cornell in the fall of
More informationForecast Horizons for Production Planning with Stochastic Demand
Forecast Horizons for Production Planning with Stochastic Demand Alfredo Garcia and Robert L. Smith Department of Industrial and Operations Engineering Universityof Michigan, Ann Arbor MI 48109 December
More informationKØBENHAVNS UNIVERSITET (Blok 2, 2011/2012) Naturvidenskabelig kandidateksamen Continuous time finance (FinKont) TIME ALLOWED : 3 hours
This question paper consists of 3 printed pages FinKont KØBENHAVNS UNIVERSITET (Blok 2, 211/212) Naturvidenskabelig kandidateksamen Continuous time finance (FinKont) TIME ALLOWED : 3 hours This exam paper
More informationDynamic Programming and Reinforcement Learning
Dynamic Programming and Reinforcement Learning Daniel Russo Columbia Business School Decision Risk and Operations Division Fall, 2017 Daniel Russo (Columbia) Fall 2017 1 / 34 Supervised Machine Learning
More informationNotes on the EM Algorithm Michael Collins, September 24th 2005
Notes on the EM Algorithm Michael Collins, September 24th 2005 1 Hidden Markov Models A hidden Markov model (N, Σ, Θ) consists of the following elements: N is a positive integer specifying the number of
More informationOptimal liquidation with market parameter shift: a forward approach
Optimal liquidation with market parameter shift: a forward approach (with S. Nadtochiy and T. Zariphopoulou) Haoran Wang Ph.D. candidate University of Texas at Austin ICERM June, 2017 Problem Setup and
More informationEE365: Risk Averse Control
EE365: Risk Averse Control Risk averse optimization Exponential risk aversion Risk averse control 1 Outline Risk averse optimization Exponential risk aversion Risk averse control Risk averse optimization
More informationAnswer: Let y 2 denote rm 2 s output of food and L 2 denote rm 2 s labor input (so
The Ohio State University Department of Economics Econ 805 Extra Problems on Production and Uncertainty: Questions and Answers Winter 003 Prof. Peck () In the following economy, there are two consumers,
More informationHomework 2: Dynamic Moral Hazard
Homework 2: Dynamic Moral Hazard Question 0 (Normal learning model) Suppose that z t = θ + ɛ t, where θ N(m 0, 1/h 0 ) and ɛ t N(0, 1/h ɛ ) are IID. Show that θ z 1 N ( hɛ z 1 h 0 + h ɛ + h 0m 0 h 0 +
More information