STP Problem Set 3 Solutions


4.4) Consider the separable sequential allocation problem introduced in Sections 3.3.3 and 4.6.3, where the goal is to maximize the sum

    f(x_1, \dots, x_N) = \sum_{t=1}^N g(x_t)

subject to the constraints

    \sum_{i=1}^N x_i = M, \quad x_1, \dots, x_N \ge 0.

As explained in Section 3.3.3, this optimization problem can be formulated as a deterministic dynamic program by letting S = [0, M], A_s = [0, s], r_t(s, a) = g(a) for t = 1, \dots, N-1 and r_N(s) = g(s), with dynamical equation s_{t+1} = s_t - a_t. For this exercise, we will assume that the function g(a) is strictly concave on the set [0, M], meaning that for all points x, y \in [0, M] and all p \in [0, 1] we have

    p g(x) + (1 - p) g(y) \le g(p x + (1 - p) y),

with strict inequality whenever p \in (0, 1) and x \ne y.

If we let u_t^*(s), t = 1, \dots, N, denote the optimal value functions for this problem, then these satisfy the Bellman equations

    u_t^*(s) = \sup_{0 \le a \le s} \{ g(a) + u_{t+1}^*(s - a) \}, \quad t = 1, \dots, N-1,
    u_N^*(s) = g(s),

which can be solved using backwards recursion. For example, when t = N - 1, we have

    u_{N-1}^*(s) = \sup_{0 \le a \le s} \{ g(a) + g(s - a) \} = 2 g(s/2),

since the concavity of g implies that

    (1/2) g(a) + (1/2) g(s - a) \le g( (1/2) a + (1/2)(s - a) ) = g(s/2),

with equality if and only if a = d_{N-1}(s) = s/2. Similarly, when t = N - 2, then

    u_{N-2}^*(s) = \sup_{0 \le a \le s} \{ g(a) + 2 g((s - a)/2) \} = 3 g(s/3),

since the concavity of g implies that

    (1/3) g(a) + (2/3) g((s - a)/2) \le g( (1/3) a + (2/3) \cdot (s - a)/2 ) = g(s/3),

with equality if and only if a = d_{N-2}(s) = s/3.

These calculations suggest that the following general formula holds:

    u_{N-t+1}^*(s) = t g(s/t)                                                (1)

for t = 1, \dots, N, with the unique optimal decision rule being d_{N-t+1}(s) = s/t. This can be verified by backwards induction. We first note that (1) holds for t = 1, 2, 3 as above. Suppose, then, that (1) is true for t = 1, \dots, n. Then

    u_{N-n}^*(s) = \sup_{0 \le a \le s} \{ g(a) + u_{N-n+1}^*(s - a) \}
                 = \sup_{0 \le a \le s} \{ g(a) + n g((s - a)/n) \}
                 = (n + 1) g( s/(n + 1) ).

Indeed, the third equality follows from the concavity of g, since

    \frac{1}{n+1} g(a) + \frac{n}{n+1} g((s - a)/n) \le g( \frac{1}{n+1} a + \frac{n}{n+1} \cdot \frac{s - a}{n} ) = g( s/(n + 1) ),

with equality if and only if a = d_{N-n}(s) = s/(n + 1).

The optimal allocation pattern can now be determined by forward recursion using the optimal decision rules given above. When t = 1, we have s_1 = M, a_1 = d_1(M) = M/N and r(s_1, a_1) = g(M/N). Likewise, when t = 2, we have s_2 = M(N-1)/N, a_2 = d_2(M(N-1)/N) = M/N and r(s_2, a_2) = g(M/N). Indeed, by forward induction, we can establish that the optimal sequence of states, actions and rewards is

    s_t = M(N - t + 1)/N, \quad a_t = M/N, \quad r(s_t, a_t) = g(M/N)

for t = 1, \dots, N.

Note: In Section 4.6.3, the problem is reformulated as a maximization of the sum

    f(x_1, \dots, x_N) = -\sum_{t=1}^N g(x_t),

and then we need to assume that g is convex.
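The closed form (1) is easy to check numerically for a particular concave function. The following sketch discretizes the backward recursion on a uniform grid; the choices g(a) = sqrt(a), N = 5, M = 10 and the grid resolution are illustrative assumptions, not part of the original problem.

#include <stdio.h>
#include <math.h>

#define GRID 200     /* grid points per unit of resource (assumed resolution) */
#define N    5       /* number of decision epochs (illustrative)              */
#define S    2000    /* = GRID * M, the grid index of the state s = M         */

static double u[N + 1][S + 1];    /* u[t][i] approximates u_t*(i / GRID)      */

/* a strictly concave g, chosen only for illustration */
static double g(double a) { return sqrt(a); }

int main(void) {
    int t, i, j;
    double M = 10.0;

    /* terminal values: u_N*(s) = g(s) */
    for (i = 0; i <= S; i++) u[N][i] = g((double)i / GRID);

    /* backward recursion: u_t*(s) = max over 0 <= a <= s of g(a) + u_{t+1}*(s - a) */
    for (t = N - 1; t >= 1; t--)
        for (i = 0; i <= S; i++) {
            double best = g(0.0) + u[t + 1][i];   /* a = 0 */
            for (j = 1; j <= i; j++) {
                double v = g((double)j / GRID) + u[t + 1][i - j];
                if (v > best) best = v;
            }
            u[t][i] = best;
        }

    /* compare with the closed form u_t*(M) = (N - t + 1) g(M / (N - t + 1)) */
    for (t = 1; t <= N; t++)
        printf("t = %d: grid value %.6f, formula %.6f\n",
               t, u[t][S], (N - t + 1) * g(M / (N - t + 1)));
    return 0;
}

Compiled with cc -lm, the grid values agree with (1) to within the discretization error (exactly, at the stages where s/t falls on a grid point).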

4.5) Consider the separable sequential allocation problem as formulated in the preceding exercise, but now suppose that g is convex, i.e.,

    p g(x) + (1 - p) g(y) \ge g(p x + (1 - p) y).

As before, we can solve the Bellman equations using backwards recursion. When t = N - 1, we have

    u_{N-1}^*(s) = \sup_{0 \le a \le s} \{ g(a) + g(s - a) \} = g(0) + g(s),

with maxima at a = 0 or a = s. Indeed, the convexity of g implies that for any a \in [0, s] we have

    g(a) \le (1 - a/s) g(0) + (a/s) g(s)
    g(s - a) \le (a/s) g(0) + (1 - a/s) g(s),

and summing these inequalities gives

    g(a) + g(s - a) \le g(0) + g(s)

for all such a, with equality when a = 0 or a = s. If we next let t = N - 2, then

    u_{N-2}^*(s) = \sup_{0 \le a \le s} \{ g(a) + g(0) + g(s - a) \} = 2 g(0) + g(s)

by the argument given above, with maxima again at a = 0 or a = s. Thus, by backwards induction, we can establish that for all t = 1, \dots, N

    u_t^*(s) = (N - t) g(0) + g(s)

and that d_t(s) \in \{0, s\}. It follows that there are at least N different optimal allocation patterns, \pi_1^*, \dots, \pi_N^*, where \pi_i^* is the pattern obtained by allocating all M of the resources during the i-th decision epoch and 0 during all of the others.
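The corner structure in the convex case can likewise be confirmed by brute force. In this sketch, the convex function g(x) = x^2, the horizon N = 3 and the budget M = 1 are all hypothetical choices made only for illustration.

#include <stdio.h>

/* a strictly convex g, chosen only for illustration */
static double g(double x) { return x * x; }

int main(void) {
    const double M = 1.0;    /* total resource (assumed)   */
    const int K = 1000;      /* grid points for the search */
    double best = -1.0, b1 = 0.0, b2 = 0.0;
    int i, j;

    /* brute-force search over x1 + x2 + x3 = M with N = 3 epochs */
    for (i = 0; i <= K; i++)
        for (j = 0; i + j <= K; j++) {
            double x1 = M * i / K, x2 = M * j / K, x3 = M - x1 - x2;
            double v = g(x1) + g(x2) + g(x3);
            if (v > best) { best = v; b1 = x1; b2 = x2; }
        }

    printf("maximum %.6f attained at (%.3f, %.3f, %.3f)\n",
           best, b1, b2, M - b1 - b2);
    printf("corner value 2 g(0) + g(M) = %.6f\n", 2 * g(0.0) + g(M));
    return 0;
}

The maximum is attained at one of the N corner allocations, in agreement with u_1^*(M) = (N - 1) g(0) + g(M).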

4.20) Consider the equipment replacement model of Section 4.7 with two decision epochs (N = 3), where the condition of the equipment decays from epoch to epoch according to the equation s_{t+1} = s_t + X_t, where X_t is geometrically distributed with parameter \pi = 0.4, and the costs are determined by the parameters R = 0, K = 5, h(s) = 2s, and r_3(s) = (5 - s)^+. By Theorem 4.7.5, we know that this problem is monotone, which implies that for each decision epoch t < N there exists a threshold value s_t^* such that the optimal decision rule takes the form d_t(s) = 0 if s < s_t^* and d_t(s) = 1 otherwise. In other words, the optimal policy is to order a replacement in epoch t if and only if the condition of the equipment is at least level s_t^* (higher values indicating poorer condition).

This policy can be found by carrying out the monotone backward induction algorithm. We begin by observing that u_3^*(s) = (5 - s)^+ for all s \ge 0. Upon substituting this expression into the optimal value equations, we then have

    u_2^*(s) = \max\{ -2s + E[(5 - s - X)^+], -5 + E[(5 - X)^+] \},

where X ~ Geometric(\pi). A simple calculation shows that the values of the expression E[(5 - s - X)^+] are 4.34016, 3.3504, 2.376, 1.44 and 0.6 for s = 0, 1, 2, 3 and 4, respectively, and 0 for s \ge 5. Accordingly, we obtain the following values for u_2^*(s):

    u_2^*(s) = { 4.34016   if s = 0
               { 1.3504    if s = 1
               { -0.65984  if s \ge 2,

and we see that s_2^* = 2. Using the optimal value equations one more time gives

    u_1^*(s) = \max\{ -2s + E[u_2^*(s + X)], -5 + E[u_2^*(X)] \},

where the expectations E[u_2^*(s + X)] are equal to 2.8226176, 0.546304 and -0.65984 for s = 0, 1 and 2, respectively. Substituting these into the optimality equation shows that

    u_1^*(s) = { 2.8226176   if s = 0
               { -1.453696   if s = 1
               { -2.1773824  if s \ge 2,

and so s_1^* = 2.
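The expectations above are finite sums and are easily tabulated. The sketch below assumes the support convention P(X = k) = (1 - \pi) \pi^k for k = 0, 1, 2, ... with \pi = 0.4, which is the convention that reproduces the quoted values 2.376, 1.44 and 0.6; everything else follows the value equations above.

#include <stdio.h>

/* assumed convention: P(X = k) = 0.6 * 0.4^k for k = 0, 1, 2, ... */
static double prob(int k) {
    double p = 0.6;
    while (k-- > 0) p *= 0.4;
    return p;
}

int main(void) {
    double Eplus[6], u2[8], eu[3], repl2, repl1, cont;
    int s, k;

    /* E[(5 - s - X)^+] is a finite sum: terms with X >= 5 - s vanish */
    for (s = 0; s <= 5; s++) {
        Eplus[s] = 0.0;
        for (k = 0; k < 5 - s; k++)
            Eplus[s] += (5.0 - s - k) * prob(k);
        printf("E[(5 - %d - X)^+] = %.5f\n", s, Eplus[s]);
    }

    /* u_2*(s) = max{ -2s + E[(5 - s - X)^+], -5 + E[(5 - X)^+] } */
    repl2 = -5.0 + Eplus[0];
    for (s = 0; s <= 7; s++) {
        cont  = -2.0 * s + (s <= 5 ? Eplus[s] : 0.0);
        u2[s] = cont > repl2 ? cont : repl2;
        printf("u_2*(%d) = %.5f (%s)\n", s, u2[s],
               cont > repl2 ? "continue" : "replace");
    }

    /* E[u_2*(s + X)]: u_2* equals the replacement value for all states >= 2 */
    for (s = 0; s <= 2; s++) {
        double tail = 1.0;
        eu[s] = 0.0;
        for (k = 0; s + k <= 1; k++) {
            eu[s] += u2[s + k] * prob(k);
            tail  -= prob(k);
        }
        eu[s] += repl2 * tail;
    }

    /* u_1*(s) = max{ -2s + E[u_2*(s + X)], -5 + E[u_2*(X)] } */
    repl1 = -5.0 + eu[0];
    for (s = 0; s <= 2; s++) {
        cont = -2.0 * s + eu[s];
        printf("u_1*(%d) = %.5f (%s)\n", s,
               cont > repl1 ? cont : repl1,
               cont > repl1 ? "continue" : "replace");
    }
    return 0;
}

Its output reproduces the thresholds s_2^* = s_1^* = 2 found above.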

4.21) We consider a two-armed bandit model for a project management problem with two projects that are available for selection in each of three periods (N = 4). Project 1 yields a reward of one unit and always occupies state s, while project 2 occupies either state t or u. If project 2 is selected when it occupies state t, then it yields a reward of 0 and moves to state u at the next decision epoch; if selected when it occupies state u, then it yields a reward of 2 units and moves to state t with probability 0.5 and otherwise remains in state u. Assume a terminal reward of 0 and that project 2 does not change state when it is not selected.

The policy that maximizes the total expected reward over the three decision epochs can be found with the Bellman equations. Since there is no terminal reward, we have u_4^* = 0 on the state space S = \{(s, t), (s, u)\}, and so

    u_3^*((s, t)) = \max\{ 1, 0 \} = 1
    u_3^*((s, u)) = \max\{ 1, 2 \} = 2,

with d_3((s, t)) = 1 and d_3((s, u)) = 2. Similarly,

    u_2^*((s, t)) = \max\{ 1 + u_3^*((s, t)), 0 + u_3^*((s, u)) \} = 2
    u_2^*((s, u)) = \max\{ 1 + u_3^*((s, u)), 2 + 0.5 (u_3^*((s, t)) + u_3^*((s, u))) \} = 3.5,

with d_2((s, t)) \in \{1, 2\} (both choices are optimal) and d_2((s, u)) = 2. Lastly,

    u_1^*((s, t)) = \max\{ 1 + u_2^*((s, t)), 0 + u_2^*((s, u)) \} = 3.5
    u_1^*((s, u)) = \max\{ 1 + u_2^*((s, u)), 2 + 0.5 (u_2^*((s, t)) + u_2^*((s, u))) \} = 4.75,

with d_1((s, t)) = d_1((s, u)) = 2.

Our aim is to find a foraging strategy that maximizes the probability of survival over 30 days using the model of lion foraging described in the problem statement. Recall that the reward function for this problem is

    r_30(s) = { 1 if s > 0
              { 0 if s = 0,

with r_t(s, a) = 0 for all s and a when t = 1, \dots, 29. Here s \in S = [0, 30] specifies the energy reserves of an individual lion, with s = 0 corresponding to the death of that lion, while the action a \in A_s = \{0, \dots, 6\} is either a = 0, if the lion chooses not to hunt on that day, or a \ge 1, if the lion chooses to hunt in a group of size a. I will assume that the lion can choose to hunt whenever its energy reserves are positive, but we could also reasonably require s > 0.5, since each hunt uses this amount of energy. Since u_30^*(s) = r_30(s), the optimal value equations for t = 29 take the form

    u_29^*(s) = \max_a E[ u_30^*(X_30) | X_29 = s, Y_29 = a ]
              = \max\{ u_30^*((s - 6)^+), \max_{1 \le a \le 6} [ p_a u_30^*(\Phi(s, a)) + (1 - p_a) u_30^*((s - 6.5)^+) ] \}
              = { 0    if s = 0
                { 0.43 if 0 < s \le 6
                { 1    if 6 < s \le 30,

where p_a is the probability of a successful hunt by a group containing a animals and the function

    \Phi(s, a) = \min( 30, (s + 30/a - 6.5)^+ )

specifies the lion's energy reserves following a successful hunt by such a group. When s > 6.5, all actions are optimal on day 29 (with respect to this criterion), and every optimal decision rule is consistent with the following specification:

    d_29^*(s) = { 6            if 0 < s \le 6
                { 0            if 6 < s \le 6.5
                { 0, \dots, 6  if 6.5 < s \le 30.

In general, the optimal value equations for this problem take the form

    u_t^*(s) = { 0 if s = 0
               { \max\{ u_{t+1}^*((s - 6)^+), \max_{1 \le a \le 6} [ p_a u_{t+1}^*(\Phi(s, a)) + (1 - p_a) u_{t+1}^*((s - 6.5)^+) ] \} otherwise.

These are most easily solved with the help of a computer, and a sample C program that does this is attached. When this program is executed, one finds that there is a unique optimal decision rule for days 1 to 20, but that different rules are optimal on days 21 to 29. Furthermore, there are multiple rules that are optimal on days 25 to 29. These are indicated below:

    d_t^*(s) = { 0 if s \in (6, 6.5] \cup (12.5, 13] \cup (19, 19.5] \cup (25.5, 26]
               { 5 if s \in (0, 31/6)
               { 6 if s \in (31/6, 6] \cup (6.5, 12.5] \cup (13, 19] \cup (19.5, 25.5] \cup (26, 30]

for t = 1, \dots, 20;

    d_t^*(s) = { 0 if s \in (6, 6.5] \cup (12.5, 13] \cup (19, 19.5] \cup (25.5, 26]
               { 5 if s \in (0, 14/3]
               { 6 if s \in (14/3, 6] \cup (6.5, 12.5] \cup (13, 19] \cup (19.5, 25.5] \cup (26, 30]

for t = 21, \dots, 23;

    d_t^*(s) = { 0 if s \in (6, 6.5] \cup (12.5, 13] \cup (19, 19.5] \cup (25.5, 26]
               { 5 if s \in (0, 31/6]
               { 6 if s \in (31/6, 6] \cup (6.5, 12.5] \cup (13, 19] \cup (19.5, 25.5] \cup (26, 30]

for t = 24;

    d_t^*(s) = { 5     if s \in (0, 19/6]
               { 6     if s \in (19/6, 6] \cup (6.5, 12.5] \cup (13, 19] \cup (19.5, 25.5] \cup (26, 30]
               { 0, 6  if s \in (6, 6.5] \cup (12.5, 13] \cup (19, 19.5] \cup (25.5, 26]

for t = 25;

    d_t^*(s) = { 0            if s \in (24, 24.5]
               { 6            if s \in (0, 6] \cup (6.5, 12.5] \cup (13, 19] \cup (19.5, 24]
               { 0, 6         if s \in (6, 6.5] \cup (12.5, 13] \cup (19, 19.5]
               { 0, \dots, 6  if s \in (24.5, 30]

for t = 26;

    d_t^*(s) = { 0            if s \in (18, 18.5]
               { 6            if s \in (0, 6] \cup (6.5, 12.5] \cup (13, 18]
               { 0, 6         if s \in (6, 6.5] \cup (12.5, 13]
               { 0, \dots, 6  if s \in (18.5, 30]

for t = 27;

    d_t^*(s) = { 0            if s \in (12, 12.5]
               { 6            if s \in (0, 6] \cup (6.5, 12]
               { 0, 6         if s \in (6, 6.5]
               { 0, \dots, 6  if s \in (12.5, 30]

for t = 28; and

    d_t^*(s) = { 0            if s \in (6, 6.5]
               { 6            if s \in (0, 6]
               { 0, \dots, 6  if s \in (6.5, 30]

for t = 29.

The near-periodicity of the optimal decision rules for days 1 to 23 seems odd and is probably not robust to natural variation in daily energy budgets or storage capacity. Furthermore, the optimality of foraging in a group of size 5 when an individual's energy reserves are low is at least partly an artifact of the assumption that energy stores exceeding the maximum storage capacity can be used to satisfy daily energy costs. For example, if the model is modified by setting

    \Phi(s, a) = ( \min(30, s + 30/a) - 6.5 )^+,

then the optimal policy no longer has this feature. Arguably, the most robust prediction of the model is that lions should hunt in large groups when possible, as this increases the likelihood that the hunt is successful and only marginally reduces the amount by which an individual's energy reserves increase following a successful hunt.

lion_mdp.c

/* Solution of the Bellman equations for the lion foraging model in Puterman */

#include <stdio.h>

int main(void) {
    int t;
    int i, gain, loss, a;
    int flag;
    static int dec[9001][7];     /* dec[i][a] = 1 if action a is optimal in state i   */
    double p[7];
    double pmax, psurv[7];
    static double u[31][9001];   /* u[t][i] = optimal survival probability on day t   */
                                 /* in state s = i/300 (static: too large for stack)  */

    /* capture probabilities */
    p[1] = 0.15; p[2] = 0.33; p[3] = 0.37;
    p[4] = 0.40; p[5] = 0.42; p[6] = 0.43;

    /* boundary conditions for the backward induction */
    u[30][0] = 0;
    for (i = 1; i <= 9000; i++) u[30][i] = 1;
    for (a = 0; a <= 6; a++) dec[0][a] = 0;   /* state 0 is absorbing (death) */

    /* backwards induction */
    for (t = 29; t >= 1; t--) {
        printf("\nt = %d\n", t);
        u[t][0] = 0.0;
        for (i = 1; i <= 9000; i++) {
            /* a = 0: lion doesn't hunt and loses 6 units (1800 grid points) */
            if (i > 1800) psurv[0] = u[t+1][i - 1800];
            else psurv[0] = 0;
            pmax = psurv[0];
            /* a > 0: lion hunts in a group of size a */
            for (a = 1; a <= 6; a++) {
                gain = i - 1950 + 30*300/a;    /* successful hunt: cost 6.5, share 30/a */
                if (gain > 9000) gain = 9000;  /* cap at the storage capacity s = 30    */
                if (gain < 0) gain = 0;
                loss = i - 1950;               /* unsuccessful hunt: cost 6.5           */
                if (loss < 0) loss = 0;
                psurv[a] = p[a]*u[t+1][gain] + (1 - p[a])*u[t+1][loss];
                if (psurv[a] > pmax) pmax = psurv[a];
            }
            for (a = 0; a <= 6; a++) {
                dec[i][a] = 0;
                if (psurv[a] == pmax) dec[i][a] = 1;  /* mark the optimal actions */
            }
            u[t][i] = pmax;   /* optimal survival probability */
        }
        printf("i = 0 (s = %lf): 0", 0.0);
        printf("\n");
        for (i = 1; i <= 9000; i++) {
            flag = 0;
            for (a = 0; a <= 6; a++)
                if (dec[i][a] != dec[i-1][a]) flag = 1;
            /* print optimal decision rules at break points */
            if (flag == 1) {
                printf("i = %d (s = %lf): ", i, ((double) i)/300);
                for (a = 0; a <= 6; a++)
                    if (dec[i][a] == 1) printf("%d ", a);
                printf("\n");
            }
        }
    }
    return 0;
}
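For the modified dynamics described above (capping reserves at the storage capacity before the daily cost is paid), only the successful-hunt update in lion_mdp.c changes. A minimal sketch of the corresponding replacement for the gain computation, under the same grid conventions as the attached program; this is an illustrative fragment, not part of the original listing:

    gain = i + 30*300/a;            /* add the per-lion share of the kill            */
    if (gain > 9000) gain = 9000;   /* energy above storage capacity is now lost ... */
    gain -= 1950;                   /* ... before the daily hunting cost is paid     */
    if (gain < 0) gain = 0;         /* reserves cannot fall below zero (death)       */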

4.30) We need to find an optimal policy for a call option to purchase 100 shares of stock at a cost of $31 per share over a 30 day period, when the initial price is $30 per share and the daily price increases by $0.10 with probability 0.6, remains the same with probability 0.1, and decreases by $0.10 with probability 0.3. We assume that the transaction cost is $50 and that the option expires at the end of the 30 day period. This is an example of an American call option, since the buyer has the right to purchase the stock at any time within those 30 days. In contrast, a European call option would allow the buyer to purchase the stock only on the expiration date. In either case, the buyer is not obligated to purchase the stock.

As explained in Section 3.4.4, we can formulate this model as an optimal stopping problem with no holding cost and reward

    r_t(s, Q) = 100(s - 31) - 50 = 100s - 3150

when s \in [0, \infty), and r_t(s, C) = r_t(\Delta, Q) = 0 for all s \in S and all t = 1, \dots, 30, where \Delta denotes the cemetery state entered once the option has been exercised. Because the cemetery state is absorbing, it is clear that u_t^*(\Delta) = 0 for all t = 1, \dots, 31. Furthermore, because the option has no value after its expiration date, we also know that u_31^*(s) = r_31(s) = 0 for all s \in S. Accordingly, the optimal value equation for t = 30 has the form

    u_30^*(s) = \max\{ 0, 100s - 3150 \} = (100s - 3150)^+,

which shows that the option to purchase the shares should be exercised on the 30th day if and only if the price of the stock is greater than or equal to $31.50, i.e., an optimal decision rule for t = 30 is

    d_30^*(s) = { Q if s \ge 31.5
                { C otherwise.

Similarly, the optimal value equations for t \le 29 can be written as

    u_t^*(s) = \max\{ 0.6 u_{t+1}^*(s + 0.1) + 0.1 u_{t+1}^*(s) + 0.3 u_{t+1}^*((s - 0.1)^+), 100s - 3150 \},    (2)

and I claim that the maximum is achieved when d_t(s) = C, so that

    u_t^*(s) = 0.6 u_{t+1}^*(s + 0.1) + 0.1 u_{t+1}^*(s) + 0.3 u_{t+1}^*((s - 0.1)^+).    (3)

To verify this claim, first note that because the functions u_30^*(s) and 100s - 3150 are both convex, it follows that u_29^*(s) is also convex (each term in the continuation value is the convex non-decreasing function u_30^* composed with a convex map, and the maximum of two convex functions is again a convex function). This in turn implies that u_28^*(s) is convex, and then backwards induction allows us to conclude that all of the optimal value functions u_t^*(s), t = 1, \dots, 30, are convex. Similarly, either by induction or by an appeal to Proposition 4.7.3, we can conclude that all of the optimal value functions are non-decreasing. Taken together, these two properties imply that

    u_t^*(s) \ge 0.6 u_{t+1}^*(s + 0.1) + 0.1 u_{t+1}^*(s) + 0.3 u_{t+1}^*((s - 0.1)^+)
             \ge 0.6 u_{t+1}^*(s + 0.1) + 0.1 u_{t+1}^*(s) + 0.3 u_{t+1}^*(s - 0.1)
             \ge u_{t+1}^*(s + 0.03)
             \ge u_{t+1}^*(s),

where the third inequality follows from Jensen's inequality, since the mean daily price increment is +0.03. However, since u_30^*(s) \ge 100s - 3150 for all s, a third induction argument shows that u_t^*(s) \ge 100s - 3150 for all t = 1, \dots, 30, which then leads to (3).

These arguments have shown that the optimal policy for this problem is to wait until the expiration date and then purchase the shares if their price is at least $31.50 per share. Thus, to calculate the value of the option when the initial price is $30 per share, we merely need to evaluate u_1^*(30), which we can do with the help of the policy evaluation algorithm and a computer (see the attached C code). This shows that the value of the option is a mere $ . Indeed, since on average the share price increases by only three cents a day, the expected price of the stock at the end of the 30 day period is only $30.90 per share, which is below the strike price.

call_option_mdp.c

/* Solution of the Bellman equations for the call option model in Puterman */

#include <stdio.h>

int main(void) {
    int t;
    int i;                 /* state: share price s = i * $0.10 */
    int dec[32][601];      /* optimal decision rules           */
    double u[32][601];     /* optimal value functions          */
    double valc, valq;

    /* Policy evaluation algorithm */
    for (i = 0; i <= 600; i++) u[31][i] = 0;   /* boundary conditions: no scrap value */

    for (t = 30; t >= 1; t--) {
        /* the loop bound keeps i + 1 inside the table at every stage */
        for (i = 0; i <= 600 - (31 - t); i++) {
            /* valc = expected value if the shares are not purchased */
            if (i > 0)
                valc = 0.6*u[t+1][i+1] + 0.1*u[t+1][i] + 0.3*u[t+1][i-1];
            else
                valc = 0.6*u[t+1][i+1] + 0.4*u[t+1][i];   /* price floor at $0 */

            /* valq = value if the shares are purchased: 100s - 3150 */
            valq = 10*((double)(i - 310)) - 50;

            if (valc > valq) {
                u[t][i] = valc;
                dec[t][i] = 0;   /* C is optimal */
            } else {
                u[t][i] = valq;
                dec[t][i] = 1;   /* Q is optimal */
            }
        }
        printf("t = %d, u_t(30) = %lf, d_t(30) = %d\n", t, u[t][300], dec[t][300]);
    }
    printf("value of option = %lf\n", u[1][300]);
    return 0;
}
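As a sanity check on the attached program, the option value u_1^*(30) can also be estimated by direct simulation, using the fact established above that exercise is optimally deferred to day 30, so that the price makes 29 daily moves under the program's indexing. The sketch below is an illustrative Monte Carlo cross-check, not part of the original solution; the seed and trial count are arbitrary choices.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    long n, trials = 1000000;    /* arbitrary number of sample paths */
    double total = 0.0;
    int day;

    srand(12345);                /* arbitrary fixed seed */
    for (n = 0; n < trials; n++) {
        double s = 30.0;         /* initial price */
        for (day = 1; day <= 29; day++) {
            double v = (double)rand() / ((double)RAND_MAX + 1.0);
            if (v < 0.6)       s += 0.1;   /* up with probability 0.6   */
            else if (v >= 0.7) s -= 0.1;   /* down with probability 0.3 */
                                           /* otherwise flat, prob. 0.1 */
        }
        /* exercise on day 30 only if it pays; the $0 price floor never
           binds here, since s >= 30 - 2.9 on every path               */
        if (100.0 * s - 3150.0 > 0.0) total += 100.0 * s - 3150.0;
    }
    printf("estimated option value = %f\n", total / trials);
    return 0;
}

The estimate should agree with the value of u[1][300] printed by the attached program up to Monte Carlo error.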
