Stochastic Optimal Control

Lecturer: Eilyan Bitar, Cornell ECE
Scribe: Kevin Kircher, Cornell MAE

These notes summarize some of the material from ECE 5555 (Stochastic Systems) at Cornell. The theme is optimal control of discrete-time stochastic systems using dynamic programming. Although the course covered estimation as well as control, these notes only cover perfectly observed systems. The primary sources are Dynamic Programming and Optimal Control by Dimitri Bertsekas; Stochastic Systems: Estimation, Identification and Adaptive Control by P.R. Kumar and Pravin Varaiya; and Prof. Eilyan Bitar's lecture notes.

Contents

Part I: Theory
- The basic optimal control problem
- Dynamic programming

Part II: Applications
- Optimal stopping
- Inventory control
- Portfolio analysis
- LQ systems

Part I: Theory

The Basic Problem

Definitions

The state of a system is a variable that separates the future from the past. The state is constrained at each time $k$ by
\[ x_k \in X_k \subseteq \mathbb{R}^n, \qquad k = 0, 1, \dots, N, \]
where $X_k$ is the feasible state space. The initial state $x_0$ is generally treated either as given, or as an $n$-dimensional random vector.

The control or input $u_k$ is constrained to a feasible control space $U_k$:
\[ u_k \in U_k(x_k) \subseteq \mathbb{R}^m, \]
where $m \le n$.

The state disturbance $w_k$ is a $p$-dimensional random vector,
\[ w_k \in W_k \subseteq \mathbb{R}^p, \]
where $p \le n$. The disturbance $w_k$ is characterized by a probability distribution $P_{w_k}(\,\cdot \mid x_k, u_k)$ which, by assumption, does not depend explicitly on prior disturbances $w_0, \dots, w_{k-1}$.

The state transition map is a function $f_k : X_k \times U_k \times W_k \to X_{k+1}$ that describes the evolution of the state in time:
\[ x_{k+1} = f_k(x_k, u_k, w_k). \]

The measurement noise $v_k$ is a $q$-dimensional random vector,
\[ v_k \in V_k \subseteq \mathbb{R}^q, \]
where $q \le n$. The noise $v_k$ is characterized by a probability distribution
\[ P_{v_k}(\,\cdot \mid x_0, \dots, x_k,\ u_0, \dots, u_{k-1},\ w_0, \dots, w_{k-1},\ v_0, \dots, v_{k-1}) \]
which (unlike the disturbance) may depend explicitly on prior states, controls, disturbances and noises.

The observation $y_k$ is a function of the state and noise:
\[ y_k = h_k(x_k, v_k) \in Y_k \subseteq \mathbb{R}^r, \]
where $r \le n$ and $h_k : X_k \times V_k \to Y_k$ is the measurement model.

The stage cost is a function $g_k : X_k \times U_k(x_k) \times W_k \to \mathbb{R}$. The terminal cost is a function $g_N : X_N \to \mathbb{R}$.

A policy or control law $\pi$ is a sequence of decision functions $\mu_k(y^k)$, where $y^k = (y_0, \dots, y_k)$. Explicitly,
\[ \pi = \big( \mu_0(y_0),\ \mu_1(y_0, y_1),\ \dots,\ \mu_{N-1}(y_0, y_1, \dots, y_{N-1}) \big) = \big( \mu_0(y^0),\ \mu_1(y^1),\ \dots,\ \mu_{N-1}(y^{N-1}) \big). \]

A policy is called admissible or feasible if, for all $k = 0, \dots, N-1$,
\[ \mu_k(y^k) \in U_k(x_k) \quad \text{for all } y^k \in Y_0 \times \dots \times Y_k, \]
and, for all $x_k \in X_k$,
\[ f_k(x_k, \mu_k(y^k), w_k) \in X_{k+1}. \]
Denote the set of all admissible policies by $\Pi$.

The expected cost of a policy $\pi$, for a given initial state $x_0$, is the expected sum of the terminal cost and all stage costs under $\pi$:
\[ J_\pi(x_0) = \mathbb{E}_{w_k}\Big[ g_N(x_N^\pi) + \sum_{k=0}^{N-1} g_k\big(x_k^\pi, \mu_k(y^{k,\pi}), w_k\big) \Big]. \]
The superscript $\pi$ emphasizes the dependence of $x_k$ and $y_k$ on the policy $\pi$. Remember that $y_k$ depends implicitly on $x_k$ through the observation map. The expected cost of a policy $\pi$ over all initial states, then, is
\[ J(\pi) = \mathbb{E}_{x_0}\big[ J_\pi(x_0) \big]. \]

Optimal Control

The goal of optimal control is as follows. Given a joint pdf on $\{x_0, w_0, \dots, w_N, v_0, \dots, v_N\}$, state transition maps $f_k$, and measurement models $h_k$, we seek to design a feasible policy $\pi$ that minimizes the expected cost over all initial states, $J(\pi)$. This is a harder problem than static optimization because we're optimizing over functions rather than numbers.
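To make the expected-cost definitions concrete, here is a minimal Monte Carlo sketch of $J_\pi(x_0)$ for a toy problem. The scalar dynamics, quadratic costs, Gaussian disturbance, and the feedback policy $\mu_k(x) = -0.5x$ are all illustrative assumptions, not part of the notes.

```python
import numpy as np

# Toy setup (all choices here are illustrative):
# scalar dynamics x_{k+1} = x_k + u_k + w_k with w_k ~ N(0, 0.1^2),
# stage cost g_k(x,u,w) = x^2 + u^2, terminal cost g_N(x) = x^2,
# and a fixed feedback policy mu_k(x) = -0.5 x.
N = 20
rng = np.random.default_rng(0)

def mu(k, x):
    return -0.5 * x

def J_pi(x0, n_rollouts=5000):
    """Monte Carlo estimate of the expected cost J_pi(x0)."""
    total = 0.0
    for _ in range(n_rollouts):
        x, cost = x0, 0.0
        for k in range(N):
            u = mu(k, x)
            w = 0.1 * rng.standard_normal()
            cost += x**2 + u**2          # stage cost
            x = x + u + w                # state transition map
        cost += x**2                     # terminal cost
        total += cost
    return total / n_rollouts

print(J_pi(x0=1.0))
```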

Formally, the basic optimal control problem is
\[
\begin{aligned}
\min_{\pi \in \Pi} \quad & J(\pi) \\
\text{s.t.} \quad & x_{k+1}^\pi = f_k(x_k^\pi, u_k^\pi, w_k), \\
& y_k^\pi = h_k(x_k^\pi, v_k), \\
& u_k^\pi = \mu_k(y_0^\pi, \dots, y_k^\pi) \in U_k(x_k^\pi).
\end{aligned}
\]
A policy $\pi^\star$ is called optimal if it minimizes the expected cost,
\[ J(\pi^\star) = \mathbb{E}_{x_0}\big[ J_{\pi^\star}(x_0) \big] = \min_{\pi \in \Pi} J(\pi). \]
We call $J_{\pi^\star}(x_0)$ the optimal cost function or optimal value function. It maps every initial state $x_0$ into an optimal cost. Interesting fact: in many cases, a policy $\pi^\star$ that minimizes $J(\pi)$, the expected cost over all initial states, is also optimal for any particular initial state $x_0$.

Controlled Markov Chains

For stochastic systems with finite state spaces, we can use an alternative formulation. The controlled Markov chain description of a stochastic system is specified by

1. state transition probabilities $P_{x_{k+1} \mid x_k, u_k}(x_{k+1} \mid x_k, u_k)$,
2. observation maps $h_k(x_k, v_k)$ or observation probabilities $P_{y_k \mid x_k}(y_k \mid x_k)$, and
3. the joint distribution of the basic random variables $x_0, v_0, \dots, v_{N-1}$.

More specifically, we can describe a controlled Markov chain as follows.

State: $x_k \in X = \{1, 2, \dots, I\}$. The feasible state space $X$ is time invariant and finite.

Control: $u_k \in U(x_k)$. The feasible control space $U$ may be infinite.

State transition probability matrix $P(u) \in [0, 1]^{I \times I}$:
\[ P(u) = \big[\, p_{ij}(u) \,\big], \qquad 1 \le i, j \le I, \]
where $p_{ij}(u) = \mathbb{P}(x_{k+1} = j \mid x_k = i,\ u_k = u)$.

Costs: the expected cost under a policy $\pi$, for initial condition $x_0$, is
\[ J_\pi(x_0) = \mathbb{E}\Big[ g_N(x_N^\pi) + \sum_{k=0}^{N-1} g_k(x_k^\pi, u_k^\pi) \Big]. \]
Here the expectation is just a sum, weighted by the state transition probabilities $p_{ij}(u)$, and $g_k(x_k^\pi, u_k^\pi)$ and $g_N(x_N^\pi)$ are the stage and terminal costs. The Bellman Equations are therefore
\[
\begin{aligned}
V_N^\star(i) &= g_N(i), \\
V_k^\star(i) &= \inf_{u \in U(i)} \Big\{ g_k(i, u) + \sum_{j=1}^{I} p_{ij}(u)\, V_{k+1}^\star(j) \Big\}.
\end{aligned}
\]

Information Patterns

Let $I_k$ be the set of all observations that your controller is allowed to depend upon at time $k$. A stochastic system can follow one of four information patterns, defined by their particular forms of $I_k$.

1. Oracle: $I_k = \{w_0, \dots, w_{N-1}\} \cup \{v_0, \dots, v_{N-1}\} \cup \{x_0\}$. Perfect knowledge of past, present and future; realizations of all disturbances, noise and the initial state are known.

2. No knowledge: $I_k = \emptyset$.

3. Perfect information: $I_k = \{x_k\}$. The current state is completely observed with no noise.

4. Imperfect information: $I_k = \{y_k\}$. The current state is incompletely observed, possibly with noise.

Mathematically, patterns 1 and 2 are easy. Patterns 3 and 4 are hard. In order of performance, 1 > 3 > 4 > 2.
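Here is a minimal sketch of the finite-state Bellman recursion above, run backward on a made-up controlled Markov chain; the state space, transition matrices, costs, and horizon are all illustrative assumptions.

```python
import numpy as np

# Backward recursion for the finite-state Bellman equations:
# 3 states, 2 controls, horizon N = 5 (all numbers are made up).
I_states, N = 3, 5

# P[u] is the I x I transition matrix under control u.
P = np.array([
    [[0.9, 0.1, 0.0],    # control 0
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]],
    [[0.5, 0.5, 0.0],    # control 1
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]],
])

# g[u][i] = stage cost of using control u in state i; gN = terminal cost.
g = np.array([[1.0, 2.0, 4.0],
              [3.0, 1.0, 0.5]])
gN = np.array([0.0, 1.0, 5.0])

V = gN.copy()                      # V_N
policy = np.zeros((N, I_states), dtype=int)
for k in reversed(range(N)):
    # Q[u, i] = g_k(i, u) + sum_j p_ij(u) V_{k+1}(j)
    Q = g + P @ V
    policy[k] = np.argmin(Q, axis=0)
    V = Q.min(axis=0)              # V_k

print("V_0  =", V)
print("mu_0 =", policy[0])
```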

Dynamic Programming

Dynamic programming (DP) is the primary tool for solving optimal control problems. Here we present the dynamic programming algorithm for a perfectly observed Markovian system, i.e. one with
\[ y_k^\pi = x_k^\pi \ \ \text{(perfect observation)} \qquad \text{and} \qquad u_k^\pi = \mu_k(x_k^\pi) \ \ \text{(Markovian policies)}. \]
Assume that the random vectors $\{x_0, w_0, \dots, w_N\}$ are independent.

Dynamic Programming Algorithm

The DP algorithm consists of the following three steps:

1. Define the optimal terminal value function by $V_N^\star(x_N) = g_N(x_N)$.

2. At each stage $k = N-1, \dots, 0$, recursively define the optimal stage value function by solving the following optimization problem:
\[ V_k^\star(x_k) = \min_{u_k \in U_k(x_k)} \mathbb{E}_{w_k}\big[ g_k(x_k, u_k, w_k) + V_{k+1}^\star\big( f_k(x_k, u_k, w_k) \big) \big]. \]
The solution of this problem yields
- the optimal stage decision function $\mu_k^\star(x_k)$, and
- the optimal stage value function $V_k^\star$.

3. At stage 0, calculate the optimal expected cost over all initial states:
\[ J(\pi^\star) = \mathbb{E}_{x_0}\big[ V_0^\star(x_0) \big]. \]
The optimal policy $\pi^\star$ is the sequence of decision functions found in Step 2.

The equations in Steps 1 and 2 are collectively referred to as the Bellman Equation.
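For continuous state spaces the minimization in Step 2 generally has no closed form, but it can be approximated by discretizing the state and control spaces and replacing the expectation with a sample average. The rough sketch below does this for a toy scalar problem; the dynamics, costs, grids, and disturbance distribution are all illustrative assumptions.

```python
import numpy as np

# Approximate backward DP (Steps 1-2) on a toy scalar problem, with the
# expectation over w_k replaced by a sample average and V_{k+1} interpolated
# on a state grid. All modeling choices are illustrative.
N = 10
x_grid = np.linspace(-5.0, 5.0, 101)          # discretized state space
u_grid = np.linspace(-2.0, 2.0, 21)           # discretized control space
w_samples = np.random.default_rng(1).normal(0.0, 0.3, size=200)

def f(x, u, w):            # state transition map
    return np.clip(x + u + w, -5.0, 5.0)

def g(x, u, w):            # stage cost
    return x**2 + 0.5 * u**2

V = x_grid**2              # V_N(x) = g_N(x) = x^2
for k in reversed(range(N)):
    V_next = V.copy()
    V = np.empty_like(x_grid)
    for i, x in enumerate(x_grid):
        best = np.inf
        for u in u_grid:
            x_next = f(x, u, w_samples)
            # E_w[ g + V_{k+1}(f(x,u,w)) ], with V_{k+1} interpolated on the grid
            q = np.mean(g(x, u, w_samples) + np.interp(x_next, x_grid, V_next))
            best = min(best, q)
        V[i] = best

print("V_0(0) ~", np.interp(0.0, x_grid, V))
```

Finer grids and more disturbance samples trade computation for accuracy; this brute-force discretization is only practical for low-dimensional states.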

Optimality of the Bellman Equation

Some theorems necessary for proving that dynamic programming works.

Preliminaries

Let's build up to the proof of optimality of the Bellman Equation. Here are a few useful results, presented without proof. The proofs can be found in Kumar and Varaiya.

Lemma (Markov Property). Let $\Pi_M \subseteq \Pi$ be the set of all Markovian feasible policies, i.e. those where the policy depends on the current state only, so $\mu_k = \mu_k(x_k)$. If $\pi \in \Pi_M$, then $x_{k+1}^\pi$ is conditionally independent of $\{x_{k-1}^\pi, x_{k-2}^\pi, \dots\}$ given $x_k^\pi$, or, equivalently,
\[ x_{k+1}^\pi = f_k\big(x_k^\pi, \mu_k(x_k^\pi), w_k\big) \quad \text{with } w_k \text{ independent of } \{x_0^\pi, \dots, x_{k-1}^\pi\}. \]
A Markovian policy is one where the control depends only on the current state. If the policy is Markovian, then the state process is Markovian too.

Definition. At stage $k$, the expected cost-to-go for the Basic Problem under policy $\pi$ is
\[ J_k^\pi(x_0) = \mathbb{E}\Big[ g_N(x_N^\pi) + \sum_{j=k}^{N-1} g_j(x_j^\pi, u_j^\pi, w_j) \ \Big|\ x_0^\pi, \dots, x_k^\pi \Big]. \]
The expected total cost under $\pi$ over all initial states $x_0$, then, is $J(\pi) = \mathbb{E}_{x_0}\big[ J_0^\pi(x_0) \big]$.

The Comparison Principle

Lemma (Comparison Principle). Let $V_k(x_k)$, $k = 0, \dots, N$, be any collection of functions that satisfy

C1. $V_N(x_N) \le g_N(x_N)$ for all $x_N \in X_N$;

C2. $V_k(x_k) \le \mathbb{E}_{w_k}\big[ g_k(x_k, u_k, w_k) + V_{k+1}\big(f_k(x_k, u_k, w_k)\big) \big]$ for all $x_k \in X_k$, $u_k \in U_k$, $k = 0, \dots, N-1$.

Then for any $\pi \in \Pi$,
\[ V_k(x_k^\pi) \le J_k^\pi(x_0), \qquad k = 0, \dots, N. \]

Corollary. Let $V_0(x_0), \dots, V_N(x_N)$ satisfy C1 and C2. Then $J(\pi^\star) \ge \mathbb{E}_{x_0}[V_0(x_0)]$, and if $J_0^\pi(x_0) = V_0(x_0)$, then $\pi$ is optimal.

In words: such value functions lower bound the expected cost-to-go of every feasible policy. If the total expected cost under a feasible policy $\pi$ equals the stage 0 value function, then $\pi$ is optimal.

Optimality of DP

Definition. Let the following define the Bellman Equation:

A1. $V_N(x_N) = g_N(x_N)$;

A2. $V_k(x_k) = \inf_{u_k \in U_k} \mathbb{E}_{w_k}\big[ g_k(x_k, u_k, w_k) + V_{k+1}\big( f_k(x_k, u_k, w_k) \big) \big]$.

Theorem (Optimality of DP).

1. For any $\pi \in \Pi$, $V_k(x_k^\pi) \le J_k^\pi(x_0)$, and in particular, $J(\pi) \ge \mathbb{E}_{x_0}[V_0(x_0)]$.

2. If a Markov policy $\pi \in \Pi_M$ achieves the infimum in A2, then (i) $\pi$ is optimal, (ii) $V_k(x_k^\pi) = J_k^\pi(x_0)$, and (iii) $J(\pi) = \mathbb{E}_{x_0}[V_0(x_0)]$ (i.e. the inequalities in 1 are binding).

3. A Markov policy $\pi \in \Pi_M$ is optimal only if, for each $k$, the infimum in A2 at $x_k^\pi$ is achieved by $u_k^\pi = \mu_k(x_k^\pi)$, i.e.
\[ V_k(x_k^\pi) = \mathbb{E}_{w_k}\big[ g_k\big(x_k^\pi, \mu_k(x_k^\pi), w_k\big) + V_{k+1}\big(f_k(x_k^\pi, \mu_k(x_k^\pi), w_k)\big) \big]. \]

A feasible Markov policy is optimal if and only if it satisfies the Bellman Equation.

Monotonicity of DP

Consider the stationary version of the basic problem, i.e. $X_k = X$, $U_k = U$, $f_k = f$, $g_k = g$ for all $k$, and $w_k$ i.i.d. If, in this case, the terminal value function dominates the stage $N-1$ value function, the same ordering propagates backward through the recursion:
\[ V_N^\star(x) \ge V_{N-1}^\star(x) \ \text{for all } x \in X \quad \Longrightarrow \quad V_{k+1}^\star(x) \ge V_k^\star(x) \ \text{for all } x \in X \text{ and all } k. \]
Similarly, if the stage $N-1$ value function dominates the terminal value function, then
\[ V_N^\star(x) \le V_{N-1}^\star(x) \ \text{for all } x \in X \quad \Longrightarrow \quad V_{k+1}^\star(x) \le V_k^\star(x) \ \text{for all } x \in X \text{ and all } k. \]
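A quick numerical sanity check of part 1 of the optimality theorem, on a tiny made-up controlled Markov chain (all numbers are illustrative): the DP value $V_0(x_0)$ should lower bound the simulated cost of any feasible policy, here two arbitrary constant-control policies.

```python
import numpy as np

# V_0(x0) from the Bellman recursion (A1, A2) vs. simulated costs of two
# fixed, generally suboptimal policies. Everything here is a made-up example.
rng = np.random.default_rng(2)
N, I = 6, 3
P = np.array([[[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.3, 0.7]],
              [[0.3, 0.7, 0.0], [0.0, 0.3, 0.7], [0.0, 0.0, 1.0]]])
g = np.array([[1.0, 2.0, 5.0], [2.0, 1.0, 0.2]])   # g[u][i]
gN = np.array([0.0, 2.0, 6.0])

V = gN.copy()
for k in range(N):
    V = (g + P @ V).min(axis=0)    # apply the Bellman map N times: V_0

def simulate_cost(policy_u, x0, n=4000):
    """Average cost of the fixed policy 'always play control policy_u'."""
    costs = np.zeros(n)
    for r in range(n):
        x = x0
        for k in range(N):
            costs[r] += g[policy_u, x]
            x = rng.choice(I, p=P[policy_u, x])
        costs[r] += gN[x]
    return costs.mean()

x0 = 0
print("DP lower bound V_0(x0):", V[x0])
print("cost of always u = 0  :", simulate_cost(0, x0))
print("cost of always u = 1  :", simulate_cost(1, x0))
```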

Part II: Applications

We studied the following problems in the segment of the class on perfectly observed systems.

1. Moving on a graph.
2. (Deterministic) shortest path.
3. Multi-period gambling (a.k.a. dynamic portfolio analysis).
4. Optimal stopping problems:
   - asset selling
   - deadline purchasing
   - the secretary problem
   - general stopping problems.
5. Perfectly observed linear systems with quadratic costs (LQ):
   - finite horizon
   - infinite horizon.
6. Inventory control (book only).
7. Scheduling (e.g. EV problem; see Bertsekas 4.5).

Optimal Stopping Problems

Asset Selling

A series of offers $w_k$ is received at stages $k = 0, \dots, N$. The decision at each stage is either to stop, in which case the current offer is taken and invested at interest rate $r$, or to continue. If you haven't taken an offer by stage $N$, you must take offer $w_N$.

Structure of the optimal policy: threshold. If $x_k > \alpha_k$, accept. For i.i.d. offers, the thresholds monotonically decrease with time, i.e. $\alpha_{k+1} \le \alpha_k$. If the horizon $N$ is very large, the optimal policy is well approximated by a stationary threshold policy: stop if $x_k > \bar\alpha$.

A good practice problem: derive the thresholds $\alpha_k$ and $\bar\alpha$.
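As a hedged sketch of that practice problem: in one common formulation (an offer accepted at stage $k$ grows at rate $r$ until stage $N$, and the last offer must be taken), the thresholds satisfy $\alpha_k = \mathbb{E}[\max(w, \alpha_{k+1})]/(1+r)$ with $\alpha_N = 0$. The code below evaluates this recursion by Monte Carlo for i.i.d. Uniform[0, 1] offers; the formulation, distribution, and numbers are assumptions for illustration, not taken from the notes.

```python
import numpy as np

# Backward recursion for asset-selling thresholds alpha_k under the assumed
# formulation described above; offers are i.i.d. Uniform[0, 1] (illustrative).
N, r = 25, 0.05
w = np.random.default_rng(3).uniform(0.0, 1.0, size=100_000)   # offer samples

alpha = np.zeros(N + 1)          # alpha[N] = 0: the last offer must be accepted
for k in range(N - 1, -1, -1):
    # alpha_k = E[ max(w, alpha_{k+1}) ] / (1 + r)
    alpha[k] = np.mean(np.maximum(w, alpha[k + 1])) / (1 + r)

print(np.round(alpha[:N], 3))    # thresholds are non-increasing in k
```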

Deadline Purchasing

A problem similar to asset selling. Some stuff has to be bought by some deadline. At each stage there's a random price $w_k$. Whether the prices are correlated or i.i.d., the optimal policy ends up being a threshold policy: purchase if $x_k < \alpha_k$. The $\alpha_k$ are given by a linear system, and for both the i.i.d. and correlated price cases the thresholds are monotonically increasing with time, i.e. $\alpha_{k+1} \ge \alpha_k$.

Good/easy practice problem: derive the thresholds $\alpha_k$ for the i.i.d. case. Harder problem: do the same for the correlated case.

General Stopping Problems

A stationary problem ($X_k = X$, $U_k = U$, $f_k = f$, $g_k = g$, $w_k$ i.i.d.). At each stage, you can pay a cost $t(x_k)$ to stop, at which point you stay stopped forever at no additional cost. Stopping is mandatory by stage $N$. The Bellman Equations for this problem are
\[
\begin{aligned}
V_N^\star(x_N) &= t(x_N), \\
V_k^\star(x_k) &= \min\Big\{ t(x_k),\ \min_{u_k \in U(x_k)} \mathbb{E}_{w_k}\big[ g(x_k, u_k, w_k) + V_{k+1}^\star\big(f(x_k, u_k, w_k)\big) \big] \Big\}.
\end{aligned}
\]
The optimal policy is to stop when $x_k \in T_k$, where $T_k$ is the optimal stopping set, defined by
\[ T_k = \Big\{ x_k \ \Big|\ t(x_k) \le \min_{u_k \in U(x_k)} \mathbb{E}_{w_k}\big[ g(x_k, u_k, w_k) + V_{k+1}^\star\big(f(x_k, u_k, w_k)\big) \big] \Big\}. \]
For the general stopping problem,
\[ T_0 \subseteq \dots \subseteq T_k \subseteq T_{k+1} \subseteq \dots \subseteq T_{N-1}, \]
i.e. the optimal stopping sets get larger as time goes on.

For the special case where the stage $N-1$ stopping set is absorbing, i.e.
\[ f(x, u, w) \in T_{N-1} \quad \text{for all } x \in T_{N-1},\ u \in U,\ \text{and } w, \]
we get the nice result that $T_k = T_{N-1}$ for all $k$. In words, the optimal stopping set at stage $k$ is then the set of all states for which it is better to stop now than to proceed for one more stage and then stop. Policies of this type are called one step look-ahead policies.

Good problems: work through the related examples in Bertsekas.
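A small numerical illustration of the stopping-set recursion above, using the i.i.d. deadline-purchasing problem recast in the general-stopping notation: the state is the current price, there is no control, the continuation cost is zero, and stopping (purchasing) costs $t(x) = x$. The price distribution and horizon below are illustrative assumptions.

```python
import numpy as np

# General stopping problem specialized to i.i.d. deadline purchasing:
# x_k is the current price, the next price is uniform on {1,...,10},
# t(x) = x, g = 0, and purchase is forced at stage N (all numbers made up).
N = 8
prices = np.arange(1, 11)                 # possible prices / states

V = prices.astype(float)                  # V_N(x) = t(x) = x
for k in range(N - 1, -1, -1):
    cont = np.mean(V) * np.ones_like(V)   # E[V_{k+1}(w)]: next price is i.i.d.
    T_k = prices[prices <= cont]          # stopping set: purchase if price is low enough
    print(f"k={k}: purchase if price <= {cont[0]:.2f},  T_k = {T_k.tolist()}")
    V = np.minimum(prices, cont)          # V_k

# The thresholds fall (and the stopping sets shrink) as k decreases:
# with more stages left, you can afford to wait for a lower price,
# which is the T_k subset-of T_{k+1} nesting stated above.
```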

The Secretary Problem

You plan to interview (or date) $N$ secretaries (partners). You want to maximize the probability that you hire (marry) the best secretary (wife). (NB: this is a different objective than our typical metric of optimizing an expected value.) Assume that you interview (date) one candidate per stage and that each candidate has a random quality $w_k \in [0, 1]$.

There are four possible states: (you've stopped, or you haven't) $\times$ (the current candidate is the best so far, or not). There are three possible controls: accept the current candidate, reject them, or do nothing. If you've already stopped, your only choice is to do nothing. If you've already seen a better candidate than your current option, then you can't stop now. If you reach the last stage without hiring (marrying) anyone, then you have to hire (marry) your last option.

Result: the optimal policy is a threshold policy: from stage $k^\star$ on, accept the first candidate who is the best seen so far, where
\[ k^\star = \inf\Big\{ k \le N \ \Big|\ \frac{k}{N} \ge V_k^\star(0) \Big\}, \qquad V_k^\star(0) = \frac{k}{N} \sum_{j=k}^{N-1} \frac{1}{j}, \]
$V_k^\star(0)$ is a non-increasing function of $k$, and the state $0$ means you've yet to pick, but the current candidate is not the best you've seen so far. As expected, the thresholds $V_k^\star(0)$ get lower as time goes on.

A nice approximate solution that's independent of the distribution of $w_k$: let roughly the first $N/e$ candidates go by, then accept the first one who is the best seen so far. Expect to date 10 people of random and i.i.d. quality? Be ready to marry starting with the fourth ($k^\star = 4$, and $10/e \approx 3.7$).
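A short sketch that evaluates the formula for $V_k^\star(0)$ above, finds the first stage where $k/N \ge V_k^\star(0)$, and compares it with the $N/e$ rule; the horizon $N = 10$ is just an example.

```python
import numpy as np

# V_k(0) = (k/N) * sum_{j=k}^{N-1} 1/j, the threshold stage k*, and N/e.
N = 10
k = np.arange(1, N)                               # stages 1, ..., N-1
V0 = (k / N) * np.array([np.sum(1.0 / np.arange(kk, N)) for kk in k])

k_star = k[np.argmax(k / N >= V0)]                # first k with k/N >= V_k(0)
print("V_k(0):", np.round(V0, 3))
print("k* =", k_star, " vs  N/e ~", round(N / np.e, 2))
```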

Inventory Control

Here the state is the stuff held in inventory, the control is how much new stuff to order at each stage (at a unit cost $c$), and the disturbance is the demand at each stage.

No Fixed Cost

In the simple version of this problem, the stage cost charges a shortage cost at rate $p$ when $x_k < 0$ and a holding cost at rate $h$ when $x_k > 0$. This leads to a convex optimization with a single solution, and therefore a threshold policy: buy $S_k - x_k$ if $x_k < S_k$, and buy nothing if $x_k \ge S_k$. We can interpret $S_k$ as the optimal amount of stock to have at stage $k$. Formally, we have
\[ V_k^\star(x_k) = \min_{y_k \ge x_k} G_k(y_k) - c x_k, \]
where
\[
\begin{aligned}
G_k(y) &= c y + H(y) + \mathbb{E}_w\big[ V_{k+1}^\star(y - w) \big], \\
H(y) &= p\, \mathbb{E}_{w_k}\big[ \max(0,\ w_k - y) \big] + h\, \mathbb{E}_{w_k}\big[ \max(0,\ y - w_k) \big], \\
y_k &= x_k + u_k, \qquad S_k = \arg\min_{y \in \mathbb{R}} G_k(y).
\end{aligned}
\]

Positive Fixed Cost

In this case, we add a fixed ordering cost $K$ to the stage cost (which otherwise still consists of the shortage and holding costs). The optimal policy is still a threshold policy, but with a different trigger: buy $S_k - x_k$ if $x_k < s_k$, and buy nothing if $x_k \ge s_k$. The derivation is harder, however, because the cost function is no longer convex, but K-convex. Formal definitions: $y_k$, $G_k(y)$, $S_k$, and $H(y)$ are defined as above, but we now also have
\[ s_k = \min_y \big\{ y \ \big|\ G_k(y) = K + G_k(S_k) \big\}. \]
A policy of this form is called a multiperiod $(s, S)$ policy.

Dynamic Portfolio Analysis

In this problem, an investor decides how to distribute an initial fortune $x_0$ among a collection of $n$ assets with random rates of return $e_1, \dots, e_n$. The investor also has the option of a risk-free asset with rate of return $s$. The controls $u_1, \dots, u_n$ are the investments in each of the risky assets. We assume the objective is to maximize the expectation of a utility function $U(x)$, assumed concave and twice differentiable.

Single Period

The investor's fortune after a single period is
\[ x_1 = s x_0 + \sum_{i=1}^{n} (e_i - s) u_i. \]
When the utility function satisfies $-U'(x)/U''(x) = a + b x$ (e.g. exponential, logarithmic or power functions) for all $x$ and for some scalars $a, b$, then the optimal portfolio is the linear policy
\[ \mu_i(x_0) = \alpha_i (a + b s x_0) \quad \text{for } i = 1, \dots, n, \]
where the $\alpha_i$ are constants. The single-period optimal policy is called myopic.

Multiperiod

Here the fortune from each stage is reinvested at the next:
\[ x_{k+1} = s_k x_k + \sum_{i=1}^{n} (e_i^k - s_k) u_i^k, \qquad k = 0, \dots, N-1, \]
and we maximize the expected utility of the terminal fortune $x_N$. If the utility function has the property above, then we again get linear controls:
\[ \mu^k(x_k) = \alpha^k \left( \frac{a}{s_{N-1} \cdots s_{k+1}} + b\, s_k x_k \right) \in \mathbb{R}^n, \]
where $\alpha^k$ is a vector of constants that depends on the joint distribution of the rates of return.
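A single-period sanity check of the myopic linear-policy claim for log utility, where $-U'(x)/U''(x) = x$ (so $a = 0$, $b = 1$) and the optimal risky investment should therefore be proportional to initial wealth. One risky asset with a two-point return distribution and the risk-free rate below are made-up numbers, and the optimum is found by a simple grid search over the fraction of wealth invested.

```python
import numpy as np

# Single-period portfolio with log utility: check that the optimal risky
# investment u* is proportional to x0 (all numbers are illustrative).
s = 1.02
e_vals, e_prob = np.array([1.30, 0.80]), np.array([0.5, 0.5])
fractions = np.linspace(0.0, 1.0, 1001)          # u / x0, fraction of wealth at risk

def optimal_fraction(x0):
    u = fractions * x0
    # x1 = s*x0 + (e - s)*u for each return outcome; expected log utility
    x1 = s * x0 + np.outer(e_vals - s, u)
    expected_utility = (e_prob[:, None] * np.log(x1)).sum(axis=0)
    return fractions[np.argmax(expected_utility)]

for x0 in [1.0, 10.0, 100.0]:
    print(f"x0 = {x0:6.1f}  ->  optimal fraction = {optimal_fraction(x0):.3f}")
# The optimal fraction is (numerically) the same for every x0, so u* = alpha * x0.
```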

For utility functions of the form $\ln(x)$ or $\tfrac{1}{b-1}(bx)^{1 - 1/b}$ with $b \ne 0$, $b \ne 1$, or for risk-free rates of return $s_k = s = 1$ for all $k$, it turns out that the myopic policy is optimal at every stage. In some other cases, a partially myopic (limited look-ahead) policy is optimal. For the infinite horizon problem, the optimal multiperiod policy approaches myopia.

Perfectly Observed Linear Systems with Quadratic Costs (a.k.a. LQ Systems)

Finite Horizon

First we study the non-stationary linear system
\[ x_{k+1} = A_k x_k + B_k u_k + w_k, \qquad k = 0, \dots, N-1. \]
($w_k$ is an $n$-dimensional random vector; if randomness does not enter on all channels, then the corresponding elements of $w_k$ are just zero.) The stage and terminal costs are quadratic, so the Bellman Equations are
\[
\begin{aligned}
V_N^\star(x_N) &= x_N^T Q_N x_N, \\
V_k^\star(x_k) &= \min_{u_k} \Big\{ x_k^T Q_k x_k + u_k^T R_k u_k + \mathbb{E}_{w_k}\big[ V_{k+1}^\star(A_k x_k + B_k u_k + w_k) \big] \Big\}.
\end{aligned}
\]
We assume that the matrices $Q_k$ are positive semidefinite symmetric, and the matrices $R_k$ are positive definite symmetric. The controls $u_k$ are unconstrained, and the disturbances $w_k$ are zero-mean, finite-variance independent random vectors with distributions independent of the state and control.

Result. The optimal value functions $V_k^\star$ are quadratic, so the optimal control law is linear in the state:
\[ \mu_k^\star(x_k) = L_k x_k, \tag{1} \]
where
\[ L_k = -(B_k^T K_{k+1} B_k + R_k)^{-1} B_k^T K_{k+1} A_k, \]
and the matrices $K_k$ are given by the following recursion, known as the discrete-time Riccati equation (DRE):
\[
\begin{aligned}
K_N &= Q_N, \\
K_k &= A_k^T \big( K_{k+1} - K_{k+1} B_k (B_k^T K_{k+1} B_k + R_k)^{-1} B_k^T K_{k+1} \big) A_k + Q_k.
\end{aligned}
\]
The optimal cost (for a given initial state) is
\[ J_0^\star(x_0) = x_0^T K_0 x_0 + \sum_{k=0}^{N-1} \mathbb{E}_{w_k}\big[ w_k^T K_{k+1} w_k \big]. \]
It's occasionally useful to express the last term as
\[ \mathbb{E}_{w_k}\big[ w_k^T K_{k+1} w_k \big] = \mathrm{Tr}\big( K_{k+1} \mathrm{Cov}(w_k, w_k) \big), \qquad \text{where} \quad \mathrm{Cov}(w_k, w_k) = \mathbb{E}_{w_k}\big[ w_k w_k^T \big]. \]
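A minimal numpy sketch of the DRE recursion and the gains $L_k$ above; the system matrices, costs, and horizon are made-up numbers chosen only for illustration.

```python
import numpy as np

# Finite-horizon Riccati recursion (DRE) and LQ gains L_k on an
# illustrative 2-state / 1-input system.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.01]])
N = 50

K = Q.copy()                     # K_N = Q_N (here Q_N = Q)
gains = []
for k in reversed(range(N)):
    S = B.T @ K @ B + R
    L = -np.linalg.solve(S, B.T @ K @ A)                            # L_k
    K = A.T @ (K - K @ B @ np.linalg.solve(S, B.T @ K)) @ A + Q     # K_k
    gains.append(L)
gains.reverse()                  # gains[k] = L_k

print("L_0 =", gains[0])
print("K_0 diagonal =", np.diag(K))
```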

Infinite Horizon

Now we consider the (mostly) stationary linear system
\[ x_{k+1} = A x_k + B u_k + w_k, \qquad k = 0, \dots, N-1, \]
with $Q_k = Q$ and $R_k = R$. (It's only mostly stationary because the disturbances $w_k$ are not assumed to be i.i.d.) In the large-horizon regime ($N \gg 1$, $k \ll N$), the optimal finite-horizon control law (1) is well approximated by
\[ \mu^\star(x) = L x, \qquad L = -(B^T K B + R)^{-1} B^T K A, \]
where the matrix $K$ solves the discrete-time algebraic Riccati equation (DARE):
\[ K = A^T \big( K - K B (B^T K B + R)^{-1} B^T K \big) A + Q. \]
This linear control law is easy to implement and applicable to a wide range of problems. It's so nice to work with, in fact, that people often squish problems into the LQ framework so that these powerful results (and the analogous ones for the LQG problem) can be applied. Many variations of the LQ problem have been studied, such as non-zero-mean disturbances, tracking a trajectory rather than driving the state to 0, and random system matrices. The last is a tool for overcoming the sensitivity to modeling error that these results often exhibit.

When is it valid to use the asymptotic approximation, i.e. to use the DARE in place of the DRE? When the following conditions are met:

1. The system is stationary: $A_k = A$, $B_k = B$, $Q_k = Q$, $R_k = R$ for all $k$;
2. $Q = Q^T \succeq 0$ and $R = R^T \succ 0$;
3. $(A, B)$ is a controllable pair; and
4. $(A, C)$ is an observable pair, where $C$ is any matrix such that $Q = C^T C$.

Under these conditions, we're guaranteed that:

1. There exists a $K \succeq 0$ such that, for every terminal condition $K_N \succeq 0$, the DRE iterates $K_k(K_N)$ converge to $K$ as the number of stages to go grows;
2. $K$ is the unique positive semidefinite solution of the DARE; and
3. The closed-loop system $x_{k+1} = (A + B L) x_k$ is stable, where $L = -(B^T K B + R)^{-1} B^T K A$.
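A minimal sketch that solves the DARE by simply iterating the Riccati map to a fixed point and then checks closed-loop stability. It reuses the same illustrative system matrices as the finite-horizon sketch; in practice one would typically call a dedicated solver (e.g. scipy.linalg.solve_discrete_are) instead of fixed-point iteration.

```python
import numpy as np

# Solve the DARE by iterating the Riccati map, then check that the
# closed loop A + B L is stable (spectral radius < 1).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.01]])

K = Q.copy()
for _ in range(10_000):
    S = B.T @ K @ B + R
    K_new = A.T @ (K - K @ B @ np.linalg.solve(S, B.T @ K)) @ A + Q
    if np.max(np.abs(K_new - K)) < 1e-10:
        K = K_new
        break
    K = K_new

L = -np.linalg.solve(B.T @ K @ B + R, B.T @ K @ A)
rho = np.max(np.abs(np.linalg.eigvals(A + B @ L)))
print("K =\n", np.round(K, 3))
print(f"spectral radius of A + B L: {rho:.4f}")   # < 1  =>  stable
```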
