Coordinate Minimization
Daniel P. Robinson
Department of Applied Mathematics and Statistics, Johns Hopkins University
November 27, 2018

Outline
1 Introduction
2 Algorithms
  - Cyclic order with exact minimization
  - Random order with fixed step size
  - Cyclic order with fixed step size
  - Steepest direction (Gauss-Southwell rule) with fixed step size
  - Alternatives
  - Summary
3 Examples
  - Linear equations
  - Logistic regression

Introduction
Given a function f : R^n → R, consider the unconstrained optimization problem
    minimize_{x ∈ R^n} f(x).
We will consider various assumptions on f:
  - nonconvex and differentiable f
  - convex and differentiable f
  - strongly convex and differentiable f
We will not consider general non-smooth f, because we cannot prove anything. We will briefly consider structured non-smooth problems, i.e., problems that use an additional separable regularizer.

Notation: f_k := f(x_k) and g_k := ∇f(x_k).

Basic idea of coordinate minimization: compute the next iterate using the update
    x_{k+1} = x_k − α_k e_{i_k}.

Algorithm 1 General coordinate minimization framework.
1: Choose x_0 ∈ R^n and set k ← 0.
2: loop
3:   Choose i_k ∈ {1, 2, ..., n}.
4:   Choose α_k > 0.
5:   Set x_{k+1} ← x_k − α_k e_{i_k}.
6:   Set k ← k + 1.
7: end loop

α_k is the step size. Options include:
  - fixed, but sufficiently small
  - inexact linesearch
  - exact linesearch
i_k ∈ {1, 2, ..., n} has to be chosen. Options include:
  - cycle through the entire set
  - choose it randomly without replacement
  - choose it randomly with replacement
  - choose it based on which element of ∇f(x_k) is the largest in absolute value
e_{i_k} is the i_k-th coordinate vector; this update seeks better points in span{e_{i_k}}.
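As a concrete illustration of Algorithm 1, here is a minimal Python sketch (not from the slides) in which the index rule and step rule are supplied as functions; the names coordinate_minimization, index_rule, and step_rule are illustrative assumptions, and coordinates are 0-based.

    import numpy as np

    def coordinate_minimization(grad, x0, index_rule, step_rule, max_iter=1000, tol=1e-6):
        """Sketch of Algorithm 1: x_{k+1} = x_k - alpha_k e_{i_k}."""
        x = np.asarray(x0, dtype=float).copy()
        g0_norm = np.linalg.norm(grad(x))
        for k in range(max_iter):
            g = grad(x)
            # stopping condition of the kind suggested later in the slides
            if np.linalg.norm(g) <= tol * max(1.0, g0_norm):
                break
            i = index_rule(k, g)           # choose i_k (e.g., cyclic, random, or greedy)
            alpha = step_rule(k, x, g, i)  # choose the (signed) step alpha_k
            x[i] -= alpha                  # move only along the i_k-th coordinate
        return x

    # Example choices: cyclic index rule with a fixed gradient-based step 1/L_max.
    # cyclic = lambda k, g: k % g.size
    # fixed  = lambda k, x, g, i, L_max=1.0: g[i] / L_max

The index and step rules listed above (cyclic/random/greedy, fixed/inexact/exact) all fit this interface, which is the point of the framework.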

The following example shows that coordinate minimization may fail if f is non-smooth:
    minimize_{x_1, x_2} f(x_1, x_2) := −x_1 − x_2 + min{0, x_1}² + min{0, x_2}² + 3 |x_1 − x_2|.

Figure 1: Level curves for the function f defined above. Coordinate minimization cannot make progress from any point satisfying x_1 = x_2 > 0.

Note: coordinate descent works if the non-smoothness is structured (block separable).

Algorithm 2 Coordinate minimization with cyclic order and exact minimization.
1: Choose x_0 ∈ R^n and set k ← 0.
2: loop
3:   Choose i_k = mod(k, n) + 1.
4:   Calculate the exact coordinate minimizer:
         α_k ∈ argmin_{α ∈ R} f(x_k − α e_{i_k}).
5:   Set x_{k+1} ← x_k − α_k e_{i_k}.
6:   Set k ← k + 1.
7: end loop

Comments:
  - This algorithm assumes that the exact minimizers exist and that they are unique.
  - A reasonable stopping condition should be incorporated, such as
        ‖∇f(x_k)‖_2 ≤ 10^{-6} max{1, ‖∇f(x_0)‖_2}.

An interesting example introduced by Powell [5, formula (2)] is
    minimize_{x_1, x_2, x_3} f(x_1, x_2, x_3) := −x_1 x_2 − x_2 x_3 − x_1 x_3 + Σ_{i=1}^{3} (|x_i| − 1)_+².
  - f is continuously differentiable and nonconvex.
  - f has minimizers at (1, 1, 1) and (−1, −1, −1), vertices of the unit cube.
  - Coordinate descent with exact minimization started just outside the unit cube near any non-optimal vertex cycles around neighborhoods of all 6 non-optimal vertices.
  - Powell shows that the cyclic nonconvergence behavior is special and is destroyed by small perturbations on this particular example.

Figure 2: The three-dimensional example given above. It shows the possible lack of convergence of a coordinate descent method with exact minimization. This picture was taken from [7].

This example and others in [5] show that we cannot expect a general convergence result for nonconvex functions similar to that for full-gradient descent.

Theorem 2.1 (see [6, Theorem 5.32]). Assume that the following hold:
  - f is continuously differentiable;
  - the level set L_0 := {x ∈ R^n : f(x) ≤ f(x_0)} is bounded; and
  - for every x ∈ L_0 and all j ∈ {1, 2, ..., n}, the optimization problem
        minimize_{ζ ∈ R} f(x + ζ e_j)
    has a unique minimizer.
Then, any limit point x̄ of the sequence {x_k} generated by Algorithm 2 satisfies ∇f(x̄) = 0.

Proof: Since f(x_{k+1}) ≤ f(x_k) for all k ∈ N, we know that the sequence {x_k}_{k=0}^∞ ⊂ L_0. Since L_0 is bounded, {x_k}_{k=0}^∞ has at least one limit point; let x̄ be any such limit point. Thus, there exists a subsequence K ⊆ N satisfying
    lim_{k ∈ K} x_k = x̄.                                                            (2)
Combining this with monotonicity of {f(x_k)} and continuity of f also shows that
    lim_{k ∈ K} f(x_k) = f(x̄)   and   f(x_k) ≥ f(x̄) for all k.                      (3)

We assume ∇f(x̄) ≠ 0, and then reach a contradiction. First, consider the subsets K_i ⊆ K, i = 0, ..., n−1, defined as
    K_i := {k ∈ K : k ≡ i mod n}.
Since K is an infinite subsequence of the natural numbers, one of the K_i must be an infinite set. Without loss of generality, we assume it is K_0 (the argument is very similar for any other i, because we are using cyclic order).

Let us perform a hypothetical "sweep" of coordinate minimization starting from x̄, so that we would obtain
    y := x̄_n,   with   x̄_l := x̄ + Σ_{j=1}^{l} [τ̄]_j e_j   for all l = 1, ..., n,
where [τ̄]_j denotes the (signed) exact step along e_j taken from x̄_{j−1}, and note that since ∇f(x̄) ≠ 0 by assumption, we must have
    f(y) < f(x̄).   (why?)                                                           (4)
NOTE: If K_i were infinite for some i ≠ 0, then we would do the above "sweep" at x̄ starting with coordinate i + 1 and going in cyclic order to cover all n coordinates.

Next, notice by construction of the coordinate minimization scheme that
    x_{k+l} = x_k + Σ_{j=1}^{l} τ_{k+j−1} e_j   for all k ∈ K_0 and 1 ≤ l ≤ n,        (5)
where τ_{k+j−1} is the signed step taken along coordinate j at iteration k + j − 1, meaning that
    ‖x_{k+l} − x_k‖ = ‖(τ_k, τ_{k+1}, ..., τ_{k+l−1})‖ ≤ 2 max{‖x‖ : x ∈ L_0} < ∞   for all k ∈ K_0 and 1 ≤ l ≤ n.
We used the assumption that L_0 is bounded. Since this shows that the set {(τ_k, τ_{k+1}, ..., τ_{k+n−1})^T}_{k ∈ K_0} is bounded, we may pass to a subsequence K̄ ⊆ K_0 with
    lim_{k ∈ K̄} (τ_k, τ_{k+1}, ..., τ_{k+n−1})^T = τ^L   for some τ^L ∈ R^n.         (6)
Taking the limit of (5) over k ∈ K̄ for each l, and using (2) and (6), we find that
    lim_{k ∈ K̄} x_{k+l} = x̄ + Σ_{j=1}^{l} [τ^L]_j e_j   for each 1 ≤ l ≤ n.          (7)

We next claim the following, which we will prove by induction:
    [τ^L]_p = [τ̄]_p   for all 1 ≤ p ≤ n,   and                                       (8)
    lim_{k ∈ K̄} x_{k+p} = x̄_p   for all 1 ≤ p ≤ n.                                   (9)

Base case: p = 1. We know from the coordinate minimization that f(x_{k+1}) ≤ f(x_k + τ e_1) for all k ∈ K_0 and τ ∈ R. Taking limits over k ∈ K̄ and using continuity of f, (7) with l = 1, and (2) yields
    f(x̄ + [τ^L]_1 e_1) = f(lim_{k ∈ K̄} x_{k+1}) = lim_{k ∈ K̄} f(x_{k+1}) ≤ lim_{k ∈ K̄} f(x_k + τ e_1) = f(lim_{k ∈ K̄} x_k + τ e_1) = f(x̄ + τ e_1)   for all τ ∈ R.
Since the minimizations in coordinate directions are unique by assumption, we know that [τ^L]_1 = [τ̄]_1, which is the first desired result. Also, combining it with (7) gives
    lim_{k ∈ K̄} x_{k+1} = x̄ + [τ^L]_1 e_1 = x̄ + [τ̄]_1 e_1 ≡ x̄_1,
which completes the base case.

Induction step: assume that (8) and (9) hold for all 1 ≤ p ≤ p̄ < n. We know from the coordinate minimization that f(x_{k+p̄+1}) ≤ f(x_{k+p̄} + τ e_{p̄+1}) for all k and τ ∈ R. Taking the limit over k ∈ K̄, continuity of f, (7) with l = p̄ + 1, and (9) give
    f(x̄ + Σ_{j=1}^{p̄+1} [τ^L]_j e_j) = f(lim_{k ∈ K̄} x_{k+p̄+1}) = lim_{k ∈ K̄} f(x_{k+p̄+1}) ≤ lim_{k ∈ K̄} f(x_{k+p̄} + τ e_{p̄+1}) = f(lim_{k ∈ K̄} x_{k+p̄} + τ e_{p̄+1}) = f(x̄_{p̄} + τ e_{p̄+1})   for all τ ∈ R.
Thus, the definition of x̄_{p̄}, and the fact that (8) holds for all p ≤ p̄, show that
    f(x̄_{p̄} + [τ^L]_{p̄+1} e_{p̄+1}) = f(x̄ + Σ_{j=1}^{p̄} [τ̄]_j e_j + [τ^L]_{p̄+1} e_{p̄+1})
                                     = f(x̄ + Σ_{j=1}^{p̄} [τ^L]_j e_j + [τ^L]_{p̄+1} e_{p̄+1})
                                     = f(x̄ + Σ_{j=1}^{p̄+1} [τ^L]_j e_j)
                                     ≤ f(x̄_{p̄} + τ e_{p̄+1})   for all τ ∈ R.
Since the minimization in coordinate directions is unique by assumption, we know that [τ^L]_{p̄+1} = [τ̄]_{p̄+1}, which is the first desired result. Also, combining it with (7) gives
    lim_{k ∈ K̄} x_{k+p̄+1} = x̄ + Σ_{j=1}^{p̄+1} [τ^L]_j e_j = x̄ + Σ_{j=1}^{p̄+1} [τ̄]_j e_j ≡ x̄_{p̄+1},
which completes the proof by induction.

From our induction proof, we have that τ̄ = τ^L. Combining this with (7) and the definition of y gives
    lim_{k ∈ K̄} x_{k+n} = x̄ + Σ_{j=1}^{n} [τ^L]_j e_j = x̄ + Σ_{j=1}^{n} [τ̄]_j e_j ≡ x̄_n ≡ y.    (10)
Finally, combining (3), continuity of f, (10), and (4) shows that
    f(x̄) = lim_{k ∈ K̄} f(x_{k+n}) = f(lim_{k ∈ K̄} x_{k+n}) = f(y) < f(x̄),
which is a contradiction. This completes the proof.

Notation:
  - Let L_j denote the j-th component Lipschitz constant, i.e., it satisfies
        |∇_j f(x + t e_j) − ∇_j f(x)| ≤ L_j |t|   for all x ∈ R^n and t ∈ R.
  - Let L_max denote the coordinate Lipschitz constant, i.e., L_max := max_{1 ≤ i ≤ n} L_i.

Algorithm 3 Coordinate minimization with random order and a fixed step size.
1: Choose α ∈ (0, 1/L_max], where L_max := max_{1 ≤ i ≤ n} L_i.
2: Choose x_0 ∈ R^n and set k ← 0.
3: loop
4:   Choose i_k ∈ {1, 2, ..., n} randomly with equal probability.
5:   Set x_{k+1} ← x_k − α ∇_{i_k} f(x_k) e_{i_k}.
6:   Set k ← k + 1.
7: end loop

Comments:
  - A reasonable stopping condition should be incorporated, such as
        ‖∇f(x_k)‖_2 ≤ 10^{-6} max{1, ‖∇f(x_0)‖_2}.
  - A maximum number of iterations should be included in practice.

Theorem 2.2. Suppose that α = 1/L_max and let the following assumptions hold:
  - f is convex;
  - ∇f is globally Lipschitz continuous;
  - the minimum value of f is attained on some set S, i.e., there exists S ⊆ R^n with x* ∈ S and f* := f(x*) = min_x f(x);
  - there exists a scalar R_0 satisfying
        max_{x* ∈ S} max_x { ‖x − x*‖_2 : f(x) ≤ f(x_0) } ≤ R_0.
Then, the iterate sequence {x_k} generated by Algorithm 3 satisfies
    E[f(x_k)] − f* ≤ 2 n L_max R_0² / k.
Moreover, if f is strongly convex with parameter σ > 0, i.e.,
    f(y) ≥ f(x) + ∇f(x)^T (y − x) + (σ/2) ‖y − x‖_2²   for all {x, y} ⊂ R^n,
then
    E[f(x_k)] − f* ≤ (1 − σ/(n L_max))^k (f(x_0) − f*).

Proof (follows Wright [7]): It follows from Taylor's Theorem, the definitions of L_j and L_max, and α = 1/L_max, that
    f(x_{k+1}) = f(x_k − α ∇_{i_k} f(x_k) e_{i_k})
               ≤ f(x_k) − α ∇_{i_k} f(x_k) e_{i_k}^T ∇f(x_k) + (1/2) α² L_{i_k} (∇_{i_k} f(x_k))²
               = f(x_k) − α (∇_{i_k} f(x_k))² + (1/2) α² L_{i_k} (∇_{i_k} f(x_k))²
               ≤ f(x_k) − α (∇_{i_k} f(x_k))² + (1/2) α² L_max (∇_{i_k} f(x_k))²
               = f(x_k) − α (1 − α L_max/2) (∇_{i_k} f(x_k))²        [Why? Exercise.]
               = f(x_k) − (1/(2 L_max)) (∇_{i_k} f(x_k))².                             (11)
If we now take the expectation of both sides with respect to i_k, we find that
    E_{i_k}[f(x_{k+1})] ≤ f(x_k) − (1/(2 n L_max)) Σ_{j=1}^{n} (∇_j f(x_k))² = f(x_k) − (1/(2 n L_max)) ‖∇f(x_k)‖_2².
Subtracting f* from both sides shows that
    E_{i_k}[f(x_{k+1})] − f* ≤ f(x_k) − f* − (1/(2 n L_max)) ‖∇f(x_k)‖_2².              (12)

From the previous slide, we have (12):
    E_{i_k}[f(x_{k+1})] − f* ≤ f(x_k) − f* − (1/(2 n L_max)) ‖∇f(x_k)‖_2².
Taking the expectation with respect to all the random variables {i_0, i_1, i_2, ...}, and defining
    φ_k := E[f(x_k)] − f*,
we find that
    φ_{k+1} = E[f(x_{k+1})] − f*
            = E_{i_0, i_1, ..., i_{k−1}}[ E_{i_k}[f(x_{k+1}) | x_k] ] − f*
            ≤ E[ f(x_k) − f* − (1/(2 n L_max)) ‖∇f(x_k)‖_2² ]
            = φ_k − (1/(2 n L_max)) E[‖∇f(x_k)‖_2²]
            ≤ φ_k − (1/(2 n L_max)) ( E[‖∇f(x_k)‖_2] )²,                               (13)
where we used Jensen's Inequality to derive the last inequality.

Next, note from convexity of f, the definition of R_0, and the fact that f(x_k) ≤ f(x_0) for all k (by construction of the algorithm), that we have
    f(x_k) − f* ≤ ∇f(x_k)^T (x_k − x*) ≤ ‖∇f(x_k)‖_2 ‖x_k − x*‖_2 ≤ R_0 ‖∇f(x_k)‖_2.    (14)
Taking the expectation of both sides shows that
    E[‖∇f(x_k)‖_2] ≥ (1/R_0) E[f(x_k) − f*] = φ_k / R_0.
Combining this bound with (13) yields
    φ_{k+1} ≤ φ_k − φ_k² / (2 n L_max R_0²).                                            (15)
Combining this with φ_{k+1} ≤ φ_k (see (13)), we have
    1/φ_{k+1} − 1/φ_k = (φ_k − φ_{k+1}) / (φ_k φ_{k+1}) ≥ (φ_k − φ_{k+1}) / φ_k² ≥ 1/(2 n L_max R_0²).
Summing both sides for k = 0, 1, ..., l − 1 shows that
    1/φ_l − 1/φ_0 ≥ l / (2 n L_max R_0²),   so that   1/φ_l ≥ l / (2 n L_max R_0²).
Rearranging, replacing l by k, and using the definition of φ_k, this is equivalent to
    E[f(x_k)] − f* = φ_k ≤ 2 n L_max R_0² / k,
which is the first desired result.

For the second part, assume that f is strongly convex with parameter σ, i.e., that
    f(y) ≥ f(x) + ∇f(x)^T (y − x) + (σ/2) ‖y − x‖_2²   for all {x, y} ⊂ R^n.
By choosing x = x_k and minimizing both sides with respect to y, we find that
    f* = min_y f(y) ≥ min_y [ f(x_k) + ∇f(x_k)^T (y − x_k) + (σ/2) ‖y − x_k‖_2² ]
       = f(x_k) + ∇f(x_k)^T (y_k − x_k) + (σ/2) ‖y_k − x_k‖_2²
       = f(x_k) − (1/σ) ‖∇f(x_k)‖_2² + (1/(2σ)) ‖∇f(x_k)‖_2²
       = f(x_k) − (1/(2σ)) ‖∇f(x_k)‖_2²,                                               (16)
where y_k := x_k − ∇f(x_k)/σ.
Combining this bound with (13) gives the inequality
    φ_{k+1} ≤ φ_k − (1/(2 n L_max)) E[‖∇f(x_k)‖_2²]
            ≤ φ_k − (1/(2 n L_max)) E[ 2 σ (f(x_k) − f*) ]
            = φ_k − (σ/(n L_max)) φ_k
            = (1 − σ/(n L_max)) φ_k.
Applying this recursively shows that
    φ_k ≤ (1 − σ/(n L_max))^k φ_0,
so that, after we use the definition of φ_k, we have
    E[f(x_k)] − f* ≤ (1 − σ/(n L_max))^k (f(x_0) − f*),
which is the second desired result.
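A Python sketch of Algorithm 3, whose rate was established in Theorem 2.2 above. For brevity the sketch evaluates the full gradient and then uses a single component; in practice one would compute only ∇_{i_k} f(x_k). The function name and default arguments are illustrative assumptions, not part of the slides.

    import numpy as np

    def random_coordinate_descent(grad, x0, L_max, max_iter=10000, tol=1e-6, seed=0):
        """Sketch of Algorithm 3: i_k uniform on {1,...,n}, fixed step alpha = 1/L_max."""
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float).copy()
        n = x.size
        alpha = 1.0 / L_max
        g0_norm = np.linalg.norm(grad(x))
        for k in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) <= tol * max(1.0, g0_norm):   # stopping rule from the slides
                break
            i = rng.integers(n)                                # each coordinate with probability 1/n
            x[i] -= alpha * g[i]                               # x_{k+1} = x_k - alpha grad_i f(x_k) e_i
        return x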

Notation:
  - Let L_j denote the j-th component Lipschitz constant, i.e., it satisfies
        |∇_j f(x + t e_j) − ∇_j f(x)| ≤ L_j |t|   for all x ∈ R^n and t ∈ R.
  - Let L_max denote the coordinate Lipschitz constant, i.e., L_max := max_{1 ≤ i ≤ n} L_i.
  - Let L denote the Lipschitz constant for ∇f.

Algorithm 4 Coordinate minimization with cyclic order and a fixed step size.
1: Choose α ∈ (0, 1/L_max].
2: Choose x_0 ∈ R^n and set k ← 0.
3: loop
4:   Choose i_k = mod(k, n) + 1.
5:   Set x_{k+1} ← x_k − α ∇_{i_k} f(x_k) e_{i_k}.
6:   Set k ← k + 1.
7: end loop

Comments:
  - A reasonable stopping condition should be incorporated, such as
        ‖∇f(x_k)‖_2 ≤ 10^{-6} max{1, ‖∇f(x_0)‖_2}.
  - A maximum number of allowed iterations should be included in practice.

Theorem 2.3 (see [1, Theorem 3.6, Theorem 3.9] and [7, Theorem 3]). Suppose that α = 1/L_max and let the following assumptions hold:
  - f is convex;
  - ∇f is globally Lipschitz continuous;
  - the minimum value of f is attained on some set S, i.e., there exists S ⊆ R^n with x* ∈ S and f* := f(x*) = min_x f(x);
  - there exists a scalar R_0 satisfying
        max_{x* ∈ S} max_x { ‖x − x*‖_2 : f(x) ≤ f(x_0) } ≤ R_0.
If {x_k} is the iterate sequence of Algorithm 4, then for k ∈ {n, 2n, 3n, ...} we have
    f(x_k) − f* ≤ 4 n L_max (1 + n L²/L_max²) R_0² / (k + 8).                           (17)
If f is strongly convex with parameter σ > 0 (see Theorem 2.2), then for k ∈ {n, 2n, 3n, ...}
    f(x_k) − f* ≤ (1 − σ / (2 L_max (1 + n L²/L_max²)))^{k/n} (f(x_0) − f*).            (18)

Proof: See [1, Theorem 3.6 and Theorem 3.9] and use (i) each "iteration k" in [1] is a cycle of n iterations; (ii) choose in [1] the values L_i = L_max for all i; (iii) in [1] we have p = 1 since our blocks of variables are singletons, i.e., coordinate descent.

Comments on Theorem 2.3:
  - The numerator in (17) is O(n²), while the numerator in the analogous result (Theorem 2.2) for the random coordinate choice with fixed step size is O(n). But Theorem 2.3 is a deterministic result, while Theorem 2.2 is a result in expectation.
  - As part of the homework assignment, you will find out for yourself how these methods perform on a simple quadratic objective function.
  - It can be shown that L ≤ Σ_{j=1}^{n} L_j (see [3, Lemma 2 with α = 1]).
  - It follows from the fact that
        |∇_j f(x + t e_j) − ∇_j f(x)| ≤ ‖∇f(x + t e_j) − ∇f(x)‖_2 ≤ L |t|
    holds for all j, t, and x that L_j ≤ L.
  - By combining the previous two bullet points, we find that
        L_max = max_j L_j ≤ L ≤ Σ_j L_j ≤ n L_max,   so that   1 ≤ L/L_max ≤ n.
  - Roughly speaking, L/L_max is closer to 1 when the coordinates are more "decoupled". In light of (17), the complexity result for coordinate descent becomes better as the variables become more decoupled. This makes sense!

Algorithm 5 Coordinate minimization with the Gauss-Southwell rule and a fixed step size.
1: Choose α ∈ (0, 1/L_max].
2: Choose x_0 ∈ R^n and set k ← 0.
3: loop
4:   Calculate i_k as the steepest coordinate direction, i.e.,
         i_k ∈ argmax_i |∇_i f(x_k)|.
5:   Set x_{k+1} ← x_k − α ∇_{i_k} f(x_k) e_{i_k}.
6:   Set k ← k + 1.
7: end loop

Comments:
  - A reasonable stopping condition should be incorporated, such as
        ‖∇f(x_k)‖_2 ≤ 10^{-6} max{1, ‖∇f(x_0)‖_2}.
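A Python sketch of Algorithm 5 (Gauss-Southwell rule with fixed step α = 1/L_max); Algorithm 4 is identical except that the greedy choice is replaced by the cyclic choice i = k % n. As before, the full gradient is formed only for brevity, and the function name is an illustrative assumption.

    import numpy as np

    def gauss_southwell_descent(grad, x0, L_max, max_iter=10000, tol=1e-6):
        """Sketch of Algorithm 5: steepest coordinate (Gauss-Southwell) rule, step 1/L_max."""
        x = np.asarray(x0, dtype=float).copy()
        alpha = 1.0 / L_max
        g0_norm = np.linalg.norm(grad(x))
        for k in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) <= tol * max(1.0, g0_norm):
                break
            i = int(np.argmax(np.abs(g)))     # i_k in argmax_i |grad_i f(x_k)|
            x[i] -= alpha * g[i]              # for Algorithm 4, use i = k % x.size instead
        return x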

Theorem 2.4. Suppose that α = 1/L_max and let the following assumptions hold:
  - f is convex;
  - ∇f is globally Lipschitz continuous;
  - the minimum value of f is attained on some set S, i.e., there exists S ⊆ R^n with x* ∈ S and f* := f(x*) = min_x f(x);
  - there exists a scalar R_0 satisfying
        max_{x* ∈ S} max_x { ‖x − x*‖_2 : f(x) ≤ f(x_0) } ≤ R_0.
Then, the iterate sequence {x_k} computed from Algorithm 5 satisfies
    f(x_k) − f* ≤ 2 n L_max R_0² / k.
If f is strongly convex with parameter σ > 0 (see Theorem 2.2), then
    f(x_k) − f* ≤ (1 − σ/(n L_max))^k (f(x_0) − f*).

Proof: From earlier (see (11)), we showed that
    f_{k+1} ≤ f(x_k) − (1/(2 L_max)) (∇_{i_k} f(x_k))².
Combining this with the choice i_k ∈ argmax_i |∇_i f(x_k)| and the standard norm inequality ‖v‖_2 ≤ √n ‖v‖_∞, it holds that
    f_{k+1} ≤ f(x_k) − (1/(2 L_max)) (∇_{i_k} f(x_k))² = f(x_k) − (1/(2 L_max)) ‖∇f(x_k)‖_∞² ≤ f(x_k) − (1/(2 n L_max)) ‖∇f(x_k)‖_2².    (19)
Subtracting f* from both sides and using the previous fact (see (14)) that
    f(x_k) − f* ≤ R_0 ‖∇f(x_k)‖_2,
we find that
    f_{k+1} − f* ≤ f(x_k) − f* − (1/(2 n L_max)) ‖∇f(x_k)‖_2² ≤ f(x_k) − f* − (1/(2 n L_max R_0²)) (f(x_k) − f*)².
Using the notation φ_k := f(x_k) − f*, this is equivalent to
    φ_{k+1} ≤ φ_k − φ_k² / (2 n L_max R_0²),
which is exactly the same as the inequality (15), except that we now have a different definition of φ_k. Then, as shown in that proof, we have
    f(x_k) − f* = φ_k ≤ 2 n L_max R_0² / k,
which is the desired result for convex f.

Next, assume that f is strongly convex, from which earlier we showed (see (16)) that
    f* ≥ f(x_k) − (1/(2σ)) ‖∇f(x_k)‖_2².
Subtracting f* from each side of (19) and then using the previous inequality shows that
    f_{k+1} − f* ≤ f(x_k) − f* − (1/(2 n L_max)) ‖∇f(x_k)‖_2² ≤ f(x_k) − f* − (σ/(n L_max)) (f(x_k) − f*) = (1 − σ/(n L_max)) (f(x_k) − f*),
so that
    f(x_k) − f* ≤ (1 − σ/(n L_max))^k (f(x_0) − f*),
which is the last desired result.

Comments so far for fixed step size:
  - Cyclic has the worst dependence on n:
        Cyclic: O(n²)
        Random and Gauss-Southwell: O(n)
  - Random is a rate in expectation. Gauss-Southwell is a deterministic rate.
  - There is a better analysis for Gauss-Southwell when we assume that f is strongly convex that changes the above comment! See [4]. We show this next.
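Before turning to that analysis, the comparison above (and the homework comment after Theorem 2.3) can be explored numerically with a short script; the diagonal quadratic and all numerical values below are made-up illustrations, not data from the slides.

    import numpy as np

    # Compare the three fixed-step variants on f(x) = g^T x + 0.5 x^T diag(lam) x,
    # for which grad f(x) = g + lam * x, L_i = lam_i, and L_max = max(lam).
    rng = np.random.default_rng(0)
    n = 50
    lam = rng.uniform(1.0, 10.0, n)          # made-up coordinate Lipschitz constants
    g_vec = rng.standard_normal(n)
    grad = lambda x: g_vec + lam * x
    f = lambda x: g_vec @ x + 0.5 * x @ (lam * x)
    f_star = f(-g_vec / lam)                 # minimizer is x* = -g / lam (componentwise)
    L_max = lam.max()

    def run(index_rule, iters=2000):
        x = np.zeros(n)
        for k in range(iters):
            g = grad(x)
            i = index_rule(k, g)
            x[i] -= g[i] / L_max             # fixed step alpha = 1/L_max
        return f(x) - f_star                 # final optimality gap

    print("cyclic         :", run(lambda k, g: k % n))
    print("random         :", run(lambda k, g: rng.integers(n)))
    print("Gauss-Southwell:", run(lambda k, g: int(np.argmax(np.abs(g)))))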

Theorem 2.5. Suppose that α = 1/L_max and let the following assumptions hold:
  - f is l_1-strongly convex, i.e., there exists σ_1 > 0 such that
        f(y) ≥ f(x) + ∇f(x)^T (y − x) + (σ_1/2) ‖y − x‖_1²   for all {x, y} ⊂ R^n;
  - ∇f is globally Lipschitz continuous;
  - the minimum value of f is attained.
Then, the iterate sequence {x_k} computed from Algorithm 5 satisfies
    f(x_k) − f* ≤ (1 − σ_1/L_max)^k (f(x_0) − f*).                                      (20)

Proof (see [4]): l_1-strong convexity means that
    f(y) ≥ f(x) + ∇f(x)^T (y − x) + (σ_1/2) ‖y − x‖_1²   for all {x, y} ⊂ R^n,
for the l_1-strong convexity parameter σ_1. If we now minimize both sides with respect to y and replace x by x_k, we find that
    f* = min_y f(y) ≥ min_y [ f(x_k) + ∇f(x_k)^T (y − x_k) + (σ_1/2) ‖y − x_k‖_1² ]
       = f(x_k) + ∇f(x_k)^T (y_k − x_k) + (σ_1/2) ‖y_k − x_k‖_1²        (why? exercise)
       = f(x_k) − (1/(2 σ_1)) ‖∇f(x_k)‖_∞²,
where y_k := x_k + z_k with
    [z_k]_i := 0 if i ≠ l   and   [z_k]_l := −∇_l f(x_k)/σ_1,
and l is any index satisfying l ∈ {j : |∇_j f(x_k)| = ‖∇f(x_k)‖_∞}.
Therefore, we have that
    ‖∇f(x_k)‖_∞² ≥ 2 σ_1 (f(x_k) − f*).
Subtracting f* from both sides of (19) and using the previous inequality shows that
    f_{k+1} − f* ≤ f(x_k) − f* − (1/(2 L_max)) ‖∇f(x_k)‖_∞² ≤ f(x_k) − f* − (σ_1/L_max) (f(x_k) − f*) = (1 − σ_1/L_max) (f(x_k) − f*).
Applying this inequality recursively gives
    f(x_k) − f* ≤ (1 − σ_1/L_max)^k (f(x_0) − f*),
which is the desired result.

For strongly convex functions:
  - Random coordinate choice has the expected rate
        E[f(x_k)] − f* ≤ (1 − σ/(n L_max))^k (f(x_0) − f*).
  - Gauss-Southwell coordinate choice has the deterministic rate
        f(x_k) − f* ≤ (1 − σ_1/L_max)^k (f(x_0) − f*).
  - The bound for Gauss-Southwell is better since σ/n ≤ σ_1 ≤ σ, so that
        1 − σ_1/L_max ≤ 1 − σ/(n L_max).
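The "(why? exercise)" step in the proof of Theorem 2.5 can be checked directly; the following worked derivation (added here, not from the slides) minimizes the l_1-strong-convexity lower bound over y, written as a LaTeX display.

    \begin{aligned}
    \min_{y \in \mathbb{R}^n}\ \nabla f(x_k)^T (y - x_k) + \tfrac{\sigma_1}{2}\,\|y - x_k\|_1^2
      &= \min_{t \ge 0}\ \Big(\min_{\|d\|_1 = t} \nabla f(x_k)^T d\Big) + \tfrac{\sigma_1}{2}\,t^2
         && (d := y - x_k)\\
      &= \min_{t \ge 0}\ -t\,\|\nabla f(x_k)\|_\infty + \tfrac{\sigma_1}{2}\,t^2
         && (\text{H\"older's inequality, tight for a 1-sparse } d)\\
      &= -\tfrac{1}{2\sigma_1}\,\|\nabla f(x_k)\|_\infty^2,
         && \text{attained at } t = \|\nabla f(x_k)\|_\infty/\sigma_1 .
    \end{aligned}

The minimizing d places all of its weight on an index l with |∇_l f(x_k)| = ‖∇f(x_k)‖_∞, namely d_l = −∇_l f(x_k)/σ_1, which is exactly the point y_k = x_k + z_k used in the proof; adding f(x_k) to both sides recovers the displayed equality.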

Example: A Simple Diagonal Quadratic Function.
Consider the problem
    minimize_x g^T x + (1/2) x^T H x,
where H = diag(λ_1, λ_2, ..., λ_n) with λ_i > 0 for all i ∈ {1, 2, ..., n}. For this problem, we know that
    σ = min{λ_1, λ_2, ..., λ_n}   and   σ_1 = ( Σ_{i=1}^{n} 1/λ_i )^{-1}.

Case 1: For λ_1 = α for some α > 0, the minimum value for σ_1 occurs when α = λ_1 = λ_2 = ... = λ_n, which gives
    σ = α   and   σ_1 = α/n.
Thus, the convergence constants are:
    random selection:          σ/(n L_max) = α/(nα) = 1/n
    Gauss-Southwell selection: σ_1/L_max   = (α/n)/α = 1/n,
so the convergence constants are the same; this is the worst case for Gauss-Southwell.

Case 2: For this other extreme case, let us suppose that λ_1 = β and λ_2 = λ_3 = ... = λ_n = α with α ≥ β. For this case, it can be shown that
    σ = β   and   σ_1 = (1/β + (n−1)/α)^{-1} = βα / (α + (n−1)β).
If we now take the limit as α → ∞, we find that
    σ = β   and   σ_1 → β = σ.
Thus, the convergence constants in the limit are:
    random selection:          σ/(n L_max) = β/(n L_max)
    Gauss-Southwell selection: σ_1/L_max   → β/L_max,
so that Gauss-Southwell is a factor n faster than using a random coordinate selection.

Alternative 1 (strongly convex): individual coordinate Lipschitz constants.
The iteration update is
    x_{k+1} = x_k − (1/L_{i_k}) ∇_{i_k} f(x_k) e_{i_k}.
Using a similar analysis as before, it can be shown that
    f(x_k) − f* ≤ [ Π_{j=0}^{k−1} (1 − σ_1/L_{i_j}) ] (f(x_0) − f*).
This gives a better decrease than the prior analysis, since (see (20))
    new rate / previous rate = Π_{j=0}^{k−1} (1 − σ_1/L_{i_j}) / (1 − σ_1/L_max)^k ≤ 1,
and it is strictly faster provided at least one of the used L_{i_j} satisfies L_{i_j} < L_max.

Alternative 2 (strongly convex): Lipschitz sampling.
Use a random coordinate direction chosen using a non-uniform probability distribution:
    P(i_k = j) = L_j / Σ_{l=1}^{n} L_l   for all j ∈ {1, 2, ..., n}.
Using an analysis similar to the previous one, but using the new probability distribution when computing the expectation, it can be shown that
    E[f(x_{k+1})] − f* ≤ (1 − σ/(n L̄)) (E[f(x_k)] − f*),
with L̄ being the average component Lipschitz constant, i.e., L̄ := (1/n) Σ_{i=1}^{n} L_i.
The analysis was first performed in [2]. This rate is faster than uniform random sampling if not all of the component Lipschitz constants are the same.
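A Python sketch of Alternative 2 (Lipschitz sampling). The sampling probabilities are exactly the ones displayed above; the per-coordinate step 1/L_{i_k} used below is an assumption of this sketch (it mirrors the use of individual constants in Alternative 1), and the function name is illustrative.

    import numpy as np

    def lipschitz_sampling_cd(grad, x0, L, max_iter=10000, seed=0):
        """Sketch of Alternative 2: P(i_k = j) = L_j / sum_l L_l, with step 1/L_{i_k}."""
        rng = np.random.default_rng(seed)
        L = np.asarray(L, dtype=float)
        p = L / L.sum()                       # non-uniform sampling distribution from the slides
        x = np.asarray(x0, dtype=float).copy()
        for k in range(max_iter):
            i = rng.choice(L.size, p=p)       # coordinate chosen proportionally to L_i
            g = grad(x)
            x[i] -= g[i] / L[i]               # per-coordinate step (assumption of this sketch)
        return x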

Alternative 3 (strongly convex): Gauss-Southwell-Lipschitz rule.
Choose i_k according to the rule
    i_k ∈ argmax_i (∇_i f(x_k))² / L_i.                                                 (21)
Using an argument similar to that which led to (11), it may be shown that
    f(x_{k+1}) ≤ f(x_k) − (1/(2 L_{i_k})) (∇_{i_k} f(x_k))².                             (22)
The update (21) is designed to choose i_k to maximize the guaranteed decrease given by (22), which uses the component Lipschitz constants. It may be shown, using this update, that
    f(x_{k+1}) − f* ≤ (1 − σ_L) (f(x_k) − f*),
where σ_L is the strong convexity parameter with respect to the norm ‖v‖_L := Σ_{i=1}^{n} √(L_i) |v_i|. It is shown in [4, Appendix 6.2] that
    max{ σ/(n L̄), σ_1/L_max } ≤ σ_L ≤ σ_1 / min_i{L_i}.

Ordering of the constants in the linear convergence results when f is strongly convex:
    random uniform sampling (L_max)  <  Gauss-Southwell (L_max)  <  Gauss-Southwell with {L_i}
    random Lipschitz sampling ({L_i})  <  Gauss-Southwell-Lipschitz
Comments:
  - Gauss-Southwell-Lipschitz: the best rate, but the most expensive per iteration.
  - Better rates if you know and use {L_i} instead of just using their maximum, i.e., L_max.
  - Gauss-Southwell-Lipschitz is at least as fast as the fastest of the Gauss-Southwell and Lipschitz sampling options.

Linear Equations.
Let m ≤ n, b ∈ R^m, and A^T = [a_1 ... a_m] ∈ R^{n×m} with ‖a_i‖_2 = 1 for all i. Furthermore, suppose that A^T has full column rank, meaning that the linear system Aw = b has infinitely many solutions. To seek the least-length solution, we wish to solve
    minimize_{w ∈ R^n} (1/2) ‖w‖_2²   subject to   Aw = b.
The Lagrangian dual problem is
    minimize_{x ∈ R^m} f(x) := (1/2) ‖A^T x‖_2² − b^T x,
where we note that ∇f(x) = A(A^T x) − b and ∇_i f(x) = a_i^T (A^T x) − b_i. The solutions to the primal and dual are related via w* = A^T x*.

Coordinate descent gives
    x_{k+1} = x_k − α (a_i^T A^T x_k − b_i) e_i.
If we maintain an estimate w_k = A^T x_k, then we see that
    w_{k+1} = A^T x_{k+1} = A^T ( x_k − α (a_i^T A^T x_k − b_i) e_i )
            = A^T x_k − α (a_i^T A^T x_k − b_i) a_i
            = w_k − α (a_i^T w_k − b_i) a_i.
Note that if α = 1, then it follows by using ‖a_i‖_2 = 1 that
    a_i^T w_{k+1} = a_i^T w_k − (a_i^T w_k − b_i) a_i^T a_i = b_i,
so that the i-th equation is satisfied exactly.

Linear Equations Summary:
  - Coordinate minimization for solving the dual problem associated with linear equations, taken along the direction e_i with α = 1, satisfies the i-th linear equation exactly. Sometimes called the method of successive projections.
  - Update w_{k+1} = w_k − α (a_i^T w_k − b_i) a_i requires
        n + 1 additions/subtractions,
        2n + 1 multiplications,
        3n + 2 total floating-point operations.
  - Computing ∇f(x) requires a multiplication with A, which is much more expensive.
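A Python sketch of the Linear Equations update, maintaining w_k = A^T x_k as in the derivation above with α = 1; each equation is rescaled so that ‖a_i‖_2 = 1 as assumed in the slides. The function name and the cyclic sweep order are illustrative assumptions.

    import numpy as np

    def successive_projections(A, b, n_sweeps=100):
        """Sketch of coordinate minimization on the dual: w_{k+1} = w_k - (a_i^T w_k - b_i) a_i."""
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        norms = np.linalg.norm(A, axis=1)
        A, b = A / norms[:, None], b / norms     # rescale each equation so ||a_i||_2 = 1
        m, n = A.shape
        w = np.zeros(n)                          # maintains w_k = A^T x_k with x_0 = 0
        for k in range(n_sweeps * m):
            i = k % m                            # cyclic choice of the dual coordinate / equation
            w -= (A[i] @ w - b[i]) * A[i]        # ~3n flops; equation i holds exactly afterwards
        return w

For a consistent underdetermined system, the iterates w stay in the row space of A and approach the least-length solution described above.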

Logistic Regression.
Given data {d_j}_{j=1}^{N} ⊂ R^n and labels {y_j}_{j=1}^{N} ⊂ {−1, 1} associated with the data, solve
    minimize_x f(x) := (1/N) Σ_{j=1}^{N} log(1 + e^{−y_j d_j^T x}).
If we define the data matrix D such that
    D = [d_1^T; d_2^T; ...; d_N^T] ∈ R^{N×n},
then it follows that
    ∇_i f(x) = −(1/N) Σ_{j=1}^{N} ( e^{−y_j d_j^T x} / (1 + e^{−y_j d_j^T x}) ) y_j d_{ji}.

Consider the coordinate minimization update
    x_{k+1} = x_k + α e_{i_k}   for some i_k ∈ {1, 2, ..., n} and α ∈ R.
For efficiency, we store and update the required quantities {D x_k} using
    D x_{k+1} (new value) = D (x_k + α e_{i_k}) = D x_k + α D e_{i_k} = D x_k (old value) + α D(:, i_k),
where D(:, i_k) denotes the i_k-th column of D; if x_0 ← 0, then we can set D x_0 ← 0.

Logistic Regression Summary:
  - Coordinate minimization for the logistic regression problem does not require computing the entire gradient during every iteration.
  - The update to obtain D x_{k+1} requires a single vector-vector add.
  - Computing ∇_i f(x_k) only requires accessing a single column of the data matrix D.
  - Computing ∇f(x) requires accessing the entire data matrix D.

References
[1] A. Beck and L. Tetruashvili, On the convergence of block coordinate descent type methods, SIAM Journal on Optimization, 23 (2013).
[2] D. Leventhal and A. S. Lewis, Randomized methods for linear constraints: convergence rates and conditioning, Mathematics of Operations Research, 35 (2010).
[3] Y. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM Journal on Optimization, 22 (2012).
[4] J. Nutini, M. Schmidt, I. H. Laradji, M. Friedlander, and H. Koepke, Coordinate descent converges faster with the Gauss-Southwell rule than random selection, in Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015.
[5] M. J. D. Powell, On search directions for minimization algorithms, Mathematical Programming, 4 (1973).
[6] A. P. Ruszczyński, Nonlinear Optimization, Princeton University Press, 2006.
[7] S. J. Wright, Coordinate descent algorithms, Mathematical Programming, 151 (2015).
