Evaluation complexity of adaptive cubic regularization methods for convex unconstrained optimization


Coralia Cartis, Nicholas I. M. Gould and Philippe L. Toint

October 30, 2010; revised March 30, 2011

Abstract

The adaptive cubic regularization algorithms described in Cartis, Gould & Toint (2009, 2010) for unconstrained (nonconvex) optimization are shown to have improved worst-case efficiency in terms of the function- and gradient-evaluation count when applied to convex and strongly convex objectives. In particular, our complexity upper bounds match in order (as a function of the accuracy of approximation), and sometimes even improve, those obtained by Nesterov (2004, 2008) and Nesterov & Polyak (2006) for these same problem classes, without requiring exact Hessians or exact or global solution of the subproblem. An additional outcome of our approximate approach is that our complexity results can naturally capture the advantages of both first- and second-order methods.

1 Introduction

State-of-the-art methods for unconstrained smooth optimization typically depend on trust-region [6] or line-search [7] techniques to globalise Newton-like iterations. Of late, a third alternative, in which a local cubic over-estimator of the objective is used as the basis of a regularization strategy for the step computation, has been proposed [9, 2, 3]; see [2, 1] for a detailed description of these contributions. Such ideas have been refined so that they are now well suited to large-scale computation for a wide class of nonlinear nonconvex objectives; rigorous convergence and complexity analyses under weak assumptions, together with promising numerical experience with these techniques, are available [2, 3]. Our objective in this paper is to show that the complexity bounds for this type of algorithm improve significantly in the presence of convexity or strong convexity.
Specifically, at each iteration of what we call an ARC (Adaptive Regularization with Cubics) framework, a possibly nonconvex model

   m_k(s) = f(x_k) + s^T g_k + (1/2) s^T B_k s + (1/3) σ_k ‖s‖³,   (1.1)

is employed as an approximation to the smooth objective f(x_k + s) we wish to minimize. Here σ_k > 0 is a regularization weight, we have written ∇f(x_k) = g(x_k) = g_k, and here and hereafter we choose the Euclidean norm ‖·‖ = ‖·‖_2. To compute the change s_k to x_k, the model m_k is globally minimized, either exactly or approximately, with respect to s ∈ IR^n. Note that if B_k is taken to be the Hessian H(x) of f, and the latter is globally Lipschitz continuous with Lipschitz constant 2σ_k, we have the overestimation property f(x_k + s) ≤ m_k(s) for all s ∈ IR^n [2, 1]. Thus in this case, minimizing m_k with respect to s forces a decrease in f from the value f(x_k), since f(x_k) = m_k(0). In the general ARC algorithmic framework,

[Footnotes: School of Mathematics, University of Edinburgh, The King's Buildings, Edinburgh, EH9 3JZ, Scotland, UK (coralia.cartis@ed.ac.uk). All three authors are grateful to the Royal Society for its support through the International Joint Project. Computational Science and Engineering Department, Rutherford Appleton Laboratory, Chilton, Oxfordshire, OX11 0QX, England, UK (nick.gould@stfc.ac.uk); this work was supported by the EPSRC grant EP/E053351/1. Department of Mathematics, FUNDP - University of Namur, 61, rue de Bruxelles, B-5000, Namur, Belgium (philippe.toint@fundp.ac.be).]

H need not be Lipschitz, nor need B_k be H(x_k), but in this case σ_k must be adjusted as the computation proceeds to ensure convergence [2, 3, §2.1]. The generic ARC framework [2, 3, §2.1] may be summarised as follows:

Algorithm 1.1: Adaptive Regularization using Cubics (ARC) [2, 3].

Given x_0, γ_2 ≥ γ_1 > 1, 1 > η_2 ≥ η_1 > 0, and σ_0 > 0, for k = 0, 1, ... until convergence,

1. Compute a step s_k for which

   m_k(s_k) ≤ m_k(s_k^C),   (1.2)

   where the Cauchy point

   s_k^C = −α_k^C g_k and α_k^C = arg min_{α ∈ IR_+} m_k(−α g_k).   (1.3)

2. Compute f(x_k + s_k) and

   ρ_k = (f(x_k) − f(x_k + s_k)) / (f(x_k) − m_k(s_k)).   (1.4)

3. Set

   x_{k+1} = x_k + s_k if ρ_k ≥ η_1, and x_{k+1} = x_k otherwise.   (1.5)

4. Set

   σ_{k+1} ∈ (0, σ_k] if ρ_k > η_2 [very successful iteration],
   σ_{k+1} ∈ [σ_k, γ_1 σ_k] if η_1 ≤ ρ_k ≤ η_2 [successful iteration],
   σ_{k+1} ∈ [γ_1 σ_k, γ_2 σ_k] otherwise [unsuccessful iteration].   (1.6)

For a detailed description of the algorithm construction, including a justification that (1.2)–(1.4) are well-defined until termination, see [2]. The above ARC algorithm is a very general first-order framework that, due to the Cauchy condition (1.2), ensures at least a steepest-descent-like decrease in each (successful) iteration. This is sufficient to ensure global convergence of ARC to first-order critical points [2, §2.1] and, with steepest-descent-like function-evaluation complexity bounds of order ǫ^{-2} [3, §3], to guarantee

   ‖g_k‖ ≤ ǫ.   (1.7)

These results require that g(x) is uniformly continuous and Lipschitz continuous (respectively) and that {B_k} is uniformly bounded above. Clearly, the Cauchy point s_k^C achieves (1.2) in a computationally inexpensive way (see [2, §2.1]); the choice of interest, however, is when s_k is an (approximate global) minimizer of m_k(s) and B_k a nontrivial approximation to the Hessian H(x_k) (see §3). Although m_k might be nonconvex, its global minimizer over IR^n is always well-defined and can be characterized in a computationally-viable way [2, Thm. 3.1], [9, 2].
This characterization is best suited for exact computation when B_k is sparse or of modest size. For large problems, a suitable alternative is to improve upon the Cauchy point by globally minimizing m_k over (nested and increasing) subspaces that include g_k, which ensures that (1.2) remains satisfied, until a suitable termination condition is achieved. (For instance, in our ARC implementation [2], the successive subspaces over which the model is minimized are generated using the Lanczos method.) These ARC variants are summarized in Algorithm 1.2, where h_k(‖s_k‖, ‖g_k‖) is some generic function of ‖s_k‖ and ‖g_k‖, with specific examples of suitable choices given in (1.10) and (1.11) below.
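The generic framework of Algorithm 1.1 can be sketched in code. The following minimal Python illustration is not the authors' software: it uses exact Hessians for B_k, computes the Cauchy point by a crude one-dimensional grid search (so the Cauchy condition (1.2) holds by construction), and uses illustrative values for the parameters and the σ-updates; all names are hypothetical.

```python
import numpy as np

def arc_minimize(f, grad, hess, x0, sigma0=1.0, eta1=0.1, eta2=0.9,
                 gamma1=2.0, tol=1e-8, max_iter=200):
    """Sketch of the generic ARC framework (Algorithm 1.1), with the step
    taken as the Cauchy point s_k^C = -alpha_k^C g_k of (1.3), found by a
    crude grid search on alpha; parameter values are illustrative only."""
    x, sigma = np.asarray(x0, dtype=float), sigma0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) <= tol:
            break
        # m_k(-alpha g) - f(x_k) = -alpha ||g||^2 + (alpha^2/2) g^T B g
        #                          + (sigma_k/3) alpha^3 ||g||^3
        gg, gBg, g3 = g @ g, g @ B @ g, np.linalg.norm(g) ** 3
        alphas = np.linspace(1e-4, 10.0, 100000)
        mvals = -alphas * gg + 0.5 * alphas**2 * gBg + sigma / 3.0 * alphas**3 * g3
        s = -alphas[np.argmin(mvals)] * g          # Cauchy step, so (1.2) holds
        rho = (f(x) - f(x + s)) / (-mvals.min())   # ratio (1.4)
        if rho >= eta1:                            # accept the step, cf. (1.5)
            x = x + s
        if rho > eta2:                             # very successful: shrink sigma
            sigma = max(1e-12, sigma / gamma1)
        elif rho < eta1:                           # unsuccessful: inflate sigma
            sigma = gamma1 * sigma
        # successful (eta1 <= rho <= eta2): keep sigma, cf. (1.6)
    return x
```

On a convex quadratic with B_k exact, every iteration is very successful, σ_k shrinks, and the scheme behaves like exactly line-searched steepest descent, consistent with the Cauchy-point analysis of Section 2.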

Algorithm 1.2: ARC(h) [2, 3].

In each iteration k of Algorithm 1.1, compute s_k in Step 1 as the global minimizer of

   min_{s ∈ IR^n} m_k(s) subject to s ∈ L_k,   (1.8)

where L_k is a subspace of IR^n containing g_k, and such that the termination condition

   TC.h   ‖∇_s m_k(s_k)‖ ≤ θ_k ‖g_k‖, where θ_k = κ_θ min(1, h_k) and h_k = h_k(‖s_k‖, ‖g_k‖) > 0,   (1.9)

is satisfied, for some constant κ_θ ∈ (0, 1) chosen at the start of the algorithm.

Clearly, TC.h is satisfied when s_k is the global minimizer of m_k over the whole space, but one hopes that termination of the subspace minimization will occur well before this inevitable outcome, at least in the early stages of the iteration. Note that, in fact, TC.h only requires an approximate critical point of the model, and as such the global subspace minimization in (1.8) may only need to hold along the one-dimensional subspace determined by s_k [2, (3.11), (3.12)], provided (1.2) holds. For ARC(h) to be a proper second-order method, a careful choice of h_k needs to be made, such as h_k = ‖s_k‖ or h_k = ‖g_k‖², yielding the termination criteria

   TC.s   ‖∇_s m_k(s_k)‖ ≤ θ_k ‖g_k‖, where θ_k = κ_θ min(1, ‖s_k‖),   (1.10)

and

   TC.g2  ‖∇_s m_k(s_k)‖ ≤ θ_k ‖g_k‖, where θ_k = κ_θ min(1, ‖g_k‖²).   (1.11)

Forthwith, we refer to ARC(h) with TC.s and with TC.g2 as ARC(S) and ARC(g2), respectively. The benefit of requiring the more stringent conditions (1.8), and (1.10) or (1.11), in the above ARC variants is that ARC(S) and ARC(g2) are also guaranteed to converge locally Q-quadratically and globally to second-order critical points [2, §4.2, §5], and to have improved function-evaluation complexity of order ǫ^{-3/2} to ensure (1.7) [3, §5], provided H(x) is globally Lipschitz continuous along the path of the iterates and there is sufficiently good agreement between H(x_k) and its approximation B_k.
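The termination tests (1.9)–(1.11) are cheap to evaluate once the model gradient ∇_s m_k(s_k) is available. A minimal sketch (the function name and the value of κ_θ are illustrative, not from the paper):

```python
import numpy as np

def tc_satisfied(model_grad, g, s, kappa_theta=0.5, rule="s"):
    """Check TC.s (1.10) or TC.g2 (1.11): ||grad m_k(s_k)|| <= theta_k ||g_k||.

    model_grad is nabla_s m_k(s_k); kappa_theta in (0, 1) is illustrative."""
    gnorm = np.linalg.norm(g)
    if rule == "s":
        theta = kappa_theta * min(1.0, np.linalg.norm(s))   # TC.s
    else:
        theta = kappa_theta * min(1.0, gnorm ** 2)          # TC.g2
    return np.linalg.norm(model_grad) <= theta * gnorm
```

As ‖g_k‖ → 0 the TC.g2 tolerance θ_k‖g_k‖ shrinks like ‖g_k‖³ rather than like ‖g_k‖‖s_k‖, so the subproblem is asymptotically solved to higher accuracy under TC.g2 than under TC.s.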
In this paper, we investigate the worst-case function-evaluation complexity of the basic ARC framework and of its second-order variants ARC(S) and/or ARC(g2) when applied to the minimization of special classes of objectives, namely convex and strongly convex ones. In particular, we show that, as expected, these algorithms satisfy improved bounds compared to the nonconvex case. Specifically, generic ARC (Algorithm 1.1) takes at most O(ǫ^{-1}) and O(log ǫ^{-1}) function evaluations to reach the neighbourhood

   f(x_k) − f_* ≤ ǫ   (1.12)

of the (global) minimum f_* of convex and strongly convex objectives, respectively, with Lipschitz continuous gradients, where the dependence of these bounds on problem conditioning is carefully considered (see page 9). Unsurprisingly, due to the simple Cauchy decrease condition (1.2) required on the step, these bounds match in order those for standard steepest-descent methods on the same classes of objectives [10]. When applied to convex objectives with bounded level sets and globally Lipschitz continuous Hessian, ARC(g2) with B_k = H(x_k) will reach approximate optimality in the sense of (1.12) in at most O(ǫ^{-1/2}) function evaluations; this matches in order the bound obtained in [11, 12] for cubic regularization on the same problem class when the exact subproblem solution is computed in each iteration. Note that, asymptotically, in ARC(g2) the subproblem is solved to higher accuracy than in ARC(S), which seems to be crucial when deriving the improved bound compared to the first-order basic ARC. We also present an illustration on a common convex objective which indicates that, despite being worst-case, the bounds presented here may be tight.

If the objective is strongly convex, then ARC(S) and ARC(g2) (with approximate Hessians as B_k) require at most O(log κ + log log ǫ^{-1}) function evaluations to satisfy (1.12), where κ is a problem-dependent constant and where the double-logarithm term expresses the local Q-quadratic rate of convergence of these variants. The strongly-convex-case bound improves that obtained in [11, 12] for cubic regularization with exact subproblem solution in that the former has a logarithmic dependence on κ while the latter only includes a polynomial dependence on problem condition numbers. Our result is a direct consequence of using increasing accuracy in the subproblem solution, with first-order-like behaviour, and hence complexity, early on, and second-order characteristics asymptotically. Note that the assumption labelling used throughout the paper was chosen to maintain consistency with the notation introduced in [2, 3].

The structure of the paper is as follows. Section 2 analyzes the complexity of basic ARC, and Section 3 that of the second-order variants ARC(S) and ARC(g2), in the convex and strongly convex cases. Section 3.3 presents a convex example of inefficient ARC behaviour with O(ǫ^{-1/2}) complexity, and Section 4 draws some conclusions and open questions.

2 The complexity of the basic ARC framework

This section addresses the basic ARC algorithm, Algorithm 1.1. We assume that

AF.1   f ∈ C^1(IR^n),   (2.1)

and that the gradient g is Lipschitz continuous on an open convex set X containing all the iterates {x_k},

AF.4   ‖g(x) − g(y)‖ ≤ κ_H ‖x − y‖, for all x, y ∈ X, and some κ_H ≥ 1.   (2.2)

If f ∈ C^2(IR^n), then AF.4 is satisfied if the Hessian H(x) is bounded above on X. Note, however, that for now we only assume AF.1. In particular, no Lipschitz continuity of H(x) will be required in this section. The model m_k is assumed to achieve

AM.1   ‖B_k‖ ≤ κ_B, for all k ≥ 0, and some κ_B ≥ 1.
   (2.3)

In the case when f ∈ C^2(IR^n) and B_k = H(x_k) for all k, AF.4 implies AM.1 with κ_B = κ_H. Naturally, we assume f is bounded below, letting f_* > −∞ be the (global) minimum of f and

   Δ_k = f(x_k) − f_*, for all k ≥ 0.   (2.4)

2.1 Relating successful and total iteration counts

Note that the total number of ARC iterations is the same as the number of function evaluations (as we also need to evaluate f on unsuccessful iterations in order to be able to compute ρ_k in (1.4)), while the number of successful ARC iterations is the same as that of gradient evaluations. Let us introduce some useful notation. Throughout, denote the index set

   S = {k ≥ 0 : k successful or very successful in the sense of (1.6)},   (2.5)

and, given any j ≥ 0, let

   S_j = {k ≤ j : k ∈ S},   (2.6)

with |S_j| denoting the cardinality of the latter. Concerning σ_k, we may require that on each very successful iteration k ∈ S_j, σ_{k+1} is chosen such that

   σ_{k+1} ≥ γ_3 σ_k, for some γ_3 ∈ (0, 1].   (2.7)

Note that (2.7) allows {σ_k} to converge to zero on very successful iterations (but no faster than {γ_3^k}). A stronger condition on σ_k is

   σ_k ≥ σ_min, k ≥ 0,   (2.8)

for some σ_min > 0. These conditions on σ_k and the construction of ARC's Steps 2–4 allow us to quantify the total iteration count as a function of the successful ones.

Theorem 2.1. For any fixed j ≥ 0, let S_j be defined in (2.6). Assume that (2.7) holds and let σ̄ > 0 be such that

   σ_k ≤ σ̄, for all k ≤ j.   (2.9)

Then

   j ≤ (1 − log γ_3/log γ_1) |S_j| + (1/log γ_1) log(σ̄/σ_0).   (2.10)

In particular, if σ_k satisfies (2.8), then it also achieves (2.7) with γ_3 = σ_min/σ̄, and we have that

   j + 1 ≤ (1 + (2/log γ_1) log(σ̄/σ_min)) |S_j|.   (2.11)

Proof. Apply [3, Theorem 2.1] and the fact that the unsuccessful iterations up to j, together with S_j, form a partition of {0, ..., j}. □

Values for σ̄ in (2.9) are provided in (2.16) below and, under stronger assumptions, in (3.6). (Note that, due to Lemmas 2.4 and 2.6, the condition required for (2.16) is achieved by the gradient of convex and strongly convex functions, with appropriate values of ǫ, whenever Δ_k > ǫ.) Thus, based on the above theorem, we are left with bounding the successful iteration count |S_j| up to an iteration j that is within ǫ of the optimum, which we focus on for the remainder of the paper, and which has the outcome that the total iteration count up to j is of the same order in ǫ as |S_j|.

2.2 Some useful properties

The next lemma summarizes some useful properties of the basic ARC iteration.

Lemma 2.2. Suppose that the step s_k satisfies (1.2).

i) [2, Lemma 2.1] Let AM.1 hold. Then for k ≥ 0, we have that

   f(x_k) − m_k(s_k) ≥ (‖g_k‖/(6√2)) min( ‖g_k‖/κ_B, (1/2)√(‖g_k‖/σ_k) ),   (2.12)

and so Δ_k in (2.4) is monotonically decreasing,

   Δ_{k+1} ≤ Δ_k, k ≥ 0.   (2.13)

ii) [3, Lemma 3.2] Let AF.1, AF.4 and AM.1 hold. Also, assume that

   σ_k ‖g_k‖ > (108√2/(1 − η_2)) (κ_H + κ_B) = κ_HB.   (2.14)

Then iteration k is very successful and

   σ_{k+1} ≤ σ_k.   (2.15)

iii) [3, Lemma 3.3] Let AF.1, AF.4 and AM.1 hold. For any ǫ > 0 and j ≥ 0 such that ‖g_k‖ > ǫ for all k ∈ {0, ..., j}, we have

   σ_k ≤ max(σ_0, γ_2 κ_HB/ǫ), 0 ≤ k ≤ j.   (2.16)

A generic property follows.

Lemma 2.3. Assume AF.1, AF.4 and AM.1 hold, and that, when applying ARC to minimizing f,

   Δ_k ≤ κ_c ‖g_k‖^p, for all k ≥ 0,   (2.17)

for some κ_c > 0 and p > 0, with Δ_k defined in (2.4). Then

   f(x_k) − m_k(s_k) ≥ κ_m Δ_k^{2/p}, for all k ≥ 0,   (2.18)

where κ_HB is defined in (2.14) and

   κ_m = (1/(2√2 κ_c^{2/p})) min( 1/κ_B, κ_c^{1/p}/max(σ_0 Δ_0^{1/p}, γ_2 κ_c^{1/p} κ_HB) ).   (2.19)

Proof. We first show that

   σ_k Δ_k^{1/p} ≤ max(σ_0 Δ_0^{1/p}, γ_2 κ_c^{1/p} κ_HB), for all k ≥ 0.   (2.20)

For this, we use the implication

   σ_k Δ_k^{1/p} > κ_c^{1/p} κ_HB  ⟹  σ_{k+1} Δ_{k+1}^{1/p} ≤ σ_k Δ_k^{1/p},   (2.21)

which follows from (2.15) in Lemma 2.2 ii), (2.17) and (2.13). Thus, when σ_0 Δ_0^{1/p} ≤ γ_2 κ_c^{1/p} κ_HB, (2.21) implies σ_k Δ_k^{1/p} ≤ γ_2 κ_c^{1/p} κ_HB, where the factor γ_2 is introduced for the case when σ_k Δ_k^{1/p} is less than κ_c^{1/p} κ_HB and the iteration k is not very successful. Letting k = 0 in (2.21) gives the first inequality in (2.20) when σ_0 Δ_0^{1/p} ≥ γ_2 κ_c^{1/p} κ_HB, since γ_2 > 1. Next we deduce from (2.12) and (2.17) that

   f(x_k) − m_k(s_k) ≥ (Δ_k^{2/p}/(2√2 κ_c^{2/p})) min( 1/κ_B, κ_c^{1/p}/(σ_k Δ_k^{1/p}) ),

which, together with (2.20) and the definition of κ_HB, gives (2.18) and (2.19). □

In the next two sections, we show that, when applied to convex and strongly convex functions with globally Lipschitz continuous gradients, the basic ARC algorithm, with only the Cauchy condition for the step computation, satisfies the same upper iteration complexity bounds as steepest descent applied to these problem classes, namely O(ǫ^{-1}) and O(log ǫ^{-1}), respectively; see [10, Theorems 2.1.14, 2.1.15].

2.3 Basic ARC complexity on convex objectives

Let us now assume that

AF.7   f is convex,   (2.22)

and also that the level sets of f are bounded, namely,

AF.8   ‖x − x_*‖ ≤ D, for all x such that f(x) ≤ f(x_0),   (2.23)

where x_* is any global minimizer of f and D ≥ 1. The following property specifies the values of p and κ_c for which (2.17) holds in the convex case.

Lemma 2.4. Assume AF.1 and AF.7–AF.8 hold, and let f_* = f(x_*) be the (global) minimum of f. When applying ARC to minimizing f, we have, for Δ_k in (2.4),

   Δ_k ≤ D ‖g_k‖, for all k ≥ 0.   (2.24)

Proof. AF.7 implies f(x) ≥ f(y) + g(y)^T(x − y), for all x, y ∈ IR^n. This with x = x_* and y = x_k, the Cauchy–Schwarz inequality, f(x_k) ≤ f(x_0) and AF.8 give (2.24). □

An O(ǫ^{-1}) upper bound on the ARC iteration count for reaching within-ǫ optimality of the objective value is given next.

Theorem 2.5. Assume AF.1, AF.4, AF.7–AF.8 and AM.1 hold, and let f_* = f(x_*) be the (global) minimum of f. Then, when applying ARC to minimizing f, we have

   Δ_j = f(x_j) − f_* ≤ 1/(η_1 κ_m^c |S_j|), j ≥ 0,   (2.25)

where S_j is defined in (2.6), and κ_m^c has the expression

   κ_m^c = (1/(2√2 D²)) min( 1/κ_B, D/max(σ_0 Δ_0, γ_2 D κ_HB) ).   (2.26)

Thus, given any ǫ > 0, ARC takes at most

   κ_s^c / ǫ   (2.27)

successful iterations and gradient evaluations to generate f(x_j) − f_* ≤ ǫ, where κ_s^c = 1/(η_1 κ_m^c).

Proof. From (1.4) and (1.5), we have

   f(x_k) − f(x_{k+1}) ≥ η_1 (f(x_k) − m_k(s_k)), k ∈ S.   (2.28)

Lemma 2.4 implies that the conditions of Lemma 2.3 are satisfied with p = 1 and κ_c = D, and so (2.18) and (2.28) imply f(x_k) − f(x_{k+1}) ≥ η_1 κ_m^c Δ_k², where κ_m^c is defined in (2.26). Thus, recalling (2.4), we have

   Δ_k − Δ_{k+1} ≥ η_1 κ_m^c Δ_k², k ∈ S,

or equivalently,

   1/Δ_{k+1} − 1/Δ_k = (Δ_k − Δ_{k+1})/(Δ_k Δ_{k+1}) ≥ η_1 κ_m^c Δ_k²/(Δ_k Δ_{k+1}) ≥ η_1 κ_m^c, k ∈ S,

where in the last inequality we used (2.13). Since Δ_k = Δ_{k+1} for any k ∉ S, summing up the above inequalities up to j gives

   1/Δ_j ≥ 1/Δ_0 + |S_j| η_1 κ_m^c ≥ |S_j| η_1 κ_m^c, j ≥ 0,

which gives (2.25), and hence also (2.27). □

2.4 Basic ARC complexity on strongly convex objectives

When we know even more about f, namely, that f is strongly convex, a global linear rate of convergence, and hence an improved iteration complexity of at most O(log ǫ^{-1}), can be proved for the basic ARC framework, as we show next. This represents, as expected, a marked improvement over the global sublinear rates of convergence obtained in the nonconvex and convex cases, and over the corresponding iteration complexity bounds. Let us assume that f is strongly convex, namely, there exists a constant µ > 0 such that

AF.9   f(y) ≥ f(x) + g(x)^T(y − x) + (µ/2)‖y − x‖², for all x, y ∈ IR^n.   (2.29)

When AF.9 holds, f has a unique minimizer, say x_*. The next property specifies the values of p and κ_c for which (2.17) holds in the strongly convex case.

Lemma 2.6. Assume AF.1 and AF.9 hold, and let x_* be the global minimizer of f. When applying ARC to minimizing f, we have

   Δ_k ≤ (1/(2µ)) ‖g_k‖², for all k ≥ 0.   (2.30)

Proof. AF.9 implies f(y) ≤ f(x) + g(x)^T(y − x) + (1/(2µ)) ‖g(x) − g(y)‖², for all x, y ∈ IR^n; see [10, Theorem 2.1.10] and its proof. Letting x = x_* and y = x_k in the latter gives (2.30). □

An O(log ǫ^{-1}) upper bound on the ARC iteration count for reaching within-ǫ optimality of the objective value is given next.

Theorem 2.7. Assume AF.1, AF.4, AF.9 and AM.1 hold, and let x_* be the global minimizer of f. Then, when applying ARC to minimizing f, we have

   Δ_j = f(x_j) − f_* ≤ (1 − η_1 κ_m^sc)^{|S_j|} Δ_0, j ≥ 0,   (2.31)

where S_j is defined in (2.6), and κ_m^sc has the expression

   κ_m^sc = (µ/(6√2)) min( 1/(σ_0 √(2µΔ_0)), 1/(γ_2 κ_HB) ) ∈ (0, 1).   (2.32)

Thus, given any ǫ > 0, ARC takes at most

   κ_s^sc log(Δ_0/ǫ)   (2.33)

successful iterations and gradient evaluations to generate f(x_j) − f_* ≤ ǫ, where κ_s^sc = 1/(η_1 κ_m^sc).

Proof. Lemma 2.6 implies that (2.17) holds with p = 2 and κ_c = 1/(2µ), and so the conditions of Lemma 2.3 are satisfied; it follows immediately from (2.18), (2.19), (2.28) and the above choices of p and κ_c that

   Δ_k − Δ_{k+1} = f(x_k) − f(x_{k+1}) ≥ η_1 κ_m^sc Δ_k, k ∈ S,

where κ_m^sc is defined in (2.32), which immediately gives (2.31) since Δ_k = Δ_{k+1} for any k ∉ S. To show that κ_m^sc < 1, use γ_2 ≥ 1, κ_HB ≥ κ_H and κ_H/µ ≥ 1; the latter inequality follows from (2.30) and from (2.37) with x = x_k. The bound (2.31) and the inequality (1 − η_1 κ_m^sc)^{|S_j|} ≤ e^{−η_1 κ_m^sc |S_j|} imply that Δ_j ≤ ǫ provided e^{−η_1 κ_m^sc |S_j|} Δ_0 ≤ ǫ, which then gives (2.33) by applying the logarithm. □

Some remarks on basic ARC's complexity for convex and strongly convex objectives. Let us comment on the results in Theorems 2.5 and 2.7. Note that, despite AF.7 or AF.9, no convexity assumption was made on m_k, confirming the basic ARC framework to be a steepest-descent-like method. The only model assumption is AM.1. Our results match in order, as a function of the accuracy ǫ, the (nonoptimal) complexity bounds for steepest descent applied to convex and strongly convex objectives with Lipschitz continuous gradients given in [10, Corollary 2.1.2, Theorem 2.1.15]. Let us now discuss the condition numbers that occur in our bounds and their connection to standard measures of conditioning. Consider first the convex-case bound in Theorem 2.5. Assume that the initial regularization parameter σ_0 is chosen small enough, namely, σ_0 ≤ 1/‖g_0‖. Then (2.24) implies that σ_0 Δ_0 ≤ D, and so (2.26) becomes κ_m^c = 1/(2√2 γ_2 κ_HB D²), where we also used that γ_2 ≥ 1 and κ_HB ≥ 1. Recalling (2.14) and that γ_2, η_1 and η_2 are user-chosen constants, we deduce that the bound (2.27) is a problem-independent constant multiple of

   max(κ_B, κ_H) D² / ǫ,

where D measures the size of the f(x_0) level set, and κ_H and κ_B are the exact and approximate Lipschitz constants of the gradient, respectively.
The displayed expression coincides with the bound in [10, Corollary 2.1.2] when the exact Hessian is used in place of B_k, so that κ_B = κ_H, and all iterations are successful. Consider now the strongly convex case and Theorem 2.7. Choosing again σ_0 ≤ 1/‖g_0‖, (2.30) provides that σ_0 √(2µΔ_0) ≤ 1. Using this, γ_2 ≥ 1 and κ_HB ≥ 1, (2.32) becomes κ_m^sc = µ/(6√2 γ_2 κ_HB). Employing (2.14) for the expression of κ_HB, (2.31) now becomes

   Δ_j = f(x_j) − f_* ≤ (1 − η̄/c(H))^{|S_j|} Δ_0,   (2.34)

where

   η̄ = η_1(1 − η_2)/(2592 γ_2) ∈ (0, 1) and c(H) = max(κ_H, κ_B)/µ.   (2.35)

Note that c(H) is a uniform upper bound on the Hessian's condition number, which equals the common measure κ_H/µ when exact Hessians are employed in place of B_k. Recalling that η_1, η_2 and γ_2 are user-chosen parameters, we deduce that, whenever σ_0 ≤ 1/‖g_0‖, (2.33) is a problem-independent constant multiple of

   c(H) log(Δ_0/ǫ),   (2.36)

where c(H) is defined in (2.35). When B_k = H(x_k), the function-decrease bound for the steepest-descent method in [10, Theorem 2.1.15] has a similar form to the simplified bound (2.34), with the factor 1 − η̄/c(H) replaced by the slightly smaller expression (c(H) − 1)²/(c(H) + 1)². Note that both (2.27) and (2.33) are worse than the complexity bounds of the optimal gradient method [10]. The latter enjoys a worst-case bound of order O(1/√ǫ) when applied to convex objectives [10, Theorems 2.1.7, 2.2.1], and of order O(√c(H) log ǫ^{-1}), with linear rate factor ((√c(H) − 1)/(√c(H) + 1))², for strongly convex functions. These two upper bounds match the lower complexity bounds for the minimization of convex and strongly convex functions with Lipschitz continuous gradient by means of gradient methods [10], and hence they are optimal from a worst-case complexity point of view.
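The counts above are easy to evaluate for concrete constants. The following small sketch assumes the bound (2.11) in the form j + 1 ≤ (1 + (2/log γ_1) log(σ̄/σ_min)) |S_j| and the strongly convex successful-iteration bound κ_s^sc log(Δ_0/ǫ) of Theorem 2.7; the helper name and all numerical values are illustrative, not from the paper.

```python
import math

def arc_iteration_bounds(delta0, eps, eta1, kappa_m_sc, sigma_bar, sigma_min, gamma1):
    """Worst-case counts for strongly convex objectives (illustrative):
    successful iterations from Theorem 2.7, kappa_s^sc * log(delta0/eps)
    with kappa_s^sc = 1/(eta1 * kappa_m_sc); total iterations by inflating
    with the Theorem 2.1 factor depending on the sigma_k range and gamma_1."""
    successful = math.ceil(math.log(delta0 / eps) / (eta1 * kappa_m_sc))
    factor = 1.0 + (2.0 / math.log(gamma1)) * math.log(sigma_bar / sigma_min)
    return successful, math.ceil(factor * successful)
```

For example, with Δ_0 = 1, ǫ = 10⁻⁶, η_1 = 0.1, κ_m^sc = 0.5, σ̄/σ_min = 100 and γ_1 = 2, this gives 277 successful iterations and a total count roughly 14 times larger; the log ǫ^{-1} dependence of (2.33) is visible directly.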

2.5 Complexity of basic ARC generating approximately-optimal gradients

Let us address the implication of the above results on ARC's complexity for achieving (1.7). This issue is important as the latter can be used as a termination condition for ARC, while Δ_k in (2.4), whose complexity was estimated above, cannot be computed in practice since f_* and x_* are unknown. The following generic property is useful in this and other contexts.

Lemma 2.8. Let AF.1 and AF.4 hold, and assume f is bounded below by f_*. Then

   f(x) − f_* ≥ max_{α ≥ 0} [f(x) − f(x − αg(x))] ≥ (1/(2κ_H)) ‖g(x)‖², for all x ∈ IR^n.   (2.37)

Thus, when ARC is applied to minimizing f, we have

   Δ_k ≥ (1/(2κ_H)) ‖g_k‖², k ≥ 0,   (2.38)

and so, for any ǫ > 0, ‖g_j‖ ≤ ǫ holds whenever

   f(x_j) − f_* ≤ ǫ²/(2κ_H).   (2.39)

Proof. First-order Taylor expansion and AF.4 give the overestimation property

   f(x + s) = f(x) + g(x)^T s + ∫_0^1 (g(x + ts) − g(x))^T s dt ≤ f(x) + g(x)^T s + (κ_H/2) ‖s‖², for all x, s ∈ IR^n.

Thus, letting s = −αg(x), we obtain

   f(x) − f(x − αg(x)) ≥ (α − (κ_H/2) α²) ‖g(x)‖², for all α ≥ 0.

The maximum of the right-hand side of the above inequality is attained at α = 1/κ_H, giving (2.37). □

Under the conditions of Theorem 2.5, ARC will take at most O(ǫ^{-2}) successful iterations to ensure (2.39) when applied to convex objectives. For strongly convex functions, Theorem 2.7 implies the same order of complexity of log ǫ^{-1} for ‖g_j‖ ≤ ǫ. (Note that the term f(x_0) − f_* in (2.25) and (2.31) can be replaced by D‖g_0‖ and ‖g_0‖²/(2µ), respectively.) Now recall [3, Corollary 3.4], which states that, when applied to nonconvex objectives, the basic ARC scheme takes at most O(ǫ^{-2}) iterations to generate a first iterate j with ‖g_j‖ ≤ ǫ. Hence we see that the difference between the convex and nonconvex cases is not so great, and the bound improvement (for ‖g_j‖) is somewhat slight.
Namely, as the bound on ‖g_j‖ in the convex case was obtained from that on the function values f(x_j), which decrease monotonically, it follows from (2.38) that once ‖g_k‖ ≤ ǫ, it will remain so for all subsequent iterations, and so the O(ǫ^{-2}) iteration bound represents the maximum total number of (successful) iterations with ‖g_k‖ > ǫ that may occur. Clearly, there is a marked improvement in ARC's worst-case complexity for the strongly convex case.

3 The complexity of second-order ARC variants

Let us now consider the complexity of Algorithm 1.2 with the inner iteration termination criteria (1.10) and (1.11), namely of the ARC(S) and ARC(g2) variants. For the remainder of the paper, we assume that

AF.3   f ∈ C^2(IR^n).   (3.1)

While no assumption of the Hessian of f being globally or locally Lipschitz continuous has been imposed in the complexity results of Section 2, we now require that the objective's Hessian is globally Lipschitz continuous on the path of the iterates, namely, there exists a constant L > 0 independent of k such that

AF.6   ‖H(x) − H(x_k)‖ ≤ L ‖x − x_k‖, for all x ∈ [x_k, x_k + s_k] and all k ≥ 0,   (3.2)

and that B_k and H(x_k) agree along s_k in the sense that

AM.4   ‖(H(x_k) − B_k) s_k‖ ≤ C ‖s_k‖², for all k ≥ 0, and some constant C > 0.   (3.3)

By using finite differences on the gradient for computing B_k, we showed in [5] that AM.4 can be achieved in O(n log ǫ^{-1}) additional iterations and gradient evaluations (for any user-chosen constant C). Next we recall some results for ARC(h), in particular, necessary conditions for the global subproblem solution (1.8) and expressions for the model decrease (see Lemma 3.1 i)); also, some general properties that hold for a large class of (nonconvex) functions (see Lemma 3.1 ii) and iii)).

Lemma 3.1.

i) [2, Lemmas 3.2, 3.3] Let s_k be the global minimizer of (1.8) for any k ≥ 0. Then

   g_k^T s_k + s_k^T B_k s_k + σ_k ‖s_k‖³ = 0,   (3.4)

and

   f(x_k) − m_k(s_k) = (1/2) s_k^T B_k s_k + (2/3) σ_k ‖s_k‖³.   (3.5)

ii) [2, Lemma 5.2] Let AF.3, AF.6 and AM.4 hold. Then

   σ_k ≤ max(σ_0, (3/2) γ_2 (C + L)) = L_0, for all k ≥ 0.   (3.6)

iii) [3, Lemma 5.2] Let AF.3, AF.4, AF.6, AM.4 and TC.s hold. Then s_k satisfies

   ‖s_k‖ ≥ κ_g √(‖g_{k+1}‖) for all successful iterations k,   (3.7)

where κ_g is the positive constant

   κ_g = √( (1 − κ_θ)/(L + C + L_0 + κ_θ κ_H) ).   (3.8)

Note that in our second-order ARC variants in [2, 3], we employ the more general condition (3.4) and an approximate nonnegative curvature requirement [2, (3.12)] for defining the choice of s_k, which may hold at other points (of local minimum) than the global minimizer over L_k as prescribed by (1.8). When the model is convex, as is often the case here, such situations do not arise.
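The necessary condition (3.4) also suggests how the global minimizer of the cubic model can be computed in the easy (nondegenerate) case: s = −(B_k + λI)^{-1} g_k with λ = σ_k ‖s‖ and B_k + λI positive semidefinite. A minimal sketch via bisection on λ follows; the function name is hypothetical, and the hard case of the characterization in [2, Thm. 3.1] is not handled.

```python
import numpy as np

def cubic_model_global_min(g, B, sigma):
    """Global minimizer of m(s) = g^T s + 0.5 s^T B s + (sigma/3)||s||^3
    in the nondegenerate case, via bisection on lambda in
    s(lambda) = -(B + lambda I)^{-1} g with lambda = sigma * ||s(lambda)||."""
    g, B = np.asarray(g, dtype=float), np.asarray(B, dtype=float)
    eye = np.eye(len(g))
    lam_min = np.linalg.eigvalsh(B)[0]
    lo = max(0.0, -lam_min) + 1e-14      # keep B + lam*I positive definite

    def phi(lam):
        # positive while ||s(lam)|| > lam/sigma, negative beyond the root
        return np.linalg.norm(np.linalg.solve(B + lam * eye, g)) - lam / sigma

    hi = lo + 1.0
    while phi(hi) > 0:                   # bracket the root
        hi *= 2.0
    for _ in range(200):                 # bisection to machine accuracy
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return -np.linalg.solve(B + lam * eye, g)
```

The returned s satisfies the stationarity condition g_k + B_k s + σ_k ‖s‖ s = 0, from which (3.4) follows on taking the inner product with s.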
The bound (3.7) ensures that the step s_k does not become too small compared to the size of the gradient, and it is a crucial ingredient for obtaining, as shown in [3, Corollary 5.3], an O(ǫ^{-3/2}) upper bound on the iteration count of ARC(S) to generate ‖g_k‖ ≤ ǫ for general nonconvex functions. Next we improve the order of this bound for convex and strongly convex objectives. Despite solving the subproblem to higher accuracy than the generic ARC framework, the second-order ARC variants still only evaluate the objective function and its gradient once in each (major) iteration and each successful iteration, respectively; hence the correspondence between the (successful) iteration count and the number of (gradient) function evaluations continues to hold. Recall also Theorem 2.1, which relates the total number of iterations to the number of successful ones.

3.1 ARC(g2) complexity on convex objectives

Here, we prove an O(1/√ǫ) iteration upper bound for ARC(g2) to achieve (1.12), which improves the steepest-descent-like bound of order 1/ǫ for basic ARC in Theorem 2.5. A stronger requirement than AF.6 is required in this section, namely, that the Hessian is globally Lipschitz continuous,

AF.6   ‖H(x) − H(y)‖ ≤ L ‖x − y‖, for all x, y ∈ IR^n.   (3.9)

Note that AF.6 and AF.8 imply AF.4 on the f(x_0) level set of f, which is the required domain of gradient Lipschitz continuity for the results in this section. We also employ the true Hessian values for B_k, namely, we make the following choice in ARC(g2),

   B_k = H(x_k), for all k ≥ 0.   (3.10)

Thus AM.4 holds in this case with C = 0, and AF.4 (or AF.6 and AF.8) implies AM.1. A useful lemma is given first.

Lemma 3.2. Let AF.3, AF.6 and AF.7–AF.8 hold. Let f_* = f(x_*) be the (global) minimum of f. Consider the subproblem (1.8) with B_k = H(x_k) and for a(ny) subspace L_k of IR^n with g_k ∈ L_k. Then

   min_{s ∈ L_k} m_k(s) ≤ f(x_k) − 2 κ_m(g2)^c [f(x_k) − f(x_k + s_k^*)]^{3/2},   (3.11)

where s_k^* is a (global) minimizer of f(x_k + s) over s ∈ L_k, and where

   κ_m(g2)^c = 1/(6D √(6D L̄)) and L̄ = max(σ_0, γ_2 L, κ_H).   (3.12)

Proof. From AF.3 and AF.6, we have the overestimation property

   f(x_k + s) ≥ f(x_k) + s^T g_k + (1/2) s^T H(x_k) s − (L/6) ‖s‖³, s ∈ IR^n,   (3.13)

and so, from (1.1) and B_k = H(x_k), we have

   m_k(s) ≤ f(x_k + s) + ((2σ_k + L)/6) ‖s‖³, s ∈ IR^n.

Employing (3.6) and γ_2 ≥ 1, we further obtain

   m_k(s) ≤ f(x_k + s) + L̄ ‖s‖³, s ∈ IR^n,   (3.14)

where L̄ is defined in (3.12). (Note that κ_H is not needed as yet in the definition of L̄; it will be useful later, as we shall see.) Minimizing on both sides of (3.14) gives the first inequality below,

   min_{s ∈ L_k} m_k(s) ≤ min_{s ∈ L_k} {f(x_k + s) + L̄ ‖s‖³} ≤ min_{α ∈ [0,1]} {f(x_k + α s_k^*) + L̄ α³ ‖s_k^*‖³},   (3.15)

where the second inequality follows from the definition of s_k^*, which gives α s_k^* ∈ L_k for all α ∈ [0, 1].
From AF.7, we have f(x_k + α s_k^*) ≤ (1 − α) f(x_k) + α f(x_k + s_k^*), for all α ∈ [0, 1], and so, from (3.15),

   min_{s ∈ L_k} m_k(s) ≤ f(x_k) + min_{α ∈ [0,1]} { α[f(x_k + s_k^*) − f(x_k)] + L̄ α³ ‖s_k^*‖³ }.   (3.16)

The construction of the algorithm implies f(x_k) ≤ f(x_0), so that ‖x_k − x_*‖ ≤ D due to AF.8. Furthermore, f(x_k + s_k^*) ≤ f(x_k), and so ‖x_k + s_k^* − x_*‖ ≤ D. Thus ‖s_k^*‖ ≤ ‖x_k − x_*‖ + ‖x_k + s_k^* − x_*‖ ≤ 2D, and (3.16) implies

   min_{s ∈ L_k} m_k(s) ≤ f(x_k) + min_{α ∈ [0,1]} { α[f(x_k + s_k^*) − f(x_k)] + 8 α³ L̄ D³ }.   (3.17)

The minimum in the right-hand side of (3.17) is attained at α_k^* = min{1, α̂_k}, where

   α̂_k := √(f(x_k) − f(x_k + s_k^*)) / (2D √(6 L̄ D)).

Let us show that α̂_k ≤ 1, namely, f(x_k) − f(x_k + s_k^*) ≤ 24 L̄ D³. AF.7 gives the first inequality in

   f(x_k + s_k^*) − f(x_k) ≥ g_k^T s_k^* ≥ −‖g_k‖ ‖s_k^*‖ ≥ −2D ‖g_k‖ = −2D ‖g_k − g(x_*)‖ ≥ −2 κ_H D²,

where we also used the Cauchy–Schwarz inequality, the bound on ‖s_k^*‖ just before (3.17), AF.4 and AF.8. Since we assumed in AF.8 that D ≥ 1, and the definition of L̄ implies L̄ ≥ κ_H, we conclude that f(x_k) − f(x_k + s_k^*) ≤ 2 κ_H D² ≤ 2 L̄ D³ ≤ 24 L̄ D³. Thus α_k^* = α̂_k, and substituting the above value of α̂_k into (3.17), we deduce (3.11) with the notation (3.12). □

The main result of this section follows.

Theorem 3.3. Let AF.3, AF.6 and AF.7–AF.8 hold. Let f_* = f(x_*) be the (global) minimum of f. Apply ARC(g2) with the choices (2.8) and (3.10) to minimizing f. Then

   Δ_j = f(x_j) − f_* ≤ ( 1/(η_1 β κ_m(g2)^c |S_j|) )², j ≥ 0,   (3.18)

where S_j is defined in (2.6), κ_m(g2)^c in (3.12) and

   β = (1/2) min( 1, κ_G^{3/2}/(4 (κ_H D)^{3/2}) ), with κ_G = σ_min (κ_m(g2)^c)²/(4 κ_θ² κ_H³).   (3.19)

Thus, given any ǫ > 0, ARC(g2) takes at most

   κ_s(g2)^c / √ǫ   (3.20)

successful iterations and gradient evaluations to generate f(x_j) − f_* ≤ ǫ, where κ_s(g2)^c = 1/(η_1 β κ_m(g2)^c).

Proof. Let k ∈ S. From (1.4), (1.5) and (2.5), we have

   f(x_{k+1}) ≤ (1 − η_1) f(x_k) + η_1 m_k(s_k) = (1 − η_1) f(x_k) + η_1 [m_k(s_k) − m_k(s_k^m)] + η_1 m_k(s_k^m),   (3.21)

where s_k^m denotes the global minimizer of m_k(s) over IR^n. AF.7 implies H(x_k) is positive semidefinite and so m_k(s) is convex, which gives the first inequality below,

   m_k(s_k) − m_k(s_k^m) ≤ ∇_s m_k(s_k)^T (s_k − s_k^m) ≤ ‖∇_s m_k(s_k)‖ ‖s_k − s_k^m‖ ≤ κ_θ ‖g_k‖³ ‖s_k − s_k^m‖,   (3.22)

where the last inequality follows from TC.g2 (1.11). To bound ‖s_k − s_k^m‖, recall that both s_k and s_k^m satisfy (3.4), which implies, due to (2.8) and B_k = H(x_k) being positive semidefinite,

   σ_min ‖s‖³ ≤ σ_k ‖s‖³ ≤ −g_k^T s ≤ ‖g_k‖ ‖s‖, where s = s_k or s = s_k^m.

Thus max{‖s_k‖, ‖s_k^m‖} ≤ √(‖g_k‖/σ_min), and so

   ‖s_k − s_k^m‖ ≤ 2 √(‖g_k‖/σ_min).

This and (3.22) now provide the first inequality below,

  m_k(s_k) − m_k(s_k^m) ≤ (2 κ_θ / √σ_min) ‖g_k‖^{7/2} ≤ ( 2 κ_θ κ_H √(2 κ_H) / √σ_min ) ‖g_k‖^{1/2} Δ_k^{3/2},   (3.23)

while the second inequality follows from (2.38). Recalling (3.21), we are left with bounding m_k(s_k^m) above, for which we use Lemma 3.2 with L_k = ℝ^n. Then s_k^* = x_* − x_k, and so f(x_k) − f(x_k + s_k^*) = Δ_k, and (3.11) implies

  m_k(s_k^m) ≤ f(x_k) − 2 κ^c_{m(g2)} Δ_k^{3/2}.

Substituting this bound and (3.23) into (3.21), we deduce

  f(x_{k+1}) ≤ f(x_k) + 2 η_1 ( ( κ_θ κ_H √(2 κ_H) / √σ_min ) ‖g_k‖^{1/2} − κ^c_{m(g2)} ) Δ_k^{3/2},

or equivalently, recalling (2.4) and (3.19),

  Δ_k − Δ_{k+1} ≥ 2 η_1 κ^c_{m(g2)} ( 1 − √( ‖g_k‖ / (2 κ_G) ) ) Δ_k^{3/2}.

Thus we have the implication

  ‖g_k‖ ≤ κ_G/2  ⟹  Δ_k − Δ_{k+1} ≥ η_1 κ^c_{m(g2)} Δ_k^{3/2}.   (3.24)

It remains to prove a bound of the same form as the right-hand side of (3.24) when ‖g_k‖ > κ_G/2. For this, we employ Lemma 3.2 again, this time for s_k and the subspace L_k in the kth iteration of ARC(g2), with g_k ∈ L_k. Thus, noting that the left-hand side of (3.11) is equal to m_k(s_k) in this case, we employ (3.11) to bound the first inequality in (3.21), and obtain

  f(x_{k+1}) ≤ f(x_k) − 2 η_1 κ^c_{m(g2)} [ f(x_k) − f(x_k + s_k^*) ]^{3/2}.   (3.25)

Since x_k + s_k^* is a global minimizer of f(x_k + s) over s ∈ L_k, and g_k ∈ L_k, we have the first inequality below, for any α ≥ 0,

  f(x_k) − f(x_k + s_k^*) ≥ f(x_k) − f(x_k − α g_k) ≥ ‖g_k‖^2 / (2 κ_H) ≥ ( ‖g_k‖ / (2 κ_H D) ) Δ_k,

where the second and third inequalities follow from the second inequality in (2.37) and from (2.24), respectively. It follows from (3.25) that

  f(x_{k+1}) ≤ f(x_k) − η_1 κ^c_{m(g2)} ( ‖g_k‖^{3/2} / ( κ_H D √(2 κ_H D) ) ) Δ_k^{3/2},

or equivalently,

  Δ_k − Δ_{k+1} ≥ η_1 κ^c_{m(g2)} ( ‖g_k‖^{3/2} / ( κ_H D √(2 κ_H D) ) ) Δ_k^{3/2}.

Thus we have the implication

  ‖g_k‖ > κ_G/2  ⟹  Δ_k − Δ_{k+1} ≥ η_1 κ^c_{m(g2)} ( κ_G^{3/2} / ( 4 (κ_H D)^{3/2} ) ) Δ_k^{3/2}.   (3.26)

Finally, we conclude from (3.24) and (3.26) that

  Δ_k − Δ_{k+1} ≥ 2 η_1 β κ^c_{m(g2)} Δ_k^{3/2},  for all k ∈ S,   (3.27)

where β is defined in (3.19). For any k ∈ S, we have the identity and bounds

  1/√Δ_{k+1} − 1/√Δ_k = (Δ_k − Δ_{k+1}) / ( √Δ_k √Δ_{k+1} (√Δ_k + √Δ_{k+1}) ) ≥ 2 η_1 β κ^c_{m(g2)} Δ_k^{3/2} / ( √Δ_k √Δ_{k+1} (√Δ_k + √Δ_{k+1}) ) ≥ η_1 β κ^c_{m(g2)},

where we also used (3.27) and (2.3), respectively. Thus, recalling that Δ_k remains unchanged on unsuccessful iterations and summing the above over the successful iterations up to j, we deduce

  1/√Δ_j ≥ 1/√Δ_0 + |S_j| η_1 β κ^c_{m(g2)} ≥ |S_j| η_1 β κ^c_{m(g2)},  j ≥ 0,

which gives (3.18) and also (3.20). □

As TC.g2 is satisfied at the global minimizer of the cubic model m_k(s), the latter can be chosen as the step in our algorithm; this is an efficient choice as far as the cost of the subproblem solution is concerned, provided the problem is of medium size or the Hessian at the iterates is sparse.

Note the two regimes of analysis in the above proof, namely the model decreases (3.24) and (3.26). To obtain the former, asymptotic, case, the termination criterion TC.g2 was used, while for the latter, early-stages, case, the first-order condition that the gradient be included in the subspace of minimization, and the ensuing decrease along the steepest-descent direction, were essential. Thus the construction of ARC(g2), which behaves like steepest descent early on and then naturally switches to higher accuracy as it approaches the solution, is reflected in our complexity analysis, with the slight caveat that the (converging) gradient is nonmonotonic, and so the distinction between the asymptotic and nonasymptotic regimes is not strict. Furthermore, the nonasymptotic result (3.26) also holds for ARC(S), but the termination condition TC.s does not seem strong enough to ensure a property similar to (3.24) for the asymptotic regime of ARC(S).
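The sublinear rate in (3.18) comes entirely from the worst case of the per-iteration decrease (3.27), and the telescoping step above can be checked numerically. The following sketch is illustrative only and is not from the paper; the constant c stands in for 2η_1βκ^c_{m(g2)}, and the recursion below is the equality case of (3.27).

```python
import math

def decrease_recursion(delta0, c, n):
    """Iterate Delta_{k+1} = Delta_k - c * Delta_k**1.5,
    the equality case of the successful-iteration decrease."""
    deltas = [delta0]
    for _ in range(n):
        d = deltas[-1]
        deltas.append(d - c * d ** 1.5)
    return deltas

deltas = decrease_recursion(delta0=0.5, c=0.1, n=200)
# Each step advances 1/sqrt(Delta_k) by at least c/2, which telescopes
# to Delta_j <= (j*c/2)**(-2), the 1/|S_j|**2 rate behind the bound.
gaps = [1.0 / math.sqrt(b) - 1.0 / math.sqrt(a)
        for a, b in zip(deltas, deltas[1:])]
```

In this run every element of `gaps` is at least c/2 = 0.05, which is exactly the telescoping inequality used in the proof and yields the O(1/√ε) evaluation count.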
Assuming that σ_0 is chosen small enough, the constant κ^c_{m(g2)} in (3.12) and (3.18) that characterizes the asymptotic function decrease is a problem-independent constant multiple of 1/√( max(κ_H, L) D^3 ), while β ∈ (0,1) in (3.18) represents the fraction of this function decrease that can be ensured in the nonasymptotic regime, when only a Cauchy decrease is achieved.

The iteration complexity of Nesterov & Polyak's cubic regularization algorithm applied to convex problems is analysed in [12, Theorem 4] and [11, Theorem 1], and an O(1/√ε) bound is obtained. Here, we relax the requirement that the subproblem be solved globally and exactly, allowing approximate solutions, and obtain a bound of the same order.

Complexity of generating approximately-optimal gradient values. The complexity of ARC(g2) generating a gradient value ‖g_j‖ ≤ ε can be obtained as described in Section 2.5, by using (2.39) in Lemma 2.8; an O(1/ε) upper bound on the total number of iterations and gradient evaluations with ‖g_k‖ > ε ensues.

3.2 ARC(S) complexity on strongly convex objectives

For generality (since TC.s is a milder condition than TC.g2), we focus on ARC(S) in this section, but similar results can be shown for ARC(g2). Let us now assume AF.9. Due to AF.3, (2.29) is equivalent to

  u^T H(x) u ≥ μ ‖u‖^2,  for all u, x ∈ ℝ^n.   (3.28)

Employing (2.29) with y = x and x = x_*, we deduce that AF.8 is implied by AF.9 with

  D = √( 2 Δ_0 / μ ).   (3.29)

The strong convexity of f implies that, asymptotically, ARC(S) converges Q-quadratically to the (global) minimizer, and hence it possesses an associated evaluation complexity of order log_2 log_2 (1/ε) from some iteration j_q ≥ 0 onwards [1, Section 9.5.3].
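A Q-quadratic phase of the form Δ_{k+1} ≤ Δ_k^2/δ reaches accuracy ε in a doubly logarithmic number of steps; the small sketch below makes this count concrete. It is an illustration under assumed values: δ and the entry level δ/2 are placeholders, not the paper's constants.

```python
import math

def quad_phase_iters(delta, eps):
    """Count iterations of Delta_{k+1} = Delta_k**2 / delta,
    started at Delta_0 = delta/2, until Delta_k <= eps."""
    d, k = delta / 2.0, 0
    while d > eps:
        d = d * d / delta
        k += 1
    return k

# agrees with the ceil(log2(log2(delta/eps))) estimate
print(quad_phase_iters(1.0, 1e-12))  # prints 6
```

The returned count equals ⌈log_2 log_2 (δ/ε)⌉ here, the same doubly logarithmic dependence on the accuracy that appears in the strongly convex bounds of this section.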

Lemma 3.4. Assume AF.3–AF.4, AF.6, AF.9 and AM.4 hold, and let x_* be the global minimizer of f. Apply ARC(S) to minimizing f, and assume that the Rayleigh quotient of B_k along s_k is uniformly bounded away from zero, namely

  R_k(s_k) = s_k^T B_k s_k / ‖s_k‖^2 ≥ R_min > 0,  k ∈ S.   (3.30)

Then, recalling κ_g defined in (3.8) and letting δ = (1/2)( η_1 R_min κ_g^2 √μ )^2,

  N_f = { x : f(x) − f(x_*) ≤ δ }   (3.31)

is a neighbourhood of quadratic convergence for f, so that if there exists j_q ≥ 0 such that x_{j_q} ∈ N_f with Δ_{j_q} ≤ δ/2, then x_k ∈ N_f for all k ≥ j_q, and

  Δ_{k+1} ≤ Δ_k^2 / δ,  for all k ∈ S with k ≥ j_q.   (3.32)

Furthermore, given ε > 0, ARC(S) takes at most

  ⌈ log_2 log_2 (δ/ε) ⌉   (3.33)

successful iterations and gradient evaluations from j_q onwards to generate f(x_j) − f_* ≤ ε.

Proof. Let k ∈ S. Then (1.5), (3.5), (3.30) and (3.7) imply

  f(x_k) − f(x_{k+1}) ≥ η_1 ( f(x_k) − m_k(s_k) ) ≥ (η_1/2) R_k(s_k) ‖s_k‖^2 ≥ (η_1/2) R_min ‖s_k‖^2 ≥ (η_1/2) R_min κ_g^2 ‖g_{k+1}‖,  k ∈ S.

Lemma 2.6 applies at k + 1, and so

  Δ_{k+1} ≤ ‖g_{k+1}‖^2 / (2μ).

The last two displayed equations further give

  Δ_k ≥ f(x_k) − f(x_{k+1}) ≥ (η_1/2) R_min κ_g^2 √( 2 μ Δ_{k+1} ),  and so  Δ_{k+1} ≤ Δ_k^2 / δ,  for all k ∈ S,   (3.34)

where δ is defined in (3.31). The expression of N_f in (3.31) follows, as does (3.32). Assuming that x_{j_q} ∈ N_f with Δ_{j_q} ≤ δ/2, we deduce from (3.32) that

  Δ_j ≤ δ ( Δ_{j_q} / δ )^{2^l},  for any j ≥ j_q,   (3.35)

where l = |{ j_q, j_q + 1, …, j } ∩ S| denotes the number of successful iterations from j_q up to j. Now employing Δ_{j_q} ≤ δ/2 in (3.35) shows that Δ_j ≤ ε provided 2^{−2^l} δ ≤ ε, which gives the bound (3.33). □

Remark on satisfying (3.30). If exact Hessians are used, so that B_k = H(x_k) for all k, then AF.9 implies (3.30), due to (3.28). Alternatively, (3.30) can be ensured if AM.4 holds with a sufficiently small C. Namely, note that AF.9, AM.4 and (3.29) imply

  μ ≤ s_k^T H_k s_k / ‖s_k‖^2 ≤ R_k(s_k) + s_k^T (H_k − B_k) s_k / ‖s_k‖^2 ≤ R_k(s_k) + C ‖s_k‖ ≤ R_k(s_k) + 2 C D,  k ≥ 0.

Thus (3.30) holds provided C < μ/(2D). Recall our comments on satisfying AM.4 by finite differencing, following (3.3).

We are left with bounding the number of successful iterations up to j_q, namely, the iterations ARC(S) takes until entering the region of quadratic convergence N_f (which must happen under the conditions of Corollary 3.5, as x_k converges to the unique global minimizer x_*). From the definition of j_q and N_f in Lemma 3.4, this is equivalent to counting the successful iterations until

  Δ_{j_q} = f(x_{j_q}) − f(x_*) ≤ δ/2,   (3.36)

with δ defined in (3.31). The choice of s_k in (1.8) with g_k ∈ L_k implies that ARC(S) always satisfies the Cauchy condition (1.2), and so the bound in Theorem 2.7 holds. This yields an upper bound of order log(Δ_0/δ) on the number of successful iterations up to j_q, and emphasizes again that early on in the running of the algorithm, steepest-descent-like decrease is sufficient, even from a worst-case complexity viewpoint. The bound on the total number of successful iterations is then obtained by adding up the bounds for the two distinct phases, up to and then inside the neighbourhood of quadratic convergence.

Corollary 3.5. Assume AF.3–AF.4, AF.6, AF.9, AM.1 and AM.4 hold, and let x_* be the global minimizer of f. Apply ARC(S) to minimizing f, assuming that (3.30) holds. Then, given any ε > 0, ARC(S) takes, in total, at most

  ⌈ κ_s^{sc} log( 2 Δ_0 / δ ) ⌉ + ⌈ log_2 log_2 (δ/ε) ⌉   (3.37)

successful iterations and gradient evaluations to generate f(x_j) − f(x_*) ≤ ε, where κ_s^{sc} is defined in (2.33) and δ in (3.31).

Proof. The conditions of Theorem 2.7 are satisfied, and so, letting ε = δ/2 in (2.33), we deduce that (3.36) holds in at most ⌈κ_s^{sc} log(2Δ_0/δ)⌉ successful iterations. To bound the number of iterations from j_q to j, we employ Lemma 3.4. The total number of successful iterations up to j is then the sum of these two bounds. □

Recalling our comments following (2.34), let us interpret the condition numbers in (3.37).
In particular, provided σ_0 is chosen sufficiently small, we obtain from (2.36) that κ_s^{sc} is a problem-independent multiple of the bound c(H) in (2.35) on the condition number of the Hessian matrix H(x). Additionally, if B_k = H(x_k), so that C = 0 and R_min = μ, then δ in (3.31) and (3.37) simplifies to a multiple of μ/c(H).

Note that for the non-asymptotic phase of ARC(S), an O(1/√δ) bound can be deduced similarly to the proof of Theorem 3.3. Namely, using Lemma 3.2, which clearly holds for ARC(S), we deduce (3.25); then employ (2.37) just as in the first displayed equation after (3.25), and use (2.30). The total ARC(S) complexity would then be of order δ^{−1/2} + log_2 log_2 (δ/ε), which matches the bounds for cubic regularization with exact subproblem solution in [12] and [11, pages 176–177]. Note that such bounds are weaker than the ones we obtained in Corollary 3.5.

Complexity of generating approximately-optimal gradient values. We have the following result, where the constants have already been defined in Corollary 3.5.

Lemma 3.6. Assume AF.3–AF.4, AF.6, AF.9, AM.1 and AM.4 hold. Apply ARC(S) to minimizing f, assuming that (3.30) holds. Then

  N_g = { x : ‖g(x)‖ ≤ ( (η_1/2) R_min κ_g )^2 = ζ }

is a neighbourhood of quadratic convergence for the gradient g; namely, if there exists j_q such that x_{j_q} ∈ N_g with ‖g_{j_q}‖ ≤ ζ/2, then x_k ∈ N_g for all k ≥ j_q, and

  ‖g_{k+1}‖ ≤ ‖g_k‖^2 / ζ,  for all k ∈ S with k ≥ j_q.   (3.38)

Thus, given ε > 0, ARC(S) takes at most

  ⌈ log_2 log_2 (ζ/ε) ⌉   (3.39)

successful iterations from j_q onwards to generate ‖g_j‖ ≤ ε. Furthermore, to generate ‖g_{j_q}‖ ≤ ζ, ARC(S) takes at most

  ⌈ 2 κ_s^{sc} log( ‖g_0‖ √κ_H / (ζ √μ) ) ⌉   (3.40)

successful iterations, so that the total number of successful iterations and gradient evaluations required to generate ‖g_j‖ ≤ ε is at most the sum of the bounds (3.39) and (3.40).

Proof. AF.9 implies AF.7, which gives

  f(x_{k+1}) − f(x_k) ≥ g_k^T s_k ≥ −‖g_k‖ ‖s_k‖,  k ≥ 0.

This and the first set of displayed equations in the proof of Lemma 3.4 give the first inequality below,

  ‖g_k‖ ≥ (η_1/2) R_min ‖s_k‖ ≥ (η_1/2) R_min κ_g √‖g_{k+1}‖,  k ∈ S,   (3.41)

where the latter inequality follows from (3.7). The expression and properties of N_g follow. The bound (3.39) is obtained similarly to the proof of (3.33) in Lemma 3.4. To deduce (3.40), let ε = ζ in (2.39) and in (2.33), and replace Δ_0 in the latter by its upper bound ‖g_0‖^2/(2μ). □

A similar estimate of a neighbourhood of quadratic convergence for the gradient can be found in [12] for Nesterov & Polyak's cubic regularization algorithm.

3.3 On the tightness of ARC's complexity bounds

The question arises as to whether the complexity bounds on ARC's performance on the special problem classes presented in this section are too pessimistic, even for the worst case, and could potentially be improved. This is particularly relevant for the convex case and the corresponding bound of order 1/√ε (Theorem 3.3), which implies a sublinear rate of convergence of second-order ARC variants on convex functions.
(For the strongly convex case, the log log (1/ε) bound can commonly be observed numerically when Q-quadratic convergence takes place.) Here, we give a convex function that satisfies all the conditions of Theorem 3.3 apart from having bounded level sets, and on which ARC takes precisely order 1/√ε iterations (and function and gradient evaluations) to generate f(x_j) − f_* ≤ ε. Consider a convex function f ∈ C^2(ℝ) with

  f(x) = e^{−x},  for x ≥ 0.   (3.42)

We have the following complexity result, whose proof is given in the Appendix.

Lemma 3.7. The function (3.42) is convex, bounded below by f_* = 0, and has bounded and Lipschitz continuous second derivatives f″(x) for x ∈ [0,∞), with constants κ_H = L = 1, thus satisfying AF.4, AF.6 and AF.7. Apply ARC to minimizing (3.42), starting with x_0 ≥ 0. On each iteration k, compute the step s_k as the global minimizer of the model m_k(s) in (1.1) with B_k = f″(x_k) and with the (reasonable) choice

  σ_k := σ = L/2 = 1/2,  k ≥ 0,   (3.43)

which ensures that every iteration is very successful and that (2.8) holds. Then AM.1 and AM.4 hold (with κ_B = 1 and C = 0), and ARC takes Θ(ε^{−1/2}) total iterations to achieve f(x_k) ≤ ε, where Θ(·) denotes upper and lower bounds of that order.

Several remarks are in order concerning the above example.

This example also applies to Nesterov & Polyak's cubic regularization algorithm [12, 11]; recall our choice of s_k and σ_k above. In particular, it satisfies all the conditions in [11, Theorem 1], including σ_k = L/2, except that of f having bounded level sets. The latter theorem establishes the O(ε^{−1/2}) iteration upper bound for Nesterov & Polyak's cubic regularization.

Approximate termination criteria like TC.g2 and TC.s do not give better performance than the exact subproblem solution in this case (see, in the right-hand plot of Figure 3.1, basic ARC with the Cauchy condition).

If Newton's method is applied to this example, the complexity is better; see Figure 3.1. Similarly, if we allowed σ_k to decrease to zero, so that the step approaches the Newton step, the complexity would again improve. Thus the inefficient behaviour in this example is due to keeping the regularization always switched on, and always strongly regularizing. However, we have shown in [4] that for nonconvex problems, Newton's method can behave worse than second-order ARC in the worst case; in fact, it can be as poor as steepest descent.
It remains to be seen whether this is also possible for convex problems, or for problems with bounded level sets.

Figure 3.1: Graph of (3.42) and the local cubic regularizations at the ARC iterates (left-hand side). Plot of objective values at the iterates, on a log scale, for the ARC variants (Cauchy, σ_k = L/2, σ_k → 0) and for Newton's method (right-hand side).
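The Θ(ε^{−1/2}) behaviour of Lemma 3.7 is easy to reproduce. The sketch below is not the code behind Figure 3.1; it simply runs the fixed-σ cubic regularization of Lemma 3.7 on (3.42), using the closed-form global minimizer of the one-dimensional cubic model (elementary calculus, since the model is convex for s of the right sign). The observed values f(x_k) decay like a constant over k^2, consistent with the Θ(ε^{−1/2}) iteration count.

```python
import math

def cubic_step_1d(g, B, sigma):
    """Global minimizer of m(s) = g*s + 0.5*B*s**2 + (sigma/3)*|s|**3
    for B >= 0, sigma > 0: solve sigma*t**2 + B*t = |g|, s = -sign(g)*t."""
    if g == 0.0:
        return 0.0
    t = (-B + math.sqrt(B * B + 4.0 * sigma * abs(g))) / (2.0 * sigma)
    return -math.copysign(t, g)

def arc_on_exp(iters, x0=0.0, sigma=0.5):
    """Fixed-sigma cubic regularization on f(x) = exp(-x), x >= 0."""
    x = x0
    for _ in range(iters):
        g = -math.exp(-x)   # f'(x)
        B = math.exp(-x)    # f''(x), i.e. B_k is the exact Hessian
        x += cubic_step_1d(g, B, sigma)
    return math.exp(-x)     # f(x_k)

fval = arc_on_exp(2000)     # roughly of order 1/iters**2
```

Since g < 0 along the run, every step is positive and of size about √(2 f(x_k)) once f is small, which integrates to f(x_k) of order k^{−2}: the regularization stays "switched on" and forces the sublinear rate.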

4 Conclusions

The behaviour of ARC on some special problem classes was investigated and, as expected, improved complexity bounds were shown when additional structure was assumed to be present in the problem. In particular, upper bounds of order O(1/√ε) and O(log κ + log log(1/ε)) were proved for second-order ARC variants when applied to convex and strongly convex objectives, respectively. For the latter case, the fact that the number of steps taken before entering the region of quadratic convergence is a logarithmic function of condition numbers is an improvement over existing complexity bounds for second-order methods applied to such problems. We have also given an example of (relatively) inefficient behaviour of second-order ARC on a convex problem with unbounded level sets, on which it takes order 1/√ε iterations to reach within ε of the optimum. Several open questions remain, such as whether a convex objective with bounded level sets can be found on which the latter iteration bound is attained, or whether Newton's method always has better worst-case complexity than ARC in the convex case.

References

[1] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, United Kingdom, 2004.

[2] C. Cartis, N. I. M. Gould and Ph. L. Toint. Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Mathematical Programming, 127(2):245-295, 2011.

[3] C. Cartis, N. I. M. Gould and Ph. L. Toint. Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity. Mathematical Programming, DOI: 10.1007/s10107-009-0337-y, 2010 (online).

[4] C. Cartis, N. I. M. Gould and Ph. L. Toint. On the complexity of steepest descent, Newton's and regularized Newton's methods for nonconvex unconstrained optimization. SIAM Journal on Optimization, 20(6):2833-2852, 2010.

[5] C. Cartis, N. I. M. Gould and Ph. L. Toint. On the oracle complexity of first-order and derivative-free algorithms for smooth nonconvex minimization. ERGO Technical Report 10-005, School of Mathematics, University of Edinburgh, 2010.

[6] A. R. Conn, N. I. M. Gould and Ph. L. Toint. Trust-Region Methods. SIAM, Philadelphia, USA, 2000.

[7] J. E. Dennis and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, New Jersey, USA, 1983. Reprinted as Classics in Applied Mathematics 16, SIAM, Philadelphia, USA, 1996.

[8] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, USA, 1996.

[9] A. Griewank. The modification of Newton's method for unconstrained optimization by bounding cubic terms. Technical Report NA/12, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, United Kingdom, 1981.

[10] Yu. Nesterov. Introductory Lectures on Convex Optimization. Kluwer Academic Publishers, Dordrecht, The Netherlands, 2004.

[11] Yu. Nesterov. Accelerating the cubic regularization of Newton's method on convex problems. Mathematical Programming, 112(1):159-181, 2008.

[12] Yu. Nesterov and B. T. Polyak. Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1):177-205, 2006.


More information

Revenue Management Under the Markov Chain Choice Model

Revenue Management Under the Markov Chain Choice Model Revenue Management Under the Markov Chain Choice Model Jacob B. Feldman School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853, USA jbf232@cornell.edu Huseyin

More information

What can we do with numerical optimization?

What can we do with numerical optimization? Optimization motivation and background Eddie Wadbro Introduction to PDE Constrained Optimization, 2016 February 15 16, 2016 Eddie Wadbro, Introduction to PDE Constrained Optimization, February 15 16, 2016

More information

On the Optimality of a Family of Binary Trees Techical Report TR

On the Optimality of a Family of Binary Trees Techical Report TR On the Optimality of a Family of Binary Trees Techical Report TR-011101-1 Dana Vrajitoru and William Knight Indiana University South Bend Department of Computer and Information Sciences Abstract In this

More information

Infinite Reload Options: Pricing and Analysis

Infinite Reload Options: Pricing and Analysis Infinite Reload Options: Pricing and Analysis A. C. Bélanger P. A. Forsyth April 27, 2006 Abstract Infinite reload options allow the user to exercise his reload right as often as he chooses during the

More information

Portfolio Management and Optimal Execution via Convex Optimization

Portfolio Management and Optimal Execution via Convex Optimization Portfolio Management and Optimal Execution via Convex Optimization Enzo Busseti Stanford University April 9th, 2018 Problems portfolio management choose trades with optimization minimize risk, maximize

More information

Lecture 17: More on Markov Decision Processes. Reinforcement learning

Lecture 17: More on Markov Decision Processes. Reinforcement learning Lecture 17: More on Markov Decision Processes. Reinforcement learning Learning a model: maximum likelihood Learning a value function directly Monte Carlo Temporal-difference (TD) learning COMP-424, Lecture

More information

Principles of Financial Computing

Principles of Financial Computing Principles of Financial Computing Prof. Yuh-Dauh Lyuu Dept. Computer Science & Information Engineering and Department of Finance National Taiwan University c 2008 Prof. Yuh-Dauh Lyuu, National Taiwan University

More information

Sublinear Time Algorithms Oct 19, Lecture 1

Sublinear Time Algorithms Oct 19, Lecture 1 0368.416701 Sublinear Time Algorithms Oct 19, 2009 Lecturer: Ronitt Rubinfeld Lecture 1 Scribe: Daniel Shahaf 1 Sublinear-time algorithms: motivation Twenty years ago, there was practically no investigation

More information

Technical Report Doc ID: TR April-2009 (Last revised: 02-June-2009)

Technical Report Doc ID: TR April-2009 (Last revised: 02-June-2009) Technical Report Doc ID: TR-1-2009. 14-April-2009 (Last revised: 02-June-2009) The homogeneous selfdual model algorithm for linear optimization. Author: Erling D. Andersen In this white paper we present

More information

A Stochastic Levenberg-Marquardt Method Using Random Models with Application to Data Assimilation

A Stochastic Levenberg-Marquardt Method Using Random Models with Application to Data Assimilation A Stochastic Levenberg-Marquardt Method Using Random Models with Application to Data Assimilation E Bergou Y Diouane V Kungurtsev C W Royer July 5, 08 Abstract Globally convergent variants of the Gauss-Newton

More information

Lecture 7: Bayesian approach to MAB - Gittins index

Lecture 7: Bayesian approach to MAB - Gittins index Advanced Topics in Machine Learning and Algorithmic Game Theory Lecture 7: Bayesian approach to MAB - Gittins index Lecturer: Yishay Mansour Scribe: Mariano Schain 7.1 Introduction In the Bayesian approach

More information

Characterization of the Optimum

Characterization of the Optimum ECO 317 Economics of Uncertainty Fall Term 2009 Notes for lectures 5. Portfolio Allocation with One Riskless, One Risky Asset Characterization of the Optimum Consider a risk-averse, expected-utility-maximizing

More information

Approximate Revenue Maximization with Multiple Items

Approximate Revenue Maximization with Multiple Items Approximate Revenue Maximization with Multiple Items Nir Shabbat - 05305311 December 5, 2012 Introduction The paper I read is called Approximate Revenue Maximization with Multiple Items by Sergiu Hart

More information

arxiv: v3 [cs.lg] 1 Jul 2017

arxiv: v3 [cs.lg] 1 Jul 2017 Jonas Moritz Kohler 1 Aurelien Lucchi 1 arxiv:1705.05933v3 [cs.lg] 1 Jul 2017 Abstract We consider the minimization of non-convex functions that typically arise in machine learning. Specifically, we focus

More information

Worst-case evaluation complexity of regularization methods for smooth unconstrained optimization using Hölder continuous gradients

Worst-case evaluation complexity of regularization methods for smooth unconstrained optimization using Hölder continuous gradients Worst-case evaluation comlexity of regularization methods for smooth unconstrained otimization using Hölder continuous gradients C Cartis N I M Gould and Ph L Toint 26 June 205 Abstract The worst-case

More information

Short-time-to-expiry expansion for a digital European put option under the CEV model. November 1, 2017

Short-time-to-expiry expansion for a digital European put option under the CEV model. November 1, 2017 Short-time-to-expiry expansion for a digital European put option under the CEV model November 1, 2017 Abstract In this paper I present a short-time-to-expiry asymptotic series expansion for a digital European

More information

Forecast Horizons for Production Planning with Stochastic Demand

Forecast Horizons for Production Planning with Stochastic Demand Forecast Horizons for Production Planning with Stochastic Demand Alfredo Garcia and Robert L. Smith Department of Industrial and Operations Engineering Universityof Michigan, Ann Arbor MI 48109 December

More information

Lecture 5: Iterative Combinatorial Auctions

Lecture 5: Iterative Combinatorial Auctions COMS 6998-3: Algorithmic Game Theory October 6, 2008 Lecture 5: Iterative Combinatorial Auctions Lecturer: Sébastien Lahaie Scribe: Sébastien Lahaie In this lecture we examine a procedure that generalizes

More information

CS364A: Algorithmic Game Theory Lecture #14: Robust Price-of-Anarchy Bounds in Smooth Games

CS364A: Algorithmic Game Theory Lecture #14: Robust Price-of-Anarchy Bounds in Smooth Games CS364A: Algorithmic Game Theory Lecture #14: Robust Price-of-Anarchy Bounds in Smooth Games Tim Roughgarden November 6, 013 1 Canonical POA Proofs In Lecture 1 we proved that the price of anarchy (POA)

More information

Department of Mathematics. Mathematics of Financial Derivatives

Department of Mathematics. Mathematics of Financial Derivatives Department of Mathematics MA408 Mathematics of Financial Derivatives Thursday 15th January, 2009 2pm 4pm Duration: 2 hours Attempt THREE questions MA408 Page 1 of 5 1. (a) Suppose 0 < E 1 < E 3 and E 2

More information

Optimal Search for Parameters in Monte Carlo Simulation for Derivative Pricing

Optimal Search for Parameters in Monte Carlo Simulation for Derivative Pricing Optimal Search for Parameters in Monte Carlo Simulation for Derivative Pricing Prof. Chuan-Ju Wang Department of Computer Science University of Taipei Joint work with Prof. Ming-Yang Kao March 28, 2014

More information

Socially-Optimal Design of Crowdsourcing Platforms with Reputation Update Errors

Socially-Optimal Design of Crowdsourcing Platforms with Reputation Update Errors Socially-Optimal Design of Crowdsourcing Platforms with Reputation Update Errors 1 Yuanzhang Xiao, Yu Zhang, and Mihaela van der Schaar Abstract Crowdsourcing systems (e.g. Yahoo! Answers and Amazon Mechanical

More information

Best-Reply Sets. Jonathan Weinstein Washington University in St. Louis. This version: May 2015

Best-Reply Sets. Jonathan Weinstein Washington University in St. Louis. This version: May 2015 Best-Reply Sets Jonathan Weinstein Washington University in St. Louis This version: May 2015 Introduction The best-reply correspondence of a game the mapping from beliefs over one s opponents actions to

More information

Analysing multi-level Monte Carlo for options with non-globally Lipschitz payoff

Analysing multi-level Monte Carlo for options with non-globally Lipschitz payoff Finance Stoch 2009 13: 403 413 DOI 10.1007/s00780-009-0092-1 Analysing multi-level Monte Carlo for options with non-globally Lipschitz payoff Michael B. Giles Desmond J. Higham Xuerong Mao Received: 1

More information

The Correlation Smile Recovery

The Correlation Smile Recovery Fortis Bank Equity & Credit Derivatives Quantitative Research The Correlation Smile Recovery E. Vandenbrande, A. Vandendorpe, Y. Nesterov, P. Van Dooren draft version : March 2, 2009 1 Introduction Pricing

More information

Smoothed Analysis of Binary Search Trees

Smoothed Analysis of Binary Search Trees Smoothed Analysis of Binary Search Trees Bodo Manthey and Rüdiger Reischuk Universität zu Lübeck, Institut für Theoretische Informatik Ratzeburger Allee 160, 23538 Lübeck, Germany manthey/reischuk@tcs.uni-luebeck.de

More information

Final Projects Introduction to Numerical Analysis Professor: Paul J. Atzberger

Final Projects Introduction to Numerical Analysis Professor: Paul J. Atzberger Final Projects Introduction to Numerical Analysis Professor: Paul J. Atzberger Due Date: Friday, December 12th Instructions: In the final project you are to apply the numerical methods developed in the

More information

Chapter 3: Black-Scholes Equation and Its Numerical Evaluation

Chapter 3: Black-Scholes Equation and Its Numerical Evaluation Chapter 3: Black-Scholes Equation and Its Numerical Evaluation 3.1 Itô Integral 3.1.1 Convergence in the Mean and Stieltjes Integral Definition 3.1 (Convergence in the Mean) A sequence {X n } n ln of random

More information

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization for Strongly Convex Stochastic Optimization Microsoft Research New England NIPS 2011 Optimization Workshop Stochastic Convex Optimization Setting Goal: Optimize convex function F ( ) over convex domain

More information

The Yield Envelope: Price Ranges for Fixed Income Products

The Yield Envelope: Price Ranges for Fixed Income Products The Yield Envelope: Price Ranges for Fixed Income Products by David Epstein (LINK:www.maths.ox.ac.uk/users/epstein) Mathematical Institute (LINK:www.maths.ox.ac.uk) Oxford Paul Wilmott (LINK:www.oxfordfinancial.co.uk/pw)

More information

In Discrete Time a Local Martingale is a Martingale under an Equivalent Probability Measure

In Discrete Time a Local Martingale is a Martingale under an Equivalent Probability Measure In Discrete Time a Local Martingale is a Martingale under an Equivalent Probability Measure Yuri Kabanov 1,2 1 Laboratoire de Mathématiques, Université de Franche-Comté, 16 Route de Gray, 253 Besançon,

More information

Hints on Some of the Exercises

Hints on Some of the Exercises Hints on Some of the Exercises of the book R. Seydel: Tools for Computational Finance. Springer, 00/004/006/009/01. Preparatory Remarks: Some of the hints suggest ideas that may simplify solving the exercises

More information

3.2 No-arbitrage theory and risk neutral probability measure

3.2 No-arbitrage theory and risk neutral probability measure Mathematical Models in Economics and Finance Topic 3 Fundamental theorem of asset pricing 3.1 Law of one price and Arrow securities 3.2 No-arbitrage theory and risk neutral probability measure 3.3 Valuation

More information

On the Superlinear Local Convergence of a Filter-SQP Method. Stefan Ulbrich Zentrum Mathematik Technische Universität München München, Germany

On the Superlinear Local Convergence of a Filter-SQP Method. Stefan Ulbrich Zentrum Mathematik Technische Universität München München, Germany On the Superlinear Local Convergence of a Filter-SQP Method Stefan Ulbrich Zentrum Mathemati Technische Universität München München, Germany Technical Report, October 2002. Mathematical Programming manuscript

More information

Haiyang Feng College of Management and Economics, Tianjin University, Tianjin , CHINA

Haiyang Feng College of Management and Economics, Tianjin University, Tianjin , CHINA RESEARCH ARTICLE QUALITY, PRICING, AND RELEASE TIME: OPTIMAL MARKET ENTRY STRATEGY FOR SOFTWARE-AS-A-SERVICE VENDORS Haiyang Feng College of Management and Economics, Tianjin University, Tianjin 300072,

More information

Smooth estimation of yield curves by Laguerre functions

Smooth estimation of yield curves by Laguerre functions Smooth estimation of yield curves by Laguerre functions A.S. Hurn 1, K.A. Lindsay 2 and V. Pavlov 1 1 School of Economics and Finance, Queensland University of Technology 2 Department of Mathematics, University

More information

1 Residual life for gamma and Weibull distributions

1 Residual life for gamma and Weibull distributions Supplement to Tail Estimation for Window Censored Processes Residual life for gamma and Weibull distributions. Gamma distribution Let Γ(k, x = x yk e y dy be the upper incomplete gamma function, and let

More information

CS 3331 Numerical Methods Lecture 2: Functions of One Variable. Cherung Lee

CS 3331 Numerical Methods Lecture 2: Functions of One Variable. Cherung Lee CS 3331 Numerical Methods Lecture 2: Functions of One Variable Cherung Lee Outline Introduction Solving nonlinear equations: find x such that f(x ) = 0. Binary search methods: (Bisection, regula falsi)

More information

Maximum Contiguous Subsequences

Maximum Contiguous Subsequences Chapter 8 Maximum Contiguous Subsequences In this chapter, we consider a well-know problem and apply the algorithm-design techniques that we have learned thus far to this problem. While applying these

More information

PORTFOLIO OPTIMIZATION AND EXPECTED SHORTFALL MINIMIZATION FROM HISTORICAL DATA

PORTFOLIO OPTIMIZATION AND EXPECTED SHORTFALL MINIMIZATION FROM HISTORICAL DATA PORTFOLIO OPTIMIZATION AND EXPECTED SHORTFALL MINIMIZATION FROM HISTORICAL DATA We begin by describing the problem at hand which motivates our results. Suppose that we have n financial instruments at hand,

More information

OPTIMAL PORTFOLIO CONTROL WITH TRADING STRATEGIES OF FINITE

OPTIMAL PORTFOLIO CONTROL WITH TRADING STRATEGIES OF FINITE Proceedings of the 44th IEEE Conference on Decision and Control, and the European Control Conference 005 Seville, Spain, December 1-15, 005 WeA11.6 OPTIMAL PORTFOLIO CONTROL WITH TRADING STRATEGIES OF

More information

A Preference Foundation for Fehr and Schmidt s Model. of Inequity Aversion 1

A Preference Foundation for Fehr and Schmidt s Model. of Inequity Aversion 1 A Preference Foundation for Fehr and Schmidt s Model of Inequity Aversion 1 Kirsten I.M. Rohde 2 January 12, 2009 1 The author would like to thank Itzhak Gilboa, Ingrid M.T. Rohde, Klaus M. Schmidt, and

More information

Support Vector Machines: Training with Stochastic Gradient Descent

Support Vector Machines: Training with Stochastic Gradient Descent Support Vector Machines: Training with Stochastic Gradient Descent Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 Support vector machines Training by maximizing margin The SVM

More information

Market Liquidity and Performance Monitoring The main idea The sequence of events: Technology and information

Market Liquidity and Performance Monitoring The main idea The sequence of events: Technology and information Market Liquidity and Performance Monitoring Holmstrom and Tirole (JPE, 1993) The main idea A firm would like to issue shares in the capital market because once these shares are publicly traded, speculators

More information

Stability in geometric & functional inequalities

Stability in geometric & functional inequalities Stability in geometric & functional inequalities A. Figalli The University of Texas at Austin www.ma.utexas.edu/users/figalli/ Alessio Figalli (UT Austin) Stability in geom. & funct. ineq. Krakow, July

More information

Lecture 4: Divide and Conquer

Lecture 4: Divide and Conquer Lecture 4: Divide and Conquer Divide and Conquer Merge sort is an example of a divide-and-conquer algorithm Recall the three steps (at each level to solve a divideand-conquer problem recursively Divide

More information

LECTURE 2: MULTIPERIOD MODELS AND TREES

LECTURE 2: MULTIPERIOD MODELS AND TREES LECTURE 2: MULTIPERIOD MODELS AND TREES 1. Introduction One-period models, which were the subject of Lecture 1, are of limited usefulness in the pricing and hedging of derivative securities. In real-world

More information

COMBINATORICS OF REDUCTIONS BETWEEN EQUIVALENCE RELATIONS

COMBINATORICS OF REDUCTIONS BETWEEN EQUIVALENCE RELATIONS COMBINATORICS OF REDUCTIONS BETWEEN EQUIVALENCE RELATIONS DAN HATHAWAY AND SCOTT SCHNEIDER Abstract. We discuss combinatorial conditions for the existence of various types of reductions between equivalence

More information

The Value of Information in Central-Place Foraging. Research Report

The Value of Information in Central-Place Foraging. Research Report The Value of Information in Central-Place Foraging. Research Report E. J. Collins A. I. Houston J. M. McNamara 22 February 2006 Abstract We consider a central place forager with two qualitatively different

More information

Ellipsoid Method. ellipsoid method. convergence proof. inequality constraints. feasibility problems. Prof. S. Boyd, EE364b, Stanford University

Ellipsoid Method. ellipsoid method. convergence proof. inequality constraints. feasibility problems. Prof. S. Boyd, EE364b, Stanford University Ellipsoid Method ellipsoid method convergence proof inequality constraints feasibility problems Prof. S. Boyd, EE364b, Stanford University Ellipsoid method developed by Shor, Nemirovsky, Yudin in 1970s

More information

Optimal Allocation of Policy Limits and Deductibles

Optimal Allocation of Policy Limits and Deductibles Optimal Allocation of Policy Limits and Deductibles Ka Chun Cheung Email: kccheung@math.ucalgary.ca Tel: +1-403-2108697 Fax: +1-403-2825150 Department of Mathematics and Statistics, University of Calgary,

More information

Final Projects Introduction to Numerical Analysis atzberg/fall2006/index.html Professor: Paul J.

Final Projects Introduction to Numerical Analysis  atzberg/fall2006/index.html Professor: Paul J. Final Projects Introduction to Numerical Analysis http://www.math.ucsb.edu/ atzberg/fall2006/index.html Professor: Paul J. Atzberger Instructions: In the final project you will apply the numerical methods

More information

Non replication of options

Non replication of options Non replication of options Christos Kountzakis, Ioannis A Polyrakis and Foivos Xanthos June 30, 2008 Abstract In this paper we study the scarcity of replication of options in the two period model of financial

More information

Portfolio selection with multiple risk measures

Portfolio selection with multiple risk measures Portfolio selection with multiple risk measures Garud Iyengar Columbia University Industrial Engineering and Operations Research Joint work with Carlos Abad Outline Portfolio selection and risk measures

More information

IDENTIFYING BROAD AND NARROW FINANCIAL RISK FACTORS VIA CONVEX OPTIMIZATION: PART II

IDENTIFYING BROAD AND NARROW FINANCIAL RISK FACTORS VIA CONVEX OPTIMIZATION: PART II 1 IDENTIFYING BROAD AND NARROW FINANCIAL RISK FACTORS VIA CONVEX OPTIMIZATION: PART II Alexander D. Shkolnik ads2@berkeley.edu MMDS Workshop. June 22, 2016. joint with Jeffrey Bohn and Lisa Goldberg. Identifying

More information

Lecture IV Portfolio management: Efficient portfolios. Introduction to Finance Mathematics Fall Financial mathematics

Lecture IV Portfolio management: Efficient portfolios. Introduction to Finance Mathematics Fall Financial mathematics Lecture IV Portfolio management: Efficient portfolios. Introduction to Finance Mathematics Fall 2014 Reduce the risk, one asset Let us warm up by doing an exercise. We consider an investment with σ 1 =

More information

Econ 582 Nonlinear Regression

Econ 582 Nonlinear Regression Econ 582 Nonlinear Regression Eric Zivot June 3, 2013 Nonlinear Regression In linear regression models = x 0 β (1 )( 1) + [ x ]=0 [ x = x] =x 0 β = [ x = x] [ x = x] x = β it is assumed that the regression

More information

Catalyst Acceleration for Gradient-Based Non-Convex Optimization

Catalyst Acceleration for Gradient-Based Non-Convex Optimization Catalyst Acceleration for Gradient-Based Non-Convex Optimization Courtney Paquette, Hongzhou Lin, Dmitriy Drusvyatskiy, Julien Mairal, Zaid Harchaoui To cite this version: Courtney Paquette, Hongzhou Lin,

More information