Worst-case evaluation complexity of regularization methods for smooth unconstrained optimization using Hölder continuous gradients


C. Cartis, N. I. M. Gould and Ph. L. Toint

26 June 2015

Abstract

The worst-case behaviour of a general class of regularization algorithms is considered in the case where only objective function values and associated gradient vectors are evaluated. Upper bounds are derived on the number of such evaluations that are needed for the algorithm to produce an approximate first-order critical point whose accuracy is within a user-defined threshold. The analysis covers the entire range of meaningful powers in the regularization term as well as in the Hölder exponent for the gradient. The resulting complexity bounds vary according to the regularization power and the assumed Hölder exponent, recovering known results when available.

1 Introduction

The complexity analysis of algorithms for smooth, possibly non-convex, unconstrained optimization has been the subject of a burgeoning literature over the past few years (see the contributions by Nesterov [12, 15], Gratton, Sartenaer and Toint [11], Cartis, Gould and Toint [3, 5, 6, 7], Ueda [17], Ueda and Yamashita [18, 19], Grapiglia, Yuan and Yuan [9, 10], and Vicente [20], for instance). The present contribution belongs to this active trend and focuses on the analysis of the worst-case behaviour of regularization methods in which only objective function values and associated gradient vectors are evaluated. It proposes upper bounds on the number of such evaluations that are needed for the algorithm to produce an approximate first-order critical point whose accuracy is within a user-defined threshold. An analysis of this type is already available for the case where the objective function's gradient is assumed to be Lipschitz continuous and where the regularization uses the second or third power of the norm of the computed step at a given iteration (see the paper by Nesterov [13] for the former and those of Cartis et al. [5, 6] for both cases). The novelty of the present approach is to extend the analysis to cover problems whose objective gradients are simply Hölder continuous and methods that allow weaker regularization than in the Lipschitz case.

Mathematical Institute, Oxford University, Oxford OX2 6GG, Great Britain (coralia.cartis@maths.ox.ac.uk). Numerical Analysis Group, Rutherford Appleton Laboratory, Chilton OX11 0QX, Great Britain (nick.gould@stfc.ac.uk). Namur Center for Complex Systems (naXys) and Department of Mathematics, University of Namur, 61, rue de Bruxelles, B-5000 Namur, Belgium (philippe.toint@unamur.be).

The resulting complexity bounds vary according to the regularization power and the assumed Hölder exponent, providing a unified view and recovering known results when available.

The paper is organized as follows. Section 2 presents the problem and the class of algorithms considered. The complexity analysis itself is given in Section 3, and the sharpness of the obtained result is discussed in Section 4. Section 5 finally provides some comments on the results.

Notations: In what follows, ‖·‖ denotes the Euclidean norm and the superscript T denotes transposition. If v is a vector in ℝ^n, v_i denotes its i-th component.

2 The problem and algorithm

We consider the problem of finding an approximate solution of the optimization problem

  min_x f(x),    (2.1)

where x ∈ ℝ^n is the vector of optimization variables and f is a function from ℝ^n into ℝ that is assumed to be bounded below and continuously differentiable with Hölder continuous gradients. If we denote g(x) = ∇_x f(x), the latter says that the inequality

  ‖g(x) − g(y)‖ ≤ L ‖x − y‖^β    (2.2)

holds for all x, y ∈ ℝ^n, where L ≥ 0 and β > 0 are constants independent of x and y and where ‖·‖ is the Euclidean norm on ℝ^n. As explained in Lemma 3.1 below, we will assume, without loss of generality, that β ≤ 1. Problems involving functions with Hölder continuous gradients are interesting in their own right, but can also be found in engineering practice, such as in the design of gas pipelines (the Panhandle law which governs such flows states that the gas flow rate in a pipeline is a power between 1 and 2 of the difference in squared pressures, see [16, Section 17], for instance). Such functions also appear in the solution of certain nonlinear PDE problems (see Bensoussan and Frehse [1]).

In our context, an approximate solution for problem (2.1) is a vector x_ε such that

  ‖g(x_ε)‖ ≤ ε or f(x_ε) ≤ f_target,    (2.3)

where ε > 0 is a user-specified accuracy threshold and f_target is a threshold value, independent of ε, under which the reduction of the objective function is deemed sufficient by the user. The first case in (2.3) corresponds to finding an approximate first-order critical point. If a suitable value for f_target is not known, minus infinity can be used instead, in effect making the second part of (2.3) impossible to satisfy and reducing this condition to its first part.

The class of regularization methods that we consider for computing an x satisfying (2.3) consists of iterative algorithms where, at each iteration, a local (linear or quadratic) model of f around the current iterate x_k is constructed, regularized by a term using the p-th power of the norm of the step, and then approximately minimized (in the Cauchy point sense) to provide a trial step s_k. The quality of this step is then measured in order to accept the resulting trial point x_k + s_k as the next iterate, or to reject it and adjust the strength of the regularization. More specifically, a regularized model of f(x_k + s) of the form

  m_k(x_k + s) = f(x_k) + g_k^T s + (1/2) s^T B_k s + (σ_k/p) ‖s‖^p    (2.4)

is considered around the k-th iterate x_k, where we have defined g_k = g(x_k), where B_k is a symmetric n × n matrix, where σ_k > 0 is the regularization parameter at iteration k and where p > 1 is the (iteration-independent) user-defined regularization power. In practice, the matrix B_k may be chosen to provide suitable scaling of the variables (if known), for instance using quasi-Newton formulae. The model (2.4) is then approximately minimized in the sense that the trial step s_k is computed such that

  m_k(x_k + s_k) ≤ m_k(x_k + s_k^C),    (2.5)

where the Cauchy step s_k^C is defined by

  s_k^C = −α_k^C g_k with α_k^C = arg min_{α ≥ 0} m_k(x_k − α g_k).    (2.6)

We will choose the regularization power p in (2.4) in order to guarantee that m_k is bounded below and grows at infinity, thereby ensuring that (2.6) is well-defined. In particular, this imposes the restriction

  p > 1, and furthermore p > 2 whenever B_k is allowed not to be positive semi-definite.    (2.7)

Notice that (2.5) and (2.6) together imply that

  m_k(x_k + s_k) ≤ m_k(x_k + s_k^C) < f(x_k)    (2.8)

provided g(x_k) ≠ 0. We may now describe our class of algorithms more formally as Algorithm 2.1 on the following page. Iterations of Algorithm 2.1 where ρ_k ≥ η_1 are called successful and their index set is denoted by S. Note that the mechanism of the algorithm ensures that σ_k > 0 for all k ≥ 0. Note also that each iteration of the algorithm involves a single evaluation of the objective function and (for successful iterations only) of its gradient. The evaluation complexity analysis can therefore be carried out by measuring how many iterations are needed before an approximate first-order critical point is found or the objective value decreases below the required target.

If p = 2 or p = 3, the model minimization occurring in Step 2 of the algorithm is typically easy to compute if one is happy with the minimum requirement that (2.5) and (2.6) hold: an efficient unidimensional linesearch technique using quadratic or cubic interpolation is all that is needed. Larger model decrease may be obtained by pursuing the minimization beyond the Cauchy point, and again efficient algorithms are known for quadratic and cubic regularizations (see Cartis et al. [4] for the latter case, the former being the well-known problem of minimizing a quadratic function). Good methods are also available for more general values of p (in effect requiring the one-dimensional minimization of a p-th order polynomial): see Cartis et al. [2] for the case of regularized least-norm problems with general p ≥ 2, or Gould, Robinson and Thorne [8] for even more general cases.

3 Worst-case evaluation complexity analysis

In order to analyze the worst-case complexity of Algorithm 2.1, we need to specify our assumptions.

Algorithm 2.1: A Class of First-Order Adaptive Regularization Methods

Step 0: Initialization. An initial point x_0, a target objective function value f_target ≤ f(x_0) and an initial regularization parameter σ_0 > 0 are given, as well as an accuracy level ε > 0. The constants η_1, η_2, γ_1, γ_2 and γ_3 are also given and satisfy

  0 < η_1 ≤ η_2 < 1 and 0 < γ_1 < 1 < γ_2 < γ_3.    (2.9)

Compute f(x_0) and set k = 0.

Step 1: Test for termination. If ‖g_k‖ ≤ ε or f(x_k) ≤ f_target, terminate with the approximate solution x_ε = x_k.

Step 2: Step calculation. Compute the step s_k approximately by minimizing the model (2.4) in the sense that conditions (2.5) and (2.6) hold.

Step 3: Acceptance of the trial point. Compute f(x_k + s_k) and define

  ρ_k = [f(x_k) − f(x_k + s_k)] / [m_k(x_k) − m_k(x_k + s_k)].    (2.10)

If ρ_k ≥ η_1, then define x_{k+1} = x_k + s_k and evaluate g(x_{k+1}); otherwise define x_{k+1} = x_k.

Step 4: Regularization parameter update. Set

  σ_{k+1} ∈ [γ_1 σ_k, σ_k]   if ρ_k ≥ η_2,
  σ_{k+1} ∈ [σ_k, γ_2 σ_k]   if ρ_k ∈ [η_1, η_2),
  σ_{k+1} ∈ [γ_2 σ_k, γ_3 σ_k]   if ρ_k < η_1.    (2.11)

Increment k by one and go to Step 1.

AS.1 The objective function f is continuously differentiable on ℝ^n.

AS.2 g = ∇_x f is Hölder continuous in the sense that (2.2) holds for all x, y ∈ ℝ^n and some constants L ≥ 0 and 0 < β ≤ 1.

AS.3 There exists a constant f_low (possibly equal to minus infinity) such that, for all x ∈ ℝ^n, f(x) ≥ f_low, and f_* := max[f_low, f_target] > −∞.

AS.4 There exist constants κ_gl ≥ 0 and κ_gu > 0 such that κ_gl ≤ ‖g(x)‖ ≤ κ_gu for all x ∈ ℝ^n such that f_* ≤ f(x) ≤ f(x_0).

AS.5 There exists a constant κ_B ≥ 0 such that, for all k ≥ 0, ‖B_k‖ ≤ κ_B.
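To make the mechanism of Algorithm 2.1 concrete, the following Python sketch implements it in the simplest setting B_k = 0, computing the Cauchy step (2.6) by a bounded one-dimensional minimization so that only the minimal requirements (2.5)-(2.6) are enforced. It is only an illustration under those simplifying assumptions (the function names, tolerances and update factors are ours, not part of the paper); a practical implementation would exploit the structure of the one-dimensional subproblem and allow B_k ≠ 0.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def first_order_arp(f, grad, x0, p=2.0, sigma0=1.0, eps=1e-5,
                    eta1=0.1, eta2=0.9, gamma1=0.5, gamma2=2.0,
                    f_target=-np.inf, max_iter=10000):
    """Sketch of Algorithm 2.1 with B_k = 0 (Cauchy-point steps only)."""
    x, sigma = np.asarray(x0, dtype=float), sigma0
    fx, g = f(x), grad(x)
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps or fx <= f_target:          # Step 1: termination
            return x, fx, k
        def model_along_g(alpha):                                 # m_k(x_k - alpha g_k) with B_k = 0
            s = -alpha * g
            return fx + g @ s + (sigma / p) * np.linalg.norm(s) ** p
        alpha_c = minimize_scalar(model_along_g, bounds=(0.0, 1e10),
                                  method='bounded').x             # Cauchy step size (2.6)
        s = -alpha_c * g                                          # Step 2: trial step
        m_decrease = fx - model_along_g(alpha_c)
        f_trial = f(x + s)                                        # Step 3: acceptance test (2.10)
        rho = (fx - f_trial) / m_decrease if m_decrease > 0 else -np.inf
        if rho >= eta1:                                           # successful: accept and re-evaluate g
            x, fx = x + s, f_trial
            g = grad(x)
        if rho >= eta2:                                           # Step 4: update sigma within (2.11)
            sigma = max(gamma1 * sigma, 1e-12)
        elif rho < eta1:
            sigma = gamma2 * sigma
        # for eta1 <= rho < eta2 we keep sigma unchanged, which lies in [sigma, gamma2*sigma]
    return x, fx, max_iter
```

For instance, applying this sketch to f(x) = Σ_i |x_i|^{3/2}, whose gradient 1.5 sign(x_i)|x_i|^{1/2} is Hölder continuous with β = 1/2 but not Lipschitz near the origin, illustrates the regime p = 2 > 1 + β covered by part (ii) of Theorem 3.10 below.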

AS.1 and AS.2 formalize our framework as described in the introduction, while AS.5 is standard in similar contexts and avoids possibly infinite curvature of the model, which would make the regularization irrelevant. Note that the values of L ≥ 0 and β > 0 are often unknown to the user. AS.3 states that, if no target value is specified by the user, then there must exist a global lower bound on the objective function's values to make the minimization problem meaningful. The role of AS.4 is to take into account that, when f_* = f_target > f_low, it may well happen that no single x ∈ ℝ^n satisfies both conditions in (2.3), and thus that the first termination criterion in (2.3) cannot be satisfied by our minimization algorithm before the second. We take this possibility into account by allowing κ_gl > 0, and expressing the complexity results in terms of

  ε_* = max[ε, κ_gl],    (3.1)

which is the attainable gradient accuracy for the problem given f_target. For simplicity of exposition, we assume for now that ε_* < 1, but comment on the case ε_* ≥ 1 at the end of the paper. We note that AS.4 automatically holds if the set {x ∈ ℝ^n | f_* ≤ f(x) ≤ f(x_0)} is bounded, but also, as we discuss in Lemma 3.2 below, in the frequent situation where f(x) is bounded below on the level set {x ∈ ℝ^n | f(x) ≤ f(x_0)}.

We start by deriving consequences of our assumptions which are independent of the algorithm. The first is intended to explore the consequences of a value of β exceeding one.

Lemma 3.1. Suppose that AS.1 holds and that AS.2 holds for some β > 1. Then f is linear on ℝ^n, AS.2 holds for all β > 0 with L = 0, and AS.4 holds with κ_gl = κ_gu = ‖g(x_0)‖.

Proof. If e_i is the i-th vector of the canonical basis and g(x)_i the i-th component of the gradient at x, we have, using the Cauchy-Schwarz inequality and the Hölder condition (2.2), that, for all i = 1, ..., n and all x ∈ ℝ^n,

  |g(x + t e_i)_i − g(x)_i| / |t| ≤ ‖g(x + t e_i) − g(x)‖ / ‖x + t e_i − x‖ ≤ L |t|^{β−1},

with β − 1 > 0. Taking the limit when t → 0 gives that the directional derivative of each g(·)_i exists and is zero for all i and at all x. Thus the gradient is constant on ℝ^n, f is linear and AS.2 obviously holds with L = 0 for all β > 0 since ‖g(x) − g(y)‖ is identically zero for all x, y ∈ ℝ^n. □

This justifies our choice to restrict our attention to the case where β ∈ (0, 1] for the rest of our analysis. The second result indicates common circumstances in which AS.4 holds.

Lemma 3.2. Suppose that AS.1 and AS.2 hold, and that there exists a constant f_low > −∞ such that

  f(x) ≥ f_low    (3.2)

for all x ∈ L_0 := {x ∈ ℝ^n | f(x) ≤ f(x_0)}. Then AS.4 holds.

Proof. Let x ∈ L_0. AS.1, the mean-value theorem, and AS.2 then ensure that, for all s,

  f_low ≤ f(x + s) = f(x) + g(x)^T s + ∫_0^1 (g(x + ξ s) − g(x))^T s dξ ≤ f(x) + g(x)^T s + (L/(1+β)) ‖s‖^{1+β} =: h(s).

Given that the minimizer of the convex function h(s) is given by

  s_* = −(‖g(x)‖/L)^{1/β} g(x)/‖g(x)‖,    (3.3)

we obtain that

  min_s h(s) = h(s_*) = f(x) − (β/(1+β)) L^{−1/β} ‖g(x)‖^{(1+β)/β}.

As a consequence, we obtain, using the fact that f(x) ≤ f(x_0) (since x ∈ L_0) and (3.3), that

  (β/(1+β)) L^{−1/β} ‖g(x)‖^{(1+β)/β} ≤ f(x_0) − f_low,

which in turn implies that

  ‖g(x)‖ ≤ [ L^{1/β} ((1+β)/β) (f(x_0) − f_low) ]^{β/(1+β)} =: κ_gu,

irrespective of the value of f_target. This and the choice κ_gl = 0 yield the desired conclusion. □

Note that (3.2) is indeed very common. For instance, f_low = 0 for all nonlinear least-squares problems. Hence the form of AS.4 should not be viewed as overly restrictive, and it also allows for the case where (3.2) fails but the objective function's gradient remains reasonably well-behaved. For instance, problems whose objective function is an indefinite quadratic are allowed provided f_target > −∞.

We now turn to the analysis of the algorithm's properties. But, before we start in earnest, it is useful to introduce some specific notation. On a number of occasions, we need to include some of the terms in formulae only if certain conditions apply. We indicate this by appending the relevant condition in braces immediately after the conditional term (the original paper uses an underbrace for this purpose). For instance, we may write an expression of the type

  max[ a {a > 0}, b, c ],

meaning that the maximum should include the first term if and only if a > 0 (making the term well-defined in this case).

We first derive two bounds on the step length, generalizing Lemma 2.2 in [4].

Lemma 3.3. We have that, for all k ≥ 0,

  ‖s_k‖ ≤ max[ (p‖B_k‖/σ_k)^{1/(p−2)} {B_k ≠ 0}, (2p‖g_k‖/σ_k)^{1/(p−1)} ].    (3.4)

Moreover,

  ‖s_k‖ ≤ (2p‖g_k‖/σ_k)^{1/(p−1)}    (3.5)

provided

  σ_k ≥ (p‖B_k‖)^{p−1} / (2p‖g_k‖)^{p−2}.    (3.6)

Proof. Observe first that (2.4), (2.8) and g_k ≠ 0 ensure that

  m_k(x_k + s_k) − f(x_k) = g_k^T s_k + (1/2) s_k^T B_k s_k + (σ_k/p)‖s_k‖^p < 0.    (3.7)

Assume first that s_k^T B_k s_k > 0. Then we must have that g_k^T s_k + (σ_k/p)‖s_k‖^p < 0, and therefore (remembering that σ_k > 0 and that g_k^T s_k ≥ −‖g_k‖ ‖s_k‖)

  ‖s_k‖ < (p‖g_k‖/σ_k)^{1/(p−1)} ≤ (2p‖g_k‖/σ_k)^{1/(p−1)}.    (3.8)

If s_k^T B_k s_k ≤ 0, we may rewrite (3.7) as

  [ g_k^T s_k + (σ_k/(2p))‖s_k‖^p ] + [ (1/2) s_k^T B_k s_k + (σ_k/(2p))‖s_k‖^p ] < 0,

and the left-hand side of this inequality can only be negative if at least one of the bracketed expressions is negative, giving that

  ‖s_k‖ ≤ max[ (p‖B_k‖/σ_k)^{1/(p−2)}, (2p‖g_k‖/σ_k)^{1/(p−1)} ],

where we also used that g_k^T s_k ≥ −‖g_k‖ ‖s_k‖ and s_k^T B_k s_k ≥ −‖B_k‖ ‖s_k‖^2. Combining this with (3.8) then yields (3.4). Checking (3.5) subject to (3.6) is straightforward. □

We now turn to the task of finding a lower bound on the model decrease f(x_k) − m_k(x_k + s_k) resulting from (2.5)-(2.6). The first step is to find a suitable positive lower bound on the step size α_k^C defined in (2.6).

Lemma 3.4. We have that

  m_k(x_k + s_k^C) ≤ m_k(x_k − α_k g_k) < f(x_k),    (3.9)

where

  α_k = min[ ‖g_k‖^2/(2 g_k^T B_k g_k) {g_k^T B_k g_k > 0}, (p/(2 σ_k ‖g_k‖^{p−2}))^{1/(p−1)} ].    (3.10)

Proof. Substituting s = −α g_k into (2.4), we obtain that, for all α > 0,

  m_k(x_k − α g_k) − f(x_k) = α [ −‖g_k‖^2 + (α/2) g_k^T B_k g_k + (σ_k/p) α^{p−1} ‖g_k‖^p ].    (3.11)

Assume first that g_k^T B_k g_k ≤ 0. Then, for all α ∈ (0, ᾱ_k), we have −‖g_k‖^2 + (σ_k/p) α^{p−1} ‖g_k‖^p < 0, where

  ᾱ_k = (p/(σ_k ‖g_k‖^{p−2}))^{1/(p−1)},    (3.12)

and, because α > 0 and g_k^T B_k g_k ≤ 0, we also obtain from (3.11) that m_k(x_k − α g_k) < f(x_k) for all α ∈ (0, ᾱ_k). In particular, this yields that m_k(x_k − α_k g_k) < f(x_k), where

  α_k = ᾱ_k / 2^{1/(p−1)} = (p/(2 σ_k ‖g_k‖^{p−2}))^{1/(p−1)}.    (3.13)

Condition (2.6) then ensures that (3.9) holds, as desired.

Assume next that g_k^T B_k g_k > 0 and, in this case, define

  α_k = min[ ‖g_k‖^2/(2 g_k^T B_k g_k), (p/(2 σ_k ‖g_k‖^{p−2}))^{1/(p−1)} ].

Then it is easy to verify that both bracketed expressions in

  m_k(x_k − α_k g_k) − f(x_k) = α_k [ −‖g_k‖^2/2 + (α_k/2) g_k^T B_k g_k ] + α_k [ −‖g_k‖^2/2 + (σ_k/p) α_k^{p−1} ‖g_k‖^p ]

are negative and thus, because α_k > 0, that m_k(x_k − α_k g_k) < f(x_k). The desired conclusion can now be obtained by invoking (2.6). □
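As a quick sanity check of Lemma 3.4 (and of the decrease bound that follows in Lemma 3.5), the small script below evaluates the model along −g_k for a randomly chosen symmetric B_k and verifies that the safeguard step size α_k of (3.10) produces a model decrease of at least one quarter of α_k ‖g_k‖^2. This is merely an illustrative numerical experiment with arbitrarily chosen data, not part of the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 5, 3.0, 0.7                          # p > 2, so an indefinite B_k is allowed by (2.7)
g = rng.standard_normal(n)
B = rng.standard_normal((n, n)); B = (B + B.T) / 2  # symmetric, possibly indefinite

def model_decrease(alpha):
    # f(x_k) - m_k(x_k - alpha g_k), expanded as in (3.11)
    return alpha * (g @ g) - 0.5 * alpha**2 * (g @ B @ g) \
           - (sigma / p) * alpha**p * np.linalg.norm(g)**p

gBg = g @ B @ g
alpha_k = (p / (2 * sigma * np.linalg.norm(g)**(p - 2)))**(1 / (p - 1))
if gBg > 0:                                         # include the curvature term of (3.10)
    alpha_k = min(alpha_k, (g @ g) / (2 * gBg))

assert model_decrease(alpha_k) > 0                              # Lemma 3.4: strict model decrease
assert model_decrease(alpha_k) >= 0.25 * alpha_k * (g @ g) - 1e-12   # the (1/4) alpha_k ||g_k||^2 bound
print(model_decrease(alpha_k), 0.25 * alpha_k * (g @ g))
```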

We now translate the conclusions of the last lemma in terms of the model reduction at the Cauchy point and beyond, generalizing Lemma 2.1 in [4].

Lemma 3.5. We have that

  f(x_k) − m_k(x_k + s_k) ≥ (1/4) min[ ‖g_k‖^4/(2 g_k^T B_k g_k) {g_k^T B_k g_k > 0}, (p/(2σ_k))^{1/(p−1)} ‖g_k‖^{p/(p−1)} ].    (3.14)

Proof. If g_k^T B_k g_k ≤ 0, substituting (3.13) into (3.11) immediately yields that

  f(x_k) − m_k(x_k − α_k g_k) ≥ (1/2) α_k ‖g_k‖^2 = (1/2) (p/(2σ_k))^{1/(p−1)} ‖g_k‖^{p/(p−1)}.    (3.15)

If g_k^T B_k g_k > 0, we have from (3.11) and (3.10) that

  f(x_k) − m_k(x_k − α_k g_k) ≥ α_k ‖g_k‖^2 − (α_k^2/2) g_k^T B_k g_k − (σ_k/p) α_k^p ‖g_k‖^p ≥ α_k ‖g_k‖^2 (1 − 1/4 − 1/2) = (1/4) α_k ‖g_k‖^2 = (1/4) min[ ‖g_k‖^4/(2 g_k^T B_k g_k), (p/(2σ_k))^{1/(p−1)} ‖g_k‖^{p/(p−1)} ].

Combining this last inequality with (3.15) and using (2.5) then gives (3.14). □

The model decrease specified by (3.14) turns out to be useful if the value of σ_k appearing in the denominator of the second term in the min can be bounded above across all iterations. We obtain this result in two stages, the first being to determine conditions under which an iteration must be very successful.

Lemma 3.6. Suppose that AS.1, AS.2 and AS.5 hold. Then ρ_k ≥ η_2, iteration k is very successful and σ_{k+1} ≤ σ_k

(i) if p ≤ 1 + β and

  σ_k ≥ κ_1 ‖g_k‖^{(1+β−p)/β},    (3.16)

where

  κ_1 = (2p)^{(1+β−p)/β} [ pL/(1+β) ]^{(p−1)/β};

(ii) if 1 + β < p and

  σ_k ≥ κ_2 max[ ‖g_k‖^{2−p}, ‖g_k‖^{(1+β−p)/β} ],    (3.17)

where

  κ_2 = max[ (p/2)(2κ_B)^{p−1}, 2^{(2+β)/β} p κ_3^{(p−1)/β}, 8p κ_3^{p−1} ]    (3.18)

with

  κ_3 = (4/(1−η_2)) [ L/(1+β) + κ_B/2 ].    (3.19)

Proof. First notice that AS.1, the mean-value theorem and (2.4) imply that

  f(x_k + s_k) − m_k(x_k + s_k) = ∫_0^1 (g(x_k + ξ s_k) − g_k)^T s_k dξ − (1/2) s_k^T B_k s_k − (σ_k/p) ‖s_k‖^p.

Using now AS.2, we obtain that

  f(x_k + s_k) − m_k(x_k + s_k) ≤ (L/(1+β)) ‖s_k‖^{1+β} − (1/2) s_k^T B_k s_k − (σ_k/p) ‖s_k‖^p.    (3.20)

Assume first that p ≤ 1 + β (which implies that B_k ⪰ 0 because of (2.7)). Then f(x_k + s_k) ≤ m_k(x_k + s_k) (and thus ρ_k ≥ 1 > η_2) if

  σ_k ≥ (pL/(1+β)) ‖s_k‖^{1+β−p},

which, in view of (3.4) and B_k ⪰ 0, holds if

  σ_k ≥ (pL/(1+β)) (2p‖g_k‖/σ_k)^{(1+β−p)/(p−1)},    (3.21)

that is, if

  σ_k ≥ (2p)^{(1+β−p)/β} [ pL/(1+β) ]^{(p−1)/β} ‖g_k‖^{(1+β−p)/β},

proving the first item in the lemma's statement.

Assume now that p > 1 + β, in which case B_k is allowed to be indefinite if p > 2 and we cannot guarantee that s_k^T B_k s_k ≥ 0 in (3.20). Then ρ_k ≥ η_2 if

  r_k := f(x_k + s_k) − m_k(x_k + s_k) − (1 − η_2)(f(x_k) − m_k(x_k + s_k)) < 0.

Note that a lower bound on f(x_k) − m_k(x_k + s_k) is given by Lemma 3.5. If we now assume that, whenever g_k^T B_k g_k > 0,

  σ_k ≥ (p/2)(2κ_B)^{p−1} ‖g_k‖^{2−p},    (3.22)

then we obtain that the minimum occurring in the right-hand side of (3.14) is achieved by the second term, yielding that

  f(x_k) − m_k(x_k + s_k) ≥ (1/4) (p/(2σ_k))^{1/(p−1)} ‖g_k‖^{p/(p−1)}.

As a consequence, we obtain from (3.20), the Cauchy-Schwarz inequality and AS.5 that

  r_k ≤ (L/(1+β)) ‖s_k‖^{1+β} + (κ_B/2) ‖s_k‖^2 − ((1−η_2)/4) (p/(2σ_k))^{1/(p−1)} ‖g_k‖^{p/(p−1)}.

If we also assume that, whenever B_k ≠ 0, (3.6) also holds, then we may substitute the upper bound (3.5) in this inequality and obtain that r_k < 0 if

  (L/(1+β)) (2p‖g_k‖/σ_k)^{(1+β)/(p−1)} + (κ_B/2) (2p‖g_k‖/σ_k)^{2/(p−1)} < ((1−η_2)/4) (p/(2σ_k))^{1/(p−1)} ‖g_k‖^{p/(p−1)}.

Now, if, on one hand,

  (2p‖g_k‖/σ_k)^{(1+β)/(p−1)} ≥ (2p‖g_k‖/σ_k)^{2/(p−1)},    (3.23)

then we obtain that r_k < 0 if

  (L/(1+β) + κ_B/2) (2p‖g_k‖/σ_k)^{(1+β)/(p−1)} < ((1−η_2)/4) (p/(2σ_k))^{1/(p−1)} ‖g_k‖^{p/(p−1)}.

Taking the (p−1)-th power and rearranging, we obtain that r_k < 0 if

  σ_k ≥ 2^{(2+β)/β} p κ_3^{(p−1)/β} ‖g_k‖^{(1+β−p)/β}.    (3.24)

If, on the other hand, (3.23) fails, then r_k < 0 if

  (L/(1+β) + κ_B/2) (2p‖g_k‖/σ_k)^{2/(p−1)} < ((1−η_2)/4) (p/(2σ_k))^{1/(p−1)} ‖g_k‖^{p/(p−1)}.

Once more taking the (p−1)-th power and rearranging, we obtain that r_k < 0 if

  σ_k ≥ 8p κ_3^{p−1} ‖g_k‖^{2−p}.    (3.25)

Thus r_k < 0 (and therefore ρ_k ≥ η_2) when p > 1 + β provided (3.24) and (3.25) hold, together with (3.6) (when B_k ≠ 0) and (3.22) (when g_k^T B_k g_k > 0). This proves the second item in the lemma's statement, once we note that the right-hand sides of (3.6), (3.22), (3.24) and (3.25) are all bounded above by κ_2 max[ ‖g_k‖^{2−p}, ‖g_k‖^{(1+β−p)/β} ] with κ_2 given by (3.18). □

Note that the second part of the lemma extends the result of Lemma 3.1 in [5] to general p and β. We are now in position to prove an iteration-independent upper bound on the value of σ_k.

Lemma 3.7. Suppose that AS.1–AS.5 hold and that ε_* < 1. Then, as long as the algorithm does not terminate, we have that, for all k ≥ 0,

(i) if p ≤ 1 + β,

  σ_k ≤ κ_σ1,    (3.26)

where

  κ_σ1 = max[ γ_3 κ_1 κ_gu^{(1+β−p)/β}, σ_0 ];    (3.27)

(ii) if 1 + β < p,

  σ_k ≤ max[ κ_σ2, κ_σ3 ε_*^{−(p−1−β)/β} ],    (3.28)

where

  κ_σ2 = max[ γ_3 κ_2 κ_gu^{2−p} {p < 2}, σ_0 ] and κ_σ3 = γ_3 κ_2,    (3.29)

with κ_1 and κ_2 defined in Lemma 3.6.

Proof. We again distinguish two cases. Assume first that p ≤ 1 + β, which in turn implies that p ≤ 2 and thus, in view of (2.7), that B_k ⪰ 0 for all k. Then AS.4 and condition (3.16) of Lemma 3.6 (i) imply that σ_{k+1} ≤ σ_k provided

  σ_k ≥ κ_1 κ_gu^{(1+β−p)/β},    (3.30)

which is a constant independent of k and ε.

The second case is when 1 + β < p. We first consider the subclass where p < 2, where, using AS.4,

  ‖g_k‖^{2−p} ≤ κ_gu^{2−p}.    (3.31)

This bound, part (ii) of Lemma 3.6 and the fact that ‖g_k‖ > ε_* as long as the algorithm has not terminated then imply that σ_{k+1} ≤ σ_k provided

  σ_k ≥ κ_2 max[ κ_gu^{2−p} {p < 2}, ε_*^{−(p−1−β)/β} ],    (3.32)

where we have used that (1+β−p)/β < 0. Alternatively, if p ≥ 2, part (ii) of Lemma 3.6 and the fact that ‖g_k‖ > ε_* as long as the algorithm has not terminated then give that σ_{k+1} ≤ σ_k provided

  σ_k ≥ κ_2 max[ ε_*^{2−p}, ε_*^{−(p−1−β)/β} ] = κ_2 ε_*^{−(p−1−β)/β},    (3.33)

where the last equality now results from the fact that, because β ≤ 1,

  0 ≥ 2 − p ≥ −(p−1−β)/β.

The proof of (3.26) and (3.28) is then completed by taking into account that the initial parameter σ_0 may exceed the bound given by the right-hand side of (3.30) (if p ≤ 1+β) or of (3.32)–(3.33) (if 1+β < p), and also that these bounds may just fail by a small margin at an unsuccessful iteration, resulting in an increase of σ_k by a factor at most γ_3 before the relevant bound applies. □

Having now derived an iteration-independent upper bound on σ_k, we may return to the model decrease given by Lemma 3.5.

Lemma 3.8. Suppose that AS.1–AS.5 hold and that ε_* < 1. Then, as long as the algorithm does not terminate,

(i) if p ≤ 1 + β, then

  f(x_k) − m_k(x_k + s_k) ≥ κ_m1 ε_*^{p/(p−1)},    (3.34)

where

  κ_m1 = (1/4) min[ 1/(2κ_B), (p/(2κ_σ1))^{1/(p−1)} ];    (3.35)

(ii) if 1 + β < p, then

  f(x_k) − m_k(x_k + s_k) ≥ κ_m2 ε_*^{(1+β)/β},    (3.36)

where

  κ_m2 = (1/4) min[ 1/(2κ_B), (p/(2 max[κ_σ2, κ_σ3]))^{1/(p−1)} ].    (3.37)

Proof. Assume first that p ≤ 1 + β. As above, this implies that p ≤ 2 and hence, because of (2.7), that g_k^T B_k g_k ≥ 0. Taking into account that, in this case, g_k^T B_k g_k ≤ κ_B ‖g_k‖^2

because of AS.5, substituting (3.26) into (3.14) and using the fact that ‖g_k‖ ≥ ε_* as long as the algorithm has not terminated yields that

  f(x_k) − m_k(x_k + s_k) ≥ (1/4) min[ ε_*^2/(2κ_B), (p/(2κ_σ1))^{1/(p−1)} ε_*^{p/(p−1)} ] ≥ (1/4) min[ 1/(2κ_B), (p/(2κ_σ1))^{1/(p−1)} ] min[ ε_*^2, ε_*^{p/(p−1)} ],

and (3.34) follows since ε_* < 1 and p/(p−1) ≥ 2 for p ∈ (1, 2].

Consider now the case where 1 + β < p. Substituting now (3.28) into (3.14), using AS.5 and the fact that ‖g_k‖ ≥ ε_* as long as the algorithm has not terminated, we obtain that

  f(x_k) − m_k(x_k + s_k) ≥ (1/4) min[ ε_*^2/(2κ_B) {g_k^T B_k g_k > 0}, (p/(2 max[κ_σ2, κ_σ3 ε_*^{−(p−1−β)/β}]))^{1/(p−1)} ε_*^{p/(p−1)} ] ≥ (1/4) min[ 1/(2κ_B), (p/(2 max[κ_σ2, κ_σ3]))^{1/(p−1)} ] min[ ε_*^2, ε_*^{p/(p−1) + (p−1−β)/(β(p−1))} ],

which yields (3.36) since ε_* < 1 and, for 1 + β < p and β ∈ (0, 1],

  p/(p−1) + (p−1−β)/(β(p−1)) = (1+β)/β and (1+β)/β ≥ 2. □

We now recall an important technical lemma which, in effect, gives a bound on the total number of unsuccessful iterations before iteration k as a function of the number of successful ones.

Lemma 3.9. The mechanism of Algorithm 2.1 guarantees that, if

  σ_k ≤ σ_max    (3.38)

for some σ_max > 0, then

  k ≤ |S_k| (1 + |log γ_1|/log γ_2) + (1/log γ_2) log(σ_max/σ_0).    (3.39)

Proof. See [5]. □
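To give a feel for the overhead that unsuccessful iterations add through (3.39), the following few lines evaluate the bound for representative (hypothetical) parameter values; this is only an illustration of the formula, not part of the analysis.

```python
import math

gamma1, gamma2 = 0.5, 2.0        # example algorithmic constants satisfying (2.9)
sigma0, sigma_max = 1.0, 1.0e4   # assumed initial and maximal regularization parameters
S_k = 100                        # number of successful iterations up to iteration k

# Bound (3.39) on the total iteration counter k
bound = S_k * (1 + abs(math.log(gamma1)) / math.log(gamma2)) \
        + math.log(sigma_max / sigma0) / math.log(gamma2)
print(bound)   # 200 + log2(1e4) ~ 213.3: at most ~113 unsuccessful iterations in this scenario
```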

We are now ready to prove our main result on the worst-case complexity of Algorithm 2.1.

Theorem 3.10. Suppose that AS.1–AS.5 hold and that ε_* defined in (3.1) satisfies ε_* < 1.

1. If p ≤ 1 + β, there exist constants κ_s1, κ_a1 and κ_c1 such that, for any ε > 0, Algorithm 2.1 requires at most

  κ_s1 (f(x_0) − f_*) / ε_*^{p/(p−1)}    (3.40)

successful iterations (and gradient evaluations), and a total of

  κ_a1 (f(x_0) − f_*) / ε_*^{p/(p−1)} + κ_c1    (3.41)

iterations (and objective function evaluations), before producing an iterate x_ε such that ‖g(x_ε)‖ ≤ ε or f(x_ε) ≤ f_target.

2. If 1 + β < p, there exist constants κ_s2, κ_a2, κ_b2 and κ_c2 such that, for all ε > 0, Algorithm 2.1 requires at most

  κ_s2 (f(x_0) − f_*) / ε_*^{(1+β)/β}    (3.42)

successful iterations (and gradient evaluations), and a total of

  κ_a2 (f(x_0) − f_*) / ε_*^{(1+β)/β} + κ_b2 |log ε_*| + κ_c2    (3.43)

iterations (and objective function evaluations), before producing an iterate x_ε such that ‖g(x_ε)‖ ≤ ε or f(x_ε) ≤ f_target.

In the above statements the constants are given by

  κ_s1 = 1/(η_1 κ_m1) and κ_s2 = 1/(η_1 κ_m2),    (3.44)

  κ_a1 = (1/(η_1 κ_m1)) (1 + |log γ_1|/log γ_2), κ_c1 = (1/log γ_2) log(κ_σ1/σ_0),    (3.45)

  κ_a2 = (1/(η_1 κ_m2)) (1 + |log γ_1|/log γ_2), κ_b2 = (p−1−β)/(β log γ_2),    (3.46)

  κ_c2 = (1/log γ_2) [ log(max[1, κ_σ2, κ_σ3]) + |log σ_0| ],    (3.47)

where

  κ_1 = (2p)^{(1+β−p)/β} [pL/(1+β)]^{(p−1)/β}, κ_2 = max[ (p/2)(2κ_B)^{p−1}, 2^{(2+β)/β} p κ_3^{(p−1)/β}, 8p κ_3^{p−1} ]    (3.48)

with

  κ_3 = (4/(1−η_2)) [ L/(1+β) + κ_B/2 ],

and

  κ_σ1 = max[ γ_3 κ_1 κ_gu^{(1+β−p)/β}, σ_0 ], κ_σ2 = max[ γ_3 κ_2 κ_gu^{2−p} {p < 2}, σ_0 ], κ_σ3 = γ_3 κ_2,    (3.49)–(3.50)

and

  κ_m1 = (1/4) min[ 1/(2κ_B), (p/(2κ_σ1))^{1/(p−1)} ] and κ_m2 = (1/4) min[ 1/(2κ_B), (p/(2 max[κ_σ2, κ_σ3]))^{1/(p−1)} ].    (3.51)

Proof. Consider first the case where p ≤ 1 + β. We then deduce from AS.3, the definition of a successful iteration and (3.34) in Lemma 3.8 that, as long as the algorithm has not terminated,

  f(x_0) − f_* ≥ f(x_0) − f(x_{k+1}) = Σ_{j ∈ S_k} [f(x_j) − f(x_j + s_j)] ≥ η_1 Σ_{j ∈ S_k} [f(x_j) − m_j(x_j + s_j)] > η_1 κ_m1 ε_*^{p/(p−1)} |S_k|,    (3.52)

where |S_k| is the cardinality of S_k := {j ∈ S | j ≤ k}, that is the number of successful iterations up to iteration k. This provides an upper bound on |S_k| which is independent of k and ε, from which we obtain the bound (3.40) with (3.44). Calling now upon Lemma 3.9 and (3.26), we deduce that the total number of iterations (and function evaluations) cannot exceed

  κ_s1 (f(x_0) − f_*) ε_*^{−p/(p−1)} (1 + |log γ_1|/log γ_2) + (1/log γ_2) log(κ_σ1/σ_0),

which then gives the bound (3.41) with (3.45).

The proof for the case where 1 + β < p is derived in a manner entirely similar to that used for the case where p ≤ 1 + β, replacing ε_*^{p/(p−1)} by ε_*^{(1+β)/β} in (3.52) (since (3.36) is used instead of (3.34)), and also noting that, when using (3.28) instead of (3.26) in Lemma 3.9,

  log( max[ κ_σ2, κ_σ3 ε_*^{−(p−1−β)/β} ] / σ_0 ) ≤ ((p−1−β)/β) |log ε_*| + log(max[1, κ_σ2, κ_σ3]) + |log σ_0|.

We may thus deduce that (3.42) and (3.43) hold with (3.46)–(3.51). □

A close look at the expressions of the constants in (3.44)–(3.51) reveals that the global upper bound on the gradient norm, κ_gu, only occurs in the case where p < 2. Therefore, AS.4 is only needed in this case, since the existence of κ_gl ≥ 0 is always ensured by the non-negativity of ‖g(x)‖.

4 Sharpness

We now show that the bound specified by part (ii) of Theorem 3.10 is essentially sharp, in the sense that we exhibit a class of one-dimensional examples where the number of iterations necessary to produce an approximate first-order critical point is arbitrarily close to the theorem's bound. (Whether this can also be achieved for part (i) of the theorem is still unknown at this point.) To achieve this goal, we first establish sequences of iterates {x_k}, function values {f(x_k)}, gradient values {g_k} and regularization parameter values {σ_k} which can be generated by Algorithm 2.1 and such that the gradient values converge to zero sufficiently slowly to attain the desired lower bound on the number of iterations (and evaluations). Once these are defined, we construct a function f(x) which interpolates these function and gradient values and finally prove that all our assumptions are satisfied.

Because the derivation of the complexity bound involves an increasing sequence of regularization parameters {σ_k}, our example is unfortunately somewhat complicated because it has to include both successful and unsuccessful iterations. We choose to construct it such that all even iterations are unsuccessful and all odd ones are successful. Consider the gradient sequence defined, for p > 1 + β, any arbitrarily small τ ∈ (0, 1), a positive integer q and all k ≥ 0, by

  g_{2k} = ( 1/(k + q) )^{β/(1+β) + τ}, g_{2k+1} = g_{2k},    (4.1)

and observe that the sequence of gradient norms {|g_k|} is non-increasing for any choice of q. Assume first that q = 1. This definition implies that

  ω_{2k} := 1 + (1/2) (g_{2k+3}/g_{2k+1})^{1/β}    (4.2)

satisfies

  ω_{2k} → 3/2    (4.3)

when k tends to infinity. Hence there exists an integer ℓ ≥ 2 such that

  ω_{2k} ∈ [5/4, 3/2] for k ≥ ℓ.    (4.4)

We now (re)define q in (4.1) by setting q = ℓ, in effect shifting the {k} sequence by ℓ, such that (4.2)–(4.4) hold with (4.1) for the complete shifted sequence. Note that q only depends on β, p and τ and is independent of ε.
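The slow decay of the gradient sequence (4.1), and the resulting growth of the regularization parameters (4.5) defined below, can be illustrated numerically. The following sketch uses the reconstructed exponents above (which should be treated as assumptions of this illustration) and illustrative parameter values of our own choosing.

```python
import math

# Generate the slowly-decaying gradient sequence (4.1) of the lower-bound example
# and the matching odd-iteration regularization parameters (4.5).
p, beta, tau, q = 2.0, 0.45, 0.1, 3            # requires p > 1 + beta
eps = 1e-2

k = 0
while (1.0 / (k + q)) ** (beta / (1 + beta) + tau) > eps:    # g_{2k} from (4.1)
    k += 1
iterations = 2 * k                                            # two iterations per index k
predicted = 2 * (eps ** (-(1 + beta) / (beta + tau * (1 + beta))) - q)

g_last = (1.0 / (k + q)) ** (beta / (1 + beta) + tau)
sigma_last = g_last ** ((1 + beta - p) / beta)                # sigma_{2k+1} from (4.5): grows as g_k -> 0
print(iterations, predicted, sigma_last)
```

The printed iteration count matches the predicted lower bound 2(ε^{−(1+β)/(β+τ(1+β))} − q) up to rounding, illustrating how closely the example approaches the upper bound of part (ii) of Theorem 3.10 as τ shrinks.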

Observe also that the rate of (monotonic) convergence of the sequence {g_k} to zero ensures that, for any ε ∈ (0, 1), g_k ≤ ε only for k larger than 2(ε^{−(1+β)/(β+τ(1+β))} − q).

In order to ensure the proper rate of increase of σ_k, we choose to set

  σ_{2k+1} = g_{2k+1}^{(1+β−p)/β}    (4.5)

for all k ≥ 0 (remembering that odd iterations are successful), while the value of σ_{2k} is still to be determined within the constraints of (2.11). Associated with the sequence {g_k}, we define the sequence of iterates {x_k} by

  x_0 = x_1 = 0, x_{2k+2} = x_{2k+1} + s_{2k+1} = x_{2k+3} (k ≥ 0).

In this definition, the step s_{2k+1} at a successful iteration is computed by minimizing the model (2.4) with B_{2k+1} = 0, that is

  m_{2k+1}(x_{2k+1} + s) = f(x_{2k+1}) + g_{2k+1} s + (σ_{2k+1}/p) |s|^p,

over s, where the function value f(x_{2k+1}) is still to be defined. A simple calculation shows that

  s_{2k+1} = (g_{2k+1}/σ_{2k+1})^{1/(p−1)} = g_{2k+1}^{1/β} ≤ g_0^{1/β} < 1,    (4.6)

where we substituted (4.5) to obtain the last equality, and that

  Δm_{2k+1} := m_{2k+1}(x_{2k+1}) − m_{2k+1}(x_{2k+1} + s_{2k+1}) = ((p−1)/p) g_{2k+1}^{(1+β)/β} = ((p−1)/p) g_{2k+1} s_{2k+1}.    (4.7)

Similarly, we also define the step s_{2k} as the minimizer of m_{2k}(x_{2k} + s) with B_{2k} = 0, yielding

  s_{2k} = (g_{2k}/σ_{2k})^{1/(p−1)}    (4.8)

and

  Δm_{2k} := m_{2k}(x_{2k}) − m_{2k}(x_{2k} + s_{2k}) = ((p−1)/p) (g_{2k}^p/σ_{2k})^{1/(p−1)} = ((p−1)/p) g_{2k} s_{2k}.    (4.9)

The sequence of function values is then defined by

  f(x_0) = f(x_1) = 0, f(x_{2k+2}) = m_{2k+1}(x_{2k+1} + s_{2k+1}) = f(x_{2k+3}) (k ≥ 0),    (4.10)

where the second part guarantees the very successful nature of iteration 2k+1. We observe that, for k ≥ 0, f(x_{2k}) − f(x_{2k+1}) = 0

since iteration 2k is unsuccessful, and that

  f(x_{2k+1}) − f(x_{2k+2}) = Δm_{2k+1} = ((p−1)/p) g_{2k+1}^{(1+β)/β},    (4.11)

yielding, for every k ≥ 0, that

  f(x_0) − f(x_{2k+2}) = Σ_{j=0}^k [ f(x_{2j+1}) − f(x_{2j+2}) ] = ((p−1)/p) Σ_{j=0}^k g_{2j+1}^{(1+β)/β} = ((p−1)/p) Σ_{j=0}^k (1/(j+q))^{1+τ(1+β)/β}.

Hence the sequence {f(x_k)} is bounded below by

  f_* := −((p−1)/p) Σ_{j=1}^∞ j^{−(1+τ(1+β)/β)} = −((p−1)/p) ζ(1 + τ(1+β)/β) > −∞,    (4.12)

where ζ(·) is the Riemann zeta function. We conclude the definition of the sequences involved in our example by selecting σ_{2k} in order to impose that, for all k ≥ 0,

  s_{2k} = s_{2k+1} + (1/2) s_{2k+3},    (4.13)

where the factor 1/2 is chosen as when defining q above. Using (4.8), this is equivalent to asking that

  (g_{2k}/σ_{2k})^{1/(p−1)} = s_{2k+1} + (1/2) s_{2k+3},

which, in view of (4.5), is equivalent to requiring that

  σ_{2k} = g_{2k} / ( s_{2k+1} + (1/2) s_{2k+3} )^{p−1}.    (4.14)

If we now take (4.6), (4.13) and (4.4) into account, this amounts to imposing that

  σ_{2k+1}/σ_{2k} = ω_{2k}^{p−1} ∈ [ (5/4)^{p−1}, (3/2)^{p−1} ],

therefore satisfying (2.11) at unsuccessful iterations for a choice of γ_2 ≤ (5/4)^{p−1} and γ_3 ≥ (3/2)^{p−1}. In order to start the recursion, we (arbitrarily) define σ_{−1} by (4.5) with k = −1 and g_{−1} = (1/(q−1))^{β/(1+β)+τ}. We also observe that, for large enough k,

  σ_{2k+2}/σ_{2k+1} = (g_{2k+3}/g_{2k+1})^{1−(p−1)/β} [ s_{2k+3}/(s_{2k+3} + (1/2) s_{2k+5}) ]^{p−1} ∈ [γ_1, 1]

for a choice of γ_1 ≤ (2/3)^{p−1}, and (2.11) therefore also holds at (very) successful iterations. As a consequence of this somewhat lengthy description, we may therefore deduce that the sequences {x_k}, {g_k}, {σ_k} and {f(x_k)}

may be generated by Algorithm 2.1, provided only that iteration 2k is indeed unsuccessful, that is if

  f(x_{2k}) − f(x_{2k} + s_{2k}) < η_1 Δm_{2k},

where f(x_{2k} + s_{2k}) is the still undefined value of our putative objective function at x_{2k} + s_{2k} = x_{2k+3} + (1/2) s_{2k+3}. This condition is obviously satisfied if we also impose that

  f( x_{2k+3} + (1/2) s_{2k+3} ) = f^{2k}_{2k+3}, where f^{2k}_{2k+3} := max[ f(x_{2k+3}), f(x_{2k}) − 0.99 η_1 Δm_{2k}, f(x_{2k+4}) + (1/2) g_{2k+4} s_{2k+3} ].    (4.15)

Note that this last condition ensures that

  f(x_{2k+2}) = f(x_{2k+3}) ≤ f^{2k}_{2k+3}    (4.16)

and also, since f(x_{2k}) = f(x_{2k+1}) > f(x_{2k+3}), that

  f^{2k}_{2k+3} − f(x_{2k+4}) ≥ (1/2) g_{2k+4} s_{2k+3} and f^{2k}_{2k+3} ≤ max[ f(x_{2k+3}), f(x_{2k+1}) ].    (4.17)

We now turn to the definition of the objective function f(x), which must interpolate function and gradient values at the iterates. We start by noting that, for arbitrary a > 0 and s > 0, function values f_a and f_b and gradient values g_a and g_b, it is possible to construct a function

  f_{as}(t) = f_a + g_a t + c_{as} [ sin(φ_{as} t) ]^{1+β}    (4.18)

on the interval [a, a+s], where the parameters c_{as} and φ_{as} ∈ (0, π] can be determined to ensure that f_{as}(0) = f_a, g_{as}(0) = g_a, f_{as}(s) = f_b and g_{as}(s) = g_b. Indeed, since

  g_{as}(t) = g_a + c_{as} (1+β) φ_{as} [ sin(φ_{as} t) ]^β cos(φ_{as} t),    (4.19)

we deduce that

  g_b − g_a = c_{as} (1+β) φ_{as} [ sin(φ_{as} s) ]^β cos(φ_{as} s),    (4.20)
  f_b − f_a = g_a s + (g_b − g_a) sin(φ_{as} s) / [ (1+β) φ_{as} cos(φ_{as} s) ],

and hence conclude that φ_{as} s is the smallest positive root θ_{as} of the nonlinear equation

  sin(θ)/θ = ν_{as} cos(θ), where ν_{as} = (1+β) ( f_b − f_a − g_a s ) / ( (g_b − g_a) s ).    (4.21)

It is easy to check that such a root always exists in (0, π/2] if ν_{as} > 1. Given φ_{as}, or, equivalently, θ_{as} = φ_{as} s, we also obtain that

  c_{as} = ( f_b − f_a − g_a s ) / [ sin(θ_{as}) ]^{1+β}.

We now use this interpolation technique on each of the sequence of intervals specified in Table 4.2. Observe that the function is interpolated for every successful step in two pieces, with an intermediate point corresponding (for all iterations beyond the first) to the penultimate

unsuccessful trial point, where condition (4.15) is imposed as well as a zero gradient. We also choose (arbitrarily) f^{−2}_1 = −0.99 (3η_1/2) (g_1^{1+β}/σ_1)^{1/(p−1)}, corresponding to a fictitious unsuccessful iteration of index k = −2 with g_{−2} = g_{−1} and σ_{−2} = σ_{−1}/(1 + 2^{p−1}).

  Iteration k | Interpolation interval [a, a+s]            | f_a                  | g_a        | f_b              | g_b
  1           | [x_1, x_1 + (1/2)s_1]                       | f(x_0) = f(x_1)      | g_1        | f^{−2}_1         | 0
  1           | [x_1 + (1/2)s_1, x_2]                       | f^{−2}_1             | 0          | f(x_2)           | g_2
  3           | [x_3, x_3 + (1/2)s_3]                       | f(x_2) = f(x_3)      | g_3        | f^{0}_3          | 0
  3           | [x_3 + (1/2)s_3, x_4]                       | f^{0}_3              | 0          | f(x_4)           | g_4
  5           | [x_5, x_5 + (1/2)s_5]                       | f(x_4) = f(x_5)      | g_5        | f^{2}_5          | 0
  5           | [x_5 + (1/2)s_5, x_6]                       | f^{2}_5              | 0          | f(x_6)           | g_6
  2k+1        | [x_{2k+1}, x_{2k+1} + (1/2)s_{2k+1}]        | f(x_{2k}) = f(x_{2k+1}) | g_{2k+1} | f^{2k−2}_{2k+1}  | 0
  2k+1        | [x_{2k+1} + (1/2)s_{2k+1}, x_{2k+2}]        | f^{2k−2}_{2k+1}      | 0          | f(x_{2k+2})      | g_{2k+2}

Table 4.2: Interpolation conditions for successful iterations.

For the function (4.18) and its gradient (4.19) to be well-defined, we still need that ν_{as} > 1 for each interpolation interval. Consider the first such interval at iteration 2k+1 (k ≥ 0) and ν^1_{2k+1}, the value of ν_{as} corresponding to that interval. Using (4.16), we obtain that

  ν^1_{2k+1} = (1+β) [ f^{2k−2}_{2k+1} − f(x_{2k+1}) + (1/2) g_{2k+1} s_{2k+1} ] / [ (1/2) g_{2k+1} s_{2k+1} ] ≥ 1+β > 1,    (4.22)

as desired. For the second interpolation interval at iteration 2k+1, we have that

  ν^2_{2k+1} = (1+β) [ f^{2k−2}_{2k+1} − f(x_{2k+2}) ] / [ (1/2) g_{2k+2} s_{2k+1} ] ≥ 1+β > 1,    (4.23)

where we have used (4.15) to derive the inequality. We therefore obtain from (4.22) and (4.23) that, for all k ≥ 0, the desired roots θ^1_{2k+1} and θ^2_{2k+1} exist and satisfy

  θ^1_{2k+1} ≤ π/2 and θ^2_{2k+1} ≤ π/2.    (4.24)

As a consequence, sin(φ^i_{2k+1} t) is positive on each interpolation interval (i = 1, 2), and our interpolating function and its gradient are well-defined on each interval. Moreover, since both ν^1_{2k+1} and ν^2_{2k+1} are bounded below by 1+β, we obtain that there is a constant κ_θ > 0 such that

  θ^1_{2k+1} ∈ [κ_θ, π/2] and θ^2_{2k+1} ∈ [κ_θ, π/2],    (4.25)

and thus that there exists a constant κ_sin > 0, independent of k, such that

  sin(θ^1_{2k+1}) ≥ κ_sin and sin(θ^2_{2k+1}) ≥ κ_sin.    (4.26)

Figure 4.1 shows the shape of the resulting function and Figure 4.2 the shape of its gradient, whose construction implies that AS.1 holds. Figure 4.1 also shows the shape of the models m_{2k}(x_{2k} + s) on the intervals [x_{2k}, x_{2k} + s_{2k}] = [x_{2k}, x_{2k+3} + (1/2)s_{2k+3}] (dashed lines), illustrating that the model is a bad predictor of the objective function value at the point x_{2k} + s_{2k}, causing the unsuccessful nature of iteration 2k. Note that f(x) may be extended smoothly into a decreasing function for x < 0.

Figure 4.1: The shape of f(x) for the first 8 successful iterations and the shape of the model at each unsuccessful iteration (dashed), for β = 0.45, p = 2, τ = 0.0001, η_1 = 0.6 and q = 3.

As can be checked in these figures, f(x) is nonconvex and continuously differentiable. The form (4.19) implies that g(x) varies very quickly at the beginning of each interpolation interval, which is visible in Figure 4.2. We now investigate the properties of our interpolant further, and observe that, because of (4.11), (4.15), (4.17), the fact that f(x_{2k}) = f(x_{2k+1}) and the inequality g_{2k+2} s_{2k+1} ≤ g_{2k} s_{2k},

  f^{2k−2}_{2k+1} − f(x_{2k+1}) ≤ max[ f(x_{2k}) − f(x_{2k+1}), f(x_{2k+2}) + (1/2) g_{2k+2} s_{2k+1} − f(x_{2k+1}) ] < max[ Δm_{2k}, (1/2) g_{2k+2} s_{2k+1} ] < max[ g_{2k} s_{2k}, g_{2k+2} s_{2k+1} ]

Figure 4.2: The shape of g(x) for the first 8 successful iterations, for β = 0.45, p = 2, τ = 0.0001, η_1 = 0.6 and q = 3.

and hence that

  ν^1_{2k+1} = (1+β) [ f^{2k−2}_{2k+1} − f(x_{2k+1}) + (1/2) g_{2k+1} s_{2k+1} ] / [ (1/2) g_{2k+1} s_{2k+1} ] ≤ 2(1+β) [ g_{2k} s_{2k} + (1/2) g_{2k+1} s_{2k+1} ] / [ g_{2k+1} s_{2k+1} ] < 2(1+β) [ 2 g_{2k} s_{2k} / (g_{2k+1} s_{2k+1}) ] = 4(1+β) (g_{2k}/g_{2k+1})^{(1+β)/β} ≤ 4(1+β) (3/2)^{(1+β)/β},    (4.27)

where we used (4.2). Similarly, using (4.15), (4.10), (4.17), (4.11) and (4.1) in succession, we

obtain that

  ν^2_{2k+1} = (1+β) [ f^{2k−2}_{2k+1} − f(x_{2k+2}) ] / [ (1/2) g_{2k+2} s_{2k+1} ] ≤ 2(1+β) max[ f(x_{2k}) − f(x_{2k+2}), (1/2) g_{2k+2} s_{2k+1} ] / [ g_{2k+2} s_{2k+1} ] ≤ 2(1+β) max[ (Δm_{2k} + Δm_{2k+1}) / (g_{2k+2} s_{2k+1}), 1/2 ] = 2(1+β) max[ 2 ((p−1)/p) (g_{2k} s_{2k} + g_{2k+1} s_{2k+1}) / (g_{2k+2} s_{2k+1}), 1/2 ],    (4.28)

which remains bounded for all k ≥ 0 because the ratios g_{2k}/g_{2k+2}, s_{2k}/s_{2k+1} and g_{2k+1}/g_{2k+2} are all bounded by virtue of (4.1), (4.6) and (4.13). We may therefore deduce from (4.27) and (4.28) that there exists a constant κ_ν > 0, independent of k, such that, for all k ≥ 0,

  ν^1_{2k+1} ≤ κ_ν and ν^2_{2k+1} ≤ κ_ν.

As a consequence, and since the nonlinear equation in (4.21) can be written in the form tan(θ) = ν_{as} θ, we obtain that θ_{as} is uniformly bounded away from π/2 and hence that there exists a constant κ_cos > 0 such that

  cos(θ_{as}) = cos(φ_{as} s) ≥ κ_cos    (4.29)

for every interpolation interval.

Consider now 0 ≤ t_1 < t_2 ≤ s for a given interpolation interval [a, a+s]. Because of (4.24), we then have that

  |g_{as}(t_2) − g_{as}(t_1)| = c_{as}(1+β)φ_{as} | sin^β(φ_{as}t_2) cos(φ_{as}t_2) − sin^β(φ_{as}t_1) cos(φ_{as}t_1) |
    ≤ c_{as}(1+β)φ_{as} [ sin^β(φ_{as}t_2) |cos(φ_{as}t_2) − cos(φ_{as}t_1)| + |sin^β(φ_{as}t_2) − sin^β(φ_{as}t_1)| cos(φ_{as}t_1) ]
    ≤ c_{as}(1+β)φ_{as} [ |cos(φ_{as}t_2) − cos(φ_{as}t_1)| + |sin(φ_{as}t_2) − sin(φ_{as}t_1)|^β ].

Now, using the mean-value theorem,

  |cos(φ_{as}t_2) − cos(φ_{as}t_1)| = |sin(ξ)| φ_{as}|t_2 − t_1| ≤ (π/2)^{1−β} ( φ_{as}|t_2 − t_1| )^β,    (4.30)

where ξ ∈ (φ_{as}t_1, φ_{as}t_2) and where we have used the fact that

  φ_{as}|t_2 − t_1| = (π/2) (2φ_{as}|t_2 − t_1|/π) ≤ (π/2) (2φ_{as}|t_2 − t_1|/π)^β

because φ_{as}|t_2 − t_1| ≤ φ_{as}s ≤ π/2. Moreover, using the inequality

  |u^β − v^β| ≤ |u − v|^β for all u, v ∈ [0, 1],    (4.31)

and the fact that

  sin( (φ_{as}/2)(t_2 − t_1) ) < (φ_{as}/2)(t_2 − t_1)

(since φ_{as}(t_2 − t_1) ≤ φ_{as}s ≤ π/2), we deduce that

  |sin(φ_{as}t_2) − sin(φ_{as}t_1)|^β ≤ | 2 cos( (φ_{as}/2)(t_2 + t_1) ) sin( (φ_{as}/2)(t_2 − t_1) ) |^β ≤ | 2 sin( (φ_{as}/2)(t_2 − t_1) ) |^β < ( φ_{as}|t_2 − t_1| )^β.

Thus, combining this inequality with (4.30), we obtain that

  |g_{as}(t_2) − g_{as}(t_1)| ≤ ( 1 + (π/2)^{1−β} ) (1+β) c_{as} φ_{as}^{1+β} |t_2 − t_1|^β.    (4.32)

But we know from (4.6) that, for all k ≥ 0, g_{2k+1} = s_{2k+1}^β and g_{2k+2} = g_{2k+3} = s_{2k+3}^β ≤ s_{2k+1}^β = g_{2k+1}. As a consequence, we deduce, using Table 4.2, that, for every interpolation interval,

  |g_b − g_a| ≤ (2s)^β,

because the length s of each interval is equal to half that of the corresponding successful step. Using this inequality and (4.20), we obtain that

  c_{as} φ_{as}^{1+β} = φ_{as}^β |g_b − g_a| / [ (1+β) sin^β(θ_{as}) cos(θ_{as}) ] ≤ 2^β (φ_{as}s)^β / [ (1+β) sin^β(θ_{as}) cos(θ_{as}) ] ≤ π^2 2^β / [ 2(1+β) κ_sin^β κ_cos ],    (4.33)

where we used the equality φ_{as}s = θ_{as}, (4.24), (4.26), and (4.29) to derive the last inequality. Hence we deduce from (4.32) that, for x and y belonging to the same interpolation interval,

  |g(x) − g(y)| ≤ ( 1 + (π/2)^{1−β} ) π^2 2^{β−1} κ_sin^{−β} κ_cos^{−1} |x − y|^β =: (L/2) |x − y|^β.    (4.34)

Consider now 0 ≤ x < y where x and y belong to different interpolation intervals, and assume first that y belongs to the interpolation interval following that containing x. Then, if z ∈ (x, y) is the junction point between the two successive intervals,

  |g(x) − g(y)| ≤ |g(x) − g(z)| + |g(z) − g(y)| ≤ (L/2) |x − z|^β + (L/2) |z − y|^β ≤ L |x − y|^β,    (4.35)

where we use the triangle inequality, (4.34) on each interval, and the fact that u^β + v^β ≤ 2(u + v)^β for all u, v ≥ 0.

Consider finally 0 ≤ x < y where x and y belong to different interpolation intervals, where y does not belong to the interval following that containing x. Let us denote by r_x the smallest root of g larger than x and by r_y the largest root smaller than y. Note that the existence of these roots is guaranteed by the construction of the interpolating function f, which ensures that a stationary point occurs at the junction between the two interpolation intervals covering a single successful step. It is easy to verify that x and r_x must belong either to the same interpolation interval or to two successive intervals. The same is true of r_y and y, yielding that

  |g(x)| ≤ L (r_x − x)^β and |g(y)| ≤ L (y − r_y)^β.    (4.36)

Moreover, using either (4.34) or (4.35), we have that |g(x) − g(r_x)| ≤ L |x − r_x|^β and |g(r_y) − g(y)| ≤ L |r_y − y|^β, and we may deduce, using (4.36) and (4.31), that

  |g(x) − g(y)| ≤ |g(x) − g(r_x)| + |g(r_y) − g(y)| ≤ L [ (r_x − x)^β + (y − r_y)^β ] ≤ 2L ( r_x − x + y − r_y )^β ≤ 2L (y − x)^β.    (4.37)

It then results from (4.34), (4.35) and (4.37) that g(x) is Hölder continuous and AS.2 is satisfied in our example. This is illustrated in Figure 4.3.

We also note that, because of (4.25), the definition of θ_{as}, the fact that s ≤ 1, (4.6) and the decreasing nature of {|g_k|}, we have that, for every interpolation interval,

  φ_{as} = θ_{as}/s ≥ κ_θ/s ≥ κ_θ/g_a^{1/β} ≥ κ_θ/g_0^{1/β}.

Hence (4.19) and (4.33) ensure that |g(x)| is bounded above for x ≥ 0, which, together with the inequalities f(x_k) ≥ f_* > −∞, s_k ≤ 1 and the mean-value theorem applied in each interval, guarantees that there exists a constant f_low > −∞ such that f(x) ≥ f_low for all x ≥ 0. Thus AS.3 holds with f_target = −∞ and f_* = f_low. Moreover, AS.4 trivially follows with κ_gl = 0, κ_gu given by the upper bound on |g(x)| just derived, and ε_* = ε. AS.5 is satisfied by construction with κ_B = 0, since we set B_k = 0 for all k ≥ 0. We therefore conclude that all our assumptions hold and that our example is valid, in that Algorithm 2.1 applied to f(x), with arbitrarily small τ ∈ (0, 1) and in the case where p > 1 + β, needs at least

  2 ( ε^{−(1+β)/(β+τ(1+β))} − q )

Figure 4.3: The shape of the function |g(x) − g(y)| / |x − y|^β for the interval spanned by the first 8 successful iterations, for β = 0.45, p = 2, τ = 0.0001, η_1 = 0.6 and q = 3.

iterations (and function evaluations) to obtain an iterate x_ε such that |g(x_ε)| ≤ ε. Since q is independent of ε, this shows that the complexity bound stated by part (ii) of Theorem 3.10 is essentially sharp.

5 Discussion

Which power of ε_* < 1 dominates in the complexity bounds of Theorem 3.10 is illustrated in Figure 5.4 as a function of p and β. It is interesting to note that the worst-case evaluation complexity of our general class of regularized methods does depend on the relative values of p and β. Observe also that, when ε_* < 1, ε_*^{−p/(p−1)} ≥ ε_*^{−(1+β)/β} in the triangle for which p ≤ 1 + β and p ≤ 2. As can be seen in this figure, there is little incentive for a user to choose a regularization power p < 2, at least from the worst-case complexity point of view (not to mention the need of AS.4). It is also interesting to observe that, if p ≥ 2, the complexity no longer depends on the precise value of p, but only depends on the smoothness of the objective function as measured by the Hölder exponent β (whose knowledge is not required a priori). In that sense, the algorithm adapts itself to the problem at hand without any further user tuning (see also the universal gradient methods by Nesterov for the convex case [14]).

If ε_* ≥ 1 (that is if either ε ≥ 1 or κ_gl ≥ 1), the results above simplify because negative powers of ε_* are bounded above by one. As a consequence, all terms involving such powers (which we kept explicit in the analysis for ε_* < 1) are absorbed in the constants, and the complexity bounds of Theorem 3.10 essentially reduce to multiples of the difference f(x_0) − f_*. Note also that Lemma 3.1 allows us to equate β > 1 with β = 1 and κ_gl = ‖g(x_0)‖. In this case, either ε_* = ε > ‖g(x_0)‖ and Algorithm 2.1 stops at iteration 0, or ε_* = ‖g(x_0)‖ and the bounds of Theorem 3.10 become independent of ε, resulting in a bound on the number of iterations and evaluations directly proportional to f(x_0) − f_target, as expected.
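The case split just described (and drawn in Figure 5.4) is simple enough to be captured in a few lines; the helper below is only an illustration of the exponents appearing in Theorem 3.10 and Lemma 3.1, with hypothetical inputs of our own choosing.

```python
def complexity_exponent(p: float, beta: float) -> float:
    """Exponent e such that the evaluation bound of Theorem 3.10 is O(eps_*^(-e)).

    Follows the case split of the theorem (and Figure 5.4): for beta > 1 the
    bound is O(1) (Lemma 3.1), for p >= 1 + beta it is O(eps_*^(-(1+beta)/beta)),
    and for 1 < p < 1 + beta it is O(eps_*^(-p/(p-1))).
    """
    if beta > 1.0:
        return 0.0                      # O(1): f is linear by Lemma 3.1
    if p >= 1.0 + beta:
        return (1.0 + beta) / beta      # part (ii) of Theorem 3.10
    return p / (p - 1.0)                # part (i) of Theorem 3.10

# The two classical cases recovered by the theory (the thick dots in Figure 5.4):
assert complexity_exponent(2.0, 1.0) == 2.0   # p = 2, beta = 1: O(eps^-2), as in [3, 6]
assert complexity_exponent(3.0, 1.0) == 2.0   # p = 3, beta = 1: O(eps^-2) for the first-order variant, cf. [5]
```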

Figure 5.4: Worst-case evaluation complexity as a function of p and β in the case where ε_* < 1. [The figure partitions the (β, p) plane into the region β > 1, where the bound is O(1), the region β ≤ 1 with p ≥ β + 1, where it is O(ε_*^{−(1+β)/β}), and the region β ≤ 1 with 1 < p < β + 1, where it is O(ε_*^{−p/(p−1)}), the last two being separated by the line p = β + 1.]

We conclude by observing that the theory presented above recovers known results (see [5] for the case where p = 3 and β = 1, and [3, 6] for the case where p = 2 and β = 1); these cases correspond to the thick dots in Figure 5.4.

References

[1] Alain Bensoussan and Jens Frehse. Regularity Results for Nonlinear Elliptic Systems and Applications. Springer Verlag, Heidelberg, Berlin, New York, 2002.

[2] C. Cartis, N. I. M. Gould, and Ph. L. Toint. Trust-region and other regularisations of linear least-squares problems. BIT, 49(1):21–53, 2009.

[3] C. Cartis, N. I. M. Gould, and Ph. L. Toint. On the complexity of steepest descent, Newton's and regularized Newton's methods for nonconvex unconstrained optimization. SIAM Journal on Optimization, 20(6):2833–2852, 2010.

[4] C. Cartis, N. I. M. Gould, and Ph. L. Toint. Adaptive cubic overestimation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Mathematical Programming, Series A, 127(2):245–295, 2011.

[5] C. Cartis, N. I. M. Gould, and Ph. L. Toint. Adaptive cubic overestimation methods for unconstrained optimization. Part II: worst-case function-evaluation complexity. Mathematical Programming, Series A, 130(2):295–319, 2011.

[6] C. Cartis, N. I. M. Gould, and Ph. L. Toint. On the evaluation complexity of composite function minimization with applications to nonconvex nonlinear programming. SIAM Journal on Optimization, 21(4):1721–1739, 2011.

[7] C. Cartis, N. I. M. Gould, and Ph. L. Toint. On the evaluation complexity of cubic regularization methods for potentially rank-deficient nonlinear least-squares problems and its relevance to constrained nonlinear optimization. SIAM Journal on Optimization, 23(3):1553–1574, 2013.

[8] N. I. M. Gould, D. P. Robinson, and H. S. Thorne. On solving trust-region and other regularised subproblems in optimization. Mathematical Programming, Series C, 2(1):21–57, 2010.

[9] G. N. Grapiglia, J. Yuan, and Y. Yuan. Global convergence and worst-case complexity of a derivative-free trust-region algorithm for composite nonsmooth optimization. Technical report, University of Paraná, Curitiba, Brazil.

[10] G. N. Grapiglia, J. Yuan, and Y. Yuan. On the convergence and worst-case complexity of trust-region and regularization methods for unconstrained optimization. Mathematical Programming, Series A (to appear), 2014.

[11] S. Gratton, A. Sartenaer, and Ph. L. Toint. Recursive trust-region methods for multiscale nonlinear optimization. SIAM Journal on Optimization, 19(1):414–444, 2008.

[12] Yu. Nesterov. Introductory Lectures on Convex Optimization. Applied Optimization. Kluwer Academic Publishers, Dordrecht, The Netherlands, 2004.

[13] Yu. Nesterov. Gradient methods for minimizing composite objective functions. Mathematical Programming, Series A, 140(1):125–161, 2013.

[14] Yu. Nesterov. Universal gradient methods for convex optimization problems. Technical Report DP 2013/26, CORE, Catholic University of Louvain, Louvain-la-Neuve, Belgium, 2013.

[15] Yu. Nesterov and B. T. Polyak. Cubic regularization of Newton method and its global performance. Mathematical Programming, Series A, 108(1):177–205, 2006.

[16] Gas Processors and Suppliers Association. Engineering Data Book, Vol. 2. GPSA, Tulsa, USA.

[17] K. Ueda. A Regularized Newton Method without Line Search for Unconstrained Optimization. PhD thesis, Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto, Japan.

[18] K. Ueda and N. Yamashita. Convergence properties of the regularized Newton method for the unconstrained nonconvex optimization. Applied Mathematics & Optimization, 62(1):27–46, 2010.

[19] K. Ueda and N. Yamashita. On a global complexity bound of the Levenberg-Marquardt method. Journal of Optimization Theory and Applications, 147(3):443–453, 2010.

[20] L. N. Vicente. Worst case complexity of direct search. EURO Journal on Computational Optimization, 1:143–153, 2013.


More information

Capital Budgeting: The Valuation of Unusual, Irregular, or Extraordinary Cash Flows

Capital Budgeting: The Valuation of Unusual, Irregular, or Extraordinary Cash Flows Caital Budgeting: The Valuation of Unusual, Irregular, or Extraordinary Cash Flows ichael C. Ehrhardt Philli R. Daves Finance Deartment, SC 424 University of Tennessee Knoxville, TN 37996-0540 423-974-1717

More information

A Stochastic Levenberg-Marquardt Method Using Random Models with Application to Data Assimilation

A Stochastic Levenberg-Marquardt Method Using Random Models with Application to Data Assimilation A Stochastic Levenberg-Marquardt Method Using Random Models with Application to Data Assimilation E Bergou Y Diouane V Kungurtsev C W Royer July 5, 08 Abstract Globally convergent variants of the Gauss-Newton

More information

1 Overview. 2 The Gradient Descent Algorithm. AM 221: Advanced Optimization Spring 2016

1 Overview. 2 The Gradient Descent Algorithm. AM 221: Advanced Optimization Spring 2016 AM 22: Advanced Optimization Spring 206 Prof. Yaron Singer Lecture 9 February 24th Overview In the previous lecture we reviewed results from multivariate calculus in preparation for our journey into convex

More information

Lecture 5: Performance Analysis (part 1)

Lecture 5: Performance Analysis (part 1) Lecture 5: Performance Analysis (art 1) 1 Tyical Time Measurements Dark grey: time sent on comutation, decreasing with # of rocessors White: time sent on communication, increasing with # of rocessors Oerations

More information

The Correlation Smile Recovery

The Correlation Smile Recovery Fortis Bank Equity & Credit Derivatives Quantitative Research The Correlation Smile Recovery E. Vandenbrande, A. Vandendorpe, Y. Nesterov, P. Van Dooren draft version : March 2, 2009 1 Introduction Pricing

More information

Online Robustness Appendix to Are Household Surveys Like Tax Forms: Evidence from the Self Employed

Online Robustness Appendix to Are Household Surveys Like Tax Forms: Evidence from the Self Employed Online Robustness Aendix to Are Household Surveys Like Tax Forms: Evidence from the Self Emloyed October 01 Erik Hurst University of Chicago Geng Li Board of Governors of the Federal Reserve System Benjamin

More information

Objectives. 3.3 Toward statistical inference

Objectives. 3.3 Toward statistical inference Objectives 3.3 Toward statistical inference Poulation versus samle (CIS, Chater 6) Toward statistical inference Samling variability Further reading: htt://onlinestatbook.com/2/estimation/characteristics.html

More information

Non-Inferiority Tests for the Ratio of Two Correlated Proportions

Non-Inferiority Tests for the Ratio of Two Correlated Proportions Chater 161 Non-Inferiority Tests for the Ratio of Two Correlated Proortions Introduction This module comutes ower and samle size for non-inferiority tests of the ratio in which two dichotomous resonses

More information

Nonlinear programming without a penalty function or a filter

Nonlinear programming without a penalty function or a filter Nonlinear programming without a penalty function or a filter N I M Gould Ph L Toint October 1, 2007 RAL-TR-2007-016 c Science and Technology Facilities Council Enquires about copyright, reproduction and

More information

What can we do with numerical optimization?

What can we do with numerical optimization? Optimization motivation and background Eddie Wadbro Introduction to PDE Constrained Optimization, 2016 February 15 16, 2016 Eddie Wadbro, Introduction to PDE Constrained Optimization, February 15 16, 2016

More information

Nonlinear programming without a penalty function or a filter

Nonlinear programming without a penalty function or a filter Math. Program., Ser. A (2010) 122:155 196 DOI 10.1007/s10107-008-0244-7 FULL LENGTH PAPER Nonlinear programming without a penalty function or a filter N. I. M. Gould Ph.L.Toint Received: 11 December 2007

More information

CS522 - Exotic and Path-Dependent Options

CS522 - Exotic and Path-Dependent Options CS522 - Exotic and Path-Deendent Otions Tibor Jánosi May 5, 2005 0. Other Otion Tyes We have studied extensively Euroean and American uts and calls. The class of otions is much larger, however. A digital

More information

Matching Markets and Social Networks

Matching Markets and Social Networks Matching Markets and Social Networks Tilman Klum Emory University Mary Schroeder University of Iowa Setember 0 Abstract We consider a satial two-sided matching market with a network friction, where exchange

More information

ON THE MEAN VALUE OF THE SCBF FUNCTION

ON THE MEAN VALUE OF THE SCBF FUNCTION ON THE MEAN VALUE OF THE SCBF FUNCTION Zhang Xiaobeng Deartment of Mathematics, Northwest University Xi an, Shaani, P.R.China Abstract Keywords: The main urose of this aer is using the elementary method

More information

Quantitative Aggregate Effects of Asymmetric Information

Quantitative Aggregate Effects of Asymmetric Information Quantitative Aggregate Effects of Asymmetric Information Pablo Kurlat February 2012 In this note I roose a calibration of the model in Kurlat (forthcoming) to try to assess the otential magnitude of the

More information

University of Edinburgh, Edinburgh EH9 3JZ, United Kingdom.

University of Edinburgh, Edinburgh EH9 3JZ, United Kingdom. An adaptive cubic regularization algorithm for nonconvex optimization with convex constraints and its function-evaluation complexity by C. Cartis 1, N. I. M. Gould 2 and Ph. L. Toint 3 February 20, 2009;

More information

A Comparative Study of Various Loss Functions in the Economic Tolerance Design

A Comparative Study of Various Loss Functions in the Economic Tolerance Design A Comarative Study of Various Loss Functions in the Economic Tolerance Design Jeh-Nan Pan Deartment of Statistics National Chen-Kung University, Tainan, Taiwan 700, ROC Jianbiao Pan Deartment of Industrial

More information

A TRAJECTORIAL INTERPRETATION OF DOOB S MARTINGALE INEQUALITIES

A TRAJECTORIAL INTERPRETATION OF DOOB S MARTINGALE INEQUALITIES A RAJECORIAL INERPREAION OF DOOB S MARINGALE INEQUALIIES B. ACCIAIO, M. BEIGLBÖCK, F. PENKNER, W. SCHACHERMAYER, AND J. EMME Abstract. We resent a unified aroach to Doob s L maximal inequalities for 1

More information

Monetary policy is a controversial

Monetary policy is a controversial Inflation Persistence: How Much Can We Exlain? PAU RABANAL AND JUAN F. RUBIO-RAMÍREZ Rabanal is an economist in the monetary and financial systems deartment at the International Monetary Fund in Washington,

More information

Effects of Size and Allocation Method on Stock Portfolio Performance: A Simulation Study

Effects of Size and Allocation Method on Stock Portfolio Performance: A Simulation Study 2011 3rd International Conference on Information and Financial Engineering IPEDR vol.12 (2011) (2011) IACSIT Press, Singaore Effects of Size and Allocation Method on Stock Portfolio Performance: A Simulation

More information

Lecture 7: Bayesian approach to MAB - Gittins index

Lecture 7: Bayesian approach to MAB - Gittins index Advanced Topics in Machine Learning and Algorithmic Game Theory Lecture 7: Bayesian approach to MAB - Gittins index Lecturer: Yishay Mansour Scribe: Mariano Schain 7.1 Introduction In the Bayesian approach

More information

Asian Economic and Financial Review A MODEL FOR ESTIMATING THE DISTRIBUTION OF FUTURE POPULATION. Ben David Nissim.

Asian Economic and Financial Review A MODEL FOR ESTIMATING THE DISTRIBUTION OF FUTURE POPULATION. Ben David Nissim. Asian Economic and Financial Review journal homeage: htt://www.aessweb.com/journals/5 A MODEL FOR ESTIMATING THE DISTRIBUTION OF FUTURE POPULATION Ben David Nissim Deartment of Economics and Management,

More information

Buyer-Optimal Learning and Monopoly Pricing

Buyer-Optimal Learning and Monopoly Pricing Buyer-Otimal Learning and Monooly Pricing Anne-Katrin Roesler and Balázs Szentes January 2, 217 Abstract This aer analyzes a bilateral trade model where the buyer s valuation for the object is uncertain

More information

Exercise List: Proving convergence of the (Stochastic) Gradient Descent Method for the Least Squares Problem.

Exercise List: Proving convergence of the (Stochastic) Gradient Descent Method for the Least Squares Problem. Exercise List: Proving convergence of the (Stochastic) Gradient Descent Method for the Least Squares Problem. Robert M. Gower. October 3, 07 Introduction This is an exercise in proving the convergence

More information

A Trust Region Algorithm for Heterogeneous Multiobjective Optimization

A Trust Region Algorithm for Heterogeneous Multiobjective Optimization A Trust Region Algorithm for Heterogeneous Multiobjective Optimization Jana Thomann and Gabriele Eichfelder 8.0.018 Abstract This paper presents a new trust region method for multiobjective heterogeneous

More information

A GENERALISED PRICE-SCORING MODEL FOR TENDER EVALUATION

A GENERALISED PRICE-SCORING MODEL FOR TENDER EVALUATION 019-026 rice scoring 9/20/05 12:12 PM Page 19 A GENERALISED PRICE-SCORING MODEL FOR TENDER EVALUATION Thum Peng Chew BE (Hons), M Eng Sc, FIEM, P. Eng, MIEEE ABSTRACT This aer rooses a generalised rice-scoring

More information

Lecture 17: More on Markov Decision Processes. Reinforcement learning

Lecture 17: More on Markov Decision Processes. Reinforcement learning Lecture 17: More on Markov Decision Processes. Reinforcement learning Learning a model: maximum likelihood Learning a value function directly Monte Carlo Temporal-difference (TD) learning COMP-424, Lecture

More information

A Multi-Objective Approach to Portfolio Optimization

A Multi-Objective Approach to Portfolio Optimization RoseHulman Undergraduate Mathematics Journal Volume 8 Issue Article 2 A MultiObjective Aroach to Portfolio Otimization Yaoyao Clare Duan Boston College, sweetclare@gmail.com Follow this and additional

More information

B. Online Appendix. where ɛ may be arbitrarily chosen to satisfy 0 < ɛ < s 1 and s 1 is defined in (B1). This can be rewritten as

B. Online Appendix. where ɛ may be arbitrarily chosen to satisfy 0 < ɛ < s 1 and s 1 is defined in (B1). This can be rewritten as B Online Appendix B1 Constructing examples with nonmonotonic adoption policies Assume c > 0 and the utility function u(w) is increasing and approaches as w approaches 0 Suppose we have a prior distribution

More information

C (1,1) (1,2) (2,1) (2,2)

C (1,1) (1,2) (2,1) (2,2) TWO COIN MORRA This game is layed by two layers, R and C. Each layer hides either one or two silver dollars in his/her hand. Simultaneously, each layer guesses how many coins the other layer is holding.

More information

Midterm Exam: Tuesday 28 March in class Sample exam problems ( Homework 5 ) available tomorrow at the latest

Midterm Exam: Tuesday 28 March in class Sample exam problems ( Homework 5 ) available tomorrow at the latest Plan Martingales 1. Basic Definitions 2. Examles 3. Overview of Results Reading: G&S Section 12.1-12.4 Next Time: More Martingales Midterm Exam: Tuesday 28 March in class Samle exam roblems ( Homework

More information

A class of coherent risk measures based on one-sided moments

A class of coherent risk measures based on one-sided moments A class of coherent risk measures based on one-sided moments T. Fischer Darmstadt University of Technology November 11, 2003 Abstract This brief paper explains how to obtain upper boundaries of shortfall

More information

Statistics and Probability Letters. Variance stabilizing transformations of Poisson, binomial and negative binomial distributions

Statistics and Probability Letters. Variance stabilizing transformations of Poisson, binomial and negative binomial distributions Statistics and Probability Letters 79 (9) 6 69 Contents lists available at ScienceDirect Statistics and Probability Letters journal homeage: www.elsevier.com/locate/staro Variance stabilizing transformations

More information

Ordering a deck of cards... Lecture 3: Binomial Distribution. Example. Permutations & Combinations

Ordering a deck of cards... Lecture 3: Binomial Distribution. Example. Permutations & Combinations Ordering a dec of cards... Lecture 3: Binomial Distribution Sta 111 Colin Rundel May 16, 2014 If you have ever shuffled a dec of cards you have done something no one else has ever done before or will ever

More information

( ) ( ) β. max. subject to. ( ) β. x S

( ) ( ) β. max. subject to. ( ) β. x S Intermediate Microeconomic Theory: ECON 5: Alication of Consumer Theory Constrained Maimization In the last set of notes, and based on our earlier discussion, we said that we can characterize individual

More information

***SECTION 7.1*** Discrete and Continuous Random Variables

***SECTION 7.1*** Discrete and Continuous Random Variables ***SECTION 7.*** Discrete and Continuous Random Variables Samle saces need not consist of numbers; tossing coins yields H s and T s. However, in statistics we are most often interested in numerical outcomes

More information

Chapter 7 One-Dimensional Search Methods

Chapter 7 One-Dimensional Search Methods Chapter 7 One-Dimensional Search Methods An Introduction to Optimization Spring, 2014 1 Wei-Ta Chu Golden Section Search! Determine the minimizer of a function over a closed interval, say. The only assumption

More information

Objectives. 5.2, 8.1 Inference for a single proportion. Categorical data from a simple random sample. Binomial distribution

Objectives. 5.2, 8.1 Inference for a single proportion. Categorical data from a simple random sample. Binomial distribution Objectives 5.2, 8.1 Inference for a single roortion Categorical data from a simle random samle Binomial distribution Samling distribution of the samle roortion Significance test for a single roortion Large-samle

More information

Lecture 2. Main Topics: (Part II) Chapter 2 (2-7), Chapter 3. Bayes Theorem: Let A, B be two events, then. The probabilities P ( B), probability of B.

Lecture 2. Main Topics: (Part II) Chapter 2 (2-7), Chapter 3. Bayes Theorem: Let A, B be two events, then. The probabilities P ( B), probability of B. STT315, Section 701, Summer 006 Lecture (Part II) Main Toics: Chater (-7), Chater 3. Bayes Theorem: Let A, B be two events, then B A) = A B) B) A B) B) + A B) B) The robabilities P ( B), B) are called

More information

Homework #5 7 th week Math 240 Thursday October 24, 2013

Homework #5 7 th week Math 240 Thursday October 24, 2013 . Let a, b > be integers and g : = gcd(a, b) its greatest common divisor. Show that if a = g q a and b = g q b then q a and q b are relatively rime. Since gcd(κ a, κ b) = κ gcd(a, b) in articular, for

More information

CHOICE THEORY, UTILITY FUNCTIONS AND RISK AVERSION

CHOICE THEORY, UTILITY FUNCTIONS AND RISK AVERSION CHOICE THEORY, UTILITY FUNCTIONS AND RISK AVERSION Szabolcs Sebestyén szabolcs.sebestyen@iscte.pt Master in Finance INVESTMENTS Sebestyén (ISCTE-IUL) Choice Theory Investments 1 / 65 Outline 1 An Introduction

More information

Sublinear Time Algorithms Oct 19, Lecture 1

Sublinear Time Algorithms Oct 19, Lecture 1 0368.416701 Sublinear Time Algorithms Oct 19, 2009 Lecturer: Ronitt Rubinfeld Lecture 1 Scribe: Daniel Shahaf 1 Sublinear-time algorithms: motivation Twenty years ago, there was practically no investigation

More information

Quality Regulation without Regulating Quality

Quality Regulation without Regulating Quality 1 Quality Regulation without Regulating Quality Claudia Kriehn, ifo Institute for Economic Research, Germany March 2004 Abstract Against the background that a combination of rice-ca and minimum uality

More information

A NOTE ON SKEW-NORMAL DISTRIBUTION APPROXIMATION TO THE NEGATIVE BINOMAL DISTRIBUTION

A NOTE ON SKEW-NORMAL DISTRIBUTION APPROXIMATION TO THE NEGATIVE BINOMAL DISTRIBUTION A NOTE ON SKEW-NORMAL DISTRIBUTION APPROXIMATION TO THE NEGATIVE BINOMAL DISTRIBUTION JYH-JIUAN LIN 1, CHING-HUI CHANG * AND ROSEMARY JOU 1 Deartment of Statistics Tamkang University 151 Ying-Chuan Road,

More information

: now we have a family of utility functions for wealth increments z indexed by initial wealth w.

: now we have a family of utility functions for wealth increments z indexed by initial wealth w. Lotteries with Money Payoffs, continued Fix u, let w denote wealth, and set u ( z) u( z w) : now we have a family of utility functions for wealth increments z indexed by initial wealth w. (a) Recall from

More information

CS 3331 Numerical Methods Lecture 2: Functions of One Variable. Cherung Lee

CS 3331 Numerical Methods Lecture 2: Functions of One Variable. Cherung Lee CS 3331 Numerical Methods Lecture 2: Functions of One Variable Cherung Lee Outline Introduction Solving nonlinear equations: find x such that f(x ) = 0. Binary search methods: (Bisection, regula falsi)

More information

Chapter 3: Black-Scholes Equation and Its Numerical Evaluation

Chapter 3: Black-Scholes Equation and Its Numerical Evaluation Chapter 3: Black-Scholes Equation and Its Numerical Evaluation 3.1 Itô Integral 3.1.1 Convergence in the Mean and Stieltjes Integral Definition 3.1 (Convergence in the Mean) A sequence {X n } n ln of random

More information

TESTING THE CAPITAL ASSET PRICING MODEL AFTER CURRENCY REFORM: THE CASE OF ZIMBABWE STOCK EXCHANGE

TESTING THE CAPITAL ASSET PRICING MODEL AFTER CURRENCY REFORM: THE CASE OF ZIMBABWE STOCK EXCHANGE TESTING THE CAPITAL ASSET PRICING MODEL AFTER CURRENCY REFORM: THE CASE OF ZIMBABWE STOCK EXCHANGE Batsirai Winmore Mazviona 1 ABSTRACT The Caital Asset Pricing Model (CAPM) endeavors to exlain the relationshi

More information

1 < = α σ +σ < 0. Using the parameters and h = 1/365 this is N ( ) = If we use h = 1/252, the value would be N ( ) =

1 < = α σ +σ < 0. Using the parameters and h = 1/365 this is N ( ) = If we use h = 1/252, the value would be N ( ) = Chater 6 Value at Risk Question 6.1 Since the rice of stock A in h years (S h ) is lognormal, 1 < = α σ +σ < 0 ( ) P Sh S0 P h hz σ α σ α = P Z < h = N h. σ σ (1) () Using the arameters and h = 1/365 this

More information

On the Power of Structural Violations in Priority Queues

On the Power of Structural Violations in Priority Queues On the Power of Structural Violations in Priority Queues Amr Elmasry 1 Claus Jensen 2 Jyrki Katajainen 2 1 Deartment of Comuter Engineering and Systems, Alexandria University Alexandria, Egyt 2 Deartment

More information

Inventory Systems with Stochastic Demand and Supply: Properties and Approximations

Inventory Systems with Stochastic Demand and Supply: Properties and Approximations Working Paer, Forthcoming in the Euroean Journal of Oerational Research Inventory Systems with Stochastic Demand and Suly: Proerties and Aroximations Amanda J. Schmitt Center for Transortation and Logistics

More information

The Impact of Flexibility And Capacity Allocation On The Performance of Primary Care Practices

The Impact of Flexibility And Capacity Allocation On The Performance of Primary Care Practices University of Massachusetts Amherst ScholarWorks@UMass Amherst Masters Theses 1911 - February 2014 2010 The Imact of Flexibility And Caacity Allocation On The Performance of Primary Care Practices Liang

More information

Richardson Extrapolation Techniques for the Pricing of American-style Options

Richardson Extrapolation Techniques for the Pricing of American-style Options Richardson Extrapolation Techniques for the Pricing of American-style Options June 1, 2005 Abstract Richardson Extrapolation Techniques for the Pricing of American-style Options In this paper we re-examine

More information

Dynamic Stability of the Nash Equilibrium for a Bidding Game

Dynamic Stability of the Nash Equilibrium for a Bidding Game Dynamic Stability of the Nash Equilibrium for a Bidding Game Alberto Bressan and Hongxu Wei Deartment of Mathematics, Penn State University, University Park, Pa 16802, USA e-mails: bressan@mathsuedu, xiaoyitangwei@gmailcom

More information

On the Lower Arbitrage Bound of American Contingent Claims

On the Lower Arbitrage Bound of American Contingent Claims On the Lower Arbitrage Bound of American Contingent Claims Beatrice Acciaio Gregor Svindland December 2011 Abstract We prove that in a discrete-time market model the lower arbitrage bound of an American

More information

Martingale Pricing Theory in Discrete-Time and Discrete-Space Models

Martingale Pricing Theory in Discrete-Time and Discrete-Space Models IEOR E4707: Foundations of Financial Engineering c 206 by Martin Haugh Martingale Pricing Theory in Discrete-Time and Discrete-Space Models These notes develop the theory of martingale pricing in a discrete-time,

More information

Publication Efficiency at DSI FEM CULS An Application of the Data Envelopment Analysis

Publication Efficiency at DSI FEM CULS An Application of the Data Envelopment Analysis Publication Efficiency at DSI FEM CULS An Alication of the Data Enveloment Analysis Martin Flégl, Helena Brožová 1 Abstract. The education and research efficiency at universities has always been very imortant

More information

Methods and Models of Loss Reserving Based on Run Off Triangles: A Unifying Survey

Methods and Models of Loss Reserving Based on Run Off Triangles: A Unifying Survey Methods and Models of Loss Reserving Based on Run Off Triangles: A Unifying Survey By Klaus D Schmidt Lehrstuhl für Versicherungsmathematik Technische Universität Dresden Abstract The present paper provides

More information

Physical and Financial Virtual Power Plants

Physical and Financial Virtual Power Plants Physical and Financial Virtual Power Plants by Bert WILLEMS Public Economics Center for Economic Studies Discussions Paer Series (DPS) 05.1 htt://www.econ.kuleuven.be/ces/discussionaers/default.htm Aril

More information

ON JARQUE-BERA TESTS FOR ASSESSING MULTIVARIATE NORMALITY

ON JARQUE-BERA TESTS FOR ASSESSING MULTIVARIATE NORMALITY Journal of Statistics: Advances in Theory and Alications Volume, umber, 009, Pages 07-0 O JARQUE-BERA TESTS FOR ASSESSIG MULTIVARIATE ORMALITY KAZUYUKI KOIZUMI, AOYA OKAMOTO and TAKASHI SEO Deartment of

More information

A Semi-parametric Test for Drift Speci cation in the Di usion Model

A Semi-parametric Test for Drift Speci cation in the Di usion Model A Semi-arametric est for Drift Seci cation in the Di usion Model Lin hu Indiana University Aril 3, 29 Abstract In this aer, we roose a misseci cation test for the drift coe cient in a semi-arametric di

More information

2/20/2013. of Manchester. The University COMP Building a yes / no classifier

2/20/2013. of Manchester. The University COMP Building a yes / no classifier COMP4 Lecture 6 Building a yes / no classifier Buildinga feature-basedclassifier Whatis a classifier? What is an information feature? Building a classifier from one feature Probability densities and the

More information

Individual Comparative Advantage and Human Capital Investment under Uncertainty

Individual Comparative Advantage and Human Capital Investment under Uncertainty Individual Comarative Advantage and Human Caital Investment under Uncertainty Toshihiro Ichida Waseda University July 3, 0 Abstract Secialization and the division of labor are the sources of high roductivity

More information

Management Accounting of Production Overheads by Groups of Equipment

Management Accounting of Production Overheads by Groups of Equipment Asian Social Science; Vol. 11, No. 11; 2015 ISSN 1911-2017 E-ISSN 1911-2025 Published by Canadian Center of Science and Education Management Accounting of Production verheads by Grous of Equiment Sokolov

More information

Professor Huihua NIE, PhD School of Economics, Renmin University of China HOLD-UP, PROPERTY RIGHTS AND REPUTATION

Professor Huihua NIE, PhD School of Economics, Renmin University of China   HOLD-UP, PROPERTY RIGHTS AND REPUTATION Professor uihua NIE, PhD School of Economics, Renmin University of China E-mail: niehuihua@gmail.com OD-UP, PROPERTY RIGTS AND REPUTATION Abstract: By introducing asymmetric information of investors abilities

More information

25 Increasing and Decreasing Functions

25 Increasing and Decreasing Functions - 25 Increasing and Decreasing Functions It is useful in mathematics to define whether a function is increasing or decreasing. In this section we will use the differential of a function to determine this

More information

Economic Performance, Wealth Distribution and Credit Restrictions under variable investment: The open economy

Economic Performance, Wealth Distribution and Credit Restrictions under variable investment: The open economy Economic Performance, Wealth Distribution and Credit Restrictions under variable investment: The oen economy Ronald Fischer U. de Chile Diego Huerta Banco Central de Chile August 21, 2015 Abstract Potential

More information