On the complexity of the steepest-descent with exact linesearches


Coralia Cartis¹, Nicholas I. M. Gould² and Philippe L. Toint³

19 September 2012

Abstract

The worst-case complexity of the steepest-descent algorithm with exact linesearches for unconstrained smooth optimization is analyzed, and it is shown that the number of iterations of this algorithm which may be necessary to find an iterate at which the norm of the objective function's gradient is less than a prescribed $\epsilon$ is, essentially, a multiple of $1/\epsilon^2$, as is the case for variants of the same algorithm using inexact linesearches.

1 Introduction

The worst-case analysis of optimization algorithms for finding unconstrained stationary points of nonlinear non-convex functions has recently been considered in a number of contributions (see Nesterov, 2004, Nesterov and Polyak, 2006, Cartis, Gould and Toint, 2011a, 2011b, 2011c, 2012a, 2012b, 2012c, Vicente, 2010, Bian, Chen and Ye, 2012, Gratton, Sartenaer and Toint, 2008, or Jarre, 2011, to cite a few). In particular, the study of the steepest-descent method, the most archetypal method for unconstrained nonlinear optimization, was considered by several authors, whose analyses differ primarily by the particular technique used for (possibly approximately) minimizing the objective function along the steepest-descent direction. An upper bound on the number of iterations required to obtain an approximate stationary point was given by Nesterov (2004) using a variant of the algorithm where the step is computed using the knowledge of a global Lipschitz constant on the gradient of the objective function. He showed that at most $O(\epsilon^{-2})$ iterations might be needed to find an iterate at which the Euclidean norm of the gradient is below a generic tolerance $\epsilon > 0$. As it turns out, his result also applies to the pure steepest-descent algorithm, that is the variant using exact linesearches. A lower complexity bound was also obtained by Cartis, Gould and Toint (2010), where it was shown that the bound of $O(\epsilon^{-2})$ iterations is essentially tight for a version using a Goldstein-type linesearch. However, this result depends on a one-dimensional counter-example where the objective function is monotonically decreasing, in which case an exact linesearch would obviously give much better results. The purpose of this short paper is to close the remaining conceptual gap, that is to show that the lower bound of $O(\epsilon^{-2})$ iterations also holds for the steepest-descent algorithm with exact linesearches when applied to functions with globally Lipschitz continuous gradient.

The next section recalls the algorithm and the assumptions required for our complexity analysis. Section 3 proposes an example of worst-case behaviour for the method, while Section 4 is devoted to verifying that the example does satisfy the assumptions stated. A few words of conclusion are presented in Section 5.

¹ School of Mathematics, University of Edinburgh, The King's Buildings, Edinburgh, EH9 3JZ, Scotland, UK. coralia.cartis@ed.ac.uk. The work of this author was supported by EPSRC Grant EP/I028854/1.
² Computational Science and Engineering Department, Rutherford Appleton Laboratory, Chilton, Oxfordshire, OX11 0QX, England, UK. nick.gould@stfc.ac.uk.
³ Department of Mathematics, FUNDP - University of Namur, 61, rue de Bruxelles, B-5000 Namur, Belgium. philippe.toint@fundp.ac.be.

2 The steepest-descent method with exact linesearches

We consider the unconstrained minimization problem
$$\min_{x \in \mathbb{R}^n} f(x) \qquad (2.1)$$
where $f(x)$ is a smooth function from $\mathbb{R}^n$ into $\mathbb{R}$. One of the simplest and oldest algorithms for solving this problem is the steepest-descent method of Cauchy (1847), whose iterates are defined, for a given initial guess $x_0$, by the simple iteration
$$x_{k+1} = \arg\min_{t \ge 0} f(x_k - t\,g_k), \qquad (k \ge 0) \qquad (2.2)$$
where $g_k = \nabla_x f(x_k)$ and where ties are broken by choosing the first minimizer of $f(x_k - t\,g_k)$ if there is more than one (say). This choice is of course most often numerically unrealistic, except for special functions $f(x)$ such as quadratics, where the minimizer can be determined analytically. But it remains an ideal that numerically sounder techniques attempt to imitate, justifying our curiosity.

The assumptions we make on problem (2.1) are as follows.

AF.0 $f(x)$ is bounded below on $\mathbb{R}^n$, that is there exists a constant $\kappa_{\rm lbf}$ such that, for all $x \in \mathbb{R}^n$, $f(x) \ge \kappa_{\rm lbf}$.

AF.1 $f(x)$ is continuously differentiable on $\mathbb{R}^n$.

AF.2 $g(x) = \nabla_x f(x)$ is Lipschitz continuous on $\mathbb{R}^n$, that is there exists a constant $L_g$ such that, for all $x, y \in \mathbb{R}^n$, $\|g(x) - g(y)\| \le L_g \|x - y\|$.

Here and below, $\|\cdot\|$ stands for the Euclidean norm. We now briefly recall the upper complexity bound for algorithm (2.2) by suitably reformulating the result of Nesterov (2004).

Theorem 2.1 Suppose that AF.0-AF.2 hold. Then there exists a constant $\kappa_{\rm upp}$ depending on $x_0$ and possibly on $n$ such that, for all $\epsilon \in (0,1)$, at most
$$\left\lceil \frac{\kappa_{\rm upp}}{\epsilon^2} \right\rceil \qquad (2.3)$$
iterations of method (2.2) are needed to obtain an iterate $x_k$ such that $\|g_k\| \le \epsilon$.

Proof. We first note that AF.1, Taylor's expansion at $x_k$ and AF.2 give that, for each $k$,
$$f(x_k) - f(x_k - t\,g_k) \ge t\,\|g_k\|^2 - \tfrac{1}{2}\,t^2 L_g \|g_k\|^2$$
for any $t \ge 0$. Maximizing the right-hand side of this inequality with respect to $t$, we obtain that
$$f(x_k) - f\!\left(x_k - \tfrac{1}{L_g}\,g_k\right) \ge \frac{\|g_k\|^2}{2 L_g} \ge \frac{\epsilon^2}{2 L_g} \qquad (2.4)$$
for each iteration $k$, as long as $\|g_k\| > \epsilon$. But (2.2) ensures that the slope of $f(x_k - t\,g_k)$ must be zero at $x_{k+1} = x_k - t_k g_k$, giving that, for all $k$,
$$0 = \langle g_k, g(x_{k+1}) \rangle = \|g_k\|^2 + \langle g_k, g(x_k - t_k g_k) - g_k \rangle \ge \|g_k\|^2 (1 - L_g t_k),$$
where we used the Cauchy-Schwarz inequality and AF.2. This implies that $t_k$, the argument of the (first) minimum in (2.2), is such that $t_k \ge 1/L_g$ and therefore, because of (2.4) and since $f(x_k - t\,g_k)$ is non-increasing on $[0, t_k]$, that, for each $k$,
$$f(x_k) - f(x_k - t_k g_k) \ge f(x_k) - f\!\left(x_k - \tfrac{1}{L_g}\,g_k\right) \ge \frac{\epsilon^2}{2 L_g}$$
as long as $\|g_k\| > \epsilon$. Thus a maximum number of
$$\left\lceil \frac{2 L_g \left( f(x_0) - \kappa_{\rm lbf} \right)}{\epsilon^2} \right\rceil \stackrel{\rm def}{=} \left\lceil \frac{\kappa_{\rm upp}}{\epsilon^2} \right\rceil$$
such iterations may take place before $x_k$ is found such that $\|g_k\| \le \epsilon$. $\Box$
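To make iteration (2.2) concrete, here is a minimal numerical sketch (not from the paper): the exact linesearch is approximated by a bounded scalar minimization, and `steepest_descent_exact`, the test quadratic and all tolerances are illustrative choices.

```python
# A hedged sketch of iteration (2.2); the bounded scalar minimizer is only a
# numerical stand-in for the exact linesearch assumed in the analysis.
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent_exact(f, grad, x0, eps=1e-3, max_iter=100000, t_max=100.0):
    """Iterate x_{k+1} = argmin_{t >= 0} f(x_k - t g_k) until ||g_k|| <= eps."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            return x, k                      # k iterations were enough
        res = minimize_scalar(lambda t: f(x - t * g),
                              bounds=(0.0, t_max), method='bounded')
        x = x - res.x * g                    # take the (approximate) exact step
    return x, max_iter

# On a convex quadratic the linesearch minimizer is unique and the iteration
# count stays far below the worst-case bound kappa_upp / eps^2 of Theorem 2.1.
A = np.diag([1.0, 10.0])
x_final, iters = steepest_descent_exact(lambda x: 0.5 * x @ A @ x,
                                        lambda x: A @ x, np.array([1.0, 1.0]))
print(iters, np.linalg.norm(A @ x_final))
```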

The purpose of the present paper is to show that the bound (2.3) is essentially tight, which cannot be deduced from the one-dimensional example of Cartis et al. (2010). The next section describes how to build a new two-dimensional example where algorithm (2.2) essentially requires $O(\epsilon^{-2})$ iterations to achieve $\|g_k\| \le \epsilon$.

3 Constructing a counter-example

Because, as in Cartis et al. (2010), our example is based on polynomial Hermite interpolation, we first state and prove crucial properties of this type of interpolation.

Theorem 3.1 Assume that real values $f_0$, $g_0$, $h_0$, $f_T$, $g_T$, $h_T$ and $T > 0$ are known. Then there exists a fifth order polynomial
$$p(t) \stackrel{\rm def}{=} c_0 + c_1 t + c_2 t^2 + c_3 t^3 + c_4 t^4 + c_5 t^5, \quad t \in [0, T],$$
such that
$$p(0) = f_0, \quad p'(0) = g_0, \quad p''(0) = h_0, \qquad p(T) = f_T, \quad p'(T) = g_T, \quad p''(T) = h_T.$$
The coefficients of this polynomial are given by
$$c_0 = f_0, \quad c_1 = g_0, \quad c_2 = \tfrac{1}{2} h_0, \quad c_3 = \tfrac{1}{T}\!\left(10 r_0 - 4 r_1 + \tfrac{1}{2} r_2\right), \quad c_4 = \tfrac{1}{T^2}\!\left(-15 r_0 + 7 r_1 - r_2\right), \quad c_5 = \tfrac{1}{T^3}\!\left(6 r_0 - 3 r_1 + \tfrac{1}{2} r_2\right), \qquad (3.5)$$
where
$$r_0 = \tfrac{1}{T^2}\!\left(f_T - f_0 - g_0 T - \tfrac{1}{2} h_0 T^2\right), \quad r_1 = \tfrac{1}{T}\!\left(g_T - g_0 - h_0 T\right) \quad {\rm and} \quad r_2 = h_T - h_0.$$
Moreover, if there are non-negative constants $\kappa_0$, $\kappa_1$ and $\kappa_2$ such that
$$|r_0| \le \kappa_0, \quad |r_1| \le \kappa_1 \quad {\rm and} \quad |r_2| \le \kappa_2, \qquad (3.6)$$
then there exist $\kappa_f$, $\kappa_g$ and $\kappa_h$ only depending on $\kappa_0$, $\kappa_1$ and $\kappa_2$ such that, for all $t \in [0,T]$,
$$|p(t)| \le |f_0| + |g_0|\,T + \tfrac{1}{2} |h_0|\,T^2 + \kappa_f T^2, \qquad (3.7)$$
$$|p'(t)| \le |g_0| + |h_0|\,T + \kappa_g T \qquad {\rm and} \qquad |p''(t)| \le |h_0| + \kappa_h. \qquad (3.8)$$

Proof. (See Cartis et al., 2011c.) Using the form of $p(t)$, we write the desired interpolation conditions as
$$p(0) = c_0 = f_0, \quad p'(0) = c_1 = g_0, \quad p''(0) = 2 c_2 = h_0 \qquad (3.9)$$
(which immediately gives the desired values for $c_0$, $c_1$ and $c_2$) and
$$p(T) = c_0 + c_1 T + c_2 T^2 + c_3 T^3 + c_4 T^4 + c_5 T^5 = f_T,$$
$$p'(T) = c_1 + 2 c_2 T + 3 c_3 T^2 + 4 c_4 T^3 + 5 c_5 T^4 = g_T,$$
$$p''(T) = 2 c_2 + 6 c_3 T + 12 c_4 T^2 + 20 c_5 T^3 = h_T.$$
These conditions can then be re-expressed as a linear system with unknowns $c_3$, $c_4$ and $c_5$, whose solution exists and turns out to be
$$\begin{pmatrix} c_3 \\ c_4 \\ c_5 \end{pmatrix} = \begin{pmatrix} \tfrac{1}{T}\left(10 r_0 - 4 r_1 + \tfrac{1}{2} r_2\right) \\[0.5ex] \tfrac{1}{T^2}\left(-15 r_0 + 7 r_1 - r_2\right) \\[0.5ex] \tfrac{1}{T^3}\left(6 r_0 - 3 r_1 + \tfrac{1}{2} r_2\right) \end{pmatrix},$$
completing the proof of (3.5). Taking absolute values in this relation, we obtain that
$$|c_3| \le \tfrac{1}{T}\!\left[10 \kappa_0 + 4 \kappa_1 + \tfrac{1}{2} \kappa_2\right] \stackrel{\rm def}{=} \frac{\kappa_{c3}}{T}, \quad |c_4| \le \tfrac{1}{T^2}\!\left[15 \kappa_0 + 7 \kappa_1 + \kappa_2\right] \stackrel{\rm def}{=} \frac{\kappa_{c4}}{T^2}, \quad |c_5| \le \tfrac{1}{T^3}\!\left[6 \kappa_0 + 3 \kappa_1 + \tfrac{1}{2} \kappa_2\right] \stackrel{\rm def}{=} \frac{\kappa_{c5}}{T^3}.$$
As a consequence, we have that, for all $t \in [0,T]$,
$$|p(t)| \le |f_0| + |g_0|\,T + \tfrac{1}{2} |h_0|\,T^2 + (\kappa_{c3} + \kappa_{c4} + \kappa_{c5})\,T^2, \qquad (3.10)$$
which gives (3.7) with $\kappa_f \stackrel{\rm def}{=} \kappa_{c3} + \kappa_{c4} + \kappa_{c5}$. Similarly, we obtain that, for all $t \in [0,T]$,
$$|p'(t)| \le |g_0| + |h_0|\,T + (3 \kappa_{c3} + 4 \kappa_{c4} + 5 \kappa_{c5})\,T, \qquad (3.11)$$
yielding the first part of (3.8) with $\kappa_g \stackrel{\rm def}{=} 3 \kappa_{c3} + 4 \kappa_{c4} + 5 \kappa_{c5}$, and
$$|p''(t)| \le |h_0| + (6 \kappa_{c3} + 12 \kappa_{c4} + 20 \kappa_{c5}), \qquad (3.12)$$
from which the second part of (3.8) finally follows with $\kappa_h \stackrel{\rm def}{=} 6 \kappa_{c3} + 12 \kappa_{c4} + 20 \kappa_{c5}$. $\Box$
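The coefficient formulas (3.5) are easy to check numerically. The sketch below (illustrative code, not part of the paper) builds $c_0, \ldots, c_5$ exactly as in Theorem 3.1 and verifies the six boundary conditions for one arbitrary data set.

```python
# Coefficients (3.5) of the quintic Hermite interpolant of Theorem 3.1,
# followed by a numerical check of the six interpolation conditions.
import numpy as np

def hermite_quintic(f0, g0, h0, fT, gT, hT, T):
    """Return c0..c5 with p(0)=f0, p'(0)=g0, p''(0)=h0 and likewise at t=T."""
    r0 = (fT - f0 - g0 * T - 0.5 * h0 * T**2) / T**2
    r1 = (gT - g0 - h0 * T) / T
    r2 = hT - h0
    return np.array([f0, g0, 0.5 * h0,
                     (10 * r0 - 4 * r1 + 0.5 * r2) / T,
                     (-15 * r0 + 7 * r1 - r2) / T**2,
                     (6 * r0 - 3 * r1 + 0.5 * r2) / T**3])

c = hermite_quintic(1.0, -2.0, 0.5, 3.0, 0.25, -1.0, T=2.0)
p = np.polynomial.Polynomial(c)              # ascending-power coefficients
assert np.isclose(p(0.0), 1.0) and np.isclose(p(2.0), 3.0)
assert np.isclose(p.deriv()(0.0), -2.0) and np.isclose(p.deriv()(2.0), 0.25)
assert np.isclose(p.deriv(2)(0.0), 0.5) and np.isclose(p.deriv(2)(2.0), -1.0)
```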

We now turn to constructing our worst-case example for the steepest-descent method (2.2). The idea is to fix an arbitrary $\tau \in (0, 1/3]$ and then to define $f(x,y)$, the objective function in the example, as the sum of $f_1(x)$ and $f_2(x,y)$. As in Cartis et al. (2010), $f_1(x)$ is defined by piecewise Hermite polynomial interpolation between the sequence of iterates
$$x_0 = 0, \qquad x_{k+1} = x_k + \sigma_k \quad (k \ge 0) \qquad (3.13)$$
of the values
$$f_1(x_0) = \zeta(1+2\eta), \quad f_1(x_{k+1}) = f_1(x_k) - \sigma_k^2, \quad f_1'(x_k) = -\sigma_k \quad {\rm and} \quad f_1''(x_k) = 0, \qquad (3.14)$$
where $\zeta(\cdot)$ is the Riemann zeta function and
$$\eta = \eta(\tau) \stackrel{\rm def}{=} \frac{1}{2-\tau} - \frac{1}{2} = \frac{\tau}{4 - 2\tau} \in \left(0, \tfrac{1}{10}\right] \qquad {\rm and} \qquad \sigma_k \stackrel{\rm def}{=} \left(\frac{1}{k+1}\right)^{\frac{1}{2}+\eta}. \qquad (3.15)$$
From (3.5), applied with the data (3.14), we then find that, for $x \in [x_k, x_{k+1}]$ and $t = (x - x_k)/\sigma_k$,
$$f_1(x) = f_1(x_k) - \sigma_k^2\,t + \sigma_k (\sigma_k - \sigma_{k+1}) \left[ -4 t^3 + 7 t^4 - 3 t^5 \right],$$
and
$$f_1'(x) = -\sigma_k + (\sigma_k - \sigma_{k+1}) \left[ -12 t^2 + 28 t^3 - 15 t^4 \right], \qquad (3.16)$$
$$f_1''(x) = \frac{\sigma_k - \sigma_{k+1}}{\sigma_k} \left[ -24 t + 84 t^2 - 60 t^3 \right]. \qquad (3.17)$$
It is easy to verify that, for $t \in [0,1]$,
$$-12 t^2 + 28 t^3 - 15 t^4 = 1 - (1-t)^2 (15 t^2 + 2 t + 1) \le 1,$$
and thus, using (3.16), that
$$f_1'(x) \le -\sigma_{k+1} < 0 \quad {\rm for \ all} \ x \in [x_k, x_{k+1}]. \qquad (3.18)$$
In addition, taking into account that
$$0 < \frac{\sigma_k - \sigma_{k+1}}{\sigma_k} = 1 - \left(\frac{k+1}{k+2}\right)^{\frac{1}{2}+\eta} < 1$$
for $k \ge 0$ and that $t \in [0,1]$ if $x \in [x_k, x_{k+1}]$, we obtain by a straightforward majoration in (3.17) that
$$|f_1''(x)| < 168 \qquad (3.19)$$
for $x \in [x_k, x_{k+1}]$, which in turn implies that $f_1''(x)$ is uniformly bounded for all $x \ge 0$. The behaviour of $f_1(x)$ and of its first and second derivatives is pictured in Figure 3.1.

[Figure 3.1: the function $f_1(x)$ and its first two derivatives (from top to bottom and left to right) on the first 8 intervals.]
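For concreteness, the following fragment (an illustrative transcription; `tau`, the knot count and the helper names are choices made here) evaluates $\sigma_k$ and the piecewise-quintic $f_1$ from (3.13)-(3.15) and checks the decrease pattern prescribed by (3.14).

```python
# sigma_k and the piecewise Hermite interpolant f1 of Section 3; eta follows
# (3.15) and the quintic form is the one displayed above.
import numpy as np
from scipy.special import zeta

tau = 1.0 / 3.0
eta = tau / (4.0 - 2.0 * tau)                # = 0.1 for this choice of tau
sigma = lambda k: (1.0 / (k + 1)) ** (0.5 + eta)
x_knots = np.concatenate(([0.0], np.cumsum([sigma(k) for k in range(200)])))

def f1(x):
    """Evaluate f1 on [x_k, x_{k+1}] using the displayed quintic form."""
    k = int(np.searchsorted(x_knots, x, side='right')) - 1
    t = (x - x_knots[k]) / sigma(k)
    fk = zeta(1.0 + 2.0 * eta) - sum(sigma(i) ** 2 for i in range(k))
    return (fk - sigma(k) ** 2 * t
            + sigma(k) * (sigma(k) - sigma(k + 1))
              * (-4 * t**3 + 7 * t**4 - 3 * t**5))

# f1(x_{k+1}) = f1(x_k) - sigma_k^2 and f1 stays positive, as in (3.14):
vals = np.array([f1(xk) for xk in x_knots[:50]])
assert np.all(vals > 0) and np.all(np.diff(vals) < 0)
```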

We now turn to the specification of the function $f_2(x,y)$, whose role is to limit the iterates in the $y$-direction to a progressively narrower corridor, thereby forcing the iteration path to oscillate between its lower and upper limits.

We have already prescribed that the $x$-components of the successive iterates are given by (3.13), and $f_2(x,y)$ will be constructed in such a way that its gradient at the iterates is equal in norm to that of $f_1(x)$ but alternating in sign, thus generating the necessary orthogonality conditions and the oscillating iteration path. More specifically (and in accordance with (3.13)), define, for all $k \ge 0$,
$$\begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} x_{k+1} \\ y_{k+1} \end{pmatrix} = \begin{pmatrix} x_k \\ y_k \end{pmatrix} + \begin{pmatrix} \sigma_k \\ (-1)^k \sigma_k \end{pmatrix} \qquad {\rm with} \qquad \sigma_k = -f_1'(x_k), \qquad (3.20)$$
defining the zig-zagging piecewise linear iteration path $y(x)$ illustrated in Figure 3.2.

[Figure 3.2: the iteration path $y(x)$ (plain), the upper and lower boundaries $y_{\rm low}(x)$ and $y_{\rm up}(x)$ (dashed) and $y_{\rm mid}(x)$ (dotted) for $k = 0, \ldots, 8$ and $\eta = 0.05$.]

We now define the lower and upper boundaries of the corridor containing the iterates. This is achieved by defining the lower boundary $y_{\rm low}(x)$ as a twice continuously differentiable curve that interpolates the $y$ coordinates of the iterates of index $2k$ ($k \ge 0$) and is constant on the intervals $[x_{2k-1}, x_{2k}]$, yielding $y_{\rm low}(x_{2k-1}) = y_{\rm low}(x_{2k}) = y_{2k}$. Polynomial Hermite interpolation is used to twice continuously connect the constant parts. The upper boundary $y_{\rm up}(x)$ is defined in the same way to interpolate the $y$ coordinates of the iterates of index $2k+1$ ($k \ge 0$), being constant on $[x_{2k}, x_{2k+1}]$, yielding $y_{\rm up}(x_{2k}) = y_{\rm up}(x_{2k+1}) = y_{2k+1}$. Both $y_{\rm low}(x)$ and $y_{\rm up}(x)$ are shown on Figure 3.2, as well as their average $y_{\rm mid}(x) = \frac{1}{2}(y_{\rm up}(x) + y_{\rm low}(x))$. If we define
$$\delta(x) \stackrel{\rm def}{=} y_{\rm up}(x) - y_{\rm low}(x) \qquad (3.21)$$
(the corridor width at $x$), we note that, by construction,
$$\delta(x_k) = \sigma_k \quad {\rm for \ all} \ k \ge 0. \qquad (3.22)$$
Moreover, since the interpolation conditions defining $y_{\rm up}(x)$ are given (for $x \in [x_{2k-1}, x_{2k}]$, say) by
$$y_{\rm up}(x_{2k-1}) = \sum_{i=0}^{2k-2} (-1)^i \sigma_i \qquad {\rm and} \qquad y_{\rm up}(x_{2k}) = y_{\rm up}(x_{2k+1}) = \sum_{i=0}^{2k} (-1)^i \sigma_i, \qquad (3.23)$$
$$y_{\rm up}'(x_{2k-1}) = y_{\rm up}'(x_{2k}) = 0 \qquad {\rm and} \qquad y_{\rm up}''(x_{2k-1}) = y_{\rm up}''(x_{2k}) = 0, \qquad (3.24)$$

a closer inspection of the interpolating polynomial (see (3.5)) reveals that, for $x \in [x_{2k-1}, x_{2k}]$,
$$y_{\rm up}(x) = y_{\rm up}(x_{2k-1}) - (\sigma_{2k-1} - \sigma_{2k}) \left[ 10 t^3 - 15 t^4 + 6 t^5 \right], \qquad (3.25)$$
where $t = (x - x_{2k-1})/\sigma_{2k-1}$. Symmetrically, we have that, for $x \in [x_{2k}, x_{2k+1}]$,
$$y_{\rm low}(x) = y_{\rm low}(x_{2k}) + (\sigma_{2k} - \sigma_{2k+1}) \left[ 10 t^3 - 15 t^4 + 6 t^5 \right], \qquad (3.26)$$
where $t = (x - x_{2k})/\sigma_{2k}$. We thus obtain from (3.25) and (3.26), using (3.22) and defining $t = (x - x_k)/\sigma_k$, that, for $x \in [x_k, x_{k+1}]$,
$$\delta(x) = \sigma_k - (\sigma_k - \sigma_{k+1}) \left[ 10 t^3 - 15 t^4 + 6 t^5 \right] \qquad (3.27)$$
and
$$y_{\rm mid}(x) = y_{\rm mid}(x_k) + \tfrac{1}{2} (-1)^k (\sigma_k - \sigma_{k+1}) \left[ 10 t^3 - 15 t^4 + 6 t^5 \right]. \qquad (3.28)$$
These two last relations yield that
$$\delta'(x) = 2 (-1)^{k+1} y_{\rm mid}'(x) = -30 \, \frac{\sigma_k - \sigma_{k+1}}{\sigma_k} \left[ t^2 - 2 t^3 + t^4 \right] \le 0, \qquad (3.29)$$
and also that
$$\delta''(x) = 2 (-1)^{k+1} y_{\rm mid}''(x) = -60 \, \frac{\sigma_k - \sigma_{k+1}}{\sigma_k^2} \left[ t - 3 t^2 + 2 t^3 \right]. \qquad (3.30)$$
The last inequality in (3.29) results from the decreasing nature of $\sigma_k$ and the fact that $t^2 - 2 t^3 + t^4 = t^2 (1-t)^2 \ge 0$ for $t \in [0,1]$. It immediately implies, with (3.22) and (3.29), that $\delta(x)$ is non-increasing and that
$$\sigma_k = \delta(x_k) \ge \delta(x) \ge \delta(x_{k+1}) = \sigma_{k+1} \quad {\rm for} \ x \in [x_k, x_{k+1}]. \qquad (3.31)$$

The next step is to define, for each $x$, $f_2(x,y)$ as a twice continuously differentiable function of $y$ whose value is small between $y_{\rm low}(x)$ and $y_{\rm up}(x)$ and first increases before levelling off when the distance of $y$ to the corridor increases, thereby keeping the iterates within the corridor. The details of $f_2(x,y)$ are given by
$$f_2(x,y) = \begin{cases} 8 \delta(x)^2 & {\rm if} \ y \le y_{\rm low}(x) - 1, \\ (y - y_{\rm mid}(x))^2 & {\rm if} \ y \in [y_{\rm low}(x), y_{\rm up}(x)], \\ 8 \delta(x)^2 & {\rm if} \ y \ge y_{\rm up}(x) + 1, \end{cases} \qquad (3.32)$$
where Hermite interpolation is once more used to twice continuously connect the first and second interval, as well as the second and third. In the first of these intervals, $f_2(x,y)$ is thus defined by a fifth order polynomial translated to $[0,1]$, with boundary conditions on this latter interval given by
$$p(0) = 8 \delta(x)^2, \quad p'(0) = 0, \quad p''(0) = 0$$
and
$$p(1) = (y_{\rm low}(x) - y_{\rm mid}(x))^2, \quad p'(1) = 2 (y_{\rm low}(x) - y_{\rm mid}(x)) = -\delta(x) \quad {\rm and} \quad p''(1) = 2.$$
The interpolation conditions on the second interval are symmetrically defined. Figure 3.3 shows the shape of $f_2(x,y)$ for fixed $x$. Note that $f_2(x,y)$ is symmetric in $y$ with respect to $y_{\rm mid}(x)$ by construction.

[Figure 3.3: the shape of $f_2(x,y)$ for $x = x_2$ and $\eta = 0.05$, the vertical lines indicating the values of $y_{\rm low}(x_2)$ and $y_{\rm up}(x_2)$.]

Note also that, using (3.20), the definition of $y_{\rm mid}(x)$ and (3.22),
$$\frac{\partial f_2}{\partial y}(x_{2k}, y_{2k}) = \frac{\partial f_2}{\partial y}(x_{2k}, y_{\rm low}(x_{2k})) = 2 (y_{\rm low}(x_{2k}) - y_{\rm mid}(x_{2k})) = -\delta(x_{2k}) = -\sigma_{2k} \qquad (3.33)$$
and, similarly,
$$\frac{\partial f_2}{\partial y}(x_{2k+1}, y_{2k+1}) = \frac{\partial f_2}{\partial y}(x_{2k+1}, y_{\rm up}(x_{2k+1})) = 2 (y_{\rm up}(x_{2k+1}) - y_{\rm mid}(x_{2k+1})) = \delta(x_{2k+1}) = \sigma_{2k+1}. \qquad (3.34)$$
Note also that, because of (3.32) and (3.29) taken at $x = x_k$ (i.e. $t = 0$),
$$\frac{\partial f_2}{\partial x}(x_k, y_k) = -2 (y_k - y_{\rm mid}(x_k)) \, y_{\rm mid}'(x_k) = 0. \qquad (3.35)$$
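The corridor geometry lends itself to the same kind of numerical check. The sketch below (illustrative; it reuses `sigma`, `eta` and `x_knots` from the previous fragment) implements $\delta(x)$ and $y_{\rm mid}(x)$ from (3.27)-(3.28) and the middle piece of (3.32), and tests the gradient values (3.33)-(3.34) by finite differences.

```python
# delta(x), y_mid(x) and the middle region of f2, as in (3.27), (3.28), (3.32).
import numpy as np

s = lambda t: 10 * t**3 - 15 * t**4 + 6 * t**5        # quintic "smoothstep"

def corridor(x):
    """Return (delta(x), y_mid(x)) on the segment [x_k, x_{k+1}]."""
    k = int(np.searchsorted(x_knots, x, side='right')) - 1
    t = (x - x_knots[k]) / sigma(k)
    delta = sigma(k) - (sigma(k) - sigma(k + 1)) * s(t)
    y_k = sum((-1) ** i * sigma(i) for i in range(k))  # iterate ordinate (3.20)
    y_mid_k = y_k + 0.5 * (-1) ** k * sigma(k)         # mid-line at x = x_k
    return delta, y_mid_k + 0.5 * (-1) ** k * (sigma(k) - sigma(k + 1)) * s(t)

f2_mid = lambda x, y: (y - corridor(x)[1]) ** 2        # middle case of (3.32)

# The y-derivative of f2 at iterate k is (-1)^{k+1} sigma_k, matching
# (3.33) for even k and (3.34) for odd k:
for k in range(6):
    xk, h = x_knots[k], 1e-6
    yk = sum((-1) ** i * sigma(i) for i in range(k))
    dfdy = (f2_mid(xk, yk + h) - f2_mid(xk, yk - h)) / (2 * h)
    assert np.isclose(dfdy, (-1) ** (k + 1) * sigma(k), atol=1e-5)
```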

We finally define the objective function of our minimization problem (2.1) by
$$f(x,y) \stackrel{\rm def}{=} \begin{cases} \dfrac{1}{\sqrt{2}} \left[ f_1(x) + f_2(x,y) \right] & {\rm for} \ x \ge 0, \ y \in \mathbb{R}, \\[1.5ex] \dfrac{1}{\sqrt{2}} \left[ f_1(0) + x f_1'(0) + f_2(0,y) \right] & {\rm for} \ x < 0, \ y \in \mathbb{R}, \end{cases} \qquad (3.36)$$
whose contour lines, superimposed on the path of iterates, are shown in Figure 3.4. We thus obtain, using (3.14), (3.33)-(3.34) and (3.35), that
$$g(x_k, y_k) = -\frac{\sigma_k}{\sqrt{2}} \begin{pmatrix} 1 \\ (-1)^k \end{pmatrix}, \qquad (3.37)$$
and therefore that $\|g(x_k, y_k)\| = \sigma_k$. Because of the definition of $\sigma_k$ in (3.15), this implies that the algorithm will require, for any $\epsilon \in (0,1)$, at least
$$\epsilon^{-(2-\tau)} \qquad (3.38)$$
iterations to produce an iterate $x_k$ such that $\|g_k\| \le \epsilon$. This allows us to conclude, as desired, that the evaluation complexity bound of $O(\epsilon^{-2})$ is essentially sharp, provided we can show that $f(x,y)$ is bounded below and has a globally Lipschitz continuous gradient, and that the slope of $f(x,y)$ is always non-positive along the trajectory. This is the object of the next section.
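The count (3.38) can be read off directly from the definition of $\sigma_k$: since $\|g_k\| = \sigma_k = (k+1)^{-1/(2-\tau)}$, the first iterate with $\|g_k\| \le \epsilon$ has index of order $\epsilon^{-(2-\tau)}$. A small illustrative check (all names local to this sketch):

```python
# Since ||g_k|| = sigma_k by (3.37), the tolerance eps is first reached at an
# index close to eps^{-(2 - tau)}, the lower bound (3.38).
tau = 1.0 / 3.0
eta = tau / (4.0 - 2.0 * tau)                 # so that 1/(1/2 + eta) = 2 - tau
sigma = lambda k: (1.0 / (k + 1)) ** (0.5 + eta)

for eps in (1e-1, 1e-2):
    k = 0
    while sigma(k) > eps:                     # ||g_k|| > eps still holds
        k += 1
    print(f"eps = {eps:.0e}: first k with ||g_k|| <= eps is {k}; "
          f"eps^-(2-tau) = {eps ** -(2.0 - tau):.1f}")
```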

4 Verifying the example

We start with a useful auxiliary result.

[Figure 3.4: the contour lines of $f(x,y)$ and the path of iterates for $\eta = 0.05$.]

Lemma 4.1 The values of $f_2(x,y)$, $\frac{\partial f_2}{\partial x}(x,y)$, $\frac{\partial f_2}{\partial y}(x,y)$, $\frac{\partial^2 f_2}{\partial x^2}(x,y)$, $\frac{\partial^2 f_2}{\partial y^2}(x,y)$ and $\frac{\partial^2 f_2}{\partial x \partial y}(x,y)$ are uniformly bounded (in absolute value) for all $x \ge 0$ and $y \in [y_{\rm low}(x) - 1, y_{\rm low}(x)] \cup [y_{\rm up}(x), y_{\rm up}(x) + 1]$.

Proof. Because, for each $x \ge 0$ and $y \in [y_{\rm low}(x) - 1, y_{\rm low}(x)]$, $f_2(x,y)$ is a polynomial in $y$ on an interval of length one, its values and those of its first and second derivatives with respect to $y$ are uniformly bounded (in absolute value) provided its coefficients are uniformly bounded, which is the case (see (3.5) with $T = 1$ in Theorem 3.1) if the quantities
$$8 \delta(x)^2 - (y_{\rm up}(x) - y_{\rm mid}(x))^2 \qquad {\rm and} \qquad \delta(x) \qquad (4.39)$$
are themselves uniformly bounded (the third component of the right-hand side of (3.5) being identically equal to $2$). But this is the case for the first term in (4.39) since
$$8 \delta(x)^2 - (y_{\rm up}(x) - y_{\rm mid}(x))^2 = 8 \delta(x)^2 - \tfrac{1}{4} \delta(x)^2 < 8 \delta(x)^2 \le 8,$$
and for the second because of (3.31) and the bound $\sigma_k \le 1$.

What about the derivatives with respect to $x$ (for $y \in [y_{\rm low}(x) - 1, y_{\rm low}(x)]$)? Since $f_2(x,y)$ is defined, in this interval, as a polynomial in $y$ shifted to $[0,1]$, the dependence on $x$ is entirely captured by the coefficients $c_0, \ldots, c_5$ of this polynomial, themselves depending on the boundary conditions
$$c_0 = 8 \delta(x)^2, \quad c_1 = 0 \quad {\rm and} \quad c_2 = 0 \qquad (4.40)$$
and (3.5).

The boundedness of the first and second derivatives of $c_0, \ldots, c_5$ (as functions of $x$) is then implied by (4.42) below and the boundedness of the two terms in (4.39), which we already verified. Finally, the second derivative of $f_2(x,y)$ with respect to $x$ and $y$ (for $y \in [y_{\rm low}(x) - 1, y_{\rm low}(x)]$) is also a polynomial on a (shifted) interval of length one, obtained by differentiating $c_0, \ldots, c_5$ with respect to $x$ in the polynomial corresponding to the derivative of $f_2(x,y)$ with respect to $y$. Because we just verified that the first derivatives of $c_0, \ldots, c_5$ with respect to $x$ are themselves uniformly bounded in $x$, this must also be the case for the cross-derivatives of $f_2(x,y)$. By symmetry, the conclusion of the lemma also holds for all $x \ge 0$ and $y \in [y_{\rm up}(x), y_{\rm up}(x) + 1]$. $\Box$

Theorem 4.2 The function $f(x,y)$ is uniformly bounded below on $\mathbb{R}^2$.

Proof. Observe first that (3.18) implies that $f_1(x)$ is bounded below because
$$\sum_{k=0}^{\infty} \left[ f_1(x_k) - f_1(x_{k+1}) \right] = \sum_{k=0}^{\infty} \sigma_k^2 = \zeta(1+2\eta) < \infty.$$
Moreover, it also results from this last observation that $f_1(x) \ge 0$ for all $x \ge 0$ (and thus also for all $x \in \mathbb{R}$). The fact that $f_2(x,y)$ is also uniformly bounded below results from its definition in (3.32) and Lemma 4.1. The desired conclusion then follows from (3.36). $\Box$

The verification that the gradient of $f(x,y)$ admits a uniform Lipschitz constant is a more lengthy calculation, which is the object of the next theorem. It depends on the observation that
$$\sigma_k - \sigma_{k+1} = \left(\frac{1}{k+1}\right)^{\frac{1}{2}+\eta} - \left(\frac{1}{k+2}\right)^{\frac{1}{2}+\eta} \le \left(\tfrac{1}{2}+\eta\right) \left(\frac{1}{k+2}\right)^{\eta - \frac{1}{2}} \left(\frac{1}{k+1} - \frac{1}{k+2}\right) \le \left(\tfrac{1}{2}+\eta\right) \sigma_k^2, \qquad (4.41)$$
where we used the bound $\eta \le \frac{1}{2}$ and the resulting concavity of $t^{\frac{1}{2}+\eta}$.

Theorem 4.3 The gradient of the function $f(x,y)$ is uniformly Lipschitz continuous on $\mathbb{R}^2$.

Proof. Let us consider the functions $\delta(x)$ and $y_{\rm mid}(x)$. Remembering (3.29), (3.30) and (4.41) and using the fact that $t \in [0,1]$ when $x \in [x_k, x_{k+1}]$, we easily deduce that, for $x$ in this interval,
$$\max \left[ |\delta'(x)|, |\delta''(x)|, |y_{\rm mid}'(x)|, |y_{\rm mid}''(x)| \right] \le 360 \left(\tfrac{1}{2}+\eta\right) \stackrel{\rm def}{=} \kappa_{\rm dy}. \qquad (4.42)$$
We now turn to the analysis of the second derivatives of $f_2(x,y)$. Consider first the case where $y \in [y_{\rm low}(x), y_{\rm up}(x)]$. In this interval, we obtain, for $x \in [x_k, x_{k+1}]$, that
$$\frac{\partial^2 f_2}{\partial y^2}(x,y) = 2. \qquad (4.43)$$

Moreover,
$$\frac{\partial f_2}{\partial x}(x,y) = -2 (y - y_{\rm mid}(x)) \, y_{\rm mid}'(x) \qquad (4.44)$$
and thus
$$\frac{\partial^2 f_2}{\partial x^2}(x,y) = 2\, y_{\rm mid}'(x)^2 - 2 (y - y_{\rm mid}(x)) \, y_{\rm mid}''(x).$$
Taking absolute values and noting that, because of the definition of $y_{\rm mid}(x)$ and (4.42),
$$|y - y_{\rm mid}(x)| \le \tfrac{1}{2} \delta(x) \le \tfrac{1}{2} \sigma_k \qquad (4.45)$$
for $x \in [x_k, x_{k+1}]$ and $y \in [y_{\rm low}(x), y_{\rm up}(x)]$, we obtain, for $x$ and $y$ in these intervals, that
$$\left| \frac{\partial^2 f_2}{\partial x^2}(x,y) \right| \le 2 \kappa_{\rm dy}^2 + \sigma_k \kappa_{\rm dy} \le 2 \kappa_{\rm dy}^2 + \kappa_{\rm dy}, \qquad (4.46)$$
where we also used (4.42) and the bound $\sigma_k \le 1$. Finally, for $x$ and $y$ in the same intervals, we have that
$$\left| \frac{\partial^2 f_2}{\partial x \partial y}(x,y) \right| = \left| -2\, y_{\rm mid}'(x) \right| \le 2 \kappa_{\rm dy},$$
where we used (4.44) and (4.42). Considering this last relation together with (4.43) and (4.46), we thus conclude that the second derivatives of $f_2(x,y)$ are uniformly bounded for all $x$ and all $y \in [y_{\rm low}(x), y_{\rm up}(x)]$. The case where $y \in [y_{\rm low}(x) - 1, y_{\rm low}(x)] \cup [y_{\rm up}(x), y_{\rm up}(x) + 1]$ is covered by Lemma 4.1. To conclude our analysis, we are thus left with checking the boundedness of the second derivatives of $f_2(x,y)$ for $y \ge y_{\rm up}(x) + 1$ and $y \le y_{\rm low}(x) - 1$. In these intervals, $f_2(x,y) = 8 \delta(x)^2$, whose second derivatives are bounded because of (4.42). We may therefore finally assert that the second derivatives of $f_2(x,y)$ are bounded for all $x$ and all $y$. (Figure 4.5 shows the second derivative of $f_2(x,y)$ with respect to $y$ for $x = x_2$.)

[Figure 4.5: the second derivative of $f_2(x_2, y)$ with respect to $y$, for $\eta = 0.05$.]

We may now combine this last conclusion with (3.19) and (3.36) to deduce that $f(x,y)$ has uniformly bounded second derivatives for all $(x,y) \in \mathbb{R}^2$. The desired Lipschitz continuity of its gradient then follows. $\Box$
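The closing step of this proof, that uniformly bounded second derivatives imply a uniform Lipschitz constant for the gradient, is the standard mean-value argument, spelled out here for completeness: for twice continuously differentiable $f$,
$$g(x) - g(y) = \int_0^1 \nabla^2 f\bigl(y + s\,(x - y)\bigr)\,(x - y)\,ds, \qquad {\rm so \ that} \qquad \|g(x) - g(y)\| \le \Bigl(\sup_{z \in \mathbb{R}^2} \|\nabla^2 f(z)\|\Bigr) \|x - y\|,$$
which is exactly AF.2 with $L_g$ the supremum on the right-hand side.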

We conclude the construction of our example by verifying that the sequence of iterates can indeed be obtained from the steepest-descent method with exact linesearches.

Theorem 4.4 The iterate $(x_{k+1}, y_{k+1})$ is the first minimizer of $f$ along the steepest-descent direction from $(x_k, y_k)$.

Proof. The theorem statement is equivalent to verifying that the slope
$$\omega(x) = \left\langle \nabla f(x, y(x)), \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ (-1)^k \end{pmatrix} \right\rangle = \tfrac{1}{2} \left[ f_1'(x) + \frac{\partial f_2}{\partial x}(x, y(x)) + (-1)^k \frac{\partial f_2}{\partial y}(x, y(x)) \right]$$
of $f(x,y)$ on $[x_k, x_{k+1}]$, which is given by
$$\omega(x) = \tfrac{1}{2} f_1'(x) + \left[ y(x) - y_{\rm mid}(x) \right] \left[ (-1)^k - y_{\rm mid}'(x) \right], \qquad (4.47)$$
is always non-positive and is zero only at the iterates (the corners of the trajectory). To prove this property, we first observe that
$$(-1)^k - y_{\rm mid}'(x) = (-1)^k \left[ 1 - (-1)^k y_{\rm mid}'(x) \right]. \qquad (4.48)$$
Observe now that (4.41) and the decreasing nature of $\sigma_k$ together give that, for $k > 0$,
$$\frac{\sigma_{2k-1} - \sigma_{2k}}{\sigma_{2k-1}} \le \left(\tfrac{1}{2}+\eta\right) \sigma_{2k-1} < \tfrac{1}{2}+\eta \le 0.6,$$
where the last inequality follows from the bound $\eta \le \frac{1}{10}$. Hence, recalling (3.29) and using the fact that $\max_{t \in [0,1]} t^2 (1-t)^2 = \frac{1}{16}$, we obtain that, for $x \in [x_{2k-1}, x_{2k}]$,
$$|y_{\rm mid}'(x)| \le 15 \cdot 0.6 \cdot \max_{t \in [0,1]} t^2 (1-t)^2 < 0.57. \qquad (4.49)$$
Similarly, (4.41) and the decreasing nature of $\sigma_k$ imply that, for $k > 0$,
$$\frac{\sigma_{2k} - \sigma_{2k+1}}{\sigma_{2k}} \le \left(\tfrac{1}{2}+\eta\right) \sigma_{2k} < \tfrac{1}{2}+\eta \le 0.6,$$
while, for $k = 0$,
$$\frac{\sigma_0 - \sigma_1}{\sigma_0} = 1 - \left(\tfrac{1}{2}\right)^{\frac{1}{2}+\eta} < 0.6.$$
This thus gives that
$$|y_{\rm mid}'(x)| \le 15 \cdot 0.6 \cdot \max_{t \in [0,1]} t^2 (1-t)^2 < 0.57 \qquad (4.50)$$
for $x \in [x_{2k}, x_{2k+1}]$. Combining (4.49) and (4.50), we obtain that $|y_{\rm mid}'(x)| < 1$ for all $x \ge 0$, and also, from (3.29), that
$$(-1)^k y_{\rm mid}'(x) \ge 0 \quad {\rm for \ all} \ x \in [x_k, x_{k+1}],$$
where the inequality is strict except at $x_k$ and $x_{k+1}$ since $y_{\rm mid}'(x_k) = y_{\rm mid}'(x_{k+1}) = 0$. Hence $0 \le 1 - (-1)^k y_{\rm mid}'(x) \le 1$, and we obtain, using (4.45) and (4.48), that, for $x \in [x_k, x_{k+1}]$,
$$\left[ y(x) - y_{\rm mid}(x) \right] \left[ (-1)^k - y_{\rm mid}'(x) \right] = (-1)^k \left[ y(x) - y_{\rm mid}(x) \right] \left[ 1 - (-1)^k y_{\rm mid}'(x) \right] \le \tfrac{1}{2} \delta(x). \qquad (4.51)$$

Moreover, since, at the leftmost boundary of $[x_{2k}, x_{2k+1}]$,
$$y(x_{2k}) - y_{\rm mid}(x_{2k}) = -\tfrac{1}{2} \delta(x_{2k}) = -\tfrac{1}{2} \sigma_{2k}$$
and, at the leftmost boundary of $[x_{2k+1}, x_{2k+2}]$,
$$y(x_{2k+1}) - y_{\rm mid}(x_{2k+1}) = \tfrac{1}{2} \delta(x_{2k+1}) = \tfrac{1}{2} \sigma_{2k+1}$$
(where we used (3.31)), we deduce from (4.48) that the inequality in (4.51) can only hold as an equality at $x_{k+1}$. Our penultimate step is to note that (3.16) and (3.27) together give that, for $x \in [x_k, x_{k+1}]$ and $t = (x - x_k)/\sigma_k$,
$$f_1'(x) + \delta(x) = (\sigma_k - \sigma_{k+1}) \left[ -12 t^2 + 18 t^3 - 6 t^5 \right] \le 0, \qquad (4.52)$$
where, again, the inequality is strict in the interior of the interval (see Figure 4.6).

[Figure 4.6: the polynomial $-12 t^2 + 18 t^3 - 6 t^5$ on $[0,1]$.]

Combining finally (4.47), (4.51) and (4.52), we obtain that, for all $k \ge 0$,
$$\omega(x) \le \tfrac{1}{2} \left[ f_1'(x) + \delta(x) \right] \le 0, \qquad {\rm with} \qquad \omega(x) < 0 \quad {\rm for} \ x \in [x_k, x_{k+1}), \qquad (4.53)$$
and $(x_{k+1}, y_{k+1})$ is indeed the first local minimizer of $f(x,y)$ along the steepest-descent direction at iterate $(x_k, y_k)$. $\Box$

This last theorem is illustrated in Figure 4.7, and completes the construction of our example.

[Figure 4.7: the behaviour of $\omega(x)$ for $\eta = 0.05$.]

5 Conclusions

We have constructed an example where, for an arbitrary $\tau > 0$, the steepest-descent method with exact linesearches takes at least a multiple of $\epsilon^{-2+\tau}$ iterations to find an approximate stationary point at which $\|g_k\| \le \epsilon$, for any $\epsilon \in (0,1)$. This result closes the gap left by Cartis et al. (2010), who could not accommodate this type of linesearch, corresponding to the archetypal, if very often impractical, definition of the method. Given that we have shown in this last paper that it is impossible to obtain an $O(\epsilon^{-2})$ worst-case complexity bound that is attained for all $\epsilon$, this is probably the best result that can be obtained. As was the case in this last paper, our example may furthermore be adapted to cover the case where the level sets of the objective are finite by extending $f(x,y)$ beyond the approximate minimizer. This is achieved by smoothly prolongating $f_1(x)$ beyond this point with a suitably increasing function and by, say, keeping the width of the corridor constant in this part of the plane. Such an example may therefore be constructed for every $\epsilon \in (0,1)$.

References

W. Bian, X. Chen, and Y. Ye. Complexity analysis of interior point algorithms for non-Lipschitz and nonconvex minimization. Mathematical Programming, Series A, (submitted), 2012.

C. Cartis, N. I. M. Gould, and Ph. L. Toint. On the complexity of steepest descent, Newton's and regularized Newton's methods for nonconvex unconstrained optimization. SIAM Journal on Optimization, 20(6), 2833-2852, 2010.

C. Cartis, N. I. M. Gould, and Ph. L. Toint. Adaptive cubic overestimation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Mathematical Programming, Series A, 127(2), 245-295, 2011a.

C. Cartis, N. I. M. Gould, and Ph. L. Toint. An adaptive cubic regularisation algorithm for nonconvex optimization with convex constraints and its function-evaluation complexity. IMA Journal of Numerical Analysis, (to appear), 2011b.

C. Cartis, N. I. M. Gould, and Ph. L. Toint. Optimal Newton-type methods for nonconvex optimization. Technical Report naXys-17-2011, Namur Centre for Complex Systems (naXys), FUNDP-University of Namur, Namur, Belgium, 2011c.

C. Cartis, N. I. M. Gould, and Ph. L. Toint. Complexity bounds for second-order optimality in unconstrained optimization. Journal of Complexity, 28, 93-108, 2012a.

C. Cartis, N. I. M. Gould, and Ph. L. Toint. A note about the complexity of minimizing Nesterov's smooth Chebyshev-Rosenbrock function. Optimization Methods and Software, (to appear), 2012b.

C. Cartis, N. I. M. Gould, and Ph. L. Toint. On the oracle complexity of first-order and derivative-free algorithms for smooth nonconvex minimization. SIAM Journal on Optimization, 22(1), 66-86, 2012c.

A. Cauchy. Méthode générale pour la résolution des systèmes d'équations simultanées. Comptes Rendus de l'Académie des Sciences, pp. 536-538, 1847.

S. Gratton, A. Sartenaer, and Ph. L. Toint. Recursive trust-region methods for multiscale nonlinear optimization. SIAM Journal on Optimization, 19(1), 414-444, 2008.

F. Jarre. On Nesterov's smooth Chebyshev-Rosenbrock function. Technical report, University of Düsseldorf, Düsseldorf, Germany, May 2011.

Yu. Nesterov. Introductory Lectures on Convex Optimization. Applied Optimization. Kluwer Academic Publishers, Dordrecht, The Netherlands, 2004.

Yu. Nesterov and B. T. Polyak. Cubic regularization of Newton method and its global performance. Mathematical Programming, Series A, 108(1), 177-205, 2006.

L. N. Vicente. Worst case complexity of direct search. Technical report, Department of Mathematics, University of Coimbra, Coimbra, Portugal, May 2010. Preprint 10-17, revised 2011.


More information

Regression estimation in continuous time with a view towards pricing Bermudan options

Regression estimation in continuous time with a view towards pricing Bermudan options with a view towards pricing Bermudan options Tagung des SFB 649 Ökonomisches Risiko in Motzen 04.-06.06.2009 Financial engineering in times of financial crisis Derivate... süßes Gift für die Spekulanten

More information

Final Projects Introduction to Numerical Analysis atzberg/fall2006/index.html Professor: Paul J.

Final Projects Introduction to Numerical Analysis  atzberg/fall2006/index.html Professor: Paul J. Final Projects Introduction to Numerical Analysis http://www.math.ucsb.edu/ atzberg/fall2006/index.html Professor: Paul J. Atzberger Instructions: In the final project you will apply the numerical methods

More information

Online Supplement: Price Commitments with Strategic Consumers: Why it can be Optimal to Discount More Frequently...Than Optimal

Online Supplement: Price Commitments with Strategic Consumers: Why it can be Optimal to Discount More Frequently...Than Optimal Online Supplement: Price Commitments with Strategic Consumers: Why it can be Optimal to Discount More Frequently...Than Optimal A Proofs Proof of Lemma 1. Under the no commitment policy, the indifferent

More information

On a Manufacturing Capacity Problem in High-Tech Industry

On a Manufacturing Capacity Problem in High-Tech Industry Applied Mathematical Sciences, Vol. 11, 217, no. 2, 975-983 HIKARI Ltd, www.m-hikari.com https://doi.org/1.12988/ams.217.7275 On a Manufacturing Capacity Problem in High-Tech Industry Luca Grosset and

More information

Advanced Probability and Applications (Part II)

Advanced Probability and Applications (Part II) Advanced Probability and Applications (Part II) Olivier Lévêque, IC LTHI, EPFL (with special thanks to Simon Guilloud for the figures) July 31, 018 Contents 1 Conditional expectation Week 9 1.1 Conditioning

More information

F A S C I C U L I M A T H E M A T I C I

F A S C I C U L I M A T H E M A T I C I F A S C I C U L I M A T H E M A T I C I Nr 38 27 Piotr P luciennik A MODIFIED CORRADO-MILLER IMPLIED VOLATILITY ESTIMATOR Abstract. The implied volatility, i.e. volatility calculated on the basis of option

More information

University of Toronto Department of Economics ECO 204 Summer 2013 Ajaz Hussain TEST 1 SOLUTIONS GOOD LUCK!

University of Toronto Department of Economics ECO 204 Summer 2013 Ajaz Hussain TEST 1 SOLUTIONS GOOD LUCK! University of Toronto Department of Economics ECO 204 Summer 2013 Ajaz Hussain TEST 1 SOLUTIONS TIME: 1 HOUR AND 50 MINUTES DO NOT HAVE A CELL PHONE ON YOUR DESK OR ON YOUR PERSON. ONLY AID ALLOWED: A

More information

FX Smile Modelling. 9 September September 9, 2008

FX Smile Modelling. 9 September September 9, 2008 FX Smile Modelling 9 September 008 September 9, 008 Contents 1 FX Implied Volatility 1 Interpolation.1 Parametrisation............................. Pure Interpolation.......................... Abstract

More information

GAME THEORY. Department of Economics, MIT, Follow Muhamet s slides. We need the following result for future reference.

GAME THEORY. Department of Economics, MIT, Follow Muhamet s slides. We need the following result for future reference. 14.126 GAME THEORY MIHAI MANEA Department of Economics, MIT, 1. Existence and Continuity of Nash Equilibria Follow Muhamet s slides. We need the following result for future reference. Theorem 1. Suppose

More information