
On the evaluation complexity of cubic regularization methods for potentially rank-deficient nonlinear least-squares problems and its relevance to constrained nonlinear optimization

Coralia Cartis*, Nicholas I. M. Gould** and Philippe L. Toint***

March 21, 2012

* School of Mathematics, University of Edinburgh, The King's Buildings, Edinburgh, EH9 3JZ, Scotland, UK. coralia.cartis@ed.ac.uk. All three authors are grateful to the Royal Society for its support through the International Joint Project.
** Computational Science and Engineering Department, Rutherford Appleton Laboratory, Chilton, Oxfordshire, OX11 0QX, England, UK. nick.gould@stfc.ac.uk.
*** Department of Mathematics, FUNDP - University of Namur, 61, rue de Bruxelles, B-5000, Namur, Belgium. philippe.toint@fundp.ac.be.

Abstract

We propose a new termination criterion suitable for potentially singular, zero or non-zero residual, least-squares problems, with which cubic regularization variants take at most O(ǫ^{-3/2}) residual- and Jacobian-evaluations to drive either the Euclidean norm of the residual or its gradient below ǫ; this is the best-known bound for potentially rank-deficient nonlinear least-squares problems. We then apply the new optimality measure and cubic regularization steps to a family of least-squares merit functions in the context of a target-following algorithm for nonlinear equality-constrained problems; this approach yields the first evaluation complexity bound of order ǫ^{-3/2} for nonconvexly constrained problems when higher accuracy is required for primal feasibility than for dual first-order criticality.

Keywords: evaluation complexity, worst-case analysis, least-squares, constrained nonlinear optimization, cubic regularization methods.

1 Introduction

An ubiquitous challenge in scientific computing is the minimization of an appropriate norm of a given, sufficiently smooth, vector-valued function r : IR^n -> IR^m. This problem formulation arises in numerous real-life applications requiring data fitting, parameter estimation, image reconstruction, weather forecasting and so forth [23]. Crucially, it is often an essential building block when solving constrained nonlinear programming problems, being used, for example, to reduce the constraint violation in various sequential programming [2, 12, 24-26], filter [16], funnel [18] and re-weighted least-squares approaches [23]. Nonlinear least-squares problems are also at the heart of the path-following method for constrained problems which we propose and analyze here. We focus on the Euclidean-norm case, which gives rise to the equivalent nonlinear least-squares problem

min_{x ∈ IR^n} Φ(x) = ½ ‖r(x)‖²,   (1.1)

now involving the smooth function Φ(x); other norms may be of interest and some are equally acceptable in this framework. We allow arbitrary values for m and n, and so both over- and under-determined residuals r(x) are allowed in (1.1), as well as square nonlinear systems of equations; in the latter two cases, one may wish to reduce Φ(x) in (1.1) to zero so as to find the zeros of the system r(x) = 0.

Methods for solving (1.1) differ not only in their practical performance, but also in the theoretical bounds known on their worst-case efficiency, which is the focus of this paper. Of the various methods proposed, Gauss-Newton techniques are the most popular and well-researched [15, 23].
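To fix the notation computationally, the following minimal Python sketch evaluates the objective (1.1) and its gradient J(x)^T r(x) (formalized as (1.2) below) for a small residual; the particular two-component residual is a hypothetical example chosen for illustration, not one from the paper.

```python
import numpy as np

def residual(x):
    # Hypothetical smooth residual r : IR^2 -> IR^2 (m = n = 2), for illustration only.
    return np.array([x[0] ** 2 + x[1] - 1.0, x[0] - x[1] ** 2])

def jacobian(x):
    # Analytic Jacobian J(x) of the residual above.
    return np.array([[2.0 * x[0], 1.0],
                     [1.0, -2.0 * x[1]]])

def phi(x):
    # Least-squares objective Phi(x) = 0.5 * ||r(x)||^2, as in (1.1).
    r = residual(x)
    return 0.5 * float(r @ r)

def grad_phi(x):
    # Gradient of Phi: J(x)^T r(x), cf. (1.2).
    return jacobian(x).T @ residual(x)

x0 = np.array([0.5, 0.5])
print(phi(x0), grad_phi(x0))
```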

Rather than tackling the smooth formulation (1.1), recent algorithmic variants [1, 8, 21] attempt to minimize the un-squared, and hence nonsmooth, norm of r(x) instead, in an attempt to improve the conditioning of the system that defines the change to the iterates. Using only first-order information (namely, values of the residual r(x) and its Jacobian J(x) at a given x, obtained from a so-called black-box/oracle), both classical and modern variants can be made/shown to be globally convergent to stationary points of (1.1), namely, to points satisfying

∇_x Φ(x) = J(x)^T r(x) = 0;   (1.2)

furthermore, the number of residual and Jacobian evaluations required to bring the norm of (1.2), or some (nonsmooth) first-order optimality measure, within some tolerance ǫ is O(ǫ^{-2}), provided J(x) and r(x) are Lipschitz continuous [1, 8, 13, 21, 23].

Another possibility is to apply Newton-type methods to the unconstrained problem (1.1), which can ensure, for example, fast local convergence for nonzero-residual problems and, most importantly here, improved global efficiency for both zero- and non-zero-residual problems. In particular, cubic regularization methods [9, 17, 22] applied to (1.1) take O(ǫ^{-3/2}) residual evaluations to ensure (1.2) is within ǫ, provided r(x), J(x) and the Hessians ∇_{xx} r_i(x), i = 1,...,m, are Lipschitz continuous; this bound is sharp for nonlinear least-squares [11], is optimal from a worst-case complexity point of view for a wide class of second-order methods and nonconvex unconstrained problems [4], and is the best-known complexity for second-order methods. This bound can be further improved for gradient-dominated residuals (such as when the singular values of the Jacobian are uniformly bounded away from, or converge to, zero at the same rate as the residual) [22].

The (natural) approximate satisfaction of (1.2) as termination criterion for cubic regularization and other methods suffers from the disadvantage that an approximate zero of r(x) is guaranteed only when J(x) is uniformly full-rank, with a known lower bound on its smallest singular value; this is a strong assumption. In this paper, we introduce a termination condition that can distinguish between the zero- and non-zero-residual cases automatically/implicitly. Namely, we argue for the use of a scaled variant of (1.2), which is precisely the gradient of ‖r(x)‖ whenever r(x) ≠ 0, as well as the inclusion of the size of the residual in the termination condition. Without requiring a non-degenerate Jacobian, we then show that cubic regularization methods can generate either an approximate scaled gradient or a residual value within ǫ in at most O(ǫ^{-3/2}) residual evaluations, thus preserving the (optimal) order of the bound for cubic regularization.

Consider now the evaluation complexity of minimizing a smooth but potentially nonconvex objective f(x) ∈ IR for x ∈ C. When C is described by finitely many smooth (but potentially nonconvex) equality and inequality constraints, we have shown that a first-order exact penalty method with bounded penalty parameters [8], as well as a short-step target-following algorithm with steepest-descent-like steps [3], take O(ǫ^{-2}) objective and constraint evaluations to generate an approximate KKT point of the problem or an (infeasible) approximate critical point of the feasibility measure with respect to the constraints. Thus adding constraints does not deteriorate the order of the worst-case evaluation complexity bound achieved in the unconstrained case when steepest-descent-like methods are employed.
A natural question arises as to whether an improved evaluation complexity bound, of the order of cubic regularization, can be shown for constrained problems. In the case when C is given by convex constraints, projected cubic regularization variants can be shown to satisfy the O(ǫ^{-3/2}) evaluation bound [6]. In this paper, in a similar vein to [3], we propose a short-step target-following algorithm for problems with nonconvex equality constraints,

minimize f(x) such that c(x) = 0,

that takes cubic regularization steps for a sequence of shifting least-squares merit functions. The evaluation complexity of the resulting algorithm is better than that for steepest descent, and can even achieve O(ǫ^{-3/2}), provided the (dual) KKT conditions are satisfied with lower accuracy than the (primal) feasibility with respect to the constraints.

The structure of the paper is as follows. Section 2 summarizes adaptive cubic regularization methods [9] and relevant complexity results. Section 3.1 presents the new termination criterion for (1.1) based on the scaled gradient, while Section 3.2 gives the complexity result for cubic regularization applied to (1.1) with the new termination criterion.

Sections 4 and 5 present the short-step target-following cubic regularization algorithm for the equality-constrained problem and its complexity analysis, respectively. Section 6 summarizes our contributions and discusses possible extensions of this work.

2 Previous cubic regularization construction and results

2.1 Description of the adaptive cubic regularization algorithm

We consider applying the Adaptive Regularization with Cubics (ARC) algorithm [9, 10] to (1.1); here, we focus on the ARC variant that has the best known and optimal worst-case evaluation complexity, the so-called ARC(S). At each iterate x_k, k ≥ 0, a step s_k is computed that approximately minimizes the local cubic model

m_k(s) = ½ ‖r(x_k)‖² + s^T J(x_k)^T r(x_k) + ½ s^T B_k s + ⅓ σ_k ‖s‖³   (2.1)

of Φ(x_k + s) with respect to s, where B_k is an approximation to the Hessian of Φ at x_k and σ_k > 0 is a regularization parameter. In this method, the step s_k is computed to satisfy

s_k^T J(x_k)^T r(x_k) + s_k^T B_k s_k + σ_k ‖s_k‖³ = 0   (2.2)

and

s_k^T B_k s_k + σ_k ‖s_k‖³ ≥ 0.   (2.3)

Conditions (2.2) and (2.3) are achieved whenever s_k is a global minimizer of the model m_k along the direction s_k, namely, 1 = arg min_{α ∈ IR} m_k(α s_k); in particular, they are satisfied whenever s_k is a global minimizer of the model m_k over a(ny) subspace [10, Theorem 3.1, Lemma 3.2]. Note that if s_k is chosen as the global minimizer of m_k over the entire space, σ_k is maintained at a sufficiently large value and B_k is the true Hessian, then ARC(S) is similar to the cubic regularization technique proposed in [22].

To ensure ARC's fast local convergence, we need to go beyond unidimensional minimization, and so we terminate the inner model minimization when

TC.s: ‖∇_s m_k(s_k)‖ ≤ κ_θ min{1, ‖s_k‖} ‖J(x_k)^T r(x_k)‖,   (2.4)

where κ_θ is any constant in (0,1); see [10, §3.2] for a detailed description of this and other possible termination conditions. Note that ∇_s m_k(0) = ∇_x Φ(x_k) = J(x_k)^T r(x_k), so that (2.4) is a relative error condition, which is clearly satisfied at any minimizer s_k of m_k since then ∇_s m_k(s_k) = 0. Generally, we hope that the inner minimization will be terminated before this inevitable outcome. Note that when s_k is computed by minimizing m_k over a subspace, we may increase the subspace of minimization until TC.s is satisfied. In particular, one may use a Lanczos-based approach where the subspace is the Krylov one generated by {∇_x Φ(x_k), B_k ∇_x Φ(x_k), B_k² ∇_x Φ(x_k), ...}. In this case, conditions (2.2) and (2.3) are also achieved [10, §§3.2, 6, 7].

It remains to describe the iterate updating and model improvement technique in ARC. The step s_k is accepted and the new iterate x_{k+1} set to x_k + s_k whenever (a reasonable fraction of) the predicted model decrease Φ(x_k) − m_k(s_k) is realized by the actual decrease in the objective, Φ(x_k) − Φ(x_k + s_k). This is measured by computing the ratio ρ_k in (2.5) and requiring ρ_k to be greater than a prescribed positive constant η_1 (for example, η_1 = 0.1); it can be shown that ρ_k is well-defined whenever ∇_x Φ(x_k) ≠ 0 [10, Lemma 2.1]. Since the current weight σ_k has resulted in a successful step, there is no pressing reason to increase it, and indeed there may be benefits in decreasing it if the model overestimates the function locally. By contrast, if ρ_k is smaller than η_1, we judge that the improvement in objective is insufficient; indeed, there is no improvement if ρ_k ≤ 0. If this happens, the step will be rejected and x_{k+1} left as x_k.
Under these circumstances, the only recourse available is to increase the weight σ_k prior to the next iteration, with the implicit intention of reducing the size of the step.
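Before summarizing the algorithm, here is a minimal Python sketch, under the definitions above, of the cubic model (2.1), its gradient, and the inner termination test (2.4); it is illustrative only and does not implement the subspace (Lanczos) minimization of [10].

```python
import numpy as np

def cubic_model(s, r_k, J_k, B_k, sigma_k):
    # Local cubic model m_k(s) of Phi(x_k + s), cf. (2.1).
    g_k = J_k.T @ r_k  # gradient of Phi at x_k, cf. (1.2)
    return (0.5 * float(r_k @ r_k) + float(s @ g_k)
            + 0.5 * float(s @ (B_k @ s))
            + sigma_k * np.linalg.norm(s) ** 3 / 3.0)

def model_gradient(s, r_k, J_k, B_k, sigma_k):
    # Gradient of the cubic model with respect to s:
    # grad m_k(s) = J_k^T r_k + B_k s + sigma_k ||s|| s.
    return J_k.T @ r_k + B_k @ s + sigma_k * np.linalg.norm(s) * s

def inner_termination(s, r_k, J_k, B_k, sigma_k, kappa_theta=0.5):
    # Relative-error test TC.s in (2.4): stop the inner minimization when
    # ||grad m_k(s)|| <= kappa_theta * min(1, ||s||) * ||J_k^T r_k||.
    g_model = model_gradient(s, r_k, J_k, B_k, sigma_k)
    g_phi = J_k.T @ r_k
    return (np.linalg.norm(g_model)
            <= kappa_theta * min(1.0, np.linalg.norm(s)) * np.linalg.norm(g_phi))
```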

A summary of the ARC(S) algorithm applied to (1.1) follows.

Algorithm 2.1: Adaptive Regularization using Cubics (ARC(S)) [9, 10] applied to (1.1).

A starting point x_0, an initial and a minimal regularization parameter σ_0 ≥ σ_min > 0, and algorithmic parameters γ_2 ≥ γ_1 > 1 and 1 > η_2 ≥ η_1 > 0, are given. For k = 0, 1, ..., until termination, do:

1. Compute a step s_k that satisfies (2.2)-(2.4).

2. Compute r(x_k + s_k) and

ρ_k = [½ ‖r(x_k)‖² − ½ ‖r(x_k + s_k)‖²] / [½ ‖r(x_k)‖² − m_k(s_k)].   (2.5)

3. Set x_{k+1} = x_k + s_k if ρ_k ≥ η_1, and x_{k+1} = x_k otherwise.

4. Set

σ_{k+1} ∈ [σ_min, σ_k] if ρ_k > η_2 [very successful iteration],
σ_{k+1} ∈ [σ_k, γ_1 σ_k] if η_1 ≤ ρ_k ≤ η_2 [successful iteration],
σ_{k+1} ∈ [γ_1 σ_k, γ_2 σ_k] otherwise [unsuccessful iteration].   (2.6)

Note that we have not yet defined the condition required for ARC(S) to terminate. In [9, 10], we terminate ARC when ‖∇_x Φ(x_k)‖ ≤ ǫ, and possibly also λ_min(∇_{xx} Φ(x_k)) ≥ −ǫ, for a user-specified tolerance ǫ ∈ (0,1). Here, we will require that either some scaled gradient or the residual is within ǫ; this novel termination condition, specific to (1.1), is described in Section 3.1.
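The outer loop of Algorithm 2.1 can be sketched as follows. This is a simplified skeleton under the notation above: the inner solve of Step 1 is replaced by a single regularized Newton-type solve, so it does not enforce (2.2)-(2.4) exactly, and the termination test is the classical gradient one rather than that of Section 3.1.

```python
import numpy as np

def arc_s(x0, residual, jacobian, hess_phi, eps=1e-6,
          sigma0=1.0, sigma_min=1e-8, gamma1=2.0, gamma2=5.0,
          eta1=0.1, eta2=0.9, max_iter=500):
    # Skeleton of Algorithm 2.1 (ARC(S)). The inner solve below is a
    # placeholder for Step 1 and does NOT enforce (2.2)-(2.4) exactly.
    x, sigma = np.asarray(x0, dtype=float), sigma0
    for _ in range(max_iter):
        r, J = residual(x), jacobian(x)
        g = J.T @ r                      # gradient of Phi, cf. (1.2)
        if np.linalg.norm(g) <= eps:     # classical termination; see Section 3.1
            break
        B = hess_phi(x)                  # approximation B_k to the Hessian of Phi
        # Placeholder inner solve: one regularized Newton step for the model (2.1).
        s = np.linalg.solve(B + sigma * np.eye(len(x)), -g)
        phi_x = 0.5 * float(r @ r)
        r_new = residual(x + s)
        phi_new = 0.5 * float(r_new @ r_new)
        model = (phi_x + float(s @ g) + 0.5 * float(s @ (B @ s))
                 + sigma * np.linalg.norm(s) ** 3 / 3.0)
        rho = (phi_x - phi_new) / max(phi_x - model, 1e-16)  # ratio (2.5)
        if rho >= eta1:                  # (very) successful: accept the step
            x = x + s
        if rho > eta2:                   # very successful: sigma may decrease, (2.6)
            sigma = max(sigma_min, sigma / gamma1)
        elif rho < eta1:                 # unsuccessful: increase sigma, (2.6)
            sigma = gamma1 * sigma
        # For eta1 <= rho <= eta2, sigma stays in [sigma_k, gamma1*sigma_k]:
        # here it is simply left unchanged.
    return x
```

Since (2.6) only prescribes intervals for σ_{k+1}, the sketch picks one representative value in each interval.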

2.2 Assumptions and useful results

The following assumptions are chosen to ensure that those in [9, 10] are satisfied when ARC(S) is applied to (1.1), which allows us to employ some crucial ARC results from [9, 10] for (1.1). Let X be an open convex set containing all the generated iterates {x_k, x_k + s_k}, k ≥ 0. We assume that

AR.1: r_i ∈ C²(IR^n) and |r_i(x)| is uniformly bounded above on X, i ∈ {1,...,m}.   (2.7)

For each i ∈ {1,...,m}, the residuals r_i are Lipschitz continuous on X, namely,

AR.2: |r_i(x) − r_i(y)| ≤ κ_{r_i} ‖x − y‖, for all x, y ∈ X, and some κ_{r_i} ≥ 1.   (2.8)

This implies that r is Lipschitz continuous on X, with Lipschitz constant κ_r = ‖(κ_{r_1},...,κ_{r_m})‖ ≥ 1. We also assume that the Jacobian J of r is Lipschitz continuous on X, namely,

AR.3: ‖J(x) − J(y)‖ ≤ κ_J ‖x − y‖, for all x, y ∈ X, and some κ_J > 0.   (2.9)

Note that AR.1-AR.3 imply that the gradient ∇_x Φ given in (1.2) is Lipschitz continuous on X with Lipschitz constant

L_g = κ_r² + r_max κ_J ≥ 1,   (2.10)

where r_max > 0 denotes an upper bound on ‖r(x)‖, x ∈ X. (This is assumption AF.4 in [9, 10].) For each i ∈ {1,...,m}, the Hessian ∇_{xx} r_i is also assumed to be globally Lipschitz continuous on the path of all generated iterates, namely, there exists a constant L_i such that

AR.4: ‖∇_{xx} r_i(x) − ∇_{xx} r_i(x_k)‖ ≤ L_i ‖x − x_k‖, for all x ∈ [x_k, x_k + s_k] and all k ≥ 0.   (2.11)

Note that AR.1-AR.4 imply that the Hessian of Φ,

∇_{xx} Φ(x) = J(x)^T J(x) + Σ_{i=1}^m r_i(x) ∇_{xx} r_i(x),   (2.12)

is globally Lipschitz continuous on the path of all generated iterates, with Lipschitz constant

L = 2 κ_J κ_r + κ_J Σ_{i=1}^m κ_{r_i} + ‖r(x_0)‖ Σ_{i=1}^m L_i,   (2.13)

where we also used that ARC generates monotonically decreasing function values, so that ‖r(x_k)‖ ≤ ‖r(x_0)‖. (This is assumption AF.6 in [9, 10].)

Clearly, the values of the residual r(x_k) and its Jacobian J(x_k) are required to form the model (2.1) and the estimate (2.5). Thus, as B_k is an approximation to the Hessian of Φ in (2.12) at x_k, only the Hessian of each r_i needs to be approximated in B_k, and so it is natural to consider B_k to be of the form

B_k = J(x_k)^T J(x_k) + M_k,   (2.14)

where M_k approximates

H_Φ(x_k) = Σ_{i=1}^m r_i(x_k) ∇_{xx} r_i(x_k).   (2.15)

We require that M_k and H_Φ(x_k) in (2.15) agree along s_k in the sense that

AM.4: ‖(H_Φ(x_k) − M_k) s_k‖ ≤ C ‖s_k‖², for all k ≥ 0, and some constant C > 0.   (2.16)

This, (2.12) and (2.14) imply that

‖[∇_{xx} Φ(x_k) − B_k] s_k‖ ≤ C ‖s_k‖², for all k ≥ 0,   (2.17)

which is assumption AM.4 in [9, 10]. The condition AM.4 is trivially satisfied with C = 0 when we set M_k = H_Φ(x_k), i.e., B_k = ∇_{xx} Φ(x_k), for all k ≥ 0 in the ARC algorithm. The requirement (2.16) or (2.17) is a slight strengthening of the Dennis-Moré condition [14]. The latter is achieved by some quasi-Newton updates provided further assumptions hold (see the discussion following [10, (4.6)]). Quasi-Newton methods may still satisfy AM.4 in practice, though we are not aware if this can be ensured theoretically. We have shown in [7] that AM.4 can be achieved when B_k is approximated by (forward) finite differences of gradient values, without changing the order of the worst-case evaluation complexity bound as a function of the accuracy ǫ.

The first lemma recalls some useful ARC properties, crucial to the complexity bound in Section 3.2.

Lemma 2.1. Let AR.1-AR.4 and AM.4 hold, and apply Algorithm ARC(S) to (1.1). Then

σ_k ≥ (3/2)(L + C)  implies that  k is very successful,   (2.18)

and so

σ_k ≤ max(σ_0, (3/2) γ_2 (L + C)) = σ̄, for all k ≥ 0,   (2.19)

where L and C are defined in (2.13) and (2.16), respectively. Also, we have the function decrease

½ ‖r(x_k)‖² − ½ ‖r(x_{k+1})‖² ≥ α ‖J(x_{k+1})^T r(x_{k+1})‖^{3/2} for all successful iterations k,   (2.20)

where α = η_1 σ_min κ_g³ / 6 and where κ_g is the positive constant

κ_g = (1 − κ_θ) / (½ L + C + σ̄ + κ_θ L_g),   (2.21)

with κ_θ, σ̄ and L_g defined in (2.4), (2.19) and (2.10), respectively.

Proof. The relation (2.18) and the bound (2.19) both follow from [10, Lemma 5.2], and (2.20) from (2.5), σ_k ≥ σ_min (due to (2.6)), [10, Lemma 3.3] and [9, Lemma 5.2].

Relating successful and total iteration counts. The total number of (major) ARC iterations is the same as the number of residual/function evaluations (as we also need to evaluate r on unsuccessful iterations in order to be able to compute ρ_k in (2.5)), while the number of successful ARC iterations is the same as that of Jacobian/gradient evaluations. Let us introduce some useful notation. Throughout, denote the index set

S = {k ≥ 0 : k successful or very successful in the sense of (2.6)},   (2.22)

and, given any j ≥ 0, let

S_j = {k ≤ j : k ∈ S},   (2.23)

with |S_j| denoting the cardinality of the latter. The lower bound on σ_k and the construction of Steps 2-4 of ARC(S) allow us to quantify the total iteration count as a function of the successful ones.

Theorem 2.2. For any fixed j ≥ 0, let S_j be defined in (2.23). Assume that there exists σ̄ > 0 such that

σ_k ≤ σ̄, for all k ≤ j.   (2.24)

Then

j ≤ [1 + (2/log γ_1) log(σ̄/σ_min)] |S_j|.   (2.25)

Proof. The updates (2.6) imply that σ_k ≥ σ_min for all k. Now apply [9, Theorem 2.1], namely, the bound [9, (2.14)] on the number of unsuccessful iterations up to j, and use the fact that the unsuccessful iterations up to j together with S_j form a partition of {0,...,j}.

Values for σ̄ in (2.24) are provided in (2.19), under the assumptions of Lemma 2.1. Thus, based on Theorem 2.2, it remains to bound the successful iteration count |S_j|, since the total iteration count up to j is of the same order in ǫ as |S_j|.

3 Evaluation complexity of cubic regularization for potentially rank-deficient nonlinear least-squares problems

3.1 A suitable termination condition for ARC(S)

Here, we depart from the standard choice of termination criterion for derivative-based optimization algorithms such as ARC(S) when applied to (1.1), namely, requiring a sufficiently small gradient,

‖∇_x Φ(x_k)‖ = ‖J(x_k)^T r(x_k)‖ ≤ ǫ,

where ǫ > 0 is the user-specified accuracy tolerance. Such a condition is only guaranteed to provide an approximate zero of the residual r when J(x) is uniformly full-rank and a lower bound on its smallest singular values is known, which are limiting assumptions. Such assumptions are not required for steepest-descent-like methods if appropriate optimality measures are employed [3, 8], but the complexity of such methods is worse than that of the best second-order methods [8, 11]. Thus, we introduce a termination condition that can distinguish between the zero- and non-zero-residual cases automatically/implicitly. We propose the following termination for ARC(S):

termination: ‖r(x_k)‖ ≤ ǫ_p or ‖g_r(x_k)‖ ≤ ǫ_d,   (3.1)

where ǫ_p > 0 and ǫ_d > 0 are the required accuracy tolerances and where

g_r(x) = J(x)^T r(x) / ‖r(x)‖, whenever r(x) ≠ 0;  g_r(x) = 0, otherwise.   (3.2)

Note that the scaled gradient g_r(x) in (3.2) is precisely the gradient of ‖r(x)‖ whenever r(x) ≠ 0. If r(x) = 0, we are at the global minimum of ‖r‖ and so g_r(x) = 0 ∈ ∂(‖r(x)‖) [19, §VI.3]. In the termination condition (3.1), the scaled gradient ‖g_r(x_k)‖ may be bounded away from zero, for instance, when the singular values of the Jacobian are uniformly bounded away from zero; then, as we show in the next section, the residual values converge to zero, and so (3.1) can be achieved. When the iterates approach a nonzero residual value, then g_r converges to zero and so, again, (3.1) can be satisfied. Another suitable termination condition with similar properties is given after the main result in the next section.

In the next section, we show that ARC(S) can generate either an approximate scaled gradient or residual value within ǫ in at most O(ǫ^{-3/2}) residual evaluations, thus preserving the (optimal) order of the bound for cubic regularization.
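A direct rendering of the scaled gradient (3.2) and the termination test (3.1) follows; this is a minimal sketch under the paper's definitions.

```python
import numpy as np

def scaled_gradient(r, J):
    # Scaled gradient g_r(x) of (3.2): the gradient of ||r(x)|| when r(x) != 0.
    norm_r = np.linalg.norm(r)
    if norm_r == 0.0:
        return np.zeros(J.shape[1])  # at a global minimizer of ||r||
    return (J.T @ r) / norm_r

def terminate(r, J, eps_p, eps_d):
    # Termination condition (3.1): small residual OR small scaled gradient.
    return (np.linalg.norm(r) <= eps_p
            or np.linalg.norm(scaled_gradient(r, J)) <= eps_d)
```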

3.2 Evaluation complexity of ARC(S) with termination condition (3.1)

The first lemma exploits (2.20) to give new lower bounds on the function decrease that depend on the residual and the scaled gradient (3.2); the bounds below will also be used for the constrained case.

Lemma 3.1. Let AR.1-AR.4 and AM.4 hold, and apply the ARC(S) algorithm to (1.1). Then, for all successful iterations k for which r(x_k) ≠ 0, we have

‖r(x_k)‖ − ‖r(x_{k+1})‖ ≥ min{ αβ^{3/2} ‖g_r(x_{k+1})‖^{3/2} ‖r(x_k)‖^{1/2}, (1 − β) ‖r(x_k)‖ }   (3.3)

and

‖r(x_k)‖^{1/2} − ‖r(x_{k+1})‖^{1/2} ≥ min{ ½ αβ^{3/2} ‖g_r(x_{k+1})‖^{3/2}, (β^{-1/2} − 1) ‖r(x_{k+1})‖^{1/2} },   (3.4)

where α is defined just after (2.20) and β ∈ (0,1) is any fixed problem-independent constant.

Proof. Suppose that r(x_k) ≠ 0, let β ∈ (0,1) and denote

S_β = {k ∈ S : ‖r(x_{k+1})‖ > β ‖r(x_k)‖},   (3.5)

where S is defined in (2.22). We first analyze the function decrease for iterations k ∈ S_β and then for the ones in S \ S_β. Let k ∈ S_β; then r(x_{k+1}) ≠ 0 since r(x_k) ≠ 0. From (2.20), (3.2) and (3.5), we deduce

‖r(x_k)‖² − ‖r(x_{k+1})‖² ≥ 2α ‖J(x_{k+1})^T r(x_{k+1})‖^{3/2} = 2α [‖J(x_{k+1})^T r(x_{k+1})‖ / ‖r(x_{k+1})‖]^{3/2} ‖r(x_{k+1})‖^{3/2} = 2α ‖g_r(x_{k+1})‖^{3/2} ‖r(x_{k+1})‖^{3/2} ≥ 2αβ^{3/2} ‖g_r(x_{k+1})‖^{3/2} ‖r(x_k)‖^{3/2}.   (3.6)

Conjugacy properties and the monotonicity relation ‖r(x_k)‖ ≥ ‖r(x_{k+1})‖ give

‖r(x_k)‖ − ‖r(x_{k+1})‖ = [‖r(x_k)‖² − ‖r(x_{k+1})‖²] / [‖r(x_k)‖ + ‖r(x_{k+1})‖] ≥ [‖r(x_k)‖² − ‖r(x_{k+1})‖²] / [2 ‖r(x_k)‖]   (3.7)

and furthermore

‖r(x_k)‖^{1/2} − ‖r(x_{k+1})‖^{1/2} = [‖r(x_k)‖ − ‖r(x_{k+1})‖] / [‖r(x_k)‖^{1/2} + ‖r(x_{k+1})‖^{1/2}] ≥ [‖r(x_k)‖² − ‖r(x_{k+1})‖²] / [4 ‖r(x_k)‖^{3/2}].   (3.8)

Employing the last inequality in (3.6) into (3.7) and (3.8), respectively, we obtain

‖r(x_k)‖ − ‖r(x_{k+1})‖ ≥ αβ^{3/2} ‖g_r(x_{k+1})‖^{3/2} ‖r(x_k)‖^{1/2}, for all k ∈ S_β,   (3.9)

and

‖r(x_k)‖^{1/2} − ‖r(x_{k+1})‖^{1/2} ≥ (αβ^{3/2}/2) ‖g_r(x_{k+1})‖^{3/2}, for all k ∈ S_β.   (3.10)

Conversely, let k ∈ S \ S_β, which gives

‖r(x_{k+1})‖ ≤ β ‖r(x_k)‖,   (3.11)

and so the residual values decrease linearly on such iterations. It follows from (3.11) that on such iterations we have the following function decreases:

‖r(x_k)‖ − ‖r(x_{k+1})‖ ≥ (1 − β) ‖r(x_k)‖ for all k ∈ S \ S_β,   (3.12)

and

‖r(x_k)‖^{1/2} − ‖r(x_{k+1})‖^{1/2} ≥ (1 − β^{1/2}) ‖r(x_k)‖^{1/2} ≥ (β^{-1/2} − 1) ‖r(x_{k+1})‖^{1/2} for all k ∈ S \ S_β.   (3.13)

(Note that (3.12) and (3.13) continue to hold if r(x_{k+1}) = 0.) The bound (3.3) now follows from (3.9) and (3.12), and (3.4) from (3.10) and (3.13).

The next theorem gives a general evaluation complexity result for ARC(S) applied to (1.1) when the termination condition (3.1) is employed.

Theorem 3.2. Let AR.1-AR.4 and AM.4 hold, and let ǫ_p, ǫ_d ∈ (0,1). Consider applying the ARC(S) algorithm with the termination condition (3.1) to minimizing (1.1). Then ARC(S) terminates after at most

⌈ max{κ_1 ǫ_d^{-3/2}, κ_2 ǫ_p^{-1/2}} ⌉ + 1   (3.14)

successful iterations (or equivalently, Jacobian evaluations) and at most

⌈ κ_S max{κ_1 ǫ_d^{-3/2}, κ_2 ǫ_p^{-1/2}} ⌉ + 1   (3.15)

total (successful and unsuccessful) iterations (or equivalently, residual evaluations), where

κ_1 = 2 ‖r(x_0)‖^{1/2} α^{-1} β^{-3/2},  κ_2 = ‖r(x_0)‖^{1/2} (β^{-1/2} − 1)^{-1},   (3.16)

κ_S = 2 (1 + κ^u_S) and κ^u_S = 2 log(σ̄/σ_min) / log γ_1,   (3.17)

with α defined just after (2.20), σ̄ in (2.19), and β ∈ (0,1) a fixed problem-independent constant.

Proof. Clearly, if (3.1) is satisfied at the starting point, there is nothing left to prove. Assume now that (3.1) fails at k = 0. For any iteration (k+1) at which ARC(S) does not terminate, it follows from (3.1) that we have

‖r(x_{k+1})‖ > ǫ_p and ‖g_r(x_{k+1})‖ > ǫ_d.   (3.18)

From (3.4) and (3.18), we deduce

‖r(x_k)‖^{1/2} − ‖r(x_{k+1})‖^{1/2} ≥ min{ ½ αβ^{3/2} ǫ_d^{3/2}, (β^{-1/2} − 1) ǫ_p^{1/2} } for all k ∈ S for which (3.18) holds.   (3.19)

Summing up (3.19) over all iterations k ∈ S for which (3.18) holds, with say j_ǫ as the largest index, and using that the ARC(S) iterates remain unchanged over unsuccessful iterations, we obtain

‖r(x_0)‖^{1/2} − ‖r(x_{j_ǫ})‖^{1/2} = Σ_{k=0, k∈S}^{j_ǫ−1} [‖r(x_k)‖^{1/2} − ‖r(x_{k+1})‖^{1/2}] ≥ |S_ǫ| min{ ½ αβ^{3/2} ǫ_d^{3/2}, (β^{-1/2} − 1) ǫ_p^{1/2} },   (3.20)

where |S_ǫ| denotes the number of successful iterations up to iteration j_ǫ. Using that ‖r(x_{j_ǫ})‖^{1/2} ≥ 0, we further obtain from (3.20) that j_ǫ < ∞ and that

|S_ǫ| ≤ ‖r(x_0)‖^{1/2} / min{ ½ αβ^{3/2} ǫ_d^{3/2}, (β^{-1/2} − 1) ǫ_p^{1/2} },

which gives (3.14) since |S_ǫ| must be an integer and since the termination condition is checked at the next iteration; see [9, (5.21), (5.22)] for full details. To derive (3.15), apply Theorem 2.2 with j = j_ǫ and σ̄ defined in (2.19), and use also that ǫ_p, ǫ_d ∈ (0,1).

The next corollary gives the main complexity result of this section, whose proof follows immediately from Theorem 3.2. It shows that the evaluation complexity of ARC(S) driving either ‖r(x)‖ or its gradient below ǫ is O(ǫ^{-3/2}), an improvement on existing ARC(S) results, which can only ensure that the gradient of ½‖r(x)‖² goes below ǫ in that same-order number of evaluations.

Corollary 3.3. Let AR.1-AR.4 and AM.4 hold, and let ǫ = min{ǫ_p, ǫ_d} ∈ (0,1). Consider applying the ARC(S) algorithm with the termination condition (3.1) to minimizing (1.1). Then ARC(S) terminates after at most

⌈ κ^s_S ǫ^{-3/2} ⌉ + 1   (3.21)

successful iterations (or equivalently, Jacobian evaluations) and at most

⌈ κ_S κ^s_S ǫ^{-3/2} ⌉ + 1   (3.22)

total (successful and unsuccessful) iterations (or equivalently, residual evaluations), where

κ^s_S = ‖r(x_0)‖^{1/2} / min{ ½ αβ^{3/2}, β^{-1/2} − 1 },   (3.23)

with α defined just after (2.20), κ_S in (3.17), and β ∈ (0,1) a fixed problem-independent constant.

Some remarks on the above theorem/corollary and its proof follow.

Note that in the non-zero-residual case, namely, when {‖r(x_k)‖} converges to some r_* > 0, the monotonicity of this sequence implies that ‖r(x_{k+1})‖ ≥ β ‖r(x_k)‖ for all k, with β = r_*/‖r(x_0)‖ ∈ (0,1). Thus in this case, there is no need to consider the iterations (3.11) of faster linear convergence.

The function decrease in (3.4) implies that, instead of (3.1), we could have used the condition

termination2: ‖r(x_k)‖^{1/3} ≤ ǫ_p or ‖g_r(x_k)‖ ≤ ǫ_d,   (3.24)

as termination for the ARC(S) algorithm, without changing the order of the complexity bound as a function of (ǫ_p, ǫ_d), or even of ǫ = min{ǫ_p, ǫ_d}. In fact, using the condition (3.24) improves the bound/accuracy for the residual values reaching within ǫ_p.

Note that the bound (3.14) is a bound on the total number of successful iterations for which (3.18) holds. Thus, despite the measure (3.1) being non-monotonic, after (3.14) iterations are taken, this measure would remain below (ǫ_p, ǫ_d) for the remaining ARC(S) iterations, if any are taken.

The use of conjugacy in the above proof is reminiscent of re-weighted least-squares techniques [23]. However, our attempts at applying (some modified) ARC to such variants of (1.1) have not been successful.

3.3 Is the bound (3.15) sharp for the nonlinear least-squares problem (1.1)?

Recall the example in [11, §5] that shows that ARC(S) takes essentially ǫ^{-3/2} iterations/evaluations to ensure that the norm of the gradient is less than ǫ. The univariate function f : IR -> IR in question is positive for all x ≥ 0 and at the iterates, and it is zero at infinity, the minimum to which ARC(S) converges. Thus this example can be viewed as a least-squares, zero-residual problem, with r in (1.1) defined as r = √f. It shows that ARC(S), with the termination condition that the absolute value of (1.2) (which in this case is precisely the first derivative of f) be less than ǫ, takes essentially ǫ^{-3/2} iterations/evaluations, and so the ARC(S) complexity bound is sharp for nonlinear least-squares. (Note that although r(x) = √(f(x)) and its derivatives may not be globally Lipschitz continuous as x -> ∞, the first and second derivatives of r² = f have this property, as we have shown in [11, §5]. The latter conditions are sufficient for the O(ǫ^{-3/2}) bound to hold for ARC(S).) It is unclear whether the bound (3.15) for ARC(S) with the termination condition (3.1) is also sharp.

3.4 Further improving the evaluation complexity of cubic regularization for nonlinear least-squares with special structure

Suppose that r(x) in (1.1) is gradient-dominated of degree 2 [22], namely,

‖J(x)^T r(x)‖ / ‖r(x)‖ ≥ σ_min(J(x)) ≥ τ_2 > 0, for all x ∈ IR^n,   (3.25)

where σ_min(J(x)) denotes the smallest singular value of J(x); this implies that ‖g_r‖ in (3.1) is bounded away from zero for all r(x) ≠ 0. Then, under the conditions of Theorem 3.2, one can deduce from (3.4) and (3.19) that ‖r(x_k)‖ must converge to zero as k -> ∞, and that the asymptotic rate of this convergence is superlinear (i.e., linear with any convergence factor β ∈ (0,1)); also, the algorithm takes a (problem-dependent) constant number of steps to enter this region of superlinear convergence. We do not give the details of this result here, as a (slightly stronger) result of this form, where the size of the neighbourhood of fast local convergence does not depend on β and ‖r(x_0)‖ enters the complexity bound in a polynomial way, was given in [22, Theorem 7] for cubic regularization; the latter result continues to hold here for ARC(S) when applied to problems which we know a priori satisfy (3.25), since then (3.1) is no longer required explicitly. An advantage of our (slightly weaker) approach here is that the termination condition (3.1) senses naturally when (3.25) holds and ensures ARC(S) behaves accordingly.

Similarly, assume now that the smallest singular value of the Jacobian of r(x) converges to zero at the same rate as r(x), or that there exists τ_1 > 0 such that ‖J(x)^T r(x)‖/‖r(x)‖ ≥ τ_1 ‖r(x)‖ for all x, which is the same as r(x) being gradient-dominated of degree 1 [22].
Then again we can deduce improved complexity bounds from (3.4), in the same vein as [22, Theorem 6], giving that ARC(S) requires at most O(ǫ^{-1}) evaluations to ensure ‖r(x_k)‖ ≤ ǫ. (Note the understandably weaker bound in this case, since we minimize the square of the residual, when compared to the ARC bound of order O(ǫ^{-1/2}) for minimizing general unconstrained gradient-dominated functions of degree 1 [5, 22].) The cases of gradient-dominated residuals of some intermediate degree, with value between 1 and 2, can be similarly analyzed, yielding improvements over the bound (3.15).

4 The ShS-ARC algorithm for equality-constrained problems

Consider now the equality-constrained problem

minimize f(x) such that c(x) = 0,   (4.1)

where f : IR^n -> IR and c : IR^n -> IR^m with m ≤ n. We assume that

AC.1: The function c is twice continuously differentiable on IR^n and f is twice continuously differentiable in an open neighbourhood of C_1 = {x ∈ IR^n : ‖c(x)‖ ≤ κ_c}, where κ_c > 0 is a (small) constant.   (4.2)

The algorithm we now describe consists of two phases; see Figure 4.1 (a). In the first, ARC(S) with termination condition (3.1) is applied to (1.1) with r = c, so as to minimize ½‖c(x)‖² (independently of the objective function f), resulting in a point which is either (approximately) feasible, or is an approximate infeasible stationary point of ‖c(x)‖. The latter outcome is not desirable if one wishes to solve (4.1), but cannot be avoided by any algorithm not relying on global minimization, or if C_1 is empty. If an (approximately) feasible point has been found, Phase 2 of the algorithm then performs short cubic regularization steps for a parametrized family of least-squares functions so long as first-order criticality is not attained. These steps are computed by attempting to preserve approximate feasibility of the iterates while producing values of the objective function that are close to a sequence of decreasing targets. To be specific, one or more ARC(S) iterations are applied to minimizing the least-squares function Φ(x,t) = ½‖r(x,t)‖² with respect to x, where

r(x,t) = ( c(x), f(x) − t )   (4.3)

and where t is a target value for f(x). Clearly, the Jacobian A(x,t) of the residual function r(x,t) in (4.3) satisfies

A(x,t) = A(x) = [ J(x) ; g(x)^T ],   (4.4)

where J(x) is the Jacobian of the constraint function c(x) and g(x) is the gradient of f(x). Thus ∇_x Φ(x,t) = A(x,t)^T r(x,t), and the scaled gradient (3.2) has the expression

g_r(x,t) = A(x,t)^T r(x,t) / ‖r(x,t)‖ = [J(x)^T c(x) + (f(x) − t) g(x)] / ‖r(x,t)‖, whenever r(x,t) ≠ 0;  g_r(x,t) = 0, otherwise.   (4.5)

We are now ready to summarize our Short-Step ARC (ShS-ARC) algorithm.

Algorithm 4.1: The Short-Step ARC (ShS-ARC) algorithm for (4.1).

A starting point x_0, initial regularization parameters σ_0 and σ_1 and a minimal one σ_min such that min{σ_0, σ_1} ≥ σ_min > 0, algorithmic parameters γ_2 ≥ γ_1 > 1 and 1 > η_2 ≥ η_1 > 0, as well as the tolerances ǫ_p ∈ (0,1) and ǫ_d ∈ (0,1), are given.

Phase 1: Starting from x_0, apply ARC(S) to minimize ½‖c(x)‖² until a point x_1 is found such that (3.1) is satisfied, namely,

‖c(x_1)‖ ≤ ǫ_p or ‖J(x_1)^T c(x_1)‖ / ‖c(x_1)‖ ≤ ǫ_d.   (4.6)

If ‖c(x_1)‖ > ǫ_p, terminate [locally infeasible].

Phase 2:

1. Set t_1 = f(x_1) − √(ǫ_p² − ‖c(x_1)‖²) and k = 1.

2. For k = 1, 2, ..., do:

2a. Starting from x_k, apply one iteration of ARC(S) to approximately minimize ½‖r(x,t_k)‖² in (4.3).

2b. If ‖g_r(x_{k+1}, t_k)‖ ≤ ǫ_d, terminate.

2c. If ρ_k ≥ η_1, set

t_{k+1} = f(x_{k+1}) − √( ‖r(x_k,t_k)‖² − ‖r(x_{k+1},t_k)‖² + (f(x_{k+1}) − t_k)² ).   (4.7)

Otherwise, set t_{k+1} = t_k.

Note that the monotonicity property of the ARC(S) iterates [10, (2.5), (3.19)] generated in Step 2a of Phase 2 of ShS-ARC provides

‖r(x_k,t_k)‖ ≥ ‖r(x_{k+1},t_k)‖ for all k ≥ 1,   (4.8)

and so the updating procedure for t_k in (4.7) is well defined. Furthermore, (4.7) gives

t_k − t_{k+1} = −(f(x_{k+1}) − t_k) + √( ‖r(x_k,t_k)‖² − ‖r(x_{k+1},t_k)‖² + (f(x_{k+1}) − t_k)² ),   (4.9)

for any successful k ≥ 1, which we use to show next that the target values t_k decrease monotonically.

Lemma 4.1. In Phase 2 of the ShS-ARC algorithm, the target values satisfy

t_k ≥ t_{k+1} for all k ≥ 1.   (4.10)

Proof. Due to (4.9), (4.10) follows immediately in the case when f(x_{k+1}) ≤ t_k. Otherwise, when f(x_{k+1}) > t_k, conjugacy properties and (4.9) give

t_k − t_{k+1} = [‖r(x_k,t_k)‖² − ‖r(x_{k+1},t_k)‖²] / [ (f(x_{k+1}) − t_k) + √(‖r(x_k,t_k)‖² − ‖r(x_{k+1},t_k)‖² + (f(x_{k+1}) − t_k)²) ] ≥ 0,

where in the last inequality we also used (4.8).

Phase 2 of ShS-ARC terminates when

‖g_r(x_{k+1}, t_k)‖ ≤ ǫ_d,   (4.11)

where g_r is defined in (4.5) and ǫ_d ∈ (0,1) is fixed at the start of the algorithm. Allowing different primal and dual accuracy tolerances makes sense if one considers the possibly different scalings of the (primal) residuals and (dual) gradients. The latter may occur, for instance, when the Jacobian A(x) in (4.4) is not full rank, which is the case at KKT points of (4.1).
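A literal sketch of the Phase 2 target update (4.7) follows; this is a minimal rendering under the paper's definitions, with the residual r(x,t) of (4.3) stacked as (c(x), f(x) − t).

```python
import numpy as np

def stacked_residual(c_val, f_val, t):
    # r(x,t) of (4.3): constraint values stacked with the shifted objective f(x) - t.
    return np.concatenate([np.asarray(c_val, dtype=float), [f_val - t]])

def update_target(r_k_sq, r_k1_sq, f_new, t_k):
    # Target update (4.7) on a successful iteration:
    #   t_{k+1} = f(x_{k+1}) - sqrt(||r(x_k,t_k)||^2 - ||r(x_{k+1},t_k)||^2
    #                               + (f(x_{k+1}) - t_k)^2).
    # Well defined by the monotonicity (4.8): r_k_sq >= r_k1_sq.
    gap = f_new - t_k
    return f_new - np.sqrt(r_k_sq - r_k1_sq + gap ** 2)
```

By (4.9) and Lemma 4.1, the value returned by update_target never exceeds t_k.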

The next lemma connects (4.11) to relative KKT points of (4.1) and to approximate critical points of the feasibility measure ‖c(x)‖.

Lemma 4.2. For some (x,t), assume that the scaled gradient (4.5) of r(x,t) in (4.3) satisfies

‖g_r(x,t)‖ = ‖J(x)^T c(x) + (f(x) − t) g(x)‖ / ‖r(x,t)‖ ≤ ǫ_d.   (4.12)

Then either

‖J(x)^T c(x)‖ / ‖c(x)‖ ≤ ǫ_d,   (4.13)

or

‖J(x)^T y(x,t) + g(x)‖ / ‖(y(x,t), 1)‖ ≤ ǫ_d, with y(x,t) = c(x) / (f(x) − t).   (4.14)

Proof. We distinguish two possible cases. Firstly, assume that f(x) = t. Then (4.3) and (4.12) straightforwardly imply (4.13). Alternatively, we must have that f(x) ≠ t. This allows us to divide the numerator of (4.12) by |f(x) − t|, which then provides, also using ‖r(x,t)‖ = ‖(c(x), f(x) − t)‖,

‖J(x)^T c(x)/(f(x) − t) + g(x)‖ ≤ [‖r(x,t)‖ / |f(x) − t|] ǫ_d = ‖( c(x)/(f(x) − t), 1 )‖ ǫ_d,

which gives (4.14).

The condition (4.14) is an instance of the relative dual KKT stopping criterion

‖J(x)^T y + g(x)‖ / ‖(y, 1)‖ ≤ ǫ_d,   (4.15)

for some multiplier y ∈ IR^m. The relative error condition (4.15) can be justified by means of a perturbation argument. Namely, considering the perturbations x = x_* + δx and y = y_* + δy to some KKT point x_* and to a corresponding multiplier y_*, a Taylor expansion and the KKT condition J(x_*)^T y_* + g(x_*) = 0 give that the perturbed dual feasibility residual J(x)^T y + g(x) is, to first order, [H(x_*) + Σ_{i=1}^m (y_*)_i ∇_{xx} c_i(x_*)] δx + J(x_*)^T δy. The presence of the multiplier y_* in the latter remainder illustrates that the size of the multiplier should not be ignored when measuring KKT equation residuals.

If we enter Phase 2 of ShS-ARC, we have ‖c(x_1)‖ ≤ ǫ_p. We show in the next section that, in fact, we remain sufficiently close to the constraints for all subsequent iterates so that ‖c(x_k)‖ ≤ ǫ_p. This and Lemma 4.2 imply that when the ShS-ARC algorithm terminates with (4.11), then either we are close to a feasible critical point of the feasibility measure ‖c(x)‖ or we are close to a (relative) KKT point of (4.1).

In the next section, we establish that ShS-ARC remains close to the constraints at each step, and that the target values t_k decrease by a fixed amount at each iteration. Thus either (4.11) holds for some k, and so we are approximately critical for (4.1) or for the constraints, or the targets reach f_*, the global minimum of f over the set of constraints, in which case again (4.11) must hold. Thus ShS-ARC will terminate; furthermore, when ǫ_p = ǫ and ǫ_d = ǫ^{2/3}, its worst-case evaluation complexity is O(ǫ^{-3/2}), just like in the unconstrained case.

5 Complexity of the ShS-ARC algorithm for the equality-constrained problem

Before analyzing the complexity of Algorithm ShS-ARC, we state our assumptions formally (in addition to AC.1):

AC.2: The norm of c is uniformly bounded above on IR^n by c_up and its Jacobian J(x) is globally Lipschitz continuous on IR^n with Lipschitz constant L_J > 0. The components c_i and ∇_{xx} c_i(x) are globally Lipschitz continuous on IR^n with Lipschitz constants L_{c_i} > 0 and L_{H,c_i}, for i ∈ {1,...,m}.

[Figure 4.1 (plot omitted in this transcription): (a) Illustration of ShS-ARC Phases 1 and 2. (b) A successful iteration of ShS-ARC's Phase 2 in the case where ǫ_p = ǫ and ǫ_d = ǫ^{2/3}.]

AC.3: f(x), g(x) and ∇_{xx} f(x) are Lipschitz continuous in C_1 (defined in (4.2)), with Lipschitz constants L_f, L_{g,f} > 0 and L_{H,f}, respectively.

AC.4: The objective f(x) is bounded above and below in C_1; that is, there exist constants f_low and f_up such that f_low ≤ f(x) ≤ f_up for all x ∈ C_1.

The assumptions AC.1-AC.4, the construction of ShS-ARC and (4.10) imply that AR.1-AR.4 hold for each of the least-squares functions that we employ in ShS-ARC, namely, ½‖c(x)‖² and ½‖r(x,t_k)‖² for k ≥ 1; furthermore, the resulting constants are independent of k. In particular, the corresponding values of L_g in (2.10) for ½‖c(x)‖² and ½‖r(x,t_k)‖² are, respectively,

L_{g,1} = L_c² + c_up L_J and L_{g,2} = ‖(L_c, L_f)‖² + max{c_up, f_up + |t_1|} ‖(L_J, L_{g,f})‖,   (5.1)

where L_c = ‖(L_{c_1},...,L_{c_m})‖ is the Lipschitz constant of c, while the corresponding values of L in (2.13) for ½‖c(x)‖² and ½‖r(x,t_k)‖² are, respectively,

L_1 = 2 L_J L_c + L_J Σ_{i=1}^m L_{c_i} + ‖c(x_0)‖ Σ_{i=1}^m L_{H,c_i},   (5.2)

and

L_2 = ‖(L_J, L_{g,f})‖ ( 2 ‖(L_c, L_f)‖ + L_f + Σ_{i=1}^m L_{c_i} ) + max{c_up, f_up + |t_1|} ( L_{H,f} + Σ_{i=1}^m L_{H,c_i} ).   (5.3)

The next lemma shows that Phase 2 of ShS-ARC consists of (at most) a constant number of unsuccessful ARC(S) steps followed by a successful one for minimizing ½‖r(x,t_k)‖² for fixed t_k, after which t_k is decreased according to (4.7).

Lemma 5.1. Let AC.1-AC.4 hold, as well as AM.4 for the Hessian of ½‖r(x,t_k)‖² and its approximation. Then the Phase 2 iterations of the ShS-ARC algorithm satisfy

σ_k ≤ max(σ_1, (3/2) γ_2 (L_2 + C)) = σ_sh, for all k ≥ 1,   (5.4)

where L_2 is defined in (5.3). Also, at most

L_sh = ⌈ 1 + (2/log γ_1) log(σ_sh/σ_min) ⌉   (5.5)

ShS-ARC/ARC(S) iterations are performed for each distinct target value t_k.

Proof. The implication (2.18) in Lemma 2.1 directly applies to the Phase 2 iterations of ShS-ARC, with constants L = L_2 defined in (5.3) and C given in AM.4, independent of k. The construction of a Phase 2 iteration of ShS-ARC and (2.6) imply that whenever σ_k is large in the sense of (2.18), we have σ_{k+1} ≤ σ_k. Thus (5.4) follows, noting that the factor γ_2 in σ_sh is allowed for the case when σ_k is only slightly less than 3(L_2 + C)/2 and k is not very successful, while the term σ_1 in (5.4) accounts for choices at the start of Phase 2. Note that Theorem 2.2 directly applies to the Phase 2 iterations of ShS-ARC that employ the same target value t_k. Thus the bound (5.5) follows directly from (2.25), (5.4), the use of parameters γ_1 and σ_min in Phase 2 of ShS-ARC, as well as the fact that we only take one successful ShS-ARC/ARC(S) iteration for each fixed t_k (and so, here, |S_j| = 1 in (2.25)).

The next lemma gives an auxiliary result to be used in Lemma 5.3.

Lemma 5.2. Consider the following optimization problem in two variables:

min_{(f,c) ∈ IR²} F(f,c) = f + √(ǫ² − c²) subject to f² + c² ≤ α²,   (5.6)

where 0 < α < ǫ. The global minimum of (5.6) is attained at (f_*, c_*) = (−α, 0) and it is given by F(f_*, c_*) = −α + ǫ.

Proof. Note that for any feasible (f,c), √(ǫ² − c²) > 0 since α < ǫ. We have

∇F(f,c) = ( 1, −c/√(ǫ² − c²) ) ≠ 0, for all feasible (f,c).

Thus the solution of (5.6) is attained on the boundary of the feasible region, namely, f² + c² = α². Also, (f_*, c_*) satisfies the KKT conditions for (5.6), namely,

1 + 2λ_* f_* = 0,  −c_*/√(ǫ² − c_*²) + 2λ_* c_* = 0,  λ_* ≥ 0 and f_*² + c_*² = α²,

where λ_* is the multiplier at the minimizer (f_*, c_*). The first condition above implies that f_* ≠ 0 and λ_* ≠ 0. Thus λ_* = −1/(2f_*), which, substituted into the second KKT condition, gives

c_* ( 1/f_* + 1/√(ǫ² − c_*²) ) = 0.

Note that f_* = −√(ǫ² − c_*²) is impossible since f_*² + c_*² = α² and α < ǫ. Thus c_* = 0, and f_* = ±α. Since we are minimizing, the smallest value of F(f_*, c_*) is attained at f_* = −α.
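As a quick numerical sanity check of Lemma 5.2 (not part of the paper), one can compare the claimed minimizer against a brute-force grid search over the disc; the parameter values below are arbitrary.

```python
import numpy as np

# Brute-force check of Lemma 5.2: minimize F(f,c) = f + sqrt(eps^2 - c^2)
# over the disc f^2 + c^2 <= alpha^2, with 0 < alpha < eps.
eps, alpha = 0.5, 0.2
fs = np.linspace(-alpha, alpha, 2001)
cs = np.linspace(-alpha, alpha, 2001)
F, C = np.meshgrid(fs, cs, indexing="ij")
feasible = F ** 2 + C ** 2 <= alpha ** 2
vals = np.where(feasible, F + np.sqrt(eps ** 2 - C ** 2), np.inf)
i, j = np.unravel_index(np.argmin(vals), vals.shape)
print(fs[i], cs[j], vals[i, j])  # approximately (-alpha, 0, eps - alpha)
```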

The next lemma proves the crucial observation that all Phase 2 iterates remain (approximately) feasible, and that the targets t_k decrease by a quantity bounded below by a multiple of ǫ_d^{3/2} ǫ_p^{1/2} at every successful iteration k until termination.

Lemma 5.3. Suppose that AC.1-AC.4 hold, as well as AM.4 for the Hessian of ½‖r(x,t_k)‖² and its approximation. For every Phase 2 iteration k ≥ 1 of the ShS-ARC algorithm for which (4.11) fails, we have that

f(x_{k+1}) − t_{k+1} ≥ 0,   (5.7)

‖r(x_k, t_k)‖ = ǫ_p,   (5.8)

and hence

‖c(x_k)‖ ≤ ǫ_p and |f(x_k) − t_k| ≤ ǫ_p.   (5.9)

Moreover, if iteration k is successful and ǫ_d ≤ ǫ_p^{1/3}, then

t_k − t_{k+1} ≥ κ_t ǫ_d^{3/2} ǫ_p^{1/2}   (5.10)

for some problem-dependent constant κ_t = min{αβ^{3/2}, 1 − β}, where β ∈ (0,1) is any fixed problem-independent constant, α = η_1 σ_min κ_{g,r}³ / 6 and

κ_{g,r} = (1 − κ_θ) / (½ L_2 + C + σ_sh + κ_θ L_{g,2}),   (5.11)

with κ_θ, L_2, C, σ_sh and L_{g,2} defined in (2.4), (5.3), (2.16), (5.4) and (5.1), respectively.

Proof. We start by observing that (5.7) immediately follows from (4.7) and (4.8). Also, (5.9) follows straightforwardly from (5.8). Next, we prove (5.8) by induction on k. Firstly, note that this identity holds by construction for k = 1. Assume now that iteration k ≥ 1 is successful and that

‖r(x_k, t_k)‖ = ǫ_p.   (5.12)

Then

(f(x_{k+1}) − t_{k+1})² = ‖r(x_k,t_k)‖² − ‖r(x_{k+1},t_k)‖² + (f(x_{k+1}) − t_k)² − (f(x_{k+1}) − t_k)² + (f(x_{k+1}) − t_k)² = ‖r(x_k,t_k)‖² − ‖c(x_{k+1})‖²,

where (5.7) and (4.7) give the first identity, while the second equality follows from (4.3). Thus we deduce, also using (4.3), that

‖r(x_{k+1}, t_{k+1})‖² = ‖r(x_k, t_k)‖²,

which concludes our induction step due to (5.12).

It remains to establish (5.10). Lemma 3.1 applies to minimizing ½‖r(x,t_k)‖², and so (3.3) implies that for any successful k ≥ 1, we have

‖r(x_k,t_k)‖ − ‖r(x_{k+1},t_k)‖ ≥ κ_t min{ ‖g_r(x_{k+1},t_k)‖^{3/2} ‖r(x_k,t_k)‖^{1/2}, ‖r(x_k,t_k)‖ },   (5.13)

where κ_t is defined below (5.10). Thus, for any successful k ≥ 1 for which (4.11) fails, (5.13) becomes

‖r(x_k,t_k)‖ − ‖r(x_{k+1},t_k)‖ ≥ κ_t min{ ǫ_d^{3/2} ǫ_p^{1/2}, ǫ_p } = κ_t ǫ_d^{3/2} ǫ_p^{1/2},   (5.14)

where we also used (5.8) in the first inequality and ǫ_d ≤ ǫ_p^{1/3} in the second identity. Using (4.3) and the properties of the l_2-norm, (4.9) becomes

t_k − t_{k+1} = −(f(x_{k+1}) − t_k) + √( ‖r(x_k,t_k)‖² − ‖c(x_{k+1})‖² ) = −(f(x_{k+1}) − t_k) + √( ǫ_p² − ‖c(x_{k+1})‖² ),   (5.15)

where we used (5.8) in the second equality. It follows from (4.3) that

√( (f(x_{k+1}) − t_k)² + ‖c(x_{k+1})‖² ) = ‖r(x_{k+1}, t_k)‖ ≤ ‖r(x_k,t_k)‖ − κ_t ǫ_d^{3/2} ǫ_p^{1/2} = ǫ_p − κ_t ǫ_d^{3/2} ǫ_p^{1/2},   (5.16)

where in the first inequality we used (5.14) and, in the second, (5.8). We now apply Lemma 5.2 to the last right-hand side of (5.15), letting f = t_k − f(x_{k+1}), c = ‖c(x_{k+1})‖, ǫ = ǫ_p and α = ǫ_p − κ_t ǫ_d^{3/2} ǫ_p^{1/2}. We deduce from this lemma, (5.15) and (5.16) that

t_k − t_{k+1} ≥ −α + ǫ_p = −(ǫ_p − κ_t ǫ_d^{3/2} ǫ_p^{1/2}) + ǫ_p = κ_t ǫ_d^{3/2} ǫ_p^{1/2},

which proves (5.10).

Figure 4.1 (b) illustrates the workings of one successful Phase 2 iteration for ǫ_p = ǫ and ǫ_d = ǫ^{2/3}, the case of most interest to us as it coincides with the evaluation complexity of ARC for the unconstrained case. The figure exemplifies that the amount of decrease in the target values is inherited from the merit-function decrease (5.14). Note that the ShS-ARC algorithm requires one evaluation of the objective function, its gradient (and possibly Hessian) and one evaluation of the vector of constraint functions, its Jacobian (and possibly Hessians) per iteration.

We are now ready to give the main complexity result for ShS-ARC applied to (4.1).

Theorem 5.4. Suppose that AC.1-AC.4 hold, and that ShS-ARC is applied to minimizing (4.1) with ǫ_d ≤ ǫ_p^{1/3}. Assume also that AM.4 holds for the Hessians of ½‖c(x)‖² and ½‖r(x,t_k)‖² and their approximations. Then the ShS-ARC algorithm generates an iterate x_k satisfying either a relative KKT condition for (4.1), namely,

‖c(x_k)‖ ≤ ǫ_p and ‖J(x_k)^T y_k + g(x_k)‖ / ‖(y_k, 1)‖ ≤ ǫ_d   (5.17)

for some y_k ∈ IR^m, or an approximate first-order criticality condition for the feasibility measure ‖c(x)‖, namely,

‖J(x_k)^T c(x_k)‖ / ‖c(x_k)‖ ≤ ǫ_d,   (5.18)

in at most

⌈ κ_{f,c} ǫ_d^{-3/2} ǫ_p^{-1/2} ⌉   (5.19)

evaluations of c and f (and their derivatives), where κ_{f,c} > 0 is a problem-dependent constant, independent of ǫ_p, ǫ_d and x_0.

Proof. The evaluation complexity of Phase 1 follows directly from Theorem 3.2 with Φ(x) = ½‖c(x)‖². In particular, the evaluation complexity of obtaining x_1 is bounded above by

⌈ κ_S max{κ_1, κ_2} max{ǫ_d^{-3/2}, ǫ_p^{-1/2}} ⌉,   (5.20)

where κ_1, κ_2 and κ_S are defined in (3.16) and (3.17) with ‖r(x_0)‖ = ‖c(x_0)‖, L = L_1 given in (5.2) and L_g = L_{g,1} in (5.1). If the ShS-ARC algorithm terminates at this stage, then (4.6) implies that (5.18) holds with

k = 1 and ‖c(x_1)‖ > ǫ_p. Assume now that Phase 2 of the ShS-ARC algorithm is entered. From AC.4 and (5.9), we have

f_low ≤ f(x_k) ≤ t_k + ǫ_p ≤ t_1 − i_k κ_t ǫ_d^{3/2} ǫ_p^{1/2} + ǫ_p ≤ f(x_1) − i_k κ_t ǫ_d^{3/2} ǫ_p^{1/2} + ǫ_p,

where i_k is the number of successful ShS-ARC iterations from 1 to k for which (4.11) fails, and where we have also used (5.10) and the definition of t_1 in the ShS-ARC algorithm. Hence, we obtain from the inequality f(x_1) ≤ f_up (itself implied by AC.4 again) and ǫ_p ∈ (0,1) that

i_k ≤ (f_up − f_low + 1) / (κ_t ǫ_d^{3/2} ǫ_p^{1/2}) = L^s_sh.   (5.21)

Since for each distinct value of t_k we have one successful iteration, (5.5) in Lemma 5.1 implies that the total number of Phase 2 iterations for which (4.11) fails is bounded above by L^s_sh · L_sh, where L_sh is defined in (5.5) and L^s_sh in (5.21). Thus the ShS-ARC algorithm must terminate after at most this many iterations, yielding, because of Lemma 4.2, an iterate satisfying ‖c(x_k)‖ ≤ ǫ_p and either (4.14) or (4.13); thus either (5.17) or (5.18) will hold in this case. Recalling that only one evaluation of c and f (and their derivatives, if successful) occurs per iteration, the bound (5.19) now follows by summing up the Phase 1 and Phase 2 iteration bounds, and using that ǫ_p ∈ (0,1), which gives that the Phase 2 bound dominates in the order of (ǫ_p, ǫ_d).

If ǫ_d = ǫ_p^{2/3}, then Theorem 5.4 implies that the evaluation complexity of ShS-ARC is at most O(ǫ_p^{-3/2}), the same as for the unconstrained case. However, if ǫ_d = ǫ_p, then this complexity bound worsens to O(ǫ_p^{-2}), the same in order as for steepest-descent-type methods for both constrained and unconstrained problems [3, 11].

6 Conclusions

We have shown that, with an appropriate and practical termination condition, the (optimal) cubic regularization variant ARC(S) takes at most O(ǫ^{-3/2}) evaluations to drive the residual or the scaled gradient of the potentially singular least-squares problem (1.1) below ǫ. Our analysis has focused on the Euclidean-norm case, but it can be easily extended to general inner products and induced norms, and to smooth l_p-norms for p > 2. Though the order ǫ^{-3/2} of the ARC bound is optimal for unconstrained optimization when second-order methods are employed [4], and it is sharp for nonlinear least-squares when ensuring (1.2) is sufficiently small, it is unclear whether it is optimal or even sharp for ARC(S) with the novel termination condition (3.1).

For the equality-constrained, potentially nonconvex programming problem (4.1), we presented a target-following technique, ShS-ARC, that applies ARC(S) to target-dependent least-squares merit functions, tracking a path of approximately feasible points (if an initial such point can be found). Furthermore, in order to ensure approximate first-order conditions for (4.1) or for a feasibility measure (within ǫ_p for the constraint feasibility and within ǫ_d for dual, first-order, feasibility), ShS-ARC requires at most O(ǫ_d^{-3/2} ǫ_p^{-1/2}) problem evaluations, which, depending on the choice of tolerances ǫ_p and ǫ_d, can take any value between the complexity O(ǫ_p^{-3/2}) of ARC (namely, when ǫ_d = ǫ_p^{2/3}) and the O(ǫ_p^{-2}) of steepest descent (when ǫ_d = ǫ_p). Though it is natural for the primal and dual feasibility residuals to vary at different rates, and hence to require different optimality tolerances (with higher accuracy for primal feasibility than for dual being common), it is an open question at the moment whether an algorithm for nonconvexly constrained problems can be devised that has worst-case evaluation complexity of order ǫ^{-3/2} where ǫ = ǫ_p = ǫ_d.
Also, extending ShS-ARC or other cubic regularization approaches to problems with nonconvex inequality constraints remains to be considered.

References

[1] S. Bellavia, C. Cartis, N. I. M. Gould, B. Morini and Ph. L. Toint. Convergence of a Regularized Euclidean Residual algorithm for nonlinear least-squares. SIAM Journal on Numerical Analysis, 48(1):1-29, 2010.

[2] R. H. Byrd, R. B. Schnabel and G. A. Shultz. A trust region algorithm for nonlinearly constrained optimization. SIAM Journal on Numerical Analysis, 24(5):1152-1170, 1987.

[3] C. Cartis, N. I. M. Gould and Ph. L. Toint. On the complexity of finding first-order critical points in constrained nonlinear programming. ERGO Technical Report, School of Mathematics, University of Edinburgh.

[4] C. Cartis, N. I. M. Gould and Ph. L. Toint. Optimal Newton-type methods for nonconvex smooth optimization problems. ERGO Technical Report, School of Mathematics, University of Edinburgh.

[5] C. Cartis, N. I. M. Gould and Ph. L. Toint. Evaluation complexity of adaptive cubic regularization methods for convex unconstrained optimization. Optimization Methods and Software.

[6] C. Cartis, N. I. M. Gould and Ph. L. Toint. An adaptive cubic regularization algorithm for nonconvex optimization with convex constraints and its function-evaluation complexity. IMA Journal of Numerical Analysis, doi:10.1093/imanum/drr035.

[7] C. Cartis, N. I. M. Gould and Ph. L. Toint. On the oracle complexity of first-order and derivative-free algorithms for smooth nonconvex minimization. SIAM Journal on Optimization, 22(1):66-86, 2012.

[8] C. Cartis, N. I. M. Gould and Ph. L. Toint. On the evaluation complexity of composite function minimization with applications to nonconvex nonlinear programming. SIAM Journal on Optimization, 21(4):1721-1739, 2011.

[9] C. Cartis, N. I. M. Gould and Ph. L. Toint. Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity. Mathematical Programming, 130(2):295-319, 2011.

[10] C. Cartis, N. I. M. Gould and Ph. L. Toint. Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Mathematical Programming, 127(2):245-295, 2011.

[11] C. Cartis, N. I. M. Gould and Ph. L. Toint. On the complexity of steepest descent, Newton's and regularized Newton's methods for nonconvex unconstrained optimization. SIAM Journal on Optimization, 20(6):2833-2852, 2010.

[12] M. R. Celis. A trust region strategy for nonlinear equality constrained optimization. Technical Report, Department of Computational and Applied Mathematics, Rice University, Houston, Texas, USA.

[13] A. R. Conn, N. I. M. Gould and Ph. L. Toint. Trust-Region Methods. SIAM, Philadelphia, USA, 2000.

[14] J. E. Dennis and J. J. Moré. A characterization of superlinear convergence and its application to quasi-Newton methods. Mathematics of Computation, 28(126):549-560, 1974.

[15] J. E. Dennis and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, NJ, USA, 1983. Reprinted as Classics in Applied Mathematics 16, SIAM, Philadelphia, USA, 1996.


Outline. 1 Introduction. 2 Algorithms. 3 Examples. Algorithm 1 General coordinate minimization framework. 1: Choose x 0 R n and set k 0. Outline Coordinate Minimization Daniel P. Robinson Department of Applied Mathematics and Statistics Johns Hopkins University November 27, 208 Introduction 2 Algorithms Cyclic order with exact minimization

More information

Global convergence rate analysis of unconstrained optimization methods based on probabilistic models

Global convergence rate analysis of unconstrained optimization methods based on probabilistic models Math. Program., Ser. A DOI 10.1007/s10107-017-1137-4 FULL LENGTH PAPER Global convergence rate analysis of unconstrained optimization methods based on probabilistic models C. Cartis 1 K. Scheinberg 2 Received:

More information

Nonlinear programming without a penalty function or a filter

Nonlinear programming without a penalty function or a filter Math. Program., Ser. A (2010) 122:155 196 DOI 10.1007/s10107-008-0244-7 FULL LENGTH PAPER Nonlinear programming without a penalty function or a filter N. I. M. Gould Ph.L.Toint Received: 11 December 2007

More information

Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models

Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models Worst-case evaluation comlexity for unconstrained nonlinear otimization using high-order regularized models E. G. Birgin, J. L. Gardenghi, J. M. Martínez, S. A. Santos and Ph. L. Toint 2 Aril 26 Abstract

More information

CDO TRANCHE PRICING BASED ON THE STABLE LAW VOLUME II: R ELAXING THE LHP. Abstract

CDO TRANCHE PRICING BASED ON THE STABLE LAW VOLUME II: R ELAXING THE LHP. Abstract CDO TRANCHE PRICING BASED ON THE STABLE LAW VOLUME II: R ELAXING THE ASSUMPTION German Bernhart XAIA Investment GmbH Sonnenstraße 9, 833 München, Germany german.bernhart@xaia.com First Version: July 26,

More information

Nonlinear programming without a penalty function or a filter

Nonlinear programming without a penalty function or a filter Nonlinear programming without a penalty function or a filter N I M Gould Ph L Toint October 1, 2007 RAL-TR-2007-016 c Science and Technology Facilities Council Enquires about copyright, reproduction and

More information

PERFORMANCE OF THE CROATIAN INSURANCE COMPANIES - MULTICRITERIAL APPROACH

PERFORMANCE OF THE CROATIAN INSURANCE COMPANIES - MULTICRITERIAL APPROACH PERFORMANCE OF THE CROATIAN INSURANCE COMPANIES - MULTICRITERIAL APPROACH Davorka Davosir Pongrac Zagreb school of economics an management Joranovac 110, 10000 Zagreb E-mail: avorka.avosir@zsem.hr Višna

More information

Repos, Fire Sales, and Bankruptcy Policy

Repos, Fire Sales, and Bankruptcy Policy Repos, Fire Sales, an Bankruptcy Policy Gaetano Antinolfi Francesca Carapella Charles Kahn Antoine Martin Davi Mills E Nosal Preliminary an Incomplete May 25, 2012 Abstract The events from the 2007-2009

More information

1. An insurance company models claim sizes as having the following survival function. 25(x + 1) (x 2 + 2x + 5) 2 x 0. S(x) =

1. An insurance company models claim sizes as having the following survival function. 25(x + 1) (x 2 + 2x + 5) 2 x 0. S(x) = ACSC/STAT 373, Actuarial Moels I Further Probability with Applications to Actuarial Science WINTER 5 Toby Kenney Sample Final Eamination Moel Solutions This Sample eamination has more questions than the

More information

1 Overview. 2 The Gradient Descent Algorithm. AM 221: Advanced Optimization Spring 2016

1 Overview. 2 The Gradient Descent Algorithm. AM 221: Advanced Optimization Spring 2016 AM 22: Advanced Optimization Spring 206 Prof. Yaron Singer Lecture 9 February 24th Overview In the previous lecture we reviewed results from multivariate calculus in preparation for our journey into convex

More information

Partial State-Owned Bank Interest Margin, Default Risk, and Structural Breaks: A Model of Financial Engineering

Partial State-Owned Bank Interest Margin, Default Risk, and Structural Breaks: A Model of Financial Engineering Partial State-Owne Bank Interest Margin, Default Risk, an Structural Breaks: A Moel of Financial Engineering JYH-HORNG IN,CHING-HUI CHANG * AND ROSEMARY JOU Grauate Institute of International Business

More information

What can we do with numerical optimization?

What can we do with numerical optimization? Optimization motivation and background Eddie Wadbro Introduction to PDE Constrained Optimization, 2016 February 15 16, 2016 Eddie Wadbro, Introduction to PDE Constrained Optimization, February 15 16, 2016

More information

REAL OPTION MODELING FOR VALUING WORKER FLEXIBILITY

REAL OPTION MODELING FOR VALUING WORKER FLEXIBILITY REAL OPTION MODELING FOR VALUING WORKER FLEXIBILITY Harriet Black Nembhar Davi A. Nembhar Ayse P. Gurses Department of Inustrial Engineering University of Wisconsin-Maison 53 University Avenue Maison,

More information

Data Center Demand Response in Deregulated Electricity Markets

Data Center Demand Response in Deregulated Electricity Markets Data Center Deman Response in Deregulate Electricity Markets Shahab Bahrami, Stuent Member, IEEE, Vincent W.S. Wong, Fellow, IEEE, an Jianwei Huang, Fellow, IEEE Abstract With the evelopment of eregulate

More information

Efficient Numerical Methods for Pricing American Options Under Stochastic Volatility

Efficient Numerical Methods for Pricing American Options Under Stochastic Volatility Efficient Numerical Methos for Pricing American Options Uner Stochastic Volatility Samuli Ikonen, 1 Jari Toivanen 2 1 Norea Markets, Norea FI-00020, Finlan 2 Department of Mathematical Information Technology,

More information

An efficient method for computing the Expected Value of Sample Information. A non-parametric regression approach

An efficient method for computing the Expected Value of Sample Information. A non-parametric regression approach ScHARR Working Paper An efficient metho for computing the Expecte Value of Sample Information. A non-parametric regression approach Mark Strong,, eremy E. Oakley 2, Alan Brennan. School of Health an Relate

More information

Appendix. Confidence Banking and Strategic Default. Guillermo Ordoñez. University of Pennsylvania and NBER

Appendix. Confidence Banking and Strategic Default. Guillermo Ordoñez. University of Pennsylvania and NBER Appenix Confience Banking an Strategic Default Guillermo Oroñez University of Pennsylvania an NBER 1 Proofs 1.1 Proof of Proposition 1 Since s ( ) is the signal that makes a goo firm with a given reputation

More information

New Trade Models, New Welfare Implications

New Trade Models, New Welfare Implications New rae Moels, New Welfare Implications he Harvar community has mae this article openly available. Please share how this access benefits you. Your story matters Citation Melitz, Marc J., an Stephen J.

More information

A Costless Way to Increase Equity

A Costless Way to Increase Equity A Costless Way to Increase Equity Raphael Flore October 27, 2016 Abstract This paper complements stanar theories of optimal capital structure by allowing firms to invest in the financial markets in which

More information

A Game Theoretic Model of Deposit Contracts between the Bank and the Depositor - Extend Study on the Economic Analysis of Bank Run

A Game Theoretic Model of Deposit Contracts between the Bank and the Depositor - Extend Study on the Economic Analysis of Bank Run wwwscieuca/ijfr International Journal of Financial Research Vol 5, No 3; 04 A Game Theoretic Moel of Deposit Contracts between the Bank an the Depositor - Exten Stuy on the Economic Analysis of Bank Run

More information

GLOBAL CONVERGENCE OF GENERAL DERIVATIVE-FREE TRUST-REGION ALGORITHMS TO FIRST AND SECOND ORDER CRITICAL POINTS

GLOBAL CONVERGENCE OF GENERAL DERIVATIVE-FREE TRUST-REGION ALGORITHMS TO FIRST AND SECOND ORDER CRITICAL POINTS GLOBAL CONVERGENCE OF GENERAL DERIVATIVE-FREE TRUST-REGION ALGORITHMS TO FIRST AND SECOND ORDER CRITICAL POINTS ANDREW R. CONN, KATYA SCHEINBERG, AND LUíS N. VICENTE Abstract. In this paper we prove global

More information

transfers in orer to keep income of the hospital sector unchange, then a larger welfare gain woul be obtaine, even if the government implements a bala

transfers in orer to keep income of the hospital sector unchange, then a larger welfare gain woul be obtaine, even if the government implements a bala The Impact of Marginal Tax Reforms on the Supply of Health Relate Services in Japan * Ryuta Ray Kato 1. Introuction This paper presents a computable general equilibrium (CGE) framework to numerically examine

More information

Data Center Demand Response in Deregulated Electricity Markets

Data Center Demand Response in Deregulated Electricity Markets This article has been accepte for publication in a future issue of this journal, but has not been fully eite. Content may change prior to final publication. Citation information: DOI 0.09/TSG.208.280830,

More information

Dynamic Pricing through Customer Discounts for Optimizing Multi-Class Customers Demand Fulfillment

Dynamic Pricing through Customer Discounts for Optimizing Multi-Class Customers Demand Fulfillment Dynamic Pricing through Customer Discounts for Optimizing ulti-class Customers Deman Fulfillment Qing Ding Panos Kouvelis an Joseph ilner# John. Olin School of Business Washington University St. Louis,

More information

The Comprehensive Business Income Tax System: A Proposal for Ultimate Neutrality between Debt and New Equity Issues?

The Comprehensive Business Income Tax System: A Proposal for Ultimate Neutrality between Debt and New Equity Issues? International Journal of Sciences: Basic an Applie Research (IJSBAR) ISSN 2307-4531 (Print & Online) http://gssrr.org/inex.php?journaljournalofbasicanapplie ---------------------------------------------------------------------------------------------------------------------------

More information

A Trust Region Algorithm for Heterogeneous Multiobjective Optimization

A Trust Region Algorithm for Heterogeneous Multiobjective Optimization A Trust Region Algorithm for Heterogeneous Multiobjective Optimization Jana Thomann and Gabriele Eichfelder 8.0.018 Abstract This paper presents a new trust region method for multiobjective heterogeneous

More information

Martingale Pricing Theory in Discrete-Time and Discrete-Space Models

Martingale Pricing Theory in Discrete-Time and Discrete-Space Models IEOR E4707: Foundations of Financial Engineering c 206 by Martin Haugh Martingale Pricing Theory in Discrete-Time and Discrete-Space Models These notes develop the theory of martingale pricing in a discrete-time,

More information

University of Edinburgh, Edinburgh EH9 3JZ, United Kingdom.

University of Edinburgh, Edinburgh EH9 3JZ, United Kingdom. An adaptive cubic regularization algorithm for nonconvex optimization with convex constraints and its function-evaluation complexity by C. Cartis 1, N. I. M. Gould 2 and Ph. L. Toint 3 February 20, 2009;

More information

An Evaluation of Shareholder Activism

An Evaluation of Shareholder Activism An Evaluation of Shareholer Activism Barbara G. Katz Stern School of Business, New York University 44 W. 4th St., New York, NY 10012 bkatz@stern.nyu.eu; tel: 212 998 0865; fax: 212 995 4218 corresponing

More information

2. Lattice Methods. Outline. A Simple Binomial Model. 1. No-Arbitrage Evaluation 2. Its relationship to risk-neutral valuation.

2. Lattice Methods. Outline. A Simple Binomial Model. 1. No-Arbitrage Evaluation 2. Its relationship to risk-neutral valuation. . Lattice Methos. One-step binomial tree moel (Hull, Chap., page 4) Math69 S8, HM Zhu Outline. No-Arbitrage Evaluation. Its relationship to risk-neutral valuation. A Simple Binomial Moel A stock price

More information

Yao s Minimax Principle

Yao s Minimax Principle Complexity of algorithms The complexity of an algorithm is usually measured with respect to the size of the input, where size may for example refer to the length of a binary word describing the input,

More information

arxiv: v2 [quant-ph] 29 Mar 2018

arxiv: v2 [quant-ph] 29 Mar 2018 The Commutation Relation for Cavity Moe Operators arxiv:1611.01003v2 [quant-ph] 29 Mar 2018 Fesseha Kassahun Department of Physics, Ais Ababa University P. O. Box 33761, Ais Ababa, Ethiopia October 21,

More information

Lecture 5: Iterative Combinatorial Auctions

Lecture 5: Iterative Combinatorial Auctions COMS 6998-3: Algorithmic Game Theory October 6, 2008 Lecture 5: Iterative Combinatorial Auctions Lecturer: Sébastien Lahaie Scribe: Sébastien Lahaie In this lecture we examine a procedure that generalizes

More information

Fuzzy EOQ Model for Time-Deteriorating Items Using Penalty Cost

Fuzzy EOQ Model for Time-Deteriorating Items Using Penalty Cost merican Journal of Operational Research 6 6(: -8 OI:.59/j.ajor.66. Fuzzy EOQ Moel for ime-eteriorating Items Using Penalty ost Nalini Prava Behera Praip Kumar ripathy epartment of Statistics Utkal University

More information

Introduction to Financial Derivatives

Introduction to Financial Derivatives 55.444 Introuction to Financial Derivatives Week of December n, 3 he Greeks an Wrap-Up Where we are Previously Moeling the Stochastic Process for Derivative Analysis (Chapter 3, OFOD) Black-Scholes-Merton

More information

NBER WORKING PAPER SERIES PROFIT SHIFTING AND TRADE AGREEMENTS IN IMPERFECTLY COMPETITIVE MARKETS. Kyle Bagwell Robert W. Staiger

NBER WORKING PAPER SERIES PROFIT SHIFTING AND TRADE AGREEMENTS IN IMPERFECTLY COMPETITIVE MARKETS. Kyle Bagwell Robert W. Staiger NBER WORKING PAPER SERIES PROFIT SHIFTING AND TRADE AGREEMENTS IN IMPERFECTLY COMPETITIVE MARKETS Kyle Bagwell Robert W. Staiger Working Paper 14803 http://www.nber.org/papers/w14803 NATIONAL BUREAU OF

More information

CS 3331 Numerical Methods Lecture 2: Functions of One Variable. Cherung Lee

CS 3331 Numerical Methods Lecture 2: Functions of One Variable. Cherung Lee CS 3331 Numerical Methods Lecture 2: Functions of One Variable Cherung Lee Outline Introduction Solving nonlinear equations: find x such that f(x ) = 0. Binary search methods: (Bisection, regula falsi)

More information

P. Manju Priya 1, M.Phil Scholar. G. Michael Rosario 2, Associate Professor , Tamil Nadu, INDIA)

P. Manju Priya 1, M.Phil Scholar. G. Michael Rosario 2, Associate Professor , Tamil Nadu, INDIA) International Journal of Computational an Applie Mathematics. ISSN 89-4966 Volume, Number (07 Research Inia Publications http://www.ripublication.com AN ORDERING POLICY UNDER WO-LEVEL RADE CREDI POLICY

More information

Monopolistic Competition

Monopolistic Competition Welfare Ranking of A-valorem an Specific Tariffs in a Moel of Monopolistic Competition Esra Durceylan Bilkent University May 3, 2010 Abstract This paper compares the welfare implications of a-valorem an

More information

Macro Dynamics and Labor-Saving Innovation: US vs. Japan

Macro Dynamics and Labor-Saving Innovation: US vs. Japan CIRJE-F-528 Macro Dynamics an Labor-Saving Innovation: US vs. Japan Ryuzo Sato New York University an University of Tokyo Tamaki Morita National Grauate Institute for Policy Stuies (GRIPS) November 2007

More information

Full file at

Full file at Chapter 2 Supply an eman Analysis Solutions to Review uestions 1. Excess eman occurs when price falls below the equilibrium price. In this situation, consumers are emaning a higher quantity than is being

More information

Simple Robust Hedging with Nearby Contracts

Simple Robust Hedging with Nearby Contracts Simple Robust Heging with Nearby Contracts Liuren Wu Zicklin School of Business, Baruch College Jingyi Zhu University of Utah Abstract Most existing heging approaches are base on neutralizing risk exposures

More information

Development Economics and Public Policy WORKING PAPER SERIES

Development Economics and Public Policy WORKING PAPER SERIES Development Economics an Public Policy WORKING PAPER SERIES Paper No. 5 DO TECHNOLOGY SHOCKS SHIFT OUTPUT? AN EMPIRICAL ANALYSIS OF A TWO FACTOR MODEL Hulya Ulku University of Manchester May 005 ISBN:

More information

Dynamic Programming: An overview. 1 Preliminaries: The basic principle underlying dynamic programming

Dynamic Programming: An overview. 1 Preliminaries: The basic principle underlying dynamic programming Dynamic Programming: An overview These notes summarize some key properties of the Dynamic Programming principle to optimize a function or cost that depends on an interval or stages. This plays a key role

More information

4: SINGLE-PERIOD MARKET MODELS

4: SINGLE-PERIOD MARKET MODELS 4: SINGLE-PERIOD MARKET MODELS Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 4: Single-Period Market Models 1 / 87 General Single-Period

More information

Preferences, Prices, and Performance in Monopoly and Duopoly

Preferences, Prices, and Performance in Monopoly and Duopoly Preferences, Prices, an Performance in Monopoly an Duopoly Yongmin Chen y an Michael H. Rioran z November 20, 2009 Abstract. This paper takes the new approach of using a copula to characterize consumer

More information

Forecast Horizons for Production Planning with Stochastic Demand

Forecast Horizons for Production Planning with Stochastic Demand Forecast Horizons for Production Planning with Stochastic Demand Alfredo Garcia and Robert L. Smith Department of Industrial and Operations Engineering Universityof Michigan, Ann Arbor MI 48109 December

More information

A Moment Matching Approach to the Valuation of a Volume Weighted Average Price Option

A Moment Matching Approach to the Valuation of a Volume Weighted Average Price Option A Moment Matching Approach to the Valuation of a Volume Weighte Average Price Option Antony William Stace Department of Mathematics, University of Queenslan, Brisbane, Queenslan 472, Australia aws@maths.uq.eu.au

More information

If you have ever spoken with your grandparents about what their lives were like

If you have ever spoken with your grandparents about what their lives were like CHAPTER 7 Economic Growth I: Capital Accumulation an Population Growth The question of growth is nothing new but a new isguise for an age-ol issue, one which has always intrigue an preoccupie economics:

More information

Ch 10. Arithmetic Average Options and Asian Opitons

Ch 10. Arithmetic Average Options and Asian Opitons Ch 10. Arithmetic Average Options an Asian Opitons I. Asian Options an Their Analytic Pricing Formulas II. Binomial Tree Moel to Price Average Options III. Combination of Arithmetic Average an Reset Options

More information

The Intriguing Nexus Between Corruption and Capital Account Restrictions

The Intriguing Nexus Between Corruption and Capital Account Restrictions The Intriguing Nexus Between Corruption an Capital Account Restrictions Axel Dreher Lars-H.R. Siemers June 2003 Abstract In a simple theoretical moel we ientify a mutual relationship between corruption

More information

BOUNDS FOR THE LEAST SQUARES RESIDUAL USING SCALED TOTAL LEAST SQUARES

BOUNDS FOR THE LEAST SQUARES RESIDUAL USING SCALED TOTAL LEAST SQUARES BOUNDS FOR THE LEAST SQUARES RESIDUAL USING SCALED TOTAL LEAST SQUARES Christopher C. Paige School of Computer Science, McGill University Montreal, Quebec, Canada, H3A 2A7 paige@cs.mcgill.ca Zdeněk Strakoš

More information

On the Superlinear Local Convergence of a Filter-SQP Method. Stefan Ulbrich Zentrum Mathematik Technische Universität München München, Germany

On the Superlinear Local Convergence of a Filter-SQP Method. Stefan Ulbrich Zentrum Mathematik Technische Universität München München, Germany On the Superlinear Local Convergence of a Filter-SQP Method Stefan Ulbrich Zentrum Mathemati Technische Universität München München, Germany Technical Report, October 2002. Mathematical Programming manuscript

More information

The Joint Dynamics of Electricity Spot and Forward Markets: Implications on Formulating Dynamic Hedging Strategies

The Joint Dynamics of Electricity Spot and Forward Markets: Implications on Formulating Dynamic Hedging Strategies Energy Laboratory MI EL 00-005 Massachusetts Institute of echnology he Joint Dynamics of Electricity Spot an Forwar Markets: Implications on Formulating Dynamic Heging Strategies ovember 2000 he Joint

More information

Lecture Quantitative Finance Spring Term 2015

Lecture Quantitative Finance Spring Term 2015 implied Lecture Quantitative Finance Spring Term 2015 : May 7, 2015 1 / 28 implied 1 implied 2 / 28 Motivation and setup implied the goal of this chapter is to treat the implied which requires an algorithm

More information

Key words: financial intermediation, entrepreneurship, economic growth

Key words: financial intermediation, entrepreneurship, economic growth DEPARTMENT OF ECONOMICS ISSN 1441-5429 DISCUSSION PAPER 18/07 FINANCIA INTERMEDIATION, ENTREPRENEURSHIP AND ECONOMIC GROWTH Wenli Cheng * Abstract: This paper presents a simple general equilibrium moel

More information

Lecture 7: Bayesian approach to MAB - Gittins index

Lecture 7: Bayesian approach to MAB - Gittins index Advanced Topics in Machine Learning and Algorithmic Game Theory Lecture 7: Bayesian approach to MAB - Gittins index Lecturer: Yishay Mansour Scribe: Mariano Schain 7.1 Introduction In the Bayesian approach

More information

Pricing Multi-Dimensional Options by Grid Stretching and High Order Finite Differences

Pricing Multi-Dimensional Options by Grid Stretching and High Order Finite Differences Pricing Multi-Dimensional Options by Gri Stretching an High Orer Finite Differences Kees Oosterlee Numerical Analysis Group, Delft University of Technology Joint work with Coen Leentvaar Southern Ontario

More information

Deterrence and Risk Preferences in Sequential Attacker Defender Games with Continuous Efforts

Deterrence and Risk Preferences in Sequential Attacker Defender Games with Continuous Efforts Risk Analysis DOI:./risa.768 Deterrence an Risk Preferences in Sequential Attacker Defener Games with Continuous Efforts Vineet M. Payappalli, Jun Zhuang,, an Victor Richmon R. Jose Most attacker efener

More information

Is Greedy Coordinate Descent a Terrible Algorithm?

Is Greedy Coordinate Descent a Terrible Algorithm? Is Greedy Coordinate Descent a Terrible Algorithm? Julie Nutini, Mark Schmidt, Issam Laradji, Michael Friedlander, Hoyt Koepke University of British Columbia Optimization and Big Data, 2015 Context: Random

More information

Introduction to Financial Derivatives

Introduction to Financial Derivatives 55.444 Introuction to Financial Derivatives Week of December 3 r, he Greeks an Wrap-Up Where we are Previously Moeling the Stochastic Process for Derivative Analysis (Chapter 3, OFOD) Black-Scholes-Merton

More information

LGD Risk Resolved. Abstract

LGD Risk Resolved. Abstract LGD Risk Resolve Jon Frye (corresponing author) Senior Economist Feeral Reserve Bank of Chicago 230 South LaSalle Street Chicago, IL 60604 Jon.Frye@chi.frb.org 32-322-5035 Michael Jacobs Jr. Senior Financial

More information

The use of Expected Utility Theory (EUT) in Taxpayers Behaviour Modelling

The use of Expected Utility Theory (EUT) in Taxpayers Behaviour Modelling American Journal of Applie Sciences Original Research Paper The use of Expecte Utility Theory (EUT) in Taxpayers Behaviour Moelling Fari Ameur an Mohame Tkiouat Stuies an Research Laboratory in Applie

More information

Convergence of trust-region methods based on probabilistic models

Convergence of trust-region methods based on probabilistic models Convergence of trust-region methods based on probabilistic models A. S. Bandeira K. Scheinberg L. N. Vicente October 24, 2013 Abstract In this paper we consider the use of probabilistic or random models

More information

Essays on Some Combinatorial Optimization Problems with Interval Data

Essays on Some Combinatorial Optimization Problems with Interval Data Essays on Some Combinatorial Optimization Problems with Interval Data a thesis submitted to the department of industrial engineering and the institute of engineering and sciences of bilkent university

More information

MULTISTAGE PORTFOLIO OPTIMIZATION AS A STOCHASTIC OPTIMAL CONTROL PROBLEM

MULTISTAGE PORTFOLIO OPTIMIZATION AS A STOCHASTIC OPTIMAL CONTROL PROBLEM K Y B E R N E T I K A M A N U S C R I P T P R E V I E W MULTISTAGE PORTFOLIO OPTIMIZATION AS A STOCHASTIC OPTIMAL CONTROL PROBLEM Martin Lauko Each portfolio optimization problem is a trade off between

More information

Assessment of Acceptance Sampling Plans Using Posterior Distribution for a Dependent Process

Assessment of Acceptance Sampling Plans Using Posterior Distribution for a Dependent Process Rochester Institute of Technology RIT Scholar Works Articles 1-21-2010 Assessment of Acceptance Sampling Plans Using Posterior Distribution for a Depenent Process A. Erhan Mergen Rochester Institute of

More information

Why Has Swedish Stock Market Volatility Increased?

Why Has Swedish Stock Market Volatility Increased? Why Has Seish Stock Market Volatility Increase? by John Hassler Institute for International Economic Stuies This revision: May 29, 1995 Preliminary Abstract Is the increase volatility on the Seish stock

More information

Laurence Boxer and Ismet KARACA

Laurence Boxer and Ismet KARACA SOME PROPERTIES OF DIGITAL COVERING SPACES Laurence Boxer and Ismet KARACA Abstract. In this paper we study digital versions of some properties of covering spaces from algebraic topology. We correct and

More information

First-Order Methods. Stephen J. Wright 1. University of Wisconsin-Madison. IMA, August 2016

First-Order Methods. Stephen J. Wright 1. University of Wisconsin-Madison. IMA, August 2016 First-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) First-Order Methods IMA, August 2016 1 / 48 Smooth

More information

Methods and Models of Loss Reserving Based on Run Off Triangles: A Unifying Survey

Methods and Models of Loss Reserving Based on Run Off Triangles: A Unifying Survey Methods and Models of Loss Reserving Based on Run Off Triangles: A Unifying Survey By Klaus D Schmidt Lehrstuhl für Versicherungsmathematik Technische Universität Dresden Abstract The present paper provides

More information

Appendix B: Yields and Yield Curves

Appendix B: Yields and Yield Curves Pension Finance By Davi Blake Copyright 006 Davi Blake Appenix B: Yiels an Yiel Curves Bons, with their regular an generally reliable stream of payments, are often consiere to be natural assets for pension

More information

Protection and International Sourcing

Protection and International Sourcing Protection an International Sourcing Emanuel Ornelas Lonon School of Economics John L. Turner University of Georgia November 2008 Abstract We stuy the impact of import protection on relationship-specific

More information

Financial Integration, Growth, and Volatility

Financial Integration, Growth, and Volatility W/05/67 Financial Integration, Growth, an Volatility Anne paular an Aue ommeret 005 International Monetary Fun W/05/67 IMF Working aper IMF Institute Financial Integration, Growth, an Volatility repare

More information

Mitigating Double Taxation in an Open Economy

Mitigating Double Taxation in an Open Economy Mitigating ouble Taxation in an Open Economy Tobias Linhe This version: February 2001 Abstract The interaction o various methos o mitigating economic an international ouble taxation o corporate source

More information

1. Basic currency board equations.

1. Basic currency board equations. A Supply-Sie oel Uner Currency Boar Rules George Chobanov Faculty of Economics an Business St. Kliment Ohriski University Tsarigrasko chaussee 125, Block 3 1113 Sofia, Bulgaria Abstract This paper provies

More information

Dynamic Demand for New and Used Durable Goods without Physical Depreciation: The Case of Japanese Video Games

Dynamic Demand for New and Used Durable Goods without Physical Depreciation: The Case of Japanese Video Games Dynamic Deman for New an Use Durable Goos without Physical Depreciation: The Case of Japanese Vieo Games Masakazu Ishihara Stern School of Business New York University Anrew Ching Rotman School of Management

More information

arxiv: v1 [math.ds] 30 Mar 2017

arxiv: v1 [math.ds] 30 Mar 2017 MEAN CURVATURE FLOW WITH DRIVING FORCE ON FIXED EXTREME POINTS LONGJIE ZHANG arxiv:173.179v1 [math.ds] 3 Mar 217 Abstract: In this paper, we consier the mean curvature flow with riving force on fixe extreme

More information

Forthcoming in The Journal of Banking and Finance

Forthcoming in The Journal of Banking and Finance Forthcoming in The Journal of Banking an Finance June, 000 Strategic Choices of Quality, Differentiation an Pricing in Financial Services *, ** Saneep Mahajan The Worl Bank (O) 0-458-087 Fax 0-5-530 email:

More information

Econ 582 Nonlinear Regression

Econ 582 Nonlinear Regression Econ 582 Nonlinear Regression Eric Zivot June 3, 2013 Nonlinear Regression In linear regression models = x 0 β (1 )( 1) + [ x ]=0 [ x = x] =x 0 β = [ x = x] [ x = x] x = β it is assumed that the regression

More information