On the Superlinear Local Convergence of a Filter-SQP Method. Stefan Ulbrich Zentrum Mathematik Technische Universität München München, Germany


On the Superlinear Local Convergence of a Filter-SQP Method. Stefan Ulbrich, Zentrum Mathematik, Technische Universität München, München, Germany. Technical Report, October 2002.


Mathematical Programming manuscript No. (will be inserted by the editor)

Stefan Ulbrich

On the superlinear local convergence of a filter-SQP method

October 2002

Abstract. Transition to superlinear local convergence is shown for a modified version of the trust-region filter-SQP method for nonlinear programming introduced by Fletcher, Leyffer, and Toint [8]. Hereby, the original trust-region SQP steps can be used without an additional second order correction. The main modification consists in using the Lagrangian function value instead of the objective function value in the filter, together with an appropriate infeasibility measure. Moreover, it is shown that the modified trust-region filter-SQP method has the same global convergence properties as the original algorithm in [8].

Key words. nonlinear programming, superlinear convergence, global convergence, filter, SQP

1. Introduction

This paper is concerned with the transition of trust-region filter-SQP methods to superlinear local convergence without using second order corrections. Hereby, we consider the general nonlinear programming problem (NLP)

$$\min\; f(x) \quad \text{subject to} \quad c_E(x) = 0,\; c_I(x) \le 0 \qquad (P)$$

with twice continuously differentiable functions $f: \mathbb{R}^n \to \mathbb{R}$, $c: \mathbb{R}^n \to \mathbb{R}^m$, and $c_E = (c_i)_{i\in E}$, $c_I = (c_i)_{i\in I}$, where $E = \{1,\dots,m_E\}$ and $I = \{m_E+1,\dots,m\}$ denote the indices of equality and inequality constraints, respectively. With these notations the Lagrangian function associated with (P) is given by

$$l(x,y) = f(x) + y^T c(x),$$

where $y = (y_E^T, y_I^T)^T \in \mathbb{R}^m$, $y_I \ge 0$. We propose a modification of the globally convergent filter-SQP method by Fletcher, Leyffer, and Toint [8] that has the same global convergence properties and in addition ensures transition to superlinear local convergence under appropriate assumptions on the Hessian approximations in the SQP subproblems close to a regular minimum.
In the pioneering work [7], Fletcher and Leyffer proposed the filter technique for the globalization of SQP methods, which avoids the use of classical merit functions with penalty term. Instead, the acceptability of steps is determined by comparing the constraint violation and the objective function value with those of previous iterates collected in a filter. The new iterate is acceptable to the filter if either the feasibility or the objective function value is sufficiently improved in comparison to all iterates bookmarked in the current filter. The promising numerical results in [7] led to a growing interest in filter methods

Stefan Ulbrich: Zentrum Mathematik, Technische Universität München, München, Germany, sulbrich@ma.tum.de.

in recent years. By augmenting the basic filter concept with a sufficient decrease condition for the objective function if the current iterate is sufficiently feasible, the global convergence of three variants of trust-region filter-SQP methods was shown in Fletcher, Leyffer, and Toint [8], Conn, Gould, and Toint [4], and Fletcher, Gould, Leyffer, Toint, and Wächter [6]. Global convergence of a filter method for nonlinear programming was subsequently also shown by Gonzaga, Karas, and Vanti [10]. A primal-dual interior-point filter method for NLP is proposed in Ulbrich, Ulbrich, and Vicente [15], where global convergence is shown. An alternative nonmonotone trust-region SQP method without penalty function for equality constrained problems was proposed by Ulbrich and Ulbrich [14]. For this latter algorithm, in addition to global convergence, also the transition to superlinear local convergence was proven in [14]. Hereby, no second order correction is necessary. So far, to the best of our knowledge, the only result on the superlinear local convergence of a filter method is shown in Wächter and Biegler [16] for an interior-point linesearch filter method with second order correction steps. In this paper we propose a modified version of the filter-SQP algorithm in Fletcher, Leyffer, and Toint [8] that ensures transition to superlinear local convergence without using second order correction steps. We believe that the modifications and proof techniques of the present paper can be adapted to ensure also the fast local convergence of the filter methods in [4,6,16] without second order corrections. The original filter-SQP methods in [4,6,8] can, at least if no second order correction steps are used, be affected by the well known Maratos effect. In fact, these algorithms use constraint violation and objective function value in the filter.
By the Maratos effect a full SQP step can lead to an increase of both these filter components even arbitrarily close to a regular minimizer. This makes the full SQP step unacceptable for the filter and can prohibit fast local convergence. To avoid this problem, we use the value of the Lagrangian function in the filter instead of the objective function and modify the feasibility measure in the filter slightly. The paper is organized as follows. In section 2 we briefly describe the trust-region filter-SQP method of Fletcher, Leyffer, and Toint [8] and recall the global convergence result proven in [8]. In section 3 we derive a modified filter-SQP method and prove its transition to superlinear local convergence close to a point satisfying the strong second order sufficient condition. In subsection 3.1 we describe the local requirements on the Hessian approximation and the Lagrange multipliers. In subsection 3.2 we motivate the necessary changes to avoid the Maratos effect and formulate the modified trust-region filter-SQP method. In subsection 3.3 we prove the transition to superlinear local convergence of the modified algorithm. In section 4 we show how the global convergence proof in [8] can be adapted to the modified algorithm.

2. A trust-region filter-SQP algorithm

In this section we recall the globally convergent trust-region filter-SQP framework proposed by Fletcher, Leyffer, and Toint [8] for the solution of (P). Given the current iterate $x_k$ and a trust-region radius $\Delta$, the step computation is based on the approximate solution of the trust-region SQP subproblem

$$\text{minimize}\; q_k(s) \quad \text{subject to} \quad c_E(x_k) + A_E(x_k)s = 0,\; c_I(x_k) + A_I(x_k)s \le 0,\; \|s\|_\infty \le \Delta. \qquad (\mathrm{QP}(x_k,\Delta))$$

Hereby, $A_E(x_k) = (a_i(x_k)^T)_{i\in E} := \nabla c_E(x_k)^T$ and $A_I(x_k) = (a_i(x_k)^T)_{i\in I} := \nabla c_I(x_k)^T$ are the Jacobians of $c_E$ and $c_I$, respectively,

$$q_k(s) = f_k + g_k^T s + \tfrac12 s^T B_k s$$

with $g_k = g(x_k) := \nabla f(x_k)$ is a quadratic model of $l(x_k+s, y_k) - y_k^T c(x_k) - y_k^T A(x_k)s$, $A(x_k) = \nabla c(x_k)^T$, and $B_k \in \mathbb{R}^{n,n}$ is a symmetric approximation of the Hessian of the Lagrangian

$$H_k := \nabla^2_{xx} l(x_k, y_k) = \nabla^2_{xx} f(x_k) + \sum_{i\in E\cup I} (y_k)_i \nabla^2_{xx} c_i(x_k).$$

To ensure global convergence, filter methods evaluate the quality of a search direction by using a filter $F_k \subset \{(h_j, f_j) : j = 0,\dots,k-1\}$ that collects pairs $(h_j, f_j) := (h(x_j), f(x_j))$ from certain previous iterations. Hereby, $h(x)$ is an infeasibility measure. Fletcher, Leyffer, and Toint [8] propose to use

$$h(x) := \|c_E(x)\|_1 + \|c_I^+(x)\|_1, \quad\text{where}\quad c_I^+(x) = (\max\{c_i(x), 0\})_{i\in I}.$$

Since a local solution $\bar x$ of (P) minimizes $f$ locally under the constraint $h(x) = 0$, a promising trial step $s$ should either reduce the constraint violation $h$ or the function value $f$. To ensure sufficient decrease of at least one of the two criteria, Fletcher et al. [8] call $x$ acceptable to the filter $F_k$ if for all $(h_j, f_j) \in F_k$ either

$$h(x) \le \beta h_j \quad\text{or}\quad f(x) + \gamma h(x) \le f_j \qquad (1)$$

is satisfied with constants $0 < \gamma < \beta < 1$. Following [8], a pair $(h_j, f_j)$ is called dominated by $(h(x), f(x))$ if $h(x) \le h_j$ and $f(x) \le f_j$. To update the information in the filter, appropriate points $x$, which are acceptable to the filter, are added to the filter $F_k$. This operation means that $(h(x), f(x))$ is added to the set of pairs $(h_j, f_j)$ in $F_k$ and subsequently any old pair is removed from the filter that is dominated by the new pair $(h(x), f(x))$. The acceptance test (1) has the useful inclusion property that after adding a pair $(h(x), f(x))$ to the filter all unacceptable points of the old filter remain unacceptable for the new filter (although dominated pairs are removed).
This allows to show that the infeasibility tends to zero for any infinite subsequence of iterates added to the filter, which is an essential ingredient for the global convergence analysis of filter methods. More precisely, we have the following result, cf. the corollary of Lemma 1 in [8].

Lemma 1. For any infinite sequence of iterates $x_k$ for which $h_k > 0$, $\{f_k\}$ is bounded below, and $(h_k, f_k)$ is entered into the filter, it follows that $h_k \to 0$.

Proof. See Lemma 1 and its corollary in [8].
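The acceptance test (1) and the filter update just described can be made concrete in a short sketch (a hypothetical implementation in our own notation; the values of $\beta$, $\gamma$ and the list representation of the filter are illustrative choices, not taken from [8]):

```python
BETA, GAMMA = 0.99, 1e-4  # illustrative constants with 0 < GAMMA < BETA < 1

def acceptable(h, f, filter_pairs):
    """Test (1): the point with infeasibility h and objective f is acceptable
    if, for EVERY stored pair, it sufficiently improves either feasibility
    or the objective."""
    return all(h <= BETA * hj or f + GAMMA * h <= fj for hj, fj in filter_pairs)

def add_to_filter(h, f, filter_pairs):
    """Add (h, f) and drop stored pairs dominated by it (h <= hj and f <= fj)."""
    kept = [(hj, fj) for hj, fj in filter_pairs if not (h <= hj and f <= fj)]
    kept.append((h, f))
    return kept

filt = [(1.0, 5.0), (0.5, 7.0)]
print(acceptable(0.4, 10.0, filt))   # -> True  (improves feasibility vs. both pairs)
print(acceptable(0.999, 6.0, filt))  # -> False (fails both tests against (1.0, 5.0))
print(add_to_filter(0.4, 6.0, filt)) # (0.5, 7.0) is dominated and removed
```

Note that adding a pair can only enlarge the forbidden region, which is the inclusion property mentioned above.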

Obviously, the filter concept alone does not necessarily enforce convergence to a local minimum. For example, if the iterates converge to an arbitrary feasible point and the infeasibility measure $h$ is in each iteration reduced by a factor of at least $\beta$, then the iterates will be acceptable to the filter. Therefore, the filter-SQP algorithm of Fletcher, Leyffer, and Toint [8] imposes a sufficient reduction condition of the form

$$\Delta f_k(s) := f(x_k) - f(x_k+s) \ge \sigma\,\Delta q_k(s), \qquad \Delta q_k(s) := q_k(0) - q_k(s)$$

with a constant $0 < \sigma < 1$ if the feasibility is good enough, more precisely if $\Delta q_k(s) > 0$. Finally, it is obvious that the SQP subproblem QP$(x_k, \Delta)$ can be incompatible (infeasible). In this case, the algorithm invokes a restoration phase that finds an iterate $x_k$ and a trust-region radius $\Delta \ge \Delta_{\min} > 0$ such that QP$(x_k, \Delta)$ is compatible and $(h_k, f_k)$ is acceptable to the filter $F_k$. We have now collected all ingredients to give a precise statement of the filter-SQP method of Fletcher, Leyffer, and Toint [8].

Algorithm 1. Filter-SQP Algorithm of [8]
Let $0 < \gamma < \beta < 1$, $\sigma \in\, ]0,1[$, and $\Delta_{\min} > 0$. Initialize with a point $x$, $k = 1$ and the filter $F_1 = \{(u, -\infty)\}$ with some $u \ge \beta h(x)$.
1. Enter the restoration phase to find a point $x_k$ acceptable to the filter $F_k$ such that QP$(x_k, \Delta)$ is compatible for some $\Delta \ge \Delta_{\min}$, and initialize $\Delta_k = \Delta$.
2. Try to compute a solution $s$ of QP$(x_k, \Delta)$. If the QP is incompatible, then include $(h_k, f_k)$ in the filter, set $k := k+1$ and goto step 1 (h-type iteration).
3. If $s = 0$, stop with the KT point $x_k$.
4. If $x_k + s$ is not acceptable to $F_k \cup \{(h_k, f_k)\}$, then set $\Delta := \Delta/2$ and goto step 2.
5. If $\Delta q_k(s) := q_k(0) - q_k(s) > 0$ and $\Delta f_k(s) := f_k - f(x_k+s) < \sigma\,\Delta q_k(s)$, then set $\Delta := \Delta/2$ and goto step 2.
6. If $\Delta q_k(s) \le 0$, then include $(h_k, f_k)$ in the filter (h-type iteration).
7. Set $s_k = s$, $\Delta_k = \Delta$, $x_{k+1} = x_k + s_k$, and initialize $\Delta_{k+1} \ge \Delta_{\min}$. Set $k := k+1$ and goto step 2.

A convergence analysis of this algorithm is given in [8] under the following assumptions.

Assumptions (A1)-(A3).
(A1) All points that are sampled by the algorithm lie in a nonempty convex and compact set $X$.
(A2) The problem functions $f(x)$ and $c(x)$ are twice continuously differentiable on an open set containing $X$.
(A3) There exists an $M_B > 0$ such that the Hessian matrices $B_k$ satisfy $\|B_k\|_2 \le M_B$ for all $k$.

The following global convergence result is proven in [8].

Theorem 1. (Global convergence result, cf. [8, Thm. 7]) Under assumptions (A1)-(A3) the outcome of applying the filter-SQP algorithm of Algorithm 1 is one of the following:

(A) The restoration phase fails to find a point $x_k$ which is both acceptable to the filter and for which QP$(x_k, \Delta)$ is compatible for some $\Delta > \Delta_{\min}$.
(B) A KT point of problem (P) is found.
(C) There exists an accumulation point that is feasible and either is a KT point or fails to satisfy the Mangasarian-Fromovitz constraint qualification (MFCQ).

We recall that MFCQ holds at some point $x$ if the rows of $A_E(x)$ are linearly independent and there exists a vector $s$ with $A_E(x)s = 0$, $A_A(x)s < 0$, where $A \subset I$ is the index set of active inequality constraints at $x$ and $A_A(x) = (a_i(x)^T)_{i\in A}$.

3. Transition to fast local convergence

Our aim is to modify Algorithm 1 in such a way that it allows transition to superlinear local convergence if the computed iterates $x_k$ converge to a point $\bar x$, $\bar x$ satisfies the second order sufficient condition with strict complementarity and the linear independence constraint qualification (LICQ), and the Hessian approximations $B_k$ are chosen appropriately for $x_k$ close to $\bar x$. More precisely, we assume the following.

Assumption (A4).
(A4) The iterates $x_k$ converge to a point $\bar x$ that satisfies with a Lagrange multiplier $\bar y = (\bar y_E^T, \bar y_I^T)^T$ the following second order sufficient conditions with strict complementarity and LICQ:

$$h(\bar x) = 0, \quad \nabla_x l(\bar x, \bar y) = 0, \quad \bar y_{I\setminus A} = 0, \quad \bar y_A > 0, \quad\text{where } A := \{i \in I : c_i(\bar x) = 0\},$$
$$\nabla^2_{xx} l(\bar x, \bar y) \text{ is positive definite on } \{s : A_{E\cup A}(\bar x)s = 0\}, \quad A_{E\cup A}(\bar x) \text{ has full row rank.} \qquad (O)$$

We still have to specify our requirements on the choice of $B_k$.

3.1. Local requirements on the step computation

In this subsection we introduce a standard framework for the Lagrange multiplier update that ensures with the choice $B_k = \nabla^2_{xx} l(x_k, y_k)$ q-superlinear local convergence to a point $\bar x$ satisfying (O) if full (local) SQP steps are taken. To obtain suitable Lagrange multiplier estimates, we will make use of a multipliers function, see [9,5]:

Definition 1. Let $U_{\bar x}$ be a neighborhood of a point $\bar x$ satisfying (O).
Then a function $Y : U_{\bar x} \to \mathbb{R}^m$ is called a multipliers function if $Y$ is continuous at $\bar x$ and $\bar y = Y(\bar x)$.

One can show that for $U_{\bar x}$ small enough a continuously differentiable multipliers function is for example given by the multipliers function of Glad and Polak

$$Y^{GP}(x) = -\left(A(x)A(x)^T + \gamma_1\,\mathrm{diag}(c_i(x)^2)\right)^{-1} A(x)g(x)$$

with $\gamma_1 > 0$, see [11,5]. Moreover, by the strict complementarity assumption in (O) it is easy to see that for any multipliers function $Y$ and any constant $\nu > 0$, for $U_{\bar x}$ small enough it holds

$$A = \{i \in I : c_i(x) \ge -\nu Y_i(x)\} =: A(x) \quad \forall\, x \in U_{\bar x}. \qquad (2)$$

Thus, the active set can be identified exactly close to $\bar x$. To analyze the transition to fast local convergence we make for simplicity the following assumptions on the choice of $B_k$ and $y_k$. The use of inexact Hessian approximations, see for example [2,5], would be possible.

Assumption (A5).
(A5) There is a neighborhood $U_{\bar x}$ of $\bar x$ such that

$$B_k := H_k = \nabla^2_{xx} l(x_k, Y(x_k)), \quad y_k = Y(x_k) \quad \text{for } x_k \in U_{\bar x},$$

where $Y : U_{\bar x} \to \mathbb{R}^m$ is a Lipschitz continuous multipliers function that satisfies $Y_I(x) \ge 0$, $Y_{I\setminus A}(x) = 0$ for all $x \in U_{\bar x}$.

Remark 1. If $Y : U_{\bar x} \to \mathbb{R}^m$ is a Lipschitz continuous multipliers function, then a multipliers function $\hat Y$ satisfying (A5) can for $U_{\bar x}$ small enough be obtained by setting $\hat Y_{E\cup A(x)}(x) = Y_{E\cup A(x)}(x)$, $\hat Y_{I\setminus A(x)}(x) = 0$ with $A(x)$ as in (2), since $A(x) = A$ for $U_{\bar x}$ sufficiently small.

Under assumption (O) the point $\bar x$ is a strict local minimum of (P) and we find $\bar\Delta > 0$ such that for $U_{\bar x}$ small enough the following holds: for all $x_k \in U_{\bar x}$ with $B_k$ according to (A5) we have by continuity with a constant $\kappa_B > 0$

$$s^T B_k s \ge \kappa_B \|s\|_2^2 \;\text{ on } \{s : A_{E\cup A}(x_k)s = 0\}, \quad A_{E\cup A}(x_k) \text{ has full row rank}, \quad (y_k)_A \ge \tfrac12 \bar y_A > 0.$$

Moreover, for any given $0 < \Delta \le \bar\Delta$ we can shrink $U_{\bar x}$ such that for all $x_k \in U_{\bar x}$ the solution $s_k$ of QP$(x_k, \Delta)$ is unique, satisfies $\|s_k\| < \Delta$ and coincides with the solution of the equality constrained problem EQP$(x_k)$ given by

$$\text{minimize}\; q_k(s) \quad\text{subject to}\quad c_{E\cup A}(x_k) + A_{E\cup A}(x_k)s = 0, \qquad (\mathrm{EQP})$$

where $A \subset I$ is the set of active inequality constraints at $\bar x$, see for example [1, Sec. ], [2, Sec. 5] or [12, Thm. 18.1].
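As an illustration, the Glad-Polak multipliers function amounts to a few lines of linear algebra (a sketch of our own; the minus sign reflects the sign convention $l(x,y) = f(x) + y^T c(x)$ used here, and the toy data below are hypothetical):

```python
import numpy as np

def glad_polak_multipliers(c, A, g, gamma1=1.0):
    """Least-squares multiplier estimate in the spirit of Glad and Polak:
    Y(x) = -(A A^T + gamma1 * diag(c_i(x)^2))^{-1} A g(x),
    so that g + A^T Y is small near a stationary point."""
    M = A @ A.T + gamma1 * np.diag(c**2)
    return -np.linalg.solve(M, A @ g)

# Toy check (hypothetical problem): min x1 + x2 s.t. x1^2 + x2^2 - 2 = 0
# at the solution x = (1, 1): g = (1, 1), A = [2, 2], c = 0,
# exact multiplier y = -1/2 since g + A^T y = 0.
y = glad_polak_multipliers(np.array([0.0]),
                           np.array([[2.0, 2.0]]),
                           np.array([1.0, 1.0]))
print(y)  # -> [-0.5]
```

At a feasible stationary point ($c = 0$) the regularization term vanishes and the estimate reduces to the ordinary least-squares multiplier.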
In particular, since $(y_k)_{I\setminus A} = 0$ by (A5), the step $s_k$ satisfies the KT conditions

$$\begin{pmatrix} \nabla^2_{xx} l(x_k, y_k) & A_{E\cup A}(x_k)^T \\ A_{E\cup A}(x_k) & 0 \end{pmatrix}\begin{pmatrix} s_k \\ z_k \end{pmatrix} = -\begin{pmatrix} \nabla_x l(x_k, y_k) \\ c_{E\cup A}(x_k) \end{pmatrix}.$$
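A minimal numerical sketch of this KT system (variable names and the toy problem are our own; in practice one would exploit the block structure rather than form and factorize the matrix densely):

```python
import numpy as np

def eqp_step(H, A_hat, grad_l, c_hat):
    """Solve the saddle-point system stated above:
        [ H      A_hat^T ] [ s ]     [ grad_l ]
        [ A_hat  0       ] [ z ] = - [ c_hat  ]
    H ~ Hessian of the Lagrangian, A_hat ~ Jacobian of the equality
    and active inequality constraints."""
    n, m = H.shape[0], A_hat.shape[0]
    K = np.block([[H, A_hat.T], [A_hat, np.zeros((m, m))]])
    sol = np.linalg.solve(K, -np.concatenate([grad_l, c_hat]))
    return sol[:n], sol[n:]   # step s and multiplier increment z

# Hypothetical data: min 0.5*||x||^2 s.t. x1 - 1 = 0, from x = (0, 0), y = 0.
H = np.eye(2)
A_hat = np.array([[1.0, 0.0]])
s, z = eqp_step(H, A_hat, grad_l=np.zeros(2), c_hat=np.array([-1.0]))
print(s)  # -> [1. 0.], i.e. the step to the constrained minimizer (1, 0)
```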

Now it is well known that with the choice $x_{k+1} := x_k + s_k$, $y_{k+1} := Y(x_{k+1})$ the sequence $\{x_k\}$ converges under assumption (A5) q-superlinearly to $\bar x$, and that the convergence is q-quadratic if $\nabla^2_{xx} f$ and $\nabla^2_{xx} c_i$ are Lipschitz continuous in a neighborhood of $\bar x$, see for example [5,14]. We collect these facts in the following proposition:

Proposition 1. Let $\bar x$ satisfy (O) and let (A5) hold. Then there exists $\bar\Delta > 0$ such that for any given $\zeta \in\, ]0,1[$ and $0 < \Delta \le \bar\Delta$, possibly after shrinking $U_{\bar x}$, the following holds with a constant $\kappa_B > 0$:

$$A_{E\cup A}(x) \text{ has full row rank on a compact set } S_{\bar x} \supset U_{\bar x}, \qquad (3)$$
$$Y_A(x) \ge \tfrac12 \bar y_A > 0 \quad \forall\, x \in U_{\bar x}, \qquad (4)$$
$$s^T \nabla^2_{xx} l(x, Y(x))\, s \ge \kappa_B \|s\|_2^2 \;\text{ on } \{s : A_{E\cup A}(x)s = 0\} \quad \forall\, x \in U_{\bar x}. \qquad (5)$$

For any $x_k \in U_{\bar x}$, $y_k = Y(x_k)$, the sequence defined by $x_j := x_k$ for $j = k$ and $x_{j+1} = x_j + s_j$ with the unique solution $s_j$ of QP$(x_j, \Delta)$ for $j = k, k+1, \dots$, where $B_j = \nabla^2_{xx} l(x_j, Y(x_j))$ are chosen according to (A5), is well defined, converges q-superlinearly to $\bar x$ (q-quadratically if $\nabla^2_{xx} f$, $\nabla^2_{xx} c_i$ are Lipschitz on $U_{\bar x}$), and satisfies

$$\|s_{j+1}\| \le \zeta \|s_j\|, \quad \|x_{j+1} - \bar x\| \le \zeta \|x_j - \bar x\|, \quad \|s_j\| < \Delta. \qquad (6)$$

Thus, $s_j$ solves also QP$(x_j, \Delta')$ for all $\Delta' \in [\Delta, \bar\Delta]$. Moreover, $s_j$ is the unique solution of EQP$(x_j)$.

3.2. Modifications of the filter-SQP algorithm to avoid the Maratos effect

By the well known Maratos effect the full SQP step $s_k$ can lead to an increase of $f$ and $h$, and therefore the iterate $x_k + s_k$ will not be acceptable to $F_k \cup \{(h_k, f_k)\}$. To avoid this difficulty, there are three promising approaches:
- Perform a second order correction step if the Maratos effect is detected. This approach is used successfully by Wächter and Biegler [16] in the context of an interior-point linesearch filter algorithm.
- Allow nonmonotonicity in the filter to admit an increase in $f$ and $h$. This is in the spirit of the well known watchdog technique. A promising approach in this direction is the nonmonotone filter method proposed recently by Toint [13].
- Use the Lagrangian function instead of the objective function in the filter.
In this paper we will consider the last approach. As already mentioned, the Maratos effect can cause that the full SQP step is not acceptable with respect to $F_k \cup \{(h_k, f_k)\}$, since both $h$ and $f$ are increased. By using as a technical tool the fact that the augmented Lagrangian function

$$l_\rho(x, Y(x)) = l(x, Y(x)) + \frac{\rho}{2}\|c_{E\cup A}(x)\|_2^2 \qquad (7)$$

is an exact penalty function close to $\bar x$ satisfying (O) if $\rho > 0$ is large enough and $Y : U_{\bar x} \to \mathbb{R}^m$ is a Lipschitz continuous multipliers function, we will see that full SQP steps are eventually accepted by the filter if the filter entries $(h(x), f(x))$ are replaced

by entries $(\theta(x,y), l(x,y))$ with an appropriately defined infeasibility measure $\theta(x,y)$. This leads to our first modification of Algorithm 1:

Modification 1. Instead of the filter entries $(h(x), f(x))$ we use the entries $(\theta(x,y), l(x,y))$, where

$$\theta(x, y) := \|c_E(x)\|_2^2 + \|c_I^+(x)\|_2^2 + (y_I^T c_I(x))^2. \qquad (8)$$

Consequently, we call $(x, y)$ acceptable to the filter $F_k$ if for all $(\theta_j, l_j) \in F_k$ either

$$\theta(x, y) \le \beta \theta_j \quad\text{or}\quad l(x, y) + \gamma\,\theta(x, y) \le l_j \qquad (9)$$

with constants $0 < \gamma < \beta < 1$. We remark that

$$0 \le h(x) \le \sqrt{m}\,\theta(x, y)^{1/2}. \qquad (10)$$

Moreover, it is obvious from (8) that

$$\theta(x, y) = 0 \;\Longrightarrow\; y^T c(x) = 0. \qquad (11)$$

The second modification concerns step 5 of Algorithm 1. We note that in the case $\Delta q_k(s) > 0$ the step is only accepted if the sufficient decrease condition $\Delta f_k(s) \ge \sigma\,\Delta q_k(s)$ holds. This decrease condition can only be ensured if the error of the quadratic model becomes small compared to the size of the predicted decrease $\Delta q_k(s)$. We now estimate the size of $\Delta q_k(s)$ for a full SQP step. Let (O) and (A5) hold and consider $x_k \in U_{\bar x}$ with $U_{\bar x}$ as in Proposition 1. To explore the decrease properties of the full (local) SQP step $s_k$ given as the solution of QP$(x_k, \bar\Delta)$ we use the fact that $s_k$ solves EQP$(x_k)$. For convenience we set

$$\hat A_k := A_{E\cup A}(x_k), \quad \hat c_k := c_{E\cup A}(x_k).$$

Then EQP$(x_k)$ can be written as

$$\text{minimize}\; q_k(s) \quad\text{subject to}\quad \hat c_k + \hat A_k s = 0. \qquad (\mathrm{EQP})$$

Now we decompose $s_k$ into a normal step $s^n$ and a tangential step $s^t$. To this end, let

$$s = s^n + s^t, \quad s^n = -\hat A_k^T(\hat A_k \hat A_k^T)^{-1}\hat c_k. \qquad (12)$$

Then we have obviously

$$\Delta q_k(s) = -\min_{\hat A_k s^t = 0}\Big(g_k^T s^n + \tfrac12 (s^n)^T B_k s^n + (g_k + B_k s^n)^T s^t + \tfrac12 (s^t)^T B_k s^t\Big). \qquad (13)$$

Since $x_k \in U_{\bar x}$, we know from (3), (4), and (8) that there exists $\kappa_n > 0$ (independent of $k$) with

$$\|s^n\| \le \kappa_n\,\theta_k^{1/2}. \qquad (14)$$

Moreover, (5) yields $(s^t)^T B_k s^t \ge \kappa_B \|s^t\|^2$.

Thus, we obtain by (13) with a constant $C > 0$

$$\Delta q_k(s) = \tfrac12 (s^t)^T B_k s^t - g_k^T s^n - \tfrac12 (s^n)^T B_k s^n \ge \frac{\kappa_B}{2}\|s^t\|^2 - \|g_k\|\,\|s^n\| - \frac{M_B}{2}\|s^n\|^2 \ge \frac{\kappa_B}{2}\|s^t\|^2 - C(\|s^n\| + \|s^n\|^2) \qquad (15)$$

and this lower bound can be sharp. Since $\Delta f_k(s) = \Delta q_k(s) + O(\|s_k\|^2)$, the full SQP step will not necessarily satisfy $\Delta f_k(s) \ge \sigma\,\Delta q_k(s)$, unless

$$\Delta q_k(s) \ge \frac{1}{1-\sigma}\,|\Delta f_k(s) - \Delta q_k(s)| = O(\|s_k\|^2),$$

which is not ensured in view of (15). To circumvent this difficulty we replace the predicted reduction $\Delta q_k(s)$ and the actual reduction $\Delta f_k(s)$ by quantities that differ only by $o(\|s_k\|^2)$, and we check the decrease ratio only if the predicted reduction is large enough compared to $\theta_k$. As we will see, the first goal can be accomplished by using the actual reduction

$$\Delta l_k(s) := l_k - l(x_k + s, y_k(s)) \qquad (16)$$

instead of $\Delta f_k(s)$, where $y_k(s)$ denotes the new multiplier estimate, and the predicted reduction

$$\Delta \hat q_k(s) := \Delta q_k(s) + y_k^T c_k \qquad (17)$$

instead of $\Delta q_k(s)$. Secondly, we require the sufficient decrease condition $\Delta l_k(s) \ge \sigma\,\Delta\hat q_k(s)$ only if $\Delta\hat q_k(s) > \kappa_\theta\, \theta_k^{\psi/2}$ with a constant $\psi \in\, ]1/2, 1]$. This leads to the following modification of step 5.

Modification 2. To ensure that f-type steps eventually allow full (local) SQP steps we replace the condition in step 5 by

$$\Delta\hat q_k(s) := \Delta q_k(s) + y_k^T c_k > \kappa_\theta\, \theta_k^{\psi/2} \quad\text{and}\quad \Delta l_k(s) < \sigma\,\Delta\hat q_k(s)$$

with $\psi \in\, ]1/2, 1]$. Analogously, we replace the condition $\Delta q_k(s) \le 0$ in step 6 by $\Delta\hat q_k(s) \le \kappa_\theta\, \theta_k^{\psi/2}$.

Finally, we enter restoration not only if QP$(x_k, \Delta)$ is infeasible but also if the Lagrange multipliers weight inactive constraints too strongly. More precisely, we define the compatibility of QP$(x_k, \Delta)$ as follows.

Modification 3. We call QP$(x_k, \Delta)$ compatible if QP$(x_k, \Delta)$ is feasible and

$$\theta_k^{1/2} \le \kappa_\Delta\, \Delta^{1+\xi} \qquad (18)$$

with constants $\kappa_\Delta > 0$ and $\xi \in\, ]0, 1[$.

This leads to the following modified version of Algorithm 1.
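Before stating the algorithm, the modified filter quantities can be summarized in a short sketch (our own hypothetical constants $\kappa_\theta = 1$, $\psi = 1$, $\sigma = 0.1$, and our squared-norm reading of the garbled measure (8); this is an illustration, not code from the paper):

```python
import numpy as np

KAPPA_THETA, PSI = 1.0, 1.0   # kappa_theta > 0, psi in (1/2, 1] (illustrative)
SIGMA = 0.1                   # 0 < sigma < 1 (illustrative)

def theta(c_E, c_I, y_I):
    """Infeasibility measure of Modification 1:
    theta = ||c_E||^2 + ||c_I^+||^2 + (y_I^T c_I)^2."""
    c_plus = np.maximum(c_I, 0.0)
    return float(c_E @ c_E + c_plus @ c_plus + (y_I @ c_I) ** 2)

def f_type_candidate(dq, yTc, th):
    """Step 5/6 switch: the predicted reduction dq_hat = dq + y^T c
    must exceed kappa_theta * theta^(psi/2); otherwise h-type."""
    return dq + yTc > KAPPA_THETA * th ** (PSI / 2)

def sufficient_decrease(dl, dq, yTc):
    """Accept an f-type step if the actual Lagrangian reduction
    dl = l_k - l(x_k + s, y_k(s)) achieves a sigma-fraction of dq_hat."""
    return dl >= SIGMA * (dq + yTc)

th = theta(np.array([0.1]), np.array([-0.2, 0.05]), np.array([0.0, 0.3]))
print(th)  # 0.01 + 0.0025 + 0.015^2
```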

Algorithm 2. Modified Filter-SQP Algorithm
Let $0 < \gamma < \beta < 1$, $\sigma \in\, ]0,1[$, $\xi \in\, ]0,1[$, $\psi \in\, ]1/2,1]$, $\kappa_\theta, \kappa_\Delta > 0$, and $\Delta_{\min} > 0$. Initialize with a point $(x, y)$, $k = 1$ and the filter $F_1 = \{(u, -\infty)\}$ with some $u \ge \beta\,\theta(x, y)$.
1. Enter the restoration phase to find a point $(x_k, y_k)$ acceptable to the $(\theta, l)$-filter $F_k$ such that QP$(x_k, \Delta)$ is compatible in the sense of (18) for some $\Delta \ge \Delta_{\min}$, and initialize $\Delta_k = \Delta$.
2. Try to compute a solution $s$ of QP$(x_k, \Delta)$. If the QP is incompatible according to (18), then include $(\theta_k, l_k)$ in the filter, set $k := k+1$ and goto step 1 (h-type iteration).
3. If $s = 0$, stop with the KT point $x_k$. Otherwise compute a Lagrange multiplier estimate $y = y_k(s)$.
4. If $(x_k + s, y)$ is not acceptable to $F_k \cup \{(\theta_k, l_k)\}$, then set $\Delta := \Delta/2$ and goto step 2.
5. If $\Delta\hat q_k(s) := \Delta q_k(s) + y_k^T c_k > \kappa_\theta\, \theta_k^{\psi/2}$ and $\Delta l_k(s) < \sigma\,\Delta\hat q_k(s)$, then set $\Delta := \Delta/2$ and goto step 2.
6. If $\Delta\hat q_k(s) \le \kappa_\theta\, \theta_k^{\psi/2}$, then include $(\theta_k, l_k)$ in the filter (h-type iteration).
7. Set $s_k = s$, $\Delta_k = \Delta$, $x_{k+1} = x_k + s_k$, $y_{k+1} = y$, and initialize $\Delta_{k+1} \ge \Delta_{\min}$. Set $k := k+1$ and goto step 2.

Remark 2. We note that the modified filter-SQP method in Algorithm 2 has many similarities with the filter-SQP algorithms proposed by Fletcher, Gould, Leyffer, Toint, and Wächter [6] and Conn, Gould, and Toint [4].

The convergence analysis in [8] of the original filter-SQP algorithm in Algorithm 1 uses two essential observations. Firstly, $\Delta q_k(s)$ never increases if the trust-region radius $\Delta$ is reduced, since the feasible set of QP$(x_k, \Delta)$ shrinks if $\Delta$ is reduced. Thus, the condition $\Delta q_k(s) > 0$ for trying the possibility of an f-type step can only switch from true to false in the inner loop, but not vice versa. We note that the modified Algorithm 2 also has this property, since we only replace $\Delta q_k(s) > 0$ by $\Delta q_k(s) > -y_k^T c_k + \kappa_\theta\, \theta_k^{\psi/2}$, and $-y_k^T c_k + \kappa_\theta\, \theta_k^{\psi/2}$ is fixed in the inner iteration.

Remark 3. A reinspection of the convergence proof in [8] shows that the global convergence result of [8] and also the results of the present paper remain true if in step 2 only the local minimizer of QP$(x_k, \Delta)$ is determined which is closest to $x_k$.
Essential is the monotonicity of $\Delta q_k(s)$ with respect to $\Delta$. The second key point in the analysis of [8] is the observation that Algorithm 1 only adds pairs $(h_k, f_k)$ to the filter with $h_k > 0$. Analogously, all pairs $(\theta_k, l_k)$ added to the filter by Algorithm 2 satisfy $\theta_k > 0$. In fact, new entries can only be added to the filter in step 2 if QP$(x_k, \Delta)$ is incompatible, or in step 6 if $\Delta q_k(s) \le -y_k^T c_k + \kappa_\theta\, \theta_k^{\psi/2}$. But if $\theta_k = 0$ then QP$(x_k, \Delta)$ is compatible in the sense of (18) for all $\Delta > 0$, since $s = 0$ is feasible. Thus, $(\theta_k, l_k)$ is not added to the filter in step 2. Moreover, if $x_k$ is not a KT point then we must have $\Delta q_k(s) > 0$, and $\theta_k = 0$ implies $-y_k^T c_k + \kappa_\theta\, \theta_k^{\psi/2} = 0$ by (11). Thus, we have $\Delta\hat q_k(s) = \Delta q_k(s) > 0 = \kappa_\theta\, \theta_k^{\psi/2}$ and also the condition in step 6 for adding $(\theta_k, l_k)$ to the filter is not satisfied. Consequently, the minimum constraint violation in the filter always satisfies

$$\tau_k := \min_{(\theta_j, l_j) \in F_k} \theta_j > 0. \qquad (19)$$
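The Maratos effect that motivates the switch to Lagrangian filter entries can be reproduced on Powell's classical example $\min 2(x_1^2 + x_2^2 - 1) - x_1$ s.t. $x_1^2 + x_2^2 - 1 = 0$ (a numerical sketch of our own, not from the paper): arbitrarily close to the solution $(1,0)$, the full SQP step increases both $f$ and $h$, so an $(h,f)$-filter rejects it, while the Lagrangian value with $\bar y = -3/2$ decreases.

```python
import numpy as np

# Powell's Maratos example: solution (1, 0), multiplier y = -3/2,
# Hessian of the Lagrangian at the solution equals the identity.
f = lambda x: 2.0 * (x @ x - 1.0) - x[0]
c = lambda x: x @ x - 1.0
y_bar = -1.5

t = 0.1                                    # feasible iterate on the unit circle
x = np.array([np.cos(t), np.sin(t)])
g = np.array([4 * x[0] - 1.0, 4 * x[1]])   # gradient of f
n = 2 * x                                  # gradient of c

# Full SQP step with B = I: minimize g^T s + 0.5||s||^2 s.t. n^T s = 0
# (c(x) = 0 here), i.e. s is minus the projection of g onto {n}^perp.
s = -(g - (n @ g) / (n @ n) * n)
xp = x + s

print(f(xp) > f(x), abs(c(xp)) > abs(c(x)))         # -> True True
print(f(xp) + y_bar * c(xp) < f(x) + y_bar * c(x))  # -> True
```

Analytically, the step is $s = (\sin^2 t,\, -\sin t\cos t)$, so $f$ and $|c|$ both grow by $\sin^2 t$ while the Lagrangian drops by $\tfrac12\sin^2 t$.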

This will allow us to adjust in section 4 the elegant global convergence analysis in [8] to the modified Algorithm 2. First, we will show that Algorithm 2 allows transition to fast local convergence.

3.3. Proof of transition to fast local convergence

Our aim is to show the following result.

Theorem 2. Let assumptions (A1)-(A5) hold. Then there is $K > 0$ such that Algorithm 2 takes full (local) SQP steps for all $k \ge K$, i.e., with some $\bar\Delta > 0$,

$$x_{k+1} = x_k + s_k, \quad \|s_k\| < \bar\Delta,$$

with the unique solution $s_k$ of QP$(x_k, \bar\Delta)$ for all $k \ge K$. In particular, $x_k$ converges q-superlinearly to $\bar x$. If $\nabla^2_{xx} f$ and $\nabla^2_{xx} c_i$ are Lipschitz continuous in a neighborhood of $\bar x$ then $x_k$ converges q-quadratically.

The proof of this result requires some preparation and is given at the end of this section. In a first step we show that full SQP steps according to Proposition 1 are eventually accepted by the filter. To this end, we prove as an auxiliary result that the augmented Lagrangian function $l_\rho(x, Y(x))$ in (7) with a Lipschitz continuous multipliers function $Y : U_{\bar x} \to \mathbb{R}^m$ is an exact penalty function.

Lemma 2. Let $\bar x$ satisfy (O) and let $Y : U_{\bar x} \to \mathbb{R}^m$ be a Lipschitz continuous multipliers function. Then there exist constants $0 < \lambda < \eta$, $\rho_0 > 0$ and a neighborhood $B_\delta(\bar x) = \{x : \|x - \bar x\| < \delta\}$ such that the augmented Lagrangian function $l_\rho$ in (7) satisfies

$$\frac{\lambda}{2}\|x - \bar x\|_2^2 + \frac{\rho - \rho_0}{2}\|c_{E\cup A}(x)\|_2^2 \le l_\rho(x, Y(x)) - l_\rho(\bar x, \bar y) \le \eta(1+\rho)\|x - \bar x\|_2^2 \qquad (20)$$

for all $\rho \ge \rho_0$ and all $x \in B_\delta(\bar x)$. Moreover, possibly after increasing $\eta > 0$, $\rho_0 > 0$, and reducing $\delta > 0$, we have for the generalized augmented Lagrangian

$$\hat l_\rho(x, Y(x)) := l(x, Y(x)) + \frac{\rho}{2}\,\theta(x, Y(x)) \qquad (21)$$

with $\theta$ according to (8) that for all $\rho \ge \rho_0$ and all $x \in B_\delta(\bar x)$

$$\frac{\lambda}{2}\|x - \bar x\|_2^2 + \frac{\rho - \rho_0}{2}\,\theta(x, Y(x)) \le \hat l_\rho(x, Y(x)) - \hat l_\rho(\bar x, \bar y) \le \eta(1+\rho)\|x - \bar x\|_2^2. \qquad (22)$$

Proof. Let $x \in U_{\bar x}$. We use again the notation $\hat c(x) = c_{E\cup A}(x)$, $\hat A(x) = A_{E\cup A}(x)$.
Using Taylor's theorem and $\nabla_x l_\rho(\bar x, \bar y) = 0$, $\nabla_y l_\rho(\bar x, \bar y) = 0$, we have with some $(\tilde x, \tilde y)$ on the line segment between $(\bar x, \bar y)$ and $(x, Y(x))$

$$l_\rho(x, Y(x)) - l_\rho(\bar x, \bar y) = \frac12 (x - \bar x)^T \nabla^2_{xx} l_\rho(\tilde x, \tilde y)(x - \bar x) + (x - \bar x)^T \nabla^2_{xy} l_\rho(\tilde x, \tilde y)(Y(x) - \bar y). \qquad (23)$$

Now it is obvious that

$$\nabla^2_{xx} l_\rho(\tilde x, \tilde y) = \nabla^2_{xx} l_{\rho/2}\Big(\tilde x,\, \tilde y + \frac{\rho}{2}\hat c(\tilde x)\Big) + \frac{\rho}{2}\hat A(\tilde x)^T \hat A(\tilde x). \qquad (24)$$

It is well known that under assumption (O) there exists $\bar\rho > 0$ such that $\nabla^2_{xx} l_{\rho/2}(\bar x, \bar y)$ is uniformly positive definite for all $\rho \ge \bar\rho$, i.e.,

$$s^T \nabla^2_{xx} l_{\rho/2}(\bar x, \bar y)\, s \ge 4\lambda\|s\|_2^2 \quad \forall\, s \in \mathbb{R}^n$$

with a constant $\lambda > 0$, see for example [12, Thm. 17.5]. Let $L_y$ be the Lipschitz constant of $Y(x)$ and let $\rho_0 := \max(\bar\rho,\, 4L_y^2/\lambda)$. By continuity, we find $\delta > 0$ such that for all $(x, y)$ with $\|x - \bar x\| < \delta$, $\|y - \bar y\| < L_y\delta$,

$$s^T \nabla^2_{xx} l_{\rho_0/2}\Big(x,\, y + \frac{\rho_0}{2}\hat c(x)\Big)s \ge 2\lambda\|s\|_2^2 \quad \forall\, s \in \mathbb{R}^n, \qquad (25)$$

where we choose $\delta$ so small that the $\delta$-ball $B_\delta(\bar x) = \{x : \|x - \bar x\| < \delta\}$ is contained in $U_{\bar x}$. Thus, we obtain for all $x \in B_\delta(\bar x)$ by (23), (24), (25)

$$l_{\rho_0}(x, Y(x)) - l_{\rho_0}(\bar x, \bar y) \ge \lambda\|x - \bar x\|_2^2 + \frac{\rho_0}{4}\|\hat A(\tilde x)(x - \bar x)\|_2^2 + (Y(x) - \bar y)^T \hat A(\tilde x)(x - \bar x).$$

Now we have for all $\alpha > 0$

$$2(Y(x) - \bar y)^T \hat A(\tilde x)(x - \bar x) \ge -\frac{1}{\alpha}\|\hat A(\tilde x)(x - \bar x)\|_2^2 - \alpha\|Y(x) - \bar y\|_2^2 \ge -\frac{1}{\alpha}\|\hat A(\tilde x)(x - \bar x)\|_2^2 - \alpha L_y^2\|x - \bar x\|_2^2.$$

Hence, the choice $\alpha = \lambda/L_y^2$ yields (note that $\rho_0/4 \ge L_y^2/\lambda = 1/\alpha$)

$$l_{\rho_0}(x, Y(x)) - l_{\rho_0}(\bar x, \bar y) \ge \frac{\lambda}{2}\|x - \bar x\|_2^2 + \frac{\rho_0}{8}\|\hat A(\tilde x)(x - \bar x)\|_2^2. \qquad (26)$$

Since $l_\rho(\bar x, \bar y) = l_{\rho_0}(\bar x, \bar y) = f(\bar x)$, we have

$$l_\rho(x, Y(x)) - l_\rho(\bar x, \bar y) = l_{\rho_0}(x, Y(x)) - l_{\rho_0}(\bar x, \bar y) + \frac{\rho - \rho_0}{2}\|\hat c(x)\|_2^2, \qquad (27)$$

and together with (26) the left inequality in (20) is shown. On the other hand, it is obvious from (23) that for all $x \in U_{\bar x}$

$$l_\rho(x, Y(x)) - l_\rho(\bar x, \bar y) \le \eta(1 + \rho)\|x - \bar x\|_2^2$$

with a constant $\eta > 0$. This proves the right inequality in (20). The estimates (22) for $\hat l_\rho(x, Y(x))$ in (21) now follow from the following observations. By (O) we can reduce $\delta > 0$ such that for all $x \in B_\delta(\bar x)$

$$Y_A(x) \ge \tfrac12 \bar y_A > 0, \quad c^+_{I\setminus A}(x) = 0.$$

Thus, we find by (8) a constant $\mu > 0$ with

$$\mu\|\hat c(x)\|_2^2 \le \theta(x, Y(x)) \le \|\hat c(x)\|_2^2 + (Y_I(x)^T c_I(x))^2. \qquad (28)$$

With $\hat\rho_0 = \rho_0/\mu$ we therefore obtain from (26) and $\hat l_{\hat\rho_0}(\bar x, \bar y) = l_{\rho_0}(\bar x, \bar y) = f(\bar x)$ that

$$\hat l_{\hat\rho_0}(x, Y(x)) - \hat l_{\hat\rho_0}(\bar x, \bar y) \ge l_{\rho_0}(x, Y(x)) - l_{\rho_0}(\bar x, \bar y) \ge \frac{\lambda}{2}\|x - \bar x\|_2^2.$$

Since an analogue of (27) holds for $\hat l_\rho$, this proves as above the left inequality in (22) after increasing $\rho_0$ to $\hat\rho_0$. On the other hand, we have by the right inequality in (28)

$$\hat l_\rho(x, Y(x)) - \hat l_\rho(\bar x, \bar y) \le l_\rho(x, Y(x)) - l_\rho(\bar x, \bar y) + \frac{\rho}{2}(Y_I(x)^T c_I(x))^2.$$

Since $c_A(\bar x) = 0$, $Y_{I\setminus A}(\bar x) = 0$, and $Y$ is Lipschitz at $\bar x$, we have $(Y_I(x)^T c_I(x))^2 = O(\|x - \bar x\|^2)$. This shows the right inequality in (22), possibly after increasing $\eta$.

We are now in the position to show that for $k$ large enough the sequence $\{x_j\}_{j\ge k}$ of full (local) SQP steps is acceptable to the filter.

Lemma 3. Assume that $x_k \to \bar x$ with $\bar x$ satisfying (O) and that (A5) holds. Then with some $\bar\Delta > 0$ for any $\zeta \in\, ]0,1[$, $0 < \Delta \le \bar\Delta$, and $M \ge 1$ there is an index $K$ such that for all $k \ge K$ the following holds: The sequence $\{x_j\}_{j\ge k}$ of full SQP steps defined in Proposition 1 is well defined, converges q-superlinearly to $\bar x$, and satisfies (6). Moreover, the problems QP$(x_j, \Delta)$, $j \ge k$, are all compatible in the sense of (18), and for all $\Delta' \in [\Delta, \bar\Delta]$ they have the same solution as QP$(x_j, \Delta)$. Finally, for all $k \ge K$ with

$$\|x_k - \bar x\| \le M \min_{(\theta_l, l_l)\in F_k} \|x_l - \bar x\| \qquad (29)$$

the points $(x_{j+1}, Y(x_{j+1}))$, $j = k, k+1, \dots$, corresponding to full SQP steps are acceptable to

$$\tilde F_j := F_k \cup \{(\theta_k, l_k)\} \cup \{(\theta_{k+1}, l_{k+1})\} \cup \dots \cup \{(\theta_j, l_j)\}.$$

Thus, full SQP steps starting from $x_k$ pass the filter test in step 4 of Algorithm 2.

Proof. Let $\bar\Delta > 0$ be as in Proposition 1 and choose $U_{\bar x}$ as in Proposition 1 for any given $0 < \Delta \le \bar\Delta$ and $\zeta = 1/2$, say. Moreover, let $\rho_0$ and $B_\delta(\bar x) \subset U_{\bar x}$ be given by Lemma 2. Then by Proposition 1, for all $x_k \in B_\delta(\bar x)$ the above sequence $\{x_j\}_{j\ge k}$ is well defined, satisfies (6) and converges q-superlinearly to $\bar x$. In particular, we obtain from (6) that the solutions of QP$(x_j, \Delta)$ and QP$(x_j, \Delta')$ coincide for all $\Delta' \in [\Delta, \bar\Delta]$. Without restriction we can choose $\delta > 0$ so small that $\theta(x, Y(x))^{1/2} \le \kappa_\Delta\,\Delta^{1+\xi}$ for all $x \in B_\delta(\bar x)$. Then QP$(x_j, \Delta)$, $j \ge k$, are all compatible in the sense of (18), since $\{x_j\}_{j\ge k} \subset B_\delta(\bar x)$ for $x_k \in B_\delta(\bar x)$ by (6).
Choose $K_1$ so large that $x_k \in B_\delta(\bar x)$ for all $k \ge K_1$ and choose $\rho \ge \rho_0$ so large that

$$\beta\Big(\frac{\rho}{2} - \gamma\Big) \ge \frac{\beta\rho}{4} \quad\text{and}\quad \frac{\rho - \rho_0}{2} \ge \Big(1 - \frac{\beta}{4}\Big)\frac{\rho}{2}. \qquad (30)$$

Then we know from (22) in Lemma 2 that for all $x \in B_\delta(\bar x)$ with $\hat l_\rho$ in (21)

$$\frac{\lambda}{2}\|x - \bar x\|_2^2 + \Big(1 - \frac{\beta}{4}\Big)\frac{\rho}{2}\,\theta(x, Y(x)) \le \hat l_\rho(x, Y(x)) - \hat l_\rho(\bar x, \bar y) \le \eta(1 + \rho)\|x - \bar x\|_2^2 \qquad (31)$$

holds with constants $0 < \lambda < \eta$. Let as in (19)

$$\tau_{K_1} := \min_{(\theta_j, l_j)\in F_{K_1}} \theta_j.$$

We have already observed that $\tau_{K_1} > 0$. By continuity we find $0 < \delta_1 < \delta$ such that $\theta(x, Y(x)) \le \beta\tau_{K_1}$ for all $x \in B_{\delta_1}(\bar x)$. Thus, for all points $x \in B_{\delta_1}(\bar x)$ the pair $(x, Y(x))$ is acceptable to $F_{K_1}$. Since $x_k \to \bar x$ we find $K_2 > K_1$ with $x_k \in B_{\delta_1}(\bar x)$ for all $k \ge K_2$. Using (6) in Proposition 1, we can choose $\delta_1$ so small that for all $k \ge K_2$ the sequence $\{x_j\}_{j\ge k}$ of full SQP steps converges linearly with a contraction factor of at least

$$\|x_{j+1} - \bar x\| \le \Big[\Big(1 - \frac{1 - \beta/2}{1 - \beta/4}\Big)\frac{\lambda}{2M^2\eta(1+\rho)}\Big]^{1/2}\|x_j - \bar x\| \le \frac12\|x_j - \bar x\|. \qquad (32)$$

Now consider an arbitrary $k \ge K_2$ such that (29) holds and set $\delta_2 := \|x_k - \bar x\|$. By (32) we have $x_{k+1} \in B_{\delta_2}(\bar x)$ and thus $(x_{k+1}, Y(x_{k+1}))$ is acceptable to $F_{K_1}$. We still have to show that it is acceptable with respect to the pairs $(\theta_l, l_l) \in F_k \cup \{(\theta_k, l_k)\}$ with $K_1 \le l \le k$. For these $l$ we have $x_l \in B_\delta(\bar x) \setminus B_{\delta_2/M}(\bar x)$ by (29) and therefore by (31)

$$\hat l_\rho(x_l, Y(x_l)) - \hat l_\rho(\bar x, \bar y) \ge \frac{\lambda}{2M^2}\delta_2^2 + \Big(1 - \frac{\beta}{4}\Big)\frac{\rho}{2}\theta_l. \qquad (33)$$

On the other hand, (31) and (32) yield

$$\hat l_\rho(x_{k+1}, Y(x_{k+1})) - \hat l_\rho(\bar x, \bar y) \le \eta(1+\rho)\|x_{k+1} - \bar x\|^2 \le \Big(1 - \frac{1 - \beta/2}{1 - \beta/4}\Big)\frac{\lambda}{2M^2}\delta_2^2. \qquad (34)$$

To prove that $(x_{k+1}, Y(x_{k+1}))$ is acceptable with respect to $(\theta_l, l_l)$ we now show that any point $(x, Y(x))$ with $x \in B_{\delta_2}(\bar x)$ that is not acceptable with respect to $(\theta_l, l_l)$ satisfies, in contrast to (34),

$$\hat l_\rho(x, Y(x)) - \hat l_\rho(\bar x, \bar y) > \Big(1 - \frac{1 - \beta/2}{1 - \beta/4}\Big)\frac{\lambda}{2M^2}\delta_2^2.$$

In fact, if $(x, Y(x))$ is not acceptable to $(\theta_l, l_l)$ then $\theta(x, Y(x)) > \beta\theta_l$ and $l(x, Y(x)) + \gamma\,\theta(x, Y(x)) > l_l$. Thus, we have with (30)

$$\hat l_\rho(x, Y(x)) = l(x, Y(x)) + \frac{\rho}{2}\theta(x, Y(x)) > l_l + \Big(\frac{\rho}{2} - \gamma\Big)\theta(x, Y(x)) > l_l + \beta\Big(\frac{\rho}{2} - \gamma\Big)\theta_l \ge l_l + \frac{\beta\rho}{4}\theta_l$$
$$= \hat l_\rho(x_l, Y(x_l)) - \Big(1 - \frac{\beta}{2}\Big)\frac{\rho}{2}\theta_l \ge \hat l_\rho(x_l, Y(x_l)) - \frac{1 - \beta/2}{1 - \beta/4}\big(\hat l_\rho(x_l, Y(x_l)) - \hat l_\rho(\bar x, \bar y)\big).$$

Hereby, we have used (31) in the last step. This shows with (33) that

$$\hat l_\rho(x, Y(x)) - \hat l_\rho(\bar x, \bar y) > \Big(1 - \frac{1 - \beta/2}{1 - \beta/4}\Big)\big(\hat l_\rho(x_l, Y(x_l)) - \hat l_\rho(\bar x, \bar y)\big) \ge \Big(1 - \frac{1 - \beta/2}{1 - \beta/4}\Big)\frac{\lambda}{2M^2}\delta_2^2.$$

Since by (34) this does not hold for $x_{k+1}$, we have shown that $(x_{k+1}, Y(x_{k+1}))$ is acceptable to $F_k \cup \{(\theta_k, l_k)\}$.

Obviously, the same argument can be applied to x_{k+1}^k and F_k ∪ {(θ_k, l_k)} instead of x_k and F_k, since by (32) again (29) is satisfied for x_{k+1}^k and F_k ∪ {(θ_k, l_k)}. This shows that x_{k+2}^k is acceptable to F_k ∪ {(θ_k, l_k)} ∪ {(θ_{k+1}, l_{k+1})}. Thus the acceptability of x_{j+1}^k to F_j ∪ {(θ_j, l_j)} follows by induction.

Finally, we show that the sequence {x_j^k}_j of full SQP-steps also passes, for k large enough, the decrease test in step 5.

Lemma 4. Let the assumptions of Lemma 3 hold and let K be as in Lemma 3 for any fixed ζ ≤ 1/2 and 0 < Δ̲ ≤ Δ̄. Then, after a possible increase of K, for all k ≥ K the sequence {x_j^k}_j of full SQP-steps defined in Proposition 1 satisfies with y_j = Y(x_j) the following implication for all j ≥ k:

    q̂_j(s_j) := q_j(s_j) + (y_j)ᵀc_j > κ_θ(θ_j)^{ψ/2}  ⟹  Δl_j(s_j) ≥ σ q̂_j(s_j).   (35)

Hereby, we use the notations Δl_j(s_j) = l(x_j, y_j) − l(x_j + s_j, Y(x_j + s_j)), c_j = c(x_j), θ_j = θ(x_j, y_j) and analogously for the other quantities. Thus, full SQP-steps starting from x_k pass step 5 of Algorithm 2 for all Δ for which the solutions of QP(x_j, Δ) and QP(x_j, Δ̄) coincide, in particular for all Δ ∈ [Δ̲, Δ̄].

Proof. By Lemma 3 we know that there exists K > 0 such that for all k ≥ K the sequence {x_j}_j = {x_j^k}_j corresponding to full SQP-steps converges q-superlinearly to x̄ and satisfies (6) for ζ = 1/2. We then have

    ‖x_j − x̄‖ ≤ (1/2)^{j−k}‖x_k − x̄‖,   ‖s_j‖ ≤ ‖x_j − x̄‖ + ‖x_{j+1} − x̄‖ ≤ (3/2)(1/2)^{j−k}‖x_k − x̄‖.

Let

    q̂_j(s_j) := q_j(s_j) + (y_j)ᵀc_j.

Obviously, the assertion is proven if we show that, with K > 0 large enough, for all j ≥ k ≥ K in the case q̂_j(s_j) > κ_θ(θ_j)^{ψ/2} the following holds:

    |q̂_j(s_j) − Δl_j(s_j)| ≤ (1 − σ) q̂_j(s_j).   (36)

Since s_j is also the solution of EQP(x_j), we have

    ĉ_j = −Â_j s_j   (37)

with ĉ_j = c_{E∪A}(x_j), Â_j = A_{E∪A}(x_j). Moreover, (y_j)_{I∖A} = 0 and thus

    (y_j)ᵀc_j = ((y_j)_{E∪A})ᵀ ĉ_j = −((y_j)_{E∪A})ᵀ Â_j s_j = −(y_j)ᵀ A_j s_j.

This shows that

    q̂_j(s_j) = −∇_x l(x_j, y_j)ᵀ s_j − (1/2) s_jᵀ B_j s_j.
Moreover, we have by (A5) that (Y(x_j + s_j) − y_j)_{I∖A} = 0 and thus

    (Y(x_j + s_j) − y_j)ᵀ c(x_j + s_j) = ((Y(x_j + s_j) − y_j)_{E∪A})ᵀ c_{E∪A}(x_j + s_j) = o(‖s_j‖²).

Hereby, we have used that ‖Y(x_j + s_j) − y_j‖ ≤ L_y‖s_j‖ and that c_{E∪A}(x_j + s_j) = ĉ_j + Â_j s_j + O(‖s_j‖²) = O(‖s_j‖²) by (37). Thus, we obtain with some τ ∈ [0, 1]

    |q̂_j(s_j) − Δl_j(s_j)| ≤ |(1/2) s_jᵀ(∇²_{xx} l(x_j + τs_j, y_j) − B_j)s_j| + |(Y(x_j + s_j) − y_j)ᵀ c(x_j + s_j)| = o(‖s_j‖²).

On the other hand, using the step decomposition (12) and the obvious inequality ‖s_j‖² = ‖s_j^n + s_j^t‖² ≤ 2(‖s_j^n‖² + ‖s_j^t‖²), we have by (15) and (14), after a possible increase of K, for all j ≥ k ≥ K

    q_j(s_j) ≥ (κ_B/2)‖s_j^t‖² − C(‖s_j^n‖ + ‖s_j^n‖²) ≥ (κ_B/4)‖s_j‖² − (C + κ_B/2)(‖s_j^n‖ + ‖s_j^n‖²) ≥ (κ_B/4)‖s_j‖² − C̄(θ_j)^{ψ/2}   (38)

with a constant C̄ > 0. Hereby, we use (14) and ψ ∈ ]1/2, 1] in the last step and choose K large enough such that θ_j ≤ 1 for all j ≥ k ≥ K. Let M_y = 1 + sup_{x ∈ U_x} ‖Y(x)‖. Then |(y_j)ᵀ c_j| ≤ M_y θ_j^{1/2} ≤ M_y θ_j^{ψ/2} by (8) and we conclude that

    q̂_j(s_j) ≥ q_j(s_j) − M_y(θ_j)^{ψ/2} ≥ (κ_B/4)‖s_j‖² − (C̄ + M_y)(θ_j)^{ψ/2}.   (39)

We are now ready to show the implication (35). Let q̂_j(s_j) > κ_θ(θ_j)^{ψ/2}. We consider two cases: If q̂_j(s_j) ≤ (κ_B/8)‖s_j‖², then (39) yields (κ_B/8)‖s_j‖² ≤ (C̄ + M_y)(θ_j)^{ψ/2} and thus

    q̂_j(s_j) > κ_θ(θ_j)^{ψ/2} ≥ (κ_θκ_B/(8(C̄ + M_y)))‖s_j‖².

Otherwise, we have q̂_j(s_j) > (κ_B/8)‖s_j‖². Thus, we obtain in both cases

    q̂_j(s_j) ≥ min{κ_B/8, κ_θκ_B/(8(C̄ + M_y))}‖s_j‖² =: κ_1‖s_j‖².   (40)

Since x_k → x̄, we obtain by (36) and (38), possibly after increasing K, that for all j ≥ k ≥ K

    |q̂_j(s_j) − Δl_j(s_j)| ≤ (1 − σ)κ_1‖s_j‖² ≤ (1 − σ) q̂_j(s_j),

where we use (40) in the last inequality. This proves the lemma.

By combining Lemma 3 and Lemma 4 we are now able to prove Theorem 2.

Proof of Theorem 2. Since (A4) and (A5) hold, we have x_k → x̄ with x̄ satisfying (O), and we can thus apply Proposition 1, Lemma 3, and Lemma 4. Let Δ̄ > 0 be as in Proposition 1, fix some M ≥ 1 and set Δ̲ := min{Δ̄, Δ_min}/4. Now choose K > 0 large enough such that the assertions of Lemma 3 and Lemma 4 hold. Since ‖x_{k+1} − x_k‖ → 0, we find K_1 ≥ K such that

    ‖s_k‖ < Δ̲ = min{Δ̄, Δ_min}/4,   θ_k^{1/2} = θ(x_k, Y(x_k))^{1/2} ≤ κ_Δ Δ̲^{1+ξ}   (41)

for all k ≥ K_1. Then by (41) and Lemma 3 the problems QP(x_k, Δ) are compatible in the sense of (18) for all k ≥ K_1 as long as Δ ≥ Δ̲. Now let x_k, k ≥ K_1, be an arbitrary iterate satisfying (29). We show that x_{k+1} = x_{k+1}^k = x_k + s_k, where s_k is the unique solution of QP(x_k, Δ̄) as in Proposition 1. Since the inner iteration halves Δ starting from Δ_k ≥ Δ_min ≥ 4Δ̲ and QP(x_k, Δ) is compatible for Δ ≥ Δ̲, it either finds an acceptable iterate x_{k+1} = x_k + s for some Δ > 2Δ̲ or it locates a trust-region radius Δ ∈ [Δ̲, 2Δ̲]. In the first case we have ‖s_k‖ < Δ̲ < Δ by (41) and thus s must coincide with the unique solution s_k of QP(x_k, Δ̄), as asserted. In the second case denote the solution of QP(x_k, Δ) by s. Since Δ ∈ [Δ̲, Δ̄] and k ≥ K, we know by Proposition 1 and Lemma 3 that s coincides with the unique solution s_k of QP(x_k, Δ̄) and thus ‖s‖ < Δ̲ by (6). Moreover, by Lemma 3 the iterate (x_{k+1}, y_{k+1}) = (x_k + s_k, Y(x_k + s_k)) is acceptable to the filter F_k ∪ {(θ_k, l_k)} and passes by Lemma 4 the decrease test in step 5. Thus, the trial iterate (x_k + s, Y(x_k + s)) is accepted by the algorithm and we have x_{k+1} = x_{k+1}^k = x_k + s_k, y_{k+1} = Y(x_{k+1}). Hence, in both cases the algorithm takes the full (local) SQP-step x_{k+1} = x_{k+1}^k. The next iterate satisfies again (29) and we obtain by induction that x_j = x_j^k for all j ≥ k with the iterates {x_j^k}_j according to Proposition 1. Now Proposition 1 yields the q-superlinear (or q-quadratic) convergence of {x_k}. The proof is complete.

4. Global convergence of the modified filter-SQP algorithm

In this section we show that the global convergence result of Theorem 1 remains true for the modified filter-SQP method of Algorithm 2. In addition to assumptions (A1)–(A3) we assume that the Lagrange multipliers used remain uniformly bounded and that the Lagrange multiplier estimates do not weight inactive inequality constraints too strongly.

Assumption (A6).
(A6) There exists a constant M_y > 0 such that all Lagrange multiplier estimates generated in step 1 or step 3 satisfy

    ‖y_k‖ ≤ M_y,   ‖y_k(s)‖ ≤ M_y,   (y_k)_I ≥ 0,   (y_k(s))_I ≥ 0   (42)

for all k. Moreover, there are constants M_L > 0 and ζ > 0 with

    max{ξ, 1/ψ − 1} < ζ ≤ 1   (43)

such that all Lagrange multiplier estimates y = y_k(s) generated in step 3 satisfy

    |y_Iᵀ c_I(x_k + s)| ≤ M_L Δ^{1+ζ}.   (44)

We note that the condition (44) is not required for Lagrange multipliers generated in the restoration. (42) is a natural requirement and is consistent with (A5).

Remark 4. In the equality constrained case (44) is void. Therefore, for equality constrained problems (A6) is obviously satisfied by the multipliers in assumption (A5).
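Condition (44) limits the weight that the multiplier estimate puts on inequality constraints that are far from active at the trial point. One way to enforce a bound of this form is to zero every multiplier component whose constraint residual exceeds a threshold of order Δ^{1+ζ}, so that only multipliers of (near-)active constraints survive. The sketch below is an illustrative assumption in this spirit, not the paper's implementation; the function name and the constants M_I and ζ are placeholders.

```python
import numpy as np

def modify_multipliers(y_I, cI_trial, cI_linear, Delta, M_I=1.0, zeta=0.9):
    """Illustrative sketch: set y_i = 0 whenever
        max(|c_i(x_k + s)|, |(c_I(x_k) + A_I(x_k) s)_i|) >= M_I * Delta**(1 + zeta),
    so that every surviving term satisfies |y_i * c_i(x_k + s)| = O(Delta**(1 + zeta)),
    which yields a bound of the shape of (44). All names/constants are assumptions."""
    y = y_I.copy()
    threshold = M_I * Delta ** (1 + zeta)
    far_from_active = np.maximum(np.abs(cI_trial), np.abs(cI_linear)) >= threshold
    y[far_from_active] = 0.0
    return y
```

Constraints active at the QP solution have a zero linearized residual and, by a Taylor bound as in Lemma 6, an actual residual of order Δ², which stays below the threshold for ζ ≤ 1 and Δ ≤ 1; their multipliers are therefore left untouched.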

Remark 5. Condition (44) can for example be ensured by applying the modification rule

    max{|(c_I(x_k + s))_i|, |(c_I(x_k) + A_I(x_k)s)_i|} ≥ M_I Δ^{1+ζ}  ⟹  y_i = 0,   (45)

since |(c_I(x_k + s))_i − (c_I(x_k) + A_I(x_k)s)_i| ≤ (1/2) n M_C Δ², cf. Lemma 6 below. This rule allows in particular y_i ≠ 0 for all constraints that are active at the solution s of QP(x_k, Δ).

Remark 6. If (A4) and (A5) hold and if QP(x_k, Δ_k) has, sufficiently close to x̄, once the correct active set (which will fail to occur only in quite pathological situations if we use B_k = ∇²_{xx} l(x_k, Y(x_k)) close to x̄), then from this iteration on the multipliers Y(x) can be taken and thus (A5) is compatible with (A6). This can be seen as follows. If (A4) holds and the solution of QP(x_k, Δ_k) has for x_k ∈ U_x the active set E ∪ A, then the rule (45) allows to choose y_{k+1} = Y(x_{k+1}) according to (A5). Thus, with k := k + 1 we have y_k = Y(x_k), and if x_k is close enough to x̄ then the condition θ_k^{1/2} ≤ κ_Δ Δ^{1+ξ} in (18) ensures that ‖c_{E∪A}(x_k)‖ ≤ µ Δ^{1+ξ} with a constant µ > 0. Using this together with (O) one can show that for all x_k ∈ U_x, possibly after shrinking U_x, the active set of QP(x_k, Δ) with B_k = ∇²_{xx} l(x_k, Y(x_k)) is E ∪ A under the compatibility requirement (18), and thus the rule (45) leaves y_{k+1} = Y(x_k + s) unchanged. More precisely, we have the following lemma, which is proven in the appendix.

Lemma 5. Let (A4) hold. Then we can find K > 0 such that for all k ≥ K the following is true: if B_k = ∇²_{xx} l(x_k, Y(x_k)) and y_k = Y(x_k) as in (A5), and if QP(x_k, Δ) is compatible in the sense of (18), then QP(x_k, Δ) has the active set E ∪ A.

The global convergence of Algorithm 2 can be obtained by a slight modification of the convergence proof of Fletcher, Leyffer, and Toint [8] for Algorithm 1. Our aim is to show the following analogue of Theorem 1.

Theorem 3. Let assumptions (A1)–(A3) and (A6) hold. Then the result of applying the modified filter-SQP algorithm of Algorithm 2 is one of the scenarios (A), (B), or (C) stated in Theorem 1.
To prove this theorem we only indicate the necessary modifications of the proof in [8]. We have already observed after the statement of Algorithm 2 that it shares the following properties with Algorithm 1 that are essential for the convergence analysis in [8]: the predicted reduction q_k(s) depends monotonically on the trust-region radius Δ, and the minimal constraint violation τ_k in the filter F_k, see (19), is always positive. In the following we sketch how the convergence analysis in [8] can be adjusted to the framework of Algorithm 2. We start by recalling the following auxiliary result in [8, Lem. 3].

Lemma 6. Let assumptions (A1)–(A3) hold. Then for any feasible point s of QP(x_k, Δ) the following holds:

    |Δf_k(s) − q_k(s)| ≤ n M_B Δ²,   (46)
    ‖c_E(x_k + s)‖ ≤ (1/2) √m n M_C Δ²,   (47)
    ‖c_I⁺(x_k + s)‖ ≤ (1/2) √m n M_C Δ²,   (48)

where M_C = sup_{x ∈ X} ‖∇²_{xx} c_i(x)‖₂, 1 ≤ i ≤ m.

Proof. This is a direct consequence of Taylor's theorem, see [8, Lem. 3].

The next lemma is an analogue of [8, Lem. 4] and shows that the iterate x_k + s is acceptable to the filter if Δ is small enough and QP(x_k, Δ) is feasible.

Lemma 7. Let assumptions (A1)–(A3) and (A6) hold. If s solves QP(x_k, Δ), then x_k + s is acceptable to the filter if

    Δ^{2+2ζ} ≤ min{1, 4βτ_k/(mn²M_C² + 4M_L²)}

with τ_k in (19) and ζ in (A6).

Proof. By (47), (48), and (A6) we have

    θ(x_k + s, y_k(s)) = ‖c_E(x_k + s)‖² + ‖c_I⁺(x_k + s)‖² + ((y_k(s))_Iᵀ c_I(x_k + s))² ≤ (mn²M_C²/4)Δ⁴ + M_L²Δ^{2+2ζ}.

Thus, θ(x_k + s) ≤ βτ_k holds for Δ^{2+2ζ} ≤ min{1, 4βτ_k/(mn²M_C² + 4M_L²)}. This ensures by (9) and the definition of τ_k in (19) that the iterate x_k + s is acceptable to the filter.

The next lemma is an adaptation of [8, Lem. 5] to Algorithm 2.

Lemma 8. Let assumptions (A1)–(A3) and (A6) hold and let x̂ ∈ X be a feasible point of problem (P) at which MFCQ holds but which is not a KT point. Then there exist a neighborhood N of x̂ and constants ε, µ, κ > 0 such that for all x_k ∈ N ∩ X and all Δ for which

    µ(θ_k^{1/(2+2ξ)} + θ_k^{ψ/2} + θ_k^{1/2}) ≤ Δ ≤ κ   (49)

it follows that QP(x_k, Δ) is compatible in the sense of (18) with solution s at which the predicted reduction (17) satisfies

    q̂_k(s) ≥ (1/4)εΔ + κ_θ θ_k^{ψ/2}   (50)

and the actual reduction (16) satisfies the sufficient reduction condition

    Δl_k(s) ≥ σ q̂_k(s)   (51)

as well as

    Δl_k(s) ≥ γθ(x_k + s).   (52)

Proof. In [8, Lem. 5] it is shown that there exist a neighborhood N of x̂ and constants ε, µ̃, κ > 0 such that for all x_k ∈ N ∩ X and all Δ with µ̃h_k ≤ Δ ≤ κ the problem QP(x_k, Δ) has a feasible solution s at which

    q_k(s) ≥ (1/3)εΔ,   Δf_k(s) ≥ σ q_k(s),   Δf_k(s) ≥ γh(x_k + s).   (53)

Without restriction we can choose κ ≤ 1. Now set

    µ = max{1, κ_Δ^{−1/(1+ξ)}, √m µ̃, 12(κ_θ + M_y + 1)/ε}.

Then (49) implies Δ^{1+ξ} ≥ κ_Δ^{−1} θ_k^{1/2}.

Thus, QP(x_k, Δ) is compatible in the sense of (18). By (10) we have h_k ≤ √m θ_k^{1/2}. Hence, (49) implies by our choice of µ

    µ̃h_k ≤ µ̃√m θ_k^{1/2} ≤ Δ ≤ κ,   (12/ε)(κ_θ + M_y + 1)(θ_k^{ψ/2} + θ_k^{1/2}) ≤ Δ.   (54)

Therefore, (53) is satisfied by [8, Lem. 5] and yields with (49), (54)

    q_k(s) ≥ (1/3)εΔ = (1/4)εΔ + (1/12)εΔ ≥ (1/4)εΔ + (κ_θ + M_y + 1)(θ_k^{ψ/2} + θ_k^{1/2}).

This shows that

    q̂_k(s) = q_k(s) + y_kᵀ c_k ≥ q_k(s) − (M_y + 1)θ_k^{1/2} ≥ (1/4)εΔ + κ_θ θ_k^{ψ/2}

and (50) is proven. It remains to show (51) and (52). By (42) in (A6) we have (y_k(s))_Iᵀ c_I(x_k + s) ≤ 0. This yields with (47), (48)

    Δl_k(s) = Δf_k(s) + y_kᵀ c_k − y_k(s)ᵀ c(x_k + s) ≥ Δf_k(s) + y_kᵀ c_k − (1/2) M_y √m n M_C Δ².

Together with (46) and (50) we obtain

    Δl_k(s) ≥ q̂_k(s) − (n M_B + (1/2) M_y √m n M_C)Δ² ≥ σ q̂_k(s) + (1 − σ)(1/4)εΔ − (n M_B + (1/2) M_y √m n M_C)Δ².   (55)

Thus, (51) holds for all

    Δ ≤ (1 − σ)(1/4)ε / (n M_B + (1/2) M_y √m n M_C) =: κ_1   (56)

and consequently (49) ensures (51) if we choose κ ≤ κ_1. Finally, (51), (50), (47), (48), and (A6) yield

    Δl_k(s) − γθ(x_k + s) ≥ σ(1/4)εΔ − γ(mn²M_C²/4)Δ⁴ − γM_L²Δ^{2+2ζ}   (57)

and the right hand side is nonnegative for all

    Δ ≤ min{1, (σε/(γmn²M_C² + 4γM_L²))^{1/(1+2ζ)}} =: κ_2.   (58)

Thus, also (52) is satisfied if we choose κ ≤ κ_2 in (49).

As a last preparation for the proof of Theorem 3 we proceed as in [8] and show that the inner loop of Algorithm 2 terminates finitely. The following lemma is a counterpart of [8, Lem. 6].

Lemma 9. Let assumptions (A1)–(A3) and (A6) hold. Then the inner iteration of Algorithm 2 terminates finitely.

Proof. Again, we only describe the modifications of the proof of Lemma 6 in [8]. If x_k is a KT point of (P) then s = 0 solves QP(x_k, Δ) and the inner iteration terminates in step 3. Otherwise, we have Δ → 0 if the inner iteration does not terminate finitely. If θ_k > 0 then QP(x_k, Δ) is obviously incompatible in the sense of (18) for all sufficiently small Δ > 0. Thus, the inner iteration terminates finitely in step 2. If θ_k = 0 then we find, exactly as in the proof of Lemma 6 in [8, p. 55], a constant η̃ > 0 with q_k(s) ≥ (1/2)η̃Δ for all 0 < Δ ≤ κ̃ with a constant κ̃ > 0 sufficiently small. Now, θ_k = 0 implies y_kᵀc_k = 0 by (11) and thus

    q̂_k(s) = q_k(s) ≥ (1/2)η̃Δ > κ_θ θ_k^{ψ/2} = 0.

Thus, it follows that q̂_k(s) ≥ (1/4)εΔ with ε = 2η̃. Now we can continue exactly as in the proof of Lemma 8. In fact, we obtain (55) and thus Δl_k(s) ≥ σ q̂_k(s) holds for all Δ ≤ κ_1 with κ_1 > 0 according to (56). Hence, the sufficient decrease condition in step 5 is satisfied for Δ ≤ min{κ̃, κ_1}. Moreover, we have also (57) and thus Δl_k(s) ≥ γθ(x_k + s) is satisfied for Δ ≤ κ_2 with κ_2 > 0 in (58). Thus, x_k + s is for Δ ≤ min{κ̃, κ_1, κ_2} also acceptable relative to (θ_k, l_k), cf. (9). Moreover, for

    Δ ≤ min{1, (4βτ_k/(mn²M_C² + 4M_L²))^{1/(2+2ζ)}} =: κ_3

the iterate x_k + s is by Lemma 7 also acceptable to F_k. Hence, for Δ ≤ min{κ̃, κ_1, κ_2, κ_3} also the filter test of step 4 is satisfied and thus the inner iteration terminates in step 7. The proof is complete.

We are now able to adapt the global convergence proof of [8] to Algorithm 2.

Proof of Theorem 3. We have only to consider case (C). Then the algorithm runs infinitely and, since the inner iteration is finite by Lemma 9, the outer iteration produces an infinite sequence of iterates x_k. By (A1) the sequence {x_k} has at least one accumulation point. Consider the case that the outer iteration produces an infinite sequence of h-type iterations.
Exactly as in [8] we obtain a subsequence S of h-type iterations such that x_k → x̂, k ∈ S, with some x̂, τ_{k+1} = θ_k < τ_k for k ∈ S, and Lemma 1 with (θ_k, l_k) instead of (h_k, f_k) yields θ_k → 0 for k ∈ S. Thus, x̂ is feasible, and if x̂ violates MFCQ then (C) is satisfied. Now consider the case where MFCQ is satisfied at x̂ and x̂ is not a KT point. As in [8] we will derive a contradiction. Let N be the neighborhood of x̂ given in Lemma 8 and choose K > 0 so large that x_k ∈ N for all k ∈ S, k ≥ K. Lemma 7 and Lemma 8 show that, with constants µ, κ > 0, for all k ∈ S, k ≥ K, and all Δ with

    µ(θ_k^{1/2} + θ_k^{ψ/2} + θ_k^{1/(2+2ξ)}) ≤ Δ ≤ min{κ, 1, (4βτ_k/(mn²M_C² + 4M_L²))^{1/(2+2ζ)}}   (59)

the problem QP(x_k, Δ) is compatible and the solution s satisfies Δl_k(s) ≥ σ q̂_k(s), q̂_k(s) > κ_θθ_k^{ψ/2}, Δl_k(s) ≥ γθ(x_k + s) (see Lemma 8), and (θ(x_k + s), l(x_k + s, y_k(s))) is acceptable to F_k (see Lemma 7). In particular, for all Δ ≥ µ(θ_k^{1/2} + θ_k^{ψ/2} + θ_k^{1/(2+2ξ)}) no h-type iteration occurs, since QP(x_k, Δ) is compatible in the sense of (18) and q̂_k(s) > κ_θθ_k^{ψ/2} (recall that q_k(s) is nondecreasing in Δ), and for Δ satisfying (59) an f-type step will occur. Hence, the algorithm will take an f-type step in iteration k ∈ S, k ≥ K,

if halving of Δ, beginning at some Δ ≥ Δ_min in the inner loop, locates a Δ satisfying (59) (then the conditions for an f-type step will be met if this was not the case before). But we have τ_{k+1} = θ_k < τ_k and θ_k → 0 for k ∈ S. Since by (43) min{1/2, ψ/2, 1/(2+2ξ)} > 1/(2+2ζ), the upper bound in (59) will for all k ∈ S large enough be at least twice the lower bound. Thus, for large k ∈ S an f-type step will be taken, which contradicts the fact that the iterations k ∈ S are composed of h-type steps. Thus, the assumption that x̂ is not a KT point was wrong.

Finally, consider the case where the algorithm takes only a finite number of h-type steps. Then there is K > 0 such that

    l_k − l_{k+1} ≥ σ q̂_k(s_k) > σκ_θ θ_k^{ψ/2}

for all k ≥ K. By (A1), (A2), (A6), l_k is uniformly bounded and thus

    Σ_{k>K} (l_k − l_{k+1}) ≥ Σ_{k>K} σ q̂_k(s_k) > Σ_{k>K} σκ_θ θ_k^{ψ/2},

where the left-hand side is finite. Therefore, we have θ_k → 0, and thus any accumulation point x̂ of {x_k}, which exists by (A1), is feasible. Assume that x̂ satisfies MFCQ but is no KT point and let S be a subsequence of iterates converging to x̂. As above, let N be the neighborhood of x̂ given in Lemma 8 and choose K > 0 so large that x_k ∈ N for all k ∈ S, k ≥ K. Since no h-type steps occur for k ≥ K, we have τ_k = τ_K > 0 for all k ≥ K and, exactly as above, an f-type step is guaranteed for all k ∈ S, k ≥ K, if

    µ(θ_k^{1/2} + θ_k^{ψ/2} + θ_k^{1/(2+2ξ)}) ≤ Δ ≤ min{κ, 1, (4βτ_K/(mn²M_C² + 4M_L²))^{1/(2+2ζ)}} =: κ̂.   (60)

This time the upper bound is constant and the lower bound is at most half the upper bound for k ≥ K, possibly after increasing K. Hence, for all k ∈ S, k ≥ K, the f-type step occurs with Δ ≥ min{Δ_min, κ̂/2} and thus (50) yields

    l_k − l_{k+1} ≥ σ q̂_k(s_k) ≥ σ(1/4)ε min{Δ_min, κ̂/2}.

Since l_k − l_{k+1} > σκ_θθ_k^{ψ/2} ≥ 0 for all k ≥ K, this implies l_k → −∞, which is a contradiction to the boundedness of l_k. Thus x̂ is a KT point or violates MFCQ.
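The case analysis above mirrors the inner loop of Algorithm 2: the trust-region radius is halved until either the subproblem becomes incompatible (an h-type step, i.e. a switch to feasibility restoration) or a trial step passes both the filter test and the sufficient-decrease test (an f-type step). A schematic sketch of this control flow, with the subproblem solver and the tests passed in as callables, is given below; everything here is an illustrative assumption, not the paper's code.

```python
def inner_loop(Delta0, compatible, solve_qp, passes_filter, sufficient_decrease,
               Delta_min=1e-12):
    """Schematic inner iteration of a filter trust-region SQP method:
    halve Delta until a trial step is accepted (f-type) or the QP becomes
    incompatible (h-type, switch to the restoration phase). The callables
    are placeholders for the corresponding pieces of Algorithm 2."""
    Delta = Delta0
    while Delta >= Delta_min:
        if not compatible(Delta):          # compatibility condition violated
            return "h-type", None, Delta   # restoration phase
        s = solve_qp(Delta)                # trial step for current radius
        if passes_filter(s) and sufficient_decrease(s, Delta):
            return "f-type", s, Delta      # accept trial step
        Delta *= 0.5                       # halve the radius and retry
    raise RuntimeError("trust region collapsed")
```

Lemma 9 corresponds to the statement that, under (A1)–(A3) and (A6), this loop always returns after finitely many halvings.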
5. Conclusions

We have proposed a modified version of the globally convergent trust-region filter-SQP method by Fletcher, Leyffer, and Toint [8] such that the transition to fast local convergence is ensured under appropriate local assumptions on the Hessian approximation in the SQP-problems and on the Lagrange multipliers. Hereby, the original SQP-steps can be used without an additional second-order correction. The modified algorithm is globally convergent if the Lagrange multipliers used are bounded and do not weight inactive inequality constraints too strongly. Thus, in the equality constrained case only the boundedness of the Lagrange multipliers is assumed globally, which is always compatible with our local requirement on the Lagrange multipliers. In the inequality constrained case our global and local conditions on the Lagrange multipliers are in particular compatible as soon as, sufficiently close to the limit point x̄, a subproblem


More information

Kantorovich-type Theorems for Generalized Equations

Kantorovich-type Theorems for Generalized Equations SWM ORCOS Kantorovich-type Theorems for Generalized Equations R. Cibulka, A. L. Dontchev, J. Preininger, T. Roubal and V. Veliov Research Report 2015-16 November, 2015 Operations Research and Control Systems

More information

Lecture Quantitative Finance Spring Term 2015

Lecture Quantitative Finance Spring Term 2015 implied Lecture Quantitative Finance Spring Term 2015 : May 7, 2015 1 / 28 implied 1 implied 2 / 28 Motivation and setup implied the goal of this chapter is to treat the implied which requires an algorithm

More information

Arbitrage Conditions for Electricity Markets with Production and Storage

Arbitrage Conditions for Electricity Markets with Production and Storage SWM ORCOS Arbitrage Conditions for Electricity Markets with Production and Storage Raimund Kovacevic Research Report 2018-03 March 2018 ISSN 2521-313X Operations Research and Control Systems Institute

More information

Variable-Number Sample-Path Optimization

Variable-Number Sample-Path Optimization Noname manuscript No. (will be inserted by the editor Geng Deng Michael C. Ferris Variable-Number Sample-Path Optimization the date of receipt and acceptance should be inserted later Abstract The sample-path

More information

Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity

Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity Coralia Cartis,, Nicholas I. M. Gould, and Philippe L. Toint September

More information

Interior-Point Algorithm for CLP II. yyye

Interior-Point Algorithm for CLP II.   yyye Conic Linear Optimization and Appl. Lecture Note #10 1 Interior-Point Algorithm for CLP II Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/

More information

DASC: A DECOMPOSITION ALGORITHM FOR MULTISTAGE STOCHASTIC PROGRAMS WITH STRONGLY CONVEX COST FUNCTIONS

DASC: A DECOMPOSITION ALGORITHM FOR MULTISTAGE STOCHASTIC PROGRAMS WITH STRONGLY CONVEX COST FUNCTIONS DASC: A DECOMPOSITION ALGORITHM FOR MULTISTAGE STOCHASTIC PROGRAMS WITH STRONGLY CONVEX COST FUNCTIONS Vincent Guigues School of Applied Mathematics, FGV Praia de Botafogo, Rio de Janeiro, Brazil vguigues@fgv.br

More information

Chapter 7: Portfolio Theory

Chapter 7: Portfolio Theory Chapter 7: Portfolio Theory 1. Introduction 2. Portfolio Basics 3. The Feasible Set 4. Portfolio Selection Rules 5. The Efficient Frontier 6. Indifference Curves 7. The Two-Asset Portfolio 8. Unrestriceted

More information

SHORT-TERM RELATIVE ARBITRAGE IN VOLATILITY-STABILIZED MARKETS

SHORT-TERM RELATIVE ARBITRAGE IN VOLATILITY-STABILIZED MARKETS SHORT-TERM RELATIVE ARBITRAGE IN VOLATILITY-STABILIZED MARKETS ADRIAN D. BANNER INTECH One Palmer Square Princeton, NJ 8542, USA adrian@enhanced.com DANIEL FERNHOLZ Department of Computer Sciences University

More information

DRAFT. 1 exercise in state (S, t), π(s, t) = 0 do not exercise in state (S, t) Review of the Risk Neutral Stock Dynamics

DRAFT. 1 exercise in state (S, t), π(s, t) = 0 do not exercise in state (S, t) Review of the Risk Neutral Stock Dynamics Chapter 12 American Put Option Recall that the American option has strike K and maturity T and gives the holder the right to exercise at any time in [0, T ]. The American option is not straightforward

More information

CHOICE THEORY, UTILITY FUNCTIONS AND RISK AVERSION

CHOICE THEORY, UTILITY FUNCTIONS AND RISK AVERSION CHOICE THEORY, UTILITY FUNCTIONS AND RISK AVERSION Szabolcs Sebestyén szabolcs.sebestyen@iscte.pt Master in Finance INVESTMENTS Sebestyén (ISCTE-IUL) Choice Theory Investments 1 / 65 Outline 1 An Introduction

More information

Online Appendix for Debt Contracts with Partial Commitment by Natalia Kovrijnykh

Online Appendix for Debt Contracts with Partial Commitment by Natalia Kovrijnykh Online Appendix for Debt Contracts with Partial Commitment by Natalia Kovrijnykh Omitted Proofs LEMMA 5: Function ˆV is concave with slope between 1 and 0. PROOF: The fact that ˆV (w) is decreasing in

More information

Martingales. by D. Cox December 2, 2009

Martingales. by D. Cox December 2, 2009 Martingales by D. Cox December 2, 2009 1 Stochastic Processes. Definition 1.1 Let T be an arbitrary index set. A stochastic process indexed by T is a family of random variables (X t : t T) defined on a

More information

In Discrete Time a Local Martingale is a Martingale under an Equivalent Probability Measure

In Discrete Time a Local Martingale is a Martingale under an Equivalent Probability Measure In Discrete Time a Local Martingale is a Martingale under an Equivalent Probability Measure Yuri Kabanov 1,2 1 Laboratoire de Mathématiques, Université de Franche-Comté, 16 Route de Gray, 253 Besançon,

More information

Lecture 4: Divide and Conquer

Lecture 4: Divide and Conquer Lecture 4: Divide and Conquer Divide and Conquer Merge sort is an example of a divide-and-conquer algorithm Recall the three steps (at each level to solve a divideand-conquer problem recursively Divide

More information

Course notes for EE394V Restructured Electricity Markets: Locational Marginal Pricing

Course notes for EE394V Restructured Electricity Markets: Locational Marginal Pricing Course notes for EE394V Restructured Electricity Markets: Locational Marginal Pricing Ross Baldick Copyright c 2018 Ross Baldick www.ece.utexas.edu/ baldick/classes/394v/ee394v.html Title Page 1 of 160

More information

Order book resilience, price manipulations, and the positive portfolio problem

Order book resilience, price manipulations, and the positive portfolio problem Order book resilience, price manipulations, and the positive portfolio problem Alexander Schied Mannheim University PRisMa Workshop Vienna, September 28, 2009 Joint work with Aurélien Alfonsi and Alla

More information

Online Appendix Optimal Time-Consistent Government Debt Maturity D. Debortoli, R. Nunes, P. Yared. A. Proofs

Online Appendix Optimal Time-Consistent Government Debt Maturity D. Debortoli, R. Nunes, P. Yared. A. Proofs Online Appendi Optimal Time-Consistent Government Debt Maturity D. Debortoli, R. Nunes, P. Yared A. Proofs Proof of Proposition 1 The necessity of these conditions is proved in the tet. To prove sufficiency,

More information

Finite Memory and Imperfect Monitoring

Finite Memory and Imperfect Monitoring Federal Reserve Bank of Minneapolis Research Department Finite Memory and Imperfect Monitoring Harold L. Cole and Narayana Kocherlakota Working Paper 604 September 2000 Cole: U.C.L.A. and Federal Reserve

More information

On the Number of Permutations Avoiding a Given Pattern

On the Number of Permutations Avoiding a Given Pattern On the Number of Permutations Avoiding a Given Pattern Noga Alon Ehud Friedgut February 22, 2002 Abstract Let σ S k and τ S n be permutations. We say τ contains σ if there exist 1 x 1 < x 2

More information

Game Theory: Normal Form Games

Game Theory: Normal Form Games Game Theory: Normal Form Games Michael Levet June 23, 2016 1 Introduction Game Theory is a mathematical field that studies how rational agents make decisions in both competitive and cooperative situations.

More information

3 Arbitrage pricing theory in discrete time.

3 Arbitrage pricing theory in discrete time. 3 Arbitrage pricing theory in discrete time. Orientation. In the examples studied in Chapter 1, we worked with a single period model and Gaussian returns; in this Chapter, we shall drop these assumptions

More information

On Toponogov s Theorem

On Toponogov s Theorem On Toponogov s Theorem Viktor Schroeder 1 Trigonometry of constant curvature spaces Let κ R be given. Let M κ be the twodimensional simply connected standard space of constant curvature κ. Thus M κ is

More information

Maximum Contiguous Subsequences

Maximum Contiguous Subsequences Chapter 8 Maximum Contiguous Subsequences In this chapter, we consider a well-know problem and apply the algorithm-design techniques that we have learned thus far to this problem. While applying these

More information

On Complexity of Multistage Stochastic Programs

On Complexity of Multistage Stochastic Programs On Complexity of Multistage Stochastic Programs Alexander Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA e-mail: ashapiro@isye.gatech.edu

More information

Unary PCF is Decidable

Unary PCF is Decidable Unary PCF is Decidable Ralph Loader Merton College, Oxford November 1995, revised October 1996 and September 1997. Abstract We show that unary PCF, a very small fragment of Plotkin s PCF [?], has a decidable

More information

3.2 No-arbitrage theory and risk neutral probability measure

3.2 No-arbitrage theory and risk neutral probability measure Mathematical Models in Economics and Finance Topic 3 Fundamental theorem of asset pricing 3.1 Law of one price and Arrow securities 3.2 No-arbitrage theory and risk neutral probability measure 3.3 Valuation

More information

Notes on the symmetric group

Notes on the symmetric group Notes on the symmetric group 1 Computations in the symmetric group Recall that, given a set X, the set S X of all bijections from X to itself (or, more briefly, permutations of X) is group under function

More information

Tangent Lévy Models. Sergey Nadtochiy (joint work with René Carmona) Oxford-Man Institute of Quantitative Finance University of Oxford.

Tangent Lévy Models. Sergey Nadtochiy (joint work with René Carmona) Oxford-Man Institute of Quantitative Finance University of Oxford. Tangent Lévy Models Sergey Nadtochiy (joint work with René Carmona) Oxford-Man Institute of Quantitative Finance University of Oxford June 24, 2010 6th World Congress of the Bachelier Finance Society Sergey

More information

A Translation of Intersection and Union Types

A Translation of Intersection and Union Types A Translation of Intersection and Union Types for the λ µ-calculus Kentaro Kikuchi RIEC, Tohoku University kentaro@nue.riec.tohoku.ac.jp Takafumi Sakurai Department of Mathematics and Informatics, Chiba

More information

B. Online Appendix. where ɛ may be arbitrarily chosen to satisfy 0 < ɛ < s 1 and s 1 is defined in (B1). This can be rewritten as

B. Online Appendix. where ɛ may be arbitrarily chosen to satisfy 0 < ɛ < s 1 and s 1 is defined in (B1). This can be rewritten as B Online Appendix B1 Constructing examples with nonmonotonic adoption policies Assume c > 0 and the utility function u(w) is increasing and approaches as w approaches 0 Suppose we have a prior distribution

More information

Equivalence between Semimartingales and Itô Processes

Equivalence between Semimartingales and Itô Processes International Journal of Mathematical Analysis Vol. 9, 215, no. 16, 787-791 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/1.12988/ijma.215.411358 Equivalence between Semimartingales and Itô Processes

More information

Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete)

Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete) Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete) Ying Chen Hülya Eraslan January 9, 216 Abstract We analyze a dynamic model of judicial decision

More information

Multi-period mean variance asset allocation: Is it bad to win the lottery?

Multi-period mean variance asset allocation: Is it bad to win the lottery? Multi-period mean variance asset allocation: Is it bad to win the lottery? Peter Forsyth 1 D.M. Dang 1 1 Cheriton School of Computer Science University of Waterloo Guangzhou, July 28, 2014 1 / 29 The Basic

More information

Hints on Some of the Exercises

Hints on Some of the Exercises Hints on Some of the Exercises of the book R. Seydel: Tools for Computational Finance. Springer, 00/004/006/009/01. Preparatory Remarks: Some of the hints suggest ideas that may simplify solving the exercises

More information

Stability in geometric & functional inequalities

Stability in geometric & functional inequalities Stability in geometric & functional inequalities A. Figalli The University of Texas at Austin www.ma.utexas.edu/users/figalli/ Alessio Figalli (UT Austin) Stability in geom. & funct. ineq. Krakow, July

More information

Revenue Management Under the Markov Chain Choice Model

Revenue Management Under the Markov Chain Choice Model Revenue Management Under the Markov Chain Choice Model Jacob B. Feldman School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853, USA jbf232@cornell.edu Huseyin

More information

Portfolio selection with multiple risk measures

Portfolio selection with multiple risk measures Portfolio selection with multiple risk measures Garud Iyengar Columbia University Industrial Engineering and Operations Research Joint work with Carlos Abad Outline Portfolio selection and risk measures

More information

Best-Reply Sets. Jonathan Weinstein Washington University in St. Louis. This version: May 2015

Best-Reply Sets. Jonathan Weinstein Washington University in St. Louis. This version: May 2015 Best-Reply Sets Jonathan Weinstein Washington University in St. Louis This version: May 2015 Introduction The best-reply correspondence of a game the mapping from beliefs over one s opponents actions to

More information

Assets with possibly negative dividends

Assets with possibly negative dividends Assets with possibly negative dividends (Preliminary and incomplete. Comments welcome.) Ngoc-Sang PHAM Montpellier Business School March 12, 2017 Abstract The paper introduces assets whose dividends can

More information

Is Greedy Coordinate Descent a Terrible Algorithm?

Is Greedy Coordinate Descent a Terrible Algorithm? Is Greedy Coordinate Descent a Terrible Algorithm? Julie Nutini, Mark Schmidt, Issam Laradji, Michael Friedlander, Hoyt Koepke University of British Columbia Optimization and Big Data, 2015 Context: Random

More information

ELEMENTS OF MATRIX MATHEMATICS

ELEMENTS OF MATRIX MATHEMATICS QRMC07 9/7/0 4:45 PM Page 5 CHAPTER SEVEN ELEMENTS OF MATRIX MATHEMATICS 7. AN INTRODUCTION TO MATRICES Investors frequently encounter situations involving numerous potential outcomes, many discrete periods

More information

A Simple Method for Solving Multiperiod Mean-Variance Asset-Liability Management Problem

A Simple Method for Solving Multiperiod Mean-Variance Asset-Liability Management Problem Available online at wwwsciencedirectcom Procedia Engineering 3 () 387 39 Power Electronics and Engineering Application A Simple Method for Solving Multiperiod Mean-Variance Asset-Liability Management Problem

More information

On Utility Based Pricing of Contingent Claims in Incomplete Markets

On Utility Based Pricing of Contingent Claims in Incomplete Markets On Utility Based Pricing of Contingent Claims in Incomplete Markets J. Hugonnier 1 D. Kramkov 2 W. Schachermayer 3 March 5, 2004 1 HEC Montréal and CIRANO, 3000 Chemin de la Côte S te Catherine, Montréal,

More information

Risk aversion in multi-stage stochastic programming: a modeling and algorithmic perspective

Risk aversion in multi-stage stochastic programming: a modeling and algorithmic perspective Risk aversion in multi-stage stochastic programming: a modeling and algorithmic perspective Tito Homem-de-Mello School of Business Universidad Adolfo Ibañez, Santiago, Chile Joint work with Bernardo Pagnoncelli

More information

Continuous images of closed sets in generalized Baire spaces ESI Workshop: Forcing and Large Cardinals

Continuous images of closed sets in generalized Baire spaces ESI Workshop: Forcing and Large Cardinals Continuous images of closed sets in generalized Baire spaces ESI Workshop: Forcing and Large Cardinals Philipp Moritz Lücke (joint work with Philipp Schlicht) Mathematisches Institut, Rheinische Friedrich-Wilhelms-Universität

More information

Stochastic Games with 2 Non-Absorbing States

Stochastic Games with 2 Non-Absorbing States Stochastic Games with 2 Non-Absorbing States Eilon Solan June 14, 2000 Abstract In the present paper we consider recursive games that satisfy an absorbing property defined by Vieille. We give two sufficient

More information

IEOR E4602: Quantitative Risk Management

IEOR E4602: Quantitative Risk Management IEOR E4602: Quantitative Risk Management Risk Measures Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Reference: Chapter 8

More information

The Probationary Period as a Screening Device: The Monopolistic Insurer

The Probationary Period as a Screening Device: The Monopolistic Insurer THE GENEVA RISK AND INSURANCE REVIEW, 30: 5 14, 2005 c 2005 The Geneva Association The Probationary Period as a Screening Device: The Monopolistic Insurer JAAP SPREEUW Cass Business School, Faculty of

More information

arxiv: v2 [math.lo] 13 Feb 2014

arxiv: v2 [math.lo] 13 Feb 2014 A LOWER BOUND FOR GENERALIZED DOMINATING NUMBERS arxiv:1401.7948v2 [math.lo] 13 Feb 2014 DAN HATHAWAY Abstract. We show that when κ and λ are infinite cardinals satisfying λ κ = λ, the cofinality of the

More information

1 Precautionary Savings: Prudence and Borrowing Constraints

1 Precautionary Savings: Prudence and Borrowing Constraints 1 Precautionary Savings: Prudence and Borrowing Constraints In this section we study conditions under which savings react to changes in income uncertainty. Recall that in the PIH, when you abstract from

More information

Equilibrium payoffs in finite games

Equilibrium payoffs in finite games Equilibrium payoffs in finite games Ehud Lehrer, Eilon Solan, Yannick Viossat To cite this version: Ehud Lehrer, Eilon Solan, Yannick Viossat. Equilibrium payoffs in finite games. Journal of Mathematical

More information

THE NUMBER OF UNARY CLONES CONTAINING THE PERMUTATIONS ON AN INFINITE SET

THE NUMBER OF UNARY CLONES CONTAINING THE PERMUTATIONS ON AN INFINITE SET THE NUMBER OF UNARY CLONES CONTAINING THE PERMUTATIONS ON AN INFINITE SET MICHAEL PINSKER Abstract. We calculate the number of unary clones (submonoids of the full transformation monoid) containing the

More information

Multirate Multicast Service Provisioning II: A Tâtonnement Process for Rate Allocation

Multirate Multicast Service Provisioning II: A Tâtonnement Process for Rate Allocation Mathematical Methods of Operations Research manuscript No. (will be inserted by the editor) Multirate Multicast Service Provisioning II: A Tâtonnement Process for Rate Allocation Tudor Mihai Stoenescu

More information

On two homogeneous self-dual approaches to. linear programming and its extensions

On two homogeneous self-dual approaches to. linear programming and its extensions Mathematical Programming manuscript No. (will be inserted by the editor) Shinji Mizuno Michael J. Todd On two homogeneous self-dual approaches to linear programming and its extensions Received: date /

More information

On the 'Lock-In' Effects of Capital Gains Taxation

On the 'Lock-In' Effects of Capital Gains Taxation May 1, 1997 On the 'Lock-In' Effects of Capital Gains Taxation Yoshitsugu Kanemoto 1 Faculty of Economics, University of Tokyo 7-3-1 Hongo, Bunkyo-ku, Tokyo 113 Japan Abstract The most important drawback

More information