An adaptive cubic regularization algorithm for nonconvex optimization with convex constraints and its function-evaluation complexity


1 An adaptive cubic regularization algorithm for nonconvex optimization with convex constraints and its function-evaluation complexity

Coralia Cartis, Nick Gould and Philippe Toint, Department of Mathematics, University of Namur, Belgium (philippe.toint@fundp.ac.be). Buenos Aires, IFIP, July 2009.

2 Cubic regularization for unconstrained problems: The problem

We consider the unconstrained nonlinear programming problem: minimize $f(x)$ for $x \in \mathbb{R}^n$, where $f : \mathbb{R}^n \to \mathbb{R}$ is smooth. An important special case is the nonlinear least-squares problem: minimize $f(x) = \tfrac{1}{2}\|F(x)\|_2^2$ for $x \in \mathbb{R}^n$, where $F : \mathbb{R}^n \to \mathbb{R}^m$ is smooth.

3 Cubic regularization for unconstrained problems: A useful observation

Note the following: if $f$ has gradient $g$ and globally Lipschitz continuous Hessian $H$ with constant $2L$, then Taylor, Cauchy-Schwarz and Lipschitz imply
$$f(x+s) = f(x) + \langle s, g(x)\rangle + \tfrac{1}{2}\langle s, H(x)s\rangle + \int_0^1 (1-\alpha)\,\langle s, [H(x+\alpha s) - H(x)]\,s\rangle\, d\alpha$$
$$\le \underbrace{f(x) + \langle s, g(x)\rangle + \tfrac{1}{2}\langle s, H(x)s\rangle + \tfrac{1}{3}L\|s\|_2^3}_{m(s)},$$
so reducing $m$ from $s = 0$ improves $f$, since $m(0) = f(x)$.
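
The overestimation property is easy to verify numerically. A minimal check (not from the slides), using $f(x) = \sum_i \cos(x_i)$, whose third derivatives are bounded by 1 so that its Hessian is globally Lipschitz with constant $1 = 2L$:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x): return np.sum(np.cos(x))
def g(x): return -np.sin(x)
def H(x): return np.diag(-np.cos(x))

L = 0.5  # Hessian Lipschitz constant of f is 1 = 2L

x = rng.standard_normal(5)
for _ in range(1000):
    s = rng.standard_normal(5) * rng.uniform(0.1, 10.0)
    m = f(x) + s @ g(x) + 0.5 * s @ H(x) @ s + (L / 3) * np.linalg.norm(s) ** 3
    assert f(x + s) <= m + 1e-12   # m(s) overestimates f(x + s)
print("cubic overestimation held on all samples")
```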

4 Cubic regularization for unconstrained problems: The cubic regularization

Change from trust regions,
$$\min_s\; f(x) + \langle s, g(x)\rangle + \tfrac{1}{2}\langle s, H(x)s\rangle \quad \text{s.t. } \|s\|_2 \le \Delta,$$
to cubic regularization:
$$\min_s\; f(x) + \langle s, g(x)\rangle + \tfrac{1}{2}\langle s, H(x)s\rangle + \tfrac{1}{3}\sigma\|s\|^3.$$
$\sigma$ is the (adaptive) regularization parameter (ideas from Griewank, Weiser/Deuflhard/Erdmann, Nesterov/Polyak, Cartis/Gould/Toint).
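
Unlike the trust-region step, nothing explicitly constrains $s$ here: the cubic term alone keeps the minimizer finite. In one dimension the minimizer on the descent side even has a closed form, which makes the effect of $\sigma$ easy to see (a sketch, not from the slides):

```python
import numpy as np

def cubic_step_1d(g, H, sigma):
    """Minimizer of g*s + 0.5*H*s**2 + (sigma/3)*|s|**3 on the descent
    side s = -sign(g)*t, t >= 0: the derivative sigma*t**2 + H*t - |g|
    has exactly one positive root. For H >= 0 this is the global
    minimizer; a nonconvex model may have another local minimizer on
    the other side, which this sketch ignores."""
    t = (-H + np.sqrt(H * H + 4.0 * sigma * abs(g))) / (2.0 * sigma)
    return -np.sign(g) * t

print(cubic_step_1d(1.0, -2.0, 0.5))  # larger step when the model is nonconvex
```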

5 Cubic regularization for unconstrained problems: Cubic regularization highlights

$$f(x+s) \le m(s) \equiv f(x) + s^T g(x) + \tfrac{1}{2}\, s^T H(x)\, s + \tfrac{1}{3} L \|s\|_2^3$$

Nesterov and Polyak minimize $m$ globally and exactly (N.B. $m$ may be nonconvex!); an efficient scheme exists to do so if $H$ has sparse factors. This yields global (ultimately rapid) convergence to a 2nd-order critical point of $f$, with better worst-case function-evaluation complexity than previously known.

Obvious questions: can we avoid the global Lipschitz requirement? Can we approximately minimize $m$ and retain good worst-case function-evaluation complexity? Does this work well in practice?

6 Cubic regularization for unconstrained problems: Cubic overestimation

Assume $f \in C^2$, write $f$, $g$ and $H$ at $x_k$ as $f_k$, $g_k$ and $H_k$, and let $B_k$ be a symmetric approximation to $H_k$, with $B_k$ and $H_k$ bounded at points of interest. Use the cubic overestimating model at $x_k$:
$$m_k(s) \equiv f_k + s^T g_k + \tfrac{1}{2}\, s^T B_k s + \tfrac{1}{3}\, \sigma_k \|s\|^3,$$
where $\sigma_k$ is the iteration-dependent regularisation weight. This is easily generalized for regularisation in the $M_k$-norm $\|s\|_{M_k} = \sqrt{s^T M_k s}$, where $M_k$ is uniformly positive definite.
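
For reference in the sketches below, evaluating the model is a one-liner; the function name and the optional $M_k$ argument are illustrative assumptions:

```python
import numpy as np

def cubic_model(s, f_k, g_k, B_k, sigma_k, M_k=None):
    """m_k(s) = f_k + s'g_k + 0.5 s'B_k s + (sigma_k/3) ||s||^3, with
    ||.|| the Euclidean norm or, if M_k is given, the M_k-norm."""
    ns = np.sqrt(s @ M_k @ s) if M_k is not None else np.linalg.norm(s)
    return f_k + s @ g_k + 0.5 * s @ B_k @ s + (sigma_k / 3.0) * ns ** 3
```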

7 Cubic regularization for unconstrained problems: Adaptive Regularization with Cubic (ARC)

Algorithm 1.1: The ARC Algorithm

Step 0: Initialization: $x_0$ and $\sigma_0 > 0$ given. Set $k = 0$.

Step 1: Step computation: compute $s_k$ for which $m_k(s_k) \le m_k(s_k^C)$, where the Cauchy point is $s_k^C = -\alpha_k^C g_k$ with $\alpha_k^C = \arg\min_{\alpha \in \mathbb{R}_+} m_k(-\alpha g_k)$.

Step 2: Step acceptance: compute
$$\rho_k = \frac{f(x_k) - f(x_k + s_k)}{f(x_k) - m_k(s_k)}$$
and set $x_{k+1} = x_k + s_k$ if $\rho_k > 0.1$, and $x_{k+1} = x_k$ otherwise.

Step 3: Update the regularization parameter:
$$\sigma_{k+1} \in \begin{cases} (0, \sigma_k], \text{ e.g. } \tfrac{1}{2}\sigma_k, & \text{if } \rho_k > 0.9 \text{ (very successful)},\\ [\sigma_k, \gamma_1 \sigma_k], \text{ e.g. } \sigma_k, & \text{if } 0.1 \le \rho_k \le 0.9 \text{ (successful)},\\ [\gamma_1 \sigma_k, \gamma_2 \sigma_k], \text{ e.g. } 2\sigma_k, & \text{otherwise (unsuccessful)}. \end{cases}$$
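
A minimal Python sketch of Algorithm 1.1, in which the step is just the Cauchy point itself, the weakest choice Step 1 allows (practical implementations minimize $m_k$ further, e.g. over Krylov subspaces); the update factors 1/2 and 2 and the thresholds 0.1/0.9 follow the slide, the rest is illustrative:

```python
import numpy as np

def arc(f, grad, B, x, sigma=1.0, tol=1e-6, max_iter=500):
    for _ in range(max_iter):
        g = grad(x)
        ng = np.linalg.norm(g)
        if ng <= tol:
            break
        # alpha_C minimizes the scalar cubic m_k(-alpha g): its derivative
        # sigma*ng**3 * a**2 + (g'Bg) * a - ng**2 has one positive root.
        b = g @ B(x) @ g
        alpha = (-b + np.sqrt(b * b + 4.0 * sigma * ng ** 5)) / (2.0 * sigma * ng ** 3)
        s = -alpha * g
        decrease = alpha * ng ** 2 - 0.5 * alpha ** 2 * b - (sigma / 3.0) * (alpha * ng) ** 3
        rho = (f(x) - f(x + s)) / decrease       # decrease > 0 at the Cauchy point
        if rho > 0.1:                            # Step 2: accept the trial point
            x = x + s
        # Step 3: adapt the regularization weight
        sigma = 0.5 * sigma if rho > 0.9 else (sigma if rho >= 0.1 else 2.0 * sigma)
    return x
```

For instance, arc(lambda x: np.sum(np.cos(x)), lambda x: -np.sin(x), lambda x: np.diag(-np.cos(x)), np.ones(5)) drives the gradient below the tolerance on the toy function used earlier.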

8 Cubic regularization for unconstrained problems: Local convergence theory for cubic regularization (1)

The Cauchy condition:
$$m_k(x_k) - m_k(x_k + s_k) \ge \kappa_{CR}\, \|g_k\| \min\left[\frac{\|g_k\|}{1 + \|H_k\|},\ \sqrt{\frac{\|g_k\|}{\sigma_k}}\,\right].$$

The bound on the stepsize:
$$\|s_k\| \le 3 \max\left[\frac{\|H_k\|}{\sigma_k},\ \sqrt{\frac{\|g_k\|}{\sigma_k}}\,\right].$$

(Cartis/Gould/Toint)
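
The stepsize bound holds for any step that does not increase the model, so it can be sanity-checked on random data with the Cauchy step from the sketch above (using the model Hessian $B$ in place of $H_k$; an empirical check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(1000):
    A = rng.standard_normal((8, 8)); B = 0.5 * (A + A.T)   # random symmetric B
    g = rng.standard_normal(8); sigma = rng.uniform(0.01, 10.0)
    ng = np.linalg.norm(g)
    b = g @ B @ g
    alpha = (-b + np.sqrt(b * b + 4 * sigma * ng ** 5)) / (2 * sigma * ng ** 3)
    bound = 3 * max(np.linalg.norm(B, 2) / sigma, np.sqrt(ng / sigma))
    assert alpha * ng <= bound + 1e-8                      # ||s|| = alpha * ||g||
print("stepsize bound held on all samples")
```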

9 Cubic regularization for unconstrained problems: Local convergence theory for cubic regularization (2)

And therefore $\lim_{k \to \infty} g_k = 0$: first-order global convergence. Under stronger assumptions one can show that, if $s_k$ minimizes $m_k$ over a subspace with orthogonal basis $Q_k$, then $\lim_{k \to \infty} \lambda_{\min}(Q_k^T H_k Q_k) \ge 0$: second-order global convergence.

10 Cubic regularization for unconstrained problems: Fast convergence

For fast asymptotic convergence one needs to improve on the Cauchy point: minimize $m_k$ over (nested) Krylov subspaces, terminating with one of the rules below (both are sketched after this slide):

g stopping rule: $\|\nabla_s m_k(s_k)\| \le \min(1, \|g_k\|^{1/2})\, \|g_k\|$
s stopping rule: $\|\nabla_s m_k(s_k)\| \le \min(1, \|s_k\|)\, \|g_k\|$

If $B_k$ satisfies the Dennis-Moré condition $\|(B_k - H_k)s_k\| / \|s_k\| \to 0$ whenever $g_k \to 0$, and $x_k \to x_*$ with positive definite $H(x_*)$, then $x_k$ converges Q-superlinearly under the g- and s-rules. If additionally $H(x)$ is locally Lipschitz around $x_*$ and $\|(B_k - H_k)s_k\| = O(\|s_k\|^2)$, then $x_k$ converges Q-quadratically under the s-rule.
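
Both rules are cheap tests on the model gradient at the current inner iterate; a direct transcription (function name assumed):

```python
import numpy as np

def inner_converged(grad_m_s, g_k, s_k, rule="s"):
    """g-rule: ||grad m_k(s_k)|| <= min(1, ||g_k||**0.5) * ||g_k||
       s-rule: ||grad m_k(s_k)|| <= min(1, ||s_k||)      * ||g_k||"""
    ng = np.linalg.norm(g_k)
    factor = min(1.0, np.sqrt(ng)) if rule == "g" else min(1.0, np.linalg.norm(s_k))
    return np.linalg.norm(grad_m_s) <= factor * ng
```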

11 Cubic regularization for unconstrained problems: Function-evaluation complexity

How many function evaluations (iterations) are needed to ensure that $\|g_k\| \le \epsilon$? As long as, on very successful iterations, $\sigma_{k+1} \ge \gamma_3 \sigma_k$ for some $\gamma_3 < 1$, the basic ARC algorithm requires at most $\kappa_C\, \epsilon^{-2}$ function evaluations, for some $\kappa_C$ independent of $\epsilon$ (c.f. steepest descent). If $H$ is globally Lipschitz, the s-rule is applied, and additionally $s_k$ is the global (line) minimizer of $m_k(\alpha s_k)$ as a function of $\alpha$, then the ARC algorithm requires at most $\kappa_S\, \epsilon^{-3/2}$ function evaluations, for some $\kappa_S$ independent of $\epsilon$ (c.f. Nesterov & Polyak).

12 Cubic regularization for unconstrained problems: Minimizing the model

$$m(s) \equiv f + s^T g + \tfrac{1}{2}\, s^T B s + \tfrac{1}{3}\, \sigma \|s\|_2^3$$

Small problems: use a Moré-Sorensen-like method with a modified secular equation (also OK as long as factorization is feasible). Large problems: use an iterative Krylov-space method to find an approximate solution. Numerically sound procedures exist for computing exact and approximate steps.
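
A minimal sketch of the small-problem route, assuming an eigendecomposition of $B$ is affordable: the global minimizer satisfies $(B + \lambda I)s = -g$ with $\lambda = \sigma\|s\|$ and $B + \lambda I \succeq 0$, so one can bisect on the monotone scalar secular equation $\|(B + \lambda I)^{-1} g\| = \lambda/\sigma$. This is not the Moré-Sorensen-style method of the slide, just the simplest correct variant, and the "hard case" is ignored:

```python
import numpy as np

def cubic_subproblem(g, B, sigma, tol=1e-12):
    w, V = np.linalg.eigh(B)                 # B = V diag(w) V'
    c = V.T @ g
    lo = max(0.0, -w[0]) + 1e-14             # lambda must make B + lambda*I PSD
    phi = lambda lam: np.linalg.norm(c / (w + lam)) - lam / sigma  # decreasing in lam
    hi = lo + 1.0
    while phi(hi) > 0:                       # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):      # plain bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) > 0 else (lo, mid)
    lam = 0.5 * (lo + hi)
    return -V @ (c / (w + lam))              # s = -(B + lambda*I)^{-1} g
```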

13 Cubic regularization for unconstrained problems: The main features of adaptive cubic regularization

And the result is: longer steps on ill-conditioned problems; a similar (very satisfactory) convergence analysis; the best known function-evaluation complexity for nonconvex problems; excellent performance and reliability.

14 Cubic regularization for unconstrained problems: Numerical experience (small problems using Matlab)

[Figure: performance profile of iteration counts on 131 CUTEr problems, plotting the fraction of problems for which each method is within a factor $\alpha$ of the best: ACO with the g stopping rule (3 failures), ACO with the s stopping rule (3 failures), and trust region (8 failures).]

15 Regularization techniques for constrained problems: The constrained case

Can we apply regularization to the constrained case? Consider the constrained nonlinear programming problem: minimize $f(x)$ subject to $x \in \mathcal{F}$, for $x \in \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$ smooth, where $\mathcal{F}$ is convex. Main ideas: exploit (cheap) projections onto convex sets (two examples are sketched below); define the step using the generalized Cauchy point idea; prove global convergence plus a function-evaluation complexity bound.
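
The slides leave $\mathcal{F}$ generic; for concreteness, here are two feasible sets whose projections are indeed cheap (assumed examples):

```python
import numpy as np

def proj_box(x, l, u):
    """Projection onto the box F = {x : l <= x <= u}."""
    return np.clip(x, l, u)

def proj_ball(x, center, radius):
    """Projection onto the Euclidean ball F = {x : ||x - center|| <= radius}."""
    d = x - center
    nd = np.linalg.norm(d)
    return x if nd <= radius else center + (radius / nd) * d
```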

16 Regularization techniques for constrained problems: Constrained step computation (1)

$$\min_s\; f(x) + \langle s, g(x)\rangle + \tfrac{1}{2}\langle s, H(x)s\rangle + \tfrac{1}{3}\sigma\|s\|^3 \quad \text{subject to } x + s \in \mathcal{F},$$
where $\sigma$ is the (adaptive) regularization parameter, as before. Criticality measure:
$$\chi(x) \stackrel{\text{def}}{=} \left|\, \min_{x+d \in \mathcal{F},\ \|d\| \le 1} \langle \nabla_x f(x), d \rangle \,\right|.$$
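
For a box $\mathcal{F}$ and the $\ell_\infty$ norm on $d$ (an assumption made here so the minimization decouples coordinatewise; the slides do not fix the norm), $\chi(x)$ has a closed form:

```python
import numpy as np

def chi(g, x, l, u):
    """chi(x) = | min { g'd : l <= x + d <= u, ||d||_inf <= 1 } |."""
    lo = np.maximum(-1.0, l - x)        # feasible interval for each d_i
    hi = np.minimum(1.0, u - x)
    d = np.where(g > 0, lo, hi)         # minimize each g_i * d_i separately
    return abs(g @ d)
```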

17 Regularization techniques for constrained problems: The generalized Cauchy point for ARC

Cauchy step: a Goldstein-like piecewise linear search on $m_k$ along the gradient path projected onto $\mathcal{F}$. Find $t_k^{GC} > 0$ such that
$$x_k^{GC} = P_{\mathcal{F}}[x_k - t_k^{GC} g_k] \stackrel{\text{def}}{=} x_k + s_k^{GC}$$
satisfies
$$m_k(x_k^{GC}) \le f(x_k) + \kappa_{ubs} \langle g_k, s_k^{GC} \rangle \quad \text{(below linear approximation)}$$
and either
$$m_k(x_k^{GC}) \ge f(x_k) + \kappa_{lbs} \langle g_k, s_k^{GC} \rangle \quad \text{(above linear approximation)}$$
or
$$\|P_{T(x_k^{GC})}[-g_k]\| \le \kappa_{epp}\, |\langle g_k, s_k^{GC} \rangle| \quad \text{(close to the path's end)}.$$
No trust-region condition!
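
A crude sketch of the search (the third, end-of-path test is omitted; $m$ is the full model $s \mapsto m_k(x_k + s)$, so $m(0) = f(x_k)$, and halving/doubling of $t$ stands in for the piecewise linear search of the slide):

```python
import numpy as np

def generalized_cauchy_point(x, g, m, proj, kappa_ubs=0.1, kappa_lbs=0.9,
                             t=1.0, max_trials=50):
    f0 = m(np.zeros_like(x))
    for _ in range(max_trials):
        s = proj(x - t * g) - x                   # step along the projected path
        if m(s) > f0 + kappa_ubs * (g @ s):       # not enough model decrease: shrink t
            t *= 0.5
        elif m(s) < f0 + kappa_lbs * (g @ s):     # well below the lower line: grow t
            t *= 2.0
        else:
            break                                 # Goldstein-like interval found
    return x + s
```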

18 Regularization techniques for constrained problems: Searching for the ARC-GCP

[Figure: contours of an example cubic model $m_k(s)$ in three variables, illustrating the piecewise linear search for the ARC generalized Cauchy point over a feasible region with bound 1.5.]

19 Regularization techniques for constrained problems: A constrained regularized algorithm

Algorithm 2.1: ARC for Convex Constraints (COCARC)

Step 0: Initialization. $x_0 \in \mathcal{F}$, $\sigma_0$ given. Compute $f(x_0)$, set $k = 0$.

Step 1: Generalized Cauchy point. If $x_k$ is not critical, find the generalized Cauchy point $x_k^{GC}$ by a piecewise linear search on the regularized cubic model.

Step 2: Step calculation. Compute $s_k$ and $x_k^+ \stackrel{\text{def}}{=} x_k + s_k \in \mathcal{F}$ such that $m_k(x_k^+) \le m_k(x_k^{GC})$.

Step 3: Acceptance of the trial point. Compute $f(x_k^+)$ and $\rho_k$. If $\rho_k \ge \eta_1$, then $x_{k+1} = x_k + s_k$; otherwise $x_{k+1} = x_k$.

Step 4: Regularisation parameter update. Set
$$\sigma_{k+1} \in \begin{cases} (0, \sigma_k] & \text{if } \rho_k \ge \eta_2,\\ [\sigma_k, \gamma_1 \sigma_k] & \text{if } \rho_k \in [\eta_1, \eta_2),\\ [\gamma_1 \sigma_k, \gamma_2 \sigma_k] & \text{if } \rho_k < \eta_1. \end{cases}$$
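
A minimal end-to-end sketch of COCARC in which the generalized Cauchy point itself serves as the trial point (Step 2 only requires $m_k(x_k^+) \le m_k(x_k^{GC})$), the GCP search is the crude backtracking from the previous sketch, and a projected-gradient step stands in for the criticality test; all names and concrete constants are assumptions:

```python
import numpy as np

def cocarc(f, grad, B, proj, x, sigma=1.0, eta1=0.01, eta2=0.9,
           tol=1e-6, max_iter=500):
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(proj(x - g) - x) <= tol:   # crude criticality proxy
            break
        Bk = B(x)
        m = lambda s: g @ s + 0.5 * s @ Bk @ s + (sigma / 3) * np.linalg.norm(s) ** 3
        t, s = 1.0, proj(x - g) - x
        while m(s) > 0.1 * (g @ s) and t > 1e-20:    # backtrack to sufficient decrease
            t *= 0.5
            s = proj(x - t * g) - x
        rho = (f(x) - f(x + s)) / (-m(s))            # -m(s) = f(x_k) - m_k(x_k + s)
        if rho >= eta1:                              # Step 3: accept
            x = x + s
        # Step 4: sigma update (the concrete factors are an assumption)
        sigma = 0.5 * sigma if rho >= eta2 else (sigma if rho >= eta1 else 2.0 * sigma)
    return x
```

With proj = lambda z: np.clip(z, l, u) this runs on bound-constrained problems.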

20 Regularization techniques for constrained problems: Local convergence theory for COCARC

The Cauchy condition:
$$m_k(x_k) - m_k(x_k + s_k) \ge \kappa_{CR}\, \chi_k \min\left[\frac{\chi_k}{1 + \|H_k\|},\ \sqrt{\frac{\chi_k}{\sigma_k}},\ 1\right].$$

The bound on the stepsize:
$$\|s_k\| \le 3 \max\left[\frac{\|H_k\|}{\sigma_k},\ \left(\frac{\chi_k}{\sigma_k}\right)^{1/2},\ \left(\frac{\chi_k}{\sigma_k}\right)^{1/3}\right].$$

And therefore $\lim_{k \to \infty} \chi_k = 0$. (Cartis/Gould/Toint)

21 Regularization techniques for constrained problems: Function-Evaluation Complexity for COCARC (1)

But what about function-evaluation complexity? If, on very successful iterations, $\sigma_{k+1} \ge \gamma_3 \sigma_k$ for some $\gamma_3 < 1$, the COCARC algorithm requires at most $\kappa_C\, \epsilon^{-2}$ function evaluations (for some $\kappa_C$ independent of $\epsilon$) to achieve $\chi_k \le \epsilon$ (c.f. steepest descent). Do the nicer bounds for unconstrained optimization extend to the constrained case?

22 Regularization techniques for constrained problems: Function-evaluation complexity for COCARC (2)

As for the unconstrained case, impose a termination rule on the subproblem solution: do not terminate solving $\min_{x_k + s \in \mathcal{F}} m_k(x_k + s)$ before
$$\chi^m_k(x_k^+) \le \min(\kappa_{stop}, \|s_k\|)\, \chi_k,$$
where
$$\chi^m_k(x) \stackrel{\text{def}}{=} \left|\, \min_{x+d \in \mathcal{F},\ \|d\| \le 1} \langle \nabla_x m_k(x), d \rangle \,\right|.$$
Note: this is OK at local constrained model minimizers (c.f. the s-rule for unconstrained; a sketch of the test follows).
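
The test itself is one line given $\chi^m_k$; reusing the box/$\ell_\infty$ version of $\chi$ from the earlier sketch (inlined here, same assumptions):

```python
import numpy as np

def inner_stop(gm_plus, x_plus, s, chi_k, l, u, kappa_stop=0.1):
    """Stop the subproblem solve once chi_m(x+) <= min(kappa_stop, ||s||) * chi_k,
    where chi_m uses the model gradient gm_plus at the trial point x+."""
    d = np.where(gm_plus > 0, np.maximum(-1.0, l - x_plus),
                 np.minimum(1.0, u - x_plus))
    chi_m = abs(gm_plus @ d)
    return chi_m <= min(kappa_stop, np.linalg.norm(s)) * chi_k
```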

23 Regularization techniques for constrained problems: Walking through the pass

[Figure: "beyond the pass": a constrained problem on a nonconvex model $m(x, y)$ with cubic regularization term $[x^2 + y^2]^{3/2}$, showing $x_k$, the path $x_k - \alpha g_k$, the feasible region and the model minimizer beyond the pass.]

24 Regularization techniques for constrained problems: Walking through the pass... with a sherpa

[Figure: a piecewise descent path from $x_k$ through intermediate points $x_{k,c}$ and $x_{k,a}$ to $x_k^+$ on the same model $m(x, y)$, staying feasible throughout.]

25 Regularization techniques for constrained problems: Function-Evaluation Complexity for COCARC (3)

Assume also that: $x_k \to x_k^+$ is achieved in a bounded number of feasible descent substeps; $\|H_k - \nabla_{xx} f(x_k)\| \le \kappa \|s_k\|^2$; $\nabla_{xx} f(\cdot)$ is globally Lipschitz continuous; and $\{x_k\}$ is bounded. Then the COCARC algorithm requires at most $\kappa_C\, \epsilon^{-3/2}$ function evaluations (for some $\kappa_C$ independent of $\epsilon$) to achieve $\chi_k \le \epsilon$ (c.f. the unconstrained case!). Caveat: the cost of solving the subproblem.

26 Conclusions

Much left to do... but very interesting! Meaningful numerical evaluation is still needed for many of these algorithms, and many issues regarding regularizations remain unresolved. Many thanks for your attention!
