A Stochastic Levenberg-Marquardt Method Using Random Models with Application to Data Assimilation

Size: px
Start display at page:

Download "A Stochastic Levenberg-Marquardt Method Using Random Models with Application to Data Assimilation"

Transcription

1 A Stochastic Levenberg-Marquardt Method Using Random Models with Application to Data Assimilation E Bergou Y Diouane V Kungurtsev C W Royer July 5, 08 Abstract Globally convergent variants of the Gauss-Newton algorithm are often the preferred methods to tackle nonlinear least squares problems Among such frameworks, the Levenberg- Marquardt and the trust-region methods are two well-established paradigms, and their similarities have often enabled to derive similar analyses of these schemes Both algorithms have indeed been successfully studied when the Gauss-Newton model is replaced by a random model, only accurate with a given probability Meanwhile, problems where even the obective value is subect to noise have gained interest, driven by the need for efficient methods in fields such as data assimilation In this paper, we describe a stochastic Levenberg-Marquardt algorithm that can handle noisy obective function values as well as random models, provided sufficient accuracy is achieved in probability Our method relies on a specific scaling of the regularization parameter, which clarifies further the correspondences between the two classes of methods, and allows us to leverage existing theory for trust-region alorithms Provided the probability of accurate function estimates and models is sufficiently large, we establish that the proposed algorithm converges globally to a first-order stationary point of the obective function with probability one Furthermore, we derive a bound the expected number of iterations needed to reach an approximate stationary point We finally describe an application of our method to variational data assimilation, where stochastic models are computed by the so-called ensemble methods Keywords: Levenberg-Marquardt method, nonlinear least squares, regularization, random models, noisy functions, data assimilation MaIAGE, INRA, Université Paris-Saclay, Jouy-en-Josas, France (elhoucinebergou@inrafr) ISAE-SUPAERO, Université de Toulouse, 3055 Toulouse Cedex 4, France (youssefdiouane@isaefr) Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University in Prague Support for this author was provided by the Czech Science Foundation proect S (vyacheslavkungurtsev@felcvutcz) Wisconsin Institute for Discovery, University of Wisconsin-Madison, 330 N Orchard St, Madison, WI 5375, USA (croyer@wiscedu) Support for this author was provided by Subcontract 3F-30 from Argonne National Laboratory

2 Introduction Minimizing a nonlinear least-squares function is one of the most classical problems in numerical optimization, that arises in a variety of fields In many applications, the obective function to be optimized can only be accessed through noisy estimates Typical occurrences of such a formulation can be found when solving inverse problems [6, 7, 8 or while minimizing the error of a model in the context of machine learning [9 In such cases, the presence of noise is often due to the estimation of the obective function via cheaper, less accurate calculations: this is for instance true when part of the data is left aside while computing this estimate In fact, in data-fitting problems such as those coming from machine learning, a huge amount of data is available, and considering the entire data throughout the optimization process can be extremely costly Moreover, the measurements can be redundant and possibly corrupted: in that context, a full evaluation of the function or the gradient may be unnecessary Such concerns have motivated the development of optimization frameworks that cope with inexactness in the obective function or its derivatives In particular, the field of derivative-free optimization [5, where it is assumed that the derivatives exist but are unavailable for use in an algorithm, has expanded in the recent years with the introduction of random models One seminal work in this respect is [, where the authors applied arguments from compressed sensing to guarantee accuracy of quadratic models whenever the Hessian had a certain (unknown) sparsity pattern Trust-region methods based on general probabilistic models were then proposed in [, where convergence to first- and second-order stationary points was established under appropriate accuracy assumptions on the models Global convergence rates were derived for this approach in [9, in expectation and with high probability Of particular interest to us is the extension of trust-region methods with probabilistic models to the case of noisy function values [3: the corresponding algorithm considers two sources of randomness, respectively arising from the noisy function estimates and the random construction of the models A global convergence rate in expectation for this method was derived in [7, where it was established that the method needed O(ɛ ) iterations in expectation to drive the gradient norm below some threshold ɛ In the context of derivative-free least-squares problems where exact function values are available, various deterministic approaches based on globalization of the Gauss-Newton method have been studied The algorithms developed in the derivative-free community are mostly of trustregion type, and rely on building models that satisfy the so-called fully linear property, which requires the introduction of a so-called criticality step to guarantee its satisfaction throughout the algorithmic process [, 3, 3, 9 The recent DFO-GN algorithm [ was equipped with a complexity result, showing a bound of the same order than derivative-free trust-region methods for generic functions [8 As for general problems, considering random models is a possible way of relaxing the need for accuracy at every iteration A Levenberg-Marquardt algorithm based in this idea was proposed in [6, motivated by problems from data assimilation The authors of [6 proposed an extension of the classical LM algorithm that replaces the gradient of the obective function by a noisy estimate, that is accurate only with a certain probability Using similar arguments than for the trust-region case [, almost-sure global convergence to a first-order stationary point was established The case of noisy least squares has also been examined A very recent preprint [0 proposed a efficient approach for handling noisy values in practice, but did not provide theoretical guarantees A Levenberg-Marquardt framework for noisy optimization without derivatives was proposed in [4, with similar goals as those aimed in this paper The method proposed in [4

3 assumes that function values can be estimated to a prescribed accuracy level, and explicitly maintains a sequence of these accuracies throughout the algorithm Although such an approach is relevant when any accuracy level can be used (for instance, all the data can be utilized to estimate the function), it does not allow for arbitrarily bad estimations on any iteration: moreover, the noise level must be small compared to the norm of the upcoming Levenberg-Marquardt step, a condition that may force to reduce this noise level, and resembles the criticality step of derivative-free model-based methods By contrast, the use of random models and estimates with properties only guaranteed in probability allows for arbitrarily bad estimates, which seems more economical at the iteration level, and does not necessarily mean that a good step will not be computed in that case Probabilistic properties thus emerges as an interesting alternative, particularly when it is expensive to compute accurate estimates, and one can then think of exploiting the connection between Levenberg-Marquardt and trust-region methods [3 to analyze the former in the case of noisy problems In this paper, we propose a stochastic framework that builds upon the approach developed in [6 to handle both random models and noise in the function evaluations This new algorithm is also inspired by a recently proposed variant of the Levenberg-Marquardt framework [5, where a specific scaling of the regularization parameter enabled the derivation of worst-case complexity results We adapt the analysis of the stochastic trust-region framework using random models proposed in [7, 3 to prove that our framework enoys comparable convergence and complexity guarantees Unlike [4, our setup allows for arbitrarily inaccurate models or function estimates, as long as it happens with a small probability Our method is particularly suited for applications in data assimilation, which we illustrate in the context of ensemble methods The remainder of the paper is organized as follows Section presents our Levenberg- Marquardt framework; Section 3 established the accuracy requirements we make on the function values and the models, as well as their probabilistic counterparts Global convergence and worstcase complexity of the method are analyzed in Sections 4 and 5, respectively Finally, Section 6 describes an application of our method in data assimilation A Levenberg-Marquardt algorithm based on estimated values In this paper, we consider the following nonlinear least squares problem: min x R n f(x) = r(x), () where r : R n R m is the residual vector-valued function, assumed to be continuously differentiable, and most likely m n During the minimization process, the optimizer can only have access to estimates of f - referred as f This estimate is assumed to be noisy, ie, one has for all x R n, f(x) = r(x, ξ) where the noise ξ is a random variable This section recalls the main features of the Levenberg-Marquardt method, then describes our extension of this algorithm to handle noisy function values and gradients Deterministic Levenberg-Marquardt algorithm Whenever the function r and its Jacobian can be accessed, one possible approach for solving the problem () is based on the Gauss-Newton model More precisely, at a given iterate x, a step 3

4 is computed as a solution of the linearized least squares subproblem min s R n r + J s, where r = r(x ) and J = J(x ) denotes the Jacobian of r at x The subproblem has a unique solution if J has full column rank, and in that case the step is a descent direction for f When J is not of full column rank, the introduction of a regularization parameter can lead to similar properties This is the underlying idea behind the Levenberg-Marquardt [,, 4 algorithm, a globally convergent method based upon the Gauss-Newton model At each iteration, one considers a step of the form (J J + γ I) J r, corresponding to the unique solution of min s R n r + J s + γ s, () where γ 0 is an appropriately chosen regularization parameter, typically updated in the spirit of the classical trust-region radius update strategy at each iteration Several strategies were then developed to update γ Several approaches have considered scaling this parameter using the norm of the gradient of the Gauss-Newton model [5, 33 A similar choice will be adopted in this paper Algorithmic framework based on estimates In this work, we are interested in the case where r and J r cannot be directly accessed, but noisy estimates are available As a result, we will consider a variant of the Levenberg-Marquardt algorithm in which both the function and gradient values are approximated Algorithm presents a description of our method At every iteration, estimates of the values of f and its derivative at the current iterate are obtained, and serve to define a regularized Gauss-Newton model (3), where the regularization parameter is defined using a specific scaling formula: γ = µ m (x ) where µ 0 The model m is then approximately minimized, yielding a trial step s The resulting new point is accepted only if the ratio ρ between estimated decrease (f is again estimated at the new trial point) and model decrease is sufficiently high The Levenberg-Marquardt parameter µ is updated depending on the value of ρ, and also on a condition involving the model gradient Such updates have been widely used in derivative-free model-based methods based on random estimates [, 6, 3, 9 3 Probabilistic properties for the Levenberg-Marquardt method We are interested in the case where the obective function values, the gradient J r and the Jacobian J are noisy, and we only have their approximations 3 Gradient and function estimates We begin by describing our accuracy requirements for the models computed based on sampled values, of the form given in (3) Following previous work on derivative-free Levenberg-Marquardt methods [6, we propose the following accuracy definition, and motivate further its use below 4

5 Algorithm : A Levenberg-Marquardt method using random models Data: Define η (0, ), η, µ min > 0, and λ > Choose x 0 and µ 0 µ min for = 0,, do Compute an estimate f 0 of f(x ) Compute g m and J m, the gradient and the Jacobian estimate at x, set γ = µ g m, and define the model m of f around x by: s R n, m (x + s) = m (x ) + gm s + ( ) s Jm J m + γ I s (3) 3 Compute an approximate solution s of the subproblem 4 Compute an estimate f of f(x + s ), then compute min m (x + s) (4) s R n ρ = f 0 f m (x ) m (x + s ) 5 If ρ η and g m η µ, set x + = x + s and µ + = max µ λ, µ min} Otherwise, set x + = x and µ + = λ µ end Definition 3 Consider a realization of Algorithm, and the model m of f defined around the iterate x of the form (3), and let κ ef, κ eg > 0 Then, the model m is called (κ ef, κ eg )-firstorder accurate with respect to (x, µ ) if g m J r κ eg µ (5) and f(x ) m (x ) κ ef µ (6) Remark 3 The accuracy requirement for the model gradient (5) is similar to the first-order accuracy property introduced by Bergou, Gratton and Vicente [6 However, it is not exactly equivalent as we use µ instead of γ = µ g m The purpose of this new parametrization is twofold First, it allows us to measure the accuracy in formulas (5) and (6) through a parameter that is updated in an explicit fashion throughout the algorithmic run: this is a key property for performing a probabilistic analysis of optimization methods Secondly, we believe this choice to be a better reflection of the relationship between the Levenberg-Marquardt and the trust-region parameter Indeed, for a realization of the method, the Levenberg-Marquardt direction minimizing m (s) is given by ( d = J J + γ I) J r, (7) 5

6 which is also the solution of the trust-region subproblem mind r + J d st d δ = d As a result, we see that for a large value of γ, one would have: ( ) J r δ = O, (9) which suggests that γ is not exactly equivalent to the inverse of the trust-region radius, as suggested in [6, but that it rather is an equivalent to J F δ Still, this relation implies that µ can be seen as an equivalent to δ : in that sense, (5) matches the gradient assumption for fully linear models [5 Note that Definition 3 contains two requirements: in the absence of noise, (6) is trivially satisfied by setting m (x ) = f(x ) In this work, we consider that even function values cannot be accessed inexactly, thus (6) appears to be necessary In the case of noisy function values, we also expect the estimates computed by Algorithm to be sufficiently accurate with a suitable probability This is formalized in the following definitions Definition 3 Given ε f > 0, we say that two values f 0 and f are ε f -accurate estimates of f(x ) and f(x + s ), respectively, for a given µ, if f 0 f(x ) ε f µ and f f(x + s ) ε f µ (0) 3 Probabilistic accuracy of model gradients and function estimates We are further interested in the case where the models are built in some random fashion We will thus consider random models of the form M, and we use the notation m = M (ω) for its realizations Correspondingly, let random variables g M and J M denote the estimates of the gradient J r and the Jacobian J, with their realizations denoted by g m = g M (ω), and J m = J M (ω) Note that the randomness of the models implies the randomness of the iterate X, the parameters Γ, and the step S, and so x = X (ω), γ = Γ (ω), µ = (ω) and s = S (ω) will denote their respective realizations As described in the introduction, another source of randomness from our problem in that the obective function f is accessed through a randomized estimator f For a given iteration index, we define F 0 = f(x ) and F = f(x + S ) The realizations of F 0 and F (taken over the randomness of f as well as that of the iterate X ) will be denoted by f 0 and f We can now provide probabilistic equivalents of Definitions 3 and 3 Definition 33 Let p (0,, κ ef > 0 and κ eg > 0 A sequence of random models M } is said to be p-probabilistically κ ef, κ eg }-first-order accurate with respect to the sequence X, } if the events } gm U = J(X ) r(x ) κ eg 6 γ & f(x ) M (X ) κ ef (8)

7 satisfy the following condition p = P (V F M F ) p, () where F M F = σ(m 0,, M, F0 0, F 0,, F 0, F ) is the σ-algebra generated by M 0,, M and F0 0, F 0,, F 0, F Definition 34 Given constants ε f > 0, and q (0,, the sequences of random quantities F 0 and F is called (q)-probabilistically ε f -accurate, for corresponding sequences X }, Γ }, if the events V = F 0 f(x ) ε f and F f(x + S ) } ε f satisfy the following condition q = P (V F / M F ) q, () where F M F / is the σ-algebra generated by M 0,, M, F 0 0, F 0, F 0, F Here again, we point out that the parameter µ plays the role of a reciprocal of the trustregion radius In that sense, the previous definitions are consistent with the definitions of sufficient accuracy presented in the case of stochastic trust-region methods [3 4 Global convergence to first-order critical points In this section, we aim at establishing convergence of Algorithm when the function estimates and the models satisfy the probabilistic properties described in Section 3 Our analysis bears strong similarities with that of the STORM algorithm [3, but possesses significant differences induced by the use of probabilistic gradients rather than probabilistic fully linear models 4 Assumptions and deterministic results We will analyze Algorithm under the following assumptions Assumption 4 f is continuously differentiable on an open set containing the level set L(x 0 ) = x R n f(x) f(x 0 )}, with Lipschitz continuous gradient, of Lipschitz constant ν We also require that the Jacobian model is uniformly bounded Note that the result is assumed to hold for every realization of the algorithm, therefore such an assumption will be valid in both a deterministic and random context Assumption 4 There exists κ Jm > 0 such that for all and all realizations J m of the -th model Jacobian, one has: J m κ Jm Additionally, we assume that the subproblem is approximately solved so that a fraction of a Cauchy decrease is satisfied for the model 7

8 Assumption 43 There exists θ fcd > 0 such that for every iteration of every realization of the algorithm, m (x ) m (x + s ) θ fcd g m J m (3) + γ We will also assume the following bounds hold Assumption 44 At each iteration and for each realization of the algorithm, the step size satisfies s g m γ = µ, (4) and there exists θ in > 0 such that s (γ s + g m ) 4 J m g m + θ in g m γ = 4 J m + θ in µ (5) Several choices for the approximate minimization of m (x +s) that verify (3),(4) and (5) can be proposed; in particular, the result holds for steps computed via a truncated Conugate Gradient algorithm (initialized with the null vector) applied to the quadratic m (x + s) m (x )) [6, Lemma 5 Lemma 4 Let Assumptions 4, 4, and 44 hold for a realization of Algorithm Consider the -th iteration of that realization, and suppose that m is (κ ef, κ eg )-first-order accurate Then, where κ efs = κ ef +κ eg+ν+4κ Jm f(x + s ) m (x + s ) κ efs µ, (6) Proof Using Assumptions 4, 4, and 44 within a Taylor expansion of the function f around x, we obtain: f(x + s ) m (x + s ) f(x ) + f(x ) s + ν s m (x ) gm s s Jm J m s f(x ) m (x ) + ( ) ν f(x ) g m s + s + J m J m s hence the result κ ef µ + κ eg s + ν + κ J m s µ κ ef + κ eg + ν + 4κ J m Lemma 4 illustrates that our accuracy requirements are enough to guarantee accuracy of any computed step We now state various results holding for a realization of Algorithm that do not make direct use of the probabilistic nature of the method These will be instrumental in proving Theorem 4 8, µ

9 Lemma 4 Let Assumptions 4, 4, 43, and 44 hold for a realization of Algorithm, and consider its -th iteration If the model is (κ ef, κ eg )-first-order accurate and µ max κ J m, 8(κ } ef + κ efs ) η θ fcd g m, (7) then the trial step s satisfies f(x + s ) f(x ) η θ fcd 8 Proof Since the model is (κ ef, κ eg )-first-order accurate, we have: g m µ (8) f(x + s ) f(x ) = f(x + s ) m(x + s ) + m(x + s ) m (x ) + m (x ) f(x ) κ efs µ + m(x + s ) m (x ) + κ ef µ κ ef + κ efs µ = κ ef + κ efs µ η θ fcd η θ fcd g m κ J m + γ g m κ J m + µ g m, where we used the result of Lemma 4 and Assumption 43 Using the first part of (7), we have µ g m κ J m and thus f(x + ) f(x ) κ ef + κ efs µ κ ef + κ efs µ η θ fcd η θ fcd g m κ J m + µ g m g m µ g m = [ κef + κ efs η θ fcd g m µ µ 4 [ η θ fcd g m, µ 8 where the second part of the maximum in (7) was used in the last line, yielding the expected result The next result is a consequence of Lemma 4 Lemma 43 Let the assumptions of Lemma 4 hold If m is (κ ef, κ eg )-first-order accurate and ( µ κ eg + max κ J m, 8(κ }) ef + κ efs ) η θ fcd f(x ), (9) then the trial step s satisfies f(x + s ) f(x ) C f(x ) µ, (0) 9

10 where C = η θ fcd 8 max κ Jm, 8(κ } ef +κ efs ) η θ fcd κ eg+max κ Jm, 8(κ ef +κ efs ) η θ fcd } Proof Since the model is (κ ef, κ eg )-first-order accurate, we have: f(x ) f(x ) g m + g m κ eg µ + g m () Using (9) to bound the left-hand side, we obtain: } κ eg + max κ J m, 8(κ ef +κ efs ) η θ fcd κ eg + g m, µ µ } which gives µ max κ J m, 8(κ ef +κ efs ) η θ fcd g m We are thus in the assumptions of Lemma 4, and (8) holds Using again the fact that the model is (κ ef, κ eg )-first-order accurate together with (7) and (), we have: leading to f(x ) κ eg µ + g m κ eg + max κ eg κ J m, 8(κ ef +κ efs ) η θ fcd } max κ J m, 8(κ ef +κ efs ) η θ fcd g m } f(x κ eg + max κ J m, 8(κ ef +κ efs ) ) η θ fcd Combining this relation with (8) finally gives (0) } f(x ) + g m, Lemma 44 Let Assumptions 4, 4, 43 and 44 hold Consider the -th iteration of a realization of Algorithm such that x is not a critical point of f Suppose further that m is (κ ef, κ eg )-first-order accurate, (f 0, f ) is ε f -accurate, and α + α + 4ακ J m ( η ) µ max ( η ), η g m = κ µg g m, () holds, where α = ε f + κ eg + ν + 5κ J m + θ in Then, the -th iteration is successful (ie ρ η and g m η µ ) Proof To simplify the notations, we will omit the indices in the proof 0

11 ρ = = = = f 0 f m(x) m(x + s) m(x) m(x + s) f 0 + f m(x) m(x + s) g ms s (JmJ m + γi)s f 0 + f m(x) m(x + s) ( f f 0 gms s JmJ m s ) s (g m + γs) m(x) m(x + s) f f 0 gms s JmJ m s + s (g m + γs) m(x) m(x + s) We look in more detail at the first term arising in the numerator; we have: f f 0 gms s JmJ m s = f f(x + s) + f(x + s) f(x) + f(x) f 0 gms s JmJ m s = f f(x + s) + f(x) s + ( f(x + ts) f(x)) s 0 +f(x) f 0 gms s JmJ m s = f f(x + s) + [ f(x) g m s + ( f(x + ts) f(x)) s 0 +f(x) f 0 s JmJ m s f f(x + s) + [ f(x) g m s Thus, we obtain: + 0 f(x + ts) f(x) s dt + f(x) f 0 + s J mj m s ε f µ + κ eg s + ν µ s + ε f µ + J m s = ε f µ + κ eg s µ + ν + κ J m s ρ ε f + κeg s µ µ + ν+κ Jm 4 s + s (g m + γs) m(x) m(x + s) at Using Assumption 44 on the numerator and Assumption 43 on the denominator, we arrive

12 ρ ε f µ + κeg s µ + ν+κ Jm 4 s + s (g m + γs) m(x) m(x + s) ε f + κeg + ν+κ Jm + 4κ Jm +θ in µ µ µ µ m(x) m(x + s) ( εf + κ eg + ν + 5κ ) J m + θ in µ As a result, we have = θ fcd g m J m +γ ( εf + κ eg + ν + 5κ J m + θ in ) gm θ fcd g m κ Jm +γ = ( ε f + κ eg + ν + 5κ J m + θ in ) κ J m + γ γ = α κ J m + γ γ ρ η α κ J m + γ γ η 0 ( η )γ αγ α κ J m Since the right-hand side is a second-order polynomial in γ, this gives α + α + 4ακ J m ( η ) α + α + 4ακ J m ( η ) γ µ ( η ) ( η ) g m But this contradicts (), from which we conclude that we necessarily have ρ < η, and thus ρ > η Since g m η µ as a direct consequence of (), the iteration is a successful one, and the parameter µ is not increased We point out that Lemma 44 only involves the accuracy requirements on the model gradient, thanks to the accuracy of the function estimates Lemma 45 Let Assumptions 4, 4, 43, and 44 hold Consider a successful iteration of index for a realization of Algorithm, such that x is not a critical point of f Suppose further that (f 0, f ) is ε f -accurate, with } η max κ 8ε f J m, (3) η θ fcd γ Then, one has: f(x + s ) f(x ) C µ (4) where C = η η θ fcd 4 ε f > 0

13 Proof By definition of a successful iteration and using the accuracy properties of the models and the estimates, we have f(x + ) f(x ) = f(x + s ) f(x ) = f(x + s ) f + f f 0 + f 0 f(x ) ε f µ ε f µ ε f µ ε f µ + f f 0 + η (m(x + s ) m(x )) η θ fcd η θ fcd as η κ J m Since the iteration is successful, we have µ g m η, leading to f(x + ) f(x ) ε f µ ε f µ = C µ η θ fcd 4 η η θ fcd 4 g m κ J m + µ g m g m η + µ g m g m µ which proves the desired result (the positivity of C comes from (3)) µ 4 Almost-sure global convergence We now turn to the probabilistic properties to be assumed in our algorithm Assumption 45 The random model sequence M } is p-probabilistically κ ef, κ eg }-first-order accurate for some p (0,, κ ef > 0, and κ eg > 0 Assumption 46 The sequence of random function estimates (F 0, F )} is q-probabilistically ε f -accurate for some q (0, and ε f > 0 Assumption 47 The constant η is chosen such as η max κ J m, 6(κ } ef + κ efs ) 8ε f, (5) θ fcd η θ fcd In the rest of the paper, we will assume that pq (if pq =, we have for every, p = P (U F M ) = p = q = P (V F M F ) = q = and the behavior of the algorithm reduces to that of an inexact deterministic algorithm with inexact subproblem solution) We introduce the random function Φ = τf(x ) + τ, (6) 3

14 where τ (0, ) satisfies τ λ τ > max λ, λ λ, λ λ C ζ C κ ef +κ efs } (7) and ζ is a parameter such that ( ζ κ eg + max κ µg, 8(κ }) ef + κ efs ), κ J η θ m, η (8) fcd The proposition below states that the regularization parameters diverges with probability Theorem 4 Let Assumptions 4, 4, 43, 44 and 47 hold Suppose that Assumptions 45 and 46 are also satisfied, with the probabilities p and q chosen in a way specified later on Then, P < = (9) µ =0 Proof We follow the proof technique of [3, Theorem 4 (see also [0) Our goal is to show that there exists σ > 0 such that at every iteration, E [ Φ + Φ F M F σ, (30) where here the expectation is taken over the product σ-algebra generated by all models and function value estimates (note however that Since f is bounded from below (by 0), Φ 0 and > 0, (30) guarantees that the series converges almost surely (see, eg, [4, Proposition 44) We will now prove that (30) holds and give appropriate values for τ and σ Consider a realization of Algorithm, and let φ be the corresponding realization of Φ If is the index of a successful iteration, then x + = x + s, and µ + µ λ One thus has: φ + φ τ (f(x + ) f(x )) + ( τ) λ µ (3) If is the index of an unsuccessful iteration, x + = x and µ + = λµ, leading to ( ) φ + φ = ( τ) λ µ < 0 (3) For both types of iterations, we will consider four possible outcomes, involving the quality of the model and estimates In addition, we will divide the iterations in two groups, depending on the relationship between the true gradient norm and ζ µ, where ζ satisfies (8) above Case : f(x ) ζ µ 4

15 (a) Both m and (f 0, f ) are accurate Since we are in Case, f(x ) (κ eg + κ µg ) µ Because the model is (κ ef, κ eg )-first-order accurate, this implies g m f(x ) κ eg µ (ζ κ eg ) µ κ µg µ so () holds; since the estimates are also accurate, the iteration is successful by Lemma 44 Moreover, ( f(x ) ζ µ κ eg + max κ J m, 8(κ }) ef + κ efs ) η θ µ, fcd so the condition (9) is satisfied, and by Lemma 43, we can guarantee a decrease on the function value More precisely, ϕ + ϕ f(x ) τc + ( τ)(λ ) µ µ (33) By (7), we have τc f(x ) µ + ( τ)(λ ) µ τc ζ + ( τ)(λ ) µ ( ) < ( τ) λ µ, so the last right-hand side of (33) and (3) also holds (the latter will be used for the remaining cases) (b) Only m is accurate The decrease formula of Lemma 43 is still valid in that case: if the iteration is successful, then (33) holds and by (7), (3) also holds Otherwise, (3) holds (c) Only (f 0, f ) is accurate If the iteration is unsuccessful, then (3) is satisfied Otherwise, we can apply Lemma 45 and have a guarantee of decrease in the case of a successful iteration, namely, f(x + s ) f(x ) C µ, from which we obtain ϕ + ϕ [ τc + ( τ)(λ ) µ (34) We again deduce from (7) that (3) also holds in that case (d) Both m and (f 0, f ) are inaccurate We again focus on the successful iteration case, as we can use (3) in the other situation By considering a Taylor expansion of f(x + s ), 5

16 we know that the possible increase in the step is bounded above by: f(x + s ) f(x ) f(x ) s + L s f(x ) s + L s f(x ) s + L s µ ( + L ) f(x ) s ζ ( + L ) f(x ) ζ µ We thus obtain the following bound on the change in φ: where C 3 = ( + L ζ ϕ + ϕ ) f(x ) τc 3 + ( τ)(λ ) µ µ, (35) Putting the four cases together with their associated probability of occurrence, we have [ E Φ + Φ F M F, f(x ) ζ } [ [ f(x ) pq τc + ( τ)(λ ) ( ) + [p( q) + ( p)q ( τ) λ [ f(x ) +( p)( q) τc 3 + ( τ)(λ ) = [ C pq + ( p)( q)c 3 τ f(x ) [ C pq + ( p)( q)c 3 τ f(x ) where the last line uses + [pq λ (p( q) + ( p)q) + ( p)( q)( τ)(λ ) + ( τ)(λ ), pq λ (p( q) + ( p)q) + ( p)( q) (p + ( p))(q + ( q)) = Suppose p and q are chosen such that holds Then, one has by combining (36) and (7): pq / ( p)( q) C 3 C, (36) [ C pq + ( p)( q)c 3 C ( τ)(λ ) (37) τζ 6

17 On the other hand, since f(x ) ζ/µ, we have: ( τ)(λ ) [ C pq + ( p)( q)c 3 τ f(x ) This leads to [ E Φ + Φ F M F, f(x ) ζ µ } 4 C τ f(x ) 4 C τζ, which, using (37), finally gives: [ E Φ + Φ F M F, f(x ) ζ µ } ( τ) ( λ ) 4 ( τ) ( λ ) 4 (38) Case : f(x ) < ζ µ Whenever g m < η µ, the iteration is necessarily unsuccessful and (3) holds We thus assume in what follows that g m η µ, and consider again four cases (a) Both m and (f 0, f ) are accurate It is clear that (3) holds if the iteration is unsuccessful; if it is successful, then we can use the result from Lemma 45, and we have: f(x + ) f(x ) C µ from which we obtain (34) We thus deduce from (7) that (3) also holds in that case (b) Only m is accurate If the iteration is unsuccesful, it is clear that (3) holds Otherwise, using η κ J m that arises from (5) and applying the same argument as in the proof of Lemma 45, we have g m η µ κ Jm µ, thus m (x ) m (x + s ) θ fcd g m κ J m + µ g m θ fcd g m µ g m = θ fcd g m η θ fcd 4 µ 4 Since the model is (κ ef, κ eg )-first-order accurate, the function variation satisfies: f(x ) f(x + s ) = f(x ) m (x ) + m (x ) m (x + s ) + m (x + s ) f(x + s ) κ ef µ + η θ fcd 4 µ κ efs µ ( ) η θ fcd (κ ef + κ efs ) 4 µ κ ef + κ efs µ, µ where the last line comes from (5) As a result, [ φ + φ τ κ ef + κ efs + ( τ)(λ ) by (7) µ ( ( τ) ) λ µ (39) 7

18 (c) Only (f 0, f ) is accurate This case can be analyzed the same way as Case a (d) Both m and (f 0, f ) are inaccurate As in Case d, we have The change in φ thus is f(x + s ) f(x ) f(x ) s + L s ζ s + L µ s ( ζ + L ) ζ < ζc 3 µ µ φ + φ [ τc 3 ζ + ( τ)(λ ) µ (40) Combining all the subcases for Case, we can bound all of those by (3) save for Case d, which occurs with probability ( p)( q) Thus, [ E Φ + Φ F M F, f(x ) < ζ } ( ) [pq + p( q) + q( p)( τ) λ ( ( τ) ) λ + ( p)( q) [ τc 3 ζ + ( τ)(λ ) + ( p)( q) [ τc 3 ζ + ( τ)(λ ) We now assume that p and q have been chosen such that ( p)( q) ( τ) ( ) λ τc 3 ζ + ( τ)(λ ) (4) holds Using (4), we obtain [ E Φ + Φ F M F, f(x ) < ζ } ( τ) ( λ ) ( 4 ( τ) ) λ, (4) which is the same amount of decrease as in (38) Letting σ = 4 ( τ) ( λ ), we have then established that for every iteration, E [ Φ + Φ F M F σ < As a result, the statement of the theorem holds In the proof of Theorem 4, we have enforced several properties on the probability thresholds p and q: we summarize those as follows 8

19 Corollary 4 Under the assumptions of Theorem 4, its statement holds provided the probabilities p and q satisfy: pq / ( p)( q) C 3, (43) C and ( p)( q) ( τ) ( λ ) (τc 3 ζ + ( τ)(λ )) (44) Proposition 4 Let G be a submartingale, in other words, a set of random variables which are integrable (E( G ) < ) and satisfy E(G F ) G, for every, where F = σ(g 0,, G ) is the σ-algebra generated by G 0,, G and E(G F ) denotes the conditional expectation of G given the past history of events F Assume further that there exists M > 0 such that G G M <, for every Consider the random events C = lim G exists and is finite} and D = lim sup G = } Then P (C D) = This finally leads to the desired result Theorem 4 Let the assumptions of Theorem 4 and Corollary 4 hold Then, the sequence of random iterates generated by Algorithm satisfies: ( ) P lim inf f(x ) = 0 = Proof Following the lines of the proof of [3, Theorem 46, we proceed by contradiction and assume that, there exists ɛ > 0 such that P ( f(x ) ɛ } ) > 0 We then consider a realization of Algorithm for which f(x ) ɛ Since lim µ =, there exists 0 such that for every 0, we have: } µ > b = κ µg max ɛ, 6(κ ef + κ efs ) η θ fcd ɛ, κ J m ɛ, η ɛ, λ µ min (45) Let R be a random variable with realizations r = log λ ( b µ ): then for the realization we are considering, we have r < 0 for 0 Our obective is to show that such a realization has a a zero probability of occurrence Consider 0 such that both events S and V happen: the probability of such an event is at least pq Because the model is accurate and we have (45): g m f(x ) κ eg µ ɛ ɛ = ɛ We are thus in the assumptions of Lemmas 43 and 44, from which we conclude that the - th iteration is successful, so the parameter µ is decreased, ie, µ + = µ λ Consequently, r + = r + 9

20 For any other outcome for U and V other than both happen (which occur with probability at most pq), we have µ + λµ As a result, letting F V T = σ(v 0,, V k ) σ(t 0,, T k ) = σ(r 0,, R ), E [ r + F V T pq(r + ) + ( pq)(r ) r, because pq > / as a consequence of the assumptions from Corollary 4 This implies that R is a submartingale We now define another submartingale W by W = ( Ui Vi ), i=0 where A is the indicator random variable of the event A Note that W is defined on the same probability space as R, and that we have: E [ W F V T = E [ W F V T [ + E U V F V T = W + P ( U V F V T ) W, where the last inequality holds because pq / Therefore, W is a submartingale with bounded (±) increments By Proposition 4, it does not have a finite limit and the event lim sup W = } has probability To conclude, observe that by construction of R and W, one has r r 0 w w 0, where w is a realization of W This means that R must be positive infinitely often with probability one, thus that there is a zero probability of having r < 0 0 This contradicts our initial assumption that P( f(x ) ɛ ) > 0, which means that we must have ( ) P lim inf ) = 0 = 5 Complexity analysis In this section, we will analyze the convergence rate of our algorithm using stochastic processes The proposed expected convergence rate methodology is inspired by the complexity analysis developed by Blanchet et al [7 However, it presents a number of variations that lead to a difference in the components of the final complexity bound (see Theorem 5) The derivation of our complexity result is thoroughly detailed in order to clarify the original features of our reasoning Given a stochastic process X }, T is said to a be a stopping time for X }, if, for all, the event T } belongs to the σ-algebra by X, X, X For a given ɛ > 0, define a random time T ɛ T ɛ = inf 0, f(x ) ɛ}, 0

21 let also ɛ = ζ ɛ, where ζ ( κ eg + max κ µg, 8(κ }) ef + κ efs ), κ J η θ m, η (46) fcd Based on Theorem 4, one deduces that T ɛ is a stopping time for the stochastic process defined by Algorithm and hence for Φ, } where Φ is given by (6) Assumption 5 There exists a positive constant Φ max > 0 such that Φ Φ max For simplicity reasons, we will assume that µ 0 = ɛ λ and µ s min = ɛ λ for some integers s, t > 0, t hence for all, one has = ɛ for some integer k We note that, in this case, whenever λ k < ɛ, one has ɛ λ, and hence + ɛ This assumption can be made without loss of generality, for instance, provided µ min = µ 0 λ s t (one can choose µ min so that this is true) and ζ = µ 0 λ s ɛ, where s is the smallest integer such that ζ satisfies (46) We first depart from the analysis of [7 in the next lemma It defines a geometric random walk based on successful iterations The final complexity result heavily depends upon the behavior of this random walk Lemma 5 Let Assumptions 4, 4, 43, 44 and 47 hold For all < T ɛ, whenever ɛ, one has or, equivalently, letting γ = log(λ), one has + = λ Ω + λ ( Ω ), + = e γλ, (47) where Ω is equal to if the iteration is successful and 0 otherwise and Λ = Ω defines a birth-and-death process, ie, P(Λ = F M F, ɛ ) = P(Λ = F M F, ɛ ) = ω, with ω pq Proof By the mechanism of the algorithm one has + = λ Ω + λ ( Ω ) Moreover, if µ ɛ for a given < T ɛ, one has f(x ) ɛ and hence from the definition of ɛ one gets f(x ) ζ µ Assuming U = and V = (ie both m and (f 0, f ) are accurate) Since the model is (κ ef, κ eg )-first-order accurate, this implies g m f(x ) κ eg µ (ζ κ eg ) µ κ µg µ ; since the estimates are also accurate, the iteration is successful by Lemma 44 Hence, one gets ω = P(Λ = F M F, ɛ ) pq Lemma 5 is analogous to [7, Lemma 35, however, in our case, the birth-and-process Λ } is based on successful iterations, whereas [7 considered the iterations where both the function estimates and the model were accurate The next result exactly follows Case in the proof of Theorem 4, therefore its proof is omitted

22 Lemma 5 Let Assumptions 4, 4, 43, 44 and 47 hold Suppose that Assumptions 45 and 46 are also satisfied, with the probabilities p and q satisfy: and ( p)( q) pq / ( p)( q) C 3 C, (48) ( τ) ( λ ) (τc 3 ζ + ( τ)(λ )) (49) Then, there exists a constant σ > 0 such that, conditioned on T ɛ >, one has or, equivalently, E [ Φ + Φ F M F σ <, (50) E [ Φ + F M F In this case, σ = 4 ( τ) ( λ ) We define the renewal process A i } as follows: < Φ σ, (5) A 0 = 0 and A i = min k > A i : k ɛ } A represents the number of iterations for which has a value smaller than ɛ Let also, for all, τ = A A The next result provides a bound on the expected value of τ Lemma 53 Let Assumptions 4, 4, 43, 44 and 47 hold Assuming that pq >, one has for all Proof One has E [τ pq pq (5) E [τ = E [ τ A < ɛ P( A < ɛ ) + E [ τ A = ɛ P( A = ɛ ) maxe [ τ A < ɛ, E [ τ A = ɛ } (53) First we note that whenever < ɛ, one has ɛ λ, and hence + ɛ Thus, if A < ɛ, one deduces that A = A + and then E [ τ A < ɛ = (54) Assuming now that A > A + (if not, meaning that A = A +, the proof is straightforward ), then conditioned on A = ɛ, one has A = ɛ as well We note also that for all k [A, A, one has k ɛ Hence, using Lemma 5, one has k + = k e γλ k,

23 where γ = log(λ) and P(Λ k = Fk M F, k ɛ ) = ω k and P(Λ k = Fk M F, k ɛ ) = ω k Moreover, one has ω k pq The process A, A +,, A } then defines a geometric random walk between two returns to the same state (ie, ɛ ) and τ represents the number of iterations until a return to the initial state For such a geometric random walk, one can define the state probability vector π = (π k ) k corresponding to the limiting stationary distribution [5 Using the local balance equation between the two states k and k +, see [5, Theorem 3, one has Since ω k pq, one deduces that Hence, π k κ k π 0 ( ω k )π k = ω k π k+ ( pq)π k pqπ k+ where κ = pq pq Using the assumption κ < (ie pq > ) and the definition of the state probability k π k =, one has π 0 κ (this is a classical result for geometric random walk, see for instance [5, Example 6) Applying the properties of ergodic Markov chains, one deduces that the expected number of iterations until a return to the initial state (the state 0) is given by π 0 Hence E [ τ A = ɛ = π 0 κ = pq pq (55) By substituting (54) and (55) into (53), one deduces E [τ completed pq pq and hence the proof is We now introduce a counting process N() given by the number of renewals that occur before time : N() = max i : A i } We also consider the sequence of random variables defined by Y 0 = Φ 0 and Y = Φ min(,tɛ) + σ min(,t ɛ) k=0 ( k ) for all The definition of Y } is our second and main difference with the analysis of [7, which leads to a different form for the bound on E [N(T ɛ ) provided in the lemma below, compared to [7, Lemma Lemma 54 Under the assumptions of Lemma 5 Let Assumption 5 hold One has, E [N(T ɛ ) Φ 0 σ ɛ 3

24 Proof Note that Y defines a supermartingale with respect to F M F Indeed, if < T ɛ, then using Lemma 5 one has, E [ Y + F M F = E [ [ Φ + F M F + E σ ( k=0 ( ) ( Φ σ + σ k=0 ( ) = Φ + σ = Y k=0 [ If T ɛ, one has Y + = Y and thus E Y + F M F = Y k k k ) ) F M F Using Assumption 5, one has for all T ɛ, Y = Y Tɛ Φ max + (Tɛ+)σ Hence, since T µ ɛ min is bounded, Y is also bounded Using Theorem 4, one knows that T ɛ is a stopping time and hence by means of the optional stopping Theorem ([6, Theorem 64) for supermartingale, one concludes that E [Y Tɛ E [Y 0 Hence, σe [ Tɛ ( ) E [Y Tɛ E [Y 0 = Φ 0 (56) k=0 k By the definition of the counting process N(T ɛ ), since the renewal times A i (which satisfy Ai ɛ ) are a subset of the iterations 0,,, T ɛ, one has T ɛ k=0 ( k Inserting the latter inequality in (56), one gets Which concludes the proof ) ( ) N(T ɛ ) ɛ E [N(T ɛ ) Φ 0 σ ɛ Using Wald s equation [6, Corollary 63, we can finally obtain a bound on the expected value of T ɛ Theorem 5 Under the assumptions of Lemma 5 Let Assumption 5 hold One has E [T ɛ pq ( ) Φ0 pq σ ɛ +, or, equivalently, E [T ɛ pq ( κs ɛ + ) pq 4

25 where κ s = τf(x 0)+( τ)µ 0 ) ζ, τ (0, ) satisfies λ 4 ( τ) ( τ λ τ > max λ, λ λ, λ λ C ζ C κ ef +κ efs and ζ is a parameter such that ( ζ κ eg + max κ µg, 8(κ }) ef + κ efs ), κ J η θ m, η fcd Proof First note that the renewal process A N(Tɛ)+ = N(T ɛ) i=0 τ i where τ i defines independent inter-arrival times Moreover, since the probabilities p and q satisfy (48), one has pq > / and hence, by applying Lemma 53, for all i =,, N(T ɛ ) one has E [τ i by Wald s equation [6, Corollary 63, one gets, E [ A N(Tɛ)+ = E [τ E [N(T ɛ ) + pq pq E [N(T ɛ) + } pq pq By the definition of N(T ɛ ) one has A N(Tɛ)+ > T ɛ, hence using Lemma 54 one gets E [T ɛ E [ ( ) pq Φ0 A N(Tɛ)+ pq σ ɛ + < + Thus, As for the previous lemma, we observe that the complexity bound of Theorem 5 has a different form than that of [7 Both are of order of ɛ, but our result does not include a term in ɛ 6 Application to data assimilation Data assimilation is the process by which observations of a physical system are incorporated into a model together with prior knowledge, so as to produce an estimate of the state of this system More precisely, the methodology consists in computing z 0,, z T, where z i is the realization of the stochastic state Z i at time i, from (a) an initial state Z 0 z b + N(0, B ), with z b being the prior knowledge at time 0 of the process Z, (b) the observations y i which satisfy y i H i (Z i ) + N(0, R i ), i = 0,, T, and (c) the numerical physical system model Z i M i (Z i ) + N(0, Q i ), i =,, T We note that the model operator M i at time i as well as the observation operator H i are not necessary linear The random vectors Z 0 z b, y i H i (Z i ) and Z i M i (Z i ) define the noises on the prior, on the observation at time i, and on the model at time i, with covariance matrices B, R i, and Q i, respectively The 4DVAR formulation is one of the most popular data assimilation methods It assumes that the errors (the prior, the observation, and the model errors) are independent from each other and uncorrelated in time It also assumes that the posterior probability function of Z (in other words, the probability density function of Z 0,, Z T knowing y 0,, y T is proportional to ( ( )) exp T T z 0 z b (B ) + z i M i (z i ) + y Q i H i (z i ) i R i i= 5 i=0

26 The 4DVAR method maximizes the previous function over z 0,, z T, which is equivalent to minimizing, ( z 0 z b (B ) + T i= x i M i (z i ) Q i + T i=0 y i H i (z i ) R i ) (57) The latter optimization problem is known to the data assimilation community as the weak constraint 4DVAR formulation [8 One of the most significant challenges with this formulation is the practical estimation of the covariance matrices Q i, i =,, T [8, 8 In many applications it is assumed that the physical model is perfect, ie, i, Q i = 0 This scenario, known as the strong constraint 4DVAR formulation [6 is equivalent to solving the following minimization problem, ( min z 0,,z T R n z 0 z b (B ) + ) T i=0 y i H i (z i ), R i (58) s t z i = M i (z i ), i =,, T For the sake of simplicity, we will focus on problem (58) in the rest of the section By defining R H 0 (z) y = [y 0 ; ; y T, R = 0 R 0, and H(z) H o M (z) =, 0 0 R T H T o M T o M T o o M (z) we can re-write problem (58) as ) min ( z 0 z b z 0 R n (B ) + y H(z 0 ) R (59) The problem thus reduces to the determination of z 0, as z,, z T can be computed afterwards using z i = M i (z i ), i =,, T In order to link the notation to the generic optimization problem defined earlier in this paper, we will now denote the vector z 0 in (59) by x In many data assimilation problems, like those appearing in weather forecasting, the covariance matrix B is only known approximately Instead, one has access to an ensemble of N elements ẑ k} N, assumed to be sampled from the Gaussian distribution with the empirical k= mean z b and the unknown covariance matrix B The matrix B is approximated by the empirical covariance matrix of the ensemble: B N = N N k= (ẑ k z b ) ( ẑ k z b ) (60) The matrix B N follows the Wishart distribution [30; thus, if N n +, B N is nonsingular with probability one (the matrix (B N ) follows the inverse Wishart distribution) In this case, E [ B N = B and E [ (B N ) = N N n (B ) (6) We will assume that N is large enough relative to n, so that the empirical covariance matrix B N can be assumed to be non-singular and, furthermore, E [ (B N ) approximates (B ) 6

27 sufficiently well Since E [ B N (or equivalently B ) is usually not known for many problems, in practice, one considers the following problem of minimization in lieu of (59): min x R n x z b (B N ) + y H(x) R, (6) This optimization problem can be seen as a noisy approximation of (59), with B N instead of B To find the solution of the problem (6), a common approach used in the data assimilation community is to proceed iteratively by linearization At a given iteration, one computes s as an approximate solution of the auxiliary linear least squares subproblem defined as min s R n s + x z b (B N ) + y H(x ) H s R, (63) and sets x + = x + s, where H = H (x ) Such an iterative process is known in the data assimilation community as the incremental approach [6 This method is simply the Gauss- Newton method [3 applied to (6) To solve the subproblem (63), we propose to use the ensemble Kalman filter (EnKF) as a linear least squares solver The EnKF [7 consists of applying Monte Carlo techniques to approximately solve the subproblem (63) Recall that we have a ensemble of N elements ẑ k, for k =,, N, which are assumed to be sampled from the Gaussian distribution with the mean z b and the unknown covariance matrix B Thus, the empirical covariance matrix of the ensemble B N, which approximates the matrix B, is given by (60) EnKF generates a new ensemble s k,a as follows ( ( ) s k,a = ẑ k x + K y H(x ) H ẑ k x ˆv k), ( where ˆv k is sampled from N(0, R), and K = B N H H B N H + R) In practice, the matrices B N and K are never computed or stored explicitly The reader is referred to [7 and the references therein for more details on the computation The subproblem (63) solution is then approximated by s a = z b x + K (y H(x ) H (ẑ b x ) ˆv), where ˆv is the empirical mean of the ensemble ˆv k } One can show easily that s a is the minimizer of min s R n s + x z b (B N ) + y H(x ) ˆv H s R (64) Both the incremental method (ie, the Gauss-Newton method) and the method which approximates the solution of the linearized subproblem using EnKF may diverge A regularization approach like that of Algorithm controls the norm of the step so as to guarantee convergence We thus consider min s R n s + x z b (B N ) + y H(x ) ˆv H s R + γ s, for use as a subproblem in Algorithm In order for the algorithm to be globally convergent, one then has to ensure that the regarded data assimilation problem provides estimates for the 7

28 obective function and the gradient that are sufficiently accurate to a suitable high probability By analogy with the previous sections of the paper, we set f(x) f(x + s) m (0) m (s) = x z b (B ) + y H(x) R, (65) = x + s z b (B ) + y H(x + s) R, (66) = x z b (B N ) + y H(x ) ˆv R, (67) = s + x z b (B N ) + y H(x ) ˆv H s R + γ s (68) Furthermore, natural estimates f 0 and f to f(x) and f(x + s) respectively, can be given by f 0 (x ) f (x + s) = x z b (B N ) + y H(x ) R, = x + s z b (B N ) + y H(x + s) R The exact gradient of the non-noisy function (65) is then given by f(x ) = (B ) (x z b ) + H R (H(x ) y), and the gradient of the stochastic model (67) is m (x ) = (B N ) (x z b ) + H R (H(x ) y ˆv) To derive simple bounds for the errors and for simplicity, we make the assumption that ˆv = 0 In practice this assumption can be easily satisfied by centering the ensemble ˆv k } In fact, one generates ˆv k } then consider the ensemble defined by ṽ k = ˆv k ˆv instead of ˆv k } Note that the empirical mean of ṽ k } is then ˆṽ = 0 In the next lemma, we recall the Chebyshev s inequality which will be useful in the sequel of this section Lemma 6 (Chebyshev s inequality) Let X be an n dimensional random vector with expected value µ and covariance matrix V, then for any real number t > 0 P X µ V > t} n t In particular, if X is scalar valued, then one has P X µ > t} V t The next theorem gives estimates of the required bounds on the errors appearing in Assumptions 45 and 46 Note that, at a given iteration, conditioned on the σ-algebra associated up to the current iterate F / M F, the remaining randomness only comes from the matrix BN We will consider a run of the algorithm under stopping criterion of the form µ > µ max Theorem 6 Let denote the current iterate index Assume that the ensemble size N is large enough compared to n, ie, N > (ℵ minλ µ 0, µ max } + )n +, where X z b (B ℵ = max ), X + S z b (B ), X } z b (B ), (B ) (X z b ) ε f ε f κ eg minλ µ 0, µ max } 8 κ eg

Convergence of trust-region methods based on probabilistic models

Convergence of trust-region methods based on probabilistic models Convergence of trust-region methods based on probabilistic models A. S. Bandeira K. Scheinberg L. N. Vicente October 24, 2013 Abstract In this paper we consider the use of probabilistic or random models

More information

GLOBAL CONVERGENCE OF GENERAL DERIVATIVE-FREE TRUST-REGION ALGORITHMS TO FIRST AND SECOND ORDER CRITICAL POINTS

GLOBAL CONVERGENCE OF GENERAL DERIVATIVE-FREE TRUST-REGION ALGORITHMS TO FIRST AND SECOND ORDER CRITICAL POINTS GLOBAL CONVERGENCE OF GENERAL DERIVATIVE-FREE TRUST-REGION ALGORITHMS TO FIRST AND SECOND ORDER CRITICAL POINTS ANDREW R. CONN, KATYA SCHEINBERG, AND LUíS N. VICENTE Abstract. In this paper we prove global

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

Global convergence rate analysis of unconstrained optimization methods based on probabilistic models

Global convergence rate analysis of unconstrained optimization methods based on probabilistic models Math. Program., Ser. A DOI 10.1007/s10107-017-1137-4 FULL LENGTH PAPER Global convergence rate analysis of unconstrained optimization methods based on probabilistic models C. Cartis 1 K. Scheinberg 2 Received:

More information

Trust Region Methods for Unconstrained Optimisation

Trust Region Methods for Unconstrained Optimisation Trust Region Methods for Unconstrained Optimisation Lecture 9, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Trust

More information

Revenue Management Under the Markov Chain Choice Model

Revenue Management Under the Markov Chain Choice Model Revenue Management Under the Markov Chain Choice Model Jacob B. Feldman School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853, USA jbf232@cornell.edu Huseyin

More information

Asymptotic results discrete time martingales and stochastic algorithms

Asymptotic results discrete time martingales and stochastic algorithms Asymptotic results discrete time martingales and stochastic algorithms Bernard Bercu Bordeaux University, France IFCAM Summer School Bangalore, India, July 2015 Bernard Bercu Asymptotic results for discrete

More information

Introduction to Probability Theory and Stochastic Processes for Finance Lecture Notes

Introduction to Probability Theory and Stochastic Processes for Finance Lecture Notes Introduction to Probability Theory and Stochastic Processes for Finance Lecture Notes Fabio Trojani Department of Economics, University of St. Gallen, Switzerland Correspondence address: Fabio Trojani,

More information

Adaptive cubic overestimation methods for unconstrained optimization

Adaptive cubic overestimation methods for unconstrained optimization Report no. NA-07/20 Adaptive cubic overestimation methods for unconstrained optimization Coralia Cartis School of Mathematics, University of Edinburgh, The King s Buildings, Edinburgh, EH9 3JZ, Scotland,

More information

Yao s Minimax Principle

Yao s Minimax Principle Complexity of algorithms The complexity of an algorithm is usually measured with respect to the size of the input, where size may for example refer to the length of a binary word describing the input,

More information

Outline. 1 Introduction. 2 Algorithms. 3 Examples. Algorithm 1 General coordinate minimization framework. 1: Choose x 0 R n and set k 0.

Outline. 1 Introduction. 2 Algorithms. 3 Examples. Algorithm 1 General coordinate minimization framework. 1: Choose x 0 R n and set k 0. Outline Coordinate Minimization Daniel P. Robinson Department of Applied Mathematics and Statistics Johns Hopkins University November 27, 208 Introduction 2 Algorithms Cyclic order with exact minimization

More information

4: SINGLE-PERIOD MARKET MODELS

4: SINGLE-PERIOD MARKET MODELS 4: SINGLE-PERIOD MARKET MODELS Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 4: Single-Period Market Models 1 / 87 General Single-Period

More information

On Complexity of Multistage Stochastic Programs

On Complexity of Multistage Stochastic Programs On Complexity of Multistage Stochastic Programs Alexander Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA e-mail: ashapiro@isye.gatech.edu

More information

An adaptive cubic regularization algorithm for nonconvex optimization with convex constraints and its function-evaluation complexity

An adaptive cubic regularization algorithm for nonconvex optimization with convex constraints and its function-evaluation complexity An adaptive cubic regularization algorithm for nonconvex optimization with convex constraints and its function-evaluation complexity Coralia Cartis, Nick Gould and Philippe Toint Department of Mathematics,

More information

Martingale Pricing Theory in Discrete-Time and Discrete-Space Models

Martingale Pricing Theory in Discrete-Time and Discrete-Space Models IEOR E4707: Foundations of Financial Engineering c 206 by Martin Haugh Martingale Pricing Theory in Discrete-Time and Discrete-Space Models These notes develop the theory of martingale pricing in a discrete-time,

More information

4 Reinforcement Learning Basic Algorithms

4 Reinforcement Learning Basic Algorithms Learning in Complex Systems Spring 2011 Lecture Notes Nahum Shimkin 4 Reinforcement Learning Basic Algorithms 4.1 Introduction RL methods essentially deal with the solution of (optimal) control problems

More information

Econ 582 Nonlinear Regression

Econ 582 Nonlinear Regression Econ 582 Nonlinear Regression Eric Zivot June 3, 2013 Nonlinear Regression In linear regression models = x 0 β (1 )( 1) + [ x ]=0 [ x = x] =x 0 β = [ x = x] [ x = x] x = β it is assumed that the regression

More information

Dynamic Replication of Non-Maturing Assets and Liabilities

Dynamic Replication of Non-Maturing Assets and Liabilities Dynamic Replication of Non-Maturing Assets and Liabilities Michael Schürle Institute for Operations Research and Computational Finance, University of St. Gallen, Bodanstr. 6, CH-9000 St. Gallen, Switzerland

More information

A No-Arbitrage Theorem for Uncertain Stock Model

A No-Arbitrage Theorem for Uncertain Stock Model Fuzzy Optim Decis Making manuscript No (will be inserted by the editor) A No-Arbitrage Theorem for Uncertain Stock Model Kai Yao Received: date / Accepted: date Abstract Stock model is used to describe

More information

A Trust Region Algorithm for Heterogeneous Multiobjective Optimization

A Trust Region Algorithm for Heterogeneous Multiobjective Optimization A Trust Region Algorithm for Heterogeneous Multiobjective Optimization Jana Thomann and Gabriele Eichfelder 8.0.018 Abstract This paper presents a new trust region method for multiobjective heterogeneous

More information

Portfolio Management and Optimal Execution via Convex Optimization

Portfolio Management and Optimal Execution via Convex Optimization Portfolio Management and Optimal Execution via Convex Optimization Enzo Busseti Stanford University April 9th, 2018 Problems portfolio management choose trades with optimization minimize risk, maximize

More information

Lecture Quantitative Finance Spring Term 2015

Lecture Quantitative Finance Spring Term 2015 implied Lecture Quantitative Finance Spring Term 2015 : May 7, 2015 1 / 28 implied 1 implied 2 / 28 Motivation and setup implied the goal of this chapter is to treat the implied which requires an algorithm

More information

In Discrete Time a Local Martingale is a Martingale under an Equivalent Probability Measure

In Discrete Time a Local Martingale is a Martingale under an Equivalent Probability Measure In Discrete Time a Local Martingale is a Martingale under an Equivalent Probability Measure Yuri Kabanov 1,2 1 Laboratoire de Mathématiques, Université de Franche-Comté, 16 Route de Gray, 253 Besançon,

More information

Handout 8: Introduction to Stochastic Dynamic Programming. 2 Examples of Stochastic Dynamic Programming Problems

Handout 8: Introduction to Stochastic Dynamic Programming. 2 Examples of Stochastic Dynamic Programming Problems SEEM 3470: Dynamic Optimization and Applications 2013 14 Second Term Handout 8: Introduction to Stochastic Dynamic Programming Instructor: Shiqian Ma March 10, 2014 Suggested Reading: Chapter 1 of Bertsekas,

More information

Equity correlations implied by index options: estimation and model uncertainty analysis

Equity correlations implied by index options: estimation and model uncertainty analysis 1/18 : estimation and model analysis, EDHEC Business School (joint work with Rama COT) Modeling and managing financial risks Paris, 10 13 January 2011 2/18 Outline 1 2 of multi-asset models Solution to

More information

Chapter 3: Black-Scholes Equation and Its Numerical Evaluation

Chapter 3: Black-Scholes Equation and Its Numerical Evaluation Chapter 3: Black-Scholes Equation and Its Numerical Evaluation 3.1 Itô Integral 3.1.1 Convergence in the Mean and Stieltjes Integral Definition 3.1 (Convergence in the Mean) A sequence {X n } n ln of random

More information

On the Lower Arbitrage Bound of American Contingent Claims

On the Lower Arbitrage Bound of American Contingent Claims On the Lower Arbitrage Bound of American Contingent Claims Beatrice Acciaio Gregor Svindland December 2011 Abstract We prove that in a discrete-time market model the lower arbitrage bound of an American

More information

Finite Memory and Imperfect Monitoring

Finite Memory and Imperfect Monitoring Federal Reserve Bank of Minneapolis Research Department Finite Memory and Imperfect Monitoring Harold L. Cole and Narayana Kocherlakota Working Paper 604 September 2000 Cole: U.C.L.A. and Federal Reserve

More information

Online Appendix Optimal Time-Consistent Government Debt Maturity D. Debortoli, R. Nunes, P. Yared. A. Proofs

Online Appendix Optimal Time-Consistent Government Debt Maturity D. Debortoli, R. Nunes, P. Yared. A. Proofs Online Appendi Optimal Time-Consistent Government Debt Maturity D. Debortoli, R. Nunes, P. Yared A. Proofs Proof of Proposition 1 The necessity of these conditions is proved in the tet. To prove sufficiency,

More information

Arbitrage of the first kind and filtration enlargements in semimartingale financial models. Beatrice Acciaio

Arbitrage of the first kind and filtration enlargements in semimartingale financial models. Beatrice Acciaio Arbitrage of the first kind and filtration enlargements in semimartingale financial models Beatrice Acciaio the London School of Economics and Political Science (based on a joint work with C. Fontana and

More information

Strategies for Improving the Efficiency of Monte-Carlo Methods

Strategies for Improving the Efficiency of Monte-Carlo Methods Strategies for Improving the Efficiency of Monte-Carlo Methods Paul J. Atzberger General comments or corrections should be sent to: paulatz@cims.nyu.edu Introduction The Monte-Carlo method is a useful

More information

Arbitrage Conditions for Electricity Markets with Production and Storage

Arbitrage Conditions for Electricity Markets with Production and Storage SWM ORCOS Arbitrage Conditions for Electricity Markets with Production and Storage Raimund Kovacevic Research Report 2018-03 March 2018 ISSN 2521-313X Operations Research and Control Systems Institute

More information

Non replication of options

Non replication of options Non replication of options Christos Kountzakis, Ioannis A Polyrakis and Foivos Xanthos June 30, 2008 Abstract In this paper we study the scarcity of replication of options in the two period model of financial

More information

Lecture 17: More on Markov Decision Processes. Reinforcement learning

Lecture 17: More on Markov Decision Processes. Reinforcement learning Lecture 17: More on Markov Decision Processes. Reinforcement learning Learning a model: maximum likelihood Learning a value function directly Monte Carlo Temporal-difference (TD) learning COMP-424, Lecture

More information

Supplementary Material for Combinatorial Partial Monitoring Game with Linear Feedback and Its Application. A. Full proof for Theorems 4.1 and 4.

Supplementary Material for Combinatorial Partial Monitoring Game with Linear Feedback and Its Application. A. Full proof for Theorems 4.1 and 4. Supplementary Material for Combinatorial Partial Monitoring Game with Linear Feedback and Its Application. A. Full proof for Theorems 4.1 and 4. If the reader will recall, we have the following problem-specific

More information

Final exam solutions

Final exam solutions EE365 Stochastic Control / MS&E251 Stochastic Decision Models Profs. S. Lall, S. Boyd June 5 6 or June 6 7, 2013 Final exam solutions This is a 24 hour take-home final. Please turn it in to one of the

More information

Large-Scale SVM Optimization: Taking a Machine Learning Perspective

Large-Scale SVM Optimization: Taking a Machine Learning Perspective Large-Scale SVM Optimization: Taking a Machine Learning Perspective Shai Shalev-Shwartz Toyota Technological Institute at Chicago Joint work with Nati Srebro Talk at NEC Labs, Princeton, August, 2008 Shai

More information

Lecture 7: Bayesian approach to MAB - Gittins index

Lecture 7: Bayesian approach to MAB - Gittins index Advanced Topics in Machine Learning and Algorithmic Game Theory Lecture 7: Bayesian approach to MAB - Gittins index Lecturer: Yishay Mansour Scribe: Mariano Schain 7.1 Introduction In the Bayesian approach

More information

Methods and Models of Loss Reserving Based on Run Off Triangles: A Unifying Survey

Methods and Models of Loss Reserving Based on Run Off Triangles: A Unifying Survey Methods and Models of Loss Reserving Based on Run Off Triangles: A Unifying Survey By Klaus D Schmidt Lehrstuhl für Versicherungsmathematik Technische Universität Dresden Abstract The present paper provides

More information

Variable-Number Sample-Path Optimization

Variable-Number Sample-Path Optimization Noname manuscript No. (will be inserted by the editor Geng Deng Michael C. Ferris Variable-Number Sample-Path Optimization the date of receipt and acceptance should be inserted later Abstract The sample-path

More information

Forecast Horizons for Production Planning with Stochastic Demand

Forecast Horizons for Production Planning with Stochastic Demand Forecast Horizons for Production Planning with Stochastic Demand Alfredo Garcia and Robert L. Smith Department of Industrial and Operations Engineering Universityof Michigan, Ann Arbor MI 48109 December

More information

Characterization of the Optimum

Characterization of the Optimum ECO 317 Economics of Uncertainty Fall Term 2009 Notes for lectures 5. Portfolio Allocation with One Riskless, One Risky Asset Characterization of the Optimum Consider a risk-averse, expected-utility-maximizing

More information

Chapter 5 Univariate time-series analysis. () Chapter 5 Univariate time-series analysis 1 / 29

Chapter 5 Univariate time-series analysis. () Chapter 5 Univariate time-series analysis 1 / 29 Chapter 5 Univariate time-series analysis () Chapter 5 Univariate time-series analysis 1 / 29 Time-Series Time-series is a sequence fx 1, x 2,..., x T g or fx t g, t = 1,..., T, where t is an index denoting

More information

Nonlinear programming without a penalty function or a filter

Nonlinear programming without a penalty function or a filter Report no. NA-07/09 Nonlinear programming without a penalty function or a filter Nicholas I. M. Gould Oxford University, Numerical Analysis Group Philippe L. Toint Department of Mathematics, FUNDP-University

More information

Self-organized criticality on the stock market

Self-organized criticality on the stock market Prague, January 5th, 2014. Some classical ecomomic theory In classical economic theory, the price of a commodity is determined by demand and supply. Let D(p) (resp. S(p)) be the total demand (resp. supply)

More information

BROWNIAN MOTION Antonella Basso, Martina Nardon

BROWNIAN MOTION Antonella Basso, Martina Nardon BROWNIAN MOTION Antonella Basso, Martina Nardon basso@unive.it, mnardon@unive.it Department of Applied Mathematics University Ca Foscari Venice Brownian motion p. 1 Brownian motion Brownian motion plays

More information

Lecture 5 Theory of Finance 1

Lecture 5 Theory of Finance 1 Lecture 5 Theory of Finance 1 Simon Hubbert s.hubbert@bbk.ac.uk January 24, 2007 1 Introduction In the previous lecture we derived the famous Capital Asset Pricing Model (CAPM) for expected asset returns,

More information

Is Greedy Coordinate Descent a Terrible Algorithm?

Is Greedy Coordinate Descent a Terrible Algorithm? Is Greedy Coordinate Descent a Terrible Algorithm? Julie Nutini, Mark Schmidt, Issam Laradji, Michael Friedlander, Hoyt Koepke University of British Columbia Optimization and Big Data, 2015 Context: Random

More information

Dynamic Admission and Service Rate Control of a Queue

Dynamic Admission and Service Rate Control of a Queue Dynamic Admission and Service Rate Control of a Queue Kranthi Mitra Adusumilli and John J. Hasenbein 1 Graduate Program in Operations Research and Industrial Engineering Department of Mechanical Engineering

More information

Martingales. by D. Cox December 2, 2009

Martingales. by D. Cox December 2, 2009 Martingales by D. Cox December 2, 2009 1 Stochastic Processes. Definition 1.1 Let T be an arbitrary index set. A stochastic process indexed by T is a family of random variables (X t : t T) defined on a

More information

DRAFT. 1 exercise in state (S, t), π(s, t) = 0 do not exercise in state (S, t) Review of the Risk Neutral Stock Dynamics

DRAFT. 1 exercise in state (S, t), π(s, t) = 0 do not exercise in state (S, t) Review of the Risk Neutral Stock Dynamics Chapter 12 American Put Option Recall that the American option has strike K and maturity T and gives the holder the right to exercise at any time in [0, T ]. The American option is not straightforward

More information

Log-linear Dynamics and Local Potential

Log-linear Dynamics and Local Potential Log-linear Dynamics and Local Potential Daijiro Okada and Olivier Tercieux [This version: November 28, 2008] Abstract We show that local potential maximizer ([15]) with constant weights is stochastically

More information

3.2 No-arbitrage theory and risk neutral probability measure

3.2 No-arbitrage theory and risk neutral probability measure Mathematical Models in Economics and Finance Topic 3 Fundamental theorem of asset pricing 3.1 Law of one price and Arrow securities 3.2 No-arbitrage theory and risk neutral probability measure 3.3 Valuation

More information

Stock Loan Valuation Under Brownian-Motion Based and Markov Chain Stock Models

Stock Loan Valuation Under Brownian-Motion Based and Markov Chain Stock Models Stock Loan Valuation Under Brownian-Motion Based and Markov Chain Stock Models David Prager 1 1 Associate Professor of Mathematics Anderson University (SC) Based on joint work with Professor Qing Zhang,

More information

Optimal Search for Parameters in Monte Carlo Simulation for Derivative Pricing

Optimal Search for Parameters in Monte Carlo Simulation for Derivative Pricing Optimal Search for Parameters in Monte Carlo Simulation for Derivative Pricing Prof. Chuan-Ju Wang Department of Computer Science University of Taipei Joint work with Prof. Ming-Yang Kao March 28, 2014

More information

STOCHASTIC CALCULUS AND BLACK-SCHOLES MODEL

STOCHASTIC CALCULUS AND BLACK-SCHOLES MODEL STOCHASTIC CALCULUS AND BLACK-SCHOLES MODEL YOUNGGEUN YOO Abstract. Ito s lemma is often used in Ito calculus to find the differentials of a stochastic process that depends on time. This paper will introduce

More information

Practical example of an Economic Scenario Generator

Practical example of an Economic Scenario Generator Practical example of an Economic Scenario Generator Martin Schenk Actuarial & Insurance Solutions SAV 7 March 2014 Agenda Introduction Deterministic vs. stochastic approach Mathematical model Application

More information

B. Online Appendix. where ɛ may be arbitrarily chosen to satisfy 0 < ɛ < s 1 and s 1 is defined in (B1). This can be rewritten as

B. Online Appendix. where ɛ may be arbitrarily chosen to satisfy 0 < ɛ < s 1 and s 1 is defined in (B1). This can be rewritten as B Online Appendix B1 Constructing examples with nonmonotonic adoption policies Assume c > 0 and the utility function u(w) is increasing and approaches as w approaches 0 Suppose we have a prior distribution

More information

Tangent Lévy Models. Sergey Nadtochiy (joint work with René Carmona) Oxford-Man Institute of Quantitative Finance University of Oxford.

Tangent Lévy Models. Sergey Nadtochiy (joint work with René Carmona) Oxford-Man Institute of Quantitative Finance University of Oxford. Tangent Lévy Models Sergey Nadtochiy (joint work with René Carmona) Oxford-Man Institute of Quantitative Finance University of Oxford June 24, 2010 6th World Congress of the Bachelier Finance Society Sergey

More information

6. Martingales. = Zn. Think of Z n+1 as being a gambler s earnings after n+1 games. If the game if fair, then E [ Z n+1 Z n

6. Martingales. = Zn. Think of Z n+1 as being a gambler s earnings after n+1 games. If the game if fair, then E [ Z n+1 Z n 6. Martingales For casino gamblers, a martingale is a betting strategy where (at even odds) the stake doubled each time the player loses. Players follow this strategy because, since they will eventually

More information

3 Arbitrage pricing theory in discrete time.

3 Arbitrage pricing theory in discrete time. 3 Arbitrage pricing theory in discrete time. Orientation. In the examples studied in Chapter 1, we worked with a single period model and Gaussian returns; in this Chapter, we shall drop these assumptions

More information

MATH 5510 Mathematical Models of Financial Derivatives. Topic 1 Risk neutral pricing principles under single-period securities models

MATH 5510 Mathematical Models of Financial Derivatives. Topic 1 Risk neutral pricing principles under single-period securities models MATH 5510 Mathematical Models of Financial Derivatives Topic 1 Risk neutral pricing principles under single-period securities models 1.1 Law of one price and Arrow securities 1.2 No-arbitrage theory and

More information

1 Consumption and saving under uncertainty

1 Consumption and saving under uncertainty 1 Consumption and saving under uncertainty 1.1 Modelling uncertainty As in the deterministic case, we keep assuming that agents live for two periods. The novelty here is that their earnings in the second

More information

Optimal retention for a stop-loss reinsurance with incomplete information

Optimal retention for a stop-loss reinsurance with incomplete information Optimal retention for a stop-loss reinsurance with incomplete information Xiang Hu 1 Hailiang Yang 2 Lianzeng Zhang 3 1,3 Department of Risk Management and Insurance, Nankai University Weijin Road, Tianjin,

More information

An Approximation Algorithm for Capacity Allocation over a Single Flight Leg with Fare-Locking

An Approximation Algorithm for Capacity Allocation over a Single Flight Leg with Fare-Locking An Approximation Algorithm for Capacity Allocation over a Single Flight Leg with Fare-Locking Mika Sumida School of Operations Research and Information Engineering, Cornell University, Ithaca, New York

More information

Stochastic Proximal Algorithms with Applications to Online Image Recovery

Stochastic Proximal Algorithms with Applications to Online Image Recovery 1/24 Stochastic Proximal Algorithms with Applications to Online Image Recovery Patrick Louis Combettes 1 and Jean-Christophe Pesquet 2 1 Mathematics Department, North Carolina State University, Raleigh,

More information

What can we do with numerical optimization?

What can we do with numerical optimization? Optimization motivation and background Eddie Wadbro Introduction to PDE Constrained Optimization, 2016 February 15 16, 2016 Eddie Wadbro, Introduction to PDE Constrained Optimization, February 15 16, 2016

More information

A class of coherent risk measures based on one-sided moments

A class of coherent risk measures based on one-sided moments A class of coherent risk measures based on one-sided moments T. Fischer Darmstadt University of Technology November 11, 2003 Abstract This brief paper explains how to obtain upper boundaries of shortfall

More information

Stochastic calculus Introduction I. Stochastic Finance. C. Azizieh VUB 1/91. C. Azizieh VUB Stochastic Finance

Stochastic calculus Introduction I. Stochastic Finance. C. Azizieh VUB 1/91. C. Azizieh VUB Stochastic Finance Stochastic Finance C. Azizieh VUB C. Azizieh VUB Stochastic Finance 1/91 Agenda of the course Stochastic calculus : introduction Black-Scholes model Interest rates models C. Azizieh VUB Stochastic Finance

More information

Chapter 6 Forecasting Volatility using Stochastic Volatility Model

Chapter 6 Forecasting Volatility using Stochastic Volatility Model Chapter 6 Forecasting Volatility using Stochastic Volatility Model Chapter 6 Forecasting Volatility using SV Model In this chapter, the empirical performance of GARCH(1,1), GARCH-KF and SV models from

More information

Short-time-to-expiry expansion for a digital European put option under the CEV model. November 1, 2017

Short-time-to-expiry expansion for a digital European put option under the CEV model. November 1, 2017 Short-time-to-expiry expansion for a digital European put option under the CEV model November 1, 2017 Abstract In this paper I present a short-time-to-expiry asymptotic series expansion for a digital European

More information

Optimum Thresholding for Semimartingales with Lévy Jumps under the mean-square error

Optimum Thresholding for Semimartingales with Lévy Jumps under the mean-square error Optimum Thresholding for Semimartingales with Lévy Jumps under the mean-square error José E. Figueroa-López Department of Mathematics Washington University in St. Louis Spring Central Sectional Meeting

More information

GPD-POT and GEV block maxima

GPD-POT and GEV block maxima Chapter 3 GPD-POT and GEV block maxima This chapter is devoted to the relation between POT models and Block Maxima (BM). We only consider the classical frameworks where POT excesses are assumed to be GPD,

More information

Multistage risk-averse asset allocation with transaction costs

Multistage risk-averse asset allocation with transaction costs Multistage risk-averse asset allocation with transaction costs 1 Introduction Václav Kozmík 1 Abstract. This paper deals with asset allocation problems formulated as multistage stochastic programming models.

More information

Chapter 2 Uncertainty Analysis and Sampling Techniques

Chapter 2 Uncertainty Analysis and Sampling Techniques Chapter 2 Uncertainty Analysis and Sampling Techniques The probabilistic or stochastic modeling (Fig. 2.) iterative loop in the stochastic optimization procedure (Fig..4 in Chap. ) involves:. Specifying

More information

IEOR E4602: Quantitative Risk Management

IEOR E4602: Quantitative Risk Management IEOR E4602: Quantitative Risk Management Basic Concepts and Techniques of Risk Management Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Course information FN3142 Quantitative finance

Course information FN3142 Quantitative finance Course information 015 16 FN314 Quantitative finance This course is aimed at students interested in obtaining a thorough grounding in market finance and related empirical methods. Prerequisite If taken

More information

16 MAKING SIMPLE DECISIONS

16 MAKING SIMPLE DECISIONS 247 16 MAKING SIMPLE DECISIONS Let us associate each state S with a numeric utility U(S), which expresses the desirability of the state A nondeterministic action A will have possible outcome states Result

More information

Stability in geometric & functional inequalities

Stability in geometric & functional inequalities Stability in geometric & functional inequalities A. Figalli The University of Texas at Austin www.ma.utexas.edu/users/figalli/ Alessio Figalli (UT Austin) Stability in geom. & funct. ineq. Krakow, July

More information

Online Appendices to Financing Asset Sales and Business Cycles

Online Appendices to Financing Asset Sales and Business Cycles Online Appendices to Financing Asset Sales usiness Cycles Marc Arnold Dirk Hackbarth Tatjana Xenia Puhan August 22, 2017 University of St. allen, Unterer raben 21, 9000 St. allen, Switzerl. Telephone:

More information

Hedging under Arbitrage

Hedging under Arbitrage Hedging under Arbitrage Johannes Ruf Columbia University, Department of Statistics Modeling and Managing Financial Risks January 12, 2011 Motivation Given: a frictionless market of stocks with continuous

More information

Lecture 5: Iterative Combinatorial Auctions

Lecture 5: Iterative Combinatorial Auctions COMS 6998-3: Algorithmic Game Theory October 6, 2008 Lecture 5: Iterative Combinatorial Auctions Lecturer: Sébastien Lahaie Scribe: Sébastien Lahaie In this lecture we examine a procedure that generalizes

More information

Convergence Analysis of Monte Carlo Calibration of Financial Market Models

Convergence Analysis of Monte Carlo Calibration of Financial Market Models Analysis of Monte Carlo Calibration of Financial Market Models Christoph Käbe Universität Trier Workshop on PDE Constrained Optimization of Certain and Uncertain Processes June 03, 2009 Monte Carlo Calibration

More information

Essays on Some Combinatorial Optimization Problems with Interval Data

Essays on Some Combinatorial Optimization Problems with Interval Data Essays on Some Combinatorial Optimization Problems with Interval Data a thesis submitted to the department of industrial engineering and the institute of engineering and sciences of bilkent university

More information

Risk Neutral Measures

Risk Neutral Measures CHPTER 4 Risk Neutral Measures Our aim in this section is to show how risk neutral measures can be used to price derivative securities. The key advantage is that under a risk neutral measure the discounted

More information

AMH4 - ADVANCED OPTION PRICING. Contents

AMH4 - ADVANCED OPTION PRICING. Contents AMH4 - ADVANCED OPTION PRICING ANDREW TULLOCH Contents 1. Theory of Option Pricing 2 2. Black-Scholes PDE Method 4 3. Martingale method 4 4. Monte Carlo methods 5 4.1. Method of antithetic variances 5

More information

OPTIMAL PORTFOLIO CONTROL WITH TRADING STRATEGIES OF FINITE

OPTIMAL PORTFOLIO CONTROL WITH TRADING STRATEGIES OF FINITE Proceedings of the 44th IEEE Conference on Decision and Control, and the European Control Conference 005 Seville, Spain, December 1-15, 005 WeA11.6 OPTIMAL PORTFOLIO CONTROL WITH TRADING STRATEGIES OF

More information

The value of foresight

The value of foresight Philip Ernst Department of Statistics, Rice University Support from NSF-DMS-1811936 (co-pi F. Viens) and ONR-N00014-18-1-2192 gratefully acknowledged. IMA Financial and Economic Applications June 11, 2018

More information

Simulating Stochastic Differential Equations

Simulating Stochastic Differential Equations IEOR E4603: Monte-Carlo Simulation c 2017 by Martin Haugh Columbia University Simulating Stochastic Differential Equations In these lecture notes we discuss the simulation of stochastic differential equations

More information

RECURSIVE VALUATION AND SENTIMENTS

RECURSIVE VALUATION AND SENTIMENTS 1 / 32 RECURSIVE VALUATION AND SENTIMENTS Lars Peter Hansen Bendheim Lectures, Princeton University 2 / 32 RECURSIVE VALUATION AND SENTIMENTS ABSTRACT Expectations and uncertainty about growth rates that

More information

MATH3075/3975 FINANCIAL MATHEMATICS TUTORIAL PROBLEMS

MATH3075/3975 FINANCIAL MATHEMATICS TUTORIAL PROBLEMS MATH307/37 FINANCIAL MATHEMATICS TUTORIAL PROBLEMS School of Mathematics and Statistics Semester, 04 Tutorial problems should be used to test your mathematical skills and understanding of the lecture material.

More information

MITCHELL S THEOREM REVISITED. Contents

MITCHELL S THEOREM REVISITED. Contents MITCHELL S THEOREM REVISITED THOMAS GILTON AND JOHN KRUEGER Abstract. Mitchell s theorem on the approachability ideal states that it is consistent relative to a greatly Mahlo cardinal that there is no

More information

Socially-Optimal Design of Crowdsourcing Platforms with Reputation Update Errors

Socially-Optimal Design of Crowdsourcing Platforms with Reputation Update Errors Socially-Optimal Design of Crowdsourcing Platforms with Reputation Update Errors 1 Yuanzhang Xiao, Yu Zhang, and Mihaela van der Schaar Abstract Crowdsourcing systems (e.g. Yahoo! Answers and Amazon Mechanical

More information

4 Martingales in Discrete-Time

4 Martingales in Discrete-Time 4 Martingales in Discrete-Time Suppose that (Ω, F, P is a probability space. Definition 4.1. A sequence F = {F n, n = 0, 1,...} is called a filtration if each F n is a sub-σ-algebra of F, and F n F n+1

More information

THE OPTIMAL ASSET ALLOCATION PROBLEMFOR AN INVESTOR THROUGH UTILITY MAXIMIZATION

THE OPTIMAL ASSET ALLOCATION PROBLEMFOR AN INVESTOR THROUGH UTILITY MAXIMIZATION THE OPTIMAL ASSET ALLOCATION PROBLEMFOR AN INVESTOR THROUGH UTILITY MAXIMIZATION SILAS A. IHEDIOHA 1, BRIGHT O. OSU 2 1 Department of Mathematics, Plateau State University, Bokkos, P. M. B. 2012, Jos,

More information

Markov Decision Processes II

Markov Decision Processes II Markov Decision Processes II Daisuke Oyama Topics in Economic Theory December 17, 2014 Review Finite state space S, finite action space A. The value of a policy σ A S : v σ = β t Q t σr σ, t=0 which satisfies

More information

Richardson Extrapolation Techniques for the Pricing of American-style Options

Richardson Extrapolation Techniques for the Pricing of American-style Options Richardson Extrapolation Techniques for the Pricing of American-style Options June 1, 2005 Abstract Richardson Extrapolation Techniques for the Pricing of American-style Options In this paper we re-examine

More information

arxiv: v1 [q-fin.pm] 13 Mar 2014

arxiv: v1 [q-fin.pm] 13 Mar 2014 MERTON PORTFOLIO PROBLEM WITH ONE INDIVISIBLE ASSET JAKUB TRYBU LA arxiv:143.3223v1 [q-fin.pm] 13 Mar 214 Abstract. In this paper we consider a modification of the classical Merton portfolio optimization

More information

IEOR E4703: Monte-Carlo Simulation

IEOR E4703: Monte-Carlo Simulation IEOR E4703: Monte-Carlo Simulation Simulating Stochastic Differential Equations Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

arxiv: v1 [math.st] 18 Sep 2018

arxiv: v1 [math.st] 18 Sep 2018 Gram Charlier and Edgeworth expansion for sample variance arxiv:809.06668v [math.st] 8 Sep 08 Eric Benhamou,* A.I. SQUARE CONNECT, 35 Boulevard d Inkermann 900 Neuilly sur Seine, France and LAMSADE, Universit

More information